In this appendix, we look at the assembly language produced by several different compilers on several different architectures. In this survey, we revisit some of the issues of CISC versus RISC, as well as the strengths and weaknesses of different architectures.
For this survey, two roughly identical segments of code were used. Each was a relatively long loop that adds two arrays and stores the result in a third array. The loops were written in both FORTRAN and C.
The FORTRAN loop was as follows:
      SUBROUTINE ADDEM(A,B,C,N)
      REAL A(10000),B(10000),C(10000)
      INTEGER N,I
      DO 10 I=1,N
        A(I) = B(I) + C(I)
   10 CONTINUE
      END
The C version was:
for (i = 0; i < n; i++) a[i] = b[i] + c[i];
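To compile the C loop on its own, it can be wrapped in a function. This is a minimal sketch: the function name addem and the float element type are assumptions chosen to mirror the FORTRAN subroutine, not part of the original survey code. C arrays start at index 0, so C indices 0 through n-1 correspond to FORTRAN indices 1 through N.

```c
/* Hypothetical C counterpart of SUBROUTINE ADDEM (the name is an assumption).
   Computes a[i] = b[i] + c[i] for 0 <= i < n, exactly like the loop above. */
void addem(float *a, const float *b, const float *c, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Calling addem(a, b, c, 3) with b = {1, 2, 3} and c = {10, 20, 30} leaves a = {11, 22, 33}.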
We have gathered these examples over the years from a number of different compilers, and the results are not particularly scientific. This survey is not intended to review a particular architecture or compiler version, but rather to show the kinds of things you can learn by looking at the output of the compiler.
The Intel 8088 processor used in the original IBM Personal Computer is a very traditional CISC processor whose features are severely limited by its transistor count. It has very few registers, and most of them have rather specific functions. To support a large memory model, the code must load a segment register before each memory operation. As a result, every memory access takes a minimum of three instructions: one to form the offset in a base register, one to load the segment register, and one to perform the access itself. Interestingly, a similar pattern occurs on RISC processors.
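The segmented addressing behind those three instructions can be sketched in C. On the 8086/8088, a 16-bit segment and a 16-bit offset combine into a 20-bit physical address (segment × 16 + offset), which wraps around at 1 MB. The names physical_address, far_ptr, and element_address are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* 8086/8088 real-mode address: segment * 16 + offset, truncated to the
   20 address lines, so addresses wrap around at 1 MB. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* A large-model "far" pointer is a segment:offset pair. To use it, the
   code puts the segment in es and the offset in a base register such as bx. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Address of element i of an array of 16-bit words: the offset advances by
   2*i (a shift left by 1) with 16-bit wraparound, and the segment is unchanged. */
uint32_t element_address(far_ptr base, uint16_t i)
{
    uint16_t offset = (uint16_t)(base.offset + (i << 1));
    return physical_address(base.segment, offset);
}
```

For example, segment 0x1234 with offset 0x0010 names physical address 0x12350, and element 3 of an array based at 0x2000:0x0100 lives at 0x20106.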
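Notice that at one point the code moves a value from the ax register to the bx register. It does so because bx can be used to address memory and ax cannot; ax is then used to load B(I) and accumulate the sum. Note that this is only an integer computation, as the Intel 8088 has no floating-point instructions of its own; floating-point arithmetic required the optional 8087 coprocessor or software emulation.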
        mov     word ptr -2[bp],0       # -2[bp] is I
$11:
        mov     ax,word ptr -2[bp]      # Load I
        cmp     ax,word ptr 18[bp]      # Check I>=N
        jge     $10
        shl     ax,1                    # Multiply I by 2
        mov     bx,ax                   # Done - now move to bx
        add     bx,word ptr 10[bp]      # bx = Address of B + Offset
        mov     es,word ptr 12[bp]      # Top part of address
        mov     ax,es: word ptr [bx]    # Load B(I)
        mov     bx,word ptr -2[bp]      # Load I
        shl     bx,1                    # Multiply I by 2
        add     bx,word ptr 14[bp]      # bx = Address of C + Offset
        mov     es,word ptr 16[bp]      # Top part of address
        add     ax,es: word ptr [bx]    # Add C(I)
        mov     bx,word ptr -2[bp]      # Load I
        shl     bx,1                    # Multiply I by 2
        add     bx,word ptr 6[bp]       # bx = Address of A + Offset
        mov     es,word ptr 8[bp]       # Top part of address
        mov     es: word ptr [bx],ax    # Store A(I)
$9:
        inc     word ptr -2[bp]         # Increment I in memory
        jmp     $11
$10:
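To make the control flow of the listing easier to follow, here is a hypothetical line-by-line transliteration into C, assuming arrays of 16-bit integers (the listing uses word loads and an integer add). The function name and the labels L11 and L10, which stand in for $11 and $10, are illustrative assumptions; each element address is rebuilt from i, just as the listing rebuilds it from the copy of I at -2[bp].

```c
#include <stdint.h>

/* Hypothetical C transliteration of the 8088 listing (16-bit integer arrays). */
void addem_8088_style(int16_t *a, const int16_t *b, const int16_t *c,
                      int16_t n)
{
    int16_t i = 0;        /* mov word ptr -2[bp],0 */
    int16_t ax;           /* plays the role of the ax register */
    const char *src;      /* plays the role of es:bx for the two loads */
    char *dst;            /* plays the role of es:bx for the store */

L11:
    if (i >= n)           /* cmp ax,word ptr 18[bp] ; exit to $10 */
        goto L10;
    src = (const char *)b + (i << 1);            /* shl by 1, add base of B */
    ax = *(const int16_t *)src;                  /* load B(I) */
    src = (const char *)c + (i << 1);            /* reload I, shl, add base of C */
    ax = (int16_t)(ax + *(const int16_t *)src);  /* add C(I) */
    dst = (char *)a + (i << 1);                  /* reload I, shl, add base of A */
    *(int16_t *)dst = ax;                        /* store A(I) */
    /* $9: not a jump target in this excerpt */
    i++;                  /* inc word ptr -2[bp] */
    goto L11;             /* jmp $11 */
L10:
    return;
}
```

The byte offset i << 1 corresponds to the shl instructions: each element is two bytes wide, so the index must be doubled before it is added to the array's base offset.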
Because there are so few registers, the variable I is kept in memory and loaded three times in each iteration of the loop. The inc instruction at the end of the loop updates the value directly in memory, and at the top of the loop the value is then immediately reloaded from memory.
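One way to see this effect with a modern C compiler is to force the loop index into memory. In this sketch, which is an illustration and not part of the original survey, declaring i volatile obliges the compiler to perform every read and write of i as an actual memory access, roughly mimicking the 8088 code that keeps I at -2[bp]. Without volatile, the compiler is free to keep i in a register for the entire loop. The function names are hypothetical.

```c
/* i is volatile, so every use reads it from memory and every update writes
   it back, much like the 8088 code that keeps I at -2[bp]. */
void addem_in_memory(int *a, const int *b, const int *c, int n)
{
    volatile int i;
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];     /* three separate reads of i in this statement */
}

/* Ordinary local: the compiler may keep i in a register throughout. */
void addem_in_register(int *a, const int *b, const int *c, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Both functions compute the same result; only the generated code differs. Compiling with a flag such as -S (supported by gcc and clang) and comparing the two functions shows the extra loads and stores of i in the volatile version.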