In this type of architecture, the available registers put such a strain on the flexibility of the compiler that there is often not much optimization that is practical.
In this section, we examine another classic CISC processor, the Motorola MC68020, which was used to build Macintosh computers and Sun workstations. We happened to run this code on a BBN GP-1000 Butterfly parallel processing system made up of 96 MC68020 processors.
The Motorola architecture is relatively easy to program in assembly language. It has plenty of 32-bit registers, and they are relatively easy to use. It has a CISC instruction set that keeps assembly language programming quite simple: a single instruction can often perform several operations.
We use this example to show a progression of optimization levels, using an f77 compiler on a floating-point version of the loop. Our first example is with no optimization:
! Note d0 contains the value I
L5:
    movl d0,L13 ! Store I to memory if loop ends
    lea a1@(-4),a0 ! a1 = address of B
    fmoves a0@(0,d0:l:4),fp0 ! Load of B(I)
    lea a3@(-4),a0 ! a3 = address of C
    fadds a0@(0,d0:l:4),fp0 ! Load of C(I) (And Add)
    lea a2@(-4),a0 ! a2 = address of A
    fmoves fp0,a0@(0,d0:l:4) ! Store of A(I)
    addql #1,d0 ! Increment I
    subql #1,d1 ! Decrement "N"
    tstl d1
    bnes L5
The value for I is stored in the d0 register. Each time through the loop, it’s incremented by 1. At the same time, register d1 is initialized to the value for N and decremented each time through the loop. Each time through the loop, I is stored into memory, so the proper value for I ends up in memory when the loop terminates. Registers a1, a2, and a3 are preloaded to be the first address of the arrays B, A, and C respectively. However, since FORTRAN arrays begin at 1, we must subtract 4 from each of these addresses before we can use I as the offset. The lea instructions are effectively subtracting 4 from one address register and storing it in another.
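The register usage described above can be traced with a small simulation. This is only a sketch in Python, not compiler output: the function name run_unoptimized_loop and the variable memory_i (standing in for the memory cell labeled L13) are hypothetical, Python floats are double precision rather than the 32-bit values the MC68020 code uses, and the loop body is inferred from the listing to be A(I) = B(I) + C(I).

```python
def run_unoptimized_loop(b, c, n):
    """Mimic the unoptimized loop: A(I) = B(I) + C(I) for I = 1..N.

    Assumes n >= 1, because the listing tests the counter at the
    bottom of the loop.
    """
    a = [0.0] * n
    memory_i = None  # hypothetical stand-in for the memory cell L13
    d0 = 1           # register d0: loop index I (1-based, as in FORTRAN)
    d1 = n           # register d1: iterations remaining
    while True:
        memory_i = d0          # movl d0,L13   (store I on every pass)
        fp0 = b[d0 - 1]        # lea + fmoves  (load B(I))
        fp0 = fp0 + c[d0 - 1]  # lea + fadds   (add C(I))
        a[d0 - 1] = fp0        # lea + fmoves  (store A(I))
        d0 += 1                # addql #1,d0
        d1 -= 1                # subql #1,d1
        if d1 == 0:            # tstl d1 / bnes L5
            break
    return a, memory_i


a, last_stored_i = run_unoptimized_loop([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], 3)
print(a, last_stored_i)
```

The d0 - 1 in each list index plays the same role as the 4 bytes subtracted by the lea instructions: it converts a 1-based FORTRAN index into a 0-based offset. The sketch tracks only the store of I inside the loop body, since any code that follows the loop is not part of the listing.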
The following instruction performs an address computation that is almost a one-to-one translation of an array reference:
fmoves a0@(0,d0:l:4),fp0 ! Load of B(I)
This instruction retrieves a floating-point value from the memory. The address is computed by first multiplying d0 by 4 (because these are 32-bit floating-point numbers) and adding that value to a0. As a matter of fact, the lea and fmoves instructions could have been combined as follows:
fmoves a1@(-4,d0:l:4),fp0 ! Load of B(I)
To compute its memory address, this instruction multiplies d0 by 4, adds the contents of a1, and then subtracts 4. The resulting address is used to load 4 bytes into floating-point register fp0. This is almost a literal translation of fetching B(I). You can see how the assembly is set up to track high-level constructs.
It is almost as if the compiler were “trying” to show off and make use of the nifty assembly language instructions.
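The address arithmetic behind the two forms of the load can be checked with a short sketch. The Python below is illustrative only: the helper names (effective_address, two_step_address, combined_address) and the base address 0x1000 are assumptions made for the example, not anything taken from the compiler or the hardware documentation.

```python
ELEMENT_SIZE = 4  # bytes in one 32-bit floating-point value


def effective_address(base, displacement, index, scale):
    """Address for an operand written as base@(displacement,index:l:scale)."""
    return base + displacement + index * scale


def two_step_address(array_base, i):
    """The lea followed by the indexed access, as in the unoptimized code."""
    a0 = array_base - ELEMENT_SIZE                    # lea a1@(-4),a0
    return effective_address(a0, 0, i, ELEMENT_SIZE)  # a0@(0,d0:l:4)


def combined_address(array_base, i):
    """The single-instruction form: a1@(-4,d0:l:4)."""
    return effective_address(array_base, -4, i, ELEMENT_SIZE)


b_base = 0x1000  # hypothetical address of B(1)
for i in (1, 2, 3):
    print(i, hex(two_step_address(b_base, i)), hex(combined_address(b_base, i)))
```

Both forms yield the same address for every I, and I = 1 lands exactly on the array's first element, which is why the -4 displacement is needed for 1-based FORTRAN indexing.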