
In this type of architecture, the limited set of available registers puts such a strain on the compiler's flexibility that often little optimization is practical.

Motorola MC68020

In this section, we examine another classic CISC processor, the Motorola MC68020, which was used to build Macintosh computers and Sun workstations. We happened to run this code on a BBN GP-1000 Butterfly parallel processing system made up of 96 MC68020 processors.

The Motorola architecture is relatively easy to program in assembly language. It has plenty of 32-bit registers, and they are straightforward to use. Its CISC instruction set also keeps assembly language programming quite simple, since many instructions perform several operations at once.

We use this example to show a progression of optimization levels, using an f77 compiler on a floating-point version of the loop. Our first example is with no optimization:


! Note d0 contains the value I
L5:     movl    d0,L13              ! Store I to memory if loop ends
        lea     a1@(-4),a0          ! a1 = address of B
        fmoves  a0@(0,d0:l:4),fp0   ! Load of B(I)
        lea     a3@(-4),a0          ! a3 = address of C
        fadds   a0@(0,d0:l:4),fp0   ! Load of C(I) (And Add)
        lea     a2@(-4),a0          ! a2 = address of A
        fmoves  fp0,a0@(0,d0:l:4)   ! Store of A(I)
        addql   #1,d0               ! Increment I
        subql   #1,d1               ! Decrement "N"
        tstl    d1
        bnes    L5

The value for I is stored in the d0 register and is incremented by 1 on each pass through the loop. At the same time, register d1 is initialized to the value of N and decremented on each pass; when it reaches zero, the tstl/bnes pair no longer branches back to L5 and the loop ends. On every pass, I is also stored into memory, so the proper value for I ends up in memory when the loop terminates. Registers a1, a2, and a3 are preloaded with the starting addresses of the arrays B, A, and C, respectively. However, since FORTRAN arrays begin at 1, we must subtract 4 (the size of one 32-bit element) from each of these addresses before we can use I as the offset. The lea instructions effectively subtract 4 from one address register and store the result in another.

The following instruction performs an address computation that is almost a one-to-one translation of an array reference:


fmoves a0@(0,d0:l:4),fp0 ! Load of B(I)

This instruction retrieves a floating-point value from memory. The address is computed by first multiplying d0 by 4 (because these are 32-bit floating-point numbers) and adding that value to a0. In fact, the lea and fmoves instructions could have been combined as follows:


fmoves a1@(-4,d0:l:4),fp0 ! Load of B(I)

To compute its memory address, this instruction multiplies d0 by 4, adds the contents of a1, and then subtracts 4. The resulting address is used to load 4 bytes into floating-point register fp0. This is almost a literal translation of fetching B(I). You can see how closely the assembly language is set up to track high-level constructs.

It is almost as if the compiler were “trying” to show off and make use of the nifty assembly language instructions.

Source: OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010. Download for free at http://cnx.org/content/col11136/1.5