<< Chapter < Page | Chapter >> Page > |
With code like this, it’s common for every value of
K(I)
to be unique. This is called a
permutation . If you can tell a compiler that it is dealing with a permutation, the penalty is lessened in some cases. Even so, there is insult being added to injury. Indirect references require more memory activity than direct references, and this slows you down.
FORTRAN compilers depend on programmers to observe aliasing rules. That is, programmers are not supposed to modify locations through pointers that may be aliases of one another. They can become aliases in several ways, such as when two dummy arguments receive pointers to the same storage locations:
CALL BOB (A,A)
...END
SUBROUTINE BOB (X,Y) ← X,Y become aliases
C compilers don’t enjoy the same restrictions on aliasing. In fact, there are cases where aliasing could be desirable. Additionally, C is blessed with pointer types, increasing the opportunities for aliasing to occur. This means that a C compiler has to approach operations through pointers more conservatively than a FORTRAN compiler would. Let’s look at some examples to see why.
The following loop nest looks like a FORTRAN loop cast in C. The arrays are declared or allocated all at once at the top of the routine, and the starting address and leading dimensions are visible to the compiler. This is important because it means that the storage relationship between the array elements is well known. Hence, you could expect good performance:
#define N ...
double *a[N][N], c[N][N], d;for (i=0; i<N; i++)
for (j=0; j<N; j++)
a[i][j] = a[i][j] + c[j][i] * d;
Now imagine what happens if you allocate the rows dynamically. This makes the address calculations more complicated. The loop nest hasn’t changed; however, there is no guaranteed stride that can get you from one row to the next. This is because the storage relationship between the rows is unknown:
#define N ...
double *a[N], *c[N], d;for (i=0; i<N; i++) {
a[i]= (double *) malloc (N*sizeof(double));
c[i]= (double *) malloc (N*sizeof(double));
}for (i=0; i<N; i++)
for (j=0; j<N; j++)
a[i][j] = a[i][j] + c[j][i] * d;
In fact, your compiler knows even less than you might expect about the storage relationship. For instance, how can it be sure that references to a and c aren’t aliases? It may be obvious to you that they’re not. You might point out that malloc never overlaps storage. But the compiler isn’t free to assume that. Who knows? You may be substituting your own version of malloc !
Let’s look at a different example, where storage is allocated all at once, though the declarations are not visible to all routines that are using it. The following subroutine bob performs the same computation as our previous example. However, because the compiler can’t see the declarations for a and c (they’re in the main routine), it doesn’t have enough information to be able to overlap memory references from successive iterations; the references could be aliases:
#define N...
main(){
double a[N][N], c[N][N], d;...
bob (a,c,d,N);}
bob (double *a,double *c,double d,int n){
int i,j;double *ap, *cp;
for (i=0;i<n;i++) {
ap = a + (i*n);cp = c + i;
for (j=0; j<n; j++)
*(ap+j) = *(ap+j) + *(cp+(j*n)) * d;}
}
To get the best performance, make available to the compiler as many details about the size and shape of your data structures as possible. Pointers, whether in the form of formal arguments to a subroutine or explicitly declared, can hide important facts about how you are using memory. The more information the compiler has, the more it can overlap memory references. This information can come from compiler directives or from making declarations visible in the routines where performance is most critical.
Notification Switch
Would you like to follow the 'High performance computing' conversation and receive update notifications?