<< Chapter < Page | Chapter >> Page > |
% prof stuff>stuff.prof
To explore how the prof command works, we have created the following ridiculous little application, loops.c . It contains a main routine and three subroutines for which you can predict the time distribution just by looking at the code.
main () {
int l;for (l=0;l<1000;l++) {
if (l == 2*(l/2)) foo ();bar();
baz();}
}foo (){
int j;for (j=0;j<200;j++)
}bar () {
int i;for (i=0;i<200;i++);
}baz () {
int k;for (k=0;k<300;k++);
}
Again, you need to compile and link
loops with the
–p
flag, run the program, and then run the
prof utility to extract a profile, as follows:
% cc loops.c -p -o loops
% ./loops% prof loops>loops.prof
The following example shows what a loops.prof should look like. There are six columns.
%Time Seconds Cumsecs #Calls msec/call Name
56.8 0.50 0.50 1000 0.500 _baz27.3 0.24 0.74 1000 0.240 _bar
15.9 0.14 0.88 500 0.28 _foo0.0 0.00 0.88 1 0. _creat
0.0 0.00 0.88 2 0. _profil0.0 0.00 0.88 1 0. _main
0.0 0.00 0.88 3 0. _getenv0.0 0.00 0.88 1 0. _strcpy
0.0 0.00 0.88 1 0. _write
The columns can be described as follows:
%Time
Percentage of CPU time consumed by this routineSeconds
CPU time consumed by this routineCumsecs
A running total of time consumed by this and all preceding routines in the listCalls
The number of times this particular routine was calledmsec/call
Seconds divided by number of calls giving the average length of time taken by each invocation of the routineName
The name of this routineThe top three routines listed are from
loops.c itself. You can see an entry for the “main” routine more than halfway down the list. Depending on the vendor, the names of the routines may contain leading or trailing underscores, and there will always be some routines listed you don’t recognize. These are contributions from the C library and possibly the FORTRAN libraries, if you are using FORTRAN. Profiling also introduces some overhead into the run, and often shows up as one or two subroutines in the
prof output. In this case, the entry for
_profil
represents code inserted by the linker for collecting runtime profiling data.
If it was our intention to tune loops , we would consider a profile like the one in the figure above to be a fairly good sign. The lead routine takes 50% of the runtime, so at least there is a chance we could do something with it that would have a significant impact on the overall runtime. (Of course with a program as trivial as loops , there is plenty we can do, since loops does nothing.)
Just as it’s important to know how time is distributed when your program runs, it’s also valuable to be able to tell who called who in the list of routines. Imagine, for instance, if something labeled
_exp
showed up high in the list in the prof output. You might say: “Hmmm, I don’t remember calling anything named
exp()
. I wonder where that came from.” A call tree helps you find it.
Notification Switch
Would you like to follow the 'High performance computing' conversation and receive update notifications?