Even given the finite return on effort suggested by Amdahl’s Law, tuning a program with a sharp profile can be rewarding. Programs with flat profiles are much more difficult to tune. These are often system codes, nonnumeric applications, and varieties of numerical codes without matrix solutions. It takes a global tuning approach to reduce, to any justifiable degree, the runtime of a program with a flat profile. For instance, you can sometimes optimize instruction cache usage, which is complicated because of the program’s equal distribution of activity among a large number of routines. It can also help to reduce subroutine call overhead by folding callees into callers. Occasionally, you can find a memory reference problem that is endemic to the whole program — and one that can be fixed all at once.
When you look at a profile, you might find an unusually large percentage of time spent in library routines such as log, exp, or sin. Often these functions are done in software routines rather than inline. You may be able to rewrite your code to eliminate some of these operations. Another important pattern to look for is when a routine takes far longer than you expect. Unexpected execution time may indicate you are accessing memory in a pattern that is bad for performance or that some aspect of the code cannot be optimized properly.
In any case, to get a profile, you need a profiler. One or two subroutine profilers come standard with the software development environments on all UNIX machines. We discuss two of them: prof and gprof. In addition, we mention a few line-by-line profilers. Subroutine profilers can give you a general overall view of where time is being spent. You probably should start with prof, if you have it (most machines do). Otherwise, use gprof. After that, you can move to a line-by-line profiler if you need to know which statements take the most time.
prof is the most common of the UNIX profiling tools. In a sense, it is an extension of the compiler, linker, and object libraries, plus a few extra utilities, so it is hard to look at any one thing and say “this profiles your code.”
prof works by periodically sampling the program counter as your application runs. To enable profiling, you must recompile and relink using the -p flag. For example, if your program has two modules, stuff.c and junk.c, you need to compile and link according to the following code:
% cc stuff.c -p -O -c
% cc junk.c -p -O -c
% cc stuff.o junk.o -p -o stuff
This creates a stuff binary that is ready for profiling. You don’t need to do anything special to run it. Just treat it normally by entering stuff. Because runtime statistics are being gathered, it takes a little longer than usual to execute.
Remember: code with profiling enabled takes longer to run. You should recompile and relink the whole thing without the -p flag when you have finished profiling.
At completion, there is a new file called mon.out in the directory where you ran it. This file contains the history of stuff in binary form, so you can’t look at it directly. Use the prof utility to read mon.out and create a profile of stuff. By default, the information is written to your screen on standard output, though you can easily redirect it to a file.