
Let’s look at an illustration. Below is a short C program that performs a bubble sort of 10 integers:


int n[] = {23,12,43,2,98,78,2,51,77,8};
main ()
{
    int i, j, ktemp;
    for (i=10; i>0; i--) {
        for (j=0; j<i; j++) {
            if (n[j] < n[j+1]) {
                ktemp = n[j+1], n[j+1] = n[j], n[j] = ktemp;
            }
        }
    }
}

tcov produces a basic block profile that contains execution counts for each source line, plus some summary statistics (not shown):


          int n[] = {23,12,43,2,98,78,2,51,77,8};
          main ()
     1 -> {
              int i, j, ktemp;
    10 ->     for (i=10; i>0; i--) {
10, 55 ->         for (j=0; j<i; j++) {
    55 ->             if (n[j] < n[j+1]) {
    23 ->                 ktemp = n[j+1], n[j+1] = n[j], n[j] = ktemp;
                      }
                  }
              }
     1 -> }

The numbers to the left tell you the number of times each block was entered. For instance, you can see that the routine was entered just once, and that the highest count occurs at the test n[j]<n[j+1]. tcov shows more than one count on a line in places where the compiler has created more than one block.

Pixie

pixie is a little different from tcov. Rather than reporting the number of times each source line was executed, pixie reports the number of machine clock cycles devoted to executing each line. In theory, you could use this to calculate the amount of time spent per statement, although anomalies like cache misses are not represented.

pixie works by “pixifying” an executable file that has been compiled and linked in the normal way. Below we run pixie on foo to create a new executable called foo.pixie:


% cc foo.c -o foo
% pixie foo
% foo.pixie
% prof -pixie foo

Also created was a file named foo.Addrs, which contains addresses for the basic blocks within foo. When the new program, foo.pixie, is run, it creates a file called foo.Counts, containing execution counts for the basic blocks whose addresses are stored in foo.Addrs. pixie data accumulates from run to run. The statistics are retrieved using prof and a special -pixie flag.

pixie’s default output comes in three sections, which show:

  • Cycles per routine
  • Procedure invocation counts
  • Cycles per basic line

Below, we have listed the output of the third section for the bubble sort:


procedure (file)        line  bytes  cycles      %   cum %
main (foo.c)               7     44     605  12.11   12.11
_cleanup (flsbuf.c)       59     20     500  10.01   22.13
fclose (flsbuf.c)         81     20     500  10.01   32.14
fclose (flsbuf.c)         94     20     500  10.01   42.15
_cleanup (flsbuf.c)       54     20     500  10.01   52.16
fclose (flsbuf.c)         76     16     400   8.01   60.17
main (foo.c)              10     24     298   5.97   66.14
main (foo.c)               8     36     207   4.14   70.28
....                      ..     ..      ..    ...     ...

Here you can see three entries for the main routine from foo.c, plus a number of system library routines. The entries show the associated line number and the number of machine cycles dedicated to executing that line as the program ran. For instance, line 7 of foo.c took 605 cycles (12% of the runtime).

Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5