There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of values be stored and updated as the recurrence progresses [link], [link]. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually at most two) of complex multiplications. For example, instead of a twiddle table with n entries ω_n^ℓ, FFTW can use two tables with roughly √n entries each, so that ω_n^ℓ is computed by multiplying an entry in one table (indexed with the low-order bits of ℓ) by an entry in the other table (indexed with the high-order bits of ℓ).
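The two-table idea can be sketched as follows. This is only an illustration of the decomposition ω_n^ℓ = ω_n^(ℓ mod m) · ω_n^(m·⌊ℓ/m⌋), not FFTW's actual table layout; the splitting factor m here is simply a power of two near √n, and n is assumed to be a power of two.

```python
import cmath
import math

def split_twiddle_tables(n):
    """Build two tables of about sqrt(n) entries each whose pairwise
    products reproduce all n twiddle factors w_n^l = exp(-2j*pi*l/n).
    Illustrative sketch only; assumes n is a power of two."""
    m = 1 << ((n.bit_length() - 1) // 2)  # power of two near sqrt(n), divides n
    low = [cmath.exp(-2j * math.pi * l / n) for l in range(m)]            # w_n^l
    high = [cmath.exp(-2j * math.pi * (h * m) / n) for h in range(n // m)]  # w_n^(h*m)
    return low, high, m

def twiddle(l, low, high, m):
    """Recover w_n^l with one complex multiplication:
       w_n^l = w_n^(l mod m) * w_n^(m * (l div m))."""
    return low[l % m] * high[l // m]

n = 1024
low, high, m = split_twiddle_tables(n)
# Compare every reconstructed twiddle factor against a directly computed one.
worst = max(abs(twiddle(l, low, high, m) - cmath.exp(-2j * math.pi * l / n))
            for l in range(n))
print(len(low) + len(high), "table entries instead of", n)
print("worst-case reconstruction error:", worst)
```

For n = 1024 this stores 64 entries instead of 1024, and each on-the-fly entry costs one extra complex multiplication.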
There are a few non-Cooley-Tukey algorithms that are known to have worse error characteristics, such as the “real-factor” algorithm [link], [link], but these are rarely used in practice (and are not used at all in FFTW). On the other hand, some commonly used algorithms for type-I and type-IV discrete cosine transforms [link], [link], [link] have errors that we observed to grow as √n even for accurate trigonometric constants (although we are not aware of any theoretical error analysis of these algorithms), and thus we were forced to use alternative algorithms [link].
To measure the accuracy of FFTW, we compare against a slow FFT implemented in arbitrary-precision arithmetic, while to verify the correctness we have found the self-test algorithm of [link] very useful.
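The comparison idea can be sketched in a few lines of Python. Note the simplifications relative to the text: the reference transform below is a direct O(n²) DFT run in ordinary double precision, whereas the actual accuracy measurement described above runs its reference in arbitrary-precision arithmetic, and the textbook radix-2 FFT here merely stands in for the transform under test.

```python
import cmath
import math
import random

def slow_dft(x):
    """Direct O(n^2) DFT used as the reference transform.
    (The real accuracy check would run this in arbitrary precision.)"""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft(x):
    """Textbook recursive radix-2 Cooley-Tukey FFT (n a power of two),
    standing in for the transform whose accuracy we want to measure."""
    n = len(x)
    if n == 1:
        return x[:]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

random.seed(0)
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(256)]
ref = slow_dft(x)
y = fft(x)
# Relative RMS error of the fast transform against the slow reference.
err = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(ref, y)) /
                sum(abs(a) ** 2 for a in ref))
print("relative RMS error:", err)
```

In double precision the measured error for n = 256 is on the order of machine epsilon times a slowly growing factor, consistent with the logarithmic error growth discussed above.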
It is unlikely that many readers of this chapter will ever have to implement their own fast Fourier transform software, except as a learning exercise. The computation of the DFT, much like basic linear algebra or integration of ordinary differential equations, is so central to numerical computing and so well-established that robust, flexible, highly optimized libraries are widely available, for the most part as free/open-source software. And yet there are many other problems for which the algorithms are not so finalized, or for which algorithms are published but the implementations are unavailable or of poor quality. Whatever new problems one comes across, there is a good chance that the chasm between theory and efficient implementation will be just as large as it is for FFTs, unless computers become much simpler in the future. For readers who encounter such a problem, we hope that these lessons from FFTW will be useful:
We should also mention one final lesson that we haven't discussed in this chapter: you can't optimize in a vacuum, or you end up congratulating yourself for making a slow program slightly faster. We started the FFTW project after downloading a dozen FFT implementations, benchmarking them on a few machines, and noting how the winners varied between machines and between transform sizes. Throughout FFTW's development, we continued to benefit from repeated benchmarks against the dozens of high-quality FFT programs available online, without which we would have thought FFTW was “complete” long ago.
SGJ was supported in part by the Materials Research Science and Engineering Center program of the National Science Foundation under award DMR-9400334; MF was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract No. NBCH30390004. We are also grateful to Sidney Burrus for the opportunity to contribute this chapter, and for his continual encouragement, dating back to his first kind words in 1997 for the initial FFT efforts of two graduate students venturing outside their fields.