<< Chapter < Page | Chapter >> Page > |
where and is a primitive root of unity. Implemented directly, [link] would require operations; fast Fourier transforms are algorithms to compute the same result. The most important FFT (and the one primarily used in FFTW) is known as the“Cooley-Tukey” algorithm, after the two authors who rediscovered and popularized it in 1965 [link] , although it had been previously known as early as 1805 by Gauss as well as by laterre-inventors [link] . The basic idea behind this FFT is that a DFT of a composite size can be re-expressed in terms of smaller DFTs of sizes and —essentially, as a two-dimensional DFT of size where the output is transposed . The choices of factorizations of , combined with the many different ways to implement the data re-orderings of thetranspositions, have led to numerous implementation strategies for the Cooley-Tukey FFT, with many variants distinguished by their ownnames [link] , [link] . FFTW implements a space of many such variants, as described in "Adaptive Composition of FFT Algorithms" , but here we derive the basic algorithm, identify its key features, and outline some important historical variations and their relation to FFTW.
The Cooley-Tukey algorithm can be derived as follows. If can be factored into , [link] can be rewritten by letting and . We then have:
where . Thus, the algorithm computes DFTs of size (the inner sum), multiplies the result by the so-called [link] twiddle factors , and finally computes DFTs of size (the outer sum). This decomposition is then continued recursively. The literature uses the term radix to describe an or that is bounded (often constant); the small DFT of the radix is traditionally called a butterfly .
Many well-known variations are distinguished by the radix alone. A decimation in time ( DIT ) algorithm uses as the radix, while a decimation in frequency ( DIF ) algorithm uses as the radix. If multiple radices are used, e.g. for composite but not a prime power, the algorithm is called mixed radix . A peculiar blending of radix 2 and 4 is called split radix , which was proposed to minimize the count of arithmeticoperations [link] , [link] , [link] , [link] , [link] although it has been superseded in this regard [link] , [link] . FFTW implements both DIT and DIF, is mixed-radix with radices that are adapted to the hardware, and often uses much larger radices (e.g. radix 32) than wereonce common. On the other end of the scale, a “radix” of roughly has been called a four-step FFT algorithm (or six-step , depending on how many transposes one performs) [link] ; see "FFTs and the Memory Hierarchy" for some theoretical and practical discussion of this algorithm.
A key difficulty in implementing the Cooley-Tukey FFT is that the dimension corresponds to discontiguous inputs in but contiguous outputs in , and vice-versa for . This is a matrix transpose for a single decomposition stage, and the compositionof all such transpositions is a (mixed-base) digit-reversal permutation (or bit-reversal , for radix 2). The resulting necessity of discontiguous memory access and data re-ordering hindersefficient use of hierarchical memory architectures (e.g., caches), so that the optimal execution order of an FFT for given hardware isnon-obvious, and various approaches have been proposed.
Notification Switch
Would you like to follow the 'Fast fourier transforms' conversation and receive update notifications?