<< Chapter < Page | Chapter >> Page > |
These plans extract a vector loop to reduce a DFT problem to a problem of lower vector rank, which is then solved recursively. Anyof the vector loops of could be extracted in this way, leading to a number of possible plans corresponding to different looporderings.
Formally, to solve , where , FFTW generates a loop that, for all such that , invokes a plan for .
Indirect plans transform a DFT problem that requires some data shuffling (or discontiguous operation) into a problem that requires noshuffling plus a rank-0 problem that performs the shuffling.
Formally, to solve where , FFTW generates a plan that first solves , and then solves . Here we define to be the I/O tensor : that is, it replaces the input strides with the output strides. Thus, an indirect plan firstrearranges/copies the data to the output, then solves the problem in place.
As discussed in "Goals and Background of the FFTW Project" , it turns out to be surprisingly useful to be able to handle large prime (or large prime factors). Rader plans implement the algorithm from [link] to compute one-dimensional DFTs of prime size in time. Bluestein plans implement Bluestein's “chirp-z” algorithm, which can also handle prime in time [link] , [link] , [link] . Generic plans implement a naive algorithm (useful for ).
Although it may not be immediately apparent, the combination of the recursive rules in "The space of plans in FFTW" can produce a number of useful algorithms. To illustrate these compositions, we discuss three particular issues: depth- vs. breadth-first, loop reordering,and in-place transforms.
As discussed previously in sections "Review of the Cooley-Tukey FFT" and "Understanding FFTs with an ideal cache" , the same Cooley-Tukey decomposition can be executed in either traditionalbreadth-first order or in recursive depth-first order, where the latter has some theoretical cache advantages. FFTW is explicitlyrecursive, and thus it can naturally employ a depth-first order. Because its sub-problems contain a vector loop that can be executed ina variety of orders, however, FFTW can also employ breadth-first traversal. In particular, a 1d algorithm resembling thetraditional breadth-first Cooley-Tukey would result from applying "Cooley-Tukey plans" to completely factorize the problem size before applying the loop rule "Plans for higher vector ranks" to reduce the vector ranks, whereas depth-first traversal would result fromapplying the loop rule before factorizing each subtransform. These two possibilities are illustrated by an example in [link] .
Notification Switch
Would you like to follow the 'Fast fourier transforms' conversation and receive update notifications?