<< Chapter < Page | Chapter >> Page > |
Roughly speaking, to solve a general DFT problem, one must perform three tasks. First, one must reduce a problem of arbitrary vectorrank to a set of loops nested around a problem of vector rank 0, i.e., a single (possibly multi-dimensional) DFT. Second, one must reducethe multi-dimensional DFT to a sequence of of rank-1 problems, i.e., one-dimensional DFTs; for simplicity, however, we do notconsider multi-dimensional DFTs below. Third, one must solve the rank-1, vector rank-0 problem by means of some DFT algorithm suchas Cooley-Tukey. These three steps need not be executed in the stated order, however, and in fact, almost every permutation and interleavingof these three steps leads to a correct DFT plan. The choice of the set of plans explored by the planner is critical for the usability ofthe FFTW system: the set must be large enough to contain the fastest possible plans, but it must be small enough to keep theplanning time acceptable.
The rank-0 problem denotes a permutation of the input array into the output array. FFTWdoes not solve arbitrary rank-0 problems, only the following two special cases that arise in practice.
memcpy
, which is
specialized to copy contiguous regions of memory.Rank-1 DFT problems denote ordinary one-dimensional Fourier transforms. FFTW deals with most rank-1 problems as follows.
When the DFT rank-1 problem is “small enough” (usually, ), FFTW produces a direct plan that solves the problem directly. These plans operate by calling a fragment of C code(a codelet ) specialized to solve problems of one particular size, whose generation is described in "Generating Small FFT Kernels" . More precisely, the codelets compute a loop ( ) of small DFTs.
For problems of the form where , FFTW generates a plan that implements a radix- Cooley-Tukey algorithm "Review of the Cooley-Tukey FFT" . Both decimation-in-time and decimation-in-frequency plans are supported, with both small fixed radices (usually, ) produced by the codelet generator "Generating Small FFT Kernels" and also arbitrary radices (e.g. radix- ).
The most common case is a decimation in time ( DIT ) plan, corresponding to a radix (and thus ) in the notation of "Review of the Cooley-Tukey FFT" : it first solves , then multiplies the output array by the twiddle factors, and finally solves . For performance, the last two steps are not planned independently, but arefused together in a single “twiddle” codelet—a fragment of C code that multiplies its input by the twiddle factors and performs a DFTof size , operating in-place on .
Notification Switch
Would you like to follow the 'Fast fourier transforms' conversation and receive update notifications?