
Even for one-dimensional DFTs, there is a common misperception that one should always choose power-of-two sizes if one cares about efficiency. Thanks to FFTW's code generator (described in "Generating Small FFT Kernels"), we could afford to devote equal optimization effort to any n with small factors (2, 3, 5, and 7 are good), instead of mostly optimizing powers of two like many high-performance FFTs. As a result, to pick a typical example on the 3 GHz Core Duo processor of [link], n = 3600 = 2^4 · 3^2 · 5^2 and n = 3840 = 2^8 · 3 · 5 both execute faster than n = 4096 = 2^12. (And if there are factors one particularly cares about, one can generate code for them too.)
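To make the "small factors" criterion concrete, the following minimal sketch factors a transform size by trial division and checks whether every prime factor lies in {2, 3, 5, 7}. The function names prime_factors and has_only_small_factors are illustrative and are not part of FFTW; the check says nothing about actual run time, which has to be measured on the target machine.

```python
def prime_factors(n):
    """Return the prime factorization of n as {prime: exponent} (trial division)."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        # Whatever remains after dividing out all smaller factors is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


def has_only_small_factors(n, small=(2, 3, 5, 7)):
    """True if every prime factor of n is in `small` (an illustrative criterion)."""
    return all(p in small for p in prime_factors(n))


if __name__ == "__main__":
    # The three sizes from the text, plus 4099, a prime just above 4096.
    for n in (3600, 3840, 4096, 4099):
        print(n, prime_factors(n), has_only_small_factors(n))
```

The three sizes quoted above all pass this check, while a nearby prime such as 4099 does not.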

One initially missing feature was efficient support for large prime sizes; the conventional wisdom was that large-prime algorithms were mainly of academic interest, since in real applications (including ours) one has enough freedom to choose a highly composite transform size. However, the prime-size algorithms are fascinating, so we implemented Rader's O(n log n) prime-n algorithm [link] purely for fun, including it in FFTW 2.0 (released in 1998) as a bonus feature. The response was astonishingly positive: even though users are (probably) never forced by their application to compute a prime-size DFT, it is rather inconvenient to always worry that collecting an unlucky number of data points will slow down one's analysis by a factor of a million. The prime-size algorithms are certainly slower than algorithms for nearby composite sizes, but in interactive data-analysis situations the difference between 1 ms and 10 ms means little, while educating users to avoid large prime factors is hard.
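For readers unfamiliar with Rader's algorithm, the sketch below illustrates only its central re-indexing idea: for an odd prime length p, the nonzero input and output indices are permuted by powers of a primitive root g modulo p, which turns the DFT into a cyclic convolution of length p − 1. This is not FFTW's implementation. The names primitive_root, rader_dft, and naive_dft are hypothetical, and the convolution is evaluated directly in O(p^2) operations here for clarity; the O(n log n) bound depends on computing that convolution with FFTs of a composite length.

```python
import cmath


def primitive_root(p):
    """Brute-force search for a primitive root modulo an odd prime p."""
    for g in range(2, p):
        seen, v = set(), 1
        for _ in range(p - 1):
            v = v * g % p
            seen.add(v)
        if len(seen) == p - 1:  # powers of g hit every nonzero residue
            return g
    raise ValueError("p must be an odd prime")


def naive_dft(x):
    """Reference O(n^2) DFT, used only to check the result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def rader_dft(x):
    """DFT of odd prime length via Rader's re-indexing (direct convolution)."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, p - 2, p)  # inverse of g mod p (Fermat's little theorem)
    w = cmath.exp(-2j * cmath.pi / p)
    a = [x[pow(g, q, p)] for q in range(p - 1)]      # inputs permuted as g^q
    b = [w ** pow(g_inv, m, p) for m in range(p - 1)]  # twiddles w^(g^-m)
    X = [0] * p
    X[0] = sum(x)
    for r in range(p - 1):
        # Cyclic convolution of a and b, evaluated directly for clarity.
        conv = sum(a[q] * b[(r - q) % (p - 1)] for q in range(p - 1))
        X[pow(g_inv, r, p)] = x[0] + conv  # outputs permuted as g^-r
    return X


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7]
    err = max(abs(u - v) for u, v in zip(rader_dft(x), naive_dft(x)))
    print("max deviation from naive DFT:", err)
```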

Another form of flexibility that deserves comment has to do with a purely technical aspect of computer software. FFTW's implementation involves some unusual language choices internally (the FFT-kernel generator, described in "Generating Small FFT Kernels", is written in Objective Caml, a functional language especially suited for compiler-like programs), but its user-callable interface is purely in C with lowest-common-denominator datatypes (arrays of floating-point values). The advantage of this is that FFTW can be (and has been) called from almost any other programming language, from Java to Perl to Fortran 77. Similar lowest-common-denominator interfaces are apparent in many other popular numerical libraries, such as LAPACK [link]. Language preferences arouse strong feelings, but this technical constraint means that modern programming dialects are best hidden from view for a numerical library.

Ultimately, very few scientific-computing applications should have performance as their top priority. Flexibility is often far more important, because one wants to be limited only by one's imagination, rather than by one's software, in the kinds of problems that can be studied.

FFTs and the memory hierarchy

There are many complexities of computer architectures that impact the optimization of FFT implementations, but one of the most pervasive is the memory hierarchy. On any modern general-purpose computer, memory is arranged into a hierarchy of storage devices with increasing size and decreasing speed: the fastest and smallest memory being the CPU registers, then two or three levels of cache, then the main-memory RAM, then external storage such as hard disks. (A hard disk is utilized by “out-of-core” FFT algorithms for very large n [link], but these algorithms appear to have been largely superseded in practice by both the gigabytes of memory now common on personal computers and, for extremely large n, by algorithms for distributed-memory parallel computers.) Most of these levels are managed automatically by the hardware to hold the most-recently-used data from the next level in the hierarchy. (This includes the registers: on current “x86” processors, the user-visible instruction set, with a small number of floating-point registers, is internally translated at runtime to RISC-like “μ-ops” with a much larger number of physical rename registers that are allocated automatically.) There are many complications, however, such as limited cache associativity (which means that certain locations in memory cannot be cached simultaneously) and cache lines (which optimize the cache for contiguous memory access), which are reviewed in numerous textbooks on computer architectures. In this section, we focus on the simplest abstract principles of memory hierarchies in order to grasp their fundamental impact on FFTs.
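The two complications named above, cache lines and limited associativity, can be illustrated with a toy model. The sketch below simulates a direct-mapped cache (the most restrictive form of associativity) and counts misses for two access patterns. The cache geometry (64 lines of 8 words each) and the function name count_misses are assumptions chosen for illustration and do not describe any particular processor.

```python
def count_misses(addresses, num_lines=64, line_size=8):
    """Count misses in a toy direct-mapped cache.

    Each memory block of `line_size` consecutive words can live in exactly
    one cache slot, chosen as (block number) mod `num_lines`.
    """
    cache = [None] * num_lines  # block number currently held by each slot
    misses = 0
    for addr in addresses:
        block = addr // line_size
        slot = block % num_lines
        if cache[slot] != block:
            misses += 1
            cache[slot] = block  # evict whatever was there before
    return misses


if __name__ == "__main__":
    # Two passes over 512 contiguous words: one miss per line on the first
    # pass, then everything is served from the cache.
    contiguous = list(range(512)) * 2

    # Two passes over only 8 words, spaced 512 words apart: every address
    # maps to the same slot, so each access evicts the previous one.
    strided = [i * 512 for i in range(8)] * 2

    print("contiguous:", count_misses(contiguous), "misses in", len(contiguous), "accesses")
    print("strided:   ", count_misses(strided), "misses in", len(strided), "accesses")
```

In this model the contiguous pattern misses only once per cache line, whereas the power-of-two stride misses on every access even though it touches just eight distinct words. Real caches are usually set-associative, which softens but does not eliminate this effect.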

Source:  OpenStax, Fast fourier transforms. OpenStax CNX. Nov 18, 2012 Download for free at http://cnx.org/content/col10550/1.22