The x86 benchmarks were performed with BenchFFT, a collection of FFT libraries and benchmarking software assembled by Frigo and Johnson, the authors
of FFTW [1]. The benchmarks in BenchFFT use timing and calibration code from lmbench, a performance analysis tool written by
Larry McVoy and Carl Staelin [2].
Timing
BenchFFT measures the initialization time and runtime of an FFT
separately. The initialization time is measured only once, and thus outliers
due to effects from external factors such as OS scheduling are
occasionally observed. Routines from lmbench are then used to calibrate
the minimum number of FFT iterations required for accurate measurement
using the gettimeofday function. Finally, the time taken to run the
minimum number of iterations is measured eight times. The smallest of these
eight times, divided by the number of iterations, is taken as the runtime of
a single transform, which reduces the influence of external factors.
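The timing procedure described above can be sketched as follows. This is only an illustration and does not reproduce the lmbench calibration routines: the doubling strategy, the `min_seconds` threshold and the function names are assumptions, and `time.perf_counter` stands in for gettimeofday.

```python
import time


def calibrate_iterations(fn, min_seconds=0.01):
    """Double the iteration count until one timed batch lasts min_seconds.

    Hypothetical stand-in for the lmbench calibration step.
    """
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            fn()
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return iterations
        iterations *= 2


def time_per_call(fn, repeats=8, min_seconds=0.01):
    """Time `repeats` batches and return the fastest batch time per call."""
    iterations = calibrate_iterations(fn, min_seconds)
    batch_times = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            fn()
        batch_times.append(time.perf_counter() - start)
    # The minimum is used because interference from the OS can only
    # make a batch slower, never faster.
    return min(batch_times) / iterations
```

Taking the minimum rather than the mean reflects the reasoning in the text: scheduling and other external effects add time to a measurement, so the fastest batch is the one least affected by them.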
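The minimum time for a transform is then used to determine a scaled inverse time measurement, sometimes known as CTGs. CTGs are defined as:
$$\mathrm{CTGs} = \frac{5 N \log_2 N}{t}$$
for complex transforms and
$$\mathrm{CTGs} = \frac{2.5 N \log_2 N}{t}$$
for real transforms, where $N$ is the size of the transform and $t$ is the time taken by one transform in nanoseconds.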
When a transform has several variants (such as direction or radix), BenchFFT reports the speed of the fastest variant.
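As a rough illustration of the scaled inverse time measurement, the sketch below assumes the conventional BenchFFT scaling: 5 N log2(N) operations for a complex transform of size N and 2.5 N log2(N) for a real transform, divided by the time in nanoseconds. The function names and the variant labels are hypothetical.

```python
import math


def ctgs(n, seconds, real=False):
    """Scaled inverse time for one transform of size n.

    Assumes 5*n*log2(n) for complex data and 2.5*n*log2(n) for real
    data, with the time converted to nanoseconds.
    """
    nanoseconds = seconds * 1e9
    scale = 2.5 if real else 5.0
    return scale * n * math.log2(n) / nanoseconds


def best_variant_ctgs(n, variant_seconds, real=False):
    """Report the speed of the fastest variant (smallest time)."""
    return ctgs(n, min(variant_seconds.values()), real)


# Hypothetical timings for two variants of a 1024-point complex FFT.
timings = {"forward": 1.2e-6, "backward": 1.0e-6}
speed = best_variant_ctgs(1024, timings)
```

Because the operation count is a fixed function of N rather than the number of operations actually executed, the result is a normalised speed and not a true floating-point operation rate.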
Accuracy
To measure the accuracy of a transform, BenchFFT compares an FFT with an
arbitrary-precision FFT computed on the same inputs, and reports the
relative RMS error. The inputs are pseudo-random in the range $[-0.5, 0.5)$.
When a transform has several variants (such as direction or radix), BenchFFT reports the accuracy of the worst variant.
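The accuracy measurement can be sketched as below. This is a simplified illustration under stated assumptions: a double-precision naive DFT stands in for the arbitrary-precision reference, a textbook recursive radix-2 FFT stands in for the library under test, and the input range of -0.5 to 0.5 is assumed. None of the function names come from BenchFFT.

```python
import cmath
import math
import random


def naive_dft(x):
    """O(N^2) DFT used here as the reference (double precision only)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def relative_rms_error(computed, reference):
    """RMS of the error divided by the RMS of the reference output."""
    num = sum(abs(c - r) ** 2 for c, r in zip(computed, reference))
    den = sum(abs(r) ** 2 for r in reference)
    return math.sqrt(num / den)


rng = random.Random(0)  # fixed seed so the inputs are reproducible
# Assumed input range: real and imaginary parts between -0.5 and 0.5.
x = [complex(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(64)]
reference = naive_dft(x)
error = relative_rms_error(fft_radix2(x), reference)
```

When several variants are measured, taking `max` over their individual errors gives the worst-case figure that is reported.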
Compiling
Except where otherwise noted, ICC version 12.1.0 for OS X was used to compile 64-bit code. For OS X builds, the compiler flags used were “-O3”, while “-O3 -msse2” (or equivalent) was used for Linux builds. In the cases where the FFT uses AVX, the code was compiled with “-xAVX” or “-mavx” (depending on the compiler).
Some libraries included in the BenchFFT software have their own compilation scripts which override the defaults, and in the case of commercial libraries (such as Intel IPP and Apple vDSP), the compiler flags are of little consequence because the libraries are distributed in binary form.
Data format
FFT libraries use interleaved format and/or split format to store complex data. In the case of interleaved format, the real and imaginary parts of each complex number are stored adjacently in memory, while in the case of split format, the real and imaginary parts are stored in separate arrays.
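The two layouts can be illustrated with a minimal sketch that converts between them. Plain Python lists are used for clarity, and the function names are hypothetical rather than taken from any of the benchmarked libraries.

```python
def split_to_interleaved(re, im):
    """Pack separate real and imaginary arrays as [re0, im0, re1, im1, ...]."""
    if len(re) != len(im):
        raise ValueError("real and imaginary arrays must have equal length")
    out = []
    for r, i in zip(re, im):
        out.extend((r, i))
    return out


def interleaved_to_split(data):
    """Unpack [re0, im0, re1, im1, ...] into separate real and imaginary arrays."""
    if len(data) % 2 != 0:
        raise ValueError("interleaved data must have an even number of elements")
    return list(data[0::2]), list(data[1::2])


# Two complex numbers, 1+2i and 3+4i, in both layouts.
interleaved = [1.0, 2.0, 3.0, 4.0]
re, im = interleaved_to_split(interleaved)
```

A library that accepts only one layout may require a conversion of this kind on data held in the other layout.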
The majority of FFT libraries use interleaved format to store data. Where a library supports both interleaved and split format, BenchFFT uses interleaved format. However, a few libraries support only split format, and in these cases the results are not strictly comparable (Apple vDSP is one such case).




