The choice of a power-of-two FFT algorithm may depend on more than computational complexity.
The latest extensions of the
split-radix algorithm offer the lowest known power-of-two FFT operation counts, but their 10%-30% advantage in operation count may not make up for other factors such as regularity of structure or data flow,
FFT programming tricks, or special hardware features.
For example, the
decimation-in-time radix-2 FFT is the fastest FFT on
Texas Instruments' TMS320C54x DSP microprocessors, because this processor family has special assembly-language instructions that accelerate this particular algorithm.
On other hardware,
radix-4 algorithms may be more efficient.
Some devices, such as
AMI Semiconductor's Toccata ultra-low-power DSP microprocessor family, have on-chip FFT accelerators; it is always faster and more power-efficient to use these accelerators and whatever radix they prefer.
For
fast convolution, the
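decimation-in-frequency algorithms may be preferred because the bit-reversing can be bypassed: the forward transforms leave the spectra in bit-reversed order, the pointwise product can be formed in that same order, and an inverse transform that accepts bit-reversed input returns the result in natural order. However, most DSP microprocessors provide zero-overhead bit-reversed indexing hardware and prefer decimation-in-time algorithms, so this advantage may not apply on such machines.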
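The following is a minimal sketch of how fast convolution can avoid bit-reversal, not an optimized implementation. It assumes a power-of-two length and circular (not linear) convolution. The function names `fft_dif`, `ifft_dit`, and `circular_convolve` are hypothetical, chosen only for this illustration. The forward transform is a radix-2 decimation-in-frequency FFT that leaves its output in bit-reversed order; the inverse is a radix-2 decimation-in-time transform that consumes bit-reversed input and produces natural-order output, so no reordering pass is ever run.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT.

    Natural-order input, bit-reversed-order output (no reordering pass).
    """
    a = list(x)
    n = len(a)
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                u = a[start + k]
                v = a[start + k + half]
                a[start + k] = u + v
                a[start + k + half] = (u - v) * w
        half //= 2
    return a


def ifft_dit(a_bitrev):
    """Radix-2 decimation-in-time inverse FFT.

    Bit-reversed-order input, natural-order output, scaled by 1/n.
    """
    a = list(a_bitrev)
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(2j * cmath.pi * k / (2 * half))
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        half *= 2
    return [value / n for value in a]


def circular_convolve(x, h):
    """Circular convolution of two equal, power-of-two-length sequences."""
    n = len(x)
    if n == 0 or n != len(h) or n & (n - 1):
        raise ValueError("inputs must share a power-of-two length")
    X = fft_dif(x)  # bit-reversed order
    H = fft_dif(h)  # same bit-reversed order
    # The pointwise product does not care about ordering,
    # as long as both spectra use the same one.
    Y = [xv * hv for xv, hv in zip(X, H)]
    return ifft_dit(Y)  # back in natural order


if __name__ == "__main__":
    # Convolving with a one-sample delay rotates the sequence by one.
    y = circular_convolve([1, 2, 3, 4], [0, 1, 0, 0])
    print([round(v.real, 6) for v in y])
```

For linear rather than circular convolution, both sequences would first be zero-padded to a power-of-two length at least as large as the sum of their lengths minus one. On processors with zero-overhead bit-reversed addressing, this trick saves little, which is consistent with the caveat above.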
Good, compiler- or hardware-friendly programming always matters more than modest differences in raw operation counts, so manufacturers' or good third-party FFT libraries are often the best choice.
The module
FFT programming tricks references some good, free FFT software (including the
FFTW package) that is carefully coded to be compiler-friendly; such code is likely to be considerably faster than code written by the casual programmer.