Soon after the paper by Cooley and Tukey, there were improvements and extensions made. One very important discovery was the improvement in efficiency by using a larger radix of 4, 8 or even 16. For example, just as for the radix-2 butterfly, there are no multiplications required for a length-4 DFT, and therefore, a radix-4 FFT would have only twiddle factor multiplications. Because there are half as many stages in a radix-4 FFT, there would be half as many multiplications as in a radix-2 FFT. In practice, because some of the multiplications are by unity, the improvement is not by a factor of two, but it is significant. A radix-4 FFT is easily developed from the basic radix-2 structure by replacing the length-2 butterfly by a length-4 butterfly and making a few other modifications. Programs can be found in [1] and operation counts will be given in "Evaluation of the Cooley-Tukey FFT Algorithms".

Increasing the radix to 8 gives some improvement but not as much as from 2 to 4. Increasing it to 16 is theoretically promising but the small decrease in multiplications is somewhat offset by an increase in additions and the program becomes rather long. Other radices are not attractive because they generally require a substantial number of multiplications and additions in the butterflies.

The second method of reducing arithmetic is to remove the
unnecessary TF multiplications by plus or minus unity or by plus or
minus the square root of minus one. This occurs when the exponent of

In addition to removing multiplications by one or

A time-consuming and unnecessary part of the execution of a FFT program is the calculation of the sine and cosine terms which are the real and imaginary parts of the TFs. There are basically three approaches to obtaining the sine and cosine values. They can be calculated as needed which is what is done in the sample program above. One value per stage can be calculated and the others recursively calculated from those. That method is fast but suffers from accumulated round-off errors. The fastest method is to fetch precalculated values from a stored table. This has the disadvantage of requiring considerable memory space.

If all the N DFT values are not needed, special forms of the FFT can be developed using a process called pruning [17] which removes the operations concerned with the unneeded outputs.

Special algorithms are possible for cases with real data or with symmetric data [6]. The decimation-in-time algorithm can be easily modified to transform real data and save half the arithmetic required for complex data [27]. There are numerous other modifications to deal with special hardware considerations such as an array processor or a special microprocessor such as the Texas Instruments TMS320. Examples of programs that deal with some of these items can be found in [22], [1], [6].

Comments:"The Fast Fourier Transform (FFT) is a landmark algorithm used in fields ranging from signal processing to high-performance computing. First popularized by two American scientists in 1965, the […]"