A fast Fourier transform,
or FFT, is not a new transform,
but is a computationally efficient algorithm for the computing
the DFT.
The length-NN DFT, defined as

Xk=∑
n
=0N−1xne−(j2πnkN)
X
k
n
N
1
0
x
n
2
n
k
N

(1)
where

Xk
X
k
and

xn
x
n
are in general complex-valued and

0≤k
0
k
,

n≤N−1
n
N
1
,
requires

NN complex multiplies to compute each

Xk
X
k
.
Direct computation of all

NN frequency samples thus requires

N2
N
2
complex multiplies and

N(N−1)
N
N
1
complex additions.
(This assumes precomputation of the DFT coefficients

W
N
n
k
≐e−(j2πnkN)
≐
W
N
n
k
2
n
k
N
; otherwise, the cost is even higher.)
For the large DFT lengths used in many applications,

N2
N
2
operations may be prohibitive.
(For example, digital terrestrial television broadcast
in Europe uses

NN = 2048 or 8192 OFDM channels, and the

SETI project uses
up to length-4194304 DFTs.)
DFTs are thus almost always computed in practice by an

FFT algorithm.
FFTs are very widely used in signal processing, for applications
such as

spectrum analysis and
digital filtering via

fast convolution.

It is now known that C.F. Gauss invented an FFT in 1805 or so
to assist the computation of planetary orbits via
discrete Fourier series.
Various FFT algorithms were independently invented over the next two
centuries, but FFTs achieved widespread awareness and impact only
with the Cooley and Tukey algorithm published in 1965, which came
at a time of increasing use of digital computers and when the vast
range of applications of numerical Fourier techniques was becoming apparent.
Cooley and Tukey's algorithm spawned a surge of research in FFTs
and was also partly responsible for the emergence of Digital Signal Processing (DSP) as a
distinct, recognized discipline.
Since then, many different algorithms have been rediscovered or developed,
and efficient FFTs now exist for all DFT lengths.

The main strategy behind most FFT algorithms is to factor a
length-NN DFT into a number of
shorter-length DFTs, the outputs of which are reused multiple
times (usually in additional short-length DFTs!) to compute the
final results.
The lengths of the short DFTs correspond to integer factors of the
DFT length, NN, leading to different
algorithms for different lengths and factors.
By far the most commonly used FFTs select
N=2M
N
2
M
to be a power of two, leading to the very efficient
power-of-two FFT algorithms,
including the decimation-in-time radix-2 FFT
and the decimation-in-frequency radix-2 FFT algorithms,
the radix-4 FFT
(
N=4M
N
4
M
),
and the split-radix FFT.
Power-of-two algorithms gain their high efficiency
from extensive reuse of intermediate results and
from the low complexity of length-2 and length-4
DFTs, which require no multiplications.
Algorithms for lengths with repeated common factors
(such as 2 or 4 in the radix-2 and radix-4 algorithms, respectively)
require extra twiddle factor multiplications
between the short-length DFTs, which together lead
to a computational complexity of
ONlogN
O
N
N
,
a very considerable savings over direct computation of the DFT.

The other major class of algorithms is the
Prime-Factor Algorithms (PFA).
In PFAs, the short-length DFTs must be of relatively prime lengths.
These algorithms gain efficiency by reuse of intermediate
computations and by eliminating twiddle-factor multiplies,
but require more operations than the power-of-two algorithms to compute the short DFTs of various prime lengths.
In the end, the computational costs of the prime-factor
and the power-of-two algorithms are comparable for similar
lengths, as illustrated in Choosing the Best FFT Algorithm.
Prime-length DFTs cannot be factored into shorter DFTs,
but in different ways both Rader's conversion
and the chirp z-transform
convert prime-length DFTs into convolutions of other
lengths that can be computed efficiently using FFTs
via fast convolution.

Some applications require only a few DFT frequency samples, in which case Goertzel's algorithm halves the number of computations relative to the DFT sum.
Other applications involve successive DFTs of overlapped
blocks of samples, for which the running FFT
can be more efficient than separate FFTs of each block.