One wonders if the DFT can be computed faster: Does another
computational procedure -- an algorithm -- exist
that can compute the same quantity, but more efficiently. We
could seek methods that reduce the constant of proportionality,
but do not change the DFT's complexity
ON2
O
N
2
.
Here, we have
something more dramatic in mind: Can the computations be restructured
so that a smaller complexity results?
In 1965, IBM researcher Jim Cooley and Princeton faculty member
John Tukey developed what is now known as the Fast Fourier
Transform (FFT). It is an algorithm for computing that DFT that
has order
ONlogN
O
N
N
for certain length inputs. Now when the length
of data doubles, the spectral computational time will not quadruple as
with the DFT algorithm; instead, it approximately doubles. Later
research showed that no algorithm for computing the DFT could have a
smaller complexity than the FFT. Surprisingly, historical work has
shown that
Gauss
in the early nineteenth century developed the same
algorithm, but did not publish it! After the FFT's rediscovery,
not only was the computation of a signal's spectrum greatly
speeded, but also the added feature of
algorithm
meant that computations had flexibility not available to analog
implementations.
Problem 1
Before developing the FFT, let's try to appreciate the
algorithm's impact. Suppose a short-length transform takes
1 ms. We want to calculate a transform of a signal that is
10 times longer. Compare how much longer a straightforward
implementation of the DFT would take in comparison to an
FFT, both of which compute exactly the same quantity.
[
Click for Solution 1 ]
Solution 1
If a DFT required 1ms to compute, and signal having ten
times the duration would require 100ms to compute. Using the
FFT, a 1ms computing time would increase by a factor of
about
log210=3.3
2
10
3.3
, a factor of 30 less than the DFT would have
needed.
[
Hide Solution 1 ]
To derive the FFT, we assume that the signal's duration is a
power of two:
N=2L
N
2
L
.
Consider what happens to the even-numbered and odd-numbered
elements of the sequence in the DFT calculation.
Sk=s0+s2ⅇ-ⅈ2π2kN+…+sN-2ⅇ-ⅈ2πN-2kN+s1ⅇ-ⅈ2πkN+s3ⅇ-ⅈ2π2+1kN+…+sN-1ⅇ-ⅈ2πN-2-1kN=[s0+s2ⅇ-ⅈ2πkN2+…+sN-2ⅇ-ⅈ2πN2-1kN2]+[s1+s3ⅇ-ⅈ2π+…+sN-1ⅇ-ⅈ2πN2-1kN2]ⅇ-ⅈ2πkN
S
k
s
0
s
2
2
2
k
N
…
s
N
2
2
N
2
k
N
s
1
2
k
N
s
3
2
2
1
k
N
…
s
N
1
2
N
2
1
k
N
[
s
0
s
2
2
k
N
2
…
s
N
2
2
N
2
1
k
N
2
]
[
s
1
s
3
2
k
N
2
…
s
N
1
2
N
2
1
k
N
2
]
2
k
N
(1)
Each term in square brackets has the
form of a
N2
N
2
-length DFT. The first one is a DFT of the
even-numbered elements, and the second of the odd-numbered
elements. The first DFT is combined with the second multiplied
by the complex exponential
ⅇ-ⅈ2πkN
2
k
N
. The half-length transforms are each evaluated at
frequency indices
k=0k0,
……,
N-1N1.
Normally, the number of frequency indices in a DFT calculation
range between zero and the transform length minus one. The
computational advantage of the FFT comes from
recognizing the periodic nature of the discrete Fourier
transform. The FFT simply reuses the computations made in the
half-length transforms and combines them through additions and
the multiplication by
ⅇ-ⅈ2πkN
2
k
N
, which is not periodic over
N2
N
2
,
to rewrite the length-
NN DFT.
Figure 1 illustrates this decomposition.
As it stands, we now compute two length-
N2
N
2
transforms (complexity
2ON24
2
O
N
2
4
), multiply one of them by the complex exponential
(complexity
ON
O
N
), and add the results (complexity
ON
O
N
). At this point, the total complexity is still
dominated by the half-length DFT calculations, but the
proportionality coefficient has been reduced.
Now for the fun. Because
N=2L
N
2
L
, each of the half-length transforms can be reduced to
two quarter-length transforms, each of these to two
eighth-length ones, etc. This decomposition continues until we
are left with length-2 transforms. This transform is quite
simple, involving only additions. Thus, the first stage of the
FFT has
N2
N
2
length-2 transforms (see the bottom part of
Figure 1). Pairs of these transforms are
combined by adding one to the other multiplied by a complex
exponential. Each pair requires 4 additions and 4
multiplications, giving a total number of computations equaling
8·N4=N2
8
·
N
4
N
2
.
This number of computations does not change from stage to stage.
Because the number of stages, the number of times the length can
be divided by two, equals
log2N
2
N
, the complexity of the FFT is
ONlog2N
O
N
2
N
.
Doing an example will make
computational savings more obvious. Let's look at the details
of a length-8 DFT. As shown on
Figure 2, we first decompose the DFT into two length-4
DFTs, with the outputs added and subtracted together in pairs.
Considering
Figure 2 as the
frequency index goes from 0 through 7, we recycle values from
the length-4 DFTs into the final calculation because of the
periodicity of the DFT output. Examining how pairs of outputs
are collected together, we create the basic computational
element known as a
butterfly (
Figure 2).
By considering together the computations involving common output
frequencies from the two half-length DFTs, we see that the two
complex multiplies are related to each other, and we can reduce
our computational work even further. By further decomposing the
length-4 DFTs into two length-2 DFTs and combining their
outputs, we arrive at the diagram summarizing the length-8 fast
Fourier transform (
Figure 1).
Although most of the complex multiplies are quite simple
(multiplying by
ⅇ-ⅈπ
means negating real and imaginary parts), let's count those for
purposes of evaluating the complexity as full complex
multiplies. We have
N2=4
N
2
4
complex multiplies and
2N=16
2
N
16
additions for each stage and
log2N=3
2
N
3
stages, making the number of basic computations
3N2log2N
3
N
2
2
N
as predicted.
Problem 2
Note that the ordering of the input sequence in the two
parts of
Figure 1 aren't quite
the same. Why not? How is the ordering determined?
[
Click for Solution 2 ]
Solution 2
The upper panel has not used the FFT algorithm to compute
the length-4 DFTs while the lower one has. The ordering is
determined by the algorithm.
[
Hide Solution 2 ]
Other "fast" algorithms were discovered,
all of which make use of how many common factors the transform
length NN has. In number theory,
the number of prime factors a given integer has measures how
composite it is. The numbers 16 and 81 are
highly composite (equaling
24
2
4
and
34
3
4
respectively), the number 18 is less so
(
21·32
2
1
·
3
2
), and 17 not at all (it's prime). In over thirty
years of Fourier transform algorithm development, the original
Cooley-Tukey algorithm is far and away the most frequently
used. It is so computationally efficient that power-of-two
transform lengths are frequently used regardless of what the
actual length of the data.
Problem 3
Suppose the length of the signal were
500500? How would you compute
the spectrum of this signal using the Cooley-Tukey
algorithm? What would the length
NN of the transform be?
[
Click for Solution 3 ]
Solution 3
The transform can have
any greater than
or equal to the actual duration of the signal. We simply
“pad” the signal with zero-valued samples until
a computationally advantageous signal length results. Recall
that the FFT is an
algorithm to compute
the
DFT.
Extending the length of the signal this way merely means we
are sampling the frequency axis more finely than required.
To use the Cooley-Tukey algorithm, the length of the
resulting zero-padded signal can be 512, 1024, etc. samples
long.
[
Hide Solution 3 ]
"Electrical Engineering Digital Processing Systems in Braille."