To derive the FFT, we assume that the signal's duration is a
power of two:
N=2l
N
2
l
. Consider what happens to the even-numbered and odd-numbered
elements of the sequence in the DFT calculation.
Sk=s0+s2e(−i)2π2kN+…+sN−2e(−i)2π(N−2)kN+s1e(−i)2πkN+s3e(−i)2π×(2+1)kN+…+sN−1e(−i)2π(N−2+1)kN=s0+s2e(−i)2πkN2+…+sN−2e(−i)2π(N2−1)kN2+(s1+s3e(−i)2πkN2+…+sN−1e(−i)2π(N2−1)kN2)e−(i2πk)N
S
k
s
0
s
2
2
2
k
N
…
s
N
2
2
N
2
k
N
s
1
2
k
N
s
3
2
2
1
k
N
…
s
N
1
2
N
2
1
k
N
s
0
s
2
2
k
N
2
…
s
N
2
2
N
2
1
k
N
2
s
1
s
3
2
k
N
2
…
s
N
1
2
N
2
1
k
N
2
2
k
N
(1)
Each term in square brackets has the form of a
N2
N
2
-length DFT. The first one is a DFT of the even-numbered
elements, and the second of the odd-numbered elements. The
first DFT is combined with the second multiplied by the complex
exponential
e−(i2πk)N
2
k
N
. The half-length transforms are each evaluated at frequency indices
k∈0…N−1
k
0
…
N
1
. Normally, the number of frequency indices in a DFT
calculation range between zero and the transform length minus
one. The computational advantage of the FFT comes
from recognizing the periodic nature of the discrete Fourier
transform. The FFT simply reuses the computations made in the
half-length transforms and combines them through additions and
the multiplication by
e−(i2πk)N
2
k
N
, which is not periodic over
N2
N
2
, to rewrite the length-N DFT. Figure 1 illustrates this decomposition. As it stands, we
now compute two length-
N2
N
2
transforms (complexity
2ON24
2
O
N
2
4
), multiply one of them by the complex exponential (complexity
ON
O
N
), and add the results (complexity
ON
O
N
). At this point, the total complexity is still dominated by
the half-length DFT calculations, but the proportionality
coefficient has been reduced.
Now for the fun. Because
N=2l
N
2
l
, each of the half-length transforms can be reduced to two
quarter-length transforms, each of these to two eighth-length
ones, etc. This decomposition continues until we are left with
length-2 transforms. This transform is quite simple, involving
only additions. Thus, the first stage of the FFT has
N2
N
2
length-2 transforms (see the bottom part of Figure 1). Pairs of these transforms are
combined by adding one to the other multiplied by a complex
exponential. Each pair requires 4 additions and 4
multiplications, giving a total number of computations equaling
8N4=N2
8
N
4
N
2
. This number of computations does not change from stage to
stage. Because the number of stages, the number of times the
length can be divided by two, equals
log2N
2
N
, the complexity of the FFT is
ONlogN
O
N
N
.
Doing an example will make computational savings more obvious.
Let's look at the details of a length-8 DFT. As shown on Figure 1, we first decompose the DFT into
two length-4 DFTs, with the outputs added and subtracted
together in pairs. Considering Figure 1 as the frequency index goes from 0 through 7, we
recycle values from the length-4 DFTs into the final calculation
because of the periodicity of the DFT output. Examining how
pairs of outputs are collected together, we create the basic
computational element known as a butterfly (Figure 2).
By considering together the computations involving common output
frequencies from the two half-length DFTs, we see that the two
complex multiplies are related to each other, and we can reduce
our computational work even further. By further decomposing the
length-4 DFTs into two length-2 DFTs and combining their
outputs, we arrive at the diagram summarizing the length-8 fast
Fourier transform (Figure 1).
Although most of the complex multiplies are quite simple
(multiplying by
e−(iπ)
means negating real and imaginary parts), let's count those for
purposes of evaluating the complexity as full complex
multiplies. We have
N2=4
N
2
4
complex multiplies and
2N=16
2
N
16
additions for each stage and
log2N=3
2
N
3
stages, making the number of basic computations
3N2log2N
3
N
2
2
N
as predicted.
Note that the ordering of the input sequence in the two
parts of Figure 1 aren't quite
the same. Why not? How is the ordering determined?
The upper panel has not used the FFT algorithm to compute
the length-4 DFTs while the lower one has. The ordering is
determined by the algorithm.
We now have a way of computing the spectrum for an
arbitrary signal: The Discrete Fourier Transform (DFT)
computes the spectrum at NN equally spaced frequencies
from a length- NN
sequence. An issue that never arises in analog
"computation," like that performed by a circuit, is
how much work it takes to perform the signal processing
operation such as filtering. In computation, this consideration
translates to the number of basic computational steps required
to perform the needed processing. The number of steps, known as
the complexity, becomes equivalent to how long the
computation takes (how long must we wait for an
answer). Complexity is not so much tied to specific computers or
programming languages but to how many steps are required on any
computer. Thus, a procedure's stated complexity says that
the time taken will be proportional to some
function of the amount of data used in the computation and the
amount demanded.
For example, consider the formula for the discrete Fourier
transform. For each frequency we chose, we must multiply each
signal value by a complex number and add together the
results. For a real-valued signal, each real-times-complex
multiplication requires two real multiplications, meaning we
have
2N
2N multiplications to perform. To add the results
together, we must keep the real and imaginary parts
separate. Adding NN
numbers requires
N−1
N1
additions. Consequently, each frequency requires
2N+2(N−1)=4N−2
2N
2
N1
4N
2
basic computational steps. As we have NN frequencies, the total
number of computations is
N(4N−2)
N
4N
2
.
In complexity calculations, we only worry about what happens as
the data lengths increase, and take the dominant
term—here the
4N2
4
N2
term—as reflecting how much work is involved in
making the computation. As multiplicative constants don't
matter since we are making a "proportional to"
evaluation, we find the DFT is an
ON2ON2
computational procedure. This notation is read "order NN-squared". Thus, if we
double the length of the data, we would expect that the
computation time to approximately quadruple.
In making the complexity evaluation for
the DFT, we assumed the data to be real. Three questions
emerge. First of all, the spectra of such signals have
conjugate symmetry, meaning that negative frequency components
( k=N2+1...N+1
k
N2
1
...
N1
in the DFT) can be computed from the
corresponding positive frequency components. Does this
symmetry change the DFT's complexity?
Secondly, suppose the data are complex-valued; what is the
DFT's complexity now?
Finally, a less important but interesting
question is suppose we want KK frequency values
instead of NN;
now what is the complexity?
When the signal is real-valued, we may only need
half the spectral values, but the complexity remains
unchanged. If the data are complex-valued, which demands
retaining all frequency values, the complexity is again the
same. When only KK frequencies are
needed, the complexity is OKN
OKN.
How much better is O(NlogN) than O(
N2
N
2
)?
Table 1
|
NN
|
1010
|
100100
|
10001000
|
106106
|
109109
|
|
N2N2
|
100
|
104104
|
106106
|
10121012
|
10181018
|
|
NlogNNN
|
11
|
200200
|
30003000
|
6×1066106
|
9×1099109
|
Say you have a 1 MFLOP machine (a million "floating point"
operations per second). Let
N=1million=106
N
1
million
10
6
.
An O(
N2
N
2
) algorithm takes
10121012
flors →
106
10
6
seconds ≃ 11.5 days.
An O(
NlogN
N
N
) algorithm takes
6×106
6
10
6
Flors → 6 seconds.
N=1million
N
1
million
is not unreasonable.
3 megapixel digital camera spits out
3×106
3
10
6
numbers for each picture. So for two
NN point sequences
fnfn
and hnhn. If
computing
fn⊛hn
⊛
f
n
h
n
directly: O(
N2
N
2
) operations.
taking FFTs -- O(NlogN)
multiplying FFTs -- O(N)
inverse FFTs -- O(NlogN).
the total complexity is O(NlogN).
Other "fast" algorithms have been discovered, most of which make use
of how many common factors the transform length N has. In
number theory, the number of prime factors a given integer has
measures how composite it is. The numbers 16 and
81 are highly composite (equaling
24
2
4
and
34
3
4
respectively), the number 18 is less so (
2132
2
1
3
2
), and 17 not at all (it's prime). In over thirty years of
Fourier transform algorithm development, the original
Cooley-Tukey algorithm is far and away the most frequently
used. It is so computationally efficient that power-of-two
transform lengths are frequently used regardless of what the
actual length of the data. It is even well established that the FFT, alongside the digital computer, were almost completely responsible for
the "explosion" of DSP in the 60's.
"My introduction to signal processing course at Rice University."