A powerful approach to the development of efficient
algorithms is to break a large problem into multiple small ones.
One method for doing this with both the DFT and convolution uses a
linear change of index variables to map the original one-dimensional problem into a multi-dimensional problem. This
approach provides a unified derivation of the Cooley-Tukey FFT, the
prime factor algorithm (PFA) FFT, and the Winograd Fourier transform
algorithm (WFTA) FFT. It can also be applied directly to convolution
to break it down into multiple short convolutions that can be executed
faster than a direct implementation. It is often easy to translate an
algorithm using index mapping into an efficient program.
The basic definition of the discrete Fourier transform (DFT) is
C
(
k
)
=
∑
n
=
0
N
-
1
x
(
n
)
W
N
n
k
C
(
k
)
=
∑
n
=
0
N
-
1
x
(
n
)
W
N
n
k
(1)
where nn, kk, and NN are integers, j=-1j=-1, the basis functions are the NN roots of unity,
W
N
=
e
-
j
2
π
/
N
W
N
=
e
-
j
2
π
/
N
(2)
and k=0,1,2,⋯,N-1k=0,1,2,⋯,N-1.
If the NN values of the transform are calculated from the NN values of
the data, x(n)x(n), it is easily seen that N2N2 complex multiplications and
approximately that same number of complex additions are required. One
method for reducing this required arithmetic is to use an index
mapping (a change of variables) to change the one-dimensional DFT
into a two- or higher dimensional DFT. This is one of the ideas
behind the very efficient Cooley-Tukey [6] and Winograd
[19] algorithms. The purpose of index mapping is to
change a large problem into several easier ones [5], [7]. This
is sometimes called the “divide and conquer" approach
[3] but a more accurate description would be “organize
and share" which explains the process of redundancy removal or reduction.
For a length-N sequence, the time index takes on the values
n
=
0
,
1
,
2
,
.
.
.
,
N
-
1
n
=
0
,
1
,
2
,
.
.
.
,
N
-
1
(3)
When the length of the DFT is not prime, NN can be factored as N=N1N2N=N1N2 and two new independent variables can be defined over the
ranges
n
1
=
0
,
1
,
2
,
.
.
.
,
N
1
-
1
n
1
=
0
,
1
,
2
,
.
.
.
,
N
1
-
1
(4)
n
2
=
0
,
1
,
2
,
.
.
.
,
N
2
-
1
n
2
=
0
,
1
,
2
,
.
.
.
,
N
2
-
1
(5)
A linear change of variables is defined which maps n1n1 and
n2n2 to nn and is expressed by
n
=
(
(
K
1
n
1
+
K
2
n
2
)
)
N
n
=
(
(
K
1
n
1
+
K
2
n
2
)
)
N
(6)
where KiKi are integers and the notation ((x))N((x))N denotes the integer
residue of xx modulo NN[13]. This map defines a relation between all possible
combinations of n1n1 and n2n2 in Equation 4 and Equation 5 and the values
for nn in Equation 3. The question as to whether all of the nn in Equation 3 are
represented, i.e., whether the map is one-to-one (unique), has been
answered in [5] showing that certain integer KiKi always exist
such that the map in Equation 6 is one-to-one. Two cases must be
considered.
N1N1 and N2N2 are relatively prime, i.e., the greatest common divisor (N1,N2)=1(N1,N2)=1.
The integer map of Equation 6 is one-to-one if and only if:
(
K
1
=
a
N
2
)
and/or
(
K
2
=
b
N
1
)
and
(
K
1
,
N
1
)
=
(
K
2
,
N
2
)
=
1
(
K
1
=
a
N
2
)
and/or
(
K
2
=
b
N
1
)
and
(
K
1
,
N
1
)
=
(
K
2
,
N
2
)
=
1
(7)
where aa and bb are integers.
N1N1 and N2N2 are not relatively prime, i.e., (N1,N2)>1(N1,N2)>1.
The integer map of Equation 6 is one-to-one if and only if:
(
K
1
=
a
N
2
)
and
(
K
2
≠
b
N
1
)
and
(
a
,
N
1
)
=
(
K
2
,
N
2
)
=
1
(
K
1
=
a
N
2
)
and
(
K
2
≠
b
N
1
)
and
(
a
,
N
1
)
=
(
K
2
,
N
2
)
=
1
(8)
or
(
K
1
≠
a
N
2
)
and
(
K
2
=
b
N
1
)
and
(
K
1
,
N
1
)
=
(
b
,
N
2
)
=
1
(
K
1
≠
a
N
2
)
and
(
K
2
=
b
N
1
)
and
(
K
1
,
N
1
)
=
(
b
,
N
2
)
=
1
(9)
Reference [5] should be consulted for the details of
these conditions and examples. Two classes of index maps are defined
from these conditions.
The map of Equation 6 is called a type-one map when integers aa and bb exist such that
K
1
=
a
N
2
and
K
2
=
b
N
1
K
1
=
a
N
2
and
K
2
=
b
N
1
(10)
The map of Equation 6 is called a type-two map when when integers aa and bb exist such that
K
1
=
a
N
2
or
K
2
=
b
N
1
,
but
not
both.
K
1
=
a
N
2
or
K
2
=
b
N
1
,
but
not
both.
(11)
The type-one can be used only if the factors of NN are relatively prime,
but the type-two can be used whether they are relatively prime or
not. Good [8], Thomas, and Winograd [19] all used the
type-one map in their DFT algorithms. Cooley and Tukey [6]
used the type-two in their algorithms, both for a fixed radix (N=RM)(N=RM) and a mixed radix [17].
The frequency index is defined by a map similar to Equation 6 as
k
=
(
(
K
3
k
1
+
K
4
k
2
)
)
N
k
=
(
(
K
3
k
1
+
K
4
k
2
)
)
N
(12)
where the same conditions, Equation 7 and Equation 8, are used for
determining the uniqueness of this map in terms of the integers K3K3 and K4K4.
Two-dimensional arrays for the input data and its DFT are
defined using these index maps to give
x
^
(
n
1
,
n
2
)
=
x
(
(
K
1
n
1
+
K
2
n
2
)
)
N
x
^
(
n
1
,
n
2
)
=
x
(
(
K
1
n
1
+
K
2
n
2
)
)
N
(13)
X
^
(
k
1
,
k
2
)
=
X
(
(
K
3
k
1
+
K
4
k
2
)
)
N
X
^
(
k
1
,
k
2
)
=
X
(
(
K
3
k
1
+
K
4
k
2
)
)
N
(14)
In some of the following equations, the residue reduction
notation will be omitted for clarity. These changes of variables
applied to the definition of the DFT given in Equation 1 give
C
(
k
)
=
∑
n
2
=
0
N
2
-
1
∑
n
1
=
0
N
1
-
1
x
(
n
)
W
N
K
1
K
3
n
1
k
1
W
N
K
1
K
4
n
1
k
2
W
N
K
2
K
3
n
2
k
1
W
N
K
2
K
4
n
2
k
2
C
(
k
)
=
∑
n
2
=
0
N
2
-
1
∑
n
1
=
0
N
1
-
1
x
(
n
)
W
N
K
1
K
3
n
1
k
1
W
N
K
1
K
4
n
1
k
2
W
N
K
2
K
3
n
2
k
1
W
N
K
2
K
4
n
2
k
2
(15)
where all of the exponents are evaluated modulo NN.
The amount of arithmetic required to calculate Equation 15 is the
same as in the direct calculation of Equation 1. However, because
of the special nature of the DFT, the integer constants KiKi can be
chosen in such a way that the calculations are “uncoupled" and
the arithmetic is reduced. The requirements for this are
(
(
K
1
K
4
)
)
N
=
0
and/or
(
(
K
2
K
3
)
)
N
=
0
(
(
K
1
K
4
)
)
N
=
0
and/or
(
(
K
2
K
3
)
)
N
=
0
(16)
When this condition and those for uniqueness in Equation 6 are
applied, it is found that the KiKi may always be chosen such that
one of the terms in Equation 16 is zero. If the NiNi are
relatively prime, it is always possible to make both terms zero. If
the NiNi are not relatively prime, only one of the terms can be
set to zero. When they are relatively prime, there is a choice, it
is possible to either set one or both to zero. This in turn causes
one or both of the center two WW terms in Equation 15 to become
unity.
An example of the Cooley-Tukey radix-4 FFT for a length-16
DFT uses the type-two map with K1=4K1=4, K2=1K2=1, K3=1K3=1, K4=4K4=4 giving
n
=
4
n
1
+
n
2
n
=
4
n
1
+
n
2
(17)
k
=
k
1
+
4
k
2
k
=
k
1
+
4
k
2
(18)
The residue reduction in Equation 6 is not needed here since nn
does not exceed NN as n1n1 and n2n2 take on their values. Since,
in this example, the factors of NN have a common factor, only
one of the conditions in Equation 16 can hold and, therefore,
Equation 15 becomes
C
^
(
k
1
,
k
2
)
=
C
(
k
)
=
∑
n
2
=
0
3
∑
n
1
=
0
3
x
(
n
)
W
4
n
1
k
1
W
16
n
2
k
1
W
4
n
2
k
2
C
^
(
k
1
,
k
2
)
=
C
(
k
)
=
∑
n
2
=
0
3
∑
n
1
=
0
3
x
(
n
)
W
4
n
1
k
1
W
16
n
2
k
1
W
4
n
2
k
2
(19)
Note the definition of WNWN in Equation 3 allows the simple form of
W16K1K3=W4W16K1K3=W4
This has the form of a two-dimensional DFT with an extra
term W16W16, called a “twiddle factor". The inner
sum over n1n1 represents four length-4 DFTs, the W16W16 term
represents 16 complex multiplications, and the outer sum over
n2n2 represents another four length-4 DFTs. This choice of the
KiKi “uncouples" the calculations since the first sum over n1n1
for n2=0n2=0 calculates the DFT of the first row of the data array
x^(n1,n2)x^(n1,n2), and those data values are never needed in the
succeeding row calculations. The row calculations are independent,
and examination of the outer sum shows that the column
calculations are likewise independent. This is illustrated in Figure 1.
The left 4-by-4 array is the mapped input data, the center
array has the rows transformed, and the right array is the DFT
array. The row DFTs and the column DFTs are independent of each
other. The twiddle factors (TF) which are the center WW in
Equation 19, are the multiplications which take place on the center
array of Figure 1.
This uncoupling feature reduces the amount of arithmetic
required and allows the results of each row DFT to be written
back over the input data locations, since that input row will
not be needed again. This is called “in-place" calculation
and it results in a large memory requirement savings.
An example of the type-two map used when the factors of NN
are relatively prime is given for N=15N=15 as
n
=
5
n
1
+
n
2
n
=
5
n
1
+
n
2
(20)
k
=
k
1
+
3
k
2
k
=
k
1
+
3
k
2
(21)
The residue reduction is again not explicitly needed. Although the
factors 3 and 5 are relatively prime, use of the type-two map
sets only one of the terms in Equation 16 to zero. The DFT in
Equation 15 becomes
X
=
∑
n
2
=
0
4
∑
n
1
=
0
2
x
W
3
n
1
k
1
W
15
n
2
k
1
W
5
n
2
k
2
X
=
∑
n
2
=
0
4
∑
n
1
=
0
2
x
W
3
n
1
k
1
W
15
n
2
k
1
W
5
n
2
k
2
(22)
which has the same form as Equation 19, including the existence of
the twiddle factors (TF). Here the inner sum is five length-3
DFTs, one for each value of k1k1. This is illustrated in Equation 2
where the rectangles are the 5 by 3 data arrays and the system is called a ``mixed radix" FFT.
An alternate illustration is shown in Figure 3 where the
rectangles are the short length 3 and 5 DFTs.
The type-one map is illustrated next on the same length-15
example. This time the situation of Equation 7 with the “and"
condition is used in Equation 10 using an index map of
n
=
5
n
1
+
3
n
2
n
=
5
n
1
+
3
n
2
(23)
and
k
=
10
k
1
+
6
k
2
k
=
10
k
1
+
6
k
2
(24)
The residue reduction is now necessary. Since the factors of NN
are relatively prime and the type-one map is being used, both terms
in Equation 16 are zero, and Equation 15 becomes
X
^
=
∑
n
2
=
0
4
∑
n
1
=
0
2
x
^
W
3
n
1
k
1
W
5
n
2
k
2
X
^
=
∑
n
2
=
0
4
∑
n
1
=
0
2
x
^
W
3
n
1
k
1
W
5
n
2
k
2
(25)
which is similar to Equation 22, except that now the type-one map
gives a pure two-dimensional DFT calculation with no TFs, and the
sums can be done in either order. Figures Figure 2 and Figure 3
also describe this case but now there are no Twiddle Factor
multiplications in the center and the resulting system is called a ``prime factor algorithm" (PFA).
The purpose of index mapping is to improve the arithmetic
efficiency. For example a direct calculation of a length-16 DFT
requires 16^2 or 256 real multiplications (recall, one complex multiplication requires 4 real multiplications and 2 real additions) and an uncoupled version
requires 144. A direct calculation of a length-15 DFT requires
225 multiplications but with a type-two map only 135 and with a
type-one map, 120. Recall one complex multiplication requires
four real multiplications and two real additions.
Algorithms of practical interest use short DFT's that
require fewer than N2N2 multiplications. For example, length-4
DFTs require no multiplications and, therefore, for the length-16
DFT, only the TFs must be calculated. That calculation uses
16 multiplications, many fewer than the 256 or 144 required for
the direct or uncoupled calculation.
The concept of using an index map can also be applied to
convolution to convert a length N=N1N2N=N1N2 one-dimensional
cyclic convolution into a N1N1 by N2N2 two-dimensional cyclic
convolution [5], [1]. There is no savings of arithmetic
from the mapping alone as there is with the DFT, but savings can be
obtained by using special short algorithms along each dimension.
This is discussed in Algorithms for Data with Restrictions .
Because use of both the type-one and two index maps
uncouples the calculations of the rows and columns of the data
array, the results of each short length NiNi DFT can be written
back over the data as it will not be needed again after that
particular row or column is transformed. This is easily seen from
Figures Figure 1, Figure 2, and Figure 3 where the DFT of the
first row of x(n1,n2)x(n1,n2) can be put
back over the data rather written into a new array. After all the
calculations are finished, the total DFT is in the array of the
original data. This gives a significant memory savings over using a
separate array for the output.
Unfortunately, the use of in-place calculations results in
the order of the DFT values being permuted or scrambled. This is because
the data is indexed according to the input map Equation 6 and the
results are put into the same locations rather than the locations
dictated by the output map Equation 12. For example with a length-8
radix-2 FFT, the input index map is
n
=
4
n
1
+
2
n
2
+
n
3
n
=
4
n
1
+
2
n
2
+
n
3
(26)
which to satisfy Equation 16 requires an output map of
k
=
k
1
+
2
k
2
+
4
k
3
k
=
k
1
+
2
k
2
+
4
k
3
(27)
The in-place calculations will place the DFT results in the
locations of the input map and these should be reordered or
unscrambled into the locations given by the output map.
Examination of these two maps shows the scrambled output to be in
a “bit reversed" order.
For certain applications, this scrambled output order is not
important, but for many applications, the order must be unscrambled
before the DFT can be considered complete. Because the radix of the
radix-2 FFT is the same as the base of the binary number
representation, the correct address for any term is found by
reversing the binary bits of the address. The part of most FFT programs that does
this reordering is called a bit-reversed counter. Examples of
various unscramblers are found in [9], [4] and in the appendices.
The development here uses the input map and the resulting
algorithm is called “decimation-in-frequency". If the output rather
than the input map is used to derive the FFT algorithm so the
correct output order is obtained, the input order must be scrambled
so that its values are in locations specified by the output map
rather than the input map. This algorithm is called
“decimation-in-time". The scrambling is the same bit-reverse
counting as before, but it precedes the FFT algorithm in this case.
The same process of a post-unscrambler or pre-scrambler occurs for
the in-place calculations with the type-one maps. Details can be
found in [4], [2]. It is possible to do the unscrambling
while calculating the FFT and to avoid a separate unscrambler. This
is done for the Cooley-Tukey FFT in [11] and for the PFA in
[4], [2], [16].
If a radix-2 FFT is used, the unscrambler is a bit-reversed
counter. If a radix-4 FFT is used, the unscrambler is a base-4
reversed counter, and similarly for radix-8 and others. However,
if for the radix-4 FFT, the short length-4 DFTs (butterflies) have their
outputs in bit-revered order, the output of the total radix-4 FFT
will be in bit-reversed order, not base-4 reversed order. This
means any radix-2n2n FFT can use the same radix-2 bit-reversed
counter as an unscrambler if the proper butterflies are used.
In this section the reductions in arithmetic in the DFT that
result from the index mapping alone will be examined. In
practical algorithms several methods are always combined, but it
is helpful in understanding the effects of a particular method to
study it alone.
The most general form of an uncoupled two-dimensional DFT is
given by
X
(
k
1
,
k
2
)
=
∑
n
2
=
0
N
2
-
1
{
∑
n
1
=
0
N
1
-
1
x
(
n
1
,
n
2
)
f
1
(
n
1
,
n
2
,
k
1
)
}
f
2
(
n
2
,
k
1
,
k
2
)
X
(
k
1
,
k
2
)
=
∑
n
2
=
0
N
2
-
1
{
∑
n
1
=
0
N
1
-
1
x
(
n
1
,
n
2
)
f
1
(
n
1
,
n
2
,
k
1
)
}
f
2
(
n
2
,
k
1
,
k
2
)
(28)
where the inner sum calculates N2N2 length-N1N1 DFT's and, if for
a type-two map, the effects of the TFs. If the number of arithmetic
operations for a length-N DFT is denoted by F(N)F(N), the number of
operations for this inner sum is F=N2F(N1)F=N2F(N1). The outer sum
which gives N1N1 length-N2N2 DFT's requires N1F(N2)N1F(N2)
operations. The total number of arithmetic operations is then
F
=
N
2
F
(
N
1
)
+
N
1
F
(
N
2
)
F
=
N
2
F
(
N
1
)
+
N
1
F
(
N
2
)
(29)
The first question to be considered is for a fixed length NN, what
is the optimal relation of N1N1 and N2N2 in the sense of
minimizing the required amount of arithmetic. To answer this
question, N1N1 and N2N2 are temporarily assumed to be real
variables rather than integers. If the short length-NiNi DFT's in
Equation 28 and any TF multiplications are assumed to require
Ni2Ni2 operations, i.e. F(Ni)=Ni2F(Ni)=Ni2, "Efficiencies Resulting from Index Mapping with the
DFT" becomes
F
=
N
2
N
1
2
+
N
1
N
2
2
=
N
(
N
1
+
N
2
)
=
N
(
N
1
+
N
N
1
-
1
)
F
=
N
2
N
1
2
+
N
1
N
2
2
=
N
(
N
1
+
N
2
)
=
N
(
N
1
+
N
N
1
-
1
)
(30)
To find the minimum of FF over N1N1, the derivative of FF with
respect to N1N1 is set to zero (temporarily assuming the variables
to be continuous) and the result requires N1=N2N1=N2.
d
F
/
d
N
1
=
0
=
>
N
1
=
N
2
d
F
/
d
N
1
=
0
=
>
N
1
=
N
2
(31)
This result is also easily seen from the symmetry of N1N1 and N2N2
in N=N1N2N=N1N2. If a more general model of the arithmetic
complexity of the short DFT's is used, the same result is obtained,
but a closer examination must be made to assure that N1=N2N1=N2 is
a global minimum.
If only the effects of the index mapping are to be
considered, then the F(N)=N2F(N)=N2 model is used and Equation 31
states that the two factors should be equal. If there are M factors,
a similar reasoning shows that all MM factors should be equal. For
the sequence of length
N
=
R
M
N
=
R
M
(32)
there are now MM length-R DFT's and, since the factors are all
equal, the index map must be type two. This means there must be
twiddle factors.
In order to simplify the analysis, only the number of
multiplications will be considered. If the number of multiplications
for a length-R DFT is F(R)F(R), then the formula for operation counts
in Equation 30 generalizes to
F
=
N
∑
i
=
1
M
F
(
N
i
)
/
N
i
=
N
M
F
(
R
)
/
R
F
=
N
∑
i
=
1
M
F
(
N
i
)
/
N
i
=
N
M
F
(
R
)
/
R
(33)
for Ni=RNi=R
F
=
N
ln
R
(
N
)
F
(
R
)
/
R
=
(
N
ln
N
)
(
F
(
R
)
/
(
R
ln
R
)
)
F
=
N
ln
R
(
N
)
F
(
R
)
/
R
=
(
N
ln
N
)
(
F
(
R
)
/
(
R
ln
R
)
)
(34)
This is a very important formula which was derived by Cooley and
Tukey in their famous paper [6] on the FFT. It states that
for a given R which is called the radix, the number of
multiplications (and additions) is proportional to NlnNNlnN. It
also shows the relation to the value of the radix, RR.
In order to get some idea of the “best" radix, the number of
multiplications to compute a length-R DFT is assumed to be F(R)=RxF(R)=Rx. If this is used with Equation 34, the optimal R can be found.
d
F
/
d
R
=
0
=
>
R
=
e
1
/
(
x
-
1
)
d
F
/
d
R
=
0
=
>
R
=
e
1
/
(
x
-
1
)
(35)
For x=2x=2 this gives R=eR=e, with the closest integer being
three.
The result of this analysis states that if no other
arithmetic saving methods other than index mapping are used, and if
the length-R DFT's plus TFs require F=R2F=R2 multiplications, the
optimal algorithm requires
F
=
3
N
log
3
N
F
=
3
N
log
3
N
(36)
multiplications for a length N=3MN=3M DFT. Compare this with N2N2
for a direct calculation and the improvement is obvious.
While this is an interesting result from the analysis of the
effects of index mapping alone, in practice, index mapping is almost
always used in conjunction with special algorithms for the short
length-NiNi DFT's in Equation 15. For example, if R=2R=2 or 4,
there are no multiplications required for the short DFT's. Only the
TFs require multiplications. Winograd (see Winorad's Short DFT Algorithms) has
derived some algorithms for short DFT's that require O(N)O(N)
multiplications. This means that F(Ni)=KNiF(Ni)=KNi and the
operation count FF in "Efficiencies Resulting from Index Mapping with the
DFT" is independent of NiNi.
Therefore, the derivative of FF is zero for all NiNi. Obviously,
these particular cases must be examined.
It is possible to formulate the DFT so a length-N DFT can be calculated in terms
of two length-(N/2) DFTs. And, if N=2MN=2M, each of those length-(N/2) DFTs can
be found in terms of length-(N/4) DFTs. This allows the DFT to be calculated by
a recursive algorithm with MM recursions, giving the familiar order Nlog(N)Nlog(N)
arithmetic complexity.
Calculate the even indexed DFT values from Equation 1 by:
C
(
2
k
)
=
∑
n
=
0
N
-
1
x
(
n
)
W
N
2
n
k
=
∑
n
=
0
N
-
1
x
(
n
)
W
N
/
2
n
k
C
(
2
k
)
=
∑
n
=
0
N
-
1
x
(
n
)
W
N
2
n
k
=
∑
n
=
0
N
-
1
x
(
n
)
W
N
/
2
n
k
(37)
C
(
2
k
)
=
∑
n
=
0
N
/
2
-
1
x
(
n
)
W
N
2
n
k
+
∑
n
=
N
/
2
N
-
1
x
(
n
)
W
N
/
2
n
k
C
(
2
k
)
=
∑
n
=
0
N
/
2
-
1
x
(
n
)
W
N
2
n
k
+
∑
n
=
N
/
2
N
-
1
x
(
n
)
W
N
/
2
n
k
(38)
C
(
2
k
)
=
∑
n
=
0
N
/
2
-
1
{
x
(
n
)
+
x
(
n
+
N
/
2
)
}
W
N
/
2
n
k
C
(
2
k
)
=
∑
n
=
0
N
/
2
-
1
{
x
(
n
)
+
x
(
n
+
N
/
2
)
}
W
N
/
2
n
k
(39)
and a similar argument gives the odd indexed values as:
C
(
2
k
+
1
)
=
∑
n
=
0
N
/
2
-
1
{
x
(
n
)
-
x
(
n
+
N
/
2
)
}
W
N
n
W
N
/
2
n
k
C
(
2
k
+
1
)
=
∑
n
=
0
N
/
2
-
1
{
x
(
n
)
-
x
(
n
+
N
/
2
)
}
W
N
n
W
N
/
2
n
k
(40)
Together, these are recursive DFT formulas expressing the length-N DFT of x(n)x(n)
in terms of length-N/2 DFTs:
C
(
2
k
)
=
DFT
N
/
2
{
x
(
n
)
+
x
(
n
+
N
/
2
)
}
C
(
2
k
)
=
DFT
N
/
2
{
x
(
n
)
+
x
(
n
+
N
/
2
)
}
(41)
C
(
2
k
+
1
)
=
DFT
N
/
2
{
[
x
(
n
)
-
x
(
n
+
N
/
2
)
]
W
N
n
}
C
(
2
k
+
1
)
=
DFT
N
/
2
{
[
x
(
n
)
-
x
(
n
+
N
/
2
)
]
W
N
n
}
(42)
This is a “decimation-in-frequency" (DIF) version since it gives
samples of the frequency domain representation in terms of blocks
of the time domain signal.
A recursive Matlab program which implements this is given by:
Listing 1: DIF Recursive FFT for N=2MN=2Mfunction c = dftr2(x)
% Recursive Decimation-in-Frequency FFT algorithm, csb 8/21/07
L = length(x);
if L > 1
L2 = L/2;
TF = exp(-j*2*pi/L).^[0:L2-1];
c1 = dftr2( x(1:L2) + x(L2+1:L));
c2 = dftr2((x(1:L2) - x(L2+1:L)).*TF);
cc = [c1';c2'];
c = cc(:);
else
c = x;
end
|
A DIT version can be derived in the form:
C
(
k
)
=
DFT
N
/
2
{
x
(
2
n
)
}
+
W
N
k
DFT
N
/
2
{
x
(
2
n
+
1
)
}
C
(
k
)
=
DFT
N
/
2
{
x
(
2
n
)
}
+
W
N
k
DFT
N
/
2
{
x
(
2
n
+
1
)
}
(43)
C
(
k
+
N
/
2
)
=
DFT
N
/
2
{
x
(
2
n
)
}
-
W
N
k
DFT
N
/
2
{
x
(
2
n
+
1
)
}
C
(
k
+
N
/
2
)
=
DFT
N
/
2
{
x
(
2
n
)
}
-
W
N
k
DFT
N
/
2
{
x
(
2
n
+
1
)
}
(44)
which gives blocks of the frequency domain from samples of the signal.
A recursive Matlab program which implements this is given by:
Listing 2: DIT Recursive FFT for N=2MN=2Mfunction c = dftr(x)
% Recursive Decimation-in-Time FFT algorithm, csb
L = length(x);
if L > 1
L2 = L/2;
ce = dftr(x(1:2:L-1));
co = dftr(x(2:2:L));
TF = exp(-j*2*pi/L).^[0:L2-1];
c1 = TF.*co;
c = [(ce+c1), (ce-c1)];
else
c = x;
end
|
Similar recursive expressions can be developed for other radices and and algorithms.
Most recursive programs do not execute as efficiently as looped or straight code,
but some can be very efficient, e.g. parts of the FFTW.
Note a length-2M2M sequence will require MM recursions, each of which will require
N/2N/2 multiplications. This give the Nlog(N)Nlog(N) formula that the other approaches
also derive.
-
Agarwal, R. C. and Cooley, J. W. (1977, October). New Algorithms for Digital Convolution. IEEE Trans. on ASSP, 25(2), 392–410.
-
Burrus, C. S. and Eschenbacher, P. W. (1981, August). An In–Place, In–Order Prime Factor FFT Algorithm. [Reprinted in it DSP Software, by L.R. Morris, 1983]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(4), 806–817.
-
Blahut, Richard E. (1985). Fast Algorithms for Digital Signal Processing. Reading, Mass.: Addison-Wesley.
-
Burrus, C. S. and Parks, T. W. (1985). DFT/FFT and Convolution Algorithms. New York: John Wiley & Sons.
-
Burrus, C. S. (1977, June). Index Mapping for Multidimensional Formulation of the DFT and Convolution. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-25(3), 239–242.
-
Cooley, J. W. and Tukey, J. W. (1965). An Algorithm for the Machine Calculation of Complex Fourier Series. Math. Computat., 19, 297–301.
-
Elliott, Douglas F. (Ed.). (1987). Handbook of Digital Signal Processing. [Chapter 7 on FFT by Elliott]. San Diego, CA: Academic Press.
-
Good, I. J. (1958). Interaction Algorithm and Practical Fourier Analysis. [Addendum: vol. 22, 1960, pp 372–375]. J. Royal Statist. Soc., B, 20, 361–372.
-
Gold, B. and Rader, C. M. (1969). Digital Processing of Signals. New York: McGraw-Hill.
-
Hegland, Markus and Wheeler, W. W. (1997). Linear Bijections and the Fast Fourier Transform. Applicable Algebra in Engineering, Communication and Computing, 8(2), 143–163.
-
Johnson, H. W. and Burrus, C. S. (1984, March). An In-Order, In-Place Radix-2 FFT. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. (p. 28A.2.1–4). San Diego, CA
-
Lun, D. P-K. and Siu, W-C. (1993, July). An Analysis for the Realization of an In-Place and In-Order Prime Factor Algorithm. IEEE Transactions on Signal Processing, 41(7), 2362–2370.
-
McClellan, J. H. and Rader, C. M. (1979). Number Theory in Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
-
Oppenheim, A. V. and Schafer, R. W. (1999). Discrete-Time Signal Processing. (Second). [Earlier editions in 1975 and 1989]. Englewood Cliffs, NJ: Prentice-Hall.
-
Parks, T. W. and Burrus, C. S. (1987). Digital Filter Design. New York: John Wiley & Sons.
-
Rothweiler, J.H. (1982, February). Implementation of the In-Order Prime Factor FFT Algorithm. IEEE TRANS. ON ASSP, 30, 105-107.
-
Rabiner, L. R. and Rader, C. M. (Eds.). (1972). Digital Signal Processing, selected reprints. New York: IEEE Press.
-
Schatzman, James C. (1996, March). Index Mapping for the Fast Fourier Transform. IEEE Transactions on Signal Processing, 44(3), 717–719.
-
Winograd, S. (1976, April). On Computing the Discrete Fourier Transform. Proc. National Academy of Sciences, USA, 73, 1006–1006.
"The Fast Fourier Transform (FFT) is a landmark algorithm used in fields ranging from signal processing to high-performance computing. First popularized by two American scientists in 1965, the […]"