by Markus Pueschel, Carnegie Mellon University
In infinite, or nonperiodic, discrete-time signal processing, there
is a strong connection between the z-transform, Laurent series,
convolution, and the discrete-time Fourier transform (DTFT)
[10]. As one may expect, a similar connection exists
for the DFT but bears surprises. Namely, it turns out that the proper
framework for the DFT requires modulo operations of polynomials, which
means working with so-called polynomial algebras
[6]. Associated with polynomial algebras is the Chinese
remainder theorem, which describes the DFT algebraically and can be
used as a tool to concisely derive various FFTs as well as convolution
algorithms [9], [21], [22], [1]
(see also Winograd's Short DFT Algorithms). The polynomial algebra
framework was fully developed for signal processing as part of the
algebraic signal processing theory (ASP). ASP identifies the structure
underlying many transforms used in signal processing, provides deep
insight into their properties, and enables the derivation of their
fast algorithms
[13], [14], [11], [12]. Here we
focus on the algebraic description of the DFT and on the algebraic
derivation of the general-radix Cooley-Tukey FFT (see
Factoring the Signal Processing Operators). The derivation will make use of and extend the
Polynomial Description of Signals.
We start by motivating the appearance
of modulo operations.
The z-transform associates with infinite discrete signals X = (..., x(-1), x(0), x(1), ...) a Laurent series:

X \mapsto X(s) = \sum_{n \in \mathbb{Z}} x(n) s^n.   (1)
Here we used s = z^{-1} to simplify the notation in the following.
The DTFT of X is the evaluation of X(s) on the unit circle:

X(e^{-j\omega}), \quad -\pi < \omega \le \pi.   (2)
Finally, filtering or (linear) convolution is simply the multiplication
of Laurent series:

H * X \leftrightarrow H(s) X(s).   (3)
For finite signals X = (x(0), ..., x(N-1)) one expects that the
equivalent of Equation 1 becomes a mapping to polynomials of
degree N-1,

X \mapsto X(s) = \sum_{n=0}^{N-1} x(n) s^n,   (4)
and that the DFT is an evaluation of these polynomials. Indeed, the
definition of the DFT in Winograd's Short DFT Algorithms shows that

C(k) = X(W_N^k) = X(e^{-j \frac{2\pi k}{N}}), \quad 0 \le k < N,   (5)

i.e., the DFT computes the evaluations of the polynomial X(s)
at the Nth roots of unity.
The problem arises with the equivalent of Equation 3,
since the multiplication H(s)X(s) of two polynomials of degree N-1
yields one of degree 2N-2. Also, it does not coincide with the
circular convolution known to be associated with the DFT. The solution
to both problems is to reduce the product modulo s^N - 1:

H *_{\mathrm{circ}} X \leftrightarrow H(s) X(s) \bmod (s^N - 1).   (6)
Table 1: Infinite and finite discrete-time signal processing.

Concept           | Infinite time                                   | Finite time
------------------+-------------------------------------------------+------------------------------------------
Signal            | X(s) = \sum_{n \in \mathbb{Z}} x(n) s^n         | X(s) = \sum_{n=0}^{N-1} x(n) s^n
Filter            | H(s) = \sum_{n \in \mathbb{Z}} h(n) s^n         | H(s) = \sum_{n=0}^{N-1} h(n) s^n
Convolution       | H(s) X(s)                                       | H(s) X(s) \bmod (s^N - 1)
Fourier transform | DTFT: X(e^{-j\omega}), -\pi < \omega \le \pi    | DFT: X(e^{-j 2\pi k/N}), 0 \le k < N
The resulting polynomial then again has degree N-1, and this form of
convolution becomes equivalent to circular convolution of the
polynomial coefficients. We also observe that the evaluation points in
Equation 5 are precisely the roots of s^N - 1. This connection will
become clear in this chapter.
The discussion is summarized in Table 1.
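As a concrete illustration of Equation 6, the following NumPy sketch (not part of the original text; the helper name circ_conv_poly is ours) multiplies two coefficient sequences as polynomials and wraps the product modulo s^N - 1, then checks the result against the DFT-based circular convolution theorem:

```python
import numpy as np

def circ_conv_poly(h, x):
    """Circular convolution via polynomial multiplication mod (s^N - 1).

    Coefficients are stored in increasing powers of s. Reducing modulo
    s^N - 1 simply wraps the coefficient of s^(N+k) around to s^k.
    """
    N = len(x)
    full = np.convolve(h, x)             # degree-(2N-2) product H(s)X(s)
    reduced = full[:N].astype(complex)
    reduced[:len(full) - N] += full[N:]  # wrap-around: s^N == 1
    return reduced

h = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([1.0, 0.0, 1.0, 0.0])
# compare with circular convolution computed through the DFT
via_dft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x))
assert np.allclose(circ_conv_poly(h, x), via_dft)
```

Both routes produce the same length-N result, which is the point of reducing modulo s^N - 1.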
The proper framework to describe the multiplication of polynomials
modulo a fixed polynomial is the polynomial algebra. Together with the
Chinese remainder theorem, polynomial algebras provide the theoretical
underpinning for the DFT and the Cooley-Tukey FFT.
In this chapter, the DFT will naturally arise as a linear mapping with
respect to chosen bases, i.e., as a matrix. Indeed, the definition
shows that if all inputs and outputs are collected into
vectors X = (X(0), ..., X(N-1)) and C = (C(0), ..., C(N-1)), then
Winograd's Short DFT Algorithms is equivalent to

C = \mathrm{DFT}_N X,   (7)
where

\mathrm{DFT}_N = [W_N^{kn}]_{0 \le k, n < N}.   (8)
The matrix point of view is adopted in the FFT books
[18], [17].
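Equation 8 is easy to check numerically. The sketch below (our own, not from the original text) builds the matrix [W_N^{kn}] with W_N = e^{-2j pi/N} and verifies that applying it to a vector agrees with a library FFT:

```python
import numpy as np

def dft_matrix(N):
    """DFT_N = [W_N^{kn}], W_N = exp(-2j*pi/N), 0 <= k, n < N."""
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    return np.exp(-2j * np.pi * k * n / N)

N = 8
X = np.random.default_rng(0).standard_normal(N)
# C = DFT_N X (Equation 7) agrees with the FFT
assert np.allclose(dft_matrix(N) @ X, np.fft.fft(X))
```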
In this section we introduce polynomial algebras and explain how they
are associated with transforms. Then we identify this connection for the
DFT. Later we use polynomial algebras to derive the Cooley-Tukey FFT.
For further background on the mathematics in this section, and
polynomial algebras in particular, we refer to [6].
An algebra A is a vector space that also provides a
multiplication of its elements such that the distributivity law holds
(see [6] for a complete definition). Examples include
the sets of complex or real numbers C or R, and the sets of
complex or real polynomials in the variable s: C[s] or R[s].
The key player in this chapter is the polynomial algebra. Given
a fixed polynomial P(s) of degree deg(P) = N, we define a
polynomial algebra as the set
\mathbb{C}[s]/P(s) = \{ X(s) \mid \deg(X) < \deg(P) \}   (9)

of polynomials of degree smaller than N, with addition and
multiplication modulo P. Viewed as a vector space, C[s]/P(s)
hence has dimension N.
Every polynomial X(s) in C[s] is reduced modulo P(s) to a unique polynomial
R(s) of degree smaller than N. R(s) is computed
using division with rest, namely
X(s) = Q(s) P(s) + R(s), \quad \deg(R) < \deg(P).   (10)

Regarding this equation modulo P, P(s) becomes zero, and we get
X(s) \equiv R(s) \bmod P(s).   (11)

We read this equation as "X(s) is congruent (or equal to) R(s)
modulo P(s)." We will also write X(s) mod P(s) to denote
that X(s) is reduced modulo P(s). Obviously,
P(s) \equiv 0 \bmod P(s).   (12)

As a simple example we consider A = C[s]/(s^2 - 1), which has
dimension 2. A possible basis is b = (1, s). In A, for example,
s \cdot (s + 1) = s^2 + s \equiv s + 1 \bmod (s^2 - 1), obtained
through division with rest,
s^2 + s = 1 \cdot (s^2 - 1) + (s + 1),   (13)

or simply by replacing s^2 with 1 (since s^2 - 1 = 0 implies s^2 = 1).
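The division with rest in Equation 13 can be reproduced with NumPy's polynomial division (a sketch of ours, not from the original text; np.polydiv expects coefficients in decreasing powers):

```python
import numpy as np

# Reduce X(s) = s^2 + s modulo P(s) = s^2 - 1 by division with rest.
X = np.array([1.0, 1.0, 0.0])   # s^2 + s   (decreasing powers)
P = np.array([1.0, 0.0, -1.0])  # s^2 - 1
Q, R = np.polydiv(X, P)
# X(s) = Q(s) P(s) + R(s) with Q(s) = 1 and R(s) = s + 1
assert np.allclose(Q, [1.0])
assert np.allclose(R, [1.0, 1.0])
```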
Assume P(s) = Q(s) R(s) factors into two coprime (no common factors)
polynomials Q and R. Then the Chinese remainder theorem (CRT) for
polynomials is the linear mapping
\Delta : \mathbb{C}[s]/P(s) \to \mathbb{C}[s]/Q(s) \oplus \mathbb{C}[s]/R(s),
\quad X(s) \mapsto (X(s) \bmod Q(s),\; X(s) \bmod R(s)).   (14)

Here, \oplus is the Cartesian product of vector spaces with
elementwise operation (also called outer direct sum). In words, the CRT
asserts that computing (addition, multiplication, scalar
multiplication) in C[s]/P(s) is equivalent to computing in parallel
in C[s]/Q(s) and C[s]/R(s).
If we choose bases b, c, d in the three polynomial algebras, then
\Delta can be expressed as a matrix. As usual with linear mappings,
this matrix is obtained by mapping every element of b with \Delta,
expressing it in the concatenation c \cup d of the bases c and d,
and writing the results into the columns of the matrix.
As an example, we consider again the polynomial P(s) = s^2 - 1 = (s - 1)(s + 1) and the CRT decomposition

\Delta : \mathbb{C}[s]/(s^2 - 1) \to \mathbb{C}[s]/(s - 1) \oplus \mathbb{C}[s]/(s + 1).   (15)

As bases, we choose b = (1, s), c = (1), d = (1). Then \Delta(1) = (1, 1) with the same coordinate vector in c \cup d = (1, 1). Further,
because of s \equiv 1 \bmod (s - 1) and s \equiv -1 \bmod (s + 1), \Delta(s) = (1, -1) with the same coordinate
vector. Thus, \Delta in matrix form is the so-called butterfly
matrix, which is a DFT of size 2:

\mathrm{DFT}_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
Assume P(s) in C[s] has pairwise distinct
zeros \alpha = (\alpha_0, ..., \alpha_{N-1}). Then the CRT
can be used to completely
decompose C[s]/P(s) into its spectrum:
\Delta : \mathbb{C}[s]/P(s) \to \mathbb{C}[s]/(s - \alpha_0) \oplus \cdots \oplus \mathbb{C}[s]/(s - \alpha_{N-1}),
\quad X(s) \mapsto (X(s) \bmod (s - \alpha_0), \ldots, X(s) \bmod (s - \alpha_{N-1}))
= (X(\alpha_0), \ldots, X(\alpha_{N-1})).   (16)

If we choose a basis b = (P_0(s), ..., P_{N-1}(s)) in C[s]/P(s) and
bases b_i = (1) in each C[s]/(s - \alpha_i), then \Delta, as a linear
mapping, is represented by a matrix. The matrix is obtained by mapping
every basis element P_n, 0 \le n < N, and collecting the results in
the columns of the matrix. The result is
\mathcal{P}_{b,\alpha} = [P_n(\alpha_k)]_{0 \le k, n < N}   (17)

and is called the polynomial transform
for A = C[s]/P(s) with basis b.
If, in general, we choose b_i = (\beta_i) as spectral basis, then the
matrix corresponding to the decomposition Equation 16 is the scaled
polynomial transform
\operatorname{diag}_{0 \le k < N}(1/\beta_k) \, \mathcal{P}_{b,\alpha},   (18)

where \operatorname{diag}_{0 \le n < N}(\gamma_n) denotes a diagonal matrix
with diagonal entries \gamma_n.
We jointly refer to polynomial transforms, scaled or not, as Fourier
transforms.
We show that DFT_N is a polynomial transform for A = C[s]/(s^N - 1) with basis b = (1, s, ..., s^{N-1}). Namely,
s^N - 1 = \prod_{0 \le k < N} (s - W_N^k),   (19)

which means that \Delta takes the form
\Delta : \mathbb{C}[s]/(s^N - 1) \to \mathbb{C}[s]/(s - W_N^0) \oplus \cdots \oplus \mathbb{C}[s]/(s - W_N^{N-1}),
\quad X(s) \mapsto (X(s) \bmod (s - W_N^0), \ldots, X(s) \bmod (s - W_N^{N-1}))
= (X(W_N^0), \ldots, X(W_N^{N-1})).   (20)

The associated polynomial transform hence becomes
\mathcal{P}_{b,\alpha} = [W_N^{kn}]_{0 \le k, n < N} = \mathrm{DFT}_N.   (21)

This interpretation of the DFT has been known at least since
[21], [9] and clarifies the connection between
the evaluation points in Equation 5 and the circular convolution in
Equation 6.
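The equality in Equation 20 can be confirmed directly by evaluating the coefficient polynomial at the roots of unity (a small sanity check of ours, not from the original text):

```python
import numpy as np

# Evaluating X(s) = sum_n x(n) s^n at the N-th roots of unity
# W_N^k = exp(-2j*pi*k/N) reproduces the DFT of the coefficients.
N = 6
x = np.arange(N, dtype=float)               # coefficients x(0), ..., x(N-1)
W = np.exp(-2j * np.pi * np.arange(N) / N)  # W_N^k for k = 0, ..., N-1
# np.polyval expects decreasing powers, hence the reversal
evals = np.array([np.polyval(x[::-1], w) for w in W])
assert np.allclose(evals, np.fft.fft(x))
```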
In [4], DFTs of types 1-4 are defined, with type 1
being the standard DFT. In the algebraic framework, type 3 is obtained
by choosing A = C[s]/(s^N + 1) as algebra with the same basis as
before:

\mathcal{P}_{b,\alpha} = [W_N^{(k + 1/2)n}]_{0 \le k, n < N} = \mathrm{DFT\text{-}3}_N.   (22)

The DFTs of type 2 and 4 are scaled polynomial
transforms [13].
Knowing the polynomial algebra underlying the DFT enables us to derive
the Cooley-Tukey FFT algebraically. This means that instead of
manipulating the DFT definition, we manipulate the polynomial algebra
C[s]/(s^N - 1). The basic idea is intuitive. We showed that the DFT
is the matrix representation of the complete decomposition
Equation 20. The Cooley-Tukey FFT is now derived by performing this
decomposition in steps as shown in Figure 1. Each
step yields a sparse matrix; hence, DFT_N is factorized into a
product of sparse matrices, which will be the matrix representation of
the Cooley-Tukey FFT.
This stepwise decomposition can be formulated generically for
polynomial transforms [15], [12]. Here, we
consider only the DFT.
We first introduce the matrix notation we will use, in particular
the Kronecker product formalism that became mainstream for FFTs
in [18], [17].
Then we derive the radix-2 FFT using a factorization of
s^N - 1. Subsequently, we obtain the general-radix FFT using a decomposition of s^N - 1.
We denote the N x N identity matrix with I_N, and diagonal
matrices with

\operatorname{diag}_{0 \le k < N}(\gamma_k) = \begin{bmatrix} \gamma_0 & & \\ & \ddots & \\ & & \gamma_{N-1} \end{bmatrix}.   (23)

The N x N stride
permutation matrix is defined for N = KM by the permutation

L^N_M : iM + j \mapsto jK + i   (24)

for 0 \le i < K, 0 \le j < M. This definition shows that L^N_M
transposes a K x M matrix stored in row-major order.
Alternatively, we can write

L^N_M : i \mapsto iM \bmod (N - 1), \quad \text{for } 0 \le i < N - 1, \qquad N - 1 \mapsto N - 1.   (25)

For example (\cdot means 0),
L^6_2 =
[ 1 .  .  .  .  . ]
[ .  .  1 .  .  . ]
[ .  .  .  .  1 . ]
[ .  1 .  .  .  . ]
[ .  .  .  1 .  . ]
[ .  .  .  .  .  1 ].   (26)

L^N_{N/2} is sometimes called the perfect shuffle.
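Since L^N_M transposes a K x M matrix stored in row-major order, it is one line of NumPy (a sketch of ours, not from the original text):

```python
import numpy as np

def stride_perm(x, M):
    """Apply the stride permutation L^N_M, N = K*M: equivalent to
    transposing a K x M matrix stored in row-major order."""
    x = np.asarray(x)
    K = len(x) // M
    return x.reshape(K, M).T.reshape(len(x))

x = np.arange(6)
# L^6_2 reads the input at stride 2: (x0, x2, x4, x1, x3, x5)
assert np.array_equal(stride_perm(x, 2), [0, 2, 4, 1, 3, 5])
# (L^N_M)^{-1} = L^N_K: applying L^6_3 undoes L^6_2
assert np.array_equal(stride_perm(stride_perm(x, 2), 3), x)
```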
Further, we use matrix operators; namely the direct sum

A \oplus B = \begin{bmatrix} A & \\ & B \end{bmatrix}   (27)

and the Kronecker or tensor product
A \otimes B = [a_{k,\ell} B]_{k,\ell}, \quad \text{for } A = [a_{k,\ell}].   (28)

In particular,
I_n \otimes A = A \oplus \cdots \oplus A = \begin{bmatrix} A & & \\ & \ddots & \\ & & A \end{bmatrix}   (29)

is block-diagonal.
We may also construct a larger matrix as a matrix of matrices.
If an algorithm for a transform is given as a product of sparse matrices
built from the constructs above, then an algorithm for the transpose or
inverse of the transform can be readily derived using mathematical
properties including
(AB)^T = B^T A^T, \qquad (AB)^{-1} = B^{-1} A^{-1},
(A \oplus B)^T = A^T \oplus B^T, \qquad (A \oplus B)^{-1} = A^{-1} \oplus B^{-1},
(A \otimes B)^T = A^T \otimes B^T, \qquad (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}.   (31)

Permutation matrices are orthogonal, i.e., P^T = P^{-1}. The
transposition or inversion of diagonal matrices is obvious.
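The Kronecker identities in Equation 31 are easy to confirm numerically (a small sanity check of ours, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))

# transposition and inversion distribute over the Kronecker product
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))
```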
The DFT decomposes A = C[s]/(s^N - 1) with basis b = (1, s, ..., s^{N-1}) as shown in Equation 20. We assume N = 2M.
Then
s^{2M} - 1 = (s^M - 1)(s^M + 1)   (32)

factors and we can apply the CRT in
the following steps:
\mathbb{C}[s]/(s^N - 1) \to \mathbb{C}[s]/(s^M - 1) \oplus \mathbb{C}[s]/(s^M + 1)   (33)

\to \bigoplus_{0 \le i < M} \mathbb{C}[s]/(s - W_N^{2i}) \;\oplus\; \bigoplus_{0 \le i < M} \mathbb{C}[s]/(s - W_N^{2i+1})   (34)

\to \bigoplus_{0 \le i < N} \mathbb{C}[s]/(s - W_N^i).   (35)

As bases in the smaller algebras C[s]/(s^M - 1) and C[s]/(s^M + 1),
we choose c = d = (1, s, ..., s^{M-1}). The derivation of an
algorithm for DFT_N based on Equation 33-Equation 35 is now
completely mechanical by reading off the matrix for each of the three
decomposition steps. The product of these matrices is equal to
DFT_N.
First, we derive the base change matrix B corresponding to
Equation 33. To do so, we have to express the base elements
s^n in b in the basis c \cup d; the coordinate
vectors are the columns of B. For 0 \le n < M, s^n is actually
contained in c and d, so the first M columns of B are

B = \begin{bmatrix} I_M & * \\ I_M & * \end{bmatrix},   (36)

where the entries * are determined next.
For the base elements s^{M+n}, 0 \le n < M, we have

s^{M+n} \equiv s^n \bmod (s^M - 1), \qquad s^{M+n} \equiv -s^n \bmod (s^M + 1),   (37)

which yields the final result
B = \begin{bmatrix} I_M & I_M \\ I_M & -I_M \end{bmatrix} = \mathrm{DFT}_2 \otimes I_M.   (38)

Next, we consider step Equation 34. C[s]/(s^M - 1) is decomposed
by DFT_M and C[s]/(s^M + 1) by DFT-3_M in Equation 22.
Finally, the permutation in step Equation 35 is the perfect
shuffle L^N_M, which interleaves the even and odd spectral
components (even and odd exponents of W_N).
The final algorithm obtained is

\mathrm{DFT}_{2M} = L^N_M (\mathrm{DFT}_M \oplus \mathrm{DFT\text{-}3}_M)(\mathrm{DFT}_2 \otimes I_M).   (39)

To obtain a better known form, we use \mathrm{DFT\text{-}3}_M = \mathrm{DFT}_M D_M,
with D_M = \operatorname{diag}_{0 \le i < M}(W_N^i), which is evident from
Equation 22. It yields
\mathrm{DFT}_{2M} = L^N_M (\mathrm{DFT}_M \oplus \mathrm{DFT}_M D_M)(\mathrm{DFT}_2 \otimes I_M)
= L^N_M (I_2 \otimes \mathrm{DFT}_M)(I_M \oplus D_M)(\mathrm{DFT}_2 \otimes I_M).   (40)

The last expression is the radix-2 decimation-in-frequency
Cooley-Tukey FFT. The corresponding decimation-in-time version is
obtained by transposition using Equation 31 and the symmetry of
the DFT:
\mathrm{DFT}_{2M} = (\mathrm{DFT}_2 \otimes I_M)(I_M \oplus D_M)(I_2 \otimes \mathrm{DFT}_M) L^N_2.   (41)

The entries of the diagonal matrix I_M \oplus D_M are
commonly called twiddle factors.
The above method for deriving DFT algorithms is used extensively in
[9].
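The factorization in Equation 40 can be verified numerically by building each sparse factor explicitly (our own sanity check, not from the original text):

```python
import numpy as np

def dft(N):
    """DFT_N = [W_N^{kn}] with W_N = exp(-2j*pi/N)."""
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N)

# Radix-2 decimation-in-frequency factorization (Equation 40), N = 2M:
# DFT_N = L^N_M (I_2 (x) DFT_M) (I_M (+) D_M) (DFT_2 (x) I_M)
M = 4
N = 2 * M
# perfect shuffle L^N_M as a permutation matrix
L = np.eye(N)[np.arange(N).reshape(2, M).T.reshape(N)]
# twiddle diagonal I_M (+) D_M with D_M = diag(W_N^i)
D = np.diag(np.r_[np.ones(M), np.exp(-2j * np.pi * np.arange(M) / N)])
rhs = L @ np.kron(np.eye(2), dft(M)) @ D @ np.kron(dft(2), np.eye(M))
assert np.allclose(rhs, dft(N))
```

Only the two small DFTs are dense; the shuffle, the twiddle diagonal, and the butterfly stage are all sparse, which is the source of the FFT's speedup.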
To algebraically derive the general-radix FFT, we use the decomposition property of s^N - 1. Namely, if N = KM then
s^N - 1 = (s^M)^K - 1.   (42)

Decomposition means that the polynomial is written as the composition
of two polynomials: here, s^M is inserted into s^K - 1.
Note that this is a special property: most polynomials do not decompose.
Based on this polynomial decomposition, we obtain the following
stepwise decomposition of C[s]/(s^N - 1), which is more general than
the previous one in Equation 33-Equation 35. The basic idea
is to first decompose with respect to the outer polynomial t^K - 1, t = s^M, and then completely [15]:
\mathbb{C}[s]/(s^N - 1) = \mathbb{C}[s]/((s^M)^K - 1) \to \bigoplus_{0 \le i < K} \mathbb{C}[s]/(s^M - W_K^i)   (43)

\to \bigoplus_{0 \le i < K} \bigoplus_{0 \le j < M} \mathbb{C}[s]/(s - W_N^{jK + i})   (44)

\to \bigoplus_{0 \le i < N} \mathbb{C}[s]/(s - W_N^i).   (45)

As bases in the smaller algebras C[s]/(s^M - W_K^i) we choose
c_i = (1, s, ..., s^{M-1}). As before, the derivation is completely
mechanical from here: only the three matrices corresponding to
Equation 43-Equation 45 have to be read off.
The first decomposition step requires us to compute s^n mod (s^M - W_K^i), 0 \le n < N. To do so, we decompose the index
n as n = \ell M + m and compute
s^n = s^{\ell M + m} = (s^M)^{\ell} s^m \equiv W_K^{i\ell} s^m \bmod (s^M - W_K^i).   (46)

This shows that the matrix for Equation 43
is given by \mathrm{DFT}_K \otimes I_M.
In step Equation 44, each C[s]/(s^M - W_K^i) is completely
decomposed by its polynomial transform
\mathrm{DFT}_M(i, K) = \mathrm{DFT}_M \cdot \operatorname{diag}_{0 \le j < M}(W_N^{ij}).   (47)

At this point, C[s]/(s^N - 1) is
completely decomposed, but the spectrum is ordered according to
jK + i, 0 \le i < K, 0 \le j < M (j runs faster). The desired
order is iM + j.
Thus, in step Equation 45, we need to apply the permutation that moves the
spectral component at position iM + j to position jK + i, which is exactly the stride permutation L^N_M
in Equation 24.
In summary, we obtain the Cooley-Tukey decimation-in-frequency FFT with
arbitrary radix:
\mathrm{DFT}_N = L^N_M \Big( \bigoplus_{0 \le i < K} \mathrm{DFT}_M \cdot \operatorname{diag}_{0 \le j < M}(W_N^{ij}) \Big)(\mathrm{DFT}_K \otimes I_M)
= L^N_M (I_K \otimes \mathrm{DFT}_M)\, T^N_M\, (\mathrm{DFT}_K \otimes I_M).   (48)

The matrix T^N_M is diagonal and usually called the
twiddle matrix. Transposition using
Equation 31 yields the corresponding decimation-in-time version:
\mathrm{DFT}_N = (\mathrm{DFT}_K \otimes I_M)\, T^N_M\, (I_K \otimes \mathrm{DFT}_M)\, L^N_K.   (49)
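The general-radix factorization of Equation 48 checks out numerically for any N = KM (our own sketch, not from the original text; shown here for K = 3, M = 4):

```python
import numpy as np

def dft(N):
    """DFT_N = [W_N^{kn}] with W_N = exp(-2j*pi/N)."""
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N)

# General-radix Cooley-Tukey (Equation 48), N = K*M:
# DFT_N = L^N_M (I_K (x) DFT_M) T^N_M (DFT_K (x) I_M)
K, M = 3, 4
N = K * M
# stride permutation L^N_M as a permutation matrix
L = np.eye(N)[np.arange(N).reshape(K, M).T.reshape(N)]
# twiddle matrix T^N_M: block i carries diag_j(W_N^{ij})
T = np.diag(np.exp(-2j * np.pi
                   * np.outer(np.arange(K), np.arange(M)).reshape(N) / N))
rhs = L @ np.kron(np.eye(K), dft(M)) @ T @ np.kron(dft(K), np.eye(M))
assert np.allclose(rhs, dft(N))
```

Changing K and M (with N = KM) exercises different radices; K = 2 recovers the radix-2 case of Equation 40.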
This chapter only scratches the surface of the connection between
algebra and the DFT or signal processing in general. We provide
a few references for further reading.
As mentioned before, the use of polynomial algebras and the CRT
underlies much of the early work on FFTs and convolution algorithms
[21], [9], [1]. For example, Winograd's
work on FFTs minimizes the number of nonrational multiplications.
This, together with his work on complexity theory in general, makes heavy use of
polynomial algebras [21], [22], [23] (see
Chapter Winograd's Short DFT Algorithms for more information and references).
See [2] for a broad treatment of algebraic complexity
theory.
Since C[s]/(s^N - 1) = C[C_N] can be viewed as a group algebra for the
cyclic group, the methods shown in this chapter can be translated into
the context of group representation theory. For example,
[8] derives the general-radix FFT using group theory
and already uses the Kronecker product formalism. So does Beth,
who started the area of FFTs for more general
groups [3], [7]. However, Fourier transforms for
groups have found only sporadic applications [16].
Along a related line of work, [5] shows that, using group
theory, it is possible to discover and generate certain algorithms
for trigonometric transforms, such as discrete cosine transforms
(DCTs), automatically using a computer program.
More recently, the polynomial algebra framework was extended to
include most trigonometric transforms used in signal processing
[14], [13], namely, besides the DFT, the discrete cosine and sine transforms and various real DFTs, including the discrete Hartley transform.
It turns out that the same
techniques shown in this chapter can then be applied to derive,
explain, and classify most of the known algorithms for these
transforms and even obtain a large class of new algorithms including
generalradix algorithms for the discrete cosine and sine transforms
(DCTs/DSTs) [15], [12], [19], [20].
This latter line of work is part of the algebraic signal processing
theory briefly discussed next.
The algebraic properties of transforms used in the above work on
algorithm derivation hint at a connection between algebra and
(linear) signal processing itself. This is indeed the case and was
fully developed in a recent body of work called algebraic signal
processing theory (ASP). The foundation of ASP is developed in [13], [14], [11].
ASP first identifies the algebraic structure of (linear) signal
processing: the common assumptions on available operations for filters
and signals make the set of filters an algebra A and the
set of signals an associated A-module M. ASP then
builds a signal processing theory formally from the axiomatic
definition of a signal model: a triple (A, M, \Phi), where
\Phi generalizes the idea of the z-transform to mappings from
vector spaces of signal values to M. If a signal model is given,
other concepts, such as spectrum, Fourier transform, and frequency
response, are automatically defined but take different forms for
different models. For example, infinite and finite time as discussed
in Table 1 are two examples of signal models. Their
complete definition is provided in Table 2, which
identifies the proper notion of a finite z-transform as a mapping
C^n \to C[s]/(s^n - 1).
Table 2: Infinite and finite time models as defined in ASP.

Signal model | Infinite time                                                                            | Finite time
-------------+------------------------------------------------------------------------------------------+--------------------------
A            | { \sum_{n \in \mathbb{Z}} H(n) s^n \mid (..., H(-1), H(0), H(1), ...) \in \ell^1(\mathbb{Z}) } | C[s]/(s^n - 1)
M            | { \sum_{n \in \mathbb{Z}} X(n) s^n \mid (..., X(-1), X(0), X(1), ...) \in \ell^2(\mathbb{Z}) } | C[s]/(s^n - 1)
\Phi         | \Phi : \ell^2(\mathbb{Z}) \to M, defined in Equation 1                                   | \Phi : C^n \to M, defined in Equation 4
ASP shows that many signal models are in principle possible, each with
its own notion of filtering and Fourier transform. Those that support
shift-invariance have commutative algebras. Since finite-dimensional
commutative algebras are precisely polynomial algebras, their
appearance in signal processing is explained. For example, ASP
identifies the polynomial algebras underlying the DCTs and DSTs, which
hence become Fourier transforms in the ASP sense. The associated signal models
are called finite space models since they support signal
processing based on an undirected shift operator, different from the
directed time shift. Many more insights are provided by ASP, including
the need for and choices of boundary conditions, properties
of transforms, techniques for deriving new signal models, and the
concise derivation of algorithms mentioned before.

[1] Auslander, L., Feig, E., and Winograd, S. (1984). Abelian Semisimple Algebras and Algorithms for the Discrete Fourier Transform. Advances in Applied Mathematics, 5, 31-55.
[2] Bürgisser, P., Clausen, M., and Shokrollahi, M. A. (1997). Algebraic Complexity Theory. Springer.
[3] Beth, Th. (1984). Verfahren der Schnellen Fouriertransformation [Fast Fourier Transform Methods]. Teubner.
[4] Britanak, V. and Rao, K. R. (1999). The fast generalized discrete Fourier transforms: A unified approach to the discrete sinusoidal transforms computation. Signal Processing, 79, 135-150.
[5] Egner, S. and Püschel, M. (2001). Automatic Generation of Fast Discrete Signal Transforms. IEEE Transactions on Signal Processing, 49(9), 1992-2002.
[6] Fuhrmann, P. A. (1996). A Polynomial Approach to Linear Algebra. New York: Springer Verlag.
[7] Maslen, D. and Rockmore, D. (1995). Generalized FFTs - A survey of some recent results. In Proceedings of DIMACS Workshop in Groups and Computation (Vol. 28, pp. 182-238).
[8] Nicholson, P. J. (1971). Algebraic Theory of Finite Fourier Transforms. Journal of Computer and System Sciences, 5, 524-547.
[9] Nussbaumer, H. J. (1982). Fast Fourier Transform and Convolution Algorithms (2nd ed.). Springer.
[10] Oppenheim, A. V., Schafer, R. W., and Buck, J. R. (1999). Discrete-Time Signal Processing (2nd ed.). Prentice Hall.
[11] Püschel, M. and Moura, J. M. F. Algebraic Signal Processing Theory. Available at http://arxiv.org/abs/cs.IT/0612077.
[12] Püschel, M. and Moura, J. M. F. (2008). Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for DCTs and DSTs. IEEE Transactions on Signal Processing, 56(4), 1502-1521. A longer version is available at http://arxiv.org/abs/cs.IT/0702025.
[13] Püschel, M. and Moura, J. M. F. (2008). Algebraic Signal Processing Theory: Foundation and 1-D Time. IEEE Transactions on Signal Processing, 56(8), 3572-3585.
[14] Püschel, M. and Moura, J. M. F. (2008). Algebraic Signal Processing Theory: 1-D Space. IEEE Transactions on Signal Processing, 56(8), 3586-3599.
[15] Püschel, M. and Moura, J. M. F. (2003). The Algebraic Approach to the Discrete Cosine and Sine Transforms and their Fast Algorithms. SIAM Journal of Computing, 32(5), 1280-1316.
[16] Rockmore, D. (1995). Some applications of generalized FFTs. In Proceedings of DIMACS Workshop in Groups and Computation (Vol. 28, pp. 329-370).
[17] Tolimieri, R., An, M., and Lu, C. (1997). Algorithms for Discrete Fourier Transforms and Convolution (2nd ed.). Springer.
[18] Van Loan, C. (1992). Computational Frameworks for the Fast Fourier Transform. SIAM.
[19] Voronenko, Y. and Püschel, M. (2006). Algebraic derivation of general radix Cooley-Tukey algorithms for the real discrete Fourier transform. In Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (Vol. 3, pp. 876-879).
[20] Voronenko, Y. and Püschel, M. Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for Real DFTs. To appear, IEEE Transactions on Signal Processing.
[21] Winograd, S. (1978). On Computing the Discrete Fourier Transform. Mathematics of Computation, 32, 175-199.
[22] Winograd, S. (1979). On the Multiplicative Complexity of the Discrete Fourier Transform. Advances in Mathematics, 32, 83-117.
[23] Winograd, S. (1980). Arithmetic Complexity of Computation. SIAM.