In 1976, S. Winograd Entry 20 presented a new DFT algorithm
which had significantly fewer multiplications than the Cooley-Tukey
FFT which had been published eleven years earlier. This new Winograd
Fourier Transform Algorithm (WFTA) is based on the type- one index
map from Multidimensional Index Mapping with each of the relatively prime length
short DFT's calculated by very efficient special algorithms. It is
these short algorithms that this section will develop. They use the
index permutation of Rader described in the previous section to
convert the prime length short DFT's into cyclic convolutions.
Winograd developed a method for calculating digital convolution with
the minimum number of multiplications. These optimal algorithms are
based on the polynomial residue reduction techniques of Polynomial Description of Signals: Equation 1 to break the convolution into multiple small ones
Entry 2, Entry 12, Entry 14, Entry 23, Entry 21, Entry 9.
The operation of discrete convolution defined by
y
(
n
)
=
∑
k
h
(
n
-
k
)
x
(
k
)
y
(
n
)
=
∑
k
h
(
n
-
k
)
x
(
k
)
(1)
is called a bilinear operation because, for a fixed h(n)h(n),
y(n)y(n) is a linear function of x(n)x(n) and for a fixed x(n)x(n) it is a
linear function of h(n)h(n). The operation of cyclic convolution is
the same but with all indices evaluated modulo NN.
Recall from Polynomial Description of Signals: Equation 3 that length-N cyclic convolution of
x(n)x(n) and h(n)h(n) can be represented by polynomial multiplication
Y
(
s
)
=
X
(
s
)
H
(
s
)
mod
(
s
N
-
1
)
Y
(
s
)
=
X
(
s
)
H
(
s
)
mod
(
s
N
-
1
)
(2)
This bilinear operation of Equation 1 and Equation 2 can also be
expressed in terms of linear matrix operators and a simpler bilinear
operator denoted by oo which may be only a simple
element-by-element multiplication of the two vectors
Entry 12, Entry 9, Entry 10. This matrix formulation is
Y
=
C
[
A
X
o
B
H
]
Y
=
C
[
A
X
o
B
H
]
(3)
where XX, HH and YY are length-N vectors with elements of
x(n)x(n), h(n)h(n) and y(n)y(n) respectively. The matrices AA and BB
have dimension MM x NN , and CC is NN x MM with M≥NM≥N.
The elements of AA, BB, and CC are constrained to be simple;
typically small integers or rational numbers. It will be these
matrix operators that do the equivalent of the residue reduction on
the polynomials in Equation 2.
In order to derive a useful algorithm of the form Equation 3 to
calculate Equation 1, consider the polynomial formulation
Equation 2 again. To use the residue reduction scheme, the modulus
is factored into relatively prime factors. Fortunately the factoring
of this particular polynomial, sN-1sN-1, has been extensively studied
and it has considerable structure. When factored over the rationals,
which means that the only coefficients allowed are rational numbers,
the factors are called cyclotomic polynomials
Entry 2, Entry 12, Entry 14. The most interesting property for our
purposes is that most of the coefficients of cyclotomic polynomials
are zero and the others are plus or minus unity for orders up to
over one hundred. This means the residue reduction will generally
not require multiplications.
The operations of reducing X(s)X(s) and H(s)H(s) in Equation 2 are carried
out by the matrices AA and BB in Equation 3. The convolution of
the residue polynomials is carried out by the oo operator and the
recombination by the CRT is done by the CC matrix. More details are
in Entry 2, Entry 12, Entry 14, Entry 9, Entry 10 but the important fact is
the AA and BB matrices usually contain only zero and plus or minus
unity entries and the CC matrix only contains rational numbers. The
only general multiplications are those represented by oo. Indeed,
in the theoretical results from computational complexity theory,
these real or complex multiplications are usually the only ones
counted. In practical algorithms, the rational multiplications
represented by CC could be a limiting factor.
The h(n)h(n) terms are fixed for a digital filter, or they
represent the WW terms from Multidimensional Index Mapping: Equation 1 if the convolution is being
used to calculate a DFT. Because of this, d=BHd=BH in Equation 3
can be precalculated and only the AA and CC operators represent
the mathematics done at execution of the algorithm. In order to
exploit this feature, it was shown Entry 23, Entry 9 that the
properties of Equation 3 allow the exchange of the more complicated
operator CC with the simpler operator BB. Specifically this is
given by
Y
=
C
[
A
X
o
B
H
]
Y
=
C
[
A
X
o
B
H
]
(4)
Y
'
=
B
T
[
A
X
o
C
T
H
'
]
Y
'
=
B
T
[
A
X
o
C
T
H
'
]
(5)
where HH' has the same elements as HH, but in a permuted order,
and likewise YY' and YY. This very important property allows
precomputing the more complicated CTHCTH' in Equation 5 rather than
BHBH as in Equation 3.
Because BHBH or CTHCTH' can be precomputed, the bilinear form of
Equation 3 and Equation 5 can be written as a linear form. If an
MM x MM diagonal matrix DD is formed from d=CTHd=CTH, or in the
case of Equation 3 d=BHd=BH, assuming a commutative property for
oo, Equation 5 becomes
Y
'
=
B
T
D
A
X
Y
'
=
B
T
D
A
X
(6)
and Equation 3 becomes
Y
=
C
D
A
X
Y
=
C
D
A
X
(7)
In most cases there is no reason not to use the same reduction
operations on XX and HH, therefore, BB can be the same as AA and
Equation 6 then becomes
Y
'
=
A
T
D
A
X
Y
'
=
A
T
D
A
X
(8)
In order to illustrate how the residue reduction is carried
out and how the A matrix is obtained, the length-5 DFT algorithm
started in The DFT as Convolution or Filtering: Matrix 1 will be continued. The DFT is first converted
to a length-4 cyclic convolution by the index permutation from
The DFT as Convolution or Filtering: Equation 3 to give the cyclic convolution in The DFT as Convolution or Filtering. To avoid
confusion from the permuted order of the data x(n)x(n) in The DFT as Convolution or Filtering,
the cyclic convolution will first be developed without the
permutation, using the polynomial U(s)U(s)
U
(
s
)
=
x
(
1
)
+
x
(
3
)
s
+
x
(
4
)
s
2
+
x
(
2
)
s
3
U
(
s
)
=
x
(
1
)
+
x
(
3
)
s
+
x
(
4
)
s
2
+
x
(
2
)
s
3
(9)
U
(
s
)
=
u
(
0
)
+
u
(
1
)
s
+
u
(
2
)
s
2
+
u
(
3
)
s
3
U
(
s
)
=
u
(
0
)
+
u
(
1
)
s
+
u
(
2
)
s
2
+
u
(
3
)
s
3
(10)
and then the results will be converted back to the permuted x(n)x(n).
The length-4 cyclic convolution in terms of polynomials is
Y
(
s
)
=
U
(
s
)
H
(
s
)
mod
(
s
4
-
1
)
Y
(
s
)
=
U
(
s
)
H
(
s
)
mod
(
s
4
-
1
)
(11)
and the modulus factors into three cyclotomic polynomials
s
4
-
1
=
(
s
2
-
1
)
(
s
2
+
1
)
s
4
-
1
=
(
s
2
-
1
)
(
s
2
+
1
)
(12)
=
(
s
-
1
)
(
s
+
1
)
(
s
2
+
1
)
=
(
s
-
1
)
(
s
+
1
)
(
s
2
+
1
)
(13)
=
P
1
P
2
P
3
=
P
1
P
2
P
3
(14)
Both U(s)U(s) and H(s)H(s) are reduced modulo these three polynomials.
The reduction modulo P1P1 and P2P2 is done in two stages. First it
is done modulo (s2-1)(s2-1), then that residue is further reduced
modulo (s-1)(s-1) and (s+1)(s+1).
U
(
s
)
=
u
0
+
u
1
s
+
u
2
s
2
+
u
3
s
3
U
(
s
)
=
u
0
+
u
1
s
+
u
2
s
2
+
u
3
s
3
(15)
U
'
(
s
)
=
(
(
U
(
s
)
)
)
(
s
2
-
1
)
=
(
u
0
+
u
2
)
+
(
u
1
+
u
3
)
s
U
'
(
s
)
=
(
(
U
(
s
)
)
)
(
s
2
-
1
)
=
(
u
0
+
u
2
)
+
(
u
1
+
u
3
)
s
(16)
U
1
(
s
)
=
(
(
U
'
(
s
)
)
)
P
1
=
(
u
0
+
u
1
+
u
2
+
u
3
)
U
1
(
s
)
=
(
(
U
'
(
s
)
)
)
P
1
=
(
u
0
+
u
1
+
u
2
+
u
3
)
(17)
U
2
(
s
)
=
(
(
U
'
(
s
)
)
)
P
2
=
(
u
0
-
u
1
+
u
2
-
u
3
)
U
2
(
s
)
=
(
(
U
'
(
s
)
)
)
P
2
=
(
u
0
-
u
1
+
u
2
-
u
3
)
(18)
U
3
(
s
)
=
(
(
U
(
s
)
)
)
P
3
=
(
u
0
-
u
2
)
+
(
u
1
-
u
3
)
s
U
3
(
s
)
=
(
(
U
(
s
)
)
)
P
3
=
(
u
0
-
u
2
)
+
(
u
1
-
u
3
)
s
(19)
The reduction in Equation 16 of the data polynomial Equation 15 can
be denoted by a matrix operation on a vector which has the data as
entries.
1
0
1
0
0
1
0
1
u
0
u
1
u
2
u
3
=
u
0
+
u
2
u
1
+
u
3
1
0
1
0
0
1
0
1
u
0
u
1
u
2
u
3
=
u
0
+
u
2
u
1
+
u
3
(20)
and the reduction in Equation 19 is
1
0
-
1
0
0
1
0
-
1
u
0
u
1
u
2
u
3
=
u
0
-
u
2
u
1
-
u
3
1
0
-
1
0
0
1
0
-
1
u
0
u
1
u
2
u
3
=
u
0
-
u
2
u
1
-
u
3
(21)
Combining Equation 20 and Equation 21 gives one operator
1
0
1
0
0
1
0
1
1
0
-
1
0
0
1
0
-
1
u
0
+
u
2
u
1
+
u
3
u
0
-
u
2
u
1
-
u
3
=
u
0
+
u
2
u
1
+
u
3
u
0
-
u
2
u
1
-
u
3
=
w
0
w
1
v
0
v
1
1
0
1
0
0
1
0
1
1
0
-
1
0
0
1
0
-
1
u
0
+
u
2
u
1
+
u
3
u
0
-
u
2
u
1
-
u
3
=
u
0
+
u
2
u
1
+
u
3
u
0
-
u
2
u
1
-
u
3
=
w
0
w
1
v
0
v
1
(22)
Further reduction of v0+v1sv0+v1s is not possible because P3=s2+1P3=s2+1 cannot be factored over the rationals. However s2-1s2-1 can be
factored into P1P2=(s-1)(s+1)P1P2=(s-1)(s+1) and, therefore, w0+w1sw0+w1s can
be further reduced as was done in Equation 17 and Equation 18 by
1
1
w
0
w
1
=
w
0
+
w
1
=
u
0
+
u
2
+
u
1
+
u
3
1
1
w
0
w
1
=
w
0
+
w
1
=
u
0
+
u
2
+
u
1
+
u
3
(23)
1
-
1
w
0
w
1
=
w
0
-
w
1
=
u
0
+
u
2
-
u
1
-
u
3
1
-
1
w
0
w
1
=
w
0
-
w
1
=
u
0
+
u
2
-
u
1
-
u
3
(24)
Combining Equation 22, Equation 23 and Equation 24 gives
1
1
0
0
1
-
1
0
0
0
0
1
0
0
0
0
1
1
0
1
0
0
1
0
1
1
0
-
1
0
0
1
0
-
1
u
0
u
1
u
2
u
3
=
r
0
r
1
v
0
v
1
1
1
0
0
1
-
1
0
0
0
0
1
0
0
0
0
1
1
0
1
0
0
1
0
1
1
0
-
1
0
0
1
0
-
1
u
0
u
1
u
2
u
3
=
r
0
r
1
v
0
v
1
(25)
The same reduction is done to H(s)H(s) and then the convolution of
Equation 11 is done by multiplying each residue polynomial of X(s)X(s)
and H(s)H(s) modulo each corresponding cyclotomic factor of P(s)P(s) and
finally a recombination using the polynomial Chinese Remainder
Theorem (CRT) as in Polynomial Description of Signals: Equation 9 and Polynomial Description of Signals: Equation 13.
Y
(
s
)
=
K
1
(
s
)
U
1
(
s
)
H
1
(
s
)
+
K
2
(
s
)
U
2
(
s
)
H
2
(
s
)
+
K
3
(
s
)
U
3
(
s
)
H
3
(
s
)
Y
(
s
)
=
K
1
(
s
)
U
1
(
s
)
H
1
(
s
)
+
K
2
(
s
)
U
2
(
s
)
H
2
(
s
)
+
K
3
(
s
)
U
3
(
s
)
H
3
(
s
)
(26)
mod (s4-1)(s4-1) (73)
where U1(s)=r1U1(s)=r1 and U2(s)=r2<