The Winograd Fourier transform algorithm (WFTA) uses a very
powerful property of the Type-1 index map and the DFT to give a
further reduction of the number of multiplications in the PFA. Using
an operator notation where F1F1 represents taking row DFT's and
F2F2 represents column DFT's, the two-factor PFA of Equation 8 is
represented by
X
=
F
2
F
1
x
X
=
F
2
F
1
x
(9)It has been shown [21], [9] that if
each operator represents identical operations on each row or column,
they commute. Since F1F1 and F2F2 represent length N1N1 and N2N2
DFT's, they commute and Equation 9 can also be written
X
=
F
1
F
2
x
X
=
F
1
F
2
x
(10)If each short DFT in FF is expressed by
three operators as in
Winograd’s Short DFT Algorithms: Equation 8 and Winograd’s Short DFT Algorithms: Figure 2, FF can be factored
as
F
=
A
T
D
A
F
=
A
T
D
A
(11)where A represents the set of additions
done on each row or column that performs the residue reduction as
Winograd’s Short DFT Algorithms: Equation 30. Because of the appearance of the flow graph of AA and
because it is the first operator on xx, it is called a preweave
operator [15]. DD is the set of MM multiplications and
ATAT (or BTBT or CTCT) from Winograd’s Short DFT Algorithms: Equation 5 or Winograd’s Short DFT Algorithms: Equation 6 is the
reconstruction operator called the postweave. Applying Equation 11
to Equation 9 gives
X
=
A
2
T
D
2
A
2
A
1
T
D
1
A
1
x
X
=
A
2
T
D
2
A
2
A
1
T
D
1
A
1
x
(12)This is the PFA of Equation 8 and Figure 1 where A1D1A1A1D1A1
represents the row DFT's on the array formed from xx. Because these
operators commute, Equation 12 can also be written as
X
=
A
2
T
A
1
T
D
2
D
1
A
2
A
1
x
X
=
A
2
T
A
1
T
D
2
D
1
A
2
A
1
x
(13)or
X
=
A
1
T
A
2
T
D
2
D
1
A
2
A
1
x
X
=
A
1
T
A
2
T
D
2
D
1
A
2
A
1
x
(14)but the two adjacent multiplication
operators can be premultiplied and the result represented by one
operator D=D2D1D=D2D1 which is no longer the same for each row or
column. Equation Equation 14 becomes
X
=
A
1
T
A
2
T
D
A
2
A
1
x
X
=
A
1
T
A
2
T
D
A
2
A
1
x
(15)This is the basic idea of the Winograd Fourier transform algorithm.
The commuting of the multiplication operators together in the center
of the algorithm is called nesting and it results in a significant
decrease in the number of multiplications that must be done at the
execution of the algorithm. Pictorially, the PFA of Figure 1 becomes
[12] the WFTA in Figure 2.
The rectangular structure of the preweave addition operators
causes an expansion of the data in the center of the algorithm. The
15 data points in Figure 2 become 18 intermediate values. This
expansion is a major problem in programming the WFTA because it
prevents a straightforward in-place calculation and causes an
increase in the number of required additions and in the number of
multiplier constants that must be precalculated and stored.
From Figure 2 and the idea of premultiplying the individual
multiplication operators, it can be seen why the multiplications by
unity had to be considered in Winograd’s Short DFT Algorithms: Table 1. Even if a multiplier in D1D1
is unity, it may not be in D2D1D2D1. In Figure 2 with factors of
three and five, there appear to be 18 multiplications required
because of the expansion of the length-5 preweave operator, A2A2,
however, one of multipliers in each of the length three and five
operators is unity, so one of the 18 multipliers in the product is
unity. This gives 17 required multiplications - a rather impressive
reduction from the 152=225152=225 multiplications required by direct
calculation. This number of 17 complex multiplications will require
only 34 real multiplications because, as mentioned earlier, the
multiplier constants are purely real or imaginary while the 225
complex multiplications are general and therefore will require four
times as many real multiplications.
The number of additions depends on the order of the pre- and
postweave operators. For example in the length-15 WFTA in Figure 2,
if the length-5 had been done first and last, there would have been
six row addition preweaves in the preweave operator rather than the
five shown. It is difficult to illustrate the algorithm for three or
more factors of N, but the ideas apply to any number of factors.
Each length has an optimal ordering of the pre- and postweave
operators that will minimize the number of additions.
A program for the WFTA is not as simple as for the FFT or
PFA because of the very characteristic that reduces the number of
multiplications, the nesting. A simple two-factor example program is
given in [3] and a general program can be found in
[15], [5]. The same lengths are possible with the PFA and
WFTA and the same short DFT modules can be used, however, the
multiplies in the modules must occur in one place for use in the
WFTA.
"The Fast Fourier Transform (FFT) is a landmark algorithm used in fields ranging from signal processing to high-performance computing. First popularized by two American scientists in 1965, the […]"