This report describes three large DFT modules (17,19,25) which were developed by the first author, Howard Johnson, in June of 1981, and two previously undocumented modules (11,13) which were originally generated at Stanford in 1978 [8].
The length 17 and 19 modules were created in the style of Winograd's convolutional DFT programs with strict adherence to three additional module development principles. First, as much code as possible was automatically generated. This included use of FORTRAN programs to generate the input and output mapping statements and the multiplication statements, and heavy use of EDIT commands to copy redundant sections of code. The code for imaginary data manipulation was copied directly from a working listing of code for the real part. All discussion below therefore centers on producing code only for the real part of the input data array. Even the EDIT commands for copying sections of code and substituting variable names were themselves listed in a command file. In this way, the programmer was prevented from introducing occasional typographical errors which are the bane of the DFT module debugger. Errors which did occur tended to be very large and obvious. Test routines were written to test particularly difficult sections of code before they were inserted into the DFT module (such as the modulo
Once the reduction, or PRE-WEAVE, section was written, the reconstruction, or POST-WEAVE, section was arranged to be the transpose of the reduction equations, according to the method of 'transposing the tensor' [6]. Although the problem of minimizing the number of additions in a module is not necessarily solved by transposing the tensor, due to the inordinate difficulty of finding suitable substitutions which would abate the addition count, and the high probability of error involved in making such substitutions, it was decided to use this method. This method also provides a convenient way to check the correctness of the reconstruction procedure by computing the matrices of the reduction and reconstruction subroutines and testing to see that they are indeed a transpose pair.
Intrinsic to the method of transposing the tensor is the fact that the matrix B used to compute the algorithm's multiplication coefficients from the Nth roots of unity is generally more complicated than either the reduction matrix or its transpose, the reconstruction matrix. This result is a consequence of B having been generated from Toom-Cook polynomial reconstruction procedures and also CRT polynomial reconstructions, which are both known to be more complicated than their associated reduction procedures. The problem of finding B in order to compute a set of multipliers may be neatly circumvented by directly solving a set of linear equations to find a coefficient vector which makes the algorithm work. The details of this trick are not reported here, but may be found in [3]. Suffice to say that given working FORTRAN subroutines for the reduction and reconstruction procedures, a FORTRAN program exists which will solve for the correct coefficients.
The length 25 module does not follow the traditional Winograd approach. This module is an in-line code version of a common-factor 5x5 DFT. Each length 5 DFT is a prime-length convolutional module. The output unscrambling is included in the assignment statements at the end of the program. Some of the length 5 modules used in this program are implemented as scaled versions of conventional length 5 modules in order to save some multiplies by 1/4. The scaling factors are then compensated for by adjusting the twiddle factors. This module has three multiply sections, one for the row DFT's with a data expansion factor of 6/5, one for the twiddle factors (expansion=33/25) and on for the column DFT's (expansion=6/5).
Modules for lengths 11 and 13 are very similar in spirit to the length 19 and 17 modules. Derivations are presented for both the 11 and 13 length modules which are consistent with the listings, although these interpretations may not agree with the original intentions of the designer [8] they are correct in the sense that the algorithms could have been derived in the stated manner. Both the modules are of prime length and they are implemented in Winograd's convolutional style.
FORTRAN listings for all five modules are included with this report in a subroutine form suitable for use in Burrus' PFA program [1]. Addition and multiplication counts given are for complex input data.











