Note: Your browser may not currently support MathML. See our browser support page for additional details. You can always view the correct math in the PDF version.
One of the main applications of the FFT is to do convolution more efficiently than the direct calculation from the definition which is:
which can also be written as:
This is often used to filter a signal
Because the DFT converts convolution to multiplication:
can be calculated with the FFT and bring the order of arithmetic
operations down to
This approach, which is called “fast convolutions", is a form of
block processing since a whole block of
For filtering and some other applications, one want “on going" convolution
where the filter response
In order to use the FFT to convolve (or filter) a long input sequence
The reason this procedure is not totally straightforward, is the length of the output of convolving a length-L block with a length-M filter is of length
The second issue that must be taken into account is the fact that the overlap-add steps need non-cyclic convolution and convolution by the FFT is cyclic. This is easily handled by appending
The savings in arithmetic can be considerable when implementing convolution or performing FIR digital filtering. However, there are two penalties. The use of blocks introduces a delay of one block length. None of the first block of output can be calculated until all of the first block of input is available. This is not a problem for “off line" or “batch" processing but can be serious for real-time processing. The second penalty is the memory required to store and process the blocks. The continuing reduction of memory cost often removes this problem.
The efficiency in terms of number of arithmetic operations per output point increases for large blocks because of the
Usually, the block convolutions are done by the FFT, but they could be done by any efficient, finite length method. One could use “rectangular transforms" or “number-theoretic transforms". A generalization of this method is presented later in the notes.
An alternative approach to the Overlap-Add can be developed by starting with segmenting the output rather than the input. If one considers the calculation of a block of output, it is seen that not only the corresponding input block is needed, but part of the preceding input block also needed. Indeed, one can show that a length
Convolution is intimately related to the DFT. It was shown in The DFT as Convolution or Filtering that a prime length DFT could be converted to cyclic convolution. It has been long known [48] that convolution can be calculated by multiplying the DFTs of signals.
An important question is what is the fastest method for calculating digital convolution. There are several methods that each have some advantage. The earliest method for fast convolution was the use of sectioning with overlap-add or overlap-save and the FFT [48], [53], [10]. In most cases the convolution is of real data and, therefore, real-data FFTs should be used. That approach is still probably the fastest method for longer convolution on a general purpose computer or microprocessor. The shorter convolutions should simply be calculated directly.
The partitioning of long or infinite strings of data into shorter sections or blocks has been used to allow application of the FFT to realize on-going or continuous convolution [57], [30]. This section develops the idea of block processing and shows that it is a generalization of the overlap-add and overlap-save methods [57], [26]. They further generalize the idea to a multidimensional formulation of convolution [3], [15]. Moving in the opposite direction, it is shown that, rather than partitioning a string of scalars into blocks and then into blocks of blocks, one can partition a scalar number into blocks of bits and then include the operation of multiplication in the signal processing formulation. This is called distributed arithmetic [14] and, since it describes operations at the bit level, is completely general. These notes try to present a coherent development of these ideas.
In this section the usual convolution and recursion that implements FIR
and IIR discrete-time filters are reformulated in terms of vectors and
matrices. Because the same data is partitioned and grouped in a variety
of ways, it is important to have a consistent notation in order to be
clear. The
The operation of a finite impulse response (FIR) filter is described by a finite convolution as
where
which can be expressed as a matrix operation by
The
Substituting these definitions into Equation 6 gives
The general expression for the
which is a vector or block convolution. Since the matrix-vector multiplication within the block convolution is itself a convolution, Equation 11 is a sort of convolution of convolutions and the finite length matrix-vector multiplication can be carried out using the FFT or other fast convolution methods.
The equation for one output block can be written as the product
and the effects of one input block can be written
These are generalize statements of overlap save and overlap add [57], [26]. The block length can be longer, shorter, or equal to the filter length.
Although less well-known, IIR filters can be implemented with block
processing [24], [18], [59], [12], [13]. The block form of an IIR
filter is developed in much the same way as for the block convolution
implementation of the FIR filter. The general constant coefficient
difference equation which describes an IIR filter with recursive
coefficients
using both functional notation and subscripts, depending on which is
easier and clearer. The impulse response
which can be written in matrix operator form
In terms of
From this formulation, a block recursive equation can be written that will generate the impulse response block by block.
with initial conditions given by
This can also be written to generate the square partitions of the impulse response matrix by
with initial conditions given by
ane
Next, we develop the recursive formulation for a general input as described by the scalar difference equation Equation 14 and in matrix operator form by
which, after substituting the definitions of the sub matrices and assuming the block length is larger than the order of the numerator or denominator, becomes
From the partitioned rows of Equation 24, one can write the block recursive relation
Solving for
which is a first order vector difference equation [12], [13]. This is the fundamental block recursive algorithm that implements the original scalar difference equation in Equation 14. It has several important characteristics.
It is possible to reduce the size of the matrix operators in the block
recursive description Equation 26 to give a form even more like a state
variable equation [42], [13], [64]. If
where
This is now a minimal state equation whose input and output are blocks of the original input and output. Some of the matrix multiplications can be carried out using the FFT or other techniques.
The advantage of the block convolution and recursion implementations is a possible improvement in arithmetic efficiency by using the FFT or other fast convolution methods for some of the multiplications in Equation 10 or Equation 25 [39], [40]. There is the reduction of quantization effects due to an effective decrease in the magnitude of the eigenvalues and the possibility of easier parallel implementation for IIR filters. The disadvantages are a delay of at least one block length and an increased memory requirement.
These methods could also be used in the various filtering methods for evaluating the DFT. This the chirp z-transform, Rader's method, and Goertzel's algorithm.
This process of partitioning the data vectors and the operator matrices can be continued by partitioning Equation 10 and Equation 24 and creating blocks of blocks to give a higher dimensional structure. One should use index mapping ideas rather than partitioned matrices for this approach [3], [15].
Most time-varying systems are periodically time-varying and this allows special results to be obtained. If the block length is set equal to the period of the time variations, the resulting block equations are time invariant and all to the time varying characteristics are contained in the matrix multiplications. This allows some of the tools of time invariant systems to be used on periodically time-varying systems.
The PTV system is analyzed in [61], [20], [19], [37], the filter analysis and design problem, which includes the decimation–interpolation structure, is addressed in [22], [38], [36], and the bandwidth compression problem in [35]. These structures can take the form of filter banks [58].
Another area that is related to periodically time varying systems and to block processing is filter banks [58], [25]. Recently the area of perfect reconstruction filter banks has been further developed and shown to be closely related to wavelet based signal analysis [20], [21], [23], [58]. The filter bank structure has several forms with the polyphase and lattice being particularly interesting.
An idea that has some elements of multirate filters, perfect reconstruction, and distributed arithmetic is given in [27], [28], [29]. Parks has noted that design of multirate filters has some elements in common with complex approximation and of 2-D filter design [54], [56] and is looking at using Tang's method for these designs.
Rather than grouping the individual scalar data values in a discrete-time signal into blocks, the scalar values can be partitioned into groups of bits. Because multiplication of integers, multiplication of polynomials, and discrete-time convolution are the same operations, the bit-level description of multiplication can be mixed with the convolution of the signal processing. The resulting structure is called distributed arithmetic [14], [60]. It can be used to create an efficient table look-up scheme to implement an FIR or IIR filter using no multiplications by fetching previously calculated partial products which are stored in a table. Distributed arithmetic, block processing, and multi-dimensional formulations can be combined into an integrated powerful description to implement digital filters and processors. There may be a new form of distributed arithmetic using the ideas in [28], [29].
A relatively new approach uses index mapping directly to convert a one dimensional convolution into a multidimensional convolution [15], [5]. This can be done by either a type-1 or type-2 map. The short convolutions along each dimension are then done by Winograd's optimal algorithms. Unlike for the case of the DFT, there is no savings of arithmetic from the index mapping alone. All the savings comes from efficient short algorithms. In the case of index mapping with convolution, the multiplications must be nested together in the center of the algorithm in the same way as for the WFTA. There is no equivalent to the PFA structure for convolution. The multidimensional convolution can not be calculated by row and column convolutions as the DFT was by row and column DFTs.
It would first seem that applying the index mapping and optimal short algorithms directly to convolution would be more efficient than using DFTs and converting them to convolution to be calculated by the same optimal algorithms. In practical algorithms, however, the DFT method seems to be more efficient [49].
A method that is attractive for special purpose hardware uses distributed arithmetic [14]. This approach uses a table look up of precomputed partial products to produce a system that does convolution without requiring multiplications [17].
Another method that requires special hardware uses number theoretic transforms [9], [41], [45] to calculate convolution. These transforms are defined over finite fields or rings with arithmetic performed modulo special numbers. These transforms have rather limited flexibility, but when they can be used, they are very efficient.
A basic review of the number theory useful for signal processing algorithms will be given here with specific emphasis on the congruence theory for number theoretic transforms [47], [31], [46], [41], [55].
Here we look at the conditions placed on a general linear transform in order for it to support cyclic convolution. The form of a linear transformation of a length-N sequence of number is given by
for
for
The conditions are derived by taking the transform defined in Equation 4 of both sides of equation Equation 5 which gives
Making the change of index variables,
But from Equation 6, this must be
This must be true for all
For
and, therefore,
For
and, therefore,
But
therefore,
Defining
where
Theorem 1
The transform Equation 13 supports cyclic convolution if and only if
This is discussed in [2], [4].
Theorem 2 The transform Equation 13 supports cyclic convolution if and only if
where
and
This theorem is a more useful form of Theorem 1. Notice that
One needs to find appropriate
We see that if
For
Since Fermat numbers up to
The following table gives possible parameters for various Fermat number moduli.
| t | b |
|
|
|
|
|
| 3 | 8 |
|
16 | 32 | 256 | 3 |
| 4 | 16 |
|
32 | 64 | 65536 | 3 |
| 5 | 32 |
|
64 | 128 | 128 |
|
| 6 | 64 |
|
128 | 256 | 256 |
|
This table gives values of
The books, articles, and presentations that discuss NTT and related topics are [33], [41], [45], [9], [43], [44], [50], [52], [51], [1], [7], [2], [4]. A recent book discusses NT in a signal processing context [32].
"The Fast Fourier Transform (FFT) is a landmark algorithm used in fields ranging from signal processing to high-performance computing. First popularized by two American scientists in 1965, the […]"