Skip to content Skip to navigation Skip to collection information

Connexions

You are here: Home » Content » Fast Fourier Transforms » Convolution Algorithms

Navigation

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • Rice Digital Scholarship

    This collection is included in aLens by: Digital Scholarship at Rice University

    Click the "Rice Digital Scholarship" link to see all content affiliated with them.

  • NSF Partnership display tagshide tags

    This collection is included inLens: NSF Partnership in Signal Processing
    By: Sidney Burrus

    Click the "NSF Partnership" link to see all content affiliated with them.

    Click the tag icon tag icon to display tags associated with this content.

  • Featured Content display tagshide tags

    This collection is included inLens: Connexions Featured Content
    By: Connexions

    Comments:

    "The Fast Fourier Transform (FFT) is a landmark algorithm used in fields ranging from signal processing to high-performance computing. First popularized by two American scientists in 1965, the […]"

    Click the "Featured Content" link to see all content affiliated with them.

    Click the tag icon tag icon to display tags associated with this content.

Also in these lenses

  • UniqU content

    This collection is included inLens: UniqU's lens
    By: UniqU, LLC

    Click the "UniqU content" link to see all content selected in this lens.

  • Lens for Engineering

    This module and collection are included inLens: Lens for Engineering
    By: Sidney Burrus

    Click the "Lens for Engineering" link to see all content selected in this lens.

Recently Viewed

This feature requires Javascript to be enabled.

Tags

(What is a tag?)

These tags come from the endorsement, affiliation, and other lenses that include this content.
 

Convolution Algorithms

Module by: C. Sidney Burrus. E-mail the author

Fast Convolution by the FFT

One of the main applications of the FFT is to do convolution more efficiently than the direct calculation from the definition which is:

y ( n ) = h ( m ) x ( n - m ) y ( n ) = h ( m ) x ( n - m )
(1)

which, with a change of variables, can also be written as:

y ( n ) = x ( m ) h ( n - m ) y ( n ) = x ( m ) h ( n - m )
(2)

This is often used to filter a signal x(n)x(n) with a filter whose impulse response is h(n)h(n). Each output value y(n)y(n) requires NN multiplications and N-1N-1 additions if y(n)y(n) and h(n)h(n) have NN terms. So, for NN output values, on the order of N2N2 arithmetic operations are required.

Because the DFT converts convolution to multiplication:

D F T { y ( n ) } = D F T { h ( n ) } D F T { x ( n ) } D F T { y ( n ) } = D F T { h ( n ) } D F T { x ( n ) }
(3)

can be calculated with the FFT and bring the order of arithmetic operations down to Nlog(N)Nlog(N) which can be significant for large NN.

This approach, which is called “fast convolutions", is a form of block processing since a whole block or segment of x(n)x(n) must be available to calculate even one output value, y(n)y(n). So, a time delay of one block length is always required. Another problem is the filtering use of convolution is usually non-cyclic and the convolution implemented with the DFT is cyclic. This is dealt with by appending zeros to x(n)x(n) and h(n)h(n) such that the output of the cyclic convolution gives one block of the output of the desired non-cyclic convolution.

For filtering and some other applications, one wants “on going" convolution where the filter response h(n)h(n) may be finite in length or duration, but the input x(n)x(n) is of arbitrary length. Two methods have traditionally used to break the input into blocks and use the FFT to convolve the block so that the output that would have been calculated by directly implementing Equation 1 or Equation 2 can be constructed efficiently. These are called “overlap-add" and “over-lap save".

Fast Convolution by Overlap-Add

In order to use the FFT to convolve (or filter) a long input sequence x(n)x(n) with a finite length-M impulse response, h(n)h(n), we partition the input sequence in segments or blocks of length LL. Because convolution (or filtering) is linear, the output is a linear sum of the result of convolving the first block with h(n)h(n) plus the result of convolving the second block with h(n)h(n), plus the rest. Each of these block convolutions can be calculated by using the FFT. The output is the inverse FFT of the product of the FFT of x(n)x(n) and the FFT of h(n)h(n). Since the number of arithmetic operation to calculate the convolution directly is on the order of M2M2 and, if done with the FFT, is on the order of Mlog(M)Mlog(M), there can be a great savings by using the FFT for large MM.

The reason this procedure is not totally straightforward, is the length of the output of convolving a length-L block with a length-M filter is of length L+M-1L+M-1. This means the output blocks cannot simply be concatenated but must be overlapped and added, hence the name for this algorithm is “Overlap-Add".

The second issue that must be taken into account is the fact that the overlap-add steps need non-cyclic convolution and convolution by the FFT is cyclic. This is easily handled by appending L-1L-1 zeros to the impulse response and M-1M-1 zeros to each input block so that all FFTs are of length M+L-1M+L-1. This means there is no aliasing and the implemented cyclic convolution gives the same output as the desired non-cyclic convolution.

The savings in arithmetic can be considerable when implementing convolution or performing FIR digital filtering. However, there are two penalties. The use of blocks introduces a delay of one block length. None of the first block of output can be calculated until all of the first block of input is available. This is not a problem for “off line" or “batch" processing but can be serious for real-time processing. The second penalty is the memory required to store and process the blocks. The continuing reduction of memory cost often removes this problem.

The efficiency in terms of number of arithmetic operations per output point increases for large blocks because of the Mlog(M)Mlog(M) requirements of the FFT. However, the blocks become very large (L>>ML>>M), much of the input block will be the appended zeros and efficiency is lost. For any particular application, taking the particular filter and FFT algorithm being used and the particular hardware being used, a plot of efficiency vs. block length, LL should be made and LL chosen to maximize efficiency given any other constraints that are applicable.

Usually, the block convolutions are done by the FFT, but they could be done by any efficient, finite length method. One could use “rectangular transforms" or “number-theoretic transforms". A generalization of this method is presented later in the notes.

Fast Convolution by Overlap-Save

An alternative approach to the Overlap-Add can be developed by starting with segmenting the output rather than the input. If one considers the calculation of a block of output, it is seen that not only the corresponding input block is needed, but part of the preceding input block also needed. Indeed, one can show that a length M+L-1M+L-1 segment of the input is needed for each output block. So, one saves the last part of the preceding block and concatenates it with the current input block, then convolves that with h(n)h(n) to calculate the current output

Block Processing, a Generalization of Overlap Methods

Convolution is intimately related to the DFT. It was shown in The DFT as Convolution or Filtering that a prime length DFT could be converted to cyclic convolution. It has been long known [48] that convolution can be calculated by multiplying the DFTs of signals.

An important question is what is the fastest method for calculating digital convolution. There are several methods that each have some advantage. The earliest method for fast convolution was the use of sectioning with overlap-add or overlap-save and the FFT [48], [53], [10]. In most cases the convolution is of real data and, therefore, real-data FFTs should be used. That approach is still probably the fastest method for longer convolution on a general purpose computer or microprocessor. The shorter convolutions should simply be calculated directly.

Introduction

The partitioning of long or infinite strings of data into shorter sections or blocks has been used to allow application of the FFT to realize on-going or continuous convolution [57], [30]. This section develops the idea of block processing and shows that it is a generalization of the overlap-add and overlap-save methods [57], [26]. They further generalize the idea to a multidimensional formulation of convolution [3], [15]. Moving in the opposite direction, it is shown that, rather than partitioning a string of scalars into blocks and then into blocks of blocks, one can partition a scalar number into blocks of bits and then include the operation of multiplication in the signal processing formulation. This is called distributed arithmetic [14] and, since it describes operations at the bit level, is completely general. These notes try to present a coherent development of these ideas.

Block Signal Processing

In this section the usual convolution and recursion that implements FIR and IIR discrete-time filters are reformulated in terms of vectors and matrices. Because the same data is partitioned and grouped in a variety of ways, it is important to have a consistent notation in order to be clear. The nthnth element of a data sequence is expressed h(n)h(n) or, in some cases to simplify, hnhn. A block or finite length column vector is denoted h̲nh̲n with nn indicating the nthnth block or section of a longer vector. A matrix, square or rectangular, is indicated by an upper case letter such as HH with a subscript if appropriate.

Block Convolution

The operation of a finite impulse response (FIR) filter is described by a finite convolution as

y ( n ) = k = 0 L - 1 h ( k ) x ( n - k ) y ( n ) = k = 0 L - 1 h ( k ) x ( n - k )
(4)

where x(n)x(n) is causal, h(n)h(n) is causal and of length LL, and the time index nn goes from zero to infinity or some large value. With a change of index variables this becomes

y ( n ) = h ( n - k ) x ( k ) y ( n ) = h ( n - k ) x ( k )
(5)

which can be expressed as a matrix operation by

y 0 y 1 y 2 = h 0 0 0 0 h 1 h 0 0 h 2 h 1 h 0 x 0 x 1 x 2 . y 0 y 1 y 2 = h 0 0 0 0 h 1 h 0 0 h 2 h 1 h 0 x 0 x 1 x 2 .
(6)

The HH matrix of impulse response values is partitioned into NN by NN square sub matrices and the XX and YY vectors are partitioned into length-NN blocks or sections. This is illustrated for N=3N=3 by

H 0 = h 0 0 0 h 1 h 0 0 h 2 h 1 h 0 H 1 = h 3 h 2 h 1 h 4 h 3 h 2 h 5 h 4 h 3 etc. H 0 = h 0 0 0 h 1 h 0 0 h 2 h 1 h 0 H 1 = h 3 h 2 h 1 h 4 h 3 h 2 h 5 h 4 h 3 etc.
(7)
x ̲ 0 = x 0 x 1 x 2 x ̲ 1 = x 3 x 4 x 5 y ̲ 0 = y 0 y 1 y 2 etc. x ̲ 0 = x 0 x 1 x 2 x ̲ 1 = x 3 x 4 x 5 y ̲ 0 = y 0 y 1 y 2 etc.
(8)

Substituting these definitions into Equation 6 gives

y ̲ 0 y ̲ 1 y ̲ 2 = H 0 0 0 0 H 1 H 0 0 H 2 H 1 H 0 x ̲ 0 x ̲ 1 x ̲ 2 y ̲ 0 y ̲ 1 y ̲ 2 = H 0 0 0 0 H 1 H 0 0 H 2 H 1 H 0 x ̲ 0 x ̲ 1 x ̲ 2
(9)

The general expression for the nthnth output block is

y ̲ n = k = 0 n H n - k x ̲ k y ̲ n = k = 0 n H n - k x ̲ k
(10)

which is a vector or block convolution. Since the matrix-vector multiplication within the block convolution is itself a convolution, Equation 10 is a sort of convolution of convolutions and the finite length matrix-vector multiplication can be carried out using the FFT or other fast convolution methods.

The equation for one output block can be written as the product

y ̲ 2 = H 2 H 1 H 0 x ̲ 0 x ̲ 1 x ̲ 2 y ̲ 2 = H 2 H 1 H 0 x ̲ 0 x ̲ 1 x ̲ 2
(11)

and the effects of one input block can be written

H 0 H 1 H 2 x ̲ 1 = y ̲ 0 y ̲ 1 y ̲ 2 . H 0 H 1 H 2 x ̲ 1 = y ̲ 0 y ̲ 1 y ̲ 2 .
(12)

These are generalize statements of overlap save and overlap add [57], [26]. The block length can be longer, shorter, or equal to the filter length.

Block Recursion

Although less well-known, IIR filters can also be implemented with block processing [24], [18], [59], [12], [13]. The block form of an IIR filter is developed in much the same way as for the block convolution implementation of the FIR filter. The general constant coefficient difference equation which describes an IIR filter with recursive coefficients alal, convolution coefficients bkbk, input signal x(n)x(n), and output signal y(n)y(n) is given by

y ( n ) = l = 1 N - 1 a l y n - l + k = 0 M - 1 b k x n - k y ( n ) = l = 1 N - 1 a l y n - l + k = 0 M - 1 b k x n - k
(13)

using both functional notation and subscripts, depending on which is easier and clearer. The impulse response h(n)h(n) is

h ( n ) = l = 1 N - 1 a l h ( n - l ) + k = 0 M - 1 b k δ ( n - k ) h ( n ) = l = 1 N - 1 a l h ( n - l ) + k = 0 M - 1 b k δ ( n - k )
(14)

which can be written in matrix operator form

1 0 0 0 a 1 1 0 a 2 a 1 1 a 3 a 2 a 1 0 a 3 a 2 h 0 h 1 h 2 h 3 h 4 = b 0 b 1 b 2 b 3 0 1 0 0 0 a 1 1 0 a 2 a 1 1 a 3 a 2 a 1 0 a 3 a 2 h 0 h 1 h 2 h 3 h 4 = b 0 b 1 b 2 b 3 0
(15)

In terms of NN by NN submatrices and length-NN blocks, this becomes

A 0 0 0 0 A 1 A 0 0 0 A 1 A 0 h ̲ 0 h ̲ 1 h ̲ 2 = b ̲ 0 b ̲ 1 0 A 0 0 0 0 A 1 A 0 0 0 A 1 A 0 h ̲ 0 h ̲ 1 h ̲ 2 = b ̲ 0 b ̲ 1 0
(16)

From this formulation, a block recursive equation can be written that will generate the impulse response block by block.

A 0 h ̲ n + A 1 h ̲ n - 1 = 0 for n 2 A 0 h ̲ n + A 1 h ̲ n - 1 = 0 for n 2
(17)
h ̲ n = - A 0 - 1 A 1 h ̲ n - 1 = K h ̲ n - 1 for n 2 h ̲ n = - A 0 - 1 A 1 h ̲ n - 1 = K h ̲ n - 1 for n 2
(18)

with initial conditions given by

h ̲ 1 = - A 0 - 1 A 1 A 0 - 1 b ̲ 0 + A 0 - 1 b ̲ 1 h ̲ 1 = - A 0 - 1 A 1 A 0 - 1 b ̲ 0 + A 0 - 1 b ̲ 1
(19)

This can also be written to generate the square partitions of the impulse response matrix by

H n = K H n - 1 for n 2 H n = K H n - 1 for n 2
(20)

with initial conditions given by

H 1 = K A 0 - 1 B 0 + A 0 - 1 B 1 H 1 = K A 0 - 1 B 0 + A 0 - 1 B 1
(21)

ane K=-A0-1A1K=-A0-1A1. This recursively generates square submatrices of HH similar to those defined in Equation 7 and Equation 9 and shows the basic structure of the dynamic system.

Next, we develop the recursive formulation for a general input as described by the scalar difference equation Equation 14 and in matrix operator form by

1 0 0 0 a 1 1 0 a 2 a 1 1 a 3 a 2 a 1 0 a 3 a 2 y 0 y 1 y 2 y 3 y 4 = b 0 0 0 0 b 1 b 0 0 b 2 b 1 b 0 0 b 2 b 1 0 0 b 2 x 0 x 1 x 2 x 3 x 4 1 0 0 0 a 1 1 0 a 2 a 1 1 a 3 a 2 a 1 0 a 3 a 2 y 0 y 1 y 2 y 3 y 4 = b 0 0 0 0 b 1 b 0 0 b 2 b 1 b 0 0 b 2 b 1 0 0 b 2 x 0 x 1 x 2 x 3 x 4
(22)

which, after substituting the definitions of the sub matrices and assuming the block length is larger than the order of the numerator or denominator, becomes

A 0 0 0 0 A 1 A 0 0 0 A 1 A 0 y ̲ 0 y ̲ 1 y ̲ 2 = B 0 0 0 0 B 1 B 0 0 0 B 1 B 0 x ̲ 0 x ̲ 1 x ̲ 2 . A 0 0 0 0 A 1 A 0 0 0 A 1 A 0 y ̲ 0 y ̲ 1 y ̲ 2 = B 0 0 0 0 B 1 B 0 0 0 B 1 B 0 x ̲ 0 x ̲ 1 x ̲ 2 .
(23)

From the partitioned rows of Equation 24, one can write the block recursive relation

A 0 y ̲ n + 1 + A 1 y ̲ n = B 0 x ̲ n + 1 + B 1 x ̲ n A 0 y ̲ n + 1 + A 1 y ̲ n = B 0 x ̲ n + 1 + B 1 x ̲ n
(24)

Solving for y̲n+1y̲n+1 gives

y ̲ n + 1 = - A 0 - 1 A 1 y ̲ n + A 0 - 1 B 0 x ̲ n + 1 + A 0 - 1 B 1 x ̲ n y ̲ n + 1 = - A 0 - 1 A 1 y ̲ n + A 0 - 1 B 0 x ̲ n + 1 + A 0 - 1 B 1 x ̲ n
(25)
y ̲ n + 1 = K y ̲ n + H 0 x ̲ n + 1 + H ˜ 1 x ̲ n y ̲ n + 1 = K y ̲ n + H 0 x ̲ n + 1 + H ˜ 1 x ̲ n
(26)

which is a first order vector difference equation [12], [13]. This is the fundamental block recursive algorithm that implements the original scalar difference equation in Equation 14. It has several important characteristics.

  • The block recursive formulation is similar to a state variable equation but the states are blocks or sections of the output [13], [34], [63], [64].
  • The eigenvalues of KK are the poles of the original scalar problem raised to the NN power plus others that are zero. The longer the block length, the “more stable" the filter is, i.e. the further the poles are from the unit circle [12], [13], [63], [8], [11].
  • If the block length were shorter than the denominator, the vector difference equation would be higher than first order. There would be a non zero A2A2. If the block length were shorter than the numerator, there would be a non zero B2B2 and a higher order block convolution operation. If the block length were one, the order of the vector equation would be the same as the scalar equation. They would be the same equation.
  • The actual arithmetic that goes into the calculation of the output is partly recursive and partly convolution. The longer the block, the more the output is calculated by convolution and, the more arithmetic is required.
  • It is possible to remove the zero eigenvalues in KK by making KK rectangular or square and N by N This results in a form even more similar to a state variable formulation [42], [13]. This is briefly discussed below in section 2.3.
  • There are several ways of using the FFT in the calculation of the various matrix products in Equation 25 and in Equation 27 and Equation 28. Each has some arithmetic advantage for various forms and orders of the original equation. It is also possible to implement some of the operations using rectangular transforms, number theoretic transforms, distributed arithmetic, or other efficient convolution algorithms [13], [63], [6], [16], [62], [49].
  • By choosing the block length equal to the period, a periodically time varying filter can be made block time invariant. In other words, all the time varying characteristics are moved to the finite matrix multiplies which leave the time invariant properties at the block level. This allows use of z-transform and other time-invariant methods to be used for stability analysis and frequency response analysis [37], [38]. It also turns out to be related to filter banks and multi-rate filters [36], [35], [20].

Block State Formulation

It is possible to reduce the size of the matrix operators in the block recursive description Equation 26 to give a form even more like a state variable equation [42], [13], [64]. If KK in Equation 26 has several zero eigenvalues, it should be possible to reduce the size of KK until it has full rank. That was done in [13] and the result is

z ̲ n = K 1 z ̲ n - 1 + K 2 x ̲ n z ̲ n = K 1 z ̲ n - 1 + K 2 x ̲ n
(27)
y ̲ n = H 1 z ̲ n - 1 + H 0 x ̲ n y ̲ n = H 1 z ̲ n - 1 + H 0 x ̲ n
(28)

where H0H0 is the same NN by NN convolution matrix, N1N1 is a rectangular LL by NN partition of the convolution matrix HH, K1K1 is a square NN by NN matrix of full rank, and K2K2 is a rectangular NN by LL matrix.

This is now a minimal state equation whose input and output are blocks of the original input and output. Some of the matrix multiplications can be carried out using the FFT or other techniques.

Block Implementations of Digital Filters

The advantage of the block convolution and recursion implementations is a possible improvement in arithmetic efficiency by using the FFT or other fast convolution methods for some of the multiplications in Equation 10 or Equation 25 [39], [40]. There is the reduction of quantization effects due to an effective decrease in the magnitude of the eigenvalues and the possibility of easier parallel implementation for IIR filters. The disadvantages are a delay of at least one block length and an increased memory requirement.

These methods could also be used in the various filtering methods for evaluating the DFT. This the chirp z-transform, Rader's method, and Goertzel's algorithm.

Multidimensional Formulation

This process of partitioning the data vectors and the operator matrices can be continued by partitioning Equation 10 and Equation 24 and creating blocks of blocks to give a higher dimensional structure. One should use index mapping ideas rather than partitioned matrices for this approach [3], [15].

Periodically Time-Varying Discrete-Time Systems

Most time-varying systems are periodically time-varying and this allows special results to be obtained. If the block length is set equal to the period of the time variations, the resulting block equations are time invariant and all to the time varying characteristics are contained in the matrix multiplications. This allows some of the tools of time invariant systems to be used on periodically time-varying systems.

The PTV system is analyzed in [61], [20], [19], [37], the filter analysis and design problem, which includes the decimation–interpolation structure, is addressed in [22], [38], [36], and the bandwidth compression problem in [35]. These structures can take the form of filter banks [58].

Multirate Filters, Filter Banks, and Wavelets

Another area that is related to periodically time varying systems and to block processing is filter banks [58], [25]. Recently the area of perfect reconstruction filter banks has been further developed and shown to be closely related to wavelet based signal analysis [20], [21], [23], [58]. The filter bank structure has several forms with the polyphase and lattice being particularly interesting.

An idea that has some elements of multirate filters, perfect reconstruction, and distributed arithmetic is given in [27], [28], [29]. Parks has noted that design of multirate filters has some elements in common with complex approximation and of 2-D filter design [54], [56] and is looking at using Tang's method for these designs.

Distributed Arithmetic

Rather than grouping the individual scalar data values in a discrete-time signal into blocks, the scalar values can be partitioned into groups of bits. Because multiplication of integers, multiplication of polynomials, and discrete-time convolution are the same operations, the bit-level description of multiplication can be mixed with the convolution of the signal processing. The resulting structure is called distributed arithmetic [14], [60]. It can be used to create an efficient table look-up scheme to implement an FIR or IIR filter using no multiplications by fetching previously calculated partial products which are stored in a table. Distributed arithmetic, block processing, and multi-dimensional formulations can be combined into an integrated powerful description to implement digital filters and processors. There may be a new form of distributed arithmetic using the ideas in [28], [29].

Direct Fast Convolution and Rectangular Transforms

A relatively new approach uses index mapping directly to convert a one dimensional convolution into a multidimensional convolution [15], [5]. This can be done by either a type-1 or type-2 map. The short convolutions along each dimension are then done by Winograd's optimal algorithms. Unlike for the case of the DFT, there is no savings of arithmetic from the index mapping alone. All the savings comes from efficient short algorithms. In the case of index mapping with convolution, the multiplications must be nested together in the center of the algorithm in the same way as for the WFTA. There is no equivalent to the PFA structure for convolution. The multidimensional convolution can not be calculated by row and column convolutions as the DFT was by row and column DFTs.

It would first seem that applying the index mapping and optimal short algorithms directly to convolution would be more efficient than using DFTs and converting them to convolution to be calculated by the same optimal algorithms. In practical algorithms, however, the DFT method seems to be more efficient [49].

A method that is attractive for special purpose hardware uses distributed arithmetic [14]. This approach uses a table look up of precomputed partial products to produce a system that does convolution without requiring multiplications [17].

Another method that requires special hardware uses number theoretic transforms [9], [41], [45] to calculate convolution. These transforms are defined over finite fields or rings with arithmetic performed modulo special numbers. These transforms have rather limited flexibility, but when they can be used, they are very efficient.

Number Theoretic Transforms for Convolution

Results from Number Theory

A basic review of the number theory useful for signal processing algorithms will be given here with specific emphasis on the congruence theory for number theoretic transforms [47], [31], [46], [41], [55].

Number Theoretic Transforms

Here we look at the conditions placed on a general linear transform in order for it to support cyclic convolution. The form of a linear transformation of a length-N sequence of number is given by

X ( k ) = n = 0 N - 1 t ( n , k ) x ( n ) X ( k ) = n = 0 N - 1 t ( n , k ) x ( n )
(29)

for k=0,1,,(N-1)k=0,1,,(N-1). The definition of cyclic convolution of two sequences is given by

y ( n ) = m = 0 N - 1 x ( m ) h ( n - m ) y ( n ) = m = 0 N - 1 x ( m ) h ( n - m )
(30)

for n=0,1,,(N-1)n=0,1,,(N-1) and all indices evaluated modulo N. We would like to find the properties of the transformation such that it will support the cyclic convolution. This means that if X(k)X(k), H(k)H(k), and Y(k)Y(k) are the transforms of x(n)x(n), h(n)h(n), and y(n)y(n) respectively,

Y ( k ) = X ( k ) H ( k ) . Y ( k ) = X ( k ) H ( k ) .
(31)

The conditions are derived by taking the transform defined in Equation 4 of both sides of equation Equation 5 which gives

Y ( k ) = n = 0 N - 1 t ( n , k ) m = 0 N - 1 x ( m ) h ( n - m ) Y ( k ) = n = 0 N - 1 t ( n , k ) m = 0 N - 1 x ( m ) h ( n - m )
(32)
= m = 0 N - 1 n = 0 N - 1 x ( m ) h ( n - m ) t ( n , k ) . = m = 0 N - 1 n = 0 N - 1 x ( m ) h ( n - m ) t ( n , k ) .
(33)

Making the change of index variables, l=n-ml=n-m, gives

= m = 0 N - 1 l = 0 N - 1 x ( m ) h ( l ) t ( l + m , k ) . = m = 0 N - 1 l = 0 N - 1 x ( m ) h ( l ) t ( l + m , k ) .
(34)

But from Equation 6, this must be

Y ( k ) = n = 0 N - 1 x ( n ) t ( n , k ) m = 0 N - 1 x ( m ) t ( m , k ) Y ( k ) = n = 0 N - 1 x ( n ) t ( n , k ) m = 0 N - 1 x ( m ) t ( m , k )
(35)
= m = 0 N - 1 l = 0 N - 1 x ( m ) h ( l ) t ( n , k ) t ( l , k ) . = m = 0 N - 1 l = 0 N - 1 x ( m ) h ( l ) t ( n , k ) t ( l , k ) .
(36)

This must be true for all x(n)x(n), h(n)h(n), and kk, therefore from Equation 9 and Equation 11 we have

t ( m + l , k ) = t ( m , k ) t ( l , k ) t ( m + l , k ) = t ( m , k ) t ( l , k )
(37)

For l=0l=0 we have

t ( m , k ) = t ( m , k ) t ( 0 , k ) t ( m , k ) = t ( m , k ) t ( 0 , k )
(38)

and, therefore, t(0,k)=1t(0,k)=1. For l=ml=m we have

t ( 2 m , k ) = t ( m , k ) t ( m , k ) = t 2 ( m , k ) t ( 2 m , k ) = t ( m , k ) t ( m , k ) = t 2 ( m , k )
(39)

For l=pml=pm we likewise have

t ( p m , k ) = t p ( m , k ) t ( p m , k ) = t p ( m , k )
(40)

and, therefore,

t N ( m , k ) = t ( N m , k ) = t ( 0 , k ) = 1 . t N ( m , k ) = t ( N m , k ) = t ( 0 , k ) = 1 .
(41)

But

t ( m , k ) = t m ( 1 , k ) = t k ( m , 1 ) , t ( m , k ) = t m ( 1 , k ) = t k ( m , 1 ) ,
(42)

therefore,

t ( m , k ) = t m k ( 1 , 1 ) . t ( m , k ) = t m k ( 1 , 1 ) .
(43)

Defining t(1,1)=αt(1,1)=α gives the form for our general linear transform Equation 4 as

X ( k ) = n = 0 N - 1 α n k x ( n ) X ( k ) = n = 0 N - 1 α n k x ( n )
(44)

where αα is a root of order NN , which means that NN is the smallest integer such that αN=1αN=1.

Theorem 1 The transform Equation 13 supports cyclic convolution if and only if αα is a root of order NN and N-1N-1 is defined.

This is discussed in [2], [4].

Theorem 2 The transform Equation 13 supports cyclic convolution if and only if

N | O ( M ) N | O ( M )
(45)

where

O ( M ) = g c d { p 1 - 1 , p 2 - 1 , , p l - 1 } O ( M ) = g c d { p 1 - 1 , p 2 - 1 , , p l - 1 }
(46)

and

M = p 1 r 1 p 2 r 2 p l r l . M = p 1 r 1 p 2 r 2 p l r l .
(47)

This theorem is a more useful form of Theorem 1. Notice that Nmax=O(M)Nmax=O(M).

One needs to find appropriate NN, MM, and αα such that

  • NN should be appropriate for a fast algorithm and handle the desired sequence lengths.
  • MM should allow the desired dynamic range of the signals and should allow simple modular arithmetic.
  • αα should allow a simple multiplication for αnkx(n)αnkx(n).

We see that if MM is even, it has a factor of 2 and, therefore, O(M)=Nmax=1O(M)=Nmax=1 which implies MM should be odd. If MM is prime the O(M)=M-1O(M)=M-1 which is as large as could be expected in a field of MM integers. For M=2k-1M=2k-1, let kk be a composite k=pqk=pq where pp is prime. Then 2p-12p-1 divides 2pq-12pq-1 and the maximum possible length of the transform will be governed by the length possible for 2p-12p-1. Therefore, only the prime kk need be considered interesting. Numbers of this form are know as Mersenne numbers and have been used by Rader [51]. For Mersenne number transforms, it can be shown that transforms of length at least 2p2p exist and the corresponding α=-2α=-2. Mersenne number transforms are not of as much interest because 2p2p is not highly composite and, therefore, we do not have FFT-type algorithms.

For M=2k+1M=2k+1 and kk odd, 3 divides 2k+12k+1 and the maximum possible transform length is 2. Thus we consider only even kk. Let k=s2tk=s2t, where ss is an odd integer. Then 22t22t divides 2s2t+12s2t+1 and the length of the possible transform will be governed by the length possible for 22t+122t+1. Therefore, integers of the form M=22t+1M=22t+1 are of interest. These numbers are known as Fermat numbers [51]. Fermat numbers are prime for 0t40t4 and are composite for all t5t5.

Since Fermat numbers up to F4F4 are prime, O(Ft)=2bO(Ft)=2b where b=2tb=2t and we can have a Fermat number transform for any length N=2mN=2m where mbmb. For these Fermat primes the integer α=3α=3 is of order N=2bN=2b allowing the largest possible transform length. The integer α=2α=2 is of order N=2b=2t+1N=2b=2t+1. This is particularly attractive since αα to a power is multiplied times the data values in Equation 4.

The following table gives possible parameters for various Fermat number moduli.

Table 1
t b M = F t M = F t N 2 N 2 N 2 N 2 N m a x N m a x αα for NmaxNmax
             
3 8 2 8 + 1 2 8 + 1 16 32 256 3
4 16 2 16 + 1 2 16 + 1 32 64 65536 3
5 32 2 32 + 1 2 32 + 1 64 128 128 2 2
6 64 2 64 + 1 2 64 + 1 128 256 256 2 2

This table gives values of NN for the two most important values of αα which are 2 and 22. The second column give the approximate number of bits in the number representation. The third column gives the Fermat number modulus, the fourth is the maximum convolution length for α=2α=2, the fifth is the maximum length for α=2α=2, the sixth is the maximum length for any αα, and the seventh is the αα for that maximum length. Remember that the first two rows have a Fermat number modulus which is prime and second two rows have a composite Fermat number as modulus. Note the differences.

The books, articles, and presentations that discuss NTT and related topics are [33], [41], [45], [9], [43], [44], [50], [52], [51], [1], [7], [2], [4]. A recent book discusses NT in a signal processing context [32].

References

  1. Agarwal, R. C. and Burrus, C. S. (1973, April). Fast Digital Convolution using Fermat Transforms. In Proceedings of the IEEE 25th Annual Southwestern Conference. (p. 538–543). Houston
  2. Agarwal, R. C. and Burrus, C. S. (1974, April). Fast Convolution Using Fermat Number Transforms with Applications to Digital Filtering. [Reprinted in Number Theory in DSP, by McClellan and Rader, Prentice-Hall, 1979]. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-22(2), 87-97.
  3. Agarwal, R. C. and Burrus, C. S. (1974, February). Fast One-Dimensional Digital Convolution by Multi-Dimensional Techniques. [also in IEEE Press DSP Reprints II, 1979; and Number Theory in DSP, by McClellan and Rader, Prentice-Hall, 1979]. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-22(1), 1–10.
  4. Agarwal, R. C. and Burrus, C. S. (1975, April). Number Theoretic Transforms to Implement Fast Digital Convolution. [also in IEEE Press DSP Reprints II, 1979]. Proceedings of the IEEE, 63(4), 550–560.
  5. Agarwal, R. C. and Cooley, J. W. (1977, October). New Algorithms for Digital Convolution. IEEE Trans. on ASSP, 25(2), 392–410.
  6. Burrus, C. S. and Agarwal, R. C. (1973, November, 5). Efficient Implementation of Recursive Digital Filters. In Proceedings of the Seventh Asilomar Conference on Circuits and Systems. (p. 280–284). Pacific Grove, CA
  7. Burrus, C. S. and Agarwal, R. C. (1974, January). The Use of Number Theoretic Transforms for Convolution. In Presented at the IEEE Arden House Workshop on Digital Signal Processing. Harriman, NY
  8. Barnes, C. W. (1979, March). Roundoff–Noise and Overflow in Normal Digital Filters. IEEE Transactions on Circuit and Systems, CAS-26, 154–155.
  9. Blahut, Richard E. (1985). Fast Algorithms for Digital Signal Processing. Reading, Mass.: Addison-Wesley.
  10. Burrus, C. S. and Parks, T. W. (1985). DFT/FFT and Convolution Algorithms. New York: John Wiley & Sons.
  11. Barnes, C. W. and Shinnaka, S. (1980, August). Block Shift Invariance and Block Implementation of Discrete-Time Filters. IEEE Transactions on Circuit and Systems, CAS-27(4), 667–672.
  12. Burrus, C. S. (1971, November). Block Implementation of Digital Filters. IEEE Transactions on Circuit Theory, CT-18(6), 697–701.
  13. Burrus, C. S. (1972, October). Block Realization of Digital Filters. IEEE Transactions on Audio and Electroacoustics, AU-20(4), 230–235.
  14. Burrus, C. S. (1977, December). Digital Filter Structures Described by Distributed Arithmetic. IEEE Transactions on Circuit and Systems, CAS-24(12), 674–680.
  15. Burrus, C. S. (1977, June). Index Mapping for Multidimensional Formulation of the DFT and Convolution. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-25(3), 239–242.
  16. Burrus, C. S. (1977, May). Recursive Digital Filter Structures Using New High Speed Convolution Algorithms. In IEEE International Conference on Acoustics, Speech, and Signal Processing. (p. 363–365). Hartford, CT
  17. Chu, Shuni and Burrus, C. S. (1982, April). A Prime Factor FFT Algorithm using Distributed Arithmetic. IEEE Transactions on Acoustics, Speech, and Signal Processing, 30(2), 217–227.
  18. Chanoux, D. (1970, June). Synthesis of Recursive Digital Filters using the FFT. IEEE Transactions on Audio and Electroacoustics, AU-18, 211–212.
  19. Claasen, T. A. C. M. and Mecklenbraüker, W. F. G. (1982, March). On Stationary Linear Time–Varying Systems. IEEE Trans. on Circuits and Systems, 29(3), 169–184.
  20. Crochiere, R. E. and Rabiner, L. R. (1983). Multirate Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
  21. Daubechies, Ingrid. (1992). Ten Lectures on Wavelets. [Notes from the 1990 CBMS-NSF Conference on Wavelets and Applications at Lowell, MA]. Philadelphia, PA: SIAM.
  22. Franasek, P. A. and Liu, B. (1967). On a Class of Time–Varying Filters. IEEE Trans. on Information Theory, 13, 477.
  23. Gopinath, R. A. and Burrus, C. S. (1992). Wavelet Transforms and Filter Banks. In Chui, Charles K. (Ed.), Wavelets: A Tutorial in Theory and Applications. (p. 603–655). [Volume 2 in the series: Wavelet Analysis and its Applications]. San Diego, CA: Academic Press.
  24. Gold, B. and Jordan, K. L. (1968, October). A Note on Digital Filter Synthesis. Proceedings of the IEEE, 56, 1717–1718.
  25. Gopinath, Ramesh A. (1992, August). Wavelets and Filter Banks – New Results and Applications. Ph. D. Thesis. Rice University, Houston, Tx.
  26. Gold, B. and Rader, C. M. (1969). Digital Processing of Signals. New York: McGraw-Hill.
  27. Ghanekar, Sachin and Tantaratana, Sawasd and Franks, Lewis E. (1991, May). Implementation of Recursive Filters using Highly Quantized Periodically Time–Varying Coefficients. In Proceedings of the ICASSP-91. (p. 1625–1628). Toronto, Canada
  28. Ghanekar, S. P. and Tantaratana, S. and Franks, L. E. (1992). High-Precision Multiplier–Free FIR Filter Realization with Periodically Time–Varying Coefficients. In Paper Summaries for the 1992 DSP Workshop. (p. 3.3.1). Starved Rock Lodge, Utica, Ill.
  29. Ghanekar, S. P. and Tantaratana, S. and Franks, L. E. (1995). A Class of High-Precision Multiplier–Free FIR Filter Realizations with Periodically Time–Varying Coefficients. IEEE Transactions on Signal Processing, 43(4), 822–830.
  30. Helms, H. D. (1967, June). Fast Fourier Transform Method of Computing Difference Equations and Simulating Filters. IEEE Trans. on Audio and Electroacoustics, AU-15, 85–90.
  31. Hardy, G. H. and Wright, E. M. (1938, 1960). An Introduction to the Theory of Numbers. (Fourth). London: Oxford.
  32. Krishna, H. and Krishna, B. and Lin, K.-Y. and Sun, J.-D. (1994). Computational Number Theory and Digital Signal Processing. Boca Raton, FL: CRC Press.
  33. Knuth, Donald E. (1997). The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. (Third). Reading, MA: Addison-Wesley.
  34. Loeffler, C. M. and Burrus, C. S. (1981, April). Equivalence of Block Filter Representations. In Proceedings of the 1981 IEEE International Symposium on Circuits and Systems. (pp. 546-550). Chicago, IL
  35. Loeffler, C. M. and Burrus, C. S. (1982, May). Periodically Time–Varying Bandwidth Compressor. In Proceedings of the IEEE International Symposium on Circuits and Systems. (p. 663–665). Rome, Italy
  36. Loeffler, C. M. and Burrus, C. S. (1984, October). Optimal Design of Periodically Time Varying and Multirate Digital Filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-32(5), 991-924.
  37. Meyer, R. A. and Burrus, C. S. (1975, March). A Unified Analysis of Multirate and Periodically Time Varying Digital Filters. IEEE Transactions on Circuits and Systems, CAS-22(3), 162–168.
  38. Meyer, R. A. and Burrus, C. S. (1976, February). Design and Implementation of Multirate Digital Filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-24(1), 53–58.
  39. Mitra, S. K. and Gransekaran, R. (1977, July). A Note on Block Implementation of IIR Digital Filters. IEEE Transactions on Circuit and Systems, CAS-24(7),
  40. Mitra, S. K. and Gransekaran, R. (1978, April). Block Implementation of Recursive Digital Filters – New Structures and Properties. IEEE Transactions on Circuit and Systems, CAS-25(4), 200–207.
  41. McClellan, J. H. and Rader, C. M. (1979). Number Theory in Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
  42. Meek, J. W. and Veletsos, A. S. (1972, March). Fast Convolution for Recursive Digital Filters. IEEE Transactions on Audio and Electroacoustics, AU-20, 93–94.
  43. Myers, Douglas G. (1990). Digital Signal Processing, Efficient Convolution and Fourier Transform Techniques. Sydney, Australia: Prentice-Hall.
  44. Nicholson, P. J. (1971, February). Algebraic Theory of Finite Fourier Transforms. Journal of Computer and System Sciences, 5(2), 524–547.
  45. Nussbaumer, H. J. (1981, 1982). Fast Fourier Transform and Convolution Algorithms. (Second). Heidelberg, Germany: Springer-Verlag.
  46. Niven, Ivan and Zuckerman, H. S. (1980). An Introduction to the Theory of Numbers. (Fourth). New York: John Wiley & Sons.
  47. Ore, Oystein. (1948). Number Theory and Its History. New York: McGraw-Hill.
  48. Oppenheim, A. V. and Schafer, R. W. (1999). Discrete-Time Signal Processing. (Second). [Earlier editions in 1975 and 1989]. Englewood Cliffs, NJ: Prentice-Hall.
  49. Pitas, I. and Burrus, C. S. (1983, March). Time and Error Analysis of Digital Convolution by Rectangular Transforms. Signal Processing, 5(2), 153–162.
  50. Pollard, J. M. (1971, April). The Fast Fourier Transform in a Finite Field. Mathematics of Computation, 25(114), 365–374.
  51. Rader, Charles M. (1972, December). Discrete Convolution via Mersenne Transforms. IEEE Transactions on Computers, 21(12), 1269–1273.
  52. Rader, Charles M. (1972, January). Number Theoretic Convolution. In IEEE Signal Processing Workshop. Arden House, Harriman, NY
  53. Rabiner, L. R. and Gold, B. (1975). Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
  54. Shenoy, R. G. and Burnside, Daniel and Parks, T. W. (1992, September). Linear Periodic Systems and Multirate Filter Design. (GEO-002-92-16b). Technical report. Schlumberger–Doll Research Note.
  55. Schroeder, Manfred R. (1984, 1986). Number Theory in Science and Comminication. (Second). Berlin: Springer–Verlag.
  56. Shenoy, R. G. and Parks, T. W. and Burnside, Daniel. (1992). Fourier Analysis of Linear Periodic Systems and Multirate Filter Design. In Paper Summaries for the 1992 DSP Workshop. (p. 2.4.1). Starved Rock Lodge, Utica, Ill.
  57. Stockham, T. G. (1966). High Speed Convolution and Correlation. In AFIPS Conf. Proc. (Vol. 28, p. 229–233). 1966 Spring Joint Computer Conference.
  58. Vaidyanathan, P. P. (1992). Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall.
  59. Voelcker, H. B. and Hartquist, E. E. (1970, June). Digital Filtering via Block Recursion. IEEE Transactions on Audio and Electroacoustics, AU-18, 169–176.
  60. White, S. A. (1989, July). Applications of Distributed Arithmetic to Digital Signal Processing. IEEE ASSP Magazine, 6(3), 4–19.
  61. Zadeh, L. A. (1950). Frequency Analysis of Variable Networks. Proceeding of the IRE, 38(3), 291–299.
  62. Zalcstein, Y. (1971, June). A Note on Fast Convolution. IEEE Transactions on Computers, C-20, 665.
  63. Zeman, Jan and Lindgren, Allen G. (1981, July). Fast Digital Filters with Low Round–Off Noise. IEEE Transactions on Circuit and Systems, CAS-28, 716–723.
  64. Zeman, Jan and Lindgren, Allen G. (1981, October). Fast State–Space Decimator with Very Low Round–off Noise. Signal Processing, 3(4), 377–388.

Collection Navigation

Content actions

Download:

Collection as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add:

Collection to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks

Module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks