# Connexions

You are here: Home » Content » Computing the fast Fourier transform on SIMD microprocessors » Algorithms

### Recently Viewed

This feature requires Javascript to be enabled.

Inside Collection (Ph.D. thesis):

Thesis by: Anthony Blake. E-mail the author

# Algorithms

Module by: Anthony Blake. E-mail the author

Efficient computation of the fast Fourier transform (FFT) requires an understanding of the computation at every level of abstraction, from the high-level algorithmic view down to the low-level details of the target machine (or failing that, a lot of time to code all known FFT algorithms and exhaustively search the configuration space). This chapter provides an overview of FFT from the mathematical perspective.

Fast Fourier transform algorithms are derived from the discrete Fourier transform (DFT), which is formally defined as [2]:

X k = n = 0 N - 1 ω N n k x n X k = n = 0 N - 1 ω N n k x n
(1)

where k=0,,N-1k=0,,N-1 and ωNωN is the primitive root-of-unity, defined as e-2π-1/Ne-2π-1/N (often referred to as a “twiddle factor” in the context of fast Fourier transforms). XkXk and xnxn are sequences of complex numbers, XkXk being the outputs in the frequency domain, and xnxn being the inputs in the time or space domain.

A source of mild confusion in the FFT literature is the sign of the twiddle factor [17]; the definition in Equation 1 is considered to be the engineers view of the discrete Fourier transform, where the goal is to compute the coefficients of a discrete Fourier series. Mathematicians, on the other hand, typically view the DFT as a method of evaluating a polynomial at the powers of a primitive root of unity, and thus consider Equation 1 to be an inverse DFT [17]. Cooley and Tukey [5], Fiduccia [8] and Bernstein [1] are notable examples of those who adopt the mathematicians convention. This work adopts the engineer's view of a DFT, and thus the inverse discrete Fourier transform (IDFT) is defined by the following equation:

x n = 1 N k = 0 N - 1 ω N - n k X k x n = 1 N k = 0 N - 1 ω N - n k X k
(2)

where n=0,,N-1n=0,,N-1. It should be noted that in some implementations, such as FFTW and the implementation presented in this thesis, the IDFT is actually non-normalised for reasons of efficiency; i.e., IFFT(FFT(x))=NxIFFT(FFT(x))=Nx, thus avoiding division of each of the samples in time by N [9].

## Cooley-Tukey

In 1965 James Cooley and John Tukey published a description of an economical algorithm for computing the DFT that became known as the Cooley-Tukey FFT, or simply the FFT due to its overwhelming popularity [5]. A later investigation by Heideman, Johnson and Burrus [11] revealed that the algorithm had actually been discovered several times in various forms prior to Cooley and Tukey, most notably by Gauss sometime around 1805 [3].

The algorithm recursively divides a transform of size N=N1N2N=N1N2 into smaller DFTs of size N1N1 and N2N2 (where N is highly composite), reducing the time complexity from O(n2n2) to O(nlognnlogn) by exploiting common factors.

As the algorithm recursively divides a DFT, either N1N1 or N2N2 is typically a small factor, and is known as the radix. Small N1N1 characterizes the algorithm as being decimation-in-time (DIT), otherwise the algorithm is decimation-in-frequency (DIF). If the radix changes between stages, then the algorithm is referred to as `mixed-radix'.

For example, a radix-2 decimation-in-time algorithm decomposes Equation 1 into a sum over the even indices (n=2n2n=2n2) and a sum over the odd indices (n=2n2+1n=2n2+1):

X k = n 2 = 0 N / 2 - 1 ω N ( 2 n 2 ) k x 2 n 2 + n 2 = 0 N / 2 - 1 ω N ( 2 n 2 + 1 ) k x 2 n 2 + 1 X k = n 2 = 0 N / 2 - 1 ω N ( 2 n 2 ) k x 2 n 2 + n 2 = 0 N / 2 - 1 ω N ( 2 n 2 + 1 ) k x 2 n 2 + 1
(3)

The trigonometric coefficient in the second sum can be expanded to ωN2n2kωNkωN2n2kωNk, and the term now common to both sums is simplified using the identity ωNmnk=ωN/mnkωNmnk=ωN/mnk. Because one of the trigonometric coefficients in the second sum is constant with respect to the index variable, it may be factored out to obtain:

X k = n 2 = 0 N / 2 - 1 ω N / 2 n 2 k x 2 n 2 + ω N k n 2 = 0 N / 2 - 1 ω N / 2 n 2 k x 2 n 2 + 1 X k = n 2 = 0 N / 2 - 1 ω N / 2 n 2 k x 2 n 2 + ω N k n 2 = 0 N / 2 - 1 ω N / 2 n 2 k x 2 n 2 + 1
(4)

where the two sums are now DFTs of the even indexed terms (x2n2x2n2) and the odd indexed terms (x2n2+1x2n2+1), which are combined with twiddle factor ωNkωNk.

In order to compute the transform more efficiently, the Cooley-Tukey algorithm divides XkXk into two halves, and exploits the periodicity of sub-transforms and symmetries in the trigonometric coefficients. Firstly, Equation 4 is rewritten as two halves with EkEk substituted for the even sub-transform, and OkOk substituted for the odd sub-transform:

X k = E k + ω N k O k X k + N / 2 = E k + N / 2 + ω N k + N / 2 O k + N / 2 X k = E k + ω N k O k X k + N / 2 = E k + N / 2 + ω N k + N / 2 O k + N / 2
(5)

where k=0,,N/2-1k=0,,N/2-1. Because of the periodicity property of the outputs of a DFT, Ek=Ek+N/2Ek=Ek+N/2 and Ok=Ok+N/2Ok=Ok+N/2, Equation 5 simplifies thus:

X k = E k + ω N k O k X k + N / 2 = E k + ω N k + N / 2 O k X k = E k + ω N k O k X k + N / 2 = E k + ω N k + N / 2 O k
(6)

And finally, by exploiting symmetries in the complex exponential function, namely that ωNk+N/2=-ωNkωNk+N/2=-ωNk, the radix-2 DIT FFT can be expressed as:

X k = E k + ω N k O k X k + N / 2 = E k - ω N k O k X k = E k + ω N k O k X k + N / 2 = E k - ω N k O k
(7)

which makes it clear that each pair of outputs share common computation, approximately halving the number of arithmetic operations when compared to the DFT. But since the even and odd terms in Equation 7 are themselves DFTs that can be computed with the FFT, the savings compound with each stage of recursion. The total number of real arithmetic operations required to compute the radix-2 FFT can be expressed with the following recurrence relation:

T ( n ) = 2 T ( n / 2 ) + 5 n - 6 for n 2 0 for n = 1 T ( n ) = 2 T ( n / 2 ) + 5 n - 6 for n 2 0 for n = 1
(8)

which is in Θ(nlogn)Θ(nlogn).

In 1968 a derivitive of the Cooley-Tukey algorithm broke the record for the lowest number of arithmetic operations for computing the DFT [7], [16], [19]. The algorithm was initially discovered by Yavne [20], but was not widely cited until 1984 when it was rediscovered by Duhamel and Hollman [6] and became known as the split-radix algorithm.

The split-radix algorithm improves the arithmetic complexity of the Cooley-Tukey algorithm by further decomposing the odd parts into odd-odd and odd-even parts, while the even parts are left alone because they have no multiplicative factor. More formally, Equation 1 can be re-written as three sums:

X k = n 2 = 0 N / 2 - 1 ω N 2 n 2 k x 2 n 2 + n 4 = 0 N / 4 - 1 ω N ( 4 n 4 + 1 ) k x 4 n 4 + 1 + n 4 = 0 N / 4 - 1 ω N ( 4 n 4 + 3 ) k x 4 n 4 + 3 X k = n 2 = 0 N / 2 - 1 ω N 2 n 2 k x 2 n 2 + n 4 = 0 N / 4 - 1 ω N ( 4 n 4 + 1 ) k x 4 n 4 + 1 + n 4 = 0 N / 4 - 1 ω N ( 4 n 4 + 3 ) k x 4 n 4 + 3
(9)

where n=4n4=2n2n=4n4=2n2. As with the Cooley-Tukey radix-2 example in "Cooley-Tukey", the trigonometric coefficients are expanded and simplified, and the terms constant with respect to the index variables factored out:

X k = n 2 = 0 N / 2 - 1 ω N / 2 n 2 k x 2 n 2 + ω N k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 1 + ω N 3 k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 3 X k = n 2 = 0 N / 2 - 1 ω N / 2 n 2 k x 2 n 2 + ω N k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 1 + ω N 3 k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 3
(10)

By substituting the even sum with UkUk (where k=0,,N/2-1k=0,,N/2-1) and the odd sums with ZkZk and Zk'Zk' (where k=0,,N/4-1k=0,,N/4-1), Equation 10 is simplified:

X k = U k + ω N k Z k + ω N 3 k Z k ' X k = U k + ω N k Z k + ω N 3 k Z k '
(11)

Computation can be factored out of Equation 11 by again exploiting periodicity in the sub-transforms and symmetries in the twiddle factors. Equation 11 is first expressed as an equation of four parts:

X k = U k + ω N k Z k + ω N 3 k Z k ' X k + N / 2 = U k + N / 2 + ω N k + N / 2 Z k + N / 2 + ω N 3 ( k + N / 2 ) Z k + N / 2 ' X k + N / 4 = U k + N / 4 + ω N k + N / 4 Z k + N / 4 + ω N 3 ( k + N / 4 ) Z k + N / 4 ' X k + 3 N / 4 = U k + 3 N / 4 + ω N k + 3 N / 4 Z k + 3 N / 4 + ω N 3 ( k + 3 N / 4 ) Z k + 3 N / 4 ' X k = U k + ω N k Z k + ω N 3 k Z k ' X k + N / 2 = U k + N / 2 + ω N k + N / 2 Z k + N / 2 + ω N 3 ( k + N / 2 ) Z k + N / 2 ' X k + N / 4 = U k + N / 4 + ω N k + N / 4 Z k + N / 4 + ω N 3 ( k + N / 4 ) Z k + N / 4 ' X k + 3 N / 4 = U k + 3 N / 4 + ω N k + 3 N / 4 Z k + 3 N / 4 + ω N 3 ( k + 3 N / 4 ) Z k + 3 N / 4 '
(12)

where k=0,,N/4-1k=0,,N/4-1. The periodicity properties of the sub-transforms can be expressed with the relationships Uk=Uk+N/2Uk=Uk+N/2, Zk=Zk+N/4Zk=Zk+N/4 and Zk'=Zk+N/4'Zk'=Zk+N/4'. These are used to simplify Equation 12 thus:

X k = U k + ω N k Z k + ω N 3 k Z k ' X k + N / 2 = U k + ω N k + N / 2 Z k + ω N 3 ( k + N / 2 ) Z k ' X k + N / 4 = U k + N / 4 + ω N k + N / 4 Z k + ω N 3 ( k + N / 4 ) Z k ' X k + 3 N / 4 = U k + N / 4 + ω N k + 3 N / 4 Z k + ω N 3 ( k + 3 N / 4 ) Z k ' X k = U k + ω N k Z k + ω N 3 k Z k ' X k + N / 2 = U k + ω N k + N / 2 Z k + ω N 3 ( k + N / 2 ) Z k ' X k + N / 4 = U k + N / 4 + ω N k + N / 4 Z k + ω N 3 ( k + N / 4 ) Z k ' X k + 3 N / 4 = U k + N / 4 + ω N k + 3 N / 4 Z k + ω N 3 ( k + 3 N / 4 ) Z k '
(13)

Symmetries in the complex exponential function are again used to expose common computation among each part of the equation; hence

X k = U k + ( ω N k Z k + ω N 3 k Z k ' ) X k + N / 2 = U k - ( ω N k Z k + ω N 3 k Z k ' ) X k + N / 4 = U k + N / 4 - i ( ω N k Z k - ω N 3 k Z k ' ) X k + 3 N / 4 = U k + N / 4 + i ( ω N k Z k - ω N 3 k Z k ' ) X k = U k + ( ω N k Z k + ω N 3 k Z k ' ) X k + N / 2 = U k - ( ω N k Z k + ω N 3 k Z k ' ) X k + N / 4 = U k + N / 4 - i ( ω N k Z k - ω N 3 k Z k ' ) X k + 3 N / 4 = U k + N / 4 + i ( ω N k Z k - ω N 3 k Z k ' )
(14)

which, when recursively applied to the sub-transforms, results in the following recurrence relation for real arithmetic operations:

T ( n ) = T ( n / 2 ) + 2 T ( n / 4 ) + 6 n - 4 for n 2 0 for n = 1 T ( n ) = T ( n / 2 ) + 2 T ( n / 4 ) + 6 n - 4 for n 2 0 for n = 1
(15)

The exact solution T(n)=4nlog2n-6n+8T(n)=4nlog2n-6n+8 for n2n2 was the best arithmetic complexity of all known FFT algorithms for over 30 years, until Van Buskirk was able to break the record in 2004 [15], as described in "Tangent".

Van Buskirk's arithmetic complexity breakthrough was based on a variant of the split-radix algorithm known as the “conjugate-pair” algorithm [13] or the “-1-1 exponent” split-radix algorithm [1], [17]. In 1989 the conjugate-pair algorithm was published with the claim that it had broken the record set by Yavne in 1968 for the lowest number of arithmetic operations for computing the DFT [13]. Unfortunately the reduction in the number of arithmetic operations was due to an error in the author's analysis, and the algorithm was subsequently proven to have an arithmetic count equal to the original split-radix algorithm [10], [18], [14]. Despite initial claims about the arithmetic savings being discredited, the conjugate-pair algorithm has been used to reduce twiddle factor loads in software implementations of the FFT and fast Hartley transform (FHT) [13], and the algorithm was also recently used as the basis for an algorithm that does reduce the arithmetic operation count, as described in "Tangent".

The difference between the conjugate-pair algorithm and the split-radix algorithm is in the decomposition of odd elements. In the standard split-radix algorithm, the odd elements are decomposed into two parts: x4n4+1x4n4+1 and x4n4+3x4n4+3 (see Equation 10), while in the conjugate-pair algorithm, the last sub-sequence is cyclically shifted by -4-4, where negative indices wrap around (i.e., x-1=xN-1x-1=xN-1). The result of this cyclic shift is that twiddle factors are now conjugate pairs. Formally, the conjugate-pair algorithm is defined as:

X k = n 2 = 0 N / 2 - 1 ω N / 2 n 2 k x 2 n 2 + ω N k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 1 + ω N - k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 - 1 X k = n 2 = 0 N / 2 - 1 ω N / 2 n 2 k x 2 n 2 + ω N k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 1 + ω N - k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 - 1
(16)

As with the ordinary split-radix algorithm, a DIT decomposition of the conjugate-pair algorithm can be expressed as a system of equations:

X k = U k + ( ω N k Z k + ω N - k Z k ' ) X k + N / 2 = U k - ( ω N k Z k + ω N - k Z k ' ) X k + N / 4 = U k + N / 4 - i ( ω N k Z k - ω N - k Z k ' ) X k + 3 N / 4 = U k + N / 4 + i ( ω N k Z k - ω N - k Z k ' ) X k = U k + ( ω N k Z k + ω N - k Z k ' ) X k + N / 2 = U k - ( ω N k Z k + ω N - k Z k ' ) X k + N / 4 = U k + N / 4 - i ( ω N k Z k - ω N - k Z k ' ) X k + 3 N / 4 = U k + N / 4 + i ( ω N k Z k - ω N - k Z k ' )
(17)

where k=0,,N/4-1k=0,,N/4-1. As can be seen, the trigonometric coefficients are conjugates – a feature that can be exploited to reduce twiddle factor loads.

## Tangent

In 2004, some thirty years after Yavne set the record for the lowest arithmetic operation count, Van Buskirk posted software to Usenet that had asymptotically reduced the arithmetic operation count by about 6%. Three papers were subsequently published [15], [1], [12] with differing explanations on how to achieve the lowest arithmetic operation count initially demonstrated by Van Buskirk.

Although all three papers describe algorithms that achieve the lowest arithmetic operation count in the same way, and thus can be considered to be different views of the same algorithm, all three papers refer to the algorithms by different names. Lundy and Van Buskirk [15] refer to their algorithm as “scaled odd tail FFT”, Bernstein [1] describes an algorithm named “tangent FFT”, while Johnson and Frigo [12] refer to the algorithm by various names. Many works have cited Johnson and Frigo for the algorithm [4]. Of these names, “tangent FFT” is used in this work because it is the most descriptive; scaling the twiddle factors into tangent form was the linchpin of Van Buskirk's breakthrough in arithmetic complexity.

Bernstein expresses a DIF decomposition of the tangent FFT in a very concise but somewhat obscure polynomial form that was first practised by Fiduccia [8]. In order to be consistent with earlier sections, a DIT decomposition of the tangent FFT using linear functions will be described in this section.1 While the polynomial form is more elegant and concise, expressing the FFT in terms of linear functions has the advantage of mapping to software or hardware more directly.

The key to the tangent FFT is Van Buskirk's observation that if the trigonometric constant ωNk=cosθ+isinθωNk=cosθ+isinθ is factored as (1+itanθ)cosθ(1+itanθ)cosθ or (cotθ+i)sinθ(cotθ+i)sinθ, the multiplication by cosθcosθ or sinθsinθ can sometimes be absorbed elsewhere in the computation, assuming the constants are precomputed, and the remaining multiplication by constants of the form ±(1+itanθ)±(1+itanθ) or ±(cotθ+i)±(cotθ+i) now only costs four floating point operations instead of six, assuming the usual scheme of complex multiplication using four multiply and two add operations.

Firstly, consider the conjugate-pair FFT being recursively scaled by a wavelet sN,ksN,k:

X k s N , k = U k s N / 2 , k s N , k + ω N k s N / 4 , k s N , k Z k + ω N - k s N / 4 , k s N , k Z k ' X k s N , k = U k s N / 2 , k s N , k + ω N k s N / 4 , k s N , k Z k + ω N - k s N / 4 , k s N , k Z k '
(18)

for k=0,,N/4-1k=0,,N/4-1, and where UkUk is evaluated with Xk/sN/2,kXk/sN/2,k, and ZkZk and Zk'Zk' are evaluated with Xk/sN/4,kXk/sN/4,k.

The wavelet is crafted such that it is periodic in kk (i.e., sN,k=sN,k+N/4sN,k=sN,k+N/4) and ωNk(sN/4,k/sN,k)ωNk(sN/4,k/sN,k) is of the form ±(1+itanθ)±(1+itanθ) or ±(cotθ+i)±(cotθ+i). Bernstein defines the wavelet as [1]:

s N , k = 0 max cos 4 2 π k N , sin 4 2 π k N s N , k = 0 max cos 4 2 π k N , sin 4 2 π k N
(19)

Multiplying ZkZk and Zk'Zk' by the scaled constants saves a total of four floating point operations, while scaling UkUk costs four operations, resulting in no gain or loss. But the cost of scaling the result back to XkXk is about 2N2N real operations. In order to realize a reduction in the number of floating point operations, the split-radix FFT is decomposed further, so that the extra operations can be absorbed into constants in the sub-transform. Starting with the unscaled split-radix FFT (see Equation 9), the sum over the x2n2x2n2 terms is itself decomposed with a split-radix decomposition into x4n4x4n4, x8n8+2x8n8+2 and x8n8+6x8n8+6, resulting in a DFT of five sums:

X k = n 4 = 0 N / 4 - 1 ω N 4 n 4 k x 4 n 4 + n 8 = 0 N / 8 - 1 ω N ( 8 n 8 + 2 ) k x 8 n 8 + 2 + n 8 = 0 N / 8 - 1 ω N ( 8 n 8 + 6 ) k x 8 n 8 + 6 + n 4 = 0 N / 4 - 1 ω N ( 4 n 4 + 1 ) k x 4 n 4 + 1 + n 4 = 0 N / 4 - 1 ω N ( 4 n 4 + 3 ) k x 4 n 4 + 3 X k = n 4 = 0 N / 4 - 1 ω N 4 n 4 k x 4 n 4 + n 8 = 0 N / 8 - 1 ω N ( 8 n 8 + 2 ) k x 8 n 8 + 2 + n 8 = 0 N / 8 - 1 ω N ( 8 n 8 + 6 ) k x 8 n 8 + 6 + n 4 = 0 N / 4 - 1 ω N ( 4 n 4 + 1 ) k x 4 n 4 + 1 + n 4 = 0 N / 4 - 1 ω N ( 4 n 4 + 3 ) k x 4 n 4 + 3
(20)

where n=4n4=8n8n=4n4=8n8. As with earlier decompositions, invariants are factored out to obtain:

X k = n 4 = 0 N / 4 - 1 ω N 4 n 4 k x 4 n 4 + ω N 2 k n 8 = 0 N / 8 - 1 ω N / 8 n 8 k x 8 n 8 + 2 + ω N 6 k n 8 = 0 N / 8 - 1 ω N / 8 n 8 k x 8 n 8 + 6 + ω N k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 1 + ω N 3 k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 3 X k = n 4 = 0 N / 4 - 1 ω N 4 n 4 k x 4 n 4 + ω N 2 k n 8 = 0 N / 8 - 1 ω N / 8 n 8 k x 8 n 8 + 2 + ω N 6 k n 8 = 0 N / 8 - 1 ω N / 8 n 8 k x 8 n 8 + 6 + ω N k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 1 + ω N 3 k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 3
(21)

Following from the conjugate-pair split-radix algorithm, x8n8+6x8n8+6 is shifted cyclically by -8-8 and x4n4+3x4n4+3 is shifted cyclically by -4-4 to obtain:

X k = n 4 = 0 N / 4 - 1 ω N 4 n 4 k x 4 n 4 + ω N 2 k n 8 = 0 N / 8 - 1 ω N / 8 n 8 k x 8 n 8 + 2 + ω N - 2 k n 8 = 0 N / 8 - 1 ω N / 8 n 8 k x 8 n 8 - 2 + ω N k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 1 + ω N - k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 - 1 X k = n 4 = 0 N / 4 - 1 ω N 4 n 4 k x 4 n 4 + ω N 2 k n 8 = 0 N / 8 - 1 ω N / 8 n 8 k x 8 n 8 + 2 + ω N - 2 k n 8 = 0 N / 8 - 1 ω N / 8 n 8 k x 8 n 8 - 2 + ω N k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 + 1 + ω N - k n 4 = 0 N / 4 - 1 ω N / 4 n 4 k x 4 n 4 - 1
(22)

where x-n=xN-nx-n=xN-n. Note that the sums over x8n8+2x8n8+2 and x8n8-2x8n8-2 are multiplied by constants that are now conjugate pairs, as are the sums over x4n4+1x4n4+1 and x4n4-1x4n4-1.

The sum over x4n4x4n4 is now substituted with UkUk (where k=0,,N/4-1k=0,,N/4-1), while the sums over x8n8+2x8n8+2 and x8nn-2x8nn-2 are respectively substituted with YkYk and Yk'Yk' (where k=0,,N/8-1k=0,,N/8-1) and the sums over x4n4+1x4n4+1 and x4n4-1x4n4-1 respectively substituted with ZkZk and Zk'Zk' (where k=0,,N/4-1k=0,,N/4-1), simplifying Equation 22 thus:

X k = U k + ω N 2 k Y k + ω N - 2 k Y k ' + ω N k Z k + ω N - k Z k ' X k = U k + ω N 2 k Y k + ω N - 2 k Y k ' + ω N k Z k + ω N - k Z k '
(23)

As with earlier examples, computation is factored out of Equation 23 by exploiting periodicity in the sub-transforms and symmetries in the twiddle factors. Equation 23 is first expressed as a parametric equation of eight parts:

X k + p N = U k + p N + ω N 2 ( k + p N ) Y k + p N + ω N - 2 ( k + p N ) Y k + p N ' + ω N k + p N Z k + p N + ω N - ( k + p N ) Z k + p N ' X k + p N = U k + p N + ω N 2 ( k + p N ) Y k + p N + ω N - 2 ( k + p N ) Y k + p N ' + ω N k + p N Z k + p N + ω N - ( k + p N ) Z k + p N '
(24)

where k=0,,N/8-1k=0,,N/8-1 and p0,12,14,34,18,38,58,78p0,12,14,34,18,38,58,78. By exploiting periodicity in the sub-transforms:

U k = U k + N / 4 Y k = Y k + N / 8 Y k ' = Y k + N / 8 ' Z k = Z k + N / 4 Z k ' = Z k + N / 4 ' U k = U k + N / 4 Y k = Y k + N / 8 Y k ' = Y k + N / 8 ' Z k = Z k + N / 4 Z k ' = Z k + N / 4 '
(25)

and the following symmetries in the twiddle factors:

ω N 2 k = ω N 2 ( k + N / 2 ) = - ω N 2 ( k + N / 4 ) = - ω N 2 ( k + 3 N / 4 ) = - i ω N 2 ( k + N / 8 ) = i ω N 2 ( k + 3 N / 8 ) = - i ω N 2 ( k + 5 N / 8 ) = i ω N 2 ( k + 7 N / 8 ) ω N - 2 k = ω N - 2 ( k + N / 2 ) = - ω N - 2 ( k + N / 4 ) = - ω N - 2 ( k + 3 N / 4 ) = i ω N - 2 ( k + N / 8 ) = - i ω N - 2 ( k + 3 N / 8 ) = i ω N - 2 ( k + 5 N / 8 ) = - i ω N - 2 ( k + 7 N / 8 ) ω N k = - ω N k + N / 2 = - i ω N k + N / 4 = i ω N k + 3 N / 4 ω N - k = - ω N - ( k + N / 2 ) = i ω N - ( k + N / 4 ) = - i ω N - ( k + 3 N / 4 ) ω N k + N / 8 = - i ω N k + 3 N / 8 = - ω N k + 5 N / 8 = i ω N k + 7 N / 8 ω N - ( k + N / 8 ) = i ω N - ( k + 3 N / 8 ) = - ω N - ( k + 5 N / 8 ) = - i ω N - ( k + 7 N / 8 ) ω N 2 k = ω N 2 ( k + N / 2 ) = - ω N 2 ( k + N / 4 ) = - ω N 2 ( k + 3 N / 4 ) = - i ω N 2 ( k + N / 8 ) = i ω N 2 ( k + 3 N / 8 ) = - i ω N 2 ( k + 5 N / 8 ) = i ω N 2 ( k + 7 N / 8 ) ω N - 2 k = ω N - 2 ( k + N / 2 ) = - ω N - 2 ( k + N / 4 ) = - ω N - 2 ( k + 3 N / 4 ) = i ω N - 2 ( k + N / 8 ) = - i ω N - 2 ( k + 3 N / 8 ) = i ω N - 2 ( k + 5 N / 8 ) = - i ω N - 2 ( k + 7 N / 8 ) ω N k = - ω N k + N / 2 = - i ω N k + N / 4 = i ω N k + 3 N / 4 ω N - k = - ω N - ( k + N / 2 ) = i ω N - ( k + N / 4 ) = - i ω N - ( k + 3 N / 4 ) ω N k + N / 8 = - i ω N k + 3 N / 8 = - ω N k + 5 N / 8 = i ω N k + 7 N / 8 ω N - ( k + N / 8 ) = i ω N - ( k + 3 N / 8 ) = - ω N - ( k + 5 N / 8 ) = - i ω N - ( k + 7 N / 8 )
(26)

Equation 24 is rewritten thus:

X k = U k + ( ω N 2 k Y k + ω N - 2 k Y k ' ) + ( ω N k Z k + ω N - k Z k ' ) X k + N / 2 = U k + ( ω N 2 k Y k + ω N - 2 k Y k ' ) - ( ω N k Z k + ω N - k Z k ' ) X k + N / 4 = U k - ( ω N 2 k Y k + ω N - 2 k Y k ' ) - i ( ω N k Z k - ω N - k Z k ' ) X k + 3 N / 4 = U k - ( ω N 2 k Y k + ω N - 2 k Y k ' ) + i ( ω N k Z k - ω N - k Z k ' ) X k + N / 8 = U k + N / 8 - i ( ω N 2 k Y k - ω N - 2 k Y k ' ) + ( ω N k + N / 8 Z k + N / 8 + ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 3 N / 8 = U k + N / 8 + i ( ω N 2 k Y k - ω N - 2 k Y k ' ) - i ( ω N k + N / 8 Z k + N / 8 - ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 5 N / 8 = U k + N / 8 - i ( ω N 2 k Y k - ω N - 2 k Y k ' ) - ( ω N k + N / 8 Z k + N / 8 + ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 7 N / 8 = U k + N / 8 + i ( ω N 2 k Y k - ω N - 2 k Y k ' ) + i ( ω N k + N / 8 Z k + N / 8 - ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k = U k + ( ω N 2 k Y k + ω N - 2 k Y k ' ) + ( ω N k Z k + ω N - k Z k ' ) X k + N / 2 = U k + ( ω N 2 k Y k + ω N - 2 k Y k ' ) - ( ω N k Z k + ω N - k Z k ' ) X k + N / 4 = U k - ( ω N 2 k Y k + ω N - 2 k Y k ' ) - i ( ω N k Z k - ω N - k Z k ' ) X k + 3 N / 4 = U k - ( ω N 2 k Y k + ω N - 2 k Y k ' ) + i ( ω N k Z k - ω N - k Z k ' ) X k + N / 8 = U k + N / 8 - i ( ω N 2 k Y k - ω N - 2 k Y k ' ) + ( ω N k + N / 8 Z k + N / 8 + ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 3 N / 8 = U k + N / 8 + i ( ω N 2 k Y k - ω N - 2 k Y k ' ) - i ( ω N k + N / 8 Z k + N / 8 - ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 5 N / 8 = U k + N / 8 - i ( ω N 2 k Y k - ω N - 2 k Y k ' ) - ( ω N k + N / 8 Z k + N / 8 + ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 7 N / 8 = U k + N / 8 + i ( ω N 2 k Y k - ω N - 2 k Y k ' ) + i ( ω N k + N / 8 Z k + N / 8 - ω N - ( k + N / 8 ) Z k + N / 8 ' )
(27)

By applying terms with the appropriate scaling, viz. αN,k=sN/4,k/sN,kαN,k=sN/4,k/sN,k, βN,k=sN/8,k/sN/2,kβN,k=sN/8,k/sN/2,k, γN,k=sN/4,k+N/8/sN,k+N/8γN,k=sN/4,k+N/8/sN,k+N/8, δN,k=sN/2,k/sN,kδN,k=sN/2,k/sN,k and ϵN,k=sN/2,k+N/8/sN,k+N/8ϵN,k=sN/2,k+N/8/sN,k+N/8, Equation 27 now becomes:

X k / s N , k = U k α N , k + ( β N , k ω N 2 k Y k + β N , k ω N - 2 k Y k ' ) δ N , k + ( α N , k ω N k Z k + α N , k ω N - k Z k ' ) X k + N / 2 / s N , k = U k α N , k + ( β N , k ω N 2 k Y k + β N , k ω N - 2 k Y k ' ) δ N , k - ( α N , k ω N k Z k + α N , k ω N - k Z k ' ) X k + N / 4 / s N , k = U k α N , k - ( β N , k ω N 2 k Y k + β N , k ω N - 2 k Y k ' ) δ N , k - i ( α N , k ω N k Z k - α N , k ω N - k Z k ' ) X k + 3 N / 4 / s N , k = U k α N , k - ( β N , k ω N 2 k Y k + β N , k ω N - 2 k Y k ' ) δ N , k + i ( α N , k ω N k Z k - α N , k ω N - k Z k ' ) X k + N / 8 / s N , k = U k + N / 8 γ N , k - i ( β N , k ω N 2 k Y k - β N , k ω N - 2 k Y k ' ) ϵ N , k + ( γ N , k ω N k + N / 8 Z k + N / 8 + γ N , k ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 3 N / 8 / s N , k = U k + N / 8 γ N , k + i ( β N , k ω N 2 k Y k - β N , k ω N - 2 k Y k ' ) ϵ N , k - i ( γ N , k ω N k + N / 8 Z k + N / 8 - γ N , k ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 5 N / 8 / s N , k = U k + N / 8 γ N , k - i ( β N , k ω N 2 k Y k - β N , k ω N - 2 k Y k ' ) ϵ N , k - ( γ N , k ω N k + N / 8 Z k + N / 8 + γ N , k ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 7 N / 8 / s N , k = U k + N / 8 γ N , k + i ( β N , k ω N 2 k Y k - β N , k ω N - 2 k Y k ' ) ϵ N , k + i ( γ N , k ω N k + N / 8 Z k + N / 8 - γ N , k ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k / s N , k = U k α N , k + ( β N , k ω N 2 k Y k + β N , k ω N - 2 k Y k ' ) δ N , k + ( α N , k ω N k Z k + α N , k ω N - k Z k ' ) X k + N / 2 / s N , k = U k α N , k + ( β N , k ω N 2 k Y k + β N , k ω N - 2 k Y k ' ) δ N , k - ( α N , k ω N k Z k + α N , k ω N - k Z k ' ) X k + N / 4 / s N , k = U k α N , k - ( β N , k ω N 2 k Y k + β N , k ω N - 2 k Y k ' ) δ N , k - i ( α N , k ω N k Z k - α N , k ω N - k Z k ' ) X k + 3 N / 4 / s N , k = U k α N , k - ( β N , k ω N 2 k Y k + β N , k ω N - 2 k Y k ' ) δ N , k + i ( α N , k ω N k Z k - α N , k ω N - k Z k ' ) X k + N / 8 / s N , k = U k + N / 8 γ N , k - i ( β N , k ω N 2 k Y k - β N , k ω N - 2 k Y k ' ) ϵ N , k + ( γ N , k ω N k + N / 8 Z k + N / 8 + γ N , k ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 3 N / 8 / s N , k = U k + N / 8 γ N , k + i ( β N , k ω N 2 k Y k - β N , k ω N - 2 k Y k ' ) ϵ N , k - i ( γ N , k ω N k + N / 8 Z k + N / 8 - γ N , k ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 5 N / 8 / s N , k = U k + N / 8 γ N , k - i ( β N , k ω N 2 k Y k - β N , k ω N - 2 k Y k ' ) ϵ N , k - ( γ N , k ω N k + N / 8 Z k + N / 8 + γ N , k ω N - ( k + N / 8 ) Z k + N / 8 ' ) X k + 7 N / 8 / s N , k = U k + N / 8 γ N , k + i ( β N , k ω N 2 k Y k - β N , k ω N - 2 k Y k ' ) ϵ N , k + i ( γ N , k ω N k + N / 8 Z k + N / 8 - γ N , k ω N - ( k + N / 8 ) Z k + N / 8 ' )
(28)

Assuming that the scaling factors are absorbed into precomputed twiddle factors where possible (e.g., αN,kωNkαN,kωNk is a single precomputed constant), computing Equation 28 requires about (68/8)N(68/8)N real operations, in contrast to (72/8)N(72/8)N operations for Equation 27. Further assuming that operations are skipped in the cases where precomputed constants are of the form ±1±1 or ±i±i, a further 28 real operations are saved in Equation 28. Thus the arithmetic cost of Equation 28 can be expressed with the following recurrence relation:

T ( n ) = 3 T ( n / 4 ) + 2 T ( n / 8 ) + max { n - 12 , 0 } + 7 . 5 n - 16 for n 8 16 for n = 4 4 for n = 2 0 for n = 1 T ( n ) = 3 T ( n / 4 ) + 2 T ( n / 8 ) + max { n - 12 , 0 } + 7 . 5 n - 16 for n 8 16 for n = 4 4 for n = 2 0 for n = 1
(29)

Bernstein gives the exact solution of Equation 29 as [1]:

T ( n ) = ( 34 / 9 ) n log 2 n - ( 142 / 27 ) n - ( 2 / 9 ) ( - 1 ) log 2 n log 2 n + ( 7 / 27 ) ( - 1 ) log 2 n + 7 T ( n ) = ( 34 / 9 ) n log 2 n - ( 142 / 27 ) n - ( 2 / 9 ) ( - 1 ) log 2 n log 2 n + ( 7 / 27 ) ( - 1 ) log 2 n + 7
(30)

for n2n2.

Equation 28 is scaled by sN,ksN,k, but if the application is convolution in frequency, the scaling could be absorbed into the filter, and the cost of scaling the results back to XkXk avoided. Otherwise, a split-radix FFT can be used to change basis, absorbing the scaling into the twiddle factors of the x4n4+1x4n4+1 and x4n4-1x4n4-1 terms:

X k = U k + ( s N , k ω N k Z k + s N , k ω N - k Z k ' ) X k + N / 2 = U k - ( s N , k ω N k Z k + s N , k ω N - k Z k ' ) X k + N / 4 = U k + N / 4 - i ( s N , k ω N k Z k - s N , k ω N - k Z k ' ) X k + 3 N / 4 = U k + N / 4 + i ( s N , k ω N k Z k - s N , k ω N - k Z k ' ) X k = U k + ( s N , k ω N k Z k + s N , k ω N - k Z k ' ) X k + N / 2 = U k - ( s N , k ω N k Z k + s N , k ω N - k Z k ' ) X k + N / 4 = U k + N / 4 - i ( s N , k ω N k Z k - s N , k ω N - k Z k ' ) X k + 3 N / 4 = U k + N / 4 + i ( s N , k ω N k Z k - s N , k ω N - k Z k ' )
(31)

where ZkZk and Zk'Zk' are now recursively computed with the tangent FFT of Equation 28, and the UkUk terms are themselves computed with Equation 31. The arithmetic cost of computing the tangent FFT in the traditional basis is thus expressed:

T ' ( n ) = T ' ( n / 2 ) + 2 T ( n / 4 ) + 3 n + max { 3 n - 16 , 0 } for n 4 4 for n = 2 0 for n = 1 T ' ( n ) = T ' ( n / 2 ) + 2 T ( n / 4 ) + 3 n + max { 3 n - 16 , 0 } for n 4 4 for n = 2 0 for n = 1
(32)

giving rise to Van Buskirk's exact operation count of [15]:

T ' ( n ) = ( 34 / 9 ) n log 2 n - ( 124 / 27 ) n - 2 log 2 n - ( 2 / 9 ) ( - 1 ) log 2 n log 2 n + ( 16 / 27 ) ( - 1 ) log 2 n + 8 T ' ( n ) = ( 34 / 9 ) n log 2 n - ( 124 / 27 ) n - 2 log 2 n - ( 2 / 9 ) ( - 1 ) log 2 n log 2 n + ( 16 / 27 ) ( - 1 ) log 2 n + 8
(33)

for n2n2.

## Footnotes

1. Although derived differently, the underlying structure presented here is identical to the network transpose of Bernstein's tangent FFT. In contrast to Johnson and Frigo's algorithm of four sub-transforms, the view presented here uses only one sub-transform and a scaled split-radix transform.

## References

1. Bernstein, D. (2007). The tangent FFT. Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, 291–300.
2. Burrus, CSS and Parks, T.W. (1991). DFT/FFT and Convolution Algorithms: theory and Implementation. John Wiley & Sons, Inc.
3. Carl, F. Werke, Band 3, Königlichen Gesellschaft der Wissenschaften, Göttingen, 1866. 308–310.
4. Chang, W.H. and Nguyen, T.Q. (2008). On the fixed-point accuracy analysis of FFT algorithms. Signal Processing, IEEE Transactions on, 56(10), 4673–4682.
5. Cooley, J.W. and Tukey, J.W. (1965). An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation, 19(90), 297–301.
6. Duhamel, P. and Hollmann, H. (1984). Split radix FFT algorithm. Electronics Letters, 20(1), 14–16.
7. Duhamel, P. and Vetterli, M. (1990). Fast Fourier transforms: a tutorial review and a state of the art. Signal Processing, 19(4), 259–299.
8. Fiduccia, C.M. (1972). Polynomial evaluation via the division algorithm the fast Fourier transform revisited. In Proceedings of the fourth annual ACM symposium on Theory of computing. (p. 88–93). ACM.
9. Frigo, M. (1999). A fast Fourier transform compiler. In ACM SIGPLAN Notices. (Vol. 34, p. 169–180). ACM.
10. Gopinath, R. A. (1989). Conjugate pair fast Fourier transform. Electronics Letters, 25, 1084.
11. Heideman, M.T. and Johnson, D.H. and Burrus, C.S. (1985). Gauss and the history of the fast Fourier transform. Archive for history of exact sciences, 34(3), 265–277.
12. Johnson, S.G. and Frigo, M. (2006). A modified split-radix FFT with fewer arithmetic operations. Signal Processing, IEEE Transactions on, 55(1), 111–119.
13. Kamar, I. and Elcherif, Y. (1989). Conjugate pair fast Fourier transform. Electronics Letters, 25, 324.
14. Krot, A. M. and Minervina, H. B. (1992). Conjugate pair fast Fourier transform. Electronics Letters, 28, 1143.
15. Lundy, T. and Van Buskirk, J. (2007). A new matrix approach to real FFTs and convolutions of length 2 k. Computing, 80(1), 23–45.
16. Martens, J.B. (1984). Recursive cyclotomic factorization–A new algorithm for calculating the discrete Fourier transform. Acoustics, Speech and Signal Processing, IEEE Transactions on, 32(4), 750–761.
17. Mateer, T. (2008). Fast Fourier transform algorithms with applications. Ph. D. Thesis. Clemson University.
18. Qian, H-S. and Zhao, Z-J. (1990). Conjugate pair fast Fourier transform. Electronics Letters, 26, 541.
19. Vetterli, M. and Nussbaumer, H.J. (1984). Simple FFT and DCT algorithms with reduced number of operations. Signal Processing, 6(4), 267–278.
20. Yavne, R. (1968). An economical method for calculating the discrete Fourier transform. In Proceedings of the December 9-11, 1968, fall joint computer conference, part I. (p. 115–125). ACM.

## Content actions

PDF | EPUB (?)

### What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

PDF | EPUB (?)

### What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

#### Collection to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks

#### Module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks