# Connexions

You are here: Home » Content » An Introduction to Source-Coding: Quantization, DPCM, Transform Coding, and Sub-band Coding » Sub-Optimum Orthogonal Transforms

### Lenses

What is a lens?

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

#### Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
• NSF Partnership

This collection is included inLens: NSF Partnership in Signal Processing
By: Sidney Burrus

Click the "NSF Partnership" link to see all content affiliated with them.

Click the tag icon to display tags associated with this content.

• Featured Content

This collection is included inLens: Connexions Featured Content
By: Connexions

Click the "Featured Content" link to see all content affiliated with them.

Click the tag icon to display tags associated with this content.

#### Also in these lenses

• UniqU content

This collection is included inLens: UniqU's lens
By: UniqU, LLC

Click the "UniqU content" link to see all content selected in this lens.

• Lens for Engineering

This module and collection are included inLens: Lens for Engineering
By: Sidney Burrus

Click the "Lens for Engineering" link to see all content selected in this lens.

### Recently Viewed

This feature requires Javascript to be enabled.

### Tags

(What is a tag?)

These tags come from the endorsement, affiliation, and other lenses that include this content.

Inside Collection (Course):

Course by: Phil Schniter. E-mail the author

# Sub-Optimum Orthogonal Transforms

Module by: Phil Schniter. E-mail the author

Summary: The KLT, which is the optimal orthogonal transform for transform coding, requires knowledge of the source statistics that may be unavailable in practice, and eigendecomposition that may be too expensive in practice. This motivates the consideration of other orthogonal transforms, like the DFT, DCT, and DHT. Here we discuss these and analyze the resulting performance when possible. In addition, we discuss practical matters like real-valued DFTs and fast DCT implementations.

• Goal: Recall that the goal of the optimal orthogonal transform was to minimize the ratio of geometric to arithmetic output variances:
k=0N-1σyk21/N1Nk=0N-1σyk2.k=0N-1σyk21/N1Nk=0N-1σyk2.
(1)
The ratio Equation 1 attains its maximum value (=1=1) when σyk2σyk2 are equal for all k and takes on much smaller values when the difference between the σyk2σyk2 (sometimes called the dynamic range of {σyk2}{σyk2}) is large.
• Problem with KLT: The KLT, i.e., the orthogonal transform minimizing Equation 1, is a function of the input signal statistics. Specifically, the KLT equals the eigenvector matrix of the input autocorrelation matrix. Unfortunately, realistic signals are non-stationary, requiring continual KLT redesign if optimality is to be preserved, and eigenvector computation is computationally intensive, especially for large N. Thus, the question becomes: Are there fixed orthogonal transforms that do a good job of minimizing the ratio Equation 1 for “typical” input signals? As we will see, the answer is yes...
• DFT Intuitions: For the sake of intuition, lets first consider choosing T as an orthogonal DFT matrix. In this case, the coefficient variances {σyk2}{σyk2} would be samples of the power spectrum and the dynamic range of {σyk2}{σyk2} would be determined by the relative input power in different frequency bands. Recalling that asymptotic TC (NN) performance is determined by SFMxSFMx, which has the same geometric-to-arithmetic-average structure as Equation 1:
SFMx=exp12π-ππlnSx(ejω)dω12π-ππSx(ejω)dω=limNk=0N-1S(ejωk)1/N1Nk=0N-1S(ejωk)ωk=2πk/N,SFMx=exp12π-ππlnSx(ejω)dω12π-ππSx(ejω)dω=limNk=0N-1S(ejωk)1/N1Nk=0N-1S(ejωk)ωk=2πk/N,
(2)
we might intuit that the DFT is optimal as NN. The asymptotic optimality of the DFT can, in fact, be proven (see Jayant & Noll). Of course, we don't have much reason to expect that the DFT would be optimal for finite transform dimension N. Still, for many signals, it performs well. (See Figure 1.)
• Other Transforms: The most commonly used orthogonal transform in speech, image, audio, and video coding is the discrete cosine transform (DCT). The excellent performance of the DCT follows from the fact that it is especially suited to “lowpass” signals, a feature shared by most signals in the previously mentioned applications. Note that there are plenty of signals for which the DCT performs poorly—it just so happens that such signals are not frequently encountered in speech, image, audio, or video. We will describe the DCT and provide intuition regarding it's good “lowpass” performance shortly. Like the DFT, the DCT has fast algorithms which make it extremely practical from an implementation standpoint. Another reasonably popular orthogonal transform, requiring even less in the way of computation, is the discrete Hadamard transform (DHT), also described below. Figure 1 compares DFT, DCT, DHT, and KLT for various transform lengths N along with asymptotic TC performance. Figure 1(a) shows gain over PCM when using a lowpass autoregressive (AR) source {x(n)}{x(n)} generated from white Gaussian noise {v(n)}{v(n)} via:
X(z)=11-0.8z-1V(z),X(z)=11-0.8z-1V(z),
(3)
while Figure 1(b) shows the gain for highpass {x(n)}{x(n)}:
X(z)=11+0.8z-1V(z).X(z)=11+0.8z-1V(z).
(4)
See Figure 2 for the power spectra of these two processes.
• The DHT: The N×NN×N DHT is defined below for power-of-two N:
H2=12111-1H2N=12HNHNHN-HNH2=12111-1H2N=12HNHNHN-HN
(5)
Note that HN is orthogonal1, i.e., HNHNt=IHNHNt=I. As an example
H4=1211111-11-111-1-11-1-11H4=1211111-11-111-1-11-1-11
(6)
Figure 3 illustrates DHT basis vectors for the case N=8N=8. The primary advantage of the DHT is that its implementation can be accomplished very efficiently. Figure 1 suggests that the DHT performs nearly as well as the KLT for N=2N=2 and 4, but its performance falls well short of optimal for larger N.
• The DFT: The normalized2 DFT from {xn}{xn} to {yk}{yk} is defined below along with its inverse.
yk=1Nn=0N-1xne-j2πNkn;k=0N-1,xn=1Nk=0N-1ykej2πNkn;n=0N-1.yk=1Nn=0N-1xne-j2πNkn;k=0N-1,xn=1Nk=0N-1ykej2πNkn;n=0N-1.
(7)
The normalized DFT can be represented by a symmetric unitary3 matrix W N W N :
WNk,n=1Ne-j2πNkn;k,n=0N-1.WNk,n=1Ne-j2πNkn;k,n=0N-1.
(8)
By unitary, we mean that WNWN*t=IWNWN*t=I, where (·)*(·)* denotes complex conjugation. Note that a unitary matrix is the complex-valued equivalent of an orthogonal matrix. In practice, the N×NN×N DFT is implemented using the fast Fourier transform (FFT), which requires N2log2NN2log2N complex multiply/adds when N is a power of two.
• The Real-Valued DFT: Since we assume real-valued xn, complex-valued DFT outputs yk might seem problematic since transmitting both real and imaginary components would decrease our transmission efficiency. For a real-valued DFT input, however, the DFT outputs exhibit conjugate symmetry which allows us to represent the N complex valued outputs with only N real-valued numbers. More precisely, real-valued DFT input {xn}{xn} implies that DFT output {yk}{yk} has the property
yk=yN-k*,k=1,2,,N/2,yk=yN-k*,k=1,2,,N/2,
(9)
which implies
Re {yk}= Re {yN-k}k=1,2,,N/2, Im {yk}=- Im {yN-k}k=1,2,,N/2, Im {y0}= Im {yN/2}=0. Re {yk}= Re {yN-k}k=1,2,,N/2, Im {yk}=- Im {yN-k}k=1,2,,N/2, Im {y0}= Im {yN/2}=0.
(10)
A good method by which to select non-redundant components of the DFT output is:
1. Compute complex-valued {yk}{yk} using the standard DFT.
2. Construct real-valued {yk'}{yk'} from {yk}{yk} as follows:
y0'=y0(R)y1'=2 Im {y1}y2'=2 Re {y1}yN-3'=2 Im {yN/2-1}yN-2'=2 Re {yN/2-1}yN-1'=yN/2(R).y0'=y0(R)y1'=2 Im {y1}y2'=2 Re {y1}yN-3'=2 Im {yN/2-1}yN-2'=2 Re {yN/2-1}yN-1'=yN/2(R).
(11)
The method above is convenient because (i) it preserves the frequency ordering of the DFT and (ii) it preserves the norm of the DFT output vector, i.e., y'=yy'=y. Using the conjugate symmetry property of DFT outputs, we can write the transformation from {yk}{yk} from {yk'}{yk'} as a matrix operation U N U N :
y'=10000000-j/20000j/201/200001/200-j/200j/20001/2001/2000-j/20j/200001/201/2000001000UN Re {y0} Re {y1}+j Im {y1} Re {y2}+j Im {y2} Re {yN/2-1}+j Im {yN/2-1} Re {yN/2} Re {yN/2-1}-j Im {yN/2-1} Re {y2}-j Im {y2} Re {y1}-j Im {y1}yy'=10000000-j/20000j/201/200001/200-j/200j/20001/2001/2000-j/20j/200001/201/2000001000UN Re {y0} Re {y1}+j Im {y1} Re {y2}+j Im {y2} Re {yN/2-1}+j Im {yN/2-1} Re {yN/2} Re {yN/2-1}-j Im {yN/2-1} Re {y2}-j Im {y2} Re {y1}-j Im {y1}y
(12)
The normalization feature guarantees that UNUN is unitary (which is easily checked by inspection). Then UNWNUNWN, the product of two unitary matrices, is also unitary. Since UNWNUNWN is actually real-valued (since it takes any real-valued x to a real-valued y') it should be referred to as orthogonal rather than unitary. Henceforth we rename UNWNUNWN the real-valued DFT matrixWNrWNr
WNr:=UNWN.WNr:=UNWN.
(13)
It is easily checked that the basis vectors of WNrWNr are sampled sines and cosines at the uniformly spaced frequencies {2πk/N;k=0,,N/2}{2πk/N;k=0,,N/2}. Figure 4 gives an illustration of the real-valued DFT basis vectors for the case N=8N=8.

## Example 1: [DFT and Real-valued DFT for N=4N=4]

W4=1211111-j-1j1-11-11j-1-j,x=3-142,W4x=4-1/2+j3/23-1/2-j3/2.W4=1211111-j-1j1-11-11j-1-j,x=3-142,W4x=4-1/2+j3/23-1/2-j3/2.
(14)
W4r=1211110-20220-201-11-1,x=3-142,W4rx=43/2-1/23.W4r=1211110-20220-201-11-1,x=3-142,W4rx=43/2-1/23.
(15)
Recall that when N is a power of 2, an N-dimensional complex-valued FFT requires N2log2NN2log2N complex multiply/adds. When the input is real, however, an N-dimensional FFT may be computed using Nlog2NNlog2N real multiply/adds (see Sorensen & Jones & Heideman & Burrus TASSP 87).
• The DCT: The DCT is defined below along with its inverse
yk=2Nαkn=0N-1xncos(2n+1)kπ2N;k=0N-1,forα0=1/2,αk0=1,xn=2Nk=0N-1αkykcos(2n+1)kπ2N;n=0N-1,yk=2Nαkn=0N-1xncos(2n+1)kπ2N;k=0N-1,forα0=1/2,αk0=1,xn=2Nk=0N-1αkykcos(2n+1)kπ2N;n=0N-1,
(16)
The DCT can be represented by an orthogonal matrix C N C N :
CNk,n=2Nαkcos(2n+1)kπ2N;k,n=0N-1.CNk,n=2Nαkcos(2n+1)kπ2N;k,n=0N-1.
(17)
See Figure 5 for an illustration of DCT basis vectors when N=8N=8.
• A Fast DCT: There are a number of fast algorithms to compute the DCT. The method presented below is based on the FFT and leads to intuition concerning the good “lowpass” performance of the DCT.
1. Create a 2N2N-length mirrored version of N-length {xn}{xn} (see Figure 6(c)):
x¯n=xnn=0,1,,N-1,x2N-1-nn=N,N+1,,2N-1.x¯n=xnn=0,1,,N-1,x2N-1-nn=N,N+1,,2N-1.
(18)
2. Compute { y ¯ k } { y ¯ k } , the 2N2N-point DFT of {x¯n}{x¯n}:
y¯k=12Nn=02N-1x¯ne-j2π2Nkn=2Nejπ2Nkn=0N-1xncos(2n+1)kπ2N.y¯k=12Nn=02N-1x¯ne-j2π2Nkn=2Nejπ2Nkn=0N-1xncos(2n+1)kπ2N.
(19)
3. Compute {yk}{yk}, the N-point DCT outputs, from {y¯k}{y¯k}:
yk=e-jπ2Nkαky¯k;k=0,1,,N-1.yk=e-jπ2Nkαky¯k;k=0,1,,N-1.
(20)
Assuming a real-valued input, the scheme outlined above can be implemented using
(21)
where we are assuming use of the real-FFT described previously.
• DCT vs. DFT Performance for Lowpass Signals: Figure 1 suggests that DCT and DFT performance both equal KLT performance asymptotically, i.e., as transform dimension NN. Indeed, this can be proven (see Jayant & Noll). A more practical question is: How do DCT and DFT performances compare for finite N? To answer this question, we will investigate the effects of input data block length on the DCT and DFT. To start, consider the DFT of an N-length input block {x0,,xN-1}{x0,,xN-1}:
yk=1Nn=0N-1xne-j2πNkn;k=0N-1.yk=1Nn=0N-1xne-j2πNkn;k=0N-1.
(22)
It can be seen that the DFT outputs {X0,,XN-1}{X0,,XN-1} are samples of the discrete-time Fourier transform (DTFT) at frequencies {ωk=2πNk;k=0N-1}{ωk=2πNk;k=0N-1}:
X(ω)=1Nn=0N-1xne-jωn,yk=X2πNk.X(ω)=1Nn=0N-1xne-jωn,yk=X2πNk.
(23)
Now lets consider a periodic extension of {x0,,xN-1}{x0,,xN-1} which repeats this N-length sequence a total of L times:
xn'=1LxnN-NL2n<NL20else.xn'=1LxnN-NL2n<NL20else.
(24)
Above, nNnN denotes “nmoduloNnmoduloN” and L is assumed even (see Figure 6(a)-(b)). Here is the interesting point: the DTFT of the NLNL-length periodic extension equals the DTFT of the original N-length data block when sampled at the frequencies {ωk}{ωk}!
X'(ωk)=1NLn=-NL/2NL/2-1xn'e-j2πNkn=1NL=-L/2L/2-1m=0N-1xm+N'e-j2πNk(m+N)=1LN=-L/2L/2-1m=0N-1xme-j2πNk(m+N)=1Nm=0N-1e-j2πNkm=X(ωk).X'(ωk)=1NLn=-NL/2NL/2-1xn'e-j2πNkn=1NL=-L/2L/2-1m=0N-1xm+N'e-j2πNk(m+N)=1LN=-L/2L/2-1m=0N-1xme-j2πNk(m+N)=1Nm=0N-1e-j2πNkm=X(ωk).
(25)
This implies that the overall spectral shape of X'(ω)X'(ω) will be inherited by the DFT outputs {X0,,XN-1}{X0,,XN-1}. So what is the overall shape of X'(ω)X'(ω)? Say that {xn}{xn} is a lowpass process. Being lowpass, we expect the time-domain sequence {xn}{xn} to look relatively “smooth.” If the starting and ending points of the N-block, are different, however, the periodic extension {xn'}{xn'} will exhibit time-domain discontinuities (see Figure 6(b)) that are uncharacteristic of the process {xn}{xn}. These discontinuities imply that X'(ω)X'(ω) will contain high-frequency content not present in the power spectrum of the lowpass input process. Based on our previous findings, if artificial high-frequency content exists at X'(ωk)X'(ωk), it must also exist at X(ωk)=XkX(ωk)=Xk. In conclusion, the periodic extension {xn'}{xn'} provides an intuitive explanation of why short-block DFT analysis of lowpass signals often seems corrupted by “artificial” high-frequency content. So why is this important? Recall that transform performance is proportional to the dynamic range of transform output variances. If the DFT outputs corresponding to otherwise low spectral energy are artificially increased due to short-window effects, the dynamic range of {σyk2}{σyk2} will decrease, and DFT performance will fall short of optimal. Now lets consider the DCT. From derivation of the fast algorithm, we know that the N DCT output magnitudes from length-N input {x0,,xN-1}{x0,,xN-1} are equal to the first N DFT output magnitudes from length-2N2N input {x¯n}{x¯n}—a mirrored version of {x0,,xN-1}{x0,,xN-1}. (See Figure 6(c).) Due to the mirroring effect, the periodic extension of {x¯n}{x¯n} will not have the discontinuities present in the periodic extension of {xn}{xn}, and so a 2N2N-point DFT analysis of {x¯n}{x¯n} will not have “artificial” high frequency enhancement. Assuming that the process from which {xn}{xn} was extracted is lowpass, the DCT outputs will exhibit large dynamic range, and an improvement over DFT coding performance is expected. This is confirmed by Figure 1(a).

## Footnotes

1. Caution: outputs of the Matlab command hadamard must be scaled by 1/N1/N to produce orthogonal DHT matrices!
2. Due to the norm-preserving scale factor 1/N1/N, the DFT definitions above differ from those given in most digital signal processing textbooks.
3. Outputs of the Matlab command dftmtx must be scaled by 1/N1/N to produce unitary DFT matrices.

## Content actions

EPUB (?)

### What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

PDF | EPUB (?)

### What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

#### Collection to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks

#### Module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks