Transform Coding: Background and Motivation

Module by: Phil Schniter.

Summary: Transform coding is described and an analysis is performed for the simple 2-dimensional case, including a comparison to PCM.

  • In transform coding (TC), a block of N input samples is transformed into N transform coefficients, which are then quantized and transmitted. At the decoder, an inverse transform is applied to the quantized coefficients, yielding a reconstruction of the original waveform. By designing each quantizer according to the statistics of its input, it is possible to allocate bits more efficiently, e.g., encoding the “more important” coefficients at a higher bit rate. (A small numerical sketch of this pipeline follows Figure 1.)
    Figure 1: $N \times N$ Transform Coder/Decoder with Scalar Quantization
    [Block diagram: input block $x_0, \dots, x_{N-1}$ → $N \times N$ transform → coefficients $y_0, \dots, y_{N-1}$ → one scalar quantizer $Q$ per coefficient → quantized coefficients $\tilde{y}_0, \dots, \tilde{y}_{N-1}$ → (transmission) → inverse transform → reconstruction $\tilde{x}_0, \dots, \tilde{x}_{N-1}$.]
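    To make the coder/decoder chain in Figure 1 concrete, here is a minimal NumPy sketch of one block passing through the system. The $2 \times 2$ transform matches the one used later in Example 1, while the quantizer step sizes are purely illustrative assumptions; this is a sketch of the structure, not a prescribed implementation.

        import numpy as np

        def tc_encode_decode(x_block, T, steps):
            # Encoder: N x N transform, then a uniform scalar quantizer per coefficient
            y = T @ x_block
            y_quantized = steps * np.round(y / steps)   # illustrative mid-tread quantizers
            # Decoder: inverse transform of the quantized coefficients
            return np.linalg.inv(T) @ y_quantized

        # Orthogonal 2x2 transform (same T as Example 1); step sizes are assumptions
        T = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
        steps = np.array([0.25, 0.5])                   # finer quantization for y_0

        rng = np.random.default_rng(0)
        x = rng.standard_normal(2)                      # one block of N = 2 input samples
        print("input block:        ", x)
        print("reconstructed block:", tc_encode_decode(x, T, steps))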
  • Orthogonal Transforms:  From our perspective, an $N \times N$ “transform” will be any real-valued linear operation taking $N$ input samples to $N$ output samples, or transform coefficients. This operation can always be written in matrix form
    $$ y(m) = T\,x(m), \qquad T \in \mathbb{R}^{N \times N} $$
    (1)
    where $x(m)$ and $y(m)$ are vectors representing $N \times 1$ blocks of input/output elements:
    $$ x(m) = \big[x(mN),\, x(mN-1),\, \dots,\, x(mN-N+1)\big]^t, \qquad y(m) = \big[y(mN),\, y(mN-1),\, \dots,\, y(mN-N+1)\big]^t. $$
    (2)
    Intuition comes from considering the transform's basis vectors $\{t_k\}$, defined by the rows of the matrix
    $$ T = \begin{bmatrix} t_0^t \\ t_1^t \\ \vdots \\ t_{N-1}^t \end{bmatrix} $$
    (3)
    since the coefficient $y_k = t_k^t x$ can be thought of as the result of a “comparison” between the $k^{\text{th}}$ basis vector and the input $x$. These comparisons are defined by the inner product $\langle t_k, x \rangle = t_k^t x$, which has a geometrical interpretation involving the angle $\theta_k$ between the vectors $t_k$ and $x$:
    $$ \langle t_k, x \rangle = \cos(\theta_k)\, \|t_k\|_2\, \|x\|_2 . $$
    (4)
    When the vectors $\{t_k\}$ are mutually orthogonal, i.e., $t_k^t t_\ell = 0$ for $k \neq \ell$, the transform coefficients represent separate, unrelated features of the input. This property is convenient when the transform coefficients are independently quantized, as is typical in TC schemes.
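    A short numerical check of these statements, using the orthogonal $2 \times 2$ matrix from Example 1 below and an arbitrary input block (both chosen only for illustration): each coefficient equals the inner product with a basis vector, that inner product obeys the cosine relation in (4), and the rows are mutually orthogonal.

        import numpy as np

        T = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)   # rows are the basis vectors t_k
        x = np.array([3.0, 1.0])                                  # arbitrary input block

        y = T @ x                                                 # transform coefficients
        for k, t_k in enumerate(T):
            inner = t_k @ x                                       # <t_k, x> = t_k^t x
            cos_theta = inner / (np.linalg.norm(t_k) * np.linalg.norm(x))
            print(f"y_{k} = {y[k]: .4f}   <t_{k}, x> = {inner: .4f}   cos(theta_{k}) = {cos_theta: .4f}")

        print("rows mutually orthogonal:", np.allclose(T @ T.T, np.eye(2)))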

    Example 1: $2 \times 2$ Transform Coder

    Say that a stationary zero-mean Gaussian source $x(m)$ has autocorrelation $r_x(0) = 1$, $r_x(1) = \rho$, and $r_x(k) = 0$ for $k > 1$. For a bit rate of $R$ bits per sample, uniformly-quantized PCM implies a mean-squared reconstruction error of

    $$ \sigma_r^2\big|_{\mathrm{PCM}} = \frac{\Delta^2}{12}\bigg|_{\Delta = 2x_{\max}/L,\; L = 2^R} = \underbrace{\frac{1}{3}\frac{x_{\max}^2}{\sigma_x^2}}_{\gamma_x}\, \sigma_x^2\, 2^{-2R} = \gamma_x\, \sigma_x^2\, 2^{-2R}. $$
    (5)
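    A quick Monte-Carlo sanity check of (5). The loading point $x_{\max} = 4\sigma_x$ and the rate $R = 4$ are illustrative assumptions, not values fixed by the module; with them, the empirical error of a uniform midrise quantizer should be close to $\Delta^2/12 = \gamma_x \sigma_x^2 2^{-2R}$.

        import numpy as np

        rng = np.random.default_rng(0)
        sigma_x, R = 1.0, 4                      # unit-variance Gaussian source, R bits/sample
        x_max = 4.0 * sigma_x                    # assumed quantizer loading point
        L = 2 ** R                               # number of levels
        Delta = 2 * x_max / L                    # step size

        x = sigma_x * rng.standard_normal(1_000_000)
        # Uniform midrise quantizer on [-x_max, x_max] (reconstruction at cell midpoints)
        x_q = Delta * (np.floor(x / Delta) + 0.5)
        x_q = np.clip(x_q, -x_max + Delta / 2, x_max - Delta / 2)

        gamma_x = x_max**2 / (3 * sigma_x**2)
        print("empirical MSE:            ", np.mean((x - x_q) ** 2))
        print("Delta^2 / 12:             ", Delta**2 / 12)
        print("gamma_x sigma_x^2 2^(-2R):", gamma_x * sigma_x**2 * 2.0 ** (-2 * R))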

    For transform coding, say we choose the linear transform

    $$ T = \begin{bmatrix} t_0^t \\ t_1^t \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. $$
    (6)

    Setting $x(m) = [x(2m),\, x(2m-1)]^t$ and $y(m) = T x(m)$, we find that the transformed coefficients have variances

    $$ \sigma_{y_0}^2 = E\{|t_0^t x(m)|^2\} = \tfrac{1}{2} E\{|x(2m) + x(2m-1)|^2\} = \tfrac{1}{2}\big[2 r_x(0) + 2 r_x(1)\big] = 1 + \rho $$
    (7)
    $$ \sigma_{y_1}^2 = E\{|t_1^t x(m)|^2\} = \tfrac{1}{2} E\{|x(2m) - x(2m-1)|^2\} = \tfrac{1}{2}\big[2 r_x(0) - 2 r_x(1)\big] = 1 - \rho $$
    (8)

    and, using uniformly-quantized PCM on each coefficient, we get mean-squared quantization errors

    $$ \sigma_{q_0}^2 = (1 + \rho)\, \gamma_x\, 2^{-2 R_0} $$
    (9)
    $$ \sigma_{q_1}^2 = (1 - \rho)\, \gamma_x\, 2^{-2 R_1}. $$
    (10)

    We use the same quantizer performance factor $\gamma_x$ as before, since linear operations preserve Gaussianity.
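    A Monte-Carlo check of (7)–(8): draw 2-sample blocks whose within-block covariance matches $r_x(0) = 1$, $r_x(1) = \rho$ (with $\rho = 0.5$ as an arbitrary choice; correlation across blocks is ignored, which does not affect these per-coefficient variances) and compare the empirical coefficient variances with $1 \pm \rho$.

        import numpy as np

        rho = 0.5                                # illustrative correlation value
        rng = np.random.default_rng(0)

        # 2-sample blocks x(m) = [x(2m), x(2m-1)]^t with covariance [[1, rho], [rho, 1]]
        C = np.array([[1.0, rho], [rho, 1.0]])
        X = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=200_000)

        T = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
        Y = X @ T.T                              # y(m) = T x(m), applied block by block

        print("empirical variances: ", Y.var(axis=0))
        print("predicted by (7)-(8):", np.array([1 + rho, 1 - rho]))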
    For orthogonal matrices $T$, i.e., $T^{-1} = T^t$, we can show that the mean-squared reconstruction error $\sigma_r^2$ equals the mean-squared quantization error:

    $$ \begin{aligned} \sigma_r^2 &:= \frac{1}{N} \sum_{k=0}^{N-1} E\big\{\big[\tilde{x}(Nm-k) - x(Nm-k)\big]^2\big\} \qquad (\text{here } N = 2) \\ &= \frac{1}{N} E\big\{\|\tilde{x}(m) - x(m)\|^2\big\} \\ &= \frac{1}{N} E\big\{\|T^{-1}\tilde{y}(m) - x(m)\|^2\big\} \\ &= \frac{1}{N} E\big\{\|T^{-1}\big(y(m) + q(m)\big) - x(m)\|^2\big\} \\ &= \frac{1}{N} E\big\{\|T^{-1}T x(m) + T^{-1} q(m) - x(m)\|^2\big\} \\ &= \frac{1}{N} E\big\{\|T^{-1} q(m)\|^2\big\} \\ &= \frac{1}{N} E\big\{q^t(m)\, \underbrace{(T^{-1})^t T^{-1}}_{I}\, q(m)\big\} \\ &= \frac{1}{N} E\big\{\|q(m)\|^2\big\} \;=\; \frac{1}{N} \sum_{k=0}^{N-1} \sigma_{q_k}^2 . \end{aligned} $$
    (11)
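    The chain of equalities in (11) hinges on $(T^{-1})^t T^{-1} = I$: an orthogonal inverse transform preserves the norm of the quantization-error vector. A short numerical confirmation, with the $2 \times 2$ $T$ from this example and stand-in error vectors whose statistics are purely illustrative:

        import numpy as np

        T = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)   # orthogonal: T^{-1} = T^t
        T_inv = np.linalg.inv(T)
        rng = np.random.default_rng(0)
        Q = rng.standard_normal((100_000, 2))                     # stand-in quantization-error vectors q(m)

        mean_sq_q     = np.mean(np.sum(Q**2, axis=1))             # average ||q(m)||^2
        mean_sq_recon = np.mean(np.sum((Q @ T_inv.T)**2, axis=1)) # average ||T^{-1} q(m)||^2
        print(mean_sq_q, mean_sq_recon)                           # identical up to round-off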

    Since our $2 \times 2$ matrix is indeed orthogonal, we have mean-squared reconstruction error

    $$ \sigma_r^2\big|_{\mathrm{TC}} = \tfrac{1}{2}\Big[(1+\rho)\,\gamma_x\, 2^{-2R_0} + (1-\rho)\,\gamma_x\, 2^{-2R_1}\Big] $$
    (12)

    at a bit rate of $R_0 + R_1$ bits per two samples. Comparing TC to PCM at equal bit rates (i.e., $R_0 + R_1 = 2R$),

    $$ \frac{\sigma_r^2\big|_{\mathrm{TC}}}{\sigma_r^2\big|_{\mathrm{PCM}}} = \frac{\tfrac{1}{2}\big[(1+\rho)\,\gamma_x\, 2^{-2R_0} + (1-\rho)\,\gamma_x\, 2^{-2(2R-R_0)}\big]}{\gamma_x\, 2^{-2R}} = (1+\rho)\, 2^{2(R-R_0)-1} + (1-\rho)\, 2^{2(R_0-R)-1} . $$
    (13)

    Figure 2 shows that (i) allocating a higher bit rate to the quantizer with the stronger input signal can reduce the average reconstruction error relative to PCM, and (ii) the gain over PCM is higher when the input signal exhibits stronger correlation $\rho$. Also note that when $R_0 = R_1 = R$, there is no gain over PCM, verifying that $\sigma_r^2 = \sigma_q^2$ when $T$ is orthogonal.

    Figure 2: Ratio of TC to PCM mean-squared reconstruction errors versus bit rate $R_0$ for two values of $\rho$.
    [Two panels, $\rho = 0.8$ (left) and $\rho = 0.2$ (right), each plotting $\sigma_r^2|_{\mathrm{TC}} / \sigma_r^2|_{\mathrm{PCM}}$ against $R_0/R$ over $[0, 2]$, with one curve for $R = 1$ and one for $R = 2$; each curve attains a minimum at or below 1, markedly below 1 for $\rho = 0.8$.]
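    The trends in Figure 2 can be reproduced by evaluating (13) directly. The sketch below sweeps $R_0$ over $[0, 2R]$ for the same $(\rho, R)$ pairs as the figure (the grid resolution is an arbitrary choice), reports the best allocation, and confirms that the ratio equals 1 when $R_0 = R$.

        import numpy as np

        def tc_over_pcm(rho, R, R0):
            # Equation (13): ratio of TC to PCM mean-squared reconstruction error
            return (1 + rho) * 2.0 ** (2 * (R - R0) - 1) + (1 - rho) * 2.0 ** (2 * (R0 - R) - 1)

        for rho in (0.8, 0.2):
            for R in (1, 2):
                R0 = np.linspace(0.0, 2.0 * R, 401)      # keep R_1 = 2R - R_0 nonnegative
                ratio = tc_over_pcm(rho, R, R0)
                k = int(np.argmin(ratio))
                print(f"rho={rho}, R={R}: min ratio {ratio[k]:.3f} at R_0 = {R0[k]:.2f}; "
                      f"ratio at R_0 = R is {tc_over_pcm(rho, R, R):.3f}")

    By (13), the minimum achievable ratio is $\sqrt{1-\rho^2}$, roughly 0.6 for $\rho = 0.8$ but only about 0.98 for $\rho = 0.2$, consistent with observation (ii) above.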
