This module is part of the collection "An Introduction to Source-Coding: Quantization, DPCM, Transform Coding, and Sub-band Coding".

Introduction and Motivation

Module by: Phil Schniter.

Summary: In this module, we give a brief introduction to sub-band coding, its relation to transform coding, and its use in MPEG-style audio coding.

  • Sub-band coding is a popular compression tool used in, for example, MPEG-style audio coding schemes (see Figure 1).
    Figure 1: Simplified MPEG-style audio coding system.
    This is a flowchart that will be described from left to right. Beginning on the far left is an arrow pointing to the right, labeled input. This arrow points at a rounded box labeled sub-band analysis. Breaking off downward from the input arrow is a second arrow that points down, then to the right at a rounded box labeled psycho-acoustic model. To the right of the box labeled sub-band analysis is a larger arrow pointing to the right labeled freq. data. This arrow points at another box labeled bit alloc and quantization. The freq. data arrow also breaks off to point down at the aforementioned box, psycho-acoustic model. From the right of the psycho-acoustic model is another arrow pointing back up at the bit alloc and quantization box. To the right of that box is another arrow pointing directly to the right, labeled quant. data. This arrow points at a box labeled stream formatting. To the right of this box is a final arrow pointing to the right, labeled output.
  • Figure 2 illustrates a generic sub-band coder. In short, the input signal is passed through a parallel bank of analysis filters $\{H_i(z)\}$ and the outputs are “downsampled” by a factor of $N$. Downsampling-by-$N$ is a process which passes every $N$th sample and ignores the rest, effectively decreasing the data rate by a factor of $N$. The downsampled outputs are quantized (using a potentially different number of bits per branch, as in transform coding) for storage or transmission. Downsampling ensures that the number of data samples to store is no larger than the number of data samples entering the coder; in Figure 2, $N$ sub-band outputs are generated for every $N$ system inputs. At the decoder, each quantized branch is upsampled by $N$, passed through a synthesis filter $K_i(z)$, and the branches are summed to form the output $u(n)$.
    Figure 2: Sub-band coder/decoder with scalar quantization.
    This is a large, complex flowchart which will be described from left to right, as this is the flow of the diagram. The diagram begins with the expression x(n), and from this expression is a line that splits into a series of arrows each pointing to the right at boxes containing the expressions H_0(z), H_1(z), and so on to a final box H_(N-1)(z). From the ends of each of these boxes are more arrows pointing to the right, this time each at an identical circle containing a down arrow and the variable N. To the right of these circles again are a series of arrows, labeled from top to bottom s_0(m), s_1(m), and so on to the final arrow, s_(N-1)(m). These arrows each point at boxes containing the variable Q. To the right of these boxes are another series of arrows pointing to the right, labeled s-tilde_0 (m). There is then a gap in the diagram, followed by a series of identical arrows to those preceding it, with the s-tilde variables. These arrows each point at circles containing an up arrow and the variable N. To the right of these circles are more arrows pointing at boxes containing the labels K_0(z), K_1(z), and so on to a final box containing K_(N-1)(z). Each of these boxes point with arrows to the right at a single circle containing a plus sign. From the plus sign is a final arrow pointing to the right, labeled u(n).
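As a rough illustration of the signal flow in Figure 2, the following Python sketch builds a two-band ($N = 2$) coder and decoder. The Haar filter pair, the uniform quantizer with a fixed step size, and the helper names `analyze`, `quantize`, and `synthesize` are all assumptions chosen for brevity; none of them comes from the module itself.

```python
import numpy as np

def analyze(x, filters, N):
    """Filter x with each analysis filter H_i, then keep every Nth output sample."""
    return [np.convolve(x, h)[:len(x)][::N] for h in filters]

def quantize(s, step):
    """Uniform scalar quantizer (assumed; the module does not fix a quantizer)."""
    return step * np.round(s / step)

def synthesize(subbands, filters, N, length):
    """Upsample each branch by N (insert zeros), filter with K_i, and sum."""
    u = np.zeros(length)
    for s, k in zip(subbands, filters):
        v = np.zeros(length)
        v[::N] = s                       # N-1 zeros between sub-band samples
        u += np.convolve(v, k)[:length]
    return u

N = 2
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # assumed Haar analysis filters
K = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)   # matching synthesis filters

rng = np.random.default_rng(0)
x = rng.standard_normal(16)

s = analyze(x, H, N)
num_subband_samples = sum(len(b) for b in s)   # equals len(x): no growth in data

u_exact = synthesize(s, K, N, len(x))                               # quantizers bypassed
u_coded = synthesize([quantize(b, 0.25) for b in s], K, N, len(x))  # with quantizers
max_coding_error = np.max(np.abs(u_coded - u_exact))
```

With the quantizers bypassed, this particular filter pair returns the input delayed by one sample, and the two branches together hold exactly as many samples as the input. A real coder would pick a different step size (number of bits) for each branch.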
  • Relationship to Transform Coding:  Conceptually, sub-band coding (SC) is very similar to transform coding (TC). Like TC, SC analyzes a block of input data and produces a set of linearly transformed outputs, now called “sub-band outputs.” Like TC, these transformed outputs are independently quantized in a way that yields coding gain over straightforward PCM. And like TC, it is possible to derive an optimal bit allocation which minimizes reconstruction error variance for a specified average bit rate. In fact, an $N$-band SC system with length-$N$ filters is equivalent to a TC system with an $N \times N$ transformation matrix $T$: the decimated convolution operation which defines the $i$th analysis branch of Figure 2 is identical to an inner product between an $N$-length input block and $t_i^t$, the $i$th row of $T$. (See Figure 3.)
    Figure 3: Equivalence between (a) $N$-band sub-band coding with length-$N$ filters and (b) $N \times N$ transform coding (shown for $N = 4$). Note: impulse response coefficients $\{h_n\}$ correspond to filter $H_i(z)$.
    This is a two-part figure. part a contains a series of horizontally connected boxes in a single row, labeled h_0, h_1, h_2, h_3 from left to right, followed by a long arrow that points at the expression s_i(m). In a second row of this part of the figure, a series of horizontally connected boxes continues at the same vertical position that the first row's boxes end. These boxes are also labeled h_0, h_1, h_2, and h_3. To the right of these is a short arrow that ends at the same part of the page that the upper row ends, pointing at the variable s_i(m-1) Below these is a final row of the first part of the figure, containing a series of connected boxes that span the entire width of the page. From left to right, the expressions inside the boxes are x(Nm), x(Nm-1), x(Nm-2), x(Nm-3), x(Nm-4), x(Nm-5), x(Nm-6), and x(Nm-7). The second part of the figure is drawn in a similar fashion, except that there is one large box in place of the connected boxes from the first part. The large box in the first row contains the expression t_i^t, and the arrow points at the expression y_i(m). In the second row, the box contains the same expression as the first row, and its arrow points at the expression y_i(m-1). The bottom row contains two connected boxes rather than the eight connected boxes in the first part of this figure. The two boxes contain the expressions x(m) on the left, and x(m-1) on the right.
    So what kind of frequency responses characterize the most commonly used transformation matrices? Let's look at the DFT first. For the $i$th row, we have
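    $$|H_i(\omega)| = \left| \sum_{n=0}^{N-1} e^{-j\frac{2\pi}{N} i n}\, e^{-j\omega n} \right| = \left| \sum_{n=0}^{N-1} e^{-j\left(\omega + \frac{2\pi}{N} i\right) n} \right| = \left| \frac{\sin\left(\frac{N}{2}\left(\omega + \frac{2\pi i}{N}\right)\right)}{\sin\left(\frac{1}{2}\left(\omega + \frac{2\pi i}{N}\right)\right)} \right|. \qquad (1)$$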
    Figure 4 plots these magnitude responses. Note that the $i$th DFT row acts as a bandpass filter with center frequency $2\pi i/N$ and stopband attenuation of 6 dB. Figure 5 plots the magnitude responses of DCT filters, where we see that they have even less stopband attenuation.
    Figure 4: Magnitude responses of DFT basis vectors for $N = 8$.
    This figure is a cartesian graph, plotting the horizontal axis omega of values -3 to 3, and vertical axis dB of values -20 to 0. The figure contains seven disconnected peaks, each approximately one horizontal unit in width, with the exception of the fourth peak, which is nearly two units wide. The vertical values at the waves' peak are the following from left to right: -9, -8, -6, 0, -6, -8, -9. Beyond these curves are a series of dashed peaks of varying heights that are even in width and alignment with the aforementioned solid peaks, but are of different heights as if each peak's different height is drawn over every other peak in the chart.
    Figure 5: Magnitude responses of DCT basis vectors for $N = 8$.
    This figure is a cartesian graph, plotting the horizontal axis omega of values -3 to 3, and vertical axis dB of values -20 to 2. The figure contains six disconnected peaks, although the figure is exactly symmetrical about a vertical line at omega=0. The first wave is approximately one unit wide, and reaches a vertical value of -4. The second wave is approximately 1.5 units wide and reaches a vertical value of 0. The third wave is approximately 0.5 units wide and reaches a vertical value of -9. The latter three waves follow the same progression after the reflection of symmetry.
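Both claims in this item can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the DFT matrix for $T$ with $N = 4$, a random test signal, and the block ordering suggested by Figure 3 (newest sample first). It first compares the decimated convolution against the row-by-block inner product, then compares the direct sum in equation (1) against the closed-form sine ratio.

```python
import numpy as np

N = 4
n = np.arange(N)                                # used as both tap index n and row index i
T = np.exp(-2j * np.pi * np.outer(n, n) / N)    # DFT matrix; row i is t_i

rng = np.random.default_rng(1)
x = rng.standard_normal(10 * N)

# (a) Sub-band view: convolve with h_i = t_i, keep only the samples at times N*m.
# (b) Transform view: inner product of t_i with [x(Nm), x(Nm-1), ..., x(Nm-N+1)].
blocks = np.arange(1, 10)                       # skip m = 0, which needs x(-1), x(-2), ...
s = np.array([np.convolve(x, T[i])[N * blocks] for i in range(N)])
y = np.array([[T[i] @ x[N * m - n] for m in blocks] for i in range(N)])
equivalent = np.allclose(s, y)

# Equation (1): |H_i(omega)| as a direct sum versus the closed-form sine ratio.
omega = np.linspace(-np.pi, np.pi, 1000, endpoint=False) + 1e-3   # offset avoids 0/0
shift = omega[None, :] + 2 * np.pi * n[:, None] / N               # omega + 2*pi*i/N
direct = np.abs(np.exp(-1j * shift[:, :, None] * n).sum(axis=2))
closed = np.abs(np.sin(N * shift / 2) / np.sin(shift / 2))
matches_closed_form = np.allclose(direct, closed)
```

Both `equivalent` and `matches_closed_form` should come out true. The magnitude for row $i$ peaks, with value $N$, wherever $\omega + 2\pi i/N$ is a multiple of $2\pi$.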
  • Psycho-acoustic Motivations:  We have seen that $N$-band SC with length-$N$ filters is equivalent to $N \times N$ transform coding. But is transform coding the best technique to use in high-quality audio coders? It turns out that the key to preserving sonic quality under high levels of compression is to shape the reconstruction error so that the ear will not hear it. When we talk about psychoacoustics later in the course, we'll see that the properties of noise tolerated by the ear/brain are most easily described in the frequency domain. Hence, bitrate allocation based on psychoacoustic models is most conveniently performed when SC outputs represent signal components in isolated frequency bands. In other words, instead of allocating fewer bits to sub-band outputs having a smaller effect on reconstruction error variance, we will allocate fewer bits to sub-band outputs having a smaller contribution to perceived reconstruction error. We have seen that length-$N$ DFT and DCT filters give a $2\pi/N$ bandwidth with no better than 6 dB of stopband attenuation. The SC filters used in high-quality audio coding need much better stopband performance, say $> 90$ dB. It turns out that filters with passband width $2\pi/N$, narrow transition bands, and decent stopband attenuation require impulse response lengths $\gg N$. In $N$-band SC there is no constraint on filter length, unlike $N$-band TC. This is the advantage of SC over TC when it comes to audio coding (see footnote 1).
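The link between filter length and stopband attenuation can be seen in a small experiment. The sketch below compares a length-$N$ filter with one that is 16 times longer, for $N = 8$. The long filter is a Kaiser-windowed sinc with $\beta = 9$; that design method, the choice of $2\pi/N$ as the start of the stopband, and the use of $20\log_{10}$ of the magnitude ratio for dB are all assumptions of this sketch and may differ from the scaling used in the figures.

```python
import numpy as np

def stopband_attenuation_db(h, N, nfft=8192):
    """Peak stopband level relative to DC, in dB (20*log10), for omega >= 2*pi/N."""
    H = np.abs(np.fft.fft(h, nfft))[: nfft // 2 + 1]
    edge = nfft // N                     # FFT bin at omega = 2*pi/N
    return -20 * np.log10(H[edge:].max() / H[0])

def windowed_sinc(length, N, beta=9.0):
    """Lowpass with cutoff pi/N: ideal sinc response shaped by a Kaiser window."""
    n = np.arange(length) - (length - 1) / 2
    return np.sinc(n / N) / N * np.kaiser(length, beta)

N = 8
short = np.ones(N) / N                   # length-N filter (row 0 of the DFT, scaled)
long_ = windowed_sinc(16 * N, N)         # 16 times longer than the number of bands

atten_short = stopband_attenuation_db(short, N)
atten_long = stopband_attenuation_db(long_, N)
```

Under these assumptions `atten_long` should exceed `atten_short` by several tens of dB, which is the qualitative point of the paragraph above: good stopband behavior needs impulse responses much longer than $N$.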
  • To summarize, the key differences between transform and sub-band coding are the following.
    1. SC outputs measure relative signal strength in different frequency bands, while TC outputs might not have a strict bandpass correspondence.
    2. The TC input window length is equal to the number of TC outputs, while the SC input window length is usually much greater than the number of SC outputs (16× greater in MPEG).
  • At first glance SC implementation complexity is a valid concern. Recall that in TC, fast $N \times N$ transforms such as the DCT and DFT could be performed using $N \log_2 N$ multiply/adds. Must we give up this computational efficiency for better frequency resolution? Fortunately the answer is no; clever SC implementations are built around fast DFT or DCT transforms and are very efficient as a result. Fast sub-band coding, in fact, lies at the heart of MPEG audio compression (see ISO/IEC 13818-3).
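One way such a fast implementation can be organized is sketched below for a DFT-modulated filter bank. Every branch filter is a single prototype lowpass $h$ multiplied by $e^{j 2\pi i n/N}$, so the $N$ long filters collapse into one pass over the prototype taps followed by a length-$N$ inverse FFT per block. The complex DFT modulation, the Hamming-windowed prototype, and the function names are assumptions made for this illustration; MPEG audio uses a cosine-modulated bank, which this sketch does not reproduce.

```python
import numpy as np

def direct_bank(x, h, N):
    """Reference: N separate modulated filters, each followed by downsampling."""
    M = len(x) // N
    n = np.arange(len(h))
    return np.array([np.convolve(x, h * np.exp(2j * np.pi * i * n / N))[::N][:M]
                     for i in range(N)])

def fast_bank(x, h, N):
    """Same outputs via one pass over the prototype taps plus one FFT per block."""
    L = len(h)                                       # assumed to be a multiple of N
    M = len(x) // N
    padded = np.concatenate([np.zeros(L - 1), x])    # x(t) = 0 for t < 0
    taps = h.reshape(L // N, N)                      # taps[k, p] = h[k*N + p]
    out = np.empty((N, M), dtype=complex)
    for m in range(M):
        # window[n] = x(N*m - n) for n = 0 .. L-1 (newest sample first)
        window = padded[N * m + L - 1 - np.arange(L)]
        u = (taps * window.reshape(L // N, N)).sum(axis=0)   # fold L taps into N sums
        out[:, m] = N * np.fft.ifft(u)               # sum_p u[p] exp(+j 2 pi i p / N)
    return out

N = 8
L = 16 * N
n = np.arange(L) - (L - 1) / 2
h = np.sinc(n / N) / N * np.hamming(L)               # assumed prototype lowpass

rng = np.random.default_rng(2)
x = rng.standard_normal(50 * N)

outputs_match = np.allclose(direct_bank(x, h, N), fast_bank(x, h, N))
mults_direct = N * L                                 # rough count per block of N inputs
mults_fast = L + (N // 2) * int(np.log2(N))          # taps plus a radix-2 FFT (rough)
```

The multiply counts are rough and ignore the difference between real and complex multiplies. They are only meant to show that the cost per block grows like the prototype length plus an FFT, rather than like $N$ times the prototype length.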

Footnotes

  1. A similar conclusion resulted from our comparison of DPCM and TC of equal dimension N; it was reasoned that the longer “effective” input length of DPCM with N-length prediction filtering gave performance improvement relative to TC.
