MP3 and AAC: MDCT Processing

Module by: Phil Schniter.

Summary: In MP3 and AAC coders, the frequency resolution of the polyphase quadrature filterbank is increased using a cascaded MDCT stage. We describe that here, and give the details of the MDCT stage.

MDCT Filterbanks

  • Hybrid Filter Banks: In more advanced audio coders such as MPEG “Layer-3” or MPEG “Advanced Audio Coding” (the details of which will be discussed later), the 32-band polyphase quadrature filterbank (PQF) is considered to give inadequate frequency resolution, so an additional stage of frequency division is cascaded onto the output of the PQF. This additional frequency division is accomplished using the so-called “Modified DCT” (MDCT) filterbank. (See Figure 1.)
    Figure 1: Hybrid filterbank scheme used in MPEG Layer-3 (where N=32 and Q switches between 6 and 18) and MPEG AAC (where N=4 and Q switches between 128 and 1024).
    This is a flowchart with general movement to the right, beginning with a single arrow pointing to the right at a large box labeled N-band Polyphase Quadrature Filterbank. From the right edge of this box are a series of arrows that each point at a series of boxes, all labeled MDCT. From each MDCT box there are four arrows of equal length and size pointing to the right, and these groups of arrows are labeled Q-bands.
  • Lapped Transforms: The MDCT is a so-called “lapped transform.” At the encoder, blocks of length 2Q, which overlap by Q samples, are windowed and transformed, generating Q subband samples each. At the decoder, each set of Q subband samples is inverse-transformed into 2Q samples and windowed. The first Q windowed output samples are overlapped with and added to the last Q windowed outputs of the previous block to form the output stream. Figure 2 gives an intuitive view of the coding/decoding operation, while Figure 3 and Figure 4 detail the coder/decoder implementations used in the MPEG schemes.
    Figure 2: A lapped transform.
    This is a flowchart that contains two cartesian graphs, each with four peaked waves, and two boxes, with arrows in between the objects showing movement. The first graph is labeled overlapping input windows, and contains four peaks, with bases overlapping so that the beginning of each wave begins at the midpoint of the preceding wave. Below the right half of the horizontal axis are six dashed arrows that point down at a box labeled transform. To the right of this box are four dashed arrows that point to the right at a box labeled inverse transform. Above the inverse transform box are six more dashed arrows that point up at the second graph, which is visually identical to the first graph, except that it is labeled windowed and overlapped outputs.
    Figure 3: MDCT filterbank: encoder implementation.
    This figure is a large flowchart with a general downward direction. It begins with a series of connected boxes labeled across from left to right in a pattern x(mQ- 2Q + 1), x(mQ -2Q +2) and so on to x(mQ). Below these boxes is a single arrow labeled with an asterisk that points down at a second row of connected rectangles with the series of labels w(0), w(1), and so on to w(2Q - 1). Below these rectangles is a single small arrow pointing down labeled with an equal sign, and a series of larger arrows pointing down at a large box labeled Cosine Matrix Transformation. The positions in which the larger arrows point at the large box are labeled in a series from j = 0 to j = 2Q -1. To the right of the box are a series of arrows pointing to the right at the equations that read from top to bottom, i = 0, i = 1, and so on to a final equation,  i = Q - 1.
    Figure 4: MDCT filterbank: decoder implementation.
    This figure is a large flowchart that moves generally downward. It begins with a large box labeled Cosine matrix transformation. To the left of this box are a series of arrows pointing at the box that are labeled with the equations, i = 0,  i = 1, and so on to i = Q - 1. At the base of this box are the equations j = 0, j = 1, and so on in the series to  j = 2Q - 1. From each of these equations in the series at the base are arrows labeled with asterisks pointing at different segments of a long rectangle containing hash marks. Inside the long rectangle is the label w(0) . . . w(2Q - 1). Below this rectangle is a single arrow pointing down, labeled with an equal sign, at two connected rectangles with the same width and same number of hash marks. Each of the connected rectangles is then divided into two segments because the middle hash mark is longer. The segments, from left to right, contain the captions u_m(0) . . . u_m(Q - 1), u_m(Q) . . . u_m(2Q - 1), u_m-1(0) . . . u_m-1(Q-1), and u_m-1(Q) . . . u_m-1(2Q-1). From certain points along these rectangles are arrows pointing at a row of circles containing a plus sign. Below each circle is an arrow pointing down at a final row of connected boxes, labeled u(mQ) to u(mQ + Q - 1).
  • Perfect Reconstruction: Based on the cancellation of time-domain aliasing components, Princen, Johnson, & Bradley show (in ICASSP 87 and TASSP 86 papers) that the MDCT achieves perfect reconstruction when the window $\{w_n\}$ is chosen so that overlapped squared copies sum to one, i.e.,
    $$1 = w_{n+Q}^2 + w_n^2 \quad \text{for } 0 \le n \le Q-1.$$
    (1)
    The “sine” window
    $$w_n = \sin\left(\frac{\pi}{2Q}\, n\right) \quad \text{for } 0 \le n \le 2Q-1$$
    (2)
    is one example of a window satisfying this requirement, and it turns out to be the one used in MPEG Layer-3.
  • Frequency Resolution: With a window length that is only twice the number of transform outputs, we cannot expect very good frequency selectivity. It turns out, however, that this is not a problem. In MPEG Layer-3, sine-window MDCTs appear at the outputs of a 32-band PQF, where frequency selectivity is not a critical issue due to the limited frequency resolution of the human ear. In MPEG AAC, a 4-band PQF in conjunction with an optimized MDCT window function gives frequency selectivity just above that which current psychoacoustic models deem necessary (see M. Bosi et al., "ISO/IEC MPEG-2 Advanced Audio Coding" in JAES Oct 1997).
  • Window Switching: Larger values of Q lead to increased frequency resolution but decreased time resolution. The loss of time resolution arises as follows: error due to the quantization of one MDCT output is spread out over 2QN time-domain output samples. For signals of a transient nature, choosing QN too high leads to audible “pre-echoes.” For less transient signals, on the other hand, the error spread resulting from the same value of QN might not be perceptible (and the increased frequency resolution might be very beneficial). Hence, most advanced coding schemes have a provision to switch between different time/frequency resolutions depending on local signal behavior. In MPEG Layer-3, for example, Q switches between 6 and 18. This is accomplished using a sine window of length 36, a sine window of length 12, and intermediate windows which are used to switch between the long and short windows while retaining the perfect reconstruction property. Figure 5 shows an example window sequence.
    Figure 5: Example MDCT window sequence for MPEG Layer-3.
    This figure is a graph of nine peaked waves, each beginning and ending at the horizontal axis. They have equal amplitudes, but the wavelengths decrease incrementally until the fifth wave, which has the shortest wavelength, and then they increase symmetrically back to the maximum wavelengths of the first and ninth waves. In shape, the waves are not sinusoidal, most resembling a parabolic shape, except for the third and seventh waves, which begin with a wide ascension to maximum amplitude on the outside, continue with a horizontal segment at their local maxima, and then descend sharply with wavelengths comparable to the fourth and sixth waves.
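
As a short worked check of the Perfect Reconstruction item above, the following lines verify that the sine window of equation (2) satisfies condition (1). The only ingredient beyond the text is the identity sin(x + π/2) = cos(x).

```latex
\begin{align*}
w_{n+Q} &= \sin\!\left(\frac{\pi}{2Q}(n+Q)\right)
         = \sin\!\left(\frac{\pi}{2Q}\,n + \frac{\pi}{2}\right)
         = \cos\!\left(\frac{\pi}{2Q}\,n\right), \\
w_{n+Q}^2 + w_n^2 &= \cos^2\!\left(\frac{\pi}{2Q}\,n\right)
                   + \sin^2\!\left(\frac{\pi}{2Q}\,n\right) = 1
\quad \text{for } 0 \le n \le Q-1 .
\end{align*}
```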
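
The encoder/decoder operation described under Lapped Transforms and Perfect Reconstruction can be sketched numerically as follows. This is a minimal illustration, not the MPEG reference implementation. The function names (`sine_window`, `mdct`, `imdct`, `analyze`, `synthesize`) are hypothetical, and the cosine phase term and the 2/Q inverse scaling follow one common MDCT convention rather than anything given in the text. The sketch also assumes the half-sample-offset sine window sin(π(n + 1/2)/(2Q)), which satisfies condition (1) and is additionally symmetric, a property this convention relies on for alias cancellation.

```python
import math

def sine_window(Q):
    # Assumption: half-sample-offset sine window of length 2Q.
    # It satisfies w[n]^2 + w[n+Q]^2 = 1 and w[n] = w[2Q-1-n].
    return [math.sin(math.pi / (2 * Q) * (n + 0.5)) for n in range(2 * Q)]

def mdct(block, w):
    # Window a length-2Q block and transform it into Q subband samples.
    Q = len(block) // 2
    return [
        sum(w[n] * block[n]
            * math.cos(math.pi / Q * (n + 0.5 + Q / 2) * (k + 0.5))
            for n in range(2 * Q))
        for k in range(Q)
    ]

def imdct(coeffs, w):
    # Inverse-transform Q subband samples into 2Q samples, then window.
    # The 2/Q scaling is chosen so that overlap-add returns the input.
    Q = len(coeffs)
    return [
        w[n] * (2.0 / Q)
        * sum(coeffs[k]
              * math.cos(math.pi / Q * (n + 0.5 + Q / 2) * (k + 0.5))
              for k in range(Q))
        for n in range(2 * Q)
    ]

def analyze(x, Q):
    # Encoder: blocks of length 2Q that overlap by Q samples.
    # Q zeros are padded on each side so every sample lies in two blocks.
    if len(x) % Q != 0:
        raise ValueError("this sketch assumes len(x) is a multiple of Q")
    w = sine_window(Q)
    padded = [0.0] * Q + list(x) + [0.0] * Q
    return [mdct(padded[s:s + 2 * Q], w)
            for s in range(0, len(padded) - Q, Q)]

def synthesize(frames, Q):
    # Decoder: inverse transform, window, then overlap-add.
    w = sine_window(Q)
    out = [0.0] * (Q * (len(frames) + 1))
    for m, coeffs in enumerate(frames):
        u = imdct(coeffs, w)
        for n in range(2 * Q):
            out[m * Q + n] += u[n]
    return out[Q:-Q]  # drop the zero padding

if __name__ == "__main__":
    Q = 6
    x = [math.sin(0.3 * n) + 0.05 * n for n in range(36)]
    y = synthesize(analyze(x, Q), Q)
    print("max reconstruction error:",
          max(abs(a - b) for a, b in zip(x, y)))
```

Each block on its own contains time-domain aliasing after the inverse transform. The aliasing cancels only when adjacent windowed blocks are added, which is why the first and last Q samples need the zero padding used here.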
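
The time-resolution cost described under Window Switching can be put into numbers with a few lines of arithmetic. The sketch below uses the text's figure of 2QN output samples per quantized MDCT coefficient, with the MPEG Layer-3 values N=32 and Q in {6, 18}. The function names and the 44.1 kHz sampling rate are assumptions made for illustration, and the additional spreading caused by the PQF synthesis filters is ignored.

```python
def error_spread_samples(Q, N):
    # Number of time-domain output samples over which the quantization
    # error of one MDCT output is spread, per the 2QN figure in the text.
    return 2 * Q * N

def error_spread_ms(Q, N, fs_hz):
    # The same spread expressed in milliseconds at sampling rate fs_hz.
    return 1000.0 * error_spread_samples(Q, N) / fs_hz

if __name__ == "__main__":
    N = 32          # PQF bands in MPEG Layer-3
    fs_hz = 44100   # assumed sampling rate, not stated in the text
    for Q in (18, 6):
        print(f"Q={Q:2d}: {error_spread_samples(Q, N)} samples, "
              f"{error_spread_ms(Q, N, fs_hz):.1f} ms")
```

Under these assumptions the long window spreads error over 1152 samples and the short window over 384, a factor of three, which is the margin the coder gains against pre-echo by switching to Q=6 around transients.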