MPEG Layers 1-3: Cosine-Modulated Filterbanks

Module by: Phil Schniter.

Summary: Here the "polyphase quadrature" filterbank used in the MPEG audio standards is described in great detail. It has the following practical features: real-valued sub-band outputs, near-perfect reconstruction, and polyphase implementation; and is based on cancellation of adjacent sub-band interference.

  • Though the uniformly modulated filterbank in Figure 4 from "Uniformly-Modulated Filterbanks" was shown to have the fast implementation in Figure 5 from "Uniformly-Modulated Filterbanks", the sub-band outputs are complex-valued for real-valued input, hence inconvenient (at first glance; see footnote 1) for sub-band coding of real-valued data. In this section we propose a closely related filterbank with the following properties.
    1. Real-valued sub-band outputs (assuming real-valued inputs),
    2. Near-perfect reconstruction,
    3. Polyphase/fast-transform implementation.
    This turns out to be the filterbank specified in the MPEG-1 and MPEG-2 (Layers 1-3) audio compression standards (see ISO/IEC 13818-3).

Filter Design

  • Real-valued Sub-band Outputs: Recall the generic filterbank structure of Figure 1 from "Uniformly-Modulated Filterbanks". For the sub-band outputs to be real-valued (for real-valued input), we require that the impulse responses of $\{H_i(z)\}$ and $\{K_i(z)\}$ be real-valued. We can ensure this by allocating the $N$ (symmetric) frequency band pairs shown in Figure 1. The positive and negative halves of each band pair are centered at $\pm\omega_i$, where $\omega_i = \frac{(2i+1)\pi}{2N}$ radians.
    Figure 1: Frequency band pairs for the polyphase quadrature filterbank ($N=4$).
    This figure is a cartesian graph, with horizontal axis ω, and an unlabeled vertical axis. The graph consists of eight colored, connected rectangles of identical dimension. The rectangles all have one side drawn on the base of the graph. The leftmost rectangle's left side is located at a ω-value of -π, and the rightmost rectangle's right side is located at a ω-value of π. The midpoint along the horizontal axis of each rectangle is labeled from left to right as -ω_3, -ω_2, -ω_1, -ω_0, ω_0, ω_1, ω_2, and ω_3. The rectangles from left to right are colored green, red, blue, grey, grey, blue, red, green. Above the left side of the graph is a horizontal arrow pointing in both directions, labeled π/N. Above the right side of the graph is the equation ω_i = [(2i + 1)π]/2N.
    We can consider each filter $H_i(z)$ as some combination of symmetric positive-frequency and negative-frequency components
    $$H_i(z) = a_i F_i(z) + b_i G_i(z)$$
    (1)
    as shown in Figure 2.
    Figure 2: Positive- and negative-frequency decomposition of $H_i(\omega)$. Note $K_i(\omega)$ will have a similar, if not identical, frequency response.
    This figure is a graph with horizontal axis ω and vertical axis H_i(ω). The horizontal axis establishes wide boundaries of -π and π. Below the graph is a horizontal arrow pointing in both directions, labeled (2i + 1)π/N. There are two shaded trapezoids located on the graph, with their base drawn on the horizontal axis. The trapezoid on the left is labeled G_i(ω), and the midpoint of its base is labeled with a horizontal value -ω_i. The trapezoid on the right is labeled F_i(ω), and the midpoint of its base is labeled with a horizontal value ω_i. Above both trapezoids are horizontal arrows pointing in both directions, labeled π/N.
    When $b_i = a_i^*$ and the pairs $\{F_i(z), G_i(z)\}$ are modulated versions of the same real-valued prototype filter $H(z)$, we can show that $H_i(z)$ must have a real-valued impulse response:
    $$\begin{aligned}
    H_i(z) &= a_i \underbrace{H\big(e^{-j\pi\frac{2i+1}{2N}} z\big)}_{F_i(z)} + a_i^* \underbrace{H\big(e^{j\pi\frac{2i+1}{2N}} z\big)}_{G_i(z)} \\
    &= a_i \sum_n h_n e^{j\pi\frac{2i+1}{2N}n} z^{-n} + a_i^* \sum_n h_n e^{-j\pi\frac{2i+1}{2N}n} z^{-n} \\
    &= \operatorname{Re}(a_i) \sum_n h_n z^{-n} \Big( e^{j\pi\frac{2i+1}{2N}n} + e^{-j\pi\frac{2i+1}{2N}n} \Big) + j \operatorname{Im}(a_i) \sum_n h_n z^{-n} \Big( e^{j\pi\frac{2i+1}{2N}n} - e^{-j\pi\frac{2i+1}{2N}n} \Big) \\
    &= \operatorname{Re}(a_i) \sum_n h_n z^{-n} \cdot 2\cos\Big(\pi\frac{2i+1}{2N}n\Big) + j \operatorname{Im}(a_i) \sum_n h_n z^{-n} \cdot 2j\sin\Big(\pi\frac{2i+1}{2N}n\Big) \\
    &= 2 \sum_n \Big[ \operatorname{Re}(a_i)\cos\Big(\pi\frac{2i+1}{2N}n\Big) - \operatorname{Im}(a_i)\sin\Big(\pi\frac{2i+1}{2N}n\Big) \Big] h_n z^{-n}
    \end{aligned}$$
    (2)
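As a quick numerical check of Equation 2, the sketch below builds the taps of $a_i F_i(z) + a_i^* G_i(z)$ in two ways: from the complex modulations in the first line of the derivation, and from the real cosine/sine form in its last line. The prototype taps `h`, the coefficient `a`, and all function names are illustrative assumptions, not values from the MPEG standard.

```python
import cmath
import math

def modulated_pair_taps(h, a, i, N):
    """Taps of a*F_i(z) + conj(a)*G_i(z), built from complex modulations of h."""
    theta = math.pi * (2 * i + 1) / (2 * N)
    return [a * hn * cmath.exp(1j * theta * n)
            + a.conjugate() * hn * cmath.exp(-1j * theta * n)
            for n, hn in enumerate(h)]

def cosine_form_taps(h, a, i, N):
    """Last line of Equation 2: 2[Re(a)cos(theta n) - Im(a)sin(theta n)] h_n."""
    theta = math.pi * (2 * i + 1) / (2 * N)
    return [2.0 * (a.real * math.cos(theta * n) - a.imag * math.sin(theta * n)) * hn
            for n, hn in enumerate(h)]

# Hypothetical real-valued prototype and unit-modulus coefficient (assumptions).
h = [0.1, 0.4, 0.7, 0.4, 0.1]
a = cmath.exp(0.3j)
N = 4

taps_complex = modulated_pair_taps(h, a, 1, N)
taps_real = cosine_form_taps(h, a, 1, N)

# Imaginary parts cancel, and both constructions agree up to rounding.
max_imag = max(abs(t.imag) for t in taps_complex)
max_diff = max(abs(t - r) for t, r in zip(taps_complex, taps_real))
```

Both `max_imag` and `max_diff` should be at rounding-error level for any real `h` and any complex `a`, since the two modulated terms are complex conjugates of each other.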
  • Aliasing Cancellation: Recall again the generic filterbank in Figure 1 from "Uniformly-Modulated Filterbanks". Here we determine conditions on real-valued $\{H_i(z)\}$ and $\{K_i(z)\}$ which lead to near-perfect reconstruction. It will be insightful to derive an expression for the input to the $i$th reconstruction filter, $\{y_i(n)\}$. The downsample-upsample-cascade equation (Equation 14 from "Fundamentals of Multirate Signal Processing") implies that
    $$\begin{aligned}
    Y_i(z) &= \frac{1}{N} \sum_{p=0}^{N-1} X_i\big(e^{-j\frac{2\pi}{N}p} z\big) \\
    &= \frac{1}{N} \sum_{p=0}^{N-1} H_i\big(e^{-j\frac{2\pi}{N}p} z\big)\, X\big(e^{-j\frac{2\pi}{N}p} z\big) \\
    &= \frac{1}{N} \sum_{p=0}^{N-1} \Big[ a_i F_i\big(e^{-j\frac{2\pi}{N}p} z\big) + a_i^* G_i\big(e^{-j\frac{2\pi}{N}p} z\big) \Big] X\big(e^{-j\frac{2\pi}{N}p} z\big) \\
    &= \underbrace{\frac{1}{N} \big[ a_i F_i(z) + a_i^* G_i(z) \big] X(z)}_{\text{desired}} + \underbrace{\frac{1}{N} \sum_{p=1}^{N-1} \Big[ a_i F_i\big(e^{-j\frac{2\pi}{N}p} z\big) + a_i^* G_i\big(e^{-j\frac{2\pi}{N}p} z\big) \Big] X\big(e^{-j\frac{2\pi}{N}p} z\big)}_{\text{undesired images}}.
    \end{aligned}$$
    (3)
    Thus the input to the $i$th reconstruction filter is corrupted by unwanted spectral images, and the reconstruction filter's job is to remove these images. The reconstruction filter $K_i(z)$ will have a bandpass frequency response similar (or identical) to that of $H_i(z)$ illustrated in Figure 2. Due to practical design considerations, neither $K_i(z)$ nor $H_i(z)$ will be a perfect bandpass filter, but we will assume that the only significant out-of-band energy passed by these filters occurs in the frequency range just outside their passbands. (Note the limited “spillover” in Figure 2.) Under these assumptions, the only undesired images in $Y_i(\omega)$ that will not be completely attenuated by $K_i(\omega)$ are the images adjacent to $F_i(\omega)$ and $G_i(\omega)$. Which indices $p$ in Equation 3 are responsible for these adjacent images? Equation 3 implies that index $p=\ell$ shifts the frequency response up by $2\pi\ell/N$ radians. Since the passband centers of $F_i(z)$ and $G_i(z)$ are $(2i+1)\pi/N$ radians apart, the passband of $G_i\big(e^{-j\frac{2\pi}{N}p}z\big)$ will reside directly to the left of the passband of $F_i(z)$ when $p=i$. Similarly, the passband of $G_i\big(e^{-j\frac{2\pi}{N}p}z\big)$ will reside directly to the right of the passband of $F_i(z)$ when $p=i+1$. See Figure 3 for an illustration. Using the same reasoning, the passband of $F_i\big(e^{-j\frac{2\pi}{N}p}z\big)$ will reside directly to the right of the passband of $G_i(z)$ when $p=-i$ and directly to the left when $p=-(i+1)$. The only exceptions to this rule occur when $i=0$, in which case the images to the right of $G_i(z)$ and to the left of $F_i(z)$ are desired, and when $i=N-1$, in which case the images to the left of $G_i(z)$ and to the right of $F_i(z)$ are desired.
    Figure 3: Spectral images of $Y_i(\omega)$ not completely attenuated by $K_i(\omega)$.
    This figure is identical to the preceding figure, except that there are now four dashed trapezoids, one on each side of each of the shaded trapezoids. The bases slightly overlap on both sides with the shaded trapezoids. The trapezoids to the left and right of G_i(z) are labeled F_i(e^(-j(2π/n)p)z), and the trapezoids to the left and right of F_i(z) are labeled G_i(e^(-j(2π/n)p)z). Above the dashed trapezoids are small captions that read from left to right, p = -(i + 1), p = -i, p = i, and p = i + 1.
    Based on the arguments above, we can write $\{u_i(n)\}$, the output of the $i$th reconstruction filter, as follows:
    $$\begin{aligned}
    U_i(z) &= K_i(z)\, Y_i(z) \\
    &= \underbrace{\frac{1}{N} K_i(z) \big[ a_i F_i(z) X(z) + a_i^* G_i(z) X(z) \big]}_{\text{desired}} \\
    &\quad + \underbrace{\frac{1}{N} K_i(z) \Big[ a_i F_i\big(e^{j\frac{2\pi}{N}i} z\big) X\big(e^{j\frac{2\pi}{N}i} z\big) + a_i^* G_i\big(e^{-j\frac{2\pi}{N}i} z\big) X\big(e^{-j\frac{2\pi}{N}i} z\big) \Big]}_{\text{aliasing from inner undesired images when } 1 \le i \le N-1} \\
    &\quad + \underbrace{\frac{1}{N} K_i(z) \Big[ a_i F_i\big(e^{j\frac{2\pi}{N}(i+1)} z\big) X\big(e^{j\frac{2\pi}{N}(i+1)} z\big) + a_i^* G_i\big(e^{-j\frac{2\pi}{N}(i+1)} z\big) X\big(e^{-j\frac{2\pi}{N}(i+1)} z\big) \Big]}_{\text{aliasing from outer undesired images when } 0 \le i \le N-2}.
    \end{aligned}$$
    (4)
    The previous equation shows that $U_i(z)$ is corrupted by the portions of the undesired images not completely removed by the reconstruction filter $K_i(z)$. In the filterbank context, this undesired behavior is referred to as aliasing. But notice that aliasing contributions to the signal $U(z) = \sum_i U_i(z)$ will vanish if the inner aliasing components in $U_i(z)$ cancel the outer aliasing components in $U_{i-1}(z)$. This happens when
    $$\begin{aligned}
    & K_i(z) \Big[ a_i F_i\big(e^{j\frac{2\pi}{N}i} z\big) X\big(e^{j\frac{2\pi}{N}i} z\big) + a_i^* G_i\big(e^{-j\frac{2\pi}{N}i} z\big) X\big(e^{-j\frac{2\pi}{N}i} z\big) \Big] \\
    &\quad = -K_{i-1}(z) \Big[ a_{i-1} F_{i-1}\big(e^{j\frac{2\pi}{N}i} z\big) X\big(e^{j\frac{2\pi}{N}i} z\big) + a_{i-1}^* G_{i-1}\big(e^{-j\frac{2\pi}{N}i} z\big) X\big(e^{-j\frac{2\pi}{N}i} z\big) \Big],
    \end{aligned}$$
    (5)
    which holds when the following two conditions are satisfied.
    $$\begin{aligned}
    a_i K_i(z) F_i\big(e^{j\frac{2\pi}{N}i} z\big) &= -a_{i-1} K_{i-1}(z) F_{i-1}\big(e^{j\frac{2\pi}{N}i} z\big) \\
    a_i^* K_i(z) G_i\big(e^{-j\frac{2\pi}{N}i} z\big) &= -a_{i-1}^* K_{i-1}(z) G_{i-1}\big(e^{-j\frac{2\pi}{N}i} z\big).
    \end{aligned}$$
    (6)
    We assume from this point on that the real-valued filters $\{H_i(z)\}$ and $\{K_i(z)\}$ are constructed using modulated versions of a lowpass prototype filter $H(z)$. (This assumption is required for the existence of a polyphase filterbank implementation.)
    $$\begin{aligned}
    H_i(z) &= a_i F_i(z) + a_i^* G_i(z) \\
    K_i(z) &= c_i F_i(z) + c_i^* G_i(z)
    \end{aligned}
    \quad \text{where} \quad
    \begin{cases}
    F_i(z) = H\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big) \\
    G_i(z) = H\big(e^{j\frac{\pi}{2N}(2i+1)} z\big)
    \end{cases}$$
    (7)
    Then the first condition in Equation 6 becomes
    $$\begin{aligned}
    & a_i c_i H\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big) H\big(e^{j\frac{\pi}{2N}(2i-1)} z\big) + a_i c_i^* H\big(e^{j\frac{\pi}{2N}(2i+1)} z\big) H\big(e^{j\frac{\pi}{2N}(2i-1)} z\big) \\
    &\quad = -a_{i-1} c_{i-1} H\big(e^{-j\frac{\pi}{2N}(2i-1)} z\big) H\big(e^{j\frac{\pi}{2N}(2i+1)} z\big) - a_{i-1} c_{i-1}^* H\big(e^{j\frac{\pi}{2N}(2i-1)} z\big) H\big(e^{j\frac{\pi}{2N}(2i+1)} z\big).
    \end{aligned}$$
    (8)
    Let's take a closer look at the products $H\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big) H\big(e^{j\frac{\pi}{2N}(2i-1)} z\big)$ in the previous equation. As illustrated in Figure 4, these products equal zero when $1 \le i \le N/2$ since their passbands do not overlap. Setting these products (the first term on each side of Equation 8) to zero yields the condition
    $$a_i c_i^* = -a_{i-1} c_{i-1}^* \quad \text{for } 1 \le i \le N-1,$$
    (9)
    which can also be shown to satisfy the second condition in Equation 6.
    Figure 4: Illustration of vanishing terms in Equation 8.
    This figure is a cartesian graph with horizontal axis ω. There are three identical shaded trapezoids of similar shape to those trapezoids in the previous figure. Two of the trapezoids are located in the second quadrant, and the other is above the first and third trapezoids is the title H(e^(-j(π/2N)(2i + 1))z), and above the second is the title  H(e^(j(π/2N)(2i + 1))z). The midpoint of the bases of these trapezoids are measured as follows: the leftmost's horizontal position is (π/2N)(2i + 1) - 2π, the second trapezoid's midpoint is (-π/(2N))(2i - 1), and the rightmost is (π/2N)(2i + 1). Below these trapezoids are two horizontal lines with arrows pointing in either direction. The first line is labeled (2(N - i) -1)π/N, and the second is labeled (2i - 1)π/N.
    Next we concern ourselves with the requirements on $a_0$ and $c_0$. Assuming Equation 9 is satisfied, we know that inner aliasing in $U_i(z)$ cancels outer aliasing in $U_{i-1}(z)$ for $1 \le i \le N-1$. Hence, from Equation 4 and Equation 7,
    $$\begin{aligned}
    U(z) &= \sum_{i=0}^{N-1} U_i(z) = \frac{1}{N} \sum_{i=0}^{N-1} K_i(z) H_i(z) X(z) \\
    &= \frac{1}{N} \sum_{i=0}^{N-1} \Big[ c_i H\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big) + c_i^* H\big(e^{j\frac{\pi}{2N}(2i+1)} z\big) \Big] \cdot \Big[ a_i H\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big) + a_i^* H\big(e^{j\frac{\pi}{2N}(2i+1)} z\big) \Big] X(z)
    \end{aligned}$$
    (10)
    Noting that the passbands of $H\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big)$ and $H\big(e^{j\frac{\pi}{2N}(2i+1)} z\big)$ do not overlap for $1 \le i \le N-2$, we have
    $$\begin{aligned}
    U(z) = \frac{1}{N} \Big[ & (a_0 c_0^* + a_0^* c_0)\, H\big(e^{-j\frac{\pi}{2N}} z\big) H\big(e^{j\frac{\pi}{2N}} z\big) \\
    & + (a_{N-1} c_{N-1}^* + a_{N-1}^* c_{N-1})\, H\big(e^{-j\frac{\pi}{2N}(2N-1)} z\big) H\big(e^{j\frac{\pi}{2N}(2N-1)} z\big) \\
    & + \sum_{i=0}^{N-1} \Big( a_i c_i H^2\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big) + a_i^* c_i^* H^2\big(e^{j\frac{\pi}{2N}(2i+1)} z\big) \Big) \Big] X(z).
    \end{aligned}$$
    (11)
    The first two terms in Equation 11 represent aliasing components that prevent flat overall response at $\omega=0$ and $\omega=\pi$, respectively. These aliasing terms vanish when
    $$\begin{aligned}
    a_0 c_0^* &= -a_0^* c_0 \\
    a_{N-1} c_{N-1}^* &= -a_{N-1}^* c_{N-1}
    \end{aligned}$$
    (12)
    What remains is
    $$U(z) = \frac{1}{N} \sum_{i=0}^{N-1} \Big[ a_i c_i H^2\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big) + a_i^* c_i^* H^2\big(e^{j\frac{\pi}{2N}(2i+1)} z\big) \Big] X(z).$$
    (13)
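The adjacent-image argument used in this bullet (which shifts $p$ place an image of $G_i$ next to $F_i$) comes down to simple arithmetic on band centers. The sketch below checks it for one band; the helper names and the choice $N=8$, $i=3$ are illustrative assumptions.

```python
import math

def wrap(omega):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (omega + math.pi) % (2 * math.pi) - math.pi

def center(i, N):
    """Positive-frequency band center omega_i = (2i+1)pi/(2N)."""
    return math.pi * (2 * i + 1) / (2 * N)

def shifted_G_center(i, p, N):
    """Center of G_i(exp(-j 2 pi p / N) z): G_i sits at -omega_i, shifted up by 2 pi p / N."""
    return wrap(-center(i, N) + 2 * math.pi * p / N)

N, i = 8, 3
left_image = shifted_G_center(i, i, N)        # expected at omega_{i-1}
right_image = shifted_G_center(i, i + 1, N)   # expected at omega_{i+1}
left_error = abs(left_image - center(i - 1, N))
right_error = abs(right_image - center(i + 1, N))
```

With $p=i$ the image of $G_i$ lands on the band centered at $\omega_{i-1}$, and with $p=i+1$ it lands on the band centered at $\omega_{i+1}$, which are the two neighbors of $F_i$.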
  • Phase Distortion: Perfect reconstruction requires that the analysis/synthesis system has no phase distortion. To guarantee the absence of phase distortion, we require that the composite system
    $$Q(z) := \frac{U(z)}{X(z)} = \frac{1}{N} \sum_{i=0}^{N-1} \Big[ a_i c_i H^2\big(e^{-j\frac{\pi}{2N}(2i+1)} z\big) + a_i^* c_i^* H^2\big(e^{j\frac{\pi}{2N}(2i+1)} z\big) \Big]$$
    (14)
    has a linear phase response. (Recall that a linear phase response is equivalent to a pure delay in the time domain.) This linear-phase constraint will provide the final condition used to specify the constants $\{a_i\}$ and $\{c_i\}$. We start by examining the impulse response of $Q(z)$. Using a technique analogous to Equation 2, we can write
    $$Q(z) = \frac{2}{N} \sum_{n=0}^{2M-2} \sum_{i=0}^{N-1} \Big[ \operatorname{Re}(a_i c_i) \cos\Big(\pi\frac{2i+1}{2N}n\Big) - \operatorname{Im}(a_i c_i) \sin\Big(\pi\frac{2i+1}{2N}n\Big) \Big] \sum_k h_k h_{n-k}\, z^{-n}$$
    (15)
    Above, we have used the property that multiplication in the z-domain implies convolution in the time domain. For $Q(z)$ to be linear phase, its impulse response must be symmetric. Let us assume that the prototype filter $H(z)$ is linear phase, so that $\{h_n\}$ is symmetric. Then $\sum_k h_k h_{n-k}$ is symmetric about $n=M-1$, and thus for linear phase $Q(z)$, we require that the quantity
    $$\sum_{i=0}^{N-1} \Big[ \operatorname{Re}(a_i c_i) \cos\Big(\pi\frac{2i+1}{2N}n\Big) - \operatorname{Im}(a_i c_i) \sin\Big(\pi\frac{2i+1}{2N}n\Big) \Big]$$
    (16)
    is symmetric about $n=M-1$, i.e.,
    $$\begin{aligned}
    & \sum_{i=0}^{N-1} \Big[ \operatorname{Re}(a_i c_i) \cos\Big(\pi\frac{2i+1}{2N}(M-1+n)\Big) - \operatorname{Im}(a_i c_i) \sin\Big(\pi\frac{2i+1}{2N}(M-1+n)\Big) \Big] \\
    &\quad = \sum_{i=0}^{N-1} \Big[ \operatorname{Re}(a_i c_i) \cos\Big(\pi\frac{2i+1}{2N}(M-1-n)\Big) - \operatorname{Im}(a_i c_i) \sin\Big(\pi\frac{2i+1}{2N}(M-1-n)\Big) \Big]
    \end{aligned}$$
    (17)
    for $n=0,\ldots,M-1$. Using trigonometric identities, it can be shown that the condition above is equivalent to
    $$0 = \sum_{i=0}^{N-1} \sin\Big(\pi\frac{2i+1}{2N}n\Big) \Big[ \operatorname{Re}(a_i c_i) \sin\Big(\pi\frac{2i+1}{2N}(M-1)\Big) + \operatorname{Im}(a_i c_i) \cos\Big(\pi\frac{2i+1}{2N}(M-1)\Big) \Big],$$
    (18)
    which is satisfied when
    $$\frac{\operatorname{Im}(a_i c_i)}{\operatorname{Re}(a_i c_i)} = -\frac{\sin\Big(\pi\frac{2i+1}{2N}(M-1)\Big)}{\cos\Big(\pi\frac{2i+1}{2N}(M-1)\Big)} = \tan\Big(-\pi\frac{2i+1}{2N}(M-1)\Big).$$
    (19)
    Restricting $|a_i| = |c_i| = 1$, the previous equation requires that
    $$a_i c_i = e^{-j\pi\frac{2i+1}{2N}(M-1)}.$$
    (20)
    It can be easily verified that the following $\{a_i\}$ and $\{c_i\}$ satisfy conditions Equation 9, Equation 12, and Equation 20:
    $$\begin{aligned}
    a_i &= e^{-j\pi\frac{M+N-1}{4N}(2i+1)} \\
    c_i &= e^{-j\pi\frac{M-N-1}{4N}(2i+1)}.
    \end{aligned}$$
    (21)
    Plugging these into the expression for $H_i(z)$ we find that
    $$\begin{aligned}
    H_i(z) &= a_i H\big(e^{-j\pi\frac{2i+1}{2N}} z\big) + a_i^* H\big(e^{j\pi\frac{2i+1}{2N}} z\big) \\
    &= \sum_{n=0}^{M-1} \Big( a_i e^{j\pi\frac{2i+1}{2N}n} + a_i^* e^{-j\pi\frac{2i+1}{2N}n} \Big) h_n z^{-n} \\
    &= \sum_{n=0}^{M-1} \Big( e^{j\pi\frac{2i+1}{2N}\left(n-\frac{M+N-1}{2}\right)} + e^{-j\pi\frac{2i+1}{2N}\left(n-\frac{M+N-1}{2}\right)} \Big) h_n z^{-n} \\
    &= \sum_{n=0}^{M-1} \underbrace{2\cos\Big(\pi\frac{2i+1}{2N}\Big(n-\frac{M+N-1}{2}\Big)\Big) h_n}_{\text{impulse response of } H_i(z)} z^{-n}.
    \end{aligned}$$
    (22)
    Repeating this procedure for $K_i(z)$ yields
    $$K_i(z) = \sum_{n=0}^{M-1} \underbrace{2\cos\Big(\pi\frac{2i+1}{2N}\Big(n-\frac{M-N-1}{2}\Big)\Big) h_n}_{\text{impulse response of } K_i(z)} z^{-n}.$$
    (23)
    At this point we make a few comments on the design of the lowpass prototype $H(z)$. The perfect $H(z)$ would be an ideal linear-phase lowpass filter with cutoff at $\omega = \pi/2N$, as illustrated in Figure 5. Such a filter would perfectly separate the subbands as well as yield flat composite magnitude response, as per Equation 14. Unfortunately, however, this perfect filter is not realizable with a finite number of filter coefficients. So, what we really want is a finite-length FIR filter having good frequency selectivity, nearly-flat composite response, and linear phase. The length-512 prototype filter specified in the MPEG standards is such a filter, as evidenced by the responses in Figure 6. Unfortunately, the standards do not describe how this filter was designed, and a thorough discussion of multirate filter design is outside the scope of this course. For more on prototype filter design, we point the interested reader to page 358 of Vaidyanathan or Crochiere & Rabiner.
    Figure 5: Ideal (dashed) and typical (solid) prototype-filter magnitude responses for the cosine-modulated filterbank. Note bandwidth relative to (Reference).
    This figure is a graph with horizontal axis ω, ranging in value from -π to π, and vertical axis |H(π)|. There is one dashed rectangle with its base sitting on the horizontal axis, and  with its width measured from horizontal position -π/2N to π/2N. The height is not measured or labeled. There is also a solid curve that, in a calmly wavelike distortion follows the horizontal axis to the bottom-left vertex of the rectangle. The curve then sharply increases to follow the boundary of the dashed rectangle, until at the bottom-right vertex it flattens to continue following the horizontal axis to the edge of the graph.
    Figure 6: Magnitude response $|H(\omega)|$ of MPEG prototype filter and the resulting composite response $|Q(\omega)|$, where $N=32$ and $M=16N=512$.
    This figure is comprised of two cartesian graphs. Both graphs show waves plotted with radians on the horizontal axis and magnitude [dB] on the vertical axis. The first graph is titled prototype filter. The vertical values on this graph range from -120 to 20, and the horizontal values from 0 to 3. The graph begins at nearly a vertical value of 20, immediately falling into a series of nonuniform waves of varying amplitudes and wavelengths in no distinct pattern. There are perhaps one hundred of these waves, never reaching a vertical value again higher than -80, and continuing to the right side of the graph. The second graph is titled composite system. The vertical values range from -2 x 10^-4 to 8 x 10^-4. The horizontal values range from 0 to 3. The waves in this graph follow a rigid, predictable pattern. They have extremely short wavelengths and there are perhaps 150 waves occurring across the page. The waves are centered around a vertical value of 3, and follow a repeating amplitude pattern of 3.2, 3.1, 3.1, 3.2, 2.5.
    To conclude, Equation 22 and Equation 23 give impulse response expressions for a set of real-valued filters that comprise a near-perfectly reconstructing filterbank (under suitable selection of $\{h_n\}$). This is commonly referred to (see footnote 2) as a “cosine-modulated filterbank” because all filters are based on cosine modulations of a real-valued linear-phase lowpass prototype $H(z)$. The near-perfect reconstruction property follows from the frequency-domain cancellation of adjacent-spectrum aliasing and the lack of phase distortion.
    It should be noted that our derivation of the cosine-modulated filterbank is similar to that in Rothweiler ICASSP 83 except for the treatment of phase distortion. See Chapter 8 of Vaidyanathan for a more comprehensive view of cosine-modulated filterbanks.
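The claim that the constants in Equation 21 satisfy Equation 9, Equation 12, and Equation 20 can be spot-checked numerically. The following is a minimal sketch under the document's definitions; the function names and the tolerance are illustrative assumptions.

```python
import cmath
import math

def a_coef(i, N, M):
    """a_i from Equation 21."""
    return cmath.exp(-1j * math.pi * (M + N - 1) * (2 * i + 1) / (4 * N))

def c_coef(i, N, M):
    """c_i from Equation 21."""
    return cmath.exp(-1j * math.pi * (M - N - 1) * (2 * i + 1) / (4 * N))

def check_conditions(N, M, tol=1e-9):
    """Return (eq9_ok, eq12_ok, eq20_ok) for the given N and M."""
    a = [a_coef(i, N, M) for i in range(N)]
    c = [c_coef(i, N, M) for i in range(N)]
    # Equation 9: a_i c_i^* = -a_{i-1} c_{i-1}^* for 1 <= i <= N-1.
    eq9 = all(abs(a[i] * c[i].conjugate() + a[i - 1] * c[i - 1].conjugate()) < tol
              for i in range(1, N))
    # Equation 12: a_i c_i^* = -a_i^* c_i at the band edges i = 0 and i = N-1.
    eq12 = all(abs(a[i] * c[i].conjugate() + a[i].conjugate() * c[i]) < tol
               for i in (0, N - 1))
    # Equation 20: a_i c_i = exp(-j pi (2i+1)(M-1)/(2N)).
    eq20 = all(abs(a[i] * c[i]
                   - cmath.exp(-1j * math.pi * (2 * i + 1) * (M - 1) / (2 * N))) < tol
               for i in range(N))
    return eq9, eq12, eq20

result_mpeg = check_conditions(32, 513)   # N and M used later for MPEG
result_small = check_conditions(4, 9)     # a smaller, arbitrary test case
```

The checks follow from $a_i c_i^* = e^{-j\pi(2i+1)/2}$, which is purely imaginary and changes sign from one $i$ to the next.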
  • Polyphase Implementations: Recall the uniformly modulated filterbank in Figure 4 from "Uniformly-Modulated Filterbanks", whose combined modulator-filter coefficients can be constructed using products of the terms $h_n$ and $e^{j\frac{\pi}{N}in}$. Figure 5 from "Uniformly-Modulated Filterbanks" shows a computationally-efficient polyphase/DFT implementation of the analysis filter which requires only $M$ multiplies and one $N$-dimensional DFT computation for calculation of $N$ subband outputs. We might wonder: Is there a similar polyphase/fast-transform implementation of the cosine-modulated filterbank derived in this section? From Equation 22, we see that the impulse responses of $\{H_i(z)\}$ are products of the terms $h_n$ and $\cos\Big(\pi\frac{2i+1}{2N}\Big(n-\frac{M+N-1}{2}\Big)\Big)$ for $n=0,\ldots,M-1$. Note that the inverse-DCT matrix $C_N^t$ can be specified via components with form similar to the cosine term in Equation 22:
    $$\big[C_N^t\big]_{i,n} = \sqrt{\frac{2}{N}}\,\alpha_n \cos\Big(\frac{\pi(2i+1)}{2N}n\Big); \quad i,n = 0,\ldots,N-1, \quad \text{for } \alpha_0 = 1/\sqrt{2},\ \alpha_{n\neq 0} = 1.$$
    (24)
    Thus it may not be surprising that there exist polyphase/DCT implementations of the cosine-modulated filterbank. Indeed, one such implementation is specified in the MPEG-2 audio compression standard (see ISO/IEC 13818-3). This particular implementation is the focus of the next section.

MPEG Filterbank Implementation

  • Since MPEG audio compression standards are so well-known and widespread, a detailed look at the MPEG filterbank implementation is warranted. The cosine-modulated, or polyphase-quadrature filterbank described in the previous section is used in MPEG Layers 1-3. (The MPEG hierarchy will be described in a later chapter.) This section discusses the specific implementation suggested by the MPEG-2 standard (see ISO/IEC 13818-3).
  • The MPEG standard specifies 512 prototype filter coefficients, the first of which is zero. To adapt the MPEG filter to our cosine-modulated-filterbank framework, we append a zero-valued 513th coefficient so that the resulting MPEG prototype filter becomes symmetric and hence linear phase. Since the standard specifies $N=32$ frequency bands, we have
    $$M = 513 = 16N + 1.$$
    (25)
    Plugging this value of $M$ into the filter expressions Equation 22 and Equation 23, the $2\pi$-periodicity of the cosine implies that they may be rewritten as follows.
    $$\begin{aligned}
    H_i(z) &= \sum_{n=0}^{16N-1} \underbrace{2\cos\Big(\pi\frac{2i+1}{2N}\Big(n-\frac{N}{2}\Big)\Big) h_n}_{\text{impulse response of } H_i(z)} z^{-n} \\
    K_i(z) &= \sum_{n=0}^{16N-1} \underbrace{2\cos\Big(\pi\frac{2i+1}{2N}\Big(n+\frac{N}{2}\Big)\Big) h_n}_{\text{impulse response of } K_i(z)} z^{-n}.
    \end{aligned}$$
    (26)
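The periodicity step that turns Equation 22 and Equation 23 into Equation 26 can be confirmed numerically: with $M=16N+1$, the offsets $\frac{M+N-1}{2}$ and $\frac{M-N-1}{2}$ differ from $\frac{N}{2}$ and $-\frac{N}{2}$ by $8N$, which contributes a multiple of $2\pi$ to every cosine argument. The sketch below compares the cosine factors only (the prototype taps $h_n$ are common to both forms); the function names are illustrative assumptions.

```python
import math

def theta(i, N):
    """Modulation frequency pi(2i+1)/(2N) of band i."""
    return math.pi * (2 * i + 1) / (2 * N)

def analysis_cos_general(i, n, N, M):
    """Cosine factor of Equation 22."""
    return math.cos(theta(i, N) * (n - (M + N - 1) / 2))

def synthesis_cos_general(i, n, N, M):
    """Cosine factor of Equation 23."""
    return math.cos(theta(i, N) * (n - (M - N - 1) / 2))

def analysis_cos_mpeg(i, n, N):
    """Cosine factor of H_i(z) in Equation 26."""
    return math.cos(theta(i, N) * (n - N / 2))

def synthesis_cos_mpeg(i, n, N):
    """Cosine factor of K_i(z) in Equation 26."""
    return math.cos(theta(i, N) * (n + N / 2))

N = 32
M = 16 * N + 1
max_err = max(
    max(abs(analysis_cos_general(i, n, N, M) - analysis_cos_mpeg(i, n, N)),
        abs(synthesis_cos_general(i, n, N, M) - synthesis_cos_mpeg(i, n, N)))
    for i in range(N) for n in range(16 * N)
)
```

Here `max_err` should be at rounding-error level, confirming that the two sets of expressions describe the same filters.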
  • Encoding: Here we derive the encoder filterbank implementation suggested in the MPEG-2 standard (see ISO/IEC 13818-3). Using $x_i(n)$ to denote the output of the $i$th analysis filter, we have
    $$x_i(n) = \sum_{k=0}^{16N-1} \Big[ 2\cos\Big(\pi\frac{2i+1}{2N}\Big(k-\frac{N}{2}\Big)\Big) h_k \Big] x(n-k).$$
    (27)
    The relationship between $x_i(n)$ and its downsampled version $s_i(m)$ is given by
    $$s_i(m) = x_i(mN),$$
    (28)
    so that the downsampled analysis output $s_i(m)$ can be written as
    $$s_i(m) = \sum_{n=0}^{16N-1} \Big[ 2\cos\Big(\pi\frac{2i+1}{2N}\Big(n-\frac{N}{2}\Big)\Big) h_n \Big] x(mN-n).$$
    (29)
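    Using the substitution $n = kN + \ell$ for $0 \le \ell \le N-1$, and writing $\langle k \rangle_2$ for $k$ modulo 2,
    $$\begin{aligned}
    s_i(m) &= 2 \sum_{k=0}^{15} \sum_{\ell=0}^{N-1} \underbrace{\cos\Big(\pi\frac{2i+1}{2N}\Big(kN+\ell-\frac{N}{2}\Big)\Big)}_{\substack{\text{repeats every 4 increments of } k \\ \text{sign changes every 2 increments of } k}} h_{kN+\ell}\; x\big((m-k)N-\ell\big) \\
    &= \sum_{k=0}^{15} \sum_{\ell=0}^{N-1} \underbrace{\cos\Big(\pi\frac{2i+1}{2N}\Big(\langle k \rangle_2 N+\ell-\frac{N}{2}\Big)\Big)}_{\text{repeats every 2 increments of } k}\; \underbrace{2(-1)^{\lfloor k/2 \rfloor} h_{kN+\ell}}_{\text{analysis window}}\; x\big((m-k)N-\ell\big)
    \end{aligned}$$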
    (30)
    Figure 7 illustrates this process.
    Figure 7: MPEG encoder filterbank implementation suggested in ISO/IEC 13818-3.
    This is a complex flowchart with general downward movement of a number of rows of labeled, connected rectangles, all pointing down at a row of circles containing plus signs, which point down at a large box titled cosine matrix transformation. The columns of these connected rectangles are labeled across from k = 0 to k = 16. The first row of  boxes contain labels across from x(mN),... to x(m-16)N +1. The first set of arrows pointing down to the next set of rectangles are all labeled with a * sign. The second row of boxes are labeled across from h_0, ..., h_N-1 to h_15N, ... , h_16N-1. This is followed by another set of arrows with asterisks. The next row of  rectangles contain a series of the number 2 or the number -2. Below this are arrows labeled =. Below these arrows are more boxes that contain 13 hash marks inside. From different hash marks are longer arrows that point at the different circles containing plus signs, which in turn all point at the large cosine matrix transformation box. The positions at which these circles point at the box are labeled across from j = 0 to j = 2N-1. On the right side of the box are a series of equations from top to bottom, i = 0 to i = N - 1. To the right of each of these equations is an arrow pointing to the right at the variables from top to bottom s_0(m) to s_N-1(m).
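To make the encoder structure concrete, the sketch below computes the subband samples $s_i(m)$ in two ways: directly from Equation 29, and via the window, shift-and-add, and cosine-matrix steps of Equation 30 and Figure 7. It is an illustration under stated assumptions: the prototype `h` is a placeholder window (not the MPEG coefficient table), the input `x` is an arbitrary test signal, $N=8$ is used instead of 32 to keep the run short, and all function names are hypothetical.

```python
import math

def direct_subband(x, h, m, i, N):
    """Equation 29: s_i(m) = sum_n 2 cos(theta_i (n - N/2)) h_n x(mN - n)."""
    theta = math.pi * (2 * i + 1) / (2 * N)
    return sum(2.0 * math.cos(theta * (n - N / 2)) * h[n] * x[m * N - n]
               for n in range(len(h)))

def folded_subbands(x, h, m, N):
    """Equation 30: window the input, fold the blocks into 2N values, then matrix."""
    blocks = len(h) // N                      # 16 blocks of N taps in MPEG
    w = [0.0] * (2 * N)
    for k in range(blocks):
        sign = -1.0 if (k // 2) % 2 else 1.0  # (-1)^floor(k/2)
        for l in range(N):
            # Blocks with the same k mod 2 share the same cosine row.
            w[(k % 2) * N + l] += 2.0 * sign * h[k * N + l] * x[(m - k) * N - l]
    return [sum(math.cos(math.pi * (2 * i + 1) / (2 * N) * (j - N / 2)) * w[j]
                for j in range(2 * N))
            for i in range(N)]

N = 8
L = 16 * N
h = [math.sin(math.pi * (n + 0.5) / L) ** 2 / L for n in range(L)]   # placeholder taps
x = [math.sin(0.37 * n) + 0.5 * math.cos(1.3 * n) for n in range(17 * N)]
m = 16                                        # first frame with full input history

direct = [direct_subband(x, h, m, i, N) for i in range(N)]
folded = folded_subbands(x, h, m, N)
max_diff = max(abs(d - f) for d, f in zip(direct, folded))
```

The direct form needs $16N$ multiplies per subband, while the folded form spends $16N$ multiplies once on windowing and then applies an $N \times 2N$ cosine matrix, which is the operation the later DCT discussion speeds up.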
  • Decoding: Here we derive the decoder filterbank implementation suggested in the MPEG-2 standard (see ISO/IEC 13818-3). Using $y_i(n)$ to denote the output of the $i$th upsampler,
    $$u_i(n) = \sum_{k=0}^{16N-1} 2\cos\Big(\pi\frac{2i+1}{2N}\Big(k+\frac{N}{2}\Big)\Big) h_k\, y_i(n-k).$$
    (31)
    The input to the upsampler $s_i(m)$ is related to the output $y_i(n)$ by
    $$y_i(n) = \begin{cases} s_i(n/N) & \text{when } n/N \in \mathbb{Z} \\ 0 & \text{else,} \end{cases}$$
    (32)
    so that
    $$u_i(n) = \sum_{\{k:\, \frac{n-k}{N} \in \mathbb{Z}\}} 2\cos\Big(\pi\frac{2i+1}{2N}\Big(k+\frac{N}{2}\Big)\Big) h_k\, s_i\Big(\frac{n-k}{N}\Big).$$
    (33)
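    Let's write $n = mN + \ell$ for $0 \le \ell \le N-1$ and $k = pN + q$ for $0 \le q \le N-1$. Then due to the restricted ranges of $\ell$ and $q$,
    $$\frac{n-k}{N} = m - p + \frac{\ell - q}{N} \in \mathbb{Z} \quad \Leftrightarrow \quad \ell = q.$$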
    (34)
    Using these substitutions in the previous equation for $u_i(n)$,
    $$u_i(mN+\ell) = 2 \sum_{p=0}^{15} \cos\Big(\pi\frac{2i+1}{2N}\Big(pN+\ell+\frac{N}{2}\Big)\Big) h_{pN+\ell}\, s_i(m-p).$$
    (35)
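    Summing $u_i(mN+\ell)$ over $i$ to create $u(mN+\ell)$, and writing $\langle p \rangle_2$ for $p$ modulo 2,
    $$\begin{aligned}
    u(mN+\ell) &= 2 \sum_{i=0}^{N-1} \sum_{p=0}^{15} \underbrace{\cos\Big(\pi\frac{2i+1}{2N}\Big(pN+\ell+\frac{N}{2}\Big)\Big)}_{\substack{\text{repeats every 4 increments of } p \\ \text{sign changes every 2 increments of } p}} h_{pN+\ell}\, s_i(m-p) \\
    &= \sum_{p=0}^{15} \underbrace{2(-1)^{\lfloor p/2 \rfloor} h_{pN+\ell}}_{\text{synthesis window}} \sum_{i=0}^{N-1} \cos\Big(\pi\frac{2i+1}{2N}\Big(\langle p \rangle_2 N+\ell+\frac{N}{2}\Big)\Big) s_i(m-p), \\
    \text{where} \quad & \cos\Big(\pi\frac{2i+1}{2N}\Big(\langle p \rangle_2 N+\ell+\frac{N}{2}\Big)\Big) = \begin{cases} \cos\Big(\pi\frac{2i+1}{2N}\Big(\ell+\frac{N}{2}\Big)\Big) & p \text{ even} \\ \cos\Big(\pi\frac{2i+1}{2N}\Big(\ell+N+\frac{N}{2}\Big)\Big) & p \text{ odd.} \end{cases}
    \end{aligned}$$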
    (36)
    If we define
    $$v_j(m) = \sum_{i=0}^{N-1} \cos\Big(\pi\frac{2i+1}{2N}\Big(j+\frac{N}{2}\Big)\Big) s_i(m) \quad \text{for } 0 \le j \le 2N-1,$$
    (37)
    (note the range of $j$!) then we can rewrite
    $$u(mN+\ell) = \sum_{p=0,2,\ldots,14} 2(-1)^{\lfloor p/2 \rfloor} h_{pN+\ell}\, v_{\ell}(m-p) + \sum_{p=1,3,\ldots,15} 2(-1)^{\lfloor p/2 \rfloor} h_{pN+\ell}\, v_{\ell+N}(m-p).$$
    (38)
    Figure 8 illustrates the construction of $u(mN+\ell)$ using the notation
    $$\mathbf{v}(m) = \begin{bmatrix} v_0(m) \\ \vdots \\ v_{2N-1}(m) \end{bmatrix}.$$
    (39)
    Figure 8: MPEG decoder filterbank implementation suggested in ISO/IEC 13818-3.
    This is another complex flowchart that generally moves in the reverse direction of the encoder flowchart in Figure 7. The chart begins with the large cosine matrix transformation box, containing the same labels, then points down at a row of connected boxes labeled from v(m) to v(m - 15). Below these boxes are a series of shaded, but unlabeled boxes. Below these are the arrows with asterisks, which point at boxes containing the h_subscript labels. Below these are more asterisk arrows, which point at the boxes with the series of 2's or -2's. Below these are the equal sign arrows which point down at the boxes with the hash marks. From the hash marks are arrows that point at each circle containing a plus sign, and from each circle there is an arrow pointing down at a final single rectangle containing six hash marks (8 including the sides of the rectangle) numbered from 0 to N.
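The decoder derivation can be checked in the same spirit as the encoder. The sketch below evaluates one block of output samples $u(mN+\ell)$ directly from Equation 33 (summed over $i$) and again through the cosine matrixing of Equation 37 followed by the windowed sum of Equation 36. As before, the prototype `h`, the subband data `s`, the reduced size $N=8$, and the function names are illustrative assumptions rather than MPEG values.

```python
import math

def cos_term(i, r, N):
    """cos(pi (2i+1) r / (2N))."""
    return math.cos(math.pi * (2 * i + 1) / (2 * N) * r)

def direct_output(s, h, n, N):
    """Equation 33 summed over i: only taps k with (n - k)/N integer contribute."""
    total = 0.0
    for k in range(len(h)):
        if (n - k) % N == 0:
            frame = (n - k) // N
            total += sum(2.0 * cos_term(i, k + N / 2, N) * h[k] * s[frame][i]
                         for i in range(N))
    return total

def matrixed_output(s, h, m, l, N):
    """Equations 36-37: cosine matrixing into v_j, then the synthesis window."""
    total = 0.0
    for p in range(len(h) // N):
        j = l if p % 2 == 0 else l + N        # v_l for even p, v_{l+N} for odd p
        v_j = sum(cos_term(i, j + N / 2, N) * s[m - p][i] for i in range(N))
        sign = -1.0 if (p // 2) % 2 else 1.0  # (-1)^floor(p/2)
        total += 2.0 * sign * h[p * N + l] * v_j
    return total

N = 8
L = 16 * N
h = [math.sin(math.pi * (n + 0.5) / L) ** 2 / L for n in range(L)]   # placeholder taps
s = [[math.sin(0.7 * frame + 1.1 * i) for i in range(N)] for frame in range(16)]
m = 15                                        # needs subband frames m-15 .. m

max_diff = max(abs(direct_output(s, h, m * N + l, N) - matrixed_output(s, h, m, l, N))
               for l in range(N))
```

In a real decoder each $v_j(m)$ would be computed once per frame and stored for the next 16 frames, as Figure 8 suggests; the sketch recomputes it for clarity.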
  • DCT Implementation of Cosine Matrixing: As seen in Figure 7 and Figure 8, the filterbank implementations suggested by the MPEG standard require a cosine matrix operation that, if implemented using straightforward arithmetic, requires $32 \times 64 = 2048$ multiply/adds at both the encoder and decoder. Note, however, that the cosine transformations in Figure 7 and Figure 8 do bear a great deal of similarity to the DCT:
    $$\begin{aligned}
    y_k &= \sqrt{\frac{2}{N}}\,\alpha_k \sum_{n=0}^{N-1} x_n \cos\Big(\pi\frac{2n+1}{2N}k\Big); \quad k = 0,\ldots,N-1, \quad \text{for } \alpha_0 = 1/\sqrt{2},\ \alpha_{k\neq 0} = 1, \\
    x_n &= \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \alpha_k y_k \cos\Big(\pi\frac{2n+1}{2N}k\Big); \quad n = 0,\ldots,N-1,
    \end{aligned}$$
    (40)
    which we know has a fast algorithm: Lee's $32 \times 32$ fast-DCT, for example, requires only 80 multiplications and 209 additions (see B.G.Lee TASSP Dec 84). So how do we implement the matrix operation using the fast-DCT? A technique has been described clearly in Konstantinides SPL 1994, the results of which are summarized below. At the encoder, the matrix operation can be written
    $$s_i(m) = \sum_{j=0}^{2N-1} \cos\Big(\pi\frac{2i+1}{2N}\Big(j-\frac{N}{2}\Big)\Big) w_j(m) \quad \text{for } i = 0,\ldots,N-1,$$
    (41)
    where $\{w_0(m),\ldots,w_{2N-1}(m)\}$ is created from $\{x(mN),\ldots,x(mN-16N+1)\}$ by windowing, shifting, and adding. (See Figure 7.) We can write
    $$s_i(m) = \sum_{j=0}^{N-1} \cos\Big(\pi\frac{2i+1}{2N}j\Big) \bar{w}_j(m); \quad i = 0,\ldots,N-1,$$
    (42)
    where, for $N=32$, $\{\bar{w}_j(m)\}$ is the following manipulation of $\{w_j(m)\}$:
    $$\bar{w}_j(m) := \begin{cases} w_{16}(m) & j = 0 \\ w_{16+j}(m) + w_{16-j}(m) & j = 1,2,\ldots,16 \\ w_{16+j}(m) - w_{80-j}(m) & j = 17,18,\ldots,31. \end{cases}$$
    (43)
    Compare Equation 42 to the inverse DCT in Equation 40 (lower equation). At the decoder, the matrix operation can be written
    $$v_j(m) = \sum_{i=0}^{N-1} \cos\Big(\pi\frac{2i+1}{2N}\Big(j+\frac{N}{2}\Big)\Big) s_i(m) \quad \text{for } j = 0,\ldots,2N-1,$$
    (44)
    where $\{v_0(m),\ldots,v_{2N-1}(m)\}$ are windowed, shifted, and added to compute $\{u(m)\}$. (See Figure 8.) It is shown in Konstantinides SPL 1994 that, for $N=32$, $\{v_j(m)\}$ can be calculated by first computing $\{\bar{v}_j(m)\}$:
    $$\bar{v}_j(m) = \sum_{i=0}^{N-1} \cos\Big(\pi\frac{2i+1}{2N}j\Big) s_i(m); \quad j = 0,\ldots,N-1$$
    (45)
    and rearranging the outputs according to
    $$v_j(m) := \begin{cases} \bar{v}_{j+16}(m) & j = 0,1,\ldots,15, \\ 0 & j = 16, \\ -\bar{v}_{48-j}(m) & j = 17,18,\ldots,47, \\ -\bar{v}_{j-48}(m) & j = 48,49,\ldots,63. \end{cases}$$
    (46)
    Compare Equation 45 to the DCT in Equation 40 (upper equation).
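The index manipulations in Equation 43 and Equation 46 are easy to get wrong, so a numerical cross-check is useful. The sketch below compares the $32 \times 64$ cosine matrixing of Equation 41 and Equation 44 with the $32 \times 32$ forms of Equation 42 and Equation 45 combined with the folding and unfolding rules. The inner sums are written as plain loops rather than a fast DCT, the test vectors are arbitrary, and the function names are illustrative assumptions.

```python
import math

N = 32

def cosmat(i, r):
    """cos(pi (2i+1) r / (2N)) for N = 32."""
    return math.cos(math.pi * (2 * i + 1) * r / (2 * N))

def encoder_direct(w):
    """Equation 41: 32 x 64 matrixing of w_0 .. w_63."""
    return [sum(cosmat(i, j - 16) * w[j] for j in range(64)) for i in range(N)]

def encoder_folded(w):
    """Equations 42-43: fold 64 inputs to 32, then 32 x 32 matrixing."""
    wb = [0.0] * N
    wb[0] = w[16]
    for j in range(1, 17):
        wb[j] = w[16 + j] + w[16 - j]
    for j in range(17, 32):
        wb[j] = w[16 + j] - w[80 - j]
    return [sum(cosmat(i, j) * wb[j] for j in range(N)) for i in range(N)]

def decoder_direct(s):
    """Equation 44: 64 x 32 matrixing of s_0 .. s_31."""
    return [sum(cosmat(i, j + 16) * s[i] for i in range(N)) for j in range(64)]

def decoder_unfolded(s):
    """Equations 45-46: 32 x 32 matrixing, then rearrange into 64 outputs."""
    vb = [sum(cosmat(i, j) * s[i] for i in range(N)) for j in range(N)]
    v = [0.0] * 64
    for j in range(0, 16):
        v[j] = vb[j + 16]
    v[16] = 0.0
    for j in range(17, 48):
        v[j] = -vb[48 - j]
    for j in range(48, 64):
        v[j] = -vb[j - 48]
    return v

w = [math.sin(0.3 * j) + 0.2 * j for j in range(64)]      # arbitrary test input
s = [math.cos(0.9 * i) - 0.1 * i for i in range(N)]       # arbitrary test input

enc_diff = max(abs(p - q) for p, q in zip(encoder_direct(w), encoder_folded(w)))
dec_diff = max(abs(p - q) for p, q in zip(decoder_direct(s), decoder_unfolded(s)))
```

Both differences should be at rounding-error level. The rules rely on the cosine being even in its argument, on $\cos\big(\pi\frac{2i+1}{2N}(2N-r)\big) = -\cos\big(\pi\frac{2i+1}{2N}r\big)$, and on the cosine vanishing at $r=N$, which is why $w_{48}$ never appears and $v_{16}$ is zero.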

Footnotes

  1. In the structure in Figure 4 from "Uniformly-Modulated Filterbanks", it would be reasonable to replace the standard DFT with a real-valued DFT (defined in the notes on transform coding), requiring $N\log_2 N$ real multiplies when $N$ is a power of 2. Though it is not clear to the author why such a structure was not adopted in the MPEG standards, the cosine-modulated filterbank derived in this section has equivalent performance and, with its polyphase/DCT implementation, equivalent implementation cost.
  2. The MPEG standards refer to this filterbank as a “polyphase quadrature” filterbank (PQF), the name given to the technique by an early technical paper: Rothweiler ICASSP 83.
