Connexions

Quantization Error in FIR Filters

Module by: Douglas L. Jones

Summary: FIR filters suffer from both data and coefficient quantization; each has different effects. Double-precision accumulation inside the FIR filter structure greatly reduces the data quantization error.

In digital filters, both the data at various places in the filter, which are continually varying, and the coefficients, which are fixed, must be quantized. The effects of quantization on data and coefficients are quite different, so they are analyzed separately.

Data Quantization

Typically, the input and output in a digital filter are quantized by the analog-to-digital and digital-to-analog converters, respectively. Quantization also occurs at various points in a filter structure, usually after a multiply, since multiplies increase the number of bits.

Direct-form Structures

There are two common possibilities for quantization in a direct-form FIR filter structure: after each multiply, or only once at the end.

Figure 1
Subfigure 1.1: Single-precision accumulate; total variance $= M\Delta^2/12$
Subfigure 1.2: Double-precision accumulate; variance $= \Delta^2/12$
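In the latter structure, a double-length accumulator adds all $2B-1$ bits of each product into the running sum and truncates only once, at the end. Since each quantizer contributes an error of variance $\Delta^2/12$ (where $\Delta$ is the quantization step size), this reduces the output noise variance by a factor of $M$. This structure is therefore much preferred and should be used wherever possible. All DSP microprocessors and most general-purpose computers support double-precision accumulation.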
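To see the factor-of-$M$ difference numerically, here is a minimal Python/NumPy sketch. It assumes a rounding quantizer with step $\Delta = 2^{-(B-1)}$, ignores overflow, and uses random $B$-bit coefficients and input rather than a real filter design; `quantize` and `fir_direct` are hypothetical helpers, not part of any library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def quantize(x, delta):
    """Round to the nearest multiple of delta (assumed rounding quantizer, no overflow)."""
    return delta * np.round(x / delta)


def fir_direct(h, x, delta, double_precision):
    """Direct-form FIR y[n] = sum_k h[k] x[n-k] with one of two quantization options.

    double_precision=False: quantize each of the M products (single-precision accumulate).
    double_precision=True:  sum the exact products, quantize once per output sample.
    """
    M = len(h)
    xpad = np.concatenate([np.zeros(M - 1), x])
    taps = sliding_window_view(xpad, M)[:, ::-1]  # row n holds x[n], x[n-1], ..., x[n-M+1]
    products = taps * h  # exact: float64 easily holds these 2B-1 bit products
    if double_precision:
        return quantize(products.sum(axis=1), delta)
    return quantize(products, delta).sum(axis=1)


rng = np.random.default_rng(0)
B, M, N = 12, 16, 20000
delta = 2.0 ** -(B - 1)
h = quantize(rng.uniform(-0.1, 0.1, M), delta)  # hypothetical B-bit coefficients
x = quantize(rng.uniform(-0.5, 0.5, N), delta)  # hypothetical B-bit input
exact = np.convolve(x, h)[:N]  # infinite-precision reference output (zero initial state)

ratios = {}
for dp in (False, True):
    err = fir_direct(h, x, delta, dp) - exact
    ratios[dp] = err.var() / (delta ** 2 / 12)  # error variance in units of delta^2/12
print(ratios)
```

With these settings the two ratios should come out near $M = 16$ and $1$, in line with the captions of Figure 1.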

Transpose-form

Similarly, the transpose-form FIR filter structure presents two common options for quantization: after each multiply, or once at the end.

Figure 2
Subfigure 2.1: Quantize at each stage before storing intermediate sum. Output variance $= M\Delta^2/12$
Subfigure 2.2: Store double-precision partial sums. Costs more memory, but variance $= \Delta^2/12$

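The transpose form is less convenient for double-precision accumulation: the partial sums are stored in the $M-1$ delay registers rather than held in a single accumulator, so every register must be double length. This is a significant disadvantage of this structure.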
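The same experiment can be run on the transpose form. This is a minimal sketch under the same assumptions as a direct-form check would use (rounding quantizer, no overflow handling, random $B$-bit data); `fir_transpose` is a hypothetical helper. In the first option every stored partial sum is quantized, so $M$ quantizers lie on the path to each output; in the second the registers keep exact, double-length sums.

```python
import numpy as np


def quantize(x, delta):
    """Round to the nearest multiple of delta (assumed rounding quantizer, no overflow)."""
    return delta * np.round(x / delta)


def fir_transpose(h, x, delta, double_precision):
    """Transpose-form FIR with M-1 delay registers (requires M >= 2).

    double_precision=False: quantize each intermediate sum before storing it.
    double_precision=True:  store exact (double-length) partial sums, quantize only the output.
    """
    M = len(h)
    reg = np.zeros(M - 1)  # reg[k] is the delayed partial sum added to h[k] * x[n]
    y = np.empty(len(x))
    for n, xn in enumerate(x):
        v = h * xn  # all M products formed from the current input sample
        y[n] = quantize(v[0] + reg[0], delta)
        new_reg = v[1:] + np.append(reg[1:], 0.0)  # each stage adds its product to the next register
        reg = new_reg if double_precision else quantize(new_reg, delta)
    return y


rng = np.random.default_rng(0)
B, M, N = 12, 16, 20000
delta = 2.0 ** -(B - 1)
h = quantize(rng.uniform(-0.1, 0.1, M), delta)  # hypothetical B-bit coefficients
x = quantize(rng.uniform(-0.5, 0.5, N), delta)  # hypothetical B-bit input
exact = np.convolve(x, h)[:N]  # infinite-precision reference output

ratios = {}
for dp in (False, True):
    err = fir_transpose(h, x, delta, dp) - exact
    ratios[dp] = err.var() / (delta ** 2 / 12)  # error variance in units of delta^2/12
print(ratios)
```

Both options should again give ratios near $M$ and $1$; the price of the second is that every element of `reg` has to be stored at double length.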

Coefficient Quantization

Since a quantized coefficient is fixed for all time, we treat it differently from data quantization. The fundamental question is: how much does the quantization affect the frequency response of the filter?

The quantized filter frequency response is

$$\mathrm{DTFT}(h_Q) = \mathrm{DTFT}(h_{\text{inf. prec.}} + e) = H_{\text{inf. prec.}}(\omega) + H_e(\omega)$$

where $e = h_Q - h_{\text{inf. prec.}}$ is the coefficient quantization error. Assuming the quantization model is correct, $H_e(\omega)$ should be fairly random and white, with the error spread fairly equally over all frequencies $\omega \in [-\pi, \pi)$; however, the randomness of this error destroys any equiripple property or any infinite-precision optimality of a filter.
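The decomposition $H_Q = H_{\text{inf. prec.}} + H_e$ follows from the linearity of the DTFT and is easy to check numerically. The sketch below assumes a Hamming-windowed-sinc lowpass (not an equiripple design) as the infinite-precision filter, rounds its coefficients to $B = 8$ bits, and evaluates the DTFT on a dense grid; `dtft` is a hypothetical helper written out with NumPy.

```python
import numpy as np


def dtft(h, w):
    """Evaluate H(w) = sum_n h[n] exp(-j w n) at each frequency in w."""
    n = np.arange(len(h))
    return np.exp(-1j * np.outer(w, n)) @ h


M, B = 31, 8
delta = 2.0 ** -(B - 1)
n = np.arange(M)
# Assumed example design: windowed-sinc lowpass with cutoff 0.25*pi.
h = 0.25 * np.sinc(0.25 * (n - (M - 1) / 2)) * np.hamming(M)
h_q = delta * np.round(h / delta)  # B-bit coefficients (rounding)
e = h_q - h  # coefficient quantization error

w = np.linspace(-np.pi, np.pi, 2048, endpoint=False)
H, H_q, H_e = dtft(h, w), dtft(h_q, w), dtft(e, w)

print(np.allclose(H_q, H + H_e))  # True: linearity of the DTFT

# Compare the peak stopband response before and after quantization (in dB).
stop = np.abs(w) >= 0.4 * np.pi
for name, resp in (("infinite precision", H), ("quantized", H_q)):
    print(name, 20 * np.log10(np.abs(resp[stop]).max()))
```

Because $H_e(\omega)$ does not vanish in the stopband, the quantized filter's stopband peak is typically noticeably higher than that of the original design.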

Problem 1

What quantization scheme minimizes the $L_2$ quantization error in frequency, that is, minimizes $\int_{-\pi}^{\pi} |H(\omega) - H_Q(\omega)|^2 \, d\omega$? On average, how big is this error?
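When working the problem, it can help to evaluate the $L_2$ error directly. This sketch (NumPy; `l2_freq_error` is a hypothetical helper) approximates the integral with a Riemann sum over $K$ equally spaced frequencies, obtained from a zero-padded FFT of the coefficient error. For an $M$-tap filter and $K \ge M$ this sum is exact and agrees with Parseval's relation $2\pi\sum_n |h_Q[n] - h[n]|^2$. The random coefficients and the round-to-nearest rule are only example inputs; other quantization rules can be passed in as `h_q` for comparison.

```python
import numpy as np


def l2_freq_error(h, h_q, K=4096):
    """Approximate the integral over [-pi, pi) of |H(w) - H_Q(w)|^2 dw.

    Uses K equally spaced frequencies w_k = 2*pi*k/K (one period of the DTFT),
    sampled by a zero-padded FFT of the coefficient error. Requires K >= len(h).
    """
    e = np.asarray(h_q, dtype=float) - np.asarray(h, dtype=float)
    if K < len(e):
        raise ValueError("K must be at least the filter length")
    E = np.fft.fft(e, n=K)  # E[k] = H_e(2*pi*k/K)
    return (2 * np.pi / K) * np.sum(np.abs(E) ** 2)


rng = np.random.default_rng(1)
M, B = 32, 10
delta = 2.0 ** -(B - 1)
h = rng.uniform(-0.5, 0.5, M)  # hypothetical infinite-precision coefficients
h_q = delta * np.round(h / delta)  # one candidate scheme: round to nearest

err_freq = l2_freq_error(h, h_q)
err_parseval = 2 * np.pi * np.sum((h_q - h) ** 2)
print(err_freq, err_parseval)
```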

Ideally, if one knows the coefficients are to be quantized to $B$ bits, one should incorporate this directly into the filter design problem and find the $M$ $B$-bit binary fractional coefficients that minimize the maximum deviation ($L_\infty$ error). This can be done, but it is an integer program, which is known to be NP-hard (i.e., no efficient algorithm is known, and the search can approach brute force in the worst case). This is so computationally expensive that it is rarely done. There are some sub-optimal methods that are much more efficient and usually produce pretty good results.
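The combinatorial cost shows up even in a heavily restricted version of the problem. The sketch below (NumPy, hypothetical helper names) considers only rounding each coefficient down or up to the neighboring $B$-bit value, which already gives $2^M$ candidates, and keeps the one with the smallest maximum deviation on a frequency grid. As a simplification, the deviation is measured from the infinite-precision response rather than from a weighted desired response, and this floor/ceiling search is neither the full integer program nor any particular published sub-optimal method.

```python
import itertools

import numpy as np


def linf_error(h_ref, h_q, K=1024):
    """Max of |H_ref(w) - H_Q(w)| over K equally spaced frequencies (approximate L-infinity error)."""
    e = np.asarray(h_q) - np.asarray(h_ref)
    return np.max(np.abs(np.fft.fft(e, n=K)))


def best_floor_ceil(h, delta):
    """Try rounding each coefficient down or up: 2**len(h) candidates, searched exhaustively."""
    lo = delta * np.floor(h / delta)
    hi = lo + delta
    best, best_err = None, np.inf
    for choice in itertools.product((False, True), repeat=len(h)):
        cand = np.where(choice, hi, lo)
        err = linf_error(h, cand)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err


M, B = 10, 6
delta = 2.0 ** -(B - 1)
n = np.arange(M)
# Assumed example: short Hamming-windowed-sinc lowpass, cutoff 0.3*pi.
h = 0.3 * np.sinc(0.3 * (n - (M - 1) / 2)) * np.hamming(M)

h_round = delta * np.round(h / delta)  # simple rounding is one of the 2**M candidates
best, best_err = best_floor_ceil(h, delta)
print(f"{2 ** M} candidates; rounding: {linf_error(h, h_round):.4f}, best: {best_err:.4f}")
```

Each extra coefficient doubles the number of candidates, and the unrestricted problem allows far more than two values per coefficient, so exhaustive search quickly becomes impractical at realistic filter lengths.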
