Connexions


Fixed-Point Quantization

Module by: Douglas L. Jones

Summary: Finite word lengths introduce quantization error in fixed-point systems. Truncation quantization causes a larger maximum error and a negative bias compared to rounding, but is easier to implement in hardware. Similarly, wraparound overflow typically introduces more error than saturation, but requires less hardware.

The fractional $B$-bit two's complement number representation evenly distributes $2^B$ quantization levels between $-1$ and $1 - 2^{-(B-1)}$. The spacing between quantization levels is then $\frac{2}{2^B} = 2^{-(B-1)} \doteq \Delta_B$. Any signal value falling between two levels is assigned to one of the two levels.
$X_Q = Q[x]$ is our notation for quantization. $e = Q[x] - x$ is then the quantization error.
One method of quantization is rounding, which assigns the signal value to the nearest level. The maximum error is thus $\frac{\Delta_B}{2} = 2^{-B}$.
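As a rough illustration of rounding, the following Python sketch quantizes values onto the $B$-bit grid and measures the largest error over a sweep of inputs. The function name `quantize_round`, the choice $B = 4$, the tie-breaking rule (ties round up), and the use of floating-point arithmetic to stand in for fixed-point hardware are all assumptions made for this example, not part of the original module.

```python
import math

def quantize_round(x, B):
    """Round x to the nearest level of a B-bit fractional grid (hypothetical helper)."""
    delta = 2.0 ** -(B - 1)          # level spacing, Delta_B = 2^-(B-1)
    k = math.floor(x / delta + 0.5)  # index of the nearest level; ties go up
    return k * delta

B = 4                                # assumed word length for the example
delta = 2.0 ** -(B - 1)

# Sweep inputs across the representable range [-1, 1 - Delta_B).
samples = [i / 1000.0 for i in range(-1000, 875)]
max_err = max(abs(quantize_round(x, B) - x) for x in samples)

print("Delta_B / 2 =", delta / 2)
print("largest observed |e| =", max_err)
```

Under these assumptions the largest observed error magnitude stays at or below $\Delta_B / 2$, in line with the bound stated above.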
Figure 1 (Subfigures 1.1 and 1.2)
Another common scheme, which is often easier to implement in hardware, is truncation. Here $Q[x]$ assigns $x$ to the next lowest level, that is, the nearest level at or below $x$.
Figure 2 (Subfigures 2.1 and 2.2)
The worst-case error with truncation approaches $\Delta_B = 2^{-(B-1)}$ in magnitude, which is twice as large as with rounding. Also, the error is never positive, so on average it generally has a non-zero mean (i.e., a bias component).
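The bias can be seen numerically with a small Python sketch that compares the mean error of truncation and rounding over the same inputs. The helper names, the choice $B = 4$, the uniform input sweep, and the floating-point stand-in for fixed-point arithmetic are assumptions for illustration only. Truncation is modeled with `math.floor`, which always moves toward $-\infty$, matching the "next lowest level" rule for both positive and negative values.

```python
import math

def quantize_truncate(x, B):
    """Truncate x to the next lowest level of a B-bit fractional grid (hypothetical helper)."""
    delta = 2.0 ** -(B - 1)
    return math.floor(x / delta) * delta        # floor moves toward -infinity

def quantize_round(x, B):
    """Round x to the nearest level of the same grid (hypothetical helper)."""
    delta = 2.0 ** -(B - 1)
    return math.floor(x / delta + 0.5) * delta

B = 4                                           # assumed word length
delta = 2.0 ** -(B - 1)

# Uniform sweep over the representable range [-1, 1 - Delta_B).
samples = [i / 1000.0 for i in range(-1000, 875)]

trunc_errors = [quantize_truncate(x, B) - x for x in samples]
round_errors = [quantize_round(x, B) - x for x in samples]

mean_trunc = sum(trunc_errors) / len(trunc_errors)
mean_round = sum(round_errors) / len(round_errors)

print("truncation: min e =", min(trunc_errors), " mean e =", mean_trunc)
print("rounding:   mean e =", mean_round)
```

For this uniform sweep the truncation error is never positive and its mean sits near $-\Delta_B / 2$, while the rounding error averages close to zero. Other input distributions would give different means.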
Overflow is the other problem: a result can fall outside the representable range. There are two common ways of handling it: two's complement (or wraparound) overflow and saturation overflow.
Figure 3 (Subfigure 3.1: wraparound; Subfigure 3.2: saturation)
Overflow errors are bad because they are typically large; two's complement (or wraparound) overflow introduces more error than saturation, but is easier to implement in hardware. It also has the advantage that if the sum of several numbers lies within the representable range (between $-1$ and $1$), the final answer will be correct even if intermediate sums overflow! However, wraparound overflow leaves IIR systems susceptible to zero-input large-scale limit cycles, as discussed in another module. As usual, there are many tradeoffs to evaluate, and no one right answer for all applications.
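The two overflow behaviors, and the property that wraparound recovers a correct final sum, can be sketched in Python as follows. The helper names `wrap` and `saturate`, the word length $B = 4$, and the particular terms are assumptions chosen for illustration. The values are exact binary fractions, so floating point reproduces the fixed-point result here.

```python
def wrap(x):
    """Two's complement (wraparound) overflow: fold x into [-1, 1) modulo 2."""
    return (x + 1.0) % 2.0 - 1.0

def saturate(x, B):
    """Saturation overflow: clip x to the range [-1, 1 - Delta_B]."""
    delta = 2.0 ** -(B - 1)
    return max(-1.0, min(1.0 - delta, x))

B = 4                              # assumed word length
terms = [0.75, 0.5, -0.625]        # true sum is 0.625, which is representable

acc_wrap = 0.0
acc_sat = 0.0
for t in terms:
    # The intermediate sum 0.75 + 0.5 = 1.25 overflows in both accumulators.
    acc_wrap = wrap(acc_wrap + t)
    acc_sat = saturate(acc_sat + t, B)

print("single overflow of 1.25: wrap ->", wrap(1.25), " saturate ->", saturate(1.25, B))
print("final sum with wraparound:", acc_wrap)
print("final sum with saturation:", acc_sat)
```

In this example a single overflow of 1.25 wraps to -0.75 (an error of 2) but saturates to 0.875 (an error of 0.375). The running sum nevertheless ends at the correct 0.625 with wraparound, whereas saturation ends at 0.25 because the clipped intermediate value is never undone.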
