The fractional $B$-bit two's
complement number representation evenly distributes
$2^B$ quantization levels between $-1$ and
$1 - 2^{-(B-1)}$. The spacing between quantization levels is then
$\frac{2}{2^B} = 2^{-(B-1)} \doteq \Delta_B$.
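As a concrete illustration of the level layout, the following minimal Python sketch lists the levels for a small word length. The function name `quantization_levels` and the choice $B = 3$ are assumptions made for this example only.

```python
def quantization_levels(B):
    """Return the 2**B levels of a fractional B-bit two's complement format."""
    delta = 2.0 ** -(B - 1)  # spacing between adjacent levels
    # Levels start at -1 and step upward by delta, ending at 1 - delta.
    return [-1.0 + k * delta for k in range(2 ** B)]


levels = quantization_levels(3)
print(levels)
```

With $B = 3$ the spacing is $2^{-2} = 0.25$, so the eight levels run from $-1$ up to $1 - 2^{-2} = 0.75$.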
Any signal value falling between two levels is assigned to one
of the two levels.
$X_Q = Q[x]$ is our notation for quantization.
$e = Q[x] - x$ is then the quantization error.
One method of quantization is rounding, which assigns the signal
value to the nearest level. The maximum
error is thus
$\frac{\Delta_B}{2} = 2^{-B}$.
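Rounding can be sketched in a few lines of Python. This is only an illustration in floating point, not a hardware implementation: the name `quantize_round` is hypothetical, ties are rounded upward (one of several possible conventions), and overflow at the top of the range is deliberately not handled here.

```python
import math


def quantize_round(x, B):
    """Round x to the nearest level of spacing 2**-(B-1) (no overflow handling)."""
    delta = 2.0 ** -(B - 1)
    # Adding 0.5 before flooring rounds to the nearest multiple of delta,
    # with exact ties going to the higher level.
    return math.floor(x / delta + 0.5) * delta


B = 3
xs = [k / 1000.0 for k in range(-1000, 750)]  # test inputs inside [-1, 0.75)
max_abs_error = max(abs(quantize_round(x, B) - x) for x in xs)
print(max_abs_error)
```

Over these test inputs the largest error magnitude does not exceed half the level spacing, $2^{-B}$, which is $0.125$ for $B = 3$.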
Another common scheme, which is often easier to implement in
hardware, is truncation.
$Q[x]$ assigns $x$ to the next
lowest level.
The worst-case error with truncation is
$\Delta_B = 2^{-(B-1)}$, which is twice as large as with rounding. Also, the
error is never positive, so on average it generally has a non-zero
mean (i.e., a bias component).
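Overflow is the other problem. There are two common types: two's
complement (or wraparound) overflow, in which an out-of-range result
wraps around to the opposite end of the range, and
saturation overflow, in which it is clipped to the nearest extreme level.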
Obviously, overflow errors are bad because they are typically
large; two's complement (or
wraparound) overflow introduces more error than saturation, but is easier
to implement in hardware. It also has the advantage that if the
sum of several numbers lies in the range
$[-1, 1)$, the final answer will be correct even if intermediate
sums overflow! However, wraparound overflow leaves IIR systems
susceptible to zero-input large-scale limit cycles, as discussed in
another module. As usual, there are many tradeoffs to evaluate, and
no one right answer for all applications.
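The two overflow behaviors, and the intermediate-overflow property of wraparound, can be illustrated with a short Python sketch. It models a $B$-bit word as an integer $n$ that stands for the value $n \cdot 2^{-(B-1)}$. The helper names `wrap`, `saturate`, and `accumulate`, the word length $B = 4$, and the sample values are all assumptions chosen for this example.

```python
def wrap(n, B):
    """Two's complement wraparound of integer n into a B-bit signed word."""
    half = 1 << (B - 1)
    return (n + half) % (1 << B) - half


def saturate(n, B):
    """Clip integer n to the B-bit signed range."""
    half = 1 << (B - 1)
    return max(-half, min(half - 1, n))


def accumulate(values, B, overflow):
    """Sum values, applying the given overflow rule after every addition."""
    acc = 0
    for v in values:
        acc = overflow(acc + v, B)
    return acc


B = 4                  # integers -8..7 stand for the levels n / 8
values = [6, 5, -7]    # 0.75 + 0.625 - 0.875 = 0.5, i.e. n = 4
wrapped = accumulate(values, B, wrap)
saturated = accumulate(values, B, saturate)
print(wrapped, saturated)
```

The intermediate sum $6 + 5 = 11$ overflows in both cases. With wraparound the running sum becomes $-5$, yet the final result is still the correct $4$; with saturation the running sum is clipped to $7$ and the final result is $0$. For a single overflowing addition, on the other hand, `wrap(8, 4)` gives $-8$ (an error of a full range) while `saturate(8, 4)` gives $7$ (an error of one level).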