The fractional $B$-bit two's complement number representation evenly distributes $2^B$ quantization levels between $-1$ and $1 - 2^{-(B-1)}$. The spacing between quantization levels is then $\frac{2}{2^B} = 2^{-(B-1)} \doteq \Delta_B$.
Any signal value falling between two levels is assigned to one
of the two levels.
$X_Q = Q[x]$ is our notation for quantization. $e = Q[x] - x$ is then the quantization error.
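As a small illustration of the level layout described above, the following Python sketch lists the levels for a given word length. The function name `quantization_levels` and the choice $B = 3$ are assumptions made only for this example.

```python
def quantization_levels(B):
    """Return the 2**B levels of a fractional B-bit two's complement format."""
    delta = 2.0 ** -(B - 1)  # level spacing, Delta_B
    # Integer codes run from -2**(B-1) to 2**(B-1) - 1; scaling by delta
    # maps them onto the interval from -1 to 1 - delta.
    return [k * delta for k in range(-2 ** (B - 1), 2 ** (B - 1))]


levels = quantization_levels(3)
print(levels)
# [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75]
```

With $B = 3$ the spacing is $2^{-2} = 0.25$, and the largest level is $1 - 2^{-2} = 0.75$, in agreement with the formulas above.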
One method of quantization is rounding, which assigns the signal
value to the nearest level. The maximum
error is thus
$\frac{\Delta_B}{2} = 2^{-B}$.
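A minimal sketch of a rounding quantizer, under a few assumptions: the name `quantize_round` is hypothetical, ties are rounded toward $+\infty$ (the text does not specify a tie rule), and overflow is ignored by only testing inputs that do not round above the largest level.

```python
import math


def quantize_round(x, B):
    """Round x to the nearest level of a fractional B-bit format (no overflow handling)."""
    delta = 2.0 ** -(B - 1)
    k = math.floor(x / delta + 0.5)  # nearest integer code; ties go upward (assumption)
    return k * delta


B = 4
delta = 2.0 ** -(B - 1)
# Test inputs on a fine dyadic grid from -1 up to the largest level, 1 - delta.
xs = [n / 1024 for n in range(-1024, 897)]
max_abs_error = max(abs(quantize_round(x, B) - x) for x in xs)
print(max_abs_error == delta / 2)
```

The largest error magnitude occurs at inputs exactly halfway between two levels, where it equals half the level spacing.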
Another common scheme, which is often easier to implement in
hardware, is
truncation.
$Q[x]$ assigns $x$ to the next lowest level.
The worst-case error with truncation is $\Delta_B = 2^{-(B-1)}$, which is twice as large as with rounding. Also, the error is never positive, so on average it has a non-zero mean (i.e., a bias component).
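The bias of truncation can be checked numerically. The sketch below is illustrative only: `quantize_truncate` is a hypothetical name, and the inputs are assumed to be spread uniformly over a fine grid in $[-1, 1)$.

```python
import math


def quantize_truncate(x, B):
    """Truncate x to the next lowest level of a fractional B-bit format."""
    delta = 2.0 ** -(B - 1)
    return math.floor(x / delta) * delta


B = 4
delta = 2.0 ** -(B - 1)
xs = [n / 1024 for n in range(-1024, 1024)]  # uniform grid over [-1, 1)
errors = [quantize_truncate(x, B) - x for x in xs]

worst = min(errors)                # most negative error, just short of -delta
mean = sum(errors) / len(errors)   # non-zero: truncation is biased downward
print(worst, mean)
```

On this grid every error is zero or negative, the worst case approaches $-\Delta_B$, and the mean sits near $-\Delta_B / 2$, which is the bias component mentioned above.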
Overflow is the other problem. There are two common types: two's complement (or wraparound) overflow and saturation overflow.
Obviously, overflow errors are bad because they are typically
large; two's complement (or
wraparound) overflow introduces more error than saturation, but is easier
to implement in hardware. It also has the advantage that if the
sum of several numbers is between
$-1$ and $1$
, the final answer will be correct even if intermediate
sums overflow! However, wraparound overflow leaves IIR systems
susceptible to zero-input large-scale limit cycles, as discussed in
another module. As usual, there are many tradeoffs to evaluate, and
no one right answer for all applications.
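The two overflow behaviors, and the intermediate-overflow property of wraparound, can be sketched with integer codes. The helper names `wrap`, `saturate`, and `accumulate`, the word length $B = 4$, and the example operands are all assumptions chosen for illustration.

```python
def wrap(k, B):
    """Two's complement wraparound of integer code k to B bits."""
    half = 2 ** (B - 1)
    return (k + half) % (2 * half) - half


def saturate(k, B):
    """Clip integer code k to the representable B-bit range."""
    half = 2 ** (B - 1)
    return max(-half, min(half - 1, k))


def accumulate(codes, B, overflow):
    """Sum codes one at a time, applying the overflow rule after each addition."""
    acc = 0
    for k in codes:
        acc = overflow(acc + k, B)
    return acc


B = 4
delta = 2.0 ** -(B - 1)
codes = [6, 5, -7]  # 0.75 + 0.625 - 0.875; the true sum is 0.5

wrapped = accumulate(codes, B, wrap) * delta
saturated = accumulate(codes, B, saturate) * delta
print(wrapped, saturated)
# 0.5 0.0
```

The first partial sum, $0.75 + 0.625 = 1.375$, overflows. Wraparound turns it into $-0.625$ (an error of 2), while saturation clips it to $0.875$ (an error of 0.5), so the single-step error is larger with wraparound. Yet after the final addition the wraparound result is the correct $0.5$, whereas the saturated result is not.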