Fixed-point arithmetic is generally used when hardware cost, speed,
or complexity is important. Finite-precision quantization issues
usually arise in fixed-point systems, so we concentrate on fixed-point
quantization and error analysis in the remainder of this course.
For basic signal processing computations such as digital
filters and FFTs, the magnitude of the data, the internal
states, and the output can usually be scaled to obtain good performance
with a fixed-point implementation.
Two's-Complement Integer Representation
As far as the hardware is concerned, fixed-point number systems
represent data as
$B$-bit
integers. The two's-complement number system is usually used:
$$
k = \begin{cases}
\text{binary integer representation} & \text{if } 0 \le k \le 2^{B-1} - 1 \\
\text{bit-by-bit inverse}(-k) + 1 & \text{if } -2^{B-1} \le k \le 0
\end{cases}
$$
The most significant bit is known as the sign bit; it is 0 when the
number is non-negative and 1 when the number is negative.
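The encoding rule above can be sketched in a few lines of Python. This is only an illustration: the function names `twos_complement_bits` and `sign_bit` are hypothetical, and Python's unbounded integers are masked to $B$ bits to imitate a hardware word.

```python
def twos_complement_bits(k: int, B: int) -> str:
    """Return the B-bit two's-complement bit string for the integer k."""
    if not -(1 << (B - 1)) <= k <= (1 << (B - 1)) - 1:
        raise ValueError(f"{k} does not fit in {B} bits")
    mask = (1 << B) - 1          # B ones: keeps only the low B bits
    if k >= 0:
        word = k                 # plain binary integer representation
    else:
        inverted = ~(-k) & mask  # bit-by-bit inverse of -k, limited to B bits
        word = (inverted + 1) & mask
    return format(word, f"0{B}b")


def sign_bit(k: int, B: int) -> int:
    """Most significant bit of the B-bit word: 0 if k >= 0, 1 if k < 0."""
    return int(twos_complement_bits(k, B)[0])
```

For example, with $B = 4$ the integer 5 maps to `0101` and the integer -3 maps to `1101`.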
Fractional Fixed-Point Number Representation
For the purposes of signal processing, we often regard the
fixed-point numbers as binary fractions between
$-1$ and $1$, by implicitly placing a binary point after the sign bit.
Equivalently, a $B$-bit word with bits $b_0 b_1 \ldots b_{B-1}$ represents the value
$$
x = -b_0 + \sum_{i=1}^{B-1} b_i 2^{-i}
$$
This interpretation makes it clearer how to implement digital
filters in fixed-point, at least when the coefficients have a
magnitude less than 1.
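To make the fractional interpretation concrete, here is a minimal Python sketch that evaluates the weighted sum for a bit string and converts a real value back to bits. The helper names `bits_to_fraction` and `fraction_to_bits` are hypothetical, and quantization by truncation (floor) is an assumption made for this example.

```python
import math


def bits_to_fraction(bits: str) -> float:
    """Evaluate x = -b_0 + sum_{i=1}^{B-1} b_i * 2^(-i) for the string b_0 b_1 ... b_{B-1}."""
    b = [int(c) for c in bits]
    return -b[0] + sum(b[i] * 2.0 ** -i for i in range(1, len(b)))


def fraction_to_bits(x: float, B: int) -> str:
    """Quantize x in [-1, 1) to a B-bit fraction by truncation and return its bits."""
    if not -1.0 <= x < 1.0:
        raise ValueError("x must lie in [-1, 1)")
    k = math.floor(x * 2 ** (B - 1))   # the integer the hardware actually stores
    return format(k & ((1 << B) - 1), f"0{B}b")
```

With $B = 4$, the word `0110` evaluates to 0.75 and the word `1100` evaluates to -0.5. The stored integer $k$ and the fraction $x$ differ only by the fixed scale factor $2^{B-1}$, which is why the same adder hardware serves both interpretations.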
Truncation Error
Consider the multiplication of two binary fractions.
Note that full-precision multiplication almost doubles the
number of bits; if we wish to return the product to a
$B$-bit representation, we must truncate the $B-1$
least significant bits. However, this introduces
truncation error (also known as quantization error, or
roundoff error if the number is rounded to the nearest
$B$-bit fractional value rather than truncated). Note
that this occurs after multiplication.
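The following Python sketch illustrates truncation and rounding after a multiply. It assumes each $B$-bit fraction is stored as the integer $k = x \cdot 2^{B-1}$; the function names are hypothetical and chosen only for this example.

```python
def to_float(k: int, B: int) -> float:
    """Interpret the stored integer k as a B-bit binary fraction."""
    return k / (1 << (B - 1))


def multiply_truncate(a: int, b: int, B: int) -> int:
    """Multiply two B-bit fractions and drop the B-1 least significant bits."""
    full = a * b               # full-precision product, scaled by 2^(2(B-1))
    return full >> (B - 1)     # arithmetic shift: truncates toward minus infinity


def multiply_round(a: int, b: int, B: int) -> int:
    """Multiply two B-bit fractions and round to the nearest B-bit value."""
    full = a * b
    half_lsb = 1 << (B - 2)    # half of one output LSB, in full-product units
    return (full + half_lsb) >> (B - 1)

# Not handled in this sketch: the corner case (-1) * (-1) = +1,
# which is not representable as a B-bit fraction.
```

With $B = 4$, the stored integers 5 and 3 represent 0.625 and 0.375, whose exact product is 0.234375. Truncation returns 0.125 and rounding returns 0.25, so the error in either case is smaller than one least significant bit, $2^{-(B-1)} = 0.125$.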
Overflow Error
Consider the addition of two binary fractions.
When the sum falls outside the representable range,
wraparound overflow occurs; this only happens with
addition. Obviously, it can be a serious problem.
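Wraparound can be reproduced with a short Python sketch. As before, it assumes each $B$-bit fraction is stored as the integer $k = x \cdot 2^{B-1}$; the names `wrap` and `add_wrap` are hypothetical.

```python
def wrap(k: int, B: int) -> int:
    """Reduce k modulo 2^B into the two's-complement range [-2^(B-1), 2^(B-1) - 1]."""
    half = 1 << (B - 1)
    return (k + half) % (1 << B) - half


def add_wrap(a: int, b: int, B: int) -> int:
    """Add two B-bit words the way a B-bit adder does: the carry out is discarded."""
    return wrap(a + b, B)
```

With $B = 4$, adding the stored integers 6 and 4 (that is, 0.75 + 0.5) gives -6, which represents -0.75 rather than the true sum 1.25. The result is wrong by exactly 2, in both sign and magnitude, which is what makes overflow so damaging.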
There are thus two types of fixed-point error: roundoff error,
associated with data quantization and multiplication, and
overflow error, associated with data quantization and
additions. In fixed-point systems, one must strike a balance
between these two error sources: scaling down the data reduces
the occurrence of overflow errors, but increases the relative size
of the roundoff error.
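The trade-off can be made concrete with a short worked bound. This is a sketch under assumptions not stated in the text: the data $x$ are scaled by a hypothetical factor $\alpha$ with $0 < \alpha \le 1$ and then truncated to a $B$-bit fraction, so that the error $e$ is smaller than one quantization step $\Delta = 2^{-(B-1)}$.

$$
|e| < \Delta = 2^{-(B-1)}, \qquad \frac{|e|}{|\alpha x|} < \frac{2^{-(B-1)}}{\alpha \, |x|}
$$

The absolute error bound does not depend on $\alpha$, so halving the data ($\alpha = 1/2$) doubles the bound on the relative error. Roughly speaking, each factor of two of headroom gained against overflow costs one bit of precision.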
Note:
Since multiplies require a number of additions, they
are especially expensive in terms of hardware
(with a complexity proportional to $B_x B_h$, where
$B_x$ is the number of bits in the data, and
$B_h$ is the number of bits in the filter coefficients).
Designers try to minimize both $B_x$ and $B_h$,
and often choose $B_x \neq B_h$!
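One way to see where the product of the two word lengths comes from is the textbook shift-and-add multiplier, sketched below in Python. This is an illustration only: it handles unsigned operands for simplicity, the function name is hypothetical, and real hardware multipliers are organized differently.

```python
def shift_and_add_multiply(x: int, h: int, B_h: int):
    """Multiply unsigned x by an unsigned B_h-bit coefficient h.

    Returns (product, additions), where additions counts the partial
    products that were actually accumulated.
    """
    product = 0
    additions = 0
    for i in range(B_h):            # one step per coefficient bit
        if (h >> i) & 1:            # bit i of h selects a shifted copy of x
            product += x << i       # each partial product is about B_x bits wide
            additions += 1
    return product, additions
```

The loop runs once per coefficient bit, and each step adds a word about as wide as the data, so the total work grows roughly as $B_x B_h$. For example, `shift_and_add_multiply(13, 11, 4)` returns the product 143 after 3 additions, because 11 is `1011` in binary.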