This handout explains how numbers are represented in the
fixed-point TI C6211 DSP. Because hardware can only store
and process *bits*, every number must be
represented as a collection of bits. Each bit is
either "0" or "1", so the number system naturally used in
microprocessors is the binary system. The handout covers
both how numbers are represented and how they are processed
when implementing DSP algorithms.

**How numbers are represented**

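A collection of $N$ bits has $2^N$ possible states. What number each state stands for depends entirely on the chosen *interpretation*.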

**Unsigned integer representation**

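The natural binary representation interprets
each binary word as a nonnegative integer. For example, we
interpret an 8-bit binary word $b_7 b_6 \cdots b_0$ as the integer

$$x = b_7 2^7 + b_6 2^6 + \cdots + b_1 2 + b_0 = \sum_{i=0}^{7} b_i 2^i .$$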

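We can add and multiply two binary words in a straightforward fashion. Because all the numbers are nonnegative, the results of addition and multiplication are also nonnegative.

However, the result of adding two $N$-bit words can require $N+1$ bits, and the result of multiplying them can require $2N$ bits. When the result does not fit in the available word length, the upper bits are lost and we say that *overflow* has occurred.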

Another problem of the unsigned integer representation is that it cannot represent negative values. For that we need a different interpretation of binary words: the two's complement representation, together with the corresponding operations for doing arithmetic on numbers in that format.
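The unsigned interpretation and its overflow behavior can be sketched in C as follows. This is only an illustration on 8-bit words; the function names `unsigned_value` and `add8` are made up for this example and do not correspond to any C62x instruction.

```c
#include <stdint.h>

/* Value of an 8-bit word under the unsigned interpretation:
   the sum of b_i * 2^i over the eight bits. */
unsigned unsigned_value(uint8_t word)
{
    unsigned value = 0;
    for (int i = 0; i < 8; i++) {
        unsigned bit = (word >> i) & 1u;   /* extract b_i */
        value += bit << i;                 /* add b_i * 2^i */
    }
    return value;
}

/* Add two 8-bit words and keep only the low 8 bits, as an 8-bit
   register would. *carry is set to 1 when the true sum needs a 9th bit. */
uint8_t add8(uint8_t a, uint8_t b, int *carry)
{
    unsigned sum = (unsigned)a + (unsigned)b;   /* exact sum, at most 510 */
    *carry = sum > 0xFFu;
    return (uint8_t)(sum & 0xFFu);
}
```

For instance, adding 200 and 100 gives the exact sum 300, which does not fit in 8 bits; `add8` returns 44 (300 minus 256) and reports the lost bit through `carry`.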

**Two's complement integer representation**

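Using the natural binary representation, an $N$-bit word can represent the integers from $0$ to $2^N-1$. In the 2's complement representation, the same $N$-bit word instead represents the integers from $-2^{N-1}$ to $2^{N-1}-1$, with the most significant bit (MSB) acting as the sign bit.

For example, we interpret an 8-bit binary word $b_7 b_6 \cdots b_0$ as the integer

$$x = -b_7 2^7 + \sum_{i=0}^{6} b_i 2^i ,$$

which gives, for instance: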

| binary | decimal |
|---|---|
| `00000000` | 0 |
| `00000001` | 1 |
| `01000000` | 64 |
| `01111111` | 127 |
| `10000000` | -128 |
| `10000001` | -127 |
| `11000000` | -64 |
| `11111111` | -1 |
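As a quick check of the table above, the 2's complement interpretation of an 8-bit word can be written in a few lines of C. This is a minimal sketch; `twos_complement_value` is a hypothetical helper name chosen for this example.

```c
#include <stdint.h>

/* Value of an 8-bit word under the 2's complement interpretation:
   the MSB has weight -2^7, the other bits keep their usual weights. */
int twos_complement_value(uint8_t word)
{
    int value = word & 0x7F;   /* contribution of bits 6..0 */
    if (word & 0x80) {
        value -= 128;          /* MSB set: subtract 2^7 */
    }
    return value;
}
```

Calling it on the bit patterns in the table reproduces the decimal column, for example `twos_complement_value(0xC0)` for `11000000`.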

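When $x$ is a number in 2's complement format, $-x$ is found by inverting every bit of $x$ and adding 1. For example, $64$ is `01000000`; inverting gives `10111111`, and adding 1 gives `11000000`, which is $-64$ in the table above.

In 2's complement representation, subtraction of two
integers can therefore be accomplished by the usual binary
summation, by computing $x - y = x + (-y)$.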
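The usual way to negate a 2's complement number is to invert every bit and add 1, after which subtraction reduces to an addition. The sketch below shows this on 8-bit words; `negate8` and `sub8` are illustrative names, not C62x instructions.

```c
#include <stdint.h>

/* Negate an 8-bit 2's complement word: invert every bit, then add 1.
   Only the low 8 bits of the result are kept. */
uint8_t negate8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}

/* x - y computed as x + (-y), using nothing but 8-bit binary addition. */
uint8_t sub8(uint8_t x, uint8_t y)
{
    return (uint8_t)(x + negate8(y));
}
```

Note that `negate8(0x80)` returns `0x80` again: $-128$ has no positive counterpart in 8 bits, so negating it overflows.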

##### Exercise 1

(2's complement): What are the decimal numbers
corresponding to the 2's complement 8-bit binary
numbers;

###### Solution

Intentionally left blank.

Sometimes, you need to convert an 8-bit 2's complement number
to a 16-bit number. What is the 16-bit 2's complement
number representing the same value as the 8-bit numbers

The C62x CPU provides different versions of several
instructions depending on how they handle the sign.
For example, the load instructions `LDH` and `LDB`
load a halfword and a byte into a 32-bit register with sign
extension. That is, the loaded value is converted to a
32-bit 2's complement number before it is placed in the
register. The instructions `LDHU` and `LDBU` do
not perform sign extension. They simply fill the upper
16 and 24 bits, respectively, with zeros.
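The difference between the two kinds of load can be mimicked in C. The functions below are only an analogy for what `LDB` and `LDBU` do to the register contents, under the assumption that the byte is already available as a `uint8_t`; the function names are invented for this example.

```c
#include <stdint.h>

/* Sign extension, analogous to LDB: the byte is treated as a
   2's complement value, so bit 7 is copied into bits 31..8. */
uint32_t extend_signed_byte(uint8_t byte)
{
    uint32_t word = byte;
    if (byte & 0x80u) {
        word |= 0xFFFFFF00u;   /* negative: fill the upper 24 bits with 1s */
    }
    return word;
}

/* Zero extension, analogous to LDBU: the upper 24 bits are zeros. */
uint32_t extend_unsigned_byte(uint8_t byte)
{
    return (uint32_t)byte;
}
```

The byte `11000000` becomes `0xFFFFFFC0` (still $-64$) with sign extension, but `0x000000C0` (192) with zero extension.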

The same rule applies to the shift right instructions `SHR`
and `SHRU`: `SHR` fills the vacated upper bits with copies
of the sign bit, while `SHRU` fills them with zeros. The
`ADDU` instruction simply treats its
operands as unsigned values.

**Fractional representation**

Although 2's complement integers let us implement both
addition and subtraction by the usual binary addition (with
special care for the sign bit), integers are not
convenient for implementing DSP algorithms. For
example, if we multiply two 8-bit words together, we need 16
bits to store the result. The required word
length grows without bound as we multiply more numbers
together. Although not impossible, it is complicated
to handle this growth in word length using integer
arithmetic. The problem is easily handled by using
numbers between $-1$ and $1$ instead of integers, because the
product of two numbers in $[-1, 1]$ is always in the same range.

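In the 2's complement fractional representation, an $N$-bit binary word represents $2^N$ equally spaced numbers from $-1$ to $1 - 2^{-(N-1)}$.

For example, we interpret an 8-bit binary word $b_7 b_6 \cdots b_0$ as the fractional number

$$x = -b_7 + \sum_{i=0}^{6} b_i 2^{i-7} ,$$

which lies between $-1$ and $1 - 2^{-7}$.

This representation is also referred to as
Q-format. We can think of it as having an implied
binary point right after the MSB. If we have an $N$-bit
binary word with the binary point right after the MSB, we
call it Q-$(N-1)$ format: an 8-bit word is in Q-7 format and
a 16-bit word is in Q-15 format.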
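In practice, a Q-format word is just a 2's complement integer with an implied scale factor: a Q-7 value is the 8-bit integer divided by $2^7$, and a Q-15 value is the 16-bit integer divided by $2^{15}$. The following sketch converts bit patterns to `double` for inspection; the helper names are hypothetical.

```c
#include <stdint.h>

/* Value of an 8-bit word in Q-7 format: the 2's complement
   integer value divided by 2^7. */
double q7_to_double(uint8_t word)
{
    int integer_value = (word & 0x7F) - ((word & 0x80) ? 128 : 0);
    return integer_value / 128.0;
}

/* Same idea for a 16-bit word in Q-15 format (divide by 2^15). */
double q15_to_double(uint16_t word)
{
    long integer_value = (long)(word & 0x7FFF)
                       - ((word & 0x8000) ? 32768L : 0L);
    return integer_value / 32768.0;
}
```

For example, `q7_to_double(0x40)` gives 0.5 and `q7_to_double(0xC0)` gives -0.5, since `01000000` and `11000000` are 64 and -64 as integers.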

##### Exercise 2

(Q format): What are the decimal fractional numbers
corresponding to the Q-7 format binary numbers;

###### Solution

Intentionally left blank.

**Two's complement arithmetic**

The convenience of 2's complement format comes from the ability to represent negative numbers and to compute subtraction using the same algorithm as binary addition. The C62x processor has instructions to add, subtract and multiply numbers in 2's complement format. Because Q-15 is the easiest format to implement on C62x processors for most digital signal processing algorithms, we focus only on arithmetic operations on Q-15 numbers in the following.

**Addition and subtraction**

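The addition of two binary numbers is computed in the
same way as we compute the sum of two decimal numbers.
Using the relations $0+0=0$, $0+1=1+0=1$ and $1+1=10$, we
can easily compute the sum of two binary numbers. The C62x
instruction `ADD`
performs this binary addition on different operands.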

However, care must be taken when adding binary numbers.
Because each Q-15 number can only represent values in the
range $-1 \le x \le 1 - 2^{-15}$, a sum or difference that
falls outside this range cannot be represented and overflow
occurs. To prevent this, *scaling* of the operands is
necessary, and it is important to figure out how much
scaling is needed to avoid overflow. Because scaling
reduces the effective number of digits and so increases
quantization error, we usually want the minimum
amount of scaling that prevents overflow.

Another way of handling overflow (and underflow) is
saturation. If the result is outside the
range that can be represented in the given data
size, the value is saturated, meaning that the
representable value closest to the true result is taken
instead. Instructions such as `SADD` and `SSUB`
perform the operation followed by saturation.
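The idea of saturation can be sketched in C on 16-bit Q-15 values. This is an illustration of the principle at the 16-bit word size, not a model of the `SADD` instruction itself; `q15_add_saturated` is a name invented for this example.

```c
#include <stdint.h>

/* Add two Q-15 numbers with saturation: a sum above the largest
   Q-15 value is clamped to 0x7FFF, and a sum below the smallest
   one is clamped to 0x8000 (that is, -1). */
int16_t q15_add_saturated(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact in 32 bits */
    if (sum > INT16_MAX) {
        return INT16_MAX;
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)sum;
}
```

Adding 0.75 and 0.75 (`0x6000` each) would give 1.5, which is not representable; the saturated result is `0x7FFF`, the largest Q-15 value, rather than a wrapped-around negative number.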

##### Exercise 3

(Q format addition, subtraction): Perform the
additions

###### Solution

Intentionally left blank.

**Multiplication**

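Multiplication of two 2's complement numbers is a bit
complicated because of the sign bit. Similar to the
multiplication of two decimal fractional numbers, the
result of multiplying two Q-$N$ numbers is a Q-$2N$ number,
meaning that there are $2N$ binary digits after the implied
binary point.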

The following are two examples of binary fractional multiplication:

```
      0.110       0.75     Q-3
 X    1.110      -0.25     Q-3
--------------------------------
         0000
        0110
       0110
      1010
--------------------------------
      1110100    -0.1875   Q-6
```

Above, all partial products are computed and represented
in Q-6 format for summation. For example,
`0.110*0.010 = 0.001100` in Q-6 for the
second partial product. For the 4th partial product, care
must be taken because in `0.110*1.000`,
`1.000` represents $-1$, so the partial product is
`-0.110 = 1.010000` (in Q-6 format), which
is the 2's complement of `0.110000`. As
this example shows, it is important to represent each
partial product in Q-6 (or in general Q-$2N$) format before
adding them together. The second example, with the operands
exchanged, makes the same point:

```
      1.110      -0.25     Q-3
 X    0.110       0.75     Q-3
--------------------------------
          0000
       111110
       11110
       0000
--------------------------------
      11110100   -0.1875   Q-6
```

For the second partial product, we need
`1.110*0.010` in Q-6 format. This is
obtained as `1111100` in Q-6 (check!).
A simple way to obtain it is to first perform the
multiplication in the normal fashion,
`1110*0010 = 11100`, ignoring the binary points, then perform
*sign extension* by prepending enough 1s
(if the result is negative) or 0s (if the result is
nonnegative), and finally place the binary point to obtain a Q-6
number. Also notice that we need to remove the extra sign
bit to obtain the final result.
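The two worked examples can be checked with ordinary integer arithmetic: interpret each 4-bit Q-3 word as a signed integer, multiply, and read the product as a Q-6 number. The sketch below does this; `to_signed` and `q3_multiply` are hypothetical helper names, and the sketch assumes a word width between 2 and 16 bits.

```c
/* Interpret the low `bits` bits of `word` as a 2's complement integer. */
int to_signed(unsigned word, unsigned bits)
{
    unsigned sign = 1u << (bits - 1);   /* weight of the sign bit */
    word &= (sign << 1) - 1u;           /* keep only the low bits */
    return (word & sign) ? (int)word - (int)(sign << 1) : (int)word;
}

/* Multiply two 4-bit Q-3 words. The integer product of the signed
   values is a Q-6 number; it is returned as a 7-bit pattern (one sign
   bit, six fractional bits), i.e. with the extra sign bit removed. */
unsigned q3_multiply(unsigned a, unsigned b)
{
    int product = to_signed(a, 4) * to_signed(b, 4);
    return (unsigned)product & 0x7Fu;
}
```

With `0110` (0.75) and `1110` (-0.25) the integer product is $6 \times (-2) = -12$, whose 7-bit pattern is `1110100`, matching both hand computations regardless of operand order.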

In C62x, if we multiply two Q-15 numbers using one of the
multiply instructions (for example `MPY`), we
obtain a 32-bit result in Q-30
format with 2 sign bits. To get the result back in
Q-15 format, we (i) remove the 15 trailing bits and (ii)
remove the extended sign bit.

##### Exercise 4

(Q format multiplication): Perform the multiplications
`01001101*11100100` and `01111001*10001011` when the binary
numbers are in Q-7 format.

###### Solution

Intentionally left blank.

**Assembly language implementation**

When `A0` and `A1`
contain two 16-bit numbers in the Q-15 format, we can
perform the multiplication using `MPY`
followed by a right shift.

```
    MPY   .M1  A0,A1,A2    ; 32-bit product in Q-30 format
    NOP                    ; MPY has one delay slot
    SHR   .S1  A2,15,A2    ; lower 16 bits contain the result
                           ; in Q-15 format
```

Rather than throwing away the 15 LSBs of the multiplication
result by shifting, you can round the result to the nearest
Q-15 value by adding `0x4000` (half of the least significant
bit that is kept) before shifting.

```
    MPY   .M1  A0,A1,A2    ; 32-bit product in Q-30 format
    NOP                    ; MPY has one delay slot
    ADDK  .S1  4000h,A2    ; add the rounding constant to the product
    SHR   .S1  A2,15,A2    ; lower 16 bits contain the result
                           ; in Q-15 format
```

**C language implementation**

Let's suppose we have two 16-bit numbers in Q-15 format,
stored in variables `x` and `y` as follows:

```
short x = 0x0011; /* 0.000518799 in decimal */
short y = 0xfe12; /* -0.015075684 in decimal */
short z; /* variable to store x*y */
```

The product of `x` and `y` can be
computed and stored in Q-15 format as follows:

```
z = (x * y) >> 15;
```

The result of `x*y` is a 32-bit word with
2 sign bits. Right shifting it by 15 bits discards the last
15 bits, and storing the shifted result in
`z`, which is a `short` (16-bit) variable,
removes the extended sign bit by keeping
only the lower 16 bits.
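The same computation can be written with fixed-width types so that it does not depend on the size of `short` and `int`. The sketch below also adds a rounded variant that mirrors adding `0x4000` before the shift. It assumes, as is the case on mainstream compilers, that `>>` on a negative signed value is an arithmetic shift; the C standard leaves this implementation-defined. The function names are chosen for this example only.

```c
#include <stdint.h>

/* Q-15 multiply with truncation: the 32-bit product is in Q-30,
   so shifting right by 15 and keeping the low 16 bits gives Q-15. */
int16_t q15_multiply(int16_t x, int16_t y)
{
    int32_t product = (int32_t)x * (int32_t)y;   /* Q-30, two sign bits */
    return (int16_t)(product >> 15);             /* assumes arithmetic shift */
}

/* Same, but add 0x4000 (half an LSB of the Q-15 result) before the
   shift, so the result is rounded instead of truncated. */
int16_t q15_multiply_rounded(int16_t x, int16_t y)
{
    int32_t product = (int32_t)x * (int32_t)y;
    return (int16_t)((product + 0x4000) >> 15);  /* assumes arithmetic shift */
}
```

For the values of `x` and `y` used above (17 and -494 as integers), the exact product is about $-7.8 \times 10^{-6}$, which is smaller than one Q-15 step. Truncation therefore yields `0xFFFF` (one step below zero), whereas the rounded version yields 0.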