Loading constants to registers
Quite often you need to load a register with a constant.
The C62x instructions you can use for this task are
MVK, MVKL, and
MVKH. Each of these instructions can
load a 16-bit constant to a register. Read and understand
the description of these instructions in the manual.
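To see how these instructions differ, here is a small C model of their effect on a 32-bit register. It is a sketch based on our reading of the manual, not TI code: MVK and MVKL sign-extend the low 16 bits of the constant into the whole register, and MVKH replaces only the upper 16 bits, leaving the lower half unchanged. The helper names mvk, mvkl, mvkh, and load_const32 are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v to 32 bits (portable, no signed casts). */
static uint32_t sext16(uint32_t v)
{
    uint32_t lo = v & 0xFFFFu;
    return (lo & 0x8000u) ? (lo | 0xFFFF0000u) : lo;
}

/* MVK cst, dst: the 16-bit constant is sign-extended into dst. */
static uint32_t mvk(uint32_t cst)
{
    return sext16(cst);
}

/* MVKL cst, dst: as we understand the manual, the same operation as MVK;
   the assembler accepts a full 32-bit constant and uses its low 16 bits. */
static uint32_t mvkl(uint32_t cst)
{
    return sext16(cst);
}

/* MVKH cst, dst: the upper 16 bits of cst replace the upper 16 bits of dst;
   the lower 16 bits of dst are left unchanged. */
static uint32_t mvkh(uint32_t cst, uint32_t dst)
{
    return (cst & 0xFFFF0000u) | (dst & 0x0000FFFFu);
}

/* Loading a full 32-bit constant: MVKL first, then MVKH. */
static uint32_t load_const32(uint32_t cst)
{
    uint32_t reg = mvkl(cst);   /* low half, possibly sign-extended */
    return mvkh(cst, reg);      /* then overwrite the upper half    */
}
```

With the arbitrary constant 0x1234ABCD, mvkl alone leaves 0xFFFFABCD in the register because bit 15 of the low half is set; the following mvkh overwrites the upper half to give 0x1234ABCD. This is why MVKH must come after MVKL when the two are paired.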
Exercise 1
(Loading constants): Write assembly instructions to do the following:
- Load the 16-bit constant 0xff12 to A1.
- Load the 32-bit constant 0xabcd45ef to B0.
Solution
Intentionally left blank.
Register moves, zeroing
Contents of one register can be copied to another register
by using the MV instruction. There is
also the ZERO instruction to set a
register to zero. Learn how to use these instructions by
reading the appropriate TI manual pages.
Loading from memory to registers
Because the C62x processor has the so-called load/store
architecture, you must first load up the content of memory
to a register to be able to manipulate it. The basic
assembly instructions you use for loading are
LDB, LDH, and
LDW for loading up 8-, 16-, and 32-bit
data from memory. (There are some variations to these
instructions for different handling of the signs of the
loaded values.) Read and understand how these instructions
work.
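The sign handling mentioned above can be sketched in C. The model below assumes little endian byte order (the default used in this course, as discussed later in this section) and properly aligned addresses; LDBU and LDHU are the zero-extending variants of LDB and LDH. The function names are hypothetical.

```c
#include <stdint.h>

/* Loads from a little endian byte array; addr is a byte address. */

static uint32_t ldbu(const uint8_t *mem, uint32_t addr)   /* byte, zero-extend */
{
    return mem[addr];
}

static uint32_t ldb(const uint8_t *mem, uint32_t addr)    /* byte, sign-extend */
{
    uint32_t b = mem[addr];
    return (b & 0x80u) ? (b | 0xFFFFFF00u) : b;
}

static uint32_t ldhu(const uint8_t *mem, uint32_t addr)   /* halfword, zero-extend */
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

static uint32_t ldh(const uint8_t *mem, uint32_t addr)    /* halfword, sign-extend */
{
    uint32_t h = ldhu(mem, addr);
    return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
}

static uint32_t ldw(const uint8_t *mem, uint32_t addr)    /* full 32-bit word */
{
    return ldhu(mem, addr) | (ldhu(mem, addr + 2) << 16);
}
```

For the bytes 9a 78 56 f4 stored at increasing addresses, LDB returns 0xffffff9a while LDBU returns 0x0000009a: only the handling of the sign bit differs.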
However, to specify the address of the memory location to load from, you need to load another register with that address (used as an address pointer), and you can use various addressing modes to specify memory locations in many different ways. An addressing mode is the method by which an instruction calculates the location of an object in memory. The table below lists the ways address pointers can be used and modified on the C62x CPU. Note the similarity with C pointer manipulation.
| Syntax | Memory address accessed | Pointer modification |
|---|---|---|
| *R | R | None |
| *++R | R+1 | Preincrement |
| *--R | R-1 | Predecrement |
| *R++ | R | Postincrement |
| *R-- | R | Postdecrement |
| *+R[disp] | R+disp | None |
| *-R[disp] | R-disp | None |
| *++R[disp] | R+disp | Preincrement |
| *--R[disp] | R-disp | Predecrement |
| *R++[disp] | R | Postincrement |
| *R--[disp] | R | Postdecrement |

In this table, R denotes the value of the address register before the instruction executes.
The displacement [disp] is counted in elements (words, halfwords, or bytes, depending on the instruction type), not in bytes, and it can be either a 5-bit unsigned constant or a register. Increments and decrements of the address register are scaled the same way: for example, *R++ advances R by 4 bytes for LDW, by 2 bytes for LDH, and by 1 byte for LDB. The addressing modes with displacements are useful when a block of memory locations is accessed relative to a base address. Those with automatic increment/decrement are useful when a block is accessed consecutively, for example to implement a buffer that stores signal samples for a digital filter.
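The similarity with C pointers can be made explicit. In C, adding n to a pointer to 32-bit words moves it by 4n bytes, just as the displacement of LDW is scaled by the word size. The sketch below mirrors three of the modes in the table with hypothetical helper functions; *pp plays the role of the address register R, and a negative disp stands in for the "-" forms (for example, *-R[disp]).

```c
#include <stdint.h>

/* *+R[disp]: access R + disp elements, R unchanged. */
static uint32_t ld_offset(const uint32_t **pp, int disp)
{
    return *(*pp + disp);
}

/* *++R[disp]: update R first, then access the new address. */
static uint32_t ld_preinc(const uint32_t **pp, int disp)
{
    *pp += disp;
    return **pp;
}

/* *R++[disp]: access the old address, then update R. */
static uint32_t ld_postinc(const uint32_t **pp, int disp)
{
    uint32_t v = **pp;
    *pp += disp;
    return v;
}
```

Because the pointer type is uint32_t, a displacement of 1 moves the pointer by 4 bytes, which is exactly the scaling LDW applies.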
Exercise 2
(Load from memory): Assume the following 32-bit values are stored at these memory addresses:

| Address | Contents |
|---|---|
| 100h | fe54 7834h |
| 104h | 3459 f34dh |
| 108h | 2ef5 7ee4h |
| 10ch | 2345 6789h |
| 110h | ffff eeddh |
| 114h | 3456 787eh |
| 118h | 3f4d 7ab3h |
Suppose A10 = 0000 0108h. Find the contents of A1 and A10 after executing each of the following instructions.

- LDW .D1 *A10, A1
- LDH .D1 *A10, A1
- LDB .D1 *A10, A1
- LDW .D1 *-A10[1], A1
- LDW .D1 *+A10[1], A1
- LDW .D1 *+A10[2], A1
- LDB .D1 *+A10[2], A1
- LDW .D1 *++A10[1], A1
- LDW .D1 *--A10[1], A1
- LDB .D1 *++A10[1], A1
- LDB .D1 *--A10[1], A1
- LDW .D1 *A10++[1], A1
- LDW .D1 *A10--[1], A1
Solution
Intentionally left blank.
Storing data to memory
Storing the register contents uses the same addressing
modes. The assembly instructions used for storing are
STB, STH, and
STW. Read and understand these
instructions in the TI manual.
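A store writes only as many bytes as its width, taken from the least significant end of the source register. The following C sketch models this on a little endian byte array; the function names mirror the instructions but are hypothetical, and alignment checks are omitted.

```c
#include <stdint.h>

/* STB: write the least significant byte of reg. */
static void stb(uint8_t *mem, uint32_t addr, uint32_t reg)
{
    mem[addr] = (uint8_t)(reg & 0xFFu);
}

/* STH: write the low 16 bits of reg, low byte at the lower address. */
static void sth(uint8_t *mem, uint32_t addr, uint32_t reg)
{
    mem[addr]     = (uint8_t)(reg & 0xFFu);
    mem[addr + 1] = (uint8_t)((reg >> 8) & 0xFFu);
}

/* STW: write all 32 bits of reg as two halfwords. */
static void stw(uint8_t *mem, uint32_t addr, uint32_t reg)
{
    sth(mem, addr, reg);
    sth(mem, addr + 2, reg >> 16);
}
```

Neighboring bytes are left untouched: an STB into the middle of a previously stored word changes exactly one byte of it.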
Exercise 3
(Storing to memory): Write assembly instructions to store the 32-bit constant 53fe 23e4h to memory address 0000 0123h.
Solution
Intentionally left blank.
Sometimes it is necessary to access only part of the data stored in memory. For example, if you store the 32-bit word 0x11223344 at memory location 0x8000, the four bytes at addresses 0x8000, 0x8001, 0x8002, and 0x8003 together hold the value 0x11223344. If you then read the byte at memory location 0x8000, what byte value do you get?
The answer depends on the endian mode of the memory system. In little endian mode, the lower memory addresses contain the least significant bytes (LSB) of the data. Thus, the bytes stored at the four byte addresses will be as shown in Table 2.

| Address | Byte |
|---|---|
| 0x8000 | 0x44 |
| 0x8001 | 0x33 |
| 0x8002 | 0x22 |
| 0x8003 | 0x11 |
In big endian mode, the lower memory addresses contain the most significant bytes (MSB) of the data. Thus, we have

| Address | Byte |
|---|---|
| 0x8000 | 0x11 |
| 0x8001 | 0x22 |
| 0x8002 | 0x33 |
| 0x8003 | 0x44 |
In this course, we use little endian mode by default, and all lab programs must assume little endian mode.
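The two byte layouts above can be reproduced with a short C sketch that writes a 32-bit word into four consecutive bytes in either order. Here mem[0] stands for address 0x8000; store_word is a hypothetical helper.

```c
#include <stdint.h>

/* Write w into mem[0..3] (mem[0] = lowest address) in the chosen byte order. */
static void store_word(uint8_t mem[4], uint32_t w, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        /* Little endian: byte i holds bits 8i..8i+7 (LSB first).
           Big endian: byte i holds bits 8(3-i)..8(3-i)+7 (MSB first). */
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        mem[i] = (uint8_t)((w >> shift) & 0xFFu);
    }
}
```

Reading mem[0] after store_word(mem, 0x11223344, 0) gives 0x44, while after store_word(mem, 0x11223344, 1) it gives 0x11, matching the two tables.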
Exercise 4
(Little endian mode): What will be the value in A0 after executing the following assembly instructions? (Functional unit specifications are omitted.)

- MVKL 0x80000000, A10
- MVKH 0x80000000, A10
- MVKL 0x12345678, A9
- MVKH 0x12345678, A9
- STW A9, *A10
- LDB *+A10[2], A0

What would be the value in A0 if the system used big endian mode?
Solution
Intentionally left blank.
In fact, the addressing method described above is the so-called linear addressing mode (the default upon reset), in which pointer offsets and increments/decrements are unbounded. There is also a circular addressing mode that handles a finite-size buffer efficiently. You will implement circular buffers for the FIR filtering algorithm in the FIR filtering experiments later.
In the C62x CPU, each instruction is issued in exactly one CPU clock cycle. However, instructions such as LDW need to access the comparatively slow memory, and the results of the load are not available immediately at the end of the cycle in which the instruction executes. The extra clock cycles that pass before the result becomes available are called delay slots.
Example 1
For example, consider loading the content of the memory location pointed to by A10 into A1 and then moving the loaded data to A2. You might be tempted to write simple two-line assembly code as follows:
```
LDW .D1 *A10, A1
MV  .D1 A1, A2
```
What is wrong with the above code? The result of the LDW instruction is not available immediately after LDW is executed. As a consequence, the MV instruction copies the old value of A1, not the loaded one, to A2. To prevent this, we need to make the CPU wait until the result of the LDW instruction has been loaded into A1 before executing the MV instruction. Load instructions need 4 extra clock cycles before their results are valid. To make the CPU wait for 4 clock cycles, we insert 4 NOP (no operation) instructions between LDW and MV. Each NOP instruction makes the CPU idle for one clock cycle. The resulting code looks like this:
```
LDW .D1 *A10, A1
NOP
NOP
NOP
NOP
MV  .D1 A1, A2
```
or simply you can write
```
LDW .D1 *A10, A1
NOP 4
MV  .D1 A1, A2
```
Then why didn't the designers of the CPU simply make the LDW instruction take 5 clock cycles, rather than let the programmer insert 4 NOPs? The answer is that you can insert instructions other than NOPs, as long as those instructions do not use the result of the LDW instruction. This way, the CPU executes useful instructions while waiting for the result of the LDW instruction to become valid, which can greatly reduce the total execution time of the program.
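The effect of the delay slots can be illustrated with a toy C model in which a load's result becomes visible only to instructions issued at least 5 cycles after the LDW (1 issue cycle plus 4 delay slots), while earlier readers see the old register value. This is a simplified, hypothetical model of Example 1, not a description of the real C62x pipeline stages; regfile, reg_write, reg_read, and ldw_nops_mv are illustrative names.

```c
#include <stdint.h>

#define NREGS 32

/* Toy register file with delayed writes: a result produced by an
   instruction issued in cycle c with d delay slots becomes visible
   to instructions issued in cycle c + d + 1 or later. */
typedef struct {
    uint32_t value[NREGS];      /* value seen by readers right now       */
    uint32_t pending[NREGS];    /* result still "in flight"              */
    long     ready_at[NREGS];   /* first issue cycle that sees pending   */
    int      has_pending[NREGS];
} regfile;

static void reg_write(regfile *rf, int r, uint32_t v, long cycle, int delay_slots)
{
    rf->pending[r]     = v;
    rf->ready_at[r]    = cycle + delay_slots + 1;
    rf->has_pending[r] = 1;
}

static uint32_t reg_read(regfile *rf, int r, long cycle)
{
    if (rf->has_pending[r] && cycle >= rf->ready_at[r]) {
        rf->value[r]       = rf->pending[r];   /* the write has completed */
        rf->has_pending[r] = 0;
    }
    return rf->value[r];                        /* otherwise the old value */
}

enum { REG_A1 = 1 };

/* LDW .D1 *A10, A1 ; nops cycles of NOP ; MV .D1 A1, A2
   Returns the value that MV copies into A2. */
static uint32_t ldw_nops_mv(uint32_t old_a1, uint32_t mem_word, int nops)
{
    regfile rf = {0};
    long cycle = 0;

    rf.value[REG_A1] = old_a1;
    reg_write(&rf, REG_A1, mem_word, cycle, 4);  /* LDW: 4 delay slots      */
    cycle += 1 + nops;                           /* MV issues after the NOPs */
    return reg_read(&rf, REG_A1, cycle);         /* MV reads A1              */
}
```

With 0 or 3 NOPs, MV copies the old A1; only with 4 NOPs (or 4 cycles filled with other, unrelated instructions) does it copy the loaded word.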
More on instructions with delay slots
Table 3-5 in TI's instruction set description shows the execution of instructions with delay slots in more detail. The instructions with delay slots are the multiply (MPY, 1 delay slot), load (LDB, LDW, etc., 4 delay slots), and branch (B, 5 delay slots) instructions.
The functional unit latency indicates for how many clock cycles an instruction actually occupies a functional unit. All C62x instructions have a functional unit latency of 1, meaning that each functional unit is ready to execute the next instruction after 1 clock cycle, regardless of the delay slots of the instructions. Therefore, the following instructions are valid:
```
LDW .D1 *A10, A4
ADD .D1 A1, A2, A3
```
Although the LDW instruction has not yet loaded the A4 register when the ADD is executed, the .D1 functional unit is available again in the clock cycle right after the one in which LDW is issued.
To clarify how instructions with delay slots execute, consider the following LDW example. Assume A10 = 0x0100 and A2 = 1, and that your intent is to load A9 with the 32-bit word at address 0x0100 and advance A10 to the next word at 0x0104. The three MV instructions are not related to the LDW instruction; they do something else.
```
1 LDW .D1 *A10++[A2], A9
2 MV  .L1 A10, A8
3 MV  .L1 A1, A10
4 MV  .L1 A1, A2
5 ...
```
We can ask several interesting questions at this point:

- What is the value loaded into A8? That is, in which clock cycle is the address pointer updated?
- Can we load the address offset register A2 before the LDW instruction finishes the actual loading?
- Is it legal to load A10 before the first LDW finishes loading the memory content into A9? That is, can we change the address pointer before the 4 delay slots elapse?

The answers are:

- Although it takes 4 extra clock cycles for the LDW instruction to load the memory content into A9, the address pointer and offset registers (A10 and A2) are read and updated in the clock cycle in which the LDW instruction is issued. Therefore, in line 2, A8 is loaded with the updated A10, that is, A10 = A8 = 0x104.
- Because LDW reads the A10 and A2 registers in the first clock cycle, you are free to change these registers without affecting the operation of the first LDW.
- This was already answered above: changing A10 after the issue cycle does not affect the pending load.
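The answers above can be summarized in a small C sketch of the example. It tracks only when each register is read and written, under the assumptions stated in the answers: LDW reads A10 and A2 and updates A10 in its issue cycle, and only the write to A9 waits for the 4 delay slots. The regs struct and run_example are hypothetical, and memory is modeled as an array of 32-bit words starting at byte address mem_base.

```c
#include <stdint.h>

typedef struct { uint32_t a1, a2, a8, a9, a10; } regs;

static regs run_example(regs r, const uint32_t *mem, uint32_t mem_base)
{
    /* Cycle 1, LDW .D1 *A10++[A2], A9: A10 and A2 are read now; the
       address is the old A10 (post-increment), and A10 is updated now. */
    uint32_t addr = r.a10;
    r.a10 += r.a2 * 4u;                        /* offset scaled by 4 bytes */
    uint32_t loaded = mem[(addr - mem_base) / 4u];

    /* Cycles 2-4: the MV instructions execute in the delay slots; they
       see the updated A10 and may overwrite A10 and A2 freely. */
    r.a8  = r.a10;                             /* MV .L1 A10, A8 */
    r.a10 = r.a1;                              /* MV .L1 A1, A10 */
    r.a2  = r.a1;                              /* MV .L1 A1, A2  */

    /* Once the 4 delay slots (cycles 2-5) have elapsed, A9 holds the word. */
    r.a9 = loaded;
    return r;
}
```

Even though A10 and A2 are overwritten while the load is still in flight, A9 still receives the word from the address computed in the first cycle.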
Similar reasoning holds for the MPY and B (when using a register as the branch address) instructions. MPY reads its source values in the first clock cycle and writes the multiplication result at the end of the second clock cycle. For B, the address register is read in the first clock cycle, and the actual branch takes place only after the 5 delay slots. Thus, after the first clock cycle, you are free to modify the source or address registers. For more details, refer to Table 3-5 in the instruction set description or read the descriptions of the individual instructions.
Addition, Subtraction and Multiplication
There are several instructions for addition, subtraction, and multiplication on the C62x CPU. The basic instructions are ADD, SUB, and MPY. Learn about these instructions in the TI manual. ADD and SUB have 0 delay slots (meaning the results of the operation are immediately available), but MPY has 1 delay slot (the result of the multiplication is valid after 1 additional clock cycle).
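MPY multiplies only the 16 least significant bits of each source (16lsb*16lsb in the instruction summary), so anything in the upper halves of the source registers is ignored. Below is a C sketch of MPY (signed) and MPYU (unsigned) with hypothetical function names; the delay slot is not modeled.

```c
#include <stdint.h>

/* MPY: signed 16 x 16 multiply of the low halves of the sources. */
static int32_t mpy(uint32_t src1, uint32_t src2)
{
    int32_t a = (int32_t)(src1 & 0xFFFFu);
    int32_t b = (int32_t)(src2 & 0xFFFFu);
    if (a & 0x8000) a -= 0x10000;   /* sign-extend 16 -> 32 bits */
    if (b & 0x8000) b -= 0x10000;
    return a * b;                    /* |a * b| <= 2^30, fits in int32_t */
}

/* MPYU: unsigned 16 x 16 multiply of the low halves of the sources. */
static uint32_t mpyu(uint32_t src1, uint32_t src2)
{
    return (src1 & 0xFFFFu) * (src2 & 0xFFFFu);
}
```

For instance, mpy(0x00010003, 7) gives 21, not 0x10003 * 7: the bit in the upper half of the first operand never reaches the multiplier.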
Exercise 5
(Add, subtract, and multiply): Write an assembly program to compute (0000 ef35h + 0000 33dch - 0000 1234h) * 0000 0007h.
Solution
Intentionally left blank.
Branching and conditional operations
Often you need to control the flow of program execution by branching to another block of code. The B instruction does this job in the C62x CPU. The branch address can be specified either as a displacement or as a register value used by the B instruction. Read and understand the B instruction in the manual. The B instruction has 5 delay slots, meaning that the instructions in the 5 clock cycles following B are still executed before control reaches the branch target.
In many cases, you execute the branch instruction conditionally, depending on the result of previous operations. For example, to implement a loop, you decrement the loop counter by 1 each time you run a set of instructions, and whenever the loop counter is not zero, you branch back to the beginning of the code block to iterate the loop. In the C62x CPU, this conditional branching is implemented using conditional operations. Although B is probably the instruction most often executed conditionally, every C62x instruction can be made conditional.
Conditional instructions are written by placing the condition register name in square brackets, [ ], in front of the instruction. For example, the following B instruction is executed only if B0 is nonzero:

```
[B0] B .S2X A0    ; register branches use the .S2 unit; X selects the cross path to read A0
```

To execute an instruction conditionally when the condition register is zero, put ! in front of the register. For example, the following B instruction is executed when B0 is zero:

```
[!B0] B .S2X A0
```
Not all registers can be used as condition registers. In the C62x CPU, only B0, B1, B2, A1, and A2 can be tested in conditional operations.
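The rules for conditional execution can be written as a small C sketch: [creg] executes when the register is nonzero, [!creg] when it is zero, and only the five registers listed above may serve as creg. The loop skeleton shows the countdown pattern described earlier; it ignores the 5 delay slots of B and assumes the counter starts at 1 or more. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if name is one of the C62x condition registers. */
static int is_condition_register(const char *name)
{
    static const char *const allowed[] = { "B0", "B1", "B2", "A1", "A2" };
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (strcmp(name, allowed[i]) == 0)
            return 1;
    return 0;
}

/* [creg] executes when creg != 0; [!creg] executes when creg == 0. */
static int condition_true(uint32_t creg, int negated)
{
    return negated ? (creg == 0) : (creg != 0);
}

/* Countdown loop with the counter in B0: run the body, decrement B0,
   and branch back with "[B0] B loop" while B0 is nonzero.
   Assumes b0 >= 1 on entry. */
static unsigned count_iterations(uint32_t b0)
{
    unsigned runs = 0;
    do {
        runs++;                        /* loop body            */
        b0--;                          /* SUB B0, 1, B0        */
    } while (condition_true(b0, 0));   /* [B0] B loop          */
    return runs;
}
```

A counter initialized to N makes the body run exactly N times, because the branch is skipped once the decremented counter reaches zero.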
Exercise 6
(Simple loop): Write an assembly program computing the
summation
Solution
Intentionally left blank.
Logical operations and bit manipulation
The logical operations and bit manipulations are
accomplished by the AND,
OR, XOR,
CLR, SET,
SHL, and SHR
instructions. Read and understand the operations of these
instructions.
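Below is a C sketch of SET, CLR, SHL, and SHR on 32-bit values. It assumes, per our reading of the manual, that csta is the lowest bit and cstb the highest bit of the field (both inclusive), and that SHR is an arithmetic shift that copies the sign bit in (SHRU is the logical variant). The function names are hypothetical.

```c
#include <stdint.h>

/* Mask with bits csta..cstb (inclusive) set; assumes csta <= cstb <= 31. */
static uint32_t field_mask(unsigned csta, unsigned cstb)
{
    uint32_t width = cstb - csta + 1u;
    uint32_t ones  = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << csta;
}

/* SET: force the field to all 1s. */
static uint32_t set_field(uint32_t src, unsigned csta, unsigned cstb)
{
    return src | field_mask(csta, cstb);
}

/* CLR: force the field to all 0s. */
static uint32_t clr_field(uint32_t src, unsigned csta, unsigned cstb)
{
    return src & ~field_mask(csta, cstb);
}

/* SHL: shift left, zeros shifted in; assumes n <= 31. */
static uint32_t shl(uint32_t src, unsigned n)
{
    return src << n;
}

/* SHR: arithmetic shift right, sign bit copied in; assumes n <= 31. */
static uint32_t shr_arith(uint32_t src, unsigned n)
{
    uint32_t result = src >> n;                 /* logical shift first */
    if ((src & 0x80000000u) && n > 0)
        result |= ~(0xFFFFFFFFu >> n);          /* refill the top n bits */
    return result;
}
```

The same AND/OR masking idea covers the plain AND, OR, and XOR instructions, which simply combine two registers bit by bit.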
Other assembly instructions
Other useful instructions include IDLE and compare instructions such as CMPEQ. Read and understand the operations of these instructions.
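The compare instructions write 1 to the destination register when the condition holds and 0 otherwise, so their result can be placed directly in a condition register (for example B0) and tested with [B0]. Here is a C sketch of CMPEQ, CMPGT, CMPGTU, and CMPLT with hypothetical names; CMPGT and CMPGTU differ only in whether the bit patterns are compared as signed or unsigned numbers.

```c
#include <stdint.h>

/* Reinterpret a 32-bit pattern as a two's complement value (portable). */
static int32_t as_signed(uint32_t v)
{
    return (v & 0x80000000u) ? -(int32_t)(~v) - 1 : (int32_t)v;
}

static uint32_t cmpeq(uint32_t src1, uint32_t src2)  { return src1 == src2; }
static uint32_t cmpgt(uint32_t src1, uint32_t src2)  { return as_signed(src1) > as_signed(src2); }
static uint32_t cmpgtu(uint32_t src1, uint32_t src2) { return src1 > src2; }
static uint32_t cmplt(uint32_t src1, uint32_t src2)  { return as_signed(src1) < as_signed(src2); }
```

For 0xffffffff and 1, the signed compare treats the first operand as -1 (not greater), while the unsigned compare treats it as the largest 32-bit value (greater).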
C62x instruction set summary
The instructions that can be executed on each functional unit are listed in Table 4, Table 5, Table 6, and Table 7. Please refer to the TMS320C62x/C67x CPU and Instruction Set Reference Guide for a detailed description of each instruction.
Table 4: .S unit instructions

| Instruction | Description |
|---|---|
| ADD(U) | signed or unsigned integer addition without saturation |
| ADDK | integer addition using signed 16-bit constant |
| ADD2 | two 16-bit integer adds on upper and lower register halves |
| B | branch using a register |
| CLR | clear a bit field |
| EXT | extract and sign-extend a bit field |
| MV | move from register to register |
| MVC | move between the control file and the register file |
| MVK | move a 16-bit constant into a register and sign extend |
| MVKH | move 16-bit constant into the upper bits of a register |
| NEG | negate (pseudo-operation) |
| NOT | bitwise NOT |
| OR | bitwise OR |
| SET | set a bit field |
| SHL | arithmetic shift left |
| SHR | arithmetic shift right |
| SSHL | shift left with saturation |
| SUB(U) | signed or unsigned integer subtraction without saturation |
| SUB2 | two 16-bit integer subtractions on upper and lower register halves |
| XOR | exclusive OR |
| ZERO | zero a register (pseudo-operation) |
Table 5: .L unit instructions

| Instruction | Description |
|---|---|
| ABS | integer absolute value with saturation |
| ADD(U) | signed or unsigned integer addition without saturation |
| AND | bitwise AND |
| CMPEQ | integer compare for equality |
| CMPGT(U) | signed or unsigned integer compare for greater than |
| CMPLT(U) | signed or unsigned integer compare for less than |
| LMBD | leftmost bit detection |
| MV | move from register to register |
| NEG | negate (pseudo-operation) |
| NORM | normalize integer |
| NOT | bitwise NOT |
| OR | bitwise OR |
| SADD | integer addition with saturation to result size |
| SAT | saturate a 40-bit integer to a 32-bit integer |
| SSUB | integer subtraction with saturation to result size |
| SUBC | conditional integer subtraction and shift - used for division |
| XOR | exclusive OR |
| ZERO | zero a register (pseudo-operation) |
Table 6: .D unit instructions

| Instruction | Description |
|---|---|
| ADD(U) | signed or unsigned integer addition without saturation |
| ADDA (B/H/W) | integer addition using addressing mode |
| LD (B/H/W) | load from memory with a register offset or 5-bit unsigned constant offset (a 15-bit constant offset is also available with B14/B15 as the base) |
| MV | move from register to register |
| ST (B/H/W) | store to memory with a register offset or 5-bit unsigned constant offset (a 15-bit constant offset is also available with B14/B15 as the base) |
| SUB(U) | signed or unsigned integer subtraction without saturation |
| SUBA (B/H/W) | integer subtraction using addressing mode |
| ZERO | zero a register (pseudo-operation) |
Table 7: .M unit instructions

| Instruction | Description |
|---|---|
| MPY (U/US/SU) | signed or unsigned integer multiply 16lsb*16lsb |
| MPYH (U/US/SU) | signed or unsigned integer multiply 16msb*16msb |
| MPYLH | signed or unsigned integer multiply 16lsb*16msb |
| MPYHL | signed or unsigned integer multiply 16msb*16lsb |
| SMPY (HL/LH/H) | integer multiply with left shift and saturation |






