Summary: Introduction to the basic C62x architecture and instruction set.
Although you can obtain fairly good performance on the C62x CPU by programming in C with TI's optimizing compiler, to control the CPU and peripherals directly or to optimize the code for maximum efficiency, you need to learn how the CPU works internally and how to write programs in C62x assembly language. This lab introduces the basic C62x architecture and instruction set to familiarize you with the internal functions of the CPU and with assembly programming. You also learn basic programming techniques using TI's assembler.
The C62x consists of internal memory, peripherals (serial port, external memory interface, etc.), and most importantly, the CPU, which has the registers and the functional units that execute instructions. Although you don't need to care about the internal architecture of the CPU to compile and run programs, you must understand how the CPU fetches and executes assembly instructions to write a highly optimized assembly program.
We learn the architecture and basic function of each CPU unit through the development of simple assembly language programs.
In many DSP algorithms, the Sum of Products or Multiply-Accumulate (MAC) operations are very common. A DSP CPU is designed to handle the math-intensive calculations necessary for common DSP algorithms. For efficient implementation of the MAC operation, the C6211 CPU has two multipliers and each of them can perform a 16-bit multiplication in each clock cycle. For example, if we want to compute the dot product of two length-40 vectors a and x, we need to compute y = a1*x1 + a2*x2 + ... + a40*x40. Each term of this sum requires one multiplication and one addition, which can be written as the following two instructions:
        MPY .M a,x,prod
        ADD .L y,prod,y
Ignore .M and .L for now. Here, a,x,prod,y are numbers stored in memory and the instruction MPY multiplies two numbers a and x together and stores the result in prod. The ADD instruction adds two numbers y and prod together storing the result back to y.
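As a rough illustration of how this pair of instructions builds up a dot product, the following C sketch repeats the multiply and add steps once per element. The function name `mac_dot`, the 16-bit element type, and the assumption that the running sum fits in 32 bits are choices made for this example and are not part of the lab handout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product written as repeated MPY + ADD steps.
 * Assumes the running sum stays within the range of a 32-bit signed int. */
int32_t mac_dot(const int16_t *a, const int16_t *x, size_t n)
{
    int32_t y = 0;                              /* accumulator (y) */

    assert(a != NULL && x != NULL);
    for (size_t i = 0; i < n; i++) {
        int32_t prod = (int32_t)a[i] * x[i];    /* MPY a,x,prod */
        y = y + prod;                           /* ADD y,prod,y */
    }
    return y;
}
```

For example, with a = {1, 2, 3} and x = {4, 5, 6}, `mac_dot(a, x, 3)` returns 32.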
Where are the numbers stored in the CPU? In the C62x, the numbers used in operations are stored in registers. Because the registers are directly accessible through the data path of the CPU, accessing the registers is much faster than accessing data in external memory.
The C62x CPU has two register files (A and B). Each of these files consists of sixteen 32-bit registers (A0-A15 for file A and B0-B15 for file B). The general-purpose registers can be used for data, data address pointers, or condition registers.
The general-purpose register files support data ranging in size from 16-bit data through 40-bit fixed-point. Values larger than 32 bits, such as 40-bit long quantities, are stored in register pairs. In a register pair, the 32 LSBs of data are placed in an even-numbered register and the remaining 8 MSBs in the next upper register (which is always an odd-numbered register). In assembly language syntax, a colon between two register names denotes a register pair, and the odd-numbered register is specified first. For example, A1:A0 represents the register pair consisting of A0 and A1. You don't need to be too concerned with 40-bit numbers, however. Throughout this course, you will mostly handle 16-bit or 32-bit values stored in a single register.
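The register-pair layout described above can be sketched in C as follows. The type `reg_pair` and the functions `split40` and `join40` are hypothetical names introduced only for this illustration; the sketch assumes the 40-bit value is held in the low 40 bits of a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a register pair such as A1:A0. */
typedef struct {
    uint32_t even;   /* e.g. A0: holds the 32 LSBs */
    uint32_t odd;    /* e.g. A1: holds the remaining 8 MSBs in its low byte */
} reg_pair;

/* Split a 40-bit value (held in the low 40 bits of v) across a pair. */
reg_pair split40(uint64_t v)
{
    reg_pair p;

    assert((v >> 40) == 0);                    /* value must fit in 40 bits */
    p.even = (uint32_t)(v & 0xFFFFFFFFu);      /* 32 LSBs */
    p.odd  = (uint32_t)((v >> 32) & 0xFFu);    /* 8 MSBs */
    return p;
}

/* Rebuild the 40-bit value from the pair. */
uint64_t join40(reg_pair p)
{
    return ((uint64_t)(p.odd & 0xFFu) << 32) | p.even;
}
```

With the 40-bit value 0x12345678AB, the even register would hold 0x345678AB and the odd register 0x12.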
Let's for now focus on file A only. The registers in register file A are named A0 to A15. Each register can store a 32-bit binary number. Numbers such as a, x, prod, and y above are stored in these registers. For example, register A0 stores a. For now, let's assume we interpret all 32-bit numbers stored in registers as unsigned integers. Therefore, the range of values we can represent is 0 to 2^32 - 1.
Suppose a, x, prod, and y are in the registers A0, A1, A3, and A4, respectively. Then the above assembly instructions can be written specifically:
        MPY .M1 A0,A1,A3
        ADD .L1 A4,A3,A4
(Ignore .M1 and .L1 for the moment.)
The TI C62x CPU has a load/store architecture. This means that all numbers must be stored in registers before they can be used as operands of instructions such as MPY and ADD. A number can be read from a memory location into a register (using, for example, the LDW or LDB instructions), or a register can be loaded with a constant value. The content of a register can be stored to a memory location (using, for example, the STW or STB instructions).
In addition to the general-purpose register files, the CPU has a separate register file for the control registers. The control registers are used to control various CPU functions such as addressing mode, interrupts, etc. You will learn more about some of the control registers when we learn each individual topic.
Then, where do the actual operations such as multiplication and addition take place? The C62x CPU has several functional units that perform the actual operations. Each register file has 4 functional units named .M, .L, .S, and .D. The 4 functional units connected to register file A are named .L1, .S1, .D1, and .M1. Those connected to register file B are named .L2, .S2, .D2, and .M2. For example, the functional unit .M1 performs multiplication on operands that are in register file A. When the CPU executes the MPY .M1 A0, A1, A3 above, the functional unit .M1 takes the values stored in A0 and A1, multiplies them together, and stores the result in A3. The .M1 in MPY .M1 A0, A1, A3 indicates that this operation is performed in the .M1 unit. The .M1 unit has a 16-bit multiplier, and all multiplications are performed by the .M1 (or .M2) unit.
Similarly, the ADD operation can be executed by the .L1 unit. The .L1 can perform all the logical operations such as bitwise AND operation (AND instruction) as well as basic addition (ADD instruction) and subtraction (SUB instruction).
We will later learn more about assigning the functional units for assembly instructions.
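Because the .M units contain 16-bit multipliers, it helps to see what a 16-bit multiply does to 32-bit register contents. The C sketch below is a hedged model, based on my reading of the C62x instruction set, in which MPY treats the 16 LSBs of each source register as signed numbers and produces a 32-bit result; please confirm this against the MPY description in the handout. The names `signed_lsb16` and `mpy_model` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the 16 LSBs of a 32-bit register value as a signed number. */
static int32_t signed_lsb16(uint32_t reg)
{
    int32_t v = (int32_t)(reg & 0xFFFFu);
    return (v >= 0x8000) ? v - 0x10000 : v;
}

/* Hypothetical model of MPY src1,src2,dst: only the 16 LSBs of each
 * source take part in the multiplication; the upper halves are ignored. */
int32_t mpy_model(uint32_t src1, uint32_t src2)
{
    return signed_lsb16(src1) * signed_lsb16(src2);
}
```

In this model, `mpy_model(0x00010003, 0x00000004)` gives 12, because the upper half of the first operand does not take part in the product.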
Read the description of ADD and MPY instructions in the TI manual handed out. Write an assembly program that computes A0*(A1+A2)+A3.
When you have a piece of assembly code to execute on the CPU, you first need to load it at some memory location. The C6211 CPU has some internal memory space to store program code and data. The DSK board also has external RAM that you can use to store program code and data. The memory map of the DSK board is as follows:
| Address | Memory Map | Size |
|---|---|---|
| 0000 0000 | Internal RAM | 64K bytes |
| 0001 0000 | Reserved | 24K bytes |
| 0180 0000 | Control registers | 316 bytes |
| 01A0 0000 | EDMA parameter RAM | 2M bytes |
| 01A0 FFE0 | Control registers | 72 bytes |
| 3000 0000 | McBSP0 data | 64M bytes |
| 3400 0000 | McBSP1 data | 64M bytes |
| 8000 0000 | SDRAM (CE0) | 16M bytes |
| 9000 0000 | 8-bit ROM (CE1) | 128K bytes |
| 9008 0000 | 8-bit I/O port (CE1) | 4 bytes |
| A000 0000 | Daughtercard (CE2) | 256M bytes |
| B000 0000 | Daughtercard (CE3) | 256M bytes |
The memory map is fixed by the CPU architecture itself and the way the external memory and input/output (I/O) devices are wired to the CPU. As shown above, each memory location has a 32-bit address and the addresses can be stored in the registers to be used as memory index for data load/store.
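To make the idea of a memory map concrete, here is a small C sketch that looks up which region a 32-bit address falls in. It uses only three rows copied from the table above; the type `region` and the function `find_region` are hypothetical names for this example and do not correspond to any TI tool or library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t base;   /* starting address */
    uint32_t size;   /* size in bytes */
} region;

/* Three rows taken from the memory map table. */
static const region regions[] = {
    { "Internal RAM",    0x00000000u, 64u * 1024u },
    { "SDRAM (CE0)",     0x80000000u, 16u * 1024u * 1024u },
    { "8-bit ROM (CE1)", 0x90000000u, 128u * 1024u },
};

/* Hypothetical helper: return the listed region containing addr,
 * or NULL if addr is not inside any of the three regions. */
const region *find_region(uint32_t addr)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        /* Unsigned subtraction wraps around when addr < base, so this
         * single comparison checks both ends of the region. */
        if (addr - regions[i].base < regions[i].size) {
            return &regions[i];
        }
    }
    return NULL;
}
```

For instance, address 0x80000010 falls in the SDRAM region, while 0x00010000 is just past the end of the internal RAM and matches none of the three.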
When you write an assembly program, you must designate where you want to load each piece of your code to execute it. After you write and assemble a piece of assembly code, you obtain relocatable code, meaning that the code doesn't have any fixed memory reference in it and can be placed at any memory location by supplying information on where to put it in the memory map. Then, the linker combines the different pieces of assembled code to produce the final executable code. The executable code has all the memory location information. For the linker to be able to generate executable code by actually specifying the memory locations of each piece of code and data, we need to let the linker know the memory map (physical addresses) of the DSK board. For convenience, we can assign names to different pieces of the memory.
The linker command file is the file in which we let the linker know the memory map and the name of each memory section.
A typical linker command file that can be used for our DSK board is listed below:
1 MEMORY
2 {
3 VECS: org = 0h, len = 0x220
4 I_HS_MEM: org = 0x00000220, len = 0x00000020
5 IRAM: org = 0x00000240, len = 0x0000FDC0
6 SDRAM: org = 0x80000000, len = 0x01000000
7 FLASH: org = 0x90000000, len = 0x00020000
8 }
9
10 SECTIONS
11 {
12 /* Created in vectors.asm */
13 vectors :> VECS
14
15 /* Created by Assembler */
16 .text :> IRAM
17
18 }
The file consists of two parts, MEMORY and SECTIONS. The MEMORY part defines the physical addresses of the memory blocks (the memory map). In the C6211, the internal RAM starts at 0x00000000 and the first 0x220 bytes contain the reset and interrupt vectors (we will later learn what they are). The above file names this block VECS. Most of the rest of the internal memory is named IRAM and is used to load program and data (defined in the SECTIONS part). The external SDRAM is named SDRAM and the flash ROM is named FLASH. Note that the starting addresses and lengths of the memory blocks exactly represent the memory map of our DSK board.
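One way to check the MEMORY part by hand is to confirm that the internal blocks follow each other without gaps and together fill the 64K bytes of internal RAM. The C sketch below does this arithmetic; the type `mem_block` and the helper functions are hypothetical and exist only for this illustration, while the org and len values are copied from the command file above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t org;   /* starting address */
    uint32_t len;   /* length in bytes */
} mem_block;

/* The three internal-RAM blocks from the MEMORY part of the command file. */
static const mem_block internal_blocks[] = {
    { "VECS",     0x00000000u, 0x00000220u },
    { "I_HS_MEM", 0x00000220u, 0x00000020u },
    { "IRAM",     0x00000240u, 0x0000FDC0u },
};

/* First address past the end of a block. */
uint32_t block_end(mem_block b)
{
    return b.org + b.len;
}

/* Returns 1 if each block starts exactly where the previous one ends. */
int blocks_contiguous(const mem_block *blocks, int count)
{
    assert(blocks != 0 && count > 0);
    for (int i = 1; i < count; i++) {
        if (blocks[i].org != block_end(blocks[i - 1])) {
            return 0;
        }
    }
    return 1;
}
```

Here IRAM ends at 0x00000240 + 0x0000FDC0 = 0x00010000, which is exactly 64K bytes from the start of the internal RAM.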
The SECTIONS part defines at which memory address to load each “section” of the program code or data. A section is a named piece of code. The section names are defined either in the source files (either assembly or C) or by the C compiler. Line 13 indicates that the vectors section (defined in the vectors.asm file) is to be loaded starting at the VECS memory block (which starts at 0x00000000). Other sections are all generated by either the assembler or the C compiler. For example, the .text section represents the piece of program code generated by the assembler or the C compiler, and the linker command file directs it to be loaded in the internal memory (IRAM). For a detailed description of all the different sections, please refer to the TMS320C6x Assembly Language Tools User’s Guide and the TMS320C6x Optimizing C Compiler User’s Guide.
Write the above linker command file as dsk6211.cmd and save it in your directory. If you want to load your program code in the external SDRAM, what changes do you need to make to the above linker command file?
After you load your program code and data at some memory locations, you need to let the CPU start executing the code. When the CPU is reset, the C6211 CPU starts executing the program code at memory location 0x00000000. Therefore, to execute your own program located at some other location, you have to write a short piece of assembly code that jumps to your program’s entry point. This code goes in a separate assembly file that is loaded at memory address 0x00000000. We call this file the reset vector file. Here is an example of the reset vector file:
1 .title "vectors.asm"
2
3 .ref entry
4
5 .sect "vectors"
6 rst: mvkl .s2 entry,b0
7 mvkh .s2 entry,b0
8 b .s2 b0
9 nop
10 nop
11 nop
12 nop
13 nop
The first line names this piece of code vectors.asm. The .ref assembler directive lists the symbolic names defined in another file and used in the current file. That is, it declares that entry is a symbol (the address of the entry point defined in your own assembly program file) defined elsewhere. (.ref is similar to an extern declaration in C.) The .sect directive simply says that the linker should load the following assembly instructions in the vectors section defined in the linker command file. Because the linker command file above defines the vectors section to start at memory address 0x00000000, the assembly instructions are loaded starting at this location. This is exactly what we want.
When the C6211 DSP receives the reset signal, the CPU first initializes all registers and starts fetching and executing the code at memory address 0x00000000. Thus, we need to load the reset code at memory address 0x00000000 before running any code. The file vectors.asm is the piece of code we let the linker load at this address.
When the entry point of your program code is named entry, we direct the execution to this entry point upon reset. Lines 6 and 7 in the above vectors.asm load the b0 register with the memory address of entry, and the b (branch) instruction in line 8 jumps to the address contained in b0. Because of the pipelined operation of the processor (discussed later), unless we want to execute extra instructions before branching, we need five nop (no operation) instructions after each b instruction. For a more detailed discussion of the C62x instructions and the pipeline, refer to the TMS320C62x/67x CPU and Instruction Set Reference Guide.
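Loading a full 32-bit address into a register takes two instructions because each instruction carries only a 16-bit constant: one instruction for the lower half and one for the upper half. The C sketch below models this with the mvkl/mvkh pair as I understand their behavior (mvkl loads the 16 LSBs with sign extension, mvkh then overwrites the 16 MSBs); treat it as an assumption to be checked against the instruction set reference. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of mvkl: load the 16 LSBs of cst, sign-extended to 32 bits. */
uint32_t mvkl_model(uint32_t cst)
{
    uint32_t low = cst & 0xFFFFu;
    return (low & 0x8000u) ? (low | 0xFFFF0000u) : low;
}

/* Model of mvkh: replace the 16 MSBs of reg with the 16 MSBs of cst;
 * the 16 LSBs already in reg are kept. */
uint32_t mvkh_model(uint32_t reg, uint32_t cst)
{
    return (cst & 0xFFFF0000u) | (reg & 0x0000FFFFu);
}

/* Hypothetical helper: the two-step sequence that loads a 32-bit address. */
uint32_t load_address(uint32_t addr)
{
    uint32_t b0 = mvkl_model(addr);   /* lower half first */
    return mvkh_model(b0, addr);      /* then the upper half */
}
```

In this model the order matters: the sign extension performed by the first step may leave incorrect upper bits in the register, and the second step replaces them with the correct upper half.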
To be able to write your own vectors.asm file, you need to know basic assembly programming. For now, all the reset vector files will have exactly the same format as above: loading the b0 register with the address to jump to, followed by the b instruction and five nop instructions.
Write your own vectors.asm file and save it.
Now let’s write a very short assembly program that adds two numbers. The program does the following: it loads the constant 0x1234 into A0, loads the constant 0x0012 into A1, and then adds A0 and A1 and stores the result in A2.
To load a constant into a register, we use the MVK instruction. To add two register contents, we use the ADD instruction. Read the description of the MVK and ADD in the instruction set handout.
The core of the program will consist of three instructions.
        MVK 0x1234,A0
        MVK 0x0012,A1
        ADD A0,A1,A2
We need to add the assembler directives to let the assembler and linker know how to assemble the code. First, to let the linker know that the code should be loaded in the internal memory area, we specify the section name using .text. Because .text is a special section name, we don’t need to say .sect ".text"; we can simply say .text. To define the program entry point, we need to define a label at the program start. The .def directive defines the symbol entry so that it can be referenced outside the current file. At the end of the program, we need the .end directive to let the assembler know the end of the code.
Putting all these together, we obtain
        .text
        .def entry
entry:  MVK 0x1234,A0
        MVK 0x0012,A1
        ADD A0,A1,A2
        IDLE
        .end
We also added the IDLE instruction to let the CPU idle (execute infinite NOPs) after finishing the ADD instruction.
Write an assembly file add.asm containing the lines above. Can you assign the functional units to each instruction? Look up the table in the instruction set handout and properly assign functional units to all instructions.
Now you should have three files, vectors.asm, dsk6211.cmd, and add.asm. You’re ready to assemble them and execute your code under the Code Composer Studio.
The first thing you should do is to create a project and add the files to the project. This is exactly the same as you did with the example C program in (Reference). The files you need to add are add.asm (assuming this is your assembly file name), vectors.asm, and dsk6211.cmd. You do not need any run-time support library because your assembly program is simple and does not require any library support.
It is useful to let the linker know the program entry point, which was defined as entry in your assembly code. You can set the build options using the Project:Build Options… menu; put the name entry in the linker option for the entry point. This is useful when restarting the program under the CCS: when you issue the restart command in Code Composer Studio, the program counter (PC) is set to the address of the entry point.
After making the project, build your program under the CCS to generate the executable file. You can load the executable file onto the DSK in the same way as you did in (Reference).
Because your program consists of only 3 assembly instructions and does not explicitly output any values, you cannot watch the program execution by simply running it. You should watch the register values to see what values are stored in the registers and whether the ADD instruction was performed correctly. Examine the values stored in the A0, A1, and A2 registers before executing the program. Then run the program. After executing the first three instructions, the CPU will idle at the IDLE instruction. Halt the CPU execution under the CCS. Then, re-examine the register values to make sure they have the proper values.
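To know what "proper values" to look for, you can work out the register contents by hand. The C sketch below models the three instructions of add.asm, assuming MVK loads a 16-bit constant sign-extended to 32 bits (my understanding of the instruction; check the handout). The names `mvk_model`, `regs`, and `run_add_model` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MVK: load a 16-bit constant, sign-extended to 32 bits. */
uint32_t mvk_model(uint16_t cst)
{
    return (cst & 0x8000u) ? (0xFFFF0000u | cst) : cst;
}

/* The three registers of file A used by add.asm. */
typedef struct {
    uint32_t a0, a1, a2;
} regs;

/* Hypothetical model of the three instructions in add.asm. */
regs run_add_model(void)
{
    regs r;

    r.a0 = mvk_model(0x1234);   /* MVK 0x1234,A0 */
    r.a1 = mvk_model(0x0012);   /* MVK 0x0012,A1 */
    r.a2 = r.a0 + r.a1;         /* ADD A0,A1,A2  */
    return r;
}
```

Under these assumptions, after the program halts you would expect A0 = 0x00001234, A1 = 0x00000012, and A2 = 0x00001246.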
Breakpoints, watch variables, etc. work exactly the same way as you tried in (Reference). Try setting breakpoints at different instructions in the program and watch how the register contents change. Also try single-step execution of the program.
To make your code more readable and easier to understand, you can define symbolic constants using the .set assembler directive (.equ does the same job). We can rewrite the program as follows:
a       .set 0x1234
b       .set 0x0012
        .text
        .def entry
entry:  MVK a,A0
        MVK b,A1
        ADD A0,A1,A2
        IDLE
        .end
Build and execute your code after the above modification.
Let’s modify the program to compute a*x + b. Storing the value of x in A3, we can write
a       .set 0x1234
b       .set 0x0012
x       .set 0x3
        .text
        .def entry
entry:  MVK 0,A2
        MVK 0,A4
        MVK a,A0
        MVK b,A1
        MVK x,A3
        MPY A0,A3,A4
        ADD A4,A1,A2
        IDLE
        .end
Assemble the above multiply-and-accumulate program under the CCS. What is the value expected in the registers after executing the program? Did you get the expected result in A2? If not, think of why you didn’t get the expected result. Using single step execution of the program, figure out what was wrong. How can you modify the program to obtain the correct result?