<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="None">
  <name>C62x Assembly Primer 1</name>
  <metadata>
  <md:version>1.1</md:version>
  <md:created>2004/10/04 21:33:45 GMT-5</md:created>
  <md:revised>2005/04/21 13:55:00.769 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="choi">
      <md:firstname>Hyeokho</md:firstname>
      
      <md:surname>Choi</md:surname>
      <md:email>choi@ece.rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="choi">
      <md:firstname>Hyeokho</md:firstname>
      
      <md:surname>Choi</md:surname>
      <md:email>choi@ece.rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="ahlfing">
      <md:firstname>Robert</md:firstname>
      
      <md:surname>Ahlfinger</md:surname>
      <md:email>ahlfing@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>C6211</md:keyword>
    <md:keyword>DSP</md:keyword>
    <md:keyword>ELEC 434</md:keyword>
    <md:keyword>Lab 2</md:keyword>
  </md:keywordlist>

  <md:abstract>Introduction to the basic C62x architecture and instruction set.</md:abstract>
</metadata>

  <content>
    <section id="intro"><name>Introduction</name>
       <para id="sec1p1">
Although you can obtain fairly good performance on the C62x CPU by programming in C with TI's optimizing compiler, you need to learn how the CPU works internally and how to write programs in C62x assembly language if you want to control the CPU and peripherals directly or optimize the code for maximum efficiency. This lab introduces the basic C62x architecture and instruction set to familiarize you with the internal functions of the CPU and with assembly programming. You also learn basic programming techniques using TI's assembler.
       </para>   
    </section>  
    <section id="overview"><name>Overview of C6211 Architecture</name>
       <para id="sec2p2">
The C62x consists of internal memory, peripherals (serial port, external memory interface, etc.), and most importantly, the CPU, which contains the registers and the functional units that execute instructions. <!-- fig1 illustrates the internal structure of the CPU and the relation with the peripherals outside the CPU. --> Although you don't need to care about the internal architecture of the CPU to compile and run programs, you must understand how the CPU fetches and executes assembly instructions in order to write a highly optimized assembly program.
      </para>
      <para id="sec2p3">
We learn the architecture and basic function of each CPU unit through the development of simple assembly language programs.
      </para>
    <section id="sscoreop">
      <name>Core DSP Operation</name>
      <para id="sec3p1">
In many DSP algorithms, the Sum of Products or Multiply-Accumulate (MAC) operations are very common.  A DSP CPU is designed to handle the math-intensive calculations necessary for common DSP algorithms.  For efficient implementation of the MAC operation, the C6211 CPU has two multipliers and each of them can perform a 16-bit multiplication in each clock cycle.  For example, if we want to compute the dot product of two length-40 vectors
         <m:math>
          <m:ci type="vector">
             <m:msub>
                <m:mi>a</m:mi>
                <m:mi>n</m:mi>
             </m:msub>
           </m:ci>
         </m:math> and 
         <m:math>
          <m:ci type="vector">
             <m:msub>
                <m:mi>x</m:mi>
                <m:mi>n</m:mi>
             </m:msub>
           </m:ci>
         </m:math>, we need to compute:
         <m:math display="block">
           <m:apply>
             <m:eq/>
               <m:ci type="vector">y</m:ci>
               <m:apply>
                 <m:sum/>
                 <m:bvar>
                    <m:ci>n</m:ci>
                 </m:bvar>
                 <m:lowlimit>
                    <m:cn>1</m:cn>
                 </m:lowlimit>
                 <m:uplimit>
                    <m:cn>40</m:cn>
                 </m:uplimit>   
                 <m:apply>
                    <m:vectorproduct/> 
                    <m:ci type="vector"> 
                      <m:msub>
                        <m:mi>a</m:mi>
                        <m:mi>n</m:mi>
                      </m:msub> 
                    </m:ci> 
                    <m:ci type="vector"> 
                      <m:msub>
                        <m:mi>x</m:mi>
                        <m:mi>n</m:mi>
                      </m:msub>
                    </m:ci> 
                 </m:apply>
               </m:apply>
          </m:apply>
         </m:math>   
(For example, the FIR filtering algorithm is exactly the same as this dot product operation.)  When
         <m:math>
          <m:ci type="vector">
             <m:msub>
                <m:mi>a</m:mi>
                <m:mi>n</m:mi>
             </m:msub>
           </m:ci>
         </m:math> and
         <m:math>
          <m:ci type="vector">
             <m:msub>
                <m:mi>x</m:mi>
                <m:mi>n</m:mi>
             </m:msub>
           </m:ci>
         </m:math>
are stored in memory, starting from
         <m:math>
           <m:apply>
             <m:eq/>
             <m:ci>n</m:ci>
             <m:cn>1</m:cn>
           </m:apply>
         </m:math>, we need to compute 
         <m:math>
           <m:apply>
             <m:vectorproduct/> 
             <m:ci type="vector"> 
               <m:msub>
                 <m:mi>a</m:mi>
                 <m:mi>n</m:mi>
               </m:msub> 
             </m:ci> 
             <m:ci type="vector"> 
                <m:msub>
                   <m:mi>x</m:mi>
                   <m:mi>n</m:mi>
                </m:msub>
             </m:ci> 
           </m:apply>
         </m:math> and add it to 
         <m:math>
           <m:ci type="vector">y</m:ci>
         </m:math> ( 
         <m:math>
           <m:ci type="vector">y</m:ci>
         </m:math> is initially 
         <m:math>
            <m:cn>0</m:cn>
         </m:math>) and repeat this up to 
         <m:math>
           <m:apply>
             <m:eq/>
             <m:ci>n</m:ci>
             <m:cn>40</m:cn>
           </m:apply>
         </m:math>.  In the C62x assembly, this MAC operation can be written as:
      </para>
      <para id="sec3p2">
        <code type="block">MPY .M a,x,prod
ADD .L y,prod,y</code>
      </para>
      <para id="sec3p3">
Ignore <code>.M</code> and <code>.L</code> for now.  Here, <code>a,x,prod,y</code> are numbers stored in memory and the instruction <code>MPY</code> multiplies two numbers <code>a</code> and <code>x</code> together and stores the result in <code>prod</code>.  The <code>ADD</code> instruction adds two numbers <code>y</code> and <code>prod</code> together storing the result back to <code>y</code>.
      </para>
    </section>  
    <section id="ssregfiles">
      <name>Register Files</name>
      <para id="sec4p1">
Where are the numbers stored in the CPU?  In the C62x, the numbers used in operations are stored in registers.  Because the registers are directly accessible through the data path of the CPU, accessing the registers is much faster than accessing data in external memory.
      </para>
      <para id="sec4p2">
The C62x CPU has two register files (A and B).  Each of these files consists of sixteen 32-bit registers (A0-A15 for file A and B0-B15 for file B).  The general-purpose registers can be used for data, data address pointers, or condition registers.
      </para>
      <para id="sec4p3">
The general-purpose register files support data ranging in size from 16-bit data through 40-bit fixed-point data.  Values larger than 32 bits, such as 40-bit long quantities, are stored in register pairs.  In a register pair, the 32 LSBs of the data are placed in an even-numbered register and the remaining 8 MSBs in the next upper register (which is always an odd-numbered register).  In assembly language syntax, a colon between two register names denotes a register pair, and the odd-numbered register is specified first.  For example, A1:A0 represents the register pair consisting of A0 and A1.  You don't need to be too concerned with 40-bit numbers, however.  Throughout this course, you will mostly be handling either 16- or 32-bit values stored in a single register.
      </para>
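      <para id="sec4p3a">
As a minimal sketch of the register-pair syntax, assume a 32-bit value is held in A1 and a 40-bit running total in the pair A3:A2. These register choices are illustrative and are not part of the lab. The long form of <code>ADD</code> then names the pairs with the odd-numbered register first:
        <code type="block">        ADD .L1 A1,A3:A2,A5:A4   ; A5:A4 = A1 + A3:A2 (40-bit result)
                                 ; A4 receives the 32 LSBs, A5 the 8 MSBs</code>
      </para>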
      <para id="sec4p4">
For now, let's focus on file A only.  The registers in register file A are named A0 to A15.  Each register can store a 32-bit binary number.  The numbers such as <code>a,x,prod,y</code> above are stored in these registers.  For example, register A0 stores <code>a</code>.  For now, let's assume we interpret all 32-bit numbers stored in registers as unsigned integers.  Therefore the range of values we can represent is 0 to 
         <m:math>
           <m:apply>
             <m:minus/>
                <m:apply>
                   <m:power/>
                      <m:cn>2</m:cn>
                      <m:cn>32</m:cn>
                </m:apply>
                <m:cn>1</m:cn>
            </m:apply>
          </m:math>.  (For representation of real numbers using binary bits, we will learn about the Q format numbers for fixed-point representation of real numbers.)  Let's assume the numbers <code>a,x,prod,y</code> are in the registers A0, A1, A3, A4, respectively.  Then the above assembly instructions can be written specifically:
          <code type="block">MPY .M1 A0,A1,A3
ADD .L1 A4,A3,A4</code>
(Ignore <code>.M1</code> and <code>.L1</code> for the moment.)
      </para>
      <para id="sec4p5">
The TI C62x CPU has a load/store architecture. This means that all numbers must be held in registers before they can be used as operands of instructions such as <code>MPY</code> and <code>ADD</code>.  A number can be read from a memory location into a register (using, for example, the <code>LDW</code> and <code>LDB</code> instructions), or a register can be loaded with a constant value.  The content of a register can be stored to a memory location (using, for example, the <code>STW</code> and <code>STB</code> instructions).
     </para>
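     <para id="sec4p5a">
The load/store pattern described above can be sketched as follows. This is only an illustration under stated assumptions: A5 is assumed to already hold the address of a 32-bit word in memory, and A7 the address where the result should be written. The <code>NOP 4</code> covers the delay of the load instruction, which belongs to the pipeline discussion that comes later.
        <code type="block">; Assumption: A5 = address of a 32-bit input word, A7 = address for the result
        LDW .D1 *A5,A6      ; read the word at address A5 into register A6
        NOP 4               ; wait until the loaded value has arrived in A6
        ADD .L1 A6,A6,A6    ; operate on register contents only: A6 = A6 + A6
        STW .D1 A6,*A7      ; write A6 to the word at address A7</code>
Note that <code>ADD</code> never touches memory directly; only the load and store instructions do.
     </para>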
     <para id="sec4p6">
In addition to the general-purpose register files, the CPU has a separate register file for the control registers.  The control registers are used to control various CPU functions such as addressing mode, interrupts, etc.  You will learn more about some of the control registers when we learn each individual topic.
     </para>
    </section>
    <section id="functunits">
      <name>Functional Units</name>
      <para id="sec5p1">
Then, where do the actual operations such as multiplication and addition take place?  The C62x CPU has several <term>functional units</term> that perform the actual operations.  Each register file has 4 functional units named <code>.M</code>, <code>.L</code>, <code>.S</code>, and <code>.D</code><!--see figure 1-1 -->.  The 4 functional units connected to register file A are named <code>.L1</code>, <code>.S1</code>, <code>.D1</code>, and <code>.M1</code>.  Those connected to register file B are named <code>.L2</code>, <code>.S2</code>, <code>.D2</code>, and <code>.M2</code>.<!--See figure 1-1. -->  For example, the functional unit <code>.M1</code> performs multiplication on operands that are in register file A.  When the CPU executes the <code>MPY .M1 A0, A1, A3</code> above, the functional unit <code>.M1</code> takes the values stored in <code>A0</code> and <code>A1</code>, multiplies them together, and stores the result in <code>A3</code>.  The <code>.M1</code> in <code>MPY .M1 A0, A1, A3</code> indicates that this operation is performed in the <code>.M1</code> unit.  The <code>.M1</code> unit has a 16-bit multiplier, and all multiplications are performed by the <code>.M1</code> (or <code>.M2</code>) unit.
      </para>
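      <para id="sec5p1a">
To illustrate how the unit name follows the register file, here is the same multiplication written once for each side of the CPU. The B-side register numbers are chosen only to mirror the A-side example.
        <code type="block">        MPY .M1 A0,A1,A3    ; .M1 multiplies operands taken from register file A
        MPY .M2 B0,B1,B3    ; .M2 does the same for operands in register file B</code>
      </para>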
      <para id="sec5p2">
Similarly, the <code>ADD</code> operation can be executed by the <code>.L1</code> unit.  The <code>.L1</code> can perform all the logical operations such as bitwise AND operation (<code>AND</code> instruction) as well as basic addition (<code>ADD</code> instruction) and subtraction (<code>SUB</code> instruction).
      </para>
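      <para id="sec5p2a">
A short sketch of the three <code>.L1</code> operations mentioned above, with illustrative register choices. In each case the first two operands are the sources and the last one is the destination.
        <code type="block">        ADD .L1 A0,A1,A2    ; A2 = A0 + A1
        SUB .L1 A0,A1,A3    ; A3 = A0 - A1
        AND .L1 A0,A1,A4    ; A4 = A0 AND A1 (bit by bit)</code>
      </para>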
      <para id="sec5p3">
<!-- For complete list of instructions executed by each function unit, see Table 3-2 in the handout TMS320C62x/C64x/C67x Fixed-Point Instruction Set. -->We will later learn more about assigning the functional units for assembly instructions.      
      </para>
      <exercise id="ex21">
        <problem>
        <name>Addition and Multiplication</name>
        <para id="ex1p1">Read the description of <code>ADD</code> and <code>MPY</code> instructions in the TI manual handed out.  Write an assembly program that computes <code>A0*(A1+A2)+A3</code>.</para>
        </problem>
      </exercise>
    </section>
   </section>
   <section id="writerunassembly">
   <name>Writing and Running Your First Assembly Program</name>
     <section id="memmaplink">
     <name>Memory Map and Linker Command File</name>
     <para id="sec6p1">
When you have a piece of assembly code to execute on the CPU, you first need to load it at some memory location.  The C6211 CPU has some <emphasis>internal</emphasis> memory space to store program code and data.  The DSK board also has external RAM that you can use to store program code and data.  The memory map of the DSK board is as follows:
     </para>
     <table id="mapDSK" frame="all">
       <tgroup cols="3" colsep="1" rowsep="1">
         <thead>
           <row>
             <entry>Address</entry>
             <entry>Memory Map</entry>
             <entry>Size</entry>
           </row>
         </thead>
         <tbody>
           <row>
             <entry align="center">0000 0000</entry>
             <entry align="center">Internal RAM</entry>
             <entry align="center">64K bytes</entry>
           </row>
           <row>
             <entry align="center">0001 0000</entry>
             <entry align="center">Reserved</entry>
             <entry align="center">24M bytes</entry>
           </row> 
           <row>
             <entry align="center">0180 0000</entry>
             <entry align="center">Control registers</entry>
             <entry align="center">316 bytes</entry>
           </row>
           <row>
             <entry align="center">01A0 0000</entry>
             <entry align="center">EDMA parameter RAM</entry>
             <entry align="center">2K bytes</entry>
           </row>
           <row>
             <entry align="center">01A0 FFE0</entry>
             <entry align="center">Control registers</entry>
             <entry align="center">72 bytes</entry>
           </row>
           <row>
             <entry align="center">3000 0000</entry>
             <entry align="center">McBSP0 data</entry>
             <entry align="center">64M bytes</entry>
           </row>
           <row>
             <entry align="center">3400 0000</entry>
             <entry align="center">McBSP1 data</entry>
             <entry align="center">64M bytes</entry>
           </row>
           <row>
             <entry align="center">8000 0000</entry>
             <entry align="center">SDRAM (CE0)</entry>
             <entry align="center">16M bytes</entry>
           </row>
           <row>
             <entry align="center">9000 0000</entry>
             <entry align="center">8-bit ROM (CE1)</entry>
             <entry align="center">128K bytes</entry>
           </row>
           <row>
             <entry align="center">9008 0000</entry>
             <entry align="center">8-bit I/O port (CE1)</entry>
             <entry align="center">4 bytes</entry>
           </row>
           <row>
             <entry align="center">A000 0000</entry>
             <entry align="center">Daughtercard (CE2)</entry>
             <entry align="center">256M bytes</entry>
           </row>
           <row>
             <entry align="center">B000 0000</entry>
             <entry align="center">Daughtercard (CE3)</entry>
             <entry align="center">256M bytes</entry>
           </row>
         </tbody>
       </tgroup>
     </table>
     <para id="sec6p2">
The memory map is fixed by the CPU architecture itself and the way the external memory and input/output (I/O) devices are wired to the CPU.  As shown above, each memory location has a 32-bit address and the addresses can be stored in the registers to be used as memory index for data load/store.
     </para>
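     <para id="sec6p2a">
Because an address is 32 bits wide, it is usually placed in a register in two 16-bit halves. The sketch below loads the starting address of the SDRAM from the table above into A5; the choice of A5 is arbitrary. <code>MVKL</code> writes the lower 16 bits of the constant and <code>MVKH</code> then writes the upper 16 bits without disturbing the lower half.
        <code type="block">        MVKL .S1 0x80000000,A5   ; lower 16 bits of the address (0x0000) into A5
        MVKH .S1 0x80000000,A5   ; upper 16 bits of the address (0x8000) into A5
                                 ; A5 now holds 0x80000000 and can serve as a pointer</code>
     </para>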
     <para id="sec6p3">
When you write an assembly program, you must designate where each piece of your code is to be loaded for execution.  After you write and assemble a piece of assembly code, you obtain <term>relocatable</term> code, meaning that the code doesn't have any fixed memory reference in it and can be placed at any memory location once you supply the information on where to put it in the memory map.  Then, the <term>linker</term> combines the different pieces of assembly code to produce the final executable code.  The executable code has all the memory location information.  For the linker to be able to generate executable code with the actual memory locations of each piece of code and data, we need to let the linker know the memory map (physical addresses) of the DSK board.  For convenience, we can assign names to different pieces of the memory.
     </para>
     <para id="sec6p4">
The <term>linker command file</term> is the file in which we tell the linker the memory map and the names of the memory sections.
     </para>
     <para id="sec6p5">
A typical linker command file that can be used for our DSK board is listed below:
     <code type="block">1  MEMORY
2   {
3   VECS:      org =         0h,  len =      0x220
4   I_HS_MEM:  org = 0x00000220,  len = 0x00000020
5   IRAM:      org = 0x00000240,  len = 0x0000FDC0
6   SDRAM:     org = 0x80000000,  len = 0x01000000
7   FLASH:     org = 0x90000000,  len = 0x00020000
8   }
9   
10  SECTIONS
11  {
12    /* Created in vectors.asm */
13    vectors  :&gt; VECS
14  
15    /* Created by Assembler */
16    .text    :&gt; IRAM
17 
18  }</code>
     </para>
     <para id="sec6p6">
<emphasis>The file consists of two parts, MEMORY and SECTIONS.</emphasis>  The MEMORY part defines the physical addresses of the memory blocks (the memory map).  In the C6211, the internal RAM starts at 0x00000000 and the first 0x220 bytes contain the reset and interrupt vectors (we will later learn what they are).  The above file names this block VECS.  Most of the rest of the internal memory is named IRAM and is used to load program and data (as defined in the SECTIONS part).  The external SDRAM is named SDRAM and the flash ROM is named FLASH.  Note that the starting addresses and lengths of the memory blocks exactly match the memory map of our DSK board.
     </para>
     <para id="sec6p7">
The <code>SECTIONS</code> part defines at which memory address to load each “section” of the program code or data. A <term>section</term> is a named piece of code. The section names are defined either in the source files (either assembly or C) or by the C compiler. Line 13 indicates that the <code>vectors</code> section (defined in the vectors.asm file) is to be loaded in the <code>VECS</code> memory block (which starts at 0x00000000). The other sections are all generated by either the assembler or the C compiler. For example, the <code>.text</code> section represents the piece of program code generated by the assembler or the C compiler, and the linker command file directs it to be loaded in the internal memory (IRAM). For a detailed description of all the different sections, please refer to the TMS320C6x Assembly Language Tools User’s Guide and the TMS320C6x Optimizing C Compiler User’s Guide.
     </para>
     <exercise id="ex22">
        <problem>
             <name>Linker Command File</name>
             <para id="ex22p1">
Write the above linker command file as dsk6211.cmd and save it in your directory. If you want to load your program code in the external SDRAM, what changes do you need to make to the above linker command file?
             </para>
        </problem>
      </exercise>
     </section>
     <section id="resetintervecfile">
     <name>Reset and Interrupt Vector File</name>
     <para id="sec7p1">
After you load your program code and data at some memory locations, you need to let the CPU start executing the code. If you reset the CPU, the C6211 CPU starts executing the program code at memory location 0x00000000. Therefore, to execute your own program located at some other location, you have to write a short piece of assembly code that jumps to your program’s entry point. To do this, you need a separate assembly file whose code is loaded at memory address 0x00000000. We call this file the <term>reset vector file</term>.
Here is an example of the reset vector file:
<code type="block">
1	.title	"vectors.asm"
2
3	.ref	entry
4
5	.sect	"vectors"
6  rst:	mvkl	.s2	entry,b0
7       mvkh	.s2	entry,b0
8	b	.s2	b0
9       nop
10      nop
11	nop
12      nop
13      nop
</code>
     </para>
     <para id="sec7p2">
The first line names this piece of code vectors.asm. The <code>.ref</code> assembler directive lists the symbolic names that are defined in another file and used in the current file. That is, it declares that <code>entry</code> is a symbol (the address of the entry point defined in your own assembly program file) defined elsewhere. (<code>.ref</code> is similar to an <code>extern</code> declaration in C.) The <code>.sect</code> directive simply says that the linker should load the following assembly instructions in the vectors section defined in the linker command file. Because the linker command file above defines the vectors section to start at memory address 0x00000000, the assembly instructions are loaded starting at this location. This is exactly what we want.
     </para>
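     <para id="sec7p2a">
The pairing between a symbol that is referenced in one file and defined in another can be sketched as follows. The file name add.asm and the placeholder instruction are assumptions made for illustration; the <code>.def</code> directive makes a label visible to other files.
        <code type="block">; --- in your own program file (for example add.asm) ---
        .def  entry          ; make the label entry visible to other files
entry:  NOP                  ; placeholder for the first instruction of your program

; --- in vectors.asm ---
        .ref  entry          ; entry is defined in another file; the linker fills in its address</code>
     </para>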
     <para id="sec7p3">
When the C6211 DSP receives the reset signal, the CPU first initializes all registers and starts fetching and executing the code at memory address 0x00000000. Thus, we need to load the reset codes at memory address 0x00000000 before running any code. The file vectors.asm is the piece of code we let the linker load at this address.
     </para>
     <para id="sec7p4">
When the entry point of your program code is named <code>entry</code>, we direct the execution to this entry point upon reset. Lines 6 and 7 in the above vectors.asm load the <code>b0</code> register with the memory address of <code>entry</code>, and the <code>b</code> (branch) instruction in line 8 jumps (branches) to the address contained in <code>b0</code>. Because of the pipelined operation of the processor (discussed later), unless we want to execute extra instructions before branching, we need five <code>nop</code> (no operation) instructions after each <code>b</code> instruction.  For a more detailed discussion of the C62x instructions and the pipeline, refer to the TMS320C62x/67x CPU and Instruction Set Reference Guide.
     </para>
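     <para id="sec7p4a">
The five separate <code>nop</code> lines can also be written as one multicycle <code>nop</code> with a count operand. The following sketch assumes that <code>b0</code> has already been loaded as in lines 6 and 7 of vectors.asm:
        <code type="block">        b    .s2  b0      ; branch to the address held in b0
        nop  5            ; one multicycle nop in place of five single nop instructions</code>
Both forms fill the same five cycles after the branch; the single-line form only saves typing and program memory.
     </para>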
     <para id="sec7p5">
To be able to write your own vectors.asm file, you need to know basic assembly programming. For now, all the reset vector files will have exactly the same format as above: load the <code>b0</code> register with the address to jump to, then issue the <code>b</code> instruction followed by five <code>nop</code> instructions.
     </para>
     <exercise id="ex23">
        <problem>
             <name>Reset Vector File</name>
             <para id="ex23p1">
Write your own vectors.asm file and save it.
             </para>
        </problem>
      </exercise>
     </section>
     <section id="writeassembprogs">
     <name>Writing Simple Assembly Programs</name>
     <para id="sec8p1">
Now let’s write a very short assembly program that adds two numbers. The program does the following:
          <list id="instructsaddtwo" type="enumerated">
               <item>Load 0x1234 onto register <code>A0</code>.</item>
               <item>Load 0x0012 onto register <code>A1</code>.</item>
               <item>Add <code>A0</code> and <code>A1</code> and store the result in <code>A2</code>.</item>
          </list>
To load the constants to registers, we use the <code>MVK</code> instruction. To add two register contents, we use the <code>ADD</code> instruction. Read the description of the <code>MVK</code>  and <code>ADD</code>  in the instruction set handout.
     </para>
     <para id="sec8p2">
The core of the program will consist of three instructions.
       <code type="block">
        MVK 0x1234,A0
        MVK 0x0012,A1
        ADD A0,A1,A2
       </code>
We need to add assembler directives to let the assembler and linker know how to assemble the code. First, to let the linker know that the code should be loaded in the internal memory area, we specify the section name using <code>.text</code>. Because <code>.text</code> is a special section name, we don’t need to say <code>.sect ".text"</code>; you can simply say <code>.text</code>. To define the program entry point, we need to define a label at the program start. The <code>.def</code> directive defines the symbol <code>entry</code> so that it can be referenced outside the current file. At the end of the program, we need the <code>.end</code> directive to let the assembler know where the code ends.
     </para> 
     <para id="sec8p3">
Putting all these together, we obtain
       <code type="block">
        .text
        .def entry
entry:  MVK 0x1234,A0
        MVK 0x0012,A1
        ADD A0,A1,A2
        IDLE
        .end
       </code>
We also added the <code>IDLE</code> instruction to let the CPU idle (execute infinite <code>NOP</code>s) after finishing the <code>ADD</code> instruction.
     </para>
     <exercise id="ex24">
        <problem>
             <name>Addition Program</name>
             <para id="ex24p1">
Write an assembly file add.asm containing the program above. Can you assign the functional units to each instruction? Look up the table in the instruction set handout and properly assign functional units to all instructions.
             </para>
        </problem>
      </exercise>
    </section>
    <section id="assembexecusingCCS">
     <name>Assembly and Execution Using Code Composer Studio</name>
     <para id="sec9p1">
Now you should have three files: vectors.asm, dsk6211.cmd, and add.asm. You’re ready to assemble them and execute your code in Code Composer Studio (CCS).
     </para>
   </section>
    <section id="creatingproject">
     <name>Creating A New Project</name>
     <para id="sec10p1">
The first thing you should do is to create a project and add the files to the project. This is exactly the same as what you did with the example C program in <cnxn document="m12767"/>. The files you need to add are add.asm (assuming this is your assembly file name), vectors.asm, and dsk6211.cmd. You do not need any run-time support library because your assembly program is simple and does not require any library support.
     </para>
     <section id="setassembopt">
        <name>Setting The Assembler Options</name>
        <para id="ssec10p1">
It is useful to let the build tools know the program entry point. You can set the build options using the Project:Build Options… menu. The program entry point address was defined as <code>entry</code> in your assembly code, so specify this name as the entry point in the linker options under Project:Build Options…. This is useful when restarting the program under CCS: when you issue the <code>restart</code> command in Code Composer Studio, the program counter (PC) is set to the address of the entry point.
        </para>
        <para id="ssec10p2">
After making the project, build your program under CCS to generate the executable file. You can load the executable file onto the DSK in the same way as you did in <cnxn document="m12767"/>.
        </para>
     </section>
     <section id="execcode">
        <name>Executing Your Code</name>
        <para id="ssec10p3">
Because your program consists of only 3 assembly instructions and does not explicitly output any values, you cannot watch the program execution by simply running it. Instead, watch the register values to see what is stored in the registers and whether the add instruction was performed correctly. Examine the values stored in the <code>A0</code>, <code>A1</code>, and <code>A2</code> registers before executing the program. Then run the program. After executing the first three instructions, the CPU will idle at the <code>IDLE</code> instruction. Halt the CPU execution under CCS. Then, re-examine the register values to make sure they have the proper values.
        </para>
     </section>
     <section id="debugging">
        <name>Debugging</name>
        <para id="ssec10p4">
Breakpoints, watch variables, etc. work exactly the same way as in <cnxn document="m12767"/>. Try setting breakpoints at different instructions in the program and watch how the register contents change. Also try single-step execution of the program.
        </para>
     </section>
   </section>
   <section id="defandusesymname">
     <name>Defining and Using Symbolic Names</name>
     <para id="sec11p1">
To make your code more readable and easier to understand, you can define symbolic names for constants using the <code>.set</code> assembler directive (<code>.equ</code> does the same job). We can rewrite the program as follows:
       <code type="block">
a       .set 0x1234
b       .set 0x0012
        .text
        .def entry
entry:  MVK a,A0
        MVK b,A1
        ADD A0,A1,A2
        IDLE
        .end
       </code>
Build and execute your code after the above modification.    
     </para>
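     <para id="sec11p1a">
Symbols defined with <code>.set</code> are constants that the assembler substitutes before the program runs; they do not occupy registers or memory. As a sketch of this, the hypothetical symbol <code>sum</code> below is computed from <code>a</code> and <code>b</code> by the assembler itself, assuming both are defined earlier in the same file:
       <code type="block">a       .set 0x1234
b       .set 0x0012
sum     .set a+b          ; hypothetical symbol, evaluated at assembly time to 0x1246
        .text
        .def entry
entry:  MVK sum,A2        ; A2 is loaded with 0x1246; no ADD instruction is executed
        IDLE
        .end</code>
This differs from the program above, where the CPU performs the addition at run time with the <code>ADD</code> instruction.
     </para>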
     <exercise id="ex25">
        <problem>
             <name>Modifying Your First asm Code</name>
             <para id="ex25p1">
Let’s modify the program to compute 
              <m:math>
                <m:apply>
                  <m:plus/>
                    <m:apply>
                      <m:times/>
                      <m:ci>a</m:ci>
                      <m:ci>x</m:ci>
                    </m:apply>
                    <m:ci>b</m:ci>
                </m:apply>
              </m:math>. For example, if  
              <m:math>
                <m:apply>
                  <m:eq/>
                  <m:ci>x</m:ci>
                  <m:cn>3</m:cn>
                </m:apply>
              </m:math> is stored in <code>A3</code>, we can write
<code type="block">
a       .set 0x1234
b       .set 0x0012
x       .set 0x3
        .text
        .def entry
entry:  MVK 0,A2
        MVK 0,A4
        MVK a,A0
        MVK b,A1
        MVK x,A3
        MPY A0,A3,A4
        ADD A4,A1,A2
        IDLE
        .end
</code>
Assemble the above multiply-and-accumulate program under the CCS. What is the value expected in the registers after executing the program? Did you get the expected result in <code>A2</code>? If not, think of why you didn’t get the expected result. Using single step execution of the program, figure out what was wrong. How can you modify the program to obtain the correct result?
             </para>
        </problem>
      </exercise>
     </section>
   </section>
  </content>
</document>
