<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="m10658">
  <name>Spectrum Analyzer: Processor Exercise Using C Language with C Introduction (55x)</name>
  <metadata>
  <md:version>1.11</md:version>
  <md:created>2006/08/02 11:06:55 GMT-5</md:created>
  <md:revised>2007/09/26 16:53:22.666 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="tbshen">
      <md:firstname>Thomas</md:firstname>
      
      <md:surname>Shen</md:surname>
      <md:email>tbshen@uiuc.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="tbshen">
      <md:firstname>Thomas</md:firstname>
      
      <md:surname>Shen</md:surname>
      <md:email>tbshen@uiuc.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>autocorrelation</md:keyword>
    <md:keyword>block processing</md:keyword>
    <md:keyword>C language</md:keyword>
    <md:keyword>DFT</md:keyword>
    <md:keyword>digital signal processing</md:keyword>
    <md:keyword>discrete fourier transform</md:keyword>
    <md:keyword>discrete time fourier transform</md:keyword>
    <md:keyword>DTFT</md:keyword>
    <md:keyword>fast algorithms</md:keyword>
    <md:keyword>fast fourier transform</md:keyword>
    <md:keyword>FFT</md:keyword>
    <md:keyword>frequency domain</md:keyword>
    <md:keyword>IIR filter</md:keyword>
    <md:keyword>PN generator</md:keyword>
    <md:keyword>power spectral density</md:keyword>
    <md:keyword>power spectral density estimation</md:keyword>
    <md:keyword>PSD</md:keyword>
    <md:keyword>pseudo-noise generator</md:keyword>
    <md:keyword>spectral analysis</md:keyword>
    <md:keyword>spectrum</md:keyword>
    <md:keyword>windowing</md:keyword>
  </md:keywordlist>

  <md:abstract>This is a processor exercise in which students implement a spectrum analyzer using mixed C and assembly code.  Students are to acquire a block of 1024 samples, apply a Hamming window, compute a length-1024 Discrete Fourier Transform using provided Fast Fourier Transform code, and display the magnitude-squared spectrum on an oscilloscope. Students will also compile and explore a reference implementation of an autocorrelation-based power spectral density (PSD) estimator. This implementation estimates the PSD of an IIR-filtered pseudo-noise generator.</md:abstract>
</metadata>


  <content>

    <section id="sec1">
      <name>Implementation</name>

      <para id="sec1para1">
        As this is your first experience with the C environment, you will
        have the option to add most of the required code in C or assembly.
        A C skeleton will provide access to input samples, output samples,
        and interrupt handling code. You will add code to transfer the
        inputs and outputs (in blocks at a time), apply a hamming window,
        compute the magnitude-squared spectrum, and include a trigger pulse.
        After the hamming window is created, either an assembly or C module
        that bit reverses the input and performs an FFT calculation is called.
      </para>

      <para id="sec1para2">As your spectrum analyzer works on a block of samples at a
        time, you will need to use interrupts to pause your processing
        while samples are transferred from/to the CODEC (A/D and D/A)
        buffer.  Fortunately, the interrupt handling routines have
        been written for you in a C shell program available at
        <code>v:\ece420\55x\lab4\main.c</code>. For this lab, you will be working with the code available at <code>v:\ece420\55x\lab4</code>.
      </para>

      <section id="sec1sub1">
        <name>Interrupt Basics</name>

        <para id="sec1sub1para1">Interrupts are an essential part of the operation of any
          microprocessor.  They are particularly important in embedded
          applications where DSPs are often used.  Hardware interrupts
          provide a way for interacting with external devices while
          the processor executes code.  For example, in a key entry
          system, a key press would generate a hardware interrupt.
          The system code would then jump to a specified location in
          program memory where a routine could process the key input.
          Interrupts provide an alternative to polling.  Instead of
          checking for key presses at a predetermined rate (requires a
          clock), the system could be busy executing other code.  On
          the TI-C55x DSP, interrupts provide a convenient way to
          transfer blocks of data to/from the CODEC in a timely
          fashion.
        </para>
      </section>

      <section id="sec1sub2">
        <name>Interrupt Handling</name>

        <para id="sec1sub2para1">The <code>main.c</code>,<code>dma.c</code>, and <code>lab4.c</code> code are intended
          to make your interaction with the hardware much simpler. These files handle the the buffering of data using interrupts from the CODEC (A/D and D/A) and the USB port. Here, we
          will describe the important aspects of the code necessary
          to complete the assignment.
        </para>
        <para id="sec1sub2para2">In the first few labs, data was processed on a sample-by-sample basis, so no buffering was necesary. However, the spectrum analyzer to be
          implemented in this lab works over a block of
          <m:math>
            <m:apply>
              <m:eq/>
              <m:ci>N</m:ci>
              <m:cn>1024</m:cn>
            </m:apply>
          </m:math> samples.  If it were possible to compute a
          1024-point FFT in the sample time of one
          sample, then no additional interrupt handling
          routines would be necessary.  Samples could be collected in
          a 1024-length buffer and a 1024-point FFT could be computed
          uninterrupted while the buffer fills.
          Unfortunately, the DSP is not fast enough to accomplish this
          task.
        </para>

        <para id="sec1sub2para3">We now provide an explanation of the shell C program
          <code>main.c</code>. The
          <code>main.c</code> file contains the main function that sets up the McBSP (Multi Channel Buffered Serial Port), DMA (Direct Memory Access), and interrupts. Then it returns and allows the DSP/BIOS scheduler  to take over.

 </para>

        <para id="element-551">Many of the important interrupt routines are defined in the DSP/BIOS scheduler. The settings can be viewed by expanding the DSP/BIOS Config folder and double-clicking on the .cdb file. Changes do not have to be made, but it is good to know that the most important parts are the Scheduling and the Chip Support Library. Under Scheduling, there is the Hardware Interrupt Manager and Software Interrupt Manager. The Chip Support Library includes the DMA controller and the MCBSP. Various settings for these and many other modules can be set using a graphical interface instead of straight up coding. </para><para id="element-398">Code for the DMA is defined in <code>dma.c</code>. The buffers are defined in this file, as well as the hardware interrupts. The hardware interrupts are initialized by calling the function <code>init_DMA</code>in the main function. When the hardware interrupts are triggered, they call the <code>HWI_DMA0_Transmit()</code> and <code>HWI_DMA1_Receive()</code> functions. At the end of these two functions, the software interrupt <code>SWI_Process()</code> is posted with different variables. Posting <code>SWI_Process()</code> will call the <code>SWI_ProcessBuffer()</code> function which will require modification.</para><para id="element-209">The <code>SWI_ProcessBuffer()</code> function is defined in <code>lab4.c</code>. It is called every time the software interrupt <code>SWI_Process</code> is posted, which is set to happen every N = 1024 samples. As given, the function will simply copy the inputs to the outputs. (After commenting out some lines and uncommenting others.) Follow the example and comments to modify the code to perform the necessary operations.</para><para id="element-51">Although each of these operations may be performed
          in C or assembly, we suggest you follow the guidelines
          suggested.

          <list id="list1" type="enumerated"><item>Transfer inputs (C)</item>
            <item>Apply a Hamming Window (C/assembly)</item>
            <item>Bit-reverse the input (C and assembly) (Already done)</item>
            <item>Apply an 
              <m:math>
		<m:ci>N</m:ci>
	      </m:math>-point FFT (C and assembly) (Already done)</item>
            <item>Compute the magnitude-squared spectrum and place in output (C/assembly)</item>
            <item>Include a trigger pulse (C/assembly)</item>
          </list>
        <emphasis>Note: Bit-reversing and application of the FFT may be done in reverse order depending on implementation.</emphasis></para><para id="sec1sub2para4">Near the beginning of the SWI_ProcessBuffer function, the input samples need to be copied to specific buffers for processing. In C, pointers may be
          used as array names so that <code>pdest[0]</code> is the
          first word pointed to by <code>pdest</code>. The input samples are not in 
          consecutive order and must be accessed with offsets.  The four channels of input are offset from 
          <code>psrc</code> respectively by 
          <m:math display="inline">
            <m:apply>
              <m:times/>
                <m:cn>4</m:cn>
                <m:ci>i</m:ci>
            </m:apply>
          </m:math> and
          <m:math display="inline">
            <m:apply>
              <m:plus/>
                <m:apply>
                  <m:times/>
                    <m:cn>4</m:cn>
                    <m:ci>i</m:ci>
                </m:apply>
                <m:cn>2</m:cn>
            </m:apply>
          </m:math>,
          <m:math display="inline">
            <m:apply>
              <m:eq/>
                <m:ci>i</m:ci>
                <m:cn>0</m:cn>
            </m:apply>
          </m:math>, …,
          <m:math display="inline">
            <m:apply>
              <m:minus/>
                <m:ci>BlockLen</m:ci>
                <m:cn>1</m:cn>
            </m:apply>
          </m:math>.
          The four output channels are accessed consecutively as offsets
          from <code>pdest</code>. On channel 1 of the
          output, the input is echoed out. <emphasis>You are to fill
          channel 2 with the windowed
          magnitude-squared FFT values by performing the operations
          listed above.</emphasis> For the first step, take a look at the way we make the DSPLIB cfft call to find out where to transfer the inputs to. (You may change the function call <code>cfft</code> to pass in different values if you like. Just remember that <code>bit_rev()</code> expects its input in a specific location.) Likewise, take a look at the C FFT code (declared in <code>lab4fft.c</code> to find out where to copy the inputs to.
        </para>

        

        

        

      </section>

      <section id="sec1sub3">
        <name>Assembly FFT Routine</name>

        <para id="sec1sub3para1">As the list of operations indicates, bit-reversal and FFT
          computation are to be done in both C and assembly.  For the
          assembly version, make sure that the line defining
          <code>C_FFT</code> is commented out in <code>lab4.c</code>.
          We are providing you with a shell assembly file, available at
          <code>v:\ece420\55x\lab4\c_fft_given.asm</code> and shown
          in <cnxn target="sec_appendix_b">Appendix B</cnxn>, containing many
          useful declarations and some code. The code for performing
          bit-reversal and other declarations needed for the FFT
          routine are also provided in this section.
          <emphasis>However, we would like you to enter this code
          manually, as you will be expected to understand its
          operation.
          </emphasis>
        </para>

        <para id="sec1sub3para2">The assembly file <code>c_fft_given.asm</code>
          contains two main parts, the data section 
          starting with <code>.sect ".data"</code> and the
          program section starting with  
          <code>.sect ".text"</code>.  Every function and
          variable accessed in C must be preceded by a single underscore
          <code>_</code> in assembly and a 
          <code>.global _name</code> must be placed in the
          assembly file for linking.  In this example, 
          <code>bit_rev_fft</code> is an assembly function called from 
          the C program with a label <code>_bit_rev_fft</code> in the
          text portion of the assembly file and a
          <code>.global _bit_rev_fft</code> declaration.  In each
          assembly function, the macro <code>ENTER_ASM</code> is
          called upon entering and <code>LEAVE_ASM</code> is
          called upon exiting.  These macros are defined in
          <code>v:\ece420\55x\macro.asm</code>.  The 
          <code>ENTER_ASM</code> macro saves the status registers
          and <code>AR1</code>, <code>AR6</code>, and
          <code>AR7</code> when entering a function as required
          by the register use conventions.  The <code>ENTER_ASM</code>
          macro also sets the status registers to the assembly conventions we 
          have been using (i.e, <code>FRCT</code>=1 for fractional
          arithmetic and <code>CPL</code>=0 for
          <code>DP</code> referenced addressing).  The
          <code>LEAVE_ASM</code> macro just restores the saved
          registers.
        </para>

      <section id="sec1sub4">
        <name>Parameter Passing</name>

        <para id="sec1sub4para1">The parameter passing convention between assembly and C places the parameters into registers depending on the size of the parameters. Data pointers (16 or 23 bit) will be placed in (X)AR0 through (X)AR4 in that order. 16-bit data will be placed in T0, T1, and AR0 through AR4 in that order. 32-bit data will be placed in accumulators AC0 through AC2. If there are no available registers of the correct type, the parameters will be passed onto the stack. For more details, see page 6-16 of the Optimizing C/C++ Compiler User's Guide (<link src="http://focus.ti.com/lit/ug/spru281e/spru281e.pdf">spru281e</link>).
        </para>
      </section>

      <section id="sec1sub5">
        <name>Registers Modified</name>

        <para id="sec1sub5para1">
          When entering and leaving an assembly function, the 
          <code>ENTER_ASM</code> and <code>LEAVE_ASM</code>
          macros ensure that certain registers are saved and restored.  
          Since the C program may use any and all registers, the state of
          a register cannot be expected to remain the same between calls to
          assembly function(s).  <emphasis>Therefore, any information that 
          needs to be preserved across calls to an assembly function must be 
          saved to memory!</emphasis>
        </para>
      </section>

        <para id="sec1sub5para2">Now, we explain how to use the FFT routine provided by TI
          for the C55x. TI provides a library of commonly used functions. These functions include FFT, FIR, IIR, and some math operations. More information can be found in the <link src="http://focus.ti.com/lit/ug/spru422i/spru422i.pdf">DSP Library Programmer's Reference</link>. The <code>CFFT</code> function will be used for the DSPLIB implementation. Refer to the reference guide to figure out the correct syntax in calling the function. </para>

        <para id="element-242">The length of the FFT is defined as a label
          K_FFT_SIZE and the bit-reversing algorithm assumes that the
          input starts at data memory location _fft_data.
          To have your code assemble for an
          
            
	  N-point FFT, you will have to include the following
          label definitions in your assembly code.</para><code type="block" id="verbatim"><![CDATA[
	  N                 .set       1024
	  K_FFT_SIZE        .set       N           ; size of FFT
	  ]]>
        </code>

        

        

        <para id="sec1sub5para4">The FFT provided by TI requires that the input be in normal order, with alternating real and imaginary
          components. The output will be in bit-reversed order, so bit-reversing needs to be done after the FFT. Bit-reversed addressing is a convenient way to
          order input/output
          <m:math>
            <m:apply>
              <m:ci type="fn" class="discrete">x</m:ci>
              <m:ci>n</m:ci>
            </m:apply>
          </m:math> into a FFT so that the output/input 
          <m:math>
            <m:apply>
              <m:ci type="fn">X</m:ci>
              <m:ci>k</m:ci>
            </m:apply>
          </m:math> is in sequential order (<foreign>i.e.</foreign>
          <m:math>
            <m:apply>
              <m:ci type="fn">X</m:ci>
              <m:cn>0</m:cn>
            </m:apply>
          </m:math>,
          <m:math>
            <m:apply>
              <m:ci type="fn">X</m:ci>
              <m:cn>1</m:cn>
            </m:apply>
          </m:math>, …,
          <m:math>
            <m:apply>
              <m:ci type="fn">X</m:ci>
              <m:apply>
                <m:minus/>
		<m:ci>N</m:ci>
		<m:cn>1</m:cn>
              </m:apply>
            </m:apply>
          </m:math> for an 
          <m:math>
            <m:ci>N</m:ci>
	  </m:math>-point FFT).  The following table illustrates the
          bit-reversed order for an eight-point sequence.
        </para>

        <table frame="topbot" id="table1">
          <name/>
          <tgroup cols="4" align="center" colsep="1" rowsep="1">
	    <thead valign="top">
	      <row>
		<entry align="center">Input Order</entry>
		<entry align="center">Binary Representation</entry>
		<entry align="center">Bit-Reversed Representation</entry>
		<entry align="center">Output Order</entry>
	      </row>
	    </thead>
	    <tbody valign="top">
	      <row>
		<entry align="center">0</entry>
		<entry align="center">000</entry>
		<entry align="center">000</entry>
		<entry align="center">0</entry>
	      </row>
	      <row>
		<entry align="center">1</entry>
		<entry align="center">001</entry>
		<entry align="center">100</entry>
		<entry align="center">4</entry>
	      </row>
	      <row>
		<entry align="center">2</entry>
		<entry align="center">010</entry>
		<entry align="center">010</entry>
		<entry align="center">2</entry>
	      </row>
	      <row>
		<entry align="center">3</entry>
		<entry align="center">011</entry>
		<entry align="center">110</entry>
		<entry align="center">6</entry>
	      </row>
	      <row>
		<entry align="center">4</entry>
		<entry align="center">100</entry>
		<entry align="center">001</entry>
		<entry align="center">1</entry>
	      </row>
	      <row>
		<entry align="center">5</entry>
		<entry align="center">101</entry>
		<entry align="center">101</entry>
		<entry align="center">5</entry>
	      </row>
	      <row>
		<entry align="center">6</entry>
		<entry align="center">110</entry>
		<entry align="center">011</entry>
		<entry align="center">3</entry>
	      </row>
	      <row>
		<entry align="center">7</entry>
		<entry align="center">111</entry>
		<entry align="center">111</entry>
		<entry align="center">7</entry>
	      </row>
	    </tbody>
          </tgroup>
        </table>

        <para id="sec1sub5para5">The following routine performs the bit-reversed reordering
          of the output data.  The routine assumes that the output is
          stored in data memory starting at the location labeled
          <code>_bit_rev_data</code>, which must be aligned to the
          least power of two greater than the input buffer length, and
          consists of alternating real and imaginary parts.  Because
          our input data is going to be purely real in this lab, you
          will have to make sure that you set the imaginary parts to
          zero by zeroing out every other memory location.
        </para>

        <code type="block" id="block3"><![CDATA[
_bit_rev

	ENTER_ASM

	MOV		#_bit_rev_data, AR3            ;AR3 -> original input
	MOV		#_fft_data, AR7                ;AR7 -> data processing buffer
	MOV		AR7, AR2                       ;AR2 -> bit-reversed data
	MOV		#K_FFT_SIZE-1, BRC0
	MOV		#K_FFT_SIZE, T0                ; T0 = 1/2 size of circ buffer
	RPTB	bit_rev_end-1
	MOV		dbl(*AR3), AC0
	MOV		AC0, dbl(*AR2+)
	MAR		*AR3+0B
bit_rev_end:

	LEAVE_ASM

	RET
          ]]>
        </code>

        <para id="sec1sub5para6">As mentioned, in the above code <code>_bit_rev_data</code>
          is a label indicating the start of the input data and
          <code>_fft_data</code> is a label indicating the start of a
          circular buffer where the bit-reversed data will be
          written. </para>

        <para id="sec1sub5para7">In general, to have a pointer index memory in bit-reversed
          order, the <code>T0</code> register needs to be set to
          one-half the length of the circular buffer; a statement such
          as <code>ARx+0B</code> is used to move the <code>ARx</code>
          pointer to the next location. For more information regarding
          the bit-reversed addressing mode, refer to <cite>Chapter 6</cite> in the <cite src="http://focus.ti.com/lit/ug/spru376a/spru376a.pdf">DSP Programmer's Reference Guide</cite>.  Is it possible to
          bit-reverse a buffer in place?</para>
      </section>

      <section id="sec1sub6">
        <name>C FFT Routine</name>
        <para id="sec1sub6para1">A bit-reversing and FFT routine have also been provided in 
           <code>lab4fft.c</code>, listed in
           <cnxn target="sec_appendix_c">Appendix C</cnxn>. <emphasis>Again,
           make sure you understand how the bit reversal is taking
           place.</emphasis>
           In <code>main.c</code>, the line defining <code>C_FFT</code>
           must not be commented for use of the C FFT routine. The sine
           tables (twiddle factors) are located in 
           <link src="sinetables.h">sinetables.h</link>.
           This fft requires its inputs in two buffers, the real buffer
           <code>real</code> and the imaginary buffer <code>imag</code>,
           and the output is placed in the same buffers.
           The length of the FFT, <code>N</code>, and <code>logN</code> are
           defined in <code>lab4.h</code>, which is also listed in 
           <cnxn target="sec_appendix_c">Appendix C</cnxn>.
           <emphasis>When experimenting with the C
           FFT make sure you modify these length values instead of the ones
           in the assembly code and <code>main.c</code>!</emphasis>
        </para>
      </section>

      <section id="sec1sub7">
        <name>Creating the Window</name>
        <para id="sec1sub7para1">As mentioned, you will be using the FFT to compute the
          spectrum of a windowed input.  For your implementation you
          will need to create a 1024-point Hamming window.  First,
          create a Hamming window in matlab using the function
          <code>hamming</code>.  Use the matlab
          function <link src="write_intvector_headerfile.m">
          <code>write_intvector_headerfile</code></link>
          with <code>name</code> set to <code>'window'</code> and
          <code>elemperline</code> set to <code>8</code> to create
          the header file that is included in <code>lab4.c</code>.
        </para>
      </section>

      <section id="sec1sub8">
        <name>Displaying the Spectrum</name>

        <para id="sec1sub8para1">Once the DFT has been computed, you must calculate the
          squared magnitude of the spectrum for display.

          <equation id="eq2">
            <m:math>
              <m:apply>
                <m:eq/>
		<m:apply>
		  <m:power/>
		  <m:apply>
		    <m:abs/>
		    <m:apply>
		      <m:ci type="fn">X</m:ci>
		      <m:ci>k</m:ci>
		    </m:apply>
		  </m:apply>
		  <m:cn>2</m:cn>
		</m:apply>
		<m:apply>
		  <m:plus/>
		  <m:apply>
		    <m:power/>
		    <m:apply>
		      <m:real/>
		      <m:apply>
			<m:ci type="fn">X</m:ci>
			<m:ci>k</m:ci>
		      </m:apply>
		    </m:apply>
		    <m:cn>2</m:cn>
		  </m:apply>
		  <m:apply>
		    <m:power/>
		    <m:apply>
		      <m:imaginary/>
		      <m:apply>
			<m:ci type="fn">X</m:ci>
			<m:ci>k</m:ci>
		      </m:apply>
		    </m:apply>
		    <m:cn>2</m:cn>
		  </m:apply>
		</m:apply>
              </m:apply>
            </m:math>
          </equation> You may find the assembly instructions
          <code>sqrm</code> and <code>sqam</code> useful in
          implementing <cnxn target="eq2"/>.
        </para>

        <para id="sec1sub8para2">
          Because the squared magnitude is always nonnegative, you can
          replace one of the magnitude values with a -1.0 as a trigger
          pulse for display on the oscilloscope. This is easily
          performed by replacing the DC term (
          <m:math>
            <m:apply>
              <m:eq/>
	      <m:ci>k</m:ci>
	      <m:cn>0</m:cn>
            </m:apply>
          </m:math>)
          with a -1.0 when copying the magnitude values to the output buffer. The
          trigger pulse is necessary for the oscilloscope to lock to a specific
          point in the spectrum and keep the spectrum fixed on the scope.
        </para>
      </section>

      <section id="sec1sub9">
        <name>Intrinsics</name>

        <para id="sec1sub9para1">If you are planning on writing some of the code in C, then
          you may be forced to use intrinsics.  Intrinsic instructions
          provide a way to use assembly instructions directly in C.
          An example of an intrinsic instruction is
          <code>bit_rev_data[0]=_smpyr(bit_rev_data[0],window[0])</code>
          which performs the assembly signed multiply round
          instruction.  You may also find the <code>_lsmpy</code>
          instruction useful.  For more information on intrinsics, see
          <cite>page 3-29</cite> of the <cite src="http://focus.ti.com/lit/ug/spru376a/spru376a.pdf">TI-C55x
          DSP Programmer's Guide</cite>.
        </para>

        <para id="sec1sub9para2">The following lines of code were borrowed from the C FFT to serve as an example of arithmetic operations in C.  Save this code in a file called mathex.c and create a new project by going to <code>Project-&gt;New...</code> . In the following window, enter mathex as the name for the project and save it in its own folder on the W: drive. Verify that the Project Type is Executable (.out) and that the target is TMS320C55XX before clicking Finish.</para>

        <code type="block" id="block5"><![CDATA[
            int s1, s2;
            int t1, t2;
            int i1, i2;
            int n1 = 16383, n2 = 16382, n3 = 16381, n4 = 16380;

            void main(void)
            {
               /* Code for standard 32-bit hardware, */
               /* with x,y limited to 16 bits        */
               s1 = (n1*n2 + n3*n4) >> 15;
               s2 = (n1 + n2) >> 1;

               /* Code for TI TMS320C55X series */
               t1 = ((long int)(n1*n2) + (long int)(n3*n4)) >> 15;
               t2 = ((long int)n1 + (long int)n2) >> 1;

               /* Intrinsic code for TMS320C55X series */
               i1 = _sadd(_smpy(n1,n2), _smpy(n3,n4));
               i2 = _sshl(_sadd(n1, n2),-1);

               while(1);
            }
          ]]>
        </code><para id="element-504">Add the mathex.c file to the project by left-clicking on the mathex.pjt file in the left-hand window and selecting <code>Add Files to Project..</code> . By default, the generated assembly code is not saved. To save the generated assembly for later comparison, go to <code>Project-&gt;Build Options</code>. Under the Compiler tab, click on the Assembly category and make sure <code>Keep Generated .asm Files</code> is selected.</para><para id="element-916">Compile your project before looking at the resulting assembly file and investigating the differences between
          each block.  Be sure to reference <cite>page 3-32</cite> of the
          DSP Programmer's Guide to find 
          out what the state of the FRCT and OVM bits are. Run
          this program on the DSP, halt the program, and compare the
          output values in a memory window. Does each block work properly
          for all possible values?
        </para>
      </section>

      <section id="sec1sub10">
        <name>Compiling and Linking</name>

        <para id="sec1sub10para1">
          A working program can be produced by compiling the C code and
          linking assembly modules and the core module.  The compiler
          translates C code to a relocatable assembly form.  The linker
          assigns physical addresses on the DSP to the relocatable data
          and code segments, resolves <code>.global</code>
          references and links runtime libraries.
        </para>

        <para id="sec1sub10para2">Close the mathex project and go back to the original Lab 4 project. In the future if there are additional source code files to include in the project, just follow the above instructions.  Once you have
          completed <code>lab4.c</code> and <code>c_fft_given.asm</code>, select <code>Project-&gt;Rebuild All</code>. Load the output file onto the DSP as usual and confirm that valid
          FFTs are calculated.  Once valid output is obtained, measure how
          many clock cycles it takes to compute both the assembly and C FFT.
        </para>
      </section>

    </section>

    <section id="sec2">
      <name>Reference Implementation of a PSD estimator</name>
      <para id="sec2para1">We provide for you in Appendix <cnxn target="sec_appendix_d">D</cnxn>
        and <cnxn target="sec_appendix_e">E</cnxn> a complete C implementation
        of a PSD estimator. The input is an IIR-filtered pseudo-noise (PN)
        sequence generator and the PSD estimate is based on windowing the
        autocorrelation with a rectangular window.
        The code consists of the files <code>lab4b.c</code>,
        <code>lab4b.h</code>, <code>intrinsics.h</code>, <code>pn.c</code>,
        <code>iirfilt.c</code>, <code>autocorr.c</code>,
        <code>c_fft_given_iirc.asm</code>, and the previously-given TI FFT
        routine. The assembly file <code>c_fft_given_iirc.asm</code> differs
        from <code>c_fft_given.asm</code> in that the window array has been
        removed and variables and arrays associated with IIR filtering have
        been added. Note that the multiply functions in the functions are
        actually compiler directives contained in <code>intrinsics.h</code>.
        Make sure you know which ones are used and why; note that
        <code>VPO</code> is not defined by the TI compiler, therefore the
        corresponding section of the <code>#ifdef</code> statement is not used.
        Open up the lab4b.pjt project and <code>Rebuild All</code>.
Load <code>lab4b.out</code> onto the DSP and
        run the code. Make sure that an
        IIR-filtered PN sequence appears on channel 1 and its PSD estimate
        appears on channel 2.
      </para>

      <para id="sec2para2">
        Does the output match your expectations based on the theory? Does this
        application illustrate any limitations of the FFT implementation?
        (Hint: note that most of the values in the FFT input are zero.)
        The previously-given C implementation uses a similar algorithm as the
        TI FFT; take a look at the C code for help.
        What are the limitation(s) of the FFT that show up in this application?
      </para>

      <para id="sec2para3">In <code>lab4b.h</code>
	<code>M</code> sets the number of
        autocorrelation points that are calculated. What is the maximum value
        of <code>M</code> that allows the reference routines to run in real
        time? In determining this value you may find it useful to connect a
        wave-function generator to the input and copy input on that channel
        into channel 1 of output. You may limit <code>M</code> to
        powers of <code>2</code> minus <code>1</code>.
      </para>
    </section>

    <section id="sec3">
      <name>Quiz Information</name>

      <para id="sec3para1">
        From your prelab experiments, you should be able to describe
        the effect of windowing and zero-padding on FFT spectral
        analysis.  In your DSP system, experiment with different
        inputs, changing
        <m:math>
	  <m:ci>N</m:ci>
	</m:math> and the type of window.  Can you explain what happens
        as the input frequency is increased beyond the Nyquist rate? Does the
        <m:math>
          <m:apply>
            <m:power/>
	    <m:apply>
	      <m:abs/>
	      <m:apply>
		<m:ci type="fn">X</m:ci>
		<m:ci>k</m:ci>
	      </m:apply>
	    </m:apply>
	    <m:cn>2</m:cn>
          </m:apply>
        </m:math> coincide with what you expect from Matlab?  What is
        the relationship between the observed spectrum and the DTFT?
        What would happen if the FFT calculation takes longer than it
        takes to fill <code>inputs</code> with
        <m:math>
	  <m:ci>N</m:ci>
	</m:math> samples? How long does it take to compute each FFT?
        What are the tradeoffs between writing code in C versus assembly?
      </para>
    </section>

    <section id="sec_appendix_a">
      <name>Appendix A:</name>

      <para id="sec_appendix_a_para1"><link src="lab4.c">lab4.c</link>
      </para>

      <code type="block" id="block6"><![CDATA[
#include "dsk5510_dual3006cfg.h"
#include "dsk5510.h"
#include "swi_process.h"
#include "dsplib.h"

#define N 1024
#define logN 10
#include "window.h"

/* comment the next line to use DSPLIB fft */
//#define C_FFT

#ifdef C_FFT /* Use C FFT */
	
   /* function defined in lab4fft.c */
   void fft(void);

   /* FFT data buffers */
   int real[N]; /* Real part of data      */
   int imag[N]; /* Imaginary part of data */
   #include "lab4fft.c"
#else        /* Use DSPLIB FFT */

   /* Function defined by c_fft_given.asm */            
   void bit_rev(void);

   /* FFT data buffers (declared in c_fft_given.asm) */
   extern int bit_rev_data[N*2]; /* Data input for bit-reverse function */
   extern int fft_data[N*2];     /* In-place FFT & Output array         */
 
#endif       /* C_FFT */


// all data processing should be done in SWI_ProcessBuffer

void SWI_ProcessBuffer()
{
	static unsigned int mbox_value = 0;
	short *psrc, *pdest;
	unsigned int i;

	mbox_value |= SWI_getmbox();

	// buffers are only processed when both transmit and receive are ready
	if((mbox_value & DMA_RECEIVE_DONE) && (mbox_value & DMA_TRANSMIT_DONE)) {
 		mbox_value = 0;

		// get buffer pointers
		psrc = receive_buffer[receive_buffer_to_process_index];
		pdest = transmit_buffer[transmit_buffer_to_fill_index];

		// samples are interleaved in input buffer 3-4-1-2-3-4-1-2
		// output buffer is organized 1-2-3-4-1-2-3-4
		// pdest[0] = psrc[2] would copy input from channel 1
		// to output channel 1.
		
#ifdef C_FFT /* Use C FFT */
    /* I n s e r t   c o d e   t o   f i l l   */
	/* C   F F T   b u f f e r s               */

#else        /* Use DSPLIB FFT */
    /* I n s e r t   c o d e   t o   f i l l   */
	/* a s s e m b l y   F F T   b u f f e r s */

#endif       /* C_FFT */
		


#ifdef C_FFT /* Use C FFT */

    /* Multiply the input signal by the Hamming window.       */
	/* . . . i n s e r t   C / a s m   code . . .             */

    /* Bit-reverse and compute FFT in C                       */
    	fft();

    /* Now, take absolute value squared of FFT                */
    /* . . . i n s e r t   C / a s m   code . . .             */
	
	/* Last, set the DC coefficient to -1 for a trigger pulse */
	/* . . . i n s e r t   C / a s m   code . . .             */

    /* done, wait for next time around!                       */

#else        /* Use DSPLIB FFT */

    /* Multiply the input signal by the Hamming window.       */
	/* . . . i n s e r t   C / a s m   code . . .             */

	/* Compute FFT using DSPLIB function                */
		cfft((DATA *)bit_rev_data,N, SCALE);
	
	/* Bit reverse using assembly function			*/
    	bit_rev();

    /* Now, take absolute value squared of FFT                */
  	/* . . . i n s e r t   C / a s m   code . . .             */

    /* Last, set the DC coefficient to -1 for a trigger pulse */
    /* . . . i n s e r t   C / a s m   code . . .             */

    /* done, wait for next time around!                       */


#endif       /* C_FFT */
		
		receive_buffer_processed = 1; // flag receive buffer as processed
		transmit_buffer_filled = 1; // flag output buffer as full
	}
}
        ]]>
</code>

    </section>

    <section id="sec_appendix_b">
      <name>Appendix B:</name>

      <para id="sec_appendix_b_para1">
        <link src="c_fft_given.asm">c_fft_given.asm</link>
      </para>

      <code type="block" id="block7"><![CDATA[
      .ARMS_off                     ;enable assembler for ARMS=0
      .CPL_on                       ;enable assembler for CPL=1
      .mmregs                       ;enable mem mapped register names

	.global _bit_rev_data
	.global _fft_data
	.global _window
	.global _bit_rev

	.copy "macro.asm"

	.sect ".data"

N	.set 1024
K_FFT_SIZE	.set 1024

	.align 4*N
_bit_rev_data .space 16*2*N	; Input to _bit_rev_fft

	.align	4*N
_fft_data .space 16*2*N		; FFT output buffer

	.sect ".text"


_bit_rev

	ENTER_ASM

	MOV		#_bit_rev_data, AR3
	MOV		#_fft_data, AR7
	MOV		AR7, AR2
	MOV		#K_FFT_SIZE-1, BRC0
	MOV		#K_FFT_SIZE, T0
	RPTB	bit_rev_end-1
	MOV		dbl(*AR3), AC0
	MOV		AC0, dbl(*AR2+)
	MAR		*AR3+0B
bit_rev_end:

	LEAVE_ASM

	RET

	; If you need any more assembly subroutines, make sure you name them
	; _name, and include a ".global _name" directive at the top. Also,
	; don't forget to use ENTER_ASM at the beginning, and LEAVE_ASM
	; and RET at the end!
        ]]>
</code>

    </section>

    <section id="sec_appendix_c">
      <name>Appendix C:</name>

      

      

      <para id="sec_appendix_c_para2">
        <link src="lab4fft.c">lab4fft.c</link>
      </para>

      <code type="block" id="block9">
        <![CDATA[
1    /*****************************************************************/
2    /* lab4fft.c                                                     */
3    /* Douglas L. Jones                                              */
4    /* University of Illinois at Urbana-Champaign                    */
5    /* January 19, 1992                                              */
6    /* Changed for use w/ short integers and lookup table for ECE420 */
7    /* Matt Kleffner                                                 */
8    /* February 10, 2004                                             */
9    /*                                                               */
10    /*   fft: in-place radix-2 DIT DFT of a complex input            */
11    /*                                                               */
12    /*   Permission to copy and use this program is granted          */
13    /*   as long as this header is included.                         */
14    /*                                                               */
15    /* WARNING:                                                      */
16    /*   This file is intended for educational use only, since most  */
17    /*   manufacturers provide hand-tuned libraries which typically  */
18    /*   include the fastest fft routine for their DSP/processor     */
19    /*   architectures. High-quality, open-source fft routines       */
20    /*   written in C (and included in MATLAB) can be found at       */
21    /*   http://www.fftw.org                                         */
22    /*                                                               */
23    /*   #defines expected in lab4.h                                 */
24    /*         N:   length of FFT: must be a power of two            */
25    /*      logN:   N = 2**logN                                      */
26    /*                                                               */
27    /*   16-bit-limited input/output (must be defined elsewhere)     */
28    /*   real:   integer array of length N with real part of data    */
29    /*   imag:   integer array of length N with imag part of data    */
30    /*                                                               */
31    /*   sinetables.h must                                           */
32    /*   1) #define Nt to an equal or greater power of two than N    */
33    /*   2) contain the following integer arrays with                */
34    /*      element magnitudes bounded by M = 2**15-1:               */
35    /*         costable:   M*cos(-2*pi*n/Nt), n=0,1,...,Nt/2-1       */
36    /*         sintable:   M*sin(-2*pi*n/Nt), n=0,1,...,Nt/2-1       */
37    /*                                                               */
38    /*****************************************************************/
39
40    #include "lab4.h"
41    #include "sinetables.h"
42
43    extern int real[N];
44    extern int imag[N];
45
46    void fft(void)
47    {
48       int   i,j,k,n1,n2,n3;
49       int   c,s,a,t,Wr,Wi;
50
51       j = 0;            /* bit-reverse */
52       n2 = N >> 1;
53       for (i=1; i < N - 1; i++)
54       {
55          n1 = n2;
56          while ( j >= n1 )
57          {
58             j = j - n1;
59             n1 = n1 >> 1;
60          }
61          j = j + n1;
62
63          if (i < j)
64          {
65             t = real[i];
66             real[i] = real[j];
67             real[j] = t;
68             t = imag[i];
69             imag[i] = imag[j];
70             imag[j] = t;
71          }
72       }
73
74       /* FFT */
75       n2 = 1; n3 = Nt;
76
77       for (i=0; i < logN; i++)
78       {
79          n1 = n2;      /* n1 = 2**i     */
80          n2 = n2 + n2; /* n2 = 2**(i+1) */
81          n3 = n3 >> 1; /* cos/sin arg of -6.283185307179586/n2 */
82          a = 0;
83
84          for (j=0; j < n1; j++)
85          {
86             c = costable[a];
87             s = sintable[a];
88             a = a + n3;
89
90             for (k=j; k < N; k=k+n2)
91             {
92                /* Code for standard 32-bit hardware, */
93                /* with real,imag limited to 16 bits  */
94                /*
95                Wr = (c*real[k+n1] - s*imag[k+n1]) >> 15;
96                Wi = (s*real[k+n1] + c*imag[k+n1]) >> 15;
97                real[k+n1] = (real[k] - Wr) >> 1;
98                imag[k+n1] = (imag[k] - Wi) >> 1;
99                real[k] = (real[k] + Wr) >> 1;
100                imag[k] = (imag[k] + Wi) >> 1;
101                */
102                /* End standard 32-bit code */
103
104                /* Code for TI TMS320C54X series */
105
106                Wr = ((long int)(c*real[k+n1]) - (long int)(s*imag[k+n1])) >> 15;
107                Wi = ((long int)(s*real[k+n1]) + (long int)(c*imag[k+n1])) >> 15;
108                real[k+n1] = ((long int)real[k] - (long int)Wr) >> 1;
109                imag[k+n1] = ((long int)imag[k] - (long int)Wi) >> 1;
110                real[k] = ((long int)real[k] + (long int)Wr) >> 1;
111                imag[k] = ((long int)imag[k] + (long int)Wi) >> 1;
112
113                /* End code for TMS320C54X series */
114
115                /* Intrinsic code for TMS320C54X series */
116                /*
117                Wr = _ssub(_smpy(c, real[k+n1]), _smpy(s, imag[k+n1]));
118                Wi = _sadd(_smpy(s, real[k+n1]), _smpy(c, imag[k+n1]));
119                real[k+n1] = _sshl(_ssub(real[k], Wr),-1);
120                imag[k+n1] = _sshl(_ssub(imag[k], Wi),-1);
121                real[k] = _sshl(_sadd(real[k], Wr),-1);
122                imag[k] = _sshl(_sadd(imag[k], Wi),-1);
123                */
124                /* End intrinsic code for TMS320C54X series */
125             }
126          }
127       }
128       return;
129    }
        ]]>
      </code>

    </section>

    <section id="sec_appendix_d">
      <name>Appendix D: Main routine, header files for PSD estimator</name>

      <para id="sdp1">
        <link src="lab4b.h">lab4b.h</link>
      </para>

      <para id="sdp2">
        <link src="intrinsics.h">intrinsics.h</link>
      </para>

      <para id="sdp3"><link src="lab4b.c">lab4b.c</link>
</para>

      <para id="element-445"><link src="lab4bmain.c">lab4bmain.c</link>
</para><code type="block" id="sdb1">
        <![CDATA[
1    #define N 1024    /* Length of output buffers */
2    #define L N       /* Length of input data     */
3    #define logL 10   /* log base 2 of L          */
4    #define M 31      /* Compute 2*M+1 autocorrelation points */
5
6    /* #define M (L/2-1) */ /* Be sure to use ()'s in this case */
7                            /* or algebraic substitution bugs   */
8                            /* can be introduced                */
        ]]>
      </code>

      <code type="block" id="sdb2"><![CDATA[
/* Compiler intrinsics for the TI compiler        */
/* and the Very Portable Optimizer (VPO) port     */
/* to TMS320C54X series DSPs                      */
/*                                                */
/* Use compile option -DVPO when using VPO        */
/*                                                */
/* Copyright September 2005 by Matt Kleffner      */
/* under the Creative Commons Attribution License */
/* Works with TMS320C55X series                   */

#ifndef INTRINSICS_H

#define INTRINSICS_H

#ifdef VPO

long int vpo_l_mul_ii(int w0, int w1);

/* fractional multiply without fractional mode (long result) */
#define _l_mul_fract_fb0_ii(w0,w1) \
         (vpo_l_mul_ii(w0,w1) << 1)

/* fractional multiply with fractional mode already on (long result) */
#define _l_mul_fract_fb1_ii(w0,w1) \
         (vpo_l_mul_ii(w0,w1))

/* fractional multiply without fractional mode (int result) */
#define _i_mul_fract_fb0_ii(w0,w1) \
         (vpo_l_mul_ii(w0,w1) >> 15)

/* fractional multiply with fractional mode already on (int result) */
#define _i_mul_fract_fb1_ii(w0,w1) \
         (vpo_l_mul_ii(w0,w1) >> 16)

#define _set_fract_bit() vpo_set_fract()
#define _reset_fract_bit() vpo_reset_fract()
#define _set_ovm_bit() vpo_set_ovm()
#define _reset_ovm_bit() vpo_reset_ovm()

#define _l_add_shiftl_li(w0,w1) (((int32)(w0))+(((int32)(int16)(w1))<<16) )
#define _l_sub_shiftl_li(w0,w1) (((int32)(w0))-(((int32)(int16)(w1))<<16) )

#else

/* fractional multiply without fractional mode (long result) */
#define _l_mul_fract_fb0_ii(w0,w1) \
         (((long int)w0 * (long int)w1) << 1)

/* fractional multiply with fractional mode already on (long result) */
#define _l_mul_fract_fb1_ii(w0,w1) \
         (((long int)w0 * (long int)w1))

/* fractional multiply without fractional mode (int result) */
#define _i_mul_fract_fb0_ii(w0,w1) \
         (((long int)w0 * (long int)w1) >> 15)

/* fractional multiply with fractional mode already on (int result) */
#define _i_mul_fract_fb1_ii(w0,w1) \
         (((long int)w0 * (long int)w1) >> 16)

#define _set_fract_bit() asm("	ssbx frct")
#define _reset_fract_bit() asm("	rsbx frct")
#define _set_ovm_bit() asm("	ssbx ovm")
#define _reset_ovm_bit() asm("	rsbx ovm")

#endif /* VPO */

#endif /* INTRINSICS_H */

        ]]>
</code>

      <code type="block" id="sdb3"><![CDATA[
// lab4b.c
// Uses PN generation, IIR filtering, and autocorrelation
// code by Matt Kleffner -9/2004
// Based on swi_process.c by Educational DSP

#include "dsk5510_dual3006cfg.h"
#include "dsk5510.h"
#include "swi_process.h"
#include "dsplib.h"

#include "lab4b.h"              /* Define N here in header file */

/* function defined in pn.c */
void rand_fillbuffer(void);

/* IIR values and buffers (declared in c_fft_given_iirc.asm) */
#define IIR_order 4
extern int scale;
extern int coef[IIR_order];
extern int state[IIR_order];

/* Pointer to state buffer location */
int iirptr;
extern unsigned int *iseed;            /* seed for rand_fillbuffer() and randbit() */

/* function defined in iirfilt.c */
void iirfilt(void);

/* function defined in autocorr.c */
void autocorr(void);

/* Function defined by c_fft_given_iirc.asm */
//void bit_rev_fft(void);

/* FFT data buffers (declared in c_fft_given_iirc.asm) */
extern int bit_rev_data[N*2];   /* Data output for bit-reverse function */
extern int fft_data[N*2];       /* In-place FFT input & Output array */

/* Our input/output buffers */
int autocorr_in[N];
int autocorr_out[N];

// all data processing should be done in SWI_ProcessBuffer

void SWI_ProcessBuffer()
{
	static unsigned int mbox_value = 0;
	short *psrc, *pdest;
	unsigned int i;

	mbox_value |= SWI_getmbox();

	// buffers are only processed when both transmit and receive are ready
	if((mbox_value & DMA_RECEIVE_DONE) && (mbox_value & DMA_TRANSMIT_DONE)) {
 		mbox_value = 0;

		// get buffer pointers
		psrc = receive_buffer[receive_buffer_to_process_index];
		pdest = transmit_buffer[transmit_buffer_to_fill_index];
		
		    /* First, transfer inputs and outputs */

	    for (i = 0; i < N; i++) 
	    {
    	    pdest[4*i] = autocorr_in[i];
        	pdest[4*i+1] = bit_rev_data[i*2] << 8;
		
        /* Some statements useful in debugging */
        /* pdest[4*i] = psrc[4*i+2]; */
        /* Be sure to comment out PN-sequence generation */
        /* when using the next two lines */
        //autocorr_in[i] = psrc[4i+2];
		
	    }
    	
    	/* Last, set the DC coefficient to -1 for a trigger pulse */
    	pdest[0] = -32768;

		/* Generate PN input */
    	rand_fillbuffer();
		
    	/* Filter input */
    	iirfilt();

    	/* Calculate autocorrelation */
    	autocorr();
	
	 	/* Transfer autocorr output to FFT input buffer */
    	for (i = 0; i < N; i++) {
        	fft_data[i*2] = autocorr_out[i];
        	fft_data[i*2+1] = 0;
    	}
 
		
		// Bit-reverse and compute FFT 
		cfft((DATA *)fft_data,N, SCALE);
		cbrev((DATA *)fft_data,(DATA *)bit_rev_data,N);
		
		
		 /* Done, wait for next time around! */
		receive_buffer_processed = 1; // flag receive buffer as processed
		transmit_buffer_filled = 1; // flag output buffer as full
	}
}


        ]]>
</code><code type="block"><![CDATA[
// lab4bmain.c
// based on main.c by Educational DSP
// Initializes autocorr_out, state, and iseed.
// Can be modified for optimization. 

#include "dsk5510_dual3006cfg.h"
#include "dsk5510.h"

#include <csl.h>
#include "lab4b.h"
#define IIR_order 4

extern int iirptr;
unsigned int *iseed;
extern int autocorr_out[N];
extern int state[IIR_order];

void init_DMA();

void main()
{
    int i;
 
	
    DSK5510_init();						// init BSL
    
    DSK5510_rset(DSK5510_MISC, 0x03);	// route McBSP0/1 to J3


	// Initialize autocorr_out to zero since some values will remain zero
	for (i = 0; i < N; ++i)
	{
		autocorr_out[i] = 0;
	}

	for ( i = 0; i < IIR_order; ++i)
		state[i] = 0;
    
    // Start McBSP0 I2S slave
    MCBSP_start(hMcbsp0, MCBSP_XMIT_START | MCBSP_RCV_START |
	MCBSP_SRGR_START | MCBSP_SRGR_FRAMESYNC, 220);
    
    // Start McBSP1 I2S master
    MCBSP_start(hMcbsp1, MCBSP_XMIT_START | MCBSP_RCV_START |
	MCBSP_SRGR_START | MCBSP_SRGR_FRAMESYNC, 220);
   
	init_DMA();							// configure DMA and interrupts

	*iseed = 1;	
	iirptr = 0;
   
    return;								// let DSP/BIOS scheduler take over
}

]]></code>

    </section>

    <section id="sec_appendix_e">                                                                     
      <name>Appendix E: Additional routines for PSD estimator</name>
                                                                                  
      <para id="sep1">
        <link src="pn.c">pn.c</link>
      </para>

      <para id="sep2">
        <link src="iirfilt.c">iirfilt.c</link>
      </para>

      <para id="sep3">
        <link src="autocorr.c">autocorr.c</link>
      </para>

      <para id="sep4">
        <link src="c_fft_given_iirc.asm">c_fft_given_iirc.asm</link>
      </para>

      <code type="block" id="seb1"><![CDATA[
/* ECE420, Lab 4, Reference PN Generator Implementation (Non-Optimized) */
/* Matt Kleffner 08/04                                                  */
/* Original by Michael Frutiger 02/24/04                                */
/* Use governed by the Creative Commons Attribution License             */

#include "lab4b.h"

extern unsigned int *iseed;
extern int autocorr_in[N];

/* Returns as an integer a random bit, based on the 15 lowest significant
   bits in iseed (which is modified for the next call). */
int randbit()
{
   int newbit;
   /* XOR bits 15, 1 and 0 of iseed */
   newbit =  (*iseed >> 15) & 1 ^ (*iseed >> 1) & 1 ^ *iseed & 1;
   /* Leftshift the seed and put the result of the XOR's in bit 1. */
   *iseed=(*iseed << 1) | newbit;  
   return(newbit);
}

void rand_fillbuffer(void)
{
   int i;

   for (i = 0; i < N; ++i)
   {
      if (randbit()) autocorr_in[i] =  32767;
      else           autocorr_in[i] = -32767;
   }
}

        ]]>
</code>

      <code type="block" id="seb2"><![CDATA[
/* Simple, unoptimized IIR filter (feedback only) */
/* for TMS320C54X series DSPs                     */
/* Copyright September 2005 by Matt Kleffner      */
/* under the Creative Commons Attribution License */
/* Works for TMS320C55X series as well			  */

#include "lab4b.h"
#include "intrinsics.h"

/* IIR values and buffers (declared in c_fft_given_iirc.asm) */
#define IIR_order 4
extern int scale;
extern int coef[IIR_order];
extern int state[IIR_order];

/* Arrays declared in main routine */
extern int autocorr_in[N];
extern int autocorr_out[N];

/* Pointer to state buffer location */
extern int iirptr;

void iirfilt()
{
    int i, j;

    _set_fract_bit();
    /* Filter PN input */
    for (i = 0; i < N; ++i)
    {
       int sum = 0;
       /* Calculate and sum all feedback terms except the "oldest" one */
       for (j = 0; j < (IIR_order-1); ++j)
       {
          sum += _i_mul_fract_fb1_ii(coef[j],state[iirptr]);
          /* Avoid usage of "modulo" routine */
          iirptr++;
          if (iirptr == IIR_order) iirptr = 0;
       }
       /* Calculate and sum oldest feedback term without incrementing iirptr */
       sum += _i_mul_fract_fb1_ii(coef[IIR_order-1],state[iirptr]);

       /* Calculate direct input contribution */
       sum += _i_mul_fract_fb1_ii(scale,autocorr_in[i]);
       autocorr_in[i] = sum;
       state[iirptr] = autocorr_in[i];
    }
    _reset_fract_bit();
}

       ]]>
</code>

      <code type="block" id="seb3"><![CDATA[
/***********************************************************/
/* autocorr.c                                              */
/* Copyright August 2004 by Matt Kleffner                  */
/* under the Creative Commons Attribution License          */
/*                                                         */
/* Simple, unoptimized autocorrelation function            */
/* for ECE 420 (TMS320C54X series)                         */
/*                                                         */
/* #defines expected in lab4b.h                            */
/*    L: length of data in autocorr_in buffer              */
/*    N: length of data in autocorr_out buffer             */
/* logL: log base 2 of L (used for scaling output)         */
/*    M: Largest positive lag of autocorrelation desired   */
/*       (must be < L and < N/2)                           */
/*                                                         */
/* 16-bit-limited input/output (must be defined elsewhere) */
/*  autocorr_in: buffer for input data  (L pts)            */
/* autocorr_out: buffer for output data (N pts)            */
/*               N must be >= 2*M+1                        */
/*               assumed to be full of zeros at start      */
/*               output in zero-phase form                 */
/***********************************************************/
/* Works for TMS320C55X series 							   */

#include "lab4b.h"
#include "intrinsics.h"

extern int autocorr_in[L];
extern int autocorr_out[N];

void autocorr(void)
{
   int i,j,temp;

   _set_fract_bit();
   for(i=0;i<=M;++i)
   {
      long int sum=0;
      for(j=0;j<(L-i);++j)
      {
         temp = _i_mul_fract_fb1_ii(autocorr_in[j],autocorr_in[j+i]);
         sum += temp;
      }
      autocorr_out[i]=(int)(sum >> logL);
   }
   _reset_fract_bit();

   /* Copy values for negative indeces at end of buffer */
   for(i=1,j=N-1;i<=M;++i,--j)
   {  autocorr_out[j]=autocorr_out[i]; }
}

        ]]>
</code>

    </section><code type="block"><![CDATA[
; c_fft_given_iirc.asm
; Designed for use in lab4b for ECE420



      .ARMS_off                     ;enable assembler for ARMS=0
      .CPL_on                       ;enable assembler for CPL=1
      .mmregs                       ;enable mem mapped register names

	.global _bit_rev_data
	.global _fft_data
	.global _state
	.global _scale
	.global _coef

	.copy "macro.asm"

	.sect ".data"

N	.set 1024


	.align 4*N
_bit_rev_data .space 16*2*N

	.align 4*N
_fft_data .space 16*2*N


; IIR filter
	.align 4
_coef
	.word	0
	.word	0
	.word	0
	.word	-13421

_state
	.space 16*4

_scale
	.word	19345

	.sect ".text"


]]></code>

  </content>

</document>
