Based on: Spectrum Analyzer: Processor Exercise Using C Language with C Introduction by Matt Kleffner
Summary: This is a processor exercise in which students implement a spectrum analyzer using mixed C and assembly code. Students are to acquire a block of 1024 samples, apply a Hamming window, compute a length-1024 Discrete Fourier Transform using provided Fast Fourier Transform code, and display the magnitude-squared spectrum on an oscilloscope. Students will also compile and explore a reference implementation of an autocorrelation-based power spectral density (PSD) estimator. This implementation estimates the PSD of an IIR-filtered pseudo-noise generator.
Note: Your browser may not currently support MathML. See our browser support page for additional details. You can always view the correct math in the PDF version.
Note: You are viewing an old version of this document. The latest version is available here.
As this is your first experience with the C environment, you will have the option to add most of the required code in C or assembly. A C skeleton will provide access to input samples, output samples, and interrupt handling code. You will add code to transfer the inputs and outputs (in blocks at a time), apply a hamming window, compute the magnitude-squared spectrum, and include a trigger pulse. After the hamming window is created, either an assembly or C module that bit reverses the input and performs an FFT calculation is called.
As your spectrum analyzer works on a block of samples at a
time, you will need to use interrupts to pause your processing
while samples are transferred from/to the CODEC (A/D and D/A)
buffer. Fortunately, the interrupt handling routines have
been written for you in a C shell program available at
v:\ece420\55x\lab4\main.c. For this lab, you will be working with the code available at v:\ece420\55x\lab4.
Interrupts are an essential part of the operation of any microprocessor. They are particularly important in embedded applications where DSPs are often used. Hardware interrupts provide a way for interacting with external devices while the processor executes code. For example, in a key entry system, a key press would generate a hardware interrupt. The system code would then jump to a specified location in program memory where a routine could process the key input. Interrupts provide an alternative to polling. Instead of checking for key presses at a predetermined rate (requires a clock), the system could be busy executing other code. On the TI-C55x DSP, interrupts provide a convenient way to transfer blocks of data to/from the CODEC in a timely fashion.
The main.c,dma.c, and lab4.c code are intended
to make your interaction with the hardware much simpler. These files handle the the buffering of data using interrupts from the CODEC (A/D and D/A) and the USB port. Here, we
will describe the important aspects of the code necessary
to complete the assignment.
In the first few labs, data was processed on a sample-by-sample basis, so no buffering was necesary. However, the spectrum analyzer to be
implemented in this lab works over a block of
We now provide an explanation of the shell C program
main.c. The
main.c file contains the main function that sets up the McBSP (Multi Channel Buffered Serial Port), DMA (Direct Memory Access), and interrupts. Then it returns and allows the DSP/BIOS scheduler to take over.
Many of the important interrupt routines are defined in the DSP/BIOS scheduler. The settings can be viewed by expanding the DSP/BIOS Config folder and double-clicking on the .cdb file. Changes do not have to be made, but it is good to know that the most important parts are the Scheduling and the Chip Support Library. Under Scheduling, there is the Hardware Interrupt Manager and Software Interrupt Manager. The Chip Support Library includes the DMA controller and the MCBSP. Various settings for these and many other modules can be set using a graphical interface instead of straight up coding.
Code for the DMA is defined in dma.c. The buffers are defined in this file, as well as the hardware interrupts. The hardware interrupts are initialized by calling the function init_DMAin the main function. When the hardware interrupts are triggered, they call the HWI_DMA0_Transmit() and HWI_DMA1_Receive() functions. At the end of these two functions, the software interrupt SWI_Process() is posted with different variables. Posting SWI_Process() will call the SWI_ProcessBuffer() function which will require modification.
The SWI_ProcessBuffer() function is defined in lab4.c. It is called every time the software interrupt SWI_Process is posted, which is set to happen every N = 1024 samples. As given, the function will simply copy the inputs to the outputs. (After commenting out some lines and uncommenting others.) Follow the example and comments to modify the code to perform the necessary operations.
Although each of these operations may be performed in C or assembly, we suggest you follow the guidelines suggested.
Near the beginning of the SWI_ProcessBuffer function, the input samples need to be copied to specific buffers for processing. In C, pointers may be
used as array names so that pdest[0] is the
first word pointed to by pdest. The input samples are not in
consecutive order and must be accessed with offsets. The four channels of input are offset from
psrc respectively by
pdest. On channel 1 of the
output, the input is echoed out. You are to fill
channel 2 with the windowed
magnitude-squared FFT values by performing the operations
listed above. For the first step, take a look at the way we make the DSPLIB cfft call to find out where to transfer the inputs to. (You may change the function call cfft to pass in different values if you like. Just remember that bit_rev() expects its input in a specific location.) Likewise, take a look at the C FFT code (declared in lab4fft.c to find out where to copy the inputs to.
As the list of operations indicates, bit-reversal and FFT
computation are to be done in both C and assembly. For the
assembly version, make sure that the line defining
C_FFT is commented out in lab4.c.
We are providing you with a shell assembly file, available at
v:\ece420\55x\lab4\c_fft_given.asm and shown
in Appendix B, containing many
useful declarations and some code. The code for performing
bit-reversal and other declarations needed for the FFT
routine are also provided in this section.
However, we would like you to enter this code
manually, as you will be expected to understand its
operation.
The assembly file c_fft_given.asm
contains two main parts, the data section
starting with .sect ".data" and the
program section starting with
.sect ".text". Every function and
variable accessed in C must be preceded by a single underscore
_ in assembly and a
.global _name must be placed in the
assembly file for linking. In this example,
bit_rev_fft is an assembly function called from
the C program with a label _bit_rev_fft in the
text portion of the assembly file and a
.global _bit_rev_fft declaration. In each
assembly function, the macro ENTER_ASM is
called upon entering and LEAVE_ASM is
called upon exiting. These macros are defined in
v:\ece420\55x\macro.asm. The
ENTER_ASM macro saves the status registers
and AR1, AR6, and
AR7 when entering a function as required
by the register use conventions. The ENTER_ASM
macro also sets the status registers to the assembly conventions we
have been using (i.e, FRCT=1 for fractional
arithmetic and CPL=0 for
DP referenced addressing). The
LEAVE_ASM macro just restores the saved
registers.
The parameter passing convention between assembly and C places the parameters into registers depending on the size of the parameters. Data pointers (16 or 23 bit) will be placed in (X)AR0 through (X)AR4 in that order. 16-bit data will be placed in T0, T1, and AR0 through AR4 in that order. 32-bit data will be placed in accumulators AC0 through AC2. If there are no available registers of the correct type, the parameters will be passed onto the stack. For more details, see page 6-16 of the Optimizing C/C++ Compiler User's Guide (spru281e).
When entering and leaving an assembly function, the
ENTER_ASM and LEAVE_ASM
macros ensure that certain registers are saved and restored.
Since the C program may use any and all registers, the state of
a register cannot be expected to remain the same between calls to
assembly function(s). Therefore, any information that
needs to be preserved across calls to an assembly function must be
saved to memory!
Now, we explain how to use the FFT routine provided by TI
for the C55x. TI provides a library of commonly used functions. These functions include FFT, FIR, IIR, and some math operations. More information can be found in the DSP Library Programmer's Reference. The CFFT function will be used for the DSPLIB implementation. Refer to the reference guide to figure out the correct syntax in calling the function.
The length of the FFT is defined as a label K_FFT_SIZE and the bit-reversing algorithm assumes that the input starts at data memory location _fft_data. To have your code assemble for an N-point FFT, you will have to include the following label definitions in your assembly code.
N .set 1024
K_FFT_SIZE .set N ; size of FFT
The FFT provided by TI requires that the input be in normal order, with alternating real and imaginary
components. The output will be in bit-reversed order, so bit-reversing needs to be done after the FFT. Bit-reversed addressing is a convenient way to
order input/output
| Input Order | Binary Representation | Bit-Reversed Representation | Output Order |
|---|---|---|---|
| 0 | 000 | 000 | 0 |
| 1 | 001 | 100 | 4 |
| 2 | 010 | 010 | 2 |
| 3 | 011 | 110 | 6 |
| 4 | 100 | 001 | 1 |
| 5 | 101 | 101 | 5 |
| 6 | 110 | 011 | 3 |
| 7 | 111 | 111 | 7 |
The following routine performs the bit-reversed reordering
of the output data. The routine assumes that the output is
stored in data memory starting at the location labeled
_bit_rev_data, which must be aligned to the
least power of two greater than the input buffer length, and
consists of alternating real and imaginary parts. Because
our input data is going to be purely real in this lab, you
will have to make sure that you set the imaginary parts to
zero by zeroing out every other memory location.
_bit_rev
ENTER_ASM
MOV #_bit_rev_data, AR3 ;AR3 -> original input
MOV #_fft_data, AR7 ;AR7 -> data processing buffer
MOV AR7, AR2 ;AR2 -> bit-reversed data
MOV #K_FFT_SIZE-1, BRC0
MOV #K_FFT_SIZE, T0 ; T0 = 1/2 size of circ buffer
RPTB bit_rev_end-1
MOV dbl(*AR3), AC0
MOV AC0, dbl(*AR2+)
MAR *AR3+0B
bit_rev_end:
LEAVE_ASM
RET
As mentioned, in the above code _bit_rev_data
is a label indicating the start of the input data and
_fft_data is a label indicating the start of a
circular buffer where the bit-reversed data will be
written.
In general, to have a pointer index memory in bit-reversed
order, the T0 register needs to be set to
one-half the length of the circular buffer; a statement such
as ARx+0B is used to move the ARx
pointer to the next location. For more information regarding
the bit-reversed addressing mode, refer to Chapter 6 in the DSP Programmer's Reference Guide [link]. Is it possible to
bit-reverse a buffer in place?
A bit-reversing and FFT routine have also been provided in
lab4fft.c, listed in
Appendix C. Again,
make sure you understand how the bit reversal is taking
place.
In main.c, the line defining C_FFT
must not be commented for use of the C FFT routine. The sine
tables (twiddle factors) are located in
sinetables.h.
This fft requires its inputs in two buffers, the real buffer
real and the imaginary buffer imag,
and the output is placed in the same buffers.
The length of the FFT, N, and logN are
defined in lab4.h, which is also listed in
Appendix C.
When experimenting with the C
FFT make sure you modify these length values instead of the ones
in the assembly code and main.c!
As mentioned, you will be using the FFT to compute the
spectrum of a windowed input. For your implementation you
will need to create a 1024-point Hamming window. First,
create a Hamming window in matlab using the function
hamming. Use the matlab
function
write_intvector_headerfile
with name set to 'window' and
elemperline set to 8 to create
the header file that is included in lab4.c.
Once the DFT has been computed, you must calculate the squared magnitude of the spectrum for display.
sqrm and sqam useful in
implementing Equation 1.
Because the squared magnitude is always nonnegative, you can
replace one of the magnitude values with a -1.0 as a trigger
pulse for display on the oscilloscope. This is easily
performed by replacing the DC term (
If you are planning on writing some of the code in C, then
you may be forced to use intrinsics. Intrinsic instructions
provide a way to use assembly instructions directly in C.
An example of an intrinsic instruction is
bit_rev_data[0]=_smpyr(bit_rev_data[0],window[0])
which performs the assembly signed multiply round
instruction. You may also find the _lsmpy
instruction useful. For more information on intrinsics, see
page 3-29 of the TI-C55x
DSP Programmer's Guide [link].
The following lines of code were borrowed from the C FFT to serve as an example of arithmetic operations in C. Save this code in a file called mathex.c and create a new project by going to Project->New... . In the following window, enter mathex as the name for the project and save it in its own folder on the W: drive. Verify that the Project Type is Executable (.out) and that the target is TMS320C55XX before clicking Finish.
int s1, s2;
int t1, t2;
int i1, i2;
int n1 = 16383, n2 = 16382, n3 = 16381, n4 = 16380;
void main(void)
{
/* Code for standard 32-bit hardware, */
/* with x,y limited to 16 bits */
s1 = (n1*n2 + n3*n4) >> 15;
s2 = (n1 + n2) >> 1;
/* Code for TI TMS320C55X series */
t1 = ((long int)(n1*n2) + (long int)(n3*n4)) >> 15;
t2 = ((long int)n1 + (long int)n2) >> 1;
/* Intrinsic code for TMS320C55X series */
i1 = _sadd(_smpy(n1,n2), _smpy(n3,n4));
i2 = _sshl(_sadd(n1, n2),-1);
while(1);
}
Add the mathex.c file to the project by left-clicking on the mathex.pjt file in the left-hand window and selecting Add Files to Project.. . By default, the generated assembly code is not saved. To save the generated assembly for later comparison, go to Project->Build Options. Under the Compiler tab, click on the Assembly category and make sure Keep Generated .asm Files is selected.
Compile your project before looking at the resulting assembly file and investigating the differences between each block. Be sure to reference page 3-32 of the DSP Programmer's Guide to find out what the state of the FRCT and OVM bits are. Run this program on the DSP, halt the program, and compare the output values in a memory window. Does each block work properly for all possible values?
A working program can be produced by compiling the C code and
linking assembly modules and the core module. The compiler
translates C code to a relocatable assembly form. The linker
assigns physical addresses on the DSP to the relocatable data
and code segments, resolves .global
references and links runtime libraries.
Close the mathex project and go back to the original Lab 4 project. In the future if there are additional source code files to include in the project, just follow the above instructions. Once you have
completed lab4.c and c_fft_given.asm, select Project->Rebuild All. Load the output file onto the DSP as usual and confirm that valid
FFTs are calculated. Once valid output is obtained, measure how
many clock cycles it takes to compute both the assembly and C FFT.
We provide for you in Appendix D
and E a complete C implementation
of a PSD estimator. The input is an IIR-filtered pseudo-noise (PN)
sequence generator and the PSD estimate is based on windowing the
autocorrelation with a rectangular window.
The code consists of the files lab4b.c,
lab4b.h, intrinsics.h, pn.c,
iirfilt.c, autocorr.c,
c_fft_given_iirc.asm, and the previously-given TI FFT
routine. The assembly file c_fft_given_iirc.asm differs
from c_fft_given.asm in that the window array has been
removed and variables and arrays associated with IIR filtering have
been added. Note that the multiply functions in the functions are
actually compiler directives contained in intrinsics.h.
Make sure you know which ones are used and why; note that
VPO is not defined by the TI compiler, therefore the
corresponding section of the #ifdef statement is not used.
Open up the lab4b.pjt project and Rebuild All.
Load lab4b.out onto the DSP and
run the code. Make sure that an
IIR-filtered PN sequence appears on channel 1 and its PSD estimate
appears on channel 2.
Does the output match your expectations based on the theory? Does this application illustrate any limitations of the FFT implementation? (Hint: note that most of the values in the FFT input are zero.) The previously-given C implementation uses a similar algorithm as the TI FFT; take a look at the C code for help. What are the limitation(s) of the FFT that show up in this application?
In lab4b.h
M sets the number of
autocorrelation points that are calculated. What is the maximum value
of M that allows the reference routines to run in real
time? In determining this value you may find it useful to connect a
wave-function generator to the input and copy input on that channel
into channel 1 of output. You may limit M to
powers of 2 minus 1.
From your prelab experiments, you should be able to describe
the effect of windowing and zero-padding on FFT spectral
analysis. In your DSP system, experiment with different
inputs, changing
inputs with
#include "dsk5510_dual3006cfg.h"
#include "dsk5510.h"
#include "swi_process.h"
#include "dsplib.h"
#define N 1024
#define logN 10
#include "window.h"
/* comment the next line to use DSPLIB fft */
//#define C_FFT
#ifdef C_FFT /* Use C FFT */
/* function defined in lab4fft.c */
void fft(void);
/* FFT data buffers */
int real[N]; /* Real part of data */
int imag[N]; /* Imaginary part of data */
#include "lab4fft.c"
#else /* Use DSPLIB FFT */
/* Function defined by c_fft_given.asm */
void bit_rev(void);
/* FFT data buffers (declared in c_fft_given.asm) */
extern int bit_rev_data[N*2]; /* Data input for bit-reverse function */
extern int fft_data[N*2]; /* In-place FFT & Output array */
#endif /* C_FFT */
// all data processing should be done in SWI_ProcessBuffer
void SWI_ProcessBuffer()
{
static unsigned int mbox_value = 0;
short *psrc, *pdest;
unsigned int i;
mbox_value |= SWI_getmbox();
// buffers are only processed when both transmit and receive are ready
if((mbox_value & DMA_RECEIVE_DONE) && (mbox_value & DMA_TRANSMIT_DONE)) {
mbox_value = 0;
// get buffer pointers
psrc = receive_buffer[receive_buffer_to_process_index];
pdest = transmit_buffer[transmit_buffer_to_fill_index];
// samples are interleaved in input buffer 3-4-1-2-3-4-1-2
// output buffer is organized 1-2-3-4-1-2-3-4
// pdest[0] = psrc[2] would copy input from channel 1
// to output channel 1.
#ifdef C_FFT /* Use C FFT */
/* I n s e r t c o d e t o f i l l */
/* C F F T b u f f e r s */
#else /* Use DSPLIB FFT */
/* I n s e r t c o d e t o f i l l */
/* a s s e m b l y F F T b u f f e r s */
#endif /* C_FFT */
#ifdef C_FFT /* Use C FFT */
/* Multiply the input signal by the Hamming window. */
/* . . . i n s e r t C / a s m code . . . */
/* Bit-reverse and compute FFT in C */
fft();
/* Now, take absolute value squared of FFT */
/* . . . i n s e r t C / a s m code . . . */
/* Last, set the DC coefficient to -1 for a trigger pulse */
/* . . . i n s e r t C / a s m code . . . */
/* done, wait for next time around! */
#else /* Use DSPLIB FFT */
/* Multiply the input signal by the Hamming window. */
/* . . . i n s e r t C / a s m code . . . */
/* Compute FFT using DSPLIB function */
cfft((DATA *)bit_rev_data,N, SCALE);
/* Bit reverse using assembly function */
bit_rev();
/* Now, take absolute value squared of FFT */
/* . . . i n s e r t C / a s m code . . . */
/* Last, set the DC coefficient to -1 for a trigger pulse */
/* . . . i n s e r t C / a s m code . . . */
/* done, wait for next time around! */
#endif /* C_FFT */
receive_buffer_processed = 1; // flag receive buffer as processed
transmit_buffer_filled = 1; // flag output buffer as full
}
}
.ARMS_off ;enable assembler for ARMS=0
.CPL_on ;enable assembler for CPL=1
.mmregs ;enable mem mapped register names
.global _bit_rev_data
.global _fft_data
.global _window
.global _bit_rev
.copy "macro.asm"
.sect ".data"
N .set 1024
K_FFT_SIZE .set 1024
.align 4*N
_bit_rev_data .space 16*2*N ; Input to _bit_rev_fft
.align 4*N
_fft_data .space 16*2*N ; FFT output buffer
.sect ".text"
_bit_rev
ENTER_ASM
MOV #_bit_rev_data, AR3
MOV #_fft_data, AR7
MOV AR7, AR2
MOV #K_FFT_SIZE-1, BRC0
MOV #K_FFT_SIZE, T0
RPTB bit_rev_end-1
MOV dbl(*AR3), AC0
MOV AC0, dbl(*AR2+)
MAR *AR3+0B
bit_rev_end:
LEAVE_ASM
RET
; If you need any more assembly subroutines, make sure you name them
; _name, and include a ".global _name" directive at the top. Also,
; don't forget to use ENTER_ASM at the beginning, and LEAVE_ASM
; and RET at the end!
1 /*****************************************************************/
2 /* lab4fft.c */
3 /* Douglas L. Jones */
4 /* University of Illinois at Urbana-Champaign */
5 /* January 19, 1992 */
6 /* Changed for use w/ short integers and lookup table for ECE420 */
7 /* Matt Kleffner */
8 /* February 10, 2004 */
9 /* */
10 /* fft: in-place radix-2 DIT DFT of a complex input */
11 /* */
12 /* Permission to copy and use this program is granted */
13 /* as long as this header is included. */
14 /* */
15 /* WARNING: */
16 /* This file is intended for educational use only, since most */
17 /* manufacturers provide hand-tuned libraries which typically */
18 /* include the fastest fft routine for their DSP/processor */
19 /* architectures. High-quality, open-source fft routines */
20 /* written in C (and included in MATLAB) can be found at */
21 /* http://www.fftw.org */
22 /* */
23 /* #defines expected in lab4.h */
24 /* N: length of FFT: must be a power of two */
25 /* logN: N = 2**logN */
26 /* */
27 /* 16-bit-limited input/output (must be defined elsewhere) */
28 /* real: integer array of length N with real part of data */
29 /* imag: integer array of length N with imag part of data */
30 /* */
31 /* sinetables.h must */
32 /* 1) #define Nt to an equal or greater power of two than N */
33 /* 2) contain the following integer arrays with */
34 /* element magnitudes bounded by M = 2**15-1: */
35 /* costable: M*cos(-2*pi*n/Nt), n=0,1,...,Nt/2-1 */
36 /* sintable: M*sin(-2*pi*n/Nt), n=0,1,...,Nt/2-1 */
37 /* */
38 /*****************************************************************/
39
40 #include "lab4.h"
41 #include "sinetables.h"
42
43 extern int real[N];
44 extern int imag[N];
45
46 void fft(void)
47 {
48 int i,j,k,n1,n2,n3;
49 int c,s,a,t,Wr,Wi;
50
51 j = 0; /* bit-reverse */
52 n2 = N >> 1;
53 for (i=1; i < N - 1; i++)
54 {
55 n1 = n2;
56 while ( j >= n1 )
57 {
58 j = j - n1;
59 n1 = n1 >> 1;
60 }
61 j = j + n1;
62
63 if (i < j)
64 {
65 t = real[i];
66 real[i] = real[j];
67 real[j] = t;
68 t = imag[i];
69 imag[i] = imag[j];
70 imag[j] = t;
71 }
72 }
73
74 /* FFT */
75 n2 = 1; n3 = Nt;
76
77 for (i=0; i < logN; i++)
78 {
79 n1 = n2; /* n1 = 2**i */
80 n2 = n2 + n2; /* n2 = 2**(i+1) */
81 n3 = n3 >> 1; /* cos/sin arg of -6.283185307179586/n2 */
82 a = 0;
83
84 for (j=0; j < n1; j++)
85 {
86 c = costable[a];
87 s = sintable[a];
88 a = a + n3;
89
90 for (k=j; k < N; k=k+n2)
91 {
92 /* Code for standard 32-bit hardware, */
93 /* with real,imag limited to 16 bits */
94 /*
95 Wr = (c*real[k+n1] - s*imag[k+n1]) >> 15;
96 Wi = (s*real[k+n1] + c*imag[k+n1]) >> 15;
97 real[k+n1] = (real[k] - Wr) >> 1;
98 imag[k+n1] = (imag[k] - Wi) >> 1;
99 real[k] = (real[k] + Wr) >> 1;
100 imag[k] = (imag[k] + Wi) >> 1;
101 */
102 /* End standard 32-bit code */
103
104 /* Code for TI TMS320C54X series */
105
106 Wr = ((long int)(c*real[k+n1]) - (long int)(s*imag[k+n1])) >> 15;
107 Wi = ((long int)(s*real[k+n1]) + (long int)(c*imag[k+n1])) >> 15;
108 real[k+n1] = ((long int)real[k] - (long int)Wr) >> 1;
109 imag[k+n1] = ((long int)imag[k] - (long int)Wi) >> 1;
110 real[k] = ((long int)real[k] + (long int)Wr) >> 1;
111 imag[k] = ((long int)imag[k] + (long int)Wi) >> 1;
112
113 /* End code for TMS320C54X series */
114
115 /* Intrinsic code for TMS320C54X series */
116 /*
117 Wr = _ssub(_smpy(c, real[k+n1]), _smpy(s, imag[k+n1]));
118 Wi = _sadd(_smpy(s, real[k+n1]), _smpy(c, imag[k+n1]));
119 real[k+n1] = _sshl(_ssub(real[k], Wr),-1);
120 imag[k+n1] = _sshl(_ssub(imag[k], Wi),-1);
121 real[k] = _sshl(_sadd(real[k], Wr),-1);
122 imag[k] = _sshl(_sadd(imag[k], Wi),-1);
123 */
124 /* End intrinsic code for TMS320C54X series */
125 }
126 }
127 }
128 return;
129 }
1 #define N 1024 /* Length of output buffers */
2 #define L N /* Length of input data */
3 #define logL 10 /* log base 2 of L */
4 #define M 31 /* Compute 2*M+1 autocorrelation points */
5
6 /* #define M (L/2-1) */ /* Be sure to use ()'s in this case */
7 /* or algebraic substitution bugs */
8 /* can be introduced */
/* Compiler intrinsics for the TI compiler */
/* and the Very Portable Optimizer (VPO) port */
/* to TMS320C54X series DSPs */
/* */
/* Use compile option -DVPO when using VPO */
/* */
/* Copyright September 2005 by Matt Kleffner */
/* under the Creative Commons Attribution License */
/* Works with TMS320C55X series */
#ifndef INTRINSICS_H
#define INTRINSICS_H
#ifdef VPO
long int vpo_l_mul_ii(int w0, int w1);
/* fractional multiply without fractional mode (long result) */
#define _l_mul_fract_fb0_ii(w0,w1) \
(vpo_l_mul_ii(w0,w1) << 1)
/* fractional multiply with fractional mode already on (long result) */
#define _l_mul_fract_fb1_ii(w0,w1) \
(vpo_l_mul_ii(w0,w1))
/* fractional multiply without fractional mode (int result) */
#define _i_mul_fract_fb0_ii(w0,w1) \
(vpo_l_mul_ii(w0,w1) >> 15)
/* fractional multiply with fractional mode already on (int result) */
#define _i_mul_fract_fb1_ii(w0,w1) \
(vpo_l_mul_ii(w0,w1) >> 16)
#define _set_fract_bit() vpo_set_fract()
#define _reset_fract_bit() vpo_reset_fract()
#define _set_ovm_bit() vpo_set_ovm()
#define _reset_ovm_bit() vpo_reset_ovm()
#define _l_add_shiftl_li(w0,w1) (((int32)(w0))+(((int32)(int16)(w1))<<16) )
#define _l_sub_shiftl_li(w0,w1) (((int32)(w0))-(((int32)(int16)(w1))<<16) )
#else
/* fractional multiply without fractional mode (long result) */
#define _l_mul_fract_fb0_ii(w0,w1) \
(((long int)w0 * (long int)w1) << 1)
/* fractional multiply with fractional mode already on (long result) */
#define _l_mul_fract_fb1_ii(w0,w1) \
(((long int)w0 * (long int)w1))
/* fractional multiply without fractional mode (int result) */
#define _i_mul_fract_fb0_ii(w0,w1) \
(((long int)w0 * (long int)w1) >> 15)
/* fractional multiply with fractional mode already on (int result) */
#define _i_mul_fract_fb1_ii(w0,w1) \
(((long int)w0 * (long int)w1) >> 16)
#define _set_fract_bit() asm(" ssbx frct")
#define _reset_fract_bit() asm(" rsbx frct")
#define _set_ovm_bit() asm(" ssbx ovm")
#define _reset_ovm_bit() asm(" rsbx ovm")
#endif /* VPO */
#endif /* INTRINSICS_H */
// lab4b.c
// Uses PN generation, IIR filtering, and autocorrelation
// code by Matt Kleffner -9/2004
// Based on swi_process.c by Educational DSP
#include "dsk5510_dual3006cfg.h"
#include "dsk5510.h"
#include "swi_process.h"
#include "dsplib.h"
#include "lab4b.h" /* Define N here in header file */
/* function defined in pn.c */
void rand_fillbuffer(void);
/* IIR values and buffers (declared in c_fft_given_iirc.asm) */
#define IIR_order 4
extern int scale;
extern int coef[IIR_order];
extern int state[IIR_order];
/* Pointer to state buffer location */
int iirptr;
extern unsigned int *iseed; /* seed for rand_fillbuffer() and randbit() */
/* function defined in iirfilt.c */
void iirfilt(void);
/* function defined in autocorr.c */
void autocorr(void);
/* Function defined by c_fft_given_iirc.asm */
//void bit_rev_fft(void);
/* FFT data buffers (declared in c_fft_given_iirc.asm) */
extern int bit_rev_data[N*2]; /* Data output for bit-reverse function */
extern int fft_data[N*2]; /* In-place FFT input & Output array */
/* Our input/output buffers */
int autocorr_in[N];
int autocorr_out[N];
// all data processing should be done in SWI_ProcessBuffer
void SWI_ProcessBuffer()
{
static unsigned int mbox_value = 0;
short *psrc, *pdest;
unsigned int i;
mbox_value |= SWI_getmbox();
// buffers are only processed when both transmit and receive are ready
if((mbox_value & DMA_RECEIVE_DONE) && (mbox_value & DMA_TRANSMIT_DONE)) {
mbox_value = 0;
// get buffer pointers
psrc = receive_buffer[receive_buffer_to_process_index];
pdest = transmit_buffer[transmit_buffer_to_fill_index];
/* First, transfer inputs and outputs */
for (i = 0; i < N; i++)
{
pdest[4*i] = autocorr_in[i];
pdest[4*i+1] = bit_rev_data[i*2] << 8;
/* Some statements useful in debugging */
/* pdest[4*i] = psrc[4*i+2]; */
/* Be sure to comment out PN-sequence generation */
/* when using the next two lines */
//autocorr_in[i] = psrc[4i+2];
}
/* Last, set the DC coefficient to -1 for a trigger pulse */
pdest[0] = -32768;
/* Generate PN input */
rand_fillbuffer();
/* Filter input */
iirfilt();
/* Calculate autocorrelation */
autocorr();
/* Transfer autocorr output to FFT input buffer */
for (i = 0; i < N; i++) {
fft_data[i*2] = autocorr_out[i];
fft_data[i*2+1] = 0;
}
// Bit-reverse and compute FFT
cfft((DATA *)fft_data,N, SCALE);
cbrev((DATA *)fft_data,(DATA *)bit_rev_data,N);
/* Done, wait for next time around! */
receive_buffer_processed = 1; // flag receive buffer as processed
transmit_buffer_filled = 1; // flag output buffer as full
}
}
// lab4bmain.c
// based on main.c by Educational DSP
// Initializes autocorr_out, state, and iseed.
// Can be modified for optimization.
#include "dsk5510_dual3006cfg.h"
#include "dsk5510.h"
#include <csl.h>
#include "lab4b.h"
#define IIR_order 4
extern int iirptr;
unsigned int *iseed;
extern int autocorr_out[N];
extern int state[IIR_order];
void init_DMA();
void main()
{
int i;
DSK5510_init(); // init BSL
DSK5510_rset(DSK5510_MISC, 0x03); // route McBSP0/1 to J3
// Initialize autocorr_out to zero since some values will remain zero
for (i = 0; i < N; ++i)
{
autocorr_out[i] = 0;
}
for ( i = 0; i < IIR_order; ++i)
state[i] = 0;
// Start McBSP0 I2S slave
MCBSP_start(hMcbsp0, MCBSP_XMIT_START | MCBSP_RCV_START |
MCBSP_SRGR_START | MCBSP_SRGR_FRAMESYNC, 220);
// Start McBSP1 I2S master
MCBSP_start(hMcbsp1, MCBSP_XMIT_START | MCBSP_RCV_START |
MCBSP_SRGR_START | MCBSP_SRGR_FRAMESYNC, 220);
init_DMA(); // configure DMA and interrupts
*iseed = 1;
iirptr = 0;
return; // let DSP/BIOS scheduler take over
}
/* ECE420, Lab 4, Reference PN Generator Implementation (Non-Optimized) */
/* Matt Kleffner 08/04 */
/* Original by Michael Frutiger 02/24/04 */
/* Use governed by the Creative Commons Attribution License */
#include "lab4b.h"
extern unsigned int *iseed;
extern int autocorr_in[N];
/* Returns as an integer a random bit, based on the 15 lowest significant
bits in iseed (which is modified for the next call). */
int randbit()
{
int newbit;
/* XOR bits 15, 1 and 0 of iseed */
newbit = (*iseed >> 15) & 1 ^ (*iseed >> 1) & 1 ^ *iseed & 1;
/* Leftshift the seed and put the result of the XOR's in bit 1. */
*iseed=(*iseed << 1) | newbit;
return(newbit);
}
void rand_fillbuffer(void)
{
int i;
for (i = 0; i < N; ++i)
{
if (randbit()) autocorr_in[i] = 32767;
else autocorr_in[i] = -32767;
}
}
/* Simple, unoptimized IIR filter (feedback only) */
/* for TMS320C54X series DSPs */
/* Copyright September 2005 by Matt Kleffner */
/* under the Creative Commons Attribution License */
/* Works for TMS320C55X series as well */
#include "lab4b.h"
#include "intrinsics.h"
/* IIR values and buffers (declared in c_fft_given_iirc.asm) */
#define IIR_order 4
extern int scale;
extern int coef[IIR_order];
extern int state[IIR_order];
/* Arrays declared in main routine */
extern int autocorr_in[N];
extern int autocorr_out[N];
/* Pointer to state buffer location */
extern int iirptr;
void iirfilt()
{
int i, j;
_set_fract_bit();
/* Filter PN input */
for (i = 0; i < N; ++i)
{
int sum = 0;
/* Calculate and sum all feedback terms except the "oldest" one */
for (j = 0; j < (IIR_order-1); ++j)
{
sum += _i_mul_fract_fb1_ii(coef[j],state[iirptr]);
/* Avoid usage of "modulo" routine */
iirptr++;
if (iirptr == IIR_order) iirptr = 0;
}
/* Calculate and sum oldest feedback term without incrementing iirptr */
sum += _i_mul_fract_fb1_ii(coef[IIR_order-1],state[iirptr]);
/* Calculate direct input contribution */
sum += _i_mul_fract_fb1_ii(scale,autocorr_in[i]);
autocorr_in[i] = sum;
state[iirptr] = autocorr_in[i];
}
_reset_fract_bit();
}
/***********************************************************/
/* autocorr.c */
/* Copyright August 2004 by Matt Kleffner */
/* under the Creative Commons Attribution License */
/* */
/* Simple, unoptimized autocorrelation function */
/* for ECE 420 (TMS320C54X series) */
/* */
/* #defines expected in lab4b.h */
/* L: length of data in autocorr_in buffer */
/* N: length of data in autocorr_out buffer */
/* logL: log base 2 of L (used for scaling output) */
/* M: Largest positive lag of autocorrelation desired */
/* (must be < L and < N/2) */
/* */
/* 16-bit-limited input/output (must be defined elsewhere) */
/* autocorr_in: buffer for input data (L pts) */
/* autocorr_out: buffer for output data (N pts) */
/* N must be >= 2*M+1 */
/* assumed to be full of zeros at start */
/* output in zero-phase form */
/***********************************************************/
/* Works for TMS320C55X series */
#include "lab4b.h"
#include "intrinsics.h"
extern int autocorr_in[L];
extern int autocorr_out[N];
void autocorr(void)
{
int i,j,temp;
_set_fract_bit();
for(i=0;i<=M;++i)
{
long int sum=0;
for(j=0;j<(L-i);++j)
{
temp = _i_mul_fract_fb1_ii(autocorr_in[j],autocorr_in[j+i]);
sum += temp;
}
autocorr_out[i]=(int)(sum >> logL);
}
_reset_fract_bit();
/* Copy values for negative indeces at end of buffer */
for(i=1,j=N-1;i<=M;++i,--j)
{ autocorr_out[j]=autocorr_out[i]; }
}
; c_fft_given_iirc.asm
; Designed for use in lab4b for ECE420
.ARMS_off ;enable assembler for ARMS=0
.CPL_on ;enable assembler for CPL=1
.mmregs ;enable mem mapped register names
.global _bit_rev_data
.global _fft_data
.global _state
.global _scale
.global _coef
.copy "macro.asm"
.sect ".data"
N .set 1024
.align 4*N
_bit_rev_data .space 16*2*N
.align 4*N
_fft_data .space 16*2*N
; IIR filter
.align 4
_coef
.word 0
.word 0
.word 0
.word -13421
_state
.space 16*4
_scale
.word 19345
.sect ".text"