Summary: This module describes a processor exercise in which students implement a spectrum analyzer using mixed C and assembly code. Students are to acquire a block of 1024 samples, apply a Hamming window, compute a length-1024 Discrete Fourier Transform using provided Fast Fourier Transform code, and display the magnitude-squared spectrum on an oscilloscope.
As this is your first experience with the C environment, you will have the option to add most of the required code in C or assembly. A C skeleton will provide access to input samples, output samples, and interrupt handling code. You will add code to transfer the inputs and outputs (in blocks at a time), apply a hamming window, compute the magnitude-squared spectrum, and include a trigger pulse. After the hamming window is created, either an assembly or C module that bit reverses the input and performs an FFT calculation is called.
As your spectrum analyzer works on a block of samples at a
time, you will need to use interrupts to pause your processing
while samples are transferred from/to the CODEC (A/D and D/A)
buffer. Fortunately, the interrupt handling routines have
been written for you in a C shell program available at
v:\ece420\54x\dspclib\lab4main.c and the core
code.
Interrupts are an essential part of the operation of any microprocessor. They are particularly important in embedded applications where DSPs are often used. Hardware interrupts provide a way for interacting with external devices while the processor executes code. For example, in a key entry system, a key press would generate a hardware interrupt. The system code would then jump to a specified location in program memory where a routine could process the key input. Interrupts provide an alternative to polling. Instead of checking for key presses at a predetermined rate (requires a clock), the system could be busy executing other code. On the TI-C54x DSP, interrupts provide a convenient way to transfer blocks of data to/from the CODEC in a timely fashion.
The lab4main.c code and the core code are intended
to make your interaction with the hardware much simpler. As
there was a core file for working in the assembly environment
(Labs 0-3), there is a core file for the C environment
(V:\ece420\54x\dspclib\core.asm) which handles the interrupts
from the CODEC (A/D and D/A) and the serial port. Here, we
will describe the important aspects of the core code necessary
to complete the assignment.
At the heart of the hardware interaction is the auto-buffering
serial port. In the auto-buffering serial mode, the TI-C54x
processor is able to do processing
uninterrupted while samples are
transferred to/from a buffer of length
BlockLen, then no additional interrupt handling
routines would be necessary. Samples could be collected in
a 1024-length buffer and a 1024-point FFT could be computed
uninterrupted while the auto-buffering buffer fills.
Unfortunately, the DSP is not fast enough to accomplish this
task.
We now provide an explanation of the shell C program
lab4main.c listed in Appendix A. The
lab4main.c file contains the function
interrupt void irq and a main program. The
main program is an infinite loop over blocks of
BlockLen samples. Inside the infinite loop,
you will insert code to do the operations which
follow. Although each of these operations may be performed
in C or assembly, we suggest you follow the guidelines
suggested.
The function WaitAudio is an assembly function in the core code
which handles the CODEC interrupts. An interrupt from the CODEC
occurs every BlockLen samples. The
SetAudioInterrupt(irq) call in the main program
tells the core code to jump to the irq function
when an interrupt occurs. In the irq function,
BlockLen samples of the A/D input in
Rcvptr (channel 1) are written to a length
inputs buffer, and
BlockLen of the output samples in the
outputs buffer are written to the D/A output
via Xmitptr on channel 2. In C, pointers may be
used as array names so that Xmitptr[0] is the
first word pointed to by Xmitptr. As in the
assembly core, the input samples are not in
consecutive order. The right and left inputs are offset from
Rcvptr respectively by
Xmitptr. On channel 1 of the
output, the input is echoed out. You are to fill
the buffer outputs with the windowed
magnitude-squared FFT values by performing the operations
listed above.
In the main code, the while(!input_full);
loop waits for
inputs
buffer. Next, the
BlockLen sample times;
otherwise the first BlockLen of samples of
output would not be available on time. Once this loop is
finished, the lengthy processing of the FFT can continue.
During this processing, the DSP is interrupted every
BlockLen samples to transfer samples. Once
this processing is over, the infinite loop returns to
while(!input_full); to wait for
The flow diagram in Figure 1 summarizes the operation of the interrupt handling routine
|
As the list of operations indicates, bit-reversal and FFT
computation are to be done in both C and assembly. For the
assembly version, make sure that the line defining
C_FFT is commented in lab4main.c.
We are providing you with a shell assembly file, available at
v:\ece420\54x\dspclib\c_fft_given.asm and shown
in Appendix B, containing many
useful declarations and some code. The code for performing
bit-reversal and other declarations needed for the FFT
routine are also provided in this section.
However, we would like you to enter this code
manually, as you will be expected to understand its
operation.
The assembly file c_fft_given.asm
contains two main parts, the data section
starting with .sect ".data" and the
program section starting with
.sect ".text". Every function and
variable accessed in C must be preceded by a single underscore
_ in assembly and a
.global _name must be placed in the
assembly file for linking. In this example,
bit_rev_fft is an assembly function called from
the C program with a label _bit_rev_fft in the
text portion of the assembly file and a
.global _bit_rev_fft declaration. In each
assembly function, the macro ENTER_ASM is
called upon entering and LEAVE_ASM is
called upon exiting. These macros are defined in
v:\ece420\54x\dspclib\core.inc. The
ENTER_ASM macro saves the status registers
and AR1, AR6, and
AR7 when entering a function as required
by the register use conventions. The ENTER_ASM
macro also sets the status registers to the assembly conventions we
have been using (i.e, FRCT=1 for fractional
arithmetic and CPL=0 for
DP referenced addressing). The
LEAVE_ASM macro just restores the saved
registers.
The parameter passing convention between assembly and C is
simple for single input, single output assembly functions. From
a C program, the input to an assembly program is in the low part
of accumulator A with the output returned
in the same place. When more
than one parameter is passed to an assembly function, the
parameters are passed on the stack (see the core file
description for more information). We suggest that you avoid
passing or returning more than one parameter. Instead, use global
memory addresses to pass in or return more than one parameter.
Another alternative is to pass a pointer to the start of a buffer
intended for passing and returning parameters.
When entering and leaving an assembly function, the
ENTER_ASM and LEAVE_ASM
macros ensure that certain registers are saved and restored.
Since the C program may use any and all registers, the state of
a register cannot be expected to remain the same between calls to
assembly function(s). Therefore, any information that
needs to be preserved across calls to an assembly function must be
saved to memory!
Now, we explain how to use the FFT routine provided by TI
for the C54x. The FFT routine fft.asm located
in v:\ece420\54x\dsplib\ computes an in-place,
complex FFT. The length of the FFT is defined as a label
K_FFT_SIZE and the algorithm assumes that the
input starts at data memory location _fft_data.
To have your code assemble for an
N .set 1024
K_FFT_SIZE .set N ; size of FFT
K_LOGN .set 10 ; number of stages (log_2(N))
In addition to defining these constants, you will have to
include twiddle-factor tables for the FFT. These tables
(twiddle1 and twiddle2) are available in the shared
directory v:\ece420\54x\dsplib\. Note that the
tables are each
.sect ".data"
.align 1024
sine .copy "v:\ece420\54x\dsplib\twiddle1"
.align 1024
cosine .copy "v:\ece420\54x\dsplib\twiddle2"
The FFT provided requires that the input be in bit-reversed
order, with alternating real and imaginary
components. Bit-reversed addressing is a convenient way to
order input
| Input Order | Binary Representation | Bit-Reversed Representation | Output Order |
|---|---|---|---|
| 0 | 000 | 000 | 0 |
| 1 | 001 | 100 | 4 |
| 2 | 010 | 010 | 2 |
| 3 | 011 | 110 | 6 |
| 4 | 100 | 001 | 1 |
| 5 | 101 | 101 | 5 |
| 6 | 110 | 011 | 3 |
| 7 | 111 | 111 | 7 |
The following routine performs the bit-reversed reordering
of the input data. The routine assumes that the input is
stored in data memory starting at the location labeled
_bit_rev_data, which must be aligned to the
least power of two greater than the input buffer length, and
consists of alternating real and imaginary parts. Because
our input data is going to be purely real in this lab, you
will have to make sure that you set the imaginary parts to
zero by zeroing out every other memory location.
1 bit_rev:
2 STM #_bit_rev_data,AR3 ; AR3 -> original input
3 STM #_fft_data,AR7 ; AR7 -> data processing buffer
4 MVMM AR7,AR2 ; AR2 -> bit-reversed data
5 STM #K_FFT_SIZE-1,BRC
6 RPTBD bit_rev_end-1
7 STM #K_FFT_SIZE,AR0 ; AR0 = 1/2 size of circ buffer
8 MVDD *AR3+,*AR2+
9 MVDD *AR3-,*AR2+
10 MAR *AR3+0B
11 bit_rev_end:
12 NOP
13 RET
As mentioned, in the above code _bit_rev_data
is a label indicating the start of the input data and
_fft_data is a label indicating the start of a
circular buffer where the bit-reversed data will be
written. Note that although AR7 is not used by
the bit-reversed routine directly, it is used extensively in
the FFT routine to keep track of the start of the FFT data
space.
In general, to have a pointer index memory in bit-reversed
order, the AR0 register needs to be set to
one-half the length of the circular buffer; a statement such
as ARx+0B is used to move the ARx
pointer to the next location. For more information regarding
the bit-reversed addressing mode, refer to page
5-18 in the TI-54x
CPU and Peripherals manual. Is it possible to
bit-reverse a buffer in place? For a diagram of the
ordering of the data expected by the FFT routine, see
Figure 4-10 in the TI-54x
Applications Guide. Note that the FFT code uses all
the pointers available and does not restore the pointers to
their original values.
A bit-reversing and FFT routine have also been provided in
lab4fft.c, listed in
Appendix C. Again,
make sure you understand how the bit reversal is taking
place.
In lab4main.c, the line defining C_FFT
must not be commented for use of the C FFT routine. The sine
tables (twiddle factors) are located in
sinetables.h.
This fft requires its inputs in two buffers, the real buffer
real and the imaginary buffer imag,
and the output is placed in the same buffers.
The length of the FFT, N, and logN are
defined in lab4.h, which is also listed in
Appendix C.
When experimenting with the C
FFT make sure you modify these length values instead of the ones
in the assembly code and lab4main.c!
As mentioned, you will be using the FFT to compute the
spectrum of a windowed input. For your implementation you
will need to create a 1024-point Hamming window. First,
create a Hamming window in matlab using the function
hamming. For the assembly FFT, use
save_coef to save the window to a file
that can then be included in your code with the
.copy directive. For the C FFT, use the matlab
function
write_intvector_headerfile
with name set to 'window' and
elemperline set to 8 to create
the header file that is included in lab4main.c.
Once the DFT has been computed, you must calculate the squared magnitude of the spectrum for display.
squr and squra useful in
implementing Equation 1.
Because the squared magnitude is always nonnegative, you can
replace one of the magnitude values with a -1.0 as a trigger
pulse for display on the oscilloscope. This is easily
performed by replacing the DC term (
If you are planning on writing some of the code in C, then
you may be forced to use intrinsics. Intrinsic instructions
provide a way to use assembly instructions directly in C.
An example of an intrinsic instruction is
bit_rev_data[0]=_smpyr(bit_rev_data[0],window[0])
which performs the assembly signed multiply round
instruction. You may also find the _lsmpy
instruction useful. For more information on intrinsics, see
page 6-22 of the TI-C54x
Optimizing C/C++ Compiler User's Guide.
The following lines of code were borrowed from the C FFT to serve as an example of arithmetic operations in C. Save this code in a file called mathex.c and compile this file. Look at the resulting assembly file and investigate the differences between each block. Be sure to reference the compiler user's guide to find out what the state of the FRCT and OVM bits are. Does each block work properly for all possible values?
void main(void)
{
int s1, s2;
int t1, t2;
int i1, i2;
int n1 = 16383, n2 = 16382, n3 = 16381, n4 = 16380;
/* Code for standard 32-bit hardware, */
/* with x,y limited to 16 bits */
s1 = (n1*n2 + n3*n4) >> 15;
s2 = (n1 + n2) >> 1;
/* Code for TI TMSC5000 series */
t1 = ((long int)(n1*n2) + (long int)(n3*n4)) >> 15;
t2 = ((long int)n1 + (long int)n2) >> 1;
/* Intrinsic code for TMS320C54X series */
i1 = _sadd(_smpy(n1,n2), _smpy(n3,n4));
i2 = _sshl(_sadd(n1, n2),-1);
}
A working program can be produced by compiling the C code and
linking assembly modules and the core module. The compiler
translates C code to a relocatable assembly form. The linker
assigns physical addresses on the DSP to the relocatable data
and code segments, resolves .global
references and links runtime libraries.
The procedure for compiling C code and linking assembly modules
has been automated for you in the batch file
v:\ece420\54x\dsptools\c_asm.bat. The name of the
first file becomes the name of the executable. Once you have
completed lab4main.c and c_fft_given.asm,
type c_asm lab4main.c c_fft_given.asm to produce a
lab4main.out file to be loaded onto the DSP. For the
C FFT type c_asm lab4main.c lab4fft.c to produce
lab4main.out.
Load the output file onto the DSP as usual and confirm that valid
FFTs are calculated. Once valid output is obtained, measure how
many clock cycles it takes to compute both the assembly and C FFT.
From your prelab experiments, you should be able to describe
the effect of windowing and zero-padding on FFT spectral
analysis. In your DSP system, experiment with different
inputs, changing
inputs with
1 /* v:/ece420/54x/dspclib/lab4main.c */
2 /* dgs - 9/14/2001 */
3 /* mdk - 2/10/2004 C FFT update */
4
5 #include "v:/ece420/54x/dspclib/core.h"
6
7 /* comment the next line to use assembly fft */
8 #define C_FFT
9
10 #ifdef C_FFT /* Use C FFT */
11
12 #include "window.h"
13 #include "lab4.h" /* Number of C FFT points defined here */
14
15 /* function defined in lab4fft.c */
16 void fft(void);
17
18 /* FFT data buffers */
19 int real[N]; /* Real part of data */
20 int imag[N]; /* Imaginary part of data */
21
22 #else /* Use assembly FFT */
23
24 #define N 1024 /* Number of assembly FFT points */
25
26 /* Function defined by c_fft_given.asm */
27 void bit_rev_fft(void);
28
29 /* FFT data buffers (declared in c_fft_given.asm) */
30 extern int bit_rev_data[N*2]; /* Data input for bit-reverse function */
31 extern int fft_data[N*2]; /* In-place FFT & Output array */
32 extern int window[N]; /* The Hamming window */
33
34 #endif /* C_FFT */
35
36
37 /* Our input/output buffers */
38 int inputs[N];
39 int outputs[N];
40
41 volatile int input_full = 0; /* volatile means interrupt changes it */
42 int count = 0;
43
44
45 interrupt void irq(void)
46 {
47 int *Xmitptr,*Rcvptr; /* pointers to Xmit & Rcv Bufs */
48 int i;
49
50 static int in_irq = 0; /* Flag to prevent reentrance */
51
52 /* Make sure we're not in the interrupt (should never happen) */
53 if( in_irq )
54 return;
55
56 /* Mark we're processing, and enable interrupts */
57 in_irq = 1;
58 enable_irq();
59
60 /* The following waitaudio call is guaranteed not to
61 actually wait; it will simply return the pointers. */
62 WaitAudio(&Rcvptr,&Xmitptr);
63
64 /* input_full should never be true... */
65 if( !input_full )
66 {
67 for (i=0; i<BlockLen; i++)
68 {
69 /* Save input, and echo to channel 1 */
70 inputs[count] = Xmitptr[6*i] = Rcvptr[4*i];
71
72 /* Send FFT output to channel 2 */
73 Xmitptr[6*i+1] = outputs[count];
74
75 count++;
76 }
77 }
78
79 /* Have we collected enough data yet? */
80 if( count >= N )
81 input_full = 1;
82
83 /* We're not in the interrupt anymore... */
84 disable_irq();
85 in_irq = 0;
86 }
87
88
89 main()
90 {
91 /* Initialize IRQ stuff */
92 count = 0;
93 input_full = 0;
94 SetAudioInterrupt(irq); /* Set up interrupts */
95
96 while (1)
97 {
98 while( !input_full ); /* Wait for a data buffer to collect */
99
100 /* From here until we clear input_full can only take *
101 * BlockLen sample times, so don't do too much here. */
102
103 /* First, transfer inputs and outputs */
104
105 #ifdef C_FFT /* Use C FFT */
106 /* I n s e r t c o d e t o f i l l */
107 /* C F F T b u f f e r s */
108
109 #else /* Use assembly FFT */
110 /* I n s e r t c o d e t o f i l l */
111 /* a s s e m b l y F F T b u f f e r s */
112
113 #endif /* C_FFT */
114
115 /* Done with that... ready for new data collection */
116 count = 0; /* Need to reset the count */
117 input_full = 0; /* Mark we're ready to collect more data */
118
119 /**********************************************************/
120 /* Now that we've gotten the data moved, we can do the */
121 /* more lengthy processing. */
122
123 #ifdef C_FFT /* Use C FFT */
124
125 /* Multiply the input signal by the Hamming window. */
126 /* . . . i n s e r t C / a s m code . . . */
127
128 /* Bit-reverse and compute FFT in C */
129 fft();
130
131 /* Now, take absolute value squared of FFT */
132 /* . . . i n s e r t C / a s m code . . . */
133
134 /* Last, set the DC coefficient to -1 for a trigger pulse */
135 /* . . . i n s e r t C / a s m code . . . */
136
137 /* done, wait for next time around! */
138
139
140 #else /* Use assembly FFT */
141
142 /* Multiply the input signal by the Hamming window. */
143 /* . . . i n s e r t C / a s m code . . . */
144
145 /* Bit-reverse and compute FFT in assembly */
146 bit_rev_fft();
147
148 /* Now, take absolute value squared of FFT */
149 /* . . . i n s e r t C / a s m code . . . */
150
151 /* Last, set the DC coefficient to -1 for a trigger pulse */
152 /* . . . i n s e r t C / a s m code . . . */
153
154 /* done, wait for next time around! */
155
156
157 #endif /* C_FFT */
158
159 }
160 }
1 ; v:\ece420\54x\dspclib\c_fft_given.asm
2 ; dgs - 9/14/2001
3 .copy "v:\ece420\54x\dspclib\core.inc"
4
5 .global _bit_rev_data
6 .global _fft_data
7 .global _window
8
9 .global _bit_rev_fft
10
11 .sect ".data"
12
13 .align 4*N
14 _bit_rev_data .space 16*2*N ; Input to _bit_rev_fft
15
16 .align 4*N
17 _fft_data .space 16*2*N ; FFT output buffer
18
19
20 ; Copy in the Hamming window
21 _window ; The Hamming window
22 .copy "window.asm"
23
24 .sect ".text"
25
26 _bit_rev_fft
27 ENTER_ASM
28
29 call bit_rev ; Do the bit-reversal.
30
31 call fft ; Do the FFT
32
33 LEAVE_ASM
34 RET
35
36 bit_rev:
37 STM #_bit_rev_data,AR3 ; AR3 -> original input
38 STM #_fft_data,AR7 ; AR7 -> data processing buffer
39 MVMM AR7,AR2 ; AR2 -> bit-reversed data
40 STM #K_FFT_SIZE-1,BRC
41 RPTBD bit_rev_end-1
42 STM #K_FFT_SIZE,AR0 ; AR0 = 1/2 size of circ buffer
43 MVDD *AR3+,*AR2+
44 MVDD *AR3-,*AR2+
45 MAR *AR3+0B
46 bit_rev_end:
47 NOP
48 RET
49
50 ; Copy the actual FFT subroutine.
51 fft_data .set _fft_data ; FFT code needs this.
52 .copy "v:\ece420\54x\dsplib\fft.asm"
53
54
55 ; If you need any more assembly subroutines, make sure you name them
56 ; _name, and include a ".global _name" directive at the top. Also,
57 ; don't forget to use ENTER_ASM at the beginning, and LEAVE_ASM
58 ; and RET at the end!
1 #define N 1024 /* Number of FFT points */
2 #define logN 10
1 /*****************************************************************/
2 /* lab4fft.c */
3 /* Douglas L. Jones */
4 /* University of Illinois at Urbana-Champaign */
5 /* January 19, 1992 */
6 /* Changed for use w/ short integers and lookup table for ECE420 */
7 /* Matt Kleffner */
8 /* February 10, 2004 */
9 /* */
10 /* fft: in-place radix-2 DIT DFT of a complex input */
11 /* */
12 /* Permission to copy and use this program is granted */
13 /* as long as this header is included. */
14 /* */
15 /* WARNING: */
16 /* This file is intended for educational use only, since most */
17 /* manufacturers provide hand-tuned libraries which typically */
18 /* include the fastest fft routine for their DSP/processor */
19 /* architectures. High-quality, open-source fft routines */
20 /* written in C (and included in MATLAB) can be found at */
21 /* http://www.fftw.org */
22 /* */
23 /* #defines expected in lab4.h */
24 /* N: length of FFT: must be a power of two */
25 /* logN: N = 2**logN */
26 /* */
27 /* 16-bit-limited input/output (must be defined elsewhere) */
28 /* real: integer array of length N with real part of data */
29 /* imag: integer array of length N with imag part of data */
30 /* */
31 /* sinetables.h must */
32 /* 1) #define Nt to an equal or greater power of two than N */
33 /* 2) contain the following integer arrays with */
34 /* element magnitudes bounded by M = 2**15-1: */
35 /* costable: M*cos(-2*pi*n/Nt), n=0,1,...,Nt/2-1 */
36 /* sintable: M*sin(-2*pi*n/Nt), n=0,1,...,Nt/2-1 */
37 /* */
38 /*****************************************************************/
39
40 #include "lab4.h"
41 #include "sinetables.h"
42
43 extern int real[N];
44 extern int imag[N];
45
46 void fft(void)
47 {
48 int i,j,k,n1,n2,n3;
49 int c,s,a,t,Wr,Wi;
50
51 j = 0; /* bit-reverse */
52 n2 = N >> 1;
53 for (i=1; i < N - 1; i++)
54 {
55 n1 = n2;
56 while ( j >= n1 )
57 {
58 j = j - n1;
59 n1 = n1 >> 1;
60 }
61 j = j + n1;
62
63 if (i < j)
64 {
65 t = real[i];
66 real[i] = real[j];
67 real[j] = t;
68 t = imag[i];
69 imag[i] = imag[j];
70 imag[j] = t;
71 }
72 }
73
74 /* FFT */
75 n2 = 1; n3 = Nt;
76
77 for (i=0; i < logN; i++)
78 {
79 n1 = n2; /* n1 = 2**i */
80 n2 = n2 + n2; /* n2 = 2**(i+1) */
81 n3 = n3 >> 1; /* cos/sin arg of -6.283185307179586/n2 */
82 a = 0;
83
84 for (j=0; j < n1; j++)
85 {
86 c = costable[a];
87 s = sintable[a];
88 a = a + n3;
89
90 for (k=j; k < N; k=k+n2)
91 {
92 /* Code for standard 32-bit hardware, */
93 /* with real,imag limited to 16 bits */
94 /*
95 Wr = (c*real[k+n1] - s*imag[k+n1]) >> 15;
96 Wi = (s*real[k+n1] + c*imag[k+n1]) >> 15;
97 real[k+n1] = (real[k] - Wr) >> 1;
98 imag[k+n1] = (imag[k] - Wi) >> 1;
99 real[k] = (real[k] + Wr) >> 1;
100 imag[k] = (imag[k] + Wi) >> 1;
101 */
102 /* End standard 32-bit code */
103
104 /* Code for TI TMS320C54X series */
105
106 Wr = ((long int)(c*real[k+n1]) - (long int)(s*imag[k+n1])) >> 15;
107 Wi = ((long int)(s*real[k+n1]) + (long int)(c*imag[k+n1])) >> 15;
108 real[k+n1] = ((long int)real[k] - (long int)Wr) >> 1;
109 imag[k+n1] = ((long int)imag[k] - (long int)Wi) >> 1;
110 real[k] = ((long int)real[k] + (long int)Wr) >> 1;
111 imag[k] = ((long int)imag[k] + (long int)Wi) >> 1;
112
113 /* End code for TMS320C54X series */
114
115 /* Intrinsic code for TMS320C54X series */
116 /*
117 Wr = _ssub(_smpy(c, real[k+n1]), _smpy(s, imag[k+n1]));
118 Wi = _sadd(_smpy(s, real[k+n1]), _smpy(c, imag[k+n1]));
119 real[k+n1] = _sshl(_ssub(real[k], Wr),-1);
120 imag[k+n1] = _sshl(_ssub(imag[k], Wi),-1);
121 real[k] = _sshl(_sadd(real[k], Wr),-1);
122 imag[k] = _sshl(_sadd(imag[k], Wi),-1);
123 */
124 /* End intrinsic code for TMS320C54X series */
125 }
126 }
127 }
128 return;
129 }