Summary: This is a processor exercise in which students implement a spectrum analyzer using mixed C and assembly code. Students are to acquire a block of 1024 samples, apply a Hamming window, compute a length-1024 Discrete Fourier Transform using provided Fast Fourier Transform code, and display the magnitude-squared spectrum on an oscilloscope. Students will also compile and explore a reference implementation of an autocorrelation-based power spectral density (PSD) estimator. This implementation estimates the PSD of an IIR-filtered pseudo-noise generator.
v:\ece420\55x\lab4\main.c. For this lab, you will be working with the code available at v:\ece420\55x\lab4.
main.c,dma.c, and lab4.c code are intended
to make your interaction with the hardware much simpler. These files handle the the buffering of data using interrupts from the CODEC (A/D and D/A) and the USB port. Here, we
will describe the important aspects of the code necessary
to complete the assignment.
main.c. The
main.c file contains the main function that sets up the McBSP (Multi Channel Buffered Serial Port), DMA (Direct Memory Access), and interrupts. Then it returns and allows the DSP/BIOS scheduler to take over.
dma.c. The buffers are defined in this file, as well as the hardware interrupts. The hardware interrupts are initialized by calling the function init_DMAin the main function. When the hardware interrupts are triggered, they call the HWI_DMA0_Transmit() and HWI_DMA1_Receive() functions. At the end of these two functions, the software interrupt SWI_Process() is posted with different variables. Posting SWI_Process() will call the SWI_ProcessBuffer() function which will require modification.SWI_ProcessBuffer() function is defined in lab4.c. It is called every time the software interrupt SWI_Process is posted, which is set to happen every N = 1024 samples. As given, the function will simply copy the inputs to the outputs. (After commenting out some lines and uncommenting others.) Follow the example and comments to modify the code to perform the necessary operations.pdest[0] is the
first word pointed to by pdest. The input samples are not in
consecutive order and must be accessed with offsets. The four channels of input are offset from
psrc respectively by
pdest. On channel 1 of the
output, the input is echoed out. You are to fill
channel 2 with the windowed
magnitude-squared FFT values by performing the operations
listed above. For the first step, take a look at the way we make the DSPLIB cfft call to find out where to transfer the inputs to. (You may change the function call cfft to pass in different values if you like. Just remember that bit_rev() expects its input in a specific location.) Likewise, take a look at the C FFT code (declared in lab4fft.c to find out where to copy the inputs to.
C_FFT is commented out in lab4.c.
We are providing you with a shell assembly file, available at
v:\ece420\55x\lab4\c_fft_given.asm and shown
in Appendix B, containing many
useful declarations and some code. The code for performing
bit-reversal and other declarations needed for the FFT
routine are also provided in this section.
However, we would like you to enter this code
manually, as you will be expected to understand its
operation.
c_fft_given.asm
contains two main parts, the data section
starting with .sect ".data" and the
program section starting with
.sect ".text". Every function and
variable accessed in C must be preceded by a single underscore
_ in assembly and a
.global _name must be placed in the
assembly file for linking. In this example,
bit_rev_fft is an assembly function called from
the C program with a label _bit_rev_fft in the
text portion of the assembly file and a
.global _bit_rev_fft declaration. In each
assembly function, the macro ENTER_ASM is
called upon entering and LEAVE_ASM is
called upon exiting. These macros are defined in
v:\ece420\55x\macro.asm. The
ENTER_ASM macro saves the status registers
and AR1, AR6, and
AR7 when entering a function as required
by the register use conventions. The ENTER_ASM
macro also sets the status registers to the assembly conventions we
have been using (i.e, FRCT=1 for fractional
arithmetic and CPL=0 for
DP referenced addressing). The
LEAVE_ASM macro just restores the saved
registers.
ENTER_ASM and LEAVE_ASM
macros ensure that certain registers are saved and restored.
Since the C program may use any and all registers, the state of
a register cannot be expected to remain the same between calls to
assembly function(s). Therefore, any information that
needs to be preserved across calls to an assembly function must be
saved to memory!
CFFT function will be used for the DSPLIB implementation. Refer to the reference guide to figure out the correct syntax in calling the function.
N .set 1024
K_FFT_SIZE .set N ; size of FFT
| Input Order | Binary Representation | Bit-Reversed Representation | Output Order |
|---|---|---|---|
| 0 | 000 | 000 | 0 |
| 1 | 001 | 100 | 4 |
| 2 | 010 | 010 | 2 |
| 3 | 011 | 110 | 6 |
| 4 | 100 | 001 | 1 |
| 5 | 101 | 101 | 5 |
| 6 | 110 | 011 | 3 |
| 7 | 111 | 111 | 7 |
_bit_rev_data, which must be aligned to the
least power of two greater than the input buffer length, and
consists of alternating real and imaginary parts. Because
our input data is going to be purely real in this lab, you
will have to make sure that you set the imaginary parts to
zero by zeroing out every other memory location.
_bit_rev
ENTER_ASM
MOV #_fft_data, AR3 ;AR3 -> original input
MOV #_bit_rev_data, AR7 ;AR7 -> data processing buffer
MOV AR7, AR2 ;AR2 -> bit-reversed data
MOV #K_FFT_SIZE-1, BRC0
MOV #K_FFT_SIZE, T0 ; T0 = 1/2 size of circ buffer
RPTB bit_rev_end-1
MOV dbl(*AR3), AC0
MOV AC0, dbl(*AR2+)
AMAR *(AR3+T0B)
bit_rev_end:
LEAVE_ASM
RET
_fft_data
is a label indicating the start of the input data and
_bit_rev_data is a label indicating the start of a
circular buffer where the bit-reversed data will be
written. T0 register needs to be set to
one-half the length of the circular buffer; a statement such
as ARx+T0B is used to move the ARx
pointer to the next location. For more information regarding
the bit-reversed addressing mode, refer to Chapter 6 in the DSP Programmer's Reference Guide. Is it possible to
bit-reverse a buffer in place?lab4fft.c, listed in
Appendix C. Again,
make sure you understand how the bit reversal is taking
place.
In main.c, the line defining C_FFT
must not be commented for use of the C FFT routine. The sine
tables (twiddle factors) are located in
sinetables.h.
This fft requires its inputs in two buffers, the real buffer
real and the imaginary buffer imag,
and the output is placed in the same buffers.
The length of the FFT, N, and logN are
defined in lab4.h, which is also listed in
Appendix C.
When experimenting with the C
FFT make sure you modify these length values instead of the ones
in the assembly code and main.c!
hamming. Use the matlab
function
write_intvector_headerfile
with name set to 'window' and
elemperline set to 8 to create
the header file that is included in lab4.c.
The window has already been created for you.
sqrm and sqam useful in
implementing Equation 1.
bit_rev_data[0]=_smpyr(bit_rev_data[0],window[0])
which performs the assembly signed multiply round
instruction. You may also find the _lsmpy
instruction useful. For more information on intrinsics, see
page 3-29 of the TI-C55x
DSP Programmer's Guide.
Project->New... . In the following window, enter mathex as the name for the project and save it in its own folder on the W: drive. Verify that the Project Type is Executable (.out) and that the target is TMS320C55XX before clicking Finish.
int s1, s2;
int t1, t2;
int i1, i2;
int n1 = 16383, n2 = 16382, n3 = 16381, n4 = 16380;
void main(void)
{
/* Code for standard 32-bit hardware, */
/* with x,y limited to 16 bits */
s1 = (n1*n2 + n3*n4) >> 15;
s2 = (n1 + n2) >> 1;
/* Code for TI TMS320C55X series */
t1 = ((long int)(n1*n2) + (long int)(n3*n4)) >> 15;
t2 = ((long int)n1 + (long int)n2) >> 1;
/* Intrinsic code for TMS320C55X series */
i1 = _sadd(_smpy(n1,n2), _smpy(n3,n4));
i2 = _sshl(_sadd(n1, n2),-1);
while(1);
}
Add Files to Project.. . By default, the generated assembly code is not saved. To save the generated assembly for later comparison, go to Project->Build Options. Under the Compiler tab, click on the Assembly category and make sure Keep Generated .asm Files is selected..global
references and links runtime libraries.
lab4.c and c_fft_given.asm, select Project->Rebuild All. Load the output file onto the DSP as usual and confirm that valid
FFTs are calculated. Once valid output is obtained, measure how
many clock cycles it takes to compute both the assembly and C FFT.
inputs with
#include "dsk5510_dual3006cfg.h"
#include "dsk5510.h"
#include "swi_process.h"
#include "dsplib.h"
#define N 1024
#define logN 10
#include "window.h"
/* comment the next line to use DSPLIB fft */
//#define C_FFT
#ifdef C_FFT /* Use C FFT */
/* function defined in lab4fft.c */
void fft(void);
/* FFT data buffers */
int real[N]; /* Real part of data */
int imag[N]; /* Imaginary part of data */
#include "lab4fft.c"
#else /* Use DSPLIB FFT */
/* Function defined by c_fft_given.asm */
void bit_rev(void);
/* FFT data buffers (declared in c_fft_given.asm) */
extern int bit_rev_data[N*2]; /* Data output for bit-reverse function */
extern int fft_data[N*2]; /* In-place FFT & Output array */
#endif /* C_FFT */
// all data processing should be done in SWI_ProcessBuffer
void SWI_ProcessBuffer()
{
static unsigned int mbox_value = 0;
short *psrc, *pdest;
unsigned int i;
mbox_value |= SWI_getmbox();
// buffers are only processed when both transmit and receive are ready
if((mbox_value & DMA_RECEIVE_DONE) && (mbox_value & DMA_TRANSMIT_DONE)) {
mbox_value = 0;
// get buffer pointers
psrc = receive_buffer[receive_buffer_to_process_index];
pdest = transmit_buffer[transmit_buffer_to_fill_index];
// samples are interleaved in input buffer 3-4-1-2
// output buffer is organized 3-4-1-2-3-4-1-2
// The following code would copy input from each input channel to the
// respective output channel:
/*
for (i = 0; i < 1024; i++)
{
pdest[4*i] = psrc[4*i]; //channel 3 output is channel 3 input
pdest[4*i+1] = psrc[4*i+1]; //channel 4 output is channel 4 input
pdest[4*i+2] = psrc[4*i+2]; //channel 1 output is channel 1 input
pdest[4*i+3] = psrc[4*i+3]; //channel 2 output is channel 2 input
}
*/
#ifdef C_FFT /* Use C FFT */
/* I n s e r t c o d e t o f i l l */
/* C F F T b u f f e r s */
#else /* Use DSPLIB FFT */
/* I n s e r t c o d e t o f i l l */
/* a s s e m b l y F F T b u f f e r s */
#endif /* C_FFT */
#ifdef C_FFT /* Use C FFT */
/* Multiply the input signal by the Hamming window. */
/* . . . i n s e r t C / a s m code . . . */
/* Bit-reverse and compute FFT in C */
fft();
/* Now, take absolute value squared of FFT */
/* . . . i n s e r t C / a s m code . . . */
/* Last, set the DC coefficient to -1 for a trigger pulse */
/* . . . i n s e r t C / a s m code . . . */
/* done, wait for next time around! */
#else /* Use DSPLIB FFT */
/* Multiply the input signal by the Hamming window. */
/* . . . i n s e r t C / a s m code . . . */
/* Compute FFT using DSPLIB function */
cfft((DATA *)fft_data,N, SCALE);
/* Bit reverse using assembly function */
bit_rev();
/* Now, take absolute value squared of FFT */
/* . . . i n s e r t C / a s m code . . . */
/* Last, set the DC coefficient to -1 for a trigger pulse */
/* . . . i n s e r t C / a s m code . . . */
/* done, wait for next time around! */
#endif /* C_FFT */
receive_buffer_processed = 1; // flag receive buffer as processed
transmit_buffer_filled = 1; // flag output buffer as full
}
}
.ARMS_off ;enable assembler for ARMS=0
.CPL_on ;enable assembler for CPL=1
.mmregs ;enable mem mapped register names
.global _bit_rev_data
.global _fft_data
.global _window
.global _bit_rev
.copy "macro.asm"
.sect ".data"
N .set 1024
K_FFT_SIZE .set 1024
.align 4*N
_bit_rev_data .space 16*2*N ; Output of bit reversing function
.align 4*N
_fft_data .space 16*2*N ; FFT output buffer
.sect ".text"
_bit_rev
ENTER_ASM
MOV #_fft_data, AR3
MOV #_bit_rev_data, AR7
MOV AR7, AR2
MOV #K_FFT_SIZE-1, BRC0
MOV #K_FFT_SIZE, T0
RPTB bit_rev_end-1
MOV dbl(*AR3), AC0
MOV AC0, dbl(*AR2+)
AMAR *(AR3+T0B)
bit_rev_end:
LEAVE_ASM
RET
; If you need any more assembly subroutines, make sure you name them
; _name, and include a ".global _name" directive at the top. Also,
; don't forget to use ENTER_ASM at the beginning, and LEAVE_ASM
; and RET at the end!
/*****************************************************************/
/* lab4fft.c */
/* Douglas L. Jones */
/* University of Illinois at Urbana-Champaign */
/* January 19, 1992 */
/* Changed for use w/ short integers and lookup table for ECE420 */
/* Matt Kleffner */
/* February 10, 2004 */
/* */
/* fft: in-place radix-2 DIT DFT of a complex input */
/* */
/* Permission to copy and use this program is granted */
/* as long as this header is included. */
/* */
/* WARNING: */
/* This file is intended for educational use only, since most */
/* manufacturers provide hand-tuned libraries which typically */
/* include the fastest fft routine for their DSP/processor */
/* architectures. High-quality, open-source fft routines */
/* written in C (and included in MATLAB) can be found at */
/* http://www.fftw.org */
/* */
/* #defines expected in lab4.h */
/* N: length of FFT: must be a power of two */
/* logN: N = 2**logN */
/* */
/* 16-bit-limited input/output (must be defined elsewhere) */
/* real: integer array of length N with real part of data */
/* imag: integer array of length N with imag part of data */
/* */
/* sinetables.h must */
/* 1) #define Nt to an equal or greater power of two than N */
/* 2) contain the following integer arrays with */
/* element magnitudes bounded by M = 2**15-1: */
/* costable: M*cos(-2*pi*n/Nt), n=0,1,...,Nt/2-1 */
/* sintable: M*sin(-2*pi*n/Nt), n=0,1,...,Nt/2-1 */
/* */
/*****************************************************************/
#include "sinetables.h"
void fft(void)
{
int i,j,k,n1,n2,n3;
int c,s,a,t,Wr,Wi;
j = 0; /* bit-reverse */
n2 = N >> 1;
for (i=1; i < N - 1; i++)
{
n1 = n2;
while ( j >= n1 )
{
j = j - n1;
n1 = n1 >> 1;
}
j = j + n1;
if (i < j)
{
t = real[i];
real[i] = real[j];
real[j] = t;
t = imag[i];
imag[i] = imag[j];
imag[j] = t;
}
}
/* FFT */
n2 = 1; n3 = Nt;
for (i=0; i < logN; i++)
{
n1 = n2; /* n1 = 2**i */
n2 = n2 + n2; /* n2 = 2**(i+1) */
n3 = n3 >> 1; /* cos/sin arg of -6.283185307179586/n2 */
a = 0;
for (j=0; j < n1; j++)
{
c = costable[a];
s = sintable[a];
a = a + n3;
for (k=j; k < N; k=k+n2)
{
/* Code for standard 32-bit hardware, */
/* with real,imag limited to 16 bits */
/*
Wr = (c*real[k+n1] - s*imag[k+n1]) >> 15;
Wi = (s*real[k+n1] + c*imag[k+n1]) >> 15;
real[k+n1] = (real[k] - Wr) >> 1;
imag[k+n1] = (imag[k] - Wi) >> 1;
real[k] = (real[k] + Wr) >> 1;
imag[k] = (imag[k] + Wi) >> 1;
*/
/* End standard 32-bit code */
/* Code for TI TMS320C54X series */
/*
Wr = ((long int)(c*real[k+n1]) - (long int)(s*imag[k+n1])) >> 15;
Wi = ((long int)(s*real[k+n1]) + (long int)(c*imag[k+n1])) >> 15;
real[k+n1] = ((long int)real[k] - (long int)Wr) >> 1;
imag[k+n1] = ((long int)imag[k] - (long int)Wi) >> 1;
real[k] = ((long int)real[k] + (long int)Wr) >> 1;
imag[k] = ((long int)imag[k] + (long int)Wi) >> 1;
*/
/* End code for TMS320C54X series */
/* Intrinsic code for TMS320C55X series */
Wr = _ssub(_smpy(c, real[k+n1]), _smpy(s, imag[k+n1]));
Wi = _sadd(_smpy(s, real[k+n1]), _smpy(c, imag[k+n1]));
real[k+n1] = _sshl(_ssub(real[k], Wr),-1);
imag[k+n1] = _sshl(_ssub(imag[k], Wi),-1);
real[k] = _sshl(_sadd(real[k], Wr),-1);
imag[k] = _sshl(_sadd(imag[k], Wi),-1);
/* End intrinsic code for TMS320C55X series */
/* Intrinsic code for TMS320C54X series */
/*
Wr = _ssub(_smpy(c, real[k+n1]), _smpy(s, imag[k+n1]));
Wi = _sadd(_smpy(s, real[k+n1]), _smpy(c, imag[k+n1]));
real[k+n1] = _sshl(_ssub(real[k], Wr),-1);
imag[k+n1] = _sshl(_ssub(imag[k], Wi),-1);
real[k] = _sshl(_sadd(real[k], Wr),-1);
imag[k] = _sshl(_sadd(imag[k], Wi),-1);
*/
/* End intrinsic code for TMS320C54X series */
}
}
}
return;
}
Comments, questions, feedback, criticisms?