The program flow for these image processing applications may
be a bit different from your previous experiences in C
programming. In most C programs, the main function is where
program execution starts and ends. In this real-time
application, the main function serves only to initialize
the cache, the CSL, and the DMA channel.
When it exits, the main task, tskMainFunc(), will
execute automatically, starting the DSP/BIOS. This is where
our image processing application begins.
The tskMainFunc(), in main.c, opens
the handles to the board for image capture
(VCAP_open()) and to the display
(VCAP_open()) and calls the grayscale function.
Here, several data structures are instantiated that are
defined in the file img_proc.h. The IMAGE
structures will point to the data that is captured by the FPGA
and the data that will be output to the display. The
SCRATCH_PAD structure points to our internal and external
memory buffers used for temporary storage during processing.
LPF_PARAMS is used to store filter coefficients for the low
pass filter.
The call to img_proc() takes us to the file
img_proc.c. First, several variables are
declared and defined. The variable quadrant will denote on
which quadrant of the screen we currently want output;
out_ptr will point to the current output spot in
the output image; and pitch refers to the byte offset between
two lines. This function is the high level control for our
image-processing algorithm. See algorithm flow.
The first function called is the pre_scale_image
function in the file pre_scale_image.c. The
purpose of this function is to take the 640x480 image and
scale it down to a quarter of its size by first downsampling
the input rows by two and then averaging every two pixels
horizontally. The internal and external memory spaces in the
scratch pad are used for this task. The vertical downsampling
will occur when only every other line is read into the
internal memory from the input image. Within internal memory,
we will operate on two lines of data (640 columns/line) at a
time, averaging every two pixels (horizontal neighbors) and
producing two lines of output (320 columns/line) that are
stored in the external memory.
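The scaling described above can be sketched in plain C. This is only an illustration of the arithmetic, assuming a tightly packed 8-bit grayscale image held in ordinary memory; the name prescale_sketch() is hypothetical, and the sketch does not use the IDM streams or the internal/external buffers that the real pre_scale_image() relies on.

```c
#include <stddef.h>

/* Hypothetical helper, not the IDK pre_scale(): shrink a grayscale image
 * to half its width and half its height.
 * Vertical:   keep every other input row.
 * Horizontal: average each pair of neighboring pixels. */
static void prescale_sketch(const unsigned char *in, int in_cols, int in_rows,
                            unsigned char *out)
{
    int out_cols = in_cols / 2;
    int out_rows = in_rows / 2;

    for (int r = 0; r < out_rows; r++) {
        /* Rows 0, 2, 4, ... of the input; odd rows are skipped. */
        const unsigned char *src = in + (size_t)(2 * r) * in_cols;
        unsigned char *dst = out + (size_t)r * out_cols;

        for (int c = 0; c < out_cols; c++) {
            /* Truncating average of two horizontal neighbors. */
            dst[c] = (unsigned char)((src[2 * c] + src[2 * c + 1]) / 2);
        }
    }
}
```

For a 640x480 input, the call prescale_sketch(in, 640, 480, out) would fill a 320x240 output buffer.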
To accomplish this, we will need to take advantage of the IDM
by initializing the input and output streams. At the start of
the function, two instantiations of a new structure
dstr_t are declared. You can view the structure
contents of dstr_t on p. 2-11 of the IDK
Programmer's Guide. The structure contents are defined
with calls to dstr_open(). The data flow for
the pre-scale is shown in data flow.
To give you a better understanding of how these streams are
created, let's analyze the parameters passed in the first call
to dstr_open():
This is a pointer to the place in memory serving as the
source of our input data (it's the source because the last
function parameter is set to DSTR_INPUT).
This is the total size of our input data. We will only be
taking every other line from in_image->data, so this covers
only 240 rows. The extra two rows serve as a buffer.
This is a pointer to an 8x640 lexicographic array,
specifically scratchpad->int_data. This is
where we will be putting the data on each call to
dstr_get().
This is the size of the space available for data to be input
into int_mem from in_image->data.
Because double buffering is used, num_lines is
set to 2.
Each time dstr_get() is called, it will
return a pointer to 2 lines of data, 640 bytes in
length.
The need for the window size is not really apparent here.
It will become apparent when we do the 3x3 block
convolution. Then, the window size will be set to 3. This
tells the IDM to send a pointer to 3 lines of data when
dstr_get() is called, but only increment the
stream's internal pointer by 1 (instead of 3) the next time
dstr_get() is called. This is not a parameter
when setting up an output stream.
Sets the direction of data flow. If it had been set to
DSTR_OUTPUT (as done in the next call to
dstr_open()), we would be setting the data to
flow from the Internal Address to the External Address.
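The window-size behavior described above (hand back several lines, but advance by only one) can be illustrated with a small stand-in. This is a sketch under stated assumptions: line_stream and line_stream_get() are hypothetical names, not the IDM's dstr_t or dstr_get(), and the sketch returns pointers into the source image directly instead of transferring data into internal memory.

```c
#include <stddef.h>

/* Hypothetical stand-in for an input stream; this is not the IDM dstr_t. */
typedef struct {
    const unsigned char *data; /* start of the source image             */
    int line_bytes;            /* bytes per line                        */
    int total_lines;           /* lines available in the source         */
    int window;                /* lines exposed by each get call        */
    int next;                  /* index of the first line of the window */
} line_stream;

/* Return a pointer to `window` consecutive lines, then advance by ONE
 * line. Returns NULL once a full window no longer fits. */
static const unsigned char *line_stream_get(line_stream *s)
{
    if (s->next + s->window > s->total_lines)
        return NULL;

    const unsigned char *p = s->data + (size_t)s->next * s->line_bytes;
    s->next += 1;   /* a window of 3 still slides by a single line */
    return p;
}
```

With window set to 3, successive calls expose lines 0-2, then 1-3, then 2-4, which is the overlap a 3x3 block convolution needs.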
Once our data streams are set up, we can begin processing by
calling the component function pre_scale() (in
pre_scale.c) to operate on one block of data at a
time. This function will perform the horizontal scaling by
averaging every two pixels. This algorithm operates on four
pixels at a time. The entire function is iterated within
pre_scale_image() 120 times; since each call handles two
lines, this covers the 240 rows of a quadrant. Before
pre_scale_image() exits, the data streams are
closed, and one line is added to the top and bottom of the
image to provide context necessary for the next processing
steps. Now that the input image has been scaled to a quarter
of its initial size, we will proceed with the four image
processing algorithms. In img_proc.c, the
set_ptr() function is called to set the variable
out_ptr to point to the correct quadrant on the
640x480 output image. Then copy_image(),
copy_image.c, is called, performing a direct copy
of the scaled input image into the lower right quadrant of the
output.
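To make the roles of out_ptr and pitch concrete, here is a minimal sketch of how a quadrant pointer could be computed and how a scaled image could be copied into it. The function names and the quadrant numbering are assumptions made for this example; they are not the actual set_ptr() or copy_image() code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical quadrant numbering for this sketch only:
 * 0 = upper left, 1 = upper right, 2 = lower left, 3 = lower right. */
static unsigned char *quadrant_ptr(unsigned char *out, int pitch,
                                   int cols, int rows, int quadrant)
{
    int x = (quadrant & 1) ? cols / 2 : 0;  /* right half?  */
    int y = (quadrant & 2) ? rows / 2 : 0;  /* bottom half? */
    return out + (size_t)y * pitch + x;
}

/* Copy a tightly packed w x h image into a destination whose lines are
 * `pitch` bytes apart. */
static void copy_rows(const unsigned char *src, int w, int h,
                      unsigned char *dst, int pitch)
{
    for (int r = 0; r < h; r++)
        memcpy(dst + (size_t)r * pitch, src + (size_t)r * w, (size_t)w);
}
```

The pitch is the full width of the output image (640 bytes here), not the width of the quadrant, so each copied row begins one full output line below the previous one.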
Next we will set the out_ptr to point to the
upper right quadrant of the output image and call
conv3x3_image() in conv3x3_image.c.
As with pre_scale_image(), the
_image indicates this is only the wrapper
function for the ImageLIB component, conv3x3().
As before, we must set up our input and output streams. This
time, however, data will be read from the external memory,
into internal memory for processing, and then written to the
output image. Iterating over each row, we compute one line of
data by calling the component function conv3x3()
in conv3x3.c.
In conv3x3(), you will see that we perform a 3x3
block convolution, computing one line of data with the low
pass filter mask. Note here that the variables
IN1[i], IN2[i], and
IN3[i] all grab only one pixel at a time. This
is in contrast to the operation of pre_scale()
where the variable in_ptr[i] grabbed 4 pixels at a time. This
is because in_ptr was of type unsigned int, which
implies that it points to four bytes of data at a time.
IN1, IN2, and IN3 are
all of type unsigned char, which implies they point to a
single byte of data. In block convolution, we are computing
the value of one pixel by placing weights on a 3x3 block of
pixels in the input image and computing the sum. What happens
when we are trying to compute the rightmost pixel in a row?
The 3x3 block now extends past the end of the row, so the
result is not valid. That is why the wrapper
function copies the last good column of data into the two
rightmost columns. You should also note that the component
function ensures output pixels will lie between 0 and 255.
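The per-line computation can be sketched as follows. This is an illustrative plain-C version, not the ImageLIB conv3x3(): the name conv3x3_line(), the signed 8-bit mask layout, and the shift parameter are all assumptions made for the example. Only the three-line input, the byte-at-a-time access, and the clamp to 0..255 come from the description above.

```c
/* Hypothetical one-line 3x3 convolution; not the ImageLIB conv3x3().
 * in1, in2, in3 point to three consecutive input lines of `cols` pixels.
 * mask holds 9 weights in row-major order; the weighted sum is scaled
 * down by 2^shift and clamped to 0..255.
 * Writes cols - 2 output pixels. */
static void conv3x3_line(const unsigned char *in1, const unsigned char *in2,
                         const unsigned char *in3, unsigned char *out,
                         int cols, const signed char mask[9], int shift)
{
    for (int i = 0; i < cols - 2; i++) {
        int sum = 0;

        /* unsigned char pointers read exactly one pixel per index. */
        for (int k = 0; k < 3; k++) {
            sum += in1[i + k] * mask[k];
            sum += in2[i + k] * mask[3 + k];
            sum += in3[i + k] * mask[6 + k];
        }

        if (sum < 0)
            sum = 0;          /* clamp low before shifting */
        sum >>= shift;
        if (sum > 255)
            sum = 255;        /* clamp high */

        out[i] = (unsigned char)sum;
    }
}
```

Because each output pixel needs three input columns, a line of cols pixels yields only cols - 2 valid results, which is why the last two columns have to be filled in separately.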
Back in img_proc.c, we can begin the edge
detection algorithm, sobel_image(), for the lower
left quadrant of the output image. This wrapper function,
located in sobel_image.c, performs edge detection
by utilizing the assembly-written component function
sobel() in sobel.asm. The wrapper
function is very similar to the others you have seen and
should be straightforward to understand. Understanding the
assembly file is considerably more difficult since you are not
familiar with the assembly language for the C6711 DSP. As
you'll see in the assembly file, the comments are very helpful
since an "equivalent" C program is given there.
The Sobel algorithm convolves two masks with a 3x3 block of
data and sums the results to produce a single pixel of output.
This algorithm approximates a 3x3 nonlinear edge enhancement
operator. The brightest edges in the result represent a rapid
transition (well-defined features), and darker edges represent
smoother transitions (blurred or blended features).
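For reference, a common plain-C form of the Sobel operator is sketched below. It is not a transcription of sobel.asm or of the C code in its comments. It assumes the usual horizontal and vertical Sobel masks and combines them as the sum of absolute values, clamped to 255; the name sobel_sketch() is hypothetical.

```c
#include <stdlib.h>  /* abs */

/* Hypothetical plain-C Sobel for a cols x rows grayscale image.
 * Only interior pixels are written; the one-pixel border of `out`
 * is left untouched. */
static void sobel_sketch(const unsigned char *in, unsigned char *out,
                         int cols, int rows)
{
    for (int y = 1; y < rows - 1; y++) {
        for (int x = 1; x < cols - 1; x++) {
            const unsigned char *p = in + y * cols + x;

            /* Horizontal mask: [-1 -2 -1; 0 0 0; 1 2 1] */
            int h = -p[-cols - 1] - 2 * p[-cols] - p[-cols + 1]
                    + p[cols - 1] + 2 * p[cols] + p[cols + 1];

            /* Vertical mask: [-1 0 1; -2 0 2; -1 0 1] */
            int v = -p[-cols - 1] + p[-cols + 1]
                    - 2 * p[-1] + 2 * p[1]
                    - p[cols - 1] + p[cols + 1];

            int mag = abs(h) + abs(v);
            out[y * cols + x] = (unsigned char)(mag > 255 ? 255 : mag);
        }
    }
}
```

A sharp step between neighboring columns saturates the output at 255, while a gentle step produces a small value, matching the bright and dark edges described above.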