We continue to develop the properties for the uniform linear array (ULA) that has been discussed
previously. With the important relationship that we found to avoid spatial aliasing, $d \le \frac{\lambda_{\min}}{2}$, we now consider the theoretical background of the ULA. Once we understand how the array will be used, we will look at a method called 'beamforming' that directs the array's focus in specific directions.
Far-field Signals
We saw previously that a very nice property of the ULA is the constant delay between the arrival of a signal at consecutive sensors. This is only true, however, if we assume that plane waves arrive at the array. Remember that sounds radiate spherically outward from their sources, so the assumption is generally not true! To get around that problem, we only consider signals in the far-field, in which case the signal sources are far enough away that the arriving sound waves are essentially planes over the length of the array.
Definition 1: Far-field source
A source is considered to be in the far-field if $r > \frac{2L^2}{\lambda}$, where r is the distance from the source to the array, L is the length of the array, and $\lambda$ is the wavelength of the arriving wave.
If you have an array and sound sources, you can tell whether the sources are in the far-field based on what angle the array estimates for the source direction compared to the actual source direction. If the source is kept at the same angle with respect to the broadside and moved further away from the array, the estimate of the source direction should improve as the arriving waves become more planar. Of course, this only works if the array is able to accurately estimate far-field source directions to begin with, so use the formula first to make sure that everything works well in the far-field, and then move closer to see how distance affects the array's performance.
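As a rough illustration of the far-field formula, the sketch below evaluates $2L^2/\lambda$ for the six-microphone array with 9.9 cm spacing described under Array Properties. The speed of sound (343 m/s) and the function name `far_field_distance` are assumptions made for this example, not values taken from the project.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def far_field_distance(array_length_m, frequency_hz, c=SPEED_OF_SOUND):
    """Return the distance 2*L**2/lambda beyond which a source counts as far-field."""
    wavelength = c / frequency_hz
    return 2.0 * array_length_m ** 2 / wavelength


# Six sensors spaced 9.9 cm apart span five gaps, so L = 5 * d.
array_length = 5 * 0.099  # metres

for f in (400.0, 800.0, 1600.0):
    r_min = far_field_distance(array_length, f)
    print(f"{f:6.0f} Hz: source should be farther than {r_min:.2f} m")
```

Under these assumptions the limit grows with frequency and comes out at roughly 2.3 m for a 1600 Hz source.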
Near-field sources are beyond the scope of our project, but they are not beyond the scope of array processing. For more information on just about everything related to array processing, take a look at Array Signal Processing: Concepts and Techniques, by Don H. Johnson and Dan E. Dudgeon, Englewood Cliffs, NJ: Prentice Hall, 1993.
Array Properties
Depending on how the array will be used, it may be important (as it was in our project) that the microphones used be able to receive sound from all directions. We used omni-directional microphones, which are exactly what they sound like -- microphones that hear in all directions. If you don't need this ability, you can look into other options, but keep in mind that the theoretical development here assumes omni-directional capability, so you will need to do some research on array processing techniques that suit your needs. In fact, it would be a good idea no matter what! Our project used a simple array design, but it took a while to learn all of the theory and to figure out how to implement it.
Our array comprises six generic omni-directional microphones. We built an array frame out of PVC pipe to hold each microphone in place with equidistant spacing between the sensors. Physical limitations of the microphones and PVC connecting pieces prevented us from using a very small spacing; for our array, we had a sensor spacing of d=9.9 cm. Since we know that we need to have $d \le \frac{\lambda_{\min}}{2}$ to avoid spatial aliasing, we are able to calculate the highest frequency that the array is capable of processing: $f_{\max} = 1600$ Hz. (Actually, $f_{\max}$ is a little higher than 1600 Hz as you can verify, but to be on the safe side we kept it a bit lower.)
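To verify the claim, note that $d \le \lambda_{\min}/2$ together with $\lambda = c/f$ gives $f_{\max} = c/(2d)$. The sketch below evaluates this, assuming a speed of sound of 343 m/s (the original does not state the value it used); the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def max_unaliased_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Highest frequency satisfying d <= lambda_min / 2, that is, c / (2 * d)."""
    return c / (2.0 * spacing_m)


f_max = max_unaliased_frequency(0.099)  # d = 9.9 cm
print(f"f_max = {f_max:.0f} Hz")
```

With that assumed speed of sound the limit comes out a little above 1700 Hz, so 1600 Hz leaves some margin.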
If you want to have any chance of figuring out some useful information about a signal, particularly in real-time, you're going to have to ditch the pencil and paper for some electronic equipment. We used National Instruments' LabVIEW 7.1 to do all of our signal processing, although we performed some analog conditioning on the received signal before the analog-to-digital conversion (ADC). We also used National Instruments' 6024e data acquisition card to digitize the signal. This is a multiplexed ADC with a total sampling capacity of 200 kHz that is divided among the inputs. Therefore, with six sensor inputs, we could sample the signals received at each microphone at a maximum rate of 33.3 kHz. Since full-bandwidth audio is typically sampled at 44.1 kHz, this is not a good DAQ for high-fidelity audio applications; however, it would have worked for our original plan to listen to low frequency sound in the 0 to 8 kHz range. As it turns out, since our array can process a maximum frequency of 1600 Hz, we chose to sample at $f_s = 4000$ Hz, which exceeds the Nyquist requirement of 3200 Hz and is well within the capability of our DAQ.
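The sampling budget can be checked with a few lines of arithmetic. This is only a sketch: it assumes the multiplexed converter splits its aggregate rate evenly among the channels, and the helper names are made up for the example.

```python
def per_channel_rate(aggregate_rate_hz, num_channels):
    """Rate available to each input when a multiplexed ADC shares its rate evenly."""
    return aggregate_rate_hz / num_channels


def satisfies_nyquist(sample_rate_hz, highest_frequency_hz):
    """True if the sample rate is more than twice the highest frequency of interest."""
    return sample_rate_hz > 2.0 * highest_frequency_hz


max_rate = per_channel_rate(200_000.0, 6)
print(f"{max_rate / 1000:.1f} kHz available per microphone")
print("4000 Hz is enough for 1600 Hz:", satisfies_nyquist(4000.0, 1600.0))
print("Per-channel maximum covers 0 to 8 kHz:", satisfies_nyquist(max_rate, 8000.0))
```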
All of these properties generalize to the design of any ULA, or indeed of any array, though other designs may have greater capabilities and thus would require that you consider additional signal properties (e.g., signal elevation above or below the array) and how they affect the array. If you need a starting point, think about the range of frequencies that you are interested in and get equipment that is capable of processing them. That includes an ADC that can sample at a high enough rate to avoid temporal aliasing and the materials to construct an array such that spatial aliasing will not occur. You will probably have to do some pre-conditioning of the signal before you digitize it, such as lowpass filtering to reject frequencies above those you are interested in and applying a gain to the input signals. These are all important things to think about when you design your array!
ULA Processing Fundamentals
Now it's time to look at the theory that we need to implement for a ULA that enables us to figure out where signals come from and to listen to them. We are considering narrowband signals (i.e., sinusoids) of the form
$$x(t) = e^{j 2\pi f t} \tag{1}$$
where f is the frequency of the signal. If we have N sensors numbered from n=0,...,N-1, then the delayed versions of x(t) that arrive at each microphone n are
$$x_n(t) = e^{j 2\pi f (t - n\tau)} \tag{2}$$
Thus, the first sensor (n=0) has zero delay, while the signal arrives at the second sensor (n=1) one unit delay $\tau$ later than at the first, and so on for each sensor. Then, we sample the signal at each microphone in the process of ADC, and call $x_n(r) = x_n(rT)$, where r is an integer sample index and T is the sampling period. Measuring time in units of T (so that f is in cycles per sample and $\tau$ is in samples) gives us the sampled sinusoids at each sensor:
$$x_n(r) = e^{j 2\pi f (r - n\tau)} \tag{3}$$
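The sampled sensor signals of equation (3) are easy to generate for experimentation. The sketch below is a minimal illustration that assumes normalized units (f in cycles per sample, $\tau$ in samples); the function name and the numeric values are hypothetical, not taken from the project.

```python
import cmath


def sensor_samples(f, tau, num_sensors, num_samples):
    """Return x[n][r] = exp(j*2*pi*f*(r - n*tau)) for sensor n and sample r.

    f is in cycles per sample and tau is in samples (normalized units).
    """
    return [
        [cmath.exp(2j * cmath.pi * f * (r - n * tau)) for r in range(num_samples)]
        for n in range(num_sensors)
    ]


# Hypothetical values chosen only for illustration.
f, tau = 0.125, 0.5
x = sensor_samples(f, tau, num_sensors=6, num_samples=16)

# Each sensor sees the previous sensor's signal rotated by one fixed phase step.
step = cmath.exp(-2j * cmath.pi * f * tau)
print(abs(x[1][3] - x[0][3] * step) < 1e-12)
```

The constant phase step between neighboring sensors is the quantity that the spatial transform later picks up.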
Now we need to do some Fourier transforms to look at the frequency content of our received signals. Even though each sensor receives the same frequency signal, recall that delays $x(t - n\tau)$ in time correspond to modulation by $e^{-j 2\pi f n\tau}$ in frequency, so the spectra of the received signals at each sensor are not identical. The first Discrete Fourier Transform (DFT) looks at the temporal frequency content at each sensor:
$$X_n(k) = \frac{1}{\sqrt{R}} \sum_{r=0}^{R-1} e^{j 2\pi f (r - n\tau)} e^{-j \frac{2\pi k r}{R}} \tag{4}$$
$$X_n(k) = \sqrt{R}\, e^{-j 2\pi f n\tau} \tag{5}$$
for k=fR (with f in cycles per sample and fR an integer), and zero otherwise. Here we have used the definition of the normalized DFT, but it isn't particularly important whether you use the normalized or unnormalized DFT because ultimately the transform factors $1/\sqrt{R}$ or $1/R$ just scale the frequency coefficients by a constant.
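The temporal DFT step can be checked numerically. The sketch below is a direct (slow) DFT written for clarity rather than speed, under the assumption of a $1/\sqrt{R}$ normalization and a frequency that falls exactly on bin k = fR; the function name and values are hypothetical. With that normalization, the coefficient at the signal bin should have magnitude $\sqrt{R}$ and a phase of $-2\pi f n \tau$.

```python
import cmath
import math


def normalized_dft(x):
    """Direct DFT with a 1/sqrt(R) scale factor (R = len(x))."""
    R = len(x)
    return [
        sum(x[r] * cmath.exp(-2j * cmath.pi * k * r / R) for r in range(R)) / math.sqrt(R)
        for k in range(R)
    ]


R = 64
f, tau = 8 / R, 0.5   # f in cycles per sample, tau in samples (illustrative values)
k0 = round(f * R)     # bin that holds the sinusoid

for n in range(6):
    x_n = [cmath.exp(2j * cmath.pi * f * (r - n * tau)) for r in range(R)]
    X_n = normalized_dft(x_n)
    expected = math.sqrt(R) * cmath.exp(-2j * cmath.pi * f * n * tau)
    leakage = max(abs(X_n[k]) for k in range(R) if k != k0)
    print(n, abs(X_n[k0] - expected) < 1e-9, leakage < 1e-9)
```

If f did not fall exactly on a bin, energy would leak into neighboring bins and the single-coefficient picture would only hold approximately.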
Now that we have N spectra, one from each of the array's sensors, we are interested to see how a certain frequency of interest $f_0$ is distributed spatially. In other words, this spatial Fourier transform will tell us how strong the frequency $f_0$ is for different angles with respect to the array's broadside. We perform this DFT by taking the frequency component from each received signal that corresponds to $f_0$ and concatenating them into an array. We then zero pad that array to a length that is a power of two in order to make the Fast Fourier Transform (FFT) computationally efficient. (Every DFT that we do in this project is implemented as an FFT. We use the DFT in developing the theory because it is the transform itself, whereas the FFT is just a fast algorithm for computing it.)
Point of Interest: When we build the array of frequency components from each of the received signals, we have an N length array before we zero pad it. Let's think about the resolution of the array, which refers to its ability to discriminate between sounds coming from different angles. The greater the number of sensors in the array, the finer the array's resolution. So what happens when we zero pad the array of frequency components? We are essentially adding components from additional 'virtual sensors' that have zero magnitude. The result is a spatial spectrum evaluated at more angles, which improves its apparent resolution! What effect does this have? Read on a bit!
Once we have assembled our zero padded array of components at $f_0$, we can perform the spatial DFT:
$$\Omega(k) = \sqrt{\frac{R}{N}} \sum_{n=0}^{M-1} e^{-j 2\pi n \left(\frac{k}{N} + f_0\tau\right)} \tag{6}$$
where N for this equation is the length of the zero padded array, M is the number of physical sensors (called N earlier, before zero padding; the padded entries add nothing to the sum), and R remains from the temporal DFT. The result of the summation is a spectrum that is a digital sinc function whose main lobe is centered where $\frac{k}{N} + f_0\tau$ is an integer, that is, at $\frac{k}{N} = -f_0\tau$ (modulo 1). The value of the sinc represents the magnitude of the frequency $f_0$ arriving from the angle $\theta$ that produces the delay $\tau$. Because of the shape of the lobes of the sinc, which look like beams at the various angles, the process of using the array to look for signals in different directions is called beamforming. This technique is used frequently in array processing, and it is what enabled us to detect the directions from which sounds of a certain frequency come and to listen in different directions.
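The spatial DFT can be sketched in the same way. The example below assumes the per-sensor coefficients have the form $\sqrt{R}\, e^{-j 2\pi f_0 n \tau}$ (a $1/\sqrt{R}$ temporal normalization), zero pads them, and takes a direct DFT across the sensor index. The names `spatial_spectrum`, `M`, and the numeric values are hypothetical and chosen only for illustration; the project itself used LabVIEW.

```python
import cmath
import math


def spatial_spectrum(coeffs, padded_length):
    """Zero pad the per-sensor coefficients and take a normalized DFT across sensors."""
    N = padded_length
    padded = list(coeffs) + [0j] * (N - len(coeffs))
    return [
        sum(padded[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / math.sqrt(N)
        for k in range(N)
    ]


M, N, R = 6, 64, 64    # physical sensors, padded length, temporal DFT length
f0, tau = 0.125, 0.5   # cycles per sample and samples (illustrative values)

# One coefficient per sensor, taken from the bin that holds f0.
coeffs = [math.sqrt(R) * cmath.exp(-2j * cmath.pi * f0 * n * tau) for n in range(M)]

omega = spatial_spectrum(coeffs, N)
peak_bin = max(range(N), key=lambda k: abs(omega[k]))
estimated_f0_tau = (-peak_bin / N) % 1.0  # main lobe sits at k/N = -f0*tau (mod 1)
print(peak_bin, estimated_f0_tau)
```

For these values the main lobe lands at bin 60, where $k/N + f_0\tau = 1$, so the product $f_0\tau$ (and from it the delay) can be read back from the peak location.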
Recalling that we zero padded our array of coefficients corresponding to $f_0$, what has that done for us in terms of the spatial spectrum? Well, we have improved our apparent resolution: the spectrum is sampled at more angles, which means that it is smoother and more well-defined, and we can see how the response changes over smaller angle steps. Zero padding does not narrow the main lobe, though. If we increase the actual number of sensors in the array, we truly improve our resolution and our beamforming: the main lobe becomes narrower and taller, and the side lobes shrink relative to it.
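One way to compare zero padding with adding sensors is to measure how wide the main lobe is in each case. The sketch below is an illustration under simplifying assumptions (a broadside source, so every sensor coefficient is 1, and a half-power threshold as the width measure); the function name is hypothetical.

```python
import cmath
import math


def half_power_width(num_sensors, padded_length):
    """Fraction of the spatial spectrum within 3 dB of the peak for a broadside source."""
    N = padded_length
    mags = [
        abs(sum(cmath.exp(-2j * cmath.pi * k * n / N) for n in range(num_sensors)))
        for k in range(N)
    ]
    threshold = max(mags) / math.sqrt(2)
    return sum(1 for m in mags if m >= threshold) / N


w6_64 = half_power_width(6, 64)     # six sensors, padded to 64
w6_256 = half_power_width(6, 256)   # same sensors, more zero padding
w12_64 = half_power_width(12, 64)   # twice as many sensors

print(f"6 sensors, N=64:   {w6_64:.3f}")
print(f"6 sensors, N=256:  {w6_256:.3f}")
print(f"12 sensors, N=64:  {w12_64:.3f}")
```

More padding leaves the width essentially unchanged (it only samples the same lobe more finely), while doubling the sensor count roughly halves it.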