Taking the FFT (Fast Fourier Transform) of
each voice sample yields its frequency spectrum. A formant is one of the four highest peaks in a sample's spectrum. From the
frequency spectra, the main formants can be extracted. It is the
location of these formants along the frequency axis that defines a
vowel sound. The four main peaks lie between 300 and 4400 Hz;
this band is where the strongest formants for human speech
occur. For the purposes of this project, the group will extract
the frequency values of only the first two peaks, since they provide
the most information about which vowel sound is being produced. Because all
vowels follow constant and recognizable patterns in these two
formants, the changes an accent introduces can be recorded with a high
degree of accuracy. Figure 1 shows this pattern between the vowel sounds and formant frequencies.
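The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the group's actual implementation: the function names `fft` and `first_two_formants`, the 16 kHz sample rate, and the synthetic four-tone test signal are all assumptions made for the example. It uses a small hand-written radix-2 FFT so that it runs with the standard library alone; with NumPy available, `numpy.fft.rfft` would be the usual choice. Real recordings would normally be windowed first, which is omitted here.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def first_two_formants(samples, sample_rate, lo_hz=300.0, hi_hz=4400.0):
    """Estimate F1 and F2 (hypothetical helper, following the text's definition).

    Finds local maxima of the magnitude spectrum inside [lo_hz, hi_hz],
    keeps the four strongest, and returns the two lowest in frequency.
    """
    n = len(samples)
    spectrum = fft([complex(s) for s in samples])
    mags = [abs(c) for c in spectrum[: n // 2]]  # positive frequencies only
    bin_hz = sample_rate / n                     # width of one FFT bin in Hz

    peaks = []
    for k in range(1, len(mags) - 1):
        freq = k * bin_hz
        is_local_max = mags[k] > mags[k - 1] and mags[k] > mags[k + 1]
        if lo_hz <= freq <= hi_hz and is_local_max:
            peaks.append((mags[k], freq))

    strongest = sorted(peaks, reverse=True)[:4]        # four highest peaks
    by_freq = sorted(freq for _, freq in strongest)    # order along the frequency axis
    return tuple(by_freq[:2])                          # first two peaks: F1, F2


# Synthetic stand-in for a voice sample (an assumption for illustration):
# four sinusoids placed on exact FFT bins, with decreasing amplitude.
sample_rate = 16000
n = 1024
components = [(703.125, 1.0), (1093.75, 0.8), (2500.0, 0.4), (3500.0, 0.3)]
samples = [
    sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in components)
    for t in range(n)
]
f1, f2 = first_two_formants(samples, sample_rate)
```

Because the test tones sit exactly on FFT bins, the two lowest components should come back as `f1` and `f2`. Real speech will not be this clean, so the peak picking would likely need smoothing or a minimum-height threshold.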
The first formant (F1) depends on
whether a vowel sound is more open or closed, so on the chart, F1
varies along the y-axis. F1 increases in frequency as the vowel
becomes more open and decreases to its minimum as the vowel sound
closes. The second formant (F2), however, varies along the x-axis.
That is, it depends on whether a sound is made in the front
or the back of the vocal cavity. F2 increases in frequency the
farther forward a vowel is articulated and decreases to its minimum as a
vowel moves to the back. Therefore, each vowel sound has unique,
characteristic values for its first two formants. In theory,
then, the same frequency values for the first two formants should
hold across many speakers as long as they are making the same vowel sound.
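If each vowel does occupy its own region of the F1/F2 plane, a measured pair of formants can be matched to the closest reference vowel. The sketch below shows one simple way to do that, under stated assumptions: the table `REFERENCE_FORMANTS` and the function `nearest_vowel` are hypothetical, the three reference pairs are rough illustrative values rather than measurements from this project, and plain Euclidean distance in Hz is just one possible choice of metric.

```python
import math

# Hypothetical reference table of (F1, F2) in Hz. The numbers are rough,
# illustrative values only; a real table would come from measured speakers.
REFERENCE_FORMANTS = {
    "i": (270.0, 2290.0),  # closed, front: low F1, high F2
    "a": (730.0, 1090.0),  # open: high F1
    "u": (300.0, 870.0),   # closed, back: low F1, low F2
}


def nearest_vowel(f1, f2, table=REFERENCE_FORMANTS):
    """Return the vowel whose reference (F1, F2) is closest to the input."""
    return min(
        table,
        key=lambda v: math.hypot(f1 - table[v][0], f2 - table[v][1]),
    )


guess = nearest_vowel(300.0, 2200.0)  # low F1 and high F2 point to a closed, front vowel
```

The table mirrors the pattern in the text: F1 separates open from closed vowels, and F2 separates front from back ones. Because F2 spans a much wider range than F1, an unscaled distance is dominated by F2, so scaling each axis before comparing may be worth considering.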