Summary: An explanation of formants, what they are, and how their frequency values relate to specific vowel sounds. Includes an IPA vowel chart and spectrogram examples.
Taking the FFT (Fast Fourier Transform) of each voice sample yields its frequency spectrum. A formant is a resonance of the vocal tract that appears as a peak in this spectrum; here, the formants are taken to be the four highest peaks. The locations of these formants along the frequency axis define a vowel sound. The four main peaks lie between 300 and 4400 Hz, the band in which the strongest formants of human speech occur. For the purposes of this project, the group extracts the frequency values of only the first two peaks, since they carry the most information about which vowel is being spoken. Because vowels follow consistent, recognizable patterns in these two formants, the shifts that occur with an accent can be recorded with a high degree of accuracy. Figure 1 shows the relationship between vowel sounds and formant frequencies.
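The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the function name `strongest_peaks`, the Hann window, and the synthetic test tone (four sinusoids standing in for a voice sample) are all assumptions made for the example. Real speech also contains pitch harmonics, so a practical system would usually smooth the spectrum before picking peaks.

```python
import numpy as np

def strongest_peaks(signal, sample_rate, n_peaks=4, band=(300.0, 4400.0)):
    """Return the frequencies (Hz) of the n_peaks strongest local maxima
    of the magnitude spectrum inside `band`, sorted from low to high."""
    # A Hann window (an assumption of this sketch) reduces spectral leakage.
    windowed = signal * np.hanning(len(signal))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # A bin is a local maximum if it exceeds its left neighbour and is at
    # least as large as its right neighbour.
    interior = np.arange(1, len(magnitude) - 1)
    is_peak = (magnitude[interior] > magnitude[interior - 1]) & (
        magnitude[interior] >= magnitude[interior + 1]
    )
    in_band = (freqs[interior] >= band[0]) & (freqs[interior] <= band[1])
    candidates = interior[is_peak & in_band]

    # Keep the n_peaks candidates with the largest magnitude.
    strongest = candidates[np.argsort(magnitude[candidates])[::-1][:n_peaks]]
    return np.sort(freqs[strongest])

# Hypothetical stand-in for a voice sample: four sinusoids of decreasing
# amplitude, sampled at 16 kHz for half a second.
sample_rate = 16000
t = np.arange(8000) / sample_rate
tone_freqs = [730.0, 1090.0, 2440.0, 3400.0]
tone_amps = [1.0, 0.8, 0.4, 0.3]
sample = sum(a * np.sin(2 * np.pi * f * t) for a, f in zip(tone_amps, tone_freqs))

peaks = strongest_peaks(sample, sample_rate)
f1, f2 = peaks[0], peaks[1]  # the first two peaks, as used in this project
print("All peaks (Hz):", peaks)
print("F1, F2 (Hz):", f1, f2)
```

Sorting the selected peaks by frequency, rather than by strength, is what makes the first two entries correspond to F1 and F2.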
| The IPA Vowel Chart |
|---|
![]() |
The first formant (F1) depends on whether a vowel sound is more open or more closed, so on the chart F1 varies along the y-axis. F1 increases in frequency as the vowel becomes more open and falls to its minimum as the vowel closes. The second formant (F2), by contrast, varies along the x-axis: it depends on whether a sound is made at the front or the back of the vocal cavity. F2 increases in frequency the farther forward a vowel is made and falls to its minimum as the vowel moves to the back. Each vowel sound therefore has characteristic values for its first two formants. In theory, this means that the same frequency values for the first two formants should hold across many speakers, as long as they are making the same vowel sound.
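As a rough illustration of how a measured (F1, F2) pair could be mapped back to a vowel, the sketch below picks the nearest entry in a small reference table. The table values are approximate adult-male averages of the kind quoted in phonetics textbooks, not measurements from this project, and the names `REFERENCE_FORMANTS` and `classify_vowel` are hypothetical.

```python
import math

# Illustrative (F1, F2) reference points in Hz. These are approximate
# textbook averages for adult male speakers, assumed for this sketch only.
REFERENCE_FORMANTS = {
    "i": (270.0, 2290.0),  # close front vowel: low F1, high F2
    "ɑ": (730.0, 1090.0),  # open back vowel: high F1, low F2
    "u": (300.0, 870.0),   # close back vowel: low F1, low F2
}

def classify_vowel(f1, f2, references=REFERENCE_FORMANTS):
    """Return the reference vowel whose (F1, F2) point lies closest to the
    measured pair, using plain Euclidean distance in Hz."""
    return min(
        references,
        key=lambda vowel: math.hypot(
            f1 - references[vowel][0], f2 - references[vowel][1]
        ),
    )

print(classify_vowel(280.0, 2250.0))
print(classify_vowel(700.0, 1150.0))
```

Because F2 spans a wider range than F1, an unscaled distance in Hz gives F2 more weight. A more careful version might normalize each axis or account for speaker differences, which the text above only assumes away "in theory".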
| a as in call |
|---|
![]() |
| i as in please |
|---|
![]() |