Summary: Analysis of the speech spectrogram, and how the energy in speech effects the design of systems for transmiting speech.
When we speak, pitch and the vocal tract's transfer function are not static; they change according to their control signals to produce speech. Engineers typically display how the speech spectrum changes over time with what is known as a spectrogram Figure 1.
| spectrogram |
|---|
![]() |
Note how the line spectrum, which indicates how the pitch changes, is visible during the vowels, but not during the consonants (like the ce in "Rice").
The fundamental model for speech indicates how engineers use the physics underlying the signal generation process and exploit its structure to produce a systems model that suppresses the physics while emphasizing how the signal is "constructed." From everyday life, we know that speech contains a wealth of information. We want to determine how to transmit and receive it. Efficient and effective speech transmission requires us to know the signal's properties and its structure (as expressed by the fundamental model of speech production). We see from Figure 1, for example, that speech contains significant energy from zero frequency up to around 5 kHz. Effective speech transmission systems must be able to cope with signals having this bandwidth. It is interesting that one system that does not support this 5 kHz bandwidth is the telephone: Telephone systems act like a bandpass filter passing energy between about 200 Hz and 3.2 kHz. The most important consequence of this filtering is the removal of high frequency energy. In our sample utterance, the "ce" sound in "Rice" contains most of its energy above 3.2 kHz; this filtering effect is why it is extremely difficult to distinguish the sounds "s" and "f" over the telephone. Radio does support this bandwidth (more about AM and FM radio systems later). Efficient speech transmission systems exploit the speech signal's special structure: What makes speech speech? You can conjure many signals that span the same frequencies as speech—car engine sounds, violin music, dog barks—but don't sound at all like speech. We shall learn later that transmission of any 5 kHz bandwidth signal requires about 80 kbps (thousands of bits per second) to transmit digitally. Speech signals can be transmitted using less than 1 kbps because of its special structure. Reducing the "digital bandwidth" so drastically took engineers many years. They developed signal processing and coding methods that could deal effectively with speech without destroying it. If you used a speech transmission system to send a violin sound, it would arrive horribly distorted; speech transmitted the same way would sound fine.