In visual perception, people discriminate among colors based on the frequency of light (or, equivalently, its wavelength). Low frequencies are perceived as red and high frequencies as violet.
As we move from low to high frequencies, we perceive a continuum of colors from red to violet. Notice that as we move from red to orange, we pass through a middle ground that we call "red orange." Speech sounds lie on a physical continuum as well. An important dimension in speech perception, for example, is voice onset time: the time between the beginning of the syllable (the release of the consonant) and the onset of vibration of the vocal cords. When you say "ba," your vocal cords vibrate almost from the start; when you say "pa," they do not vibrate until after a short delay. To see this for yourself, rest your fingers lightly on your throat over your vocal cords and say "ba" and then "pa."
The key difference between the sound "ba" and the sound "pa" is that the voice onset time for "ba" is shorter than the voice onset time for "pa." An important difference between speech perception and visual perception is that we do not hear speech sounds as falling halfway between "ba" and "pa"; we hear a sound one way or the other. This means that one range of voice onset times is perceived as "ba" and a different range is perceived as "pa." This phenomenon is called categorical perception, and it is very helpful for understanding speech.
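The contrast between a continuous mapping (as with color) and a categorical one (as with voicing) can be sketched in a few lines of Python. This is a minimal illustration rather than a model of real listeners: it assumes a single, hypothetical "ba"/"pa" boundary at 25 ms of voice onset time, whereas real boundaries vary with language and context. The names `BOUNDARY_MS`, `label_syllable`, and `same_category` are illustrative only.

```python
# Minimal sketch of categorical labeling along a voice-onset-time (VOT) continuum.
# ASSUMPTION: a single hypothetical boundary at 25 ms; real boundaries vary.
BOUNDARY_MS = 25.0


def label_syllable(vot_ms: float) -> str:
    """Map a continuous VOT value (in ms) onto one of two discrete categories."""
    return "ba" if vot_ms < BOUNDARY_MS else "pa"


def same_category(vot_a: float, vot_b: float) -> bool:
    """Under purely categorical perception, two sounds sound alike exactly when they share a label."""
    return label_syllable(vot_a) == label_syllable(vot_b)


if __name__ == "__main__":
    # Equal 10 ms physical steps along the continuum.
    for vot in range(0, 60, 10):
        print(f"{vot:2d} ms -> {label_syllable(vot)}")
    print(same_category(0, 10))   # both heard as "ba"
    print(same_category(20, 30))  # the pair straddles the boundary
```

Every 10 ms step is physically the same size, but only the step that crosses the boundary changes the label, which is the essence of categorical perception.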
The sounds "ba" and "pa" differ on the continuous dimension of voice onset time. The sounds "da" and "ga" also differ on a continuous dimension, although it is more complex than voice onset time (it involves the second formant, which is a little beyond the scope of this text). What matters here is that there is a continuum of sounds from "da" to "ga." The following demonstration uses computer-generated speech sounds: ten sounds were generated in equal steps from "da" to "ga," and the experiment uses Sounds 1, 4, 7, and 10. Sounds 1 and 4 are both heard as "da," whereas Sounds 7 and 10 are both heard as "ga."

In the task, subjects are presented with a randomly ordered series of sound pairs and asked, for each pair, to judge whether the two sounds are the same or different. Since Sounds 1 and 4 are both heard as "da," it should be very hard to tell them apart, so subjects usually judge them to be identical. By contrast, Sound 4 is heard as "da" while Sound 7 is heard as "ga." Because Sounds 4 and 7 lie on opposite sides of the "categorical boundary," it is easier to hear the difference between them than the difference between Sounds 1 and 4, even though the physical difference between Sounds 1 and 4 is the same as the difference between Sounds 4 and 7. By the same logic, the difference between Sounds 7 and 10 should be hard to hear.
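The logic of this demonstration can be made concrete with a simple labeling model: assume listeners covertly label each sound as "da" or "ga" and respond "different" only when the two labels differ. The sketch below additionally assumes a logistic identification function centered halfway along the 10-step continuum (at 5.5) with an arbitrary slope. Both parameters, and the function names `p_ga` and `p_judged_different`, are hypothetical and were not fitted to the stimuli described here.

```python
import math

# ASSUMPTIONS (hypothetical, for illustration only):
BOUNDARY = 5.5  # midpoint of the 10-step "da"-"ga" continuum
SLOPE = 2.0     # steepness of the identification function


def p_ga(step: int) -> float:
    """Probability that a sound at this continuum step is labeled "ga" (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (step - BOUNDARY)))


def p_judged_different(step_a: int, step_b: int) -> float:
    """Probability that two independently labeled sounds receive different labels."""
    pa, pb = p_ga(step_a), p_ga(step_b)
    return pa * (1 - pb) + pb * (1 - pa)


if __name__ == "__main__":
    for a, b in [(1, 4), (4, 7), (7, 10)]:
        print(f"Sounds {a} vs. {b}: P(different) = {p_judged_different(a, b):.2f}")
```

With these settings the cross-boundary pair (Sounds 4 and 7) gets a far higher probability of a "different" response than either within-category pair, even though all three pairs are three steps apart.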
The results from one subject in this demonstration experiment are shown below and can be interpreted as follows. When the comparison was between Sounds 1 and 4, the subject judged them to be different once and the same four times. When the comparison was between Sounds 4 and 7 (which straddle the category boundary), the subject correctly judged them to be different on all five trials. Finally, when comparing Sounds 7 and 10, the subject always judged the sounds to be the same. Thus, although every pair was three steps apart, the only pair this subject reliably heard as different was Sounds 4 and 7.
| Sound Pair | Judged different | Judged same |
|---|---|---|
| 1 vs. 4 | 1 | 4 |
| 4 vs. 7 | 5 | 0 |
| 7 vs. 10 | 0 | 5 |
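Converting the counts into proportions makes the pattern easier to compare across pairs. The short sketch below simply re-enters the table's counts (five trials per pair) and computes the proportion of "different" responses; the names `results` and `proportion_different` are illustrative.

```python
# Counts copied from the results table: (judged different, judged same).
results = {
    "1 vs. 4": (1, 4),
    "4 vs. 7": (5, 0),
    "7 vs. 10": (0, 5),
}


def proportion_different(different: int, same: int) -> float:
    """Share of trials on which the pair was judged different."""
    return different / (different + same)


for pair, (diff, same) in results.items():
    print(f"{pair}: {proportion_different(diff, same):.0%} judged different")
```

Only the cross-boundary pair reaches a high proportion of "different" responses; the two within-category pairs stay low.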
Not all results are as clear-cut as those shown above. Many people need more time to become familiar with the task than this demonstration allows. In any case, you should get a sense of how this kind of experiment works.
The hypothesis that speech is perceptually special arose from this phenomenon of categorical perception. Listeners readily distinguish /p/ from /b/, but distinguishing between different variants of /p/ is difficult and, for some listeners, impossible. This pattern fits the functional demands of language: the distinction between /p/ and /b/ signals a difference in meaning, whereas the distinction between two variants of /p/ does not. (Some languages do use two different /p/ sounds, and for speakers of those languages the two sounds would be perceived categorically.)

The first experiment to demonstrate categorical perception was conducted by Liberman, Harris, Hoffman, and Griffith (1957), who presented consonant-vowel syllables along a continuum. The consonants were the stop consonants (plosives) /b/, /d/, and /g/, each followed by /a/, as in /ba/. When asked to discriminate between syllables, participants reported different variants from the same category (for example, two versions of /ba/) to be the same, whereas syllables from different categories, such as /ba/ and /da/, were easily discriminated.
Another categorical perception task presents two syllables followed by a probe syllable, and participants must say which of the first two syllables the probe matches. If the first two syllables come from different categories (for example, /da/ and /ga/), participants match the probe accurately. If they come from the same category, however, participants cannot tell them apart well enough to do the matching task, and their performance is at chance.
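This probe-matching design is commonly called an ABX task. A simple labeling-only model predicts its accuracy: assume listeners covertly and independently label A, B, and X, match X to whichever of A or B shares its label, and guess when A and B receive the same label. Averaging over trials where X equals A and trials where X equals B gives a predicted accuracy of `0.5 + 0.5 * (p_A - p_B)**2`, where `p_A` and `p_B` are the probabilities of assigning A and B to the first category. The sketch below computes this closed form and checks it by enumerating every labeling outcome; the function names and the example probabilities are hypothetical, not measured data.

```python
from itertools import product


def predicted_abx_accuracy(p_a: float, p_b: float) -> float:
    """Closed-form ABX accuracy under the labeling-only model."""
    return 0.5 + 0.5 * (p_a - p_b) ** 2


def abx_accuracy_by_enumeration(p_a: float, p_b: float) -> float:
    """Same prediction, computed by summing over all covert label outcomes."""
    total = 0.0
    for x_is_a in (True, False):  # X equals A or B with probability 0.5 each
        p_x = p_a if x_is_a else p_b
        for la, lb, lx in product((True, False), repeat=3):
            prob = ((p_a if la else 1 - p_a)
                    * (p_b if lb else 1 - p_b)
                    * (p_x if lx else 1 - p_x))
            if la != lb:
                chose_a = lx == la  # X is matched to the stimulus sharing its label
                correct = 1.0 if chose_a == x_is_a else 0.0
            else:
                correct = 0.5  # identical labels: the listener guesses
            total += 0.5 * prob * correct
    return total


if __name__ == "__main__":
    # Hypothetical identification probabilities.
    print(predicted_abx_accuracy(0.95, 0.05))  # across-category pair
    print(predicted_abx_accuracy(0.95, 0.90))  # within-category pair
```

When A and B come from different categories the labels usually differ and accuracy is high; when they come from the same category `p_A` and `p_B` are nearly equal, so the prediction falls to chance (0.5), matching the pattern described above.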
Does the categorical perception of speech mean that speech is perceived via a specialized speech processor? Kewley-Port and Luce (1984) found no categorical perception for some non-speech stimuli, suggesting that there may be something special about speech.
For there to be a specialized speech processor, categorical perception should occur for all phonemes. However, Fry, Abramson, Eimas, and Liberman (1962) failed to find categorical perception with a vowel continuum, so vowels and consonants do not behave the same way in this respect. Additionally, chinchillas have been shown to perceive speech sounds categorically, despite lacking any specialized speech-processing mechanism (Kuhl, 1987).





