Speech recognition poses complex and wide-ranging problems. One of the main ones lies in the complexity of the speech signal itself. A signal such as signal 1 below presents a system with a large amount of information that is very difficult to interpret.
![Signal 1: an example speech signal]()
One of the more evident problems is the jaggedness of the signal. A natural speech signal is not smooth; it fluctuates almost continuously from start to finish. Another naturally occurring property of speech is fluctuation in the volume, or amplitude, of the signal. Different people emphasize different syllables, letters, or words in different ways, and two signals with different volume levels are very difficult to compare. Speech signals also contain a very large number of peaks in a short period of time. These peaks correspond to the syllables in the words being spoken. Comparing two signals becomes much harder as the number of peaks increases, because a single higher peak can easily skew the results and cause them to be interpreted incorrectly. The speed at which the input signal is given is also an important issue. A user who says their name faster or slower than they normally speak can change the results, because the two versions of the same pattern being compared span different lengths of time, and that difference must be accounted for. Finally, in speech verification, one individual may attempt to mimic the speech of another. If the imitation is good enough, the system may accept the impostor.
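Three of these problems (jaggedness, differing volume, and differing speaking speed) can be reduced by preprocessing before two signals are compared. The sketch below is one possible approach in plain Python, not necessarily the method used in this project. It assumes a signal is a list of amplitude samples, and it uses a moving average for smoothing, peak normalization for volume, and linear resampling to a common length for speed. All function names, the window size, and the target length are illustrative assumptions.

```python
import math

def resample(signal, length):
    """Linearly interpolate a signal to a fixed number of samples."""
    if length < 2 or len(signal) < 2:
        raise ValueError("need at least two samples")
    step = (len(signal) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        left = min(int(pos), len(signal) - 2)  # keep left + 1 in range
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[left + 1] * frac)
    return out

def smooth(signal, window=5):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def normalize_amplitude(signal):
    """Scale a signal so its largest absolute sample is 1.0."""
    peak = max(abs(s) for s in signal)
    if peak == 0:
        return list(signal)
    return [s / peak for s in signal]

def prepare(signal, length=200, window=5):
    """Bring a signal to a common duration, smoothness, and volume."""
    return normalize_amplitude(smooth(resample(signal, length), window))

def distance(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def toy_word(n, gain):
    """Hypothetical stand-in for a recording: two syllable-like bumps."""
    return [gain * abs(math.sin(2 * math.pi * i / (n - 1))) for i in range(n)]

if __name__ == "__main__":
    slow_loud = toy_word(400, 0.9)   # spoken slowly and loudly
    fast_quiet = toy_word(250, 0.3)  # same pattern, faster and quieter
    print(distance(prepare(slow_loud), prepare(fast_quiet)))
```

Linear resampling only corrects a uniform change in speed. A speaker who stretches one syllable but not another would call for a nonlinear alignment, and none of these steps addresses the imitation problem.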





Introduction to Speaker Identification

