OpenStax-CNX

This module is part of the collection "ELEC 301 Projects Fall 2006" and is affiliated with Rice University ELEC 301 Projects and Rice Digital Scholarship.

Key Problems in Speaker Identification

Module by: Chris Pasich

Summary: An explanation of the basic problems in analyzing speech patterns.

The Questions

The issues involved in speech recognition are complex and wide-ranging. One of the main problems lies in the complexity of the speech signal itself. As Figure 1 shows, such a signal presents a system with a large amount of information that is very difficult to interpret.

Figure 1: The word "diablo," with DC offset removed.

One of the most evident problems is the jaggedness of the signal. A natural speech signal is not smooth; instead, it fluctuates almost continuously.

Another natural property of speech is variation in volume, or amplitude. Different people emphasize different syllables, letters, or words in different ways, and two signals recorded at different volume levels are very difficult to compare.

Speech signals also contain a large number of peaks in a short period of time, corresponding to the syllables of the words being spoken. The more peaks there are, the harder two signals are to compare, because a single higher peak can easily skew the results and lead to an incorrect interpretation.

The speed of the input signal is also an important issue. If users say their name faster or slower than they normally speak, the two versions of the same pattern being compared span different lengths of time, and this difference must be accounted for or it can change the results.

Finally, in speaker verification, one person may attempt to mimic another person's speech. A sufficiently good imitation could be accepted by the system.

The Answers

How do you deal with the jaggedness of the signal and the noise introduced to the signal through the environment?

  • To account for this, all signals are passed through a smoothing filter, which accomplishes two tasks. First, it removes much of the excess noise. Second, it removes the high-frequency jaggedness of the signal, leaving behind essentially its overall magnitude. The result is a clean signal that is fairly easy to process.
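
The module does not say which smoothing filter it uses. As a minimal sketch, assuming a simple centered moving-average (boxcar) filter, one of the most basic low-pass filters, the smoothing step might look like this in Python. The function name `moving_average` and the default window length are illustrative choices, not taken from the original project.

```python
def moving_average(signal, window=5):
    """Smooth a sequence with a centered moving-average (boxcar) filter.

    Assumed stand-in for the module's unspecified smoothing filter. Each
    output sample is the mean of the input samples within window // 2
    positions on either side; the window is truncated at the edges.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        segment = signal[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# A sample-to-sample zigzag (the "jaggedness") is strongly attenuated:
# the input swings between 0 and 1, the output only between 1/3 and 2/3.
jagged = [0, 1, 0, 1, 0, 1, 0]
print(moving_average(jagged, window=3))
```

A longer window smooths more aggressively, but it also blurs genuine features of the signal, so the window length is a trade-off.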

How do you account for the different volumes of speakers?

  • All signals must be normalized to the same volume before they are examined. Each signal is centered about zero and scaled so that every signal has the same peak amplitude. Comparing two signals recorded at different volumes then gives the same result as comparing the same two signals recorded at equal volume.
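
A minimal Python sketch of this normalization, assuming that "normalized about zero" means removing the DC offset (subtracting the mean, the same kind of offset removal noted in the caption of Figure 1) and then dividing by the largest absolute sample so that every signal peaks at magnitude 1. The function name and the choice of 1 as the common peak are illustrative assumptions.

```python
def normalize(signal):
    """Center a signal about zero, then scale it to a peak magnitude of 1.

    Assumed interpretation of "normalized about zero": subtract the mean
    (removing any DC offset), then divide by the largest absolute value.
    """
    if not signal:
        return []
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]  # remove the DC offset
    peak = max(abs(x) for x in centered)
    if peak == 0:
        return centered  # constant (silent) input: nothing to scale
    return [x / peak for x in centered]


# The same shape recorded at two volumes becomes identical after normalization.
quiet = [0.1, -0.2, 0.1]
loud = [1.0, -2.0, 1.0]
print(normalize(quiet), normalize(loud))  # both approximately [0.5, -1.0, 0.5]
```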

How do you examine each of the individual peaks?

  • Right after the signal is smoothed by the filter, we use an envelope function to detect all of its peaks. This ensures that any peak exceeding a certain threshold is examined and compared with the corresponding signal in the database. Rather than analyzing the entire signal, the system performs a formant analysis: the individual formants (the resonant frequencies of the vocal tract, which are most prominent in vowel sounds) are examined and used to verify the speaker.
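
The module does not define its envelope function or its threshold. The sketch below assumes one simple approach: rectify the signal (take absolute values), smooth the result with a moving average to obtain a rough envelope, and report the local maxima of that envelope that exceed a threshold. The names `envelope` and `peaks_above_threshold`, the window length, and the threshold value are all hypothetical.

```python
def envelope(signal, window=5):
    """Rough amplitude envelope: rectify, then apply a centered moving average."""
    rectified = [abs(x) for x in signal]
    half = window // 2
    env = []
    for i in range(len(rectified)):
        lo, hi = max(0, i - half), min(len(rectified), i + half + 1)
        env.append(sum(rectified[lo:hi]) / (hi - lo))
    return env


def peaks_above_threshold(env, threshold):
    """Indices of strict local maxima in env whose value exceeds threshold.

    Only strict maxima are reported; a perfectly flat-topped peak is skipped.
    """
    peaks = []
    for i in range(1, len(env) - 1):
        if env[i] > threshold and env[i] > env[i - 1] and env[i] > env[i + 1]:
            peaks.append(i)
    return peaks


# Two loud bursts separated by near-silence give two envelope peaks above 0.5.
burst = [0.0, 0.5, -1.0, 0.5, 0.0, 0.05, -0.05, 0.0, 0.6, -1.0, 0.6, 0.0]
print(peaks_above_threshold(envelope(burst, window=3), threshold=0.5))  # [2, 9]
```

Only the peaks that survive the threshold would then be passed on to the formant analysis, so quiet stretches between syllables are ignored.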

How does the system handle varying speeds of inputs?

  • Both the formant analysis and the envelope function help with varying input speeds. The envelope locates the peaks and thus determines which vowels are present, while the formants themselves remain relatively unchanged when the speaking rate changes. Very fast speech is still difficult to handle, but most other speaking rates can be handled effectively.
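
The reason a frequency-domain feature tolerates changes in speaking rate can be illustrated with a toy example. Real formant estimation commonly relies on techniques such as linear predictive coding (LPC), which the module does not describe; as a deliberately crude stand-in, the sketch below takes the strongest bin of a naive discrete Fourier transform (DFT) and shows that a short ("fast") and a long ("slow") recording of the same 200 Hz tone yield the same frequency. The function names, the 8 kHz sample rate, and the test tone are assumptions for illustration only.

```python
import cmath
import math


def dominant_frequency(segment, sample_rate):
    """Frequency in Hz of the strongest non-DC bin of a naive DFT.

    A crude stand-in for formant estimation, used only to show that a
    frequency measurement does not depend on how long the sound lasts.
    O(n^2), so suitable only for short segments.
    """
    n = len(segment)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def tone(freq, n_samples, sample_rate):
    """n_samples of a pure sine wave at freq Hz."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n_samples)]


fs = 8000
fast = dominant_frequency(tone(200, 400, fs), fs)  # 0.05 s of the sound
slow = dominant_frequency(tone(200, 800, fs), fs)  # 0.10 s of the sound
print(fast, slow)  # 200.0 200.0
```

Real formants are broad resonances rather than single pure tones, so this only illustrates the principle that duration and frequency content are separate measurements.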

How can you account for imitating speech patterns?

  • Once again, the formants of the individual signals are analyzed to determine whether speakers are who they claim to be. In most cases, an imitator's formants do not closely match those stored in the database, and the imitator is rejected by the system.
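
The module does not say how closely formants must agree for a speaker to be accepted. A minimal sketch, assuming each speaker is enrolled as a short list of formant frequencies and that a claim is accepted only when every measured formant lies within a fixed relative tolerance of the enrolled value. The function name, the 5% tolerance, and the example frequencies below are hypothetical.

```python
def formants_match(measured, enrolled, tolerance=0.05):
    """Accept a claimed identity if every measured formant (Hz) lies within
    a relative tolerance of the corresponding enrolled formant.

    Hypothetical decision rule; the 5% default tolerance is illustrative.
    """
    if len(measured) != len(enrolled):
        return False
    return all(abs(m - e) <= tolerance * e for m, e in zip(measured, enrolled))


# Hypothetical enrolled formants for one vowel (F1, F2, F3 in Hz).
enrolled = [700.0, 1200.0, 2500.0]
genuine = [720.0, 1180.0, 2550.0]   # small deviations: accepted
imitator = [620.0, 1450.0, 2400.0]  # F1 and F2 too far off: rejected
print(formants_match(genuine, enrolled), formants_match(imitator, enrolled))  # True False
```

The tolerance trades false acceptances against false rejections: tightening it turns away more imitators, but also more genuine attempts.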
