
Connexions



Affiliated with: Rice University ELEC 301 Projects. This module is included in the lens "Rice University ELEC 301 Project Lens" by Rice University ELEC 301, as part of the collection "ELEC 301 Projects Fall 2006".


Key Problems in Speaker Identification

Module by: Chris Pasich.


Summary: An explanation of the basic problems in analyzing speech patterns.

The Questions

The issues with speech recognition in general are complex and wide-ranging. One of the main problems lies in the complexity of the speech signal itself: in signals such as the one in Figure 1 below, it is very difficult for a system to interpret the large amount of information presented to it.

Figure 1: The word diablo, with DC offset removed.

One of the more evident problems is the jaggedness of the signal. A natural speech signal is not smooth; instead, it fluctuates almost continuously.

Another naturally occurring property of speech is variation in the volume, or amplitude, of the signal. Different people emphasize different syllables, letters, or words in different ways, and two signals recorded at different volume levels are very difficult to compare.

Speech signals also contain a large number of peaks in a short period of time; these peaks correspond to the syllables of the words being spoken. Comparing two signals becomes harder as the number of peaks increases, because a single higher peak can easily skew the results and lead to an incorrect interpretation.

The speed at which the input signal is given is also an important issue. If a user says their name faster or slower than they normally speak, the results can change: the system compares two versions of the same pattern that were spoken over different lengths of time, and this difference must be accounted for.

Finally, in speaker verification, an impostor may attempt to mimic another person's voice. If the imitation is good enough, the system might accept the impostor.

The Answers

How do you deal with the jaggedness of the signal and the noise introduced to the signal through the environment?

  • To account for this, every signal is passed through a smoothing (low-pass) filter. The filter accomplishes two tasks: first, it removes much of the excess noise; second, it removes the high-frequency jaggedness and leaves behind only the overall magnitude of the signal. The result is a clean signal that is fairly easy to process.
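The module does not name the smoothing filter it uses. As a minimal sketch, assuming a full-wave rectifier (absolute value) followed by a moving-average low-pass filter with a hypothetical window length, the magnitude envelope could be computed like this:

```python
def smooth_envelope(samples, window=5):
    """Rectify a signal and smooth it with a moving-average (low-pass) filter.

    Sketch only: the moving average and the default window length are
    assumptions, not the filter used in the original project.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    rectified = [abs(s) for s in samples]  # keep the magnitude, drop the sign
    smoothed = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # slide the window forward
        count = min(i + 1, window)  # the window is shorter at the very start
        smoothed.append(running / count)
    return smoothed
```

For example, `smooth_envelope([1, -1, 1, -1, 1, -1], window=2)` returns `[1.0, 1.0, 1.0, 1.0, 1.0, 1.0]`: the rapid sign changes disappear and only the magnitude remains. A longer window smooths more aggressively but also blurs the boundaries between syllables.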

How do you account for the different volumes of speakers?

  • Before they are examined, all signals must be normalized to the same volume. Each signal is normalized about zero so that every signal has the same maximum and minimum values; comparing two recordings made at different volumes then becomes equivalent to comparing the same two recordings at equal volume.
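One simple way to normalize "about zero" is to subtract the mean (removing any DC offset) and then divide by the largest absolute value. This is a minimal sketch under that assumption; the module does not say exactly which normalization it applies.

```python
def normalize(samples):
    """Center a signal about zero, then scale so its largest magnitude is 1.

    Assumed method: mean removal followed by peak normalization.
    """
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # remove the DC offset
    peak = max(abs(s) for s in centered)
    if peak == 0:
        return centered  # silent (constant) input: nothing to scale
    return [s / peak for s in centered]
```

Because scaling a recording by a constant does not change its normalized form, `normalize([1, 3, 5])` and `normalize([2, 6, 10])` both return `[-1.0, 0.0, 1.0]`.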

How do you examine each of the individual peaks?

  • Just after the signal is smoothed by the filter, we use an envelope function to detect all of its peaks. This ensures that any peak exceeding a certain threshold will be examined and compared with the corresponding signal in the database. The analysis will not cover the entire signal; instead, it will be a formant analysis. The formants in the signal, the resonant frequencies that characterize vowel sounds, will be examined and used to verify the speaker.
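The module does not specify its peak-detection rule or threshold. As a hedged sketch, assuming a peak is any local maximum of the smoothed envelope that exceeds a fixed threshold, the selection step might look like this (the threshold value is hypothetical):

```python
def find_peaks(envelope, threshold):
    """Return indices of local maxima in the envelope that exceed threshold.

    A plateau is reported once, at its first sample, because the left
    neighbour must be strictly smaller.
    """
    peaks = []
    for i in range(1, len(envelope) - 1):
        is_local_max = envelope[i - 1] < envelope[i] >= envelope[i + 1]
        if is_local_max and envelope[i] > threshold:
            peaks.append(i)
    return peaks
```

With `find_peaks([0, 0.2, 0.9, 0.3, 0.1, 0.6, 0.2, 0.05, 0.3, 0.1], threshold=0.5)` the result is `[2, 5]`: the small bump at index 8 falls below the threshold and is ignored. Each returned index would mark a segment (roughly a syllable) to pass on to formant analysis.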

How does the system handle varying speeds of inputs?

  • Both the formant analysis and the envelope function will be used to help with varying input speeds. The envelope determines where the vowel segments occur, and the formants themselves remain relatively unchanged when the speaking rate changes. Very fast speech is difficult to handle, but most other speaking rates can be handled effectively.
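Why a frequency-domain feature tolerates changes in speed can be illustrated with a deliberately crude model. This sketch assumes a sustained vowel is a single 700 Hz tone (a hypothetical value) sampled at 8000 Hz, and finds its strongest frequency with a naive DFT. This is not real formant estimation, which typically relies on methods such as linear predictive coding; it only shows that stretching a sound in time changes its duration but not its frequency.

```python
import cmath
import math


def tone(freq_hz, n_samples, rate_hz=8000):
    """Crude stand-in for a sustained vowel: a single sinusoid."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]


def dominant_frequency(samples, rate_hz=8000):
    """Frequency (Hz) of the strongest DFT bin, using a naive O(N^2) DFT."""
    n_total = len(samples)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n_total // 2):  # skip the DC bin, stop below Nyquist
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_total)
            for n, x in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * rate_hz / n_total


short_vowel = tone(700, 400)  # 0.05 s: spoken quickly
long_vowel = tone(700, 800)   # 0.10 s: spoken slowly
print(dominant_frequency(short_vowel), dominant_frequency(long_vowel))
```

Both calls report 700.0 Hz, even though one "vowel" lasts twice as long as the other. The sample counts were chosen so that 700 Hz falls exactly on a DFT bin in both cases.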

How can you account for imitating speech patterns?

  • Once again, the formants of the individual signals are analyzed to determine whether a speaker is who he or she claims to be. In most cases, an imitator's formants do not closely match those stored in the database, and the imitator will be rejected by the system.
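The module does not describe its matching rule. As a minimal sketch, assuming each speaker is stored as a short list of formant frequencies and a claim is accepted only when every measured formant lies within a hypothetical relative tolerance of the stored one, the decision could be written as follows (all frequency values below are made up for illustration):

```python
def formants_match(measured, stored, tolerance=0.05):
    """Accept the claimed identity only if every measured formant (Hz) is
    within a relative tolerance of the stored formant.

    Hypothetical decision rule; the tolerance is an assumed parameter.
    """
    if len(measured) != len(stored):
        return False
    return all(abs(m - s) <= tolerance * s for m, s in zip(measured, stored))


stored_speaker = [730, 1090, 2440]  # hypothetical database entry
print(formants_match([720, 1100, 2460], stored_speaker))  # close: accepted
print(formants_match([850, 1220, 2500], stored_speaker))  # imitator: rejected
```

The first call returns `True` and the second returns `False`, because 850 Hz differs from 730 Hz by far more than 5%. Tightening the tolerance rejects more imitators but also risks rejecting the genuine speaker on an off day.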
