Voice Recognition Using MATLAB

Module by: David Roberts. E-mail the author

Summary: This module describes the implementation of a voice recognition algorithm in MATLAB. The algorithm uses the Discrete Fourier Transform to compare the frequency spectra of two voices. Chebyshev's Inequality is then used to determine (with reasonable certainty) whether two voices came from the same person. All material in this module is the result of a course project for a Partial Differential Equations course (Math 480) held at California State University Northridge during the Fall 2009 semester. The project was carried out under the guidance of Professors Carol Shubin and Gloria Melara.

Voice Recognition M-Files

Click here to download.

Initial Problem

A human can easily recognize a familiar voice; however, getting a computer to distinguish a particular voice from others is a more difficult task. Several problems arise immediately when trying to write a voice recognition algorithm. Most of these difficulties stem from the fact that it is almost impossible to say a word exactly the same way on two different occasions. Factors that continually change in human speech include how fast the word is spoken and which parts of the word are emphasized. Furthermore, even if a word could be said the same way on different occasions, another major dilemma would remain: in order to analyze two sound files in the time domain, the recordings would have to be aligned so that both begin at precisely the same moment.

How to Compare Recordings

Frequency Domain

Given the difficulties mentioned in the above paragraph, it became evident that any voice analysis in the time domain would be extremely impractical. Instead, an analysis of the frequency spectrum of a voice (which remains predominantly unchanged as speech is slightly varied) turned out to be a more viable option. Converting all recordings into the frequency domain (by applying the Discrete Fourier Transform) greatly simplified the process of comparing two recordings. That said, working in the frequency domain introduced a new set of issues that required attention.
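The alignment argument above can be illustrated with a small sketch. It is written in Python using only the standard library, not taken from the project's m-files, and the two-tone test signal and the 5-sample delay are hypothetical. It shows that delaying a signal changes its samples substantially while leaving the magnitudes of its DFT coefficients unchanged. Real recordings are not exact circular shifts of one another, so this is an idealized case. In MATLAB, the built-in fft function computes the same transform far more efficiently.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT coefficient, using the O(N^2) definition directly."""
    n = len(samples)
    magnitudes = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitudes.append(abs(coeff))
    return magnitudes


# Hypothetical test signal: two tones at bins 3 and 7, sampled at 64 points.
N = 64
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]
# The same signal "recorded" 5 samples later (a circular shift).
shifted = signal[-5:] + signal[:-5]

# Largest sample-by-sample difference in the time domain.
time_gap = max(abs(a - b) for a, b in zip(signal, shifted))
# Largest difference between the two magnitude spectra (rounding error only).
spectrum_gap = max(
    abs(a - b) for a, b in zip(dft_magnitudes(signal), dft_magnitudes(shifted))
)

print(f"time-domain gap: {time_gap:.3f}")
print(f"spectrum gap:    {spectrum_gap:.2e}")
```

The time-domain gap is comparable to the signal amplitude, while the spectrum gap is at the level of floating-point rounding.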

Finding a Norm

Due to the nature of human speech, all data pertaining to frequencies above 600 Hz was discarded in this project. Once a recording is converted into the frequency domain, it can therefore be regarded as a vector in 600-dimensional Euclidean space. A comparison between two such vectors is carried out by normalizing the vectors (giving them length 1) and then computing the norm of the difference between the two (the difference between two vectors in R^600 is computed componentwise). Which norm to use is not immediately clear. After comparing the Taxicab, Euclidean, and Maximum norms, it became clear that the Euclidean norm most accurately measured the closeness between different frequency spectra. Once the norm was chosen, all that remained was to decide how small the norm of the difference of two vectors had to be in order to conclude that both recordings originated from the same person.
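The comparison described above can be sketched as follows. This is an illustration in Python rather than the project's MATLAB code; the function names are hypothetical, and two 3-dimensional vectors stand in for the 600-dimensional spectra.

```python
import math


def normalize(v):
    """Scale a vector to Euclidean length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def taxicab_distance(u, v):
    """Taxicab (1-norm) of the componentwise difference."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Euclidean (2-norm) of the componentwise difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def maximum_distance(u, v):
    """Maximum (infinity-norm) of the componentwise difference."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical toy "spectra"; normalizing removes overall loudness.
a = normalize([3.0, 4.0, 0.0])   # becomes [0.6, 0.8, 0.0]
b = normalize([4.0, 3.0, 0.0])   # becomes [0.8, 0.6, 0.0]

print(f"taxicab:   {taxicab_distance(a, b):.4f}")    # 0.4000
print(f"euclidean: {euclidean_distance(a, b):.4f}")  # 0.2828
print(f"maximum:   {maximum_distance(a, b):.4f}")    # 0.2000
```

Because both vectors are scaled to length 1 first, a louder recording of the same spectrum yields the same normalized vector and therefore a distance of zero.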

Chebyshev's Inequality

Recall that Chebyshev's Inequality states that, for any distribution with finite variance, at least 1 - 1/k^2 of all measurements fall within k standard deviations of the mean. In particular, at least 3/4 of all measurements from the same population fall within 2 standard deviations of the mean. Hence, in response to the problem posed at the end of the previous paragraph, the following solution can be formulated:

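By requiring that the norm of the difference fall within 2 standard deviations of the average normalized voice, we are assured that the algorithm accepts a genuine recording of the enrolled speaker at least 3/4 of the time. This bound concerns accepting the genuine speaker; it does not by itself limit how often a different speaker is accepted.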
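One way to read this rule is sketched below in Python. The module does not spell out the exact statistic used in project.m, so several choices here are assumptions: the reference is the componentwise average of the normalized enrollment spectra, the statistic is each spectrum's Euclidean distance to that average, and a new recording is accepted when its distance is at most the mean distance plus 2 standard deviations. This one-sided region contains the two-sided band, so Chebyshev's bound of 3/4 still applies. All names and the toy data are hypothetical.

```python
import math
import statistics


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def enroll(spectra):
    """Build a reference (average, mean distance, std) from one speaker."""
    unit = [normalize(s) for s in spectra]
    average = [sum(column) / len(unit) for column in zip(*unit)]
    distances = [euclidean(s, average) for s in unit]
    # Population standard deviation (assumption), so that Chebyshev's
    # bound holds exactly for the enrolled samples themselves.
    return average, statistics.mean(distances), statistics.pstdev(distances)


def is_same_speaker(spectrum, average, mean_d, std_d, k=2.0):
    """Accept when the distance is at most mean + k standard deviations."""
    return euclidean(normalize(spectrum), average) <= mean_d + k * std_d


# Hypothetical toy spectra standing in for the enrollment recordings.
enrolled = [
    [10.0, 1.0, 0.5],
    [9.0, 1.5, 0.4],
    [11.0, 0.8, 0.7],
    [10.5, 1.2, 0.3],
    [9.5, 0.9, 0.6],
]
average, mean_d, std_d = enroll(enrolled)

accepted = sum(is_same_speaker(s, average, mean_d, std_d) for s in enrolled)
impostor = [0.0, 0.0, 1.0]  # energy concentrated in a different component

print(f"enrolled recordings accepted: {accepted} of {len(enrolled)}")
print("impostor accepted:", is_same_speaker(impostor, average, mean_d, std_d))
```

With k = 2, Chebyshev's Inequality guarantees that at least 3/4 of the enrollment recordings pass their own threshold. Whether an impostor is rejected depends on how different the spectra are, not on the inequality.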

Algorithm Instructions

All files pertaining to the algorithm are located in the zip file VoiceRecognition.zip, which can be downloaded from the link. The following is a short synopsis of how to run the software.

Short Description

As mentioned before, all files pertaining to the project can be accessed using the link: Voice Recognition. Once the zip file is opened, the following folders are accessible:
David's Recordings
Matlab Files

The contents of these folders will now be discussed in more detail. The folder Matlab Files contains 10 audio recordings of David Roberts saying his name, 'David'. It also contains the two m-files project.m and voicerec.m.

The script file project.m is the voice recognition algorithm that accomplishes the goals of the class project. It can be executed by typing 'project' in the command window. Make sure that the current directory in MATLAB is set to the directory that contains project.m and the 10 audio recordings g1.wav through g10.wav. Once project.m is run, it requests that you "Enter the name that must be recognized". Since the recordings in that folder are of David Roberts, type in 'David'. Next, the program informs you that you have 2 seconds to say the name 'David'. After recording, MATLAB plays back the sample and gives you the option to try again or to proceed if satisfied. A plot is then generated showing how the normalized frequency spectrum of your voice (top window) compares to the average normalized vector of David's voice (bottom window). See the figure below for an example. At this point, the algorithm makes the comparison. If you fall within 2 standard deviations of the average normalized voice, the command window displays 'HELLO DAVID!!!'; otherwise it displays 'YOU ARE NOT DAVID!!!!'.

Figure 1: Example of a Frequency Spectra Comparison.

The second m-file in that folder is voicerec.m. This script file is executed by typing 'voicerec' in the command window. Running voicerec.m prompts the user to record their name 10 times. The recordings are then saved as g1.wav through g10.wav in the current directory, so the ten new recordings replace the recordings of David Roberts. This converts project.m into a voice recognition algorithm for the user's voice (as opposed to the voice of David Roberts). In this case, the user's name should be entered as the name to be recognized (instead of 'David') when running project.m. Lastly, since voicerec.m replaces g1.wav through g10.wav in the directory, back-up copies of David Roberts' recordings are conveniently stored in the folder David's Recordings.
