Time Domain Pitch Correction

Module by: Gareth Middleton. Part of the collection "ECE 301 Projects Fall 2003" (Rice University ELEC 301 Projects).

Summary: Two algorithms operating in the time domain to change the pitch of solo human voice.

Introduction

Time domain pitch shifting algorithms have several advantages over frequency domain approaches. First, the formants of the original signal can be preserved, so the timbre of the input signal is largely unaffected. Second, the computational complexity is much lower for time domain algorithms because there is no need to take transforms of the data. We created and used two different algorithms; the major difference between them is how the section of signal to be overlapped and added is selected.

Through this overlap-and-add approach the signal retains most of its original shape. For both algorithms the original signal is broken down into overlapping windows of a specified size and hop size (which should be consistent with the values provided to the detection algorithm). Then, for each window, the detected period (one divided by the detected fundamental frequency) and the target period are computed and used to build up the new data for that window. After each window is constructed, as described under the two approaches, the windows themselves are overlapped and added to create the new pitch-corrected output signal. When the detection algorithm decides that a given window is unvoiced (i.e. has no fundamental frequency), both algorithms copy that window as is, without any modification. A Hanning window is applied to each output window before the windows are added together. This smooths the transitions so that there are no large discontinuities between added segments.
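
The windowing and overlap-add framework described above can be sketched as follows. This is a minimal illustration rather than the project's code: the names `hann`, `period_in_samples`, `overlap_add` and the `process` callback are hypothetical, the hop is assumed to be half the window size, and a periodic Hann window is assumed so that overlapping windows sum to one.

```python
import math

def hann(n):
    """Periodic Hann window of length n (assumed form; sums to 1 at 50% overlap)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def period_in_samples(freq_hz, sample_rate):
    """Period is one divided by the frequency, expressed here in samples."""
    return sample_rate / freq_hz

def overlap_add(signal, win_size, hop, process):
    """Cut `signal` into overlapping windows, rebuild each one with `process`,
    Hann-weight the rebuilt window, and sum it back at its original offset.

    `process(frame, index)` is a hypothetical callback that returns the new
    data for one window (same length as `frame`). For an unvoiced window it
    would simply return `frame` unchanged.
    """
    out = [0.0] * len(signal)
    w = hann(win_size)
    for index, start in enumerate(range(0, len(signal) - win_size + 1, hop)):
        frame = signal[start:start + win_size]
        new_frame = process(frame, index)
        for i in range(win_size):
            out[start + i] += w[i] * new_frame[i]
    return out
```

With a pass-through `process` (every window treated as unvoiced) and a hop of half the window size, the interior of the output matches the input, which is a convenient sanity check before plugging in either correction algorithm.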

Figure 1: Sample of human voice (tgraph1.jpg). Several periods of a human voice holding a note.

PSOLA: Pitch-Synchronous Overlap-Add

The key to PSOLA is the determination and utilization of pitch markers in the original signal. The idea is that these markers should be equally spaced throughout the signal (at intervals equal to the detected fundamental period), but also that each should be placed where the signal has a maximum value (a peak). These two constraints are often in conflict, especially since our assumption that the fundamental period is constant for the entire window is not entirely true. As a result, following the highest peak in the signal from period to period may require relaxing the requirement that the markers be exactly equally spaced. On the other hand, if we only follow the maximum peak without regard for the fundamental period, our markers no longer reflect the pitch of the window and are not useful.

In order to strike this compromise, we created a matrix in which each column contains two periods of the signal and the center row starts at 0 and increments by one period each column. Then we used a dynamic path-finding algorithm (created by Vladimir Goncharoff and Patrick Gries from the University of Illinois at Chicago) to find a path that passes through the maximum peaks as much as possible, but which does not exceed a given slope as it crosses the matrix. Since a slope of 0 (a horizontal line) means the markers are equally spaced, the maximum allowed slope is the parameter that is adjusted to strike the compromise between following peaks and maintaining periodicity. Empirically, we found a suitable value of this slope to be around 4. In the diagram below these pitch marks are labeled m_(i-1), m_i and m_(i+1).
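
To make the idea concrete, here is a minimal dynamic-programming sketch of slope-limited peak tracking through such a matrix. It is not the Goncharoff and Gries algorithm itself, only a simple stand-in built on several assumptions: the function names are hypothetical, the period is an integer number of samples, the slope is measured in rows (samples) per column, the path score is the plain sum of sample values, and samples outside the signal count as zero.

```python
def build_matrix(signal, period):
    """Column k holds the 2*period + 1 samples centered on sample k*period.
    Samples that fall outside the signal are treated as zero (an assumption)."""
    n_cols = len(signal) // period + 1
    cols = []
    for k in range(n_cols):
        center = k * period
        cols.append([signal[center + off] if 0 <= center + off < len(signal) else 0.0
                     for off in range(-period, period + 1)])
    return cols  # cols[k][r], where row r corresponds to offset r - period

def best_path(cols, max_slope):
    """Pick one row per column, maximizing the summed sample value while
    moving at most max_slope rows between neighboring columns."""
    n_rows = len(cols[0])
    score = list(cols[0])
    back = []
    for col in cols[1:]:
        new_score, ptr = [], []
        for r in range(n_rows):
            lo, hi = max(0, r - max_slope), min(n_rows - 1, r + max_slope)
            prev = max(range(lo, hi + 1), key=lambda q: score[q])
            new_score.append(score[prev] + col[r])
            ptr.append(prev)
        back.append(ptr)
        score = new_score
    # Trace the best final row back to the first column.
    r = max(range(n_rows), key=lambda q: score[q])
    path = [r]
    for ptr in reversed(back):
        r = ptr[r]
        path.append(r)
    path.reverse()
    return path

def pitch_markers(signal, period, max_slope=4):
    """Convert the chosen rows back to sample positions in the signal."""
    path = best_path(build_matrix(signal, period), max_slope)
    return [k * period + (r - period) for k, r in enumerate(path)]
```

Setting `max_slope` to 0 forces perfectly equally spaced markers, while a large value lets the path chase the highest peak in every column; intermediate values trade one constraint against the other.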

Figure 2: Pitch markers across windows (tgraph3.jpg).

The matrix described is pictured graphically in the top graph of the figure above (cyan is zero, dark blue is negative, red is positive). Below that is a matrix that shows the two periods around these pitch markers (found by this path), with the pitch marker itself in the center of each column. As you can see, the peaks seem to move across the matrix in a straight line, meaning that when we overlap and add these segments, the peaks will be added on top of one another. This reduces phase problems with constructive and destructive interference between the peaks (which is why the algorithm is pitch-synchronous).

Having marked the boundaries of the regions to extract from the original signal, their new locations need to be defined (where they will end up in the output signal). A vector of new pitch markers is created. It begins with the first old pitch marker (found above), which sets the phase offset, and continues with markers equally spaced at intervals equal to the desired fundamental period. For each new marker, the closest marker in the original signal is found, and the two periods centered on that marker are Hanning windowed and copied to the output signal, centered on the new marker. Depending on whether the frequency is being raised or lowered, some pitch markers in the original signal may be used more than once, or not at all. The result is a signal whose waveform retains the shape of the original but has a shorter or longer period (depending on the amount and direction of the shift). Hence, the pitch is shifted without altering the qualities of the voice that produced the sound.
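
The resynthesis step just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the project's implementation: the function names are hypothetical, both periods are integer sample counts, the old markers are already known, and a symmetric Hann window of length 2*period + 1 is used so that its peak sits exactly on the marker.

```python
import math

def hann_symmetric(n):
    """Symmetric Hann window: zero at both ends, one at the center (odd n)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def psola_resynthesize(signal, old_markers, old_period, new_period):
    """Place Hann-windowed two-period segments at equally spaced new markers."""
    out = [0.0] * len(signal)
    window = hann_symmetric(2 * old_period + 1)
    new_marker = old_markers[0]          # the first old marker sets the phase offset
    while new_marker < len(signal):
        # Closest marker in the original signal to this new marker.
        nearest = min(old_markers, key=lambda m: abs(m - new_marker))
        for j in range(-old_period, old_period + 1):
            src, dst = nearest + j, new_marker + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += window[j + old_period] * signal[src]
        new_marker += new_period         # spacing equals the desired period
    return out
```

When the period is shortened (pitch raised), more new markers fit into the window than old ones exist, so some old markers are reused; when it is lengthened, some old markers are skipped.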

Figure 3: Original signal modified using the PSOLA algorithm (tgraph2.jpg). The sample shown at the beginning of the module after a pitch shift performed with the PSOLA algorithm.

Time Shifting

This algorithm is based loosely on a paper written by Keith Lent from the University of Texas. As our project already had a separate component for pitch detection, many of the topics in the paper did not apply.

First, the first two periods of the original signal are located (using our knowledge of the detected frequency for the window). We then apply a Hanning window to these two periods and copy them at intervals of the new desired period. This is very similar to PSOLA, except that we do not place pitch markers throughout the original signal and locate the one closest to each output marker. Instead, we always use the first two periods in the window and copy them centered on each new pitch marker, under the assumption that each period of the signal will be largely the same in a window that covers only a few milliseconds. Again, the result is a waveform with much the same shape as the original (at least in general) but a different period, and thus a modified fundamental frequency.
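
A minimal sketch of this simpler scheme follows. It is an illustration under assumptions rather than the code used in the project or in Lent's paper: the function names are hypothetical, both periods are integer sample counts, the window holds at least two periods, and the first new marker is assumed to sit one old period into the window so that the first copy lands where the original segment was.

```python
import math

def hann(n):
    """Periodic Hann window of length n; equals one at index n // 2 for even n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def time_shift(frame, old_period, new_period):
    """Repeat the Hann-windowed first two periods of `frame` at new_period spacing."""
    seg_len = 2 * old_period
    window = hann(seg_len)
    grain = [window[i] * frame[i] for i in range(seg_len)]
    out = [0.0] * len(frame)
    marker = old_period                      # assumed position of the first new marker
    while marker - old_period < len(frame):
        start = marker - old_period          # grain is centered on the marker
        for i, value in enumerate(grain):
            if 0 <= start + i < len(out):
                out[start + i] += value
        marker += new_period
    return out
```

Because only one grain is ever extracted, no marker search is needed, which is the main saving over PSOLA; the cost is that any variation between periods inside the window is discarded.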

Figure 4: Overview of the time-shifting algorithm (tgraph5.jpg). One period of the original signal is shown in the topmost graph. Pitch markers calculated by the algorithm are shown in the second and third graphs, along with a copy of the single period placed after them. The sum of these signals is the corrected output, shown in the bottommost graph.

The figure presented below offers a visual comparison of these two algorithms. The graph on the left shows about two periods of the original signal, whereas the graph on the right shows the output signal during the same time interval for both the PSOLA (red) and time-shifting (blue) algorithms. By inspection, both algorithms produce similar output, but the PSOLA output more closely resembles the shape of the original signal. An informal listening test confirms that the PSOLA algorithm sounds better.

Figure 5: Comparison (n1.jpg).

Figure 6: Original signal corrected with the time-shifting algorithm (tgraph4.jpg). This is the original signal after having been pitch-shifted using the time-shifting algorithm.

References

  1. Schnell N., Peeters G., Lemouton S., Manoury P., and Rodet X. (2000). Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA). International Computer Music Conference 2000.
  2. Hamon C., Moulines E., and Charpentier F. (1995). A diphone synthesis system based on time-domain prosodic modifications of speech. Centre National d'Etudes des Telecommunications, France, S5.7, p. 238.
  3. Lent K. (1989). An efficient method for pitch shifting digitally sampled sounds. Computer Music Journal, Vol. 13, No. 4.
  4. Goncharoff V. and Gries P. (1998). An algorithm for accurately marking pitch pulses in speech signals. International Conference on Signal and Image Processing.
