Frequency Domain Pitch Correction

Module by: Gareth Middleton.

Summary: An algorithm for modifying the pitch of a solo human voice in the frequency domain.

Introduction

As with the time domain techniques for pitch shifting, we process the signal one window at a time, with the window length and the hop size between windows specified as inputs to the algorithm. We multiply each window of samples by a Hanning window and then take its FFT. Since the FFT of a real signal is conjugate-symmetric, half of the coefficients are redundant and we need only work with the other half. At the end, we recreate the discarded coefficients by complex conjugation.
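The per-window analysis described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function names, the 8 kHz sample rate, the 440 Hz test tone, and the 512-sample window are all assumptions chosen for the example. `np.fft.rfft` returns only the non-redundant half of the spectrum, and `np.fft.irfft` rebuilds the conjugate half implicitly.

```python
import numpy as np

def analyze_frame(signal, start, win_len):
    """Window one frame and return the non-redundant half of its FFT."""
    frame = signal[start:start + win_len]
    window = np.hanning(win_len)            # Hanning window, as in the text
    return np.fft.rfft(frame * window)      # win_len // 2 + 1 complex bins

def synthesize_frame(half_spectrum, win_len):
    """Return the time-domain frame; irfft supplies the conjugate half."""
    return np.fft.irfft(half_spectrum, n=win_len)

# Hypothetical test signal: a 440 Hz tone sampled at 8 kHz.
fs = 8000.0
x = np.sin(2 * np.pi * 440.0 * np.arange(1024) / fs)

X = analyze_frame(x, 0, 512)
y = synthesize_frame(X, 512)

# The full FFT of the same frame, to show the redundancy directly.
full = np.fft.fft(x[:512] * np.hanning(512))
```

With no processing between the two calls, `y` is simply the windowed frame again, which is a convenient sanity check before any peak shifting is added.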

Modified Phase Vocoder

Method

First, we need to identify peaks in the spectrum. For computational simplicity, we define a peak as any FFT bin whose magnitude is greater than the magnitudes of its two nearest neighbors on each side. We then assume that the area around a peak (as far away as halfway to the next peak) is part of the peak, or is in the peak's region of influence. Thus, wherever we shift the peak, this region will move with it. The required shift is different for every peak, so to determine it we must identify the frequency of the underlying sinusoid that caused the peak. We do this by fitting a parabola to the peak bin and its neighbor on either side. This involves solving three linear equations (using a matrix multiply). We then find the vertex of this parabola and take that point to be the frequency of the peak. Each peak is then shifted to a multiple of its current location, so lower frequency peaks do not shift as far as higher frequency peaks. The multiplying factor is the ratio of the target frequency to the detected frequency.
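A minimal sketch of the peak picking, region assignment, and parabolic fit described above is given here. Several details are assumptions made for the example: the parabola is fitted to the linear magnitude (the text does not say whether magnitude or log magnitude is used), a bin exactly halfway between two peaks is assigned to the upper peak, and the test tone, sample rate, and 450 Hz target pitch are hypothetical. All function names are illustrative.

```python
import numpy as np

# Inverse of the system  a*x^2 + b*x + c = y  sampled at x = -1, 0, +1,
# precomputed so that each parabola fit is a single matrix multiply.
FIT = np.linalg.inv(np.array([[1.0, -1.0, 1.0],
                              [0.0,  0.0, 1.0],
                              [1.0,  1.0, 1.0]]))

def find_peaks(mag):
    """Bins whose magnitude exceeds the two nearest neighbors on each side."""
    return [k for k in range(2, len(mag) - 2)
            if mag[k] > max(mag[k - 2], mag[k - 1], mag[k + 1], mag[k + 2])]

def region_bounds(peaks, n_bins):
    """Half-open [start, end) regions, cut halfway between adjacent peaks."""
    cuts = [(a + b + 1) // 2 for a, b in zip(peaks[:-1], peaks[1:])]
    return list(zip([0] + cuts, cuts + [n_bins]))

def refine_peak(mag, k):
    """Vertex (in fractional bins) of the parabola through bins k-1, k, k+1."""
    a, b, _ = FIT @ mag[k - 1:k + 2]
    return k - b / (2.0 * a)

# Hypothetical example: one Hanning-windowed 440 Hz tone at fs = 8 kHz.
fs, n = 8000.0, 1024
tone = np.sin(2 * np.pi * 440.0 * np.arange(n) / fs)
mag = np.abs(np.fft.rfft(tone * np.hanning(n)))

peaks = find_peaks(mag)
strongest = max(peaks, key=lambda k: mag[k])
vertex = refine_peak(mag, strongest)
detected_hz = vertex * fs / n

target_hz = 450.0                      # hypothetical target pitch
ratio = target_hz / detected_hz        # same factor for every peak
shift_bins = vertex * (ratio - 1.0)    # so higher peaks move further
```

The fitted vertex lands within a fraction of a bin of the true frequency, which is much finer than the bin spacing of `fs / n` alone would allow.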

Figure 1: Partial spectrum of original signal (fgraph1.jpg). Note the clear peaks, but note also that they are not single points but rather slightly spread over several bins.

Since the number of bins by which we need to shift a peak is unlikely to be an integer, and the FFT bins are discrete, we use linear interpolation to determine the values to assign to the bins that the peak and its surrounding region shift into. A more sophisticated method of interpolation could work as well, but it would add a great deal of complexity to what is already a very expensive algorithm. We then add the peak, with its interpolated values, into its new location in the spectrum and subtract the values of the original peak from its original location (thus cutting and pasting it, in a sense, rather than just copying it). If a shift would move any bins beyond either end of the half spectrum, we assume they have moved into the negative frequencies and reflect them back into the positive frequencies with a complex conjugation, since the signal is real.
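The cut-and-paste step with linear interpolation might look like the sketch below. It assumes an even frame length, so the half spectrum runs from DC to the Nyquist bin, and it assumes shifts smaller than the length of the half spectrum, so a single reflection is enough. The function name and the toy 17-bin spectrum are illustrative only. Each source bin is split linearly between the two destination bins that straddle its new position, which is equivalent to linearly interpolating the shifted region at the integer bins.

```python
import numpy as np

def move_region(spectrum, out, start, end, shift):
    """Cut bins [start, end) of `spectrum` and paste them `shift` bins away.

    `out` should start as a copy of `spectrum`; `shift` may be fractional.
    """
    last = len(out) - 1                      # Nyquist bin (even frame length)
    base = int(np.floor(shift))
    frac = shift - base
    out[start:end] -= spectrum[start:end]    # remove the peak from its old place
    for k in range(start, end):
        for dest, weight in ((k + base, 1.0 - frac), (k + base + 1, frac)):
            value = weight * spectrum[k]
            if dest < 0:                     # moved below DC: reflect
                dest, value = -dest, np.conj(value)
            elif dest > last:                # moved above Nyquist: reflect
                dest, value = 2 * last - dest, np.conj(value)
            out[dest] += value

# Toy example: a single nonzero bin moved up by 2.25 bins.
spectrum = np.zeros(17, dtype=complex)
spectrum[10] = 1.0 + 1.0j
out = spectrum.copy()
move_region(spectrum, out, 9, 12, 2.25)
```

After the call, the energy that was in bin 10 is shared between bins 12 and 13 with weights 0.75 and 0.25, and bin 10 itself is empty.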

Finally, we must adjust the phase of the peak and its surrounding region to account for the change we have made to its frequency. We multiply by a phasor e^(j*dw*h), where dw is delta omega, the change in frequency, and h is the hop size between windows. We apply this phasor to all the bins in the region around the peak, which preserves the phase relationships of the original signal within each peak and maximizes frame-to-frame phase coherence. One last difficulty is that these phasors must be accumulated from one frame to the next. Since every peak has a different dw, and thus a different phasor, this requires tracking peaks across frames. One simple way of dealing with this is to look up the region of the previous frame that contains the bin where the current peak is located, assume that the peak influencing that region is the same peak as the current one, and accumulate the phasor accordingly. This works under the assumption that audio signals produced by a singer must change frequency smoothly, so a peak cannot have moved far from one frame to the next, as the time difference is very small.
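One way to carry the phasors from frame to frame, following the simple lookup described above, is to store the accumulated phase per bin. The sketch below assumes dw is expressed in radians per sample and h in samples, so that dw*h is an angle in radians; the function name, the region layout, and the numeric values are hypothetical.

```python
import numpy as np

def update_phases(prev_bin_phase, regions, peaks, delta_omegas, hop):
    """Accumulated phase per bin for the current frame.

    prev_bin_phase[k] is the accumulated phase of whichever region owned
    bin k in the previous frame (all zeros before the first frame).
    """
    bin_phase = np.zeros_like(prev_bin_phase)
    for (start, end), peak, dw in zip(regions, peaks, delta_omegas):
        # Assume the region that covered this bin last frame was the same peak.
        phase = prev_bin_phase[peak] + dw * hop
        bin_phase[start:end] = np.mod(phase, 2.0 * np.pi)
    return bin_phase

# Hypothetical frame layout: two peaks, each with its own region and dw.
hop, n_bins = 128, 257
regions = [(0, 30), (30, 257)]     # half-open [start, end) bin ranges
peaks = [20, 41]
delta_omegas = [0.01, 0.02]        # radians per sample (assumed units)

phase = np.zeros(n_bins)
for _ in range(3):                 # three consecutive frames
    phase = update_phases(phase, regions, peaks, delta_omegas, hop)

spectrum = np.ones(n_bins, dtype=complex)
rotated = spectrum * np.exp(1j * phase)   # one phasor per region
```

Because every bin of a region receives the same phasor, the relative phases inside the region are left untouched and only the magnitude-preserving rotation changes between frames.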

Figure 2: Partial spectrum of shifted signal (fgraph2.jpg). Direct comparison is difficult, but the peaks have been shifted. The first peak was shifted by only a few bins, corresponding to about 5% of its original frequency, while the higher peaks were shifted by many more bins, corresponding to 5% of their original frequencies.

Once the phase has been adjusted, the second half of the FFT can be recreated by complex conjugation. Taking the inverse FFT then gives the samples for this window of the output signal. By overlapping and adding these windows in the same manner in which they were analyzed, we create an output signal corresponding to a pitch-corrected version of the input.
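The resynthesis loop can be sketched as below, with the spectral processing left out so that only the analysis and overlap-add framework remains. The window length, hop size, and test tone are assumptions. The example also uses a periodic Hanning window and a hop of one quarter of the window length, a choice made here (not stated in the text) because the overlapped windows then sum to exactly 2, so the unprocessed output is the input scaled by a constant.

```python
import numpy as np

def overlap_add(frames, hop):
    """Sum equal-length frames, each offset by `hop` samples from the last."""
    win_len = len(frames[0])
    out = np.zeros(hop * (len(frames) - 1) + win_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + win_len] += frame
    return out

win_len, hop = 512, 128            # assumed sizes, hop = win_len / 4
# Periodic Hanning window (assumption): overlapped copies sum to exactly 2.
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_len) / win_len)
x = np.sin(2 * np.pi * 440.0 * np.arange(4096) / 8000.0)

frames = []
for start in range(0, len(x) - win_len + 1, hop):
    half = np.fft.rfft(x[start:start + win_len] * window)
    # Peak shifting and phase adjustment would modify `half` here.
    frames.append(np.fft.irfft(half, n=win_len))

y = overlap_add(frames, hop)
```

Away from the first and last window, where fewer than four windows overlap, `y` equals `2 * x`; dividing by that constant restores the original level.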

Limitations of the modified phase vocoder

There are three key problems with this approach. First, it is painfully slow. Taking the transform and fitting many parabolas within every window is extremely computationally expensive. Second, the formants of the singer (seen in the Fourier domain as the spectral envelope) are stretched or compressed depending on the direction of frequency shift. In reality, a singer's formants should not change when singing a higher or lower note. For small shifts, this will not be terribly noticeable, but for large shifts it will become problematic and very detectable. Finally, even with the phase correction, the output of the algorithm still sounds "phasy". The overlapping windows interfere constructively and destructively to create an effect somewhat like reverberation in a concert hall. The output signal seems to have less presence, or to be more distant from the microphone than the input signal.

Advantages

However, the frequency domain approach does have a few advantages over the time domain approaches. First, it deals well with noisy signals, which can throw off time domain techniques. Second, it can handle larger pitch shifts than time domain approaches. For instance, if you wanted to decrease the frequency by a very large amount, the period could become so long that, in PSOLA, the data added at each new pitch marker would no longer overlap with the neighboring data, resulting in a very choppy and unacceptable signal. The frequency domain approach has no problem with arbitrarily large shifts, as long as you don't mind the formant shifting that accompanies them. There are, however, ways to try to restore the original formants after processing with an algorithm such as this, which would be fertile ground for further exploration. Finally, this algorithm has no difficulty in handling polyphonic signals. It could be used to shift the pitch of a track from a CD, or of two voices or instruments in harmony. The time domain algorithms cannot handle anything but a monophonic input, because they require that there be a single dominant fundamental frequency.

Clearly, there are pros and cons to this algorithm. Given its complexity, and the huge difference between the time it takes to process a sample with this algorithm and with a time domain algorithm, we have concluded that one is better off sticking with a time domain approach to pitch shifting, such as PSOLA, unless the signal is exceptionally noisy, extremely large pitch shifts are required, or the source material is polyphonic.

