In terms of correcting pitch with minimal distortion, the PSOLA algorithm is far superior to both the standard Time Shift and the Modified Phase Vocoder systems. This is because the PSOLA algorithm takes into account the
Among the detection routines, the autocorrelator provided better results than the HPS algorithm. The autocorrelator was less sensitive to noise, while HPS detected notes even in portions of relative silence. Had we expanded the HPS to include a "silence" detector, we might have seen an improvement. However, HPS also suffered from a severe tendency to misclassify the octave of the pitch. This is a result of the harmonic nature of the spectrum, the very thing that makes HPS work. To correct this, we would have had to add more layers of detection to determine whether the highest peak was also the lowest-frequency peak in the transform, since we want the fundamental frequency and not a multiple of it.
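To make the octave problem concrete, here is a minimal Python sketch of a harmonic product spectrum detector. It is not the code used in this project: the function names `magnitude_spectrum` and `hps_pitch`, the plain (slow) DFT, and the choice of three harmonics are all assumptions made for illustration.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitudes of the first len(x)//2 DFT bins (plain O(N^2) DFT)."""
    n = len(x)
    mags = []
    for k in range(n // 2):
        acc = 0j
        for t, sample in enumerate(x):
            acc += sample * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def hps_pitch(x, fs, num_harmonics=3):
    """Estimate the fundamental (Hz) as the peak of the harmonic product.

    For each candidate bin k, multiply the magnitudes at k, 2k, 3k, ...
    A true fundamental scores high because all its harmonics line up.
    num_harmonics=3 is an illustrative choice, not a tuned value.
    """
    mags = magnitude_spectrum(x)
    limit = len(mags) // num_harmonics  # keep k * num_harmonics in range
    best_bin, best_val = 1, 0.0
    for k in range(1, limit):
        product = 1.0
        for h in range(1, num_harmonics + 1):
            product *= mags[k * h]
        if product > best_val:
            best_bin, best_val = k, product
    return best_bin * fs / len(x)


if __name__ == "__main__":
    fs, n = 8000, 512
    # Synthetic "voice": a 250 Hz fundamental plus three weaker harmonics.
    partials = ((1, 1.0), (2, 0.5), (3, 0.33), (4, 0.25))
    tone = [sum(a * math.sin(2 * math.pi * 250 * h * t / fs) for h, a in partials)
            for t in range(n)]
    print(hps_pitch(tone, fs))
```

Because the product is formed only from bins at integer multiples of the candidate, a weak fundamental or a missing upper harmonic can shift the peak to a neighbouring octave, which is the kind of error described above. A silent frame gives a near-zero product everywhere, so a separate energy check would be needed before trusting the peak.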
The autocorrelator provided good results, but was unreliable in sections dominated by high frequencies, such as an 's' sound. In these regions, the r(s) function was extremely badly behaved and had no well-defined local minima. To deal with this case, we introduced a "threshold" and declared that if the minimum was above this value, the region must be noisy. This is a quick fix; a better method would be to take a transform of this function and examine its behavior, but that is left for future investigation.
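The threshold test can be sketched as follows. Since the text looks for minima of r(s), this sketch assumes r(s) is a mean squared difference between the frame and a copy of itself shifted by s samples; the exact definition used in the project is not given here. The names `difference_function` and `detect_period`, the normalisation by frame energy, and the threshold of 0.3 are hypothetical.

```python
import math
import random


def difference_function(frame, max_lag):
    """r[s] = mean squared difference between the frame and itself shifted by s."""
    n = len(frame) - max_lag  # number of samples compared at every lag
    r = []
    for s in range(max_lag + 1):
        total = sum((frame[t] - frame[t + s]) ** 2 for t in range(n))
        r.append(total / n)
    return r


def detect_period(frame, min_lag, max_lag, threshold=0.3):
    """Return the best period in samples, or None if the frame looks noisy.

    The minimum of r(s) is divided by twice the frame energy, so that it is
    near 0 for a periodic frame and near 1 for uncorrelated noise.
    The threshold value is an illustrative assumption.
    """
    energy = sum(v * v for v in frame) / len(frame)
    if energy == 0.0:
        return None  # pure silence: nothing to detect
    r = difference_function(frame, max_lag)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda s: r[s])
    if r[best_lag] / (2.0 * energy) > threshold:
        return None  # minimum is too shallow: treat as unvoiced/noisy
    return best_lag


if __name__ == "__main__":
    voiced = [math.sin(2 * math.pi * t / 40) for t in range(400)]
    rng = random.Random(0)
    hiss = [rng.gauss(0.0, 1.0) for _ in range(400)]
    print(detect_period(voiced, 20, 60))
    print(detect_period(hiss, 20, 60))
```

Normalising by the frame energy keeps the threshold independent of how loudly the singer is singing, which a fixed absolute threshold would not be.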
Future Work
Items to consider in an expansion of this project include an improved method of calculating the autocorrelation function (using Fourier domain methods) and perhaps a better method of detecting voiced/unvoiced sections of sound, in both the Autocorrelation and the HPS methods.
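One possible form of the Fourier domain calculation uses the fact that the autocorrelation of a signal is the inverse transform of its power spectrum. The sketch below is only an illustration of that idea, written with a small recursive radix-2 FFT so that it has no dependencies; the function names are hypothetical, and a real implementation would more likely call an FFT library.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    conj = fft([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n for v in conj]


def autocorrelation_fft(x):
    """Linear autocorrelation for lags 0..len(x)-1 via the power spectrum."""
    n = len(x)
    size = 1
    while size < 2 * n:  # zero-pad so circular wrap-around cannot occur
        size *= 2
    padded = [complex(v) for v in x] + [0j] * (size - n)
    power = [abs(v) ** 2 for v in fft(padded)]
    r = ifft(power)
    return [r[lag].real for lag in range(n)]


def autocorrelation_direct(x):
    """Reference O(N^2) sum, used here only to check the FFT version."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(n)]


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0, 4.0]
    print(autocorrelation_direct(frame))
    print(autocorrelation_fft(frame))
```

The direct sum costs on the order of N^2 multiplications per frame, while the transform route costs on the order of N log N, which matters once frames reach a few thousand samples.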
As far as correction is concerned, an improved dynamic programming algorithm in the PSOLA method would reduce the "phasiness" heard in the output of that method. We do not believe that either of the other two methods, Time Shifting and the Modified Phase Vocoder, is likely ever to be as useful as PSOLA, since they introduce formant errors as they change pitch.
Another area for improvement is the mapping between detected pitch and desired pitch. Currently we use a logarithmic rounder, but this is simplistic and assumes that the singer is always closer to the desired note than to any other, which is clearly not always the case. It would be nice to implement a "note tracker" which follows the detected notes and perhaps tries to determine a melody, but this is another project entirely. Also, we would like to make the correction seem more "natural" by correcting by small amounts in each window, leading to a "pick-up" or "pull-down" sound in the result, as if the singer had corrected the pitch themselves. This is in contrast to the robotic, "Cher-Effect" results we have currently, which were generated using an instantaneous correction to exactly the desired pitch.
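The two ideas in this paragraph can be sketched in a few lines of Python. This is an illustration and not the project's code: it assumes an equal-tempered scale referenced to A4 = 440 Hz, and the names `nearest_note_hz` and `glide_toward`, along with the `fraction` parameter, are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def nearest_note_hz(f_detected, ref_hz=A4_HZ):
    """Logarithmic rounder: snap a frequency to the nearest semitone."""
    semitones = 12.0 * math.log2(f_detected / ref_hz)
    return ref_hz * 2.0 ** (round(semitones) / 12.0)


def glide_toward(f_detected, f_target, fraction=0.5):
    """Move only part of the way to the target, measured on a log scale.

    fraction=1.0 reproduces the instantaneous correction; smaller values
    applied window after window give a gradual pull toward the note.
    """
    ratio = f_target / f_detected
    return f_detected * ratio ** fraction


if __name__ == "__main__":
    sung = 450.0                      # slightly sharp of A4
    target = nearest_note_hz(sung)
    pitch = sung
    for window in range(4):           # correct a little in each window
        pitch = glide_toward(pitch, target, fraction=0.5)
        print(window, round(pitch, 2))
```

Rounding in the log domain treats every semitone as the same width, which is why a singer who is more than half a semitone off is pulled to the wrong note. A note tracker would have to replace `nearest_note_hz` with something that takes the surrounding notes into account.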
Examples