Harmonic Pitch Class Profile (HPCP)

Module by: Eric Kang, Abhipray Sahoo, John Yan, Chenxi Liu.

Harmonic Pitch Class Profile

The key to note recognition is analyzing the frequency content of the audio clip. If the clip consists of one or several individual notes, the FFT of the audio should show several "peaks". HPCP provides a way to pin down the note(s) by analyzing these peaks in the FFT plot.

Big Picture

The HPCP algorithm returns a vector of length 12, one element for each of the 12 notes within an octave. The elements are normalized at the end, so that each one represents the likelihood that the corresponding note is actually in the audio. The values are obtained as follows:

\[ \mathrm{HPCP}(n) = \sum_{i=1}^{\mathrm{nPeaks}} w(n, f_i)\, a_i^2 \]

where n = 1, ..., 12 indexes the note, a_i is the linear magnitude of the i-th peak, and f_i is the frequency of the i-th peak. The index i runs over 1, ..., nPeaks, where nPeaks is the number of spectral peaks that we consider, and w(n, f_i) is the weight of the frequency f_i with respect to note n.

Basically, the note represented by integer n is compared with every peak in the FFT. The weighting function measures the similarity between the note and the peak. This correlation is then multiplied by the square of the amplitude of the peak. We do this for every peak and add all the weighted terms up to get the "likelihood" that note n is compatible with the FFT graph. We repeat the same operations for all 12 values of n, and the HPCP vector is complete.

Weighting Function

The weighting function mentioned above is determined by the following three steps.

STEP ONE:

\[ f_n = f_{\mathrm{ref}} \cdot 2^{n/\mathrm{size}}, \quad n = 1, \ldots, \mathrm{size} \]

Note:

where size = 12. The reference frequency f_ref can be set to 440 Hz, but its exact value doesn't affect the general result. Here f_n is the frequency of the note represented by n in a certain octave.
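As a minimal sketch of step one, assuming the note frequencies follow the equal-tempered relation f_n = f_ref * 2^(n/size) with f_ref = 440 Hz, the twelve reference frequencies can be tabulated as below. The name `note_frequency` is ours, chosen for illustration.

```python
F_REF = 440.0  # assumed reference frequency in Hz (A4)
SIZE = 12      # number of notes per octave


def note_frequency(n, f_ref=F_REF, size=SIZE):
    """Frequency of note n (1..size) in the octave starting just above f_ref."""
    return f_ref * 2.0 ** (n / size)


# One frequency per note; index 0 holds n = 1, index 11 holds n = 12.
note_frequencies = [note_frequency(n) for n in range(1, SIZE + 1)]
```

With this choice of f_ref, n = 12 lands exactly one octave above the reference, at 880 Hz.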

STEP TWO:

\[ d = 12 \log_2\!\left(\frac{f_i}{f_n}\right) + 12m \]

Note:

where m is the integer that minimizes the magnitude of the distance d, which is measured in semitones. The role of m is to bring d as close to zero as possible, so that any difference in octave between the peak and the note is eliminated.
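The folding performed by m can be sketched as follows. This assumes the distance is measured in semitones as d = 12 * log2(f_i / f_n) + 12 * m, and the helper name `semitone_distance` is hypothetical.

```python
import math


def semitone_distance(f_peak, f_note):
    """Return (d, m): the octave-folded distance in semitones and the integer m used."""
    raw = 12.0 * math.log2(f_peak / f_note)  # unfolded distance in semitones
    m = -round(raw / 12.0)                   # integer octave shift that minimizes |d|
    return raw + 12.0 * m, m


# A peak one octave above the note folds back to a distance of zero.
d_octave, m_octave = semitone_distance(880.0, 440.0)
```

Because m removes whole octaves, the folded distance always lies between -6 and +6 semitones.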

STEP THREE:

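\[ w(n, f_i) = \begin{cases} \cos^2\!\left(\frac{\pi}{2} \cdot \frac{d}{0.5\,l}\right) & \text{if } |d| \le 0.5\,l \\ 0 & \text{if } |d| > 0.5\,l \end{cases} \]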

Note:

where l is the length of the weighting window. This value is a parameter of the algorithm that can be adjusted. What this equation says is that when the magnitude of d is small enough (at most half the window length), we consider the note and the peak to be correlated and return a positive value given by a squared cosine. Otherwise we set the correlation to zero.
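A minimal sketch of the window, assuming the squared-cosine form w = cos^2((pi/2) * d / (0.5 * l)) for |d| <= 0.5 * l and zero otherwise. The default window length of 4/3 semitones is our assumption, not a value given in this module, and the name `weight` is illustrative.

```python
import math


def weight(d, window=4.0 / 3.0):
    """Squared-cosine weight for a folded distance d (in semitones).

    `window` is the full window length l; its default value is an assumption.
    """
    half = 0.5 * window
    if abs(d) > half:
        return 0.0  # peak is too far from the note to count
    return math.cos(0.5 * math.pi * d / half) ** 2


on_pitch = weight(0.0)    # peak exactly on the note
off_pitch = weight(1.0)   # peak one semitone away, outside the window
```

The weight falls smoothly from 1 at d = 0 to 0 at the window edge, so a slightly mistuned peak still contributes to its nearest note.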

Normalization

After we have obtained the raw HPCP results, we divide every element by the largest one, so that the biggest term becomes 1. Now we have a vector of length 12 in which each element is between 0 and 1 and represents the "possibility" that the corresponding note is in the audio clip.
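Putting the pieces together, the whole computation might look like the sketch below. It assumes the peaks have already been extracted from the FFT as (frequency, magnitude) pairs, f_ref = 440 Hz, and a window length of 4/3 semitones. The function name `hpcp` and the peak values are made up for illustration and are not taken from our implementation.

```python
import math


def hpcp(peaks, f_ref=440.0, window=4.0 / 3.0):
    """Normalized 12-element HPCP vector from (frequency_hz, magnitude) pairs."""
    half = 0.5 * window
    profile = []
    for n in range(1, 13):
        f_n = f_ref * 2.0 ** (n / 12.0)           # step one: note frequency
        total = 0.0
        for f_i, a_i in peaks:
            raw = 12.0 * math.log2(f_i / f_n)
            d = raw - 12.0 * round(raw / 12.0)     # step two: fold out octaves
            if abs(d) <= half:                     # step three: cosine window
                total += math.cos(0.5 * math.pi * d / half) ** 2 * a_i ** 2
        profile.append(total)
    largest = max(profile)
    if largest == 0:
        return profile                             # no peak matched any note
    return [value / largest for value in profile]  # biggest term becomes 1


# Hypothetical peaks near C4, E4, and G4 with made-up magnitudes.
example_peaks = [(261.63, 1.0), (329.63, 0.8), (392.00, 0.9)]
vector = hpcp(example_peaks)
```

With f_ref = 440 Hz, index 0 of the result corresponds to n = 1 (A#) and index 11 to n = 12 (A), so C, E, and G sit at indices 2, 6, and 9.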

Example

Here is an HPCP vector for a C Major chord:

[Figure: HPCP vector for a C Major chord]

Note:

The integers on the x axis represent the 12 notes within an octave: 1 represents A# and 12 represents A. We can see clearly that C, E, and G are the three notes with the most significant correlation, so we can safely conclude that the audio most likely consists of C, E, and G, which is correct.
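Reading the note names off an HPCP vector can be sketched as follows, using the indexing described above (1 is A#, 12 is A). The vector values here are invented for illustration and are not the data behind the figure. `strongest_notes` is a hypothetical helper.

```python
# Note names in the order n = 1 .. 12.
NOTE_NAMES = ["A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A"]


def strongest_notes(vector, count=3):
    """Names of the `count` largest entries, listed in order of n."""
    ranked = sorted(range(len(vector)), key=lambda k: vector[k], reverse=True)
    return [NOTE_NAMES[k] for k in sorted(ranked[:count])]


# Invented HPCP values resembling a C Major chord.
example_vector = [0.0, 0.05, 1.0, 0.0, 0.1, 0.0, 0.7, 0.02, 0.0, 0.85, 0.0, 0.03]
chord_notes = strongest_notes(example_vector)
```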

Octave Detection

One problem with the HPCP algorithm is that it discards the octave of the original note, because every peak is folded into a single octave. We have written a function, findOctave, to recover it.

First off, we exploit the fact that a note's FFT has the same structure regardless of its pitch: spikes at the note's fundamental frequency and at all its integer multiples. So an A4 starts at 440 Hz and has further spikes at 880 Hz, 1320 Hz, etc. Our HPCP will identify this note as an A. To find out what octave it is in, we just look at the location of the lowest spike, because all other spikes are multiples of this one frequency. Here we find it at 440 Hz.

This process is repeated for every pitch detected by the HPCP. For the C Major chord above, it would look for the harmonics of C and find the lowest at about 262 Hz (C4), then E at about 330 Hz (E4), and G at 392 Hz (G4).

This method is fast and accurate, but it has one limitation: if there are multiple notes of the same pitch class in different octaves (e.g. C4, C5, and C6), their spectra overlap and the algorithm only detects the lowest note, C4.
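The idea can be sketched as follows. This is not the findOctave function itself, only an illustration under our own assumptions: the peak frequencies are already known, A4 = 440 Hz with equal temperament, and a peak belongs to a note if it lies within half a semitone of it after folding out octaves. Both helper names are hypothetical.

```python
import math


def lowest_peak_for_note(peak_freqs, n, f_ref=440.0, tol=0.5):
    """Lowest peak frequency that matches note n (1..12), or None if there is none."""
    f_n = f_ref * 2.0 ** (n / 12.0)
    for f in sorted(peak_freqs):
        raw = 12.0 * math.log2(f / f_n)
        d = raw - 12.0 * round(raw / 12.0)  # distance with octaves folded out
        if abs(d) <= tol:
            return f                        # first match is the lowest spike
    return None


def octave_number(freq, a4=440.0):
    """Octave in scientific pitch notation (A4 = 440 Hz lies in octave 4)."""
    midi = round(69 + 12 * math.log2(freq / a4))
    return midi // 12 - 1


# Hypothetical peaks: fundamentals and second harmonics of C4, E4, and G4.
chord_peaks = [261.63, 523.25, 329.63, 659.26, 392.00, 784.00]
c_fundamental = lowest_peak_for_note(chord_peaks, 3)  # n = 3 is C
c_octave = octave_number(c_fundamental)
```

Because the search returns only the first match, it shows the limitation described above: if C4 and C5 sound together, only C4 is reported.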
