Pitch Detection and Sinusoidal Harmonic Modeling

Module by: Yi-Chieh Wu, Kyle Ringgenberg.

Summary: Finding the pitch of a musical signal, and approximating the spectral envelope of a tone by estimating its harmonics and harmonic amplitudes.

Note: You are viewing an old version of this document. The latest version is available here.

Pitch Detection

Detecting the pitch of an input signal is deceptively difficult. Many groups have tackled this challenge by simply taking the Fourier transform of the signal and then finding the frequency with the highest spectral magnitude. As elegant as this may seem, it fails for many musical instruments, because the strongest spectral component is often a higher harmonic rather than the fundamental. Instead, we have chosen to approach the problem from a more extensible point of view.

One of the problems with finding the fundamental frequency lies in its definition. In our case, we define it as the frequency that the human ear recognizes as dominant. The human auditory system responds most sensitively to the equivalent of the greatest common divisor of the produced frequencies (informally, their "lowest common denominator"). This can be modeled by finding the set of frequencies with the strongest amplitudes and taking the lowest frequency in that group. This process is quite effective, though it relies on the condition that the fundamental frequency is actually present in the spectrum and is not merely implied by a combination of higher harmonics. The following example illustrates this more concretely.

Figure 1: Frequency vs. Time for Trumpet playing a concert 'A'=440 Hz

In the above figure, we want to find the frequency heard by the human ear as the fundamental pitch. To do this, we first look at the five highest peaks, which occur at 440, 880, 1320, 1760, and 2640 Hz. From this set of values, we take the lowest frequency. Hence, the fundamental frequency of the above signal is 440 Hz, or a concert 'A', which is, in fact, the pitch that was played.
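
The peak-picking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for this module: the function names (find_peaks, estimate_pitch), the 10 Hz bin spacing, and the amplitudes in the synthetic spectrum are all made up for the example. Only the five peak frequencies come from the figure.

```python
def find_peaks(freqs, mags):
    """Return (frequency, magnitude) pairs for every strict local maximum."""
    peaks = []
    for i in range(1, len(mags) - 1):
        if mags[i] > mags[i - 1] and mags[i] > mags[i + 1]:
            peaks.append((freqs[i], mags[i]))
    return peaks


def estimate_pitch(freqs, mags, n_peaks=5):
    """Keep the n_peaks strongest spectral peaks and return the lowest frequency."""
    peaks = find_peaks(freqs, mags)
    if not peaks:
        raise ValueError("spectrum has no local maxima")
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n_peaks]
    return min(f for f, _ in strongest)


# Synthetic magnitude spectrum: 10 Hz bins from 0 to 4000 Hz.
# The amplitudes are invented; the second harmonic is deliberately the strongest.
freqs = [10.0 * k for k in range(401)]
partials = {440.0: 0.6, 880.0: 1.0, 1320.0: 0.8, 1760.0: 0.5, 2640.0: 0.4}
mags = [partials.get(f, 0.01) for f in freqs]

# Naive approach: the single bin with the highest magnitude.
naive = freqs[max(range(len(mags)), key=mags.__getitem__)]
# Approach described above: lowest frequency among the five strongest peaks.
pitch = estimate_pitch(freqs, mags)
print(naive, pitch)
```

In this synthetic case the naive estimate lands on the second harmonic, while the lowest of the five strongest peaks lands on the fundamental. A real spectrum would also need a noise floor or minimum peak height so that small ripples are not counted as peaks.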

Sinusoidal Harmonic Modeling

We would like to capture the “typical” spectrum for each instrument, independent of the pitch being produced. This allows us to classify a signal using our model without providing the pitch as another parameter to the model. (We note that this method is not without consequences, as the frequency response of the instrument changes the spectrum depending on the note being played. For example, the spectra of very low and very high notes are more likely to vary than those of mid-range notes. We chose this approach to save time in model training and, we hope, to reduce the dimensionality of our problem.)

Sinusoidal harmonic modeling (SHM) captures the harmonic envelope of a signal (as opposed to its spectral envelope) and is ideal for tonal sounds produced by wind instruments, as most of the spectral energy is captured in the harmonics. Given a spectrum, SHM finds the fundamental frequency and estimates the harmonics and the harmonic amplitudes, eventually producing an amplitude versus harmonic graph.
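
One way to picture the last step is the following minimal sketch, which assumes the fundamental frequency is already known and reads off the largest magnitude near each integer multiple of it. The function name harmonic_envelope, the search tolerance (5% of the fundamental), the normalization to a maximum of 1, and the synthetic spectrum are assumptions made for illustration. The module does not specify these details.

```python
def harmonic_envelope(freqs, mags, f0, n_harmonics=6, tolerance=0.05):
    """Amplitude near each harmonic k*f0, normalized so the largest is 1.0.

    For harmonic k, the amplitude is the largest magnitude found within
    +/- tolerance*f0 of k*f0, which allows for slightly mistuned partials.
    """
    envelope = []
    for k in range(1, n_harmonics + 1):
        target = k * f0
        window = [m for f, m in zip(freqs, mags)
                  if abs(f - target) <= tolerance * f0]
        envelope.append(max(window) if window else 0.0)
    peak = max(envelope)
    return [a / peak for a in envelope] if peak > 0 else envelope


# Synthetic spectrum with invented amplitudes, 10 Hz bins from 0 to 4000 Hz.
# The third partial is placed at 1330 Hz (slightly sharp) and the fifth
# harmonic (2200 Hz) is absent, so only the 0.01 background is found there.
freqs = [10.0 * k for k in range(401)]
partials = {440.0: 0.6, 880.0: 1.0, 1330.0: 0.8, 1760.0: 0.5, 2640.0: 0.4}
mags = [partials.get(f, 0.01) for f in freqs]

envelope = harmonic_envelope(freqs, mags, f0=440.0)
for k, amp in enumerate(envelope, start=1):
    print(k, amp)
```

The resulting list is the amplitude versus harmonic index representation: position k holds the relative strength of the k-th harmonic, regardless of which note was played.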

Figure 2: Average Harmonic Envelope for Clarinet (Blue), Tenor Sax (Green), and Trumpet (Red)

From this representation, we can then determine characteristic features of the instrument. For example, qualitatively, we can tell that the spectrum of a clarinet declines rather quickly and that most of its energy is in the odd harmonics. Similarly, we can tell that the spectrum of the saxophone declines more slowly, and that the trumpet has its harmonic energy distributed relatively evenly among the odd and even indices.
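
These qualitative observations could be turned into numbers in several ways. The sketch below shows two hypothetical features: the ratio of energy in odd versus even harmonics, and the slope of a least-squares line through the amplitudes in decibels. Both feature definitions and both example envelopes are assumptions invented for illustration. They are not values read from Figure 2 and not necessarily the features used in this project.

```python
import math


def odd_even_ratio(envelope):
    """Energy in odd harmonics (1, 3, 5, ...) divided by energy in even ones."""
    odd = sum(a * a for a in envelope[0::2])   # list index 0 is harmonic 1
    even = sum(a * a for a in envelope[1::2])  # list index 1 is harmonic 2
    return odd / even if even > 0 else float("inf")


def decay_slope_db(envelope):
    """Least-squares slope of amplitude (in dB) against harmonic index."""
    xs = list(range(1, len(envelope) + 1))
    ys = [20.0 * math.log10(max(a, 1e-12)) for a in envelope]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Made-up envelopes: one dominated by odd harmonics, one evenly spread.
odd_heavy = [1.0, 0.1, 0.6, 0.08, 0.3, 0.05]
even_spread = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

for name, env in [("odd-heavy", odd_heavy), ("even spread", even_spread)]:
    print(name, odd_even_ratio(env), decay_slope_db(env))
```

A large odd-to-even ratio corresponds to the clarinet-like behavior described above, a ratio near 1 to the trumpet-like behavior, and a more negative slope to a spectrum that declines faster.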
