Inside Collection (Course): ELEC 301 Projects Fall 2006
Summary: This module explains how a vector representing a song played by a piano can be analyzed to determine when each note is depressed.
A music file can be stored on a computer as a vector in which each successive element is the next sample taken from the song. A song lasting s seconds, sampled at a frequency of f Hz, is represented by a vector of length s*f. CD-quality music is sampled at 44.1 kHz, so even short songs take up large amounts of space.
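As a rough illustration of the s*f relationship, the sketch below computes the vector length for a three-minute song. The helper name `num_samples` and the storage assumption (one channel, 2 bytes per sample) are hypothetical choices for this example and are not taken from the project.

```python
def num_samples(duration_s, sample_rate_hz):
    """Length of the sample vector for a recording of the given duration."""
    return int(round(duration_s * sample_rate_hz))

CD_RATE_HZ = 44_100  # CD-quality sampling rate quoted in the text

# A three-minute song sampled at 44.1 kHz.
length = num_samples(180, CD_RATE_HZ)

# Assumption: a single channel stored as 16-bit (2-byte) samples.
size_bytes = length * 2

print(length)      # 7938000
print(size_bytes)  # 15876000
```

Even under these modest assumptions, a short song needs several million samples, which is why fast-convolution techniques matter later in the process.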
A recorded piano note is usually very sinusoidal, with a sharp, almost immediate rise followed by a slow and steady exponential decay. To determine the times in a piano recording at which a note is hit, an edge-detection filter takes advantage of this sharp rise.
The first step in this process is to take the absolute value of the signal. The raw signal oscillates about zero, so its positive and negative halves cancel on average; the absolute value is non-negative everywhere, which gives the signal a detectable envelope.
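A minimal sketch of this step, assuming NumPy and a synthetic stand-in for a piano note (a 440 Hz sinusoid with an exponential decay; the frequency and decay rate are made up for illustration):

```python
import numpy as np

fs = 44_100                         # sampling rate in Hz (from the text)
t = np.arange(int(0.5 * fs)) / fs   # half a second of sample times

# Hypothetical test note: decaying 440 Hz sinusoid, not a real recording.
note = np.exp(-6.0 * t) * np.sin(2 * np.pi * 440.0 * t)

# Taking the absolute value makes every sample non-negative.
rectified = np.abs(note)

print(abs(note.mean()))   # near zero: positive and negative halves cancel
print(rectified.mean())   # clearly positive: the envelope is now detectable
```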
Next, the absolute-value signal is convolved with an edge-detection filter using fast-convolution techniques. A filter of length 5200 seems to work very well for a song sampled at 44.1 kHz. The filter is the first derivative of a Gaussian pulse; it outputs a positively valued spike for positive edges and a negatively valued spike for negative edges, or drop-offs.
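The filtering step might be sketched as follows, using NumPy's FFT routines as one way to perform fast convolution. The source specifies only the filter length (5200), so the Gaussian width `sigma`, the normalization, and the helper names `gaussian_derivative_filter` and `fft_convolve_same` are assumptions made for this example.

```python
import numpy as np

def gaussian_derivative_filter(length=5200, sigma=None):
    """First derivative of a Gaussian pulse, sampled at `length` points."""
    if sigma is None:
        sigma = length / 8.0                  # assumption: pulse spans +/- 4 sigma
    n = np.arange(length) - (length - 1) / 2.0
    h = -n / sigma**2 * np.exp(-0.5 * (n / sigma) ** 2)
    # Assumption: scale so that a unit step produces a peak of height 1.
    return h / h[h > 0].sum()

def fft_convolve_same(x, h):
    """Linear convolution via the FFT, trimmed to the length of x."""
    n_full = len(x) + len(h) - 1
    n_fft = 1 << (n_full - 1).bit_length()    # next power of two
    spectrum = np.fft.rfft(x, n_fft) * np.fft.rfft(h, n_fft)
    y = np.fft.irfft(spectrum, n_fft)[:n_full]
    start = (len(h) - 1) // 2                 # compensate for the filter delay
    return y[start:start + len(x)]

# Toy envelope: silence for half a second, then a sudden rise.
fs = 44_100
x = np.zeros(fs)
x[fs // 2:] = 1.0

h = gaussian_derivative_filter()
y = fft_convolve_same(x, h)

print(int(np.argmax(y)), fs // 2)   # the positive spike sits at the rising edge
```

Because convolving with the derivative of a Gaussian is equivalent to differentiating a smoothed copy of the signal, the output is positive where the envelope rises and negative where it falls. The end of the toy signal behaves like a drop-off, so a negative spike appears there.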
The last step is simply a matter of assigning peaks to note depressions. Every relative maximum that occurs above a certain threshold value (0.1, in this case) is counted as a note depression. Negatively valued peaks are largely artifacts of notes decaying too steeply, and are thus ignored. The following plot places a red stem over the original song at each point where a note depression was most likely to have occurred. In this case, it is easy to see that the note depressions were detected without error.
![]() |
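One way to sketch the peak-assignment step is to keep every local maximum of the filtered signal that exceeds the threshold. The function name `detect_onsets` and the short test vector are hypothetical; only the threshold of 0.1 comes from the text.

```python
import numpy as np

def detect_onsets(edge_signal, threshold=0.1):
    """Indices of local maxima of the filtered signal that exceed `threshold`."""
    y = np.asarray(edge_signal, dtype=float)
    mid = y[1:-1]
    # Strictly above the left neighbor, at least the right neighbor,
    # so a flat-topped peak is reported once.
    is_peak = (mid > y[:-2]) & (mid >= y[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1

# Made-up filter output: two strong positive spikes, one negative spike,
# and one small bump that stays below the threshold.
filtered = np.array([0.0, 0.05, 0.3, 0.1, -0.4, 0.0, 0.8, 0.2, 0.08, 0.09, 0.02])

onsets = detect_onsets(filtered)
print(onsets.tolist())   # [2, 6]
```

Dividing the returned indices by the sampling rate converts them to onset times in seconds. Negative spikes never pass the threshold test, so they are ignored automatically.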