Cepstral coefficients have received a great deal of attention in the speech processing community because they attempt to extract the characteristics of the filter and model it independently of the source signal that excites it. This is ideal, as the filter in our case is the instrument that we are trying to recognize. We work on a Mel scale because it more closely models how the human auditory system perceives frequency: it gives more weight to changes at low frequencies, where humans are better at distinguishing small differences.
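To make the low-frequency emphasis of the Mel scale concrete, the following minimal sketch compares the same 100 Hz step at two places in the spectrum. It assumes the widely used 2595 * log10(1 + f/700) conversion; the text does not say which Mel formula was actually used, and the function name hz_to_mel is only illustrative.

```python
import numpy as np

def hz_to_mel(freq_hz):
    """Convert Hz to mels with the common 2595*log10(1 + f/700) formula (an assumption)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

# The same 100 Hz step spans many more mels at low frequencies than at high ones.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(5100.0) - hz_to_mel(5000.0)

print(f"100 -> 200 Hz:   {low_step:.1f} mel")
print(f"5000 -> 5100 Hz: {high_step:.1f} mel")
```

Filters spaced evenly in mels are therefore packed more densely at low frequencies, which is where the extra weight comes from.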
The centroid correlates with the “brightness” of the sound and is often higher than the fundamental frequency alone would suggest, because of the energy in the harmonics above it. The spread, skew, and kurtosis are based on the 2nd, 3rd, and 4th moments of the spectrum and, along with the slope, help describe spectral shape.
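As a rough illustration of these spectral shape descriptors, the sketch below treats the normalised magnitude spectrum as a distribution over frequency and computes its moments. The exact definitions are assumptions: the spread is taken here as the standard deviation, the skew and kurtosis are normalised by powers of the spread, and the slope is a least-squares line fit of magnitude against frequency. The function name spectral_shape and the toy spectrum are hypothetical.

```python
import numpy as np

def spectral_shape(freqs, mags):
    """Return centroid, spread, skew, kurtosis and slope of a magnitude spectrum."""
    freqs = np.asarray(freqs, dtype=float)
    mags = np.asarray(mags, dtype=float)
    p = mags / mags.sum()                      # normalise so the weights sum to 1
    centroid = np.sum(freqs * p)               # 1st moment
    dev = freqs - centroid
    spread = np.sqrt(np.sum(dev**2 * p))       # square root of the 2nd central moment
    skew = np.sum(dev**3 * p) / spread**3      # normalised 3rd central moment
    kurtosis = np.sum(dev**4 * p) / spread**4  # normalised 4th central moment
    slope = np.polyfit(freqs, mags, 1)[0]      # linear trend of magnitude vs. frequency
    return {"centroid": centroid, "spread": spread, "skew": skew,
            "kurtosis": kurtosis, "slope": slope}

# Toy harmonic tone: fundamental at 220 Hz plus two weaker harmonics.
f0 = 220.0
freqs = f0 * np.arange(1, 4)
mags = [1.0, 0.5, 0.25]
shape = spectral_shape(freqs, mags)
print(f"fundamental: {f0:.1f} Hz, centroid: {shape['centroid']:.1f} Hz")
```

Even in this small example the centroid sits above the fundamental, since the harmonics pull the weighted mean upward.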
The odd-to-even harmonic energy ratio (OER) indicates whether a sound's harmonic energy lies primarily in the odd harmonics, primarily in the even harmonics, or is spread equally between the two.
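One possible way to compute such a ratio is sketched below. It assumes the harmonic amplitudes have already been estimated and are ordered from the fundamental upward, and it takes energy to be squared amplitude; the function name odd_to_even_ratio and the example amplitudes are illustrative only.

```python
import numpy as np

def odd_to_even_ratio(harmonic_amps):
    """Ratio of odd-harmonic energy to even-harmonic energy.

    harmonic_amps[0] is assumed to be the fundamental (harmonic 1).
    """
    energy = np.asarray(harmonic_amps, dtype=float) ** 2
    odd = energy[0::2].sum()    # harmonics 1, 3, 5, ...
    even = energy[1::2].sum()   # harmonics 2, 4, 6, ...
    if even == 0.0:
        return float("inf")     # no even-harmonic energy at all
    return float(odd / even)

print(odd_to_even_ratio([1.0, 0.5, 1.0, 0.5]))   # odd-dominated spectrum
print(odd_to_even_ratio([1.0, 1.0, 1.0, 1.0]))   # energy spread equally
```

Values well above 1 indicate mostly odd harmonic energy, values well below 1 mostly even harmonic energy, and values near 1 an even split.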
The tristimulus values also measure energy and were introduced as the timbre equivalent of the color attributes of vision. Like the odd-to-even harmonic energy ratio, they provide clues regarding the distribution of harmonic energy, this time focusing on low, mid, and high harmonics rather than odd and even harmonics. This grouping gives more weight to the first few harmonics, which are perceptually more important.
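The grouping into low, mid, and high harmonics can be sketched as follows. The band boundaries are an assumption based on a common formulation (the fundamental alone, harmonics 2 to 4, and harmonics 5 and above), and the values are computed on harmonic amplitudes; passing squared amplitudes instead would give an energy-based variant. The function name tristimulus is hypothetical.

```python
import numpy as np

def tristimulus(harmonic_amps):
    """Return (T1, T2, T3): relative weight of low, mid and high harmonics.

    harmonic_amps[0] is assumed to be the fundamental (harmonic 1).
    """
    a = np.asarray(harmonic_amps, dtype=float)
    total = a.sum()
    t1 = a[0] / total          # fundamental only
    t2 = a[1:4].sum() / total  # harmonics 2, 3 and 4
    t3 = a[4:].sum() / total   # harmonics 5 and above
    return float(t1), float(t2), float(t3)

t1, t2, t3 = tristimulus([4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
print(f"T1={t1:.2f}  T2={t2:.2f}  T3={t3:.2f}")
```

Because the first band holds a single harmonic and the second only three, the first few harmonics each contribute far more to the description than any individual higher harmonic.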




