Affiliated with: Rice University ELEC 301 Projects. Part of the collection "ELEC 301 Projects Fall 2005".

Audio Features

Module by: Yi-Chieh Wu.

Summary: The audio features used to characterize the sound signal and classify the sample by instrument.

How do we decide what parts of the spectrum are important? The CUIDADO project (2) provided a set of 72 audio features, and research (1) has shown that some of these features are more important than others in capturing the signal characteristics. We therefore decided to implement a small subset of these features:

Cepstral Features

  • Mel-Frequency Cepstrum Coefficients (MFCC), k = 2:13

Spectral Features

  • Slope
  • Roll-Off
  • Centroid
  • Spread
  • Skew
  • Kurtosis
  • Odd-to-Even Harmonic Energy Ratio (OER)
  • Tristimulus

Definitions

Cepstral coefficients have received a great deal of attention in the speech processing community because they try to extract the characteristics of the filter and model it independently of the signal being produced. This is ideal, as the filter in our case is the instrument that we are trying to recognize. We work on a Mel scale because it more accurately models how the human auditory system perceives different frequencies: it gives more weight to changes at low frequencies, where humans are better at distinguishing small changes in frequency.
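The pipeline described above can be sketched as follows. This is a minimal illustration, not the implementation used in the project: the Hz-to-mel formula, the 26 triangular filters, the Hann window, and all function names are assumptions. It also assumes that "k = 2:13" is 1-based indexing, which corresponds to `coeffs[1:13]` in Python (the first coefficient is dropped).

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert Hz to mels using one common formula (an assumption here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = mel_to_hz(np.linspace(0.0, top_mel, n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13):
    """Log mel-band energies of one frame followed by a DCT-II."""
    frame = np.asarray(frame, dtype=float)
    n_fft = frame.size
    power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energy = np.log(bank @ power + 1e-12)  # small offset avoids log(0)
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct_basis @ log_energy

# Example: one 1024-sample frame of a 440 Hz sine sampled at 44.1 kHz.
t = np.arange(1024) / 44100.0
coeffs = mfcc(np.sin(2 * np.pi * 440.0 * t), 44100)
features = coeffs[1:13]  # twelve values, k = 2:13 in 1-based indexing
```

The mel spacing makes the filters narrow at low frequencies and wide at high frequencies, which is how the extra weight on low-frequency changes enters the feature.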

The centroid correlates with the “brightness” of the sound and is often higher than expected because of the energy from harmonics above the fundamental frequency. The spread, skew, and kurtosis are based on the second, third, and fourth moments of the spectrum and, along with the slope, help portray spectral shape.
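A minimal sketch of these shape descriptors, assuming the magnitude spectrum is normalized to sum to one and treated as a distribution over frequency, and that the slope is the least-squares linear fit of magnitude against frequency. The function name and the choice of magnitude (rather than power) weighting are assumptions; reference (1) should be consulted for the exact formulas.

```python
import numpy as np

def spectral_shape(magnitudes, freqs):
    """Centroid, spread, skew, kurtosis, and slope of a magnitude spectrum."""
    magnitudes = np.asarray(magnitudes, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    p = magnitudes / magnitudes.sum()        # spectrum as a distribution
    centroid = np.sum(freqs * p)             # first moment
    dev = freqs - centroid
    spread = np.sqrt(np.sum(dev ** 2 * p))   # from the second central moment
    skew = np.sum(dev ** 3 * p) / spread ** 3
    kurtosis = np.sum(dev ** 4 * p) / spread ** 4
    slope = np.polyfit(freqs, magnitudes, 1)[0]  # linear regression slope
    return {"centroid": centroid, "spread": spread, "skew": skew,
            "kurtosis": kurtosis, "slope": slope}

# A 220 Hz tone with two weaker harmonics (illustrative values only).
tone = spectral_shape([1.0, 0.5, 0.25], [220.0, 440.0, 660.0])
```

For this illustrative tone the centroid comes out near 346 Hz, well above the 220 Hz fundamental, which is the effect described above.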

The odd-to-even harmonic energy ratio indicates whether a sound consists primarily of odd harmonic energy, primarily of even harmonic energy, or of harmonic energy spread equally between the two.
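As a rough sketch, the ratio can be computed from a list of harmonic amplitudes once the harmonics have been located. The function name is hypothetical, and the code assumes the first entry is the fundamental (harmonic 1) and that energy is the squared amplitude.

```python
import numpy as np

def odd_to_even_ratio(harmonic_amps):
    """Ratio of odd-harmonic energy to even-harmonic energy.

    harmonic_amps[0] is assumed to be harmonic 1 (the fundamental).
    """
    a = np.asarray(harmonic_amps, dtype=float)
    odd_energy = np.sum(a[0::2] ** 2)    # harmonics 1, 3, 5, ...
    even_energy = np.sum(a[1::2] ** 2)   # harmonics 2, 4, 6, ...
    if even_energy == 0.0:
        return float("inf")              # no even-harmonic energy at all
    return float(odd_energy / even_energy)
```

A value near 1 means the energy is spread evenly, a value well above 1 means odd harmonics dominate, and a value well below 1 means even harmonics dominate.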

The tristimulus values also measure energy and were introduced as the timbre equivalent of the color attributes in vision. Like the OER, they provide clues about the distribution of harmonic energy, this time focusing on low, mid, and high harmonics rather than odd and even harmonics. This gives more weight to the first few harmonics, which are perceptually more important.
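A minimal sketch of the three values, assuming the common three-band split (harmonic 1; harmonics 2 to 4; harmonics 5 and above) with amplitude weighting. The source does not state which formula it used, so the band boundaries and the function name are assumptions.

```python
import numpy as np

def tristimulus(harmonic_amps):
    """Return (T1, T2, T3): shares of the low, mid, and high harmonics.

    harmonic_amps[0] is assumed to be harmonic 1 (the fundamental).
    """
    a = np.asarray(harmonic_amps, dtype=float)
    total = a.sum()
    t1 = a[0] / total            # fundamental only
    t2 = a[1:4].sum() / total    # harmonics 2, 3, 4
    t3 = a[4:].sum() / total     # harmonics 5 and above
    return float(t1), float(t2), float(t3)
```

The three values sum to one, so the fundamental alone carries as much weight as an entire band of higher harmonics.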

How We Chose Features

MFCCs have been shown to work very well in monophonic environments, as they capture the shape of the spectrum very effectively. Unfortunately, they are of less use in polyphonic recordings, where the MFCCs capture the shape of a spectrum calculated from multiple sources. Most of the work we have seen on this subject nevertheless uses MFCCs. They are particularly useful if only one instrument is playing or if one instrument is clearly salient.

Most wind instruments have their harmonics evenly spread among the odd and even indices, but the clarinet is distinct in that it produces spectra consisting predominantly of odd harmonics, with very few even harmonics appearing at all. This makes sense from a physics standpoint: when played, the clarinet behaves as a cylinder closed at one end, which allows only the odd harmonics to resonate. This feature was thus chosen primarily with clarinet classification in mind.
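The physical argument can be made concrete with the textbook model of an ideal cylindrical pipe closed at one end. This is an idealization, not a model of a real clarinet; the symbols $L$ (effective pipe length) and $c$ (speed of sound) are introduced here for illustration only.

```latex
% Resonances of an ideal cylinder closed at one end and open at the other:
% the closed end is a pressure antinode and the open end a pressure node,
% so the pipe holds an odd number of quarter wavelengths.
\[
  L = (2n - 1)\,\frac{\lambda_n}{4}
  \quad\Longrightarrow\quad
  f_n = \frac{c}{\lambda_n} = (2n - 1)\,\frac{c}{4L},
  \qquad n = 1, 2, 3, \dots
\]
% With f_1 = c / (4L), the resonances are f_1, 3 f_1, 5 f_1, ...
% and the even multiples 2 f_1, 4 f_1, ... are absent.
```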

We chose the roll-off and tristimulus as our energy measures, as they were both easy to implement and judged to be important (1). Finally, the first four spectral moments and the spectral slope, in both perceptual and spectral models, were shown to be the ten most important features in the same study and were therefore some of the first features added to our classification system. We had hoped to implement a perceptual model and thereby nearly double our feature set, but we could not find an accurate filter model for the middle ear and thus decided to forgo any features based on perceptual modeling.
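The roll-off is simple to compute, as this sketch suggests. It assumes the roll-off is the lowest frequency below which a fixed fraction of the spectral energy lies. The 95% default and the function name are assumptions, since the text above does not state the threshold that was used.

```python
import numpy as np

def spectral_rolloff(magnitudes, freqs, fraction=0.95):
    """Lowest frequency at which the cumulative energy reaches `fraction`."""
    energy = np.asarray(magnitudes, dtype=float) ** 2
    cumulative = np.cumsum(energy)
    # First bin whose cumulative energy meets or exceeds the threshold.
    idx = np.searchsorted(cumulative, fraction * cumulative[-1])
    return float(np.asarray(freqs, dtype=float)[idx])
```

A spectrum dominated by its lowest bins yields a low roll-off, while a flat spectrum pushes the roll-off toward the highest frequency.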

For further discussion of these features, along with explicit mathematical formulas, please refer to (1).

References

  1. A.A. Livshin and X. Rodet. “Musical Instrument Identification in Continuous Recordings,” in Proc. of the 7th Int. Conference on Digital Audio Effects, Naples, Italy, October 5-8, 2004.
  2. G. Peeters. “A large set of audio features for sound description (similarity and classification) in the CUIDADO project,” 2003. URL: http://www.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf.
