Summary: A description of Gaussian Mixture Models as applied to instrument classification.
We used a Gaussian Mixture Model (GMM) as our classification tool. As our work focused mainly on signal processing, we forgo a rigorous treatment of the mathematics behind the model in favor of a brief description of GMMs and their application to our system.
GMMs belong to the class of pattern recognition systems. They model the probability density function of the observed variables as a multivariate Gaussian mixture density, that is, a weighted sum of Gaussian components. Given a series of training inputs, the expectation-maximization (EM) algorithm iteratively refines the weight, mean, and covariance of each component.
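For reference, the mixture density described above is usually written as follows. The symbols are our own notation for this sketch, not taken from the cited work: $K$ is the number of components, $w_k$ the component weights, and $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$ the mean and covariance of component $k$.

$$
p(\mathbf{x}) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1
$$

In a classifier, one such density is typically fitted per instrument, and EM adjusts $w_k$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$ to increase the likelihood of that instrument's training data.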
In this respect, GMMs resemble Support Vector Machines and Neural Networks: all of these models learn their parameters from training data, and all have been used in instrument classification (1). Reported success with GMMs (2) prompted us to use this model for our system.
We use 9 features in our recognition program and rely on the GMM to find patterns that associate these features with the correct instrument. Some of our features are vectors (we use 12 MFCCs, and the tristimulus has 3 components), so we are actually working in a 22-dimensional space. For convenience, we focus here on the pattern between the instrument and two of these dimensions, using the first two MFCCs as an example.
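The dimension count can be checked with a small sketch. It assumes that the 7 features other than the MFCCs and the tristimulus are scalars, which is what the total of 22 implies; the variable names and zero values are placeholders, not our actual feature extractors.

```python
# Placeholder values only; real features would come from the signal-processing stage.
mfcc = [0.0] * 12            # 12 MFCCs
tristimulus = [0.0] * 3      # 3 tristimulus components
scalar_features = [0.0] * 7  # remaining 9 - 2 = 7 features, assumed to be scalars

# Concatenate everything into the single vector the GMM sees for one note.
feature_vector = scalar_features + mfcc + tristimulus
print(len(feature_vector))
```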
The distribution of these two features for the three instruments, shown in figure 1, reveals clear differences between instruments.
| Distribution of First Two MFCC Coefficients for Three Instruments |
|---|
The GMM captures the patterns in these features and gives us a decision rule, as pictured in figure 2. Based on these two features alone, the GMM tells us which instrument most likely played the note: in the three-dimensional representation, it is the instrument whose surface is highest at that point.
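The decision rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual system: the instrument labels and all mixture parameters are hand-picked hypothetical values rather than fitted ones, the covariances are assumed diagonal, and all instruments are assumed equally likely a priori.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (an assumption of this sketch)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, components):
    """Log-likelihood of x under a mixture given as (weight, mean, variance) triples."""
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in components]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

# Hypothetical two-feature models (first two MFCCs), one mixture per instrument.
MODELS = {
    "instrument_a": [(0.6, (-2.0, 1.0), (1.0, 0.5)), (0.4, (-1.0, 2.5), (0.5, 0.5))],
    "instrument_b": [(0.5, (2.0, 0.0), (1.0, 1.0)), (0.5, (3.5, 1.0), (0.5, 1.0))],
    "instrument_c": [(1.0, (0.0, -3.0), (1.5, 0.8))],
}

def classify(x, models=MODELS):
    """Pick the instrument whose mixture density is highest at x."""
    return max(models, key=lambda name: gmm_loglik(x, models[name]))

print(classify((2.5, 0.5)))
```

Evaluating `classify` over a grid of feature values would trace out the decision regions that a plot like figure 2 depicts.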
| Two-Parameter Gaussian Mixture Model for Three Instruments |
|---|
Finally, we note that GMMs have been shown to remain useful when some features are particularly weak or missing (2). This is especially important in polyphonic environments, where harmonics may overlap and make some features unreliable measures of the instrument.
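One common way to handle an unreliable feature, sketched below, is to marginalize it out: with diagonal covariances, this amounts to leaving that dimension out of the likelihood. This is an illustrative assumption on our part, not necessarily the method used in (2); the models, labels, and the `reliable` mask are hypothetical.

```python
import math

def masked_logpdf(x, mean, var, reliable):
    """Diagonal-Gaussian log density over the reliable dimensions only."""
    total = 0.0
    for xi, m, v, ok in zip(x, mean, var, reliable):
        if ok:  # unreliable dimensions are marginalized out, i.e. skipped
            total += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
    return total

def gmm_loglik(x, components, reliable):
    """Mixture log-likelihood using only the dimensions flagged as reliable."""
    logs = [math.log(w) + masked_logpdf(x, m, v, reliable) for w, m, v in components]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

# Hypothetical single-component models over three features.
MODELS = {
    "instrument_a": [(1.0, (-2.0, 1.0, 0.0), (1.0, 1.0, 1.0))],
    "instrument_b": [(1.0, (2.0, 1.0, 6.0), (1.0, 1.0, 1.0))],
}

def classify(x, reliable, models=MODELS):
    return max(models, key=lambda name: gmm_loglik(x, models[name], reliable))

# The third feature is assumed to be corrupted, e.g. by an overlapping harmonic.
note = (-1.0, 1.0, 6.0)
print(classify(note, (True, True, True)))   # trusts the corrupted feature
print(classify(note, (True, True, False)))  # ignores it
```

In this toy case the corrupted third feature pulls the decision toward the wrong model, and masking it restores the decision suggested by the two reliable features.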