Summary: An overview of our Musical Instrument Recognition System.
| Pitch and Instrument Recognition System Diagram |
|---|
The system takes a set of training songs and computes a vector of features that characterizes each signal. A Gaussian mixture model (GMM) is then trained on these features so that, given the features of a new signal, it can predict which instrument produced it.
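To make the classification step concrete, here is a minimal sketch of how a feature vector could be scored against Gaussian mixtures. It assumes one diagonal-covariance GMM per instrument and picks the instrument with the highest likelihood; that arrangement, the two-dimensional features, and all numeric parameters are illustrative assumptions, not the values or the exact setup used in this project. Training (fitting the weights, means, and variances) is not shown.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, components):
    """Log-likelihood of x under a mixture of (weight, mean, var) components."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    peak = max(logs)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(value - peak) for value in logs))

def classify(x, models):
    """Return the instrument whose GMM assigns x the highest likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(x, models[name]))

# Hypothetical, hand-picked parameters over two made-up features.
models = {
    "flute": [(0.6, [0.2, 0.8], [0.01, 0.02]), (0.4, [0.3, 0.6], [0.02, 0.02])],
    "clarinet": [(0.5, [0.7, 0.3], [0.02, 0.01]), (0.5, [0.8, 0.2], [0.01, 0.01])],
}
print(classify([0.25, 0.7], models))
```

Working in the log domain avoids underflow when the feature vector is long, since the product of many small densities quickly falls below floating-point range.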
Each digitized signal was split into short windows for feature processing. In training, features were calculated for each window and concatenated into a single vector that was fed into the GMM. In testing, features were calculated for each window and passed to the GMM for classification. When multiple notes were to be detected, we repeated the analysis on the same window until we had found the maximum number of notes or until no further note could be detected (judged by a cutoff threshold that defines silence).
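The windowing and the repeated per-window note search can be sketched as follows. This is only an outline under stated assumptions: the window and hop sizes, the use of RMS energy as the silence measure, and the idea that each detected note is removed to leave a residual are all illustrative choices. `detect_strongest` is a hypothetical callback, and `toy_detector` is a stand-in that does no real pitch detection.

```python
import math

def split_into_windows(samples, window_size, hop_size):
    """Return windows of window_size samples, taken hop_size samples apart."""
    return [
        samples[start:start + window_size]
        for start in range(0, len(samples) - window_size + 1, hop_size)
    ]

def rms(window):
    """Root-mean-square energy, used here as an assumed silence measure."""
    return math.sqrt(sum(s * s for s in window) / len(window))

def detect_notes(window, detect_strongest, max_notes, silence_threshold):
    """Repeatedly analyze one window until max_notes are found or only
    silence remains. detect_strongest(residual) -> (note, new_residual)."""
    notes = []
    residual = list(window)
    while len(notes) < max_notes and rms(residual) > silence_threshold:
        note, residual = detect_strongest(residual)
        notes.append(note)
    return notes

def toy_detector(residual):
    """Stand-in only: reports the peak amplitude and halves the signal."""
    peak = max(abs(s) for s in residual)
    return peak, [s * 0.5 for s in residual]

windows = split_into_windows([1.0] * 8, window_size=4, hop_size=2)
for window in windows:
    print(detect_notes(window, toy_detector, max_notes=3, silence_threshold=0.2))
```

The two stopping conditions in the `while` loop mirror the two described above: a cap on the number of notes and a silence cutoff.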
From the user's standpoint, training requires a set of songs, each supplied as a WAV file together with labels indicating which instrument produced the sound at specific times. Once the system is trained, the user can input a new song, and our algorithm outputs it in “piano roll” format, i.e., the pitch and instrument of each note plotted over time.
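As an illustration of the piano roll output, the sketch below merges per-window detections into note events with start and end times, which could then be plotted. The data layout (a set of `(pitch, instrument)` pairs per window), the use of MIDI note numbers for pitch, and the function name `to_piano_roll` are assumptions made for this example rather than the project's actual output format.

```python
def to_piano_roll(frames, hop_seconds):
    """Merge per-window detections into (start, end, pitch, instrument) events.

    frames[i] is the set of (pitch, instrument) pairs detected in window i,
    and consecutive windows are assumed to start hop_seconds apart.
    """
    events = []
    active = {}  # (pitch, instrument) -> start time of the sounding note
    # A trailing empty frame closes any notes still sounding at the end.
    for i, detected in enumerate(list(frames) + [set()]):
        t = i * hop_seconds
        for key in list(active):
            if key not in detected:
                pitch, instrument = key
                events.append((active.pop(key), t, pitch, instrument))
        for key in detected:
            active.setdefault(key, t)
    return sorted(events)

# Hypothetical detections: pitches as MIDI note numbers (an assumption).
frames = [
    {(60, "piano")},
    {(60, "piano"), (67, "flute")},
    {(67, "flute")},
    set(),
]
for event in to_piano_roll(frames, hop_seconds=0.5):
    print(event)
```

Each resulting tuple corresponds to one horizontal bar in the plot: its vertical position is the pitch, its horizontal extent is the time span, and its color or label can encode the instrument.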