<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="Module.2003-12-08.2634">
  <name>Music Classification by Genre: Frequency Smoothness</name>
  <metadata>
  <md:version>1.2</md:version>
  <md:created>2003/12/08 14:26:34 US/Central</md:created>
  <md:revised>2003/12/08 14:38:30.267 US/Central</md:revised>
  <md:authorlist>
    <md:author id="chunter">
      <md:firstname>Christopher</md:firstname>
      <md:othername>Robert</md:othername>
      <md:surname>Hunter</md:surname>
      <md:email>chunter@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="mitali">
      <md:firstname>Mitali</md:firstname>
      
      <md:surname>Banerjee</md:surname>
      <md:email>mitali@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>frequency</md:keyword>
    <md:keyword>smoothness</md:keyword>
  </md:keywordlist>

  <md:abstract>Spectrogram Smoothness Analyzer</md:abstract>
</metadata>

  <content>
    <para id="intro">
A spectrogram belongs to a family of tools called time-frequency representations. Music on a CD is stored as a time vector, a sequence of amplitude samples. Taking the FFT of this vector gives us its frequency content. However, a single FFT loses all time information, since it describes the frequency content of the vector as a whole. We need something closer to an instantaneous frequency response, so that we keep both frequency and time information. A spectrogram breaks the signal into many short time vectors and takes the FFT of each one. These FFTs are then placed side by side as the columns of the spectrogram. The result is a time-frequency representation of our music.
    </para>   
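    <para id="specsketch">
The column-by-column construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a direct DFT in place of an FFT, non-overlapping frames with no window function, and a made-up test tone. The names dft_magnitudes, spectrogram, frame_len, and hop are hypothetical and are not taken from the project code.
    </para>
<code type="block" id="specsketchcode">
```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len, hop):
    """Return a list of columns; column j holds the spectrum of frame j."""
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        columns.append(dft_magnitudes(frame))
    return columns


# Hypothetical test signal: a tone at bin 4 that jumps to bin 10 halfway through.
frame_len = 64
low = [math.sin(2 * math.pi * 4 * t / frame_len) for t in range(4 * frame_len)]
high = [math.sin(2 * math.pi * 10 * t / frame_len) for t in range(4 * frame_len)]
spec = spectrogram(low + high, frame_len, hop=frame_len)

# The strongest bin in each column follows the tone through time.
peak_bins = [col.index(max(col)) for col in spec]
print(peak_bins)  # prints [4, 4, 4, 4, 10, 10, 10, 10]
```
</code>
    <para id="specsketchnote">
A single DFT of the whole test signal would show energy at both bins but not when each tone occurs. The per-frame columns keep that time information.
    </para>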

<figure id="techno">
  <media type="image/gif" src="spectechno.gif"/>
</figure>

<figure id="classical">
  <media type="image/gif" src="specclassical.gif"/>
</figure>

    <para id="figexplain">
The figures above show spectrograms of a techno song and a classical song. freqsmooth.m quantifies the differences seen in these spectrograms. To do this, freqsmooth finds the index (frequency bin) of the maximum value in each column and calculates the variance of these indices across columns. In other words, a song with a clear, loud melody will show a small variance in these indices, while a song with a harder-to-identify melody will show a large variance.
    </para> 
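    <para id="smoothsketch">
The measure described above can be illustrated with a short Python sketch. It is not the freqsmooth.m source. The function name peak_index_variance and the two tiny spectrograms are hypothetical, and the use of population variance (rather than sample variance) is an assumption.
    </para>
<code type="block" id="smoothsketchcode">
```python
from statistics import pvariance


def peak_index_variance(spec):
    """Variance of the index of the maximum value in each spectrogram column."""
    peak_bins = [col.index(max(col)) for col in spec]
    # Assumption: population variance; the original may normalize differently.
    return pvariance(peak_bins)


# Hypothetical 4-bin spectrograms, one inner list per column (time frame).
steady = [[0, 9, 1, 0], [0, 8, 2, 0], [1, 9, 0, 0], [0, 7, 1, 1]]
jumpy = [[9, 0, 1, 0], [0, 1, 0, 8], [0, 9, 2, 0], [1, 0, 0, 7]]

print(peak_index_variance(steady))  # prints 0
print(peak_index_variance(jumpy))   # prints 1.6875
```
</code>
    <para id="smoothsketchnote">
In the first example the peak stays in bin 1 in every column, so the variance is zero. In the second the peak moves between bins 0, 3, 1, and 3, which gives a larger variance.
    </para>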

<section id="results">
  <name>Results</name>
    <para id="endresults">
While freqsmooth does give a different value for each genre, it also gives radically different values for songs within a given genre. In other words, it does not represent a genre as a whole well. The plus-and-minus standard deviation bars of the genres overlap heavily.
    </para> 

<figure id="findings">
  <media type="image/gif" src="freqsmooth.gif"/>
</figure>

</section>
  </content>
  
</document>
