<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5//EN" "http://cnx.rice.edu/technology/cnxml/schema/dtd/0.5/cnxml_plain.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" id="id5716921">
<name>Audio Fingerprint Generation</name>
<metadata>
  <md:version>1.2</md:version>
  <md:created>2006/12/16 18:45:00 US/Central</md:created>
  <md:revised>2006/12/22 09:45:15.128 US/Central</md:revised>
  <md:authorlist>
      <md:author id="dremos">
      <md:firstname>Andre</md:firstname>
      <md:othername>T.</md:othername>
      <md:surname>Mosley</md:surname>
      <md:email>dremos@rice.edu</md:email>
    </md:author>
      <md:author id="ptwang">
      <md:firstname>Po</md:firstname>
      <md:othername>T</md:othername>
      <md:surname>Wang</md:surname>
      <md:email>ptwang@rice.edu</md:email>
    </md:author>
      <md:author id="jbroadway">
      <md:firstname>John</md:firstname>
      
      <md:surname>Broadway</md:surname>
      <md:email>jtb5020@rice.edu</md:email>
    </md:author>
      <md:author id="yjlee">
      <md:firstname>Yu-Heng</md:firstname>
      <md:othername>Jaret</md:othername>
      <md:surname>Lee</md:surname>
      <md:email>jaret.lee@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="dremos">
      <md:firstname>Andre</md:firstname>
      <md:othername>T.</md:othername>
      <md:surname>Mosley</md:surname>
      <md:email>dremos@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="ptwang">
      <md:firstname>Po</md:firstname>
      <md:othername>T</md:othername>
      <md:surname>Wang</md:surname>
      <md:email>ptwang@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="jbroadway">
      <md:firstname>John</md:firstname>
      
      <md:surname>Broadway</md:surname>
      <md:email>jtb5020@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="yjlee">
      <md:firstname>Yu-Heng</md:firstname>
      <md:othername>Jaret</md:othername>
      <md:surname>Lee</md:surname>
      <md:email>jaret.lee@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="Markpanzee">
      <md:firstname>Mark</md:firstname>
      <md:othername>A.</md:othername>
      <md:surname>Davenport</md:surname>
      <md:email>md@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="richb">
      <md:firstname>Richard</md:firstname>
      <md:othername>G.</md:othername>
      <md:surname>Baraniuk</md:surname>
      <md:email>richb@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>audio fingerprint generation</md:keyword>
  </md:keywordlist>

  <md:abstract>It is often useful to represent a long audio signal in a compact form that still retains enough information to distinguish it from other audio signals. Such a representation is valuable when building a system that compares audio files in order to identify them. This module explains how we used one particular audio fingerprinting method in our audio recognition system.</md:abstract>
</metadata>
<content>
<para id="id5769918">To obtain a more compact representation of
our audio signals, we researched methods of audio fingerprinting.
The method we used, which appeared to be very effective, compares
the amount of energy in different frequency bands. It takes
advantage of the fact that the human ear has limited frequency
resolution: two tones are hard to tell apart unless their
frequencies differ sufficiently. The audible range, from roughly 0
to 20 kHz, can be divided into 25 frequency bands of varying width,
known as critical bands, and the ear can be modeled as a bank of 25
band-pass filters. Two frequencies that fall within the same
critical band are generally indistinguishable. Therefore, rather
than keeping the entire spectrum of each audio signal, we can
examine the pattern of energy variation from one critical band to
the next, which requires retaining only one value per critical
band: its total energy.</para>
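<para id="sketch-band-energy-lead">As a minimal sketch of this idea,
the Python function below collapses a one-sided power spectrum into
one energy value per critical band. The band edges are an
assumption: they follow a commonly tabulated Bark-style list up to
15.5 kHz, with a final edge placed at 20 kHz so that there are 25
bands. This module does not list the edges we actually used, and
the names BAND_EDGES_HZ, band_energies, and bin_hz are
hypothetical.</para>
<code type="block" id="sketch-band-energy"><![CDATA[
```python
from bisect import bisect_right

# Assumed critical-band edges in Hz (26 edges -> 25 bands). The last edge
# (20 kHz) is an assumption made to cover the range described in the text.
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700, 9500, 12000, 15500, 20000]


def band_energies(power_spectrum, bin_hz, edges=BAND_EDGES_HZ):
    """Sum power-spectrum bins into one total energy per band.

    power_spectrum[k] is the power at frequency k * bin_hz.
    Bins at or above the last edge are ignored.
    """
    energies = [0.0] * (len(edges) - 1)
    for k, power in enumerate(power_spectrum):
        freq = k * bin_hz
        band = bisect_right(edges, freq) - 1  # index of the band containing freq
        if 0 <= band < len(energies):
            energies[band] += power
    return energies
```
]]></code>
<para id="sketch-band-energy-note">Whatever the length of the
spectrum, the result has one entry per band, which is what makes the
representation compact.</para>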
<para id="id5624659">To implement this system in Matlab, we first
divided the signal into time frames using Hanning windows of about
37 ms each. Tapering each frame with a Hanning window reduces the
extraneous oscillations (spectral leakage) that abruptly truncating
the signal would otherwise introduce into the spectrum. We also
overlapped consecutive windows so that samples attenuated near the
edges of one window fall near the center of a neighboring one, which
counteracts the amplitude-modulating effect of applying consecutive
Hanning windows.</para>
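<para id="sketch-framing-lead">The framing step can be sketched as
follows. This is an illustrative Python version rather than our
Matlab code. The 44.1 kHz sample rate and the 50 percent overlap are
assumptions, since this module states only the approximate window
length, and the function names are hypothetical.</para>
<code type="block" id="sketch-framing"><![CDATA[
```python
import math


def hann(n):
    """Symmetric Hann (Hanning) window with n samples."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(signal, frame_len, hop):
    """Split signal into overlapping frames and taper each with a Hann window.

    Consecutive frames start hop samples apart; a trailing partial
    frame is dropped.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


sample_rate = 44100                      # assumed; not stated in the module
frame_len = round(0.037 * sample_rate)   # about 37 ms of samples
hop = frame_len // 2                     # assumed 50 percent overlap
```
]]></code>
<para id="sketch-framing-note">A smaller hop gives more overlap and
therefore more time frames per second, at the cost of more
computation per signal.</para>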
<figure id="id5723858">
<media type="image/png" src="Graphic1.png"/>
</figure>
<para id="id4164051">Figure 1: Audio Fingerprint Generation
scheme.</para>
<para id="id5723642">Each windowed time frame goes through the
process shown in Figure 1.</para>
<para id="id5665534">The Fast Fourier Transform (FFT) of each
windowed time frame is computed, and the resulting spectrum is
divided into the 25 critical bands. The energy in each critical band
is calculated, and the energy differences between consecutive bands
are stored in an array of length 24. Repeating this for every time
frame yields a matrix with 24 rows and one column per time frame.
Within each row of the matrix, consecutive values are then compared.
If the value increases from one column to the next, the value in the
earlier column is replaced with a 1; otherwise it is replaced with a
-1. This column-to-column (that is, time frame to time frame)
comparison captures energy fluctuations over time in much the same
way that the band-to-band differences capture them across frequency.
The final result is a matrix of dimensions 24 x (number of time
frames - 1). This set of 1s and -1s is the audio signal’s
fingerprint. All that remains is to compare audio fingerprints to
find a match and return the song information.</para>
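<para id="sketch-fingerprint-lead">The two differencing steps can be
sketched in Python as shown below. The sketch starts from band
energies that have already been computed for each time frame, so the
FFT and band-summing stages are not shown. The sign convention for
the band difference (higher band minus lower band) is an assumption,
and the function name is hypothetical.</para>
<code type="block" id="sketch-fingerprint"><![CDATA[
```python
def fingerprint(band_energy_frames):
    """Turn per-frame band energies into a matrix of +1 and -1 values.

    band_energy_frames[t][b] is the energy of band b in time frame t.
    With 25 bands and T frames the result has 24 rows and T - 1 columns,
    indexed as result[row][column].
    """
    # Step 1: difference between consecutive bands within each frame
    # (assumed convention: higher band minus lower band).
    diffs = [[frame[b + 1] - frame[b] for b in range(len(frame) - 1)]
             for frame in band_energy_frames]

    # Step 2: within each row, compare each frame with the next one.
    # An increase becomes +1; anything else becomes -1.
    n_rows = len(diffs[0])
    n_cols = len(diffs) - 1
    return [[1 if diffs[t + 1][r] > diffs[t][r] else -1
             for t in range(n_cols)]
            for r in range(n_rows)]


# Tiny illustration with 3 bands and 3 time frames (made-up numbers).
example = fingerprint([[1, 2, 4], [1, 3, 4], [2, 2, 5]])
```
]]></code>
<para id="sketch-fingerprint-note">Because each entry records only
whether a value rose or fell, the fingerprint depends on the shape
of the energy pattern and not on the overall loudness of the
recording.</para>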
</content>
</document>
