<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="id9581428">
  <name>DirectShow Filter Design for Laugh Track Removal</name>
  <metadata>
  <md:version>1.3</md:version>
  <md:created>2007/12/16 20:16:14 US/Central</md:created>
  <md:revised>2007/12/18 03:24:49.385 US/Central</md:revised>
  <md:authorlist>
      <md:author id="nordin">
      <md:firstname>Justin</md:firstname>
      <md:othername>Layne</md:othername>
      <md:surname>Nordin</md:surname>
      <md:email>nordin@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="nordin">
      <md:firstname>Justin</md:firstname>
      <md:othername>Layne</md:othername>
      <md:surname>Nordin</md:surname>
      <md:email>nordin@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>Canned Laughter</md:keyword>
    <md:keyword>DirectShow</md:keyword>
    <md:keyword>Filtering</md:keyword>
    <md:keyword>Finite State Machine</md:keyword>
    <md:keyword>FSM</md:keyword>
    <md:keyword>Laugh Track</md:keyword>
    <md:keyword>Real-time</md:keyword>
    <md:keyword>Signal Processing</md:keyword>
  </md:keywordlist>

  <md:abstract>This module discusses the implementation of a DirectShow filter designed to remove laugh tracks from audio streams. It is part of a series discussing the implementation of a real-time laugh track removal system. A link to a working version of the filter is provided.</md:abstract>
</metadata>
  <content>
    <section id="id-270115163605">
      <name>Real-Time Implementation for Laugh Track Removal</name>
      <section id="id-683710602497">
        <name>Overview</name>
        <para id="id10684420">To make the best use of the <emphasis>Laugh Track Assassinator</emphasis>'s algorithm, we need to run it in real time on as wide a range of source material as possible. To accomplish this, we implemented a <term>DirectShow</term> filter. DirectShow is Microsoft's technology for manipulating media on the Windows platform. Nearly all Windows media players, such as Windows Media Player, Media Player Classic, and various DVD programs, use DirectShow to render video and audio. Because it is written as a DirectShow filter, our algorithm can process nearly any type of media, be it a DVD, an encoded movie, or a live TV video stream.</para>
      </section>
      <section id="id-13164221782">
        <name>DirectShow</name>
        <para id="id10684453">All DirectShow operations are based on filters. A filter describes the translation of data from one source or type to another. To play a particular media file, DirectShow automatically determines which filters are needed and connects them into a filter graph. The generated graph can be visualized in Microsoft's <emphasis>GraphEdit</emphasis> program. Here is the generated graph for a source video file with the <emphasis>Laugh Track Assassinator</emphasis> filter inserted:</para>
        
        <figure id="element-721"><name>Filter Graph</name>
  <media type="image/png" src="graphics1.png">
<param name="width" value="1400"/>
<param name="height" value="200"/>
</media>
  <caption>This is the filter graph generated by Microsoft DirectShow with the Laugh Track Assassinator filter already inserted.</caption></figure><para id="id10687865">DirectShow has generated an AVI splitter to separate the file data into an audio stream and a video stream. The video stream is decoded by the <emphasis>ffdshow Video Decoder</emphasis> filter, and the decoded video is then sent to the <emphasis>Video Renderer</emphasis>. The audio stream passes from the splitter through the <emphasis>MP3 Decoder</emphasis>, an <emphasis>AC3Filter</emphasis>, and the <emphasis>Laugh Track Assassinator</emphasis>, and is finally rendered to the speakers by the <emphasis>DirectSound</emphasis> filter.</para>
        <para id="id10687906">To create the DirectShow-compatible filter, we used Microsoft's <emphasis>Windows SDK</emphasis> and rewrote its audio transform filter example. (The Windows SDK can be downloaded from Microsoft <link src="http://www.microsoft.com/downloads/details.aspx?familyid=4377F86D-C913-4B5C-B87E-EF72E5B4E065&amp;displaylang=en">here</link>.) We then coded the two main steps of our algorithm: a low pass filter and a threshold detection scheme.</para>
      </section>
      <section id="id-885862484126">
        <name>Low Pass Filter</name>
        <para id="id10687938">To balance frequency resolution against speed, we chose a 1000-point finite impulse response (FIR) <term>low pass filter</term>. We had <emphasis>Matlab</emphasis> generate the one thousand filter weights, and then we converted them into a C++ format suitable for DirectShow. Since each low pass filtered sample is computed from the 1000 most recent input samples, we created a 1000-point circular buffer that holds the last 1000 samples of the input at any given time.</para>
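        <para id="lta-sketch-fir-intro">The circular-buffer idea above can be sketched as follows. This is only a minimal illustration, not the filter's actual source: the class name <code type="inline">CircularFir</code> and its interface are hypothetical, and the weights are passed in by the caller rather than being the Matlab-generated values.</para>
        <code type="block" id="lta-sketch-fir-code"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper class: the name and interface are illustrative only.
class CircularFir {
public:
    // taps: the filter weights (the module uses 1000 of them).
    explicit CircularFir(std::vector<double> taps)
        : taps_(std::move(taps)), history_(taps_.size(), 0.0), head_(0) {
        assert(!taps_.empty());
    }

    // Store one input sample and return one low pass filtered sample.
    double process(double x) {
        history_[head_] = x;  // overwrite the oldest stored sample
        double acc = 0.0;
        std::size_t idx = head_;
        // taps_[0] weights the newest sample, taps_[k] the sample k steps back.
        for (std::size_t k = 0; k < taps_.size(); ++k) {
            acc += taps_[k] * history_[idx];
            idx = (idx == 0) ? history_.size() - 1 : idx - 1;  // step back, wrapping
        }
        head_ = (head_ + 1) % history_.size();  // next write position
        return acc;
    }

private:
    std::vector<double> taps_;     // filter weights
    std::vector<double> history_;  // circular buffer of recent inputs
    std::size_t head_;             // index where the next sample is written
};
```
]]></code>
        <para id="lta-sketch-fir-note">Because the buffer is overwritten in place, no samples are ever shifted in memory; only the write index moves. The cost per output sample is still one multiplication and one addition per weight.</para>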
      </section>
      <section id="id-968749230573">
        <name>Finite State Machine</name>
        <para id="id10687968">The final step in our removal algorithm requires threshold detection in both amplitude (vertical) and time (horizontal). Because of the time-based threshold, we had to delay the input signal by at least the width of the horizontal threshold. In the end we decided on a 1 second delay, which leaves room for the width threshold of 0.8 seconds and makes it easier to resynchronize the video signal with the audio afterward.</para>
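        <para id="lta-sketch-delay-intro">One way to picture the delay is as a fixed-length buffer that returns each sample a set number of samples after it was stored. The sketch below is an assumption about how such a delay could be written, not the filter's actual code; <code type="inline">DelayLine</code>, <code type="inline">push</code>, and <code type="inline">mute</code> are hypothetical names. At 44.1 kHz, a 1 second delay would correspond to 44100 samples per channel.</para>
        <code type="block" id="lta-sketch-delay-code"><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-length delay line: names are illustrative only.
class DelayLine {
public:
    // delaySamples: e.g. 44100 for a 1 second delay at 44.1 kHz.
    explicit DelayLine(std::size_t delaySamples)
        : buffer_(delaySamples, 0.0f), pos_(0) {
        assert(delaySamples > 0);
    }

    // Store the newest input sample and return the one stored
    // delaySamples calls earlier (silence until the buffer has filled).
    float push(float in) {
        float out = buffer_[pos_];
        buffer_[pos_] = in;
        pos_ = (pos_ + 1) % buffer_.size();
        return out;
    }

    // Silence all audio that is still waiting inside the delay line.
    void mute() { std::fill(buffer_.begin(), buffer_.end(), 0.0f); }

private:
    std::vector<float> buffer_;  // samples that have not been output yet
    std::size_t pos_;            // slot holding the oldest sample
};
```
]]></code>
        <para id="lta-sketch-delay-note">The <code type="inline">mute</code> call illustrates why the delay matters: audio that has been received but not yet played can still be silenced once a laugh is recognized.</para>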
        <para id="id10687980">The actual threshold tests are performed by a finite state machine. Here is an overview of the <term>FSM</term>:</para>
        <figure id="element-608"><name>State Diagram</name>
  <media type="image/png" src="Elec 301 - State Diagram.png">
</media>
  <caption>This is the state diagram for the real time Laugh Track Assassinator filter.</caption></figure><para id="id10687992">As soon as the low-passed signal meets the rising amplitude threshold, the machine enters the <emphasis>Possible Laugh</emphasis> state. From here, if the signal falls below the falling amplitude threshold, the machine returns to the <emphasis>Initial State</emphasis>. If instead the width threshold is reached, the machine enters the <emphasis>Laugh Detected</emphasis> state and continually suppresses the output audio. During this transition, the last second of audio is also eliminated from the output buffer. Since the filter output is delayed by at least 1 second, the output will reflect the proper changes as long as the width threshold is less than this delay. Finally, as soon as the signal falls below the falling amplitude threshold, the machine again returns to its <emphasis>Initial State</emphasis>.</para>
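        <para id="lta-sketch-fsm-intro">The transitions described above can be sketched as a small state machine. This is a hedged illustration rather than the filter's real code: the names <code type="inline">LaughDetector</code> and <code type="inline">LaughThresholds</code> are hypothetical, the numeric amplitude thresholds are not given in this module and must be supplied by the caller, and the width is assumed to be counted in detector steps.</para>
        <code type="block" id="lta-sketch-fsm-code"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

enum class LaughState { Initial, PossibleLaugh, LaughDetected };

// Hypothetical parameter bundle: the values are chosen by the caller.
struct LaughThresholds {
    double rising;      // amplitude that starts a candidate laugh
    double falling;     // amplitude below which a candidate or laugh ends
    std::size_t width;  // steps a candidate must last to count as a laugh
};

class LaughDetector {
public:
    explicit LaughDetector(LaughThresholds t)
        : t_(t), state_(LaughState::Initial), held_(0) {
        assert(t.falling <= t.rising);
        assert(t.width > 0);
    }

    // Advance by one low-passed value. Returns true only on the step that
    // enters LaughDetected, which is when the caller would also erase the
    // audio still held in the delay buffer.
    bool step(double level) {
        switch (state_) {
        case LaughState::Initial:
            if (level >= t_.rising) {
                state_ = LaughState::PossibleLaugh;
                held_ = 1;
            }
            break;
        case LaughState::PossibleLaugh:
            if (level < t_.falling) {
                state_ = LaughState::Initial;  // too short: not a laugh
                held_ = 0;
            } else if (++held_ >= t_.width) {
                state_ = LaughState::LaughDetected;
                return true;
            }
            break;
        case LaughState::LaughDetected:
            if (level < t_.falling) {
                state_ = LaughState::Initial;  // laugh is over
                held_ = 0;
            }
            break;
        }
        return false;
    }

    LaughState state() const { return state_; }

    // True while the output audio should be silenced.
    bool suppressOutput() const { return state_ == LaughState::LaughDetected; }

private:
    LaughThresholds t_;
    LaughState state_;
    std::size_t held_;  // steps spent in the current candidate
};
```
]]></code>
        <para id="lta-sketch-fsm-note">Using separate rising and falling thresholds gives the detector hysteresis, so a signal hovering near a single threshold does not make the machine flip between states on every step.</para>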
      </section>
      <section id="id-335283997929">
        <name>Optimization</name>
        <para id="id10688060">The scheme described above produces a working laugh track removal filter. One big problem, however, is speed. Although the system keeps up with a real time video signal on a high-end computer, a mid-range computer cannot run it fast enough. The chief bottleneck is the low pass filtering phase.</para>
        <para id="id10688069">The low pass filter uses 1000 input samples to calculate 1 sample of the low passed signal. This means there are roughly 2000 operations (1000 additions and 1000 multiplications) per sample. At a standard sampling rate of 44.1 kHz, the filter therefore needs roughly 2000 × 44,100 ≈ 88.2 million operations per second. This is generally unacceptable once the rest of the overhead in the filtering process is taken into account.</para>
        <para id="id10688080">To speed the filter up, we must first realize that we do not need an accurate low pass signal value for every sample. In fact, if we compute only every 1000th sample of the low pass signal, we perform 2000 operations per 1000 input samples, or an average of only 2 operations per sample, and the threshold detection gives essentially the same results. This method gives us a speed increase of 1000x by effectively sampling the low pass filter output. In general, sampling a signal this coarsely produces rather severe aliasing. But since the signal is already low-pass-filtered, it has effectively already undergone anti-aliasing filtering, provided the filter's cutoff lies below half of the reduced sampling rate, and the optimization works out.</para>
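        <para id="lta-sketch-decim-intro">The optimization can be sketched as a filter that stores every input sample but evaluates the weighted sum only once per block of inputs. As before, this is an illustrative sketch under assumptions, not the filter's actual code: <code type="inline">DecimatingFir</code> and its <code type="inline">push</code> method are hypothetical names, and the module's configuration would correspond to 1000 weights with a step of 1000.</para>
        <code type="block" id="lta-sketch-decim-code"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical decimating FIR filter: names are illustrative only.
class DecimatingFir {
public:
    // taps: filter weights. step: compute one output per `step` inputs.
    DecimatingFir(std::vector<double> taps, std::size_t step)
        : taps_(std::move(taps)), history_(taps_.size(), 0.0),
          head_(0), step_(step), countdown_(step) {
        assert(!taps_.empty());
        assert(step > 0);
    }

    // Store one input sample. Returns true and writes `out` only on every
    // step-th call; all other calls cost a single buffer write.
    bool push(double x, double& out) {
        history_[head_] = x;
        const std::size_t newest = head_;
        head_ = (head_ + 1) % history_.size();
        if (--countdown_ != 0) {
            return false;  // skip the expensive weighted sum
        }
        countdown_ = step_;
        double acc = 0.0;
        std::size_t idx = newest;
        // taps_[0] weights the newest sample, taps_[k] the sample k steps back.
        for (std::size_t k = 0; k < taps_.size(); ++k) {
            acc += taps_[k] * history_[idx];
            idx = (idx == 0) ? history_.size() - 1 : idx - 1;
        }
        out = acc;
        return true;
    }

private:
    std::vector<double> taps_;     // filter weights
    std::vector<double> history_;  // circular buffer of recent inputs
    std::size_t head_;             // next write position
    std::size_t step_;             // decimation factor
    std::size_t countdown_;        // inputs left until the next output
};
```
]]></code>
        <para id="lta-sketch-decim-note">Each computed output is identical to what the full-rate filter would have produced at that instant; the saving comes purely from not computing the outputs in between.</para>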
      </section>
      <section id="id-706478961825">
        <name>Download and Installation</name>
        <para id="id10688102">The <emphasis>Laugh Track Assassinator</emphasis> filter can be downloaded <link src="LaughTrackAssassinator.dll">here</link>. Because it is implemented as a DirectShow filter, it runs only on Windows-based computers.</para>
        <para id="id10688114">To install, follow these steps: </para>
        <list type="bulleted" id="id10688118">
          <item>Copy the LaughTrackAssassinator.dll file into your C:\Windows\System32 folder.</item>
          <item>Open a command prompt window (Start-&gt;Run-&gt;“cmd”).</item>
          <item>Type “regsvr32 LaughTrackAssassinator.dll” and press enter in the command box.</item>
          <item>The Laugh Track Assassinator is now registered with DirectShow.</item>
        </list>
        <para id="id10688149">Now that the filter is registered, almost any DirectShow-based media player should be able to use it on any media. We tested the filter with <term>Media Player Classic</term>, a free media player that can be downloaded <link src="http://sourceforge.net/project/showfiles.php?group_id=82303&amp;package_id=84358">here</link>. Here are the steps to get it to work:</para>
        <list type="bulleted" id="id10688172">
          <item>Open Media Player Classic.</item>
          <item>Go to View-&gt;Options-&gt;External Filters.</item>
          <item>Select “Add filter...”.</item>
          <item>Select the <emphasis>Laugh Track Assassinator</emphasis> from the list of available filters.</item>
          <item>Select the newly added filter, and select the “Prefer” radio button.</item>
        </list>
        <para id="id10688212">Any media with audio that you play will now automatically pass through the <emphasis>Laugh Track Assassinator</emphasis>. To get the video back in sync with the audio, you can set the audio delay to 500 ms in Media Player Classic by using the + and – keys on the numpad of your keyboard.</para>
      </section>
    </section>
  </content>
</document>
