Affiliated with: Rice University ELEC 301 Projects. Part of the collection "ELEC 301 Projects Fall 2007".


DirectShow Filter Design for Laugh Track Removal

Module by: Justin Nordin.

Summary: This module discusses the implementation of a DirectShow filter designed to remove laugh tracks from audio streams. It is part of a series discussing the implementation of a real-time laugh track removal system. A link containing a working version of the filter is provided.

Real Time Implementation for Laugh Track Removal

Overview

To make the best use of the Laugh Track Assassinator's algorithm, we need to run it in real time on as wide a range of source material as possible. To accomplish this, we implemented a DirectShow filter. DirectShow is Microsoft's technology for manipulating media on the Windows platform. Nearly all media players, such as Windows Media Player, Media Player Classic, and various DVD programs, use DirectShow to render video and audio. By writing a DirectShow filter, we can apply our algorithm to nearly any type of media, be it a DVD, an encoded movie, or a live TV video stream.

DirectShow

All DirectShow operations are based on filters. Filters describe the translation of data from one source or type to another. DirectShow automatically finds the filters needed to play a particular media file and connects them into a filter graph. The generated graph can be visualized in Microsoft's GraphEdit program. Here is what the generated graph looks like for a source video file with the Laugh Track Assassinator filter inserted:

Figure 1 (Filter Graph, graphics1.png): This is the filter graph generated by Microsoft DirectShow with the Laugh Track Assassinator filter already inserted.

DirectShow has generated an AVI Splitter to separate the file data into an audio stream and a video stream. The video stream is sent to the ffdshow Video Decoder filter, whose output is then sent to the Video Renderer. The audio stream passes from the splitter through the MP3 Decoder, the AC3Filter, and the Laugh Track Assassinator, and is finally rendered to the speakers through the DirectSound filter.

To create the DirectShow-compatible filter, we used Microsoft's Windows SDK and rewrote its audio transform filter example. (The Windows SDK can be downloaded from Microsoft.) We then coded the two main steps of our algorithm: a low pass filter and a threshold detection scheme.

Low Pass Filter

To balance frequency resolution against speed, we chose a 1000-point finite impulse response (FIR) low pass filter. We had Matlab generate the one thousand filter weights, and then we converted them into a C++ format suitable for DirectShow. Since the filter needs the 1000 most recent input samples to calculate one low-pass-filtered sample, we created a 1000-point circular buffer that holds the last 1000 samples of the input at any given time.
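The circular-buffer idea can be sketched as follows. This is a minimal illustration, not the authors' code: the class name `CircularFir` and its interface are hypothetical, and the taps are passed in by the caller (in the project they would be the 1000 Matlab-generated weights).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original filter): a direct-form FIR
// filter that keeps the most recent N input samples in a circular buffer.
class CircularFir {
public:
    explicit CircularFir(std::vector<double> taps)
        : taps_(std::move(taps)), history_(taps_.size(), 0.0), head_(0) {
        assert(!taps_.empty());
    }

    // Push one input sample and return one filtered output sample.
    double process(double x) {
        history_[head_] = x;        // overwrite the oldest stored sample
        double acc = 0.0;
        std::size_t idx = head_;    // idx walks from newest to oldest input
        for (double tap : taps_) {
            acc += tap * history_[idx];
            idx = (idx == 0) ? history_.size() - 1 : idx - 1;
        }
        head_ = (head_ + 1) % history_.size();
        return acc;
    }

private:
    std::vector<double> taps_;     // filter weights h[0..N-1]
    std::vector<double> history_;  // last N inputs, stored circularly
    std::size_t head_;             // slot the next input will be written to
};
```

Because only the write index moves, no samples are ever shifted in memory; each call costs N multiplications and N additions regardless of buffer position.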

Finite State Machine

The final step in our removal algorithm requires threshold detection in both amplitude (vertical) and time (horizontal). The requirement for a time-based threshold meant we had to delay the input signal by at least the width of the horizontal threshold. In the end we decided on a 1 second delay, which leaves room for the width threshold of 0.8 seconds and also makes it easier to resynchronize the video signal with the audio afterward.
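A fixed delay of this kind can be sketched with another circular buffer. The names `DelayLine`, `secondsToSamples`, and `mute` are hypothetical and chosen for illustration; the 44.1 kHz rate used in the usage note is an assumption taken from the sampling rate quoted elsewhere in this module.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Convert a duration in seconds to a whole number of samples (rounded).
inline std::size_t secondsToSamples(double seconds, double sampleRateHz) {
    return static_cast<std::size_t>(seconds * sampleRateHz + 0.5);
}

// Hypothetical fixed-length delay line: each pushed sample comes back out
// exactly `delaySamples` calls later.
class DelayLine {
public:
    explicit DelayLine(std::size_t delaySamples)
        : buffer_(delaySamples, 0.0f), pos_(0) {
        assert(delaySamples > 0);
    }

    // Store the newest sample and return the one stored delaySamples calls ago.
    float push(float x) {
        float delayed = buffer_[pos_];
        buffer_[pos_] = x;
        pos_ = (pos_ + 1) % buffer_.size();
        return delayed;
    }

    // Silence all audio still waiting in the buffer.
    void mute() { std::fill(buffer_.begin(), buffer_.end(), 0.0f); }

private:
    std::vector<float> buffer_;  // samples waiting to be output
    std::size_t pos_;            // slot holding the oldest sample
};
```

At 44.1 kHz, `secondsToSamples(1.0, 44100.0)` gives a 44100-sample delay and `secondsToSamples(0.8, 44100.0)` gives a 35280-sample width threshold, so the detector always decides before the affected audio leaves the buffer.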

The actual threshold tests are performed by means of a finite state machine (FSM). Here is an overview of the FSM:

Figure 2 (State Diagram, Elec 301 - State Diagram.png): This is the state diagram for the real time Laugh Track Assassinator filter.

As soon as the low-passed signal meets the rising amplitude threshold, the filter enters the Possible Laugh state. From here, if the signal falls below the falling amplitude threshold, the machine returns to the Initial State. If instead the width threshold is reached, the machine enters the Laugh Detected state and continually suppresses the output audio. During this transition, the last second of audio is also eliminated from the output buffer. Since the filter output is delayed by at least 1 second, the output will reflect the proper changes as long as the width threshold is less than this delay. Finally, as soon as the signal drops below the falling amplitude threshold, the machine again returns to its Initial State.
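The three states and their transitions can be sketched as below. This is an illustrative reading of the description above, not the authors' source: the names `LaughFsm`, `LaughState`, and `step` are hypothetical, the input `level` is assumed to be a non-negative level derived from the low-passed signal, and the threshold values are left to the caller.

```cpp
#include <cassert>
#include <cstddef>

enum class LaughState { Initial, PossibleLaugh, LaughDetected };

// Hypothetical sketch of the threshold-detection state machine.
class LaughFsm {
public:
    LaughFsm(double riseThreshold, double fallThreshold, std::size_t widthSamples)
        : rise_(riseThreshold), fall_(fallThreshold), width_(widthSamples),
          state_(LaughState::Initial), run_(0) {
        assert(fallThreshold <= riseThreshold);
        assert(widthSamples > 0);
    }

    // Feed one level sample. Returns true only on the step where the machine
    // enters LaughDetected, which is when the caller would also erase the
    // audio still held in the delay buffer.
    bool step(double level) {
        switch (state_) {
        case LaughState::Initial:
            if (level >= rise_) { state_ = LaughState::PossibleLaugh; run_ = 1; }
            return false;
        case LaughState::PossibleLaugh:
            if (level < fall_) { state_ = LaughState::Initial; run_ = 0; return false; }
            if (++run_ >= width_) { state_ = LaughState::LaughDetected; return true; }
            return false;
        case LaughState::LaughDetected:
            if (level < fall_) { state_ = LaughState::Initial; run_ = 0; }
            return false;
        }
        return false;
    }

    // True while output audio should be suppressed.
    bool suppressing() const { return state_ == LaughState::LaughDetected; }
    LaughState state() const { return state_; }

private:
    double rise_;        // rising amplitude threshold
    double fall_;        // falling amplitude threshold
    std::size_t width_;  // width (time) threshold, in samples
    LaughState state_;
    std::size_t run_;    // samples elapsed since the rising threshold was met
};
```

Using separate rising and falling thresholds gives the detector hysteresis, so a level hovering near a single threshold does not make the machine flip between states on every sample.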

Optimization

The scheme described above yields a working laugh track removal filter. One big problem, however, is speed. Though the above system can process a real time video signal on a high-end computer, a moderate computer will not be able to run it. The chief problem is in the low pass filtering phase.

The low pass filter takes 1000 input samples to calculate 1 sample of the low-passed signal. This means there are roughly 2000 operations (1000 additions and 1000 multiplications) per sample. With a standard sampling rate of 44.1 kHz, the filter performs 44.1 million multiply-add pairs, or roughly 88.2 million individual operations, per second. This is generally unacceptable once the overhead of the rest of the filtering process is taken into account.
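The arithmetic behind this estimate can be written out explicitly. The symbols $N$ (number of filter taps) and $f_s$ (sampling rate) are introduced here only for illustration:

```latex
\begin{aligned}
N &= 1000 \text{ taps}, \qquad f_s = 44.1\ \text{kHz} \\
\text{multiply-add pairs per second} &= N \cdot f_s = 1000 \times 44\,100 = 44.1 \times 10^{6} \\
\text{individual operations per second} &= 2N \cdot f_s = 2000 \times 44\,100 = 88.2 \times 10^{6}
\end{aligned}
```

The 44.1 million figure counts each multiplication together with its addition as one step; counting them separately doubles it.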

To speed the filter up, we must first realize that we do not need an accurate low pass signal value for every sample. In fact, if we compute only every 1000th sample of the low-passed signal, the 2000 operations are spread over 1000 input samples, so we need only about 2 operations per input sample on average to get the same results. This gives a speed increase of 1000x by effectively downsampling the low pass filter output. Generally, strictly sampling a signal like this produces rather severe aliasing. But since the signal has already been low-pass filtered, it has in effect already undergone anti-aliasing processing, and the optimization works out.
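A minimal sketch of this optimization follows, assuming the same circular-buffer layout as a direct-form FIR filter. The class name `DecimatingFir` and its interface are hypothetical; in the project the factor would be 1000 and the taps would be the Matlab-generated weights.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: store every input sample, but evaluate the FIR sum
// only once every `factor` inputs.
class DecimatingFir {
public:
    DecimatingFir(std::vector<double> taps, std::size_t factor)
        : taps_(std::move(taps)), history_(taps_.size(), 0.0),
          head_(0), factor_(factor), count_(0) {
        assert(!taps_.empty());
        assert(factor > 0);
    }

    // Push one input sample. Returns true and writes `out` only on every
    // factor-th call; all other calls just store the sample.
    bool push(double x, double& out) {
        history_[head_] = x;
        const std::size_t newest = head_;
        head_ = (head_ + 1) % history_.size();
        if (++count_ < factor_) return false;  // cheap path: no filtering
        count_ = 0;
        double acc = 0.0;
        std::size_t idx = newest;              // walk from newest to oldest
        for (double tap : taps_) {
            acc += tap * history_[idx];
            idx = (idx == 0) ? history_.size() - 1 : idx - 1;
        }
        out = acc;
        return true;
    }

private:
    std::vector<double> taps_;     // filter weights
    std::vector<double> history_;  // most recent inputs, stored circularly
    std::size_t head_;             // next write position
    std::size_t factor_;           // compute one output per this many inputs
    std::size_t count_;            // inputs seen since the last output
};
```

Each computed value equals what the full filter would have produced at that instant; the intermediate outputs are simply never calculated, which is where the savings come from.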

Download and Installation

The Laugh Track Assassinator filter source code can be downloaded here. The Hamming filter we used can be downloaded here. The source code is not a complete listing, but rather the function that handles the actual laugh track filtering. This code is suitable for porting to any system that operates on PCM audio streams.

The Laugh Track Assassinator filter can be downloaded here. Since this is implemented as a DirectShow filter, this will only run on Windows-based computers.

To install, follow these steps:

  • Copy the LaughTrackAssassinator.dll file into your C:\Windows\System32 folder.
  • Open a command prompt window (Start->Run->“cmd”).
  • Type “regsvr32 LaughTrackAssassinator.dll” and press enter in the command box.
  • The Laugh Track Assassinator is now registered with DirectShow.

Now that the filter is registered, almost any DirectShow-based media player should be able to use the filter on any media. We tested the filter with Media Player Classic, a free media player that can be downloaded here. Here are the steps to get it to work:

  • Open Media Player Classic.
  • Go to View->Options->External Filters.
  • Select “Add filter...”.
  • Select the Laugh Track Assassinator from the list of available filters.
  • Select the newly added filter, and select the “Prefer” radio button.

You can now play any media that has audio, and it will automatically run through the Laugh Track Assassinator. To get the video back in sync with the audio, you can set the audio delay to 500 ms in Media Player Classic by using the + and – keys on the numpad of your keyboard.
