Based on the distinct time-domain characteristics of a laugh track, it is possible to use a simple method to detect such sounds. Our primary detection method uses the envelope of the input signal to find laughs.

Summary: This module discusses the primary detection method used in a real-time laugh track removal system. It is part of a larger series on the implementation of this system.

We used two methods of finding and smoothing the envelope of the input signal. In the first method, the magnitude of the input signal is fed into a low-pass filter, and the filter output is squared to obtain the envelope. The filter is a 1000-point, linear-phase FIR filter generated in MATLAB. This method is simple and easy to implement, but not very efficient: computed as a direct convolution, a filter of this length costs on the order of 1000 multiply-accumulate operations per output sample.
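The first method can be sketched in Python with NumPy as shown below. This is a minimal illustration, not the filter used in the project: the original coefficients were generated in MATLAB and are not given here, so the windowed-sinc design, the Hamming window, the 20 Hz cutoff, and the names `lowpass_fir` and `envelope_rectify` are all assumptions made for the example.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Design a linear-phase low-pass FIR filter (windowed sinc, Hamming).

    Assumed design method; the original filter came from MATLAB.
    """
    # Tap indices centered on zero so the impulse response is symmetric,
    # which is what gives the filter linear phase.
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    fc = cutoff_hz / fs_hz                  # cutoff in cycles per sample
    h = 2.0 * fc * np.sinc(2.0 * fc * n)    # ideal low-pass impulse response
    h *= np.hamming(num_taps)               # taper to reduce ripple
    return h / h.sum()                      # normalize to unity gain at DC

def envelope_rectify(x, fs_hz, cutoff_hz=20.0, num_taps=1000):
    """Envelope estimate: rectify, low-pass filter, then square."""
    x = np.asarray(x, dtype=float)
    h = lowpass_fir(num_taps, cutoff_hz, fs_hz)
    # mode="same" keeps the output centered on the input; the result has
    # the same length as x as long as x is longer than the filter.
    smoothed = np.convolve(np.abs(x), h, mode="same")
    return smoothed ** 2
```

Because the low-pass output is squared, the result scales with the square of the signal amplitude, so any amplitude threshold applied later has to be chosen on that scale.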

The second method of finding the envelope uses the Hilbert transform. The Hilbert transform shifts the phase of every positive-frequency component of a signal by -pi/2 and the phase of every negative-frequency component by +pi/2. The envelope may then be calculated by taking the square root of the sum of the squares of the original signal and its Hilbert transform. The Hilbert transform is calculated by taking the FFT of the input signal, multiplying the positive-frequency bins by -j and the negative-frequency bins by +j, and taking the inverse FFT. (The opposite sign convention yields the same envelope, because the Hilbert transform enters only through its square.) As in the first method, the envelope must be smoothed by low-pass filtering before further processing.
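The FFT-based computation can be sketched as follows. This is an illustrative NumPy version rather than the project's code, and the function names are hypothetical. It uses the common convention of multiplying positive-frequency bins by -j and negative-frequency bins by +j; flipping both signs only negates the transform and leaves the envelope unchanged. The smoothing step is omitted.

```python
import numpy as np

def hilbert_transform(x):
    """Hilbert transform of a real signal, computed with the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    # fftfreq gives the signed frequency of each bin, so its sign tells
    # us which bins are positive and which are negative frequencies.
    multiplier = -1j * np.sign(np.fft.fftfreq(n))
    if n % 2 == 0:
        multiplier[n // 2] = 0.0   # Nyquist bin has no well-defined sign
    # DC is already zeroed because sign(0) == 0.
    return np.fft.ifft(spectrum * multiplier).real

def envelope_hilbert(x):
    """Unsmoothed envelope: sqrt(x^2 + H{x}^2)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(x ** 2 + hilbert_transform(x) ** 2)
```

For a pure cosine with a whole number of cycles in the analysis window, the transform is the matching sine, so the envelope is constant and equal to the cosine's amplitude.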

Once the envelope of the signal is found, laughs are detected with a two-level threshold scheme. The location routine iterates through the samples of the envelope, looking for values above a given amplitude threshold. Once this threshold is exceeded, the routine continues, tracking how long the envelope stays above a second, lower amplitude threshold. If this duration reaches a given minimum width, the part of the signal from the point where its envelope rises above the first amplitude threshold to the point where its envelope drops below the second amplitude threshold is flagged as a laugh.
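A minimal sketch of this location routine, assuming the envelope is available as a sequence of samples, is given below. The function name `find_laughs` and the parameter names are hypothetical, and the threshold values are left to the caller because the values used in the project are not stated here.

```python
def find_laughs(envelope, high_thresh, low_thresh, min_width):
    """Return (start, end) sample-index pairs flagged as laughs.

    start is the first sample above high_thresh; end is the first
    sample at or below low_thresh (exclusive end index).
    """
    if low_thresh > high_thresh:
        raise ValueError("low_thresh must not exceed high_thresh")
    segments = []
    i = 0
    n = len(envelope)
    while i < n:
        if envelope[i] > high_thresh:
            start = i
            # Keep going for as long as the envelope stays above the
            # second, lower threshold.
            while i < n and envelope[i] > low_thresh:
                i += 1
            # Flag the region only if it lasted long enough.
            if i - start >= min_width:
                segments.append((start, i))
        else:
            i += 1
    return segments
```

Using a lower second threshold means that a brief dip in the envelope during a laugh does not split it into two short regions that would each fail the width test.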

Our primary detection method gave us good results for the sound clips we used (sitcoms with audio tracks consisting mostly of dialogue and laughs). It does, however, have a few shortcomings. The method has trouble detecting low-volume laughs, and it tends to cut out other sounds that overlap with the laughs. It also has trouble locating the precise beginning and end of a laugh, because the envelope shape varies from one laugh to another. We had very little trouble with false positives in the clips we tested; however, more sophisticated methods would be required to distinguish laughter from other sounds with similar time-domain characteristics (such as applause).