<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="Module.2004-05-17.4339">
  <name>Fast Convolution</name>
  <metadata>
  <md:version>1.4</md:version>
  <md:created>2004/05/17 11:43:39 GMT-5</md:created>
  <md:revised>2004/06/21 11:59:00.562 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="dljones">
      <md:firstname>Douglas</md:firstname>
      <md:othername>L.</md:othername>
      <md:surname>Jones</md:surname>
      <md:email>dl-jones@uiuc.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="kclarks">
      <md:firstname>Kyle</md:firstname>
      
      <md:surname>Clarkson</md:surname>
      <md:email>kclarks@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="harika">
      <md:firstname>Harika</md:firstname>
      
      <md:surname>Basana</md:surname>
      <md:email>ilsai@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>fast</md:keyword>
    <md:keyword>convolution</md:keyword>
    <md:keyword>FFT</md:keyword>
  </md:keywordlist>

  <md:abstract>Efficient computation of convolution using FFTs.</md:abstract>
</metadata>

  <content>
    <section id="sec_A">
      <name>Fast Circular Convolution</name>
      <para id="intro">
	Since, 
	<m:math display="block">
          

	  <m:apply>
	    <m:eq/>
	    <m:apply>
	      <m:sum/>
	      <m:bvar><m:ci>m</m:ci></m:bvar>
	      <m:lowlimit><m:cn>0</m:cn></m:lowlimit>
	      <m:uplimit>
                <m:apply>
		  <m:minus/>
                  <m:ci>N</m:ci>
                  <m:cn>1</m:cn>
		</m:apply>
              </m:uplimit>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:ci type="fn">x</m:ci>
		  <m:ci>m</m:ci>
		</m:apply>
                <m:apply> 
                  <m:rem/>
		  <m:apply>
		    <m:ci type="fn">h</m:ci>
		    
                    <m:apply> 
		      <m:minus/>
		      <m:ci>n</m:ci>
		      <m:ci>m</m:ci> 
                    </m:apply>    
		    
		  </m:apply>
		  <m:ci>N</m:ci>
                </m:apply> 
	      </m:apply>
	    </m:apply>
            <m:apply>
              <m:ci type="fn">y</m:ci>
	      <m:ci>n</m:ci>
            </m:apply>
	  </m:apply> 
          <m:ci> is equivalent to </m:ci>

	  <m:apply>
	    <m:eq/>
	    <m:apply>
              <m:ci type="fn">Y</m:ci>
	      <m:ci>k</m:ci>
	    </m:apply>
	    <m:apply>
	      <m:times/>
	      <m:apply>
		<m:ci type="fn">X</m:ci>
		<m:ci>k</m:ci>
	      </m:apply>
	      <m:apply>
		<m:ci type="fn">H</m:ci>
		<m:ci>k</m:ci>
	      </m:apply>  
	    </m:apply>
	  </m:apply> 
	  
	</m:math>
	
	<m:math> 
	  <m:apply>
	    <m:ci type="fn">y</m:ci>
	    <m:ci>n</m:ci>
	  </m:apply>  
	</m:math> 
	can be computed as 
	<m:math> 
	  <m:apply>
	    <m:eq/>
	    <m:apply>
	      <m:ci type="fn">y</m:ci>
	      <m:ci>n</m:ci>
	    </m:apply> 
	    <m:apply>
	      <m:ci type="fn" class="discrete">IDFT</m:ci>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:ci type="fn" class="discrete">DFT</m:ci>
		  <m:apply>
		    <m:ci type="fn">x</m:ci>
		    <m:ci>n</m:ci>
		  </m:apply>
		</m:apply>
		<m:apply>
		  <m:ci type="fn" class="discrete">DFT</m:ci>
		  <m:apply>
		    <m:ci type="fn">h</m:ci>
		    <m:ci>n</m:ci>
		  </m:apply>
		</m:apply>
	      </m:apply>
	    </m:apply>
	  </m:apply>
	</m:math>
	

	<list id="cost" type="bulleted">
	  <name>Cost</name>
	  <item> <list id="cost1">
	      <name>Direct</name>
	      <item><m:math>
		  <m:apply>
		    <m:power/>
		    <m:ci>N</m:ci>
		    <m:ci>2</m:ci>  
		  </m:apply>
		</m:math> complex multiplies. 
	      </item>  
	      <item><m:math>
		  <m:apply>
		    <m:times/>
		    <m:ci>N</m:ci>
		    <m:apply>
		      <m:minus/>
		      <m:ci>N</m:ci> 
		      <m:ci>1</m:ci>  
		    </m:apply>
		  </m:apply>
		</m:math> complex adds.
	      </item>
	    </list>
	  </item>
	  <item><list id="cost2">
	      <name>Via FFTs</name>
	      <item>3 FFTs + <m:math><m:ci>N</m:ci></m:math> multipies.</item> 
	      <item><m:math>
		  <m:apply>
		    <m:plus/>
		    <m:ci>N</m:ci>   
		    <m:apply>
		      <m:times/>
		      <m:apply>
			<m:divide/>  
			<m:apply>
			  <m:times/>
			  <m:cn>3</m:cn>
			  <m:ci>N</m:ci>
			</m:apply> 
			<m:cn>2</m:cn>
		      </m:apply>     
		      <m:apply>
			<m:log/> 
			<m:logbase><m:cn>2</m:cn></m:logbase> 
			<m:ci>N</m:ci> 
		      </m:apply> 
		    </m:apply>
		  </m:apply>
		</m:math> complex multiplies.</item>  
	      <item><m:math>
		  <m:apply>
		    <m:times/>
		    <m:cn>3</m:cn>
		    <m:apply>
		      <m:times/>
		      <m:ci>N</m:ci> 
		      <m:apply>
			<m:log/> 
			<m:logbase><m:cn>2</m:cn></m:logbase>  
			<m:ci>N</m:ci>
		      </m:apply>
		    </m:apply>
		  </m:apply>
		</m:math> complex adds.</item>
	    </list></item>
	</list>
	If <m:math><m:apply>
	    <m:ci type="fn">H</m:ci>
	    <m:ci>k</m:ci>
	  </m:apply>
	</m:math> can be precomputed, cost is only 2 FFts +
	<m:math><m:ci>N</m:ci></m:math> multiplies.
      </para>
    </section> 


    <section>
      <name>Fast Linear Convolution</name> 
      <para id="sec2_1">
	<cnxn document="m12032" target="DFTequation">DFT</cnxn> produces cicular convolution. For linear convolution, we
	must zero-pad sequences so that circular wrap-around always
	wraps over zeros.  </para>

      <figure id="fig1001" orient="horizontal">
	<media type="image/png" src="figure6.png"/>
      </figure> 
      <para id="sec2_2">
	To achieve linear convolution using fast circular convolution,
	we must use zero-padded DFTs of length
	<m:math>
	  <m:apply>
	    <m:geq/>
	    <m:ci>N</m:ci>
	    <m:apply>
	      <m:minus/>
	      <m:apply>
		<m:plus/>
		<m:ci>L</m:ci>
		<m:ci>M</m:ci>
	      </m:apply>
	      <m:cn>1</m:cn> 
	    </m:apply>
	  </m:apply>
	</m:math> 
      </para>

      <figure id="fig1002" orient="horizontal">
	<media type="image/png" src="Figure7.png"/> 
      </figure> 

      <para id="sec2_3">
	Choose shortest convenient 
	<m:math>
	  <m:apply>
	    <m:ci>N</m:ci>
	  </m:apply>
	</m:math>
	(usually smallest power-of-two greater than or equal to 
	<m:math>
	  <m:apply>
	    <m:minus/>
	    <m:apply>
	      <m:plus/>
	      <m:ci>L</m:ci>
	      <m:ci>M</m:ci>
	    </m:apply>
	    <m:cn>1</m:cn>
	  </m:apply>
	  
	</m:math>)

	<m:math display="block">
	  <m:apply>
	    <m:eq/>
	    <m:apply>
	      <m:ci type="fn">y</m:ci>
	      <m:ci>n</m:ci>
	    </m:apply>
	    <m:apply>
	      <m:ci type="fn" class="discrete"><m:msub>
		  <m:mi>IDFT</m:mi>
		  <m:mi>N</m:mi>
		</m:msub></m:ci>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:ci type="fn" class="discrete">
		    <m:msub><m:mi>DFT</m:mi>
		      <m:mi>N</m:mi>
		    </m:msub></m:ci>
		  <m:apply>
		    <m:ci type="fn">x</m:ci>     
		    <m:ci>n</m:ci>     
		  </m:apply> 
		</m:apply>
		<m:apply>
		  <m:ci type="fn" class="discrete"><m:msub>
		      <m:mi>DFT</m:mi>
		      <m:mi>N</m:mi>
		    </m:msub></m:ci>
		  <m:apply>
		    <m:ci type="fn">h</m:ci>     
		    <m:ci>n</m:ci>     
		  </m:apply>
		</m:apply>
	      </m:apply>
	      
	    </m:apply>      

	  </m:apply>
	</m:math>
	<note type="note">There is some inefficiency when compared to circular convolution due to longer zero-padded <cnxn document="m12032" target="DFTequation">DFTs</cnxn>. Still, 
	  <m:math>
	    <m:apply>
	      <m:ci type="fn">O</m:ci>
	      <m:apply>
		<m:divide/>
		<m:ci>N</m:ci>
		<m:apply>
		  <m:log/>
		  <m:logbase><m:cn>2</m:cn></m:logbase>
		  <m:ci>N</m:ci>          
		</m:apply>
	      </m:apply>
	    </m:apply> 
	  </m:math>
	  savings over direct computation.</note>
      </para> 
    </section>


    <section>
      <name>Running Convolution</name>
      <para id="sec3_1"> Suppose 
	<m:math>
	  <m:apply>
	    <m:eq/>
	    <m:ci>L</m:ci>
	    <m:infinity/>
	  </m:apply>
	</m:math>, as in a real time filter application, or 

	<m:math>
	  <m:apply>   
	    <m:mo>≫</m:mo>
	    <m:ci>L</m:ci>
	    <m:ci>M</m:ci>
	  </m:apply>
	</m:math>. There are efficient block methods for computing fast convolution.
      </para>
      
      <section>
	<name>Overlap-Save (OLS) Method</name>
	<para id="subsec_1.1">
	  Note that if a length-<m:math><m:ci>M</m:ci></m:math> filter
	  <m:math>
	    <m:apply>
	      <m:ci type="fn">h</m:ci>
	      <m:ci>n</m:ci>
	    </m:apply>
	  </m:math>
	  is circularly convulved with a
	  length-<m:math><m:ci>N</m:ci></m:math> segment of a signal
	  <m:math>
	    <m:apply>
	      <m:ci type="fn">x</m:ci>
	      <m:ci>n</m:ci>
	    </m:apply>
	  </m:math>,
	  
	  <figure id="fig_sec3_1">
	    <media type="image/png" src="figure4.png"/>
	  </figure>
	  
	  the first 
	  <m:math>
	    <m:apply>
	      <m:minus/>
	      <m:ci>M</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math>
	  samples are wrapped around and thus is
	  <emphasis>incorrect</emphasis>. However, for
	  <m:math>
	    <m:apply>
	      <m:leq/> 
	      <m:apply>
		<m:minus/>
		<m:ci>M</m:ci>
		<m:cn>1</m:cn>
	      </m:apply>
	      <m:apply>
		<m:leq/>
		<m:ci>n</m:ci>
		<m:apply>
		  <m:minus/>
		  <m:ci>N</m:ci>
		  <m:cn>1</m:cn>
		</m:apply>
	      </m:apply>
	    </m:apply> </m:math>,the convolution is linear
	  convolution, so these samples are correct. Thus
	  <m:math>
	    <m:apply>
	      <m:plus/>
	      <m:apply>
		<m:minus/>
		<m:ci>N</m:ci> 
		<m:ci>M</m:ci>
	      </m:apply>
	      <m:cn>1</m:cn>  
	    </m:apply>
	  </m:math> good outputs are produced for each
	  length-<m:math><m:ci>N</m:ci></m:math> circular convolution.
	</para>

	<para id="subsec_1.2">
	  The Overlap-Save Method: Break long signal into successive
	  blocks of
	  <m:math>
	    <m:ci>N</m:ci>  
	  </m:math>
	  samples, each block overlapping the previous block by 
	  <m:math>
	    <m:apply>
	      <m:minus/>
	      <m:ci>M</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math> samples. Perform circular convolution of each
	  block with filter
	  <m:math>
	    <m:apply>
	      <m:ci type="fn">h</m:ci>
	      <m:ci>m</m:ci>
	    </m:apply>
	  </m:math>. Discard first 
	  <m:math>
	    <m:apply>
	      <m:minus/>
	      <m:ci>M</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math>
	  points in each output block, and concatenate the remaining
	  points to create <m:math>
	    <m:apply>
	      <m:ci type="fn">y</m:ci>
	      <m:ci>n</m:ci>
	    </m:apply>
	  </m:math>.   
	  
	  <figure id="fig_sec3_2">
	    <media type="image/png" src="Figure1.png"/>
	  </figure>
	</para>
	
	<para id="subsec_1.3">
	  Computation cost for a length-<m:math><m:ci>N</m:ci></m:math> equals

	  <m:math>
	    <m:apply>
	      <m:power/>
	      <m:cn>2</m:cn>
	      <m:ci>n</m:ci> 
	    </m:apply>
	  </m:math> FFT per output sample is (assuming precomputed
	  <m:math>
	    <m:apply>
	      <m:ci type="fn">H</m:ci>
	      <m:ci>k</m:ci> 
	    </m:apply>
	  </m:math>) 2 FFTs and <m:math><m:ci>N</m:ci></m:math> multiplies

	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      
	      <m:apply>
		<m:divide/>
		<m:apply>
		  <m:plus/>
		  <m:apply>      
		    <m:times/>
		    <m:cn>2</m:cn>
		    <m:apply> 
		      <m:times/>
		      <m:apply>
			<m:divide/>
		        <m:ci>N</m:ci>    
                        <m:cn>2</m:cn>
		      </m:apply> 
		      <m:apply>
			<m:log/> 
		        <m:logbase><m:cn>2</m:cn></m:logbase> 
		        <m:ci>N</m:ci> 
		      </m:apply> 
		    </m:apply>
		  </m:apply>
		  <m:ci>N</m:ci>
		</m:apply>
		<m:apply>
		  <m:plus/>
		  <m:apply>	
		    <m:minus/>
		    <m:ci>N</m:ci> 
		    <m:ci>M</m:ci>
		  </m:apply>
		  <m:cn>1</m:cn>    
		</m:apply>
	      </m:apply>
	      
	      <m:apply>
		<m:divide/>
		<m:apply>
		  <m:times/>
		  <m:ci>N</m:ci>
		  <m:apply>
		    <m:plus/>
		    <m:apply>
		      <m:log/>
		      <m:logbase><m:cn>2</m:cn></m:logbase> 
		      <m:ci>N</m:ci> 
		    </m:apply>
		    <m:cn>1</m:cn>
		  </m:apply>  
		</m:apply>
		
		<m:apply>
		  <m:plus/>
		  <m:apply>	
		    <m:minus/>
		    <m:ci>N</m:ci> 
		    <m:ci>M</m:ci>
		  </m:apply>
		  <m:cn>1</m:cn>  
		</m:apply>
		
	      </m:apply>    


	    </m:apply>
	    <m:ci>complex multiplies</m:ci>
	  </m:math> 

	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      
	      <m:apply>
		<m:divide/>
		<m:apply>      
		  <m:times/>
		  <m:cn>2</m:cn>
		  <m:apply> 
		    <m:times/>
                    <m:ci>N</m:ci>
                    <m:apply>
		      <m:log/> 
		      <m:logbase><m:cn>2</m:cn></m:logbase> 
		      <m:ci>N</m:ci> 
		    </m:apply> 
		  </m:apply>
		</m:apply>
		
		<m:apply>
		  <m:plus/>
		  <m:apply>	
		    <m:minus/>
		    <m:ci>N</m:ci> 
		    <m:ci>M</m:ci>
		  </m:apply>
		  <m:cn>1</m:cn>    
		</m:apply>
	      </m:apply>
	      
	      <m:apply>
		<m:divide/>
		<m:apply>
		  <m:times/>
		  <m:cn>2</m:cn>
		  <m:ci>N</m:ci>
		  <m:apply>
		    <m:log/>
		    <m:logbase><m:cn>2</m:cn></m:logbase> 
		    <m:ci>N</m:ci> 
		  </m:apply>
		</m:apply>
		
		<m:apply>
		  <m:plus/>
		  <m:apply>	
		    <m:minus/>
		    <m:ci>N</m:ci> 
		    <m:ci>M</m:ci>
		  </m:apply>
		  <m:cn>1</m:cn>  
		</m:apply>
		
	      </m:apply>    


	    </m:apply>
	    <m:ci>    complex adds</m:ci>
	  </m:math> 
	</para> 

	<para id="subsec_1.4">
	  Compare to  
	  <m:math>
	    <m:ci>M</m:ci>
	  </m:math> mults, 

	  <m:math>
	    <m:apply>
	      <m:minus/>
	      <m:ci>M</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math> adds per output point for direct method. For a given
	  <m:math>
	    <m:ci>M</m:ci>
	  </m:math>, optimal 
	  <m:math>
	    <m:ci>N</m:ci>
	  </m:math> can be determined by finding 
	  <m:math>
	    <m:ci>N</m:ci>
	  </m:math> minimizing operation counts. Usualy, optimal 
	  <m:math>
	    <m:ci>N</m:ci>
	  </m:math> is 

	  <m:math>
	    <m:apply>
	      <m:leq/> 
	      <m:apply>
		<m:times/>
		<m:cn>4</m:cn>
		<m:ci>M</m:ci>
	      </m:apply>
	      <m:ci><m:msub><m:mi>N</m:mi><m:mi>opt</m:mi></m:msub></m:ci>
	      <m:apply>
		<m:times/>
		<m:cn>8</m:cn>
		<m:ci>M</m:ci>
	      </m:apply>
	    </m:apply>
	  </m:math>.
	</para> 
      </section>

      <section>
	<name>Overlap-Add (OLA) Method</name>
	<para id="subsec_2.1">
	  Zero-pad length-<m:math><m:ci>L</m:ci></m:math> blocks by 
	  <m:math>
	    <m:apply>
	      <m:minus/>
	      <m:ci>M</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math> samples.
	  <figure id="fig_006">
	    <media type="image/png" src="figure5.png"/>
	  </figure>

	</para>

	<para id="subsec_2.2">
	  Add successive blocks, overlapped by 
	  <m:math>
	    <m:apply>
	      <m:minus/>
	      <m:ci>M</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math> samples, so that the tails sum to produce the
	  complete linear convolution.
	  <figure id="fig_007">
	    <media type="image/png" src="Figure2.png"/>
	  </figure>
	  Computational Cost: Two length 
	  <m:math>
	    <m:apply>
	      <m:eq/>
	      <m:ci>N</m:ci>
	      <m:apply>
		<m:minus/>
		<m:apply>
		  <m:plus/>
		  <m:ci>L</m:ci> 
		  <m:ci>M</m:ci>
		</m:apply>
		<m:cn>1</m:cn>  
	      </m:apply>
	    </m:apply>
	  </m:math> FFTs and
	  <m:math>
	    <m:ci>M</m:ci>
	  </m:math>  mults and
	  <m:math>
	    <m:apply>
	      <m:minus/>
	      <m:ci>M</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math> adds per 
	  <m:math>
	    <m:ci>L</m:ci>
	  </m:math> output points; essentially the sames as OLS method.

	</para>
      </section>

    </section>

  </content>
  
</document>
