<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="new26">
  <name>Quantization Error in FIR Filters</name>
  <metadata>
  <md:version>1.1</md:version>
  <md:created>2003/07/16 16:46:38 GMT-5</md:created>
  <md:revised>2004/12/29 17:28:10.555 US/Central</md:revised>
  <md:authorlist>
      <md:author id="dljones">
      <md:firstname>Douglas</md:firstname>
      <md:othername>L.</md:othername>
      <md:surname>Jones</md:surname>
      <md:email>dl-jones@uiuc.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="dljones">
      <md:firstname>Douglas</md:firstname>
      <md:othername>L.</md:othername>
      <md:surname>Jones</md:surname>
      <md:email>dl-jones@uiuc.edu</md:email>
    </md:maintainer>
    <md:maintainer id="kclarks">
      <md:firstname>Kyle</md:firstname>
      
      <md:surname>Clarkson</md:surname>
      <md:email>kclarks@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>coefficient quantization</md:keyword>
    <md:keyword>data quantization</md:keyword>
    <md:keyword>direct form</md:keyword>
    <md:keyword>FIR filters</md:keyword>
    <md:keyword>quantization error</md:keyword>
    <md:keyword>transpose form</md:keyword>
  </md:keywordlist>

  <md:abstract>FIR filters suffer from both data and coefficient quantization; each has different effects.  Double-precision accumulation inside the FIR filter structure greatly reduces the data quantization error.</md:abstract>
</metadata>

  <content>
    <para id="para1">
      In digital filters, both the data at various places in the
    filter, which are continually varying, and the coefficients, which
    are fixed, must be quantized. The effects of quantization on data
    and coefficients are quite different, so they are analyzed
    separately.
    </para>
    <section id="section1">
      <name>Data Quantization</name>
      <para id="para2">
	Typically, the input and output in a digital filter are
	quantized by the analog-to-digital and digital-to-analog converters,
        respectively. Quantization also occurs at
	various points in a filter structure, usually after a
	multiply, since multiplies increase the number of bits.
      </para>
      <section id="section2">
	<name>Direct-form Structures</name>
	<para id="para3">
	  There are two common possibilities for quantization in a
	  direct-form FIR filter structure:
          after each multiply, or only once at the end.
	  
	  <figure id="figure1" orient="vertical">
	    <subfigure>
	      <media type="image/png" src="fig1QuantErrorFIR.png"/>
	      <caption> Single-precision accumulate; total variance
		<m:math>
		  <m:apply>
		    <m:eq/>
		    <m:apply>
		      <m:times/>
		      <m:ci>M</m:ci>
		      <m:apply>
			<m:divide/>
			<m:apply>
			  <m:power/>
			  <m:ci>Δ</m:ci>
			  <m:cn>2</m:cn>
			</m:apply>
			<m:cn>12</m:cn>
		      </m:apply>
		    </m:apply>
		  </m:apply>
		</m:math>
	      </caption>
	    </subfigure>
	    <subfigure>
	      <media type="image/png" src="fig2QuantErrorFIR.png"/>
	      <caption> Double-precision accumulate; variance
		<m:math>
		  <m:apply>
		    <m:eq/>
		    <m:apply>
		      <m:divide/>
		      <m:apply>
			<m:power/>
			<m:ci>Δ</m:ci>
			<m:cn>2</m:cn>
		      </m:apply>
		      <m:cn>12</m:cn>
		    </m:apply>
		  </m:apply>
		</m:math>
	      </caption>
	    </subfigure>
	  </figure>
	 
	 
	  In the latter structure, a double-length accumulator adds all 
	  <m:math>
	    <m:apply>
	      <m:minus/>
	      <m:apply>
		<m:times/>
		<m:cn>2</m:cn>
		<m:ci>B</m:ci>
	      </m:apply>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math> bits of each product into the accumulating sum, and
	  truncates only at the end. Obviously, this is much
	  preferred, and should <emphasis>always</emphasis> be used
	  wherever possible. All DSP microprocessors and most
	  general-pupose computers support double-precision
	  accumulation.
	</para>
      </section>
      <section id="section3">
	<name>Transpose-form</name>
        <para id="para3.5">
          Similarly, the transpose-form FIR filter structure presents two
          common options for quantization: after each multiply, or once at the end.
	<figure id="figure2" orient="vertical">
	  <subfigure>
	    <media type="image/png" src="fig3QuantErrorFIR.png"/>
	    <caption>Quantize at each stage before storing
	    intermediate sum. Output variance
	      <m:math>
		<m:apply>
		  <m:eq/>
		  <m:apply>
		    <m:times/>
		    <m:ci>M</m:ci>
		    <m:apply>
		      <m:divide/>
		      <m:apply>
			<m:power/>
			<m:ci>Δ</m:ci>
			<m:cn>2</m:cn>
		      </m:apply>
		      <m:cn>12</m:cn>
		    </m:apply>
		  </m:apply>
		</m:apply>
	      </m:math>
	    </caption>
	  </subfigure>
	  <subfigure>
	    <name>or</name> 
	    <media type="image/png" src="fig4QuantErrorFIR.png"/> 
	    <caption>Store double-precision partial sums. Costs more
	    memory, but variance
	      <m:math>
		<m:apply>
		  <m:eq/>
		  <m:apply>
		    <m:divide/>
		    <m:apply>
		      <m:power/>
		      <m:ci>Δ</m:ci>
		      <m:cn>2</m:cn>
		    </m:apply>
		    <m:cn>12</m:cn>
		  </m:apply>
		</m:apply>
	      </m:math>
	    </caption>
	  </subfigure>
	</figure>
        </para>
	
	<para id="para4">
	  The transpose form is not as convenient in terms of
	  supporting double-precision accumulation, which is a
	  significant disadvantage of this structure.
	</para>
      </section>
    </section>
    <section id="section4">
      <name>Coefficient Quantization</name>
      <para id="para5">
	Since a quantized coefficient is fixed for all time, we treat it
	differently than data quantization. The fundamental question
	is: how much does the quantization affect the frequency
	response of the filter?
      </para>
      <para id="para6">
	The quantized filter frequency response is
	<m:math display="block">
	  <m:apply>
	    <m:eq/>
	    <m:apply>
	      <m:ci type="fn" class="discrete">DTFT</m:ci>
	      <m:ci type="vector">
		<m:msub>
		  <m:mi>h</m:mi>
		  <m:mi>Q</m:mi>
		</m:msub>
	      </m:ci>
	    </m:apply>
	    <m:apply>
	      <m:ci type="fn" class="discrete">DTFT</m:ci>
	      <m:apply>
		<m:plus/>
		<m:ci type="vector">
		  <m:msub>
		    <m:mi>h</m:mi>
		    <m:mi>inf. prec.</m:mi>
		  </m:msub>
		</m:ci>
		<m:ci>e</m:ci>
	      </m:apply>
	    </m:apply>
	    <m:apply>
	      <m:plus/>
	      <m:apply>
		<m:ci type="fn">
		  <m:msub>
		    <m:mi>H</m:mi>
		    <m:mi>inf. prec.</m:mi>
		  </m:msub>
		</m:ci>
		<m:ci>w</m:ci>
	      </m:apply>
	      <m:apply>
		<m:ci type="fn">
		  <m:msub>
		    <m:mi>H</m:mi>
		    <m:mi>e</m:mi>
		  </m:msub>
		</m:ci>
		<m:ci>w</m:ci>
	      </m:apply>
	    </m:apply>
	  </m:apply>
	</m:math>
	Assuming the quantization model is correct, 
	<m:math>
	  <m:apply>
	    <m:ci type="fn">
	      <m:msub>
		<m:mi>H</m:mi>
		<m:mi>e</m:mi>
	      </m:msub>
	    </m:ci>
	    <m:ci>w</m:ci>
	  </m:apply>
	</m:math> should be fairly random and white, with the error
	spread fairly equally over all frequencies
	<m:math>
	  <m:apply>
	    <m:in/>
	    <m:ci>w</m:ci>
	    <m:interval closure="closed-open">
	      <m:apply>
		<m:minus/>
		<m:pi/>
	      </m:apply>
	      <m:pi/>
	    </m:interval>
	  </m:apply>
	</m:math>; however, the randomness of this error destroys any
	equiripple property or any infinite-precision optimality of a filter.
      </para>
      <exercise id="exer1">
	<problem>
	  <para id="para7">
	    What quantization scheme minimizes the 
	    <m:math>
	      <m:ci>
		<m:msub>
		  <m:mi>L</m:mi>
		  <m:mn>2</m:mn>
		</m:msub>
	      </m:ci>
	    </m:math> quantization error in frequency (minimizes
	    <m:math>
	      <m:apply>
		<m:int/>
		<m:bvar>
		  <m:ci>w</m:ci>
		</m:bvar>
		<m:uplimit>
		  <m:pi/>
		</m:uplimit>
		<m:lowlimit>
		  <m:apply>
		    <m:minus/>
		    <m:pi/>
		  </m:apply>
		</m:lowlimit>
		<m:apply>
		  <m:power/>
		  <m:apply>
		    <m:abs/>
		    <m:apply>
		      <m:minus/>
		      <m:apply>
			<m:ci type="fn">H</m:ci>
			<m:ci>w</m:ci>
		      </m:apply>
		      <m:apply>
			<m:ci type="fn">
			  <m:msub>
			    <m:mi>H</m:mi>
			    <m:mi>Q</m:mi>
			  </m:msub>
			</m:ci>
			<m:ci>w</m:ci>
		      </m:apply>
		    </m:apply>
		  </m:apply>
		  <m:cn>2</m:cn>
		</m:apply>
	      </m:apply>
	    </m:math>)? On average, how big is this error?
	  </para>
	</problem>
      </exercise>
      <para id="para8">
	Ideally, if one knows the coefficients are to be quantized to
	<m:math><m:ci>B</m:ci></m:math> bits, one should incorporate
	this directly into the filter design problem, and find the
	<m:math><m:ci>M</m:ci></m:math>
	<m:math><m:ci>B</m:ci></m:math>-bit binary fractional
	coefficients minimizing the maximum deviation 
	(<m:math>
	  <m:ci>
	    <m:msub>
	      <m:mi>L</m:mi>
	      <m:mi>∞</m:mi>
	    </m:msub>
	  </m:ci>
	</m:math> error). This can be done, but it is an integer
	program, which is known to be np-hard (i.e., requires almost a
        brute-force
	search). This is so expensive computationally that it's
	rarely done. There are some sub-optimal methods that are
	much more efficient and usually produce pretty good results.
      </para>
    </section>
	    
  </content>
  
</document>
