<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="id10495311">
  <name>Newton's Method</name>
  <metadata>
  <md:version>1.6</md:version>
  <md:created>2007/09/28 14:16:59 GMT-5</md:created>
  <md:revised>2007/12/05 21:27:48.977 US/Central</md:revised>
  <md:authorlist>
      <md:author id="cburrus">
      <md:firstname>C.</md:firstname>
      <md:othername>Sidney</md:othername>
      <md:surname>Burrus</md:surname>
      <md:email>csb@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="cburrus">
      <md:firstname>C.</md:firstname>
      <md:othername>Sidney</md:othername>
      <md:surname>Burrus</md:surname>
      <md:email>csb@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>factor</md:keyword>
    <md:keyword>Newton's method</md:keyword>
    <md:keyword>polynomial</md:keyword>
  </md:keywordlist>

  <md:abstract>A simple application of Newton's and Horner's methods to factor a polynomial.  Matlab code is included.</md:abstract>
</metadata>
  <content>
    <section id="id-596493387624">
      <name>Newton’s Method for Factoring a Polynomial</name>
      <para id="id11551757">We present Newton's method here as an example of the use of <cnxn document="m15099"> Horner's </cnxn> method. It takes on a particularly straight forward form when applied to polynomials because a formula for the derivative is available. The iterative form of Newton's method is</para>
      
      <equation id="element-289"><m:math>
          <m:semantics>
            <m:mrow>
              <m:mstyle fontsize="12pt">
                <m:mrow>
                  <m:mrow>
                    <m:msub>
                      <m:mi>z</m:mi>
                      <m:mstyle fontsize="8pt">
                        <m:mrow>
                          <m:mrow>
                            <m:mi>k</m:mi>
                            <m:mo stretchy="false">+</m:mo>
                            <m:mn>1</m:mn>
                          </m:mrow>
                        </m:mrow>
                      </m:mstyle>
                    </m:msub>
                    <m:mo stretchy="false">=</m:mo>
                    <m:mrow>
                      <m:msub>
                        <m:mi>z</m:mi>
                        <m:mstyle fontsize="8pt">
                          <m:mrow>
                            <m:mi>k</m:mi>
                          </m:mrow>
                        </m:mstyle>
                      </m:msub>
                      <m:mo stretchy="false">−</m:mo>
                      <m:mfrac>
                        <m:mrow>
                          <m:mi>f</m:mi>
                          <m:mo stretchy="false">(</m:mo>
                          <m:mi>z</m:mi>
                          <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                        <m:mrow>
                          <m:msup>
                            <m:mi>f</m:mi>
                            <m:mi>'</m:mi>
                          </m:msup>
                          <m:mo stretchy="false">(</m:mo>
                          <m:mi>z</m:mi>
                          <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                      </m:mfrac>
                    </m:mrow>
                  </m:mrow>
                </m:mrow>
              </m:mstyle>
              <m:mrow/>
            </m:mrow>
            <m:annotation encoding="StarMath 5.0"> size 12{z rSub { size 8{k+1} } =z rSub { size 8{k} }  -  {  {f \( z \) }  over  { { {f}} sup { ' } \( z \) } } } {}</m:annotation>
          </m:semantics>
        </m:math>
</equation><para id="id6317429">where the <emphasis>new</emphasis> estimate of the zero location, 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:msub><m:mi>z</m:mi><m:mstyle fontsize="8pt"><m:mrow><m:mrow><m:mi>k</m:mi><m:mo stretchy="false">+</m:mo><m:mn>1</m:mn></m:mrow></m:mrow></m:mstyle></m:msub></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{z rSub { size 8{k+1} } } {}</m:annotation></m:semantics></m:math>, is calculated from the <emphasis>old</emphasis> value, 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:msub><m:mi>z</m:mi><m:mstyle fontsize="8pt"><m:mrow><m:mi>k</m:mi></m:mrow></m:mstyle></m:msub></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{z rSub { size 8{k} } } {}</m:annotation></m:semantics></m:math>. A simple derivation is seen from the approximate definition of the derivative
<m:math><m:semantics><m:mrow/><m:annotation encoding="StarMath 5.0">{}</m:annotation></m:semantics></m:math></para>
      
      <equation id="element-923"><m:math>
          <m:semantics>
            <m:mrow>
              <m:mstyle fontsize="12pt">
                <m:mrow>
                  <m:mrow>
                    <m:msup>
                      <m:mi>f</m:mi>
                      <m:mi>'</m:mi>
                    </m:msup>
                    <m:mo stretchy="false">(</m:mo>
                    <m:msub>
                      <m:mi>z</m:mi>
                      <m:mstyle fontsize="8pt">
                        <m:mrow>
                          <m:mi>k</m:mi>
                        </m:mrow>
                      </m:mstyle>
                    </m:msub>
                    <m:mrow>
                      <m:mo stretchy="false">)</m:mo>
                      <m:mo stretchy="false">≈</m:mo>
                      <m:mfrac>
                        <m:mrow>
                          <m:mi>f</m:mi>
                          <m:mo stretchy="false">(</m:mo>
                          <m:msub>
                            <m:mi>z</m:mi>
                            <m:mstyle fontsize="8pt">
                              <m:mrow>
                                <m:mrow>
                                  <m:mi>k</m:mi>
                                  <m:mo stretchy="false">+</m:mo>
                                  <m:mn>1</m:mn>
                                </m:mrow>
                              </m:mrow>
                            </m:mstyle>
                          </m:msub>
                          <m:mrow>
                            <m:mo stretchy="false">)</m:mo>
                            <m:mo stretchy="false">−</m:mo>
                            <m:mi>f</m:mi>
                          </m:mrow>
                          <m:mo stretchy="false">(</m:mo>
                          <m:msub>
                            <m:mi>z</m:mi>
                            <m:mstyle fontsize="8pt">
                              <m:mrow>
                                <m:mi>k</m:mi>
                              </m:mrow>
                            </m:mstyle>
                          </m:msub>
                          <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                        <m:mrow>
                          <m:msub>
                            <m:mi>z</m:mi>
                            <m:mstyle fontsize="8pt">
                              <m:mrow>
                                <m:mrow>
                                  <m:mi>k</m:mi>
                                  <m:mo stretchy="false">+</m:mo>
                                  <m:mn>1</m:mn>
                                </m:mrow>
                              </m:mrow>
                            </m:mstyle>
                          </m:msub>
                          <m:mo stretchy="false">−</m:mo>
                          <m:msub>
                            <m:mi>z</m:mi>
                            <m:mstyle fontsize="8pt">
                              <m:mrow>
                                <m:mi>k</m:mi>
                              </m:mrow>
                            </m:mstyle>
                          </m:msub>
                        </m:mrow>
                      </m:mfrac>
                    </m:mrow>
                  </m:mrow>
                </m:mrow>
              </m:mstyle>
              <m:mrow/>
            </m:mrow>
            <m:annotation encoding="StarMath 5.0"> size 12{ { {f}} sup { ' } \( z rSub { size 8{k} }  \)  approx  {  {f \( z rSub { size 8{k+1} }  \)  - f \( z rSub { size 8{k} }  \) }  over  {z rSub { size 8{k+1} }  - z rSub { size 8{k} } } } } {}</m:annotation>
          </m:semantics>
        </m:math>
</equation><para id="id11489241">by setting 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mi>z</m:mi><m:mstyle fontsize="8pt"><m:mrow><m:mrow><m:mi>k</m:mi><m:mo stretchy="false">+</m:mo><m:mn>1</m:mn></m:mrow></m:mrow></m:mstyle></m:msub><m:mrow><m:mo stretchy="false">)</m:mo><m:mo stretchy="false">=</m:mo><m:mn>0</m:mn></m:mrow></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{f \( z rSub { size 8{k+1} }  \) =0} {}</m:annotation></m:semantics></m:math> and solving for 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:msub><m:mi>z</m:mi><m:mstyle fontsize="8pt"><m:mrow><m:mrow><m:mi>k</m:mi><m:mo stretchy="false">+</m:mo><m:mn>1</m:mn></m:mrow></m:mrow></m:mstyle></m:msub></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{z rSub { size 8{k+1} } } {}</m:annotation></m:semantics></m:math>. Another approach is to truncate the Taylor's series expansion</para>
      
      <equation id="element-416"><m:math>
          <m:semantics>
            <m:mrow>
              <m:mstyle fontsize="12pt">
                <m:mrow>
                  <m:mrow>
                    <m:mi>f</m:mi>
                    <m:mo stretchy="false">(</m:mo>
                    <m:mrow>
                      <m:mi>z</m:mi>
                      <m:mo stretchy="false">+</m:mo>
                      <m:mi>δ</m:mi>
                    </m:mrow>
                    <m:mrow>
                      <m:mo stretchy="false">)</m:mo>
                      <m:mo stretchy="false">=</m:mo>
                      <m:mi>f</m:mi>
                    </m:mrow>
                    <m:mo stretchy="false">(</m:mo>
                    <m:mi>z</m:mi>
                    <m:mrow>
                      <m:mo stretchy="false">)</m:mo>
                      <m:mo stretchy="false">+</m:mo>
                      <m:msup>
                        <m:mi>f</m:mi>
                        <m:mi>'</m:mi>
                      </m:msup>
                    </m:mrow>
                    <m:mo stretchy="false">(</m:mo>
                    <m:mi>z</m:mi>
                    <m:mo stretchy="false">)</m:mo>
                    <m:mrow>
                      <m:mi>δ</m:mi>
                      <m:mo stretchy="false">+</m:mo>
                      <m:msup>
                        <m:mi>f</m:mi>
                        <m:mrow>
                          <m:mi>'</m:mi>
                          <m:mi>'</m:mi>
                        </m:mrow>
                      </m:msup>
                    </m:mrow>
                    <m:mo stretchy="false">(</m:mo>
                    <m:mi>z</m:mi>
                    <m:mo stretchy="false">)</m:mo>
                    <m:mrow>
                      <m:mrow>
                        <m:msup>
                          <m:mi>δ</m:mi>
                          <m:mstyle fontsize="8pt">
                            <m:mrow>
                              <m:mn>2</m:mn>
                            </m:mrow>
                          </m:mstyle>
                        </m:msup>
                        <m:mo stretchy="false">/</m:mo>
                        <m:mn>2</m:mn>
                      </m:mrow>
                      <m:mo stretchy="false">+</m:mo>
                      <m:mo stretchy="false">⋯</m:mo>
                    </m:mrow>
                  </m:mrow>
                </m:mrow>
              </m:mstyle>
              <m:mrow/>
            </m:mrow>
            <m:annotation encoding="StarMath 5.0"> size 12{f \( z+δ \) =f \( z \) + { {f}} sup { ' } \( z \) δ+ { {f}} sup { '' } \( z \) δ rSup { size 8{2} } /2+ dotsaxis } {}</m:annotation>
          </m:semantics>
        </m:math>
</equation><para id="id12401557">with 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>z</m:mi><m:mo stretchy="false">=</m:mo><m:msub><m:mi>z</m:mi><m:mstyle fontsize="8pt"><m:mrow><m:mi>k</m:mi></m:mrow></m:mstyle></m:msub></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{z=z rSub { size 8{k} } } {}</m:annotation></m:semantics></m:math> and 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mrow><m:mi>z</m:mi><m:mo stretchy="false">+</m:mo><m:mi>δ</m:mi></m:mrow><m:mo stretchy="false">=</m:mo><m:msub><m:mi>z</m:mi><m:mstyle fontsize="8pt"><m:mrow><m:mrow><m:mi>k</m:mi><m:mo stretchy="false">+</m:mo><m:mn>1</m:mn></m:mrow></m:mrow></m:mstyle></m:msub></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{z+δ=z rSub { size 8{k+1} } } {}</m:annotation></m:semantics></m:math>. Indeed, this approach explains why the standard Newton's method does not work well with multiple order zeros since 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{f \( z \) } {}</m:annotation></m:semantics></m:math> and 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:msup><m:mi>f</m:mi><m:mi>'</m:mi></m:msup><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{ { {f}} sup { ' } \( z \) } {}</m:annotation></m:semantics></m:math> are both zero at the multiple order zero.</para>
      <para id="id11738195">Newton's method requires the value of the polynomial and its derivative at eac iteration and Horner's method provides both efficiently.</para>
      <para id="id8153040">One approach to finding the zeros of 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{f \( z \) } {}</m:annotation></m:semantics></m:math> could be to find the minima of the square of the magnitude of the polynomial. This could be done using a steepest descent algorithm which, when used with an appropriate step size choice, would result in guaranteed convergence, but usually is very slow. Another approach would be to use Newton’s method which has quadratic convergence but often must be started very close to the solution for convergence to occur. A third method would be to use the Gauss-Newton algorithm which is an approximation of the Newton scheme for a squared error measure. This approach also has quadratic convergence but can be more tolerant of starting values. </para>
      <para id="id10245355">When applied to factoring a polynomial, all three are similar or the same. </para>
      <para id="id7565168">Moore [15,16] pointed out that the steepest descent algorithm applied to (10) gives the same directional derivative as a Newton type algorithm. Closer examination shows that all three of the approaches suggested above give the same direction, and both the Newton and the Gauss-Newton algorithms have the same step size and, therefore, are exactly the same algorithm. This is true because 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{f \( z \) } {}</m:annotation></m:semantics></m:math> is analytic and the Cauchy-Riemann conditions cause the directions to be the same.</para>
      <para id="id5655602">These relationships are interesting to us because they means that Newton methods always give a direction that reduces the squared magnitude, even if the step size, which approximates a parabolic shape, may be in considerable error. However, since the direction is correct, there always exists a sufficiently small step size that will reduce the squared magnitude, and, since there are no minima other than zeros of 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{f \( z \) } {}</m:annotation></m:semantics></m:math>, Newton's method with a suitable step size control will always converge to a zero from any starting point. Near a first order zero of 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{f \( z \) } {}</m:annotation></m:semantics></m:math>, the surface is approximately parabolic and the method then converges quadratically. Newton type algorithms do, however, have trouble with multiple zeros or near a cluster of zeros where the surface is not approximately parabolic.</para>
      <para id="id11443777">A simple section of Matlab code which implements Horner's method (also called synthetic division or nested evaluation and described in the appendix) to evaluate 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{f \( z \) } {}</m:annotation></m:semantics></m:math> and 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:msup><m:mi>f</m:mi><m:mi>'</m:mi></m:msup><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{ { {f}} sup { ' } \( z \) } {}</m:annotation></m:semantics></m:math> at each iteration and uses them for <emphasis>updating</emphasis> the estimate of the zero location, 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mi>z</m:mi></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{z} {}</m:annotation></m:semantics></m:math>, is given in Figure 1.</para>
      <figure id="element-711"><code type="block">N = length(a)          % Number of poly coefficients
while (abs(f)&gt;e)    % Newton's iterations
  f = a(1);  fp = 0;   % Horner initial values
  for k = 2:N          % Horner's algorithm
    fp = f + fp*z;     % to eval f'(z)
    f = a(k) + f*z;    % and f(z)
  end
  z = z - f/fp;        % Newton's update of z
end
</code>
<caption>Matlab code for Newton's Method</caption></figure>
      
      
      
      
      
      
      
      
      
      <para id="id12958846">where 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mi>z</m:mi></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{z} {}</m:annotation></m:semantics></m:math> must be given as an initial estimate and 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mi>e</m:mi></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{e} {}</m:annotation></m:semantics></m:math> is the value of 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>f</m:mi><m:mo stretchy="false">(</m:mo><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{f \( z \) } {}</m:annotation></m:semantics></m:math> that is assumed to be close enough to zero. For 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mi>N</m:mi></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{N} {}</m:annotation></m:semantics></m:math> coefficients, 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mi>a</m:mi><m:mo stretchy="false">(</m:mo><m:mi>k</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{a \( k \) } {}</m:annotation></m:semantics></m:math>, each iteration requires approximately 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mn>2N</m:mn></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{2N} {}</m:annotation></m:semantics></m:math> multiplications and the Newton's formula requires one division. In a practical algorithm, measures must be taken to prevent dividing by zero or having overflow or divergence. A good method for starting the algorithm must be used. The practical program discussed later is built around this simple combination of <cnxn document="m15099"> Horner's </cnxn> and Newton's methods.</para>
    </section>
  </content>
</document>
