<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="m11120">
  <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Back propagation mathematics</name>
  <metadata xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
  <md:version xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">2.0</md:version>
  <md:created xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">2003/04/22</md:created>
  <md:revised xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">2003/04/22</md:revised>
  <md:authorlist xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
      <md:author xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="chrisc">
      <md:firstname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Krzysztof</md:firstname>
      <md:othername xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">A.</md:othername>
      <md:surname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Cyran</md:surname>
      <md:email xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">chrisc1@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
    <md:maintainer xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="chrisc">
      <md:firstname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Krzysztof</md:firstname>
      <md:othername xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">A.</md:othername>
      <md:surname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Cyran</md:surname>
      <md:email xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">chrisc1@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
    <md:keyword xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">back propagation</md:keyword>
    <md:keyword xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">first order rule</md:keyword>
    <md:keyword xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">gradient based learning</md:keyword>
    <md:keyword xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">gradient optimization</md:keyword>
    <md:keyword xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">learning methods</md:keyword>
    <md:keyword xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">neural networks</md:keyword>
    <md:keyword xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">steepest descent method</md:keyword>
  </md:keywordlist>

  <md:abstract xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">This module explains mathematical background of the most popular learning algorithm, called back propagation. It is widely used in multilayer perceptrons training but it's usage is not limited to this particular type of neural network.</md:abstract>
</metadata>



  <content xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
<section xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="errordef">
  <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Error definition</name>
    <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="introduction">
       <!-- Insert module text here -->
The back propagation method is the example of the wide class of training methods based on the information covered in the gradient of error function. The independent variables in this minimization are weights of neural network and the considered error to be minimized is the root mean square one.
    </para>   
    <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="error_definition">
Let us consider the training set composed of <m:math><m:ci>L</m:ci></m:math> ordered pairs, of the following form:
<m:math>
 <m:semantics>
  <m:mrow>
   <m:mrow><m:mo>{</m:mo> <m:mrow>
    <m:mrow><m:mo>(</m:mo>
     <m:mrow>
      <m:msup>
       <m:mstyle mathvariant="bold" mathsize="normal"><m:mi>x</m:mi></m:mstyle>
       <m:mrow>
        <m:mrow><m:mo>(</m:mo>
         <m:mn>1</m:mn>
        <m:mo>)</m:mo></m:mrow>
       </m:mrow>
      </m:msup>
      <m:mo>,</m:mo><m:msup>
       <m:mstyle mathvariant="bold" mathsize="normal"><m:mi>d</m:mi></m:mstyle>
       <m:mrow>
        <m:mrow><m:mo>(</m:mo>
         <m:mn>1</m:mn>
        <m:mo>)</m:mo></m:mrow>
       </m:mrow>
      </m:msup>
      
     </m:mrow>
    <m:mo>)</m:mo></m:mrow><m:mo>,</m:mo><m:mrow><m:mo>(</m:mo>
     <m:mrow>
      <m:msup>
       <m:mstyle mathvariant="bold" mathsize="normal"><m:mi>x</m:mi></m:mstyle>
       <m:mrow>
        <m:mrow><m:mo>(</m:mo>
         <m:mn>2</m:mn>
        <m:mo>)</m:mo></m:mrow>
       </m:mrow>
      </m:msup>
      <m:mo>,</m:mo><m:msup>
       <m:mstyle mathvariant="bold" mathsize="normal"><m:mi>d</m:mi></m:mstyle>
       <m:mrow>
        <m:mrow><m:mo>(</m:mo>
         <m:mn>2</m:mn>
        <m:mo>)</m:mo></m:mrow>
       </m:mrow>
      </m:msup>
      
     </m:mrow>
    <m:mo>)</m:mo></m:mrow><m:mn>,...,</m:mn><m:mrow><m:mo>(</m:mo>
     <m:mrow>
      <m:msup>
       <m:mstyle mathvariant="bold" mathsize="normal"><m:mi>x</m:mi></m:mstyle>
       <m:mrow>
        <m:mrow><m:mo>(</m:mo>
         <m:mi>L</m:mi>
        <m:mo>)</m:mo></m:mrow>
       </m:mrow>
      </m:msup>
      <m:mo>,</m:mo><m:msup>
       <m:mstyle mathvariant="bold" mathsize="normal"><m:mi>d</m:mi></m:mstyle>
       <m:mrow>
        <m:mrow><m:mo>(</m:mo>
         <m:mi>L</m:mi>
        <m:mo>)</m:mo></m:mrow>
       </m:mrow>
      </m:msup>
      
     </m:mrow>
    <m:mo>)</m:mo></m:mrow>
   </m:mrow> <m:mo>}</m:mo></m:mrow>
  </m:mrow>
 <m:annotation encoding="MathType-MTEF">
 </m:annotation>
</m:semantics></m:math>
Furthermore, let us define the total error <m:math><m:ci>E</m:ci></m:math> generated on outputs of neural network after presenting the entire training set, as:
<m:math display="block">
 <m:semantics>
  <m:mrow>
   <m:mi>E</m:mi><m:mo>=</m:mo><m:munderover>
    <m:mo>∑</m:mo>
    <m:mrow>
     <m:mi>l</m:mi><m:mo>=</m:mo><m:mn>1</m:mn>
    </m:mrow>
    <m:mi>L</m:mi>
   </m:munderover>
   <m:mrow>
    <m:msup>
     <m:mi>E</m:mi>
     <m:mrow>
      <m:mrow><m:mo>(</m:mo>
       <m:mi>l</m:mi>
      <m:mo>)</m:mo></m:mrow>
     </m:mrow>
    </m:msup>
    
   </m:mrow>
   
  </m:mrow>
 <m:annotation encoding="MathType-MTEF">
 </m:annotation>
</m:semantics></m:math>
where:
<m:math display="block">
 <m:semantics>
  <m:mrow>
   <m:msup>
    <m:mi>E</m:mi>
    <m:mrow>
     <m:mrow><m:mo>(</m:mo>
      <m:mi>l</m:mi>
     <m:mo>)</m:mo></m:mrow>
    </m:mrow>
   </m:msup>
   <m:mo>=</m:mo><m:munderover>
    <m:mo>∑</m:mo>
    <m:mrow>
     <m:mi>m</m:mi><m:mo>=</m:mo><m:mn>1</m:mn>
    </m:mrow>
    <m:mi>M</m:mi>
   </m:munderover>
   <m:mrow>
    <m:msubsup>
     <m:mi>E</m:mi>
     <m:mi>m</m:mi>
     <m:mrow>
      <m:mrow><m:mo>(</m:mo>
       <m:mi>l</m:mi>
      <m:mo>)</m:mo></m:mrow>
     </m:mrow>
    </m:msubsup>
    
   </m:mrow>
   <m:mo>=</m:mo><m:mfrac>
    <m:mn>1</m:mn>
    <m:mn>2</m:mn>
   </m:mfrac>
   <m:munderover>
    <m:mo>∑</m:mo>
    <m:mrow>
     <m:mi>m</m:mi><m:mo>=</m:mo><m:mn>1</m:mn>
    </m:mrow>
    <m:mi>M</m:mi>
   </m:munderover>
   <m:mrow>
    <m:msup>
     <m:mrow>
      <m:mrow><m:mo>(</m:mo>
       <m:mrow>
        <m:msubsup>
         <m:mi>d</m:mi>
         <m:mi>m</m:mi>
         <m:mrow>
          <m:mrow><m:mo>(</m:mo>
           <m:mi>l</m:mi>
          <m:mo>)</m:mo></m:mrow>
         </m:mrow>
        </m:msubsup>
        <m:mo>−</m:mo><m:msubsup>
         <m:mi>y</m:mi>
         <m:mi>m</m:mi>
         <m:mrow>
          <m:mrow><m:mo>(</m:mo>
           <m:mi>l</m:mi>
          <m:mo>)</m:mo></m:mrow>
         </m:mrow>
        </m:msubsup>
        
       </m:mrow>
      <m:mo>)</m:mo></m:mrow>
     </m:mrow>
     <m:mn>2</m:mn>
    </m:msup>
    
   </m:mrow>
   
  </m:mrow>
 <m:annotation encoding="MathType-MTEF">
 </m:annotation>
</m:semantics></m:math>
As was already told, the independent variables in the minimization of error <m:math><m:ci>E</m:ci></m:math> are weights 
<m:math>
 <m:semantics>
  <m:msub>
   <m:mi>w</m:mi>
   <m:mi>ij</m:mi>
  </m:msub>
  <m:annotation-xml encoding="MathML-Content">
   <m:apply>
    <m:ci>Subscript</m:ci>
    <m:ci>w</m:ci>
    <m:ci>ij</m:ci>
   </m:apply>
  </m:annotation-xml>
 </m:semantics>
</m:math>

Since even for the relatively small networks the number of weigths is big, in real applications, the training of the neural network is the minimization of the scalar field over the vector space with hundreds or (more often) thousands dimensions. One of the minizmiazation techniques for such problem is the steapest descent method
    </para>
    <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="steapest_descent_method">
    </para>
</section>





    <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="trial">
    
<m:math display="block">
    <m:apply>
     <m:int/>
     <m:bvar>
      <m:ci>x</m:ci>
     </m:bvar>
     <m:lowlimit>
      <m:cn type="integer">0</m:cn>
     </m:lowlimit>
     <m:uplimit>
      <m:cn type="integer">1</m:cn>
     </m:uplimit>
     <m:apply>
      <m:power/>
      <m:ci>x</m:ci>
      <m:cn type="integer">2</m:cn>
     </m:apply>
    </m:apply>
</m:math>


<m:math display="block">  <m:apply>   <m:sum/>    
<m:bvar>    <m:ci>n</m:ci>   </m:bvar>   <m:lowlimit>    <m:cn type="integer">1</m:cn>   </m:lowlimit>   <m:uplimit>    <m:infinity/>   
 </m:uplimit>   <m:apply>    <m:times/>    <m:cn type="integer">2</m:cn> 
    <m:apply>     <m:power/>     <m:cn type="integer">2</m:cn>      
<m:cn type="rational">1<m:sep/>2</m:cn>    </m:apply>    <m:apply>      
<m:plus/>     <m:apply>      <m:times/>      <m:cn type="integer">26390</m:cn>      <m:ci>n</m:ci>     </m:apply>     <m:cn type="integer">1103</m:cn>    </m:apply>    <m:apply>     <m:factorial/> 
     <m:apply>      <m:times/>      <m:cn type="integer">4</m:cn>       
<m:ci>n</m:ci>     </m:apply>    </m:apply>    <m:apply>     <m:power/>   
   <m:apply>      <m:times/>      <m:cn type="integer">9801</m:cn>       
<m:apply>       <m:power/>       <m:cn type="integer">396</m:cn>        
<m:apply>        <m:times/>        <m:cn type="integer">4</m:cn>         
<m:ci>n</m:ci>       </m:apply>      </m:apply>      <m:apply>        
<m:power/>       <m:apply>        <m:factorial/>        <m:ci>n</m:ci>     
   </m:apply>       <m:cn type="integer">4</m:cn>      </m:apply>      
</m:apply>     <m:cn type="integer">-1</m:cn>    </m:apply>   </m:apply> 
  </m:apply> </m:math>


<m:math display="block">
 <m:mrow>
  <m:munderover>
   <m:mo>∑</m:mo>
   <m:mrow>
    <m:mi>n</m:mi>
    <m:mo>=</m:mo>
    <m:mn>1</m:mn>
   </m:mrow>
   <m:mi>∞</m:mi>
  </m:munderover>
  <m:mfrac>
   <m:mrow>
    <m:mn>2</m:mn>
    <m:mo>⁢</m:mo>
    <m:msqrt>
     <m:mn>2</m:mn>
    </m:msqrt>
    <m:mo>⁢</m:mo>
    <m:mrow>
     <m:mo>(</m:mo>
     <m:mrow>
      <m:mrow>
       <m:mn>26390</m:mn>
       <m:mo>⁢</m:mo>
       <m:mi>n</m:mi>
      </m:mrow>
      <m:mo>+</m:mo>
      <m:mn>1103</m:mn>
     </m:mrow>
     <m:mo>)</m:mo>
    </m:mrow>
    <m:mo>⁢</m:mo>
    <m:mrow>
     <m:mrow>
      <m:mo>(</m:mo>
      <m:mrow>
       <m:mn>4</m:mn>
       <m:mo>⁢</m:mo>
       <m:mi>n</m:mi>
      </m:mrow>
      <m:mo>)</m:mo>
     </m:mrow>
     <m:mo>!</m:mo>
    </m:mrow>
   </m:mrow>
   <m:mrow>
    <m:mn>9801</m:mn>
    <m:mo>⁢</m:mo>
    <m:msup>
     <m:mn>396</m:mn>
     <m:mrow>
      <m:mn>4</m:mn>
      <m:mo>⁢</m:mo>
      <m:mi>n</m:mi>
     </m:mrow>
    </m:msup>
    <m:mo>⁢</m:mo>
    <m:msup>
     <m:mrow>
      <m:mi>n</m:mi>
      <m:mo>!</m:mo>
     </m:mrow>
     <m:mn>4</m:mn>
    </m:msup>
   </m:mrow>
  </m:mfrac>
 </m:mrow>
</m:math>


<m:math display="block">
 <m:semantics>
  <m:mrow>
   <m:munderover>
    <m:mo>∑</m:mo>
    <m:mrow>
     <m:mi>n</m:mi>
     <m:mo>=</m:mo>
     <m:mn>1</m:mn>
    </m:mrow>
    <m:mi>∞</m:mi>
   </m:munderover>
   <m:mfrac>
    <m:mrow>
     <m:mn>2</m:mn>
     <m:mo>⁢</m:mo>
     <m:msqrt>
      <m:mn>2</m:mn>
     </m:msqrt>
     <m:mo>⁢</m:mo>
     <m:mrow>
      <m:mo>(</m:mo>
      <m:mrow>
       <m:mrow>
        <m:mn>26390</m:mn>
        <m:mo>⁢</m:mo>
        <m:mi>n</m:mi>
       </m:mrow>
       <m:mo>+</m:mo>
       <m:mn>1103</m:mn>
      </m:mrow>
      <m:mo>)</m:mo>
     </m:mrow>
     <m:mo>⁢</m:mo>
     <m:mrow>
      <m:mrow>
       <m:mo>(</m:mo>
       <m:mrow>
        <m:mn>4</m:mn>
        <m:mo>⁢</m:mo>
        <m:mi>n</m:mi>
       </m:mrow>
       <m:mo>)</m:mo>
      </m:mrow>
      <m:mo>!</m:mo>
     </m:mrow>
    </m:mrow>
    <m:mrow>
     <m:mn>9801</m:mn>
     <m:mo>⁢</m:mo>
     <m:msup>
      <m:mn>396</m:mn>
      <m:mrow>
       <m:mn>4</m:mn>
       <m:mo>⁢</m:mo>
       <m:mi>n</m:mi>
      </m:mrow>
     </m:msup>
     <m:mo>⁢</m:mo>
     <m:msup>
      <m:mrow>
       <m:mi>n</m:mi>
       <m:mo>!</m:mo>
      </m:mrow>
      <m:mn>4</m:mn>
     </m:msup>
    </m:mrow>
   </m:mfrac>
  </m:mrow>
  <m:annotation-xml encoding="MathML-Content">
   <m:apply>
    <m:sum/>
    <m:bvar>
     <m:ci>n</m:ci>
    </m:bvar>
    <m:lowlimit>
     <m:cn type="integer">1</m:cn>
    </m:lowlimit>
    <m:uplimit>
     <m:infinity/>
    </m:uplimit>
    <m:apply>
     <m:times/>
     <m:cn type="integer">2</m:cn>
     <m:apply>
      <m:power/>
      <m:cn type="integer">2</m:cn>
      <m:cn type="rational">1<m:sep/>2</m:cn>
     </m:apply>
     <m:apply>
      <m:plus/>
      <m:apply>
       <m:times/>
       <m:cn type="integer">26390</m:cn>
       <m:ci>n</m:ci>
      </m:apply>
      <m:cn type="integer">1103</m:cn>
     </m:apply>
     <m:apply>
      <m:factorial/>
      <m:apply>
       <m:times/>
       <m:cn type="integer">4</m:cn>
       <m:ci>n</m:ci>
      </m:apply>
     </m:apply>
     <m:apply>
      <m:power/>
      <m:apply>
       <m:times/>
       <m:cn type="integer">9801</m:cn>
       <m:apply>
        <m:power/>
        <m:cn type="integer">396</m:cn>
        <m:apply>
         <m:times/>
         <m:cn type="integer">4</m:cn>
         <m:ci>n</m:ci>
        </m:apply>
       </m:apply>
       <m:apply>
        <m:power/>
        <m:apply>
         <m:factorial/>
         <m:ci>n</m:ci>
        </m:apply>
        <m:cn type="integer">4</m:cn>
       </m:apply>
      </m:apply>
      <m:cn type="integer">-1</m:cn>
     </m:apply>
    </m:apply>
   </m:apply>
  </m:annotation-xml>
 </m:semantics>
</m:math>

<m:math display="block">
    <m:apply>
    <m:sum/>
    <m:bvar>
     <m:ci>n</m:ci>
    </m:bvar>
    <m:lowlimit>
     <m:cn type="integer">1</m:cn>
    </m:lowlimit>
    <m:uplimit>
     <m:infinity/>
    </m:uplimit>
    <m:apply>
     <m:times/>
     <m:cn type="integer">2</m:cn>
     <m:apply>
      <m:power/>
      <m:cn type="integer">2</m:cn>
      <m:cn type="rational">1<m:sep/>2</m:cn>
     </m:apply>
     <m:apply>
      <m:plus/>
      <m:apply>
       <m:times/>
       <m:cn type="integer">26390</m:cn>
       <m:ci>n</m:ci>
      </m:apply>
      <m:cn type="integer">1103</m:cn>
     </m:apply>
     <m:apply>
      <m:factorial/>
      <m:apply>
       <m:times/>
       <m:cn type="integer">4</m:cn>
       <m:ci>n</m:ci>
      </m:apply>
     </m:apply>
     <m:apply>
      <m:power/>
      <m:apply>
       <m:times/>
       <m:cn type="integer">9801</m:cn>
       <m:apply>
        <m:power/>
        <m:cn type="integer">396</m:cn>
        <m:apply>
         <m:times/>
         <m:cn type="integer">4</m:cn>
         <m:ci>n</m:ci>
        </m:apply>
       </m:apply>
       <m:apply>
        <m:power/>
        <m:apply>
         <m:factorial/>
         <m:ci>n</m:ci>
        </m:apply>
        <m:cn type="integer">4</m:cn>
       </m:apply>
      </m:apply>
      <m:cn type="integer">-1</m:cn>
     </m:apply>
    </m:apply>
   </m:apply>
</m:math>


    </para>
  </content>
  
</document>
