<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="m11061">

  <name>Measures of Central Tendency</name>
  <metadata>
  <md:version>2.3</md:version>
  <md:created>2003/02/27</md:created>
  <md:revised>2008/04/20 15:25:21.683 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="dmlane">
      <md:firstname>David</md:firstname>
      
      <md:surname>Lane</md:surname>
      <md:email>lane@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="dmlane">
      <md:firstname>David</md:firstname>
      
      <md:surname>Lane</md:surname>
      <md:email>lane@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="meyer">
      <md:firstname>Eileen</md:firstname>
      
      <md:surname>Meyer</md:surname>
      <md:email>meyer@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>Central Tendency</md:keyword>
    <md:keyword>Statistics</md:keyword>
  </md:keywordlist>

  <md:abstract/>
</metadata>

  <content>
    <para id="para1">
      In the <cnxn document="m10942" strength="8">previous section</cnxn>
      we saw that there are several ways to define central tendency.
      This section defines the three most common measures of central
      tendency: the mean, the median, and the mode.  The relationships
      between these measures of central tendency and the definitions
      given in the previous section will probably not be obvious to
      you.  Rather than just tell you these relationships, we will
      allow you to discover them in the simulations in the sections
      that follow.
    </para>
    <para id="para2">
      This section gives only the basic definitions of the mean,
      median, and mode.  A further discussion of the relative merits
      and proper applications of these statistics is presented in a
      <cnxn document="m11011" strength="9">later section</cnxn>.
    </para>

    <section id="sect1">
      <name>Arithmetic Mean</name>
      <para id="para3">
	The <term>arithmetic mean</term> is the most common measure of
	central tendency.  It is simply the sum of the numbers divided
	by the number of numbers.  The symbol
	<m:math><m:ci>μ</m:ci></m:math> is used for the mean of a
	population.  The symbol <m:math><m:ci>M</m:ci></m:math> is
	used for the mean of a sample.  The formula for
	<m:math><m:ci>μ</m:ci></m:math> is shown below:

	<m:math display="block">
	  <m:apply>
	    <m:eq/>
	    <m:ci>μ</m:ci>
	    <m:apply>
	      <m:divide/>
	      <m:apply>
		<m:times/>
		<m:ci>Σ</m:ci>
		<m:ci>X</m:ci>
	      </m:apply>
	      <m:ci>N</m:ci>
	    </m:apply>
	  </m:apply>
	</m:math>
	where
	<m:math>
	  <m:apply>
	    <m:times/>
	    <m:ci>Σ</m:ci>
	    <m:ci>X</m:ci>
	  </m:apply>
	</m:math>
	is the sum of all the numbers in the population and
	<m:math><m:ci>N</m:ci></m:math> is the number of numbers in
	the population.  The formula for
	<m:math><m:ci>M</m:ci></m:math> is the same, applied to the
	numbers in the sample.  As an example, the mean of the numbers
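	<m:math><m:cn>1</m:cn></m:math>,
	<m:math><m:cn>2</m:cn></m:math>,
	<m:math><m:cn>3</m:cn></m:math>,
	<m:math><m:cn>6</m:cn></m:math>, and
	<m:math><m:cn>8</m:cn></m:math> is
	<m:math>
	  <m:apply>
	    <m:eq/>
	    <m:apply>
	      <m:divide/>
	      <m:apply>
		<m:plus/>
		<m:cn>1</m:cn>
		<m:cn>2</m:cn>
		<m:cn>3</m:cn>
		<m:cn>6</m:cn>
		<m:cn>8</m:cn>
	      </m:apply>
	      <m:cn>5</m:cn>
	    </m:apply>
	    <m:apply>
	      <m:divide/>
	      <m:cn>20</m:cn>
	      <m:cn>5</m:cn>
	    </m:apply>
	    <m:cn>4</m:cn>
	  </m:apply>
	</m:math>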
	regardless of whether the numbers constitute the entire
	population or just a sample from the population.
      </para>
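      <para id="para3code">
	The definition above (the sum of the numbers divided by the
	number of numbers) can be sketched in a few lines of Python.
	This is only an illustration: the function name
	<code>arithmetic_mean</code> and the variable names are
	hypothetical and are not part of the original module.
	<code type="block">
```python
def arithmetic_mean(numbers):
    """Return the sum of the numbers divided by the number of numbers."""
    if not numbers:
        # The mean is undefined when there are no numbers to average.
        raise ValueError("the mean of an empty list is undefined")
    return sum(numbers) / len(numbers)


example = [1, 2, 3, 6, 8]
mean_value = arithmetic_mean(example)
print(mean_value)  # 20 / 5 = 4.0
```
	</code>
	The same computation applies whether the list holds an entire
	population or only a sample from it.
      </para>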
      <para id="para7">
	The table, <cnxn target="table1" strength="9">Number of
	touchdown passes</cnxn>, shows the number of touchdown (TD)
	passes thrown by each of the 31 teams in the National Football
	League in the 2000 season.  The mean number of touchdown passes
	thrown is 20.4516 as shown below.

	<m:math display="block">
	  <m:apply>
	    <m:eq/>
	    <m:ci>μ</m:ci>
	    <m:apply>
	      <m:divide/>
	      <m:apply>
		<m:times/>
		<m:ci>Σ</m:ci>
		<m:ci>X</m:ci>
	      </m:apply>
	      <m:ci>N</m:ci>
	    </m:apply>
	    <m:apply>
	      <m:divide/>
	      <m:cn>634</m:cn>
	      <m:cn>31</m:cn>
	    </m:apply>
	    <m:cn>20.4516</m:cn>
	  </m:apply>
	</m:math>
      </para>

      <table frame="all" id="table1">
	<name>Number of touchdown passes</name>
	<tgroup cols="8" align="left" colsep="1" rowsep="1">
	  <tbody valign="top">
	    <row>
	      <entry>
		37
	      </entry>
	      <entry>
		33
	      </entry>
	      <entry>
		33
	      </entry>
	      <entry>
		32
	      </entry>
	      <entry>
		29
	      </entry>
	      <entry>
		28
	      </entry>
	      <entry>
		28
	      </entry>
	      <entry>
		23
	      </entry>
	    </row>
	    <row>
	      <entry>
		22
	      </entry>
	      <entry>
		22
	      </entry>
	      <entry>
		22
	      </entry>
	      <entry>
		21
	      </entry>
	      <entry>
		21
	      </entry>
	      <entry>
		21
	      </entry>
	      <entry>
		20
	      </entry>
	      <entry>
		20
	      </entry>
	    </row>
	    <row>
	      <entry>
		19
	      </entry>
	      <entry>
		19
	      </entry>
	      <entry>
		18
	      </entry>
	      <entry>
		18
	      </entry>
	      <entry>
		18
	      </entry>
	      <entry>
		18
	      </entry>
	      <entry>
		16
	      </entry>
	      <entry>
		15
	      </entry>
	    </row>
	    <row>
	      <entry>
		14
	      </entry>
	      <entry>
		14
	      </entry>
	      <entry>
		14
	      </entry>
	      <entry>
		12
	      </entry>
	      <entry>
		12
	      </entry>
	      <entry>
		9
	      </entry>
	      <entry>
		6
	      </entry>
	      <entry>
		
	      </entry>
	    </row>
	  </tbody>
	</tgroup>
      </table>
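      <para id="para7code">
	As a check on the arithmetic, the 31 values in the table can be
	summed and divided by their count.  The sketch below is
	illustrative only; the variable names (<code>td_passes</code>,
	<code>total</code>, <code>count</code>, <code>mean_td</code>)
	are assumptions introduced here.
	<code type="block">
```python
# Touchdown passes for the 31 teams, copied from the table row by row.
td_passes = [
    37, 33, 33, 32, 29, 28, 28, 23,
    22, 22, 22, 21, 21, 21, 20, 20,
    19, 19, 18, 18, 18, 18, 16, 15,
    14, 14, 14, 12, 12, 9, 6,
]

total = sum(td_passes)   # the sum of all the numbers
count = len(td_passes)   # the number of numbers
mean_td = total / count

print(total, count, round(mean_td, 4))  # 634 31 20.4516
```
	</code>
      </para>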
      <para id="para9">
	Although the arithmetic mean is not the only "mean" (there is
	also a geometric mean), it is by far the most commonly used.
	Therefore, if the term "mean" is used without specifying
	whether it is the arithmetic mean, the geometric mean, or some
	other mean, it is assumed to refer to the arithmetic mean.
      </para>
    </section>

    <section id="sect2">
      <name>Median</name>
      <para id="para10">
	The <term>median</term> is also a frequently used measure of
	central tendency.  The median is the midpoint of a
	distribution: the same number of scores are above the median
	as below it.  For the data in the table, <cnxn target="table1" strength="9">Number of touchdown passes</cnxn>, there are 31
	scores.  The 16th highest score (which equals 20) is the
	median because there are 15 scores below the 16th score and 15
	scores above the 16th score.  The median can also be thought
	of as the 50th <cnxn document="m10805" strength="8">percentile</cnxn>.
      </para>
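      <para id="para10code">
	The counting argument above can be reproduced by sorting the
	touchdown data and picking the middle position.  This is a
	minimal sketch with hypothetical variable names; it uses
	Python's standard <code>statistics.median</code> only as a
	cross-check.
	<code type="block">
```python
import statistics

# Touchdown passes for the 31 teams, copied from the table row by row.
td_passes = [
    37, 33, 33, 32, 29, 28, 28, 23,
    22, 22, 22, 21, 21, 21, 20, 20,
    19, 19, 18, 18, 18, 18, 16, 15,
    14, 14, 14, 12, 12, 9, 6,
]

ordered = sorted(td_passes)
middle = len(ordered) // 2          # index 15, the 16th of 31 scores
median_td = ordered[middle]

below = middle                      # scores positioned before the 16th
above = len(ordered) - middle - 1   # scores positioned after the 16th

print(median_td, below, above)      # 20 15 15
print(median_td == statistics.median(td_passes))  # True
```
	</code>
      </para>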
      <para id="para11">
	Let's return to the made-up example, discussed previously in
	the module <cnxn document="m10942" strength="8">Introduction to Central
	Tendency</cnxn>, of the quiz on which you scored a three.  The
	three possible datasets are shown in <cnxn target="table2" strength="9"/>.
      </para>
      <table frame="all" id="table2">
	<name>Three possible datasets for the 5-point make-up quiz</name>
	<tgroup cols="4" align="left" colsep="1" rowsep="1">
	  <thead valign="top">
	    <row>
	      <entry>
		Student
	      </entry>
	      <entry>
		Dataset 1
	      </entry>
	      <entry>
		Dataset 2
	      </entry>
	      <entry>
		Dataset 3
	      </entry>
	    </row>
	  </thead>
	  <tbody valign="top">
	    <row>
	      <entry>
		You
	      </entry>
	      <entry>
		3
	      </entry>
	      <entry>
		3
	      </entry>
	      <entry>
		3
	      </entry>
	    </row>
	    <row>
	      <entry>
		John
	      </entry>
	      <entry>
		3
	      </entry>
	      <entry>
		4
	      </entry>
	      <entry>
		2
	      </entry>
	    </row>
	    <row>
	      <entry>
		Maria
	      </entry>
	      <entry>
		3
	      </entry>
	      <entry>
		4
	      </entry>
	      <entry>
		2
	      </entry>
	    </row>
	    <row>
	      <entry>
		Shareecia
	      </entry>
	      <entry>
		3
	      </entry>
	      <entry>
		4
	      </entry>
	      <entry>
		2
	      </entry>
	    </row>
	    <row>
	      <entry>
		Luther
	      </entry>
	      <entry>
		3
	      </entry>
	      <entry>
		5
	      </entry>
	      <entry>
		1
	      </entry>
	    </row>
	  </tbody>
	</tgroup>
      </table>
      <para id="para12">
	For Dataset 1, the median is 3, the same as your score.  For
	Dataset 2, the median is 4, so your score is below the median
	and you are in the lower half of the class.  Finally, for
	Dataset 3, the median is 2, so your score is above the median
	and you are in the upper half of the distribution.
      </para>
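      <para id="para12code">
	The comparison of your score with each median can be checked
	with a short sketch.  The dictionary layout and the helper
	<code>position</code> are hypothetical names chosen for this
	illustration; the scores are taken column by column from the
	table above.
	<code type="block">
```python
import statistics

# Each list holds the five quiz scores of one dataset (your score first).
datasets = {
    "Dataset 1": [3, 3, 3, 3, 3],
    "Dataset 2": [3, 4, 4, 4, 5],
    "Dataset 3": [3, 2, 2, 2, 1],
}
your_score = 3


def position(score, median):
    """Describe where a score falls relative to the median."""
    if score == median:
        return "equal to the median"
    if score > median:
        return "above the median"
    return "below the median"


medians = {name: statistics.median(scores) for name, scores in datasets.items()}
results = {name: position(your_score, med) for name, med in medians.items()}

for name in datasets:
    print(name, medians[name], results[name])
```
	</code>
      </para>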
      <para id="para13">
	<emphasis>Computation of the Median</emphasis>: When there is
	an odd number of numbers, the median is simply the middle
	number.  For example, the median of 2, 4, and 7 is 4.  When
	there is an even number of numbers, the median is the mean of
	the two middle numbers.  Thus, the median of the numbers
	<m:math><m:cn>2</m:cn></m:math>,
	<m:math><m:cn>4</m:cn></m:math>,
	<m:math><m:cn>7</m:cn></m:math>,
	<m:math><m:cn>12</m:cn></m:math> is
	<m:math>
	  <m:apply>
	    <m:eq/>       
	    <m:apply>
	      <m:divide/>
	      <m:apply>
		<m:plus/>
		<m:cn>4</m:cn>
		<m:cn>7</m:cn>
	      </m:apply>
	      <m:cn>2</m:cn>
            </m:apply>
	    <m:cn>5.5</m:cn>
	  </m:apply>
	</m:math>.
      </para>
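      <para id="para13code">
	The two cases described above (odd and even counts) can be
	written out directly.  The following is a sketch under the
	stated rule, not a reference implementation; the function name
	<code>median</code> is an assumption made for this example.
	<code type="block">
```python
def median(numbers):
    """Return the middle number, or the mean of the two middle numbers."""
    ordered = sorted(numbers)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is simply the middle number.
        return ordered[mid]
    # Even count: the median is the mean of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 7]))      # 4
print(median([2, 4, 7, 12]))  # 5.5
```
	</code>
	Sorting first means the input does not need to be in order.
      </para>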
    </section>

    <section id="sect3">
      <name>Mode</name>
      <para id="para15">
	The <term>mode</term> is the most frequently occurring value.
	For the data in the table, <cnxn target="table1" strength="9">Number of touchdown passes</cnxn>, the mode is 18
	since more teams (4) had 18 touchdown passes than any other
	number of touchdown passes.  With continuous data such as
	response time measured to many decimals, the frequency of each
	value is one since no two scores will be exactly the same (see
	discussion of <cnxn document="m10868" strength="5">continuous
	variables</cnxn>).  Therefore the mode of continuous data is
	normally computed from a grouped frequency distribution.  The
	<cnxn target="table3" strength="9">Grouped frequency
	distribution</cnxn> table shows a grouped frequency
	distribution for the target response time data.  Since the
	interval with the highest frequency is 600-700, the mode is
	the middle of that interval (650).
      </para>
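      <para id="para15code">
	For discrete data such as the touchdown counts, the mode can
	be found by tallying how often each value occurs.  The sketch
	below uses <code>collections.Counter</code> from the Python
	standard library; the variable names are hypothetical.
	<code type="block">
```python
from collections import Counter

# Touchdown passes for the 31 teams, copied from the table row by row.
td_passes = [
    37, 33, 33, 32, 29, 28, 28, 23,
    22, 22, 22, 21, 21, 21, 20, 20,
    19, 19, 18, 18, 18, 18, 16, 15,
    14, 14, 14, 12, 12, 9, 6,
]

counts = Counter(td_passes)          # value -> number of teams
highest = max(counts.values())       # the largest frequency
# Collect every value that reaches the largest frequency (handles ties).
modes = sorted(value for value, freq in counts.items() if freq == highest)

print(modes, highest)  # [18] 4
```
	</code>
      </para>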
      <table frame="all" id="table3">
	<name>Grouped frequency distribution</name>
	<tgroup cols="2" align="left" colsep="1" rowsep="1">
	  <thead valign="top">
	    <row>
	      <entry>
		Range
	      </entry>
	      <entry>
		Frequency
	      </entry>
	    </row>
	  </thead>
	  <tbody valign="top">
	    <row>
	      <entry>
		500-600
	      </entry>
	      <entry>
		3
	      </entry>
	    </row>
	    <row>
	      <entry>
		600-700
	      </entry>
	      <entry>
		6
	      </entry>
	    </row>
	    <row>
	      <entry>
		700-800
	      </entry>
	      <entry>
		5
	      </entry>
	    </row>
	    <row>
	      <entry>
		800-900
	      </entry>
	      <entry>
		5
	      </entry>
	    </row>
	    <row>
	      <entry>
		900-1000
	      </entry>
	      <entry>
		0
	      </entry>
	    </row>
	    <row>
	      <entry>
		1000-1100
	      </entry>
	      <entry>
		1
	      </entry>
	    </row>
	  </tbody>
	</tgroup>
      </table>
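      <para id="para16code">
	The grouped-data rule described above (take the middle of the
	interval with the highest frequency) can be sketched as
	follows.  The tuple layout and the names <code>grouped</code>
	and <code>grouped_mode</code> are assumptions made for this
	illustration; the intervals and frequencies are those in the
	table.
	<code type="block">
```python
# Each entry is (lower bound, upper bound, frequency) from the table.
grouped = [
    (500, 600, 3),
    (600, 700, 6),
    (700, 800, 5),
    (800, 900, 5),
    (900, 1000, 0),
    (1000, 1100, 1),
]

# Pick the interval with the highest frequency.
low, high, freq = max(grouped, key=lambda row: row[2])

# The mode is taken to be the middle of that interval.
grouped_mode = (low + high) / 2

print(low, high, freq, grouped_mode)  # 600 700 6 650.0
```
	</code>
	If two intervals tied for the highest frequency,
	<code>max</code> would return only the first of them, so a
	fuller treatment would need to handle ties explicitly.
      </para>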
    </section>

  </content>
</document>
