Introduction to Central Tendency

Module by: David Lane

What is central tendency, and why do we want to know the central tendency of a group of scores? Let us first try to answer these questions intuitively. Then we will proceed to a more formal discussion.

Imagine this situation: You are in a class with just four other students, and the five of you took a 5-point pop quiz. Today your instructor is walking around the room, handing back the quizzes. She stops at your desk and hands you your paper. Written in bold black ink on the front is 3/5. How do you react? Are you happy with your score of 3 or disappointed? How do you decide? You might calculate your percentage correct, realize it is 60%, and be appalled. But it is more likely that when deciding how to react to your performance, you will want additional information. What additional information would you like?

If you are like most students, you will immediately ask your neighbors, "Whad'ja get?" and then ask the instructor, "How did the class do?" In other words, the additional information you want is how your quiz score compares to other students' scores. You therefore understand the importance of comparing your score to the class distribution of scores. Should your score of 3 turn out to be among the higher grades then you'll be pleased after all. On the other hand, if 3 is among the lowest scores in the class, you won't be quite so happy.

This idea of comparing individual scores to a distribution of scores is fundamental to statistics. So let's explore it further, using the same example (the pop quiz you took with your four classmates). Three possible outcomes are shown in Table 1. They are labeled "Dataset A," "Dataset B," and "Dataset C." Which of the three datasets would make you happiest? In other words, in comparing your score with your fellow students' scores, in which dataset would your score of 3 be the most impressive?

In Dataset A, everyone's score is 3. This puts your score at the exact center of the distribution. You can draw satisfaction from the fact that you did as well as everyone else. But of course it cuts both ways: everyone else did just as well as you.

Table 1: Three possible datasets for the 5-point pop quiz.
Student Dataset A Dataset B Dataset C
You 3 3 3
John's 3 4 2
Maria's 3 4 2
Shareecia's 3 4 2
Luther's 3 5 1

Now consider the possibility that the scores are described as in Dataset B. This is a depressing outcome even though your score is no different from the one in Dataset A. The problem is that the other four students had higher grades, putting yours below the center of the distribution.

Finally, let's look at Dataset C. This is more like it! All of your classmates scored lower than you, so your score is above the center of the distribution.

Now let's change the example in order to develop more insight into the center of a distribution. Figure 1 shows the results of an experiment on memory for chess positions. Subjects were shown a chess position and then asked to reconstruct it on an empty chess board. The number of pieces correctly placed was recorded. This was repeated for two more chess positions. The scores represent the total number of chess pieces correctly placed for the three chess positions. The maximum possible score was 89.

Two groups are compared. On the left are people who don't play chess. On the right are people who play a great deal (tournament players). It is clear that the center of the distribution for the non-players is lower than the center of the distribution for the tournament players.

Figure 1: Back to back stem and leaf display. The left side shows the memory scores of the non-players. The right side shows the scores of the tournament players.

We're sure you get the idea now about the center of a distribution. It is time to move beyond intuition. We need a formal definition of the center of a distribution. In fact, we'll offer you three definitions! This is not just generosity on our part. There turn out to be (at least) three different ways of thinking about the center of a distribution, all of them useful in various contexts. In the remainder of this section we attempt to communicate the idea behind each concept. In the succeeding sections we will give statistical measures for these concepts of central tendency.

Definitions of Center

Now we explain the three different ways of defining the center of a distribution. All three are called measures of central tendency.

Balance Scale

One definition of central tendency is the point at which the distribution is in balance. Figure 2 shows the distribution of the five numbers 2, 3, 4, 9, 16 placed upon a balance scale. If each number weighs one pound, and is placed at its position along the number line, then it would be possible to balance them by placing a fulcrum at 6.8.

Figure 2: A Balance Scale

For another example, consider the distribution shown in Figure 3. It is balanced by placing the fulcrum in the geometric middle.

Figure 3: A distribution balanced on the tip of a triangle.

Figure 4 illustrates that the same distribution can't be balanced by placing the fulcrum to the left of center.

Figure 4: The distribution is not balanced.

Figure 5 shows an asymmetric distribution. To balance it, we cannot put the fulcrum halfway between the lowest and highest values (as we did in Figure 3). Placing the fulcrum at the "half way" point would cause it to tip towards the left.

Figure 5: An asymmetric distribution balanced on the tip of a triangle.

The balance point defines one sense of a distribution's center. The simulation in the document Balance Scale Simulation shows how to find the point at which the distribution balances.
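
The balance idea can be checked numerically. The following is a minimal sketch, assuming each number weighs one pound so that its turning effect about the fulcrum is simply its signed distance from the fulcrum. The function name `net_torque` is ours, chosen for illustration, and is not part of the original module.

```python
import math

def net_torque(values, fulcrum):
    """Sum of signed distances from the fulcrum (one pound per value).

    Zero means the scale balances; a negative result means the
    left side is heavier, so the scale tips to the left.
    """
    return sum(x - fulcrum for x in values)

scores = [2, 3, 4, 9, 16]

# At 6.8 the distances on the left and right cancel (up to rounding).
print(math.isclose(net_torque(scores, 6.8), 0.0, abs_tol=1e-9))

# At 9, halfway between the lowest (2) and highest (16) values,
# the result is negative: the scale tips to the left.
print(net_torque(scores, 9))
```

Trying other fulcrum positions shows the sign of the result flipping as the fulcrum passes the balance point.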

Smallest Absolute Deviation

Another way to define the center of a distribution is based on the concept of the sum of the absolute differences. Consider the distribution made up of the five numbers 2, 3, 4, 9, 16. Let's see how far the distribution is from 10 (picking a number arbitrarily). Table 2 shows the sum of the absolute differences of these numbers from the number 10.

Table 2: An example of the sum of absolute deviations
Values Absolute difference from 10
2 8
3 7
4 6
9 1
16 6
Sum 28

The first row of the table shows that the absolute value of the difference between 2 and 10 is 8; the second row shows that the absolute difference between 3 and 10 is 7, and similarly for the other rows. When we add up the five absolute differences, we get 28. So, the sum of the absolute differences from 10 is 28. Likewise, the sum of the absolute differences from 5 equals 3 + 2 + 1 + 4 + 11 = 21. So, the sum of the absolute differences from 5 is smaller than the sum of the absolute differences from 10. In this sense, 5 is closer, overall, to the other numbers than is 10.

We are now in a position to define a second measure of central tendency, this time in terms of absolute differences. Specifically, according to our second definition, the center of a distribution is the number for which the sum of the absolute differences is smallest. As we just saw, the sum of the absolute differences from 10 is 28 and the sum of the absolute differences from 5 is 21. Is there a value for which the sum of the absolute differences is even smaller than 21? Yes. For these data, there is a value for which the sum of the absolute differences is only 20. See if you can find it. A general method for finding the center of a distribution in the sense of absolute differences is provided in the document Absolute Differences Simulation.
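
The arithmetic above is easy to reproduce. Here is a small sketch; the helper name `sum_abs_diff` is an illustrative choice of ours, not something defined in the module. It only recomputes the two sums already worked out in the text, so you can still try other targets yourself.

```python
def sum_abs_diff(values, target):
    """Sum of the absolute differences between each value and the target."""
    return sum(abs(x - target) for x in values)

scores = [2, 3, 4, 9, 16]

print(sum_abs_diff(scores, 10))  # 8 + 7 + 6 + 1 + 6 = 28
print(sum_abs_diff(scores, 5))   # 3 + 2 + 1 + 4 + 11 = 21
```

Calling `sum_abs_diff(scores, t)` for other values of `t` is one way to hunt for the target that brings the sum down to 20.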

Smallest Squared Deviation

We shall discuss one more way to define the center of a distribution. It is based on the concept of the sum of squared differences. Again, consider the distribution of the five numbers 2, 3, 4, 9, 16. Table 3 shows the sum of the squared differences of these numbers from the number 10.

Table 3: An example of the sum of squared deviations
Values Squared differences from 10
2 64
3 49
4 36
9 1
16 36
Sum 186

The first row in the table shows that the squared value of the difference between 2 and 10 is 64; the second row shows that the squared difference between 3 and 10 is 49, and so forth. When we add up all these squared differences, we get 186. Changing the target from 10 to 5, we calculate the sum of the squared differences from 5 as 9 + 4 + 1 + 16 + 121 = 151. So, the sum of the squared differences from 5 is smaller than the sum of the squared differences from 10. Is there a value for which the sum of the squared differences is even smaller than 151? Yes, it is possible to reach 134.8. Can you find the target number for which the sum of squared deviations is 134.8?

The target that minimizes the sum of squared differences provides another useful definition of central tendency (the last one to be discussed in this section). It can be challenging to find the value that minimizes this sum. We'll show you how to do it in the upcoming document Squared Differences Simulation.
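
As with absolute differences, the sums of squared differences can be recomputed in a few lines. This is only a sketch; the helper name `sum_sq_diff` is ours and is not part of the module.

```python
def sum_sq_diff(values, target):
    """Sum of the squared differences between each value and the target."""
    return sum((x - target) ** 2 for x in values)

scores = [2, 3, 4, 9, 16]

print(sum_sq_diff(scores, 10))  # 64 + 49 + 36 + 1 + 36 = 186
print(sum_sq_diff(scores, 5))   # 9 + 4 + 1 + 16 + 121 = 151
```

Because squaring magnifies large differences, the single distant value 16 contributes far more to the total here than it does to the sum of absolute differences.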

Glossary

Average:
1. The (arithmetic) mean
2. Any measure of central tendency
Bar Chart:
A graphical method of presenting data from a discrete variable. A bar is drawn for each value of the variable. The height of each bar represents the number or percentage of observations with that value of the variable. See Figure 6 for an example. See also: histogram, line graph, pie chart, box plot.
Figure 6
Box Plot:
One of the more effective graphical summaries of a data set, the box plot generally shows the mean, median, 25th and 75th percentiles, and outliers. A standard box plot is composed of the median, upper hinge, lower hinge, higher adjacent value, lower adjacent value, outside values, and far out values. Parallel box plots are very useful for comparing distributions. See Figure 7 for an example. See also: step, H-spread.
Figure 7
Center (of a Distribution):
Central Tendency: The center or middle of a distribution. There are many measures of central tendency. The most common are the mean, median, and mode. Others include the trimean, trimmed mean, and geometric mean.
Class Interval:
Bin Width: Also known as bin width, the class interval is a division of data for use in a histogram. For instance, it is possible to partition scores on a 100 point test into class intervals of 1-25, 26-49, 50-74 and 75-100.
Class Frequency:
One of the components of a histogram, the class frequency is the number of observations in each class interval. See also: relative frequency.
Continuous Variables:
Variables that can take on any value in a certain range. Time and distance are continuous; gender, SAT score and "time rounded to the nearest second" are not. Variables that are not continuous are known as discrete variables. No measured variable is truly continuous; however, discrete variables measured with enough precision can often be considered continuous for practical purposes.
Data:
A collection of values to be used for statistical analysis. See also: variable.
Dependent Variable:
A variable that measures the experimental outcome. In most experiments, the effects of the independent variable on the dependent variables are observed. For example, if a study investigated the effectiveness of an experimental treatment for depression, then the measure of depression would be the dependent variable. Synonym: dependent measure.
Discrete:
Variables that can only take on a finite number of values are called "discrete variables." All qualitative variables are discrete. Some quantitative variables are discrete, such as performance rated as 1, 2, 3, 4, or 5, or temperature rounded to the nearest degree. Sometimes, a variable that takes on enough discrete values can be considered to be continuous for practical purposes. One example is time to the nearest millisecond. Variables that can take on an infinite number of possible values are called continuous variables.
Distribution:
Frequency Distribution: The distribution of empirical data is called a frequency distribution and consists of a count of the number of occurrences of each value. If the data are continuous, then a grouped frequency distribution is used. Typically, a distribution is portrayed using a frequency polygon or a histogram. Mathematical distributions are often used to define distributions. The normal distribution is, perhaps, the best known example. Many empirical distributions are approximated well by mathematical distributions such as the normal distribution.
Far Out Value:
One of the components of a box plot, far out values are those that are more than 2 steps from the nearest hinge. They are beyond the outer fences.
Frequency Polygon:
A frequency polygon is a graphical representation of a distribution. It partitions the variable on the x-axis into various contiguous class intervals of (usually) equal widths. The heights of the polygon's points represent the class frequencies. See Figure 8 for an example.
Figure 8
Geometric Mean:
The geometric mean of n numbers is obtained by multiplying all of them together, and then taking the nth root of the product. It is one of the rarer measures of central tendency, and not to be confused with the much, much more common arithmetic mean.
Grouped Frequency Distribution:
A grouped frequency distribution is a frequency distribution in which frequencies are displayed for ranges of data rather than for individual values. For example, the distribution of heights might be calculated by defining one-inch ranges. The frequency of individuals with various heights rounded off to the nearest inch would then be tabulated. See also: histogram.
Higher Adjacent Value:
One of the components of a box plot, the higher adjacent value is the largest value in the data below the upper inner fence.
Histogram:
A histogram is a graphical representation of a distribution. It partitions the variable on the x-axis into various contiguous class intervals of (usually) equal widths. The heights of the bars represent the class frequencies. See Figure 9 for an example.
Figure 9
See also: Sturgis's Rule.
H-spread:
One of the components of a box plot, the H-spread is the difference between the upper hinge and the lower hinge.
Independent Variables:
Variables that are manipulated by the experimenter, as opposed to dependent variables. Most experiments consist of observing the effect of the independent variable on the dependent variable(s).
Interval Scales:
One of four Levels of Measurement, interval scales are numerical scales in which intervals have the same interpretation throughout. As an example, consider the Fahrenheit scale of temperature. The difference between 30 degrees and 40 degrees represents the same temperature difference as the difference between 80 degrees and 90 degrees. This is because each 10 degree interval has the same physical meaning (in terms of kinetic energy). Unlike ratio scales, interval scales do not have a true zero point.
Levels of Measurement:
Measurement scales differ in their level of measurement. There are four common levels of measurement:
  1. Nominal scales are only labels.
  2. Ordinal Scales are ordered but are not truly quantitative. Equal intervals on the ordinal scale do not imply equal intervals on the underlying trait.
  3. Interval scales are ordered, and equal intervals on the scale correspond to equal intervals on the underlying trait. However, interval scales do not have a true zero point.
  4. Ratio scales are interval scales that do have a true zero point. With ratio scales, it is sensible to talk about one value being twice as large as another, for example.
Line Graph:
Essentially a bar graph in which the height of each bar is represented by a single point, with each of these points connected by a line. Line graphs are best used to show change over time, and should never be used if your X-axis is not an ordered variable.
Lower Hinge:
A component of a box plot, the lower hinge is the 25th percentile. The upper hinge is the 75th percentile.
Lower Adjacent Value:
A component of a box plot, the lower adjacent value is the smallest value in the data above the lower inner fence.
Mean:
Arithmetic Mean: Also known as the arithmetic mean, the mean is typically what is meant by the word average. The mean is perhaps the most common measure of central tendency. The mean of a variable is given by (the sum of all its values)/(the number of values). For example, the mean of 4, 8, and 9 is 7. The sample mean is written as M, and the population mean as the Greek letter mu (μ). Despite its popularity, the mean may not be an appropriate measure of central tendency for skewed distributions, or in situations with outliers.
Median:
The median is a popular measure of central tendency. It is the 50th percentile of a distribution. To find the median of a number of values, first order them, then find the observation in the middle: the median of 5, 2, 7, 9, and 4 is 5. (Note that if there is an even number of values, one takes the average of the middle two: the median of 4, 6, 8, and 10 is 7.) The median is often more appropriate than the mean in skewed distributions, or in situations with large outliers.
Mode:
The mode is a measure of central tendency. It is the most common value in a distribution: the mode of 3, 4, 4, 5, 5, 5, 8 is 5. Note that the mode may be very different from the mean and the median: 1, 1, 1, 3, 8, 10 has mode 1, but mean 4 and median 2.
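
The small examples in the Mean, Median, and Mode entries can be reproduced with Python's standard `statistics` module. This is just a sketch for checking the arithmetic; the variable names are ours.

```python
import statistics

# Mean: (4 + 8 + 9) / 3
mean_example = statistics.mean([4, 8, 9])

# Median with an odd number of values: the middle of 2, 4, 5, 7, 9
median_odd = statistics.median([5, 2, 7, 9, 4])

# Median with an even number of values: the average of 6 and 8
median_even = statistics.median([4, 6, 8, 10])

# Mode: the most common value
mode_example = statistics.mode([3, 4, 4, 5, 5, 5, 8])

print(mean_example, median_odd, median_even, mode_example)
```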
Nominal Scale:
A nominal scale is one of four Levels of Measurement. No ordering is implied, and addition/subtraction and multiplication/division would be inappropriate for a variable on a nominal scale. {Female, Male} and {Buddhist, Christian, Hindu, Muslim} have no natural ordering (except alphabetic). Occasionally, numeric values are nominal: for instance, if a variable was coded as Female = 1, Male = 2, the set {1, 2} is still nominal.
Ordinal Scale:
One of four levels of measurement, an ordinal scale is a set of ordered values. However, there is no set distance between scale values. For instance, the scale (Very Poor, Poor, Average, Good, Very Good) is an ordinal scale. You can assign numerical values to an ordinal scale, such as rating performance as 1 for "Very Poor," 2 for "Poor," etc., but there is no assurance that the difference between a score of 1 and 2 means the same thing as the difference between a score of 2 and 3.
Outside Value:
A component of a box plot, an outside value is a value more than 1 step from the nearest hinge. See also: Far out value.
Parallel Box Plots:
Two or more box plots drawn on the same Y-axis. These are often useful in comparing features of distributions. See Figure 10 for an example portraying the times it took samples of women and men to do a task.
Figure 10
Percentile:
1. There is no universally accepted definition of a percentile. Using the 65th percentile as an example, some statisticians define the 65th percentile as the lowest score that is larger than 65% of the scores. Others have defined the 65th percentile as the smallest score that is greater than or equal to 65% of the scores. A more sophisticated definition is given below.
2. The first step is to compute the rank (R) of the percentile in question. This is done using the following formula: R = (P/100) × (N + 1), where P is the desired percentile and N is the number of numbers. If R is an integer, then the Pth percentile is the number with rank R. When R is not an integer, we compute the Pth percentile by interpolation as follows:
  1. Define IR as the integer portion of R (the number to the left of the decimal point).
  2. Define FR as the fractional portion of R.
  3. Find the scores with Rank IR and with Rank IR + 1.
  4. Interpolate by multiplying the difference between the scores by FR and add the result to the lower score.
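
The interpolation steps in the Percentile entry can be sketched in Python as follows. The function name `percentile` is ours. The entry does not say what to do when the rank R falls below 1 or above N, so clamping to the smallest or largest score is an assumption made here purely so the sketch always returns a value.

```python
def percentile(values, p):
    """Pth percentile using the rank R = (P/100) * (N + 1) with interpolation."""
    scores = sorted(values)
    n = len(scores)
    rank = p * (n + 1) / 100            # R

    # Assumption (not in the glossary entry): clamp ranks outside 1..N.
    if rank <= 1:
        return scores[0]
    if rank >= n:
        return scores[-1]

    ir = int(rank)                      # IR: integer portion of R
    fr = rank - ir                      # FR: fractional portion of R
    lower = scores[ir - 1]              # score with rank IR (ranks start at 1)
    upper = scores[ir]                  # score with rank IR + 1
    # When FR is 0 this is just the score with rank R.
    return lower + fr * (upper - lower)

scores = [2, 3, 4, 9, 16]
print(percentile(scores, 50))  # R = 3, so the score with rank 3
print(percentile(scores, 25))  # R = 1.5, halfway between ranks 1 and 2
```

For the 25th percentile of these five numbers, R = 1.5, so IR = 1 and FR = 0.5, and the result lies halfway between the scores ranked 1 and 2.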
Pie Chart:
A graphical representation of data, the pie chart shows relative frequencies of classes of data. It is a circle cut into a number of wedges, one for each class, with the area of each wedge proportional to its relative frequency. Pie charts are only effective for a small number of classes, and are one of the less effective graphical representations.
Qualitative Variables:
Categorical Variable: Also known as categorical variables, qualitative variables are variables with no natural sense of ordering. For instance, hair color (Black, Brown, Gray, Red, Yellow) is a qualitative variable, as is name (Adam, Becky, Christina, Dave . . .). Qualitative variables can be coded to appear numeric but their numbers are meaningless, as in male=1, female=2. Variables that are not qualitative are known as quantitative variables.
Quantitative Variables:
Variables that are measured on a numeric or quantitative scale. Ordinal, interval and ratio scales are quantitative. A country's population, a person's shoe size, or a car's speed are all quantitative variables. Variables that are not quantitative are known as qualitative variables.
Ratio Scale:
One of the four basic levels of measurement, a ratio scale is a numerical scale with a true zero point and in which a given size interval has the same interpretation for the entire scale. Weight is a ratio scale. Therefore, it is meaningful to say that a 200 pound person weighs twice as much as a 100 pound person.
Relative Frequency:
The proportion of observations falling into a given class. For example, if a bag of 55 M&M's has 11 green M&M's, then the frequency of green M&M's is 11 and the relative frequency is 11/55 = 0.20. Relative frequencies arise in the creation of histograms and pie charts, and sometimes in bar graphs.
Skew:
A distribution is skewed if one tail extends out further than the other. A distribution has positive skew (is skewed to the right) if the tail to the right is longer. See Figure 11 for an example.
Figure 11
A distribution has a negative skew (is skewed to the left) if the tail to the left is longer. See Figure 12 for an example.
Figure 12
Step:
One of the components of a box plot, the step is 1.5 times the difference between the upper hinge and the lower hinge. See also: H-spread.
Sturgis's Rule:
One method of determining the number of classes for a histogram, Sturgis's Rule (often spelled Sturges' rule) is to take 1 + log2(N) classes, rounded to the nearest integer, where N is the number of observations.
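
As a quick illustration of the rule, here is a one-function sketch. The name `sturges_classes` is ours, and the sample sizes are arbitrary illustrative values.

```python
import math

def sturges_classes(n):
    """Number of histogram classes: 1 + log2(n), rounded to the nearest integer."""
    return round(1 + math.log2(n))

print(sturges_classes(32))    # 1 + 5 = 6
print(sturges_classes(100))   # 1 + 6.64... rounds to 8
```

The logarithm means the suggested number of classes grows slowly: going from 100 to 1000 observations adds only a few classes.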
Trimean:
The trimean is a measure of central tendency; it is a weighted average of the 25th, 50th, and 75th percentiles. Specifically it is computed as follows: Trimean = 0.25 × (25th percentile) + 0.5 × (50th percentile) + 0.25 × (75th percentile).
Trimmed Mean:
The trimmed mean is a measure of central tendency generally falling between the mean and the median. As in the computation of the median, all observations are ordered. Next, the highest and lowest alpha percent of the data are removed, where alpha ranges from 0 to 50. Finally, the mean of the remaining observations is taken. The trimmed mean has advantages over both the mean and median, but is computationally more difficult and analytically more intractable.
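
The trimming procedure can be sketched as below. The function name `trimmed_mean` is ours. Two details are assumptions rather than part of the definition: when alpha percent of the data is not a whole number of observations, the count is rounded down, and alpha equal to 50 is rejected because it would remove every observation.

```python
def trimmed_mean(values, alpha):
    """Mean after removing the lowest and highest alpha percent of the values."""
    if not 0 <= alpha < 50:
        # Assumption: alpha = 50 is excluded, since nothing would remain.
        raise ValueError("alpha must be at least 0 and less than 50")
    scores = sorted(values)
    n = len(scores)
    k = int(n * alpha / 100)        # observations dropped from each end (rounded down)
    kept = scores[k:n - k]
    return sum(kept) / len(kept)

scores = [2, 3, 4, 9, 16]
print(trimmed_mean(scores, 0))    # no trimming: the ordinary mean, 6.8
print(trimmed_mean(scores, 20))   # drops 2 and 16, then averages 3, 4, 9
print(trimmed_mean(scores, 40))   # drops two from each end, leaving only 4
```

With these five numbers, trimming 20 percent gives a value between the median (4) and the mean (6.8), as the entry describes.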
Upper Hinge:
The upper hinge is one of the components of a box plot; it is the 75th percentile.
Variables:
Something that can take on different values. For example, different subjects in an experiment weigh different amounts. Therefore "weight" is a variable in the experiment. Or, subjects may be given different doses of a drug. This would make "dosage" a variable. Variables can be dependent or independent, qualitative or quantitative, and continuous or discrete.