Connexions

Comparing Measures of Central Tendency

Module by: David Lane.

How do the various measures of central tendency compare with each other? For symmetric distributions, the mean, median, trimean, and trimmed mean are equal, as is the mode except in bimodal distributions. Differences among the measures occur with skewed distributions. Figure 1 shows the distribution of 642 scores on an introductory psychology test. Notice that this distribution has a slight positive skew.

Measures of central tendency are shown in Table 1. Notice that they do not differ greatly, except that the mode is lower than the other measures. When distributions have a positive skew, the mean is higher than the median. For these data, the mean of 91.58 is higher than the median of 90. Typically the trimean and trimmed mean will fall between the median and the mean, although in this case the trimmed mean is slightly lower than the median. The geometric mean is lower than all measures except the mode.

Table 1: Measures of central tendency for the test scores.
Measure Value
Mode 84.00
Median 90.00
Geometric Mean 89.70
Trimean 90.25
Mean trimmed 50% 89.81
Mean 91.58
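
As a rough illustration of the pattern described above, the following Python sketch uses a small set of made-up scores with a long right tail. The list `scores` is hypothetical and is not the test data from Figure 1; it only shows how positive skew tends to pull the mean above the median and the mode.

```python
from statistics import mean, median, mode

# Hypothetical, made-up scores with a long right tail (positive skew).
scores = [2, 3, 3, 3, 4, 4, 5, 6, 9, 21]

print("mode:  ", mode(scores))    # most frequent value
print("median:", median(scores))  # middle of the ordered values
print("mean:  ", mean(scores))    # pulled upward by the long right tail
```

For this sample the three measures come out in the order mode, median, mean, from lowest to highest.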

The distribution of baseball salaries (in 1994) shown in Figure 2 has a much more pronounced skew than the distribution in Figure 1.

Table 2 shows the measures of central tendency for these data. The large skew results in very different values for these measures. No single measure of central tendency is sufficient for data such as these. If you were asked the very general question "So, what do baseball players make?" and answered with the mean of $1,183,000, you would not have told the whole story, since only about one third of baseball players make that much. If you answered with the mode of $250,000 or the median of $500,000, you would not be giving any indication that some players make many millions of dollars. Fortunately, there is no need to summarize a distribution with a single number. When the various measures differ, our opinion is that you should report the mean, median, and either the trimean or the mean trimmed 50%. Sometimes it is worth reporting the mode as well. In the media, the median is usually reported to summarize the center of skewed distributions. You will hear about median salaries and median prices of houses sold, etc. This is better than reporting only the mean, but it would be informative to hear more statistics.

Table 2: Measures of central tendency for baseball salaries (in thousands of dollars).
Measure Value
Mode 250
Median 500
Geometric Mean 555
Trimean 792
Mean trimmed 50% 619
Mean 1,183

Glossary

Average:
1. The (arithmetic) mean
2. Any measure of central tendency
Bimodal Distribution:
A distribution with two distinct peaks. An example is shown below.
Bar Chart:
A graphical method of presenting data from a discrete variable. A bar is drawn for each value of the variable. The height of each bar represents the number or percentage of observations with that value of the variable. See Figure 4 for an example. See also: histogram, line graph, pie chart, box plot.
Box Plot:
One of the more effective graphical summaries of a data set, the box plot generally shows the mean, median, 25th and 75th percentiles, and outliers. A standard box plot is composed of the median, upper hinge, lower hinge, higher adjacent value, lower adjacent value, outside values, and far out values. Parallel box plots are very useful for comparing distributions. See Figure 5 for an example. See also: step, H-spread.
Center (of a Distribution):
Central Tendency: The center or middle of a distribution. There are many measures of central tendency. The most common are the mean, median, and mode. Others include the trimean, trimmed mean, and geometric mean.
Class Interval:
Bin Width: Also known as bin width, the class interval is a division of data for use in a histogram. For instance, it is possible to partition scores on a 100 point test into class intervals of 1-25, 26-49, 50-74 and 75-100.
Class Frequency:
One of the components of a histogram, the class frequency is the number of observations in each class interval. See also: relative frequency.
Continuous Variables:
Variables that can take on any value in a certain range. Time and distance are continuous; gender, SAT score and "time rounded to the nearest second" are not. Variables that are not continuous are known as discrete variables. No measured variable is truly continuous; however, discrete variables measured with enough precision can often be considered continuous for practical purposes.
Data:
A collection of values to be used for statistical analysis. See also: variable.
Discrete:
Variables that can only take on a finite number of values are called "discrete variables." All qualitative variables are discrete. Some quantitative variables are discrete, such as performance rated as 1, 2, 3, 4, or 5, or temperature rounded to the nearest degree. Sometimes, a variable that takes on enough discrete values can be considered to be continuous for practical purposes. One example is time to the nearest millisecond. Variables that can take on an infinite number of possible values are called continuous variables.
Distribution:
Frequency Distribution: The distribution of empirical data is called a frequency distribution and consists of a count of the number of occurrences of each value. If the data are continuous, then a grouped frequency distribution is used. Typically, a distribution is portrayed using a frequency polygon or a histogram. Mathematical distributions are often used to define distributions. The normal distribution is, perhaps, the best known example. Many empirical distributions are approximated well by mathematical distributions such as the normal distribution.
Far Out Value:
One of the components of a box plot, far out values are those that are more than 2 steps from the nearest hinge. They are beyond the outer fences.
Frequency Polygon:
A frequency polygon is a graphical representation of a distribution. It partitions the variable on the x-axis into various contiguous class intervals of (usually) equal widths. The heights of the polygon's points represent the class frequencies. See Figure 6 for an example.
Geometric Mean:
The geometric mean of n numbers is obtained by multiplying all of them together and then taking the nth root of the product. It is one of the rarer measures of central tendency, and not to be confused with the much, much more common arithmetic mean.
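A minimal Python sketch of this definition follows. The function name `geometric_mean` and the restriction to positive inputs are assumptions made for illustration; Python's standard library also offers `statistics.geometric_mean` (Python 3.8 and later).

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n positive numbers."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        # Assumption: restrict to positive numbers so the root is well defined.
        raise ValueError("all values must be positive")
    product = math.prod(values)           # multiply all values together
    return product ** (1 / len(values))   # take the nth root of the product

print(geometric_mean([2, 8]))        # product 16, square root
print(geometric_mean([1, 10, 100]))  # product 1000, cube root (about 10)
```

Multiplying many large values can overflow or lose precision, so averaging logarithms is a common alternative for long lists.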
Grouped Frequency Distribution:
A grouped frequency distribution is a frequency distribution in which frequencies are displayed for ranges of data rather than for individual values. For example, the distribution of heights might be calculated by defining one-inch ranges. The frequency of individuals with various heights rounded off to the nearest inch would then be tabulated. See also: histogram.
Higher Adjacent Value:
One of the components of a box plot, the higher adjacent value is the largest value in the data below the inner upper fence.
Histogram:
A histogram is a graphical representation of a distribution. It partitions the variable on the x-axis into various contiguous class intervals of (usually) equal widths. The heights of the bars represent the class frequencies. See Figure 7 for an example. See also: Sturgis's Rule.
H-Spread:
One of the components of a box plot, the H-spread is the difference between the upper hinge and the lower hinge.
Levels of Measurement:
Measurement scales differ in their level of measurement. There are four common levels of measurement:
1. Nominal scales are only labels.
2. Ordinal scales are ordered but are not truly quantitative. Equal intervals on the ordinal scale do not imply equal intervals on the underlying trait.
3. Interval scales are ordered, and equal intervals on the scale imply equal intervals on the underlying trait. However, interval scales do not have a true zero point.
4. Ratio scales are interval scales that do have a true zero point. With ratio scales, it is sensible to talk about one value being twice as large as another, for example.
Line Graph:
Essentially a bar graph in which the height of each bar is represented by a single point, with each of these points connected by a line. Line graphs are best used to show change over time, and should never be used if your X-axis is not an ordered variable.
Lower Hinge:
A component of a box plot, the lower hinge is the 25th percentile. The upper hinge is the 75th percentile.
Lower Adjacent Value:
A component of a box plot, the lower adjacent value is the smallest value in the data above the inner lower fence.
Mean:
Arithmetic Mean: Also known as the arithmetic mean, the mean is typically what is meant by the word average. The mean is perhaps the most common measure of central tendency. The mean of a variable is given by (the sum of all its values)/(the number of values). For example, the mean of 4, 8, and 9 is 7. The sample mean is written as M, and the population mean as the Greek letter mu (μ). Despite its popularity, the mean may not be an appropriate measure of central tendency for skewed distributions, or in situations with outliers.
Median:
The median is a popular measure of central tendency. It is the 50th percentile of a distribution. To find the median of a number of values, first order them, then find the observation in the middle: the median of 5, 2, 7, 9, and 4 is 5. (Note that if there is an even number of values, one takes the average of the middle two: the median of 4, 6, 8, and 10 is 7.) The median is often more appropriate than the mean in skewed distributions, or in situations with large outliers.
Mode:
The mode is a measure of central tendency. It is the most common value in a distribution: the mode of 3, 4, 4, 5, 5, 5, 8 is 5. Note that the mode may be very different from the mean and the median: 1, 1, 1, 3, 8, 10 has mode 1, but mean 4 and median 2.
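The small examples in the Mean, Median, and Mode entries can be checked with Python's standard `statistics` module. This is only a sketch for verifying the arithmetic; the variable name `data` is chosen for illustration. The six values in `data` sum to 24.

```python
from statistics import mean, median, mode

print(mean([4, 8, 9]))               # (4 + 8 + 9) / 3
print(median([5, 2, 7, 9, 4]))       # middle of 2, 4, 5, 7, 9
print(median([4, 6, 8, 10]))         # average of the middle two, 6 and 8
print(mode([3, 4, 4, 5, 5, 5, 8]))   # 5 occurs three times

# The mode can sit far from the other two measures.
data = [1, 1, 1, 3, 8, 10]
print(mode(data), mean(data), median(data))
```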
Nominal Scale:
A nominal scale is one of four Levels of Measurement. No ordering is implied, and addition/subtraction and multiplication/division would be inappropriate for a variable on a nominal scale. {Female, Male} and {Buddhist, Christian, Hindu, Muslim} have no natural ordering (except alphabetic). Occasionally, numeric values are nominal: for instance, if a variable was coded as Female = 1, Male = 2, the set {1, 2} is still nominal.
Ordinal Scale:
One of four levels of measurement, an ordinal scale is a set of ordered values. However, there is no set distance between scale values. For instance, (Very Poor, Poor, Average, Good, Very Good) is an ordinal scale. You can assign numerical values to an ordinal scale, such as rating performance as 1 for "Very Poor," 2 for "Poor," etc., but there is no assurance that the difference between a score of 1 and 2 means the same thing as the difference between a score of 2 and 3.
Outside Value:
A component of a box plot, an outside value is a value more than 1 step from the nearest hinge. See also: Far out value.
Parallel Box Plots:
Two or more box plots drawn on the same Y-axis. These are often useful in comparing features of distributions. See Figure 8 for an example portraying the times it took samples of women and men to do a task.
Percentile:
1. There is no universally accepted definition of a percentile. Using the 65th percentile as an example, some statisticians define the 65th percentile as the lowest score that is larger than 65% of the scores. Others have defined the 65th percentile as the smallest score that is greater than or equal to 65% of the scores. A more sophisticated definition is given below.
2. The first step is to compute the rank (R) of the percentile in question. This is done using the following formula: R = (P/100) × (N + 1), where P is the desired percentile and N is the number of numbers. If R is an integer, then the Pth percentile is the number with rank R. When R is not an integer, we compute the Pth percentile by interpolation as follows:
   1. Define IR as the integer portion of R (the number to the left of the decimal point).
   2. Define FR as the fractional portion of R.
   3. Find the scores with rank IR and with rank IR + 1.
   4. Interpolate by multiplying the difference between the scores by FR and adding the result to the lower score.
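The interpolation steps above can be sketched in Python as follows. The function name `percentile` and the sample data are illustrative, and the handling of ranks that fall below 1 or above N (returning the smallest or largest score) is an assumption, since the definition does not cover that case.

```python
def percentile(values, p):
    """Percentile by rank interpolation, with R = p/100 * (N + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    rank = p / 100 * (n + 1)
    # Assumption: clamp ranks outside 1..N to the extreme scores.
    if rank <= 1:
        return ordered[0]
    if rank >= n:
        return ordered[-1]
    ir = int(rank)             # IR: integer portion of R
    fr = rank - ir             # FR: fractional portion of R
    lower = ordered[ir - 1]    # score with rank IR (ranks start at 1)
    upper = ordered[ir]        # score with rank IR + 1
    return lower + fr * (upper - lower)

sample = [3, 5, 7, 8, 9, 11, 13, 15]   # made-up data, N = 8
print(percentile(sample, 25))  # R = 2.25, between the scores 5 and 7
print(percentile(sample, 50))  # R = 4.5, between the scores 8 and 9
```

When R is an integer, FR is zero and the function returns the score with rank R, matching the first case in the definition.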
Pie Chart:
A graphical representation of data, the pie chart shows relative frequencies of classes of data. It is a circle cut into a number of wedges, one for each class, with the area of each wedge proportional to its relative frequency. Pie charts are only effective for a small number of classes, and are one of the less effective graphical representations.
Qualitative Variables:
Categorical Variable: Also known as categorical variables, qualitative variables are variables with no natural sense of ordering. For instance, hair color (Black, Brown, Gray, Red, Yellow) is a qualitative variable, as is name (Adam, Becky, Christina, Dave . . .). Qualitative variables can be coded to appear numeric but their numbers are meaningless, as in male=1, female=2. Variables that are not qualitative are known as quantitative variables.
Quantitative Variables:
Variables that are measured on a numeric or quantitative scale. Ordinal, interval and ratio scales are quantitative. A country's population, a person's shoe size, or a car's speed are all quantitative variables. Variables that are not quantitative are known as qualitative variables.
Ratio Scale:
One of the four basic levels of measurement, a ratio scale is a numerical scale with a true zero point and in which a given size interval has the same interpretation for the entire scale. Weight is measured on a ratio scale; therefore, it is meaningful to say that a 200 pound person weighs twice as much as a 100 pound person.
Relative Frequency:
The proportion of observations falling into a given class. For example, if a bag of 55 M&M's has 11 green M&M's, then the frequency of green M&M's is 11 and the relative frequency is 11/55 = 0.20. Relative frequencies arise in the creation of histograms and pie charts, and sometimes in bar graphs.
Skew:
A distribution is skewed if one tail extends out further than the other. A distribution has positive skew (is skewed to the right) if the tail to the right is longer. See Figure 9 for an example. A distribution has a negative skew (is skewed to the left) if the tail to the left is longer. See Figure 10 for an example.
Step:
One of the components of a box plot, the step is 1.5 times the difference between the upper hinge and the lower hinge. See also: H-spread.
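The box plot components defined in this glossary (hinges, H-spread, step, adjacent values, outside values, far out values) can be tied together in a short Python sketch. The function name `box_plot_parts`, the dictionary keys, and the sample data are hypothetical. Two assumptions are made: the adjacent values are taken as the most extreme data values that lie within one step of the hinges (the inner fences), and values beyond two steps are reported only as far out values, not also as outside values.

```python
from statistics import quantiles

def box_plot_parts(values):
    """Compute box plot components from the glossary definitions."""
    data = sorted(values)
    # quantiles() with its default "exclusive" method interpolates at
    # rank P/100 * (N + 1); it needs at least two data points.
    lower_hinge, mid, upper_hinge = quantiles(data, n=4)
    h_spread = upper_hinge - lower_hinge
    step = 1.5 * h_spread
    inner = (lower_hinge - step, upper_hinge + step)          # inner fences
    outer = (lower_hinge - 2 * step, upper_hinge + 2 * step)  # outer fences
    inside = [v for v in data if inner[0] <= v <= inner[1]]
    return {
        "lower_hinge": lower_hinge,
        "median": mid,
        "upper_hinge": upper_hinge,
        "h_spread": h_spread,
        "step": step,
        "lower_adjacent": inside[0],     # smallest value inside the inner fences
        "higher_adjacent": inside[-1],   # largest value inside the inner fences
        "outside": [v for v in data
                    if outer[0] <= v <= outer[1]
                    and not inner[0] <= v <= inner[1]],
        "far_out": [v for v in data if v < outer[0] or v > outer[1]],
    }

# Made-up data with one extreme value.
parts = box_plot_parts([3, 5, 7, 8, 9, 11, 13, 15, 40])
for name, value in parts.items():
    print(name, value)
```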
Sturgis's Rule:
One method of determining the number of classes for a histogram, Sturgis's Rule is to take 1 + log2(N) classes, rounded to the nearest integer, where N is the number of observations.
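As a quick sketch of the rule, the Python function below computes the suggested number of classes; the name `sturgis_classes` is made up for illustration. The value 642 is the number of test scores mentioned for Figure 1.

```python
import math

def sturgis_classes(n):
    """Number of histogram classes: 1 + log2(n), rounded to the nearest integer."""
    if n < 1:
        raise ValueError("need at least one observation")
    return round(1 + math.log2(n))

print(sturgis_classes(64))    # log2(64) is exactly 6
print(sturgis_classes(642))   # log2(642) is a little over 9.3
```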
Symmetric Distribution:
In a symmetric distribution, the upper and lower halves of the distribution are mirror images of each other. For example, in the distribution shown in Figure 11, the portions above and below 50 are mirror images of each other. In a symmetric distribution, the mean is equal to the median.
Trimean:
The trimean is a measure of central tendency; it is a weighted average of the 25th, 50th, and 75th percentiles. Specifically, it is computed as follows: Trimean = 0.25 × (25th percentile) + 0.5 × (50th percentile) + 0.25 × (75th percentile).
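A minimal Python sketch of the trimean follows. It assumes the percentiles are computed with `statistics.quantiles`, whose default "exclusive" method interpolates at rank P/100 × (N + 1); other percentile definitions can give slightly different results. The function name `trimean` and the sample data are illustrative.

```python
from statistics import quantiles

def trimean(values):
    """Weighted average of the 25th, 50th, and 75th percentiles."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q25, q50, q75 = quantiles(values, n=4)
    return 0.25 * q25 + 0.5 * q50 + 0.25 * q75

sample = [3, 5, 7, 8, 9, 11, 13, 15]   # made-up data
print(trimean(sample))  # quartiles are 5.5, 8.5, and 12.5
```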
Trimmed Mean:
The trimmed mean is a measure of central tendency generally falling between the mean and the median. As in the computation of the median, all observations are ordered. Next, the highest and lowest alpha percent of the data are removed, where alpha ranges from 0 to 50. Finally, the mean of the remaining observations is taken. The trimmed mean has advantages over both the mean and median, but is computationally more difficult and analytically more intractable.
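The procedure can be sketched in Python as below. Several choices here are assumptions: alpha percent is removed from each end, the number of observations dropped is rounded down, and alpha is restricted to values below 50 so that something remains to average. The function name `trimmed_mean` and the data are made up for illustration.

```python
def trimmed_mean(values, alpha):
    """Mean after removing the lowest and highest alpha percent of the data."""
    if not 0 <= alpha < 50:
        raise ValueError("alpha must be at least 0 and below 50")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    k = int(n * alpha / 100)    # observations dropped from each end (rounded down)
    kept = ordered[k:n - k]     # what remains after trimming both tails
    return sum(kept) / len(kept)

scores = [2, 3, 3, 3, 4, 4, 5, 6, 9, 21]   # made-up, positively skewed
print(trimmed_mean(scores, 0))    # nothing trimmed: the ordinary mean
print(trimmed_mean(scores, 10))   # drops the 2 and the 21
```

With no trimming the result equals the ordinary mean, and as alpha grows the result moves toward the median.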
Upper Hinge:
The upper hinge is one of the components of a box plot; it is the 75th percentile.
Variables:
Something that can take on different values. For example, different subjects in an experiment weigh different amounts. Therefore "weight" is a variable in the experiment. Or, subjects may be given different doses of a drug. This would make "dosage" a variable. Variables can be dependent or independent, qualitative or quantitative, and continuous or discrete.
Dependent Variable:
A variable that measures the experimental outcome. In most experiments, the effects of the independent variable on the dependent variables are observed. For example, if a study investigated the effectiveness of an experimental treatment for depression, then the measure of depression would be the dependent variable. Synonym: dependent measure
Independent Variables:
Variables that are manipulated by the experimenter, as opposed to dependent variables. Most experiments consist of observing the effect of the independent variable on the dependent variable(s).
