Connexions

Measures of Variability

Module by: David Lane

What is Variability?

Variability refers to how "spread out" a group of scores is. To see what we mean by spread out, consider the graphs in Figure 1. These graphs represent the scores on two quizzes. The mean score for each quiz is 7.0. Despite the equality of means, you can see that the distributions are quite different. Specifically, the scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out. The differences among students were much greater on Quiz 2 than on Quiz 1.

Figure 1: Bar charts of two quizzes. (a) Quiz 1 (spread1.gif). (b) Quiz 2 (spread2.gif).

The terms variability, spread, and dispersion are synonyms, and refer to how spread out a distribution is. Just as in the section on central tendency we discussed measures of the center of a distribution of scores, in this chapter we will discuss measures of the variability of a distribution. There are four frequently used measures of variability: the range, interquartile range, variance, and standard deviation. In the next few paragraphs, we will look at each of these four measures of variability in more detail.

Range

The range is the simplest measure of variability to calculate, and one you have probably encountered many times in your life. The range is simply the highest score minus the lowest score. Let's take a few examples. What is the range of the following group of numbers: 10, 2, 5, 6, 7, 3, 4? Well, the highest number is 10, and the lowest number is 2, so $10 - 2 = 8$. The range is 8. Let's take another example. Here's a dataset with 10 numbers: 99, 45, 23, 67, 45, 91, 82, 78, 62, 51. What is the range? The highest number is 99 and the lowest number is 23, so $99 - 23 = 76$; the range is 76. Now consider the two quizzes shown in Figure 1. On Quiz 1, the lowest score was 5 and the highest score was 9. Therefore, the range is 4. The range on Quiz 2 was larger: the lowest score was 4 and the highest score was 10. Therefore the range is 6.
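The two numeric examples above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `score_range` is an assumption made for this example and is not part of the module.

```python
def score_range(scores):
    """Return the highest score minus the lowest score."""
    return max(scores) - min(scores)


# The two datasets used in the text.
first = [10, 2, 5, 6, 7, 3, 4]
second = [99, 45, 23, 67, 45, 91, 82, 78, 62, 51]

print(score_range(first))   # 8
print(score_range(second))  # 76
```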

Interquartile Range

The interquartile range (IQR) is a range that contains the middle 50% of the scores in a distribution. It is computed as follows: $\text{IQR} = \text{75th percentile} - \text{25th percentile}$. For Quiz 1, the 75th percentile is 8 and the 25th percentile is 6. The interquartile range is therefore 2. For Quiz 2, which has greater spread, the 75th percentile is 9, the 25th percentile is 5, and the interquartile range is 4. Recall that in the discussion of boxplots, the 75th percentile was called the upper hinge and the 25th percentile was called the lower hinge. Using this terminology, the interquartile range is referred to as the H-spread.

A related measure of variability is called the semi-interquartile range. The semi-interquartile range is defined simply as the interquartile range divided by 2. If a distribution is symmetric, the median plus or minus the semi-interquartile range contains half the scores in the distribution.
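As a rough check of the Quiz 1 figures, the sketch below computes the quartiles, the interquartile range, and the semi-interquartile range with Python's standard `statistics` module. The list `quiz1` is the set of twenty Quiz 1 scores listed in Table 1. Software packages use different conventions for computing percentiles, so other data may give slightly different quartiles depending on the method; for these scores the result agrees with the values quoted in the text.

```python
import statistics

# The twenty Quiz 1 scores from Table 1.
quiz1 = [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5]

# quantiles(..., n=4) returns the three cut points that split the data
# into quarters: the 25th percentile, the median, and the 75th percentile.
q1, median, q3 = statistics.quantiles(quiz1, n=4)

iqr = q3 - q1        # interquartile range
semi_iqr = iqr / 2   # semi-interquartile range

print(q1, q3, iqr, semi_iqr)  # 6.0 8.0 2.0 1.0
```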

Variance

Variability can also be defined in terms of how close the scores in the distribution are to the middle of the distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as the average squared difference of the scores from the mean. The data from Quiz 1 are shown in Table 1. The mean score is 7.0. Therefore, the column "Deviation from Mean" contains the score minus 7. The column "Squared Deviation" is simply the previous column squared.

Table 1: Calculation of Variance for Quiz 1 scores.
        Scores   Deviation from Mean   Squared Deviation
        9         2                    4
        9         2                    4
        9         2                    4
        8         1                    1
        8         1                    1
        8         1                    1
        8         1                    1
        7         0                    0
        7         0                    0
        7         0                    0
        7         0                    0
        7         0                    0
        6        -1                    1
        6        -1                    1
        6        -1                    1
        6        -1                    1
        6        -1                    1
        6        -1                    1
        5        -2                    4
        5        -2                    4
Mean    7         0                    1.5

One thing that is important to notice is that the mean deviation from the mean is 0. This will always be the case. The mean of the squared deviations is 1.5. Therefore, the variance is 1.5. Analogous calculations with Quiz 2 show that its variance is 6.7. The formula for the variance is:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where $\sigma^2$ is the variance, $\mu$ is the mean, and $N$ is the number of numbers. For Quiz 1, $\mu = 7$ and $N = 20$.
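The Table 1 calculation can be reproduced step by step. The following is a minimal sketch that mirrors the table's columns; the variable names are chosen for this example only. The last line cross-checks the result against `statistics.pvariance`, the standard-library function for the population variance.

```python
import statistics

# The twenty Quiz 1 scores from Table 1.
quiz1 = [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5]

n = len(quiz1)
mu = sum(quiz1) / n                      # mean of the scores

deviations = [x - mu for x in quiz1]     # "Deviation from Mean" column
squared = [d ** 2 for d in deviations]   # "Squared Deviation" column

mean_deviation = sum(deviations) / n     # always 0
variance = sum(squared) / n              # mean of the squared deviations

print(mu, mean_deviation, variance)      # 7.0 0.0 1.5

# Cross-check with the standard library's population variance.
assert variance == statistics.pvariance(quiz1)
```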

If the variance in a sample is used to estimate the variance in a population, then the previous formula underestimates the variance and the following formula should be used:

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

where $s^2$ is the estimate of the variance and $M$ is the sample mean. Note that $M$ is the mean of a sample taken from a population with a mean of $\mu$. Since, in practice, the variance is usually computed in a sample, this formula is most often used. The simulation "estimating variance" illustrates the bias in the formula with $N$ in the denominator.

Let's take a concrete example. Assume the scores 1, 2, 4, and 5 were sampled from a larger population. To estimate the variance in the population you would compute $s^2$ as follows:

$$M = \frac{1 + 2 + 4 + 5}{4} = \frac{12}{4} = 3$$

$$s^2 = \frac{(1 - 3)^2 + (2 - 3)^2 + (4 - 3)^2 + (5 - 3)^2}{4 - 1} = \frac{4 + 1 + 1 + 4}{3} = \frac{10}{3} = 3.333$$

There are alternate formulas that can be easier to use if you are doing your calculations with a hand calculator:

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$

and

$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$$

For this example,

$$\sum X^2 = 1^2 + 2^2 + 4^2 + 5^2 = 46$$

$$\frac{(\sum X)^2}{N} = \frac{(1 + 2 + 4 + 5)^2}{N} = \frac{144}{4} = 36$$

$$\sigma^2 = \frac{46 - 36}{4} = 2.5$$

and

$$s^2 = \frac{46 - 36}{3} = 3.333$$

as with the other formula.
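The worked example can be verified numerically. The sketch below computes the estimate both from the definition and from the hand-calculator form, using the four sampled scores from the text; the variable names are assumptions made for this illustration.

```python
import statistics

sample = [1, 2, 4, 5]
n = len(sample)

# Definitional formula: squared deviations from the sample mean M.
m = sum(sample) / n                              # 3.0
sum_sq_dev = sum((x - m) ** 2 for x in sample)   # 10.0
s2 = sum_sq_dev / (n - 1)                        # estimate with N - 1

# Hand-calculator formula: sum of X^2 minus (sum of X)^2 / N.
sum_x2 = sum(x ** 2 for x in sample)             # 46
sum_x_sq_over_n = sum(sample) ** 2 / n           # 36.0
sigma2_alt = (sum_x2 - sum_x_sq_over_n) / n      # divides by N
s2_alt = (sum_x2 - sum_x_sq_over_n) / (n - 1)    # divides by N - 1

print(round(s2, 3), sigma2_alt, round(s2_alt, 3))  # 3.333 2.5 3.333
```

The standard library's `statistics.variance(sample)` uses the same $N - 1$ denominator and can serve as an independent check.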

Standard Deviation

The standard deviation is simply the square root of the variance. This makes the standard deviations of the two quiz distributions 1.225 and 2.588. The standard deviation is an especially useful measure of variability when the distribution is normal or approximately normal (see Probability) because the proportion of the distribution within a given number of standard deviations from the mean can be calculated. For example, approximately 68% of the distribution is within one standard deviation of the mean and approximately 95% of the distribution is within two standard deviations of the mean. Therefore, if you had a normal distribution with a mean of 50 and a standard deviation of 10, then about 68% of the distribution would be between $50 - 10 = 40$ and $50 + 10 = 60$. Similarly, about 95% of the distribution would be between $50 - 2 \times 10 = 30$ and $50 + 2 \times 10 = 70$. The symbol for the population standard deviation is $\sigma$; the symbol for an estimate computed in a sample is $s$.

Figure 2 shows two normal distributions. Both distributions have means of 50. The blue distribution has a standard deviation of 5; the red distribution has a standard deviation of 10. For the blue distribution, 68% of the distribution is between 45 and 55; for the red distribution, 68% is between 40 and 60.
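The figures in this section can be checked with Python's standard library. This sketch takes the square roots of the two quiz variances reported earlier (1.5 and 6.7) and uses `statistics.NormalDist` to compute the proportion of a normal distribution with mean 50 and standard deviation 10 that falls within one and two standard deviations of the mean. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Standard deviation = square root of the variance.
sd_quiz1 = math.sqrt(1.5)
sd_quiz2 = math.sqrt(6.7)
print(round(sd_quiz1, 3), round(sd_quiz2, 3))  # 1.225 2.588

# Normal distribution with mean 50 and standard deviation 10.
dist = NormalDist(mu=50, sigma=10)

# Proportion of the distribution between two scores is the
# difference of the cumulative probabilities at those scores.
within_one = dist.cdf(60) - dist.cdf(40)   # 50 - 10 to 50 + 10
within_two = dist.cdf(70) - dist.cdf(30)   # 50 - 20 to 50 + 20
print(round(within_one, 4), round(within_two, 4))  # 0.6827 0.9545
```

The exact proportions are slightly above 68% and 95%, which is why the text describes them as approximate.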

Figure 2: Normal distributions with standard deviations of 5 (blue line) and 10 (red line).
