# Connexions

# Measures of Variability

Module by: David Lane.

## What is Variability?

Variability refers to how "spread out" a group of scores is. To see what we mean by spread out, consider the graphs in Figure 1. These graphs represent the scores on two quizzes. The mean score for each quiz is 7.0. Despite the equality of means, you can see that the distributions are quite different. Specifically, the scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out. The differences among students were much greater on Quiz 2 than on Quiz 1.

The terms variability, spread, and dispersion are synonyms, and refer to how spread out a distribution is. Just as in the section on central tendency we discussed measures of the center of a distribution of scores, in this chapter we will discuss measures of the variability of a distribution. There are four frequently used measures of variability: the range, interquartile range, variance, and standard deviation. In the next few paragraphs, we will look at each of these four measures of variability in more detail.

## Range

The range is the simplest measure of variability to calculate, and one you have probably encountered many times in your life. The range is simply the highest score minus the lowest score. Let's take a few examples. What is the range of the following group of numbers: 10, 2, 5, 6, 7, 3, 4? Well, the highest number is 10, and the lowest number is 2, so $10 - 2 = 8$. The range is 8. Let's take another example. Here's a dataset with 10 numbers: 99, 45, 23, 67, 45, 91, 82, 78, 62, 51. What is the range? The highest number is 99 and the lowest number is 23, so $99 - 23 = 76$; the range is 76. Now consider the two quizzes shown in Figure 1. On Quiz 1, the lowest score was 5 and the highest score was 9. Therefore, the range is 4. The range on Quiz 2 was larger: the lowest score was 4 and the highest score was 10. Therefore the range is 6.
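As a minimal sketch, the range can be computed directly from its definition. The helper name `value_range` is hypothetical (chosen so it does not shadow Python's built-in `range`); the two datasets are the ones used in the examples above.

```python
def value_range(scores):
    """Return the range: the highest score minus the lowest score."""
    if not scores:
        raise ValueError("the range is undefined for an empty list")
    return max(scores) - min(scores)


first = [10, 2, 5, 6, 7, 3, 4]
second = [99, 45, 23, 67, 45, 91, 82, 78, 62, 51]

print(value_range(first))   # 10 - 2 = 8
print(value_range(second))  # 99 - 23 = 76
```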

## Interquartile Range

The interquartile range (IQR) is a range that contains the middle 50% of the scores in a distribution. It is computed as follows:

$$IQR = 75\text{th percentile} - 25\text{th percentile}$$

For Quiz 1, the 75th percentile is 8 and the 25th percentile is 6. The interquartile range is therefore 2. For Quiz 2, which has greater spread, the 75th percentile is 9, the 25th percentile is 5, and the interquartile range is 4. Recall that in the discussion of boxplots, the 75th percentile was called the upper hinge and the 25th percentile was called the lower hinge. Using this terminology, the interquartile range is referred to as the H-spread.
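Percentiles can be defined in several ways, so the sketch below assumes one common rule: the P-th percentile sits at rank $R = \frac{P}{100}(N + 1)$ in the sorted scores, interpolating between neighboring scores when $R$ is fractional. Applied to the Quiz 1 scores listed in Table 1, this rule reproduces the 25th and 75th percentiles given above. The function names `percentile` and `iqr` are hypothetical.

```python
def percentile(scores, p):
    """P-th percentile using the rank rule R = p/100 * (N + 1) with linear interpolation.

    Assumption: this is one of several percentile definitions; others can give
    slightly different values, especially for small samples.
    """
    xs = sorted(scores)
    n = len(xs)
    rank = p / 100 * (n + 1)
    if rank <= 1:
        return xs[0]
    if rank >= n:
        return xs[-1]
    lower = int(rank)      # integer part of the 1-based rank
    frac = rank - lower    # fractional part, used to interpolate
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


def iqr(scores):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return percentile(scores, 75) - percentile(scores, 25)


# Quiz 1 scores, taken from Table 1
quiz1 = [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5]

print(percentile(quiz1, 25), percentile(quiz1, 75), iqr(quiz1))  # 6.0 8.0 2.0
```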

A related measure of variability is called the semi-interquartile range. The semi-interquartile range is defined simply as the interquartile range divided by 2. If a distribution is symmetric, the median plus or minus the semi-interquartile range contains half the scores in the distribution.
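As a quick check of the definition, dividing the interquartile ranges reported above for the two quizzes by 2 gives:

$$\text{Semi-IQR}_{\text{Quiz 1}} = \frac{8 - 6}{2} = 1 \qquad \text{Semi-IQR}_{\text{Quiz 2}} = \frac{9 - 5}{2} = 2$$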

## Variance

Variability can also be defined in terms of how close the scores in the distribution are to the middle of the distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as the average squared difference of the scores from the mean. The data from Quiz 1 are shown in Table 1. The mean score is 7.0. Therefore, the column "Deviation from Mean" contains the score minus 7. The column "Squared Deviation" is simply the previous column squared.

Table 1: Calculation of Variance for Quiz 1 scores.
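
| Scores | Deviation from Mean | Squared Deviation |
|---:|---:|---:|
| 9 | 2 | 4 |
| 9 | 2 | 4 |
| 9 | 2 | 4 |
| 8 | 1 | 1 |
| 8 | 1 | 1 |
| 8 | 1 | 1 |
| 8 | 1 | 1 |
| 7 | 0 | 0 |
| 7 | 0 | 0 |
| 7 | 0 | 0 |
| 7 | 0 | 0 |
| 7 | 0 | 0 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 6 | -1 | 1 |
| 5 | -2 | 4 |
| 5 | -2 | 4 |
| **Mean: 7** | **Mean: 0** | **Mean: 1.5** |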

One thing that is important to notice is that the mean deviation from the mean is 0. This will always be the case. The mean of the squared deviations is 1.5. Therefore, the variance is 1.5. Analogous calculations with Quiz 2 show that its variance is 6.7. The formula for the variance is:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where $\sigma^2$ is the variance, $\mu$ is the mean, and $N$ is the number of scores. For Quiz 1, $\mu = 7$ and $N = 20$.
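A minimal Python sketch of this formula, applied to the Quiz 1 scores from Table 1. The function name `population_variance` is hypothetical; the standard library's `statistics.pvariance` computes the same quantity.

```python
def population_variance(scores):
    """Variance as the mean squared deviation from the mean (N in the denominator)."""
    n = len(scores)
    mu = sum(scores) / n
    deviations = [x - mu for x in scores]
    # Deviations sum to zero, so averaging them directly would always give 0;
    # squaring them first avoids that cancellation.
    return sum(d * d for d in deviations) / n


quiz1 = [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5]
print(population_variance(quiz1))  # 1.5
```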

If the variance in a sample is used to estimate the variance in a population, then the previous formula underestimates the variance and the following formula should be used:

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

where $s^2$ is the estimate of the variance and $M$ is the sample mean. Note that $M$ is the mean of a sample taken from a population with a mean of $\mu$. Since, in practice, the variance is usually computed in a sample, this formula is most often used. The simulation "estimating variance" illustrates the bias in the formula with $N$ in the denominator.
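The bias can also be checked with a small simulation in the spirit of the one mentioned above (this is not a reproduction of it). As an assumption, the population here is a standard normal distribution with a true variance of 1, and samples of size 4 are drawn repeatedly. Averaged over many samples, the formula with $N$ in the denominator comes out near $(N - 1)/N = 0.75$ of the true variance, while the $N - 1$ version comes out near 1. The function names are hypothetical; `statistics.variance` in the standard library uses the $N - 1$ denominator.

```python
import random


def variance_n(xs):
    """Mean squared deviation from the sample mean (N in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_n_minus_1(xs):
    """Estimate of the population variance (N - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate(sample_size=4, trials=20_000, seed=1):
    """Average each formula over many samples.

    Assumption: the population is standard normal, so its true variance is 1.0.
    """
    rng = random.Random(seed)
    total_n = total_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        total_n += variance_n(sample)
        total_n1 += variance_n_minus_1(sample)
    return total_n / trials, total_n1 / trials


avg_n, avg_n1 = simulate()
print(f"N in denominator:     {avg_n:.3f}")   # close to 0.75 = (4 - 1) / 4
print(f"N - 1 in denominator: {avg_n1:.3f}")  # close to 1.0
```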

Let's take a concrete example. Assume the scores 1, 2, 4, and 5 were sampled from a larger population. To estimate the variance in the population you would compute $s^2$ as follows:

$$M = \frac{1 + 2 + 4 + 5}{4} = \frac{12}{4} = 3$$

$$s^2 = \frac{(1 - 3)^2 + (2 - 3)^2 + (4 - 3)^2 + (5 - 3)^2}{4 - 1} = \frac{4 + 1 + 1 + 4}{3} = \frac{10}{3} \approx 3.333$$

There are alternate formulas that can be easier to use if you are doing your calculations with a hand calculator:

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$

and

$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$$

For this example,

$$\sum X^2 = 1^2 + 2^2 + 4^2 + 5^2 = 46$$

$$\frac{(\sum X)^2}{N} = \frac{(1 + 2 + 4 + 5)^2}{4} = \frac{144}{4} = 36$$

$$\sigma^2 = \frac{46 - 36}{4} = 2.5$$

and

$$s^2 = \frac{46 - 36}{3} \approx 3.333$$

as with the other formula.
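The two ways of computing the sum of squared deviations can be compared side by side. This is a minimal sketch with hypothetical function names, using the scores 1, 2, 4, and 5 from the example; a `sample` flag selects the $N - 1$ or $N$ denominator.

```python
def variance_deviation(xs, sample=True):
    """Definitional formula: sum of squared deviations from the mean."""
    n = len(xs)
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (n - 1 if sample else n)


def variance_computational(xs, sample=True):
    """Hand-calculator formula: (sum of X^2 - (sum of X)^2 / N) / denominator.

    sample=True divides by N - 1 (gives s^2); sample=False divides by N (gives sigma^2).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    ss = sum_x2 - sum_x ** 2 / n  # equals the sum of squared deviations
    return ss / (n - 1 if sample else n)


scores = [1, 2, 4, 5]
print(variance_computational(scores, sample=False))  # (46 - 36) / 4 = 2.5
print(variance_computational(scores))                # (46 - 36) / 3, about 3.333
print(variance_deviation(scores))                    # same value as the line above
```

The shortcut is convenient by hand, but in floating-point code the deviation form is usually preferred: subtracting two large, nearly equal sums can lose precision when the scores are large relative to their spread.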

## Standard Deviation

The standard deviation is simply the square root of the variance. This makes the standard deviations of the two quiz distributions 1.225 and 2.588. The standard deviation is an especially useful measure of variability when the distribution is normal or approximately normal (see Probability) because the proportion of the distribution within a given number of standard deviations from the mean can be calculated. For example, approximately 68% of the distribution is within one standard deviation of the mean and approximately 95% of the distribution is within two standard deviations of the mean. Therefore, if you had a normal distribution with a mean of 50 and a standard deviation of 10, then about 68% of the distribution would be between $50 - 10 = 40$ and $50 + 10 = 60$. Similarly, about 95% of the distribution would be between $50 - 2 \times 10 = 30$ and $50 + 2 \times 10 = 70$. The symbol for the population standard deviation is $\sigma$; the symbol for an estimate computed in a sample is $s$. Figure 2 shows two normal distributions. Both distributions have means of 50. The blue distribution has a standard deviation of 5; the red distribution has a standard deviation of 10. For the blue distribution, 68% of the distribution is between 45 and 55; for the red distribution, 68% is between 40 and 60.
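A short sketch that checks these numbers, assuming Python 3.8 or later for `statistics.NormalDist`. The helper `standard_deviation` is hypothetical; the variances 1.5 and 6.7 are the ones reported earlier for the two quizzes, and the normal distribution uses the mean of 50 and standard deviation of 10 from the example.

```python
import math
from statistics import NormalDist


def standard_deviation(variance):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(variance)


# Variances reported in the text for Quiz 1 and Quiz 2
print(round(standard_deviation(1.5), 3))  # 1.225
print(round(standard_deviation(6.7), 3))  # 2.588

# Proportion of a normal distribution (mean 50, SD 10) within 1 and 2 SDs of the mean
dist = NormalDist(mu=50, sigma=10)
within_1 = dist.cdf(60) - dist.cdf(40)
within_2 = dist.cdf(70) - dist.cdf(30)
print(round(within_1, 4), round(within_2, 4))  # 0.6827 0.9545
```

The exact proportions (about 68.27% and 95.45%) are what the rounded figures of 68% and 95% in the text summarize.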
