Descriptive Statistics: Measuring the Spread of the Data

Module by: Susan Dean, Barbara Illowsky, Ph.D.. E-mail the authors

Summary: This module describes a number of statistical measures used to describe data, such as percentiles, spread, and skewness.

Note: You are viewing an old version of this document. The latest version is available here.

The most common measure of spread is the standard deviation. The standard deviation is a number that measures how far data values are from their mean. For example, if the mean of a set of data containing 7 is 5 and the standard deviation is 2, then the value 7 is one (1) standard deviation from its mean because 5 + (1)(2) = 7.

The number line may help you understand standard deviation. If we were to put 5 and 7 on a number line, 7 is to the right of 5. We say, then, that 7 is one standard deviation to the right of 5. If 1 were also part of the data set, then 1 is two standard deviations to the left of 5 because 5 + (-2)(2) = 1.

1=5+(-2)(2) ; 7=5+(1)(2)

A number line labeled from 0 to 7.

Formula: value = $\bar{x}$ + (#ofSTDEVs)($s$)

Generally, a value = mean + (#ofSTDEVs)(standard deviation), where #ofSTDEVs = the number of standard deviations.
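The relationship can be sketched in a few lines of Python. The function name `value_at` is a hypothetical helper chosen for illustration; the numbers are the ones from the number-line example (mean 5, standard deviation 2).

```python
def value_at(mean, stdev, num_stdevs):
    """Return the value that lies num_stdevs standard deviations from the mean.

    A negative num_stdevs means the value is to the left of (below) the mean.
    """
    return mean + num_stdevs * stdev


print(value_at(5, 2, 1))   # 7: one standard deviation to the right of 5
print(value_at(5, 2, -2))  # 1: two standard deviations to the left of 5
```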

If $x$ is a value and $\bar{x}$ is the sample mean, then $x - \bar{x}$ is called a deviation. In a data set, there are as many deviations as there are data values. Deviations are used to calculate the sample standard deviation.

Calculation of the Sample Standard Deviation

To calculate the standard deviation, calculate the variance first. The variance is the average of the squares of the deviations. The standard deviation is the square root of the variance. You can think of the standard deviation as a special average of the deviations (the $x - \bar{x}$ values). The lower case letter $s$ represents the sample standard deviation and the Greek letter $\sigma$ (sigma) represents the population standard deviation. We use $s^2$ to represent the sample variance and $\sigma^2$ to represent the population variance. If the sample has the same characteristics as the population, then $s$ should be a good estimate of $\sigma$.

Sampling Variability of a Statistic

The statistic of a sampling distribution was discussed in Descriptive Statistics: Measuring the Center of the Data. How much the statistic varies from one sample to another is known as the sampling variability of a statistic. You typically measure the sampling variability of a statistic by its standard error. The standard error of the mean is an example of a standard error. It is a special standard deviation and is known as the standard deviation of the sampling distribution of the mean. You will cover the standard error of the mean later, in The Central Limit Theorem. The notation for the standard error of the mean is $\sigma / \sqrt{n}$, where $\sigma$ is the standard deviation of the population and $n$ is the size of the sample.
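As a small illustration of the notation, the standard error of the mean can be computed directly from the population standard deviation and the sample size. The function name and the input values below (a population standard deviation of 10 and a sample of 25) are made up for this sketch and do not come from the module.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the mean: population standard deviation over sqrt(n)."""
    return sigma / math.sqrt(n)


# Hypothetical inputs: sigma = 10, sample size n = 25.
print(standard_error_of_mean(10, 25))  # 2.0
```

Note that the standard error shrinks as the sample size grows, because $n$ appears in the denominator.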

Note:

In practice, use either a calculator or computer software to calculate the standard deviation. However, please study the following step-by-step example.

Example 1

In a fifth grade class, the teacher was interested in the average age and the standard deviation of the ages of her students. What follows are the ages of her students to the nearest half year:

9 ; 9.5 ; 9.5 ; 10 ; 10 ; 10 ; 10 ; 10.5 ; 10.5 ; 10.5 ; 10.5 ; 11 ; 11 ; 11 ; 11 ; 11 ; 11 ; 11.5 ; 11.5 ; 11.5

$\bar{x} = \frac{9 + 9.5 \times 2 + 10 \times 4 + 10.5 \times 4 + 11 \times 6 + 11.5 \times 3}{20} = 10.525$ (1)

The average age is 10.53 years, rounded to 2 places.

The variance may be calculated by using a table. Then the standard deviation is calculated by taking the square root of the variance. We will explain the parts of the table after calculating $s$.

Table 1
| Data | Freq. | Deviations | Deviations² | (Freq.)(Deviations²) |
|------|-------|------------|-------------|----------------------|
| $x$ | $f$ | $(x - \bar{x})$ | $(x - \bar{x})^2$ | $(f)(x - \bar{x})^2$ |
| 9 | 1 | 9 - 10.525 = -1.525 | (-1.525)² = 2.325625 | 1 × 2.325625 = 2.325625 |
| 9.5 | 2 | 9.5 - 10.525 = -1.025 | (-1.025)² = 1.050625 | 2 × 1.050625 = 2.101250 |
| 10 | 4 | 10 - 10.525 = -0.525 | (-0.525)² = 0.275625 | 4 × 0.275625 = 1.1025 |
| 10.5 | 4 | 10.5 - 10.525 = -0.025 | (-0.025)² = 0.000625 | 4 × 0.000625 = 0.0025 |
| 11 | 6 | 11 - 10.525 = 0.475 | (0.475)² = 0.225625 | 6 × 0.225625 = 1.35375 |
| 11.5 | 3 | 11.5 - 10.525 = 0.975 | (0.975)² = 0.950625 | 3 × 0.950625 = 2.851875 |

The sample variance, $s^2$, is equal to the sum of the last column (9.7375) divided by the total number of data values minus one (20 - 1):

$s^2 = \frac{9.7375}{20 - 1} = 0.5125$

The sample standard deviation, $s$, is equal to the square root of the sample variance:

$s = \sqrt{0.5125} = 0.715891$. Rounded to two decimal places, $s = 0.72$.

Typically, you do the calculation for the standard deviation on your calculator or computer. The intermediate results are not rounded. This is done for accuracy.
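The table calculation above can be reproduced with a short script. This is only a sketch of the same arithmetic, with the frequency table stored as a dictionary; the variable names are illustrative.

```python
import math

# Frequency table from Example 1: age -> number of students with that age.
ages = {9: 1, 9.5: 2, 10: 4, 10.5: 4, 11: 6, 11.5: 3}

n = sum(ages.values())                            # total number of data values
mean = sum(x * f for x, f in ages.items()) / n    # sample mean

# Last column of the table: frequency times squared deviation, summed.
sum_sq = sum(f * (x - mean) ** 2 for x, f in ages.items())

variance = sum_sq / (n - 1)   # sample variance divides by n - 1
s = math.sqrt(variance)       # sample standard deviation

print(n, round(mean, 3), round(sum_sq, 4), round(variance, 4), round(s, 2))
# 20 10.525 9.7375 0.5125 0.72
```

As in the text, the intermediate results are kept at full precision and only the printed values are rounded.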

Problem 1

Verify the mean and standard deviation calculated above on your calculator or computer. Find the median and mode.

Solution

  • Median = 10.5
  • Mode = 11
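One way to do this check on a computer is with the `statistics` module in Python's standard library. The sketch below rebuilds the 20 ages as a list; the choice of Python is an assumption, since the module itself only refers to "a calculator or computer".

```python
import statistics

# The 20 ages from Example 1, written out from their frequencies.
ages = [9] + [9.5] * 2 + [10] * 4 + [10.5] * 4 + [11] * 6 + [11.5] * 3

mean = statistics.mean(ages)
s = statistics.stdev(ages)        # sample standard deviation (divides by n - 1)
median = statistics.median(ages)
mode = statistics.mode(ages)

print(round(mean, 3), round(s, 2), median, mode)
# 10.525 0.72 10.5 11
```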

Problem 2

Find the value that is 1 standard deviation above the mean. Find $(\bar{x} + 1s)$.

Solution

$(\bar{x} + 1s) = 10.53 + (1)(0.72) = 11.25$

Problem 3

Find the value that is two standard deviations below the mean. Find $(\bar{x} - 2s)$.

Solution

$(\bar{x} - 2s) = 10.53 - (2)(0.72) = 9.09$

Problem 4

Find the values that are 1.5 standard deviations from (below and above) the mean.

Solution

  • $(\bar{x} - 1.5s) = 10.53 - (1.5)(0.72) = 9.45$
  • $(\bar{x} + 1.5s) = 10.53 + (1.5)(0.72) = 11.61$

Explanation of the table: The deviations show how spread out the data are about the mean. The value 11.5 is farther from the mean than 11. The deviations 0.975 and 0.475 indicate that. If you add the deviations, the sum is always zero. (For this example, there are 20 deviations.) So you cannot simply add the deviations to get the spread of the data. By squaring the deviations, you make them positive numbers. The variance, then, is the average squared deviation. It is small if the values are close to the mean and large if the values are far from the mean.
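The claim that the deviations always sum to zero can be checked for the ages in Example 1. The sketch below uses exact fractions rather than floating-point numbers so that the sum comes out as exactly zero instead of a tiny rounding error; that choice is an implementation detail of this illustration, not something the module prescribes.

```python
from fractions import Fraction

# The 20 ages from Example 1; halves are exactly representable as fractions.
ages = [9] + [9.5] * 2 + [10] * 4 + [10.5] * 4 + [11] * 6 + [11.5] * 3
exact = [Fraction(a) for a in ages]

mean = sum(exact) / len(exact)
deviations = [x - mean for x in exact]

total_deviation = sum(deviations)                # positives and negatives cancel
total_squared = sum(d ** 2 for d in deviations)  # squaring removes the signs

print(len(deviations))        # 20
print(total_deviation)        # 0
print(float(total_squared))   # 9.7375
```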

The variance is a squared measure and does not have the same units as the data. Taking the square root solves the problem. The standard deviation measures the spread in the same units as the data.

For the sample variance, we divide by the total number of data values minus one ($n - 1$). Why not divide by $n$? The answer has to do with the population variance. The sample variance is an estimate of the population variance. By dividing by $(n - 1)$, we get a better estimate of the population variance.
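To see the effect of the two divisors side by side, the sketch below applies both to the ages from Example 1. It assumes Python's standard `statistics` module, where `variance` divides by $n - 1$ and `pvariance` divides by $n$.

```python
import statistics

# The 20 ages from Example 1.
ages = [9] + [9.5] * 2 + [10] * 4 + [10.5] * 4 + [11] * 6 + [11.5] * 3

divide_by_n_minus_1 = statistics.variance(ages)   # sample variance: 9.7375 / 19
divide_by_n = statistics.pvariance(ages)          # population formula: 9.7375 / 20

print(round(divide_by_n_minus_1, 6))  # 0.5125
print(round(divide_by_n, 6))          # 0.486875
```

Dividing by the smaller number $n - 1$ always gives a slightly larger result than dividing by $n$, and the gap narrows as the sample size increases.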

Your concentration should be on what the standard deviation does, not on the arithmetic. The standard deviation is a number which measures how far the data are spread from the mean. Let a calculator or computer do the arithmetic.

The sample standard deviation, $s$, is either zero or larger than zero. When $s = 0$, there is no spread. When $s$ is a lot larger than zero, the data values are very spread out about the mean. Outliers can make $s$ very large.

The standard deviation, when first presented, can seem unclear. By graphing your data, you can get a better "feel" for the deviations and the standard deviation. You will find that in symmetrical distributions, the standard deviation can be very helpful but in skewed distributions, the standard deviation may not be much help. The reason is that the two sides of a skewed distribution have different spreads. In a skewed distribution, it is better to look at the first quartile, the median, the third quartile, the smallest value, and the largest value. Because numbers can be confusing, always graph your data.

Note:

The formula for the standard deviation is at the end of the chapter.

Example 2

Problem 1

Use the following data (first exam scores) from Susan Dean's spring pre-calculus class:

33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90; 92; 94; 94; 94; 94; 96; 100

  • a. Create a chart containing the data, frequencies, relative frequencies, and cumulative relative frequencies to three decimal places.
  • b. Calculate the following to one decimal place using a TI-83+ or TI-84 calculator:
    • i. The sample mean
    • ii. The sample standard deviation
    • iii. The median
    • iv. The first quartile
    • v. The third quartile
    • vi. IQR
  • c. Construct a box plot and a histogram on the same set of axes. Make comments about the box plot, the histogram, and the chart.

Solution

  • a.
    Table 2
    Data Frequency Relative Frequency Cumulative Relative Frequency
    33 1 0.032 0.032
    42 1 0.032 0.064
    49 2 0.065 0.129
    53 1 0.032 0.161
    55 2 0.065 0.226
    61 1 0.032 0.258
    63 1 0.032 0.290
    67 1 0.032 0.322
    68 2 0.065 0.387
    69 2 0.065 0.452
    72 1 0.032 0.484
    73 1 0.032 0.516
    74 1 0.032 0.548
    78 1 0.032 0.580
    80 1 0.032 0.612
    83 1 0.032 0.644
    88 3 0.097 0.741
    90 1 0.032 0.773
    92 1 0.032 0.805
    94 4 0.129 0.934
    96 1 0.032 0.966
    100 1 0.032 0.998 (Why isn't this value 1?)
  • b.
    • i. The sample mean = 73.5
    • ii. The sample standard deviation = 17.9
    • iii. The median = 73
    • iv. The first quartile = 61
    • v. The third quartile = 90
    • vi. IQR = 90 - 61 = 29
  • c. The x-axis goes from 32.5 to 100.5; y-axis goes from -2.4 to 15 for the histogram; number of intervals is 5 for the histogram so the width of an interval is (100.5 - 32.5) divided by 5 which is equal to 13.6. Endpoints of the intervals: starting point is 32.5, 32.5+13.6 = 46.1, 46.1+13.6 = 59.7, 59.7+13.6 = 73.3, 73.3+13.6 = 86.9, 86.9+13.6 = 100.5 = the ending value; No data values fall on an interval boundary.
    Figure 1
    A hybrid image displaying both a histogram and box plot described in detail in the answer solution above.

The long left whisker in the box plot is reflected in the left side of the histogram. The spread of the exam scores in the lower 50% is greater (73 - 33 = 40) than the spread in the upper 50% (100 - 73 = 27). The histogram, box plot, and chart all reflect this. There are a substantial number of A and B grades (80s, 90s, and 100). The histogram clearly shows this. The box plot shows us that the middle 50% of the exam scores (IQR = 29) are Ds, Cs, and Bs. The box plot also shows us that the lower 25% of the exam scores are Ds and Fs.
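The summary statistics in part b can also be checked in software instead of on a TI-83+ or TI-84. The sketch below uses Python's `statistics` module. Quartile conventions differ between tools; the `method="exclusive"` option of `statistics.quantiles` is used here because it reproduces the quartiles given in the solution for this data set, which should not be assumed to hold for every data set.

```python
import statistics

# First exam scores from Example 2 (31 values).
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

mean = statistics.mean(scores)
s = statistics.stdev(scores)   # sample standard deviation

# Cut points that split the sorted data into four parts: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

print(round(mean, 1), round(s, 1))   # 73.5 17.9
print(median, q1, q3, iqr)           # 73.0 61.0 90.0 29.0
```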

Example 3

Problem 1

Two students, John and Ali, from different high schools, wanted to find out who had the higher G.P.A. when compared to his school. Which student had the higher G.P.A. when compared to his school?

Table 3
Student GPA School Mean GPA School Standard Deviation
John 2.85 3.0 0.7
Ali 77 80 10

Solution

Use the formula value = mean + (#ofSTDEVs)(stdev) and solve for #ofSTDEVs for each student (stdev = standard deviation):

$\text{\#ofSTDEVs} = \frac{\text{value} - \text{mean}}{\text{stdev}}$

For John, $\text{\#ofSTDEVs} = \frac{2.85 - 3.0}{0.7} = -0.21$

For Ali, $\text{\#ofSTDEVs} = \frac{77 - 80}{10} = -0.3$

John has the better G.P.A. when compared to his school because his G.P.A. is 0.21 standard deviations below his mean while Ali's G.P.A. is 0.3 standard deviations below his mean.
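The comparison can be written as a short script. The function name `num_stdevs` is a hypothetical helper for this sketch; the inputs are the values from Table 3.

```python
def num_stdevs(value, mean, stdev):
    """Number of standard deviations that value lies from the mean.

    Negative results are below the mean, positive results are above it.
    """
    return (value - mean) / stdev


john = num_stdevs(2.85, 3.0, 0.7)
ali = num_stdevs(77, 80, 10)

print(round(john, 2), round(ali, 2))   # -0.21 -0.3
print("John" if john > ali else "Ali")  # John
```

Because both results are expressed in standard deviations, the two students can be compared even though their schools use different G.P.A. scales.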

Glossary

Standard Deviation:
A number that is equal to the square root of the variance and measures how far data values are from their mean. Notation: $s$ for sample standard deviation and $\sigma$ for population standard deviation.
Variance:
Mean of the squared deviations from the mean. Square of the standard deviation. For a set of data, a deviation can be represented as $x - \bar{x}$, where $x$ is a value of the data and $\bar{x}$ is the sample mean. The sample variance is equal to the sum of the squares of the deviations divided by the difference of the sample size and 1.
