OpenStax CNX

Collection: Collaborative Statistics: Custom Version modified by V Moyle

Using the Central Limit Theorem (modified R. Bloom)

Module by: Roberta Bloom.

Based on: Central Limit Theorem: Using the Central Limit Theorem by Barbara Illowsky, Ph.D., Susan Dean

Summary: This module has examples illustrating how the Central Limit Theorem is used. This revision of the original module in the collection Collaborative Statistics by S. Dean and Dr. B. Illowsky includes examples only for the CLT for means and omits their material for the CLT for sums. The second example in this section has been changed to correct errors in earlier versions of this module.

It is important to understand when to use the CLT. Use the CLT for means or averages when you are asked to find the probability for a sample average or mean, or when working with percentiles for sample averages. (If you are being asked to find the probability or percentile of a sum or total, use the CLT for sums.)

Note:

If you are being asked to find the probability of an individual value, do not use the CLT. Use the distribution of its random variable.

Law of Large Numbers

The Law of Large Numbers says that if you take samples of larger and larger size from any population, then the sample mean $\bar{x}$ tends to get closer and closer to the population mean $\mu$. From the Central Limit Theorem, we know that as $n$ gets larger and larger, the sample averages approximately follow a normal distribution. The larger $n$ gets, the smaller the standard deviation of the sample averages gets. (Remember that the standard deviation for $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$.) This means that, for large $n$, the sample mean $\bar{x}$ is very likely to be close to the population mean $\mu$. We can say that $\mu$ is the value that the sample averages approach as $n$ gets larger. The Central Limit Theorem illustrates the Law of Large Numbers.
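To see the shrinking spread numerically, here is a minimal Python sketch (standard library only). It repeatedly samples from a uniform population on [1, 5], the same population used in Example 1, and compares the observed standard deviation of the sample means with $\sigma/\sqrt{n}$. The seed, the sample sizes, the 2,000 repetitions per size, and the helper name `sample_means` are arbitrary illustration choices, not part of the original text.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed so the run is reproducible

A, B = 1, 5                           # population: continuous uniform U(1, 5)
MU = (A + B) / 2                      # population mean, 3
SIGMA = math.sqrt((B - A) ** 2 / 12)  # population standard deviation, about 1.1547

def sample_means(n, reps=2000):
    """Hypothetical helper: return `reps` sample means, each from n uniform draws."""
    return [statistics.fmean(random.uniform(A, B) for _ in range(n))
            for _ in range(reps)]

results = {}
for n in (5, 20, 80, 320):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:4d}  mean of x-bar={results[n][0]:.3f}  "
          f"sd of x-bar={results[n][1]:.4f}  sigma/sqrt(n)={SIGMA / math.sqrt(n):.4f}")
```

As $n$ grows, the printed standard deviation of the sample means should track $\sigma/\sqrt{n}$ and shrink, while the average of the sample means stays near $\mu = 3$. The exact digits depend on the seed.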

Example 1

A study of stress is conducted among the students on a college campus. The stress scores follow a continuous uniform distribution with the lowest stress score equal to 1 and the highest equal to 5. Using a sample of 75 students, find:

  • a. The probability that the average stress score for the 75 students is less than 2.
  • b. The 90th percentile for the average stress score for the 75 students.

Let $X$ = the stress score for one individual student.

The individual stress scores follow a continuous uniform distribution, $X \sim U(1, 5)$, where $a = 1$ and $b = 5$ (see the chapter on Continuous Random Variables).

$\mu_X = \frac{a + b}{2} = \frac{1 + 5}{2} = 3$

$\sigma_X = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(5 - 1)^2}{12}} \approx 1.15$
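As a quick check of the arithmetic, here is a minimal Python sketch of the two uniform-distribution formulas above. The variable names are illustrative.

```python
import math

a, b = 1, 5  # endpoints of the uniform distribution X ~ U(1, 5)

mu_x = (a + b) / 2                      # mean of a continuous uniform distribution
sigma_x = math.sqrt((b - a) ** 2 / 12)  # standard deviation of a continuous uniform distribution

print(mu_x)               # 3.0
print(round(sigma_x, 2))  # 1.15
```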

Problems a and b ask you to find a probability or a percentile for an average or mean. The sample size, $n$, is equal to 75.

Let $\bar{X}$ = the average stress score for the 75 students.

For the average stress score, use the CLT, which tells us that $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.

$\bar{X} \sim N\left(3, \frac{1.15}{\sqrt{75}}\right)$, where $n = 75$.
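Here is a small Python sketch of this sampling distribution. It uses the standard library's `statistics.NormalDist` (Python 3.8+), which takes a mean and a standard deviation, so it matches the $N(\text{mean}, \text{standard deviation})$ convention used here. The value 1.15 is the rounded $\sigma_X$ from above.

```python
import math
from statistics import NormalDist

mu_x = 3        # population mean of the stress scores
sigma_x = 1.15  # population standard deviation (rounded, as in the text)
n = 75          # sample size

standard_error = sigma_x / math.sqrt(n)       # standard deviation of X-bar
xbar_dist = NormalDist(mu_x, standard_error)  # approximate distribution of X-bar by the CLT

print(round(standard_error, 4))  # 0.1328
```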

Problem 1

Find $P(\bar{X} < 2)$. Draw the graph.

Solution

$P(\bar{X} < 2) \approx 0$

The probability that the average stress score is less than 2 is about 0.

Normal distribution curve for the sample average, with 2 and 3 marked on the x-axis. A vertical line extends from 2 up to the curve, and the shaded probability area runs from the left end of the curve to 2.

normalcdf$\left(1, 2, 3, \frac{1.15}{\sqrt{75}}\right) \approx 0$

Reminder:
The smallest stress score is 1. Therefore, the smallest average for 75 stress scores is 1, which is why 1 is used as the lower bound in normalcdf.
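The same probability can be sketched in Python with `statistics.NormalDist`, as an alternative to the calculator. The first computation mirrors normalcdf(1, 2, 3, 1.15/√75) by taking a difference of CDF values. The second drops the lower bound of 1 and takes the whole left tail. Both are effectively 0, because 2 lies about 7.5 standard errors below the mean of 3.

```python
import math
from statistics import NormalDist

# Approximate sampling distribution of the average of 75 stress scores (CLT)
xbar_dist = NormalDist(3, 1.15 / math.sqrt(75))

# Analogue of normalcdf(1, 2, 3, 1.15/sqrt(75)): area between 1 and 2
p_between_1_and_2 = xbar_dist.cdf(2) - xbar_dist.cdf(1)

# Area to the left of 2 with no lower bound
p_below_2 = xbar_dist.cdf(2)

# How many standard errors 2 lies below the mean
z = (2 - xbar_dist.mean) / xbar_dist.stdev

print(f"{p_between_1_and_2:.1e}  {p_below_2:.1e}  z = {z:.2f}")
```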

Problem 2

Find the 90th percentile for the sample average of 75 stress scores. Draw a graph.

Solution

Let $k$ = the 90th percentile. Find $k$ where $P(\bar{X} < k) = 0.90$.

$k = 3.17$, using invNorm$\left(0.90, 3, \frac{1.15}{\sqrt{75}}\right) = 3.17$.

Normal distribution curve with a vertical line at $k$ on the x-axis. The shaded area under the curve to the left of $k$ equals 0.90, so $k$ is the 90th percentile.

The 90th percentile for the sample average of 75 scores is about 3.17. This means that 90% of all the averages of samples of 75 stress scores are at most 3.17, and 10% of the sample averages are at least 3.17.
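The calculator's invNorm corresponds to the inverse CDF of the normal distribution, which Python's `statistics.NormalDist` exposes as `inv_cdf`. Here is a minimal sketch that reproduces $k$ and checks that the area to its left is 0.90.

```python
import math
from statistics import NormalDist

xbar_dist = NormalDist(3, 1.15 / math.sqrt(75))  # CLT approximation for X-bar

k = xbar_dist.inv_cdf(0.90)  # value with 90% of sample averages below it
print(round(k, 2))  # 3.17

# Sanity check: the area to the left of k should be 0.90
print(round(xbar_dist.cdf(k), 4))  # 0.9
```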

Example 2

Suppose that a market research analyst for a cell phone company conducts a study of its customers who exceed the time allowance included in their basic cell phone contract. The analyst finds that, for those people who exceed the time included in their basic contract, the excess time used follows an exponential distribution with a mean of 22 minutes.

Consider a random sample of 80 customers who exceed the time allowance included in their basic cell phone contract.

Let $X$ = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted time allowance.

$X \sim \text{Exp}\left(\frac{1}{22}\right)$. From Chapter 5, we know that $\mu = 22$ and $\sigma = 22$: for an exponential distribution with parameter $m$, the mean and the standard deviation both equal $\frac{1}{m}$.

Let $\bar{X}$ = the AVERAGE excess time used by a sample of $n = 80$ customers who exceed their contracted time allowance.

$\bar{X} \sim N\left(22, \frac{22}{\sqrt{80}}\right)$ by the CLT for Sample Means or Averages.
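The set-up above can be sketched in Python as plain arithmetic. It starts from the exponential parameter $m = 1/22$ in the problem statement and uses `statistics.NormalDist` for the CLT approximation. The variable names are illustrative.

```python
import math
from statistics import NormalDist

m = 1 / 22      # parameter of the exponential distribution X ~ Exp(1/22)
mu = 1 / m      # mean of X: 22 minutes
sigma = 1 / m   # standard deviation of X: also 22 minutes
n = 80          # sample size

standard_error = sigma / math.sqrt(n)       # standard deviation of X-bar
xbar_dist = NormalDist(mu, standard_error)  # CLT approximation for X-bar

print(round(mu, 6), round(sigma, 6), round(standard_error, 2))  # 22.0 22.0 2.46
```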

Problem 1

Using the CLT to find Probability:
  • a. Find the probability that the average excess time used by the 80 customers in the sample is longer than 20 minutes. This is asking us to find $P(\bar{X} > 20)$. Draw the graph.
  • b. Suppose that one customer who exceeds the time limit for his cell phone contract is randomly selected. Find the probability that this individual customer's excess time is longer than 20 minutes. This is asking us to find $P(X > 20)$.
  • c. Explain why the probabilities in (a) and (b) are different.
Solution
Part a.

Find: $P(\bar{X} > 20)$

$P(\bar{X} > 20) = 0.7919$, using normalcdf$\left(20, \text{1E99}, 22, \frac{22}{\sqrt{80}}\right)$.

The probability is 0.7919 that the average excess time used is more than 20 minutes, for a sample of 80 customers who exceed their contracted time allowance.

Normal distribution curve with 20 and 22 marked on the x-axis. A vertical line extends from 20 up to the curve, and the shaded probability area runs from 20 to the right end of the curve.

Reminder:
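1E99 $= 10^{99}$ and -1E99 $= -10^{99}$. Press the EE key for E, or just use 10^99 instead of 1E99. Because no sample average comes anywhere near $10^{99}$, using it as the upper bound effectively means there is no upper limit.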
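In Python, the calculator step normalcdf(20, 1E99, 22, 22/√80) can be sketched as the complement of the normal CDF at 20, because the area above 20 is 1 minus the area below 20. This sketch uses `statistics.NormalDist` in place of the calculator.

```python
import math
from statistics import NormalDist

xbar_dist = NormalDist(22, 22 / math.sqrt(80))  # CLT approximation for the average of 80 customers

p_above_20 = 1 - xbar_dist.cdf(20)  # P(X-bar > 20)
print(round(p_above_20, 4))  # 0.7919
```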
Part b.

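Find $P(X > 20)$. Remember to use the exponential distribution for an individual: $X \sim \text{Exp}(1/22)$.

$P(X > 20) = e^{-(1/22)(20)} = e^{-0.04545 \cdot 20} \approx 0.4029$, because the cumulative distribution function is $P(X \le x) = 1 - e^{-mx}$, so $P(X > x) = e^{-mx}$.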
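Here is a short Python check of this arithmetic. It computes the right tail $e^{-mx}$ directly and also as one minus the CDF, to show that the two forms agree.

```python
import math

m = 1 / 22  # parameter for an individual customer's excess time, X ~ Exp(1/22)
x = 20      # minutes

p_tail = math.exp(-m * x)                  # P(X > 20) = e^(-m x)
p_complement = 1 - (1 - math.exp(-m * x))  # same value via 1 - P(X <= 20)

print(round(p_tail, 4))  # 0.4029
```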

Part c. Explain why the probabilities in (a) and (b) are different.
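  • $P(X > 20) = 0.4029$ but $P(\bar{X} > 20) = 0.7919$.
  • The probabilities are not equal because we use different distributions to calculate the probability for individuals and for averages. An individual excess time follows the exponential distribution, whose standard deviation is 22 minutes. The average of 80 excess times is approximately normal with a much smaller standard deviation, $\frac{22}{\sqrt{80}} \approx 2.46$ minutes, so sample averages cluster much more tightly around the mean of 22.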
  • When asked to find the probability of an individual value, use the stated distribution of its random variable; do not use the CLT. Use the CLT with the normal distribution when you are being asked to find the probability for an average.
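The difference can also be seen by simulation. The Python sketch below is an illustration, not part of the original solution. It draws exponential excess times with mean 22 using `random.expovariate`, whose argument is the rate (here 1/22). It then estimates $P(X > 20)$ from individual draws and $P(\bar{X} > 20)$ from averages of 80 draws. The seed and the repetition counts are arbitrary. Expect the individual estimate to land near 0.4029 and the average estimate near 0.7919, though not exactly on it, since the CLT value is itself an approximation.

```python
import random
import statistics

random.seed(2024)  # arbitrary seed for reproducibility
RATE = 1 / 22      # exponential rate; mean excess time is 22 minutes
N = 80             # customers per sample

# Individual customers: fraction whose excess time is over 20 minutes
individuals = [random.expovariate(RATE) for _ in range(100_000)]
p_individual = sum(t > 20 for t in individuals) / len(individuals)

# Samples of 80 customers: fraction whose AVERAGE excess time is over 20 minutes
averages = [statistics.fmean(random.expovariate(RATE) for _ in range(N))
            for _ in range(10_000)]
p_average = sum(a > 20 for a in averages) / len(averages)

print(f"estimated P(X > 20)     = {p_individual:.3f}")
print(f"estimated P(X-bar > 20) = {p_average:.3f}")
print(f"sd of individual times  = {statistics.stdev(individuals):.1f}")
print(f"sd of sample averages   = {statistics.stdev(averages):.2f}")
```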

Problem 2

Using the CLT to find Percentiles:

Find the 95th percentile for the sample average excess time for samples of 80 customers who exceed their basic contract time allowances. Draw a graph.

Solution

Let $k$ = the 95th percentile. Find $k$ where $P(\bar{X} < k) = 0.95$.

$k = 26.0$, using invNorm$\left(0.95, 22, \frac{22}{\sqrt{80}}\right) = 26.0$.

Normal distribution curve with $k$ marked on the x-axis. A vertical line extends from $k$ up to the curve, and the shaded area from the left end of the curve to $k$ equals 0.95.

The 95th percentile for the sample average excess time used is about 26.0 minutes for random samples of 80 customers who exceed their contractual allowed time.

95% of such samples would have averages under 26 minutes; only 5% of such samples would have averages above 26 minutes.
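As with Example 1, the invNorm step can be sketched with the inverse CDF from Python's `statistics.NormalDist`. This is an alternative to the calculator, not the module's own method.

```python
import math
from statistics import NormalDist

xbar_dist = NormalDist(22, 22 / math.sqrt(80))  # CLT approximation for X-bar, n = 80

k = xbar_dist.inv_cdf(0.95)  # 95th percentile of the sample average
print(round(k, 1))  # 26.0
```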

Glossary

Average:
A number that describes the central tendency of the data. There are a number of specialized averages, including the arithmetic mean, weighted mean, median, mode, and geometric mean.
Central Limit Theorem:
Given a random variable (RV) with known mean $\mu$ and known variance $\sigma^2$, we are sampling with size $n$ and we are interested in two new RVs: the sample mean, $\bar{X}$, and the sample sum, $\Sigma X$. If the size $n$ of the sample is sufficiently large, then $\bar{X}$ is approximately normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$, and $\Sigma X$ is approximately normal with mean $n\mu$ and variance $n\sigma^2$. In words, if the size $n$ of the sample is sufficiently large, then the distribution of the sample means and the distribution of the sample sums will approximate a normal distribution regardless of the shape of the population. Moreover, the mean of the sampling distribution of the sample means will equal the population mean, and the mean of the sampling distribution of the sample sums will equal $n$ times the population mean. The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$, is called the standard error of the mean.
Exponential Distribution:
A continuous random variable (RV) that appears when we are interested in the intervals of time between random events, for example, the length of time between emergency arrivals at a hospital. Notation: $X \sim \text{Exp}(m)$. The mean is $\mu = \frac{1}{m}$, the variance is $\sigma^2 = \frac{1}{m^2}$, the probability density function is $f(x) = m e^{-mx}$ for $x \ge 0$, and the cumulative distribution function is $P(X \le x) = 1 - e^{-mx}$.
Mean:
A number that measures the central tendency (average); short for arithmetic mean. By definition, the mean for a sample (usually denoted by $\bar{X}$) is $\bar{X} = \frac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}$, and the mean for a population (usually denoted by $\mu$) is $\mu = \frac{\text{Sum of all values in the population}}{\text{Number of values in the population}}$.
