Law of Large Numbers
The Law of Large Numbers says that if you take samples of larger and larger size
from any population, then the mean $\bar{x}$ of the sample gets closer and closer to $\mu$.
From the Central Limit Theorem, we know that as $n$ gets larger and larger, the
sample averages follow a normal distribution. The larger $n$ gets, the smaller the
standard deviation gets. (Remember that the standard deviation for $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$.)
This means that the sample mean $\bar{x}$ must be close to the population mean $\mu$. We
can say that $\mu$ is the value that the sample averages approach as $n$ gets larger. The
Central Limit Theorem illustrates the Law of Large Numbers.
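A short simulation can make this concrete. The sketch below is only an illustration: the uniform population on [1, 5] (so $\mu = 3$), the seed, and the helper name `sample_mean_uniform` are assumptions chosen for the demonstration, not part of the statement of the law.

```python
import random
import statistics

def sample_mean_uniform(n, low=1.0, high=5.0, rng=None):
    """Mean of n independent draws from a uniform distribution on [low, high]."""
    rng = rng or random.Random()
    return statistics.fmean(rng.uniform(low, high) for _ in range(n))

# Fixed seed (an arbitrary choice) so that repeated runs give the same numbers.
rng = random.Random(12345)

# As n grows, the sample mean should settle near the population mean of 3.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(sample_mean_uniform(n, rng=rng), 4))
```

The printed means will wander for small $n$ and should cluster tightly around 3 for the largest samples, which is what the shrinking standard deviation $\frac{\sigma}{\sqrt{n}}$ predicts.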
Central Limit Theorem for the Mean (Average) and Sum Examples
A study involving stress is done on a college campus among the
students. The stress scores follow a uniform distribution with the lowest stress
score equal to 1 and the highest equal to 5. Using a sample of 75 students, find:
- 1. The probability that the average stress score for the 75 students is less than 2.
- 2. The 90th percentile for the average stress score for the 75 students.
- 3. The probability that the total of the 75 stress scores is less than 200.
- 4. The 90th percentile for the total stress score for the 75 students.
Let $X$ = one stress score.
Problems 1 and 2 ask you to find a probability or a percentile for an average or mean.
Problems 3 and 4 ask you to find a probability or a percentile for a total or sum.
The sample size, $n$, is equal to 75.
Since the individual stress scores follow a uniform
distribution, $X \sim U(1, 5)$ where $a = 1$ and $b = 5$ (see Continuous Random Variables for the uniform distribution).
$\mu_X = \frac{a + b}{2} = \frac{1 + 5}{2} = 3$
$\sigma_X = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(5 - 1)^2}{12}} = 1.15$
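The two formulas for the uniform distribution can be checked numerically. This is a minimal sketch in Python; the function name `uniform_mean_sd` is a hypothetical helper introduced here for illustration.

```python
import math

def uniform_mean_sd(a, b):
    """Return (mean, standard deviation) of a continuous uniform distribution on [a, b]."""
    mean = (a + b) / 2
    sd = math.sqrt((b - a) ** 2 / 12)
    return mean, sd

mu_x, sigma_x = uniform_mean_sd(1, 5)
print(mu_x, round(sigma_x, 2))  # the text reports 3 and 1.15
```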
For problems 1 and 2, let $\bar{X}$ = the average stress score for the 75
students. Then $\bar{X} \sim N\left(3, \frac{1.15}{\sqrt{75}}\right)$ where $n = 75$.
Find $P(\bar{X} < 2)$. Draw the graph.
$P(\bar{X} < 2) = 0$
The probability that the average stress score is less than 2 is about 0.
$\text{normalcdf}\left(1, 2, 3, \frac{1.15}{\sqrt{75}}\right) = 0$
The smallest stress score is 1. Therefore, the smallest
average for 75 stress scores is 1.
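If you do not have a TI calculator at hand, the same area can be sketched in Python. The example below assumes the standard library's `statistics.NormalDist` as a stand-in for normalcdf; it is not the method used in the text, only a cross-check under that assumption.

```python
import math
from statistics import NormalDist

n = 75
mu_x, sigma_x = 3, 1.15  # values from the text (sigma rounded to 1.15)

# Sampling distribution of the mean: N(mu, sigma / sqrt(n)).
x_bar = NormalDist(mu=mu_x, sigma=sigma_x / math.sqrt(n))

# Counterpart of normalcdf(1, 2, 3, 1.15/sqrt(75)): area between 1 and 2.
p_below_2 = x_bar.cdf(2) - x_bar.cdf(1)
print(f"{p_below_2:.4f}")
```

The value 2 lies more than seven standard deviations below the mean of 3, which is why the area is essentially zero.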
Find the 90th percentile for the average of 75 stress scores. Draw a graph.
Let $k$ = the 90th percentile.
Find $k$ where $P(\bar{X} < k) = 0.90$.
$k = 3.2$
The 90th percentile for the average of 75 scores is about 3.2. This means that
90% of all the averages of 75 stress scores are at most 3.2 and 10% are at least
3.2.
$\text{invNorm}\left(0.90, 3, \frac{1.15}{\sqrt{75}}\right) = 3.2$
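The percentile can be cross-checked the same way. This sketch assumes Python's `statistics.NormalDist.inv_cdf` as the counterpart of invNorm; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Sampling distribution of the average of 75 stress scores, as in the text.
x_bar = NormalDist(mu=3, sigma=1.15 / math.sqrt(75))

# Counterpart of invNorm(.90, 3, 1.15/sqrt(75)).
k = x_bar.inv_cdf(0.90)
print(round(k, 1))  # the text reports about 3.2
```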
For problems 3 and 4, let $\Sigma X$ = the sum of the 75 stress scores.
Then $\Sigma X \sim N\left[(75)(3), \sqrt{75} \cdot 1.15\right]$.
Find $P(\Sigma X < 200)$. Draw the graph.
The mean of the sum of 75 stress scores is $75 \cdot 3 = 225$.
The standard deviation of the sum of 75 stress scores is $\sqrt{75} \cdot 1.15 = 9.96$.
$P(\Sigma X < 200) \approx 0.006$

The probability that the total of 75 scores is less than 200 is about 0.006, which is close to 0.
$\text{normalcdf}\left(75, 200, 75 \cdot 3, \sqrt{75} \cdot 1.15\right) \approx 0.006$
The smallest total of 75 stress scores is 75
since the smallest single score is 1.
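The distribution of the sum can be set up in a few lines. This is a sketch assuming Python's `statistics.NormalDist` in place of normalcdf, using the text's rounded $\sigma_X = 1.15$.

```python
import math
from statistics import NormalDist

n, mu_x, sigma_x = 75, 3, 1.15  # values from the text

# CLT for sums: mean n * mu, standard deviation sqrt(n) * sigma.
total = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)

# Counterpart of normalcdf(75, 200, 75*3, sqrt(75)*1.15).
p_below_200 = total.cdf(200) - total.cdf(75)
print(round(total.mean, 2), round(total.stdev, 2), round(p_below_200, 4))
```

A total of 200 is about 2.5 standard deviations below the mean of 225, so the probability is small (roughly 0.006) rather than exactly zero.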
Find the 90th percentile for the total of 75 stress scores. Draw a graph.
Let $k$ = the 90th percentile.
Find $k$ where $P(\Sigma X < k) = 0.90$.
$k = 237.8$

The 90th percentile for the sum of 75 scores is about 237.8. This means that 90% of
all the sums of 75 scores are no more than 237.8 and 10% are no less than 237.8.
$\text{invNorm}\left(0.90, 75 \cdot 3, \sqrt{75} \cdot 1.15\right) = 237.8$
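As a cross-check on the percentile for the sum, the following sketch again assumes `statistics.NormalDist.inv_cdf` as a stand-in for invNorm.

```python
import math
from statistics import NormalDist

# Distribution of the sum of 75 stress scores, as in the text.
total = NormalDist(mu=75 * 3, sigma=math.sqrt(75) * 1.15)

# Counterpart of invNorm(.90, 75*3, sqrt(75)*1.15).
k = total.inv_cdf(0.90)
print(round(k, 1))  # the text reports about 237.8
```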
Suppose that a market research analyst for a cell phone company conducts a study of customers who exceed the time allowance included in their basic cell phone contract. The analyst finds that for these customers, the excess time used follows an
exponential distribution with a mean of 22 minutes.
Consider a random sample of 80 customers who exceed the time allowance included in their basic cell phone contract.
Let $X$ = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted time allowance.
$X \sim Exp\left(\frac{1}{22}\right)$. From Chapter 5, we know that $\mu = 22$ and $\sigma = 22$.
Let $\bar{X}$ = the AVERAGE excess time used by a sample of $n = 80$ customers who exceed their contracted time allowance.
$\bar{X} \sim N\left(22, \frac{22}{\sqrt{80}}\right)$ by the CLT for Sample Means or Averages.
- a. Find the probability that the average excess time used by the 80 customers in the sample is longer than 20 minutes. This is asking us to find $P(\bar{X} > 20)$. Draw the graph.
- b. Suppose that one customer who exceeds the time limit for his cell phone contract is randomly selected. Find the probability that this individual customer's excess time is longer than 20 minutes. This is asking us to find $P(X > 20)$.
- c. Explain why the probabilities in (a) and (b) are different.
Find: $P(\bar{X} > 20)$
$P(\bar{X} > 20) = 0.7919$ using $\text{normalcdf}\left(20, 1\text{E}99, 22, \frac{22}{\sqrt{80}}\right)$
The probability is 0.7919 that the
average excess time used is more
than 20 minutes, for a sample of 80 customers who exceed their contracted time allowance.
$1\text{E}99 = 10^{99}$ and $-1\text{E}99 = -10^{99}$.
Press the EE key for E. Or just use 10^99 instead of 1E99.
Find $P(X > 20)$. Remember to use the exponential distribution for an individual: $X \sim Exp\left(\frac{1}{22}\right)$.
$P(X > 20) = e^{-\left(\frac{1}{22}\right)(20)}$ or $e^{-(0.04545)(20)} = 0.4029$
- $P(X > 20) = 0.4029$ but $P(\bar{X} > 20) = 0.7919$
- The probabilities are not equal because we use different distributions to calculate the probability for individuals and for averages.
- When asked to find the probability of an individual value, use the stated
distribution of its random variable; do not use the CLT. Use the CLT with the normal distribution when you are being asked to find the probability for an average.
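The contrast between the individual and the average can be sketched side by side. The example assumes Python's `math.exp` for the exponential tail and `statistics.NormalDist` for the CLT model; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_excess = 22  # minutes; for an exponential distribution, sigma equals the mean
n = 80

# Individual customer: exponential tail, P(X > x) = exp(-x / mean).
p_individual = math.exp(-20 / mean_excess)

# Average of 80 customers: normal model from the CLT, N(22, 22 / sqrt(80)).
x_bar = NormalDist(mu=mean_excess, sigma=mean_excess / math.sqrt(n))
p_average = 1 - x_bar.cdf(20)

print(round(p_individual, 4), round(p_average, 4))  # the text reports 0.4029 and 0.7919
```

The average is far less variable than a single customer's excess time (standard deviation about 2.46 minutes instead of 22), so much more of its distribution lies above 20.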
Find the 95th percentile for the sample average excess time for samples of 80 customers who exceed their basic contract time allowances. Draw a graph.
Let $k$ = the 95th percentile. Find $k$ where $P(\bar{X} < k) = 0.95$.
$k = 26.0$ using $\text{invNorm}\left(0.95, 22, \frac{22}{\sqrt{80}}\right) = 26.0$
The 95th percentile for the sample average excess time used is about 26.0 minutes for random samples of 80 customers who exceed their contractual allowed time.
95% of such samples would have averages under 26 minutes; only 5% of such samples would have averages above 26 minutes.
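A quick cross-check of this percentile, again assuming `statistics.NormalDist.inv_cdf` as the counterpart of invNorm:

```python
import math
from statistics import NormalDist

# Sampling distribution of the average excess time for n = 80 customers.
x_bar = NormalDist(mu=22, sigma=22 / math.sqrt(80))

# Counterpart of invNorm(.95, 22, 22/sqrt(80)).
k = x_bar.inv_cdf(0.95)
print(round(k, 1))  # the text reports about 26.0
```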
(HISTORICAL): Normal Approximation to the Binomial
Historically, being able to compute binomial probabilities was one of the most important applications of the Central Limit Theorem. Binomial probabilities were displayed in a table in a book with a small value for $n$ (say, 20). To calculate the probabilities with large values of $n$, you had to use the binomial formula, which could be very complicated.
Using the Normal Approximation to the Binomial simplified the process. To compute the Normal Approximation to the Binomial,
take a simple random sample from a population. You must meet the conditions
for a binomial distribution:
- there are a certain number $n$ of independent trials
- the outcomes of any trial are success or failure
- each trial has the same probability of a success $p$
Recall that if $X$ is the binomial random variable, then $X \sim B(n, p)$.
The shape of the binomial distribution needs to be
similar to the shape of the normal distribution. To ensure this, the quantities
$np$ and $nq$ must both be greater than five ($np > 5$ and $nq > 5$; the approximation is better if they are both greater than or equal to 10). Then the binomial can be approximated by the normal distribution with mean
$\mu = np$ and standard deviation $\sigma = \sqrt{npq}$.
Remember that $q = 1 - p$. In order to get the best approximation, add 0.5 to
$X$ or subtract 0.5 from $X$ (use $X + 0.5$ or $X - 0.5$). The number 0.5 is called the
continuity correction factor.
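The conditions and the continuity correction can be collected into a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and the illustrative values $n = 50$, $p = 0.4$ are not from the text.

```python
import math
from statistics import NormalDist

def normal_approximation(n, p, threshold=5):
    """Return a NormalDist approximating B(n, p); raise if np or nq is too small."""
    q = 1 - p
    if n * p <= threshold or n * q <= threshold:
        raise ValueError("np and nq must both exceed the threshold")
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))

def approx_at_most(n, p, x):
    """Approximate P(X <= x): x is included, so move the bound up by 0.5."""
    return normal_approximation(n, p).cdf(x + 0.5)

def approx_at_least(n, p, x):
    """Approximate P(X >= x): x is included, so move the bound down by 0.5."""
    return 1 - normal_approximation(n, p).cdf(x - 0.5)

# Illustrative values only (not from the text): n = 50 trials, p = 0.4.
y = normal_approximation(50, 0.4)
print(round(y.mean, 4), round(y.stdev, 4))
```

Raising an error when $np$ or $nq$ is too small is one possible design choice; it keeps the approximation from being used silently where the binomial is too skewed.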
Suppose in a local Kindergarten through 12th grade (K - 12) school district, 53 percent of the population favor a charter school for grades K - 5. A simple random sample of 300 is surveyed.
- 1. Find the probability that at least 150 favor a charter school.
- 2. Find the probability that at most 160 favor a charter school.
- 3. Find the probability that more than 155 favor a charter school.
- 4. Find the probability that fewer than 147 favor a charter school.
- 5. Find the probability that exactly 175 favor a charter school.
Let $X$ = the number that favor a charter school for grades K - 5. $X \sim B(n, p)$ where $n = 300$ and $p = 0.53$.
Since $np > 5$ and $nq > 5$, use the normal approximation to the binomial. The formulas for the mean and standard deviation are
$\mu = np$ and $\sigma = \sqrt{npq}$. The mean is 159 and the standard deviation is 8.6447.
The random variable for the normal distribution is $Y$. $Y \sim N(159, 8.6447)$.
See The Normal Distribution for help with calculator instructions.
For Problem 1, you include 150, so $P(X \geq 150)$ has normal approximation $P(Y \geq 149.5) = 0.8641$.
$\text{normalcdf}(149.5, 10^{99}, 159, 8.6447) = 0.8641$
For Problem 2, you include 160, so $P(X \leq 160)$ has normal approximation $P(Y \leq 160.5) = 0.5689$.
$\text{normalcdf}(0, 160.5, 159, 8.6447) = 0.5689$
For Problem 3, you exclude 155, so $P(X > 155)$ has normal approximation $P(Y > 155.5) = 0.6572$.
$\text{normalcdf}(155.5, 10^{99}, 159, 8.6447) = 0.6572$
For Problem 4, you exclude 147, so $P(X < 147)$ has normal approximation $P(Y < 146.5) = 0.0741$.
$\text{normalcdf}(0, 146.5, 159, 8.6447) = 0.0741$
For Problem 5, $P(X = 175)$ has normal approximation $P(174.5 < Y < 175.5) = 0.0083$.
$\text{normalcdf}(174.5, 175.5, 159, 8.6447) = 0.0083$
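The five approximations can be reproduced together. This sketch assumes Python's `statistics.NormalDist` in place of normalcdf; each bound already carries the 0.5 continuity correction described above.

```python
import math
from statistics import NormalDist

n, p = 300, 0.53
# Y ~ N(np, sqrt(npq)), the normal approximation to B(300, 0.53).
y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

approx = {
    "P(X >= 150)": 1 - y.cdf(149.5),             # include 150
    "P(X <= 160)": y.cdf(160.5),                 # include 160
    "P(X > 155)": 1 - y.cdf(155.5),              # exclude 155
    "P(X < 147)": y.cdf(146.5),                  # exclude 147
    "P(X = 175)": y.cdf(175.5) - y.cdf(174.5),   # half a unit on each side
}
for event, prob in approx.items():
    print(f"{event}: {prob:.4f}")
```

Using `y.cdf(160.5)` directly instead of a lower bound of 0 makes no practical difference here, because the area below 0 is negligible for this distribution.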
Because calculators and computer software easily let you calculate binomial probabilities for large values of $n$, it is not necessary to use the Normal Approximation to the Binomial provided you have access to these technology tools. Most school labs have Microsoft Excel, an example of computer software that calculates binomial probabilities. Many students have access to the TI-83 or 84 series calculators, which easily calculate probabilities for the binomial. In an Internet browser, if you type in "binomial probability distribution calculation," you can find at least one online calculator for the binomial.
For Example 3, the probabilities are calculated using the binomial ($n = 300$ and $p = 0.53$) below. Compare the binomial and normal distribution answers. See Discrete Random Variables for help with calculator instructions for the binomial.
$P(X \geq 150)$: $1 - \text{binomialcdf}(300, 0.53, 149) = 0.8641$
$P(X \leq 160)$: $\text{binomialcdf}(300, 0.53, 160) = 0.5684$
$P(X > 155)$: $1 - \text{binomialcdf}(300, 0.53, 155) = 0.6576$
$P(X < 147)$: $\text{binomialcdf}(300, 0.53, 146) = 0.0742$
$P(X = 175)$: (You use the binomial pdf.) $\text{binomialpdf}(300, 0.53, 175) = 0.0083$
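The exact binomial values can also be computed without a calculator. The sketch below uses only Python's standard library (`math.comb`, `math.fsum`); the helper names `binom_pmf` and `binom_cdf` are hypothetical and play the roles of the calculator's binomial pdf and cdf functions.

```python
import math

def binom_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ B(n, p), summing the pmf from 0 to k."""
    return math.fsum(binom_pmf(n, p, i) for i in range(k + 1))

n, p = 300, 0.53
exact = {
    "P(X >= 150)": 1 - binom_cdf(n, p, 149),
    "P(X <= 160)": binom_cdf(n, p, 160),
    "P(X > 155)": 1 - binom_cdf(n, p, 155),
    "P(X < 147)": binom_cdf(n, p, 146),
    "P(X = 175)": binom_pmf(n, p, 175),
}
for event, prob in exact.items():
    print(f"{event}: {prob:.4f}")
```

Comparing these with the normal approximations shows agreement to about three decimal places, which is why the approximation was so useful before such tools were common.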
**Contributions made to Example 2 by Roberta Bloom