During an election year, we see articles in the newspaper that state confidence intervals
in terms of proportions or percentages. For example, a poll for a particular
candidate running for president might show that the candidate has 40% of the vote
within 3 percentage points. Often, election polls are calculated with 95% confidence.
So, the pollsters would be 95% confident that the true proportion of voters who
favored the candidate would be between 0.37 and 0.43: $(0.40 - 0.03,\ 0.40 + 0.03)$.
Investors in the stock market are interested in the true proportion of stocks that go up
and down each week. Businesses that sell personal computers are interested in the
proportion of households in the United States that own personal computers.
Confidence intervals can be calculated for the true proportion of stocks that go up or
down each week and for the true proportion of households in the United States that
own personal computers.
The procedure to find the confidence interval, the sample size, the error bound, and
the confidence level for a proportion is similar to that for the population mean, but the
formulas are different.
How do you know you are dealing with a proportion problem? First, the
underlying distribution is binomial. (There is no mention of a mean or average.) If
$X$ is a binomial random variable, then $X \sim B(n, p)$, where $n$ = the number of trials
and $p$ = the probability of a success. To form a proportion, take $X$, the random
variable for the number of successes, and divide it by $n$, the number of trials (or the
sample size). The random variable $P'$ (read "P prime") is that proportion,
$$P' = \frac{X}{n}$$
(Sometimes the random variable is denoted as $\hat{P}$, read "P hat".)
When $n$ is large, we can use the normal distribution to approximate the binomial.
$$X \sim N\left(n \cdot p,\ \sqrt{n \cdot p \cdot q}\right)$$
Here $q = 1 - p$ is the probability of a failure.
If we divide the random variable by $n$, the mean by $n$, and the standard
deviation by $n$, we get a normal distribution of proportions with $P'$, called the
estimated proportion, as the random variable. (Recall that a proportion = the
number of successes divided by $n$.)
$$\frac{X}{n} = P' \sim N\left(\frac{n \cdot p}{n},\ \frac{\sqrt{n \cdot p \cdot q}}{n}\right)$$
Using algebra to simplify:
$$\frac{\sqrt{n \cdot p \cdot q}}{n} = \sqrt{\frac{p \cdot q}{n}}$$
$P'$ follows a normal distribution for proportions:
$$P' \sim N\left(p,\ \sqrt{\frac{p \cdot q}{n}}\right)$$
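The mean and standard deviation in this result can be checked numerically. The sketch below is only an illustration: the function name `proportion_mean_and_sd` and the values $n = 500$, $p = 0.4$ are assumptions chosen for the demonstration, not taken from the text. It computes the exact mean and standard deviation of $P' = X/n$ from the binomial probabilities and compares them with $p$ and $\sqrt{p \cdot q / n}$. It does not test the normal shape itself, only the two parameters.

```python
import math


def proportion_mean_and_sd(n, p):
    """Exact mean and standard deviation of P' = X/n when X ~ B(n, p)."""
    q = 1.0 - p
    # Binomial probabilities P(X = x) for x = 0, 1, ..., n
    pmf = [math.comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]
    mean = sum((x / n) * prob for x, prob in enumerate(pmf))
    variance = sum(((x / n) - mean) ** 2 * prob for x, prob in enumerate(pmf))
    return mean, math.sqrt(variance)


# Illustrative values (assumptions, not from the text)
n, p = 500, 0.4
mean, sd = proportion_mean_and_sd(n, p)
formula_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of P'          = {mean:.6f}")
print(f"sd of P'            = {sd:.6f}")
print(f"sqrt(p * q / n)     = {formula_sd:.6f}")
```

The two standard deviation values should agree up to floating-point rounding, which is the simplification shown in the algebra above.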
The confidence interval has the form $(p' - EBP,\ p' + EBP)$.
$$p' = \frac{x}{n}$$
$p'$ = the estimated proportion of successes ($p'$ is a point estimate for $p$, the true proportion)
$x$ = the number of successes
$n$ = the size of the sample
The error bound for a proportion is
$$EBP = z_{\frac{\alpha}{2}} \cdot \sqrt{\frac{p' \cdot q'}{n}}, \qquad q' = 1 - p'$$
This formula is similar to the error bound formula for a
mean, except that the "appropriate standard deviation" is different. For a mean, when the population standard deviation is known, the appropriate standard deviation that we use is $\frac{\sigma}{\sqrt{n}}$. For a proportion, the appropriate standard deviation is $\sqrt{\frac{p \cdot q}{n}}$.
However, in the error bound formula, we use $\sqrt{\frac{p' \cdot q'}{n}}$ as the standard deviation, instead of $\sqrt{\frac{p \cdot q}{n}}$.
In the error bound formula, the sample proportions $p'$ and $q'$ are estimates of the unknown population proportions $p$ and $q$. The estimated
proportions $p'$ and $q'$ are used because $p$ and $q$ are not known. $p'$ and $q'$ are
calculated from the data. $p'$ is the estimated proportion of successes. $q'$ is the
estimated proportion of failures.
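The error bound formula can be sketched as a small Python function. The function name `error_bound_proportion` and the sample values (40 successes in 100 trials at 95% confidence) are hypothetical, chosen only for illustration. The critical value is taken from the standard library's `statistics.NormalDist`, so it is slightly more precise than a rounded table value such as 1.96.

```python
import math
from statistics import NormalDist


def error_bound_proportion(x, n, confidence_level):
    """EBP = z_(alpha/2) * sqrt(p' * q' / n), with p' = x / n and q' = 1 - p'."""
    p_prime = x / n            # estimated proportion of successes
    q_prime = 1 - p_prime      # estimated proportion of failures
    alpha = 1 - confidence_level
    # z_(alpha/2) leaves area alpha/2 in the right tail of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(p_prime * q_prime / n)


# Hypothetical sample: 40 successes out of 100, 95% confidence
ebp = error_bound_proportion(40, 100, 0.95)
print(f"EBP = {ebp:.3f}")
```

Note that the function uses $p'$ and $q'$ computed from the data, never the unknown $p$ and $q$.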
For the normal distribution of proportions, the z-score formula is as follows.
If $P' \sim N\left(p,\ \sqrt{\frac{p \cdot q}{n}}\right)$, then the z-score formula is
$$z = \frac{p' - p}{\sqrt{\frac{p \cdot q}{n}}}$$
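As a quick illustration of the z-score formula, the sketch below standardizes a sample proportion. The function name `proportion_z_score` and the numbers (a sample proportion of 0.56 from 100 trials when the true proportion is 0.50) are assumptions made up for this example.

```python
import math


def proportion_z_score(p_prime, p, n):
    """z = (p' - p) / sqrt(p * q / n), where q = 1 - p."""
    q = 1 - p
    return (p_prime - p) / math.sqrt(p * q / n)


# Hypothetical values: true proportion 0.50, sample proportion 0.56, n = 100
z = proportion_z_score(p_prime=0.56, p=0.50, n=100)
print(f"z = {z:.2f}")
```

Here the standard deviation is $\sqrt{(0.5)(0.5)/100} = 0.05$, so a sample proportion of 0.56 lies 1.2 standard deviations above the true proportion.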
Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents of this city are surveyed to determine whether they have cell phones. Of
the 500 people surveyed, 421 responded yes: they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true
proportion of adult residents of this city who have cell phones.
- You can use technology to directly calculate the confidence interval.
- The first solution is step-by-step (Solution A).
- The second solution uses a function of the TI-83, 83+ or 84 calculators (Solution B).
Let $X$ = the number of people in the sample who have cell phones. $X$ is binomial: $X \sim B\left(500, \frac{421}{500}\right)$.
To calculate the confidence interval, you must find $p'$, $q'$, and $EBP$.
$n = 500$
$x$ = the number of successes = 421
$$p' = \frac{x}{n} = \frac{421}{500} = 0.842$$
$p' = 0.842$ is the sample proportion; this is the point estimate of the population proportion.
$$q' = 1 - p' = 1 - 0.842 = 0.158$$
Since $CL = 0.95$, then $\alpha = 1 - CL = 1 - 0.95 = 0.05$ and $\frac{\alpha}{2} = 0.025$.
Then $z_{\frac{\alpha}{2}} = z_{0.025} = 1.96$.
Use the TI-83, 83+, or 84+ calculator command invNorm(0.975,0,1) to find $z_{0.025}$. Remember that the area to the right of $z_{0.025}$ is 0.025 and the area to the left of $z_{0.025}$ is 0.975.
This can also be found using appropriate commands on other calculators, using a computer, or using a Standard Normal probability table.
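As one way of finding the critical value on a computer, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `critical_z` is an assumption for this example. It mirrors the calculator command: it asks for the point with area $1 - \frac{\alpha}{2}$ to its left under the standard normal curve.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Return z_(alpha/2): the point with area alpha/2 to its right."""
    alpha = 1 - confidence_level
    # Area to the left of z_(alpha/2) is 1 - alpha/2 (0.975 when CL = 0.95)
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha / 2)


z_95 = critical_z(0.95)
print(round(z_95, 2))  # 1.96
```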
$$EBP = z_{\frac{\alpha}{2}} \cdot \sqrt{\frac{p' \cdot q'}{n}} = 1.96 \cdot \sqrt{\frac{(0.842)(0.158)}{500}} = 0.032$$
$$p' - EBP = 0.842 - 0.032 = 0.810$$
$$p' + EBP = 0.842 + 0.032 = 0.874$$
The confidence interval for the true binomial population proportion is $(p' - EBP,\ p' + EBP) = (0.810, 0.874)$.
We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city have cell phones.
95% of the confidence intervals constructed in this
way would contain the true value for the population proportion of all adult residents of this city who have cell phones.
Using a function of the TI-83, 83+, or 84 calculators:
Press STAT and arrow over to TESTS.
Arrow down to A:1-PropZInt. Press ENTER.
Arrow down to x and enter 421.
Arrow down to n and enter 500.
Arrow down to C-Level and enter .95.
Arrow down to Calculate and press ENTER.
The confidence interval is (0.81003, 0.87397).
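The same interval can be reproduced on a computer. The following is a minimal sketch, not the calculator's actual implementation; the function name `proportion_confidence_interval` is an assumption. It follows the step-by-step solution: compute $p'$, $q'$, the critical value, and $EBP$, then form $(p' - EBP,\ p' + EBP)$.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence_level):
    """Return (p' - EBP, p' + EBP) for x successes in a sample of size n."""
    p_prime = x / n
    q_prime = 1 - p_prime
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2)
    ebp = z * math.sqrt(p_prime * q_prime / n)
    return p_prime - ebp, p_prime + ebp


# Cell phone example: 421 of 500 surveyed adults, 95% confidence
lower, upper = proportion_confidence_interval(421, 500, 0.95)
print(f"({lower:.5f}, {upper:.5f})")
```

Because the critical value is not rounded to 1.96, the result should agree with the calculator interval to about five decimal places rather than with the hand-rounded (0.810, 0.874).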
For a class project, a political science student at a large university
wants to determine the percent of students that are registered voters. He surveys 500
students and finds that 300 are registered voters. Compute a 90% confidence interval
for the true percent of students that are registered voters and interpret the confidence
interval.
- You can use technology to directly calculate the confidence interval.
- The first solution is step-by-step (Solution A).
- The second solution uses a function of the TI-83, 83+ or 84 calculators (Solution B).
$x = 300$ and $n = 500$.
$$p' = \frac{x}{n} = \frac{300}{500} = 0.600$$
$$q' = 1 - p' = 1 - 0.600 = 0.400$$
Since $CL = 0.90$, then $\alpha = 1 - CL = 1 - 0.90 = 0.10$ and $\frac{\alpha}{2} = 0.05$.
$z_{\frac{\alpha}{2}} = z_{0.05} = 1.645$
Use the TI-83, 83+, or 84+ calculator command invNorm(0.95,0,1) to find $z_{0.05}$. Remember that the area to the right of $z_{0.05}$ is 0.05 and the area to the left of $z_{0.05}$ is 0.95.
This can also be found using appropriate commands on other calculators, using a computer, or using a Standard Normal probability table.
$$EBP = z_{\frac{\alpha}{2}} \cdot \sqrt{\frac{p' \cdot q'}{n}} = 1.645 \cdot \sqrt{\frac{(0.60)(0.40)}{500}} = 0.036$$
$$p' - EBP = 0.60 - 0.036 = 0.564$$
$$p' + EBP = 0.60 + 0.036 = 0.636$$
The confidence interval for the true binomial population proportion is $(p' - EBP,\ p' + EBP) = (0.564, 0.636)$.
- We estimate with 90% confidence that the true percent of all students that are registered voters is between 56.4% and 63.6%.
- Alternate Wording: We estimate with 90% confidence that between 56.4% and 63.6% of ALL students are registered voters.
90% of all confidence intervals constructed in this way contain the true value for the population percent of students that are registered voters.
Using a function of the TI-83, 83+ or 84 calculators:
Press STAT and arrow over to TESTS.
Arrow down to A:1-PropZInt. Press ENTER.
Arrow down to x and enter 300.
Arrow down to n and enter 500.
Arrow down to C-Level and enter .90.
Arrow down to Calculate and press ENTER.
The confidence interval is (0.564, 0.636).
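The statement that 90% of all confidence intervals constructed in this way contain the true value can be explored with a simulation. The sketch below rests on stated assumptions: it pretends the true proportion of registered voters is exactly 0.60 (in practice this is unknown), draws 2,000 simulated surveys of 500 students each, and counts how often the 90% interval captures 0.60. The function name `coverage_rate`, the repetition count, and the seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(true_p, n, confidence_level, repetitions, seed=0):
    """Fraction of simulated intervals (p' - EBP, p' + EBP) that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    hits = 0
    for _ in range(repetitions):
        # Simulate one survey: each of the n respondents is a "success" with probability true_p
        x = sum(rng.random() < true_p for _ in range(n))
        p_prime = x / n
        ebp = z * math.sqrt(p_prime * (1 - p_prime) / n)
        if p_prime - ebp <= true_p <= p_prime + ebp:
            hits += 1
    return hits / repetitions


# Assumed true proportion 0.60; this is an illustration, not a fact from the survey
rate = coverage_rate(true_p=0.60, n=500, confidence_level=0.90, repetitions=2000)
print(f"Share of intervals containing the true proportion: {rate:.3f}")
```

The printed share should be close to 0.90, though it will not equal it exactly because the simulation is itself a random sample of intervals.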
If researchers desire a specific margin of error, then they can use the error bound formula to calculate the required sample size.
The error bound formula for a population proportion is
- $EBP = z_{\frac{\alpha}{2}} \cdot \sqrt{\frac{p'q'}{n}}$
- Solving for $n$ gives you an equation for the sample size.
- $n = \frac{z_{\frac{\alpha}{2}}^2 \cdot p'q'}{EBP^2}$
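The algebra behind solving for $n$ is not spelled out in the text; one way to write the intermediate steps, using the same symbols, is:

```latex
\begin{align*}
EBP &= z_{\frac{\alpha}{2}} \cdot \sqrt{\frac{p'q'}{n}} \\
EBP^2 &= z_{\frac{\alpha}{2}}^2 \cdot \frac{p'q'}{n} && \text{(square both sides)} \\
n \cdot EBP^2 &= z_{\frac{\alpha}{2}}^2 \cdot p'q' && \text{(multiply both sides by } n\text{)} \\
n &= \frac{z_{\frac{\alpha}{2}}^2 \cdot p'q'}{EBP^2} && \text{(divide both sides by } EBP^2\text{)}
\end{align*}
```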
Suppose a mobile phone company wants to determine the current percentage of customers aged 50+ who use text messaging on their cell phones. How many customers aged 50+ should the company survey in order to be 90% confident that the estimated (sample) proportion is within 3 percentage points of the true population proportion of customers aged 50+ who use text messaging on their cell phones?
From the problem, we know that $EBP = 0.03$ (3% = 0.03) and $z_{\frac{\alpha}{2}} = z_{0.05} = 1.645$ because the confidence level is 90%.
However, in order to find $n$, we need to know the estimated (sample) proportion $p'$. Remember that $q' = 1 - p'$. But we do not know $p'$ yet. Since we multiply $p'$ and $q'$ together, we make them both equal to 0.5 because $p'q' = (0.5)(0.5) = 0.25$ results in the largest possible product. (Try other products: $(0.6)(0.4) = 0.24$; $(0.3)(0.7) = 0.21$; $(0.2)(0.8) = 0.16$; and so on.) The largest possible product gives us the largest $n$. This gives us a large enough sample so that we can be 90% confident that we are within 3 percentage points of the true population proportion. To calculate the sample size $n$, use the formula and make the substitutions.
$$n = \frac{z^2 p'q'}{EBP^2} \quad \text{gives} \quad n = \frac{1.645^2 (0.5)(0.5)}{0.03^2} = 751.7$$
Round the answer to the next higher value. The sample size should be 752 cell phone customers aged 50+ in order to be 90% confident that the estimated (sample) proportion is within 3 percentage points of the true population proportion of all customers aged 50+ who use text messaging on their cell phones.
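The sample size calculation, and the reason for choosing $p' = 0.5$, can be sketched in Python. The function name `required_sample_size` is an assumption for this example. It uses the rounded critical value 1.645 from the text and rounds up with `math.ceil`, since rounding down would give a sample slightly too small for the desired error bound.

```python
import math


def required_sample_size(ebp, z, p_prime=0.5):
    """n = z^2 * p' * q' / EBP^2, rounded up to the next whole number."""
    q_prime = 1 - p_prime
    n = z**2 * p_prime * q_prime / ebp**2
    return math.ceil(n)


z_90 = 1.645  # z_0.05 for a 90% confidence level, as given in the text
ebp = 0.03    # within 3 percentage points

# Compare the guesses for p' tried in the text; 0.5 should demand the most respondents
candidates = (0.2, 0.3, 0.5, 0.6)
sizes = {p: required_sample_size(ebp, z_90, p) for p in candidates}
for p, n in sizes.items():
    print(f"p' = {p}: n = {n}")
```

With $p' = 0.5$ the function returns 752, and every other guess for $p'$ gives a smaller $n$, which is why 0.5 is the safe choice when nothing is known about the proportion in advance.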
**With contributions from Roberta Bloom.
- Binomial Distribution:
A discrete random variable (RV) which arises from Bernoulli trials. There are a fixed number, $n$, of independent trials. “Independent” means that the result of any trial (for example, trial 1) does not affect the results of the following trials, and all trials are conducted under the same conditions. Under these circumstances the binomial RV
$X$ is defined as the number of successes in $n$ trials. The notation is: $X \sim B(n, p)$. The mean is $\mu = np$ and the standard deviation is $\sigma = \sqrt{npq}$. The probability of exactly $x$ successes in $n$ trials is $P(X = x) = \binom{n}{x} p^x q^{n-x}$.
- Confidence Interval (CI):
An interval estimate for an unknown population parameter. This depends on:
- The desired confidence level.
- Information that is known about the distribution (for example, known standard deviation).
- The sample and its size.
- Confidence Level (CL):
The percent expression for the probability that the confidence interval contains the true population parameter. For example, if the $CL = 90\%$, then in 90 out of 100 samples the interval estimate will enclose the true population parameter.
- Error Bound for a Population Proportion (EBP):
The margin of error. Depends on the confidence level, sample size, and the estimated (from the sample) proportion of successes.
- Normal Distribution:
A continuous random variable (RV) with pdf
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$, where $\mu$ is the mean of the distribution and $\sigma$ is the standard deviation. Notation: $X \sim N(\mu, \sigma)$. If $\mu = 0$ and $\sigma = 1$, the RV is called the standard normal distribution.