During an election year, we see articles in the newspaper that state confidence intervals
in terms of proportions or percentages. For example, a poll for a particular
candidate running for president might show that the candidate has 40% of the vote
within 3 percentage points. Often, election polls are calculated with 95% confidence.
So, the pollsters would be 95% confident that the true proportion of voters who
favored the candidate would be between 0.37 and 0.43
$(0.40 - 0.03,\ 0.40 + 0.03)$.
Investors in the stock market are interested in the true proportion of stocks that go up
and down each week. Businesses that sell personal computers are interested in the
proportion of households in the United States that own personal computers.
Confidence intervals can be calculated for the true proportion of stocks that go up or
down each week and for the true proportion of households in the United States that
own personal computers.
The procedure to find the confidence interval, the sample size, the error bound, and
the confidence level for a proportion is similar to that for the population mean. The
formulas are different.
How do you know you are dealing with a proportion problem? First, the
underlying distribution is binomial. (There is no mention of a mean or average.) If
$X$ is a binomial random variable, then $X \sim B(n, p)$, where $n$ = the number of trials
and $p$ = the probability of a success. To form a proportion, take $X$, the random
variable for the number of successes, and divide it by $n$, the number of trials (or the
sample size). The random variable $P'$ (read "P prime") is that proportion,
$$P' = \frac{X}{n}$$
(Sometimes the random variable is written $\hat{P}$, read "P hat".)
When $n$ is large, we can use the normal distribution to approximate the binomial.
$$X \sim N\left(n \cdot p,\ \sqrt{n \cdot p \cdot q}\right)$$
where $q = 1 - p$ is the probability of a failure.
If we divide all values of the random variable by $n$, the mean by $n$, and the standard
deviation by $n$, we get a normal distribution of proportions with $P'$, called the
estimated proportion, as the random variable. (Recall that a proportion = the
number of successes divided by $n$.)
$$\frac{X}{n} = P' \sim N\left(\frac{n \cdot p}{n},\ \frac{\sqrt{n \cdot p \cdot q}}{n}\right)$$
By algebra,
$$\frac{\sqrt{n \cdot p \cdot q}}{n} = \sqrt{\frac{p \cdot q}{n}}$$
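The algebra step can be written out in full. The following is a short derivation sketch; it relies only on $n > 0$, so that $n = \sqrt{n^2}$.

```latex
\frac{\sqrt{n \cdot p \cdot q}}{n}
  = \frac{\sqrt{n \cdot p \cdot q}}{\sqrt{n^{2}}}
  = \sqrt{\frac{n \cdot p \cdot q}{n^{2}}}
  = \sqrt{\frac{p \cdot q}{n}}
```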
$P'$ follows a normal distribution for proportions:
$$P' \sim N\left(p,\ \sqrt{\frac{p \cdot q}{n}}\right)$$
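A small simulation can make this sampling distribution concrete. The sketch below is illustrative only: the values `n = 500`, `p = 0.40`, the number of repetitions, and the helper name `sample_proportion` are assumptions chosen for the demonstration, not values from the text. It draws many binomial counts, converts each to a proportion, and compares the mean and standard deviation of those proportions with $p$ and $\sqrt{p \cdot q / n}$.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 500             # hypothetical sample size
p = 0.40            # hypothetical true proportion of successes
q = 1 - p           # probability of a failure
repetitions = 2000  # number of simulated samples


def sample_proportion(n, p):
    """Draw one binomial count X ~ B(n, p) and return P' = X / n."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n


proportions = [sample_proportion(n, p) for _ in range(repetitions)]

simulated_mean = statistics.mean(proportions)
simulated_sd = statistics.stdev(proportions)
theoretical_sd = math.sqrt(p * q / n)

print(f"mean of P': {simulated_mean:.4f} (theory: {p})")
print(f"sd of P':   {simulated_sd:.4f} (theory: {theoretical_sd:.4f})")
```

The simulated values should land close to the theoretical ones, and the agreement tightens as `repetitions` grows.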
The confidence interval has the form $(p' - EBP,\ p' + EBP)$.
$$p' = \frac{x}{n}$$
$p'$ = the estimated proportion of successes ($p'$ is a point estimate for $p$, the true proportion)
$x$ = the number of successes
$n$ = the size of the sample
The error bound for a proportion is
$$EBP = z_{\frac{\alpha}{2}} \cdot \sqrt{\frac{p' \cdot q'}{n}}, \qquad q' = 1 - p'$$
This formula is very similar to the error bound formula for a
mean. The difference is the standard deviation. For a mean where
the population standard deviation is known, the standard deviation is
$\frac{\sigma}{\sqrt{n}}$.
For a proportion, the standard deviation is $\sqrt{\frac{p \cdot q}{n}}$.
However, in the error bound formula, the standard deviation is
$\sqrt{\frac{p' \cdot q'}{n}}$.
In the error bound formula, $p'$ and $q'$ are estimates of $p$ and $q$. The estimated
proportions $p'$ and $q'$ are used because $p$ and $q$ are not known. $p'$ and $q'$ are
calculated from the data. $p'$ is the estimated proportion of successes. $q'$ is the
estimated proportion of failures.
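The error bound formula translates directly into a few lines of code. This is a minimal sketch, not a reference implementation: the function name `error_bound_proportion` and the sample of 40 successes in 100 trials are hypothetical, and the critical value is taken from Python's standard-library `statistics.NormalDist` rather than from a table.

```python
import math
from statistics import NormalDist


def error_bound_proportion(x, n, confidence_level):
    """Return EBP = z_(alpha/2) * sqrt(p' * q' / n) for x successes in n trials."""
    p_prime = x / n                # estimated proportion of successes
    q_prime = 1 - p_prime          # estimated proportion of failures
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # area to the left is 1 - alpha/2
    return z * math.sqrt(p_prime * q_prime / n)


# Hypothetical data: 40 successes in a sample of 100, at 95% confidence.
ebp = error_bound_proportion(40, 100, 0.95)
print(f"EBP = {ebp:.3f}")
```

Because $p'$ and $q'$ come from the data, the error bound changes from sample to sample even when $n$ and the confidence level stay fixed.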
When a study reports a margin of error of "plus or minus 3 percentage points", that figure
is determined before the survey is done. Since $p'$ and $q'$ are unknown at that point, the
most conservative choice is $p' = 0.5$ and $q' = 0.5$, because these values give the
largest standard deviation, the largest error bound, and the widest confidence interval.
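A quick numerical check shows why 0.5 is the conservative choice. The sketch below scans candidate values of $p'$ and confirms that the product $p' \cdot q'$ peaks at 0.5. It then solves the error bound formula for $n$, giving $n = z^2 \cdot p' \cdot q' / EBP^2$; that rearrangement is not stated in the text above and is included here as an assumption-labeled illustration of planning a survey for a 3-point margin at 95% confidence.

```python
import math
from statistics import NormalDist

# Scan p' = 0.01, 0.02, ..., 0.99 and find where p' * q' is largest.
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: p * (1 - p))
print(f"p' * q' is largest at p' = {best_p}")

# Rearranged error bound formula (an illustration, not from the text):
# n = z^2 * p' * q' / EBP^2, rounded up to a whole respondent.
z = NormalDist().inv_cdf(0.975)  # 95% confidence
target_ebp = 0.03                # 3 percentage points
n_required = math.ceil(z**2 * 0.5 * 0.5 / target_ebp**2)
print(f"sample size needed with p' = q' = 0.5: {n_required}")
```

Any other value of $p'$ gives a smaller product $p' \cdot q'$, so a sample sized with 0.5 is large enough whatever the survey ends up finding.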
For the normal distribution of proportions, the z-score formula is as follows.
If $P' \sim N\left(p,\ \sqrt{\frac{p \cdot q}{n}}\right)$, then the z-score formula is
$$z = \frac{p' - p}{\sqrt{\frac{p \cdot q}{n}}}$$
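As an illustration of the z-score formula, the sketch below standardizes one sample proportion. The function name `z_score_for_proportion` and the numbers ($p = 0.50$, $n = 100$, $p' = 0.60$) are hypothetical and chosen only because the arithmetic is easy to follow by hand.

```python
import math


def z_score_for_proportion(p_prime, p, n):
    """Return z = (p' - p) / sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    standard_deviation = math.sqrt(p * q / n)
    return (p_prime - p) / standard_deviation


# Hypothetical values: true proportion 0.50, sample of 100, observed p' = 0.60.
z = z_score_for_proportion(0.60, 0.50, 100)
print(f"z = {z:.2f}")
```

Here the standard deviation is $\sqrt{0.25 / 100} = 0.05$, so an observed proportion 0.10 above $p$ sits two standard deviations above the mean.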
Suppose that a sample of 500 households in Phoenix was taken last May
to determine whether the oldest child had given his/her mother a Mother's Day card. Of
the 500 households, 421 responded yes. Compute a 95% confidence interval for the true
proportion of all Phoenix households whose oldest child gave his/her mother a Mother's
Day card.
- The first solution is step-by-step.
- The second solution uses the TI-83+ and TI-84 calculators.
Let $X$ = the number of oldest children who gave their mothers a Mother's Day card last
May. $X$ is binomial. $X \sim B\left(500, \frac{421}{500}\right)$.
To calculate the confidence interval, you must find $p'$, $q'$, and $EBP$.
$n = 500$
$x$ = the number of successes $= 421$
$$p' = \frac{x}{n} = \frac{421}{500} = 0.842$$
$$q' = 1 - p' = 1 - 0.842 = 0.158$$
Since $CL = 0.95$, then $\alpha = 1 - CL = 1 - 0.95 = 0.05$ and $\frac{\alpha}{2} = 0.025$.
Then $z_{\frac{\alpha}{2}} = z_{0.025} = 1.96$ using a calculator, computer, or standard normal table.
Remember that the area to the right = 0.025 and therefore, area to the
left is 0.975.
The z-score that corresponds to 0.975 is 1.96.
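For readers using a computer rather than a table, the same lookup can be sketched with Python's standard-library `statistics.NormalDist`. The variable names are illustrative; `inv_cdf` returns the z-score with the given area to its left.

```python
from statistics import NormalDist

confidence_level = 0.95
alpha = 1 - confidence_level   # total area in both tails
area_to_left = 1 - alpha / 2   # 0.975 for a 95% confidence level

# Inverse cumulative distribution function of the standard normal.
z_critical = NormalDist().inv_cdf(area_to_left)
print(round(z_critical, 2))  # 1.96
```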
$$EBP = z_{\frac{\alpha}{2}} \cdot \sqrt{\frac{p' \cdot q'}{n}} = 1.96 \cdot \sqrt{\frac{(0.842) \cdot (0.158)}{500}} = 0.032$$
$$p' - EBP = 0.842 - 0.032 = 0.810$$
$$p' + EBP = 0.842 + 0.032 = 0.874$$
The confidence interval for the true binomial population proportion is $(p' - EBP,\ p' + EBP) = (0.810,\ 0.874)$.
We are 95% confident that between 81% and 87.4% of the
oldest children in households in Phoenix gave their mothers a
Mother's Day card last May.
We can also say that 95% of the confidence intervals constructed in this
way contain the true proportion of oldest children in Phoenix who gave
their mothers a Mother's Day card last May.
TI-83+ and TI-84: Press STAT and arrow over to TESTS. Arrow down
to A:1-PropZInt. Press ENTER. Enter 421 for $x$, 500 for $n$, and .95 for
C-Level. Arrow down to Calculate and press ENTER. The confidence
interval is (0.81003, 0.87397).
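The step-by-step solution can also be reproduced in code and compared with the calculator output. This is a sketch that follows the same steps as the hand calculation; the variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being rounded to 1.96.

```python
import math
from statistics import NormalDist

x, n = 421, 500          # households answering yes, sample size
confidence_level = 0.95

p_prime = x / n          # estimated proportion of successes
q_prime = 1 - p_prime    # estimated proportion of failures
alpha = 1 - confidence_level
z = NormalDist().inv_cdf(1 - alpha / 2)

ebp = z * math.sqrt(p_prime * q_prime / n)   # error bound for the proportion
lower, upper = p_prime - ebp, p_prime + ebp

print(f"p' = {p_prime:.3f}, q' = {q_prime:.3f}, EBP = {ebp:.3f}")
print(f"interval = ({lower:.5f}, {upper:.5f})")
```

Rounded to five decimal places, the endpoints agree with the calculator's (0.81003, 0.87397).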
For a class project, a political science student at a large university
wants to determine the percent of students that are registered voters. He surveys 500
students and finds that 300 are registered voters. Compute a 90% confidence interval
for the true percent of students that are registered voters and interpret the confidence
interval.
$x = 300$ and $n = 500$. Using a TI-83+ or 84 calculator, the 90% confidence
interval for the true percent of students that are registered voters is (0.564, 0.636).
- We are 90% confident that the true percent of students that are registered voters
is between 56.4% and 63.6%.
- Ninety percent (90%) of all confidence intervals constructed in this way contain
the true percent of students that are registered voters.
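Both statements can be explored in code. The sketch below first recomputes the 90% interval for 300 registered voters out of 500, then runs a simulation of the second interpretation. The simulation is illustrative only: it assumes a hypothetical true proportion of 0.60 (the real value is unknown), and the function name `proportion_confidence_interval`, the seed, and the number of simulated surveys are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence_level):
    """Return (p' - EBP, p' + EBP) for x successes in n trials."""
    p_prime = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    ebp = z * math.sqrt(p_prime * (1 - p_prime) / n)
    return p_prime - ebp, p_prime + ebp


# The student's survey: 300 registered voters among 500 students.
lower, upper = proportion_confidence_interval(300, 500, 0.90)
print(f"90% interval: ({lower:.3f}, {upper:.3f})")

# Simulation under an ASSUMED true proportion (hypothetical, for illustration).
random.seed(1)
true_p = 0.60
surveys = 1000
covered = 0
for _ in range(surveys):
    x = sum(random.random() < true_p for _ in range(500))
    lo, hi = proportion_confidence_interval(x, 500, 0.90)
    if lo <= true_p <= hi:
        covered += 1

coverage = covered / surveys
print(f"share of simulated intervals containing the true proportion: {coverage:.3f}")
```

The share printed by the simulation should be near 0.90, though it varies somewhat with the seed and the number of simulated surveys.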
- Binomial Distribution:
A discrete random variable (RV) that arises from Bernoulli trials with the following additional requirements. There is a fixed number, $n$, of independent trials. "Independent" means that the result of any trial (for example, trial 1) in no way affects the results of the following trials, and all trials are conducted under the same conditions. Under these circumstances the binomial RV $X$ is defined as the number of successes in $n$ trials. The notation is $X \sim B(n, p)$; the domain is $x = 0, 1, 2, \dots, n$; the mean is $\mu = np$, and the variance is $\sigma^2 = npq$. The probability of exactly $x$ successes in $n$ trials is $P(X = x) = \binom{n}{x} p^{x} q^{n-x}$.
- Confidence Interval:
An interval estimate for an unknown population parameter. This depends on:
- The desired confidence level.
- What is known about the distribution (for example, a known variance).
- The information gathered from the sample.
- Confidence Level:
The probability, expressed as a percent, that the confidence interval contains the true population parameter. For example, if $CL = 90\%$, then in the long run about 90 out of 100 samples produce an interval estimate that encloses the true population parameter.
- Error Bound for a Population Mean (EBM):
The margin of error. Depends on the confidence level, sample size, and known or estimated population standard deviation.
- Normal Distribution:
A continuous random variable (RV) with $pdf = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^{2}/(2\sigma^{2})}$, where $\mu$ is the mean of the distribution and $\sigma$ is its standard deviation. Notation: $X \sim N(\mu, \sigma^{2})$, with the variance as the second parameter. If $\mu = 0$ and $\sigma = 1$, the RV follows the standard normal distribution, and its values are called z-scores.
"This book was purchased from the authors by the Maxfield Foundation and provided to the community as an open textbook available freely online and in PDF format. Bound copies of the book can also […]"