Confidence Intervals for Means
In the preceding section (Confidence Intervals I), the confidence interval for the mean $\mu$ of a normal distribution was found, assuming that the value of the standard deviation $\sigma$ is known. However, in most applications the value of $\sigma$ is unknown, although in some cases one might have a very good idea about its value.
Suppose that the underlying distribution is normal and that $\sigma^2$ is unknown. It can be shown that, given a random sample $X_1, X_2, \ldots, X_n$ from a normal distribution, the statistic
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with $r = n - 1$ degrees of freedom, where $S^2$ is the usual unbiased estimator of $\sigma^2$ (see t distribution).
Select $t_{\alpha/2}(n-1)$ so that
$$P\left[T \ge t_{\alpha/2}(n-1)\right] = \alpha/2.$$
Then
$$
\begin{aligned}
1 - \alpha &= P\left[-t_{\alpha/2}(n-1) \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{\alpha/2}(n-1)\right] \\
&= P\left[-t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} \le \bar{X} - \mu \le t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right] \\
&= P\left[-\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} \le -\mu \le -\bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right] \\
&= P\left[\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right].
\end{aligned}
$$
Thus the observations of a random sample provide $\bar{x}$ and $s^2$, and
$$\left[\bar{x} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}\right]$$
is a $100(1-\alpha)\%$ confidence interval for $\mu$.
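The interval for the mean can be computed directly from a sample. The following is a minimal Python sketch using only the standard library. The function name `t_confidence_interval` and the sample values are hypothetical, and the critical value $t_{\alpha/2}(n-1)$ is assumed to be read from a t table and passed in by the caller rather than computed.

```python
import math
import statistics


def t_confidence_interval(data, t_crit):
    """Return (lower, upper) for the mean, given a tabulated t critical value.

    t_crit is t_{alpha/2}(n-1), supplied by the caller (e.g. from a t table).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = statistics.fmean(data)   # sample mean
    s = statistics.stdev(data)      # square root of the unbiased estimator S^2
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample of n = 5 observations; t_{0.025}(4) = 2.776 from a t table.
sample = [9.8, 10.2, 10.4, 9.9, 10.1]
lower, upper = t_confidence_interval(sample, t_crit=2.776)
print(f"95% confidence interval for the mean: [{lower:.3f}, {upper:.3f}]")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could supply the quantile instead.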
Example 1
Let X equal the amount of butterfat in pounds produced by a typical cow during a 305-day milk production period between her first and second calves. Assume the distribution of X is $N(\mu, \sigma^2)$. To estimate $\mu$, a farmer measures the butterfat production for n = 20 cows, yielding the following data:
| 481 | 537 | 513 | 583 | 453 | 510 | 570 |
| 500 | 457 | 555 | 618 | 327 | 350 | 643 |
| 499 | 421 | 505 | 637 | 599 | 392 | |
For these data, $\bar{x} = 507.50$ and $s = 89.75$. Thus a point estimate of $\mu$ is $\bar{x} = 507.50$. Since $t_{0.05}(19) = 1.729$, a 90% confidence interval for $\mu$ is
$$507.50 \pm 1.729\left(\frac{89.75}{\sqrt{20}}\right),$$
or equivalently, [472.80, 542.20].
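As a quick check of the arithmetic in Example 1, the sketch below recomputes the interval from the summary statistics quoted in the example. The variable names are illustrative only, and the critical value 1.729 is taken from the text rather than computed.

```python
import math

n = 20          # number of cows
xbar = 507.50   # sample mean from the example
s = 89.75       # sample standard deviation from the example
t_crit = 1.729  # t_{0.05}(19), as quoted in the example

half_width = t_crit * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"[{lower:.2f}, {upper:.2f}]")  # agrees with [472.80, 542.20]
```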
Let T have a t distribution with n - 1 degrees of freedom. Then $t_{\alpha/2}(n-1) > z_{\alpha/2}$. Consequently, the interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is expected to be shorter than the interval $\bar{x} \pm t_{\alpha/2}(n-1)\,s/\sqrt{n}$. After all, more information, namely the value of $\sigma$, is used in constructing the first interval. However, the length of the second interval depends heavily on the value of s. If the observed s is smaller than $\sigma$, the second scheme can yield a shorter confidence interval. But on average, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is the shorter of the two confidence intervals.
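The comparison of the two interval lengths can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true parameters, the seed, and the number of trials are hypothetical choices, n = 20 and $\alpha = 0.10$ are used so that the tabulated value $t_{0.05}(19) = 1.729$ applies, and $z_{0.05}$ is obtained from the standard library's `statistics.NormalDist`.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 20
mu, sigma = 0.0, 1.0   # hypothetical true parameters
t_crit = 1.729         # t_{0.05}(19), from a t table
z_crit = statistics.NormalDist().inv_cdf(0.95)  # z_{0.05}, about 1.645

# Length of the z interval is fixed because sigma is known.
z_length = 2 * z_crit * sigma / math.sqrt(n)

trials = 5000
t_lengths = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s = statistics.stdev(sample)  # varies from sample to sample
    t_lengths.append(2 * t_crit * s / math.sqrt(n))

mean_t_length = statistics.fmean(t_lengths)
share_t_shorter = sum(length < z_length for length in t_lengths) / trials

print(f"z interval length:         {z_length:.3f}")
print(f"average t interval length: {mean_t_length:.3f}")
print(f"share of samples where the t interval is shorter: {share_t_shorter:.2f}")
```

In a run of this kind the t interval is shorter for a sizeable minority of samples (those with small s), while its average length exceeds the fixed length of the z interval.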
If it is not possible to assume that the underlying distribution is normal but $\mu$ and $\sigma$ are both unknown, approximate confidence intervals for $\mu$ can still be constructed using
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}},$$
which now only has an approximate t distribution.
Generally, this approximation is quite good for many nonnormal distributions, in particular if the underlying distribution is symmetric, unimodal, and of the continuous type. However, if the distribution is highly skewed, there is great danger in using this approximation. In such a situation, it would be safer to use a nonparametric method for finding a confidence interval for the median of the distribution.
Confidence Interval for Variances
The confidence interval for the variance $\sigma^2$ is based on the sample variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$
To find a confidence interval for $\sigma^2$, we use the fact that the distribution of $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$. The constants a and b should be selected from a table of the chi-squared distribution with n - 1 degrees of freedom such that
$$P\left(a \le \frac{(n-1)S^2}{\sigma^2} \le b\right) = 1 - \alpha.$$
That is, select a and b so that the probabilities in the two tails are equal:
$$a = \chi^2_{1-\alpha/2}(n-1) \quad \text{and} \quad b = \chi^2_{\alpha/2}(n-1).$$
Then, solving the inequalities, we have
$$1 - \alpha = P\left(\frac{a}{(n-1)S^2} \le \frac{1}{\sigma^2} \le \frac{b}{(n-1)S^2}\right) = P\left(\frac{(n-1)S^2}{b} \le \sigma^2 \le \frac{(n-1)S^2}{a}\right).$$
Thus the probability that the random interval $\left[(n-1)S^2/b,\ (n-1)S^2/a\right]$ contains the unknown $\sigma^2$ is $1 - \alpha$. Once the values of $X_1, X_2, \ldots, X_n$ are observed to be $x_1, x_2, \ldots, x_n$ and $s^2$ is computed, the interval $\left[(n-1)s^2/b,\ (n-1)s^2/a\right]$ is a $100(1-\alpha)\%$ confidence interval for $\sigma^2$.
It follows that
$$\left[\sqrt{(n-1)/b}\;s,\ \sqrt{(n-1)/a}\;s\right]$$
is a $100(1-\alpha)\%$ confidence interval for $\sigma$, the standard deviation.
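The statement that the random interval contains $\sigma^2$ with probability $1 - \alpha$ can be checked empirically. The sketch below is an illustration under stated assumptions: the function name `variance_interval`, the true parameters, the seed, and the number of trials are hypothetical, and the constants a = 5.226 and b = 21.03 are the tabulated chi-squared values for 12 degrees of freedom with $\alpha = 0.10$.

```python
import math
import random
import statistics


def variance_interval(sample, a, b):
    """Return the interval [(n-1)s^2/b, (n-1)s^2/a] for sigma^2.

    a and b are the lower and upper chi-squared table values
    for n-1 degrees of freedom, supplied by the caller.
    """
    n = len(sample)
    ss = (n - 1) * statistics.variance(sample)  # (n-1) s^2
    return ss / b, ss / a


random.seed(1)  # fixed seed so the run is reproducible

n = 13
a, b = 5.226, 21.03     # chi-squared table values, 12 degrees of freedom, alpha = 0.10
mu, sigma = 50.0, 3.0   # hypothetical true parameters
trials = 4000

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lower, upper = variance_interval(sample, a, b)
    if lower <= sigma**2 <= upper:
        hits += 1

coverage = hits / trials
print(f"empirical coverage for sigma^2: {coverage:.3f}")

# Taking square roots of the endpoints gives the interval for sigma.
lower, upper = variance_interval(sample, a, b)
print(f"last sample: sigma in [{math.sqrt(lower):.2f}, {math.sqrt(upper):.2f}]")
```

The empirical coverage should land near 0.90; it will not match exactly because the number of trials is finite.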
Example 2
Assume that the time in days required for maturation of seeds of a species of flowering plant found in Mexico is $N(\mu, \sigma^2)$. A random sample of n = 13 seeds, both parents having narrow leaves, yielded $\bar{x} = 18.97$ days and
$$12 s^2 = \sum_{i=1}^{13}\left(x_i - \bar{x}\right)^2 = 128.41.$$
A 90% confidence interval for $\sigma^2$ is
$$\left[\frac{128.41}{21.03},\ \frac{128.41}{5.226}\right] = [6.11,\ 24.57],$$
because $5.226 = \chi^2_{0.95}(12)$ and $21.03 = \chi^2_{0.05}(12)$, as can be read from a table of the chi-squared distribution. The corresponding 90% confidence interval for $\sigma$ is
$$\left[\sqrt{6.11},\ \sqrt{24.57}\right] = [2.47,\ 4.96].$$
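The numbers in Example 2 can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the variable names are illustrative, and the chi-squared values are taken from the text rather than computed.

```python
import math

ss = 128.41           # (n-1) s^2 = 12 s^2 from the example
a, b = 5.226, 21.03   # chi-squared table values for 12 degrees of freedom

var_lower, var_upper = ss / b, ss / a
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print(f"sigma^2: [{var_lower:.2f}, {var_upper:.2f}]")
print(f"sigma:   [{sd_lower:.2f}, {sd_upper:.2f}]")
```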