In the previous paragraphs it was assumed that we were sampling from a normal distribution and that the variance was known. The null hypothesis was generally of the form $H_0: \mu = \mu_0$.
There are essentially three possibilities for the alternative hypothesis, namely that
- $\mu$ has increased, $H_1: \mu > \mu_0$;
- $\mu$ has decreased, $H_1: \mu < \mu_0$;
- $\mu$ has changed, but it is not known whether it has increased or decreased, which leads to the two-sided alternative hypothesis $H_1: \mu \neq \mu_0$.
To test $H_0: \mu = \mu_0$ against one of these three alternative hypotheses, a random sample is taken from the distribution, and an observed sample mean, $\bar{x}$, that is close to $\mu_0$ supports $H_0$. The closeness of $\bar{x}$ to $\mu_0$ is measured in terms of standard deviations of $\bar{X}$, namely $\sigma/\sqrt{n}$, which is sometimes called the standard error of the mean. Thus the test statistic could be defined by
$$Z = \frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}},$$
and the critical regions, at a significance level $\alpha$, for the three respective alternative hypotheses would be:
- $z \ge z_\alpha$
- $z \le -z_\alpha$
- $|z| \ge z_{\alpha/2}$
In terms of $\bar{x}$ these three critical regions become
- $\bar{x} \ge \mu_0 + z_\alpha \sigma/\sqrt{n}$,
- $\bar{x} \le \mu_0 - z_\alpha \sigma/\sqrt{n}$,
- $|\bar{x} - \mu_0| \ge z_{\alpha/2} \sigma/\sqrt{n}$.
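The cutoffs for $\bar{x}$ can be computed directly from $\mu_0$, $\sigma$, $n$ and $\alpha$. The following is a minimal sketch, not part of the original text: the function name `z_critical_regions` and the numerical inputs are hypothetical, and the normal quantiles come from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_critical_regions(mu0, sigma, n, alpha):
    """Return x-bar cutoffs for the three z tests of H0: mu = mu0.

    Assumes a normal population with KNOWN standard deviation sigma.
    """
    se = sigma / sqrt(n)                         # standard error of the mean
    z_a = NormalDist().inv_cdf(1 - alpha)        # upper-alpha point z_alpha
    z_a2 = NormalDist().inv_cdf(1 - alpha / 2)   # upper-alpha/2 point z_{alpha/2}
    return {
        "greater": mu0 + z_a * se,    # H1: mu > mu0, reject if x_bar >= cutoff
        "less": mu0 - z_a * se,       # H1: mu < mu0, reject if x_bar <= cutoff
        "two_sided": z_a2 * se,       # H1: mu != mu0, reject if |x_bar - mu0| >= cutoff
    }


# Hypothetical inputs, chosen only for illustration (standard error = 1).
regions = z_critical_regions(mu0=50.0, sigma=6.0, n=36, alpha=0.05)
for name, cutoff in regions.items():
    print(f"{name}: {cutoff:.3f}")
```

Note that the two-sided entry is a distance from $\mu_0$ rather than a value of $\bar{x}$, which mirrors the way the third critical region is written.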
These tests and critical regions are summarized in TABLE 1. The underlying assumption is that the distribution is $N(\mu, \sigma^2)$ and $\sigma^2$ is known.

Thus far we have assumed that the variance $\sigma^2$ was known. We now take a more realistic position and assume that the variance is unknown. Suppose our null hypothesis is $H_0: \mu = \mu_0$ and the two-sided alternative hypothesis is $H_1: \mu \neq \mu_0$. If a random sample $X_1, X_2, \ldots, X_n$ is taken from a normal distribution $N(\mu, \sigma^2)$, recall that a confidence interval for $\mu$ was based on
$$T = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$
TABLE 1
| $H_0$ | $H_1$ | Critical Region |
|---|---|---|
| $\mu = \mu_0$ | $\mu > \mu_0$ | $z \ge z_\alpha$ or $\bar{x} \ge \mu_0 + z_\alpha \sigma/\sqrt{n}$ |
| $\mu = \mu_0$ | $\mu < \mu_0$ | $z \le -z_\alpha$ or $\bar{x} \le \mu_0 - z_\alpha \sigma/\sqrt{n}$ |
| $\mu = \mu_0$ | $\mu \neq \mu_0$ | $\lvert z \rvert \ge z_{\alpha/2}$ or $\lvert \bar{x} - \mu_0 \rvert \ge z_{\alpha/2} \sigma/\sqrt{n}$ |
This suggests that $T$ might be a good statistic to use for the test of $H_0: \mu = \mu_0$, with $\mu$ replaced by $\mu_0$. In addition, it is the natural statistic to use if we replace $\sigma^2/n$ by its unbiased estimator $S^2/n$ in $(\bar{X} - \mu_0)/\sqrt{\sigma^2/n}$, the statistic $Z$ defined earlier. If $\mu = \mu_0$, we know that $T$ has a t distribution with $n - 1$ degrees of freedom. Thus, with $\mu = \mu_0$,
$$P\left[\,|T| \ge t_{\alpha/2}(n-1)\,\right] = P\left[\frac{|\bar{X} - \mu_0|}{S/\sqrt{n}} \ge t_{\alpha/2}(n-1)\right] = \alpha.$$
Accordingly, if $\bar{x}$ and $s$ are the sample mean and the sample standard deviation, the rule that rejects $H_0: \mu = \mu_0$ if and only if
$$|t| = \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}} \ge t_{\alpha/2}(n-1)$$
provides a test of the hypothesis with significance level $\alpha$. It should be noted that this rule is equivalent to rejecting $H_0: \mu = \mu_0$ if $\mu_0$ is not in the open $100(1-\alpha)\%$ confidence interval
$$\left(\bar{x} - t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}}\right).$$
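The equivalence between the two-sided t test and the confidence interval can be checked numerically. The sketch below is illustrative only: the function names and the summary statistics ($\bar{x} = 10.0$, $s = 3.0$, $n = 9$) are hypothetical, and the critical value $t_{0.05}(8) = 1.860$ is supplied by hand from a t table, since the Python standard library has no t quantile function.

```python
from math import sqrt


def rejects_by_t(x_bar, s, n, mu0, t_crit):
    """Two-sided t test: reject H0: mu = mu0 when |t| >= t_crit."""
    t = (x_bar - mu0) / (s / sqrt(n))
    return abs(t) >= t_crit


def outside_open_interval(x_bar, s, n, mu0, t_crit):
    """True when mu0 is NOT in the open interval x_bar -/+ t_crit * s / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return not (x_bar - half_width < mu0 < x_bar + half_width)


# Hypothetical sample summary; t_crit = t_{0.05}(8) = 1.860 (alpha = 0.10, n = 9).
x_bar, s, n, t_crit = 10.0, 3.0, 9, 1.860

checks = []
for mu0 in (8.0, 9.0, 10.0, 11.5, 12.0):
    by_t = rejects_by_t(x_bar, s, n, mu0, t_crit)
    by_ci = outside_open_interval(x_bar, s, n, mu0, t_crit)
    checks.append((mu0, by_t, by_ci))
    print(f"mu0 = {mu0}: reject by t = {by_t}, outside interval = {by_ci}")
```

For every candidate $\mu_0$ the two decisions agree, because both conditions are rearrangements of the same inequality $|\bar{x} - \mu_0| \ge t_{\alpha/2}(n-1)\, s/\sqrt{n}$.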
Table 2 summarizes tests of hypotheses for a single mean, along with the three possible alternative hypotheses, when the underlying distribution is $N(\mu, \sigma^2)$, $\sigma^2$ is unknown, $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ and $n \le 31$. If $n > 31$, use Table 1 for approximate tests with $\sigma$ replaced by $s$.
TABLE 2
| $H_0$ | $H_1$ | Critical Region |
|---|---|---|
| $\mu = \mu_0$ | $\mu > \mu_0$ | $t \ge t_\alpha(n-1)$ or $\bar{x} \ge \mu_0 + t_\alpha(n-1)\, s/\sqrt{n}$ |
| $\mu = \mu_0$ | $\mu < \mu_0$ | $t \le -t_\alpha(n-1)$ or $\bar{x} \le \mu_0 - t_\alpha(n-1)\, s/\sqrt{n}$ |
| $\mu = \mu_0$ | $\mu \neq \mu_0$ | $\lvert t \rvert \ge t_{\alpha/2}(n-1)$ or $\lvert \bar{x} - \mu_0 \rvert \ge t_{\alpha/2}(n-1)\, s/\sqrt{n}$ |
Let $X$ (in millimeters) equal the growth in 15 days of a tumor induced in a mouse. Assume that the distribution of $X$ is $N(\mu, \sigma^2)$. We shall test the null hypothesis $H_0: \mu = \mu_0 = 4.0$ millimeters against the two-sided alternative hypothesis $H_1: \mu \neq 4.0$. If we use $n = 9$ observations and a significance level of $\alpha = 0.10$, the critical region is
$$|t| = \frac{|\bar{x} - 4.0|}{s/\sqrt{9}} \ge t_{\alpha/2}(8) = t_{0.05}(8) = 1.860.$$
If we are given that $n = 9$, $\bar{x} = 4.3$, and $s = 1.2$, we see that
$$t = \frac{4.3 - 4.0}{1.2/\sqrt{9}} = \frac{0.3}{0.4} = 0.75.$$
Thus $|t| = |0.75| < 1.860$ and we accept (do not reject) $H_0: \mu = 4.0$ at the $\alpha = 10\%$ significance level. See Figure 1.
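The arithmetic of this example can be reproduced in a few lines. This is a sketch using only the summary statistics and the table value $t_{0.05}(8) = 1.860$ quoted above; the variable names are illustrative. It also reports the critical region in its $\bar{x}$ form, that is, how far $\bar{x}$ would have to be from 4.0 before $H_0$ is rejected.

```python
from math import sqrt

# Summary statistics and table value as given in the example.
n, x_bar, s = 9, 4.3, 1.2
mu0 = 4.0
t_crit = 1.860                 # t_{0.05}(8), read from a t table

se = s / sqrt(n)               # estimated standard error: 1.2 / 3
t = (x_bar - mu0) / se         # observed t statistic
cutoff = t_crit * se           # reject H0 when |x_bar - mu0| >= cutoff
reject = abs(t) >= t_crit

print(f"t = {t:.2f}, reject H0: {reject}")
print(f"|x_bar - mu0| = {abs(x_bar - mu0):.3f}, cutoff = {cutoff:.3f}")
```

The observed distance of 0.3 millimeters is well short of the 0.744 millimeters needed for rejection at this sample size and significance level.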
In discussing the test of a statistical hypothesis, the word "accept" might better be replaced by "do not reject". That is, if, in Example 1, $\bar{x}$ is close enough to 4.0 so that we accept $\mu = 4.0$, we do not want that acceptance to imply that $\mu$ is actually equal to 4.0. We want to say that the data do not deviate enough from $\mu = 4.0$ for us to reject that hypothesis; that is, we do not reject $\mu = 4.0$ with these observed data. With this understanding, one sometimes uses "accept" and sometimes "fail to reject" or "do not reject" the null hypothesis.
In this example the use of the t-statistic with a one-sided alternative hypothesis will be illustrated.
In attempting to control the strength of the wastes discharged into a nearby river, a paper firm has taken a number of measures. Members of the firm believe that they have reduced the oxygen-consuming power of their wastes from a previous mean $\mu$ of 500. They plan to test $H_0: \mu = 500$ against $H_1: \mu < 500$, using readings taken on $n = 25$ consecutive days. If these 25 values can be treated as a random sample, then the critical region, for a significance level of $\alpha = 0.01$, is
$$t = \frac{\bar{x} - 500}{s/\sqrt{25}} \le -t_{0.01}(24) = -2.492.$$
The observed values of the sample mean and sample standard deviation were $\bar{x} = 308.8$ and $s = 115.15$. Since
$$t = \frac{308.8 - 500}{115.15/\sqrt{25}} = -8.30 < -2.492,$$
we clearly reject the null hypothesis and accept $H_1: \mu < 500$. It should be noted, however, that although an improvement has been made, there still might exist the question of whether the improvement is adequate. The 95% confidence interval $308.8 \pm 2.064(115.15/5)$, or $[261.27, 356.33]$, for $\mu$ might help the company answer that question.
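As a check on the figures in this example, the one-sided test and the 95% confidence interval can be recomputed from the reported summary statistics. This is a sketch only: the variable names are illustrative, and both critical values are the table values quoted in the example, $t_{0.01}(24) = 2.492$ and $t_{0.025}(24) = 2.064$.

```python
from math import sqrt

# Summary statistics reported for the 25 daily readings.
n, x_bar, s = 25, 308.8, 115.15
mu0 = 500.0
t_01 = 2.492      # t_{0.01}(24): one-sided test at alpha = 0.01
t_025 = 2.064     # t_{0.025}(24): two-sided 95% confidence interval

se = s / sqrt(n)                  # estimated standard error: 115.15 / 5
t = (x_bar - mu0) / se            # observed t statistic
reject = t <= -t_01               # lower-tail critical region for H1: mu < 500

# 95% confidence interval for mu.
lower = x_bar - t_025 * se
upper = x_bar + t_025 * se

print(f"t = {t:.2f}, reject H0: {reject}")
print(f"95% CI for mu: [{lower:.2f}, {upper:.2f}]")
```

The test uses the one-sided value $t_{0.01}(24)$ while the interval uses the two-sided value $t_{0.025}(24)$, so the two calculations answer different questions: whether the mean has dropped at all, and how large the new mean plausibly is.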