In hypothesis
testing, as in all other areas of statistical inference,
there are two major schools of thought on designing good tests:
Bayesian and frequentist (or classical). Consider the simple
binary hypothesis testing problem
ℋ
0
:
x∼
f
0
x
ℋ
0
:
x
f
0
x
ℋ
1
:
x∼
f
1
x
ℋ
1
:
x
f
1
x
In the Bayesian setup, the prior probability
π
i
=Pr
ℋ
i
π
i
ℋ
i
of each hypothesis occurring is assumed known. This
approach to hypothesis testing is represented by the minimum Bayes risk criterion and the
minimum probability of error
criterion.
In some applications, however, it may not be
reasonable to assign an a priori probability to
a hypothesis. For example, what is the a priori
probability of a supernova occurring in any particular region of
the sky? What is the prior probability of being attacked by a
ballistic missile? In such cases we need a decision rule that does
not depend on making assumptions about the a
priori probability of each hypothesis. Here the
Neyman-Pearson criterion offers an alternative to the Bayesian
framework.
The Neyman-Pearson criterion is stated in terms
of certain probabilities associated with a particular
hypothesis test. The relevant quantities are summarized in Table 1. Depending on the setting, different terminology
is used.
| Statistics |
Signal Processing |
| Probability |
Name |
Notation |
Name |
Notation |
|
P
0
declare
ℋ
1
P
0
declare
ℋ
1
|
size |
α
α
|
false-alarm probability |
P
F
P
F
|
|
P
1
declare
ℋ
1
P
1
declare
ℋ
1
|
power |
β
β
|
detection probability |
P
D
P
D
|
Here
P
i
declare
ℋ
j
P
i
declare
ℋ
j
dentoes the probability that we declare hypothesis
ℋ
j
ℋ
j
to be in effect when
ℋ
i
ℋ
i
is actually in effect. The probabilities
P
0
declare
ℋ
0
P
0
declare
ℋ
0
and
P
1
declare
ℋ
0
P
1
declare
ℋ
0
(sometimes called the miss probability),
are equal to
1-
P
F
1
P
F
and
1-
P
D
1
P
D
, respectively. Thus,
P
F
P
F
and
P
D
P
D
represent the two degrees of freedom in a binary
hypothesis test. Note that
P
F
P
F
and
P
D
P
D
do not involve a priori
probabilities of the hypotheses.
These two probabilities are related to
each other through the decision regions. If
R
1
R
1
is the decision region for
ℋ
1
ℋ
1
, we have
P
F
=∫
R
1
f
0
xdx
P
F
x
R
1
f
0
x
P
D
=∫
R
1
f
1
xdx
P
D
x
R
1
f
1
x
The densities
f
i
x
f
i
x
are nonnegative, so as
R
1
R
1
shrinks, both probabilities tend to zero. As
R
1
R
1
expands, both tend to one. The ideal case, where
P
D
=1
P
D
1
and
P
F
=0
P
F
0
, cannot occur unless the distributions do not overlap
(i.e.,
∫
f
0
x
f
1
xdx=0
x
f
0
x
f
1
x
0
). Therefore, in order to increase
P
D
P
D
, we must also increase
P
F
P
F
. This represents the fundamental tradeoff in
hypothesis testing and detection theory.
Consider the simple binary hypothesis test
of a scalar measurement xx:
ℋ
0
:
x∼01
ℋ
0
:
x
0
1
ℋ
1
:
x∼11
ℋ
1
:
x
1
1
Suppose we use a threshold test
x
≷
ℋ
0
ℋ
1
γ
x
≷
ℋ
0
ℋ
1
γ
where
γ∈ℝ
γ
is a free parameter. Then the false alarm and
detection probabilities are
P
F
=∫γ∞12πⅇ-t22dt=Qγ
P
F
t
γ
1
2
t
2
2
Q
γ
P
D
=∫γ∞12πⅇ-t-122dt=Qγ-1
P
D
t
γ
1
2
t
1
2
2
Q
γ
1
where QQ denotes the
Q-function. These quantities
are depicted in Figure 1.
Since the
QQ-function
is monotonicaly decreasing, it is evident that both
P
D
P
D
and
P
F
P
F
decay to zero as
γγ
increases. There is also an explicit relationship
P
D
=QQ-1
P
F
-1
P
D
Q
Q
P
F
1
A common means of displaying this relationship is with a
receiver operating characteristic (ROC) curve,
which is nothing more than a plot of
P
D
P
D
versus
P
F
P
F
(
Figure 2).
The Neyman-Pearson criterion says that we should
construct our decision rule to have maximum probability of
detection while not allowing the probability of false alarm to
exceed a certain value αα. In
other words, the optimal detector according to the
Neyman-Pearson criterion is the solution to the following
constrainted optimization problem:
max{
P
D
}
,
such that
P
F
≤α
P
D
,
such that
P
F
α
(1)
The maximization is over all decision rules (equivalently, over
all decision regions
R
0
R
0
,
R
1
R
1
).
Using different terminology, the Neyman-Pearson criterion
selects the most powerful test of size (not exceeding)
αα.
Fortunately, the above optimization problem has
an explicit solution. This is given by the celebrated
Neyman-Pearson lemma, which we now state. To ease
the exposition, our initial statement of this result only
applies to continuous random variables, and places a technical
condition on the densities. A more general statement is given
later in the module.
Consider the test
ℋ
0
:
x∼
f
0
x
ℋ
0
:
x
f
0
x
ℋ
1
:
x∼
f
1
x
ℋ
1
:
x
f
1
x
where
f
i
x
f
i
x
is a density. Define
Λx=
f
1
x
f
0
x
Λ
x
f
1
x
f
0
x
, and assume that
Λx
Λ
x
satisfies the condition that for each
γ∈ℝ
γ
,
Λx
Λ
x
takes on the value
γγ with probability
zero under hypothesis
ℋ
0
ℋ
0
. The solution to the optimization problem
in Equation 1 is given by
Λx=
f
1
x
f
0
x
≷
ℋ
0
ℋ
1
η
Λ
x
f
1
x
f
0
x
≷
ℋ
0
ℋ
1
η
where ηη is such that
P
F
=∫∀x,Λx>η
f
0
xdx=α
P
F
x
x
Λ
x
η
f
0
x
α
If
α=0
α
0
, then
η=∞
η
. The optimal test is unique up to a set of
probability zero under
ℋ
0
ℋ
0
and
ℋ
1
ℋ
1
.
The optimal decision rule is called the
likelihood ratio
test.
Λx
Λ
x
is the
likelihood ratio, and
ηη is a
threshold. Observe that neither the likelihood
ratio nor the threshold depends on the
a
priori probabilities
Pr
ℋ
i
ℋ
i
. they depend only on the conditional densities
f
i
f
i
and the size constraint
αα. The threshold can often
be solved for as a function of
αα, as the next example
shows.
Continuing with Example 1,
suppose we wish to design a Neyman-Pearson decision rule with
size constraint αα. We have
Λx=12πⅇ-x-12212πⅇ-x22=ⅇx-12
Λ
x
1
2
x
1
2
2
1
2
x
2
2
x
1
2
(2)
By taking the natural logarithm of both sides of the LRT and
rarranging terms, the decision rule is not changed, and we
obtain
x
≷
ℋ
0
ℋ
1
lnη+12≡γ
x
≷
ℋ
0
ℋ
1
η
1
2
γ
Thus, the optimal rule is in fact a thresholding rule like we
considered in
Example 1. The false-alarm
probability was seen to be
P
F
=Qγ
P
F
Q
γ
Thus, we may express the value of
γγ required by the
Neyman-Pearson lemma in terms of
αα:
γ=Q-1α
γ
Q
α
For hypothesis testing involving multiple or
vector-valued data, direct evaluation of the size
(
P
F
P
F
) and power
(
P
D
P
D
)
of a Neyman-Pearson decision rule would require integration
over multi-dimensional, and potentially complicated decision
regions. In many cases, however, this can be avoided by
simplifying the LRT to a test of the form
t
≷
ℋ
0
ℋ
1
γ
t
≷
ℋ
0
ℋ
1
γ
where the test statistic
t=Tx
t
T
x
is a sufficient
statistic for the data. Such a simplified form is
arrived at by modifying both sides of the LRT with
montonically increasing transformations, and by algebraic
simplifications. Since the modifications do not change the
decision rule, we may calculate
P
F
P
F
and
P
D
P
D
in terms of the sufficient statistic. For
example, the false-alarm probability may be written
P
F
=Pr
declare
ℋ
1
=∫∀t,t>γ
f
0
tdt
P
F
declare
ℋ
1
t
t
t
γ
f
0
t
(3)
where
f
0
t
f
0
t
denotes the density of
t
t under
ℋ
0
ℋ
0
. Since
t
t is typically of lower dimension than
xx, evaluation of
P
F
P
F
and
P
D
P
D
can be greatly simplified. The key is being able to reduce the
LRT to a threshold test involving a sufficient statistic
for which we know the distribution.
Let's design a Neyman-Pearson decision rule
of size αα for the
problem
ℋ
0
:
x∼0σ2I
ℋ
0
:
x
0
σ
2
I
ℋ
1
:
x∼μ1σ2I
ℋ
1
:
x
μ
1
σ
2
I
where
μ>0
μ
0
,
σ2>0
σ
2
0
are known,
0=0…0T
0
0
…
0
,
1=1…1T
1
1
…
1
are NN-dimensional
vectors, and II
is the N
N×NN identity
matrix. The likelihood ratio is
Λx=∏n=1N12πσ2ⅇ-
x
n
-μ22σ2∏n=1N12πσ2ⅇ-
x
n
22σ2=ⅇ-∑n=1N
x
n
-μ22σ2ⅇ-∑n=1N
x
n
22σ2=ⅇ12σ2∑n=1N2
x
n
μ-μ2=ⅇ1σ2-Nμ22+μ∑n=1N
x
n
Λ
x
n
1
N
1
2
σ
2
x
n
μ
2
2
σ
2
n
1
N
1
2
σ
2
x
n
2
2
σ
2
n
1
N
x
n
μ
2
2
σ
2
n
1
N
x
n
2
2
σ
2
1
2
σ
2
n
1
N
2
x
n
μ
μ
2
1
σ
2
N
μ
2
2
μ
n
1
N
x
n
(4)
To simplify the test further we may apply the natural
logarithm and rearrange terms to obtain
t≡∑n=1N
x
n
≷
ℋ
0
ℋ
1
σ2μlnη+Nμ2≡γ
t
n
1
N
x
n
≷
ℋ
0
ℋ
1
σ
2
μ
η
N
μ
2
γ
We have used the assumption
μ>0
μ
0
. If
μ<0
μ
0
, then division by
μμ is not a
monotonically increasing operation, and the inequalities
would be reversed.
The test statistic
tt is
sufficient for the unknown
mean. To set the threshold
γγ, we write the
false-alarm probability (size) as
P
F
=Prt>γ=∫∀t,t>γ
f
0
tdt
P
F
t
γ
t
t
t
γ
f
0
t
To evaluate
P
F
P
F
, we need to know the density of
tt under
ℋ
0
ℋ
0
. Fortunately,
tt
is the sum of normal variates, so it is again normally
distributed. In particular, we have
t=Ax
t
A
x
, where
A=1T
A
1
, so
t∼A0Aσ2IAT=0Nσ2
t
A
0
A
σ
2
I
A
0
N
σ
2
under
ℋ
0
ℋ
0
. Therefore, we may write
P
F
P
F
in terms of the
Q-function as
P
F
=QγNσ
P
F
Q
γ
N
σ
The threshold is thus determined by
γ=NσQ-1α
γ
N
σ
Q
α
Under
ℋ
1
ℋ
1
, we have
t∼A1Aσ2IAT=NμNσ2
t
A
1
A
σ
2
I
A
N
μ
N
σ
2
and so the detection probability (power) is
P
D
=Prt>γ=Qγ-NμNσ
P
D
t
γ
Q
γ
N
μ
N
σ
Writing
P
D
P
D
as a function of
P
F
P
F
, the ROC curve is given by
P
D
=QQ-1
P
F
-Nμσ
P
D
Q
Q
P
F
N
μ
σ
The quantity
Nμσ
N
μ
σ
is called the
signal-to-noise
ratio. As its name suggests, a larger SNR
corresponds to improved performance of the Neyman-Pearson
decision rule.
In the context of signal processing, the
foregoing problem may be viewed as the problem of detecting a
constant (DC) signal in
additive white
Gaussian noise:
ℋ
0
:
x
n
=
w
n
,
n
=
1
,
…
,
N
ℋ
0
:
x
n
w
n
,
n
=
1
,
…
,
N
ℋ
1
:
x
n
=A+
w
n
,
n
=
1
,
…
,
N
ℋ
1
:
x
n
A
w
n
,
n
=
1
,
…
,
N
where
AA is a known, fixed
amplitude, and
w
n
∼0σ2
w
n
0
σ
2
. Here
AA corresponds
to the mean
μμ in the
example.
In our initial statement of the Neyman-Pearson
Lemma, we assumed that for all
ηη, the set
{x|Λx=η}
x
Λ
x
η
x
had probability zero under
ℋ
0
ℋ
0
. This eliminated many important problems
from consideration, including tests of discrete data.
In this section we remove this restriction.
It is helpful to introduce a more general way of
writing decision rules. Let φφ
be a function of the data xx with
φx∈01
φ
x
0
1
. φφ defines the
decision rule "declare
ℋ
1
ℋ
1
with probability
φx
φ
x
." In other words, upon observing xx