In hypothesis
testing, as in all other areas of statistical inference,
there are two major schools of thought on designing good tests:
Bayesian and frequentist (or classical). Consider the simple
binary hypothesis testing problem
ℋ
0
:
x∼
f
0
x
ℋ
0
:
x
f
0
x
ℋ
1
:
x∼
f
1
x
ℋ
1
:
x
f
1
x
In the Bayesian setup, the prior probability
π
i
=Pr
ℋ
i
π
i
ℋ
i
of each hypothesis occurring is assumed known. This
approach to hypothesis testing is represented by the minimum Bayes risk criterion and the
minimum probability of error
criterion.
In some applications, however, it may not be
reasonable to assign an a priori probability to
a hypothesis. For example, what is the a priori
probability of a supernova occurring in any particular region of
the sky? What is the prior probability of being attacked by a
ballistic missile? In such cases we need a decision rule that does
not depend on making assumptions about the a
priori probability of each hypothesis. Here the
Neyman-Pearson criterion offers an alternative to the Bayesian
framework.
The Neyman-Pearson criterion is stated in terms
of certain probabilities associated with a particular
hypothesis test. The relevant quantities are summarized in Table 1. Depending on the setting, different terminology
is used.
Table 1
| Statistics |
Signal Processing |
| Probability |
Name |
Notation |
Name |
Notation |
|
P
0
declare
ℋ
1
P
0
declare
ℋ
1
|
size |
α
α
|
false-alarm probability |
P
F
P
F
|
|
P
1
declare
ℋ
1
P
1
declare
ℋ
1
|
power |
β
β
|
detection probability |
P
D
P
D
|
Here
P
i
declare
ℋ
j
P
i
declare
ℋ
j
dentoes the probability that we declare hypothesis
ℋ
j
ℋ
j
to be in effect when
ℋ
i
ℋ
i
is actually in effect. The probabilities
P
0
declare
ℋ
0
P
0
declare
ℋ
0
and
P
1
declare
ℋ
0
P
1
declare
ℋ
0
(sometimes called the miss probability),
are equal to
1−
P
F
1
P
F
and
1−
P
D
1
P
D
, respectively. Thus,
P
F
P
F
and
P
D
P
D
represent the two degrees of freedom in a binary
hypothesis test. Note that
P
F
P
F
and
P
D
P
D
do not involve a priori
probabilities of the hypotheses.
These two probabilities are related to
each other through the decision
regions. If
R
1
R
1
is the decision region for
ℋ
1
ℋ
1
, we have
P
F
=∫
R
1
f
0
xdx
P
F
x
R
1
f
0
x
P
D
=∫
R
1
f
1
xdx
P
D
x
R
1
f
1
x
The densities
f
i
x
f
i
x
are nonnegative, so as
R
1
R
1
shrinks, both probabilities tend to zero. As
R
1
R
1
expands, both tend to one. The ideal case, where
P
D
=1
P
D
1
and
P
F
=0
P
F
0
, cannot occur unless the distributions do not overlap
(i.e.,
∫
f
0
x
f
1
xdx=0
x
f
0
x
f
1
x
0
). Therefore, in order to increase
P
D
P
D
, we must also increase
P
F
P
F
. This represents the fundamental tradeoff in
hypothesis testing and detection theory.
Consider the simple binary hypothesis test
of a scalar measurement xx:
ℋ
0
:
x∼01
ℋ
0
:
x
0
1
ℋ
1
:
x∼11
ℋ
1
:
x
1
1
Suppose we use a threshold test
x
≷
ℋ
0
ℋ
1
γ
x
≷
ℋ
0
ℋ
1
γ
where
γ∈ℝ
γ
is a free parameter. Then the false alarm and
detection probabilities are
P
F
=∫γ∞12πⅇ-t22dt=Qγ
P
F
t
γ
1
2
t
2
2
Q
γ
P
D
=∫γ∞12πⅇ-t−122dt=Qγ−1
P
D
t
γ
1
2
t
1
2
2
Q
γ
1
where QQ denotes the
Q-function. These quantities
are depicted in Figure 1.
Since the
QQ-function
is monotonicaly decreasing, it is evident that both
P
D
P
D
and
P
F
P
F
decay to zero as
γγ
increases. There is also an explicit relationship
P
D
=QQ-1
P
F
−1
P
D
Q
Q
P
F
1
A common means of displaying this relationship is with a
receiver operating characteristic (ROC) curve,
which is nothing more than a plot of
P
D
P
D
versus
P
F
P
F
(
Figure 2).
The Neyman-Pearson criterion says that we should
construct our decision rule to have maximum probability of
detection while not allowing the probability of false alarm to
exceed a certain value αα. In
other words, the optimal detector according to the
Neyman-Pearson criterion is the solution to the following
constrainted optimization problem:
max{
P
D
}
,
such that
P
F
≤α
P
D
,
such that
P
F
α
(1)
The maximization is over all decision rules (equivalently, over
all decision regions
R
0
R
0
,
R
1
R
1
).
Using different terminology, the Neyman-Pearson criterion
selects the most powerful test of size (not exceeding)
αα.
Fortunately, the above optimization problem has
an explicit solution. This is given by the celebrated
Neyman-Pearson lemma, which we now state. To ease
the exposition, our initial statement of this result only
applies to continuous random variables, and places a technical
condition on the densities. A more general statement is given
later in the module.
Consider the test
ℋ
0
:
x∼
f
0
x
ℋ
0
:
x
f
0
x
ℋ
1
:
x∼
f
1
x
ℋ
1
:
x
f
1
x
where
f
i
x
f
i
x
is a density. Define
Λx=
f
1
x
f
0
x
Λ
x
f
1
x
f
0
x
, and assume that
Λx
Λ
x
satisfies the condition that for each
γ∈ℝ
γ
,
Λx
Λ
x
takes on the value
γγ with probability
zero under hypothesis
ℋ
0
ℋ
0
. The solution to the optimization problem
in Equation 1 is given by
Λx=
f
1
x
f
0
x
≷
ℋ
0
ℋ
1
η
Λ
x
f
1
x
f
0
x
≷
ℋ
0
ℋ
1
η
where ηη is such that
P
F
=∫∀x,Λx>η
f
0
xdx=α
P
F
x
x
Λ
x
η
f
0
x
α
If
α=0
α
0
, then
η=∞
η
. The optimal test is unique up to a set of
probability zero under
ℋ
0
ℋ
0
and
ℋ
1
ℋ
1
.
The optimal decision rule is called the
likelihood ratio
test.
Λx
Λ
x
is the
likelihood ratio, and
ηη is a
threshold. Observe that neither the likelihood
ratio nor the threshold depends on the
a
priori probabilities
Pr
ℋ
i
ℋ
i
. they depend only on the conditional densities
f
i
f
i
and the size constraint
αα. The threshold can often
be solved for as a function of
αα, as the next example
shows.
Continuing with Example 1,
suppose we wish to design a Neyman-Pearson decision rule with
size constraint αα. We have
Λx=12πⅇ-x−12212πⅇ-x22=ⅇx−12
Λ
x
1
2
x
1
2
2
1
2
x
2
2
x
1
2
(2)
By taking the natural logarithm of both sides of the LRT and
rarranging terms, the decision rule is not changed, and we
obtain
x
≷
ℋ
0
ℋ
1
lnη+12≡γ
x
≷
ℋ
0
ℋ
1
η
1
2
γ
Thus, the optimal rule is in fact a thresholding rule like we
considered in
Example 1. The false-alarm
probability was seen to be
P
F
=Qγ
P
F
Q
γ
Thus, we may express the value of
γγ required by the
Neyman-Pearson lemma in terms of
αα:
γ=Q-1α
γ
Q
α
For hypothesis testing involving multiple or
vector-valued data, direct evaluation of the size
(
P
F
P
F
) and power
(
P
D
P
D
)
of a Neyman-Pearson decision rule would require integration
over multi-dimensional, and potentially complicated decision
regions. In many cases, however, this can be avoided by
simplifying the LRT to a test of the form
t
≷
ℋ
0
ℋ
1
γ
t
≷
ℋ
0
ℋ
1
γ
where the test statistic
t=Tx
t
T
x
is a sufficient
statistic for the data. Such a simplified form is
arrived at by modifying both sides of the LRT with
montonically increasing transformations, and by algebraic
simplifications. Since the modifications do not change the
decision rule, we may calculate
P
F
P
F
and
P
D
P
D
in terms of the sufficient statistic. For
example, the false-alarm probability may be written
P
F
=Pr
declare
ℋ
1
=∫∀t,t>γ
f
0
tdt
P
F
declare
ℋ
1
t
t
t
γ
f
0
t
(3)
where
f
0
t
f
0
t
denotes the density of
t
t under
ℋ
0
ℋ
0
. Since
t
t is typically of lower dimension than
xx, evaluation of
P
F
P
F
and
P
D
P
D
can be greatly simplified. The key is being able to reduce the
LRT to a threshold test involving a sufficient statistic
for which we know the distribution.
Let's design a Neyman-Pearson decision rule
of size αα for the
problem
ℋ
0
:
x∼0σ2I
ℋ
0
:
x
0
σ
2
I
ℋ
1
:
x∼μ1σ2I
ℋ
1
:
x
μ
1
σ
2
I
where
μ>0
μ
0
,
σ2>0
σ
2
0
are known,
0=0…0T
0
0
…
0
,
1=1…1T
1
1
…
1
are NN-dimensional
vectors, and II
is the N
N×NN identity
matrix. The likelihood ratio is
Λx=∏n=1N12πσ2ⅇ-
x
n
−μ22σ2∏n=1N12πσ2ⅇ-
x
n
22σ2=ⅇ-∑n=1N
x
n
−μ22σ2ⅇ-∑n=1N
x
n
22σ2=ⅇ12σ2∑n=1N2
x
n
μ−μ2=ⅇ1σ2-Nμ22+μ∑n=1N
x
n
Λ
x
n
1
N
1
2
σ
2
x
n
μ
2
2
σ
2
n
1
N
1
2
σ
2
x
n
2
2
σ
2
n
1
N
x
n
μ
2
2
σ
2
n
1
N
x
n
2
2
σ
2
1
2
σ
2
n
1
N
2
x
n
μ
μ
2
1
σ
2
N
μ
2
2
μ
n
1
N
x
n
(4)
To simplify the test further we may apply the natural
logarithm and rearrange terms to obtain
t≡∑n=1N
x
n
≷
ℋ
0
ℋ
1
σ2μlnη+Nμ2≡γ
t
n
1
N
x
n
≷
ℋ
0
ℋ
1
σ
2
μ
η
N
μ
2
γ
We have used the assumption
μ>0
μ
0
. If
μ<0
μ
0
, then division by
μμ is not a
monotonically increasing operation, and the inequalities
would be reversed.
The test statistic
tt is
sufficient for the unknown
mean. To set the threshold
γγ, we write the
false-alarm probability (size) as
P
F
=Prt>γ=∫∀t,t>γ
f
0
tdt
P
F
t
γ
t
t
t
γ
f
0
t
To evaluate
P
F
P
F
, we need to know the density of
tt under
ℋ
0
ℋ
0
. Fortunately,
tt
is the sum of normal variates, so it is again normally
distributed. In particular, we have
t=Ax
t
A
x
, where
A=1T
A
1
, so
t∼A0Aσ2IAT=0Nσ2
t
A
0
A
σ
2
I
A
0
N
σ
2
under
ℋ
0
ℋ
0
. Therefore, we may write
P
F
P
F
in terms of the
Q-function as
P
F
=QγNσ
P
F
Q
γ
N
σ
The threshold is thus determined by
γ=NσQ-1α
γ
N
σ
Q
α
Under
ℋ
1
ℋ
1
, we have
t∼A1Aσ2IAT=NμNσ2
t
A
1
A
σ
2
I
A
N
μ
N
σ
2
and so the detection probability (power) is
P
D
=Prt>γ=Qγ−NμNσ
P
D
t
γ
Q
γ
N
μ
N
σ
Writing
P
D
P
D
as a function of
P
F
P
F
, the ROC curve is given by
P
D
=QQ-1
P
F
−Nμσ
P
D
Q
Q
P
F
N
μ
σ
The quantity
Nμσ
N
μ
σ
is called the
signal-to-noise
ratio. As its name suggests, a larger SNR
corresponds to improved performance of the Neyman-Pearson
decision rule.
In the context of signal processing, the
foregoing problem may be viewed as the problem of detecting a
constant (DC) signal in
additive white
Gaussian noise:
ℋ
0
:
x
n
=
w
n
,
n
=
1
,
…
,
N
ℋ
0
:
x
n
w
n
,
n
=
1
,
…
,
N
ℋ
1
:
x
n
=A+
w
n
,
n
=
1
,
…
,
N
ℋ
1
:
x
n
A
w
n
,
n
=
1
,
…
,
N
where
AA is a known, fixed
amplitude, and
w
n
∼0σ2
w
n
0
σ
2
. Here
AA corresponds
to the mean
μμ in the
example.
In our initial statement of the Neyman-Pearson
Lemma, we assumed that for all
ηη, the set
{x|Λx=η}
x
Λ
x
η
x
had probability zero under
ℋ
0
ℋ
0
. This eliminated many important problems
from consideration, including tests of discrete data.
In this section we remove this restriction.
It is helpful to introduce a more general way of
writing decision rules. Let φφ
be a function of the data xx with
φx∈01
φ
x
0
1
. φφ defines the
decision rule "declare
ℋ
1
ℋ
1
with probability
φx
φ
x
." In other words, upon observing xx, we flip a
"
φx
φ
x
coin." If it turns up heads, we declare
ℋ
1
ℋ
1
; otherwise we declare
ℋ
0
ℋ
0
. Thus far, we have only considered rules with
φx∈01
φ
x
0
1
Consider the hypothesis testing problem
ℋ
0
:
x∼
f
0
x
ℋ
0
:
x
f
0
x
ℋ
1
:
x∼
f
1
x
ℋ
1
:
x
f
1
x
where
f
0
f
0
and
f
1
f
1
are both pdfs or both pmfs. Let
α∈01
α
0
1
be the size (false-alarm probability)
constraint. The decision rule
φx=1ifΛx>ηρifΛx=η0ifΛx<η
φ
x
1
Λ
x
η
ρ
Λ
x
η
0
Λ
x
η
is the most powerful test of size
αα, where
ηη and
ρρ
are uniquely determined by requiring
P
F
=α
P
F
α
. If
α=0
α
0
, we take
η=∞
η
,
ρ=0
ρ
0
. This test is unique up to sets of probability
zero under
ℋ
0
ℋ
0
and
ℋ
1
ℋ
1
.
When
PrΛx=η>0
Λ
x
η
0
for certain
η
η, we choose
ηη and
ρρ as follows: Write
P
F
=PrΛx>η+ρPrΛx=η
P
F
Λ
x
η
ρ
Λ
x
η
Choose
ηη such that
PrΛx>η≤α≤PrΛx≥η
Λ
x
η
α
Λ
x
η
Then choose
ρρ such that
ρPrΛx=η=α−PrΛx<η
ρ
Λ
x
η
α
Λ
x
η
Suppose we have a friend who is trying to
transmit a bit (0 or 1) to us over a noisy channel. The
channel causes an error in the transmission (that is, the bit
is flipped) with probability pp,
where
0≤p<12
0
p
1
2
, and pp is known. In
order to increase the chance of a successful transmission,
our friend sends the same bit
NN times. Assume the
NN transmissions are
statistically independent. Under these assumptions, the bits
you receive are Bernoulli random variables:
x
n
∼Bernoulliθ
x
n
Bernoulli
θ
. We are faced with the following hypothesis test:
ℋ
0
:
θ=p
(0 sent)
ℋ
0
:
θ
p
(0 sent)
ℋ
1
:
θ=1−p
(1 sent)
ℋ
1
:
θ
1
p
(1 sent)
We decide to decode the received sequence
x=
x
1
…
x
N
T
x
x
1
…
x
N
by designing a Neyman-Pearson rule. The likelihood ratio is
Λx=∏n=1N1−p
x
n
p1−
x
n
∏n=1Np
x
n
1−p1−
x
n
=1−pkpN−kpk1−pN−k=1−pp2k−N
Λ
x
n
1
N
1
p
x
n
p
1
x
n
n
1
N
p
x
n
1
p
1
x
n
1
p
k
p
N
k
p
k
1
p
N
k
1
p
p
2
k
N
(5)
where
k=∑n=1N
x
n
k
n
1
N
x
n
is the number of 1s received.
The LRT is
1−pp2k−N
⋛
ℋ
0
ℋ
1
η
1
p
p
2
k
N
⋛
ℋ
0
ℋ
1
η
Taking the natural logarithm of both sides and rearranging,
we have
k
⋛
ℋ
0
ℋ
1
N2+12lnηln1−pp=γ
k
⋛
ℋ
0
ℋ
1
N
2
1
2
η
1
p
p
γ
The false alarm probability is
P
F
=Prk>γ+ρPrk=γ=∑k=γ+1NNkpk1−pN−k+ρNγpγ1−pN−γ
P
F
k
γ
ρ
k
γ
k
γ
1
N
N
k
p
k
1
p
N
k
ρ
N
γ
p
γ
1
p
N
γ
(6)
γγ and
ρρ are chosen so that
P
F
=α
P
F
α
, as described above.
The corresponding detection probability is
P
D
=Prk>γ+ρPrk=γ=∑k=γ+1NNk1−pkpN−k+ρNγ1−pγpN−γ
P
D
k
γ
ρ
k
γ
k
γ
1
N
N
k
1
p
k
p
N
k
ρ
N
γ
1
p
γ
p
N
γ
(7)
Design a hypothesis testing problem
involving continous random variables such that
PrΛx=η>0
Λ
x
η
0
for certain values of
ηη. Write down the
false-alarm probability as a function of the
threshold. Make as general a statement as possible about
when the technical
condition is satisfied.
Consider the scalar hypothesis testing problem
ℋ
0
:
x∼
f
0
x
ℋ
0
:
x
f
0
x
ℋ
1
:
x∼
f
1
x
ℋ
1
:
x
f
1
x
where
f
i
x=1π1+x−i2
,
i=01
f
i
x
1
1
x
i
2
,
i
0
1
Write down the likelihood ratio test.
Determine the decision regions as a function of
η
1
η
1
for all
η>0
η
0
. Draw a representative of each. What are the
"critical" values of
ηη?
There are five distinct cases.
Compute the size and power
(
P
F
P
F
and
P
D
P
D
) in terms of the threshold
η
1
η
1
and plot the ROC.
∫11+x2dx=arctanx
x
1
1
x
2
x
Suppose we decide to use a simple
threshold test
x
≷
ℋ
0
ℋ
1
η
x
≷
ℋ
0
ℋ
1
η
instead of the Neyman-Pearson rule. Does our
performance
ℋ
0
ℋ
0
suffer much? Plot the ROC for this decision
rule on the same graph as for the previous ROC.
Suppose we observe
NN independent realizations
of a Poisson random variable
kk with intensity parameter
λλ:
fk=ⅇ-λλkk!
f
k
λ
λ
k
k
We must decide which of two intensities is in effect:
ℋ
0
:
λ=
λ
0
ℋ
0
:
λ
λ
0
ℋ
1
:
λ=
λ
1
ℋ
1
:
λ
λ
1
where
λ
0
<
λ
1
λ
0
λ
1
.
Write down the likelihood ratio test.
Simplify the LRT to a test statistic
involving only a sufficient statistic. Apply a monotonically
increasing transformation to simplify further.
Determine the distribution of the sufficient
statistic under both hypotheses.
Use the
characteristic function to show that a sum of IID Poisson
variates is again Poisson distributed.
Derive an expression for the probability of
error.
Assuming the two hypotheses are equally likely, and
λ
0
=5
λ
0
5
and
λ
1
=6
λ
1
6
, what is the minimum number
NN of observations needed
to attain a false-alarm probability no greater than
0.01?
If you have numerical trouble,
try rewriting the log-factorial so as to avoid
evaluating the factorial of large
integers.
In Example 3, suppose
p=0.1
p
0.1
. What is the smallest value of
NN needed to ensure
P
F
≤0.01
P
F
0.01
? What is
P
D
P
D
in this case?