Situations occur frequently where assigning or measuring the a priori probabilities $P_i$ is unreasonable. For example, just what is the a priori probability of a supernova occurring in any particular region of the sky? We clearly need a model evaluation procedure that can function without a priori probabilities. This kind of test results when the so-called Neyman-Pearson criterion is used to derive the decision rule. The ideas behind the Neyman-Pearson criterion (Neyman and Pearson), and the decision rules derived with it, will serve us well in the sequel; their result is important!
Using nomenclature from radar, where model $\mathcal{M}_1$ represents the presence of a target and $\mathcal{M}_0$ its absence, the various types of correct and incorrect decisions have the following names (Woodward, pp. 127-129).
- Detection: we say it's there when it is; $P_D = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_1 \text{ true}]$
- False-alarm: we say it's there when it's not; $P_F = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_0 \text{ true}]$
- Miss: we say it's not there when it is; $P_M = \Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_1 \text{ true}]$
The remaining probability $\Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_0 \text{ true}]$ has historically been left nameless and equals $1 - P_F$. We should also note that the detection and miss probabilities are related by $P_M = 1 - P_D$. As these are conditional probabilities, they do not depend on the a priori probabilities, and the two probabilities $P_F$ and $P_D$ characterize the errors when any decision rule is used.
These two probabilities are related to each other in an
interesting way. Expressing these quantities in terms of the
decision regions and the likelihood functions, we have
$$P_F = \int_{\Re_1} p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r}$$
$$P_D = \int_{\Re_1} p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})\,d\mathbf{r}$$
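These integrals can be checked numerically. The sketch below assumes a hypothetical scalar problem in which $\mathcal{M}_0$ is $\mathcal{N}(0,1)$, $\mathcal{M}_1$ is $\mathcal{N}(1.5,1)$, and $\Re_1$ is the half-line $r > t$. It approximates both integrals with the trapezoidal rule. The helper names (`gauss_pdf`, `prob_over_region`) and every numeric value are illustrative choices and are not part of the original development.

```python
import math

def gauss_pdf(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def prob_over_region(mean, sigma, lo, hi, n=20000):
    """Trapezoidal approximation of the integral of the density over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (gauss_pdf(lo, mean, sigma) + gauss_pdf(hi, mean, sigma))
    for k in range(1, n):
        total += gauss_pdf(lo + k * h, mean, sigma)
    return total * h

# Hypothetical scalar example: M0 is N(0, 1), M1 is N(1.5, 1), and the decision
# region R1 is r > t (truncated at r = 12, where both densities are negligible).
sigma, mean1, t = 1.0, 1.5, 1.0
p_f = prob_over_region(0.0, sigma, t, 12.0)    # integral of p(r | M0) over R1
p_d = prob_over_region(mean1, sigma, t, 12.0)  # integral of p(r | M1) over R1
print(f"P_F = {p_f:.4f}, P_D = {p_d:.4f}")
```

For $t = 1$ the two estimates should agree with the exact Gaussian tail areas, about 0.1587 and 0.6915.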
As the region $\Re_1$ shrinks, both of these probabilities tend toward zero; as $\Re_1$ expands to engulf the entire range of observation values, they both tend toward unity. This rather direct relationship between $P_D$ and $P_F$ does not mean that they equal each other; in most cases, as $\Re_1$ expands, $P_D$ increases more rapidly than $P_F$ (we had better be right more often than we are wrong!). However, the "ultimate" situation where a rule is always right and never wrong ($P_D = 1$, $P_F = 0$) cannot occur when the conditional distributions overlap. Thus, to increase the detection probability we must also allow the false-alarm probability to increase. This behavior represents the fundamental tradeoff in hypothesis testing and detection theory.
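The tradeoff can be seen by sweeping the boundary of $\Re_1$. This minimal sketch assumes the same kind of hypothetical scalar problem ($\mathcal{M}_0$: $\mathcal{N}(0,1)$, $\mathcal{M}_1$: $\mathcal{N}(d,1)$ with $d = 1.5$) and the region $\Re_1 = \{r > t\}$. It uses closed-form Gaussian tail areas, and the threshold values are arbitrary.

```python
import math

def Q(x):
    """Tail probability of a zero-mean, unit-variance Gaussian: Pr[X > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# Hypothetical scalar problem: M0 is N(0, 1), M1 is N(d, 1), and R1 = {r > t}.
# Lowering t enlarges R1.
d = 1.5
thresholds = [3.0, 2.0, 1.0, 0.0, -1.0, -2.0]
pairs = [(Q(t), Q(t - d)) for t in thresholds]   # (P_F, P_D) for each t
for t, (p_f, p_d) in zip(thresholds, pairs):
    print(f"t = {t:+.1f}:  P_F = {p_f:.4f}  P_D = {p_d:.4f}")
```

As $t$ decreases, both probabilities grow together. $P_D$ stays above $P_F$, but no threshold reaches $P_D = 1$ with $P_F = 0$, because the two densities overlap.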
One can attempt to impose a performance criterion that depends only on these probabilities, so that the resulting decision rule does not depend on the a priori probabilities. The Neyman-Pearson criterion assumes that the false-alarm probability is constrained to be less than or equal to a specified value $\alpha$ while we attempt to maximize the detection probability $P_D$:
$$\max_{\Re_1} P_D \quad \text{subject to} \quad P_F \le \alpha$$
A subtlety of the succeeding solution is that the underlying probability distribution functions may not be continuous, with the result that $P_F$ may never exactly equal the constraining value $\alpha$. Furthermore, an (unlikely) possibility is that the optimum value for the false-alarm probability is somewhat less than the criterion value. Assume, therefore, that we rephrase the optimization problem by requiring that the false-alarm probability equal a value $\alpha'$ that is less than or equal to $\alpha$.
This optimization problem can be solved using Lagrange multipliers (see Constrained Optimization); we seek to find the decision rule that maximizes
$$F = P_D + \lambda\,(P_F - \alpha')$$
where $\lambda$ is the Lagrange multiplier. This optimization technique amounts to finding the decision rule that maximizes $F$, then finding the value of the multiplier that allows the criterion to be satisfied. As is usual in the derivation of optimum decision rules, we maximize these quantities with respect to the decision regions. Expressing $P_D$ and $P_F$ in terms of them, we have
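$$F = \int_{\Re_1} p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})\,d\mathbf{r} + \lambda\left(\int_{\Re_1} p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r} - \alpha'\right) = -\lambda\alpha' + \int_{\Re_1}\left[p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}) + \lambda\,p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\right]d\mathbf{r} \tag{2}$$
To maximize this quantity with respect to $\Re_1$, we need only integrate over those regions of $\mathbf{r}$ where the integrand is positive. The region $\Re_1$ thus corresponds to those values of $\mathbf{r}$ where
$$p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}) > -\lambda\,p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})$$
and the resulting decision rule is
$$\frac{p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})}{p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} -\lambda$$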
The ubiquitous likelihood ratio test again appears; it is indeed the fundamental quantity in hypothesis testing. Using the logarithm of the likelihood ratio or the sufficient statistic, this result can be expressed as either
$$\ln \Lambda(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \ln(-\lambda)$$
or
$$\Upsilon(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$
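The three forms make identical decisions because the logarithm, and the map from $\ln\Lambda$ to the sufficient statistic, preserve order. The sketch below checks this on a grid of observations for a hypothetical scalar problem with $\mathcal{M}_0$: $\mathcal{N}(0,\sigma^2)$ and $\mathcal{M}_1$: $\mathcal{N}(d,\sigma^2)$. For this problem $\ln\Lambda(r) = (dr - d^2/2)/\sigma^2$, so the sufficient statistic is $r$ itself. The variable `eta` stands in for $-\lambda$, and its value, like `d`, `sigma`, and the grid, is arbitrary.

```python
import math

def gauss_pdf(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical scalar problem: M0 is N(0, sigma^2), M1 is N(d, sigma^2).
sigma, d = 1.0, 1.5

def p0(r):
    return gauss_pdf(r, 0.0, sigma)

def p1(r):
    return gauss_pdf(r, d, sigma)

def lrt(r, eta):
    """Likelihood ratio test: 1 means 'say M1', 0 means 'say M0'."""
    return 1 if p1(r) / p0(r) > eta else 0

def log_lrt(r, eta):
    """The same test written with the log-likelihood ratio."""
    return 1 if math.log(p1(r)) - math.log(p0(r)) > math.log(eta) else 0

def sufficient_stat_test(r, eta):
    """ln Lambda(r) = (d r - d^2/2) / sigma^2, so the test reduces to r > gamma."""
    gamma = sigma ** 2 * math.log(eta) / d + d / 2
    return 1 if r > gamma else 0

eta = 2.0   # plays the role of -lambda
grid = [-3.0 + 0.0137 * k for k in range(600)]
decisions = [(lrt(r, eta), log_lrt(r, eta), sufficient_stat_test(r, eta)) for r in grid]
print(all(a == b == c for a, b, c in decisions))
```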
We have not as yet found a value for the threshold. The false-alarm probability can be expressed in terms of the Neyman-Pearson threshold in two (useful) ways:
$$P_F = \int_{-\lambda}^{\infty} p_{\Lambda|\mathcal{M}_0}(\Lambda)\,d\Lambda = \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\,d\Upsilon \tag{3}$$
One of these implicit equations must be solved for the threshold by setting $P_F$ equal to $\alpha'$. The choice between them is usually pragmatic: use whichever is easier to compute. From the previous discussion of the relationship between the detection and false-alarm probabilities, we find that to maximize $P_D$ we must allow $\alpha'$ to be as large as possible while remaining less than or equal to $\alpha$. Thus, we want to find the smallest value of $-\lambda$ (note the minus sign) consistent with the constraint. Computation of the threshold is problem-dependent, but a solution always exists.
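The need for $\alpha'$ shows up as soon as the observation is discrete. The sketch below assumes a hypothetical count-valued observation that is Poisson with mean 2 under $\mathcal{M}_0$ and mean 6 under $\mathcal{M}_1$. The names `rate0` and `rate1` avoid a clash with the multiplier $\lambda$, and the truncation point `k_max` is an assumption. $P_F$ changes only when the threshold crosses a value taken by the likelihood ratio, so the sketch searches those values for the smallest admissible threshold $-\lambda$. With $\alpha = 0.05$, the best achievable false-alarm probability $\alpha'$ is about 0.017. The next smaller threshold gives about 0.053, which violates the constraint, so no threshold attains exactly 0.05.

```python
import math

def poisson_pmf(k, rate):
    """Probability that a Poisson random variable with the given mean equals k."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Hypothetical discrete problem: counts above k_max carry negligible probability.
rate0, rate1, alpha, k_max = 2.0, 6.0, 0.05, 60
counts = range(k_max + 1)
p0 = [poisson_pmf(k, rate0) for k in counts]
p1 = [poisson_pmf(k, rate1) for k in counts]
ratio = [a / b for a, b in zip(p1, p0)]          # likelihood ratio Lambda(r)

def p_false_alarm(eta):
    """P_F for the rule 'say M1 when Lambda(r) > eta'."""
    return sum(q for q, lam in zip(p0, ratio) if lam > eta)

# The smallest threshold satisfying P_F <= alpha is one of the values of Lambda(r).
eta = min(lam for lam in ratio if p_false_alarm(lam) <= alpha)
alpha_prime = p_false_alarm(eta)
p_detect = sum(q for q, lam in zip(p1, ratio) if lam > eta)
print(f"threshold = {eta:.3f}, alpha' = {alpha_prime:.4f}, P_D = {p_detect:.4f}")
```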
An important application of the likelihood ratio test occurs when $\mathbf{r}$ is a Gaussian random vector for each model. Suppose the models correspond to Gaussian random vectors having different mean values but sharing the same covariance matrix $\sigma^2\mathbf{I}$, a scaled identity.
- $\mathcal{M}_0$: $\mathbf{r} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$
- $\mathcal{M}_1$: $\mathbf{r} \sim \mathcal{N}(\mathbf{m}, \sigma^2\mathbf{I})$
Thus, $\mathbf{r}$ is of dimension $L$ and has statistically independent, equal-variance components. The vector of means $\mathbf{m} = [m_0, \ldots, m_{L-1}]^T$ distinguishes the two models. The likelihood functions associated with this problem are
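$$p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left(\frac{r_l}{\sigma}\right)^2\right\}$$
$$p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left(\frac{r_l - m_l}{\sigma}\right)^2\right\}$$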
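The likelihood ratio $\Lambda(\mathbf{r})$ becomes
$$\Lambda(\mathbf{r}) = \frac{\prod_{l=0}^{L-1} \exp\left\{-\frac{1}{2}\left(\frac{r_l - m_l}{\sigma}\right)^2\right\}}{\prod_{l=0}^{L-1} \exp\left\{-\frac{1}{2}\left(\frac{r_l}{\sigma}\right)^2\right\}}$$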
This expression for the likelihood ratio is complicated. In the Gaussian case (and many others), we use the logarithm to reduce the complexity of the likelihood ratio and form a sufficient statistic.
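$$\ln \Lambda(\mathbf{r}) = \sum_{l=0}^{L-1}\left[-\frac{1}{2}\frac{(r_l - m_l)^2}{\sigma^2} + \frac{1}{2}\frac{r_l^2}{\sigma^2}\right] = \frac{1}{\sigma^2}\sum_{l=0}^{L-1} m_l r_l - \frac{1}{2\sigma^2}\sum_{l=0}^{L-1} m_l^2 \tag{4}$$
The likelihood ratio test then has the much simpler, but equivalent, form
$$\sum_{l=0}^{L-1} m_l r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \sigma^2 \ln\eta + \frac{1}{2}\sum_{l=0}^{L-1} m_l^2$$
where $\eta$ denotes the likelihood ratio threshold ($\eta = -\lambda$ in the Neyman-Pearson test).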
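Equation (4) can be checked numerically against a direct difference of log-likelihoods. The sketch below assumes a hypothetical dimension $L = 8$, noise level $\sigma = 1.3$, a randomly drawn mean vector, and one observation drawn under $\mathcal{M}_0$. The function names are illustrative.

```python
import math
import random

def log_likelihood(r, means, sigma):
    """ln p(r) for independent Gaussian components with common variance sigma^2."""
    sq = sum((x - mu) ** 2 for x, mu in zip(r, means))
    return -0.5 * len(r) * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)

def log_lr_eq4(r, m, sigma):
    """Closed form of Equation (4): (1/sigma^2) sum m_l r_l - (1/(2 sigma^2)) sum m_l^2."""
    cross = sum(ml * rl for ml, rl in zip(m, r))
    energy = sum(ml * ml for ml in m)
    return cross / sigma ** 2 - energy / (2 * sigma ** 2)

random.seed(0)
L, sigma = 8, 1.3                                    # hypothetical dimension and noise level
m = [random.uniform(-1.0, 1.0) for _ in range(L)]    # hypothetical mean vector
r = [random.gauss(0.0, sigma) for _ in range(L)]     # one observation drawn under M0

direct = log_likelihood(r, m, sigma) - log_likelihood(r, [0.0] * L, sigma)
closed = log_lr_eq4(r, m, sigma)
print(direct, closed)
```

The normalizing constants cancel in the difference, which is why only the cross term and the energy term survive in Equation (4).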
To focus on the model evaluation aspects of this problem, let's assume the means all equal a positive constant: $m_l = m$ $(m > 0)$.
$$\sum_{l=0}^{L-1} r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\sigma^2 \ln\eta}{m} + \frac{Lm}{2}$$
Note that all that need be known about the observations $r_l$ is their sum. This quantity is the sufficient statistic for the Gaussian problem: $\Upsilon(\mathbf{r}) = \sum_l r_l$ and $\gamma = \dfrac{\sigma^2 \ln\eta}{m} + \dfrac{Lm}{2}$.
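With equal means, the whole test is a comparison of one sum with one number. This minimal sketch assumes hypothetical values $L = 4$, $m = 0.5$, $\sigma = 1$, and $\eta = 1$. The function names are illustrative.

```python
import math

def threshold_gamma(L, m, sigma, eta):
    """gamma = sigma^2 ln(eta) / m + L m / 2, valid for equal means m > 0."""
    return sigma ** 2 * math.log(eta) / m + L * m / 2

def decide(r, m, sigma, eta):
    """Return 1 (say M1) when the sufficient statistic sum(r) exceeds gamma, else 0."""
    return 1 if sum(r) > threshold_gamma(len(r), m, sigma, eta) else 0

print(threshold_gamma(4, 0.5, 1.0, 1.0))               # 1.0
print(decide([0.2, 0.9, -0.1, 0.4], 0.5, 1.0, 1.0))    # sum 1.4 exceeds 1.0, so 1
```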
When trying to compute the probability of error or the
threshold in the Neyman-Pearson criterion, we must find the
conditional probability density of one of the decision
statistics: the likelihood ratio, the log-likelihood, or the
sufficient statistic. The log-likelihood and the sufficient
statistic are quite similar in this problem, but clearly we
should use the latter. One practical property of the
sufficient statistic is that it usually simplifies
computations. For this Gaussian example, the sufficient
statistic is a Gaussian random variable under each model.
- $\mathcal{M}_0$: $\Upsilon(\mathbf{r}) \sim \mathcal{N}(0, L\sigma^2)$
- $\mathcal{M}_1$: $\Upsilon(\mathbf{r}) \sim \mathcal{N}(Lm, L\sigma^2)$
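A quick Monte Carlo check confirms these two distributions. The sketch below assumes hypothetical values $L = 10$, $m = 0.4$, and $\sigma = 2$, so theory predicts means of 0 and 4 and a variance of 40 under both models. The sample statistics will only approximate these values.

```python
import random
import statistics

random.seed(1)
L, m, sigma, trials = 10, 0.4, 2.0, 20000   # hypothetical parameters

def draw_statistic(mean):
    """One realization of Upsilon(r): the sum of L independent N(mean, sigma^2) samples."""
    return sum(random.gauss(mean, sigma) for _ in range(L))

under_m0 = [draw_statistic(0.0) for _ in range(trials)]
under_m1 = [draw_statistic(m) for _ in range(trials)]
for name, xs in (("M0", under_m0), ("M1", under_m1)):
    print(name, round(statistics.fmean(xs), 2), round(statistics.variance(xs), 1))
# Theory: means 0 and L*m = 4.0; variance L*sigma^2 = 40.0 under both models.
```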
To find the probability of error from Equation 1, we must evaluate the area under a Gaussian probability density function. These integrals are succinctly expressed in terms of $Q(x)$, which denotes the probability that a unit-variance, zero-mean Gaussian random variable exceeds $x$ (see Probability and Stochastic Processes). As $1 - Q(x) = Q(-x)$, the probability of error can be written as
$$P_e = \pi_1\, Q\!\left(\frac{Lm - \gamma}{\sqrt{L}\,\sigma}\right) + \pi_0\, Q\!\left(\frac{\gamma}{\sqrt{L}\,\sigma}\right)$$
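The expression above is easy to evaluate with the complementary error function, since $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$. The sketch below assumes the minimum-probability-of-error threshold $\eta = \pi_0/\pi_1$, which is what makes $\gamma = Lm/2$ when the priors are equal. It also assumes hypothetical values $L = 4$, $m = \sigma = 1$, and $\pi_0 = 0.3$. It confirms that nudging $\gamma$ either way raises $P_e$.

```python
import math

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def error_probability(gamma, L, m, sigma, pi0):
    """P_e for the test sum(r) > gamma in the equal-mean Gaussian problem."""
    pi1 = 1.0 - pi0
    s = math.sqrt(L) * sigma
    return pi1 * Q((L * m - gamma) / s) + pi0 * Q(gamma / s)

def min_error_threshold(L, m, sigma, pi0):
    """gamma = sigma^2 ln(eta)/m + L m/2 with the assumed threshold eta = pi0/pi1."""
    eta = pi0 / (1.0 - pi0)
    return sigma ** 2 * math.log(eta) / m + L * m / 2

L, m, sigma, pi0 = 4, 1.0, 1.0, 0.3      # hypothetical values
g = min_error_threshold(L, m, sigma, pi0)
print(g, error_probability(g, L, m, sigma, pi0))
```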
An interesting special case occurs when $\pi_0 = 1/2 = \pi_1$. In this case, $\gamma = Lm/2$ and the probability of error becomes
$$P_e = Q\!\left(\frac{\sqrt{L}\,m}{2\sigma}\right)$$
As $Q(\cdot)$ is a monotonically decreasing function, the probability of error decreases with increasing values of the ratio $\sqrt{L}\,m/(2\sigma)$. However, as shown in this figure, $Q(\cdot)$ decreases in a nonlinear fashion. Thus, increasing $m$ by a factor of two may decrease the probability of error by a factor larger or smaller than two; the amount of change depends on the initial value of the ratio.
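The nonlinearity is easy to quantify. Write $d = \sqrt{L}\,m/(2\sigma)$ for the ratio in the equal-prior error probability. The sketch below doubles $m$, which doubles $d$, at a few arbitrarily chosen starting ratios and reports how much $P_e$ drops.

```python
import math

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# Equal priors: P_e = Q(d) with d = sqrt(L) m / (2 sigma); doubling m doubles d.
factors = {}
for d in (0.25, 1.0, 2.0):
    before, after = Q(d), Q(2 * d)
    factors[d] = before / after
    print(f"d = {d}: P_e {before:.3g} -> {after:.3g} (reduced by a factor of {factors[d]:.1f})")
```

For a small starting ratio, the error probability falls by less than a factor of two. For larger ratios, it falls by a much larger factor.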
To find the threshold for the Neyman-Pearson test from the expressions given in Equation 3, we need the area under a Gaussian density.
$$P_F = Q\!\left(\frac{\gamma}{\sqrt{L\sigma^2}}\right) = \alpha' \tag{5}$$
As $Q(\cdot)$ is a monotonic and continuous function, we can now set $\alpha'$ equal to the criterion value $\alpha$ with the result
$$\gamma = \sqrt{L}\,\sigma\, Q^{-1}(\alpha)$$
where $Q^{-1}(\cdot)$ denotes the inverse function of $Q(\cdot)$. This equation cannot be solved analytically, as no closed-form expression exists for $Q(\cdot)$ (much less its inverse function); the value of $Q^{-1}(\alpha)$ must be found from tables or numerical routines.
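As one such numerical routine, Python's standard library provides the inverse Gaussian CDF `statistics.NormalDist().inv_cdf`, and $Q^{-1}(p) = -\Phi^{-1}(p)$, where $\Phi$ is the standard Gaussian CDF. The sketch below computes the Neyman-Pearson threshold for hypothetical values $L = 16$, $\sigma = 0.5$, and $\alpha = 10^{-3}$. It then substitutes that threshold back into Equation (5) to confirm that the false-alarm probability equals $\alpha$.

```python
import math
from statistics import NormalDist

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def Q_inv(p):
    """Inverse of Q for 0 < p < 1, using Q^{-1}(p) = -Phi^{-1}(p)."""
    return -NormalDist().inv_cdf(p)

def np_threshold(L, sigma, alpha):
    """Neyman-Pearson threshold on the sum statistic: gamma = sqrt(L) sigma Q^{-1}(alpha)."""
    return math.sqrt(L) * sigma * Q_inv(alpha)

L, sigma, alpha = 16, 0.5, 1e-3                     # hypothetical values
gamma = np_threshold(L, sigma, alpha)
achieved = Q(gamma / math.sqrt(L * sigma ** 2))     # Equation (5); should equal alpha
print(gamma, achieved)
```

Writing $Q^{-1}(p)$ as $-\Phi^{-1}(p)$ avoids forming $1 - p$, which loses precision when $p$ is very small.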
Because Gaussian problems arise frequently, the accompanying table (Table 1) provides numeric values for this quantity at the decade points.

Table 1: Values of $Q^{-1}(\cdot)$ that can be used to determine thresholds in the Neyman-Pearson variant of the likelihood ratio test. Note how little the inverse function changes for decade changes in its argument; $Q(\cdot)$ is indeed very nonlinear.
| $x$ | $Q^{-1}(x)$ |
|---|---|
| $10^{-1}$ | 1.282 |
| $10^{-2}$ | 2.326 |
| $10^{-3}$ | 3.090 |
| $10^{-4}$ | 3.719 |
| $10^{-5}$ | 4.265 |
| $10^{-6}$ | 4.753 |
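The table entries can be recomputed with the standard library, which is useful when a value between decade points is needed. This sketch rounds to three decimals, as the table does. It relies only on `statistics.NormalDist.inv_cdf`.

```python
from statistics import NormalDist

def Q_inv(p):
    """Inverse of Q(x) = Pr[N(0, 1) > x], via Q^{-1}(p) = -Phi^{-1}(p)."""
    return -NormalDist().inv_cdf(p)

table = [(10.0 ** -k, round(Q_inv(10.0 ** -k), 3)) for k in range(1, 7)]
for x, value in table:
    print(f"{x:.0e}  {value:.3f}")
```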
The detection probability is given by
$$P_D = Q\!\left(Q^{-1}(\alpha) - \frac{\sqrt{L}\,m}{\sigma}\right)$$
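This expression gives the whole Neyman-Pearson operating point in one line. The sketch below evaluates it for a hypothetical case with $\alpha = 10^{-3}$, $L = 16$, $m = 0.5$, and $\sigma = 1$, where $\sqrt{L}\,m/\sigma = 2$ and $P_D$ comes out near 0.138. When $m = 0$, the formula reduces to $P_D = \alpha$, as it should when the two models coincide.

```python
import math
from statistics import NormalDist

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def Q_inv(p):
    """Inverse of Q for 0 < p < 1, using Q^{-1}(p) = -Phi^{-1}(p)."""
    return -NormalDist().inv_cdf(p)

def detection_probability(alpha, L, m, sigma):
    """Neyman-Pearson detection probability for the equal-mean Gaussian problem."""
    return Q(Q_inv(alpha) - math.sqrt(L) * m / sigma)

# Hypothetical operating point: alpha = 1e-3, L = 16, m = 0.5, sigma = 1.
p_d = detection_probability(1e-3, 16, 0.5, 1.0)
print(round(p_d, 3))
```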