The criterion used in the previous section---minimize the
average cost of an incorrect decision---may seem to be a
contrived way of quantifying decisions. Well, often it is. For
example, the Bayesian decision rule depends explicitly on the
a priori probabilities. A rational method of
assigning values to these---either by experiment or through true
knowledge of the relative likelihood of each model---may be
unreasonable. In this section, we develop alternative decision
rules that try to respond to such objections. One essential point
will emerge from these considerations: the likelihood
ratio persists as the core of optimal detectors as
optimization criteria and problem complexity change.
Even criteria remote from
performance error measures can result in the likelihood ratio test.
Such an invariance does not occur often in signal processing and
underlines the likelihood ratio test's importance.
As only one model can describe any given set of data (the
models are mutually exclusive), the probability of being
correct $P_c$ for distinguishing two models is given by
$$P_c = \Pr[\text{say } \mathcal{M}_0 \text{ when } \mathcal{M}_0 \text{ true}] + \Pr[\text{say } \mathcal{M}_1 \text{ when } \mathcal{M}_1 \text{ true}]$$
We wish to determine the optimum decision region
placement.
Expressing the probability of being correct in terms of the
likelihood functions $p_{\mathbf{R}|\mathcal{M}_i}(\mathbf{r})$, the a priori probabilities and
the decision regions, we have
$$P_c = \int_{Z_0} \pi_0\, p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})\, d\mathbf{r} + \int_{Z_1} \pi_1\, p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})\, d\mathbf{r}$$
We want to maximize $P_c$ by selecting the decision regions $Z_0$ and $Z_1$. Mimicking the ideas of the previous section, we associate each value of
$\mathbf{r}$ with the term in the expression for $P_c$ having the largest integrand. Decision region $Z_0$, for example, is defined by the collection of values
of $\mathbf{r}$ for which the first term is largest. As all of the
quantities involved are non-negative, the decision rule
maximizing the probability of a correct decision is
Given $\mathbf{r}$, choose $\mathcal{M}_i$ for which the product $\pi_i\, p_{\mathbf{R}|\mathcal{M}_i}(\mathbf{r})$ is largest.
When we must select among more than two models, this result still applies (prove this for yourself). Simple manipulations lead to the likelihood ratio test when we must decide between two models.
$$\frac{p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})}{p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\pi_0}{\pi_1}$$
Note that if the Bayes' costs were chosen so that $C_{ii} = 0$ and $C_{ij} = C$ ($i \neq j$), the Bayes' cost and the maximum-probability-correct thresholds would be the same.
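The maximum-probability-correct rule can be sketched in a few lines. The example below is only an illustration under stated assumptions: the scalar Gaussian likelihoods (means 0 and 2, unit variance), the prior values, and the function names `choose_model` and `likelihood_ratio_test` are hypothetical and do not come from the text. It checks that picking the largest product $\pi_i\, p_{\mathbf{R}|\mathcal{M}_i}(\mathbf{r})$ agrees with comparing the likelihood ratio to $\pi_0/\pi_1$.

```python
from statistics import NormalDist

def choose_model(r, priors, likelihoods):
    """Return the index i that maximizes priors[i] * likelihoods[i](r)."""
    scores = [pi * p(r) for pi, p in zip(priors, likelihoods)]
    return max(range(len(scores)), key=scores.__getitem__)

def likelihood_ratio_test(r, pi0, pi1, p0, p1):
    """Two-model form: say model 1 when p1(r)/p0(r) exceeds pi0/pi1.

    Written with a product instead of a quotient so p0(r) = 0 is harmless.
    """
    return 1 if p1(r) > (pi0 / pi1) * p0(r) else 0

# Hypothetical scalar example: unit-variance Gaussians with means 0 and 2.
p0 = NormalDist(mu=0.0, sigma=1.0).pdf
p1 = NormalDist(mu=2.0, sigma=1.0).pdf
pi0, pi1 = 0.7, 0.3

for r in (-1.0, 0.5, 1.2, 1.6, 3.0):
    by_product = choose_model(r, [pi0, pi1], [p0, p1])
    by_ratio = likelihood_ratio_test(r, pi0, pi1, p0, p1)
    print(f"r={r:+.1f}  largest product -> M{by_product}  ratio test -> M{by_ratio}")
```

Because `choose_model` takes lists, the same function covers more than two models, which is the case the text leaves as an exercise.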
To evaluate the quality of the decision rule, we usually
compute the probability of error $P_e$ rather than the probability of being correct. This
quantity can be expressed in terms of the observations, the
likelihood ratio, and the sufficient statistic.
$$\begin{aligned}
P_e &= \pi_0 \int_{Z_1} p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})\, d\mathbf{r} + \pi_1 \int_{Z_0} p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})\, d\mathbf{r} \\
&= \pi_0 \int_{\Lambda > \eta} p_{\Lambda|\mathcal{M}_0}(\Lambda)\, d\Lambda + \pi_1 \int_{\Lambda < \eta} p_{\Lambda|\mathcal{M}_1}(\Lambda)\, d\Lambda \\
&= \pi_0 \int_{\Upsilon > \gamma} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\, d\Upsilon + \pi_1 \int_{\Upsilon < \gamma} p_{\Upsilon|\mathcal{M}_1}(\Upsilon)\, d\Upsilon
\end{aligned} \tag{1}$$
These expressions point out that the likelihood
ratio and the sufficient statistic can each be considered a
function of the observations $\mathbf{r}$; hence, they are random variables and have
probability densities for each model.
When the likelihood ratio is non-monotonic, the
first expression is most difficult to evaluate. When
monotonic, the middle expression often proves to be the most difficult.
No matter how it is calculated, no other
decision rule can yield a smaller probability of
error. This statement follows because $P_e = 1 - P_c$: by maximizing
the probability of being correct, we implicitly minimized the probability of error.
From a grander viewpoint, these expressions represent an achievable lower
bound on performance (as assessed by the probability of
error). Furthermore, this probability will be non-zero if the conditional
densities overlap over some range of values of $\mathbf{r}$, such as occurred in the previous example. Within
regions of overlap, the observed values are ambiguous: either
model is consistent with the observations. Our "optimum"
decision rule operates in such regions by selecting the model
that is most likely (has the highest probability) to have
generated the measured data.
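To make the lower-bound claim concrete, here is a minimal sketch that evaluates the first form of the error probability for a scalar observation. The model (Gaussian with mean 0 under $\mathcal{M}_0$ and mean $m$ under $\mathcal{M}_1$, common standard deviation), the numerical values, and the function names are assumptions chosen for illustration. For this case the likelihood ratio test with $\eta = \pi_0/\pi_1$ reduces to comparing $r$ with $m/2 + (\sigma^2/m)\ln(\pi_0/\pi_1)$, and moving the threshold in either direction should only increase $P_e$.

```python
from math import log
from statistics import NormalDist

def error_probability(gamma, pi0, pi1, m, sigma):
    """P_e for the rule 'say model 1 when r > gamma' with a scalar observation."""
    false_alarm = 1.0 - NormalDist(0.0, sigma).cdf(gamma)  # Pr[r > gamma | M0]
    miss = NormalDist(m, sigma).cdf(gamma)                 # Pr[r < gamma | M1]
    return pi0 * false_alarm + pi1 * miss

def optimal_threshold(pi0, pi1, m, sigma):
    """Threshold on r equivalent to the likelihood ratio test with eta = pi0/pi1."""
    return m / 2.0 + (sigma ** 2 / m) * log(pi0 / pi1)

# Hypothetical values, not taken from the text.
pi0, pi1, m, sigma = 0.7, 0.3, 2.0, 1.0
gamma_star = optimal_threshold(pi0, pi1, m, sigma)
for gamma in (gamma_star - 0.5, gamma_star, gamma_star + 0.5):
    p_e = error_probability(gamma, pi0, pi1, m, sigma)
    print(f"gamma={gamma:.3f}  P_e={p_e:.4f}")
```

The error probability stays above zero at every threshold because the two Gaussian densities overlap everywhere.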
Situations occur frequently where assigning or measuring the
a priori probabilities $\pi_i$ is unreasonable. For example, just what is the
a priori probability of a supernova
occurring in any particular region of the sky? We clearly
need a model evaluation procedure that can function without
a priori probabilities. This kind of test
results when the so-called Neyman-Pearson criterion is used to
derive the decision rule.
Using nomenclature from radar, where model $\mathcal{M}_1$ represents the presence of a target and
$\mathcal{M}_0$ its absence, the various types of correct and
incorrect decisions have the following names.
- Detection Probability - we say it's there when it is; $P_D = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_1 \text{ true}]$
- False-alarm Probability - we say it's there when it's not; $P_F = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_0 \text{ true}]$
- Miss Probability - we say it's not there when it is; $P_M = \Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_1 \text{ true}]$
The remaining probability $\Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_0 \text{ true}]$ has historically been left nameless and equals
$1 - P_F$. We should also note that the detection and miss
probabilities are related by $P_M = 1 - P_D$. As these are conditional probabilities, they do
not depend on the a priori probabilities.
Furthermore, the two probabilities $P_F$ and $P_D$ characterize the errors when
any decision rule is used.
These two probabilities are related to each other in an
interesting way. Expressing these quantities in terms of the
decision regions and the likelihood functions, we have
$$P_F = \int_{Z_1} p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})\, d\mathbf{r}$$
$$P_D = \int_{Z_1} p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})\, d\mathbf{r}$$
As the region $Z_1$ shrinks, both of these
probabilities tend toward zero; as $Z_1$ expands to engulf the entire range of observation
values, they both tend toward unity. This rather direct
relationship between $P_D$ and $P_F$ does not mean that they equal each other;
in most cases, as $Z_1$ expands, $P_D$ increases more rapidly than $P_F$
(we had better be right more often than we are
wrong!). However, the "ultimate" situation where a rule is
always right and never wrong ($P_D = 1$, $P_F = 0$) cannot occur when the conditional distributions
overlap. Thus, to increase the detection probability we must
also allow the false-alarm probability to increase. This
behavior represents the fundamental tradeoff in detection theory.
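The tradeoff can be seen numerically by sweeping a threshold. The sketch below assumes, purely for illustration, a scalar observation that is Gaussian with mean 0 under $\mathcal{M}_0$ and mean $m$ under $\mathcal{M}_1$, and a decision region of the form $Z_1 = \{r > \gamma\}$; the values of `m`, `sigma`, and the threshold list are hypothetical. Lowering $\gamma$ expands $Z_1$, and both probabilities rise together.

```python
from statistics import NormalDist

def operating_point(gamma, m, sigma):
    """Return (P_F, P_D) for the decision region Z1 = {r > gamma}."""
    p_f = 1.0 - NormalDist(0.0, sigma).cdf(gamma)  # integral of p(r | M0) over Z1
    p_d = 1.0 - NormalDist(m, sigma).cdf(gamma)    # integral of p(r | M1) over Z1
    return p_f, p_d

# Hypothetical values; Z1 expands as the threshold falls.
m, sigma = 1.0, 1.0
thresholds = [4.0, 2.0, 1.0, 0.0, -2.0, -4.0]
points = [operating_point(g, m, sigma) for g in thresholds]

for g, (p_f, p_d) in zip(thresholds, points):
    print(f"gamma={g:+.1f}  P_F={p_f:.4f}  P_D={p_d:.4f}")
```

In this example $P_D$ stays above $P_F$ at every threshold, yet the pair never reaches $P_D = 1$, $P_F = 0$ because the two densities overlap.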
One can attempt to impose a performance criterion that depends
only on these probabilities with the consequent decision rule
not depending on the a priori
probabilities. The Neyman-Pearson criterion assumes that the
false-alarm probability is constrained to be less than or
equal to a specified value $\alpha$ while we maximize the detection
probability $P_D$.
$$\max_{Z_1} P_D \quad \text{subject to} \quad P_F \le \alpha$$
A subtlety of the solution we are about to obtain is that the
underlying probability distribution functions may not be
continuous, with the consequence that $P_F$ may never equal the constraining value
$\alpha$. Furthermore, an (unlikely) possibility is that the
optimum value for the false-alarm probability is somewhat less
than the criterion value. Assume, therefore, that we rephrase
the optimization problem by requiring that the false-alarm
probability equal a value $\alpha'$
that is the largest possible value less than or equal to $\alpha$.
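A discrete decision statistic shows why $\alpha'$ can fall short of $\alpha$. The following sketch is a hypothetical illustration, not part of the text: it assumes the statistic is Binomial$(n, p_0)$ under $\mathcal{M}_0$, that the decision region has the threshold form $Z_1 = \{k \ge k_0\}$ (no randomization), and the function names are made up for the example. Only a finite set of false-alarm probabilities is attainable, so the search returns the largest one that does not exceed $\alpha$.

```python
from math import comb

def binomial_tail(n, p, k):
    """Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def largest_attainable_false_alarm(n, p0, alpha):
    """Return (k0, alpha_prime) for the region Z1 = {k >= k0}.

    The tail probability is non-increasing in k0, so the first k0 whose
    tail does not exceed alpha gives the largest attainable P_F <= alpha.
    """
    for k0 in range(n + 2):  # k0 = n + 1 is the empty region, P_F = 0
        p_f = binomial_tail(n, p0, k0)
        if p_f <= alpha:
            return k0, p_f

# Hypothetical numbers: 10 trials, p0 = 0.5, constraint alpha = 0.05.
k0, alpha_prime = largest_attainable_false_alarm(10, 0.5, 0.05)
print(f"k0={k0}  alpha'={alpha_prime:.6f}")
```

With these numbers the neighbouring threshold gives a false-alarm probability just above 0.05, so the constraint cannot be met with equality.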
This optimization problem can be solved using
Lagrange
multipliers; we seek to find the decision rule that
maximizes
$$F = P_D - \lambda (P_F - \alpha')$$
where $\lambda$ is a positive Lagrange multiplier. This optimization
technique amounts to finding the decision rule that maximizes
$F$, then finding the value of the multiplier that
allows the criterion to be satisfied. With $\lambda$ positive, maximizing $F$ places the detection probability in competition with false-alarm probabilities
in excess of the criterion value. As is usual in the
derivation of optimum decision rules, we maximize these
quantities with respect to the decision regions. Expressing
$P_D$ and $P_F$ in terms of them, we have
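$$F = \int_{Z_1} p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})\, d\mathbf{r} - \lambda \left( \int_{Z_1} p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})\, d\mathbf{r} - \alpha' \right) = \lambda \alpha' + \int_{Z_1} \left[ p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r}) - \lambda\, p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r}) \right] d\mathbf{r} \tag{2}$$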
To maximize this quantity with respect to $Z_1$, we need only integrate over those regions of
$\mathbf{r}$ where the integrand is positive. The region $Z_1$ thus corresponds to those values of $\mathbf{r}$
where $p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r}) > \lambda\, p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})$
and the resulting decision rule is
$$\frac{p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})}{p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \lambda$$
The ubiquitous likelihood ratio test again appears; it
is indeed the fundamental quantity in
hypothesis testing. Using either the logarithm of the likelihood
ratio or the sufficient statistic, this result can be
expressed as
$$\ln \Lambda(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \ln \lambda$$
or
$$\Upsilon(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$
We have not as yet found a value for the threshold. The
false-alarm probability can be expressed in terms of the
Neyman-Pearson threshold in two (useful) ways.
$$P_F = \int_{\lambda}^{\infty} p_{\Lambda|\mathcal{M}_0}(\Lambda)\, d\Lambda = \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\, d\Upsilon \tag{3}$$
One of these implicit equations must be solved for
the threshold by setting $P_F$ equal to $\alpha'$. The selection of which to use is usually based on
pragmatic considerations: the easiest to compute. From the
previous discussion of the relationship between the detection
and false-alarm probabilities, we find that to maximize $P_D$
we must allow $\alpha'$ to be as large as possible while remaining less than or equal to
$\alpha$. Thus, we want to find the
smallest value of $\lambda$ consistent with the
constraint. Computation of the threshold is
problem-dependent, but a solution always exists.
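As one example of solving the implicit equation, the sketch below assumes that the sufficient statistic is Gaussian under each model, so that the second form of the false-alarm integral can be inverted with the inverse normal distribution function. The Gaussian assumption, the numerical values, and the function names are illustrative choices rather than anything fixed by the text. Because the distribution is continuous here, $\alpha' = \alpha$ and the threshold is the smallest one meeting the constraint.

```python
from statistics import NormalDist

def neyman_pearson_threshold(alpha, mean0, std0):
    """Solve Pr[Y > gamma | M0] = alpha for gamma, assuming Y ~ N(mean0, std0^2) under M0."""
    return NormalDist(mean0, std0).inv_cdf(1.0 - alpha)

def detection_probability(gamma, mean1, std1):
    """Pr[Y > gamma | M1], assuming Y ~ N(mean1, std1^2) under M1."""
    return 1.0 - NormalDist(mean1, std1).cdf(gamma)

# Hypothetical values: unit-variance statistic, mean 0 under M0 and 3 under M1.
alpha = 0.01
gamma = neyman_pearson_threshold(alpha, mean0=0.0, std0=1.0)
p_f = 1.0 - NormalDist(0.0, 1.0).cdf(gamma)
p_d = detection_probability(gamma, mean1=3.0, std1=1.0)
print(f"gamma={gamma:.4f}  P_F={p_f:.4f}  P_D={p_d:.4f}")
```

Note that no a priori probabilities appear anywhere in the computation; the threshold depends only on $\alpha$ and the density under $\mathcal{M}_0$.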
An important application of the likelihood ratio test occurs
when $\mathbf{R}$ is a Gaussian random vector for each model.
Suppose the models correspond to Gaussian random vectors
having different mean values but sharing the same
covariance.
- $\mathcal{M}_0$: $\mathbf{R} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$
- $\mathcal{M}_1$: $\mathbf{R} \sim \mathcal{N}(\mathbf{m}, \sigma^2 \mathbf{I})$
$\mathbf{R}$ is of dimension $L$ and has statistically independent, equi-variance
components. The vector of means $\mathbf{m} = (m_0, \dots, m_{L-1})^T$ distinguishes the two models. The likelihood
functions associated with this problem are
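$$p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r}) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{r_l}{\sigma} \right)^2 \right\}$$
$$p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r}) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{r_l - m_l}{\sigma} \right)^2 \right\}$$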
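The likelihood ratio $\Lambda(\mathbf{r})$ becomes
$$\Lambda(\mathbf{r}) = \frac{\prod_{l=0}^{L-1} \exp\left\{ -\frac{1}{2} \left( \frac{r_l - m_l}{\sigma} \right)^2 \right\}}{\prod_{l=0}^{L-1} \exp\left\{ -\frac{1}{2} \left( \frac{r_l}{\sigma} \right)^2 \right\}}$$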
This expression for the likelihood ratio is
complicated. In the Gaussian case (and many others), we use
the logarithm to reduce the complexity of the likelihood
ratio and form a sufficient statistic.
$$\ln \Lambda(\mathbf{r}) = \sum_{l=0}^{L-1} \left[ -\frac{1}{2} \frac{(r_l - m_l)^2}{\sigma^2} + \frac{1}{2} \frac{r_l^2}{\sigma^2} \right] = \frac{1}{\sigma^2} \sum_{l=0}^{L-1} m_l r_l - \frac{1}{2\sigma^2} \sum_{l=0}^{L-1} m_l^2 \tag{4}$$
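The simplification in equation (4) can be checked numerically. This is a minimal sketch under assumptions: the observation vector, mean vector, and $\sigma$ below are arbitrary illustrative numbers, and the function names are hypothetical. It evaluates the two Gaussian likelihood products directly, takes the logarithm of their ratio, and compares the result with the closed form.

```python
from math import exp, log, pi, sqrt

def gaussian_likelihood(r, mean, sigma):
    """Product of independent N(mean[l], sigma^2) densities evaluated at r[l]."""
    value = 1.0
    for r_l, m_l in zip(r, mean):
        value *= exp(-0.5 * ((r_l - m_l) / sigma) ** 2) / sqrt(2 * pi * sigma**2)
    return value

def log_likelihood_ratio(r, m, sigma):
    """Closed form: (sum of m_l r_l - half the sum of m_l^2) / sigma^2."""
    cross = sum(m_l * r_l for m_l, r_l in zip(m, r))
    energy = sum(m_l**2 for m_l in m)
    return (cross - 0.5 * energy) / sigma**2

# Arbitrary illustrative data (not from the text).
r = [0.3, -1.1, 2.0, 0.7]
m = [1.0, 0.5, -0.5, 2.0]
sigma = 1.5

zero_mean = [0.0] * len(r)
direct = log(gaussian_likelihood(r, m, sigma) / gaussian_likelihood(r, zero_mean, sigma))
closed = log_likelihood_ratio(r, m, sigma)
print(f"direct={direct:.6f}  closed form={closed:.6f}")
```

The closed form avoids multiplying many small density values, which matters for large $L$ where the direct products can underflow.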
The likelihood ratio test then has the much
simpler, but equivalent form
$$\sum_{l=0}^{L-1} m_l r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \sigma^2 \ln \eta + \frac{1}{2} \sum_{l=0}^{L-1} m_l^2$$
To focus on the model evaluation aspects of this
problem, let's assume the means equal each other and are a positive constant: $m_l = m > 0$.
We now have
$$\sum_{l=0}^{L-1} r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\sigma^2}{m} \ln \eta + \frac{Lm}{2}$$
Note that all that need be known about the observations $r_l$ is their sum. This quantity is the sufficient
statistic for the Gaussian problem: $\Upsilon(\mathbf{r}) = \sum_l r_l$ and
$\gamma = \frac{\sigma^2 \ln \eta}{m} + \frac{Lm}{2}$.
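The resulting detector is short enough to write out. The sketch below uses hypothetical names and sample values; it computes $\gamma$ from $\eta$, $m$, $\sigma$ and $L$, compares the sum of the observations with it, and cross-checks the decision against the log-likelihood form $\ln \Lambda(\mathbf{r}) \gtrless \ln \eta$ for equal means.

```python
from math import log

def threshold_gamma(eta, m, sigma, L):
    """gamma = sigma^2 ln(eta) / m + L m / 2, valid for a common mean m > 0."""
    return sigma**2 * log(eta) / m + L * m / 2.0

def decide_by_sum(r, eta, m, sigma):
    """Say model 1 when the sufficient statistic sum(r) exceeds gamma."""
    return 1 if sum(r) > threshold_gamma(eta, m, sigma, len(r)) else 0

def decide_by_log_likelihood(r, eta, m, sigma):
    """Same test written as ln(Lambda) > ln(eta), with every mean equal to m."""
    log_lambda = (m * sum(r) - 0.5 * len(r) * m**2) / sigma**2
    return 1 if log_lambda > log(eta) else 0

# Hypothetical observations with L = 4, m = 1, sigma = 1, eta = 1 (gamma = 2).
high = [0.9, 0.2, 0.7, 0.6]   # sums to 2.4
low = [0.1, 0.4, -0.3, 0.9]   # sums to 1.1
for r in (high, low):
    print(sum(r), decide_by_sum(r, 1.0, 1.0, 1.0), decide_by_log_likelihood(r, 1.0, 1.0, 1.0))
```

Dividing by $m$ preserves the direction of the inequality only because $m$ is assumed positive, which is why `threshold_gamma` carries that restriction.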
When trying to compute the probability of error or the
threshold in the Neyman-Pearson criterion, we must find the
conditional probability density of one of the decision
statistics: the likelihood ratio, the log-likelihood, or the
sufficient statistic. The log-likelihood and the sufficient
statistic are quite similar in this problem, but clearly we
should use the latter. One practical property of the
sufficient statistic is that it usually simplifies
computations. For this Gaussian example, the sufficient
statistic is a Gaussian random variable under each model.
- $\mathcal{M}_0$: $\Upsilon(\mathbf{r}) \sim \mathcal{N}(0, L\sigma^2)$
-
ℳ