We alluded earlier to the relationship between the
false-alarm probability $P_F$ and the detection probability $P_D$
as one varies the decision region. Because the
Neyman-Pearson criterion depends on specifying the false-alarm
probability to yield an acceptable detection probability, we
need to examine carefully how the detection probability is
affected by a specification of the false-alarm probability. The
usual way these quantities are discussed is through a parametric
plot of $P_D$ versus $P_F$: the receiver operating characteristic, or
ROC.
As we discovered in the Gaussian example, the sufficient
statistic provides the simplest way of computing these
probabilities; thus, they are usually considered to depend on
the threshold parameter $\gamma$. In these terms, we have
$$P_D = \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_1}(\Upsilon)\,d\Upsilon \tag{1}$$
and
$$P_F = \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\,d\Upsilon \tag{2}$$
These densities and their relationship to the
threshold $\gamma$ are shown in
Figure 1.
We see that the detection probability is greater than or equal
to the false-alarm probability. Both probabilities decrease
monotonically as the threshold is increased, so the ROC curve
rises monotonically from (0,0) to (1,1). For the likelihood
ratio test, the curve is also concave-down and never falls
below the equality line (Figure 2).
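As a rough illustration of how the ROC arises as a parametric curve in the threshold, the following sketch evaluates the two tail integrals numerically and sweeps $\gamma$. The Gaussian densities, their means, and the helper names `tail_probability` and `gaussian_density` are assumptions chosen for this example only; any pair of densities for the sufficient statistic could be substituted.

```python
import math

def tail_probability(density, gamma, upper=50.0, steps=20000):
    """Approximate the integral of `density` from gamma to `upper` (trapezoid rule).

    `upper` stands in for infinity; it must lie far enough in the tail
    that the density there is negligible.
    """
    if gamma >= upper:
        return 0.0
    h = (upper - gamma) / steps
    total = 0.5 * (density(gamma) + density(upper))
    for k in range(1, steps):
        total += density(gamma + k * h)
    return total * h

def gaussian_density(mean, sigma):
    """Return a Gaussian density function with the given mean and standard deviation."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Hypothetical densities of the sufficient statistic under each model.
p0 = gaussian_density(0.0, 1.0)  # model M0
p1 = gaussian_density(2.0, 1.0)  # model M1

# Sweep the threshold; each value of gamma yields one ROC point (P_F, P_D).
thresholds = [-4.0 + 0.5 * k for k in range(21)]
roc = [(tail_probability(p0, g), tail_probability(p1, g)) for g in thresholds]

for g, (pf, pd) in zip(thresholds, roc):
    print(f"gamma = {g:5.2f}   P_F = {pf:.4f}   P_D = {pd:.4f}")
```

Raising the threshold moves the point toward (0,0) and lowering it moves the point toward (1,1), which is the monotonic behavior described above.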
The degree to which the ROC departs from the equality line
$P_D = P_F$ measures the relative distinctiveness
between the two hypothesized models for generating the
observations. In the limit, the two models can be distinguished
perfectly if the ROC is discontinuous and consists of the single
point $(P_F, P_D) = (0, 1)$. The two are totally confused if the
ROC lies on the equality line (this would mean, of course, that
the two models are identical); distinguishing the two in this
case would be "somewhat difficult".
Consider the Gaussian example we have been discussing where
the two models differ only in the means of the conditional
distributions. In this case, the two model-testing
probabilities are given by
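$$P_F = Q\!\left(\frac{\gamma}{\sqrt{L}\,\sigma}\right)$$
and
$$P_D = Q\!\left(\frac{\gamma - Lm}{\sqrt{L}\,\sigma}\right)$$
By re-expressing $\gamma$ as
$\frac{\sigma^2}{m}\gamma' + \frac{Lm}{2}$, we discover that these
probabilities depend only on the ratio $\frac{\sqrt{L}\,m}{\sigma}$.
$$P_F = Q\!\left(\frac{\gamma'}{\sqrt{L}\,m/\sigma} + \frac{\sqrt{L}\,m}{2\sigma}\right)$$
$$P_D = Q\!\left(\frac{\gamma'}{\sqrt{L}\,m/\sigma} - \frac{\sqrt{L}\,m}{2\sigma}\right)$$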
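The re-expression of the threshold can be checked numerically. The sketch below evaluates $(P_F, P_D)$ both from the original threshold $\gamma$ and from the normalized threshold $\gamma'$, using $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$. The function names and the particular values of $L$, $m$, $\sigma$, and $\gamma'$ are illustrative assumptions, not values taken from the text.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x): the probability that a unit normal exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gaussian_roc_point(gamma, L, m, sigma):
    """(P_F, P_D) computed from the original threshold gamma."""
    scale = math.sqrt(L) * sigma
    return q_function(gamma / scale), q_function((gamma - L * m) / scale)

def gaussian_roc_point_normalized(gamma_prime, d):
    """(P_F, P_D) computed from gamma' and the ratio d = sqrt(L) * m / sigma."""
    return (q_function(gamma_prime / d + d / 2.0),
            q_function(gamma_prime / d - d / 2.0))

# Illustrative values only (assumed for this example).
L, m, sigma = 16, 0.5, 1.0
d = math.sqrt(L) * m / sigma
gamma_prime = 1.0
gamma = sigma ** 2 / m * gamma_prime + L * m / 2.0

direct = gaussian_roc_point(gamma, L, m, sigma)
normalized = gaussian_roc_point_normalized(gamma_prime, d)
print(direct)
print(normalized)
```

Both calls should agree to within rounding error, since the second form is only an algebraic rewriting of the first; any other combination of $L$, $m$, and $\sigma$ with the same ratio $d$ gives the same ROC.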
As this signal-to-noise ratio increases, the ROC
curve approaches its "ideal" form: the northwest corner of a
square as illustrated in Figure 2 by the value of
7.44 for $\frac{\sqrt{L}\,m}{\sigma}$, which corresponds to a
signal-to-noise ratio of $7.44^2 \approx 17\ \mathrm{dB}$. If a
small false-alarm probability (say $10^{-4}$) is specified, a
large detection probability
(0.9999) can result. Such values of signal-to-noise ratios can thus
be considered "large" and the corresponding model evaluation problem
relatively easy. If, however, the signal-to-noise ratio equals 4 (6
dB), the figure illustrates the worsened performance: a
$10^{-4}$ specification on the false-alarm probability would
result in a detection probability of only about 0.04. Thus,
in a fairly small signal-to-noise ratio range, the likelihood
ratio test's performance capabilities can vary dramatically.
However, no other decision rule can yield better performance.
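For the Gaussian example, eliminating the threshold between the two expressions gives $P_D = Q\!\left(Q^{-1}(P_F) - d\right)$ with $d = \sqrt{L}\,m/\sigma$, so the figures quoted above can be reproduced directly. The sketch below is one way to do this using only the Python standard library; the helper names are assumptions made for this example.

```python
import math
from statistics import NormalDist

def q_function(x):
    """Gaussian tail probability Q(x): the probability that a unit normal exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inverse(p):
    """Inverse of Q for 0 < p < 1: returns x such that Q(x) = p."""
    return -NormalDist().inv_cdf(p)

def detection_probability(p_false_alarm, d):
    """P_D of the likelihood ratio test for the Gaussian mean problem.

    d is the ratio sqrt(L) * m / sigma; d squared is the signal-to-noise ratio.
    """
    return q_function(q_inverse(p_false_alarm) - d)

# The two cases discussed in the text: d = 7.44 (about 17 dB) and d = 2 (6 dB).
for d in (7.44, 2.0):
    snr_db = 10.0 * math.log10(d ** 2)
    p_d = detection_probability(1e-4, d)
    print(f"d = {d:4.2f}  SNR = {snr_db:5.2f} dB  P_D = {p_d:.6f}")
```

The first case yields a detection probability very close to 0.9999, while the second yields a value of a few percent, illustrating how sharply performance changes over this range of signal-to-noise ratios.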
Specification of the false-alarm probability for a new problem
requires experience. Choosing a "reasonable" value for the
false-alarm probability in the Neyman-Pearson criterion depends
strongly on the problem difficulty. Too small a number will
result in small detection probabilities; too large and the
detection probability will be close to unity, suggesting that
fewer false alarms could have been tolerated. Problem
difficulty is assessed by the degree to which the conditional
densities $p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})$ and
$p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})$
overlap, a problem-dependent measurement. If we are
testing whether a distribution has one of two possible mean
values as in our Gaussian example, a quantity like a
signal-to-noise ratio will probably emerge as determining
performance. The performance in this case can vary drastically
depending on whether the signal-to-noise ratio is large or
small. In other kinds of problems, the best possible
performance provided by the likelihood ratio test can be poor.
For example, consider the problem of determining which of two
zero-mean probability densities describes a given set of data
consisting of statistically independent observations (See this problem).
Presumably, the variances of these two densities are equal as we
are trying to determine which density is most appropriate. In
this case, the performance probabilities can be quite low,
especially when the general shapes of the densities are similar.
Thus a single quantity, like the signal-to-noise ratio, does
not emerge to characterize problem
difficulty in all hypothesis testing problems. In the sequel, we
will analyze each model evaluation and detection problem in a
standard way. After the sufficient statistic has been found, we
will seek a value for the threshold that attains a specified
false-alarm probability. The detection probability will then be
determined as a function of "problem difficulty", the measure of
which is problem-dependent. We can control the choice of
false-alarm probability; we cannot control problem
difficulty. Confusingly, the detection probability will vary
with both the specified false-alarm
probability and the problem difficulty.
We are implicitly assuming that we have a rational method for
choosing the false-alarm probability criterion value. In signal
processing applications, we usually make a sequence of decisions
and pass them to systems making more global determinations. For
example, in digital communications problems the model evaluation
formalism could be used to "receive" each bit. Each bit is
received in sequence and then passed to the decoder which
invokes error-correction algorithms. The important notions here
are that the decision-making process occurs at a given
rate and that the decisions are presented
to other signal processing systems. The rate at which errors
occur in system input(s) greatly influences system design.
Thus, the selection of a false-alarm probability is usually
governed by the error rate that can be tolerated by
succeeding systems. If the decision rate is one per day, then a
moderately large (say 0.1) false-alarm probability might be
appropriate. If the decision rate is a million per second, as in
a one-megabit-per-second communication channel, the false-alarm
probability should be much lower: $10^{-12}$ would give roughly
the same one-tenth per day error rate.
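The arithmetic behind these two cases is simply the decision rate multiplied by the false-alarm probability. The sketch below makes that explicit; it assumes, as a worst case, that every decision is made under the model for which a false alarm is possible, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def false_alarms_per_day(decisions_per_second, p_false_alarm):
    """Expected number of false alarms per day at a given decision rate.

    Assumes each decision independently produces a false alarm with
    probability p_false_alarm.
    """
    return decisions_per_second * SECONDS_PER_DAY * p_false_alarm

# One decision per day with a false-alarm probability of 0.1.
slow = false_alarms_per_day(1.0 / SECONDS_PER_DAY, 0.1)

# One million decisions per second with a false-alarm probability of 1e-12.
fast = false_alarms_per_day(1e6, 1e-12)

print(f"one decision per day:       {slow:.4f} false alarms per day")
print(f"one million decisions/sec:  {fast:.4f} false alarms per day")
```

Both settings lead to an expected rate on the order of one-tenth of a false alarm per day, which is why the acceptable false-alarm probability shrinks in proportion to the decision rate.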