In many circumstances, the observations to be used in evaluating
models arrive sequentially rather than all at once. For
example, passive sonar systems may well "listen" over a period
of time to an array's output while the array is steered in a
particular direction. The decision rules we have derived
implicitly assume the entire block of
data - the array output observed over a long period of time - is
available. You might wonder whether a hypothesis test could be
developed that takes the sequential arrival of data into
account, making decisions as the data arrive, with the
possibility of determining early in the
data collection procedure the validity of one model, while
maintaining the same performance
specifications. Answering this question leads to the
formulation of sequential hypothesis testing (Poor: 136-156 , Wald). Not only do
sequential tests exist, they can provide
performance superior to that of block tests in certain cases.
To make decisions as the data become available, we must
generalize the decision-making process. Assume as before that
the observed data comprise an observation vector rr of length
LL. The decision rule (in the
two-model case) now consists of determining which model is valid
or that more data are required. Thus, the
range of values of rr
is partitioned into three regions
ℜ
0
ℜ
0
,
ℜ
1
ℜ
1
, and
ℜ
?
ℜ
?
. Making the latter decision implies that the data gathered to
that point is insufficient to meet the performance requirements.
More data must be obtained to achieve the required performance
and the test re-applied once these additional data become
available. Thus, a variable number of
observations are required to make a decision. An issue in this
kind of procedure is the number of observations required to
satisfy the performance criteria: for a common set of
performance specifications, does this procedure result in a
decision rule requiring, on the average, fewer observations than
does a fixed-length block test?
In a manner similar to the Neyman-Pearson criterion, we
specify the false-alarm probability
P
F
P
F
; in addition, we need to specify the detection probability
P
D
P
D
. These constraints over-specify the model evaluation problem
where the number of observations is fixed: enforcing one
constraint forces violation of the other. In contrast, both
may be specified as the sequential test as we shall see.
Assuming a likelihood ratio test, two thresholds are required
to define the three decision regions.
Λ
L
r<
η
0
say
ℳ
0
η
0
<
Λ
L
r<
η
1
say "need more data"
η
1
<
Λ
L
r
say
ℳ
1
Λ
L
r
η
0
say
ℳ
0
η
0
Λ
L
r
η
1
say "need more data"
η
1
Λ
L
r
say
ℳ
1
where
Λ
L
r
Λ
L
r
is the usual likelihood ratio where the dimension
LL of the vector rr is explicitly denoted. The
threshold values
η
0
η
0
and
η
1
η
1
are found from the constraints, which are expressed as
P
F
=∫
ℜ
1
pr|
ℳ
0
rdr=α
P
F
r
ℜ
1
p
r
ℳ
0
r
α
and
P
D
=∫
ℜ
1
pr|
ℳ
1
rdr=β
P
D
r
ℜ
1
p
r
ℳ
1
r
β
Here, αα and
ββ are design constants that
you choose according to the application. Note that the
probabilities
P
F
P
F
,
P
D
P
D
are associated not with what happens on a
given trial, but what the sequential test yields in terms of
performance when a decision is made. Thus,
P
M
=1-
P
D
P
M
1
P
D
although the probability of correctly saying
ℳ
1
ℳ
1
on a given trial does not equal one minus the probability of
incorrectly saying
ℳ
0
ℳ
0
is true: The "need more data" region must be accounted for on
an individual trial but not when considering the sequential
test's performance when it terminates.
Rather explicitly attempting to relate thresholds to
performance probabilities, we obtain simpler results by using
bounds and approximations. Note that the expression for
P D
P D
may be written as
P
D
=∫
ℜ
1
pr|
ℳ
1
rpr|
ℳ
0
rpr|
ℳ
0
rdr=∫
ℜ
1
Λ
L
rpr|
ℳ
0
rdr
P
D
r
ℜ
1
p
r
ℳ
1
r
p
r
ℳ
0
r
p
r
ℳ
0
r
r
ℜ
1
Λ
L
r
p
r
ℳ
0
r
In the decision region
ℜ
1
ℜ
1
,
Λ
L
r≥
η
1
Λ
L
r
η
1
; thus, a lower bound on the detection probability
can be established by substituting this inequality into the
integral.
P
D
≥
η
1
∫
ℜ
1
pr|
ℳ
0
rdr
P
D
η
1
r
ℜ
1
p
r
ℳ
0
r
The integral is the false-alarm probability
P
F
P
F
of the test when it terminates. In this way, we find that
P
D
P
F
≥
η
1
P
D
P
F
η
1
. Using similar arguments on the miss probability, we
obtain a similar bound on the threshold
η
0
η
0
.
These inequalities are summarized as
η
0
≥1-
P
D
1-
P
F
η
0
1
P
D
1
P
F
and
η
1
≤
P
D
P
F
η
1
P
D
P
F
These bounds, which relate the thresholds in the sequential
likelihood ratio test with the false-alarm and detection
probabilities, are general, applying even when sequential
tests are not being used. In the usual likelihood ratio test,
there is a single threshold
ηη; these bounds apply to it
as well, implying that in a likelihood ratio test the error
probabilities will always satisfy
P
D
P
F
≥1-
P
D
1-
P
F
P
D
P
F
1
P
D
1
P
F
(1)
This relationship can be manipulated to show
P
D
≥
P
F
P
D
P
F
, indicating that the likelihood ratio test is, at
the very least, a reasonable decision rule and that the
ROC curves have the
right general form.
Only with difficulty can we solve the inequality constraints
on the sequential test's thresholds in the general case.
Surprisingly, by approximating the inequality constraints by
equality constraints we can obtain a
result having pragmatically important properties. As an
approximation, we thus turn to solving for
η
0
η
0
and
η
1
η
1
under the conditions
η
0
=1-β1-α
η
0
1
β
1
α
and
η
1
=βα
η
1
β
α
In this way, the threshold values are explicitly specified in
terms of the desired performance probabilities. We use the
criterion values for the false-alarm and detection
probabilities because when we use these equalities, the test's
resulting performance probabilities
P
F
P
F
and
P
D
P
D
usually do not satisfy the design criteria.
For example, equating
η
1
η
1
to a value potentially larger than its desired value might
result in a smaller detection probability and a larger
false-alarm rate. We will want to understand how much actual
performance departs from what we want.
The relationships derived above between the performance levels
and the thresholds apply no matter how the thresholds are
chosen.
1-β1-α≥1-
P
D
1-
P
F
1
β
1
α
1
P
D
1
P
F
βα≤
P
D
P
F
β
α
P
D
P
F
From these inequalities, two important results follow:
P
F
≤αβ
P
F
α
β
1-
P
D
≤1-β1-α
1
P
D
1
β
1
α
and
P
F
+1-
P
D
≤α+1-β
P
F
1
P
D
α
1
β
The first result follows directly from the threshold bounds.
To derive the second result, we must work a little harder.
Multiplying the first inequality by
1-α1-
P
F
1
α
1
P
F
yields
1-β1-
P
F
≥1-
P
D
1-α
1
β
1
P
F
1
P
D
1
α
. Considering the reciprocal of the second
inequality and multiplying it by
β
P
D
β
P
D
yields
P
D
α≥β
P
F
P
D
α
β
P
F
. Adding the two inequalities yields the second result.
The first set of inequalities suggest that the false-alarm and
miss (which equals
1-
P
D
1
P
D
) probabilities will increase only slightly from
their specified values: the denominators on the right sides
are very close to unity in the interesting cases (e.g., small
error probabilities like 0.01). The second inequality
suggests that the sum of the false-alarm and miss
probabilities obtained in practice will be less than the sum
of the specified error probabilities. Taking these results
together, one of two situations will occur when we approximate
the inequality criterion by equality: either the false alarm
probability will decrease and the detection probability
increase (a most pleasing but unlikely circumstance)
or one of the error probabilities will
increase while the other decreases. The false-alarm
and miss probabilities cannot both increase.
Furthermore, whichever one increases, the first inequalities
suggest that the incremental change will be small. Our
ad hoc approximation to the thresholds does
indeed yield a level of performance close to that specified
Usually, the likelihood is manipulated to derive a sufficient
statistic. The resulting sequential decision rule is
ϒ
L
r<
γ
0
L
say
ℳ
0
γ
0
L<
ϒ
L
r<
γ
1
L
say "need more data"
γ
1
L<
ϒ
L
r
say
ℳ
1
ϒ
L
r
γ
0
L
say
ℳ
0
γ
0
L
ϒ
L
r
γ
1
L
say "need more data"
γ
1
L
ϒ
L
r
say
ℳ
1
Note that the thresholds
γ
0
L
γ
0
L
and
γ
1
L
γ
1
L
, derived
from the thresholds
η
0
η
0
and
η
1
η
1
, usually depend on the
number of observations used in the decision rule.
Let rr be a
Gaussian random vector as in our previous examples with
statistically independent components.
ℳ
0
:
r∼0σ2I
ℳ
0
:
r
0
σ
2
I
ℳ
1
:
r∼mσ2I
ℳ
1
:
r
m
σ
2
I
The mean vector mm is assumed for simplicity to
consist of equal positive values:
M=m…mT
M
m
…
m
,
m>0
m
0
. Using the previous derivations, our sequential
test becomes
∑l=0L-1
r
l
<σ2mlog
η
0
+Lm2
say
ℳ
0
σ2mlog
η
0
+Lm2<∑l=0L-1
r
l
<σ2mlog
η
1
+Lm2
say "need more data"
σ2mlog
η
1
+Lm2<∑l=0L-1
r
l
say
ℳ
1
l
0
L
1
r
l
σ
2
m
η
0
L
m
2
say
ℳ
0
σ
2
m
η
0
L
m
2
l
0
L
1
r
l
σ
2
m
η
1
L
m
2
say "need more data"
σ
2
m
η
1
L
m
2
l
0
L
1
r
l
say
ℳ
1
Starting with
L=1
L
1
, we gather the data and compute the sum. The
sufficient statistic will lie in the middle range between
the two thresholds until one of them is exceeded as shown in
Figure 1.
The model evaluation procedure then terminates and the
chosen model announced. Note how the thresholds depend on
the amount of data available (as expressed by
LL). This variation typifies
the sequential hypothesis tests.
The awake reader might wonder whether that the sequential
likelihood ratio test just derived has the disturbing property
that it may never terminate: can the likelihood ratio wander
between the two thresholds forever? Fortunately, the
sequential likelihood ratio test has been shown to terminate
with probability one (Wald).
Confident of eventual termination, we need to explore how many
observations are required to meet performance specifications.
The number of observations is variable, depending on the
observed data and the stringency of the specifications. The
average number of observations required
can be determined in the interesting case when the
observations are statistically independent.
Assuming that the observations are statistically independent
and identically distributed, the likelihood ratio is equal to
the product of the likelihood ratios evaluated at each
observation. Considering
ln
Λ
L
0
r
Λ
L
0
r
, the logarithm of the likelihood ratio
when a decision is made on observation
L
0
L
0
, we have
ln
Λ
L
0
r=∑l=0
L
0
-1lnΛ
r
l
Λ
L
0
r
l
0
L
0
1
Λ
r
l
where
Λ
r
l
Λ
r
l
is the likelihood ratio corresponding to the
l
th
l
th
observation. We seek an expression for
E
L
0
L
0
, the expected value of the number of observations
required to make the decision. To derive this quantity, we
evaluate the expected value of the likelihood ratio when the
decision is made. This value will usually vary with which
model is actually valid; we must consider both models
separately. Using the laws of conditional expectation (see
Joint Distributions), we find that the expected value of
Λ
L
0
r
Λ
L
0
r
, assuming that model
ℳ
1
ℳ
1
was true, is given by
Eln
Λ
L
0
r|
ℳ
1
=EEln
Λ
L
0
r|
ℳ
1
L
0
ℳ
1
Λ
L
0
r
ℳ
1
L
0
Λ
L
0
r
The outer expected value is evaluated with respect to the probability
distribution of
L
0
L
0
; the inner expected value is average value of the
log-likelihood assuming that
L
0
L
0
observations were required to choose model
ℳ
1
ℳ
1
. In the latter case, the log-likelihood is the sum of
L
0
L
0
component log-likelihood ratios
Eln
Λ
L
0
r|
ℳ
1
L
0
=
L
0
ElnΛ
r
l
|
ℳ
1
ℳ
1
L
0
Λ
L
0
r
L
0
ℳ
1
Λ
r
l
Noting that the expected value on the right is a constant with
respect to the outer expected value, we find that
Eln
Λ
L
0
r|
ℳ
1
=E
L
0
|
ℳ
1
ElnΛ
r
l
|
ℳ
1
ℳ
1
Λ
L
0
r
ℳ
1
L
0
ℳ
1
Λ
r
l
The average number of observations required to make a
decision, correct or incorrect, assuming that
ℳ
1
ℳ
1
is true is thus expressed by
E
L
0
|
ℳ
1
=Eln
Λ
L
0
r|
ℳ
1
ElnΛ
r
l
|
ℳ
1
ℳ
1
L
0
ℳ
1
Λ
L
0
r
ℳ
1
Λ
r
l
Assuming that the other model was true, we have the complementary
result
E
L
0
|
ℳ
0
=Eln
Λ
L
0
r|
ℳ
0
ElnΛ
r
l
|
ℳ
0
ℳ
0
L
0
ℳ
0
Λ
L
0
r
ℳ
0
Λ
r
l
The numerator is difficult to calculate exactly but easily
approximated; assuming that the likelihood ratio equals its
threshold value when the decision is made,
Eln
Λ
L
0
r|
ℳ
0
≈
P
F
ln
η
1
+1-
P
F
ln
η
0
=
P
F
ln
P
D
P
F
+1-
P
F
ln1-
P
D
1-
P
F
ℳ
0
Λ
L
0
r
P
F
η
1
1
P
F
η
0
P
F
P
D
P
F
1
P
F
1
P
D
1
P
F
Eln
Λ
L
0
r|
ℳ
1
≈
P
D
ln
η
1
+1-
P
D
ln
η
0
=
P
D
ln
P
D
P
F
+1-
P
D
ln1-
P
D
1-
P
F
ℳ
1
Λ
L
0
r
P
D
η
1
1
P
D
η
0
P
D
P
D
P
F
1
P
D
1
P
D
1
P
F
Note these expressions are not problem
dependent; they depend only on the specified probabilities.
The denominator cannot be approximated in a similar way with
such generality; it must be evaluated for each problem.
In the Gaussian example we have been exploring, the log-likelihood
of each component observation
r
l
r
l
is given by
lnΛ
r
l
=m
r
l
σ2-m22σ2
Λ
r
l
m
r
l
σ
2
m
2
2
σ
2
The conditional expected values required to evaluate the
expression for the average number of required observations
are
ElnΛ
r
l
|
ℳ
0
=-m22σ2
ℳ
0
Λ
r
l
m
2
2
σ
2
ElnΛ
r
l
|
ℳ
1
=m22σ2
ℳ
1
Λ
r
l
m
2
2
σ
2
For simplicity, let's assume that the false-alarm and
detection probabilities are symmetric (i.e.
P
F
=1-
P
D
P
F
1
P
D
).
The expressions for the average number of observations are equal for
each model and we have
E
L
0
|
ℳ
0
=E
L
0
|
ℳ
1
=f
P
F
σ2m2
ℳ
0
L
0
ℳ
1