
# Sequential Hypothesis Testing

Module by: Don Johnson

In many circumstances, the observations to be used in evaluating models arrive sequentially rather than all at once. For example, passive sonar systems may well "listen" over a period of time to an array's output while the array is steered in a particular direction. The decision rules we have derived implicitly assume the entire block of data - the array output observed over a long period of time - is available. You might wonder whether a hypothesis test could be developed that takes the sequential arrival of data into account, making decisions as the data arrive, with the possibility of determining early in the data collection procedure the validity of one model, while maintaining the same performance specifications. Answering this question leads to the formulation of sequential hypothesis testing (Poor: 136-156, Wald). Not only do sequential tests exist, they can provide performance superior to that of block tests in certain cases.

To make decisions as the data become available, we must generalize the decision-making process. Assume as before that the observed data comprise an observation vector $\mathbf{r}$ of length $L$. The decision rule (in the two-model case) now consists of determining which model is valid or that more data are required. Thus, the range of values of $\mathbf{r}$ is partitioned into three regions $\Re_0$, $\Re_1$, and $\Re_{?}$. Making the latter decision implies that the data gathered to that point are insufficient to meet the performance requirements. More data must be obtained to achieve the required performance and the test re-applied once these additional data become available. Thus, a variable number of observations are required to make a decision. An issue in this kind of procedure is the number of observations required to satisfy the performance criteria: for a common set of performance specifications, does this procedure result in a decision rule requiring, on the average, fewer observations than does a fixed-length block test?

## Sequential Likelihood Ratio Test

In a manner similar to the Neyman-Pearson criterion, we specify the false-alarm probability $P_F$; in addition, we need to specify the detection probability $P_D$. These constraints over-specify the model evaluation problem when the number of observations is fixed: enforcing one constraint forces violation of the other. In contrast, both may be specified in the sequential test, as we shall see.

Assuming a likelihood ratio test, two thresholds are required to define the three decision regions:
$$
\Lambda_L(\mathbf{r}) < \eta_0 \;\Rightarrow\; \text{say } \mathcal{M}_0
$$
$$
\eta_0 < \Lambda_L(\mathbf{r}) < \eta_1 \;\Rightarrow\; \text{say "need more data"}
$$
$$
\eta_1 < \Lambda_L(\mathbf{r}) \;\Rightarrow\; \text{say } \mathcal{M}_1
$$
where $\Lambda_L(\mathbf{r})$ is the usual likelihood ratio, with the dimension $L$ of the vector $\mathbf{r}$ explicitly denoted. The threshold values $\eta_0$ and $\eta_1$ are found from the constraints
$$
P_F = \int_{\Re_1} p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r} = \alpha
\qquad\text{and}\qquad
P_D = \int_{\Re_1} p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})\,d\mathbf{r} = \beta
$$
Here, $\alpha$ and $\beta$ are design constants that you choose according to the application. Note that the probabilities $P_F$ and $P_D$ are associated not with what happens on a given trial, but with what the sequential test yields in terms of performance when a decision is made. Thus, $P_M = 1 - P_D$, although the probability of correctly saying $\mathcal{M}_1$ on a given trial does not equal one minus the probability of incorrectly saying $\mathcal{M}_0$ is true: the "need more data" region must be accounted for on an individual trial but not when considering the sequential test's performance once it terminates.

Rather than explicitly attempting to relate the thresholds to the performance probabilities, we obtain simpler results by using bounds and approximations. Note that the expression for $P_D$ may be written as
$$
P_D = \int_{\Re_1} \frac{p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})}{p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})}\, p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r}
= \int_{\Re_1} \Lambda_L(\mathbf{r})\, p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r}
$$
In the decision region $\Re_1$, $\Lambda_L(\mathbf{r}) \geq \eta_1$; thus, a lower bound on the detection probability can be established by substituting this inequality into the integral:
$$
P_D \geq \eta_1 \int_{\Re_1} p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r}
$$
The integral is the false-alarm probability $P_F$ of the test when it terminates. In this way, we find that $P_D / P_F \geq \eta_1$. Using similar arguments on the miss probability, we obtain a similar bound on the threshold $\eta_0$. These inequalities are summarized as
$$
\eta_0 \geq \frac{1 - P_D}{1 - P_F}
\qquad\text{and}\qquad
\eta_1 \leq \frac{P_D}{P_F}
$$
These bounds, which relate the thresholds in the sequential likelihood ratio test to the false-alarm and detection probabilities, are general, applying even when sequential tests are not being used. In the usual likelihood ratio test, there is a single threshold $\eta$; these bounds apply to it as well, implying that in a likelihood ratio test the error probabilities will always satisfy

$$
\frac{1 - P_D}{1 - P_F} \leq \eta \leq \frac{P_D}{P_F}
$$
(1)
This relationship can be manipulated to show $P_D \geq P_F$, indicating that the likelihood ratio test is, at the very least, a reasonable decision rule and that the ROC curves have the right general form.

Only with difficulty can we solve the inequality constraints on the sequential test's thresholds in the general case. Surprisingly, by approximating the inequality constraints with equality constraints we can obtain a result having pragmatically important properties. As an approximation, we thus turn to solving for $\eta_0$ and $\eta_1$ under the conditions
$$
\eta_0 = \frac{1 - \beta}{1 - \alpha}
\qquad\text{and}\qquad
\eta_1 = \frac{\beta}{\alpha}
$$
In this way, the threshold values are explicitly specified in terms of the desired performance probabilities. Because we use the criterion values $\alpha$ and $\beta$ rather than solving the inequality constraints exactly, the test's resulting performance probabilities $P_F$ and $P_D$ usually do not satisfy the design criteria. For example, equating $\eta_1$ to a value potentially larger than its desired value might result in a smaller detection probability and a larger false-alarm rate. We will want to understand how much actual performance departs from what we want.
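As a small numerical sketch of these approximate thresholds (the function name and example values below are illustrative, not from the module):

```python
def sprt_thresholds(alpha, beta):
    """Wald's approximate thresholds from the design false-alarm
    probability alpha and detection probability beta."""
    eta0 = (1 - beta) / (1 - alpha)  # lower threshold: below it, say model 0
    eta1 = beta / alpha              # upper threshold: above it, say model 1
    return eta0, eta1

# e.g., alpha = 0.01, beta = 0.99 gives eta0 = 0.01/0.99 and eta1 = 99
eta0, eta1 = sprt_thresholds(0.01, 0.99)
```

In practice one usually works with the log-likelihood ratio and compares it against $\ln\eta_0$ and $\ln\eta_1$, which avoids numerical overflow for long observation records.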

The relationships derived above between the performance levels and the thresholds apply no matter how the thresholds are chosen:
$$
\frac{1 - \beta}{1 - \alpha} \geq \frac{1 - P_D}{1 - P_F}
\qquad\text{and}\qquad
\frac{\beta}{\alpha} \leq \frac{P_D}{P_F}
$$
From these inequalities, two important results follow:
$$
P_F \leq \frac{\alpha}{\beta}, \qquad 1 - P_D \leq \frac{1 - \beta}{1 - \alpha}
$$
and
$$
P_F + (1 - P_D) \leq \alpha + (1 - \beta)
$$
The first result follows directly from the threshold bounds. To derive the second result, we must work a little harder. Multiplying the first inequality by $(1 - \alpha)(1 - P_F)$ yields $(1 - \beta)(1 - P_F) \geq (1 - P_D)(1 - \alpha)$. Considering the reciprocal of the second inequality and multiplying it by $\beta P_D$ yields $\alpha P_D \geq \beta P_F$. Adding the two inequalities yields the second result.

The first set of inequalities suggests that the false-alarm and miss (which equals $1 - P_D$) probabilities will increase only slightly from their specified values: the denominators on the right sides are very close to unity in the interesting cases (e.g., small error probabilities like 0.01). The second inequality suggests that the sum of the false-alarm and miss probabilities obtained in practice will be less than the sum of the specified error probabilities. Taking these results together, one of two situations will occur when we approximate the inequality criterion by equality: either the false-alarm probability will decrease and the detection probability increase (a most pleasing but unlikely circumstance), or one of the error probabilities will increase while the other decreases. The false-alarm and miss probabilities cannot both increase. Furthermore, whichever one increases, the first inequalities suggest that the incremental change will be small. Our ad hoc approximation to the thresholds does indeed yield a level of performance close to that specified.

Usually, the likelihood ratio is manipulated to derive a sufficient statistic. The resulting sequential decision rule is
$$
\Upsilon_L(\mathbf{r}) < \gamma_0(L) \;\Rightarrow\; \text{say } \mathcal{M}_0
$$
$$
\gamma_0(L) < \Upsilon_L(\mathbf{r}) < \gamma_1(L) \;\Rightarrow\; \text{say "need more data"}
$$
$$
\gamma_1(L) < \Upsilon_L(\mathbf{r}) \;\Rightarrow\; \text{say } \mathcal{M}_1
$$
Note that the thresholds $\gamma_0(L)$ and $\gamma_1(L)$, derived from the thresholds $\eta_0$ and $\eta_1$, usually depend on the number of observations used in the decision rule.

### Example 1

Let $\mathbf{r}$ be a Gaussian random vector, as in our previous examples, with statistically independent components:
$$
\mathcal{M}_0 : \mathbf{r} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})
$$
$$
\mathcal{M}_1 : \mathbf{r} \sim \mathcal{N}(\mathbf{m}, \sigma^2 \mathbf{I})
$$
The mean vector $\mathbf{m}$ is assumed for simplicity to consist of equal positive values: $\mathbf{m} = \begin{pmatrix} m & \cdots & m \end{pmatrix}^T$, $m > 0$. Using the previous derivations, our sequential test becomes
$$
\sum_{l=0}^{L-1} r_l < \frac{\sigma^2}{m}\ln\eta_0 + \frac{Lm}{2} \;\Rightarrow\; \text{say } \mathcal{M}_0
$$
$$
\frac{\sigma^2}{m}\ln\eta_0 + \frac{Lm}{2} < \sum_{l=0}^{L-1} r_l < \frac{\sigma^2}{m}\ln\eta_1 + \frac{Lm}{2} \;\Rightarrow\; \text{say "need more data"}
$$
$$
\frac{\sigma^2}{m}\ln\eta_1 + \frac{Lm}{2} < \sum_{l=0}^{L-1} r_l \;\Rightarrow\; \text{say } \mathcal{M}_1
$$
Starting with $L = 1$, we gather the data and compute the sum. The sufficient statistic will lie in the middle range between the two thresholds until one of them is exceeded, as shown in Figure 1.

The model evaluation procedure then terminates and the chosen model is announced. Note how the thresholds depend on the amount of data available (as expressed by $L$). This variation typifies sequential hypothesis tests.
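As a sketch of how this example could be simulated (the function and variable names are mine; the thresholds use the equality approximations $\eta_0 = (1-\beta)/(1-\alpha)$, $\eta_1 = \beta/\alpha$ derived earlier):

```python
import math
import random

def sequential_gaussian_test(samples, m, sigma, eta0, eta1):
    """Accumulate sum(r_l) and compare it against the L-dependent
    thresholds of the example; return (decision, observations used).
    decision is 0, 1, or None if the data ran out (truncation)."""
    total = 0.0
    L = 0
    for L, r in enumerate(samples, start=1):
        total += r
        offset = L * m / 2
        if total < (sigma**2 / m) * math.log(eta0) + offset:
            return 0, L     # sufficient statistic fell below the lower threshold
        if total > (sigma**2 / m) * math.log(eta1) + offset:
            return 1, L     # sufficient statistic rose above the upper threshold
    return None, L          # truncated: still in the "need more data" region

random.seed(7)
m, sigma = 1.0, 1.0
alpha, beta = 0.01, 0.99
eta0, eta1 = (1 - beta) / (1 - alpha), beta / alpha
# synthetic data drawn under model 1 (mean m, variance sigma^2)
data = [random.gauss(m, sigma) for _ in range(10_000)]
decision, L0 = sequential_gaussian_test(data, m, sigma, eta0, eta1)
```

Running this repeatedly with fresh data gives a variable stopping time $L_0$, which is the point of the analysis in the next section.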

## Average Number of Required Observations

The awake reader might wonder whether the sequential likelihood ratio test just derived has the disturbing property that it may never terminate: can the likelihood ratio wander between the two thresholds forever? Fortunately, the sequential likelihood ratio test has been shown to terminate with probability one (Wald). Confident of eventual termination, we need to explore how many observations are required to meet performance specifications. The number of observations is variable, depending on the observed data and the stringency of the specifications. The average number of observations required can be determined in the interesting case when the observations are statistically independent.

Assuming that the observations are statistically independent and identically distributed, the likelihood ratio is equal to the product of the likelihood ratios evaluated at each observation. Considering $\ln\Lambda_{L_0}(\mathbf{r})$, the logarithm of the likelihood ratio when a decision is made on observation $L_0$, we have
$$
\ln\Lambda_{L_0}(\mathbf{r}) = \sum_{l=0}^{L_0 - 1} \ln\Lambda(r_l)
$$
where $\Lambda(r_l)$ is the likelihood ratio corresponding to the $l$-th observation. We seek an expression for $E[L_0]$, the expected value of the number of observations required to make the decision. To derive this quantity, we evaluate the expected value of the log-likelihood ratio when the decision is made. This value will usually vary with which model is actually valid; we must consider both models separately. Using the laws of conditional expectation (see Joint Distributions), we find that the expected value of $\ln\Lambda_{L_0}(\mathbf{r})$, assuming that model $\mathcal{M}_1$ was true, is given by
$$
E[\ln\Lambda_{L_0}(\mathbf{r}) \mid \mathcal{M}_1] = E\bigl[E[\ln\Lambda_{L_0}(\mathbf{r}) \mid \mathcal{M}_1, L_0]\bigr]
$$
The outer expected value is evaluated with respect to the probability distribution of $L_0$; the inner expected value is the average value of the log-likelihood ratio assuming that $L_0$ observations were required to choose model $\mathcal{M}_1$. In the latter case, the log-likelihood is the sum of $L_0$ component log-likelihood ratios:
$$
E[\ln\Lambda_{L_0}(\mathbf{r}) \mid \mathcal{M}_1, L_0] = L_0\, E[\ln\Lambda(r_l) \mid \mathcal{M}_1]
$$
Noting that the expected value on the right is a constant with respect to the outer expected value, we find that
$$
E[\ln\Lambda_{L_0}(\mathbf{r}) \mid \mathcal{M}_1] = E[L_0 \mid \mathcal{M}_1]\, E[\ln\Lambda(r_l) \mid \mathcal{M}_1]
$$
The average number of observations required to make a decision, correct or incorrect, assuming that $\mathcal{M}_1$ is true is thus expressed by
$$
E[L_0 \mid \mathcal{M}_1] = \frac{E[\ln\Lambda_{L_0}(\mathbf{r}) \mid \mathcal{M}_1]}{E[\ln\Lambda(r_l) \mid \mathcal{M}_1]}
$$
Assuming that the other model was true, we have the complementary result
$$
E[L_0 \mid \mathcal{M}_0] = \frac{E[\ln\Lambda_{L_0}(\mathbf{r}) \mid \mathcal{M}_0]}{E[\ln\Lambda(r_l) \mid \mathcal{M}_0]}
$$

The numerator is difficult to calculate exactly but easily approximated; assuming that the likelihood ratio equals its threshold value when the decision is made,
$$
E[\ln\Lambda_{L_0}(\mathbf{r}) \mid \mathcal{M}_0] \approx P_F \ln\eta_1 + (1 - P_F)\ln\eta_0
= P_F \ln\frac{P_D}{P_F} + (1 - P_F)\ln\frac{1 - P_D}{1 - P_F}
$$
$$
E[\ln\Lambda_{L_0}(\mathbf{r}) \mid \mathcal{M}_1] \approx P_D \ln\eta_1 + (1 - P_D)\ln\eta_0
= P_D \ln\frac{P_D}{P_F} + (1 - P_D)\ln\frac{1 - P_D}{1 - P_F}
$$
Note that these expressions are not problem dependent; they depend only on the specified probabilities. The denominator cannot be approximated in a similar way with such generality; it must be evaluated for each problem.
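Because these approximations depend only on the specified probabilities, they are simple to evaluate; a sketch (the function name is mine), using the design values $P_F = \alpha$ and $P_D = \beta$:

```python
from math import log

def expected_terminal_loglr(PF, PD):
    """Approximate E[ln Lambda | M0] and E[ln Lambda | M1] at
    termination, assuming the likelihood ratio sits exactly on a
    threshold (eta1 = PD/PF, eta0 = (1-PD)/(1-PF)) when it stops."""
    ln_eta1 = log(PD / PF)
    ln_eta0 = log((1 - PD) / (1 - PF))
    under_m0 = PF * ln_eta1 + (1 - PF) * ln_eta0  # negative: drift toward eta0
    under_m1 = PD * ln_eta1 + (1 - PD) * ln_eta0  # positive: drift toward eta1
    return under_m0, under_m1

m0, m1 = expected_terminal_loglr(0.01, 0.99)
```

In the symmetric case $P_F = 1 - P_D$ the two values are equal in magnitude and opposite in sign, which is why the two average-duration expressions coincide in the example below.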

### Example 2

In the Gaussian example we have been exploring, the log-likelihood of each component observation $r_l$ is given by
$$
\ln\Lambda(r_l) = \frac{m r_l}{\sigma^2} - \frac{m^2}{2\sigma^2}
$$
The conditional expected values required to evaluate the expression for the average number of required observations are
$$
E[\ln\Lambda(r_l) \mid \mathcal{M}_0] = -\frac{m^2}{2\sigma^2}, \qquad
E[\ln\Lambda(r_l) \mid \mathcal{M}_1] = +\frac{m^2}{2\sigma^2}
$$
For simplicity, let's assume that the false-alarm and detection probabilities are symmetric (i.e., $P_F = 1 - P_D$). The expressions for the average number of observations are equal for each model and we have
$$
E[L_0 \mid \mathcal{M}_0] = E[L_0 \mid \mathcal{M}_1] = f(P_F)\,\frac{\sigma^2}{m^2}
$$
where $f(P_F)$ is a function equal to $(2 - 4 P_F)\ln\frac{1 - P_F}{P_F}$. Thus, the number of observations decreases with increasing signal-to-noise ratio $m/\sigma$ and increases as the false-alarm probability is reduced.
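The closed form above is easy to evaluate numerically; a minimal sketch for the symmetric case (the function name is mine):

```python
from math import log

def avg_sequential_observations(PF, m, sigma):
    """Average number of observations in the symmetric Gaussian case
    (PF = 1 - PD): f(PF) * sigma^2 / m^2, where
    f(PF) = (2 - 4*PF) * ln((1 - PF) / PF)."""
    f = (2 - 4 * PF) * log((1 - PF) / PF)
    return f * sigma**2 / m**2

# e.g., PF = 0.01 at unit SNR (m = sigma = 1) needs about 9 observations
n = avg_sequential_observations(0.01, 1.0, 1.0)
```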

Suppose we used a likelihood ratio test in which all the data were considered at once and a decision made; how many observations would be required to achieve a specified level of performance, and how would this fixed number compare with the average number of observations in a sequential test? In this example, we find from our earlier calculations that
$$
P_F = Q\!\left(\frac{\sqrt{L}\, m}{2\sigma}\right)
\qquad\text{so that}\qquad
L = 4\,\bigl[Q^{-1}(P_F)\bigr]^2\, \frac{\sigma^2}{m^2}
$$
The durations of the sequential and block tests depend on the signal-to-noise ratio in the same way; however, the dependence on the false-alarm probability is quite different. As depicted in Figure 2, the disparity between these quantities increases rapidly as the false-alarm probability decreases, with the sequential test requiring correspondingly fewer observations on the average.
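The comparison can be sketched with the standard library's `statistics.NormalDist`, using $Q^{-1}(p) = -\Phi^{-1}(p)$ for the standard Gaussian (the function names are mine):

```python
from math import log
from statistics import NormalDist

def block_test_length(PF, m, sigma):
    """Fixed-length test: L = 4 [Q^{-1}(PF)]^2 sigma^2 / m^2."""
    q_inv = -NormalDist().inv_cdf(PF)   # Q^{-1}(p) = -Phi^{-1}(p)
    return 4 * q_inv**2 * sigma**2 / m**2

def sequential_avg_length(PF, m, sigma):
    """Average sequential length in the symmetric case PF = 1 - PD."""
    return (2 - 4 * PF) * log((1 - PF) / PF) * sigma**2 / m**2

# the gap between block and sequential lengths widens as PF shrinks
for PF in (1e-2, 1e-4, 1e-6):
    print(PF, block_test_length(PF, 1, 1), sequential_avg_length(PF, 1, 1))
```

Both lengths scale as $\sigma^2/m^2$, so the ratio between them depends only on $P_F$, consistent with the observation above that the disparity grows as the false-alarm probability decreases.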

We must not forget that these results apply to the average number of observations required to make a decision. Expressions for the distribution of the number of observations are complicated and depend heavily on the problem. When an extremely large number of observations is required to resolve a difficult case to the required accuracy, we are forced to truncate the sequential test, stopping when a specified number of observations have been used. A decision would then be made by dividing the region between the boundaries in half and selecting the model corresponding to the boundary nearest to the sufficient statistic. If this truncation point is larger than the expected number, the performance probabilities will change little. "Larger" is again problem dependent; analytic results are few, leaving the option of computer simulations to estimate the distribution of the number of observations required for a decision.

## References

1. H. V. Poor (1988). *An Introduction to Signal Detection and Estimation*. New York: Springer-Verlag.
2. A. Wald (1947). *Sequential Analysis*. New York: Wiley and Sons.
