
Detection Performance Criteria

Module by: Don Johnson. Based on: Criteria in Hypothesis Testing by Don Johnson.

The criterion used in the previous section---minimize the average cost of an incorrect decision---may seem to be a contrived way of quantifying decisions. Well, often it is. For example, the Bayesian decision rule depends explicitly on the a priori probabilities. A rational method of assigning values to these---either by experiment or through true knowledge of the relative likelihood of each model---may be unreasonable. In this section, we develop alternative decision rules that try to respond to such objections. One essential point will emerge from these considerations: the likelihood ratio persists as the core of optimal detectors as optimization criteria and problem complexity change. Even criteria remote from performance error measures can result in the likelihood ratio test. Such an invariance does not occur often in signal processing and underlines the likelihood ratio test's importance.

Maximizing the Probability of a Correct Decision

As only one model can describe any given set of data (the models are mutually exclusive), the probability of being correct $P_c$ for distinguishing two models is given by

$$P_c = \Pr[\text{say } \mathcal{M}_0 \text{ when } \mathcal{M}_0 \text{ true}] + \Pr[\text{say } \mathcal{M}_1 \text{ when } \mathcal{M}_1 \text{ true}]$$

We wish to determine the optimum decision region placement. Expressing the probability of being correct in terms of the likelihood functions $p_{R|\mathcal{M}_i}(r)$, the a priori probabilities, and the decision regions, we have

$$P_c = \int_{Z_0} \pi_0\, p_{R|\mathcal{M}_0}(r)\, dr + \int_{Z_1} \pi_1\, p_{R|\mathcal{M}_1}(r)\, dr$$

We want to maximize $P_c$ by selecting the decision regions $Z_0$ and $Z_1$. Mimicking the ideas of the previous section, we associate each value of $r$ with the largest integral in the expression for $P_c$. Decision region $Z_0$, for example, is defined by the collection of values of $r$ for which the first term is largest. As all of the quantities involved are non-negative, the decision rule maximizing the probability of a correct decision is

correct decision:

Given $r$, choose $i$ for which the product $\pi_i\, p_{R|\mathcal{M}_i}(r)$ is largest.

When we must select among more than two models, this result still applies (prove this for yourself). Simple manipulations lead to the likelihood ratio test when we must decide between two models.

$$\frac{p_{R|\mathcal{M}_1}(r)}{p_{R|\mathcal{M}_0}(r)} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\pi_0}{\pi_1}$$

Note that if the Bayes' costs were chosen so that $C_{ii} = 0$ and $C_{ij} = C$ ($i \neq j$), the Bayes' cost and the maximum-probability-correct thresholds would be the same.
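The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scalar Gaussian likelihoods, the function names, and the parameter values are hypothetical choices made for the example, not part of the module.

```python
import math

def gaussian_pdf(r, mean, sigma):
    """Scalar Gaussian density, used here as an assumed likelihood p_{R|M_i}(r)."""
    z = (r - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def choose_model(r, priors, means, sigma):
    """Return the index i for which priors[i] * p_{R|M_i}(r) is largest."""
    scores = [p * gaussian_pdf(r, m, sigma) for p, m in zip(priors, means)]
    return max(range(len(scores)), key=scores.__getitem__)

def lrt_two_models(r, priors, means, sigma):
    """Two-model case: say model 1 when the likelihood ratio exceeds pi_0 / pi_1."""
    ratio = gaussian_pdf(r, means[1], sigma) / gaussian_pdf(r, means[0], sigma)
    return 1 if ratio > priors[0] / priors[1] else 0

if __name__ == "__main__":
    # Hypothetical numbers: unequal priors, means 0 and 1, unit variance.
    for r in (0.2, 1.5, 2.5):
        print(r, choose_model(r, [0.8, 0.2], [0.0, 1.0], 1.0),
              lrt_two_models(r, [0.8, 0.2], [0.0, 1.0], 1.0))
```

For two models, `choose_model` and `lrt_two_models` should agree away from the decision boundary, which is the equivalence the text describes; `choose_model` also accepts more than two models.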

To evaluate the quality of the decision rule, we usually compute the probability of error $P_e$ rather than the probability of being correct. This quantity can be expressed in terms of the observations, the likelihood ratio, and the sufficient statistic.

$$\begin{aligned} P_e &= \pi_0 \int_{Z_1} p_{R|\mathcal{M}_0}(r)\, dr + \pi_1 \int_{Z_0} p_{R|\mathcal{M}_1}(r)\, dr \\ &= \pi_0 \int_{\Lambda > \eta} p_{\Lambda|\mathcal{M}_0}(\Lambda)\, d\Lambda + \pi_1 \int_{\Lambda < \eta} p_{\Lambda|\mathcal{M}_1}(\Lambda)\, d\Lambda \\ &= \pi_0 \int_{\Upsilon > \gamma} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\, d\Upsilon + \pi_1 \int_{\Upsilon < \gamma} p_{\Upsilon|\mathcal{M}_1}(\Upsilon)\, d\Upsilon \end{aligned} \tag{1}$$

These expressions point out that the likelihood ratio and the sufficient statistic can each be considered a function of the observations $r$; hence, they are random variables and have probability densities for each model. When the likelihood ratio is non-monotonic, the first expression is most difficult to evaluate. When monotonic, the middle expression often proves to be the most difficult. No matter how it is calculated, no other decision rule can yield a smaller probability of error. This statement is obvious as we minimized the probability of error implicitly by maximizing the probability of being correct because $P_e = 1 - P_c$.

From a grander viewpoint, these expressions represent an achievable lower bound on performance (as assessed by the probability of error). Furthermore, this probability will be non-zero if the conditional densities overlap over some range of values of $r$, such as occurred in the previous example. Within regions of overlap, the observed values are ambiguous: either model is consistent with the observations. Our "optimum" decision rule operates in such regions by selecting the model that is most likely to have generated the measured data (the one with the highest probability).
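As a numerical illustration of the error probability and of the claim that no other rule does better, the following sketch evaluates $P_e$ for a threshold test on a single observation. It assumes a scalar Gaussian problem (model 0: mean 0; model 1: mean $m > 0$; common standard deviation $\sigma$); the function names and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def error_probability(gamma, pi0, m, sigma):
    """P_e when we say model 1 whenever r > gamma.

    Assumed models: r ~ N(0, sigma^2) under model 0, r ~ N(m, sigma^2) under model 1.
    """
    pi1 = 1.0 - pi0
    p_false_alarm = 1.0 - NormalDist(0.0, sigma).cdf(gamma)  # Pr[r > gamma | model 0]
    p_miss = NormalDist(m, sigma).cdf(gamma)                 # Pr[r < gamma | model 1]
    return pi0 * p_false_alarm + pi1 * p_miss

def optimal_threshold(pi0, m, sigma):
    """Threshold on r equivalent to the likelihood ratio test with eta = pi0 / pi1."""
    eta = pi0 / (1.0 - pi0)
    return sigma**2 * math.log(eta) / m + m / 2.0

if __name__ == "__main__":
    pi0, m, sigma = 0.7, 1.0, 1.0
    best = optimal_threshold(pi0, m, sigma)
    for offset in (-0.5, 0.0, 0.5):
        print(offset, error_probability(best + offset, pi0, m, sigma))
```

Moving the threshold away from the likelihood-ratio value in either direction should only increase the computed error probability.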

Neyman-Pearson Criterion

Situations occur frequently where assigning or measuring the a priori probabilities $\pi_i$ is unreasonable. For example, just what is the a priori probability of a supernova occurring in any particular region of the sky? We clearly need a model evaluation procedure that can function without a priori probabilities. This kind of test results when the so-called Neyman-Pearson criterion is used to derive the decision rule.

Using nomenclature from radar, where model $\mathcal{M}_1$ represents the presence of a target and $\mathcal{M}_0$ its absence, the various types of correct and incorrect decisions have the following names.¹

  • Detection Probability - we say it's there when it is; $P_D = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_1 \text{ true}]$
  • False-alarm Probability - we say it's there when it's not; $P_F = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_0 \text{ true}]$
  • Miss Probability - we say it's not there when it is; $P_M = \Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_1 \text{ true}]$

The remaining probability $\Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_0 \text{ true}]$ has historically been left nameless and equals $1 - P_F$. We should also note that the detection and miss probabilities are related by $P_M = 1 - P_D$. As these are conditional probabilities, they do not depend on the a priori probabilities. Furthermore, the two probabilities $P_F$ and $P_D$ characterize the errors when any decision rule is used.

These two probabilities are related to each other in an interesting way. Expressing these quantities in terms of the decision regions and the likelihood functions, we have

$$P_F = \int_{Z_1} p_{R|\mathcal{M}_0}(r)\, dr \qquad P_D = \int_{Z_1} p_{R|\mathcal{M}_1}(r)\, dr$$

As the region $Z_1$ shrinks, both of these probabilities tend toward zero; as $Z_1$ expands to engulf the entire range of observation values, they both tend toward unity. This rather direct relationship between $P_D$ and $P_F$ does not mean that they equal each other; in most cases, as $Z_1$ expands, $P_D$ increases more rapidly than $P_F$ (we had better be right more often than we are wrong!). However, the "ultimate" situation where a rule is always right and never wrong ($P_D = 1$, $P_F = 0$) cannot occur when the conditional distributions overlap. Thus, to increase the detection probability we must also allow the false-alarm probability to increase. This behavior represents the fundamental tradeoff in detection theory.
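The tradeoff can be seen numerically by sweeping a threshold. The sketch below assumes a scalar Gaussian problem with decision region $Z_1 = \{r > \gamma\}$ (mean 0 under model 0, mean $m$ under model 1, common standard deviation $\sigma$); these modeling choices and the name `pf_pd` are illustrative assumptions only.

```python
from statistics import NormalDist

def pf_pd(gamma, m, sigma):
    """Return (P_F, P_D) for the assumed decision region Z_1 = {r > gamma}."""
    p_f = 1.0 - NormalDist(0.0, sigma).cdf(gamma)  # mass of the model-0 density in Z_1
    p_d = 1.0 - NormalDist(m, sigma).cdf(gamma)    # mass of the model-1 density in Z_1
    return p_f, p_d

# Lowering gamma enlarges Z_1, so both probabilities grow together.
for gamma in (3.0, 2.0, 1.0, 0.0, -1.0):
    p_f, p_d = pf_pd(gamma, m=1.0, sigma=1.0)
    print(f"gamma={gamma:5.1f}  P_F={p_f:.4f}  P_D={p_d:.4f}")
```

With $m > 0$, each printed row should show $P_D$ above $P_F$, and both should approach one as the region expands.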

One can attempt to impose a performance criterion that depends only on these probabilities with the consequent decision rule not depending on the a priori probabilities. The Neyman-Pearson criterion assumes that the false-alarm probability is constrained to be less than or equal to a specified value $\alpha$ while we maximize the detection probability $P_D$.

$$\max_{Z_1} P_D \quad \text{subject to} \quad P_F \leq \alpha$$

A subtlety of the solution we are about to obtain is that the underlying probability distribution functions may not be continuous, with the consequence that $P_F$ can never equal the constraining value $\alpha$. Furthermore, an (unlikely) possibility is that the optimum value for the false-alarm probability is somewhat less than the criterion value. Assume, therefore, that we rephrase the optimization problem by requiring that the false-alarm probability equal a value $\alpha'$ that is the largest possible value less than or equal to $\alpha$.

This optimization problem can be solved using Lagrange multipliers; we seek to find the decision rule that maximizes $F = P_D - \lambda (P_F - \alpha')$, where $\lambda$ is a positive Lagrange multiplier. This optimization technique amounts to finding the decision rule that maximizes $F$, then finding the value of the multiplier that allows the criterion to be satisfied. The multiplier weighs the detection probability in competition with false-alarm probabilities in excess of the criterion value. As is usual in the derivation of optimum decision rules, we maximize these quantities with respect to the decision regions. Expressing $P_D$ and $P_F$ in terms of them, we have

$$F = \int_{Z_1} p_{R|\mathcal{M}_1}(r)\, dr - \lambda \left( \int_{Z_1} p_{R|\mathcal{M}_0}(r)\, dr - \alpha' \right) = \lambda \alpha' + \int_{Z_1} \left[ p_{R|\mathcal{M}_1}(r) - \lambda\, p_{R|\mathcal{M}_0}(r) \right] dr \tag{2}$$

To maximize this quantity with respect to $Z_1$, we need only to integrate over those regions of $r$ where the integrand is positive. The region $Z_1$ thus corresponds to those values of $r$ where $p_{R|\mathcal{M}_1}(r) > \lambda\, p_{R|\mathcal{M}_0}(r)$ and the resulting decision rule is

$$\frac{p_{R|\mathcal{M}_1}(r)}{p_{R|\mathcal{M}_0}(r)} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \lambda$$

The ubiquitous likelihood ratio test again appears; it is indeed the fundamental quantity in hypothesis testing. Using either the logarithm of the likelihood ratio or the sufficient statistic, this result can be expressed as

$$\ln \Lambda(r) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \ln \lambda \qquad \text{or} \qquad \Upsilon(r) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$

We have not as yet found a value for the threshold. The false-alarm probability can be expressed in terms of the Neyman-Pearson threshold in two (useful) ways.

$$P_F = \int_{\lambda}^{\infty} p_{\Lambda|\mathcal{M}_0}(\Lambda)\, d\Lambda = \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\, d\Upsilon \tag{3}$$

One of these implicit equations must be solved for the threshold by setting $P_F$ equal to $\alpha'$. The selection of which to use is usually based on pragmatic considerations: the easiest to compute. From the previous discussion of the relationship between the detection and false-alarm probabilities, we find that to maximize $P_D$ we must allow $\alpha'$ to be as large as possible while remaining less than or equal to $\alpha$. Thus, we want to find the smallest value of $\lambda$ consistent with the constraint. Computation of the threshold is problem-dependent, but a solution always exists.
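When the sufficient statistic has a continuous distribution under model 0, solving for the threshold reduces to inverting a cumulative distribution function. The sketch below assumes the statistic is Gaussian with zero mean and standard deviation `sigma0` under model 0, and Gaussian with mean `mean1` under model 1; the function names and numbers are hypothetical. Because this assumed distribution is continuous, $\alpha'$ equals $\alpha$ here.

```python
from statistics import NormalDist

def np_threshold(alpha, sigma0):
    """Solve P_F = alpha for gamma, assuming the statistic is N(0, sigma0^2) under model 0."""
    return NormalDist(0.0, sigma0).inv_cdf(1.0 - alpha)

def false_alarm_probability(gamma, sigma0):
    """P_F: integral of the model-0 density of the statistic from gamma to infinity."""
    return 1.0 - NormalDist(0.0, sigma0).cdf(gamma)

def detection_probability(gamma, mean1, sigma1):
    """P_D, assuming the statistic is N(mean1, sigma1^2) under model 1."""
    return 1.0 - NormalDist(mean1, sigma1).cdf(gamma)

if __name__ == "__main__":
    alpha, sigma0, mean1 = 0.05, 1.0, 2.0   # illustrative values only
    gamma = np_threshold(alpha, sigma0)
    print("gamma =", gamma)
    print("P_F   =", false_alarm_probability(gamma, sigma0))
    print("P_D   =", detection_probability(gamma, mean1, sigma0))
```

Note that no a priori probabilities appear anywhere in this computation: the threshold depends only on $\alpha$ and the model-0 distribution of the statistic.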

Example 1

An important application of the likelihood ratio test occurs when $R$ is a Gaussian random vector for each model. Suppose the models correspond to Gaussian random vectors having different mean values but sharing the same covariance.

  • $\mathcal{M}_0$: $R \sim \mathcal{N}(0, \sigma^2 I)$
  • $\mathcal{M}_1$: $R \sim \mathcal{N}(m, \sigma^2 I)$

$R$ is of dimension $L$ and has statistically independent, equi-variance components. The vector of means $m = (m_0, \dots, m_{L-1})^T$ distinguishes the two models. The likelihood functions associated with this problem are

$$p_{R|\mathcal{M}_0}(r) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{r_l}{\sigma} \right)^2 \right\}$$

$$p_{R|\mathcal{M}_1}(r) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{r_l - m_l}{\sigma} \right)^2 \right\}$$

The likelihood ratio $\Lambda(r)$ becomes

$$\Lambda(r) = \frac{\prod_{l=0}^{L-1} \exp\left\{ -\frac{1}{2} \left( \frac{r_l - m_l}{\sigma} \right)^2 \right\}}{\prod_{l=0}^{L-1} \exp\left\{ -\frac{1}{2} \left( \frac{r_l}{\sigma} \right)^2 \right\}}$$

This expression for the likelihood ratio is complicated. In the Gaussian case (and many others), we use the logarithm to reduce the complexity of the likelihood ratio and form a sufficient statistic.

$$\ln \Lambda(r) = \sum_{l=0}^{L-1} \left[ -\frac{1}{2} \frac{(r_l - m_l)^2}{\sigma^2} + \frac{1}{2} \frac{r_l^2}{\sigma^2} \right] = \frac{1}{\sigma^2} \sum_{l=0}^{L-1} m_l r_l - \frac{1}{2\sigma^2} \sum_{l=0}^{L-1} m_l^2 \tag{4}$$

The likelihood ratio test then has the much simpler, but equivalent form

$$\sum_{l=0}^{L-1} m_l r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \sigma^2 \ln \eta + \frac{1}{2} \sum_{l=0}^{L-1} m_l^2$$

To focus on the model evaluation aspects of this problem, let's assume the means equal each other and are a positive constant: $m_l = m > 0$.² We now have

$$\sum_{l=0}^{L-1} r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\sigma^2}{m} \ln \eta + \frac{L m}{2}$$

Note that all that need be known about the observations $r_l$ is their sum. This quantity is the sufficient statistic for the Gaussian problem: $\Upsilon(r) = \sum_l r_l$ and $\gamma = \frac{\sigma^2 \ln \eta}{m} + \frac{L m}{2}$.
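A short sketch can confirm that comparing the sum of the observations with $\gamma$ gives the same decision as comparing the log-likelihood ratio with $\ln \eta$. The function names and the sample data are hypothetical; the formulas follow equation (4) and the equal-means threshold from this example.

```python
import math

def log_likelihood_ratio(r, m, sigma):
    """ln Lambda(r) for a list of means m and common variance sigma^2, as in equation (4)."""
    return sum(-0.5 * (rl - ml) ** 2 / sigma**2 + 0.5 * rl**2 / sigma**2
               for rl, ml in zip(r, m))

def lrt_decision(r, m, sigma, eta):
    """Say model 1 when ln Lambda(r) > ln(eta); equal means m_l = m are assumed."""
    return 1 if log_likelihood_ratio(r, [m] * len(r), sigma) > math.log(eta) else 0

def sufficient_statistic_decision(r, m, sigma, eta):
    """Equal-means case (m > 0): compare the sum of the observations with gamma."""
    gamma = sigma**2 * math.log(eta) / m + len(r) * m / 2.0
    return 1 if sum(r) > gamma else 0

if __name__ == "__main__":
    r = [0.9, 0.1, 1.4, -0.2]   # illustrative observations
    print(lrt_decision(r, 1.0, 1.0, 1.0), sufficient_statistic_decision(r, 1.0, 1.0, 1.0))
```

The second function never evaluates a density or an exponential, which is the practical payoff of working with the sufficient statistic.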

When trying to compute the probability of error or the threshold in the Neyman-Pearson criterion, we must find the conditional probability density of one of the decision statistics: the likelihood ratio, the log-likelihood, or the sufficient statistic. The log-likelihood and the sufficient statistic are quite similar in this problem, but clearly we should use the latter. One practical property of the sufficient statistic is that it usually simplifies computations. For this Gaussian example, the sufficient statistic is a Gaussian random variable under each model.

  • $\mathcal{M}_0$: $\Upsilon(r) \sim \mathcal{N}(0, L\sigma^2)$