Performance Evaluation

Module by: Don Johnson

We alluded earlier to the relationship between the false-alarm probability $P_F$ and the detection probability $P_D$ as one varies the decision region. Because the Neyman-Pearson criterion depends on specifying the false-alarm probability to yield an acceptable detection probability, we need to examine carefully how the detection probability is affected by a specification of the false-alarm probability. The usual way these quantities are discussed is through a parametric plot of $P_D$ versus $P_F$: the receiver operating characteristic or ROC.

As we discovered in the Gaussian example, the sufficient statistic provides the simplest way of computing these probabilities; thus, they are usually considered to depend on the threshold parameter $\gamma$. In these terms, we have

$$P_D = \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_1}(\Upsilon)\,d\Upsilon \qquad (1)$$
and
$$P_F = \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\,d\Upsilon \qquad (2)$$
These densities and their relationship to the threshold $\gamma$ are shown in Figure 1.

Figure 1: The densities of the sufficient statistic $\Upsilon(\mathbf{r})$ conditioned on the two hypotheses are shown for the Gaussian example. The threshold $\gamma$ used to distinguish between the two models is indicated. The false-alarm probability is the area to the right of the threshold under the density corresponding to $\mathcal{M}_0$; the detection probability is the area to the right of the threshold under the density corresponding to $\mathcal{M}_1$.
Densities of the sufficient statistic (suff.jpg)

We see that the detection probability is greater than or equal to the false-alarm probability. Since both probabilities must decrease monotonically as the threshold is increased, the ROC curve must be concave-down and must lie on or above the equality line (Figure 2). [1]

Figure 2: A plot of the receiver operating characteristic for the densities shown in the previous figure. Three ROC curves are shown corresponding to different values for the parameter $\sqrt{L}m/\sigma$.
Figure 2 (roc.jpg)

The degree to which the ROC departs from the equality line $P_D = P_F$ measures the relative distinctiveness between the two hypothesized models for generating the observations. In the limit, the two models can be distinguished perfectly if the ROC is discontinuous and consists of the single point $P_F = 0$, $P_D = 1$. The two are totally confused if the ROC lies on the equality line (this would mean, of course, that the two models are identical); distinguishing the two in this case would be "somewhat difficult".
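The behavior of the ROC relative to the equality line can be checked numerically for the Gaussian example. The sketch below is only an illustration under stated assumptions: the sufficient statistic is taken to be normalized to unit variance, with mean 0 under one model and mean `d` under the other, where `d` stands for the ratio $\sqrt{L}m/\sigma$. The names `q_function` and `roc_points` are ours and do not come from the module.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return _STD_NORMAL.cdf(-x)


def roc_points(d, thresholds):
    """Return (P_F, P_D) pairs obtained by sweeping a normalized threshold.

    Assumption: the normalized statistic is N(0,1) under model 0 and
    N(d,1) under model 1, with d playing the role of sqrt(L)*m/sigma.
    """
    return [(q_function(t), q_function(t - d)) for t in thresholds]


if __name__ == "__main__":
    thresholds = [-4 + 0.5 * k for k in range(25)]  # -4.0, -3.5, ..., 8.0
    for d in (0.0, 1.0, 2.0, 7.44):
        curve = roc_points(d, thresholds)
        # Largest vertical distance between the ROC and the equality line.
        gap = max(p_d - p_f for p_f, p_d in curve)
        print(f"d = {d:5.2f}: largest P_D - P_F on the sweep = {gap:.4f}")
```

With `d = 0` every point falls on the equality line, and as `d` grows the curve moves toward the corner $P_F = 0$, $P_D = 1$.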

Example 1

Consider the Gaussian example we have been discussing where the two models differ only in the means of the conditional distributions. In this case, the two model-testing probabilities are given by

$$P_F = Q\left(\frac{\gamma}{\sqrt{L}\sigma}\right) \quad\text{and}\quad P_D = Q\left(\frac{\gamma - Lm}{\sqrt{L}\sigma}\right)$$

By re-expressing $\gamma$ as $\frac{\sigma^2}{m}\gamma' + \frac{Lm}{2}$, we discover that these probabilities depend only on the ratio $\sqrt{L}m/\sigma$.

$$P_F = Q\left(\frac{\gamma'}{\sqrt{L}m/\sigma} + \frac{\sqrt{L}m}{2\sigma}\right)$$

$$P_D = Q\left(\frac{\gamma'}{\sqrt{L}m/\sigma} - \frac{\sqrt{L}m}{2\sigma}\right)$$

As this signal-to-noise ratio increases, the ROC curve approaches its "ideal" form: the northwest corner of a square, as illustrated in Figure 2 by the value of 7.44 for $\sqrt{L}m/\sigma$, which corresponds to a signal-to-noise ratio of $7.44^2 \approx 17\ \text{dB}$. If a small false-alarm probability (say $10^{-4}$) is specified, a large detection probability (0.9999) can result. Such values of signal-to-noise ratios can thus be considered "large" and the corresponding model evaluation problem relatively easy. If, however, the signal-to-noise ratio equals 4 (6 dB, that is, $\sqrt{L}m/\sigma = 2$), the figure illustrates the worsened performance: a $10^{-4}$ specification on the false-alarm probability would result in a detection probability of only about 0.04. Thus, in a fairly small signal-to-noise ratio range, the likelihood ratio test's performance capabilities can vary dramatically. However, no other decision rule can yield better performance.
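The numbers quoted in this example can be reproduced with a few lines of code. The sketch below assumes the normalized form of the Gaussian example, in which $P_F = Q(t)$ and $P_D = Q(t - d)$ with $t$ the threshold divided by $\sqrt{L}\sigma$ and $d = \sqrt{L}m/\sigma$. The function name `detection_probability` is hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def detection_probability(p_f, d):
    """P_D attained at false-alarm probability p_f when sqrt(L)*m/sigma = d.

    Uses P_F = Q(t) and P_D = Q(t - d), where Q is the Gaussian tail
    probability and t is the normalized threshold.
    """
    t = -_STD_NORMAL.inv_cdf(p_f)   # solve Q(t) = p_f for t
    return _STD_NORMAL.cdf(d - t)   # Q(t - d) equals Phi(d - t)


if __name__ == "__main__":
    for d in (7.44, 2.0):
        snr_db = 10 * math.log10(d ** 2)
        p_d = detection_probability(1e-4, d)
        print(f"d = {d}: SNR = {snr_db:.1f} dB, P_D at P_F = 1e-4 is {p_d:.4f}")
```

The same function can be called over a range of `d` values to see how quickly the detection probability changes between the "small" and "large" signal-to-noise ratio regimes.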

Specification of the false-alarm probability for a new problem requires experience. Choosing a "reasonable" value for the false-alarm probability in the Neyman-Pearson criterion depends strongly on the problem difficulty. Too small a number will result in small detection probabilities; too large a number and the detection probability will be close to unity, suggesting that fewer false alarms could have been tolerated. Problem difficulty is assessed by the degree to which the conditional densities $p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})$ and $p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})$ overlap, a problem-dependent measurement. If we are testing whether a distribution has one of two possible mean values as in our Gaussian example, a quantity like a signal-to-noise ratio will probably emerge as determining performance. The performance in this case can vary drastically depending on whether the signal-to-noise ratio is large or small.

In other kinds of problems, the best possible performance provided by the likelihood ratio test can be poor. For example, consider the problem of determining which of two zero-mean probability densities describes a given set of data consisting of statistically independent observations (see this problem). Presumably, the variances of these two densities are equal as we are trying to determine which density is most appropriate. In this case, the detection probability can be quite low, especially when the general shapes of the densities are similar. Thus a single quantity, like the signal-to-noise ratio, does not emerge to characterize problem difficulty in all hypothesis testing problems.

In the sequel, we will analyze each model evaluation and detection problem in a standard way. After the sufficient statistic has been found, we will seek a value for the threshold that attains a specified false-alarm probability. The detection probability will then be determined as a function of "problem difficulty", the measure of which is problem-dependent. We can control the choice of false-alarm probability; we have no control over problem difficulty. Confusingly, the detection probability will vary with both the specified false-alarm probability and the problem difficulty.
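The standard procedure just described (fix the false-alarm probability, find the threshold, then see how the detection probability depends on problem difficulty) can be mimicked by simulation. The sketch below is a rough illustration, not the analytical method used in the text: it assumes the Gaussian example with the sum of the observations as the sufficient statistic, and it sets the threshold from an empirical quantile of simulated model-0 data, which is our own simplification. All function names and parameter values are hypothetical.

```python
import random


def simulate_statistics(mean, sigma, num_obs, trials, rng):
    """Simulate the statistic (sum of num_obs Gaussian observations)."""
    return [sum(rng.gauss(mean, sigma) for _ in range(num_obs))
            for _ in range(trials)]


def threshold_for_false_alarm(stats_m0, p_f):
    """Choose a threshold exceeded by about a fraction p_f of model-0 statistics."""
    ordered = sorted(stats_m0)
    n_exceed = int(round(p_f * len(ordered)))            # samples allowed above
    n_exceed = min(max(n_exceed, 0), len(ordered) - 1)
    return ordered[len(ordered) - 1 - n_exceed]


def exceed_fraction(stats, threshold):
    """Fraction of statistics strictly greater than the threshold."""
    return sum(s > threshold for s in stats) / len(stats)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    num_obs, sigma, p_f = 4, 1.0, 0.05
    stats_m0 = simulate_statistics(0.0, sigma, num_obs, 5000, rng)
    gamma = threshold_for_false_alarm(stats_m0, p_f)
    print(f"threshold for P_F = {p_f}: {gamma:.3f}")
    for m in (0.25, 0.5, 1.0, 2.0):  # larger m means an easier problem
        stats_m1 = simulate_statistics(m, sigma, num_obs, 5000, rng)
        print(f"m = {m}: estimated P_D = {exceed_fraction(stats_m1, gamma):.3f}")
```

The false-alarm probability is the only quantity chosen by the designer here; the estimated detection probability then changes with the mean `m`, which stands in for problem difficulty.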

We are implicitly assuming that we have a rational method for choosing the false-alarm probability criterion value. In signal processing applications, we usually make a sequence of decisions and pass them to systems making more global determinations. For example, in digital communications problems the model evaluation formalism could be used to "receive" each bit. Each bit is received in sequence and then passed to the decoder, which invokes error-correction algorithms. The important notions here are that the decision-making process occurs at a given rate and that the decisions are presented to other signal processing systems. The rate at which errors occur in system input(s) greatly influences system design. Thus, the selection of a false-alarm probability is usually governed by the error rate that can be tolerated by succeeding systems. If the decision rate is one per day, then a moderately large (say 0.1) false-alarm probability might be appropriate. If the decision rate is a million per second, as in a one megabit-per-second communication channel, the false-alarm probability should be much lower: $10^{-12}$ would suffice for the same error rate of one-tenth per day.
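The arithmetic behind these figures is simple enough to sketch. The code below assumes, as a worst case, that every decision is made while model 0 is true, so the expected number of false alarms per day is the decision rate times the number of seconds in a day times the false-alarm probability. The function names are ours, chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def false_alarms_per_day(decisions_per_second, p_f):
    """Expected false alarms per day if every decision is made under model 0."""
    return decisions_per_second * SECONDS_PER_DAY * p_f


def false_alarm_probability_for_rate(decisions_per_second, alarms_per_day):
    """False-alarm probability that yields the given expected daily rate."""
    return alarms_per_day / (decisions_per_second * SECONDS_PER_DAY)


if __name__ == "__main__":
    # One decision per day with P_F = 0.1.
    print(false_alarms_per_day(1 / SECONDS_PER_DAY, 0.1))
    # One million decisions per second with P_F = 1e-12.
    print(false_alarms_per_day(1e6, 1e-12))
    # P_F needed for one-tenth of a false alarm per day at that rate.
    print(false_alarm_probability_for_rate(1e6, 0.1))
```

At a million decisions per second there are about $8.64 \times 10^{10}$ decisions per day, so a false-alarm probability of $10^{-12}$ gives slightly fewer than one-tenth of a false alarm per day.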

Footnotes

  1. This seemingly haughty claim is proved when we consider the sequential hypothesis test.
