The Bayes Risk Criterion in Hypothesis Testing

Module by: Clayton Scott, Robert Nowak

The design of a hypothesis test/detector often involves constructing the solution to an optimization problem. The optimality criteria used fall into two classes: Bayesian and frequentist.

In the Bayesian setup, it is assumed that the a priori probability $\pi_i$ of each hypothesis $H_i$ occurring is known. A cost $C_{ij}$ is assigned to each possible outcome:

$$C_{ij} = \text{cost of saying } H_i \text{ when } H_j \text{ is true}$$

The optimal test/detector is the one that minimizes the Bayes risk, which is defined to be the expected cost of an experiment:

$$\bar{C} = \sum_{i,j} C_{ij}\, \pi_j \Pr[\text{say } H_i \text{ when } H_j \text{ true}]$$

Here each cost is weighted by the prior $\pi_j$ of the hypothesis that is actually true, and the probability is conditional on $H_j$ being true.
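As a minimal sketch of the definition above, the Bayes risk can be evaluated as a double sum once the costs, the priors, and the conditional decision probabilities are tabulated. The function name `bayes_risk` and all numbers below are hypothetical, chosen only for illustration; `decision_probs[i][j]` stands for the probability of saying $H_i$ given that $H_j$ is true.

```python
def bayes_risk(costs, priors, decision_probs):
    """Expected cost: sum over i, j of C[i][j] * pi[j] * Pr(say H_i | H_j true)."""
    total = 0.0
    for i, cost_row in enumerate(costs):
        for j, c_ij in enumerate(cost_row):
            total += c_ij * priors[j] * decision_probs[i][j]
    return total


# Hypothetical values, not taken from the module.
costs = [[0.0, 1.0],   # C00, C01
         [1.0, 0.0]]   # C10, C11
priors = [0.25, 0.75]  # pi_0, pi_1
# Each column j sums to 1: some decision is always made when H_j is true.
decision_probs = [[0.9, 0.2],
                  [0.1, 0.8]]

risk = bayes_risk(costs, priors, decision_probs)
print(risk)
```

With zero cost for correct decisions and unit cost for errors, as assumed here, the risk reduces to the overall probability of error.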

In the event that we have a binary problem, and both hypotheses are simple, the decision rule that minimizes the Bayes risk can be constructed explicitly. Let us assume that the data is continuous (i.e., it has a density) under each hypothesis:

$$H_0: x \sim f_0(x)$$
$$H_1: x \sim f_1(x)$$

Let $R_0$ and $R_1$ denote the decision regions corresponding to the optimal test. Clearly, the optimal test is specified once we specify $R_0$ and $R_1 = \overline{R_0}$, the complement of $R_0$.

The Bayes risk may be written

$$\bar{C} = \sum_{i,j=0}^{1} C_{ij}\, \pi_j \int_{R_i} f_j(x)\, dx = \int_{R_0} \left[ C_{00} \pi_0 f_0(x) + C_{01} \pi_1 f_1(x) \right] dx + \int_{R_1} \left[ C_{10} \pi_0 f_0(x) + C_{11} \pi_1 f_1(x) \right] dx \qquad (1)$$
Recall that $R_0$ and $R_1$ partition the input space: they are disjoint and their union is the full input space. Thus, every possible input $x$ belongs to precisely one of these regions. In order to minimize the Bayes risk, a measurement $x$ should belong to the decision region $R_i$ for which the corresponding integrand in the preceding equation is smaller. Therefore, the Bayes risk is minimized by assigning $x$ to $R_0$ whenever

$$\pi_0 C_{00} f_0(x) + \pi_1 C_{01} f_1(x) < \pi_0 C_{10} f_0(x) + \pi_1 C_{11} f_1(x)$$

and assigning $x$ to $R_1$ whenever this inequality is reversed. Collecting the $f_1$ terms on one side and the $f_0$ terms on the other, and assuming that an error costs more than a correct decision ($C_{01} > C_{11}$ and $C_{10} > C_{00}$), the resulting rule may be expressed concisely as

$$\Lambda(x) \equiv \frac{f_1(x)}{f_0(x)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})} \equiv \eta$$

Here, $\Lambda(x)$ is called the likelihood ratio, $\eta$ is called the threshold, and the overall decision rule is called the Likelihood Ratio Test (LRT).
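The likelihood ratio test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `lrt_threshold`, `lrt_decide`, and `gaussian_pdf` are hypothetical, and the two unit-variance Gaussian densities (means 0 and 1) are an assumed example, not part of the module. The comparison is written as $f_1(x) > \eta f_0(x)$, which is equivalent to $\Lambda(x) > \eta$ wherever $f_0(x) > 0$ and avoids dividing by zero.

```python
import math


def lrt_threshold(pi0, pi1, costs):
    """Threshold eta = pi0 (C10 - C00) / (pi1 (C01 - C11)), with costs[i][j] = C_ij."""
    numerator = pi0 * (costs[1][0] - costs[0][0])
    denominator = pi1 * (costs[0][1] - costs[1][1])
    if denominator <= 0:
        raise ValueError("requires pi1 > 0 and C01 > C11")
    return numerator / denominator


def lrt_decide(x, f0, f1, eta):
    """Return 1 (decide H1) if f1(x) > eta * f0(x), else 0 (decide H0).

    A tie is sent to H0 here; that choice is arbitrary.
    """
    return 1 if f1(x) > eta * f0(x) else 0


def gaussian_pdf(mean):
    """Unit-variance Gaussian density with the given mean (assumed example)."""
    def pdf(x):
        return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)
    return pdf


f0 = gaussian_pdf(0.0)  # assumed density under H0
f1 = gaussian_pdf(1.0)  # assumed density under H1
eta = lrt_threshold(0.5, 0.5, [[0.0, 1.0], [1.0, 0.0]])
decisions = [lrt_decide(x, f0, f1, eta) for x in (0.4, 0.6)]
print(eta, decisions)
```

For these assumed densities, equal priors, and symmetric costs, the threshold is 1 and the test reduces to comparing $x$ with the midpoint 0.5 of the two means.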

Example 1

An instructor in a course in detection theory wants to determine if a particular student studied for his last test. The observed quantity is the student's grade, which we denote by $r$. A failing grade does not by itself reveal whether the student studied: conscientious students may fail the test. Define the models as

  • $H_0$: did not study
  • $H_1$: did study
The conditional densities of the grade are shown in Figure 1.
Figure 1 (grade.png): Conditional densities for the grade distributions assuming that a student did not study ($H_0$) or did ($H_1$) are shown in the top row. The lower portion depicts the likelihood ratio formed from these densities.
Based on knowledge of student behavior, the instructor assigns a priori probabilities of $\pi_0 = 1/4$ and $\pi_1 = 3/4$. The costs $C_{ij}$ are chosen to reflect the instructor's sensitivity to student feelings: $C_{01} = 1 = C_{10}$ (an erroneous decision either way is given the same cost) and $C_{00} = 0 = C_{11}$. The likelihood ratio is plotted in Figure 1 and the threshold value $\eta$, which is computed from the a priori probabilities and the costs to be $1/3$, is indicated. The calculations of this comparison can be simplified in an obvious way:

$$\frac{r}{50} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{3} \quad \text{or} \quad r \underset{H_0}{\overset{H_1}{\gtrless}} \frac{50}{3} \approx 16.7$$

The multiplication by the factor of 50 is a simple illustration of the reduction of the likelihood ratio to a sufficient statistic. Based on the assigned costs and a priori probabilities, the optimum decision rule says the instructor must assume that the student did not study if the student's grade is less than 16.7; if greater, the student is assumed to have studied despite receiving an abysmally low grade such as 20. Note that because the densities given by each model overlap entirely, the possibility of making the wrong interpretation always haunts the instructor. However, no other procedure will be better!
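The arithmetic of this example can be checked with a short sketch. It uses only the priors, the costs, and the likelihood ratio $r/50$ stated above; the variable and function names are hypothetical, and exact fractions are used so the threshold is not affected by rounding.

```python
from fractions import Fraction

# Priors and costs as given in the example.
pi0, pi1 = Fraction(1, 4), Fraction(3, 4)
c00, c01, c10, c11 = 0, 1, 1, 0

# Threshold eta = pi0 (C10 - C00) / (pi1 (C01 - C11)).
eta = pi0 * (c10 - c00) / (pi1 * (c01 - c11))

# The likelihood ratio is r / 50, so the equivalent threshold on the grade is 50 * eta.
grade_threshold = 50 * eta


def decide(grade):
    """Return 1 (assume the student studied) if grade / 50 exceeds eta, else 0."""
    return 1 if Fraction(grade) / 50 > eta else 0


print(eta, float(grade_threshold), decide(20), decide(16))
```

A grade of 20 lies above the threshold of 50/3, so the rule decides that the student studied, while a grade of 16 lies below it.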

A special case of the minimum Bayes risk rule, the minimum probability of error rule, is used extensively in practice, and is discussed at length in another module.

Problems

Exercise 1

Denote $\alpha = \Pr[\text{declare } H_1 \text{ when } H_0 \text{ true}]$ and $\beta = \Pr[\text{declare } H_1 \text{ when } H_1 \text{ true}]$. Express the Bayes risk $\bar{C}$ in terms of $\alpha$, $\beta$, $C_{ij}$, and $\pi_i$. Argue that the optimal decision rule is not altered by setting $C_{00} = C_{11} = 0$.

Exercise 2

Suppose we observe $x$ such that $\Lambda(x) = \eta$. Argue that it doesn't matter whether we assign $x$ to $R_0$ or $R_1$.
