Skip to content Skip to navigation

Connexions

You are here: Home » Content » Minimum Probability of Error Decision Rule

Navigation

Recently Viewed

This feature requires Javascript to be enabled.

Minimum Probability of Error Decision Rule

Module by: Clayton Scott, Robert Nowak. E-mail the authors

User rating (How does the rating system work?)
Ratings

Ratings allow you to judge the quality of modules. If other users have ranked the module then its average rating is displayed below. Ratings are calculated on a scale from one star (Poor) to five stars (Excellent).

How to rate a module

Hover over the star that corresponds to the rating you wish to assign. Click on the star to add your rating. Your rating should be based on the quality of the content. You must have an account and be logged in to rate content.

:
(0 ratings)

Note: Your browser may not currently support MathML. See our browser support page for additional details. You can always view the correct math in the PDF version.

Consider the binary hypothesis test 0 : x f 0 x 0 : x f 0 x 1 : x f 1 x 1 : x f 1 x Let π i π i , denote the a priori probability of hypothesis i i . Suppose our decision rule declares " 0 0 is the true model" when x R 0 x R 0 , and it selects 1 1 when x R 1 x R 1 , where R 1 = R 0 R 1 R 0 . The probability of making an error, denoted P e P e , is

P e =Pr declare 0 and 1 true +Pr declare 1 and 0 true =Pr 1 Pr 0 | 1 +Pr 0 Pr 1 | 0 = R 0 π 1 f 1 xdx+ R 1 π 0 f 0 xdx P e declare 0 and 1 true declare 1 and 0 true 1 1 0 0 0 1 x R 0 π 1 f 1 x x R 1 π 0 f 0 x (1)
In this module, we study the minimum probability of error decision rule, which selects R 0 R 0 and R 1 R 1 so as to minimize the above expression.

Since an observation xx falls into one and only one of the decision regions R i R i , in order to minimize P e P e , we assign xx to the region for which the corresponding integrand in Equation 1 is smaller. Thus, we select x R 0 x R 0 if π 1 f 1 x< π 0 f 0 x π 1 f 1 x π 0 f 0 x , and x R 1 x R 1 if the inequality is reversed. This decision rule may be summarized concisely as Λx f 1 x f 0 x 0 1 π 0 π 1 η Λ x f 1 x f 0 x 0 1 π 0 π 1 η Here, Λx Λ x is called the likelihood ratio, ηη is called a threshold, and the overall decision rule is called the Likelihood Ratio Test.

Example 1

Normal with Common Variance, Uncommon Means

Consider the binary hypothesis test of a scalar xx 0 : x0σ2 0 : x 0 σ 2 1 : xμσ2 1 : x μ σ 2 where μμ and σ2 σ 2 are known, positive quantities. Suppose we observe a single measurement xx. The likelihood ratio is

Λx=12πσ2-xμ22σ212πσ2-x22σ2=1σ2μxμ22 Λ x 1 2 σ 2 x μ 2 2 σ 2 1 2 σ 2 x 2 2 σ 2 1 σ 2 μ x μ 2 2 (2)
and so the minimum probability of error decision rule is 1σ2μxμ22 0 1 π 0 π 1 =η 1 σ 2 μ x μ 2 2 0 1 π 0 π 1 η The expression for Λx Λ x is somewhat complicated. By applying a sequence of monotonically increasing functions to both sides, we can obtain a simplified expression for the optimal decision rule without changing the rule. In this example, we apply the natural logarithm and rearrange terms to arrive at x 0 1 σ2μlnη+μ2γ x 0 1 σ 2 μ η μ 2 γ Here we have used the assumption μ>0 μ 0 . If μ<0 μ 0 , then dividing by μ μ would reverse the inequalities.

This form of the decision rule is much simpler: we just compare the observed value xx to a threshold γγ. Figure 1 depicts the two candidate densities and a possible value of γγ. If each hypothesis is a priori equally likely ( π 0 = π 1 =12 π 0 π 1 1 2 ), then γ=μ2 γ μ 2 . Figure 1 illustrates the case where π 0 > π 1 π 0 π 1 ( γ>μ2 γ μ 2 ).

Figure 1: The two candidate densities, and a threshold corresponding to π 0 > π 1 π 0 π 1
(a)
Figure 1(a) (GaussUncMeanComVarPd.png)
(b)
Figure 1(b) (GaussUncMeanComVarPf.png)
If we plot the two densities so that each is weighted by its a priori probability of occuring, the two curves will intersect at the threshold γγ (see Figure 2). (Can you explain why this is? Think back to our derivation of the LRT). This plot also offers a way to visualize the probability of error. Recall
P e = R 0 π 1 f 1 xdx+ R 1 π 0 f 0 xdx=x<γ π 1 f 1 xdx+x>γ π 0 f 0 xdx= π 1 P M + π 0 P F P e x R 0 π 1 f 1 x x R 1 π 0 f 0 x x x γ π 1 f 1 x x x γ π 0 f 0 x π 1 P M π 0 P F (3)
where P M P M and P F P F denote the miss and false alarm probabilities, respectively. These quantities are depicted in Figure 2.
Figure 2: The candidate densities weighted by their a priori probabilities. The shaded region is the probability of error for the optimal decision rule.
Figure 2 (GaussUncMeanComVarPe.png)
We can express P M P M and P F P F in terms of the Q-function as P e = π 1 Qμγσ+ π 0 Qγσ P e π 1 Q μ γ σ π 0 Q γ σ When π 0 = π 1 =12 π 0 π 1 1 2 , we have γ=μ2 γ μ 2 , and the error probability is P e =Qμ2σ P e Q μ 2 σ Since Qx Q x is monotonically decreasing, this says that the "difficulty" of the detection problem decreases with decreasing σσ and increasing μμ.

In the preceding example, computation of the probability of error involved a one-dimensional integral. If we had multiple observations, or vector-valued data, generalizing this procedure would involve multi-dimensional integrals over potentially complicated decision regions. Fortunately, in many cases, we can avoid this problem through the use of sufficient statistics.

Example 2

Suppose we have the same test as in the previous example, but now we have NN independent observations: 0 : x n 0σ2 , n = 1 , , N 0 : x n 0 σ 2 , n = 1 , , N 1 : x n μσ2 , n = 1 , , N 1 : x n μ σ 2 , n = 1 , , N where μ>0 μ 0 and σ2>0 σ 2 0 and both are known. The likelihood ratio is

Λx=n=1N12πσ2- x n μ22σ2n=1N12πσ2- x n 22σ2=-12σ2n=1N x n μ2-12σ2n=1N x n 2=12σ2n=1N2 x n μμ2=1σ2μn=1N x n Nμ22 Λ x n 1 N 1 2 σ 2 x n μ 2 2 σ 2 n 1 N 1 2 σ 2 x n 2 2 σ 2 -1 2 σ 2 n 1 N x n μ 2 -1 2 σ 2 n 1 N x n 2 1 2 σ 2 n 1 N 2 x n μ μ 2 1 σ 2 μ n 1 N x n N μ 2 2 (4)
As in the previous example, we may apply the natural logarithm and rearrange terms to obtain an equivalent form of the LRT: tn=1N x n 0 1 σ2μlnη+Nμ2γ t n 1 N x n 0 1 σ 2 μ η N μ 2 γ The scalar quantity tt is a sufficient statistic for the mean. In order to evaluate the probability of error without resorting to a multi-dimensional integral, we can express P e P e in terms of tt as P e = π 1 Prt<γ| 1 true + π 0 Prt>γ| 0 true P e π 1 1 true t γ π 0 0 true t γ Now tt is a linear combination of normal variates, so it is itself normal. In particular, we have t=Ax t A x , where 11 1 1 is an NN-dimensional row vector of 1's, and xx is multivariate normal with mean 00 or μ=μμT μ μ μ , and covariance σ2I σ 2 I . Thus we have t | 0 A0Aσ2IAT=0Nσ2 t | 0 A 0 A σ 2 I A 0 N σ 2 t | 1 AμAσ2IAT=NμNσ2 t | 1 A μ A σ 2 I A N μ N σ 2 Therefore, we may write P e P e in terms of the Q-function as P e = π 1 QNμγNσ+ π 0 QγNσ P e π 1 Q N μ γ N σ π 0 Q γ N σ In the special case π 0 = π 1 =12 π 0 π 1 1 2 , P e =QNμσ P e Q N μ σ Since QQ is monotonically decreasing, this result provides mathematical support for something that is intuitively obvious: The performance of our decision rule improves with increasing NN and μμ, and decreasing σσ.

remark:

In the context of signal processing, the foregoing problem may be viewed as the problem of detecting a constant (DC) signal in additive white Gaussian noise: 0 : x n = w n , n = 1 , , N 0 : x n w n , n = 1 , , N 1 : x n =A+ w n , n = 1 , , N 1 : x n A w n , n = 1 , , N where AA is a known, fixed amplitude, and w n 0σ2 w n 0 σ 2 . Here AA corresponds to the mean μμ in the example.

The next example explores the minimum probability of error decision rule in a discrete setting.

Example 3

Repetition Code

Suppose we have a friend who is trying to transmit a bit (0 or 1) to us over a noisy channel. The channel causes an error in the transmission (that is, the bit is flipped) with probability pp, where 0p<12 0 p 1 2 , and pp is known. In order to increase the chance of a successful transmission, our friend sends the same bit NN times. Assume the NN transmissions are statistically independent. Under these assumptions, the bits you receive are Bernoulli random variables: x n Bernoulliθ x n Bernoulli θ . We are faced with the following hypothesis test:

Table 1
0 0 θ=p θ p 0 sent
1 1 θ=1p θ 1 p 1 sent
We decide to decode the received sequence x= x 1 x N T x x 1 x N by minimizing the probability of error. The likelihood ratio is
Λx=n=1N1p x n p1 x n n=1Np x n 1p1 x n =1pkpNkpk1pNk=1pp2kN Λ x n 1 N 1 p x n p 1 x n n 1 N p x n 1 p 1 x n 1 p k p N k p k 1 p N k 1 p p 2 k N (5)
where k=n=1N x n k n 1 N x n is the number of 1s received.

Note:

kk is a sufficient statistic for θθ.
The LRT is 1pp2kN 0 1 π 0 π 1 =η 1 p p 2 k N 0 1 π 0 π 1 η Taking the natural logarithm of both sides and rearranging, we have k 0 1 N2+12lnηln1pp=γ k 0 1 N 2 1 2 η 1 p p γ In the case that both hypotheses are equally likely, the minimum probability of error decision is the "majority-vote" rule: Declare 1 1 if there are more 1s than 0s, declare 0 0 otherwise. In the event k=γ k γ , we may decide arbitrarily; the probability of error is the same either way. Let's adopt the convention that 0 0 is declared in this case.

To compute the probability of error of the optimal rule, write

P e = π 0 Pr declare 1 | 0 true + π 1 Pr declare 0 | 1 true = π 0 Prk>γ| 0 true + π 1 Prkγ| 1 true P e π 0 0 true declare 1 π 1 1 true declare 0 π 0 0 true k γ π 1 1 true k γ (6)
Now kk is a binomial random variable, kBinomialNθ k Binomial N θ , where θθ depends on which hypothesis is true. We have
Prk>γ| 0 =k=γ+1N f 0 k=k=γ+1NNkpk1pNk 0 k γ k γ 1 N f 0 k k γ 1 N N k p k 1 p N k (7)
and Prkγ| 1 =k=0γNk1pkpNk 1 k γ k 0 γ N k 1 p k p N k Using these formulae, we may compute P e P e explicitly for given values of NN, pp, π 0 π 0 and π 1 π 1 .

MAP Interpretation

The likelihood ratio test is one way of expressing the minimum probability of error decision rule. Another way is

Rule 1

Declare hypothesis ii such that π i f i x π i f i x is maximal.

This rule is referred to as the maximum a posteriori, or MAP rule, because the quantity π i f i x π i f i x is proportional to the posterior probability of hypothesis ii. This becomes clear when we write π i =Pr i π i i and f i x=f x | i f i x f x | i . Then, by Bayes rule, the posterior probability of i i given the data is Pr i |x=Pr i f x | i fx x i i f x | i f x Here fx f x is the unconditional density or mass function for xx, which is effectively a constant when trying to maximiaze with respect to ii.

According to the MAP interpretation, the optimal decision boundary is the locus of points where the weighted densities (in the continuous case) π i f i x π i f i x intersect one another. This idea is illustrated in Example 2.

Multiple Hypotheses

One advantage the MAP formulation of the minimum probability of error decision rule has over the LRT is that it generalizes easily to MM-ary hypothesis testing. If we are to choose between hypotheses i i , i=1M i 1 M , the optimal rule is still the MAP rule

Special Case of Bayes Risk

The Bayes risk criterion for constructing decision rules assigns a cost C i j C i j to the outcome of declaring i i when j j is in effect. The probability of error is simply a special case of the Bayes risk corresponding to C 00 = C 11 =0 C 00 C 11 0 and C 01 = C 10 =1 C 01 C 10 1 . Therefore, the form of the minimum probability of error decision rule is a specialization of the minimum Bayes risk decision rule: both are likelihood ratio tests. The different costs in the Bayes risk formulation simply shift the threshold to favor one hypothesis over the other.

Problems

Exercise 1

Generally speaking, when is the probability of error zero for the optimal rule? Phrase your answer in terms of the distributions underlying each hypothesis. Does the LRT agree with your answer in this case?

Exercise 2

Suppose we measure NN independent values x 1 , , x N x 1 , , x N . We know the variance of our measurements ( σ2=1 σ 2 1 ), but are unsure whether the data obeys a Laplacian or Gaussian probability law: 0 : f 0 x=12-2|r| 0 : f 0 x 1 2 2 r 1 : f 1 x=12π-r22 1 : f 1 x 1 2 r 2 2

2.a)

Show that the two densities have the same mean and variance, and plot the densities on the same graph.

2.b)

Find the likelihood ratio.

2.c)

Determine the decision regions for different values of the threshold ηη. Consider all possible values of η>0 η 0

hint:
There are three distinct cases.

2.d)

Draw the decision regions and decision boundaries for η=1212 η 1 2 1 2 .

2.e)

Assuming the two hypotheses are equally likely, compute the probability of error. Your answer should be a number.

Exercise 3

3.a) Arbitrary Means and Covariances

Consider the hypothesis testing problem 0 : xμ0 Σ 0 0 : x μ 0 Σ 0 1 : xμ1 Σ 1 1 : x μ 1 Σ 1 where μ0d μ 0 d and μ0d μ 0 d , and Σ 0 Σ 0 , Σ 1 Σ 1 are positive definite, symmetric dd× dd matrices. Write down the likelihood ratio test, and simplify, for the following cases. In each case, provide a geometric description of the decision boundary.

Σ 0 = Σ 1 Σ 0 Σ 1 , but μ0μ1 μ 0 μ 1 .

μ0=μ1 μ 0 μ 1 , but Σ 0 Σ 1 Σ 0 Σ 1 .

μ0μ1 μ 0 μ 1 and Σ 0 Σ 1 Σ 0 Σ 1 .

Exercise 4

Suppose we observe NN independent realizations of a Poisson random variable kk with intensity parameter λλ: fk=-λλkk! f k λ λ k k We must decide which of two intensities is in effect: 0 : λ= λ 0 0 : λ λ 0 1 : λ= λ 1 1 : λ λ 1 where λ 0 < λ 1 λ 0 λ 1 .

4.a)

Give the minimum probability of error decision rule.

4.b)

Simplify the LRT to a test statistic involving only a sufficient statistic. Apply a monotonically increasing transformation to simplify further.

4.c)

Determine the distribution of the sufficient statistic under both hypotheses.

Hint:
Use the characteristic function to show that a sum of IID Poisson variates is again Poisson distributed.

4.d)

Derive an expression for the probability of error.

4.e)

Assuming the two hypotheses are equally likely, and λ 0 =5 λ 0 5 and λ 1 =6 λ 1 6 , what is the minimum number NN of observations needed to attain a probability of error no greater than 0.01?

Hint:
If you have numerical trouble, try rewriting the log-factorial so as to avoid evaluating the factorial of large integers.

Exercise 5

In Example 3, suppose π 0 = π 1 =12 π 0 π 1 1 2 , and p=0.1 p 0.1 . What is the smallest value of NN needed to ensure P e 0.01 P e 0.01 ?

Content actions

Give Feedback:

E-mail the module authors | Rate module ( How does the rating system work?)

Rating system

Ratings

Ratings allow you to judge the quality of modules. If other users have ranked the module then its average rating is displayed below. Ratings are calculated on a scale from one star (Poor) to five stars (Excellent).

How to rate a module

Hover over the star that corresponds to the rating you wish to assign. Click on the star to add your rating. Your rating should be based on the quality of the content. You must have an account and be logged in to rate content.

(0 ratings)

Download:

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections directly in Connexions. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need a Connexions account to use 'My Favorites'.

| A lens (?)

Definition of a lens

Lenses

A lens is a custom view of Connexions content. You can think of it as a fancy kind of list that will let you see Connexions through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to Connexions materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual Connexions member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks