Skip to content Skip to navigation Skip to collection information

Connexions

You are here: Home » Content » Statistical Signal Processing » Minimum Probability of Error Decision Rule

Navigation

Recently Viewed

This feature requires Javascript to be enabled.
 

Minimum Probability of Error Decision Rule

Module by: Clayton Scott, Robert Nowak. E-mail the authors

Consider the binary hypothesis test 0 : x f 0 x 0 : x f 0 x 1 : x f 1 x 1 : x f 1 x Let π i π i , denote the a priori probability of hypothesis i i . Suppose our decision rule declares " 0 0 is the true model" when x R 0 x R 0 , and it selects 1 1 when x R 1 x R 1 , where R 1 = R 0 R 1 R 0 . The probability of making an error, denoted P e P e , is

P e =Pr declare 0 and 1 true +Pr declare 1 and 0 true =Pr 1 Pr 0 | 1 +Pr 0 Pr 1 | 0 = π 1 f 1 xd x + π 0 f 0 xd x P e declare 0 and 1 true declare 1 and 0 true 1 1 0 0 0 1 x R 0 π 1 f 1 x x R 1 π 0 f 0 x
(1)
In this module, we study the minimum probability of error decision rule, which selects R 0 R 0 and R 1 R 1 so as to minimize the above expression.

Since an observation xx falls into one and only one of the decision regions R i R i , in order to minimize P e P e , we assign xx to the region for which the corresponding integrand in Equation 1 is smaller. Thus, we select x R 0 x R 0 if π 1 f 1 x< π 0 f 0 x π 1 f 1 x π 0 f 0 x , and x R 1 x R 1 if the inequality is reversed. This decision rule may be summarized concisely as Λx f 1 x f 0 x 0 1 π 0 π 1 η Λ x f 1 x f 0 x 0 1 π 0 π 1 η Here, Λx Λ x is called the likelihood ratio, ηη is called a threshold, and the overall decision rule is called the Likelihood Ratio Test.

Example 1

Normal with Common Variance, Uncommon Means

Consider the binary hypothesis test of a scalar xx 0 : x𝒩0σ2 0 : x 0 σ 2 1 : x𝒩μσ2 1 : x μ σ 2 where μμ and σ2 σ 2 are known, positive quantities. Suppose we observe a single measurement xx. The likelihood ratio is

Λx=12πσ2exμ22σ212πσ2ex22σ2=e1σ2(μxμ22) Λ x 1 2 σ 2 x μ 2 2 σ 2 1 2 σ 2 x 2 2 σ 2 1 σ 2 μ x μ 2 2
(2)
and so the minimum probability of error decision rule is e1σ2(μxμ22) 0 1 π 0 π 1 =η 1 σ 2 μ x μ 2 2 0 1 π 0 π 1 η The expression for Λx Λ x is somewhat complicated. By applying a sequence of monotonically increasing functions to both sides, we can obtain a simplified expression for the optimal decision rule without changing the rule. In this example, we apply the natural logarithm and rearrange terms to arrive at x 0 1 σ2μlnη+μ2γ x 0 1 σ 2 μ η μ 2 γ Here we have used the assumption μ>0 μ 0 . If μ<0 μ 0 , then dividing by μ μ would reverse the inequalities.

This form of the decision rule is much simpler: we just compare the observed value xx to a threshold γγ. Figure 1 depicts the two candidate densities and a possible value of γγ. If each hypothesis is a priori equally likely ( π 0 = π 1 =12 π 0 π 1 1 2 ), then γ=μ2 γ μ 2 . Figure 1 illustrates the case where π 0 > π 1 π 0 π 1 ( γ>μ2 γ μ 2 ).

Figure 1: The two candidate densities, and a threshold corresponding to π 0 > π 1 π 0 π 1
(a)
Figure 1(a) (GaussUncMeanComVarPd.png)
(b)
Figure 1(b) (GaussUncMeanComVarPf.png)
If we plot the two densities so that each is weighted by its a priori probability of occuring, the two curves will intersect at the threshold γγ (see Figure 2). (Can you explain why this is? Think back to our derivation of the LRT). This plot also offers a way to visualize the probability of error. Recall
P e = π 1 f 1 xd x + π 0 f 0 xd x = π 1 f 1 xd x + π 0 f 0 xd x = π 1 P M + π 0 P F P e x R 0 π 1 f 1 x x R 1 π 0 f 0 x x x γ π 1 f 1 x x x γ π 0 f 0 x π 1 P M π 0 P F
(3)
where P M P M and P F P F denote the miss and false alarm probabilities, respectively. These quantities are depicted in Figure 2.
Figure 2: The candidate densities weighted by their a priori probabilities. The shaded region is the probability of error for the optimal decision rule.
Figure 2 (GaussUncMeanComVarPe.png)
We can express P M P M and P F P F in terms of the Q-function as P e = π 1 Qμγσ+ π 0 Qγσ P e π 1 Q μ γ σ π 0 Q γ σ When π 0 = π 1 =12 π 0 π 1 1 2 , we have γ=μ2 γ μ 2 , and the error probability is P e =Qμ2σ P e Q μ 2 σ Since Qx Q x is monotonically decreasing, this says that the "difficulty" of the detection problem decreases with decreasing σσ and increasing μμ.

In the preceding example, computation of the probability of error involved a one-dimensional integral. If we had multiple observations, or vector-valued data, generalizing this procedure would involve multi-dimensional integrals over potentially complicated decision regions. Fortunately, in many cases, we can avoid this problem through the use of sufficient statistics.

Example 2

Suppose we have the same test as in the previous example, but now we have NN independent observations: 0 : x n 𝒩0σ2 , n = 1 , , N 0 : x n 0 σ 2 , n = 1 , , N 1 : x n 𝒩μσ2 , n = 1 , , N 1 : x n μ σ 2 , n = 1 , , N where μ>0 μ 0 and σ2>0 σ 2 0 and both are known. The likelihood ratio is

Λx= n =1N12πσ2e x n μ22σ2 n =1N12πσ2e x n 22σ2=e-12σ2 n =1N x n μ2e-12σ2 n =1N x n 2=e12σ2 n =1N2 x n μμ2=e1σ2(μ n =1N x n Nμ22) Λ x n 1 N 1 2 σ 2 x n μ 2 2 σ 2 n 1 N 1 2 σ 2 x n 2 2 σ 2 -1 2 σ 2 n 1 N x n μ 2 -1 2 σ 2 n 1 N x n 2 1 2 σ 2 n 1 N 2 x n μ μ 2 1 σ 2 μ n 1 N x n N μ 2 2
(4)
As in the previous example, we may apply the natural logarithm and rearrange terms to obtain an equivalent form of the LRT: t n =1N x n 0 1 σ2μlnη+Nμ2γ t n 1 N x n 0 1 σ 2 μ η N μ 2 γ The scalar quantity tt is a sufficient statistic for the mean. In order to evaluate the probability of error without resorting to a multi-dimensional integral, we can express P e P e in terms of tt as P e = π 1 Prt<γ| 1 true + π 0 Prt>γ| 0 true P e π 1 1 true t γ π 0 0 true t γ Now tt is a linear combination of normal variates, so it is itself normal. In particular, we have t=Ax t A x , where ( 11 ) 1 1 is an NN-dimensional row vector of 1's, and xx is multivariate normal with mean 00 or μ=μμT μ μ μ , and covariance σ2I σ 2 I . Thus we have t | 0 𝒩A0Aσ2IAT=𝒩0Nσ2 t | 0 A 0 A σ 2 I A 0 N σ 2 t | 1 𝒩AμAσ2IAT=𝒩NμNσ2 t | 1 A μ A σ 2 I A N μ N σ 2 Therefore, we may write P e P e in terms of the Q-function as P e = π 1 QNμγNσ+ π 0 QγNσ P e π 1 Q N μ γ N σ π 0 Q γ N σ In the special case π 0 = π 1 =12 π 0 π 1 1 2 , P e =QNμσ P e Q N μ σ Since QQ is monotonically decreasing, this result provides mathematical support for something that is intuitively obvious: The performance of our decision rule improves with increasing NN and μμ, and decreasing σσ.

remark:

In the context of signal processing, the foregoing problem may be viewed as the problem of detecting a constant (DC) signal in additive white Gaussian noise: 0 : x n = w n , n = 1 , , N 0 : x n w n , n = 1 , , N 1 : x n =A+ w n , n = 1 , , N 1 : x n A w n , n = 1 , , N where AA is a known, fixed amplitude, and w n 𝒩0σ2 w n 0 σ 2 . Here AA corresponds to the mean μμ in the example.

The next example explores the minimum probability of error decision rule in a discrete setting.

Example 3

Repetition Code

Suppose we have a friend who is trying to transmit a bit (0 or 1) to us over a noisy channel. The channel causes an error in the transmission (that is, the bit is flipped) with probability pp, where 0p<12 0 p 1 2 , and pp is known. In order to increase the chance of a successful transmission, our friend sends the same bit NN times. Assume the NN transmissions are statistically independent. Under these assumptions, the bits you receive are Bernoulli random variables: x n Bernoulliθ x n Bernoulli θ . We are faced with the following hypothesis test:

Table 1
0 0 θ=p θ p 0 sent
1 1 θ=1p θ 1 p 1 sent
We decide to decode the received sequence x= x 1 x N T x x 1 x N by minimizing the probability of error. The likelihood ratio is
Λx= n =1N1p x n p1 x n n =1Np x n 1p1 x n =1pkpNkpk1pNk=1pp2kN Λ x n 1 N 1 p x n p 1 x n n 1 N p x n 1 p 1 x n 1 p k p N k p k 1 p N k 1 p p 2 k N
(5)
where k= n =1N x n k n 1 N x n is the number of 1s received.

Note:

kk is a sufficient statistic for θθ.
The LRT is 1pp2kN 0 1 π 0 π 1 =η 1 p p 2 k N 0 1 π 0 π 1 η Taking the natural logarithm of both sides and rearranging, we have k 0 1 N2+12lnηln1pp=γ k 0 1 N 2 1 2 η 1 p p γ In the case that both hypotheses are equally likely, the minimum probability of error decision is the "majority-vote" rule: Declare 1 1 if there are more 1s than 0s, declare 0 0 otherwise. In the event k=γ k γ , we may decide arbitrarily; the probability of error is the same either way. Let's adopt the convention that 0 0 is declared in this case.

To compute the probability of error of the optimal rule, write

P e = π 0 Pr declare 1 | 0 true + π 1 Pr declare 0 | 1 true = π 0 Prk>γ| 0 true + π 1 Prkγ| 1 true P e π 0 0 true declare 1 π 1 1 true declare 0 π 0 0 true k γ π 1 1 true k γ
(6)
Now kk is a binomial random variable, kBinomialNθ k Binomial N θ , where θθ depends on which hypothesis is true. We have
Prk>γ| 0 = k =γ+1N f 0 k= k =γ+1NNkpk1pNk 0 k γ k γ 1 N f 0 k k γ 1 N N k p k 1 p N k
(7)
and Prkγ| 1 = k =0γNk1pkpNk 1 k γ k 0 γ N k 1 p k p N k Using these formulae, we may compute P e P e explicitly for given values of NN, pp, π 0 π 0 and π 1 π 1 .

MAP Interpretation

The likelihood ratio test is one way of expressing the minimum probability of error decision rule. Another way is

Rule 1

Declare hypothesis ii such that π i f i x π i f i x is maximal.

This rule is referred to as the maximum a posteriori, or MAP rule, because the quantity π i f i x π i f i x is proportional to the posterior probability of hypothesis ii. This becomes clear when we write π i =Pr i π i i and f i x=f x | i f i x f x | i . Then, by Bayes rule, the posterior probability of i i given the data is Pr i | x =Pr i f x | i fx x i i f x | i f x Here fx f x is the unconditional density or mass function for xx, which is effectively a constant when trying to maximiaze with respect to ii.

According to the MAP interpretation, the optimal decision boundary is the locus of points where the weighted densities (in the continuous case) π i f i x π i f i x intersect one another. This idea is illustrated in Example 2.

Multiple Hypotheses

One advantage the MAP formulation of the minimum probability of error decision rule has over the LRT is that it generalizes easily to MM-ary hypothesis testing. If we are to choose between hypotheses i i , i=1M i 1 M , the optimal rule is still the MAP rule

Special Case of Bayes Risk

The Bayes risk criterion for constructing decision rules assigns a cost C i j C i j to the outcome of declaring i i when j j is in effect. The probability of error is simply a special case of the Bayes risk corresponding to C 00 = C 11 =0 C 00 C 11 0 and C 01 = C 10 =1 C 01 C 10 1 . Therefore, the form of the minimum probability of error decision rule is a specialization of the minimum Bayes risk decision rule: both are likelihood ratio tests. The different costs in the Bayes risk formulation simply shift the threshold to favor one hypothesis over the other.

Problems

Exercise 1

Generally speaking, when is the probability of error zero for the optimal rule? Phrase your answer in terms of the distributions underlying each hypothesis. Does the LRT agree with your answer in this case?

Exercise 2

Suppose we measure NN independent values x 1 , , x N x 1 , , x N . We know the variance of our measurements ( σ2=1 σ 2 1 ), but are unsure whether the data obeys a Laplacian or Gaussian probability law: 0 : f 0 x=12e(2|r|) 0 : f 0 x 1 2 2 r 1 : f 1 x=12πer22 1 : f 1 x 1 2 r 2 2

2.a)

Show that the two densities have the same mean and variance, and plot the densities on the same graph.

2.b)

Find the likelihood ratio.

2.c)

Determine the decision regions for different values of the threshold ηη. Consider all possible values of η>0 η 0

hint:
There are three distinct cases.

2.d)

Draw the decision regions and decision boundaries for η=1212 η 1 2 1 2 .

2.e)

Assuming the two hypotheses are equally likely, compute the probability of error. Your answer should be a number.

Exercise 3

3.a) Arbitrary Means and Covariances

Consider the hypothesis testing problem 0 : x𝒩 μ 0 Σ 0 0 : x μ 0 Σ 0 1 : x𝒩 μ 1 Σ 1 1 : x μ 1 Σ 1 where μ 0 Rd μ 0 d and μ 0 Rd μ 0 d , and Σ 0 Σ 0 , Σ 1 Σ 1 are positive definite, symmetric dd× dd matrices. Write down the likelihood ratio test, and simplify, for the following cases. In each case, provide a geometric description of the decision boundary.

Σ 0 = Σ 1 Σ 0 Σ 1 , but μ 0 μ 1 μ 0 μ 1 .

μ 0 = μ 1 μ 0 μ 1 , but Σ 0 Σ 1 Σ 0 Σ 1 .

μ 0 μ 1 μ 0 μ 1 and Σ 0 Σ 1 Σ 0 Σ 1 .

Exercise 4

Suppose we observe NN independent realizations of a Poisson random variable kk with intensity parameter λλ: fk=eλλkk! f k λ λ k k We must decide which of two intensities is in effect: 0 : λ= λ 0 0 : λ λ 0 1 : λ= λ 1 1 : λ λ 1 where λ 0 < λ 1 λ 0 λ 1 .

4.a)

Give the minimum probability of error decision rule.

4.b)

Simplify the LRT to a test statistic involving only a sufficient statistic. Apply a monotonically increasing transformation to simplify further.

4.c)

Determine the distribution of the sufficient statistic under both hypotheses.

Hint:
Use the characteristic function to show that a sum of IID Poisson variates is again Poisson distributed.

4.d)

Derive an expression for the probability of error.

4.e)

Assuming the two hypotheses are equally likely, and λ 0 =5 λ 0 5 and λ 1 =6 λ 1 6 , what is the minimum number NN of observations needed to attain a probability of error no greater than 0.01?

Hint:
If you have numerical trouble, try rewriting the log-factorial so as to avoid evaluating the factorial of large integers.

Exercise 5

In Example 3, suppose π 0 = π 1 =12 π 0 π 1 1 2 , and p=0.1 p 0.1 . What is the smallest value of NN needed to ensure P e 0.01 P e 0.01 ?

Collection Navigation

Content actions

Download module as:

Add:

Collection to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks

Module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks