Statistical Hypothesis Testing: Problems

Module by: Don Johnson

Exercise 1

Consider the following two-model evaluation problem (van Trees; Prob. 2.2.1).

$$\mathcal{M}_0:\ r = n \qquad \mathcal{M}_1:\ r = s + n$$

where $s$ and $n$ are statistically independent, positively valued random variables having the densities $p_s(s) = a e^{-as}$ and $p_n(n) = b e^{-bn}$.

1.a)

Prove that the likelihood ratio test reduces to $r \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$.

1.b)

Find $\gamma$ for the minimum probability of error test as a function of the a priori probabilities.

1.c)

Now assume that we need a Neyman-Pearson test. Find $\gamma$ as a function of $P_F$, the false-alarm probability.

Exercise 2

The two models describe different equi-variance statistical models for the observations (van Trees; Prob. 2.2.11).

$$\mathcal{M}_0:\ p_r(r) = \frac{1}{\sqrt{2}}\, e^{-\sqrt{2}\,|r|} \qquad \mathcal{M}_1:\ p_r(r) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} r^2}$$

2.a)

Find the likelihood ratio.

2.b)

Compute the decision regions for various values of the threshold in the likelihood ratio test.

2.c)

Assuming these two densities are equally likely, find the probability of making an error in distinguishing between them.

Exercise 3

A hypothesis testing criterion radically different from those discussed in the preceding sections is minimum equivocation. In this information theoretic approach, the two-model testing problem is modeled as a digital channel, shown in this figure. The channel's inputs, generically represented by $x$, are the models, and the channel's outputs, denoted by $y$, are the decisions.

The quality of such information theoretic channels is quantified by the mutual information $I(x;y)$, defined to be the difference between the entropy of the inputs and the equivocation (Cover and Thomas; sections 2.3, 2.4).

$$I(x;y) = H(x) - H(x|y)$$

$$H(x) = -\sum_i P(x_i) \log P(x_i)$$

$$H(x|y) = -\sum_{i,j} P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(y_j)}$$

Here, $P(x_i)$ denotes the a priori probabilities, $P(y_j)$ the output probabilities, and $P(x_i, y_j)$ the joint probability of input $x_i$ resulting in output $y_j$. For example, $P(x_0, y_0) = P(x_0)(1 - P_F)$ and $P(y_0) = P(x_0)(1 - P_F) + P(x_1)(1 - P_D)$. For a fixed set of a priori probabilities, show that the decision rule that maximizes the mutual information is the likelihood ratio test. What is the threshold when this criterion is employed?

note:

This problem is relatively difficult. The key to its solution is to exploit the concavity of the entropy function.
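The definitions in this exercise translate directly into a short computation. The following Python sketch is an illustration added here, not part of the original problem: it evaluates $I(x;y)$ for a given a priori probability and operating point $(P_F, P_D)$. The function name `mutual_information`, its argument names, and the use of base-2 logarithms (bits) are assumptions; the problem statement leaves the base of the logarithm unspecified.

```python
import math


def mutual_information(p0, pf, pd):
    """Return I(x;y) in bits for the two-model decision 'channel'.

    p0 : a priori probability of model 0 (input x0); P(x1) = 1 - p0
    pf : false-alarm probability, Pr[say model 1 | model 0]
    pd : detection probability,   Pr[say model 1 | model 1]
    """
    priors = [p0, 1.0 - p0]
    # joint[i][j] = P(x_i, y_j), built exactly as in the exercise's example
    joint = [
        [p0 * (1.0 - pf), p0 * pf],
        [(1.0 - p0) * (1.0 - pd), (1.0 - p0) * pd],
    ]
    # Output probabilities P(y_j)
    p_y = [joint[0][j] + joint[1][j] for j in range(2)]

    # Entropy of the inputs, with the convention 0 * log 0 = 0
    h_x = -sum(p * math.log2(p) for p in priors if p > 0.0)

    # Equivocation H(x|y)
    h_x_given_y = 0.0
    for i in range(2):
        for j in range(2):
            if joint[i][j] > 0.0:
                h_x_given_y -= joint[i][j] * math.log2(joint[i][j] / p_y[j])

    return h_x - h_x_given_y
```

Evaluating this function along a detector's operating characteristic is one way to explore numerically which operating point maximizes the mutual information before attempting the proof.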

Exercise 4

Non-Gaussian statistical models sometimes yield surprising results in comparison to Gaussian ones. Consider the following hypothesis testing problem where the observations have a Laplacian probability distribution.

$$\mathcal{M}_0:\ p_r(r) = \frac{1}{2}\, e^{-|r+m|} \qquad \mathcal{M}_1:\ p_r(r) = \frac{1}{2}\, e^{-|r-m|}$$

4.a)

Find the sufficient statistic for the optimal decision rule.

4.b)

What decision rule guarantees that the miss probability will be less than 0.1?

Exercise 5

Developing a Neyman-Pearson decision rule for more than two models has not been detailed. Assume $K$ distinct models are required to account for the observations. We seek to maximize the probability of correctly announcing $\mathcal{M}_i$ under the constraint that the probability of announcing $\mathcal{M}_i$ when model $\mathcal{M}_0$ was indeed true does not exceed a specified value.

5.a)

Formulate the optimization problem that simultaneously maximizes $\Pr[\text{say } \mathcal{M}_i \mid \mathcal{M}_i]$ under the constraint $\Pr[\text{say } \mathcal{M}_i \mid \mathcal{M}_0] \le \alpha_i$. Find the solution using Lagrange multipliers.

5.b)

Show that your solution can be expressed as choosing the largest of the sufficient statistics $\Upsilon_i(r) + C_i$.

Exercise 6

Pattern recognition relies heavily on ideas derived from the principles of statistical model testing. Measurements are made of a test object and these are compared with those of "standard" objects to determine which the test object most closely resembles. Assume that the measurement vector $\mathbf{r}$ is jointly Gaussian with mean $\mathbf{m}_i$ ($i = 1, \dots, K$) and covariance matrix $\sigma^2 \mathbf{I}$ (i.e., statistically independent components). Thus there are $K$ possible objects, each having an "ideal" measurement vector $\mathbf{m}_i$ and probability $P_i$ of being present.

6.a)

How is the minimum probability of error choice of object determined from the observation of $\mathbf{r}$?

6.b)

Assuming that only two equally likely objects are possible ($K = 2$), what is the probability of error of your decision rule?

6.c)

The expense of making measurements is always a practical consideration. Assuming each measurement costs the same to perform, how would you determine the effectiveness of a measurement vector's component?

Exercise 7

Define $y$ to be

$$y = \sum_{k=0}^{L} x_k$$

where the $x_k$ are statistically independent random variables, each having a Gaussian density $\mathcal{N}(0, \sigma^2)$. The number $L$ of variables in the sum is a random variable with a Poisson distribution.

$$\Pr[L = l] = \frac{\lambda^l}{l!}\, e^{-\lambda}, \quad l = 0, 1, \dots$$

Based upon the observation of $y$, we want to decide whether $L \le 1$ or $L > 1$. Write an expression for the minimum $P_e$.

Exercise 8

One observation of the random variable $r$ is obtained. This random variable is either uniformly distributed between -1 and +1 or expressed as the sum of statistically independent random variables, each of which is also uniformly distributed between -1 and +1.

8.a)

Suppose there are two terms in the aforementioned sum. Assuming that the two models are equally likely, find the minimum probability of error decision rule.

8.b)

Compute the resulting probability of error of your decision rule.

8.c)

Show that the decision rule found in part 8.a applies no matter how many terms are assumed present in the sum.

Exercise 9

The observed random variable $r$ has a Gaussian density on each of five models.

$$p_{r|\mathcal{M}_i}(r) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(r - m_i)^2}{2\sigma^2}\right\}, \quad i = 1, 2, \dots, 5$$

where $m_1 = -2m$, $m_2 = -m$, $m_3 = 0$, $m_4 = m$, $m_5 = 2m$. The models are equally likely and the criterion of the test is to minimize $P_e$.

9.a)

Draw the decision regions on the $r$-axis.

9.b)

Compute the probability of error.

9.c)

Let $\sigma = 1$. Sketch accurately $P_e$ as a function of $m$.

Exercise 10

The goal is to choose which of the following four models is true upon the reception of the three-dimensional vector $\mathbf{r}$ (van Trees; Prob. 2.6.6).

$$\mathcal{M}_0:\ \mathbf{r} = \mathbf{m}_0 + \mathbf{n} \qquad \mathcal{M}_1:\ \mathbf{r} = \mathbf{m}_1 + \mathbf{n} \qquad \mathcal{M}_2:\ \mathbf{r} = \mathbf{m}_2 + \mathbf{n} \qquad \mathcal{M}_3:\ \mathbf{r} = \mathbf{m}_3 + \mathbf{n}$$

where

$$\mathbf{m}_0 = (a, 0, b)^T, \quad \mathbf{m}_1 = (0, a, b)^T, \quad \mathbf{m}_2 = (-a, 0, b)^T, \quad \mathbf{m}_3 = (0, -a, b)^T$$

The noise vector $\mathbf{n}$ is a Gaussian random vector having statistically independent, identically distributed components, each of which has zero mean and variance $\sigma^2$. We have $L$ independent observations of the received vector $\mathbf{r}$.

10.a)

Assuming equally likely models, find the minimum $P_e$ decision rule.

10.b)

Calculate the resulting error probability.

10.c)

Show that neither the decision rule nor the probability of error depends on $b$. Intuitively, why is this fact true?

Exercise 11

To gain some appreciation of some of the issues in implementing a detector, this problem asks you to program (preferably in Matlab) a simple detector and numerically compare its performance with theoretical predictions. Let the observations consist of a signal contained in additive Gaussian white noise.

$$\mathcal{M}_0:\ r(l) = n(l), \quad l = 0, \dots, L-1$$

$$\mathcal{M}_1:\ r(l) = A \sin\left(\frac{2\pi l}{L}\right) + n(l), \quad l = 0, \dots, L-1$$

The variance of each noise value equals $\sigma^2$.

11.a)

What is the theoretical false-alarm probability of the minimum $P_e$ detector when the hypotheses are equally likely?

11.b)

Write a Matlab program that estimates the false-alarm probability. How many simulation trials are needed to accurately estimate the false-alarm probability? Choose values for $A$ and $\sigma^2$ that will result in values for $P_F$ of 0.1 and 0.01. Estimate the false-alarm probability and compare with the theoretical value in each case.
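The simulation procedure can be outlined as follows. The exercise asks for Matlab; this sketch uses Python purely as an illustration of the same steps and is not the author's solution. It assumes the standard result that, for a known signal in white Gaussian noise with equally likely models, the minimum $P_e$ detector correlates the observations with the signal and compares the result to half the signal energy $E$, so that the false-alarm probability is $Q\left(\sqrt{E}/(2\sigma)\right)$. The function names `signal`, `theoretical_pf`, and `estimate_pf` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def signal(A, L):
    """Samples of A*sin(2*pi*l/L) for l = 0, ..., L-1."""
    return [A * math.sin(2.0 * math.pi * l / L) for l in range(L)]


def theoretical_pf(A, L, sigma):
    """Q(sqrt(E)/(2*sigma)) for the correlation detector with threshold E/2 (assumed form)."""
    energy = sum(v * v for v in signal(A, L))
    return NormalDist().cdf(-math.sqrt(energy) / (2.0 * sigma))


def estimate_pf(A, L, sigma, trials, seed=0):
    """Fraction of noise-only trials in which the detector announces model 1."""
    rng = random.Random(seed)
    s = signal(A, L)
    threshold = 0.5 * sum(v * v for v in s)
    false_alarms = 0
    for _ in range(trials):
        # Under model 0 the observations are noise alone
        statistic = sum(sv * rng.gauss(0.0, sigma) for sv in s)
        if statistic > threshold:
            false_alarms += 1
    return false_alarms / trials


if __name__ == "__main__":
    L, sigma = 8, 1.0
    for target in (0.1, 0.01):
        # Solve Q(sqrt(E)/(2*sigma)) = target for A, with E computed for unit amplitude
        unit_energy = sum(v * v for v in signal(1.0, L))
        A = 2.0 * sigma * NormalDist().inv_cdf(1.0 - target) / math.sqrt(unit_energy)
        print(target, theoretical_pf(A, L, sigma), estimate_pf(A, L, sigma, 20000))
```

On the number of trials: the estimate is a binomial proportion, so its standard deviation relative to $P_F$ is $\sqrt{(1 - P_F)/(N P_F)}$ for $N$ trials. Smaller false-alarm probabilities therefore need proportionally more trials for the same relative accuracy.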

Exercise 12

Calculate the Kullback-Leibler distance between the following pairs of densities. Use these results to find the Fisher information for the mean parameter $m$.

12.a)

Jointly Gaussian random vectors having the same covariance matrix but dissimilar mean vectors.

12.b)

Two Poisson random variables having average rates $\lambda_0$ and $\lambda_1$. In this example, the observation time $T$ plays the role of the number of observations.

12.c)

Two sequences of statistically independent Laplacian random variables having the same variance but different means.

12.d)

Plot the Kullback-Leibler distances for the Laplacian case and for the Gaussian case of statistically independent random variables. Set the variance equal to $\sigma^2$ in each case and plot the distances as a function of $m/\sigma$.
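A numerical check can be useful alongside the analytic calculation. The Python sketch below is an illustration added here, under stated assumptions: it takes the Kullback-Leibler distance to be $\int p_0(x) \ln\frac{p_0(x)}{p_1(x)}\,dx$ (natural logarithm, single observation) and approximates it with the trapezoidal rule on a finite interval. The helper names `kl_distance`, `gaussian`, and `laplacian` are hypothetical, and the integration limits must be chosen wide enough to cover both densities yet narrow enough that $p_1$ does not underflow to zero.

```python
import math


def kl_distance(p0, p1, lo, hi, n=20000):
    """Trapezoidal approximation of the integral of p0*ln(p0/p1) over [lo, hi], in nats.

    Assumes p1(x) > 0 everywhere on [lo, hi].
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        a, b = p0(x), p1(x)
        term = a * math.log(a / b) if a > 0.0 else 0.0
        total += term * (0.5 if k in (0, n) else 1.0)  # half weight at the endpoints
    return total * h


def gaussian(mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    c = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return lambda x: c * math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def laplacian(mean, sigma):
    """Laplacian density with the given mean and variance sigma**2."""
    c = math.sqrt(2.0) / sigma  # variance sigma^2 corresponds to decay rate sqrt(2)/sigma
    return lambda x: 0.5 * c * math.exp(-c * abs(x - mean))


if __name__ == "__main__":
    sigma = 1.0
    for ratio in (0.5, 1.0, 2.0):  # values of m/sigma
        m = ratio * sigma
        d_gauss = kl_distance(gaussian(0.0, sigma), gaussian(m, sigma), -12.0, 12.0 + m)
        d_lap = kl_distance(laplacian(0.0, sigma), laplacian(m, sigma), -12.0, 12.0 + m)
        print(ratio, d_gauss, d_lap)
```

The tabulated values can be plotted against $m/\sigma$ and compared with the closed-form expressions derived in the earlier parts.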

Exercise 13

The Kullback-Leibler and Chernoff distances can be related to the Fisher information matrix. Let $p_{r;\theta}(r;\theta)$ be a probability density that depends on the parameter vector $\theta$. We want to consider the distance between probability densities that differ by a perturbation $\delta\theta$ in their parameter vectors.

13.a)

Show that $\mathcal{D}\bigl(p_{r;\theta_0+\delta\theta}(r;\theta_0+\delta\theta) \,\|\, p_{r;\theta_0}(r;\theta_0)\bigr) \propto \delta\theta^{T} F(\theta_0)\, \delta\theta$ for small $\delta\theta$. What is the constant of proportionality?

13.b)

What is the Chernoff distance between these distributions?

13.c)

Deduce from the Kullback-Leibler distance for the Gaussian and Poisson cases what the Cramér-Rao bound is for estimating the mean and average rate, respectively.

Exercise 14

Insights into certain detection problems can be gained by examining the Kullback-Leibler distance and the properties of Fisher information. We begin by showing that the Gaussian distribution has the smallest Fisher information for the mean parameter among all differentiable distributions having the same variance.

14.a)

Show that if $f(t)$ and $g(t)$ are linear functions of $t$ and $g(t)$ is positive for $0 < t < 1$, then the ratio $\frac{f^2(t)}{g(t)}$ is convex over this interval.

14.b)

Use this property to show that the Fisher information is a convex function of the probability density.

14.c)

Define $p_t(x;\theta) = (1 - t)\, p_0(x;\theta) + t\, p_1(x;\theta)$, $0 \le t \le 1$, where $p_0(x;\theta), p_1(x;\theta) \in \mathcal{P}$, a class of densities having variance one. Show that this set is convex.

14.d)

Because of the Fisher information's convexity, a given distribution $p_0(x|\theta) \in \mathcal{P}$ minimizes the Fisher information if and only if $\frac{d}{dt} F_t \ge 0$ at $t = 0$ for all $p_1(x|\theta) \in \mathcal{P}$. Let the parameter be the expected value of all densities in $\mathcal{P}$. By using Lagrange multipliers to impose the constant variance constraint on all densities in the class, show that the Gaussian uniquely minimizes Fisher information.

14.e)

What does this result suggest about the performance probabilities for problems wherein the models differ in mean?

Exercise 15

Find the Chernoff distance between the following distributions.

15.a)

Two Gaussian distributions having the same variance but different means.

15.b)

Two Poisson distributions having differing parameter values.

Exercise 16

Let's explore how well Stein's Lemma predicts optimal detector performance probabilities. Consider the two-model detection problem wherein $L$ statistically independent, identically distributed Gaussian random variables are observed. Under $\mathcal{M}_0$, the mean is zero and the variance one; under $\mathcal{M}_1$, the mean is one and the variance one.

Figure 1: A binary symmetric digital communications channel.

16.a)

Find an expression for the false-alarm probability $P_F$ when the miss probability is constrained to be less than $\alpha$.

16.b)

Find the Kullback-Leibler distance corresponding to the false-alarm probability's exponent.

16.c)

Plot the exact error for values of $\alpha = 0.1$ and $\alpha = 0.01$ as a function of the number of observations. Plot on the same axes the result predicted by Stein's Lemma.

Exercise 17

We observe a Gaussian random variable $r$. This random variable has zero mean under model $\mathcal{M}_0$ and mean $m$ under $\mathcal{M}_1$. The variance of $r$ in either instance is $\sigma^2$. The models are equally likely.

17.a)

What is an expression for the probability of error for the minimum $P_e$ test when one observation of $r$ is made?

17.b)

Assume one can perform statistically independent observations of $r$. Construct a sequential decision rule which results in $P_e$ equal to one-half of that found in the previous part.

17.c)

What is the sufficient statistic in the previous part? Sketch how the thresholds for this statistic vary with the number of trials. Assume that $m = 10$ and that $\sigma = 1$. What is the expected number of trials for the sequential test to terminate?

Exercise 18

The optimum reception of binary information can be viewed as a model testing problem. Here, equally likely binary data (a "zero" or a "one") is transmitted through a binary symmetric channel. The indicated parameters denote the probabilities of receiving a binary digit given that a particular digit was sent. Assume that $\varepsilon = 0.1$.

18.a)

Assuming a single transmission for each digit, what is the minimum probability of error receiver and what is the resulting probability of error?

18.b)

One method of improving the probability of error is to repeat the digit to be transmitted $L$ times. This transmission scheme is equivalent to the so-called repetition code. The receiver uses all of the received $L$ digits to decide what was actually sent. Assume that the results of each transmission are statistically independent of all others. Construct the minimum probability of error receiver and find an expression for $P_e$ in terms of $L$.
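The behavior of the repetition code can be explored numerically with a short Python sketch. It is an illustration added here, not the author's solution, and it rests on two assumptions: that for equally likely digits and $\varepsilon < \frac{1}{2}$ the receiver decides by majority vote over the $L$ received digits, and that ties (possible only for even $L$) are broken by a fair coin. The function name `repetition_pe` is hypothetical.

```python
from math import comb


def repetition_pe(L, eps):
    """Error probability of majority-vote decoding of an L-fold repetition code.

    eps is the probability that the channel flips a single transmitted digit.
    """
    pe = 0.0
    for k in range(L + 1):  # k = number of digits flipped by the channel
        pk = comb(L, k) * eps**k * (1.0 - eps) ** (L - k)
        if 2 * k > L:
            pe += pk  # most digits flipped: the vote is wrong
        elif 2 * k == L:
            pe += 0.5 * pk  # tie (even L only), assumed broken by a fair coin
    return pe


if __name__ == "__main__":
    for L in (1, 3, 5, 7, 9):
        print(L, repetition_pe(L, 0.1))
```

For example, `repetition_pe(3, 0.1)` evaluates $3\varepsilon^2(1-\varepsilon) + \varepsilon^3 = 0.028$. Under the coin-flip tie rule an even length gives the same error probability as the odd length just below it, which is one reason odd lengths are the natural choice.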

18.c)

Assume that we desire the probability of error to be $10^{-6}$. How long a repetition code is required to achieve this goal for the channel given above? Assume that the leading term in the probability of error expression found in the previous part dominates.

18.d)

Construct a sequential algorithm which achieves the required probability of error. Assume that the transmitter will repeat each digit until informed by the receiver that it has determined what digit was sent. What is the expected length of the repetition code in this instance?

Exercise 19

You have accepted the (dangerous) job of determining whether the radioactivity levels at the Chernobyl reactor are elevated or not. Because you want to stay as short a time as possible to render your professional opinion, you decide to use a sequential-like decision rule. Radioactivity is governed by Poisson probability laws, which means that the probability that $n$ counts are observed in $T$ seconds equals

$$\Pr[n] = \frac{(\lambda T)^n e^{-\lambda T}}{n!}$$

where $\lambda$ is the radiation intensity. Safe radioactivity levels occur when $\lambda = \lambda_0$ and unsafe ones at $\lambda = \lambda_1$, $\lambda_1 > \lambda_0$.

19.a)

Construct a sequential decision rule to determine whether it is safe or not. Assume you have defined false-alarm and miss probabilities according to accepted "professional standards." According to these standards, these probabilities equal each other.

19.b)

What is the expected time it will take to render a decision?

Exercise 20

Sequential tests can be used to advantage in situations where analytic difficulties obscure the problem. Consider the case where the observations either contain no signal ($\mathcal{M}_0$) or a signal whose components are randomly set to zero.