# Statistical Hypothesis Testing: Problems

Module by: Don Johnson

## Exercise 1

Consider the following two-model evaluation problem (van Trees, Prob. 2.2.1):
$$\mathcal{H}_0:\ r = n \qquad \mathcal{H}_1:\ r = s + n$$
where $s$ and $n$ are statistically independent, positively valued random variables having the densities $p_s(s) = a e^{-as}$ and $p_n(n) = b e^{-bn}$.
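Part (a) can be sanity-checked numerically. The sketch below (assuming hypothetical parameter values $a = 2$, $b = 1$) evaluates the likelihood ratio on a grid; its monotonicity in $r$ is what lets the test collapse to a simple threshold on $r$.

```python
import numpy as np

# Hypothetical parameter values for a numerical sanity check.
a, b = 2.0, 1.0

def p0(r):
    # Under H0, r = n ~ Exp(b).
    return b * np.exp(-b * r)

def p1(r):
    # Under H1, r = s + n: the convolution of Exp(a) and Exp(b) (valid for a != b).
    return (a * b / (a - b)) * (np.exp(-b * r) - np.exp(-a * r))

r = np.linspace(0.01, 10.0, 500)
lr = p1(r) / p0(r)

# A monotonically increasing likelihood ratio means the LRT
# reduces to comparing r itself against a threshold.
assert np.all(np.diff(lr) > 0)
```

The same check with other positive, unequal values of $a$ and $b$ behaves identically, which is what the analytical proof should confirm.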

### 1.a)

Prove that the likelihood ratio test reduces to $r \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \gamma$.

### 1.b)

Find $\gamma$ for the minimum probability of error test as a function of the a priori probabilities.

### 1.c)

Now assume that we need a Neyman-Pearson test. Find $\gamma$ as a function of $P_F$, the false-alarm probability.

## Exercise 2

The two models describe different equi-variance statistical models for the observations (van Trees, Prob. 2.2.11):
$$\mathcal{H}_0:\ p_r(r) = \frac{1}{\sqrt{2}}\, e^{-\sqrt{2}\,|r|} \qquad \mathcal{H}_1:\ p_r(r) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} r^2}$$
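A quick numerical sketch (with an assumed threshold $\eta = 1$) reveals the shape of the decision regions: because the Laplacian is both more peaked at the origin and heavier-tailed, the Gaussian model wins only for intermediate values of $|r|$.

```python
import numpy as np

# Evaluate both unit-variance densities on a grid and mark where,
# for an assumed threshold eta = 1, the likelihood ratio favors H1.
r = np.linspace(-5.0, 5.0, 2001)
p0 = np.exp(-np.sqrt(2.0) * np.abs(r)) / np.sqrt(2.0)   # Laplacian, H0
p1 = np.exp(-r**2 / 2.0) / np.sqrt(2.0 * np.pi)         # Gaussian, H1
eta = 1.0
choose_h1 = p1 / p0 > eta

# H1 is declared on a symmetric pair of intervals in r; H0 wins
# both near the origin and far out in the tails.
```

Varying `eta` and watching the intervals move is an easy way to check the analytical decision regions of part (b).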

### 2.a)

Find the likelihood ratio.

### 2.b)

Compute the decision regions for various values of the threshold in the likelihood ratio test.

### 2.c)

Assuming these two densities are equally likely, find the probability of making an error in distinguishing between them.

## Exercise 3

A hypothesis testing criterion radically different from those discussed previously is minimum equivocation. In this information theoretic approach, the two-model testing problem is modeled as a digital channel, as shown in the figure. The channel's inputs, generically represented by $x$, are the models and the channel's outputs, denoted by $y$, are the decisions.

The quality of such information theoretic channels is quantified by the mutual information $I(x;y)$, defined to be the difference between the entropy of the inputs and the equivocation (Cover and Thomas, sections 2.3-2.4):
$$I(x;y) = H(x) - H(x|y)$$
$$H(x) = -\sum_i P(x_i) \log P(x_i)$$
$$H(x|y) = -\sum_{i,j} P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(y_j)}$$
Here, $P(x_i)$ denotes the a priori probabilities, $P(y_j)$ the output probabilities, and $P(x_i, y_j)$ the joint probability of input $x_i$ resulting in output $y_j$. For example, $P(x_0, y_0) = P(x_0)(1 - P_F)$ and $P(y_0) = P(x_0)(1 - P_F) + P(x_1)(1 - P_D)$. For a fixed set of a priori probabilities, show that the decision rule that maximizes the mutual information is the likelihood ratio test. What is the threshold when this criterion is employed?
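The channel quantities can be evaluated numerically. The sketch below (a Python illustration using hypothetical operating-point values for $P_F$, $P_D$, and the priors) computes $I(x;y)$ directly from the joint probabilities defined above.

```python
import numpy as np

def mutual_information(pf, pd, p0):
    """I(x;y) in bits for the two-model decision channel.

    pf, pd: false-alarm and detection probabilities;
    p0: a priori probability of model 0.
    """
    p1 = 1.0 - p0
    # Joint probabilities P(x_i, y_j); rows index the true model,
    # columns the decision.
    joint = np.array([[p0 * (1.0 - pf), p0 * pf],
                      [p1 * (1.0 - pd), p1 * pd]])
    py = joint.sum(axis=0)                    # output probabilities P(y_j)
    px = joint.sum(axis=1)                    # a priori probabilities P(x_i)
    h_x = -np.sum(px * np.log2(px))           # input entropy H(x)
    h_x_given_y = -np.sum(joint * np.log2(joint / py))  # equivocation H(x|y)
    return h_x - h_x_given_y

# An uninformative operating point (P_D = P_F) conveys no information;
# a good one conveys a sizable fraction of a bit.
```

Sweeping the operating point along a fixed receiver operating characteristic and watching where `mutual_information` peaks gives intuition for why the maximizer is a likelihood ratio test.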

### note:

This problem is relatively difficult. The key to its solution is to exploit the concavity of the entropy function.

## Exercise 4

Non-Gaussian statistical models sometimes yield surprising results in comparison to Gaussian ones. Consider the following hypothesis testing problem in which the observations have a Laplacian probability distribution:
$$\mathcal{H}_0:\ p_r(r) = \frac{1}{2}\, e^{-|r+m|} \qquad \mathcal{H}_1:\ p_r(r) = \frac{1}{2}\, e^{-|r-m|}$$

### 4.a)

Find the sufficient statistic for the optimal decision rule.

### 4.b)

What decision rule guarantees that the miss probability will be less than 0.1?

## Exercise 5

Developing a Neyman-Pearson decision rule for more than two models has not been detailed. Assume $K$ distinct models are required to account for the observations. We seek to maximize the probability of correctly announcing $\mathcal{H}_i$ under the constraint that the probability of announcing $\mathcal{H}_i$ when model $\mathcal{H}_0$ was indeed true does not exceed a specified value.

### 5.a)

Formulate the optimization problem that simultaneously maximizes $\Pr[\text{say } \mathcal{H}_i \mid \mathcal{H}_i]$ under the constraint $\Pr[\text{say } \mathcal{H}_i \mid \mathcal{H}_0] \le \alpha_i$. Find the solution using Lagrange multipliers.

### 5.b)

Show that your solution can be expressed as choosing the largest of the sufficient statistics $\Upsilon_i(r) + C_i$.

## Exercise 6

Pattern recognition relies heavily on ideas derived from the principles of statistical model testing. Measurements are made of a test object and these are compared with those of "standard" objects to determine which the test object most closely resembles. Assume that the measurement vector $r$ is jointly Gaussian with mean $m_i$ ($i \in \{1, \dots, K\}$) and covariance matrix $\sigma^2 I$ (i.e., statistically independent components). Thus there are $K$ possible objects, each having an "ideal" measurement vector $m_i$ and probability $P_i$ of being present.

### 6.a)

How is the minimum probability of error choice of object determined from the observation of $r$?

### 6.b)

Assuming that only two equally likely objects are possible ($K = 2$), what is the probability of error of your decision rule?

### 6.c)

The expense of making measurements is always a practical consideration. Assuming each measurement costs the same to perform, how would you determine the effectiveness of a measurement vector's component?

## Exercise 7

Define $y$ to be
$$y = \sum_{k=0}^{L} x_k$$
where the $x_k$ are statistically independent random variables, each having a Gaussian density $\mathcal{N}(0, \sigma^2)$. The number $L$ of variables in the sum is a random variable with a Poisson distribution:
$$\Pr[L = l] = \frac{\lambda^l}{l!}\, e^{-\lambda}, \quad l \in \{0, 1, \dots\}$$

Based upon the observation of $y$, we want to decide whether $L \le 1$ or $L > 1$. Write an expression for the minimum $P_e$.

## Exercise 8

One observation of the random variable $r$ is obtained. This random variable is either uniformly distributed between −1 and +1 or expressed as the sum of statistically independent random variables, each of which is also uniformly distributed between −1 and +1.

### 8.a)

Suppose there are two terms in the aforementioned sum. Assuming that the two models are equally likely, find the minimum probability of error decision rule.

### 8.b)

Compute the resulting probability of error of your decision rule.

### 8.c)

Show that the decision rule found in the previous part applies no matter how many terms are assumed present in the sum.

## Exercise 9

The observed random variable $r$ has a Gaussian density under each of five models:
$$p_{r|\mathcal{H}_i}(r) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(r - m_i)^2}{2\sigma^2}}, \quad i \in \{1, 2, \dots, 5\}$$
where $m_1 = -2m$, $m_2 = -m$, $m_3 = 0$, $m_4 = m$, $m_5 = 2m$. The models are equally likely and the criterion of the test is to minimize $P_e$.
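A Monte Carlo sketch (using assumed values $m = 1$, $\sigma = 1$) of the minimum-$P_e$ rule, which for equal priors and a common variance reduces to choosing the model whose mean lies nearest the observation:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma, trials = 1.0, 1.0, 200_000    # hypothetical values for illustration
means = np.array([-2.0 * m, -m, 0.0, m, 2.0 * m])

true_idx = rng.integers(0, 5, size=trials)            # equally likely models
r = means[true_idx] + sigma * rng.standard_normal(trials)

# Nearest-mean decision rule.
decided = np.argmin(np.abs(r[:, None] - means[None, :]), axis=1)
pe_hat = np.mean(decided != true_idx)
```

Comparing `pe_hat` against the analytical answer of part (b) for several $m/\sigma$ values is a useful check on the hand computation.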

### 9.a)

Draw the decision regions on the $r$-axis.

### 9.b)

Compute the probability of error.

### 9.c)

Let $\sigma = 1$. Sketch accurately $P_e$ as a function of $m$.

## Exercise 10

The goal is to choose which of the following four models is true upon reception of the three-dimensional vector $r$ (van Trees, Prob. 2.6.6):
$$\mathcal{H}_i:\ r = m_i + n, \quad i \in \{0, 1, 2, 3\}$$
where
$$m_0 = \begin{pmatrix} a \\ 0 \\ b \end{pmatrix}, \quad m_1 = \begin{pmatrix} 0 \\ a \\ b \end{pmatrix}, \quad m_2 = \begin{pmatrix} -a \\ 0 \\ b \end{pmatrix}, \quad m_3 = \begin{pmatrix} 0 \\ -a \\ b \end{pmatrix}$$
The noise vector $n$ is a Gaussian random vector having statistically independent, identically distributed components, each of which has zero mean and variance $\sigma^2$. We have $L$ independent observations of the received vector $r$.

### 10.a)

Assuming equally likely models, find the minimum $P_e$ decision rule.

### 10.b)

Calculate the resulting error probability.

### 10.c)

Show that neither the decision rule nor the probability of error depends on $b$. Intuitively, why is this so?

## Exercise 11

To gain some appreciation of the issues involved in implementing a detector, this problem asks you to program (preferably in Matlab) a simple detector and numerically compare its performance with theoretical predictions. Let the observations consist of a signal contained in additive Gaussian white noise:
$$\mathcal{H}_0:\ r(l) = n(l) \qquad \mathcal{H}_1:\ r(l) = A \sin\!\left(\frac{2\pi l}{L}\right) + n(l), \quad l \in \{0, \dots, L-1\}$$
The variance of each noise value equals $\sigma^2$.
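Although the problem asks for Matlab, the Monte Carlo structure can be sketched in Python (with hypothetical values $A = 0.5$, $\sigma = 1$, $L = 100$). For equal priors, the minimum-$P_e$ detector correlates the observations with the known signal and compares against half the signal energy:

```python
import numpy as np

rng = np.random.default_rng(1)
L, A, sigma, trials = 100, 0.5, 1.0, 100_000   # assumed values for illustration

l = np.arange(L)
s = A * np.sin(2.0 * np.pi * l / L)     # the known signal
threshold = s @ s / 2.0                 # equal-prior threshold: E/2

# Generate noise-only records (model H0) and count false alarms.
noise = sigma * rng.standard_normal((trials, L))
stat = noise @ s                        # matched-filter statistic
pf_hat = np.mean(stat > threshold)
# Theory for comparison: P_F = Q(sqrt(E) / (2 sigma)), with E = s's.
```

The number of trials needed grows as the target $P_F$ shrinks, since a useful estimate requires many false-alarm events; this is the practical issue part (b) probes.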

### 11.a)

What is the theoretical false-alarm probability of the minimum $P_e$ detector when the hypotheses are equally likely?

### 11.b)

Write a Matlab program that estimates the false-alarm probability. How many simulation trials are needed to estimate the false-alarm probability accurately? Choose values for $A$ and $\sigma^2$ that will result in values for $P_F$ of 0.1 and 0.01. Estimate the false-alarm probability and compare with the theoretical value in each case.

## Exercise 12

Calculate the Kullback-Leibler distance between the following pairs of densities. Use these results to find the Fisher information for the mean parameter $m$.

### 12.a)

Jointly Gaussian random vectors having the same covariance matrix but dissimilar mean vectors.

### 12.b)

Two Poisson random variables having average rates $\lambda_0$ and $\lambda_1$. In this example, the observation time $T$ plays the role of the number of observations.

### 12.c)

Two sequences of statistically independent Laplacian random variables having the same variance but different means.

### 12.d)

Plot the Kullback-Leibler distances for the Laplacian case and for the Gaussian case of statistically independent random variables. Set the variance equal to $\sigma^2$ in each case and plot the distances as a function of $m/\sigma$.
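As a check on the Gaussian part of this problem, the Kullback-Leibler distance between two unit-variance Gaussians with means $0$ and $m$ can be computed by numerical integration; it should match the closed form $m^2 / (2\sigma^2)$. The parameter values below are assumed for illustration.

```python
import numpy as np

m, sigma = 1.0, 1.0                      # hypothetical values
x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]

def gauss(x, mu, s):
    return np.exp(-(x - mu) ** 2 / (2.0 * s**2)) / (np.sqrt(2.0 * np.pi) * s)

p0 = gauss(x, 0.0, sigma)
p1 = gauss(x, m, sigma)
# D(p1 || p0) = integral of p1 log(p1/p0); closed form is m^2 / (2 sigma^2).
kl = np.sum(p1 * np.log(p1 / p0)) * dx
```

Replacing `gauss` with a Laplacian density of the same variance lets you produce the comparison plot requested in part (d) without deriving the Laplacian closed form first.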

## Exercise 13

The Kullback-Leibler and Chernoff distances can be related to the Fisher information matrix. Let $p_{r;\theta}(r;\theta)$ be a probability density that depends on the parameter vector $\theta$. We want to consider the distance between probability densities that differ by a perturbation $\delta\theta$ in their parameter vectors.

### 13.a)

Show that
$$\mathcal{D}\!\left(p_{r;\theta_0 + \delta\theta} \,\middle\|\, p_{r;\theta_0}\right) \propto \delta\theta^H F(\theta_0)\, \delta\theta$$
for small $\delta\theta$. What is the constant of proportionality?

### 13.b)

What is the Chernoff distance between these distributions?

### 13.c)

Deduce from the Kullback-Leibler distance for the Gaussian and Poisson cases what the Cramér-Rao bound is for estimating the mean and average rate, respectively.

## Exercise 14

Insights into certain detection problems can be gained by examining the Kullback-Leibler distance and the properties of Fisher information. We begin by showing that the Gaussian distribution has the smallest Fisher information for the mean parameter among all differentiable distributions having the same variance.

### 14.a)

Show that if $f(t)$ and $g(t)$ are linear functions of $t$, and $g(t)$ is positive for $0 < t < 1$, then the ratio $f^2(t)/g(t)$ is convex over this interval.

### 14.b)

Use this property to show that the Fisher information is a convex function of the probability density.

### 14.c)

Define $p_t(x;\theta) = (1 - t)\, p_0(x;\theta) + t\, p_1(x;\theta)$, $0 \le t \le 1$, where $p_0(x;\theta), p_1(x;\theta) \in \mathcal{P}$, a class of densities having variance one. Show that this set is convex.

### 14.d)

Because of the Fisher information's convexity, a given distribution $p_0(x;\theta) \in \mathcal{P}$ minimizes the Fisher information if and only if $\frac{d}{dt} F_t \big|_{t=0} \ge 0$ for all $p_1(x;\theta) \in \mathcal{P}$. Let the parameter be the expected value of all densities in $\mathcal{P}$. By using Lagrange multipliers to impose the constant variance constraint on all densities in the class, show that the Gaussian uniquely minimizes the Fisher information.

### 14.e)

What does this result suggest about the performance probabilities for problems wherein the models differ in mean?

## Exercise 15

Find the Chernoff distance between the following distributions.

### 15.a)

Two Gaussian distributions having the same variance but different means.

### 15.b)

Two Poisson distributions having differing parameter values.

## Exercise 16

Let's explore how well Stein's Lemma predicts optimal detector performance probabilities. Consider the two-model detection problem wherein $L$ statistically independent, identically distributed Gaussian random variables are observed. Under $\mathcal{H}_0$, the mean is zero and the variance one; under $\mathcal{H}_1$, the mean is one and the variance one.

### 16.a)

Find an expression for the false-alarm probability $P_F$ when the miss probability is constrained to be less than $\alpha$.

### 16.b)

Find the Kullback-Leibler distance corresponding to the false-alarm probability's exponent.

### 16.c)

Plot the exact error for values of $\alpha = 0.1$ and $\alpha = 0.01$ as a function of the number of observations. Plot on the same axes the result predicted by Stein's Lemma.

## Exercise 17

We observe a Gaussian random variable $r$. This random variable has zero mean under model $\mathcal{H}_0$ and mean $m$ under $\mathcal{H}_1$. The variance of $r$ in either instance is $\sigma^2$. The models are equally likely.

### 17.a)

What is an expression for the probability of error of the minimum $P_e$ test when one observation of $r$ is made?

### 17.b)

Assume one can perform statistically independent observations of $r$. Construct a sequential decision rule which results in a $P_e$ equal to one-half of that found in the previous part.

### 17.c)

What is the sufficient statistic in the previous part? Sketch how the thresholds for this statistic vary with the number of trials. Assume that $m = 10$ and $\sigma = 1$. What is the expected number of trials for the sequential test to terminate?

## Exercise 18

The optimum reception of binary information can be viewed as a model testing problem. Here, equally likely binary data (a "zero" or a "one") is transmitted through a binary symmetric channel. The indicated parameters denote the probabilities of receiving a binary digit given that a particular digit was sent. Assume that $\varepsilon = 0.1$.

### 18.a)

Assuming a single transmission for each digit, what is the minimum probability of error receiver and what is the resulting probability of error?

### 18.b)

One method of improving the probability of error is to repeat the digit to be transmitted $L$ times. This transmission scheme is equivalent to the so-called repetition code. The receiver uses all of the $L$ received digits to decide what was actually sent. Assume that the results of each transmission are statistically independent of all others. Construct the minimum probability of error receiver and find an expression for $P_e$ in terms of $L$.
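A simulation sketch (with $\varepsilon = 0.1$ and an assumed code length $L = 5$): for equal priors the minimum-$P_e$ receiver is a majority vote over the $L$ received digits, and its error probability is a binomial tail sum.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)
eps, L, trials = 0.1, 5, 200_000    # L = 5 is an assumed illustration value

bits = rng.integers(0, 2, size=trials)
# Each of the L transmissions flips independently with probability eps.
flips = rng.random((trials, L)) < eps
received = bits[:, None] ^ flips
decoded = (received.sum(axis=1) > L / 2).astype(int)   # majority vote (L odd)
pe_hat = np.mean(decoded != bits)

# P_e = sum over k > L/2 of C(L, k) eps^k (1 - eps)^(L - k)
pe_theory = sum(comb(L, k) * eps**k * (1.0 - eps) ** (L - k)
                for k in range(L // 2 + 1, L + 1))
```

Increasing `L` here shows the error probability falling geometrically, the behavior exploited in part (c).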

### 18.c)

Assume that we desire the probability of error to be $10^{-6}$. How long a repetition code is required to achieve this goal for the channel given above? Assume that the leading term in the probability of error expression found in the previous part dominates.

### 18.d)

Construct a sequential algorithm which achieves the required probability of error. Assume that the transmitter will repeat each digit until informed by the receiver that it has determined what digit was sent. What is the expected length of the repetition code in this instance?

## Exercise 19

You have accepted the (dangerous) job of determining whether the radioactivity levels at the Chernobyl reactor are elevated or not. Because you want to stay as short a time as possible to render your professional opinion, you decide to use a sequential-like decision rule. Radioactivity is governed by Poisson probability laws, which means that the probability that $n$ counts are observed in $T$ seconds equals
$$\Pr[n] = \frac{(\lambda T)^n e^{-\lambda T}}{n!}$$
where $\lambda$ is the radiation intensity. Safe radioactivity levels occur when $\lambda = \lambda_0$ and unsafe ones when $\lambda = \lambda_1$, $\lambda_1 > \lambda_0$.

### 19.a)

Construct a sequential decision rule to determine whether it is safe or not. Assume you have defined false-alarm and miss probabilities according to accepted "professional standards." According to these standards, these probabilities equal each other.

### 19.b)

What is the expected time it will take to render a decision?

## Exercise 20

Sequential tests can be used to advantage in situations where analytic difficulties obscure the problem. Consider the case where the observations either contain no signal ($\mathcal{H}_0$) or a signal whose components are randomly set to zero:
$$\mathcal{H}_0:\ r_l = n_l \qquad \mathcal{H}_1:\ r_l = a_l s_l + n_l$$
where
$$a_l = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$
The probability $p$ that a signal value remains intact is a known quantity, and the "drop-outs" are statistically independent of each other and of the noise. This kind of model is often used to describe intermittent behavior in electronic equipment.

### 20.a)

Find the likelihood ratio for the observations.

### 20.b)

Develop a sequential test that would determine whether a signal is present or not.

### 20.c)

Find a formula for the test's thresholds in terms of $P_F$ and $P_D$.

### 20.d)

How does the average number of observations vary with $p$?

## Exercise 21

In some cases it might be wise not to make a decision when the data do not justify it. Thus, in addition to declaring that one of two models occurred, we might declare "no decision" when the data are indecisive. Assume you observe $L$ statistically independent observations $r_l$, each of which is Gaussian and has a variance of two. Under one model the mean is zero, and under the other the mean is one. The models are equally likely to occur.

### 21.a)

Construct a hypothesis testing rule that yields a probability of no-decision no larger than some specified value $\alpha$, maximizes the probabilities of making correct decisions when they are made, and makes these correct-decision probabilities equal.

### 21.b)

What is the probability of a correct decision for your rule?

## Exercise 22

You decide to flip coins with Sleazy Sam. If heads is the result of a coin flip, you win one dollar; if tails, Sam wins a dollar. However, Sam's reputation has preceded him. You suspect that the probability of tails, $p$, may not be $1/2$. You want to determine whether a biased coin is being used or not after observing the results of three coin tosses.

### 22.a)

You suspect that $p = 3/4$. Assuming that the probability of a biased coin equals that of an unbiased coin, how would you decide whether a biased coin is being used or not in a "good" fashion?
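For equal priors on the two models, the minimum-probability-of-error rule simply picks whichever model assigns the observed number of tails the higher likelihood. A small enumeration sketch:

```python
from math import comb

def pmf(k, n, p):
    # Probability of k tails in n tosses when tails has probability p.
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

n = 3
# Decide "biased" exactly when the biased model (p = 3/4) is more likely
# than the fair one (p = 1/2) for the observed tail count k.
decide_biased = {k: pmf(k, n, 0.75) > pmf(k, n, 0.5) for k in range(n + 1)}
```

Summing the pmf values over the losing regions of each model then gives the error probability asked for in the next part.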

### 22.b)

Using your decision rule, what is the probability that your determination is incorrect?

### 22.c)

One potential flaw with your decision rule is that a specific value of $p$ was assumed. Can a reasonable decision rule be developed without knowing $p$? If so, demonstrate the rule; if not, show why not.

## Exercise 23

When a patient is screened for the presence of a disease in an organ, a section of tissue is viewed under a microscope and a count of abnormal cells made. Even under healthy conditions, a small number of abnormal cells will be present. Presumably a much larger number will be present if the organ is diseased. Assume that the number $L$ of abnormal cells in a section is geometrically distributed:
$$\Pr[L = l] = (1 - \alpha)\alpha^l, \quad l \in \{0, 1, \dots\}$$
The parameter $\alpha$ of a diseased organ will be larger than that of a healthy one. The probability of a randomly selected organ being diseased is $p$.

### 23.a)

Assuming that the value of the parameter $\alpha$ is known in each situation, find the best method of deciding whether an organ is diseased.

### 23.b)

Using your method, a patient was said to have a diseased organ. In this case, what is the probability that the organ is diseased?

### 23.c)

Assume that $\alpha$ is known only for healthy organs. Find the disease screening method that minimizes the maximum possible value of the probability that the screening method will be in error.

## Exercise 24

How can the standard sequential test be extended to unknown parameter situations? Formulate the theory, determine the formulas for the thresholds. How would you approach finding the average number of observations?

## Exercise 25

A common situation in statistical signal processing problems is that the variance of the observations is unknown (there is no reason that noise should be nice to us!). Consider the two-model Gaussian testing problem where the models differ in their means and have a common, but unknown, variance:
$$\mathcal{H}_0:\ r \sim \mathcal{N}(0, \sigma^2 I) \qquad \mathcal{H}_1:\ r \sim \mathcal{N}(m, \sigma^2 I), \qquad \sigma^2 = ?$$

### 25.a)

Show that the unknown variance enters into the optimum decision only in the threshold term.

### 25.b)

In the (happy) situation where the threshold $\eta$ equals one, show that the optimum test does not depend on $\sigma^2$ and that we did not need to know its value in the first place. When will $\eta$ equal one?

## Exercise 26

Consider the following composite hypothesis testing problem (van Trees, Prob. 2.5.2):
$$\mathcal{H}_0:\ p_r(r) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-\frac{r^2}{2\sigma_0^2}} \qquad \mathcal{H}_1:\ p_r(r) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{r^2}{2\sigma_1^2}}$$
where $\sigma_0$ is known but $\sigma_1$ is known only to be greater than $\sigma_0$. Assume that we require $P_F = 10^{-2}$.

### 26.a)

Does a UMP test exist for this problem? If it does, find it.

### 26.b)

Construct a generalized likelihood ratio test for this problem. Under what conditions can the requirement on the false-alarm probability be met?

## Exercise 27

Data are often processed "in the field," with the results from several systems sent to a central place for final analysis. Consider a detection system wherein each of $N$ field radar systems detects the presence or absence of an airplane. The detection results are collected together so that a final judgment about the airplane's presence can be made. Assume each field system has false-alarm and detection probabilities $P_F$ and $P_D$, respectively.

### 27.a)

Find the optimal detection strategy for making a final determination that maximizes the probability of making a correct decision. Assume that the a priori probabilities $\pi_0$, $\pi_1$ of the airplane's absence or presence, respectively, are known.

### 27.b)

How does the airplane detection system change when the a priori probabilities are not known? Require that the central judgment have a false-alarm probability no bigger than $P_F^{(N)}$.

## Exercise 28

Mathematically, a preconception is a model for the "world" that you believe applies over a broad class of circumstances. Clearly, you should be vigilant and continually judge your assumption's correctness.

Let $X_l$ denote a sequence of random variables that you believe to be independent and identically distributed with a Gaussian distribution having zero mean and variance $\sigma^2$. Elements of this sequence arrive one after the other, and you decide to use the sample average $M_n$ as a test statistic:
$$M_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

### 28.a)

Based on the sample average, develop a procedure that tests for each $n$ whether the preconceived model is correct. This test should be designed so that it continually monitors the validity of the assumptions, indicating at each $n$ whether the preconception is valid or not. Establish this test so that it yields a constant probability of judging the model incorrect when, in fact, it is actually valid.

### 28.b)

To judge the efficacy of this test, assume the elements of the actual sequence have the assumed distribution, but that they are correlated with correlation coefficient $p$. Determine the probability (as a function of $n$) that your test correctly invalidates the preconception.

### 28.c)

Is the test based on the sample average optimal? If so, prove it so; if not, find the optimal one.

## Exercise 29

### Delegating Responsibility

Modern management styles tend to want decisions to be made locally (by people at the scene) rather than by "the boss." While this approach might be considered more democratic, we should understand how to make decisions under such organizational constraints and what the performance might be.

Let three "local" systems separately make observations. Each local system's observations are identically distributed and statistically independent of the others, and based on the observations, each system decides which of two models applies best. The judgments are relayed to the central manager who must make the final decision. Assume the local observations consist either of white Gaussian noise or of a signal having energy $E$ to which the same white Gaussian noise has been added. The signal energy is the same at each local system. Each local decision system must meet a performance standard on the probability it declares the presence of a signal when none is present.

#### 29.a)

What decision rule should each local system use?

#### 29.b)

Assuming the observation models are equally likely, how should the central management make its decision so as to minimize the probability of error?

#### 29.c)

Is this decentralized decision system optimal (i.e., the probability of error for the final decision is minimized)? If so, demonstrate optimality; if not, find the optimal system.

## References

1. H.L. van Trees. (1968). Detection, Estimation, and Modulation Theory, Part I. New York: John Wiley and Sons.
2. T.M. Cover and J.A. Thomas. (1991). Elements of Information Theory. John Wiley and Sons, Inc.
