Sufficient Statistics

Module by: Clayton Scott, Robert Nowak

Introduction

Sufficient statistics arise in nearly every aspect of statistical inference. It is important to understand them before progressing to areas such as hypothesis testing and parameter estimation.

Suppose we observe an $N$-dimensional random vector $X$, characterized by the density or mass function $f_\theta(x)$, where $\theta$ is a $p$-dimensional vector of parameters to be estimated. The functional form of $f(x)$ is assumed known. The parameter $\theta$ completely determines the distribution of $X$. Conversely, a measurement $x$ of $X$ provides information about $\theta$ through the probability law $f_\theta(x)$.

Example 1

Suppose $X = (X_1, X_2)^T$, where $X_i \sim \mathcal{N}(\theta, 1)$ are IID. Here $\theta$ is a scalar parameter specifying the mean. The distribution of $X$ is determined by $\theta$ through the density

$$f_\theta(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x_1 - \theta)^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{(x_2 - \theta)^2}{2}}$$

On the other hand, if we observe $x = (100, 102)^T$, then we may safely assume $\theta = 0$ is highly unlikely.
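To make the last remark of Example 1 concrete, the following minimal sketch evaluates the log of the joint density at the observed point for two candidate means. The helper name `log_density` and the comparison value 101 (the sample average) are illustrative choices, not part of the original example.

```python
import math

def log_density(x, theta):
    """Log of the joint density of IID N(theta, 1) observations x."""
    return sum(
        -0.5 * math.log(2 * math.pi) - 0.5 * (xi - theta) ** 2
        for xi in x
    )

x = (100.0, 102.0)

# Log-density of the observation under theta = 0 and under theta = 101
# (101 is simply the sample average, used here for comparison).
log_f_at_0 = log_density(x, 0.0)
log_f_at_101 = log_density(x, 101.0)

print("log f_0(x)   =", log_f_at_0)
print("log f_101(x) =", log_f_at_101)
```

Working in the log domain avoids numerical underflow: the density at $\theta = 0$ is far too small to represent directly as a floating-point number.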

The $N$-dimensional observation $X$ carries information about the $p$-dimensional parameter vector $\theta$. If $p < N$, one may ask the following question: Can we compress $x$ into a low-dimensional statistic without any loss of information? Does there exist some function $t = T(x)$, where the dimension of $t$ is $M < N$, such that $t$ carries all the useful information about $\theta$?

If so, for the purpose of studying $\theta$ we could discard the raw measurements $x$ and retain only the low-dimensional statistic $t$. We call $t$ a sufficient statistic. The following definition captures this notion precisely:

Definition 1:
Let $X_1, \dots, X_M$ be a random sample, governed by the density or probability mass function $f_\theta(x)$. The statistic $T(x)$ is sufficient for $\theta$ if the conditional distribution of $x$, given $T(x) = t$, is independent of $\theta$. Equivalently, the functional form of $f_\theta(x \mid t)$ does not involve $\theta$.
How should we interpret this definition? Here are some possibilities:

1. Let $f_\theta(x, t)$ denote the joint density or probability mass function on $(X, T(X))$. If $T(X)$ is a sufficient statistic for $\theta$, then

$$f_\theta(x) = f_\theta(x, T(x)) = f_\theta(x \mid t)\, f_\theta(t) = f(x \mid t)\, f_\theta(t) \tag{1}$$

Therefore, the parametrization of the probability law for the measurement $x$ is manifested in the parametrization of the probability law for the statistic $T(x)$.

2. Given $t = T(x)$, full knowledge of the measurement $x$ brings no additional information about $\theta$. Thus, we may discard $x$ and retain only the compressed statistic $t$.

3. Any inference strategy based on $f_\theta(x)$ may be replaced by a strategy based on $f_\theta(t)$.
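As one illustration of the third interpretation, consider maximum likelihood estimation, chosen here only as a familiar example of an inference strategy. By equation (1), the factor $f(x \mid t)$ does not involve $\theta$, so, assuming it is positive at the observed $x$, it does not affect where the maximum occurs:

```latex
\hat{\theta}
  = \arg\max_{\theta} f_\theta(x)
  = \arg\max_{\theta} f(x \mid t)\, f_\theta(t)
  = \arg\max_{\theta} f_\theta(t),
  \qquad t = T(x).
```

The estimate computed from the full measurement therefore coincides with the one computed from the statistic alone.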

Example 2

Binary Information Source

(Scharf, p. 78) Suppose a binary information source emits a sequence of binary (0 or 1) valued, independent variables $x_1, \dots, x_N$. Each binary symbol may be viewed as a realization of a Bernoulli trial: $x_n \sim \mathrm{Bernoulli}(\theta)$, iid. The parameter $\theta \in [0, 1]$ is to be estimated.

The probability mass function for the random sample $x = (x_1, \dots, x_N)^T$ is

$$f_\theta(x) = \prod_{n=1}^{N} f_\theta(x_n) = \prod_{n=1}^{N} \theta^{x_n} (1 - \theta)^{1 - x_n} = \theta^k (1 - \theta)^{N - k} \tag{2}$$

where $k = \sum_{n=1}^{N} x_n$ is the number of 1's in the sample.

We will show that $k$ is a sufficient statistic for $\theta$. This will entail showing that the conditional probability mass function $f_\theta(x \mid k)$ does not depend on $\theta$.

The distribution of the number of ones in $N$ independent Bernoulli trials is binomial:

$$f_\theta(k) = \binom{N}{k} \theta^k (1 - \theta)^{N - k}$$

Next, consider the joint distribution of $\left(x, \sum_{n} x_n\right)$. Since $k = \sum_{n} x_n$ is a deterministic function of $x$, we have

$$f_\theta(x) = f_\theta\!\left(x, \sum_{n} x_n\right)$$

Thus, the conditional probability may be written

$$f_\theta(x \mid k) = \frac{f_\theta(x, k)}{f_\theta(k)} = \frac{f_\theta(x)}{f_\theta(k)} = \frac{\theta^k (1 - \theta)^{N - k}}{\binom{N}{k} \theta^k (1 - \theta)^{N - k}} = \frac{1}{\binom{N}{k}} \tag{3}$$

This shows that $k$ is indeed a sufficient statistic for $\theta$. The $N$ values $x_1, \dots, x_N$ can be replaced by the quantity $k$ without losing information about $\theta$.
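The result of Example 2 can also be checked numerically. The sketch below, with illustrative helper names and an arbitrarily chosen sample, computes the marginal of $k$ by brute-force enumeration rather than from the binomial formula, and confirms that the conditional probability of a sequence given its number of ones is the same for several values of $\theta$.

```python
from itertools import product
from math import comb, isclose

def joint_pmf(x, theta):
    """f_theta(x) for an IID Bernoulli(theta) sample x of 0s and 1s."""
    k = sum(x)
    return theta ** k * (1 - theta) ** (len(x) - k)

def conditional_pmf_given_k(x, theta):
    """f_theta(x | k), with the marginal of k obtained by enumeration."""
    k = sum(x)
    # Sum the joint pmf over every binary sequence with the same number of ones.
    marginal_k = sum(
        joint_pmf(y, theta)
        for y in product((0, 1), repeat=len(x))
        if sum(y) == k
    )
    return joint_pmf(x, theta) / marginal_k

x = (1, 0, 1, 0)          # arbitrary sample with N = 4 and k = 2
thetas = (0.2, 0.5, 0.9)  # arbitrary parameter values
conditionals = [conditional_pmf_given_k(x, th) for th in thetas]

for th, c in zip(thetas, conditionals):
    print(f"theta = {th}: f(x | k) = {c:.6f}")
```

Each printed value should agree with $1 / \binom{4}{2}$ up to floating-point rounding, matching equation (3).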

Exercise 1

In the previous example, suppose we wish to store in memory the information we possess about $\theta$. Compare the savings, in terms of bits, we gain by storing the sufficient statistic $k$ instead of the full sample $x_1, \dots, x_N$.
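One possible way to approach the comparison is sketched below. It assumes a simple fixed-length encoding: one bit per binary symbol for the full sample, and just enough bits to index the $N + 1$ possible values of $k$. The function names and the choice $N = 1000$ are illustrative, and other encodings would give different counts.

```python
def bits_for_sample(n):
    """Bits needed to store n binary symbols, one bit each."""
    return n

def bits_for_count(n):
    """Bits needed to store k, which takes one of the n + 1 values 0..n.

    For a non-negative integer n, n.bit_length() is the length of n in
    binary, which is enough to represent every value from 0 to n.
    """
    return n.bit_length()

N = 1000
full = bits_for_sample(N)
compressed = bits_for_count(N)
print(f"full sample: {full} bits, sufficient statistic: {compressed} bits")
```

Under this encoding the storage for $k$ grows only logarithmically in $N$, while the storage for the full sample grows linearly.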

Determining Sufficient Statistics

In the example above, we had to guess the sufficient statistic, and work out the conditional probability by hand. In general, this will be a tedious way to go about finding sufficient statistics. Fortunately, spotting sufficient statistics can be made easier by the Fisher-Neyman Factorization Theorem.
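As a brief sketch of how the theorem helps, here is its usual textbook form; the factor names $a$ and $b_\theta$ are placeholders chosen for this illustration. A statistic $T(x)$ is sufficient for $\theta$ if and only if the density or mass function factors as

```latex
f_\theta(x) = a(x)\, b_\theta\bigl(T(x)\bigr),
```

where $a$ does not depend on $\theta$ and $b_\theta$ depends on $x$ only through $T(x)$. For the binary source of Example 2, equation (2) already has this form:

```latex
f_\theta(x) = \underbrace{1}_{a(x)} \cdot
              \underbrace{\theta^{k} (1 - \theta)^{N - k}}_{b_\theta(k)},
\qquad k = \sum_{n=1}^{N} x_n ,
```

so the sufficiency of $k$ can be read off directly, without computing the conditional distribution.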

Uses of Sufficient Statistics

Sufficient statistics have many uses in statistical inference problems. In hypothesis testing, the Likelihood Ratio Test can often be reduced to a sufficient statistic of the data. In parameter estimation, the Minimum Variance Unbiased Estimator of a parameter $\theta$ can be characterized by sufficient statistics and the Rao-Blackwell Theorem.
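The hypothesis-testing remark can be illustrated with the binary source of Example 2. In the sketch below, the sample, the two hypothesized values of $\theta$, and the function names are all assumptions made for illustration. It compares the likelihood ratio computed from the full sample with the one computed from the binomial distribution of $k$ alone.

```python
from math import comb, isclose

def sample_likelihood(x, theta):
    """Likelihood of the full binary sample x under Bernoulli(theta)."""
    p = 1.0
    for xn in x:
        p *= theta if xn == 1 else 1 - theta
    return p

def binomial_pmf(k, n, theta):
    """Probability of k ones in n IID Bernoulli(theta) trials."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

x = (1, 0, 0, 1, 1, 0, 1, 1)   # hypothetical observed sample
theta0, theta1 = 0.3, 0.6      # hypothetical simple hypotheses
n, k = len(x), sum(x)

# Likelihood ratio from the raw data and from the sufficient statistic.
lr_sample = sample_likelihood(x, theta1) / sample_likelihood(x, theta0)
lr_statistic = binomial_pmf(k, n, theta1) / binomial_pmf(k, n, theta0)

print("ratio from full sample:", lr_sample)
print("ratio from k alone:    ", lr_statistic)
```

The two ratios agree because the binomial coefficient, which is the only factor that distinguishes the two likelihoods, cancels in the ratio. A test that thresholds the likelihood ratio therefore needs only $k$.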

Minimality and Completeness

Minimal sufficient statistics are, roughly speaking, sufficient statistics that cannot be compressed any more without losing information about the unknown parameter. Completeness is a technical characterization of sufficient statistics that allows one to prove minimality. These topics are covered in detail in this module.

Further examples of sufficient statistics may be found in the module on the Fisher-Neyman Factorization Theorem.

References

  1. L. Scharf. (1991). Statistical Signal Processing. Addison-Wesley.
