Sufficient Statistics

Module by: Clayton Scott, Robert Nowak

Introduction

Sufficient statistics arise in nearly every aspect of statistical inference. It is important to understand them before progressing to areas such as hypothesis testing and parameter estimation.

Suppose we observe an $N$-dimensional random vector $X$, characterized by the density or mass function $f_\theta(x)$, where $\theta$ is a $p$-dimensional vector of parameters to be estimated. The functional form of $f(x)$ is assumed known. The parameter $\theta$ completely determines the distribution of $X$. Conversely, a measurement $x$ of $X$ provides information about $\theta$ through the probability law $f_\theta(x)$.

Example 1

Suppose $X = (X_1, X_2)^T$, where $X_i \sim \mathcal{N}(\theta, 1)$ are IID. Here $\theta$ is a scalar parameter specifying the mean. The distribution of $X$ is determined by $\theta$ through the density

$$f_\theta(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x_1 - \theta)^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{(x_2 - \theta)^2}{2}}$$

On the other hand, if we observe $x = (100, 102)^T$, then we may safely assume that $\theta = 0$ is highly unlikely.
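The claim in Example 1 can be checked numerically. The sketch below evaluates the log of the joint density for the observation x = (100, 102) at two candidate values of θ. The function name `log_density` and the comparison value θ = 101 (the sample mean) are illustrative choices, not part of the original module.

```python
import math

def log_density(x, theta):
    """Log of the joint density of IID N(theta, 1) samples x (hypothetical helper)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (xi - theta) ** 2 for xi in x)

x = [100.0, 102.0]
ll_zero = log_density(x, 0.0)    # candidate theta = 0
ll_near = log_density(x, 101.0)  # candidate theta = 101 (assumed for comparison)

# The log-density at theta = 0 is lower by roughly ten thousand nats.
print(ll_zero, ll_near)
```

Working in the log domain avoids floating-point underflow, since the density itself at θ = 0 is far too small to represent in double precision.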

The $N$-dimensional observation $X$ carries information about the $p$-dimensional parameter vector $\theta$. If $p < N$, one may ask the following question: Can we compress $x$ into a low-dimensional statistic without any loss of information? Does there exist some function $t = T(x)$, where the dimension of $t$ is $M < N$, such that $t$ carries all the useful information about $\theta$?

If so, for the purpose of studying $\theta$ we could discard the raw measurements $x$ and retain only the low-dimensional statistic $t$. We call $t$ a sufficient statistic. The following definition captures this notion precisely:

Definition 1:
Let $X_1, \dots, X_M$ be a random sample, governed by the density or probability mass function $f_\theta(x)$. The statistic $T(x)$ is sufficient for $\theta$ if the conditional distribution of $x$, given $T(x) = t$, is independent of $\theta$. Equivalently, the functional form of $f_{\theta|t}(x)$ does not involve $\theta$.
How should we interpret this definition? Here are some possibilities:

1. Let $f_\theta(x, t)$ denote the joint density or probability mass function on $(X, T(X))$. If $T(X)$ is a sufficient statistic for $\theta$, then

$$f_\theta(x) = f_\theta(x, T(x)) = f_{\theta|t}(x)\, f_\theta(t) = f(x|t)\, f_\theta(t) \tag{1}$$

Therefore, the parametrization of the probability law for the measurement $x$ is manifested in the parametrization of the probability law for the statistic $T(x)$.

2. Given $t = T(x)$, full knowledge of the measurement $x$ brings no additional information about $\theta$. Thus, we may discard $x$ and retain only the compressed statistic $t$.

3. Any inference strategy based on $f_\theta(x)$ may be replaced by a strategy based on $f_\theta(t)$.

Example 2

Binary Information Source

(Scharf, p. 78) Suppose a binary information source emits a sequence of binary (0 or 1) valued, independent variables $x_1, \dots, x_N$. Each binary symbol may be viewed as a realization of a Bernoulli trial: $x_n \sim \text{Bernoulli}(\theta)$, IID. The parameter $\theta \in [0, 1]$ is to be estimated.

The probability mass function for the random sample $x = (x_1, \dots, x_N)^T$ is

$$f_\theta(x) = \prod_{n=1}^{N} f_\theta(x_n) = \prod_{n=1}^{N} \theta^{x_n} (1 - \theta)^{1 - x_n} = \theta^{k} (1 - \theta)^{N - k} \tag{2}$$

where $k = \sum_{n=1}^{N} x_n$ is the number of 1's in the sample.

We will show that $k$ is a sufficient statistic for $\theta$. This will entail showing that the conditional probability mass function $f_{\theta|k}(x)$ does not depend on $\theta$.

The distribution of the number of ones in $N$ independent Bernoulli trials is binomial:

$$f_\theta(k) = \binom{N}{k} \theta^{k} (1 - \theta)^{N - k}$$

Next, consider the joint distribution of $\left(x, \sum_{n=1}^{N} x_n\right)$. Because the sum is a function of $x$, we have

$$f_\theta(x) = f_\theta\left(x, \sum_{n=1}^{N} x_n\right)$$

Thus, the conditional probability may be written

$$f_{\theta|k}(x) = \frac{f_\theta(x, k)}{f_\theta(k)} = \frac{f_\theta(x)}{f_\theta(k)} = \frac{\theta^{k} (1 - \theta)^{N - k}}{\binom{N}{k} \theta^{k} (1 - \theta)^{N - k}} = \frac{1}{\binom{N}{k}} \tag{3}$$

This shows that $k$ is indeed a sufficient statistic for $\theta$. The $N$ values $x_1, \dots, x_N$ can be replaced by the quantity $k$ without losing information about $\theta$.
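The result of Example 2 can be verified by brute force for a small sample. The sketch below enumerates all binary sequences of length N, computes the conditional probability of one particular sequence given its number of ones, and repeats this for several values of θ. The helper `conditional_pmf`, the sample `x`, and the chosen θ values are illustrative assumptions, not part of the original module.

```python
from itertools import product
from math import comb

def conditional_pmf(x, theta):
    """P(X = x | sum(X) = k) for IID Bernoulli(theta) samples, by enumeration."""
    n, k = len(x), sum(x)

    def pmf(seq):
        # Joint pmf of a binary sequence: theta^ones * (1 - theta)^(zeros).
        ones = sum(seq)
        return theta ** ones * (1 - theta) ** (len(seq) - ones)

    # Marginal pmf of k: sum the joint pmf over all sequences with k ones.
    p_k = sum(pmf(seq) for seq in product((0, 1), repeat=n) if sum(seq) == k)
    return pmf(x) / p_k

x = (1, 0, 1, 1, 0)  # hypothetical sample with N = 5, k = 3
values = [conditional_pmf(x, theta) for theta in (0.2, 0.5, 0.9)]

# Each entry should agree with 1 / C(N, k), whatever theta is.
print(values, 1 / comb(len(x), sum(x)))
```

Enumeration costs 2^N evaluations, so this check is only practical for small N. It is meant to confirm the algebra, not to replace it.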

Exercise 1

In the previous example, suppose we wish to store in memory the information we possess about $\theta$. Compare the savings, in terms of bits, we gain by storing the sufficient statistic $k$ instead of the full sample $x_1, \dots, x_N$.

Determining Sufficient Statistics

In the example above, we had to guess the sufficient statistic, and work out the conditional probability by hand. In general, this will be a tedious way to go about finding sufficient statistics. Fortunately, spotting sufficient statistics can be made easier by the Fisher-Neyman Factorization Theorem.
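As a brief illustration of why the factorization theorem helps, recall its standard statement (taken from general references, not from this module): $T(x)$ is sufficient for $\theta$ if and only if the density or mass function can be written as $f_\theta(x) = g_\theta(T(x))\, h(x)$, where $h$ does not depend on $\theta$. The names $g_\theta$ and $h$ are notation assumed here. For the binary source of Example 2, equation (2) is already in this form, so the sufficiency of $k$ can be read off without computing a conditional distribution:

```latex
f_\theta(x)
  = \underbrace{\theta^{k} (1 - \theta)^{N - k}}_{g_\theta(k)}
    \cdot \underbrace{1}_{h(x)},
\qquad k = \sum_{n=1}^{N} x_n .
```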

Uses of Sufficient Statistics

Sufficient statistics have many uses in statistical inference problems. In hypothesis testing, the Likelihood Ratio Test can often be reduced to a sufficient statistic of the data. In parameter estimation, the Minimum Variance Unbiased Estimator of a parameter $\theta$ can be characterized by sufficient statistics and the Rao-Blackwell Theorem.

Minimality and Completeness

Minimal sufficient statistics are, roughly speaking, sufficient statistics that cannot be compressed any more without losing information about the unknown parameter. Completeness is a technical characterization of sufficient statistics that allows one to prove minimality. These topics are covered in detail in a separate module.

Further examples of sufficient statistics may be found in the module on the Fisher-Neyman Factorization Theorem.

References

  1. L. Scharf. (1991). Statistical Signal Processing. Addison-Wesley.
