# OpenStax-CNX



Inside Collection: Applied Probability

Collection by: Paul E Pfeiffer

# Convergence and the Central Limit Theorem

Module by: Paul E Pfeiffer

Summary: The central limit theorem (CLT) asserts that the sum of a large class of independent random variables, each with reasonable distributions, is approximately normally distributed. Various versions of this theorem have been studied intensively. On the other hand, certain common forms serve as the basis of an extraordinary amount of applied work. In the statistics of large samples, the sample average is approximately normal, whether or not the population distribution is normal. In much of the theory of errors of measurement, the observed error is the sum of a large number of independent random quantities which contribute additively to the result. Similarly, in the theory of noise, the noise signal is the sum of a large number of random components, independently produced. In such situations, the assumption of a normal population distribution is frequently quite appropriate.

## The Central Limit Theorem

The central limit theorem (CLT) asserts that if random variable $X$ is the sum of a large class of independent random variables, each with reasonable distributions, then $X$ is approximately normally distributed. This celebrated theorem has been the object of extensive theoretical research directed toward the discovery of the most general conditions under which it is valid. On the other hand, this theorem serves as the basis of an extraordinary amount of applied work. In the statistics of large samples, the sample average is a constant times the sum of the random variables in the sampling process. Thus, for large samples, the sample average is approximately normal, whether or not the population distribution is normal. In much of the theory of errors of measurement, the observed error is the sum of a large number of independent random quantities which contribute additively to the result. Similarly, in the theory of noise, the noise signal is the sum of a large number of random components, independently produced. In such situations, the assumption of a normal population distribution is frequently quite appropriate.

We consider a form of the CLT under hypotheses which are reasonable assumptions in many practical situations. We sketch a proof of this version of the CLT, known as the Lindeberg-Lévy theorem, which utilizes the limit theorem on characteristic functions, above, along with certain elementary facts from analysis. It illustrates the kind of argument used in more sophisticated proofs required for more general cases.

Consider an independent sequence $\{X_n : 1 \le n\}$ of random variables. Form the sequence of partial sums

$$S_n = \sum_{i=1}^{n} X_i, \quad n \ge 1, \quad \text{with} \quad E[S_n] = \sum_{i=1}^{n} E[X_i] \quad \text{and} \quad \operatorname{Var}[S_n] = \sum_{i=1}^{n} \operatorname{Var}[X_i] \tag{1}$$

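Let $S_n^* = \dfrac{S_n - E[S_n]}{\sqrt{\operatorname{Var}[S_n]}}$ be the standardized sum and let $F_n$ be the distribution function for $S_n^*$. The CLT asserts that under appropriate conditions, $F_n(t) \to \Phi(t)$ as $n \to \infty$ for all $t$, where $\Phi$ is the standard normal distribution function. We sketch a proof of the theorem under the condition that the $X_i$ form an iid class.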

Central Limit Theorem (Lindeberg-Lévy form)

If $\{X_n : 1 \le n\}$ is iid, with

$$E[X_i] = \mu, \quad \operatorname{Var}[X_i] = \sigma^2, \quad \text{and} \quad S_n^* = \frac{S_n - n\mu}{\sigma\sqrt{n}} \tag{2}$$

then

$$F_n(t) \to \Phi(t) \quad \text{as } n \to \infty, \text{ for all } t \tag{3}$$

IDEAS OF A PROOF

There is no loss of generality in assuming $\mu = 0$. Let $\varphi$ be the common characteristic function for the $X_i$, and for each $n$ let $\varphi_n$ be the characteristic function for $S_n^*$. We have

$$\varphi(t) = E\left[e^{itX}\right] \quad \text{and} \quad \varphi_n(t) = E\left[e^{itS_n^*}\right] = \varphi^n\!\left(\frac{t}{\sigma\sqrt{n}}\right) \tag{4}$$
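A quick numerical check of the identity in (4) may help. The Python sketch below assumes, purely for illustration, that each $X_i$ takes the values $\pm 1$ with probability 1/2, so that $\mu = 0$, $\sigma = 1$, and $\varphi(t) = \cos t$. It computes $E[e^{itS_n^*}]$ directly from the binomial distribution of $S_n$ and compares it with $\varphi^n(t/\sigma\sqrt{n}) = \cos^n(t/\sqrt{n})$; the function names are hypothetical.

```python
import cmath
import math

def char_fn_standardized_sum(t: float, n: int) -> complex:
    """E[exp(i t S_n*)] for S_n = sum of n iid +-1 variables (illustrative
    assumption), computed from the exact binomial distribution of S_n."""
    total = 0j
    for k in range(n + 1):                      # k = number of +1 terms
        s = 2 * k - n                           # value of S_n
        prob = math.comb(n, k) / 2 ** n         # P(S_n = s)
        total += prob * cmath.exp(1j * t * s / math.sqrt(n))  # S_n* = S_n / sqrt(n)
    return total

def via_product_rule(t: float, n: int) -> float:
    """phi^n(t / (sigma sqrt(n))) with phi(t) = cos t and sigma = 1."""
    return math.cos(t / math.sqrt(n)) ** n

for n in (5, 50):
    for t in (0.5, 1.0, 2.0):
        direct = char_fn_standardized_sum(t, n)
        print(n, t, round(direct.real, 10), round(via_product_rule(t, n), 10))
```

The two columns agree because independence turns the expectation of a product of exponentials into a product of expectations.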

Using the power series expansion of φ about the origin noted above, we have

$$\varphi(t) = 1 - \frac{\sigma^2 t^2}{2} + \beta(t), \quad \text{where } \beta(t) = o(t^2) \text{ as } t \to 0 \tag{5}$$

This implies

$$\left|\varphi\!\left(\frac{t}{\sigma\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right)\right| = \left|\beta\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right| = o\!\left(\frac{t^2}{\sigma^2 n}\right) \tag{6}$$

so that

$$n\left|\varphi\!\left(\frac{t}{\sigma\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right)\right| \to 0 \quad \text{as } n \to \infty \tag{7}$$
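To see (7) at work, keep the illustrative assumption $X = \pm 1$, so $\varphi(t) = \cos t$, $\sigma = 1$, and the remainder is $\beta(x) = \cos x - 1 + x^2/2$. For all real $x$, $0 \le \beta(x) \le x^4/24$, so the left side of (7) is at most $t^4/(24n)$. The Python sketch below evaluates both for a fixed $t$.

```python
import math

def beta(x: float) -> float:
    """Remainder beta(x) = phi(x) - (1 - sigma^2 x^2 / 2) for phi = cos, sigma = 1."""
    return math.cos(x) - (1.0 - x * x / 2.0)

t = 1.5
for n in (10, 100, 1000, 10000):
    lhs = n * abs(beta(t / math.sqrt(n)))   # left side of (7)
    bound = t ** 4 / (24 * n)                # n * (t / sqrt(n))^4 / 24
    print(f"n={n:>5}  lhs={lhs:.3e}  bound={bound:.3e}")
```

For this particular $\varphi$ the left side of (7) decays like $1/n$; the theorem itself only needs it to tend to zero.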

A standard lemma of analysis ensures

$$\left|\varphi^n\!\left(\frac{t}{\sigma\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right)^n\right| \le n\left|\varphi\!\left(\frac{t}{\sigma\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right)\right| \to 0 \quad \text{as } n \to \infty \tag{8}$$
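The text does not name the lemma behind (8); one standard form that suffices is: for complex numbers $a$, $b$ with $|a| \le 1$ and $|b| \le 1$, $|a^n - b^n| \le n|a - b|$. It follows from $a^n - b^n = (a - b)\sum_{k=0}^{n-1} a^k b^{n-1-k}$, since each of the $n$ terms of the sum has modulus at most one. It applies here because $|\varphi| \le 1$ and $|1 - t^2/2n| \le 1$ once $n \ge t^2/4$. A small Python check on a grid of points in the closed unit disk (helper name hypothetical):

```python
import cmath
import itertools

def lemma_holds(a: complex, b: complex, n: int) -> bool:
    """Check |a^n - b^n| <= n |a - b|, assuming |a| <= 1 and |b| <= 1.
    a^n - b^n = (a - b) * sum_{k=0}^{n-1} a^k b^(n-1-k), and each of the
    n terms of that sum has modulus at most 1."""
    return abs(a ** n - b ** n) <= n * abs(a - b) + 1e-12   # slack for rounding

# Points in the closed unit disk on a polar grid (radius 1.0 is the boundary).
points = [r * cmath.exp(1j * 2 * cmath.pi * k / 12)
          for r in (0.0, 0.3, 0.7, 1.0) for k in range(12)]

assert all(lemma_holds(a, b, n)
           for a, b in itertools.product(points, repeat=2)
           for n in (1, 2, 5, 20))
print("inequality holds on", len(points) ** 2, "pairs")
```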

It is a well-known property of the exponential that

$$\left(1 - \frac{t^2}{2n}\right)^n \to e^{-t^2/2} \quad \text{as } n \to \infty \tag{9}$$

so that

$$\varphi_n(t) = \varphi^n\!\left(\frac{t}{\sigma\sqrt{n}}\right) \to e^{-t^2/2} \quad \text{as } n \to \infty, \text{ for all } t \tag{10}$$
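Continuing the illustrative case $\varphi(t) = \cos t$, $\sigma = 1$, the Python sketch below shows (9) and (10) numerically: both $|(1 - t^2/2n)^n - e^{-t^2/2}|$ and $|\varphi^n(t/\sqrt{n}) - e^{-t^2/2}|$ shrink, roughly in proportion to $1/n$ for this choice of $\varphi$.

```python
import math

t = 1.2
target = math.exp(-t ** 2 / 2)      # e^{-t^2/2}, characteristic function of N(0, 1) at t
errors = []
for n in (10, 100, 1000, 10000):
    poly = (1 - t ** 2 / (2 * n)) ** n          # left side of (9)
    phi_n = math.cos(t / math.sqrt(n)) ** n     # phi^n(t / sqrt(n)) with phi = cos
    errors.append((abs(poly - target), abs(phi_n - target)))
    print(n, errors[-1])
```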

Since $e^{-t^2/2}$ is the characteristic function of the standard normal distribution, the convergence theorem on characteristic functions, above, gives $F_n(t) \to \Phi(t)$.

The theorem says that the distribution functions for sums of increasing numbers of the Xi converge to the normal distribution function, but it does not tell how fast. It is instructive to consider some examples, which are easily worked out with the aid of our m-functions.
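One classical answer to the question of speed, not developed in this module, is the Berry-Esseen inequality: for iid summands with $E|X - \mu|^3 < \infty$, $\sup_t |F_n(t) - \Phi(t)| \le C\,E|X - \mu|^3/(\sigma^3\sqrt{n})$, where $C$ is an absolute constant known to be less than $0.5$. The Python sketch below computes the exact distance $\sup_t |F_n(t) - \Phi(t)|$ for the illustrative case $X = \pm 1$ (where the right side is $C/\sqrt{n}$) and shows it shrinking like $1/\sqrt{n}$. Function names are hypothetical.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Phi(x), the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance(n: int) -> float:
    """Exact sup_t |F_n(t) - Phi(t)| when S_n is a sum of n iid +-1 variables
    (mu = 0, sigma = 1; an illustrative choice). F_n is a step function, so the
    supremum is reached at a jump point, using either F_n or its left limit."""
    worst, below = 0.0, 0.0             # below = F_n just left of the current jump
    for k in range(n + 1):              # k = number of +1 terms
        x = (2 * k - n) / math.sqrt(n)  # jump point of S_n* = S_n / sqrt(n)
        at = below + math.comb(n, k) / 2 ** n
        phi = std_normal_cdf(x)
        worst = max(worst, abs(at - phi), abs(below - phi))
        below = at
    return worst

for n in (10, 100, 1000):
    d = kolmogorov_distance(n)
    print(n, round(d, 4), round(d * math.sqrt(n), 3))   # d * sqrt(n) stays below 0.5
```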

Demonstration of the central limit theorem

Discrete examples

We first examine the gaussian approximation in two cases. We take the sum of five iid simple random variables in each case. The first variable has six distinct values; the second has only three. The discrete character of the sum is more evident in the second case. Here we use not only the gaussian approximation, but also the gaussian approximation shifted one half unit (the so-called continuity correction for integer-valued random variables). The fit is remarkably good in either case with only five terms.

A principal tool is the m-function diidsum (sum of discrete iid random variables). It uses a designated number of iterations of mgsum.
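The source of `diidsum` and `mgsum` is not reproduced in this module. As a rough Python analogue (hypothetical names `sum_two` and `diid_sum`; behavior assumed from the description above), the sketch below forms the distribution of a sum of independent simple random variables by repeated pairwise convolution: every pair of values is added, and the probabilities of equal sums are combined. With the data of Example 1 it confirms that the sum of five has mean $5 \cdot 1.99$ and variance $5 \cdot 13.0904$.

```python
from collections import defaultdict

def sum_two(xs, px, ys, py):
    """Distribution of X + Y for independent simple X and Y. Sums are rounded
    to 10 decimals so equal sums reached by different pairs merge despite
    floating-point error."""
    dist = defaultdict(float)
    for x, p in zip(xs, px):
        for y, q in zip(ys, py):
            dist[round(x + y, 10)] += p * q
    values = sorted(dist)
    return values, [dist[v] for v in values]

def diid_sum(xs, px, n):
    """Hypothetical analogue of diidsum: distribution of the sum of n iid copies."""
    values, probs = list(xs), list(px)
    for _ in range(n - 1):
        values, probs = sum_two(values, probs, xs, px)
    return values, probs

# Example 1 data from the text.
X = [-3.2, -1.05, 2.1, 4.6, 5.3, 7.2]
PX = [0.2, 0.2, 0.1, 0.3, 0.1, 0.1]
x, px = diid_sum(X, PX, 5)
mean = sum(v * p for v, p in zip(x, px))
var = sum(v * v * p for v, p in zip(x, px)) - mean ** 2
print(len(x), round(mean, 4), round(var, 4))   # expected 5*EX and 5*VX
```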

### Example 1: First random variable

```matlab
X = [-3.2 -1.05 2.1 4.6 5.3 7.2];
PX = 0.1*[2 2 1 3 1 1];
EX = X*PX'                           % EX = 1.9900
VX = dot(X.^2,PX) - EX^2             % VX = 13.0904
[x,px] = diidsum(X,PX,5);            % Distribution for the sum of 5 iid rv
F = cumsum(px);                      % Distribution function for the sum
stairs(x,F)                          % Stair step plot
hold on
plot(x,gaussian(5*EX,5*VX,x),'-.')   % Plot of gaussian distribution function
% Plotting details                   (see Figure 1)
```


### Example 2: Second random variable

```matlab
X = 1:3;
PX = [0.3 0.5 0.2];
EX = X*PX'                           % EX = 1.9000
EX2 = X.^2*PX'                       % EX2 = 4.1000
VX = EX2 - EX^2                      % VX = 0.4900
[x,px] = diidsum(X,PX,5);            % Distribution for the sum of 5 iid rv
F = cumsum(px);                      % Distribution function for the sum
stairs(x,F)                          % Stair step plot
hold on
plot(x,gaussian(5*EX,5*VX,x),'-.')   % Plot of gaussian distribution function
plot(x,gaussian(5*EX,5*VX,x+0.5),'o')  % Plot with continuity correction
% Plotting details                   (see Figure 2)
```
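The effect of the continuity correction in Example 2 can be measured directly. The Python sketch below computes the exact distribution of the sum of five copies by convolution and records the largest error at the integer values $k$ for the plain approximation $\Phi$ evaluated at $k$ and for the shifted one evaluated at $k + 0.5$. It assumes that `gaussian(m,v,t)` is the normal distribution function with mean `m` and variance `v`, as its use above suggests; `normal_cdf` and `iid_sum_pmf` are hypothetical names.

```python
import math

def normal_cdf(m: float, v: float, t: float) -> float:
    """Normal distribution function with mean m and variance v (assumed to
    play the role of the gaussian(m, v, t) m-function in the text)."""
    return 0.5 * (1.0 + math.erf((t - m) / math.sqrt(2.0 * v)))

def iid_sum_pmf(pmf: dict, n: int) -> dict:
    """Exact pmf of the sum of n iid integer-valued variables (repeated convolution)."""
    total = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, p in total.items():
            for x, q in pmf.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + p * q
        total = nxt
    return total

pmf = {1: 0.3, 2: 0.5, 3: 0.2}           # Example 2: X = 1:3, PX = [0.3 0.5 0.2]
EX, VX = 1.9, 0.49
sums = iid_sum_pmf(pmf, 5)

cdf, err_plain, err_corrected = 0.0, 0.0, 0.0
for k in sorted(sums):
    cdf += sums[k]                        # F(k) = P(S <= k)
    err_plain = max(err_plain, abs(cdf - normal_cdf(5 * EX, 5 * VX, k)))
    err_corrected = max(err_corrected, abs(cdf - normal_cdf(5 * EX, 5 * VX, k + 0.5)))

print(round(err_plain, 4), round(err_corrected, 4))
```

Without the shift the approximation sits near the middle of each jump of the step function, so its error is roughly half a jump; the half-unit shift removes most of that.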


As another example, we take the sum of twenty-one iid simple random variables with integer values. We examine only the part of the distribution function where most of the probability is concentrated. This effectively enlarges the x-scale, so that the nature of the approximation is more readily apparent.

### Example 3: Sum of twenty-one iid random variables

```matlab
X = [0 1 3 5 6];
PX = 0.1*[1 2 3 2 2];
EX = dot(X,PX)                   % EX = 3.3000
VX = dot(X.^2,PX) - EX^2         % VX = 4.2100
[x,px] = diidsum(X,PX,21);
F = cumsum(px);
FG = gaussian(21*EX,21*VX,x);
k = find(x >= 40 & x <= 90);     % Indices of the sum values 40 through 90
stairs(x(k),F(k))
hold on
plot(x(k),FG(k))
% Plotting details               (see Figure 3)
```


Absolutely continuous examples

By use of the discrete approximation, we may get approximations to the sums of absolutely continuous random variables. The results on discrete variables indicate that the more values, the more quickly the convergence seems to occur. In our next example, we start with a random variable uniform on $(0,1)$.

### Example 4: Sum of three iid uniform random variables

Suppose $X \sim$ uniform $(0,1)$. Then $E[X] = 0.5$ and $\operatorname{Var}[X] = 1/12$.

```matlab
tappr                            % Interactive m-function; prompts and responses follow
% Enter matrix [a b] of x-range endpoints  [0 1]
% Enter number of x approximation points  100
% Enter density as a function of t  t<=1
% Use row matrices X and PX as in the simple case
EX = 0.5;
VX = 1/12;
[z,pz] = diidsum(X,PX,3);
F = cumsum(pz);
FG = gaussian(3*EX,3*VX,z);
length(z)                        % ans = 298
a = 1:5:296;                     % Plot every fifth point
plot(z(a),F(a),z(a),FG(a),'o')
% Plotting details               (see Figure 4)
```
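`tappr` is an interactive m-function whose exact discretization rule is not given here. Assuming it replaces the density on $[a, b]$ by $n$ equally spaced points (taken below to be the midpoints of $n$ equal subintervals) with probabilities proportional to the density, the Python sketch below reproduces the count `length(z) = 298`: a sum of three such variables can take $3 \cdot 99 + 1 = 298$ distinct values. It also shows that the discretized sum has mean $1.5$ and variance $\tfrac{3}{12}(1 - 1/100^2)$, slightly below the exact $3 \cdot \tfrac{1}{12}$. Working with integer grid indices avoids merging floating-point sums.

```python
n_points = 100
# Hypothetical tappr-like grid: midpoints (i + 0.5)/n_points, i = 0..99, each with
# probability 1/n_points because the uniform density is constant on (0, 1).
weights = [1] * n_points                     # integer counts indexed by grid position i

def convolve(a, b):
    """Counts for the sum of grid indices of two independent variables."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

counts = convolve(convolve(weights, weights), weights)   # index sums s = 0..297
total = n_points ** 3
z = [(s + 1.5) / n_points for s in range(len(counts))]   # three midpoints add to (s + 1.5)/100
pz = [c / total for c in counts]
mean = sum(v * p for v, p in zip(z, pz))
var = sum(v * v * p for v, p in zip(z, pz)) - mean ** 2
print(len(z), mean, var)
```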


For the sum of only three random variables, the fit is remarkably good. This is not entirely surprising, since the sum of two gives a symmetric triangular distribution on $(0,2)$. Other distributions may take many more terms to get a good fit. Consider the following example.

### Example 5: Sum of eight iid random variables

Suppose the density is one on the intervals $(-1,-0.5)$ and $(0.5,1)$. Although the density is symmetric, it has two separate regions of probability. From symmetry, $E[X] = 0$. Calculations show $\operatorname{Var}[X] = E[X^2] = 7/12$. The MATLAB computations are:

```matlab
tappr                             % Interactive m-function; prompts and responses follow
% Enter matrix [a b] of x-range endpoints  [-1 1]
% Enter number of x approximation points  200
% Enter density as a function of t  (t<=-0.5)|(t>=0.5)
% Use row matrices X and PX as in the simple case
[z,pz] = diidsum(X,PX,8);
VX = 7/12;
F = cumsum(pz);
FG = gaussian(0,8*VX,z);
plot(z,F,z,FG)
% Plotting details                (see Figure 5)
```
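The moment values quoted for Example 5 can be confirmed with exact rational arithmetic. The Python sketch below integrates the density (one on each piece), $t$ times the density, and $t^2$ times the density over $(-1, -0.5)$ and $(0.5, 1)$; the helper name is hypothetical. It gives total probability 1, $E[X] = 0$, and $\operatorname{Var}[X] = E[X^2] = 7/12$, so the sum of eight has variance $8 \cdot 7/12 = 14/3$, the value `8*VX` used above.

```python
from fractions import Fraction

def integral_of_power(a: Fraction, b: Fraction, k: int) -> Fraction:
    """Exact integral of t**k over (a, b)."""
    return (b ** (k + 1) - a ** (k + 1)) / (k + 1)

pieces = [(Fraction(-1), Fraction(-1, 2)), (Fraction(1, 2), Fraction(1))]
total_prob = sum(integral_of_power(a, b, 0) for a, b in pieces)
first_moment = sum(integral_of_power(a, b, 1) for a, b in pieces)
second_moment = sum(integral_of_power(a, b, 2) for a, b in pieces)
print(total_prob, first_moment, second_moment, 8 * second_moment)
```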


Although the sum of eight random variables is used, the fit to the gaussian is not as good as that for the sum of three in Example 4. In either case, the convergence is remarkably fast: only a few terms are needed for a good approximation.

## Convergence phenomena in probability theory

The central limit theorem exhibits one of several kinds of convergence important in probability theory, namely convergence in distribution (sometimes called weak convergence). The increasing concentration of values of the sample average random variable $A_n$ with increasing $n$ illustrates convergence in probability. The convergence of the sample average is a form of the so-called weak law of large numbers. For large enough $n$, the probability that $A_n$ lies within a given distance of the population mean can be made as near one as desired. The fact that the variance of $A_n$ becomes small for large $n$ illustrates convergence in the mean (of order 2):

$$E\left[|A_n - \mu|^2\right] \to 0 \quad \text{as } n \to \infty \tag{11}$$
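For an iid sample with variance $\sigma^2$, $\operatorname{Var}[A_n] = \sigma^2/n$, so the left side of (11) equals $\sigma^2/n$. Chebyshev's inequality, $P(|A_n - \mu| \ge \epsilon) \le \sigma^2/(n\epsilon^2)$, then shows how this mean-square convergence forces convergence in probability. The Python sketch below takes, as an illustrative assumption, a Bernoulli population with $p = 0.3$ and $\epsilon = 0.05$, and compares the exact probability with the Chebyshev bound.

```python
import math
from fractions import Fraction

P, EPS = Fraction(3, 10), Fraction(1, 20)   # Bernoulli(0.3) population, eps = 0.05 (illustrative)
p, eps = float(P), float(EPS)
sigma2 = p * (1 - p)

def prob_far(n: int) -> float:
    """Exact P(|A_n - p| >= eps), where A_n = K/n and K ~ Binomial(n, p).
    The comparison uses exact fractions so boundary cases are not lost to rounding."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1) if abs(Fraction(k, n) - P) >= EPS)

for n in (10, 100, 1000):
    mse = sigma2 / n                            # E[|A_n - mu|^2] = Var[A_n]
    cheb = min(1.0, sigma2 / (n * eps ** 2))    # Chebyshev bound on P(|A_n - mu| >= eps)
    print(n, mse, round(prob_far(n), 4), round(cheb, 4))
```

The Chebyshev bound is crude, but it tends to zero, which is all the weak law requires.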

In the calculus, we deal with sequences of numbers. If $\{a_n : 1 \le n\}$ is a sequence of real numbers, we say the sequence converges iff for $N$ sufficiently large $a_n$ approximates arbitrarily closely some number $L$ for all $n \ge N$. This unique number $L$ is called the limit of the sequence. Convergent sequences are characterized by the fact that for large enough $N$, the distance $|a_n - a_m|$ between any two terms is arbitrarily small for all $n, m \ge N$. Such a sequence is said to be fundamental (or Cauchy). To be precise, if we let $\epsilon > 0$ be the error of approximation, then the sequence is

• Convergent iff there exists a number $L$ such that for any $\epsilon > 0$ there is an $N$ such that
$$|L - a_n| \le \epsilon \quad \text{for all } n \ge N \tag{12}$$
• Fundamental iff for any $\epsilon > 0$ there is an $N$ such that
$$|a_n - a_m| \le \epsilon \quad \text{for all } n, m \ge N \tag{13}$$
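As a concrete instance of (12) and (13), take $a_n = \sum_{k=1}^{n} 1/k^2$. For $m > n$, $|a_m - a_n| = \sum_{k=n+1}^{m} 1/k^2 < \sum_{k=n+1}^{\infty} \frac{1}{k(k-1)} = \frac{1}{n}$, so $N = \lceil 1/\epsilon \rceil$ works in (13) without knowing the limit. The limit happens to be $L = \pi^2/6$, and the same tail bound gives (12). The Python sketch below checks both for $\epsilon = 10^{-3}$ on a finite stretch of the sequence.

```python
import math

eps = 1e-3
N = math.ceil(1 / eps)             # |a_m - a_n| < 1/n <= 1/N <= eps for m > n >= N

# a_n for n = 1, ..., N + 199, accumulated term by term.
a, total = [], 0.0
for k in range(1, N + 200):
    total += 1.0 / k ** 2
    a.append(total)

tail = a[N - 1:]                   # a_N, a_{N+1}, ..., a_{N+199}
spread = max(tail) - min(tail)     # largest |a_n - a_m| over the sampled n, m >= N
L = math.pi ** 2 / 6               # the limit (Euler's sum of 1/k^2)
print(N, spread, L - a[N - 1])
```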

As a result of the completeness of the real numbers, it is true that any fundamental sequence converges (i.e., has a limit). And such convergence has certain desirable properties. For example, the limit of a linear combination of sequences is that linear combination of the separate limits; and limits of products are the products of the limits.

The notion of convergent and fundamental sequences applies to sequences of real-valued functions with a common domain. For each $x$ in the domain, we have a sequence $\{f_n(x) : 1 \le n\}$ of real numbers. The sequence may converge for some $x$ and fail to converge for others.

A somewhat more restrictive condition (and often a more desirable one) for sequences of functions is uniform convergence. Here the uniformity is over values of the argument $x$. In this case, for any $\epsilon > 0$ there exists an $N$ which works for all $x$ (or for some suitable prescribed set of $x$).
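A standard illustration (not from the text): $f_n(x) = x^n$ converges to 0 at every $x \in [0, 1)$, but not uniformly there. For a tolerance $\epsilon$, the smallest workable $N$ at the point $x$ is the least integer with $x^N \le \epsilon$, and it grows without bound as $x \to 1$. On a smaller set such as $[0, 0.9]$, one $N$ serves every $x$, because $x^n \le 0.9^n$ there. The helper `smallest_N` is a hypothetical name for this Python sketch.

```python
import math

def smallest_N(x: float, eps: float) -> int:
    """Least N with x**n <= eps for all n >= N, for f_n(x) = x**n and 0 < x < 1.
    Since x**n decreases in n, this is the least N with x**N <= eps."""
    N = max(1, math.ceil(math.log(eps) / math.log(x)))
    while x ** N > eps:                      # guard against rounding in the logarithms
        N += 1
    while N > 1 and x ** (N - 1) <= eps:
        N -= 1
    return N

eps = 0.01
for x in (0.5, 0.9, 0.99, 0.999):
    print(x, smallest_N(x, eps))             # N grows without bound as x -> 1
print("one N for all x in [0, 0.9]:", smallest_N(0.9, eps))
```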

These concepts may be applied to a sequence of random variables, which are real-valued functions with domain $\Omega$ and argument $\omega$. Suppose $\{X_n : 1 \le n\}$ is a sequence of real random variables. For each argument $\omega$ we have a sequence $\{X_n(\omega) : 1 \le n\}$ of real numbers. It is quite possible that such a sequence converges for some $\omega$ and diverges (fails to converge) for others. As a matter of fact, in many important cases the sequence converges for all $\omega$ except possibly a set (event) of probability zero. In this case, we say the sequence converges almost surely (abbreviated a.s.). The notion of uniform convergence also applies. In probability theory we have the notion of almost uniform convergence. This is the case that the sequence converges uniformly for all $\omega$ except for a set of arbitrarily small probability.

The notion of convergence in probability noted above is a quite different kind of convergence. Rather than deal with the sequence on a pointwise basis, it deals with the random variables as such. In the case of the sample average, the “closeness” to a limit is expressed in terms of the probability that the observed value $X_n(\omega)$ should lie close to the value $X(\omega)$ of the limiting random variable. We may state this precisely as follows:

A sequence $\{X_n : 1 \le n\}$ converges to $X$ in probability, designated $X_n \xrightarrow{P} X$, iff for any $\epsilon > 0$,

$$\lim_{n \to \infty} P(|X - X_n| > \epsilon) = 0 \tag{14}$$

There is a corresponding notion of a sequence fundamental in probability.

The following schematic representation may help to visualize the difference between almost-sure convergence and convergence in probability. In setting up the basic probability model, we think in terms of “balls” drawn from a jar or box. Instead of balls, consider for each possible outcome $\omega$ a “tape” on which there is the sequence of values $X_1(\omega), X_2(\omega), X_3(\omega), \ldots$.

• If the sequence of random variables converges a.s. to a random variable $X$, then there is a set of “exceptional tapes” which has zero probability. For all other tapes, $X_n(\omega) \to X(\omega)$. This means that by going far enough out on any such tape, the values $X_n(\omega)$ beyond that point all lie within a prescribed distance of the value $X(\omega)$ of the limit random variable.
• If the sequence converges in probability, the situation may be quite different. A tape is selected. For $n$ sufficiently large, the probability is arbitrarily near one that the observed value $X_n(\omega)$ lies within a prescribed distance of $X(\omega)$. This says nothing about the values $X_m(\omega)$ on the selected tape for any larger $m$. In fact, the sequence on the selected tape may very well diverge.

It is not difficult to construct examples for which there is convergence in probability but pointwise convergence for no ω. It is easy to confuse these two types of convergence. The kind of convergence noted for the sample average is convergence in probability (a “weak” law of large numbers). What is really desired in most cases is a.s. convergence (a “strong” law of large numbers). It turns out that for a sampling process of the kind used in simple statistics, the convergence of the sample average is almost sure (i.e., the strong law holds). To establish this requires much more detailed and sophisticated analysis than we are prepared to make in this treatment.
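One standard construction (not taken from the text), sometimes called the typewriter sequence, makes the first claim of the preceding paragraph concrete. Take $\Omega = [0, 1)$ with $P$ equal to length, and list the indicator functions of $[i/k, (i+1)/k)$ for $k = 1, 2, \ldots$ and $i = 0, \ldots, k - 1$ as $X_1, X_2, \ldots$. Then $P(X_n \ne 0) = 1/k \to 0$, so $X_n \to 0$ in probability; but every $\omega$ lies in exactly one interval of each block, so each tape shows the value 1 once per block and never settles down. The Python sketch below, with hypothetical helper names, checks this on the first 50 blocks using exact fractions.

```python
from fractions import Fraction

def typewriter(n_blocks: int):
    """Yield (k, i) for X = indicator of [i/k, (i+1)/k), listed block by block:
    k = 1, 2, ..., n_blocks and i = 0, ..., k-1 within block k."""
    for k in range(1, n_blocks + 1):
        for i in range(k):
            yield k, i

def value(k: int, i: int, omega: Fraction) -> int:
    """Value on the tape for omega of the term with block k, position i."""
    return 1 if Fraction(i, k) <= omega < Fraction(i + 1, k) else 0

terms = list(typewriter(50))
last_k = terms[-1][0]
print("P(X_n = 1) for the last term:", Fraction(1, last_k))   # tends to 0

# On every tape the value 1 recurs: exactly once in each block.
for omega in (Fraction(0), Fraction(1, 3), Fraction(7, 10)):
    ones = sum(value(k, i, omega) for k, i in terms)
    print(omega, "has", ones, "ones among", len(terms), "terms")
```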

The notion of mean convergence illustrated by the reduction of $\operatorname{Var}[A_n]$ with increasing $n$ may be expressed more generally and more precisely as follows. A sequence $\{X_n : 1 \le n\}$ converges in the mean of order $p$ to $X$ iff

$$E\left[|X - X_n|^p\right] \to 0 \quad \text{as } n \to \infty, \quad \text{designated } X_n \xrightarrow{L^p} X \tag{15}$$

If the order $p$ is one, we simply say the sequence converges in the mean. For $p = 2$, we speak of mean-square convergence.

The introduction of a new type of convergence raises a number of questions.

1. There is the question of fundamental (or Cauchy) sequences and convergent sequences.
2. Do the various types of limits have the usual properties of limits? Is the limit of a linear combination of sequences the linear combination of the limits? Is the limit of products the product of the limits?
3. What conditions imply the various kinds of convergence?
4. What is the relation between the various kinds of convergence?

Before sketching briefly some of the relationships between convergence types, we consider one important condition known as uniform integrability. According to the property (E9b) for integrals

$$X \text{ is integrable iff } E\left[I_{\{|X| > a\}}\,|X|\right] \to 0 \quad \text{as } a \to \infty \tag{16}$$

Roughly speaking, to be integrable a random variable cannot be too large on too large a set. We use this characterization of the integrability of a single random variable to define the notion of the uniform integrability of a class.

Definition. An arbitrary class $\{X_t : t \in T\}$ is uniformly integrable (abbreviated u.i.) with respect to probability measure $P$ iff

$$\sup_{t \in T} E\left[I_{\{|X_t| > a\}}\,|X_t|\right] \to 0 \quad \text{as } a \to \infty \tag{17}$$
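A standard example (not from the text) of a class that is not uniformly integrable: on $\Omega = (0, 1)$ with $P$ equal to length, let $X_n = n\,I_{(0, 1/n)}$. Then $E[I_{\{|X_n| > a\}}|X_n|] = n \cdot (1/n) = 1$ whenever $n > a$, so the supremum in (17) equals 1 for every $a$. Yet $P(X_n \ne 0) = 1/n \to 0$, so $X_n \to 0$ in probability, while $E[|X_n|] = 1$ for all $n$: there is no convergence in the mean. The Python sketch below evaluates the tail expectations exactly over a finite range of $n$; the helper name is hypothetical.

```python
from fractions import Fraction

def tail_expectation(n: int, a: Fraction) -> Fraction:
    """E[I{|X_n| > a} |X_n|] for X_n = n * I_(0, 1/n), which equals n with
    probability 1/n and 0 otherwise (Omega = (0, 1), P = length)."""
    return n * Fraction(1, n) if n > a else Fraction(0)

for a in (Fraction(10), Fraction(1000)):
    # Only n up to 5000 is examined; every n > a contributes exactly 1.
    sup_tail = max(tail_expectation(n, a) for n in range(1, 5001))
    print("a =", a, " sup of tail expectations:", sup_tail)
```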

This condition plays a key role in many aspects of theoretical probability.

The relationships between types of convergence are important. Sometimes only one kind can be established. Also, it may be easier to establish one type which implies another of more immediate interest. We simply state informally some of the important relationships. A somewhat more detailed summary is given in PA, Chapter 17. For a complete treatment, however, it is necessary to consult more advanced texts on probability and measure.

Relationships between types of convergence for probability measures

Consider a sequence $\{X_n : 1 \le n\}$ of random variables.

1. It converges almost surely iff it converges almost uniformly.
2. If it converges almost surely, then it converges in probability.
3. It converges in mean, order $p$, iff the class $\{|X_n|^p\}$ is uniformly integrable and the sequence converges in probability.
4. If it converges in probability, then it converges in distribution (i.e., weakly).

Various chains of implication can be traced. For example:

• Almost sure convergence implies convergence in probability, which implies convergence in distribution.
• Almost sure convergence together with uniform integrability of $\{|X_n|^p\}$ implies convergence in mean of order $p$.

We do not develop the underlying theory. While much of it could be treated with elementary ideas, a complete treatment requires considerable development of the underlying measure theory. However, it is important to be aware of these various types of convergence, since they are frequently utilized in advanced treatments of applied probability and of statistics.
