Concentration of measure for sub-Gaussian random variables

Module by: Mark A. Davenport

Summary: This module establishes concentration bounds for sub-Gaussian vectors and matrices.

Sub-Gaussian distributions have a close relationship to the concentration of measure phenomenon [3]. To illustrate this, we note that we can combine Lemma 2 and Theorem 1 from "Sub-Gaussian random variables" to obtain deviation bounds for weighted sums of sub-Gaussian random variables. For our purposes, however, it will be more enlightening to study the norm of a vector of sub-Gaussian random variables. In particular, if $X$ is a vector where each $X_i$ is i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$, then we would like to know how $\|X\|_2$ deviates from its mean.
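To make this concrete, the following Python sketch (not part of the original module; the uniform distribution and all parameter values are illustrative choices) draws vectors with i.i.d. entries uniform on $[-1,1]$, a strictly sub-Gaussian distribution with $\sigma^2 = 1/3$, and shows that $\|X\|_2^2$ clusters tightly around $M\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
M, trials = 1000, 10000
sigma2 = 1.0 / 3.0                          # variance of Uniform[-1, 1]

X = rng.uniform(-1.0, 1.0, size=(trials, M))
sq_norms = np.sum(X**2, axis=1)             # ||X||_2^2 for each trial

print("mean of ||X||_2^2:", sq_norms.mean())    # close to M * sigma^2
print("M * sigma^2      :", M * sigma2)
eps = 0.05                                  # 5% relative deviation
print("P(|dev| > 5%)    :",
      np.mean(np.abs(sq_norms - M * sigma2) > eps * M * sigma2))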

In order to establish the result, we will make use of Markov's inequality for nonnegative random variables.

Lemma 1 (Markov's Inequality)

For any nonnegative random variable $X$ and $t > 0$,

$$P(X \ge t) \le \frac{E(X)}{t}. \qquad (1)$$

Proof

Let $f(x)$ denote the probability density function for $X$.

$$E(X) = \int_0^\infty x f(x)\,dx \ge \int_t^\infty x f(x)\,dx \ge t \int_t^\infty f(x)\,dx = t\,P(X \ge t). \qquad (2)$$
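As a quick numerical sanity check (an illustrative sketch; the exponential distribution is an arbitrary choice of nonnegative random variable), one can compare the empirical tail $P(X \ge t)$ against the Markov bound $E(X)/t$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=100_000)   # nonnegative, E(X) = 2

for t in [1.0, 2.0, 5.0, 10.0]:
    empirical = np.mean(X >= t)                # estimate of P(X >= t)
    bound = X.mean() / t                       # Markov bound E(X) / t
    print(f"t={t:5.1f}  P(X>=t)={empirical:.4f}  E(X)/t={bound:.4f}")
```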

In addition, we will require the following bound on the exponential moment of a sub-Gaussian random variable.

Lemma 2

Suppose $X \sim \mathrm{Sub}(c^2)$. Then

$$E\exp\left(\lambda X^2/2c^2\right) \le \frac{1}{\sqrt{1-\lambda}}, \qquad (3)$$

for any $\lambda \in [0,1)$.

Proof

First, observe that if $\lambda = 0$, then the lemma holds trivially. Thus, suppose that $\lambda \in (0,1)$. Let $f(x)$ denote the probability density function for $X$. Since $X$ is sub-Gaussian, we have by definition that

$$\int_{-\infty}^{\infty} \exp(tx) f(x)\,dx \le \exp\left(c^2 t^2/2\right) \qquad (4)$$

for any $t \in \mathbb{R}$. If we multiply by $\exp(-c^2 t^2/2\lambda)$, then we obtain

$$\int_{-\infty}^{\infty} \exp\left(tx - c^2 t^2/2\lambda\right) f(x)\,dx \le \exp\left(c^2 t^2 (\lambda - 1)/2\lambda\right). \qquad (5)$$

Now, integrating both sides with respect to $t$, we obtain

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} \exp\left(tx - c^2 t^2/2\lambda\right) dt \right) f(x)\,dx \le \int_{-\infty}^{\infty} \exp\left(c^2 t^2 (\lambda - 1)/2\lambda\right) dt, \qquad (6)$$

Evaluating the Gaussian integrals on both sides (using $\int_{-\infty}^{\infty} \exp(-at^2 + bt)\,dt = \sqrt{\pi/a}\,\exp(b^2/4a)$ for $a > 0$), this reduces to

$$\frac{1}{c}\sqrt{2\pi\lambda} \int_{-\infty}^{\infty} \exp\left(\lambda x^2/2c^2\right) f(x)\,dx \le \frac{1}{c}\sqrt{\frac{2\pi\lambda}{1-\lambda}}. \qquad (7)$$

Dividing both sides by $\frac{1}{c}\sqrt{2\pi\lambda}$ establishes the lemma.
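The Gaussian case is a useful check here: if $X \sim N(0,1)$ then $X \sim \mathrm{Sub}(1)$, and Lemma 2 holds with equality, since $E\exp(\lambda X^2/2) = 1/\sqrt{1-\lambda}$ exactly. The Monte Carlo sketch below is illustrative only; we keep $\lambda \le 0.4$ because the estimator becomes heavy-tailed as $\lambda$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal(2_000_000)          # X ~ N(0,1), i.e. Sub(1)

for lam in [0.1, 0.25, 0.4]:
    moment = np.mean(np.exp(lam * X**2 / 2.0))   # estimate of E[exp(lam X^2 / 2)]
    bound = 1.0 / np.sqrt(1.0 - lam)             # right-hand side of Lemma 2
    print(f"lambda={lam:.2f}  estimate={moment:.4f}  bound={bound:.4f}")
```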

We now state our main theorem, which generalizes the results of [2] and uses substantially the same proof technique.

Theorem 1

Suppose that $X = [X_1, X_2, \ldots, X_M]$, where each $X_i$ is i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$ and $E(X_i^2) = \sigma^2$. Then

$$E\|X\|_2^2 = M\sigma^2. \qquad (8)$$

Moreover, for any $\alpha \in (0,1)$ and for any $\beta \in [c^2/\sigma^2, \beta_{\max}]$, there exists a constant $\kappa^* \ge 4$ depending only on $\beta_{\max}$ and the ratio $\sigma^2/c^2$ such that

$$P\left(\|X\|_2^2 \le \alpha M \sigma^2\right) \le \exp\left(-M(1-\alpha)^2/\kappa^*\right) \qquad (9)$$

and

$$P\left(\|X\|_2^2 \ge \beta M \sigma^2\right) \le \exp\left(-M(\beta-1)^2/\kappa^*\right). \qquad (10)$$

Proof

Since the $X_i$ are independent, we obtain

$$E\|X\|_2^2 = \sum_{i=1}^{M} E(X_i^2) = \sum_{i=1}^{M} \sigma^2 = M\sigma^2 \qquad (11)$$

and hence Equation 8 holds. We now turn to Equation 9 and Equation 10. Let us first consider Equation 10. We begin by applying Markov's inequality:

$$P\left(\|X\|_2^2 \ge \beta M \sigma^2\right) = P\left(\exp(\lambda\|X\|_2^2) \ge \exp(\lambda\beta M\sigma^2)\right) \le \frac{E\exp(\lambda\|X\|_2^2)}{\exp(\lambda\beta M\sigma^2)} = \frac{\prod_{i=1}^{M} E\exp(\lambda X_i^2)}{\exp(\lambda\beta M\sigma^2)}. \qquad (12)$$

Since $X_i \sim \mathrm{Sub}(c^2)$, we have from Lemma 2 that, for any $\lambda \in [0, 1/2c^2)$,

$$E\exp(\lambda X_i^2) = E\exp\left(2c^2\lambda X_i^2/2c^2\right) \le \frac{1}{\sqrt{1 - 2c^2\lambda}}. \qquad (13)$$

Thus,

$$\prod_{i=1}^{M} E\exp\left(\lambda X_i^2\right) \le \left(\frac{1}{1 - 2c^2\lambda}\right)^{M/2} \qquad (14)$$

and hence

$$P\left(\|X\|_2^2 \ge \beta M \sigma^2\right) \le \left(\frac{\exp(-2\lambda\beta\sigma^2)}{1 - 2c^2\lambda}\right)^{M/2}. \qquad (15)$$

By setting the derivative to zero and solving for $\lambda$, one can show that the optimal $\lambda$ is

$$\lambda = \frac{\beta\sigma^2 - c^2}{2c^2\beta\sigma^2}. \qquad (16)$$

Plugging this in, we obtain

$$P\left(\|X\|_2^2 \ge \beta M \sigma^2\right) \le \left(\frac{\beta\sigma^2}{c^2}\exp\left(1 - \frac{\beta\sigma^2}{c^2}\right)\right)^{M/2}. \qquad (17)$$

Similarly,

$$P\left(\|X\|_2^2 \le \alpha M \sigma^2\right) \le \left(\frac{\alpha\sigma^2}{c^2}\exp\left(1 - \frac{\alpha\sigma^2}{c^2}\right)\right)^{M/2}. \qquad (18)$$
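One can verify the optimization step in Equations 15-17 numerically. The sketch below is illustrative; the values of $\sigma^2$, $c^2$, and $\beta$ are arbitrary choices satisfying $\beta \ge c^2/\sigma^2$. It minimizes the bound in Equation 15 over a grid, recovers the closed-form $\lambda$ of Equation 16, and confirms that plugging it in reproduces Equation 17:

```python
import numpy as np

sigma2, c2, beta, M = 1.0, 1.5, 2.0, 100       # arbitrary, beta >= c2/sigma2

def bound15(lam):
    """Right-hand side of Equation 15 as a function of lambda."""
    return (np.exp(-2.0 * lam * beta * sigma2) / (1.0 - 2.0 * c2 * lam)) ** (M / 2)

# Closed-form minimizer (Equation 16) versus a grid search over [0, 1/2c^2):
lam_star = (beta * sigma2 - c2) / (2.0 * c2 * beta * sigma2)
grid = np.linspace(1e-6, 1.0 / (2.0 * c2) - 1e-6, 200_000)
lam_grid = grid[np.argmin(bound15(grid))]

print("closed-form lambda:", lam_star, " grid minimizer:", lam_grid)
print("bound at lambda*  :", bound15(lam_star))
print("Equation 17 value :",
      (beta * sigma2 / c2 * np.exp(1.0 - beta * sigma2 / c2)) ** (M / 2))
```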

In order to combine and simplify these inequalities, note that if we define

$$\kappa^* = \max\left(4, \; \frac{2\left(\beta_{\max}\sigma^2/c^2 - 1\right)^2}{\left(\beta_{\max}\sigma^2/c^2 - 1\right) - \log\left(\beta_{\max}\sigma^2/c^2\right)}\right) \qquad (19)$$

then for any $\gamma \in [0, \beta_{\max}\sigma^2/c^2]$ we have the bound

$$\log(\gamma) \le (\gamma - 1) - \frac{2(\gamma - 1)^2}{\kappa^*}, \qquad (20)$$

and hence

$$\gamma \le \exp\left((\gamma - 1) - \frac{2(\gamma - 1)^2}{\kappa^*}\right). \qquad (21)$$

By setting $\gamma = \alpha\sigma^2/c^2$, Equation 18 reduces to yield Equation 9. Similarly, setting $\gamma = \beta\sigma^2/c^2$ establishes Equation 10.
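The role of $\kappa^*$ can also be checked numerically. The following sketch (illustrative; parameter values are arbitrary) computes $\kappa^*$ from Equation 19 and verifies that the logarithm bound of Equation 20 holds over the whole range of $\gamma$:

```python
import numpy as np

sigma2, c2, beta_max = 1.0, 1.5, 4.0            # arbitrary illustrative values
g_max = beta_max * sigma2 / c2                  # upper end of the gamma range

# Equation 19:
kappa = max(4.0, 2.0 * (g_max - 1.0) ** 2 / ((g_max - 1.0) - np.log(g_max)))

gamma = np.linspace(1e-6, g_max, 100_000)
# Slack in Equation 20; it should remain nonnegative throughout the range:
slack = (gamma - 1.0) - 2.0 * (gamma - 1.0) ** 2 / kappa - np.log(gamma)
print("kappa* =", kappa, "  min slack =", slack.min())
```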

This result tells us that given a vector with entries drawn according to a sub-Gaussian distribution, we can expect the squared norm of the vector to concentrate around its expected value of $M\sigma^2$ with exponentially high probability as $M$ grows. Note, however, that the range of allowable choices for $\beta$ in Equation 10 is limited to $\beta \ge c^2/\sigma^2 \ge 1$. Thus, for a general sub-Gaussian distribution, we may be unable to achieve an arbitrarily tight concentration. However, recall that for strictly sub-Gaussian distributions we have that $c^2 = \sigma^2$, in which case there is no such restriction. Moreover, for strictly sub-Gaussian distributions we also have the following useful corollary.¹

Corollary 1

Suppose that $X = [X_1, X_2, \ldots, X_M]$, where each $X_i$ is i.i.d. with $X_i \sim \mathrm{SSub}(\sigma^2)$. Then

$$E\|X\|_2^2 = M\sigma^2 \qquad (22)$$

and for any $\epsilon > 0$,

$$P\left(\left|\|X\|_2^2 - M\sigma^2\right| \ge \epsilon M\sigma^2\right) \le 2\exp\left(-\frac{M\epsilon^2}{\kappa^*}\right) \qquad (23)$$

with $\kappa^* = 2/(1 - \log(2)) \approx 6.52$.

Proof

Since each $X_i \sim \mathrm{SSub}(\sigma^2)$, we have that $X_i \sim \mathrm{Sub}(\sigma^2)$ and $E(X_i^2) = \sigma^2$, in which case we may apply Theorem 1 with $\alpha = 1 - \epsilon$ and $\beta = 1 + \epsilon$. This allows us to simplify and combine the bounds in Equation 9 and Equation 10 to obtain Equation 23. The value of $\kappa^*$ follows from the observation that $1 + \epsilon \le 2$ so that we can set $\beta_{\max} = 2$.
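As an illustration of Corollary 1 (a minimal sketch, not part of the original module; uniform entries are one convenient strictly sub-Gaussian choice), one can compare the empirical two-sided tail against the bound in Equation 23:

```python
import numpy as np

rng = np.random.default_rng(3)
M, trials = 500, 20_000
sigma2 = 1.0 / 3.0                              # variance of Uniform[-1, 1]
kappa = 2.0 / (1.0 - np.log(2.0))               # kappa* ~ 6.52

X = rng.uniform(-1.0, 1.0, size=(trials, M))
sq_norms = np.sum(X**2, axis=1)

for eps in [0.05, 0.1, 0.2]:
    empirical = np.mean(np.abs(sq_norms - M * sigma2) >= eps * M * sigma2)
    bound = 2.0 * np.exp(-M * eps**2 / kappa)   # Equation 23 right-hand side
    print(f"eps={eps:.2f}  empirical={empirical:.5f}  bound={bound:.5f}")
```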

Finally, from Corollary 1 we also have the following additional useful corollary. This result generalizes the main results of [1] to the broader family of general strictly sub-Gaussian distributions via a much simpler proof.

Corollary 2

Suppose that $\Phi$ is an $M \times N$ matrix whose entries $\phi_{ij}$ are i.i.d. with $\phi_{ij} \sim \mathrm{SSub}(1/M)$. Let $Y = \Phi x$ for $x \in \mathbb{R}^N$. Then for any $\epsilon > 0$ and any $x \in \mathbb{R}^N$,

$$E\|Y\|_2^2 = \|x\|_2^2 \qquad (24)$$

and

$$P\left(\left|\|Y\|_2^2 - \|x\|_2^2\right| \ge \epsilon\|x\|_2^2\right) \le 2\exp\left(-\frac{M\epsilon^2}{\kappa^*}\right) \qquad (25)$$

with $\kappa^* = 2/(1 - \log(2)) \approx 6.52$.

Proof

Let $\phi_i$ denote the $i$th row of $\Phi$. Observe that if $Y_i$ denotes the $i$th element of $Y$, then $Y_i = \langle \phi_i, x \rangle$, and thus by Lemma 2 from "Sub-Gaussian random variables", $Y_i \sim \mathrm{SSub}\left(\|x\|_2^2/M\right)$. Applying Corollary 1 to the $M$-dimensional random vector $Y$, we obtain Equation 25.
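Corollary 2 is essentially a Johnson-Lindenstrauss-style embedding result. The sketch below (illustrative; scaled Rademacher entries are one convenient choice satisfying $\phi_{ij} \sim \mathrm{SSub}(1/M)$) checks that $\|\Phi x\|_2^2$ concentrates around $\|x\|_2^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, trials = 256, 1000, 500

x = rng.standard_normal(N)                      # an arbitrary fixed vector
ratios = np.empty(trials)
for k in range(trials):
    # +-1/sqrt(M) entries: scaled Rademacher, strictly sub-Gaussian, var 1/M
    Phi = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
    ratios[k] = np.sum((Phi @ x) ** 2) / np.sum(x**2)

eps = 0.2
kappa = 2.0 / (1.0 - np.log(2.0))
print("mean of ||Phi x||^2 / ||x||^2:", ratios.mean())      # close to 1
print("empirical tail P(|ratio-1| >= eps):", np.mean(np.abs(ratios - 1.0) >= eps))
print("bound 2*exp(-M*eps^2/kappa*)      :", 2.0 * np.exp(-M * eps**2 / kappa))
```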

Footnotes

  1. Corollary 1 exploits the strictness in the strictly sub-Gaussian distribution twice: first to ensure that $\beta \in (1, 2]$ is an admissible range for $\beta$, and then to simplify the computation of $\kappa^*$. One could easily establish a different version of this corollary for non-strictly sub-Gaussian vectors, restricted to a narrower range of $\epsilon$, provided that $c^2/\sigma^2 < 2$. However, since most of the distributions of interest in this thesis are indeed strictly sub-Gaussian, we do not pursue this route. Note also that if one is interested in very small $\epsilon$, then there is considerable room for improvement in the constant $\kappa^*$.

References

  1. Achlioptas, D. (2001, May). Database-friendly random projections. In Proc. Symp. Principles of Database Systems (PODS). Santa Barbara, CA.
  2. Dasgupta, S. and Gupta, A. (1999, Mar.). An elementary proof of the Johnson-Lindenstrauss Lemma. (TR-99-006). Technical report. Univ. of Cal. Berkeley, Comput. Science Division.
  3. Ledoux, M. (2001). The Concentration of Measure Phenomenon. Providence, RI: American Mathematical Society.
