Sub-Gaussian distributions have a close relationship to the concentration of measure phenomenon [3]. To illustrate this, we note that we can combine Lemma 2 and Theorem 1 from "Sub-Gaussian random variables" to obtain deviation bounds for weighted sums of sub-Gaussian random variables. For our purposes, however, it will be more enlightening to study the norm of a vector of sub-Gaussian random variables. In particular, if XX is a vector where each XiXi is i.i.d. with Xi∼ Sub (c)Xi∼ Sub (c), then we would like to know how ∥X∥2∥X∥2 deviates from its mean.
In order to establish the result, we will make use of Markov's inequality for nonnegative random variables.
For any nonnegative random variable XX and t>0t>0,
P
(
X
≥
t
)
≤
E
(
X
)
t
.
P
(
X
≥
t
)
≤
E
(
X
)
t
.
(1)Let f(x)f(x) denote the probability density function for XX.
E
(
X
)
=
∫
0
∞
x
f
(
x
)
d
x
≥
∫
t
∞
x
f
(
x
)
d
x
≥
∫
t
∞
t
f
(
x
)
d
x
=
t
P
(
X
≥
t
)
.
E
(
X
)
=
∫
0
∞
x
f
(
x
)
d
x
≥
∫
t
∞
x
f
(
x
)
d
x
≥
∫
t
∞
t
f
(
x
)
d
x
=
t
P
(
X
≥
t
)
.
(2)
In addition, we will require the following bound on the exponential moment of a sub-Gaussian random variable.
Suppose X∼ Sub (c2)X∼ Sub (c2). Then
E
exp
(
λ
X
2
/
2
c
2
)
≤
1
1
-
λ
,
E
exp
(
λ
X
2
/
2
c
2
)
≤
1
1
-
λ
,
(3)for any λ∈[0,1)λ∈[0,1).
First, observe that if λ=0λ=0, then the lemma holds trivially. Thus, suppose that λ∈(0,1)λ∈(0,1). Let f(x)f(x) denote the probability density function for XX. Since XX is sub-Gaussian, we have by definition that
∫
-
∞
∞
exp
(
t
x
)
f
(
x
)
d
x
≤
exp
(
c
2
t
2
/
2
)
∫
-
∞
∞
exp
(
t
x
)
f
(
x
)
d
x
≤
exp
(
c
2
t
2
/
2
)
(4)for any t∈Rt∈R. If we multiply by exp(-c2t2/2λ)exp(-c2t2/2λ), then we obtain
∫
-
∞
∞
exp
(
t
x
-
c
2
t
2
/
2
λ
)
f
(
x
)
d
x
≤
exp
(
c
2
t
2
(
λ
-
1
)
/
2
λ
)
.
∫
-
∞
∞
exp
(
t
x
-
c
2
t
2
/
2
λ
)
f
(
x
)
d
x
≤
exp
(
c
2
t
2
(
λ
-
1
)
/
2
λ
)
.
(5)Now, integrating both sides with respect to tt, we obtain
∫
-
∞
∞
∫
-
∞
∞
exp
(
t
x
-
c
2
t
2
/
2
λ
)
d
t
f
(
x
)
d
x
≤
∫
-
∞
∞
exp
(
c
2
t
2
(
λ
-
1
)
/
2
λ
)
d
t
,
∫
-
∞
∞
∫
-
∞
∞
exp
(
t
x
-
c
2
t
2
/
2
λ
)
d
t
f
(
x
)
d
x
≤
∫
-
∞
∞
exp
(
c
2
t
2
(
λ
-
1
)
/
2
λ
)
d
t
,
(6)which reduces to
1
c
2
π
λ
∫
-
∞
∞
exp
(
λ
x
2
/
2
c
2
)
f
(
x
)
d
x
≤
1
c
2
π
λ
1
-
λ
.
1
c
2
π
λ
∫
-
∞
∞
exp
(
λ
x
2
/
2
c
2
)
f
(
x
)
d
x
≤
1
c
2
π
λ
1
-
λ
.
(7)This simplifies to prove the lemma.
We now state our main theorem, which generalizes the results of [2] and uses substantially the same proof technique.
Suppose that X=[X1,X2,...,XM]X=[X1,X2,...,XM], where each XiXi is i.i.d. with Xi∼ Sub (c2)Xi∼ Sub (c2) and E(Xi2)=σ2E(Xi2)=σ2. Then
E
∥
X
∥
2
2
=
M
σ
2
.
E
∥
X
∥
2
2
=
M
σ
2
.
(8)Moreover, for any α∈(0,1)α∈(0,1) and for any β∈[c2/σ2,β max ]β∈[c2/σ2,β max ], there exists a constant κ*≥4κ*≥4 depending only on β max β max and the ratio σ2/c2σ2/c2 such that
P
∥
X
∥
2
2
≤
α
M
σ
2
≤
exp
-
M
(
1
-
α
)
2
/
κ
*
P
∥
X
∥
2
2
≤
α
M
σ
2
≤
exp
-
M
(
1
-
α
)
2
/
κ
*
(9)and
P
∥
X
∥
2
2
≥
β
M
σ
2
≤
exp
-
M
(
β
-
1
)
2
/
κ
*
.
P
∥
X
∥
2
2
≥
β
M
σ
2
≤
exp
-
M
(
β
-
1
)
2
/
κ
*
.
(10)Since the XiXi are independent, we obtain
E
∥
X
∥
2
2
=
∑
i
=
1
M
E
X
i
2
=
∑
i
=
1
M
σ
2
=
M
σ
2
E
∥
X
∥
2
2
=
∑
i
=
1
M
E
X
i
2
=
∑
i
=
1
M
σ
2
=
M
σ
2
(11)and hence Equation 8 holds. We now turn to Equation 9 and Equation 10. Let us first consider Equation 10. We begin by applying Markov's inequality:
P
∥
X
∥
2
2
≥
β
M
σ
2
=
P
exp
(
λ
∥
X
∥
2
2
)
≥
exp
λ
β
M
σ
2
≤
E
exp
(
λ
∥
X
∥
2
2
)
exp
λ
β
M
σ
2
=
∏
i
=
1
M
E
exp
(
λ
X
i
2
)
exp
λ
β
M
σ
2
.
P
∥
X
∥
2
2
≥
β
M
σ
2
=
P
exp
(
λ
∥
X
∥
2
2
)
≥
exp
λ
β
M
σ
2
≤
E
exp
(
λ
∥
X
∥
2
2
)
exp
λ
β
M
σ
2
=
∏
i
=
1
M
E
exp
(
λ
X
i
2
)
exp
λ
β
M
σ
2
.
(12)Since Xi∼ Sub (c2)Xi∼ Sub (c2), we have from Lemma 2 that
E
exp
(
λ
X
i
2
)
=
E
exp
(
2
c
2
λ
X
i
2
/
2
c
2
)
≤
1
1
-
2
c
2
λ
.
E
exp
(
λ
X
i
2
)
=
E
exp
(
2
c
2
λ
X
i
2
/
2
c
2
)
≤
1
1
-
2
c
2
λ
.
(13)Thus,
∏
i
=
1
M
E
exp
λ
X
i
2
≤
1
1
-
2
c
2
λ
M
/
2
∏
i
=
1
M
E
exp
λ
X
i
2
≤
1
1
-
2
c
2
λ
M
/
2
(14)and hence
P
∥
X
∥
2
2
≥
β
M
σ
2
≤
exp
-
2
λ
β
σ
2
1
-
2
c
2
λ
M
/
2
.
P
∥
X
∥
2
2
≥
β
M
σ
2
≤
exp
-
2
λ
β
σ
2
1
-
2
c
2
λ
M
/
2
.
(15)By setting the derivative to zero and solving for λλ, one can show that the optimal λλ is
λ
=
β
σ
2
-
c
2
2
c
2
σ
2
(
1
+
β
)
.
λ
=
β
σ
2
-
c
2
2
c
2
σ
2
(
1
+
β
)
.
(16)Plugging this in we obtain
P
∥
X
∥
2
2
≥
β
M
σ
2
≤
β
σ
2
c
2
exp
1
-
β
σ
2
c
2
M
/
2
.
P
∥
X
∥
2
2
≥
β
M
σ
2
≤
β
σ
2
c
2
exp
1
-
β
σ
2
c
2
M
/
2
.
(17)Similarly,
P
∥
X
∥
2
2
≤
α
M
σ
2
≤
α
σ
2
c
2
exp
1
-
α
σ
2
c
2
M
/
2
.
P
∥
X
∥
2
2
≤
α
M
σ
2
≤
α
σ
2
c
2
exp
1
-
α
σ
2
c
2
M
/
2
.
(18)In order to combine and simplify these inequalities, note that if we define
κ
*
=
max
4
,
2
(
β
max
σ
2
/
c
-
1
)
2
(
β
max
σ
2
/
c
-
1
)
-
log
(
β
max
σ
2
/
c
)
κ
*
=
max
4
,
2
(
β
max
σ
2
/
c
-
1
)
2
(
β
max
σ
2
/
c
-
1
)
-
log
(
β
max
σ
2
/
c
)
(19)then we have that for any γ∈[0,β max σ2/c]γ∈[0,β max σ2/c] we have the bound
log
(
γ
)
≤
(
γ
-
1
)
-
2
(
γ
-
1
)
2
κ
*
,
log
(
γ
)
≤
(
γ
-
1
)
-
2
(
γ
-
1
)
2
κ
*
,
(20)and hence
γ
≤
exp
(
γ
-
1
)
-
2
(
γ
-
1
)
2
κ
*
.
γ
≤
exp
(
γ
-
1
)
-
2
(
γ
-
1
)
2
κ
*
.
(21)By setting γ=ασ2/c2γ=ασ2/c2, Equation 18 reduces to yield Equation 9. Similarly, setting γ=βσ2/c2γ=βσ2/c2 establishes Equation 10.
This result tells us that given a vector with entries drawn according to a sub-Gaussian distribution, we can expect the norm of the vector to concentrate around its expected value of Mσ2Mσ2 with exponentially high probability as MM grows. Note, however, that the range of allowable choices for ββ in Equation 10 is limited to β≥c2/σ2≥1β≥c2/σ2≥1. Thus, for a general sub-Gaussian distribution, we may be unable to achieve an arbitrarily tight concentration. However, recall that for strictly sub-Gaussian distributions we have that c2=σ2c2=σ2, in which there is no such restriction. Moreover, for strictly sub-Gaussian distributions we also have the following useful corollary.
Suppose that X=[X1,X2,...,XM]X=[X1,X2,...,XM], where each XiXi is i.i.d. with Xi∼ SSub (σ2)Xi∼ SSub (σ2). Then
E
∥
X
∥
2
2
=
M
σ
2
E
∥
X
∥
2
2
=
M
σ
2
(22)and for any ϵ>0ϵ>0,
P
∥
X
∥
2
2
-
M
σ
2
≥
ϵ
M
σ
2
≤
2
exp
-
M
ϵ
2
κ
*
P
∥
X
∥
2
2
-
M
σ
2
≥
ϵ
M
σ
2
≤
2
exp
-
M
ϵ
2
κ
*
(23)with κ*=2/(1-log(2))≈6.52κ*=2/(1-log(2))≈6.52.
Since each Xi∼ SSub (σ2)Xi∼ SSub (σ2), we have that Xi∼ Sub (σ2)Xi∼ Sub (σ2) and E(Xi2)=σ2E(Xi2)=σ2, in which case we may apply Theorem 1 with α=1-ϵα=1-ϵ and β=1+ϵβ=1+ϵ. This allows us to simplify and combine the bounds in Equation 9 and Equation 10 to obtain Equation 23. The value of κ*κ* follows from the observation that 1+ϵ≤21+ϵ≤2 so that we can set β max =2β max =2.
Finally, from Corollary 1 we also have the following additional useful corollary. This result generalizes the main results of [1] to the broader family of general strictly sub-Gaussian distributions via a much simpler proof.
Suppose that ΦΦ is an M×NM×N matrix whose entries φijφij are i.i.d. with φij∼ SSub (1/M)φij∼ SSub (1/M). Let Y=ΦxY=Φx for x∈RNx∈RN. Then for any ϵ>0ϵ>0, and any x∈RNx∈RN,
E
∥
Y
∥
2
2
=
∥
x
∥
2
2
E
∥
Y
∥
2
2
=
∥
x
∥
2
2
(24)and
P
∥
Y
∥
2
2
-
∥
x
∥
2
2
≥
ϵ
∥
x
∥
2
2
≤
2
exp
-
M
ϵ
2
κ
*
P
∥
Y
∥
2
2
-
∥
x
∥
2
2
≥
ϵ
∥
x
∥
2
2
≤
2
exp
-
M
ϵ
2
κ
*
(25)with κ*=2/(1-log(2))≈6.52κ*=2/(1-log(2))≈6.52.
Let φiφi denote the i th i th row of ΦΦ. Observe that if YiYi denotes the first element of YY, then Yi=〈φi,x〉Yi=〈φi,x〉, and thus by Lemma 2 from "Sub-Gaussian random variables", Yi∼ SSub ∥x∥22/MYi∼ SSub ∥x∥22/M. Applying Corollary 1 to the MM-dimensional random vector YY, we obtain Equation 25.
-
Achlioptas, D. (2001, May). Database-friendly random projections. In Proc. Symp. Principles of Database Systems (PODS). Santa Barbara, CA
-
Dasgupta, S. and Gupta, A. (1999, Mar.). An elementary proof of the Johnson-Lindenstrauss Lemma. (TR-99-006). Technical report. Univ. of Cal. Berkeley, Comput. Science Division.
-
Ledoux, M. (2001). The Concentration of Measure Phenomenon. Providence, RI: American Mathematical Society.