A number of distributions, notably Gaussian and Bernoulli, are known to satisfy certain concentration of measure inequalities. We will analyze this phenomenon from a more general perspective by considering the class of sub-Gaussian distributions [1].
- Definition 1:
A random variable XX is called sub-Gaussian if there exists a constant c>0c>0 such that
E
exp
(
X
t
)
≤
exp
(
c
2
t
2
/
2
)
E
exp
(
X
t
)
≤
exp
(
c
2
t
2
/
2
)
(1)
holds for all t∈Rt∈R. We use the notation X∼ Sub (c2)X∼ Sub (c2) to denote that XX satisfies Equation 1.
The function Eexp(Xt)Eexp(Xt) is the moment-generating function of XX, while the upper bound in Equation 1 is the moment-generating function of a Gaussian random variable. Thus, a sub-Gaussian distribution is one whose moment-generating function is bounded by that of a Gaussian. There are a tremendous number of sub-Gaussian distributions, but there are two particularly important examples:
If X∼N(0,σ2)X∼N(0,σ2), i.e., XX is a zero-mean Gaussian random variable with variance σ2σ2, then X∼ Sub (σ2)X∼ Sub (σ2). Indeed, as mentioned above, the moment-generating function of a Gaussian is given by Eexp(Xt)=exp(σ2t2/2)Eexp(Xt)=exp(σ2t2/2), and thus Equation 1 is trivially satisfied.
If XX is a zero-mean, bounded random variable, i.e., one for which there exists a constant BB such that |X|≤B|X|≤B with probability 1, then X∼ Sub (B2)X∼ Sub (B2).
A common way to characterize sub-Gaussian random variables is through analyzing their moments. We consider only the mean and variance in the following elementary lemma, proven in [1].
If X∼ Sub (c2)X∼ Sub (c2) then,
E
(
X
)
=
0
E
(
X
)
=
0
(2)and
E
(
X
2
)
≤
c
2
.
E
(
X
2
)
≤
c
2
.
(3)
Lemma 1 shows that if X∼ Sub (c2)X∼ Sub (c2) then E(X2)≤c2E(X2)≤c2. In some settings it will be useful to consider a more restrictive class of random variables for which this inequality becomes an equality.
- Definition 2:
A random variable XX is called strictly sub-Gaussian if X∼ Sub (σ2)X∼ Sub (σ2) where σ2=E(X2)σ2=E(X2), i.e., the inequality
E
exp
(
X
t
)
≤
exp
(
σ
2
t
2
/
2
)
E
exp
(
X
t
)
≤
exp
(
σ
2
t
2
/
2
)
(4)
holds for all t∈Rt∈R. To denote that XX is strictly sub-Gaussian with variance σ2σ2, we will use the notation X∼ SSub (σ2)X∼ SSub (σ2).
If X∼N(0,σ2)X∼N(0,σ2), then X∼ SSub (σ2)X∼ SSub (σ2).
If X∼U(-1,1)X∼U(-1,1), i.e., XX is uniformly distributed on the interval [-1,1][-1,1], then X∼ SSub (1/3)X∼ SSub (1/3).
Now consider the random variable with distribution such that
P
(
X
=
1
)
=
P
(
X
=
-
1
)
=
1
-
s
2
,
P
(
X
=
0
)
=
s
,
s
∈
[
0
,
1
)
.
P
(
X
=
1
)
=
P
(
X
=
-
1
)
=
1
-
s
2
,
P
(
X
=
0
)
=
s
,
s
∈
[
0
,
1
)
.
(5)For any s∈[0,2/3]s∈[0,2/3], X∼ SSub (1-s)X∼ SSub (1-s). For s∈(2/3,1)s∈(2/3,1), XX is not strictly sub-Gaussian.
We now provide an equivalent characterization for sub-Gaussian and strictly sub-Gaussian random variables, proven in [1], that illustrates their concentration of measure behavior.
A random variable X∼ Sub (c2)X∼ Sub (c2) if and only if there exists a t0≥0t0≥0 and a constant a≥0a≥0 such that
P
(
|
X
|
≥
t
)
≤
2
exp
-
t
2
2
a
2
P
(
|
X
|
≥
t
)
≤
2
exp
-
t
2
2
a
2
(6)for all t≥t0t≥t0. Moreover, if X∼ SSub (σ2)X∼ SSub (σ2), then Equation 6 holds for all t>0t>0 with a=σa=σ.
Finally, sub-Gaussian distributions also satisfy one of the fundamental properties of a Gaussian distribution: the sum of two sub-Gaussian random variables is itself a sub-Gaussian random variable. This result is established in more generality in the following lemma.
Suppose that X=[X1,X2,...,XN]X=[X1,X2,...,XN], where each XiXi is independent and identically distributed (i.i.d.) with Xi∼ Sub (c2)Xi∼ Sub (c2). Then for any α∈RNα∈RN, 〈X,α〉∼ Sub c2∥α∥22〈X,α〉∼ Sub c2∥α∥22. Similarly, if each Xi∼ SSub (σ2)Xi∼ SSub (σ2), then for any α∈RNα∈RN, 〈X,α〉∼ SSub σ2∥α∥22〈X,α〉∼ SSub σ2∥α∥22.
Since the XiXi are i.i.d., the joint distribution factors and simplifies as:
E
exp
t
∑
i
=
1
N
α
i
X
i
=
E
∏
i
=
1
N
exp
t
α
i
X
i
=
∏
i
=
1
N
E
exp
t
α
i
X
i
≤
∏
i
=
1
N
exp
c
2
(
α
i
t
)
2
/
2
=
exp
∑
i
=
1
N
α
i
2
c
2
t
2
/
2
.
E
exp
t
∑
i
=
1
N
α
i
X
i
=
E
∏
i
=
1
N
exp
t
α
i
X
i
=
∏
i
=
1
N
E
exp
t
α
i
X
i
≤
∏
i
=
1
N
exp
c
2
(
α
i
t
)
2
/
2
=
exp
∑
i
=
1
N
α
i
2
c
2
t
2
/
2
.
(7)
If the XiXi are strictly sub-Gaussian, then the result follows by setting c2=σ2c2=σ2 and observing that E〈X,α〉2=σ2∥α∥22E〈X,α〉2=σ2∥α∥22.
-
Buldygin, V. and Kozachenko, Y. (2000). Metric Characterization of Random Variables and Random Processes. Providence, RI: American Mathematical Society.