In this section, Chebyshev's inequality is used to show, in another sense, that the sample mean, $\bar{x}$, is a good statistic for estimating a population mean $\mu$; that the relative frequency of successes in $n$ Bernoulli trials, $y/n$, is a good statistic for estimating $p$; and that the empirical distribution function, $F_n(x)$, can be used to estimate the theoretical distribution function $F(x)$. The effect of the sample size $n$ on these estimates is discussed.
First, it is shown that Chebyshev's inequality gives added significance to the standard deviation in terms of bounding certain probabilities. The inequality is valid for all distributions for which the standard deviation exists. The proof is given for the discrete case, but it holds for the continuous case with integrals replacing summations.
If the random variable $X$ has a mean $\mu$ and variance $\sigma^2$, then for every $k \ge 1$,
$$P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}.$$
Let $f(x)$ denote the p.d.f. of $X$. Then
$$\sigma^2 = E[(X-\mu)^2] = \sum_{x \in R} (x-\mu)^2 f(x) = \sum_{x \in A} (x-\mu)^2 f(x) + \sum_{x \in A'} (x-\mu)^2 f(x),$$
where $A = \{x : |x-\mu| \ge k\sigma\}$ and $A'$ denotes the complement of $A$ in $R$. The second term on the right-hand side of the equation is a sum of nonnegative numbers and thus is greater than or equal to zero. Hence
$$\sigma^2 \ge \sum_{x \in A} (x-\mu)^2 f(x).$$
However, in $A$, $|x-\mu| \ge k\sigma$, so
$$\sigma^2 \ge \sum_{x \in A} (k\sigma)^2 f(x) = k^2\sigma^2 \sum_{x \in A} f(x).$$
But the latter summation equals $P(X \in A)$, and thus
$$\sigma^2 \ge k^2\sigma^2 P(X \in A) = k^2\sigma^2 P(|X-\mu| \ge k\sigma).$$
That is,
$$P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}.$$
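The steps of the proof can be checked numerically for a small discrete distribution. The sketch below is only an illustration under stated assumptions: the helper name `chebyshev_check` and the two example distributions are hypothetical and are not part of the original text. It computes $\mu$, $\sigma^2$, and the probability of the set $A$ exactly with rational arithmetic, then compares that probability with $1/k^2$.

```python
from fractions import Fraction

def chebyshev_check(pmf, k):
    """Return (P(|X - mu| >= k*sigma), 1/k^2) for a discrete distribution.

    pmf is a dict mapping each support point x to its probability f(x).
    chebyshev_check is a hypothetical helper written for this illustration.
    """
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    # A = {x : |x - mu| >= k*sigma}; comparing squares avoids a square root.
    tail = sum(p for x, p in pmf.items() if (x - mu) ** 2 >= k ** 2 * var)
    bound = Fraction(1) / Fraction(k) ** 2
    return tail, bound

# Assumed example 1: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Assumed example 2: a three-point distribution for which the bound is attained at k = 2.
three_point = {-1: Fraction(1, 8), 0: Fraction(3, 4), 1: Fraction(1, 8)}

for name, pmf, k in [("die", die, 1), ("three_point", three_point, 2)]:
    tail, bound = chebyshev_check(pmf, k)
    print(name, "tail =", tail, "bound =", bound, "holds:", tail <= bound)
```

For the three-point distribution, $\mu = 0$ and $\sigma^2 = 1/4$, so with $k = 2$ the set $A$ is $\{-1, 1\}$ and the tail probability equals the bound $1/4$. This shows that the inequality cannot be improved without further assumptions on the distribution.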
COROLLARY
If $\varepsilon = k\sigma$, then
$$P(|X-\mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}.$$
In words, Chebyshev's inequality states that the probability that $X$ differs from its mean by at least $k$ standard deviations is less than or equal to $1/k^2$. It follows that the probability that $X$ differs from its mean by less than $k$ standard deviations is at least $1 - 1/k^2$. That is,
$$P(|X-\mu| < k\sigma) \ge 1 - \frac{1}{k^2}.$$
From the corollary, it also follows that
$$P(|X-\mu| < \varepsilon) \ge 1 - \frac{\sigma^2}{\varepsilon^2}.$$
Thus Chebyshev’s inequality can be used as a bound for certain probabilities. However, in many instances, the bound is not very close to the true probability.
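To see how loose the bound can be, it may help to compare it with an exact tail probability for one specific distribution. The sketch below assumes, purely for illustration, that $X$ is normally distributed; the text itself makes no such assumption. It uses the identity $P(|Z| \ge k) = \operatorname{erfc}(k/\sqrt{2})$ for a standard normal $Z$. The function names are hypothetical.

```python
import math

def chebyshev_upper(k):
    """Chebyshev upper bound on P(|X - mu| >= k*sigma)."""
    return 1.0 / k ** 2

def normal_two_sided_tail(k):
    """Exact P(|X - mu| >= k*sigma) when X is normal (an assumed example)."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 2, 3, 4):
    exact = normal_two_sided_tail(k)
    bound = chebyshev_upper(k)
    print(f"k={k}: exact={exact:.4f}  Chebyshev bound={bound:.4f}")
```

Under the normality assumption the exact probability for $k = 2$ is about 0.0455, while the Chebyshev bound is 0.25. The bound holds, but it is several times larger than the true value.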
If it is known that $X$ has a mean of 25 and a variance of 16, then $\sigma = 4$, and a lower bound for $P(17 < X < 33)$ is given by
$$P(17 < X < 33) = P(|X-25| < 8) = P(|X-\mu| < 2\sigma) \ge 1 - \frac{1}{4} = 0.75,$$
and an upper bound for $P(|X-25| \ge 12)$ is found to be
$$P(|X-25| \ge 12) = P(|X-\mu| \ge 3\sigma) \le \frac{1}{9}.$$
Note that the results of the last example hold for any distribution with mean 25 and standard deviation 4. More generally, letting $k = 3$ in the theorem shows that the probability that any random variable $X$ with finite variance differs from its mean by 3 or more standard deviations is at most 1/9. Likewise, letting $k = 2$ shows that the probability that $X$ differs from its mean by less than 2 standard deviations is at least 3/4.
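The arithmetic in the example with mean 25 and variance 16 can be reproduced from the corollary form of the inequality. The following is a minimal sketch; the function names `prob_within_lower_bound` and `prob_outside_upper_bound` are hypothetical, and the bounds are clipped to the interval $[0, 1]$ so that they remain meaningful when $\varepsilon < \sigma$.

```python
from fractions import Fraction

def prob_within_lower_bound(var, eps):
    """Lower bound on P(|X - mu| < eps): 1 - var/eps^2, never below 0."""
    return max(Fraction(0), 1 - Fraction(var) / Fraction(eps) ** 2)

def prob_outside_upper_bound(var, eps):
    """Upper bound on P(|X - mu| >= eps): var/eps^2, never above 1."""
    return min(Fraction(1), Fraction(var) / Fraction(eps) ** 2)

# Values from the example: variance 16, so sigma = 4.
print(prob_within_lower_bound(16, 8))    # bound for P(17 < X < 33), eps = 2*sigma
print(prob_outside_upper_bound(16, 12))  # bound for P(|X - 25| >= 12), eps = 3*sigma
```

The two printed values are 3/4 and 1/9, matching the bounds derived above. Note that the mean does not enter the computation: only the variance and the half-width $\varepsilon$ matter.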