Our goal is to estimate an arbitrary probability density function, $f_X(x)$, within a finite region of the $x$-axis. We will do this by partitioning the region into $L$ equally spaced subintervals, or “bins”, and forming an approximation for $f_X(x)$ within each bin. Let our region of support start at the value $x_0$ and end at $x_L$. Our $L$ subintervals of this region will be $[x_0,x_1]$, $(x_1,x_2]$, ..., $(x_{L-1},x_L]$. To simplify our notation we will define $\mathrm{bin}(k)$ to represent the interval $(x_{k-1},x_k]$, $k=1,2,\cdots,L$, and define the quantity $\Delta$ to be the length of each subinterval.
$$\mathrm{bin}(k) = (x_{k-1},\, x_k], \qquad k = 1, 2, \cdots, L \qquad\qquad \Delta = \frac{x_L - x_0}{L} \tag{23}$$
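As a concrete illustration of (23), the bin edges and bin width can be computed as follows (the support $[0, 1]$ and $L = 10$ are hypothetical values chosen for the sketch):

```python
import numpy as np

# Hypothetical example: region of support [x0, xL] = [0, 1] with L = 10 bins.
x0, xL, L = 0.0, 1.0, 10
edges = np.linspace(x0, xL, L + 1)   # the bin edges x_0, x_1, ..., x_L
delta = (xL - x0) / L                # the bin width Delta from (23)

# bin(k) is the interval (edges[k-1], edges[k]] for k = 1, ..., L
print(delta)        # 0.1
print(len(edges))   # 11 edges bound the L = 10 bins
```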
We will also define $\tilde{f}(k)$ to be the probability that $X$ falls into $\mathrm{bin}(k)$.
$$\tilde{f}(k) = P(X \in \mathrm{bin}(k)) = \int_{x_{k-1}}^{x_k} f_X(x)\, dx \approx f_X(x)\,\Delta \quad \text{for } x \in \mathrm{bin}(k) \tag{24}$$
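The quality of the approximation in (24) can be checked numerically for any density with a known CDF. A minimal sketch, using the (assumed, purely illustrative) density $f_X(x) = e^{-x}$ for $x \ge 0$, where the exact bin probability has a closed form:

```python
import math

# Illustrative check of (24) with f_X(x) = exp(-x), x >= 0 (an assumption;
# any density with a known CDF would do).
xk1, xk = 1.0, 1.1             # bin(k) = (x_{k-1}, x_k]
delta = xk - xk1               # bin width = 0.1

exact = math.exp(-xk1) - math.exp(-xk)   # integral of f_X over bin(k)
approx = math.exp(-xk1) * delta          # f_X(x) * Delta, evaluated at x_{k-1}

print(exact, approx)   # the two agree to within about 0.002 for this bin width
```

Shrinking `delta` makes the two values agree more closely, consistent with the remark below that (24) holds only for an appropriately small bin width.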
The approximation in (24) only holds for an appropriately small bin width $\Delta$.
Next we introduce the concept of a histogram of a collection of i.i.d. random variables $\{X_1, X_2, \cdots, X_N\}$. Let us start by defining a function that will indicate whether or not the random variable $X_n$ falls within $\mathrm{bin}(k)$.
$$I_n(k) = \begin{cases} 1, & \text{if } X_n \in \mathrm{bin}(k) \\ 0, & \text{if } X_n \notin \mathrm{bin}(k) \end{cases} \tag{25}$$
The histogram of $X_n$ at $\mathrm{bin}(k)$, denoted as $H(k)$, is simply the number of random variables that fall within $\mathrm{bin}(k)$. This can be written as
$$H(k) = \sum_{n=1}^{N} I_n(k). \tag{26}$$
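The sum of indicators in (25)–(26) can be computed directly, and it matches the counts returned by `np.histogram`. A sketch (the uniform sample and the bin layout are assumptions for illustration):

```python
import numpy as np

# N = 1000 i.i.d. samples and L = 10 bins on [0, 1] (illustrative choices).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=1000)
edges = np.linspace(0.0, 1.0, 11)

# I[n, k] = I_n(k): 1 if X_n lies in (x_{k-1}, x_k], else 0.  Summing the
# indicator matrix over n gives H(k), exactly as in (26).
I = (X[:, None] > edges[:-1]) & (X[:, None] <= edges[1:])
H = I.sum(axis=0)

# np.histogram produces the same counts (edge conventions differ only on a
# set of measure zero for continuous samples).
H_np, _ = np.histogram(X, bins=edges)
print(H.sum())   # 1000: every sample lands in exactly one bin
```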
We can show that the normalized histogram, $H(k)/N$, is an unbiased estimate of the probability of $X$ falling in $\mathrm{bin}(k)$. Let us compute the expected value of the normalized histogram.
$$E\!\left[\frac{H(k)}{N}\right] = \frac{1}{N}\sum_{n=1}^{N} E[I_n(k)] = \frac{1}{N}\sum_{n=1}^{N} \left\{ 1 \cdot P(X_n \in \mathrm{bin}(k)) + 0 \cdot P(X_n \notin \mathrm{bin}(k)) \right\} = \tilde{f}(k) \tag{27}$$
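The unbiasedness in (27) is easy to observe by simulation. In the sketch below, $X$ is taken to be uniform on $[0, 1)$ (an assumption made so that $\tilde{f}(k) = \Delta = 0.1$ is known exactly for every bin), and $H(k)/N$ lands very close to that value:

```python
import numpy as np

# Uniform X on [0, 1) with 10 equal bins, so f~(k) = 0.1 for every k
# (illustrative choices; any known distribution would work).
rng = np.random.default_rng(1)
N = 100_000
edges = np.linspace(0.0, 1.0, 11)
X = rng.uniform(0.0, 1.0, size=N)
H, _ = np.histogram(X, bins=edges)

# Worst-case deviation of the normalized histogram from f~(k):
print(np.max(np.abs(H / N - 0.1)))   # small, and shrinks as N grows
```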
The last equality results from the definition of $\tilde{f}(k)$, and from the assumption that the $X_n$'s have the same distribution. A similar argument may be used to show that the variance of $H(k)/N$ is given by
$$\mathrm{Var}\!\left[\frac{H(k)}{N}\right] = \frac{1}{N}\,\tilde{f}(k)\left(1 - \tilde{f}(k)\right). \tag{28}$$
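Since $H(k)$ is a sum of $N$ i.i.d. Bernoulli($\tilde{f}(k)$) indicators, it is Binomial($N$, $\tilde{f}(k)$), and (28) is the familiar binomial-proportion variance. A sketch comparing (28) against the sample variance over repeated experiments (the values of $N$, $\tilde{f}(k)$, and the repetition count are assumptions for illustration):

```python
import numpy as np

# H(k) ~ Binomial(N, f~(k)); simulate many independent histograms' worth of
# counts for a single bin with f~(k) = 0.1 (illustrative parameters).
rng = np.random.default_rng(2)
N, reps, p = 500, 2000, 0.1
counts = rng.binomial(N, p, size=reps)

empirical = np.var(counts / N)     # sample variance of H(k)/N
predicted = p * (1 - p) / N        # the right-hand side of (28)
print(empirical, predicted)        # close, up to Monte Carlo error
```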
Therefore, as $N$ grows large, the bin probabilities $\tilde{f}(k)$ can be approximated by the normalized histogram $H(k)/N$.
$$\tilde{f}(k) \approx \frac{H(k)}{N} \tag{29}$$
Using (24) and (29), we may then approximate the density function $f_X(x)$ within $\mathrm{bin}(k)$ by
$$f_X(x) \approx \frac{H(k)}{N\Delta} \quad \text{for } x \in \mathrm{bin}(k). \tag{30}$$
Notice this estimate is a staircase function of $x$ which is constant over each interval $\mathrm{bin}(k)$. It can also easily be verified that this density estimate integrates to 1.
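The full estimator (30), and the fact that it integrates to 1, can be sketched end to end. The triangular test density and the bin layout below are assumptions for illustration; the integral check holds for any choice, since $\sum_k \frac{H(k)}{N\Delta}\,\Delta = \frac{1}{N}\sum_k H(k) = 1$ whenever all samples fall in the region of support:

```python
import numpy as np

# Histogram density estimate per (30), using an assumed triangular density
# on [0, 1] with mode 0.5 and L = 20 bins (illustrative choices).
rng = np.random.default_rng(3)
N, L, x0, xL = 50_000, 20, 0.0, 1.0
X = rng.triangular(x0, 0.5, xL, size=N)
edges = np.linspace(x0, xL, L + 1)
delta = (xL - x0) / L

H, _ = np.histogram(X, bins=edges)
f_hat = H / (N * delta)            # staircase estimate of f_X(x), eq. (30)

print(np.sum(f_hat * delta))       # 1.0: the estimate integrates to one
```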