Information sources take very different forms. Because the
information is not known to the destination in advance, it is best
modeled as a random process, either discrete-time or continuous-time.
Here are a few examples:
- A digital data source (e.g., a text) can be modeled as a discrete-time, discrete-valued random process $X_1$, $X_2$, …, where $X_i \in \{A, B, C, D, E, \ldots\}$, with a particular $p_{X_1}(x)$, $p_{X_2}(x)$, …, and a specific $p_{X_1 X_2}$, $p_{X_2 X_3}$, …, and $p_{X_1 X_2 X_3}$, $p_{X_2 X_3 X_4}$, …, etc.
- Video signals can be modeled as a continuous-time random
process. The power spectral density is bandlimited to
around 5 MHz (the value depends on the standard used to
raster the image frames).
- Audio signals can be modeled as a continuous-time random
process. It has been demonstrated that the power spectral
density of speech signals is bandlimited between 300 Hz and
3400 Hz. For example, over a small observation period the speech
signal can be modeled as a Gaussian process with this
power spectral density.
These analog information signals are bandlimited. Therefore, if
sampled faster than the Nyquist rate, they can be reconstructed
from their sample values.
Example 1
A speech signal with a bandwidth of 3100 Hz can be sampled at
a rate of 6.2 kHz. If the samples are quantized with an 8-level
quantizer, then the speech signal can be represented by
a binary sequence with a rate of
$$6.2 \times 10^{3} \log_2 8 = 18600\ \frac{\text{bits}}{\text{sample}} \cdot \frac{\text{samples}}{\text{sec}} = 18.6\ \frac{\text{kbits}}{\text{sec}} \tag{1}$$
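The rate calculation in Example 1 can be sketched in a few lines of Python. The function name `fixed_length_bit_rate` is hypothetical, and the sketch assumes each sample is coded with exactly $\log_2$(levels) bits, which only gives a whole number of bits when the number of levels is a power of two.

```python
import math

def fixed_length_bit_rate(sample_rate_hz: float, levels: int) -> float:
    """Bits per second when every sample is coded with log2(levels) bits.

    Assumption: `levels` is a power of two, so log2(levels) is an integer.
    """
    bits_per_sample = math.log2(levels)
    return sample_rate_hz * bits_per_sample

# Example 1: 6.2 kHz sampling rate, 8-level quantizer.
rate = fixed_length_bit_rate(6.2e3, 8)
print(f"{rate / 1e3:.1f} kbits/sec")
```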
The sampled real values can be quantized to create a discrete-time
discrete-valued random process. Since any bandlimited analog
information signal can be converted to a sequence of discrete
random variables, we will continue the discussion only for discrete
random variables.
Example 2
The random variable $x$ takes the value 0 with probability 0.9 and
the value 1 with probability 0.1. The statement that $x = 1$
carries more information than the statement that $x = 0$.
The reason is that $x$ is expected to be 0; therefore, learning that
$x = 1$ is more surprising news. An intuitive measure of
information should be larger when the probability is
small.
Example 3
The information content in the statement about the temperature
and pollution level on July 15th in Chicago should be the sum
of the information that July 15th in Chicago was hot and the
information that it was highly polluted, since pollution and
temperature could be independent.
$$I(\text{hot}, \text{high}) = I(\text{hot}) + I(\text{high}) \tag{2}$$
An intuitive and meaningful measure of information should have
the following properties:
- Self information should decrease with increasing probability.
- Self information of two independent events occurring together
should be the sum of their individual self informations.
- Self information should be a continuous function of the
probability.
The only function satisfying the above conditions is the negative
logarithm of the probability, $-\log p$ (the base of the logarithm
only sets the unit).
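As an illustration, the following minimal sketch evaluates the negative logarithm for the probabilities of Example 2 and checks the additivity property for two independent events. The function name `self_information` and the probabilities `p_hot` and `p_high` are assumptions made up for this sketch; they do not come from the text.

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """Self information -log(p) of an event with probability p (bits if base is 2)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(p) / math.log(base)

# Example 2: the rarer outcome carries more information.
i_zero = self_information(0.9)
i_one = self_information(0.1)

# Additivity for independent events (hypothetical probabilities).
p_hot, p_high = 0.25, 0.5
joint = self_information(p_hot * p_high)  # independence: joint probability is the product
separate = self_information(p_hot) + self_information(p_high)

print(i_zero < i_one, math.isclose(joint, separate))
```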
Definition 1:
Entropy
The entropy (average self information) of a discrete random
variable $X$ is a function of its
probability mass function and is defined as
$$H(X) = -\sum_{i=1}^{N} p_X(x_i) \log p_X(x_i) \tag{3}$$
where $N$ is the number of possible values of $X$ and
$p_X(x_i) = \Pr[X = x_i]$. If log is base 2 then the unit of entropy is bits.
Entropy is a measure of uncertainty in a random variable and a
measure of information it can reveal.
Example 4
If a source produces binary information $\{0, 1\}$ with
probabilities $p$ and $1 - p$, the entropy of the source is
$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p) \tag{4}$$
If $p = 0$ then $H(X) = 0$, if $p = 1$ then $H(X) = 0$, and
if $p = 1/2$ then $H(X) = 1$ bit.
The source has its largest entropy if $p = 1/2$,
and the source provides no new information if $p = 0$ or $p = 1$.
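The behavior of the binary entropy in Example 4 can be checked with a short sketch. The function name `binary_entropy` and the grid search are illustrative choices, and the code assumes the usual convention that $0 \log_2 0 = 0$ at the endpoints.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary source with probabilities p and 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log2(0) is taken as 0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# Evaluate on a coarse grid and locate the maximizing probability.
grid = [k / 100 for k in range(101)]
best_p = max(grid, key=binary_entropy)

print(binary_entropy(0.0), binary_entropy(0.5), binary_entropy(1.0), best_p)
```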
Example 5
An analog source is modeled as a continuous-time random
process with power spectral density bandlimited to the band
between 0 and 4000 Hz. The signal is sampled at the Nyquist
rate. The random variables resulting from sampling are assumed
to be independent. The samples are quantized to 5 levels
$\{-2, -1, 0, 1, 2\}$.
The probabilities of the samples taking the quantized values are
$\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{16}\}$,
respectively. The entropy of each random variable is
$$\begin{aligned}
H(X) &= -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{1}{8}\log_2\tfrac{1}{8} - \tfrac{1}{16}\log_2\tfrac{1}{16} - \tfrac{1}{16}\log_2\tfrac{1}{16} \\
&= \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{8}\log_2 8 + \tfrac{1}{16}\log_2 16 + \tfrac{1}{16}\log_2 16 \\
&= \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{8} + \tfrac{4}{8} = \tfrac{15}{8}\ \frac{\text{bits}}{\text{sample}}
\end{aligned} \tag{5}$$
There are 8000 samples per second. Therefore, the source
produces $8000 \times \frac{15}{8} = 15000\ \frac{\text{bits}}{\text{sec}}$
of information.
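The arithmetic of Example 5 can be reproduced numerically. This is a minimal sketch: the helper name `entropy_bits` is hypothetical, and the sampling rate is taken as twice the 4000 Hz band edge, as stated in the example.

```python
import math

def entropy_bits(pmf):
    """Entropy in bits of a probability mass function given as a list."""
    # Zero-probability values contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in pmf if p > 0)

pmf = [1/2, 1/4, 1/8, 1/16, 1/16]  # quantized values -2, -1, 0, 1, 2
assert math.isclose(sum(pmf), 1.0)

h = entropy_bits(pmf)             # bits per sample
sample_rate = 2 * 4000            # Nyquist rate in samples per second
info_rate = sample_rate * h       # bits per second

print(h, info_rate)
```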
Definition 2:
Joint Entropy
The joint entropy of two discrete random variables
($X$, $Y$) is defined by
$$H(X, Y) = -\sum_{i} \sum_{j} p_{XY}(x_i, y_j) \log p_{XY}(x_i, y_j) \tag{6}$$
The joint entropy for a random vector
$\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$
is defined as
$$H(\mathbf{X}) = -\sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} p_{\mathbf{X}}(x_1, x_2, \ldots, x_n) \log p_{\mathbf{X}}(x_1, x_2, \ldots, x_n) \tag{7}$$
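The joint entropy sums can be sketched as a single loop over a joint probability mass function. The representation below (a dictionary keyed by outcome tuples), the function name `joint_entropy_bits`, and the example probabilities are all assumptions chosen for illustration.

```python
import math

def joint_entropy_bits(joint_pmf: dict) -> float:
    """Joint entropy in bits; keys are outcome tuples, values are probabilities."""
    # Terms with zero probability are skipped (0 * log 0 is taken as 0).
    return -sum(p * math.log2(p) for p in joint_pmf.values() if p > 0)

# Hypothetical joint pmf of two binary random variables (X, Y).
joint_pmf = {
    (0, 0): 0.5,
    (0, 1): 0.25,
    (1, 0): 0.0,
    (1, 1): 0.25,
}
assert math.isclose(sum(joint_pmf.values()), 1.0)

h_xy = joint_entropy_bits(joint_pmf)
print(h_xy)
```

Because the keys can be tuples of any length, the same function covers both the two-variable case and the random-vector case.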
Definition 3:
Conditional Entropy
The conditional entropy of the random variable
$X$ given the random variable
$Y$ is defined by
$$H(X|Y) = -\sum_{i} \sum_{j} p_{XY}(x_i, y_j) \log p_{X|Y}(x_i | y_j) \tag{8}$$
It is easy to show that
$$H(\mathbf{X}) = H(X_1) + H(X_2 | X_1) + \cdots + H(X_n | X_1, X_2, \ldots, X_{n-1}) \tag{9}$$
and
$$H(X, Y) = H(Y) + H(X|Y) = H(X) + H(Y|X) \tag{10}$$
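The two-variable chain rule can be verified numerically on a small example. In the sketch below, the joint probabilities and all function names are hypothetical; the conditional probability is obtained as the joint probability divided by the marginal of the conditioning variable.

```python
import math
from collections import defaultdict

def entropy_bits(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, index):
    """Marginal pmf of the variable at position `index` of each key."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[index]] += p
    return dict(out)

def conditional_entropy_bits(joint, given_index):
    """Entropy of the other variable given the one at `given_index`."""
    given = marginal(joint, given_index)
    # p(x | y) = p(x, y) / p(y); zero-probability pairs are skipped.
    return -sum(p * math.log2(p / given[outcome[given_index]])
                for outcome, p in joint.items() if p > 0)

# Hypothetical joint pmf with keys (x, y).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

h_xy = entropy_bits(joint.values())
h_x = entropy_bits(marginal(joint, 0).values())
h_y = entropy_bits(marginal(joint, 1).values())
h_x_given_y = conditional_entropy_bits(joint, given_index=1)
h_y_given_x = conditional_entropy_bits(joint, given_index=0)

print(math.isclose(h_xy, h_y + h_x_given_y), math.isclose(h_xy, h_x + h_y_given_x))
```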
If $X_1$, $X_2$, …, $X_n$
are mutually independent it is easy to show that
$$H(\mathbf{X}) = \sum_{i=1}^{n} H(X_i) \tag{11}$$
Definition 4:
Entropy Rate
The entropy rate of a stationary discrete-time random process
is defined by
$$H = \lim_{n \to \infty} H(X_n | X_1, X_2, \ldots, X_{n-1}) \tag{12}$$
The limit exists and is equal to
$$H = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n) \tag{13}$$
The entropy rate measures the uncertainty, and hence the
information content, per output symbol of the source.
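For a source whose symbols are independent and identically distributed, the per-symbol joint entropy $\frac{1}{n} H(X_1, X_2, \ldots, X_n)$ does not depend on $n$ and equals the entropy of a single symbol. The sketch below checks this for the quantized source of Example 5 by building the joint distribution of $n$ independent samples explicitly. The function names are hypothetical, and the independence assumption is essential: for a source with memory the per-symbol value would generally change with $n$.

```python
import math
from itertools import product

def entropy_bits(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def per_symbol_block_entropy(pmf, n):
    """(1/n) H(X_1, ..., X_n) for n independent samples with the same pmf."""
    # Independence: each joint probability is the product of the marginals.
    joint_probs = (math.prod(combo) for combo in product(pmf, repeat=n))
    return entropy_bits(joint_probs) / n

pmf = [1/2, 1/4, 1/8, 1/16, 1/16]  # source of Example 5
per_symbol = [per_symbol_block_entropy(pmf, n) for n in (1, 2, 3, 4)]

print(per_symbol)
```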
Entropy is closely tied to
source coding. The extent to which a
source can be compressed is related to its entropy. In 1948,
Claude E. Shannon introduced a theorem which related the
entropy to the number of bits per second required to represent
a source without much loss.