The self-information gives the information in
a single outcome. In most cases, e.g., in data compression, it is much more
interesting to know the average information content
of a source. This average is given by the expected
value of the self-information with respect to the source's probability
distribution. This average of self-information is called the source entropy.
- Definition 1: Entropy
1.
The entropy (average self-information) of a discrete random
variable $X$ is a function of its
probability mass function and is defined as
$$H(X) = -\sum_{i=1}^{N} p_X(x_i) \log p_X(x_i) \qquad (1)$$
where $N$ is the number of
possible values of $X$ and
$p_X(x_i) = \Pr[X = x_i]$.
If log is base 2 then the unit of entropy is bits per (source) symbol.
Entropy is a measure of uncertainty in a random variable and a
measure of information it can reveal.
2.
If a symbol has zero probability, which means it never occurs,
it should not affect the entropy. Adopting the convention
$0 \log 0 = 0$ takes care of this.
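As a minimal sketch of Definition 1, the following Python function computes the entropy of a probability mass function given as a list of probabilities. The function name `entropy` and its `base` parameter are illustrative choices, not part of the original text. Terms with zero probability are skipped, which is one way to apply the convention that 0 log 0 equals 0.

```python
import math

def entropy(pmf, base=2.0):
    """Entropy of a discrete pmf (hypothetical helper); base 2 gives bits/symbol."""
    if any(p < 0 for p in pmf):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(pmf), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Skip zero-probability symbols: this applies the convention 0 log 0 = 0.
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

# A symbol that never occurs leaves the entropy unchanged.
print(entropy([0.5, 0.5]))
print(entropy([0.5, 0.5, 0.0]))
```

Both calls should report the same value, since the third symbol in the second pmf has zero probability.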
In texts you will find that the argument to the entropy function
may vary. The two most common are
$H(X)$ and $H(p)$.
We calculate the entropy of a source $X$, but the entropy is,
strictly speaking, a function of the source's probability function $p$.
So both notations are justified.
Most calculators do not allow you to directly calculate the
logarithm with base 2, so we have to use a logarithm base that most
calculators support. Fortunately, it is easy to convert between different
bases.
Assume you want to calculate $\log_2 x$, where $x > 0$.
Then $\log_2 x = y$ implies that $2^y = x$.
Taking the natural logarithm on both sides we obtain $y \ln 2 = \ln x$, that is,
$$\log_2 x = \frac{\ln x}{\ln 2}$$
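The change-of-base identity can be checked numerically. The sketch below uses only the natural logarithm, as one would on a calculator, and compares the result with Python's built-in `math.log2`. The helper name `log2_via_ln` is hypothetical and chosen for illustration.

```python
import math

def log2_via_ln(x):
    """Base-2 logarithm computed from natural logarithms: ln(x) / ln(2)."""
    if x <= 0:
        raise ValueError("x must be positive")
    return math.log(x) / math.log(2)

# Compare the converted value with the library's base-2 logarithm.
for x in (0.5, 6, 8):
    print(x, log2_via_ln(x), math.log2(x))
```

The two columns should agree up to floating-point rounding.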
When throwing a fair die, one may ask for the average information conveyed
in a single throw. Each of the six outcomes has probability 1/6, so using the formula for entropy we get
$$H(X) = -\sum_{i=1}^{6} p_X(x_i) \log_2 p_X(x_i) = \log_2 6 \approx 2.585 \text{ bits/symbol}$$
Consider a source that produces binary symbols $\{0, 1\}$
with probabilities $p$ and $1 - p$, respectively.
The entropy of the source is
$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p) \qquad (2)$$
If $p = 0$ then $H(X) = 0$,
if $p = 1$ then $H(X) = 0$,
if $p = 1/2$ then $H(X) = 1$.
The source has its largest entropy if $p = 1/2$
and the source provides no new information if $p = 0$ or $p = 1$.
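To see how the entropy of the binary source varies with $p$, equation (2) can be evaluated at a few points. The following is a small sketch; the function name `binary_entropy` is an illustrative choice, and the end points are handled with the convention that 0 log 0 equals 0.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary source with symbol probabilities p and 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # convention 0 log 0 = 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Tabulate the entropy over a few values of p.
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}  H = {binary_entropy(p):.4f} bits")
```

The table is symmetric around $p = 1/2$, where the entropy reaches its maximum of 1 bit.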
An analog source is modeled as a continuous-time random
process with power spectral density bandlimited to the band
between 0 and 4000 Hz. The signal is sampled at the Nyquist
rate. The random variables in the resulting sequence of
samples are assumed to be independent. The samples are
quantized to 5 levels $\{-2, -1, 0, 1, 2\}$.
The probabilities of the samples taking the quantized values are
$\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{16}$,
respectively. The entropy of each random variable is
$$H(X) = -\sum_{i=1}^{5} p_X(x_i) \log_2 p_X(x_i) = \frac{1}{2} + \frac{1}{2} + \frac{3}{8} + \frac{1}{4} + \frac{1}{4} = \frac{15}{8} \text{ bits/sample} \qquad (3)$$
There are 8000 samples per second. Therefore, the source
produces $8000 \times \frac{15}{8} = 15000$ bits/sec of information.
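The arithmetic in this example can be reproduced with a short script. This is a sketch under the assumptions stated in the example (independent samples, Nyquist-rate sampling of a 4000 Hz band); the variable names are illustrative.

```python
import math
from fractions import Fraction

# Quantizer levels and their probabilities, as given in the example.
levels = [-2, -1, 0, 1, 2]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8),
         Fraction(1, 16), Fraction(1, 16)]
assert len(levels) == len(probs) and sum(probs) == 1

# Entropy per sample in bits, equation (3).
h = -sum(float(p) * math.log2(float(p)) for p in probs)

# Nyquist rate: twice the highest frequency in the band.
bandwidth_hz = 4000
samples_per_sec = 2 * bandwidth_hz

# Information rate in bits per second.
info_rate = samples_per_sec * h

print(f"H = {h} bits/sample")
print(f"rate = {info_rate} bits/sec")
```

Multiplying the entropy per sample by the sampling rate is valid here because the samples are assumed independent.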
Entropy is closely tied to source coding. The extent to which a
source can be compressed is related to its entropy.
There are many interpretations possible for the entropy of
a random variable, including
- (Average) self-information in a random variable
- Minimum number of bits per source symbol
required to describe the random variable without loss
- Description complexity
- Measure of uncertainty in a random variable
- Øien, G.E. and Lundheim, L. (2003).
Information Theory, Coding and Compression.
Trondheim: Tapir Akademisk forlag.