Communication theory has been formulated best for
symbolic-valued signals. Claude
Shannon published in 1948 *The Mathematical Theory
of Communication*, which became the cornerstone of digital
communication. He showed the power of probabilistic
models for symbolic-valued signals, which allowed him to
quantify the information present in a signal. In the simplest
signal model, each symbol can occur at index $n$ with a
probability $\Pr(a_k)$, $k = 1, \ldots, K$. What this model says
is that for each signal value a $K$-sided coin is flipped (note that
the coin need not be fair). For this model to make sense, the
probabilities must be numbers between zero and one and must sum
to one.

$$\sum_{k=1}^{K} \Pr(a_k) = 1 \qquad (2)$$
This coin-flipping model assumes that symbols occur without
regard to what preceding or succeeding symbols were, a false
assumption for typed text. Despite this probabilistic
model's over-simplicity, the ideas we develop here also
work when more accurate, but still probabilistic, models are
used. The key quantity that characterizes a symbolic-valued
signal is the entropy of its alphabet.

$$H(A) = -\sum_{k} \Pr(a_k) \log_2 \Pr(a_k) \qquad (3)$$
Because we use the base-2 logarithm, entropy has units of
bits. For this definition to make sense, we must take special
note of symbols having probability zero of occurring. A
zero-probability symbol never occurs; thus, we define

$0 \log_2 0 = 0$
so that such symbols do not affect the entropy. The
maximum value attainable by an alphabet's entropy occurs
when the symbols are equally likely
($\Pr(a_k) = \Pr(a_l)$ for all $k$ and $l$). In this case, the
entropy equals $\log_2 K$. The minimum value occurs when only one symbol
occurs; it has probability one of occurring and the rest have
probability zero.
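These definitions translate directly into code. The following Python sketch (the helper name `entropy` is our own, not from the text) computes $H(A)$ using the $0 \log_2 0 = 0$ convention and checks the two extremes just described.

```python
import math

def entropy(probs):
    """Entropy H(A) = -sum_k Pr(a_k) * log2 Pr(a_k), in bits.
    Zero-probability symbols are skipped, implementing the
    convention 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Maximum: K equally likely symbols give H(A) = log2(K)
K = 8
uniform = [1 / K] * K
print(entropy(uniform))      # → 3.0, i.e. log2(8)

# Minimum: one certain symbol, the rest impossible, gives 0 bits
degenerate = [1.0, 0.0, 0.0, 0.0]
print(entropy(degenerate))   # 0 bits
```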

Derive the maximum-entropy results, both the
numeric aspect (entropy equals
$\log_2 K$) and the theoretical one (equally likely symbols
maximize entropy). Derive the value of the minimum entropy
alphabet.

Equally likely symbols each have a probability of
$1/K$. Thus,
$H(A) = -\sum_{k} \frac{1}{K} \log_2 \frac{1}{K} = \log_2 K$.
To prove that this is the maximum-entropy
probability assignment, we must explicitly take into account
that probabilities sum to one. Focus on a particular
symbol, say the first.
$\Pr(a_0)$
appears *twice* in the entropy
formula: the terms
$\Pr(a_0) \log_2 \Pr(a_0)$
and
$\left(1 - (\Pr(a_0) + \cdots + \Pr(a_{K-2}))\right) \log_2 \left(1 - (\Pr(a_0) + \cdots + \Pr(a_{K-2}))\right)$. The derivative with respect to this probability
(and all the others) must be zero. The derivative equals
$\log_2 \Pr(a_0) - \log_2 \left(1 - (\Pr(a_0) + \cdots + \Pr(a_{K-2}))\right)$, and all other derivatives have the same form
(just substitute the corresponding symbol's index). Thus, each
probability must equal the others, and we are done. For the
minimum entropy answer, one term is
$1 \log_2 1 = 0$
, and the others are
$0 \log_2 0$
, which we define to be zero also. The minimum
value of entropy is zero.
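As a numerical sanity check of the result just derived (a sketch of our own; the perturbation size is arbitrary), shifting probability mass away from the uniform assignment while keeping the sum at one strictly lowers the entropy:

```python
import math

def entropy(probs):
    # H(A) in bits, with the 0 * log2(0) = 0 convention
    return -sum(p * math.log2(p) for p in probs if p > 0)

K = 4
h_uniform = entropy([1 / K] * K)    # log2(4) = 2 bits

# Perturb two probabilities in opposite directions so they still sum to one
eps = 0.1
h_skewed = entropy([1 / K + eps, 1 / K - eps, 1 / K, 1 / K])

print(h_uniform, h_skewed)
assert h_skewed < h_uniform         # uniform assignment maximizes entropy
```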

A four-symbol alphabet has the following probabilities.
$$\Pr(a_0) = \frac{1}{2} \quad \Pr(a_1) = \frac{1}{4} \quad \Pr(a_2) = \frac{1}{8} \quad \Pr(a_3) = \frac{1}{8}$$
Note that these probabilities sum to one as they should. As
$\frac{1}{2} = 2^{-1}$, $\log_2 \frac{1}{2} = -1$. The
entropy of this alphabet equals

$$H(A) = -\left(\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{8}\log_2\frac{1}{8} + \frac{1}{8}\log_2\frac{1}{8}\right) = -\left(\frac{1}{2}(-1) + \frac{1}{4}(-2) + \frac{1}{8}(-3) + \frac{1}{8}(-3)\right) = 1.75 \text{ bits} \qquad (4)$$
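The arithmetic in this example is easy to verify numerically; here is a minimal Python check (the `entropy` helper is our own, not from the text):

```python
import math

def entropy(probs):
    # H(A) = -sum_k Pr(a_k) * log2 Pr(a_k), in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The four-symbol alphabet from the example
probs = [1/2, 1/4, 1/8, 1/8]
assert math.isclose(sum(probs), 1.0)   # probabilities sum to one
print(entropy(probs))                  # → 1.75 bits
```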