When representing a source we want to use as few bits as possible, as this will imply that less disk space is required for storage or that
transmission over the Internet is quicker. However, we do not want to use so few bits that the receiver cannot determine what was sent or stored.
So, for a given source what is the minimal representation? Here we consider the minimal representation as the representation that uses the
minimum number of bits (on average) to encode the source without errors.
According to Shannon's
source coding theorem, for a source that produces statistically independent outcomes, the minimum average number of bits per symbol
is the entropy of the source. (A classical example of a source that produces statistically independent outcomes is
throwing a die.)
"Average" indicates that the number of bits used for one symbol may differ
from the number of bits used for another. E.g., as opposed to ASCII coding, we might represent an "A" with 7 bits, but an "E" with 3 bits.
It also implies that when you receive a series of symbols, the number of symbols you receive per time unit,
say per second, will not be exactly the same from one second to the next. Averaged over a long period, however, the number of bits received
is proportional to the number of symbols, with the average number of bits per symbol as the proportionality constant.
Let us assume that we represent a symbol $x_n$, with probability $p_n$, by $l_n$ bits. Then, the average number of bits spent per symbol will be
$$\bar{L} = \sum_{n=1}^{N} p_n l_n \qquad (1)$$
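Equation (1) can be evaluated directly. The following is a minimal sketch; the function name `average_code_length` and the three-symbol source are hypothetical and chosen only for illustration.

```python
def average_code_length(probabilities, lengths):
    """Return sum_n p_n * l_n, the average number of bits per symbol."""
    if len(probabilities) != len(lengths):
        raise ValueError("need exactly one code word length per symbol")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * l for p, l in zip(probabilities, lengths))


# Hypothetical three-symbol source (not taken from the text).
example_probs = [0.5, 0.3, 0.2]
example_lengths = [1, 2, 2]
print(average_code_length(example_probs, example_lengths))
```

Frequent symbols contribute more to the sum, which is why giving them short code words lowers the average.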
We see that this equation is equal to the entropy if the code words are selected to have the lengths $l_n = -\log_2 p_n$.
Thus, if the source produces stochastically independent outcomes with probabilities $p_n$ such that $\log_2 p_n$
is an integer for all $n$, then we can easily find an optimal code, as we show in the next example.
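To make the condition concrete, the sketch below computes the entropy and the lengths $-\log_2 p_n$ for a given set of probabilities and checks whether those lengths are integers. The function names and the two test sources are assumptions introduced for illustration.

```python
import math


def entropy(probabilities):
    """Entropy in bits per symbol: -sum_n p_n * log2(p_n)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def ideal_lengths(probabilities):
    """Code word lengths l_n = -log2(p_n); these need not be integers."""
    return [-math.log2(p) for p in probabilities]


def lengths_are_integers(probabilities, tol=1e-9):
    """True if every -log2(p_n) is an integer, up to a small tolerance."""
    return all(abs(l - round(l)) < tol for l in ideal_lengths(probabilities))


# Hypothetical sources: the first has power-of-two probabilities, the second does not.
print(lengths_are_integers([0.25, 0.25, 0.25, 0.25]))
print(lengths_are_integers([0.5, 0.3, 0.2]))
```

When the check fails, the lengths $-\log_2 p_n$ cannot be used directly as code word lengths, so the simple construction described here does not apply as stated.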
A four-symbol alphabet produces stochastically independent outcomes with the probabilities
$$\Pr[x_1] = \frac{1}{2}, \quad \Pr[x_2] = \frac{1}{4}, \quad \Pr[x_3] = \frac{1}{8}, \quad \Pr[x_4] = \frac{1}{8}$$
and has an entropy of 1.75 bits/symbol. Let's see if we can find a codebook for
this four-letter alphabet that satisfies the Source Coding
Theorem. The simplest code to try is known as the simple
binary code: convert the symbol's index into a binary
number and use the same number of bits for each symbol by
including leading zeros where necessary.
$$x_1 \leftrightarrow 00 \quad x_2 \leftrightarrow 01 \quad x_3 \leftrightarrow 10 \quad x_4 \leftrightarrow 11 \qquad (2)$$
As all symbols are represented by 2 bits, obviously the average number of bits per symbol is 2.
Because the entropy equals 1.75 bits, the simple
binary code is not a minimal representation according to the source coding theorem.
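The comparison can be checked numerically. This sketch builds the simple binary code for the four-symbol alphabet and compares its average length with the entropy; the helper name `simple_binary_code` is an assumption, and indices are counted from zero so that $x_1$ maps to 00.

```python
import math

# Probabilities of the four-symbol alphabet from the example.
probs = {"x1": 1 / 2, "x2": 1 / 4, "x3": 1 / 8, "x4": 1 / 8}


def simple_binary_code(symbols):
    """Map each symbol's zero-based index to a fixed-width binary string."""
    width = max(1, math.ceil(math.log2(len(symbols))))
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}


codebook = simple_binary_code(list(probs))
source_entropy = -sum(p * math.log2(p) for p in probs.values())
avg_len = sum(probs[s] * len(codebook[s]) for s in probs)

print(codebook)        # {'x1': '00', 'x2': '01', 'x3': '10', 'x4': '11'}
print(source_entropy)  # 1.75
print(avg_len)         # 2.0
```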
If we choose a codebook with differing numbers of bits for the symbols, a smaller average number of
bits can indeed be obtained. The idea is to
use shorter bit
sequences for the symbols that occur more often, i.e., symbols that have a higher probability.
One codebook like this is
$$x_1 \leftrightarrow 0 \quad x_2 \leftrightarrow 10 \quad x_3 \leftrightarrow 110 \quad x_4 \leftrightarrow 111 \qquad (3)$$
Now $\bar{L} = 1 \times \frac{1}{2} + 2 \times \frac{1}{4} + 3 \times \frac{1}{8} + 3 \times \frac{1}{8} = 1.75$.
We can reach the entropy limit! This should come as no surprise: as promised above, when
$\log_2 p_n$ is an integer for all $n$, the optimal code is easily found.
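A receiver can still recover the symbols from the unequal-length code, because no code word in this codebook is the beginning of another one. The sketch below encodes and decodes a short message with it; the functions `encode` and `decode` and the sample message are assumptions made for illustration, not part of the original example.

```python
codebook = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}
probs = {"x1": 1 / 2, "x2": 1 / 4, "x3": 1 / 8, "x4": 1 / 8}


def encode(symbols, codebook):
    """Concatenate the code words of the given symbols."""
    return "".join(codebook[s] for s in symbols)


def decode(bits, codebook):
    """Read bits left to right and emit a symbol whenever a code word is complete."""
    inverse = {word: sym for sym, word in codebook.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return decoded


message = ["x1", "x3", "x2", "x1", "x4"]  # hypothetical message
bits = encode(message, codebook)
print(bits)  # 0110100111
print(decode(bits, codebook) == message)  # True

avg_len = sum(probs[s] * len(codebook[s]) for s in probs)
print(avg_len)  # 1.75
```

The greedy decoder works only because of the prefix property noted above; with an arbitrary variable-length codebook it could emit the wrong symbols.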
The simple
binary code is, in this case, less efficient than the
unequal-length code. Using the efficient code, we can transmit
the symbolic-valued signal having this alphabet with 12.5%
fewer bits on average (1.75 instead of 2 bits per symbol). Furthermore, we know that no more efficient codebook
can be found because of Shannon's source coding theorem.
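The saving can also be observed empirically. This sketch draws a long sequence of independent symbols with the probabilities of the example and counts the bits each codebook needs; the sample size and the fixed seed are arbitrary assumptions, so the measured figures only approximate 1.75 bits per symbol and a 12.5% reduction.

```python
import random

probs = {"x1": 1 / 2, "x2": 1 / 4, "x3": 1 / 8, "x4": 1 / 8}
fixed_code = {"x1": "00", "x2": "01", "x3": "10", "x4": "11"}
variable_code = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}


def bits_per_symbol(symbols, codebook):
    """Average number of bits the codebook spends on this symbol sequence."""
    total_bits = sum(len(codebook[s]) for s in symbols)
    return total_bits / len(symbols)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
symbols = rng.choices(list(probs), weights=list(probs.values()), k=100_000)

fixed_rate = bits_per_symbol(symbols, fixed_code)
variable_rate = bits_per_symbol(symbols, variable_code)
saving = (fixed_rate - variable_rate) / fixed_rate

print(f"fixed-length code:   {fixed_rate:.3f} bits/symbol")
print(f"unequal-length code: {variable_rate:.3f} bits/symbol")
print(f"relative saving:     {saving:.1%}")
```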
Let us return to the ASCII codes presented in Example 1. Is the 7-bit ASCII code optimal, i.e., is it a minimal representation?
The 7-bit ASCII code assigns an equal length (7 bits) to all characters it represents. Thus, it would be optimal if all of the 128 characters were
equiprobable, that is, if each character had a probability of $\frac{1}{128}$.
To find out whether the characters really are equiprobable, an analysis of all English texts would be needed. Such an analysis is
difficult to do. However, the letter "E" is more probable than the letter "Z", so the equiprobable assumption does not hold, and the ASCII code
is not optimal.
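A rough impression of how unequal the character frequencies are can be obtained from any piece of text. The sketch below counts characters in a single sentence and computes the entropy of the observed frequencies, treating characters as independent. The sample sentence and the function name `empirical_entropy` are assumptions, and one sentence is far too small to estimate the statistics of English, so the figure is illustrative only.

```python
import math
from collections import Counter


def empirical_entropy(text):
    """Entropy (bits per character) of the character frequencies observed in text."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Illustrative sample only; a serious estimate would need a large body of text.
sample = (
    "When representing a source we want to use as few bits as possible, "
    "as this will imply that less disk space is required for storage or "
    "that transmission over the Internet is quicker."
)

counts = Counter(sample.lower())
print(counts["e"], counts["z"])
print(f"{empirical_entropy(sample):.2f} bits/character, compared with 7 for ASCII")
```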
(A technical note: We should take into account that in English text subsequent outcomes are not stochastically independent.
To see this, assume the first letter to be "b"; then it is more probable that the next letter is "e" than "z". In the case where the outcomes
are not stochastically independent, the formulation we have given of Shannon's source coding theorem is no longer valid. To fix this, we should
replace the entropy with the entropy rate, but we will not pursue this here.)