Note: Your browser may not currently support MathML. See our browser support page for additional details. You can always view the correct math in the PDF version.
In 1838, Samuel Morse was struggling with the problem of designing an efficient code for transmitting information over telegraph lines. He reasoned that an efficient code would use short code words for common letters and long code words for uncommon letters. (Can you see the profit motive at work?) In order to turn this reasoned principle into workable practice, Morse rummaged around in the composition trays for typeface in a printshop. He discovered that typesetters use many more
A Variation on Morse's Experiment. In order to set the stage for our study of efficient source codes, let's run a variation on Morse's experiment to see if we can independently arrive at a way of designing codes. Instead of giving ourselves a composition tray, let's start with a communication source that generates five symbols or letters
We will assume that these “source statistics” are typical, meaning that 1000 transmissions would yield 500
The most primitive binary code we could build for our source would use three bits for each symbol:
This code is inefficient in two ways. First, it leaves three illegal code words that correspond to no source symbol. Second, it uses the same code word length for an unlikely symbol (like
The number of bits per source symbol is
This scheme improves on the best variable length code of Table 2 from "Binary Codes: From Symbols to Binary Codes" by 0.08 bits/symbol.
Suppose your source of information generates the 26 lowercase roman letters used in English language text. These letters are to be concatenated into blocks of length M. Complete the following table of N (number of bits) versus M (number of letters in a block) and show that
| M | ||||||
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | |
| N | 5 | 10 | ||||
|
|
5 | 5 | ||||
Now let's reason, as Morse did, that an efficient code would use short codes for likely symbols and long codes for unlikely symbols. Let's pick code #5 from Table 2 from "Binary Codes: From Symbols to Binary Codes" for this purpose:
This is a variable-length code. If we use this code on the 100 symbols that generated our source statistic, the average number of bits/symbol is
Use the source statistics of Equation 1 to determine the average number of bits/symbol for each code in Table 2 from "Binary Codes: From Symbols to Binary Codes".
Entropy. So far, each ad hoc scheme we have tried has produced an improvement in the average number of bits/symbol. How far can this go? The answer is given by Shannon's source coding theorem, which says that the minimum number of bits/symbol is
where pi is the probability that symbol Si is generated and
Code
| Symbol | Probability | ~ Log Probability |
|---|---|---|
| 0.5 | 1 | |
| 0.2 | 2.32 | |
| 02 | 2.32 | |
| 0.05 | 4.32 | |
| 0.05 | 4.32 |
Select an arbitrary page of English text. Build a table of source statistics containing
Huffman Codes. In the late
![]() |
This tree is now labeled with 1's and 0's to obtain the Huffman code. The labeling procedure is to label each right branch with a 1 and each left branch with a 0. The procedure for laying out symbols and constructing Huffman trees and codes is illustrated in the following examples.
Consider the source statistics
for which the Huffman algorithm produces the following binary tree and its corresponding code:
![]() |
The Huffman code for the source statistics
is illustrated next:
![]() |
Generate binary trees and Huffman codes for the following source statistics:
Coding a FAX Machine. Symbols can arise in unusual ways and be defined quite arbitrarily. To illustrate this point, we consider the design of a hypothetical FAX machine. For our design we will assume that a laser scanner reads a page of black and white text or pictures, producing a high voltage for a black spot and a low voltage for a white spot. We will also assume that the laser scanner resolves the page at 1024 lines, with 1024 spots/line. This means that each page is represented by a two-dimensional array, or matrix, of pixels (picture elements), each pixel being 1 or 0. If we simply transmitted these l's and
Let's think about a typical scan line for a printed page. It will contain long runs of 0's, corresponding to long runs of white, interrupted by short bursts of l's, corresponding to short runs of black where the scanner encounters a line or a part of a letter. So why not try to define a symbol to be “a run of
If we simply used a simple three-bit binary code for these eight symbols, then for each scan line we would generate anywhere from
An experiment conducted on FAXed documents produces the following statistics for run lengths of white ranging from 0 to 7:
These statistics indicate that only 1% of a typical page is black. Construct the Huffman code for this source. Use your Huffman code to code and decode these scan lines:
![]() |
"A First Course in Electrical and Computer Engineering provides readers with a comprehensive, introductory look at the world of electrical engineering. It was originally written by Louis Scharf […]"