Summary: This module introduces practical entropy coding techniques, such as Huffman Coding, Run-length Coding (RLC) and Arithmetic Coding.
In the module Use of Laplacian PDFs in Image Compression we assumed that ideal entropy coding had been used in order to calculate the bit rates for the coded data. In practice we must use real codes, and we shall now see how this affects the compression performance.
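There are three main techniques for achieving entropy coding:
- Huffman coding: one of the simplest variable-length coding schemes.
- Run-length coding (RLC): very useful when one value, such as zero, occurs in long runs.
- Arithmetic coding: a more complex scheme that can approach the ideal entropy more closely and can adapt to data with changing statistics.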
First we consider the change in compression performance if simple Huffman Coding is used to code the subimages of the 4-level Haar transform.
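The calculation of entropy in our earlier discussion of entropy assumed that each message with probability $p_i$ could be represented by a word of length $w_i = -\log_2 p_i$ bits. Huffman codes require the $w_i$ to be integers, which is equivalent to replacing each $p_i$ by $\hat{p}_i = 2^{-w_i}$, subject to the constraint $\sum_i \hat{p}_i \le 1$ (so that enough uniquely decodable code words are available). The $w_i$ are chosen to minimise the mean Huffman word length, or Huffman entropy,
$$\hat{H} = \sum_i p_i w_i .$$
We can use the probability histograms which generated the entropy plots of the level 1, level 2, level 3 and level 4 energies to calculate the Huffman entropies $\hat{H}$ for each subimage, and compare them with the true entropies to see the loss in performance caused by using real Huffman codes.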
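As a small worked example of these definitions (made-up probabilities, not data from this module), take four events with probabilities 0.4, 0.3, 0.2 and 0.1. Repeatedly merging the two least probable events gives Huffman word lengths $w_i$ of 1, 2, 3 and 3 bits, which are hard-coded below rather than computed. The sketch evaluates the ideal entropy $H = -\sum_i p_i \log_2 p_i$, the Huffman entropy $\hat{H} = \sum_i p_i w_i$, and the sum of the implied probabilities $\hat{p}_i = 2^{-w_i}$:

```matlab
% Made-up source: four events with these probabilities (they sum to 1).
p = [0.4 0.3 0.2 0.1];
% Huffman word lengths for this p, built by hand:
% merge 0.1+0.2 -> 0.3, then 0.3+0.3 -> 0.6, then 0.4+0.6.
w = [1 2 3 3];

H     = -sum(p .* log2(p));   % ideal entropy, bits/event
Hhat  = sum(p .* w);          % Huffman entropy (mean word length), bits/event
phat  = 2 .^ (-w);            % probabilities implied by the integer lengths
kraft = sum(phat);            % must not exceed 1 for a uniquely decodable code

fprintf('H = %.4f, Hhat = %.4f, sum(phat) = %.4f\n', H, Hhat, kraft);
```

Here the Huffman code costs 1.9 bits per event against an ideal of about 1.846 bits, and the implied probabilities sum to exactly 1.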
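An algorithm for finding the optimum code sizes $w_i$ is given as procedure Code_size in Figure K.1 of the JPEG specification, and the M-file listed at the end of this module implements it.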
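Table 1: Entropy and Huffman mean word length (bit/pel) for the untransformed image and for each subimage of the 4-level Haar transform, without and with RLC. The subimage entries in each column sum to the column total.

| Subimage | 1: Untransformed, entropy | 2: Untransformed, Huffman | 3: Haar, entropy | 4: Haar, Huffman | 5: Haar + RLC, entropy | 6: Haar + RLC, Huffman |
|---|---|---|---|---|---|---|
| Level 4 | | | 0.0264 | 0.0265 | 0.0264 | 0.0266 |
| Level 4 | | | 0.0220 | 0.0222 | 0.0221 | 0.0221 |
| Level 4 | | | 0.0186 | 0.0187 | 0.0185 | 0.0186 |
| Level 4 | | | 0.0171 | 0.0172 | 0.0171 | 0.0173 |
| Level 3 | | | 0.0706 | 0.0713 | 0.0701 | 0.0705 |
| Level 3 | | | 0.0556 | 0.0561 | 0.0557 | 0.0560 |
| Level 3 | | | 0.0476 | 0.0482 | 0.0466 | 0.0471 |
| Level 2 | | | 0.1872 | 0.1897 | 0.1785 | 0.1796 |
| Level 2 | | | 0.1389 | 0.1413 | 0.1340 | 0.1353 |
| Level 2 | | | 0.1096 | 0.1170 | 0.1038 | 0.1048 |
| Level 1 | | | 0.4269 | 0.4566 | 0.3739 | 0.3762 |
| Level 1 | | | 0.2886 | 0.3634 | 0.2691 | 0.2702 |
| Level 1 | | | 0.2012 | 0.3143 | 0.1819 | 0.1828 |
| Totals | 3.7106 | 3.7676 | 1.6103 | 1.8425 | 1.4977 | 1.5071 |
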
Figure 1 shows the results of applying this algorithm to the probability histograms, and Table 1 lists the same results numerically for ease of analysis. Columns 1 and 2 compare the ideal entropy with the mean word length, or bit rate, obtained from a Huffman code (the Huffman entropy) for the untransformed image, in which the original pels are quantized without any transform. Columns 3 and 4 make the same comparison for the subimages of the 4-level Haar transform.
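One likely reason the Huffman entropies of highly peaked subimages lie well above their ideal entropies is that every Huffman code word is at least 1 bit long, so coding each pel as one event can never cost less than 1 bit per pel, however probable the zero value is. The sketch below uses made-up probabilities for a subimage dominated by zeros; the word lengths are hard-coded from merging the two least probable events:

```matlab
% Made-up subimage statistics: zero is far more probable than anything else.
p = [0.95 0.03 0.02];    % probabilities of zero and of two non-zero values
w = [1 2 2];             % Huffman word lengths: merge 0.02+0.03, then 0.05+0.95

H    = -sum(p .* log2(p));   % ideal entropy, bit/pel
Hhat = sum(p .* w);          % Huffman entropy, bit/pel (cannot fall below 1)

fprintf('H = %.3f bit/pel, Hhat = %.3f bit/pel\n', H, Hhat);
```

The ideal entropy is about 0.335 bit/pel, but the Huffman code needs 1.05 bit/pel, roughly three times as much.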
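Run-length codes (RLCs) are a simple and effective way of improving the efficiency of Huffman coding when one event is much more probable than all of the others combined. They operate as follows:
1. The pels of the subimage are scanned sequentially (for example, row by row or column by column) to form a one-dimensional sequence.
2. Each run of consecutive zero samples (the most probable event) is coded as a single event whose value is the run length.
3. Each non-zero sample is coded as a single event whose value is its amplitude.
4. The probability histogram of these events is formed, and a Huffman code is designed for them.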
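A minimal sketch of how a subimage might be turned into RLC events, treating each run of zeros and each non-zero sample as one event. The subimage `sub` is made up, the column-by-column scan via `sub(:)` is an assumed scan order, and storing each event as a `[type value]` row is our own hypothetical layout:

```matlab
% Made-up quantized subimage; zero is by far the most common value.
sub = [ 0 0 3 0;
        0 0 0 0;
       -1 0 0 2;
        0 0 0 0];
x = sub(:);            % scan column by column into a 1-D sequence (assumed order)

% Each row of evts is [type value]: type 0 = run of zeros (value = run length),
% type 1 = non-zero sample (value = amplitude).
evts = zeros(0, 2);
runlen = 0;
for n = 1:numel(x)
  if x(n) == 0
    runlen = runlen + 1;              % extend the current run of zeros
  else
    if runlen > 0
      evts(end+1, :) = [0 runlen];    % close the pending zero run
      runlen = 0;
    end
    evts(end+1, :) = [1 x(n)];        % the non-zero sample is its own event
  end
end
if runlen > 0
  evts(end+1, :) = [0 runlen];        % trailing run of zeros
end
disp(evts);
```

For this 4×4 subimage the 16 pels become 7 events: four zero runs (of lengths 2, 5, 5 and 1) and three non-zero samples, so the ratio of events to pels is 7/16.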
The total entropy per event for an RLC subimage is calculated as before, from the histogram of events. However, to get the entropy per pel we scale the entropy per event by the ratio of the number of events (runs and non-zero samples) in the subimage to the number of pels in the subimage. Note that with RLC this ratio is no longer equal to one; it will hopefully be much less.
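As a worked example of this scaling, $H_{\text{pel}} = H_{\text{event}} \, N_{\text{events}} / N_{\text{pels}}$, the sketch below uses a made-up event histogram for a hypothetical 400-pel subimage:

```matlab
% Made-up histogram of RLC events for a hypothetical 400-pel subimage.
counts = [40 25 15 10 10];   % occurrences of each distinct event (run lengths and amplitudes)
npels  = 400;                % number of pels in the subimage

p       = counts / sum(counts);        % probability of each event
Hevent  = -sum(p .* log2(p));          % entropy per event, bits
nevents = sum(counts);                 % number of events (runs + non-zero samples)
Hpel    = Hevent * nevents / npels;    % entropy per pel, bits

fprintf('%.3f bits/event x %d/%d events/pel = %.3f bit/pel\n', Hevent, nevents, npels, Hpel);
```

Here 100 events cover 400 pels, so about 2.10 bits per event becomes about 0.53 bit/pel.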
Figure 2 gives the entropies per pel after RLC for each subimage, which are generally lower than the corresponding entropies without RLC. This is because RLC takes advantage of spatial clustering of the zero samples in a subimage, rather than depending only on the histogram of amplitudes.
Clearly, if all the zeros were clustered into a single run, they could be coded much more efficiently than if they were distributed among many runs. The entropy of the zero event tells us the mean number of bits needed to code each zero pel if the zero pels are distributed randomly, i.e. if the probability of a given pel being zero does not depend on the amplitudes of any nearby pels.
In typical bandpass subimages, non-zero samples tend to be clustered around key features such as object boundaries and areas of high texture. Hence RLC usually reduces the entropy of the data to be coded. There are many other ways to take advantage of clustering (correlation) of the data; RLC is just one of the simplest.
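To see the effect of clustering, the sketch below compares two made-up 100-sample scans with exactly the same amplitude histogram (90 zeros and 10 ones), so their entropy per pel without RLC is identical. The anonymous function `count_events` is a hypothetical helper that counts RLC events as the number of non-zero samples plus the number of zero runs:

```matlab
% Two made-up scans of 100 samples, each with 90 zeros and 10 ones.
clustered = [ones(1, 10) zeros(1, 90)];   % all 90 zeros form a single run
scattered = ones(1, 100);
scattered(mod(1:100, 10) ~= 0) = 0;       % ones only at samples 10, 20, ..., 100

% Hypothetical helper: RLC events = non-zero samples + runs of zeros.
% A zero run starts wherever the zero indicator rises from 0 to 1.
count_events = @(x) sum(x ~= 0) + sum(diff([0, x == 0]) == 1);

% Both scans have the same two-bin amplitude histogram, hence the same
% entropy per pel when RLC is not used.
pz = mean(clustered == 0);
Hamp = -(pz * log2(pz) + (1 - pz) * log2(1 - pz));

fprintf('events: clustered %d, scattered %d; entropy without RLC %.3f bit/pel\n', ...
        count_events(clustered), count_events(scattered), Hamp);
```

The clustered scan yields 11 events and the scattered one 20. Working the event histograms through by hand, the RLC entropy is about 0.048 bit/pel for the clustered scan and exactly 0.2 bit/pel for the scattered one, against about 0.469 bit/pel for both without RLC.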
In Table 1, comparing column 5 with column 3, we see the modest (7%) reduction in entropy per pel achieved by RLC, due to clustering in the Lenna image. The main advantage of RLC is apparent in column 6, which shows the mean bit rate per pel when we use a real Huffman code on the RLC histograms of Figure 2. The increase in bit rate over the RLC entropy is only $1.5071/1.4977 - 1 \approx 0.6\%$, compared with $1.8425/1.6103 - 1 \approx 14.4\%$ when RLC is not used (columns 3 and 4).
Finally, comparing column 6 with column 3, we see that, relative to the simple entropy measure, combined RLC and Huffman coding can reduce the bit rate by $1 - 1.5071/1.6103 \approx 6.4\%$.
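The percentage comparisons in this discussion can be reproduced from the totals row of Table 1. A short script for the arithmetic (the variable names are our own):

```matlab
% Totals row of Table 1 (bit/pel).
H_untr = 3.7106;  Hh_untr = 3.7676;   % columns 1 and 2: untransformed image
H_haar = 1.6103;  Hh_haar = 1.8425;   % columns 3 and 4: Haar transform
H_rlc  = 1.4977;  Hh_rlc  = 1.5071;   % columns 5 and 6: Haar transform + RLC

huff_loss_untr = Hh_untr / H_untr - 1;  % Huffman overhead, untransformed image
huff_loss      = Hh_haar / H_haar - 1;  % Huffman overhead, no RLC
huff_loss_rlc  = Hh_rlc / H_rlc - 1;    % Huffman overhead after RLC
rlc_gain       = 1 - H_rlc / H_haar;    % entropy reduction due to RLC
net_gain       = 1 - Hh_rlc / H_haar;   % RLC + Huffman vs. simple entropy

fprintf('Huffman overhead: %.1f%% (untransformed), %.1f%% (no RLC), %.1f%% (RLC)\n', ...
        100 * [huff_loss_untr, huff_loss, huff_loss_rlc]);
fprintf('RLC entropy reduction: %.1f%%, net saving: %.1f%%\n', 100 * [rlc_gain, net_gain]);
```

The same arithmetic shows that the Huffman overhead for the untransformed image (about 1.5%) is far smaller than for the transform subimages without RLC (about 14.4%).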
The following is the listing of the M-file to calculate the Huffman entropy from a given histogram.
```matlab
% Find Huffman code sizes: JPEG fig K.1, procedure Code_size.
% Input:  huffhist contains the histogram of event counts (frequencies).
% Output: huffent(k) = p(k) * codesize(k); sum(huffent) is the Huffman entropy
%         (mean code word length in bits per event).
freq = huffhist(:);
codesize = zeros(size(freq));   % Code word length of each event.
others = -ones(size(freq));     % Pointers to next symbols in code tree (-1 = end of chain).
% Find non-zero entries in freq, and loop until only 1 entry left.
nz = find(freq > 0);
while numel(nz) > 1
  % Find v1 for least value of freq(v1) > 0.
  [~, k] = min(freq(nz));
  v1 = nz(k);
  % Find v2 for next least value of freq(v2) > 0.
  nz = nz([1:(k-1) (k+1):numel(nz)]);   % Remove v1 from nz.
  [~, k] = min(freq(nz));
  v2 = nz(k);
  % Combine frequency values; the merged node is stored at v1.
  freq(v1) = freq(v1) + freq(v2);
  freq(v2) = 0;
  % Increment code sizes for all codewords in the v1 branch.
  codesize(v1) = codesize(v1) + 1;
  while others(v1) > -1
    v1 = others(v1);
    codesize(v1) = codesize(v1) + 1;
  end
  % Chain the v2 branch onto the end of the v1 branch.
  others(v1) = v2;
  % Increment code sizes for all codewords in the v2 branch.
  codesize(v2) = codesize(v2) + 1;
  while others(v2) > -1
    v2 = others(v2);
    codesize(v2) = codesize(v2) + 1;
  end
  nz = find(freq > 0);
end
% Generate Huffman entropies by multiplying probabilities by code sizes.
huffent = (huffhist(:) / sum(huffhist(:))) .* codesize;
```