Source Coding

Module by: Behnaam Aazhang

Summary: An introduction to the concept of typical sequences, which lie at the heart of source coding. The idea of typical sequences leads to Shannon's source-coding theorem.

As mentioned earlier, how much a source can be compressed should be related to its entropy. In 1948, Claude E. Shannon introduced three theorems and developed a rigorous mathematical framework for digital communications. In one of the three theorems, Shannon relates entropy to the minimum number of bits per second required to represent a source without much loss (or distortion).

Consider a source that is modeled by a discrete-time, discrete-valued random process $X_1, X_2, \dots, X_n, \dots$ where each $x_i \in \{a_1, a_2, \dots, a_N\}$, and define $p_{X_i}(x_i = a_j) = p_j$ for $j = 1, 2, \dots, N$, where it is assumed that $X_1, X_2, \dots, X_n$ are mutually independent and identically distributed.

Consider a sequence of length $n$:

$$X = (X_1, X_2, \dots, X_n) \tag{1}$$
The symbol $a_1$ occurs with probability $p_1$. Therefore, in a sequence of length $n$, $a_1$ will, on average, appear $n p_1$ times with high probability if $n$ is very large.

Therefore,

$$P(X = x) = p_{X_1}(x_1)\, p_{X_2}(x_2) \cdots p_{X_n}(x_n) \tag{2}$$

$$P(X = x) \approx p_1^{n p_1}\, p_2^{n p_2} \cdots p_N^{n p_N} = \prod_{i=1}^{N} p_i^{n p_i} \tag{3}$$

where $p_i = P(X_j = a_i)$ for all $j$ and for all $i$.
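As a quick sanity check of this counting argument, the sketch below (the three-symbol alphabet and its probabilities are illustrative assumptions, not from the module) draws a long i.i.d. sequence and compares each symbol's empirical count with $n p_i$:

    import random
    from collections import Counter

    # Illustrative i.i.d. source; alphabet and probabilities are assumed.
    alphabet = ["a1", "a2", "a3"]
    p = [0.5, 0.3, 0.2]
    n = 100_000

    seq = random.choices(alphabet, weights=p, k=n)
    counts = Counter(seq)
    for a, pi in zip(alphabet, p):
        print(f"{a}: observed {counts[a]}, expected about {n * pi:.0f}")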

A typical sequence $X$ may look like

$$X = (a_2, a_1, a_N, a_2, a_5, a_1, \dots, a_N, a_6) \tag{4}$$

where $a_i$ appears $n p_i$ times with large probability. This is referred to as a typical sequence. The probability of $X$ being a typical sequence is

$$P(X = x) \approx \prod_{i=1}^{N} p_i^{n p_i} = \prod_{i=1}^{N} 2^{\log_2 p_i^{n p_i}} = \prod_{i=1}^{N} 2^{n p_i \log_2 p_i} = 2^{n \sum_{i=1}^{N} p_i \log_2 p_i} = 2^{-n H(X)} \tag{5}$$

where $H(X)$ is the entropy of the random variables $X_1, X_2, \dots, X_n$.
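As a numerical check (the probabilities are taken from Example 1 below), a sequence in which each $a_i$ appears exactly $n p_i$ times has probability $\prod_i p_i^{n p_i}$, which equals $2^{-n H(X)}$ exactly:

    import math

    p = [1/2, 1/4, 1/8, 1/8]   # distribution from Example 1 below
    n = 8

    H = -sum(pi * math.log2(pi) for pi in p)              # H(X) = 1.75 bits
    prob_typical = math.prod(pi ** (n * pi) for pi in p)  # prod of p_i^(n p_i)

    print(prob_typical, 2 ** (-n * H))                    # both are 2^(-14)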

For large $n$, almost all the output sequences of length $n$ of the source are equally probable, each with probability $2^{-n H(X)}$. These are the typical sequences. The probability of the nontypical sequences is negligible. There are $N^n$ different sequences of length $n$ with an alphabet of size $N$, and the total probability of the typical sequences is almost 1:

$$\sum_{k=1}^{\#\text{ of typical seq.}} 2^{-n H(X)} \approx 1 \tag{6}$$

Since each typical sequence has probability $2^{-n H(X)}$ and the sum is approximately 1, the number of typical sequences is approximately $2^{n H(X)}$.

Figure 1

Example 1

Consider a source with alphabet $\{A, B, C, D\}$ with probabilities $\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\}$. Assume $X_1, X_2, \dots, X_8$ is an independent and identically distributed sequence with $X_i \in \{A, B, C, D\}$ with the above probabilities.

$$H(X) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{4}\log_2\frac{1}{4} - \frac{1}{8}\log_2\frac{1}{8} - \frac{1}{8}\log_2\frac{1}{8} = \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{3}{8} = \frac{4 + 4 + 6}{8} = \frac{14}{8} \tag{7}$$

The number of typical sequences of length 8 is

$$2^{8 \times \frac{14}{8}} = 2^{14} \tag{8}$$

The number of nontypical sequences is $4^8 - 2^{14} = 2^{16} - 2^{14} = 2^{14}(4 - 1) = 3 \times 2^{14}$.

Examples of typical sequences include those with $A$ appearing $8 \times \frac{1}{2} = 4$ times, $B$ appearing $8 \times \frac{1}{4} = 2$ times, etc.: $\{A, D, B, B, A, A, C, A\}$, $\{A, A, A, A, C, D, B, B\}$, and many more.

Examples of nontypical sequences of length 8: $\{D, D, B, C, C, A, B, D\}$, $\{C, C, C, C, C, B, C, C\}$, and many more. Indeed, these definitions and arguments are valid only when $n$ is very large. The probability that a source output lies in the set of typical sequences approaches 1 as $n \to \infty$, and the probability that it lies in the set of nontypical sequences approaches 0 as $n \to \infty$.
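This concentration can be checked by simulation. The sketch below (the sequence length, trial count, and tolerance $\epsilon$ are assumptions) samples long sequences from the Example 1 source and measures how often the per-symbol information $-\frac{1}{n}\log_2 P(X = x)$ falls within $\epsilon$ of $H(X) = 1.75$; for large $n$ this fraction approaches 1:

    import math
    import random

    # Source from Example 1: P(A)=1/2, P(B)=1/4, P(C)=1/8, P(D)=1/8.
    probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
    H = -sum(p * math.log2(p) for p in probs.values())   # 14/8 = 1.75 bits

    n = 1000        # sequence length; "very large" per the text (assumption)
    trials = 2000   # number of sampled sequences (assumption)
    eps = 0.1       # typicality tolerance (assumption)

    symbols, weights = zip(*probs.items())
    surprise = {s: -math.log2(p) for s, p in probs.items()}

    typical = 0
    for _ in range(trials):
        seq = random.choices(symbols, weights=weights, k=n)
        rate = sum(surprise[s] for s in seq) / n   # -(1/n) log2 P(X = x)
        if abs(rate - H) <= eps:
            typical += 1

    print(f"H(X) = {H} bits/symbol")
    print(f"fraction of sampled sequences that are eps-typical: {typical / trials:.3f}")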

The essence of source coding or data compression is that, as $n \to \infty$, nontypical sequences almost never appear as the output of the source. Therefore, one only needs to represent the typical sequences as binary codes and can ignore the nontypical sequences. Since there are only $2^{n H(X)}$ typical sequences of length $n$, it takes $n H(X)$ bits to represent them. Thus, on average, it takes $H(X)$ bits per source output to represent a simple source that produces independent and identically distributed outputs.

Theorem 1: Shannon's Source-Coding Theorem

A source that produces independent and identically distributed random variables with entropy $H$ can be encoded with arbitrarily small error probability at any rate $R$ in bits per source output if $R > H$. Conversely, if $R < H$, the error probability is bounded away from zero, independent of the complexity of the encoder and decoder.

The source-coding theorem proves the existence of source-coding techniques that achieve rates close to the entropy, but it does not provide any algorithms or ways to construct such codes.

If the source is not i.i.d. (independent and identically distributed), but it is stationary with memory, then a similar theorem applies with the entropy $H(X)$ replaced with the entropy rate

$$H = \lim_{n \to \infty} H\left(X_n \mid X_1, X_2, \dots, X_{n-1}\right)$$

In the case of a source with memory, the more outputs the source produces, the more one knows about the source and the more one can compress.
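For a concrete instance, consider a stationary first-order Markov source, for which the entropy rate reduces to $H(X_n \mid X_{n-1})$. The sketch below (the two-state transition matrix is an assumption chosen for illustration) computes the entropy rate and compares it with the entropy of the marginal distribution; the gap is exactly the compression gain that memory provides:

    import math

    # Hypothetical two-state stationary Markov source (transition matrix assumed).
    P = [[0.9, 0.1],
         [0.4, 0.6]]

    # Stationary distribution solves pi = pi P; closed form for two states.
    pi0 = P[1][0] / (P[0][1] + P[1][0])
    pi = [pi0, 1 - pi0]

    # Entropy rate of a first-order Markov chain: sum_i pi_i * H(row_i).
    H_rate = sum(pi[i] * -sum(pij * math.log2(pij) for pij in P[i] if pij > 0)
                 for i in range(2))

    # Entropy of the marginal, i.e., a memoryless model with the same symbol frequencies.
    H_marginal = -sum(q * math.log2(q) for q in pi)

    print(f"entropy rate = {H_rate:.4f} bits/symbol")
    print(f"memoryless H = {H_marginal:.4f} bits/symbol")   # always >= entropy rate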

Example 2

The English language has 26 letters; with the space character, the alphabet size becomes 27. If English is modeled as a memoryless source (no dependency between letters in a word), then the entropy is $H(X) = 4.03$ bits/letter.

If the dependency between letters in a text is captured in a model, the entropy rate can be derived to be $H = 1.3$ bits/letter. Note that a straightforward (non-information-theoretic) representation of text requires 5 bits/letter, since $2^5 = 32$ is the smallest power of 2 that can cover all 27 symbols. Shannon's results indicate that there may be a compression algorithm with a rate of 1.3 bits/letter.
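A two-line check of this arithmetic (the 1.3 bits/letter figure is taken from the text above):

    import math

    alphabet_size = 27                             # 26 letters + space
    fixed = math.ceil(math.log2(alphabet_size))    # fixed-length code: 5 bits/letter
    print(fixed)
    print(f"potential compression vs fixed-length: {fixed / 1.3:.1f}x")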

Although Shannon's results are not constructive, there are a number of source-coding algorithms for discrete-time, discrete-valued sources that come close to Shannon's bound. One such algorithm is the Huffman source-coding algorithm. Another is the Lempel-Ziv algorithm.
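To make the Huffman algorithm concrete, here is a minimal sketch (not from the original module) that builds a binary Huffman code by repeatedly merging the two least probable subtrees. Applied to the Example 1 source, whose probabilities are powers of 1/2, the average codeword length lands exactly on the entropy $H(X) = 1.75$ bits:

    import heapq
    import itertools

    def huffman_code(probs):
        """Build a binary Huffman code for a {symbol: probability} table."""
        counter = itertools.count()  # tie-breaker so the heap never compares dicts
        heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)   # pop the two least probable subtrees
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}   # prepend 0 to one subtree
            merged.update({s: "1" + w for s, w in c2.items()})  # and 1 to the other
            heapq.heappush(heap, (p1 + p2, next(counter), merged))
        return heap[0][2]

    # Source from Example 1.
    probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
    code = huffman_code(probs)
    avg_length = sum(probs[s] * len(w) for s, w in code.items())
    print(code)
    print(f"average length = {avg_length} bits/symbol")  # 1.75 = H(X)

For probabilities that are not powers of 1/2, Huffman coding remains optimal among symbol-by-symbol codes, but its average length can exceed $H(X)$ by up to one bit per symbol.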

Huffman codes and Lempel-Ziv codes apply to compression problems where the source produces discrete-time, discrete-valued outputs. For cases where the source is analog, there are powerful compression algorithms that specify all the steps from sampling through quantization to binary representation. These are referred to as waveform coders. JPEG, MPEG, and vocoders are examples for image, video, and voice, respectively.
