Connexions

This module is included in the lens Rice University OpenCourseWare (by: OpenCourseWare Consortium), as a part of the collection "Digital Communication Systems".

Entropy

Module by: Behnaam Aazhang

Summary: This module presents a quantification of information by the use of entropy. Entropy, or average self-information, measures the uncertainty of a source and hence provides a measure of the information it could reveal.

Information sources take very different forms. Since the information is not known to the destination, it is best modeled as a random process, either discrete-time or continuous-time.
Here are a few examples:
  • A digital data source (e.g., a text) can be modeled as a discrete-time, discrete-valued random process $X_1, X_2, \ldots$, where $X_i \in \{A, B, C, D, E\}$, with particular marginal distributions $p_{X_1}(x), p_{X_2}(x), \ldots$, specific joint distributions $p_{X_1, X_2}, p_{X_2, X_3}, \ldots$, and $p_{X_1, X_2, X_3}, p_{X_2, X_3, X_4}, \ldots$, etc.
  • Video signals can be modeled as a continuous-time random process. The power spectral density is bandlimited to around 5 MHz (the value depends on the standards used to raster the frames of the image).
  • Audio signals can be modeled as a continuous-time random process. It has been demonstrated that the power spectral density of speech signals is bandlimited between 300 Hz and 3400 Hz. For example, over a small observation period the speech signal can be modeled as a Gaussian process with the power spectral density shown in Figure 1.
Figure7-5.png
Figure 1
These analog information signals are bandlimited. Therefore, if sampled faster than the Nyquist rate, they can be reconstructed from their sample values.
Example 1 
A speech signal with a bandwidth of 3100 Hz can be sampled at a rate of 6.2 kHz. If the samples are quantized with an 8-level quantizer, then the speech signal can be represented by a binary sequence with a rate of
$$6.2 \times 10^{3} \times \log_2 8 = 18600 \ \frac{\text{bits}}{\text{sample}} \cdot \frac{\text{samples}}{\text{sec}} = 18.6 \ \frac{\text{kbits}}{\text{sec}}$$ (1)
Figure7-6.png
Figure 2
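The arithmetic of Example 1 can be checked with a short sketch. It assumes that every sample is encoded with a fixed-length code of $\log_2(\text{levels})$ bits; the function name `bit_rate` is illustrative and not part of the module.

```python
import math


def bit_rate(sample_rate_hz: float, levels: int) -> float:
    """Bits per second when each sample is coded with log2(levels) bits.

    Assumes a fixed-length binary code per quantized sample.
    """
    bits_per_sample = math.log2(levels)
    return sample_rate_hz * bits_per_sample


# Example 1: 6.2 kHz sampling rate and an 8-level quantizer (3 bits per sample).
rate = bit_rate(6.2e3, 8)
print(f"{rate / 1e3:.1f} kbits/sec")
```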
The sampled real values can be quantized to create a discrete-time discrete-valued random process. Since any bandlimited analog information signal can be converted to a sequence of discrete random variables, we will continue the discussion only for discrete random variables.
Example 2 
The random variable $x$ takes the value 0 with probability 0.9 and the value 1 with probability 0.1. The statement that $x = 1$ carries more information than the statement that $x = 0$. The reason is that $x$ is expected to be 0, so learning that $x = 1$ is more surprising news. An intuitive measure of information should therefore be larger when the probability of the event is smaller.
Example 3 
The information content in the statement about the temperature and the pollution level on July 15th in Chicago should be the sum of the information in the statement that July 15th in Chicago was hot and the information in the statement that it was highly polluted, since pollution and temperature could be independent.
$$I(\text{hot}, \text{high}) = I(\text{hot}) + I(\text{high})$$ (2)
An intuitive and meaningful measure of information should have the following properties:
  1. Self-information should decrease with increasing probability.
  2. The self-information of two independent events occurring together should be the sum of their individual self-informations.
  3. Self-information should be a continuous function of the probability.
The only function satisfying the above conditions, up to the choice of the logarithm base, is the negative logarithm of the probability, $I = -\log p$.
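As a small illustration of these properties, the sketch below evaluates $-\log_2 p$ for the two outcomes of Example 2 and checks additivity for a pair of independent events. The probabilities `p_hot` and `p_high` are hypothetical values chosen only for the check; they do not come from the module.

```python
import math


def self_information(p: float) -> float:
    """Self-information -log2(p), in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


# Example 2: the rare outcome x = 1 carries more information than x = 0.
i_one = self_information(0.1)
i_zero = self_information(0.9)
print(f"I(x=1) = {i_one:.3f} bits, I(x=0) = {i_zero:.3f} bits")

# Property 2: for independent events the joint probability is the product,
# so the self-information of the joint event is the sum of the two terms.
p_hot, p_high = 0.25, 0.5  # hypothetical probabilities
i_joint = self_information(p_hot * p_high)
i_sum = self_information(p_hot) + self_information(p_high)
print(f"I(hot, high) = {i_joint:.3f} bits, I(hot) + I(high) = {i_sum:.3f} bits")
```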
Definition 1: Entropy
1. The entropy (average self-information) of a discrete random variable $X$ is a function of its probability mass function and is defined as
$$H(X) = -\sum_{i=1}^{N} p_X(x_i) \log p_X(x_i)$$ (3)
where $N$ is the number of possible values of $X$ and $p_X(x_i) = \Pr[X = x_i]$. If the logarithm is base 2, then the unit of entropy is bits. Entropy is a measure of the uncertainty in a random variable and a measure of the information it can reveal.
2. A more basic explanation of entropy is provided in another module.
Example 4 
If a source produces binary information $\{0, 1\}$ with probabilities $p$ and $1 - p$, the entropy of the source is
$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$ (4)
If $p = 0$ then $H(X) = 0$, if $p = 1$ then $H(X) = 0$ (using the convention $0 \log_2 0 = 0$), and if $p = 1/2$ then $H(X) = 1$ bit. The source has its largest entropy if $p = 1/2$, and the source provides no new information if $p = 0$ or $p = 1$.
Figure7-10.png
Figure 3
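The binary entropy of Example 4 can be tabulated with a few lines of code. This is a minimal sketch: the function name `binary_entropy` is illustrative, and the endpoints $p = 0$ and $p = 1$ are handled with the usual convention $0 \log_2 0 = 0$.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a binary source with probabilities p and 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log2(0) = 0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Entropy is zero at the endpoints, symmetric about p = 1/2, and peaks there.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p = {p:.1f}: H(X) = {binary_entropy(p):.4f} bits")
```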
Example 5 
An analog source is modeled as a continuous-time random process with power spectral density bandlimited to the band between 0 and 4000 Hz. The signal is sampled at the Nyquist rate. The random variables that result from sampling are assumed to be independent. The samples are quantized to 5 levels $\{-2, -1, 0, 1, 2\}$. The probabilities of the samples taking the quantized values are $\left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{16}\right\}$, respectively. The entropy of each random variable is
$$H(X) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{4}\log_2\frac{1}{4} - \frac{1}{8}\log_2\frac{1}{8} - \frac{1}{16}\log_2\frac{1}{16} - \frac{1}{16}\log_2\frac{1}{16} = \frac{1}{2}\log_2 2 + \frac{1}{4}\log_2 4 + \frac{1}{8}\log_2 8 + \frac{1}{16}\log_2 16 + \frac{1}{16}\log_2 16 = \frac{1}{2} + \frac{1}{2} + \frac{3}{8} + \frac{4}{8} = \frac{15}{8} \ \frac{\text{bits}}{\text{sample}}$$ (5)
There are 8000 samples per second. Therefore, the source produces $8000 \times \frac{15}{8} = 15000 \ \frac{\text{bits}}{\text{sec}}$ of information.
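The numbers in Example 5 can be reproduced with a general entropy routine. The sketch below is only an illustration: `entropy_bits` is a hypothetical helper name, and the sampling rate is taken as twice the 4000 Hz band edge, as stated in the example.

```python
import math


def entropy_bits(pmf):
    """Entropy, in bits, of a discrete pmf given as a list of probabilities."""
    if not math.isclose(sum(pmf), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability values contribute nothing (0 * log2(0) = 0).
    return -sum(p * math.log2(p) for p in pmf if p > 0)


levels = [-2, -1, 0, 1, 2]
probs = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 16]

h = entropy_bits(probs)       # bits per sample
sample_rate = 2 * 4000        # Nyquist rate for a 0 to 4000 Hz band, samples/sec
info_rate = sample_rate * h   # bits per second, samples assumed independent

print(f"H(X) = {h} bits/sample, information rate = {info_rate} bits/sec")
```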
Definition 2: Joint Entropy
The joint entropy of two discrete random variables ($X$, $Y$) is defined by
$$H(X, Y) = -\sum_{i} \sum_{j} p_{X,Y}(x_i, y_j) \log p_{X,Y}(x_i, y_j)$$ (6)
The joint entropy for a random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$ is defined as
$$H(\mathbf{X}) = -\sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} p_{\mathbf{X}}(x_1, x_2, \ldots, x_n) \log p_{\mathbf{X}}(x_1, x_2, \ldots, x_n)$$ (7)
Definition 3: Conditional Entropy
The conditional entropy of the random variable $X$ given the random variable $Y$ is defined by
$$H(X | Y) = -\sum_{i} \sum_{j} p_{X,Y}(x_i, y_j) \log p_{X|Y}(x_i | y_j)$$ (8)
It is easy to show that
$$H(\mathbf{X}) = H(X_1) + H(X_2 | X_1) + \cdots + H(X_n | X_1, X_2, \ldots, X_{n-1})$$ (9)
and
$$H(X, Y) = H(Y) + H(X | Y) = H(X) + H(Y | X)$$ (10)
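One way to see the first equality in (10) is to factor the joint probability as $p_{X,Y}(x_i, y_j) = p_Y(y_j)\, p_{X|Y}(x_i | y_j)$ inside the logarithm of the joint entropy. The short derivation below is a sketch of that step; the second equality follows by exchanging the roles of $X$ and $Y$.

```latex
\begin{align*}
H(X, Y)
  &= -\sum_{i} \sum_{j} p_{X,Y}(x_i, y_j)
       \log \bigl[ p_Y(y_j)\, p_{X|Y}(x_i | y_j) \bigr] \\
  &= -\sum_{j} \Bigl[ \sum_{i} p_{X,Y}(x_i, y_j) \Bigr] \log p_Y(y_j)
     - \sum_{i} \sum_{j} p_{X,Y}(x_i, y_j) \log p_{X|Y}(x_i | y_j) \\
  &= -\sum_{j} p_Y(y_j) \log p_Y(y_j) + H(X | Y) \\
  &= H(Y) + H(X | Y)
\end{align*}
```

Applying the same factorization repeatedly to $p_{\mathbf{X}}(x_1, x_2, \ldots, x_n)$ gives the chain rule (9).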
If $X_1, X_2, \ldots, X_n$ are mutually independent, it is easy to show that
$$H(\mathbf{X}) = \sum_{i=1}^{n} H(X_i)$$ (11)
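The identities (10) and (11) can also be checked numerically. The sketch below uses a small joint pmf whose values are hypothetical (they are not taken from the module), computes the conditional entropy directly from definition (8), and then builds an independent pair from the marginals to check additivity.

```python
import math


def entropy_bits(probs):
    """Entropy, in bits, of an iterable of probabilities that sum to 1."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical joint pmf p_{X,Y}(x_i, y_j): rows index x_i, columns index y_j.
joint = [[0.25, 0.25],
         [0.40, 0.10]]

p_x = [sum(row) for row in joint]        # marginal of X
p_y = [sum(col) for col in zip(*joint)]  # marginal of Y

h_xy = entropy_bits(p for row in joint for p in row)
h_x = entropy_bits(p_x)
h_y = entropy_bits(p_y)

# Conditional entropy from definition (8), with p_{X|Y} = p_{X,Y} / p_Y.
h_x_given_y = -sum(
    joint[i][j] * math.log2(joint[i][j] / p_y[j])
    for i in range(len(p_x))
    for j in range(len(p_y))
    if joint[i][j] > 0
)

# Independent pair with the same marginals: joint pmf is the product.
indep = [[px * py for py in p_y] for px in p_x]
h_indep = entropy_bits(p for row in indep for p in row)

print(f"H(X,Y) = {h_xy:.4f}, H(Y) + H(X|Y) = {h_y + h_x_given_y:.4f}")
print(f"independent case: H(X,Y) = {h_indep:.4f}, H(X) + H(Y) = {h_x + h_y:.4f}")
```

In this hypothetical example $X$ and $Y$ are dependent, so the joint entropy of `joint` comes out smaller than $H(X) + H(Y)$, while the product pmf meets it with equality.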
Definition 4: Entropy Rate
The entropy rate of a stationary discrete-time random process is defined by
$$H = \lim_{n \to \infty} H(X_n | X_1, X_2, \ldots, X_{n-1})$$ (12)
The limit exists and is equal to
$$H = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n)$$ (13)
The entropy rate measures the uncertainty, and hence the information content, per output symbol of the source.
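As a simple worked case, consider a source whose outputs are independent and identically distributed, such as the quantized samples of Example 5. Under that assumption, the additivity property (11) makes the limit in (13) immediate:

```latex
\begin{align*}
H &= \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n)
   = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} H(X_i)
   = \lim_{n \to \infty} \frac{1}{n} \cdot n \, H(X_1)
   = H(X_1)
\end{align*}
```

The same value follows from conditioning on the past, because for independent samples $H(X_n | X_1, X_2, \ldots, X_{n-1}) = H(X_n) = H(X_1)$. For the source of Example 5 the entropy rate is therefore $\frac{15}{8}$ bits per sample. For sources with memory, the entropy rate is in general smaller than $H(X_1)$.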
Entropy is closely tied to source coding: the extent to which a source can be compressed is related to its entropy. In 1948, Claude E. Shannon introduced a theorem that relates the entropy to the number of bits per second required to represent a source without much loss.
