OpenStax-CNX
Error Correction and the Hamming Distance

Module by: Don Johnson. E-mail the author

Summary: So-called linear codes create error-correction bits by combining the data bits linearly. Topics discussed include generator matrices and the Hamming distance.


So-called linear codes create error-correction bits by combining the data bits linearly. Here, the phrase "linear combination" means single-bit binary arithmetic.

Figure 1: Single-bit binary arithmetic.
Addition: 0+0=0, 1+1=0, 0+1=1, 1+0=1
Multiplication: 0·0=0, 1·1=1, 0·1=0, 1·0=0
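In code, the mod-2 arithmetic of Figure 1 reduces to the bitwise XOR and AND operators. A minimal sketch (the function names `add2` and `mul2` are ours, for illustration):

```python
# Single-bit binary arithmetic: mod-2 addition is XOR, multiplication is AND.
def add2(a, b):
    """Mod-2 addition of single bits."""
    return a ^ b

def mul2(a, b):
    """Mod-2 multiplication of single bits."""
    return a & b

# Reproduce the tables of Figure 1.
for a in (0, 1):
    for b in (0, 1):
        print(f"{a}+{b}={add2(a, b)}  {a}*{b}={mul2(a, b)}")
```

Note in particular that 1+1=0: addition and subtraction coincide, a fact used below when computing Hamming distances.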

For example, let's consider the specific (7,4) error correction code described by the following coding table and, more concisely, by the succeeding matrix expression.

Figure 2
c1 = b1
c2 = b2
c3 = b3
c4 = b4
c5 = b1 ⊕ b2 ⊕ b3
c6 = b2 ⊕ b3 ⊕ b4
c7 = b1 ⊕ b2 ⊕ b4
or
c = Gb, where
G = ( 1 0 0 0
      0 1 0 0
      0 0 1 0
      0 0 0 1
      1 1 1 0
      0 1 1 1
      1 1 0 1 )
c = (c1 c2 c3 c4 c5 c6 c7)^T, b = (b1 b2 b3 b4)^T

The length-K block of data bits is represented by the vector b, and the length-N output block of the channel coder, known as a codeword, by c. For example, if b = (1 0 0 0)^T, then c = (1 0 0 0 1 0 1)^T. The generator matrix G defines all block-oriented linear channel coders.
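The matrix-vector product c = Gb can be sketched directly in Python, using the G of Figure 2 (the helper name `encode` is ours):

```python
# The (7,4) generator matrix from Figure 2; rows 5-7 are the parity equations.
G = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],  # c5 = b1 + b2 + b3
    [0, 1, 1, 1],  # c6 = b2 + b3 + b4
    [1, 1, 0, 1],  # c7 = b1 + b2 + b4
]

def encode(b):
    """Compute c = Gb with mod-2 arithmetic."""
    return [sum(g * bit for g, bit in zip(row, b)) % 2 for row in G]

print(encode([1, 0, 0, 0]))  # -> [1, 0, 0, 0, 1, 0, 1]
```

This reproduces the example above: the data block 1000 encodes to the codeword 1000101.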

In this (7,4) code, 2^4 = 16 of the 2^7 = 128 possible blocks arriving at the channel decoder correspond to error-free transmission and reception. In analyzing the error-correcting capability of this or any other channel code, we must have a systematic way of assigning a codeword to a received block when communication errors occur. Note that the number of possible received blocks is much larger than the number of codewords; the channel has ample opportunity to confuse us! Conceptually, we want to assign to each possible received length-N block a length-K block of data bits. This assignment should correspond to the most probable error pattern that could have transformed a transmitted block into the one received. Because the matched-filter receiver guarantees that the single-bit error probability p_e of the digital channel will be less than 1/2, the most probable error pattern is the one having the fewest errors. The way to assess error patterns is to compute the distance between the received block and all codewords. The Hamming distance between blocks c1 and c2, denoted by d(c1, c2), equals the sum of the mod-2 difference between the bit sequences comprising the two blocks. For example, if c1 = (1 0 0 0 1 0 1)^T and c2 = (1 0 1 0 1 0 1)^T, the difference equals (0 0 1 0 0 0 0)^T, the number of nonzero terms equals one, and thus d(c1, c2) = 1. Interestingly, examination of the binary arithmetic table reveals that subtraction and addition are equivalent. We can express the Hamming distance as

d(c1, c2) = sum(c1 ⊕ c2)
(1)
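Equation (1) can be sketched in a few lines of Python (the function name `hamming_distance` is ours):

```python
def hamming_distance(c1, c2):
    """Equation (1): the sum of the mod-2 difference between two blocks."""
    return sum((a + b) % 2 for a, b in zip(c1, c2))

# The example from the text: the blocks differ only in their third bit.
c1 = [1, 0, 0, 0, 1, 0, 1]
c2 = [1, 0, 1, 0, 1, 0, 1]
print(hamming_distance(c1, c2))  # -> 1
```

Because mod-2 addition and subtraction coincide, summing the bitwise XOR counts exactly the positions where the blocks disagree.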

Error correction amounts to searching for the codeword c closest to the received block in terms of the Hamming distance between the two. The error correction capability of a channel code is limited by how close together any two error-free blocks are. Bad codes would produce blocks close together, which would result in ambiguity when assigning a block of data bits to a received block. The quantity to examine, therefore, in designing error correction codes is the minimum distance between codewords.

d_min = min d(c_i, c_j)          with          c_i ≠ c_j
(2)
Suppose this minimum distance is one; this means the channel code has two codewords with the not-so-desirable property that one error occurring somewhere within the N-bit block transforms one codeword into another! The receiver would never realize that an error occurred in this case. Such a channel code would have no error-correction capability, and because the energy/bit must be reduced to insert error correction bits, we obtain larger error probabilities. Now suppose the minimum distance between error-free blocks is two. This situation is better than the previous one: If one error occurs anywhere within a block, no matter what codeword was transmitted, we can detect the fact that an error occurred. Unfortunately, we cannot correct the error. Because the minimum distance is two, a received block containing one error can be a distance of one from two (or more) codewords; we cannot associate such a block uniquely with a codeword. We must be resigned to the fact that to correct errors, we must have the minimum distance between error-free blocks be at least three.
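The nearest-codeword search described above can be sketched by brute force for the (7,4) code: build a table of all 16 codewords and pick the one at minimum Hamming distance from the received block (the helper names `encode`, `distance`, and `decode` are ours):

```python
from itertools import product

# Generator matrix of the (7,4) code from Figure 2.
G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 0, 1]]

def encode(b):
    """c = Gb with mod-2 arithmetic."""
    return [sum(g * x for g, x in zip(row, b)) % 2 for row in G]

def distance(c1, c2):
    """Hamming distance: count of positions where the blocks differ."""
    return sum(a != b for a, b in zip(c1, c2))

# All 16 codewords, keyed by their data bits.
codewords = {bits: encode(list(bits)) for bits in product((0, 1), repeat=4)}

def decode(received):
    """Return the data bits whose codeword is Hamming-closest to `received`."""
    return min(codewords, key=lambda bits: distance(codewords[bits], received))

# A single-bit error in c = 1000101 (third bit flipped) is corrected:
print(decode([1, 0, 1, 0, 1, 0, 1]))  # -> (1, 0, 0, 0)
```

Because this code's minimum distance turns out to be three, a block containing one error lies at distance one from exactly one codeword, so the minimum is unambiguous.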

Exercise 1

Suppose we want a channel code to have an error-correction capability of n bits. What must the minimum Hamming distance between codewords, d_min, be?

Solution

d_min = 2n + 1

How do we calculate the minimum distance between codewords? Because we have 2^K codewords, the number of possible unique pairs equals 2^(K-1)(2^K − 1), which can be a large number. Recall that our channel coding procedure is linear, with c = Gb. Therefore c_i ⊕ c_j = G(b_i ⊕ b_j). Because b_i ⊕ b_j always yields another block of data bits, we find that the difference between any two codewords is another codeword! Thus, to find d_min we need only compute the number of ones that comprise all non-zero codewords. Finding these codewords is easy once we examine the coder's generator matrix. Note that the columns of G are codewords, and that all codewords can be found by all possible pairwise sums of the columns. To find d_min, we need only count the number of bits in each column and sums of columns. For our example (7,4) code, G's first column has three ones, the next one four, and the last two three. No sum of columns has fewer than three bits, meaning d_min = 3, and we have a channel coder that can correct all occurrences of one error within a received 7-bit block.
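The linearity argument above can be checked directly: because every difference of codewords is itself a codeword, d_min equals the smallest number of ones in any non-zero codeword. A brute-force sketch over all 15 non-zero data blocks (helper names are ours):

```python
from itertools import product

# Generator matrix of the (7,4) code from Figure 2.
G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 0, 1]]

def encode(b):
    """c = Gb with mod-2 arithmetic."""
    return [sum(g * x for g, x in zip(row, b)) % 2 for row in G]

# d_min = smallest weight (number of ones) among the non-zero codewords.
weights = [sum(encode(list(b))) for b in product((0, 1), repeat=4) if any(b)]
print(min(weights))  # -> 3
```

The result agrees with the column-counting argument: d_min = 3, so the code corrects any single error in a 7-bit block.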

Exercise 2

Find the generator matrix for our three-fold repetition code. Show that it indeed has single-bit error correction capability. How many repetitions would be required to yield a double-error-correcting channel code?

Solution

G = (1 1 1)^T. Because this code has only one non-zero codeword, and it has three ones, the code has single-bit error correction capability. For double-bit error correction, we would need a five-fold repetition code.
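The repetition code's nearest-codeword decoder reduces to a majority vote, since the only codewords are all-zeros and all-ones. A sketch (the helper names are ours):

```python
def encode_repetition(bit, n):
    """n-fold repetition code: G is an n-by-1 column of ones."""
    return [bit] * n

def decode_repetition(block):
    """Majority vote: pick the Hamming-closer of all-zeros and all-ones."""
    return int(sum(block) > len(block) / 2)

# Three-fold repetition (d_min = 3 = 2*1 + 1) corrects any single error:
print(decode_repetition([1, 0, 1]))  # -> 1
# Five-fold repetition (d_min = 5 = 2*2 + 1) corrects any two errors:
print(decode_repetition([1, 0, 1, 0, 1]))  # -> 1
```

With an odd number of repetitions, the vote never ties, matching the d_min = 2n + 1 rule from Exercise 1.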

We must question whether a (7,4) code's error correction capability compensates for the increased error probability due to the necessarily reduced bit energy. (The repetition code, for example, does not meet this requirement.) Figure 3 shows that if the signal-to-noise ratio is large enough, channel coding indeed yields a smaller overall error probability.

Figure 3: The probability of an error occurring in K = 4 transmitted data bits equals 1 − (1 − p_e)^4, as (1 − p_e)^4 equals the probability that the four bits are received without error. The upper curve displays how this probability of an error anywhere in the four-bit block varies with signal-to-noise ratio. When a (7,4) single-bit error correcting code is used, the transmitter reduces the energy it expends during a single-bit transmission by 4/7, appending three extra bits for error correction. Now the probability of any bit in the seven-bit block being in error after error correction equals 1 − (1 − p_e)^7 − 7 p_e (1 − p_e)^6, where p_e is the probability of a bit error occurring in the channel when channel coding occurs. Here, 7 p_e (1 − p_e)^6 equals the probability of exactly one of seven bits emerging from the channel in error; the channel decoder corrects this type of error, and all data bits in the block are received correctly.
Figure 3 (hamming.png)

Because the bit stream emerging from the source coder is segmented into four-bit blocks, the fair way of comparing coded and uncoded transmission is to compute the probability of a block error: the probability that any bit in a block remains in error despite error correction and regardless of whether the error occurs in the data or coding bits. Clearly, our (7,4) ( 7 , 4 ) channel code does yield smaller error rates, and is worth the additional systems required to make it work.
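The block-error comparison behind Figure 3 can be sketched numerically. This is only an illustration under an assumed channel model (antipodal signaling with p_e = Q(sqrt(2 E_b/N_0)), the coded system spending 4/7 of the energy per bit); the function names and the 8 dB operating point are ours, not from the text:

```python
import math

def q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def block_error(snr_db):
    """Uncoded vs (7,4)-coded block-error probability at a given Eb/N0 (dB).

    Assumes antipodal signaling, pe = Q(sqrt(2*Eb/N0)); the coded system
    spends only 4/7 of the energy on each transmitted bit.
    """
    ebn0 = 10 ** (snr_db / 10)
    pe_uncoded = q(math.sqrt(2 * ebn0))
    pe_coded = q(math.sqrt(2 * ebn0 * 4 / 7))
    uncoded = 1 - (1 - pe_uncoded) ** 4
    coded = 1 - (1 - pe_coded) ** 7 - 7 * pe_coded * (1 - pe_coded) ** 6
    return uncoded, coded

# At a high enough SNR, the coded block-error probability is the smaller one:
u, c = block_error(8.0)
print(c < u)
```

This reproduces the qualitative conclusion of Figure 3: despite the reduced energy per bit, single-error correction wins once the signal-to-noise ratio is large enough.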
