As we consider other block codes, the simple idea of the decoder
amounting to a simple majority count of the bits in the received
data block won't generalize easily. We nees a broader biew that
takes into acount the distance between
codewords. A length-N codeword means that the receiver must
decide among the
2N
2N
possible datawords to select which of the
2K
2K
codewords was actually transmitted. As
shown in Figure 3, we can think of the datawords geometrically.
We define the Hamming distance between binary datawords
c
1
c
1
and
c
2
c
2
, denoted
by
d
c
1
c
2
d
c
1
c
2
to be the minimum number of bits that must be "flipped" to go
from one word to the other. For example, the distance between
codewords is 3 bits. In our table of binary arithmetic, we see
that adding a 1 corresponds to flipping a bit. Furthermore,
subtraction and addition are equivalent. We can express the
Hamming distance as
d
c
1
c
2
=
sum
c
1
⊕
c
2
d
c
1
c
2
sum
c
1
c
2
(1)
Show that adding the error vector
col[1,0,...,0] to a codeword flips the tcodeword's leading
bit and leaves the rest unaffected.
The probability of one bit being flipped anywhere in a codeword
is
N
pe
1−
pe
N−1
N
pe
1
pe
N1
. The number of errors the channel introduces equals
the number of ones in e; the probability of any particular error
vector decreases with the number of errors.
To perform decoding when wrrors occur, we want to find the
codeword (one of the filled circles in Fig.3) that has the
highest probability of occurring: the one closest to the one
received. Note that if a dataword lies a distance of 1 from two
codewords , it is impossible to determine
whic codeword was actually sent. This criterion means that if
any two codewords two bits apart, then the code
cannot correct the channel-induced error.
Thus, to have a code that can correct all single-bit
errors, codewords must have a minimum separation of
three. Our repetition code has this property.
Does any error-correcting code reduce communication errors when
introducing code bits increase the probability that bits will
arrive in error? The answer is yes. To understand channel
coding, we need to develop first a general framework for channel
coding, and discover what it takes for a code to be maximally
efficient: Correct as many errors as possible using the fewest
error correction bits as possible (making the efficiency KNKN
as large as possible.) We also nees a systematic way of finding
the codeword closest to any received dataword. A much better
code than our (3,1) repetition code is the following (7,4) code.
c1=b1
c2=b2
c3=b3
c4=b4
c5=b1⊕b2⊕b3
c6=b1⊕b3⊕b4
c6=b1⊕b2⊕b4
c
1
b
1
c
2
b
2
c
3
b
3
c
4
b
4
c
5
b
1
b
2
b
3
c
6
b
1
b
3
b
4
c
6
b
1
b
2
b
4
G=(
1000
0100
0010
0001
1110
0111
1101
)
G
1
0
0
0
0
1
0
0
0
0
1
0
0
0
0
1
1
1
1
0
0
1
1
1
1
1
0
1
In this (7,4) code,
24=16
24
16
of the
27=128
27
128
possible blocks at the channel decoder corredpond to error-free
transmission and reception.
Error correction amounts to searching for the codeword
cc
closest to the
received block cc
in terms of the Hamming distance between the two. The error correction capability of a
channel code is limited by how close together any two error-free blocks are. Bad codes
would produce blocks close together, which would result in ambiguity when assigning a
block of data bits to a received block. The quantity to examine, therefore, in
designing code error correction codes is the minimum distance between codewords.
d
min
=mind
c
i
c
j
with
c
i
≠
c
j
d
min
min
d
c
i
c
j
with
c
i
c
j
(2)
To have a channel code that can correct all single-bit errors,
dmin
≥3
dmin
3
.
Suppose we want a channel code to have an error-correction capability of
nn
bits. What must the minimum Hamming distance between codewords
d
min
d
min
be?
How do we calculate the minimum distance between codewords? Because we have
2K
2
K
codewords, the number of possible unique pairs equals
2K−1(2K−1)
2
K
1
2
K
1
,
which can be a large number. Recall that our channel coding procedure is linear, with
c=Gb
c
G
b
. Therefore
c
i
⊕
c
j
=G
b
i
⊕
b
j
G
c
i
c
j
G
b
i
b
j
. Because
b
i
⊕
b
j
b
i
b
j
always yields another block of data bits, we find that the difference
between any two codewords is another codeword! Thus, to find
d
min
d
min
we need
only compute the number of ones that comprise all non-zero codewords. Finding these
codewords is easy once we examine the coder's generator matrix. Note that the columns
of GG are
codewords (why is this?), and that all codewords can be found by all possible pairwise sums of the columns.
To find
d
min
d
min
,
we need only count the number of bits in each column and sums of columns. For our example
(7,4)
(
7
,
4
)
((Reference)),
GG's first column
has three ones, the next one four, and the last two three. Considering
sums of column pairs next, note theat because the upper portion of
GG is an
identity matrix, the corresponding upper portion of all column sums
must habe exactly two bits. Because the bottom portion of each
column differd from the other columns in at least one place, the
bottom portion of a sum must have at least one bit. Triple sums will
have at least three bits because the upper portion of GG being three.
Thus, no sum of columns has fewer than three
bits, meaning
d
min
=3
d
min
3
,
and we have a channel coder that can correct all occurrences of one error within a received
77-bit
block.