The main standard for image compression in current use is the
JPEG (Joint Picture Experts Group) standard, devised and refined
over the period 1985 to 1993. It is formally known as ISO Draft
International standard 10981-1 and CCITT Recommendation T.81,
and is described in depth in The JPEG Book by W B Pennebaker and
J L Mitchell, Van Nostrand Reinhold 1993.
We shall briefly outline the baseline version of JPEG but first
we consider its energy compression technique - the discrete
cosine transform (DCT).
In this equation from our discussion of the Haar transform, we
met the 2-point Haar transform and in this equation we saw that it
can be easily inverted if the transform matrix TT is orthonormal so that
T-1=TT
T
T
.
If TT is of size
nn x nn,
where
n=2m
n
2
m
, then we may easily generate larger orthonormal matrices,
which lead to definitions of larger transforms.
An nn-point transform is defined as:
y1…yn=Tx1…xn
y
1
…
y
n
T
x
1
…
x
n
(1)
where
T=t11…t1n⋮⋮⋮tn1…tnn
T
t
1
1
…
t
1
n
⋮
⋮
⋮
t
n
1
…
t
n
n
A 4-point orthonormal transform matrix that is equivalent to 2 levels
of the Haar transform is:
T=12101010-10020000021211001-1000011001-1=12111111-1-12-200002-2
T
1
2
1
0
1
0
1
0
-1
0
0
2
0
0
0
0
0
2
1
2
1
1
0
0
1
-1
0
0
0
0
1
1
0
0
1
-1
1
2
1
1
1
1
1
1
-1
-1
2
2
0
0
0
0
2
2
(2)
where
12101010-1002000002
1
2
1
0
1
0
1
0
-1
0
0
2
0
0
0
0
0
2
is Haar level 2 and
1211001-1000011001-1
1
2
1
1
0
0
1
-1
0
0
0
0
1
1
0
0
1
-1
is Haar level 1. Similarly 3 and 4 level Haar transforms may
be expressed using 8 and 16 point transform matrices respectively.
However for
n>2
n
2
, there are better matrices than those based on the Haar
transform, where better means with
improved energy compression properties for typical images.
Discrete Cosine Transforms (DCTs) have some of these improved
properties and are also simple to define and implement. The
nn rows of an
nn-point DCT matrix TT are defined by:
∀,i=1→n:t1i=1n
i
1
n
t
1
i
1
n
∀,i=1→n∧k=2→n:tki=2ncosπ2i-1k-12n
i
1
n
k
2
n
t
k
i
2
n
2
i
1
k
1
2
n
(3)
It is straightforward to show that this matrix is orthonormal
for
nn even, since the norm of
each row is unity and the dot product of any pair of rows is
zero (the product terms may be expressed as the sum of a pair
of cosine functions, which are each zero mean).
The 8-point DCT matrix (
n=8
n
8
) is:
T=0.35360.35360.35360.35360.35360.35360.35360.35360.49040.41570.27780.0975-0.0975-0.2778-0.4157-0.49040.46190.1913-0.1913-0.4619-0.4619-0.19130.19130.46190.4157-0.0975-0.4904-0.27780.27780.49040.0975-0.41570.3536-0.3536-0.35360.35360.3536-0.3536-0.35360.35360.2778-0.49040.09750.4157-0.4157-0.09750.4904-0.27780.1913-0.46190.4619-0.1913-0.19130.4619-0.46190.19130.0975-0.27780.4157-0.49040.4904-0.41570.2778-0.0975
T
0.3536
0.3536
0.3536
0.3536
0.3536
0.3536
0.3536
0.3536
0.4904
0.4157
0.2778
0.0975
-0.0975
-0.2778
-0.4157
-0.4904
0.4619
0.1913
-0.1913
-0.4619
-0.4619
-0.1913
0.1913
0.4619
0.4157
-0.0975
-0.4904
-0.2778
0.2778
0.4904
0.0975
-0.4157
0.3536
-0.3536
-0.3536
0.3536
0.3536
-0.3536
-0.3536
0.3536
0.2778
-0.4904
0.0975
0.4157
-0.4157
-0.0975
0.4904
-0.2778
0.1913
-0.4619
0.4619
-0.1913
-0.1913
0.4619
-0.4619
0.1913
0.0975
-0.2778
0.4157
-0.4904
0.4904
-0.4157
0.2778
-0.0975
(4)
The rows of
TT,
known as basis function, are plotted as asterisks in
Figure 1. The asterisks are
superimposed on the underlying continuous cosine functions,
used in all sizes of DCT. Only the amplitude scaling and the
maximum frequency vary with the size
nn.
When we take the transform of an
nn-point vector using
y=Tx
y
T
x
, xx is
decomposed into a linear combination of the basis function
(rows) of TT, whose
coefficients are the samples of yy, because
x=TTy
x
T
y
.
The basis functions may also be viewed as the impulse
responses of FIR filters, being applied to the data
xx.
The DCT is closely related to the discrete Fourier transform
(DFT). It represents the result of applying the
2n
2
n
-point DFT
to a vector:
x
2
n
=xxrev
x
2
n
x
xrev
where
xrev=xn…x1
xrev
x
n
…
x
1
.
x
2
n
x
2
n
is symmetric about its centre and so the
2n
2
n
Fourier coefficients are all purely real and
symmetric about zero frequency. The
nn DCT coefficients are then the
first nn Fourier coefficients.
The DFT must be defined with a half sample
period offset on the indexing of the input samples for the
above to be strictly true.
The 8-point DCT is the basis of the JPEG standard, as well
as several other standards such as MPEG-1 and MPEG-2 (for TV
and video) and H.263 (for video-phones). Hence we shall
concentrate on it as our main example, but bear in mind that
DCTs may be defined for a wide range of sizes
nn.