The human vision system perceives images in colour using
receptors on the retina of the eye which respond to three
relatively broad colour bands in the regions of red, green and
blue (RGB) in the colour spectrum (red, orange, yellow, green,
blue, indigo, violet).
Colours in between these are perceived as different linear
combinations of RGB. Hence colour TVs and monitors can form
almost any perceivable colour by controlling the relative
intensities of R, G and B light sources. Thus most colour
images which exist in electronic form are fundamentally
represented by 3 intensities (R, G and B) at each picture
element (pel) position.
The numerical values used for these intensities are usually
chosen such that equal increments in value result in
approximately equal apparent increases in brightness. In
practice this means that the numerical value is approximately
proportional to the log of the true light intensity (energy of
the wave) - this is Weber's Law. Throughout this course, we
shall refer to these numerical values as intensities, since
for compression it is most convenient to use a subjectively
linear scale.
The eye is much more sensitive to overall intensity
(luminance) changes than to colour changes. Usually most of the
information about a scene is contained in its luminance rather
than its colour (chrominance).
This is why black-and-white (monochrome) reproduction was
acceptable for photography and TV for many years until
technology provided colour reproduction at a sufficiently low
price to make its modest advantages worth having.
The luminance ($Y$) of a pel may
be obtained from its RGB components as:
$$Y = 0.3R + 0.6G + 0.1B \tag{1}$$
These coefficients are only approximate, and are the values
defined in the JPEG Book. In other places values of
$0.3$, $0.59$ and $0.11$ are used.
RGB representations of images are normally defined so that if
$R = G = B$, the pel is always some shade of gray. To ensure that
$Y = R = G = B$ in these cases, the 3 coefficients in Equation 1
should sum to unity.
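As a quick check of the unity condition, consider a gray pel with a common value $g$ in all three channels (the symbol $g$ is introduced here only for illustration). Substituting $R = G = B = g$ into Equation 1 gives:

$$Y = 0.3g + 0.6g + 0.1g = (0.3 + 0.6 + 0.1)\,g = g$$

The alternative coefficients behave the same way, since $0.3 + 0.59 + 0.11 = 1$.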
When $Y$ defines the luminance of
a pel, its chrominance is usually defined by
$U$ and $V$ such that:
$$U = 0.5\,(B - Y), \qquad V = 0.625\,(R - Y) \tag{2}$$
Note that gray pels will always have $U = V = 0$.
The transformation between RGB and YUV colour spaces is linear
and may be achieved by a $3 \times 3$ matrix $C$
and its inverse:
$$\begin{pmatrix} Y \\ U \\ V \end{pmatrix} = C \begin{pmatrix} R \\ G \\ B \end{pmatrix} \tag{3}$$
where
$$C = \begin{pmatrix} 0.3 & 0.6 & 0.1 \\ -0.15 & -0.3 & 0.45 \\ 0.4375 & -0.3750 & -0.0625 \end{pmatrix}$$
and
$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = C^{-1} \begin{pmatrix} Y \\ U \\ V \end{pmatrix} \tag{4}$$
where
$$C^{-1} = \begin{pmatrix} 1 & 0 & 1.6 \\ 1 & -0.3333 & -0.8 \\ 1 & 2 & 0 \end{pmatrix}$$
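The forward and inverse transforms of Equations 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration using plain lists; the function names (`mat_vec`, `rgb_to_yuv`, `yuv_to_rgb`) are chosen here for the example and do not come from the course material. The inverse uses the exact value $-1/3$ rather than the rounded $-0.3333$.

```python
# Forward matrix C from Equation 3: rows give Y, U, V in terms of R, G, B.
C = [
    [0.3, 0.6, 0.1],
    [-0.15, -0.3, 0.45],
    [0.4375, -0.375, -0.0625],
]

# Inverse of C (Equation 4): R = Y + 1.6 V, G = Y - U/3 - 0.8 V, B = Y + 2 U.
C_INV = [
    [1.0, 0.0, 1.6],
    [1.0, -1.0 / 3.0, -0.8],
    [1.0, 2.0, 0.0],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a length-3 vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def rgb_to_yuv(rgb):
    """Apply Equation 3 to one pel given as [R, G, B]."""
    return mat_vec(C, rgb)


def yuv_to_rgb(yuv):
    """Apply Equation 4 to one pel given as [Y, U, V]."""
    return mat_vec(C_INV, yuv)
```

For a gray pel such as `[128, 128, 128]`, `rgb_to_yuv` returns a luminance of 128 with both chrominance values zero (up to floating-point rounding), and `yuv_to_rgb(rgb_to_yuv(p))` recovers `p` to within rounding error.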
Figure 1 shows the sensitivity of
the eye to luminance ($Y$) and
chrominance ($U$, $V$) components of images. The
horizontal scale is spatial frequency, and represents the
frequency of an alternating pattern of parallel stripes with
sinusoidally varying intensity. The vertical scale is the
contrast sensitivity of human vision, which is the ratio of
the maximum visible range of intensities to the minimum
discernible peak-to-peak intensity variation at the specified
frequency.
In Figure 1 we see that:
- The maximum sensitivity to $Y$
  occurs for spatial frequencies around 5 cycles / degree,
  which corresponds to striped patterns with a half-period
  (stripe width) of 1.8 mm at a distance of 1 m (~arm's
  length).
- The eye has very little response above 100 cycles /
  degree, which corresponds to a stripe width of 0.1 mm at 1
  m. On a standard PC display of width 250 mm, this would
  require 2500 pels per line! Hence the current SVGA
  standard of $1024 \times 768$
  pels still falls somewhat short of the ideal and
  is limited by CRT spot size. Modern laptop displays have a
  pel size of about 0.3 mm, but are pleasing to view because
  the pel edges are so sharp (and there is no flicker).
- The sensitivity to luminance drops off at low spatial
  frequencies, showing that we are not very good at
  estimating absolute luminance levels as long as
  they do not change with time - the luminance
  sensitivity to temporal fluctuations (flicker) does not
  fall off at low spatial frequencies.
- The maximum chrominance sensitivity is much lower than the
  maximum luminance sensitivity with blue-yellow
  ($U$) sensitivity being about
  half of red-green ($V$)
  sensitivity and about $\frac{1}{6}$
  of the maximum luminance sensitivity.
- The chrominance sensitivities fall off above 1 cycle /
  degree, requiring a much lower spatial bandwidth than
  luminance.
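The stripe widths quoted in the list above follow from simple geometry: one degree of visual angle at a given viewing distance spans a fixed width, and one stripe is half a cycle. The sketch below reproduces that arithmetic; the function names and the default viewing distance of 1000 mm are assumptions made for this example.

```python
import math


def stripe_width_mm(cycles_per_degree, viewing_distance_mm=1000.0):
    """Half-period (one stripe) of a grating at the given spatial frequency."""
    # Width subtended by one degree of visual angle at the viewing distance.
    mm_per_degree = 2.0 * viewing_distance_mm * math.tan(math.radians(0.5))
    period_mm = mm_per_degree / cycles_per_degree
    return period_mm / 2.0


def pels_per_line(display_width_mm, cycles_per_degree, viewing_distance_mm=1000.0):
    """Pels needed across the display if each pel is one stripe wide."""
    return display_width_mm / stripe_width_mm(cycles_per_degree, viewing_distance_mm)
```

At 1 m this gives about 1.75 mm for 5 cycles / degree and about 0.087 mm for 100 cycles / degree, which the text rounds to 1.8 mm and 0.1 mm. Using the unrounded stripe width, a 250 mm display would need somewhat more than the 2500 pels per line obtained from the rounded 0.1 mm figure.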
We can now see why it is better to convert to the YUV domain
before attempting image compression. The
$U$ and $V$ components may be sampled at a
lower rate than $Y$ (due to
narrower bandwidth) and may be quantised more coarsely (due to
lower contrast sensitivity).
A colour demonstration on the computer will show this effect.
The 3 RGB samples at each pel are transformed into 3 YUV
samples using Equation 3.
Most image compression systems then subsample the
$U$ and $V$ information by 2:1
horizontally and vertically so that there is one
$U$ and one $V$ pel for each
$2 \times 2$ block of $Y$ pels. The
subsampled $U$ and $V$ pels are obtained by averaging
the four $U$ and $V$ samples, computed from Equation 3,
in each block. The quarter-size
$U$ and $V$ subimages are then compressed
using the same techniques as the full-size
$Y$ image, except that coarser
quantisation may be used for $U$ and
$V$, so the total cost of adding
colour may be only about a 25% increase in bit rate. Sometimes
$U$ and $V$ are subsampled 4:1 each way
(16:1 total), giving an even lower cost of colour.
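The 2:1 subsampling step can be sketched as a block average over a chrominance plane stored as a list of rows. This is an illustrative sketch only: the function names are made up for the example, and it assumes the plane has even height and width.

```python
def subsample_2x2(plane):
    """Average each non-overlapping 2x2 block of a 2-D list of samples.

    Assumes even height and width (a simplification made for this sketch).
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def sample_count_ratio(factor):
    """Total Y+U+V samples relative to Y alone, with U and V
    subsampled by `factor` both horizontally and vertically."""
    return 1.0 + 2.0 / (factor * factor)
```

With 2:1 subsampling each way, the two chrominance subimages together add 50% more samples than the luminance image alone; the figure of about 25% extra bit rate quoted above additionally relies on the coarser quantisation of the chrominance. With 4:1 subsampling each way the extra sample count falls to 12.5%.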
From now on we will mostly be considering compression of the
monochrome $Y$ image, and assume
that similar techniques will be used for the smaller
$U$ and $V$ subimages.
A final feature of human vision, which is useful for
compression, is that the contrast sensitivity to a given
pattern is reduced in the presence of other patterns
(activity) in the same region. This is known as activity
masking.
It is a complicated subject as it depends on the similarity
between the given pattern and the background activity. However,
in general, the higher the variance of the pels in a given
region (typically ~ 8 to 16 pels across), the lower the
contrast sensitivity.
Hence compression schemes which adapt the quantisation to
local image activity tend to perform better than those which
use uniform quantisation.
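One way to picture activity-adaptive quantisation is to measure the variance of each block and widen the quantiser step where the variance is high. The sketch below is purely hypothetical: the rule in `adaptive_step`, and the values of `base_step` and `gain`, are invented for illustration and are not taken from the course or from any particular codec.

```python
def block_variance(block):
    """Population variance of the samples in a 2-D block (list of rows)."""
    samples = [s for row in block for s in row]
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def adaptive_step(block, base_step=8.0, gain=0.05):
    """Hypothetical rule: widen the quantiser step as local activity rises.

    base_step and gain are illustrative values, not taken from the text.
    """
    std_dev = block_variance(block) ** 0.5
    return base_step * (1.0 + gain * std_dev)


def quantise(block, step):
    """Uniform quantisation: map each sample to a nearby multiple of step."""
    return [[step * round(s / step) for s in row] for row in block]
```

A flat block keeps the base step, while a busy block (for example, one alternating between 0 and 200) receives a much larger step, reflecting the lower contrast sensitivity in active regions.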
A computer demonstration will show the effect of reduced
sensitivity to quantisation effects when noise is added to an
image.