Human Vision

Module by: Nick Kingsbury

Summary: This module introduces human colour vision, the YUV colour space, visual sensitivity and a colour compression strategy.

Colours

The human vision system perceives images in colour using receptors on the retina of the eye which respond to three relatively broad colour bands in the regions of red, green and blue (RGB) in the colour spectrum (red, orange, yellow, green, blue, indigo, violet).

Colours in between these are perceived as different linear combinations of RGB. Hence colour TVs and monitors can form almost any perceivable colour by controlling the relative intensities of R, G and B light sources. Thus most colour images which exist in electronic form are fundamentally represented by 3 intensities (R, G and B) at each picture element (pel) position.

The numerical values used for these intensities are usually chosen such that equal increments in value result in approximately equal apparent increases in brightness. In practice this means that the numerical value is approximately proportional to the log of the true light intensity (energy of the wave) - this is Weber's Law. Throughout this course, we shall refer to these numerical values as intensities, since for compression it is most convenient to use a subjectively linear scale.
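As a rough illustration of the log relationship just described, the following sketch maps a physical light intensity to a subjectively linear value. The function name `intensity_to_value` and the parameters `i_ref` and `scale` are assumptions made for this example only; real imaging systems use their own transfer curves, which are only approximately logarithmic.

```python
import math


def intensity_to_value(intensity, i_ref=1.0, scale=1.0):
    """Map a physical light intensity to a subjectively linear value.

    Assumes a pure log law; i_ref and scale are hypothetical parameters.
    """
    return scale * math.log(intensity / i_ref)


# Doubling the physical intensity adds the same increment to the value,
# whether we start from a dark level or a bright one.
step_dark = intensity_to_value(2.0) - intensity_to_value(1.0)
step_bright = intensity_to_value(200.0) - intensity_to_value(100.0)
```

Both `step_dark` and `step_bright` equal log 2 (up to rounding), which is the sense in which equal increments in value correspond to equal ratios of true intensity.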

The YUV Colour Space

The eye is much more sensitive to overall intensity (luminance) changes than to colour changes. Usually most of the information about a scene is contained in its luminance rather than its colour (chrominance).

This is why black-and-white (monochrome) reproduction was acceptable for photography and TV for many years, until technology provided colour reproduction at a sufficiently low price to make its modest advantages worth having.

The luminance ($Y$) of a pel may be obtained from its RGB components as:

$$Y = 0.3R + 0.6G + 0.1B \tag{1}$$

These coefficients are only approximate, and are the values defined in the JPEG Book. In other places values of 0.3, 0.59 and 0.11 are used.

RGB representations of images are normally defined so that if $R = G = B$, the pel is always some shade of gray. Since we want $Y = R = G = B$ in these cases, the 3 coefficients in Equation 1 should sum to unity.

When $Y$ defines the luminance of a pel, its chrominance is usually defined by $U$ and $V$ such that:

$$U = 0.5\,(B - Y)$$

$$V = 0.625\,(R - Y) \tag{2}$$

Note that gray pels will always have $U = V = 0$.
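The definitions of luminance and chrominance above can be sketched directly in code. The function name `rgb_to_yuv` is an assumption for this example; the coefficients are the approximate values quoted in the text.

```python
def rgb_to_yuv(r, g, b):
    """Convert one pel from RGB to YUV using the approximate coefficients."""
    y = 0.3 * r + 0.6 * g + 0.1 * b   # luminance
    u = 0.5 * (b - y)                 # blue-yellow chrominance
    v = 0.625 * (r - y)               # red-green chrominance
    return y, u, v


# A gray pel (R = G = B) should give Y equal to that level and U = V = 0,
# up to floating-point rounding.
y_gray, u_gray, v_gray = rgb_to_yuv(128.0, 128.0, 128.0)
```

Because the three luminance coefficients sum to unity, any gray input returns its own level as the luminance and zero chrominance.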

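The transformation between RGB and YUV colour spaces is linear and may be achieved by a $3 \times 3$ matrix $C$ and its inverse:

$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = C \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{3}$$

where

$$C = \begin{bmatrix} 0.3 & 0.6 & 0.1 \\ -0.15 & -0.3 & 0.45 \\ 0.4375 & -0.375 & -0.0625 \end{bmatrix}$$

and

$$\begin{bmatrix} R \\ G \\ B \end{bmatrix} = C^{-1} \begin{bmatrix} Y \\ U \\ V \end{bmatrix} \tag{4}$$

where

$$C^{-1} = \begin{bmatrix} 1 & 0 & 1.6 \\ 1 & -0.3333 & -0.8 \\ 1 & 2 & 0 \end{bmatrix}$$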
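The matrix form can be checked with a small round trip, sketched here in plain Python. The names `C`, `C_INV`, `mat_vec`, `rgb_to_yuv_matrix` and `yuv_to_rgb_matrix` are assumptions for this example, and the entry printed as -0.3333 in the inverse is taken to be -1/3.

```python
# Forward matrix: rows give Y, U and V in terms of R, G and B.
C = [
    [0.3, 0.6, 0.1],
    [-0.15, -0.3, 0.45],
    [0.4375, -0.375, -0.0625],
]

# Inverse matrix: rows give R, G and B in terms of Y, U and V.
C_INV = [
    [1.0, 0.0, 1.6],
    [1.0, -1.0 / 3.0, -0.8],
    [1.0, 2.0, 0.0],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def rgb_to_yuv_matrix(rgb):
    return mat_vec(C, rgb)


def yuv_to_rgb_matrix(yuv):
    return mat_vec(C_INV, yuv)


yuv = rgb_to_yuv_matrix([200.0, 120.0, 40.0])   # approximately [136, -48, 40]
rgb_back = yuv_to_rgb_matrix(yuv)               # approximately [200, 120, 40]
```

The second and third rows of `C` are what one gets by substituting the luminance equation into the definitions of $U$ and $V$, so the round trip recovers the original RGB values up to rounding.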

Visual Sensitivity

Figure 1: Sensitivity of the eye to luminance and chrominance intensity changes.
Figure 1 (figure1.png)

Figure 1 shows the sensitivity of the eye to luminance ($Y$) and chrominance ($U$, $V$) components of images. The horizontal scale is spatial frequency, and represents the frequency of an alternating pattern of parallel stripes with sinusoidally varying intensity. The vertical scale is the contrast sensitivity of human vision, which is the ratio of the maximum visible range of intensities to the minimum discernible peak-to-peak intensity variation at the specified frequency.

In Figure 1 we see that:

  • The maximum sensitivity to $Y$ occurs for spatial frequencies around 5 cycles / degree, which corresponds to striped patterns with a half-period (stripe width) of 1.8 mm at a distance of 1 m (~arm's length).
  • The eye has very little response above 100 cycles / degree, which corresponds to a stripe width of 0.1 mm at 1 m. On a standard PC display of width 250 mm, this would require 2500 pels per line! Hence the current SVGA standard of $1024 \times 768$ pels still falls somewhat short of the ideal and is limited by CRT spot size. Modern laptop displays have a pel size of about 0.3 mm, but are pleasing to view because the pel edges are so sharp (and there is no flicker).
  • The sensitivity to luminance drops off at low spatial frequencies, showing that we are not very good at estimating absolute luminance levels as long as they do not change with time - the luminance sensitivity to temporal fluctuations (flicker) does not fall off at low spatial frequencies.
  • The maximum chrominance sensitivity is much lower than the maximum luminance sensitivity, with blue-yellow ($U$) sensitivity being about half of red-green ($V$) sensitivity and about $\frac{1}{6}$ of the maximum luminance sensitivity.
  • The chrominance sensitivities fall off above 1 cycle / degree, requiring a much lower spatial bandwidth than luminance.

We can now see why it is better to convert to the YUV domain before attempting image compression. The $U$ and $V$ components may be sampled at a lower rate than $Y$ (due to narrower bandwidth) and may be quantised more coarsely (due to lower contrast sensitivity).

A colour demonstration on the computer will show this effect.
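The stripe widths quoted in the list above follow from simple geometry: one degree of visual angle at a viewing distance of 1 m spans about 17.5 mm. A minimal sketch of that arithmetic follows; the function name `stripe_width_mm` and its parameters are assumptions for this example.

```python
import math


def stripe_width_mm(cycles_per_degree, viewing_distance_mm=1000.0):
    """Half-period (stripe width) of a grating at a given spatial frequency."""
    # Width on the screen subtended by one degree of visual angle.
    mm_per_degree = viewing_distance_mm * math.tan(math.radians(1.0))
    period_mm = mm_per_degree / cycles_per_degree
    return period_mm / 2.0


peak_width = stripe_width_mm(5.0)      # about 1.75 mm, i.e. roughly 1.8 mm
limit_width = stripe_width_mm(100.0)   # about 0.09 mm, i.e. roughly 0.1 mm
pels_per_line = 250.0 / 0.1            # the text's rounded width gives 2500
```

Using the unrounded width at 100 cycles / degree would give a somewhat higher pel count than 2500, so the figure in the text is best read as an order-of-magnitude estimate.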

Colour Compression Strategy

The 3 RGB samples at each pel are transformed into 3 YUV samples using Equation 3.

Most image compression systems then subsample the $U$ and $V$ information by 2:1 horizontally and vertically, so that there is one $U$ and one $V$ pel for each $2 \times 2$ block of $Y$ pels. Each subsampled $U$ or $V$ pel is obtained by averaging the four corresponding $U$ or $V$ samples produced by Equation 3. The quarter-size $U$ and $V$ subimages are then compressed using the same techniques as the full-size $Y$ image, except that coarser quantisation may be used for $U$ and $V$, so the total cost of adding colour may be only about a 25% increase in bit rate. Sometimes $U$ and $V$ are subsampled 4:1 each way (16:1 total), giving an even lower cost of colour.
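The 2:1 subsampling step can be sketched as a block average over a chrominance plane. This is a minimal illustration, assuming the plane is a list of rows with even width and height; the names `subsample_2x2` and `samples_per_pel` are hypothetical.

```python
def subsample_2x2(plane):
    """Average each 2x2 block of a plane (list of rows, even dimensions)."""
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def samples_per_pel(factor):
    """Y + U + V samples per pel when U and V are subsampled by
    `factor` both horizontally and vertically."""
    return 1.0 + 2.0 / (factor * factor)


u_plane = [[1.0, 2.0, 3.0, 4.0],
           [5.0, 6.0, 7.0, 8.0]]
u_small = subsample_2x2(u_plane)   # [[3.5, 5.5]]
```

With 2:1 subsampling each way, `samples_per_pel(2)` gives 1.5 samples per pel instead of 3. The figure of about 25% extra bit rate in the text is lower than this 50% increase in sample count because $U$ and $V$ are also quantised more coarsely.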

From now on we will mostly be considering compression of the monochrome $Y$ image, and assume that similar techniques will be used for the smaller $U$ and $V$ subimages.

Activity Masking

A final feature of human vision, which is useful for compression, is that the contrast sensitivity to a given pattern is reduced in the presence of other patterns (activity) in the same region. This is known as activity masking.

It is a complicated subject, as it depends on the similarity between the given pattern and the background activity. However, in general, the higher the variance of the pels in a given region (typically ~ 8 to 16 pels across), the lower the contrast sensitivity.

Hence compression schemes which adapt the quantisation to local image activity tend to perform better than those which use uniform quantisation.
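One way to picture such adaptation is to let the quantiser step size grow with the local variance of a block. The sketch below is purely illustrative: the rule in `adaptive_step`, and the parameters `base_step` and `gain`, are assumptions made for this example and are not taken from the text or from any particular standard.

```python
def block_variance(block):
    """Variance of the pels in a block (list of rows)."""
    pels = [p for row in block for p in row]
    mean = sum(pels) / len(pels)
    return sum((p - mean) ** 2 for p in pels) / len(pels)


def adaptive_step(block, base_step=4.0, gain=0.1):
    """Hypothetical rule: step size increases with local standard deviation."""
    return base_step * (1.0 + gain * block_variance(block) ** 0.5)


def quantise(value, step):
    """Uniform quantisation of a value to the nearest multiple of step."""
    return step * round(value / step)


flat_block = [[10.0, 10.0], [10.0, 10.0]]
busy_block = [[0.0, 100.0], [100.0, 0.0]]
flat_step = adaptive_step(flat_block)   # variance 0, so the base step
busy_step = adaptive_step(busy_block)   # larger step in the active block
```

A flat region keeps the fine base step, while a high-activity region is quantised more coarsely, on the assumption that masking hides the larger quantisation error there.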

A computer demonstration will show the effect of reduced sensitivity to quantisation effects when noise is added to an image.
