Skip to content Skip to navigation

Connexions

You are here: Home » Content » The Discrete Cosine Transform (DCT)

Navigation

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

In these lenses

  • eScience, eResearch and Computational Problem Solving

    This module is included inLens: eScience, eResearch and Computational Problem Solving
    By: Jan E. OdegardAs a part of collection: "Image Coding"

    Click the "eScience, eResearch and Computational Problem Solving" link to see all content selected in this lens.

Recently Viewed

This feature requires Javascript to be enabled.
 

The Discrete Cosine Transform (DCT)

Module by: Nick Kingsbury. E-mail the author

Summary: This module briefly outlines the energy compression technique - the discrete cosine transform (DCT).

The main standard for image compression in current use is the JPEG (Joint Picture Experts Group) standard, devised and refined over the period 1985 to 1993. It is formally known as ISO Draft International standard 10981-1 and CCITT Recommendation T.81, and is described in depth in The JPEG Book by W B Pennebaker and J L Mitchell, Van Nostrand Reinhold 1993.

We shall briefly outline the baseline version of JPEG but first we consider its energy compression technique - the discrete cosine transform (DCT).

The Discrete Cosine Transform (DCT)

In this equation from our discussion of the Haar transform, we met the 2-point Haar transform and in this equation we saw that it can be easily inverted if the transform matrix TT is orthonormal so that T-1=TT T T .

If TT is of size nn x nn, where n=2m n 2 m , then we may easily generate larger orthonormal matrices, which lead to definitions of larger transforms.

An nn-point transform is defined as:

( y1 yn )=T( x1 xn ) y 1 y n T x 1 x n
(1)
where T=( t1,1t1,n tn,1tn,n ) T t 1 1 t 1 n t n 1 t n n

A 4-point orthonormal transform matrix that is equivalent to 2 levels of the Haar transform is:

T=12( 1010 10-10 0200 0002 )12( 1100 1-100 0011 001-1 )=12( 1111 11-1-1 2200 0022 ) T 1 2 1 0 1 0 1 0 -1 0 0 2 0 0 0 0 0 2 1 2 1 1 0 0 1 -1 0 0 0 0 1 1 0 0 1 -1 1 2 1 1 1 1 1 1 -1 -1 2 2 0 0 0 0 2 2
(2)
where 12( 1010 10-10 0200 0002 ) 1 2 1 0 1 0 1 0 -1 0 0 2 0 0 0 0 0 2 is Haar level 2 and 12( 1100 1-100 0011 001-1 ) 1 2 1 1 0 0 1 -1 0 0 0 0 1 1 0 0 1 -1 is Haar level 1. Similarly 3 and 4 level Haar transforms may be expressed using 8 and 16 point transform matrices respectively.

However for n>2 n 2 , there are better matrices than those based on the Haar transform, where better means with improved energy compression properties for typical images.

Discrete Cosine Transforms (DCTs) have some of these improved properties and are also simple to define and implement. The nn rows of an nn-point DCT matrix TT are defined by: i=1n:t1,i=1n i 1 n t 1 i 1 n

(i=1n)(k=2n):tk,i=2ncosπ(2i1)(k1)2n i 1 n k 2 n t k i 2 n 2 i 1 k 1 2 n
(3)
It is straightforward to show that this matrix is orthonormal for nn even, since the norm of each row is unity and the dot product of any pair of rows is zero (the product terms may be expressed as the sum of a pair of cosine functions, which are each zero mean).

The 8-point DCT matrix ( n=8 n 8 ) is:

T=( 0.35360.35360.35360.35360.35360.35360.35360.3536 0.49040.41570.27780.0975-0.0975-0.2778-0.4157-0.4904 0.46190.1913-0.1913-0.4619-0.4619-0.19130.19130.4619 0.4157-0.0975-0.4904-0.27780.27780.49040.0975-0.4157 0.3536-0.3536-0.35360.35360.3536-0.3536-0.35360.3536 0.2778-0.49040.09750.4157-0.4157-0.09750.4904-0.2778 0.1913-0.46190.4619-0.1913-0.19130.4619-0.46190.1913 0.0975-0.27780.4157-0.49040.4904-0.41570.2778-0.0975 ) T 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.4904 0.4157 0.2778 0.0975 -0.0975 -0.2778 -0.4157 -0.4904 0.4619 0.1913 -0.1913 -0.4619 -0.4619 -0.1913 0.1913 0.4619 0.4157 -0.0975 -0.4904 -0.2778 0.2778 0.4904 0.0975 -0.4157 0.3536 -0.3536 -0.3536 0.3536 0.3536 -0.3536 -0.3536 0.3536 0.2778 -0.4904 0.0975 0.4157 -0.4157 -0.0975 0.4904 -0.2778 0.1913 -0.4619 0.4619 -0.1913 -0.1913 0.4619 -0.4619 0.1913 0.0975 -0.2778 0.4157 -0.4904 0.4904 -0.4157 0.2778 -0.0975
(4)
The rows of TT, known as basis function, are plotted as asterisks in Figure 1. The asterisks are superimposed on the underlying continuous cosine functions, used in all sizes of DCT. Only the amplitude scaling and the maximum frequency vary with the size nn.

Figure 1: The 8-point DCT basis functions(*) and their underlying continuous cosine waves.
Figure 1 (figure1.png)

When we take the transform of an nn-point vector using y=Tx y T x , xx is decomposed into a linear combination of the basis function (rows) of TT, whose coefficients are the samples of yy, because x=TTy x T y .

The basis functions may also be viewed as the impulse responses of FIR filters, being applied to the data xx.

The DCT is closely related to the discrete Fourier transform (DFT). It represents the result of applying the 2n 2 n -point DFT to a vector: x 2 n =( x xrev ) x 2 n x xrev where xrev =( xn x1 ) xrev x n x 1 . x 2 n x 2 n is symmetric about its centre and so the 2n 2 n Fourier coefficients are all purely real and symmetric about zero frequency. The nn DCT coefficients are then the first nn Fourier coefficients.

note:

The DFT must be defined with a half sample period offset on the indexing of the input samples for the above to be strictly true.

Standards

The 8-point DCT is the basis of the JPEG standard, as well as several other standards such as MPEG-1 and MPEG-2 (for TV and video) and H.263 (for video-phones). Hence we shall concentrate on it as our main example, but bear in mind that DCTs may be defined for a wide range of sizes nn.

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks