<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" xmlns:m="http://www.w3.org/1998/Math/MathML" id="new">
  <name>Signal Processing in Processing: Miscellanea</name>
  <metadata>
  <md:version>1.3</md:version>
  <md:created>2005/10/12 17:37:57 GMT-5</md:created>
  <md:revised>2006/06/26 07:46:26.675 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="ppolotti">
      <md:firstname>Pietro</md:firstname>
      
      <md:surname>Polotti</md:surname>
      <md:email>polotti@sci.univr.it</md:email>
    </md:author>
      <md:author id="drocchesso">
      <md:firstname>Davide</md:firstname>
      
      <md:surname>Rocchesso</md:surname>
      <md:email>Davide.Rocchesso@univr.it</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="ppolotti">
      <md:firstname>Pietro</md:firstname>
      
      <md:surname>Polotti</md:surname>
      <md:email>polotti@sci.univr.it</md:email>
    </md:maintainer>
    <md:maintainer id="drocchesso">
      <md:firstname>Davide</md:firstname>
      
      <md:surname>Rocchesso</md:surname>
      <md:email>Davide.Rocchesso@univr.it</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>audio dithering</md:keyword>
    <md:keyword>color dithering</md:keyword>
    <md:keyword>color palette</md:keyword>
    <md:keyword>edge detection</md:keyword>
    <md:keyword>histogram</md:keyword>
  </md:keywordlist>

  <md:abstract>Miscellaneous topics and effects in audio and image processing.</md:abstract>
</metadata>
  <content>
    <section id="palette_dithering">
      <name>Economic Color Representations</name>
      <para id="palette_tablep">
	In <cnxn document="m12664">Media Representation in
	  Processing</cnxn> we saw how one devotes
	  <m:math> <m:mn>8</m:mn>
	</m:math>
 bits to each channel corresponding to a primary color. If we add to these
	  the alpha channel, the total number of bits per pixel becomes
	  <m:math> <m:mn>32</m:mn>
	</m:math>. We do not always have the possibility to use such a big amount of memory for colors. Therefore, one has to adopt various strategies in order to reduce the number of bits per pixel.
      </para>
      <section id="palette_t">
	<name>Palette</name>
	<para id="palette_tp">
	  A first solution comes from the observation that usually in an image, not all of the <m:math>
		<m:apply>
			<m:power/>
			<m:mn>2</m:mn>
			<m:mn>24</m:mn>
		</m:apply>
	</m:math> representable colors are present at the same time. Supposing that the number of colors necessary for an ordinary image is not greater than
	    <m:math>
		<m:mn>256</m:mn>
	</m:math>, one can think about
	    memorizing the codes of the colors in a table
	    (<term>palette</term>), whose elements are
	    accessible by means of an index of only <m:math>
		<m:mn>8</m:mn>
	</m:math> bits. Thus, the image will require a memory space of
<m:math>
		<m:mn>8</m:mn>
	</m:math> bits per pixel plus the space necessary for the
	    palette. For examples and further explanations see
	    <link src="http://en.wikipedia.org/wiki/Color_depth">color
	    depth</link> in Wikipedia.
	</para>
      </section>
      <section id="dithering">
	<name>Dithering</name>
	<para id="ditheringp">
	  Alternatively, in order to have a low number of bits per pixel, one can apply a digital processing technique called
	  <term>dithering</term>. The idea is that of obtaining
	  a mixture of colors in a perceptual way, exploiting the proximity of small points of different color. An exhaustive
	  presentation of the phenomenon can be found at the <link src="http://en.wikipedia.org/wiki/Dithering">voice
	  dithering</link> of Wikipedia.
	</para>
      </section>
      <section id="floyd_steinberg">
	<name>Floyd-Steinberg's Dithering</name>
	<para id="floyd_steinbergp">
	  The Floyd-Steinberg's algorithm is one of the most popular techniques 
for the distribution of colored pixels in order to obtain a
	  dithering effect. The idea is to minimize the visual artifacts by
	  processing the error-diffusion. The algorithm can be resumed
	  as follows:
	  <list id="floyd_steinbergalgo">
	<item>While proceeding top down and from left to right for each considered pixel, </item>
	<item><list id="fsalgo1">
	<item>calculate the difference between the goal color and the closest representable color (error)</item>
	<item>spread the error on the contiguous pixels according to the mask 	  <m:math type="block" id="fsmask">
			<m:apply>
				<m:times/>
				<m:apply>
					<m:divide/>
					<m:mn>1</m:mn>
					<m:mn>16</m:mn>
				</m:apply>
				<m:matrix>
					<m:matrixrow>
						<m:mn>0</m:mn>
						<m:mn>0</m:mn>
						<m:mn>0</m:mn>
					</m:matrixrow>
					<m:matrixrow>
						<m:mn>0</m:mn>
						<m:mn>X</m:mn>
						<m:mn>7</m:mn>
					</m:matrixrow>
					<m:matrixrow>
						<m:mn>3</m:mn>
						<m:mn>5</m:mn>
						<m:mn>1</m:mn>
					</m:matrixrow>
				</m:matrix>
			</m:apply>
		</m:math>. That is, add 		<m:math>
			<m:apply>
				<m:divide/>
				<m:mn>7</m:mn>
				<m:mn>16</m:mn>
			</m:apply>
		</m:math> of the error to the pixel on the right of the considered one, add  <m:math>
			<m:apply>
				<m:divide/>
				<m:mn>3</m:mn>
				<m:mn>16</m:mn>
			</m:apply>
		</m:math> of the error to the pixel bottom left with respect to the considered one, and so on.
		</item>
</list></item>
</list>
By means of this algorithm it is possible to reproduce an image with different
	  gray levels by means of a device able to define
	only white and black points. The mask of the
	  Floyd-Steinberg's algorithm was chosen in a way that
	  a uniform distribution of gray intensities <m:math>
		<m:apply>
			<m:divide/>
			<m:mn>1</m:mn>
			<m:mn>2</m:mn>
		</m:apply>
	</m:math> produces a chessboard layout pattern. </para>
      </section>
    <exercise id="lena_dither">
      <problem>
	<para id="lena_ditherp">
	By means of Processing, implement a program to process the file
	  <link src="lena.jpg">lena</link>, a "dithered" black and white version of the famous <link src="http://en.wikipedia.org/wiki/Lenna">Lena</link> image. The image,
	 treated only in the left half, should result similar to that of <cnxn target="lena_dithered"/>
	<figure id="lena_dithered">
		<media type="image/png" src="lena_dithered.png">
		</media>
	</figure>
</para>
      </problem>
      <solution>
	<para id="lena_dithers">
	  <code type="block">
<![CDATA[
size(300, 420); 
PImage a;  // Declare variable "a" of type PImage 
a = loadImage("lena.jpg"); // Load the images into the program 
image(a, 0, 0); // Displays the image from point (0,0) 
 
int[][] output = new int[width][height]; 
for (int i=0; i<width; i++)
  for (int j=0; j<height; j++) {
    output[i][j] = (int)red(get(i, j)); 
    }
int grayVal;
float errore;
float val = 1.0/16.0;
float[][] kernel = { {0, 0, 0}, 
                     {0, -1, 7*val}, 
                     {3*val, 5*val, val }}; 

for(int y=0; y<height; y++) { 
  for(int x=0; x<width; x++) { 
    grayVal = output[x][y];// (int)red(get(x, y));
    if (grayVal<128) errore=grayVal;
      else errore=grayVal-256;
    for(int k=-1; k<=1; k++) { 
      for(int j=-1; j<=0 /*1*/; j++) { 
        // Reflect x-j to not exceed array boundary 
        int xp = x-j; 
        int yp = y-k; 
        if (xp < 0) { 
          xp = xp + width; 
        } else if (x-j >= width) { 
          xp = xp - width;  
        } 
        // Reflect y-k to not exceed array boundary 
        if (yp < 0) { 
          yp = yp + height; 
        } else if (yp >= height) { 
          yp = yp - height; 
      }
      output[xp][yp] = (int)(output[xp][yp] + errore * kernel[-j+1][-k+1]);
      } 
    } 
  } 
} 
 
for(int i=0; i<height; i++)  
  for(int j=0; j<width; j++) 
    if (output[j][i] < 128) output[j][i] = 0;
    else output[j][i] = 255; 
    
// Display the result of dithering on half image
loadPixels(); 
for(int i=0; i<height; i++) { 
  for(int j=0; j<width/2; j++) { 
    pixels[i*width + j] = 
      color(output[j][i], output[j][i], output[j][i]);
  } 
} 
updatePixels(); 
]]>
</code>
	</para>
      </solution>
    </exercise>
    </section>
<section id="audio_quant">
<name>Economic Sound Representations</name>
      <para id="pitched_noise">
In case of audio signals, the use of dithering aims at reducing
the perceptual effect of the error produced by the changes in the quantization resolution, that one typically performs when recording and processing audio signals. For example, when recoding music, more than 16-bits of quantization are usually employed. Furthermore, mathematical operations applied to the signal (as, for instance, simple dynamical variations) require an increasing of the bit depth that is of the number of bits. As soon as one reaches the final product, the audio CD, the number of quantization bits has to be reduced to 16. In each of these consecutives processes of re-quantization one introduces an error, that adds up. In case of a reduction of the number
of bits, it is possible to truncate the values (that is the digits after the decimal point are neglected and set to zero) or round them (that is the decimal number is approximated to the closest integer). In both cases one introduces an error. In particular, when one considers signals with a well defined <term>pitch</term> (as in the case of musical signals), the error becomes periodic-like. From the example of the <link src="http://en.wikipedia.org/wiki/Dithering">voice dithering</link>
of Wikipedia, the reason of this additional
pseudo-periodic noise (i.e. harmonic-like) becomes clear. This distortion corresponds to a buzz-like sound that "follows" the pitch of the quantized sound. The whole result is quite annoying from a perceptual point of view. 
</para>
<para id="audio_dithering">In case
of audio signals, thus, dithering has the function of transforming this
buzz-like sound in a background noise similar to a rustling, less annoying from a listening point of view. In <cnxn target="clar"/> an example of some
periods of the waveform of a clarinet quantized with 16
bits is reported. The result of a reduction of the number of bits to 8 is represented
in <cnxn target="clar8"/>. It is clearly visible how the reduction of the quantization levels produces series of lines with constant amplitude. The application of dithering generates a further
transformation that, how one can see in <cnxn target="clar8dith"/> "breaks"
the constant lines by means of the introduction of white noise. The <cnxn target="FT-clar"/>, <cnxn target="FT-clar8"/> and <cnxn target="FT-clar8dith"/> represent the Fourier transforms of 
the sounds of <cnxn target="clar"/>, <cnxn target="clar8"/> and <cnxn target="clar8dith"/>, respectively. In the frequecy 
representation, it is also visible how the change to a
quantization at 8 bits introduces improper harmonics (<cnxn target="FT-clar8"/>) not present in the sound at 16 bit (<cnxn target="FT-clar"/>). These harmonics are canceled by the effect of dithering (<cnxn target="FT-clar8dith"/>).

<figure id="clar">
	    <media type="image/png" src="Clarinetto.png">
	    </media>
	  </figure>

<figure id="clar8">
	    <media type="image/jpg" src="Clarinetto8.png">
	    </media>
	  </figure>

<figure id="clar8dith">
	    <media type="image/jpg" src="Clarinetto8dith.png">
	    </media>
	  </figure>

<figure id="FT-clar">
	    <media type="image/jpg" src="FT-Clarinetto.png">
	    </media>
	  </figure>

<figure id="FT-clar8">
	    <media type="image/jpg" src="FT-Clarinetto8.png">
	    </media>
	  </figure>

<figure id="FT-clar8dith">
	    <media type="image/jpg" src="FT-Clarinetto8dith.png">
	    </media>
	  </figure>
 </para>

<para id="noise_shapingp">There exist also methods that exploit perceptual factors as the fact that our ear is more sensitive in the central region of the audio band and less sensitive in the higher region. This allows one to make the effect of a re-quantization less audible. This is the case of the
<term>noise shaping</term> techniques. This method consists
in "modeling" the quantization noise. <cnxn target="clar8dithNS"/> represents the sound of a clarinet
re-quantized with 8 bits. A dithering and a noise-shaping were applied to the sound. The result, apparently destructive from the point of view of the waveform, corresponds to a sound, whose 
spectrum is closer to that of the original sound at 16 bits, excepted
for a considerable increasing of energy in the very high frequency region (<cnxn target="FT-clar8dithNS"/>). This high frequency noise produces this "ruffled" waveform, i.e. high energy fast amplitude variations. This noise is anyway not
audible. Thus, at a listening test, the final result is better than the previous one.

<figure id="clar8dithNS">
		<media type="image/jpg" src="Clarinetto8dithNS.png">
		</media>
	</figure>
	<figure id="FT-clar8dithNS">
		<media type="image/jpg" src="FT-Clarinetto8dithNS.png">
		</media>
	</figure>

It is possible to think about the noise shaping as
the audio counterpart of the Floyd-Steinberg's algorithm for graphics. In the
audio case the error propagation occurs in the time-domain instead of the space-domain. 
The most simple version of noise shaping can be obtained by means of the definition of the quantization error  
	<equation id="eyy">
		<m:math type="block">
			<m:apply>
				<m:eq/>
				<m:apply>
					<m:ci type="fn">e</m:ci>
					<m:ci>n</m:ci>
				</m:apply>
				<m:apply>
					<m:minus/>
					<m:apply>
						<m:ci type="fn">y</m:ci>
						<m:ci>n</m:ci>
					</m:apply>
					<m:apply>
						<m:ci type="fn">Q</m:ci>
						<m:apply>
							<m:ci type="fn">y</m:ci>
							<m:ci>n</m:ci>
						</m:apply>
					</m:apply>
				</m:apply>
			</m:apply>
		</m:math></equation> where <equation id="yxe">
		<m:math type="block">
			<m:apply>
				<m:eq/>
				<m:apply>
					<m:ci type="fn">y</m:ci>
					<m:ci>n</m:ci>
				</m:apply>
				<m:apply>
					<m:plus/>
					<m:apply>
						<m:ci type="fn">x</m:ci>
						<m:ci>n</m:ci>
					</m:apply>
					<m:apply>
						<m:ci type="fn">e</m:ci>
						<m:apply>
							<m:minus/>
							<m:ci>n</m:ci>
							<m:mn>1</m:mn>
						</m:apply>
					</m:apply>
				</m:apply>
			</m:apply>
		</m:math></equation> and <m:math>
		<m:ci>x</m:ci>
	</m:math> is the non-quantized signal. Further details about noise shaping can be found
	  at Wikipedia, <link src="http://en.wikipedia.org/wiki/Noise_shaping">noise
	  shaping</link>.

What presented above can be tested in the sound examples 
<link src="./clarinetto.wav"> clarinet</link> , <link src="./clarinetto8.wav"> clarinet at 8 bits</link>, <link src="./clarinetto8dith.wav"> clarinet at 8 bits with dithering</link> and <link src="./clarinetto8dithNs.wav"> clarinet at 8 bit with noise shaping</link> that contain the clarinet sounds represented in <cnxn target="clar"/>, <cnxn target="clar8"/>, <cnxn target="clar8dith"/>, e <cnxn target="clar8dithNS"/>, respectively. 
  </para>
</section>

    <section id="histogram">
      <name>Histogram-Based Processing.</name>
      <definition id="istogrammad">
	<term>Image Histograms </term>
	<meaning>Graphic representation by means of vertical bars, where each bar represents the number of pixels present
	in the image for a given intensity of the gray scale (or color channel). <link src="http://en.wikipedia.org/wiki/Color_histogram">Wikipedia definition.</link></meaning>      </definition>
      <para id="histogramp">
	Among the <link src="http://www.processing.org/learning/examples/blur.html">examples
	in Processing</link>, one finds the code <link src="http://www.processing.org/learning/examples/histogram.html">Histogram</link>
	that overlap an image to its own histogram.
      </para>
      <para id="histogrampp">
	The histogram offers a synthetic representation of
	images, in which one looses the information concerning the pixel positions and considers only the chromatic aspects. This provides information about the <term>Tonal Gamma</term> of an image
	(gray intensity that are present) and about the
	<term>Dynamics</term> (extension of the Tonal Gamma). The image of a chess board, for example, has a Tonal Gamma that includes only two gray levels (black and white) but it has a maximal dynamics (since white and black are the two extremity of the representable gray levels).</para>
      <para id="histogramtp">
	The histogram is the starting point for various processing effects aiming at 
balancing or altering the chromatic contents of an image. In general, 
the question is building a map <m:math>
		<m:apply>
			<m:eq/>
			<m:msub>
				<m:mi>g</m:mi>
				<m:mn>o</m:mn>
			</m:msub>
			<m:apply>
				<m:ci type="fn">f</m:ci>
				<m:msub>
					<m:mi>g</m:mi>
					<m:mn>i</m:mn>
				</m:msub>
			</m:apply>
		</m:apply>
	</m:math> for the gray levels (or color-channel levels) that can be applied to
each pixel. The histogram can drive the definition of this map. 
      </para>
      <section id="hist_transp">
	<name>Translation and Expansion of an Histogram</name>
	<para id="hist_transpp">
	If the map is of the kind <m:math>
		<m:apply>
			<m:eq/>
			<m:msub>
				<m:mi>g</m:mi>
				<m:mn>o</m:mn>
			</m:msub>
			<m:apply>
				<m:plus/>
				<m:msub>
					<m:mi>g</m:mi>
					<m:mn>i</m:mn>
				</m:msub>
				<m:ci>k</m:ci>
			</m:apply>
		</m:apply>
	</m:math>
	  the histogram is translated in the sense of a higher or lower brightness, 
according to the sign of <m:math>
		<m:ci>k</m:ci>
	</m:math>. On the other side, if the map is of the kind
	    <m:math>
		<m:apply>
			<m:eq/>
			<m:msub>
				<m:mi>g</m:mi>
				<m:mn>o</m:mn>
			</m:msub>
			<m:apply>
				<m:times/>
				<m:ci>k</m:ci>
				<m:msub>
					<m:mi>g</m:mi>
					<m:mn>i</m:mn>
				</m:msub>
			</m:apply>
		</m:apply>
	</m:math> the histogram will be expanded or compressed, for
	    values of <m:math>
		<m:ci>k</m:ci>
	</m:math>
	    smaller or greater than <m:math>
		<m:mn>1</m:mn>
	</m:math>, respectively.
	</para>
	<para id="contrast_stretchingp">
	  The <term>contrast stretching</term> is one of the operations of this kind 
of linear scaling that tries to extend the dynamic range of an image. The interval
	    by means of which one performs the scaling is set on the basis of the 
histogram, neglecting, for example, the
	    tails of the distribution corresponding to <m:math>
		<m:mn>10</m:mn>
		<m:ci>%</m:ci>
	</m:math> of the darkest and brightest pixels. In order to find more information and practical suggestions see <link src="http://www.cee.hw.ac.uk/hipr/html/stretch.html">: A Guide to Contrast Stretching.</link>
</para>
      </section>
      <section id="non-linear_scaling">
	<name>Non Linear Scaling</name>
	<para id="non-linear_scalingp">
	  More in general, the map 
	  <m:math>
		<m:apply>
			<m:eq/>
			<m:msub>
				<m:mi>g</m:mi>
				<m:mn>o</m:mn>
			</m:msub>
			<m:apply>
				<m:ci type="fn">f</m:ci>
				<m:msub>
					<m:mi>g</m:mi>
					<m:mn>i</m:mn>
				</m:msub>
			</m:apply>
		</m:apply>
	</m:math>
	  can be non linear, and this allows a greater
	  flexibility in the manipulation of the histogram. A useful
	  instrument is the one that allows to manipulate
	  interactively the scaling map and to see the results on the image and/or 
on the histogram in real time. The instrument
	  <code>Color Tools/Curves</code> of the image processing software <link src="http://www.gimp.org">Gimp</link> does this, using an interpolating spline. In
	  Processing it is possible to build a similar instrument, as reported in <cnxn target="lenahist"/>.
</para>
	<example id="lenahist">
	  <para id="lenahistp"> <link src="histogram_t.html">Applet
	  that allows to apply a non linear scaling to the
	  gray levels and to analyze the effect by means of an histogram. </link>
	  </para>
	</example>
      </section>
      <section id="histogrameq">
	<name>Equalization of an Histogram</name>
	<para id="histogrameqp">
	  The non linear scaling is the tool to
	  <term>equalize</term> the histogram, that is to shape it in a desirable way.
 An image has a balanced tonal gamma, if all of the gray levels are
	  represented and if the distribution is approximately uniform. 
In other words, one aims at a flat histogram. Without entering too much into the mathematical
details (more information is available in the <link src="http://www.cee.hw.ac.uk/hipr/html/histeq.html">Guide
	  to the Equalization of the Histogram</link>), one can say that the non linear map 
	  to be used for the equalization is obtained from the
	  <term>cumulated distribution</term> of the histogram
	  of the image
	  <m:math>
		<m:apply>
			<m:eq/>
			<m:apply>
				<m:ci type="fn">f</m:ci>
				<m:msub>
					<m:mi>g</m:mi>
					<m:mn>i</m:mn>
				</m:msub>
			</m:apply>
			<m:apply>
				<m:sum/>
				<m:bvar>
					<m:ci>k</m:ci>
				</m:bvar>
				<m:lowlimit>
					<m:mn>0</m:mn>
				</m:lowlimit>
				<m:uplimit>
					<m:msub>
						<m:mi>g</m:mi>
						<m:mn>i</m:mn>
					</m:msub>
				</m:uplimit>
				<m:apply>
					<m:ci type="fn">h</m:ci>
					<m:ci>k</m:ci>
				</m:apply>
			</m:apply>
		</m:apply>
	</m:math>, where <m:math>
		<m:apply>
			<m:ci type="fn">h</m:ci>
			<m:ci>k</m:ci>
		</m:apply>
	</m:math> is the frequency, properly scaled by means of a 
	    normalization constant, with which the <m:math>
		<m:ci>k</m:ci>
	</m:math>-th gray level appears .
	</para>
      </section>
    <exercise id="histogramap">
      <problem>
	<para id="histogramapp">
	    Modify the Processing code of <cnxn target="lenahist"/> in order to add the operation of
	    equalization of the histogram.
	</para>
      </problem>
	<solution>
	  <para id="histogramsp">
	    <code type="block">
<![CDATA[
int grayValues = 256;
int[] hist = new int[grayValues]; 
int[] histc = new int[grayValues];
PImage a; 

void setup() {
  background(255);
  stroke(0,0,0);
  size(300, 420); 
  colorMode(RGB, width); 
  framerate(5);
  a = loadImage("lena.jpg"); 
  image(a, 0, 0);
}

void draw() {
    // calculate the histogram 
  for (int i=0; i<width; i++) { 
    for (int j=0; j<height; j++) { 
      int c = constrain(int(red(get(i,j))), 0, grayValues-1);
      hist[c]++;
    } 
  } 

  // Find the largest value in the histogram 
  float maxval = 0; 
  for (int i=0; i<grayValues; i++) { 
    if(hist[i] > maxval) { 
      maxval = hist[i]; 
    }  
  } 

  // Accumulate the histogram
  histc[0] = hist[0];
  for (int i=1; i<grayValues; i++) { 
    histc[i] = histc[i-1] + hist[i];    
  } 
  
  // Normalize the histogram to values between 0 and "height" 
  for (int i=0; i<grayValues; i++) { 
    hist[i] = int(hist[i]/maxval * height); 
  } 

  if (mousePressed == true) { //equalization
    for (int i=1; i<grayValues; i++) { 
    println(float(histc[i])/histc[grayValues-1]*256);
    } 
    loadPixels();
    println("click");
    for (int i=0; i<width; i++)
      for (int j=0; j<height; j++) {
        //normalized cumulated histogram mapping
        pixels[i+j*width] = color(
          int(
          float(histc[constrain(int(red(a.get(i,j))), 0, grayValues-1)])/
            histc[grayValues-1]*256)); 
      }
    updatePixels();
     }
 
  // Draw half of the histogram 
  stroke(50, 250, 0); 
  strokeWeight(2);
  for (int i=0; i<grayValues; i++) { 
    line(i, height, i, height-hist[i]); 
  } 
}
]]>
</code>
	  </para>
	</solution>
    </exercise>
    </section>
    <section id="edges_regions">
      <name>Segmentation and Contour Extraction</name>
      <section>
	<name>Contours</name>
	<para id="contornip">
	 The objects that populate a scene represented in an
	  image are usually detectable by means of their contours:
	  Thread-like profiles that correspond to a fast variation of color 
	  or of gray intensity. The contour extraction is one of the typical 
operation of the image processing. From the point of view of the Mathematical Analysis, 
a fast variation of a function corresponds to a high value of the 
	  <term>derivative</term> function. In the frame of Digital Signal Processing 
(DSP), the derivative can be approximated by means of a difference operation, i.e. 
by means of a filter. 
The filters that let fast variations through and eliminate slow
	  variations are of a high-pass type. It is not surprising, thus,
	  that for contour extraction, one uses convolution masks similar to that seen
 in <cnxn document="m12827" target="crispp">Elementary Filters</cnxn>
	  employed for edge crispening. More in detail, one can say that in
	  2D one looks for points with maximum amplitude of the
	  <term>gradient</term>, i.e. points, where the <term>Laplacian</term> 
of the image that is its second spatial derivative is necessarily zero. 
The Laplacian can be approximated (in the discrete space) by means of a convolution mask
	  
<m:math type="block">
		<m:matrix>
			<m:matrixrow>
				<m:mn>0</m:mn>
				<m:mn>1</m:mn>
				<m:mn>0</m:mn>
			</m:matrixrow>
			<m:matrixrow>
				<m:mn>1</m:mn>
				<m:mn>-4</m:mn>
				<m:mn>1</m:mn>
			</m:matrixrow>
			<m:matrixrow>
				<m:mn>0</m:mn>
				<m:mn>1</m:mn>
				<m:mn>0</m:mn>
			</m:matrixrow>
		</m:matrix>
	</m:math>.  The direct application of the Laplacian is often not
	  satisfying, since the result is too sensitive to noise and to small details. 
It is, thus, useful to combine the Laplacian mask with that of a low-pass filter. 
The combination of a Laplacian and a Gaussian (<term>LoG</term>) filter
	  produces, in the case <m:math>
		<m:mn>5</m:mn>
	</m:math> by<m:math>
		<m:mn>5</m:mn>
	</m:math>, the mask
<m:math type="block">
		<m:matrix>
			<m:matrixrow>
				<m:mn>0</m:mn>
				<m:mn>0</m:mn>
				<m:mn>-1</m:mn>
				<m:mn>0</m:mn>
				<m:mn>0</m:mn>
			</m:matrixrow>
			<m:matrixrow>
				<m:mn>0</m:mn>
				<m:mn>-1</m:mn>
				<m:mn>-2</m:mn>
				<m:mn>-1</m:mn>
				<m:mn>0</m:mn>
			</m:matrixrow>
			<m:matrixrow>
				<m:mn>-1</m:mn>
				<m:mn>-2</m:mn>
				<m:mn>16</m:mn>
				<m:mn>-2</m:mn>
				<m:mn>-1</m:mn>
			</m:matrixrow>
			<m:matrixrow>
				<m:mn>0</m:mn>
				<m:mn>-1</m:mn>
				<m:mn>-2</m:mn>
				<m:mn>-1</m:mn>
				<m:mn>0</m:mn>
			</m:matrixrow>
			<m:matrixrow>
				<m:mn>0</m:mn>
				<m:mn>0</m:mn>
				<m:mn>-1</m:mn>
				<m:mn>0</m:mn>
				<m:mn>0</m:mn>
			</m:matrixrow>
		</m:matrix>
	</m:math>
	  In the software Gimp, a contour enhancer is available, based on the difference 
between two Gaussian filters, corresponding to two Gaussian curves 
with different amplitude. This produces an inhibition effect outside the principal
	  contours, in a way similar to what happens in our perceptual system.
	</para>
      </section>
      <section>
	<name>Regions</name>
	<para id="regionsp">
	  In many applications it is necessary to isolate the various objects that populate a scene, starting from their representation, as pixel collections of an image. For example, it could be interesting to isolate an
	  object in <term>foreground</term> from the <term>background</term>. In this case, one talks about segmentation or extraction of regions. The most simple way for isolating different regions is that of doing it on a color basis, or on a gray-intensity basis. Also in this case, the operation can be driven by the histogram that
	  can be helpful in order to establish a gray <term>threshold</term>. All the darker pixels will be
	  mapped to black, while all the lighter ones will be mapped to white. For example, if the histogram presents two evident maxima, it is possible to assign the region of one maximum to the foreground
	  (since it is, for example, lighter) and the region corresponding to the other maximum to the background (since it is, for example, darker). The threshold will be chosen in between the two maxima. Sometimes it is necessary to 
	  establish a multiplicity of thresholds, in order to isolate different
	  regions by means of different gray levels. For color images, the thresholds can be different for different RGB channels. For more details and practical suggestions see the <link src="http://www.cee.hw.ac.uk/hipr/html/threshld.html">guide
	  to thresholding</link>.
	</para>
      </section>
    </section>
    <exercise id="edgesex">
      <problem>
	<para id="edgesexp">
	  Apply the Laplacian filter and the LoG filter to the image of Lena.
	</para>
      </problem>
    </exercise>
    <exercise id="regionf">
      <problem>
	<para id="regionfp">
	  Show that the extraction of a background figure by means of a 
	  threshold can be implemented by means of non
	  linear map  <m:math>
		<m:apply>
			<m:eq/>
			<m:msub>
				<m:mi>g</m:mi>
				<m:mn>o</m:mn>
			</m:msub>
			<m:apply>
				<m:ci type="fn">f</m:ci>
				<m:msub>
					<m:mi>g</m:mi>
					<m:mn>i</m:mn>
				</m:msub>
			</m:apply>
		</m:apply>
	</m:math>. What should be the form of this map?
	</para>
      </problem>
      <solution>
	<para id="regionfs">
	  It is a step map, with transition from <m:math>
		<m:mn>0</m:mn>
	</m:math> to <m:math>
		<m:mn>255</m:mn>
	</m:math> set in correspondence of the chosen threshold.
	</para>
      </solution>
    </exercise>
    <exercise id="regiongimp">
      <problem>
	<para id="regiongimpp">
	  Employ a program for image processing
	  (ex. <link src="http://www.gimp.org">Gimp</link>) in order to
	  isolate (by means of thresholding) the breaks in the image of the
	  <link src="http://cnx.rice.edu/content/m12837/latest/vetro.jpg">broken glass</link>.
	</para>
      </problem>
    </exercise>


<section id="Mulaw">
      <name>Audio Dynamic Compression</name>
        <para id="mu_law">
For audio signals, similarly to what seen for images, one considers the problem of reducing the data
necessary to represent a sound, while preserving an acceptable quality of the signal from a perceptual point of view. It is not easy to define, what is meant by "acceptable quality", when one perform a data reduction or, better, a data compression. In
general the qualitative evaluation parameters of the audio compression standards are statistical, based on the results of listening tests made on groups of listeners, representing a wide gamma of users. The audio compression standards are usually founded on the optimization of the
dynamics of the signal, that is on the optimization of the number of bits employed for the quantization. A well known example of compression standard is
that of mp3, in which one exploits psychoacoustic phenomena, as the fact that louder sounds mask (make inaudible) softer sounds. In the reproduction of digitalized sounds,
the thing that one wants to mask is the quantization noise. In other words, if the sound has a wide dynamics (it is loud) one can adopt a greater quantization step, since the louder quantization noise produced by the rougher subdivision of the quantization levels is anyway masked by the reproduced sound. 
Still simplifying things in a drastic way, one could say that mp3 varies the step of 
quantization according to the dynamics of the sound and in a different way in different 
bands of frequency. In other words, the signal is divided into many frequency bands (in a similar way as the Equalizer of a Hi-fi system does) and each band is quantized separately. This allows
a reduction of even 20 times of the number of bits 
with respect to a fixed 16 bit dynamics. Another compression technique is provided by the mu-law (µ-law). This standard is used mainly in audio systems for digital communication in North America and Japan. In this case, the main idea is to modify the dynamic 
range of the analogical audio signal before the quantization.  Behind
these compression techniques, there is once more a psychoacoustic phenomenon that is the fact that our perception of the intensity is not linear but logarithmic-like. In other words, our 
perception behaves approximately according to what shown in <cnxn target="loudness"/>.
</para>
	  <figure id="loudness">
	    <media type="image/png" src="Loudness.png">
	    </media>
	  </figure>
<para id="mulaw2">
	What the mu-law actually performs is a reduction of the dynamic range of the
signal by means of an amplitude re-scaling according to the map described in 
<cnxn target="mulaw"/>. It is visible how the effect is
that of amplifying the small amplitudes, reducing the range of amplitude
values of the signal (in the sense of big amplitudes) and, as a consequence,
increasing the relationship (the amplitude difference) between the sound and the 
quantization noise. Afterwards, a linear quantization of the non linearly distorted signal 
is performed. As one wants to play back the digital signal,
this is first converted into an analogical signal and then
transformed by means of a curve performing 
an inverse amplitude distortion with respect to that of <cnxn target="mulaw"/>. 
The global result is equivalent to a non linear quantization of sound, where the quantization step is bigger (rougher) for bigger amplitudes and smaller (more detailed) for smaller amplitudes. This corresponds, at least from a qualitative point of view, 
to the way of functioning of our perceptual system. We are more
sensitive to the intensity differences in case of soft sounds and less
sensitive in case of loud and very loud sounds. The A-law, adopted in the digital systems in Europe, is very similar to the mu-law.
</para>
  <figure id="mulaw">
	    <media type="image/png" src="mu-law.png">
	    </media>
	  </figure>
</section>

  </content>
  
</document>
