Summary: Miscellaneous topics and effects in audio and image processing.
In Media Representation in
Processing we saw how one devotes
A first solution comes from the observation that usually in an image, not all of the
Alternatively, in order to have a low number of bits per pixel, one can apply a digital processing technique called dithering. The idea is that of obtaining a mixture of colors in a perceptual way, exploiting the proximity of small points of different color. An exhaustive presentation of the phenomenon can be found at the voice dithering of Wikipedia.
The Floyd-Steinberg's algorithm is one of the most popular techniques for the distribution of colored pixels in order to obtain a dithering effect. The idea is to minimize the visual artifacts by processing the error-diffusion. The algorithm can be resumed as follows:
By means of Processing, implement a program to process the file lena, a "dithered" black and white version of the famous Lena image. The image, treated only in the left half, should result similar to that of Figure 1
![]() |
size(300, 420);
PImage a; // Declare variable "a" of type PImage
a = loadImage("lena.jpg"); // Load the images into the program
image(a, 0, 0); // Displays the image from point (0,0)
int[][] output = new int[width][height];
for (int i=0; i<width; i++)
for (int j=0; j<height; j++) {
output[i][j] = (int)red(get(i, j));
}
int grayVal;
float errore;
float val = 1.0/16.0;
float[][] kernel = { {0, 0, 0},
{0, -1, 7*val},
{3*val, 5*val, val }};
for(int y=0; y<height; y++) {
for(int x=0; x<width; x++) {
grayVal = output[x][y];// (int)red(get(x, y));
if (grayVal<128) errore=grayVal;
else errore=grayVal-256;
for(int k=-1; k<=1; k++) {
for(int j=-1; j<=0 /*1*/; j++) {
// Reflect x-j to not exceed array boundary
int xp = x-j;
int yp = y-k;
if (xp < 0) {
xp = xp + width;
} else if (x-j >= width) {
xp = xp - width;
}
// Reflect y-k to not exceed array boundary
if (yp < 0) {
yp = yp + height;
} else if (yp >= height) {
yp = yp - height;
}
output[xp][yp] = (int)(output[xp][yp] + errore * kernel[-j+1][-k+1]);
}
}
}
}
for(int i=0; i<height; i++)
for(int j=0; j<width; j++)
if (output[j][i] < 128) output[j][i] = 0;
else output[j][i] = 255;
// Display the result of dithering on half image
loadPixels();
for(int i=0; i<height; i++) {
for(int j=0; j<width/2; j++) {
pixels[i*width + j] =
color(output[j][i], output[j][i], output[j][i]);
}
}
updatePixels();
In case of audio signals, the use of dithering aims at reducing the perceptual effect of the error produced by the changes in the quantization resolution, that one typically performs when recording and processing audio signals. For example, when recoding music, more than 16-bits of quantization are usually employed. Furthermore, mathematical operations applied to the signal (as, for instance, simple dynamical variations) require an increasing of the bit depth that is of the number of bits. As soon as one reaches the final product, the audio CD, the number of quantization bits has to be reduced to 16. In each of these consecutives processes of re-quantization one introduces an error, that adds up. In case of a reduction of the number of bits, it is possible to truncate the values (that is the digits after the decimal point are neglected and set to zero) or round them (that is the decimal number is approximated to the closest integer). In both cases one introduces an error. In particular, when one considers signals with a well defined pitch (as in the case of musical signals), the error becomes periodic-like. From the example of the voice dithering of Wikipedia, the reason of this additional pseudo-periodic noise (i.e. harmonic-like) becomes clear. This distortion corresponds to a buzz-like sound that "follows" the pitch of the quantized sound. The whole result is quite annoying from a perceptual point of view.
In case of audio signals, thus, dithering has the function of transforming this buzz-like sound in a background noise similar to a rustling, less annoying from a listening point of view. In Figure 2 an example of some periods of the waveform of a clarinet quantized with 16 bits is reported. The result of a reduction of the number of bits to 8 is represented in Figure 3. It is clearly visible how the reduction of the quantization levels produces series of lines with constant amplitude. The application of dithering generates a further transformation that, how one can see in Figure 4 "breaks" the constant lines by means of the introduction of white noise. The Figure 5, Figure 6 and Figure 7 represent the Fourier transforms of the sounds of Figure 2, Figure 3 and Figure 4, respectively. In the frequecy representation, it is also visible how the change to a quantization at 8 bits introduces improper harmonics (Figure 6) not present in the sound at 16 bit (Figure 5). These harmonics are canceled by the effect of dithering (Figure 7).
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
There exist also methods that exploit perceptual factors as the fact that our ear is more sensitive in the central region of the audio band and less sensitive in the higher region. This allows one to make the effect of a re-quantization less audible. This is the case of the noise shaping techniques. This method consists in "modeling" the quantization noise. Figure 8 represents the sound of a clarinet re-quantized with 8 bits. A dithering and a noise-shaping were applied to the sound. The result, apparently destructive from the point of view of the waveform, corresponds to a sound, whose spectrum is closer to that of the original sound at 16 bits, excepted for a considerable increasing of energy in the very high frequency region (Figure 9). This high frequency noise produces this "ruffled" waveform, i.e. high energy fast amplitude variations. This noise is anyway not audible. Thus, at a listening test, the final result is better than the previous one.
![]() |
![]() |
Among the examples in Processing, one finds the code Histogram that overlap an image to its own histogram.
The histogram offers a synthetic representation of images, in which one looses the information concerning the pixel positions and considers only the chromatic aspects. This provides information about the Tonal Gamma of an image (gray intensity that are present) and about the Dynamics (extension of the Tonal Gamma). The image of a chess board, for example, has a Tonal Gamma that includes only two gray levels (black and white) but it has a maximal dynamics (since white and black are the two extremity of the representable gray levels).
The histogram is the starting point for various processing effects aiming at
balancing or altering the chromatic contents of an image. In general,
the question is building a map
If the map is of the kind
The contrast stretching is one of the operations of this kind
of linear scaling that tries to extend the dynamic range of an image. The interval
by means of which one performs the scaling is set on the basis of the
histogram, neglecting, for example, the
tails of the distribution corresponding to
More in general, the map
Color Tools/Curves of the image processing software Gimp does this, using an interpolating spline. In
Processing it is possible to build a similar instrument, as reported in Example 1.
The non linear scaling is the tool to
equalize the histogram, that is to shape it in a desirable way.
An image has a balanced tonal gamma, if all of the gray levels are
represented and if the distribution is approximately uniform.
In other words, one aims at a flat histogram. Without entering too much into the mathematical
details (more information is available in the Guide
to the Equalization of the Histogram), one can say that the non linear map
to be used for the equalization is obtained from the
cumulated distribution of the histogram
of the image
Modify the Processing code of Example 1 in order to add the operation of equalization of the histogram.
int grayValues = 256;
int[] hist = new int[grayValues];
int[] histc = new int[grayValues];
PImage a;
void setup() {
background(255);
stroke(0,0,0);
size(300, 420);
colorMode(RGB, width);
framerate(5);
a = loadImage("lena.jpg");
image(a, 0, 0);
}
void draw() {
// calculate the histogram
for (int i=0; i<width; i++) {
for (int j=0; j<height; j++) {
int c = constrain(int(red(get(i,j))), 0, grayValues-1);
hist[c]++;
}
}
// Find the largest value in the histogram
float maxval = 0;
for (int i=0; i<grayValues; i++) {
if(hist[i] > maxval) {
maxval = hist[i];
}
}
// Accumulate the histogram
histc[0] = hist[0];
for (int i=1; i<grayValues; i++) {
histc[i] = histc[i-1] + hist[i];
}
// Normalize the histogram to values between 0 and "height"
for (int i=0; i<grayValues; i++) {
hist[i] = int(hist[i]/maxval * height);
}
if (mousePressed == true) { //equalization
for (int i=1; i<grayValues; i++) {
println(float(histc[i])/histc[grayValues-1]*256);
}
loadPixels();
println("click");
for (int i=0; i<width; i++)
for (int j=0; j<height; j++) {
//normalized cumulated histogram mapping
pixels[i+j*width] = color(
int(
float(histc[constrain(int(red(a.get(i,j))), 0, grayValues-1)])/
histc[grayValues-1]*256));
}
updatePixels();
}
// Draw half of the histogram
stroke(50, 250, 0);
strokeWeight(2);
for (int i=0; i<grayValues; i++) {
line(i, height, i, height-hist[i]);
}
}
The objects that populate a scene represented in an
image are usually detectable by means of their contours:
Thread-like profiles that correspond to a fast variation of color
or of gray intensity. The contour extraction is one of the typical
operation of the image processing. From the point of view of the Mathematical Analysis,
a fast variation of a function corresponds to a high value of the
derivative function. In the frame of Digital Signal Processing
(DSP), the derivative can be approximated by means of a difference operation, i.e.
by means of a filter.
The filters that let fast variations through and eliminate slow
variations are of a high-pass type. It is not surprising, thus,
that for contour extraction, one uses convolution masks similar to that seen
in Elementary Filters
employed for edge crispening. More in detail, one can say that in
2D one looks for points with maximum amplitude of the
gradient, i.e. points, where the Laplacian
of the image that is its second spatial derivative is necessarily zero.
The Laplacian can be approximated (in the discrete space) by means of a convolution mask
In many applications it is necessary to isolate the various objects that populate a scene, starting from their representation, as pixel collections of an image. For example, it could be interesting to isolate an object in foreground from the background. In this case, one talks about segmentation or extraction of regions. The most simple way for isolating different regions is that of doing it on a color basis, or on a gray-intensity basis. Also in this case, the operation can be driven by the histogram that can be helpful in order to establish a gray threshold. All the darker pixels will be mapped to black, while all the lighter ones will be mapped to white. For example, if the histogram presents two evident maxima, it is possible to assign the region of one maximum to the foreground (since it is, for example, lighter) and the region corresponding to the other maximum to the background (since it is, for example, darker). The threshold will be chosen in between the two maxima. Sometimes it is necessary to establish a multiplicity of thresholds, in order to isolate different regions by means of different gray levels. For color images, the thresholds can be different for different RGB channels. For more details and practical suggestions see the guide to thresholding.
Apply the Laplacian filter and the LoG filter to the image of Lena.
Show that the extraction of a background figure by means of a
threshold can be implemented by means of non
linear map
It is a step map, with transition from
Employ a program for image processing (ex. Gimp) in order to isolate (by means of thresholding) the breaks in the image of the broken glass.
For audio signals, similarly to what seen for images, one considers the problem of reducing the data necessary to represent a sound, while preserving an acceptable quality of the signal from a perceptual point of view. It is not easy to define, what is meant by "acceptable quality", when one perform a data reduction or, better, a data compression. In general the qualitative evaluation parameters of the audio compression standards are statistical, based on the results of listening tests made on groups of listeners, representing a wide gamma of users. The audio compression standards are usually founded on the optimization of the dynamics of the signal, that is on the optimization of the number of bits employed for the quantization. A well known example of compression standard is that of mp3, in which one exploits psychoacoustic phenomena, as the fact that louder sounds mask (make inaudible) softer sounds. In the reproduction of digitalized sounds, the thing that one wants to mask is the quantization noise. In other words, if the sound has a wide dynamics (it is loud) one can adopt a greater quantization step, since the louder quantization noise produced by the rougher subdivision of the quantization levels is anyway masked by the reproduced sound. Still simplifying things in a drastic way, one could say that mp3 varies the step of quantization according to the dynamics of the sound and in a different way in different bands of frequency. In other words, the signal is divided into many frequency bands (in a similar way as the Equalizer of a Hi-fi system does) and each band is quantized separately. This allows a reduction of even 20 times of the number of bits with respect to a fixed 16 bit dynamics. Another compression technique is provided by the mu-law (µ-law). This standard is used mainly in audio systems for digital communication in North America and Japan. In this case, the main idea is to modify the dynamic range of the analogical audio signal before the quantization. Behind these compression techniques, there is once more a psychoacoustic phenomenon that is the fact that our perception of the intensity is not linear but logarithmic-like. In other words, our perception behaves approximately according to what shown in Figure 10.
![]() |
What the mu-law actually performs is a reduction of the dynamic range of the signal by means of an amplitude re-scaling according to the map described in Figure 11. It is visible how the effect is that of amplifying the small amplitudes, reducing the range of amplitude values of the signal (in the sense of big amplitudes) and, as a consequence, increasing the relationship (the amplitude difference) between the sound and the quantization noise. Afterwards, a linear quantization of the non linearly distorted signal is performed. As one wants to play back the digital signal, this is first converted into an analogical signal and then transformed by means of a curve performing an inverse amplitude distortion with respect to that of Figure 11. The global result is equivalent to a non linear quantization of sound, where the quantization step is bigger (rougher) for bigger amplitudes and smaller (more detailed) for smaller amplitudes. This corresponds, at least from a qualitative point of view, to the way of functioning of our perceptual system. We are more sensitive to the intensity differences in case of soft sounds and less sensitive in case of loud and very loud sounds. The A-law, adopted in the digital systems in Europe, is very similar to the mu-law.
![]() |