Long-tail traffic is telecommunication traffic whose characteristics follow a long-tailed probability distribution. A long-tailed, or heavy-tailed, probability distribution is one that assigns relatively high probability to regions far from the mean or median (Wikipedia 2006b).
Long-tailed distributions have been observed in many natural phenomena, both physical and sociological, and have been used to model real-world phenomena such as stock markets, earthquakes, and the weather (Wikipedia 2006b).
In teletraffic engineering, a number of quantities of interest have been shown to have a long-tailed distribution (Wikipedia 2006b).
Example: consider the sizes of files transferred from a web server. To a good degree of accuracy, the distribution is heavy-tailed: a large number of small files are transferred but, crucially, the number of very large files transferred remains significant (Wikipedia 2006b).
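The file-size example can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the Pareto shape of 1.2, the minimum size of 1 kB, and the helper name `top_share` are hypothetical choices, not measurements from any real web server. It compares how much of the total volume is carried by the largest 1% of files under a heavy-tailed and a light-tailed (exponential) size model with the same theoretical mean.

```python
import random

def top_share(sizes, fraction=0.01):
    """Fraction of the total volume carried by the largest `fraction` of files."""
    ordered = sorted(sizes, reverse=True)
    top_count = max(1, int(len(ordered) * fraction))
    return sum(ordered[:top_count]) / sum(ordered)

rng = random.Random(42)  # fixed seed so the run is repeatable
n_files = 100_000

# Assumed heavy-tailed model: Pareto sizes, shape 1.2, minimum 1 kB.
# The theoretical mean is 1.2 / (1.2 - 1) = 6 kB.
pareto_sizes = [1.0 * rng.paretovariate(1.2) for _ in range(n_files)]

# Light-tailed comparison: exponential sizes with the same mean of 6 kB.
exp_sizes = [rng.expovariate(1 / 6.0) for _ in range(n_files)]

pareto_share = top_share(pareto_sizes)
exp_share = top_share(exp_sizes)
print(f"largest 1% of files, Pareto model:      {pareto_share:.1%} of bytes")
print(f"largest 1% of files, exponential model: {exp_share:.1%} of bytes")
```

Under the heavy-tailed model the few very large files account for a far larger share of the bytes, which is the sense in which their number "remains significant".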
Long-tail distributions have properties that are qualitatively different from those of the commonly used memoryless models, such as those based on the Poisson distribution (Wikipedia 2006b). The Hurst parameter H is a measure of the level of self-similarity of a time series that exhibits long-range dependence, to which the long-tail distribution can be applied (Wikipedia 2006b). H takes values between 0.5 and 1. A value of 0.5 indicates that the data is uncorrelated or has only short-range correlations, whilst the closer H is to 1, the greater the degree of long-range dependence (Wikipedia 2006b). These values of H allow traffic to be analysed: H = 0.5 corresponds to a purely random process, whilst H > 0.5 indicates a more complex process structure (Wikipedia 2006b). In some cases an increase in the Hurst parameter can lead to a reduction in network performance (Wikipedia 2006b). The extent to which long-tailedness degrades network performance is determined by how well congestion control is able to shape source traffic into an on-average constant output stream while conserving information (Wikipedia 2006b).
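One common way to estimate H from a measured series is the aggregated-variance method, which is not described in the text above and is used here only as an illustration. It relies on the fact that, for a self-similar series, the variance of block means over blocks of size m scales roughly as m^(2H - 2), so H can be read from the slope of a log-log fit. The function name `hurst_aggregated_variance` and the `min_blocks` parameter are assumptions made for this sketch.

```python
import math
import random
import statistics

def hurst_aggregated_variance(series, min_blocks=32):
    """Estimate H from the slope of log(variance of block means) vs log(block size)."""
    n = len(series)
    log_m, log_var = [], []
    m = 1
    while n // m >= min_blocks:
        blocks = n // m
        # Mean of each non-overlapping block of m consecutive samples.
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(blocks)]
        log_m.append(math.log(m))
        log_var.append(math.log(statistics.pvariance(means)))
        m *= 2
    # Ordinary least-squares slope of log_var against log_m.
    mean_x = statistics.fmean(log_m)
    mean_y = statistics.fmean(log_var)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(log_m, log_var))
             / sum((x - mean_x) ** 2 for x in log_m))
    # Variance of block means scales as m**(2H - 2), so H = 1 + slope / 2.
    return 1 + slope / 2

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(1 << 14)]  # uncorrelated samples
h_noise = hurst_aggregated_variance(noise)
print(f"estimated H for uncorrelated noise: {h_noise:.2f}")
```

For uncorrelated input the estimate should land near 0.5; a trace with long-range dependence would give a value closer to 1. The estimator is approximate and sensitive to the range of block sizes, so it is best treated as a rough diagnostic.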
A distribution is said to be heavy-tailed if:

$$P[X > x] \sim x^{-\alpha}, \quad \text{as } x \to \infty, \quad 0 < \alpha < 2$$
This means that, regardless of how the distribution behaves for small values of the random variable, it is long-tailed if its asymptotic shape is hyperbolic (Wikipedia 2006b). The simplest example of a long-tail distribution is the Pareto distribution, which is hyperbolic over its entire range (Wikipedia 2006b).
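To see what a hyperbolic tail implies in practice, the sketch below compares the tail probability P[X > x] of a Pareto variable with that of an exponential variable of the same mean. It assumes the standard Pareto tail (k/x)^α for x ≥ k; the parameter values α = 1.5 and k = 1 and the function names are illustrative choices only.

```python
import math

def pareto_tail(x, alpha, k):
    """P[X > x] for a Pareto variable with shape alpha and minimum value k."""
    return 1.0 if x < k else (k / x) ** alpha

def exponential_tail(x, mean):
    """P[X > x] for an exponential variable with the given mean."""
    return math.exp(-x / mean) if x > 0 else 1.0

alpha, k = 1.5, 1.0              # assumed illustrative parameters
mean = alpha * k / (alpha - 1)   # Pareto mean, finite because alpha > 1

for x in (10, 100, 1000):
    print(f"x = {x:>4}: Pareto {pareto_tail(x, alpha, k):.3e}, "
          f"exponential {exponential_tail(x, mean):.3e}")
```

The Pareto tail falls by a fixed factor for every tenfold increase in x, whereas the exponential tail collapses far more quickly, so very large values that are practically impossible under the exponential model remain plausible under the Pareto model.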
Two characteristics of the long-tail distribution are:
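The probability density function of the Pareto distribution is given by:

$$p(x) = \alpha k^{\alpha} x^{-(\alpha + 1)}, \quad x \ge k, \quad \alpha, k > 0$$

and its cumulative distribution function is given by:

$$F(x) = P[X \le x] = 1 - \left(\frac{k}{x}\right)^{\alpha}$$

where k represents the smallest value the random variable can take and α is the shape parameter that determines how slowly the tail decays.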
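The Pareto density and distribution function, in their standard forms p(x) = α k^α / x^(α+1) and F(x) = 1 - (k/x)^α for x ≥ k, can be sketched in Python together with a sampler obtained by inverting F. The function names and the parameter values used in the check are assumptions for illustration.

```python
import random

def pareto_pdf(x, alpha, k):
    """Density alpha * k**alpha / x**(alpha + 1) for x >= k, zero below k."""
    return alpha * k ** alpha / x ** (alpha + 1) if x >= k else 0.0

def pareto_cdf(x, alpha, k):
    """Distribution function 1 - (k / x)**alpha for x >= k, zero below k."""
    return 1.0 - (k / x) ** alpha if x >= k else 0.0

def pareto_sample(rng, alpha, k):
    """Inverse-transform sampling: solve F(x) = u for x."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return k / (1.0 - u) ** (1.0 / alpha)

rng = random.Random(1)
alpha, k = 1.5, 2.0  # assumed illustrative parameters
samples = [pareto_sample(rng, alpha, k) for _ in range(50_000)]

# Compare the empirical and theoretical probability of X <= 4.
empirical = sum(s <= 4.0 for s in samples) / len(samples)
theoretical = pareto_cdf(4.0, alpha, k)
print(f"P[X <= 4]: empirical {empirical:.3f}, theoretical {theoretical:.3f}")
```

No sample can fall below k, and the empirical fraction of samples at or below any threshold should approach F at that threshold as the sample count grows.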
Readers interested in a more rigorous mathematical description of the long-tailed distribution should see the external links.
Long-tail traffic arises because extreme occurrences in traffic, although infrequent, are more likely than would otherwise be anticipated (Wikipedia 2006a). For this reason it is desirable to be able to model such a phenomenon, so that networks can be provisioned based on accurate assumptions about the traffic they carry (Wikipedia 2006a).
Given the ubiquity of scale-invariant burstiness observed across diverse networking contexts, finding an effective traffic control algorithm capable of detecting and managing self-similar traffic has become an important problem (Wikipedia 2006b). Research into controlling self-similar network traffic is still in its infancy (Wikipedia 2006b).
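Exercise: If a traffic time series has a Hurst parameter of H = 0.5, what does this indicate? Answer: the data is uncorrelated or has only short-range correlations, that is, it behaves as a purely random process.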
References:
Wikipedia (2006a). "Teletraffic engineering", Wikimedia Foundation Inc, http://en.wikipedia.org/wiki/Teletraffic_engineering, last accessed 9 February 2006.
Wikipedia (2006b). "Long-tail traffic", Wikimedia Foundation Inc, http://en.wikipedia.org/wiki/Long-tail_traffic, last accessed 10 February 2006.
Brandon Hodgson