
Introduction

Information in the form of numbers, graphs and tables is all around us; on television, on the radio or in the newspaper. We are exposed to crime rates, sports results, rainfall, government spending, rate of HIV/AIDS infection, population growth and economic growth.

This chapter demonstrates how Mathematics can be used to manipulate data, to represent or misrepresent trends and patterns and to provide solutions that are directly applicable to the world around us.

Skills relating to the collection, organisation, display, analysis and interpretation of information that were introduced in earlier grades are developed further.

Recap of Earlier Work

The collection of data has been introduced in earlier grades as a method of obtaining answers to questions about the world around us. This work will be briefly reviewed.

Data and Data Collection

Data

Definition 1: Data

Data refers to the pieces of information that have been observed and recorded, from an experiment or a survey. There are two types of data: primary and secondary. The word "data" is the plural of the word "datum", and therefore one should say, "the data are" and not "the data is".

Data can be classified as primary or secondary, and primary or secondary data can be classified as qualitative or quantitative. Figure 1 summarises the classifications of data.

Figure 1: Classes of data.
Figure 1 (MG10C16_001.png)
  • Primary data: describes the original data that have been collected. This type of data is also known as raw data. Often the primary data set is very large and is therefore summarised or processed to extract meaningful information.
  • Qualitative data: is information that cannot be written as numbers, for example, if you were collecting data from people on how they feel or what their favourite colour is.
  • Quantitative data: is information that can be written as numbers, for example, if you were collecting data from people on their height or weight.
  • Secondary data: is primary data that has been summarised or processed, for example, the set of colours that people gave as favourite colours would be secondary data because it is a summary of responses.

Transforming primary data into secondary data through analysis, grouping or organisation is the process of generating information.

Purpose of Collecting Primary Data

Data are collected to provide answers that help with understanding a particular situation. Here are examples that illustrate some real-world data collection scenarios in the categories of qualitative and quantitative data.

Qualitative Data

  • The local government might want to know how many residents have electricity and might ask the question: "Does your home have a safe, independent supply of electricity?"
  • A supermarket manager might ask the question: “What flavours of soft drink should be stocked in my supermarket?” The question asked of customers might be “What is your favourite soft drink?” Based on the customers' responses (i.e. which flavours are chosen), the manager can make an informed decision as to what soft drinks to stock.
  • A company manufacturing medicines might ask “How effective is our pill at relieving a headache?” The question asked of people using the pill for a headache might be: “Does taking the pill relieve your headache?” Based on responses, the company learns how effective their product is.
  • A motor car company might want to improve their customer service, and might ask their customers: “How can we improve our customer service?”

Quantitative Data

  • A cell phone manufacturing company might collect data about how often people buy new cell phones and what factors affect their choice, so that the cell phone company can focus on those features that would make their product more attractive to buyers.
  • A town councillor might want to know how many accidents have occurred at a particular intersection, to decide whether a robot should be installed. The councillor would visit the local police station to research their records to collect the appropriate data.
  • A supermarket manager might ask the question: “What flavours of soft drink should be stocked in my supermarket?” The question asked of customers might be “What is your favourite soft drink?” Based on the customers' responses (i.e. the number of customers who liked soft drink A), the manager can make an informed decision as to what soft drinks to stock.

However, it is important to note that different questions reveal different features of a situation, and that this affects the ability to understand the situation. For example, if the first question in the list was re-phrased to be: "Does your home have electricity?" then if you answered yes, but you were getting your electricity from a neighbour, then this would give the wrong impression that you did not need an independent supply of electricity.

Methods of Data Collection

The method of collecting the data must be appropriate to the question being asked. Some examples of data collecting methods are:

  1. Questionnaires, surveys and interviews
  2. Experiments
  3. Other sources (friends, family, newspapers, books, magazines and the Internet)

The most important aspect of each method of data collecting is to clearly formulate the question that is to be answered. The details of the data collection should therefore be structured to take your question into account.

For example, questionnaires, interviews or surveys would be most appropriate for the list of questions in "Purpose of Collecting Primary Data".

Samples and Populations

Before data collection starts, it is important to decide how much data is needed to make sure that the results accurately reflect the answers required. Ideally, the study should be designed to maximise the amount of information collected while minimising the effort. The concepts of populations and samples are vital to minimising effort.

The following terms should be familiar:

  • Population: describes the entire group under consideration in a study. For example, if you wanted to know how many learners in your school got the flu each winter, then your population would be all the learners in your school.
  • Sample: describes a group chosen to represent the population under consideration in a study. For example, for the survey on winter flu, you might select a sample of learners, maybe one from each class.
  • Random sample: describes a sample chosen from a population in such a way that each member of the population has an equal chance of being chosen.

Figure 2
Figure 2 (MG10C16_002.png)

Choosing a representative sample is crucial to obtaining results that are unbiased. For example, if we wanted to determine whether peer pressure affects the decision to start smoking, then the results would be different if only boys were interviewed, compared to if only girls were interviewed, compared to both boys and girls being interviewed.

Therefore questions like: "How many interviews are needed?" and "How do I select the candidates for the interviews?" must be asked during the design stage of the sampling process.

The most accurate results are obtained if the entire population is surveyed, but this is expensive and time-consuming. The next best method is to randomly select a sample of subjects for the interviews. This means that, whatever method is used to select subjects for the interviews, each subject has an equal chance of being selected. There are various methods of doing this: for example, names can be picked out of a hat or selected by using a random number generator. Most modern scientific calculators have a random number generator, or you can find one in a spreadsheet program on a computer.

So, if you had a total population of 1 000 learners in your school and you randomly selected 100, then that would be the sample that is used to conduct your survey.
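The random selection described above can be sketched in Python. This is only an illustration and not part of the original text: the learner numbers 1 to 1 000 are a hypothetical stand-in for a real class list, and the seed value is arbitrary.

```python
import random

# Hypothetical population: learner numbers 1 to 1000 stand in for a class list.
population = list(range(1, 1001))

# A fixed seed only makes the example repeatable; leave it out for a fresh draw.
rng = random.Random(2024)

# sample() draws without replacement, so no learner is chosen twice and
# every learner has the same chance of being chosen.
sample = rng.sample(population, 100)

print(len(sample))         # size of the sample
print(sorted(sample)[:5])  # a few of the selected learner numbers
```

Drawing without replacement matches picking names out of a hat: once a name is drawn it cannot be drawn again.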

Example Data Sets

The remainder of this chapter deals with the mathematical details that are required to analyse the data collected.

The following are some example sets of data which can be used to apply the methods that are being explained.

Data Set 1: Tossing a Coin

A fair coin was tossed 100 times and the values on the top face were recorded. The data are recorded in Table 1.

Table 1: Results of 100 tosses of a fair coin. H means that the coin landed heads-up and T means that the coin landed tails-up.
H T T H H T H H H H
H H H H T H H T T T
T T H T T H T H T H
H H T T H T T H T T
T H H H T T H T T H
H T T T T H T T H H
T T H T T H T T H T
H T T H T T T T H T
T H T T H H H T H T
T T T H H T T T H T

Data Set 2: Casting a die

A fair die was cast 200 times and the values on the top face were recorded. The data are recorded in Table 2.

Table 2: Results of 200 casts of a fair die.
3 5 3 6 2 6 6 5 5 6 6 4 2 1 5 3 2 4 5 4
1 4 3 2 6 6 4 6 2 6 5 1 5 1 2 4 4 2 4 4
4 2 6 4 5 4 3 5 5 4 6 1 1 4 6 6 4 5 3 5
2 6 3 2 4 5 3 2 2 6 3 4 3 2 6 4 5 2 1 5
5 4 1 3 1 3 5 1 3 6 5 3 4 3 4 5 1 2 1 2
1 3 2 3 6 3 1 6 3 6 6 1 4 5 2 2 6 3 5 3
1 1 6 4 5 1 6 5 3 2 6 2 3 2 5 6 3 5 5 6
2 6 6 3 5 4 1 4 5 1 4 1 3 4 3 6 2 4 3 6
6 1 1 2 4 5 2 5 3 4 3 4 5 3 3 3 1 1 4 3
5 2 1 4 2 5 2 2 1 5 4 5 1 5 3 2 2 5 1 1

Data Set 3: Mass of a Loaf of Bread

There are regulations in South Africa related to bread production to protect consumers. Here is an excerpt from a report about the legislation:

"The Trade Metrology Act requires that if a loaf of bread is not labelled, it must weigh 800g, with the leeway of five percent under or 10 percent over. However, an average of 10 loaves must be an exact match to the mass stipulated. - Sunday Tribune of 10 October 2004 on page 10"

We can use measurements to test whether consumers are getting value for money. An unlabelled loaf of bread should weigh 800g. The masses of 10 different loaves of bread were measured at a store each day for 1 week. The data are shown in Table 3.

Table 3: Masses (in g) of 10 different loaves of bread, from the same manufacturer, measured at the same store over a period of 1 week.
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
802.39 787.78 815.74 807.41 801.48 786.59 799.01
796.76 798.93 809.68 798.72 818.26 789.08 805.99
802.50 793.63 785.37 809.30 787.65 801.45 799.35
819.59 812.62 809.05 791.13 805.28 817.76 801.01
801.21 795.86 795.21 820.39 806.64 819.54 796.67
789.00 796.33 787.87 799.84 789.45 802.05 802.20
788.99 797.72 776.71 790.69 803.16 801.24 807.32
808.80 780.38 812.61 801.82 784.68 792.19 809.80
802.37 790.83 792.43 789.24 815.63 799.35 791.23
796.20 817.57 799.05 825.96 807.89 806.65 780.23

Data Set 4: Global Temperature

The mean global temperature from 1861 to 1996 is listed in Table 4. The data, obtained from http://www.cgd.ucar.edu/stats/Data/Climate/, was converted to mean temperature in degrees Celsius.

Table 4: Global temperature changes over the past 135 years. There has been a lot of discussion regarding changing weather patterns and a possible link to pollution and greenhouse gasses.
Year Temperature Year Temperature Year Temperature Year Temperature
1861 12.66 1901 12.871 1941 13.152 1981 13.228
1862 12.58 1902 12.726 1942 13.147 1982 13.145
1863 12.799 1903 12.647 1943 13.156 1983 13.332
1864 12.619 1904 12.601 1944 13.31 1984 13.107
1865 12.825 1905 12.719 1945 13.153 1985 13.09
1866 12.881 1906 12.79 1946 13.015 1986 13.183
1867 12.781 1907 12.594 1947 13.006 1987 13.323
1868 12.853 1908 12.575 1948 13.015 1988 13.34
1869 12.787 1909 12.596 1949 13.005 1989 13.269
1870 12.752 1910 12.635 1950 12.898 1990 13.437
1871 12.733 1911 12.611 1951 13.044 1991 13.385
1872 12.857 1912 12.678 1952 13.113 1992 13.237
1873 12.802 1913 12.671 1953 13.192 1993 13.28
1874 12.68 1914 12.85 1954 12.944 1994 13.355
1875 12.669 1915 12.962 1955 12.935 1995 13.483
1876 12.687 1916 12.727 1956 12.836 1996 13.314
1877 12.957 1917 12.584 1957 13.139
1878 13.092 1918 12.7 1958 13.208
1879 12.796 1919 12.792 1959 13.133
1880 12.811 1920 12.857 1960 13.094
1881 12.845 1921 12.902 1961 13.124
1882 12.864 1922 12.787 1962 13.129
1883 12.783 1923 12.821 1963 13.16
1884 12.73 1924 12.764 1964 12.868
1885 12.754 1925 12.868 1965 12.935
1886 12.826 1926 13.014 1966 13.035
1887 12.723 1927 12.904 1967 13.031
1888 12.783 1928 12.871 1968 13.004
1889 12.922 1929 12.718 1969 13.117
1890 12.703 1930 12.964 1970 13.064
1891 12.767 1931 13.041 1971 12.903
1892 12.671 1932 12.992 1972 13.031
1893 12.631 1933 12.857 1973 13.175
1894 12.709 1934 12.982 1974 12.912
1895 12.728 1935 12.943 1975 12.975
1896 12.93 1936 12.993 1976 12.869
1897 12.936 1937 13.092 1977 13.148
1898 12.759 1938 13.187 1978 13.057
1899 12.874 1939 13.111 1979 13.154
1900 12.959 1940 13.055 1980 13.195

Data Set 5: Price of Petrol

The price of petrol in South Africa from August 1998 to July 2000 is shown in Table 5.

Table 5: Petrol prices
Date Price (R/l)
August 1998 R 2.37
September 1998 R 2.38
October 1998 R 2.35
November 1998 R 2.29
December 1998 R 2.31
January 1999 R 2.25
February 1999 R 2.22
March 1999 R 2.25
April 1999 R 2.31
May 1999 R 2.49
June 1999 R 2.61
July 1999 R 2.61
August 1999 R 2.62
September 1999 R 2.75
October 1999 R 2.81
November 1999 R 2.86
December 1999 R 2.85
January 2000 R 2.86
February 2000 R 2.81
March 2000 R 2.89
April 2000 R 3.03
May 2000 R 3.18
June 2000 R 3.22
July 2000 R 3.36

Grouping Data

One of the first steps in processing a large set of raw data is to arrange the data values into a smaller number of groups, and then count how many data values there are in each group. The groups are usually based on some sort of interval of data values, so data values that fall into a specific interval are grouped together. The grouped data are often presented graphically or in a frequency table. (Frequency means “how many times”.)

Exercise 1: Grouping Data

Group the elements of Data Set 1 to determine how many times the coin landed heads-up and how many times the coin landed tails-up.

Solution

  1. Step 1. Identify the groups :

    There are two unique data values: H and T. Therefore there are two groups, one for the H-data values and one for the T-data values.

  2. Step 2. Count how many data values fall into each group. :
    Table 6
    Data Value Frequency
    H 44
    T 56
  3. Step 3. Check that the total of the frequency column is equal to the total number of data values. :

    There are 100 data values and the total of the frequency column is 44+56=100.
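The three steps above can also be checked with a short Python sketch. It is an illustration only; the strings below are the rows of Table 1 typed in by hand, and the variable names are chosen for this example.

```python
from collections import Counter

# Rows of Table 1 (Data Set 1), one string per row of ten tosses.
rows = [
    "HTTHHTHHHH",
    "HHHHTHHTTT",
    "TTHTTHTHTH",
    "HHTTHTTHTT",
    "THHHTTHTTH",
    "HTTTTHTTHH",
    "TTHTTHTTHT",
    "HTTHTTTTHT",
    "THTTHHHTHT",
    "TTTHHTTTHT",
]
tosses = "".join(rows)

# Steps 1 and 2: Counter finds the unique data values and counts each one.
frequency = Counter(tosses)

# Step 3: the frequencies must add up to the number of data values.
total = sum(frequency.values())

print(frequency["H"], frequency["T"], total)  # 44 56 100
```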

Exercises - Grouping Data

  1. The heights of 30 learners are given below. Fill in the grouped data below. (A tally is a convenient way to count in 5's. We use llll to indicate 5.)
    Table 7
    142 163 169 132 139 140 152 168 139 150
    161 132 162 172 146 152 150 132 157 133
    141 170 156 155 169 138 142 160 164 168
    Table 8
    Group            Tally    Frequency
    130 ≤ h < 140
    140 ≤ h < 150
    150 ≤ h < 160
    160 ≤ h < 170
    170 ≤ h < 180
  2. An experiment was conducted in class and 50 learners were asked to guess the number of sweets in a jar. The following guesses were recorded.
    Table 9
    56 49 40 11 33 33 37 29 30 59
    21 16 38 44 38 52 22 24 30 34
    42 15 48 33 51 44 33 17 19 44
    47 23 27 47 13 25 53 57 28 23
    36 35 40 23 45 39 32 58 22 40
    Draw up a grouped frequency table using intervals 11-20, 21-30, 31-40, etc.

Summarising Data

Once the data has been collected, it must be organised in a manner that allows for the information to be extracted most efficiently. For this reason it is useful to be able to summarise the data set by calculating a few quantities that give information about how the data values are spread and about the central values in the data set. Other methods of summarising and representing data will be covered in grade 11.

Measures of Central Tendency

Mean or Average

The mean (also known as the arithmetic mean) is simply the arithmetic average of a group of numbers (or data set) and is shown using the bar symbol. So the mean of the variable $x$ is $\bar{x}$, pronounced "x-bar". The mean of a set of values is calculated by adding up all the values in the set and dividing by the number of items in that set. The mean is calculated from the raw, ungrouped data.

Definition 2: Mean

The mean of a data set, $x$, denoted by $\bar{x}$, is the average of the data values, and is calculated as:

$$\bar{x} = \frac{\text{sum of all values}}{\text{number of all values}} = \frac{x_1 + x_2 + x_3 + \ldots + x_n}{n}$$
(1)

Method: Calculating the mean

  1. Find the total of the data values in the data set.
  2. Count how many data values there are in the data set.
  3. Divide the total by the number of data values.
Exercise 2: Mean

What is the mean of x = {10, 20, 30, 40, 50}?

Solution
  1. Step 1. Find the total of the data values :
    10 + 20 + 30 + 40 + 50 = 150
    (2)
  2. Step 2. Count the number of data values in the data set :

    There are 5 values in the data set.

  3. Step 3. Divide the total by the number of data values. :
    150 ÷ 5 = 30
    (3)
  4. Step 4. Answer :

    The mean of the data set x = {10, 20, 30, 40, 50} is 30.
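The method for calculating the mean translates directly into a few lines of Python. This is a minimal sketch; the function name `mean` is chosen for this example and is not part of the original text.

```python
def mean(values):
    """Arithmetic mean: the total of the values divided by how many there are."""
    total = sum(values)   # Step 1: find the total of the data values
    count = len(values)   # Step 2: count the data values
    return total / count  # Step 3: divide the total by the count

x = [10, 20, 30, 40, 50]
print(mean(x))  # 30.0
```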

Median

Definition 3: Median

The median of a set of data is the data value in the central position, when the data set has been arranged from highest to lowest or from lowest to highest. There are an equal number of data values on either side of the median value.

The median is calculated from the raw, ungrouped data, as follows.

Method: Calculating the median

  1. Order the data from smallest to largest or from largest to smallest.
  2. Count how many data values there are in the data set.
  3. Find the data value in the central position of the set.
Exercise 3: Median

What is the median of {10, 14, 86, 2, 68, 99, 1}?

Solution
  1. Step 1. Order the data set from lowest to highest :

    1,2,10,14,68,86,99

  2. Step 2. Count the number of data values in the data set :

    There are 7 points in the data set.

  3. Step 3. Find the central position of the data set :

    The central position of the data set is 4.

  4. Step 4. Find the data value in the central position of the ordered data set. :

    14 is in the central position of the data set.

  5. Step 5. Answer :

    14 is the median of the data set {1, 2, 10, 14, 68, 86, 99}.

This example has highlighted a potential problem with determining the median. It is very easy to determine the median of a data set with an odd number of data values, but what happens when there is an even number of data values in the data set?

When there is an even number of data values, the median is the mean of the two middle points.

Tip: Finding the Central Position of a Data Set:
An easy way to determine the central position or positions for any ordered data set is to take the total number of data values, add 1, and then divide by 2. If the number you get is a whole number, then that is the central position. If the number you get is a fraction, take the two whole numbers on either side of the fraction, as the positions of the data values that must be averaged to obtain the median.
Exercise 4: Median

What is the median of {11, 10, 14, 86, 2, 68, 99, 1}?

Solution
  1. Step 1. Order the data set from lowest to highest :

    1, 2, 10, 11, 14, 68, 86, 99

  2. Step 2. Count the number of data values in the data set :

    There are 8 points in the data set.

  3. Step 3. Find the central position of the data set :

    The central position of the data set is between positions 4 and 5.

  4. Step 4. Find the data values around the central position of the ordered data set. :

    11 is in position 4 and 14 is in position 5.

  5. Step 5. Answer :

    The median of the data set {1, 2, 10, 11, 14, 68, 86, 99} is

    (11 + 14) ÷ 2 = 12,5
    (4)
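The tip about the central position, (number of values + 1) ÷ 2, can be turned into a short Python function that handles both the odd and the even case. This is a sketch for illustration; the function name and variable names are chosen for this example. Note that Python writes decimals with a point, so 12,5 appears as 12.5.

```python
def median(values):
    """Median found from the central position (n + 1) / 2 of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based central position
    if position == int(position):
        # Odd number of values: the central position is a whole number.
        return ordered[int(position) - 1]
    # Even number of values: average the values on either side of the fraction.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2

print(median([10, 14, 86, 2, 68, 99, 1]))      # 14
print(median([11, 10, 14, 86, 2, 68, 99, 1]))  # 12.5
```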

Mode

Definition 4: Mode

The mode is the data value that occurs most often, i.e. it is the most frequent value or most common value in a set.

Method: Calculating the mode

Count how many times each data value occurs. The mode is the data value that occurs the most.

The mode is calculated from grouped data, or single data items.

Exercise 5: Mode

Find the mode of the data set x = {1, 2, 3, 4, 4, 4, 5, 6, 7, 8, 8, 9, 10, 10}.

Solution
  1. Step 1. Count how many times each data value occurs. :
    Table 10
    data value frequency data value frequency
    1 1 6 1
    2 1 7 1
    3 1 8 2
    4 3 9 1
    5 1 10 2
  2. Step 2. Find the data value that occurs most often. :

    4 occurs most often.

  3. Step 3. Answer :

    The mode of the data set x = {1, 2, 3, 4, 4, 4, 5, 6, 7, 8, 8, 9, 10, 10} is 4, since the number 4 appears most frequently.

A data set can have more than one mode. For example, both 2 and 3 are modes in the set 1, 2, 2, 3, 3. If all points in a data set occur with equal frequency, it is equally accurate to describe the data set as having many modes or no mode.
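Counting frequencies and picking out the most frequent value or values can be sketched in Python as follows. The function name `modes` is chosen for this example; it returns a list so that a data set with more than one mode is handled as well.

```python
from collections import Counter

def modes(values):
    """Return every data value that occurs with the highest frequency."""
    counts = Counter(values)        # frequency of each data value
    highest = max(counts.values())  # the largest frequency
    return sorted(value for value, count in counts.items() if count == highest)

x = [1, 2, 3, 4, 4, 4, 5, 6, 7, 8, 8, 9, 10, 10]
print(modes(x))                # [4]
print(modes([1, 2, 2, 3, 3]))  # [2, 3]
```

If every value occurs equally often, the function returns all of them, which corresponds to the "many modes" description above.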

Figure 3
Khan academy video on statistics

Measures of Dispersion

The mean, median and mode are measures of central tendency, i.e. they provide information on the central data values in a set. When describing data it is sometimes useful (and in some cases necessary) to determine the spread of a distribution. Measures of dispersion provide information on how the data values in a set are distributed around the mean value. Some measures of dispersion are range, percentiles and quartiles.

Range

Definition 5: Range

The range of a data set is the difference between the lowest value and the highest value in the set.

Method: Calculating the range

  1. Find the highest value in the data set.
  2. Find the lowest value in the data set.
  3. Subtract the lowest value from the highest value. The difference is the range.
Exercise 6: Range

Find the range of the data set x = {1, 2, 3, 4, 4, 4, 5, 6, 7, 8, 8, 9, 10, 10}.

Solution
  1. Step 1. Find the highest and lowest values. :

    10 is the highest value and 1 is the lowest value.

  2. Step 2. Subtract the lowest value from the highest value to calculate the range. :
    10 - 1 = 9
    (5)
  3. Step 3. Answer :

    For the data set x = {1, 2, 3, 4, 4, 4, 5, 6, 7, 8, 8, 9, 10, 10}, the range is 9.

Quartiles

Definition 6: Quartiles

Quartiles are the three data values that divide an ordered data set into four groups containing equal numbers of data values. The median is the second quartile.

The quartiles of a data set are formed by the two boundaries on either side of the median, which divide the set into four equal sections. The lowest 25% of the data are found below the first quartile value, also called the lower quartile. The median, or second quartile, divides the set into two equal sections. The lowest 75% of the data set are found below the third quartile, also called the upper quartile. For example:

Table 11
22 24 48 51 60 72 73 75 80 88 90
Lower quartile ($Q_1$) = 48; Median ($Q_2$) = 72; Upper quartile ($Q_3$) = 80

Method: Calculating the quartiles

  1. Order the data from smallest to largest or from largest to smallest.
  2. Count how many data values there are in the data set.
  3. Divide the number of data values by 4. The result is the number of data values per group.
  4. Determine the data values corresponding to the first, second and third quartiles using the number of data values per quartile.
Exercise 7: Quartiles

What are the quartiles of {3, 5, 1, 8, 9, 12, 25, 28, 24, 30, 41, 50}?

Solution
  1. Step 1. Order the data set from lowest to highest :

    {1, 3, 5, 8, 9, 12, 24, 25, 28, 30, 41, 50}

  2. Step 2. Count the number of data values in the data set :

    There are 12 values in the data set.

  3. Step 3. Divide the number of data values by 4 to find the number of data values per quartile. :
    12 ÷ 4 = 3
    (6)
  4. Step 4. Find the data values corresponding to the quartiles. :
    Table 12
    1 3 5 | 8 9 12 | 24 25 28 | 30 41 50
          Q1       Q2         Q3

    The first quartile occurs between data positions 3 and 4 and is the average of data values 5 and 8. The second quartile occurs between positions 6 and 7 and is the average of data values 12 and 24. The third quartile occurs between positions 9 and 10 and is the average of data values 28 and 30.

  5. Step 5. Answer :

    The first quartile = 6,5. ($Q_1$)

    The second quartile = 18. ($Q_2$)

    The third quartile = 29. ($Q_3$)
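One way to sketch the quartile calculation in Python is to split the ordered data into a lower and an upper half and take the median of each. Two things here are assumptions rather than part of the original method: the function name `quartiles`, and the convention that for an odd number of values the median itself is left out of both halves. That convention agrees with the worked examples in this section, but other conventions exist and can give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd count, skip the middle value so both halves are the same size.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)

data = [3, 5, 1, 8, 9, 12, 25, 28, 24, 30, 41, 50]
print(quartiles(data))  # (6.5, 18.0, 29.0)
```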

Inter-quartile Range

Definition 7: Inter-quartile Range

The inter-quartile range is a measure which provides information about the spread of a data set. It is calculated by subtracting the first quartile from the third quartile, i.e. $Q_3 - Q_1$, giving the range of the middle half of the data set with the lowest and highest quarters trimmed off.

The semi-interquartile range is half the interquartile range, i.e. $\frac{Q_3 - Q_1}{2}$.

Exercise 8: Medians, Quartiles and the Interquartile Range

A class of 12 students writes a test and the results are as follows: 20, 39, 40, 43, 43, 46, 53, 58, 63, 70, 75, 91. Find the range, quartiles and the Interquartile Range.

Solution
  1. Step 1. Order the data and mark the positions of the quartiles :
    Table 13
    20 39 40 | 43 43 46 | 53 58 63 | 70 75 91
             Q1         M          Q3
  2. Step 2. The Range :

    The range = 91 - 20 = 71. This tells us that the marks are quite widely spread. (Remember, however, that 'wide' and 'large' are relative terms. If you are considering one hundred people, a range of 71 would be 'large', but if you are considering one million people, a range of 71 would likely be 'small', depending, of course, on what you were analyzing).

  3. Step 3. The median lies between the 6th and 7th mark :

    i.e. $M = \frac{46 + 53}{2} = \frac{99}{2} = 49{,}5$

  4. Step 4. The lower quartile lies between the 3rd and 4th mark :

    i.e. $Q_1 = \frac{40 + 43}{2} = \frac{83}{2} = 41{,}5$

  5. Step 5. The upper quartile lies between the 9th and 10th mark :

    i.e. $Q_3 = \frac{63 + 70}{2} = \frac{133}{2} = 66{,}5$

  6. Step 6. Analysing the quartiles :

    The quartiles are 41,5, 49,5 and 66,5. These quartiles tell us that 25% of the marks are less than 41,5; 50% of the marks are less than 49,5 and 75% of the marks are less than 66,5. They also tell us that 50% of the marks lie between 41,5 and 66,5.

  7. Step 7. The Interquartile Range :

    The Interquartile Range = 66,5 - 41,5 = 25. This tells us that the width of the middle 50% of the data values is 25.

  8. Step 8. The Semi-interquartile Range :

    The Semi-interquartile Range = $\frac{25}{2}$ = 12,5
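The calculations in this exercise can be verified with a few lines of Python. This is only a sketch for this particular data set: it relies on the marks already being in order and on there being an even number of them, so that the list splits cleanly into two halves. The variable names are chosen for this example.

```python
from statistics import median

marks = [20, 39, 40, 43, 43, 46, 53, 58, 63, 70, 75, 91]  # already in order

data_range = marks[-1] - marks[0]  # highest mark minus lowest mark
half = len(marks) // 2             # 12 marks, so 6 in each half
q1 = median(marks[:half])          # median of the lower half
q3 = median(marks[half:])          # median of the upper half
iqr = q3 - q1                      # interquartile range
semi_iqr = iqr / 2                 # semi-interquartile range

print(data_range, q1, q3, iqr, semi_iqr)  # 71 41.5 66.5 25.0 12.5
```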

Percentiles

Definition 8: Percentiles

Percentiles are the 99 data values that divide a data set into 100 groups.

The calculation of percentiles is identical to the calculation of quartiles, except the aim is to divide the data values into 100 groups instead of the 4 groups required by quartiles.

Method: Calculating the percentiles

  1. Order the data from smallest to largest or from largest to smallest.
  2. Count how many data values there are in the data set.
  3. Divide the number of data values by 100. The result is the number of data values per group.
  4. Determine the data values corresponding to each percentile using the number of data values per group.

Five number summary

We can summarise a data set by using the five number summary. The five number summary gives the lowest data value, the highest data value, the median, the first (lower) quartile and the third (higher) quartile. Consider the following set of data: 5, 3, 4, 6, 2, 8, 5, 4, 6, 7, 3, 6, 9, 4, 5. We first order the data as follows: 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9. The lowest data value is 2 and the highest data value is 9. The median is 5. The first quartile is 4 and the third quartile is 6. So the five number summary is: 2, 4, 5, 6, 9.
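The five number summary for this data set can be reproduced with a short Python sketch. The function name is chosen for this example, and the way the halves are formed (leaving the median out of both halves when the number of values is odd) is an assumed convention that matches the result quoted above.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Assumed convention: with an odd count the middle value is in neither half.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

data = [5, 3, 4, 6, 2, 8, 5, 4, 6, 7, 3, 6, 9, 4, 5]
print(five_number_summary(data))  # (2, 4, 5, 6, 9)
```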

Box and whisker plots

The five number summary can be shown graphically in a box and whisker plot. The main features of the box and whisker diagram are shown in Figure 4. The box can lie horizontally (as shown) or vertically. For a horizontal diagram, the left edge of the box is placed at the first quartile and the right edge of the box is placed at the third quartile. The height of the box is arbitrary, as there is no y-axis. Inside the box there is some representation of central tendency, with the median shown with a vertical line dividing the box into two. Additionally, a star or asterisk is placed at the mean value, centred in the box in the vertical direction. The whiskers which extend to the sides reach the minimum and maximum values. This is shown for the data set: 5, 3, 4, 6, 2, 8, 5, 4, 6, 7, 3, 6, 9, 4, 5.

Figure 4: Main features of a box and whisker plot
boxwhisker

Exercise 9

Draw a box and whisker diagram for the data set: x = {1,25; 1,5; 2,5; 2,5; 3,1; 3,2; 4,1; 4,25; 4,75; 4,8; 4,95; 5,1}.

Solution
  1. Step 1. Determine the five number summary:
    Minimum = 1,25
    Maximum = 5,10
    The position of first quartile is between 3 and 4.
    The position of second quartile is between 6 and 7.
    The position of third quartile is between 9 and 10.
    The data value between 3 and 4 is: $\frac{1}{2}(2{,}5 + 2{,}5) = 2{,}5$
    The data value between 6 and 7 is: $\frac{1}{2}(3{,}2 + 4{,}1) = 3{,}65$
    The data value between 9 and 10 is: $\frac{1}{2}(4{,}75 + 4{,}8) = 4{,}775$
  2. Step 2. Draw a box and whisker diagram and mark the positions of the minimum, maximum and quartiles:
    Figure 5
    Figure 5 (boxwhisker1.png)

Exercises - Summarising Data

  1. Three sets of data are given:
    1. Data set 1: 9 12 12 14 16 22 24
    2. Data set 2: 7 7 8 11 13 15 16 16
    3. Data set 3: 11 15 16 17 19 19 22 24 27
    For each one find:
    1. the range
    2. the lower quartile
    3. the interquartile range
    4. the semi-interquartile range
    5. the median
    6. the upper quartile
  2. There is 1 sweet in one jar, and 3 in the second jar. The mean number of sweets in the first two jars is 2.
    1. If the mean number in the first three jars is 3, how many are there in the third jar?
    2. If the mean number in the first four jars is 4, how many are there in the fourth jar?
  3. Find a set of five ages for which the mean age is 5, the modal age is 2 and the median age is 3 years.
  4. Four friends each have some marbles. They work out that the mean number of marbles they have is 10. One of them leaves. She has 4 marbles. How many marbles do the remaining friends have together?
  5. Jason is working in a computer store. He sells the following number of computers each month: 27; 39; 3; 15; 43; 27; 19; 54; 65; 23; 45; 16 Give a five number summary and a box and whisker plot of his sales.
  6. Lisa works as a telesales person. She keeps a record of the number of sales she makes each month. The data below show how much she sells each month. 49; 12; 22; 35; 2; 45; 60; 48; 19; 1; 43; 12 Give a five number summary and a box and whisker plot of her sales.
  7. Rose has worked in a florists shop for nine months. She sold the following number of wedding bouquets: 16; 14; 8; 12; 6; 5; 3; 5; 7
    1. What is the five-number summary of the data?
    2. Since there is an odd number of data points what do you observe when calculating the five-numbers?

We can apply the concepts of mean, median and mode to data that has been grouped. Grouped data does not have individual data points, but rather has the data organised into groups or bins. To calculate the mean we would need to add up all the data values and divide by the number of values. We do not know what the actual data values are, so we approximate each one by the midpoint of its group. We multiply each midpoint by the frequency of its group, add these products together to find the approximate total of the data values, and then divide by the total frequency. The modal group is the group with the highest frequency. The median group is the group that contains the middle term (or the two middle terms, if the number of data values is even).

Measures of dispersion can also be found for grouped data. The range is found by subtracting the smallest number in the lowest bin from the largest number in the highest bin. The quartiles are found in a similar way to the median.
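
The approximation of the mean for grouped data can be written compactly as a formula. The symbols below are introduced here only for illustration and are not used elsewhere in the chapter: m_i stands for the midpoint of group i and f_i for its frequency.

```latex
\bar{x} \approx \frac{\sum_{i} f_i \, m_i}{\sum_{i} f_i}
```

The numerator is the approximate total of the data values and the denominator is the total number of data values.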

Exercise 10: Mean, Median and Mode for Grouped Data

Consider the following grouped data and calculate the mean, the modal group and the median group.

Table 14
Mass (kg)   Frequency
41 - 45     7
46 - 50     10
51 - 55     15
56 - 60     12
61 - 65     6
            Total = 50
Solution
  1. Step 1. Calculating the mean :

    To calculate the mean we need to add up all the masses and divide by 50. We do not know the actual masses, so we approximate by choosing the midpoint of each group. We then multiply those midpoint numbers by the frequency. Then we add these numbers together to find the approximate total of the masses. This is shown in the table below.

    Table 15
    Mass (kg)   Midpoint          Frequency    Midpt × Freq
    41 - 45     (41+45)/2 = 43    7            43 × 7 = 301
    46 - 50     48                10           480
    51 - 55     53                15           795
    56 - 60     58                12           696
    61 - 65     63                6            378
                                  Total = 50   Total = 2650
  2. Step 2. Answer :

    The mean = 2650/50 = 53.

    The modal group is the group 51 - 55 because it has the highest frequency.

    The median group is the group 51 - 55, since the 25th and 26th terms are contained within this group.
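
The steps of this solution can be sketched in Python. This is a minimal illustration rather than part of the original chapter: the function names `grouped_mean`, `modal_group` and `median_group` are hypothetical, and each group is assumed to be stored as a (lower limit, upper limit, frequency) triple taken from Table 14.

```python
# Grouped data from Table 14: (lower limit, upper limit, frequency)
groups = [(41, 45, 7), (46, 50, 10), (51, 55, 15), (56, 60, 12), (61, 65, 6)]


def grouped_mean(groups):
    """Approximate mean: sum of midpoint x frequency, divided by total frequency."""
    total_frequency = sum(f for _, _, f in groups)
    approx_total = sum((lo + hi) / 2 * f for lo, hi, f in groups)
    return approx_total / total_frequency


def modal_group(groups):
    """Limits of the group with the highest frequency."""
    lo, hi, _ = max(groups, key=lambda g: g[2])
    return (lo, hi)


def median_group(groups):
    """Limits of the group(s) that contain the middle term(s)."""
    total = sum(f for _, _, f in groups)
    # 1-based positions of the middle terms: two positions when the
    # total is even, one (repeated) position when it is odd.
    middle_positions = {(total + 1) // 2, total // 2 + 1}
    found = []
    cumulative = 0
    for lo, hi, f in groups:
        first, last = cumulative + 1, cumulative + f
        if any(first <= p <= last for p in middle_positions):
            found.append((lo, hi))
        cumulative = last
    return found


print(grouped_mean(groups))   # 53.0
print(modal_group(groups))    # (51, 55)
print(median_group(groups))   # [(51, 55)]
```

The function `median_group` returns a list because, for an even number of data values, the two middle terms could in principle fall in two neighbouring groups. In this exercise both the 25th and 26th terms lie in the group 51 - 55.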

More mean, modal and median group exercises.

In each data set given, find the mean, the modal group and the median group.

  1. Times recorded when learners played a game.
    Table 16
    Time in seconds   Frequency
    36 - 45           5
    46 - 55           11
    56 - 65           15
    66 - 75           26
    76 - 85           19
    86 - 95           13
    96 - 105          6
  2. The following data were collected from a group of learners.
    Table 17
    Mass in kilograms   Frequency
    41 - 45             3
    46 - 50             5
    51 - 55             8
    56 - 60             12
    61 - 65             14
    66 - 70             9
    71 - 75             7
    76 - 80             2

Bias and error in measurements

All measurements have some error associated with them. Random errors occur in all data sets and are sometimes known as non-systematic errors. Random errors can arise from estimation of data values, imprecision of instruments, etc. For example, if you are reading lengths off a ruler, random errors will arise in each measurement as a result of estimating between which two lines the length lies. Bias is also sometimes known as systematic error. Bias in a data set occurs when values are consistently under- or overestimated. Bias can arise from forgetting to take into account a correction factor or from instruments that are not properly calibrated (calibration is the process of marking off predefined measurements). Bias leads to a sample mean that is either lower or higher than the true mean.
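
The difference between random error and bias can be illustrated with a small Python sketch. All of the numbers below are hypothetical and chosen only for illustration: a true length of 12 cm, five random errors that happen to cancel exactly, and a ruler whose readings are all 0,5 cm too high. Real random errors would not usually cancel exactly, but they would not push the mean consistently in one direction either.

```python
from statistics import mean

TRUE_LENGTH = 12.0   # hypothetical true length in cm
BIAS = 0.5           # hypothetical systematic error of a badly calibrated ruler

# Hypothetical random errors: some readings are too high, some too low.
random_errors = [0.1, -0.2, 0.0, 0.2, -0.1]

unbiased_readings = [TRUE_LENGTH + e for e in random_errors]
biased_readings = [reading + BIAS for reading in unbiased_readings]

# The random errors scatter the readings around the true value ...
print(round(mean(unbiased_readings), 2))
# ... while the bias shifts every reading, and so the mean, by the same amount.
print(round(mean(biased_readings), 2))
```

In this sketch the mean of the unbiased readings stays at the true length, while the mean of the biased readings is higher than the true length by the size of the bias.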

Data interpretation

Many people take statistics and just blindly apply them to life or quote them. This, however, is not wise since the data that led to the statistics also needs to be considered. A well known example of several sets of data that lead to the same statistical analysis (the process of examining data and determining values such as central tendency, etc.) but are in fact very different is Anscombe's quartet. This is shown in Figure 6. In Grade 11 you will learn about the methods used to represent data graphically. For now, however, you should simply appreciate the fact that we can plot data values on the Cartesian plane in a similar way to plotting graphs. If each of the data sets in Anscombe's quartet is analysed statistically, then one finds that the mean, variance, correlation and linear regression (these terms will be explained in later grades) are identical. If, instead of analysing the data statistically, we simply plot the data points, we can see that the data sets are very different. This example shows us that it is very important to consider the underlying data set as well as the statistics that we obtain from the data. We cannot simply assume that just because we know the statistics of a data set, we know what the data set is telling us. For general interest, some of the ways that statistics and data can be misinterpreted are given in the following extension section.

Figure 6: Anscombe's quartet
Figure 6 (anscombe.png)
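
The claim about Anscombe's quartet can be checked numerically. The chapter itself does not list the data values, so the numbers in the Python sketch below are an assumption: they are the values of the quartet as commonly reproduced from Anscombe's 1973 paper, and should be checked against a trusted copy before being relied on. The sketch compares only the means and the variance of the y-values, not the correlation or regression line.

```python
from statistics import mean, variance

# Assumed data: Anscombe's quartet as commonly published (not from this chapter).
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets I, II and III
x_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in quartet.items():
    # Mean of x, mean of y and sample variance of y, rounded for comparison
    print(name, round(mean(xs), 2), round(mean(ys), 2), round(variance(ys), 2))
```

Each of the four sets has a mean x-value of 9 and a mean y-value of 7,50 when rounded to two decimal places, even though the individual y-values are clearly different from set to set.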

Misuse of Statistics - For enrichment, not in CAPS

In many cases groups can gain an advantage by misleading people with the misuse of statistics. Companies misuse statistics to attempt to show that they are performing better than a competitor, advertisers abuse statistics to try to convince you to buy their product, researchers misuse statistics to attempt to show that their data is of better quality than it really is, etc.

Common techniques used include:

  • Three dimensional graphs.
  • Axes that do not start at zero.
  • Axes without scales.
  • Graphic images that convey a negative or positive mood.
  • Assumption that a correlation shows a necessary causality.
  • Using statistics that are not truly representative of the entire population.
  • Using misconceptions of mathematical concepts.
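
The effect of an axis that does not start at zero can be shown with a short calculation. The Python sketch below is only an illustration: the function name `drawn_height_ratio` and the earnings figures of 50 and 55 are hypothetical. It compares how tall two bars are drawn relative to each other when the vertical axis starts at different values.

```python
def drawn_height_ratio(value_a, value_b, axis_start):
    """Ratio of the drawn heights of two bars when the axis starts at axis_start."""
    return (value_b - axis_start) / (value_a - axis_start)


earnings_last_year = 50   # hypothetical values
earnings_this_year = 55

# Axis starting at zero: the second bar is drawn 1.1 times as tall as the first.
print(drawn_height_ratio(earnings_last_year, earnings_this_year, 0))    # 1.1
# Axis starting at 48: the second bar is drawn 3.5 times as tall as the first.
print(drawn_height_ratio(earnings_last_year, earnings_this_year, 48))   # 3.5
```

The data are the same in both cases (an increase of 10%), but the second choice of axis makes the increase look far larger than it is.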

For example, the following pairs of graphs show identical information but look very different. Explain why.

Figure 7
Figure 7 (MG10C16_010.png)

Exercises - Misuse of Statistics

  1. A company has tried to give a visual representation of the increase in their earnings from one year to the next. Does the graph below convince you? Critically analyse the graph.
    Figure 8
    Figure 8 (MG10C16_011.png)
  2. In a study conducted on a busy highway, data were collected about drivers breaking the speed limit and the colour of the car they were driving. The data were collected during a 20 minute time interval during the middle of the day, and are presented in a table and pie chart below.
    Conclusions made by a novice based on the data are summarised as follows:
    • “People driving white cars are more likely to break the speed limit.”
    • “Drivers in blue and red cars are more likely to stick to the speed limit.”
    Do you agree with these conclusions? Explain.
  3. A record label produces a graphic showing their advantage in sales over their competitors. Identify at least three devices they have used to influence and mislead the reader's impression.
    Figure 9
    Figure 9 (MG10C16_013.png)
  4. In an effort to discredit their competition, a tour bus company prints the graph shown below. Their claim is that the competitor is losing business. Can you think of a better explanation?
    Figure 10
    Figure 10 (MG10C16_014.png)
  5. To test a theory, 8 different offices were monitored for noise levels and productivity of the employees in the office. The results are graphed below.
    Figure 11
    Figure 11 (MG10C16_015.png)
    The following statement was then made: “If an office environment is noisy, this leads to poor productivity.” Explain the flaws in this thinking.

End of chapter summary

  • Data types can be divided into primary and secondary data. Primary data may be further divided into qualitative and quantitative data.
  • We use the following as measures of central tendency:
    • Mean: The mean of a data set, x, denoted by x̄, is the average of the data values, and is calculated as:
      x̄ = (sum of values) / (number of values)    (7)
    • Median: The median is the centre data value in a data set that has been ordered from lowest to highest.
    • Mode: The mode is the data value that occurs most often in a data set.
  • The following are measures of dispersion:
    • Range: The range of a data set is the difference between the lowest value and the highest value in the set.
    • Quartiles: Quartiles are the three data values that divide an ordered data set into four groups containing equal numbers of data values. The median is the second quartile.
    • Percentiles: Percentiles are the 99 data values that divide a data set into 100 groups.
    • Inter quartile range: The inter quartile range is a measure which provides information about the spread of a data set, and is calculated by subtracting the first quartile from the third quartile, giving the range of the middle half of the data set, trimming off the lowest and highest quarters, i.e. Q3 - Q1. Half of this value is the semi-interquartile range.
  • The five number summary is a way to summarise data. A box and whisker plot is a graphical representation of the five number summary.
  • Random errors are found in all sets of data and arise from estimating data values. Bias or systematic error occurs when you consistently under- or overestimate data values.
  • You must always consider the data as well as the statistics that summarise the data.
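
The measures listed in this summary can be computed together for one data set. The Python sketch below is an illustration only, using the fifteen values from the box and whisker discussion (5, 3, 4, 6, 2, 8, 5, 4, 6, 7, 3, 6, 9, 4, 5). It assumes that the quartiles are the medians of the lower and upper halves of the ordered data, with the median itself left out of both halves because the number of values is odd. `statistics.multimode` needs Python 3.8 or later.

```python
from statistics import mean, median, multimode

data = [5, 3, 4, 6, 2, 8, 5, 4, 6, 7, 3, 6, 9, 4, 5]
values = sorted(data)
half = len(values) // 2

# Measures of central tendency
data_mean = mean(data)                  # sum of values / number of values
data_median = median(data)              # centre value of the ordered data
data_modes = sorted(multimode(data))    # every value that occurs most often

# Measures of dispersion
data_range = max(data) - min(data)
q1 = median(values[:half])                   # median of the lower half
q3 = median(values[len(values) - half:])     # median of the upper half
iqr = q3 - q1
semi_iqr = iqr / 2

print(data_mean, data_median, data_modes)
print(data_range, q1, q3, iqr, semi_iqr)
```

For this data set the median is 5, the range is 7 and the inter quartile range is 6 - 4 = 2. Three values (4, 5 and 6) each occur three times, so the data set has more than one mode.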

Exercises

  1. Calculate the mean, median, and mode of Data Set 3.
  2. The tallest 7 trees in a park have heights in metres of 41, 60, 47, 42, 44, 42, and 47. Find the median of their heights.
  3. The students in Bjorn's class have the following ages: 5, 6, 7, 5, 4, 6, 6, 6, 7, 4. Find the mode of their ages.
  4. An engineering company has designed two different types of engines for motorbikes. The two different motorbikes are tested for the time it takes (in seconds) for them to accelerate from 0 km/h to 60 km/h.
    Table 18
              Test 1   Test 2   Test 3   Test 4   Test 5   Test 6   Test 7   Test 8   Test 9   Test 10   Average
    Bike 1    1.55     1.00     0.92     0.80     1.49     0.71     1.06     0.68     0.87     1.09
    Bike 2    0.9      1.0      1.1      1.0      1.0      0.9      0.9      1.0      0.9      1.1
    1. What measure of central tendency should be used for this information?
    2. Calculate the average you chose in the previous question for each motorbike.
    3. Which motorbike would you choose based on this information? Take note of accuracy of the numbers from each set of tests.
  5. The heights of 40 learners are given below.
    Table 19
    154  140  145  159  150  132  149  150  138  152
    141  132  169  173  139  161  163  156  157  171
    168  166  151  152  132  142  170  162  146  152
    142  150  161  138  170  131  145  146  147  160
    1. Set up a frequency table using 6 intervals.
    2. Calculate the approximate mean.
    3. Determine the mode.
    4. How many learners are taller than the approximate mean you calculated in part 2?
  6. In a traffic survey, a random sample of 50 motorists were asked the distance they drove to work daily. This information is shown in the table below.
    Table 20
    Distance in km   1-5   6-10   11-15   16-20   21-25   26-30   31-35   36-40   41-45
    Frequency        4     5      9       10      7       8       3       2       2
    1. Find the approximate mean.
    2. What percentage of the motorists in the sample drove
      1. less than 16 km?
      2. more than 30 km?
      3. between 16 km and 30 km daily?
  7. A company wanted to evaluate the training programme in its factory. They gave the same task to trained and untrained employees and timed each one in seconds.
    Table 21
    Trained     121  137  131  135  130
                128  130  126  132  127
                129  120  118  125  134
    Untrained   135  142  126  148  145
                156  152  153  149  145
                144  134  139  140  142
    1. Find the medians and quartiles for both sets of data.
    2. Find the Interquartile Range for both sets of data.
    3. Comment on the results.
  8. A small firm employs nine people. The annual salaries of the employees are:
    Table 22
    R600 000   R250 000   R200 000
    R120 000   R100 000   R100 000
    R100 000   R90 000    R80 000
    1. Find the mean of these salaries.
    2. Find the mode.
    3. Find the median.
    4. Of these three figures, which would you use for negotiating salary increases if you were a trade union official? Why?
  9. The marks for a particular class test are listed here:
    Table 23
    67  58  91  67  58  82  71  51  60  84
    31  67  96  64  78  71  87  78  89  38
    69  62  60  73  60  87  71  49

    Complete the frequency table using the given class intervals.

    Table 24
    Class   Tally   Frequency   Mid-point   Freq × Midpt
    30-39                       34,5
    40-49                       44,5
    50-59
    60-69
    70-79
    80-89
    90-99
                    Sum =                   Sum =

