OpenStax-CNX

Estimating Variance Simulation

Module by: David Lane.

Begin by answering the questions, even if you have to guess. The first time you answer the questions, you will not be told whether your answers are correct.

Once you have answered all the questions, answer them again using the simulation to help you. This time you will get feedback about each individual answer.

General Instructions

This simulation samples from a population of 50 numbers: 10 instances each of the values 1, 2, 3, 4, and 5. The mean of the population is therefore 3. The variance is the average squared deviation from the mean of 3, and you can compute that it is exactly 2.
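You can check the population mean and variance directly. The sketch below rebuilds the 50-number population (the name `population` is ours, not part of the simulation) and divides the sum of squared deviations by N = 50.

```python
# Rebuild the population: 10 copies each of the values 1 through 5.
population = [value for value in range(1, 6) for _ in range(10)]

N = len(population)  # 50 numbers
pop_mean = sum(population) / N
# Population variance: average squared deviation from the population mean.
pop_var = sum((x - pop_mean) ** 2 for x in population) / N

print(N, pop_mean, pop_var)  # prints: 50 3.0 2.0
```

Because each value appears equally often, this equals the variance of the five values themselves: (4 + 1 + 0 + 1 + 4) / 5 = 2.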

When you click the "Draw four numbers" button, four scores are sampled (with replacement) from the population. The four numbers are shown in red, as is their mean. The variance is then computed in two ways. The upper formula computes the mean of the squared deviations of the four sampled numbers from the sample mean. The lower formula computes the mean of the squared deviations of the four sampled numbers from the population mean of 3.00 (occasionally the sample and population means will be equal, and then the two formulas give the same value). The computed variances are placed in the fields to the right of the formulas. The mean of the values in a field is shown at the bottom of the field. When there is only one value in the field, the mean will, of course, equal that value.
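One click of the button can be mimicked in a few lines. This is a minimal sketch, not the simulation's own code: `random.choices` draws four values with replacement, and the helper `mean_sq_dev` (a name we introduce) computes the average squared deviation from whatever center it is given, so the two formulas differ only in that center.

```python
import random

# The population: 10 instances each of 1, 2, 3, 4, 5 (mean 3, variance 2).
population = [value for value in range(1, 6) for _ in range(10)]
POP_MEAN = 3.0


def mean_sq_dev(sample, center):
    """Average squared deviation of the sample from `center` (divides by N)."""
    return sum((x - center) ** 2 for x in sample) / len(sample)


# "Draw four numbers": sample four values with replacement.
sample = random.choices(population, k=4)
sample_mean = sum(sample) / len(sample)

upper = mean_sq_dev(sample, sample_mean)  # upper formula: deviations from the sample mean
lower = mean_sq_dev(sample, POP_MEAN)     # lower formula: deviations from the population mean

print(sample, sample_mean, upper, lower)
```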

If you click the "Draw four numbers" button again, another four numbers will be sampled, and their mean and variances will be computed as before. Each field to the right of the formulas will now hold two variances, and the bottom of each field will show the mean of those variances.

The population variance is exactly 2. Use this fact to assess the relative value of the two formulas for variance. See which one, on average, approaches 2 and which one gives lower estimates. Explore whether either formula is always more accurate, or whether sometimes one is more accurate and sometimes the other. If the variance based on the sample mean had been computed by dividing by N - 1 = 3 instead of N = 4, then the variance would be 4/3 times bigger. Does multiplying the variance by 4/3 lead to better estimates?
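Clicking the button many times amounts to averaging each formula over many samples. The sketch below does that with 100,000 draws and a fixed seed (both arbitrary choices of ours) and also reports the upper formula scaled by 4/3, which is the same as dividing the sum of squared deviations by N - 1 = 3.

```python
import random

population = [value for value in range(1, 6) for _ in range(10)]
POP_MEAN = 3.0
N = 4  # numbers per draw


def mean_sq_dev(sample, center):
    """Average squared deviation from `center`, dividing by len(sample)."""
    return sum((x - center) ** 2 for x in sample) / len(sample)


rng = random.Random(2024)  # fixed seed so repeated runs give the same numbers
draws = 100_000
upper_total = lower_total = 0.0

for _ in range(draws):
    sample = rng.choices(population, k=N)
    sample_mean = sum(sample) / N
    upper_total += mean_sq_dev(sample, sample_mean)
    lower_total += mean_sq_dev(sample, POP_MEAN)

upper_avg = upper_total / draws
lower_avg = lower_total / draws
corrected_avg = upper_avg * N / (N - 1)  # same as dividing by N - 1 instead of N

print(f"upper: {upper_avg:.3f}  lower: {lower_avg:.3f}  upper x 4/3: {corrected_avg:.3f}")
```

With this many draws the first average should settle near 1.5 and the other two near 2; with only 20 draws, as in the simulation, the averages will wander further from those values.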

Step by Step Instructions

Click the "Draw 4 numbers" button. Four numbers will be selected from the population. They will be shown in red in the population, and also in red below the "Draw 4 numbers" button. The mean of the 4 numbers is also presented. The population mean is 3.0. See how the sample mean compares to the population mean.

Two formulas for the variance are shown. In the first, the average squared deviation of the four numbers from the sample mean is computed. In the second, the average squared deviation from the population mean of 3 is computed. You should notice that the first formula always produces a smaller value than the second unless the sample mean is the same as the population mean, in which case the two computations give the same result.
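The claim that the first formula never exceeds the second follows from a standard identity. In the notation below (ours, not the module's), X_1, ..., X_N are the sampled numbers, X-bar is their mean, and μ is the population mean (3 here):

```latex
\begin{aligned}
\sum_{i=1}^{N} (X_i - \mu)^2
  &= \sum_{i=1}^{N} \bigl[(X_i - \bar{X}) + (\bar{X} - \mu)\bigr]^2 \\
  &= \sum_{i=1}^{N} (X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum_{i=1}^{N} (X_i - \bar{X}) + N(\bar{X} - \mu)^2 \\
  &= \sum_{i=1}^{N} (X_i - \bar{X})^2 + N(\bar{X} - \mu)^2
\end{aligned}
```

The middle term vanishes because deviations from the sample mean always sum to zero. Dividing every term by N shows that the second formula equals the first plus the square of (sample mean - 3), an amount that is never negative and is zero only when the sample mean equals the population mean.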

Notice the text fields to the right of the formulas. They are used to store the results of the simulation. The values of the variances are stored, and the mean of all the values is displayed at the bottom. After only one sample, the mean equals the single value.

Click the "Draw 4 numbers" button again. Another sample will be taken and the computations will be done as before. Each text field will have two variances in it. Look to see which formula is giving the more accurate estimates of the population variance of 2.0.

With only two samples, it is hard to be sure which formula is more accurate. Continue sampling until you have taken about 20 samples. For each sample, note which formula gives you an answer closer to 2.0. You will probably find that formula 2 (the lower formula) usually, but not always, comes closer.
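Because every value makes up exactly one fifth of the population, each of the 5^4 = 625 ordered samples of four is equally likely, so the per-sample comparison can be made exhaustively instead of over 20 clicks. The sketch below is our own check, not part of the simulation; it uses exact fractions and counts how often each formula lands strictly closer to 2. Equal distances count as ties, and these include every sample whose mean is exactly 3.

```python
from fractions import Fraction
from itertools import product

POP_MEAN = 3
POP_VAR = 2


def mean_sq_dev(sample, center):
    """Average squared deviation from `center`, as an exact Fraction."""
    return sum((x - center) ** 2 for x in sample) / Fraction(len(sample))


counts = {"lower closer": 0, "upper closer": 0, "tie": 0}
for sample in product(range(1, 6), repeat=4):
    sample_mean = Fraction(sum(sample), 4)
    err_upper = abs(mean_sq_dev(sample, sample_mean) - POP_VAR)  # formula 1
    err_lower = abs(mean_sq_dev(sample, POP_MEAN) - POP_VAR)     # formula 2
    if err_lower < err_upper:
        counts["lower closer"] += 1
    elif err_upper < err_lower:
        counts["upper closer"] += 1
    else:
        counts["tie"] += 1

print(counts)
```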

Look at the means for the two formulas. The mean for the upper formula will be lower than the mean for the lower formula. Look to see which is closer to the population variance of 2.0. You should find that the mean of the values for the upper formula is too low, probably somewhere around 1.50 (in the long run it averages (N - 1)/N times the population variance, or 3/4 × 2 = 1.5). The mean for the lower formula should be closer to 2.0. If formula 1 had divided by N - 1 (which is 3) rather than N (which is 4), its values would have been larger; specifically, they would have been 4/3 times larger. Multiply the mean from formula 1 by 4/3 and see if it comes closer to the population variance of 2.
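Python's standard `statistics` module offers both divisors: `statistics.pvariance` divides by N (using the sample mean), while `statistics.variance` divides by N - 1. The sketch below averages formula 1, formula 2 (computed by hand), and the N - 1 version over all 625 equally likely samples of four; `Fraction` inputs keep the arithmetic exact, so the printed values are the exact long-run means rather than estimates.

```python
import statistics
from fractions import Fraction
from itertools import product

POP_MEAN = 3
# All 5**4 = 625 ordered samples of four, each equally likely.
samples = [[Fraction(x) for x in s] for s in product(range(1, 6), repeat=4)]
count = len(samples)

# Formula 1: squared deviations from the sample mean, divided by N.
upper = sum(statistics.pvariance(s) for s in samples) / count
# Formula 2: squared deviations from the known population mean, divided by N.
lower = sum(sum((x - POP_MEAN) ** 2 for x in s) / len(s) for s in samples) / count
# Formula 1 with divisor N - 1 instead of N (equivalently, multiplied by 4/3).
corrected = sum(statistics.variance(s) for s in samples) / count

print(upper, lower, corrected)  # prints: 3/2 2 2
```

Multiplying 3/2 by 4/3 gives exactly 2, which is why the N - 1 divisor removes the bias.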

This makes sense because you would expect to be better able to estimate the variance if you knew the population mean (as you do in formula 2) than if you had to estimate it (as you do in formula 1).

The critical point is that when you have to estimate the population mean, you get values that are, on average, too low. This does not mean that every value will be too low. Look through the variances based on formula 1. Even though their mean is lower than 2.0, you will find that some of the values are above 2.0. So although this formula tends to give values that are too low, in some samples it gives values that are too high.
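To see that a formula biased low can still overshoot, you can sort every possible sample by whether formula 1 falls below, above, or exactly on 2. This sketch is our own check, enumerating the same 625 equally likely samples with exact fractions; it reports the counts without assuming what they will be.

```python
from fractions import Fraction
from itertools import product

POP_VAR = 2
below = above = exact = 0

for sample in product(range(1, 6), repeat=4):
    sample_mean = Fraction(sum(sample), 4)
    # Formula 1: average squared deviation from the sample mean (divides by N = 4).
    upper = sum((x - sample_mean) ** 2 for x in sample) / 4
    if upper < POP_VAR:
        below += 1
    elif upper > POP_VAR:
        above += 1
    else:
        exact += 1

print(f"too low: {below}, too high: {above}, exactly 2: {exact} (of 625)")
```

A sample such as 1, 1, 5, 5 has mean 3 and gives formula 1 a value of 4, well above 2, even though formula 1 averages only 1.5 over all samples.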

Summary

The average squared difference from the sample mean will, on average, underestimate the population variance. In some samples it will overestimate it, but most of the time it will underestimate it. If the formula is modified so that the sum of squared deviations is divided by N - 1 rather than by N, then the tendency to underestimate the population variance is eliminated.
