- The two independent samples are simple random samples from two distinct
populations.
- Both populations are normally distributed with the population means and standard
deviations unknown.
The comparison of two population means is very common. A difference between
the two samples depends on both the means and the standard deviations. Very
different means can occur by chance if there is great variation among the individual
samples. In order to account for the variation, we take the difference of the sample
means, $\bar{X}_1 - \bar{X}_2$, and divide by the standard error (shown below) in order to
standardize the difference. The result is a t-score test statistic (shown below).
Because we do not know the population standard deviations, we estimate them using
the two sample standard deviations from our independent samples. For the
hypothesis test, we calculate the estimated standard deviation, or standard error, of
the difference in sample means,
$\bar{X}_1 - \bar{X}_2$. The standard error is:
$$\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}} \tag{1}$$
The test statistic (t-score) is calculated as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}} \tag{2}$$
- $s_1$ and $s_2$, the sample standard deviations, are estimates of $\sigma_1$ and $\sigma_2$, respectively.
- $\sigma_1$ and $\sigma_2$ are the unknown population standard deviations.
- $\bar{x}_1$ and $\bar{x}_2$ are the sample means.
- $\mu_1$ and $\mu_2$ are the population means.
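As a minimal sketch of the two formulas above, the standard error and the t-score can be computed directly from summary statistics. The function names `standard_error` and `t_score` are illustrative and do not come from the text. The numbers are taken from the girls/boys example later in this section, assuming a girls' sample standard deviation of $\sqrt{0.75}$, the value that agrees with the test statistic of -3.14 reported there.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Estimated standard deviation of the difference in sample means (variances not pooled)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def t_score(xbar1, xbar2, s1, n1, s2, n2, hypothesized_diff=0.0):
    """Standardized difference: ((xbar1 - xbar2) - (mu1 - mu2)) / standard error."""
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error(s1, n1, s2, n2)

# Summary statistics from the girls/boys example (girls first, boys second).
# Assumption: s_g = sqrt(0.75), which matches the reported t = -3.14.
s_g, n_g = math.sqrt(0.75), 9
s_b, n_b = 1.0, 16

se = standard_error(s_g, n_g, s_b, n_b)
t = t_score(2.0, 3.2, s_g, n_g, s_b, n_b)
print(round(se, 4), round(t, 2))  # prints 0.3819 -3.14
```

The `hypothesized_diff` argument plays the role of $\mu_1 - \mu_2$ under the null hypothesis, which is 0 when the null hypothesis states that the two population means are equal.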
The degrees of freedom (df) calculation is somewhat complicated, but a computer
or calculator handles it easily. The df is not always a whole number. The distribution of the test statistic
calculated above is approximated by the Student-t distribution with df as follows:
$$df = \frac{\left[\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}\right]^2}{\frac{1}{n_1 - 1} \cdot \left[\frac{(s_1)^2}{n_1}\right]^2 + \frac{1}{n_2 - 1} \cdot \left[\frac{(s_2)^2}{n_2}\right]^2} \tag{3}$$
When both sample sizes $n_1$ and $n_2$ are five or larger, the Student-t approximation is very
good. Notice that the sample variances $(s_1)^2$ and $(s_2)^2$ are not pooled. (If the question comes
up, do not pool the variances.)
It is not necessary to compute the degrees of freedom by hand. A calculator or computer easily computes it.
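The degrees of freedom formula (3) is tedious by hand but short in code. The sketch below is one way to write it; the function name `welch_df` is an illustrative choice (this approximation is commonly called the Welch-Satterthwaite formula), and the inputs are the summary statistics of the girls/boys example later in this section, assuming a girls' sample standard deviation of $\sqrt{0.75}$.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the two-sample t-test with unpooled variances."""
    v1 = s1**2 / n1  # (s1)^2 / n1
    v2 = s2**2 / n2  # (s2)^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Girls: s = sqrt(0.75), n = 9 (assumed value of s); boys: s = 1, n = 16.
df = welch_df(math.sqrt(0.75), 9, 1.0, 16)
print(round(df, 4))  # prints 18.8462
```

The result is not a whole number, which is expected for this formula.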
The average amount of time boys and girls
ages 7 through 11 spend playing sports each day is believed to be the same. An
experiment is done and data are collected, resulting in the table below:
| | Sample Size | Average Number of Hours Playing Sports Per Day | Sample Standard Deviation |
|---|---|---|---|
| Girls | 9 | 2 hours | $\sqrt{0.75}$ |
| Boys | 16 | 3.2 hours | 1.00 |
Is there a difference in the average amount of time boys and girls ages 7 through 11 play
sports each day? Test at the 5% level of significance.
The population standard deviations are not known.
Let $g$ be the subscript for girls and $b$ be the subscript for boys. Then, $\mu_g$ is the population
mean for girls and $\mu_b$ is the population mean for boys.
This is a test of two independent groups, two population means.
Random variable: $\bar{X}_g - \bar{X}_b$ = difference in the average amount of time girls and boys play sports each day.
$H_o$: $\mu_g = \mu_b$ ($\mu_g - \mu_b = 0$)
$H_a$: $\mu_g \ne \mu_b$ ($\mu_g - \mu_b \ne 0$)
The words "the same" tell you $H_o$ has an "=". Since there are no other words to indicate $H_a$, assume "is different." This is a two-tailed test.
Distribution for the test: Use $t_{df}$, where $df$ is calculated using the $df$ formula for independent groups, two
population means. Using a calculator, $df$ is approximately 18.8462. Do not pool
the variances.
Calculate the p-value using a Student-t distribution: p-value = 0.0054
Graph:
$s_g = \sqrt{0.75}$
$s_b = 1$
So, $\bar{x}_g - \bar{x}_b = 2 - 3.2 = -1.2$
Half the p-value is below -1.2 and half is above 1.2.
Make a decision: Since $\alpha >$ p-value, reject $H_o$. This means you reject $\mu_g = \mu_b$. The means are different.
Conclusion: At the 5% level of significance, the sample data show there is sufficient
evidence to conclude that the average number of hours that girls and boys aged 7
through 11 play sports per day is different.
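The p-value and decision above can be checked without a calculator. The sketch below uses only the Python standard library and approximates the area under the Student-t density with Simpson's rule. This is a substitute numerical technique, not the method the calculator uses. The helper names `t_pdf` and `two_tailed_p` are illustrative, and the girls' sample standard deviation is assumed to be $\sqrt{0.75}$.

```python
import math

def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area from -|t| to |t|, using symmetry
    return 1 - central_area

# Girls/boys example with unpooled variances (assumption: s_g = sqrt(0.75), s_b = 1).
v_g, v_b = 0.75 / 9, 1.0 / 16
t = (2.0 - 3.2) / math.sqrt(v_g + v_b)
df = (v_g + v_b) ** 2 / (v_g**2 / (9 - 1) + v_b**2 / (16 - 1))
p = two_tailed_p(t, df)

alpha = 0.05
print(round(t, 2), round(df, 4), round(p, 4))
print("reject H_o" if p < alpha else "do not reject H_o")
```

The printed p-value should agree with the reported value of 0.0054 to the digits shown, and the decision rule is the same comparison of $\alpha$ with the p-value used above.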
TI-83+ and TI-84: Press STAT. Arrow over to TESTS and press
4:2-SampTTest. Arrow over to Stats and press ENTER. Arrow down
and enter 2 for the first sample mean, √(0.75) for Sx1, 9 for n1, 3.2 for the
second sample mean, 1 for Sx2, and 16 for n2. Arrow down to μ1: and
arrow to ≠μ2. Press ENTER. Arrow down to Pooled: and select
No. Press ENTER. Arrow down to Calculate and press ENTER. The
p-value is p = 0.0054, the df is approximately 18.8462, and the test
statistic is -3.14. Repeat the procedure, but select Draw instead of Calculate.
A study is done by a community group in two neighboring colleges to
determine which one graduates students with more math classes. College A samples
11 graduates. Their average is 4 math classes with a standard deviation of 1.5 math
classes. College B samples 9 graduates. Their average is 3.5 math classes with a
standard deviation of 1 math class. The community group believes that a student who
graduates from college A has taken more math classes, on the average. Test at a
1% significance level.
Answer the following questions.
Is this a test of two means or two proportions?
Are the population standard deviations known or unknown?
Which distribution do you use to perform the test?
What is the random variable?
What are the null and alternate hypotheses?
- $H_o$: $\mu_A \le \mu_B$
- $H_a$: $\mu_A > \mu_B$
Is this test right, left, or two tailed?
Do you reject or not reject the null hypothesis?
At the 1% level of significance, from the sample data, there is not
sufficient evidence to conclude that a student who graduates from college A has
taken more math classes, on the average, than a student who graduates from
college B.
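For reference, the test statistic and degrees of freedom for the college example follow from formulas (2) and (3) with the stated summary statistics. This is a worked sketch with rounded values, taking $\mu_A - \mu_B = 0$ at the boundary of the null hypothesis:

$$t = \frac{(4 - 3.5) - 0}{\sqrt{\frac{(1.5)^2}{11} + \frac{(1)^2}{9}}} \approx \frac{0.5}{0.5618} \approx 0.89$$

$$df = \frac{\left[\frac{(1.5)^2}{11} + \frac{(1)^2}{9}\right]^2}{\frac{1}{10} \cdot \left[\frac{(1.5)^2}{11}\right]^2 + \frac{1}{8} \cdot \left[\frac{(1)^2}{9}\right]^2} \approx 17.4$$

A right-tailed t-score this small is far below the 1% critical value for about 17 degrees of freedom (roughly 2.57 in a standard t-table), which is consistent with not rejecting $H_o$.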
- Degrees of Freedom (df):
The number of objects in a sample that are free to vary.
- Standard Deviation:
A number that is equal to the square root of the variance and measures how far data values are from their mean. Notation: $s$ for sample standard deviation and $\sigma$ for population standard deviation.
- Variable (Random Variable):
A characteristic of interest in a population being studied. Common notation for variables is upper-case Latin letters $X$, $Y$, $Z$, ...; common notation for a specific value from the domain (the set of all possible values of a variable) is lower-case Latin letters $x$, $y$, $z$, .... For example, if $X$ is the number of children in a family, then the domain is {0, 1, 2, ..., 20} and $x$ represents any integer from 0 to 20. A variable in statistics differs from a variable in intermediate algebra in the following two ways.
- The domain of a random variable (RV) is not necessarily a numerical set; it can be a set of words. For example, if $X$ = hair color, then the domain is {black, blond, gray, green, orange}.
- We can tell which specific value $x$ the variable $X$ takes only after performing the experiment. Before the experiment, any value from the domain is possible. For example, without an ultrasound we cannot tell the gender of a baby before it is delivered, but after delivery the gender is evident. More precisely, every value from the domain is accompanied by a number $p$, $0 \le p \le 1$, that characterizes the chance of obtaining this value as an outcome of the experiment. In the gender example, $p = \frac{1}{2}$. That is why statisticians use the more exact name "random variable" (RV) instead of "variable." Furthermore, they use the word "distribution" for the pairing (value, probability of the value) that describes the RV.
"This book was purchased from the authors by the Maxfield Foundation and provided to the community as an open textbook available freely online and in PDF format. Bound copies of the book can also […]"