Testing the Significance of the Correlation Coefficient

Module by: Susan Dean, Barbara Illowsky, Ph.D.

Summary: Linear Regression and Correlation: Testing the Significance of the Correlation Coefficient is part of the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean. The title has been changed from Facts About the Correlation Coefficient for Linear Regression. Roberta Bloom has made major contributions to this module.


The correlation coefficient, r, tells us about the strength of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n, together.

We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we only have sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ, the Greek letter "rho".
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to 0" or "significantly different from 0". We decide this based on the sample correlation coefficient r and the sample size n.

If the test concludes that the correlation coefficient is significantly different from 0, we say that the correlation coefficient is "significant".

  • Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from 0."
  • What the conclusion means: There is a significant linear relationship between x and y. We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from 0 (it is close to 0), we say that the correlation coefficient is "not significant".

  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from 0."
  • What the conclusion means: There is not a significant linear relationship between x and y. Therefore we can NOT use the regression line to model a linear relationship between x and y in the population.

Note:

  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • Even if r is significant and the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.
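The restriction to the observed domain can be sketched in code. This is only an illustration: the fitted line ŷ = -173.51 + 4.83x is the one from the third exam/final exam example in this module, while the function name and the observed range of 65 to 75 used below are hypothetical values chosen for the example, not data taken from this section.

```python
def predict_within_domain(x, x_min, x_max):
    """Predict y-hat from the fitted line, refusing to extrapolate.

    x_min and x_max are the smallest and largest observed x values.
    """
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed domain [{x_min}, {x_max}]"
        )
    # Line of best fit from the third exam/final exam example.
    return -173.51 + 4.83 * x


# Hypothetical observed range of third exam scores: 65 to 75.
print(predict_within_domain(70, 65, 75))
```

Raising an error for x values outside the range is one possible design; returning the prediction together with a warning would serve the same purpose.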

PERFORMING THE HYPOTHESIS TEST

SETTING UP THE HYPOTHESES:

  • Null Hypothesis: H₀: ρ = 0
  • Alternate Hypothesis: Hₐ: ρ ≠ 0

What the hypotheses mean in words:

  • Null Hypothesis H₀: The population correlation coefficient IS NOT significantly different from 0. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis Hₐ: The population correlation coefficient IS significantly DIFFERENT FROM 0. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

DRAWING A CONCLUSION:

  • There are two methods to make the decision. Both methods are equivalent and give the same result.
  • Method 1: Using the p-value
  • Method 2: Using a table of critical values
  • In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.
  • Note: Using the p-value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

METHOD 1: Using a p-value to make a decision

  • The linear regression t-test LinRegTTest on the TI-83+ or TI-84+ calculators calculates the p-value.
  • On the LinRegTTest input screen, on the line prompt for β or ρ, highlight "≠ 0".
  • The output screen shows the p-value on the line that reads "p =".
  • (Most computer statistical software can calculate the p-value.)

If the p-value is less than the significance level (α = 0.05):

  • Decision: REJECT the null hypothesis.
  • Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from 0."

If the p-value is NOT less than the significance level (α = 0.05):

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from 0."
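The decision rule for Method 1 can be written as a short function. This is a minimal sketch; the function name and the returned strings are illustrative choices, not part of the textbook.

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value decision rule for H0: rho = 0."""
    if p_value < alpha:
        return "reject H0"        # r is significant
    return "do not reject H0"     # r is not significant


print(decide(0.026))  # p-value below alpha = 0.05
print(decide(0.30))   # p-value not below alpha = 0.05
```

Note that a p-value exactly equal to α is "NOT less than" α, so the null hypothesis is not rejected in that case.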

Calculation Notes:

  • You will use technology to calculate the p-value. The following notes describe how the test statistic and the p-value are calculated:
  • The p-value is calculated using a t-distribution with n-2 degrees of freedom.
  • The formula for the test statistic is $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$. The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r.
  • The p-value is the combined area in both tails.
  • An alternative way to calculate the p-value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
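For readers without a TI calculator, the calculation notes above can be sketched in Python using only the standard library. This is an illustrative sketch, not the calculator's algorithm: the function names are made up here, the tail area is approximated by Simpson's rule on the t density, and the code assumes -1 < r < 1 and n > 2.

```python
import math


def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2); needs |r| < 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(u, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def two_tailed_p_value(t, df, steps=2000):
    """Combined area in both tails beyond |t| (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    # Simpson's rule for the area between 0 and |t|.
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # By symmetry, both tails together are 1 minus twice that area.
    return max(0.0, 1.0 - 2.0 * area_0_to_t)


# Third exam/final exam values from this module: r = 0.6631, n = 11.
t = t_statistic(0.6631, 11)
p = two_tailed_p_value(t, 11 - 2)
print(round(t, 4), round(p, 3))
```

With r = 0.6631 and n = 11 the p-value comes out near the 0.026 reported by LinRegTTest in the example that follows.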

THIRD EXAM vs FINAL EXAM EXAMPLE: p value method

  • Consider the third exam/final exam example.
  • The line of best fit is: ŷ = -173.51 + 4.83x with r = 0.6631 and there are n = 11 data points.
  • Can the regression line be used for prediction? Given a third exam score (x value), can we use the line to predict the final exam score (predicted y value)?
  • H₀: ρ = 0
  • Hₐ: ρ ≠ 0
  • α = 0.05
  • The p-value is 0.026 (from LinRegTTest on your calculator or from computer software)
  • The p-value, 0.026, is less than the significance level of α = 0.05
  • Decision: Reject the Null Hypothesis H₀
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from 0.
  • Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table at the end of this chapter (before the Summary) may be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.
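One way to see why the two methods agree is to solve the test statistic formula for r. The derivation below is a sketch, not taken from the textbook: it assumes the table's critical values of r correspond to the two-tailed 5% critical values of t with df = n - 2, and the value 2.262 is the standard t critical value for 9 degrees of freedom, which is not listed in this module.

```latex
\begin{align*}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  && \text{(test statistic, } df = n-2\text{)} \\
t^{2}\,(1-r^{2}) &= r^{2}\,(n-2) \\
t^{2} &= r^{2}\,\bigl(t^{2} + df\bigr) \\
r &= \frac{t}{\sqrt{t^{2} + df}}
  && \text{(} r \text{ has the same sign as } t\text{)} \\
r_{\text{crit}} &= \frac{t_{\text{crit}}}{\sqrt{t_{\text{crit}}^{2} + df}}
  = \frac{2.262}{\sqrt{2.262^{2} + 9}} \approx 0.602
  && \text{(} df = 9\text{)}
\end{align*}
```

The result, 0.602, matches the critical value the table gives for df = 9.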

Example 1

Suppose you computed r = 0.801 using n = 10 data points. df = n - 2 = 10 - 2 = 8. The critical values associated with df = 8 are -0.632 and +0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Figure 1: r is not significant between -0.632 and +0.632. r = 0.801 > +0.632. Therefore, r is significant.
Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

Example 2

Suppose you computed r = -0.624 with 14 data points. df = 14 - 2 = 12. The critical values are -0.532 and 0.532. Since -0.624 < -0.532, r is significant and the line may be used for prediction.

Figure 2: r = -0.624 < -0.532. Therefore, r is significant.
Horizontal number line with values of -0.624, -0.532, and 0.532.

Example 3

Suppose you computed r = 0.776 and n = 6. df = 6 - 2 = 4. The critical values are -0.811 and 0.811. Since -0.811 < 0.776 < 0.811, r is not significant and the line should not be used for prediction.

Figure 3: -0.811 < r = 0.776 < 0.811. Therefore, r is not significant.
Horizontal number line with values -0.811, 0.776, and 0.811.

THIRD EXAM vs FINAL EXAM EXAMPLE: critical value method

  • Consider the third exam/final exam example.
  • The line of best fit is: ŷ = -173.51 + 4.83x with r = 0.6631 and there are n = 11 data points.
  • Can the regression line be used for prediction? Given a third exam score (x value), can we use the line to predict the final exam score (predicted y value)?
  • H₀: ρ = 0
  • Hₐ: ρ ≠ 0
  • α = 0.05
  • Use the "95% Critical Value" table for r with df = n - 2 = 11 - 2 = 9
  • The critical values are -0.602 and +0.602
  • Since 0.6631 > 0.602, r is significant.
  • Decision: Reject H₀.
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from 0.
  • Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

Example 4: Additional Practice Examples using Critical Values

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  1. r = -0.567 and the sample size, n, is 19. The df = n - 2 = 17. The critical value is -0.456. -0.567 < -0.456 so r is significant.
  2. r = 0.708 and the sample size, n, is 9. The df = n - 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  3. r = 0.134 and the sample size, n, is 14. The df = 14 - 2 = 12. The critical value is 0.532. 0.134 is between -0.532 and 0.532 so r is not significant.
  4. r = 0 and the sample size, n, is 5. No matter what the df is, r = 0 is between the two critical values so r is not significant.
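The critical value comparison used in these examples can be sketched as a small lookup. The dictionary below holds only the critical values quoted in this section (α = 0.05); it is not the full table from the end of the chapter, and the names are illustrative.

```python
# Critical values of r quoted in this section, keyed by df = n - 2.
# This is a partial copy, not the complete table from the chapter.
CRITICAL_VALUES = {4: 0.811, 7: 0.666, 8: 0.632, 9: 0.602, 12: 0.532, 17: 0.456}


def is_significant(r, n):
    """Return True if r lies outside the interval (-critical, +critical)."""
    df = n - 2
    critical = CRITICAL_VALUES[df]  # KeyError if df is not in the partial table
    return r < -critical or r > critical


# (r, n) pairs taken from the examples in this section.
for r, n in [(0.801, 10), (-0.624, 14), (0.776, 6), (-0.567, 19), (0.134, 14)]:
    verdict = "significant" if is_significant(r, n) else "not significant"
    print(f"r = {r}, n = {n}: {verdict}")
```

The r = 0, n = 5 case is left out because its df = 3 critical value is not quoted in this section; as the text notes, r = 0 can never fall outside the critical values.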

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best fit line for our particular sample. We want to use this best fit line for the sample as an estimate of the best fit line for the population. Examining the scatter plot and testing the significance of the correlation coefficient help us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of y for varying values of x. In other words, the expected value of y for each particular x value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. The first assumption above implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x. In other words, each of these normal distributions of y values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).

Figure 4: The y values for each x value are normally distributed about the line with the same standard deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.
A downward sloping regression line is shown with the y values normally distributed about the line with equal standard deviations for each x value. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.
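A small simulation can make these assumptions concrete. The sketch below generates y values that satisfy all four assumptions by construction: the mean of y lies on a straight line, the y values are normally distributed about it, the standard deviation is the same at every x, and the draws are independent. The intercept, slope, standard deviation, and x values are hypothetical numbers chosen only for illustration.

```python
import math
import random


def simulate(xs, intercept, slope, sigma, per_x, seed=0):
    """Draw per_x independent normal y values around the line at each x."""
    rng = random.Random(seed)
    points = []
    for x in xs:
        for _ in range(per_x):
            # Mean lies on the line; same sigma for every x.
            points.append((x, rng.gauss(intercept + slope * x, sigma)))
    return points


def sample_r(points):
    """Sample correlation coefficient r of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical downward sloping population line, as in Figure 4.
points = simulate(range(1, 21), intercept=50.0, slope=-2.0, sigma=1.0, per_x=5)
print(len(points), sample_r(points))
```

Because the noise is small relative to the spread of the line, the sample r is strongly negative here; increasing sigma weakens the sample correlation even though the population line is unchanged.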

**With contributions from Roberta Bloom
