The correlation coefficient, r, tells us about the strength and direction of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n together. We perform a hypothesis test of the “significance of the correlation coefficient” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.
The sample data are used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate of the unknown population correlation coefficient.
The hypothesis test lets us decide whether the value of the population correlation coefficient is “close to zero” or “significantly different from zero,” based on the sample correlation coefficient r and the sample size n.
If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”
If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant.” Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y.”
Performing the Hypothesis Test
What the Hypotheses Mean in Words
Drawing a Conclusion
There are two methods of making the decision. The two methods are equivalent and give the same result.
In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.
Note: Using the p-value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)
Method 1: Using a p-value to make a decision
To calculate the p-value using LinRegTTest:
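If the p-value is less than the significance level (α = 0.05), reject the null hypothesis: there is sufficient evidence to conclude that there is a significant linear relationship between x and y.
If the p-value is NOT less than the significance level (α = 0.05), do not reject the null hypothesis: there is insufficient evidence to conclude that there is a significant linear relationship between x and y.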
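The p-value decision rule can also be sketched outside the calculator. The Python below is a minimal illustration, not the textbook's procedure: it assumes the standard test statistic t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom (a formula not quoted in this section), and it approximates the two-sided tail area by numerical integration. The function names are hypothetical.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def correlation_p_value(r, n, steps=2000):
    """Two-sided p-value for the test that the population correlation is 0."""
    df = n - 2
    if abs(r) >= 1:
        return 0.0  # a perfect linear fit leaves no tail area
    # Assumed test statistic (standard formula, not quoted in this section).
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    upper = abs(t)
    if upper == 0:
        return 1.0
    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    h = upper / steps
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_half = total * h / 3
    # Same quantity as 2 * (area to the right of |t|).
    return max(0.0, 1 - 2 * central_half)

def is_significant(r, n, alpha=0.05):
    """Decision rule: r is significant when the p-value is less than alpha."""
    return correlation_p_value(r, n) < alpha

# r = 0.801 with n = 10 data points is one of the cases used in this section.
print(correlation_p_value(0.801, 10), is_significant(0.801, 10))
```

Because alpha is a parameter, the same sketch works for significance levels other than 5%, which is the flexibility the p-value method offers over a fixed table of critical values.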
Calculation Notes:
An alternative way to calculate the p-value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
Method 2: Using a table of Critical Values to make a decision
The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.
Example
Suppose you computed r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are –0.632 and +0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.
r is not significant between –0.632 and +0.632. r = 0.801 > +0.632. Therefore, r is significant.
Try it
For a given line of best fit, you computed that r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?
If the scatter plot looks linear then, yes, the line can be used for prediction, because r > the positive critical value.
Example
Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.
r = –0.624 < –0.532. Therefore, r is significant.
Try it
For a given line of best fit, you compute that r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?
No, the line cannot be used for prediction, because r < the positive critical value.
Example
Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.
–0.811 < r = 0.776 < 0.811. Therefore, r is not significant.
Try it
For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?
Yes, the line can be used for prediction, because r < the negative critical value.
Example
Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.
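The critical-value comparisons worked through above follow one rule, which can be sketched in Python. This is an illustrative sketch only: the dictionary holds just the 95% critical values quoted in this section's examples (keyed by df = n − 2), not the full table, and the names CRITICAL_VALUES and significant_by_critical_value are hypothetical.

```python
# 95% critical values quoted in this section's examples, keyed by df = n - 2.
# This is only a fragment of the full table, kept for illustration.
CRITICAL_VALUES = {4: 0.811, 6: 0.707, 7: 0.666, 8: 0.632, 10: 0.576, 12: 0.532}

def significant_by_critical_value(r, n):
    """Return True when r falls outside the interval (-critical, +critical)."""
    df = n - 2
    critical = CRITICAL_VALUES[df]  # raises KeyError for df not listed above
    return r < -critical or r > critical

# The (r, n) pairs below are the ones used in the examples of this section.
for r, n in [(0.801, 10), (-0.624, 14), (0.776, 6)]:
    verdict = "significant" if significant_by_critical_value(r, n) else "not significant"
    print(f"r = {r}, n = {n}, df = {n - 2}: {verdict}")
```

The check mirrors the number-line picture: r is not significant anywhere between the negative and positive critical values, and significant outside them.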
Try it
For a given line of best fit, you compute that r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?
No, the line cannot be used for prediction no matter what the sample size is.
Assumptions in Testing the Significance of the Correlation Coefficient
Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data is strong enough to conclude that there is a linear relationship between x and y in the population.
The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.
The assumptions underlying the test of significance are:
The y values for each x value are normally distributed about the line with the same standard deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.
Concept Review
Linear regression is a procedure for fitting a straight line of the form [latex]\displaystyle\hat{{y}}={a}+{b}{x}[/latex] to data.
The slope b and intercept a of the least-squares line estimate the slope β and intercept α of the population (true) regression line. To estimate the population standard deviation of y, σ, use the standard deviation of the residuals, s.
[latex]\displaystyle{s}=\sqrt{{\frac{{{S}{S}{E}}}{{{n}-{2}}}}}[/latex]
The variable ρ (rho) is the population correlation coefficient. To test the null hypothesis H0: ρ = hypothesized value, use a linear regression t-test. The most common null hypothesis is H0: ρ = 0, which indicates there is no linear relationship between x and y in the population. The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STAT TESTS LinRegTTest).
Formula Review
Least Squares Line or Line of Best Fit: [latex]\displaystyle\hat{{y}}={a}+{b}{x}[/latex]
where a = y-intercept, b = slope
Standard deviation of the residuals: [latex]\displaystyle{s}=\sqrt{{\frac{{{S}{S}{E}}}{{{n}-{2}}}}}[/latex]
where SSE = sum of squared errors, n = the number of data points
What does a correlation coefficient of 0.95 mean?
The magnitude of the correlation coefficient indicates the strength of the association. For example, a correlation of r = 0.9 suggests a strong, positive association between two variables, whereas a correlation of r = -0.2 suggests a weak, negative association.
When the value of Pearson's correlation coefficient between two continuous variables is 0.95, what does it imply?
A Pearson correlation coefficient of 0.95 indicates a strong positive relationship. For example, an output of 0.95 for hours played and test scores indicates that as the number of hours played increases, the test scores also increase. These two variables are positively correlated.
Is 0.95 a strong correlation?
As a rule of thumb, a correlation greater than 0.75 is considered to be a “strong” correlation between two variables. However, this rule of thumb can vary from field to field. For example, a much lower correlation could be considered strong in a medical field compared to a technology field.
What does the Pearson correlation tell us about the relationship between two variables?
The Pearson correlation measures the strength of the linear relationship between two variables. It has a value between -1 and 1, with a value of -1 meaning a total negative linear correlation, 0 meaning no linear correlation, and +1 meaning a total positive linear correlation.
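To make these quantities concrete, here is a minimal Python sketch that computes the sample correlation coefficient r, the least-squares line ŷ = a + bx, and the standard deviation of the residuals s = sqrt(SSE/(n − 2)) from their usual sums-of-squares definitions. The five data points are made up purely for illustration, and the function name fit_line is hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, r, s) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                   # slope
    a = mean_y - b * mean_x         # y-intercept
    r = sxy / math.sqrt(sxx * syy)  # sample (Pearson) correlation coefficient
    # SSE: sum of squared errors between observed and predicted y values.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))    # standard deviation of the residuals
    return a, b, r, s

# Hypothetical data, invented for this sketch only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r, s = fit_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.4f}, s = {s:.4f}")
```

The value of r always lies between -1 and 1, and its sign matches the sign of the slope b, since both share the same numerator.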