Many times in the study of statistics it is important to make connections between different topics. We will see an example of this, in which the slope of the regression line is directly related to the correlation coefficient. Since both concepts involve straight lines, it is only natural to ask, "How are the correlation coefficient and the least squares line related?"
First, we will look at some background regarding both of these topics.
Details Regarding Correlation
It is important to remember the details pertaining to the correlation coefficient, which is denoted by r. This statistic is used when we have paired quantitative data. From a scatterplot of this paired data, we can look for trends in the overall distribution of data. Some paired data exhibits a linear or straight-line pattern. But in practice, the data almost never falls exactly along a straight line.
Several people looking at the same scatterplot of paired data might disagree on how close it is to showing an overall linear trend. After all, our criteria for this may be somewhat subjective. The scale that we use could also affect our perception of the data. For these reasons and more, we need some kind of objective measure to tell how close our paired data is to being linear. The correlation coefficient achieves this for us.
A few basic facts about r include:
- The value of r is always a real number between -1 and 1, inclusive.
- Values of r close to 0 imply that there is little to no linear relationship between the data.
- Values of r close to 1 imply that there is a strong positive linear relationship between the data. This means that as x increases, y also tends to increase.
- Values of r close to -1 imply that there is a strong negative linear relationship between the data. This means that as x increases, y tends to decrease.
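To make these facts concrete, here is a minimal Python sketch that computes r from its usual definition for three small data sets. The function name `correlation` and the data values are made up for illustration and are not part of the discussion above.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired data (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 5, 4, 6])    # y tends to rise with x
r_down = correlation(xs, [6, 4, 5, 4, 2])  # y tends to fall as x rises
r_flat = correlation(xs, [3, 5, 2, 5, 3])  # no overall linear trend

print(round(r_up, 3), round(r_down, 3), round(r_flat, 3))
```

With these made-up values, the first two results come out near 0.85 and -0.85, and the third is essentially 0, matching the three cases in the list.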
The Slope of the Least Squares Line
The last two items in the above list point us toward the slope of the least squares line of best fit. Recall that the slope of a line is a measurement of how many units it goes up or down for every unit we move to the right. Sometimes this is stated as the rise of the line divided by the run, or the change in y values divided by the change in x values.
In general, straight lines have slopes that are positive, negative or zero. If we were to examine our least squares regression lines and compare the corresponding values of r, we would notice that whenever our data has a negative correlation coefficient, the slope of the regression line is negative. Similarly, whenever we have a positive correlation coefficient, the slope of the regression line is positive.
It should be evident from this observation that there is definitely a connection between the sign of the correlation coefficient and the slope of the least squares line. It remains to explain why this is true.
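The observation can be checked on a few examples. The sketch below is only an illustration: the helper `slope_and_r` and the three data sets are assumptions, and the slope is computed with the standard closed-form least squares expression rather than anything specific to this discussion.

```python
def slope_and_r(xs, ys):
    """Return (least squares slope, correlation coefficient) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx             # closed-form least squares slope
    r = sxy / (sxx * syy) ** 0.5  # correlation coefficient
    return slope, r

# Hypothetical data sets, chosen only to show both signs
datasets = {
    "rising": ([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1]),
    "falling": ([1, 2, 3, 4, 5], [9.0, 7.2, 6.1, 3.9, 2.5]),
    "noisy rising": ([2, 4, 6, 8, 10], [5, 3, 8, 6, 9]),
}

results = {}
for name, (xs, ys) in datasets.items():
    slope, r = slope_and_r(xs, ys)
    results[name] = (slope, r)
    print(f"{name}: slope = {slope:.3f}, r = {r:.3f}")
```

In each case the printed slope and r share the same sign, even though their magnitudes differ.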
The Formula for the Slope
The reason for the connection between the value of r and the slope of the least squares line has to do with the formula that gives us the slope of this line. For paired data (x, y) we denote the standard deviation of the x data by s_x and the standard deviation of the y data by s_y.
The formula for the slope a of the regression line is:
- a = r(s_y / s_x)
The calculation of a standard deviation involves taking the positive square root of a nonnegative number. As a result, both standard deviations in the formula for the slope must be nonnegative. If we assume that there is some variation in our data, we can disregard the possibility that either of these standard deviations is zero, so both are strictly positive and so is their ratio. Multiplying r by a positive number does not change its sign. Therefore the sign of the correlation coefficient will be the same as the sign of the slope of the regression line, and the slope is zero exactly when r is zero.
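As a rough numerical check of the slope formula, the following sketch computes the slope as r times the ratio of the standard deviations, then confirms that nudging the slope in either direction makes the sum of squared errors larger. The data values and the helper `sse` are hypothetical, and the intercept used for each trial slope (mean of y minus slope times mean of x) is the standard least squares choice, which is assumed here rather than derived above.

```python
import statistics

# Hypothetical paired data with a clear downward trend
xs = [1, 2, 3, 4, 5, 6]
ys = [12.0, 9.8, 8.3, 5.9, 4.1, 2.2]

n = len(xs)
mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations

# r as the sum of products of standardized values, divided by n - 1
r = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

a = r * (s_y / s_x)  # slope from the formula

def sse(slope):
    """Sum of squared errors for a line with this slope and the best intercept."""
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

print(f"r = {r:.4f}, slope a = {a:.4f}")
print(f"SSE at a: {sse(a):.4f}")
print(f"SSE at a + 0.1: {sse(a + 0.1):.4f}, at a - 0.1: {sse(a - 0.1):.4f}")
```

Both r and a are negative for this data, and the error is smallest at the slope given by the formula.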