Correlation and regression are two of the most commonly used concepts in statistics and data science. Both are used to understand the relationship between two or more variables, which is why many people confuse them. In this article, we'll discuss how the two concepts differ, where they are similar, and when each is used. By the end of the article, you should have a clear understanding of the difference between correlation and regression.
To describe the difference between correlation and regression, we’ll use two variables as an example.
History of Correlation and Regression
Both correlation and regression were introduced by Francis Galton in the late 1800s. Galton was a statistician, psychologist, sociologist, and anthropologist. In those days, regression was described as regression towards the mean.
What is Correlation?
Correlation measures the association, and the strength of that association, between two or more variables. In this discussion we'll use two variables. Correlation is used in descriptive analytics to find the relation between variables.
Broadly, there are three types of correlation by direction: positive correlation, negative correlation and zero correlation. The output of a correlation is denoted by 'r'.
Type of association- If r is a positive number, the correlation between the variables is positive. In other words, as one variable increases, the other tends to increase as well. If r is negative, the association between the variables is negative, which means that as one variable increases, the other tends to decrease. If there is no linear association between the variables, r is zero.
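To make the sign of r concrete, here is a minimal sketch in Python that computes the Pearson correlation coefficient from its textbook definition. The function name pearson_r and the small data sets are made up for illustration and do not come from the article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration data
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # tends to increase as hours increase
falling = [9, 7, 6, 4, 1]   # tends to decrease as hours increase

print(pearson_r(hours, rising))   # a positive value
print(pearson_r(hours, falling))  # a negative value
```

The first result is positive and the second is negative, matching the two types of association described above.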
By method of calculation, four commonly used correlation coefficients are Pearson correlation, Spearman rank correlation, Kendall rank correlation and point-biserial correlation.
An example of correlation is given later in this discussion.
Strength of association- The size of r, ignoring its sign, also tells us about the strength of association between the variables.
0.7 to 1- Strong correlation
0.5 to 0.7- Moderate correlation
0.3 to 0.5- Weak correlation
0 to 0.3- Zero means no correlation, and a value below 0.3 is interpreted as very weak correlation.
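The bands above can be sketched as a small Python helper. The function name describe_strength is hypothetical, and the handling of the boundary values (0.7, 0.5 and 0.3 are assigned to the stronger band) is an assumption, since the list above includes each boundary in two bands.

```python
def describe_strength(r):
    """Map a correlation coefficient to the strength bands listed above."""
    size = abs(r)  # strength ignores the sign; the sign only gives direction
    if size >= 0.7:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "weak"
    if size > 0:
        return "very weak"
    return "none"

print(describe_strength(-0.877))  # strong
print(describe_strength(0.4))     # weak
```

Note that a negative r is classified by its absolute value, so -0.877 counts as a strong (negative) correlation.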
Note- When running a correlation, it is advisable to check the data set for outliers and remove them where justified, so that they do not distort the "r" value.
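As a rough illustration of why outliers matter, the sketch below computes r for a made-up data set with and without one extreme point. The data and the names r_clean and r_outlier are assumptions for the example only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data that follows an almost perfect upward line
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r_clean = pearson_r(xs, ys)

# The same data with a single outlier appended at x = 7
r_outlier = pearson_r(xs + [7], ys + [0.0])

print(r_clean)    # close to 1
print(r_outlier)  # much smaller, although only one point changed
```

A single extreme point is enough to turn a near-perfect correlation into one that the scale above would call very weak.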
What is Regression?
Regression is a method for modelling the relationship between a dependent variable and one or more independent variables. It is used for predictive analytics. Simple linear regression is defined by the equation y = a + bx, where y is the dependent variable, x is the independent variable, a is the intercept and b is the slope of the regression line. In multiple linear regression there is more than one independent variable. R² (R square) measures how well the regression line fits the data, that is, how much of the variation in the dependent variable is explained by the independent variable.
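The equation y = a + bx can be fitted by ordinary least squares. The following is a minimal sketch in Python, assuming simple linear regression with one independent variable; the function names fit_line and r_squared and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

def r_squared(xs, ys, a, b):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
print(a, b)                      # 1.0 2.0
print(r_squared(xs, ys, a, b))   # 1.0, because the line fits perfectly
print(a + b * 6)                 # 13.0, the predicted y for x = 6
```

The last line shows the predictive use of regression: once a and b are known, y can be estimated for an x value that was not in the data.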
An example of regression is given later in this discussion.
Note- When running a regression, it is advisable to check the data set for outliers and remove them where justified, so that they do not distort the fitted line and the RSQ value.
Interpreting the R square and r output values
As mentioned earlier, the sign of r describes the type of association (positive or negative), and the strength of the association is measured by how close the value is to 1 or -1 rather than to zero.
In regression, R square describes how well the regression line fits the data, which reflects the strength of association between the two variables. For a least-squares line with an intercept, R square cannot be negative, and an R square value near 1 indicates a strong association between the two variables.
In a nutshell, for simple linear regression with one independent variable, squaring r gives R².
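The relation between r and R² can be checked numerically. This sketch, using made-up data, computes r from its definition and R² from the residuals of the fitted least-squares line, then compares the two. It assumes simple linear regression with an intercept; the identity does not carry over unchanged to multiple regression.

```python
import math

# Made-up illustration data
xs = [1, 2, 3, 4, 5, 6]
ys = [2.3, 3.1, 6.8, 7.2, 10.9, 11.4]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Correlation coefficient r
r = sxy / math.sqrt(sxx * syy)

# Least-squares line y = a + b*x and its R square from the residuals
b = sxy / sxx
a = mean_y - b * mean_x
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_square = 1 - ss_res / syy

print(r ** 2)
print(r_square)
print(math.isclose(r ** 2, r_square, rel_tol=1e-9))  # True
```

The two printed values agree up to floating-point rounding, which is why Excel's CORREL squared matches the RSQ value for the same two columns.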
Difference between correlation and regression (tabular comparison)
| Correlation | Regression |
|---|---|
| Tells about the association among variables | Tells the strength of dependency of the dependent variable on the independent variable |
| Denoted by r | Denoted by R square |
| r can be negative or positive | R square cannot be negative |
| Shown as individual points on a scatter plot | Shown as a regression line |
| Does not use the concept of dependent variables | Uses the concept of dependent and independent variables |
How to find the correlation coefficient (r) and R square (RSQ) in Excel and R
In Excel, correlation can be explored either by creating a scatter plot to visualize the relation between two variables or by using the "CORREL" function to get the value of r. A scatter plot only gives the viewer a sense of whether a correlation exists between the variables, whereas the "CORREL" function returns the value itself.
In R, the correlation can be found by using the "cor(X,Y)" function.
To find R square for a regression line in Excel, use the RSQ function. Another way to find RSQ in Excel is to create a scatter plot and switch on the R-squared display from the trendline options.
To display the R square value in Excel, follow the steps below.
i) After plotting the data in a scatter plot, click the + icon at the top right corner of the chart and add a trendline.
ii) Double click on the trendline, or right click on the trendline and select 'Format Trendline'.
iii) In the 'Format Trendline' pane on the right, tick 'Display R-squared value on chart', located at the bottom.
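In R, base R does not provide an "rsq()" function. R square can be read from a fitted linear model with "summary(lm(Y ~ X))$r.squared". An "rsq()" function is available only through add-on packages.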
Visualize correlation and regression on same data
As the scatter plot shows, the variables Temperature and Ice Cream Sales are negatively correlated. The r value is -0.877, which indicates that the two variables are negatively correlated and strongly associated. The RSQ value on the graph is 0.7691, which is the square of the correlation coefficient r.
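The two reported values can be checked against each other with a couple of lines of Python. Only the numbers -0.877 and 0.7691 come from the example above; the variable names are made up.

```python
r = -0.877              # correlation coefficient reported for the example
rsq_on_chart = 0.7691   # R square shown on the trendline

r_squared = r ** 2      # squaring removes the negative sign
print(round(r_squared, 4))                  # 0.7691
print(round(r_squared, 4) == rsq_on_chart)  # True
```

Squaring discards the sign, so R square alone cannot tell you that the relationship is negative; that information comes from r or from the slope of the line.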
Conclusion
Correlation and regression are both useful for identifying the relation between two or more variables. In a nutshell, correlation describes the association between variables and its type, and is denoted by r. Regression, on the other hand, describes how a dependent variable changes with an explanatory (independent) variable, and the fit of the regression line is denoted by R² or RSQ. That, illustrated with an example, is the difference between correlation and regression.