Date posted: 15-Nov-2014
Category: Education
Uploaded by: aakriti-agarwal
Correlation and Regression
-Aakriti Agarwal
Roll No. 13004
BMS 1A
Correlation
• Correlation refers to statistical relationships involving two random variables or sets of data
• The correlation coefficient is denoted by ‘r’ and ranges from -1 to +1
• It indicates the direction and the strength of the relationship between the two variables
Coefficient of Correlation
The coefficient of correlation can be:
• perfect negative r=-1
• strong negative -1<r<0 and r closer to -1
• weak negative -1<r<0 and r closer to 0
• no linear correlation r=0
• strong positive 0<r<1 and r closer to 1
• weak positive 0<r<1 and r closer to 0
• perfect positive r=1
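The categories above can be sketched as a small lookup function. This is only an illustration: the slides say "closer to 1" or "closer to 0" without giving a numeric boundary, so the `strong_cutoff` of 0.5 and the function name `describe_r` are assumptions, and r = 0 is labelled "no linear correlation".

```python
def describe_r(r: float, strong_cutoff: float = 0.5) -> str:
    """Return a verbal label for a correlation coefficient r.

    strong_cutoff is an assumed threshold (not from the slides) that
    separates "strong" from "weak" correlation.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 1.0:
        return "perfect positive"
    if r == -1.0:
        return "perfect negative"
    if r == 0.0:
        return "no linear correlation"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    sign = "positive" if r > 0 else "negative"
    return f"{strength} {sign}"


print(describe_r(0.86))   # strong positive
print(describe_r(-0.2))   # weak negative
```

A different cutoff can be passed if another convention for "strong" is preferred, for example `describe_r(0.6, strong_cutoff=0.7)`.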
Methods to calculate Correlation Coefficient
• Karl Pearson
• Spearman
Karl Pearson
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
n - number of pairs of observations
Data for Calculation in MS Excel
Year | Marketing Expenditure (in Rs. Lakhs) | Sales (in Unit Lakhs)
2001 | 8 | 9.1
2002 | 10.5 | 10.1
2003 | 11 | 9.3
2004 | 12 | 9.9
2005 | 12.9 | 11.3
2006 | 13.5 | 10.9
2007 | 11.6 | 11.6
2008 | 10.9 | 12.5
2009 | 13 | 14
2010 | 14 | 14.5
2011 | 15.3 | 15
2012 | 16 | 15.6
2013 | 17 | 16.2
[Figure: chart of Expenditure in Lakhs and Sales in Lakhs (vertical axis: Lakhs, 0 to 18) against Year (horizontal axis, 2000 to 2015)]
Pearson in MS Excel
r = $H$16/($E$16*$G$16)^0.5
where $H$16 = ∑(x_i − x̄)(y_i − ȳ), $E$16 = ∑(x_i − x̄)² and $G$16 = ∑(y_i − ȳ)²
Year x y (x_i − x̄) (x_i − x̄)² (y_i − ȳ) (y_i − ȳ)² (x_i − x̄)(y_i − ȳ)
2001 8 9.1 -4.746153846 22.52598 -3.20769 10.28929 15.2242
2002 10.5 10.1 -2.246153846 5.045207 -2.20769 4.873905 4.958817
2003 11 9.3 -1.746153846 3.049053 -3.00769 9.046213 5.251893
2004 12 9.9 -0.746153846 0.556746 -2.40769 5.796982 1.796509
2005 12.9 11.3 0.153846154 0.023669 -1.00769 1.015444 -0.15503
2006 13.5 10.9 0.753846154 0.568284 -1.40769 1.981598 -1.06118
2007 11.6 11.6 -1.146153846 1.313669 -0.70769 0.500828 0.811124
2008 10.9 12.5 -1.846153846 3.408284 0.192308 0.036982 -0.35503
2009 13 14 0.253846154 0.064438 1.692308 2.863905 0.429586
2010 14 14.5 1.253846154 1.57213 2.192308 4.806213 2.748817
2011 15.3 15 2.553846154 6.52213 2.692308 7.248521 6.87574
2012 16 15.6 3.253846154 10.58751 3.292308 10.83929 10.71266
2013 17 16.2 4.253846154 18.09521 3.892308 15.15006 16.55728
x̄ = 12.74615, ȳ = 12.30769
∑(x_i − x̄)² = 73.33231, ∑(y_i − ȳ)² = 74.44923, ∑(x_i − x̄)(y_i − ȳ) = 63.79538
r = 63.79538 / √(73.33231 × 74.44923) = 0.863399
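The spreadsheet calculation above can be cross-checked outside Excel. The following is a minimal sketch in Python that builds the same three sums from the expenditure and sales figures in the data table; the function name `pearson_r` is an illustrative choice, not something from the slides.

```python
from math import sqrt

# Marketing expenditure (x) and sales (y) for 2001-2013, from the data table.
x = [8, 10.5, 11, 12, 12.9, 13.5, 11.6, 10.9, 13, 14, 15.3, 16, 17]
y = [9.1, 10.1, 9.3, 9.9, 11.3, 10.9, 11.6, 12.5, 14, 14.5, 15, 15.6, 16.2]


def pearson_r(xs, ys):
    """Karl Pearson's coefficient computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # The three column totals used in the Excel formula.
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


r = pearson_r(x, y)
print(f"{r:.6f}")  # 0.863399, the value obtained in the spreadsheet
```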
Spearman
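$$r = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}\sum_{i} (m_i^3 - m_i)\right]}{n^3 - n}$$
m_i = number of times the i-th repeated value occurs (the size of each group of tied ranks)
D = Rank 1 - Rank 2
n = number of pairs of observations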
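To show how the tie correction enters the formula, here is a minimal sketch in Python. It assumes the usual convention that repeated values share the average of the ranks they occupy, and it adds the (m³ − m)/12 term once for every group of repeated values in either series. The helper names `average_ranks`, `tie_correction` and `spearman_r` are hypothetical, as are the small example numbers.

```python
from collections import Counter


def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m repeated values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_r(xs, ys):
    """Spearman's rank correlation with the correction for tied ranks."""
    n = len(xs)
    rank1, rank2 = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


# Illustrative values only: 20 appears twice in the first series.
print(round(spearman_r([10, 20, 20, 30], [1, 2, 3, 4]), 6))  # 0.9
```

In the example the two tied values both receive rank 2.5, so ∑D² = 0.5 and the correction is (2³ − 2)/12 = 0.5, giving r = 1 − 6(1.0)/60 = 0.9. When no value is repeated, the correction is zero and the formula reduces to 1 − 6∑D²/(n³ − n).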
Spearman in MS Excel
∑D² = SUM(F3:F15) = 64
r = 1-(6*F17)/($A$1*($A$1^2-1))
where F17 holds ∑D² and $A$1 holds the number of pairs of observations
Year x y Rank 1 Rank 2 D (R1 − R2) D²
2001 8 9.1 1 1 0 0
2002 10.5 10.1 2 4 -2 4
2003 11 9.3 4 2 2 4
2004 12 9.9 6 3 3 9
2005 12.9 11.3 7 6 1 1
2006 13.5 10.9 9 5 4 16
2007 11.6 11.6 5 7 -2 4
2008 10.9 12.5 3 8 -5 25
2009 13 14 8 9 -1 1
2010 14 14.5 10 10 0 0
2011 15.3 15 11 11 0 0
2012 16 15.6 12 12 0 0
2013 17 16.2 13 13 0 0
∑D² = 64, number of pairs of observations n = 13
r = 1 − (6 × 64)/(13 × (13² − 1)) = 0.824176
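The ranking table above can be reproduced with a short Python sketch. Because no value is repeated in either series, a simple position-in-sorted-order rank is enough and no tie correction is needed; the helper name `simple_ranks` is an illustrative choice.

```python
# Marketing expenditure (x) and sales (y) for 2001-2013, from the data table.
x = [8, 10.5, 11, 12, 12.9, 13.5, 11.6, 10.9, 13, 14, 15.3, 16, 17]
y = [9.1, 10.1, 9.3, 9.9, 11.3, 10.9, 11.6, 12.5, 14, 14.5, 15, 15.6, 16.2]


def simple_ranks(values):
    """Rank 1 for the smallest value; valid only when no value is repeated."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank1 = simple_ranks(x)
rank2 = simple_ranks(y)
d_squared = [(a - b) ** 2 for a, b in zip(rank1, rank2)]
sum_d2 = sum(d_squared)
n = len(x)

# Same expression as the Excel formula: 1 - 6*sum(D^2) / (n*(n^2 - 1)).
r = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(sum_d2)        # 64
print(f"{r:.6f}")    # 0.824176
```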
Are Correlation and Causation the same?
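Correlation ≠ Causation: a strong correlation shows only that two variables move together, not that one causes the other.
If it were, these would be true... [example images from the slides are not preserved]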
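As a rough illustration of the point, the sketch below builds two made-up series that each simply grow with time. Neither influences the other, yet their correlation coefficient comes out as 1. The series, their coefficients and the names `series_a` and `series_b` are hypothetical and chosen only for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical data: both series depend only on the time index t.
years = list(range(10))
series_a = [3 * t + 2 for t in years]
series_b = [7 * t + 50 for t in years]

r = pearson_r(series_a, series_b)
print(round(r, 6))  # 1.0, although neither series causes the other
```

Here the shared dependence on time is what produces the perfect correlation, which is why a high r on its own cannot establish cause and effect.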
Practical Applications
Correlation is used in:
• Business
• Government
• Education
• Medicine
• Agriculture
Business
• Correlation between Marketing Expenditure and Sales Volume (to measure the efficiency of the marketing department)
• Correlation between the prices of two securities in the stock market.
• Correlation between the price of a commodity and its supply (or demand).
Government
• Year-on-year Revenue and Expenditure correlation (to forecast revenue based on expenditure)
• Tool in formulating various Economic Policies by correlating past trends.
• Yardstick to measure performance (Correlation between Planned and Actual Revenue)
Education Models
• Forecasting of student intake into elementary education (Correlation between birth rate data and enrollment in elementary grades)
• Forecasting of student dropout flows at different levels of education (intermediate, graduate, post graduate)
Medicine
• Finding out the after-effects of interactions between different medicines.
• Estimating the best treatment where various methods are applicable (Correlation between individual treatments’ results and severity of disease).
Agriculture
• Correlation between certain weather conditions and Productivity.
• Correlation between irrigation and Productivity.
• Correlation between price and production or price and demand, to study demand supply pattern of crops in different seasons.
Conclusion
• Correlation is one of many effective tools for forecasting and predicting possible outcomes based on past observations.
• However, other statistical methods also need to be applied to get a complete picture of the situation.
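One way to turn the correlation in the expenditure and sales data into a forecast is a least-squares regression line. The slides do not work this out, so the following Python sketch is only an illustration under that assumption; the function name `predict_sales` and the trial expenditure of 18 are hypothetical.

```python
# Marketing expenditure (x) and sales (y) for 2001-2013, from the data table.
x = [8, 10.5, 11, 12, 12.9, 13.5, 11.6, 10.9, 13, 14, 15.3, 16, 17]
y = [9.1, 10.1, 9.3, 9.9, 11.3, 10.9, 11.6, 12.5, 14, 14.5, 15, 15.6, 16.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Same sums as in the Pearson table: cross-products and squared x deviations.
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)

# Least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar


def predict_sales(expenditure):
    """Predicted sales for a given expenditure, using the fitted line."""
    return intercept + slope * expenditure


print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"predicted sales at expenditure 18: {predict_sales(18):.2f}")
```

A forecast like this is only as reliable as the linear relationship behind it, which is why the correlation coefficient is usually checked first.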
Thank You