8/2/2019 Regression Corr
Simple Linear Regression & Correlation
We now consider bivariate data, that is, data connecting two variables:
The independent or explanatory variable, and
The dependent or response variable.
A scatter diagram is used to display bivariate data.
We are interested in whether there is a linear relationship between
the two sets of data displayed. Such a relationship falls into one of three
classes:
1. POSITIVE LINEAR CORRELATION
2. NEGATIVE LINEAR CORRELATION
3. NO LINEAR CORRELATION
Scatter diagrams show plots of ordered pairs $(x, y)$.
Usually, $x$ represents the independent variable and
$y$ represents the dependent variable. The following
scatter diagrams show different classes of linear
relationships:
Graph showing positive correlation
Graph showing negative correlation
Graph showing no linear correlation
A scatter diagram gives a good indication of whether the
relationship between the two sets of data can be
represented by a mathematical equation, called a regression function.
When the relationship is linear, we can determine the equation of the
regression line that represents it.
The least squares regression line of $y$ on $x$:
The general equation of a straight line is $y = mx + c$,
where $m$ represents the gradient and $c$ represents the
y-intercept. The regression equation is very similar; it takes the form
$$y = a + bx$$
where $b$ represents the gradient and $a$ represents the
y-intercept. These values are called the regression coefficients.
It is called the least squares line because $a$ and $b$ are chosen to minimise
the sum of the squared vertical distances $\sum (y - a - bx)^2$ between the data
points and the line.
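These values for $a$ and $b$ are determined by the following
formulae:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad \text{and} \qquad a = \bar{y} - b\bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the mean of $x$ and the mean of $y$
respectively, and $n$ is the number of data pairs. The formula for $a$ is
based on the fact that $(\bar{x}, \bar{y})$ MUST be a point on the regression line.
N.B. $\bar{x} = \dfrac{\sum x}{n}$ and $\bar{y} = \dfrac{\sum y}{n}$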
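The formulae for $a$ and $b$ can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names `least_squares_coefficients` and `fit_line` are hypothetical, and the small data set is made up so that its points lie exactly on $y = 1 + 2x$.

```python
def least_squares_coefficients(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the least squares line y = a + b*x of y on x."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x values are equal, so the gradient is undefined.
        raise ValueError("x values must not all be equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    # (x-bar, y-bar) lies on the line, so a = y-bar - b * x-bar.
    a = sum_y / n - b * (sum_x / n)
    return a, b


def fit_line(xs, ys):
    """Build the summary sums from raw (x, y) pairs and fit y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return least_squares_coefficients(n, sum(xs), sum(ys), sum_xy, sum_x2)


# Hypothetical data lying exactly on y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {a} + {b}x")  # prints: y = 1.0 + 2.0x
```

Passing the summary sums directly mirrors the hand calculation; `fit_line` only builds those sums from the raw pairs first.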
A way to measure the strength of a linear relationship between two
variables is to use the product moment correlation coefficient $r$.
This is a number which lies between $-1$ and $+1$ inclusive; the closer $|r|$
is to 1, the stronger the linear relationship. Consider 3 cases:
If $r > 0$ then there is positive linear correlation.
If $r < 0$ then there is negative linear correlation.
If $r = 0$ then there is no linear correlation present.
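The correlation coefficient is calculated using the formula:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
From this formula we can also determine the coefficient of determination:
$$\text{c.d.} = r^2 = \frac{\left(n\sum xy - \sum x \sum y\right)^2}{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}$$
This measure represents the ratio of explained variation to total variation.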
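The correlation formulae can be checked with a short Python sketch (an illustration, not from the slides). The helper names `summary_sums`, `correlation_coefficient` and `classify` are hypothetical, the three small data sets are made up to show the three classes of linear correlation, and the `tol` cut-off used to decide when $r$ counts as zero is an arbitrary assumption.

```python
import math


def summary_sums(xs, ys):
    """Return n, sum x, sum y, sum xy, sum x^2, sum y^2 for paired data."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * y for x, y in zip(xs, ys)),
            sum(x * x for x in xs), sum(y * y for y in ys))


def correlation_coefficient(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Product moment correlation coefficient r from the summary sums.

    Raises ZeroDivisionError if all x values or all y values are equal,
    because r is undefined in that case.
    """
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def classify(r, tol=1e-9):
    """Name the class of linear correlation; tol is an arbitrary zero cut-off."""
    if r > tol:
        return "positive linear correlation"
    if r < -tol:
        return "negative linear correlation"
    return "no linear correlation"


# Hypothetical data sets, one per class.
examples = {
    "rising": ([1, 2, 3, 4], [3, 5, 7, 9]),
    "falling": ([1, 2, 3, 4], [8, 6, 4, 2]),
    "U-shaped": ([1, 2, 3, 4, 5], [4, 1, 0, 1, 4]),
}
for name, (xs, ys) in examples.items():
    r = correlation_coefficient(*summary_sums(xs, ys))
    print(f"{name}: r = {r:.2f}, r^2 = {r ** 2:.2f}, {classify(r)}")
```

The U-shaped data has an obvious pattern yet gives $r = 0$: the coefficient only measures linear association.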
A Practical Example
The table gives the maintenance cost (in hundreds of dollars) and the age (in months) of 10 X-ray machines.
i. Find the least squares regression line of maintenance cost on age and use it to
predict the maintenance cost for a similar machine 40 mths old.
ii. Calculate the product moment correlation coefficient between the age of the
machine and the cost for maintenance.
iii. Determine the percentage of the variation in maintenance cost that is explained
by the variation in machine age.
Question adapted from Business Mathematics & Statistics
[Table: Machine (1 to 10), Age x (months), Cost y (hundreds of dollars)]
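From the table above, we know that:
$n = 10;\quad \sum x = 300;\quad \sum y = 2970;\quad \sum xy = 97650;\quad \sum x^2 = 12050$
i. Using the formulae for $a$ and $b$ given earlier:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(97650) - (300)(2970)}{10(12050) - 300^2} = \frac{85500}{30500} \approx 2.803 \approx 2.8$$
$$a = \bar{y} - b\bar{x} = \frac{2970}{10} - 2.803\left(\frac{300}{10}\right) \approx 212.9$$
The regression equation is
$$y = a + bx = 212.9 + 2.8x$$
Estimated cost for a machine 40 months old:
$$y = 212.9 + 2.8(40) = 324.9$$
i.e. about 32,490 dollars, since costs are measured in hundreds of dollars.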
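The hand calculation for part i can be reproduced in Python from the sums quoted above. This is an illustrative check rather than part of the original solution. Carrying the unrounded gradient $b = 85500/30500 \approx 2.8033$ through to the prediction gives about 325.0, while 324.9 comes from rounding the coefficients to $212.9$ and $2.8$ first.

```python
# Summary sums for the 10 X-ray machines (from the worked example).
n, sum_x, sum_y, sum_xy, sum_x2 = 10, 300, 2970, 97650, 12050

# Least squares coefficients of cost (y) on age (x).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

age = 40  # months
unrounded = a + b * age                    # full-precision a and b
rounded = round(a, 1) + round(b, 1) * age  # the rounded line y = 212.9 + 2.8x

print(f"a = {a:.4f}, b = {b:.4f}")  # prints: a = 212.9016, b = 2.8033
print(f"predicted cost: {unrounded:.1f} (unrounded), {rounded:.1f} (rounded), in hundreds of dollars")
```

Either way the estimate is about 325 hundred dollars; the small difference is purely a rounding effect.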
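Recall: $n = 10;\quad \sum x = 300;\quad \sum y = 2970;\quad \sum xy = 97650;\quad \sum x^2 = 12050;\quad \sum y^2 = 913050$
Using the formulae for $r$ and $r^2$ given earlier:
ii. The correlation coefficient is
$$r = \frac{10(97650) - (300)(2970)}{\sqrt{\left[10(12050) - 300^2\right]\left[10(913050) - 2970^2\right]}} = \frac{85500}{\sqrt{30500 \times 309600}} \approx 0.88$$
iii. The coefficient of determination is
$$\text{c.d.} = r^2 = 0.88^2 \approx 0.77$$
Thus 77% of the variation in maintenance costs is explained by the variation in
machine ages.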
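Parts ii and iii can be checked the same way. This illustrative Python snippet (not part of the original solution) recomputes $r$ and $r^2$ from the sums above, including $\sum y^2 = 913050$; the unrounded $r^2$ comes out at about 0.774, consistent with the 77% quoted.

```python
import math

# Summary sums for the 10 X-ray machines (from the worked example).
n, sum_x, sum_y, sum_xy = 10, 300, 2970, 97650
sum_x2, sum_y2 = 12050, 913050

numerator = n * sum_xy - sum_x * sum_y  # 85500
x_term = n * sum_x2 - sum_x ** 2        # 30500
y_term = n * sum_y2 - sum_y ** 2        # 309600

r = numerator / math.sqrt(x_term * y_term)
cd = r ** 2  # coefficient of determination

print(f"r = {r:.4f}, r^2 = {cd:.4f}")  # prints: r = 0.8799, r^2 = 0.7742
print(f"{cd:.0%} of the variation in cost is explained by age")
```

Squaring the rounded value 0.88 gives 0.7744, so both routes agree on 77% to the nearest percent.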