7/28/2019 8 Identifying Relationships
Identifying relationships
Dr James Abdey
Overview
Relationship between two variables
Correlation
Regression
The simple linear regression model
Parameter estimation
Interpretation of correlation coefficient
Coefficient of determination, $R^2$
Prediction
Regression diagnostics
Worked example
Multiple linear regression
Applied Marketing (Market Research Methods)
Topic 8:
Identifying relationships
Dr James Abdey
Overview
We consider regression analysis, which is used to explain variation in market share, sales, brand preference, etc.
This may use explanatory variables such as advertising, price, distribution and product quality.
Starting with correlation, we proceed to the simple linear model, followed by multiple linear regression.
Relationship between two variables
We now investigate the relationship between two variables.
When we have data on two variables (X and Y), we have bivariate data.
We will consider how to:
- measure the strength of the relationship
- model the relationship
- predict the value of one variable on the basis of the other
The first thing to do with data is to provide a graphical representation.
For one variable this might be a histogram, a stem-and-leaf diagram, etc.
For two variables we produce a scatter diagram.
This must include a title, axis labels and units, and it must be accurate!
Assume that we have some data in paired form: $(x_i, y_i)$, $i = 1, 2, \dots, n$.
An example might be unemployment and crime figures for 12 areas of a city, which are of interest to insurers setting policy premia for people insuring against theft.

| Unemployment, x | Offences, y |
|---|---|
| 2614 | 6200 |
| 1160 | 4610 |
| 1055 | 5336 |
| 1199 | 5411 |
| 2157 | 5808 |
| 2305 | 6004 |
| 1687 | 5420 |
| 1287 | 5588 |
| 1869 | 5719 |
| 2283 | 6336 |
| 1162 | 5103 |
| 1201 | 5268 |
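The sums quoted later in this topic can be checked with the minimal Python sketch below. It stores the 12 pairs from the table and computes the sums. The variable names are illustrative choices and do not come from the original slides.

```python
# Unemployment (x) and number of offences (y) for the 12 city areas in the table
unemployment = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
offences = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(unemployment)
sum_x = sum(unemployment)
sum_y = sum(offences)
sum_x2 = sum(x * x for x in unemployment)
sum_y2 = sum(y * y for y in offences)
sum_xy = sum(x * y for x, y in zip(unemployment, offences))

print(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy)
# 12 19979 36695129 66803 374471231 113784494
```

These totals agree with the values used on the correlation and regression slides.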
We plot X on the horizontal axis and Y on the vertical axis.
This emphasises any relationship between the variables.
[Figure: Scatter plot of Crime against Unemployment; horizontal axis: Unemployment, vertical axis: Number of offences]
A positive, linear relationship is apparent: X and Y increase together, roughly linearly.
However, the implied linear relationship is not exact, since the points do not lie exactly on a straight line.
Such an upward-sloping pattern is termed positive correlation.
We will see later how to quantify correlation.
Other examples of scatter plots include:
LHS: negative correlation (Y decreases as X increases).
RHS: uncorrelated data (no obvious (linear) relationship between X and Y).
[Figure: two scatter plots of y against x; left panel shows negative correlation, right panel shows uncorrelated data]
Correlation
Correlation measures the strength of the linear relationship between two variables, each measured on an interval scale.
Positive correlation: the two variables tend to vary in the same direction.
Negative correlation: the two variables tend to vary in opposite directions.
Perfect correlation: all the points lie exactly on a straight line.
If there is a perfect linear relationship between X and Y, we can represent it using an equation of the form
$$Y = \alpha + \beta X$$
where $\alpha$ is the intercept of the line and $\beta$ is its slope (or gradient).
Examples of anticipated correlation:

| Variables | Correlation |
|---|---|
| Height & weight | Positive |
| Rainfall & sunshine hours | Negative |
| Ice cream sales & sun cream sales | Positive |
| Hours of study & exam mark | Positive |
| Car's petrol consumption & goals scored | Zero |
Positive correlation: large X with large Y; small X with small Y.
Negative correlation: large X with small Y; small X with large Y.
However, X and Y may have widely different numerical values, so we need to take this into account.
We do this by considering how far each value lies from the mean of its own variable.
So, we are interested in the degree to which variations in the values of the two variables are related to each other.
Our basis for the measurement of correlation is
$$\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}$$
A term is positive when $x_i$ and $y_i$ lie on the same side of their means, and negative when they lie on opposite sides.
Unfortunately, this measure is extremely sensitive to the units in which the variables are measured.
We would prefer a measure of correlation that stays the same whatever the units of measurement (e.g. days, hours, minutes or seconds).
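So, we use the following to measure the correlation for (sample) data:
$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
Dividing by the square root of the two sums of squared deviations cancels the units of X and Y, so r is unit-free.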
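The minimal Python sketch below checks two claims on the unemployment/crime data. First, the two forms of r agree. Second, r is unaffected by a change of units, while the unscaled cross-product sum is not. Measuring unemployment in thousands of people is our own choice of rescaling, and the function names are hypothetical.

```python
import math

def correlation_shortcut(xs, ys):
    """Sample correlation r using sums of products (first form of the formula)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)

def correlation_deviations(xs, ys):
    """Sample correlation r using deviations from the means (second form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def cross_products(xs, ys):
    """Unscaled measure sum((x - x_bar) * (y - y_bar)); its value depends on the units."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

unemployment = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
offences = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]
unemployment_thousands = [x / 1000 for x in unemployment]  # same data, different units

# The cross-product sum shrinks by a factor of 1000 when x is rescaled...
print(cross_products(unemployment, offences), cross_products(unemployment_thousands, offences))
# ...but r is the same in both forms and in both units (up to floating-point rounding)
print(correlation_shortcut(unemployment, offences),
      correlation_deviations(unemployment_thousands, offences))
```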
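Returning to the unemployment/crime dataset:
$$\sum x_i = 19979, \quad \sum x_i^2 = 36695129, \quad \sum y_i = 66803, \quad \sum y_i^2 = 374471231, \quad \sum x_i y_i = 113784494$$
Since $n = 12$, we have $\bar{x} = 19979/12 = 1664.92$ and $\bar{y} = 66803/12 = 5566.92$.
Hence the (sample) correlation coefficient, r, is
$$r = \frac{113784494 - 12 \times 1664.92 \times 5566.92}{\sqrt{(36695129 - 12 \times 1664.92^2)(374471231 - 12 \times 5566.92^2)}} = 0.861$$
Of course, in practice we can use software such as SPSS to calculate r for us!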
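The hand calculation can be reproduced from the five summary sums alone. This minimal Python sketch mirrors the slide's arithmetic. Unlike the slide, it keeps the means unrounded.

```python
import math

# Summary statistics for the unemployment/crime data, as quoted on the slide
n = 12
sum_x, sum_x2 = 19979, 36695129
sum_y, sum_y2 = 66803, 374471231
sum_xy = 113784494

x_bar = sum_x / n  # about 1664.92
y_bar = sum_y / n  # about 5566.92

s_xy = sum_xy - n * x_bar * y_bar  # sum of cross-products of deviations
s_xx = sum_x2 - n * x_bar ** 2     # sum of squared deviations of x
s_yy = sum_y2 - n * y_bar ** 2     # sum of squared deviations of y

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))
```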
The (sample) correlation coefficient, r, takes values between $-1$ and $1$, i.e.
$$-1 \le r \le 1$$
$r > 0$ indicates positive correlation, with $r = 1$ indicating perfect positive correlation.
$r < 0$ indicates negative correlation, with $r = -1$ indicating perfect negative correlation.
The closer $|r|$ is to 1, the stronger the linear relationship.
Regression
Here we introduce simple linear regression, which is only part of a very large topic in statistical analysis.
In the simple model, we have two variables, Y and X:
Y is the dependent (or response) variable, which we are trying to explain using X.
X is the independent (or explanatory) variable, the factor we think influences Y.
The simple linear regression model
Assume a true (population) linear relationship between a response variable y and an explanatory variable x of the approximate form:
$$y = \alpha + \beta x$$
$\alpha$ and $\beta$ are fixed, but unknown, population parameters:
$\alpha$ is the y-intercept;
$\beta$ is the slope of the line.
We seek to estimate $\alpha$ and $\beta$ using (paired) sample data $(x_i, y_i)$, $i = 1, \dots, n$.
Particularly in business, we would not expect a perfect linear relationship between the two variables.
Hence we modify this basic model to
$$y = \alpha + \beta x + \varepsilon$$
$\varepsilon$ is some random perturbation from the initial approximate line.
In other words, each y observation almost lies on the postulated line, but jumps off the line according to the random variable $\varepsilon$.
$\varepsilon$ is often referred to as the error term.
Parameter estimation: the least squares method
For given sample data we could produce a scatter plot, in which any linear relationship would be visible.
This would suggest performing a (simple) linear regression.
We estimate the population regression line; this estimated line is often termed the line of best fit.
How do we choose the line of best fit?
We require a formal criterion for determining the line of best fit.
Estimation of $\alpha$ and $\beta$ will be by least squares estimation.
Specifically, we seek to minimise the sum of the squared residuals, where a residual is the difference between the observed y value and its predicted (fitted) value:
$$\sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$
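The least squares estimator for $\beta$ is
$$\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
The least squares estimator for $\alpha$ is
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$
Hence the line of best fit has equation
$$\hat{y} = \hat{\alpha} + \hat{\beta} x$$
Again, this is routinely calculated in SPSS.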
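The two estimator formulas translate directly into code. The following is a minimal Python sketch under our own naming (`least_squares_line` is hypothetical, not an SPSS or library function), applied to the unemployment/crime data.

```python
def least_squares_line(xs, ys):
    """Return (alpha_hat, beta_hat) for the simple linear regression y = alpha + beta * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    if s_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    beta_hat = s_xy / s_xx                # least squares slope
    alpha_hat = y_bar - beta_hat * x_bar  # line passes through (x_bar, y_bar)
    return alpha_hat, beta_hat

unemployment = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
offences = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

alpha_hat, beta_hat = least_squares_line(unemployment, offences)
print(f"y_hat = {alpha_hat:.1f} + {beta_hat:.4f} x")
```

With unrounded means this gives roughly 4323.4 and 0.7469. The worked example that follows reaches 4323.6 and 0.7468 because it rounds the means to two decimal places before substituting.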
Example
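Returning to the unemployment/crime dataset:
$$\sum x_i = 19979, \quad \sum x_i^2 = 36695129, \quad \sum y_i = 66803, \quad \sum y_i^2 = 374471231, \quad \sum x_i y_i = 113784494$$
Since $n = 12$, we have $\bar{x} = 19979/12 = 1664.92$ and $\bar{y} = 66803/12 = 5566.92$, hence
$$\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{113784494 - (12 \times 1664.92 \times 5566.92)}{36695129 - (12 \times 1664.92^2)} = 0.7468$$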
We estimate the intercept to be
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 5566.92 - 0.7468 \times 1664.92 = 4323.6$$
Hence the least squares regression line is
$$\hat{y} = 4323.6 + 0.7468x$$
Note the $\hat{y}$ notation, where the hat denotes an estimated value.
Interpretation of the correlation coefficient
In the case of perfect correlation between X and Y, we can predict Y directly and exactly from X.
In the case of zero correlation between X and Y, knowledge of X tells us nothing about Y, at least through a linear relationship.
Here we consider measuring the extent to which the values of one variable can be used to predict the values of another when the correlation is neither $-1$, nor 0, nor 1.
Our overall objective is to explain the response variable Y, which is a random variable.
We try to explain the variation in Y.
Using simple linear regression, we attempt this using a single explanatory variable, X.
The total variation in the response variable sample data is simply
$$\sum_{i=1}^n (y_i - \bar{y})^2$$
We term this the total sum of squares (TSS).
We can decompose TSS into two components:
the amount we are able to explain using the model, called the explained sum of squares (ESS), $\sum_{i=1}^n (\hat{y}_i - \bar{y})^2$; and
the remaining variation that we are unable to explain with the model, called the residual sum of squares (RSS), $\sum_{i=1}^n (y_i - \hat{y}_i)^2$.
Hence,
$$\text{TSS} = \text{ESS} + \text{RSS}$$
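A numerical check of the decomposition is a useful sanity test. This minimal Python sketch (the helper name `sums_of_squares` is hypothetical) fits the least squares line to the unemployment/crime data and computes TSS, ESS and RSS directly from their definitions. The identity holds exactly for a least squares fit that includes an intercept.

```python
def sums_of_squares(xs, ys):
    """Return (TSS, ESS, RSS) for a least squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)                 # total variation
    ess = sum((f - y_bar) ** 2 for f in fitted)             # explained by the line
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # left unexplained
    return tss, ess, rss

unemployment = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
offences = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

tss, ess, rss = sums_of_squares(unemployment, offences)
print(tss, ess + rss)  # equal up to floating-point rounding
```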
Coefficient of determination, $R^2$
We can assess the overall fit of a model using $R^2$.
This measures the proportion of the total variability in the response variable explained by the model.
This statistic is known as the coefficient of determination, denoted $R^2$ and defined as
$$R^2 = \frac{\text{ESS}}{\text{TSS}}, \qquad 0 \le R^2 \le 1$$
The closer $R^2$ is to 1, the better the explanatory power of the model.
Note that $R^2 = r^2$ for a simple linear model, so we can also compute it from r, the correlation coefficient.
Returning to the crime/unemployment dataset, let's assign Y and X as follows: Y = number of offences, X = unemployment.
The least squares regression line was
$$\hat{y} = 4323.6 + 0.7468x$$
The correlation coefficient was 0.861, therefore
$$R^2 = 0.861^2 = 0.7413$$
This means we can explain 74.13% of the variation in the number of offences using unemployment.
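The following is a minimal Python sketch that confirms $R^2 = \text{ESS}/\text{TSS} = r^2$ for the simple linear model on these data. It uses the fact that $\hat{y}_i - \bar{y} = \hat{\beta}(x_i - \bar{x})$, so $\text{ESS} = \hat{\beta}^2 \sum (x_i - \bar{x})^2$.

```python
import math

unemployment = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
offences = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(unemployment)
x_bar = sum(unemployment) / n
y_bar = sum(offences) / n
s_xx = sum((x - x_bar) ** 2 for x in unemployment)
s_yy = sum((y - y_bar) ** 2 for y in offences)  # this is the TSS
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(unemployment, offences))

r = s_xy / math.sqrt(s_xx * s_yy)
beta_hat = s_xy / s_xx
ess = beta_hat ** 2 * s_xx  # explained sum of squares
r_squared = ess / s_yy

print(round(r_squared, 4), round(r ** 2, 4))
```

With unrounded inputs this gives about 0.7407. The 0.7413 above comes from squaring the rounded value r = 0.861.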
Prediction
One of the purposes of calculating the line of best fit is prediction.
Specifically, for some value of x, we can provide a prediction for y.
So, returning to the example, how many offences would you predict if there were 2000 unemployed people in a city area?
Answer: just substitute the desired value of x into the least squares regression line:
$$\hat{y} = 4323.6 + 0.7468 \times 2000 \approx 5817$$
Provided we are predicting y for an x value within the range of the available x data, we can be fairly confident in the prediction.
This is what we call interpolation. For example, x = 2000 lies within the observed range of 1055 to 2614.
However, if we base our prediction on an x value outside the range of the available x data, then we should view the prediction with caution.
This would be an example of extrapolation, which is risky because the relationship between x and y may change for such values of x.
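This minimal Python sketch combines prediction with a simple extrapolation flag, using the rounded line from the slides. Treating "outside the observed minimum and maximum of x" as extrapolation is our simplifying assumption. Large gaps inside the range can also make predictions unreliable. The function name `predict` is hypothetical.

```python
def predict(x_new, alpha_hat, beta_hat, x_observed):
    """Return (y_hat, is_extrapolation) for the fitted line y_hat = alpha_hat + beta_hat * x."""
    y_hat = alpha_hat + beta_hat * x_new
    # Simplification: anything outside the observed x range counts as extrapolation
    is_extrapolation = not (min(x_observed) <= x_new <= max(x_observed))
    return y_hat, is_extrapolation

unemployment = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
alpha_hat, beta_hat = 4323.6, 0.7468  # rounded estimates from the slides

print(predict(2000, alpha_hat, beta_hat, unemployment))  # about 5817.2; interpolation
print(predict(5000, alpha_hat, beta_hat, unemployment))  # outside 1055..2614: extrapolation
```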
Regression diagnostics
The usefulness of a fitted regression model rests on a basic assumption:
$$E(y) = \alpha + \beta x$$
Furthermore, inference such as hypothesis tests, confidence intervals and prediction intervals only makes sense if the error terms are (approximately) independent and normal with constant variance $\sigma^2$.
Therefore it is important to check that these conditions are met in practice; this task is called regression diagnostics.
Basic idea: look at the residuals $\hat{\varepsilon}_i = y_i - \hat{y}_i$, or the normalised residuals $\hat{\varepsilon}_i / \hat{\sigma}$.
What to look for?
Do the residuals exhibit IID normal behaviour?
Is the scatter plot of $\hat{\varepsilon}_i$ versus $x_i$ patternless?
Is the scatter plot of $\hat{\varepsilon}_i$ versus $\hat{y}_i$ patternless?
Is the scatter plot of $\hat{\varepsilon}_i$ versus $i$ patternless?
If you see trends, periodic patterns or increasing variation in any one of the above scatter plots, it is very likely that at least one assumption is violated!
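The quantities behind these diagnostic plots are easy to compute. This minimal Python sketch uses the unemployment/crime data and prints the fitted values, residuals and normalised residuals that one would plot against $x_i$, $\hat{y}_i$ and $i$. Estimating $\sigma$ by $\sqrt{\text{RSS}/(n-2)}$ is the usual choice for simple regression and is our assumption here, since the slides do not specify it. Plotting itself is left to whatever graphics tool you use.

```python
import math

unemployment = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
offences = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(unemployment)
x_bar = sum(unemployment) / n
y_bar = sum(offences) / n
s_xx = sum((x - x_bar) ** 2 for x in unemployment)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(unemployment, offences))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

fitted = [alpha_hat + beta_hat * x for x in unemployment]
residuals = [y - f for y, f in zip(offences, fitted)]
# sigma_hat: residual sum of squares over n - 2 (two estimated parameters)
sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
normalised = [e / sigma_hat for e in residuals]

# Columns to plot: residual vs x, residual vs fitted value, residual vs observation order i
for i, (x, f, e, z) in enumerate(zip(unemployment, fitted, residuals, normalised), start=1):
    print(f"{i:2d}  x={x:5d}  fitted={f:8.1f}  residual={e:8.1f}  normalised={z:6.2f}")
```

By construction, least squares residuals sum to zero and are uncorrelated with x when the model includes an intercept. The plots are therefore looking for curvature, cycles or changing spread rather than a straight-line trend.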
Two other issues in regression diagnostics are outliers and influential observations.
Outlier: an unusually small or unusually large $y_i$ which lies outside the bulk of the observations.
An outlier is often caused by an error in either sampling or recording data. If so, we should correct it before proceeding with the regression analysis.
If an observation which looks like an outlier does indeed belong to the sample, and no errors in sampling or recording were discovered, we may use a more complex model or distribution to accommodate it. For example, stock returns often exhibit extreme values, and they often cannot be modelled satisfactorily by a normal regression model.
Influential observation: an $x_i$ which is far away from the other $x_i$ values.
Such an observation may have a large influence on the fitted regression line.
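A standard way to quantify how far an $x_i$ lies from the others is its leverage. In simple linear regression this is $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. The minimal Python sketch below computes it for the unemployment data. The cut-off of twice the average leverage ($2p/n$ with $p = 2$ parameters) is a common rule of thumb that we are assuming. It is not part of the slides.

```python
unemployment = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]

n = len(unemployment)
x_bar = sum(unemployment) / n
s_xx = sum((x - x_bar) ** 2 for x in unemployment)

# Leverage of each observation in simple linear regression
leverage = [1 / n + (x - x_bar) ** 2 / s_xx for x in unemployment]

# Rule-of-thumb cut-off (assumption): twice the average leverage, 2p/n with p = 2
p = 2
threshold = 2 * p / n
flagged = [(x, round(h, 3)) for x, h in zip(unemployment, leverage) if h > threshold]
print(round(threshold, 3), flagged)
```

With these data, only the area with 2614 unemployed exceeds the cut-off, and only narrowly (about 0.35 against 0.33). A flagged point is one to inspect, not one to delete automatically.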
Regression: Worked example
We apply the simple linear regression method to study the relationship between two series of financial returns: a regression of Cisco Systems stock returns, y, on S&P500 Index returns, x.
This regression model is an example of the CAPM (Capital Asset Pricing Model).
Stock returns:
$$\text{Return} = \frac{\text{Current price} - \text{Previous price}}{\text{Previous price}} \approx \log\left(\frac{\text{Current price}}{\text{Previous price}}\right)$$
where the approximation holds when the difference between the two prices is small.
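The approximation between simple and log returns can be seen numerically. This minimal Python sketch compares the two for a small and a large price move. The prices are made up for illustration only.

```python
import math

def simple_return(current, previous):
    """(current - previous) / previous"""
    return (current - previous) / previous

def log_return(current, previous):
    """log(current / previous), which is close to the simple return for small moves"""
    return math.log(current / previous)

# Hypothetical prices: a 1% move and a 20% move
for previous, current in [(100.0, 101.0), (100.0, 120.0)]:
    print(previous, current, simple_return(current, previous), log_return(current, previous))
```

For the 1% move the two agree to about four decimal places. For the 20% move they differ noticeably (0.20 against about 0.18).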
Remark: daily prices are definitely not independent. However, daily returns may be seen as a sequence of uncorrelated random variables.
For the S&P500, the average daily return is -0.04%, the maximum daily return is 4.65%, the minimum daily return is -6.01%, and the standard deviation is 1.40%.
For Cisco, the average daily return is -0.13%, the maximum daily return is 15.42%, the minimum daily return is -13.44%, and the standard deviation is 4.23%.

Descriptive Statistics (daily returns in %; variance in %²)

| | N | Range | Minimum | Maximum | Mean | Std. Deviation | Variance |
|---|---|---|---|---|---|---|---|
| SP500 | 252 | 10.66 | -6.00 | 4.65 | -.0424 | 1.40017 | 1.960 |
| Cisco | 252 | 28.85 | -13.44 | 15.42 | -.1336 | 4.23419 | 17.928 |
| Valid N (listwise) | 252 | | | | | | |
Remark: Cisco is much more volatile than the S&P500.
There is clear synchronisation between the movements of the two series of returns.
We fit a regression model:
$$\text{Cisco} = \alpha + \beta \times \text{S\&P500} + \varepsilon$$
Rationale: part of the fluctuation in Cisco returns is driven by the fluctuation in S&P500 returns.
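Coefficients (a)

| Model | B | Std. Error | Beta | t | Sig. | 95.0% CI for B: Lower Bound | 95.0% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|
| 1 (Constant) | -.012 | .064 | | -.188 | .851 | -.139 | .114 |
| Cisco | .227 | .015 | .687 | 14.943 | .000 | .197 | .257 |

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: SP500

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .687 (a) | .472 | .470 | 1.01964 |

a. Predictors: (Constant), Cisco
b. Dependent Variable: SP500

Note: in this output SP500 is the dependent variable and Cisco the predictor, i.e. the reverse of the model fitted above. R and $R^2$ are the same in either direction. The two slopes are linked through $\hat{\beta} = r\,s_y/s_x$: $0.687 \times 4.234/1.400 \approx 2.08$ when Cisco is the response, and $0.687 \times 1.400/4.234 \approx 0.227$ when SP500 is the response.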
When testing the statistical significance of regression coefficients, we just need to look at the p-value.
The smaller the p-value, the stronger the evidence against the null hypothesis that the true parameter value is zero.
In practice, we treat p-values smaller than 0.05 as statistically significant (at the 5% significance level).
The estimated slope is $\hat{\beta} = 2.077$. The null hypothesis $H_0: \beta = 0$ is rejected with a p-value reported as 0.000 (i.e. below 0.0005): extremely significant.
Attempted interpretation: when the market index goes up by 1%, Cisco stock goes up by 2.077%, on average. However, the error term in the model is large, with an estimated $\hat{\sigma} = 3.08\%$.
The p-value for testing $H_0: \alpha = 0$ is 0.815, so we cannot reject the hypothesis that $\alpha = 0$.
Recall that $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, and both $\bar{y}$ and $\bar{x}$ are very close to 0.
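This minimal Python sketch shows how the headline numbers follow from the SPSS summaries alone. It uses $\hat{\beta} = r\,s_y/s_x$, $\text{RSS} = (1 - r^2)(n - 1)s_y^2$ and $\hat{\sigma}^2 = \text{RSS}/(n - 2)$. The inputs are the rounded values printed in the tables, so the results match the slide only to about two or three significant figures.

```python
import math

# Rounded values from the SPSS descriptive statistics and model summary
n = 252
r = 0.687
sd_sp500 = 1.40017
sd_cisco = 4.23419

beta_cisco_on_sp500 = r * sd_cisco / sd_sp500  # slope when Cisco is the response
beta_sp500_on_cisco = r * sd_sp500 / sd_cisco  # slope when SP500 is the response (the SPSS table)

def residual_sd(r, sd_response, n):
    """sqrt(RSS / (n - 2)) with RSS = (1 - r^2) * (n - 1) * sd_response^2."""
    return math.sqrt((1 - r ** 2) * (n - 1) * sd_response ** 2 / (n - 2))

sigma_cisco = residual_sd(r, sd_cisco, n)   # compare with 3.08% quoted above
sigma_sp500 = residual_sd(r, sd_sp500, n)   # compare with SPSS "Std. Error of the Estimate"

print(beta_cisco_on_sp500, beta_sp500_on_cisco, sigma_cisco, sigma_sp500)
```

The reverse-direction residual standard deviation comes out close to the 1.01964 in the model summary. This supports reading that table as the regression of SP500 on Cisco.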
$R^2 = 47.2\%$: 47.2% of the variation in Cisco stock returns may be explained by the variation in the S&P500 index. In other words, 47.2% of the risk in Cisco stock is market-related risk (see CAPM below).
CAPM: a simple asset pricing model in finance:
$$y_i = \alpha + \beta x_i + \varepsilon_i$$
where $y_i$ is a stock return and $x_i$ is a market return at time $i$.
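Total risk of the stock:
$$\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$$
Market-related (or systematic) risk:
$$\frac{1}{n}\sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = \frac{1}{n}\hat{\beta}^2\sum_{i=1}^n (x_i - \bar{x})^2$$
Firm-specific risk:
$$\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$$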
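The decomposition can be checked numerically. This minimal Python sketch splits total risk into systematic and firm-specific parts. The Cisco/S&P500 data are not reproduced in these notes, so the two return series are made up purely for illustration, and `risk_decomposition` is a hypothetical helper name.

```python
def risk_decomposition(market, stock):
    """Return (beta_hat, total, systematic, firm_specific), each risk measured with 1/n."""
    n = len(market)
    x_bar, y_bar = sum(market) / n, sum(stock) / n
    s_xx = sum((x - x_bar) ** 2 for x in market)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(market, stock))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * x for x in market]
    total = sum((y - y_bar) ** 2 for y in stock) / n
    systematic = sum((f - y_bar) ** 2 for f in fitted) / n            # market-related risk
    firm_specific = sum((y - f) ** 2 for y, f in zip(stock, fitted)) / n
    return beta_hat, total, systematic, firm_specific

# Hypothetical daily returns in %, for illustration only
market = [0.5, -1.2, 0.8, 0.1, -0.4, 1.5, -0.9, 0.3]
stock = [1.4, -2.0, 1.1, 0.9, -1.8, 3.2, -1.5, 0.2]

beta_hat, total, systematic, firm_specific = risk_decomposition(market, stock)
print(beta_hat, total, systematic + firm_specific, systematic / total)
```

The ratio `systematic / total` is the $R^2$ of the regression. For Cisco it is the 47.2% quoted above.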
$\beta$ measures the market-related (or systematic) risk of the stock.
Market-related risk is unavoidable, while firm-specific risk may be diversified away, for example by holding a diversified portfolio of stocks.
Variance is a simple (and one of the most frequently used) measure of risk in finance.
Multiple linear regression
Previously we saw simple linear regression, which has one explanatory variable.
Often one explanatory variable is not enough to explain the variation in the response variable.
So we add more explanatory variables, each entering the model linearly.
Multiple linear regression: examples
Absenteeism in the workforce could be due to: hours worked, flexibility in work practice, salary paid, ...
Salary for managers could be related to: qualifications, experience, hours worked, performance, ...
Remember that the aim of statistics is prediction and decision making.
In order to make the best predictions and decisions, we need to use the best models.
This often means building more complex models that add more explanation, but not too complex (Occam's razor).
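One common way to put Occam's razor into numbers is adjusted $R^2$. The SPSS model summary in the worked example reports it as "Adjusted R Square". This minimal Python sketch uses the standard formula $1 - (1 - R^2)(n - 1)/(n - p - 1)$ for p explanatory variables plus an intercept. The second model in the example is hypothetical: a slightly higher $R^2$ bought with four extra variables.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for p explanatory variables plus an intercept, fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Worked example: R^2 = .472, n = 252, one explanatory variable
print(round(adjusted_r_squared(0.472, 252, 1), 3))  # 0.47, matching SPSS's .470

# Hypothetical: five variables raise R^2 slightly, but adjusted R^2 falls
print(adjusted_r_squared(0.475, 252, 5) < adjusted_r_squared(0.472, 252, 1))  # True
```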
The multiple linear model
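Suppose y is a manager's salary, with $x_1$ = qualifications, $x_2$ = experience, $x_3$ = hours and $x_4$ = performance:
$$y = \beta_0 + \beta_{\text{qual}} x_1 + \beta_{\text{exp}} x_2 + \beta_{\text{hrs}} x_3 + \beta_{\text{per}} x_4 + \varepsilon$$
We can visualise such a model in up to three dimensions, i.e. the response plus two explanatory variables, where the fitted model is a plane.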
Multiple linear regression uses least squares estimation, just like simple linear regression.
That is, we choose all the coefficients together to minimise the sum of the squared residuals.
This sounds tricky, but fortunately software (SPSS etc.) takes care of it for us.
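To see what the software is doing, here is a minimal Python sketch. It solves the least squares normal equations $(X^\top X)\mathbf{b} = X^\top \mathbf{y}$ by Gaussian elimination, using only the standard library. It is for illustration under stated assumptions: many statistical packages use more numerically stable decompositions (such as QR) rather than forming $X^\top X$. The fixed singularity tolerance is a simplification. The noise-free data below are made up so that the true coefficients are known.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (are some explanatory variables collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def multiple_regression(rows, y):
    """Least squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical, noise-free data generated from y = 2 + 3*x1 - 1.5*x2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [2 + 3 * x1 - 1.5 * x2 for x1, x2 in rows]

coefs = multiple_regression(rows, y)
print([round(c, 6) for c in coefs])  # recovers approximately [2.0, 3.0, -1.5]
```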