Linear Regression
Siana Halim
References:
Draper, N. R. and Smith, H., Applied Regression Analysis, 3rd Edition, John Wiley & Sons, Inc., 1998.
Montgomery, D. C. and Peck, E. A., Introduction to Linear Regression Analysis, 2nd Edition, 1992.
Outline
Introduction
Fitting a Straight Line by Least Squares
Measures of Model Adequacy
The Need for Statistical Analysis

In research laboratories, experiments are performed daily. These are usually small, carefully planned studies that result in data sets of modest size. The objective is often a quick yet accurate analysis, enabling the experimenter to move to "better" experimental conditions, which will produce a product with desirable characteristics.

A Ph.D. researcher may travel into an African jungle for a one-year period of intensive data-gathering on plants or animals. She will return with the raw material for her thesis and will put much effort into analyzing the data she has, searching for the messages that they contain. It will not be easy to obtain more data once her trip is completed, so she must carefully analyze every aspect of what data she has.
Regression analysis is a technique that can be used in any of these situations. In any system in which variable quantities change, it is of interest to examine the effects that some variables exert (or appear to exert) on others. We use the following names:

Predictor variables = input variables = inputs = X-variables = regressors = independent variables
Response variable = output variable = output = Y-variable = dependent variable

Response variable = Model function + Random error
Straight-Line Relationship between Two Variables

In much experimental work we wish to investigate how changes in one variable affect another variable. Suppose, for example, we record the heights and weights of a group of individuals. For any given height there is a range of observed weights, and vice versa. This variation will be partially due to measurement errors but primarily due to variation between individuals; thus no unique relationship between actual height and weight can be expected.

When we are concerned with the dependence of a random variable Y on a quantity X that is variable but not random, an equation that relates Y to X is usually called a regression equation.
Fitting a Straight Line by Least Squares

The data consist of 25 observations of Y, the pounds of steam used per month, and X, the average atmospheric temperature in degrees Fahrenheit. [Figure: scatter plot of Y against X.]

The linear first-order model is

$Y = \beta_0 + \beta_1 X + \varepsilon$    (1)

where $\beta_0$ and $\beta_1$ are the parameters and $\varepsilon$ is the random error.
Meaning of Linear Model

When we say that a model is linear or nonlinear, we are referring to linearity or nonlinearity in the parameters. The value of the highest power of a predictor variable in the model is called the order of the model. For example,

$Y = \beta_0 + \beta_1 X + \beta_{11} X^2 + \varepsilon$

is a second-order (in X), linear (in the $\beta$'s) regression model.
Least Squares Estimation

The model estimate is

$\hat{Y} = b_0 + b_1 X$    (2)

Suppose we have n sets of observations $(X_1, Y_1), \ldots, (X_n, Y_n)$; then we can write (1) as

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n$    (3)

so that the sum of squares of deviations from the true line (the sum of squares function) is

$S = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2$    (4)
We can determine $b_0$ and $b_1$ by differentiating Eq. (4) first with respect to $\beta_0$ and then with respect to $\beta_1$, and setting the results equal to zero:

$\dfrac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)$    (5)

$\dfrac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i \left( Y_i - \beta_0 - \beta_1 X_i \right)$    (6)

Setting these equal to zero, where we substitute $(b_0, b_1)$ for $(\beta_0, \beta_1)$, gives

$\sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right) = 0, \qquad \sum_{i=1}^{n} X_i \left( Y_i - b_0 - b_1 X_i \right) = 0$    (7)

so that the estimates $b_0$ and $b_1$ are solutions of the two equations

$b_0 n + b_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i, \qquad b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i$    (8)

These equations are called the normal equations ("normal" in the sense of orthogonal).
Least Square EstimationLeast Square Estimation
The solution of (8) XbYb
13
( )( )[ ]
( )∑ ∑
∑ ∑∑−=
−=
=22
11
10
nXX
nYXYXb
XbYbn
iiiii
12
11XY 0798.06230.13ˆ −=
( )( )( )
( )∑ −
−∑ −=
∑ ∑−
2XX
YYXX
nXX
i
ii
ii
Y
10
9
8
( )( )( ) ∑ −=∑ −=
∑ −=−∑ −=222 XnXXXS
YXnYXYYXXS
iiXX
iiiiXY
807060504030
7
6
( )∑ ∑ −=−= 222 YnYYYS iiYYX
807060504030
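As an illustration, these formulas translate directly into code. The following is a minimal Python sketch; the function name fit_line is ours, and x and y stand in for data such as the 25 steam observations, which are not reproduced here.

```python
import numpy as np

def fit_line(x, y):
    """Least squares estimates (b0, b1) for the line Y-hat = b0 + b1*X."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    s_xy = np.sum((x - x.mean()) * (y - y.mean()))  # S_XY
    s_xx = np.sum((x - x.mean()) ** 2)              # S_XX
    b1 = s_xy / s_xx                                # slope
    b0 = y.mean() - b1 * x.mean()                   # intercept
    return b0, b1
```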
The Analysis of Variance

We now tackle the question of how much of the variation in the data has been explained by the regression line. Consider the following identity:

$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$

The residual $e_i = Y_i - \hat{Y}_i = (Y_i - \bar{Y}) - (\hat{Y}_i - \bar{Y})$ is the difference between two quantities: (1) the deviation of the observed $Y_i$ from the overall mean $\bar{Y}$, and (2) the deviation of the fitted $\hat{Y}_i$ from the overall mean. If we square both sides of this identity and sum from $i = 1, \ldots, n$, we obtain (the cross-product term sums to zero):

$\sum \left( Y_i - \bar{Y} \right)^2 = \sum \left( \hat{Y}_i - \bar{Y} \right)^2 + \sum \left( Y_i - \hat{Y}_i \right)^2$

[Figure: the fitted line $\hat{Y} = b_0 + b_1 X$, showing the decomposition of $Y_i - \bar{Y}$ into $\hat{Y}_i - \bar{Y}$ and $e_i = Y_i - \hat{Y}_i$ at a point $X_i$.]
Sum of Squares

(Sum of squares about the mean) = (Sum of squares due to regression) + (Sum of squares about regression)

ANOVA Table

Source of Variation            df       Sum of Squares (SS)                        Mean Square (MS)
Due to regression              1        $SS_R = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$    $MS_R$
About regression (residual)    n - 2    $SS_E = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$        $MS_E = s^2 = SS_E/(n-2)$
Total, corrected for mean      n - 1    $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
R² Statistic

$R^2 = \dfrac{SS \text{ due to regression given } b_0}{SS \text{ total, corrected for the mean } \bar{Y}} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$

R² measures the "proportion of total variation about the mean $\bar{Y}$ explained by the regression." In fact, R is the correlation between $Y$ and $\hat{Y}$ and is usually called the multiple correlation coefficient; R² is then "the square of the multiple correlation coefficient."

The adjusted R² is

$R^2_{adj} = 1 - (1 - R^2)\,\dfrac{n-1}{n-p-1}$

where n is the number of samples and p the number of regressors in the linear model.
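A sketch of this decomposition in code, under the same assumptions as the earlier fit_line sketch (np.polyfit is used here for the fit; p = 1 regressor for the straight-line model):

```python
import numpy as np

def anova_r2(x, y, p=1):
    """SS decomposition, R^2 and adjusted R^2 for a straight-line fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    b1, b0 = np.polyfit(x, y, 1)                 # same estimates as fit_line above
    y_hat = b0 + b1 * x
    ss_total = np.sum((y - y.mean()) ** 2)       # total, corrected for the mean
    ss_reg = np.sum((y_hat - y.mean()) ** 2)     # due to regression
    ss_res = np.sum((y - y_hat) ** 2)            # about regression (residual)
    r2 = ss_reg / ss_total
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return ss_reg, ss_res, r2, r2_adj
```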
Inferences

The basic assumptions in the model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$, are:

A1. $\varepsilon_i$ is a random variable with mean zero and variance $\sigma^2$ (unknown); that is, $E(\varepsilon_i) = 0$, $V(\varepsilon_i) = \sigma^2$.

A2. $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated, $i \neq j$, so that $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$. Thus $E(Y_i) = \beta_0 + \beta_1 X_i$, $V(Y_i) = \sigma^2$, and $Y_i$ and $Y_j$, $i \neq j$, are uncorrelated.

A3. $\varepsilon_i$ is a normally distributed random variable, with mean zero and variance $\sigma^2$; i.e., $\varepsilon_i \sim N(0, \sigma^2)$. Under A3, $\varepsilon_i$ and $\varepsilon_j$ are not only uncorrelated but necessarily independent.
Variance and standard deviation of $b_1$:

$V(b_1) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2} = \dfrac{\sigma^2}{S_{XX}}, \qquad \mathrm{sd}(b_1) = \dfrac{\sigma}{\left[ \sum (X_i - \bar{X})^2 \right]^{1/2}} = \dfrac{\sigma}{S_{XX}^{1/2}}$

$\mathrm{est.\ sd}(b_1) = \dfrac{s}{\left[ \sum (X_i - \bar{X})^2 \right]^{1/2}} = \dfrac{s}{S_{XX}^{1/2}}$
Confidence interval for $\beta_1$: if we assume that the variations of the observations about the line are normal, 100(1 − α)% confidence limits for $\beta_1$ are given by

$b_1 \pm t\!\left( n-2,\ 1 - \tfrac{1}{2}\alpha \right) \dfrac{s}{\left[ \sum (X_i - \bar{X})^2 \right]^{1/2}}$

where $t(n-2,\ 1-\tfrac{1}{2}\alpha)$ is the $100(1-\tfrac{1}{2}\alpha)$ percentage point of a t-distribution with (n − 2) degrees of freedom.

Test for $H_0\!: \beta_1 = \beta_{10}$ vs $H_1\!: \beta_1 \neq \beta_{10}$: calculate

$t = \dfrac{b_1 - \beta_{10}}{\mathrm{est.\ sd}(b_1)}$

and compare $|t|$ with $t(n-2,\ 1-\tfrac{1}{2}\alpha)$ from the t-table. If the observed $|t|$ value is smaller than the critical value, we cannot reject the hypothesis.
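A sketch of the interval and test for the slope, with the same assumptions as before (scipy supplies the t-distribution percentage point; the function name is ours):

```python
import numpy as np
from scipy import stats

def slope_inference(x, y, beta10=0.0, alpha=0.05):
    """t statistic for H0: beta1 = beta10, and 100(1-alpha)% limits for beta1."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    b1, b0 = np.polyfit(x, y, 1)
    s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)      # residual mean square s^2
    se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))    # est. sd(b1)
    t_stat = (b1 - beta10) / se_b1
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)           # t(n-2, 1 - alpha/2)
    return t_stat, (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```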
Standard deviation of $b_0$:

$\mathrm{sd}(b_0) = \sigma \left\{ \dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} \right\}^{1/2}, \qquad \mathrm{est.\ sd}(b_0) = s \left\{ \dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} \right\}^{1/2}$

The 100(1 − α)% confidence interval for $\beta_0$ is

$b_0 \pm t\!\left( n-2,\ 1 - \tfrac{1}{2}\alpha \right) \left\{ \dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} \right\}^{1/2} s$

A t-test for $H_0\!: \beta_0 = \beta_{00}$ vs $H_1\!: \beta_0 \neq \beta_{00}$ will reject $H_0$ if $\beta_{00}$ falls outside the confidence interval, and vice versa; equivalently, compare

$t_0 = \dfrac{b_0 - \beta_{00}}{s \left\{ \sum X_i^2 \big/ \left[ n \sum (X_i - \bar{X})^2 \right] \right\}^{1/2}}$

with $t(n-2,\ 1-\tfrac{1}{2}\alpha)$.
F-Test for Significance of Regression

The ratio

$F_0 = \dfrac{MS_R}{MS_E}$

follows an F-distribution with (1, n − 2) degrees of freedom, provided that $\beta_1 = 0$. This fact can be used as a test of $H_0\!: \beta_1 = 0$ versus $H_1\!: \beta_1 \neq 0$: we compare the ratio $F_0$ with the 100(1 − α)% point of the tabulated F(1, n − 2) distribution, and reject $H_0$ if $F_0$ exceeds it.
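The F-test in code, under the same assumptions as the earlier sketches:

```python
import numpy as np
from scipy import stats

def f_test(x, y, alpha=0.05):
    """F0 = MS_R / MS_E; reject H0: beta1 = 0 if F0 exceeds F(1, n-2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    b1, b0 = np.polyfit(x, y, 1)
    y_hat = b0 + b1 * x
    ms_reg = np.sum((y_hat - y.mean()) ** 2) / 1    # 1 degree of freedom
    ms_res = np.sum((y - y_hat) ** 2) / (n - 2)     # n - 2 degrees of freedom
    f0 = ms_reg / ms_res
    return f0, f0 > stats.f.ppf(1 - alpha, 1, n - 2)
```

For simple linear regression this F statistic equals the square of the t statistic for the slope, so the two tests of $\beta_1 = 0$ agree.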
[Figure: scatter plots of y against x illustrating situations where the hypothesis $H_0\!: \beta_1 = 0$ is not rejected, and situations where it is rejected.]
Regression Analysis: Y vs X

The regression equation is
Y = 13.6 - 0.0798 X

Predictor     Coef       SE Coef    T        P
Constant      13.6230    0.5815     23.43    0.000
X             -0.07983   0.01052    -7.59    0.000

S = 0.890125   R-Sq = 71.4%   R-Sq(adj) = 70.2%

Analysis of Variance
Source          DF    SS       MS       F       P
Regression       1    45.592   45.592   57.54   0.000
Residual Error  23    18.223    0.792
Total           24    63.816
Measures of Adequacy

The major assumptions that we have made so far in our study of regression analysis are as follows:
1. The relationship between y and x is linear, or at least is well approximated by a straight line.
2. The error term ε has zero mean.
3. The error term ε has constant variance σ².
4. The errors are uncorrelated.
5. The errors are normally distributed.
Residual Analysis

We have defined the residuals as

$e_i = y_i - \hat{y}_i, \quad i = 1, \ldots, n$

where $y_i$ is an observation and $\hat{y}_i$ the corresponding fitted value. A residual may be viewed as the deviation between the data and the fit; it is a measure of the variability not explained by the regression model. It is also convenient to think of the residuals as the realized or observed values of the errors.

The residuals have several important properties. They have mean zero, and their approximate average variance is

$\dfrac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{n-2} = \dfrac{\sum_{i=1}^{n} e_i^2}{n-2} = \dfrac{SS_E}{n-2} = MS_E$
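A short sketch of these quantities (the function name is ours, with the same fitting assumptions as before):

```python
import numpy as np

def residuals_and_mse(x, y):
    """Residuals e_i = y_i - y_hat_i and their approximate average variance MS_E."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1, b0 = np.polyfit(x, y, 1)
    e = y - (b0 + b1 * x)
    return e, np.sum(e ** 2) / (len(e) - 2)   # SS_E / (n - 2) = MS_E
```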
Normal Probability Plot

Although small departures from normality do not affect the model greatly, gross non-normality is potentially more serious, as the t- and F-statistics, and the confidence and prediction intervals, depend on the normality assumption. Furthermore, if the errors come from a distribution with thicker or heavier tails than the normal, the least squares fit may be sensitive to a small subset of the data. To check normality, we use the QQ (quantile-quantile) plot of the residuals.

[Figure: QQ-plot shapes — ideal, heavy-tailed, light-tailed, right-skewed, and left-skewed residuals.]
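One way to draw the QQ plot is with scipy's probplot (a sketch; residuals is any 1-D array of residuals, e.g., from the earlier residuals_and_mse sketch):

```python
import matplotlib.pyplot as plt
from scipy import stats

def qq_plot(residuals):
    """Normal QQ plot; roughly linear points support the normality assumption."""
    stats.probplot(residuals, dist="norm", plot=plt)  # ordered residuals vs normal quantiles
    plt.title("Normal QQ plot of residuals")
    plt.show()
```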
Plot of Residuals against $\hat{y}_i$

A plot of the residuals $e_i$ versus the corresponding fitted values $\hat{y}_i$ (and versus $x_i$) is useful for detecting several common types of model inadequacy.

[Figure: residual patterns — satisfactory, funnel, double bow, nonlinear.]
Other Residual Plots

If the time sequence in which the data were collected is known, it may be instructive to plot the residuals against time order. The time sequence plot of residuals may indicate that the errors at one time period are correlated with those at other time periods (autocorrelation).

[Figure: residual-versus-time plots showing positive and negative autocorrelation.]
Testing Homogeneity of Pure Error (Optional)

1. Bartlett's Test. Let $s_1^2, s_2^2, \ldots, s_m^2$ be the estimates of $\sigma^2$ from the m groups of repeats, with $\nu_1, \nu_2, \ldots, \nu_m$ degrees of freedom, respectively, where $\nu_j = n_j - 1$ and m is the number of groups with repeat runs:

$s_j^2 = \dfrac{\sum_{u=1}^{n_j} \left( Y_{ju} - \bar{Y}_j \right)^2}{n_j - 1}$

Let $\nu = \nu_1 + \nu_2 + \cdots + \nu_m$,

$s_e^2 = \dfrac{\sum_{i=1}^{m} \nu_i s_i^2}{\nu}, \qquad C = 1 + \dfrac{1}{3(m-1)} \left( \sum_{i=1}^{m} \nu_i^{-1} - \nu^{-1} \right)$

The test statistic is then

$B = \dfrac{1}{C} \left\{ \nu \ln s_e^2 - \sum_{j=1}^{m} \nu_j \ln s_j^2 \right\}$

When the variances of the groups are all the same, B is distributed as $\chi^2_{m-1}$. A significant B value could indicate inhomogeneous variances; it could also indicate non-normality.
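A direct transcription of the statistic (a sketch; groups is assumed to be a list of arrays, one per group of repeat runs). scipy.stats.bartlett offers an equivalent packaged test:

```python
import numpy as np

def bartlett_b(groups):
    """Bartlett's B for m groups of repeats; compare to chi^2 with m-1 df."""
    m = len(groups)
    nu = np.array([len(g) - 1 for g in groups], float)   # nu_j = n_j - 1
    s2 = np.array([np.var(g, ddof=1) for g in groups])   # s_j^2
    nu_tot = nu.sum()                                    # nu = sum of nu_j
    se2 = np.sum(nu * s2) / nu_tot                       # pooled s_e^2
    c = 1 + (np.sum(1.0 / nu) - 1.0 / nu_tot) / (3 * (m - 1))
    return (nu_tot * np.log(se2) - np.sum(nu * np.log(s2))) / c
```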
2. Levene's Test using Means. Consider, in the jth group of repeats, the absolute deviations of the Y's from the means of their repeat group:

$z_{ju} = \left| Y_{ju} - \bar{Y}_j \right|, \quad u = 1, 2, \ldots, n_j$

Treat this as a one-way classification and compare the "between groups" mean square with the "within groups" mean square via an F-test. With

$\bar{z}_j = \sum_{u=1}^{n_j} z_{ju} \big/ n_j, \qquad \bar{z} = \sum_{j=1}^{m} \sum_{u=1}^{n_j} z_{ju} \big/ n, \qquad n = \sum_{j=1}^{m} n_j$

the appropriate F-statistic is then

$F = \dfrac{\sum_{j=1}^{m} n_j \left( \bar{z}_j - \bar{z} \right)^2 \big/ (m-1)}{\sum_{j=1}^{m} \sum_{u=1}^{n_j} \left( z_{ju} - \bar{z}_j \right)^2 \big/ \sum_{j=1}^{m} (n_j - 1)}$

The F-value is referred to $F_{m-1,\ \sum_{j=1}^{m}(n_j - 1)}$, using only the upper tail.
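The corresponding sketch for Levene's statistic, under the same assumptions on groups (scipy.stats.levene with center="mean" is the packaged equivalent):

```python
import numpy as np

def levene_f(groups):
    """Levene's F using group means; refer to F with (m-1, sum(n_j - 1)) df."""
    m = len(groups)
    z = [np.abs(np.asarray(g, float) - np.mean(g)) for g in groups]  # z_ju
    n_j = np.array([len(g) for g in groups])
    zbar_j = np.array([zj.mean() for zj in z])            # group means of z
    zbar = np.concatenate(z).mean()                       # grand mean of z
    between = np.sum(n_j * (zbar_j - zbar) ** 2) / (m - 1)
    within = sum(np.sum((zj - zj.mean()) ** 2) for zj in z) / np.sum(n_j - 1)
    return between / within
```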
Durbin-Watson Test

The Durbin-Watson test checks for a sequential dependence in which each error (and so each residual) is correlated with those before and after it in the sequence:

$d = \dfrac{\sum_{u=2}^{n} \left( e_u - e_{u-1} \right)^2}{\sum_{u=1}^{n} e_u^2}$

It can be shown that:
1. 0 ≤ d ≤ 4 always.
2. If successive residuals are positively serially correlated, that is, positively correlated in their sequence, d will be near 0.
3. If successive residuals are negatively correlated, d will be near 4, so that 4 − d will be near 0.
4. The distribution of d is symmetric about 2.
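The statistic is one line of numpy (a sketch; the residuals must be supplied in time order):

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum over u>=2 of (e_u - e_{u-1})^2, divided by sum of e_u^2."""
    e = np.asarray(residuals, float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```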
The test is conducted as follows. Compare d (or 4 − d, whichever is closer to zero) with $d_L$ and $d_U$ in the following table.
• If d < $d_L$, conclude that positive serial correlation is a possibility.
• If d > $d_U$, conclude that no serial correlation is indicated.
• If 4 − d < $d_L$, conclude that negative serial correlation is a possibility.
• If 4 − d > $d_U$, conclude that no serial correlation is indicated.
• If the d (or 4 − d) value lies between $d_L$ and $d_U$, the test is inconclusive.
An indication of positive or negative serial correlation would be cause for the model to be reexamined.
Significance level:      1%             2.5%            5%
  n                   dL     dU      dL     dU      dL     dU
  15                 0.81   1.07    0.95   1.23    1.08   1.36
  20                 0.95   1.15    1.08   1.28    1.20   1.41
  25                 1.05   1.21    1.18   1.34    1.29   1.45
  30                 1.13   1.26    1.25   1.38    1.35   1.49
  40                 1.25   1.34    1.35   1.45    1.44   1.54
  50                 1.32   1.40    1.42   1.50    1.50   1.59
  70                 1.43   1.49    1.51   1.57    1.58   1.64
 100                 1.52   1.56    1.59   1.63    1.65   1.69
 150                 1.61   1.64     -      -      1.72   1.75
 200                 1.66   1.68     -      -      1.76   1.78

Interpolate linearly for intermediate n-values.
Detection and Treatment of Outliers

An outlier is an extreme observation. Residuals that are considerably larger in absolute value than the others, say three or four standard deviations from the mean, are potential outliers. Outliers are data points that are not typical of the rest of the data.
Outliers should be carefully investigated to see if a reason for their unusual behavior can be found. Sometimes outliers are "bad" values, occurring as a result of unusual but explainable events. Examples include faulty measurement or analysis, incorrect recording of data, and failure of a measuring instrument. If this is the case, then the outlier should be corrected (if possible) or deleted from the data set.

Sometimes we find that the outlier is an unusual but perfectly plausible observation. Deleting such points to "improve the fit equation" can be dangerous, as it can give the user a false sense of precision in estimation or prediction.
Cook's Distance

Cook (1977) proposed that the influence of the ith data point be measured by the squared scaled distance

$D_i = \dfrac{\left( \hat{Y} - \hat{Y}(i) \right)' \left( \hat{Y} - \hat{Y}(i) \right)}{p s^2} = \dfrac{\left( b - b(i) \right)' X'X \left( b - b(i) \right)}{p s^2}$

where $\hat{Y} = Xb$ and $\hat{Y}(i) = Xb(i)$; here $b(i)$ is the least squares estimate, and $\hat{Y}(i)$ the vector of predicted values, obtained when the ith data point is deleted. p is the number of parameters in the model, and $s^2$ is the residual mean square.
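A leave-one-out sketch of Cook's distance for the straight-line model (refitting without each point, which is fine for small n; the function name is ours):

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's D_i for each observation of a straight-line fit, by deletion."""
    X = np.column_stack([np.ones(len(y)), np.asarray(x, float)])  # design matrix with intercept
    y = np.asarray(y, float)
    n, p = X.shape                                    # p = number of parameters (here 2)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - X @ b) ** 2) / (n - p)           # residual mean square
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # fit without point i
        diff = X @ (b - b_i)                          # Y-hat minus Y-hat(i)
        d[i] = diff @ diff / (p * s2)
    return d
```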
Residual Plots

[Figure: residual plots for Y, four panels — normal probability plot of the residuals, residuals versus the fitted values, histogram of the residuals, and residuals versus the order of the data.]
-2