The General Linear Model
The Simple Linear Model
Linear Regression
Suppose that we have two variables
1. Y – the dependent variable (response variable)
2. X – the independent variable (explanatory variable, factor)
X, the independent variable, may or may not be a random variable.
Sometimes it is randomly observed.
Sometimes specific values of X are selected.
The dependent variable, Y, is assumed to be a random variable.
The distribution of Y depends on X.
The objective is to determine that distribution using statistical techniques (estimation and hypothesis testing).
These decisions will be based on data collected on both the dependent variable Y and the independent variable X.
Let (x1, y1), (x2, y2), … ,(xn, yn) denote n pairs of values measured on the independent variable (X) and the dependent variable (Y)
The scatterplot:
The graphical plot of the points:
(x1, y1), (x2, y2), … ,(xn, yn)
Assume that we have collected data on two variables X and Y. Let
(x1, y1) (x2, y2) (x3, y3) … (xn, yn)
denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).
The assumption will be made that $y_1, y_2, \ldots, y_n$ are:
1. independent random variables,
2. normally distributed,
3. with common variance $\sigma^2$, and
4. with the mean of $y_i$ given by $\mu_i = \alpha + \beta x_i$.

Data that satisfy the assumptions above are said to come from the Simple Linear Model.
Each $y_i$ is assumed to be randomly generated from a normal distribution with mean $\mu_i = \alpha + \beta x_i$ and standard deviation $\sigma$.

[Figure: an observation $y_i$ at $x_i$, scattered about the line $\alpha + \beta x_i$.]
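A minimal simulation sketch of this model (the values $\alpha = 2$, $\beta = 0.5$, $\sigma = 1$ and the grid of x values are illustrative choices, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (assumptions, not from the notes)
alpha, beta, sigma = 2.0, 0.5, 1.0

# Chosen x values (the independent variable need not be random)
x = np.linspace(40, 140, 25)

# Each y_i is drawn from a normal distribution with
# mean mu_i = alpha + beta * x_i and standard deviation sigma
mu = alpha + beta * x
y = rng.normal(loc=mu, scale=sigma)

print(y[:5])
```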
• When data are correlated they fall roughly about a straight line.
The density of $y_i$ is:
$$f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right]$$

The joint density of $y_1, y_2, \ldots, y_n$ is:
$$f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right] = \frac{1}{(2\pi)^{n/2}\sigma^{n}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2\right]$$
Estimation of the parameters:
• $\alpha$, the intercept
• $\beta$, the slope
• $\sigma$, the standard deviation (or the variance $\sigma^2$)
The Least Squares Line
Fitting the best straight line
to “linear” data
Let $Y = a + bX$ denote an arbitrary equation of a straight line, where $a$ and $b$ are known values. This equation can be used to predict, for each value of $X$, the value of $Y$.

For example, if $X = x_i$ (as for the $i$th case) then the predicted value of $Y$ is:
$$\hat{y}_i = a + b x_i$$

Define the residual for each case in the sample to be:
$$r_i = y_i - \hat{y}_i = y_i - a - b x_i$$
that is,
$$r_1 = y_1 - \hat{y}_1,\quad r_2 = y_2 - \hat{y}_2,\quad \ldots,\quad r_n = y_n - \hat{y}_n$$

The residual sum of squares (RSS) is defined as:
$$\mathrm{RSS} = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - a - b x_i)^2$$

The residual sum of squares (RSS) is a measure of the "goodness of fit" of the line $Y = a + bX$ to the data.

One choice of $a$ and $b$ will result in the residual sum of squares
$$\mathrm{RSS} = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - a - b x_i)^2$$
attaining a minimum. If this is the case then the line $Y = a + bX$ is called the Least Squares Line.
Write
$$R(a,b) = \sum_{i=1}^{n}(y_i - a - b x_i)^2$$
To find the Least Squares estimates, $a$ and $b$, we need to solve the equations:
$$\frac{\partial R(a,b)}{\partial a} = \frac{\partial}{\partial a}\sum_{i=1}^{n}(y_i - a - b x_i)^2 = 0$$
and
$$\frac{\partial R(a,b)}{\partial b} = \frac{\partial}{\partial b}\sum_{i=1}^{n}(y_i - a - b x_i)^2 = 0$$
Note:
$$\frac{\partial R(a,b)}{\partial a} = \frac{\partial}{\partial a}\sum_{i=1}^{n}(y_i - a - b x_i)^2 = \sum_{i=1}^{n} 2(y_i - a - b x_i)(-1)$$
Setting this equal to zero gives
$$\sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i$$
or
$$\bar{y} = a + b\bar{x}$$
Note:
$$\frac{\partial R(a,b)}{\partial b} = \frac{\partial}{\partial b}\sum_{i=1}^{n}(y_i - a - b x_i)^2 = \sum_{i=1}^{n} 2(y_i - a - b x_i)(-x_i)$$
Setting this equal to zero gives
$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$$
Hence the optimal values of $a$ and $b$ satisfy the equations
$$\bar{y} = a + b\bar{x} \qquad\text{and}\qquad \sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$$

From the first equation we have:
$$a = \bar{y} - b\bar{x}$$

The second equation becomes:
$$\sum_{i=1}^{n} x_i y_i = (\bar{y} - b\bar{x})\,n\bar{x} + b\sum_{i=1}^{n} x_i^2 = n\bar{x}\bar{y} + b\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$$

Solving the second equation for $b$:
$$b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{S_{xy}}{S_{xx}} \qquad\text{and}\qquad a = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$
where
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \qquad\text{and}\qquad S_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

Note:
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \qquad\text{and}\qquad S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

Proof:
$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n}\left(x_i y_i - x_i\bar{y} - \bar{x}y_i + \bar{x}\bar{y}\right) = \sum_{i=1}^{n} x_i y_i - \bar{y}\sum_{i=1}^{n} x_i - \bar{x}\sum_{i=1}^{n} y_i + n\bar{x}\bar{y}$$
$$= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = S_{xy}$$
Summary: Slope and intercept of the Least Squares Line
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad\text{and}\qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$
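A small Python sketch of these two formulas (function and variable names are my own):

```python
import numpy as np

def least_squares_line(x, y):
    """Return (alpha_hat, beta_hat) for the least squares line y = alpha + beta*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    S_xx = np.sum((x - xbar) ** 2)
    S_xy = np.sum((x - xbar) * (y - ybar))
    beta_hat = S_xy / S_xx               # slope
    alpha_hat = ybar - beta_hat * xbar   # intercept
    return alpha_hat, beta_hat

# Example usage with small made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
print(least_squares_line(x, y))
```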
Maximum Likelihood Estimation of the parameters:
• $\alpha$, the intercept
• $\beta$, the slope
• $\sigma$, the standard deviation

Recall: the joint density of $y_1, y_2, \ldots, y_n$ is:
$$f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right] = \frac{1}{(2\pi)^{n/2}\sigma^{n}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2\right]$$
$$= L(\alpha, \beta, \sigma), \quad\text{the Likelihood function.}$$

The log-likelihood function is:
$$l(\alpha, \beta, \sigma) = \ln L(\alpha, \beta, \sigma) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2$$
To find the maximum likelihood estimates of $\alpha$, $\beta$ and $\sigma$ we need to solve the equations:
$$\frac{\partial l}{\partial \alpha} = 0, \qquad \frac{\partial l}{\partial \beta} = 0, \qquad \frac{\partial l}{\partial \sigma} = 0$$

The equations $\dfrac{\partial l}{\partial \alpha} = 0$ and $\dfrac{\partial l}{\partial \beta} = 0$ become
$$\sum_{i=1}^{n}(y_i - \alpha - \beta x_i) = 0 \qquad\text{and}\qquad \sum_{i=1}^{n}(y_i - \alpha - \beta x_i)\,x_i = 0$$
These are the same equations as for the least squares line, which have solution:
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

The third equation,
$$\frac{\partial l}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2 = 0,$$
becomes
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2$$
Summary: Maximum Likelihood Estimates
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$
and
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2$$
A computing formula for the estimate of $\sigma^2$:

With $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2$$
we have
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - \hat{\beta}(x_i - \bar{x})\right]^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(y_i - \bar{y})^2 - 2\hat{\beta}(x_i - \bar{x})(y_i - \bar{y}) + \hat{\beta}^2(x_i - \bar{x})^2\right]$$
$$= \frac{1}{n}\left[S_{yy} - 2\hat{\beta}S_{xy} + \hat{\beta}^2 S_{xx}\right]$$
Now, since $\hat{\beta} = S_{xy}/S_{xx}$,
$$\hat{\sigma}^2 = \frac{1}{n}\left[S_{yy} - 2\frac{S_{xy}}{S_{xx}}S_{xy} + \frac{S_{xy}^2}{S_{xx}^2}S_{xx}\right] = \frac{1}{n}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]$$
It also can be shown that
$$E[\hat{\sigma}^2] = \frac{n-2}{n}\,\sigma^2$$
Thus $\hat{\sigma}^2$, the maximum likelihood estimator of $\sigma^2$, is a biased estimator of $\sigma^2$. This estimator can easily be converted into an unbiased estimator of $\sigma^2$ by multiplying by the ratio $n/(n-2)$:
$$s^2 = \frac{n}{n-2}\,\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2 = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]$$
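A short sketch computing both the biased MLE $\hat{\sigma}^2$ and the unbiased $s^2$ from the summary statistics (names are my own):

```python
import numpy as np

def sigma_estimates(x, y):
    """Return (sigma2_mle, s2) for the simple linear model."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    S_xx = np.sum((x - xbar) ** 2)
    S_yy = np.sum((y - ybar) ** 2)
    S_xy = np.sum((x - xbar) * (y - ybar))
    rss = S_yy - S_xy ** 2 / S_xx       # residual sum of squares
    sigma2_mle = rss / n                # biased maximum likelihood estimate
    s2 = rss / (n - 2)                  # unbiased estimate
    return sigma2_mle, s2
```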
Estimators in Linear Regression
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$
and
$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2 = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]$$
The major computation is of:
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

Computing formulae:
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}, \qquad S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
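A quick numerical check, with made-up data, that the centred definitions and the computing-formula forms agree (in floating point the centred form is usually the numerically safer choice):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

# Definition form (centred sums)
S_xx_def = np.sum((x - x.mean()) ** 2)
S_xy_def = np.sum((x - x.mean()) * (y - y.mean()))
S_yy_def = np.sum((y - y.mean()) ** 2)

# Computing-formula form (raw sums)
S_xx_cf = np.sum(x ** 2) - x.sum() ** 2 / n
S_xy_cf = np.sum(x * y) - x.sum() * y.sum() / n
S_yy_cf = np.sum(y ** 2) - y.sum() ** 2 / n

print(np.allclose([S_xx_def, S_xy_def, S_yy_def],
                  [S_xx_cf, S_xy_cf, S_yy_cf]))   # True
```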
Application of Statistical Theory to Simple Linear Regression
We will now use statistical theory to prove optimal properties of the estimators.
Recall, the joint density of $y_1, y_2, \ldots, y_n$ is:
$$f(y_1,\ldots,y_n) = \frac{1}{(2\pi)^{n/2}\sigma^{n}}\exp\!\left[-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_i-\alpha-\beta x_i)^{2}\right]$$
Expanding the sum in the exponent,
$$\sum_{i=1}^{n}(y_i-\alpha-\beta x_i)^{2} = \sum_{i=1}^{n}y_i^{2} - 2\alpha\sum_{i=1}^{n}y_i - 2\beta\sum_{i=1}^{n}x_i y_i + n\alpha^{2} + 2\alpha\beta\sum_{i=1}^{n}x_i + \beta^{2}\sum_{i=1}^{n}x_i^{2}$$
so the joint density can be written in the exponential-family form
$$f(y_1,\ldots,y_n) = h(\mathbf{y})\,g(\boldsymbol{\theta})\exp\!\left[\sum_{i=1}^{3}p_i(\boldsymbol{\theta})\,S_i(\mathbf{y})\right]$$
where
$$h(\mathbf{y}) = 1, \qquad g(\boldsymbol{\theta}) = \frac{1}{(2\pi)^{n/2}\sigma^{n}}\exp\!\left[-\frac{1}{2\sigma^{2}}\left(n\alpha^{2} + 2\alpha\beta\sum_{i=1}^{n}x_i + \beta^{2}\sum_{i=1}^{n}x_i^{2}\right)\right]$$
and
$$p_1(\boldsymbol{\theta}) = -\frac{1}{2\sigma^{2}},\quad p_2(\boldsymbol{\theta}) = \frac{\alpha}{\sigma^{2}},\quad p_3(\boldsymbol{\theta}) = \frac{\beta}{\sigma^{2}}, \qquad S_1(\mathbf{y}) = \sum_{i=1}^{n}y_i^{2},\quad S_2(\mathbf{y}) = \sum_{i=1}^{n}y_i,\quad S_3(\mathbf{y}) = \sum_{i=1}^{n}x_i y_i$$

Thus
$$S_1(\mathbf{y}) = \sum_{i=1}^{n}y_i^{2},\qquad S_2(\mathbf{y}) = \sum_{i=1}^{n}y_i,\qquad S_3(\mathbf{y}) = \sum_{i=1}^{n}x_i y_i$$
are complete sufficient statistics.

Now
$$S_{yy} = \sum_{i=1}^{n}(y_i-\bar{y})^{2} = \sum_{i=1}^{n}y_i^{2} - \frac{\left(\sum_{i=1}^{n}y_i\right)^{2}}{n} = S_1(\mathbf{y}) - \frac{\left[S_2(\mathbf{y})\right]^{2}}{n}$$
$$S_{xy} = \sum_{i=1}^{n}x_i y_i - \frac{\left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right)}{n} = S_3(\mathbf{y}) - \bar{x}\,S_2(\mathbf{y})$$
and
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i = \frac{1}{n}S_2(\mathbf{y})$$
Also
$$\hat{\alpha} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x} = \frac{1}{n}S_2(\mathbf{y}) - \frac{\bar{x}}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\,S_2(\mathbf{y})\right]$$
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{1}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\,S_2(\mathbf{y})\right]$$
and
$$s^{2} = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^{2}}{S_{xx}}\right] = \frac{1}{n-2}\left[S_1(\mathbf{y}) - \frac{\left[S_2(\mathbf{y})\right]^{2}}{n} - \frac{1}{S_{xx}}\left(S_3(\mathbf{y}) - \bar{x}\,S_2(\mathbf{y})\right)^{2}\right]$$
Thus all three estimators are functions of the set of complete sufficient statistics.
If they are also unbiased then they are Uniform Minimum Variance Unbiased (UMVU) estimators (using the Lehmann-Scheffe theorem)
So $\hat{\alpha}$, $\hat{\beta}$ and $s^2$ are all functions of the complete sufficient statistics $S_1(\mathbf{y})$, $S_2(\mathbf{y})$, $S_3(\mathbf{y})$:
$$\hat{\alpha} = \frac{1}{n}S_2(\mathbf{y}) - \frac{\bar{x}}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\,S_2(\mathbf{y})\right] \qquad\text{and}\qquad \hat{\beta} = \frac{1}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\,S_2(\mathbf{y})\right]$$
We have already shown that $s^2$ is an unbiased estimator of $\sigma^2$. We need only show that $\hat{\alpha}$ and $\hat{\beta}$ are unbiased estimators of $\alpha$ and $\beta$.
Now $S_2(\mathbf{y}) = \sum_{i=1}^{n} y_i$, $S_3(\mathbf{y}) = \sum_{i=1}^{n} x_i y_i$, and $E(y_i) = \alpha + \beta x_i$.
Thus
$$E[S_2(\mathbf{y})] = \sum_{i=1}^{n}E(y_i) = n\alpha + n\beta\bar{x}$$
and
$$E[S_3(\mathbf{y})] = \sum_{i=1}^{n}x_i E(y_i) = \alpha\sum_{i=1}^{n}x_i + \beta\sum_{i=1}^{n}x_i^{2} = n\alpha\bar{x} + \beta\left(S_{xx} + n\bar{x}^{2}\right)$$
since $S_{xx} = \sum_{i=1}^{n}x_i^{2} - n\bar{x}^{2}$.
Thus
$$E[\hat{\beta}] = \frac{1}{S_{xx}}\left[E\,S_3(\mathbf{y}) - \bar{x}\,E\,S_2(\mathbf{y})\right] = \frac{1}{S_{xx}}\left[n\alpha\bar{x} + \beta S_{xx} + n\beta\bar{x}^{2} - \bar{x}\left(n\alpha + n\beta\bar{x}\right)\right] = \frac{1}{S_{xx}}\,\beta S_{xx} = \beta$$
Also
$$E[\hat{\alpha}] = E\!\left[\frac{1}{n}S_2(\mathbf{y}) - \frac{\bar{x}}{S_{xx}}\left(S_3(\mathbf{y}) - \bar{x}\,S_2(\mathbf{y})\right)\right] = \frac{1}{n}E\,S_2(\mathbf{y}) - \frac{\bar{x}}{S_{xx}}\left[E\,S_3(\mathbf{y}) - \bar{x}\,E\,S_2(\mathbf{y})\right]$$
$$= \frac{1}{n}\left(n\alpha + n\beta\bar{x}\right) - \frac{\bar{x}}{S_{xx}}\,\beta S_{xx} = \alpha + \beta\bar{x} - \beta\bar{x} = \alpha$$
Thus $\hat{\alpha}$ and $\hat{\beta}$ are unbiased estimators of $\alpha$ and $\beta$.
The General Linear Model
Consider the random variable Y with
1. $E[Y] = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = \sum_{i=1}^{p}\beta_i X_i$
(alternatively $E[Y] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$, intercept included), and
2. $\mathrm{var}(Y) = \sigma^2$
• where $\beta_1, \beta_2, \ldots, \beta_p$ are unknown parameters
• and $X_1, X_2, \ldots, X_p$ are nonrandom variables.
• Assume further that Y is normally distributed.

Thus the density of Y is:
$$f(Y \mid \beta_1, \ldots, \beta_p, \sigma^2) = f(Y \mid \boldsymbol{\beta}, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{1}{2\sigma^2}\left(Y - \beta_1 X_1 - \beta_2 X_2 - \cdots - \beta_p X_p\right)^2\right]$$
where $\boldsymbol{\beta}' = (\beta_1, \beta_2, \ldots, \beta_p)$.
Now suppose that $n$ independent observations of Y, $(y_1, y_2, \ldots, y_n)$, are made, corresponding to $n$ sets of values of $(X_1, X_2, \ldots, X_p)$: $(x_{11}, x_{12}, \ldots, x_{1p})$, $(x_{21}, x_{22}, \ldots, x_{2p})$, ..., $(x_{n1}, x_{n2}, \ldots, x_{np})$.

Then the joint density of $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ is:
$$f(y_1, \ldots, y_n \mid \beta_1, \ldots, \beta_p, \sigma^2) = f(\mathbf{y}\mid\boldsymbol{\beta},\sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2\right]$$
$$= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right] = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\!\left[-\frac{1}{2\sigma^2}\left(\mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right)\right]$$
$$= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\!\left[-\frac{1}{2\sigma^2}\boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right]\exp\!\left[\frac{1}{\sigma^2}\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} - \frac{1}{2\sigma^2}\mathbf{y}'\mathbf{y}\right] = g(\boldsymbol{\beta},\sigma^2)\,h(\mathbf{y})\exp\!\left[\frac{1}{\sigma^2}\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} - \frac{1}{2\sigma^2}\mathbf{y}'\mathbf{y}\right]$$
where
$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & & & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{np}\end{bmatrix}$$

Thus $f(\mathbf{y}\mid\boldsymbol{\beta},\sigma^2)$ is a member of the exponential family of distributions, and
$$\mathbf{S} = \left(\mathbf{X}'\mathbf{y},\; \mathbf{y}'\mathbf{y}\right)$$
is a Minimal Complete set of Sufficient Statistics.
Matrix-vector formulation
The General Linear Model

Let
$$\mathbf{y} = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_1\\ \beta_2\\ \vdots\\ \beta_p \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & & & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{np}\end{bmatrix}$$
Then $\mathbf{y}$ has a $N(\mathbf{X}\boldsymbol{\beta},\, \sigma^2\mathbf{I})$ distribution.

Equivalently, let $\boldsymbol{\varepsilon} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$, or
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad\text{where } \boldsymbol{\varepsilon} \text{ has a } N(\mathbf{0},\, \sigma^2\mathbf{I}) \text{ distribution.}$$
Geometrical interpretation of the General Linear Model

Let $\mathbf{x}_1, \ldots, \mathbf{x}_p$ denote the columns of $\mathbf{X}$, so that $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. Then
$$\boldsymbol{\mu} = E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta} = \beta_1\mathbf{x}_1 + \beta_2\mathbf{x}_2 + \cdots + \beta_p\mathbf{x}_p$$
lies in the linear space spanned by the columns of $\mathbf{X}$.

[Figure: the vector $\mathbf{y}$ in $n$-dimensional space, its mean $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$ lying in the linear space spanned by the columns $\mathbf{x}_1, \ldots, \mathbf{x}_p$ of $\mathbf{X}$, and the error $\boldsymbol{\varepsilon} = \mathbf{y} - \boldsymbol{\mu}$.]
Estimation
The General Linear Model

Least squares estimates of $\beta_1, \beta_2, \ldots, \beta_p$: let
$$R(\beta_1, \ldots, \beta_p) = \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
The least squares estimates of $\beta_1, \beta_2, \ldots, \beta_p$ are the values $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ that minimize $R(\beta_1, \ldots, \beta_p)$.
The equations for the least squares estimates:
$$\frac{\partial R(\beta_1, \ldots, \beta_p)}{\partial \beta_k} = 0, \qquad k = 1, 2, \ldots, p$$
or
$$\sum_{i=1}^{n} 2\left(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\right)(-x_{ik}) = 0, \qquad\text{i.e.}\qquad \sum_{j=1}^{p}\left(\sum_{i=1}^{n} x_{ij}x_{ik}\right)\beta_j = \sum_{i=1}^{n} x_{ik}\,y_i, \qquad k = 1, 2, \ldots, p$$

Written out in full:
$$\beta_1\sum_{i=1}^{n} x_{i1}^2 + \beta_2\sum_{i=1}^{n} x_{i1}x_{i2} + \cdots + \beta_p\sum_{i=1}^{n} x_{i1}x_{ip} = \sum_{i=1}^{n} x_{i1}y_i$$
$$\beta_1\sum_{i=1}^{n} x_{i2}x_{i1} + \beta_2\sum_{i=1}^{n} x_{i2}^2 + \cdots + \beta_p\sum_{i=1}^{n} x_{i2}x_{ip} = \sum_{i=1}^{n} x_{i2}y_i$$
$$\vdots$$
$$\beta_1\sum_{i=1}^{n} x_{ip}x_{i1} + \beta_2\sum_{i=1}^{n} x_{ip}x_{i2} + \cdots + \beta_p\sum_{i=1}^{n} x_{ip}^2 = \sum_{i=1}^{n} x_{ip}y_i$$

These equations are called the Normal Equations.
Matrix development of the Normal Equations

Now
$$R(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$
so
$$\frac{\partial R(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}} = \frac{\partial}{\partial\boldsymbol{\beta}}\left(\mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right) = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{0}$$
or
$$\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{y} \qquad\text{(the Normal Equations)}$$

Summary (the Least Squares Estimates): the least squares estimates satisfy the Normal Equations
$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$
where
$$\mathbf{y} = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & & & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{np}\end{bmatrix}$$
Note: some matrix properties
• rank(AB) ≤ min(rank(A), rank(B))
• rank(A) ≤ min(# rows of A, # cols of A)
• rank(A') = rank(A)

Consider the normal equations
$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$
$\mathbf{X}$ is an $n\times p$ matrix and $\mathbf{X}'\mathbf{X}$ is a $p\times p$ matrix, so
$$\mathrm{rank}(\mathbf{X}'\mathbf{X}) \le \mathrm{rank}(\mathbf{X}) \le \min(n, p) = p \quad\text{if } p \le n.$$
If $\mathrm{rank}(\mathbf{X}'\mathbf{X}) = \mathrm{rank}(\mathbf{X}) = p$ then $\mathbf{X}'\mathbf{X}$ is invertible, and the matrix $\mathbf{X}$ is said to be of full rank.
If the matrix $\mathbf{X}$ is of full rank, then the solution to the normal equations is
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
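A minimal sketch of solving the normal equations with NumPy, assuming X has full column rank (in practice np.linalg.lstsq is the numerically safer route, since it avoids forming X'X):

```python
import numpy as np

def ols(X, y):
    """Least squares estimate beta_hat solving the normal equations X'X beta = X'y."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Solve X'X beta = X'y (assumes X has full column rank, so X'X is invertible)
    beta_hat = np.linalg.solve(XtX, Xty)
    return beta_hat

# Numerically safer alternative that avoids forming X'X explicitly:
# beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```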
Maximum Likelihood Estimation
The General Linear Model

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$
$$f(\mathbf{y}\mid\boldsymbol{\beta},\sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\right]$$
The likelihood function is
$$L(\boldsymbol{\beta},\sigma^2\mid\mathbf{y}) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\right]$$
The maximum likelihood estimates of $\boldsymbol{\beta}$ and $\sigma^2$ are the values $\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^2$ that maximize $L(\boldsymbol{\beta},\sigma^2\mid\mathbf{y})$, or equivalently the log-likelihood
$$l(\boldsymbol{\beta},\sigma^2\mid\mathbf{y}) = \ln L(\boldsymbol{\beta},\sigma^2\mid\mathbf{y}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$$
$$= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\left(\mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right)$$
Setting
$$\frac{\partial\, l(\boldsymbol{\beta},\sigma^2\mid\mathbf{y})}{\partial\boldsymbol{\beta}} = \frac{1}{\sigma^2}\mathbf{X}'\mathbf{y} - \frac{1}{\sigma^2}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{0}$$
yields the system of linear equations (the Normal Equations):
$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$

Setting
$$\frac{\partial\, l(\boldsymbol{\beta},\sigma^2\mid\mathbf{y})}{\partial\sigma^2} = 0$$
yields the equation
$$-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) = 0$$
or
$$\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})$$
If $(\mathbf{X}'\mathbf{X})^{-1}$ exists, then the normal equations have solution
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
and
$$\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) = \frac{1}{n}\left[\mathbf{y} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right]'\left[\mathbf{y} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right]$$
$$= \frac{1}{n}\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y} = \frac{1}{n}\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y}$$
$$= \frac{1}{n}\left(\mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right) = \frac{1}{n}\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)$$

Summary:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
and
$$\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) = \frac{1}{n}\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y} = \frac{1}{n}\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)$$
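A short sketch collecting these estimates for a given design matrix X and response y (function and variable names are my own):

```python
import numpy as np

def glm_fit(X, y):
    """Return (beta_hat, sigma2_mle, s2) for the general linear model y = X beta + eps."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    rss = resid @ resid                 # (y - X beta_hat)'(y - X beta_hat)
    sigma2_mle = rss / n                # maximum likelihood estimate (biased)
    s2 = rss / (n - p)                  # unbiased estimate (see the next sections)
    return beta_hat, sigma2_mle, s2
```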
Comments

The matrices
$$\mathbf{E}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \qquad\text{and}\qquad \mathbf{E}_2 = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$
are symmetric idempotent matrices:
$$\mathbf{E}_1\mathbf{E}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{E}_1$$
and similarly $\mathbf{E}_2\mathbf{E}_2 = \mathbf{E}_2$.

Comments (continued)
If $\mathbf{X}$ is of full rank (i.e. $\mathrm{rank}(\mathbf{X}) = p$), then
$$\mathrm{rank}(\mathbf{E}_1) = \mathrm{rank}\!\left(\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right) = p \qquad\text{and}\qquad \mathrm{rank}(\mathbf{E}_2) = \mathrm{rank}\!\left(\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right) = n - p$$
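A quick numerical check of these two facts, using a randomly generated full-rank design matrix (an illustrative choice); for a symmetric idempotent matrix the trace equals the rank:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 3
X = rng.normal(size=(n, p))            # a full-rank design matrix (with probability 1)

E1 = X @ np.linalg.inv(X.T @ X) @ X.T  # E1 = X (X'X)^{-1} X'
E2 = np.eye(n) - E1                    # E2 = I - X (X'X)^{-1} X'

print(np.allclose(E1 @ E1, E1), np.allclose(E2 @ E2, E2))  # idempotent: True True
print(np.trace(E1), np.trace(E2))                          # approx p and n - p
```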
Geometry of Least Squares

[Figure: the vector $\mathbf{y}$, and the fitted vector $\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ lying in the linear space spanned by the columns $\mathbf{x}_1, \ldots, \mathbf{x}_p$ of $\mathbf{X}$.]
Example
• Data is collected for n = 15 cases on the variables Y (the dependent variable) and X1, X2, X3 and X4.
• The data and calculations are displayed on the next page:
The design matrix (columns x1, x2, x3, x4) for the 15 cases:

X =
  52  59  34  74
  49  67  37  89
  53  51  31  87
  38  76  21  74
  56  69  27  86
  48  74  30  83
  51  69  28  76
  52  70  37  77
  57  76  29  85
  49  63  29  76
  48  72  26  75
  48  66  31  87
  46  67  25  75
  48  61  21  80
  45  66  32  78

and the response vector:

y = (88.6, 86.6, 110.2, 59.2, 91.7, 85.7, 76.8, 86.1, 97.1, 79.7, 82.5, 92.3, 74.2, 87.9, 79.8)'

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 36806 & 49540 & 21734 & 59457\\ 49540 & 68096 & 29284 & 80543\\ 21734 & 29284 & 13118 & 35216\\ 59457 & 80543 & 35216 & 96736 \end{bmatrix}, \qquad \mathbf{X}'\mathbf{y} = \begin{bmatrix} 63637.4\\ 85176.9\\ 37647\\ 103047.5 \end{bmatrix}$$

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 0.004291 & -0.00019 & -0.00131 & -0.002\\ -0.00019 & 0.000979 & 0.00018 & -0.00076\\ -0.00131 & 0.00018 & 0.003771 & -0.00072\\ -0.002 & -0.00076 & -0.00072 & 0.002134 \end{bmatrix}$$

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \begin{bmatrix} 1.238816\\ -0.64272\\ -0.003\\ 0.840056 \end{bmatrix}$$

$$s^2 = \frac{1}{n-4}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{4}\hat{\beta}_j x_{ij}\right)^2 = \frac{1}{11}\left[\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right] = 20.59156, \qquad s = \sqrt{20.59156} = 4.537792$$
Properties of the Maximum Likelihood Estimates
Unbiasedness, Minimum Variance

Note:
$$E[\hat{\boldsymbol{\beta}}] = E\!\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E[\mathbf{y}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$
and
$$E[\mathbf{c}'\hat{\boldsymbol{\beta}}] = \mathbf{c}'E[\hat{\boldsymbol{\beta}}] = \mathbf{c}'\boldsymbol{\beta}$$
Thus $\mathbf{c}'\hat{\boldsymbol{\beta}}$ is an unbiased estimator of $\mathbf{c}'\boldsymbol{\beta}$. Since $\mathbf{c}'\hat{\boldsymbol{\beta}}$ is also a function of the set of complete minimal sufficient statistics, it is the UMVU estimator of $\mathbf{c}'\boldsymbol{\beta}$ (Lehmann-Scheffé).
Note:
$$\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) = \frac{1}{n}\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y} = \mathbf{y}'\mathbf{A}\mathbf{y}$$
where
$$\mathbf{A} = \frac{1}{n}\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]$$
In general
$$E[\mathbf{y}'\mathbf{A}\mathbf{y}] = \mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma}) + \boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$$
where $\boldsymbol{\mu} = E[\mathbf{y}]$ and $\boldsymbol{\Sigma}$ is the variance-covariance matrix of $\mathbf{y}$.
Thus, with $\boldsymbol{\mu} = E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta}$ and $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}_n$:
$$E[\hat{\sigma}^2] = E\!\left[\frac{1}{n}\mathbf{y}'\left(\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right)\mathbf{y}\right] = \frac{\sigma^2}{n}\,\mathrm{tr}\!\left[\mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right] + \frac{1}{n}\boldsymbol{\beta}'\mathbf{X}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{X}\boldsymbol{\beta}$$
Now
$$\boldsymbol{\beta}'\mathbf{X}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = 0$$
and
$$\mathrm{tr}\!\left[\mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right] = \mathrm{tr}(\mathbf{I}_n) - \mathrm{tr}\!\left[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right] = n - \mathrm{tr}\!\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\right] = n - \mathrm{tr}(\mathbf{I}_p) = n - p$$
Hence
$$E[\hat{\sigma}^2] = \frac{n-p}{n}\,\sigma^2$$
Let
$$s^2 = \frac{n}{n-p}\,\hat{\sigma}^2 = \frac{1}{n-p}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})$$
Then
$$E[s^2] = \frac{n}{n-p}\,E[\hat{\sigma}^2] = \frac{n}{n-p}\cdot\frac{n-p}{n}\,\sigma^2 = \sigma^2$$
Thus $s^2$ is an unbiased estimator of $\sigma^2$. Since $s^2$ is also a function of the set of complete minimal sufficient statistics, it is the UMVU estimator of $\sigma^2$.
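A Monte Carlo sketch of this unbiasedness property, under arbitrary illustrative choices of X, β and σ² (not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2 = 30, 4, 2.0
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.3, 2.0])   # illustrative true coefficients

s2_draws = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    s2_draws.append(resid @ resid / (n - p))

print(np.mean(s2_draws))   # should be close to sigma2 = 2.0
```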
Distributional Properties
Least squares estimates (maximum likelihood estimates)

Recall:
1. If $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ then $\mathbf{w} = \mathbf{A}\mathbf{y} \sim N_q(\mathbf{A}\boldsymbol{\mu},\, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')$, where $\mathbf{A}$ is a $q\times p$ matrix.
2. Suppose $\mathbf{y} \sim N_n(\boldsymbol{\mu}, \sigma^2\mathbf{I})$ and $U = \mathbf{y}'\mathbf{A}\mathbf{y}$, where $\sigma^2\mathbf{A}$ is idempotent of rank $r$. Then $U$ has a $\chi^2$ distribution with $r$ degrees of freedom and non-centrality parameter $\tfrac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$.
3. Suppose $\mathbf{y} \sim N_n(\boldsymbol{\mu}, \sigma^2\mathbf{I})$, $U = \mathbf{y}'\mathbf{A}\mathbf{y}$ and $\mathbf{w} = \mathbf{C}\mathbf{y}$, and $\mathbf{C}\mathbf{A} = \mathbf{0}$. Then $U$ and $\mathbf{w}$ are independent.
4. Suppose $\mathbf{y} \sim N_n(\boldsymbol{\mu}, \sigma^2\mathbf{I})$, $U_1 = \mathbf{y}'\mathbf{A}\mathbf{y}$ and $U_2 = \mathbf{y}'\mathbf{B}\mathbf{y}$, and $\mathbf{A}\mathbf{B} = \mathbf{0}$. Then $U_1$ and $U_2$ are independent.
The General Linear Model
$$\mathbf{y} \sim N_n(\mathbf{X}\boldsymbol{\beta},\, \sigma^2\mathbf{I})$$
1. $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{A}\mathbf{y}$, where $\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$.
2. $s^2 = \dfrac{1}{n-p}\,\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y}$, and
$$U = \frac{(n-p)s^2}{\sigma^2} = \mathbf{y}'\mathbf{B}\mathbf{y}, \qquad\text{where } \mathbf{B} = \frac{1}{\sigma^2}\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]$$
The Estimates
Theorem
1. $\hat{\boldsymbol{\beta}} \sim N_p\!\left(\boldsymbol{\beta},\, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right)$
2. $U = \dfrac{(n-p)s^2}{\sigma^2} \sim \chi^2_{n-p}$, with non-centrality parameter $0$.
3. $\hat{\boldsymbol{\beta}}$ and $s^2$ are independent.

Proof:
Since $\mathbf{y} \sim N_n(\boldsymbol{\mu}, \sigma^2\mathbf{I})$ with $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$, and $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{A}\mathbf{y}$, result 1 gives $\hat{\boldsymbol{\beta}} \sim N_p(\mathbf{A}\boldsymbol{\mu},\, \sigma^2\mathbf{A}\mathbf{A}')$.
Now
$$\mathbf{A}\boldsymbol{\mu} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$
and
$$\sigma^2\mathbf{A}\mathbf{A}' = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$
Thus $\hat{\boldsymbol{\beta}} \sim N_p\!\left(\boldsymbol{\beta},\, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right)$.
Now $U = \mathbf{y}'\mathbf{B}\mathbf{y}$ with
$$\mathbf{B} = \frac{1}{\sigma^2}\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right], \qquad \sigma^2\mathbf{B} = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$
$U$ has a $\chi^2$ distribution with $n-p$ degrees of freedom and non-centrality parameter $\tfrac{1}{2}\boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu} = 0$ provided that:
1. $\sigma^2\mathbf{B}$ is idempotent of rank $n-p$, and
2. $\tfrac{1}{2}\boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu} = 0$.

Now
$$\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right] = \mathbf{I} - 2\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' + \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$
Thus $\sigma^2\mathbf{B} = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is idempotent.

Also, with $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$,
$$\frac{1}{2}\boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu} = \frac{1}{2\sigma^2}\boldsymbol{\beta}'\mathbf{X}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{X}\boldsymbol{\beta} = \frac{1}{2\sigma^2}\left[\boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right] = 0$$

For the rank: if $\mathbf{X}$ is of full rank then $\mathrm{rank}(\mathbf{X}'\mathbf{X}) = \mathrm{rank}(\mathbf{X}) = p$. Let $\mathbf{E}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ and $\mathbf{E}_2 = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$. Then $\mathbf{E}_1 + \mathbf{E}_2 = \mathbf{I}$ and $\mathbf{E}_1\mathbf{E}_2 = \mathbf{0}$, and $\mathbf{E}_1$, $\mathbf{E}_2$ are symmetric idempotent with $\mathrm{rank}(\mathbf{E}_1) = p$ and $\mathrm{rank}(\mathbf{E}_2) = n - p$.

Hence
$$U = \frac{(n-p)s^2}{\sigma^2} = \mathbf{y}'\mathbf{B}\mathbf{y} \sim \chi^2_{n-p} \qquad\text{with non-centrality parameter } 0.$$
Finally, since $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{A}\mathbf{y}$ with $\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, and
$$s^2 = \frac{1}{n-p}\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y} = \mathbf{y}'\mathbf{C}\mathbf{y} \quad\text{with}\quad \mathbf{C} = \frac{1}{n-p}\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right],$$
we have
$$\mathbf{A}\mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\cdot\frac{1}{n-p}\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right] = \frac{1}{n-p}\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right] = \mathbf{0}$$
Thus
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \qquad\text{and}\qquad s^2 = \frac{1}{n-p}\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y}$$
are independent.
Summary
1. $\hat{\boldsymbol{\beta}} \sim N_p\!\left(\boldsymbol{\beta},\, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right)$
2. $U = \dfrac{(n-p)s^2}{\sigma^2} \sim \chi^2_{n-p}$, with non-centrality parameter $0$.
3. $\hat{\boldsymbol{\beta}}$ and $s^2$ are independent.
Example (continued)
• Using the data for the n = 15 cases and the estimates $\hat{\boldsymbol{\beta}}$ and $s^2 = 20.59156$ computed above:
The estimated variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ is
$$\widehat{\mathrm{var}}(\hat{\boldsymbol{\beta}}) = s^2(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 0.088356 & -0.00401 & -0.02694 & -0.04116\\ -0.00401 & 0.020159 & 0.003709 & -0.01567\\ -0.02694 & 0.003709 & 0.077648 & -0.0148\\ -0.04116 & -0.01567 & -0.0148 & 0.043943 \end{bmatrix}$$
The standard errors of the coefficients are the square roots of the diagonal elements:
$$s_{\hat{\beta}_1} = \sqrt{0.088356} = 0.297247, \quad s_{\hat{\beta}_2} = \sqrt{0.020159} = 0.141982, \quad s_{\hat{\beta}_3} = \sqrt{0.077648} = 0.278653, \quad s_{\hat{\beta}_4} = \sqrt{0.043943} = 0.209625$$

Compare with the SPSS output (estimates of the coefficients).
The General Linear Model
with an intercept

Consider the random variable Y with
1. $E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$ (intercept included), and
2. $\mathrm{var}(Y) = \sigma^2$
• where $\beta_0, \beta_1, \ldots, \beta_p$ are unknown parameters
• and $X_1, X_2, \ldots, X_p$ are nonrandom variables.
• Assume further that Y is normally distributed.

The matrix formulation (intercept included): let
$$\mathbf{y} = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_0\\ \beta_1\\ \vdots\\ \beta_p \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p}\\ 1 & x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & & & & \vdots\\ 1 & x_{n1} & x_{n2} & \cdots & x_{np}\end{bmatrix}$$
Then the model becomes
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}), \qquad\text{i.e.}\quad \mathbf{y} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$$
Thus to include an intercept, add an extra column of 1's to the design matrix $\mathbf{X}$ and include the intercept $\beta_0$ in the parameter vector $\boldsymbol{\beta}$.
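A one-line sketch of this construction in NumPy (the numeric values are placeholders):

```python
import numpy as np

x = np.array([[52., 59.], [49., 67.], [53., 51.]])   # predictors without an intercept
X = np.column_stack([np.ones(len(x)), x])            # prepend a column of 1's
print(X)
```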
The matrix formulation of the Simple Linear Regression model: let
$$\mathbf{y} = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \alpha\\ \beta \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n \end{bmatrix}$$
Then
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_n \end{bmatrix}\begin{bmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n \end{bmatrix} = \begin{bmatrix} n & \sum_{i=1}^{n} x_i\\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix} = \begin{bmatrix} n & n\bar{x}\\ n\bar{x} & S_{xx} + n\bar{x}^2 \end{bmatrix}$$
and
$$\mathbf{X}'\mathbf{y} = \begin{bmatrix} 1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_n \end{bmatrix}\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i\\ \sum_{i=1}^{n} x_i y_i \end{bmatrix} = \begin{bmatrix} n\bar{y}\\ S_{xy} + n\bar{x}\bar{y} \end{bmatrix}$$
Now
$$(\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{n\left(S_{xx}+n\bar{x}^2\right) - n^2\bar{x}^2}\begin{bmatrix} S_{xx} + n\bar{x}^2 & -n\bar{x}\\ -n\bar{x} & n \end{bmatrix} = \frac{1}{nS_{xx}}\begin{bmatrix} S_{xx} + n\bar{x}^2 & -n\bar{x}\\ -n\bar{x} & n \end{bmatrix} = \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} & -\dfrac{\bar{x}}{S_{xx}}\\[2mm] -\dfrac{\bar{x}}{S_{xx}} & \dfrac{1}{S_{xx}} \end{bmatrix}$$
thus
$$\begin{bmatrix}\hat{\alpha}\\ \hat{\beta}\end{bmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} & -\dfrac{\bar{x}}{S_{xx}}\\[2mm] -\dfrac{\bar{x}}{S_{xx}} & \dfrac{1}{S_{xx}} \end{bmatrix}\begin{bmatrix} n\bar{y}\\ S_{xy} + n\bar{x}\bar{y} \end{bmatrix} = \begin{bmatrix} \bar{y} - \dfrac{S_{xy}}{S_{xx}}\bar{x}\\[2mm] \dfrac{S_{xy}}{S_{xx}} \end{bmatrix}$$
in agreement with the estimates derived earlier.
Finally
$$\widehat{\mathrm{var}}\begin{bmatrix}\hat{\alpha}\\ \hat{\beta}\end{bmatrix} = s^2(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} s^2\!\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right) & -\dfrac{s^2\bar{x}}{S_{xx}}\\[2mm] -\dfrac{s^2\bar{x}}{S_{xx}} & \dfrac{s^2}{S_{xx}} \end{bmatrix}$$
Thus
$$\widehat{\mathrm{var}}(\hat{\alpha}) = s^2\!\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right), \qquad \widehat{\mathrm{var}}(\hat{\beta}) = \frac{s^2}{S_{xx}}, \qquad \widehat{\mathrm{cov}}(\hat{\alpha},\hat{\beta}) = -\frac{s^2\bar{x}}{S_{xx}}$$
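A small check, with made-up data, that the matrix formulation reproduces the earlier $S_{xy}/S_{xx}$ and $\bar{y} - \hat{\beta}\bar{x}$ formulas:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

# Matrix formulation
X = np.column_stack([np.ones(n), x])
beta_vec = np.linalg.solve(X.T @ X, X.T @ y)          # (alpha_hat, beta_hat)

# Summation formulation
S_xx = np.sum((x - x.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))
beta_hat = S_xy / S_xx
alpha_hat = y.mean() - beta_hat * x.mean()

print(np.allclose(beta_vec, [alpha_hat, beta_hat]))   # True
```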
The Gauss-Markov Theorem
An important result in the theory of Linear models
Proves optimality of Least squares estimates in a more general setting
Assume the following Linear Model:
$$E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta} \qquad\text{and}\qquad \mathrm{var}(\mathbf{y}) = \sigma^2\mathbf{I}$$
We will not necessarily assume Normality.

Consider the least squares estimate of $\boldsymbol{\beta}$,
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
Then
$$\mathbf{c}'\hat{\boldsymbol{\beta}} = \mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{a}'\mathbf{y} = a_1 y_1 + a_2 y_2 + \cdots + a_n y_n$$
is an unbiased linear estimator of $\mathbf{c}'\boldsymbol{\beta}$.
The Gauss-Markov Theorem

Assume
$$E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta} \qquad\text{and}\qquad \mathrm{var}(\mathbf{y}) = \sigma^2\mathbf{I}$$
Consider $\mathbf{c}'\hat{\boldsymbol{\beta}} = \mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{a}'\mathbf{y} = a_1 y_1 + \cdots + a_n y_n$, an unbiased linear estimator of $\mathbf{c}'\boldsymbol{\beta}$, where $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is the least squares estimate of $\boldsymbol{\beta}$,
and let $\mathbf{b}'\mathbf{y} = b_1 y_1 + b_2 y_2 + \cdots + b_n y_n$ denote any other unbiased linear estimator of $\mathbf{c}'\boldsymbol{\beta}$.
Then
$$\mathrm{var}(\mathbf{b}'\mathbf{y}) \ge \mathrm{var}(\mathbf{c}'\hat{\boldsymbol{\beta}})$$
Proof: Now $E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta}$ and $\mathrm{var}(\mathbf{y}) = \sigma^2\mathbf{I}$, so
$$E[\mathbf{c}'\hat{\boldsymbol{\beta}}] = E\!\left[\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right] = \mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{c}'\boldsymbol{\beta}$$
and
$$\mathrm{var}(\mathbf{c}'\hat{\boldsymbol{\beta}}) = \mathrm{var}\!\left(\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right) = \mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\left(\sigma^2\mathbf{I}\right)\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{c} = \sigma^2\,\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{c}$$
Now $\mathbf{b}'\mathbf{y}$ is an unbiased estimator of $\mathbf{c}'\boldsymbol{\beta}$ if
$$E[\mathbf{b}'\mathbf{y}] = \mathbf{b}'\mathbf{X}\boldsymbol{\beta} = \mathbf{c}'\boldsymbol{\beta} \quad\text{for all }\boldsymbol{\beta}, \qquad\text{i.e.}\quad \mathbf{b}'\mathbf{X} = \mathbf{c}' \;\text{ or }\; \mathbf{X}'\mathbf{b} = \mathbf{c}$$
Also
$$\mathrm{var}(\mathbf{b}'\mathbf{y}) = \mathbf{b}'\left(\sigma^2\mathbf{I}\right)\mathbf{b} = \sigma^2\,\mathbf{b}'\mathbf{b}$$
Thus
$$\mathrm{var}(\mathbf{b}'\mathbf{y}) - \mathrm{var}(\mathbf{c}'\hat{\boldsymbol{\beta}}) = \sigma^2\mathbf{b}'\mathbf{b} - \sigma^2\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{c} = \sigma^2\mathbf{b}'\mathbf{b} - \sigma^2\mathbf{b}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{b}$$
$$= \sigma^2\,\mathbf{b}'\!\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{b} = \sigma^2\,\mathbf{b}'\!\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]'\!\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{b} = \sigma^2\,\mathbf{u}'\mathbf{u} \ge 0$$
where $\mathbf{u} = \left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{b}$.
Thus
$$\mathrm{var}(\mathbf{b}'\mathbf{y}) \ge \mathrm{var}(\mathbf{c}'\hat{\boldsymbol{\beta}})$$
The Gauss-Markov theorem states that $\mathbf{c}'\hat{\boldsymbol{\beta}}$ is the Best Linear Unbiased Estimator (B.L.U.E.) of $\mathbf{c}'\boldsymbol{\beta}$.
Hypothesis testing for the GLM
The General Linear Hypothesis

Testing the General Linear Hypothesis
$$H_0:\quad h_{11}\beta_1 + h_{12}\beta_2 + h_{13}\beta_3 + \cdots + h_{1p}\beta_p = h_1$$
$$\phantom{H_0:\quad} h_{21}\beta_1 + h_{22}\beta_2 + h_{23}\beta_3 + \cdots + h_{2p}\beta_p = h_2$$
$$\vdots$$
$$\phantom{H_0:\quad} h_{q1}\beta_1 + h_{q2}\beta_2 + h_{q3}\beta_3 + \cdots + h_{qp}\beta_p = h_q$$
where $h_{11}, h_{12}, \ldots, h_{qp}$ and $h_1, h_2, \ldots, h_q$ are known coefficients.
In matrix notation:
$$H_0:\quad \underset{q\times p}{\mathbf{H}}\;\underset{p\times 1}{\boldsymbol{\beta}} = \underset{q\times 1}{\mathbf{h}}$$
Examples (with $\boldsymbol{\beta}' = (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6)$, i.e. $p = 6$):

1. $H_0:\ \beta_1 = 0$:
$$\mathbf{H} = \underset{1\times 6}{\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}}, \qquad \mathbf{h} = 0$$
2. $H_0:\ \beta_1 = 0,\ \beta_2 = 0,\ \beta_3 = 0$:
$$\mathbf{H} = \underset{3\times 6}{\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}}, \qquad \mathbf{h} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}$$
3. $H_0:\ \beta_1 = \beta_2$:
$$\mathbf{H} = \underset{1\times 6}{\begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \end{bmatrix}}, \qquad \mathbf{h} = 0$$
4. $H_0:\ \beta_1 = \beta_2,\ \beta_3 = \beta_4$:
$$\mathbf{H} = \underset{2\times 6}{\begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & -1 & 0 & 0 \end{bmatrix}}, \qquad \mathbf{h} = \begin{bmatrix} 0\\ 0 \end{bmatrix}$$
5. $H_0:\ \beta_1 = \tfrac{1}{2}(\beta_2 + \beta_3)$:
$$\mathbf{H} = \underset{1\times 6}{\begin{bmatrix} 1 & -\tfrac{1}{2} & -\tfrac{1}{2} & 0 & 0 & 0 \end{bmatrix}}, \qquad \mathbf{h} = 0$$
6. $H_0:\ \beta_1 = \tfrac{1}{2}(\beta_2 + \beta_3),\ \beta_3 = \tfrac{1}{3}(\beta_4 + \beta_5 + \beta_6)$:
$$\mathbf{H} = \underset{2\times 6}{\begin{bmatrix} 1 & -\tfrac{1}{2} & -\tfrac{1}{2} & 0 & 0 & 0\\ 0 & 0 & 1 & -\tfrac{1}{3} & -\tfrac{1}{3} & -\tfrac{1}{3} \end{bmatrix}}, \qquad \mathbf{h} = \begin{bmatrix} 0\\ 0 \end{bmatrix}$$
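A sketch of how one of these hypothesis matrices might be written down in NumPy (Example 4; the candidate β used to illustrate the check is made up):

```python
import numpy as np

# Example 4: H0: beta1 = beta2 and beta3 = beta4, with p = 6 coefficients
H = np.array([[1., -1., 0., 0., 0., 0.],
              [0., 0., 1., -1., 0., 0.]])
h = np.zeros(2)

# The constraint can be checked for any candidate beta by testing H @ beta == h
beta = np.array([3., 3., 1., 1., 5., 2.])
print(np.allclose(H @ beta, h))   # True: this beta satisfies H0
```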
The Likelihood Ratio Test

The joint density of $\mathbf{y}$ is:
$$f(\mathbf{y}\mid\boldsymbol{\beta},\sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\right]$$
The likelihood function:
$$L(\boldsymbol{\beta},\sigma^2\mid\mathbf{y}) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\right]$$
The log-likelihood function:
$$l(\boldsymbol{\beta},\sigma^2\mid\mathbf{y}) = \ln L(\boldsymbol{\beta},\sigma^2\mid\mathbf{y}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$$
Defn (Likelihood Ratio Test of size $\alpha$): Rejects
$$H_0:\ \mathbf{H}\boldsymbol{\beta} = \mathbf{h}$$
against the alternative hypothesis
$$H_1:\ \mathbf{H}\boldsymbol{\beta} \ne \mathbf{h}$$
when
$$\lambda = \frac{\max_{H_0} L(\boldsymbol{\beta},\sigma^2\mid\mathbf{y})}{\max L(\boldsymbol{\beta},\sigma^2\mid\mathbf{y})} = \frac{L(\hat{\boldsymbol{\beta}}_H,\hat{\sigma}^2_H\mid\mathbf{y})}{L(\hat{\boldsymbol{\beta}},\hat{\sigma}^2\mid\mathbf{y})} \le K$$
where $K$ is chosen so that the probability of rejecting is at most $\alpha$ for all parameter values satisfying $H_0$, and equal to $\alpha$ for at least one such value.

Here $\hat{\boldsymbol{\beta}}$, $\hat{\sigma}^2$ are the M.L.E.'s of $\boldsymbol{\beta}$ and $\sigma^2$, and $\hat{\boldsymbol{\beta}}_H$, $\hat{\sigma}^2_H$ are the M.L.E.'s of $\boldsymbol{\beta}$ and $\sigma^2$ assuming $H_0:\ \mathbf{H}\boldsymbol{\beta} = \mathbf{h}$.
Note: to find $\hat{\boldsymbol{\beta}}_H$ and $\hat{\sigma}^2_H$ we will maximize
$$l(\boldsymbol{\beta},\sigma^2\mid\mathbf{y}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$$
(or, equivalently, the likelihood $L(\boldsymbol{\beta},\sigma^2\mid\mathbf{y})$) subject to the side condition $H_0:\ \mathbf{H}\boldsymbol{\beta} = \mathbf{h}$.
(Without the constraint, the maximizing values are $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ and $\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})$.)
The Lagrange multiplier technique will be used for this purpose.
We will maximize
$$g(\boldsymbol{\beta},\sigma^2,\boldsymbol{\lambda}) = l(\boldsymbol{\beta},\sigma^2\mid\mathbf{y}) + 2\boldsymbol{\lambda}'(\mathbf{H}\boldsymbol{\beta}-\mathbf{h}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) + 2\boldsymbol{\lambda}'(\mathbf{H}\boldsymbol{\beta}-\mathbf{h})$$
Setting
$$\frac{\partial g}{\partial\boldsymbol{\lambda}} = 2(\mathbf{H}\boldsymbol{\beta}-\mathbf{h}) = \mathbf{0} \qquad\text{gives}\qquad \mathbf{H}\boldsymbol{\beta} = \mathbf{h}$$
Setting
$$\frac{\partial g}{\partial\boldsymbol{\beta}} = \frac{1}{\sigma^2}\mathbf{X}'\mathbf{y} - \frac{1}{\sigma^2}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + 2\mathbf{H}'\boldsymbol{\lambda} = \mathbf{0}$$
gives
$$\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{y} + 2\sigma^2\mathbf{H}'\boldsymbol{\lambda} \qquad\text{or}\qquad \boldsymbol{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} + 2\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\boldsymbol{\lambda}$$
Finally, setting
$$\frac{\partial g}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) = 0$$
gives
$$\sigma^2 = \frac{1}{n}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$$
Thus the equations for $\hat{\boldsymbol{\beta}}_H$, $\hat{\sigma}^2_H$ and $\boldsymbol{\lambda}$ are
$$\hat{\boldsymbol{\beta}}_H = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} + 2\hat{\sigma}^2_H(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\boldsymbol{\lambda}, \qquad \hat{\sigma}^2_H = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_H)'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_H), \qquad \mathbf{H}\hat{\boldsymbol{\beta}}_H = \mathbf{h}$$
Now
$$\mathbf{h} = \mathbf{H}\hat{\boldsymbol{\beta}}_H = \mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} + 2\hat{\sigma}^2_H\,\mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\boldsymbol{\lambda}$$
so
$$\mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\boldsymbol{\lambda} = \frac{1}{2\hat{\sigma}^2_H}\left[\mathbf{h} - \mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right]$$
and
$$\boldsymbol{\lambda} = \frac{1}{2\hat{\sigma}^2_H}\left[\mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\right]^{-1}\left[\mathbf{h} - \mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right]$$
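Carrying the derivation one step further (substituting this $\boldsymbol{\lambda}$ back into the equation for $\hat{\boldsymbol{\beta}}_H$) gives the usual restricted least squares estimator; a sketch under that assumption:

```python
import numpy as np

def restricted_ls(X, y, H, h):
    """Restricted least squares / restricted MLE of beta subject to H beta = h.

    Obtained by substituting the Lagrange multiplier back:
    beta_H = beta_hat + (X'X)^{-1} H' [H (X'X)^{-1} H']^{-1} (h - H beta_hat).
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)                      # unrestricted estimate
    A = XtX_inv @ H.T
    correction = A @ np.linalg.solve(H @ A, h - H @ beta_hat)
    beta_H = beta_hat + correction                      # satisfies H beta_H = h
    resid = y - X @ beta_H
    sigma2_H = resid @ resid / len(y)                   # restricted MLE of sigma^2
    return beta_H, sigma2_H
```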