Linear regression models in matrix terms
The regression function in matrix terms
Simple linear regression function
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad \text{for } i = 1, \dots, n$$

That is, written out:

$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 x_1 + \epsilon_1 \\
Y_2 &= \beta_0 + \beta_1 x_2 + \epsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 x_n + \epsilon_n
\end{aligned}$$
Simple linear regression function in matrix notation
$$Y = X\beta + \epsilon$$

where

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
Definition of a matrix
An r×c matrix is a rectangular array of symbols or numbers arranged in r rows and c columns.
A matrix is almost always denoted by a single capital letter in boldface type.
For example, the following is a 5×3 matrix:

$$A = \begin{bmatrix} 1 & 40 & 1.9 \\ 1 & 71 & 2.8 \\ 1 & 65 & 2.5 \\ 1 & 92 & 3.1 \\ 1 & 80 & 3.4 \end{bmatrix}$$
And here are two examples containing symbols rather than numbers: a 6×2 matrix B, and the 6×3 X matrix for a model with two predictors and n = 6 observations:

$$B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \\ b_{41} & b_{42} \\ b_{51} & b_{52} \\ b_{61} & b_{62} \end{bmatrix} \qquad
X = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \\ 1 & x_{51} & x_{52} \\ 1 & x_{61} & x_{62} \end{bmatrix}$$
Definition of a vector and a scalar
A column vector is an r×1 matrix, that is, a matrix with only one column. For example:

$$q = \begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix}$$
A row vector is a 1×c matrix, that is, a matrix with only one row. For example:

$$h = \begin{bmatrix} 21 & 46 & 32 & 90 \end{bmatrix}$$
A 1×1 “matrix” is called a scalar, but it’s just an ordinary number, such as 29 or σ².
Matrix multiplication
• The Xβ in the regression function Y = Xβ + ε is an example of matrix multiplication.
• Two matrices can be multiplied together:
  – only if the number of columns of the first matrix equals the number of rows of the second matrix;
  – the number of rows of the resulting matrix equals the number of rows of the first matrix;
  – the number of columns of the resulting matrix equals the number of columns of the second matrix.
Matrix multiplication
• If A is a 2×3 matrix and B is a 3×5 matrix then matrix multiplication AB is possible. The resulting matrix C = AB has … rows and … columns.
• Is the matrix multiplication BA possible?
• If X is an n×p matrix and β is a p×1 column vector, then Xβ is …
Matrix multiplication
$$C = AB = \begin{bmatrix} 1 & 9 & 7 \\ 8 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 3 & 2 & 1 & 5 \\ 5 & 4 & 7 & 3 \\ 6 & 9 & 6 & 8 \end{bmatrix}
= \begin{bmatrix} 90 & 101 & 106 & 88 \\ 41 & 38 & 27 & 59 \end{bmatrix}$$

The entry in the ith row and jth column of C is the inner product (element-by-element products added together) of the ith row of A with the jth column of B. For example:

$$c_{11} = 1(3) + 9(5) + 7(6) = 90$$
$$c_{12} = 1(2) + 9(4) + 7(9) = 101$$
$$c_{23} = 8(1) + 1(7) + 2(6) = 27$$
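To check this arithmetic numerically, here is a minimal numpy sketch (added here; it is not part of the original slides):

```python
import numpy as np

# A is 2x3 and B is 3x4, so the product C = AB is 2x4.
A = np.array([[1, 9, 7],
              [8, 1, 2]])
B = np.array([[3, 2, 1, 5],
              [5, 4, 7, 3],
              [6, 9, 6, 8]])

C = A @ B  # matrix multiplication
print(C)
# [[ 90 101 106  88]
#  [ 41  38  27  59]]
```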
The Xβ multiplication in simple linear regression setting
$$X\beta = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{bmatrix}$$
Matrix addition
• The Xβ + ε in the regression function Y = Xβ + ε is an example of matrix addition.
• Simply add the corresponding elements of the two matrices.
  – For example, add the entry in the first row, first column of the first matrix to the entry in the first row, first column of the second matrix, and so on.
• Two matrices can be added together only if they have the same number of rows and columns.
Matrix addition
$$C = A + B = \begin{bmatrix} 2 & 4 & 1 \\ 1 & 8 & 7 \\ 3 & 5 & 6 \end{bmatrix}
+ \begin{bmatrix} 7 & 5 & 2 \\ 9 & 3 & 1 \\ 2 & 1 & 8 \end{bmatrix}
= \begin{bmatrix} 9 & 9 & 3 \\ 10 & 11 & 8 \\ 5 & 6 & 14 \end{bmatrix}$$

For example:

$$c_{11} = 2 + 7 = 9, \quad c_{12} = 4 + 5 = 9, \quad c_{23} = 7 + 1 = 8$$
The Xβ+ε addition in the simple linear regression setting
$$Y = X\beta + \epsilon$$

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{bmatrix}
+ \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 x_1 + \epsilon_1 \\ \beta_0 + \beta_1 x_2 + \epsilon_2 \\ \vdots \\ \beta_0 + \beta_1 x_n + \epsilon_n \end{bmatrix}$$
Multiple linear regression function in matrix notation
$$Y = X\beta + \epsilon$$

where, for a model with three predictors (the pattern extends to any number of predictors),

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} \\ 1 & x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
Least squares estimates of the parameters
Least squares estimates
The p×1 vector containing the estimates of the p parameters can be shown to equal

$$b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{bmatrix} = (X'X)^{-1}X'Y$$

where (X'X)⁻¹ is the inverse of the X'X matrix and X' is the transpose of the X matrix.
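As a minimal sketch (added; not from the original slides, and `least_squares` is just an illustrative name), the formula translates directly into numpy. Solving the normal equations X'Xb = X'Y with a linear solver is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

def least_squares(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return b = (X'X)^(-1) X'y by solving the normal equations X'X b = X'y."""
    # np.linalg.solve is numerically more stable than computing the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)
```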
Definition of the transpose of a matrix
The transpose of a matrix A is a matrix, denoted A' or AT, whose rows are the columns of A and whose columns are the rows of A … all in the same original order.
For example:

$$A = \begin{bmatrix} 1 & 5 \\ 4 & 8 \\ 7 & 9 \end{bmatrix} \qquad
A' = A^T = \begin{bmatrix} 1 & 4 & 7 \\ 5 & 8 & 9 \end{bmatrix}$$
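In numpy the transpose is the `.T` attribute; a quick added check (not in the original slides):

```python
import numpy as np

A = np.array([[1, 5],
              [4, 8],
              [7, 9]])
print(A.T)  # rows of A become the columns of A'
# [[1 4 7]
#  [5 8 9]]
```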
The X'X matrix in the simple linear regression setting
$$X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
= \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}$$
Definition of the identity matrix
The (square) n×n identity matrix, denoted I_n, is a matrix with 1’s on the diagonal and 0’s elsewhere. For example:

$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
The identity matrix plays the same role as the number 1 in ordinary arithmetic.
For example:

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 9 & 7 \\ 4 & 6 \end{bmatrix}
= \begin{bmatrix} 9 & 7 \\ 4 & 6 \end{bmatrix}$$
Definition of the inverse of a matrix
The inverse A⁻¹ of a square (!) matrix A is the unique matrix such that

$$A A^{-1} = A^{-1} A = I$$
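A short numpy check (an added sketch; the matrix values are assumed for illustration):

```python
import numpy as np

A = np.array([[9.0, 7.0],
              [4.0, 6.0]])
A_inv = np.linalg.inv(A)

# Both products recover the 2x2 identity (up to floating-point rounding).
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True
```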
Least squares estimates in simple linear regression setting
We want to compute

$$b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (X'X)^{-1}X'Y$$

where

$$X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
= \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$$

Using the soap (x_i) and suds (y_i) data:

soap (x_i)   suds (y_i)   x_i*y_i    x_i^2
   4.0          33         132.0     16.00
   4.5          42         189.0     20.25
   5.0          45         225.0     25.00
   5.5          51         280.5     30.25
   6.0          53         318.0     36.00
   6.5          61         396.5     42.25
   7.0          62         434.0     49.00
  ----         ---        ------    ------
  38.5         347        1975.0    218.75

Find X'X.
Least squares estimates in simple linear regression setting
From the column totals in the table, with n = 7:

$$X'X = \begin{bmatrix} 7 & 38.5 \\ 38.5 & 218.75 \end{bmatrix}$$

Find the inverse of X'X. It’s very messy to determine inverses by hand, so we let computers find inverses for us:

$$(X'X)^{-1} = \begin{bmatrix} 4.4643 & -0.78571 \\ -0.78571 & 0.14286 \end{bmatrix}$$

Therefore:
Least squares estimates in simple linear regression setting
$$b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (X'X)^{-1}X'Y$$

where

$$X'Y = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

Find X'Y, using the column totals from the soap and suds table above.
Least squares estimates in simple linear regression setting
$$b = (X'X)^{-1}X'Y = \begin{bmatrix} 4.4643 & -0.78571 \\ -0.78571 & 0.14286 \end{bmatrix}
\begin{bmatrix} 347 \\ 1975 \end{bmatrix}$$

$$b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}
= \begin{bmatrix} 4.4643(347) - 0.78571(1975) \\ -0.78571(347) + 0.14286(1975) \end{bmatrix}
= \begin{bmatrix} -2.67 \\ 9.51 \end{bmatrix}$$

The regression equation is: suds = -2.68 + 9.50 soap

(The slight differences between the hand calculation and the software output are due to rounding in (X'X)⁻¹.)
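The whole calculation can be reproduced with a few lines of numpy (an added sketch, not part of the original slides):

```python
import numpy as np

soap = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
suds = np.array([33, 42, 45, 51, 53, 61, 62])

X = np.column_stack([np.ones_like(soap), soap])  # n x 2 design matrix
XtX = X.T @ X           # [[7, 38.5], [38.5, 218.75]]
XtY = X.T @ suds        # [347, 1975]
b = np.linalg.inv(XtX) @ XtY
print(b)                # [-2.679  9.5]  ->  suds = -2.68 + 9.50 * soap
```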
Linear dependence
The columns of the matrix:
$$A = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 2 & 1 & 8 & 6 \\ 3 & 6 & 12 & 3 \end{bmatrix}$$

are linearly dependent, since (at least) one of the columns can be written as a linear combination of another: here, the third column is 4 times the first.
If none of the columns can be written as a linear combination of the remaining columns, then we say the columns are linearly independent.
Linear dependence is not always obvious
$$A = \begin{bmatrix} 1 & 4 & 1 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{bmatrix}$$

(Here, the sum of the first two columns equals 5 times the third column.)

Formally, the columns a_1, a_2, …, a_n of an n×n matrix are linearly dependent if there are constants c_1, c_2, …, c_n, not all 0, such that:

$$c_1 a_1 + c_2 a_2 + \cdots + c_n a_n = 0$$
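A quick numerical way to detect linear dependence (an added sketch, not in the original slides) is to compare the rank of the matrix with its number of columns:

```python
import numpy as np

A = np.array([[1, 4, 1],
              [2, 3, 1],
              [3, 2, 1]])

# Rank 2 < 3 columns, so the columns are linearly dependent
# (here column1 + column2 = 5 * column3).
print(np.linalg.matrix_rank(A))  # 2
```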
Implications of linear dependence on regression
• The inverse of a square matrix exists only if the columns are linearly independent.
• Since the regression estimate b depends on (X'X)⁻¹, the parameter estimates b0, b1, …, cannot be (uniquely) determined if some of the columns of X are linearly dependent.
The main point about linear dependence
• If the columns of the X matrix (that is, if two or more of your predictor variables) are linearly dependent (or nearly so), you will run into trouble when trying to estimate the regression function.
Implications of linear dependence on regression

soap1   soap2   suds
 4.0      8      33
 4.5      9      42
 5.0     10      45
 5.5     11      51
 6.0     12      53
 6.5     13      61
 7.0     14      62

Since soap2 = 2 × soap1, the columns of X are linearly dependent, and the software reports:

* soap2 is highly correlated with other X variables
* soap2 has been removed from the equation

The regression equation is: suds = -2.68 + 9.50 soap1
Fitted values and residuals
Fitted values
$$\hat{y} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}
= \begin{bmatrix} b_0 + b_1 x_1 \\ b_0 + b_1 x_2 \\ \vdots \\ b_0 + b_1 x_n \end{bmatrix}$$
Fitted values
$$\hat{y} = Xb = X(X'X)^{-1}X'y$$

The vector of fitted values is sometimes represented as a function of the hat matrix H:

$$H = X(X'X)^{-1}X'$$

That is:

$$\hat{y} = X(X'X)^{-1}X'y = Hy$$
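Here is a small numpy sketch (added, not in the original slides) that builds H for the soap data and checks two well-known properties of the hat matrix:

```python
import numpy as np

soap = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
suds = np.array([33.0, 42, 45, 51, 53, 61, 62])
X = np.column_stack([np.ones_like(soap), soap])

H = X @ np.linalg.inv(X.T @ X) @ X.T  # the n x n hat matrix
y_hat = H @ suds                      # fitted values: y_hat = H y
e = suds - y_hat                      # residuals: e = (I - H) y (next slides)

print(np.allclose(H @ H, H))          # True: H is idempotent
print(np.allclose(H, H.T))            # True: H is symmetric
```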
The residual vector
$$e_i = y_i - \hat{y}_i \quad \text{for } i = 1, \dots, n$$

That is:

$$e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
= \begin{bmatrix} y_1 - \hat{y}_1 \\ y_2 - \hat{y}_2 \\ \vdots \\ y_n - \hat{y}_n \end{bmatrix}$$
The residual vector written as a function of the hat matrix
$$e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
= \begin{bmatrix} y_1 - \hat{y}_1 \\ y_2 - \hat{y}_2 \\ \vdots \\ y_n - \hat{y}_n \end{bmatrix}
= y - \hat{y} = y - Hy = (I - H)y$$
Sum of squares and the analysis of variance table
Analysis of variance table in matrix terms
Source       DF     SS                          MS                F
Regression   p-1    SSR = b'X'Y - (1/n)Y'JY     MSR = SSR/(p-1)   MSR/MSE
Error        n-p    SSE = Y'Y - b'X'Y           MSE = SSE/(n-p)
Total        n-1    SSTO = Y'Y - (1/n)Y'JY
Sum of squares
In general, if you pre-multiply a vector by its transpose, you get a sum of squares.
$$y'y = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= y_1^2 + y_2^2 + \cdots + y_n^2 = \sum_{i=1}^{n} y_i^2$$
Error sum of squares
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Error sum of squares
In matrix terms:

$$SSE = (y - \hat{y})'(y - \hat{y})$$
Total sum of squares
Previously, we’d write:

$$SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

But, it can be shown that equivalently:

$$SSTO = Y'Y - \frac{1}{n} Y'JY$$

where J is a (square) n×n matrix containing all 1’s.
An example of total sum of squares
If n = 2:

$$\sum_{i=1}^{2} (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{2}
= Y_1^2 + Y_2^2 - \frac{1}{2}(Y_1 + Y_2)^2$$

But, note that we get the same answer by:

$$Y'Y - \frac{1}{2} Y'JY
= Y_1^2 + Y_2^2 - \frac{1}{2}\begin{bmatrix} Y_1 & Y_2 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}
= Y_1^2 + Y_2^2 - \frac{1}{2}(Y_1 + Y_2)^2$$
Analysis of variance table in matrix terms
Source       DF     SS                          MS                F
Regression   p-1    SSR = b'X'Y - (1/n)Y'JY     MSR = SSR/(p-1)   MSR/MSE
Error        n-p    SSE = Y'Y - b'X'Y           MSE = SSE/(n-p)
Total        n-1    SSTO = Y'Y - (1/n)Y'JY
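To make the table concrete, here is an added numpy sketch (not in the original slides) that computes each sum of squares for the soap data using the matrix formulas above:

```python
import numpy as np

soap = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
suds = np.array([33.0, 42, 45, 51, 53, 61, 62])
n, p = len(suds), 2

X = np.column_stack([np.ones_like(soap), soap])
b = np.linalg.solve(X.T @ X, X.T @ suds)
J = np.ones((n, n))

ssto = suds @ suds - (suds @ J @ suds) / n      # SSTO = Y'Y - (1/n) Y'JY
ssr = b @ (X.T @ suds) - (suds @ J @ suds) / n  # SSR  = b'X'Y - (1/n) Y'JY
sse = suds @ suds - b @ (X.T @ suds)            # SSE  = Y'Y - b'X'Y

msr, mse = ssr / (p - 1), sse / (n - p)
print(np.isclose(ssr + sse, ssto))  # True: SSR + SSE = SSTO
print(msr / mse)                    # the F-statistic
```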
Model assumptions
Error term assumptions
• As always, the error terms εi are:
  – independent
  – normally distributed (with mean 0)
  – with equal variances σ²
• Now, how can we say the same thing using matrices and vectors?
Error terms as a random vector
The n×1 random error term vector, denoted as ε, is:
$$\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
The mean (expectation) of the random error term vector
The n×1 mean error term vector, denoted as E(ε), is:
$$E(\epsilon) = \begin{bmatrix} E(\epsilon_1) \\ E(\epsilon_2) \\ \vdots \\ E(\epsilon_n) \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

The first equality is the definition; the second is the model assumption.
The variance of the random error term vector
The n×n variance matrix, denoted as σ²(ε), is defined as:

$$\sigma^2(\epsilon) = \begin{bmatrix}
\sigma^2(\epsilon_1) & \sigma(\epsilon_1, \epsilon_2) & \cdots & \sigma(\epsilon_1, \epsilon_n) \\
\sigma(\epsilon_2, \epsilon_1) & \sigma^2(\epsilon_2) & \cdots & \sigma(\epsilon_2, \epsilon_n) \\
\vdots & \vdots & \ddots & \vdots \\
\sigma(\epsilon_n, \epsilon_1) & \sigma(\epsilon_n, \epsilon_2) & \cdots & \sigma^2(\epsilon_n)
\end{bmatrix}$$

Diagonal elements are variances of the errors. Off-diagonal elements are covariances between errors.
The ASSUMED variance of the random error term vector
BUT, we assume the error terms are independent (covariances are 0) and have equal variances (σ²):

$$\sigma^2(\epsilon) = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}$$
Scalar by matrix multiplication
Just multiply each element of the matrix by the scalar.
For example:

$$2 \begin{bmatrix} 1 & 4 & 0 \\ 7 & 6 & 5 \\ 1 & 3 & 2 \end{bmatrix}
= \begin{bmatrix} 2 & 8 & 0 \\ 14 & 12 & 10 \\ 2 & 6 & 4 \end{bmatrix}$$
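In numpy this is simply the `*` operator (an added one-line check, not in the original slides):

```python
import numpy as np

A = np.array([[1, 4, 0],
              [7, 6, 5],
              [1, 3, 2]])
print(2 * A)  # each entry doubled
```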
The ASSUMED variance of the random error term vector
So the assumed variance matrix can be written compactly as:

$$\sigma^2(\epsilon) = \sigma^2 I = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}$$
The general linear regression model
Putting the regression function and assumptions all together, we get:
$$Y = X\beta + \epsilon$$

where:
• Y is an (n×1) vector of response values
• β is a (p×1) vector of unknown parameters
• X is an (n×p) matrix of predictor values
• ε is an (n×1) vector of independent, normal error terms with mean 0 and common variance σ², so that σ²(ε) = σ²I.
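To tie everything together, here is an added simulation sketch (not part of the original slides; the parameter values are illustrative, borrowed from the soap example) that generates data satisfying these assumptions and recovers β with the least squares formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 100, 2.0
beta = np.array([-2.68, 9.50])        # illustrative values from the soap example

x = rng.uniform(4.0, 7.0, size=n)
X = np.column_stack([np.ones(n), x])  # the (n x p) design matrix, p = 2
eps = rng.normal(0.0, sigma, size=n)  # independent N(0, sigma^2) errors
Y = X @ beta + eps                    # the general linear model: Y = X beta + eps

b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)                              # lands close to [-2.68, 9.50]
```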