STATISTICS Linear Statistical Models
Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University
The Method of Least Squares
• Consider the data shown in the following table and figure. We are interested in fitting a straight line to the points in order to obtain a simple mathematical relationship for runoff and rainfall.
112/04/10 Lab for Remote Sensing Hydrology and Spatial Modeling Dept of Bioenvironmental Systems Engineering, NTU
• Intuitively, we want the fitted value of runoff for each observed value of rainfall to be as close as possible to the observed runoff. Equivalently, we want the vertical deviations of the points from the line to be as small as possible.
• One method of constructing such a straight line to fit the observed data is called the method of least squares. It requires the sum of the squares of the vertical deviations of all the points from the fitted line to be a minimum.
• Let the rainfall and runoff data in the above figure be respectively represented by x and y. The fitted line is expressed by
$\hat{y} = \beta_0 + \beta_1 x$
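The least squares criterion can be written out explicitly. The following is a sketch of the standard derivation; the symbol $S$ for the sum of squared vertical deviations and the sample means $\bar{x}$, $\bar{y}$ are notation assumed here, not taken from the slides.

```latex
\begin{align*}
S(\beta_0,\beta_1) &= \sum_{i=1}^{n}\bigl(y_i-\beta_0-\beta_1 x_i\bigr)^2 \\
\frac{\partial S}{\partial \beta_0} &= -2\sum_{i=1}^{n}\bigl(y_i-\beta_0-\beta_1 x_i\bigr) = 0 \\
\frac{\partial S}{\partial \beta_1} &= -2\sum_{i=1}^{n}x_i\bigl(y_i-\beta_0-\beta_1 x_i\bigr) = 0 \\
\hat\beta_1 &= \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2},
\qquad \hat\beta_0 = \bar y-\hat\beta_1\,\bar x
\end{align*}
```

Setting both partial derivatives to zero gives the two normal equations; solving them yields the estimates in the last line.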
Remarks
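The least squares slope has three equivalent expressions:
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x}_n)(y_i-\bar{y}_n)}{\sum_{i=1}^{n}(x_i-\bar{x}_n)^2} = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x}_n)\,y_i}{\sum_{i=1}^{n}(x_i-\bar{x}_n)^2} = \dfrac{\sum_{i=1}^{n}x_i\,(y_i-\bar{y}_n)}{\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}$
The residuals satisfy
$\sum_{i=1}^{n}(y_i-\hat{y}_i) = 0, \qquad \sum_{i=1}^{n}x_i\,(y_i-\hat{y}_i) = 0$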
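The remarks on the slope expressions and on the two zero sums of the residuals can be checked numerically. The sketch below uses hypothetical rainfall and runoff values (not the data from the slides), and the variable names are chosen only for illustration.

```r
# Hypothetical rainfall (x) and runoff (y) values, for illustration only.
x <- c(10, 15, 20, 25, 30, 35, 40)
y <- c(4.1, 6.8, 9.2, 13.0, 14.9, 18.7, 21.1)

sxx <- sum((x - mean(x))^2)

# Three algebraically equivalent expressions for the least squares slope.
b1_a <- sum((x - mean(x)) * (y - mean(y))) / sxx
b1_b <- sum((x - mean(x)) * y) / sxx
b1_c <- sum(x * (y - mean(y))) / sxx

# Intercept and fitted values.
b0 <- mean(y) - b1_a * mean(x)
y_hat <- b0 + b1_a * x

# Both sums should be zero up to floating-point rounding.
sum_resid <- sum(y - y_hat)
sum_x_resid <- sum(x * (y - y_hat))

print(c(b1_a, b1_b, b1_c))
print(c(sum_resid, sum_x_resid))
```

The three slope values agree because $\sum (x_i-\bar{x}) = 0$ and $\sum (y_i-\bar{y}) = 0$, so subtracting a mean inside one factor does not change the sum.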
Given a value of x, what does the predicted value of y really represent?
• It is unlikely that the predicted value will always equal the observed value.
• The predicted value may even equal the observed value in only a few cases.
• In some cases, the predicted values differ greatly from the observed values.
• The linear model will overpredict some observed values and underpredict others.
Linear statistical model
Given $x_i$: $(y_i \mid x_i) = \beta_0 + \beta_1 x_i + \varepsilon_i$, $\quad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$.
Random component: $\varepsilon_i$
Because of the random component, we cannot predict y without error. If a phenomenon is stochastic in nature, it cannot be predicted without error.
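(Postulated model)
$(Y \mid x_i) \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2)$
$\mu_{Y \mid x_i} = \beta_0 + \beta_1 x_i$
$\mathrm{Var}(Y \mid x_i) = \sigma^2$
Given $x_i$: $(y_i \mid x_i) = \beta_0 + \beta_1 x_i + \varepsilon_i$, $\quad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$.
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
$y_i = \hat{y}_i + e_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i$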
Coefficient of determination
• How well does the least squares line explain the variation in the data?
• The coefficient of determination represents the proportion of data variation that can be explained by the linear regression model.
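The "proportion of variation explained" can be made concrete with the usual sums of squares. The labels $SST$, $SSR$, and $SSE$ used below are assumed notation for the simple regression case; the decomposition holds for a least squares fit that includes an intercept.

```latex
\begin{align*}
SST &= \sum_{i=1}^{n}(y_i-\bar y)^2, \qquad
SSR = \sum_{i=1}^{n}(\hat y_i-\bar y)^2, \qquad
SSE = \sum_{i=1}^{n}(y_i-\hat y_i)^2 \\
SST &= SSR + SSE \\
R^2 &= \frac{SSR}{SST} = 1-\frac{SSE}{SST}, \qquad 0 \le R^2 \le 1
\end{align*}
```

A value of $R^2$ near 1 means the residual part is small relative to the total variation of y about its mean.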
Estimating the variance of Y|x
$(Y \mid x_i) \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2)$
$\mu_{Y \mid x_i} = \beta_0 + \beta_1 x_i$, $\quad \mathrm{Var}(Y \mid x_i) = \sigma^2$
Note: The variance of Y|x is NOT the same as the variance of Y.
RSS (Residual sum of squares) = SSE (sum of squared errors)
$RSS = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \sum_{i=1}^{n}\left[y_i-(\hat{\beta}_0+\hat{\beta}_1 x_i)\right]^2$
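A short sketch of how RSS leads to an estimate of the variance of Y|x. It assumes the usual estimator $s^2 = RSS/(n-2)$ for simple linear regression and uses hypothetical data; `var(y)` is computed only to illustrate that the variance of Y is a different quantity.

```r
# Hypothetical rainfall (x) and runoff (y) values, for illustration only.
x <- c(10, 15, 20, 25, 30, 35, 40)
y <- c(4.1, 6.8, 9.2, 13.0, 14.9, 18.7, 21.1)

fit <- lm(y ~ x)
n <- length(y)

# Residual sum of squares and the estimate of Var(Y | x).
rss <- sum(residuals(fit)^2)
s2 <- rss / (n - 2)

# summary() reports the residual standard error, which equals sqrt(s2).
s_from_summary <- summary(fit)$sigma

# Sample variance of Y ignoring x; it also contains the variation due to x.
var_y <- var(y)

print(c(s2 = s2, var_y = var_y))
```

With a strong linear relationship, `s2` is much smaller than `var_y`, because most of the spread in y is accounted for by x.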
Unbiasedness of the least squares estimators
Confidence intervals of the regression coefficients
• Pivotal quantities
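For $\beta_0$ and $\beta_1$:
$Q_0 = \dfrac{\hat{\beta}_0-\beta_0}{s\sqrt{\dfrac{1}{n}+\dfrac{\bar{x}_n^2}{s_{xx}}}} \sim t_{n-2}$
$Q_1 = \dfrac{\hat{\beta}_1-\beta_1}{s/\sqrt{s_{xx}}} \sim t_{n-2}$
$Q_3 = \dfrac{(n-2)\,s^2}{\sigma^2} \sim \chi^2_{n-2}$
Here $s^2 = RSS/(n-2)$ and $s_{xx} = \sum_{i=1}^{n}(x_i-\bar{x}_n)^2$.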
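The pivotal quantities translate directly into confidence intervals. The sketch below assumes $s_{xx} = \sum (x_i-\bar{x})^2$ and $s^2 = RSS/(n-2)$, uses hypothetical data, and compares the hand-computed intervals with R's `confint()`.

```r
# Hypothetical rainfall (x) and runoff (y) values, for illustration only.
x <- c(10, 15, 20, 25, 30, 35, 40)
y <- c(4.1, 6.8, 9.2, 13.0, 14.9, 18.7, 21.1)

fit <- lm(y ~ x)
n <- length(y)
s <- summary(fit)$sigma            # s = sqrt(RSS / (n - 2))
sxx <- sum((x - mean(x))^2)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["x"]]

alpha <- 0.05
# qt() returns lower-tail quantiles, so this is the upper alpha/2 critical value.
t_crit <- qt(1 - alpha / 2, df = n - 2)

# Standard errors that appear in the denominators of Q0 and Q1.
se_b0 <- s * sqrt(1 / n + mean(x)^2 / sxx)
se_b1 <- s / sqrt(sxx)

ci_b0 <- b0 + c(-1, 1) * t_crit * se_b0
ci_b1 <- b1 + c(-1, 1) * t_crit * se_b1

# Interval for sigma^2 from Q3; the larger quantile gives the lower limit.
ci_sigma2 <- (n - 2) * s^2 / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 2)

ci_builtin <- confint(fit, level = 1 - alpha)
print(rbind(ci_b0, ci_b1, ci_sigma2))
print(ci_builtin)
```

The intervals for the intercept and slope should match the rows of `ci_builtin`; `confint()` does not report an interval for $\sigma^2$.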
Hypothesis tests for regression coefficients
Simple linear regression using R
• Useful material
– Chapter 11 of Introduction to Probability and Statistics Using R (G. J. Kerns) is highly recommended.
– http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20092010/Class8/Using%20R%20for%20linear%20regression.pdf
• Defining linear regression models
• Conducting regression
lm(y~model)
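A minimal sketch of the `lm()` workflow. It assumes a data frame named `dat` holding hypothetical rainfall (x) and runoff (y) values that are not from the slides; the expression to the right of `~` in the formula defines the model.

```r
# Hypothetical rainfall (x) and runoff (y) values, for illustration only.
dat <- data.frame(
  x = c(10, 15, 20, 25, 30, 35, 40),
  y = c(4.1, 6.8, 9.2, 13.0, 14.9, 18.7, 21.1)
)

# Straight line with intercept: y = b0 + b1 * x
fit <- lm(y ~ x, data = dat)

# Other model formulas use the same interface.
fit_origin <- lm(y ~ x - 1, data = dat)       # line through the origin
fit_quad   <- lm(y ~ x + I(x^2), data = dat)  # adds a squared term

coef(fit)      # estimated intercept and slope
summary(fit)   # standard errors, t tests, R-squared
```

The object returned by `lm()` can then be passed to functions such as `residuals()`, `fitted()`, `confint()`, and `predict()`.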
• Other useful commands
– For prediction (x values not observed)
Graphing the Confidence and Prediction Bands
You may want to change the x values at which predictions are made. For example, data.frame(x=seq(20,30,by=0.5))
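A sketch of how the two bands might be computed with `predict()`. The data are hypothetical, and the grid of new x values follows the `data.frame(x=seq(20,30,by=0.5))` example; the object names are assumptions made for illustration.

```r
# Hypothetical rainfall (x) and runoff (y) values, for illustration only.
dat <- data.frame(
  x = c(10, 15, 20, 25, 30, 35, 40),
  y = c(4.1, 6.8, 9.2, 13.0, 14.9, 18.7, 21.1)
)
fit <- lm(y ~ x, data = dat)

# Grid of x values at which the bands are evaluated.
new_x <- data.frame(x = seq(20, 30, by = 0.5))

# Confidence band: uncertainty about the mean of y at each x.
conf_band <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)

# Prediction band: uncertainty about a single new observation of y at each x.
pred_band <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

# Each result is a matrix with columns "fit", "lwr", and "upr".
head(cbind(new_x, conf_band))
head(cbind(new_x, pred_band))
```

The bands can be drawn over a scatter plot with calls such as `lines(new_x$x, conf_band[, "lwr"])`. The prediction band is always wider than the confidence band because it also includes the variance of the random component.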
Confidence and prediction intervals
Line of prediction. It represents the estimated conditional expectation of y given x.
• Multiple regression
– The following slides are provided for your reference only. Due to the time constraint, they will not be covered in this class.
• Now let’s consider fitting a linear function of several variables. Suppose that we have the following data set:
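$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$, $\quad B = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}$, $\quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$
$XB = Y$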
$X^{T}XB = X^{T}Y$
$B = (X^{T}X)^{-1}X^{T}Y$
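The matrix solution can be reproduced in a few lines of R. This is a sketch with hypothetical data for k = 2 explanatory variables; the names `x1`, `x2`, and `B` are chosen for illustration, and `lm()` is used only as a cross-check.

```r
# Hypothetical data with two explanatory variables, for illustration only.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8)

# Design matrix with a leading column of ones for the intercept.
X <- cbind(1, x1, x2)

# Solve the normal equations (X'X) B = X'Y without forming the inverse explicitly.
B <- solve(t(X) %*% X, t(X) %*% y)

# The same estimates from lm().
fit <- lm(y ~ x1 + x2)

print(cbind(normal_equations = as.vector(B), lm = unname(coef(fit))))
```

Calling `solve(A, b)` is generally preferred to computing `solve(A) %*% b`, although both correspond to $B = (X^{T}X)^{-1}X^{T}Y$.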
The Linear Regression Model
Covariance and Correlation Coefficient
• Suppose we have observed the following data. We wish to measure both the direction and the strength of the relationship between Y and X.
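Direction and strength are commonly summarized by the sample covariance and the sample correlation coefficient. The definitions below are the standard ones with divisor $n-1$; the symbols $s_{xy}$, $s_x$, $s_y$, and $r$ are assumed notation and may differ from the slides.

```latex
\begin{align*}
s_{xy} &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y) \\
s_x^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2, \qquad
s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar y)^2 \\
r &= \frac{s_{xy}}{s_x\,s_y}, \qquad -1 \le r \le 1
\end{align*}
```

The sign of $s_{xy}$ (and of $r$) gives the direction of the linear relationship, while $|r|$ measures its strength on a scale that does not depend on the units of X and Y.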
The Analysis of Variance (ANOVA)
• Given X, the Y's are independent normal random variables, i.e.,
$Y \sim N\left(XB,\ \sigma^2 I_n\right)$
• The residual sum of squares (or sum of squared errors, SSE) is expressed by
$SSE = (Y - X\hat{B})^{T}(Y - X\hat{B})$
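$SSE = (Y - X\hat{B})^{T}(Y - X\hat{B})$
$\quad = Y^{T}(Y - X\hat{B}) - \hat{B}^{T}X^{T}(Y - X\hat{B})$
$\quad = Y^{T}Y - Y^{T}X\hat{B}$
since $X^{T}(Y - X\hat{B}) = 0$.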
• The total sum of squares corrected for the mean is referred to as the total variation. This total variation is split up in two parts:
– the regression part (SSRm), "explained by the model", and
– the residual part (SSE).
• The ratio $R^2 = SSR_m / SST_m$ is known as the coefficient of determination.
• It represents the part of the total variation that is explained by the model. If the coefficient of determination is close to 1, the model provides a good fit to the data.
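The split of the total variation and the resulting coefficient of determination can be verified numerically. The sketch uses hypothetical data with two explanatory variables; the names `sst`, `ssr`, and `sse` are assumed here to stand for $SST_m$, $SSR_m$, and SSE.

```r
# Hypothetical data with two explanatory variables, for illustration only.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8)

fit <- lm(y ~ x1 + x2)

# Total variation: sum of squares corrected for the mean.
sst <- sum((y - mean(y))^2)

# Regression part, explained by the model.
ssr <- sum((fitted(fit) - mean(y))^2)

# Residual part.
sse <- sum(residuals(fit)^2)

# Coefficient of determination.
r2 <- ssr / sst

print(c(sst = sst, ssr_plus_sse = ssr + sse, r2 = r2))
```

The identity `sst = ssr + sse` relies on the model containing an intercept; `r2` should agree with `summary(fit)$r.squared`.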
Properties of the Estimators
Confidence Intervals
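• The $100(1-\alpha)\%$ confidence interval of $\sigma^2$ is
$\left( \dfrac{(n-p)\,s^2}{\chi^2_{\alpha/2,\,n-p}},\ \ \dfrac{(n-p)\,s^2}{\chi^2_{1-\alpha/2,\,n-p}} \right)$
where $\chi^2_{\gamma,\,n-p}$ denotes the value exceeded with probability $\gamma$ by a chi-square variable with $(n-p)$ degrees of freedom.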
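A sketch of the interval for $\sigma^2$ in R, using hypothetical data and p = 3 estimated coefficients. Note that `qchisq()` returns lower-tail quantiles, so the upper-tail critical value $\chi^2_{\alpha/2,\,n-p}$ corresponds to `qchisq(1 - alpha / 2, ...)`.

```r
# Hypothetical data with two explanatory variables, for illustration only.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8)

fit <- lm(y ~ x1 + x2)
n <- length(y)
p <- length(coef(fit))   # number of estimated coefficients, including the intercept

# Estimate of sigma^2 with (n - p) degrees of freedom.
s2 <- sum(residuals(fit)^2) / (n - p)

alpha <- 0.05
# Dividing by the larger chi-square quantile gives the lower limit.
lower <- (n - p) * s2 / qchisq(1 - alpha / 2, df = n - p)
upper <- (n - p) * s2 / qchisq(alpha / 2, df = n - p)

print(c(lower = lower, s2 = s2, upper = upper))
```

The interval is not symmetric about `s2` because the chi-square distribution is skewed.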
• However, the true value of $\sigma$ is unknown, so the above equation cannot be used to establish the confidence interval of $\beta_i$.
• We then substitute s for $\sigma$, and it is known that $\dfrac{\hat{\beta}_i-\beta_i}{s\sqrt{v_i}}$ has a t-distribution with (n–p) degrees of freedom.
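$P\left(-t_{\alpha/2,\,n-p} \le \dfrac{\hat{\beta}_i-\beta_i}{s\sqrt{v_i}} \le t_{\alpha/2,\,n-p}\right) = 1-\alpha$
$P\left(\hat{\beta}_i - t_{\alpha/2,\,n-p}\,s\sqrt{v_i} \le \beta_i \le \hat{\beta}_i + t_{\alpha/2,\,n-p}\,s\sqrt{v_i}\right) = 1-\alpha$
• The $100(1-\alpha)\%$ confidence interval of $\beta_i$ is
$\left(\hat{\beta}_i - t_{\alpha/2,\,n-p}\,s\sqrt{v_i},\ \ \hat{\beta}_i + t_{\alpha/2,\,n-p}\,s\sqrt{v_i}\right)$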
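The coefficient intervals can be computed from the design matrix. This sketch assumes that $v_i$ denotes the diagonal element of $(X^{T}X)^{-1}$ corresponding to $\beta_i$, which is the usual definition but is not stated in the surviving slide text; the data are hypothetical.

```r
# Hypothetical data with two explanatory variables, for illustration only.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8)

X <- cbind(1, x1, x2)
n <- nrow(X)
p <- ncol(X)

# Least squares estimates and residual standard deviation s.
xtx_inv <- solve(t(X) %*% X)
b <- as.vector(xtx_inv %*% t(X) %*% y)
s <- sqrt(sum((y - X %*% b)^2) / (n - p))

# Assumed meaning of v_i: diagonal elements of (X'X)^{-1}.
v <- diag(xtx_inv)

alpha <- 0.05
# qt() returns lower-tail quantiles, so this is the upper alpha/2 critical value.
t_crit <- qt(1 - alpha / 2, df = n - p)

ci <- cbind(lower = b - t_crit * s * sqrt(v),
            upper = b + t_crit * s * sqrt(v))

# Cross-check against R's built-in intervals.
fit <- lm(y ~ x1 + x2)
ci_builtin <- confint(fit, level = 1 - alpha)

print(ci)
print(ci_builtin)
```

Under that assumption, each row of `ci` should match the corresponding row of `ci_builtin`.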
Example 1
• A scientist carries out an experiment on the relationship between the yield Y of a crop and the amount of irrigation water X. It is believed that the relationship between expected yield and amount of irrigation water (ignore the units) can be described adequately as
$E(Y \mid x) = \beta_0 + \beta_1 x + \beta_2 x^2$
• The data shown in the following table were collected in the field.
Example 2
• Data in the following table are rainfall (x) and runoff (y) measured during the rainy season in a study area.
• A regression model is postulated for the above data:
$Y \mid X_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Test of Hypotheses