Unit 4: Modeling, Topic 6: Least Squares Method
April 1, 2003
Mathematical Modeling – Least Squares Section 2.3
Three Modeling Methods
Known Relationship – the underlying mathematical setting is known
Finite Differences – theoretical data, or hard-science data with little scatter
Least Squares – modeling data with scatter
Model: Global Warming
Global warming is partly the result of burning fuels, which increases the amount of carbon dioxide in the air. One of the major sources of fuel consumption is cars. Let’s examine the number of cars in the U.S. (in millions) as one variable of global warming.
Year   Cars (millions)
1940   27.5
1950   40.3
1960   61.7
1970   89.3
1980   121.6
1990   150.5
Linear or Curvilinear?
Numeric Method – use the Finite Differences method to determine whether the data has a nearly linear trend. Is the first difference nearly constant?
Solution: Nearly so after 1960.
Year   Cars (millions)   1st Finite Difference
1940   27.5
1950   40.3              12.8
1960   61.7              21.4
1970   89.3              27.6
1980   121.6             32.3
1990   150.5             28.9
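The first-difference check can be sketched in a few lines of Python. This is a minimal illustration using the car data from the table; the helper name `first_differences` is our own, not part of Derive or any tool mentioned in these notes.

```python
# U.S. cars (millions) by decade, from the table above.
years = [1940, 1950, 1960, 1970, 1980, 1990]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def first_differences(values):
    """Return the differences between consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

diffs = first_differences(cars)
for (start, end), d in zip(zip(years, years[1:]), diffs):
    print(f"{start}-{end}: {d:.1f}")
```

The last three differences (27.6, 32.3, 28.9) are close to one another, which is what "nearly constant after 1960" means here.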
Linear or Curvilinear?
Scaled Year (1 = 1940, 6 = 1990)   Cars (millions)
1   27.5
2   40.3
3   61.7
4   89.3
5   121.6
6   150.5
Graphic Method – plot the data to see the trend, then estimate a line of best fit.
What point should the line pass through?
Solution: The midpoint of the data, (x̄, ȳ) = (3.5, 81.8). Estimate the trend with an approximate slope of m = 30.
[Figure: Eyeball line of fit]
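As a rough check on the eyeball estimate, the sketch below computes the midpoint (x̄, ȳ) and the intercept that makes a slope-30 line pass through it. It assumes the scaled years 1 to 6 stand for 1940 to 1990; the slope of 30 is the by-eye estimate from the slide, not a computed value.

```python
# Scaled years (1 = 1940, ..., 6 = 1990) and cars in millions.
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

x_bar = sum(x) / len(x)  # mean of the scaled years
y_bar = sum(y) / len(y)  # mean number of cars

m = 30  # slope estimated by eye, as on the slide
# Force the line through the (rounded) midpoint: y_bar = m * x_bar + intercept.
intercept = round(y_bar, 1) - m * x_bar

print(f"midpoint: ({x_bar}, {y_bar:.1f})")
sign = "+" if intercept >= 0 else "-"
print(f"eyeball line: y = {m}x {sign} {abs(intercept):.1f}")
```

This reproduces the eyeball line ŷ = 30x − 23.2 that the notes use when computing the MSE.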
Error in Model
The error, or residual, for a model is the vertical distance between the actual (observed) value and the predicted (fitted) value:
Error = observed − predicted
Actual value: (x, y). Predicted value: (x, ŷ). So Error = y − ŷ.
Finding Total Error
Why not just add the errors to get the total error?
Solution: Positive and negative errors cancel out, so even a poorly fitting line can have a total error near zero. This gives a false sense of what an error of zero should mean.
[Figure: Error Cancellation]
How do we find total error?
Make the errors positive, either by taking the absolute value of the errors or by squaring the errors.
Sum the positive errors to find the total error.
The best fit line makes this sum of positive errors as small as possible.
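The cancellation problem is easy to see numerically. This sketch uses the eyeball line ŷ = 30x − 23.2 (the line through (3.5, 81.8) with slope 30) on the scaled car data and compares three ways of totaling the errors; the function name `eyeball` is our own.

```python
# Scaled years (1 = 1940, ..., 6 = 1990) and observed cars in millions.
x = [1, 2, 3, 4, 5, 6]
observed = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def eyeball(xi):
    """Eyeball line of fit used in these notes: y-hat = 30x - 23.2."""
    return 30 * xi - 23.2

# Error (residual) = observed - predicted.
errors = [yi - eyeball(xi) for xi, yi in zip(x, observed)]

plain_sum = sum(errors)                # positive and negative errors cancel
abs_sum = sum(abs(e) for e in errors)  # total error using absolute values
sq_sum = sum(e ** 2 for e in errors)   # total error using squares

print("errors:", [round(e, 1) for e in errors])
print(f"sum of errors:         {plain_sum:.1f}")
print(f"sum of |errors|:       {abs_sum:.1f}")
print(f"sum of squared errors: {sq_sum:.2f}")
```

The plain sum is only about 0.1 even though one error is 20.7, while the absolute and squared totals both reflect how far the points lie from the line.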
Question
Why square deviations instead of taking absolute deviations?
Gauss-Markov Theorem: if the errors have mean zero, equal variance, and are uncorrelated, then among all linear unbiased methods of fitting a line to data, the least squares method gives estimates with minimum variance (a measure of uncertainty).
Least Squares Method vs. Absolute Value Method
The Least Squares Method always yields a unique best fit line (as long as the x-values are not all equal), while the Absolute Value Method may not.
Example: Use the NCTM Applet for Least Squares to approximate the line of best fit for the following set of points: (0,2), (0,4), (6,3), (6,5), (3,3.5)
[Figures, three slides: Absolute Value Method, Pts: (0,2), (0,4), (6,3), (6,5), (3,3.5)]
[Figure: Least Squares Method, Pts: (0,2), (0,4), (6,3), (6,5), (3,3.5)]
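The non-uniqueness can be checked directly for these five points. Each vertical pair, (0,2)/(0,4) and (6,3)/(6,5), contributes at least 2 to the total absolute error, so 4 is the smallest possible total; any line through (3, 3.5) with a slope between −1/6 and 1/2 achieves it. The sketch below is our own illustration (not the NCTM applet): it evaluates several such lines and then computes the least squares line from the closed-form formulas derived later in these notes.

```python
points = [(0, 2), (0, 4), (6, 3), (6, 5), (3, 3.5)]

def sum_abs_errors(slope, intercept):
    """Total error for the Absolute Value Method."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)

def sum_sq_errors(slope, intercept):
    """Total error for the Least Squares Method."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

# Lines through (3, 3.5) with slopes between -1/6 and 1/2.
slopes = [-1/6, 0.0, 1/6, 1/3, 1/2]
for slope in slopes:
    intercept = 3.5 - 3 * slope
    print(f"slope {slope:+.3f}: sum|e| = {sum_abs_errors(slope, intercept):.3f}, "
          f"sum e^2 = {sum_sq_errors(slope, intercept):.3f}")

# Least squares slope and intercept from the closed-form formulas.
n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n
slope_ls = (sum(x * y for x, y in points) - n * x_bar * y_bar) / (
    sum(x * x for x, _ in points) - n * x_bar ** 2)
intercept_ls = y_bar - slope_ls * x_bar
print(f"least squares line: y = {slope_ls:.4f}x + {intercept_ls:.4f}")
```

Every tested line ties at a total absolute error of 4, but the sums of squares differ (8, 5, 4, 5, 8), and only slope 1/6 with intercept 3 minimizes them.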
Outliers
An outlier is a point that lies farther from the line of best fit than most other points.
The least squares method is more sensitive to outliers than the absolute value method, because squaring magnifies large errors.
[Figure: Absolute Value Method, 4 points near the line and one outlier]
[Figure: Least Squares Method, 4 points near the line and one outlier]
Mean Squared Error
For a set of data (x, y) with n elements modeled by a line $\hat{y} = mx + b$,
$$\text{MSE} = \frac{(y_1-\hat{y}_1)^2 + (y_2-\hat{y}_2)^2 + \cdots + (y_n-\hat{y}_n)^2}{n}$$
Standard Deviation
Standard deviation: a measure of error found by taking the square root of the MSE, which gives a measure of error in the same units as the original data. Since
$$\text{MSE} = \frac{(y_1-\hat{y}_1)^2 + (y_2-\hat{y}_2)^2 + \cdots + (y_n-\hat{y}_n)^2}{n},$$
the standard deviation is
$$\sqrt{\text{MSE}}.$$
Mean Squared Error
Eyeball line of fit: $\hat{y} = 30x - 23.2$
$$\text{MSE} = \frac{(y_1-\hat{y}_1)^2 + (y_2-\hat{y}_2)^2 + \cdots + (y_n-\hat{y}_n)^2}{n}$$
Calculate the MSE for the global warming data when modeled by the eyeball line of best fit.
Scaled Year   Cars Observed   Cars Predicted
1             27.5            6.8
2             40.3            36.8
3             61.7            66.8
4             89.3            96.8
5             121.6           126.8
6             150.5           156.8
Solution: MSE ≈ 98.3, so $\sqrt{\text{MSE}} = \sqrt{98.3} \approx 9.915$.
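The MSE and standard deviation definitions translate directly into code. This minimal sketch recomputes the eyeball-line example from the table; the helper names `mse` and `std_dev` are our own.

```python
import math

# Scaled years (1 = 1940, ..., 6 = 1990) and observed cars in millions.
x = [1, 2, 3, 4, 5, 6]
observed = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]
predicted = [30 * xi - 23.2 for xi in x]  # eyeball line y-hat = 30x - 23.2

def mse(obs, pred):
    """Mean squared error: the average of the squared residuals (divides by n)."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def std_dev(obs, pred):
    """Square root of the MSE, in the same units as the data."""
    return math.sqrt(mse(obs, pred))

print(f"MSE = {mse(observed, predicted):.1f}")
print(f"standard deviation = {std_dev(observed, predicted):.2f}")
```

Following the definition in these notes, the sum is divided by n rather than the n − 1 used for a sample standard deviation in statistics.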
Some History - Gauss
Gauss came up with a mathematical model for fitting a line to data when he was 17 years old (in 1794)!
However, he did not publish his findings until 1809.
Some History - Legendre
Legendre published the first explicit account of the method of least squares in 1805, 4 years before Gauss published his.
But in Gauss’s publication, he referred to his earlier (1794) work, which created a controversy about who had first discovered the method.
Today both Gauss and Legendre are given credit for discovering the method of least squares independently.
Line of Best Fit
Least Squares Fit Method – an algebraic method of finding the best fit line.
This method gives the line with the smallest possible mean squared error (equivalently, the smallest standard deviation).
Verification of Least Squares Fit Method
Minimize the MSE:
$$\text{MSE} = \frac{(y_1-\hat{y}_1)^2 + (y_2-\hat{y}_2)^2 + \cdots + (y_n-\hat{y}_n)^2}{n} = \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}$$
Verification of LS Method
n is a fixed constant, so minimize the numerator
$$\sum_{i=1}^{n}(y_i-\hat{y}_i)^2,$$
where $\hat{y}_i = \hat{y}(x_i) = a + bx_i$ is the predicted value and $y_i$ is the actual observed value for $x_i$.
Verification of LS Method
Substitute for $\hat{y}_i$, then square:
$$\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - a - bx_i)^2 = \sum_{i=1}^{n}\left(a^2 + 2abx_i - 2ay_i + b^2x_i^2 - 2bx_iy_i + y_i^2\right)$$
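Verification of LS Method
Now expand the summation, which requires some summation rules:
Distribute summation: $\sum (ax_i + by_i) = a\sum x_i + b\sum y_i$
Summation of a constant: $\sum_{i=1}^{n} 5 = 5 + 5 + \cdots + 5 = 5n$
Summation and mean: $\bar{x} = \frac{\sum x_i}{n}$ and $\bar{y} = \frac{\sum y_i}{n}$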
Verification of LS Method
$$\sum_{i=1}^{n}\left(a^2 + 2abx_i - 2ay_i + b^2x_i^2 - 2bx_iy_i + y_i^2\right) = na^2 + 2nab\bar{x} - 2na\bar{y} + b^2\sum x_i^2 - 2b\sum x_iy_i + \sum y_i^2$$
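Verification of LS Method
Now the line of best fit $\hat{y}_i = a + bx_i$ passes through $(\bar{x}, \bar{y})$. (For any fixed slope b, the terms containing a are $na^2 - 2na(\bar{y} - b\bar{x})$, a quadratic in a that is smallest when $a = \bar{y} - b\bar{x}$.) So
$$\bar{y} = a + b\bar{x}, \quad\text{giving}\quad a = \bar{y} - b\bar{x}.$$
Substitute $\bar{y} - b\bar{x}$ for a and simplify.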
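Verification of LS Method
$$\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = b^2\left(\sum x_i^2 - n\bar{x}^2\right) - 2b\left(\sum x_iy_i - n\bar{x}\bar{y}\right) + \sum y_i^2 - n\bar{y}^2$$
$$= \left(\sum x_i^2 - n\bar{x}^2\right)\left[b^2 - 2b\,\frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right] + \sum y_i^2 - n\bar{y}^2$$
The last expression, $\sum y_i^2 - n\bar{y}^2$, is a constant value since the data is known, so we ignore it when minimizing. The factor $\sum x_i^2 - n\bar{x}^2 = \sum (x_i - \bar{x})^2$ is also a constant, and
$$\sum x_i^2 - n\bar{x}^2 > 0$$
as long as the $x_i$ are not all equal.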
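Verification of LS Method
Complete the square on the second expression to get
$$\left(\sum x_i^2 - n\bar{x}^2\right)\left[\left(b - \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right)^2 - \left(\frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right)^2\right]$$
Both terms in the brackets are squares, so both are nonnegative. The second term does not depend on b, so subtracting it does not change where the minimum occurs. Thus making the first squared term as small as possible (zero) minimizes the entire expression.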
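Verification of LS Method
The entire expression is minimal when
$$b - \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = 0.$$
Thus
$$b = \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \quad\text{and}\quad a = \bar{y} - b\bar{x}$$
in the line of best fit $\hat{y} = a + bx$.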
Line of Best Fit
Derive’s calculated line of best fit is y = 25.3x − 6.8.
The mean squared error is reduced from 98.3 for the eyeballed line of fit to 34.6 for the line of best fit.
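The closed-form formulas for a and b can be applied directly to the scaled car data. This sketch is our own check of Derive’s result, not Derive’s internal procedure; it uses the names a (intercept) and b (slope) from the derivation.

```python
# Scaled years (1 = 1940, ..., 6 = 1990) and cars in millions.
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# b = (sum x_i y_i - n x_bar y_bar) / (sum x_i^2 - n x_bar^2), a = y_bar - b x_bar
b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2)
a = y_bar - b * x_bar

# MSE of the fitted line y-hat = a + b x.
mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / n

sign = "+" if a >= 0 else "-"
print(f"y = {b:.1f}x {sign} {abs(a):.1f}")
print(f"MSE = {mse:.1f}")
```

The output matches the slide: y = 25.3x − 6.8 with an MSE of 34.6.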
[Figure: Line of Best Fit (blue)]
Polynomial of Best Fit
Is the line of best fit a better model than a quadratic or cubic polynomial? Examine the line of best fit with the data. Is the data curvilinear?
Solution: The data starts above the line of best fit, then goes below, and finishes back above, which indicates that the data is curvilinear.
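One way to read the "above, below, above" pattern is from the signs of the residuals. This sketch uses the rounded line y = 25.3x − 6.8 from the slide; treating long runs of same-sign residuals as a hint of curvature is a rule of thumb, not a formal test.

```python
# Scaled years (1 = 1940, ..., 6 = 1990) and cars in millions.
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def line(xi):
    """Rounded line of best fit from the slide: y = 25.3x - 6.8."""
    return 25.3 * xi - 6.8

residuals = [yi - line(xi) for xi, yi in zip(x, y)]
signs = "".join("+" if r > 0 else "-" for r in residuals)
# Count how often consecutive residuals switch sign.
changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)

print(signs, "sign changes:", changes)
```

The pattern "+---++" (above, then below, then above) has only two sign changes, which is the curvilinear signal described on the slide.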
Extension of Linear Method to Other Polynomial Functions
Polynomial of Best Fit
Derive’s calculated quadratic of best fit for the Global Warming data is
$$y(x) = 2.2x^2 + 9.8x + 13.9$$
The MSE is reduced from 34.6 for the line of best fit to about 4.1 for the quadratic of best fit (a standard deviation of about 2.02).
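Extending the linear method to a quadratic ŷ = c₀ + c₁x + c₂x² leads to three "normal equations" in the power sums of the data. The sketch below sets them up and solves them with a small Gaussian elimination helper; this is our own stand-in for whatever Derive does internally, and the names `solve`, `S`, and `T` are hypothetical.

```python
# Scaled years (1 = 1940, ..., 6 = 1990) and cars in millions.
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]
n = len(x)

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [val] for row, val in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, size))
        sol[r] = (aug[r][size] - tail) / aug[r][r]
    return sol

# Power sums: S[k] = sum of x_i^k, T[k] = sum of x_i^k * y_i.
S = [sum(xi ** k for xi in x) for k in range(5)]
T = [sum(xi ** k * yi for xi, yi in zip(x, y)) for k in range(3)]

# Normal equations for y-hat = c0 + c1 x + c2 x^2.
normal = [[S[0], S[1], S[2]],
          [S[1], S[2], S[3]],
          [S[2], S[3], S[4]]]
c0, c1, c2 = solve(normal, T)

mse = sum((yi - (c0 + c1 * xi + c2 * xi ** 2)) ** 2 for xi, yi in zip(x, y)) / n
print(f"y = {c2:.1f}x^2 + {c1:.1f}x + {c0:.1f}")
print(f"MSE = {mse:.2f}, standard deviation = {mse ** 0.5:.2f}")
```

The coefficients round to the slide’s 2.2, 9.8, and 13.9; the MSE comes out near 4.08, and its square root is the 2.02 figure.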
[Figure: Polynomial of Best Fit]
Model: Higher Ed. Cost
Find a model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S.
What is the predicted cost in 1990? In 2005?
When will the cost reach $3500?
Year   Cost ($)
1975   599
1980   840
1985   1386
1988   1726
1989   1846
1990   2006
Higher Ed. Cost
Is the data linear or curvilinear? Scale the data, then use Derive or Grapher to fit:
Linear: C(y) = 95.7y + 491.4 (sd 80.5)
Quadratic: C(y) = 6.1y² + 671.7 (sd 56.0)
Year   Scaled Year   Cost ($)
1975   0             599
1980   5             840
1985   10            1386
1988   13            1726
1989   14            1846
1990   15            2006
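Both models can be reproduced with the straight-line least squares formulas. The quadratic C(y) = 6.1y² + 671.7 has no linear term, so it is a straight line in u = y² and the same formulas apply after that substitution. This sketch assumes "sd" means √MSE with division by n, as defined earlier; the helper `fit_line` is our own, not a Derive or Grapher function.

```python
import math

scaled_year = [0, 5, 10, 13, 14, 15]  # 1975 = 0
cost = [599, 840, 1386, 1726, 1846, 2006]

def fit_line(u, v):
    """Least squares fit v = slope * u + intercept; returns (slope, intercept, sd)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    slope = (sum(ui * vi for ui, vi in zip(u, v)) - n * u_bar * v_bar) / (
        sum(ui * ui for ui in u) - n * u_bar ** 2)
    intercept = v_bar - slope * u_bar
    mse = sum((vi - (slope * ui + intercept)) ** 2 for ui, vi in zip(u, v)) / n
    return slope, intercept, math.sqrt(mse)

# Linear model C(y) = m*y + k.
m_lin, k_lin, sd_lin = fit_line(scaled_year, cost)
# Quadratic model with no linear term, C(y) = m*y^2 + k: a line in y^2.
m_sq, k_sq, sd_sq = fit_line([t ** 2 for t in scaled_year], cost)

print(f"C(y) = {m_lin:.1f}y + {k_lin:.1f}   (sd {sd_lin:.1f})")
print(f"C(y) = {m_sq:.1f}y^2 + {k_sq:.1f}  (sd {sd_sq:.1f})")
```

Under these assumptions the printed coefficients and sd values match the slide, and the smaller sd for the y² model supports treating the data as curvilinear.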
[Figure: Scatter plot & Models]
Predicting Cost
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is C(y) = 6.1y² + 671.7.
Predicted cost in 1990, which is scaled year y = 15: C(15) = 2044.2
Error in the prediction: Error = 2006 − 2044.2 = −38.2
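The same model answers both prediction questions from the start of this example. This sketch uses the rounded coefficients from the slide and assumes scaled year 0 is 1975; the helper names are our own. The 2005 value is an extrapolation 15 years past the last data point, so it should be treated with caution.

```python
def cost_model(scaled_year):
    """Rounded quadratic model from the slides: C(y) = 6.1y^2 + 671.7."""
    return 6.1 * scaled_year ** 2 + 671.7

def to_scaled(year):
    """Convert a calendar year to the scaled year (1975 = 0)."""
    return year - 1975

pred_1990 = cost_model(to_scaled(1990))
error_1990 = 2006 - pred_1990  # observed minus predicted
pred_2005 = cost_model(to_scaled(2005))

print(f"1990: predicted {pred_1990:.1f}, error {error_1990:.1f}")
print(f"2005: predicted {pred_2005:.1f}")
```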
Predicting Year
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is C(y) = 6.1y² + 671.7. When will the cost reach $3500?
Solve the quadratic equation 3500 = 6.1y² + 671.7.
Quadratic Equations
Methods of Solving Quadratic Equations
Factoring Method
Square Root Method
Completing the Square
Quadratic Formula
Square Root Method
Let y = 3500 in the model y = 6.1x² + 671.7 (here x is the scaled year and y is the cost). The resulting quadratic equation has no linear term:
6.1x² + 671.7 = 3500
How can we parallel the method of solving linear equations to solve this quadratic equation?
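Square Root Method
Solving a quadratic with no linear term: isolate the square term.
6.1x² + 671.7 = 3500
6.1x² + 671.7 − 671.7 = 3500 − 671.7
6.1x² = 2828.3
x² = 2828.3/6.1
Take the square root to find x:
$$x = \pm\sqrt{\frac{2828.3}{6.1}} \approx \pm 21.5$$
Only the positive root is meaningful here: since x = 0 is 1975, scaled year 21.5 corresponds to about 1996 to 1997.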
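The square root method is short enough to write as a function. This is a minimal sketch; the name `solve_no_linear_term` is our own, and the example plugs in the rounded model y = 6.1x² + 671.7.

```python
import math

def solve_no_linear_term(a, c, target):
    """Solve a*x^2 + c = target by isolating x^2 and taking square roots."""
    x_squared = (target - c) / a
    if x_squared < 0:
        return []  # no real solutions
    root = math.sqrt(x_squared)
    return [root, -root]

roots = solve_no_linear_term(6.1, 671.7, 3500)
print([round(r, 2) for r in roots])
print(f"calendar year of positive root: {1975 + roots[0]:.1f}")
```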
Solving a Quadratic Equation with a Linear Term
Quadratic of best fit with a linear term: y = 3.7x² + 38.9x + 587.6 (sd 22.4)
Let y = 3500 and solve the resulting quadratic equation with a linear term:
3.7x² + 38.9x + 587.6 = 3500
How do we solve such equations?
Completing the Square Method
Solve by converting to a perfect square and using the Square Root Method.
x² + 4x − 5 = 0
Isolate the x terms: x² + 4x = 5
Complete the square: x² + 4x + 2² = 5 + 2²
(x + 2)² = 9
Completing the Square Method
Take the square root and solve (x + 2)² = 9:
x + 2 = 3 or x + 2 = −3
x = 1 or x = −5
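For an equation already written as x² + px + q = 0, the completing-the-square steps above reduce to (x + p/2)² = (p/2)² − q. The sketch below follows those steps literally; the function name and the monic (leading coefficient 1) restriction are our own choices.

```python
import math

def complete_the_square(p, q):
    """Solve x^2 + p*x + q = 0 by rewriting it as (x + p/2)^2 = (p/2)^2 - q."""
    half = p / 2
    rhs = half ** 2 - q  # right side after adding (p/2)^2 to both sides
    if rhs < 0:
        return []  # no real solutions
    root = math.sqrt(rhs)
    return [-half + root, -half - root]

print(complete_the_square(4, -5))
```

For x² + 4x − 5 = 0 this gives (x + 2)² = 9, so x = 1 or x = −5, matching the worked example.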
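Quadratic Formula
Complete the square on the general quadratic to get a general solution:
$$ax^2 + bx + c = 0$$
$$ax^2 + bx = -c$$
$$x^2 + \frac{b}{a}x = -\frac{c}{a}$$
$$x^2 + \frac{b}{a}x + \left(\frac{b}{2a}\right)^2 = \left(\frac{b}{2a}\right)^2 - \frac{c}{a}$$
$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}$$
$$x + \frac{b}{2a} = \pm\frac{\sqrt{b^2 - 4ac}}{2a}$$
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$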
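Quadratic Formula
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Use the Quadratic Formula to solve 3.7x² + 38.9x + 587.6 = 3500.
3.7x² + 38.9x − 2912.4 = 0, so a = 3.7, b = 38.9, and c = −2912.4.
$$x = \frac{-38.9 \pm \sqrt{38.9^2 - 4(3.7)(-2912.4)}}{2(3.7)}$$
x ≈ 23.3 or x ≈ −33.8
Only the positive root is meaningful: since x = 0 is 1975, the model predicts the cost reaches $3500 in about 1998.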
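The quadratic formula can be sketched as a small function that returns the real solutions. The function name is our own; the example uses the rounded coefficients from the slide, so the roots match the hand calculation to one decimal place.

```python
import math

def quadratic_formula(a, b, c):
    """Real solutions of a*x^2 + b*x + c = 0 via x = (-b ± sqrt(b^2 - 4ac)) / (2a)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real solutions
    root = math.sqrt(disc)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]

# Higher-ed model with a linear term: 3.7x^2 + 38.9x + 587.6 = 3500.
solutions = quadratic_formula(3.7, 38.9, 587.6 - 3500)
print([round(s, 1) for s in solutions])
```

As a sanity check, `quadratic_formula(1, 4, -5)` returns the roots 1 and −5 found earlier by completing the square.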
Graphic Method
Graph the related function for the Higher Education problem.
Equation: 3.7x² + 38.9x − 2912.4 = 0
Related Function: f(x) = 3.7x² + 38.9x − 2912.4
Use Derive or a graphing calculator to generate a graph, and zoom in to find the root with an error less than 0.01.
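Zooming in on a graph amounts to repeatedly narrowing an interval that contains the crossing point. Bisection is a numerical analogue of that process, swapped in here as an illustration rather than the Derive procedure: it halves an interval where f changes sign until the interval is narrower than 0.01. The starting interval [0, 50] is our own assumption, chosen because f(0) < 0 < f(50).

```python
def f(x):
    """Related function for the higher-ed equation."""
    return 3.7 * x ** 2 + 38.9 * x - 2912.4

def bisect(func, lo, hi, tol=0.01):
    """Halve [lo, hi] until its width is below tol; func must change sign on it."""
    if func(lo) * func(hi) > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo >= tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid  # the sign change is in the left half
        else:
            lo = mid  # the sign change is in the right half
    return lo, hi

lo, hi = bisect(f, 0, 50)
print(f"root between {lo:.4f} and {hi:.4f}")
```

Any point in the final interval approximates the root with an error less than 0.01, which is the same guarantee the zoomed-in graph is meant to give.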
Graphic Solution
[Figures: Graphic Solution]
What is the approximation and error from this graph?
Least Squares Modeling
The End