
Unit 4: Modeling Topic 6: Least Squares Method April 1, 2003.


Mathematical Modeling – Least Squares (Section 2.3)

Three Modeling Methods

Known Relationship – underlying mathematical setting is known

Finite Differences – theoretical data or hard science data with little scatter

Least Squares – modeling data with scatter

Model: Global Warming

Global warming is partly the result of burning fuels, which increases the amount of carbon dioxide in the air. One of the major sources of fuel consumption is cars. Let’s examine the number of cars in the U.S. (in millions) as one variable of global warming.

Year    Cars (millions)
1940    27.5
1950    40.3
1960    61.7
1970    89.3
1980    121.6
1990    150.5

Linear or Curvilinear?

Numeric Method – use the Finite Differences method to determine if the data has a nearly linear trend. Is the first difference nearly constant?

Solution: Nearly so after 1960.

Year    Cars     1st Finite Difference
1940    27.5
1950    40.3     12.8
1960    61.7     21.4
1970    89.3     27.6
1980    121.6    32.3
1990    150.5    28.9
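As a minimal sketch of the finite differences check, the snippet below subtracts each car count from the next one. The list name `cars` is an illustrative choice, not something from the slides.

```python
# Car counts (millions) for 1940, 1950, ..., 1990, taken from the table.
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

# First finite differences: each value minus the value before it.
first_diffs = [round(later - earlier, 1) for earlier, later in zip(cars, cars[1:])]

print(first_diffs)
```

If the differences were all about the same size, a line would be a natural model; here they grow through 1980, which hints at curvature.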


Eyeball line of fit

Linear or Curvilinear?

Graphic Method – plot the data to see the trend, then estimate a line of best fit.

Year (scaled)    Cars
1                27.5
2                40.3
3                61.7
4                89.3
5                121.6
6                150.5

What point should the line pass through?

Solution: Midpoint (3.5, 81.8), the mean of the scaled years and the mean of the car counts. Estimate the trend with an approximate slope of m = 30.
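A small sketch of how the midpoint and the eyeball line could be computed, assuming the scaled years 1 to 6 and the estimated slope of 30. The variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5, 6]                          # scaled years
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]    # cars in millions

# The midpoint is the pair of means (x-bar, y-bar).
x_bar = sum(xs) / len(xs)
y_bar = sum(cars) / len(cars)

# A line with slope 30 through the midpoint: y - y_bar = 30 * (x - x_bar).
slope = 30
intercept = y_bar - slope * x_bar

print(x_bar, round(y_bar, 1), round(intercept, 1))
```

Rounded to one decimal, this gives the eyeball line y = 30x - 23.2.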

Error in Model

Error = observed - predicted

Error

The error or residual for a model is the vertical distance between the actual (observed) value and the predicted (fitted) value.

Error = observed - predicted

Actual value: $(x, y)$

Predicted value: $(x, \hat{y})$

So Error $= y - \hat{y}$


Error Cancellation

Finding Total Error

Why not just add the errors to get the total error?

Solution: Positive and negative values cancel out. This gives a false sense of what an error of zero should mean.
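To illustrate the cancellation, the sketch below uses the car data with a line of slope 30 through (3.5, 81.8), which works out to y = 30x - 23.2. It compares the plain sum of errors with the sum of their absolute values; the names are illustrative.

```python
xs = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

# Eyeball line: slope 30 through (3.5, 81.8), that is y = 30x - 23.2.
predicted = [30 * x - 23.2 for x in xs]
errors = [y - p for y, p in zip(cars, predicted)]

signed_total = sum(errors)                     # positive and negative errors cancel
absolute_total = sum(abs(e) for e in errors)   # no cancellation

print(round(signed_total, 1), round(absolute_total, 1))
```

The signed total comes out near 0.1 even though the first point alone misses by 20.7, while the absolute total is about 48.3.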

How do we find total error?

Make the errors positive by either taking the absolute value of the errors or squaring the errors.

Sum the positive errors to find the total error.

The best fit line makes the sum of positive errors as small as possible.

Question

Why square deviations vs. absolute deviations?

Gauss-Markov Theorem: Among all linear unbiased methods for fitting a line to data whose errors have mean zero, equal variance, and no correlation, the least squares method provides estimates with minimum variance (a measure of uncertainty).

[Portraits: Gauss, Markov]

Least Squares Method vs. Absolute Value Method

The Least Squares Method yields a unique best fit line whenever the x-values are not all equal, while the Absolute Value Method may yield many.

Example: Use the NCTM Applet for Least Squares to approximate the line of best fit for the following set of points: (0,2), (0,4), (6,3), (6,5), (3,3.5)

Absolute Value Method (three slides, same points)

Pts: (0,2), (0,4), (6,3), (6,5), (3,3.5)

Least Squares Method

Pts: (0,2), (0,4), (6,3), (6,5), (3,3.5)
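The uniqueness claim can be checked numerically on these five points. The sketch below is an illustration, not the applet's method: it takes a hypothetical family of lines through (3, 3.5) with different y-intercepts and compares the sum of absolute errors with the sum of squared errors.

```python
points = [(0, 2), (0, 4), (6, 3), (6, 5), (3, 3.5)]

def line_through_center(intercept):
    """Return (a, b) for the line y = a + b*x through (3, 3.5) with this intercept."""
    return intercept, (3.5 - intercept) / 3

def sum_abs_errors(a, b):
    return sum(abs(y - (a + b * x)) for x, y in points)

def sum_sq_errors(a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in points)

abs_totals = {}
sq_totals = {}
for intercept in (2.0, 2.5, 3.0, 3.5, 4.0):
    a, b = line_through_center(intercept)
    abs_totals[intercept] = sum_abs_errors(a, b)
    sq_totals[intercept] = sum_sq_errors(a, b)
    print(intercept, round(abs_totals[intercept], 6), round(sq_totals[intercept], 6))
```

Every line in this family ties on the absolute-value total, so that criterion cannot pick one of them, while the squared total has a single smallest value (at intercept 3, the line y = x/6 + 3).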

Outliers

An outlier is a point that lies farther from the line of best fit than most other points.

The least squares method is more sensitive to outliers than the absolute value method, because squaring magnifies large errors.

Absolute Value Method

Pts: 4 near line, one outlier

Least Squares Method

Pts: 4 near line, one outlier

Mean Squared Error

For a set of data (x, y) with n elements modeled by a line $\hat{y} = mx + b$:

$$\mathrm{MSE} = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2}{n}$$

Standard Deviation

Standard Deviation: a measure of error found by taking the square root of the MSE, which gives a measure of error in the same units as the original data.

$$\mathrm{MSE} = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2}{n}$$

so the standard deviation is $\sqrt{\mathrm{MSE}}$.


Mean Squared Error

Calculate the MSE for the Global Warming data when modeled by the eyeball line of best fit, $\hat{y} = 30x - 23.2$.

$$\mathrm{MSE} = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2}{n}$$

Year (scaled)    Cars Observed    Cars Predicted
1                27.5             6.8
2                40.3             36.8
3                61.7             66.8
4                89.3             96.8
5                121.6            126.8
6                150.5            156.8

Solution: MSE $\approx 98.3$, so $\sqrt{\mathrm{MSE}} = \sqrt{98.3} \approx 9.915$.
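The calculation can be sketched in a few lines, assuming the eyeball line y = 30x - 23.2 and the observed values from the table. Variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
observed = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

# Predictions from the eyeball line y = 30x - 23.2.
predicted = [30 * x - 23.2 for x in xs]

# Mean squared error: average of the squared residuals.
squared_errors = [(y - p) ** 2 for y, p in zip(observed, predicted)]
mse = sum(squared_errors) / len(xs)

# Standard deviation: square root of the MSE, in the same units as the data.
std_dev = math.sqrt(mse)

print(round(mse, 1), round(std_dev, 2))
```

Most of the total comes from the first point, whose residual of 20.7 contributes about 428 of the roughly 590 in summed squares.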

Some History - Gauss

Gauss came up with a mathematical model for fitting a line to data when he was 17 years old (in 1794)! However, he did not publish his findings until 1809.

Some History - Legendre

Legendre published the first explicit account of the method of least squares in 1805, 4 years before Gauss published his.

But in his publication, Gauss referred to his earlier (1794) work, which created a controversy about who had first discovered the method.

Today both Gauss and Legendre are given credit for discovering the method of least squares independently.

Line of Best Fit

Least Squares Fit Method – an algebraic method of finding the best fit line.

This method gives the line that has the smallest possible mean squared error, and therefore the smallest standard deviation.

Verification of Least Squares Fit Method

Minimize MSE:

$$\mathrm{MSE} = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2}{n} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}$$

Verification of LS Method

n is a fixed constant, so minimize the numerator

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$

where $\hat{y}_i = \hat{y}(x_i) = a + b x_i$ is the predicted value (a is the intercept and b is the slope) and $y_i$ is the actual observed value for $x_i$.

Verification of LS Method

Substitute for $\hat{y}_i$:

$$\sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \sum_{i=1}^{n} (a + b x_i - y_i)^2$$

Square:

$$= \sum_{i=1}^{n} \left(a^2 + 2ab x_i - 2a y_i + b^2 x_i^2 - 2b x_i y_i + y_i^2\right)$$

Verification of LS Method

Now expand the summation, which requires some summation rules:

Distribute Summation: $\sum (a x_i + b y_i) = a \sum x_i + b \sum y_i$

Summation of Constant: $\sum_{i=1}^{n} 5 = 5 + 5 + \cdots + 5 = 5n$

Summation and Mean: $\bar{x} = \frac{\sum x_i}{n}$ and $\bar{y} = \frac{\sum y_i}{n}$, so $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$

Verification of LS Method

$$\sum_{i=1}^{n} \left(a^2 + 2ab x_i - 2a y_i + b^2 x_i^2 - 2b x_i y_i + y_i^2\right)$$

$$= n a^2 + 2abn\bar{x} - 2an\bar{y} + b^2 \sum x_i^2 - 2b \sum x_i y_i + \sum y_i^2$$

Verification of LS Method

Now the line of best fit $\hat{y}_i = a + b x_i$ passes through $(\bar{x}, \bar{y})$, so $\bar{y} = a + b\bar{x}$, giving $a = \bar{y} - b\bar{x}$.

Substitute $\bar{y} - b\bar{x}$ for a and simplify.

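Verification of LS Method

$$\sum (\hat{y}_i - y_i)^2 = \left(\sum y_i^2 - n\bar{y}^2\right) + b^2\left(\sum x_i^2 - n\bar{x}^2\right) - 2b\left(\sum x_i y_i - n\bar{x}\bar{y}\right)$$

$$= \left(\sum y_i^2 - n\bar{y}^2\right) + \left(\sum x_i^2 - n\bar{x}^2\right)\left[b^2 - 2b\,\frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right]$$

The left expression, $\sum y_i^2 - n\bar{y}^2$, is a constant value since the data is known, so we ignore it when minimizing. The factor in front of the bracket is also a constant, and $\sum x_i^2 - n\bar{x}^2 > 0$ as long as the $x_i$ are not all equal.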

Verification of LS Method

Complete the square on the second expression to get

$$\left[b - \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right]^2 - \left[\frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right]^2$$

Both terms are squares, so neither is negative. The second term does not depend on b, so subtracting it only lowers the expression by a fixed amount. Thus minimizing the first term will minimize the entire expression.

Verification of LS Method

The entire expression is minimal when

$$b - \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = 0$$

Thus

$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$

in the line of best fit $\hat{y} = a + bx$.
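The two formulas translate directly into code. The sketch below applies them to the car data with scaled years 1 to 6; the function name `least_squares_line` is an illustrative choice.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x using the slope and intercept formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

xs = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

a, b = least_squares_line(xs, cars)
print(round(a, 1), round(b, 1))
```

The function divides by the sum-of-squares term, so it assumes the x-values are not all equal.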

Line of Best Fit

Derive’s calculated line of best fit is $y = 25.3x - 6.8$

Mean Squared Error is reduced from 98.3 for the eyeballed line of fit to 34.6 for the line of best fit.

Line of Best Fit (blue)


Polynomial of Best Fit

Is the line of best fit a better model than a quadratic or cubic polynomial?

Examine the line of best fit with the data. Is the data curvilinear?

Solution: The data starts above the line of best fit, then goes below, and finishes back above, an indication that the data is curvilinear.

Extension of Linear Method to Other Polynomial Functions

Polynomial of Best Fit

Derive calculated the quadratic of best fit for the Global Warming data:

$$y(x) = 2.2x^2 + 9.8x + 13.9$$

The MSE is reduced from 34.6 for the line of best fit to about 4.08 (a standard deviation of about 2.02) for the quadratic of best fit.
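The slides rely on Derive for this fit. As a rough stand-in, the sketch below solves the least squares normal equations for a quadratic with a small hand-written Gaussian elimination; both helper functions are illustrative, not Derive's method. It prints the MSE and its square root so the two measures are not confused.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix so row operations act on both sides at once.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def fit_polynomial(xs, ys, degree):
    """Least squares coefficients [c0, c1, ...] for c0 + c1*x + c2*x^2 + ..."""
    size = degree + 1
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(matrix, rhs)

xs = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

c0, c1, c2 = fit_polynomial(xs, cars, 2)
predicted = [c0 + c1 * x + c2 * x * x for x in xs]
mse = sum((y - p) ** 2 for y, p in zip(cars, predicted)) / len(xs)

print(round(c2, 3), round(c1, 3), round(c0, 3))
print(round(mse, 2), round(math.sqrt(mse), 2))
```

Normal equations are fine for six points and small x-values; for larger or badly scaled data a library routine would be the safer choice.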


Polynomial of Best Fit

Model: Higher Ed. Cost

Find a model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S.

What is the predicted cost in 1990? 2005?

When will the cost reach $3500?

Year    Cost
1975    599
1980    840
1985    1386
1988    1726
1989    1846
1990    2006

Higher Ed. Cost

Is the data linear or curvilinear?

Scale the data, then use Derive or Grapher to fit:

$C(y) = 95.7y + 491.4$ (sd 80.5)

$C(y) = 6.1y^2 + 671.7$ (sd 56.0)

Year    Scaled Year    Cost
1975    0              599
1980    5              840
1985    10             1386
1988    13             1726
1989    14             1846
1990    15             2006


Scatter plot & Models


Predicting Cost

The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is $C(y) = 6.1y^2 + 671.7$.

What is the predicted cost in 1990? What is the error in the prediction?

Predicted cost in 1990, which is scaled year y = 15: C(15) = 2044.2

Error in the prediction: Error = 2006 - 2044.2 = -38.2
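A short sketch of the prediction step, assuming the quadratic model and the scaling in which 1975 is year 0. The function name `tuition_model` is illustrative.

```python
def tuition_model(scaled_year):
    """Quadratic model C(y) = 6.1*y^2 + 671.7, with y = years since 1975."""
    return 6.1 * scaled_year ** 2 + 671.7

# Prediction and error for 1990 (scaled year 15); observed cost was 2006.
predicted_1990 = tuition_model(1990 - 1975)
error_1990 = 2006 - predicted_1990          # error = observed - predicted

# Prediction for 2005 (scaled year 30), well outside the data range.
predicted_2005 = tuition_model(2005 - 1975)

print(round(predicted_1990, 1), round(error_1990, 1), round(predicted_2005, 1))
```

The 2005 value is an extrapolation 15 years past the last data point, so it deserves less trust than the 1990 value.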

Predicting Year

The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is $C(y) = 6.1y^2 + 671.7$.

When will the cost reach $3500?

Solve the quadratic equation $3500 = 6.1y^2 + 671.7$.

Quadratic Equations

Methods of Solving Quadratic Equations

Factoring Method

Square Root Method

Completing the Square

Quadratic Formula

Square Root Method

Let y = 3500 in the model $y = 6.1x^2 + 671.7$.

The resulting quadratic equation has no linear term: $6.1x^2 + 671.7 = 3500$

How can we parallel the method of solving linear equations to solve this quadratic equation?

Square Root Method

Solving a quadratic with no linear term

Isolate the square term:

$6.1x^2 + 671.7 = 3500$

$6.1x^2 + 671.7 - 671.7 = 3500 - 671.7$

$6.1x^2 = 2828.3$

$x^2 = 2828.3/6.1$

Take the square root to find x:

$$x = \sqrt{\frac{2828.3}{6.1}} \approx 21.5$$

We take the positive root because x counts years after 1975, so the cost reaches $3500 about 21.5 years after 1975, during 1996.
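The square root method amounts to two arithmetic steps, sketched here under the assumption that the scaled year 0 is 1975. Variable names are illustrative.

```python
import math

target_cost = 3500

# Isolate the square term: 6.1*x^2 = target - 671.7, then divide by 6.1.
x_squared = (target_cost - 671.7) / 6.1

# Take the positive square root, since x counts years after 1975.
scaled_year = math.sqrt(x_squared)
calendar_year = 1975 + scaled_year

print(round(scaled_year, 1), int(calendar_year))
```

The negative root also satisfies the equation but would point to a year before the data begins, so it is discarded.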

Solving Quadratic Equation with a Linear Term

Quadratic of Best Fit with Linear Term: $y = 3.7x^2 + 38.9x + 587.6$ (sd 22.4)

Let y = 3500 and solve the resulting quadratic equation with a linear term:

$3.7x^2 + 38.9x + 587.6 = 3500$

How do we solve such equations?

Completing the Square Method

Solve by converting to a perfect square and using the Square Root Method.

$x^2 + 4x - 5 = 0$

Isolate the x terms: $x^2 + 4x = 5$

Complete the square: $x^2 + 4x + 2^2 = 5 + 2^2$

$(x+2)^2 = 9$

Completing the Square Method

Take the square root and solve:

$(x+2)^2 = 9$

$\sqrt{(x+2)^2} = \pm\sqrt{9}$

$x + 2 = 3$ or $x + 2 = -3$

$x = 1$ or $x = -5$
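The same steps can be sketched for any equation of the form x^2 + px + q = 0. The function name and the restriction to a leading coefficient of 1 are assumptions made for this illustration.

```python
import math

def solve_by_completing_square(p, q):
    """Solve x^2 + p*x + q = 0 by rewriting it as (x + p/2)^2 = (p/2)^2 - q."""
    half = p / 2
    right_side = half ** 2 - q
    if right_side < 0:
        raise ValueError("no real solutions")
    root = math.sqrt(right_side)
    # x + p/2 = +root or x + p/2 = -root
    return -half + root, -half - root

roots = solve_by_completing_square(4, -5)   # x^2 + 4x - 5 = 0
print(roots)
```

For the example, p/2 = 2 and the right side is 9, matching the hand calculation.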

Quadratic Formula

Complete the square on the general quadratic to get a general solution.

$ax^2 + bx + c = 0$

$ax^2 + bx = -c$

$$x^2 + \frac{b}{a}x = -\frac{c}{a}$$

$$x^2 + \frac{b}{a}x + \left(\frac{b}{2a}\right)^2 = -\frac{c}{a} + \left(\frac{b}{2a}\right)^2$$

Quadratic Formula

$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}$$

$$x + \frac{b}{2a} = \pm\sqrt{\frac{b^2 - 4ac}{4a^2}}$$

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

Quadratic Formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

Use the Quadratic Formula to solve $3.7x^2 + 38.9x + 587.6 = 3500$

$3.7x^2 + 38.9x - 2912.4 = 0$

So a = 3.7, b = 38.9, and c = -2912.4

$x \approx -33.8$ or $x \approx 23.3$
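A minimal sketch of the formula applied to these coefficients. The function name `quadratic_roots` is illustrative, and it handles only the case of real roots.

```python
import math

def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 from the quadratic formula, smaller root first."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        raise ValueError("no real roots")
    root = math.sqrt(discriminant)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)

low_root, high_root = quadratic_roots(3.7, 38.9, -2912.4)
print(round(low_root, 2), round(high_root, 2))
```

Only the positive root is meaningful for the tuition problem, since x counts years after 1975.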

Graphic Method

Graph the related function for the Higher Education problem.

Equation: $3.7x^2 + 38.9x - 2912.4 = 0$

Related Function: $f(x) = 3.7x^2 + 38.9x - 2912.4$

Use Derive or a graphing calculator to generate a graph and zoom in until the error in the approximation is less than 0.01.
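Zooming in on a graph can be imitated numerically with bisection, which repeatedly halves an interval that contains the zero. This is a swapped-in technique, not what Derive or a calculator does internally. The sketch uses the constant 2912.4 from the equation, and the starting interval [20, 30] is an assumption read off from the sign change of f.

```python
def f(x):
    """Related function for the tuition problem."""
    return 3.7 * x ** 2 + 38.9 * x - 2912.4

def bisect(func, low, high, tolerance=0.01):
    """Halve [low, high] until it is narrower than tolerance; func must change sign on it."""
    if func(low) * func(high) > 0:
        raise ValueError("func must have opposite signs at the endpoints")
    while high - low >= tolerance:
        mid = (low + high) / 2
        if func(low) * func(mid) <= 0:
            high = mid      # the zero lies in the left half
        else:
            low = mid       # the zero lies in the right half
    return (low + high) / 2

root = bisect(f, 20, 30)
print(round(root, 2))
```

Each halving plays the role of one zoom step, and the final interval width bounds the error in x.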

Graphic Solution

What is the approximation and error from this graph?


Least Squares Modeling

The End

