Unit 4: Modeling Topic 6: Least Squares Method April 1, 2003.


Mathematical Modeling – Least Squares Section 2.3

Three Modeling Methods
Known Relationship – underlying mathematical setting is known
Finite Differences – theoretical data or hard science data with little scatter
Least Squares – modeling data with scatter

Model: Global Warming
Global warming is partly the result of burning fuels, which increases the amount of carbon dioxide in the air. Cars are one of the major sources of fuel consumption. Let’s examine the number of cars in the U.S. (in millions) as one variable of global warming.

Year Cars

1940 27.5

1950 40.3

1960 61.7

1970 89.3

1980 121.6

1990 150.5


Linear or Curvilinear?
Numeric Method – use the Finite Differences method to determine whether the data has a nearly linear trend. Is the first difference nearly constant?
Solution: Nearly so after 1960.

Year    Cars     1st Finite Difference
1940    27.5
1950    40.3     12.8
1960    61.7     21.4
1970    89.3     27.6
1980    121.6    32.3
1990    150.5    28.9
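The first differences in the table can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `first_differences` is an illustrative choice, not something from the slides.

```python
# Number of U.S. cars (in millions) by year, copied from the table above.
years = [1940, 1950, 1960, 1970, 1980, 1990]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def first_differences(values):
    """Return the differences between consecutive entries, rounded to 1 decimal."""
    return [round(later - earlier, 1) for earlier, later in zip(values, values[1:])]


diffs = first_differences(cars)
for year, diff in zip(years[1:], diffs):
    print(f"{year - 10}-{year}: {diff}")
```

If the differences were all nearly equal, a line would be a reasonable model; here they keep growing until about 1960 and only then level off.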


Linear or Curvilinear?

Year (scaled)    Cars
1                27.5
2                40.3
3                61.7
4                89.3
5                121.6
6                150.5

Graphic Method – plot the data to see the trend.
Estimate a line of best fit (an "eyeball" line of fit).
What point should the line pass through?
Solution: The midpoint of the data, that is, the point of means \((\bar{x}, \bar{y}) = (3.5, 81.8)\). Estimate the trend with an approximate slope of m = 30.
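As a sketch of the arithmetic behind the eyeball line: the midpoint is the pair of averages, and forcing a line of slope 30 through it fixes the intercept. The variable names below are illustrative.

```python
# Scaled years and car counts (millions) from the table above.
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


x_bar, y_bar = mean(x), mean(y)

# Eyeballed slope from the text; the intercept follows from requiring the
# line to pass through the mean point: y_bar = m * x_bar + b.
m = 30
b = y_bar - m * x_bar

print(f"mean point: ({x_bar:.1f}, {y_bar:.1f})")
print(f"eyeball line: y = {m}x + ({b:.1f})")
```

The resulting intercept, about -23.2, gives the eyeball line \(\hat{y} = 30x - 23.2\).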

Error in Model
Error = observed - predicted

Error
The error, or residual, for a model is the vertical distance between the actual (observed) value and the predicted (fitted) value.
Actual value: \((x, y)\)
Predicted value: \((x, \hat{y})\)
So Error \(= y - \hat{y}\)

Finding Total Error
Why not just add the errors to get the total error?
Solution: Positive and negative values cancel out (error cancellation). This gives a false sense of what an error of zero should mean.
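A small sketch of the cancellation, assuming a line of slope 30 through the midpoint (3.5, 81.8) of the car data, which has intercept 81.8 - 30(3.5) = -23.2. The signed errors nearly cancel even though individual errors are as large as 20.7.

```python
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def predict(xi, slope=30.0, intercept=-23.2):
    """Line of slope 30 through the midpoint of the data (an assumed example line)."""
    return slope * xi + intercept


errors = [yi - predict(xi) for xi, yi in zip(x, y)]   # observed - predicted

signed_total = sum(errors)                     # positives and negatives cancel
absolute_total = sum(abs(e) for e in errors)   # no cancellation
squared_total = sum(e ** 2 for e in errors)    # no cancellation

print(f"sum of errors:          {signed_total:.1f}")
print(f"sum of absolute errors: {absolute_total:.1f}")
print(f"sum of squared errors:  {squared_total:.2f}")
```

The signed sum is close to zero, while the absolute and squared sums show that the line misses every point.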

How do we find total error?
Make the errors positive by either
taking the absolute value of the errors, or
squaring the errors.
Sum the positive errors to find the total error.
The best fit line makes the positive sum of errors as small as possible.

Question

Why square deviations vs. absolute deviations?

Gauss-Markov Theorem: Of all linear, unbiased methods for fitting a line to data, the least squares method provides estimates with minimum variance (a measure of uncertainty), assuming the errors average zero, have equal variance, and are uncorrelated.

Gauss

Markov

Least Squares Method vs. Absolute Value Method
The Least Squares Method yields a unique best fit line whenever the data contain at least two different x-values, while the Absolute Value Method may yield many equally good lines.

Example: Use the NCTM Applet for Least Squares to approximate the line of best fit for the following set of points: (0,2), (0,4), (6,3), (6,5), (3,3.5)

Absolute Value Method – Pts: (0,2), (0,4), (6,3), (6,5), (3,3.5)

Least Squares Method – Pts: (0,2), (0,4), (6,3), (6,5), (3,3.5)
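The non-uniqueness can be checked by hand for these five points. The sketch below is not the NCTM applet; it simply evaluates three hand-picked candidate lines through (3, 3.5). All three tie on the sum of absolute errors, while the sum of squared errors singles out one of them (the least squares line for these points works out to y = 3 + x/6).

```python
points = [(0, 2), (0, 4), (6, 3), (6, 5), (3, 3.5)]


def abs_error(a, b):
    """Sum of absolute errors for the line y = a + b*x."""
    return sum(abs(y - (a + b * x)) for x, y in points)


def sq_error(a, b):
    """Sum of squared errors for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)


# Three illustrative lines through (3, 3.5); the intercepts 2, 3, 4 are
# assumptions chosen for this example.
candidates = [(a, (3.5 - a) / 3) for a in (2.0, 3.0, 4.0)]

for a, b in candidates:
    print(f"y = {a:.1f} + ({b:+.3f})x   "
          f"absolute: {abs_error(a, b):.2f}   squared: {sq_error(a, b):.2f}")
```

Any intercept between 2 and 4 (with the slope adjusted so the line still passes through (3, 3.5)) gives the same absolute total, which is why the Absolute Value Method has no single answer here.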

Outliers
An outlier is a point that lies farther from the line of best fit than most other points.
The least squares method is more sensitive to outliers than the absolute value method, because squaring magnifies large errors.

Absolute Value Method – Pts: 4 near line, one outlier

Least Squares Method – Pts: 4 near line, one outlier

Mean Squared Error
For a set of data (x, y) with n elements modeled by a line \(\hat{y} = mx + b\),
\[MSE = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2}{n}\]

Standard Deviation
Standard Deviation: a measure of error found by taking the square root of the MSE, which gives a measure of error in the same units as the original data.
\[MSE = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2}{n}\]
so the standard deviation is \(\sqrt{MSE}\).


Mean Squared Error
Calculate the MSE for the Global Warming data when modeled by the eyeball line of best fit \(\hat{y} = 30x - 23.2\).
\[MSE = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2}{n}\]

Year (scaled)    Cars Observed    Cars Predicted
1                27.5             6.8
2                40.3             36.8
3                61.7             66.8
4                89.3             96.8
5                121.6            126.8
6                150.5            156.8

Solution: \(MSE \approx 98.3\), so \(\sqrt{MSE} \approx \sqrt{98.3} \approx 9.915\)
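The calculation can be sketched in Python as follows. The function names are illustrative; the line and the data are the ones given in the text.

```python
import math

x = [1, 2, 3, 4, 5, 6]
observed = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def eyeball_line(xi):
    """Eyeball line of fit from the text: y-hat = 30x - 23.2."""
    return 30 * xi - 23.2


def mean_squared_error(actual, predicted):
    """Average of the squared errors (observed - predicted)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


predicted = [eyeball_line(xi) for xi in x]
mse = mean_squared_error(observed, predicted)
sd = math.sqrt(mse)

print(f"MSE = {mse:.1f}, standard deviation = {sd:.2f}")
```

Working from the unrounded MSE gives a standard deviation of about 9.91; taking the square root of the rounded value 98.3 gives 9.915.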

Some History - Gauss
Gauss came up with a mathematical model for fitting a line to data when he was 17 years old (in 1794)!
However, he did not publish his findings until 1809.

Some History - Legendre
Legendre published the first explicit account of the method of least squares in 1805, four years before Gauss published his.

But in Gauss’s publication, he referred to his earlier (1794) work, which created a controversy about who had first discovered the method.

Today both Gauss and Legendre are given credit for discovering the method of least squares independently.

Line of Best Fit
Least Squares Fit Method – an algebraic method of finding the best fit line.
This method gives the line that has the smallest possible mean squared error, and therefore the smallest standard deviation.

Verification of Least Squares Fit Method
Minimize MSE
\[MSE = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2}{n} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}\]

Verification of LS Method
n is a fixed constant, so minimize the numerator
\[\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2\]
where \(\hat{y}_i = \hat{y}(x_i) = a + bx_i\) is the predicted value and \(y_i\) is the actual observed value for \(x_i\).

Verification of LS Method
Substitute for \(\hat{y}_i\):
\[\sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \sum_{i=1}^{n} (a + bx_i - y_i)^2\]
Square:
\[= \sum_{i=1}^{n} (a^2 + 2abx_i - 2ay_i + b^2x_i^2 - 2bx_iy_i + y_i^2)\]


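Now expand the summation, which requires some summation rules:
Distribute Summation: \(\sum (ax_i + by_i) = a\sum x_i + b\sum y_i\)
Summation of Constant: \(\sum_{i=1}^{n} 5 = 5 + 5 + \cdots + 5 = 5n\)
Summation and Mean: \(\bar{x} = \frac{\sum x_i}{n}\) and \(\bar{y} = \frac{\sum y_i}{n}\), so \(\sum x_i = n\bar{x}\) and \(\sum y_i = n\bar{y}\)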


\[\sum_{i=1}^{n} (a^2 + 2abx_i - 2ay_i + b^2x_i^2 - 2bx_iy_i + y_i^2)\]
\[= na^2 + 2abn\bar{x} - 2an\bar{y} + b^2\sum x_i^2 - 2b\sum x_iy_i + \sum y_i^2\]


Now the line of best fit \(\hat{y}_i = a + bx_i\) passes through \((\bar{x}, \bar{y})\), so \(\bar{y} = a + b\bar{x}\), giving \(a = \bar{y} - b\bar{x}\).
Substitute \(\bar{y} - b\bar{x}\) for \(a\) and simplify:


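\[\sum (\hat{y}_i - y_i)^2 = b^2\left(\sum x_i^2 - n\bar{x}^2\right) - 2b\left(\sum x_iy_i - n\bar{x}\bar{y}\right) + \left(\sum y_i^2 - n\bar{y}^2\right)\]
\[= \left(\sum x_i^2 - n\bar{x}^2\right)\left[b^2 - 2b\,\frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right] + \left(\sum y_i^2 - n\bar{y}^2\right)\]

The left factor \(\sum x_i^2 - n\bar{x}^2 \ge 0\) is a constant value since the data is known (and it is positive whenever the \(x_i\) are not all equal). The last term \(\sum y_i^2 - n\bar{y}^2\) is also a constant. We ignore both when minimizing, which leaves the expression in brackets.

Complete the square on the bracketed expression to get
\[\left[b - \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right]^2 - \left[\frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\right]^2\]

Both terms are squares, so neither is negative.
The second squared term does not depend on b, so subtracting it lowers the expression by a fixed amount.
Thus minimizing the first squared term will minimize the entire expression.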

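Verification of LS Method
The entire expression is minimal when
\[b - \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = 0\]
Thus
\[b = \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}\]
and \(a = \bar{y} - b\bar{x}\) in the line of best fit \(\hat{y} = a + bx\).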
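The two closed-form formulas translate directly into code. The sketch below applies them to the scaled car data; the function names are illustrative and nothing here depends on Derive.

```python
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using the closed-form formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_xx = sum(xi * xi for xi in xs)
    denominator = sum_xx - n * x_bar ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    b = (sum_xy - n * x_bar * y_bar) / denominator
    a = y_bar - b * x_bar
    return a, b


def mse(xs, ys, a, b):
    """Mean squared error of the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys)) / len(xs)


a, b = least_squares_line(x, y)
line_mse = mse(x, y, a, b)
print(f"y-hat = {b:.1f}x + ({a:.1f}), MSE = {line_mse:.1f}")
```

For the car data this gives a slope near 25.3, an intercept near -6.8, and an MSE near 34.6.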

Line of Best Fit
Derive’s calculated line of best fit is y = 25.3x – 6.8

Mean Squared Error is reduced from 98.3 for the eyeballed line of fit to 34.6 for the line of best fit.

Line of Best Fit (blue)


Polynomial of Best Fit
Is the line of best fit a better model than a quadratic or cubic polynomial?
Examine the line of best fit with the data. Is the data curvilinear?
Solution: The data starts above the line of best fit, then goes below, and finishes back above, an indication that the data is curvilinear.

Extension of Linear Method to Other Polynomial Functions

Polynomial of Best Fit
Derive calculated the quadratic of best fit for the Global Warming data:
\[y(x) = 2.2x^2 + 9.8x + 13.9\]
The MSE is reduced from 34.6 for the line of best fit to about 4.08 for the quadratic of best fit (a standard deviation of about 2.02).
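One way to extend the linear method to polynomials is to solve the normal equations for the coefficients. The sketch below is a plain-Python illustration under that assumption, not a description of how Derive computes its fit; all function names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def polynomial_fit(xs, ys, degree):
    """Least squares coefficients [c0, c1, ...] for c0 + c1*x + c2*x^2 + ..."""
    size = degree + 1
    normal = [[sum(xi ** (r + c) for xi in xs) for c in range(size)]
              for r in range(size)]
    rhs = [sum(yi * xi ** r for xi, yi in zip(xs, ys)) for r in range(size)]
    return solve_linear_system(normal, rhs)


def mse(xs, ys, coeffs):
    """Mean squared error of the polynomial with the given coefficients."""
    def predict(xi):
        return sum(c * xi ** k for k, c in enumerate(coeffs))
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(xs, ys)) / len(xs)


c0, c1, c2 = polynomial_fit(x, y, 2)
quad_mse = mse(x, y, [c0, c1, c2])
print(f"y = {c2:.2f}x^2 + {c1:.2f}x + {c0:.2f}")
print(f"MSE = {quad_mse:.2f}, sqrt(MSE) = {math.sqrt(quad_mse):.2f}")
```

The coefficients agree with the quadratic quoted above after rounding, and the square root of the MSE comes out near 2.02.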

Polynomial of Best Fit

Model: Higher Ed. Cost

Find a model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S.

What is the predicted cost in 1990? 2005?

When will the cost reach $3500?

Year Cost

1975 599

1980 840

1985 1386

1988 1726

1989 1846

1990 2006

Higher Ed. Cost
Is the data linear or curvilinear?
Scale the data.
Use Derive or Grapher to fit:
\(C(y) = 95.7y + 491.4\) (sd 80.5)
\(C(y) = 6.1y^2 + 671.7\) (sd 56.0)

Year    Scaled year    Cost
1975    0              599
1980    5              840
1985    10             1386
1988    13             1726
1989    14             1846
1990    15             2006
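Both fitted models can be reproduced with the line-fitting formulas, because a quadratic with no linear term is a straight line in the transformed variable z = y². The sketch below assumes "sd" means the square root of the mean squared error, as defined earlier; the function names are illustrative.

```python
import math

scaled_year = [0, 5, 10, 13, 14, 15]   # years since 1975
cost = [599, 840, 1386, 1726, 1846, 2006]


def fit_line(us, vs):
    """Least squares fit v = slope*u + intercept; returns (slope, intercept)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    numerator = sum(u * v for u, v in zip(us, vs)) - n * u_bar * v_bar
    denominator = sum(u * u for u in us) - n * u_bar ** 2
    slope = numerator / denominator
    return slope, v_bar - slope * u_bar


def standard_deviation(us, vs, slope, intercept):
    """Square root of the mean squared error of the fitted line."""
    squared = [(v - (slope * u + intercept)) ** 2 for u, v in zip(us, vs)]
    return math.sqrt(sum(squared) / len(us))


# Linear model C(y) = m*y + b
m, b = fit_line(scaled_year, cost)
sd_linear = standard_deviation(scaled_year, cost, m, b)

# Quadratic model without a linear term, C(y) = k*y^2 + c,
# fitted as a straight line in z = y^2.
z = [yr ** 2 for yr in scaled_year]
k, c = fit_line(z, cost)
sd_quadratic = standard_deviation(z, cost, k, c)

print(f"C(y) = {m:.1f}y + {b:.1f} (sd {sd_linear:.1f})")
print(f"C(y) = {k:.1f}y^2 + {c:.1f} (sd {sd_quadratic:.1f})")
```

The smaller standard deviation of the second model is the reason the slides continue with the quadratic.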

Scatter plot & Models


Predicting Cost
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is
\(C(y) = 6.1y^2 + 671.7\)
Predicted cost in 1990, which is scaled year y = 15: C(15) = 2044.2
Error in the prediction: Error = 2006 – 2044.2 = -38.2
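The prediction and its error can be checked with a short sketch. The helper names and the base year argument are illustrative; the 2005 value answers the earlier question and is an extrapolation well beyond the data.

```python
def tuition_model(scaled_year):
    """Model from the text: C(y) = 6.1*y^2 + 671.7, with y = years since 1975."""
    return 6.1 * scaled_year ** 2 + 671.7


def scale(year, base_year=1975):
    """Convert a calendar year to the scaled year used by the model."""
    return year - base_year


predicted_1990 = tuition_model(scale(1990))
error_1990 = 2006 - predicted_1990        # observed - predicted
predicted_2005 = tuition_model(scale(2005))

print(f"1990: predicted {predicted_1990:.1f}, error {error_1990:.1f}")
print(f"2005: predicted {predicted_2005:.1f}")
```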


Predicting Year
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is
\(C(y) = 6.1y^2 + 671.7\)
When will the cost reach $3500?
Solve the quadratic equation \(3500 = 6.1y^2 + 671.7\)


Quadratic Equations

Methods of Solving Quadratic Equations
Factoring Method
Square Root Method
Completing the Square
Quadratic Formula

Square Root Method
Let y = 3500 in the model \(y = 6.1x^2 + 671.7\).
The resulting quadratic equation has no linear term: \(6.1x^2 + 671.7 = 3500\)
How can we parallel the method of solving linear equations to solve this quadratic equation?

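Square Root Method
Solving a quadratic with no linear term
Isolate the square term:
\(6.1x^2 + 671.7 = 3500\)
\(6.1x^2 + 671.7 - 671.7 = 3500 - 671.7\)
\(6.1x^2 = 2828.3\)
\(x^2 = 2828.3/6.1\)
Square root to find x:
\[x = \pm\sqrt{\frac{2828.3}{6.1}} \approx \pm 21.5\]
The positive root, scaled year 21.5, corresponds to 1975 + 21.5 = 1996.5, so the model reaches $3500 during 1996.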
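The same steps (isolate the square, then take the square root) can be sketched as a small function. The function name and the conversion back to a calendar year are illustrative additions.

```python
import math


def solve_square_root_method(a, c, target):
    """Solve a*x^2 + c = target by isolating x^2 and taking square roots.

    Returns (negative_root, positive_root); raises ValueError when no
    real solution exists.
    """
    if a == 0:
        raise ValueError("a must be nonzero")
    x_squared = (target - c) / a      # isolate the square term
    if x_squared < 0:
        raise ValueError("no real solution")
    root = math.sqrt(x_squared)       # square root step
    return -root, root


negative_root, positive_root = solve_square_root_method(6.1, 671.7, 3500)
calendar_year = 1975 + positive_root  # undo the scaling (1975 is year 0)

print(f"x = {negative_root:.1f} or x = {positive_root:.1f}")
print(f"cost reaches $3500 around {calendar_year:.1f}")
```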

Solving Quadratic Equation with a Linear Term
Quadratic of Best Fit with Linear Term: \(y = 3.7x^2 + 38.9x + 587.6\) (sd 22.4)
Let y = 3500 and solve the resulting quadratic equation with a linear term:
\(3.7x^2 + 38.9x + 587.6 = 3500\)
How do we solve such equations?

Completing the Square Method
Solve by converting to a perfect square and using the Square Root Method.
\(x^2 + 4x - 5 = 0\)
Isolate the x terms: \(x^2 + 4x = 5\)
Complete the square: \(x^2 + 4x + 2^2 = 5 + 2^2\)
\((x+2)^2 = 9\)

Completing the Square Method
Square root and solve \((x+2)^2 = 9\):
\(\sqrt{(x+2)^2} = \pm\sqrt{9}\)
x + 2 = 3 or x + 2 = -3
x = 1 or x = -5
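For a quadratic with leading coefficient 1, the steps above can be sketched as follows. The function name and the parameters p and q (for \(x^2 + px + q = 0\)) are illustrative.

```python
import math


def solve_by_completing_square(p, q):
    """Solve x^2 + p*x + q = 0 by completing the square."""
    half = p / 2                   # (x + p/2)^2 = x^2 + p*x + (p/2)^2
    right_side = -q + half ** 2    # move q across, add (p/2)^2 to both sides
    if right_side < 0:
        raise ValueError("no real solution")
    root = math.sqrt(right_side)   # x + p/2 = +root or -root
    return -half + root, -half - root


roots = solve_by_completing_square(4, -5)   # x^2 + 4x - 5 = 0
print(roots)
```

For the example equation this reproduces x = 1 and x = -5.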

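Quadratic Formula
Complete the square on the general quadratic to get a general solution.
\[ax^2 + bx + c = 0\]
\[ax^2 + bx = -c\]
\[x^2 + \frac{b}{a}x = -\frac{c}{a}\]
\[x^2 + \frac{b}{a}x + \left(\frac{b}{2a}\right)^2 = -\frac{c}{a} + \left(\frac{b}{2a}\right)^2\]

Quadratic Formula
\[\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}\]
\[x + \frac{b}{2a} = \pm\sqrt{\frac{b^2 - 4ac}{4a^2}}\]
\[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\]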

Quadratic Formula
\[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\]
Use the Quadratic Formula to solve \(3.7x^2 + 38.9x + 587.6 = 3500\)
\(3.7x^2 + 38.9x - 2912.4 = 0\)
So a = 3.7, b = 38.9, and c = -2912.4
\(x \approx -33.7\) or \(x \approx 23.2\)
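A direct implementation of the formula is sketched below; the function name is illustrative. With the coefficients rounded to one decimal as printed, the roots come out near -33.80 and 23.29, slightly different from the values above, which appear to come from the unrounded coefficients of the fitted quadratic.

```python
import math


def quadratic_formula(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0, smaller root first."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic")
    discriminant = b ** 2 - 4 * a * c
    if discriminant < 0:
        return ()                  # no real roots
    sqrt_d = math.sqrt(discriminant)
    roots = ((-b - sqrt_d) / (2 * a), (-b + sqrt_d) / (2 * a))
    return tuple(sorted(roots))


roots = quadratic_formula(3.7, 38.9, -2912.4)
print([round(r, 2) for r in roots])
```

Only the positive root is a meaningful scaled year for the tuition question.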

Graphic Method
Graph the related function for the Higher Education problem.
Equation: \(3.7x^2 + 38.9x - 2912.4 = 0\)
Related Function: \(f(x) = 3.7x^2 + 38.9x - 2912.4\)
Use Derive or a graphing calculator to generate a graph and zoom in until the error is less than 0.01.
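Zooming in on a graph can be imitated numerically. The sketch below swaps in the bisection method, which repeatedly halves an interval that contains the zero; it is a stand-in for the graphical zoom, not what Derive does internally. The starting interval [20, 30] is an assumption read off a rough graph.

```python
def f(x):
    """Related function from the text."""
    return 3.7 * x ** 2 + 38.9 * x - 2912.4


def zoom_in(func, low, high, tolerance=0.01):
    """Bisection: halve [low, high] until the zero is pinned within tolerance.

    Requires func(low) and func(high) to have opposite signs. Returns the
    midpoint of the final interval and the maximum possible error.
    """
    if func(low) * func(high) > 0:
        raise ValueError("the interval does not bracket a sign change")
    while (high - low) / 2 > tolerance:
        middle = (low + high) / 2
        if func(low) * func(middle) <= 0:
            high = middle          # zero lies in the left half
        else:
            low = middle           # zero lies in the right half
    return (low + high) / 2, (high - low) / 2


root, max_error = zoom_in(f, 20, 30)
print(f"x is about {root:.2f}, with error at most {max_error:.4f}")
```

Each halving plays the role of one zoom step, and the half-width of the final interval is the error bound the slide asks for.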

Graphic Solution


What is the approximation and error from this graph?

Least Squares Modeling

The End
