Page 1

Curve-Fitting Regression

Page 2

Some Applications of Curve Fitting

• To fit curves to a collection of discrete points in order to obtain intermediate estimates or to provide trend analysis

Page 3

Some Applications of Curve Fitting

• Function approximation
  – e.g., in numerical integration:

    ∫[a,b] f(x) dx ≈ ∫[a,b] pn(x) dx,   where f(x) ≈ pn(x) and pn(x) is an nth-order polynomial

• Hypothesis testing
  – Compare a theoretical data model to empirical data collected through experiments to test whether they agree with each other.

Page 4

Two Approaches

• Regression – Find the "best" curve to fit the points. The curve does not have to pass through the points. (Fig (a))

• Interpolation – Fit a curve or series of curves that pass through every point. (Figs (b) & (c))

Page 5

Curve Fitting
• Regression
  – Linear Regression
  – Polynomial Regression
  – Multiple Linear Regression
  – Non-linear Regression
• Interpolation
  – Newton's Divided-Difference Interpolation
  – Lagrange Interpolating Polynomials
  – Spline Interpolation

Page 6

Linear Regression – Introduction

• Some data exhibit a linear relationship but contain noise.

• A curve that interpolates all of the points (which contain errors) would be a poor representation of the behavior of the data set.

• A straight line captures the linear relationship better.

Page 7

Linear Regression

Objective: fit the "best" line to the data points (which exhibit a linear relationship).

– How do we define "best"?
  • Pass through as many points as possible
  • Minimize the maximum residual over all points
  • Each point carries the same weight

Page 8

Linear Regression – Objective

• Given a set of points (x1, y1), (x2, y2), …, (xn, yn)

• Want to find a straight line y = a0 + a1x that best fits the points.

The error, or residual, at each given point can be expressed as

    ei = yi – a0 – a1xi

Page 9

Residual (Error) Measurement

Page 10

Criteria for a "Best" Fit

• Minimize the sum of residuals:

    Σ ei = Σ (yi – a0 – a1xi)        (sums over i = 1, …, n)

  – Inadequate
  – e.g., any line passing through the mid-points of the data would satisfy this criterion.

• Minimize the sum of absolute values of residuals (L1-norm):

    Σ |ei| = Σ |yi – a0 – a1xi|

  – The "best" line may not be unique.
  – e.g., any line within the upper and lower points would satisfy this criterion.

Page 11

Criteria for a "Best" Fit

• Minimax method: minimize the largest residual over all points (L∞-norm):

    min over (a0, a1) of  max over i of  |yi – a0 – a1xi|

  – Not easy to compute
  – Biased toward outliers
  – e.g., for a data set with an outlier, the line is affected strongly by the outlier.

Note: The minimax method is sometimes well suited for fitting a simple function to a complicated function. (Why?)

Page 12

Least-Square Fit

• Minimize the sum of squares of the residuals (L2-norm):

    Σ ei² = Σ (yi – a0 – a1xi)²

• Unique solution
• Easy to compute
• Closely related to statistics

How do we find a0 and a1 that minimize Σ (yi – a0 – a1xi)²?

Page 13

Least-Squares Fit of a Straight Line

Let Sr(a0, a1) = Σ (yi – a0 – a1xi)².

To minimize Sr(a0, a1), we can find a0 and a1 that satisfy

    ∂Sr/∂a0 = –2 Σ (yi – a0 – a1xi) = 0      →   Σ (yi – a0 – a1xi) = 0

    ∂Sr/∂a1 = –2 Σ xi (yi – a0 – a1xi) = 0   →   Σ xi (yi – a0 – a1xi) = 0

Page 14

Least-Squares Fit of a Straight Line

Expanding the two conditions gives

    Σ (yi – a0 – a1xi) = 0      →   Σ yi – n a0 – a1 Σ xi = 0        →   n a0 + (Σ xi) a1 = Σ yi

    Σ xi (yi – a0 – a1xi) = 0   →   Σ xiyi – a0 Σ xi – a1 Σ xi² = 0   →   (Σ xi) a0 + (Σ xi²) a1 = Σ xiyi

These are called the normal equations.

How do you find a0 and a1?

Page 15

Least-Squares Fit of a Straight Line

Solving the system of equations

    n a0 + (Σ xi) a1 = Σ yi
    (Σ xi) a0 + (Σ xi²) a1 = Σ xiyi

or, in matrix form,

    [ n      Σ xi  ] [a0]   [ Σ yi   ]
    [ Σ xi   Σ xi² ] [a1] = [ Σ xiyi ]

yields

    a1 = (n Σ xiyi – Σ xi Σ yi) / (n Σ xi² – (Σ xi)²)

    a0 = ȳ – a1 x̄,   where ȳ = (Σ yi)/n and x̄ = (Σ xi)/n

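To make the closed-form solution concrete, here is a minimal Python/NumPy sketch (not from the slides; the function name fit_line is an illustrative choice) that evaluates these formulas for the small data set used in the example on Page 29:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a0 + a1*x using the closed-form normal-equation solution."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # a1 = (n*sum(xi*yi) - sum(xi)*sum(yi)) / (n*sum(xi^2) - (sum(xi))^2)
    a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    # a0 = ybar - a1*xbar
    a0 = np.mean(y) - a1 * np.mean(x)
    return a0, a1

# Data from the slides' later example (X = 3, 5, 6; Y = 4, 1, 4)
a0, a1 = fit_line([3, 5, 6], [4, 1, 4])
print(a0, a1)   # expected: 4.0 and approximately -0.2143
```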
Page 16

Statistics Review

• Mean – the "best point" that minimizes the sum of squares of the residuals.

• Standard deviation – measures how the sample (data) spreads about the mean.
  – The smaller the standard deviation, the better the mean describes the sample.

    ȳ = (1/n) Σ yi                  (Mean)

    St = Σ (yi – ȳ)²                (Sum of the squares of the residuals)

    s = √( St / (n – 1) )           (Standard deviation)

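A tiny NumPy sketch of these three quantities (illustrative only; the sample reuses the Y-values 4, 1, 4 from the later example). Note that the (n – 1) denominator corresponds to ddof=1 in NumPy:

```python
import numpy as np

y = np.array([4.0, 1.0, 4.0])        # small sample (Y-values from the later example)
ybar = y.mean()                      # mean
St = np.sum((y - ybar) ** 2)         # sum of squares about the mean
s = np.sqrt(St / (len(y) - 1))       # standard deviation, (n - 1) in the denominator
print(ybar, St, s)                   # s matches np.std(y, ddof=1)
```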
Page 17

Quantification of Error of Linear Regression

    Sy/x = √( Sr / (n – 2) ),   where Sr = Σ ei² = Σ (yi – a0 – a1xi)²

Sy/x is called the standard error of the estimate.

Similar to the "standard deviation", Sy/x quantifies the spread of the data points around the regression line.

The notation "y/x" designates that the error is for a predicted value of y corresponding to a particular value of x.

Page 18

(a) Spread of the data around the mean of the dependent variable.

(b) Spread of the data around the best-fit line.

Linear regression with (a) small and (b) large residual errors.

Page 19

"Goodness" of our fit

• Let St be the sum of the squares around the mean for the dependent variable, y:

    St = Σ (yi – ȳ)²

• Let Sr be the sum of the squares of the residuals around the regression line (as defined earlier).

• St – Sr quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value.

Page 20

"Goodness" of our fit

    r² = (St – Sr) / St

    r²: coefficient of determination
    r: correlation coefficient

• For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.

• For r = r² = 0, Sr = St and the fit represents no improvement.

• e.g., r² = 0.868 means 86.8% of the original uncertainty has been "explained" by the linear model.

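Tying Pages 17–20 together, the following sketch (illustrative; np.polyfit is used here simply as a convenient line-fitting routine) computes Sr, St, the standard error Sy/x, and r² for the small data set from the later example:

```python
import numpy as np

x = np.array([3.0, 5.0, 6.0])        # data from the slides' example
y = np.array([4.0, 1.0, 4.0])

a1, a0 = np.polyfit(x, y, 1)         # least-squares line y = a0 + a1*x
resid = y - (a0 + a1 * x)

Sr = np.sum(resid ** 2)              # sum of squared residuals about the line
St = np.sum((y - y.mean()) ** 2)     # sum of squares about the mean
Syx = np.sqrt(Sr / (len(x) - 2))     # standard error of the estimate, Sy/x
r2 = (St - Sr) / St                  # coefficient of determination

print(a0, a1, Syx, r2)
```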
Page 21

Polynomial Regression – Objective

• Given n points (x1, y1), (x2, y2), …, (xn, yn)

• Want to find a polynomial of degree m

    y = a0 + a1x + a2x² + … + amx^m

  that best fits the points.

The error, or residual, at each given point can be expressed as

    ei = yi – a0 – a1xi – a2xi² – … – amxi^m

Page 22

Least-Squares Fit of a Polynomial

    Sr = Σ ei² = Σ (yi – a0 – a1xi – a2xi² – … – amxi^m)²

Setting ∂Sr/∂aj = 0 for j = 0, 1, …, m yields

    Σ xi^j (yi – a0 – a1xi – a2xi² – … – amxi^m) = 0

    →   a0 Σ xi^j + a1 Σ xi^(j+1) + a2 Σ xi^(j+2) + … + am Σ xi^(j+m) = Σ xi^j yi

The procedure for finding a0, a1, …, am that minimize the sum of squares of the residuals is the same as that used in linear least-squares regression.

Page 23

Least-Squares Fit of a Polynomial

In matrix form, the normal equations are

    [ n         Σ xi        Σ xi²      …   Σ xi^m      ] [a0]   [ Σ yi      ]
    [ Σ xi      Σ xi²       Σ xi³      …   Σ xi^(m+1)  ] [a1]   [ Σ xiyi    ]
    [ Σ xi²     Σ xi³       Σ xi⁴      …   Σ xi^(m+2)  ] [a2] = [ Σ xi²yi   ]
    [ …                                                 ] [… ]   [ …         ]
    [ Σ xi^m    Σ xi^(m+1)  …              Σ xi^(2m)   ] [am]   [ Σ xi^m yi ]

To find a0, a1, …, am that minimize Sr, we can solve this system of linear equations.

The standard error of the estimate becomes

    Sy/x = √( Sr / (n – (m+1)) )

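As a hedged sketch of this procedure (not from the slides; the function name and the sample data are made up), the normal-equation matrix can be built directly from the moment sums and solved; np.polyfit is shown only as a cross-check:

```python
import numpy as np

def polyfit_normal_equations(x, y, m):
    """Fit y = a0 + a1*x + ... + am*x^m by forming and solving the normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A[j][k] = sum(x_i^(j+k)),  b[j] = sum(x_i^j * y_i)
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)] for j in range(m + 1)])
    b = np.array([np.sum((x ** j) * y) for j in range(m + 1)])
    return np.linalg.solve(A, b)          # coefficients [a0, a1, ..., am]

x = [0.0, 1.0, 2.0, 3.0, 4.0]             # made-up data
y = [2.1, 7.7, 13.6, 27.2, 40.9]
print(polyfit_normal_equations(x, y, 2))
print(np.polyfit(x, y, 2)[::-1])          # same coefficients, reversed to [a0, a1, a2]
```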
Page 24

Multiple Linear Regression

• In linear regression, y is a function of one variable.

• In multiple linear regression, y is a linear function of multiple variables.

• Want to find the best-fitting linear equation

    y = a0 + a1x1 + a2x2 + … + amxm

• Same procedure to find a0, a1, a2, …, am that minimize the sum of squared residuals.

• The standard error of the estimate is

    Sy/x = √( Sr / (n – (m+1)) )

Page 25

General Linear Least Square

• All of simple linear, polynomial, and multiple linear regressions belong to the following general linear least squares model:

    y = a0z0 + a1z1 + a2z2 + … + amzm + e

  where z0, z1, …, zm are m+1 different functions of the x's (they can be any kind of functions).

• It is called "linear" because the dependent variable, y, is a linear function of the ai's.

Page 26

How Other Regressions Fit Into the Linear Least Square Model

• Polynomial:

    y = a0(1) + a1(x) + a2(x²) + … + am(x^m) + e
    i.e., z0 = 1, z1 = x, z2 = x², …, zm = x^m

• Multiple linear:

    y = a0(1) + a1(x1) + a2(x2) + … + am(xm) + e
    i.e., z0 = 1, z1 = x1, z2 = x2, …, zm = xm

• Others: the basis functions zj can be any functions of the independent variables, e.g. sin x, ln x, cos x, or products such as x1x2.

Page 27

General Linear Least Square

• Given n points, we have

    yi = a0 z0i + a1 z1i + a2 z2i + … + am zmi + ei,   i = 1, …, n

  where zji represents the value of the jth function at the ith point.

• We can express the above equations in matrix form as

    [ y1 ]   [ z01  z11  …  zm1 ] [ a0 ]   [ e1 ]
    [ y2 ]   [ z02  z12  …  zm2 ] [ a1 ]   [ e2 ]
    [ …  ] = [ …                ] [ …  ] + [ …  ]
    [ yn ]   [ z0n  z1n  …  zmn ] [ am ]   [ en ]

  or   y = Za + e

Page 28

General Linear Least Square

The sum of squares of the residuals can be calculated as

    Sr = Σ over i of ( yi – Σ over j of aj zji )²,   i = 1, …, n;  j = 0, …, m

To minimize Sr, we can set the partial derivatives of Sr to zeroes and solve the resulting normal equations.

The normal equations can be expressed concisely as

    ZTZa = ZTy

How should we solve this system?

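A minimal sketch of the general model (illustrative; the function name general_linear_lsq and the choice of basis functions are assumptions): build Z with one column per basis function zj and solve the normal equations ZTZa = ZTy.

```python
import numpy as np

def general_linear_lsq(z_funcs, x, y):
    """Fit y = a0*z0(x) + a1*z1(x) + ... by solving the normal equations Z^T Z a = Z^T y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([f(x) for f in z_funcs])   # one column per basis function
    a = np.linalg.solve(Z.T @ Z, Z.T @ y)          # normal equations
    return a

# Example: straight line y = a0 + a1*x, i.e. z0 = 1, z1 = x (data from the slides' example)
basis = [lambda x: np.ones_like(x), lambda x: x]
print(general_linear_lsq(basis, [3, 5, 6], [4, 1, 4]))   # approximately [4.0, -0.2143]
```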
Page 29

Example

• Find the straight line that best fits the following data in the least-squares sense.

    X   3   5   6
    Y   4   1   4

• A straight line can be expressed in the form y = a0 + a1x. That is, z0 = 1, z1 = x.

• Thus we can construct Z as

    Z = [ 1  3 ]
        [ 1  5 ]
        [ 1  6 ]

Page 30

Example

Our objective is to solve ZTZa = ZTy, or a = (ZTZ)^(-1) ZTy.

    ZTZ = [ 1  1  1 ] [ 1  3 ]   [  3  14 ]        ZTy = [ 1  1  1 ] [ 4 ]   [  9 ]
          [ 3  5  6 ] [ 1  5 ] = [ 14  70 ]              [ 3  5  6 ] [ 1 ] = [ 41 ]
                      [ 1  6 ]                                       [ 4 ]

Solving

    [  3  14 ] [a0]   [  9 ]
    [ 14  70 ] [a1] = [ 41 ]

gives a0 = 4 and a1 = –3/14 ≈ –0.2143.

The best line is y = 4 – 0.2143x.

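The arithmetic above can be verified with a few lines of NumPy (an illustrative check, not part of the slides):

```python
import numpy as np

Z = np.array([[1.0, 3.0],
              [1.0, 5.0],
              [1.0, 6.0]])
y = np.array([4.0, 1.0, 4.0])

print(Z.T @ Z)                              # [[ 3. 14.] [14. 70.]]
print(Z.T @ y)                              # [ 9. 41.]
print(np.linalg.solve(Z.T @ Z, Z.T @ y))    # [ 4.  -0.2143]
```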
Page 31

Solving ZTZa = ZTy

Note: Z is an n by (m+1) matrix.

• Gaussian elimination or LU decomposition
  – Less efficient
• Cholesky decomposition
  – Decompose ZTZ into RTR, where R is an upper triangular matrix.
  – Solve ZTZa = ZTy as RTRa = ZTy.
• QR decomposition
• Singular value decomposition

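A minimal sketch of the Cholesky route (assumptions: plain NumPy; np.linalg.solve is used for the two triangular solves for brevity, where a dedicated triangular solver such as scipy.linalg.solve_triangular would normally exploit the structure):

```python
import numpy as np

Z = np.array([[1.0, 3.0],
              [1.0, 5.0],
              [1.0, 6.0]])          # same Z as the worked example
y = np.array([4.0, 1.0, 4.0])

A = Z.T @ Z                         # normal-equation matrix (symmetric positive definite)
b = Z.T @ y

L = np.linalg.cholesky(A)           # A = L @ L.T, so R = L.T is the upper-triangular factor
w = np.linalg.solve(L, b)           # solve R^T w = b  (forward-substitution step)
a = np.linalg.solve(L.T, w)         # solve R a = w    (back-substitution step)
print(a)                            # approximately [ 4.  -0.2143]
```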
Page 32

Solving ZTZa = ZTy (Cholesky decomposition) **

• Given an n×m matrix Z.

• Suppose we have computed R (m×m) from ZTZ using Cholesky decomposition.

• If we add an additional column to Z, then the new R will be of the form

    [              r1,m+1    ]
    [   R (m×m)    …         ]
    [              rm,m+1    ]
    [ 0   …   0    rm+1,m+1  ]

  i.e., we only need to compute the (m+1)th column of R.

• Suitable for testing how much improvement in terms of least-squares fit a polynomial of one degree higher can provide.

Page 33

Linearization of Nonlinear Relationships

• Some non-linear relationships can be transformed so that, in the transformed space, the data exhibit a linear relationship.

• For example,

    Exponential equation:              y = a1 e^(b1 x)       →   ln y = ln a1 + b1 x
    Power equation:                    y = a2 x^(b2)         →   log y = log a2 + b2 log x
    Saturation-growth-rate equation:   y = a3 x / (b3 + x)   →   1/y = (b3/a3)(1/x) + 1/a3

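As a small illustration of the first (exponential) transformation, a sketch with made-up data: fit a straight line to (x, ln y) and map the intercept and slope back to a1 and b1.

```python
import numpy as np

# Made-up data roughly following an exponential trend
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.2, 5.6, 8.9, 15.0])

# Linearize: ln y = ln a1 + b1*x, then do an ordinary straight-line fit
b1, ln_a1 = np.polyfit(x, np.log(y), 1)
a1 = np.exp(ln_a1)
print(a1, b1)    # estimates of a1 and b1 in y = a1 * exp(b1 * x)
```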
Page 34

Fig 17.9

Page 35

Example

Find the saturation-growth-rate equation

    y = a x / (b + x)

that best fits the following data in the least-squares sense.

    X   1   2   3
    Y   4   1   4

Solution:

Step 1: Linearize the curve as

    1/y = (b/a)(1/x) + 1/a

i.e., y' = c1x' + c2, where y' = 1/y, x' = 1/x, c1 = b/a, and c2 = 1/a.

Page 36

Example

Step 2: Transform the data from the original space to the "linearized space".

    X          1     2     3
    Y          4     1     4
    X' = 1/X   1     1/2   1/3
    Y' = 1/Y   1/4   1     1/4

Step 3: Perform a linear least-squares fit for y' = c1x' + c2. From the data we have

    Z = [ 1     1 ]        y' = [ 1/4 ]
        [ 1/2   1 ]             [ 1   ]
        [ 1/3   1 ]             [ 1/4 ]

Solving ZTZc = ZTy' yields

    [c1]   [ –0.3462 ]
    [c2] = [  0.7115 ]

so a = 1/c2 = 1.4055 and b = c1·a = –0.4866.

Thus y = 1.4055x / (x – 0.4866) is an "acceptably good" curve that fits the data. (It is not optimal in the least-squares sense.)

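The numbers in Steps 2 and 3 can be reproduced with a short NumPy sketch (illustrative, not part of the slides):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 1.0, 4.0])

xp, yp = 1.0 / x, 1.0 / y                  # transformed data x' = 1/x, y' = 1/y
Z = np.column_stack([xp, np.ones_like(xp)])
c1, c2 = np.linalg.solve(Z.T @ Z, Z.T @ yp)

a, b = 1.0 / c2, c1 / c2                   # back-transform: a = 1/c2, b = c1*a
print(c1, c2)                              # approximately -0.3462, 0.7115
print(a, b)                                # approximately 1.4055, -0.4866
```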
Page 37

Linearization of Nonlinear Relationships

• The best least-squares fit in the transformed space ≠ the best least-squares fit in the original space.
  – For many applications, however, the parameters obtained from performing the least-squares fit in the transformed space are acceptable.

• Linearization of nonlinear relationships:
  – Sub-optimal result
  – Easy to compute

Page 38

Non-Linear Regression **

• The relationship among the parameters, the ai's, is non-linear and cannot be linearized using a direct method.

• For example,

    y = a0 (1 – e^(–a1 x))

• Objective: find a0 and a1 that minimize

    Σ ei² = Σ ( yi – a0 (1 – e^(–a1 xi)) )²

• Possible approaches to find the solution:
  – Apply minimization of a non-linear function.
  – Set the partial derivatives to zero and solve the resulting non-linear equations.
  – Gauss-Newton method

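In practice the minimization is usually done numerically; the sketch below (assuming SciPy is available, with made-up data) uses scipy.optimize.curve_fit, which performs this kind of non-linear least-squares fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a0, a1):
    """Non-linear model y = a0 * (1 - exp(-a1 * x))."""
    return a0 * (1.0 - np.exp(-a1 * x))

# Made-up sample data
x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])

params, _ = curve_fit(model, x, y, p0=[1.0, 1.0])   # p0 = initial guess for (a0, a1)
print(params)                                        # estimated a0, a1
```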
Page 39

Other Notes

• When performing a least-squares fit:
  – The order of the data in the table is not important.
  – The order in which you arrange the basis functions is not important.
  – e.g., a least-squares fit of y = a0 + a1x or y = b0x + b1 to

        X   3   5   6        X   6   3   5        X   5   6   3
        Y   4   1   4   or   Y   4   4   1   or   Y   1   4   4

    would yield the same straight line.

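A quick check of this claim (illustrative sketch): fitting the reordered data, and fitting with the basis written as y = b0x + b1, yields the same line.

```python
import numpy as np

# Same three points in different orders
x1, y1 = [3, 5, 6], [4, 1, 4]
x2, y2 = [6, 3, 5], [4, 4, 1]

# y = a0 + a1*x  (np.polyfit returns [slope, intercept])
print(np.polyfit(x1, y1, 1))     # same result for both orderings
print(np.polyfit(x2, y2, 1))

# y = b0*x + b1: swap the basis (column) order in Z; the fitted line is unchanged
x = np.array(x1, dtype=float); y = np.array(y1, dtype=float)
Z = np.column_stack([x, np.ones_like(x)])       # columns [x, 1] instead of [1, x]
b0, b1 = np.linalg.solve(Z.T @ Z, Z.T @ y)
print(b1, b0)                                   # intercept, slope: same line as above
```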
