Week06&07_Curve Fitting.pdf



WEEK 6 & 7

Curve Fitting
• Linear regression
• Polynomial regression
• Multiple regression
• General linear least squares
• Nonlinear regression


LESSON OUTCOMES

At the end of this topic, the students will be able:
• To fit data using linear and polynomial regression
• To fit data using multiple linear and nonlinear regression
• To assess and choose the preferred method for any particular problem


Curve Fitting

• Describes techniques to fit curves (curve fitting) to discrete data to obtain intermediate estimates.
• There are two general approaches for curve fitting:
 – Data exhibit a significant degree of error. The strategy is to derive a single curve that represents the general trend of the data.
 – Data are very precise. The strategy is to pass a curve or a series of curves through each of the points.
• In engineering, two types of applications are normally encountered when fitting experimental data:
 – Trend analysis: predicting values of the dependent variable, which may include extrapolation beyond the data points or interpolation between data points.
 – Hypothesis testing: comparing an existing mathematical model with measured data.


Three attempts to fit a “best” curve through uncertain data points:
a) Least-squares regression (linear, polynomial, multiple, general linear least squares, nonlinear)
b) Linear interpolation
c) Curvilinear interpolation


Mathematical Background in Simple Statistics

• Arithmetic mean. The sum of the individual data points (y_i) divided by the number of points (n).
• Standard deviation. The most common measure of spread for a sample.

$$\bar{y} = \frac{\sum y_i}{n}, \qquad i = 1, \ldots, n$$

$$S_t = \sum \left( y_i - \bar{y} \right)^2$$

(S_t is the total sum of the squares of the residuals between the data points and the mean.)

$$s_y = \sqrt{\frac{S_t}{n-1}} \qquad \text{or} \qquad s_y = \sqrt{\frac{\sum y_i^2 - \left( \sum y_i \right)^2 / n}{n-1}}$$


• Variance. Representation of spread by the square of the standard deviation.
• Coefficient of variation. Provides a normalized measure of the spread of the data.

$$s_y^2 = \frac{\sum \left( y_i - \bar{y} \right)^2}{n-1}$$

(The denominator n - 1 is the number of degrees of freedom.)

$$c.v. = \frac{s_y}{\bar{y}} \times 100\%$$
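As a minimal sketch, these statistics translate directly into Python (plain NumPy; the function and variable names here are ours, not from the slides):

```python
import numpy as np

def simple_stats(y):
    """Arithmetic mean, total sum of squares, standard deviation,
    variance, and coefficient of variation of a sample."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.sum() / n                  # arithmetic mean
    St = np.sum((y - ybar) ** 2)        # total sum of squares about the mean
    var = St / (n - 1)                  # variance (n - 1 degrees of freedom)
    sy = np.sqrt(var)                   # standard deviation
    cv = sy / ybar * 100.0              # coefficient of variation, in percent
    return ybar, St, sy, var, cv
```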


Least-Squares Regression

• Regression analysis is the study of the relationships among variables.
• Obtain the ‘best’ straight line to fit through a set of uncertain data points.
• Calculate the slope and the intercept of the line.
• The ‘best’ polynomial can also be fitted to data in the same way.
• Multiple linear regression considers the case where one variable depends on two or more variables in a linear fashion.


Linear Regression

• Fitting a straight line to a set of paired observations: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).
• The mathematical expression for the straight line:

$$y = a_0 + a_1 x + e$$

where
 a_0 = intercept
 a_1 = slope
 e = error, or residual, between the model and the observations (e = y - a_0 - a_1 x; the discrepancy between the true value of y and the approximate value a_0 + a_1 x).


Criteria for a ‘Best’ Fit

• One option is to minimize the sum of the residual errors for all available data:

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)$$

where n = total number of points. However, this is an inadequate criterion because positive and negative errors cancel.

• Another logical criterion is to minimize the sum of the absolute values of the discrepancies:

$$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} \left| y_i - a_0 - a_1 x_i \right|$$

However, this criterion is also inadequate: it does not yield a unique best fit.


• The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_{i,\text{measured}} - y_{i,\text{model}} \right)^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \qquad \text{Eq. (17.3)}$$

• This criterion yields a unique line for a given set of data. (By contrast, the minimax criterion, which chooses the line that minimizes the maximum distance an individual point falls from the line, is ill-suited for regression.)


Least-Squares Fit of a Straight Line

• To determine values for a0 and a1, Eq. (17.3) is differentiated with respect to each coefficient:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left( y_i - a_0 - a_1 x_i \right)$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[ \left( y_i - a_0 - a_1 x_i \right) x_i \right]$$

• Setting the derivatives equal to zero results in a minimum S_r:

$$\sum \left( y_i - a_0 - a_1 x_i \right) = 0$$

$$\sum \left[ \left( y_i - a_0 - a_1 x_i \right) x_i \right] = 0$$

where $\sum a_0 = n a_0$.


 

• Expanding gives the normal equations, which can be solved simultaneously:

$$n a_0 + \left( \sum x_i \right) a_1 = \sum y_i$$

$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i$$

• Solving by using Cramer's rule:

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}$$

$$a_0 = \bar{y} - a_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the mean values of x and y.
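As a sketch, these formulas translate directly into Python (the function name `linreg` is ours, not from the slides):

```python
import numpy as np

def linreg(x, y):
    """Least-squares straight line y = a0 + a1*x via the normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    a1 = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
    a0 = y.mean() - a1 * x.mean()       # a0 = ybar - a1 * xbar
    return a0, a1
```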


Example 1

Fit a straight line to the x and y values in the first two columns of Table 1.

x    y
1.0  0.5
2.0  2.5
3.0  2.0
4.0  4.0
5.0  3.5
6.0  6.0
7.0  5.5

The following quantities can be computed: n = 7, Σx_i = 28, Σy_i = 24, Σx_i² = 140, Σx_i y_i = 119.5, x̄ = 28/7 = 4, ȳ = 24/7 = 3.4286.

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} = \frac{7(119.5) - (28)(24)}{7(140) - (28)^2} = 0.8393$$

$$a_0 = \bar{y} - a_1 \bar{x} = 3.4286 - 0.8393(4) = 0.0714$$

The least-squares fit is

$$y = 0.0714 + 0.8393x$$
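Applying the `linreg` sketch from above to this data reproduces the hand computation:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = linreg(x, y)
print(a0, a1)   # approximately 0.0714 and 0.8393, as computed above
```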


[Figure: data of Example 1 with the least-squares line y = 0.8393x + 0.0714, R² = 0.8683.]


Exercise 1

Fit the best straight line to the following set of x and y values using the method of least squares.

x y

0 2

1 5

2 9

3 15

4 17

5 24

6 25


[Figure: Exercise 1 data with the least-squares line y = 4.1071x + 1.5357, R² = 0.9822.]


Quantification of Error of Linear Regression

• Recall that the sum of the squares of the residuals, which measures the deviation of the regression line from the actual data values, is

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$

• The square of the residual represents the square of the vertical distance between the data and another measure of central tendency: the straight line.


• The analogy can be extended further for cases where:
 – The spread of the points around the line is of similar magnitude along the entire range of the data.
 – The distribution of these points about the line is normal.
• The following statistical formulas may be used to quantify the error associated with linear regression:

$$S_t = \sum \left( y_i - \bar{y} \right)^2, \qquad s_y = \sqrt{\frac{S_t}{n-1}}$$

$$S_r = \sum \left( y_i - a_0 - a_1 x_i \right)^2, \qquad s_{y/x} = \sqrt{\frac{S_r}{n-2}}$$


• The standard error of the estimate, s_{y/x}, quantifies the spread of the data around the regression line.
• The original standard deviation, s_y, quantifies the spread around the mean.


“Goodness” of Our Fit

Determine:
1) The total sum of the squares around the mean for the dependent variable y, S_t.
2) The sum of the squares of the residuals around the regression line, S_r.
3) The difference between the two quantities, S_t - S_r, which quantifies the improvement (error reduction) due to describing the data in terms of a straight line rather than as an average value.
4) The coefficient of determination, r², and the correlation coefficient, r.


The coefficient of determination:

$$r^2 = \frac{S_t - S_r}{S_t}$$

• For a perfect fit, S_r = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.
• For r = r² = 0, S_r = S_t and the fit represents no improvement.

A more convenient form for computation of the correlation coefficient r is

$$r = \frac{n \sum x_i y_i - \left( \sum x_i \right)\left( \sum y_i \right)}{\sqrt{n \sum x_i^2 - \left( \sum x_i \right)^2}\, \sqrt{n \sum y_i^2 - \left( \sum y_i \right)^2}}$$
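As a sketch, the error measures for a fitted line can be computed as follows (the function name is ours):

```python
import numpy as np

def regression_stats(x, y, a0, a1):
    """St, Sr, sy, sy/x, and r**2 for a fitted line y = a0 + a1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    St = np.sum((y - y.mean()) ** 2)        # spread around the mean
    Sr = np.sum((y - a0 - a1 * x) ** 2)     # spread around the regression line
    sy = np.sqrt(St / (n - 1))              # total standard deviation
    syx = np.sqrt(Sr / (n - 2))             # standard error of the estimate
    r2 = (St - Sr) / St                     # coefficient of determination
    return St, Sr, sy, syx, r2
```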


Example 3

Determine the total standard deviation, standard error of the estimate, and coefficient of correlation for the linear regression line obtained in Example 1.

x_i   y_i   (y_i - ȳ)²   (y_i - a0 - a1·x_i)²
1.0   0.5   8.5765       0.1687
2.0   2.5   0.8622       0.5625
3.0   2.0   2.0408       0.3473
4.0   4.0   0.3265       0.3265
5.0   3.5   0.0051       0.5897
6.0   6.0   6.6122       0.7970
7.0   5.5   4.2908       0.1994
Σ     24.0000  22.7143   2.9911


Solution

$$S_t = \sum \left( y_i - \bar{y} \right)^2 = 22.7143$$

$$s_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{22.7143}{7-1}} = 1.9457$$

$$S_r = \sum \left( y_i - a_0 - a_1 x_i \right)^2 = 2.9911$$

$$s_{y/x} = \sqrt{\frac{S_r}{n-2}} = \sqrt{\frac{2.9911}{7-2}} = 0.7734$$

$$r^2 = \frac{S_t - S_r}{S_t} = \frac{22.7143 - 2.9911}{22.7143} = 0.868$$

$$r = \sqrt{0.868} = 0.932$$


Exercise 4

Determine the total standard deviation, standard error of the estimate, and coefficient of correlation for the linear regression line obtained in Exercise 1.

x  y
0  2
1  5
2  9
3  15
4  17
5  24
6  25


Linearization of Nonlinear Relationships

• Many engineering functions exhibit nonlinear relationships.
• These functions cannot be adequately described by a straight line.
• Transformations can be used to express the data in a form that is compatible with linear regression.
• Examples of functions that can be linearized are the
 – Exponential equation
 – Power equation
 – Saturation-growth-rate equation


• Exponential equation: $y = \alpha_1 e^{\beta_1 x}$, linearized as $\ln y = \ln \alpha_1 + \beta_1 x$.
• Power equation: $y = \alpha_2 x^{\beta_2}$, linearized as $\log y = \log \alpha_2 + \beta_2 \log x$.
• Saturation-growth-rate equation: $y = \alpha_3 \dfrac{x}{\beta_3 + x}$, linearized as $\dfrac{1}{y} = \dfrac{1}{\alpha_3} + \dfrac{\beta_3}{\alpha_3} \cdot \dfrac{1}{x}$.
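As an illustration, the power equation can be fitted with ordinary linear regression after a log transform (a sketch; `fit_power` is our own name):

```python
import numpy as np

def fit_power(x, y):
    """Fit y = alpha * x**beta by straight-line regression in log10 space."""
    lx = np.log10(np.asarray(x, dtype=float))
    ly = np.log10(np.asarray(y, dtype=float))
    n = lx.size
    # Straight-line fit of log y = log alpha + beta * log x.
    beta = (n * np.sum(lx * ly) - lx.sum() * ly.sum()) / (n * np.sum(lx**2) - lx.sum()**2)
    log_alpha = ly.mean() - beta * lx.mean()
    return 10.0 ** log_alpha, beta          # back-transform the intercept
```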


Polynomial Regression

• Some engineering data are poorly represented by a straight line; the error of a linear fit is high.
• For such data (e.g., panels (a) and (b) of the figure in the text), a curve would be better suited to fit the data.
• Instead of trying to linearize some of the nonlinear functions and use linear regression, we may alternatively fit polynomials to the data using polynomial regression.
• The least-squares procedure can be readily extended to fit the data to a higher-order polynomial.


• For a second-order polynomial (quadratic):

$$y = a_0 + a_1 x + a_2 x^2 + e$$

• For this case, the sum of the squares of the residuals is

$$S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2$$

• Taking the derivative with respect to each of the unknown coefficients of the polynomial:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)$$

$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)$$

• Setting these equations to zero and rearranging yields the set of normal equations:

$$n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i$$

$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i$$

$$\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i$$


• The coefficients of the unknowns can be calculated directly from the observed data.
• The resulting system of three simultaneous linear equations is then solved for a0, a1, and a2.
• The standard error for polynomial regression is formulated as

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$

where m is the order of the polynomial.

• The coefficient of determination for polynomial regression is likewise

$$r^2 = \frac{S_t - S_r}{S_t}$$


Example 6

Fit a second-order polynomial to the data:

x_i  y_i
0    2.1
1    7.7
2    13.6
3    27.2
4    40.9
5    61.1

Solution: refer to page 472 of the text book.
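As a sketch of the procedure (not the textbook's worked solution), the 3×3 normal-equation system for the Example 6 data can be assembled and solved with NumPy:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
n = x.size

# Normal equations for y = a0 + a1*x + a2*x**2.
A = np.array([
    [n,             x.sum(),        (x**2).sum()],
    [x.sum(),       (x**2).sum(),   (x**3).sum()],
    [(x**2).sum(),  (x**3).sum(),   (x**4).sum()],
])
b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

a0, a1, a2 = np.linalg.solve(A, b)
print(a0, a1, a2)   # compare with the worked solution on page 472 of the text
```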


Multiple Linear Regression

The case where y is a linear function of two or more independent variables:

$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$

The sum of the squares of the residuals,

$$S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_{1i} - a_2 x_{2i} \right)^2$$

is minimized, leading to the normal equations

$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}$$

The standard error is

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$

Example: refer to page 475 of the text book.
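A sketch of the same normal equations in Python for two independent variables (`fit_multilinear` is our own name):

```python
import numpy as np

def fit_multilinear(x1, x2, y):
    """Least-squares fit of y = a0 + a1*x1 + a2*x2 via the normal equations."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    n = y.size
    A = np.array([
        [n,            x1.sum(),          x2.sum()],
        [x1.sum(),     (x1**2).sum(),     (x1 * x2).sum()],
        [x2.sum(),     (x1 * x2).sum(),   (x2**2).sum()],
    ])
    b = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
    return np.linalg.solve(A, b)            # a0, a1, a2
```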


• Multiple linear regression can also be extended to the power equation of the general form

$$y = a_0\, x_1^{a_1}\, x_2^{a_2} \cdots x_m^{a_m}$$

• This equation is extremely useful when fitting experimental data.
• To use multiple regression analysis, we transform the above equation by taking the base-10 logarithm, which yields

$$\log y = \log a_0 + a_1 \log x_1 + a_2 \log x_2 + \cdots + a_m \log x_m$$
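Combining the transform with the multiple-regression sketch above gives, for two variables (a hypothetical helper reusing `fit_multilinear`):

```python
import numpy as np

def fit_power_multi(x1, x2, y):
    """Fit y = a0 * x1**a1 * x2**a2 by multiple linear regression in log10 space."""
    log_a0, a1, a2 = fit_multilinear(np.log10(x1), np.log10(x2), np.log10(y))
    return 10.0 ** log_a0, a1, a2           # back-transform the intercept
```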


 General Linear Least Squares 

 

The general linear least-squares model:

$$y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_m z_m + e$$

where $z_0, z_1, \ldots, z_m$ are $m + 1$ basis functions. In matrix form:

$$\{Y\} = [Z]\{A\} + \{E\}$$

where
 [Z] = matrix of the calculated values of the basis functions at the measured values of the independent variable
 {Y} = observed values of the dependent variable
 {A} = unknown coefficients
 {E} = residuals

The sum of the squares of the residuals,

$$S_r = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} a_j z_{ji} \right)^2$$

is minimized by taking its partial derivative with respect to each of the coefficients and setting the resulting equations equal to zero.


• The normal equations can be expressed concisely in matrix form as

$$\left[ [Z]^T [Z] \right] \{A\} = \left\{ [Z]^T \{Y\} \right\}$$

• To solve for {A}, the matrix inverse can be employed:

$$\{A\} = \left[ [Z]^T [Z] \right]^{-1} \left\{ [Z]^T \{Y\} \right\}$$
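A sketch for an arbitrary set of basis functions (names are ours; a production code would prefer a QR-based solver for conditioning reasons, but the normal equations mirror the slides):

```python
import numpy as np

def general_lsq(x, y, basis):
    """Solve [Z^T Z]{A} = {Z^T {Y}} for the coefficients of y = sum_j a_j * z_j(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([z(x) for z in basis])   # Z[i, j] = z_j(x_i)
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

# Example: a quadratic fit expressed as general linear least squares.
quadratic_basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]
```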


 Nonlinear Regression 

• Nonlinear models are those that have a nonlinear dependence on their parameters. There are times when a nonlinear model must be fit to the data.
• As with linear least squares, nonlinear regression is based on determining the values of the parameters that minimize the sum of the squares of the residuals.
• The Gauss-Newton method is one procedure for achieving the least-squares criterion. It uses a Taylor series expansion to approximate the nonlinear equation in a linear form.
• Least-squares theory is then applied to obtain new parameter values that move in the direction of minimizing the residuals.


 

The relationship between a nonlinear model and the data:

$$y_i = f(x_i; a_0, a_1, \ldots, a_m) + e_i$$

For the two-parameter case, the model is expanded in a Taylor series around the current parameter estimates (j = the iteration counter):

$$f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)_j}{\partial a_0}\, \Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1}\, \Delta a_1$$

In matrix form:

$$\{D\} = [Z_j]\{\Delta A\} + \{E\}$$

where $[Z_j]$ is the matrix of partial derivatives of the function evaluated at the current parameter values,

$$[Z_j] = \begin{bmatrix} \partial f_1 / \partial a_0 & \partial f_1 / \partial a_1 \\ \partial f_2 / \partial a_0 & \partial f_2 / \partial a_1 \\ \vdots & \vdots \\ \partial f_n / \partial a_0 & \partial f_n / \partial a_1 \end{bmatrix}$$

and $\{D\}$ contains the differences between the measurements and the model values,

$$\{D\} = \begin{Bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{Bmatrix}$$

Applying linear least-squares theory to the linearized model gives the normal equations

$$\left[ [Z_j]^T [Z_j] \right] \{\Delta A\} = \left\{ [Z_j]^T \{D\} \right\}$$

which are solved for $\{\Delta A\}$, and the parameters are updated:

$$a_{0,j+1} = a_{0,j} + \Delta a_0, \qquad a_{1,j+1} = a_{1,j} + \Delta a_1$$

This procedure is repeated until the solution converges, i.e., until

$$|\varepsilon_a|_k = \left| \frac{a_{k,j+1} - a_{k,j}}{a_{k,j+1}} \right| \times 100\%$$

falls below an acceptable stopping criterion.

For example nonlinear models and a worked example, refer to page 483 of the text book.
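A minimal Gauss-Newton sketch; as a concrete model we assume f(x) = a0*(1 - exp(-a1*x)) (our choice for illustration, not necessarily the textbook's), with the derivatives, names, and stopping logic also ours:

```python
import numpy as np

def f(x, a0, a1):
    return a0 * (1.0 - np.exp(-a1 * x))

def gauss_newton(x, y, a0, a1, tol=0.01, max_iter=50):
    """Iterate [Zj^T Zj]{dA} = {Zj^T {D}} and update the parameters."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    for _ in range(max_iter):
        # Partial derivatives of f with respect to a0 and a1 at the current values.
        Zj = np.column_stack([1.0 - np.exp(-a1 * x),        # df/da0
                              a0 * x * np.exp(-a1 * x)])    # df/da1
        D = y - f(x, a0, a1)                                # residual vector {D}
        dA = np.linalg.solve(Zj.T @ Zj, Zj.T @ D)           # normal equations
        a0, a1 = a0 + dA[0], a1 + dA[1]
        # Approximate percent relative error as the stopping criterion.
        if np.all(np.abs(dA) / np.abs([a0, a1]) * 100.0 < tol):
            break
    return a0, a1
```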