
MAE 4020/5020 – Numerical Methods with MATLAB

SECTION 7: CURVE FITTING

Introduction

K. Webb MAE 4020/5020

Curve Fitting

Often have data, $y$, that is a function of some independent variable, $x$, but the underlying relationship is unknown
- Know the $x$'s and $y$'s (perhaps only approximately), but don't know $f(x)$
- Measured data
- Tabulated data

Determine a function (i.e., a curve) that "best" describes the relationship between $x$ and $y$
- An approximation to (the unknown) $f(x)$
- This is curve fitting

Regression vs. Interpolation

We'll look at two categories of curve fitting:

Least-squares regression
- Noisy data: uncertainty in the $y$ value for a given $x$ value
- Want "good" agreement between the fitted curve and the data points
- The curve may not pass through any data points

Polynomial interpolation
- Data points are known exactly (noiseless data)
- Resulting curve passes through all data points

Before moving on to discuss least‐squares regression, we’ll first review a few basic concepts from statistics.

Review of Basic Statistics

Basic Statistical Quantities

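Arithmetic mean: the average or expected value

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Standard deviation (unbiased): a measure of the spread of the data about the mean

$$s_y = \sqrt{\frac{S_t}{n-1}}$$

where $S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of the squares of the residuals about the mean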

Variance: another measure of spread
- The square of the standard deviation
- Useful measure due to its relationship with the power and power spectral density of a signal or data set

$$s_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$$

Normal (Gaussian) Distribution

Many naturally-occurring random processes are normally distributed
- Measurement noise
- Very often assume the noise in our data is Gaussian

Probability density function (pdf):

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where $\sigma^2$ is the variance, and $\mu$ is the mean of the random variable, $x$

Statistics in MATLAB

Generation of random values
- Very useful for simulations including noise

Uniformly-distributed: rand(m,n)
- E.g., generate an m×n matrix of uniformly-distributed points between xmin and xmax:

x = xmin + (xmax - xmin)*rand(m,n)

Normally-distributed: randn(m,n)
- E.g., generate an m×n matrix of normally-distributed points with mean of mu and standard deviation of sigma:

x = mu + sigma*randn(m,n)
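As a quick check of the two expressions above, the following sketch draws samples and compares their statistics with the requested values. The interval, mean, standard deviation, sample count, and seed are all assumed values chosen for illustration.

```matlab
rng(0);                                   % fix the seed so the run is repeatable
xmin = 2; xmax = 5;                       % assumed interval for the uniform samples
mu = 10; sigma = 0.5;                     % assumed mean and standard deviation
u = xmin + (xmax - xmin)*rand(1000,1);    % uniformly distributed on [xmin, xmax]
g = mu + sigma*randn(1000,1);             % normally distributed about mu
fprintf('uniform: min %.3f, max %.3f\n', min(u), max(u));
fprintf('normal:  mean %.3f, std %.3f\n', mean(g), std(g));
```

The sample mean and standard deviation only approximate mu and sigma; the agreement should improve as the number of samples grows.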

Histogram plots: hist(x,Nbins) or hist(x,binx)
- Graphical depiction of the variation of random quantities
- Plots the frequency of occurrence of ranges (bins) of values
- Provides insight into the nature of the distribution

E.g., generate a histogram of the values in the vector x with 20 bins:

hist(x,20)

Optional input argument, binx, used in place of the number of bins, specifies x values at the centers of the bins

Built-in statistical functions: min.m, max.m, mean.m, var.m, std.m, median.m, mode.m

Linear Least Squares Regression

Linear Regression

Noisy data, $y$, values at known $x$ values

Suspect relationship between $x$ and $y$ is linear, i.e., assume

$$y = a_0 + a_1 x$$

Determine $a_0$ and $a_1$ that define the "best-fit" line for the data

How do we define the "best fit"?

Measured Data

Assumed a linear relationship between $x$ and $y$:

$$y = a_0 + a_1 x$$

Due to noise, can't measure the true value, $a_0 + a_1 x_i$, exactly at each $x_i$
- Can only approximate the true values

Measured values, $y_i$, are approximations
- True value plus some random error, or residual, $e_i$

$$y_i = a_0 + a_1 x_i + e_i$$

Best Fit Criteria

Noisy data do not all lie on a single line: there is a discrepancy between each point and the line fit to the data
- The error, or residual:

$$e_i = y_i - a_0 - a_1 x_i$$

Minimize some measure of this residual:
- Minimize the sum of the residuals: positive and negative errors can cancel, so the fit is not unique
- Minimize the sum of the absolute values of the residuals: effect of the sign of the error is eliminated, but still not a unique fit
- Minimize the maximum error (minimax criterion): excessive influence given to single outlying points

Least‐Squares Criterion

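Better fitting criterion is to minimize the sum of the squares of the residuals

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

Yields a unique best-fit line for a given set of data

The sum of the squares of the residuals is a function of the two fitting parameters, $a_0$ and $a_1$: $S_r(a_0, a_1)$

Minimize $S_r$ by setting its partial derivatives to zero and solving for $a_0$ and $a_1$

At its minimum point, the partial derivatives of $S_r$ with respect to $a_0$ and $a_1$ will be zero

$$\frac{\partial S_r}{\partial a_0} = -2\sum (y_i - a_0 - a_1 x_i) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2\sum (y_i - a_0 - a_1 x_i)\,x_i = 0$$

Breaking up the summation:

$$n\,a_0 + \left(\sum x_i\right) a_1 = \sum y_i \qquad (1)$$

$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i \qquad (2)$$

Normal Equations

(1) and (2) form a system of two equations with two unknowns, $a_0$ and $a_1$

In matrix form:

$$\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix} \qquad (3)$$

These are the normal equations

Normal equations can be solved for $a_0$ and $a_1$:

$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

$$a_0 = \frac{\sum y_i}{n} - a_1\frac{\sum x_i}{n} = \bar{y} - a_1\bar{x}$$

Or solve the matrix form of the normal equations, (3), in MATLAB using mldivide.m (\)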
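A minimal sketch of both solution routes follows: it solves the 2×2 normal equations with the backslash operator and evaluates the closed-form expressions for the slope and intercept. The data set (a line with slope 0.5 and intercept 2, plus Gaussian noise) and the seed are assumptions made for illustration.

```matlab
rng(1);                                   % repeatable noise
x = (0:9)';                               % assumed independent variable values
y = 2 + 0.5*x + 0.2*randn(size(x));       % assumed measurements: line plus noise
n = numel(x);
A = [n, sum(x); sum(x), sum(x.^2)];       % normal-equation matrix
b = [sum(y); sum(x.*y)];                  % right-hand side
a = A\b;                                  % a(1) = a0 (intercept), a(2) = a1 (slope)
% closed-form expressions, for comparison
a1 = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);
a0 = mean(y) - a1*mean(x);
fprintf('backslash:   a0 = %.4f, a1 = %.4f\n', a(1), a(2));
fprintf('closed form: a0 = %.4f, a1 = %.4f\n', a0, a1);
```

Both routes should agree to within round-off, since they solve the same pair of equations.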

Linear Least‐Squares ‐ Example

Noisy data with suspected linear relationship

Calculate summation terms in the normal equations: $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum x_i y_i$

Assemble normal equation matrices

Solve normal equations for vector of coefficients, $\mathbf{a} = [a_0 \;\; a_1]^T$, using mldivide.m

Goodness of Fit

How well does a function fit the data?
- Is a linear fit best? A quadratic, higher-order polynomial, or other nonlinear function?
- Want a way to quantify goodness of fit

Quantify spread of data about the mean prior to regression:

$$S_t = \sum (y_i - \bar{y})^2$$

Following regression, quantify spread of data about the regression line (or curve):

$$S_r = \sum (y_i - a_0 - a_1 x_i)^2$$

$S_t$ quantifies the spread of the data about the mean

$S_r$ quantifies spread about the best-fit line (curve)
- The spread that remains after the trend is explained
- The unexplained sum of the squares

$S_t - S_r$ represents the reduction in data spread after regression explains the underlying trend

Normalize to $S_t$: the coefficient of determination

$$r^2 = \frac{S_t - S_r}{S_t}$$

Coefficient of Determination

For a perfect fit:
- No variation in data about the regression line
- $S_r = 0 \rightarrow r^2 = 1$

If the fit provides no improvement over simply characterizing data by its mean value:
- $S_r = S_t \rightarrow r^2 = 0$

If the fit is worse at explaining the data than their mean value:
- $S_r > S_t \rightarrow r^2 < 0$

Calculate $r^2$ for previous example:

Don't rely too heavily on the value of $r^2$

Anscombe's famous data sets:
- Same line fit to all four data sets
- Same $r^2$ in each case

Chapra
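The coefficient of determination can be computed directly from its definition, $r^2 = (S_t - S_r)/S_t$. The sketch below does so for a line fit; the data set and seed are assumed for illustration and are not the data from the example slides.

```matlab
rng(2);                                   % repeatable noise
x = (0:9)';
y = 2 + 0.5*x + 0.2*randn(size(x));       % assumed noisy linear data
p = polyfit(x, y, 1);                     % best-fit line, p = [a1 a0]
yfit = polyval(p, x);                     % line evaluated at the data points
St = sum((y - mean(y)).^2);               % spread about the mean
Sr = sum((y - yfit).^2);                  % spread about the regression line
r2 = (St - Sr)/St;                        % coefficient of determination
fprintf('St = %.4f, Sr = %.4f, r^2 = %.4f\n', St, Sr, r2);
```

As the Anscombe data sets show, a value of $r^2$ near 1 should still be checked against a plot of the data and the fit.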

Linearization of Nonlinear Relationships

Nonlinear functions

Not all data can be explained by a linear relationship to an independent variable, e.g.

Exponential model

$$y = \alpha e^{\beta x}$$

Power equation

$$y = \alpha x^{\beta}$$

Saturation-growth-rate equation

$$y = \alpha\frac{x}{\beta + x}$$

Methods for nonlinear curve fitting:

Linearization of the nonlinear relationship
- Transform the dependent and/or independent data values
- Apply linear least-squares regression
- Inverse transform the determined coefficients back to those that define the nonlinear functional relationship

Nonlinear regression
- Treat as an optimization problem (more later)

Linearizing an Exponential Relationship

Have noisy data that is believed to be best described by an exponential relationship

$$y = \alpha e^{\beta x}$$

Linearize the fitting equation:

$$\ln(y) = \ln(\alpha) + \beta x$$

or

$$\ln(y) = a_0 + a_1 x$$

where $a_0 = \ln(\alpha)$ and $a_1 = \beta$

Fit a line to the transformed data using linear least-squares regression
- Determine $a_0$ and $a_1$
- Can calculate $r^2$ for the line fit to the transformed data
- Note that original $y$ data must be positive

Transform the linear fitting parameters, $a_0$ and $a_1$, back to the parameters defining the exponential relationship

Exponential fit:

$$y = \alpha e^{\beta x}$$

where $\alpha = e^{a_0}$ and $\beta = a_1$

Note that $r^2$ for the exponential fit is different than that for the line fit to the transformed data
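The exponential linearization can be sketched in a few lines: take the natural log of the data, fit a line, and transform the coefficients back. The "true" parameters, the multiplicative noise model (chosen so the data stay positive), and the seed are all assumptions made for illustration.

```matlab
rng(3);                                              % repeatable noise
x = linspace(0, 4, 25)';
alpha = 2; beta = 0.8;                               % assumed "true" parameters
y = alpha*exp(beta*x).*(1 + 0.05*randn(size(x)));    % multiplicative noise keeps y > 0
p = polyfit(x, log(y), 1);                           % line fit to (x, ln y): p = [a1 a0]
beta_fit  = p(1);                                    % beta  = a1
alpha_fit = exp(p(2));                               % alpha = exp(a0)
fprintf('alpha = %.3f, beta = %.3f\n', alpha_fit, beta_fit);
```

Because the fit minimizes the squared residuals of ln(y) rather than of y, the recovered parameters generally differ slightly from those a direct nonlinear fit would give.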

Linearizing a Power Equation

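Have noisy data that is believed to be best described by a power equation

$$y = \alpha x^{\beta}$$

Linearize the fitting equation:

$$\log(y) = \log(\alpha) + \beta\log(x)$$

or

$$\log(y) = a_0 + a_1\log(x)$$

where $a_0 = \log(\alpha)$ and $a_1 = \beta$

Fit a line to the transformed data using linear least-squares regression
- Determine $a_0$ and $a_1$
- Note that original data, both $x$ and $y$, must be positive

Transform the linear fitting parameters, $a_0$ and $a_1$, back to the parameters defining the power equation

Power equation:

$$y = \alpha x^{\beta}$$

where $\alpha = 10^{a_0}$ for base-10 logarithms (or $\alpha = e^{a_0}$ for natural logarithms) and $\beta = a_1$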

Linearizing a Saturation Growth‐Rate Equation

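Have noisy data that is believed to be best described by a saturation growth-rate equation

$$y = \alpha\frac{x}{\beta + x}$$

Linearize the fitting equation:

$$\frac{1}{y} = \frac{1}{\alpha} + \frac{\beta}{\alpha}\cdot\frac{1}{x}$$

or

$$\frac{1}{y} = a_0 + a_1\frac{1}{x}$$

where $a_0 = 1/\alpha$ and $a_1 = \beta/\alpha$

Fit a line to the transformed data using linear least-squares regression
- Determine $a_0$ and $a_1$

Transform the linear fitting parameters, $a_0$ and $a_1$, back to the parameters defining the saturation growth-rate equation

Saturation growth-rate equation:

$$y = \alpha\frac{x}{\beta + x}$$

where $\alpha = 1/a_0$ and $\beta = a_1/a_0$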
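The same three steps (transform, fit a line, transform back) apply to the saturation growth-rate model, this time using reciprocals of both variables. In the sketch below the "true" parameters, noise level, and seed are assumed values for illustration.

```matlab
rng(4);                                                 % repeatable noise
x = linspace(0.5, 10, 30)';
alpha = 5; beta = 2;                                    % assumed "true" parameters
y = alpha*x./(beta + x).*(1 + 0.02*randn(size(x)));     % noisy, strictly positive data
p = polyfit(1./x, 1./y, 1);                             % line fit to (1/x, 1/y): p = [a1 a0]
alpha_fit = 1/p(2);                                     % alpha = 1/a0
beta_fit  = p(1)/p(2);                                  % beta  = a1/a0
fprintf('alpha = %.3f, beta = %.3f\n', alpha_fit, beta_fit);
```

Taking reciprocals weights the small-$x$ data most heavily, so noise at small $x$ can have a large effect on the recovered parameters.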

Polynomial Regression

Polynomial Regression

So far we've looked at fitting straight lines to linear and linearized data sets

Can also fit $m$th-order polynomials directly to data using polynomial regression

Same fitting criterion as linear regression:
- Minimize the sum of the squares of the residuals
- $m+1$ fitting parameters for an $m$th-order polynomial
- $m+1$ normal equations

Assume, for example, that we have data we believe to be quadratic in nature

2nd-order polynomial regression

Fitting equation:

$$y = a_0 + a_1 x + a_2 x^2 + e$$

Best fit will minimize the sum of the squares of the residuals:

$$S_r = \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$$

Polynomial Regression – Normal Equations

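Best-fit polynomial coefficients will minimize $S_r = \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$
- Differentiate w.r.t. each coefficient and set to zero

$$\frac{\partial S_r}{\partial a_0} = -2\sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2\sum x_i\,(y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$$

$$\frac{\partial S_r}{\partial a_2} = -2\sum x_i^2\,(y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$$

Rearranging the normal equations yields

$$n\,a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$$

$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$$

$$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$$

Which can be put into matrix form:

$$\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}$$

This system of equations can be solved for the vector of unknown coefficients using MATLAB's mldivide.m (\)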

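For $m$th-order polynomial regression the normal equations are:

$$\begin{bmatrix} n & \sum x_i & \cdots & \sum x_i^m \\ \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_i^m & \sum x_i^{m+1} & \cdots & \sum x_i^{2m} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}$$

Again, this system of $m+1$ equations can be solved for the vector of $m+1$ unknown polynomial coefficients using MATLAB's mldivide.m (\)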
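For the quadratic case, the normal equations can be assembled and solved as sketched below, with polyfit.m used as a cross-check. The data (a quadratic with assumed coefficients plus Gaussian noise) and the seed are chosen for illustration only.

```matlab
rng(5);                                          % repeatable noise
x = linspace(-3, 3, 21)';
y = 1 - 2*x + 0.5*x.^2 + 0.3*randn(size(x));     % assumed noisy quadratic data
n = numel(x);
A = [n,         sum(x),    sum(x.^2);
     sum(x),    sum(x.^2), sum(x.^3);
     sum(x.^2), sum(x.^3), sum(x.^4)];
b = [sum(y); sum(x.*y); sum(x.^2.*y)];
a = A\b;                       % a = [a0; a1; a2], ascending powers
p = polyfit(x, y, 2);          % p = [a2 a1 a0], descending powers
fprintf('normal equations: %s\n', mat2str(a', 4));
fprintf('polyfit:          %s\n', mat2str(fliplr(p), 4));
```

Note the ordering difference: the normal-equation solution lists coefficients in ascending powers of x, while polyfit.m returns them in descending powers.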

Polynomial Regression – Example

Polynomial Regression – polyfit.m

p = polyfit(x,y,m)

- x: $n$-vector of independent variable data values
- y: $n$-vector of dependent variable data values
- m: order of the polynomial to be fit to the data
- p: $(m+1)$-vector of best-fit polynomial coefficients, in descending powers of x

Least-squares polynomial regression if $n > m+1$
- i.e., for over-determined systems

Polynomial interpolation if $n = m+1$
- Resulting fit passes through all (x,y) points (more later)


Note that the result matches that obtained by solving normal equations

Exercise

Determine the 4th‐order polynomial with roots at

Generate noiseless data points by evaluating this polynomial at integer values of from to

Add Gaussian white noise with a standard deviation of to your data points

Use polyfit.m to fit a 4th‐order polynomial to the noisy data

Calculate the coefficient of determination, $r^2$

Plot the noisy data points, along with the best-fit polynomial

Polynomial Regression Using polyfit.m

Multiple Linear Regression

Multiple Linear Regression

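We have so far fit lines or curves to data described by functions of a single variable

For functions of multiple variables, fit planes or surfaces to data

Linear function of two independent variables: multiple linear regression

$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$

Sum of the squares of the residuals is now

$$S_r = \sum (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i})^2$$

Multiple Linear Regression – Normal Equations

Differentiate $S_r$ w.r.t. the fitting coefficients and equate to zero

The normal equations:

$$\begin{bmatrix} n & \sum x_{1,i} & \sum x_{2,i} \\ \sum x_{1,i} & \sum x_{1,i}^2 & \sum x_{1,i}x_{2,i} \\ \sum x_{2,i} & \sum x_{1,i}x_{2,i} & \sum x_{2,i}^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1,i}\,y_i \\ \sum x_{2,i}\,y_i \end{bmatrix}$$

Solve as before; now the fitting coefficients, $a_i$, define a plane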
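A plane fit follows the same pattern as the line and quadratic fits. The sketch below assembles the 3×3 normal equations for two independent variables; the "true" plane coefficients, noise level, sample count, and seed are assumed for illustration.

```matlab
rng(6);                                          % repeatable data
x1 = 10*rand(40,1);                              % assumed first independent variable
x2 = 5*rand(40,1);                               % assumed second independent variable
y  = 3 + 2*x1 - 1.5*x2 + 0.1*randn(40,1);        % assumed noisy planar data
n = numel(y);
A = [n,       sum(x1),     sum(x2);
     sum(x1), sum(x1.^2),  sum(x1.*x2);
     sum(x2), sum(x1.*x2), sum(x2.^2)];
b = [sum(y); sum(x1.*y); sum(x2.*y)];
a = A\b;                                         % a = [a0; a1; a2] defines the plane
fprintf('a0 = %.3f, a1 = %.3f, a2 = %.3f\n', a(1), a(2), a(3));
```

The same coefficients can be obtained without forming the sums explicitly by applying the backslash operator to the matrix [ones(n,1) x1 x2] and the vector y.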

General Linear Least Squares Regression

General Linear Least Squares

We've seen three types of least-squares regression
- Linear regression
- Polynomial regression
- Multiple linear regression

All are special cases of general linear least-squares regression

$$y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_m z_m + e$$

The $z_j$'s are basis functions
- Basis functions may be nonlinear
- This is linear regression, because the dependence on the fitting coefficients, $a_j$, is linear

For linear regression, simple or multiple: $z_0 = 1$, $z_1 = x_1$, $z_2 = x_2$, …, $z_m = x_m$

For polynomial regression: $z_0 = 1$, $z_1 = x$, $z_2 = x^2$, …, $z_m = x^m$

In all cases, this is a linear combination of basis functions, which may, themselves, be nonlinear

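The general linear least-squares model:

$$y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_m z_m + e$$

Can be expressed in matrix form:

$$\mathbf{y} = \mathbf{Z}\mathbf{a} + \mathbf{e}$$

where $\mathbf{Z}$ is an $n \times (m+1)$ matrix, the design matrix, whose entries are the $m+1$ basis functions evaluated at the independent variable values corresponding to the $n$ measurements:

$$\mathbf{Z} = \begin{bmatrix} z_{01} & z_{11} & \cdots & z_{m1} \\ z_{02} & z_{12} & \cdots & z_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ z_{0n} & z_{1n} & \cdots & z_{mn} \end{bmatrix}$$

where $z_{ij}$ is the $i$th basis function evaluated at the $j$th independent variable value. (Note: $i$ is not the row index and $j$ is not the column index, here.)

The least-squares model is:

$$\mathbf{y} = \mathbf{Z}\mathbf{a} + \mathbf{e}$$

More measurements than coefficients: $n > m+1$
- $\mathbf{Z}$ is not square: tall and narrow
- Over-determined system
- $\mathbf{Z}^{-1}$ does not exist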

General Linear Least Squares – Design Matrix Example

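For example, consider fitting a quadratic to five measured values, $y_i$, at $x_i$, $i = 1, \ldots, 5$

Model is:

$$y = a_0 + a_1 x + a_2 x^2 + e$$

Basis functions are $z_0 = 1$, $z_1 = x$, and $z_2 = x^2$

Least-squares equation is

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \\ 1 & x_5 & x_5^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{bmatrix}$$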

General Linear Least Squares – Residuals

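Linear least-squares model is:

$$\mathbf{y} = \mathbf{Z}\mathbf{a} + \mathbf{e} \qquad (1)$$

Residual:

$$\mathbf{e} = \mathbf{y} - \mathbf{Z}\mathbf{a} \qquad (2)$$

Sum of the squares of the residuals:

$$S_r = \mathbf{e}^T\mathbf{e} = (\mathbf{y} - \mathbf{Z}\mathbf{a})^T(\mathbf{y} - \mathbf{Z}\mathbf{a}) \qquad (3)$$

Expanding,

$$S_r = \mathbf{y}^T\mathbf{y} - \mathbf{a}^T\mathbf{Z}^T\mathbf{y} - \mathbf{y}^T\mathbf{Z}\mathbf{a} + \mathbf{a}^T\mathbf{Z}^T\mathbf{Z}\mathbf{a} \qquad (4)$$

Deriving the Normal Equations

Best fit will minimize the sum of the squares of the residuals
- Differentiate $S_r$ with respect to the coefficient vector, $\mathbf{a}$, and set to zero

$$\frac{dS_r}{d\mathbf{a}} = \frac{d}{d\mathbf{a}}\left(\mathbf{y}^T\mathbf{y} - \mathbf{a}^T\mathbf{Z}^T\mathbf{y} - \mathbf{y}^T\mathbf{Z}\mathbf{a} + \mathbf{a}^T\mathbf{Z}^T\mathbf{Z}\mathbf{a}\right) = \mathbf{0} \qquad (5)$$

We'll need to use some matrix calculus identities:

$$\frac{d}{d\mathbf{a}}\left(\mathbf{a}^T\mathbf{b}\right) = \frac{d}{d\mathbf{a}}\left(\mathbf{b}^T\mathbf{a}\right) = \mathbf{b}, \qquad \frac{d}{d\mathbf{a}}\left(\mathbf{a}^T\mathbf{A}\mathbf{a}\right) = \left(\mathbf{A} + \mathbf{A}^T\right)\mathbf{a} \qquad (6)$$

Using the matrix derivative relationships, (6), with $\mathbf{b} = \mathbf{Z}^T\mathbf{y}$ and the symmetric matrix $\mathbf{A} = \mathbf{Z}^T\mathbf{Z}$,

$$\frac{dS_r}{d\mathbf{a}} = -2\mathbf{Z}^T\mathbf{y} + 2\mathbf{Z}^T\mathbf{Z}\mathbf{a} = \mathbf{0} \qquad (7)$$

Equation (7) is the matrix form of the normal equations:

$$\mathbf{Z}^T\mathbf{Z}\mathbf{a} = \mathbf{Z}^T\mathbf{y} \qquad (8)$$

Solution to (8) is the vector of least-squares fitting coefficients:

$$\mathbf{a} = \left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T\mathbf{y} \qquad (9)$$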

Solving the Normal Equations

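Remember, our starting point was the linear least-squares model:

$$\mathbf{y} = \mathbf{Z}\mathbf{a} + \mathbf{e} \qquad (10)$$

Couldn't we have solved (10) for the fitting coefficients as

$$\mathbf{a} = \mathbf{Z}^{-1}\mathbf{y}$$

No, must solve using (9), $\mathbf{a} = \left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T\mathbf{y}$, because:
- Don't have the true (noiseless) values, only the noisy measurements, $\mathbf{y}$
- We have an over-determined system
- $\mathbf{Z}$ is not square
- $\mathbf{Z}^{-1}$ does not exist

Solution to the linear least-squares problem is:

$$\mathbf{a} = \left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T\mathbf{y} = \mathbf{Z}^{\dagger}\mathbf{y}$$

$\mathbf{Z}^{\dagger} = \left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T$ is the Moore-Penrose pseudo-inverse of $\mathbf{Z}$

Use the pseudo-inverse to find the least-squares solution to an over-determined system

Goodness of fit characterized by the coefficient of determination:

$$r^2 = \frac{S_t - S_r}{S_t}$$

where $S_r$ is given by (3), $S_r = (\mathbf{y} - \mathbf{Z}\mathbf{a})^T(\mathbf{y} - \mathbf{Z}\mathbf{a})$,

and

$$S_t = \sum (y_i - \bar{y})^2 \qquad (15)$$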

General Least Squares in MATLAB

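Have $n$ measurements

$$\mathbf{y} = [y_1 \;\; y_2 \;\; \cdots \;\; y_n]^T$$

at known independent variable values

$$\mathbf{x} = [x_1 \;\; x_2 \;\; \cdots \;\; x_n]^T$$

and a model, defined by $m+1$ basis functions

$$y = a_0 z_0(x) + a_1 z_1(x) + \cdots + a_m z_m(x)$$

Generate design matrix by evaluating the $m+1$ basis functions at all $n$ values of $x$

$$\mathbf{Z} = \begin{bmatrix} z_0(x_1) & z_1(x_1) & \cdots & z_m(x_1) \\ z_0(x_2) & z_1(x_2) & \cdots & z_m(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ z_0(x_n) & z_1(x_n) & \cdots & z_m(x_n) \end{bmatrix}$$

Solve for vector of fitting coefficients as the solution to the normal equations

$$\mathbf{a} = \left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T\mathbf{y}$$

or by using mldivide.m (\)

a = Z\y

Result is the same, though the methods are different
- mldivide.m uses QR factorization to find the solution to over-determined systems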
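The steps above can be sketched as follows. The basis functions ($1$, $x$, and $\sin x$), the data, and the seed are assumptions chosen to show that the basis functions themselves may be nonlinear; pinv.m is included as a third route to the same coefficients.

```matlab
rng(7);                                              % repeatable noise
x = linspace(0, 6, 30)';
y = 1 + 0.5*x + 2*sin(x) + 0.1*randn(size(x));       % assumed noisy data
basis = {@(x) ones(size(x)), @(x) x, @(x) sin(x)};   % assumed basis functions z0, z1, z2
Z = zeros(numel(x), numel(basis));                   % n-by-(m+1) design matrix
for j = 1:numel(basis)
    Z(:,j) = basis{j}(x);                            % column j: basis function j at every x
end
a_ne = (Z'*Z)\(Z'*y);                                % solve the normal equations
a_qr = Z\y;                                          % mldivide on the over-determined system
a_pi = pinv(Z)*y;                                    % Moore-Penrose pseudo-inverse
Sr = sum((y - Z*a_qr).^2);                           % spread about the fit
St = sum((y - mean(y)).^2);                          % spread about the mean
r2 = (St - Sr)/St;
fprintf('a = %s, r^2 = %.4f\n', mat2str(a_qr', 4), r2);
```

Forming Z'*Z squares the condition number of the problem, which is one reason the backslash route on Z itself is usually preferred for poorly conditioned design matrices.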

Nonlinear Regression

Nonlinear Regression – fminsearch.m

Nonlinear models:
- Have nonlinear dependence on fitting parameters
- E.g., $y = \alpha e^{\beta x}$

Two options for fitting nonlinear models to data
- Linearize the model first, then use linear regression
- Fit a nonlinear model directly by treating as an optimization problem

Want to minimize a cost function
- Cost function is the sum of the squares of the residuals, $J = S_r$
- Find the minimum of $J$: a multi-dimensional optimization
- Use MATLAB's fminsearch.m

Have noisy data that is believed to be best described by an exponential relationship

$$y = \alpha e^{\beta x}$$

Cost function:

$$J(\alpha, \beta) = \sum \left(y_i - \alpha e^{\beta x_i}\right)^2$$

Find $\alpha$ and $\beta$ to minimize $J$
- Use fminsearch.m
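A minimal sketch of this approach is shown below: the cost function is written as an anonymous function of the parameter vector and passed to fminsearch.m. The data, the initial guess, and the seed are assumed for illustration; a poor initial guess may lead the search to a different local minimum.

```matlab
rng(8);                                          % repeatable noise
x = linspace(0, 4, 25)';
y = 2*exp(0.8*x) + 0.5*randn(size(x));           % assumed noisy exponential data
model = @(a, x) a(1)*exp(a(2)*x);                % a(1) = alpha, a(2) = beta
J = @(a) sum((y - model(a, x)).^2);              % cost: sum of the squared residuals
a0 = [1; 1];                                     % initial guess (assumed)
a = fminsearch(J, a0);                           % Nelder-Mead simplex search
fprintf('alpha = %.3f, beta = %.3f, J = %.3f\n', a(1), a(2), J(a));
```

Unlike the linearized fit, this minimizes the residuals of $y$ itself, so additive noise on the original data is weighted evenly across the data set.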

Nonlinear Regression – lsqcurvefit.m

An alternative to minimizing a cost function using fminsearch.m, if the Optimization Toolbox is available:

a = lsqcurvefit(f,a0,x,y)

- f: handle to the fitting function, e.g., f = @(a,x) a(1)*exp(a(2)*x);
- a0: initial guess for fitting parameters
- x: independent variable data
- y: dependent variable data
- a: best-fit parameters

Polynomial Interpolation

Polynomial Interpolation

Sometimes we know both $x$ and $y$ values exactly
- Want a function that describes $y = f(x)$
- Allows for interpolation between known data points

Fit an $n$th-order polynomial to $n+1$ data points
- Polynomial will pass through all points

We'll look at polynomial interpolation using
- Newton's polynomial
- The Lagrange polynomial

Can approach similarly to linear least-squares regression

$$y = a_n z_n + \cdots + a_1 z_1 + a_0 z_0$$

where $z_0 = 1$, $z_1 = x$, $z_2 = x^2$, …, $z_n = x^n$

For an $n$th-order polynomial, we have $n+1$ equations with $n+1$ unknowns

In matrix form

$$\mathbf{y} = \mathbf{Z}\mathbf{a}$$

Now, unlike for linear regression
- All $n+1$ values in $\mathbf{y}$ are known exactly
- $n+1$ equations with $n+1$ unknown coefficients
- $\mathbf{Z}$ is square: $(n+1) \times (n+1)$

$$\mathbf{Z} = \begin{bmatrix} x_1^n & \cdots & x_1 & 1 \\ x_2^n & \cdots & x_2 & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x_{n+1}^n & \cdots & x_{n+1} & 1 \end{bmatrix}$$

Could solve in MATLAB using mldivide.m (\)

a = Z\y

$\mathbf{Z}$ is a Vandermonde matrix
- Tend to be ill-conditioned
- The techniques that follow are more numerically robust
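The direct approach can be sketched with MATLAB's vander.m, which builds the matrix with powers of x in descending order. The four data points are assumed values chosen for illustration.

```matlab
x = [1; 2; 4; 7];              % assumed data points, known exactly
y = [3; 1; 5; 2];
Z = vander(x);                 % rows are [x^3 x^2 x 1]
a = Z\y;                       % cubic coefficients, descending powers
fprintf('cond(Z) = %.3g\n', cond(Z));
fprintf('max error at the data points = %.3g\n', max(abs(polyval(a, x) - y)));
```

The condition number printed here grows quickly as points are added or as the x values spread out, which is the ill-conditioning noted above.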

Newton Interpolating Polynomial

Linear Interpolation

Fit a line (1st-order polynomial) to two data points using a truncated Taylor series (or simple trigonometry):

$$f_1(x) = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x - x_1)$$

where $f_1(x)$ is the function for the line fit to the data, and $f(x_i)$ are the known data values

This is the Newton linear-interpolation formula

Quadratic Interpolation

To fit a 2nd-order polynomial to three data points, consider the following form

$$f_2(x) = b_1 + b_2(x - x_1) + b_3(x - x_1)(x - x_2)$$

Evaluate at $x = x_1$ to find $b_1$

$$b_1 = f(x_1)$$

Back-substitution and evaluation at $x = x_2$ and at $x = x_3$ will yield the other coefficients

$$b_2 = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$$

$$b_3 = \frac{\dfrac{f(x_3) - f(x_2)}{x_3 - x_2} - \dfrac{f(x_2) - f(x_1)}{x_2 - x_1}}{x_3 - x_1}$$

Can still view this as a Taylor series approximation
- $b_1$ represents an offset
- $b_2$ is slope
- $b_3$ is curvature

Choice of initial quadratic form (Newton interpolating polynomial) was made to facilitate the development
- Resulting polynomial would be the same for any initial form of an $n$th-order polynomial
- Solution is unique

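$n$th-Order Newton Interpolating Polynomial

Extending the quadratic example to $n$th order

$$f_n(x) = b_1 + b_2(x - x_1) + \cdots + b_{n+1}(x - x_1)(x - x_2)\cdots(x - x_n)$$

Solve for coefficients as before with back-substitution and evaluation of $f(x_i)$

$$b_1 = f(x_1), \quad b_2 = f[x_2, x_1], \quad b_3 = f[x_3, x_2, x_1], \quad \ldots, \quad b_{n+1} = f[x_{n+1}, x_n, \ldots, x_2, x_1]$$

$f[\cdots]$ denotes a finite divided difference

Finite Divided Differences

First finite divided difference

$$f[x_i, x_j] = \frac{f(x_i) - f(x_j)}{x_i - x_j}$$

Second finite divided difference

$$f[x_i, x_j, x_k] = \frac{f[x_i, x_j] - f[x_j, x_k]}{x_i - x_k}$$

$n$th finite divided difference

$$f[x_{n+1}, x_n, \ldots, x_2, x_1] = \frac{f[x_{n+1}, x_n, \ldots, x_2] - f[x_n, x_{n-1}, \ldots, x_1]}{x_{n+1} - x_1}$$

Calculate recursively

$n$th-order Newton interpolating polynomial in terms of divided differences:

$$f_n(x) = f(x_1) + f[x_2, x_1](x - x_1) + \cdots + f[x_{n+1}, x_n, \ldots, x_1](x - x_1)(x - x_2)\cdots(x - x_n)$$

Divided difference table for calculation of coefficients: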

Chapra

Newton Interpolating Polynomial – Example
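One way to sketch the divided-difference calculation is to fill a table column by column and read the Newton coefficients off its first row. The data points (samples of the natural logarithm) and the query point are assumed for illustration; the variable names are hypothetical.

```matlab
x = [1 4 6 5];                     % assumed sample points
y = log(x);                        % assumed data: samples of ln(x)
n = numel(x);
D = zeros(n);                      % divided-difference table
D(:,1) = y(:);                     % first column: the function values
for j = 2:n                        % column j holds the (j-1)th divided differences
    for i = 1:n-j+1
        D(i,j) = (D(i+1,j-1) - D(i,j-1)) / (x(i+j-1) - x(i));
    end
end
b = D(1,:);                        % Newton coefficients b1 ... bn
xq = 2;                            % query point (assumed)
fq = b(1);
prodterm = 1;
for k = 2:n
    prodterm = prodterm*(xq - x(k-1));   % (xq - x1)(xq - x2)...(xq - x(k-1))
    fq = fq + b(k)*prodterm;
end
fprintf('interpolated value at x = %g: %.6f (ln(2) = %.6f)\n', xq, fq, log(2));
```

Adding another data point only appends one entry to each column of the table, so earlier coefficients do not need to be recomputed.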

Lagrange Interpolating Polynomial

Linear Lagrange Interpolation

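Fit a first-order polynomial (a line) to two known data points: $(x_1, f(x_1))$ and $(x_2, f(x_2))$

$$f_1(x) = L_1(x)\cdot f(x_1) + L_2(x)\cdot f(x_2)$$

$L_1(x)$ and $L_2(x)$ are weighting functions, where

$$L_1(x) = \begin{cases} 1, & x = x_1 \\ 0, & x = x_2 \end{cases} \qquad L_2(x) = \begin{cases} 1, & x = x_2 \\ 0, & x = x_1 \end{cases}$$

The interpolating polynomial is a weighted sum of the individual data point values

For linear (1st-order) interpolation, the weighting functions are:

$$L_1(x) = \frac{x - x_2}{x_1 - x_2}, \qquad L_2(x) = \frac{x - x_1}{x_2 - x_1}$$

The linear Lagrange interpolating polynomial is:

$$f_1(x) = \frac{x - x_2}{x_1 - x_2}f(x_1) + \frac{x - x_1}{x_2 - x_1}f(x_2)$$

$n$th-Order Lagrange Interpolation

Lagrange interpolation technique can be extended to $n$th-order polynomials

$$f_n(x) = \sum_{i=1}^{n+1} L_i(x)\,f(x_i), \qquad L_i(x) = \prod_{\substack{j=1 \\ j \neq i}}^{n+1} \frac{x - x_j}{x_i - x_j}$$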

Lagrange Interpolating Polynomial – Example
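A minimal sketch of Lagrange interpolation is a double loop: the inner loop builds each weighting function as a product, and the outer loop forms the weighted sum. The data points (samples of the natural logarithm) and the query point are assumed for illustration.

```matlab
x = [1 4 6];                       % assumed data points
y = log(x);                        % assumed data: samples of ln(x)
xq = 2;                            % query point (assumed)
n = numel(x);
fq = 0;
for i = 1:n
    L = 1;                         % weighting function L_i evaluated at xq
    for j = [1:i-1, i+1:n]         % every index except i
        L = L*(xq - x(j))/(x(i) - x(j));
    end
    fq = fq + L*y(i);              % weighted sum of the data values
end
fprintf('interpolated value at x = %g: %.6f\n', xq, fq);
```

Because the interpolating polynomial through a given set of points is unique, this gives the same value as the Newton form for the same data.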

MATLAB's Curve-Fitting Tool

MATLAB’s Curve‐Fitting Tool GUI

Type cftool at the command line
- Launches curve fitting tool GUI in a new window

Import data from the workspace

Quickly try fitting different types of functions to the data
- Polynomial
- Exponential
- Power
- Fourier
- Custom equations, and more…

Interpolation for Least‐squares regression for


Exercise

Go to the course website and download the following data file: cftool_ex_data.mat

Load data file into the workspace
- At the command line, type: load('cftool_ex_data')

Launch the Curve Fitting Tool
- Type cftool at the command line

Try fitting different functions to each of the three data sets
- Can open a new tab in the GUI for each fit

Curve Fitting with the cftool GUI
