Homework 4 Solution - Stanford University (ee263.stanford.edu/hw/hw4/hw4sol.pdf)


Homework 4 Solution

EE263 Stanford University, Fall 2017

Due: Friday 10/27/17 11:59pm

1. Estimation with sensor offset and drift. We consider the usual estimation setup:

$$y_i = a_i^T x + v_i, \quad i = 1, \ldots, m,$$

where

• $y_i$ is the $i$th (scalar) measurement

• $x \in \mathbb{R}^n$ is the vector of parameters we wish to estimate from the measurements

• $v_i$ is the sensor or measurement error of the $i$th measurement

In this problem we assume the measurements $y_i$ are taken at times evenly spaced, $T$ seconds apart, starting at time $t = T$. Thus, $y_i$, the $i$th measurement, is taken at time $t = iT$. (This isn't really material; it just makes the interpretation simpler.) You can assume that $m \geq n$ and the measurement matrix

$$A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}$$

is full rank (i.e., has rank $n$). Usually we assume (often implicitly) that the measurement errors $v_i$ are random, unpredictable, small, and centered around zero. (You don't need to worry about how to make this idea precise.) In such cases, least-squares estimation of $x$ works well. In some cases, however, the measurement error includes some predictable terms. For example, each sensor measurement might include a (common) offset or bias, as well as a term that grows linearly with time (called a drift). We model this situation as

$$v_i = \alpha + \beta i T + w_i$$

where $\alpha$ is the sensor bias (which is unknown but the same for all sensor measurements), $\beta$ is the drift term (again the same for all measurements), and $w_i$ is the part of the sensor error that is unpredictable, small, and centered around 0. If we knew the offset $\alpha$ and the drift term $\beta$ we could just subtract the predictable part of the sensor signal, i.e., $\alpha + \beta i T$, from the sensor signal. But we're interested in the case where we don't know the offset $\alpha$ or the drift coefficient $\beta$. Show how to use least-squares to simultaneously estimate the parameter vector $x \in \mathbb{R}^n$, the offset $\alpha \in \mathbb{R}$, and the drift coefficient $\beta \in \mathbb{R}$. Clearly explain your method. If your method always works, say so. Otherwise describe the conditions (on the matrix $A$) that must hold for your method to work, and give a simple example where the conditions don't hold.


Solution. Substituting the expression for the noise into the measurement equation gives

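$$y_i = a_i^T x + \alpha + \beta i T + w_i, \quad i = 1, \ldots, m.$$

In matrix form we can write this as

$$\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}}_{y} = \underbrace{\begin{bmatrix} a_1^T & 1 & T \\ a_2^T & 1 & 2T \\ \vdots & \vdots & \vdots \\ a_m^T & 1 & mT \end{bmatrix}}_{\tilde A} \underbrace{\begin{bmatrix} x \\ \alpha \\ \beta \end{bmatrix}}_{\tilde x} + w,$$

where $w = (w_1, \ldots, w_m)$ collects the unpredictable errors.

If $\tilde A$ is skinny ($m \geq n+2$) and full rank, least-squares can be used to estimate $x$, $\alpha$, and $\beta$. In that case,

$$\tilde x_{\mathrm{ls}} = \begin{bmatrix} x_{\mathrm{ls}} \\ \alpha_{\mathrm{ls}} \\ \beta_{\mathrm{ls}} \end{bmatrix} = (\tilde A^T \tilde A)^{-1} \tilde A^T y.$$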

The requirement that $\tilde A$ be skinny (or at least, square) makes perfect sense: you can't extract $n+2$ parameters (i.e., $x$, $\alpha$, $\beta$) from fewer than $n+2$ measurements. Even if $A$ is skinny and full rank, $\tilde A$ may not be. For example, with

$$A = \begin{bmatrix} 2 & 0 \\ 2 & 1 \\ 2 & 0 \\ 2 & 1 \end{bmatrix},$$

we have

$$\tilde A = \begin{bmatrix} 2 & 0 & 1 & T \\ 2 & 1 & 1 & 2T \\ 2 & 0 & 1 & 3T \\ 2 & 1 & 1 & 4T \end{bmatrix},$$

which is not full rank. In this example we can understand exactly what happened. The first column of $A$, which tells us how the sensors respond to the first component of $x$, has exactly the same form as an offset, so it is not possible to separate the offset from the signal induced by $x_1$. In the general case, $\tilde A$ is not full rank only if some linear combination of the sensor signals looks exactly like an offset or drift, or some linear combination of offset and drift.

Some people asserted that $\tilde A$ is full rank if the vector of ones and the vector $(T, 2T, \ldots, mT)$ are not in the span of the columns of $A$. This is false. Several people went into the case when $\tilde A$ is not full rank in great detail, suggesting regularization and other things. We weren't really expecting you to go into such detail about this case. Most people who went down this route, however, failed to mention the most important thing about what happens. When $\tilde A$ is not full rank, you cannot separate the offset and drift parameters from the parameter $x$ by any means at all. Regularization means that the numerics will work, but the result will be quite meaningless. Some people pointed out that the normal equations always have a solution, even when $\tilde A$ is not full rank. Again, this is true, but the most important thing here is that even if you solve the normal equations, the results are meaningless, since you cannot separate out the


offset and drift terms from the parameter $x$. Accounting for known noise characteristics (like offset and drift) can greatly improve estimation accuracy. The following matlab code shows an example.

T = 60;        % time between samples
alpha = 3;     % sensor offset
beta = .05;    % sensor drift constant
num_x = 3;
num_y = 8;
A = [ 1  4  0
      2  0  1
     -2 -2  3
     -1  1 -4
     -3  1  1
      0 -2  2
      3  2  3
      0 -4 -6 ];   % matrix whose rows are a_i^T
x = [-8; 20; 5];
% measurements y = A*x + v, with v a (Gaussian, random) noise
y = A*x;
for i = 1:num_y
    y(i) = y(i) + alpha + beta*T*i + randn;
end
x_ls = A\y;    % plain least squares, ignoring offset and drift
for i = 1:num_y
    last_col(i) = T*i;
end
A = [A ones(num_y,1) last_col'];   % augmented matrix with offset and drift columns
x_ls_with_noise_model = A\y;
norm(x - x_ls)
norm(x - x_ls_with_noise_model(1:3))

Additional comments. Many people correctly stated that $\tilde A$ needed to be full rank and then presented a condition they claimed was equivalent. Unfortunately, many of these statements were incorrect. The most common error was to claim that if neither of the two column vectors that were appended to $A$ in creating $\tilde A$ was in range($A$), then $\tilde A$ was full rank. As a counterexample (with $T = 1$), take

$$A = \begin{bmatrix} 2 \\ 3 \\ \vdots \\ m+1 \end{bmatrix}, \quad \text{and} \quad \tilde A = \begin{bmatrix} 2 & 1 & 1 \\ 3 & 1 & 2 \\ \vdots & \vdots & \vdots \\ m+1 & 1 & m \end{bmatrix}.$$

Since the first column of $\tilde A$ is the sum of the last two, $\tilde A$ has rank 2, not 3.
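The rank claims in this problem are easy to check numerically. The following is a minimal sketch in plain Python (the solution itself uses MATLAB); the helper names `rank` and `augment` are hypothetical, and exact rational arithmetic is an assumption made here so that the rank is unambiguous. It builds the augmented matrix for the two examples above, taking $T = 1$ and $m = 4$.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def augment(A, T=1):
    """Append the offset column (all ones) and the drift column (T, 2T, ..., mT)."""
    return [list(row) + [1, T * (i + 1)] for i, row in enumerate(A)]

# First example: the first column of A already looks like an offset.
A1 = [[2, 0], [2, 1], [2, 0], [2, 1]]
rank_A1_aug = rank(augment(A1))

# Counterexample: neither appended column is in range(A), yet rank drops.
A2 = [[2], [3], [4], [5]]
rank_A2_aug = rank(augment(A2))
rank_A2_ones = rank([row + [1] for row in A2])
rank_A2_time = rank([row + [i + 1] for i, row in enumerate(A2)])

print(rank_A1_aug, rank_A2_aug, rank_A2_ones, rank_A2_time)
```

In the counterexample, appending either column alone raises the rank from 1 to 2, so neither column lies in range($A$), but appending both still gives rank 2 rather than 3.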

2. Navigation from range measurements. In this problem we are going to study a simple 2D navigation system. Let $x \in \mathbb{R}^2$ be the unknown coordinates of a vehicle and let $p_i \in \mathbb{R}^2$ be the known fixed coordinates of beacons for $i = 1, 2, 3, 4$. The vehicle can measure its range or distance $\rho_i \in \mathbb{R}_+$ from each beacon $i$ and use the four range measurements to estimate its coordinates $x = (x_1, x_2)$.

$\rho \in \mathbb{R}^4_+$ is a nonlinear function of $x \in \mathbb{R}^2$, given by

$$\rho_i(x) = \sqrt{(x_1 - p_{i1})^2 + (x_2 - p_{i2})^2}, \quad i = 1, 2, 3, 4. \tag{1}$$

a) Linearize $\rho(x)$ around $x_0$ and express it in the form $\delta\rho = A\,\delta x$. Explicitly express the dimension and entries of matrix $A$.

Let $x_0 = (0, 0)$ be the last navigation fix. You would like to estimate the current position $x$ a short time after, based on the changes in the range measurements. However, there is some noise in your range measurements, i.e., you have

$$\delta\rho = Ax + v, \tag{2}$$

where the measurement errors $v_i$ are independent, Gaussian, with zero mean and standard deviation 2 (these details are not important for solving the problem). Also note that since $x_0 = (0, 0)$ we have $\delta x = x$. Let the beacon coordinates and range measurements be given by

$$P = \begin{bmatrix} p_1^T \\ p_2^T \\ p_3^T \\ p_4^T \end{bmatrix} = \begin{bmatrix} 64279 & 76604 \\ 113240 & 1976.5 \\ -13691 & -97414 \\ -117390 & 42726 \end{bmatrix}, \quad \delta\rho = \begin{bmatrix} -9.897 \\ -4.709 \\ 13.062 \\ -0.091 \end{bmatrix}. \tag{3}$$

b) Compute the estimated position $x_{\mathrm{jem}}$ using just enough measurements. In this method you should ignore the last two range measurements and compute the position by inverting only the top $2 \times 2$ half of $A$. Show that this method in fact gives an unbiased estimator of the actual position $x$.


c) Compute the estimated position $x_{\mathrm{ls}}$ using least-squares.

d) If the actual position is $x = (4.39, 11.13)$, compare the norm of the errors for $x_{\mathrm{ls}}$ and $x_{\mathrm{jem}}$. Which method gives a more accurate estimate of the position in this case? Is it true that one of the two methods would always have better accuracy than the other? Briefly justify your answer.

Solution.

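a) $\delta\rho = A\,\delta x$, where the matrix $A \in \mathbb{R}^{4 \times 2}$ has entries

$$a_{i1} = \frac{x_{01} - p_{i1}}{\sqrt{(x_{01} - p_{i1})^2 + (x_{02} - p_{i2})^2}}, \quad a_{i2} = \frac{x_{02} - p_{i2}}{\sqrt{(x_{01} - p_{i1})^2 + (x_{02} - p_{i2})^2}}.$$

The $i$th row of $A$ shows the (approximate) change in the $i$th range measurement for a (small) shift in $x$ from $x_0$.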

b) In the just-enough-measurements method, $\delta\rho_1$ and $\delta\rho_2$ suffice to find $x$. We compute the estimate $x_{\mathrm{jem}}$ by inverting the top ($2 \times 2$) half of $A$. Note that from $x_0 = (0, 0)$ we get

$$A = \begin{bmatrix} -0.6428 & -0.7660 \\ -0.9998 & -0.0175 \\ 0.1392 & 0.9903 \\ 0.9397 & -0.3420 \end{bmatrix}.$$

The corresponding left inverse of $A$ is

$$B_{\mathrm{jem}} = \begin{bmatrix} 0.0231 & -1.0150 & 0 & 0 \\ -1.3248 & 0.8517 & 0 & 0 \end{bmatrix}.$$

Note that $B_{\mathrm{jem}} A = I$, so $B_{\mathrm{jem}}$ is an unbiased linear estimator of $x$: since the noise $v$ has zero mean, the expected value of $B_{\mathrm{jem}}\delta\rho = x + B_{\mathrm{jem}} v$ is $x$. The estimate is

$$x_{\mathrm{jem}} = B_{\mathrm{jem}}\,\delta\rho = \begin{bmatrix} 4.55 \\ 9.10 \end{bmatrix}.$$

c) The least-squares estimate is

$$x_{\mathrm{ls}} = (A^T A)^{-1} A^T \delta\rho = \begin{bmatrix} 3.91 \\ 11.49 \end{bmatrix}.$$

d) The norms of the errors are

$$\|x - x_{\mathrm{jem}}\| = 2.03, \quad \|x - x_{\mathrm{ls}}\| = 0.60.$$

We see that in this case least squares gives a much more accurate result. However, it is not true that the estimation error is always smaller for least squares than for just enough measurements. For instance, consider a case where the third and fourth measurements are completely corrupted by huge amounts of noise; you would obviously be better off ignoring those two measurements and estimating $x$ using $B_{\mathrm{jem}}$. Having said that, statistically speaking, if the noises added to the four measurements are i.i.d., the probability that the least-squares estimate has a smaller error than the just-enough-measurements estimate is very large.
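The numbers in parts (b) through (d) can be reproduced with a few lines of code. The sketch below is written in plain Python rather than MATLAB, and the helper `solve2` (a 2×2 Cramer's-rule solver) is a hypothetical name introduced only for this illustration; the data are copied from equation (3).

```python
import math

# Beacon positions and range changes, copied from equation (3).
P = [(64279, 76604), (113240, 1976.5), (-13691, -97414), (-117390, 42726)]
drho = [-9.897, -4.709, 13.062, -0.091]
x0 = (0.0, 0.0)

# Row i of A is the unit vector pointing from beacon i toward x0.
A = []
for px, py in P:
    d = math.hypot(x0[0] - px, x0[1] - py)
    A.append(((x0[0] - px) / d, (x0[1] - py) / d))

def solve2(M, b):
    """Solve the 2x2 linear system M z = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det)

# Just enough measurements: use only the first two rows of A.
x_jem = solve2(A[:2], drho[:2])

# Least squares: solve the normal equations (A^T A) x = A^T drho.
AtA = [[sum(r[i] * r[j] for r in A) for j in range(2)] for i in range(2)]
Atb = [sum(r[i] * d for r, d in zip(A, drho)) for i in range(2)]
x_ls = solve2(AtA, Atb)

x_true = (4.39, 11.13)
err_jem = math.hypot(x_true[0] - x_jem[0], x_true[1] - x_jem[1])
err_ls = math.hypot(x_true[0] - x_ls[0], x_true[1] - x_ls[1])
print(x_jem, x_ls, err_jem, err_ls)
```

Solving the normal equations directly is adequate for a 4×2 problem like this one; for larger or ill-conditioned problems a QR-based solver would be the safer choice.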


3. Quadratic placement. We consider an integrated circuit (IC) that contains $N$ cells or modules that are connected by $K$ wires. We model a cell as a single point in $\mathbb{R}^2$ (which gives its location on the IC) and ignore the requirement that the cells must not overlap. The positions of the cells are

$$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N),$$

i.e., $x_i$ gives the $x$-coordinate of cell $i$, and $y_i$ gives the $y$-coordinate of cell $i$. We have two types of cells: fixed cells, whose positions are fixed and given, and free cells, whose positions are to be determined. We will take the first $n$ cells, at positions

$$(x_1, y_1), \ldots, (x_n, y_n),$$

to be the free ones, and the remaining $N - n$ cells, at positions

$$(x_{n+1}, y_{n+1}), \ldots, (x_N, y_N),$$

to be the fixed ones. The task of finding good positions for the free cells is called placement. (The fixed cells correspond to cells that are already placed, or external pins on the IC.) There are $K$ wires that connect pairs of the cells. We will assign an orientation to each wire (even though wires are physically symmetric). Specifically, wire $k$ goes from cell $I(k)$ to cell $J(k)$. Here $I$ and $J$ are functions that map wire number (i.e., $k$) into the origination cell number (i.e., $I(k)$), and the destination cell number (i.e., $J(k)$), respectively. To describe the wire/cell topology and the functions $I$ and $J$, we'll use the node incidence matrix $A$ for the associated directed graph. The node incidence matrix $A \in \mathbb{R}^{K \times N}$ is defined as

$$A_{kj} = \begin{cases} 1 & \text{wire } k \text{ goes to cell } j, \text{ i.e., } j = J(k) \\ -1 & \text{wire } k \text{ goes from cell } j, \text{ i.e., } j = I(k) \\ 0 & \text{otherwise.} \end{cases}$$

Note that the $k$th row of $A$ is associated with the $k$th wire, and the $j$th column of $A$ is associated with the $j$th cell. The goal in placing the free cells is to use the smallest amount of interconnect wire, assuming that the wires are run as straight lines between the cells. (In fact, the wires in an IC are not run on straight lines directly between the cells, but that's another story. Pretending that the wires do run on straight lines seems to give good placements.) One common method, called quadratic placement, is to place the free cells in order to minimize the total square wire length, given by

$$J = \sum_{k=1}^{K} \left( (x_{I(k)} - x_{J(k)})^2 + (y_{I(k)} - y_{J(k)})^2 \right).$$

a) Explain how to find the positions of the free cells, i.e.,

$$(x_1, y_1), \ldots, (x_n, y_n),$$

that minimize the total square wire length. You may make an assumption about the rank of one or more matrices that arise.


b) In this part you will determine the optimal quadratic placement for a specific set of cells and interconnect topology. The mfile qplace_data.m defines an instance of the quadratic placement problem. Specifically, it defines the dimensions $n$, $N$, and $K$, and $N - n$ vectors xfixed and yfixed, which give the $x$- and $y$-coordinates of the fixed cells. The mfile also defines the node incidence matrix A, which is $K \times N$. Be sure to explain how you solve this problem, and to explain the matlab source code that solves it (which you must submit). Give the optimal locations of the free cells. Check your placement against various others, such as placing all free cells at the origin. You will also find an mfile that plots a proposed placement in a nice way:

view_layout(xfree,yfree,xfixed,yfixed,A).

This mfile takes as argument the $x$- and $y$-coordinates of the free and fixed cells, as well as the node incidence matrix that describes the wires. It plots the proposed placement. Plot your optimal placement using view_layout.

Solution. The first thing to do is express the total square wire length in matrix form:

$$J = \sum_{k=1}^{K} \left( (x_{I(k)} - x_{J(k)})^2 + (y_{I(k)} - y_{J(k)})^2 \right) = \|Ax\|^2 + \|Ay\|^2.$$

This is a simple observation, but the key to everything. The reason is simple: the $k$th entry of the vector $Ax$ is precisely $x_{J(k)} - x_{I(k)}$. Aren't matrices great? One of the things this formula shows is that we can separately choose the $x$ and $y$ coordinates of the free cells, since the objective is a sum of two terms, one that depends only on the $x$ coordinates and the other, only on the $y$ coordinates. In other words, we have $J = J_x + J_y$, where

$$J_x = \|Ax\|^2, \quad J_y = \|Ay\|^2.$$

We can separately minimize each term. Now let's break up $x$ and $y$ into the subvectors $x_{\mathrm{free}}, y_{\mathrm{free}} \in \mathbb{R}^n$ and $x_{\mathrm{fixed}}, y_{\mathrm{fixed}} \in \mathbb{R}^{N-n}$. We'll also partition $A$ conformally:

$$J_x = \left\| \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_{\mathrm{free}} \\ x_{\mathrm{fixed}} \end{bmatrix} \right\|^2, \quad J_y = \left\| \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} y_{\mathrm{free}} \\ y_{\mathrm{fixed}} \end{bmatrix} \right\|^2.$$

Let's write out $J_x$ as

$$J_x = \left\| \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} x_{\mathrm{free}} + \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} x_{\mathrm{fixed}} \right\|^2,$$

and similarly for $J_y$. Now we have two completely standard least-squares problems, with solutions

$$x_{\mathrm{free}} = -\left( A_{11}^T A_{11} + A_{21}^T A_{21} \right)^{-1} \left( A_{11}^T A_{12} + A_{21}^T A_{22} \right) x_{\mathrm{fixed}},$$

$$y_{\mathrm{free}} = -\left( A_{11}^T A_{11} + A_{21}^T A_{21} \right)^{-1} \left( A_{11}^T A_{12} + A_{21}^T A_{22} \right) y_{\mathrm{fixed}}.$$

Our assumption is that the matrix $A_{11}^T A_{11} + A_{21}^T A_{21}$ is invertible, or equivalently, that the matrix

$$\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix}$$

is full rank. The matlab for this problem is pretty simple:

Aleft = A(:,1:n);
Aright = A(:,n+1:N);
xfree = -Aleft\(Aright*xfixed);
yfree = -Aleft\(Aright*yfixed);

This gives the optimal layout

[Figure: Optimal layout. Both axes run from −1 to 1.]

and the optimal locations of the free cells

$$x_{\mathrm{free}} = \begin{bmatrix} -0.3193 \\ 0.2700 \\ -0.2965 \\ -0.1612 \\ 0.2474 \\ 0.2899 \\ 0.5254 \end{bmatrix}, \quad y_{\mathrm{free}} = \begin{bmatrix} 0.3645 \\ 0.3091 \\ 0.0660 \\ -0.4786 \\ -0.3789 \\ -0.1003 \\ -0.1159 \end{bmatrix}.$$
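To see the placement formula at work on something small enough to verify by hand, here is a minimal sketch in plain Python (not the MATLAB used in the solution). The instance is hypothetical and is not the one in qplace_data.m: two free cells are wired in a chain between fixed cells at $x = 0$ and $x = 3$, so the optimal free positions should be $x = 1$ and $x = 2$. The helper names `solve` and `place` are made up for this illustration.

```python
def solve(M, b):
    """Solve the square system M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, M[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            aug[i] = [a - f * v for a, v in zip(aug[i], aug[c])]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def place(A, n, fixed):
    """Least-squares coordinates of the first n cells, given the fixed coordinates."""
    left = [row[:n] for row in A]    # columns of the free cells
    right = [row[n:] for row in A]   # columns of the fixed cells
    c = [sum(r * f for r, f in zip(row, fixed)) for row in right]
    gram = [[sum(row[i] * row[j] for row in left) for j in range(n)]
            for i in range(n)]
    rhs = [-sum(row[i] * ci for row, ci in zip(left, c)) for i in range(n)]
    return solve(gram, rhs)

# Wires: fixed cell 3 -> free cell 1 -> free cell 2 -> fixed cell 4.
A = [[1, 0, -1, 0],
     [-1, 1, 0, 0],
     [0, -1, 0, 1]]
xfree = place(A, 2, [0.0, 3.0])
yfree = place(A, 2, [0.0, 0.0])
print(xfree, yfree)
```

The chain is stretched evenly, which is what minimizing the sum of squared wire lengths should do for cells connected in series.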

4. Fleet modeling. In this problem, we will consider model estimation for vehicles in a fleet. We collect input and output data at multiple time instances, for each vehicle in a fleet of vehicles:

$$x^{(k)}(t) \in \mathbb{R}^n, \quad y^{(k)}(t) \in \mathbb{R}, \quad t = 1, \ldots, T, \quad k = 1, \ldots, K.$$

Here $k$ denotes the vehicle number, $t$ denotes the time, $x^{(k)}(t) \in \mathbb{R}^n$ the input, and $y^{(k)}(t) \in \mathbb{R}$ the output. (In the general case the output would also be a vector; but for simplicity here we consider the scalar output case.)

While it does not affect the problem, we describe a more specific application, where the vehicles are airplanes. The components of the inputs might be, for example, the deflections of various control surfaces and the thrust of the engines; the output might be vertical acceleration. Airlines are required to collect this data, called FOQA data, for every commercial flight. (This description is not needed to solve the problem.)

We will fit a model of the form

$$y^{(k)}(t) \approx a^T x^{(k)}(t) + b^{(k)},$$

where $a \in \mathbb{R}^n$ is the (common) linear model parameter, and $b^{(k)} \in \mathbb{R}$ is the (individual) offset for the $k$th vehicle.

We will choose these to minimize the mean square error

$$E = \frac{1}{TK} \sum_{t=1}^{T} \sum_{k=1}^{K} \left( y^{(k)}(t) - a^T x^{(k)}(t) - b^{(k)} \right)^2.$$

a) Explain how to find the model parameters $a$ and $b^{(1)}, \ldots, b^{(K)}$.

b) Carry out your method on the data given in fleet_mod_data.m. The data is given using cell arrays X and y. The columns of the $n \times T$ matrix X{k} are $x^{(k)}(1), \ldots, x^{(k)}(T)$, and the $1 \times T$ row vector y{k} contains $y^{(k)}(1), \ldots, y^{(k)}(T)$. Give the model parameters $a$ and $b^{(1)}, \ldots, b^{(K)}$, and report the associated mean square error $E$. Compare $E$ to the (minimum) mean square error $E^{\mathrm{com}}$ obtained using a common offset $b = b^{(1)} = \cdots = b^{(K)}$ for all vehicles.

By examining the offsets for the different vehicles, suggest a vehicle you might want to have a maintenance crew check out. (This is a simple, straightforward question; we don't want to hear a long explanation or discussion.)

Solution.

a) Define

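$$y^{(k)} = [\, y^{(k)}(1) \ \cdots \ y^{(k)}(T) \,] \in \mathbb{R}^{1 \times T}, \quad X^{(k)} = [\, x^{(k)}(1) \ \cdots \ x^{(k)}(T) \,] \in \mathbb{R}^{n \times T}.$$

We will stack the $TK$ measurements, by vehicle number (and then time), and stack the common parameter above the offsets. Define

$$v = \begin{bmatrix} (y^{(1)})^T \\ (y^{(2)})^T \\ \vdots \\ (y^{(K)})^T \end{bmatrix}, \quad F = \begin{bmatrix} (X^{(1)})^T & \mathbf{1} & 0 & \cdots & 0 \\ (X^{(2)})^T & 0 & \mathbf{1} & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ (X^{(K)})^T & 0 & 0 & \cdots & \mathbf{1} \end{bmatrix}, \quad w = \begin{bmatrix} a \\ b^{(1)} \\ \vdots \\ b^{(K)} \end{bmatrix},$$

where $\mathbf{1} \in \mathbb{R}^T$ is the all ones vector. The mean square error can then be written as

$$E = \frac{1}{TK} \|v - Fw\|_2^2.$$

Since $1/(TK)$ is a constant, finding the parameters $a$ and $b^{(1)}, \ldots, b^{(K)}$ that minimize $E$ is clearly a least squares problem, with solution $w = F^\dagger v$.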
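The stacking above can be illustrated on a toy data set. The sketch below is in plain Python (the course code is MATLAB), and the data are made up, not taken from fleet_mod_data.m: two vehicles, scalar inputs, and outputs generated exactly by $a = 2$, $b^{(1)} = 1$, $b^{(2)} = -1$. The helper names `solve`, `lstsq`, and `mse` are hypothetical, and solving the normal equations is an assumption that is only reasonable because $F$ is small and full rank here.

```python
def solve(M, b):
    """Solve the square system M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, M[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            aug[i] = [a - f * v for a, v in zip(aug[i], aug[c])]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def lstsq(F, v):
    """Least-squares solution of F w ~ v via the normal equations (F full rank)."""
    p = len(F[0])
    gram = [[sum(r[i] * r[j] for r in F) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * vi for r, vi in zip(F, v)) for i in range(p)]
    return solve(gram, rhs)

def mse(F, v, w):
    """Mean square residual of the fit F w to v."""
    return sum((vi - sum(f * wj for f, wj in zip(r, w))) ** 2
               for r, vi in zip(F, v)) / len(v)

# Toy data: X[k][t] is the input vector x^(k)(t), y[k][t] the output.
X = [[[0.0], [1.0], [2.0]], [[1.0], [2.0], [4.0]]]
y = [[1.0, 3.0, 5.0], [1.0, 3.0, 7.0]]
K, n = len(X), len(X[0][0])

# Stack by vehicle, then time: each row of F is [x^(k)(t)^T, e_k^T].
F, v = [], []
for k in range(K):
    for xt, yt in zip(X[k], y[k]):
        e_k = [1.0 if j == k else 0.0 for j in range(K)]
        F.append(list(xt) + e_k)
        v.append(yt)
w = lstsq(F, v)
E = mse(F, v, w)

# Common offset: replace the K indicator columns by a single column of ones.
F_com = [row[:n] + [1.0] for row in F]
w_com = lstsq(F_com, v)
E_com = mse(F_com, v, w_com)
print(w, E, w_com, E_com)
```

Because the toy data have different offsets per vehicle, the individual-offset model fits exactly while the common-offset model leaves a nonzero residual.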


b) Solving the problem instance using our method gives $E = 0.0982$. The following plot shows the values of $b^{(k)}$.

[Figure: offset $b^{(k)}$ versus vehicle $k$, for $k = 1, \ldots, 10$.]

To find the mean square error assuming $b = b^{(1)} = \cdots = b^{(K)}$, we solve the least squares problem associated with minimizing the mean square error $E^{\mathrm{com}}$,

$$E^{\mathrm{com}} = \frac{1}{TK} \left\| \begin{bmatrix} (y^{(1)})^T \\ (y^{(2)})^T \\ \vdots \\ (y^{(K)})^T \end{bmatrix} - \begin{bmatrix} (X^{(1)})^T & \mathbf{1} \\ (X^{(2)})^T & \mathbf{1} \\ \vdots & \vdots \\ (X^{(K)})^T & \mathbf{1} \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} \right\|_2^2.$$

A more complicated approach might be to solve the original least squares problem with equality constraints.

Solving for a common $a$ and $b$ gives a mean square error of $E^{\mathrm{com}} = 7.1068$. Clearly, attributing individual offsets to each vehicle allows us to fit better models and decrease the mean square error.

Examining the offsets $b^{(1)}, \ldots, b^{(K)}$, we see that vehicle $k = 7$ has a significantly larger offset than the rest of the fleet, signifying a potential anomaly in this vehicle.

The minimizing $a$ for the individual and common offsets are given by, respectively,

$$a = \begin{bmatrix} -0.5887 \\ -0.0041 \\ -0.1493 \\ -0.3222 \\ 0.7477 \end{bmatrix}, \quad a = \begin{bmatrix} -0.5902 \\ 0.0486 \\ -0.0035 \\ -0.3228 \\ 0.8283 \end{bmatrix},$$

and the common offset is $b = 1.3047$. We see that $a$ does not change much.

The Matlab code to solve the problem is given below.

fleet_mod_data;

% solve with individual offsets
F = [];
v = [];
for k = 1:K
    ek = (1:K == k)';   % generate e_k
    F = [F; X{k}' ones(T,1)*ek'];
    v = [v; y{k}'];
end
w = F\v;

% report mean square error
E = (norm(v - F*w)^2)/(T*K)

% solve with common offset
F = [];
v = [];
for k = 1:K
    F = [F; X{k}' ones(T,1)];
    v = [v; y{k}'];
end
w = F\v;

% report mean square error
E_com = (norm(v - F*w)^2)/(T*K)

5. Fitting a Gaussian function to data. A Gaussian function has the form

$$f(t) = a e^{-(t-\mu)^2/\sigma^2}.$$

Here $t \in \mathbb{R}$ is the independent variable, and $a \in \mathbb{R}$, $\mu \in \mathbb{R}$, and $\sigma \in \mathbb{R}$ are parameters that affect its shape. The parameter $a$ is called the amplitude of the Gaussian, $\mu$ is called its center, and $\sigma$ is called the spread or width. We can always take $\sigma > 0$. For convenience we define $p \in \mathbb{R}^3$ as the vector of the parameters, i.e., $p = [a \ \mu \ \sigma]^T$. We are given a set of data,

$$t_1, \ldots, t_N, \quad y_1, \ldots, y_N,$$

and our goal is to fit a Gaussian function to the data. We will measure the quality of the fit by the root-mean-square (RMS) fitting error, given by

$$E = \left( \frac{1}{N} \sum_{i=1}^{N} (f(t_i) - y_i)^2 \right)^{1/2}.$$

Note that $E$ is a function of the parameters $a$, $\mu$, $\sigma$, i.e., $p$. Your job is to choose these parameters to minimize $E$. You'll use the Gauss-Newton method.

a) Work out the details of the Gauss-Newton method for this fitting problem. Explicitly describe the Gauss-Newton steps, including the matrices and vectors that come up. You can use the notation $\Delta p^{(k)} = [\Delta a^{(k)} \ \Delta\mu^{(k)} \ \Delta\sigma^{(k)}]^T$ to denote the update to the parameters, i.e.,

$$p^{(k+1)} = p^{(k)} + \Delta p^{(k)}.$$

(Here $k$ denotes the $k$th iteration.)

b) Get the data t, y (and N) from the file gauss_fit_data.m, available on the class website. Implement the Gauss-Newton method (as outlined in part (a) above). You'll need an initial guess for the parameters. You can visually estimate them (giving a short justification), or estimate them by any other method (but you must explain your method). Plot the RMS error $E$ as a function of the iteration number. (You should plot enough iterations to convince yourself that the algorithm has nearly converged.) Plot the final Gaussian function obtained along with the data on the same plot. Repeat for another reasonable, but different initial guess for the parameters. Repeat for another set of parameters that is not reasonable, i.e., not a good guess for the parameters. (It's possible, of course, that the Gauss-Newton algorithm doesn't converge, or fails at some step; if this occurs, say so.) Briefly comment on the results you obtain in the three cases.

Solution.

a) Minimizing $E$ is the same as minimizing $NE^2$, which is a nonlinear least-squares problem. The first thing to do is to find the first-order approximation of the Gaussian function, with respect to the parameters $a$, $\mu$, and $\sigma$. This approximation is

$$f(t) + \frac{\partial f}{\partial a}(t)\,\Delta a + \frac{\partial f}{\partial \mu}(t)\,\Delta\mu + \frac{\partial f}{\partial \sigma}(t)\,\Delta\sigma,$$

where all the partial derivatives are evaluated at the current parameter values. In matrix form, this first-order approximation is

$$f(t) + (\nabla_p f(t))^T \Delta p,$$

where $\nabla_p$ denotes the gradient with respect to $p$. These partial derivatives are:

$$\frac{\partial f}{\partial a}(t) = e^{-(t-\mu)^2/\sigma^2},$$

$$\frac{\partial f}{\partial \mu}(t) = \frac{2a(t-\mu)}{\sigma^2}\, e^{-(t-\mu)^2/\sigma^2},$$

$$\frac{\partial f}{\partial \sigma}(t) = \frac{2a(t-\mu)^2}{\sigma^3}\, e^{-(t-\mu)^2/\sigma^2}.$$

The Gauss-Newton method proceeds as follows. We find $\Delta p$ that minimizes

$$\sum_{i=1}^{N} \left( f(t_i) + \nabla_p f(t_i)^T \Delta p - y_i \right)^2,$$

and then set the new value of $p$ to be $p := p + \Delta p$. Finding $\Delta p$ is a (linear) least-squares problem. We can put this least-squares problem in a more conventional form by defining

$$A = \begin{bmatrix} \nabla_p f(t_1)^T \\ \vdots \\ \nabla_p f(t_N)^T \end{bmatrix}, \quad b = \begin{bmatrix} y_1 - f(t_1) \\ \vdots \\ y_N - f(t_N) \end{bmatrix}.$$

Then, $\Delta p$ is found by minimizing $\|A \Delta p - b\|$. Thus, we have

$$\Delta p = (A^T A)^{-1} A^T b.$$

To summarize, the algorithm repeats the following steps:

• Evaluate the vector $b$ (which is the vector of fitting residuals). Evaluate the partial derivatives to form the matrix $A$.

• Solve the least-squares problem to get $\Delta p$.

• Update the parameter vector: $p := p + \Delta p$.

This can be repeated until the update $\Delta p$ is small, or the improvement in $E$ is small.
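Since every Gauss-Newton step depends on the Jacobian being right, it is worth checking the three partial derivatives against finite differences before running the iteration. The sketch below does this in plain Python (not MATLAB); the function names `f`, `grad`, and `numeric_grad`, the step size `h`, and the sample points are all assumptions made for this illustration, and the parameter values are the starting guess used in part (b).

```python
import math

def f(t, p):
    """Gaussian a * exp(-(t - mu)^2 / sigma^2) with p = [a, mu, sigma]."""
    a, mu, sigma = p
    return a * math.exp(-((t - mu) ** 2) / sigma ** 2)

def grad(t, p):
    """Analytic partial derivatives of f with respect to a, mu, sigma."""
    a, mu, sigma = p
    e = math.exp(-((t - mu) ** 2) / sigma ** 2)
    return [e,
            2 * a * (t - mu) / sigma ** 2 * e,
            2 * a * (t - mu) ** 2 / sigma ** 3 * e]

def numeric_grad(t, p, h=1e-6):
    """Central-difference approximation of the same partial derivatives."""
    g = []
    for j in range(3):
        hi, lo = list(p), list(p)
        hi[j] += h
        lo[j] -= h
        g.append((f(t, hi) - f(t, lo)) / (2 * h))
    return g

p = [11.0, 50.0, 35.0]  # starting guess from part (b)
max_gap = max(abs(ga - gn)
              for t in range(0, 101, 10)
              for ga, gn in zip(grad(t, p), numeric_grad(t, p)))
print(max_gap)
```

Each row of the matrix $A$ in the Gauss-Newton step is `grad(t_i, p)`, so a small `max_gap` gives some confidence that $A$ is formed correctly.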

b) We used the starting parameter values $p = [11, 50, 35]^T$, estimated visually. The amplitude $a = 11$ was estimated as a guess for the (noise-free) peak of the graph, $\mu = 50$ was estimated as its center, and $\sigma = 35$ was estimated from its spread. The matlab code for the Gauss-Newton method is given below.

gauss_fit_data
p = [11, 50, 35]';
RMS = [];
while(1)
    % columns of A: partial derivatives with respect to a, mu, sigma
    A = [exp(-(t-p(2)).^2/p(3)^2), ...
         p(1)*exp(-(t-p(2)).^2/p(3)^2)*2.*(t-p(2))/p(3)^2, ...
         p(1)*exp(-(t-p(2)).^2/p(3)^2)*2.*(t-p(2)).^2/p(3)^3];
    f_0 = p(1)*exp(-(t-p(2)).^2/p(3)^2);
    dp = inv(A'*A)*A'*(y-f_0)
    plot(t,y,t,f_0); drawnow
    p = p + dp;
    RMS = [RMS, 1/sqrt(N)*sqrt((f_0-y)'*(f_0-y))];
    if dp'*dp < 1e-3
        break
    end
end
plot(t,y,t,f_0); drawnow
title('Gaussian Fit')
print -deps gauss_fit.eps
figure;
plot(RMS); title('RMS Error as function of iterations')
xlabel('iterations')
print -deps gauss_fit_rms.eps


The results are shown below. The final fit clearly is good (at least, visually). The final RMS fit level is around 0.13, which is quite good, since the data ranges from 0 to around 10. Convergence takes around 30 iterations, although $E$ is still decreasing for another 20 or 30 iterations or so.

[Figure: Gaussian Fit (data and fitted Gaussian versus $t$).]

[Figure: RMS Error as function of iterations.]

Now we try with another starting point, $p = [11.5 \ 65 \ 37]^T$. The final fit is the same, but this time it requires more iterations, around 100. This bolsters our confidence that the fit found in our first run (the same as this one) is probably the best fit possible.

[Figure: Gaussian Fit (data and fitted Gaussian versus $t$).]

[Figure: RMS Error as function of iterations.]

Now we start Gauss-Newton from the initial parameter estimate $p = [9, 30, 30]^T$, which are not particularly reasonable guesses. The results are shown below. Surprisingly, we eventually get convergence to the same fit found above. But it takes over 400 iterations. This example shows it's possible to converge to a good (and probably, the best) fit even when the initial parameter estimates are very poor.

[Figure: Gaussian Fit (data and fitted Gaussian versus $t$).]

[Figure: RMS Error as function of iterations.]

For other poor initial guesses, however, the algorithm fails to converge. For example, with initial parameter estimate $p = [2, 60, 100]^T$, the plot below shows $E$ versus iteration.

[Figure: Convergence rate. RMS error (scale $\times 10^5$) versus iterations.]

6. Smallest input that drives a system to a desired steady-state output. We start with the discrete-time model of the system used in lecture 1:

$$x(t+1) = A_d x(t) + B_d u(t), \quad y(t) = C_d x(t), \quad t = 1, 2, \ldots,$$

where $A_d \in \mathbb{R}^{16 \times 16}$, $B_d \in \mathbb{R}^{16 \times 2}$, $C_d \in \mathbb{R}^{2 \times 16}$. The system starts from the zero state, i.e., $x(1) = 0$. (We start from initial time $t = 1$ rather than the more conventional $t = 0$ since matlab indexes vectors starting from 1, not 0.) The data for this problem can be found in ss_small_input_data.m.

The goal is to find an input $u$ that results in $y(t) \to y_{\mathrm{des}} = (1, -2)$ as $t \to \infty$ (i.e., asymptotic convergence to a desired output) or, even better, an input $u$ that results in $y(t) = y_{\mathrm{des}}$ for $t = T+1, \ldots$ (i.e., exact convergence after $T$ steps).

a) Steady-state analysis for desired constant output. Suppose that the system is in steady-state, i.e., $x(t) = x_{\mathrm{ss}}$, $u(t) = u_{\mathrm{ss}}$ and $y(t) = y_{\mathrm{des}}$ are constant (do not depend on $t$). Find $u_{\mathrm{ss}}$ and $x_{\mathrm{ss}}$.

b) Simple simulation. Find $y(t)$, with initial state $x(1) = 0$, with $u(t) = u_{\mathrm{ss}}$, for $t = 1, \ldots, 20000$. Plot $u$ and $y$ versus $t$. If you've done everything right, you should observe that $y(t)$ appears to be converging to $y_{\mathrm{des}}$.

You can use the following matlab code to obtain plots that look like the ones in lecture 1.

figure;
subplot(411); plot(u(1,:));
subplot(412); plot(u(2,:));
subplot(413); plot(y(1,:));
subplot(414); plot(y(2,:));

Here we assume that u and y are $2 \times 20000$ matrices. There will be two differences between these plots and those in lecture 1: These plots start from $t = 1$, and the plots in lecture 1 scale $t$ by a factor of 0.1.


c) Smallest input. Let $u^\star(t)$, for $t = 1, \ldots, T$, be the input with minimum RMS value

$$\left( \frac{1}{T} \sum_{t=1}^{T} \|u(t)\|^2 \right)^{1/2}$$

that yields $x(T+1) = x_{\mathrm{ss}}$ (the value found in part (a)). Note that if $u(t) = u^\star(t)$ for $t = 1, \ldots, T$, and then $u(t) = u_{\mathrm{ss}}$ for $t = T+1, T+2, \ldots$, then $y(t) = y_{\mathrm{des}}$ for $t \geq T+1$. In other words, we have exact convergence to the desired output in $T$ steps.

For the three cases $T = 100$, $T = 200$, and $T = 500$, find $u^\star$ and its associated RMS value. For each of these three cases, plot $u$ and $y$ versus $t$.

d) Plot the RMS value of $u^\star$ versus $T$ for $T$ between 100 and 1000 (for multiples of 10, if you like). The plot is probably better viewed on a log-log scale, which can be done using the command loglog instead of the command plot.

Solution.

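a) $u_{ss}$ and $x_{ss}$ must satisfy the following equations:

$$x_{ss} = A_d x_{ss} + B_d u_{ss}, \qquad y_{des} = C_d x_{ss}.$$

Therefore

$$(I - A_d) x_{ss} = B_d u_{ss}, \qquad y_{des} = C_d x_{ss}.$$

If $I - A_d$ is invertible, $x_{ss}$ is uniquely determined by $u_{ss}$:

$$x_{ss} = (I - A_d)^{-1} B_d u_{ss}.$$

Substituting this into the output equation gives $y_{des} = C_d (I - A_d)^{-1} B_d u_{ss}$. If the $2 \times 2$ matrix $C_d (I - A_d)^{-1} B_d$ is invertible, $u_{ss}$ is unique and given by

$$u_{ss} = \left( C_d (I - A_d)^{-1} B_d \right)^{-1} y_{des}.$$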

The following MATLAB script computes uss and xss.

    ss_small_input_data                        % loads Ad, Bd, Cd, ydes (and n, m, as used below)
    u_ss = (Cd*((eye(n)-Ad)\Bd)) \ ydes        % steady-state input
    x_ss = (eye(n)-Ad) \ (Bd*u_ss)             % steady-state state

The output of the script is

    u_ss =
       -0.6832
        0.3135

    x_ss =
        1.0000
       -2.0000
       -2.7386
        6.0404
       -2.9635
       -3.3923
        3.2400
        0.1856
       -0.0039
       -0.0031
        0.0037
       -0.0033
       -0.0018
       -0.0014
        0.0020
       -0.0003
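Since ss_small_input_data.m is not reproduced here, the steady-state formulas can be sanity-checked on a small stand-in system. The sketch below is only illustrative: the matrices Ad, Bd, Cd (with n = 4) are hypothetical values chosen so that the system is stable, and are not the homework data. It computes u_ss and x_ss from the formulas in part (a), then applies the constant input from the zero state and checks that the output approaches ydes.

```matlab
% Hypothetical small stable system (NOT the data from ss_small_input_data.m).
n = 4;
Ad = diag([0.9 0.8 0.7 0.6]) + 0.05*diag(ones(n-1,1), 1);  % eigenvalues 0.9, 0.8, 0.7, 0.6
Bd = [1 0; 0 1; 1 1; 1 -1];
Cd = [1 0 1 0; 0 1 0 1];
ydes = [1; -2];

% Steady-state formulas from part (a).
G = Cd*((eye(n) - Ad)\Bd);          % 2x2 matrix Cd*(I-Ad)^(-1)*Bd
u_ss = G\ydes;
x_ss = (eye(n) - Ad)\(Bd*u_ss);

% Apply the constant input u_ss starting from the zero state.
T = 500;
x = zeros(n,1);
for t = 1:T
    x = Ad*x + Bd*u_ss;
end
y_final = Cd*x;

fprintf('steady-state residual: %.2e\n', norm(x_ss - (Ad*x_ss + Bd*u_ss)));
fprintf('output error after %d steps: %.2e\n', T, norm(y_final - ydes));
```

For this stand-in system the convergence is fast because the largest eigenvalue magnitude is 0.9; the homework system needs many more steps.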

b) The following MATLAB script simulates the system and plots the steady-state input and the resulting output for t = 1, . . . , 20000.

    ss_small_input_data
    % assumes u_ss from part (a) is still in the workspace
    T = 20000;
    x = zeros(n,T);
    y = zeros(m,T);
    for i=1:T-1
        x(:,i+1) = Ad*x(:,i) + Bd*u_ss;
        y(:,i) = Cd*x(:,i);
    end
    y(:,end) = Cd*x(:,end);
    figure;
    subplot(411)
    plot(u_ss(1)*ones(1,T));
    subplot(412)
    plot(u_ss(2)*ones(1,T));
    subplot(413)
    plot(y(1,:));
    subplot(414)
    plot(y(2,:));


The results are shown in the following figure.

[Figure: four stacked panels showing u1, u2, y1, and y2 versus t, for t up to 2 × 10^4. Vertical axes: u1 and u2 from −2 to 2, y1 from 0 to 2, y2 from −4 to 0.]

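c) We want $x(T+1) = x_{ss}$. Let

$$U = \begin{bmatrix} u(1) \\ u(2) \\ \vdots \\ u(T) \end{bmatrix},$$

then

$$x(T+1) = A_d^{T} x(1) + \begin{bmatrix} A_d^{T-1}B_d & A_d^{T-2}B_d & \cdots & B_d \end{bmatrix} U,$$

where $A_d^{T}$ denotes the $T$th power of $A_d$ (not its transpose). Let $H = \begin{bmatrix} A_d^{T-1}B_d & A_d^{T-2}B_d & \cdots & B_d \end{bmatrix}$. Since $x(1) = 0$, the requirement $x(T+1) = x_{ss}$ becomes $HU = x_{ss}$. Our objective is to minimize the RMS value of $u(t)$, i.e., to minimize $\sum_{t=1}^{T} \|u(t)\|^2$, which is equal to $\|U\|^2$, subject to $HU = x_{ss}$. This is a least-norm problem and the solution is given by

$$U^\star = H^\dagger x_{ss}.$$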
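As an illustration of the least-norm step, the sketch below uses a hypothetical 4-state system and a hypothetical target state (neither is the homework data). When H has full row rank, the pseudo-inverse solution coincides with H'(HH')^{-1} x; the sketch checks this, checks that the resulting input really drives the state to the target in T steps, and compares its norm with another feasible input obtained by adding a null-space direction of H.

```matlab
% Hypothetical small system and target state (NOT the homework data).
n = 4; T = 10;
Ad = diag([0.9 0.8 0.7 0.6]) + 0.05*diag(ones(n-1,1), 1);
Bd = [1 0; 0 1; 1 1; 1 -1];
x_target = [1; -1; 2; 0.5];

% Build H = [Ad^(T-1)*Bd ... Ad*Bd Bd].
H = Bd; temp = Bd;
for i = 1:T-1
    temp = Ad*temp;
    H = [temp H];
end

U_pinv = pinv(H)*x_target;               % least-norm solution via pseudo-inverse
U_formula = H'*((H*H')\x_target);        % same solution via H'*(H*H')^(-1)*x

% Another input that also reaches the target: add a null-space direction of H.
N = null(H);
U_other = U_pinv + N(:,1);

% Simulate from the zero state with the least-norm input.
u = reshape(U_pinv, 2, T);               % column t holds u(t)
x = zeros(n,1);
for t = 1:T
    x = Ad*x + Bd*u(:,t);
end

fprintf('state error after T steps: %.2e\n', norm(x - x_target));
fprintf('norm of least-norm input: %.4f, norm of other feasible input: %.4f\n', ...
        norm(U_pinv), norm(U_other));
```

Any feasible input differs from the least-norm one by a vector in the null space of H, which is orthogonal to it, so its norm can only be larger.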

The following script computes the minimum RMS input $u^\star(t)$ for $T = 100$ and plots $u(t)$ and $y(t)$ versus $t$.

    ss_small_input_data
    % assumes x_ss from part (a) is still in the workspace
    T = 100;
    % finding the minimum energy input
    H = Bd;
    temp = Bd;
    for i=1:T-1
        temp = Ad*temp;
        H = [temp H];
    end
    U = pinv(H)*x_ss;
    % simulating the system
    u = [U(1:2:end)'; U(2:2:end)'];
    x = zeros(n,T+1);
    y = zeros(m,T+1);
    for i=1:T
        x(:,i+1) = Ad*x(:,i) + Bd*u(:,i);
        y(:,i) = Cd*x(:,i);
    end
    y(:,end) = Cd*x(:,end);
    figure;
    subplot(411)
    plot(1:T, u(1,:));
    subplot(412)
    plot(1:T, u(2,:));
    subplot(413)
    plot(1:T+1, y(1,:));
    subplot(414)
    plot(1:T+1, y(2,:));


We obtain the following plots for T = 100.

[Figure: four stacked panels showing u1, u2, y1, and y2 versus t for T = 100. Vertical axes: u1 from −100 to 100, u2 from −50 to 50, y1 from −50 to 50, y2 from −10 to 10.]


Rerunning the script for T = 200, we obtain the following plots.

[Figure: four stacked panels showing u1, u2, y1, and y2 versus t for T = 200. Vertical axes: u1 from −5 to 5, u2 from −2 to 2, y1 from −2 to 2, y2 from −2 to 0.]


Rerunning the script for T = 500, we obtain the following plots.

[Figure: four stacked panels showing u1, u2, y1, and y2 versus t for T = 500. Vertical axes: u1 from −1 to 1, u2 from −0.5 to 0.5, y1 from −2 to 2, y2 from −5 to 5.]

Notice that the convergence in part b) is extremely slow when the constant steady-state input is applied from the start. The convergence can be accelerated, as shown in part c), by choosing minimum-energy inputs to reach the steady state. Shorter convergence times require larger total energies.

d) The MATLAB script for computing and plotting the trade-off curve is

    ss_small_input_data
    % assumes x_ss from part (a) is still in the workspace
    Ts = 100:10:1000;
    RMS = [];
    for T = Ts                    % loop over the horizon values themselves, not their indices
        H = Bd;
        temp = Bd;
        for i=1:T-1
            temp = Ad*temp;
            H = [temp H];
        end
        U = pinv(H)*x_ss;
        RMS = [RMS sqrt(norm(U)^2/T)];
    end
    figure
    loglog(Ts, RMS)

and returns the following figure.

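[Figure: log-log plot of the RMS value of the minimum-energy input versus T, with T from 10^2 to 10^3 on the horizontal axis and the RMS value from 10^−1 to 10^2 on the vertical axis.]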
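The trade-off curve can also be computed without rebuilding H for every T. This is a swapped-in alternative, not the method used in the script above: for full row rank H, the least-norm solution satisfies ‖U⋆‖² = x_ss'(HH')^{-1}x_ss, and HH' = Σ_{k=0}^{T−1} A_d^k B_d B_d' (A_d')^k can be accumulated term by term. The sketch below uses a hypothetical 4-state system and target state (not the homework data) and cross-checks one point against the pinv computation.

```matlab
% Hypothetical small stable system and target state (NOT the homework data).
n = 4;
Ad = diag([0.9 0.8 0.7 0.6]) + 0.05*diag(ones(n-1,1), 1);
Bd = [1 0; 0 1; 1 1; 1 -1];
x_target = [1; -1; 2; 0.5];

Ts = 10:10:100;
RMS = zeros(size(Ts));
W = zeros(n);                 % W equals H*H' for the current horizon
AkB = Bd;                     % holds Ad^k * Bd
k = 0;                        % number of terms accumulated in W so far
for j = 1:length(Ts)
    T = Ts(j);
    while k < T
        W = W + AkB*AkB';
        AkB = Ad*AkB;
        k = k + 1;
    end
    % min ||U||^2 subject to H*U = x_target equals x_target'*inv(W)*x_target
    RMS(j) = sqrt((x_target'*(W\x_target))/T);
end

% Cross-check the last horizon against the explicit pinv(H) computation.
T = Ts(end);
H = Bd; temp = Bd;
for i = 1:T-1
    temp = Ad*temp;
    H = [temp H];
end
RMS_check = norm(pinv(H)*x_target)/sqrt(T);
fprintf('RMS via accumulated H*H'': %.6f, via pinv(H): %.6f\n', RMS(end), RMS_check);
```

Because HH' only grows as T increases, the minimum energy cannot increase with T, so the RMS value decreases with T, consistent with the downward-sloping curve described above.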

7. Smooth and least-norm force profiles. Consider the mass/force example described in the lecture notes (slide 5-5) with n = 10. For this problem, we are interested in input force sequences which move the mass from an initial position and velocity of zero to final position 1 and final velocity zero.

a) Find the sequence of forces that will move the mass as required, while minimizing the norm of the force vector.

b) Define the roughness $R$ of a vector $x \in \mathbf{R}^n$ as

$$R = \sum_{i=0}^{n} (x_{i+1} - x_i)^2,$$

where we let $x_0 = x_{n+1} = 0$. Find the sequence of forces with the smallest roughness $R$. Show both force profiles in a single plot.

Remark. Please solve these problems exactly, i.e., do not solve a regularized least-squares problem with µ set very large or small.


Solution. The position and velocity of a unit mass at t = 10 can be expressed as

$$x = Af,$$

where $x = (0, 1)$ denotes the final velocity and position of the mass, $f = (f_1, \ldots, f_{10})$ are the forces applied in each time interval, and the matrix $A$ is given by

$$A = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \frac{19}{2} & \frac{17}{2} & \cdots & \frac{1}{2} \end{bmatrix}.$$

a) The norm of the force vector will be minimized by finding the least-norm solution

$$f_{ls} = A^T (A A^T)^{-1} x.$$

We find the numerical values

$$f_{ls} = (.055, .042, .030, .018, .006, -.006, -.018, -.030, -.042, -.055).$$

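b) Observe that the roughness $R$ of the force profile $f$ can in fact be represented as $R = \|Bf\|^2$, where $B$ is the $(n+1) \times n$ matrix

$$B = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \\ 0 & \cdots & 0 & 0 & -1 \end{bmatrix}.$$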

The objective is to minimize the roughness of a force sequence that moves the mass to a specific location. This problem is therefore a general norm minimization problem with equality constraints, which we can write as

$$\begin{array}{ll} \mbox{minimize} & \|B f_s\| \\ \mbox{subject to} & A f_s = x \end{array}$$

where $f_s$ denotes the force profile of minimal roughness. The solution comes from solving for $f_s$ and the Lagrange multipliers $\lambda$ in

$$\begin{bmatrix} B^T B & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} f_s \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ x \end{bmatrix}.$$
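For completeness, here is a short sketch of where this linear system comes from, using the standard Lagrangian argument. Minimizing $\|B f_s\|$ is equivalent to minimizing $\frac{1}{2}\|B f_s\|^2$; the factor $\frac{1}{2}$ and the scaling of $\lambda$ are conventional choices assumed here, not taken from the original.

```latex
\begin{aligned}
L(f_s, \lambda) &= \tfrac{1}{2}\,\|B f_s\|^2 + \lambda^T (A f_s - x), \\
\nabla_{f_s} L &= B^T B f_s + A^T \lambda = 0, \\
\nabla_{\lambda} L &= A f_s - x = 0 .
\end{aligned}
```

Stacking the two optimality conditions gives the block system. Since $B$ has full column rank (so $B^T B$ is positive definite) and $A$ has full row rank, the block matrix is invertible and the solution is unique.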

We find the numerical values

fs = (.035, .049, .047, .033, .012,−.012,−.033,−.047,−.049,−.035).


The code and plot are given below. We have plotted the force profiles as piecewise constant functions (which you were not required to do) using the matlab function stairs.

[Figure: force f versus time t for t from 0 to 10, showing the least-norm and smooth force profiles as piecewise constant curves; vertical axis from −0.06 to 0.06.]

    x = [0 1]';
    A = [ones(1,10); 9.5:-1:0.5];

    % Least norm solution
    fln = A'*((A*A')\x);

    % Roughness matrix B
    B = eye(10) + diag(-ones(9,1),-1);
    B = [B; zeros(1,9) -1];

    % KKT system for the minimum-roughness problem
    M = [B'*B A'; A zeros(2)];
    y = [zeros(10,1); x];
    v = M\y;

    % Smooth solution
    fs = v(1:10);

    % pad values to end so piecewise constant plot is correct
    fln(end+1) = fln(end);
    fs(end+1) = fs(end);
    stairs(0:10,[fln fs]);
    xlabel('Time, t');
    ylabel('Force, f');
    legend('Least norm','Smooth');
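As a quick check on the two force profiles, the sketch below recomputes them and compares their norms and roughness. It uses the same A, B, and x as the listing above; the printed comparisons and the closed-form value 45/825 for the first least-norm force (which follows from AA' = [10 50; 50 332.5]) are additions for illustration.

```matlab
x = [0; 1];                                   % final velocity 0, final position 1
A = [ones(1,10); 9.5:-1:0.5];
B = [eye(10) + diag(-ones(9,1), -1); zeros(1,9) -1];

% Least-norm force profile.
fln = A'*((A*A')\x);

% Minimum-roughness force profile from the block (KKT) system.
v = [B'*B A'; A zeros(2)] \ [zeros(10,1); x];
fs = v(1:10);

fprintf('constraint residuals: %.2e (least norm), %.2e (smooth)\n', ...
        norm(A*fln - x), norm(A*fs - x));
fprintf('norm:      %.4f (least norm), %.4f (smooth)\n', norm(fln), norm(fs));
fprintf('roughness: %.4f (least norm), %.4f (smooth)\n', norm(B*fln)^2, norm(B*fs)^2);
fprintf('first least-norm force: %.4f (45/825 = %.4f)\n', fln(1), 45/825);
```

Both profiles meet the constraint; the least-norm profile has the smaller norm and the smooth profile has the smaller roughness, which is exactly the trade-off the two problems express.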
