Method of Least Squares
Least Squares
Method of least squares: a deterministic approach.
The inputs u(1), u(2), ..., u(N) are applied to the system, and the outputs y(1), y(2), ..., y(N) are observed.
Find a model that fits the input-output relation to a curve (possibly linear), f(n, u(n)).
The 'best' fit is obtained by minimising the sum of the squares of the difference f - y.
[Figure: plot with both axes running from 0 to 50.]
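Least Squares
The curve-fitting problem can be formulated as follows.
Error (model minus observations, with u(i) the input variable):
$e(i) = f(i, u(i)) - y(i)$
Sum of error squares:
$\mathcal{E} = \sum_{i=1}^{N} |e(i)|^2$
The minimum (least squares of the error) is achieved when the gradient of $\mathcal{E}$ with respect to the model parameters is zero.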
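As a minimal sketch of the zero-gradient condition, the following fits a straight-line model f(u) = a*u + b. The linear model, the function name `fit_line`, and the data values are assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

def fit_line(u, y):
    """Least-squares fit of the assumed model f(u) = a*u + b to observations y."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    # Setting the gradient of sum((a*u + b - y)**2) w.r.t. (a, b) to zero
    # gives a 2x2 linear system S @ [a, b] = r.
    S = np.array([[np.sum(u * u), np.sum(u)],
                  [np.sum(u), len(u)]])
    r = np.array([np.sum(u * y), np.sum(y)])
    a, b = np.linalg.solve(S, r)
    return a, b

u = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]      # hypothetical inputs
y = [1.0, 9.0, 21.0, 29.0, 41.0, 49.0]       # hypothetical noisy observations
a, b = fit_line(u, y)
residual = a * np.asarray(u) + b - np.asarray(y)   # e(i) = f - y
print(a, b, np.sum(residual ** 2))
```

At the solution the residual is uncorrelated with both the input and the constant term, which is the zero-gradient condition written out.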
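Problem Statement
For the inputs to the system, u(i), the observed desired response is d(i).
The relation is assumed to be linear:
$d(i) = \sum_{k=0}^{M-1} w_{ok}^{*}\, u(i-k) + e_o(i)$
where the $w_{ok}$ are the unknown model parameters and $e_o(i)$ is an unobservable measurement error that is zero mean and white.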
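Problem Statement
Design a transversal filter that finds the least-squares solution. With tap weights $w_0, \ldots, w_{M-1}$ and error signal e(i), the sum of error squares is
$\mathcal{E}(w_0, \ldots, w_{M-1}) = \sum_{i=i_1}^{i_2} |e(i)|^2$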
Data Windowing
We will express the input in matrix form. Depending on the limits i1 and i2, this matrix changes:
Covariance method: i1 = M, i2 = N
Prewindowing method: i1 = 1, i2 = N
Postwindowing method: i1 = M, i2 = N + M - 1
Autocorrelation method: i1 = 1, i2 = N + M - 1
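The effect of the limits on the data matrix can be sketched as below. The helper name `data_matrix`, the real-valued input, and the convention that samples outside 1..N are taken as zero are assumptions made for this illustration.

```python
import numpy as np

def data_matrix(u, M, method="covariance"):
    """Rows are [u(i), u(i-1), ..., u(i-M+1)] for i = i1..i2 (real data assumed)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    limits = {
        "covariance": (M, N),
        "prewindowing": (1, N),
        "postwindowing": (M, N + M - 1),
        "autocorrelation": (1, N + M - 1),
    }
    i1, i2 = limits[method]

    def sample(n):
        # u(n) uses 1-based time; samples outside 1..N are assumed to be zero.
        return u[n - 1] if 1 <= n <= N else 0.0

    return np.array([[sample(i - k) for k in range(M)]
                     for i in range(i1, i2 + 1)])

u = [1.0, 2.0, 3.0, 4.0]                     # hypothetical input, N = 4
for name in ("covariance", "prewindowing", "postwindowing", "autocorrelation"):
    print(name, data_matrix(u, M=2, method=name).shape)
```

Only the covariance method avoids padding with assumed zeros, which is why it yields the smallest matrix (N - M + 1 rows).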
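Error signal:
$e(i) = d(i) - \sum_{k=0}^{M-1} w_k^{*}\, u(i-k)$
Least squares (minimum of the sum of squares) is achieved when the gradient with respect to every tap weight vanishes (covariance method, i1 = M, i2 = N):
$\nabla_k \mathcal{E} = -2 \sum_{i=M}^{N} u(i-k)\, e^{*}(i) = 0, \quad k = 0, 1, \ldots, M-1$
i.e., when
$\sum_{i=M}^{N} u(i-k)\, e_{\min}^{*}(i) = 0, \quad k = 0, 1, \ldots, M-1$
Principle of Orthogonality
The minimum-error time series e_min(i) is orthogonal to the time series of the input u(i-k) applied to tap k of a transversal filter of length M, for k = 0, 1, ..., M-1, when the filter is operating in its least-squares condition.
Note that this is a time average; the corresponding result for Wiener filtering was an ensemble average.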
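The orthogonality condition can be checked numerically. This is a sketch with hypothetical random real-valued data; `np.linalg.lstsq` stands in for the least-squares filter, and the data matrix follows the covariance method.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 20
u = rng.standard_normal(N)      # hypothetical input u(1), ..., u(N)
d = rng.standard_normal(N)      # hypothetical desired response d(1), ..., d(N)

# Rows are [u(i), ..., u(i-M+1)] for i = M..N (0-based index M-1..N-1 here).
A = np.array([[u[i - k] for k in range(M)] for i in range(M - 1, N)])
dv = d[M - 1:]

w_hat = np.linalg.lstsq(A, dv, rcond=None)[0]
e_min = dv - A @ w_hat

# Entry k is sum_i u(i-k) e_min(i); all entries vanish up to rounding.
print(A.T @ e_min)
```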
Corollary of the Principle of Orthogonality
The LS estimate of the desired response is
$\hat{d}(i \mid \mathcal{U}_i) = \sum_{k=0}^{M-1} \hat{w}_k^{*}\, u(i-k)$
Multiply the principle of orthogonality by $\hat{w}_k^{*}$ and sum over k. Then
$\sum_{i=M}^{N} \hat{d}(i \mid \mathcal{U}_i)\, e_{\min}^{*}(i) = 0$
When a transversal filter operates in its least-squares condition, the least-squares estimate of the desired response (produced at the output of the filter) and the minimum estimation error time series are orthogonal to each other over time i.
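Energy of Minimum Error
The desired response splits into the estimate and the minimum error:
$d(i) = \hat{d}(i \mid \mathcal{U}_i) + e_{\min}(i)$
Due to the principle of orthogonality, the second and third terms in the expansion of $\sum_i |d(i)|^2$ (the cross terms between the estimate and the minimum error) vanish, hence
$\mathcal{E}_d = \mathcal{E}_{est} + \mathcal{E}_{\min}$
where
$\mathcal{E}_d = \sum_{i=M}^{N} |d(i)|^2, \quad \mathcal{E}_{est} = \sum_{i=M}^{N} |\hat{d}(i \mid \mathcal{U}_i)|^2, \quad \mathcal{E}_{\min} = \sum_{i=M}^{N} |e_{\min}(i)|^2$
Special cases:
$\mathcal{E}_{\min} = 0$ when e_o(i) = 0 for all i, which is impossible in practice.
When the problem is underdetermined (fewer data points than parameters), there are infinitely many solutions (no unique solution).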
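A small numerical sketch of the energy decomposition, using hypothetical random real-valued data (the helper `energies` is an illustrative name). The last lines also show that an underdetermined problem with a full-row-rank data matrix can be fitted exactly, so its minimum error energy is zero up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))      # hypothetical data matrix (10 data points, 3 taps)
d = rng.standard_normal(10)           # hypothetical desired response

def energies(A, d):
    """Return (E_d, E_est, E_min) for the least-squares fit of d by A @ w."""
    w_hat = np.linalg.lstsq(A, d, rcond=None)[0]
    d_hat = A @ w_hat
    e_min = d - d_hat
    return d @ d, d_hat @ d_hat, e_min @ e_min

E_d, E_est, E_min = energies(A, d)
print(E_d, E_est + E_min)             # the two numbers agree up to rounding

# Underdetermined case: 2 data points, 3 taps.
_, _, E_min_under = energies(A[:2], d[:2])
print(E_min_under)
```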
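Normal Equations
The minimum error is
$e_{\min}(i) = d(i) - \sum_{t=0}^{M-1} \hat{w}_t^{*}\, u(i-t)$
Substituting into the principle of orthogonality gives
$\sum_{t=0}^{M-1} \hat{w}_t \sum_{i=M}^{N} u(i-k)\, u^{*}(i-t) = \sum_{i=M}^{N} u(i-k)\, d^{*}(i), \quad k = 0, 1, \ldots, M-1$
Hence,
$\sum_{t=0}^{M-1} \hat{w}_t\, \phi(t,k) = z(-k), \quad k = 0, 1, \ldots, M-1$
This is the expanded system of the normal equations for linear least-squares filters, where
$\phi(t,k) = \sum_{i=M}^{N} u(i-k)\, u^{*}(i-t), \quad 0 \le (t,k) \le M-1$
is the time-average autocorrelation function of the input, and
$z(-k) = \sum_{i=M}^{N} u(i-k)\, d^{*}(i), \quad 0 \le k \le M-1$
is the time-average cross-correlation between the desired response and the input.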
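Normal Equations (Matrix Formulation)
Matrix form of the normal equations for linear least-squares filters:
$\mathbf{\Phi}\, \hat{\mathbf{w}} = \mathbf{z}$
so that
$\hat{\mathbf{w}} = \mathbf{\Phi}^{-1} \mathbf{z}$ (if $\mathbf{\Phi}^{-1}$ exists)
This is the linear least-squares counterpart of the Wiener-Hopf equations. Here $\mathbf{\Phi}$ and $\mathbf{z}$ are time averages, whereas in the Wiener-Hopf equations they were ensemble averages.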
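The normal equations can be formed directly from the time-average correlations. The sketch below assumes real-valued data and the covariance method (sums over i = M..N); the function name `normal_equations` and all data values are hypothetical.

```python
import numpy as np

def normal_equations(u, d, M):
    """Form Phi and z over i = M..N (real data assumed) and solve Phi w = z."""
    u = np.asarray(u, dtype=float)
    d = np.asarray(d, dtype=float)
    idx = range(M - 1, len(u))                       # 0-based version of i = M..N
    # Phi[k, t] = phi(t, k) = sum_i u(i-k) u(i-t)
    Phi = np.array([[sum(u[i - k] * u[i - t] for i in idx) for t in range(M)]
                    for k in range(M)])
    # z[k] = z(-k) = sum_i u(i-k) d(i)
    z = np.array([sum(u[i - k] * d[i] for i in idx) for k in range(M)])
    return Phi, z, np.linalg.solve(Phi, z)

u = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0, -2.0, 2.0])   # hypothetical input
w_o = np.array([0.5, -0.25])                                # hypothetical true tap weights
d = np.convolve(u, w_o)[:len(u)]                            # noise-free d(i) = sum_k w_o[k] u(i-k)

Phi, z, w_hat = normal_equations(u, d, M=2)
print(w_hat)
```

Because the desired response here is generated without measurement error, the solution reproduces the hypothetical `w_o` up to rounding.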
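Minimum Sum of Error Squares
The energy contained in the time series $\hat{d}(i \mid \mathcal{U}_i)$ is
$\mathcal{E}_{est} = \sum_{i=M}^{N} |\hat{d}(i \mid \mathcal{U}_i)|^2 = \hat{\mathbf{w}}^H \mathbf{\Phi}\, \hat{\mathbf{w}}$
Or, using the normal equations,
$\mathcal{E}_{est} = \hat{\mathbf{w}}^H \mathbf{z} = \mathbf{z}^H \hat{\mathbf{w}} = \mathbf{z}^H \mathbf{\Phi}^{-1} \mathbf{z}$
Then the minimum sum of error squares is
$\mathcal{E}_{\min} = \mathcal{E}_d - \mathbf{z}^H \mathbf{\Phi}^{-1} \mathbf{z}$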
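As a quick check, the closed-form minimum can be compared with the directly computed residual energy. The sketch assumes a real-valued data matrix A with Phi = A^T A and z = A^T d; the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((12, 3))      # hypothetical real data matrix
d = rng.standard_normal(12)           # hypothetical desired response

Phi = A.T @ A                         # time-average correlation matrix (assumed form)
z = A.T @ d                           # time-average cross-correlation vector (assumed form)
w_hat = np.linalg.solve(Phi, z)

E_d = d @ d
E_min_formula = E_d - z @ w_hat                   # E_d - z^T Phi^{-1} z
E_min_direct = np.sum((d - A @ w_hat) ** 2)       # residual energy computed directly
print(E_min_formula, E_min_direct)
```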
Properties of the Time-Average Correlation Matrix
Property I: The correlation matrix is Hermitian symmetric, $\mathbf{\Phi}^H = \mathbf{\Phi}$.
Property II: The correlation matrix is nonnegative definite, $\mathbf{x}^H \mathbf{\Phi}\, \mathbf{x} \ge 0$ for every M-by-1 vector $\mathbf{x}$.
Property III: The correlation matrix is nonsingular iff $\det(\mathbf{\Phi})$ is nonzero.
Property IV: The eigenvalues of the correlation matrix are real and non-negative.
Property V: The correlation matrix is the product of two rectangular Toeplitz matrices that are Hermitian transposes of each other, $\mathbf{\Phi} = \mathbf{A}^H \mathbf{A}$, where $\mathbf{A}$ is the data matrix.
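These properties can be verified numerically. The sketch below uses a hypothetical complex input and assumes the covariance-method data matrix whose row for time i holds the conjugated tap inputs.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 3, 12
u = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # hypothetical complex input

# Row for time i holds [u*(i), u*(i-1), ..., u*(i-M+1)], i = M..N (0-based below).
A = np.array([[np.conj(u[i - k]) for k in range(M)] for i in range(M - 1, N)])
Phi = A.conj().T @ A                                        # Property V

eigvals = np.linalg.eigvalsh(Phi)                           # real eigenvalues of a Hermitian matrix
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # arbitrary test vector
quad = (x.conj() @ Phi @ x).real

print(np.allclose(Phi, Phi.conj().T))     # Property I
print(quad >= 0)                          # Property II
print(abs(np.linalg.det(Phi)) > 0)        # Property III (nonsingular for this data)
print(eigvals.min() >= -1e-12)            # Property IV
```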
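Normal Equations (Reformulation)
The normal equations are $\mathbf{\Phi}\, \hat{\mathbf{w}} = \mathbf{z}$. But we know that
$\mathbf{\Phi} = \mathbf{A}^H \mathbf{A}, \quad \mathbf{z} = \mathbf{A}^H \mathbf{d}$
which yields
$\hat{\mathbf{w}} = (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H \mathbf{d}$
Substituting into the minimum sum of error squares expression gives
$\mathcal{E}_{\min} = \mathbf{d}^H \mathbf{d} - \mathbf{d}^H \mathbf{A} (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H \mathbf{d}$
The matrix $\mathbf{A}^{+} = (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H$ is the pseudo-inverse of $\mathbf{A}$, so that
$\hat{\mathbf{w}} = \mathbf{A}^{+} \mathbf{d}$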
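Projection
The LS estimate of $\mathbf{d}$ is given by
$\hat{\mathbf{d}} = \mathbf{A} \hat{\mathbf{w}} = \mathbf{A} (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H \mathbf{d}$
The matrix
$\mathbf{P} = \mathbf{A} (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H$
is a projection operator onto the linear space spanned by the columns of the data matrix $\mathbf{A}$, i.e. the space $\mathcal{U}_i$.
The orthogonal complement projector is
$\mathbf{I} - \mathbf{P} = \mathbf{I} - \mathbf{A} (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H$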
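The defining properties of a projector can be checked with a few lines. The function name `projector` and the 3x2 matrix are hypothetical; the sketch assumes the data matrix has full column rank.

```python
import numpy as np

def projector(A):
    """Return P = A (A^H A)^{-1} A^H; assumes A has full column rank."""
    A = np.asarray(A)
    AH = A.conj().T
    return A @ np.linalg.solve(AH @ A, AH)

A = np.array([[2.0, 1.0], [3.0, 2.0], [4.0, 3.0]])   # hypothetical 3x2 data matrix
P = projector(A)
I = np.eye(A.shape[0])

print(np.allclose(P @ P, P))          # idempotent: projecting twice changes nothing
print(np.allclose(P, P.conj().T))     # Hermitian
print(np.allclose((I - P) @ A, 0))    # the complement annihilates the columns of A
```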
Projection - Example
A filter with M = 2 taps and N = 4 data points gives a data matrix with N - M + 1 = 3 rows. Let
Then
And
orthogonal
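A numerical illustration of the same setting (M = 2, N = 4) is sketched below. The values of u and d are hypothetical stand-ins chosen for this sketch, not the numbers of the slide example, and real-valued data with the covariance method are assumed.

```python
import numpy as np

# Hypothetical real-valued data: N = 4 input samples, M = 2 taps.
u = np.array([1.0, 2.0, 3.0, 4.0])        # u(1), ..., u(4)
d = np.array([1.0, 0.0, 2.0])             # d(2), d(3), d(4)
M = 2

# Covariance-method data matrix: rows [u(i), u(i-1)] for i = 2, 3, 4.
A = np.array([[u[i - k] for k in range(M)] for i in range(M - 1, len(u))])

P = A @ np.linalg.inv(A.T @ A) @ A.T
d_hat = P @ d                              # projection of d onto the column space of A
e_min = (np.eye(3) - P) @ d                # component in the orthogonal complement

print(A.shape)                             # (3, 2)
print(d_hat @ e_min)                       # inner product vanishes up to rounding
```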
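Uniqueness of the LS Solution
LS always has a solution; is that solution unique?
The least-squares estimate $\hat{\mathbf{w}}$ is unique if and only if the nullity (the dimension of the null space) of the data matrix $\mathbf{A}$ equals zero.
$\mathbf{A}$ is K×M, with K = N - M + 1.
The solution is unique when $\mathbf{A}$ has full column rank (which requires K ≥ M):
All columns of $\mathbf{A}$ are linearly independent.
The system is overdetermined (more equations than variables, i.e. taps).
$\mathbf{A}^H \mathbf{A}$ is nonsingular, so $(\mathbf{A}^H \mathbf{A})^{-1}$ exists and the solution is unique.
There are infinitely many solutions when $\mathbf{A}$ has linearly dependent columns (always the case when K < M):
$\mathbf{A}^H \mathbf{A}$ is singular, so $(\mathbf{A}^H \mathbf{A})^{-1}$ does not exist.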
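The uniqueness condition amounts to a rank test on the data matrix. A minimal sketch follows; the helper name `ls_is_unique` and the example matrices are hypothetical.

```python
import numpy as np

def ls_is_unique(A):
    """True when A has full column rank, i.e. its null space is trivial."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    return bool(np.linalg.matrix_rank(A) == A.shape[1])

A_full = [[2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]   # K = 3 >= M = 2, independent columns
A_dep = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]    # second column = 2 * first column
A_wide = [[1.0, 2.0, 3.0]]                      # K = 1 < M = 3

for name, A in (("full", A_full), ("dependent", A_dep), ("wide", A_wide)):
    print(name, ls_is_unique(A))
```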
Properties of the LS Estimates
Property I: The least-squares estimate is unbiased, provided that the measurement error process e_o(i) has zero mean.
Property II: When the measurement error process e_o(i) is white with zero mean and variance $\sigma^2$, the covariance matrix of the least-squares estimate equals $\sigma^2 \mathbf{\Phi}^{-1}$.
Property III: When the measurement error process e_o(i) is white with zero mean, the least-squares estimate is the best linear unbiased estimate.
Property IV: When the measurement error process e_o(i) is white and Gaussian with zero mean, the least-squares estimate achieves the Cramér-Rao lower bound for unbiased estimates.
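Properties I and II can be illustrated with a small Monte Carlo experiment. This is only a sketch: the fixed data matrix, the true parameters `w_o`, the noise level, and the number of trials are all hypothetical, and real-valued white Gaussian noise is assumed.

```python
import numpy as np

rng = np.random.default_rng(4)
K, M, sigma = 50, 2, 0.1
A = rng.standard_normal((K, M))          # hypothetical fixed data matrix
w_o = np.array([1.0, -0.5])              # hypothetical true parameters
Phi_inv = np.linalg.inv(A.T @ A)

trials = 5000
noise = sigma * rng.standard_normal((trials, K))   # white, zero-mean measurement error
D = A @ w_o + noise                                # one desired-response vector per row
W = D @ A @ Phi_inv                                # each row is one LS estimate (Phi^{-1} A^T d)^T

mean_est = W.mean(axis=0)                # close to w_o (unbiasedness)
cov_est = np.cov(W, rowvar=False)        # close to sigma^2 * Phi^{-1}
print(mean_est)
print(cov_est)
print(sigma ** 2 * Phi_inv)
```

The agreement is statistical, so the sample mean and covariance match the theoretical values only to within Monte Carlo error.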
Computation of the LS Estimates
The rank W of a K×N matrix $\mathbf{A}$ (K ≥ N or K < N) gives
the number of linearly independent columns/rows;
the number of non-zero singular values (equivalently, non-zero eigenvalues of $\mathbf{A}^H \mathbf{A}$).
The matrix is said to be full rank (full column or row rank) if W = min(K, N).
Otherwise, it is said to be rank-deficient.
Rank is an important parameter for matrix inversion.
If K = N (square matrix) and the matrix is full rank (W = K = N, non-singular), the inverse of the matrix can be calculated, $\mathbf{A}^{-1} = \mathrm{adj}(\mathbf{A}) / \det(\mathbf{A})$.
If the matrix is not square (K ≠ N) and/or it is rank-deficient (singular), $\mathbf{A}^{-1}$ does not exist; instead we can use the pseudo-inverse $\mathbf{A}^{+}$ (a generalisation of the inverse).
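SVD
We can calculate the pseudo-inverse using the SVD.
Any K×N matrix $\mathbf{A}$ (K ≥ N or K < N) can be decomposed using the singular value decomposition (SVD) as follows:
$\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^H$
where $\mathbf{U}$ (K×K) and $\mathbf{V}$ (N×N) are unitary, and $\mathbf{\Sigma}$ is a K×N matrix whose only non-zero entries are the W singular values $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_W > 0$ on its main diagonal.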
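A short sketch of the decomposition using `np.linalg.svd`. The 3x2 matrix and the relative threshold used to decide which singular values count as non-zero are assumptions made for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0], [3.0, 2.0], [4.0, 3.0]])   # hypothetical 3x2 matrix
U, s, Vh = np.linalg.svd(A)                          # U: 3x3, s: singular values, Vh: 2x2

Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)                 # embed singular values in a K x N matrix

rank = int(np.sum(s > 1e-12 * s.max()))              # count of non-zero singular values
print(np.allclose(U @ Sigma @ Vh, A), rank)
```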
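SVD
The system of equations
$\mathbf{A} \mathbf{w} = \mathbf{d}$
is overdetermined if K > N (more equations than unknowns): the least-squares solution is unique if $\mathbf{A}$ is full rank, and non-unique (infinitely many solutions) if $\mathbf{A}$ is rank-deficient;
is underdetermined if K < N (more unknowns than equations): non-unique, infinitely many solutions.
In either case the solution(s) is (are)
$\hat{\mathbf{w}} = \mathbf{A}^{+} \mathbf{d}$
where
$\mathbf{A}^{+} = \mathbf{V} \mathbf{\Sigma}^{+} \mathbf{U}^H$
and $\mathbf{\Sigma}^{+}$ is the N×K matrix obtained by transposing $\mathbf{\Sigma}$ and inverting its non-zero singular values.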
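The pseudo-inverse can be assembled from the SVD factors as sketched below. The function name `pinv_svd`, the relative cutoff `rtol`, and the example matrices are assumptions; `np.linalg.pinv` provides the same operation.

```python
import numpy as np

def pinv_svd(A, rtol=1e-12):
    """Pseudo-inverse A+ = V Sigma+ U^H; singular values below rtol * max are treated as zero."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    cutoff = rtol * s.max()
    s_inv = np.array([1.0 / x if x > cutoff else 0.0 for x in s])
    return Vh.conj().T @ np.diag(s_inv) @ U.conj().T

A_full = np.array([[2.0, 1.0], [3.0, 2.0], [4.0, 3.0]])   # full column rank
A_def = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])    # rank-deficient: column 2 = 2 * column 1
d = np.array([1.0, 0.0, 2.0])                             # hypothetical right-hand side

w_full = pinv_svd(A_full) @ d      # unique least-squares solution
w_def = pinv_svd(A_def) @ d        # one of infinitely many least-squares solutions
print(w_full, w_def)
```

The cutoff matters only in the rank-deficient case, where inverting a numerically tiny singular value would otherwise blow up the result.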
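Computation of the LS Estimates
Find the least-squares solution of $\mathbf{A} \mathbf{w} = \mathbf{d}$ ($\mathbf{A}$: K×M).
If K > M and rank($\mathbf{A}$) = M, the unique solution is
$\hat{\mathbf{w}} = (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H \mathbf{d}$
Otherwise there are infinitely many solutions, but the pseudo-inverse gives the minimum-norm solution to the least-squares problem, $\hat{\mathbf{w}} = \mathbf{A}^{+} \mathbf{d}$.
This is the solution of shortest possible length in the Euclidean-norm sense.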
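The minimum-norm property can be seen on a small underdetermined system. The 2x3 matrix and right-hand side are hypothetical; a second solution is built by adding a null-space vector taken from the SVD.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])   # hypothetical 2x3 matrix (K < M)
d = np.array([1.0, 2.0])                           # hypothetical desired vector

w_min = np.linalg.pinv(A) @ d                      # pseudo-inverse solution

# The last right singular vector spans the null space of this rank-2 matrix.
_, _, Vh = np.linalg.svd(A)
null_vec = Vh[-1]
w_other = w_min + null_vec                         # another exact solution

print(np.allclose(A @ w_min, d), np.allclose(A @ w_other, d))
print(np.linalg.norm(w_min), np.linalg.norm(w_other))
```

Both vectors solve the system exactly, but the pseudo-inverse solution has the smaller Euclidean norm because it has no component in the null space of A.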