
Hard-constrained vs. soft-constrained parameter estimation

A. Benavoli, L. Chisci, Member, IEEE, A. Farina, Fellow, IEEE, L. Ortenzi, G. Zappa

Abstract— The paper aims at contrasting two different ways of incorporating a-priori information in parameter estimation, i.e. hard-constrained and soft-constrained estimation. Hard-constrained estimation can be interpreted, in the Bayesian framework, as Maximum A-posteriori Probability (MAP) estimation with uniform prior distribution over the constraining set, and amounts to a constrained Least-Squares (LS) optimization. Novel analytical results on the statistics of the hard-constrained estimator are presented for a linear regression model subject to lower and upper bounds on a single parameter. This analysis makes it possible to quantify the Mean Squared Error (MSE) reduction implied by constraints and to see how this depends on the size of the constraining set compared to the confidence regions of the unconstrained estimator. Contrastingly, soft-constrained estimation can be regarded as MAP estimation with Gaussian prior distribution and amounts to a less computationally demanding unconstrained LS optimization with a cost suitably modified by the mean and covariance of the Gaussian distribution. Results on the design of the prior covariance of the soft-constrained estimator for optimal MSE performance are also given. Finally, a practical case-study concerning a line fitting estimation problem is presented in order to validate the theoretical results derived in the paper as well as to compare the performance of the hard-constrained and soft-constrained approaches under different settings.

Index Terms— Constrained estimation, Mean Squared Error, MAP estimator.

I. INTRODUCTION

Consider the classical Maximum Likelihood (ML) estimation of a parameter vector, given a set of measurements. The estimate is provided by the maximization of the likelihood function; in this case, the Cramer-Rao Lower Bound (CRLB) can be computed by the inversion of the Fisher Information Matrix (FIM) and gives the minimum theoretically achievable variance of the estimate [2], [3]. In this paper, it will be investigated how to improve the estimation accuracy by exploiting a-priori information on the parameters to be estimated.

In [1] the CRLB is computed in the presence of equality “constraints”. The constrained estimation problem, in this case, can be reduced to an unconstrained problem by the use of Lagrange multipliers. Equality constraints [4], [5] express very precise a-priori information on the parameters to be estimated; unfortunately, in several practical applications, such precise information is not available. In most cases, “inequality” constraints are available [6], with a consequent complication of the estimation problem and of the computation of the CRLB.

Authors’ addresses: A. Benavoli, L. Chisci and G. Zappa, DSI, Università di Firenze, Firenze, Italy, e-mail: {benavoli, chisci}@dsi.unifi.it; A. Farina and L. Ortenzi, Engineering Division, SELEX - Sistemi Integrati S.p.A., Rome, Italy, e-mail: {afarina, lortenzi}@selex-si.com.

There are clearly many ways of expressing a-priori information. This paper will specifically address two possible approaches. A first approach, referred to as “hard-constrained” estimation, assumes inequality constraints (e.g. lower and upper bounds) on the parameters to be estimated and formulates, accordingly, the estimation problem as a constrained Least Squares (LS) optimization problem. The second approach, referred to as “soft-constrained” estimation, considers the parameters to be estimated as random variables with a Gaussian probability density function, whose statistical parameters (mean and covariance) are a-priori known. In this case the a-priori knowledge on the parameters simply modifies the LS cost functional and the resulting estimation problem amounts to a less computationally demanding unconstrained LS optimization. Hard-constrained and soft-constrained estimation can be interpreted, in a Bayesian framework, as maximum a-posteriori probability (MAP) estimation with a uniform and, respectively, Gaussian prior distribution. Another alternative could be constrained Minimum Mean Squared Error (MMSE) estimation, which can be numerically approximated via Monte Carlo methods [7], [8]. This approach, which involves extensive computations for numerical integration, is computationally much more expensive and will, therefore, not be considered in this work.

A theoretical analysis of the statistics of the hard-constrained estimator for a linear regression model subject to hard bounds on a single parameter will be carried out. The obtained results make it possible to quantify the bias and MSE reduction implied by the hard bounds in terms of the width of the bounding interval and of the location of the true parameter in such an interval. To the best of the authors’ knowledge there are no similar results in the literature giving an analytical quantification of the performance improvement achievable using a-priori information on parameters in terms of hard constraints. It will be outlined how an extension of such exact results to the case of multiple bounded parameters seems analytically intractable. It will also be shown how to use the single-bounded-parameter analysis sequentially, i.e. parameter by parameter, in order to provide a, possibly conservative, estimate of the MSE reduction achievable under multiple bounded parameters. A further contribution concerns the design of the prior covariance of the soft-constrained estimator so as to make hard constraints and soft constraints information-theoretically comparable.

The remaining part of the paper is organized as follows. Section 2 formulates the parameter estimation problem for a linear regression model and reviews classical results of unconstrained estimation. Section 3 deals with the hard-constrained approach and provides novel results on the statistics of the hard-constrained estimator. Section 4 deals with the soft-constrained approach and provides results on the choice of the design parameters (prior covariance) of the soft-constrained estimator. Section 5 examines a practical case-study concerning a line fitting estimation problem in order to validate the presented theoretical results as well as to compare the performance improvement achievable with the hard-constrained and soft-constrained approaches. Finally, section 6 summarizes the main results of the paper drawing some concluding remarks.

II. UNCONSTRAINED ESTIMATION

Notation

Throughout the paper the following notation will be adopted.

• Lower-case, boldface lower-case and boldface upper-case letters denote scalars, vectors and, respectively, matrices.
• Prime denotes transpose and tr denotes trace.
• $\|v\|^2_M \triangleq v' M v$.
• Given a vector v: $v_i$ denotes its i-th entry, $v_{i:j} = [v_i, \dots, v_j]'$ if $i \le j$ or $v_{i:j} = 0$ if $i > j$. Similarly, given a matrix M: $m_i$ denotes its i-th column and $M_{i:j} = [m_i, \dots, m_j]$ if $i \le j$ or $M_{i:j} = 0$ when $i > j$.
• Given the estimate x̂ of x, $\tilde{x} \triangleq \hat{x} - x$ is the associated estimation error.
• $\mathrm{erf}(s) \triangleq \frac{2}{\sqrt{\pi}} \int_0^s e^{-r^2}\, dr$ is the error function.
• $v \sim \mathcal{N}(m_v, \Sigma_v)$ means that v is normally distributed with mean $m_v$ and covariance $\Sigma_v$; $v \sim \mathcal{U}(V)$ means that v is uniformly distributed over the set V.
• Given the set X, $I_X$ is the indicator function of X, defined as $I_X(x) = 1$ for $x \in X$ and $I_X(x) = 0$ for $x \notin X$; $\delta(\cdot)$ is the Dirac delta.

Problem formulation

Consider the linear regression model

z = Hx + w (1)

where: $x \in \mathbb{R}^n$ is the variable to be estimated; $z \in \mathbb{R}^N$ is the observed variable; $H \in \mathbb{R}^{N \times n}$ is a known matrix; $w \in \mathbb{R}^N$ is the measurement noise satisfying $w \sim \mathcal{N}(0, \Sigma_w)$. It is well known that in the unconstrained case (i.e. the case in which no a-priori information on the parameters is assumed) the optimal estimate of x - in the weighted LS or, equivalently, ML, MAP or MMSE sense - is given by

$\hat{x} = \left(H' \Sigma_w^{-1} H\right)^{-1} H' \Sigma_w^{-1} z$   (2)

It is also well known that $\hat{x} \sim \mathcal{N}(x, \Sigma)$, i.e. x̂ is normally distributed with mean x (unbiased estimate) and covariance matrix [2], [3]

$\Sigma \triangleq E[\tilde{x}\tilde{x}'] = \left(H' \Sigma_w^{-1} H\right)^{-1}$   (3)
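As an illustration of (2)-(3), the following Python sketch (not part of the original paper; dimensions, noise level and random data are arbitrary assumptions) computes the unconstrained weighted LS estimate and its covariance.

import numpy as np

def unconstrained_ls(H, z, Sigma_w):
    """Weighted LS / ML estimate (2) and its covariance (3) for z = Hx + w."""
    Sw_inv = np.linalg.inv(Sigma_w)
    Sigma = np.linalg.inv(H.T @ Sw_inv @ H)   # (3)
    x_hat = Sigma @ H.T @ Sw_inv @ z          # (2)
    return x_hat, Sigma

# Toy example with assumed sizes: n = 2 parameters, N = 20 measurements.
rng = np.random.default_rng(0)
x_true = np.array([1.0, -0.5])
H = rng.normal(size=(20, 2))
Sigma_w = 0.1 * np.eye(20)
z = H @ x_true + rng.multivariate_normal(np.zeros(20), Sigma_w)
x_hat, Sigma = unconstrained_ls(H, z, Sigma_w)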

For the sake of comparison with the constrained case that will be addressed in the next two sections, let us consider the situation in which one scalar parameter (without loss of generality the first entry of x) is known. To this end, let x1 denote the first entry of x and introduce the partitioning

$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad H = [h_1, H_2], \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{21}' \\ \sigma_{21} & \Sigma_2 \end{bmatrix}$

The optimal estimate of x2 for known x1 is trivially

$\hat{x}_{2|1} = \left(H_2' \Sigma_w^{-1} H_2\right)^{-1} H_2' \Sigma_w^{-1} (z - h_1 x_1)$   (4)

and the corresponding MSE is

$\Sigma_{2|1} = E\left[\tilde{x}_{2|1}\tilde{x}_{2|1}'\right] = \left(H_2' \Sigma_w^{-1} H_2\right)^{-1}$   (5)

Obviously the knowledge of x1 provides an MSE reduction on x2, i.e. Σ2|1 ≤ Σ2. More precisely, via matrix algebra, it can easily be checked that

$\Sigma_{2|1} = \Sigma_2 - L L'\, \sigma_1^2$   (6)

where

$L \triangleq \left(H_2' \Sigma_w^{-1} H_2\right)^{-1} H_2' \Sigma_w^{-1} h_1 = \Sigma_{2|1}\, H_2' \Sigma_w^{-1} h_1$   (7)

The relationship (6) expresses the fact that the knowledge of x1 implies an MSE reduction on x2 which is proportional, via the matrix gain LL′, to the MSE σ1² that would be obtained for x1 in the absence of such knowledge. In practice, perfect knowledge of a parameter is never available and the MSE reduction in (6) represents a theoretical limitation. In the next two sections we shall investigate analytically the MSE reduction implied by partial knowledge of the parameter x1 in terms of either (upper and lower) hard bounds (hard-constrained estimation) or a Gaussian prior distribution (soft-constrained estimation).
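A minimal sketch of (4)-(7), assuming H, z and Sigma_w as in the previous snippet, is given below; the commented lines indicate how the reduction (6) could be checked numerically against the unconstrained covariance Sigma from (3).

import numpy as np

def conditional_on_x1(H, z, Sigma_w, x1_known):
    """Estimate of x2 for known x1, eqs. (4)-(5), and the gain L of (7)."""
    h1, H2 = H[:, [0]], H[:, 1:]
    Sw_inv = np.linalg.inv(Sigma_w)
    Sigma_2g1 = np.linalg.inv(H2.T @ Sw_inv @ H2)                    # (5)
    x2_hat = Sigma_2g1 @ H2.T @ Sw_inv @ (z - h1[:, 0] * x1_known)   # (4)
    L = Sigma_2g1 @ H2.T @ Sw_inv @ h1                               # (7)
    return x2_hat, Sigma_2g1, L

# Numerical check of (6), with Sigma computed from (3):
# x2_hat, Sigma_2g1, L = conditional_on_x1(H, z, Sigma_w, x_true[0])
# assert np.allclose(Sigma_2g1, Sigma[1:, 1:] - L @ L.T * Sigma[0, 0])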

III. HARD-CONSTRAINED ESTIMATION

In this section, it will be assumed that the a-priori information on the parameters x is specified by a membership set X (hard constraints), i.e. it is assumed that $x \in X \subset \mathbb{R}^n$. A possible choice is

$X = \left\{ x = [x_1, x_2, \dots, x_n]' : \; x_{i,min} \le x_i \le x_{i,max}, \; i = 1, 2, \dots, n \right\}$   (8)

There exist several approaches to encompass the a-priori information provided by the hard bounds in the inference process. In particular one may assume a uniform prior distribution on the parameter vector x, i.e. a prior probability density function (pdf)

$f_0(x) = \begin{cases} \dfrac{1}{\mu(X)}, & x \in X \\ 0, & \text{otherwise} \end{cases}$   (9)

where µ(X) is the Lebesgue measure of X, assumed finite. Consequently, the posterior pdf of x conditioned on the observation z becomes

$f(x|z) = \dfrac{f_w(z - Hx)\, f_0(x)}{\int_X f_w(z - Hx)\, f_0(x)\, dx} = \begin{cases} \dfrac{1}{c} \exp\left(-\|z - Hx\|^2_{\Sigma_w^{-1}}\right), & x \in X \\ 0, & \text{otherwise} \end{cases}$   (10)

where

$c = \int_X \exp\left(-\|z - Hx\|^2_{\Sigma_w^{-1}}\right) dx$

is a normalizing constant. In this case it is well known that the MMSE estimate of x is provided by the conditional mean

$\hat{x}_{MMSE} = E[x|z] = \dfrac{1}{c} \int_X x \exp\left(-\|z - Hx\|^2_{\Sigma_w^{-1}}\right) dx$   (11)

and the corresponding MMSE (conditioned on z) is given by the conditional variance

$\mathrm{var}(\hat{x}_{MMSE}) = \dfrac{1}{c} \int_X (x - \hat{x}_{MMSE})(x - \hat{x}_{MMSE})' \exp\left(-\|z - Hx\|^2_{\Sigma_w^{-1}}\right) dx$   (12)

In this paper, however, the attention will be focused on Maximum A-Posteriori (MAP) estimation, for which some interesting results will be derived in the sequel, although MMSE estimation is clearly a possible alternative for hard-constrained estimation exploiting, for instance, Monte Carlo methods [7], [8].
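Although the paper focuses on MAP estimation, the constrained MMSE estimate (11) can be approximated with a very simple Monte Carlo scheme: treating the posterior as the unconstrained Gaussian N(x̂, Σ) restricted to X, one can draw samples from N(x̂, Σ), reject those outside the box and average the rest. The sketch below only illustrates this idea (it is not the particle-filtering machinery of [7], [8]); x_hat, Sigma and the bounds are assumed given.

import numpy as np

def mmse_by_rejection(x_hat, Sigma, x_min, x_max, n_samples=100_000, seed=0):
    """Approximate the constrained MMSE estimate (11) and its spread (12)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(x_hat, Sigma, size=n_samples)
    inside = np.all((draws >= x_min) & (draws <= x_max), axis=1)
    accepted = draws[inside]
    if accepted.shape[0] == 0:
        raise RuntimeError("no samples fell inside X; increase n_samples")
    x_mmse = accepted.mean(axis=0)              # approximates (11)
    cov_mmse = np.cov(accepted, rowvar=False)   # approximates (12)
    return x_mmse, cov_mmse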

The MAP estimate of x, under the assumption of a uniform prior distribution, is given by

$\hat{x}_{MAP} = \arg\max_x f(x|z) = \arg\min_{x \in X} \|z - Hx\|^2_{\Sigma_w^{-1}}$   (13)

Hence the MAP estimate coincides with the constrained ML (or equivalently constrained weighted LS) estimate

$\hat{x}_c = \arg\max_{x \in X} f(x|z) = \arg\min_{x \in X} \|z - Hx\|^2_{\Sigma_w^{-1}}$   (14)

In order to quantify the benefits provided by the knowledge of X, it would be helpful to evaluate the bias E[x̃c] and the MSE E[x̃c x̃c′] of the constrained estimator (14). Unfortunately this is a prohibitive task for a general membership set X.
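For the box set (8), the constrained LS problem (14) is a bound-constrained least-squares problem and can be solved numerically, e.g. with SciPy's bounded LS solver after whitening the noise; the sketch below is an illustrative implementation under these assumptions, not the authors' code.

import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.optimize import lsq_linear

def hard_constrained_ls(H, z, Sigma_w, x_min, x_max):
    """Box-constrained weighted LS estimate, i.e. (14) with X as in (8)."""
    R = cholesky(Sigma_w, lower=True)               # Sigma_w = R R'
    A = solve_triangular(R, H, lower=True)          # whitened regressors
    b = solve_triangular(R, z, lower=True)          # whitened measurements
    res = lsq_linear(A, b, bounds=(x_min, x_max))   # min ||A x - b||^2 s.t. bounds
    return res.x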

A. Single bounded parameter

Conversely, interesting analytical formulas can be obtained in the special case of a single bounded parameter, i.e.

$X = \left\{ x = [x_1, x_2']' : \; x_{1,min} \le x_1 \le x_{1,max}, \; x_2 \in \mathbb{R}^{n-1} \right\}$   (15)

Such formulas are given by the following theorem.

Theorem 1 - Let us consider the constrained estimation problem (14) with X as in (15). Then:
(i) the resulting estimator turns out to be

$\hat{x}_{1,c} = \begin{cases} x_{1,min}, & \text{if } \hat{x}_1 < x_{1,min} \\ \hat{x}_1, & \text{if } x_{1,min} \le \hat{x}_1 \le x_{1,max} \\ x_{1,max}, & \text{if } \hat{x}_1 > x_{1,max} \end{cases}$   (16)

$\hat{x}_{2,c} = \Sigma_{2|1}\, H_2' \Sigma_w^{-1} (z - h_1 \hat{x}_{1,c})$   (17)

where $\hat{x} = [\hat{x}_1, \hat{x}_2']'$ is the unconstrained estimate provided by (2) and Σ2|1 is given by (5).
(ii) The constrained estimate x̂1,c provided by (16) has pdf

$f_c(\xi_1) = F(x_{1,min})\, \delta(\xi_1 - x_{1,min}) + \left(1 - F(x_{1,max})\right) \delta(\xi_1 - x_{1,max}) + f(\xi_1)\, I_{[x_{1,min},\, x_{1,max}]}(\xi_1)$   (18)

where f(·) is the Gaussian pdf of x̂1 and F(·) the corresponding distribution function. Further, let

$\alpha \triangleq \dfrac{x_{1,max} - x_{1,min}}{2\, \sigma_1}, \qquad \beta \triangleq \dfrac{x_1 - x_{1,min}}{x_{1,max} - x_{1,min}}$   (19)

then:
(iii) the estimator (16)-(17) is biased with bias

$E[\tilde{x}_{1,c}] = \delta(\alpha, \beta)\, \sigma_1$   (20)

$E[\tilde{x}_{2,c}] = -L\, \delta(\alpha, \beta)\, \sigma_1$   (21)

where the matrix L is defined in (7) and

$\delta(\alpha, \beta) = \Delta(\alpha\beta) - \Delta(\alpha(1 - \beta))$   (22)

$\Delta(y) \triangleq \dfrac{1}{\sqrt{2\pi}} \exp(-2y^2) - y \left(1 - \mathrm{erf}(\sqrt{2}\, y)\right)$   (23)

(iv) the estimator (16)-(17) is characterized by the following MSE

$\sigma_{1,c}^2 \triangleq E[\tilde{x}_{1,c}^2] = \left(1 - \gamma(\alpha, \beta)\right) \sigma_1^2$   (24)

$\Sigma_{2,c} \triangleq E[\tilde{x}_{2,c}\tilde{x}_{2,c}'] = \Sigma_2 - LL'\, \gamma(\alpha, \beta)\, \sigma_1^2$   (25)

where

$\gamma(\alpha, \beta) = \Gamma(\alpha\beta) + \Gamma(\alpha(1 - \beta))$   (26)

$\Gamma(y) \triangleq \dfrac{1}{2}\left(1 - \mathrm{erf}(\sqrt{2}\, y)\right) + \sqrt{\dfrac{2}{\pi}}\, y \exp(-2y^2) - 2y^2 \left(1 - \mathrm{erf}(\sqrt{2}\, y)\right)$   (27)

Proof - See the Appendix.
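The quantities appearing in Theorem 1 are straightforward to evaluate numerically; the following sketch (an illustration, not supplied with the paper) implements Δ(y), Γ(y), δ(α, β) and γ(α, β) and returns the bias (20) and MSE (24) of the constrained estimate of x1.

import numpy as np
from scipy.special import erf

def Delta(y):
    """Delta(y) of eq. (23)."""
    return np.exp(-2.0 * y**2) / np.sqrt(2.0 * np.pi) - y * (1.0 - erf(np.sqrt(2.0) * y))

def Gamma(y):
    """Gamma(y) of eq. (27)."""
    tail = 1.0 - erf(np.sqrt(2.0) * y)
    return 0.5 * tail + np.sqrt(2.0 / np.pi) * y * np.exp(-2.0 * y**2) - 2.0 * y**2 * tail

def bias_and_mse_x1(alpha, beta, sigma1):
    """Bias (20), MSE (24) and relative reduction gamma for the constrained x1."""
    delta = Delta(alpha * beta) - Delta(alpha * (1.0 - beta))   # (22)
    gamma = Gamma(alpha * beta) + Gamma(alpha * (1.0 - beta))   # (26)
    return delta * sigma1, (1.0 - gamma) * sigma1**2, gamma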

Remarks on Theorem 1

• In the case of a single bounded parameter, the estimator becomes straightforward and requires no optimization. In fact, it first determines x̂1,c by the simple check (16) on the unconstrained estimate x̂1 and then determines x̂2,c by conditioning on x1 = x̂1,c in (17). Conversely, in the general case, the estimator (14) involves the solution of a nonlinear programming problem, e.g. a quadratic programming problem whenever X is a polyhedron (i.e. a set defined by linear inequalities).

• The estimation performance (bias and MSE reduction) is completely characterized by the variables α and β defined in (19). The variable α ≥ 0 is the width of the bounding semi-interval of x1 normalized by the standard deviation of the unconstrained estimate of x1. Notice that α → ∞ corresponds to the unconstrained estimation (2) while α = 0 corresponds to the estimation for known x1 (4). Conversely, β ∈ [0, 1] expresses the location of the true parameter x1 within the bounding interval [x1,min, x1,max], i.e. β = 0, β = 1, β = 1/2 correspond to x1 = x1,min, x1 = x1,max, x1 = x1,m ≜ (x1,min + x1,max)/2 respectively.

• The formulas (20)-(23) give the bias of the estimator as a function of α and β. Notice that the estimator is unbiased for α = 0 (known x1) and for α → ∞ (unconstrained estimation), since Δ(y) → 0 for y → ∞. The estimator is also unbiased for β = 1/2, i.e. whenever the true parameter x1 coincides with the midpoint x1,m of the bounding interval. Further, since Δ(y) is monotonically decreasing for y ≥ 0, the bias is positive (negative) if and only if β < 1/2, or equivalently x1 < x1,m (respectively β > 1/2 or x1 > x1,m). This fact has a simple interpretation: whenever the true parameter x1 is located unsymmetrically with respect to the bounds, the closer bound pushes the estimate towards the other (farther) bound. Also notice that for a given α the bias is anti-symmetric w.r.t. β = 1/2 and is maximum (in absolute value) for β = 0 and β = 1. Finally, since Δ(y) is decreasing for y ≥ 0 and Δ(0) = 1/√(2π), the bias never exceeds the threshold σ1/√(2π). However, the bias is not a good measure of the estimator’s quality; a small amount of bias can be tolerated provided that it implies an MSE reduction.

• The formulas (24)-(27) provide the MSE as a function of α and β. From (24), it is clear that γ(α, β) ∈ [0, 1] represents the relative MSE reduction on x1 implied by the a-priori information x1 ∈ [x1,min, x1,max], i.e.

$\gamma(\alpha, \beta) = \dfrac{\sigma_1^2 - \sigma_{1,c}^2}{\sigma_1^2}$

Moreover, (25) states that the implied MSE reduction on x2 is obtained via multiplication of the MSE reduction on x1 by the gain matrix LL′ (as in the case of known x1). It is interesting to analyze the dependence of the MSE reduction on the amount of a-priori information available on x1 (variable α) as well as on the position of the true parameter x1 (variable β). Notice that for α → 0 (tight constraints), γ(α, β) → 1 (100% MSE reduction); in particular, for α = 0, (24) reduces, as expected, to the formula (6) relative to the case in which x1 is known. Conversely, since Γ(y) → 0 for y → ∞, γ(α, β) → 0 (no reduction) for α → ∞ (loose constraints), as expected. As far as the dependence on β is concerned, notice that γ(α, β) is symmetric w.r.t. β = 1/2. Further useful results concerning the dependence of γ(α, β) on β are given in the subsequent corollary.

Corollary 1 - Let us consider the normalized MSE $\sigma_{1,c}^2/\sigma_1^2 = 1 - \gamma(\alpha, \beta)$, where γ(α, β) is defined in (26)-(27). Then there exists α0 > 0 such that:

(i) for α ≤ α0, the MSE has a unique minimum at β = 1/2;

(ii) for α > α0, the MSE has two minima symmetric w.r.t. β = 1/2, which tend to the endpoints of the interval (β = 0 and β = 1) for increasing α.

Proof - See the Appendix.

The above analysis is illustrated in fig. 1, which plots the relative MSE 1 − γ(α, β) and displays the two types of behaviour: (1) a unique minimum in the center of the interval for α ≤ α0; (2) two minima located symmetrically w.r.t. the center of the interval for α > α0. Notice that the threshold value is α0 ≈ 0.75. This means that for an interval [x1,min, x1,max] sufficiently small, i.e. x1,max − x1,min ≤ 1.5 σ1, a maximum MSE reduction is obtained whenever the true parameter is exactly in the middle of the interval, i.e. x1 = (x1,max + x1,min)/2. Conversely, for an increasing interval width, the percentage MSE reduction decreases and for x1,max − x1,min > 1.5 σ1 the minimum splits up into two minima symmetrically moving towards the interval’s extreme points. Notice that, even for very large α (i.e. α ≫ α0), a significant MSE reduction is obtained whenever the true parameter is located close to the interval’s boundaries.
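The closed-form MSE (24) can also be checked by direct simulation of the clipped estimator (16); the short sketch below (illustrative only, relying on bias_and_mse_x1 from the previous snippet) compares the empirical MSE with (1 − γ(α, β))σ1² for assumed values of α and β.

import numpy as np

def empirical_clipped_mse(x1_true, x1_min, x1_max, sigma1, n=200_000, seed=1):
    """Monte Carlo MSE of the clipped estimate (16) of a single bounded parameter."""
    rng = np.random.default_rng(seed)
    x1_hat = rng.normal(x1_true, sigma1, size=n)   # unconstrained estimate of x1
    x1_c = np.clip(x1_hat, x1_min, x1_max)         # hard-constrained estimate (16)
    return np.mean((x1_c - x1_true) ** 2)

# Example with alpha = 0.5, beta = 0.5, sigma1 = 1 (bounds [-0.5, 0.5], x1 = 0):
# empirical_clipped_mse(0.0, -0.5, 0.5, 1.0) should be close to
# bias_and_mse_x1(0.5, 0.5, 1.0)[1] from the previous sketch.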

B. Multiple bounded parameters

Let us now consider the general case of multiple bounded parameters, wherein the parameter membership set X is a hyper-rectangle in $\mathbb{R}^n$ as defined in (8). Now the dependence of x̂1,c on x̂1 is no longer provided by (16) and may involve x̂2. The relationship (16) holds if x̂1 ∈ [x1,min, x1,max], but it may happen, for instance, that if x̂i > xi,max for some i ≠ 1 and x̂1 > x1,max, nevertheless x̂1,c < x1,max. This is related to the ellipsoidal shape of the level surfaces of the LS cost functional $(x - \hat{x})' \Sigma^{-1} (x - \hat{x})$. In the case of a single bounded parameter it was possible to derive the pdf fc(·), see (18), as the sum of three terms, one for each possible set of active constraints: a continuous term f(·) with support the interval [x1,min, x1,max] and two weighted singular (impulsive) terms centered in x1,min and x1,max. The weights of the impulsive terms correspond to the probabilities, under the unconstrained Gaussian pdf f(·), of the regions x̂1 < x1,min and x̂1 > x1,max which, in this simple case, represent the probabilities that the active constraint is x1 = x1,min or, respectively, x1 = x1,max. In the general case of n bounded parameters, there are clearly 3ⁿ possible active sets and the situation becomes far more complicated. Let a ∈ A denote a generic active set, A being the set of all possible active sets including the empty set ∅. Then a induces a partitioning of x into active components $x_a \in \mathbb{R}^{n_a}$, subject to the equality constraint $x_a = \bar{x}_a$ where $\bar{x}_a$ is a vector of bounds depending on a, and free components $x_f \in \mathbb{R}^{n - n_a}$. Notice that the problem (14) is equivalent to

$\hat{x}_c = \arg\min_{x \in X} \|x - \hat{x}\|^2_{\Sigma^{-1}}$   (28)

Fig. 1. Constrained-to-unconstrained MSE ratio $\sigma_{1,c}^2/\sigma_1^2$ vs. α and β.

which can be regarded as a multiparametric Quadratic Programming (mpQP) problem where x̂ is a parameter of the optimization problem and the vector x must be optimized for all values of x̂. By the theory of mpQP [9], [10], for each a ∈ A there exists a polyhedral set Xa such that whenever x̂ ∈ Xa the inequality-constrained estimate x̂c in (28) coincides with the equality-constrained estimate under the active constraints $x_a = \bar{x}_a$, namely

$X_a = \left\{ \hat{x} \in \mathbb{R}^n : \; \hat{x}_c = \arg\min_{x_a = \bar{x}_a} \|x - \hat{x}\|^2_{\Sigma^{-1}} \right\}$   (29)

Notice that obviously Xa = X for a = ∅ (no active constraints). Conversely, for a ∈ A\∅, the determination of Xa is not obvious but can be tackled, at the price of a high computational burden, with the mpQP solvers available in the literature [9], [10]. Assuming that the sets Xa are given, the pdf fc(·) of the constrained estimate can be formally expressed as follows

$f_c(\xi) = f(\xi)\, I_X(\xi) + \sum_{a \in A_v} \delta(\xi - \bar{x}_a) \int_{X_a} f(\chi)\, d\chi + \sum_{a \in A_0} \delta(\xi_a - \bar{x}_a)\, f_a(\xi_f)$   (30)

where Av is the set of active sets of cardinality na = n (vertices of X), A0 is the set of active sets of cardinality 0 < na < n (hyperfaces of dimension 1, ..., n − 1 of X) and fa(ξf) represent the pdf’s, on such hyperfaces, of the free variables. The expression (30) is a generalization of (18) to n ≥ 1 bounded parameters. Its evaluation requires:

1) the determination of the regions Xa via mpQP techniques;
2) the calculation of the n-dimensional integrals of the unconstrained Gaussian pdf f(·) over Xa for all a ∈ Av;
3) the determination of the functions fa(·) for all a ∈ A0.

Notice that in particular the above task 3, which is not needed in the case n = 1, seems analytically intractable. To see the difficulty, let us consider the case of n = 2 bounded parameters. In this case, fc(ξ1, ξ2) consists of 3² = 9 terms: a continuous term f(ξ1, ξ2) with support the rectangle X; 4 singular terms with support the edges of X; 4 singular terms with support the vertices of X. This is illustrated in fig. 2, wherein two possible unconstrained estimates x̂a, x̂b and the ellipsoidal level curves of the LS cost function centered around them are displayed; it can be seen that in case a the constrained estimate x̂ca is in the vertex [x1,max, x2,max]′ while in case b the constrained estimate x̂cb is on the segment x2 = x2,max, x1 ∈ [x1,min, x1,max]. However, the weights of the above-mentioned components of fc(·), which correspond to the probabilities, under the Gaussian pdf f(·), of nine regions of the parameter space, cannot be computed simply by evaluating the distribution function F(·) as in (18). For instance, the term associated to the vertex v = [x1,max, x2,max]′,

$\delta(\xi_1 - x_{1,max})\, \delta(\xi_2 - x_{2,max}) \int_{X_v} f(\chi_1, \chi_2)\, d\chi_1\, d\chi_2,$

requires the determination of the set Xv of unconstrained estimates x̂ for which the constrained estimate x̂c coincides with v, and then the evaluation of the probability of the event x̂ ∈ Xv under the distribution f(·). Clearly, this probability can only be evaluated in a numerical way. The situation is even more complicated for the edges. For instance, the term associated to the edge e = {x : x1,min ≤ x1 ≤ x1,max, x2 = x2,max} requires the determination of the pdf fe(ξ1) over the edge, which seems to be related to the unconstrained Gaussian distribution in an overly complicated fashion. From the above arguments, it is clear that the computation of the statistics of the hard-constrained estimator x̂c is prohibitive, if not impossible, for multiple bounded parameters. Computations become easier if we consider a sub-optimal constrained estimator operating as follows.

Sub-Optimal Constrained Estimator

for i = 1 to n

$\hat{x}_{i:n} = \left(H_{i:n}' \Sigma_w^{-1} H_{i:n}\right)^{-1} H_{i:n}' \Sigma_w^{-1} \left(z - H_{1:i-1}\, \hat{x}_{1:i-1,s}\right)$

$\hat{x}_{i,s} = \begin{cases} x_{i,min}, & \text{if } \hat{x}_i < x_{i,min} \\ \hat{x}_i, & \text{if } x_{i,min} \le \hat{x}_i \le x_{i,max} \\ x_{i,max}, & \text{if } \hat{x}_i > x_{i,max} \end{cases}$   (31)

The above estimator provides the estimate vector $\hat{x}_s = [\hat{x}_{1,s}, \hat{x}_{2,s}, \dots, \hat{x}_{n,s}]'$ sequentially as follows: it first determines the estimate x̂1,s of x1 based on x1 ∈ [x1,min, x1,max], then it determines the estimate x̂2,s of x2 based on x1 = x̂1,s and x2 ∈ [x2,min, x2,max], and so forth. At stage i, the estimator gets x̂i,s based on x1:i−1 = x̂1:i−1,s and xi ∈ [xi,min, xi,max]. In general, x̂s ≠ x̂c and, therefore, x̂s is sub-optimal in the sense of (14). Further, x̂s clearly depends on the ordering of the entries in the parameter vector. Since (31) actually involves the sequential application of the hard-constrained estimator for a single bounded parameter, it is possible to use the analytical results of Theorem 1 in order to evaluate, in a conservative way, the MSE reduction that can be achieved exploiting bounds on multiple parameters. In fact, the MSE σ²i,s of the estimators x̂i,s, provided by (31), can be computed by the following recursion.

1) Initialization

$\Sigma_s = \Sigma = \begin{bmatrix} \sigma_{1,s}^2 & * & * & * \\ * & \ddots & * & * \\ * & * & \sigma_{i,s}^2 & * \\ * & * & * & \Sigma_{i+1:n,s} \end{bmatrix}$

2) for i = 1 to n

$\alpha_i = \dfrac{x_{i,max} - x_{i,min}}{2\, \sigma_{i,s}}, \qquad \beta_i = \dfrac{x_i - x_{i,min}}{x_{i,max} - x_{i,min}}$

$L_i = \left(H_{i+1:n}' \Sigma_w^{-1} H_{i+1:n}\right)^{-1} H_{i+1:n}' \Sigma_w^{-1} h_i$

$\Sigma_{i+1:n,s} = \Sigma_{i+1:n,s} - L_i L_i'\, \gamma(\alpha_i, \beta_i)\, \sigma_{i,s}^2$

$\sigma_{i,s}^2 = \sigma_{i,s}^2 \left(1 - \gamma(\alpha_i, \beta_i)\right)$   (32)

Please notice that overwriting has been used in (32) and ∗ denotes don’t care blocks at the i-th iteration. At the end of the recursion, the quantity

$\gamma_{n,s}(\beta) = \dfrac{\sigma_n^2 - \sigma_{n,s}^2}{\sigma_n^2}$

provides, for the parameter xn and for a given location of the true parameter specified by the vector β = [β1, ..., βn]′, a conservative estimate (lower bound) of the relative MSE reduction achievable by the use of constraints. Such an estimate can be obtained for all the parameters xi via (32) by a suitable re-ordering of the components of x which moves xi to the bottom position of the re-ordered vector.
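A compact rendition of the sequential sub-optimal estimator (31) is sketched below (illustrative; variable names and shapes are assumptions, not the authors' code). At step i the already-estimated components are frozen at their clipped values, the remaining block is re-estimated by unconstrained weighted LS, and the i-th component is clipped to its bounds; as noted above, the result depends on the ordering of the parameters.

import numpy as np

def sequential_constrained_ls(H, z, Sigma_w, x_min, x_max):
    """Sequential sub-optimal hard-constrained estimator (31)."""
    n = H.shape[1]
    Sw_inv = np.linalg.inv(Sigma_w)
    x_s = np.zeros(n)
    for i in range(n):
        Hi = H[:, i:]                               # H_{i:n}
        r = z - H[:, :i] @ x_s[:i]                  # remove the already-fixed part
        x_tail = np.linalg.solve(Hi.T @ Sw_inv @ Hi, Hi.T @ Sw_inv @ r)
        x_s[i] = np.clip(x_tail[0], x_min[i], x_max[i])   # clip the i-th component
    return x_s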

Fig. 2. Hard-constrained solution for the case of two bounded parameters

IV. SOFT-CONSTRAINED ESTIMATION

An alternative way of incorporating a-priori information in parameter estimation is to introduce soft constraints, i.e. a penalization in the cost functional of the deviation from some prior parameter value. This can be interpreted, in a Bayesian framework, as the adoption of a Gaussian (normal) prior distribution on the parameters. In fact, assuming that $x \sim \mathcal{N}(\bar{x}, \bar{\Sigma})$, the MAP criterion (10) becomes [2]

$f(x|z) = \dfrac{1}{c} \exp\left[-\left(\|z - Hx\|^2_{\Sigma_w^{-1}} + \|x - \bar{x}\|^2_{\bar{\Sigma}^{-1}}\right)\right]$   (33)

which involves an additional weighted penalization of x − x̄. Accordingly, the MAP estimate, which maximizes (33), is given by [2], [3]

$\hat{x}_{sc} = \Sigma_{sc} \left(H' \Sigma_w^{-1} z + \bar{\Sigma}^{-1} \bar{x}\right)$   (34)

where

$\Sigma_{sc} \triangleq \left(H' \Sigma_w^{-1} H + \bar{\Sigma}^{-1}\right)^{-1} = \left(\Sigma^{-1} + \bar{\Sigma}^{-1}\right)^{-1}$   (35)
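A direct implementation of (34)-(35) is immediate; the sketch below is illustrative only (x_bar and Sigma_bar denote the prior mean x̄ and prior covariance Σ̄).

import numpy as np

def soft_constrained_ls(H, z, Sigma_w, x_bar, Sigma_bar):
    """Soft-constrained (Gaussian-prior MAP) estimate (34) and the matrix (35)."""
    Sw_inv = np.linalg.inv(Sigma_w)
    Sb_inv = np.linalg.inv(Sigma_bar)
    Sigma_sc = np.linalg.inv(H.T @ Sw_inv @ H + Sb_inv)      # (35)
    x_sc = Sigma_sc @ (H.T @ Sw_inv @ z + Sb_inv @ x_bar)    # (34)
    return x_sc, Sigma_sc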

The estimator (34), that will be referred to hereafter as the soft-constrained estimator, has bias

$E[\tilde{x}_{sc}] = \Sigma_{sc}\, \bar{\Sigma}^{-1} \left(\bar{x} - E[x]\right)$   (36)

and MSE

$E[\tilde{x}_{sc}\tilde{x}_{sc}'] = \Sigma_{sc}\, \bar{\Sigma}^{-1} E\left[(x - \bar{x})(x - \bar{x})'\right] \bar{\Sigma}^{-1}\, \Sigma_{sc} + \Sigma_{sc}\, \Sigma^{-1}\, \Sigma_{sc}$   (37)

If indeed the assumption $x \sim \mathcal{N}(\bar{x}, \bar{\Sigma})$ holds true, the soft-constrained estimator (34) is unbiased and its MSE in (37) turns out to be equal to Σsc. In this case, (34) is clearly optimal also in the MMSE sense. Conversely, if the available a-priori information is the hard constraint x ∈ X, then x̄ and Σ̄ can be regarded as design parameters of the soft-constrained estimator (34) to be chosen so as to possibly minimize, in some sense, the MSE in (37). In particular, the following two settings can be considered.

Stochastic Setting - The parameter x is regarded as a random variable uniformly distributed over the set X defined in (8). In this case, it is natural to choose x̄ as the center of X, i.e.

$\bar{x} = E[x] = x_m \triangleq \tfrac{1}{2}\left(x_{min} + x_{max}\right)$   (38)

where xmin and xmax are the vectors of entries xi,min and, respectively, xi,max. Moreover,

$E\left[(x - \bar{x})(x - \bar{x})'\right] = \mathrm{diag}\left\{\dfrac{(x_{i,max} - x_{i,min})^2}{12}\right\}_{i=1,\dots,n}$   (39)

and the MSE in (37) becomes a function of Σ̄ only. For instance, one can minimize the MSE’s trace, i.e.

$\min_{\bar{\Sigma}} E[\tilde{x}_{sc}'\tilde{x}_{sc}] = \min_{\bar{\Sigma}} \mathrm{tr}\left\{\left(\Sigma^{-1} + \bar{\Sigma}^{-1}\right)^{-1} \left[\bar{\Sigma}^{-1} E\left[(x - \bar{x})(x - \bar{x})'\right] \bar{\Sigma}^{-1} + \Sigma^{-1}\right] \left(\Sigma^{-1} + \bar{\Sigma}^{-1}\right)^{-1}\right\}$   (40)

Deterministic Setting - The parameter x is regarded as a deterministic variable in X. Also in this case, x̄ is naturally chosen as the center of X. Further, $E\left[(x - \bar{x})(x - \bar{x})'\right] = (x - \bar{x})(x - \bar{x})'$ and the MSE in (37) depends, therefore, on the unknown true parameter x, besides Σ̄. To avoid the dependence on x, one can perform the min-max optimization of the MSE’s trace

$\min_{\bar{\Sigma}} \max_{x \in X} E[\tilde{x}_{sc}'\tilde{x}_{sc}] = \min_{\bar{\Sigma}} \max_{x \in X} \mathrm{tr}\left\{\left(\Sigma^{-1} + \bar{\Sigma}^{-1}\right)^{-1} \left[\bar{\Sigma}^{-1} (x - \bar{x})(x - \bar{x})'\, \bar{\Sigma}^{-1} + \Sigma^{-1}\right] \left(\Sigma^{-1} + \bar{\Sigma}^{-1}\right)^{-1}\right\}$   (41)

It can be noticed that in both of the above settings, Σ̄ becomes a design parameter which can be chosen in order to obtain a good performance for the estimator (34). A possible design approach is to take $\bar{\Sigma} = V\, \mathrm{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\}\, V'$, where V is the eigenvector matrix of the unconstrained covariance Σ, and to minimize the trace of the MSE via (40) (MSE averaged over x ∼ U(X)) or via (41) (worst case MSE w.r.t. x ∈ X). The optimal solution turns out to be [11]

$\lambda_i = \dfrac{(x_{i,max} - x_{i,min})^2}{\kappa^2}$   (42)

where

$\kappa = \begin{cases} \sqrt{12}, & \text{for the stochastic setting} \\ 2, & \text{for the deterministic setting} \end{cases}$   (43)

The resulting soft-constrained estimator (34) coincides with the well known ridge estimator [11]. An alternative design approach, which does not impose any structure on Σ̄, has been proposed in [12]. The approach in [12] is based on the deterministic setting and consists of minimizing tr(Σ̄) subject to the constraint that the MSE (37) of the soft-constrained estimator does not exceed, for any possible x ∈ X, the MSE Σ of the unconstrained estimator.
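Under the eigenvector-aligned structure described above, the prior covariance follows directly from (42)-(43); the sketch below is an illustration of that choice (the pairing of constraint widths with eigendirections follows the paper's indexing and is an assumption here).

import numpy as np

def design_prior_covariance(Sigma, x_min, x_max, setting="stochastic"):
    """Prior covariance with the eigenvectors of Sigma and eigenvalues (42)-(43)."""
    kappa = np.sqrt(12.0) if setting == "stochastic" else 2.0   # (43)
    _, V = np.linalg.eigh(Sigma)                                # eigenvectors of Sigma
    lam = (x_max - x_min) ** 2 / kappa**2                       # (42)
    return V @ np.diag(lam) @ V.T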

Like in the previous sections on unconstrained and hard-constrained estimation, let us now consider the case in which a-priori information is available on the parameter x1 only. In the soft-constrained setting, this amounts to assuming that

$\bar{x} = \begin{bmatrix} x_{1,m} \\ 0 \end{bmatrix} = \begin{bmatrix} \dfrac{x_{1,min} + x_{1,max}}{2} \\ 0 \end{bmatrix}, \qquad \bar{\Sigma}^{-1} = \begin{bmatrix} \dfrac{1}{\bar{\sigma}_1^2} & 0 \\ 0 & 0 \end{bmatrix}$   (44)

The following result suggests how to choose σ̄1 in an optimal way.

Theorem 2 - The optimal solution of (40) (for the stochastic setting) or (41) (for the deterministic setting) under (15) and (44) is given by

$\bar{\sigma}_1 = \dfrac{x_{1,max} - x_{1,min}}{\kappa}$   (45)

where κ is defined in (43).
Proof - See the Appendix.

Remark - The estimators obtained in the deterministic and stochastic settings do not have equivalent properties. The former guarantees $E[\tilde{x}_{sc}'\tilde{x}_{sc}] \le E[\tilde{x}'\tilde{x}] = \mathrm{tr}\, \Sigma$ for any x ∈ X, while for the latter the above inequality holds in the mean, over the uniform distribution of x, but not necessarily for any value of x in X. As a consequence, the stochastic setting is characterized by a lower covariance (uncertainty) than the deterministic setting; this suggests that the deterministic MAPs estimator is overly conservative.

V. HARD-CONSTRAINED VS. SOFT-CONSTRAINED: NUMERICAL EXAMPLE

In this section, the aim is to compare hard-constrained and soft-constrained estimation on a simple line fitting case-study. The problem is to fit position measurements of a target obtained at different time instants via a line (affine function of time). The measured position z(i) at sample time i is modeled by

$z(i) = x_1 + iT\, x_2 + w(i)$   (46)

where: i ∈ {0, 1, ..., N − 1}; N > 2 is the number of measurements; T > 0 is the sampling interval; the initial position x1 and velocity x2 are the two parameters to be estimated; $w(i) \sim \mathcal{N}(0, \sigma_w^2)$ is the measurement noise. This is a special, 2-dimensional, case of the linear regression (1) with

$z = \begin{bmatrix} z(0) \\ z(1) \\ \vdots \\ z(N-1) \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & 0 \\ 1 & T \\ \vdots & \vdots \\ 1 & (N-1)T \end{bmatrix}$   (47)

Specializing (3) to this case, one gets

$\Sigma \triangleq \begin{bmatrix} \sigma_1^2 & \sigma_{21} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} = \sigma_w^2 \begin{bmatrix} \dfrac{2(2N-1)}{N(N+1)} & -\dfrac{6}{T N(N+1)} \\ -\dfrac{6}{T N(N+1)} & \dfrac{12}{T^2 N(N^2-1)} \end{bmatrix}$   (48)

which shows how the MSE of the unconstrained estimator decreases by increasing the size N of the data sample and the sampling rate 1/T, or by decreasing the variance σ²w of the measurement noise.
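The line-fitting setup (46)-(48) can be reproduced with a few lines of code; the following sketch (illustrative, with N, T and sigma_w as free assumptions) builds H as in (47) and computes Σ, which can be compared with the closed form (48).

import numpy as np

def line_fit_setup(N, T, sigma_w):
    """Design matrix (47) and unconstrained covariance (3) for the model (46)."""
    t = np.arange(N) * T
    H = np.column_stack([np.ones(N), t])          # (47)
    Sigma = sigma_w**2 * np.linalg.inv(H.T @ H)   # (3) with Sigma_w = sigma_w^2 I
    return H, Sigma

# For comparison, the closed form (48) gives
#   Sigma[0, 0] = sigma_w^2 * 2*(2*N - 1) / (N*(N + 1))
#   Sigma[1, 1] = sigma_w^2 * 12 / (T**2 * N * (N**2 - 1))
#   Sigma[0, 1] = -sigma_w^2 * 6 / (T * N * (N + 1))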

According to the available a-priori information and the way in which it is processed, three cases will be distinguished:

• information on a single parameter;
• information on both parameters (optimal approach);
• information on both parameters (sub-optimal approach).

For ease of reference, these cases will be referred to in the sequel as SP (Single Parameter), MP (Multiple Parameters) and SMP (Sequential Multiple Parameters). Recall that in the sub-optimal approach the information on both parameters is processed in a sequential (sub-optimal) way, operating as follows: to compute the estimate x̂2 of x2, the information on x1 is used to estimate x1 and then this estimate is used, along with the information on x2, to compute x̂2; the reverse order is adopted to obtain the estimate x̂1 of x1. It is worth pointing out that the sub-optimal approach can be applied in the same way to both hard-constrained and soft-constrained estimation, but turns out to be advantageous only for hard-constrained estimation, where it avoids the solution of a QP problem. Whenever the information is available on a single parameter, the equations (24)-(27) and (37) with (44) provide the MSE for the hard- and, respectively, soft-constrained estimators. To compare the two estimators in this case it is useful, as in (24)-(27), to express the relationship (37) in terms of α. In fact, this allows one to evaluate the performance of the constrained estimators versus the ratio between the constraint width and the standard deviation of the unconstrained estimator and, thus, to quantify, independently of the values of N, T, σw, the effectiveness of constraints for MSE reduction. From (19) and (45), it follows that a fair comparison “soft vs. hard” can be performed by choosing σ̄1 = 2α σ1/κ for the soft-constrained estimator.

Let us first consider the SP case. Fig. 3 shows, for both the constrained parameter x1 and the free one x2, as well as for both choices of κ (deterministic or stochastic setting), the MSE$^1$, averaged over β, of the unconstrained, soft-constrained (MAPs) and hard-constrained (MAPh) estimators for varying α. From these plots it can be seen that for low α, wherein the benefits of constrained estimation are more evident, soft constraints yield better performance than hard constraints, whereas the converse holds for high α. Fig. 3 also displays the MSE, for the constrained variable x1, obtained by using a-priori information only (i.e. ignoring measurements); this amounts to taking the center x1,m as the estimate of x1, with associated normalized MSE α²/3. Notice that the estimator based on prior information only gives a lower MSE than MAPh for α < 1.35. Conversely, it gives a MSE lower than the deterministic setting of MAPs for α < 1 and never lower than the stochastic setting of MAPs. Also notice that the stochastic MAPs estimator always gives a better performance on average than its deterministic counterpart. Fig. 4 displays the MSE behaviour of the constrained parameter x1 versus β (which characterizes the position of the true parameter in the interval [x1,min, x1,max]), for different values of α. It can be noticed that the concavity of the MSE curve for the hard-constrained estimator changes with α, as previously remarked after Corollary 1. It can also be seen that, for α = 5, the MAPs estimator designed under the stochastic setting provides a higher MSE than the unconstrained estimator whenever the true parameter is close to the border of the constraint (β ≈ 0 or β ≈ 1), while under the deterministic setting the MAPs performance is never worse than in the unconstrained case (even if on average the MSE in the stochastic setting will be lower than in the deterministic setting). Summing up, the deterministic MAPs estimator is better when the true value is close to the extreme points, while the stochastic MAPs estimator is better on average. Since the latter covers the majority of situations, whereas a worst-case design (such as the deterministic one) is worse on average and only slightly better in some cases of low probability, it can be concluded that the stochastic MAPs estimator seems the most valuable.

Let us now examine the MP case. In this case, there is no analytical (exact) formula to evaluate the MSE and resort has been made to Monte Carlo simulations wherein both the noise realization and the true parameter location have been randomly varied. As in the SP case, the MSE is reported as a function of α, defined as the ratio between the constraint semi-width and the standard deviation of the unconstrained estimate for both parameters, i.e.

$\alpha = \dfrac{x_{1,max} - x_{1,min}}{2\sigma_1} = \dfrac{x_{2,max} - x_{2,min}}{2\sigma_2}$

For fixed α, the matrix Σ̄ of the MAPs estimator has been taken with the same eigenvectors as Σ and with eigenvalues optimally chosen as in (42)-(43). The simulation results are shown in Table I, which reports, for different values of α, the MSE, averaged over β, of the MAPs (deterministic setting) and MAPh estimators in the SP, MP and SMP cases. Notice that the MSE has been normalized w.r.t. the MSE of the unconstrained estimator and that the SP case is relative to x1 constrained (the results relative to x2 constrained can be obtained by interchanging the columns x1 and x2 in the table). It is worth pointing out that the same MSE values, within less than 1% accuracy, have been obtained with Monte Carlo simulations and with the analytical formulas given in the previous section whenever available (i.e. except for the MAPh estimator in the MP case); this confirms the validity of the theoretical results of Theorem 1. Examination of the table reveals that, in the case of both bounded parameters, hard-constrained estimation yields better performance starting from lower values of α. This difference w.r.t. the case of a single bounded parameter is due to the fact that in the 2-dimensional space the special choice of Σ̄, aligned with the eigenvectors of Σ, is not able to effectively capture the a-priori information provided by hard constraints. A different design procedure of the parameters x̄ and Σ̄ of the soft-constrained estimator, much more effective for multiple parameters and general polytopic constraints, can be found in [12]. Finally, it can be noticed that the sub-optimal approach (SMP) gives a performance which is much better than the optimal single parameter approach (SP) and very close to the optimal two-parameter approach (MP). This means not only that the sub-optimal hard-constrained estimator (31) provides a good trade-off between performance and computational cost but also that the sequential MSE analysis (32) gives a very good approximation of the MSE obtained with the optimal hard-constrained estimator (14).

$^1$ Different measures, such as posterior entropy, could certainly be used, but the concern in this paper is on MSE, for which we have been able to derive analytical formulas and an approximate evaluation algorithm in the presence of hard bounds on the parameters.

VI. CONCLUSIONS

The paper has investigated two alternative ways of exploiting a-priori knowledge on the estimated parameters in the estimation process. The hard-constrained estimation approach, wherein the a-priori knowledge is specified in terms of a membership set to which the unknown parameters certainly belong, requires constrained optimization and is, therefore, computationally more expensive than soft-constrained estimation, wherein the knowledge is expressed in terms of a Gaussian prior distribution with known mean and covariance of the unknown parameters. Exact analytical formulas for the bias and MSE reduction of the hard-constrained estimator have been derived for a linear regression model and interval constraints on a single parameter. These formulas express the performance improvement in terms of the key variable α, defined as the ratio between the width of the constraint semi-interval and the standard deviation of the unconstrained estimator. It has also been shown how these formulas can be employed to evaluate, in an approximate way, the performance improvement in the general case in which bounds are available on all parameters. Results on the choice of the prior covariance of the soft-constrained estimator for optimal MSE performance have also been given. A case-study concerning a line fitting estimation problem has confirmed the validity of the presented theoretical results and has also provided an interesting comparison between the hard-constrained and soft-constrained approaches. From this comparison, the following considerations can be drawn.

• The exploitation of a-priori information, when available, is always beneficial, and the more beneficial the lower α is, i.e. for tighter constraints, shorter data samples and higher noise variance.

• For low α, the soft-constrained estimator is preferable while, for α greater than a suitable threshold value, the hard-constrained estimator yields better performance.

• The above mentioned threshold decreases by increasing the number of bounded parameters.

• Even for multiple bounded parameters, there seems to be an appropriate choice of the prior covariance of the soft-constrained estimator [12] for which its performance is comparable to that of the hard-constrained estimator.

A perspective for future work is the generalization of the CRLB to inequality-constrained estimation. The fact that the inequality-constrained estimation problem is converted, with a given probability, into one of several possible equality-constrained problems can probably be exploited in order to derive a sort of CRLB using the results in [1].

APPENDIX

Proof of Theorem 1

Part (i) - If x̂1 ∈ [x1,min, x1,max], the unconstrained estimate (2) satisfies the constraints and, therefore, provides the solution of (14)-(15). Otherwise, the solution of (14)-(15) has an active constraint, either x̂1,c = x1,min or x̂1,c = x1,max, according to whether x̂1 < x1,min or x̂1 > x1,max. Hence x̂1,c is given by (16). Moreover, since x2 is unconstrained, x̂2,c is nothing but the unconstrained estimate of x2 for known x1 = x̂1,c and is, therefore, given by (17).

Part (ii) - From (16), the distribution function Fc(·) of the constrained estimate x̂1,c turns out to be

$F_c(\xi_1) = \begin{cases} 0, & \text{if } \xi_1 < x_{1,min} \\ F(\xi_1), & \text{if } x_{1,min} \le \xi_1 < x_{1,max} \\ 1, & \text{if } \xi_1 \ge x_{1,max} \end{cases}$   (49)

where F(·) is the Gaussian distribution function of the unconstrained estimate x̂1. Hence, the pdf fc(·) of x̂1,c turns out to be given by (18), where f(·) is the Gaussian pdf of x̂1.

Part (iii) - To compute the bias of the estimate x̂1,c, let us consider the definition

$E[\hat{x}_{1,c}] = \int_{-\infty}^{\infty} \xi_1\, f_c(\xi_1)\, d\xi_1$   (50)

Exploiting (18) and the relationships

$\int_{-\infty}^{\infty} \xi_1\, f(\xi_1)\, d\xi_1 = E[\hat{x}_1] = x_1 \quad \text{(unbiasedness of } \hat{x}_1\text{)}$   (51)

$F(y) = \int_{-\infty}^{y} f(\xi_1)\, d\xi_1$   (52)

[Fig. 3 appears here: a 2x2 grid of plots, with the deterministic setting in the left column and the stochastic setting in the right column; the top row shows the MSE of the estimate of the constrained parameter and the bottom row the MSE of the estimate of the free parameter, both vs. α.]

Fig. 3. MSE (averaged over β) vs. α for the unconstrained (dash-dotted line), hard-constrained (dotted line) and soft-constrained (solid line) estimators. The dashed line in the two upper figures indicates the MSE α²/3 of the estimate based on prior information only.

TABLE I
Normalized % MSE (MSE = 100% for unconstrained estimation) for the cases SP, MP and SMP. The MAPs estimator in the SMP case is not considered.

              |     SP      |     MP      |     SMP
              |  x1    x2   |  x1    x2   |  x1    x2
α = 0.5  MAPs |  10    40   |  10    22   |   -     -
         MAPh |  21    47   |  17    17   |  18    18
α = 1    MAPs |  34    56   |  34    48   |   -     -
         MAPh |  48    65   |  37    37   |  38    38
α = 1.5  MAPs |  56    71   |  55    66   |   -     -
         MAPh |  64    76   |  51    51   |  52    52

from (50) we get

$E[\tilde{x}_{1,c}] = \int_{-\infty}^{x_{1,min}} (x_{1,min} - \xi_1)\, f(\xi_1)\, d\xi_1 + \int_{x_{1,max}}^{\infty} (x_{1,max} - \xi_1)\, f(\xi_1)\, d\xi_1$   (53)

Notice that the bias (53) clearly depends on the true parameter value x1 via f(ξ1). The first term of (53) provides a positive bias while the second provides a negative bias. Exploiting the Gaussianity of f(·), i.e.

$f(\xi_1) = \dfrac{1}{\sqrt{2\pi}\, \sigma_1} \exp\left(-\dfrac{(\xi_1 - x_1)^2}{2\, \sigma_1^2}\right)$   (54)

and evaluating the integrals in (53) we get, after lengthy calculations which are not reported due to lack of space, the expression provided by (20) and (22)-(23). The next step is to compute the bias of x̂2,c. Substitution of (2) into (17) yields

$\hat{x}_{2,c} = x_2 - L\, \tilde{x}_{1,c} + \Sigma_{2|1} H_2' \Sigma_w^{-1} w$

where L is defined in (7). Consequently, x̃2,c is conditionally Gaussian given x̃1,c, with mean −L x̃1,c and covariance Σ2|1, i.e.

$\tilde{x}_{2,c} \sim \mathcal{N}\left(-L\, \tilde{x}_{1,c},\ \Sigma_{2|1}\right)$   (55)

Hence

$E[\tilde{x}_{2,c}] = E\left[\, E[\tilde{x}_{2,c} \mid \tilde{x}_{1,c}]\, \right] = -L\, E[\tilde{x}_{1,c}]$

which, by (20), yields (21).

Part (iv) - Let us now evaluate the MSE of x̂1,c, defined as

$\sigma_{1,c}^2 \triangleq E[\tilde{x}_{1,c}^2] = \int_{-\infty}^{\infty} (\xi_1 - x_1)^2 f_c(\xi_1)\, d\xi_1.$

[Fig. 4 appears here: a 4x2 grid of plots for α = 0.5, 1, 1.5 and 5 (rows), with the deterministic setting in the left column and the stochastic setting in the right column, each showing the MSE of the unconstrained, MAPs and MAPh estimators vs. β.]

Fig. 4. MSE vs. β for different values of α.

Exploiting (18), (52) and

$\int_{-\infty}^{\infty} (\xi_1 - x_1)^2 f(\xi_1)\, d\xi_1 = \sigma_1^2$

we get

$\sigma_{1,c}^2 = \sigma_1^2 - \varepsilon$

where

$\varepsilon = \int_{-\infty}^{x_{1,min}} \left[(x_1 - \xi_1)^2 - (x_1 - x_{1,min})^2\right] f(\xi_1)\, d\xi_1 + \int_{x_{1,max}}^{\infty} \left[(\xi_1 - x_1)^2 - (x_{1,max} - x_1)^2\right] f(\xi_1)\, d\xi_1$   (56)

Notice that ε > 0 is the absolute MSE reduction; in particular, the two terms in (56) represent the reduction due to the a-priori information provided by the constraints x1 ≥ x1,min and, respectively, x1 ≤ x1,max. Again, exploiting (54) and evaluating the integrals in (56) with the use of the erf function, after tedious calculations omitted due to space considerations, we get for $E[\tilde{x}_{1,c}^2]$ the expression in (24) and (26)-(27). The final step of the proof is to compute the MSE of x̂2,c. From (55), using (6) and (24),

$E[\tilde{x}_{2,c}\tilde{x}_{2,c}'] = E\left[\, E[\tilde{x}_{2,c}\tilde{x}_{2,c}' \mid \tilde{x}_{1,c}]\, \right] = E\left[LL'\, \tilde{x}_{1,c}^2 + \Sigma_{2|1}\right] = \Sigma_2 - LL'\, \sigma_1^2 + LL'\, E[\tilde{x}_{1,c}^2] = \Sigma_2 - LL'\, \gamma(\alpha, \beta)\, \sigma_1^2$

which is just (25).

Proof of Corollary 1 - From (56), it follows that

$\gamma(\alpha, \beta) = \dfrac{\varepsilon}{\sigma_1^2} = \dfrac{1}{\sigma_1^2} \left[ \int_{-\infty}^{x_{1,min}} \left[(\xi_1 - x_1)^2 - (x_{1,min} - x_1)^2\right] f(\xi_1)\, d\xi_1 + \int_{x_{1,max}}^{\infty} \left[(\xi_1 - x_1)^2 - (x_{1,max} - x_1)^2\right] f(\xi_1)\, d\xi_1 \right]$   (57)

Differentiating γ(α, β) w.r.t. x1 one gets

$\dfrac{\partial \gamma(\alpha, \beta)}{\partial x_1} = \int_{-\infty}^{x_{1,min}} \left[ -2(\xi_1 - x_1) + 2(x_{1,min} - x_1) + \dfrac{(\xi_1 - x_1)^3}{\sigma_1^2} - \dfrac{(x_{1,min} - x_1)^2 (\xi_1 - x_1)}{\sigma_1^2} \right] \dfrac{f(\xi_1)}{\sigma_1^2}\, d\xi_1 + \int_{x_{1,max}}^{\infty} \left[ -2(\xi_1 - x_1) + 2(x_{1,max} - x_1) + \dfrac{(\xi_1 - x_1)^3}{\sigma_1^2} - \dfrac{(x_{1,max} - x_1)^2 (\xi_1 - x_1)}{\sigma_1^2} \right] \dfrac{f(\xi_1)}{\sigma_1^2}\, d\xi_1$   (58)

Exploiting the relationship $\dfrac{\partial f(\xi_1)}{\partial \xi_1} = -\dfrac{(\xi_1 - x_1)}{\sigma_1^2}\, f(\xi_1)$, due to the Gaussianity of f(·), after straightforward calculations (58) can be simplified to

$\sigma_1^2\, \dfrac{\partial \gamma(\alpha, \beta)}{\partial x_1} = 2\, (x_{1,max} - x_1) \int_{x_{1,max}}^{\infty} f(\xi_1)\, d\xi_1 - 2\, (x_1 - x_{1,min}) \int_{-\infty}^{x_{1,min}} f(\xi_1)\, d\xi_1$   (59)

Differentiating (59) w.r.t. x1,

$\sigma_1^2\, \dfrac{\partial^2 \gamma(\alpha, \beta)}{\partial x_1^2} = -2 \int_{-\infty}^{x_{1,min}} f(\xi_1)\, d\xi_1 + 2\, (x_1 - x_{1,min})\, f(x_{1,min}) - 2 \int_{x_{1,max}}^{\infty} f(\xi_1)\, d\xi_1 + 2\, (x_{1,max} - x_1)\, f(x_{1,max})$   (60)

Finally, exploiting the definition of the error function, from (59)-(60) and taking into account that dx1/dβ = 2ασ1, one gets

$\dfrac{\partial \gamma(\alpha, \beta)}{\partial \beta} = \Gamma_1(\alpha\beta) - \Gamma_1(\alpha(1 - \beta))$   (61)

$\dfrac{\partial^2 \gamma(\alpha, \beta)}{\partial \beta^2} = \Gamma_2(\alpha\beta) + \Gamma_2(\alpha(1 - \beta))$   (62)

where

$\Gamma_1(y) = -4\alpha y \left[1 - \mathrm{erf}(\sqrt{2}\, y)\right], \qquad \Gamma_2(y) = 4\alpha^2 \left[\mathrm{erf}(\sqrt{2}\, y) - 1 + \dfrac{4y}{\sqrt{2\pi}} \exp(-2y^2)\right]$   (63)

Studying the nulls of the first derivative and the corresponding signs of the second derivative, one proves the existence of α0 > 0 for which the results (i) and (ii) hold.

Proof of Theorem 2

The matrix Σ⁻¹ is partitioned as follows:

$\Sigma^{-1} = \begin{bmatrix} a & b' \\ b & C \end{bmatrix}$   (64)

where a is scalar. Replacing (64) in (37) and taking Σ̄⁻¹ as in (44), one gets:

$\mathrm{MSE} \triangleq E[\tilde{x}_{sc}\tilde{x}_{sc}'] = \begin{bmatrix} a + \frac{1}{\bar{\sigma}_1^2} & b' \\ b & C \end{bmatrix}^{-1} \begin{bmatrix} a + \frac{\chi^2}{\bar{\sigma}_1^4} & b' \\ b & C \end{bmatrix} \begin{bmatrix} a + \frac{1}{\bar{\sigma}_1^2} & b' \\ b & C \end{bmatrix}^{-1}$   (65)

where

$\chi = \dfrac{x_{1,max} - x_{1,min}}{\kappa}$

In order to carry out the matrix product in (65), one can exploit the block matrix inversion

$\begin{bmatrix} a + \frac{1}{\bar{\sigma}_1^2} & b' \\ b & C \end{bmatrix}^{-1} = \begin{bmatrix} S_C^{-1} & -S_C^{-1}\, b' C^{-1} \\ -S_C^{-1}\, C^{-1} b & C^{-1} + S_C^{-1}\, C^{-1} b b' C^{-1} \end{bmatrix}$   (66)

where $S_C = a + \frac{1}{\bar{\sigma}_1^2} - b' C^{-1} b$ is the Schur complement of the matrix w.r.t. C. By straightforward calculations it follows that the trace of the MSE (65) is:

$\mathrm{tr\, MSE} = \dfrac{\left(a - b' C^{-1} b + \dfrac{\chi^2}{\bar{\sigma}_1^4}\right)\left(1 + b' C^{-2} b\right)}{\left(a - b' C^{-1} b + \dfrac{1}{\bar{\sigma}_1^2}\right)^2} + \mathrm{tr}\, C^{-1}$   (67)

Finally, studying the derivative w.r.t. σ̄1, one can see that (67) has a minimum at σ̄1 = χ.

ACKNOWLEDGMENT

The authors would like to remember their dear colleague and friend Giovanni Zappa, who inspired this work and prematurely passed away on August 23, 2004.


REFERENCES

[1] T.L. Marzetta, “A simple derivation of the constrained multiple parameter Cramer-Rao bound”, IEEE Trans. on Signal Processing, vol. 41, pp. 2247-2249, 1993.

[2] S.M. Kay, Fundamentals of statistical signal processing: estimation theory, Prentice Hall, 1993.

[3] Y. Bar-Shalom, X. R. Li, T. Kirubarajan, Estimation with applications to tracking and navigation, John Wiley & Sons, 2001.

[4] D. Simon, T. Chia, “Kalman filtering with state equality constraints”, IEEE Trans. on Aerospace and Electronic Systems, vol. 39, pp. 128-136, 2002.

[5] A. Farina, G. Golino, L. Ferranti, “Constrained tracking filters for A-SMGCS”, Proc. 6th International Conference on Information Fusion, Cairns, Australia, 2003.

[6] J. Garcia Herrero, J.A. Besada Portas, J.R. Casar Corredera, “Use of map information for tracking targets on airport surface”, IEEE Trans. on Aerospace and Electronic Systems, vol. 39, pp. 675-692, 2003.

[7] B. Ristic, S. Arulampalam, N. Gordon, Beyond the Kalman filter: particle filters for tracking applications, Artech House, 2004.

[8] S. Challa, S. Arulampalam, N. Bergman, “Target tracking incorporating flight envelope information”, Proc. 3rd International Conference on Information Fusion, CD-ROM, ThC2-4, Paris, France, 2000.

[9] A. Bemporad, M. Morari, V. Dua, E.N. Pistikopoulos, “The explicit linear quadratic regulator for constrained systems”, Automatica, vol. 38, pp. 3-20, 2002.

[10] P. Tondel, T.A. Johansen, A. Bemporad, “Further results on multiparametric quadratic programming”, Proc. 42nd IEEE Conf. on Decision and Control, vol. 3, pp. 3173-3178, Maui, USA, 2003.

[11] A.E. Hoerl, R.W. Kennard, “Ridge regression: biased estimation for nonorthogonal problems”, Technometrics, vol. 12, pp. 55-67, 1970.

[12] A. Benavoli, L. Chisci, A. Farina, “Estimation of constrained parameters with guaranteed MSE improvement”, submitted to IEEE Trans. on Signal Processing, July 2005. Available: http://www.dsi.unifi.it/users/chisci/recent publ.htm.

