Page 1

Further introduction to Data assimilation - including error covariances

Vn 2.0, Ross Bannister, [email protected]

What do we want DA to achieve?

To combine imperfect data from models, from observations distributed in time and space, exploiting any relevant physical constraints, to produce a more accurate and comprehensive picture of the system as it evolves in time.

"[The atmosphere] is a chaotic system in which errors introduced into the system can grow with time . . . As a consequence, data assimilation is a struggle between chaotic destruction of knowledge and its restoration by new observations." Leith (1993)

Page 2

What sorts of things have errors?

"All models are wrong . . . " (George Box)

"All models are wrong and all observations are inaccurate." (a data assimilator)

y Observations (system dependent - e.g. temperature, pressure, humidity, wind, radiance, trace gas mixing ratio, time delay (GPS), radar reflectivity, salinity, optical reflectance, ...).

xB Forecast (background) state vector (system dependent).

H, M, etc. Operators used within the data assimilation itself (e.g. observation operator, model operator, etc.).

xA Assimilated (analysis) state vector (system dependent).

Also: representivity error (due to finite representation of state vector), boundary condition error, ...

Page 3

Lecture outline

• Representing uncertainty

– Errors vs. error statistics.

– PDFs.

– Normal distributions in one and higher dimensions.

• Combining imperfect information with Bayes Theorem.

• Different ways of solving the same data assimilation problem.

– Variational assimilation, Kalman filtering, particle filtering, and hybrid methods.

• The state vector and the observation vector.

• Covariance matrices.

– Anatomy.

– Importance in (Gaussian) data assimilation.

– Correlation functions and structure functions.

– Modelling covariance matrices for your application.

Page 4

How do we represent uncertainty?

Errors:

• The difference between some estimated quantity and the truth. E.g.:

– in a forecast, εf = xf − xt,

– in an observation, εy = y − yt.

• Errors are unknown and unknowable.

Error statistics:

• Some useful measures of the possible values that ε could have (e.g. a PDF or quantities that describe a PDF).

• Error statistics are knowable (but can be difficult to determine).

The probability that the error ε lies between ε and ε + dε is P(ε)dε.

Form of a 1-D Gaussian:

ε ∼ N(µ, σ²),

P(ε) = [1/√(2πσ²)] exp[−(ε − µ)²/(2σ²)].

First moment (µ) and second moment (σ²) only.
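To make the moments concrete, here is a minimal NumPy sketch (illustrative only, not part of the original slides; the values of µ and σ² are invented) that evaluates this 1-D Gaussian and checks its two moments by sampling:

```python
import numpy as np

def gaussian_pdf(eps, mu, sigma2):
    """1-D Gaussian: P(eps) = 1/sqrt(2*pi*sigma2) * exp(-(eps-mu)^2/(2*sigma2))."""
    return np.exp(-(eps - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

rng = np.random.default_rng(0)
mu, sigma2 = 0.0, 1.5 ** 2                 # arbitrary illustrative moments
samples = rng.normal(mu, np.sqrt(sigma2), size=100_000)

# The Gaussian is fully described by its first and second moments:
print(samples.mean())                      # ~ mu
print(samples.var())                       # ~ sigma2
print(gaussian_pdf(mu, mu, sigma2))        # peak value, 1/sqrt(2*pi*sigma2)
```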

Page 5

How do we represent uncertainty (continued)?

[Figure: surface plot of a 2-D Gaussian PDF over the errors (ε1, ε2); both axes (epsilon1, epsilon2) run from −10 to 20.]

The probability that the error ε = (ε1, ε2)ᵀ lies between ε and ε + dε is P(ε)dε = P(ε)dε1dε2.

Form of an n-dimensional Gaussian for ε = (ε1, ε2, . . . , εn)ᵀ with mean µ = (µ1, µ2, . . . , µn)ᵀ ∈ ℝⁿ and covariance S ∈ ℝⁿˣⁿ:

ε ∼ N(µ, S),

P(ε) = [1/√((2π)ⁿ det(S))] exp[−½(ε − µ)ᵀS⁻¹(ε − µ)].

The Gaussian shown has

ε = (ε1, ε2)ᵀ, µ = (8, 5)ᵀ, S = ( 5  1 ; 1  7.5 ).

General form of S:

S =

[ ⟨(ε1 − µ1)²⟩           ⟨(ε1 − µ1)(ε2 − µ2)⟩   · · ·   ⟨(ε1 − µ1)(εn − µn)⟩ ]
[ ⟨(ε2 − µ2)(ε1 − µ1)⟩   ⟨(ε2 − µ2)²⟩           · · ·            ⋮           ]
[          ⋮                      ⋮              ⋱               ⋮           ]
[ ⟨(εn − µn)(ε1 − µ1)⟩          · · ·           · · ·   ⟨(εn − µn)²⟩         ].
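The 2-D example above can be checked numerically. A short sketch assuming NumPy and SciPy are available, using the slide's µ = (8, 5)ᵀ and S = (5 1; 1 7.5); the test point is arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([8.0, 5.0])                  # mean from the slide
S = np.array([[5.0, 1.0],
              [1.0, 7.5]])                 # covariance from the slide

def gauss_nd(eps, mu, S):
    """P(eps) = 1/sqrt((2*pi)^n det S) * exp(-0.5 (eps-mu)^T S^-1 (eps-mu))."""
    n = len(mu)
    d = eps - mu
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(S))
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm

eps = np.array([7.0, 6.0])                 # an arbitrary test point
print(gauss_nd(eps, mu, S))
print(multivariate_normal(mu, S).pdf(eps)) # same number, via SciPy
```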

Page 6

Bayes Theorem (the root of all wisdom)

P(A|B) = P(B|A) × P(A) / P(B)
       ∝ P(B|A) × P(A).

Let A be the event x = xt ∈ ℝⁿ and B be the event yo ∈ ℝᵖ:

P(x = xt|yo) ∝ P(yo|x = xt) × P(x = xt).
 (posterior)     (likelihood)     (prior)

Approaches to DA:

• 1st moment: Find the mode of P(x|yo) (maximum likelihood estimate - the most likely x).

• 1st moment: Find the mean of P(x|yo), ⟨x⟩ = ∫ P(x|yo) x dx (minimum variance estimate).

• 1st and 2nd moments: find the covariance of P(x|yo), cov = ∫ P(x|yo) (x − ⟨x⟩)(x − ⟨x⟩)ᵀ dx.

• The whole PDF (or an approximation of it).

Approximations: assume that each PDF is Gaussian.

• Likelihood: mean H(xt), covariance R ∈ ℝᵖˣᵖ.

• Prior: mean xf, covariance Pf ∈ ℝⁿˣⁿ.

P(x = xt|yo) ∝ exp[−½(yo − H(xt))ᵀR⁻¹(yo − H(xt))] × exp[−½(x − xf)ᵀPf⁻¹(x − xf)],

∝ exp[−½(yo − H(x))ᵀR⁻¹(yo − H(x))] × exp[−½(x − xf)ᵀPf⁻¹(x − xf)],

∝ exp{−½[(yo − H(x))ᵀR⁻¹(yo − H(x)) + (x − xf)ᵀPf⁻¹(x − xf)]},

∝ exp(−J[x]),

cost function: J[x] ≡ ½(x − xf)ᵀPf⁻¹(x − xf) + ½(yo − H(x))ᵀR⁻¹(yo − H(x)).
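As an illustration of how this cost function is evaluated in practice, a minimal NumPy sketch follows; the state, covariances and (linear) observation operator are invented for the example:

```python
import numpy as np

def cost(x, xf, Pf, yo, H, R):
    """J[x] = 0.5 (x-xf)^T Pf^-1 (x-xf) + 0.5 (yo-Hx)^T R^-1 (yo-Hx)."""
    db = x - xf                            # departure from the background
    do = yo - H @ x                        # departure from the observations
    return 0.5 * db @ np.linalg.solve(Pf, db) + 0.5 * do @ np.linalg.solve(R, do)

# Invented 2-variable, 1-observation example:
xf = np.array([280.0, 282.0])              # background state
Pf = np.array([[1.0, 0.5], [0.5, 2.0]])    # background error covariance
H = np.array([[1.0, 0.0]])                 # observe the first element only
R = np.array([[0.25]])                     # observation error variance
yo = np.array([281.0])

print(cost(xf, xf, Pf, yo, H, R))          # background term vanishes at x = xf
```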

Page 7

Data assimilation approaches used in practice

Variational

• Assume Gaussian statistics.

• Solve a variational problem that minimizes the cost function:

J[x] = ½(x − xf)ᵀB⁻¹(x − xf) + ½(yo − H(x))ᵀR⁻¹(yo − H(x)).

• 1st moment of prior: the a-priori is evolved from a previous variational analysis.

• 2nd moment of prior: Pf → B, the prescribed background error covariance matrix.

• Prescribed R.

• Variants: 1D-Var (e.g. for an atmospheric profile), 3D-Var (no consideration of time), strong-constraint 4D-Var (considers observations over a time window assuming a perfect model), weak-constraint 4D-Var (accounts for an imperfect model), variational bias estimation, ...

• Lectures and practicals on Tuesday.

Kalman Filter

• Assume Gaussian statistics.

• Use a formula that gives the mean (or mode) of the posterior P(x|yo):

xA = xB + PfHᵀ(R + HPfHᵀ)⁻¹(yo − H(xB)),

and its error covariance:

PA = [I − PfHᵀ(R + HPfHᵀ)⁻¹H]Pf.

• 1st moment of prior: the a-priori is evolved from a previous KF analysis: xB = M(xA^prev).

• 2nd moment of prior: Pf is evolved from a previous KF analysis: Pf = M PA^prev Mᵀ + Q.

• Prescribed R.

• Variants: Optimal Interpolation (not really considered a KF as it does not evolve the covariance), ensemble KF (estimate 1st and 2nd moments of prior and posterior PDFs with an ensemble).

• Lectures and practicals on Wednesday/Thursday.
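The two KF formulae translate almost line-for-line into NumPy. A minimal sketch with invented numbers and a linear observation operator (not an operational implementation):

```python
import numpy as np

def kf_analysis(xB, Pf, yo, H, R):
    """Kalman analysis: xA = xB + K(yo - H xB), PA = (I - K H) Pf,
    with gain K = Pf H^T (R + H Pf H^T)^-1."""
    S = R + H @ Pf @ H.T                   # innovation covariance
    K = Pf @ H.T @ np.linalg.inv(S)        # Kalman gain
    xA = xB + K @ (yo - H @ xB)
    PA = (np.eye(len(xB)) - K @ H) @ Pf
    return xA, PA

# Invented example: 2-element state, one observation of element 1.
xB = np.array([280.0, 282.0])
Pf = np.array([[1.0, 0.5], [0.5, 2.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
yo = np.array([281.0])

xA, PA = kf_analysis(xB, Pf, yo, H, R)
print(xA)   # both elements move, because Pf correlates their errors
print(PA)   # analysis uncertainty is reduced relative to Pf
```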

Page 8

Data assimilation approaches used in practice (continued)

Particle Filter

• Approximate the PDFs that describe prior and posterior states with a weighted combination of states (the 'particles').

• Non-Gaussian and fully non-linear.

• The "curse of dimensionality".

• Lectures and practicals on Thursday.

Hybrid

• Virtually all methods make practical approximations.

• Can combine different methods.

• E.g. variational and ensemble:

– Variational methods do not have adequate flow dependence in B.

– Ensemble KF methods suffer from low rank (no. of ensemble members ≪ no. of elements in state).

• Lecture on Thursday.

All methods derive from Bayes Theorem (the root of all wisdom).

Page 9

Example state and observation vectors

[Figure: schematic of example state and observation vectors.]

xA: analysis state; xB: background state.

Sometimes x and y are for only one time.

x-vectors have n elements in total; y-vectors have p elements in total.

Page 10

Anatomy of a covariance matrix

Univariate background error covariance matrix (e.g. if x represents a pressure field only):

x = p = (p1, p2, . . . , pn)ᵀ,

cov(p′) = ⟨p′p′ᵀ⟩ =

[ ⟨p′1²⟩     ⟨p′1p′2⟩   · · ·   ⟨p′1p′n⟩ ]
[ ⟨p′2p′1⟩   ⟨p′2²⟩     · · ·      ⋮     ]
[    ⋮          ⋮        ⋱         ⋮     ]
[ ⟨p′np′1⟩    · · ·     · · ·   ⟨p′n²⟩   ],

where p′ = p − ⟨p⟩. Each diagonal element is a variance; each off-diagonal element is an outer-product (univariate) covariance.

Multivariate background error covariance matrix (e.g. if x represents pressure, zonal wind and meridional wind):

x = (p; u; v) = (p1, . . . , pn/3, u1, . . . , un/3, v1, . . . , vn/3)ᵀ,

cov(x′) = ⟨x′x′ᵀ⟩ =

[ ⟨p′p′ᵀ⟩   ⟨p′u′ᵀ⟩   ⟨p′v′ᵀ⟩ ]
[ ⟨u′p′ᵀ⟩   ⟨u′u′ᵀ⟩   ⟨u′v′ᵀ⟩ ]
[ ⟨v′p′ᵀ⟩   ⟨v′u′ᵀ⟩   ⟨v′v′ᵀ⟩ ].

The diagonal blocks (e.g. ⟨u′u′ᵀ⟩) are autocovariance sub-matrices; the off-diagonal blocks (e.g. ⟨p′u′ᵀ⟩) are multivariate covariance sub-matrices. These covariance matrices are symmetric.
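One illustrative way to see this anatomy (a sketch, not from the slides): estimate a multivariate covariance from a synthetic ensemble of state perturbations and inspect its blocks. NumPy is assumed; the ensemble is random data with an imposed p-u correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_var, n_ens = 4, 500                  # 4 grid points per variable; 500 members

# Synthetic ensemble of states x = (p; u; v), shape (3*n_per_var, n_ens).
X = rng.standard_normal((3 * n_per_var, n_ens))
X[n_per_var:2 * n_per_var] += 0.5 * X[:n_per_var]   # make u correlated with p

Xp = X - X.mean(axis=1, keepdims=True)     # perturbations x' = x - <x>
cov = Xp @ Xp.T / (n_ens - 1)              # sample <x' x'^T>, shape (12, 12)

pp = cov[:n_per_var, :n_per_var]           # autocovariance block <p'p'^T>
pu = cov[:n_per_var, n_per_var:2 * n_per_var]   # multivariate block <p'u'^T>
print(np.allclose(cov, cov.T))             # covariance matrices are symmetric
print(np.diag(pp))                         # variances sit on the diagonal
print(pu)                                  # nonzero: p and u errors correlate
```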

Page 11

Importance of covariance matrices - graphical demonstration

A single observation in a 2-element system (n = 2, p = 1).

x = (T1, T2)ᵀ, xB = (TB1, TB2)ᵀ, yo = (y),

H(x) = T1, H = (1 0),

Pf = ( σ²B1  α ; α  σ²B2 ), R = (σ²o).

P(x = xt|yo) ∝ P(yo|x = xt) × P(x = xt)   [posterior ∝ likelihood × prior]

∝ exp[−(yo − T1)²/(2σ²o)] × exp{−[σ²B2(T1 − TB1)² + σ²B1(T2 − TB2)² − 2α(T1 − TB1)(T2 − TB2)] / [2(σ²B1σ²B2 − α²)]}.
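A small numerical sketch of this product of likelihood and prior (invented variances and background values; NumPy assumed), evaluating the unnormalised posterior on a grid and locating its mode:

```python
import numpy as np

# Invented numbers for the two-box system:
TB1, TB2, y = 280.0, 282.0, 281.0
sB1s, sB2s, alpha, sos = 2.0, 3.0, 1.2, 0.5

T1, T2 = np.meshgrid(np.linspace(277, 285, 201), np.linspace(277, 285, 201))

# Likelihood (observation of T1 only) times prior (correlated background errors):
like = np.exp(-0.5 * (y - T1) ** 2 / sos)
det = sB1s * sB2s - alpha ** 2
prior = np.exp(-(sB2s * (T1 - TB1) ** 2 + sB1s * (T2 - TB2) ** 2
                 - 2 * alpha * (T1 - TB1) * (T2 - TB2)) / (2 * det))
post = like * prior                        # unnormalised posterior PDF

i = np.unravel_index(post.argmax(), post.shape)
print(T1[i], T2[i])                        # mode: both T1 and T2 shift towards the ob
```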

Page 12

Importance of covariance matrices - mathematical demonstration with a 2-element state vector

A single observation in a 2-element system (n = 2, p = 1).

The KF formula for the analysis increment is:

xA = xB + PfHᵀ(R + HPfHᵀ)⁻¹(yo − H(xB)).

x = (T1, T2)ᵀ, xB = (TB1, TB2)ᵀ, yo = (y), H(x) = T1,

H = (1 0), Pf = ( σ²B1  α ; α  σ²B2 ), R = (σ²o).

PfHᵀ = ( σ²B1  α ; α  σ²B2 )(1 ; 0) = (σ²B1 ; α),

HPfHᵀ = (1 0)(σ²B1 ; α) = (σ²B1),

xA = (TB1 ; TB2) + (σ²B1 ; α) (y − TB1)/(σ²o + σ²B1).

• The analysis increment is a vector ∝ the first column of Pf (called a structure function or covariance function).

– The observation of box 1 influences the analysis in box 2 because the a-priori errors are correlated (α).

• It is also ∝ the innovation, y − TB1.

• If σ²o ≫ σ²B1 then the analysis increment vanishes.

• If σ²o ≪ σ²B1 then box 1 will be set to the observation value and box 2 will be set to TB2 + α(y − TB1)/σ²B1.
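These conclusions are easy to verify numerically. A sketch with invented numbers, checking the closed form above against the general KF formula:

```python
import numpy as np

sB1s, sB2s, alpha, sos = 2.0, 3.0, 1.2, 0.5   # invented variances and covariance
TB1, TB2, y = 280.0, 282.0, 281.0

Pf = np.array([[sB1s, alpha], [alpha, sB2s]])
H = np.array([[1.0, 0.0]])
R = np.array([[sos]])
xB = np.array([TB1, TB2])

# General KF formula:
K = Pf @ H.T @ np.linalg.inv(R + H @ Pf @ H.T)
xA = xB + K @ np.array([y - TB1])

# Closed form from the slide: increment prop. to the first column of Pf
# and prop. to the innovation y - TB1.
xA_closed = xB + np.array([sB1s, alpha]) * (y - TB1) / (sos + sB1s)

print(np.allclose(xA, xA_closed))   # True
print(xA)                           # box 2 is updated too, via alpha
```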

Page 13

Importance of covariance matrices - mathematical demonstration with an n-element state vector

A single observation in an n-element system (n = n, p = 1).

The KF formula for the analysis increment is:

xA = xB + PfHᵀ(R + HPfHᵀ)⁻¹(yo − H(xB)).

x = (T1, . . . , Ti, . . . , Tn)ᵀ, xB = (TB1, . . . , TBi, . . . , TBn)ᵀ, yo = (y), H(x) = Ti,

H = (0 · · · 1 · · · 0), with the 1 in position i,

Pf =

[ Pf11  · · ·  Pf1i  · · ·  Pf1n ]
[  ⋮      ⋱     ⋮            ⋮   ]
[ Pfi1  · · ·  Pfii  · · ·  Pfin ]
[  ⋮            ⋮      ⋱     ⋮   ]
[ Pfn1  · · ·  Pfni  · · ·  Pfnn ],  R = (σ²o).

PfHᵀ = (Pf1i, . . . , Pfii, . . . , Pfni)ᵀ (the ith column of Pf),

HPfHᵀ = (Pfii) = (σ²Bi),

xA = (TB1, . . . , TBi, . . . , TBn)ᵀ + (Pf1i, . . . , Pfii, . . . , Pfni)ᵀ (y − TBi)/(σ²o + σ²Bi).

Page 14

• The analysis increment is a vector ∝ the ith column of Pf (called a structure function or covariance function).

• Structure functions are often parametrised with a particular length-scale L. E.g. (built in the sketch below):

– Gaussian shape: Pfij = σBiσBj exp[−(xi − xj)²/L²],

– Lorentzian shape: Pfij = σBiσBj / {1 + [(xi − xj)²/L²]},

– SOAR (second-order auto-regressive) function: Pfij = σBiσBj (1 + |xi − xj|/L) exp(−|xi − xj|/L).

[Figure: Gaussian, Lorentzian and SOAR structure functions plotted against xi − xj.]
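A sketch (NumPy assumed, with unit error variances σBi = σBj = 1 and an illustrative length-scale) that builds the three parametrised matrices on a 1-D grid; column i of each matrix is the structure function an observation at xi would produce:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 41)           # 1-D grid of positions
L = 3.0                                    # length-scale (illustrative)
d = x[:, None] - x[None, :]                # pairwise separations x_i - x_j

# sigma_Bi = sigma_Bj = 1 here, so these are also correlation matrices.
P_gauss = np.exp(-d ** 2 / L ** 2)                           # Gaussian
P_lorentz = 1.0 / (1.0 + d ** 2 / L ** 2)                    # Lorentzian
P_soar = (1.0 + np.abs(d) / L) * np.exp(-np.abs(d) / L)      # SOAR

# Column i of each matrix = the structure function centred at x_i.
i = 20                                     # centre of the grid
print(P_gauss[:, i].round(2))
print(P_lorentz[:, i].round(2))
print(P_soar[:, i].round(2))
```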

Page 15

Structure functions for �ow in the mid-latitude atmosphere

Page 16

Modelling a covariance matrix

• Observation error covariance matrices (R):

– Often taken to be diagonal for independent obs. Observation error variances (diagonal elements) depend on characteristics of the instrument.

– Another contribution is representivity error, which will have diagonal (and possibly off-diagonal) elements.

– If measurements are not independent (e.g. if they are derived using some procedure) then R should not be diagonal.

• Background error covariance matrices (Pf):

– Can rarely be represented explicitly.

– Difficult to measure (need a large sample of (unknowable) forecast errors).

– Can be modelled using a variety of methods:

∗ 'Inverse Laplacians'.

∗ Diffusion operators (used e.g. in ocean DA; see the sketch below).

∗ Recursive filters.

∗ Spectral methods, wavelet methods.

∗ Exploit physics (e.g. geophysical balance).

∗ Control variable transforms (transform to a space where Pf is simpler - e.g. diagonal).

• Model error covariance matrices (Q).
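As a loose illustration of the diffusion-operator idea in the list above (a sketch, not any particular operational scheme): repeated application of a simple 1-2-1 smoother plays the role of a diffusion step, and the implied covariance B = CCᵀ acquires smooth, quasi-Gaussian structure functions:

```python
import numpy as np

n, k = 50, 20                              # grid points; number of smoothing passes

# A 1-2-1 smoother: a crude explicit diffusion step on a 1-D grid.
S = (np.diag(np.full(n, 0.5))
     + np.diag(np.full(n - 1, 0.25), 1)
     + np.diag(np.full(n - 1, 0.25), -1))

C = np.linalg.matrix_power(S, k)           # k diffusion steps
B = C @ C.T                                # implied covariance: symmetric, PSD by construction

# Middle column = implied structure function; its length-scale grows with k.
col = B[:, n // 2]
print((col / col.max()).round(2))
```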

Page 17

Modelling a background error covariance matrix (simple example - related variables and control variables)

System (two grid boxes)

• Same system as before: x = (T1, T2)ᵀ.

• Suppose that a constraint applies: T2 ≈ τ0 + µ0T1 (τ0 and µ0 are known constants).

• We have (e.g.) one observation of each temperature, y = (y1, y2)ᵀ:

y = Hx, H = ( 1  0 ; 0  1 ).

Strategy A (assimilate with respect to x directly)

J[x] = ½ (T1 − TB1  T2 − TB2) Pf(T)⁻¹ ( T1 − TB1 ; T2 − TB2 ) + ½ (y1 − T1  y2 − T2) R⁻¹ ( y1 − T1 ; y2 − T2 )

= ½ (x − xB)ᵀPf(T)⁻¹(x − xB) + ½ (y − Hx)ᵀR⁻¹(y − Hx).

• Need to know the covariances explicitly.

• This approach does not exploit the constraint.

Page 18

Simple example (continued)

Strategy A (assimilate with respect to x directly)

copied from previous slide: J [x] =1

2(x− xB)

TP(T )−1f (x− xB) +

1

2(y −Hx)TR−1 (y −Hx) .

Strategy B (use control variables)

• Reminder of the constraint: T2 ≈ τ0 + µ0T1.

• Let T1 have background error variance σ²T1 and let the constraint be written T2 = τ0 + µ0T1 + c, where c has variance σ²c.

• Write the problem in terms of control variables χ1 and χ2:

T1 = TB1 + σT1χ1,   T2 = τ0 + µ0(TB1 + σT1χ1) + σcχ2.

• Introduce a control vector χ = (χ1, χ2)ᵀ:

x = U(χ) = Uχ + γ, i.e.

( T1 ; T2 ) = ( σT1  0 ; µ0σT1  σc )( χ1 ; χ2 ) + ( TB1 ; τ0 + µ0TB1 ).

• χ has background 0 = (0, 0)ᵀ, and background error covariance I = ( 1  0 ; 0  1 ):

J[χ] = ½ χᵀχ + ½ (y − HU(χ))ᵀ ( R11  0 ; 0  R22 )⁻¹ (y − HU(χ)).

• Minimise with respect to χ → χA, giving xA = U(χA) (see the sketch below).
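An end-to-end sketch of Strategy B with invented constants (SciPy's generic minimizer stands in for a proper variational solver): minimise J[χ] and map back to physical space with xA = U(χA):

```python
import numpy as np
from scipy.optimize import minimize

# Invented constants for the two-box system:
TB1, tau0, mu0 = 280.0, 2.0, 1.0           # background and constraint T2 ~ tau0 + mu0*T1
sT1, sc = 1.0, 0.5                         # square roots of the two variances
R = np.diag([0.25, 0.25])                  # independent obs of T1 and T2
y = np.array([281.0, 283.5])

U = np.array([[sT1, 0.0],
              [mu0 * sT1, sc]])            # control variable transform
gamma = np.array([TB1, tau0 + mu0 * TB1])

def J(chi):
    """J[chi] = 0.5 chi^T chi + 0.5 (y - HU(chi))^T R^-1 (y - HU(chi)), with H = I."""
    d = y - (U @ chi + gamma)
    return 0.5 * chi @ chi + 0.5 * d @ np.linalg.solve(R, d)

chiA = minimize(J, np.zeros(2)).x          # chi has background 0 and covariance I
xA = U @ chiA + gamma                      # map the analysis back to physical space
print(xA)
```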

Page 19

Simple example (continued)

• All we need to know:

– Background for T1 (TB1).

– Background error variance of T1 (σ²T1).

– Variance of c (σ²c).

• The constraint is applied with a strength specified by σ²c.

• The implied background error covariance of x is:

Pf(T,implied) = UUᵀ = ( σT1  0 ; µ0σT1  σc )( σT1  µ0σT1 ; 0  σc ) = ( σ²T1  µ0σ²T1 ; µ0σ²T1  µ0²σ²T1 + σ²c ).

• Meteorological/oceanic data assimilation systems define a control variable transform with balance conditions.

Page 20

Summary

• Uncertainty is in everything.

• Uncertainty is described by probabilities.

• All proper data assimilation problems need PDFs.

– Related via Bayes Theorem.

• The normal distribution is often used to describe PDFs.

– Mean and (co)variance.

– Leads to the Kalman Filter and variational cost functions.

– (Co)variances describe the precision of the data (and hence the weight given to the data in DA).

• Have seen that background error covariances have a profound impact on the analysis.

– Often influenced by physical constraints.

– Explicit matrix size is n × n.

– Can be modelled.

• Pointers to further information . . .

Page 21

Further reading - selected books and papers

• Barlow R.J., Statistics - A Guide to the Use of Statistical Methods in the Physical Sciences, John Wiley and Sons (1989). This is an elementary, readable book on statistics for the scientist (e.g. it derives the Gaussian distribution from first principles). It also covers the least squares problem.

• Rodgers C.D., Inverse Methods for Atmospheric Sounding: Theory and Practice, World Scientific Publishing (2000). This is a very readable book. Even though it focuses on satellite retrieval theory (mathematically a similar problem to data assimilation), this is a good book for virtually everything that you need to know about covariances. It also contains a summary of basic data assimilation methods and has a useful appendix on linear algebra.

• Lewis J.M., Lakshmivarahan S., Dhall S., Dynamic Data Assimilation: A Least Squares Approach, Cambridge University Press (2006). This huge book covers a lot of material with a lot of repetition. It has some good introductory chapters and some useful results if you know where to look. (Unfortunately there are LOADS of typos.)

• Kalnay E., Atmospheric Modeling, Data Assimilation and Predictability, Cambridge University Press (2002). A large section of this book covers data assimilation, and there is also a lot of basic material for the budding dynamic modeller. The data assimilation part is introductory, but covers most key ideas. It will leave you wanting to know more!

• Schlatter T.W., Variational assimilation of meteorological observations in the lower atmosphere: a tutorial on how it works, J. Atmos. Solar-Terr. Phys., 62, 1057-1070 (2000). It is worth getting hold of this paper as it is an excellent description of variational data assimilation (relevant to lectures later in the course).

• Bannister R.N., A review of forecast error covariance statistics in atmospheric variational data assimilation. I: Characteristics and measurements of forecast error covariances, Q.J. Roy. Met. Soc., 134, 1951-1970 (2008); and Bannister R.N., A review of forecast error covariance statistics in atmospheric variational data assimilation. II: Modelling the forecast error covariance statistics, Q.J. Roy. Met. Soc., 134, 1971-1996 (2008). What can I say - blatant self-publicity! A source of information about background error covariances and how they can be modelled.

