Workshop Uncertainty Analysis in Geophysical Inverse Problems
Lorentz Center, Leiden, The Netherlands

A Few Concluding Comments …

Olivier Talagrand
Laboratoire de Météorologie Dynamique, École Normale Supérieure, Paris, France

With acknowledgments to M. Jardak, A. Fichtner, J. Trampert, P. J. van Leeuwen and all participants in the Workshop

11 November 2011

- Vocabulary (e.g. ‘model’) !

- Summary and comments

- Ensemble estimation (resampling methods, Monte Carlo or otherwise)

- A few problems to be solved

Bayesian vs. frequentist point of view (L. Tenorio, M. Holschneider, J. Trampert) ?

Bayesian estimation

The problem, if there is any, is with the prior.

But is there always a prior ?

The first time geophysicists attempted to determine a ‘model’ of the Earth’s interior from observations of the propagation of seismic waves, did they have a prior ?

Early 2005. The Huygens probe descended into the atmosphere of Titan (a satellite of Saturn). No prior was used.

And, if there is a prior, where does it come from, if not from earlier data ? L. Tenorio, who works on cosmological problems, mentioned he had a prior for the Universe, based on quantum fluctuations.

Bayesian estimation (continued)

Prior can be implicit

State vector x, belonging to state space S (dim S = n), to be estimated.
Data vector z, belonging to data space D (dim D = m), available.

z = F(x, ζ) (1)

where ζ is a random element representing the uncertainty on the data (or, more precisely, on the link between the data and the unknown state vector).

For example

z = Γx + ζ

Bayesian estimation (continued)

Probability that x = ξ for given ξ ?

x = ξ ⇒ z = F(ξ, ζ)

P(x = ξ | z) = P[z = F(ξ, ζ)] / ∫ P[z = F(ξ′, ζ)] dξ′

Unambiguously defined iff, for any ζ, there is at most one x such that (1) is verified.

⇔ data contain information, either directly or indirectly, on any component of x. Determinacy condition.

No need for explicit knowledge of a prior probability distribution for x, but need for explicit knowledge of the probability distribution for ζ.
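As a toy illustration of this ‘prior-free’ formula (a minimal sketch, not from the slides; the scalar model z = x + ζ with gaussian ζ is an assumption), the posterior can be built by normalizing the likelihood over a grid of candidate states ξ:

```python
import numpy as np

# Assumed toy setup: scalar state, data model z = x + zeta, zeta ~ N(0, 1).
# P(x = xi | z) = P[z = F(xi, zeta)] / integral over xi' of P[z = F(xi', zeta)]
z_obs = 1.2                                # a single observed datum
xi = np.linspace(-5.0, 5.0, 2001)          # grid of candidate states xi
lik = np.exp(-0.5 * (z_obs - xi) ** 2)     # P[z = F(xi, zeta)], gaussian in zeta
posterior = lik / np.trapz(lik, xi)        # normalize, as in the formula above

print(np.trapz(posterior, xi))             # integrates to 1
print(np.trapz(xi * posterior, xi))        # posterior mean, close to z_obs
```

Note that only the distribution of ζ was specified; the flat measure on the grid plays the role of the implicit prior.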

Presentation by H. Igel (starting point m0 not associated with a probability distribution)

Example

z = Γx + ζ, ζ ∼ N [µ, S]

Then conditional probability distribution is

P(x | z) = N [x^a, P^a]

where

x^a = (Γ^T S^-1 Γ)^-1 Γ^T S^-1 [z − µ]
P^a = (Γ^T S^-1 Γ)^-1

Determinacy condition : rank Γ = n. Requires m ≥ n.

Expressions

x^a = (Γ^T S^-1 Γ)^-1 Γ^T S^-1 [z − µ]   (2a)
P^a = (Γ^T S^-1 Γ)^-1   (2b)

can also be obtained, from the knowledge of only µ and S, as defining the variance-minimizing linear estimator of x from z, or Best Linear Unbiased Estimate (BLUE) of x from z (but the significance of P^a is now totally different)
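As a numerical check of expressions (2) (a sketch, not from the slides; Γ, S, µ are made-up values), with noise-free data z = Γx + µ the estimate x^a recovers x exactly:

```python
import numpy as np

# Assumed toy problem: n = 2 unknowns, m = 3 data (rank Gamma = n, m >= n).
Gamma = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
S = np.diag([0.5, 0.5, 1.0])               # error covariance of zeta
mu = np.array([0.1, -0.2, 0.0])            # error mean of zeta
x_true = np.array([2.0, -1.0])
z = Gamma @ x_true + mu                    # noise-free data (zeta = mu)

Sinv = np.linalg.inv(S)
Pa = np.linalg.inv(Gamma.T @ Sinv @ Gamma)     # (2b)
xa = Pa @ Gamma.T @ Sinv @ (z - mu)            # (2a)

print(xa)          # recovers x_true
print(Pa)          # posterior covariance / BLUE error covariance
```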

Expressions (2) are commonly used, and are useful, in many (especially, but not only, geophysical) applications where the error ζ is distinctly non-gaussian and the operator Γ is mildly nonlinear. In fact, going beyond (2) turns out to be very challenging in many practical problems.

This Workshop

In addition to mathematical and theoretical papers, a large variety of different physical applications and methods.

Beyond differences in vocabulary and notations, the methods used are fundamentally the same : linear and gaussian inversion (ray-tracing inversion, …), heuristically extended to (mildly) nonlinear and non-gaussian situations, and sampling, or ‘ensemble’, methods, intended to explicitly sample the uncertainty.

Seismic Tomography

Major development. Full Waveform Inversion. Is physics simpler for seismologists than for fluid dynamicists ?

Major problem of underdeterminacy. Null space matters, hence Backus-Gilbert theory (D. Al-Attar)

Sampling algorithms : Gibbs, Metropolis-Hastings, Neighbourhood Algorithm (M. Sambridge).

Neural networks (NN, J. Trampert). Very efficient for numerical inversion of nonlinear functions. But link with bayesian approach and uncertainty quantification ?

Assimilation

The word assimilation refers to the situation when the observed system evolves in time and the dynamical laws which govern the evolution of the system are among the information to be used.

This does not change the conceptual nature of the problem, but it means that in the description

z = F(x, ζ)

the dynamical laws are present in one way or another.

Because they exist for producing daily forecasts, meteorologists started assimilation very early (A. Lorenc)

[Figure (A. Lorenc, © Crown copyright Met Office) : ratio of global computer costs, 1 day’s DA (total incl. FC) / 1 day’s forecast, 1985-2010. The ratio grows from ~5 (AC scheme) to ~8 (3D-Var on T3E), ~20 (simple 4D-Var on SX8) and ~31 (4D-Var with outer loop).]

Computer power increased by 1M in 30 years. Only 0.04% of the Moore’s Law increase over this time went into improved DA algorithms, rather than improved resolution!

Three classes of algorithms have been presented, with both specific applications and theoretical discussions :

- Variational assimilation, which is a standard tool in meteorology, and has extended in recent years to applications in solid earth geophysics.

- Kalman filter, which is equivalent to variational assimilation in the linear case, and has been presented in its nonlinear extension, the Ensemble Kalman filter (EnKF)

- Particle filters, which are totally bayesian in their principle (P. J. van Leeuwen).

Assimilation, as it has done over the years, is continuously extending to more and more novel physical applications : geomagnetism (A. Fournier, A. Jackson), observations of the Earth’s rotation (J. Saynisch), atmospheric chemistry (H. Eskes), identification of CO2 surface fluxes (V. Gaur).

Theory : R. Potthast, P. J. van Leeuwen

Very active field, which we can expect to further develop in the coming years, both through the development of new physical applications and of theoretical methods.

Purely bayesian methods, particle filters ?

Quantification of uncertainty : identification and quantification of model (I mean ‘theory’) errors, and ensemble methods.

- Ensemble estimation (resampling methods, Monte Carlo or otherwise)

ECMWF, Technical Report 499, 2006

Data

z = Γx + ζ

Then the conditional posterior probability distribution is

P(x | z) = N [x^a, P^a]

with

x^a = (Γ^T S^-1 Γ)^-1 Γ^T S^-1 [z − µ]
P^a = (Γ^T S^-1 Γ)^-1

Ready recipe for producing a sample of independent realizations of the posterior probability distribution (resampling method) :

Perturb the data vector additively according to the error probability distribution N [0, S], and compute the analysis x^a for each perturbed data vector.
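A minimal sketch of this recipe (toy numbers are assumed, with the same linear-gaussian setting as above): perturb z with draws from N [0, S], recompute x^a for each member, and check that the ensemble spread reproduces P^a:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy linear-gaussian problem.
Gamma = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = np.eye(3)
mu = np.zeros(3)
z = np.array([1.0, 2.0, 3.5])              # some data vector

Sinv = np.linalg.inv(S)
Pa = np.linalg.inv(Gamma.T @ Sinv @ Gamma)
A = Pa @ Gamma.T @ Sinv                    # analysis operator

Ne = 40000                                 # large ensemble, to check the statistics
pert = rng.multivariate_normal(np.zeros(3), S, size=Ne)  # draws from N[0, S]
ensemble = (z + pert - mu) @ A.T           # analysis for each perturbed data vector

cov_est = np.cov(ensemble.T)
print(np.max(np.abs(cov_est - Pa)))        # small: ensemble covariance ~ Pa
```

In this linear case the member analyses are exact independent draws from N [x^a, P^a], which is what makes the recipe a ready one.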

Available data consist of

- Background estimate at time 0 :
  x0^b = x0 + ζ0^b,   ζ0^b ∼ N [0, P0^b]

- Observations at times k = 0, …, K :
  yk = Hk xk + εk,   εk ∼ N [0, Rk]

- Model (supposed to be exact) :
  xk+1 = Mk xk,   k = 0, …, K−1

Errors assumed to be unbiased and uncorrelated in time, Hk and Mk linear.

Then the optimal state (mean of the bayesian gaussian pdf) at initial time 0 minimizes the objective function

ξ0 ∈ S → J(ξ0) = (1/2) (x0^b − ξ0)^T [P0^b]^-1 (x0^b − ξ0) + (1/2) Σk [yk − Hk ξk]^T Rk^-1 [yk − Hk ξk]

subject to ξk+1 = Mk ξk,   k = 0, …, K−1
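The constrained objective can be sketched numerically (an illustrative toy setup, not from the slides: n = 2, K = 3, with made-up Mk, Hk, P0^b, Rk). With exact observations generated from a truth x0 and background x0^b = x0, J(x0) vanishes:

```python
import numpy as np

# Assumed toy 4D-Var setup.
n, K = 2, 3
theta = 0.3
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # Mk = M for all k (rotation)
H = np.eye(n)                                    # Hk = identity
P0b = 0.5 * np.eye(n)
R = 0.1 * np.eye(n)                              # Rk = R for all k

def J(xi0, x0b, y):
    """Objective J(xi0), with the constraint xi_{k+1} = M xi_k enforced
    by propagating xi0 through the model."""
    xi = xi0.copy()
    cost = 0.5 * (x0b - xi0) @ np.linalg.inv(P0b) @ (x0b - xi0)
    for k in range(K + 1):                       # observations at k = 0..K
        d = y[k] - H @ xi
        cost += 0.5 * d @ np.linalg.inv(R) @ d
        xi = M @ xi                              # propagate to time k+1
    return cost

x0 = np.array([1.0, -0.5])                       # truth at time 0
y = []                                           # exact (noise-free) observations
xk = x0.copy()
for k in range(K + 1):
    y.append(H @ xk)
    xk = M @ xk

print(J(x0, x0b=x0, y=y))                        # 0.0: truth fits exact data
print(J(x0 + 0.1, x0b=x0, y=y) > 0)              # any other state costs more
```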

Work done here (with M. Jardak). Apply the ‘ensemble’ recipe described above to nonlinear and non-gaussian cases, and look at what happens.

Everything synthetic. Two one-dimensional toy models : the Lorenz ’96 model and the Kuramoto-Sivashinsky equation. Perfect-model assumption.

(Nonlinear) Lorenz ’96 model

There is no (and there cannot be) a general objective test of bayesianity. We use here as a substitute the much weaker property of reliability.

Reliability. Statistical consistency between predicted probability of occurrence and observed frequency of occurrence (it rains 40% of the time in circumstances when I predict a 40% probability for rain).

The observed frequency of occurrence p′(p) of an event, given that it has been predicted to occur with probability p, must be equal to p :

For any p, p′(p) = p

More generally, the frequency distribution F′(F) of reality, given that probability distribution F has been predicted for the state of the system, must be equal to F.

Reliability can be objectively assessed, provided a large enough sample of realizations of the estimation process is available.
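Such an assessment can be sketched on synthetic data (an assumed setup, not from the slides: a system that is perfectly reliable by construction, with outcomes drawn as Bernoulli(p)), estimating p′(p) within each forecast-probability bin:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed synthetic forecasts: probabilities drawn from a discrete set of
# levels; outcomes occur with exactly the announced probability, so the
# system is reliable by construction.
N = 100000
levels = np.arange(0.1, 1.0, 0.1)          # forecast probabilities 0.1 .. 0.9
p = rng.choice(levels, size=N)
o = (rng.random(N) < p).astype(float)      # binary outcomes, Bernoulli(p)

# Observed frequency p'(p) within each forecast bin.
p_prime = np.array([o[p == lv].mean() for lv in levels])
print(np.max(np.abs(p_prime - levels)))    # small: p'(p) ~ p for every p
```

Plotting p′(p) against p gives exactly the reliability diagram shown in the next slide; a reliable system lies on the diagonal.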

Reliability diagram, NCEP, event T850 > Tc − 4C, 2-day range, Northern Atlantic Ocean, December 1998 - February 1999

Rank histograms, T850, Northern Atlantic, winter 1998-99

Top panels: ECMWF, bottom panels: NMC (from Candille, Doctoral Dissertation, 2003)

In both linear and nonlinear cases, size of ensembles Ne = 30.

Number of realizations of the process M = 3000 and 7000 for Lorenz and Kuramoto-Sivashinsky respectively.

Non-gaussianity of the error, if the model is kept linear, has no significant impact on scores.

Brier Score (Brier, 1950), relative to binary event E

B ≡ E[(p − po)²]

where p is the predicted probability of occurrence, po = 1 or 0 depending on whether E has been observed to occur or not, and E denotes the average over all realizations of the prediction system.

Decomposes into

B = E[(p − p′)²] − E[(p′ − pc)²] + pc(1 − pc)

where pc ≡ E(po) = E(p′) is the observed frequency of occurrence of E.

The first term, E[(p − p′)²], measures reliability.

The second term, E[(p′ − pc)²], measures the dispersion of the a posteriori calibrated probabilities p′. The larger that dispersion, the more discriminating, or resolving, and the more useful, the prediction system. That term measures the resolution of the system.

The third term, called the uncertainty term, depends only on the event E, not on the performance of the prediction system.
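The decomposition is exact when p′ is computed as the conditional observed frequency within each forecast-probability bin; a sketch on synthetic forecast/outcome pairs (assumed data: discrete forecast probabilities with Bernoulli outcomes):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed synthetic forecast/outcome pairs (discrete forecast probabilities).
N = 50000
levels = np.arange(0.1, 1.0, 0.1)
p = rng.choice(levels, size=N)
o = (rng.random(N) < p).astype(float)

B = np.mean((p - o) ** 2)                  # Brier score, computed directly

pc = o.mean()                              # observed frequency of E
w = np.array([(p == lv).mean() for lv in levels])         # bin weights
p_prime = np.array([o[p == lv].mean() for lv in levels])  # calibrated probs

reliability = np.sum(w * (levels - p_prime) ** 2)   # E[(p - p')^2]
resolution = np.sum(w * (p_prime - pc) ** 2)        # E[(p' - pc)^2]
uncertainty = pc * (1 - pc)

# ~0: B = reliability - resolution + uncertainty holds exactly
print(abs(B - (reliability - resolution + uncertainty)))
```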

Reliability diagram, NCEP, event T850 > Tc − 4C, 2-day range, Northern Atlantic Ocean, December 1998 - February 1999

Brier Skill Score

BSS ≡ 1 − B / [pc(1 − pc)]

(positively oriented)

and components

Brel ≡ E[(p − p′)²] / [pc(1 − pc)]

Bres ≡ 1 − E[(p′ − pc)²] / [pc(1 − pc)]

(negatively oriented)

Very similar results obtained with the Kuramoto-Sivashinsky equation.

Preliminary conclusions

• In the linear case, ensemble variational assimilation produces, in agreement with theory, reliable estimates of the state of the system.

• Nonlinearity significantly degrades reliability (and therefore bayesianity) of variational assimilation ensembles. Resolution (i.e., the capability of ensembles to reliably estimate a broad range of different probabilities of occurrence) is also degraded. Similar results (not shown) have been obtained with the Particle Filter, which produces ensembles with low reliability.

Perspectives

• Perform further studies on physically more realistic dynamical models (shallow-water equations)

• Further comparison with the performance of Particle Filters, and with the performance of the Ensemble Kalman Filter.

Bayesian estimation (continued)

Consider again general situation

z = F(x, ζ) (1)

What if the probability distribution for ζ is not known, or known insufficiently for determining the posterior bayesian probability distribution for x ?

Consider the class C of all probability distributions for x that are compatible with the data (i.e., that are the bayesian posterior probability distribution for some choice of the missing information on the distribution for ζ). Then choose one distribution in C.

Frequent choice. Choose the pdf p(x) which maximizes the entropy

E ≡ −∫ p(x) ln p(x) dx = −E[ln p(x)]

Maximum entropy (S. Losa). Least committing choice

Bayesian estimation (continued)

In the case

z = Γx + ζ

assuming that only the expectation µ and covariance matrix S of ζ are known, the entropy-maximizing probability distribution is the gaussian distribution N [x^a, P^a] defined above.
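A standard argument for why the gaussian maximizes entropy under a fixed mean and covariance (not in the slides; a sketch via the Kullback-Leibler divergence, writing E(·) for the entropy defined above):

```latex
% Let g be the gaussian density and p any density with the same mean and
% covariance matrix.
\begin{aligned}
0 \;\le\; D_{\mathrm{KL}}(p\,\|\,g)
  &= \int p \ln p \, dx \;-\; \int p \ln g \, dx \\
  &= -E(p) \;-\; \mathbb{E}_p[\ln g]
   \;=\; -E(p) \;-\; \mathbb{E}_g[\ln g]
   \;=\; E(g) - E(p),
\end{aligned}
% since ln g is a quadratic form in x, whose expectation depends only on the
% (shared) mean and covariance. Hence E(p) <= E(g): no admissible density has
% larger entropy than the gaussian.
```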

Question. How to quantify uncertainty ?

Through bounds, and then interval arithmetic, which carries the bounds through the computations ? That has probably been done. But if the operator F^-1, while existing, is unbounded, the bounds are infinite. Regularization can help here (but is it legitimate to use regularization only for imposing bounds for which there is no physical justification ?).

‘Most-squares’ method presented by M. Meju.

Through probability distributions ? Probability distributions are very convenient for describing uncertainty, especially when it comes to updating uncertainty through Bayes’ rule.

Jaynes, E. T., 2003, Probability Theory: The Logic of Science, Cambridge University Press.

Following earlier work by Cox (1946), the author posits a number of axioms that any numerical function that measures ‘uncertainty’ must verify, and then finds (the maths is not trivial) that such an uncertainty function follows the rules of the calculus of probability.

Alternative approaches

- Dempster-Shafer theory of evidence (Dempster, A. P., 1967, Annals of Math. Statistics; Shafer, G., 1976, A Mathematical Theory of Evidence, Princeton Univ. Press). Introduces distinct notions of belief and plausibility. Has been used in remote observation of Earth surface parameters.

- Theory of possibility (link with fuzzy logic and fuzzy sets; Zadeh, L. A., 1978, Fuzzy Sets and Systems)

SIAM Activity Group on Uncertainty Quantification (http://www.siam.org/activity/uq/)

Statistical and Applied Mathematical Sciences Institute (SAMSI) 2011-12 Program on Uncertainty Quantification (http://www.samsi.info/programs/2011-12-program-uncertainty-quantification)

- A few problems to be solved

The data weighting problem (de Wit & Trampert)

[Figure : original model vs. model modified without changing the misfit]

• Practically no change in the global misfit (the modification lies essentially in the null space).

• Big change in the local misfit at the observatory.

While being globally plausible, the local observation rules out the modified model.

Can uncertainties be based on one single scalar misfit measure ?

Can we attach more significance to the observatory without becoming un-Bayesian ?

What do we do with all the other observatories that may claim to be as important as the observatory ?

- How to represent, or even define, uncertainty ?

- How to objectively evaluate estimates of uncertainty (requires very large validation sets, which may not be available for the solid Earth) ?

- Identification and validation of errors in ‘models’ (in the sense of statements of relevant physical laws)

Announcement

International Conference on Ensemble Methods in Geophysical Sciences

Supported by the World Meteorological Organization, Météo-France, Agence Nationale de la Recherche (France)

Dates and location : 12-16 November 2012, Toulouse, France

Forum for the study of all aspects of ensemble methods for estimation of the state of geophysical systems (atmosphere, ocean, solid earth, …), in particular in assimilation of observations and prediction : theory, algorithmic implementation, practical applications, evaluation, …

The Scientific Organizing Committee is being set up. An announcement will be widely circulated in the coming weeks.

If interested, send email to [email protected]

