Combining physical and statistical models to predict environmental processes. Jim Zidek U British Columbia NICDS 2010 - CRM – p. 1/31

Combining physical and statistical models to predict environmental processes.

Jim Zidek

U British Columbia

NICDS 2010 - CRM – p. 1/31


Acknowledgements

Prasad Kasibhatla, Duke

Douw Steyn, UBC

Co-authors:

Nhu Le, BC Cancer Agency

Zhong Liu, Capital One


Outline

PART I: Some relevant UBC research

PART II: Phystat modelling - fundamentals

PART III: Phystat modeling - approaches

Conclusions


PART I: UBC RESEARCH

Current research involving natural resources.


NICDS - Agri & Agri - Food Canada

Soil - water - climate change - food - biofuel

NICDS project with AAFC partnership 2007 -

Only one year of NICDS funding - failure of NSERC renewal. Success story. Work continues.

AAFC has provided:

scientific collaboration

positions: 2 PDFs; 1 year full RA; MSc coop

data

interesting projects

meeting opportunities

cyber course instruction


NICDS - Agri & Agri - Food Canada

Projects completed:

Markov models for binary climate processes (PhD thesis)

Phenological phenomena models, e.g. bloom dates for grape vines (MSc thesis)

Crop yield forecasting models based on soil-moisture characteristics (current PhD thesis).


NICDS - Agri & Agri - Food Canada

Conventional regression residuals: yields on soil moisture by agrodistrict. Fails to borrow strength through spatial modeling techniques.


NICDS - Agri & Agri - Food Canada

Future work:

Future crop yields based on downscaled climate model estimates

Web portals

Design of micro sensor monitoring networks - soil conditions.


FPInnovations

Forests - climate change

Current NSERC - CRD - FPInnovations research grant. 5-year collaborative research program - sited at UBC in partnership with SFU - concerns lumber strength.

Current projects:

design of sampling programs - cross-sectional and longitudinal - catastrophic change like mountain pine beetle - long-term trends due to climate change.

property relationships - e.g. cracks vs bending strength

duration of load = accelerated testing

integrates deterministic engineering models with data


FPInnovations

Forests - climate change

Possible opportunities for cross-Canada collaboration, as FPI has labs in both the East and West.


PART II: PHYSTAT MODELLING - FUNDAMENTALS


Origins

Need to model environmental space-time fields over large space-time domains that challenge physical and statistical modelers


Environmental Space-Time Fields, X

X massively multivariate: over time × space × species, often discretized:

time may mean hour, day, month, etc.

space can refer to a point with a latitude and longitude, or a region, e.g. a county

species:

chemical species, gases, aerosols, etc.

generally, any vector of dependent variables, e.g. 24 hourly values within a day

Reference: Le and Zidek (2006). Statistical Analysis of Environmental Space-Time Fields. Springer.


What’s a Model?

“an abstract, analogue representation of the prototype whose behavior is being studied” (Steyn & Galmarini 2003)


Simulation Model Taxonomy

Analytic Models:

variables in tractable math equations represent measurable attributes of the real thing

Physical Scale Models

physical behavior of their measurable properties analogous to that of the real thing

Numerical Models

variables obtained by numerical solution thought to be analogous to measurable attributes of the real thing. Model outputs called “simulated data”.


Models: The problem of meaning

Controversy! The Oreskes Paper

Oreskes, Shrader-Frechette & Belitz (1994). Science, 263, 641-646

highly influential

says physical models cannot be shown to represent reality - validation meaningless/pointless

still cited over 40 times per year

used to justify not validating!


Models: The problem of meaning

Controversy! The Oreskes Paper

Oreskes, Shrader-Frechette & Belitz (1994). Science, 263, 641-646

dismisses common assessment practices

verification

validation

verifying numerical solutions

calibration

confirmation


Oreskes Arguments

. . .

Confirmation: match between simulated and real data implies verification (truth) - logical fallacy called “affirming the consequent”

EXAMPLE: Hypothesis H: “It is raining.” Model: “If H, I will stay home and revise the paper.” You find me at home and therefore conclude it is raining, because the empirical data match the outcome predicted under the hypothesis of the model!

Failing to predict the data ⇒ bad model. Success ⇏ good model! Moreover, numerous models could predict the same observations equally well!
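The asymmetry in this argument can be checked mechanically. The following is a minimal sketch (function names are illustrative, not from the talk) that enumerates every truth assignment for H (the hypothesis, "it is raining") and E (the predicted evidence, "I am at home"). It lists the cases where "H implies E" and "E" both hold while H is false, and confirms that a failed prediction does refute H.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q."""
    return (not p) or q

def counterexamples():
    """Truth assignments where 'H -> E' and 'E' hold but H is false."""
    return [(h, e) for h, e in product([False, True], repeat=2)
            if implies(h, e) and e and not h]

def modus_tollens_valid() -> bool:
    """Check that 'H -> E' together with 'not E' always forces 'not H'."""
    return all(not h for h, e in product([False, True], repeat=2)
               if implies(h, e) and not e)

print(counterexamples())      # assignments that break the "confirmation" argument
print(modus_tollens_valid())  # a failed prediction does refute H
```

A non-empty list of counterexamples is what makes "success implies a good model" invalid, while the second check corresponds to "failing to predict the data implies a bad model".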


Oreskes Arguments

Summary:

“The primary purpose of models is heuristic... useful for guiding further study but not susceptible to proof... [Any model is] a work of fiction (OUB citing philosopher Nancy Cartwright). ... A model, like a novel, may resonate with nature, but is not the ‘real thing’.”


Models: The problem of scales

Combining simulated & real data: Does it make sense?

Example: (1 + 1)/2 = 1

Seems correct. But it's actually nonsensical:

(1 cm + 1 apple)/2 = 1

Nonsensical! Simulated data are also on different scales than real data
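One way to make the scale problem concrete is to carry the scale along with each number, so that averaging across scales fails loudly instead of silently returning a value. The sketch below is illustrative only: the `Quantity` class and `average` function are hypothetical names, and the "unit" string stands in for whatever space-time support a value was produced on.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    """A number tagged with the scale (unit or support) it was measured on."""
    value: float
    unit: str

    def __add__(self, other: "Quantity") -> "Quantity":
        # Refuse to add values that live on different scales.
        if self.unit != other.unit:
            raise ValueError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.value + other.value, self.unit)

    def __truediv__(self, k: float) -> "Quantity":
        return Quantity(self.value / k, self.unit)

def average(x: Quantity, y: Quantity) -> Quantity:
    """Average two quantities; only defined when their scales agree."""
    return (x + y) / 2

print(average(Quantity(1, "cm"), Quantity(1, "cm")))  # same scale: well defined
try:
    average(Quantity(1, "cm"), Quantity(1, "apple"))
except ValueError as err:
    print("nonsensical:", err)
```

In the same spirit, a grid-cell model output and a point measurement would need to be brought onto a common scale before they are combined.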


Model Dynamic Scales

Steyn & Galmarini 2003 demonstrate the problem.

Continuous real-data monitors operate on a space-time scale of just 1 m² × a few minutes, in the lower left-hand corner!!


PART III: PHYSTAT MODELLING - APPROACHES


Motivating Example: Calibration

AIR POLLUTION

DISEASE/DEATH

NEED TO REGULATE

NEED TO MONITOR

NEED TO RELATE AMBIENT TO PERSONAL EXPOSURES

NEED DOSE RESPONSE MODELS

IMPACT ESTIMATES/CONTROL!!!


Ozone example

Pollution: US Ozone

Sources:

NA Anthro

Not NA Anthro


Ozone example

Not NA Anthro

Foreign Anthro

Natural Sources

Policy Related Background (PRB)

Lightning

Wildfires

Biogenic Emissions


Estimating the PRB

Not measurable:

urban pollution spreads to rural areas

few pristine sites available

representative of contaminated areas???

Alternative: infer from deterministic chemical transport models (CTMs)

GEOS-CHEM used for ozone

MAQSIP similar (see below)

Calibration needed: to represent “ground truth” (measurements)


Calibrating CTMs

Fundamental issue: different scales.

CTMs: meso-scale models

Field measurements: microscale


Bayesian melding approach

Bayesian melding model (BM) (Fuentes & Raftery, Biometrics, 2005):

$$\hat{Z}(s) = Z(s) + e(s)$$

$$\tilde{Z}(s) = a(s) + b(s)\,Z(s) + \delta(s)$$

$$\tilde{Z}(B) = \frac{1}{|B|}\int_B \tilde{Z}(s)\,ds$$

$$\tilde{Z}(B) \approx \frac{1}{L}\sum_{j=1}^{L} \tilde{Z}(s_{j,B})$$

$\{s_{j,B}\}$: sampling points within $B$; $\hat{Z}(s)$: measurements; $\tilde{Z}(B)$: model outputs; $Z(s)$: “true” process.
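To see how the pieces of the melding model fit together, here is a minimal simulation sketch. Everything numerical in it is an assumption made for illustration: a one-dimensional domain, a sinusoidal "true" field, constant bias terms a and b, and Gaussian noise. It is not the Fuentes & Raftery implementation; it only shows point measurements, point-level model output, and a grid-cell output formed as an average over L sampling points within B.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_process(s):
    """Hypothetical smooth 'true' field Z(s) on a 1-D domain."""
    return 50.0 + 10.0 * np.sin(2 * np.pi * s)

def point_model_output(s, a=5.0, b=1.2, delta_sd=1.0):
    """Point-level model output a + b * Z(s) + delta(s); constant a, b assumed."""
    return a + b * true_process(s) + rng.normal(0.0, delta_sd, size=np.shape(s))

def measurement(s, e_sd=2.0):
    """Measurement at s: Z(s) + e(s)."""
    return true_process(s) + rng.normal(0.0, e_sd, size=np.shape(s))

def block_model_output(lo, hi, L=500):
    """Monte Carlo block average over B = [lo, hi] using L sampling points."""
    s_jB = rng.uniform(lo, hi, size=L)
    return point_model_output(s_jB).mean()

z_hat = measurement(np.array([0.1, 0.2, 0.3]))   # three monitoring sites
z_tilde_B = block_model_output(0.0, 0.5)         # one grid cell B = [0, 0.5]
print(z_hat, z_tilde_B)
```

The measurements live at points while the model output is a block average, which is the change-of-support issue the melding approach is designed to handle.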


Model calibration

Calibration formula:

$$\text{Recalibrated } \tilde{Z}(B) = \left(\tilde{Z}(B) - \frac{1}{|B|}\int_B a(s)\,ds\right) \Big/ b.$$

Each iteration of the MCMC generates a recalibrated value and, overall, an empirical marginal distribution for it.
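The per-iteration recalibration can be sketched as follows. No MCMC is run here: the arrays of draws are synthetic stand-ins for posterior samples of the block-averaged additive bias and of the multiplicative bias b, and all numbers are made up. The point is only that applying the formula draw by draw yields an empirical distribution for the recalibrated value.

```python
import numpy as np

def recalibrate(z_tilde_B, a_bar_B, b):
    """Apply (Z~(B) - mean of a(s) over B) / b, elementwise over posterior draws."""
    return (z_tilde_B - np.asarray(a_bar_B)) / np.asarray(b)

rng = np.random.default_rng(1)
n_draws = 4000
# Stand-ins for MCMC output: hypothetical posterior draws of the additive
# bias averaged over B and of the multiplicative bias b.
a_bar_draws = rng.normal(5.0, 0.5, size=n_draws)
b_draws = rng.normal(1.2, 0.05, size=n_draws)

z_tilde_B = 72.6                       # one model output for grid cell B
recal = recalibrate(z_tilde_B, a_bar_draws, b_draws)

# Empirical marginal distribution of the recalibrated value.
print(recal.mean(), np.percentile(recal, [5, 95]))
```

Summaries such as the mean or a 90% interval can then be read directly off the array of recalibrated draws.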


BM calibration assessment

How well do recalibrated model outputs predict measurements?

Uses 10AM - 5PM averages: measurements & MAQSIP model outputs (Kasibhatla & Chameides 2000; Fiore et al. 2003, 2004)

Model outputs: from 78 grid cells.

Measurements: from 48 stations.

Validation data: measurements from remaining 30 stations (all collocated with grid cells by choice) to be predicted.


BM calibration assessment

How well do recalibrated model outputs predict measurements?

Define: root mean square prediction error (RMSPE):

$$\text{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - \hat{O}_i\right)^2},$$

$O_i$ = measurement at location $i$; $\hat{O}_i$ = prediction at location $i$.

Results:

        melding   Kriging 1   Kriging 2
mean    13.37*    14.24       14.79

Average RMSPE for 30 days. Kriging 1: using measurements. Kriging 2: using model outputs.
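As a small illustration of the error measure, the function below computes a root mean square prediction error from paired observations and predictions. The function name and the three example values are made up; they are not data from the study.

```python
import numpy as np

def rmspe(observed, predicted):
    """Root mean square prediction error between two equal-length sequences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

# Made-up values for three validation stations (not data from the study).
print(rmspe([40.0, 55.0, 62.0], [43.0, 51.0, 62.0]))
```

Averaging this quantity over days, as in the table, gives one summary number per method.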


Univariate STReg approach

Univariate spatial-temporal regression

Extends Guillas et al. (2006)

Ignores misaligned measurement-to-model supports.

Relate measurements {O(t)} to model outputs {M(t)} at each monitoring site:

$$O(t) = c + a\,M(t) + N_t, \quad t = 1, 2, \ldots, T$$

$$N_t = \rho N_{t-1} + e_t$$

$$e_t = \sum_{i=1}^{p} \gamma_i Z_i(t) + \epsilon_t$$

$$\epsilon_t \sim N(0, \sigma_\epsilon^2) \ \text{i.i.d.}$$
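The single-site model above can be simulated in a few lines. This is a sketch under stated assumptions: all parameter values are hypothetical, one covariate is used (p = 1), and the fit is plain ordinary least squares followed by a lag-1 residual autocorrelation, which is a simple stand-in and not the Bayesian fitting procedure used in the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000
c, a, rho, gamma, sigma = 3.0, 0.8, 0.5, 1.5, 1.0   # hypothetical parameter values

M = rng.normal(50.0, 10.0, size=T)    # stand-in model outputs M(t)
Z = rng.normal(0.0, 1.0, size=T)      # one covariate Z_1(t), so p = 1

# e_t = gamma * Z_1(t) + eps_t, then N_t = rho * N_{t-1} + e_t
e = gamma * Z + rng.normal(0.0, sigma, size=T)
N = np.zeros(T)
N[0] = e[0]
for t in range(1, T):
    N[t] = rho * N[t - 1] + e[t]

O = c + a * M + N                      # simulated measurements O(t)

# Ordinary least squares for (c, a); ignores the AR(1) error structure.
X = np.column_stack([np.ones(T), M])
(c_hat, a_hat), *_ = np.linalg.lstsq(X, O, rcond=None)

# Lag-1 autocorrelation of the residuals as a rough estimate of rho.
resid = O - X @ np.array([c_hat, a_hat])
rho_hat = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
print(c_hat, a_hat, rho_hat)
```

With a long enough series the estimates land close to the values used to simulate, which is a quick sanity check before moving to a full hierarchical fit.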


Univariate STReg approach

Univariate spatial-temporal regression

Make coefficients site dependent & have joint Gaussian field: $a$, $c$, $\gamma_i$ & residuals $\epsilon$.

Temporal process model: AR(p) with spatially correlated coefficients & residuals.

Gives spatial predictions & temporal forecasts.


Model calibration: Uni STReg

Calibration formula:

Recalibrated $\tilde{Z}(B) = c + a\,\tilde{Z}(B)$.

Incorporated into MCMC runs.


Univariate STReg assessment

As before:

78 stations collocated with 78 model grid cells

use all 24 hours, not just 8-hour averages.

Fit model: use first 240 hours, measurements at 15 monitoring sites for this.

Prediction & forecasting: use all model outputs, 78 stations & 480 hours.

For the 15 stations: forecast future in 2nd block of 240 hours.

Spatial predictions: 1st 240 hour measurements for remaining 63 sites.
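The assessment design amounts to slicing a site-by-hour array in three ways. The sketch below uses placeholder arrays of zeros, and it assumes the 15 fitting sites are picked at random, since the slides do not say how they were chosen.

```python
import numpy as np

n_sites, n_hours = 78, 480
rng = np.random.default_rng(3)

# Which 15 sites are used for fitting is not stated; a random pick is assumed.
fit_sites = np.sort(rng.choice(n_sites, size=15, replace=False))
pred_sites = np.setdiff1d(np.arange(n_sites), fit_sites)

first_block = np.arange(0, 240)      # hours used for fitting
second_block = np.arange(240, 480)   # hours held out for forecasting

# Placeholder arrays with shape (site, hour); real data would go here.
measurements = np.zeros((n_sites, n_hours))
model_outputs = np.zeros((n_sites, n_hours))

fit_data = measurements[np.ix_(fit_sites, first_block)]         # 15 x 240
forecast_truth = measurements[np.ix_(fit_sites, second_block)]  # 15 x 240
spatial_truth = measurements[np.ix_(pred_sites, first_block)]   # 63 x 240
print(fit_data.shape, forecast_truth.shape, spatial_truth.shape)
```

The model outputs are available at all 78 sites and all 480 hours, so they can be used as predictors in every one of the three slices.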


Uni STReg assessment

Overall, Uni STReg calibration beats Kriging and Melding (small number of stations - 15 to predict 63; spatial-only models).

Overall accuracy: Uni STReg does better for both spatial prediction & forecasting.

Coverage probabilities: 90% credible intervals cover 90% for forecasts; 95% for predictions (too big!).

RESULTS:

         Uni STReg   Model outputs alone
RMSPE    14.43       16.50
RMSFE    15.57       18.57
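Coverage probabilities like those quoted above are the fraction of held-out observations that fall inside their credible intervals. A minimal sketch of that check follows; the function name is illustrative and the data are synthetic standard normal draws paired with intervals that are exactly right for them.

```python
import numpy as np

def empirical_coverage(observed, lower, upper):
    """Fraction of observations falling inside [lower, upper]."""
    observed, lower, upper = map(np.asarray, (observed, lower, upper))
    return float(np.mean((observed >= lower) & (observed <= upper)))

# Made-up check: intervals that are exactly right for N(0, 1) data.
rng = np.random.default_rng(4)
y = rng.normal(size=20000)
z90 = 1.6448536269514722            # 0.95 quantile of the standard normal
print(empirical_coverage(y, -z90, z90))   # close to 0.90 for calibrated intervals
```

An empirical coverage well above the nominal level, as reported for the spatial predictions, suggests intervals that are wider than they need to be.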


Multivariate STReg

Extends Uni STReg: can borrow strength even in the univariate case, so improves on Uni STReg.

$$O_{s,t} = \beta_s M_{s,t} + N_{s,t}$$

$$N_{s,t} = \rho N_{s,t-1} + \gamma_s Z_{s,t} + \epsilon_{s,t}$$

$\epsilon_{s,t} \sim \text{MVN}(0, \Sigma_\epsilon)$, independently & identically.

$O_{s,t}$: $q \times 1$ measurements vector, $q$ pollutants.

$M_{s,t}$: $(p+1) \times 1$ vector of intercept terms & $p$ model outputs, for the $q$ pollutants.

p = q not necessary.


Multivariate STReg

Make $\beta_s$ & $\gamma_s$ spatial Gaussian random fields.

Conjugate prior: Inverted Wishart for $\Sigma_\epsilon$.

Separability assumption: residual covariance has Kronecker structure, $\epsilon \sim \text{MVN}(0, I_{n(T-1)} \otimes \Sigma_\epsilon)$.

Calibration formula similar to Uni STReg.
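The Kronecker structure in the separability assumption can be made concrete with a tiny example. The dimensions and the 2 x 2 covariance below are hypothetical, chosen small enough to inspect. The sketch builds the full residual covariance and then draws residuals block by block, which is equivalent because the full matrix is block diagonal.

```python
import numpy as np

q = 2                                   # pollutants
n, T = 3, 4                             # sites and time points (tiny, for display)
Sigma_eps = np.array([[4.0, 1.5],
                      [1.5, 2.0]])      # hypothetical q x q residual covariance

blocks = n * (T - 1)
full_cov = np.kron(np.eye(blocks), Sigma_eps)   # I_{n(T-1)} (x) Sigma_eps

# Block-diagonal structure means each q-vector can be drawn independently.
rng = np.random.default_rng(5)
eps = rng.multivariate_normal(np.zeros(q), Sigma_eps, size=blocks)
print(full_cov.shape, eps.shape)
```

Working with the q x q block instead of the full matrix is what keeps computation manageable as the number of sites and time points grows.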


Multi STReg assessment

More accurate than Uni STReg when reduced to one dimension, when strength can be borrowed.

Use average of remaining 16 hours of measurements & model outputs on each day to forecast 8-hour (10AM-5PM) peak average.

Both the RMSPE & RMSFE smaller for the Multi STReg (multivariate) model.
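The two daily summaries used here, the 8-hour peak average and the average of the remaining 16 hours, can be computed from 24 hourly values as sketched below. The hour indexing is an assumption: hours are taken as 0 to 23 and the 10AM-5PM window as the eight hourly values indexed 10 through 17. The example values are made up.

```python
import numpy as np

def split_day(hourly):
    """Return (mean of the 8 daytime hours, mean of the remaining 16 hours)."""
    hourly = np.asarray(hourly, dtype=float)
    if hourly.shape != (24,):
        raise ValueError("expected 24 hourly values")
    # Assumption: hours are indexed 0..23 and the 10AM-5PM window is 10..17.
    day = np.zeros(24, dtype=bool)
    day[10:18] = True
    return hourly[day].mean(), hourly[~day].mean()

hourly = np.arange(24.0)           # made-up values: hour h has value h
peak_avg, rest_avg = split_day(hourly)
print(peak_avg, rest_avg)
```

Treating the two averages as a bivariate response at each site and day is what lets the multivariate model borrow strength from the off-peak hours.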


Multi STReg assessment

More accurate than Uni STReg when reduced to one dimension, when strength can be borrowed.

Multi-STReg makes big changes:

[Figure: four scatterplot panels; x-axis: uncalibrated modeling output; y-axis: calibrated modeling output.]

Uncalibrated model outputs versus calibrated (Multi STReg) inferences for selected days.


Multi STReg assessment

More accurate than Uni STReg when reduced to one dimension, when strength can be borrowed.

Bayesian Melding does too!:

[Figure: four scatterplot panels; x-axis: uncalibrated modeling output; y-axis: calibrated modeling output.]

Uncalibrated model outputs versus calibrated (BM) inferences for selected days.


Multi STReg assessment

More accurate than Uni STReg when reduced to one dimension, when strength can be borrowed.

             RMSPE                        RMSFE
             multivariate   univariate    multivariate   univariate
daytime      16.78          18.10         14.44          15.91
nighttime    12.51          14.19          9.69           9.71

Average of MSEs for spatial prediction & temporal forecasting: multi- vs univariate methods.


Conclusions

Overall, Multi STReg is the best calibrator among the various methods.

But Uni STReg is simpler!

Correlation between night- & daytime measurements allows strength to be borrowed from the night.

Approaches show how to infer Policy Related Background levels. (But they vary by hour & region.)

Current work: non-stationary extensions with time-dependent coefficients.


References

http://www.stat.ubc.ca/Research/TechReports/tr08.php

http://www.stat.ubc.ca/~jim/

Contact: [email protected]
