Introduction to EKDMOS Matt Peroutka Meteorological Development Lab National Weather Service, NOAA...

Introduction to EKDMOS

Matt Peroutka
Meteorological Development Lab
National Weather Service, NOAA

Bob Glahn
Jerry Wiedenfeld
John Wagner
Greg Zylstra

Reference

• Glahn B., M. Peroutka, J. Wiedenfeld, J. Wagner, G. Zylstra, et al. (2008) MOS uncertainty estimates in an ensemble framework. Monthly Weather Review: In Press.
  – Available online at Monthly Weather Review Early Online Releases.
  – AMS estimates publication in December 2008.

Introduction—1

• To date, MDL's MOS guidance has used multiple regression to produce single-valued forecasts for
  – Temperature (T)
  – Dew point (Td)
  – Daytime maximum temperature (MaxT)
  – Nighttime minimum temperature (MinT)

• Two changes
  – Ensembles now available
  – Emerging requirement for probabilistic forecasts of these four weather elements

Introduction—2

• Regression yields error estimate from which probabilities can be calculated.
  – Probabilistic forecasts from a single model run
  – Distribution of weather element must be near normal

• Why haven't we used this before now?
  – Not needed to get good single-valued forecasts.
  – No big push for probabilistic guidance.

• We now have ensemble runs, and we sought a way to combine ensembles with regression.

EKDMOS Method

1. Use screening multiple regression with the NCEP's GEFS ensemble means,

2. Estimate error variance directly from the regression,

3. Apply equations to individual ensemble members and combine results with Kernel Density fitting
   – Gaussian kernel
   – Standard deviation produced by the regression

4. Apply a spread adjustment based on the spread of the ensemble members.

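EKDMOS Method

[Flowchart]
Development: Ensemble Means and Observations feed the Regression, which yields the Predictive Equation and the Error Variance.
Implementation: the equation is applied to the Ensemble Members to give a Forecast for each, followed by Kernel Density Fitting and Spread Adjustment to produce the Probability Distribution.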

Error Estimation in Linear Regression

• The linear regression theory used to produce MOS guidance forecasts includes error estimation.

• The Confidence Interval quantifies uncertainty in the position of the regression line.

• The Prediction Interval quantifies uncertainty in predictions made using the regression line.

The prediction interval can be used to estimate uncertainty each time a MOS equation is used to make a forecast.

Estimated Variance of a Single New Independent Value

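• Estimated variance

$$ s^2(\hat{Y}_{h(new)}) = MSE \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right] $$

• Where

$$ MSE = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n - 2} $$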

Computing the Prediction Interval

The prediction bounds for a new prediction are

$$ \hat{Y}_{h(new)} \pm t(1-\alpha/2;\, n-2)\; s(\hat{Y}_{h(new)}) $$

where t(1-α/2; n-2) is the 1-α/2 quantile of the t distribution with n-2 degrees of freedom, which gives a two-tailed 1-α interval.
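The prediction-interval calculation for the one-predictor case can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the sample data, and the choice to pass the t value in as `t_crit` instead of looking it up are assumptions, not part of the original slides.

```python
import math

def prediction_interval(x, y, x_new, t_crit):
    """Return (lower, y_hat, upper) for a new prediction at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # MSE: sum of squared residuals divided by n - 2
    mse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = intercept + slope * x_new
    # Estimated standard deviation of a single new independent value
    s_new = math.sqrt(mse * (1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx))
    return y_hat - t_crit * s_new, y_hat, y_hat + t_crit * s_new

# Hypothetical model-output predictor (x) and observed temperature (y)
x = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
y = [22.1, 24.3, 31.0, 33.8, 41.5, 44.0, 52.2, 53.9, 61.7, 63.5]
# 2.306 is t(0.975; 8), the two-tailed 95% value for n = 10
lower, y_hat, upper = prediction_interval(x, y, x_new=42.0, t_crit=2.306)
```

The interval widens as `x_new` moves away from the mean of the predictor, which is the behavior the (X_h - X̄)² term encodes.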

Multiple Regression (3-predictor case)

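$$ Y_{n \times 1} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_n \end{bmatrix}, \qquad
X_{n \times 4} = \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} \\ 1 & x_{21} & x_{22} & x_{23} \\ 1 & x_{31} & x_{32} & x_{33} \\ 1 & x_{41} & x_{42} & x_{43} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} \end{bmatrix}, \qquad
A_{4 \times 1} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} $$

Y: Predictand Vector

X: 3-predictor Matrix

A: Coefficient Vector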

Multiple Regression, Continued

Error bounds can be put around the new value of Y with

$$ \hat{Y}_{h(new)} \pm \left[ s^2 \left( 1 - R^2 \right) \left( 1 + x'_{1 \times 4} \, (X'X)^{-1}_{4 \times 4} \, x_{4 \times 1} \right) \right]^{1/2} $$

where

– s² is the variance of the predictand,

– R² is the reduction of variance,

– X' is the matrix transpose of X, and

– ( )⁻¹ indicates the matrix inverse.
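A minimal NumPy sketch of the matrix form of the error estimate follows. It is an illustration under stated assumptions: the function name and the synthetic data are hypothetical, and the residual mean square SSE/(n - 4) is used as the error variance, which differs from s²(1 - R²) only in the divisor.

```python
import numpy as np

def predict_with_error(X, y, x_new):
    """Return (forecast, standard error of a new prediction).

    X is the n-by-4 matrix with a leading column of ones, y the predictand
    vector, and x_new the 4-element predictor vector for the new case.
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coefficients = xtx_inv @ X.T @ y           # least-squares coefficient vector A
    residuals = y - X @ coefficients
    mse = float(residuals @ residuals) / (n - p)
    leverage = float(x_new @ xtx_inv @ x_new)  # x'(X'X)^-1 x
    return float(x_new @ coefficients), float(np.sqrt(mse * (1.0 + leverage)))

# Synthetic (hypothetical) development sample with three predictors
rng = np.random.default_rng(0)
predictors = rng.normal(size=(50, 3))
X = np.column_stack([np.ones(50), predictors])
y = 2.0 + predictors @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.5, size=50)

x_center = np.concatenate([[1.0], predictors.mean(axis=0)])
x_far = np.array([1.0, 5.0, 5.0, 5.0])
forecast_center, error_center = predict_with_error(X, y, x_center)
forecast_far, error_far = predict_with_error(X, y, x_far)
```

The error estimate is smallest when the new predictor values sit at the center of the development sample and grows as they move away from it.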

Regression Equation and Prediction Intervals

Kernel Density Fitting

Forecasts are combined with Kernel Density Fitting, using a Gaussian kernel and the standard deviation produced by the regression.
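As a rough illustration of this step, the sketch below places one Gaussian kernel on each member forecast and averages them. The function names, the member forecasts, and the standard deviations are hypothetical values chosen for the example; only the Python standard library is used.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kde_cdf(t, forecasts, sds):
    """CDF at t of an equal-weight mixture of Gaussian kernels."""
    return sum(normal_cdf((t - f) / s) for f, s in zip(forecasts, sds)) / len(forecasts)

def kde_pdf(t, forecasts, sds):
    """PDF at t of the same mixture."""
    total = 0.0
    for f, s in zip(forecasts, sds):
        z = (t - f) / s
        total += math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return total / len(forecasts)

# Hypothetical member forecasts (degrees F) and regression standard deviations
member_forecasts = [33.0, 35.0, 36.0, 41.0]
member_sds = [2.5, 2.5, 2.5, 2.5]
p_at_or_below_freezing = kde_cdf(32.0, member_forecasts, member_sds)
```

Because the kernels are centered on the individual members, an outlying member (41.0 here) pulls the upper tail outward, which is how the combined distribution can become skewed.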

EKDMOS Spread Adjustment

$$ x = \frac{3(\sigma_{min} + \sigma_{max}) + sf\,(F_{max} - F_{min})}{3(\sigma_{min} + \sigma_{max}) + (F_{max} - F_{min})} $$

x = ratio of new width to old width

Applying the Spread Adjustment

• Method
  – KDE used to compute unadjusted distribution.
  – Average of forecasts used as center of distribution.
  – Each point on the distribution adjusted to be closer to center of distribution by width ratio (x).

• Spread factor (sf) empirically determined.
  – Values of 0.4 and 0.5 have worked best.
  – Same value for both seasons and all weather elements.
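The adjustment can be sketched as follows. The sketch assumes the width ratio has the form x = [3(σmin + σmax) + sf(Fmax - Fmin)] / [3(σmin + σmax) + (Fmax - Fmin)], where Fmin and Fmax are the lowest and highest member forecasts and σmin and σmax are their kernel standard deviations; the function names and sample numbers are hypothetical.

```python
def width_ratio(f_min, f_max, sd_min, sd_max, sf):
    """Ratio of new distribution width to old width (assumed form)."""
    tails = 3.0 * (sd_min + sd_max)   # kernel tails beyond the extreme members
    member_range = f_max - f_min      # spread of the member forecasts
    return (tails + sf * member_range) / (tails + member_range)

def adjust_points(points, center, x):
    """Move each point on the distribution toward the center by ratio x."""
    return [center + x * (p - center) for p in points]

# Hypothetical case: members range from 30 to 40 F, kernel sd of 2 F, sf = 0.5
x = width_ratio(f_min=30.0, f_max=40.0, sd_min=2.0, sd_max=2.0, sf=0.5)
adjusted = adjust_points([30.0, 35.0, 40.0], center=35.0, x=x)
```

With sf = 1 the ratio is 1 and nothing changes; smaller sf values shrink only the part of the width that comes from the member spread, leaving the kernel tails untouched.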

EKDMOS Comments (T, Td, MaxT, MinT)

• Using ensemble means as predictors seems very effective.

• Prediction interval from regression can be used alone to make reliable forecasts.
  – Distributions will be symmetric.

• Combination of prediction interval and ensembles can yield skewed distributions.

QPF is not Quasi-Normal!

• One-predictor equations, prediction intervals, and data for QPF (top) and cube root QPF (bottom).

• Regression line fits poorly. Data points well beyond limits of prediction interval.

• Cube root improves the fit, but makes more problems. Note artificial "floor" at 0.22 in—cube root of 0.01 in.

• We need a different approach...

Method for EKDMOS 6-h PQPF

1. Use screening multiple regression with the NCEP's GEFS ensemble means to forecast QPF ≥ 0.01 in (PoP06).

2. Use same regression technique to forecast conditional QPF for other categories.

3. Apply equations for each category to individual ensemble members.

4. Multiply the PoP06 by each categorical probability to create unconditional probabilities for each category and member.

6. Ensure probabilities are monotonically increasing and between 0 and 1.

7. Combine ensemble results for each category with arithmetic mean.
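The post-processing in steps 4, 6, and 7 can be sketched as below. This is an illustration under assumptions: the function names and numbers are hypothetical, and the categories are treated as exceedance probabilities P(QPF ≥ threshold), so the monotonic check here forces probabilities to be non-increasing as the threshold rises. For cumulative (≤ threshold) probabilities the direction of the check would flip.

```python
def unconditional_probs(pop, conditional):
    """Turn PoP06 and conditional category probabilities into a consistent set.

    pop          -- P(QPF >= 0.01 in) for one member
    conditional  -- P(QPF >= threshold | QPF >= 0.01 in), ordered by rising threshold
    """
    probs = [pop] + [pop * c for c in conditional]      # step 4
    probs = [min(1.0, max(0.0, p)) for p in probs]      # step 6: keep within [0, 1]
    for k in range(1, len(probs)):                      # step 6: enforce monotonicity
        probs[k] = min(probs[k], probs[k - 1])
    return probs

def combine_members(member_probs):
    """Step 7: arithmetic mean across members for each category."""
    n = len(member_probs)
    return [sum(column) / n for column in zip(*member_probs)]

# Two hypothetical members, three conditional categories each
member_a = unconditional_probs(0.6, [0.5, 0.7, 0.1])
member_b = unconditional_probs(0.4, [0.5, 0.25, 0.1])
combined = combine_members([member_a, member_b])
```

In `member_a` the third category would come out larger than the second (0.42 versus 0.30) without the monotonic check, so it is capped at 0.30.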

Combination of EKDMOS temperature technique and GFS MOS QPF technique.

EKDMOS    GFS MOS
0.01      0.01
0.05
0.10      0.10
0.15
0.20
0.25      0.25
0.30
0.40
0.50      0.50
0.75
1.00      1.00
1.50

EKDMOS Comments (QPF)

• Using ensemble means as predictors seems very effective.

• Probability distributions can be generated from a single model run.

• Combination of regression and ensemble modeling seems to yield better forecasts at long lead times.

Evaluation

• Tools
  – Probability Integral Transform (PIT) Histograms
  – Square Bias (SB)
  – Cumulative Reliability Diagram (CRD)
  – Continuous Ranked Probability Score (CRPS)

• Baseline for Comparison
  – Raw Ensembles
  – Operational Ensemble MOS

Probability Integral Transform (PIT) Histogram

• Graphically assesses reliability for a set of probabilistic forecasts. Visually similar to Ranked Histogram.

• Method
  – For each forecast-observation pair, probability associated with observed event is computed.
  – Frequency of occurrence for each probability is recorded in histogram as a ratio.
  – Histogram boundaries set to forecast percentiles.

[Figure: PIT histogram. Example forecast-observation pair: T=34F; p=.663]

Ratio of 1.795 indicates ~9% of the observations fell into this category, rather than the desired 5%.

Ratio of .809 indicates ~8% of the observations fell into this category, rather than the desired 10%.
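A minimal sketch of the histogram calculation is shown below. It assumes the PIT value of each forecast-observation pair (the forecast CDF evaluated at the observation) has already been computed, and that the bin boundaries are the 13 disseminated probability levels; the function name and the synthetic PIT values are hypothetical.

```python
import bisect

def pit_histogram(pit_values, levels):
    """Return one ratio per bin: observed frequency / expected frequency."""
    bounds = [0.0] + list(levels) + [1.0]
    counts = [0] * (len(bounds) - 1)
    for p in pit_values:
        # Index of the bin containing p; a value of exactly 1.0 goes in the last bin
        k = min(bisect.bisect_right(bounds, p) - 1, len(counts) - 1)
        counts[k] += 1
    n = len(pit_values)
    ratios = []
    for k, count in enumerate(counts):
        expected = bounds[k + 1] - bounds[k]   # bin width = desired frequency
        ratios.append((count / n) / expected)
    return ratios

levels = [0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.50,
          0.60, 0.70, 0.75, 0.80, 0.90, 0.95]
# Perfectly reliable (uniform) PIT values give a flat histogram at unity
uniform_pit = [(i + 0.5) / 1000 for i in range(1000)]
ratios = pit_histogram(uniform_pit, levels)
```

A ratio of 1.795 in a bin that is 0.05 wide corresponds to an observed frequency of 1.795 × 0.05 ≈ 0.09, which matches the "~9% rather than the desired 5%" reading above.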

Probability Integral Transform (PIT) Histogram, Continued

• Assessment
  – Flat histogram at unity indicates reliable, unbiased forecasts.
  – U-shaped histogram indicates under-dispersion in the forecasts.
  – O-shaped histogram indicates over-dispersion.
  – Higher values in higher percentages indicate a bias toward lower forecast values.

Squared Bias in Relative Frequency

• Weighted average of squared differences between actual height and unity for all histogram bars.

• Zero is ideal.

• Summarizes histogram with one value.

Sq Bias in RF = 0.057
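The score can be sketched in one short function. The slide does not state the weights, so this sketch assumes each bar is weighted by its bin width (its expected relative frequency); the function name and inputs are hypothetical.

```python
def squared_bias(ratios, widths):
    """Weighted average of squared departures of histogram bars from unity.

    ratios -- PIT histogram bar heights (observed / expected frequency)
    widths -- bin widths, used here as the weights (an assumption)
    """
    total_weight = sum(widths)
    return sum(w * (r - 1.0) ** 2 for r, w in zip(ratios, widths)) / total_weight

flat = squared_bias([1.0, 1.0, 1.0, 1.0], [0.25, 0.25, 0.25, 0.25])
lopsided = squared_bias([2.0, 0.0], [0.5, 0.5])
```

A flat histogram scores zero, and the score grows as bars depart from unity in either direction.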

Cumulative Reliability Diagram (CRD)

• Graphically assesses reliability for a set of probabilistic forecasts. Visually similar to reliability diagrams for event-based probability forecasts.

• Method
  – For each forecast-observation pair, probability associated with observed event is computed.
  – Cumulative distribution of verifying probabilities is plotted against the cumulative distribution of forecasts.

[Figure: cumulative reliability diagram. Annotation at forecast probability 0.7:]

63.5% of the observations occurred when forecast probability was 70% for that temperature or colder.
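The points of such a diagram can be sketched as follows, assuming the verifying probability (PIT value) for each forecast-observation pair is already available. The function name and sample values are hypothetical.

```python
def cumulative_reliability(pit_values, levels):
    """For each forecast probability level, return the fraction of
    observations whose verifying probability was at or below that level."""
    n = len(pit_values)
    return [sum(1 for v in pit_values if v <= level) / n for level in levels]

# Four hypothetical verifying probabilities, checked at two forecast levels
observed = cumulative_reliability([0.1, 0.2, 0.65, 0.9], [0.5, 0.7])
```

For reliable forecasts each observed fraction matches its forecast level, so the plotted points fall on the diagonal; the 63.5% observed at the 70% level in the example above sits slightly below it.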


Baseline

• Rank-ordered forecasts can be used to form a crude CDF. Weibull plotting position estimator:

$$ \Pr(T \le T_i) = \frac{i}{n + 1} $$

• Technique applied to raw ensemble output (RawEns) and Operational Ensemble MOS (ENSMOS).
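The plotting-position estimate is simple enough to sketch directly: sort the member forecasts and assign the i-th smallest of n the cumulative probability i/(n + 1). The function name and the three sample forecasts are hypothetical.

```python
def weibull_cdf_points(forecasts):
    """Return (value, cumulative probability) pairs for rank-ordered forecasts."""
    ranked = sorted(forecasts)
    n = len(ranked)
    # Rank i runs from 1 to n, so probabilities stay strictly between 0 and 1
    return [(value, (i + 1) / (n + 1)) for i, value in enumerate(ranked)]

points = weibull_cdf_points([36.0, 31.0, 34.0])
```

With only a handful of members, the crude CDF never reaches below 1/(n + 1) or above n/(n + 1), so the tails are left undefined.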

ENSMOS as seen on http://eyewall.met.psu.edu

Baseline Reliability

[Figure: panels for Raw Ens and Opnl EnsMOS at Day 2, Day 5, and Day 7.]

EKDMOS: Dep. and Ind. Data

[Figure: panels for Dep and Ind data at Day 2, Day 5, and Day 7.]

EKDMOS: Ind. Dew Point, Daytime MaxT, Nighttime MinT

[Figure: panels for Dew Point, MaxT, and MinT at Day 2, Day 5, and Day 7.]

Continuous Ranked Probability Score

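The formula for CRPS is

$$ CRPS = CRPS(P, x_a) = \int_{-\infty}^{\infty} \left[ P(x) - P_a(x) \right]^2 dx $$

where P(x) and P_a(x) are both CDFs

$$ P(x) = \int_{-\infty}^{x} \rho(y)\, dy, \qquad P_a(x) = H(x - x_a) $$

and

$$ H(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \ge 0 \end{cases} $$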

Continuous Ranked Probability Score

• Proper score that measures the accuracy of a set of probabilistic forecasts.

• Squared difference between the forecast CDF and a perfect single value forecast, integrated over all possible values of the variable. Units are those of the variable.

• Zero indicates perfect accuracy. No upper bound.

$$ CRPS(P, x_a) = \int_{-\infty}^{\infty} \left[ P(x) - P_a(x) \right]^2 dx $$
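The integral can be approximated numerically for a single forecast, as in the sketch below. It is illustrative only: the function names, the midpoint-rule integration over a finite range, and the Gaussian forecast used as an example are all assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def crps(cdf, x_obs, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of [P(x) - H(x - x_obs)]^2.

    The range [lo, hi] must be wide enough that the forecast CDF is
    essentially 0 at lo and 1 at hi.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        step = 1.0 if x >= x_obs else 0.0   # CDF of the perfect single-value forecast
        total += (cdf(x) - step) ** 2 * dx
    return total

# Hypothetical forecast: Gaussian with mean 38 F and sd 3 F; observation 40 F
def forecast_cdf(x):
    return normal_cdf((x - 38.0) / 3.0)

score = crps(forecast_cdf, x_obs=40.0, lo=0.0, hi=80.0)
```

The result is in degrees F here, since the squared probability difference is integrated over temperature. Averaging the score over many forecast-observation pairs gives the value for a set of forecasts.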

Reliability/Accuracy



Accuracy for Td, MinT, MaxT

[Figure: panels for Td, MinT, and MaxT.]

6-h QPF Reliability at 24-h

[Figure: panels for thresholds ≤0.01, ≤0.10, ≤0.25, and ≤0.50.]

6-h QPF Accuracy

Disseminating EKDMOS

• EKDMOS distributions are frequently skewed and occasionally bimodal.
  – Precludes fitting analytical distributions.
  – Rather, disseminate points on the CDF: percentile temperatures.
    • 13 probability levels (0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60, 0.70, 0.75, 0.80, 0.90, 0.95)
    • Allows users/partners to choose a threshold temperature and compute an exceedance probability.
  – Single-value forecasts will be created as well.

• NDGD grids.

• Tentative time projections:
  • Days 1-8 x 3-h (T/Td).
  • Days 1-16 x 24-h (MaxT/MinT).

Sample T Forecast as Quantile Function (CDF)

[Figure: 72-h T Fcst KBWI 12/14/2004. Temperature (25 to 50 degrees F) plotted against Probability (0 to 1).]

20% chance of temperature below 35.2 degrees F.

Median of the distribution 38.3 degrees F.

50% Confidence Interval (35.8, 40.7) degrees F.

90% Confidence Interval (32.2, 44.3) degrees F.

Chance of temperature below 40.0 degrees F is 67.9%.
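A user holding only the disseminated percentile temperatures could estimate a threshold probability by interpolation, as in the sketch below. It uses the six points quoted for this sample forecast, assuming the 50% interval corresponds to the 0.25 and 0.75 levels and the 90% interval to the 0.05 and 0.95 levels. The function name and the choice of linear interpolation are assumptions.

```python
def prob_below(threshold, levels, temps):
    """Linearly interpolate P(T < threshold) from percentile temperatures.

    levels and temps must be in ascending order. Outside the disseminated
    range the result is clamped to the lowest or highest level.
    """
    if threshold <= temps[0]:
        return levels[0]
    if threshold >= temps[-1]:
        return levels[-1]
    for k in range(1, len(temps)):
        if threshold <= temps[k]:
            fraction = (threshold - temps[k - 1]) / (temps[k] - temps[k - 1])
            return levels[k - 1] + fraction * (levels[k] - levels[k - 1])

levels = [0.05, 0.20, 0.25, 0.50, 0.75, 0.95]
temps = [32.2, 35.2, 35.8, 38.3, 40.7, 44.3]   # degrees F, from the sample forecast
p_below_40 = prob_below(40.0, levels, temps)
p_above_40 = 1.0 - p_below_40                   # exceedance probability
```

With only these six points the interpolated value is about 0.68, close to but not identical with the 67.9% quoted above, which presumably comes from the full distribution.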

Sample Probability Density Function (PDF) for T

[Figure: Probability Density (0 to 0.12) plotted against Temperature (25 to 50).]

PDFs for a Single Event (PAJN)

EKDMOS Products/Services

• Grids in NDGD
  – Partners might find most value in EKDMOS.
  – AWIPS

• XML via SOAP web service (BAMS, April 2008)
  – Popular outlet (millions of queries/day) for NDFD grids today.
  – Will support EKDMOS queries as well.
  – Will support queries for user-defined thresholds.

• Meteograms/Plume Diagrams

• Limited number of graphics

Meteograms

• Uses 3 probability levels (0.10, 0.50, 0.90)

• Time series of forecast values

• Mullin Pass, Idaho (KMLP)

"Wings of Uncertainty"

Conclusions

1. Developed method for making MOS forecasts from ensembles for quasi-normally distributed variables.
   – Regression applied to NCEP ensemble means.
   – KDE, normal kernel, mean and standard deviation from regression.
   – Spread Adjustment.

2. Reliable and Robust.

3. Will be implemented in NDGD.

Introducing EKDMOS

Date post:	17-Dec-2015