Page 1

Forecast Evaluation Concepts

• Forecast evaluation basics – Tressa Fowler

• Evaluation of categorical variables – Tara Jensen

• Evaluation of continuous variables – Tressa Fowler

• Evaluation of probabilistic forecasts – Barbara Brown

• Intro to spatial forecast verification – Barbara Brown

3:40 – 5:10 Exercises using R, a statistical tool

• Overview of tools and useful links

• Introduction to R (Tara Jensen)

Page 2

Basic Verification Concepts

Tressa L. Fowler

National Center for Atmospheric Research

Boulder, Colorado, USA

Page 3

Basic concepts - outline

• What is verification?
• Why verify?
• Identifying verification goals
• Forecast “goodness”
• Designing a verification study
• Types of forecasts and observations
• Matching forecasts and observations
• Verification attributes
• Miscellaneous issues
• Questions to ponder: Who? What? When? Where? Which? Why?

Page 4

What is verification?

• Verification is the process of comparing forecasts to relevant observations
– Verification is one aspect of measuring forecast goodness

• Verification measures the quality of forecasts (as opposed to their value)

• For many purposes a more appropriate term is “evaluation”

Page 5

Why verify?

• Purposes of verification (traditional definition)

– Administrative purpose
• Monitoring performance
• Choice of model or model configuration (has the model improved?)

– Scientific purpose
• Identifying and correcting model flaws
• Forecast improvement

– Economic purpose
• Improved decision making
• “Feeding” decision models or decision support systems

Page 6

Identifying verification goals

What questions do we want to answer?

• Examples:
In what locations does the model have the best performance?
Are there regimes in which the forecasts are better or worse?
Is the probability forecast well calibrated (i.e., reliable)?
Do the forecasts correctly capture the natural variability of the weather?

Other examples?

Page 7

Identifying verification goals (cont.)

• What forecast performance attribute should be measured?
– Related to the question as well as the type of forecast and observation

• Choices of verification statistics, measures, graphics
– Should match the type of forecast and the attribute of interest
– Should measure the quantity of interest (i.e., the quantity represented in the question)

Page 8

Forecast “goodness”

• Depends on the quality of the forecast

AND

• The user and his/her application of the forecast information


Page 9

Seasonal Forecast: Streamflow 15% > normal. Good or bad?


Page 10

Seasonal Forecast: Streamflow 15% > normal. Good or bad?

Page 11

Good forecast or Bad forecast?

• Agricultural users: No problem, draw full water rights at leisure.

• Rafting companies: Geared up for a busy rafting season, suffered losses when many sections of river were closed for safety.

Different users have different ideas about what makes a forecast good.

Different verification approaches can measure different types of “goodness”.

Page 12

Basic guide for developing verification studies

Consider the users…
– … of the forecasts
– … of the verification information

• What aspects of forecast quality are of interest for the user?
– Typically (always?) need to consider multiple aspects

Develop verification questions to evaluate those aspects/attributes

• Exercise: What verification questions and attributes would be of interest to …
– … operators of an electric utility?
– … a city emergency manager?
– … a mesoscale model developer?
– … aviation planners?

Page 13

Basic guide for developing verification studies

Identify observations that represent the event being forecast, including the
– Element (e.g., temperature, precipitation)
– Temporal resolution
– Spatial resolution and representation
– Thresholds, categories, etc.

Page 14

Observations are not truth

• We can’t know the complete “truth”.
• Observations generally are more “true” than a model analysis (at least they are relatively more independent).
• Observational uncertainty should be taken into account in whatever way possible. In other words, how well do adjacent observations match each other?

Page 15

Observations might be garbage if

• Not independent (of forecast or each other)

• Biased
– Space
– Time
– Instrument
– Sampling
– Reporting

• Measurement errors

• Not enough of them

Page 16

Basic guide for developing verification studies

Identify multiple verification attributes that can provide answers to the questions of interest

Select measures and graphics that appropriately measure and represent the attributes of interest

Identify a standard of comparison that provides a reference level of skill (e.g., persistence, climatology, old model)


Page 17

Types of forecasts, observations

• Continuous
– Diurnal temperature range
– Rainfall amount
– Annual snowfall

• Categorical
– Dichotomous
  Rain vs. no rain
  Strong winds vs. no strong wind
  Night frost vs. no frost
  Often formulated as Yes/No
– Multi-category
  Cloud amount category
  Precipitation type
– May result from subsetting continuous variables into categories
  Ex: Temperature categories of 0-10, 11-20, 21-30, etc.

Page 18

Types of forecasts, observations

• Probabilistic
– Observation can be dichotomous, multi-category, or continuous
  Precipitation occurrence – dichotomous (Yes/No)
  Precipitation type – multi-category
  Temperature distribution – continuous
– Forecast can be
  Single probability value (for dichotomous events)
  Multiple probabilities (discrete probability distribution for multiple categories)
  Continuous distribution
– For dichotomous or multiple categories, probability values may be limited to certain values (e.g., multiples of 0.1)

• Ensemble
– Multiple iterations of a continuous or categorical forecast
  May be transformed into a probability distribution
– Observations may be continuous, dichotomous or multi-category

[Figures: 2-category precipitation forecast (PoP) for US; ECMWF 2-m temperature meteogram for Helsinki]

Page 19

Verification attributes

• Verification attributes measure different aspects of forecast quality
– Represent a range of characteristics that should be considered
– Many can be related to joint, conditional, and marginal distributions of forecasts and observations

Page 20

                     Tornado Observed
Tornado forecast     yes      no      Total fc
yes                   30      70        100
no                    20    2680       2700
Total obs             50    2750       2800

Joint: The probability of two events in conjunction.
Pr(Tornado forecast AND Tornado observed) = 30 / 2800 = 0.01

Conditional: The probability of one variable given that the second is already determined.
Pr(Tornado Observed | Tornado Fcst) = 30 / 100 = 0.30

Marginal: The probability of one variable without regard to the other.
Pr(Yes Forecast) = 100 / 2800 = 0.04
Pr(Yes Obs) = 50 / 2800 = 0.02
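The workshop exercises use R; as a minimal sketch, these probabilities can be computed directly from the counts (the matrix below simply re-enters the table above):

```r
# Tornado contingency counts: rows = forecast (yes/no), cols = observed (yes/no)
tab <- matrix(c(30, 70,
                20, 2680),
              nrow = 2, byrow = TRUE,
              dimnames = list(fcst = c("yes", "no"), obs = c("yes", "no")))
n <- sum(tab)                                       # 2800 total cases

joint  <- tab["yes", "yes"] / n                     # Pr(fcst yes AND obs yes) ~ 0.01
cond   <- tab["yes", "yes"] / sum(tab["yes", ])     # Pr(obs yes | fcst yes) = 0.30
marg_f <- sum(tab["yes", ]) / n                     # Pr(fcst yes) ~ 0.04
marg_o <- sum(tab[, "yes"]) / n                     # Pr(obs yes) ~ 0.02
```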

Page 21

Verification attribute examples

• Bias - (Marginal distributions)

• Correlation - overall association (Joint distribution)

• Accuracy - differences (Joint distribution)

• Calibration - measures conditional bias (Conditional distributions)

• Discrimination - degree to which forecasts discriminate between different observations (Conditional distribution)

Page 22

Some key things to think about …

Who…
– …wants to know?

What…
– … does the user care about?
– … kind of parameter are we evaluating? What are its characteristics (e.g., continuous, probabilistic)?
– … thresholds are important (if any)?
– … forecast resolution is relevant (e.g., site-specific, area-average)?
– … are the characteristics of the obs (e.g., quality, uncertainty)?
– … are appropriate methods?

Why…
– …do we need to verify it?

Page 23

Some key things to think about…

How…

– …do you need/want to present results (e.g., stratification/aggregation)?

Which…

– …methods and metrics are appropriate?

– …methods are required (e.g., bias, event frequency, sample size)?

Page 24

Categorical Verification

Tara Jensen
NCAR/RAL/JNT

Contributions from Matt Pocernich, Eric Gilleland, Tressa Fowler, Barbara Brown and others

[Figure: Venn diagram of Forecast and Observation areas, labeled M (misses), H (hits), and F (false alarms)]

Page 25

Finley Tornado Data (1884)

Forecast answering the question: Will there be a tornado? YES / NO

Observation answering the question: Did a tornado occur? YES / NO

Answers fall into 1 of 2 categories: forecasts and obs are binary.

Page 26

Finley Tornado Data (1884) Contingency Table

              Observed
Forecast    Yes     No    Total
Yes          28     72     100
No           23   2680    2703
Total        51   2752    2803

Page 27

A Success?

              Observed
Forecast    Yes     No    Total
Yes          28     72     100
No           23   2680    2703
Total        51   2752    2803

Percent Correct = (28+2680)/2803 = 96.6% !!!!

Page 28

What if the forecaster never forecasted a tornado?

              Observed
Forecast    Yes     No    Total
Yes           0      0       0
No           51   2752    2803
Total        51   2752    2803

Percent Correct = (0+2752)/2803 = 98.2% !!!!

Page 29

Maybe accuracy is not the most informative statistic. But the contingency table concept is good…

Page 30

2 x 2 Contingency Table

                    Observed
                Yes             No                Total
Forecast Yes    Hit             False Alarm
Forecast No     Miss            Correct Negative
Total           Obs. Yes        Obs. No           Total

Example: Accuracy = (Hits + Correct Negatives)/Total

Page 31

Common Notation (however, not universal notation)

              Observed
Forecast    Yes    No    Total
Yes          a      b     a+b
No           c      d     c+d
Total       a+c    b+d     n

Example: Accuracy = (a+d)/n

Page 32

What if data are not binary?

Temperature < 0 C
Precipitation > 1 inch
CAPE > 1000 J/kg
Ozone > 20 µg/m³
Winds at 80 m > 24 m/s
500 mb HGTS < 5520 m
Radar Reflectivity > 40 dBZ
MSLP < 990 hPa
LCL < 1000 ft
Cloud Droplet Concentration > 500/cc

Hint: Pick a threshold that is meaningful to your end-user.

Page 33

Contingency Table for Freezing Temps (i.e., T <= 0 C)

                Observed
Forecast    <= 0C   > 0C   Total
<= 0C         a       b     a+b
> 0C          c       d     c+d
Total        a+c     b+d     n

Another example: Base Rate (aka sample climatology) = (a+c)/n

Page 34

Alternative Perspective on Contingency Table

[Figure: Venn diagram with a "Forecast = yes" circle and an "Observed = yes" circle; their overlap contains the Hits, the forecast-only region the False Alarms, the observed-only region the Misses, and the area outside both the Correct Negatives]

Page 35

Conditioning to form a statistic

• Considers the probability of one event given another event

• Notation: p(X|Y=1) is the probability of X occurring given Y=1, or in other words Y=yes

Conditioning on Fcst provides:
• Info about how your forecast is performing
• Apples-to-oranges comparison if comparing stats from 2 models

Conditioning on Obs provides:
• Info about the ability of the forecast to discriminate between event and non-event - also called Conditional Probability or “Likelihood”
• Apples-to-apples comparison if comparing stats from 2 models

Page 36

Conditioning on forecasts

With Forecast = yes denoted f=1 and Observed = yes denoted x=1:

p(x=1|f=1) = a/(a+b) = Fraction of Hits
p(x=0|f=1) = b/(a+b) = False Alarm Ratio

Page 37

Conditioning on observations

With Forecast = yes denoted f=1 and Observed = yes denoted x=1:

p(f=1|x=1) = a/(a+c) = Hit Rate
p(f=0|x=1) = c/(a+c) = Fraction of Misses

Page 38

What’s considered good?

Conditioning on Forecast
Fraction of hits - p(x=1|f=1) = a/(a+b): close to 1
False Alarm Ratio - p(x=0|f=1) = b/(a+b): close to 0

Conditioning on Observations
Hit Rate - p(f=1|x=1) = a/(a+c): close to 1 [aka Probability of Detection Yes (PODy)]
Fraction of misses - p(f=0|x=1) = c/(a+c): close to 0

Page 39

Examples of categorical scores (most based on conditioning)

• Hit Rate (PODy) = a/(a+c)
• PODn = d/(b+d) = (1 – POFD)
• False Alarm Rate (POFD) = b/(b+d)
• False Alarm Ratio (FAR) = b/(a+b)
• (frequency) Bias (FBIAS) = (a+b)/(a+c)
• Threat Score or Critical Success Index (CSI) = a/(a+b+c)

POD = Probability of Detection; POFD = Probability of False Detection.

Page 40

Examples of CTC calculations

              Observed
Forecast    Yes     No    Total
Yes          28     72     100
No           23   2680    2703
Total        51   2752    2803

Threat Score = 28 / (28 + 72 + 23) = 0.228
Probability of Detection = 28 / (28 + 23) = 0.55
False Alarm Ratio = 72 / (28 + 72) = 0.720
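A minimal R sketch of these calculations, using ad hoc variable names for the table counts (a = hits, b = false alarms, c = misses, d = correct negatives):

```r
# Finley counts: a = hits, b = false alarms, c = misses, d = correct negatives
a <- 28; b <- 72; c <- 23; d <- 2680
n <- a + b + c + d

accuracy <- (a + d) / n        # percent correct: 0.966
pod      <- a / (a + c)        # hit rate (PODy): 0.55
podn     <- d / (b + d)        # 1 - POFD
pofd     <- b / (b + d)        # false alarm rate
far      <- b / (a + b)        # false alarm ratio: 0.72
fbias    <- (a + b) / (a + c)  # frequency bias
csi      <- a / (a + b + c)    # threat score / CSI: 0.228
```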

Page 41

Skill Scores

How do you compare the skill of easy-to-predict events with that of difficult-to-predict events?

• Provides a single value to summarize performance.
• Reference forecast - best naive guess; persistence; climatology.
• Reference forecast must be comparable.
• Perfect forecast implies that the object can be perfectly observed.

Page 42

Generic Skill Score

SS = (A − A_ref) / (A_perf − A_ref)

where A = any measure, ref = reference, perf = perfect

Positively oriented, and 1 is optimal.

Example, where MSE = Mean Square Error (a perfect forecast has MSE = 0):

SS = 1 − MSE / MSE_climo

Climo could be a separate forecast or a gridded forecast sample climatology.

Page 43

Commonly Used Skill Scores

• Gilbert Skill Score - based on the CSI, corrected for the number of hits that would be expected by chance.

• Heidke Skill Score - based on Accuracy, corrected by the number of hits that would be expected by chance.

• Hanssen-Kuipers Discriminant (Peirce Skill Score) - measures the ability of the forecast to discriminate between (or correctly classify) events and non-events. H-K = POD − POFD

• Brier Skill Score for probabilistic forecasts

• Fractional Skill Score for neighborhood methods

• Intensity-Scale Skill Score for wavelet methods
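These can be computed in R by continuing the contingency-table sketch above (same a, b, c, d, n counts); the formulas are the standard textbook ones, not from any particular package:

```r
# Chance-corrected skill scores from the same 2x2 counts (a, b, c, d, n as above)
a_ref <- (a + b) * (a + c) / n                    # hits expected by chance
gss   <- (a - a_ref) / (a + b + c - a_ref)        # Gilbert Skill Score (ETS)
hss   <- 2 * (a * d - b * c) /
         ((a + c) * (c + d) + (a + b) * (b + d))  # Heidke Skill Score
hk    <- a / (a + c) - b / (b + d)                # Hanssen-Kuipers (Peirce): POD - POFD
```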

Page 44

Accounting for Uncertainty

• Observational

• Model
– Model parameters
– Physics
– Verification scores

• Sampling
– Verification statistic is a realization of a random process
– What if the experiment were re-run under identical conditions?

Page 45

When should sampling variability be considered?

• When you are comparing two forecasts of the same event, evaluate the differences.
• Sampling variability is large and can quickly overwhelm small but significant differences.

[Figure: 6-hr accumulated precip from Model 1, Model 2, and the observation]

Page 46

Confidence Intervals (CIs)

“If we re-run the experiment N times, and create N (1−α)100% CIs, then we expect the true value of the parameter to fall inside (1−α)·N of the intervals.”

Confidence intervals can be parametric or non-parametric…

Page 47

Confidence Intervals (CI’s)

• Parametric
– Assume the observed sample is a realization from a known population distribution with possibly unknown parameters (e.g., normal).
– Normal approximation CIs are most common.
– Quick and easy.

Page 48

How to calculate Normal Approx. CIs

Example: Let X1, …, Xn be an independent and identically distributed (iid) sample from a normal distribution with variance σ_X². Then X̄ = (1/n) Σ_{i=1}^{n} X_i is an estimate of the mean of the sample. A (1−α)100% CI for the mean is given by

X̄ ± z_{α/2} · σ_X / √n

Note: You can find much more about these ideas in any basic statistics textbook.

Page 49

Uncertainty

              Observed
Forecast    Yes     No    Total
Yes          28     72     100
No           23   2680    2703
Total        51   2752    2803

Hit rate = 0.55; 95% normal approximation CI ≈ (0.41, 0.69)
FAR = 0.72; 95% normal approximation CI ≈ (0.63, 0.81)
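As a minimal sketch, these intervals can be reproduced in base R by treating each score as a binomial proportion; prop_ci is an ad hoc helper, not a package function:

```r
# Normal-approximation 95% CI for a proportion estimated from m successes in n trials
prop_ci <- function(m, n, alpha = 0.05) {
  p  <- m / n
  se <- sqrt(p * (1 - p) / n)
  p + c(-1, 1) * qnorm(1 - alpha / 2) * se
}

prop_ci(28, 51)   # hit rate 0.55 -> approx (0.41, 0.69)
prop_ci(72, 100)  # FAR 0.72 -> approx (0.63, 0.81)
```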

Page 50

Confidence Intervals (CI’s)

• Nonparametric
– Assume the distribution of the observed sample is representative of the population distribution.
– Bootstrap CIs are most common.
– Can be computationally intensive, but easy enough.

Page 51

(Nonparametric) Bootstrap CIs

IID Bootstrap Algorithm

1. Resample with replacement from the sample X1, …, Xn.
2. Calculate the verification statistic(s) of interest from the resample in step 1.
3. Repeat steps 1 and 2 many times, say B times, to obtain a sample of the verification statistic(s) θB.
4. Estimate (1−α)100% CIs from the sample in step 3.
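A bare-bones base-R sketch of this algorithm for the hit rate; the paired 0/1 vectors and B = 1000 are illustrative choices, not part of the original exercise:

```r
set.seed(1)
# Paired forecast/observation events as 0/1 vectors (toy data for illustration)
f <- rbinom(500, 1, 0.3)
x <- rbinom(500, 1, 0.3)

hit_rate <- function(f, x) sum(f == 1 & x == 1) / sum(x == 1)

B     <- 1000
theta <- replicate(B, {
  idx <- sample(seq_along(f), replace = TRUE)  # step 1: resample pairs
  hit_rate(f[idx], x[idx])                     # step 2: recompute the statistic
})
quantile(theta, c(0.025, 0.975))               # step 4: 95% percentile CI
```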

Page 52

[Figure: empirical distribution (histogram) of the statistic θB calculated on repeated samples; the bounds of the 90% CI cut off 5% in each tail]

Page 53

Verification of Continuous Forecasts

Presented by Tressa L. Fowler

Adapted from presentations created by Barbara Casati and Barbara Brown

Page 54

• Exploratory methods

– Scatter plots

– Discrimination plots

– Box plots

• Statistics

– Bias

– Error statistics

– Robustness

– Comparisons

Page 55

Exploratory methods: joint distribution

Scatter-plot: plot of observation versus forecast values

Perfect forecast = obs: points should be on the 45° diagonal

Provides information on: bias, outliers, error magnitude, linear association, peculiar behaviours in extremes, misses and false alarms (link to contingency table)

Page 56

Exploratory methods: marginal distribution

Quantile-quantile plots: OBS quantile versus the corresponding forecast (FRCS) quantile
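A minimal base-R sketch of both plots; frcs and obs are toy stand-ins for paired forecast and observed values:

```r
# Toy paired data standing in for forecasts (frcs) and observations (obs)
set.seed(1)
obs  <- rnorm(200, mean = 20, sd = 5)
frcs <- obs + rnorm(200, mean = -2, sd = 3)  # biased, noisy forecast

plot(frcs, obs, main = "Scatter-plot")       # joint distribution
abline(0, 1)                                 # 45-degree perfect-forecast line

qqplot(frcs, obs, main = "QQ-plot")          # compares marginal distributions
abline(0, 1)
```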

Page 57

Scatter-plot and qq-plot: example 1

Q: Is there any bias? Positive (over-forecast) or negative (under-forecast)?

Page 58

Scatter-plot and qq-plot: example 2

Describe the peculiar behaviour of low temperatures

Page 59

Scatter-plot: example 3

Describe how the error varies as the temperatures grow (note the outlier).

Page 60

Scatter-plot and Contingency Table

Does the forecast correctly detect temperatures above 18 degrees?
Does the forecast correctly detect temperatures below 10 degrees?

Page 61

Page 62

Example Box (and Whisker) Plot


Page 63

Exploratory methods: marginal distributions

Visual comparison: histograms, box-plots, …

Summary statistics:

• Location:
  mean = X̄ = (1/n) Σ_{i=1}^{n} x_i
  median = q_0.5

• Spread:
  st dev = √[ (1/(n−1)) Σ_{i=1}^{n} (x_i − X̄)² ]
  Inter Quartile Range IQR = q_0.75 − q_0.25

        MEAN    MEDIAN   STDEV   IQR
FRCS    18.62   17.00    5.99    9.75
OBS     20.71   20.25    5.18    8.52

Page 64

Exploratory methods: conditional distributions

Conditional histogram and conditional box-plot

Page 65

Exploratory methods: conditional qq-plot

Page 66

Continuous scores: linear bias

linear bias = Mean Error = (1/n) Σ_{i=1}^{n} (f_i − o_i) = f̄ − ō

Mean Error = average of the errors = difference between the means

It indicates the average direction of the error: positive bias indicates over-forecast, negative bias indicates under-forecast.

It does not indicate the magnitude of the error (positive and negative errors can cancel out).

Attribute: measures the bias

Page 67

Mean Absolute Error

MAE = (1/n) Σ_{i=1}^{n} |f_i − o_i|

Average of the magnitude of the errors

Linear score = each error has the same weight

It does not indicate the direction of the error, just the magnitude

Attribute: measures accuracy

Page 68

Median Absolute Deviation

MAD = median{ |f_i − o_i| }

Median of the magnitude of the errors

Very robust: extreme errors have no effect

Attribute: measures accuracy

Page 69

Continuous scores: MSE

MSE = (1/n) Σ_{i=1}^{n} (f_i − o_i)²

Average of the squares of the errors: it measures the magnitude of the error, weighted on the squares of the errors

It does not indicate the direction of the error

Quadratic rule, therefore large weight on large errors:
good if you wish to penalize large errors
sensitive to large values (e.g., precipitation) and outliers; sensitive to large variance (high-resolution models); encourages conservative forecasts (e.g., climatology)

Attribute: measures accuracy

Page 70

Continuous scores: RMSE

RMSE = √MSE = √[ (1/n) Σ_{i=1}^{n} (f_i − o_i)² ]

RMSE is the square root of the MSE: it measures the magnitude of the error, retaining the unit of the variable (e.g., °C)

Similar properties to MSE: it does not indicate the direction of the error; it is defined with a quadratic rule = sensitive to large values, etc.

NOTE: RMSE is always greater than or equal to the MAE

Attribute: measures accuracy
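Gathering these continuous scores into one short R sketch, reusing the toy frcs and obs vectors from the scatter-plot example above:

```r
# Continuous verification scores for paired vectors frcs (forecast) and obs (observed)
err  <- frcs - obs
me   <- mean(err)           # mean error (linear bias)
mae  <- mean(abs(err))      # mean absolute error
mad_ <- median(abs(err))    # median absolute deviation of the errors
mse  <- mean(err^2)         # mean squared error
rmse <- sqrt(mse)           # root mean squared error, same units as the variable
```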

Page 71

Forecast Lead Time

[Figure: score plotted against forecast lead time (24, 48, 72, 96, 120 h) for Model 1 and Model 2]

Page 72

Continuous scores: linear correlation

r_XY = [ (1/n) Σ_{i=1}^{n} (y_i − ȳ)(x_i − x̄) ] / { √[(1/n) Σ_{i=1}^{n} (y_i − ȳ)²] · √[(1/n) Σ_{i=1}^{n} (x_i − x̄)²] } = cov(Y, X) / (s_Y s_X)

Measures linear association between forecast and observation

Y and X rescaled (non-dimensional) covariance: ranges in [−1, 1]

It is not sensitive to the bias

Not robust = better if data are normally distributed
Not resistant = sensitive to large values and outliers

Attribute: measures association

Page 73

Scores for continuous forecasts

Simplest overall measure of performance: correlation coefficient

r_fx = cov(f, x) / √[Var(f) Var(x)] = [ Σ_{i=1}^{n} (f_i − f̄)(x_i − x̄) ] / [ (n−1) s_f s_x ]
Page 74

Continuous scores: anomaly correlation

• Correlation calculated on anomalies.
• An anomaly is the difference between what was forecast (observed) and climatology.
• Centered or uncentered versions.

Page 75

MSE and bias correction

• MSE is the sum of the squared bias and the variance of the errors, so the bias-corrected MSE is MSE − ME².

MSE = (f̄ − ō)² + s_f² + s_o² − 2 s_f s_o r_fo

or

MSE = ME² + var(f − o)
Page 76

Continuous skill scores: good practice rules

• Use the same climatology for the comparison of different models.
• When evaluating the Reduction of Variance, sample climatology always gives a worse skill score than long-term climatology: always ask which climatology is used to evaluate the skill.

Page 77

Continuous skill scores: good practice rules

• If the climatology is calculated by pooling data from many different stations and times of the year, the skill score will be better than if a different climatology for each station and month of the year is used.
• In the former case the model gets credit for correctly forecasting seasonal trends and the climatologies of specific locations.
• In the latter case the specific topographic effects and long-term trends are removed and the forecast's discriminating capability is better evaluated. Choose the appropriate climatology for fulfilling your verification purposes.
• Persistence forecast: use the same time of the day to avoid diurnal cycle effects.

Page 78

Continuous Scores of Ranks

Problem: Continuous scores are sensitive to large values, or non-robust.

Solution: Use the ranks of the variable, rather than its actual values.

The value-to-rank transformation:
• diminishes effects due to large values
• transforms the distribution to a Uniform distribution
• removes bias

Rank correlation is the most common.

Temp (°C)  22.3  24.6  25.5  19.8  23.1  24.2  21.7  27.4
Rank         3     6     7     1     4     5     2     8
Page 79

Evaluation of Probability Forecasts

Barbara Brown
Joint Numerical Testbed
NCAR, Boulder, CO

July 2011

Acknowledgments: Tom Hamill, Laurence Wilson, Tressa Fowler

Page 80

Questions to ask before beginning

• How were the probability forecasts constructed?
– Subjective forecasts (i.e., human generated)
– Statistical methods
– Ensemble forecasts (i.e., model-based)

• What are the “events” being forecast?
– Often the “event” is confused with the forecast

• Multi-category or dichotomous?
– Extended methods needed for multi-category

• How are your forecasts used?
– Kinds of decisions and decision makers (decision-making “systems”)

Page 81

Dichotomous variables

• Observations have 2 possible values. Examples:
Rain / No rain
Temperature > 40°C or ≤ 40°C

• Forecasts can
– Have multiple values (e.g., 0, .1, .2, …, 1)
– Be continuous between 0 and 1

• Probability forecasts are a special form of continuous or categorical forecast

• Extension to multiple categories: each observed category is assigned a forecast probability

Page 82

Verifying a probabilistic forecast

• You cannot verify a probabilistic forecast with a single observation.
• The more data you have for verification, the more certain you are (as is true in general for other statistical measures).
• Rare events (low probability) require more data to verify.

Page 83

The Brier Score

BS = (1/n) Σ_{k=1}^{n} (f_k − x_k)²

• Analogous to MSE…
– The observation, x, takes on values of 0 and 1
– The forecast, f, is a probability value

• Measures the average squared error in the probabilities
– Large errors result in large penalties
Page 84: Forecast evaluation basics...Forecast Evaluation Concepts •Forecast evaluation basics –Tressa Fowler •Evaluation of categorical variables –Tara Jensen •Evaluation of continuous

Brier skill score

Copyright UCAR 2011, all rights reserved.

refref

ref

BS

BS1

BS0

BSBSBSS

The Brier Skill Score (BSS) measures the relative improvement of the forecasts over a reference

forecast

Typically, the reference forecast is the “sample climatology” – i.e., the frequency with which the

“event” actually occurred

Page 85

Decomposition of the Brier Score

BS = (1/n) Σ_{i=1}^{I} N_i (f_i − x̄_i)² − (1/n) Σ_{i=1}^{I} N_i (x̄_i − x̄)² + x̄(1 − x̄)

(Reliability − Resolution + Uncertainty)

• Decomposition is based on “categories” (bins) of probability values
• Reliability and Resolution are measures of forecast performance
• Uncertainty depends only on the observations
– Measure of forecast “difficulty”
Page 86

Brier Skill Score

BS = (1/n) Σ_{i=1}^{I} N_i (f_i − x̄_i)² − (1/n) Σ_{i=1}^{I} N_i (x̄_i − x̄)² + x̄(1 − x̄)

(Reliability − Resolution + Uncertainty)

With sample climatology as the reference:

BSS = (RES − REL) / UNC
Page 87

Components of the Brier Score

• Reliability: REL = (1/n) Σ_{i=1}^{I} N_i (f_i − x̄_i)²
Measures how well the conditional relative frequency of events matches the forecast

• Resolution: RES = (1/n) Σ_{i=1}^{I} N_i (x̄_i − x̄)²
Measures how well the forecasts distinguish situations with different frequencies of occurrence

• Uncertainty: UNC = x̄(1 − x̄)
Measures the variability in the observations (i.e., the difficulty of the forecast situations)
Page 88

Properties of a perfect probabilistic forecast of a binary event

[Figure: forecast frequency distributions conditioned on observed non-events and observed events; wide separation of the two distributions illustrates resolution and sharpness, and agreement of forecast probabilities with observed frequencies illustrates reliability]

Page 89

Our friend, the scatterplot


Page 90

Introducing the reliability diagram! (a close relative of the attribute diagram)

• Analogous to the scatter plot - the same intuition holds
• Data must be binned!
• Hides how much data is represented by each category
• Expresses conditional probabilities
• Confidence intervals can illustrate the problems with small sample sizes

Page 91

Reliability Diagram

From the Eumetcal module on forecast verification

Page 92

Reliability Diagram Characteristics

From the Eumetcal module on forecast verification

Page 93

Reliability Diagram Characteristics

[Figure: example reliability diagrams illustrating, in turn:]
• Probs under-forecast
• No skill
• Perfect categorical forecast
• Tends to the mean, some skill
• Too few samples
• Relatively reliable, rare event
• No resolution
• Over-resolved forecast
• Typical categorical forecast

Page 94

Sharpness is also important

• “Sharpness” measures the specificity of probability forecasts
• Given two reliable forecast systems, the one producing the sharper forecasts is preferable.
• Sharpness without reliability implies unrealistic confidence.
• Sharpness ≠ Resolution.
• Sharpness is a function of the forecasts only.

From the Eumetcal module on forecast verification

Page 95

Discrimination

Measures the ability of forecasts to distinguish between situations leading to the occurrence and non-occurrence of an event

Depends on:
• Separation of the means of the conditional distributions
• Variance within the conditional distributions

[Figure: three panels of forecast frequency distributions for observed events versus observed non-events, labeled good discrimination (well-separated means), poor discrimination (heavily overlapping), and good discrimination (small within-distribution variance)]

Page 96

Receiver Operating Characteristic (ROC)

• Another approach for examining discrimination between events and non-events
• Formed by setting multiple thresholds for the forecast value
– For each threshold, treat the forecast as categorical (i.e., Yes/No)
– Analogous to setting “decision thresholds” in the probabilities
• For each threshold compute POD and POFD (often called the “hit rate” and the “false alarm rate”)
• Plot the POD and POFD values for each threshold against each other using a scatter plot
• ROCs do not take into account reliability – they measure “potential skill”
– Need to examine reliability in addition
– Allows comparison of forecasts with different biases
• Typically used for probability forecasts, but can be used for any forecasts that can be thresholded
Page 97

Empirical ROC Curve

The diagonal line represents No Skill (a hit is just as likely as a false alarm).

If the line falls under the diagonal, the forecast is worse than a random guess.

The area under the ROC curve (AUC) is a useful measure of skill: Perfect = 1, Random = 0.5.
Page 98

Useful references

• Good overall references for forecast verification:
– (1) Wilks, D.S., 2006: Statistical Methods in the Atmospheric Sciences (2nd Ed.). Academic Press, 627 pp.
– (2) WMO Verification working group forecast verification web page, http://www.cawcr.gov.au/projects/verification/
– (3) Jolliffe, I.T., and D.B. Stephenson, 2003: Forecast Verification: A Practitioner's Guide in Atmospheric Science. Wiley and Sons Ltd, 240 pp.

• Rank histograms: Hamill, T.M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550-560.

• Spread-skill relationships: Whitaker, J.S., and A.F. Loughe, 1998: The relationship between ensemble spread and ensemble mean skill. Mon. Wea. Rev., 126, 3292-3302.

• Brier score, continuous ranked probability score, reliability diagrams: Wilks text again.

• Relative operating characteristic: Harvey, L.O., Jr., and others, 1992: The application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev., 120, 863-883.

• Economic value diagrams:
– (1) Richardson, D.S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Royal Meteor. Soc., 126, 649-667.
– (2) Zhu, Y., and others, 2002: The economic value of ensemble-based weather forecasts. Bull. Amer. Meteor. Soc., 83, 73-83.

• Overestimating skill: Hamill, T.M., and J. Juras, 2006: Measuring forecast skill: is it real skill or is it the varying climatology? Quart. J. Royal Meteor. Soc., Jan 2007 issue. http://tinyurl.com/kxtct

Page 99

Spatial Verification Methods

Barbara Brown ([email protected])
National Center for Atmospheric Research (NCAR), Boulder, Colorado

Collaborators: Randy Bullock, John Halley Gotway, David Ahijevych, Eric Gilleland, Beth Ebert, Barbara Casati

July 2011

Page 100

Challenge of High Resolution

[Figure: examples of 12-h accumulated precip. THEN: 190-km LFM, 1977; NOW: 3-km WRF, 2009. From Fawcett, BAMS]

Page 101

Traditional approach

Consider gridded forecasts and observations of precipitation. Which is better?

[Figure: one observed precipitation field (OBS) and five candidate forecasts, numbered 1-5]

Page 102

Traditional approach

[Figure: the same OBS field and forecasts 1-5]

Scores for Examples 1-4:
Correlation Coefficient = -0.02
Probability of Detection = 0.00
False Alarm Ratio = 1.00
Hanssen-Kuipers = -0.03
Gilbert Skill Score (ETS) = -0.01

Scores for Example 5:
Correlation Coefficient = 0.2
Probability of Detection = 0.88
False Alarm Ratio = 0.89
Hanssen-Kuipers = 0.69
Gilbert Skill Score (ETS) = 0.08

Forecast 5 is “Best”

Page 103

Traditional approach

[Figure: the same OBS field and forecasts 1-5]

Some problems with the traditional approach:

(1) Non-diagnostic: it doesn't tell us what was wrong with the forecast, or what was right

(2) Ultra-sensitive to small errors in simulation of localized phenomena

Page 104

Spatial forecasts

Weather variables (e.g., precipitation) defined over spatial domains have coherent structure and features.

Spatial methods aim to:
• Account for uncertainties in timing and location
• Account for spatial structure
• Provide information on error in physical terms
• Provide information that is
– Diagnostic
– Meaningful to forecast users

Page 105

Spatial Method Categories

Page 106

New spatial verification approaches

Neighborhood
Give credit to "close" forecasts

Scale separation
Measure scale-dependent error

Field deformation
Measure distortion and displacement (phase error) for the whole field. How should the forecast be adjusted to make the best match with the observed field?

Object- and feature-based
Evaluate attributes of identifiable features

Page 107

Scale separation methods

• Goal: Examine performance as a function of spatial scale

• Example: Power spectra
– Does it look real?
– Harris et al. (2001) compared multi-scale statistics for model and radar data

From Harris et al. 2001

Page 108

Scale separation methods

Example methods:
• Intensity-scale (Casati et al. 2004)
• Multi-scale variability (Zepeda-Arce et al. 2000; Harris et al. 2001; Mittermaier 2006)
• Variogram (Marzban and Sandgathe 2009)

Page 109

Neighborhood verification

Goal: Examine forecast performance in a region; don't require exact matches

• Also called “fuzzy” verification

• Example: Upscaling
– Put observations and/or forecast on a coarser grid
– Calculate traditional metrics

• Provides information about the scales where the forecasts have skill

Page 110

Neighborhood methods

Example methods:
• Distribution approach (Marsigli)
• Fractions Skill Score (Roberts 2005; Roberts and Lean 2008; Mittermaier and Roberts 2009)
• Multiple approaches (Ebert 2008, 2009) (e.g., upscaling, multi-event contingency table, practically perfect)

Page 111

Field deformation

Goal: Examine how much a forecast field needs to be transformed in order to match the observed field

Page 112

Field deformation methods

Example methods:
• Forecast Quality Index (Venugopal et al. 2005)
• Forecast Quality Measure/Displacement Amplitude Score (Keil and Craig 2007, 2009)
• Image Warping (Gilleland et al. 2009; Lindström et al. 2009; Engel 2009)

From Keil and Craig 2008

Page 113

Object/Feature-based

Goals:
1. Identify relevant features in the forecast and observed fields
2. Compare attributes of the forecast and observed features

MODE example 2008

Page 114

Object/Feature-based

Example methods:
• Cluster analysis (Marzban and Sandgathe 2006a,b)
• Composite (Nachamkin 2005, 2009)
• Contiguous Rain Area (CRA) (Ebert and McBride 2000; Ebert and Gallus 2009)
• MODE (Davis et al. 2006, 2009)
• Procrustes (Micheas et al. 2007; Lack et al. 2009)
• SAL (Wernli et al. 2008, 2009)

[Figures: composite centered on all observed events (Nachamkin); CRA example (Ebert and Gallus)]

Page 115

Limitations: Filtering (Neighborhood and Scale separation)

Does not clearly isolate specific errors (e.g., displacement, amplitude, structure)

Page 116

Limitations: Displacement methods (features-based, field deformation)

• May have somewhat arbitrary matching criteria
• Often many parameters to be defined
• More research needed on diagnosing mesoscale structure

Page 117

Strengths: Filtering (Neighborhood & Scale-Separation)

• Accounts for
– Unpredictable scales
– Uncertainty in observations
• Simple, ready to go
• Evaluates different aspects of a forecast (e.g., texture)
• Scale-dependent skill

Page 118

Strengths: Displacement

• Features-based
– credit for close forecasts
– measures displacement, structure
• Field-deformation
– distinguishes aspect ratio and orientation angle errors
– credit for close forecasts

Page 119

What do the new methods measure?

Attribute                  Traditional   Feature-based   Neighborhood   Scale        Field Deformation
Perf at different scales   Indirectly    Indirectly      Yes            Yes          No
Location errors            No            Yes             Indirectly     Indirectly   Yes
Intensity errors           Yes           Yes             Yes            Yes          Yes
Structure errors           No            Yes             No             No           Yes
Hits, etc.                 Yes           Yes             Yes            Indirectly   Yes

Page 120

Back to the original example… What can the new methods tell us?

Example:
• MODE “Interest” measures the overall ability of forecasts to match obs
• Interest values provide more intuitive estimates of performance than the traditional measure (ETS)
• But note: even for spatial methods, single measures don’t tell the whole story!

Page 121

Final comments

• Benefits of spatial methods
– Provide potential for greater insight into forecast performance
– Provide more meaningful comparisons of forecast performance

• Limitations
– Require gridded forecasts and observations
– May require setting many parameters
– Somewhat difficult to implement

Page 122

Many references and other information:
http://www.rap.ucar.edu/projects/icp/index.html

Software is available for many of the methods – see the website above.

MET (see lectures Thursday) includes several methods [MODE, Intensity-scale (wavelet), neighborhood]; the R package includes intensity-scale.

Recommended information resources: see the website above.