transcript
- Slide 1
- Daria Kluver. Independent Study. From Statistical Methods in the
Atmospheric Sciences by Daniel Wilks.
- Slide 2
- Let's review a few concepts that were introduced last time on
Forecast Verification.
- Slide 3
- Purposes of Forecast Verification. Forecast verification: the
process of assessing the quality of forecasts. Any given
verification data set consists of a collection of
forecast/observation pairs whose joint behavior can be
characterized in terms of the relative frequencies of the possible
combinations of forecast/observation outcomes. This is an empirical
joint distribution.
- Slide 4
- The Joint Distribution of Forecasts and Observations. Forecast =
y_i, i = 1, ..., I. Observation = o_j, j = 1, ..., J. The joint
distribution of the forecasts and observations is denoted
p(y_i, o_j). This is a discrete bivariate probability distribution
function associating a probability with each of the I x J possible
combinations of forecast and observation.
- Slide 5
- The joint distribution can be factored in two ways. The one used
in a forecasting setting is p(y_i, o_j) = p(o_j | y_i) p(y_i),
called the calibration-refinement factorization. p(o_j | y_i): if
y_i has been forecast, this is the probability of o_j happening. It
specifies how often each possible weather event occurred on those
occasions when the single forecast y_i was issued, or how well each
forecast is calibrated. p(y_i): the unconditional distribution,
which specifies the relative frequencies of use of each of the
forecast values y_i, sometimes called the refinement of a forecast.
The refinement of a set of forecasts refers to the dispersion of
the distribution p(y_i).
- Slide 6
- Scalar Attributes of Forecast Performance
- Slide 7
- Forecast Skill. Forecast skill: the relative accuracy of a set
of forecasts with respect to some set of standard control, or
reference, forecasts (such as climatological averages, persistence
forecasts, or random forecasts based on climatological relative
frequencies). Skill score: a percentage improvement over the
reference forecasts, SS_ref = (A - A_ref) / (A_perf - A_ref) x 100%,
where A is the accuracy of the forecasts, A_ref is the accuracy of
the reference forecasts, and A_perf is the accuracy that would be
achieved by perfect forecasts.
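A minimal sketch of the generic skill score, assuming the usual form (accuracy minus reference accuracy, divided by perfect accuracy minus reference accuracy, times 100). The function name and the numbers are made up for illustration. The same expression works for accuracy measures where larger is better and for error measures where smaller is better.

```python
def skill_score(accuracy, accuracy_ref, accuracy_perfect):
    """Percentage improvement of `accuracy` over the reference forecasts."""
    return 100.0 * (accuracy - accuracy_ref) / (accuracy_perfect - accuracy_ref)


# Hypothetical numbers: proportion correct of 0.80 against 0.60 for the
# reference forecasts; a perfect forecast would score 1.0.
ss_proportion_correct = skill_score(0.80, 0.60, 1.0)

# Hypothetical error measure: MSE of 4.0 against a reference MSE of 10.0;
# a perfect forecast would have MSE = 0.
ss_mse = skill_score(4.0, 10.0, 0.0)

print(ss_proportion_correct, ss_mse)
```

A forecast that is no better than the reference scores 0, and a perfect forecast scores 100.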
- Slide 8
- On to new material: 2x2 contingency tables; scalar attributes of
contingency tables; tornado example; NWS vs. weather.com vs.
climatology; skill scores; probabilistic forecasts; multicategory
discrete predictands; continuous predictands; plots and scores;
probability forecasts for multicategory events; nonprobabilistic
field forecasts.
- Slide 9
- Nonprobabilistic Forecasts of Discrete Predictands. A
nonprobabilistic forecast contains an unqualified statement that a
single outcome will occur. It contains no expression of
uncertainty.
- Slide 10
- The 2x2 Contingency Table. The simplest joint distribution is
from I = J = 2 (nonprobabilistic yes/no forecasts): I = 2 possible
forecasts and J = 2 outcomes. i = 1, or y_1: event will occur. i =
2, or y_2: event will not occur. j = 1, or o_1: event subsequently
occurs. j = 2, or o_2: event does not subsequently occur.
- Slide 11
- a forecast/observation pairs are called hits; their relative
frequency, a/n, is the sample estimate of the corresponding joint
probability p(y_1, o_1). b occasions are called false alarms; the
relative frequency b/n estimates the joint probability p(y_1, o_2).
c occasions are called misses; the relative frequency c/n estimates
the joint probability p(y_2, o_1). d occasions are called correct
rejections or correct negatives; the relative frequency d/n
estimates the joint probability p(y_2, o_2).
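As a rough illustration, the four counts can be tallied directly from yes/no forecast/observation pairs. The function name and the six pairs below are hypothetical, chosen only to show the bookkeeping.

```python
def contingency_counts(forecasts, observations):
    """Return (a, b, c, d) = (hits, false alarms, misses, correct rejections)."""
    a = b = c = d = 0
    for forecast_yes, observed_yes in zip(forecasts, observations):
        if forecast_yes and observed_yes:
            a += 1  # hit: forecast yes, observed yes
        elif forecast_yes:
            b += 1  # false alarm: forecast yes, observed no
        elif observed_yes:
            c += 1  # miss: forecast no, observed yes
        else:
            d += 1  # correct rejection: forecast no, observed no
    return a, b, c, d


# Made-up verification set of six forecast/observation pairs.
forecasts = [True, True, False, False, True, False]
observations = [True, False, True, False, True, False]
a, b, c, d = contingency_counts(forecasts, observations)
n = a + b + c + d
print(a, b, c, d, a / n)  # a / n estimates p(y_1, o_1)
```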
- Slide 12
- Scalar Attributes Characterizing 2x2 Contingency Tables.
Accuracy: proportion correct (PC), threat score (TS), odds ratio.
Bias: comparison of the average forecast with the average
observation. Reliability and resolution: false alarm ratio (FAR).
Discrimination: hit rate (H), false alarm rate (F).
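The formulas themselves did not survive in this transcript, so the sketch below uses the standard textbook definitions in terms of a, b, c, d. They reproduce the Finley tornado numbers quoted later in the presentation. The function name is hypothetical, and returning infinity for the odds ratio when b*c = 0 is a choice made here, not something from the slides.

```python
import math


def scalar_attributes(a, b, c, d):
    """Scalar attributes of a 2x2 table (a hits, b false alarms, c misses,
    d correct rejections). Assumes a + b, a + c and b + d are nonzero."""
    n = a + b + c + d
    return {
        "PC": (a + d) / n,          # proportion correct
        "TS": a / (a + b + c),      # threat score
        # Odds ratio is undefined when b * c == 0; infinity is used here.
        "odds_ratio": (a * d) / (b * c) if b * c > 0 else math.inf,
        "bias": (a + b) / (a + c),  # forecast yes / observed yes
        "FAR": b / (a + b),         # false alarm ratio
        "H": a / (a + c),           # hit rate
        "F": b / (b + d),           # false alarm rate
    }


# Finley tornado counts: a = 28, b = 72, c = 23, d = 2680.
finley = scalar_attributes(28, 72, 23, 2680)
for name, value in finley.items():
    print(f"{name}: {value:.4f}")
```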
- Slide 13
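- NWS, weather.com, climatology example. 12 random nights from Nov
6 to Dec 1. Will overnight lows be colder than or equal to
freezing?

  wx.com          observed yes   observed no   total
  forecast yes         5              0          5
  forecast no          2              5          7
  total                7              5

  NWS             observed yes   observed no   total
  forecast yes         6              0          6
  forecast no          1              5          6
  total                7              5

  clim            observed yes   observed no   total
  forecast yes         1              0          1
  forecast no          6              5         11
  total                7              5

  forecaster  a  b  c  d  PC     TS       odds ratio         bias     FAR  H
  wx.com      5  0  2  5  0.833  0.71429  undefined (b = 0)  0.71429  0    0.714
  NWS         6  0  1  5  0.917  0.85714  undefined (b = 0)  0.85714  0    0.857
  clim        1  0  6  5  0.5    0.14286  undefined (b = 0)  0.14286  0    0.143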
- Slide 14
- Skill Scores for 2x2 Contingency Tables. Heidke Skill Score
(HSS): based on the proportion correct, referenced to the
proportion correct that would be achieved by random forecasts that
are statistically independent of the observations. Peirce Skill
Score (PSS): similar to the Heidke Skill Score, except that the
reference hit rate in the denominator is that for random and
unbiased forecasts. Clayton Skill Score (CSS). Gilbert Skill Score
(GSS), or Equitable Threat Score. The odds ratio can also be used
as a skill score.
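The slide formulas are missing from this transcript, so the sketch below uses the commonly quoted closed forms in terms of a, b, c, d. Q here is (ad - bc)/(ad + bc), the odds-ratio-based score. The function name is hypothetical. With the Finley tornado counts the results agree with the skill scores reported in the presentation.

```python
def skill_scores_2x2(a, b, c, d):
    """Skill scores for a 2x2 table (a hits, b false alarms, c misses,
    d correct rejections). Assumes all denominators are nonzero."""
    n = a + b + c + d
    cross = a * d - b * c
    # Hits expected from random forecasts independent of the observations.
    a_random = (a + b) * (a + c) / n
    return {
        "HSS": 2.0 * cross / ((a + c) * (c + d) + (a + b) * (b + d)),
        "PSS": cross / ((a + c) * (b + d)),
        "CSS": cross / ((a + b) * (c + d)),
        "GSS": (a - a_random) / (a - a_random + b + c),
        "Q": cross / (a * d + b * c),
    }


# Finley tornado counts: a = 28, b = 72, c = 23, d = 2680.
finley_skill = skill_scores_2x2(28, 72, 23, 2680)
for name, value in finley_skill.items():
    print(f"{name} = {value:.3f}")
```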
- Slide 15
- Finley Tornado Forecasts example
- Slide 16
- Finley chose to evaluate his forecasts using the proportion
correct, PC = (28+2680)/2803 = 0.966, which is dominated by the
correct no forecasts. Gilbert pointed out that never forecasting a
tornado produces an even higher proportion correct: PC =
(0+2752)/2803 = 0.982. The threat score gives a better comparison,
because the large number of correct no forecasts is ignored: TS =
28/(28+72+23) = 0.228. The odds ratio is 45.3 > 1, suggesting
better than random performance. The bias ratio is B = 1.96,
indicating that approximately twice as many tornados were forecast
as actually occurred. FAR = 0.720, which expresses the fact that a
fairly large fraction of the forecast tornados did not eventually
occur. H = 0.549 and F = 0.0262, indicating that more than half of
the actual tornados were forecast to occur, whereas a very small
fraction of the non-tornado cases falsely warned of a tornado.
Skill scores: HSS = 0.355, PSS = 0.523, CSS = 0.271, GSS = 0.216,
Q = 0.957.
- Slide 17
- What if your data are probabilistic? For a dichotomous
predictand, converting from a probabilistic to a nonprobabilistic
format requires selecting a threshold probability, above which
the forecast will be yes. The choice ends up somewhat arbitrary.
- Slide 18
- Possible thresholds: the climatological probability of
precipitation; the threshold that would maximize the threat score;
the threshold that would produce unbiased forecasts (B = 1); the
threshold giving nonprobabilistic forecasts of the more likely of
the two events.
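To show why the threshold choice matters, here is a small sketch that converts probability forecasts to yes/no forecasts at several thresholds and reports the resulting bias ratio. The function names, the probabilities and the observations are all made up, and "above the threshold" is implemented as a strict inequality.

```python
def to_yes_no(probabilities, threshold):
    """Forecast yes whenever the forecast probability exceeds the threshold."""
    return [p > threshold for p in probabilities]


def bias_ratio(yes_no_forecasts, observations):
    """Number of yes forecasts divided by number of yes observations."""
    return sum(yes_no_forecasts) / sum(observations)


# Hypothetical probability forecasts and the corresponding outcomes.
probs = [0.1, 0.3, 0.6, 0.8, 0.4, 0.9, 0.2, 0.7]
obs = [False, False, True, True, False, True, False, False]

bias_by_threshold = {}
for threshold in (0.25, 0.5, 0.75):
    bias_by_threshold[threshold] = bias_ratio(to_yes_no(probs, threshold), obs)
    print(threshold, round(bias_by_threshold[threshold], 3))
```

In this toy data set a low threshold overforecasts the event (B > 1) and a high threshold underforecasts it (B < 1), so the same probability forecasts can look quite different after conversion.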
- Slide 19
- Multicategory Discrete Predictands. Make them into 2x2 tables.
For example, a 3x3 table of rain (R), mix (m), and snow (s)
forecasts and observations collapses into a 2x2 table of rain vs.
non-rain.
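One way to sketch the collapse: pick the category of interest and sum everything else into "non-event". The 3x3 counts below are invented for illustration, and the function name is hypothetical.

```python
def collapse_to_2x2(table, event):
    """Collapse a square table into (a, b, c, d) for one event category.

    table[i][j] is the number of times category i was forecast and
    category j was observed; `event` is the index of the category of interest.
    """
    k = len(table)
    others = [m for m in range(k) if m != event]
    a = table[event][event]                            # hits
    b = sum(table[event][j] for j in others)           # false alarms
    c = sum(table[i][event] for i in others)           # misses
    d = sum(table[i][j] for i in others for j in others)  # correct rejections
    return a, b, c, d


# Made-up counts; rows are forecasts, columns are observations,
# both in the order rain, mix, snow.
counts = [
    [50, 5, 2],
    [8, 10, 4],
    [3, 6, 20],
]
rain_table = collapse_to_2x2(counts, event=0)
print(rain_table)
```

Repeating the call with `event=1` and `event=2` gives the mix/non-mix and snow/non-snow tables, each of which can then be scored like any other 2x2 table.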
- Slide 20
- Nonprobabilistic Forecasts of Continuous Predictands. It is
informative to graphically represent aspects of the joint
distribution of nonprobabilistic forecasts for continuous
variables.
- Slide 21
- Conditional Quantile Plots: (a) performance of MOS forecasts, (b)
performance of subjective forecasts. These plots are examples of a
diagnostic verification technique, allowing diagnosis of the
particular strengths and weaknesses of a set of forecasts through
exposition of the full joint distribution. Conditional
distributions of the observations given the forecasts are
represented in terms of selected quantiles, with respect to the
perfect 1:1 line. The plots contain 2 parts, representing the 2
factors in the calibration-refinement factorization of the joint
distribution of forecasts and observations. For the MOS forecasts,
observed temperatures are consistently colder than the forecasts.
The subjective forecasts are essentially unbiased. The subjective
forecasts are also somewhat sharper, or more refined, with more
extreme temperatures being forecast more frequently.
- Slide 22
- Scalar Accuracy Measures. Only 2 scalar measures of forecast
accuracy for continuous predictands are in common use: Mean
Absolute Error and Mean Squared Error.
- Slide 23
- Mean Absolute Error (MAE). The arithmetic average of the absolute
values of the differences between the members of each
forecast/observation pair. MAE = 0 if forecasts are perfect. Often
used to verify temperature forecasts.
- Slide 24
- Mean Squared Error (MSE). The average squared difference between
the forecast and observation pairs. More sensitive to larger
errors than MAE. More sensitive to outliers. MSE = 0 for perfect
forecasts. RMSE = sqrt(MSE), which has the same physical dimensions
as the forecasts and observations. To calculate the bias of the
forecasts, compute the Mean Error (ME), the average of the
differences between the forecasts and the observations.
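A short sketch of these four measures for paired forecasts and observations. The temperatures are made up and the function name is hypothetical. The error is taken as forecast minus observation, so a positive ME means the forecasts run too high on average.

```python
import math


def continuous_accuracy(forecasts, observations):
    """Return MAE, MSE, RMSE and ME for paired forecasts and observations."""
    errors = [y - o for y, o in zip(forecasts, observations)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),  # same units as the forecasts
        "ME": sum(errors) / n,   # bias: mean forecast minus mean observation
    }


# Hypothetical temperature forecasts and observations.
forecast_temps = [30, 32, 28, 35]
observed_temps = [29, 33, 25, 35]
measures = continuous_accuracy(forecast_temps, observed_temps)
print(measures)
```

The single 3-degree error contributes 9 of the 11 squared-error units here, which illustrates why MSE is more sensitive to large errors than MAE.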
- Slide 25
- Skill Scores. Can be computed with MAE, MSE, or RMSE as the
underlying accuracy statistic, for example with the climatological
value for day k as the reference forecast.
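Because a perfect forecast has MSE = 0, the generic skill score reduces to 1 - MSE/MSE_ref. The sketch below assumes the reference forecast is the climatological value for each day. All values are invented and the function names are hypothetical.

```python
def mean_squared_error(forecasts, observations):
    """Average squared difference between paired forecasts and observations."""
    return sum((y - o) ** 2 for y, o in zip(forecasts, observations)) / len(forecasts)


def mse_skill_score(forecasts, climatology, observations):
    """Skill relative to climatology: 1 - MSE / MSE_clim (as a fraction)."""
    mse = mean_squared_error(forecasts, observations)
    mse_clim = mean_squared_error(climatology, observations)
    return 1.0 - mse / mse_clim


# Hypothetical values; climatology[k] is the climatological value for day k.
forecast_temps = [30, 32, 28, 35]
climatology = [31, 31, 30, 30]
observed_temps = [29, 33, 25, 35]
ss_clim = mse_skill_score(forecast_temps, climatology, observed_temps)
print(round(ss_clim, 3))
```

Multiplying by 100 gives the percentage improvement over climatology. Negative values mean the forecasts are worse than climatology.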
- Slide 26
- Probability Forecasts of Discrete Predictands. The Joint
Distribution for Dichotomous Events. Forecasts are no longer
restricted to probabilities of 0 and 1. For each possible forecast
probability we see the relative frequency with which that forecast
value was used, and the probability that the event o_1 occurred
given the forecast y_i.
- Slide 27
- The Brier Score (BS). Scalar accuracy measure for verification of
probabilistic forecasts of dichotomous events. This is the mean
squared error of the probability forecasts, where o_1 = 1 if the
event occurs and o_2 = 0 if the event does not occur. A perfect
forecast has BS = 0; less accurate forecasts receive higher BS.
Brier Skill Score: BSS = 1 - BS/BS_ref, since BS = 0 for perfect
forecasts.
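A minimal sketch of the Brier score and Brier skill score. The reference forecast used here is the sample climatological relative frequency, which is one common choice but an assumption on my part. The data are made up and the function names are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error of probability forecasts; outcomes are 1 or 0."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def brier_skill_score(probabilities, outcomes):
    """BSS = 1 - BS / BS_ref, with BS_ref from constant climatological forecasts."""
    climatology = sum(outcomes) / len(outcomes)  # sample relative frequency
    bs_ref = brier_score([climatology] * len(outcomes), outcomes)
    return 1.0 - brier_score(probabilities, outcomes) / bs_ref


# Hypothetical probability forecasts and outcomes (1 = event occurred).
probs = [0.9, 0.1, 0.7, 0.3]
outcomes = [1, 0, 1, 1]
bs = brier_score(probs, outcomes)
bss = brier_skill_score(probs, outcomes)
print(round(bs, 4), round(bss, 4))
```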
- Slide 28
- The Reliability Diagram. A graphical device that shows the
full joint distribution of forecasts and observations for
probability forecasts of a binary predictand, in terms of its
calibration-refinement factorization. Allows diagnosis of
particular strengths and weaknesses in a verification set.
- Slide 29
- The conditional event relative frequency is essentially equal
to the forecast probability. Forecasts are consistently too small
relative to the conditional event relative frequencies: the average
forecast is smaller than the average observation. Forecasts are
consistently too large relative to the conditional event relative
frequencies: the average forecast is larger than the average
observation. Overconfident: extreme probabilities are forecast too
often. Underconfident: extreme probabilities are forecast too
infrequently.
- Slide 30
- Well-calibrated probability forecasts mean what they say, in
the sense that subsequent event relative frequencies are equal to
the forecast probabilities.
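The numbers behind a reliability diagram can be sketched as follows: for each distinct forecast probability, compute how often it was used (the refinement part) and how often the event occurred when it was issued (the calibration part). The data are made up, the function name is hypothetical, and plotting is left out.

```python
from collections import defaultdict


def reliability_points(probabilities, outcomes):
    """Return (forecast probability, relative frequency of use,
    conditional event relative frequency) for each distinct forecast value."""
    uses = defaultdict(int)
    events = defaultdict(int)
    for p, o in zip(probabilities, outcomes):
        uses[p] += 1
        events[p] += o
    n = len(probabilities)
    return [(p, uses[p] / n, events[p] / uses[p]) for p in sorted(uses)]


# Hypothetical forecasts: 0.2 issued five times, 0.8 issued five times.
probs = [0.2] * 5 + [0.8] * 5
outcomes = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
points = reliability_points(probs, outcomes)
for forecast, use_frequency, event_frequency in points:
    print(forecast, use_frequency, event_frequency)
```

In this toy set the event occurred on 1 of the 5 occasions when 0.2 was forecast and on 4 of the 5 occasions when 0.8 was forecast, so the points fall on the 1:1 line.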
- Slide 31
- Hedging and Strictly Proper Scoring Rules. If forecasters are
just trying to get the best score, they may improve their scores by
hedging, or gaming: forecasting something other than their true
beliefs in order to achieve a better score. Strictly proper: a
forecast evaluation procedure that awards a forecaster's best
expected score only when his or her true beliefs are forecast. A
strictly proper score cannot be hedged. The Brier score is an
example. You can derive that it is proper, but I won't here.
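This is not the derivation, only a numerical check under a stated assumption: the forecaster believes the event has probability 0.3, and the expected Brier score is evaluated for every candidate forecast from 0 to 1 in steps of 0.01. The names and the 0.3 belief are illustrative.

```python
def expected_brier(forecast, belief):
    """Expected Brier score of issuing `forecast` when the forecaster
    believes the event occurs with probability `belief`."""
    return belief * (forecast - 1.0) ** 2 + (1.0 - belief) * forecast ** 2


belief = 0.3  # hypothetical true belief
candidates = [i / 100 for i in range(101)]
best_forecast = min(candidates, key=lambda f: expected_brier(f, belief))
print(best_forecast, expected_brier(best_forecast, belief))
```

The smallest expected score is obtained by forecasting the true belief itself. Any hedged forecast, higher or lower, has a worse expected Brier score.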
- Slide 32
- Probability Forecasts for Multiple-Category Events. For
multiple-category ordinal probability forecasts, verification
should penalize forecasts increasingly as more probability is
assigned to event categories further removed from the actual
outcome. The score should be strictly proper. Commonly used: the
ranked probability score (RPS).
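A sketch of the ranked probability score for a single forecast, assuming the usual definition as the sum of squared differences between the cumulative forecast probabilities and the cumulative observation (0 below the observed category, 1 from it onward). The three-category forecasts are made up and the function name is hypothetical.

```python
def ranked_probability_score(probabilities, observed_index):
    """RPS for one forecast over ordered categories; smaller is better."""
    cumulative_forecast = 0.0
    score = 0.0
    for m, p in enumerate(probabilities):
        cumulative_forecast += p
        cumulative_observed = 1.0 if m >= observed_index else 0.0
        score += (cumulative_forecast - cumulative_observed) ** 2
    return score


# Two hypothetical forecasts; the highest category (index 2) is observed.
near_forecast = [0.1, 0.3, 0.6]  # most probability on the observed category
far_forecast = [0.6, 0.3, 0.1]   # most probability two categories away
rps_near = ranked_probability_score(near_forecast, observed_index=2)
rps_far = ranked_probability_score(far_forecast, observed_index=2)
print(round(rps_near, 3), round(rps_far, 3))
```

The forecast that places its probability far from the observed category receives the much larger score, which is the distance sensitivity described above.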
- Slide 33
- Probability Forecasts for Continuous Predictands. For an
infinite number of predictand classes, the ranked probability score
can be extended to the continuous case: the continuous ranked
probability score (CRPS). It is strictly proper. Smaller values are
better. It rewards concentration of probability around the step
function located at the observed value.
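A rough numerical sketch of the CRPS as the integral of the squared difference between the forecast CDF and the step function at the observed value. It assumes a Gaussian forecast distribution and a simple midpoint-rule integration over a finite range. Both are choices made here for illustration, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of a Gaussian forecast distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def crps_normal_numeric(mean, sd, observed, half_width=10.0, steps=20000):
    """Approximate CRPS by integrating (F(x) - step(x))^2 with the midpoint rule."""
    lo = min(mean, observed) - half_width * sd
    hi = max(mean, observed) + half_width * sd
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        step = 1.0 if x >= observed else 0.0  # step function at the observation
        total += (normal_cdf(x, mean, sd) - step) ** 2 * dx
    return total


# Hypothetical forecasts verified against an observed value of 0.
crps_sharp = crps_normal_numeric(mean=0.0, sd=0.5, observed=0.0)
crps_broad = crps_normal_numeric(mean=0.0, sd=1.0, observed=0.0)
crps_off_target = crps_normal_numeric(mean=2.0, sd=1.0, observed=0.0)
print(crps_sharp, crps_broad, crps_off_target)
```

The sharp, well-centered forecast gets the smallest score; spreading the probability out or centering it away from the observation both increase the CRPS.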
- Slide 34
- Nonprobabilistic Forecasts of Fields. General considerations for
field forecasts: they are usually nonprobabilistic, and
verification is done on a grid.
- Slide 35
- Slide 36
- Scalar accuracy measures of these fields: S1 score, Mean
Squared Error, Anomaly correlation
- Slide 37
- Thank you for your participation throughout the semester. All
presentations will be posted on my UD website. Additional
information can be found in Statistical Methods in the Atmospheric
Sciences (second edition) by Daniel Wilks.