05/11/08 Training seminar on Forecasting 1
Verification at JMA on
Ensemble Prediction
- Part Ⅱ : Seasonal prediction -
Yukiko Naruse, Hitoshi Sato
Climate Prediction Division Japan Meteorological Agency
05/11/08
Contents
Standardized Verification System for Long Range Forecast (SVSLRF)
  Outline of SVSLRF
  Comparison of verification scores of JMA with those of other centers
  Notes
Verification of seasonal predictions at JMA
  Real-time verification of three-month prediction
  Verification over the hindcasts based on SVSLRF
Summary
Information
  Probabilistic forecasts using the Ordered Probit model※ on the TCC website
※The Ordered Probit model is used as the statistical tool of the MOS.
Verification of seasonal predictions on the TCC website
TCC home ⇒ NWP Model Prediction
・Verification of hindcasts based on SVSLRF
・Verification of real-time forecasts
Real-time verification of three-month prediction
Z500 over the Northern Hemisphere
Stream function at 850 and 200hPa
Example: initial date 2008.6.16 12Z, forecast period Jul.-Aug.-Sep.
[Figure: observation, forecast and error maps of the stream function at 850hPa and 200hPa]
Observation and forecast panels: shading shows anomalies (blue: negative anomalies, orange: positive anomalies).
Error panels: shading shows errors (forecast minus observation).
RMSE and anomaly correlation: ACC 0.134 (over NH), 0.623 (over JPN)
850hPa stream function:
Obs: Anticyclonic circulation anomalies over the northern Pacific were stronger.
Forecast: Cyclonic circulation anomalies were located over the south-eastern areas of Japan.
⇒ higher errors
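The two scores quoted on this slide, RMSE and anomaly correlation, can be sketched as follows. This is a minimal illustration, not JMA's operational code: it assumes the fields are already anomalies flattened to lists, uses the centered form of the correlation, and accepts optional area weights (for example cosine of latitude). The function names and the weighting are assumptions, not taken from the slides.

```python
import math

def rmse(forecast, observed):
    """Root mean square error between two equally sized anomaly fields."""
    n = len(forecast)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)

def anomaly_correlation(forecast, observed, weights=None):
    """Centered anomaly correlation of two flattened anomaly fields.

    weights: optional area weights per grid point (assumption: e.g. cos(latitude)).
    """
    if weights is None:
        weights = [1.0] * len(forecast)
    wsum = sum(weights)
    f_mean = sum(w * f for w, f in zip(weights, forecast)) / wsum
    o_mean = sum(w * o for w, o in zip(weights, observed)) / wsum
    cov = sum(w * (f - f_mean) * (o - o_mean)
              for w, f, o in zip(weights, forecast, observed))
    var_f = sum(w * (f - f_mean) ** 2 for w, f in zip(weights, forecast))
    var_o = sum(w * (o - o_mean) ** 2 for w, o in zip(weights, observed))
    return cov / math.sqrt(var_f * var_o)

# Hypothetical anomaly values at four grid points
fcst = [1.0, -0.5, 0.3, -1.2]
obs = [0.8, -0.2, 0.6, -1.0]
print(rmse(fcst, obs), anomaly_correlation(fcst, obs))
```

A field compared with itself gives RMSE 0 and ACC 1; a field compared with its negative gives ACC -1.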
Real-time verification of three-month prediction
[Figure: NH500 anomaly correlation (AC) by initial date, 02/9/10 to 08/5/10. Plotted: 90-day mean AC of each forecast and its running mean over five forecasts; El Nino and La Nina periods marked. ● Average score for 2002.09-2008.06: 0.32]
The time series shows the anomaly correlation of 500hPa height in the Northern Hemisphere from Sep. 2002, when JMA started three-month forecasts, to the present.
Red solid circles represent the ACC of each forecast; the thick red line is the running mean over five forecasts.
Periods of El Nino and La Nina events are shown as red arrows and green arrows, respectively.
The blue solid circle indicates the average ACC from Sep. 2002 to Jun. 2008.
My opinion:
・The time series seems to vary with a cycle of about one and a half to two years.
・Seasonal and three-month forecasts during El Nino and La Nina events have better accuracy.
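The running mean over five forecasts used to smooth the ACC time series can be sketched as below. The slide does not say whether the window is centered or trailing; this sketch assumes a centered window and returns None where the window is incomplete. The function name and the sample values are illustrative only.

```python
def running_mean(values, window=5):
    """Centered running mean; positions without a full window give None."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)  # not enough neighbours on one side
        else:
            segment = values[i - half:i + half + 1]
            smoothed.append(sum(segment) / window)
    return smoothed

# Hypothetical ACC values of consecutive forecasts
acc = [0.10, 0.35, 0.42, 0.28, 0.15, 0.50, 0.61]
print(running_mean(acc))
```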
Verification over the Hindcasts based on SVSLRF
1)Verification of deterministic forecasts
・Mean Square Skill Score (MSSS)
・Contingency tables
2)Verification of probabilistic forecasts
・Reliability diagrams
・ROC curves and ROC areas
SVSLRF : Standardized Verification System for Long-Range Forecast
Two kinds of verification
The same methods as for the one-month forecast
Mean Squared Skill Score (MSSS)
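\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (F_i - O_i)^2 \]
F: forecast, O: observation
\[ \mathrm{MSSS} = 1 - \frac{\mathrm{MSE}}{\mathrm{MSE}_c} \]
where MSE is the mean squared error of the forecast and MSE_c that of the climatology forecast.
MSSS can be expanded (Murphy, 1988) as
\[ \mathrm{MSSS} = \frac{ 2 \dfrac{s_f}{s_o} r_{fo} - \left( \dfrac{s_f}{s_o} \right)^2 - \left( \dfrac{\bar{f} - \bar{o}}{s_o} \right)^2 + \dfrac{2n-1}{(n-1)^2} }{ 1 + \dfrac{2n-1}{(n-1)^2} } \]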
s^2 : variance (s : standard deviation)
r : correlation
f : forecast
o : observation
Perfect score: 1 (when MSE=0)
Climatology forecast score: 0
The first three terms of the numerator are related to
① phase errors (through the correlation)
② amplitude errors (through the ratio of the forecast to observed variances)
③ bias errors
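As a numerical check of the MSSS definition and its expansion, the sketch below computes MSSS two ways: directly as 1 - MSE/MSE_c, and from the phase, amplitude and bias terms. It assumes (this is not stated on the slide) that variances use a 1/n normaliser and that the climatology forecast is cross-validated, so that MSE_c = (n/(n-1))^2 s_o^2. Function names and sample values are illustrative.

```python
import math

def msss_terms(forecast, observed):
    """MSSS from the three-term expansion (phase, amplitude, bias)."""
    n = len(forecast)
    f_bar = sum(forecast) / n
    o_bar = sum(observed) / n
    s_f = math.sqrt(sum((f - f_bar) ** 2 for f in forecast) / n)
    s_o = math.sqrt(sum((o - o_bar) ** 2 for o in observed) / n)
    cov = sum((f - f_bar) * (o - o_bar) for f, o in zip(forecast, observed)) / n
    r_fo = cov / (s_f * s_o)
    k = (2 * n - 1) / (n - 1) ** 2          # cross-validation correction term
    phase = 2 * (s_f / s_o) * r_fo          # term 1: depends on the correlation
    amplitude = (s_f / s_o) ** 2            # term 2: ratio of variances
    bias = ((f_bar - o_bar) / s_o) ** 2     # term 3: mean error
    msss = (phase - amplitude - bias + k) / (1 + k)
    return {"phase": phase, "amplitude": amplitude, "bias": bias, "msss": msss}

def msss_direct(forecast, observed):
    """MSSS = 1 - MSE / MSE_c with MSE_c = (n/(n-1))^2 * s_o^2 (assumption)."""
    n = len(forecast)
    mse = sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n
    o_bar = sum(observed) / n
    s_o2 = sum((o - o_bar) ** 2 for o in observed) / n
    mse_c = (n / (n - 1)) ** 2 * s_o2
    return 1 - mse / mse_c

# Hypothetical forecast and observed anomalies over five years
fcst = [0.5, -0.2, 1.1, -0.9, 0.3]
obs = [0.7, -0.5, 0.9, -1.1, 0.0]
print(msss_terms(fcst, obs), msss_direct(fcst, obs))
```

Both routes give the same value, which is a quick way to confirm that the three terms have been coded with the right signs.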
Examples of MSSS
Forecast period: Dec-Feb started on 10 Nov.
[Figure: MSSS maps for T2m and precipitation]
Positive MSSS indicates that the forecast is better than the climatological forecast.
• T2m forecasts are better than the climatological forecast over most regions.
• Precipitation forecasts are slightly worse than the climatological forecast.
• Use the forecasts with care over regions where they are worse than the climatological forecast.
Contingency tables
\[ \mathrm{KS} = \mathrm{HR} - \mathrm{FAR} \]
\[ \mathrm{KSS} = \frac{\mathrm{KS} + 1}{2} \]
KSS = 0.5 : no information (HR being equal to FAR)

General ROC contingency table for deterministic forecasts:

                                Observations
                                occurrences   non-occurrences   total
Forecasts   occurrences         O1            NO1               O1+NO1
            non-occurrences     O2            NO2               O2+NO2
            total               O1+O2         NO1+NO2           T

General three by three contingency table for categorical deterministic forecasts:

                                Observations
                                Below Normal  Near Normal  Above Normal  total
Forecasts   Below Normal        n11           n12          n13           n1*
            Near Normal         n21           n22          n23           n2*
            Above Normal        n31           n32          n33           n3*
            total               n*1           n*2          n*3           Total

\[ n_{*1} = n_{11} + n_{21} + n_{31} \]

The contingency tables are useful for comparisons between different deterministic categorical forecast sets.
Example: Dec.-Feb. started on 10 Nov.; element: T2m; region: Northern Japan
Our model over-forecasts the below-normal and above-normal categories.
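The relations KS = HR - FAR and KSS = (KS + 1) / 2 can be illustrated with a short sketch. The slide does not define HR and FAR; the code assumes the usual definitions, hit rate HR = O1 / (O1 + O2) and false alarm rate FAR = NO1 / (NO1 + NO2), and adds a hypothetical helper that collapses a three by three table (rows: forecasts, columns: observations) to the two by two table for one category.

```python
def roc_scores(o1, no1, o2, no2):
    """Return (HR, FAR, KS, KSS) from a 2x2 table.

    o1: hits, no1: false alarms, o2: misses, no2: correct rejections.
    HR and FAR definitions are assumptions (not given on the slide).
    """
    hr = o1 / (o1 + o2)
    far = no1 / (no1 + no2)
    ks = hr - far
    kss = (ks + 1) / 2
    return hr, far, ks, kss

def collapse_to_2x2(table, category):
    """Collapse a 3x3 table (rows: forecasts, columns: observations)
    to (O1, NO1, O2, NO2) for a single category index."""
    total = sum(sum(row) for row in table)
    o1 = table[category][category]
    no1 = sum(table[category]) - o1                    # forecast but not observed
    o2 = sum(row[category] for row in table) - o1      # observed but not forecast
    no2 = total - o1 - no1 - o2
    return o1, no1, o2, no2

# Hypothetical counts; category 0 = Below Normal
table = [[5, 2, 1],
         [3, 4, 3],
         [1, 2, 6]]
print(roc_scores(*collapse_to_2x2(table, 0)))
```

With equal hit rate and false alarm rate, for example `roc_scores(10, 5, 10, 5)`, KS is 0 and KSS is 0.5, the "no information" value.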
Examples of Reliability Diagrams
Dec-Feb started on 10 Nov.
Event; upper tercile
T2m
Positive MSSS indicates that the forecast is better than
climatological forecast.
Precipitation
Positive BSS indicates that the forecast is better than the climatological forecast. The T2m prediction has good skill.
Take care when using the precipitation prediction: some calibration is necessary before use.
BSS*=4.1 BSS=16.3 BSS=-1.2 BSS=-6.1
Good Skill ! No Skill !
*Brier Skill Scores x 100
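A minimal sketch of the Brier skill score (multiplied by 100, as on the slide) and of the table behind a reliability diagram follows. It assumes the reference forecast is a constant climatological probability of 1/3 for the upper-tercile event and that probabilities are grouped into equal-width bins; both choices, and the function names, are assumptions rather than the method used for the figures.

```python
def brier_skill_score(probs, outcomes, clim_prob=1 / 3):
    """BSS x 100 relative to a constant climatological probability."""
    n = len(probs)
    bs = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    bs_clim = sum((clim_prob - o) ** 2 for o in outcomes) / n
    return 100.0 * (1.0 - bs / bs_clim)

def reliability_table(probs, outcomes, n_bins=10):
    """Per probability bin: (bin index, number of forecasts, observed frequency)."""
    bins = [[0, 0] for _ in range(n_bins)]  # [forecast count, event count]
    for p, o in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes into the last bin
        bins[k][0] += 1
        bins[k][1] += o
    return [(k, count, events / count)
            for k, (count, events) in enumerate(bins) if count]

# Hypothetical forecast probabilities and outcomes (1 = event occurred)
probs = [0.7, 0.2, 0.5, 0.1, 0.8, 0.3]
outcomes = [1, 0, 1, 0, 1, 0]
print(brier_skill_score(probs, outcomes))
print(reliability_table(probs, outcomes, n_bins=5))
```

A perfect forecast scores 100 and the climatological forecast scores 0; in a reliable system the observed frequency in each bin is close to the bin's forecast probability.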
Examples of ROC curves and areas
T2m:NH Precipitation:TRP
A ROC area greater than 0.5 indicates that the forecast is better than the climatological forecast.
According to the ROC area at each grid point, scores over Iran, India and Southeast Asia are good.
On the other hand, scores directly over Japan are worse than the climatological forecast.
ROC=65.0 ROC*=67.9
*ROC x 100
Dec-Feb started on 10 Nov.
Event; upper tercile
Good Skill !
T2m: ROC area on each grid point
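The ROC area for a probabilistic event forecast can be sketched with pair counting (the Mann-Whitney form): the fraction of (event, non-event) pairs in which the event case received the higher forecast probability, with ties counted as one half. This is equivalent to the trapezoidal area under the ROC curve but is not necessarily how the figures on this slide were produced; the function name is illustrative.

```python
def roc_area(probs, outcomes):
    """ROC area by pair counting; outcomes are 1 (event) or 0 (non-event)."""
    events = [p for p, o in zip(probs, outcomes) if o == 1]
    non_events = [p for p, o in zip(probs, outcomes) if o == 0]
    wins = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                wins += 1.0
            elif p_event == p_non:
                wins += 0.5  # ties carry no discrimination
    return wins / (len(events) * len(non_events))

# Hypothetical upper-tercile probabilities and outcomes
probs = [0.6, 0.4, 0.7, 0.2, 0.5, 0.3]
outcomes = [1, 0, 1, 0, 0, 1]
print(roc_area(probs, outcomes))
```

A forecast that always issues the same probability gives 0.5, the climatological value quoted on the slide.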
SVSLRF (Standardized Verification System for Long-Range Forecast)
What is this?
WMO standard tool to verify skill in seasonal models.
"The SVS for LRF described herein constitutes the basis for long-range forecast evaluation and validation, and for exchange of verification scores."
This manual was presented within the Commission for Basic Systems (CBS) of the World Meteorological Organization (WMO) in December 2002.
Why is SVSLRF necessary?
Long-range forecasts are being issued from several centers and are being made available in the public domain.
Forecasts for specific locations may differ substantially at times, due to the inherent limited skill of long-range forecast systems.
This situation will lead to confusion amongst users, and ultimately reflects back on the science behind long-range forecasts.
Users should appropriately understand "How much skill does the forecast issued by a given center have?"
Outline of SVSLRF
Long-Range Forecasts
LRF extend from thirty (30) days up to two (2) years: monthly, three-month or 90-day period, and seasonal forecasts.
Forecast periods
A 90-day period or a season (if available, 12 rolling three-month periods, e.g. JFM, FMA, MAM).
Parameters to be verified
a) Surface air temperature (T2m) anomaly at screen level;
b) Precipitation anomaly;
c) Sea surface temperature (SST) anomaly and the Nino3.4 Index (c applies only to coupled ocean-atmosphere models).
Three levels of verification
Level 1 : large-scale aggregated overall measures of forecast performance.
Level 2 : verification at grid points.
Level 3 : grid-point by grid-point contingency tables for more extensive verification.
Three Levels of Verification
Level 1
  Parameters: T2m anomaly, precipitation anomaly, (Nino3.4 Index)
  Verification regions: Tropics (20S-20N), Northern extratropics (20N-90N), Southern extratropics (20S-90S); (Nino3.4 Index: N/A)
  Deterministic forecasts: MSSS
  Probabilistic forecasts: ROC curves, ROC areas, reliability diagrams, frequency histograms
Level 2
  Parameters: T2m anomaly, precipitation anomaly, (SST anomaly)
  Verification regions: grid-point verification on a 2.5° by 2.5° grid
  Deterministic forecasts: MSSS and its three-term decomposition at each grid point
  Probabilistic forecasts: ROC areas at each grid point
Level 3
  Parameters: T2m anomaly, precipitation anomaly, (SST anomaly)
  Verification regions: grid-point verification on a 2.5° by 2.5° grid
  Deterministic forecasts: 3 by 3 contingency tables at each grid point
  Probabilistic forecasts: ROC reliability tables at each grid point
Submit results of levels 1 and 2 to the Lead Centre.
Verification Data
Hindcasts
LRF systems should be verified over as long a period as possible in hindcast mode (real-time operational forecasts do not yet provide enough cases for verification).
Period : from 1981 to 2001
Number of bins : between 9 and 20 (bins = "ensemble member size" + 1)
Calculation (means, standard deviations, class limits, etc.)
Cross-validation framework
Lead time
At least 2, up to 4 (max lead time). JMA submitted only 1 lead time.
Verification data sets
T2m : UKMO/CRU or ERA-40
Precipitation : GPCP or CMAP
SST : Reynolds OI or Smith and Reynolds
If the recommended data are not available, a center can use its own reanalysis.
⇒ JRA-25 in the case of JMA
• Considering the merits of JRA-25 and JCDAS in the verification of hindcasts and real-time forecasts, JMA uses JRA-25 instead of UKMO/CRU as verification data for T2m prediction.
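The cross-validation framework for class limits can be sketched as a leave-one-out calculation: the mean and tercile boundaries used to classify each year are computed from all the other years. The quantile interpolation rule and the function names are assumptions made for illustration; the sketch also shows why an ensemble of M members yields M + 1 probability bins.

```python
def tercile_limits(sample):
    """Lower and upper tercile boundaries by linear interpolation (assumed rule)."""
    xs = sorted(sample)

    def quantile(q):
        pos = q * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return quantile(1 / 3), quantile(2 / 3)

def cross_validated_categories(values):
    """Classify each year with a mean and limits computed from the other years."""
    categories = []
    for i, value in enumerate(values):
        others = values[:i] + values[i + 1:]       # leave the target year out
        mean = sum(others) / len(others)
        lower, upper = tercile_limits([x - mean for x in others])
        anomaly = value - mean
        if anomaly < lower:
            categories.append("below")
        elif anomaly > upper:
            categories.append("above")
        else:
            categories.append("near")
    return categories

def probability_bins(ensemble_size):
    """Possible forecast probabilities k/M for k = 0..M, i.e. M + 1 bins."""
    return [k / ensemble_size for k in range(ensemble_size + 1)]

# Hypothetical seasonal means for ten hindcast years
seasonal_means = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(cross_validated_categories(seasonal_means))
print(len(probability_bins(11)))  # 11 members give 12 bins
```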
Outline of SVSLRF
Lead Centre for the Long-Range Forecast Verification System
Australian Bureau of Meteorology (BOM)
Meteorological Service of Canada (MSC)
URL: http://www.bom.gov.au/wmo/lrfvs/
Scores of several centers and the Manual are available on the Lead Centre website.
Examples of graphics on the SVSLRF website
Example: ROC curves (Level 1, regional scores)
Parameter : T2m; Season : DJF; Lead time : 1 month; Area : Tropics
[Figures: ROC curves for Model : JMA and Model : UKMetO]
ROC scores (upper tercile): JMA 0.755, UKMetO 0.76
ROC scores > 0.5 : better skill than climatology forecasts
Compare verification scores of JMA with those of other centers
Model     Verification data   Hindcast         Ensemble size         Note
          T2m      Precip.    period (years)   for hindcasts
JMA       JRA-25   GPCP2      1984-2005 (22)   11                    TL95L40; SST: combination of persisted anomaly, climate and prediction
MSC       CRU2.1   GPCP       1981-2000 (20)   12                    AGCM (1.875×1.875L50, T32L10)
UKMetO    ERA-40   GPCP       1987-2002 (16)   15                    AGCM (2.5×3.75L40, 0.3-1.25×1.25L40)
KMA       CDAS2    CMAP       1979-2006 (28)   ?                     GDAPS (deterministic long-range forecasts)
BOM       ERA40    CMAP       1987-2001 (18)   ?
ECMWF     ERA-40   GPCP       1987-2002 (16)   5 (May and Nov.: 40)  AOGCM (TL95L60, 0.3-1.4×1.4L29), system2
Meteo-Fr  ERA-40   CMAP       1993-2003 (11)   5                     AOGCM (T63L31C1,)
NCEP      CPC      CMAP       1982-2004 (23)                         AOGCM (T62L64, 0.3-1×1L40)
IRI       CRU      GPCP       1981-2001 (21)   12                    AGCM (T42); SST: persisted anomaly
BCC-CGCM  ERA-40   CMAP       1983-2001 (19)   ?                     AOGCM (T63L16, T63L30)
HMC-SLAV  CRU      CMAP       1980-2001 (22)   10                    (1.125x1.40625L28)
CPTEC     CRU?     GPCP       1979-2000 (22)   10                    AGCM T062L28
Publishing centers: 12
Compare verification scores of JMA with those of other centers
ROC areas : Parameter・・・2-meter temperature (T2m), Event・・・Upper tercile
[Figures: ROC scores by season (DJF, MAM, JJA, SON) for JMA(V0703C), MSC, UKMetO, ECMWF, Meteo-Fr, NCEP, IRI, BCC-CGCM and HMC-SLAV. Panels: Northern extra-tropics (20-90N), Tropics (20S-20N)]
ROC scores of T2m over the Northern extra-tropics are the best for JMA among these centers.
ROC scores of T2m over the Tropics for JMA are similar to those of ECMWF.
Compare verification scores of JMA with those of other centers
ROC areas : Parameter・・・Precipitation, Event・・・Upper tercile
[Figures: ROC scores by season (DJF, MAM, JJA, SON) for JMA(V0703C), MSC, UKMetO, ECMWF, Meteo-Fr, NCEP, IRI, BCC-CGCM and HMC-SLAV. Panels: Northern extra-tropics (20-90N), Tropics (20S-20N)]
ROC scores of precipitation over the Northern extra-tropics for JMA are similar to those of ECMWF. The JMA score in JJA is the best among these centers.
ROC scores of precipitation over the Tropics for JMA are worse than those of ECMWF.
Compare verification scores of JMA with those of other centers
Mean Square Skill Score : Parameter・・・2-meter temperature (T2m)
[Figures: MSSS by season (DJF, MAM, JJA, SON) for JMA(V0703C), MSC, ECMWF and Meteo-Fr. Panels: Northern extra-tropics (20-90N), Tropics (20S-20N)]
MSSS of T2m over both the Northern extra-tropics and the Tropics is the best for JMA among these centers.
Compare verification scores of JMA with those of other centers
Mean Square Skill Score : Parameter・・・Precipitation
[Figures: MSSS by season (DJF, MAM, JJA, SON) for JMA(V0703C), MSC, ECMWF and Meteo-Fr. Panels: Northern extra-tropics (20-90N), Tropics (20S-20N)]
MSSS of precipitation over the Northern extra-tropics for JMA is similar to that of ECMWF.
MSSS of precipitation over the Tropics in JJA is the worst for JMA among these centers.
Notes
Sensitivity of skill score to verification data
・The differences between the scores obtained with the three verification data sets are smaller than 1%. (Figures 1, 2)
・The error bars indicate the uncertainty due to verification sampling.
・Verification sampling has a larger influence on the forecast skill scores than the choice of verification data.
Sensitivity of skill score to period of hindcasts
・The difference between the two scores is larger than the differences among verification data sets shown in Figure 2. (Figure 3)
[Figure 1: MSSS of 2-meter temperature over land in the tropics by season (DJF, MAM, JJA, SON), with JRA-25, ERA-40 and UKMO/CRU.]
[Figure 2: ROC area of 2-meter temperature in the upper tercile over land in the tropics by season, with JRA-25, ERA-40 and UKMO/CRU.]
[Figure 3: ROC area by season with the 18-year hindcast and the 22-year hindcast.]
It is important to use the same verification sampling and hindcast period as other centers.
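One common way to estimate the uncertainty that comes from verification sampling is a bootstrap over hindcast years: resample the years with replacement, recompute the score each time, and take percentiles of the results. The slides do not say how the error bars in the figures were obtained, so this is a substitute technique shown only as a sketch; the function names, the number of resamples and the 90% interval are assumptions.

```python
import random

def mse(forecast, observed):
    """Mean squared error, used here as an example score."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast)

def bootstrap_interval(forecast, observed, score, n_boot=1000, alpha=0.10, seed=0):
    """Percentile interval of a score from resampling years with replacement."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(forecast)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(score([forecast[i] for i in idx],
                             [observed[i] for i in idx]))
    samples.sort()
    lower = samples[int((alpha / 2) * (n_boot - 1))]
    upper = samples[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

# Hypothetical anomalies over eight hindcast years
fcst = [0.5, -0.2, 1.1, -0.9, 0.3, 0.0, -0.4, 0.8]
obs = [0.7, -0.5, 0.9, -1.1, 0.0, 0.2, -0.1, 0.4]
print(bootstrap_interval(fcst, obs, mse))
```

Any other score that takes the paired forecast and observed series can be passed in place of `mse`. With a short hindcast the interval is typically wide, which is why differences between centers of a few percent should be read with care.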
Summary of Part Ⅱ
We verify our forecasts both over the hindcast period, based on SVSLRF, and for real-time operational forecasts.
You should check the accuracy of our models on the TCC website. http://ds.data.jma.go.jp/tcc/tcc/products/model/index.html
When comparing our scores with those of other centers, note that verification sampling and hindcast periods differ among centers.
Verification scores of several centers are available on the Lead Centre website.
References
Murphy, A.H., 1988: Skill scores based on the mean square error and their
relationships to the correlation coefficient. Mon. Wea. Rev., 116, 2417-2424.
WMO, 2006: Standardized Verification System (SVS) for Long-Range Forecasts
(LRF). New Attachment II-8 to the Manual on the GDPFS (WMO-No.485), Volume
I.
http://www.bom.gov.au/wmo/lrfvs/
Information
Probabilistic forecasts
Probabilistic forecasts using the Ordered Probit model* on the TCC website
*The Ordered Probit model is used as the statistical tool of the MOS.
three-month forecast
Verification
http://ds.data.jma.go.jp/tcc/tcc/products/model/probfcst/4mE/index.html
Probabilistic forecasts
Forecast period: three-month mean
Parameter: Surface temperature and Precipitation
Region: each grid
Issued date: the end of every month
Nov-Jan
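How an ordered probit model turns a predictor into three category probabilities can be sketched as follows. The general form is standard: a latent variable is compared with two cut points through the standard normal distribution function. Everything specific here is hypothetical: the single predictor, the coefficient `beta` and the cut points are illustrative values, not the parameters of the MOS used on the TCC website.

```python
import math

def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(predictor, beta, cut_lower, cut_upper):
    """Probabilities of (below, near, above) normal for one predictor value."""
    latent = beta * predictor
    p_below = std_normal_cdf(cut_lower - latent)
    p_above = 1.0 - std_normal_cdf(cut_upper - latent)
    p_near = 1.0 - p_below - p_above
    return p_below, p_near, p_above

# Hypothetical parameters: cut points near +/-0.43 give equal thirds at predictor 0
for x in (-1.0, 0.0, 1.0):
    print(x, ordered_probit_probs(x, beta=0.8, cut_lower=-0.4307, cut_upper=0.4307))
```

With a positive coefficient, a larger predictor value shifts probability from the below-normal to the above-normal category.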
Probabilistic forecasts
Init. Oct. 2008
Forecast period: Nov.-Dec.-Jan. 2009
Parameter: surface temperature
Probabilistic verification score at the grid point (100E, 15N)
Init. 10th Aug.
Probabilistic forecasts of all cases over the hindcast
BSS: 8.4 > 0
ROC area: 0.59 > 0.5
The forecast at this grid point is better than climatology.
Note: for forecast probabilities over 0.5, the observed frequency is lower than the expected frequency.