
Extended Prediction of North Indian Ocean Tropical Cyclones using the ECMWF Variable Ensemble Prediction System

James I. Belanger*, Peter J. Webster, and Judith A. Curry

School of Earth and Atmospheric Sciences, Georgia Institute of Technology, Atlanta, GA

July 2011

Submitted to Monthly Weather Review

*Corresponding Author Address: James I. Belanger, School of Earth & Atmospheric Sciences, Georgia Institute of Technology, 311 Ferst Drive, Atlanta, GA 30332-0340

Email: [email protected]


Abstract

This analysis examines the predictability of several key forecasting parameters for tropical cyclones (TCs) in the North Indian Ocean (NIO) using the ECMWF Variable Ensemble Prediction System (VarEPS), including tropical cyclone genesis, pre-genesis and post-genesis track and intensity projections, and regional outlooks of tropical cyclone activity for the Arabian Sea and the Bay of Bengal. Based on the evaluation period from 2007 to 2010, the VarEPS TC genesis forecasts demonstrate low false alarm rates and moderate to high probabilities of detection for lead-times of one to seven days. In addition, VarEPS pre-genesis track forecasts on average perform similarly to the VarEPS post-genesis forecasts through 120 hrs and feature a total track error growth of 41 nm per day. The VarEPS provides superior track forecasts for lead-times greater than 12 hrs compared to other models, including the UKMET, NOGAPS, and GFS, and slightly lower track errors than the Joint Typhoon Warning Center. Given the societal need for a modern forecasting system that extends beyond three days in the NIO, we conclude with a discussion of how the VarEPS can provide much of this extended predictability in a probabilistic framework.


1. Introduction

Tropical cyclones (TCs) in the North Indian Ocean (NIO) have a profound impact on the littoral countries of the Arabian Sea and the Bay of Bengal. The combination of a shallow coastal plain and a thermodynamically favorable environment allows TCs to produce high surface winds, torrential rains, and significant wave heights (wave setup plus storm surge) as these systems move inland. Low socioeconomic conditions in the region, combined with the world's highest population density, allow landfalling TCs, which historically have come with little advance warning, to become devastating natural disasters. In fact, eight of the ten deadliest TCs of all time have occurred in the Bay of Bengal and the Arabian Sea, with five impacting Bangladesh and three making landfall in India (WMO-TD No. 84). In addition, the highest storm surge ever recorded for a TC (45 feet) occurred in the North Indian Ocean near the Meghna Estuary, Bangladesh, in 1876 (WMO-TD No. 84).

The complex meteorological and socioeconomic environment in the Bay of Bengal and throughout the North Indian Ocean necessitates a warning system that provides several days of advance notice for TC formation, track movement, intensity change, horizontal distribution of surface winds, and maximum coastal wave heights. However, the World Meteorological Organization (WMO) requires only a small subset of this critical forecast information to be generated daily. The Indian Meteorological Department (IMD), which is the WMO-designated Regional Specialized Meteorological Centre (RSMC), provides the official forecasts and warnings for tropical cyclones in the North Indian Ocean region. As mandated by the WMO, the IMD is required to coordinate and release its forecasts daily with each member country within the North Indian Ocean. However, ultimate responsibility for forecast development and warning dissemination lies with each country's national meteorological service. As part of the daily operational procedure when a TC is not present in the region, the IMD is required to prepare a daily tropical weather outlook, which assesses the possibility of tropical depression development in the Bay of Bengal and the Arabian Sea. Unlike the National Hurricane Center's Tropical Weather Outlook, which provides the likelihood of TC genesis during the next 48 hours (Rappaport et al. 2009), the WMO places no time constraint on this outlook product, since the period covered by the forecast is determined separately by each RSMC. In addition, this product does not provide any quantitative, probabilistic information about the potential for tropical cyclone formation.

After a TC has formed in the NIO, the IMD begins issuing advisories, which contain the TC's past movement, current location and intensity, and forecast information including its future location, translation speed, wind intensity, maximum average surface wind speed, and highest surface wind gust. However, the WMO requires only that these tropical cyclone advisories and forecasts extend out to three days in the NIO, even though most numerical guidance for tropical cyclone forecasting today spans a full five-day forecast window (Rappaport et al. 2009). Since the average lifetime of a tropical cyclone in the North Indian Ocean is about four days, these limited outlooks do not provide enough lead-time for disaster preparation or evacuation. Also, even though this region has historically experienced devastating impacts from tropical cyclone-induced storm surge in the Bay of Bengal, the IMD does not produce storm surge forecasts, as these remain the responsibility of each country's national meteorological service (WMO-TD No. 48).

Given the societal need for a modern forecasting system that incorporates several additional forecast parameters with a lead-time well beyond three days, here we show how forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) Variable Ensemble Prediction System (VarEPS) can provide much of this new information.

2. Data and Methods

a) ECMWF Variable Ensemble Prediction System

The predictability of tropical cyclones in the North Indian Ocean is evaluated using the European Centre for Medium-Range Weather Forecasts (ECMWF) Variable Ensemble Prediction System (VarEPS). During the period 2007–2010, the ECMWF VarEPS (hereafter VarEPS) underwent a number of important changes¹, which include increasing the horizontal and vertical resolution of the modeling system, expanding data assimilation procedures to include a greater number of satellite radiance measurements, updating the model physics and cumulus convection parameterization schemes, and changing how initial and stochastic perturbations are generated using singular vectors.

As of 26 January 2010, the VarEPS includes the ECMWF global model, run at TL1279 spectral truncation (horizontal resolution ~16 km) with 91 vertical levels out to ten days, along with 51 ensemble members (50 perturbed members + 1 control run) at TL639 spectral truncation (horizontal resolution ~32 km) with 62 vertical levels. For days 11–15, the 51-member ensemble is run at a reduced TL319 spectral truncation (horizontal resolution ~63 km).
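The quoted resolutions follow from the spectral truncation. As a rough check, the sketch below assumes a linear Gaussian grid with 2(T + 1) grid points around the equator for truncation TL T (an approximation adopted here for illustration, not a detail stated in the text) and a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def linear_grid_spacing_km(truncation: int) -> float:
    """Approximate equatorial grid spacing for a TL<truncation> linear grid.

    Assumes 2 * (truncation + 1) longitudes around the equator, the usual
    pairing of a linear Gaussian grid with a TL spectral truncation.
    """
    n_longitudes = 2 * (truncation + 1)
    return 2.0 * math.pi * EARTH_RADIUS_KM / n_longitudes


for t in (1279, 639, 319):
    print(f"TL{t}: ~{linear_grid_spacing_km(t):.1f} km")
```

This gives roughly 15.6, 31.3, and 62.5 km, close to the ~16, ~32, and ~63 km quoted above; halving the truncation doubles the grid spacing.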

To represent the uncertainty in initial conditions, ensemble perturbations are constructed using singular vectors, which capture the fastest-growing errors in the first 48 hours (Buizza and Palmer 1995). Stochastic perturbations are also added during the model integration to account for the uncertainty in parameterized physical processes. Five additional singular vectors are computed and perturbed in the six grid spaces enclosing each TC using a diabatic, adjoint version of the ECMWF global atmospheric model at TL42 spectral truncation with 42 vertical levels (Barkmeijer et al. 2001, Puri et al. 2001). The 15-day VarEPS 00UTC forecasts for the period 1 January 2007 to 31 December 2010 were obtained through the THORPEX Interactive Grand Global Ensemble (TIGGE; http://tigge.ecmwf.int/) project for a North Indian Ocean domain spanning 0–30°N, 40–110°E at a horizontal resolution of 0.25° × 0.25°.

¹ A complete description of these changes may be found at: http://www.ecmwf.int/products/data/technical/model_id/index.html.

b) Tropical Cyclone Tracking Scheme

To isolate tropical cyclones in the VarEPS analysis and forecast fields, a modified version of the Suzuki-Parker tracking scheme is used (Holland et al. 2010). This tracking scheme uses the following variables: 850 hPa relative vorticity, mean sea level pressure, 10 m winds, and geopotential heights at 925, 850, 700, 500, and 200 hPa. To improve the retrieval of maximum intensity from the VarEPS, the 10 m winds are replaced with winds averaged over the lower troposphere (10 m, 925 hPa, and 850 hPa). The tracking scheme processes each ensemble member by first identifying candidate vortices that exhibit a local minimum in mean sea level pressure. This initial set of vortices is filtered by removing systems that do not have a maximum lower-tropospheric wind speed greater than 16 kts and an 850 hPa relative vorticity greater than 1 × 10⁻⁴ s⁻¹. The next component of the tracking scheme confirms that each identified vortex possesses a warm core, as defined using the Hart (2003) phase-space method. After the tracking scheme has been applied to each ensemble member, the land–sea mask from the VarEPS is used to remove any tracks that originate over land, unless the ensemble tracks are within 300 nm of an observed tropical cyclone. Furthermore, all ensemble forecast tracks must have a lifetime of at least one day. Still, the modified tracking scheme has some deficiencies, especially in localized regions where vorticity and winds in the lower troposphere exceed the tracking thresholds. These exceedances may occur in association with topography and with the Somali jet over northeastern Africa. Accordingly, because of the large number of unrealistic forecast tracks that originate within the region 10–17°N, 40–55°E, any candidate tracks that emerge from this region are automatically removed unless they are associated with an observed tropical cyclone.
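The track-level filters described above can be summarized as a single predicate. The sketch below is a simplification under stated assumptions: it applies the wind and vorticity thresholds to per-track maxima rather than to each candidate vortex at every time step, it uses a hypothetical `TrackSummary` record, and it approximates "associated with an observed tropical cyclone" for the exclusion box by the same 300 nm proximity test used for land-origin tracks.

```python
from dataclasses import dataclass
from typing import Optional

WIND_MIN_KT = 16.0          # lower-tropospheric wind threshold
VORT850_MIN = 1.0e-4        # 850 hPa relative vorticity threshold (s^-1)
MIN_LIFETIME_HR = 24.0      # tracks must last at least one day
OBS_TC_RADIUS_NM = 300.0    # land-origin tracks kept if this close to an observed TC
EXCLUDE_LAT = (10.0, 17.0)  # spurious-track box, degrees N
EXCLUDE_LON = (40.0, 55.0)  # spurious-track box, degrees E


@dataclass
class TrackSummary:
    """Hypothetical per-track summary; field names are illustrative."""
    start_lat: float
    start_lon: float
    max_wind_kt: float        # max of winds averaged over 10 m, 925 and 850 hPa
    max_vort850: float        # max 850 hPa relative vorticity (s^-1)
    warm_core: bool           # e.g. result of a Hart (2003) phase-space check
    lifetime_hr: float
    starts_over_land: bool    # from the model land-sea mask
    dist_to_obs_tc_nm: Optional[float] = None  # None if no observed TC nearby


def keep_track(t: TrackSummary) -> bool:
    """Return True if the ensemble track survives all filters."""
    near_obs = (t.dist_to_obs_tc_nm is not None
                and t.dist_to_obs_tc_nm <= OBS_TC_RADIUS_NM)
    if t.max_wind_kt <= WIND_MIN_KT or t.max_vort850 <= VORT850_MIN:
        return False
    if not t.warm_core or t.lifetime_hr < MIN_LIFETIME_HR:
        return False
    if t.starts_over_land and not near_obs:
        return False
    in_box = (EXCLUDE_LAT[0] <= t.start_lat <= EXCLUDE_LAT[1]
              and EXCLUDE_LON[0] <= t.start_lon <= EXCLUDE_LON[1])
    # Assumption: "associated with an observed TC" is approximated by near_obs
    if in_box and not near_obs:
        return False
    return True


bay = TrackSummary(15.0, 88.0, 35.0, 2.5e-4, True, 72.0, False)
horn = TrackSummary(12.0, 45.0, 30.0, 2.0e-4, True, 48.0, False)
print(keep_track(bay), keep_track(horn))  # True False
```

The second hypothetical track passes every physical threshold but is still rejected because it starts inside the 10–17°N, 40–55°E box with no observed TC nearby.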

c) Filtering Tropical Cyclone Forecast Tracks and Determining False Alarms

After the tropical cyclone tracking scheme and post-processing routines are applied to the VarEPS, a tropical cyclone filtering algorithm is used to determine which ensemble track forecasts are false alarms and which are associated with an observed tropical cyclone. The essential information is the initial location and time of the ensemble track forecast, which is compared with the observed locations and times of tropical cyclones. If the forecast track precedes or follows the date of tropical cyclone genesis, defined as the date when the first tropical depression advisory is issued by the Joint Typhoon Warning Center (JTWC), then the forecast track is classified as a pre-genesis or post-genesis forecast track, respectively. To cluster the pre-genesis tracks, the location and time of the tropical cyclone at genesis are used as the observation point. To cluster the post-genesis tracks, the observation point is the location and time of the tropical cyclone at the initialization date of the VarEPS forecast. Initially, the distance and time thresholds used to define an ensemble forecast track as a tropical cyclone forecast track are 'loose', such that an ensemble track forecast that is within 1500 nm and within 15 days of the observed TC is considered a 'tropical cyclone forecast track'.
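A minimal sketch of the loose matching step is shown below. It assumes that the forecast initialization time is what is compared with the genesis date to decide pre- versus post-genesis, measures distance with the haversine great-circle formula on a spherical Earth, and uses hypothetical positions and times; the function names are illustrative only.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_NM = 3440.065          # mean Earth radius in nautical miles
LOOSE_DIST_NM = 1500.0              # 'loose' distance threshold
LOOSE_WINDOW = timedelta(days=15)   # 'loose' time threshold


def great_circle_nm(lat1, lon1, lat2, lon2):
    """Haversine distance (nm) between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def classify_track(init_time, start, genesis, obs_at_init=None):
    """Label an ensemble track 'pre-genesis', 'post-genesis', or None.

    start, genesis and obs_at_init are (time, lat, lon) tuples:
    - start: first point of the ensemble forecast track
    - genesis: observed TC at the first JTWC tropical depression advisory
    - obs_at_init: observed TC at the forecast initialization time
      (only available once genesis has occurred)
    """
    if init_time < genesis[0]:
        label, ref = "pre-genesis", genesis
    elif obs_at_init is not None:
        label, ref = "post-genesis", obs_at_init
    else:
        return None
    dist = great_circle_nm(start[1], start[2], ref[1], ref[2])
    gap = abs(start[0] - ref[0])
    return label if dist <= LOOSE_DIST_NM and gap <= LOOSE_WINDOW else None


genesis = (datetime(2009, 5, 5, 0), 12.0, 87.0)   # hypothetical TC
near = (datetime(2009, 5, 4, 12), 12.5, 86.5)     # hypothetical member track start
far = (datetime(2009, 5, 4, 12), 12.0, 40.0)      # ~2750 nm away
print(classify_track(datetime(2009, 5, 1), near, genesis))  # pre-genesis
print(classify_track(datetime(2009, 5, 1), far, genesis))   # None
```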


The pre-genesis ensemble tracks for all tropical cyclones during the period 2007–2010 are then used to determine 'strict' distance and time separation thresholds, which replace the 'loose' thresholds used initially. The derived thresholds include an initial distance threshold of 500 nm, which increases by 100 nm for every 24 hrs of forecast lead-time. In addition, the initial temporal separation threshold is ±5 days, with the upper threshold increasing by one day for every 24 hrs of additional forecast lead-time.
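The strict thresholds can be written as simple functions of lead time. This sketch assumes the increments grow continuously with lead time (the text does not say whether they step once per full 24 hrs) and defines the time offset as the ensemble track's start time minus the observation-point time, so that only the later (upper) bound widens; the function names are hypothetical.

```python
def strict_thresholds(lead_hr: float):
    """Return (max_distance_nm, (min_offset_days, max_offset_days))."""
    days = lead_hr / 24.0
    max_dist_nm = 500.0 + 100.0 * days   # 500 nm, +100 nm per 24 hrs
    lower_days = -5.0                    # lower time bound stays at -5 days
    upper_days = 5.0 + 1.0 * days        # +5 days, +1 day per 24 hrs
    return max_dist_nm, (lower_days, upper_days)


def strict_match(lead_hr: float, dist_nm: float, offset_days: float) -> bool:
    """True if a track start lies inside the strict space-time window."""
    max_dist_nm, (lo, hi) = strict_thresholds(lead_hr)
    return dist_nm <= max_dist_nm and lo <= offset_days <= hi


print(strict_thresholds(120))  # (1000.0, (-5.0, 10.0))
```

At a 120-hr lead time the window has therefore doubled in distance and in its upper time bound relative to the analysis time.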

Once the ensemble forecast tracks that are TC tracks have been identified, it follows that all other ensemble forecast tracks must be false alarms. Using this definition of a false alarm track leads to the conclusion that >50% of all VarEPS forecast tracks for the period 2007–2010 are false alarms. The major problem with this definition is that it does not take advantage of the probabilistic framework of the VarEPS, in which, for a particular forecast, the distribution of all ensemble track forecasts provides a probabilistic estimate of the potential formation and movement of a tropical cyclone. Therefore, we have developed a false-alarm clustering algorithm in which a false alarm occurs when a cluster of VarEPS ensemble members produces a localized set (in space and time) of forecast tracks. The method uses the tropical cyclone filtering routine discussed earlier together with k-means clustering (Mirkin 1996), an algorithm that divides a set of points (in this case, initial ensemble forecast coordinates) into k clusters whose membership is determined by minimizing the distance between each ensemble member's forecast point and each cluster's mean, or centroid, location.
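For reference, a plain Lloyd's-algorithm version of k-means is sketched below in pure Python. The random initialization with a fixed seed and the use of raw (lat, lon) coordinates are assumptions for illustration; if time were added as a coordinate, the space and time axes would need comparable scaling.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length coordinate tuples.

    Returns (centroids, labels). Initial centroids are k distinct input
    points drawn with a fixed seed, so results are reproducible.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


starts = [(15.0, 88.0), (15.5, 88.5), (14.5, 87.5),   # hypothetical Bay of Bengal starts
          (20.0, 65.0), (20.5, 65.5), (19.5, 64.5)]   # hypothetical Arabian Sea starts
centroids, labels = kmeans(starts, 2)
print(centroids, labels)
```

The two hypothetical groups of track starting points end up in separate clusters; which group receives label 0 depends on the random initialization.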

The procedure used to determine the number and location of the false alarm clusters for a particular forecast is shown schematically in Figure. First, a probability threshold is assigned such that a false alarm cluster occurs when the normalized number of ensemble tracks within a cluster exceeds the false alarm probability. Next, all ensemble tracks for a particular forecast are compared with the set of ensemble forecast tracks that were defined as either pre-genesis or post-genesis tropical cyclone tracks. Those ensemble tracks that are not in the set of tropical cyclone forecast tracks become the 'candidate false alarms', creating a new set of ensemble forecast tracks for further analysis. The clustering routine then begins with the assumption that all candidate tracks belong to the same set, so that an ensemble mean initial location and forecast time are defined. This coordinate information becomes the 'observation point' that the tropical cyclone filtering routine uses to determine which ensemble forecast tracks are within the distance and time thresholds of the cluster's ensemble mean starting time and initial location. After this subset of ensemble forecast tracks is identified, if the total number of forecast tracks within the subset exceeds the false alarm probability threshold, then the process is repeated with the number of clusters increased by one (k = 1 → k = 2), so that two k-means clusters are now created from the initial set of ensemble forecast tracks. Each cluster's coordinate information is then used in the tropical cyclone filtering routine. If the ensemble track membership of both clusters exceeds the false alarm probability threshold, then the complete process is repeated with three clusters (k = 3). This iterative process continues until the number of ensemble tracks in any cluster drops below the false alarm probability threshold. When this occurs, the final number of false alarm clusters is set to k − 1.
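The iterative procedure above can be sketched as follows. Several details are assumptions rather than statements from the text: the normalized count divides by the total number of ensemble members (51), a cluster "drops below" the threshold when its fraction is at most `p_fa`, a fixed space-time window (500 nm, ±2 days) stands in for the filtering routine's thresholds, k-means runs directly on (lat, lon, start day), and `k_max` is a hypothetical safety limit.

```python
import math
import random

EARTH_RADIUS_NM = 3440.065


def great_circle_nm(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2.0 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def kmeans(points, k, n_iter=100, seed=0):
    """Compact Lloyd's k-means; returns centroids only."""
    rng = random.Random(seed)
    cents = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in points]
        if new == labels:
            break
        labels = new
        for j in range(k):
            mem = [p for p, lab in zip(points, labels) if lab == j]
            if mem:
                cents[j] = tuple(sum(c) / len(mem) for c in zip(*mem))
    return cents


def members_near(centroid, cands, max_dist_nm, max_dt_days):
    """Count candidate tracks within the space/time window of a centroid."""
    lat0, lon0, t0 = centroid
    return sum(1 for lat, lon, t in cands
               if great_circle_nm(lat0, lon0, lat, lon) <= max_dist_nm
               and abs(t - t0) <= max_dt_days)


def count_false_alarm_clusters(cands, n_members, p_fa,
                               max_dist_nm=500.0, max_dt_days=2.0, k_max=10):
    """Increase k until some cluster's member fraction falls to p_fa or below.

    cands: (lat, lon, start_time_days) of candidate false-alarm tracks.
    Returns k - 1, the number of false alarm clusters.
    """
    k = 1
    while k <= min(k_max, len(cands)):
        cents = kmeans(cands, k)
        fracs = [members_near(c, cands, max_dist_nm, max_dt_days) / n_members
                 for c in cents]
        if min(fracs) <= p_fa:
            return k - 1
        k += 1
    return k - 1


east = [(15.0 + 0.1 * i, 88.0 - 0.1 * i, 3.0) for i in range(12)]  # hypothetical
west = [(20.0 + 0.2 * i, 65.0, 5.0) for i in range(3)]             # hypothetical
print(count_false_alarm_clusters(east + west, n_members=51, p_fa=0.1))  # 1
```

In this hypothetical case the 12 tightly grouped candidates (12/51 ≈ 0.24) form one false alarm cluster, while the 3 scattered Arabian Sea candidates (3/51 ≈ 0.06) fall below the 10% threshold at k = 2, so the procedure stops and reports k − 1 = 1.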

3. Results and Discussion

In Section 3a we present an analysis of how well the VarEPS predicts tropical cyclone formation in the North Indian Ocean for the period 2007–2010. The analysis evaluates the probability of detection and the false alarm rate as a function of forecast lead-time and increasing probability threshold. We also evaluate how the pre-genesis tropical cyclone forecast tracks and intensities from the VarEPS perform relative to observations. In Section 3b, a traditional evaluation of forecast skill (i.e., post-TC genesis) is presented through an examination of the track and intensity error statistics. Finally, in Section 3c we examine the basin-wide predictability of regional TC activity in the North Indian Ocean by evaluating the VarEPS forecasts for the Arabian Sea and the Bay of Bengal.

a) Tropical Cyclone Formation and Pre-genesis Forecasts

To illustrate some of the forecast information contained within the VarEPS forecasts for tropical cyclones, Figurea shows the spatial distribution of tropical cyclone forecast tracks for Severe Cyclone Nargis from the VarEPS initialized on 23 April 2008 00UTC, about four days before the JTWC initiated tropical depression advisories. In this case, the VarEPS forecasts are in good agreement that the pre-cyclonic vortex that would become Nargis would reach advisory criteria around 27 April 2008 in the central Bay of Bengal. Thereafter, the tropical cyclone is forecast to move generally toward the east-northeast on a track that would bring the system ashore in Myanmar around 30 April or 1 May, with a high (60%+) probability of landfall as a hurricane and a very low (5%) probability of landfall as a severe cyclone/major hurricane. In reality, Nargis intensified to category 4 on the Saffir-Simpson scale, with maximum sustained winds of 115 kts. The system made landfall in southern Myanmar on 2 May at 12UTC and propagated eastward across the Irrawaddy delta (Webster 2007). The forecast performance of the VarEPS for Severe Cyclone Nargis is exceptional in nearly all facets of prediction: tropical cyclone genesis was forecast correctly eight days in advance, ensemble mean track errors never exceeded 375 nm even for pre-genesis track forecasts, and maximum intensity forecasts, although underestimated, indicated a moderate (30%+) probability of Nargis reaching hurricane intensity nearly six days before TC formation.


A more systematic study of all tropical cyclones in the North Indian Ocean is now conducted to determine whether the Nargis forecast performance is a distinguishing characteristic of the VarEPS. Figure 3 is a relative operating characteristic (ROC) evaluation of the VarEPS forecasts for NIO tropical cyclone genesis using the probability of detection (POD; also known as hit rate) and the false alarm rate (FAR; also known as probability of false detection) for all 23 tropical cyclones occurring during the 2007–2010 period. The POD is the fraction of all observed tropical cyclones for which TC genesis was forecast correctly. The FAR is the number of forecasts in which TC genesis was forecast but did not occur, divided by the sum of those false alarms and the correct rejections (i.e., cases in which TC genesis was neither forecast nor observed). The false alarm rate should not be confused with the false alarm ratio, which is the proportion of all forecasts of TC formation in which formation did not occur (Barnes et al. 2009). Since the false alarm rate depends not only on the number of false alarms but also on the number of correct rejections, the number of VarEPS forecasts included in the evaluation modulates the false alarm rate, mainly through the number of correct rejections. If all VarEPS forecasts from 2007–2010 are included, the false alarm rate as a function of forecast probability threshold is significantly lower than if only the months traditionally associated with NIO TC activity are included. Therefore, for this analysis, the VarEPS evaluation is restricted to the months of April to June and August to December for the 2007–2010 period.
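The three verification measures defined above can be computed from a 2 × 2 contingency table. The sketch assumes that the ensemble forecast probability is the fraction of members forecasting genesis and that "genesis forecast" means that probability is at or above the threshold; the matching of forecast to observed genesis within a time window is abstracted into the hypothetical `observed` flags.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    hits: int                # genesis forecast and observed
    misses: int              # genesis observed but not forecast
    false_alarms: int        # genesis forecast but not observed
    correct_rejections: int  # genesis neither forecast nor observed

    def pod(self) -> float:
        """Probability of detection (hit rate)."""
        return self.hits / (self.hits + self.misses)

    def false_alarm_rate(self) -> float:
        """Probability of false detection: FA / (FA + correct rejections)."""
        return self.false_alarms / (self.false_alarms + self.correct_rejections)

    def false_alarm_ratio(self) -> float:
        """Fraction of 'yes' forecasts that did not verify: FA / (hits + FA)."""
        return self.false_alarms / (self.hits + self.false_alarms)


def build_table(forecast_probs, observed, threshold):
    """Forecast 'genesis' when the ensemble probability >= threshold."""
    table = Contingency(0, 0, 0, 0)
    for p, obs in zip(forecast_probs, observed):
        yes = p >= threshold
        if yes and obs:
            table.hits += 1
        elif obs:
            table.misses += 1
        elif yes:
            table.false_alarms += 1
        else:
            table.correct_rejections += 1
    return table


probs = [0.6, 0.3, 0.05, 0.0, 0.4, 0.1, 0.0, 0.0]   # hypothetical
observed = [True, True, True, False, False, False, False, False]
t = build_table(probs, observed, threshold=0.2)
print(t.pod(), t.false_alarm_rate(), t.false_alarm_ratio())
# POD = 2/3, false alarm rate = 0.2, false alarm ratio = 1/3
```

Appending more quiet-period forecasts (extra correct rejections) lowers the false alarm rate while leaving the POD and the false alarm ratio unchanged, which is why the evaluation is restricted to the active months.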

The POD and FAR statistics are also sensitive to the time window used for verification. For instance, if one were to verify all one-day lead-time TC genesis forecasts without requiring that the VarEPS correctly forecast the actual date and time of TC genesis (i.e., the full period of the VarEPS integration is used), the POD (FAR) would be much larger (smaller) than if a short window centered on the time of TC genesis were required. Figures 3a–c show the POD and FAR of the VarEPS forecasts using 48-hr, 96-hr, and 360-hr time windows, calculated at 10% forecast probability intervals ranging from 0 to 100%. Using the 48-hr time window (Figure 3a), the VarEPS forecasts for lead-times of one to five days exhibit moderate probabilities of detection (0.4–0.7) with very low false alarm rates (0.1–0.2), mainly for forecast probability thresholds of 10–40%. Although the false alarm rate never exceeds 0.2 even at extended lead-times, VarEPS forecasts made more than seven days in advance tend to have low POD once forecast probability thresholds increase beyond 20%. This reflects the decreased frequency of VarEPS forecasts at long lead-times in which the forecast probability exceeds 20%, in contrast with the more frequent occurrence of higher forecast probabilities at shorter lead-times (not shown). If a 96-hr time window (Figure 3b) is used, the probability of detection (false alarm rate) of the VarEPS increases on average by about 0.2 (0.1) for forecasts of TC genesis made less than seven days in advance. Forecasts made more than seven days in advance benefit even more from the larger time window, as the POD for a ten-day lead-time forecast increases on average by about 0.15 while the FAR increases by only 0.05 for forecast probability thresholds of 10–30%. Finally, using the full 15-day period of the VarEPS to define the time window of TC genesis (Figure 3c), the average POD exceeds 0.7 and the FAR ranges from 0.3 to 0.6 for forecast probability thresholds of 10–40%.

A key component of any operational forecasting system for TC genesis is knowing which combination of forecast lead-time and time window maximizes the probability of detection while minimizing the false alarm rate. The area under the ROC curve, known as the ROC score, varies from 0 to 1, where 0.5 indicates no forecast skill and 1 indicates a perfect forecast system (Mason and Graham 1999). If the ROC score is calculated as a function of forecast lead-time for various time windows, the optimum time window is the one for which the ROC score reaches its maximum value. For TC genesis forecasts with lead-times of one to five days, a 48-hr time window around the forecast date of genesis maximizes the probability of detection while minimizing the false alarm rate. For TC genesis forecasts beyond a lead-time of five days, the ROC score is maximized if the full 15 days of the VarEPS integration are used to determine the forecast time of TC genesis.
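A minimal sketch of the ROC score and the choice of optimum window, assuming the area is computed with the trapezoidal rule over (FAR, POD) pairs from the probability thresholds, with the (0, 0) and (1, 1) end points added. The window labels and all numbers below are hypothetical, not values from Figure 3.

```python
def roc_score(points):
    """Trapezoidal area under a ROC curve given (far, pod) pairs.

    The (0, 0) and (1, 1) end points are added, so an empty list
    returns 0.5 (no skill).
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def best_window(curves):
    """Return the time-window key whose ROC curve has the largest area."""
    return max(curves, key=lambda w: roc_score(curves[w]))


# Hypothetical (FAR, POD) pairs at two probability thresholds per window
curves = {
    "48-hr": [(0.1, 0.60), (0.2, 0.70)],
    "96-hr": [(0.2, 0.75), (0.3, 0.85)],
}
print({w: round(roc_score(c), 4) for w, c in curves.items()})
print(best_window(curves))
```

Repeating this selection for each lead time gives the lead-time-dependent optimum window described above.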

The following analysis provides a greater understanding of the NIO false alarms that occur in the VarEPS. The spatial distribution of each false alarm cluster's starting centroid location from 2007 to 2010 is shown in Figure 4a. Although the false alarms in the Arabian Sea and southern Bay of Bengal tend to be distributed uniformly during the 2007–2010 period, there is a relatively high concentration of false alarm clusters stretching from the northwestern Bay of Bengal into the extreme northeastern Arabian Sea. The false alarm clusters in the northwestern Bay of Bengal tend to occur in a localized region with a large topographic gradient between the Bay of Bengal and the northern portion of the Eastern Ghats mountains. This concentration of false alarms may mark the preferred track of pre-existing cyclonic vortices that move through the Bay of Bengal and into India but never become TCs. The localized nature of this false alarm concentration may indicate a relationship between current convective parameterization schemes, topography, and the forecast frequency of TC genesis in the VarEPS.

Figure 4b shows the false alarm ratio for the period 2007–2010, and Figure 4c shows the probability density function of all false alarms as a function of forecast lead-time. The false alarm ratio reaches a peak value of around 0.5 at a forecast probability level of 10% and decays nearly exponentially as the forecast probability threshold increases. At a forecast probability level of 25%, the false alarm ratio and the forecast probability level are equal. Figure 4c shows that nearly half of all false alarms during the 2007–2010 period occurred at forecast lead-times of 96 to 240 hours. Relative to the period 2007–2009, the distribution of false alarms for 2010 shifts to earlier lead-times, with the mean forecast lead-time decreasing from 153 to 93 hours.

We now evaluate how well the VarEPS forecasts of track and maximum intensity perform prior to TC genesis. Figures 5a–b show the total track error (in nm) and absolute wind error (in kts) as a function of lead-time for all pre-genesis VarEPS forecasts during the period 2007–2010. The average ensemble mean track error at a lead-time of 24 hrs is 69 nm with a 50% interval of 42–86 nm, and at 120 hrs it is 241 nm with a 50% interval of 116–324 nm. From a lead-time of 24 hrs to 240 hrs, the mean ensemble track error grows nearly linearly at 41 nm per day, so that by a lead-time of 240 hrs the total mean ensemble error is 409 nm with a 50% interval of 184–504 nm. From a lead-time of 240 hrs to 360 hrs, the track error growth plateaus at an error of about 443 nm; however, this lack of error growth at extended lead-times is likely due to the smaller number of pre-genesis VarEPS forecasts at these lead-times relative to lead-times of less than 240 hrs. Figure 5a also shows that the VarEPS track forecasts perform similarly regardless of year, indicating that even though the VarEPS underwent several major changes during the 2007–2010 period, there has not been a substantial change in forecast track performance in the NIO. To place these pre-genesis track errors in perspective, the Indian Meteorological Department's post-genesis track errors at lead-times of 48 hrs and 72 hrs typically average 162 nm and 270 nm, respectively². This result suggests that the VarEPS can provide a track forecast prior to TC formation, at a lead-time of 138 hrs, that achieves the same track accuracy as the IMD's 72-hr track forecasts, which are not provided until after a TC has formed.

² http://www.imd.gov.in/section/nhac/dynamic/faq/FAQP.htm
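The track statistics above combine a great-circle track error with a mean and a 50% interval. The sketch below assumes the ensemble mean position is the simple arithmetic mean of member latitudes and longitudes (adequate away from the dateline), that the total track error is the haversine distance to the observed position, and that the 50% interval is the interquartile range given by `statistics.quantiles` with its default method; all positions are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def great_circle_nm(lat1, lon1, lat2, lon2):
    """Haversine distance (nm) between two points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2.0 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def ensemble_mean_position(members):
    """Arithmetic mean of member (lat, lon) positions at one lead time."""
    lats, lons = zip(*members)
    return statistics.fmean(lats), statistics.fmean(lons)


def summarize(errors):
    """Mean and 50% (interquartile) interval of a list of errors."""
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return statistics.fmean(errors), (q1, q3)


members = [(15.0, 88.0), (15.4, 88.4), (14.6, 87.6)]  # hypothetical, one lead time
observed = (16.0, 88.0)                                # hypothetical best-track fix
mean_pos = ensemble_mean_position(members)
print(round(great_circle_nm(*mean_pos, *observed), 1))  # 60.0
print(summarize([40, 60, 80, 100, 120]))                # (80.0, (50.0, 110.0))
```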


Figure 5b shows the absolute wind error (in kts) for all pre-genesis ensemble forecasts during the period 2007–2010. Forecasts with a lead-time of 24 hrs have a mean absolute wind error of 12 kts with a 50% interval of 4–16 kts, and by 120 hrs the mean absolute wind error grows to 23 kts with a 50% interval of 7–29 kts. However, unlike the pre-genesis TC track forecasts, where the 95th percentile interval is approximately 2.5 times as large as the mean ensemble track error at 120 hrs, the 95th percentile interval for absolute wind error is 3.4 times as large as the mean, reflecting a substantial negative wind bias for several of the most intense tropical cyclones during the 2007–2010 period (not shown). In terms of the interannual variation in forecast performance for maximum intensity, Figure 5b indicates that the VarEPS forecasts for 2008–2010 have on average performed substantially better than those from 2007, with 2008 and 2009 showing three times the improvement relative to 2007. This marked change in forecast skill for maximum intensity is likely due to the increase in horizontal and vertical resolution of the VarEPS after 2007.
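The absolute wind error hides the sign of the error, so a systematic underestimate of intense storms shows up only as a long upper tail. The sketch below, using hypothetical maximum-wind values, computes the mean absolute error alongside the signed mean bias (forecast minus observed, so a negative bias means the forecasts are too weak).

```python
import statistics


def intensity_errors(forecast_kt, observed_kt):
    """Return (mean absolute error, mean bias) in kts; bias < 0 means too weak."""
    diffs = [f - o for f, o in zip(forecast_kt, observed_kt)]
    mae = statistics.fmean(abs(d) for d in diffs)
    bias = statistics.fmean(diffs)
    return mae, bias


# Hypothetical maximum-wind forecasts and observations for three intense storms
mae, bias = intensity_errors([40, 50, 45], [55, 60, 50])
print(mae, bias)  # 10.0 -10.0
```

When the bias equals minus the mean absolute error, as here, every forecast is too weak, which is the signature of the negative wind bias described above.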

Forecasting tropical cyclone formation requires an estimate not only of the likely location of TC genesis but also of the time when a system is likely to reach advisory thresholds. Figure 5c shows the relative error (in days) in the VarEPS forecast timing of TC genesis. Positive values indicate that the VarEPS TC genesis forecasts occur sooner than observed, while negative values indicate a later genesis date. For a lead-time of 24 hours, the VarEPS mean ensemble error is -0.5 days with a 50% interval of -1 to 0.5 days. By a lead-time of 120 hrs, the VarEPS mean ensemble error exhibits a slight positive bias of 0.5 days with a 50% interval of -0.9 to 1.9 days. After a lead-time of 168 hrs, the VarEPS mean ensemble error increases more rapidly, such that at 240 hrs the mean ensemble bias grows to 2.2 days with a 50% interval of 0.8 to 3.3 days.
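The sign convention above can be made explicit in code. This sketch defines the timing error as observed minus forecast genesis time in days, so an early forecast is positive; the dates are hypothetical and the function name is illustrative.

```python
from datetime import datetime
from statistics import fmean


def genesis_timing_error_days(forecast_genesis, observed_genesis):
    """Timing error in days; positive when the forecast genesis is earlier."""
    return (observed_genesis - forecast_genesis).total_seconds() / 86400.0


observed = datetime(2009, 5, 5, 0)                       # hypothetical
members = [datetime(2009, 5, 4, 12), datetime(2009, 5, 5, 12),
           datetime(2009, 5, 3, 0)]                      # hypothetical
errors = [genesis_timing_error_days(m, observed) for m in members]
print(errors, fmean(errors))  # [0.5, -0.5, 2.0] and a mean of about 0.67
```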


One possible explanation for why the VarEPS forecasts of TC genesis timing are well constrained through a lead-time of 168 hrs is the dispersion, or spread, among the VarEPS ensemble members. Figure 5d shows the distribution of ensemble spread in genesis time for each TC from 2007–2010 as a function of forecast lead-time. Here, the ensemble spread is calculated as the difference in time (in days) of TC genesis between the earliest and latest ensemble members. As shown in Figure 5d, the ensemble spread in TC genesis time grows rapidly over the first seven days of forecast lead-time and then plateaus at around 12 days after a forecast lead-time of 168 hrs. This limit is likely a response to the 15-day integration limit of the VarEPS and could explain why the VarEPS begins showing significant positive biases in the time of TC genesis relative to observations after 168 hrs (cf. Figure 5c).
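The spread measure above is the latest minus the earliest member genesis time. The sketch below computes it for hypothetical member genesis times and also checks the constraint that explains the plateau: every member's genesis must fall inside the 15-day integration, so the spread can never exceed 15 days.

```python
from datetime import datetime, timedelta


def genesis_spread_days(member_genesis_times):
    """Latest minus earliest member genesis time, in days."""
    span = max(member_genesis_times) - min(member_genesis_times)
    return span.total_seconds() / 86400.0


init = datetime(2009, 5, 1)  # hypothetical initialization time
times = [init + timedelta(days=d) for d in (2.0, 5.5, 3.0, 9.0)]
print(genesis_spread_days(times))  # 7.0

# All member genesis times lie within the 15-day integration window,
# which caps the spread at 15 days regardless of lead time.
assert all(init <= t <= init + timedelta(days=15) for t in times)
```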

b) Post-genesis Tropical Cyclone Forecasts

Figure 6a shows the distribution of all VarEPS track forecasts during the 2007–2010 period, indicating how the total track error varies as a function of lead-time once TC genesis has occurred. At a lead-time of 24 hrs, the mean track error is 72 nm with a 50% interval of 36–105 nm. From 24 to 120 hrs, the VarEPS track errors increase linearly at a rate of 53 nm per day, such that by a lead-time of 120 hrs the mean track error is 287 nm with a 50% interval of 162–332 nm. Similar to the VarEPS pre-genesis track forecasts, the VarEPS post-genesis track errors show no significant year-to-year improvement for the period 2007–2010.
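The text does not state how the daily error growth rate is derived; one plausible approach, assumed here, is an ordinary least-squares slope of mean track error against lead time in days. Using only the two values quoted above (72 nm at 24 hrs and 287 nm at 120 hrs), the slope reduces to (287 − 72) / 4 = 53.75 nm per day, consistent with the ~53 nm per day stated.

```python
def error_growth_nm_per_day(lead_hr, mean_error_nm):
    """Least-squares slope of mean track error (nm) against lead time (days)."""
    x = [h / 24.0 for h in lead_hr]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(mean_error_nm) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, mean_error_nm))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


# Two values quoted in the text: 72 nm at 24 hrs and 287 nm at 120 hrs
print(error_growth_nm_per_day([24, 120], [72, 287]))  # 53.75
```

With the full set of lead times, the same function would fit all points rather than just the two end points.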

To place the VarEPS post-genesis track forecasts in perspective, Figure 6b compares the VarEPS control and ensemble mean forecasts with those of the JTWC and several other models: the U.S. Navy's version of the GFDL model (GFDN), the United Kingdom Meteorological Office's global model (UKMET), the National Centers for Environmental Prediction's (NCEP) Global Forecast System (GFS), and the U.S. Navy's NOGAPS model. Because these forecasts were obtained through the U.S. Navy's Automated Tropical Cyclone Forecasting System, most of the forecast guidance is limited to 72 hrs, similar to the temporal limit of the JTWC's forecasts prior to 2010. Although the VarEPS control and ensemble mean forecasts on average begin with the largest initial track error, from 12 hrs onward they have the lowest track errors of all the model forecasts. In addition, the VarEPS control and ensemble mean on average exhibit slightly lower track errors than the JTWC through a lead-time of 72 hrs, although this difference is not statistically significant at the 95% confidence level. Relative to the next best performing model, NCEP's GFS, the VarEPS ensemble mean's 24-, 48-, and 72-hr track forecast errors are on average 10%, 19%, and 27% smaller. Since the linear track error growth of the VarEPS ensemble mean (41 nm/day) is considerably smaller than that of the GFS (66 nm/day), the VarEPS provides greater track forecast utility than the other models at longer lead-times.
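
The per-day growth rates quoted above, 41 nm/day for the VarEPS ensemble mean and 66 nm/day for the GFS, are presumably the slope of a linear fit of mean track error against lead-time. The percentage comparisons are relative error reductions. Below is a minimal sketch that assumes an ordinary least-squares fit. The error values are hypothetical and were chosen only so the arithmetic is easy to check.

```python
from typing import Sequence


def growth_rate_nm_per_day(lead_hrs: Sequence[float], mean_err_nm: Sequence[float]) -> float:
    """Least-squares slope of mean track error vs. lead-time, converted to nm/day."""
    n = len(lead_hrs)
    x_bar = sum(lead_hrs) / n
    y_bar = sum(mean_err_nm) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lead_hrs, mean_err_nm))
    sxx = sum((x - x_bar) ** 2 for x in lead_hrs)
    return 24.0 * sxy / sxx  # slope is nm per hour; scale to nm per day


def percent_reduction(model_err: float, reference_err: float) -> float:
    """How much smaller model_err is than reference_err, in percent."""
    return 100.0 * (reference_err - model_err) / reference_err


# Hypothetical mean errors at 24, 48, 72 h that grow by exactly 41 nm/day.
print(round(growth_rate_nm_per_day([24, 48, 72], [72, 113, 154]), 1))  # 41.0
# A 72-nm error against an 80-nm reference is a 10% reduction.
print(percent_reduction(72, 80))  # 10.0
```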

Figure 6c shows the distribution of absolute wind error for all VarEPS post-genesis forecasts during the 2007–2010 period. At the analysis time (0 hrs), the mean absolute wind error of all VarEPS forecasts is high, at 21 kts with a 50% interval of 13–48 kts. By a lead-time of 72 hrs the mean absolute wind error reaches 32 kts with a 50% interval of 8–54 kts, and by 108 hrs the VarEPS intensity forecasts reach their maximum mean wind error of 41 kts with a 50% interval of 21–52 kts. Unlike the intensity error statistics for the pre-genesis VarEPS forecasts, the interannual variation of the post-genesis intensity forecasts shows a more substantial improvement for 2008–2010 relative to 2007: averaged over the first 72 hrs of lead-time, the improvement relative to 2007 is 67%. Finally, Figure 6d compares the absolute wind error of the VarEPS control and ensemble mean with that of the other forecast models and the JTWC. For the 2007–2010 period as a whole, the VarEPS begins with much higher initial error than any other forecast model but also shows the lowest intensity error growth through the first 72 hrs. If 2007 is excluded from the analysis, the VarEPS control and ensemble mean perform better than any other global numerical weather prediction model (GFS, NOGAPS, UKMET) and similarly to the high-resolution GFDN model and the JTWC's operational forecasts.

One rationale for not issuing tropical cyclone forecasts prior to TC genesis is the notion that pre-genesis track forecasts have significantly larger errors than track forecasts for systems that have already formed. Although testing this hypothesis robustly requires a large number of pre-genesis and post-genesis forecasts, the sample of around 100 VarEPS mean forecasts for the North Indian Ocean from 2007–2010 is sufficient to test it. Figure 7a shows the probability distribution functions of pre-genesis and post-genesis VarEPS track forecast errors for lead-times less than 48 hours. Consistent with this hypothesis, VarEPS forecasts made after TC genesis are on average more skillful at predicting track movement (median track error of 62 nm) than forecasts made prior to tropical cyclone formation (median track error of 75 nm), and the difference is statistically significant at the 90% confidence level. Figure 7b shows the corresponding probability distribution functions of total track error for longer lead-times (48 to 120 hours). Unlike the first 48 hours, pre-genesis TC track forecasts on average exhibit lower total track errors at these extended lead-times (median track error of 136 nm versus 171 nm), a difference that is statistically significant at the 99% confidence level. This difference between pre-genesis and post-genesis forecasts reflects a systematic, statistically significant shift in the first moment of the distributions. However, the VarEPS post-genesis track error distribution is also narrower, which implies that there is greater dispersion among the VarEPS ensemble members prior to TC genesis.
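
The text does not spell out how the significance of the median difference was assessed. It only says that the confidence bands in Figure 7 come from bootstrap resampling. One plausible approach, sketched here as an assumption, is a percentile bootstrap on the difference in medians between the pre-genesis and post-genesis errors. The difference is deemed significant at the chosen level when the confidence interval excludes zero. The function name, resample count, and data are hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_median_diff_ci(a: Sequence[float], b: Sequence[float],
                             n_boot: int = 5000, level: float = 0.95,
                             seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for median(a) - median(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement, at its own size.
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(statistics.median(ra) - statistics.median(rb))
    diffs.sort()
    alpha = (1.0 - level) / 2.0
    lo = diffs[int(alpha * n_boot)]
    hi = diffs[min(int((1.0 - alpha) * n_boot), n_boot - 1)]
    return lo, hi


# Hypothetical 48-120 h errors (nm): pre-genesis lower than post-genesis.
pre = [120, 128, 131, 136, 140, 145, 150, 158]
post = [150, 160, 166, 171, 175, 182, 190, 205]
lo, hi = bootstrap_median_diff_ci(pre, post)
print(hi < 0)  # True -> the interval excludes zero
```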

A number of factors might cause this rather counterintuitive result for pre-genesis TC tracks at longer lead-times. One possible explanation involves how ensemble spread is generated after genesis, through moist singular vectors introduced into a 5° × 5° region enclosing the TC. This process may produce less dispersion than simply modifying the initial TC location prior to model integration, which is what implicitly happens with forecasts made prior to TC formation. Pre-genesis track forecasts therefore begin with larger track dispersion. This larger initial spread samples more of the forecast phase space and yields lower track error growth rates, which allows the ensemble mean on average to perform better than in forecasts with smaller initial dispersion (i.e., post-genesis forecasts).

c) Regional Outlooks of Tropical Cyclone Activity

We now consider the VarEPS's ability to produce skillful regional outlooks of TC activity (i.e., the probability that a vortex of tropical depression strength or greater will be located within a region) by dividing the North Indian Ocean into two sub-domains: the Arabian Sea and the Bay of Bengal. Figure 8a shows the ROC for the Arabian Sea using the VarEPS forecasts from April–June and August–December of 2007–2010. As in the ROC analysis of the VarEPS TC genesis forecasts, the VarEPS forecasts for TC activity in the Arabian Sea exhibit moderate probabilities of detection with very low false alarm rates, even as the forecast decision threshold is lowered. As forecast lead-time increases, forecast performance changes after a lead-time of ten days. The ROC curve for forecasts with lead-times of five to ten days shows POD and FAR similar to those of forecasts at shorter lead-times. However, comparing the ROC curve for lead-times of ten to fifteen days with that for five to ten days reveals a large decrease in the probability of detection, while the false alarm rate remains essentially constant. From this analysis it is unclear whether this decrease in POD is due to an inherent lack of predictability at this longer time scale or depends on the current configuration of the VarEPS, which has reduced horizontal resolution at lead-times greater than 240 hrs. It is clear, however, that the current configuration of the VarEPS does not produce forecast probabilities of TC activity in the Arabian Sea greater than 30% for lead-times of ten to fifteen days, which is one reason why the POD is so much lower than for forecasts with lead-times of less than ten days. Figure 8b shows the ROC for the Bay of Bengal using the same set of VarEPS forecasts as in Figure 8a. Relative to the Arabian Sea, the VarEPS forecasts of TC activity in the Bay of Bengal have lower probabilities of detection at shorter lead-times but higher POD values at longer lead-times. For lead-times of ten to fifteen days, forecasts in the Bay of Bengal also exhibit slightly higher false alarm rates than those in the Arabian Sea. Nevertheless, the relative change in POD compared with FAR indicates that overall the VarEPS forecasts are more skillful in the Bay of Bengal than in the Arabian Sea at extended lead-times.
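
The regional outlook and ROC construction described above can be sketched as follows. This is a hedged illustration that makes two assumptions. First, the forecast probability is the fraction of ensemble members with a vortex of tropical depression strength or greater inside the region. Second, the ROC uses the false alarm rate (false alarms divided by all observed non-events) rather than the false alarm ratio, a distinction discussed by Barnes et al. (2009). Each decision threshold yields one (FAR, POD) point on the curve. Function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def member_probability(member_in_region: Sequence[bool]) -> float:
    """Fraction of ensemble members with a TD-or-stronger vortex in the region."""
    return sum(member_in_region) / len(member_in_region)


def roc_points(probs: Sequence[float], observed: Sequence[bool],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(false alarm rate, probability of detection) for each decision threshold.

    A forecast counts as 'yes' when its probability >= threshold.
    """
    points = []
    for t in thresholds:
        hits = misses = false_alarms = correct_negatives = 0
        for p, obs in zip(probs, observed):
            yes = p >= t
            if yes and obs:
                hits += 1
            elif obs:
                misses += 1
            elif yes:
                false_alarms += 1
            else:
                correct_negatives += 1
        pod = hits / (hits + misses) if hits + misses else 0.0
        far = (false_alarms / (false_alarms + correct_negatives)
               if false_alarms + correct_negatives else 0.0)
        points.append((far, pod))
    return points


# Hypothetical: 10 of 50 members place a vortex in the Arabian Sea.
print(member_probability([True] * 10 + [False] * 40))  # 0.2
print(roc_points([0.9, 0.7, 0.3, 0.1, 0.05],
                 [True, True, False, False, False], [0.5, 0.2]))
# [(0.0, 1.0), (0.3333333333333333, 1.0)]
```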

To quantify how well the VarEPS forecasts of TC activity perform in the Arabian Sea and the Bay of Bengal as a function of forecast lead-time, two skill score metrics are used: the Brier skill score (BSS) and the ROC score. The BSS measures the accuracy of a probabilistic forecast relative to climatology by comparing the forecast probability of an event with whether or not the event was observed. A BSS greater than zero implies forecast skill beyond climatology. In this analysis, a 30-yr climatology (1980–2009) of tropical cyclones was developed from the JTWC best-track dataset. Although the BSS reveals how skillful a forecast system is relative to climatology, it is regarded as a harsh standard, because a BSS less than 0 can hide forecasts that still contain useful information (Mason 2004). Therefore, we use the ROC score as a second skill score metric. Table 1 provides the BSS and ROC scores for the Arabian Sea and the Bay of Bengal based on the VarEPS forecasts from April–June and August–December of 2007–2010 as a function of forecast lead-time. Statistical significance at the 95% confidence level was established with a nonparametric bootstrap test: the BSS (ROC score) is considered statistically significant if its 95% confidence interval lies entirely above 0 (0.50). From Table 1, the BSS indicates that the VarEPS forecasts for TC activity in the Arabian Sea are skillful beyond climatology for forecasts made less than two days, two to five days, and five to ten days in advance. Beyond ten days, however, forecasts for TC activity are not skillful relative to climatology. In contrast, the ROC score is well above 0.50 for all time periods, which indicates that the VarEPS provides skillful forecasts of TC activity in the Arabian Sea through fifteen days.
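
The two skill scores and the significance rule described above can be sketched as follows. This is a minimal illustration built on several assumptions:

* The Brier score is the mean squared difference between the forecast probability and the 0/1 outcome.
* The climatological reference assigns each forecast its climatological probability. The values used below are a hypothetical constant, and they must lie strictly between 0 and 1 so that the reference Brier score is never zero.
* The ROC score is the trapezoidal area under the (FAR, POD) curve.
* The bootstrap resamples forecast/observation pairs with replacement.

The paper does not state its exact resampling scheme, so `bootstrap_bss_ci` and the other names here are hypothetical.

```python
import random
from typing import Sequence, Tuple


def brier_score(probs: Sequence[float], obs: Sequence[int]) -> float:
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(probs)


def brier_skill_score(probs, obs, clim_probs) -> float:
    """BSS = 1 - BS_forecast / BS_climatology; > 0 means skill beyond climatology."""
    return 1.0 - brier_score(probs, obs) / brier_score(clim_probs, obs)


def roc_score(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve from (FAR, POD) points; 0.5 = no skill."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def bootstrap_bss_ci(probs, obs, clim_probs, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the BSS, resampling forecast/observation pairs."""
    rng = random.Random(seed)
    n = len(probs)
    stats = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        stats.append(brier_skill_score([probs[i] for i in idx],
                                       [obs[i] for i in idx],
                                       [clim_probs[i] for i in idx]))
    stats.sort()
    alpha = (1.0 - level) / 2.0
    return stats[int(alpha * n_boot)], stats[min(int((1.0 - alpha) * n_boot), n_boot - 1)]


probs = [0.8, 0.1, 0.7, 0.2]
obs = [1, 0, 1, 0]
clim = [0.3] * 4  # hypothetical climatological probability
print(round(brier_skill_score(probs, obs, clim), 3))  # 0.845
lo, hi = bootstrap_bss_ci(probs, obs, clim)
print(lo > 0)  # True: the whole interval lies above 0, i.e. significant
print(roc_score([(0.0, 1.0)]), roc_score([]))  # 1.0 0.5
```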

Table 1 also provides the BSS and ROC scores for TC forecasts in the Bay of Bengal. According to the BSS, forecasts made less than five days in advance in the Bay of Bengal are nearly 60% less skillful than those in the Arabian Sea, which is likely due to the higher frequency of false alarms in the Bay of Bengal (cf. Figure 4c and Figures 8a–b). However, consistent with the ROC interpretation of Figures 8a–b, the BSS is positive and statistically greater than 0 at the 95% confidence level through a lead-time of ten days. For the ten to fifteen day forecast period, the BSS indicates that the VarEPS performs as well as climatology in the Bay of Bengal. VarEPS TC forecasts in the Arabian Sea are more skillful than those in the Bay of Bengal for lead-times of less than ten days, but this does not hold at longer time scales, where the Bay of Bengal forecasts have on average an 18% higher BSS. If the ROC score is used to determine forecast skill, the VarEPS forecasts for TCs in the Bay of Bengal are skillful through fifteen days, similar to the ROC score results for the Arabian Sea.

The results in Table 1 may be compared with a similar analysis that Belanger et al. (2010) performed for the tropical North Atlantic using the ECMWF Monthly Forecast System during the 2008 and 2009 hurricane seasons. They find that the most predictable region for TC activity in the North Atlantic is the Main Development Region, with ROC scores of 0.81 (0.75) for forecast days 8–14 (15–21). Relative to these findings, the results presented here indicate that predictability in the Bay of Bengal is similar to that in the North Atlantic's Main Development Region. At extended forecast lead-times, TC predictability in the Arabian Sea is more characteristic of that in the Caribbean Sea. Physically, the predictability at these time scales was attributed to the ability of the VarEPS to predict accurately the magnitude of deep-layer (850–200 hPa) vertical wind shear as well as the correct frequency of pre-existing cyclonic vortices such as African easterly waves. In the Caribbean Sea, predictability is modulated largely by variability in the intensity and location of the tropical upper-tropospheric trough. In the Arabian Sea, however, predictability is likely modulated by the combination of dry environmental air and changes in deep-layer vertical wind shear associated with the onset and withdrawal of the South Asian monsoon. In addition, Vitart (2009) and Belanger et al. (2010) show that regional TC predictability in the tropical Atlantic is strongly modulated by the phase and amplitude of the Madden-Julian Oscillation (MJO). Given the more pronounced impact of the MJO in the NIO (Webster and Hoyos 2004; Hoyos and Webster 2007), we expect this finding to be even more applicable to the results presented here.
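
The deep-layer (850–200 hPa) vertical wind shear referred to above is conventionally the magnitude of the vector wind difference between the two levels. The sketch below is a minimal version that assumes u and v wind components (in m/s) at a single grid point. The function name and values are hypothetical, and any spatial averaging is omitted.

```python
import math


def deep_layer_shear(u200: float, v200: float, u850: float, v850: float) -> float:
    """Magnitude of the 200-850 hPa vector wind difference (same units as the inputs)."""
    return math.hypot(u200 - u850, v200 - v850)


# Hypothetical winds (m/s): a 20 m/s zonal and 15 m/s meridional difference.
print(deep_layer_shear(15.0, 5.0, -5.0, -10.0))  # 25.0
```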

4. Summary and Conclusions

The performance of the VarEPS in forecasting tropical cyclones in the North Indian Ocean has been examined. An evaluation of the VarEPS forecasts for TC genesis from 2007–2010 shows that the VarEPS has low false alarm rates and moderate to high probabilities of detection for lead-times of up to seven days. The performance of the VarEPS TC genesis forecasts is sensitive to the time window used to define whether or not an event is forecast to occur. An optimization procedure was used to achieve the highest ROC score (i.e., maximizing the probability of detection and minimizing the false alarm rate). Based on it, the optimum combination for predicting TC genesis is a 48-hr time window for forecast lead-times of up to five days. Beyond that, the full period of the VarEPS integration should be used to generate TC genesis forecasts five to ten days in advance.

An evaluation of the pre-genesis and post-genesis track and maximum intensity forecasts from the VarEPS was presented. It shows that the total track errors of pre-genesis forecasts are comparable to those of track forecasts produced after TC genesis. For lead-times of 24 to 240 hours, the total mean track error grew at a rate of 41 nm per day. By a lead-time of 120 hours (240 hours), the average track error of all VarEPS forecasts is 241 nm (409 nm), with a 50% interval of 116–324 nm (184–504 nm). The performance of these track forecasts prior to TC genesis is remarkable considering that the India Meteorological Department's average 72-hr forecast track error is 500 km (270 nm), and these forecasts are issued only after TC genesis has occurred. For post-genesis TC track forecasts, the VarEPS track errors at a lead-time of 24 hrs average 72 nm with a 50% interval of 36–105 nm, and by 120 hours the mean track error is 287 nm with a 50% interval of 162–332 nm. The VarEPS control and ensemble mean forecasts on average start with the largest initial track error of the forecast models considered (NOGAPS, UKMET, GFS, GFDN). From 12 hrs onward, however, the VarEPS control and ensemble mean have the lowest track errors of all the model forecasts. In fact, the VarEPS control and ensemble mean on average exhibit slightly lower track errors than the JTWC through a lead-time of 72 hrs. These forecasts are not bias-adjusted to account for the difference between the observed starting location of the TC and its initialized location in the VarEPS. Additional statistical post-processing could therefore be applied to significantly lower the average track errors at extended lead-times.
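
One simple form of the post-processing suggested above is to remove the initial position offset, translating each forecast track so that its initial position coincides with the observed TC position. The sketch below is a hypothetical illustration, not a procedure from the paper. It applies a constant latitude/longitude shift, which is an approximation that ignores map-projection effects.

```python
from typing import List, Tuple

LatLon = Tuple[float, float]


def shift_track_to_observed(track: List[LatLon], observed_start: LatLon) -> List[LatLon]:
    """Translate a forecast track so its first point matches the observed start position."""
    if not track:
        return []
    dlat = observed_start[0] - track[0][0]
    dlon = observed_start[1] - track[0][1]
    # Apply the same constant offset to every forecast point.
    return [(lat + dlat, lon + dlon) for lat, lon in track]


# Hypothetical track initialized 1 degree south and west of the observed center.
print(shift_track_to_observed([(14.0, 88.0), (15.5, 88.5)], (15.0, 89.0)))
# [(15.0, 89.0), (16.5, 89.5)]
```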

In addition, the VarEPS forecasts were translated into regional outlooks that provide the likelihood of TC activity in the Arabian Sea and the Bay of Bengal. These forecasts were evaluated with skill score metrics, including the Brier skill score and the relative operating characteristic score. The BSS indicates that the VarEPS TC forecasts for the Arabian Sea and the Bay of Bengal are skillful relative to climatology through ten days, whereas the ROC score shows that TC activity in both regions is predictable through two weeks.

Based on this evaluation of the VarEPS TC forecasts, we recommend that warning agencies in the NIO begin providing a probabilistic TC formation outlook that assesses the potential for TC development through a lead-time of seven days. When the forecast probability of formation is moderate (30–60%), the VarEPS's probability of detection averages around 60%, with a false alarm rate of about 30%, at a lead-time of seven days. In addition, the distribution of the VarEPS forecasts provides a dynamical measure of the uncertainty in the atmosphere's future state, so some TCs will be more predictable than others. Therefore, we also recommend including a probabilistic track and intensity outlook based on the distribution of VarEPS forecast tracks. The VarEPS tends to be slightly underdispersive at longer forecast lead-times (Majumdar and Finocchio 2010), but additional statistical post-processing steps could be incorporated to ensure that the final track and intensity forecast probabilities are well calibrated relative to observations.
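
A probabilistic track outlook of the kind recommended here could, for example, report for a given location the fraction of ensemble members whose track passes within some radius of it. This is a hedged sketch, not the authors' product. The 100-nm default radius, the function names, and the sample tracks are all hypothetical. Only the track points themselves are checked; interpolated segments between them are not.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles
LatLon = Tuple[float, float]


def distance_nm(a: LatLon, b: LatLon) -> float:
    """Great-circle (haversine) distance in nautical miles."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlam = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def strike_probability(tracks: Sequence[List[LatLon]], site: LatLon,
                       radius_nm: float = 100.0) -> float:
    """Fraction of member tracks with any point within radius_nm of the site."""
    if not tracks:
        return 0.0
    hits = sum(any(distance_nm(p, site) <= radius_nm for p in track) for track in tracks)
    return hits / len(tracks)


# Hypothetical 4-member ensemble and a site in the northeastern Bay of Bengal.
site = (16.0, 94.0)
tracks = [
    [(14.0, 90.0), (15.5, 93.5)],  # passes about 42 nm from the site
    [(14.0, 90.0), (16.0, 94.0)],  # passes directly over the site
    [(14.0, 90.0), (18.0, 88.0)],  # recurves away
    [(14.0, 90.0), (20.0, 90.0)],  # moves north, far from the site
]
print(strike_probability(tracks, site))  # 0.5
```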

Acknowledgments

The authors would like to thank Mark Jelinek for his helpful suggestions during manuscript preparation and the TIGGE project for supplying the ECMWF VarEPS data. We also appreciate continuing interactions with Dr. Frederic Vitart and his colleagues at ECMWF. The Climate Dynamics Division of the National Science Foundation provided funding support for this research under grant NSF 0826909.


References

Barkmeijer, J., R. Buizza, T. N. Palmer, K. Puri, and J.-F. Mahfouf, 2001: Tropical singular vectors computed with linearized diabatic physics. Quart. J. Roy. Meteor. Soc., 127, 685–708.

Barnes, L. R., D. M. Schultz, E. C. Gruntfest, M. H. Hayden, and C. C. Benight, 2009: Corrigendum: False alarm rate or false alarm ratio? Wea. Forecasting, 24, 1452–1454, doi:10.1175/2009WAF2222300.1.

Belanger, J. I., J. A. Curry, and P. J. Webster, 2010: Predictability of North Atlantic tropical cyclones on intraseasonal time scales. Mon. Wea. Rev., 138, 4393–4401.

Buizza, R., and T. N. Palmer, 1995: The singular vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 1434–1456.

Hart, R. E., 2003: A cyclone phase space derived from thermal wind and thermal asymmetry. Mon. Wea. Rev., 131, 585–616.

Holland, G. J., J. Done, C. Bruyere, C. Cooper, and A. Suzuki-Parker, 2010: Model investigations of the effects of climate variability and change on future Gulf of Mexico TC activity. Offshore Technology Conference, 20690, 1–13.

Hoyos, C. D., and P. J. Webster, 2007: The role of intraseasonal variability in the nature of Asian monsoon precipitation. J. Climate, 20, 4402–4424.

Majumdar, S. J., and P. M. Finocchio, 2010: On the ability of global ensemble prediction systems to predict tropical cyclone track probabilities. Wea. Forecasting, 25, 659–680, doi:10.1175/2009WAF2222327.1.

Mason, S. J., 2004: On using “climatology” as a reference strategy in the Brier and ranked probability skill scores. Mon. Wea. Rev., 132, 1891–1895.

Mason, S. J., and N. E. Graham, 1999: Conditional probabilities, relative operating characteristics, and relative operating levels. Wea. Forecasting, 14, 713–725.

Mirkin, B., 1996: Mathematical Classification and Clustering. Kluwer Academic, 428 pp.

Puri, K., J. Barkmeijer, and T. N. Palmer, 2001: Ensemble prediction of tropical cyclones using targeted diabatic singular vectors. Quart. J. Roy. Meteor. Soc., 127, 709–734.

Rappaport, E. N., and Coauthors, 2009: Advances and challenges at the National Hurricane Center. Wea. Forecasting, 24, 395–419, doi:10.1175/2008WAF2222128.1.

Tropical cyclone operational plan for the Bay of Bengal and Arabian Sea. World Meteorological Organization Tech. Document WMO/TD-No. 84, 1–97. [Available from World Meteorological Organization, Case Postale No. 5, CH-1211, Geneva 20, Switzerland.]

Vitart, F., 2009: Impact of the Madden-Julian Oscillation on tropical storms and risk of landfall in the ECMWF forecast system. Geophys. Res. Lett., 36, L15802, doi:10.1029/2009GL039089.

Webster, P. J., 2008: Myanmar’s deadly daffodil. Nature Geoscience, 1, 488–490, doi:10.1038/ngeo257.

Webster, P. J., and C. Hoyos, 2004: Prediction of monsoon rainfall and river discharge on 15–30 day time scales. Bull. Amer. Meteor. Soc., 85, 1745–1765.


Figure Captions

Figure 1: Schematic of the false alarm clustering routine. Input variables include $P_T$, $t_t$, $x_t$, $O_i(\vec{x})$, and $E_j(\vec{x})$. $P_T$ is the false alarm probability threshold, $t_t$ is a time threshold of 120 hrs, $x_t = \left(\frac{100}{24}\right)t + 500$ (nm), $O_i(\vec{x})$ contains location and time information for the $i$th observed TC during a particular season, and $E_j(\vec{x})$ is the ECMWF forecast track for the $j$th ensemble member. Other variables include $E^{TC}_j(\vec{x})$, which is the ECMWF forecast track for the $j$th ensemble member that corresponds to an observed TC. $E^{TC}_j(\vec{x})$ is made up of pre-genesis TC forecast tracks, $E^{PRE-TC}_j$, and post-genesis TC forecast tracks, $E^{POST-TC}_j$. $K$ is the number of clusters employed in the k-means clustering algorithm, $F_k(\vec{x})$ contains the $k$th false-alarm cluster’s mean location and starting time information, $E^{CFA}_j(\vec{x})$ is the ‘candidate false-alarm’ for each $j$th ECMWF forecast track, and $E^{FAC}_k(\vec{x})$ is the final false-alarm grouping for the $k$th cluster.

Figure 2a (left): Example of the VarEPS forecasts for Severe Cyclone Nargis on 23 April 2008 00 UTC, which was 3.75 days prior to tropical cyclone genesis according to the JTWC Best Track dataset. The black line with red dots denotes the observed track of Nargis. The thin grey lines indicate individual ensemble track forecasts from the VarEPS, with the thick black (blue) line denoting the VarEPS ensemble mean (control) track. Figure 2b (right): The maximum intensity forecast from the VarEPS, shown as a probabilistic time series in which color shading denotes percentile intervals of the VarEPS forecasts ranging from 10 to 90%. Note: The red line indicates the observed maximum intensity of Nargis.

Figure 3: Relative operating characteristic of the VarEPS forecasts for tropical cyclone genesis during the period 2007–2010. a) ROC is determined as a function of forecast days in advance using a 48-hour window on the date of TC genesis. b) Similar to a), except for a 96-hour window on the date of TC genesis. c) Similar to a), except the full 360-hour forecast period is used for the TC genesis evaluation.

Figure 4: a) Mean initial location for each false alarm cluster from 2007 to 2010. b) False alarm ratio as a function of increasing probability detection threshold from 2007 to 2010. The red line indicates where the false alarm ratio equals the probability threshold. c) Probability distribution functions of the mean genesis lead-time in hours.

Figure 5: a) VarEPS pre-genesis track errors (in nm) and b) absolute wind errors (in kts) for all ensemble forecasts during 2007–2010. c) VarEPS forecasts of the difference between forecast time of TC genesis and observations (in days), with values greater (less) than 0 indicating that the VarEPS forecasts are sooner (later) than observations. d) Ensemble spread in the forecast time of TC genesis. Color shading indicates the percentile ranges for the VarEPS forecasts and the black line is the VarEPS ensemble mean. The total number of ensemble forecasts included in the verification is listed above the abscissa.

Figure 6: a) VarEPS post-genesis track errors (in nm) and c) absolute wind errors (in kts) for all ensemble forecasts during 2007–2010. Color shading indicates the percentile ranges for the VarEPS forecasts and the black line is the VarEPS ensemble mean. The total number of ensemble forecasts included in the verification is listed above the abscissa. b) Comparison of ECMWF control and ensemble mean track errors (in nm) and d) absolute wind errors (in kts) to other global weather models and the Joint Typhoon Warning Center for the period 2007–2010.

Figure 7: a) Probability distribution functions of the 0–48 hour total track error statistics for pre-genesis (109 forecasts) and post-genesis (175 forecasts) VarEPS mean forecasts for the 23 North Indian Ocean TCs of the period 2007–2010. b) Similar to a), except for the period 48–120 hours, with 171 (51) pre-genesis (post-genesis) forecasts. The thick line indicates the actual PDF, with shading denoting the 95% confidence intervals determined by a bootstrap resampling technique.

Figure 8: Relative operating characteristic for the a) Arabian Sea and b) Bay of Bengal using the VarEPS forecasts from 2007 to 2010 during the months of April–June and August–December for various forecast lead-times.

Tables

Table 1. Brier skill scores (BSS) and relative operating characteristic scores (ROCS) for the Arabian Sea and the Bay of Bengal based on VarEPS forecasts for tropical cyclone activity during the months of April–June and August–December for 2007–2010. BSS (ROCS) in bold are statistically different from 0 (0.5) at the 95% confidence level.

Arabian Sea              BSS      ROCS
All Forecast Days        0.17     0.82
Forecast Days ≤ 2        0.47     0.85
Forecast Days 2–5        0.32     0.87
Forecast Days 10–15     -0.14     0.70

Bay of Bengal            BSS      ROCS
All Forecast Days        0.09     0.80
Forecast Days ≤ 2        0.32     0.78
