
Predicting Monsoon Floods in Rivers Embedding Wavelet Transform, Genetic Algorithm and Neural Network

Rajeev Ranjan Sahay & Ayush Srivastava

Received: 1 August 2012 / Accepted: 4 September 2013 / Published online: 18 December 2013
© Springer Science+Business Media Dordrecht 2013

Water Resour Manage (2014) 28:301–317
DOI 10.1007/s11269-013-0446-5

R. R. Sahay (*)
Civil Engineering Department, BIT Mesra, Patna Campus, Patna, India 800014
e-mail: [email protected]

A. Srivastava
Civil Engineering Department, BIT Mesra, Ranchi, India
e-mail: [email protected]

Abstract Monsoon floods are recurring hazards in most countries of South-East Asia. In this paper, a wavelet transform-genetic algorithm-neural network model (WAGANN) is proposed for forecasting 1-day-ahead monsoon river flows, which are difficult to model as they are characterized by irregularly spaced spiky large events and sustained flows of varying duration. Discrete wavelet transform (DWT) is employed for preprocessing the time series and genetic algorithm (GA) for optimizing the initial parameters of an artificial neural network (ANN) prior to the network training. Depending on different inputs, four WAGANN models are developed and evaluated for predicting flows in two Indian Rivers, the Kosi and the Gandak. These rivers are infamous for carrying large flows during monsoon (June to Sept), making the entire North Bihar of India unsafe for habitation or cultivation. When compared, WAGANN models are found to be better than autoregression models (ARs) and GA-optimized ANN models (GANNs), which use original flow time series (OFTS) for inputs, in simulating river flows during monsoon. In addition, WAGANN models predicted relatively reasonable estimates for the extreme flows, showing little bias for underprediction or overprediction.

Keywords ANN . Autoregression . Flood forecasting . India . Rivers . Wavelet transform

1 Introduction

Floods affect more people and cause greater devastation than any other natural calamity (IFRCRCS 2006). However, the severity of floods can be minimized if their occurrence is foretold. A reliable flood forecasting model can go a long way in reducing the impact of floods on human life, health and property in a country like India, which is one of the world’s most flood-affected countries. In recent times, floods in North Indian Rivers have become more frequent and devastating due to channel siltation and climatic change, making accurate flood forecasting even more necessary. Numerous models developed for predicting floods in the last decades can broadly be grouped into two categories, conceptual and data-based. Although conceptual prediction models have the advantage of assisting physical understanding of the hydrological process, the spatial and temporal variability of watershed characteristics and the number of variables involved in the modeling of the physical processes render them difficult to implement other than by specialists (Wu and Chau 2006). On the other hand, data-driven models, which are essentially statistical or based on artificial intelligence, have become popular in hydrological applications due to their simplicity, rapid development time and lower data requirements. However, statistical models are found unsuitable because of their difficulty in handling data with transitory characteristics such as drifts, trends and abrupt changes, whereas artificial intelligence models have no such handicap, though determination of their optimal structure poses some difficulty.

In recent times, wavelet transform has been widely used in time series analysis as wavelets can effectively extract both time and frequency-like information from the series. Smith et al. (1998) used discrete wavelet transform on daily river discharge records and demonstrated its strong potential for quantifying stream flow variability, both periodic and non-periodic. Labat et al. (2000) applied the wavelet method to model rainfall and runoff measured at different sampling rates, from daily to half-hourly. Wang and Ding (2003) developed a hybrid model of wavelet and artificial neural network for short and long term forecasting of a hydrological time series. Coulibaly and Burn (2004) used wavelet analysis to identify variability in annual flow in Canadian Rivers. Partal and Kucuk (2006) successfully used DWT for determining possible trends in the annual precipitation in Turkey. Kucuk and Agiralioglu (2006) developed a DWT model for stream flow prediction. Zhou et al. (2008) proposed a wavelet predictor-corrector model for the simulation of monthly discharge time series and showed that the prediction had no dependence on the decomposition scale. Based on wavelet and cross-wavelet constituent components for time series, Adamowski (2008) developed flood forecasting models that predicted stream flows with greater accuracy than ANN for 1- and 3-day lead times, provided there were no significant trends in the amplitude of the time series; however, the accuracy suffered for longer lead time forecasting. Kisi (2008) combined discrete wavelet transform and multi-layer perceptron for one-month-ahead stream flow forecasting. Wang et al. (2009) developed a wavelet network model to forecast seasonal mean discharge, mean daily discharge and annual mean discharge for the Three Gorges dam on the Yangtze River of China. Kisi (2009) developed a wavelet-ANN model for forecasting daily intermittent streamflows. Tiwari and Chatterjee (2010) developed a combined wavelet-bootstrap-ANN model for hourly flood forecasting and showed it to be better than wavelet-ANN and bootstrap-ANN models. Adamowski and Sun (2010) developed a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. Kisi (2010) developed a wavelet regression technique by combining discrete wavelet transform and linear regression for short-term streamflow forecasting at two stations, Karabuk and Derecikviran, on the Filyos River in the Western Black Sea region of Turkey. Rajaee et al. (2010) developed a wavelet-ANN model for predicting sediment load in a river and showed its superiority over ANN and multilinear regression models. Shiri and Kisi (2010) developed a hybrid wavelet-neuro-fuzzy model to forecast daily, monthly and yearly streamflows. Kisi and Shiri (2011) developed hybrid models of wavelet-genetic programming (WGEP) and wavelet-neuro-fuzzy (WNF) and concluded that WGEP was more effective in forecasting daily precipitation than WNF. Kisi (2011a, b) used wavelet regression as an alternative to neural network for river stage forecasting. Rajaee (2011) proposed a wavelet-neural network model for predicting suspended sediment load in rivers and showed its superiority over models based on neuro-fuzzy, multilinear regression and sediment rating curves. Kisi (2011a, b) developed a combined model of generalized regression neural network and wavelet for monthly streamflow prediction. Sahay and Chakraborty (2012) demonstrated the efficiency of a combined model of discrete wavelet transform and autoregression for daily river flow forecasting. Moosavi et al. (2013) compared the performance of Wavelet-ANN and Wavelet-ANFIS models for forecasting groundwater levels. Sahay and Sehgal (2013) developed wavelet-autoregression models for forecasting 1-day-ahead river stages and found them superior to ANN and AR models. Ramana et al. (2013) developed a combined model of wavelet and ANN and showed that it performed better than a stand-alone ANN model for monthly rainfall prediction.

The foregoing discussion suggests that a wavelet-transformed time series improves the efficiency of a forecasting model. For this reason, in the present investigation, discrete wavelet transform has been combined with ANN for predicting monsoon flows. We selected ANN in our study because these heuristics are particularly suited for modeling nonlinear, nonstationary and nongaussian processes like those encountered in hydrological contexts (Maier and Dandy 2000; Taormina et al. 2012). However, initial values of weights and biases have a significant effect on network performance, as inappropriately assigned values can lead to local convergence. So, prior to training, the initial parameters of ANNs are optimized by genetic algorithm, a relatively new optimization technique. One can find a good illustration of the working of GA in the books of Goldberg (1989), Michalewicz (1992) or Deb (2001). The coupled wavelet-genetic algorithm-neural network model is compared to the genetic algorithm-neural network model (GANN) and the autoregression model (AR), which use the original flow data series without any preprocessing. AR has been included in this study as a yardstick to gauge the performance of WAGANN and GANN models because it is simple to develop and widely employed in hydrologic modeling.

2 Adopted Methodologies

2.1 Discrete Wavelet Transform

Wavelet transform, similar to short time Fourier transform, is a windowing technique in which time series are broken up into the shifted and scaled version of a wavelet, called the mother wavelet. A wavelet is a waveform of effectively limited duration that has an average value of zero. The wavelets which have strictly finite extent in the time domain are known as discrete wavelets, otherwise continuous wavelets. There are a variety of wavelets that can be used, but to be admissible as a mother wavelet, the function should have zero mean and be localized in both time and frequency space.

A wavelet transform convolutes a time series against a particular instance of a wavelet at various time scales and positions. To perform these convolutions at every position and scale, a mathematical technique called continuous wavelet transform (CWT) is implemented, resulting in many wavelet coefficients. These coefficients are functions of scale and position, which give a measure of correlation between the scaled and shifted wavelet and the original signal (Fig. 1). Mathematically, CWT of a time series, $f(t)$, is defined as the sum over all time of the signal multiplied by the scaled and shifted version of the mother wavelet $\varphi(t)$, i.e.,

$$W_{a,b} = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \varphi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{1}$$

where $W_{a,b}$ is the CWT coefficient for scale $a$ and location $b$ of the function $\varphi(t)$. The conjugate wavelet functions, $\varphi^{*}\!\left(\frac{t-b}{a}\right)$, are derived from a common mother wavelet function $\varphi_{0,0}(t)$ by scaling it by $a$ and translating it by $b$.

Determining wavelet coefficients at every possible scale is an enormous task; moreover, actual flow data are measured at specific time intervals and are discrete in nature. In such cases, discrete wavelet transform is found more suitable. Normally, DWT uses a dyadic scheme of wavelet decomposition where transform coefficients are determined at alternate scale and position, reducing the computational burden. For a discrete time series with integer time steps, $x_i$, DWT in the dyadic decomposition scheme is defined as

$$T_{m,n} = 2^{-m/2} \sum_{i=0}^{N-1} x_i\, \varphi\!\left(2^{-m} i - n\right) \tag{2}$$

where $T_{m,n}$ is the discrete wavelet coefficient for scale $a = 2^{m}$ and location $b = 2^{m} n$, $m$ and $n$ being integers; $N$ is the data length of the time series, which is an integer power of 2, i.e., $N = 2^{M}$. This gives the ranges of $n$ and $m$ as $0 \le n \le 2^{M-m} - 1$ and $1 \le m \le M$, respectively. This implies that at the largest scale, i.e., $a = 2^{M}$, only one wavelet is needed to cover the time interval, producing only one coefficient. At the next scale, i.e., $a = 2^{M-1}$, two wavelets would cover the time interval, producing two coefficients, and so on till $m = 1$. Thus, the total number of coefficients generated by DWT for a discrete time series of length $N = 2^{M}$ is $1 + 2 + 4 + \dots + 2^{M-1} = N - 1$ (Addison et al. 2001).
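The coefficient count of the dyadic scheme can be checked with a small sketch. It assumes the Haar wavelet (chosen here only because its filters fit in two lines; it is not the wavelet selected later in the paper), and the function name `haar_dwt_pyramid`, the sign convention of the detail filter and the flow values are illustrative assumptions.

```python
import math

def haar_dwt_pyramid(x):
    """Full dyadic Haar decomposition of a series of length N = 2**M.

    Returns (approximation, details), where details[m - 1] holds the
    coefficients at level m, i.e. n = 0 .. 2**(M - m) - 1.
    """
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of 2 (N = 2**M)")
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        # High-pass (difference) and low-pass (sum) Haar filters, orthonormal scaling.
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details.append(detail)
    return approx[0], details

flows = [3.0, 5.0, 4.0, 8.0, 12.0, 9.0, 6.0, 5.0]  # made-up values, N = 2**3
approx, details = haar_dwt_pyramid(flows)
counts = [len(d) for d in details]  # coefficients at levels m = 1, 2, 3
total = sum(counts)                 # number of detail coefficients, N - 1
```

For this eight-point series the levels hold 4, 2 and 1 coefficients, so the detail coefficients number N − 1 = 7, with one remaining approximation value.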

The original time series may then be reconstructed employing the inverse discrete transform, i.e.

$$x_i = \bar{T} + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} T_{m,n}\, 2^{-m/2}\, \varphi\!\left(2^{-m} i - n\right) \tag{3}$$

or, in a simple format as:

$$x_i = \bar{T}(t) + \sum_{m=1}^{M} W_m(t) \tag{4}$$

where $\bar{T}(t)$ is called the approximation sub-time series at level $M$ and $W_m(t)$ are the detail sub-time series at levels $m = 1, 2, 3, \dots, M$.

Mallat (1989) devised an efficient way of estimating DWT coefficients at every subset of scale and position by utilizing filters. The process consists of a number of successive filtering steps in which the time series is decomposed into approximation and detail sub-time series or wavelet components. The approximation sub-time series, obtained by correlating a stretched version (low-frequency and high-scale) of a wavelet with the original time series, represents the slowly changing coarse features of a time series, while the detail sub-time series, obtained by correlating a compressed wavelet (high-frequency and low-scale) with the original time series, signifies rapidly changing features of the time series. The decomposition process can be iterated by successive decomposition of the approximation component into many lower resolution components (Fig. 2).

Fig. 1 Correlation between a time series and a mother wavelet

2.2 GA-Optimized Artificial Neural Networks (GANN)

GANN is a hybrid integration of GA and ANN. ANNs, inspired by biological nervous systems, are composed of interconnected elements called neurons with a unique capability of recognizing underlying relationships between input and output events. For this reason, there has been an increasing trend in recent years towards the use of ANNs for water related research and engineering projects, and in particular, for modelling hydrological processes. A critical review of the concepts and applications of ANN in the field of hydrology can be found in ASCE (2000a, b). Though many variants of ANN are available, the most widely used NN structures are three-layer feed forward back propagation networks (FFBP). Here, all connections are feed forward, i.e. information transfer takes place only from an earlier layer to the next consecutive layer. A typical FFBP has g, k and l neurons in the input, hidden, and output layers respectively. Neurons of a layer are connected to every neuron of the succeeding layer but are not connected among themselves. The parameter associated with each of the connections, called weight, signifies the relative importance of the connection. Each node $j$ receives an incoming signal $\beta_i$ from every node $i$ in the previous layer, magnified by the weight of the connection $\theta_{ji}$. The effective incoming signal $\alpha_j$ to node $j$ is the weighted sum of all the incoming signals plus a bias $\lambda_j$, i.e.

$$\alpha_j = \sum_{i} \theta_{ji}\, \beta_i + \lambda_j \tag{5}$$

The effective incoming signal $\alpha_j$ is passed through a transfer function $g$ (also called an activation function) to produce the outgoing signal $y_j$ of the node $j$, i.e.

$$y_j = g(\alpha_j) = \frac{1}{1 + \exp(-\alpha_j)} \tag{6}$$

Fig. 2 Decomposition of an observed flow time series into its wavelet components

In this study, the sigmoid nonlinear transfer function is used in the hidden as well as the output layers, so data variables are normalized to the range (0, 1) before applying the NN methodology. To determine the optimal number of nodes in the hidden layer, weights and biases, the created neural network is first trained using known inputs and outputs in some ordered manner, adjusting the interconnection weights and biases until the desired outputs are achieved. The NN structure producing the best performance, i.e. the one giving the minimum root mean square error on the verification dataset, is selected for forecasting river flows.
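A minimal sketch of the forward pass of Eqs. (5) and (6) for a three-layer network is given below. The min-max scaling is only one common way of normalizing flows (the text does not state which scheme was used, and this one maps the extremes to exactly 0 and 1), and the weights, biases and input flows are made-up numbers rather than trained values.

```python
import math

def minmax_scale(values, lo, hi):
    """Map flows linearly so that lo -> 0 and hi -> 1 (assumed scheme)."""
    return [(v - lo) / (hi - lo) for v in values]

def sigmoid(alpha):
    """Transfer function of Eq. (6)."""
    return 1.0 / (1.0 + math.exp(-alpha))

def layer(inputs, weights, biases):
    """Eq. (5) followed by Eq. (6); weights[j][i] is theta_ji, biases[j] is lambda_j."""
    return [sigmoid(sum(w * b for w, b in zip(row, inputs)) + bias)
            for row, bias in zip(weights, biases)]

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer -> output layer, feed forward only."""
    return layer(layer(inputs, w_hidden, b_hidden), w_out, b_out)

# Hypothetical ANN (3,2,1): three antecedent flows in, one normalized flow out.
scaled = minmax_scale([3252.0, 3109.0, 2900.0], lo=467.0, hi=11282.0)
w_hidden = [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4]]
b_hidden = [0.1, -0.2]
w_out = [[1.2, -0.7]]
b_out = [0.05]
prediction = forward(scaled, w_hidden, b_hidden, w_out, b_out)[0]
```

Because the output node is also a sigmoid, `prediction` lies strictly between 0 and 1 and would have to be mapped back to m3/s by inverting the scaling.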

Although ANN is a flexible and powerful mapping tool, initialization of weights and biases has a significant effect on network performance. We have taken the help of genetic algorithm to search for the optimal initial values of the selected ANN. A hybrid integration of these two algorithms may take advantage of the characteristics of both schemes. It can increase solution stability and improve the performance of an ANN model, though at the expense of computational time (Chau et al. 2005). Hence, in a genetic algorithm-artificial neural network model, i.e., GANN, initial parameters of the network are first optimized by GA prior to training by the conventional NN. The GA scheme, in our study, is implemented using the GA Toolbox of MATLAB 7, and the binary code of representation is adopted for the variables of a selected ANN structure. A string length of 10 is used to represent each variable. This string length is sufficient for the range of values these variables can attain. There is, therefore, a total string length of 310 bits in a string/chromosome for the 31 variables (24 weights and 7 biases) of an ANN (3,6,1) structure. After trying various combinations of population size, crossover and mutation, the set of weights and biases which yielded the minimum root mean square error at the output layer was selected for the neural network training.
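The chromosome length quoted above follows from counting the network parameters: a (3,6,1) network has 3·6 + 6·1 = 24 weights and 6 + 1 = 7 biases, i.e. 31 variables, which at 10 bits each gives 310 bits. The sketch below illustrates that bookkeeping and one possible binary decoding; the decoding interval [−1, 1] and the function names are assumptions, not details given in the text.

```python
import random

BITS_PER_VARIABLE = 10  # string length per variable, as stated in the text

def count_parameters(n_in, n_hidden, n_out):
    """Number of weights and biases in a three-layer feed forward network."""
    weights = n_in * n_hidden + n_hidden * n_out
    biases = n_hidden + n_out
    return weights + biases

def decode(chromosome, low=-1.0, high=1.0, bits=BITS_PER_VARIABLE):
    """Turn a list of 0/1 genes into real values in [low, high] (assumed range)."""
    if len(chromosome) % bits:
        raise ValueError("chromosome length must be a multiple of bits")
    values = []
    for start in range(0, len(chromosome), bits):
        gene = chromosome[start:start + bits]
        integer = int("".join(map(str, gene)), 2)
        values.append(low + (high - low) * integer / (2 ** bits - 1))
    return values

n_vars = count_parameters(3, 6, 1)  # 24 weights + 7 biases
rng = random.Random(0)
chromosome = [rng.randint(0, 1) for _ in range(n_vars * BITS_PER_VARIABLE)]
params = decode(chromosome)         # candidate initial weights and biases
```

In a GA run, each decoded `params` vector would be loaded into the network and scored by the RMSE at the output layer; that fitness loop is omitted here.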

2.3 Proposed Wavelet-GANN Model (WAGANN)

The original flow time series (OFTS) may be viewed as a quasiperiodic signal, which is contaminated by various noises at different flow levels (Wu et al. 2009). Suitable data pre-processing can improve the performance of data-driven models. An important signal decomposition technique, discrete wavelet transform (DWT), is able to expose important characteristics of the time series in order to attain predictability. To improve the model performance, the data pre-processing technique of discrete wavelet transform is coupled with GANN. First, OFTS is decomposed into its sub-time series by DWT on suitable resolution levels. Wang and Ding (2003) suggested log (n) resolution levels, where n is the length of the time series. In our study, 611 daily data are used in the training period, hence three decomposition levels are considered sufficient. The sub-time series D1, D2 and D3 represent detail components corresponding to 2-, 4- and 8-day scale or periodicity respectively, while A3 represents the approximation component of 8-day scale or periodicity. According to Kisi and Shiri (2012), the D1 component makes forecasting the time series difficult as it has the lowest correlation and may contain the noisy part of the original time series. For this reason, D1 is excluded and a wavelet-smoothened flow time series (WFTS) is obtained by adding the D2, D3 and A3 sub-time series. WAGANN models use WFTS while GANN and AR models use OFTS for input. The working structure of the WAGANN model is shown in Fig. 3.
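The construction of WFTS can be sketched as follows. The paper selects sym5 and coif5 mother wavelets; the sketch swaps in the Haar wavelet, for which the additive components D1, D2, D3 and A3 reduce to differences of block means and need no wavelet library. The function names and flow values are illustrative assumptions.

```python
def block_means(x, width):
    """Haar approximation at a given block width, expanded to full length."""
    out = []
    for start in range(0, len(x), width):
        block = x[start:start + width]
        out.extend([sum(block) / len(block)] * len(block))
    return out

def haar_components(x, levels=3):
    """Additive Haar sub-time series: returns ([D1, ..., D_levels], A_levels)."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be a multiple of 2**levels")
    details = []
    previous = list(x)
    for level in range(1, levels + 1):
        approx = block_means(x, 2 ** level)
        # Detail at this level = what the coarser approximation leaves out.
        details.append([p - a for p, a in zip(previous, approx)])
        previous = approx
    return details, previous

# Made-up daily flows (m3/s); the length is a multiple of 2**3.
flows = [3100.0, 3350.0, 2900.0, 4200.0, 6100.0, 5800.0, 4300.0, 3900.0,
         3600.0, 3500.0, 3300.0, 3450.0, 5200.0, 7400.0, 6900.0, 5100.0]
(d1, d2, d3), a3 = haar_components(flows)
wfts = [b + c + d for b, c, d in zip(d2, d3, a3)]  # OFTS with D1 removed
```

With Haar, dropping D1 amounts to replacing each pair of days by its mean; a smoother wavelet such as sym5 or coif5 would give a less blocky WFTS.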

3 Model Implementation

Developed models are evaluated for predicting 1-day-ahead flows in two Indian Rivers, the Kosi and the Gandak. These transboundary rivers rise in the Great Himalayan Range of Nepal and Tibet at an altitude of over 7,000 m. Mount Everest, the highest peak in the world, lies in the catchment of the Kosi, which has a catchment area of approximately 69,100 km², out of which 29,300 km² lies in Tibet, 30,600 km² lies in Nepal and 9,200 km² lies in India. The alluvial fan of the Kosi is one of the largest in the world, extending from Barahksetra in Nepal to the Indo-Gangetic Plain of North Bihar of India. Its basin is surrounded by the ridges separating it from the Brahmaputra in the north, the Gandaki in the west, the Mahananda in the east and the Ganga in the south. The Kamla, the Baghmati (Kareh) and the Budhi Gandak are major tributaries of the Kosi in India, besides minor tributaries like Bhutahi Balan. The Gandak, flowing through the Himalayas’ two peaks, Dhaulagiri (8,167 m) and Annapurna (8,091 m), is notable for its deep gorge. The river has a total catchment area of 46,300 km², out of which only 7,620 km² lies in India and the rest lies in Nepal and Tibet. The catchment map of the North Bihar Rivers (India) is shown in Fig. 4.

The Kosi and the Gandak flow with steep channel slopes. Floods resulting from heavy downpours in their upper catchment rush towards the Bihar State of India with great velocity, take away many lives and cause destruction to infrastructure, agriculture and industrial production. The Kosi, nicknamed ‘Sorrow of Bihar’, in particular, has caused widespread human suffering in the past through flooding. Shift in their courses is a regular feature of the North Bihar Rivers. The Kosi, for example, has moved sideways by 120 km in the past 250 years. To confine these rivers and to control the flood damages, long embankments on both sides of the rivers have been constructed. Although the embankments have confined the lateral shift of the rivers to a large extent, frequent breaches and over-toppings have made flooding a perpetual challenge in the area. In the year 2008, a breach in the Kosi embankment caused the biggest ever flood disaster in India, which spread to over half a million hectares of land, affecting about 3 million people.

The developed models are derived and verified using monsoon flows of the Kosi River at the Birpur gauge-site and the Gandak River at the Valmikinagar gauge-site. For deriving the models, 611 daily flows for the years 2001–05 are utilized, while another 242 daily flows for the years 2006–07 are utilized for verifying them. Table 1 summarizes the statistical information on the observed datasets. In order to develop wavelet models, first, OFTS of these rivers are decomposed into their wavelet components/sub-time series at three resolution levels. The selection of a suitable mother wavelet in decomposing the time series is a critical issue as the efficiency of wavelet models is influenced by it. Widely used wavelets in hydrological modeling are haar, db2, db6, sym5, coif5, bior6.8, rbio6.8 and dmey. In this study, the sym5 and the coif5 mother wavelets are found most suitable for decomposing the flow time series of the Kosi and the Gandak Rivers respectively (Table 2). The decomposed flow series at three resolution levels is shown in Fig. 5. The time series D2+D3+A3, i.e., WFTS, is obtained by removing D1 from the original flow time series. This smoothened time series formed the input for the WAGANN models.

Fig. 3 Working structure of WAGANN model

Fig. 4 Catchment map of North Bihar Rivers (India, fmis 2012)

Table 1 Monsoon flow characteristics of the Kosi and the Gandak Rivers

Parameter (m³/s)     Kosi River                   Gandak River
                     Der. dataset  Ver. dataset   Der. dataset  Ver. dataset
Max. daily disch.    11282         7528           13315         12834
Min. daily disch.    467           825            926           623
Mean daily disch.    3252          3109           3984          3595
Std. Dev.            1673          1333           1850          1914

In addition to the WAGANN class of models, two more classes of models, i.e. GANN and AR, are also developed for the purpose of performance comparison. Based on different inputs, four models are developed in each class, i.e., WAGANN1, WAGANN2, WAGANN3 and WAGANN4 in the WAGANN class, GANN1, GANN2, GANN3 and GANN4 in the GANN class and AR1, AR2, AR3 and AR4 in the AR class. WAGANN3, for example, has 3 days’ antecedent flows from the modified flow time series, i.e., $q_t$, $q_{t-1}$ and $q_{t-2}$, for input while GANN3 and AR3 have 3 days’ antecedent flows from the observed flow time series, i.e., $Q_t$, $Q_{t-1}$ and $Q_{t-2}$, for input (Tables 3 and 4). The desired output in all models is the 1-day-ahead flow, i.e., $Q_{t+1}$. To allow performance comparison among the models, the following statistical indices are used:

(i) Nash-Sutcliffe Coefficient,

$$NSC = 1 - \frac{\sum_{i=1}^{N} \left(Q_p - Q_o\right)^2}{\sum_{i=1}^{N} \left(Q_o - \bar{Q}_o\right)^2} \tag{7}$$

(ii) Root mean square error,

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} \left(Q_p - Q_o\right)^2}{N}} \tag{8}$$

(iii) Discrepancy ratio,

$$DR = \log \frac{Q_p}{Q_o} \tag{9}$$

(iv) Percentage accuracy,

$$\%\,Accuracy = \frac{100\, N'}{N} \tag{10}$$

where $Q_p$ and $Q_o$ are the predicted and the observed daily flow rates in the river respectively; $\bar{Q}_o$ is the mean of the observed flows; $N$ is the number of observations and $N'$ is the number of predicted values lying between 75 % and 125 % of the observed values (i.e., DR value of the prediction lying between −0.097 and 0.097). From Eq. (9), DR = 0 suggests exact matching between the observed and predicted values; otherwise, there is either overprediction [DR > 0, i.e. $Q_p > Q_o$] or underprediction [DR < 0, i.e. $Q_p < Q_o$].
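The four indices can be computed as in the sketch below. Two assumptions are made: DR is taken as the base-10 logarithm of $Q_p/Q_o$, the orientation implied by the statement that DR > 0 means overprediction, and accuracy counts predictions whose DR lies within ±0.097 (since log10(1.25) ≈ 0.097, this band corresponds to $Q_p/Q_o$ between about 0.80 and 1.25). The flow values are made up.

```python
import math

def nsc(pred, obs):
    """Nash-Sutcliffe coefficient, Eq. (7)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse(pred, obs):
    """Root mean square error, Eq. (8)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def discrepancy_ratios(pred, obs):
    """Eq. (9) per observation, assuming log base 10 of predicted/observed."""
    return [math.log10(p / o) for p, o in zip(pred, obs)]

def accuracy(pred, obs, band=0.097):
    """Eq. (10): percentage of predictions whose |DR| is within the band."""
    ratios = discrepancy_ratios(pred, obs)
    hits = sum(1 for d in ratios if abs(d) <= band)
    return 100.0 * hits / len(ratios)

observed = [1000.0, 2000.0, 4000.0, 3000.0]   # made-up flows, m3/s
predicted = [1100.0, 1800.0, 4000.0, 4500.0]
scores = {
    "NSC": nsc(predicted, observed),
    "RMSE": rmse(predicted, observed),
    "DR range": (min(discrepancy_ratios(predicted, observed)),
                 max(discrepancy_ratios(predicted, observed))),
    "Accuracy": accuracy(predicted, observed),
}
```

For these four points the last prediction overshoots by 50 %, so three of four values fall inside the band and the accuracy is 75 %.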

4 Results and Discussion

The performance of the developed models is evaluated for forecasting 1-day-ahead flows in the Kosi and the Gandak Rivers for the monsoon period, during which the deluge poses a great challenge to the affected community. Table 3 summarizes the performance of these models for the Kosi River and Table 4 for the Gandak River. To facilitate comparison, the developed models are bracketed into four sets. The following section illustrates the performance and sensitivity of the developed models.

Table 2 Performance of WFTS derived by various mother wavelets with OFTS

Mother wavelet   Gandak River          Kosi River
                 CC      RMSE (m³/s)   CC      RMSE (m³/s)
coif5            0.981   357.7         0.967   400.7
dmey             0.981   361.9         0.969   389.9
bior6.8          0.980   364.9         0.966   409.9
rbio6.8          0.975   371.8         0.965   415.6
db2              0.977   390.1         0.930   585.4
sym5             0.979   376.6         0.974   358.2
db6              0.980   365.8         0.965   410.7
haar             0.971   440.5         0.972   367.6

Fig. 5 DWT decomposition of the flow time series for the Kosi and the Gandak Rivers


Models of Set 1 consider only one input, the current flow, qt or Qt. The objective is toinvestigate effectiveness of these simple models. ANN (1,3,1) is found suitable for applica-tion to GANN1 and WAGANN1 models by a GA scheme comprising population size of 50,

Table 4 Performance indices of the developed models for the Gandak River at Valmikinagar (India)

Set Inputvariables

Model Derivationdataset

Verificationdataset

Whole dataset

NSC RMSE(m3/s)

NSC RMSE(m3/s)

NSC RMSE(m3/s)

DR Range Accuracy(%)

Set 1 Qt AR1 0.74 933 0.81 837 0.76 907 −0.48 to 0.32 80.3

GANN1 0.78 873 0.76 922 0.77 888 −0.46 to 0.28 82.2

qt WAGANN1 0.83 758 0.84 764 0.83 760 −0.39 to 0.28 88.0

Set 2 Qt & Qt−1 AR2 0.75 931 0.81 836 0.76 905 −0.48 to 0.32 80.5

GANN2 0.80 835 0.81 818 0.80 831 −0.45 to 0.31 84.3

qt & qt−1 WAGANN2 0.87 672 0.88 668 0.87 671 −0.38 to 0.28 90.7

Set 3 Qt,

Qt−1 & Qt−2

AR3 0.75 920 0.81 837 0.77 898 −0.48 to 0.31 80.8

GANN3 0.80 823 0.78 886 0.80 841 −0.46 to 0.30 82.9

qt , qt−1 & qt−2 WAGANN3 0.89 645 0.85 726 0.87 667 −0.37 to 0.28 91.9

Set 4 Qt , Qt−1,Qt−2& Qt−3

AR4 0.75 920 0.80 838 0.77 898 −0.48 to 0.31 80.6

GANN4 0.81 798 0.79 888 0.78 845 −0.46 to 0.32 82.5

qt, qt−1, qt−2,qt−3

WAGANN4 0.87 658 0.87 691 0.87 669 −0.36 to 0.28 90.7

Qt−i and qt−i, (i = 0 to 3) are i-days’ before flows for OFTS and WFTS series, respectively

Table 3 Performance indices of the developed models for the Kosi River at Birpur (India)

Set Input Model Derivationdataset

Verificationdataset

Whole dataset

NSC RMSE(m3/s)

NSC RMSE(m3/s)

NSC RMSE(m3/s)

DR Range Accuracy(%)

Set 1 Qt AR1 0.67 968 0.72 698 0.68 892 −0.43 to 0.40 68.5

GANN1 0.69 932 0.71 707 0.70 869 −0.49 to 0.37 72.4

qt WAGANN1 0.74 847 0.72 695 0.74 804 −0.36 to 0.37 76.2

Set 2 Qt & Qt−1 AR2 0.67 967 0.72 698 0.68 892 −0.43 to 0.40 68.6

GANN2 0.72 889 0.70 728 0.71 844 −0.43 to 0.40 73.8

qt & qt−1 WAGANN2 0.82 721 0.79 608 0.81 689 −0.34 to 0.36 82.6

Set 3 Qt,

Qt−1 & Qt−2

AR3 0.67 961 0.73 690 0.69 885 −0.42 to 0.38 69.1

GANN3 0.74 845 0.63 807 0.71 836 −0.48 to 0.40 71.8

qt , qt−1 & qt−2 WAGANN3 0.84 668 0.80 591 0.83 646 −0.32 to 0.34 83.6

Set 4 Qt , Qt−1,Qt−2& Qt−3

AR4 0.67 956 0.72 687 0.69 882 −0.42 to 0.36 69.8

GANN4 0.79 775 0.58 857 0.73 827 −0.42 to 0.48 73.2

qt, qt−1, qt−2,qt−3

WAGANN4 0.84 662 0.81 571 0.84 636 −0.32 to 0.33 84.7

Qt−i and qt−i, (i = 0 to 3) are i-days’ before flows for OFTS and WFTS series, respectively

Predicting Monsoon Floods Embedding Wavelet Transform, GA and ANN 311

crossover probability of 0.75 and mutation probability of 0.002 as it yielded the minimum value of RMSE between the observed and the predicted values. The best results in this case are obtained with the feed-forward neural network architecture trained with the Levenberg–Marquardt algorithm. After training the network satisfactorily, it is tested on the verification dataset. As can be seen from Tables 3 and 4, WAGANN1 is a fairly accurate model, with its prediction accuracy for the whole dataset being as high as 76.2 % for the Kosi River and 88.0 % for the Gandak River, while the corresponding accuracies of GANN1 for the two rivers are 72.4 % and 82.2 % respectively and those of AR1 are 68.5 % and 80.3 % respectively. Further, WAGANN1 shows the highest NSC values of 0.74 and 0.83 for the Kosi and the Gandak Rivers respectively and the least RMSE values of 804 m³/s and 760 m³/s respectively. Another performance indicator, DR, which is commonly used as an error measure between the observed and the predicted time series, also seems superior for the wavelet model, with its values ranging from −0.36 to 0.37 and from −0.39 to 0.28 for the Kosi and the Gandak Rivers respectively. These DR values suggest that the model gives unbiased prediction for the Kosi while the prediction is skewed towards the negative side for the Gandak. However, when compared, the predictions by GANN1 and AR1 are skewed significantly toward the negative side for the two rivers. Another input, the 1-day-before flow $q_{t-1}/Q_{t-1}$, is added for the models in Set 2. As is evident from Tables 3 and 4, the performance of the GANN and WAGANN models improves. The most significant improvement is seen in the performance of the WAGANN model, with its NSC values increasing to 0.81 and 0.87 for the Kosi and the Gandak Rivers respectively and its RMSE values decreasing to 689 m³/s and 671 m³/s respectively for the two rivers for the whole dataset. The other indices, DR and Accuracy, also improve. An additional input, $q_{t-2}/Q_{t-2}$, is added to the models of Set 3. As can be observed from Tables 3 and 4, WAGANN3 and AR3 show little improvement, while GANN3 shows some improvement for the derivation dataset but its performance deteriorates slightly for the verification dataset, for both the rivers.
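The RMSE and NSC values quoted above can be computed from paired observed and predicted flow series. The following is a minimal sketch, assuming the standard definitions of root mean square error and the Nash-Sutcliffe coefficient; the function names and the sample flows are illustrative and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the same units as the flows (m3/s)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe coefficient: 1 - (sum of squared errors / observed variance sum)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical flows (m3/s), for illustration only; not data from the paper.
obs = [1000.0, 2000.0, 3000.0, 4000.0]
pred = [1100.0, 1900.0, 3200.0, 3800.0]
print(round(rmse(obs, pred), 1), round(nash_sutcliffe(obs, pred), 3))
```

An NSC of 1 indicates a perfect match, while a value of 0 means the model is no better than predicting the mean observed flow.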

The time of concentration, as estimated by the Kirpich equation, comes to around 4 days for the Kosi River at the Birpur gauge-site and 3 days for the Gandak River at the Valmikinagar gauge-site, implying that the flood water from the remotest place in their catchments takes that many days to reach these gauge sites. With this in mind, models WAGANN4, GANN4 and AR4 are constructed with four previous days' flows as inputs. Thus, WAGANN4 has $q_{t-3}$, $q_{t-2}$, $q_{t-1}$ and $q_t$ as inputs, while GANN4 and AR4 have $Q_{t-3}$, $Q_{t-2}$, $Q_{t-1}$ and $Q_t$ as inputs. After trying many combinations of population size, crossover and mutation, a GA scheme with a population size of 300, Gaussian crossover fraction of 0.75, Gaussian mutation function with scale and shrink of 1 each, and reproduction with an elite count of 4 finds the network (4,6,1) optimal for GANN4 and WAGANN4. The objective has been, as in the previous cases, the minimization of RMSE between the predicted and observed flows. The network (4,6,1) is then trained with the weights and biases given by GA. After training the network successfully, the network is tested for flow prediction using the verification dataset. Table 3 shows some improvement for the Kosi, except for GANN4 on the verification dataset. The WAGANN4 is found to be the most reliable forecasting model for the Kosi, with the highest NSC of 0.84, the least RMSE of 636 m³/s and the best DR range of −0.32 to 0.32. It predicts the Kosi's flows with the highest accuracy of 84.7 %. However, in the case of the Gandak River, the Set 4 models do not show any better performance than the Set 3 models (Table 4). In fact, these models show a little deterioration for the verification and the whole datasets. Although the prediction for the derivation dataset with the GANN4 is found to be improved, it is not so for the verification and the whole datasets. The WAGANN4, on the


other hand, shows improved performance for the verification dataset but not for the derivation dataset. It appears that for 1-day-ahead flow prediction in the Gandak River, $q_{t-4}/Q_{t-4}$ is not required as an input and WAGANN3 (Appendix Ia), GANN3 and AR3 (Eq. 11) are the most suitable for the purpose, whereas WAGANN4 (Appendix Ib), GANN4 and AR4 (Eq. 12) are found to be the most suitable models for the Kosi River.

$$Q_{t+1} = 489 + 0.941\,Q_t - 0.216\,Q_{t-1} + 0.153\,Q_{t-2} \tag{11}$$

$$Q_{t+1} = 478.79 + 0.828\,Q_t - 0.111\,Q_{t-1} + 0.036\,Q_{t-2} + 0.099\,Q_{t-3} \tag{12}$$
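Equations 11 and 12 can be applied directly to recent daily flows. The sketch below simply transcribes the two regressions into Python; the function names and the sample flows are hypothetical and serve only to show how the lagged inputs are ordered.

```python
def ar3_gandak(q_t, q_t1, q_t2):
    """Eq. 11 (AR3, Gandak): 1-day-ahead flow in m3/s.

    q_t, q_t1, q_t2 are the flows on days t, t-1 and t-2.
    """
    return 489.0 + 0.941 * q_t - 0.216 * q_t1 + 0.153 * q_t2

def ar4_kosi(q_t, q_t1, q_t2, q_t3):
    """Eq. 12 (AR4, Kosi): 1-day-ahead flow in m3/s.

    q_t, q_t1, q_t2, q_t3 are the flows on days t, t-1, t-2 and t-3.
    """
    return 478.79 + 0.828 * q_t - 0.111 * q_t1 + 0.036 * q_t2 + 0.099 * q_t3

# Hypothetical rising flows (m3/s), most recent day first; not data from the paper.
print(round(ar3_gandak(1000.0, 900.0, 800.0), 2))
print(round(ar4_kosi(1000.0, 900.0, 800.0, 700.0), 2))
```

In both equations the most recent flow carries by far the largest coefficient, so the forecasts track the latest observation closely.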

Figure 6 illustrates the observed and the predicted flows of the two rivers for the verification dataset by the best performing models, i.e., WAGANN4, GANN4 and AR4 for the Kosi River and WAGANN3, GANN3 and AR3 for the Gandak River. Apparently, compared to other models, WAGANN4 and WAGANN3 show the least deviation between the observed and the predicted discharges. The WAGANN4 satisfactorily predicts the highest three flows of 7,528 m³/s, 7,511 m³/s and 7,290 m³/s as 7,247 m³/s, 8,001 m³/s and 6,994 m³/s respectively in the Kosi River, while the AR4 and the GANN4 underpredict them as 4,971 m³/s, 6,308 m³/s and 6,461 m³/s respectively, and 5,227 m³/s, 6,130 m³/s and 6,021 m³/s respectively. The WAGANN4 also estimates the low flows better than the other developed models. It estimates the lowest three flows of 467 m³/s, 693 m³/s and 807 m³/s in the Kosi River as 907 m³/s, 968 m³/s and 1,043 m³/s respectively, while the AR4 and the GANN4 significantly overpredict them as 1,075 m³/s, 1,213 m³/s and 1,197 m³/s respectively, and 1,129 m³/s, 1,115 m³/s and 1,235 m³/s respectively. This implies that the WAGANN4 is capable of capturing the input–output pattern well even for the extreme values. On comparison, the WAGANN3, which is found the most suitable for the Gandak River, is not as good for the high flow prediction, significantly over- or underpredicting the three peak flows of 12,834 m³/s, 10,980 m³/s and 9,294 m³/s in the Gandak River as 6,930 m³/s, 11,714 m³/s and 6,149 m³/s respectively. But the skewness in prediction by the AR3 and the GANN3 is more striking, with their prediction of the three peak flows as 6,655 m³/s, 11,987 m³/s and 6,022 m³/s respectively, and 6,091 m³/s, 11,550 m³/s and 5,769 m³/s respectively. The prediction of the low flows by the WAGANN3 seems to be in good agreement with the observed data in the Gandak River. It estimates the lowest three flows of 623 m³/s, 654 m³/s and 685 m³/s in this river as 921 m³/s, 862 m³/s and 1,013 m³/s respectively, while the AR3 and the GANN3 overpredict these flows as 1,086 m³/s, 1,010 m³/s and 1,198 m³/s respectively, and 1,137 m³/s, 1,214 m³/s and 1,138 m³/s respectively.
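The comparison of the Kosi peak flows can be put in numbers by computing percentage errors from the values quoted above. This is a small worked example, not part of the original analysis; the mean absolute percentage error is an illustrative summary chosen here and is not one of the performance indices used in the study.

```python
# Three highest observed Kosi flows (m3/s) and the model predictions quoted in the text.
observed = [7528, 7511, 7290]
predicted = {
    "WAGANN4": [7247, 8001, 6994],
    "AR4": [4971, 6308, 6461],
    "GANN4": [5227, 6130, 6021],
}

def percent_errors(obs, pred):
    """Signed percentage error of each prediction (negative = underprediction)."""
    return [100.0 * (p - o) / o for o, p in zip(obs, pred)]

def mean_abs_percent_error(obs, pred):
    """Average magnitude of the percentage errors."""
    errors = percent_errors(obs, pred)
    return sum(abs(e) for e in errors) / len(errors)

mape = {name: mean_abs_percent_error(observed, values)
        for name, values in predicted.items()}
for name, value in mape.items():
    print(f"{name}: {value:.1f} %")
```

For these three peaks, the WAGANN4 errors stay within a few percent, whereas the AR4 and GANN4 predictions fall short of the largest flow by roughly a third.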

Figure 7 shows the percentage of the predicted flows for the whole dataset falling into different discrepancy brackets by the best performing models in the Kosi and Gandak Rivers. The objective is to show how the predicted values compare against the observed values for the whole dataset. This figure reaffirms that the WAGANN models give an even prediction, showing little bias for under- or overprediction, while AR and GANN models show significant bias for overprediction.
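The kind of tabulation behind Fig. 7 can be sketched as follows. This section does not restate the definition of DR, so the code assumes DR = log10(predicted/observed); both the sign convention and the bracket edges are assumptions made for illustration. The sample values are the three lowest and three highest Kosi flows and their WAGANN4 predictions quoted earlier.

```python
import math

def discrepancy_ratio(observed, predicted):
    """ASSUMED definition: DR = log10(predicted / observed).

    Under this convention a positive DR means overprediction.
    """
    return math.log10(predicted / observed)

def bracket_percentages(obs, pred, edges):
    """Percentage of predictions whose DR falls in each [edges[i], edges[i+1]) bracket.

    Values outside the outermost edges are not counted in any bracket.
    """
    drs = [discrepancy_ratio(o, p) for o, p in zip(obs, pred)]
    counts = [0] * (len(edges) - 1)
    for dr in drs:
        for i in range(len(edges) - 1):
            if edges[i] <= dr < edges[i + 1]:
                counts[i] += 1
                break
    return [100.0 * c / len(drs) for c in counts]

# Kosi low and high flows (m3/s) with the WAGANN4 predictions quoted in the text.
obs = [467, 693, 807, 7528, 7511, 7290]
pred = [907, 968, 1043, 7247, 8001, 6994]
edges = [-0.3, -0.1, 0.1, 0.3]  # hypothetical bracket edges
print(bracket_percentages(obs, pred, edges))
```

With these six values, the three peak flows land in the central bracket and the three low flows in the upper one, which matches the overprediction of low flows described in the text.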

The above results show that the WAGANN models are efficient in monsoon river flow forecasting. However, it should be understood that the present study used daily flow data for only 7 years, which included a limited number of high flows. This length of data may not be representative of the complexity of large river systems like the Kosi and the Gandak, and the models may have overfitted the data. If so, these models would give unsatisfactory forecasts for new and unseen data. Moreover, the developed models are location and


period specific, i.e., developed for the Kosi River at Birpur gauge-site and the Gandak River at Triveni gauge-site for the monsoon period. Hence, the models may be sensitive and have significant phase problems if made to forecast flows for other periods, as the causes of floods may be different. During Jun-Sep, for example, intense monsoon rainfall causes floods in these rivers, while during Oct-Dec the retreating monsoon and during Jan-May the Himalayan glacier-melt influence flows in North Bihar rivers. Therefore, models should be developed specifically for a given period utilizing data for the same period.

5 Conclusions

Monsoon flows pose a great challenge for modeling. Traditional statistical methods like autoregression are unable to capture the nonlinearities and nonstationarity associated with them.

Fig. 6 Predicted vs. observed flows of the Kosi and the Gandak Rivers (verification dataset)


Intelligent methods are also not very accurate unless the input data are preprocessed. In this study, a hybrid model, WAGANN, has been developed embedding DWT, GA and ANN for predicting river flows for the monsoon period with a 1-day lead time. Based on different input combinations, four WAGANN models are developed and their performances evaluated for two Indian rivers, the Kosi and the Gandak. These rivers are infamous for bringing large floods almost every monsoon, causing great destruction to life and property. Based on several performance indices, it is concluded that WAGANN models predict monsoon flows better than the AR and GANN models developed for comparison. The best WAGANN models developed for the Kosi and the Gandak predict their flows with the highest accuracy of 84.7 % and 91.9 % respectively, the highest Nash-Sutcliffe Coefficient of 0.84 and 0.87 respectively and the least root mean square error of 636 m³/s and 667 m³/s respectively. Their estimates of the extreme flows for the two rivers are also in good agreement with the observed values. On comparison, AR and GANN models either significantly underpredict or overpredict these extreme flows.

Fig. 7 % Proportion vs. DR range of the predicted flows (whole dataset)


Acknowledgment This research is supported by the All India Council of Technical Education, New Delhi, India (F.N. 8023/BOR/RID/RPS-45/2007-8) and the University Grants Commission, New Delhi, India (F.N. 33-482/2007).

Appendix I

a. Optimal values of weights and biases for the ANN (3,6,1), used in the WAGANN3 and the GANN3 models (for the Gandak River)

(i) Interconnection weights from hidden neurons to input neurons: [15.4486 23.0843 -59.9149; 3.0532 11.5108 -4.9351; -5.928 4.5639 -3.3893; -6.7051 12.8147 7.0374; 14.5115 21.3057 -57.6155; -0.4871 -5.4408 2.8422]

(ii) Interconnection weights from hidden neurons to output neuron: [14.0027 -2.3576 -3.0386 -8.3629 -14.2711 -6.3305]

(iii) Bias to neurons in hidden layer: [0.5931; -4.612; 1.6813; 1.2003; 0.9016; 0.7423]

(iv) Bias to output neuron: [12.474]

b. Optimal values of weights and biases for the ANN (4,6,1), used in the WAGANN4 and the GANN4 models (for the Kosi River)

(i) Interconnection weights from hidden neurons to input neurons: [0.65459 -12.61 5.0574 -2.6024; -0.83345 -12.7699 8.5516 -1.1088; 7.8724 -9.3901 -2.9088 -0.71956; 1.5982 0.87175 -1.8122 0.21225; 2.4736 -10.7611 0.69216 -2.1315; -18.0929 8.5845 -25.7407 48.061]

(ii) Interconnection weights from hidden neurons to output neuron: [15.6315 9.3779 -7.3199 38.6412 -14.6288 -9.0617]

(iii) Bias to neurons in hidden layer: [1.9931; 4.6363; 4.3863; -0.25541; 1.9696; 7.4002]

(iv) Bias to output neuron: [-13.6284]
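The listed values are enough to evaluate the (4,6,1) network once the activation functions and the input scaling are known. Neither is stated in this appendix, so the sketch below assumes a tanh hidden layer, a linear output neuron, inputs already scaled as during training, and the input order ($q_{t-3}$, $q_{t-2}$, $q_{t-1}$, $q_t$). All four are assumptions, and the result is in the network's scaled units rather than m³/s.

```python
import math

# Appendix I(b) values for the (4,6,1) network; one row of W1 per hidden neuron.
W1 = [
    [0.65459, -12.61, 5.0574, -2.6024],
    [-0.83345, -12.7699, 8.5516, -1.1088],
    [7.8724, -9.3901, -2.9088, -0.71956],
    [1.5982, 0.87175, -1.8122, 0.21225],
    [2.4736, -10.7611, 0.69216, -2.1315],
    [-18.0929, 8.5845, -25.7407, 48.061],
]
B1 = [1.9931, 4.6363, 4.3863, -0.25541, 1.9696, 7.4002]
W2 = [15.6315, 9.3779, -7.3199, 38.6412, -14.6288, -9.0617]
B2 = -13.6284

def forward(inputs):
    """Forward pass: ASSUMED tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(W1, B1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + B2

# Hypothetical scaled inputs; the order and the scaling are assumptions.
y = forward([0.2, 0.3, 0.4, 0.5])
print(y)
```

Converting the output back to a discharge would require the inverse of whatever normalization was applied to the training data.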

