
Hydrol. Earth Syst. Sci., 22, 5759–5779, 2018
https://doi.org/10.5194/hess-22-5759-2018
© Author(s) 2018. This work is distributed under the Creative Commons Attribution 4.0 License.

Hybridizing Bayesian and variational data assimilation for high-resolution hydrologic forecasting

Felipe Hernández and Xu Liang

Department of Civil and Environmental Engineering, University of Pittsburgh, Pittsburgh, PA, 15261, USA

Correspondence: Xu Liang ([email protected])

Received: 18 July 2017 – Discussion started: 11 August 2017
Revised: 5 September 2018 – Accepted: 6 September 2018 – Published: 9 November 2018

Abstract. The success of real-time estimation and forecasting applications based on geophysical models has been possible thanks to the two main existing frameworks for the determination of the models' initial conditions: Bayesian data assimilation and variational data assimilation. However, while there have been efforts to unify these two paradigms, existing attempts struggle to fully leverage the advantages of both in order to face the challenges posed by modern high-resolution models – mainly related to model indeterminacy and steep computational requirements. In this article we introduce a hybrid algorithm called OPTIMISTS (Optimized PareTo Inverse Modeling through Integrated STochastic Search) which is targeted at non-linear high-resolution problems and that brings together ideas from particle filters (PFs), four-dimensional variational methods (4D-Var), evolutionary Pareto optimization, and kernel density estimation in a unique way. Streamflow forecasting experiments were conducted to test which specific configurations of OPTIMISTS led to higher predictive accuracy. The experiments were conducted on two watersheds: the Blue River (low resolution) using the VIC (Variable Infiltration Capacity) model and the Indiantown Run (high resolution) using the DHSVM (Distributed Hydrology Soil Vegetation Model). By selecting kernel-based non-parametric sampling, non-sequential evaluation of candidate particles, and through the multi-objective minimization of departures from the streamflow observations and from the background states, OPTIMISTS was shown to efficiently produce probabilistic forecasts with comparable accuracy to that obtained from using a particle filter. Moreover, the experiments demonstrated that OPTIMISTS scales well in high-resolution cases without imposing a significant computational overhead. With the combined advantages of allowing for fast, non-Gaussian, non-linear, high-resolution prediction, the algorithm shows the potential to increase the efficiency of operational prediction systems.

1 Introduction

Decision support systems that rely on model-based forecasting of natural phenomena are invaluable to society (Adams et al., 2003; Penning-Rowsell et al., 2000; Ziervogel et al., 2005). However, despite increasing availability of Earth-sensing data, the problem of estimation or prediction in geophysical systems remains as underdetermined as ever because of the growing complexity of such models (Clark et al., 2017). For example, taking advantage of distributed physics and the mounting availability of computational power, modern models have the potential to more accurately represent impacts of heterogeneities on eco-hydrological processes (Koster et al., 2017). This is achieved through the replacement of lumped representations with distributed ones, which entails the inclusion of numerous parameters and state variables. The inclusion of these additional unknowns has the downside of increasing the level of uncertainty in their estimation. Therefore, in order to be able to rely on these high-resolution models for critical real-time and forecast applications, considerable improvements on parameter and initial state estimation techniques must be made with two main goals: first, to allow for an efficient management of the huge number of unknowns; and second, to mitigate the harmful effects of overfitting – i.e. the loss of forecast skill due to an over-reliance on the calibration and training data (Hawkins, 2004). Because of the numerous degrees of freedom associated with these high-resolution distributed models, overfitting is a much bigger threat due to the phenomenon of equifinality (Beven, 2006).

Published by Copernicus Publications on behalf of the European Geosciences Union.

There exists a plethora of techniques to initialize the state variables of a model through the incorporation of available observations, and they possess overlapping features that make it difficult to develop clear-cut classifications. However, two main schools can be fairly identified: Bayesian data assimilation and variational data assimilation. Bayesian data assimilation creates probabilistic estimates of the state variables in an attempt to also capture their uncertainty. These state probability distributions are adjusted sequentially to better match the observations using Bayes' theorem. While the Kalman filter (KF) is constrained to linear dynamics and Gaussian distributions, ensemble Kalman filters (EnKF) can support non-linear models (Evensen, 2009), and particle filters (PFs) can also manage non-Gaussian estimates for added accuracy (Smith et al., 2013). The stochastic nature of these Bayesian filters is highly valuable because equifinality can rarely be avoided and because of the benefits of quantifying uncertainty in forecasting applications (Verkade and Werner, 2011; Zhu et al., 2002). While superior in accuracy, PFs are usually regarded as impractical for high-dimensional applications (Snyder et al., 2008), and thus recent research has focused on improving their efficiency (van Leeuwen, 2015).

On the other hand, variational data assimilation is more akin to traditional calibration approaches (Efstratiadis and Koutsoyiannis, 2010) because of its use of optimization methods. It seeks to find a single, deterministic initial-state-variable combination that minimizes the departures (or variations) of the modelled values from the observations (Reichle et al., 2001) and, commonly, from their history. One- to three-dimensional variants are also employed sequentially, but the paradigm lends itself easily to evaluating the performance of candidate solutions throughout an extended time window in four-dimensional versions (4D-Var). If the model's dynamics are linearized, the optimum can be very efficiently found in the resulting convex search space through the use of gradient methods. While this feature has made 4D-Var very popular in meteorology and oceanography (Ghil and Malanotte-Rizzoli, 1991), its application in hydrology has been less widespread because of the difficulty of linearizing land-surface physics (Liu and Gupta, 2007). Moreover, variational data assimilation requires the inclusion of computationally expensive adjoint models if one wishes to account for the uncertainty of the state estimates (Errico, 1997).

Traditional implementations from both schools have interesting characteristics and thus the development of hybrid methods has received considerable attention (Bannister, 2016). For example, Bayesian filters have been used as adjoints in 4D-Var to enable probabilistic estimates (Zhang et al., 2009). Moreover, some Bayesian approaches have been coupled with optimization techniques to select ensemble members (Dumedah and Coulibaly, 2013; Park et al., 2009). The fully hybridized algorithm 4DEnVar (Buehner et al., 2010) is gaining increasing attention for weather prediction (Desroziers et al., 2014; Lorenc et al., 2015). It is especially interesting that some algorithms have defied the traditional choice between sequential and extended-time evaluations. Weakly constrained 4D-Var allows state estimates to be determined at several time steps within the assimilation time window and not only at the beginning (Ning et al., 2014; Trémolet, 2006). Conversely, modifications to EnKFs and PFs have been proposed to extend the analysis of candidate members/particles to span multiple time steps (Evensen and van Leeuwen, 2000; Noh et al., 2011). The success of these hybrids demonstrates that there is a balance to be sought between the allowed number of degrees of freedom and the amount of information to be assimilated at once.

Following these promising paths, in this article we introduce OPTIMISTS (Optimized PareTo Inverse Modelling through Integrated STochastic Search), a hybrid data assimilation algorithm whose design was guided by the two stated goals: (i) to allow for practical scalability to high-dimensional models and (ii) to enable balancing the imperfect observations and the imperfect model estimates to minimize overfitting. Table 1 summarizes the main characteristics of typical Bayesian and variational approaches and their contrasts with those of OPTIMISTS. Our algorithm incorporates the features that the literature has found to be the most valuable from both Bayesian and variational methods while mitigating the deficiencies or disadvantages associated with these original approaches (e.g. the linearity and determinism of 4D-Var and the limited scalability of PFs): non-Gaussian probabilistic estimation and support for non-linear model dynamics have been long held as advantageous over their alternatives (Gordon et al., 1993; van Leeuwen, 2009) and, similarly, meteorologists favour extended-period evaluations over sequential ones (Gauthier et al., 2007; Rawlins et al., 2007; Yang et al., 2009). As shown in the table, OPTIMISTS can readily adopt these proven strategies.

However, there are other aspects of the assimilation problem for which no single combination of features has demonstrated its superiority. For example, is the consistency with previous states better achieved through the minimization of a cost function that includes a background error term (Fisher, 2003), as in variational methods, or through limiting the exploration to samples drawn from that background state distribution, as in Bayesian methods? Table 1 shows that in these cases OPTIMISTS allows for flexible configurations, and it is an additional objective of this study to test which set of feature interactions allows for more accurate forecasts when using highly distributed models. While many of the concepts utilized within the algorithm have been proposed in the literature before, their combination and broad range of available configurations are unlike those of other methods – including existing hybrids, which have mostly been developed around ensemble Kalman filters and convex optimization techniques (Bannister, 2016) and are therefore limited to Gaussian distributions and linear dynamics.




Table 1. Comparison between the main features of standard Bayesian data assimilation algorithms (KF: Kalman filter, EnKF: ensemble KF, PF: particle filter), variational data assimilation (one- to four-dimensional), and OPTIMISTS.

Feature | Bayesian | Variational | OPTIMISTS
Resulting state-variable estimate | Probabilistic: Gaussian (KF, EnKF), non-Gaussian (PF) | Deterministic (unless adjoint model is used) | Probabilistic (using kernel density estimation)
Solution quality criteria | High likelihood given observations | Minimum cost value (error, departure from history) | Flexible: e.g. minimum error, maximum consistency with history
Analysis time step | Sequential | Sequential (1-D–3-D) or entire assimilation window (4-D) | Flexible
Search method | Iterative Bayesian belief propagation | Convex optimization | Coupled belief propagation and multi-objective optimization
Model dynamics | Linear (KF), non-linear (EnKF, PF) | Linearized to obtain convex solution space | Non-linear (non-convex solution space)

2 Data assimilation algorithm

In this section we describe OPTIMISTS, our proposed data assimilation algorithm which combines advantageous features from several Bayesian and variational methods. As will be explained in detail for each of the steps of the algorithm, these features were selected with the intent of mitigating the limitations of existing methods. OPTIMISTS allows selecting a flexible data assimilation time step Δt – i.e. the time window in which candidate state configurations are compared to observations. It can be as short as the model time step or as long as the entire assimilation window. For each assimilation time step at time t a new state probability distribution S^(t+Δt) is estimated from the current distribution S^t, the model, and one or more observations o_obs^(t:t+Δt). For hydrologic applications, as those explored in this article, these states S include land-surface variables within the modelled watershed such as soil moisture, snow cover and water equivalent, and stream water volume; and observations o are typically of streamflow at the outlet (Clark et al., 2008), soil moisture (Houser et al., 1998), and/or snow cover (Andreadis and Lettenmaier, 2006). However, the description of the algorithm will use field-agnostic terminology to not discourage its application in other disciplines.

State probability distributions S in OPTIMISTS are determined from a set of weighted root or base sample states s_i using multivariate weighted kernel density estimation (West, 1993). This form of non-parametric distribution stands in stark contrast with those from KFs and EnKFs in its ability to model non-Gaussian behaviour – an established advantage of PFs. Each of these samples or ensemble members s_i is comprised of a value vector for the state variables. The objective of the algorithm is then to produce a set of n samples s_i^(t+Δt) with corresponding weights w_i for the next assimilation time step to determine the target distribution S^(t+Δt).

This process is repeated iteratively each assimilation time step Δt until the entire assimilation time frame is covered, at which point the resulting distribution can be used to perform the forecast simulations. In Sect. 2.1 we describe the main ideas and steps involved in the OPTIMISTS data assimilation algorithm; details regarding the state probability distributions, mainly on how to generate random samples and evaluate the likelihood of particles, are explained in Sect. 2.2; and modifications required for high-dimensional problems are described in Sect. 2.3.

2.1 Description of the OPTIMISTS data assimilation algorithm

Let a “particle” P_i be defined by a “source” (or initial) vector of state variables s_i^t (which is a sample of distribution S^t), a corresponding “target” (or final) state vector s_i^(t+Δt) (a sample of distribution S^(t+Δt)), a set of output values o_i^(t:t+Δt) (those that have corresponding observations o_obs^(t:t+Δt)), a set of fitness metrics f_i, a rank r_i, and a weight w_i. Note that the denomination “particle” stems from the PF literature and is analogous to the “member” term in EnKFs. The fitness metrics f_i are used to compare particles with each other in the light of one or more optimization objectives. The algorithm consists of the following steps, whose motivation and details are included in the sub-subsections below and whose interactions are illustrated in Fig. 1. Table 2 lists the meaning of each of the seven global parameters (Δt, n, w_root, p_samp, kF-class, n_evo, and g).

1. Drawing: draw root samples s_i^t from S^t in descending weight order until ∑w_i ≥ w_root.

2. Sampling: randomly sample S^t until the total number of samples in the ensemble is p_samp × n.




Table 2. List of global parameters in OPTIMISTS.

Symbol | Description | Range
Δt | Assimilation time step (particle evaluation time frame) | R+
n | Total number of root states s_i in the probability distributions | N ≥ 2
w_root | Total weight of root samples drawn from S^t | R ∈ [0, 1]
p_samp | Percentage of n corresponding to drawn and random samples | R ∈ [0, 1]
kF-class | Whether or not to use F-class kernels (if not, D-class kernels are used) | true or false
n_evo | Samples to be generated by the optimizers per iteration | N ≥ 2
g | Level of greed for the assignment of particle weights w_i | R ∈ [−1, 1]
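For illustration, these seven global parameters can be collected into a single configuration object. The sketch below is ours, not the authors' code; the field names and default values are hypothetical choices, while the range checks mirror Table 2.

```python
from dataclasses import dataclass

@dataclass
class OptimistsParams:
    """Global parameters of OPTIMISTS (see Table 2); defaults are hypothetical."""
    dt_hours: float = 24.0  # Δt: assimilation time step (R+)
    n: int = 30             # total number of root states (N >= 2)
    w_root: float = 0.95    # total weight of drawn root samples ([0, 1])
    p_samp: float = 0.8     # fraction of n from drawing and sampling ([0, 1])
    f_class: bool = True    # True: F-class kernels; False: D-class kernels
    n_evo: int = 10         # samples generated by the optimizers per iteration (N >= 2)
    g: float = 0.5          # greed for particle weight assignment ([-1, 1])

    def __post_init__(self):
        # Enforce the ranges listed in Table 2.
        assert self.dt_hours > 0 and self.n >= 2 and self.n_evo >= 2
        assert 0.0 <= self.w_root <= 1.0 and 0.0 <= self.p_samp <= 1.0
        assert -1.0 <= self.g <= 1.0

params = OptimistsParams(n=100, g=0.75)
print(params.n, params.g)  # 100 0.75
```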

Figure 1. Steps in OPTIMISTS, to be repeated for each assimilation time step Δt. In this example state vectors have two variables, observations are of streamflow, and particles are judged using two user-selected objectives: the likelihood given S^t, to be maximized, and the error given the observations, to be minimized. (a) Initial state kernel density distribution S^t from which root samples (purple rhombi) are taken during the drawing step and random samples (yellow rhombi) are taken during the sampling step. (b) Execution of the model (simulation step) for each source sample for a time equal to Δt to compute output variables (for comparison with observations) and target samples (circles). (c) Evaluation of each particle (evaluation step) based on the objectives and organization into non-domination fronts (ranking step). The dashed lines represent the fronts while the arrows denote domination relationships between particles in adjacent fronts. (d) Optional optimization step, which can be executed several times and which uses a population-based evolutionary optimization algorithm to generate additional samples (red rhombi). (e) Target state kernel density distribution S^(t+Δt) constructed from the particles' final samples (circles) after being weighted according to the rank of their front (weighting step): kernels centred on samples with higher weight (shown larger) have a higher probability density contribution.

3. Simulation: compute s_i^(t+Δt) and o_i^(t:t+Δt) from each non-evaluated sample s_i^t using the model.

4. Evaluation: compute the fitness values f_i for each particle P_i.

5. Optimization: create additional samples using evolutionary algorithms and return to 3 (if the number of samples is below n).

6. Ranking: assign ranks r_i to all particles P_i using non-dominated sorting.

7. Weighting: compute the weight w_i for each particle P_i based on its rank r_i.
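As a compact illustration, these steps can be arranged into one assimilation loop. The sketch below is ours, not the authors' implementation: it uses a toy linear model, a single error objective, a fixed kernel bandwidth, and trivial stand-ins for the optimization, ranking, and weighting machinery detailed in the following sub-subsections.

```python
import numpy as np

rng = np.random.default_rng(1)

def assimilation_step(roots, weights, model, obs, n, w_root, p_samp, bandwidth=0.05):
    """One OPTIMISTS time step (toy sketch; all names and defaults are ours)."""
    # 1. Drawing: root samples in descending weight order until their sum >= w_root.
    order = np.argsort(weights)[::-1]
    k = int(np.searchsorted(np.cumsum(weights[order]), w_root)) + 1
    sources = list(roots[order[:k]])
    # 2. Sampling: random kernel samples until p_samp * n members exist.
    while len(sources) < int(p_samp * n):
        centre = roots[rng.choice(len(roots), p=weights)]
        sources.append(centre + bandwidth * rng.standard_normal(roots.shape[1]))
    # 3-4. Simulation and evaluation: run the model and score each particle.
    particles = [(s, *model(s)) for s in sources]          # (source, target, output)
    fitness = [abs(out - obs) for (_, _, out) in particles]
    # 5. A crude stand-in for the optimization step: mutate the best particle
    #    so far until the ensemble reaches n members.
    while len(particles) < n:
        best = particles[int(np.argmin(fitness))][0]
        s = best + bandwidth * rng.standard_normal(roots.shape[1])
        particles.append((s, *model(s)))
        fitness.append(abs(particles[-1][2] - obs))
    # 6-7. Ranking and weighting: with one objective this degenerates to sorting
    #      by error; uniform weights are used here instead of Eqs. (2)-(3).
    targets = np.array([t for (_, t, _) in particles])
    return targets, np.full(n, 1.0 / n)

model = lambda s: (0.9 * s, float(s.sum()))   # toy dynamics and "streamflow" output
roots = rng.standard_normal((5, 3))
targets, w = assimilation_step(roots, np.full(5, 0.2), model, obs=1.0,
                               n=8, w_root=0.5, p_samp=0.75)
print(targets.shape)  # (8, 3)
```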

2.1.1 Drawing step

While traditional PFs draw all the root (or base) samples from S^t (Gordon et al., 1993), OPTIMISTS can limit this selection to a subset of them. The root samples with the highest weight – those that are the best performers – are drawn first, followed by the next ones in descending weight order, until the total weight of the drawn samples ∑w_i reaches w_root. w_root thus controls what percentage of the root samples to draw, and, if set to one, all of them are selected.

2.1.2 Sampling step

In this step the set of root samples drawn is complemented with random samples. The distinction between root samples and random samples is that the former are those that define the probability distribution S^t (they serve as centroids for the kernels), while the latter are generated stochastically from the kernels. Random samples are generated until the size of the combined set reaches p_samp × n by following the equations introduced in Sect. 2.2. This second step contributes to the diversity of the ensemble in order to avoid the sample impoverishment seen in PFs (Carpenter et al., 1999) and serves as a replacement for traditional resampling strategies (Liu and Chen, 1998). The parameter w_root therefore controls the intensity with which this feature is applied to offer users some level of flexibility. Generating random samples at the beginning, instead of resampling those that have already been evaluated, could lead to discarding degenerate particles (those with high errors) early on and contribute to improved efficiency, given that the ones discarded are mainly those with the lowest weight as determined in the previous assimilation time step.
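The sampling step can be sketched as drawing from a weighted Gaussian-kernel mixture: pick a kernel centre with probability proportional to its weight, then perturb it. This is our simplification; the actual kernels and bandwidths used by OPTIMISTS are defined in Sect. 2.2, and the fixed diagonal bandwidth below is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_kde(roots, weights, n_new, bandwidth=0.1):
    """Random samples from a weighted Gaussian kernel density estimate.
    roots: (m, d) kernel centroids; weights: their mixture weights (sum to 1)."""
    idx = rng.choice(len(roots), size=n_new, p=weights)            # pick centres
    noise = bandwidth * rng.standard_normal((n_new, roots.shape[1]))
    return roots[idx] + noise                                      # perturb them

# e.g. three two-variable soil-moisture states with unequal weights
roots = np.array([[0.20, 0.50], [0.30, 0.40], [0.25, 0.45]])
weights = np.array([0.5, 0.3, 0.2])
new_samples = sample_from_kde(roots, weights, n_new=4)
print(new_samples.shape)  # (4, 2)
```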

2.1.3 Simulation step

In this step, the algorithm uses the model to compute the resulting state vector s_i^(t+Δt) and an additional set of output variables o_i^(t:t+Δt) for each of the samples (it is possible that state variables double as output variables). The simulation is executed starting at time t for the duration of the assimilation time step Δt (not to be confused with the model time step, which is usually shorter). Depending on the complexity of the model, the simulation step can be the one with the highest computational requirements. In those cases, parallelization of the simulations would greatly help in reducing the total footprint of the assimilation process. The construction of each particle P_i is started by assembling the corresponding values computed so far: s_i^t (drawing, sampling, and optimization steps), and s_i^(t+Δt) and o_i^(t:t+Δt) (simulation step).
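Because each particle's model run is independent of the others, the simulation step maps cleanly onto a worker pool. The sketch below is ours: `simulate` is a toy stand-in for one model run over Δt (a real run would invoke VIC or DHSVM), and a process pool would typically replace the thread pool for CPU-bound models.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(source):
    """Stand-in for one model run over Δt: returns (target state, output).
    The toy dynamics decay the state; the output mimics a streamflow value."""
    target = [0.9 * v for v in source]
    output = sum(source)
    return target, output

def simulation_step(sources, workers=4):
    # Run all particle simulations concurrently; order of results matches inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, sources))

results = simulation_step([[1.0, 2.0], [0.5, 0.25]])
print(results[0][1])  # 3.0
```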

2.1.4 Evaluation step

In order to determine which initial state s_i^t is the most desirable, a two-term cost function J is typically used in variational methods that simultaneously measures the resulting deviations of modelled values o_i^(t:t+Δt) from observed values o_obs^(t:t+Δt) and the departures from the background state distribution S^t (Fisher, 2003). The function usually has the form shown in Eq. (1):

J_i = c_1 · J_background(s_i^t, S^t) + c_2 · J_observations(o_i^(t:t+Δt), o_obs^(t:t+Δt)),   (1)

where c_1 and c_2 are balancing constants usually set so that c_1 = c_2. Such a multi-criteria evaluation is crucial both to guarantee a good level of fit with the observations (second term) and to prevent the optimization algorithm from producing an initial state that is inconsistent with previous states (first term) – which could potentially result in overfitting problems rooted in disproportionate violations of mass and energy conservation laws (e.g. in hydrologic applications a sharp, unrealistic rise in the initial soil moisture could reduce J_observations but would increase J_background). In Bayesian methods, since the consistency with the state history is maintained by sampling only from the prior or background distribution S^t, single-term functions are used instead – which typically measure the probability density or likelihood of the modelled values given a distribution of the observations.

In OPTIMISTS any such fitness metric can be used and, most importantly, the algorithm allows defining several of them. Moreover, users can determine whether each function is to be minimized (e.g. costs or errors) or maximized (e.g. likelihoods). We expect these features to be helpful if one wishes to separate errors when multiple types of observations are available (Montzka et al., 2012) and as a more natural way to consider different fitness criteria (lumping them together in a single function as in Eq. (1) can lead to balancing and "apples and oranges" complications). Moreover, it might prove beneficial to take into account the consistency with the state history both by explicitly defining such an objective here and by allowing states to be sampled from the previous distribution (thus compounding the individual mechanisms of Bayesian and variational methods). Functions to measure this consistency are proposed in Sect. 2.2. With the set of objective functions defined by the user, the algorithm computes the vector of fitness metrics f_i for each particle during the evaluation step.
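A pair of user-defined objectives might look as follows. This is a hedged sketch: the paper leaves the metrics to the user, so the RMSE against streamflow observations (to minimize) and the Gaussian log-density of the source state under the background (to maximize) shown here are our example choices, and the Gaussian background is a simplification of the kernel density distribution of Sect. 2.2.

```python
import numpy as np

def fitness(sim_flow, obs_flow, source, bg_mean, bg_cov):
    """Two example objectives for the evaluation step:
    (1) RMSE of simulated vs. observed streamflow, to be minimized;
    (2) unnormalized Gaussian log-density of the source state under the
        background, to be maximized (penalizes departures from history)."""
    rmse = float(np.sqrt(np.mean((np.asarray(sim_flow) - np.asarray(obs_flow)) ** 2)))
    d = np.asarray(source) - bg_mean
    log_density = float(-0.5 * d @ np.linalg.solve(bg_cov, d))  # up to a constant
    return rmse, log_density

f = fitness(sim_flow=[1.0, 2.0], obs_flow=[1.5, 2.5],
            source=[0.3, 0.4], bg_mean=np.array([0.3, 0.4]), bg_cov=np.eye(2))
print(f[0])  # 0.5 (and f[1] is 0.0: no departure from the background mean)
```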

2.1.5 Optimization step

The optimization step is optional and is used to generate additional particles by exploiting the knowledge encoded in the fitness values of the current particle ensemble. In a twist to the signature characteristic of variational data assimilation, OPTIMISTS incorporates evolutionary multi-objective optimization algorithms (Deb, 2014) instead of the established gradient-based, single-objective methods. Evolutionary optimizers compensate for their slower convergence speed with the capability of efficiently navigating non-convex solution spaces (i.e. the models and the fitness functions do not need to be linear with respect to the observations and the states). This feature effectively opens the door for variational methods to be used in disciplines where the linearization of the driving dynamics is either impractical, inconvenient, or undesirable. Whereas any traditional multi-objective global optimization method would work, our implementation of OPTIMISTS features a state-of-the-art adaptive ensemble algorithm, similar to AMALGAM (Vrugt and Robinson, 2007), that allows model simulations to be run in parallel (Crainic and Toulouse, 2010). The optimizer ensemble includes a genetic algorithm (Deb et al., 2002) and a hybrid approach that combines ant colony optimization (Socha and Dorigo, 2008) and Metropolis–Hastings sampling (Haario et al., 2001).

During the optimization step, the group of optimizers is used to generate n_evo new sample states s_i^t based on those in the current ensemble. For example, the genetic algorithm selects pairs of base samples with high performance scores f_i and then proceeds to combine their individual values using standard crossover and mutation operators. The simulation and evaluation steps are repeated for these new samples, and this iterative process continues until the particle ensemble has a size of n. Note that w_root and p_samp thus determine what percentage of the particles is generated in which way. For example, for relatively small values of w_root and a p_samp of 0.2, 80 % of the particles will be generated by the optimization algorithms. In this way, OPTIMISTS offers its users the flexibility to behave anywhere in the range between fully Bayesian (p_samp = 1) and fully variational (p_samp = 0) in terms of particle generation. In the latter case, in which no root and random samples are available, the initial population or ensemble of states s_i^t is sampled uniformly from the viable range of each state variable.

2.1.6 Ranking step

A fundamental aspect of OPTIMISTS is the way in which it provides a probabilistic interpretation to the results of the multi-objective evaluation, thus bridging the gap between Bayesian and variational assimilation. Such a method has been used before (Dumedah et al., 2011) and is based on the employment of non-dominated sorting (Deb, 2014), another technique from the multi-objective optimization literature, which is used to balance the potential tensions between various objectives. Instead of organizing all particles from the best to the worst, this sorting approach is centred on the concept of dominance. A particle dominates another if it outperforms it according to at least one of the criteria/objectives while simultaneously not being outperformed according to any of the others. Following this principle, in the ranking step particles are grouped in fronts comprised of members which are mutually non-dominated; that is, none of them is dominated by any of the rest. Particles in a front, therefore, represent the effective trade-offs between the competing criteria.

Figure 1c illustrates the result of non-dominated sorting applied to nine particles being analysed under two objectives: minimum deviation from observations and maximum likelihood given the background state distribution S^t. Note that, if a single objective function is used, the sorting method assigns ranks from best to worst according to that function, and two particles would only share a rank if their fitness values coincide. In our implementation we use the fast non-dominated sorting algorithm to define the fronts and assign the corresponding ranks r_i (Deb et al., 2002). More efficient non-dominated sorting alternatives are available if performance becomes an issue (Zhang et al., 2015).
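To make the dominance and front concepts concrete, a minimal sketch follows. This is a simplified quadratic-time stand-in, not the fast non-dominated sorting of Deb et al. (2002) used in our implementation, and it assumes all objectives are to be minimized:

```python
def dominates(a, b):
    """a dominates b if it is at least as good on every objective
    (all minimized here) and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_ranks(objectives):
    """Assign each particle the 1-based index of its non-dominated front."""
    remaining = set(range(len(objectives)))
    ranks = [0] * len(objectives)
    front = 1
    while remaining:
        # Current front: particles not dominated by any other remaining particle
        current = [i for i in remaining
                   if not any(dominates(objectives[j], objectives[i])
                              for j in remaining if j != i)]
        for i in current:
            ranks[i] = front
        remaining.difference_update(current)
        front += 1
    return ranks
```

For example, with two minimized objectives, `non_dominated_ranks([[1, 4], [2, 3], [3, 3], [4, 1]])` places the third particle in the second front (it is dominated by `[2, 3]`) and the rest in the Pareto front.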

2.1.7 Weighting step

In this final step, OPTIMISTS assigns a weight w_i to each particle according to its rank r_i, as shown in Eqs. (2) and (3). This Gaussian weighting depends on the ensemble size n and the greed parameter g and is similar to the one proposed by Socha and Dorigo (2008). When g is equal to zero, particles in all fronts are weighted uniformly; when g is equal to one, only particles in the Pareto or first front are assigned non-zero weights. With this, the final estimated probability distribution of state variables for the next time step S^{t+Δt} can be established using multivariate weighted kernel density estimation (details in the next sub-section), as demonstrated in Fig. 1e, by taking all target states s_i^{t+Δt} (circles) as the centroids of the kernels. The obtained distribution S^{t+Δt} can then be used as the initial distribution for a new assimilation time step or, if the end of the assimilation window has been reached, to perform (ensemble) forecast simulations.

w_i = 1 / (σ·√(2π)) · e^(−(r_i − 1)² / (2σ²))    (2)

σ = n · [0.1 + 9.9 · (1 − g)⁵]    (3)
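A direct transcription of the weighting scheme of Eqs. (2) and (3) might look as follows; the normalization at the end is an added convenience for downstream use, not part of the equations themselves:

```python
import math

def particle_weights(ranks, n, g):
    """Gaussian rank-based weighting: sigma shrinks as the greed
    parameter g grows, concentrating weight on the best fronts."""
    sigma = n * (0.1 + 9.9 * (1.0 - g) ** 5)                    # Eq. (3)
    w = [math.exp(-(r - 1) ** 2 / (2.0 * sigma ** 2))
         / (sigma * math.sqrt(2.0 * math.pi)) for r in ranks]   # Eq. (2)
    total = sum(w)
    return [x / total for x in w]
```

Particles in the first front (r_i = 1) always receive the largest weight, and weights decay with increasing rank.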

2.2 Model state probability distributions

As mentioned before, OPTIMISTS uses kernel density probability distributions (West, 1993) to model the stochastic estimates of the state-variable vectors. The algorithm requires two computations related to the state-variable probability distribution S^t: obtaining the probability density p or likelihood L of a sample, and generating random samples. The first computation can be used in the evaluation step as an objective function to preserve the consistency of particles with the state history (e.g. to penalize aggressive departures from the prior conditions). It should be noted that several metrics that try to approximate this consistency exist, from very simple (Dumedah et al., 2011) to quite complex (Ning et al., 2014). For example, it is common in variational data assimilation to utilize the background error term

J_background = (s − s_b)ᵀ C⁻¹ (s − s_b),    (4)

where s_b and C are the mean and the covariance of the background state distribution (S^t in our case), which is assumed to be Gaussian (Fisher, 2003). The term J_background is plugged into the cost function shown in Eq. (1). For OPTIMISTS, we propose that the probability density of the weighted state kernel density distribution S^t at a given point (p) be used as a stand-alone objective. The density is given by Eq. (5) (Wand and Jones, 1994). If Gaussian kernels are selected, the kernel function K, parameterized by the bandwidth matrix B, is evaluated using Eq. (6).

p(s|S) = (1 / Σ w_i) · Σ_{i=1..n} [ w_i · K_B(s − s_i) ]    (5)

K_B^Gauss(z) = 1 / √((2π)^d · |B|) · exp(−½ · zᵀ B⁻¹ z)    (6)

Matrix B is the covariance matrix of the kernels and thus determines their spread and orientation in the state space. B is of size d × d, where d is the dimensionality of the state distribution (i.e. the number of variables), and can be thought of as a scaled-down version of the background error covariance matrix C from the variational literature. In this sense, matrix B, together with the spread of the ensemble of samples s_i, effectively encodes the uncertainty of the state variables. Several optimization-based methods exist to compute B by attempting to minimize the asymptotic mean integrated squared error (AMISE) (Duong and Hazelton, 2005; Sheather and Jones, 1991). However, here we opt to use a simplified approach for the sake of computational efficiency: we determine B by scaling down the sample covariance matrix C using Silverman's rule of thumb, which takes into account the number of samples n and the dimensionality d of the distribution, as shown in Eq. (7) (Silverman, 1986). Figure 1 shows the density of two two-dimensional example distributions using this method (Fig. 1a and e). If computational constraints are not a concern, using AMISE-based methods or kernels with variable bandwidth (Hazelton, 2003; Terrell and Scott, 1992) could result in higher accuracy.

B_Silverman = (4 / (d + 2))^(2/(d+4)) · n^(−2/(d+4)) · C    (7)
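Eq. (7) amounts to multiplying every entry of the sample covariance matrix by a single scalar. A minimal sketch, with C represented as a list of lists:

```python
def silverman_bandwidth(C, n, d):
    """Scale the sample covariance matrix C down to the kernel
    bandwidth matrix B using Silverman's rule of thumb (Eq. 7)."""
    factor = (4.0 / (d + 2)) ** (2.0 / (d + 4)) * n ** (-2.0 / (d + 4))
    return [[factor * c for c in row] for row in C]
```

The scaling factor shrinks as the sample size n grows, so individual kernels narrow around their centroids while the spread of the ensemble keeps representing the overall covariance.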

Secondly, OPTIMISTS' sampling step requires generating random samples from a multivariate weighted kernel density distribution. This is achieved by dividing the problem in two: we first select the root sample and then generate a random sample from the kernel associated with that base sample. The first step corresponds to randomly sampling a multinomial distribution with n categories, assigning the normalized weights of the particles as the probability of each category. Once a root sample s_root is selected, a random sample s_random can be generated from a vector v of independent standard normal random values of size d and a matrix A, as shown in Eq. (8). A can be computed from a Cholesky decomposition (Krishnamoorthy and Menon, 2011) such that AAᵀ = B. Alternatively, an eigendecomposition can be used to obtain QΛQᵀ = B and then set A = Q·Λ^(1/2).

s_random = s_root + A·v    (8)
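The two-stage sampling procedure can be sketched as follows, with a pure-Python Cholesky factorization for illustration (B must be symmetric positive definite):

```python
import math
import random

def cholesky(B):
    """Lower-triangular matrix A such that A·Aᵀ = B."""
    d = len(B)
    A = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            A[i][j] = (math.sqrt(B[i][i] - s) if i == j
                       else (B[i][j] - s) / A[j][j])
    return A

def sample_state(roots, weights, B, rng=random):
    """Pick a root sample with weight-proportional probability, then
    draw s_random = s_root + A·v with v ~ N(0, I), as in Eq. (8)."""
    root = rng.choices(roots, weights=weights, k=1)[0]
    A = cholesky(B)
    v = [rng.gauss(0.0, 1.0) for _ in root]
    return [r + sum(A[i][k] * v[k] for k in range(len(v)))
            for i, r in enumerate(root)]
```

In practice the factorization is computed once per distribution and reused for every sample, since B does not change between draws.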

Both computations (density or likelihood, and sampling) require B to be invertible and, therefore, that none of the variables have zero variance or are perfectly linearly dependent on each other. Zero-variance variables must therefore be isolated and B marginalized before attempting to use Eq. (6) or to compute A. Similarly, linear dependencies must also be identified beforehand. If we include variables one by one in the construction of C, we can determine that a newly added one is linearly dependent if the determinant of the extended sample covariance matrix C is zero. Once identified, the regression coefficients for the dependent variable can be efficiently computed from C following the method described by Friedman et al. (2008). The constant coefficient of the regression must also be calculated for future reference. What this process effectively does is determine a linear model for each dependent variable, represented by a set of regression coefficients. Dependent variables are not included in C, but they need to be taken into account afterwards (e.g. by determining their values for the random samples by solving the linear model with the values obtained for the variables in C).

2.3 High-dimensional state vectors

When the state vector of the model becomes large (i.e. d increases), as is the case for distributed high-resolution numerical models, difficulties start to arise when dealing with the computations involving the probability distribution. At first, the probability density, as computed with Eqs. (5) and (6), tends to diverge either towards zero or towards infinity. This phenomenon is related to the normalization of the density – so that it can integrate to one – and to its fast exponential decay as a function of the sample's distance from the kernels' centres. In these cases we propose replacing the density computation with an approximated likelihood formulation that is proportional to the inverse square Mahalanobis distance (Mahalanobis, 1936) to the root samples, thus skipping the exponentiation and normalization operations of the Gaussian density. This simplification, which corresponds to the inverse square difference between the sample value and the kernel's mean in the univariate case, is shown in Eq. (9). The resulting distortion of the Gaussian bell-curve shape does not affect the results significantly, given that OPTIMISTS uses the fitness functions only to check for domination between particles – so only the signs of the differences between likelihood values are important and not their actual magnitudes.

L_Mahalanobis(s|S) = (1 / Σ w_i) · Σ_{i=1..n} [ w_i / |(s − s_i)ᵀ B⁻¹ (s − s_i)| ]    (9)
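Assuming the inverse bandwidth matrix B⁻¹ has been precomputed, Eq. (9) reduces to a weighted sum of reciprocal quadratic forms. A sketch (the degenerate case where s coincides exactly with a kernel centre, making the distance zero, is not handled here):

```python
def mahalanobis_likelihood(s, samples, weights, B_inv):
    """Approximate likelihood proportional to the inverse square
    Mahalanobis distance to the kernel centres (Eq. 9)."""
    d = len(s)
    total = 0.0
    for w, s_i in zip(weights, samples):
        diff = [a - b for a, b in zip(s, s_i)]
        # Quadratic form (s - s_i)ᵀ B⁻¹ (s - s_i)
        q = sum(diff[j] * sum(B_inv[j][k] * diff[k] for k in range(d))
                for j in range(d))
        total += w / abs(q)
    return total / sum(weights)
```

Because OPTIMISTS only compares likelihood values between particles, the missing exponentiation and normalization of Eq. (6) do not change which particles dominate which.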

However, computational constraints might also make this simplified approach unfeasible, both due to the O(d²) space requirements for storing the bandwidth matrix B and the O(d³) time complexity of the decomposition algorithms, which rapidly become huge burdens for the memory and the processors. Therefore, we can choose to sacrifice some accuracy by using a diagonal bandwidth matrix B which does not include any covariance terms – only the variance terms in the diagonal are computed and stored. This implies that, even though the multiplicity of root samples would help in maintaining a large portion of the covariance, another portion is lost by preventing the kernels from reflecting the existing correlations. In other words, variables would not be rendered completely independent, but rather conditionally independent, because the kernels are still centred on the set of root samples. Kernels using diagonal bandwidth matrices are referred to as "D-class" kernels, while those using the full covariance matrix are referred to as "F-class" kernels. The kF-class parameter controls which version is used.

With only the diagonal terms of matrix B available (b_jj), we opt to roughly approximate the likelihood by computing the average of the standardized marginal likelihood values for each variable j, as shown in Eq. (10):

L_independent(s|S) = (1 / (d · √(2π) · Σ w_i)) · Σ_{j=1..d} Σ_{i=1..n} { w_i · exp[ −(s_j − s_{i,j})² / (2·b_jj) ] },    (10)

where s_j represents the j-th element of state vector s and s_{i,j} represents the j-th element of the i-th sample of probability distribution S. Independent and marginal random sampling of each variable can also be applied to replace Eq. (8), by adding random Gaussian residuals to the elements of the selected root sample s_root. Sparse bandwidth matrices (Friedman et al., 2008; Ghil and Malanotte-Rizzoli, 1991) or low-rank approximations (Bannister, 2008; Ghorbanidehno et al., 2015; Li et al., 2015) could be worthwhile intermediate alternatives to our proposed quasi-independent approach, to be explored in the future.
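With the diagonal terms b_jj stored in a simple vector, Eq. (10) becomes a double loop over variables and samples, at O(n·d) cost and with no matrix decompositions. A minimal sketch:

```python
import math

def independent_likelihood(s, samples, weights, b_diag):
    """Quasi-independent likelihood (Eq. 10): average of the standardized
    marginal likelihoods over the d variables, using only the diagonal
    bandwidth terms b_jj."""
    d = len(s)
    total = 0.0
    for j in range(d):
        for w, s_i in zip(weights, samples):
            total += w * math.exp(-(s[j] - s_i[j]) ** 2 / (2.0 * b_diag[j]))
    return total / (d * math.sqrt(2.0 * math.pi) * sum(weights))
```

Note that standardizing the marginals removes the per-variable 1/√(b_jj) factors, which is what keeps the result from collapsing to zero or diverging as d grows large.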

3 Experimental set-up

In this section we prepare the elements to investigate whether OPTIMISTS can help improve the forecasting skill of hydrologic models. More specifically, the experiments seek to answer the following questions. Which characteristics of Bayesian and variational methods are the most advantageous? How can OPTIMISTS be configured to take advantage of these characteristics? How does the algorithm compare to established data assimilation methods? And how does it perform in high-dimensional applications? To help answer these questions, this section first introduces two case studies and then describes a traditional PF that was used for comparison purposes.

3.1 Case studies

We coupled a Java implementation of OPTIMISTS with two popular open-source distributed hydrologic modelling engines: the Variable Infiltration Capacity (VIC) model (Liang et al., 1994, 1996a, b; Liang and Xie, 2001, 2003) and the Distributed Hydrology Soil Vegetation Model (DHSVM) (Wigmosta et al., 1994, 2002). VIC is targeted at large watersheds by focusing on vertical subsurface dynamics and also enabling intra-cell precipitation, soil, and vegetation heterogeneity. The DHSVM, on the other hand, was conceived for high-resolution representations of the Earth's surface, allowing for saturated and unsaturated subsurface flow routing and 1-D or 2-D surface routing (Zhang et al., 2018). Both engines needed several modifications so that they could be executed in a non-continuous fashion, as required for sequential assimilation. Given the non-Markovian nature of surface routing schemes coupled with VIC, which are based either on multi-scale approaches (Guo et al., 2004; Wen et al., 2012) or on the unit hydrograph concept (Lohmann et al., 1998), a simplified routing routine was developed that treats the model cells as channels – albeit with longer retention times. In the simplified method, the direct run-off and baseflow produced by each model cell are partly routed through an assumed equivalent channel (slow component) and partly poured directly into the channel network (fast component). Both the channel network and the equivalent channels representing overland flow hydraulics are modelled using the Muskingum method. On the other hand, several important bugs in version 3.2.1 of the DHSVM, mostly related to the initialization of state variables but also pertaining to routing data and physics, were fixed.

We selected two watersheds to perform streamflow forecasting tests using OPTIMISTS: one with the VIC model running at a 1/8° resolution for the Blue River in Oklahoma, and the other with the DHSVM running at a 100 m resolution for the Indiantown Run in Pennsylvania. Table 3 lists the main characteristics of the two test watersheds and the information of their associated model configurations. Figure 2 shows the land cover map together with the layout of the modelling cells for the two watersheds. The multi-objective ensemble optimization algorithm included in OPTIMISTS was employed to calibrate the parameters of the two models with the streamflow measurements from the corresponding USGS stations. For the Blue River, the traditional ℓ2-norm Nash–Sutcliffe efficiency (NSE_ℓ2) (which focuses mostly on the peaks of hydrographs), an ℓ1-norm version of the Nash–Sutcliffe efficiency coefficient (NSE_ℓ1) (Krause et al., 2005), and the mean absolute relative error (MARE) (which focuses mostly on the inter-peak periods) were used as optimization criteria. From the 85 600 candidate parameterizations tried, one was chosen from the resulting Pareto front with NSE_ℓ2 = 0.69, NSE_ℓ1 = 0.56, and MARE = 44.71 %. For the Indiantown Run, the NSE_ℓ2, MARE, and absolute bias were optimized, resulting in a parameterization, out of 2575, with NSE_ℓ2 = 0.81, MARE = 37.85 %, and an absolute bias of 11.83 L s−1.

These optimal parameter sets, together with additional sets produced in the optimization process, were used to run the models and determine a set of time-lagged state-variable vectors s to construct the state probability distribution S^0 at the beginning of each of a set of data assimilation scenarios. The state variables include liquid and solid interception; ponding, water equivalent, and temperature of the snow packs; and moisture and temperature of each of the soil layers. While we do not expect all of these variables to be identifiable and sensitive within the assimilation problem, we decided to be thorough in their inclusion – a decision that also increases the challenge for the algorithm in terms of the potential for overfitting. The Blue River model application has 20 cells, with a maximum of seven intra-cell soil–vegetation partitions. After adding the stream network variables, the model has a total of d = 812 state variables. The Indiantown Run model application has a total of 1472 cells and d = 33 455 state variables.


Table 3. Characteristics of the two test watersheds: Blue River and Indiantown Run. US hydrologic units are defined in Seaber et al. (1987). Elevation information was obtained from the Shuttle Radar Topography Mission (Rodríguez et al., 2006); land cover and impervious percentage from the National Land Cover Database, NLCD (Homer et al., 2012); soil type from CONUS-SOIL (Miller and White, 1998); and precipitation, evapotranspiration, and temperature from NLDAS-2 (Cosgrove et al., 2003). The streamflow and temperature include their range of variation of 90 % of the time (5 % tails at the high and low end are excluded).

Model characteristic                | Blue River                                               | Indiantown Run
USGS station; US hydrologic unit    | 07332500; 11140102                                       | 01572950; 02050305
Area (km2); impervious              | 3031; 8.05 %                                             | 14.78; 0.83 %
Elevation range; average slope      | 158–403 m; 3.5 %                                         | 153–412 m; 14.5 %
Land cover                          | 43 % grassland, 28 % forest, 21 % pasture/hay            | 74.6 % deciduous forest
Soil type                           | Clay loam (26.4 %), clay (24.8 %), sandy loam (20.26 %)  | Silt loam (51 %), sandy loam (49 %)
Average streamflow (90 % range)     | 9.06 m3 s−1 (0.59–44.71 m3 s−1)                          | 0.3 m3 s−1 (0.035–0.793 m3 s−1)
Average precipitation; average ET   | 1086; 748 mm yr−1                                        | 1176; 528 mm yr−1
Average temperature (90 % range)    | 17.26 °C (2.5–31 °C)                                     | 10.9 °C (−3.5–24 °C)
Model cells; stream segments; d     | 20; 14; 812                                              | 1472; 21; 33 455
Resolution                          | 0.125°; daily                                            | 100 m; hourly
Calibration                         | 167 parameters; 85 months; objectives: NSE_ℓ2, NSE_ℓ1, MARE | 18 parameters; 20 months; objectives: NSE_ℓ2, MARE, absolute bias

Figure 2. Maps of the two test watersheds in the United States displaying the 30 m resolution land cover distribution from the NLCD (Homer et al., 2012). (a) Oklahoma's Blue River watershed, 0.125° resolution VIC model application (20 cells). (b) Pennsylvania's Indiantown Run watershed, 100 m resolution DHSVM model application (1472 cells).

Three diverse scenarios were selected for the Blue River, each of them comprised of a 2-week assimilation period (when streamflow observations are assimilated) and a 2-week forecasting period (when the model is run in an open loop using the states obtained at the end of the assimilation period). Scenario 1, starting on 15 October 1996, is rainy through the entire 4 weeks. Scenario 2, which starts on 15 January 1997, has a dry assimilation period and a mildly rainy forecast period. Scenario 3, starting on 1 June 1997, has a relatively rainy assimilation period and a mostly dry forecast period. Two scenarios, also spanning 4 weeks, were selected for the


Table 4. Set-up of the three factorial experiments, including the watershed, the total number of configurations (conf.), the values assigned to OPTIMISTS' parameters, and which objectives (objs.) were used (one objective: minimize MAE given the streamflow observations; two objectives: minimize MAE and maximize likelihood given the source or background state distribution S^t). n_evo was set to 25 in all cases. The total number of configurations results from combining all the possible parameter assignments listed for each experiment. Note that for Experiment 3 there are configurations that require a 4-week assimilation period (all others have a length of 2 weeks).

No. | Watershed      | Conf. | Δt                             | n        | w_root    | p_samp  | kF-class    | g      | Objs.
1   | Blue River     | 48    | 1 d, 5 d, 2 w                  | 100, 500 | 0.95      | 0.25, 1 | false, true | 0.75   | 1, 2
2   | Indiantown Run | 32    | 1 h, 2 w                       | 100, 200 | 0.6, 0.95 | 0.25, 1 | false       | 0.75   | 1, 2
3   | Indiantown Run | 24    | 1 h, 6 h, 1 d, 3.5 d, 2 w, 4 w | 100      | 0.95      | 0.4, 1  | false       | 0.5, 1 | 2

Indiantown Run, one starting on 26 July 2009 and the other on 26 August 2009.

We used factorial experiments (Montgomery, 2012) to test different configurations of OPTIMISTS on each of these scenarios, by first assimilating the streamflow and then measuring the forecasting skill. In this type of experimental design, a set of assignments is established for each parameter and then all possible assignment combinations are tried. The design allows us to establish the statistical significance of altering several parameters simultaneously, providing an adequate framework for determining, for example, whether using a short or a long assimilation time step Δt is preferable, or whether utilizing the optional optimization step within the algorithm is worthwhile. Table 4 shows the set-up of each of the three full factorial experiments we conducted, together with the selected set of assignments for OPTIMISTS' parameters. The forecasts were produced in an ensemble fashion, by running the models using each of the samples s_i from the state distribution S at the end of the assimilation time period, and then using the samples' weights w_i to produce an average forecast. Deterministic model parameters (those from the calibrated models) and forcings were used in all simulations.

Observation errors are usually taken into account in traditional assimilation algorithms by assuming a probability distribution for the observations at each time step and then performing a probabilistic evaluation of the predicted value of each particle/member against that distribution. As mentioned in Sect. 2, such a fitness metric, like the likelihood utilized in PFs to weight candidate particles, is perfectly compatible with OPTIMISTS. However, since it is difficult to estimate the magnitude of the observation error in general, and the fitness metrics f_i here are only used to determine (non-)dominance between particles, we opted to use the mean absolute error (MAE) with respect to the streamflow observations in all cases.

For the Blue River scenarios, a secondary likelihood objective or metric was used in some cases to select for particles with higher consistency with the state history. It was computed using either Eq. (10) if kF-class was set to false or Eq. (9) if it was set to true. Equation (10) was used for all Indiantown Run scenarios given the large number of dimensions. The assimilation period was 2 weeks for most configurations, except for those in Experiment 3, which have Δt = 4 weeks. During both the assimilation and the forecasting periods we used unaltered streamflow data from the USGS and forcing data from the North American Land Data Assimilation System (NLDAS-2) (Cosgrove et al., 2003) – even though forecasted forcings would be used instead in an operational setting (e.g. from systems like NAM, Rogers et al., 2009; or the ECMWF, Molteni et al., 1996). While adopting perfect forcings for the forecast period leads to an overestimation of their accuracy, any comparisons with control runs or between methods are still valid, as they all share the same benefit. Also, removing the uncertainty in the meteorological forcings allows the analysis to focus on the uncertainty corresponding to the land surface.

3.2 Data assimilation method comparison

Comparing the performance of different configurations of OPTIMISTS can shed light on the adequacy of individual strategies utilized by traditional Bayesian and variational methods. For example, producing all particles with the optimization algorithms (p_samp = 0), setting long values for Δt, and utilizing a traditional two-term cost function like that in Eq. (1) makes the method behave somewhat like a strongly constrained 4D-Var approach, while sampling all particles from the source state distribution (p_samp = 1), setting Δt equal to the model time step, and using a single likelihood objective involving the observation error would resemble a PF. Herein we also compare OPTIMISTS with a traditional PF on both model applications. Since the forcing is assumed to be deterministic, the implemented PF uses Gaussian perturbation of resampled particles to avoid degeneration (Pham, 2001). Resampling is executed such that the probability of duplicating a particle is proportional to its weight (Moradkhani et al., 2012).

Additionally, the comparison is performed using a continuous forecasting experiment set-up instead of a scenario-based one. In this continuous test, forecasts are performed every time step and compiled in series for different forecast lead times that span several months. Forecast lead times are of 1, 3, 6, and 12 days for the Blue River and of 6 h and 1, 4, and 16 days for the Indiantown Run. Before each forecast, both OPTIMISTS and the PF assimilate streamflow observations for the assimilation time step of each algorithm (daily for the PF). The assimilation is performed cumulatively, meaning that the initial state distribution S^t was produced by assimilating all the records available since the beginning of the experiment until time t. The forecasted streamflow series are then compared to the actual measurements to evaluate their quality using deterministic metrics (NSE_ℓ2, NSE_ℓ1, and MARE) and two probabilistic ones: the ensemble-based continuous ranked probability score (CRPS) (Bröcker, 2012), which is computed for each time step and then averaged over the entire duration of the forecast; and the average normalized probability density p of the observed streamflow q_obs given the distribution of the forecasted ensemble q_forecast,

p(q_obs | q_forecast) = [ Σ_{i=1..n} w_i · (2π·b²)^(−1/2) · exp(−(q_obs − q_i)² / (2·b²)) ] / Σ_{i=1..n} w_i,    (11)

where the forecasted streamflow q_forecast is composed of values q_i for each particle i, with accompanying weights w_i, and b is the bandwidth of the univariate kernel density estimate. The bandwidth b can be obtained by utilizing Silverman's rule of thumb (Silverman, 1986). The probability p is computed every time step, then normalized by multiplying by the standard deviation of the estimate, and then averaged over all time steps. As opposed to the CRPS, which can only give an idea of the bias of the estimate, the density p can detect both bias and under- or overconfidence: high values of the density indicate that the ensemble is producing narrow estimates around the true value, while low values indicate either that the stochastic estimate is spread too thin or that it is centred far away from the true value.
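Eq. (11) is simply a univariate weighted kernel density evaluated at the observation. A direct transcription (without the subsequent per-step normalization by the ensemble standard deviation):

```python
import math

def forecast_density(q_obs, q_ens, weights, b):
    """Kernel density of the observed streamflow under the weighted
    forecast ensemble (Eq. 11), with bandwidth b."""
    num = sum(w * math.exp(-(q_obs - q) ** 2 / (2.0 * b * b))
              for w, q in zip(weights, q_ens))
    return num / (math.sqrt(2.0 * math.pi) * b * sum(weights))
```

A tight ensemble centred on the observation yields a large density, while a spread-out or biased ensemble yields a small one, which is what allows this metric to flag both bias and under- or overconfidence.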

4 Results and discussion

This section summarizes the forecasting results obtained from the three scenario-based experiments and the continuous forecasting experiments on the Blue River and the Indiantown Run model applications. The scenario-based experiments were performed to explore the effects of multiple parameterizations of OPTIMISTS, and the performance was analysed as follows. The model was run for the duration of the forecast period (2 weeks) using the state configuration encoded in each root state s_i of the distribution S obtained at the end of the assimilation period, for each configuration of OPTIMISTS and each scenario. We then computed the mean streamflow time series for each case by averaging the model results for each particle P_i (the average was weighted based on the corresponding weights w_i). With this averaged streamflow series, we computed the three performance metrics – the NSE_ℓ2, the NSE_ℓ1, and the MARE – based on the observations from the corresponding stream gauge. The values for each experiment, scenario, and configuration are listed in tables in the Supplement. With these, we computed the change in the forecast performance between each configuration and a control open-loop model run (one without the benefit of assimilating the observations).

4.1 Blue River – low-resolution application

The Supplement includes the performance metrics for all the tested configurations on all scenarios and for all scenario-based experiments. Figure 3 summarizes the results for Experiment 1 with the VIC model application for the Blue River watershed, in which the distributions of the changes in MARE after marginalizing the results for each scenario and each of the parameter assignments are shown. That is, each box (and pair of whiskers) represents the distribution of the change in MARE of all cases in the specified scenario or for which the specified parameter assignment was used. Negative values on the vertical axis indicate that OPTIMISTS decreased the error, while positive values indicate that it increased the error. It can be seen that, on average, OPTIMISTS improves the precision of the forecast in most cases, except for several of the configurations in Scenario 1 (for this scenario the control already produces a good forecast) and when using an assimilation step Δt of 1 day. We performed an analysis of variance (ANOVA) to determine the statistical significance of the difference found for each of the factors indicated on the horizontal axis. While Fig. 3 shows the p values for the main effects, the full ANOVA table for all experiments can be found in the Supplement. From the values in Fig. 3, we can conclude that the assimilation time step, the number of objectives, and the use of optimization algorithms are all statistically significant. On the other hand, the number of particles and the use of F-class kernels are not.

A Δt of 5 days produced the best results overall for the tested case, suggesting that there exists a sweet spot that balances the amount of information being assimilated (larger for a long Δt) against the number of state variables to be modified (larger for a small Δt). Based on these results, it is reasonable to assume that the sweet spot depends on the time series of precipitation, the characteristics of the watershed, and the temporal and spatial resolutions of its model application. From this perspective, the poor results for a step of 1 day can be explained in terms of overfitting: there are many degrees of freedom but only one value being assimilated per step. Evaluating particles in the light of two objectives, one minimizing departures from the observations and the other maximizing the likelihood of the source state, resulted in statistically significant improvements compared to using the first objective alone. Additionally, the data suggest that not executing the optional optimization step of the algorithm (optimization = false), and instead relying only on particles sampled from the prior or source distribution, is also beneficial. These two results reinforce the idea that maintaining consistency with the state history to some extent is of paramount importance, perhaps to the point where the strategies used in Bayesian filters and variational methods are insufficient in isolation. Indeed, the best performance was observed only when sampling was limited to generating particles from the prior state distribution and the particles were evaluated for their consistency with that distribution.

Figure 3. Box plots of the changes in forecasting error (MARE) achieved while using OPTIMISTS on Experiment 1 (Blue River). Changes are relative to an open-loop control run where no assimilation was performed. Each column corresponds to the distribution of the error changes in the specified scenario or assignment to the indicated parameter. Positive values indicate that OPTIMISTS increased the error, while negative values indicate it decreased the error. Outliers are noted as asterisks and values were limited to 100 %. For the one-objective case, the particles' MAE was to be minimized; for the two-objective case, the likelihood given the background was to be maximized in addition. No optimization ("false") corresponds to psamp = 1.0 (i.e. all samples are obtained from the prior distribution); "true" corresponds to psamp = 0.25. The p values were determined using ANOVA (Montgomery, 2012) and indicate the probability that the differences in means corresponding to boxes of the same colour are produced by chance (e.g. values close to zero indicate certainty that the parameter effectively affects the forecast error).
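The two-criterion evaluation described above can be sketched as follows: each candidate particle is scored on its misfit against the observations and on its (negative log) likelihood under the prior state distribution, and the non-dominated (Pareto) particles are then extracted. Function names and signatures here are illustrative assumptions, not OPTIMISTS' actual (Java) API.

```python
import numpy as np

def objectives(particle, simulate, observations, log_prior_density):
    """Score a candidate particle on the two criteria discussed above.

    Returns (obs_error, neg_log_likelihood); both are to be minimized.
    `simulate` and `log_prior_density` are hypothetical stand-ins for the
    model run over the assimilation window and the prior (source) density.
    """
    predicted = simulate(particle)                         # run model over the window
    obs_error = np.mean(np.abs(predicted - observations))  # e.g. MAE vs. observations
    neg_log_like = -log_prior_density(particle)            # consistency with the prior
    return obs_error, neg_log_like

def pareto_front(scores):
    """Indices of non-dominated particles (all objectives minimized)."""
    front = []
    for i, s in enumerate(scores):
        dominated = any(
            all(o <= so for o, so in zip(other, s))
            and any(o < so for o, so in zip(other, s))
            for j, other in enumerate(scores) if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

A particle that fits the observations slightly worse but is far more plausible under the prior can thus survive into the target distribution, which is the mechanism behind the consistency gains reported here.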

On the other hand, we found it counterintuitive that neither using a larger particle ensemble nor taking into account state-variable dependencies through the use of F-class kernels led to improved results. In the first case, it could be hypothesized that using too many particles leads to overfitting, since there are more chances of generating particles that happen to match the observations better but for the wrong reasons. In the second case, the non-parametric nature of kernel density estimation could be sufficient for encoding the raw dependencies between variables, especially in low-resolution cases like this one, in which significant correlations between variables in adjacent cells are not expected to be too high. Both results deserve further investigation, especially concerning the impact of D- vs. F-class kernels in high-dimensional models.
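The D- vs. F-class distinction can be illustrated with a minimal Gaussian-KDE sampler: both variants place a kernel on each particle, but the diagonal (D-class) bandwidth ignores cross-variable covariances while the full (F-class) bandwidth encodes them. The rule-of-thumb bandwidth scaling and the function name are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def kde_sample(particles, n_draws, full_covariance=False):
    """Draw new states from a Gaussian KDE fitted to the particle ensemble.

    full_covariance=False mimics D-class (diagonal-bandwidth) kernels;
    True mimics F-class kernels, which also encode cross-variable
    dependencies at a higher computational cost.
    """
    n, d = particles.shape
    factor = n ** (-1.0 / (d + 4))  # rule-of-thumb bandwidth scaling
    centres = particles[rng.integers(0, n, size=n_draws)]
    if full_covariance:
        cov = np.cov(particles, rowvar=False) * factor ** 2
        noise = rng.multivariate_normal(np.zeros(d), cov, size=n_draws)
    else:
        std = particles.std(axis=0, ddof=1) * factor
        noise = rng.normal(0.0, 1.0, size=(n_draws, d)) * std
    return centres + noise
```

For a model with tens of thousands of state variables, the F-class covariance matrix alone becomes the dominant cost, which motivates the efficiency argument made later in Sect. 4.4.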

Interestingly, the ANOVA also yielded small p values for several high-order interactions (see the ANOVA table in the Supplement). This means that, unlike the general case for factorial experiments as characterized by the sparsity-of-effects principle (Montgomery et al., 2009), specific combinations of multiple parameters have a large effect on the forecasting skill of the model. There are significant interactions (with p smaller than 0.05) between the following groups of factors: objectives and Δt (p = 0.001); n and kF-class (p = 0.039); Δt and the use of optimization (p = 0.000); the use of optimization and kF-class (p = 0.029); the objectives, Δt, and the use of optimization (p = 0.043); n, Δt, and kF-class (p = 0.020); n, the use of optimization, and kF-class (p = 0.013); and n, Δt, the use of optimizers, and kF-class (p = 0.006). These interactions show that, for example, (i) using a single objective is especially inadequate when the time step is 1 day or when optimization is used; (ii) employing optimization is only significantly detrimental when Δt is 1 day, probably because of intensified overfitting; and (iii) choosing F-class kernels leads to higher errors when Δt is small, when n is large, and when the optimizers are being used.
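A much-simplified version of such a factor-significance test is sketched below: a one-way ANOVA over forecast-error changes grouped by one factor level (the paper uses a full factorial ANOVA; the numbers here are synthetic and only the analysis pattern mirrors the text).

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Hypothetical MARE changes (%) grouped by assimilation time step Δt;
# the values are synthetic, chosen only to illustrate the test.
errors_dt1 = rng.normal(20, 10, 16)    # Δt = 1 day: worse (overfitting)
errors_dt5 = rng.normal(-15, 10, 16)   # Δt = 5 days: best
errors_dt10 = rng.normal(-5, 10, 16)   # Δt = 10 days: intermediate

f_stat, p_value = f_oneway(errors_dt1, errors_dt5, errors_dt10)
# A p value near zero indicates the factor genuinely affects forecast error.
```

Interaction effects, as reported above, additionally require a multi-way (factorial) ANOVA rather than this one-factor form.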

Based on these results, we recommend the use of both objectives and no optimization as the preferred configuration of OPTIMISTS for the Blue River application. A time step


Table 5. Continuous daily streamflow forecast performance metrics for the Blue River application using OPTIMISTS (Δt = 7 days; three objectives: NSEℓ2, MARE, and likelihood; n = 30; no optimization; and D-class kernels) and a traditional PF (n = 30). The continuous forecast extends from January to June 1997. The NSEℓ2, NSEℓ1, and MARE (deterministic) are computed using the mean streamflow of the forecast ensembles and contrasting it with the daily observations, while the CRPS and the density (probabilistic) are computed taking into account all the members of the forecasted ensemble.

Algorithm         Lead time   NSEℓ2   NSEℓ1   MARE      CRPS (m3 s−1)   Density

OPTIMISTS         1 day       0.497   0.293   51.40 %   7.173           0.061
                  3 days      0.527   0.312   50.16 %   6.959           0.065
                  6 days      0.534   0.315   50.18 %   6.945           0.073
                  12 days     0.516   0.297   51.26 %   7.124           0.078

Particle filter   1 day       0.675   0.522   30.06 %   4.480           0.098
                  3 days      0.623   0.493   33.20 %   4.744           0.113
                  6 days      0.602   0.473   35.79 %   5.000           0.109
                  12 days     0.515   0.432   38.36 %   5.593           0.105
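The scores reported in Table 5 can be computed along the following lines. These are standard formulations (the ensemble CRPS uses the usual pairwise estimator) given as an illustrative sketch rather than the paper's exact code.

```python
import numpy as np

def nse(obs, sim, p=2.0):
    """Nash–Sutcliffe efficiency; p=2 gives NSE_l2, p=1 the l1 variant."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    num = np.sum(np.abs(obs - sim) ** p)
    den = np.sum(np.abs(obs - obs.mean()) ** p)
    return 1.0 - num / den

def mare(obs, sim):
    """Mean absolute relative error of the ensemble-mean forecast."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.mean(np.abs(sim - obs) / obs)

def crps_ensemble(members, y):
    """CRPS of one forecast ensemble against one observation:
    E|X - y| - 0.5 E|X - X'|, averaged over all member pairs."""
    x = np.asarray(members, float)
    return np.mean(np.abs(x - y)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
```

The deterministic columns apply `nse` and `mare` to the ensemble mean, while `crps_ensemble` is averaged over all forecast times to obtain the probabilistic column.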

[Figure 4 plot: precipitation (mm d−1), streamflow (m3 s−1), and soil moisture (mm) time series; legend: Precipitation, Observed, OPTIMISTS, Particle filter.]

Figure 4. Comparison of 6-day lead time probabilistic streamflow (a) and area-averaged soil moisture (b) forecasts between OPTIMISTS (Δt = 7 days; three objectives: NSEℓ2, MARE, and likelihood; n = 30; no optimization; and D-class kernels) and a traditional PF (n = 30) for the Blue River. The dark blue and orange lines indicate the mean of OPTIMISTS' and the PF's ensembles, respectively, while the light blue and light orange bands illustrate the spread of the forecast by highlighting the areas where the probability density of the estimate is at least 50 % of the density at the mode (the maximum) at that time step. The green bands indicate areas where the light blue and light orange bands intersect.

of around 5 days appears to be adequate for this specific model application. Also, without strong evidence for their advantages, we recommend using more particles or kernels of class F only if there is no pressure for computational frugality. However, the number of particles should not be too small, to ensure an appropriate sample size.

Table 5 shows the results of the 5-month-long continuous forecasting experiment on the Blue River using a 30-particle PF and a configuration of OPTIMISTS with a 7-day assimilation time step Δt, three objectives (NSEℓ2, MARE, and the likelihood), 30 particles, no optimization, and D-class kernels. This specific configuration of OPTIMISTS was chosen from a few that were tested with the recommendations above applied. The selected configuration was the one that best balanced the spread and the accuracy of the ensemble, as some configurations had slightly better deterministic performance but larger ensemble spread for dry weather, which led to worse probabilistic performance.

Figure 4 shows the probabilistic streamflow forecasts for both algorithms for a lead time of 6 days. The portrayed


Figure 5. Box plots of the changes in forecasting performance (NSEℓ2, NSEℓ1, and MARE) achieved while using OPTIMISTS on Experiment 2 (Indiantown Run). Changes are relative to an open-loop control run where no assimilation was performed. Each column corresponds to the distribution of the error metric changes in the specified scenario or assignment to the indicated parameter. Outliers are noted as stars and values were constrained to NSEℓ2 ≥ −3, NSEℓ1 ≥ −3, and MARE ≤ 200 %. Positive values indicate improvements for the NSEℓ2 and the NSEℓ1. The meaning for the MARE and for other symbols is the same as those defined in Fig. 3.

evolution of the density, in which the mean does not necessarily correspond to the centre of the ensemble spread, evidences the non-Gaussian nature of both estimates. Both the selected configuration of OPTIMISTS and the PF show relatively good performance for all lead times (1, 3, 6, and 12 days) based on the performance metrics. However, the PF generally outperforms OPTIMISTS.

We offer three possible explanations for this result. First, the relatively low dimensionality of this test case does not allow OPTIMISTS to showcase its real strength, especially since the large scale of the watershed does not allow for tight spatial interactions between state variables. Second, OPTIMISTS can find solutions based on multiple objectives rather than a single one, which could be advantageous when multiple types of observations are available (e.g. of streamflow, evapotranspiration, and soil moisture). The solutions are thus likely not the best for each individual objective; instead, the algorithm balances their overall behaviour across the multiple objectives. Due to the lack of observations of multiple variables, only streamflow observations are used in these experiments, even though more than one objective is used. Since these objectives are, to a large extent, consistent with each other for the studied watershed, the strengths of using multiple objectives within the Pareto approach in OPTIMISTS cannot be fully evidenced. Third, additional efforts might be needed to find a configuration of the algorithm, together with a set of objectives, that best suits the specific conditions of the tested watershed.

While PFs remain easier to use out of the box because of their ease of configuration, the fact that adjusting the parameters of OPTIMISTS allowed us to trade off deterministic and probabilistic accuracy points to the adaptability potential of the algorithm. This allows for probing the spectrum between exploration and exploitation of candidate particles, which usually leads to higher and lower diversity of the ensemble, respectively.

4.2 Indiantown Run – high-resolution application

Figure 5 summarizes the changes in performance when using OPTIMISTS in Experiment 2. In this case, the more uniform forcing and streamflow conditions of the two scenarios allowed us to statistically analyse all three performance metrics. For Scenario 1, we can see that OPTIMISTS produces a general increase in the Nash–Sutcliffe coefficients but a decline in the MARE, evidencing tension between fitting the peaks and the inter-peak periods simultaneously. For both scenarios there are configurations that performed very poorly, and we can look at the marginalized results in the box plots for clues as to which parameters might have caused this. Similar to the Blue River case, the use of a 1 h time step significantly reduced the forecast skill, while the longer step almost always improved it; the inclusion of the secondary history-consistent objective (two objectives) also resulted in improved performance. Not only does it seem that for this watershed the secondary objective mitigated the effects of overfitting, but it was also interesting to note some configurations in which using it actually helped to achieve a better fit during the assimilation period.

Figure 6. Box plots of the changes in forecasting performance (NSEℓ2, NSEℓ1, and MARE) achieved while using OPTIMISTS on Experiment 3 (Indiantown Run). Changes are relative to an open-loop control run where no assimilation was performed. Each column corresponds to the distribution of the error metric changes in the specified scenario or assignment to the indicated parameter. Positive values indicate improvements for the NSEℓ2 and the NSEℓ1. See the caption of Fig. 3 for more information.

While the ANOVA also provided evidence against the use of optimization algorithms, we are reluctant to rule them out immediately on the grounds that there were statistically significant interactions with other parameters (see the ANOVA table in the Supplement). The optimizers led to poor results in cases with 1 h time steps or when only the first objective was used. Other statistically significant results point to the benefits of using the root samples more intensively (as opposed to using random samples) and, to a lesser extent, to the benefits of maintaining an ensemble of moderate size.

Figure 6 shows the summarized changes in Experiment 3, where the effect of the time step Δt is explored in greater detail. Once again, there appears to be evidence favouring the hypothesis that there exists a sweet spot, and in this case it appears to be close to the 2-week mark: both shorter and longer time steps led to considerably poorer performance. In this experiment, with all configurations using both optimization objectives, we can see that there are no clear disadvantages of using optimization algorithms (but also no advantages). Experiment 3 also shows that the effect of the greed parameter g is not very significant. That is, selecting some particles from dominated fronts to construct the target state distribution, and not only from the Pareto front, does not seem to affect the results.

Table 6 and Fig. 7 show the results from comparing continuous forecasts from the PF and from a configuration of OPTIMISTS with a time step of 1 week, two objectives, 50 particles, and no optimization. Both algorithms display overconfidence in their estimations, which is evidenced in Fig. 7 by the bias and narrowness of the ensembles' spread. It is possible that a more realistic incorporation of uncertainties pertaining to model parameters and forcings (which, as mentioned, are trivialized in these tests) would help to compensate for the overconfidence. For the time being, these experiments help characterize the performance of OPTIMISTS in contrast with the PF, as both algorithms are deployed under the same circumstances. In this sense, while the forecasts obtained using the PF show slightly better results for lead times of 6 h and 1 day, OPTIMISTS shows a better characterization of the ensemble's uncertainty for the longer lead times.

OPTIMISTS' improved results in the high-resolution test case over those in the low-resolution one suggest that the strengths of the hybrid method might become more apparent as the dimensionality, and therefore the difficulty, of the


Table 6. Continuous hourly streamflow forecast performance metrics for the Indiantown Run application using OPTIMISTS (Δt = 7 days, two objectives, n = 50, no optimization, and D-class kernels) and a traditional PF (n = 50). The continuous forecast extends from September to December 2009. The NSEℓ2, NSEℓ1, and MARE (deterministic) are computed using the mean streamflow of the forecast ensembles and contrasting it with the daily observations, while the CRPS and the density (probabilistic) are computed taking into account all the members of the forecasted ensemble.

Algorithm         Lead time   NSEℓ2   NSEℓ1   MARE      CRPS (L s−1)   Density

OPTIMISTS         6 h         0.574   0.316   32.25 %    97.27         0.016
                  1 day       0.609   0.340   31.42 %    93.92         0.013
                  4 days      0.573   0.316   32.20 %    97.19         0.025
                  16 days     0.521   0.272   33.90 %   103.51         0.013

Particle filter   6 h         0.660   0.480   26.87 %    79.61         0.061
                  1 day       0.639   0.464   26.68 %    82.75         0.051
                  4 days      0.558   0.401   27.42 %    93.20         0.021
                  16 days     0.520   0.346   28.75 %   102.37         0.010

Figure 7. Comparison of 4-day lead time probabilistic streamflow (a) and area-averaged soil moisture (b) forecasts between OPTIMISTS (Δt = 7 days, two objectives, n = 50, no optimization, and D-class kernels) and a traditional PF (n = 50) for the Indiantown Run. The dark blue and orange lines indicate the mean of OPTIMISTS' and the PF's ensembles, respectively, while the light blue and light orange bands illustrate the spread of the forecast by highlighting the areas where the probability density of the estimate is at least 50 % of the density at the mode (the maximum) at that time step. The green bands indicate areas where the light blue and light orange bands intersect. Layer 2 of the soil corresponds to 100 to 250 mm depths.

assimilation problem increases. However, while OPTIMISTS was able to produce results comparable to those of the PF, it was not able to provide definite advantages in terms of accuracy. As suggested before, additional efforts might be needed to find the configurations of OPTIMISTS that better match the characteristics of the individual case studies and, as with the Blue River, the limitation related to the lack of observations of multiple variables also applies here. Moreover, the implemented version of the PF did not present the particle degeneracy or impoverishment problems usually associated with these filters when dealing with high dimensionality, which also prompts further investigation.

4.3 Computational performance

It is worth noting that the longer the assimilation time step, the faster the entire process is. This occurs because, even though the number of hydrological calculations is the same in the end, for every assimilation time step the model files need


to be generated accordingly, then accessed, and finally the result files written and accessed. This whole process takes a considerable amount of time. Therefore, everything else being constant, sequential assimilation (as with PFs) automatically imposes additional computational requirements. In our tests we used RAM drive software to accelerate the process of running the models sequentially and, even then, the overhead imposed by OPTIMISTS was consistently below 10 % of the total computation time. Most of the computational effort remained with running the model, both for VIC and the DHSVM. In this sense, model developers may consider allowing their engines to receive input data from main memory, if possible, to facilitate data assimilation and other similar processes.

4.4 Recommendations for configuring OPTIMISTS

Finally, here we summarize the recommended choices for the parameters in OPTIMISTS based on the results of the experiments. In the first place, given their low observed effect, default values can be used for g (around 0.5). A wroot higher than 90 % was found to be advantageous. The execution of the optimization step (psamp < 1) was, on the other hand, not found to be advantageous and, therefore, we consider it a cleaner approach to simply generate all samples from the initial distribution. Similarly, while not found to be advantageous in terms of accuracy, diagonal bandwidth (D-class) kernels were not found to be disadvantageous either; since they provide a significant improvement in computational efficiency, they are recommended for the time being. Future work will be conducted to further explore the effect of the bandwidth configuration in OPTIMISTS.
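These recommendations can be collected into a hypothetical default configuration. The parameter names below are illustrative only; the actual OPTIMISTS implementation is in Java and its API may differ.

```python
# Hypothetical defaults reflecting the recommendations above;
# names are illustrative, not the actual OPTIMISTS (Java) API.
DEFAULT_CONFIG = {
    "greed": 0.5,         # g: low observed effect, default is fine
    "w_root": 0.95,       # weight of root samples; > 90 % was advantageous
    "p_sample": 1.0,      # draw all particles from the prior (no optimization step)
    "kernel_class": "D",  # diagonal bandwidth: cheaper, no observed accuracy penalty
    "objectives": ["obs_error", "prior_likelihood"],  # evaluate both criteria
    # Δt and n are case-specific: several values should be tried per application.
    "delta_t_days": None,
    "n_particles": None,
}

def validate_config(cfg):
    """Basic sanity checks for the illustrative configuration above."""
    assert 0.0 <= cfg["greed"] <= 1.0
    assert 0.0 < cfg["p_sample"] <= 1.0
    assert cfg["kernel_class"] in ("D", "F")
    return True
```

The two deliberately unset entries (Δt and n) reflect the empirical selection advocated in the following paragraphs.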

Even though only two objective functions were tested, one measuring the departures from the observations being assimilated and another measuring the compatibility of initial samples with the initial distribution, the results clearly show that it is beneficial to simultaneously evaluate candidate particles using both criteria. While traditional cost functions like the one in Eq. (1) do indeed consider both aspects, we argue that using multiple objectives has the added benefit of enriching the diversity of the particle ensemble and, ultimately, the resulting probabilistic estimate of the target states.

Our results demonstrated that the assimilation time step is the most sensitive parameter and, therefore, its selection must be done with the greatest care. Taking the results together, we recommend that multiple choices be tried for any new case study, looking to strike a balance between the amount of information being assimilated and the number of degrees of freedom. This empirical selection should also be performed with a rough sense of the range of forecasting lead times that is considered the most important. Lastly, more work is required to provide guidelines for selecting the number of particles n to be used. While the literature suggests that more particles should increase forecast accuracy, our tests did not support this conclusion. We tentatively recommend trying different ensemble sizes based on the computational resources available and selecting the one that offers the best observed trade-off between accuracy and efficiency.

5 Conclusions and future work

In this article we introduced OPTIMISTS, a flexible, model-independent data assimilation algorithm that effectively combines the signature elements of Bayesian and variational methods. By employing essential features from particle filters, it produces probabilistic, non-Gaussian estimates of state variables through the filtering of a set of particles drawn from a prior distribution to better match the available observations. Adding critical features from variational methods, OPTIMISTS grants its users the option of exploring the state space using optimization techniques and of evaluating candidate states through a time window of arbitrary length. The algorithm fuses a multi-objective or Pareto analysis of candidate particles with kernel density probability distributions to effectively bridge the gap between the probabilistic and the variational perspectives. Moreover, the use of evolutionary optimization algorithms enables its efficient application to highly non-linear models such as those usually found in the geosciences. This unique combination of features represents a clear differentiation from the existing hybrid assimilation methods in the literature (Bannister, 2016), which are limited to Gaussian distributions and linear dynamics.

We conducted a set of hydrologic forecasting factorial experiments on two watersheds, the Blue River with 812 state variables and the Indiantown Run with 33 455, at two distinct modelling resolutions using two different modelling engines: VIC and the DHSVM, respectively. Capitalizing on the flexible configurations available in OPTIMISTS, these tests allowed us to determine which individual characteristics of traditional algorithms prove to be the most advantageous for forecasting applications. For example, while there is a general consensus in the literature favouring extended time steps (4-D) over sequential ones (1-D–3-D), the results from assimilating streamflow data in our experiments suggest that there is an ideal duration of the assimilation time step that depends on the case study under consideration, on the spatiotemporal resolution of the corresponding model application, and on the desired forecast length. Sequential time steps not only required considerably longer computational times but also produced the worst results, perhaps given the overwhelming number of degrees of freedom in contrast with the scarce observations available. Similarly, there was a drop in the performance of the forecast ensemble when the algorithm was set to use overly long time steps.

Procuring the consistency of candidate particles, not only with the observations but also with the state history, led to significant gains in predictive skill. OPTIMISTS can be configured both to perform Bayesian sampling and to find Pareto-optimal particles that trade off deviations from the observations and from the prior conditions. This Bayesian and


multi-objective formulation of the optimization problem was especially beneficial for the high-resolution watershed application, as it allows the model to overcome the risk of overfitting generated by the enlarged effect of equifinality.

On the other hand, our experiments did not produceenough evidence to recommend either exploring the statespace with optimization algorithms instead of doing so withsimple probabilistic sampling, the use of a larger number ofparticles above the established baseline of 100, or the com-putationally intensive utilization of full covariance matricesto encode the dependencies between variables in the kernel-based state distributions. Nevertheless, strong interactionsbetween several of these parameters suggest that some spe-cific combinations could potentially yield strong outcomes.Together with OPTIMISTS’ observed high level of sensitiv-ity to the parameters, these results indicate that there couldbe promise in the implementation of self-adaptive strategies(Karafotias et al., 2014) to assist in their selection in the fu-ture. With these experiments, we were able to configure thealgorithm to consistently improve the forecasting skill of themodels compared to control open-loop runs. Additionally,comparative tests showed that OPTIMISTS was able to re-liably produce adequate forecasts that were comparable tothose resulting from assimilating the observations with a par-ticle filter in the high-resolution application. While not be-ing able to provide consistent accuracy advantages over theimplemented particle filter, OPTIMISTS does offer consid-erable gains in computational efficiency given its ability toanalyse multiple model time steps each time.

Moreover, in this article we offered several alternatives for the implementation of the components of OPTIMISTS wherever there were tensions between prediction accuracy and computational efficiency. In the future, we will focus on incorporating additional successful ideas from diverse assimilation algorithms and on improving components in such a way that both of these goals are attained with ever-smaller compromises. For instance, the estimation of initial states should not be overburdened with the responsibility of compensating for structural and calibration deficiencies in the model. In this sense, we embrace the vision of a unified framework for the joint probabilistic estimation of structures, parameters, and state variables (Liu and Gupta, 2007), where it is important to address challenges associated with approaches that would increase the indeterminacy of the problem by adding unknowns without providing additional information or additional means of relating existing variables. We expect that with continued efforts OPTIMISTS will be a worthy candidate framework to be deployed in operational settings for hydrologic prediction and beyond.

Code and data availability. All the data utilized to construct the models are publicly available through the internet from the corresponding US government agencies' websites. The Java implementations of OPTIMISTS and of the particle filter are available through GitHub (2018) (https://github.com/felherc/). These sources include all the information needed to replicate the experiments in this article.

Supplement. The supplement related to this article is available online at: https://doi.org/10.5194/hess-22-5759-2018-supplement.

Author contributions. FH designed and implemented OPTIMISTS, implemented the particle filter, performed the experiments, and drafted the manuscript. XL identified problems for study; provided guidance; supervised the investigation, including experiment design; and finalized the manuscript.

Competing interests. The authors declare that they have no conflict of interest.

Acknowledgements. The authors are thankful to the two referees and the editor for their valuable comments and suggestions. This work was supported in part by the United States Department of Transportation through award no. OASRTRS-14-H-PIT to the University of Pittsburgh and by the William Kepler Whiteford Professorship from the University of Pittsburgh.

Edited by: Dimitri Solomatine
Reviewed by: Maurizio Mazzoleni and one anonymous referee

References

Adams, R. M., Houston, L. L., McCarl, B. A., Tiscareño, M. L., Matus, J. G., and Weiher, R. F.: The benefits to Mexican agriculture of an El Niño-southern oscillation (ENSO) early warning system, Agr. Forest Meteorol., 115, 183–194, https://doi.org/10.1016/S0168-1923(02)00201-0, 2003.

Andreadis, K. M. and Lettenmaier, D. P.: Assimilating remotely sensed snow observations into a macroscale hydrology model, Adv. Water Resour., 29, 872–886, https://doi.org/10.1016/j.advwatres.2005.08.004, 2006.

Bannister, R. N.: A review of forecast error covariance statistics in atmospheric variational data assimilation. II: Modelling the forecast error covariance statistics, Q. J. Roy. Meteorol. Soc., 134, 1971–1996, https://doi.org/10.1002/qj.340, 2008.

Bannister, R. N.: A review of operational methods of variational and ensemble-variational data assimilation, Q. J. Roy. Meteorol. Soc., 29, 1–29, https://doi.org/10.1002/QJ.2982, 2016.

Beven, K.: A manifesto for the equifinality thesis, J. Hydrol., 320, 18–36, https://doi.org/10.1016/j.jhydrol.2005.07.007, 2006.

Bröcker, J.: Evaluating raw ensembles with the continuous ranked probability score, Q. J. Roy. Meteorol. Soc., 138, 1611–1617, https://doi.org/10.1002/qj.1891, 2012.

Buehner, M., Houtekamer, P. L., Charette, C., Mitchell, H. L., and He, B.: Intercomparison of Variational Data Assimilation and the Ensemble Kalman Filter for Global Deterministic NWP. Part II: One-Month Experiments with Real Observations, Mon. Weather Rev., 138, 1567–1586, https://doi.org/10.1175/2009MWR3158.1, 2010.

Carpenter, J., Clifford, P., and Fearnhead, P.: Improved particle filter for nonlinear problems, IEE Proc.-Radar Sonar Navig., 146, 2–7, https://doi.org/10.1049/ip-rsn:19990255, 1999.

Clark, M. P., Rupp, D. E., Woods, R. A., Zheng, X., Ibbitt, R. P., Slater, A. G., Schmidt, J., and Uddstrom, M. J.: Hydrological data assimilation with the ensemble Kalman filter: Use of streamflow observations to update states in a distributed hydrological model, Adv. Water Resour., 31, 1309–1324, https://doi.org/10.1016/j.advwatres.2008.06.005, 2008.

Clark, M. P., Bierkens, M. F. P., Samaniego, L., Woods, R. A., Uijlenhoet, R., Bennett, K. E., Pauwels, V. R. N., Cai, X., Wood, A. W., and Peters-Lidard, C. D.: The evolution of process-based hydrologic models: Historical challenges and the collective quest for physical realism, Hydrol. Earth Syst. Sci., 21, 3427–3440, https://doi.org/10.5194/hess-21-3427-2017, 2017.

Cosgrove, B. A., Lohmann, D., Mitchell, K. E., Houser, P. R., Wood, E. F., Schaake, J. C., Robock, A., Marshall, C., Sheffield, J., Duan, Q., Luo, L., Higgins, R. W., Pinker, R. T., Tarpley, J. D., and Meng, J.: Real-time and retrospective forcing in the North American Land Data Assimilation System (NLDAS) project, J. Geophys. Res.-Atmos., 108, 1–12, https://doi.org/10.1029/2002JD003118, 2003.

Crainic, T. G. and Toulouse, M.: Parallel Meta-heuristics, in: Handbook of Metaheuristics, vol. 146, edited by: Gendreau, M. and Potvin, J.-Y., Springer US, 497–541, 2010.

Deb, K.: Multi-objective Optimization, in: Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques, edited by: Burke, E. K. and Kendall, G., Springer US, 403–449, 2014.

Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput., 6, 182–197, https://doi.org/10.1109/4235.996017, 2002.

Desroziers, G., Camino, J. T., and Berre, L.: 4DEnVar: Link with 4D state formulation of variational assimilation and different possible implementations, Q. J. Roy. Meteorol. Soc., 140, 2097–2110, https://doi.org/10.1002/qj.2325, 2014.

Dumedah, G. and Coulibaly, P.: Evolutionary assimilation ofstreamflow in distributed hydrologic modeling using in-situ soil moisture data, Adv. Water Resour., 53, 231–241,https://doi.org/10.1016/j.advwatres.2012.07.012, 2013.

Dumedah, G., Berg, A. A., and Wineberg, M.: An IntegratedFramework for a Joint Assimilation of Brightness Temper-ature and Soil Moisture Using the Nondominated SortingGenetic Algorithm II, J. Hydrometeorol., 12, 1596–1609,https://doi.org/10.1175/JHM-D-10-05029.1, 2011.

Duong, T. and Hazelton, M. L.: Cross-validation bandwidth ma-trices for multivariate kernel density estimation, Scand. J. Stat.,32, 485–506, https://doi.org/10.1111/j.1467-9469.2005.00445.x,2005.

Efstratiadis, A. and Koutsoyiannis, D.: One decade ofmulti-objective calibration approaches in hydrologi-cal modelling: a review, Hydrolog. Sci. J., 55, 58–78,https://doi.org/10.1080/02626660903526292, 2010.

Errico, R. M.: What Is an Adjoint Model?, B. Am. Me-teorol. Soc., 78, 2577–2591, https://doi.org/10.1175/1520-0477(1997)078<2577:WIAAM>2.0.CO;2, 1997.

Evensen, G.: Data assimilation: the ensemble Kalman filter,Springer Science & Business Media, 2009.

Evensen, G. and van Leeuwen, P. J.: An ensembleKalman smoother for nonlinear dynamics, Mon. WeatherRev., 128, 1852–1867, https://doi.org/10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2, 2000.

Fisher, M.: Background error covariance modelling,Semin. Recent Dev. Data Assim., 45–63, available at:https://www.ecmwf.int/sites/default/files/elibrary/2003/9404-background-error-covariance-modelling.pdf (last ac-cess: 29 October 2018), 2003.

Friedman, J., Hastie, T., and Tibshirani, R.: Sparse inverse covari-ance estimation with the graphical lasso, Biostatistics, 9, 432–441, https://doi.org/10.1093/biostatistics/kxm045, 2008.

Gauthier, P., Tanguay, M., Laroche, S., Pellerin, S., and Morneau, J.:Extension of 3DVAR to 4DVAR: Implementation of 4DVAR atthe Meteorological Service of Canada, Mon. Weather Rev., 135,2339–2354, https://doi.org/10.1175/MWR3394.1, 2007.

Ghil, M. and Malanotte-Rizzoli, P.: Data assimilation in me-teorology and oceanography, Adv. Geophys., 33, 141–266,https://doi.org/10.1016/S0065-2687(08)60442-2, 1991.

Ghorbanidehno, H., Kokkinaki, A., Li, J. Y., Darve, E. and Kitani-dis, P. K.: Real-time data assimilation for large-scale systems:The spectral Kalman filter, Adv. Water Resour., 86, 260–272,https://doi.org/10.1016/j.advwatres.2015.07.017, 2015.

GitHub: felherc, available at: https://github.com/felherc/, last ac-cess: 1 October 2018.

Gordon, N. J., Salmond, D. J., and Smith, A. F. M.: Novelapproach to nonlinear/non-Gaussian Bayesian state estima-tion, IEEE Proc. F Radar Signal Process., 140, 107–113,https://doi.org/10.1049/ip-f-2.1993.0015, 1993.

Guo, J., Liang, X., and Leung, L. R.: A new multiscale flow networkgeneration scheme for land surface models, Geophys. Res. Lett.,31, 1–4, https://doi.org/10.1029/2004GL021381, 2004.

Haario, H., Saksman, E., and Tamminen, J.: An Adap-tive Metropolis Algorithm, Bernoulli, 7, 223–242,https://doi.org/10.2307/3318737, 2001.

Hawkins, D. M.: The Problem of Overfitting, J. Chem. Inf. Comput.Sci., 44, 1–12, https://doi.org/10.1021/ci0342472, 2004.

Hazelton, M. L.: Variable kernel density estimation, Aust. N. Z.J. Stat., 45, 271–284, https://doi.org/10.1111/1467-842X.00283,2003.

Homer, C., Fry, J., and Barnes, C.: The National Land CoverDatabase, US Geol. Surv. Fact Sheet, 3020, 1–4, availableat: http://pubs.usgs.gov/fs/2012/3020/ (last access: 22 Octo-ber 2018), 2012.

Houser, P. R., Shuttleworth, W. J., Famiglietti, J. S., Gupta, H. V.,Syed, K. H., and Goodrich, D. C.: Integration of soil moistureremote sensing and hydrologic modeling using data assimilation,Water Resour. Res., 34, 3405–3420, 1998.

Karafotias, G., Hoogendoorn, M., and Eiben, A. E.: Pa-rameter Control in Evolutionary Algorithms: Trends andChallenges, IEEE Trans. Evol. Comput., 2, 167–187,https://doi.org/10.1109/TEVC.2014.2308294, 2014.

Koster, R. D., Betts, A. K., Dirmeyer, P. A., Bierkens, M., Bennett, K. E., Déry, S. J., Evans, J. P., Fu, R., Hernández, F., Leung, L. R., Liang, X., Masood, M., Savenije, H., Wang, G., and Yuan, X.: Hydroclimatic variability and predictability: a survey of recent research, Hydrol. Earth Syst. Sci., 21, 3777–3798, https://doi.org/10.5194/hess-21-3777-2017, 2017.

Krause, P., Boyle, D. P., and Bäse, F.: Comparison of different efficiency criteria for hydrological model assessment, Adv. Geosci., 5, 89–97, https://doi.org/10.5194/adgeo-5-89-2005, 2005.

Krishnamoorthy, A. and Menon, D.: Matrix Inversion Using Cholesky Decomposition, CoRR, 10–12, available at: http://arxiv.org/abs/1111.4144 (last access: 1 October 2018), 2011.

Li, J. Y., Kokkinaki, A., Ghorbanidehno, H., Darve, E. F., and Kitanidis, P. K.: The compressed state Kalman filter for nonlinear state estimation: Application to large-scale reservoir monitoring, Water Resour. Res., 51, 9942–9963, https://doi.org/10.1002/2015WR017203, 2015.

Liang, X. and Xie, Z.: A new surface runoff parameterization with subgrid-scale soil heterogeneity for land surface models, Adv. Water Resour., 24, 1173–1193, 2001.

Liang, X. and Xie, Z.: Important factors in land–atmosphere interactions: surface runoff generations and interactions between surface and groundwater, Global Planet. Change, 38, 101–114, 2003.

Liang, X., Lettenmaier, D. P., Wood, E. F., and Burges, S. J.: A simple hydrologically based model of land surface water and energy fluxes for general circulation models, J. Geophys. Res., 99, 14415, https://doi.org/10.1029/94JD00483, 1994.

Liang, X., Lettenmaier, D. P., and Wood, E. F.: One-dimensional statistical dynamic representation of subgrid spatial variability of precipitation in the two-layer variable infiltration capacity model, J. Geophys. Res.-Atmos., 101, 21403–21422, 1996a.

Liang, X., Wood, E. F., and Lettenmaier, D. P.: Surface soil moisture parameterization of the VIC-2L model: Evaluation and modification, Global Planet. Change, 13, 195–206, 1996b.

Liu, J. S. and Chen, R.: Sequential Monte Carlo Methods for Dynamic Systems, J. Am. Stat. Assoc., 93, 1032–1044, https://doi.org/10.2307/2669847, 1998.

Liu, Y. and Gupta, H. V.: Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework, Water Resour. Res., 43, 1–18, https://doi.org/10.1029/2006WR005756, 2007.

Lohmann, D., Rashke, E., Nijssen, B., and Lettenmaier, D. P.: Regional scale hydrology: I. Formulation of the VIC-2L model coupled to a routing model, Hydrolog. Sci. J., 43, 131–141, https://doi.org/10.1080/02626669809492107, 1998.

Lorenc, A. C., Bowler, N. E., Clayton, A. M., Pring, S. R., and Fairbairn, D.: Comparison of Hybrid-4DEnVar and Hybrid-4DVar Data Assimilation Methods for Global NWP, Mon. Weather Rev., 143, 212–229, https://doi.org/10.1175/MWR-D-14-00195.1, 2015.

Mahalanobis, P. C.: On the generalized distance in statistics, Proc. Natl. Inst. Sci., 2, 49–55, 1936.

Miller, D. A. and White, R. A.: A Conterminous United States Multilayer Soil Characteristics Dataset for Regional Climate and Hydrology Modeling, Earth Interact., 2, 1–26, https://doi.org/10.1175/1087-3562(1998)002<0002:CUSMS>2.0.CO;2, 1998.

Molteni, F., Buizza, R., Palmer, T. N., and Petroliagis, T.: The ECMWF ensemble prediction system: Methodology and validation, Q. J. Roy. Meteorol. Soc., 122, 73–119, https://doi.org/10.1002/qj.49712252905, 1996.

Montgomery, D. C.: Design and analysis of experiments, 8th Edn., John Wiley & Sons, 2012.

Montgomery, D. C., Runger, G. C., and Hubele, N. F.: Engineering statistics, John Wiley & Sons, USA, 2009.

Montzka, C., Pauwels, V. R. N., Franssen, H.-J. H., Han, X., and Vereecken, H.: Multivariate and Multiscale Data Assimilation in Terrestrial Systems: A Review, Sensors, 12, 16291–16333, https://doi.org/10.3390/s121216291, 2012.

Moradkhani, H., DeChant, C. M., and Sorooshian, S.: Evolution of ensemble data assimilation for uncertainty quantification using the particle filter – Markov chain Monte Carlo method, Water Resour. Res., 48, 1–13, https://doi.org/10.1029/2012WR012144, 2012.

Ning, L., Carli, F. P., Ebtehaj, A. M., Foufoula-Georgiou, E., and Georgiou, T. T.: Coping with model error in variational data assimilation using optimal mass transport, Water Resour. Res., 50, 5817–5830, https://doi.org/10.1002/2013WR014966, 2014.

Noh, S. J., Tachikawa, Y., Shiiba, M., and Kim, S.: Applying sequential Monte Carlo methods into a distributed hydrologic model: Lagged particle filtering approach with regularization, Hydrol. Earth Syst. Sci., 15, 3237–3251, https://doi.org/10.5194/hess-15-3237-2011, 2011.

Park, S., Hwang, J. P., Kim, E., and Kang, H. J.: A new evolutionary particle filter for the prevention of sample impoverishment, IEEE Trans. Evol. Comput., 13, 801–809, https://doi.org/10.1109/TEVC.2008.2011729, 2009.

Penning-Rowsell, E. C., Tunstall, S. M., Tapsell, S. M., and Parker, D. J.: The benefits of flood warnings: Real but elusive, and politically significant, J. Chart. Inst. Water Environ. Manage., 14, 7–14, https://doi.org/10.1111/j.1747-6593.2000.tb00219.x, 2000.

Pham, D. T.: Stochastic methods for sequential data assimilation in strongly nonlinear systems, Mon. Weather Rev., 129, 1194–1207, https://doi.org/10.1175/1520-0493(2001)129<1194:SMFSDA>2.0.CO;2, 2001.

Rawlins, F., Ballard, S. P., Bovis, K. J., Clayton, A. M., Li, D., Inverarity, G. W., Lorenc, A. C., and Payne, T. J.: The Met Office global four-dimensional variational data assimilation scheme, Q. J. Roy. Meteorol. Soc., 133, 347–362, https://doi.org/10.1002/qj.32, 2007.

Reichle, R. H., McLaughlin, D. B., and Entekhabi, D.: Variational data assimilation of microwave radiobrightness observations for land surface hydrology applications, IEEE T. Geosci. Remote, 39, 1708–1718, https://doi.org/10.1109/36.942549, 2001.

Rodríguez, E., Morris, C. S., and Belz, J. E.: A global assessment of the SRTM performance, Photogramm. Eng. Remote Sens., 72, 249–260, 2006.

Rogers, E., DiMego, G., Black, T., Ek, M., Ferrier, B., Gayno, G., Janic, Z., Lin, Y., Pyle, M., Wong, V., and Wu, W.-S.: The NCEP North American Mesoscale Modeling System: Recent Changes and Future Plans, in: 23rd Conf. Weather Anal. Forecast./Conf. Numer. Weather Predict., available at: http://ams.confex.com/ams/23WAF19NWP/techprogram/paper_154114.htm (last access: 1 October 2018), 2009.

Seaber, P. R., Kapinos, F. P., and Knapp, G. L.: Hydrologic unit maps, US Government Printing Office, Washington, D.C., USA, 1987.

Sheather, S. J. and Jones, M. C.: A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation, J. R. Stat. Soc. Ser. B, 53, 683–690, available at: http://www.jstor.org/stable/2345597 (last access: 1 October 2018), 1991.

Silverman, B. W.: Density estimation for statistics and data analysis, CRC Press, USA, 1986.

Smith, A., Doucet, A., de Freitas, N., and Gordon, N.: Sequential Monte Carlo methods in practice, Springer Science & Business Media, USA, 2013.

Snyder, C., Bengtsson, T., Bickel, P., and Anderson, J.: Obstacles to High-Dimensional Particle Filtering, Mon. Weather Rev., 136, 4629–4640, https://doi.org/10.1175/2008MWR2529.1, 2008.

Socha, K. and Dorigo, M.: Ant colony optimization for continuous domains, Eur. J. Oper. Res., 185, 1155–1173, https://doi.org/10.1016/j.ejor.2006.06.046, 2008.

Terrell, G. R. and Scott, D. W.: Variable kernel density estimation, Ann. Stat., 20, 1236–1265, https://doi.org/10.1214/aos/1176348768, 1992.

Trémolet, Y.: Accounting for an imperfect model in 4D-Var, Q. J. Roy. Meteorol. Soc., 132, 2483–2504, https://doi.org/10.1256/qj.05.224, 2006.

van Leeuwen, P. J.: Particle Filtering in Geophysical Systems, Mon. Weather Rev., 137, 4089–4114, https://doi.org/10.1175/2009MWR2835.1, 2009.

van Leeuwen, P. J.: Nonlinear Data Assimilation for high-dimensional systems, in: Nonlinear Data Assimilation, edited by: van Leeuwen, P. J., Cheng, Y., and Reich, S., Springer International Publishing, 1–73, 2015.

Verkade, J. S. and Werner, M. G. F.: Estimating the benefits of single value and probability forecasting for flood warning, Hydrol. Earth Syst. Sci., 15, 3751–3765, https://doi.org/10.5194/hess-15-3751-2011, 2011.

Vrugt, J. A. and Robinson, B. A.: Improved evolutionary optimization from genetically adaptive multimethod search, P. Natl. Acad. Sci. USA, 104, 708–711, https://doi.org/10.1073/pnas.0610471104, 2007.

Wand, M. P. and Jones, M. C.: Kernel smoothing, CRC Press, New York, 1994.

Wen, Z., Liang, X., and Yang, S.: A new multiscale routing framework and its evaluation for land surface modeling applications, Water Resour. Res., 48, 1–16, https://doi.org/10.1029/2011WR011337, 2012.

West, M.: Mixture models, Monte Carlo, Bayesian updating, and dynamic models, Comput. Sci. Stat., 1–11, available at: http://www.stat.duke.edu/~mw/MWextrapubs/West1993a.pdf (last access: 1 October 2018), 1993.

Wigmosta, M. S., Vail, L. W., and Lettenmaier, D. P.: A distributed hydrology-vegetation model for complex terrain, Water Resour. Res., 30, 1665–1679, https://doi.org/10.1029/94WR00436, 1994.

Wigmosta, M. S., Nijssen, B., and Storck, P.: The distributed hydrology soil vegetation model, Math. Model. Small Watershed Hydrol. Appl., 7–42, available at: http://ftp.hydro.washington.edu/pub/dhsvm/The-distributed-hydrology-soil-vegetation-model.pdf (last access: 1 October 2018), 2002.

Yang, S.-C., Corazza, M., Carrassi, A., Kalnay, E., and Miyoshi, T.: Comparison of Local Ensemble Transform Kalman Filter, 3DVAR, and 4DVAR in a Quasigeostrophic Model, Mon. Weather Rev., 137, 693–709, https://doi.org/10.1175/2008MWR2396.1, 2009.

Zhang, F., Zhang, M., and Hansen, J.: Coupling ensemble Kalman filter with four-dimensional variational data assimilation, Adv. Atmos. Sci., 26, 1–8, https://doi.org/10.1007/s00376-009-0001-8, 2009.

Zhang, L., Nan, Z., Liang, X., Xu, Y., Hernández, F., and Li, L.: Application of the MacCormack scheme to overland flow routing for high-spatial resolution distributed hydrological model, J. Hydrol., 558, 421–431, https://doi.org/10.1016/j.jhydrol.2018.01.048, 2018.

Zhang, X., Tian, Y., Cheng, R., and Jin, Y.: An Efficient Approach to Nondominated Sorting for Evolutionary Multiobjective Optimization, IEEE Trans. Evol. Comput., 19, 201–213, https://doi.org/10.1109/TEVC.2014.2308305, 2015.

Zhu, Y., Toth, Z., Wobus, R., Richardson, D., and Mylne, K.: The economic value of ensemble-based weather forecasts, B. Am. Meteorol. Soc., 83, 73–83, https://doi.org/10.1175/1520-0477(2002)083<0073:TEVOEB>2.3.CO;2, 2002.

Ziervogel, G., Bithell, M., Washington, R., and Downing, T.: Agent-based social simulation: A method for assessing the impact of seasonal climate forecast applications among smallholder farmers, Agric. Syst., 83, 1–26, https://doi.org/10.1016/j.agsy.2004.02.009, 2005.

www.hydrol-earth-syst-sci.net/22/5759/2018/ Hydrol. Earth Syst. Sci., 22, 5759–5779, 2018
