Adaptive discretization of convex multistage stochastic programs


Math. Meth. Oper. Res. (2007) 65: 361–383, DOI 10.1007/s00186-006-0124-y

ORIGINAL ARTICLE

Stefan Vigerske · Ivo Nowak

Adaptive discretization of convex multistage stochastic programs

Received: 15 May 2006 / Accepted: 16 October 2006 / Published online: 13 December 2006
© Springer-Verlag 2006

Abstract We propose a new scenario tree reduction algorithm for multistage stochastic programs, which integrates the reduction of a scenario tree into the solution process of the stochastic program. This makes it possible to construct a scenario tree that is highly adapted to the optimization problem. The algorithm starts with a rough approximation of the original tree and locally refines this approximation as long as necessary. Promising numerical results for scenario tree reductions in the settings of portfolio management and power management with uncertain load are presented.

Keywords Stochastic programming · Multistage · Scenario tree · Scenario reduction · Adaptive discretization

1 Introduction

Multistage stochastic programs (MSP) (Ruszczynski and Shapiro 2003; Shapiro 2006) are a powerful tool to model a series of sequential optimization problems whose data is subject to uncertainty. We therefore consider a discrete-time stochastic data process $\xi := (\xi_t)_{t=1}^T$ over some probability space $(\Omega, \mathcal{F}, P)$, $\xi_t : \Omega \to \mathbb{R}^{s_t}$, $s := \sum_{t=1}^T s_t$. Denote by $\zeta_t := (\xi_1, \ldots, \xi_t)$ the stochastic variable that represents the events that can be observed up to time $t$, and by $\mathcal{F}_t := \sigma(\zeta_t)$ the $\sigma$-algebra generated by $\zeta_t$. We assume $\mathcal{F}_1 = \{\emptyset, \Omega\}$, that is, the first-stage data is deterministic, and, for convenience, $\mathcal{F}_T = \mathcal{F}$. The aim of the optimization is to find a decision

S. Vigerske: Department of Mathematics, Humboldt-University, 10099 Berlin, Germany. E-mail: stefan@math.hu-berlin.de

I. Nowak: Niebuhrstraße 63, 10629 Berlin, Germany


process $x = (x_t)_{t=1}^T$ over $(\Omega, \mathcal{F}, P)$, $x_t : \Omega \to \mathbb{R}^{r_t}$, $\sum_{t=1}^T r_t = r$, which solves the multistage stochastic program

$$\begin{aligned}
\min_x\ & \mathbb{E}[f(x,\xi)] \\
\text{s.t.}\ & g_t(x_1,\ldots,x_t,\zeta_t) \le 0, \quad t = 1,\ldots,T\ \ P\text{-a.s.} \\
& x_t \in X_t, \quad t = 1,\ldots,T\ \ P\text{-a.s.} \\
& x_t = \mathbb{E}[x_t \mid \mathcal{F}_t], \quad t = 1,\ldots,T
\end{aligned} \quad (\text{MSP})$$

with $f : \mathbb{R}^r \times \mathbb{R}^s \to \mathbb{R}$, $g_t : \mathbb{R}^{\sum_{\tau=1}^t r_\tau} \times \mathbb{R}^{\sum_{\tau=1}^t s_\tau} \to \mathbb{R}^{d_t}$, and $X_t \subseteq \mathbb{R}^{r_t}$. The property $x_t = \mathbb{E}[x_t \mid \mathcal{F}_t]$ is called nonanticipativity and ensures that the decision $x_t$ at time $t$ depends on $\zeta_t$ only.

When the probability distribution has an infinite or very large support, the problem is not numerically tractable, so the distribution is replaced by an approximation using a finite and smaller support $\Omega$. Since the first-stage data is deterministic, the scenarios $\xi(\omega)$, $\omega \in \Omega$, can be represented in a scenario tree of depth $T$ with $|\Omega|$ leaves. The size of this scenario tree mainly determines the size of the optimization problem. Hence, if it is too large, the scenario tree has to be reduced further in order to make the MSP numerically tractable. This problem has attracted much interest over the last decades. In the following we mention some publications.

The algorithm of Casey and Sen (2005) for linear MSPs with random right-hand side constructs a sequence of scenario trees as approximations of the original distribution. The optimal solution for an approximated tree is prolonged to a policy for the original tree. The choice of the nodes for which the tree is refined is governed by two node-specific parameters: first, the probability of infeasibility of the prolonged policy, and second, the gap between a lower and upper bound on a (nodal) salvage value function. The advantages of this algorithm are its asymptotic convergence and its measure of optimality and infeasibility. The computational challenge consists of the evaluation of bounds on the salvage value function and an upper bound on the infeasibility index in each node of the approximated scenario tree. Unfortunately, this algorithm seems to be limited to linear multistage programs with stochasticity in the right-hand side only, even though such problems are common in practical applications.

Another scenario tree generation algorithm which utilizes nodal information is the EVPI-based importance sampling algorithm from Dempster (2004). The expected value of perfect information (EVPI) process measures the impact of the nonanticipativity constraints on the optimal value of the MSP. For nodes of the scenario tree with a high EVPI process value the number of branches is increased, while for nodes with a small value the number of branches is decreased. The software package MSLiP (Gassmann 1990) is an implementation of this algorithm for linear multistage stochastic programs.

Other approaches for solving MSPs are based on dual decomposition techniques. Edirisinghe's algorithm (1999) for linear MSPs relaxes the nonanticipativity requirements of the decision process up to a certain second-order aggregation, allowing the computation of lower bounds. To form the aggregations, it is assumed that the support $\Omega$ of $\xi$ can be enclosed in a compact simplex, which is then partitioned into smaller subsimplices. Those which contribute most to the gap between a lower and an upper bound on the optimal value are then chosen for refinement.


The advantages of this approach are the computation of bounds on the value of the original problem and the ability to work with continuous distributions. But it is currently unclear whether this method provides any asymptotic guarantees.

Higle et al. (2005) presented a stochastic scenario decomposition algorithm for MSPs. Here, the distribution is approximated by an empirical one obtained by scenario sampling, and dualization of the nonanticipativity constraints yields a dual problem, which is further approximated by a stochastic cutting plane method. To avoid an overly extensive growth of the master problem due to the number of sampled scenarios and generated cuts, columns referring to similar data in the problem are aggregated. Advantages of this approach are its mild assumptions on the stochastic program and the ability to work with a high number of scenarios. On the other hand, no estimate on the quality of a solution after finitely many iterations is known, but almost sure asymptotic convergence can be shown.

Dupacová et al. (2003) and Römisch (2003) have shown that the optimal value of a convex twostage stochastic program behaves (locally) Lipschitz-continuously with respect to the underlying probability distribution. Recently, Heitsch et al. (2006) extended this result to linear multistage stochastic programs, where additionally the filtration $(\mathcal{F}_t)_{t=1}^T$ has to be taken into account. Hence, their scenario reduction algorithms aim to find a scenario tree whose associated probability distribution deviates as little as possible from the original distribution with respect to the considered distances of probability measures and filtrations (Heitsch and Römisch 2003, 2005). We describe this approach in more detail later, since we utilize it for our algorithm.

Pflug (2001) developed an algorithm related to methods from cluster analysis that considers the same distances of probability measures as in Dupacová et al. (2003) to discretize a given probability distribution by a scenario tree of prescribed structure. For the case that the original distribution is not known, a stochastic approximation procedure, which is based on a sample from the original distribution, is proposed.

The approach of Høyland et al. (2003) for the discretization of a multivariate stochastic process $\xi$ also concentrates on properties of the probability distribution only, more precisely, on the first four moments and the correlation matrix. After generating $s$ discrete independent univariate random variables, they are transformed to a multivariate random variable with a given correlation matrix. The moments of the univariate variables are thereby chosen such that the moments of the multivariate variable match prescribed values. Since independence of the generated univariate variables is hard to achieve in general, an iterative algorithm is applied to obtain at least uncorrelated random variables.

In this work we develop an algorithm which integrates the construction of a reduced scenario tree into the solution process of a convex MSP. Hence, from now on we assume that the functions $f(x,\xi)$ and $g_t(x_1,\ldots,x_t,\zeta_t)$ are convex in $x$ and that $X_t$ is a convex subset of $\mathbb{R}^{r_t}$. The convexity of the MSP might follow from the nature of the application, or might be given due to a relaxation of a nonconvex MSP. Indeed, we see a main application of our algorithm in the construction of scenario trees that are highly adapted to a convex relaxation of a nonconvex problem, and that can be used to discretize the nonconvex problem with a much smaller number of scenarios than by previous nonadaptive methods; cf. the thesis (Vigerske 2005) for such an extension. From a practical point of view, the assumption of convexity ensures that intermediate optimization problems remain numerically tractable.


The presented algorithm is based on the optimal scenario refinement algorithm of Nowak (2005, Chap. 9), which uses dual information to determine scenarios of an aggregated scenario tree which have a high influence and should therefore be refined. Similarly, our approach starts with a rough aggregation of a given scenario tree and refines this aggregation as long as the estimated gap to the original problem is too large. The gap is estimated by two error terms. To form the aggregations, the fast-forward selection algorithm of Heitsch and Römisch (2003) is utilized. Since our algorithm does not deal with the construction of a scenario tree, but with the reduction of a given original tree, the significance of the reduced tree as a discretization of the stochastic process $\xi$ is limited by the quality of the original tree.

The paper is organized as follows. Next, notation for scenario trees, aggregations of scenarios, and associated optimization problems is introduced. The third section describes our approach to the adaptive discretization of scenario trees. The fourth section illustrates the algorithm on a product mix problem, a portfolio management problem, and the optimization of operation levels for a collection of coal-fired thermal units and pumped hydro units under uncertain load. The paper finishes with concluding remarks.

2 Problem formulation

2.1 Scenario tree formulation

We let $\Omega = \{\omega_i\}_{i \in I}$ with finite index set $I$, and represent the information structure defined by the scenarios $\xi(\omega_i)$, $i \in I$, in a scenario tree: nodes at depth $t$ correspond to elements of a finite generator of $\mathcal{F}_t$, and two nodes have the same predecessor when their history is "the same", i.e., indistinguishable in terms of the filtration $\mathcal{F}_{t-1}$. We denote the nodes of the tree by $\mathbb{N}$ and introduce the following notation: for $n \in \mathbb{N}$ and $N \subseteq \mathbb{N}$ let

– $n^-$ be the predecessor of $n$, and $n^{-k} := (n^{-(k-1)})^-$ with $n^{-1} := n^-$,
– $t(n)$ be such that $n^{-t(n)+1}$ is the root node (the depth or timestage of $n$),
– $N_t := \{n \in N \mid t(n) = t\}$ be all nodes with depth (or timestage) $t$ in $N$,
– $p(n) := (n^{-t(n)+1}, \ldots, n^-, n)$ be the path from the root node to node $n$,
– $i(n) \in I$ be the scenario index that belongs to $n \in \mathbb{N}_T$.
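The tree notation above is easy to mirror in code. The following minimal Python sketch (class and field names are ours, not from the paper's implementation) stores the predecessor link $n^-$ and recovers depth $t(n)$, path $p(n)$, and the node probabilities $\pi_n$, which at a leaf equal the probability of its scenario and at an inner node equal the sum over the successors:

```python
class Node:
    """One node of a scenario tree: a link to its predecessor n^- and,
    for leaves, the index i(n) of the scenario it represents."""
    def __init__(self, pred=None, scenario_index=None):
        self.pred = pred                      # n^-
        self.scenario_index = scenario_index  # i(n), leaves only
        self.children = []
        if pred is not None:
            pred.children.append(self)

    def depth(self):
        """t(n): number of nodes on the path from the root to n."""
        return 1 if self.pred is None else 1 + self.pred.depth()

    def path(self):
        """p(n): the path (root, ..., n^-, n)."""
        return [self] if self.pred is None else self.pred.path() + [self]

    def prob(self, leaf_probs):
        """pi_n: the scenario probability at a leaf, otherwise the
        sum of the probabilities of the successor nodes."""
        if not self.children:
            return leaf_probs[self.scenario_index]
        return sum(c.prob(leaf_probs) for c in self.children)
```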

We define node probabilities $\pi_n$ recursively: for $n \in \mathbb{N}_T$, $\pi_n := P[\{\omega_{i(n)}\}]$, and for $n \in \mathbb{N} \setminus \mathbb{N}_T$, $\pi_n := \sum_{m \in \mathbb{N}:\, m^- = n} \pi_m$. Further, let $\xi^n := \xi_{t(n)}(\omega_{i(n)})$ and $\zeta^n := \xi^{p(n)} := (\ldots, \xi^{n^-}, \xi^n)$. Then (MSP) has the scenario tree formulation

$$\begin{aligned}
\min_x\ & \sum_{n \in \mathbb{N}_T} \pi_n f(x^{p(n)}, \zeta^n) \\
\text{s.t.}\ & g_{t(n)}(x^{p(n)}, \zeta^n) \le 0, \quad n \in \mathbb{N} \\
& x^n \in X_{t(n)}, \quad n \in \mathbb{N}
\end{aligned} \quad (\text{MSP}[\mathbb{N}])$$

where $x = (x^n)_{n \in \mathbb{N}}$. (MSP[$\mathbb{N}$]) is a convex problem of size proportional to $|\mathbb{N}|$, i.e., the number of variables is $\sum_{t=1}^T |\mathbb{N}_t|\, r_t$ and the number of constraints is $\sum_{t=1}^T |\mathbb{N}_t|\, d_t$ plus those equations describing the constraints $x^n \in X_{t(n)}$, $n \in \mathbb{N}$. We denote the optimal value of (MSP[$\mathbb{N}$]) by $v(\mathbb{N})$.
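The size count just given reduces to two weighted sums; a small helper (names are illustrative) returns the number of variables $\sum_t |\mathbb{N}_t| r_t$ and of inequality constraints $\sum_t |\mathbb{N}_t| d_t$ from the per-stage node counts:

```python
def problem_size(nodes_per_stage, r, d):
    """Number of variables sum_t |N_t|*r_t and of constraints
    sum_t |N_t|*d_t of the scenario tree formulation; the bound
    constraints x^n in X_{t(n)} are not counted here."""
    n_vars = sum(nt * rt for nt, rt in zip(nodes_per_stage, r))
    n_cons = sum(nt * dt for nt, dt in zip(nodes_per_stage, d))
    return n_vars, n_cons
```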


2.2 The approximation problem

In the adaptive discretization algorithm (next section), we make extensive use of aggregations of scenarios and corresponding scenario trees. We identify the scenarios $\xi(\omega_i)$, $i \in I$, with the leaves $\xi^n$, $n \in \mathbb{N}_T$, in the scenario tree, and obtain an aggregation $\mathcal{N}$ of $\mathbb{N}$ by partitioning $\mathbb{N}_T$ and selecting a so-called representative node for each element $N \in \mathcal{N}$. To be more precise, let $\mathcal{N}$ be a partition of $\mathbb{N}_T$, and for $N \in \mathcal{N}$ (i.e., $N \subseteq \mathbb{N}_T$) let $n_T(N) \in N$ be a representative node for $N$. We let $n_t(N) := n_T(N)^{-T+t}$, $p(N) := p(n_T(N)) = (n_1(N), \ldots, n_T(N))$, $N \in \mathcal{N}$, and $\mathbb{N}|_{\mathcal{N}}$ be the scenario tree that is induced by the paths $p(N)$, $N \in \mathcal{N}$. The node probabilities for the leaves of $\mathbb{N}|_{\mathcal{N}}$ are $\pi_N := \sum_{n \in N} \pi_n$, $N \in \mathcal{N}$. We define the aggregated problem of (MSP[$\mathbb{N}$]) by restricting the decision process and constraints to the aggregated scenario tree $\mathbb{N}|_{\mathcal{N}}$. For the objective function $F_{\mathcal{N}}(x,\xi)$, we average over aggregated scenarios, which will simplify the comparison with the objective function of a refinement of $\mathcal{N}$:

$$F_{\mathcal{N}}(x,\xi) := \sum_{N \in \mathcal{N}} \sum_{n \in N} \pi_n f(x^{p(N)}, \zeta^n).$$

Thus, the aggregated problem reads:

$$\begin{aligned}
\min_x\ & F_{\mathcal{N}}(x,\xi) \\
\text{s.t.}\ & g_{t(n)}(x^{p(n)}, \zeta^n) \le 0, \quad n \in \mathbb{N}|_{\mathcal{N}} \\
& x^n \in X_{t(n)}, \quad n \in \mathbb{N}|_{\mathcal{N}}
\end{aligned} \quad (\text{MSP}[\mathcal{N}])$$

We denote its optimal value by $v(\mathcal{N})$. The aim of the adaptive scenario reduction algorithm is to find an aggregation $\mathcal{N}$ such that the optimal value of the aggregated problem (MSP[$\mathcal{N}$]) is close to the optimal value of the original problem, $v(\mathcal{N}) \approx v(\mathbb{N})$, and the number $|\mathcal{N}|$ of scenarios in the aggregated scenario tree is small.

We use a refinement of an aggregation $\mathcal{N}$ as a test aggregation to find out which parts of $\mathcal{N}$ are aggregated too roughly and have to be refined. By a refinement of an aggregation $\mathcal{N}$ we understand another partition $\mathcal{N}'$ of $\mathbb{N}_T$ that is finer than $\mathcal{N}$, i.e., $\forall N' \in \mathcal{N}'\ \exists N \in \mathcal{N}: N' \subseteq N$. We require further that the representative nodes in $\mathcal{N}'$ are chosen such that $\mathbb{N}|_{\mathcal{N}} \subseteq \mathbb{N}|_{\mathcal{N}'}$. As a last piece of notation, for $N \in \mathcal{N}$ let $\mathcal{N}'|_N := \{N' \in \mathcal{N}' \mid N' \subseteq N\}$ be the sets in the refinement $\mathcal{N}'$ whose union forms the set $N$.

3 Adaptive scenario reduction

The adaptive scenario reduction algorithm computes an aggregation $\mathcal{N}$ of the original scenario tree $\mathbb{N}$ such that the distance between the optimal values of the aggregated problem (MSP[$\mathcal{N}$]) and the original problem (MSP[$\mathbb{N}$]) is small. The main idea is the following. We estimate the approximation error by

$$\left| v(\mathcal{N}) - v(\mathbb{N}) \right| \le \underbrace{\left| v(\mathcal{N}) - v(\mathcal{N}') \right|}_{=:\,\Delta(\mathcal{N},\mathcal{N}')} + \underbrace{\left| v(\mathcal{N}') - v(\mathbb{N}) \right|}_{=:\,\Delta(\mathcal{N}',\mathbb{N})}, \quad (1)$$


where $\mathcal{N}'$ is a refinement of $\mathcal{N}$. For the reduction of the former gap, $\Delta(\mathcal{N},\mathcal{N}')$, we explicitly identify sets of aggregated scenarios $N \in \mathcal{N}$ which contribute to this gap. They are then replaced by a partition of $N$, based on $\mathcal{N}'|_N$. For the latter gap, $\Delta(\mathcal{N}',\mathbb{N})$, we reduce the distance between the probability distributions associated with the aggregated tree $\mathbb{N}|_{\mathcal{N}'}$ and the original tree $\mathbb{N}$.

3.1 Reduction of the error $\Delta(\mathcal{N},\mathcal{N}')$

Assume that the optimal values of (MSP[$\mathcal{N}$]) and (MSP[$\mathcal{N}'$]) are known. Let $\sigma_P$ be a (relative) tolerance on the gap $\Delta(\mathcal{N},\mathcal{N}')$. If the gap exceeds the tolerance, we assemble a collection $\mathcal{M} \subseteq \mathcal{N}$ of at most $m$ node sets which contribute most to this gap and replace each $N \in \mathcal{M}$ by a partition of $N$ based on $\mathcal{N}'|_N$, so that a refinement of $N$ in $\mathcal{N}$ is achieved. Finally, to ensure that $\mathcal{N}'$ remains a nontrivial refinement of $\mathcal{N}$, sets $N' \in \mathcal{N} \cap \mathcal{N}'$, i.e., sets in the aggregation $\mathcal{N}$ that are associated with a trivial refinement in $\mathcal{N}'$, are refined in $\mathcal{N}'$ (if possible).

The parameter $m$ determines the growth rate of the aggregated tree $\mathbb{N}|_{\mathcal{N}}$. If it is chosen too large, many $N \in \mathcal{N}$ are refined in one iteration, so that $\mathbb{N}|_{\mathcal{N}}$ grows fast and unnecessarily many scenarios might be added. If $m$ is small, many iterations might be necessary for a sufficient reduction of the gap $\Delta(\mathcal{N},\mathcal{N}')$.

We distinguish two cases for the identification of $\mathcal{M}$.

3.1.1 The case $v(\mathcal{N}) - v(\mathcal{N}') > \sigma_P |v(\mathcal{N}')|$

If $v(\mathcal{N})$ exceeds $v(\mathcal{N}')$ by too much, we have to search for sets $N \in \mathcal{N}$ whose refinement allows a high reduction in $v(\mathcal{N})$. Actually, we aim to reach $v(\mathcal{N}) - v(\mathcal{N}') \le \sigma_P |v(\mathcal{N})|$. Due to convexity, this relation is satisfied by a point that is feasible for (MSP[$\mathcal{N}$]) and stays close to the set of optimal points of (MSP[$\mathcal{N}'$]). To find the sets $N \in \mathcal{N}$ that hamper this goal, we compute a point $x = (x_{N'})_{N' \in \mathcal{N}'}$ that is feasible for (MSP[$\mathcal{N}'$]), whose objective value does not exceed $v(\mathcal{N}')$ by more than the fraction $\sigma_P$, and that is as "close" to the rougher aggregation $\mathcal{N}$ as possible. Then we select for $\mathcal{M}$ those $N \in \mathcal{N}$ where the variation of $x_{N'}$ over $\mathcal{N}'|_N$ is the highest.

More formally, we search for a point $x$ feasible for (MSP[$\mathcal{N}'$]) that satisfies $F_{\mathcal{N}'}(x,\xi) \le v(\mathcal{N}') + \sigma_P |v(\mathcal{N}')|$ and minimizes the variation of $x_{N'}$ from $x_N$ for $N' \in \mathcal{N}'|_N$ and $N \in \mathcal{N}$. That is, we solve the convex problem

$$\begin{aligned}
\min_x\ & \sum_{N \in \mathcal{N}} \pi_N \max_{t=1,\ldots,T}\ \max_{N' \in \mathcal{N}'|_N} \left\| x^{n_t(N')} - x^{n_t(N)} \right\|_\infty \\
\text{s.t.}\ & g_{t(n)}(x^{p(n)}, \zeta^n) \le 0, \quad n \in \mathbb{N}|_{\mathcal{N}'} \\
& x^n \in X_{t(n)}, \quad n \in \mathbb{N}|_{\mathcal{N}'} \\
& F_{\mathcal{N}'}(x) \le v(\mathcal{N}') + \sigma_P |v(\mathcal{N}')|
\end{aligned} \quad (\text{P})$$

Next, we select those $N \in \mathcal{N}$ with the highest impact on the optimal value of (P), i.e., for an optimal point $x$ of (P) let

$$\mathcal{M} := \operatorname*{argmax}_{\mathcal{M} \subseteq \mathcal{N},\, |\mathcal{M}| \le m}\ \sum_{M \in \mathcal{M}} \pi_M \max_{t=1,\ldots,T}\ \max_{M' \in \mathcal{N}'|_M} \left\| x^{n_t(M')} - x^{n_t(M)} \right\|_\infty \quad (2)$$


and such that $\max_{t=1,\ldots,T} \max_{M' \in \mathcal{N}'|_M} \| x^{n_t(M')} - x^{n_t(M)} \|_\infty$ is strictly positive for each $M \in \mathcal{M}$.

$\mathcal{M}$ is not empty, since otherwise the optimal value of (P) would be zero. In that case the solution point $x$ of (P) could be reformulated into a feasible point $\bar x$ for (MSP[$\mathcal{N}$]) with $F_{\mathcal{N}}(\bar x) \le v(\mathcal{N}') + \sigma_P |v(\mathcal{N}')|$, which would violate the assumption $v(\mathcal{N}) - v(\mathcal{N}') > \sigma_P |v(\mathcal{N}')|$.
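Once the per-set variations from an optimal point of (P) are known, the selection rule (2) amounts to picking the (at most) $m$ sets with the largest strictly positive weighted variation. A sketch under that assumption (the dictionary layout is ours):

```python
import heapq

def select_sets(variation, probs, m):
    """Selection rule (2): variation[N] holds the maximal variation
    max_t max_{N'} ||x^{n_t(N')} - x^{n_t(N)}||_inf for set N, probs[N]
    its probability pi_N. Returns the up-to-m sets with the largest
    strictly positive contribution pi_N * variation[N]."""
    scored = {N: probs[N] * v for N, v in variation.items() if v > 0}
    return heapq.nlargest(m, scored, key=scored.get)
```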

3.1.2 The case $v(\mathcal{N}') - v(\mathcal{N}) > \sigma_P |v(\mathcal{N}')|$

Here, we cannot proceed as in the first case. Instead, we turn our attention directly to the objective functions of (MSP[$\mathcal{N}$]) and (MSP[$\mathcal{N}'$]) and select those $N \in \mathcal{N}$ that contribute most to the difference of the objective functions evaluated at solution points of (MSP[$\mathcal{N}$]) and (MSP[$\mathcal{N}'$]). That is, for $x = (x_N)_{N \in \mathcal{N}}$ optimal for (MSP[$\mathcal{N}$]) and $x' = (x'_{N'})_{N' \in \mathcal{N}'}$ optimal for (MSP[$\mathcal{N}'$]), we have

$$F_{\mathcal{N}'}(x') - F_{\mathcal{N}}(x) = \sum_{N \in \mathcal{N}} \sum_{N' \in \mathcal{N}'|_N} \sum_{n \in N'} \pi_n \left( f(x'^{\,p(N')}, \zeta^n) - f(x^{p(N)}, \zeta^n) \right)$$

and we select $\mathcal{M}$ by building

$$\operatorname*{argmax}_{\mathcal{M} \subseteq \mathcal{N}:\, |\mathcal{M}| \le m}\ \sum_{M \in \mathcal{M}} \sum_{M' \in \mathcal{N}'|_M} \sum_{n \in M'} \pi_n \left( f(x'^{\,p(M')}, \zeta^n) - f(x^{p(M)}, \zeta^n) \right) \quad (3)$$

and such that $\sum_{M' \in \mathcal{N}'|_M} \sum_{n \in M'} \pi_n \left( f(x'^{\,p(M')}, \zeta^n) - f(x^{p(M)}, \zeta^n) \right)$ is strictly positive for each $M \in \mathcal{M}$.
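The inner sums in (3) can be evaluated per set once the two scenario objective values have been computed; a sketch assuming that precomputation (the data layout is ours, not the paper's):

```python
def objective_contributions(partition_map, pi, f_refined, f_coarse):
    """Per-set contribution to F_{N'}(x') - F_N(x) as in (3):
    for each set M, sum over its scenarios n of
    pi_n * (f(x'^{p(M')}, zeta^n) - f(x^{p(M)}, zeta^n)).
    f_refined[n] and f_coarse[n] hold the two evaluated scenario
    objectives; only sets with positive contribution are candidates."""
    contrib = {}
    for M, scenarios in partition_map.items():
        contrib[M] = sum(pi[n] * (f_refined[n] - f_coarse[n])
                         for n in scenarios)
    return contrib
```

The up-to-$m$ sets with the largest positive values of this dictionary then form $\mathcal{M}$.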

3.2 Reduction of the error $\Delta(\mathcal{N}',\mathbb{N})$

When the first error is small enough, we have to ensure that the approximation (MSP[$\mathcal{N}'$]) gives a meaningful estimate of the optimal value of the original problem. For convex twostage stochastic programs, Dupacová et al. (2003) have shown that the gap $\Delta(\mathcal{N}',\mathbb{N})$ can be bounded by a distance of the probability distributions associated with the trees $\mathbb{N}|_{\mathcal{N}'}$ and $\mathbb{N}$. For multistage stochastic programs, additionally the distance of the filtrations associated with the trees has to be taken into account (Heitsch et al. 2006).

In terms of the stochastic program (MSP) we summarize the main results as follows, cf. Dupacová et al. (2003), Heitsch and Römisch (2005), Heitsch et al. (2006), Römisch (2003): assume that the functions $f$ and $g_t$ are affine linear in $x$ and $\xi$ with non-stochastic recourse matrices $A_{t,t}$, i.e., $f(x,\xi) = \sum_{t=1}^T \langle c_t(\zeta_t), x_t \rangle$ and $g_t(x_1,\ldots,x_t,\zeta_t) = \sum_{\tau=1}^{t-1} A_{t,\tau}(\zeta_t)\, x_\tau + A_{t,t}\, x_t + b_t(\zeta_t)$ for suitable vector- and matrix-valued functions $c_t$, $b_t$, and $A_{t,\tau}$, $\tau = 1,\ldots,t-1$, $t = 1,\ldots,T$, and assume $X_t$ to be polyhedral for $t = 1,\ldots,T$. Further on, we denote the objective function by $F(x,\xi) := \mathbb{E}[f(x,\xi)]$, the feasible set at timestage $t$ by $\mathcal{X}_t(x_1,\ldots,x_{t-1},\zeta_t) := \{x_t \in X_t \mid g_t(x_1,\ldots,x_t,\zeta_t) \le 0\}$, $t = 2,\ldots,T$, and the feasible set of (MSP) by

$$\mathcal{X}(\xi) := \left\{ x \in \prod_{t=1}^T L_\infty(\Omega, \mathcal{F}_t, P; \mathbb{R}^{r_t}) \;\middle|\; x_1 \in X_1,\ x_t \in \mathcal{X}_t(x_1,\ldots,x_{t-1},\zeta_t) \right\},$$


where $\|x\|_\infty := \max_t \operatorname{ess\,sup}_\Omega \|x_t\|$. Hence, problem (MSP) can be written as $\min\{F(x,\xi) \mid x \in \mathcal{X}(\xi)\}$. We denote the optimal value of (MSP) by $v(\xi)$ and let, for any $\alpha \ge 0$,

$$l_\alpha(F(\cdot,\xi)) := \{x \in \mathcal{X}(\xi) \mid F(x,\xi) \le v(\xi) + \alpha\}$$

denote the $\alpha$-level set of (MSP). Then the following statement about the stability of (MSP) with respect to the stochastic process $\xi$ is known (Heitsch and Römisch 2005; Heitsch et al. 2006):

Theorem 1 Let $\xi \in L_q(\Omega,\mathcal{F},P;\mathbb{R}^s)$ for some $q \ge 1$, where we define $\|\xi\|_q^q := \sum_{t=1}^T \mathbb{E}[\|\xi_t\|^q]$. Assume that there exists a $\delta > 0$ such that for all $\tilde\xi \in L_q(\Omega,\mathcal{F},P;\mathbb{R}^s)$ with $\|\xi - \tilde\xi\|_q \le \delta$, the problem $\min\{F(x,\tilde\xi) \mid x \in \mathcal{X}(\tilde\xi)\}$ possesses relatively complete recourse. Assume further that the optimal value $v(\xi)$ is finite and that there exists a constant $\alpha > 0$ such that the objective function $F$ is level-bounded locally uniformly at $\xi$. Finally, assume that $X_1$ is bounded.

Then there exists a positive constant $L$ such that the estimate

$$\left| v(\xi) - v(\tilde\xi) \right| \le L \left( \|\xi - \tilde\xi\|_q + D_f(\xi, \tilde\xi) \right) \quad (4)$$

holds for all random elements $\tilde\xi \in L_q(\Omega,\mathcal{F},P;\mathbb{R}^s)$ with $\|\xi - \tilde\xi\|_q \le \delta$ and $v(\tilde\xi)$ finite. $D_f(\xi,\tilde\xi)$ denotes the filtration distance of $\xi$ and $\tilde\xi$, defined by

$$\sup_{\varepsilon \in (0,\alpha]}\ \inf_{\substack{x \in l_\varepsilon(F(\cdot,\xi)) \\ \tilde x \in l_\varepsilon(F(\cdot,\tilde\xi))}}\ \sum_{t=2}^{T-1} \max\left\{ \left\| x_t - \mathbb{E}[x_t \mid \tilde{\mathcal{F}}_t] \right\|_\infty,\ \left\| \tilde x_t - \mathbb{E}[\tilde x_t \mid \mathcal{F}_t] \right\|_\infty \right\},$$

where $\mathcal{F}_t$ and $\tilde{\mathcal{F}}_t$ denote the $\sigma$-algebras generated by $\zeta_t$ and $\tilde\zeta_t$, respectively.

Observe that for twostage stochastic programs the filtration distance in (4) vanishes. Starting from an estimate similar to (4), Dupacová et al. (2003) derived scenario reduction algorithms for convex twostage stochastic programs with finitely supported stochastic process $\xi$, which compute approximations $\tilde\xi$ on a subset of scenarios that minimize the distance $\|\xi - \tilde\xi\|_q$ (Heitsch and Römisch 2005). That is, for a finite set $\Omega = \{\omega_i\}_{i \in I}$ and a given stochastic process $\xi$ on $\Omega$ (thus having scenarios $\xi^i := \xi(\omega_i)$, $i \in I$), the principle of optimal scenario reduction can be stated as: determine an index set $J \subset I$ of a given cardinality $k$ such that the process $\tilde\xi : \Omega \to \mathbb{R}^s$ with

$$\tilde\xi(\omega_i) := \begin{cases} \xi(\omega_i), & i \notin J, \\ \xi(\omega_{i(j)}), & i = j \in J, \end{cases} \quad (5)$$

minimizes $\|\xi - \tilde\xi\|_q$, where $i(j) := \operatorname*{argmin}_{i \notin J} \|\xi^i - \xi^j\|$, $j \in J$ (optimal redistribution rule). Combining (5) and the definition of $\|\cdot\|_q$, we have

$$\|\xi - \tilde\xi\|_q^q = \sum_{j \in J} p_j \left\| \xi^j - \xi^{i(j)} \right\|^q = \sum_{j \in J} p_j \min_{i \notin J} \left\| \xi^j - \xi^i \right\|^q, \quad (6)$$


where $p_i := P[\{\omega_i\}]$, $i \in I$. Hence, the selection of the set $J \subset I$ gives rise to the combinatorial optimization problem

$$\min_{J \subset I:\, |J| = k}\ \sum_{j \in J} p_j \min_{i \notin J} \left\| \xi^i - \xi^j \right\|^q.$$

One heuristic to attack this problem is the (fast-)forward selection algorithm (Dupacová et al. 2003; Heitsch and Römisch 2003), which builds the set $I \setminus J$ one element at a time by selecting the scenario that achieves the best improvement in $\|\xi - \tilde\xi\|_q$: the set $I \setminus J := \{u_1, \ldots, u_{|I|-k}\}$ is computed by

$$u_j := \operatorname*{argmin}_{u \in I \setminus \{u_1,\ldots,u_{j-1}\}}\ \sum_{i \in I \setminus \{u_1,\ldots,u_{j-1},u\}} p_i \min_{l \in \{u_1,\ldots,u_{j-1},u\}} \left\| \xi^l - \xi^i \right\|^q, \quad j = 1, \ldots, |I| - k. \quad (\text{FF})$$
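A straightforward (unoptimized) Python sketch of this greedy rule, using NumPy for the pairwise distances; the function name and data layout are ours, not from a reference implementation:

```python
import numpy as np

def fast_forward(scenarios, probs, n_keep, q=2):
    """(Fast-)forward selection as in (FF): greedily add the scenario u
    that minimizes the probability-weighted sum, over the not-yet-selected
    scenarios, of the q-th power of the distance to the closest selected
    scenario. Returns the indices of the n_keep kept scenarios."""
    scenarios = np.asarray(scenarios, dtype=float)
    n = len(scenarios)
    # pairwise distances ||xi^l - xi^i||^q
    dist = np.linalg.norm(scenarios[:, None, :] - scenarios[None, :, :],
                          axis=2) ** q
    selected, remaining = [], set(range(n))
    nearest = np.full(n, np.inf)  # distance to the selected set so far
    for _ in range(n_keep):
        obj = {u: sum(probs[i] * min(nearest[i], dist[u, i])
                      for i in remaining if i != u)
               for u in remaining}
        u = min(obj, key=obj.get)  # the argmin step of (FF)
        selected.append(u)
        remaining.discard(u)
        nearest = np.minimum(nearest, dist[u])
    return selected
```

The deleted index set $J$ is the complement of the returned list; each deleted scenario's probability is then redistributed to its nearest kept scenario according to the optimal redistribution rule (5).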

If the stochastic program has only two stages, inequality (4) bounds the distance between the optimal values of the original problem and the problem with the reduced tree:

$$\left| v(\xi) - v(\tilde\xi) \right| \le L \|\xi - \tilde\xi\|_q = L \left( \sum_{j \in J} p_j \min_{i \notin J} \left\| \xi^i - \xi^j \right\|^q \right)^{1/q}.$$

For multistage stochastic programs, a stagewise application of the fast-forward selection algorithm (FF) yields an algorithm for the reduction of a scenario tree that controls the distance $\|\xi - \tilde\xi\|_q$ (Heitsch and Römisch 2003). But since it does not take care of the filtration distance $D_f(\xi,\tilde\xi)$, it can only be seen as a heuristic. For the special case that $\xi$ is a fan of scenarios, it is shown in Heitsch and Römisch (2005) how the fast-forward selection algorithm can be adjusted to control the filtration distance too.

Measuring the distance of the original tree $\mathbb{N}$ to the aggregated tree $\mathbb{N}|_{\mathcal{N}'}$ by $\|\xi - \tilde\xi\|_q$ (with $\xi$ the process associated with $\mathbb{N}$ and $\tilde\xi$ the process associated with $\mathbb{N}|_{\mathcal{N}'}$), we can write the distance (6) as

$$D(\mathcal{N}') := \sum_{N' \in \mathcal{N}'} D_{\mathcal{N}'}(N')$$

with

$$D_{\mathcal{N}'}(N') := \sum_{n \in N':\, n \ne n_T(N')} p_{i(n)} \min_{\tilde N \in \mathcal{N}'} \left\| \xi^{i(n_T(\tilde N))} - \xi^{i(n)} \right\|.$$

For the case of a linear twostage problem, this definition yields $\Delta(\mathcal{N}',\mathbb{N}) \le L\, D(\mathcal{N}')$ due to estimate (4).

To reduce $D(\mathcal{N}')$, we select the $N' \in \mathcal{N}'$ that maximizes $D_{\mathcal{N}'}(N')$ and apply the fast-forward selection algorithm (FF) within $N'$, i.e., we select nodes from $N'$ other than the representative node $n_T(N')$ and add the corresponding paths to the refined tree $\mathbb{N}|_{\mathcal{N}'}$ (see Vigerske 2005 for details). This procedure is repeated until the distance $D(\mathcal{N}')$ has been reduced by at least 10%.


Since the distance of $\xi$ and $\tilde\xi$ in (4) is multiplied by a problem-dependent factor $L$, it is difficult, even in the twostage case, to quantify the accuracy of the approximation $\tilde\xi$ that is required to cause only a small perturbation of the optimal value. We proceed with the refinement of $\mathcal{N}'$ as long as the estimate $D(\mathcal{N}')$ exceeds a given fraction of the gap between the tree consisting of only one scenario and the original tree, i.e., as long as

$$D(\mathcal{N}') > \sigma_D D_0,$$

where $\sigma_D \in [0,1]$ is a given parameter and $D_0 := \|\xi - \xi^{i(n)}\|_q$ for a scenario $i(n)$ chosen by (FF) from the set of all scenarios. Note that a very small choice for the tolerance $\sigma_D$ on the distance between $\mathbb{N}|_{\mathcal{N}'}$ and $\mathbb{N}$ might indeed result in a very fine aggregation $\mathcal{N}'$, but it has no direct impact on the final aggregation $\mathcal{N}$. This is the main difference to the scenario reduction algorithms in Dupacová et al. (2003) and Heitsch and Römisch (2003, 2005), where a tolerance on $\|\xi - \tilde\xi\|_q$ chosen too small, unaware of the problem-dependent parameter $L$, can lead to unnecessarily many scenarios and nodes in the reduced tree.

3.3 The algorithm

We can now formulate the adaptive scenario reduction algorithm, see Algorithm 1. For the initial aggregation $\mathcal{N}$, we select one scenario $\xi^{i(n)}$ by fast-forward selection

Algorithm 1 Adaptive scenario reduction

Input: initial tree $\mathbb{N}$, parameters $m \in \mathbb{N}$, $\sigma_P \in [0,1]$, $\sigma_D \in [0,1]$
1: compute initial rough aggregation $\mathcal{N} := \{\mathbb{N}_T\}$ with $n_T(\mathbb{N}_T) := n$ and $n$ selected by (FF)
2: $D_0 := \|\xi - \xi^{i(n)}\|_q$
3: compute initial refinement $\mathcal{N}'$ of $\mathcal{N}$
4: loop
5:   solve (MSP[$\mathcal{N}$]) and (MSP[$\mathcal{N}'$])
6:   if $\Delta(\mathcal{N},\mathcal{N}') > \sigma_P |v(\mathcal{N}')|$ then
7:     $\mathcal{M} := \emptyset$
8:     if $v(\mathcal{N}) - v(\mathcal{N}') > \sigma_P |v(\mathcal{N}')|$ then
9:       solve (P)
10:      compute $\mathcal{M}$ according to formula (2)
11:    else if $v(\mathcal{N}') - v(\mathcal{N}) > \sigma_P |v(\mathcal{N}')|$ then
12:      compute $\mathcal{M}$ according to formula (3)
13:    end if
14:    replace each $M \in \mathcal{M}$ by an aggregation of $\mathcal{N}'|_M$, using (FF)
15:    replace (in $\mathcal{N}'$) sets $N' \in \mathcal{N} \cap \mathcal{N}'$ by a refinement of $N'$, using (FF)
16:  else if $D(\mathcal{N}') > \sigma_D D_0$ then
17:    let $D_1 := D(\mathcal{N}')$
18:    while $D(\mathcal{N}') \ge \frac{9}{10} D_1$ do
19:      refine $N' := \operatorname{argmax}\{D_{\mathcal{N}'}(N') \mid N' \in \mathcal{N}'\}$ in $\mathcal{N}'$ using (FF)
20:    end while
21:  else
22:    break
23:  end if
24: end loop
Output: the aggregated node set $\mathcal{N}$


(FF) and set $\mathcal{N} := \{\mathbb{N}_T\}$ with $n_T(\mathbb{N}_T) := n$. For the initial refinement $\mathcal{N}'$ we select some additional scenarios from $\mathbb{N}_T$ using (FF).
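The control flow of Algorithm 1 can be summarized compactly. In the following Python sketch every problem-specific step (solving the two problems, evaluating $D(\mathcal{N}')$, and the two refinement actions) is injected as a callable; this dependency-injection structure is a simplification of ours, not the paper's implementation:

```python
def adaptive_reduction(solve, gap_D, refine_first, refine_second,
                       sigma_P, sigma_D, D0, max_iter=100):
    """Outer loop of Algorithm 1. `solve` returns (v(N), v(N')),
    `gap_D` returns D(N'), `refine_first` handles lines 7-15
    (reducing Delta(N, N')), and `refine_second` handles lines 17-20
    (reducing D(N'))."""
    for _ in range(max_iter):
        vN, vNp = solve()                       # line 5
        if abs(vN - vNp) > sigma_P * abs(vNp):  # line 6
            refine_first(vN, vNp)               # lines 7-15
        elif gap_D() > sigma_D * D0:            # line 16
            refine_second()                     # lines 17-20
        else:
            break                               # line 22
```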

Proposition 1 Algorithm 1 computes, in a finite number of steps, a reduced scenario tree $\mathcal{N}$ with $\Delta(\mathcal{N},\mathcal{N}') \le \sigma_P |v(\mathcal{N}')|$ and $D(\mathcal{N}') \le \sigma_D D_0$.

Proof The algorithm stops when no sets $\mathcal{M} \subseteq \mathcal{N}$ are chosen for a refinement (line 22). Hence, the size of the aggregation $\mathcal{N}$ or of the refinement $\mathcal{N}'$ increases in every iteration (lines 14, 15, and 19), and, due to the finiteness of the original tree, $\mathcal{N}'$ can be equal to the original tree after a finite number of iterations, i.e., $\mathbb{N} = \mathbb{N}|_{\mathcal{N}'}$. In this (worst) case, the second error and its (heuristic) estimate, $\Delta(\mathcal{N}',\mathbb{N})$ and $D(\mathcal{N}')$, respectively, are zero by definition. If the algorithm continues to select sets $\mathcal{M} \subseteq \mathcal{N}$ to reduce the first error, $\Delta(\mathcal{N},\mathcal{N}')$, the aggregation $\mathcal{N}$ grows (lines 8 and 11) and can be equal to the original tree after a finite number of steps. However, in the general case, the algorithm stops because the first and second error are small, i.e., $\Delta(\mathcal{N},\mathcal{N}') \le \sigma_P |v(\mathcal{N}')|$ and $D(\mathcal{N}') \le \sigma_D D_0$.

To solve (MSP[$\mathcal{N}$]) and (MSP[$\mathcal{N}'$]), warm starts of the solver are desirable, since the problems that are solved during the iteration process are similar. In fact, only new variables and related constraints are introduced and the objective function changes. Furthermore, for an efficient algorithm, the problems (MSP[$\mathcal{N}'$]) and (P) should be solved only approximately. Lower and upper bounds on $v(\mathcal{N}')$ can be sufficient for an estimate of the first error, $\Delta(\mathcal{N},\mathcal{N}')$, and an approximate solution of (P) can also give reliable information for the determination of the sets $\mathcal{M} \subseteq \mathcal{N}$. For example, a bundle algorithm based on Lagrangian decomposition (cf. Vigerske 2005) might be suitable.

4 Numerical results

The described algorithm has been implemented for linear multistage stochastic programs. The model can be formulated in the GAMS language (General Algebraic Modeling System), whereby information about timestages and the original scenario tree is given separately (Vigerske 2005). The arising optimization problems are solved by CPLEX.

We have applied the algorithm to three different linear stochastic programming problems. The first one is a twostage program with stochastic technology matrix. The second is a multistage program modeling a portfolio management problem with stochastic right-hand side and stochastic objective function. The last example is a multistage program arising in power management where only the right-hand side is stochastic.

All models were run for several combinations of the parameters $\sigma_P$ and $\sigma_D$. The influence of these parameters on the number of scenarios and nodes of the final reduced tree, and on the accuracy of the optimal value, is presented in several tables. The accuracy is measured as the relative distance between the optimal values with respect to the reduced tree and the original tree, i.e.,

$$\frac{\left| v(\mathcal{N}) - v(\mathbb{N}) \right|}{1 + \left| v(\mathbb{N}) \right|}.$$

Also, the

iteration progress is illustrated for some selected values of $\sigma_P$ and $\sigma_D$. The left picture in these figures shows the optimal values for the reduced tree $\mathbb{N}|_{\mathcal{N}}$, the refined tree $\mathbb{N}|_{\mathcal{N}'}$, and the original tree $\mathbb{N}$ (indicated by a straight line). The right picture shows the quotients $v(\mathcal{N})/v(\mathcal{N}')$ and $D(\mathcal{N}')/D_0$ and marks the interval $\left[\frac{1}{1+\sigma_P}, 1+\sigma_P\right]$ and the bound $\sigma_D$. Hence, we can track which of the cases discussed in Sects. 3.1.1, 3.1.2, and 3.2 occurs in which iteration. If the condition $v(\mathcal{N}') - v(\mathcal{N}) < \sigma_P |v(\mathcal{N}')|$ fails (the curve falls below the marked interval), a set $\mathcal{M} \subseteq \mathcal{N}$ is selected according to (3) and $v(\mathcal{N})$ increases. If the condition $v(\mathcal{N}) - v(\mathcal{N}') < \sigma_P |v(\mathcal{N}')|$ fails (the curve exceeds the marked interval), problem (P) is solved and $\mathcal{M}$ is selected according to formula (2). Otherwise, if $\Delta(\mathcal{N},\mathcal{N}') \le \sigma_P |v(\mathcal{N}')|$ but $D(\mathcal{N}') > \sigma_D D_0$, then $\mathcal{N}'$ is refined, i.e., sets $N' \in \mathcal{N}'$ that contribute most to $D(\mathcal{N}')$ are selected and refined.

The parameter $m$, which determines the growth rate of the reduced tree, has been chosen relatively small ($m = 2$) to avoid adding too many unnecessary scenarios, at the cost of running time.
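The relative accuracy measure used in the tables is a one-liner; a minimal helper of ours for completeness:

```python
def relative_accuracy(v_reduced, v_original):
    """Relative distance between the optimal values of the reduced
    and the original problem: |v_red - v_orig| / (1 + |v_orig|)."""
    return abs(v_reduced - v_original) / (1.0 + abs(v_original))
```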

4.1 Product mix problem

The first example is a twostage model with simple recourse and stochastic technology matrix from King (1988, p. 554). It describes the production of four classes of products, each consuming a certain number of man-hours in two types of factories. Each product earns a certain fixed profit per item, and it is possible to purchase casual labor from outside. The available man-hours $(\eta_1, \eta_2)$ are normally distributed random variables with mean 6,000 and standard deviation 100 in the first factory, and mean 4,000 and standard deviation 50 in the second factory. The man-hours $\xi_{i,j}$ needed to produce a product of type $j$ in factory $i$ are uniformly distributed random variables.

$$\begin{aligned}
\max_{x,y}\ & \mathbb{E}\left[ 12x_1 + 20x_2 + 18x_3 + 40x_4 - 5y_1^- - 10y_2^- \right] \\
\text{s.t.}\ & \xi_{1,1}x_1 + \xi_{1,2}x_2 + \xi_{1,3}x_3 + \xi_{1,4}x_4 - \eta_1 = y_1^+ - y_1^- \\
& \xi_{2,1}x_1 + \xi_{2,2}x_2 + \xi_{2,3}x_3 + \xi_{2,4}x_4 - \eta_2 = y_2^+ - y_2^- \\
& x, y^+, y^- \ge 0, \quad x_1, \ldots, x_4 \text{ deterministic}
\end{aligned}$$

An SMPS-formulation of the model is available in the COIN-OR library (Lougee-Heimer 2003), including a scenario tree with 300 scenarios. We ran the algorithm with several combinations of $\sigma_P$ and $\sigma_D$, see Table 1. It can be observed that the optimal value depends highly on the chosen set of scenarios. Hence, a too rough choice for $\sigma_D$ can result in a refinement $\mathcal{N}'$ that gives no meaningful approximation of the original scenario tree. Thus, if by coincidence the optimal values of (MSP[$\mathcal{N}$]) and (MSP[$\mathcal{N}'$]) are close, the algorithm stops with a still very inaccurate approximation of the original problem; see, e.g., the entries for $\sigma_D = 0.5$ and $\sigma_P = 0.005$ in Table 1 and Fig. 1. The accuracy improves for smaller values of $\sigma_D$, see, e.g., Fig. 2.

4.2 Portfolio management

The next example deals with the problem of financing business activities by borrowing and lending bonds with different maturities. The aim is to maximize the

Fig. 1 Product mix problem: iteration progress for σP = 0.001 and σD = 0.1 (left: optimal value per iteration, with the optimal value of the original tree for reference; right: σP bounds and σD bound per iteration)

profit, which is determined by the interest rate and price changes of bonds, and the earnings of the underlying business activities (Frauendorfer et al. 1996).

Let M := {1, . . . , 36} be a set of maturities in months, and MS := {1, 2, 3, 6, 12, 24, 36} ⊆ M be a subset of standard maturities for borrowing and

Fig. 2 Product mix problem: iteration progress for σP = 0.001 and σD = 0.01 (left: optimal value per iteration, with the optimal value of the original tree for reference; right: σP bounds and σD bound per iteration)

lending. The amount of borrowing and lending introduced to the market at time t with maturity m is denoted by v^{m,+}_t and v^{m,−}_t, respectively; v^m_t is the total volume of borrowing and lending at time t with maturity m, i.e., v^m_t = v^{m+1}_{t−1} + v^{m,+}_t − v^{m,−}_t,


Table 1 Product mix problem: number of scenarios of the final reduced tree (left) and accuracy of the optimal value (right), depending on σP and σD

σP \ σD   0.5  0.1  0.05  0.01 |    0.5     0.1   0.05   0.01
0.1       269   70   120   133 |  1.66%  11.87%  5.70%  3.40%
0.01       25   93   231   260 |  6.17%   0.47%  4.55%  0.16%
0.005      61  177   177   257 | 13.37%   5.58%  5.58%  2.19%
0.001     239  239   239   300 |  3.14%   3.14%  3.14%     0%
0.0005    239  239   239   300 |  3.14%   3.14%  3.14%     0%
0.0001    269  269   269   269 |  1.66%   1.66%  1.66%  1.66%

m ∈ MS, and v^m_t = v^{m+1}_{t−1}, m ∈ M\MS. The total volume at time t is given by x_t = Σ_{m∈M} v^m_t. The stochastic change in volume is denoted by ξ_t, i.e., x_t = x_{t−1} + ξ_t, and unlimited funding is prohibited by the inequality Σ_{m∈MS} v^{m,+}_t − v^1_{t−1} ≤ ξ_t. A starting portfolio v^m_0, m ∈ M, has been given as input. Finally, the stochastic processes of returns for borrowing and lending with maturity m ∈ MS per period are denoted by η^{m,+}_t and η^{m,−}_t, and the process of return for the associated underlying business activity by η^x_t. The complete model has the form

\begin{align*}
\max_{v,x}\quad & \mathbb{E}\left[ \sum_{t=0}^{T} \left( \sum_{m\in M_S} \left( \eta_t^{m,-} v_t^{m,-} - \eta_t^{m,+} v_t^{m,+} \right) + \eta_t^x x_t \right) \right] \\
\text{s.t.}\quad & v_t^m - v_{t-1}^{m+1} - v_t^{m,+} + v_t^{m,-} = 0, && t=1,\dots,T,\ m\in M_S \\
& v_t^m - v_{t-1}^{m+1} = 0, && t=1,\dots,T,\ m\in M\setminus M_S \\
& x_t - \sum_{m\in M} v_t^m = 0, && t=1,\dots,T \\
& x_t - x_{t-1} = \xi_t, && t=1,\dots,T \\
& \sum_{m\in M_S} v_t^{m,+} - v_{t-1}^1 \le \xi_t, && t=1,\dots,T \\
& v_t^{m,+},\; v_t^{m,-},\; v_t^m,\; x_t \ge 0, && t=1,\dots,T,\ m\in M_S \\
& v_t^{m,+},\; v_t^{m,-},\; v_t^m,\; x_t \in \mathcal{F}_t, && t=1,\dots,T,\ m\in M
\end{align*}

For the numerical experiments we have chosen two trees from Frauendorfer et al. (1996). The first tree has T = 5 stages, 625 scenarios, and 781 nodes. The second tree has T = 6 stages, 3,125 scenarios, and 3,906 nodes (Figs. 3, 4).
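The maturity-shift dynamics v^m_t = v^{m+1}_{t−1} + v^{m,+}_t − v^{m,−}_t can be illustrated by a small one-period map. The helper `step` and the toy numbers below are ours, not part of the SGPF test data.

```python
# Toy illustration of the bond-volume dynamics; all numbers are invented.
M = list(range(1, 37))                    # maturities in months
MS = {1, 2, 3, 6, 12, 24, 36}             # standard maturities

def step(v_prev, v_plus, v_minus):
    """One period of v^m_t = v^{m+1}_{t-1} + v^{m,+}_t - v^{m,-}_t.

    v_prev maps maturity -> volume at t-1 (v^{37}_{t-1} is taken as 0);
    v_plus / v_minus give new borrowing / lending for maturities in MS.
    """
    v = {}
    for m in M:
        v[m] = v_prev.get(m + 1, 0.0)     # remaining maturity shrinks by one
        if m in MS:
            v[m] += v_plus.get(m, 0.0) - v_minus.get(m, 0.0)
    return v

v0 = {m: 0.0 for m in M}
v1 = step(v0, {12: 100.0}, {})            # borrow 100 at 12-month maturity
v2 = step(v1, {}, {})                     # one period later it sits at m = 11
print(v1[12], v2[11], sum(v2.values()))   # 100.0 100.0 100.0 (total volume x_t)
```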

For the tree on 625 scenarios, it can be observed that an approximation with only 1% perturbation of the optimal value is already achieved with 89 scenarios (σP = 0.005, σD = 0.01), cf. Table 2. Similarly, for the tree on 3,125 scenarios, a reduction by approx. 90% in the number of scenarios still allows an approximation with at most 1% perturbation of the optimal value, cf. Table 3.

4.3 Power management in a hydro-thermal system under uncertain load

In the final example we consider a power generation system consisting of a collection of coal-fired thermal units I and pumped hydro units J. The aim is to find cost-optimal operation levels of the thermal and hydro units for 1 week under uncertain electric load. The model is a simplified version of the one described in Nowak and Römisch (1997).

Fig. 3 Portfolio management (625 scenarios) with σP = 0.01, σD = 0.01 (left: optimal value per iteration, with the optimal value of the original tree for reference; right: σP bounds and σD bound per iteration)

Fixed parameters for the thermal units are the fuel prices b ≥ 0 and the operation ranges 0 ≤ \underline{p} ≤ \overline{p}. For the hydro units the parameters are the efficiency of the pumps 0 ≤ η ≤ 1, the maximal operation levels of the pumps and turbines \overline{w}, \overline{v} ≥ 0, and the capacities of the reservoirs \overline{l} ≥ 0. For the beginning and end of the week, the fill levels l_in and l_end of the reservoirs are given. The stochastic data consists of the load d and the required reserve r for 1 week. We search for a sequence of

Fig. 4 Portfolio management (3,125 scenarios) with σP = 0.01, σD = 0.01 (left: optimal value per iteration, with the optimal value of the original tree for reference; right: σP bounds and σD bound per iteration)

operation levels for the thermal units p, pumps w, and turbines v, and of fill levels l in the pump reservoirs.

Furthermore, the decision process is subject to the following constraints: the fill levels at the beginning and end of the week, (7a) and (7b), time-coupling of the fill levels, (7c), satisfaction of the load, (7d), and reserve requirements, (7e). The objective is to minimize the expected fuel costs. Thus, the complete model has the form


Table 2 Portfolio management (625 scenarios): number of scenarios (left) and nodes (middle) of the reduced tree, and accuracy of the optimal value (right)

          Scenarios              Nodes                 Accuracy
σP \ σD  0.5  0.1 0.05 0.01 | 0.5  0.1 0.05 0.01 |   0.5    0.1   0.05   0.01
0.1        1    5    5    5 |   5   19   19   19 | 14.39%  7.95%  7.95%  7.95%
0.01       7   34   50   63 |  24   91  126  149 |  6.05%  2.18%  1.67%  1.43%
0.005     10   45   58   89 |  30  110  134  188 |  6.19%  2.18%  1.71%  1.00%
0.001      9   48   83  136 |  28  112  174  263 |  5.74%  1.87%  1.10%  0.68%
0.0005     9   90  211  242 |  28  176  345  392 |  5.74%  1.28%  0.73%  0.62%
0.0001     9   82  134  289 |  28  159  235  439 |  5.74%  1.48%  0.97%  0.53%

Table 3 Portfolio management (3,125 scenarios): number of scenarios (left) and nodes (middle) of the reduced tree, and accuracy of the optimal value (right)

          Scenarios              Nodes                 Accuracy
σP \ σD  0.5  0.1 0.05 0.01 | 0.5  0.1 0.05 0.01 |   0.5    0.1   0.05   0.01
0.1        1   13   13   13 |   6   53   53   53 | 13.52%  9.67%  9.67%  9.67%
0.01      11   64  167  270 |  40  203  463  666 |  6.41%  2.15%  1.50%  1.16%
0.005      7   91  223  327 |  28  275  576  754 |  6.62%  1.50%  1.08%  0.64%
0.001     16  113  367  647 |  58  315  808 1226 |  6.31%  1.45%  0.61%  0.24%
0.0005    16  143  335  671 |  58  380  746 1247 |  6.31%  1.38%  0.77%  0.20%
0.0001    26  384  465  856 |  72  732  886 1478 |  6.44%  1.54%  0.75%  0.15%

\begin{align}
\min_{p,l,v,w}\quad & \mathbb{E}\left[ \sum_{i\in I} \sum_{t=1}^{T} b_i p_{i,t} \right] \notag \\
\text{s.t.}\quad & l_{j,1} + (v_{j,1} - \eta_j w_{j,1}) = l_{\mathrm{in},j}, && j\in J \tag{7a} \\
& l_{j,T} = l_{\mathrm{end},j}, && j\in J \tag{7b} \\
& l_{j,t} - l_{j,t-1} + (v_{j,t} - \eta_j w_{j,t}) = 0, && t=2,\dots,T,\ j\in J \tag{7c} \\
& \sum_{i\in I} p_{i,t} + \sum_{j\in J} \left(v_{j,t} - w_{j,t}\right) \ge d_t, && t=1,\dots,T \tag{7d} \\
& \sum_{i\in I} \left(\overline{p}_i - p_{i,t}\right) \ge r_t, && t=1,\dots,T \tag{7e} \\
& \underline{p}_i \le p_{i,t} \le \overline{p}_i, && t=1,\dots,T,\ i\in I \notag \\
& 0 \le v_{j,t} \le \overline{v}_j,\quad 0 \le w_{j,t} \le \overline{w}_j,\quad 0 \le l_{j,t} \le \overline{l}_j, && t=1,\dots,T,\ j\in J \notag \\
& p_{i,t},\; v_{j,t},\; w_{j,t},\; l_{j,t} \in \mathcal{F}_t, && t=1,\dots,T,\ i\in I,\ j\in J \notag
\end{align}
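The fill-level dynamics implied by (7c) can be illustrated by propagating the reservoir level of a single hydro unit; `fill_levels` and all numbers below are invented for illustration.

```python
# Illustration of the fill-level dynamics l_t = l_{t-1} - v_t + eta * w_t
# implied by (7c): the turbine v releases water, the pump w refills the
# reservoir with efficiency eta. All numbers are invented.

def fill_levels(l0, v, w, eta, l_max):
    """Propagate the reservoir level and check 0 <= l_t <= l_max."""
    levels = [l0]
    for v_t, w_t in zip(v, w):
        l = levels[-1] - v_t + eta * w_t
        assert 0.0 <= l <= l_max, "reservoir bound violated"
        levels.append(l)
    return levels

# release 10 units/h for 3 h, then pump 12.5 units/h for 3 h at eta = 0.8
lv = fill_levels(50.0, [10, 10, 10, 0, 0, 0], [0, 0, 0, 12.5, 12.5, 12.5],
                 0.8, 100.0)
print(lv[-1])   # 50.0 -- the end-of-week condition (7b) would hold here
```

Note that the pump must move 12.5 units of water per hour to refill 10 units per hour, since only the fraction η = 0.8 reaches the reservoir.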

Two scenario trees modeling the stochasticity in the electric load over a time horizon of 1 week with hourly discretization points (T = 168) were available. The first tree consists of 64 scenarios and 3,048 nodes, the second one of 729 scenarios and 22,623 nodes (Figs. 5, 6).

Running the algorithm for the tree on 64 scenarios with parameters σP = 0.001 and σD = 0.01 results in a tree on 46 scenarios, see Fig. 9 (left). The influence of the parameters σP and σD is visualized in Table 4. While σP obviously has a high impact on the final number of scenarios, the distance D(N′), controlled by σD, seems to be of minor importance (Fig. 7).

Fig. 5 One scenario and 64 scenarios, respectively, of the electric load for 1 week (load range 0–1800)

Fig. 6 The scenario trees for the electric load of 1 week. Left: 64 scenarios, 3,048 nodes. Right: 729 scenarios, 22,623 nodes

Table 4 Power management model (64 scenarios): number of scenarios (left) and nodes (middle) of the reduced tree, and accuracy of the optimal value (right)

          Scenarios               Nodes                  Accuracy
σP \ σD  0.5  0.1 0.05 0.01 |  0.5  0.1  0.05  0.01 |  0.5   0.1  0.05  0.01
0.1        1    1    1    1 |  168  168   168   168 | 0.56% 0.56% 0.56% 0.56%
0.01       5    7    7    7 |  624  720   720   720 | 0.47% 0.22% 0.22% 0.22%
0.005      9   54   54   57 |  840 2640  2640  2856 | 0.47% 0.72% 0.72% 0.15%
0.001     34   34   46   46 | 2112 2112  2592  2592 | 0.11% 0.11% 0.02% 0.02%
0.0005    41   41   41   41 | 2448 2448  2448  2448 | 0.04% 0.04% 0.04% 0.04%
0.0001    48   48   48   48 | 2664 2664  2664  2664 |    0%    0%    0%    0%

For the tree on 729 scenarios, σP = 0.001, and σD = 0.01, the algorithm finishes with a tree on 106 scenarios, see Fig. 9 (right). Figure 8 shows the progress of the optimal values v(N) and v(N′), and how the first error and the term D(N′) develop during the iteration. One observes that after the 14th iteration, the remaining iterations are used to push D(N′)/D0 below the tolerance σD. When increasing σD to 0.1, one obtains the same tree, but the number of iterations decreases from 40 to 20 (Table 5).

5 Conclusions

We have proposed a new algorithm for the discretization of multistage stochastic programs, which integrates the reduction of a scenario tree into the solution of the

Fig. 7 Power management model (64 scenarios) with σP = 0.001, σD = 0.01 (left: optimal value per iteration, with the optimal value of the original tree for reference; right: σP bounds and σD bound per iteration)

discretized problem. We estimated the approximation error by a sum of two terms, which we aim to diminish separately, see (1). The first term is oriented towards the optimization problem itself and directly determines the size of the final approximation, while the latter ensures the reliability of the first term and thus influences the size of the final approximation only indirectly. This division allows the construction

Fig. 8 Power management model (729 scenarios) with σP = 0.001, σD = 0.01 (left: optimal value per iteration, with the optimal value of the original tree for reference; right: σP bounds and σD bound per iteration)

of reduced scenario trees that are highly adapted to the optimization problem, and therefore allows much smaller trees than an approach that concentrates on the stochastic process only.

The efficiency of the algorithm depends mostly on the effort spent to solve the problems (MSP[N]) and (MSP[N′]). In the current implementation, these

Fig. 9 Power management model: reduced scenario trees on 46 and 77 scenarios. The original scenario trees had 64 and 729 scenarios, respectively

Table 5 Power management model (729 scenarios): number of scenarios (left) and nodes (middle) of the reduced tree, and accuracy of the optimal value (right)

         Scenarios     Nodes           Accuracy
σP \ σD   0.1  0.01 |   0.1   0.01 |   0.1   0.01
0.1         1     1 |   168    168 | 0.18%  0.18%
0.01        1     1 |   168    168 | 0.18%  0.18%
0.005      91    91 |  6384   6384 | 0.41%  0.41%
0.001     106   106 |  7224   7224 | 0.03%  0.03%
0.0001    267   478 | 13944  20160 | 0.04%  0.01%

problems were solved up to optimality, which is quite expensive, so further developments might estimate the first error by means of lower and upper bounds on the optimal values. Also, an approximate solution of (P) can give information that is still reliable enough to determine the set M.

An advantage of the proposed approach is its potential extension to nonconvex problems, as discussed in Sect. 1 and proposed in Nowak (2005) and Vigerske (2005). Using a convex relaxation of (MSP[N]), the gap between the aggregated problem and its refinement, ∆(N, N′), can be estimated by the gap between convex relaxations of (MSP[N]) and (MSP[N′]). Hence, we can apply the algorithms from Sects. 3.1 and 3.2. For the reduction of the relaxation gaps we propose a Lagrangian decomposition approach where cuts for the master problem are constructed from solutions of the subproblems (Nowak 2005; Vigerske 2005). The dual parameters for the Lagrangian subproblems may be computed by a column generation method, i.e., a convex inner approximation of the convexified feasible set. This approach has the advantage that the generated inner approximation points can easily be updated after a refinement of the aggregations N and N′.

Acknowledgements This work was supported by the BMBF under grant 03SF0312E. The authors wish to thank Holger Heitsch and Werner Römisch (Humboldt-University Berlin) for helpful discussions and for providing the scenario trees for the power management problems.

References

Casey MS, Sen S (2005) The scenario generation algorithm for multistage stochastic linear programming. Math Oper Res 30:615–631

CPLEX. ILOG Inc. http://www.ilog.com/products/cplex

Dempster MAH (2004) Sequential importance sampling algorithms for dynamic stochastic programming. Zapiski Nauchnykh Seminarov POMI 312:94–129

Dupačová J, Gröwe-Kuska N, Römisch W (2003) Scenario reduction in stochastic programming—an approach using probability metrics. Math Program Ser A 95:493–511

Edirisinghe NCP (1999) Bound-based approximations in multistage stochastic programming: nonanticipativity aggregation. Ann Oper Res 85:103–127

Frauendorfer K, Haarbrücker G, Marohn C, Schürle M (1996) SGPF—portfolio test problems for stochastic multistage linear programming. Institute of Operations Research, University of St. Gallen. http://www.ifu.unisg.ch/sgpf/

Gassmann HI (1990) MSLiP: a computer code for the multistage stochastic linear programming problem. Math Program 47:407–423

General Algebraic Modeling System. http://www.gams.com

Heitsch H, Römisch W (2003) Scenario reduction algorithms in stochastic programming. Comput Optim Appl 24:187–206

Heitsch H, Römisch W (2005) Scenario tree modelling for multistage stochastic programs. Preprint 296, DFG Research Center Matheon "Mathematics for key technologies"

Heitsch H, Römisch W, Strugarek C (2006) Stability of multistage stochastic programs. SIAM J Optim 17:511–525

Higle JL, Rayco B, Sen S (2005) Stochastic scenario decomposition for multi-stage stochastic programs. Ann Oper Res (submitted). http://www.sie.arizona.edu/faculty/higle/

Høyland K, Kaut M, Wallace SW (2003) A heuristic for moment-matching scenario generation. Comput Optim Appl 24:169–185

King AJ (1988) Stochastic programming problems: examples from the literature. In: Ermoliev Y, Wets RJ-B (eds) Numerical techniques for stochastic optimization. Springer series in computational mathematics, vol 10. Springer, Berlin Heidelberg New York, pp 543–567

Lougee-Heimer R (2003) The common optimization INterface for operations research. IBM J Res Dev 47(1):57–66. http://www.coin-or.org

Nowak I (2005) Relaxation and decomposition methods for mixed integer nonlinear programming. International series of numerical mathematics, vol 152. Birkhäuser, Basel

Nowak MP, Römisch W (1997) Optimal power dispatch via multistage stochastic programming. In: Brøns M, Bendsøe MP, Sørensen MP (eds) Progress in industrial mathematics at ECMI 96. Teubner, pp 324–331

Pflug GC (2001) Scenario tree generation for multiperiod financial optimization by optimal discretization. Math Program 89:251–271

Römisch W (2003) Stability of stochastic programming problems. In: Ruszczyński and Shapiro (2003), chap 8, pp 483–554

Ruszczyński A, Shapiro A (eds) (2003) Stochastic programming. Handbooks in operations research and management science, vol 10. Elsevier Science, Amsterdam

Shapiro A (2006) Proceedings of the international symposium on mathematical programming (submitted)

Vigerske S (2005) Adaptive discretization of stochastic programs. Diploma thesis, Humboldt-Universität zu Berlin