
Optimal Budget Allocation in the Evaluation of Simulation-Optimization Algorithms

ERIC ANJIE GUO and SHANE G. HENDERSON, Cornell University

To efficiently evaluate simulation-optimization algorithms, we propose three different performance measures and their respective estimators. Only one estimator achieves the canonical Monte Carlo convergence rate $O(T^{-1/2})$, while the other two converge at the sub-canonical rate of $O(T^{-1/3})$. For each estimator, we study how the computational budget should be allocated between the execution of the optimization algorithm and the assessment of the output, so that the mean squared error of the estimator is minimized.

General Terms: Simulation, Optimization, Performance

ACM Reference Format:
Guo, E. A., and Henderson, S. G. 2012. Optimal Budget Allocation in the Evaluation of SO Algorithms. ACM Trans. Model. Comput. Simul. 9, 4, Article 39 (March 2010), 23 pages.
DOI = 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000

1. INTRODUCTION

Simulation Optimization captures the special case of mathematical programming problems where the objective function and constraints are expressed implicitly through a stochastic simulation. The stochastic simulation, when provided with an input $x$, is "executed at the design $x$" to provide an estimate of the objective function at $x$. The objective function $g(x)$ is defined as the expectation of the random variable $G(x)$:

\[ g(x) := \mathbb{E}\, G(x). \]

The random variable $G(x)$ is the estimate provided by the stochastic simulation, and the replications are assumed to be independent and identically distributed. In practice, the overhead of any simulation-optimization algorithm is usually dominated by the computational effort that has to be expended on the evaluation of the estimates $G(x)$. In light of that, we measure the computational budget for simulation-optimization algorithms by the number of times the stochastic simulation is executed, and denote the budget by $t$. We view $t$ as a fixed parameter specified by the user.

Let $V(t)$ denote the true objective function value associated with the estimated (and therefore random) optimal solution $X(t)$ after the budget $t$ has been expended, for a given algorithm, namely

\[ V(t) := g(X(t)). \tag{1} \]

In this paper we develop estimators of the mean, distribution function, and quantiles of $V(t)$. These estimators can be used to characterize the performance of a given algorithm, or to rank different algorithms applied to a class of simulation-optimization problems. The variance of the estimators converges to 0 as we run multiple replications of the simulation-optimization algorithm and get estimated optimal solutions $X(t)$. Our estimators of the distribution function and quantiles of $V(t)$ are biased, and the bias is reduced as we compute more precise estimates of $g(\cdot)$ at each $X(t)$. Within a given computational effort $T$, not to be confused with the budget $t$ for the simulation-optimization algorithm on one replication, we trade off variance and bias to minimize the mean squared error (MSE) of the estimators. Our results indicate that the estimator of the mean converges at the canonical Monte Carlo convergence rate $T^{-1/2}$, while sub-canonical rates are achieved for the distribution function and quantiles.

We further generalize our results to the scenario where the budget $t$ is allowed to assume multiple values, and we might then plot the true objective function value $g(X(t))$ associated with the estimated optimal solution $X(t)$ as a function of $t$. This plot illustrates how the quality of the optimal solution as estimated by the algorithm evolves over time. Such performance measures plots are present in various commercial simulation-optimization packages. Figure 1 shows one example of the Performance Measures Plot as in SimRunner [ProModel 2011].

Fig. 1. Performance Measures Plot from SimRunner

ACM Transactions on Modeling and Computer Simulation, Vol. 9, No. 4, Article 39, Publication date: March 2010.

When multiple estimators with different budgets $t_k$, $k = 1, \ldots, l$ are considered, the optimal budget allocation scheme is also solved with the goal being to minimize the sum or maximum of MSEs across all budgets. Whether the performance measure is the mean, distribution function or quantile, the same convergence rate as in the single-budget case is attained. Furthermore, different simulation approaches give rise to different computational cost structures, due to the possibility of reusing sample paths for some classes of simulation-optimization algorithms.

For each scenario discussed above, we provide an intuitive interpretation for the optimal budget allocation scheme and the resulting convergence rate of the estimator with respect to the total computational budget.

The estimators discussed in this paper are all constructed via nested simulation, which has been adopted to estimate the probability of loss, Value at Risk (VaR), and expected shortfall (ES) of portfolios in financial applications. The VaR was first analyzed in the context of nested simulation by Lee [1998]; Lee and Glynn [2003], where central limit theorems were established for estimators of the distribution function and quantiles. We exploit these results and generalize the central limit theorem to a multidimensional scenario in order to demonstrate the asymptotic properties of our estimators in the multiple-budget case. Nested-simulation problems were subsequently considered by Gordy and Juneja [2010], who suggested an optimal budget allocation that minimizes the MSE of each estimator. Despite the different context, the optimization problems in Gordy and Juneja [2010] bear a strong resemblance to the single-budget optimal budget allocation problems in this paper, and the resulting solutions are similar as well. We also refer the reader to the work of Broadie et al. [2011], which proposed a nonuniform nested simulation algorithm. That paper differs from previous work and from this paper in that the computational effort is not determined prior to the simulation via deterministic optimization, but allocated in a sequential fashion. It may be possible to improve the convergence rate of some of the estimators discussed in this paper by adopting an adaptive inner simulation length and applying the methodology of Broadie et al. [2011]. That would require a sequential allocation scheme and analysis that is substantially different from the static allocations considered herein.

Our work should not be confused with Optimal Computing Budget Allocation (OCBA), another line of research that primarily focuses on attaining the highest simulation decision quality using a fixed computing budget $t$. An introduction to OCBA ideas can be found in Chen et al. [2008]; Fu et al. [2008]; Chen and Lee [2011].

In our view the primary contributions of this paper are as follows.

— We draw attention to the finite-time properties of simulation-optimization algorithms, as a complement to convergence results. Finite-time properties for practically relevant simulation-optimization budgets ($t$) are of great interest, but appear to be difficult to obtain in any kind of analytic fashion. The results in this paper are relevant for estimating these properties on test problems, as in a research agenda advanced in Fu [2002]; Glynn [2002]; Pasupathy and Henderson [2006, 2011].

— We highlight $V(t)$ as an important performance measure in these settings, and show how to estimate its distributional properties through nested simulation.

— We obtain the asymptotically optimal (non-sequential) design of nested simulations in the sense of minimizing mean-squared error. The resulting expressions provide insight on how to design simulation experiments to estimate properties of $V(t)$, both in the single-budget case (one value of $t$) and for multiple budgets.

— We develop multivariate central limit theorems for the distribution function and quantiles in the multiple-budget case.

The rest of this paper is organized as follows. Section 2 lays out the framework and basic notation for optimal budget allocation in the evaluation of simulation-optimization algorithms. Estimators of the mean, distribution function, and quantiles of $V(t)$ are analyzed in Section 3, Section 4, and Section 5, respectively. Each section treats the single-budget and multiple-budget cases separately, and budget allocation schemes are established for both. In particular, the optimal convergence rates can be different depending on whether the feasible region is discrete or continuous, as will be seen in Sections 4 and 5. In Section 6, we focus on the scenario where sample paths are reused, and thus correlation is introduced between the estimators for different budgets $t_k$, $k = 1, \ldots, l$. Multivariate central limit theorems are established for each type of estimator, which can be proved by generalizing the results of Lee [1998]; Lee and Glynn [2003]. A summary of the results is given in Section 7. We provide proof sketches of some of the results herein, and complete proofs in an online appendix.

2. FRAMEWORK

Suppose that the budget $t$ is fixed, and we wish to estimate one of the aforementioned performance measures. A simulation approach to this problem is as follows. Run the simulation-optimization algorithm with a budget $t$, $n$ independent times. Let $X_1(t), \ldots, X_n(t)$ be the resulting estimates of optimal solutions. Now estimate $g(X_i(t))$ at each point $X_i(t)$ with a simulation run of length $m$, for each $i = 1, \ldots, n$, yielding estimates $Z_1(t), \ldots, Z_n(t)$ of $g(X_1(t)), \ldots, g(X_n(t))$ respectively, each of which is a sample average of (conditionally) i.i.d. observations:

\[ Z_i(t) := \frac{1}{m} \sum_{j=1}^{m} G_j(X_i(t)). \tag{2} \]

Then the performance measure is estimated using $Z_1(t), \ldots, Z_n(t)$. For example, for estimating the $p$th quantile of $g(X(t))$, we use the $p$th sample quantile of $Z_1(t), \ldots, Z_n(t)$.
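The nested procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy objective, the random-search "algorithm", and the names `nested_estimates`, `sample_quantile`, `simulate`, and `run_algorithm` are hypothetical and do not come from the paper.

```python
import math
import random
import statistics

def nested_estimates(run_algorithm, simulate, t, n, m, rng):
    """Return Z_1(t), ..., Z_n(t) as in (2): one inner sample average per macro replication."""
    z = []
    for _ in range(n):
        x = run_algorithm(t, rng)  # estimated optimal solution X_i(t), costs t runs
        z.append(sum(simulate(x, rng) for _ in range(m)) / m)  # Z_i(t), costs m runs
    return z

def sample_quantile(z, p):
    """p-th sample quantile, taken as the ceil(p * n)-th order statistic."""
    ordered = sorted(z)
    return ordered[max(math.ceil(p * len(ordered)) - 1, 0)]

# Hypothetical toy problem: maximize g(x) = -x^2, observed with N(0, 1) noise.
def simulate(x, rng):
    return -x * x + rng.gauss(0.0, 1.0)

# Hypothetical stand-in algorithm: random search that spends t simulation runs.
def run_algorithm(t, rng):
    candidates = [rng.uniform(-1.0, 1.0) for _ in range(t)]
    return max(candidates, key=lambda x: simulate(x, rng))

rng = random.Random(0)
z = nested_estimates(run_algorithm, simulate, t=20, n=200, m=50, rng=rng)
mean_estimate = statistics.fmean(z)          # estimates E V(t)
median_estimate = sample_quantile(z, 0.5)    # estimates the median of V(t)
```

In this sketch the total effort is $n(t + m) = 200 \times 70 = 14{,}000$ simulation runs, which is the quantity the later sections allocate between $n$ and $m$.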



We can also vary the budget $t$ to obtain further information on the performance of a single algorithm on a single problem. Let $t_1, \ldots, t_l$ be the budgets to be considered. Without loss of generality, they are arranged in ascending order, namely $t_1 < \cdots < t_l$. Let $V(t_k)$ denote the true objective function value associated with the estimated optimal solution $X(t_k)$ after the budget $t_k$ has been expended, i.e., $V(t_k) := g(X(t_k))$.

The number of times the simulation-optimization algorithm is run is allowed to vary with the budget $t$. For each budget $t_k$, $k = 1, \ldots, l$, run the simulation-optimization algorithm $n_k$ independent times. Let $X_i(t_k)$, $i = 1, \ldots, n_k$ be the resulting estimates of optimal solutions. The estimates of $V(t_k)$ are of the same form as (2):

\[ Z_i(t_k) := \frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)). \]

The estimators of the performance measures with respect to each budget $t_k$ are formulated in the same way as in the single-budget context.

Let $T$ denote the total computational budget available for evaluating the simulation-optimization algorithm, not to be confused with the budget $t$ for running the simulation-optimization algorithm itself. We measure $T$ in the same units as $t$, namely the number of times the stochastic simulation is executed. For a given $t$, let $n, m$ be chosen so that the computational effort required to estimate the performance measure is approximately $T$. Since $n$ instances of $X(t)$ need to be simulated, and each requires $m$ estimates of $G(X(t))$, the aggregate effort is $tn + mn$, and thus

\[ n(t + m) = T. \tag{3} \]

In practice, $n$ and $m + t$ must be positive integers and lie in the interval $[1, T]$. We relax these restrictions, because our goal is insight into the form of the optimal allocations as the computational budget $T \to \infty$, and imposing these restrictions with the ensuing notational complication does not lead to further insight or clarity. For some problem parameters, this relaxation could lead to "optimal" values of $t + m$ or $n$ that are less than 1 or greater than $T$. In such cases the optimal value is instead the projection onto the interval $[1, T]$, because our recommendations arise from solving convex minimization problems. In any case, for sufficiently large $T$, both $n$ and $m$ will be smaller than $T$, so the upper bound of $T$ is unnecessary from an asymptotic standpoint.

In principle, in order to generate a performance measures plot as in Figure 1, we need to estimate the value of $V(t)$ for all $t > 0$. From a practical standpoint, we restrict our attention to a finite set of budgets $t_1 < t_2 < \cdots < t_l$. We assume that these budgets are given exogenously, rather than attempting to select them as part of our estimation procedures. Indeed, these budgets will likely be related to computational budgets that are practically feasible, and therefore should be chosen by the user of a simulation-optimization algorithm. Consequently, we focus on delivering high-quality estimators of $V(t)$ for any specified budget $t$, but we do not view $t$ as a decision variable.

There are two possible cost structures associated with the multiple-budget scenario. First, if the simulation has to be done independently for each budget $t_k$, then the total computational budget is simply

\[ T = \sum_{k=1}^{l} T_k, \tag{4} \]

where

\[ T_k := n_k (t_k + m_k) \tag{5} \]

is the aggregate effort spent to compute the performance measure for budget $t_k$. Second, for many simulation-optimization algorithms, the partial sample paths of estimated optimal solutions can be reused. In this case, the expression for the total computational cost is more involved, as discussed in Section 3.3.

[Figure 2: Performance Measures Plot. Horizontal axis: Computational Budget (0 to 1000). Vertical axis: Estimated Objective Function Value (about −1.73 to −1.685).]

Fig. 2. Performance Measures Plot of Scaled and Shifted Kiefer-Wolfowitz Stochastic Approximation Algorithm applied to $G(x) = -x^{\top} x + \mathcal{N}(0, 1)$, $x \in \mathbb{R}^5$: RMSEs are displayed at each computational budget.

In the single-budget case, our goal is to balance the computational effort between generating optimal solutions and valuing the quality of the solutions, so that the MSE of the estimator is minimized. This problem was discussed in the context of portfolio risk measurement in, e.g., Gordy and Juneja [2010]; Broadie et al. [2011]; see also [Lee 1998; Lee and Glynn 2003].

In Figure 2, an example of a performance measures plot is illustrated, and the root-mean-square errors (RMSE) at each computational budget $t_k$ are shown as error bars. The error bars indicate the accuracy of the graph of estimated objective-function values, so we want to minimize their width collectively. There are many ways to accomplish this, but for simplicity we do so by minimizing the sum or maximum of the MSEs across all budgets $t_k$, $k = 1, \ldots, l$ under consideration.

3. EXPECTATIONS

One way to examine the quality of an estimated optimal solution $X(t)$ produced by a simulation-optimization algorithm is to take the expectation of the true objective function value, i.e., $\mathbb{E} V(t) = \mathbb{E}\, g(X(t))$; see (1). In this section, we propose an estimator $\hat{\mu}$ for this expectation, and present the optimal budget allocation for both single and multiple budgets. It turns out that the estimator achieves the canonical Monte Carlo convergence rate $O(T^{-1/2})$.

3.1. The Single-Budget Case

Suppose that the budget $t$ is fixed. In order for the performance measure to be well-defined we assume that $V(t)$ has finite expectation, which holds, for example, when $g(\cdot)$ is a continuous function and the feasible region of $x$ is compact.



The obvious estimator of $\mathbb{E} V(t)$ is

\[ \hat{\mu} := \frac{1}{n} \sum_{i=1}^{n} Z_i(t), \]

which is unbiased.

Definition 3.1. Let $\rho(t)$ be the correlation between the estimated objective function values at a common estimated optimal solution $X(t)$ after the budget $t$ has been expended, so that

\[ \rho(t) := \operatorname{corr}(G_1(X(t)), G_2(X(t))). \]

To compute $\rho(t)$, first we compute

\begin{align*}
\operatorname{Cov}(G_1(X(t)), G_2(X(t))) &= \mathbb{E}\big[(G_1(X(t)) - \mathbb{E} V(t))(G_2(X(t)) - \mathbb{E} V(t))\big] \\
&= \mathbb{E}\big[\mathbb{E}[(G_1(X(t)) - \mathbb{E} V(t))(G_2(X(t)) - \mathbb{E} V(t)) \mid X(t)]\big] \\
&= \mathbb{E}\big[\mathbb{E}[G_1(X(t)) - \mathbb{E} V(t) \mid X(t)]\; \mathbb{E}[G_2(X(t)) - \mathbb{E} V(t) \mid X(t)]\big] \\
&= \mathbb{E}\big[(g(X(t)) - \mathbb{E} V(t))(g(X(t)) - \mathbb{E} V(t))\big] \\
&= \operatorname{Var} g(X(t)).
\end{align*}

It follows that the correlation $\rho(t)$ can also be written as $\rho(t) = \operatorname{Var} g(X(t)) / \operatorname{Var} G(X(t))$. In other words, it is the proportion of variability of $G(X(t))$ caused by the common factor $X(t)$.

Consequently, the correlation $\rho(t)$ is well-defined if and only if $\operatorname{Var} G(X(t))$ is. Thus, we also assume finite variance of the estimate at the random estimated optimal solution, or

\[ \operatorname{Var} G(X(t)) < \infty. \]

If $g(\cdot)$ is a continuous function on a compact space, then the condition holds, e.g., when $\operatorname{Var} G(x)$ is bounded uniformly on its domain.

By conditioning on the estimated optimal solution $X(t)$, we obtain the following result.

PROPOSITION 3.2. The correlation $\rho(t)$ is nonnegative, and the MSE of $\hat{\mu}$ with respect to $\mathbb{E} V(t)$ is

\[ \mathrm{MSE}_{\hat{\mu}} := \mathrm{MSE}(\hat{\mu}, \mathbb{E} V(t)) = \frac{1 + (m - 1)\rho(t)}{nm} \operatorname{Var} G(X(t)). \]

We now turn to identifying the values of $n$ and $m$ that minimize this MSE, subject to the constraint that $n(t + m) = T$ as in (3), where we relax the constraint that $n$ and $t + m$ should be integers in $[1, T]$.

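THEOREM 3.3. The values of $n$ and $m$ that minimize $\mathrm{MSE}_{\hat{\mu}}$ are

\[ n = \frac{T}{\sqrt{t}\,\big[\sqrt{1/\rho(t) - 1} + \sqrt{t}\big]}, \qquad m = \sqrt{(1/\rho(t) - 1)\, t}, \tag{6} \]

and the resulting minimum MSE is

\[ \mathrm{MSE}_{\hat{\mu}} = \Big[\sqrt{1 - \rho(t)} + \sqrt{\rho(t)\, t}\Big]^2 \frac{\operatorname{Var} G(X(t))}{T}. \]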

A proof is given in the Electronic Appendix. This result can also be viewed as a special case of splitting; cf. Asmussen and Glynn [2007, p. 147–149]. To see why, recall that in splitting we wish to estimate $\mathbb{E}\varphi(X, Y)$ for some (known) function $\varphi$ of two random variables $X$ and $Y$. In our setting $\varphi(x, y) = y$, $X = X(t)$ and $Y = G(X(t))$. The method of splitting involves using an estimator of the form

\[ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m} \sum_{j=1}^{m} G_j(X_i(t)) = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \varphi(X_i(t), G_j(X_i(t))), \tag{7} \]

where we use multiple samples of $Y$ for each sample of $X$. Since $\hat{\mu}$ is unbiased, to minimize its MSE it is sufficient to find the values of $n$ and $m$ that minimize its variance, which follow directly from the general theory in Asmussen and Glynn [2007] for splitting.

It can be seen from (6) that $m$ is increasing in $t$ but decreasing in $\rho(t)$, which matches intuition. Indeed, when $t$ is large, the optimal solutions $X_1(t), \ldots, X_n(t)$ are costly to estimate, and so it is optimal to limit $n$ and use long "internal" simulation run lengths $m$. Meanwhile, if the correlation $\rho(t)$ is large, then most of the variability of $G(X(t))$ comes from the common factor $X(t)$, so additional inner replications at the same $X_i(t)$ carry little new information, and therefore the optimal $m$ decreases.

Notice that $m$ stays constant and does not increase with $T$. This is a direct consequence of the unbiasedness of the estimator. In this case the optimal value of $m$ is determined purely by the variance, which depends on $T$ only through the number of "macro" replications $n$.

The expression for the optimal MSE is of order $T^{-1}$, which shows that the estimator of $\mathbb{E} V(t)$ converges at the so-called canonical Monte Carlo convergence rate $T^{-1/2}$; see Asmussen and Glynn [2007, p. 70] for a discussion of this nomenclature.
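As a numerical check of Theorem 3.3, the following Python sketch evaluates the relaxed (non-integer) allocation in (6) and compares it with the MSE expression of Proposition 3.2. The function names and the parameter values are illustrative assumptions; in practice $\rho(t)$ and $\operatorname{Var} G(X(t))$ are unknown and would have to be estimated, for example from a pilot run. The sketch assumes $0 < \rho(t) < 1$.

```python
import math

def optimal_allocation(T, t, rho, var_G):
    """Relaxed minimizer of the MSE subject to n * (t + m) = T, following (6)."""
    m = math.sqrt((1.0 / rho - 1.0) * t)   # inner run length, independent of T
    n = T / (t + m)                        # number of macro replications
    min_mse = (math.sqrt(1.0 - rho) + math.sqrt(rho * t)) ** 2 * var_G / T
    return n, m, min_mse

def mse(n, m, rho, var_G):
    """MSE of the mean estimator from Proposition 3.2."""
    return (1.0 + (m - 1.0) * rho) / (n * m) * var_G

# Hypothetical inputs: T = 10,000 runs in total, t = 100, rho(t) = 0.2, Var G(X(t)) = 2.
T, t, rho, var_G = 10_000.0, 100.0, 0.2, 2.0
n_opt, m_opt, best = optimal_allocation(T, t, rho, var_G)

# Any other m on the budget line n = T / (t + m) should give a larger MSE.
worse_short = mse(T / (t + 10.0), 10.0, rho, var_G)
worse_long = mse(T / (t + 40.0), 40.0, rho, var_G)
```

With these inputs the optimal inner run length is $m = 20$, so each macro replication costs 120 runs and about 83 macro replications fit in the budget.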

3.2. The Multiple-Budget Case

Now we vary the budget $t$. Let $t_1 < t_2 < \cdots < t_l$ be the budgets under consideration, and $\mathbb{E} V(t_k)$, $k = 1, \ldots, l$ the performance measures that need to be estimated. The obvious estimator for each $\mathbb{E} V(t_k)$ is

\[ \hat{\mu}_k := \frac{1}{n_k} \sum_{i=1}^{n_k} Z_i(t_k). \]

Definition 3.4. For each $k = 1, \ldots, l$, let $\rho(t_k)$ be the correlation between the estimated objective function values at a common estimated optimal solution $X(t_k)$ after the budget $t_k$ has been expended:

\[ \rho(t_k) := \operatorname{corr}(G_1(X(t_k)), G_2(X(t_k))) = \operatorname{Var} g(X(t_k)) / \operatorname{Var} G(X(t_k)), \]

where the latter expression ensures that $\rho(t_k)$ is nonnegative; see Proposition 3.2.

We first consider the scenario where sample paths cannot be reused, and simulations for each budget $t_k$ have to be performed independently. The case where sample paths are reused is discussed in Section 3.3.

In Section 2, we have established the constraints (4) and (5), namely $n_k(t_k + m_k) = T_k$ and $\sum_{k=1}^{l} T_k = T$. Theorem 3.3 guarantees that the optimal values of $n_k$ and $m_k$ must satisfy

\[ n_k = \frac{T_k}{\sqrt{t_k}\,\big[\sqrt{1/\rho(t_k) - 1} + \sqrt{t_k}\big]}, \qquad m_k = \sqrt{(1/\rho(t_k) - 1)\, t_k}, \tag{8} \]

given that $T$ is sufficiently large so that $m_k \le T_k$ for all $k$, and the minimum MSE of each $\hat{\mu}_k$ with respect to $\mathbb{E} V(t_k)$ is

\[ \mathrm{MSE}_{\hat{\mu}_k} := \mathrm{MSE}(\hat{\mu}_k, \mathbb{E} V(t_k)) = \Big[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\, t_k}\Big]^2 \frac{\operatorname{Var} G(X(t_k))}{T_k}. \tag{9} \]



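THEOREM 3.5. The minimum sum of $\mathrm{MSE}_{\hat{\mu}_k}$, $k = 1, \ldots, l$ is

\[ \sum_{k=1}^{l} \mathrm{MSE}_{\hat{\mu}_k} = \left[ \sum_{k=1}^{l} \Big[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\, t_k}\Big] \sqrt{\operatorname{Var} G(X(t_k))} \right]^2 \frac{1}{T}, \tag{10} \]

and is attained when

\[ T_k = \frac{\Big[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\, t_k}\Big] \sqrt{\operatorname{Var} G(X(t_k))}}{\sum_{h=1}^{l} \Big[\sqrt{1 - \rho(t_h)} + \sqrt{\rho(t_h)\, t_h}\Big] \sqrt{\operatorname{Var} G(X(t_h))}}\; T \tag{11} \]

for each $k = 1, \ldots, l$.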

It can be seen from (11) that each $T_k$ is proportional to $T$, which implies $T_k^{-1/2} = O(T^{-1/2})$. Combining with Theorem 3.3, it follows that each estimator $\hat{\mu}_k$ achieves the canonical Monte Carlo convergence rate $O(T^{-1/2})$, which is reflected in (10).

The intuition of (8) is identical to the single-budget case. Also, due to the constraint (4), each $T_k$ decreases as $T_h$, $h \ne k$ increases, and is therefore adversely affected by $t_h$ and $\rho(t_h)$.

PROOF SKETCH. Apply the Cauchy-Schwarz inequality to the product of (4) and (9) to get a lower bound on the sum of MSEs. The optimal values of $T_k$ are derived from the condition for equality to hold.

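COROLLARY 3.6. The optimal values of $n_k$ and $m_k$ that minimize the sum of $\mathrm{MSE}_{\hat{\mu}_k}$, $k = 1, \ldots, l$ are

\[ n_k = \frac{\sqrt{\operatorname{Var} G(X(t_k))\, \rho(t_k) / t_k}}{\sum_{h=1}^{l} \sqrt{\operatorname{Var} G(X(t_h))}\, \Big[\sqrt{1 - \rho(t_h)} + \sqrt{\rho(t_h)\, t_h}\Big]}\; T, \qquad m_k = \sqrt{(1/\rho(t_k) - 1)\, t_k}. \]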

By now it should not be surprising that the optimal value of $m_k$ remains constant (for sufficiently large $T$), which is the same as in the single-budget case. This is a direct consequence of splitting. Indeed, in Asmussen and Glynn [2007, p. 147–149], it was argued that the optimal length of the inner-level simulation can be determined without knowledge of the total computational budget. In the context of multiple-budget evaluation of simulation-optimization algorithms, this observation means that $m_k$ is independent of $T_k$ for each $k = 1, \ldots, l$, and hence independent of $T$ as well.
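The sum-of-MSEs allocation in Theorem 3.5 and Corollary 3.6 can be sketched as follows. The function name `allocate_sum_mse` and the numerical inputs are hypothetical, and the correlations and variances are treated as known, which they would not be in practice. Each $\rho(t_k)$ is assumed to lie strictly between 0 and 1.

```python
import math

def allocate_sum_mse(T, budgets, rhos, variances):
    """Relaxed allocation minimizing the sum of MSEs when sample paths are not reused."""
    # Weight of budget k: [sqrt(1 - rho_k) + sqrt(rho_k * t_k)] * sqrt(Var G(X(t_k))).
    weights = [
        (math.sqrt(1.0 - r) + math.sqrt(r * t)) * math.sqrt(v)
        for t, r, v in zip(budgets, rhos, variances)
    ]
    total = sum(weights)
    T_k = [w / total * T for w in weights]                                 # (11)
    m_k = [math.sqrt((1.0 / r - 1.0) * t) for t, r in zip(budgets, rhos)]  # (8)
    n_k = [Tk / (t + m) for Tk, t, m in zip(T_k, budgets, m_k)]            # from (5)
    min_sum = total ** 2 / T                                               # (10)
    per_budget_mse = [w ** 2 / Tk for w, Tk in zip(weights, T_k)]          # (9)
    return T_k, n_k, m_k, min_sum, per_budget_mse

# Hypothetical inputs for three budgets.
budgets, rhos, variances = [50.0, 100.0, 200.0], [0.5, 0.3, 0.2], [1.0, 2.0, 3.0]
T_k, n_k, m_k, min_sum, per_budget_mse = allocate_sum_mse(1e5, budgets, rhos, variances)
```

Note that `m_k` does not depend on `T`, in line with the splitting argument above, while `T_k` and `n_k` scale linearly in `T`.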

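An alternative to minimizing the sum of the MSEs is to minimize the maximum of the MSEs, which is equivalent to minimizing the width of the longest error bar in Figure 2.

THEOREM 3.7. The smallest possible maximum of $\mathrm{MSE}_{\hat{\mu}_k}$, $k = 1, \ldots, l$ is

\[ \max_{k=1,\ldots,l} \mathrm{MSE}_{\hat{\mu}_k} = \frac{1}{T} \sum_{k=1}^{l} \operatorname{Var} G(X(t_k)) \Big[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\, t_k}\Big]^2, \tag{12} \]

and it is attained when

\[ T_k = \frac{\operatorname{Var} G(X(t_k)) \Big[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\, t_k}\Big]^2}{\sum_{h=1}^{l} \operatorname{Var} G(X(t_h)) \Big[\sqrt{1 - \rho(t_h)} + \sqrt{\rho(t_h)\, t_h}\Big]^2}\; T \tag{13} \]

for each $k = 1, \ldots, l$.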



PROOF SKETCH. With the constraint (4), the maximum of $\mathrm{MSE}_{\hat{\mu}_k}$, $k = 1, \ldots, l$ is least when they are all equal. Impose such a condition in (9), and the optimal values of $T_k$ follow immediately.

The ratio $T_k/T$ is independent of the total computational budget $T$, whether we minimize the sum or maximum of the MSEs. Moreover, the convergence rate is unaffected by the choice of sum or maximum. In fact, as long as each ratio $T_k/T$ is independent of the total computational budget $T$, the optimal convergence rate will be attained in both cases, and the performance will differ only by a constant multiplier. Therefore, the choice of performance measure in the multiple-budget case is not vital in terms of convergence rate, and we only consider the "sum" performance measure, which is mathematically more tractable, in the remainder of the paper.
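For comparison, the minimax allocation of Theorem 3.7 can be sketched in the same way. As before, the function name and inputs are illustrative assumptions rather than part of the paper, and the parameters are treated as known.

```python
import math

def allocate_max_mse(T, budgets, rhos, variances):
    """Relaxed allocation minimizing the largest MSE when sample paths are not reused."""
    # Squared weight of budget k: Var G(X(t_k)) * [sqrt(1 - rho_k) + sqrt(rho_k * t_k)]^2.
    sq_weights = [
        v * (math.sqrt(1.0 - r) + math.sqrt(r * t)) ** 2
        for t, r, v in zip(budgets, rhos, variances)
    ]
    total = sum(sq_weights)
    T_k = [w / total * T for w in sq_weights]                  # (13)
    common_mse = total / T                                     # (12)
    per_budget_mse = [w / Tk for w, Tk in zip(sq_weights, T_k)]  # (9) at this T_k
    return T_k, common_mse, per_budget_mse

# Same hypothetical inputs as one might use for the sum criterion.
T_k_max, common_mse, mse_each = allocate_max_mse(
    1e5, [50.0, 100.0, 200.0], [0.5, 0.3, 0.2], [1.0, 2.0, 3.0]
)
```

Under this allocation every budget ends up with the same MSE, which is the equalization condition used in the proof sketch. The shares $T_k/T$ differ from those of the sum criterion but, as noted above, are again independent of $T$.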

3.3. Reusing Partial Sample Paths

Suppose the budgets $t_k$, $k = 1, \ldots, l$ are increasing in $k$, namely $t_1 < \cdots < t_l$. For many simulation-optimization algorithms terminated at a computational budget $t_k$ with an estimated optimal solution $X(t_k)$, the partial sample path $X(t_1), \ldots, X(t_{k-1})$ can be reused in $\hat{\mu}_1, \ldots, \hat{\mu}_{k-1}$, respectively.

For example, consider the class of stochastic approximation algorithms, which includes the Kiefer-Wolfowitz [Kiefer and Wolfowitz 1952] and Robbins-Monro [Robbins and Monro 1951] algorithms. A typical iteration of these algorithms is of the form

\[ X(j + 1) \leftarrow X(j) + \varepsilon_j K\, Y(j + 1), \]

where $K$ is a given square matrix and, conditional on $X(j)$, $Y(j + 1)$ is an estimator of $\nabla g(X(j))$.

Typically, $\{\varepsilon_j\}_{j>0}$ is a sequence of pre-specified positive deterministic constants that does not depend on the simulation-optimization budget $t$. If the simulation-optimization algorithm is used with budget $t_k$, then when it terminates with the estimated optimal solution $X(t_k)$, we would have iterated through the sequence of solutions $\{X(t)\}_{t>0}$, and hence the intermediate optimal solutions $X(t_1), \ldots, X(t_{k-1})$ for budgets $t_1, \ldots, t_{k-1}$ are available with no additional computational cost. Hence the budget constraint takes the form of (14).

Sometimes, however, the sequence $\{\varepsilon_j\}_{j>0}$ depends on the computational budget $t$. For example, in Nemirovski et al. [2008], $\varepsilon_j = \varepsilon$ does not depend on $j$, but does depend on the (prespecified) budget $t$. In this case, the sample paths with each budget $t_k$ cannot be reused for other budgets, since different values of $\varepsilon$ are used for each budget.
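The reuse of partial sample paths can be illustrated with a one-dimensional Kiefer-Wolfowitz-style recursion whose gain sequences are fixed in advance. Everything specific in this sketch is an assumption made for illustration: the toy objective $g(x) = -x^2$, the gains $\varepsilon_j = 1/j$ and finite-difference widths $c_j = j^{-1/4}$, the choice $K = 1$, and the function name `kw_path`.

```python
import random

def kw_path(budgets, rng, x0=1.0):
    """Run one stochastic-approximation path up to max(budgets), recording X(t_k) on the way."""
    def G(x):
        # Hypothetical simulation output: g(x) = -x^2 plus N(0, 1) noise.
        return -x * x + rng.gauss(0.0, 1.0)

    snapshots = {}
    x, used, j = x0, 0, 1
    for t_k in sorted(budgets):
        while used + 2 <= t_k:                # each iteration costs two simulation runs
            eps, c = 1.0 / j, j ** -0.25      # gains do not depend on the budget
            grad = (G(x + c) - G(x - c)) / (2.0 * c)   # finite-difference gradient estimate
            x += eps * grad                   # ascent step with K = 1
            used += 2
            j += 1
        snapshots[t_k] = x                    # X(t_k) is a by-product of the longer run
    return snapshots, used

long_run, cost = kw_path([10, 50, 200], random.Random(1))
short_run, _ = kw_path([10], random.Random(1))
```

With a common random-number stream, `short_run[10]` coincides with `long_run[10]`: the path run to budget 200 passes through the same iterate that a run with budget 10 would return, and its total cost is 200 runs rather than 10 + 50 + 200. If the gains depended on the budget, as in the constant-step example above, this coincidence would fail and each budget would need its own run.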

Assuming that the estimated optimal solutions $X(t_1), \ldots, X(t_{k-1})$ are reused, their effective computational costs are all 0. Alternatively, one could think of $X(t_1)$ having a computational cost of $t_1$, but the cost of generating each $X(t_k)$ with $k > 1$ to be $t_k - t_{k-1}$, as the first $t_{k-1}$ units of time have been expended to compute $X(t_1), \ldots, X(t_{k-1})$. Hence, for given $n_k, m_k$, $k = 1, \ldots, l$, the total computational cost for all the estimators $\hat{\mu}_1, \ldots, \hat{\mu}_l$ is

\[ T = \sum_{k=1}^{l} n_k (t_k - t_{k-1} + m_k), \tag{14} \]

where $t_0 = 0$.

The new budget constraint (14) takes the same form as the budget constraint (4) and (5) without sample-path reuse, only with each $t_k$ replaced by $t_k - t_{k-1}$. Therefore, if no other constraints are imposed on $n_k$ and $m_k$, then their optimal values take the same form as in Corollary 3.6, only with each $t_k$ replaced by $t_k - t_{k-1}$, i.e.,

\[ n_k = \frac{\sqrt{\rho(t_k) \operatorname{Var} G(X(t_k)) / (t_k - t_{k-1})}}{\sum_{h=1}^{l} \sqrt{\operatorname{Var} G(X(t_h))}\, \Big[\sqrt{1 - \rho(t_h)} + \sqrt{\rho(t_h)(t_h - t_{h-1})}\Big]}\; T, \tag{15} \]

\[ m_k = \sqrt{(1/\rho(t_k) - 1)(t_k - t_{k-1})}, \]

and the minimum sum of $\mathrm{MSE}_{\hat{\mu}_k}$ is

\[ \sum_{k=1}^{l} \mathrm{MSE}_{\hat{\mu}_k} = \left[ \sum_{k=1}^{l} \sqrt{\operatorname{Var} G(X(t_k))}\, \Big[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)(t_k - t_{k-1})}\Big] \right]^2 \frac{1}{T}. \]

This derivation relies on the assumption that all the sample paths are reused. Under this assumption, the sample sizes $n_k$, $k = 1, 2, \ldots, l$ are nested, so there is an additional implicit constraint that $n_1 \ge \cdots \ge n_l$. This constraint will always be satisfied by the optimal solution. Indeed, whether the performance measure is the sum or the maximum of the MSEs, it is certainly monotone increasing in each of the MSEs. A non-nested solution would imply that we would not use some of the partial sample paths, in effect reducing the sample size $n_k$ used in computing $\hat{\mu}_k$ for some $k$. But this would increase $\mathrm{MSE}_{\hat{\mu}_k}$ relative to reusing the sample paths. Therefore, we do not need to impose a constraint that the sample sizes $n_k$ are decreasing in $k$.

4. DISTRIBUTION FUNCTION

The mean is sensitive to extreme values. In particular, if a simulation-optimization algorithm outputs a good quality solution most of the time, our intuition would suggest that this is a good algorithm. However, if the algorithm occasionally goes awry, giving an extremely poor solution with a small probability, then the estimator $\hat{\mu}$ could be significantly affected. We might then draw the conclusion that the algorithm is unacceptable, which is not the case.

An alternative performance measure is the distribution function value $F(r) := \mathbb{P}(V(t) \le r)$ for some fixed $r$. Essentially, $\mathbb{P}(V(t) \le r)$ measures the probability that the estimated optimal solution $X(t)$ is of low quality in terms of the true objective function value $g(X(t))$ (for maximization problems).

In this section, we propose an estimator $\hat{F}(r)$ for the performance measure $F(r)$ and show that it converges at a sub-canonical rate. In particular, if the simulation-optimization problem in question has a discrete feasible region, then the convergence rate is $O(\sqrt{(\log T)/T})$, while the rate is $O(T^{-1/3})$ if the feasible region is continuous.

We also solve for the optimal simulation run length parameters $n, m$ that minimize the asymptotic (as the budget available for evaluating the simulation-optimization algorithm increases) MSE (AMSE). Our argument involves expressing $F(r)$ as the distribution function of a conditional expectation, and invoking results from Lee [1998]; Lee and Glynn [2003]. This is also the approach we follow in Section 5 and Section 6.

4.1. Monte Carlo Computation of Conditional Expectation Quantiles

Using (1), $F(r)$ can be expressed as

\[ F(r) := \mathbb{P}(V(t) \le r) = \mathbb{P}(g(X(t)) \le r) = \mathbb{P}\big(\mathbb{E}[G(X(t)) \mid X(t)] \le r\big), \]

which is the distribution function of a conditional expectation. The obvious estimator is then

\[ \hat{F}(r) := \frac{1}{n} \sum_{i=1}^{n} I(Z_i(t) \le r). \tag{16} \]
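To see why the estimator (16) is biased for finite $m$, it may help to look at a single macro replication with $g(X_i(t)) = v$ held fixed. The sketch below assumes, purely for illustration, that the inner sample average is exactly normal, $Z_i(t) \sim \mathcal{N}(v, \sigma^2/m)$; the function name and the numerical values are hypothetical.

```python
from statistics import NormalDist

def expected_indicator(v, r, sigma, m):
    """E[I(Z <= r)] when Z ~ N(v, sigma^2 / m), i.e. conditional on g(X_i(t)) = v."""
    return NormalDist(mu=v, sigma=sigma / m ** 0.5).cdf(r)

# Here v = -0.1 > r = -0.2, so the exact indicator I(v <= r) equals 0.
v, r, sigma = -0.1, -0.2, 1.0
true_indicator = 1.0 if v <= r else 0.0
bias_by_m = {
    m: expected_indicator(v, r, sigma, m) - true_indicator
    for m in (10, 100, 1000)
}
```

Under this normality assumption the conditional bias is $\Phi((r - v)\sqrt{m}/\sigma)$, which is about 0.16 at $m = 100$ and shrinks toward 0 as $m$ grows. Averaging over more macro replications $n$ reduces only the variance of $\hat{F}(r)$, so $m$ must also grow with $T$, which is the source of the sub-canonical rates below.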



The convergence rate of (16) is presented in Lee [1998]; Lee and Glynn [2003], where central limit theorems are also established. As the main result of this paper concerns the optimal values of $n$ and $m$, only the cases with the optimal rate of convergence are stated here with an adaptation to the simulation-optimization algorithm context. For central limit theorems with a suboptimal rate of convergence, see Lee [1998]; Lee and Glynn [2003].

For central limit theorems in both the discrete and continuous feasible region cases, we require a large number of technical conditions, as stated in Appendix A and Appendix B, respectively. These appear to be unavoidable, as they arise in the proof of the central limit theorems derived in Lee [1998]; Lee and Glynn [2003]. A different set of conditions is provided in Gordy and Juneja [2010].

4.1.1. Discrete Feasible Region. Suppose that the range of the estimated optimal solution $X(t)$ is the discrete set

\[ S = \{x_1, x_2, \ldots\}. \]

Theorem 4.1 is a central limit theorem for $\hat{F}(r)$ in the case of a discrete feasible region, and is established in Lee and Glynn [2003, p. 12]. Here, $\eta^*$ is a positive constant relating to the large-deviations behavior of $G(x)$ that does not depend on $t$, and its definition can be found in Appendix A.

In the sequel we use the notation $g(T) \sim f(T)$ to mean that $\lim_{T \to \infty} g(T)/f(T) = 1$. For example, when we write $m \sim a \log T$, we mean that $m$ depends on $T$, and $m = m(T)$ is such that $m(T)/(a \log T) \to 1$ as $T \to \infty$.

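THEOREM 4.1. Assume the conditions A1-A6 as stated in Appendix A, and that $m \sim a \log T$, $n \sim T/m$, as $T \to \infty$, where $a \ge 1/(2\eta^*)$. Then

\[ \sqrt{\frac{T}{\log T}}\, \big[\hat{F}(r) - F(r)\big] \Rightarrow \sqrt{a F(r)(1 - F(r))}\; \mathcal{N}(0, 1) \]

as $T \to \infty$.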

We will use the following specialized definition of the asymptotic mean squared error (AMSE). Let $W_n$ be a sequence of estimators of the parameter $\theta$, and $\{a_n\}$ a sequence of positive numbers satisfying $a_n \to \infty$ or $a_n \to a > 0$. If the central limit theorem holds in the sense of $a_n(W_n - \theta) \Rightarrow Z$ with $0 < \mathbb{E} Z^2 < \infty$, then the asymptotic mean squared error of $W_n$ with respect to $\theta$ is defined as

\[ \mathrm{AMSE}_{W_n} := \frac{\mathbb{E} Z^2}{a_n^2}. \]

Under a uniform integrability assumption, we obtain the following result for the AMSE of $\hat{F}(r)$.

COROLLARY 4.2. In addition to the conditions of Theorem 4.1, assume that the square of the left-hand side in Theorem 4.1 is uniformly integrable. The asymptotic mean squared error (AMSE) of $\hat{F}(r)$ with respect to $F(r)$ is

\[ \mathrm{AMSE}_{\hat{F}(r)} := \mathrm{AMSE}(\hat{F}(r), F(r)) = a F(r)(1 - F(r)) \frac{\log T}{T}. \]

As in Section 3, for any total computational budget $T$, the MSE can be minimized by choosing appropriate parameters in the expressions for $n, m$.

THEOREM 4.3. Under the assumptions of Theorem 4.1 and Corollary 4.2, the optimal value of $a$ that minimizes $\mathrm{AMSE}_{\hat{F}(r)}$ is

\[ a = \frac{1}{2\eta^*}, \]

and the minimum AMSE of $\hat{F}(r)$ is

\[ \mathrm{AMSE}_{\hat{F}(r)} = \frac{F(r)(1 - F(r)) \log T}{2\eta^* T}. \]

Here we see that $\hat{F}(r)$ has a sub-canonical convergence rate of $O(\sqrt{(\log T)/T})$ when the feasible region is discrete. In fact, none of the estimators discussed in this paper other than $\hat{\mu}$ attains the canonical convergence rate $O(T^{-1/2})$.

4.1.2. Continuous Feasible Region. We now establish an analogous result when the estimated optimal solutions $X(t)$ have a continuous support $\Gamma$.

Define

\[ Y(x) := \frac{g(x) - r}{\sigma(x)}, \]

for each possible $x \in \Gamma$, where $g(x) := \mathbb{E}\, G(x)$ and $\sigma^2(x) := \operatorname{Var} G(x)$.

Let $Y$ denote $Y(X(t))$, and assume that it has a density $f_Y(\cdot)$. Denote by $f_Y^{(k)}(\cdot)$ the $k$-th derivative of $f_Y(\cdot)$.

The central limit theorem for $\hat{F}(r)$ for a continuous feasible region is established in Lee [1998, p. 56]. Here we only state the particular case where the optimal convergence rate is attained.

THEOREM 4.4. Assume the conditions B1-B7 in Appendix B hold. If $T^{-1/3} m \to a > 0$ and $T^{-2/3} n \to b > 0$, where $0 < a, b < \infty$, then as $T \to \infty$,

$$ T^{1/3}\bigl[\hat F(r) - F(r)\bigr] \Rightarrow \sqrt{\frac{F(r)(1 - F(r))}{b}}\; N(0, 1) + \frac{f_Y^{(1)}(0)}{2a}. $$

With the given choice of scalings, the squared bias and the variance are balanced: both are of order $O(T^{-2/3})$, so the bias and the standard deviation are both of order $O(T^{-1/3})$, and hence both show up in the asymptotic distribution.

Similar to the discrete feasible region case, the approximate formula for the asymptotic mean squared error follows immediately from the central limit theorem.

COROLLARY 4.5. In addition to the conditions of Theorem 4.4, assume that the square of the left-hand side in Theorem 4.4 is uniformly integrable. Then the AMSE of $\hat F(r)$ with respect to $F(r)$ is

$$ \mathrm{AMSE}\,\hat F(r) = \left[\frac{f_Y^{(1)}(0)^2}{4a^2} + \frac{F(r)(1 - F(r))}{b}\right] T^{-2/3}. $$

The values of a, b can then be picked so that the AMSE is minimized.

COROLLARY 4.6. Under the conditions of Corollary 4.5, the optimal values of a and b that minimize $\mathrm{AMSE}\,\hat F(r)$ are

$$ a = 2^{-1/3} f_Y^{(1)}(0)^{2/3} F(r)^{-1/3} (1 - F(r))^{-1/3}, $$

$$ b = 2^{1/3} f_Y^{(1)}(0)^{-2/3} F(r)^{1/3} (1 - F(r))^{1/3}, $$

and the minimum AMSE of $\hat F(r)$ is

$$ \mathrm{AMSE}\,\hat F(r) = \frac{3}{2\sqrt[3]{2}}\, f_Y^{(1)}(0)^{2/3} F(r)^{2/3} (1 - F(r))^{2/3}\, T^{-2/3}. $$

This result indicates how the optimal values of n and m depend on the true value of the performance measure F(r). Specifically, if F(r) is extreme, i.e., close to 0 or 1, then the variance of its estimator $\hat F(r)$ can be very small. Therefore, most of the computational budget should be allocated to reducing the bias. Thus m, the length of the inner-level simulation, increases as F(r) approaches 0 or 1.
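The closed forms in Corollary 4.6 can be checked numerically. The sketch below is only an illustration: the function names and input values are hypothetical, $f_Y^{(1)}(0)^{2/3}$ is interpreted as $|f_Y^{(1)}(0)|^{2/3}$ (an assumption that matters only when the derivative is negative), and the relation $b = 1/a$ is the one implied by the budget constraint.

```python
def optimal_a_b(f1, F):
    """Closed-form a and b of Corollary 4.6.

    f1 : value of f_Y^{(1)}(0) (assumed known; nonzero)
    F  : value of F(r), strictly between 0 and 1
    """
    v = F * (1.0 - F)
    a = 2.0 ** (-1.0 / 3.0) * abs(f1) ** (2.0 / 3.0) * v ** (-1.0 / 3.0)
    b = 1.0 / a                      # budget constraint implies ab = 1
    return a, b

def amse_coefficient(a, b, f1, F):
    """Coefficient of T^(-2/3) in Corollary 4.5: squared bias plus variance."""
    return f1 ** 2 / (4.0 * a ** 2) + F * (1.0 - F) / b

def minimum_amse_coefficient(f1, F):
    """Coefficient of T^(-2/3) in the minimum AMSE of Corollary 4.6."""
    return 3.0 / (2.0 * 2.0 ** (1.0 / 3.0)) * (abs(f1) * F * (1.0 - F)) ** (2.0 / 3.0)

# Hypothetical inputs for illustration.
a, b = optimal_a_b(f1=0.8, F=0.3)
print(amse_coefficient(a, b, 0.8, 0.3), minimum_amse_coefficient(0.8, 0.3))
# The scaled inner run length a grows as F(r) becomes extreme.
print(optimal_a_b(0.8, 0.5)[0], optimal_a_b(0.8, 0.01)[0])
```

Evaluating the coefficient at scaled values of a (with b = 1/a) on either side of the closed-form optimum gives a larger value, which is consistent with the one-dimensional minimization used in the proof.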

In the continuous feasible region case, the estimator $\hat F(r)$ converges at the rate of $O(T^{-1/3})$. It does not achieve the canonical Monte Carlo convergence rate $O(T^{-1/2})$, and is also slower than the convergence rate in the discrete feasible region case. As discussed in the introduction, Broadie et al. [2011] have developed an estimator with a faster convergence rate by sequentially allocating computational effort in the inner simulation with different values of m for different X(t). We do not pursue a sequential approach here.

4.2. The Multiple-Budget Case

Let $t_1 < t_2 < \cdots < t_l$ be the budgets under consideration, and let $F_k(r) := P(V(t_k) \le r)$, $k = 1, \ldots, l$, be the performance measures that need to be estimated. The obvious estimator for each $F_k(r)$ is

$$ \hat F_k(r) := \frac{1}{n_k} \sum_{i=1}^{n_k} I(Z_i(t_k) \le r). \tag{17} $$
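The estimator in (17) can be sketched as a two-level simulation for a single budget. The code below is a minimal illustration under stated assumptions: `run_optimizer` and `simulate` are hypothetical callables standing in for the simulation-optimization algorithm (at a fixed budget t) and the stochastic simulation G(x), and the toy problem is invented purely for demonstration.

```python
import random

def estimate_F(r, n, m, run_optimizer, simulate, rng):
    """Two-level estimator of F(r) = P(V(t) <= r) for one budget.

    run_optimizer(rng) -> estimated optimal solution X_i(t) (hypothetical)
    simulate(x, rng)   -> one replication of G(x)           (hypothetical)
    """
    hits = 0
    for _ in range(n):
        x = run_optimizer(rng)                            # outer level: X_i(t)
        z = sum(simulate(x, rng) for _ in range(m)) / m   # inner level: Z_i(t)
        if z <= r:
            hits += 1
    return hits / n

# Toy problem (assumption): g(x) = x^2, and the "algorithm" returns a
# noisy point near the minimizer 0.
def toy_optimizer(rng):
    return rng.gauss(0.0, 0.3)

def toy_simulate(x, rng):
    return x * x + rng.gauss(0.0, 1.0)

rng = random.Random(12345)
print(estimate_F(r=0.1, n=2000, m=50,
                 run_optimizer=toy_optimizer, simulate=toy_simulate, rng=rng))
```

Each outer replication costs t + m simulation runs (t inside the algorithm, m for the assessment), which is where the trade-off between n and m comes from.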

Suppose that no partial sample paths are reused. It follows that the optimal values of $n_k$ and $m_k$ can be chosen independently for each $k = 1, \ldots, l$. Hence the minimum $\mathrm{AMSE}\,\hat F_k(r)$ with respect to $F_k(r)$ for each $k = 1, \ldots, l$ is given by Corollary 4.2 or Corollary 4.5, namely

$$ \mathrm{AMSE}\,\hat F_k(r) := \mathrm{AMSE}(\hat F_k(r), F_k(r)) = \frac{F_k(r)(1 - F_k(r)) \log T_k}{2\eta_k^* T_k} \tag{18} $$

if the feasible region of the simulation-optimization problem is discrete, or

$$ \mathrm{AMSE}\,\hat F_k(r) = \frac{3}{2\sqrt[3]{2}}\, f_{Y_k}^{(1)}(0)^{2/3} F_k(r)^{2/3} (1 - F_k(r))^{2/3}\, T_k^{-2/3} \tag{19} $$

if the feasible region is continuous.

THEOREM 4.7. Suppose the simulation-optimization problem has a discrete feasible region and the conditions in Corollary 4.2 are satisfied for each computational budget $t_k$. Then a budget allocation achieves the optimal convergence rate if and only if

$$ \frac{T_k}{T} \to \tau_k^*, $$

where

$$ \tau_k^* = \frac{\sqrt{F_k(r)(1 - F_k(r))/(2\eta_k^*)}}{\sum_{h=1}^{l} \sqrt{F_h(r)(1 - F_h(r))/(2\eta_h^*)}}, $$

and the minimum sum of the $\mathrm{AMSE}\,\hat F_k(r)$ is

$$ \sum_{k=1}^{l} \mathrm{AMSE}\,\hat F_k(r) = \left[\sum_{k=1}^{l} \sqrt{\frac{F_k(r)(1 - F_k(r))}{2\eta_k^*}}\right]^2 \frac{\log T}{T}. $$

PROOF SKETCH. We first show that the sum of the $\mathrm{AMSE}\,\hat F_k(r)$ can be bounded to the order of $O((\log T)/T)$. It is then left to argue that no better rate of convergence is possible.

It is a basic fact in real analysis that $T_k/T$ has a subsequence converging to some $\tau_k$. The optimal values $\tau_k^*$ can be determined via the Cauchy-Schwarz inequality, and then it follows that the entire sequence $T_k/T$ has to converge to $\tau_k^*$.

The form of Tk here is very similar to that in Theorem 3.5. Indeed, both are propor-tional to the total computational budget T , and the proportionality constant dependsonly on the parameters at the budget tk, scaled by the total. This happens because theMSE at each budget tk depends on the others only through the total budget constraint.
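The allocation in Theorem 4.7 is easy to evaluate once the per-budget constants are available. The following sketch uses hypothetical values for $F_k(r)$ and $\eta_k^*$ (which would not normally be known) and hypothetical function names; it computes the fractions $\tau_k^*$ and compares the resulting coefficient of $(\log T)/T$ with the closed-form minimum.

```python
import math

def sum_optimal_fractions(F, eta):
    """Fractions tau*_k of Theorem 4.7 and the minimal coefficient of (log T)/T."""
    w = [math.sqrt(Fk * (1.0 - Fk) / (2.0 * ek)) for Fk, ek in zip(F, eta)]
    total = sum(w)
    return [wk / total for wk in w], total ** 2

def sum_amse_coefficient(tau, F, eta):
    """Coefficient of (log T)/T in the sum of AMSEs when T_k = tau_k * T."""
    return sum(Fk * (1.0 - Fk) / (2.0 * ek * tk)
               for Fk, ek, tk in zip(F, eta, tau))

F = [0.2, 0.5, 0.9]      # hypothetical F_k(r)
eta = [0.5, 1.0, 2.0]    # hypothetical eta*_k
tau, best = sum_optimal_fractions(F, eta)
print(tau, best, sum_amse_coefficient(tau, F, eta))
```

Moving budget away from these fractions, for example shifting 5% of T from the second budget to the first, increases the coefficient, in line with the Cauchy-Schwarz argument.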

As mentioned in Section 3, the optimal values of n and m do not have closed-form expressions if the sum of MSEs is used as the performance measure in the continuous feasible region case, but the maximum of the MSEs is more tractable. We now give the result when minimizing the maximum of the MSEs. The resulting optimal values have the same convergence rate as would be obtained if we were minimizing the sum of the MSEs, and the optimal values of n and m for the two performance measures are equivalent in the sense that they differ only by a constant factor.

THEOREM 4.8. Suppose the simulation-optimization problem has a continuous feasible region and the conditions in Corollary 4.5 are satisfied for each computational budget $t_k$. Then the smallest possible maximum of the $\mathrm{AMSE}\,\hat F_k(r)$ is

$$ \max_{k=1,\ldots,l} \mathrm{AMSE}\,\hat F_k(r) = \frac{3}{2\sqrt[3]{2}} \left[\sum_{h=1}^{l} f_{Y_h}^{(1)}(0)\, F_h(r)(1 - F_h(r))\right]^{2/3} T^{-2/3}, $$

and is achieved when

$$ \frac{T_k}{T} \to \tau_k^*, $$

where

$$ \tau_k^* = \frac{f_{Y_k}^{(1)}(0)\, F_k(r)(1 - F_k(r))}{\sum_{h=1}^{l} f_{Y_h}^{(1)}(0)\, F_h(r)(1 - F_h(r))} $$

for each $k = 1, \ldots, l$.

PROOF SKETCH. The argument in the proof of Theorem 4.7 applies here as well, sothat Tk/T converges to some τk for each k. The optimal values of τk are obtained usingthe same idea as in Theorem 3.7.
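The minimax allocation of Theorem 4.8 equalizes the per-budget AMSEs. The sketch below illustrates this with hypothetical inputs and function names; it assumes each $f_{Y_k}^{(1)}(0)$ is positive (absolute values are taken as a safeguard) and that $T_k = \tau_k T$ exactly.

```python
def minimax_fractions(f1, F):
    """Fractions tau*_k of Theorem 4.8.

    f1 : list of f_{Y_k}^{(1)}(0) values (assumed known)
    F  : list of F_k(r) values (assumed known)
    """
    w = [abs(fk) * Fk * (1.0 - Fk) for fk, Fk in zip(f1, F)]
    total = sum(w)
    return [wk / total for wk in w]

def amse_coefficients(tau, f1, F):
    """Coefficient of T^(-2/3) in each per-budget minimum AMSE, T_k = tau_k T."""
    c = 3.0 / (2.0 * 2.0 ** (1.0 / 3.0))
    return [c * (abs(fk) * Fk * (1.0 - Fk) / tk) ** (2.0 / 3.0)
            for fk, Fk, tk in zip(f1, F, tau)]

f1 = [0.8, 1.2, 0.5]     # hypothetical derivative values
F = [0.3, 0.5, 0.7]      # hypothetical F_k(r)
tau = minimax_fractions(f1, F)
print(tau)
print(amse_coefficients(tau, f1, F))   # all entries coincide at the optimum
```

Because each per-budget AMSE is decreasing in its own $T_k$, any reallocation away from these fractions raises at least one of them and therefore raises the maximum.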

As discussed in Section 3, here we also see that the optimal convergence rate is attained if and only if

$$ \frac{T_k}{T} \to \tau_k $$

for each $k = 1, \ldots, l$, where each $\tau_k$ is a nonzero constant. The convergence rate is unaffected whether we minimize the sum or the maximum of the AMSEs. We chose to minimize the sum of the AMSEs in the discrete feasible region case, and the maximum of the AMSEs in the continuous feasible region case, simply because the optimization problems are mathematically tractable and explicit expressions can be given for the optimal solutions.

5. QUANTILES

In this section, we consider the p-th quantile of g(X(t)), the true objective function value of the estimated optimal solution X(t). We adopt the definition of the p-th quantile as in Lee [1998]:

$$ q = q(p) := \sup\{r : F(r) < p\}. $$

The obvious estimator of q, and the one we adopt, is

$$ Q = Q(p) := \sup\{r : \hat F(r) < p\}. $$
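Since $\hat F$ is a step function built from the inner-level averages $Z_1, \ldots, Z_n$, the supremum in the definition of Q(p) reduces to an order statistic: $\hat F(r) < p$ holds exactly when fewer than $np$ of the $Z_i$ are at most r, so $Q(p)$ is the $\lceil np \rceil$-th smallest $Z_i$. The sketch below illustrates this; the function name is hypothetical, and floating-point rounding of `n * p` near an integer is not handled.

```python
import math

def quantile_estimate(z, p):
    """Q(p) = sup{r : F_hat(r) < p}, with F_hat(r) = (1/n) * #{i : z_i <= r}."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    zs = sorted(z)
    k = math.ceil(len(zs) * p)   # smallest k with k/n >= p
    return zs[k - 1]             # k-th smallest value (1-indexed)

# Hypothetical inner-level averages Z_i(t).
print(quantile_estimate([3.0, 1.0, 4.0, 2.0], 0.5))
```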

5.1. The Single-Budget Case

Suppose the range of the estimated optimal solution X(t) is continuous. The asymptotic behavior of Q(p) in this case has been discussed in Lee [1998]. The result and a set of sufficient conditions are quoted below, adapted to the context of performance measures for simulation-optimization algorithms.

Given the probability p, the corresponding benchmark r is simply q(p). Thus we redefine the quantity Y(x) defined in Section 4 to

$$ Y(x) := \frac{g(x) - q}{\sigma(x)}. $$

Recall that $f_Y(\cdot)$ is the density of $Y := Y(X(t))$.

THEOREM 5.1 (LEE [1998]). Let $0 < p < 1$. Assume that the conditions C1-C7 in Appendix C hold, that $F(\cdot)$ is continuously differentiable at q with $F'(q) > 0$, and that $f_Y^{(1)}(\cdot)$ is continuous at the point 0. If $T^{-1/3} m \to a$ and $T^{-2/3} n \to b$, where $0 < a, b < \infty$, then as $T \to \infty$,

$$ T^{1/3}\,[Q(p) - q(p)] \Rightarrow \sqrt{\frac{p(1 - p)}{b}}\,\frac{1}{F'(q)}\, N(0, 1) - \frac{1}{2a}\, f_Y^{(1)}(0)\, \frac{1}{F'(q)}. $$

The asymptotic distribution of Q(p) has the same form as that of $\hat F(r)$, except for the factor of $1/F'(q)$ and the flipped sign of the bias term. As a result, its AMSE is also similar to that of $\hat F(r)$.

COROLLARY 5.2. In addition to the conditions of Theorem 5.1, if the square of the left-hand side in Theorem 5.1 is uniformly integrable, then the AMSE of Q(p) with respect to q(p) is

$$ \mathrm{AMSE}\,Q(p) := \mathrm{AMSE}(Q(p), q(p)) = \left[\frac{f_Y^{(1)}(0)^2}{4a^2} + \frac{p(1 - p)}{b}\right] \frac{T^{-2/3}}{F'(q)^2}. $$

The AMSE here is a scaled version of that for the distribution function in Section 4,so we immediately have the following analog of Corollary 4.6.

COROLLARY 5.3. Assume the conditions in Theorem 5.1, as well as the uniform integrability in Corollary 5.2. The optimal values of a and b that minimize $\mathrm{AMSE}\,Q(p)$ are

$$ a = 2^{-1/3} f_Y^{(1)}(0)^{2/3} p^{-1/3} (1 - p)^{-1/3}, $$

$$ b = 2^{1/3} f_Y^{(1)}(0)^{-2/3} p^{1/3} (1 - p)^{1/3}, $$

and the minimum AMSE of Q(p) is

$$ \mathrm{AMSE}\,Q(p) = \frac{3}{2\sqrt[3]{2}}\, f_Y^{(1)}(0)^{2/3} p^{2/3} (1 - p)^{2/3} F'(q)^{-2}\, T^{-2/3}. $$

The estimator Q(p) has a convergence rate of $O(T^{-1/3})$, which is also sub-canonical. In particular, it converges at the same speed as its counterpart $\hat F(r)$.

5.2. The Multiple-Budget Case

Let $t_1 < t_2 < \cdots < t_l$ be the budgets under consideration, and let $q_k(p) := \sup\{r : F_k(r) < p\}$, $k = 1, \ldots, l$, be the performance measures that need to be estimated. The obvious estimator for each $q_k(p)$ is

$$ Q_k(p) := \sup\{r : \hat F_k(r) < p\}, $$

and the minimum AMSE of each $Q_k(p)$ with respect to $q_k(p)$ is given by Corollary 5.3:

$$ \mathrm{AMSE}\,Q_k(p) := \mathrm{AMSE}(Q_k(p), q_k(p)) = \frac{3}{2\sqrt[3]{2}}\, f_{Y_k}^{(1)}(0)^{2/3} p^{2/3} (1 - p)^{2/3} F_k'(q_k)^{-2}\, T_k^{-2/3}. $$

THEOREM 5.4. Suppose the simulation-optimization problem has a continuous feasible region and the conditions in Corollary 5.3 are satisfied for each computational budget $t_k$. Then the smallest possible maximum of the $\mathrm{AMSE}\,Q_k(p)$ is

$$ \max_{k=1,\ldots,l} \mathrm{AMSE}\,Q_k(p) = \frac{3}{2\sqrt[3]{2}} \left[\sum_{h=1}^{l} f_{Y_h}^{(1)}(0)\, F_h'(q_h)^{-3}\, p(1 - p)\right]^{2/3} T^{-2/3}, $$

and is achieved when

$$ \frac{T_k}{T} \to \tau_k^*, $$

where

$$ \tau_k^* = \frac{f_{Y_k}^{(1)}(0)\, F_k'(q_k)^{-3}}{\sum_{h=1}^{l} f_{Y_h}^{(1)}(0)\, F_h'(q_h)^{-3}}. $$

The form of $\tau_k^*$ follows from the same reasoning as in Theorem 4.7, because of the similarity of the normal distribution in Theorem 5.1 to that in Theorem 4.4.

PROOF SKETCH. Simply replace the result of Corollary 4.5 by that of Corollary 5.2, and apply the same argument used in the proof of Theorem 4.8.

6. MULTIDIMENSIONAL CENTRAL LIMIT THEOREMS

In the scenario where the sample paths are reused, the optimal solutions $X(t_k)$ with different computational budgets are obtained from shared sample paths, and exhibit some correlation. Therefore, besides the AMSE of the estimators with each budget $t_k$, we are also interested in the joint distribution of the estimators for the various budgets, since they are not independent.

The central limit theorems in Lee [1998]; Lee and Glynn [2003] provide conditions under which the estimators in the single-budget case are asymptotically normally distributed. In this section, we generalize this result, and show that the vector of all estimators with budgets $t_k$, $k = 1, \ldots, l$, is asymptotically jointly normally distributed.

These multidimensional central limit theorems not only give the asymptotic distribution of each estimator with budget $t_k$ as their single-budget counterparts do, but more importantly, they also reveal the dependence across different budgets when T is large. When the computational budget T is a fixed large quantity, the univariate central limit theorems can be used to compute the marginal accuracy of each estimator, but the multivariate central limit theorems help by quantifying the jointly distributed error of the estimators.

6.1. Preliminaries

In this section we list some definitions and theorems that will be used in the proofs of the central limit theorems.

Recall that the Hadamard product of two n ×m matrices A = (aij) and B = (bij) isthe element-wise product, i.e., the n×m matrix with (i, j)-th entry aijbij .

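LEMMA 6.1 (SLUTSKY'S THEOREM FOR RANDOM VECTORS). Let $\{X_n\}$, $\{Y_n\}$ be sequences of random vectors in $\mathbb{R}^m$, and let c be a constant vector in $\mathbb{R}^m$. If

$$ X_n \Rightarrow X, \qquad Y_n \Rightarrow c, $$

then

$$ X_n + Y_n \Rightarrow X + c, $$

$$ X_n \circ Y_n \Rightarrow c \circ X. $$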

The proof of Slutsky's Theorem can be found in various references, e.g., [Billingsley 1986], and Lemma 6.1 is a straightforward generalization to multiple dimensions.

Before stating the multivariate central limit theorems, we need to study the covariance between the estimators associated with the estimated optimal solutions at each budget $t_k$. The following result shows that one can replace $Z_i(t_k)$ in the covariance calculations by $V(t_k)$ asymptotically, i.e., the effect of inner-stage sampling on the variance vanishes.

LEMMA 6.2. Let $q_k$, $k = 1, 2, \ldots, l$, be given and suppose that $P(V(t_k) = q_k) = 0$ for all $k = 1, 2, \ldots, l$. Suppose further that, for each $k = 1, \ldots, l$, $m_k = m_k(T) \to \infty$ as $T_k \to \infty$. Then for any pair $k, h = 1, \ldots, l$, and for any sequences of constants $\{s_k(T)\}$ and $\{s_h(T)\}$ with $s_k(T) \to 0$ and $s_h(T) \to 0$ as $T \to \infty$, we have

$$ \mathrm{Cov}\bigl(I(Z_i(t_k) \le q_k + s_k(T)),\, I(Z_i(t_h) \le q_h + s_h(T))\bigr) \xrightarrow{T \to \infty} \mathrm{Cov}\bigl(I(V(t_k) \le q_k),\, I(V(t_h) \le q_h)\bigr). $$

Lemma 6.2 is used in the proof of the central limit theorem for the quantiles in themultiple-budget case. The corresponding result for the distribution function is simpler.

COROLLARY 6.3. Let $r_k$, $k = 1, 2, \ldots, l$, be given and suppose that $P(V(t_k) = r_k) = 0$ for all $k = 1, 2, \ldots, l$. Suppose that, for each $k = 1, \ldots, l$, $m_k \to \infty$ as $T \to \infty$. For any pair $k, h = 1, 2, \ldots, l$, we have

$$ \mathrm{Cov}\bigl(I(Z_i(t_k) \le r_k),\, I(Z_i(t_h) \le r_h)\bigr) \xrightarrow{T \to \infty} \mathrm{Cov}\bigl(I(V(t_k) \le r_k),\, I(V(t_h) \le r_h)\bigr). $$

PROOF. Set qk = rk and sk(T ) = 0 in Lemma 6.2, and the result follows.

To simplify the notation for the covariance matrix, we adopt the following definition.

Definition 6.4. Let $\Lambda^T$ be the covariance matrix of $I(Z_i(t_k) \le r_k)$, $k = 1, \ldots, l$, and $\Lambda$ the covariance matrix of $I(V(t_k) \le r_k)$, $k = 1, \ldots, l$, namely

$$ \Lambda^T_{kh} := \mathrm{Cov}\bigl(I(Z_i(t_k) \le r_k),\, I(Z_i(t_h) \le r_h)\bigr), $$

$$ \Lambda_{kh} := \mathrm{Cov}\bigl(I(V(t_k) \le r_k),\, I(V(t_h) \le r_h)\bigr). $$



Corollary 6.3 above basically states that $\Lambda^T \to \Lambda$ entry-wise as the inner sample size increases.

Definition 6.5. For simplicity of notation, we use $v^r$ to denote the vector whose entries are those of v raised to the power r, i.e., $(v^r)_i = v_i^r$. This is equivalent to defining $v^r$ so that $\mathrm{Diag}(v^r) = (\mathrm{Diag}\, v)^r$, where $\mathrm{Diag}(v)$ is the diagonal matrix with the elements of v on the diagonal.

6.2. Distribution Function

In this section, we extend Theorem 4.1 and Theorem 4.4 to multivariate central limit theorems. Since the true objective function values $g(X(t_k))$ at the estimated optimal solutions $X(t_k)$ with different computational budgets $t_k$ are estimated independently, the simulation run lengths $m_k$, $k = 1, \ldots, l$, may be different from each other.

As before, we discuss separately the case with discrete feasible region and the casewith continuous feasible region. In each case, the central limit theorem is stated withthe assumption that nk is the same for each k = 1, . . . , l, which is relaxed later to obtaina more general result.

THEOREM 6.6. Assume A1-A6 for each $t_k$, $k = 1, \ldots, l$. Suppose $n_k = n$ for each $k = 1, \ldots, l$. If $m_k \sim a_k \log T$ for each $k = 1, \ldots, l$ and $n \sim bT/\log T$ with $b > 0$ as $T \to \infty$, where $a_k \ge 1/(2\eta_k^*)$ for each $k = 1, \ldots, l$, then

$$ \sqrt{\frac{T}{\log T}}\,\bigl[\hat F_k(r_k) - F_k(r_k)\bigr]_k \Rightarrow b^{-1/2} N(0, \Lambda), \tag{20} $$

where $[v(k)]_k$ represents a vector with k-th entry being $v(k)$.

PROOF SKETCH. Using the Cramer-Wold Theorem [Billingsley 1986], we reduce the multiple-budget version to a univariate central limit theorem concerning an arbitrary linear combination of the estimators $\hat F_k(r_k)$, $k = 1, \ldots, l$. The single-budget central limit theorem, namely Theorem 4.1, has been proved in Lee and Glynn [2003]. The proof centers the random variable and calculates separately the limit of the standardized random variable and its bias. By adopting the same technique, we isolate the bias of the linear combination, and show that the standardized linear combination also converges to a normal random variable using Lemma D.1, the same lemma used in the proof of the single-budget central limit theorem.

It is natural to further extend the above theorem and allow $n_k$, $k = 1, \ldots, l$, to differ, as the next theorem states. In fact, Theorem 6.6 is completely captured by Theorem 6.7, yet they are both fully stated here, since Theorem 6.6 serves as an important intermediate step in the proof of Theorem 6.7.

THEOREM 6.7. Assume A1-A6 for each $t_k$, $k = 1, \ldots, l$. Suppose $m_k \to \infty$ and $n_k \to \infty$ for each $k = 1, \ldots, l$. Also assume that the $n_k$'s are in decreasing order. If $m_k \sim a_k \log T$ and $n_k \sim b_k T/\log T$ for each $k = 1, \ldots, l$ as $T \to \infty$, where $a_k \ge 1/(2\eta_k^*)$ for each $k = 1, \ldots, l$, then

$$ \sqrt{\frac{T}{\log T}}\,\bigl[\hat F_k(r_k) - F_k(r_k)\bigr]_k \Rightarrow N(0, B_{\min} \circ \Lambda), $$

where $B_{\min}$ is an l-dimensional square matrix with entries $B_{\min}(k, h) := b_{k \wedge h}^{-1}$, and $k \wedge h = \min\{k, h\}$.

PROOF SKETCH. In contrast to the assumptions in Theorem 6.6, the $n_k$'s here are no longer assumed to be equal to each other, and so grouping the summands into i.i.d. linear combinations does not work directly. However, in each estimator $\hat F_k(r_k)$, the $n_k$ summands are independent, and can be divided into k groups, each of which consists of identically distributed random variables. Hence the central limit theorem can be shown by breaking down the random vector into l independent parts, and proving the theorem for each part using the same argument as in Theorem 6.6.
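The limiting covariance $B_{\min} \circ \Lambda$ in Theorem 6.7 can be assembled directly from the constants $b_k$ and the matrix $\Lambda$. The sketch below uses plain Python lists, 0-based indices, and hypothetical values for $b_k$ and $\Lambda$; the function names are illustrative only.

```python
def b_min_matrix(b):
    """Matrix with entries 1 / b_{min(k, h)} (indices are 0-based here).

    b : list of constants b_k, assumed to be in decreasing order
        because the n_k's are in decreasing order.
    """
    l = len(b)
    return [[1.0 / b[min(k, h)] for h in range(l)] for k in range(l)]

def hadamard(A, B):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

b = [4.0, 2.0, 1.0]                      # hypothetical b_k
Lam = [[0.25, 0.10, 0.05],               # hypothetical covariance matrix Lambda
       [0.10, 0.21, 0.08],
       [0.05, 0.08, 0.16]]
cov = hadamard(b_min_matrix(b), Lam)
for row in cov:
    print(row)
```

Each off-diagonal entry is scaled by the reciprocal of the larger of the two constants $b_k$, $b_h$, while each diagonal entry is scaled by $1/b_k$.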

For the case with continuous feasible region, we proceed in the same way as above,and obtain the following results.

THEOREM 6.8. Assume B1-B7 for each $t_k$, $k = 1, \ldots, l$. If $T^{-1/3} m_k \to a_k > 0$ and $n_k = n$ for each $k = 1, \ldots, l$ with $T^{-2/3} n \to b$, where $0 < a_k, b < \infty$, then as $T \to \infty$,

$$ T^{1/3}\bigl[\hat F_k(r_k) - F_k(r_k)\bigr]_k \Rightarrow N(0, b^{-1}\Lambda) + \left[\frac{f_{Y_k}^{(1)}(0)}{2a_k}\right]_k. $$

PROOF SKETCH. The proof is almost the same as that of Theorem 6.6, except that the bias is no longer zero. Since the bias of each entry is unrelated to the rest, their values are directly given by the single-budget central limit theorem, Theorem 4.4.

THEOREM 6.9. Assume B1-B7 for each $t_k$, $k = 1, \ldots, l$. Assume further that $T^{-1/3} m_k \to a_k > 0$ and $T^{-2/3} n_k \to b_k > 0$ for each $k = 1, \ldots, l$. If the $n_k$'s are in decreasing order, then as $T \to \infty$,

$$ T^{1/3}\bigl[\hat F_k(r_k) - F_k(r_k)\bigr]_k \Rightarrow N(0, B_{\min} \circ \Lambda) + \left[\frac{f_{Y_k}^{(1)}(0)}{2a_k}\right]_k. $$

PROOF SKETCH. Apply the same technique as in Theorem 6.7 to generalize Theorem 6.8.

6.3. Quantiles

The central limit theorem for quantiles as stated in Section 5 can also be generalized in the same fashion. Again, the case with $n_k$ being the same for all $k = 1, \ldots, l$ is shown first, and serves as an intermediate result for the more general central limit theorem.

THEOREM 6.10. Assume that the conditions in Theorem 5.1 hold for each $t_k$, $k = 1, \ldots, l$. If $T^{-1/3} m_k \to a_k$ and $n_k = n$ for each $k = 1, \ldots, l$ with $T^{-2/3} n \to b$, where $0 < a_k, b < \infty$, then as $T \to \infty$,

$$ T^{1/3}\,[Q_k(p_k) - q_k(p_k)]_k \Rightarrow \left[\frac{1}{F_k'(q_k(p_k))}\right]_k \circ N(0, b^{-1}\Lambda) - \left[\frac{f_{Y_k}^{(1)}(0)}{2a_k}\,\frac{1}{F_k'(q_k(p_k))}\right]_k. $$

PROOF SKETCH. The Cramer-Wold Theorem [Billingsley 1986] cannot be applied directly. We study the joint distribution function of $Q_k(p_k)$, $k = 1, \ldots, l$, and reduce it to a weighted sum of Bernoulli random variables $\chi_{ki}$, $k = 1, \ldots, l$, $i = 1, \ldots, n$.

The random variables $\chi_{ki}$, $k = 1, \ldots, l$, $i = 1, \ldots, n$, are additive, so we can then apply the Cramer-Wold Theorem. The proof then follows that of the central limit theorem for quantiles in Lee [1998], and the resulting weak convergence in terms of $\chi_{ki}$ can be "lifted" back to that of $Q_k(p_k)$, $k = 1, \ldots, l$.

THEOREM 6.11. Assume that the conditions in Theorem 5.1 hold for each $t_k$, $k = 1, \ldots, l$. Suppose further that $T^{-1/3} m_k \to a_k$ and $T^{-2/3} n_k \to b_k$ for each $k = 1, \ldots, l$, where $0 < a_k, b_k < \infty$. If the $n_k$'s are in decreasing order, then as $T \to \infty$,

$$ T^{1/3}\,[Q_k(p_k) - q_k(p_k)]_k \Rightarrow \left[\frac{1}{F_k'(q_k(p_k))}\right]_k \circ N(0, B_{\min} \circ \Lambda) - \left[\frac{f_{Y_k}^{(1)}(0)}{2a_k}\,\frac{1}{F_k'(q_k(p_k))}\right]_k. $$



7. SUMMARY

We have proposed performance measures for simulation-optimization algorithms, including the mean, distribution function, and quantiles of V(t), which denotes the true objective function value associated with the estimated optimal solution. Estimators for each of the performance measures have been analyzed, and we have shown that the estimator of the mean converges at the canonical Monte Carlo convergence rate. Estimators of the other performance measures converge at sub-canonical Monte Carlo convergence rates.

When the computational budget t is allowed to vary and multiple budgets are considered, we have obtained the optimal budget allocation schemes for each estimator. In particular, we have shown that the optimal convergence rate is achieved as long as the computational effort spent on each $t_k$ is maintained as a constant fraction of the total computational budget T, and the convergence rate is unaffected whether one minimizes the sum or the maximum of the AMSEs.

The expressions we have derived for the optimal simulation run lengths n and m are important because they confirm our intuition in some cases and show the tradeoffs involved between the variance and bias of the estimators, exactly as in, for example, the expressions for the optimal step length in finite-difference estimators of sensitivities; see Glasserman [2003]. Some of the constants appearing in these expressions are unlikely to be estimable in practice, but they provide insight into what features of the problem influence n and m.

Finally, multivariate central limit theorems for each performance measure are established by generalizing results in Lee [1998], and indicate that a vector of estimators for different budgets $t_k$, $k = 1, \ldots, l$, converges in distribution to a multivariate normal random variable.

A. CONDITIONS FOR THEOREM 4.1

Define

$$ \Gamma^+ = \{x_i : E\,G(x_i) > r,\ i \ge 1\}, \qquad \Gamma^- = \{x_i : E\,G(x_i) \le r,\ i \ge 1\}. $$

One of the fundamental results in large deviations theory [Bucklew 1990] implies that for $X(t) \in \Gamma^-$, $P(Z_i(t) > r)$ typically converges to 0 exponentially fast; whereas for $X(t) \in \Gamma^+$, $P(Z_i(t) \le r)$ typically converges to 0 exponentially fast.

We also use stronger assumptions than are necessary for the central limit theorem, for simplicity. For all $i \ge 1$ and all $\theta \in \mathbb{R}$, suppose

$$ \varphi_i(\theta) := E[\exp(\theta G(x_i))] < \infty. $$

The constant $\theta_i^*$ is defined as the root of the equation

$$ \varphi_i'(\theta_i^*)/\varphi_i(\theta_i^*) = r, $$

and $\eta(x_i)$ is given by

$$ \eta(x_i) := \theta_i^* r - \log \varphi_i(\theta_i^*) = \sup_{\theta}\{\theta r - \log \varphi_i(\theta)\}. $$

Let $\eta^*$ be the slowest decay rate:

$$ \eta^* := \inf\{\eta(x_i) : i \ge 1\}. $$

Assume the following conditions for Theorem 4.1:

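A1. For $i \ge 1$ and $\theta \in \mathbb{R}$, $\varphi_i(\theta) < \infty$;
A2. $P(G(x_i) > r) > 0$ for $x_i \in \Gamma^-$, and $P(G(x_i) \le r) > 0$ for $x_i \in \Gamma^+$;
A3. $E\,G(x_i) \ne r$ for all $i \ge 1$;
A4. For all $i \ge 1$, $G(x_i)$ is a continuous random variable;
A5. $B^* = \{x_i : i \ge 1,\ \eta(x_i) = \eta^*\}$ is nonempty and finite;
A6. $\inf\{\eta(x_i) : i \ge 1,\ x_i \notin B^*\} > \eta^*$.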
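To make the decay rate $\eta(x_i)$ concrete, consider the special case (an assumption made only for this illustration) where $G(x_i)$ is normally distributed with mean $\mu_i$ and standard deviation $\sigma_i$. Then $\log \varphi_i(\theta) = \mu_i \theta + \sigma_i^2 \theta^2 / 2$, the root is $\theta_i^* = (r - \mu_i)/\sigma_i^2$, and $\eta(x_i) = (r - \mu_i)^2 / (2\sigma_i^2)$. The sketch below evaluates these quantities for a hypothetical finite set of designs.

```python
def gaussian_rate(mu, sigma, r):
    """eta(x) = theta* r - log phi(theta*) when G(x) ~ Normal(mu, sigma^2)."""
    theta_star = (r - mu) / sigma ** 2                           # root of phi'/phi = r
    log_phi = mu * theta_star + 0.5 * (sigma * theta_star) ** 2  # log mgf at theta*
    return theta_star * r - log_phi

def slowest_rate(designs, r):
    """eta* = min of eta(x_i) over a finite list of (mu_i, sigma_i) pairs."""
    return min(gaussian_rate(mu, sigma, r) for mu, sigma in designs)

designs = [(1.0, 2.0), (5.0, 1.0), (2.5, 1.0)]   # hypothetical (mu_i, sigma_i)
r = 3.0
print([gaussian_rate(mu, s, r) for mu, s in designs])
print(slowest_rate(designs, r))
```

Designs whose mean is close to the benchmark r relative to their noise level have the smallest rate and therefore determine $\eta^*$, which in turn sets the inner run length through $a \ge 1/(2\eta^*)$.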

B. CONDITIONS FOR THEOREM 4.4

Let $C_b^k$ be the class of k-times continuously differentiable functions whose derivatives of order less than or equal to k are bounded.

Define

$$ \beta_k(x) := \frac{E|G(x) - \mu(x)|^k}{\sigma(x)^k} $$

for each $k = 1, 2, \ldots$, and

$$ \beta_k(y) := E\left[\beta_k(X(t)) \mid Y(X(t)) = y\right]. $$

Define $\gamma_p(x)$ as the p-th cumulant of G(x). In particular, if

$$ c(s) := \log E\left[e^{sG(x)}\right] $$

is the cumulant-generating function, then $\gamma_p(x)$ is the sequence that satisfies

$$ c(s) = \sum_{p=1}^{\infty} \frac{\gamma_p(x)\, s^p}{p!}. $$

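Given $\nu \ge 1$, define

$$ \Sigma(\nu) := \{s : \exists\, k_1, \ldots, k_\nu \in \mathbb{N}_0 \text{ solving } k_1 + 2k_2 + \cdots + \nu k_\nu = \nu,\ s = k_1 + \cdots + k_\nu\}, $$

$$ \kappa(\nu, s) := \{k_1, \ldots, k_\nu \in \mathbb{N}_0 : k_1 + 2k_2 + \cdots + \nu k_\nu = \nu,\ s = k_1 + \cdots + k_\nu\} \quad \text{for each } s \in \Sigma(\nu), $$

$$ \chi_{\nu,s}(x) := \sum_{(k_1, \ldots, k_\nu) \in \kappa(\nu, s)} \prod_{p=1}^{\nu} \frac{1}{k_p!} \left[\frac{\gamma_{p+2}(x)}{(p+2)!\, \sigma^{p+2}(x)}\right]^{k_p} \quad \text{for all } x \in \Gamma \text{ and } s \in \Sigma(\nu). $$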

For notational purposes, for any two real-valued functions $f(\cdot)$ and $g(\cdot)$, denote $(f \cdot g)(x) := f(x) \cdot g(x)$. Assume for Theorem 4.4 that for some $k \ge 1$:

B1. $E|G(X(t))|^{2k+2} < \infty$;
B2. there exists a density $g(\cdot)$ and $\varepsilon > 0$ such that $P(G(X(t)) \in \cdot \mid X(t)) \ge \varepsilon g(\cdot)$ almost surely;
B3. $\beta_{2k+2}(\cdot) \in C_b^1$;
B4. $f_Y(\cdot) \in C_b^{2k}$;
B5. $(f_Y \cdot \chi_{1,1})(\cdot) \in C_b^{2k-1}$;

Assume further that for all $s = k_1 + \cdots + k_\nu$, where $(k_1, \ldots, k_\nu)$ is any non-negative integer solution of the equation $k_1 + 2k_2 + \cdots + \nu k_\nu = \nu$,

B6. $(f_Y \cdot \chi_{2\nu,s})(\cdot) \in C_b^{2(k-\nu)}$ for $s \in \Sigma(2\nu)$, $1 \le \nu \le k - 1$;
B7. $(f_Y \cdot \chi_{2\nu+1,s})(\cdot) \in C_b^{2(k-\nu)-1}$ for $s \in \Sigma(2\nu + 1)$, $1 \le \nu \le k - 1$.

C. CONDITIONS FOR THEOREM 5.1

Given the probability p, the corresponding value of r is q(p). Redefine

$$ Y(x) := \frac{g(x) - q}{\sigma(x)}. $$

Assume the following conditions for Theorem 5.1:



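C1. $E|G(X(t))|^4 < \infty$;
C2. there exists a density function $g(\cdot)$ such that $P(G(X(t)) \in \cdot \mid X(t)) \ge \varepsilon g(\cdot)$ a.s.;
C3. $E[\beta_4(X(t)) \mid Y(X(t)) = \cdot\,] \in C_b^2$ for $|\cdot - q| < \varepsilon$;
C4. $f_Y(\cdot) \in C_b^2$ for $|\cdot - 1| < \varepsilon$;
C5. for $|\cdot - q| < \varepsilon$, $(f_Y \cdot \chi_{1,1})(\cdot) \in C_b^1$.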

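Let $h(\cdot, \cdot)$ denote the joint density function of $(\mu(X(t)), \sigma(X(t)))$, which is assumed to exist. Assume further that, for a fixed $\varepsilon > 0$,

C6. $\displaystyle \sup_{|x - q| < 2\varepsilon}\ \sup_{\xi \in \mathbb{R}} \int_0^\infty z_2^2\, h^2(x + \xi z_2, z_2)\, dz_2 < \infty$;

C7. $\displaystyle \sup_{|x - q| < \varepsilon}\ \sup_{\xi \in \mathbb{R}} \int_0^\infty z_2^4 \left[\frac{\partial}{\partial z_1} h(x + \xi z_2, z_2)\right]^2 dz_2 < \infty$.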

D. LEMMA FOR THE PROOF OF MULTIVARIATE CENTRAL LIMIT THEOREMS

LEMMA D.1. Assume that the following conditions hold:

(a) For each $c > 0$, the sequence $(X_{cj} : j \ge 1)$ consists of independent and identically distributed random variables;

(b) $E X_{c1} = 0$, $\mathrm{Var}\,X_{c1} := \sigma_c^2$;

(c) $\lim_{c \to \infty} \sigma_c^2 = \sigma^2 \in (0, \infty)$;

(d) the family $\{X_{c1}^2 : c > 0\}$ is uniformly integrable.

If $n(c) \to \infty$ as $c \to \infty$, then $\{X_{cj} : 1 \le j \le n(c),\ c > 0\}$ satisfies the Lindeberg-Feller condition, namely for each $\varepsilon > 0$,

$$ \lim_{c \to \infty} \frac{1}{\mathrm{Var}\,S_c} \sum_{j=1}^{n(c)} E\left[X_{cj}^2\, I\bigl(X_{cj}^2 > \varepsilon^2\, \mathrm{Var}\,S_c\bigr)\right] = 0, $$

where $S_c = \sum_{j=1}^{n(c)} X_{cj}$.

ELECTRONIC APPENDIX

The electronic appendix for this article can be accessed in the ACM Digital Library.

ACKNOWLEDGMENTS

This work was partially supported by National Science Foundation grants CMMI-0800688 and CMMI-1200315. We thank the editorial team for comments that improved the paper.

REFERENCES

ASMUSSEN, S. AND GLYNN, P. W. 2007. Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability Series, vol. 57. Springer, New York.

BILLINGSLEY, P. 1986. Probability and Measure. Wiley-Interscience, New York, NY.

BROADIE, M., DU, Y., AND MOALLEMI, C. C. 2011. Efficient risk estimation via nested sequential simulation. Management Science 57, 6, 1172–1194.

BUCKLEW, J. A. 1990. Large Deviation Techniques in Decision, Simulation, and Estimation. Wiley-Interscience, New York, NY.

CHEN, C.-H., FU, M. C., AND SHI, L. 2008. Simulation and optimization. In Tutorials in Operations Research. INFORMS PubsOnLine, Hanover, MD, 247–260.

CHEN, C.-H. AND LEE, L. H. 2011. Stochastic Simulation Optimization: An Optimal Computing Budget Allocation. World Scientific.

FU, M. C. 2002. Optimization for simulation: theory vs. practice. INFORMS Journal on Computing 14, 192–215.

FU, M. C., CHEN, C.-H., AND SHI, L. 2008. Some topics for simulation optimization. In Proceedings of 2008 Winter Simulation Conference. IEEE, Piscataway NJ, 27–38.

GLASSERMAN, P. 2003. Monte Carlo Methods in Financial Engineering. Stochastic Modelling and Applied Probability Series, vol. 53. Springer, New York.

GLYNN, P. W. 2002. Additional perspectives on simulation for optimization. INFORMS Journal on Computing 14, 220–222.

GORDY, M. B. AND JUNEJA, S. 2010. Nested simulation in portfolio risk measurement. Management Science 56, 10, 1833–1848.

KIEFER, J. C. AND WOLFOWITZ, J. 1952. Stochastic estimation of the maximum of a regression function. Annals of Mathematical Statistics 23, 3, 462–466.

LEE, S.-H. 1998. Monte Carlo computation of conditional expectation quantiles. Ph.D. thesis, Stanford University, Stanford, CA.

LEE, S.-H. AND GLYNN, P. W. 2003. Computing the distribution function of a conditional expectation via Monte Carlo: Discrete conditioning spaces. ACM Transactions on Modeling and Computer Simulation 13, 3, 238–258.

NEMIROVSKI, A., JUDITSKY, A., LAN, G., AND SHAPIRO, A. 2008. Robust stochastic approximation approach to stochastic programming. SIAM J. on Optimization 19, 4, 1574–1609.

NOCEDAL, J. AND WRIGHT, S. 2006. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York.

PASUPATHY, R. AND HENDERSON, S. G. 2006. A testbed of simulation-optimization problems. In Proceedings of the 2006 Winter Simulation Conference, L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, Eds. IEEE, Piscataway NJ, 255–263.

PASUPATHY, R. AND HENDERSON, S. G. 2011. SimOpt: A library of simulation optimization problems. In Proceedings of the 2011 Winter Simulation Conference, S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, Eds. IEEE, Piscataway NJ, 4080–4090.

PROMODEL. 2011. SimRunner.

ROBBINS, H. AND MONRO, S. 1951. A stochastic approximation method. Annals of Mathematical Statistics 22, 3, 400–407.

Received February 2007; revised March 2009; accepted June 2009


Online Appendix to: Optimal Budget Allocation in the Evaluation of Simulation-Optimization Algorithms

ERIC ANJIE GUO and SHANE G. HENDERSON, Cornell University

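A. PROOFS

PROOF OF PROPOSITION 3.2. Recall that $\rho(t) = \mathrm{Var}\,g(X(t))/\mathrm{Var}\,G(X(t)) \ge 0$. Let $\mu = E\,V(t) = E\,G(X(t))$. Since $\hat\mu$ is an unbiased estimator for $\mu$, the MSE of $\hat\mu$ is its variance, which is

$$
\begin{aligned}
\mathrm{Var}\,\hat\mu &= \mathrm{Var}\,\frac{1}{n}\sum_{i=1}^{n} Z_i(t) = \frac{1}{n}\,\mathrm{Var}\,\frac{1}{m}\sum_{j=1}^{m} G_j(X(t)) \\
&= \frac{1}{nm^2}\left[m\,\mathrm{Var}\,G(X(t)) + m(m-1)\,\mathrm{Cov}(G_1(X(t)), G_2(X(t)))\right] \\
&= \frac{1 + (m-1)\rho(t)}{nm}\,\mathrm{Var}\,G(X(t)).
\end{aligned}
$$

The last step uses $\mathrm{Cov}(G_1(X(t)), G_2(X(t))) = \mathrm{Var}\,g(X(t)) = \rho(t)\,\mathrm{Var}\,G(X(t))$, which holds because the replications are conditionally independent given X(t), with conditional mean g(X(t)).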

PROOF OF THEOREM 3.3. By Proposition 3.2 and the budget constraint (3), the optimization problem can be formulated as

$$ \min\ \frac{1 + (m-1)\rho(t)}{nm}\,\mathrm{Var}\,G(X(t)) \qquad \text{s.t.}\quad n(t + m) = T, $$

where the integer constraints and bounds on n, m are relaxed.

Its Lagrangian function is

$$ L(n, m; \lambda) = \frac{1 + (m-1)\rho(t)}{nm}\,\mathrm{Var}\,G(X(t)) - \lambda\,[n(t + m) - T]. $$

By the Karush-Kuhn-Tucker conditions, e.g., [Nocedal and Wright 2006], at an optimal point the partial derivatives satisfy

$$ \frac{\partial}{\partial n} L(n, m; \lambda) = -\frac{1 + (m-1)\rho(t)}{n^2 m}\,\mathrm{Var}\,G(X(t)) - \lambda(t + m) = 0, $$

$$ \frac{\partial}{\partial m} L(n, m; \lambda) = -\frac{1 - \rho(t)}{n m^2}\,\mathrm{Var}\,G(X(t)) - \lambda n = 0. $$

The above system of equations leads to the solution

$$ m = \sqrt{(1/\rho(t) - 1)\,t}, $$

and the budget constraint (3) gives

$$ n = \frac{T}{t + m} = \frac{T}{\sqrt{t}\left[\sqrt{1/\rho(t) - 1} + \sqrt{t}\right]}. $$

Direct substitution of n, m into (3.2) gives the value of the minimum MSE.
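The stationary point derived above can be checked numerically. The sketch below is illustrative only: the function names and input values are hypothetical, $0 < \rho(t) < 1$ is assumed, and n and m are treated as continuous (the integer constraints are relaxed, as in the proof).

```python
import math

def mse(n, m, rho, var_G):
    """MSE of the mean estimator from Proposition 3.2."""
    return (1.0 + (m - 1.0) * rho) / (n * m) * var_G

def optimal_n_m(T, t, rho):
    """Relaxed optimum of Theorem 3.3 under the constraint n (t + m) = T."""
    m = math.sqrt((1.0 / rho - 1.0) * t)
    n = T / (t + m)
    return n, m

T, t, rho, var_G = 1e6, 100.0, 0.2, 1.0    # hypothetical inputs
n, m = optimal_n_m(T, t, rho)
print(n, m, mse(n, m, rho, var_G))

# Other values of m, with n re-derived from the budget constraint, do worse.
for m_alt in (15.0, 25.0):
    print(m_alt, mse(T / (t + m_alt), m_alt, rho, var_G))
```

The optimal m does not depend on the total budget T: a larger T only increases the number of outer replications n.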




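PROOF OF THEOREM 3.5. Conditioning on the value of each $T_k$, the minimum MSE of $\hat\mu_k$ is given by (9), and therefore

$$ \sum_{k=1}^{l} \mathrm{MSE}\,\hat\mu_k = \sum_{k=1}^{l} \left[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\,t_k}\right]^2 \frac{\mathrm{Var}\,G(X(t_k))}{T_k}. $$

Multiply both sides by T, written as $\sum_{k=1}^{l} T_k$ using the total budget constraint (4), and then apply the Cauchy-Schwarz inequality:

$$
\begin{aligned}
T \sum_{k=1}^{l} \mathrm{MSE}\,\hat\mu_k &= \sum_{k=1}^{l} T_k\ \sum_{k=1}^{l} \left[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\,t_k}\right]^2 \frac{\mathrm{Var}\,G(X(t_k))}{T_k} \\
&\ge \left[\sum_{k=1}^{l} \left[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\,t_k}\right] \sqrt{\mathrm{Var}\,G(X(t_k))}\right]^2,
\end{aligned}
$$

with equality if and only if

$$ T_k = C \left[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\,t_k}\right]^2 \frac{\mathrm{Var}\,G(X(t_k))}{T_k} $$

for each $k = 1, \ldots, l$, for a constant C independent of k, which leads to

$$ T_k = \frac{\left[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\,t_k}\right] \sqrt{\mathrm{Var}\,G(X(t_k))}}{\sum_{h=1}^{l} \left[\sqrt{1 - \rho(t_h)} + \sqrt{\rho(t_h)\,t_h}\right] \sqrt{\mathrm{Var}\,G(X(t_h))}}\; T. $$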

PROOF OF COROLLARY 3.6. The optimal value of each $m_k$ is already given in (8). By substituting the value of each $T_k$ in (11) into (8), we further obtain the expression for the optimal $n_k$.

PROOF OF THEOREM 3.7. Conditioning on each $T_k$, the minimum MSE of each $\hat\mu_k$ is given by (9), which is a decreasing function of $T_k$. Hence with the total budget constraint (4), the maximum of $\mathrm{MSE}\,\hat\mu_k$, $k = 1, \ldots, l$, is least when they are equal to each other.

It follows that

$$ T_k \propto \left[\sqrt{1 - \rho(t_k)} + \sqrt{\rho(t_k)\,t_k}\right]^2 \mathrm{Var}\,G(X(t_k)), $$

which leads to (13). A direct substitution into (9) gives the value of (12).

PROOF OF COROLLARY 4.2. Recall that $\mathrm{MSE}\,\hat F(r) = E[\hat F(r) - F(r)]^2$ by definition. By applying Theorem 25.12 in [Billingsley 1986], Theorem 4.1 and the assumption that the square of the left-hand side is uniformly integrable give us the convergence in expectation:

$$
\begin{aligned}
\frac{T}{\log T}\,\mathrm{MSE}\,\hat F(r) &= \frac{T}{\log T}\, E\bigl[\hat F(r) - F(r)\bigr]^2 = E\left[\sqrt{\frac{T}{\log T}}\,\bigl[\hat F(r) - F(r)\bigr]\right]^2 \\
&\to E\left[\sqrt{a F(r)(1 - F(r))}\; N(0, 1)\right]^2 = a F(r)(1 - F(r)),
\end{aligned}
$$

as $T \to \infty$, which gives the expression for the AMSE.



PROOF OF THEOREM 4.3. It is clear from Corollary 4.2 that $\mathrm{AMSE}\,\hat F(r)$ is strictly increasing in a. Since the condition $a \ge 1/(2\eta^*)$ is needed for the optimal convergence rate, the optimal value of a is the lower bound $1/(2\eta^*)$. A direct substitution gives the value of the AMSE.

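PROOF OF COROLLARY 4.5. By following the same argument as in the proof of Corollary 4.2, we have

$$
\begin{aligned}
T^{2/3}\,\mathrm{MSE}\,\hat F(r) &= T^{2/3}\, E\bigl[\hat F(r) - F(r)\bigr]^2 = E\left[T^{1/3}\bigl[\hat F(r) - F(r)\bigr]\right]^2 \\
&\to E\left[\sqrt{\frac{F(r)(1 - F(r))}{b}}\; N(0, 1) + \frac{f_Y^{(1)}(0)}{2a}\right]^2 = \frac{f_Y^{(1)}(0)^2}{4a^2} + \frac{F(r)(1 - F(r))}{b},
\end{aligned}
$$

which gives the expression for the AMSE:

$$ \mathrm{AMSE}\,\hat F(r) = \left[\frac{f_Y^{(1)}(0)^2}{4a^2} + \frac{F(r)(1 - F(r))}{b}\right] T^{-2/3}. $$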

PROOF OF COROLLARY 4.6. The conditions of Theorem 4.4,

$$ T^{-1/3} m \to a, \qquad T^{-2/3} n \to b, $$

together with the budget constraint (3) imply that

$$ ab = 1. $$

Thus, $b = 1/a$, and from Corollary 4.5 the AMSE is

$$ \mathrm{AMSE}\,\hat F(r) = \left[\frac{f_Y^{(1)}(0)^2}{4a^2} + a F(r)(1 - F(r))\right] T^{-2/3}. $$

The minimizing value of a can be found by one-dimensional optimization, and then b is obtained from $b = 1/a$:

$$ a = 2^{-1/3} f_Y^{(1)}(0)^{2/3} F(r)^{-1/3} (1 - F(r))^{-1/3}, $$

$$ b = 2^{1/3} f_Y^{(1)}(0)^{-2/3} F(r)^{1/3} (1 - F(r))^{1/3}. $$

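PROOF OF THEOREM 4.7. We first show that the sum of the $\mathrm{AMSE}\,\hat F_k(r)$ can indeed be bounded to the order of $O((\log T)/T)$. To see this, choose constants $0 < \tau_k < 1$ such that $\sum_{k=1}^{l} \tau_k = 1$ and let

$$ T_k = \tau_k T \tag{21} $$

for each $k = 1, \ldots, l$. Then it follows from (18) that, for sufficiently large T, since $\tau_k$ is fixed and $\log \tau_k < 0$, we have

$$
\begin{aligned}
\mathrm{AMSE}\,\hat F_k(r) &= \frac{F_k(r)(1 - F_k(r))(\log \tau_k + \log T)}{2\eta_k^* \tau_k T} \\
&\le \frac{F_k(r)(1 - F_k(r))}{2\eta_k^* \tau_k}\,\frac{\log T}{T} = O\!\left(\frac{\log T}{T}\right),
\end{aligned}
$$

and consequently

$$ \sum_{k=1}^{l} \mathrm{AMSE}\,\hat F_k(r) = O\!\left(\frac{\log T}{T}\right). $$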

Next we show that the linear growth rate of $T_k$ in (21) has to be satisfied asymptotically by the optimal $T_k$. Starting with $T_1$, consider the ratio $T_1/T$. Since $0 \le T_1/T \le 1$ and $[0, 1]$ is a compact set in $\mathbb{R}$, there must exist a convergent subsequence, i.e.,

\[
\frac{T_1'}{T'} \to \tau_1
\]

for some $0 \le \tau_1 \le 1$. By repeating this argument for $T_2, \ldots, T_l$, a subsequence can be identified that satisfies

\[
\frac{T_k}{T} \to \tau_k \in [0, 1] \tag{22}
\]

for each $k = 1, \ldots, l$. Without loss of generality, it can be assumed that (22) also holds for the full sequences of $T$ and $T_1, \ldots, T_l$. A justification of the assumption is given at the end of the proof.

Assume that $\tau_k = 0$ for some $k$, i.e., $T_k/T \to 0$ as $T \to \infty$. Then for any constant $M > 0$, $T_k/T < 1/M$ for sufficiently large $T$, i.e., $M T_k < T$. Since $(\log T)/T$ is a decreasing function of $T$ for $T > e$, it follows that for $T$ and $T_k$ sufficiently large so that $M \le T_k$,

\[
\frac{(\log T_k)/T_k}{(\log T)/T}
\ge \frac{(\log T_k)/T_k}{\log(M T_k)/(M T_k)}
\ge \frac{(\log T_k)/T_k}{2 \log T_k/(M T_k)}
= \frac{M}{2},
\]

which implies that

\[
\frac{(\log T_k)/T_k}{(\log T)/T} \to \infty.
\]

This shows that

\[
\mathrm{AMSE}\,\hat F_k(r) \in O\Big(\frac{\log T_k}{T_k}\Big)
\]

is of strictly larger order than the optimal rate $O((\log T)/T)$, and hence $\tau_k = 0$ is not optimal.
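The chain of inequalities leading to the bound $M/2$ can be illustrated numerically. This is only a sketch under assumed values: the constant `M`, the sample values of `T_k`, and the choice `T = 2 * M * T_k` are hypothetical and merely satisfy the conditions $M \le T_k$ and $M T_k < T$ used in the argument.

```python
import math


def rate(t: float) -> float:
    """The benchmark rate (log t) / t, which is decreasing for t > e."""
    return math.log(t) / t


M = 50.0  # hypothetical constant, playing the role of M in the proof
ratios = []
for T_k in (1e2, 1e4, 1e6):   # each value satisfies M <= T_k
    T = 2.0 * M * T_k         # any T with M * T_k < T would do
    ratios.append(rate(T_k) / rate(T))

print(ratios)  # every ratio should be at least M / 2
```

Since $M$ is arbitrary, the same check with a larger `M` shows the ratio exceeding any fixed level, which is the divergence used in the proof.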

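Now we determine the optimal values of $\tau_k$, $k = 1, \ldots, l$. Thus far we know that $T_k \sim \tau_k T$ with $\tau_k \ne 0$ for each $k$, so that

\[
\mathrm{AMSE}\,\hat F_k(r) \sim \frac{F_k(r)(1 - F_k(r))}{2\eta_k^* \tau_k}\,\frac{\log T}{T}.
\]

Hence the sum of the AMSEs satisfies

\[
\sum_{k=1}^{l} \mathrm{AMSE}\,\hat F_k(r)
\sim \sum_{k=1}^{l} \frac{F_k(r)(1 - F_k(r))}{2\eta_k^* \tau_k}\,\frac{\log T}{T}
= \Big(\sum_{k=1}^{l} \tau_k\Big)\Big(\sum_{k=1}^{l} \frac{F_k(r)(1 - F_k(r))}{2\eta_k^* \tau_k}\Big)\frac{\log T}{T}
\ge \Big[\sum_{k=1}^{l} \sqrt{\frac{F_k(r)(1 - F_k(r))}{2\eta_k^*}}\Big]^2 \frac{\log T}{T}
\]

by the Cauchy-Schwarz inequality, with equality if and only if

\[
\tau_k = C\,\frac{F_k(r)(1 - F_k(r))}{2\eta_k^* \tau_k}
\]

for some $C$, which leads to the optimal solution

\[
\tau_k^* = \frac{\sqrt{F_k(r)(1 - F_k(r))/2\eta_k^*}}{\sum_{h=1}^{l} \sqrt{F_h(r)(1 - F_h(r))/2\eta_h^*}}.
\]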
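The optimal fractions and the Cauchy-Schwarz lower bound can be computed directly. The sketch below is illustrative only: the helper names and the values of $F_k(r)$ and $\eta_k^*$ are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence


def sum_optimal_fractions(F: Sequence[float], eta: Sequence[float]) -> List[float]:
    """tau_k proportional to sqrt(F_k (1 - F_k) / (2 eta_k)), normalized to sum to 1."""
    w = [math.sqrt(Fk * (1.0 - Fk) / (2.0 * ek)) for Fk, ek in zip(F, eta)]
    total = sum(w)
    return [wk / total for wk in w]


def sum_constant(tau: Sequence[float], F: Sequence[float], eta: Sequence[float]) -> float:
    """Coefficient of (log T)/T in the sum of the AMSEs for budget fractions tau."""
    return sum(Fk * (1.0 - Fk) / (2.0 * ek * tk) for Fk, ek, tk in zip(F, eta, tau))


# Hypothetical inputs for l = 3 budget levels.
F = [0.2, 0.5, 0.9]
eta = [1.0, 0.5, 2.0]

tau_star = sum_optimal_fractions(F, eta)
lower_bound = sum(math.sqrt(Fk * (1.0 - Fk) / (2.0 * ek)) for Fk, ek in zip(F, eta)) ** 2
uniform = [1.0 / len(F)] * len(F)

print(tau_star)
print(sum_constant(tau_star, F, eta), lower_bound, sum_constant(uniform, F, eta))
```

With these inputs the constant at `tau_star` coincides with the lower bound, while the uniform split gives a value no smaller than it.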

It is only left to argue that (22) must hold for the entire sequence with $\tau_k = \tau_k^*$ for each $k = 1, \ldots, l$. If this is not true for some $k'$, then there must exist a second subsequence such that

\[
\frac{T_{k'}}{T} \to \tau_{k'}
\]

for some $\tau_{k'} \ne \tau_{k'}^*$. Recursively restricting to a further subsequence, it can be assumed without loss of generality that $T_k/T \to \tau_k$ for each $k = 1, \ldots, l$. But $\tau \ne \tau^*$ and $\tau^*$ is the unique optimal solution, so the new subsequence must be suboptimal, contradicting the optimality of the full original sequence.

PROOF OF THEOREM 4.8. Using the same argument as in the proof of Theorem 4.7, (22) must hold for an optimal sequence of $T_k$ values, $k = 1, 2, \ldots, l$. It is then left to find the optimal values of $\tau_k$.

Recall the $\mathrm{AMSE}\,\hat F_k(r)$ given by (19):

\[
\mathrm{AMSE}\,\hat F_k(r) = \frac{3}{2\sqrt[3]{2}}\, f_{Y_k}^{(1)}(0)^{2/3}\, F_k(r)^{2/3} (1 - F_k(r))^{2/3}\, T_k^{-2/3}.
\]

Under the constraint

\[
\sum_{k=1}^{l} \tau_k = 1,
\]

the optimal values of $\tau_k$ must satisfy

\[
\tau_k^{2/3} \propto \frac{3}{2\sqrt[3]{2}}\, f_{Y_k}^{(1)}(0)^{2/3}\, F_k(r)^{2/3} (1 - F_k(r))^{2/3},
\]

which leads to the solution

\[
\tau_k = \frac{f_{Y_k}^{(1)}(0)\, F_k(r)(1 - F_k(r))}{\sum_{h=1}^{l} f_{Y_h}^{(1)}(0)\, F_h(r)(1 - F_h(r))},
\]

and a direct substitution gives the minimum value of the AMSEs.
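The displayed solution can be evaluated numerically. The sketch below does not restate the objective of Theorem 4.8, which lies outside this section; it only checks that, with $\tau_k$ proportional to $f_{Y_k}^{(1)}(0) F_k(r)(1 - F_k(r))$, the leading constants of $\mathrm{AMSE}\,\hat F_k(r)$ coincide across $k$. All names and input values are hypothetical, and the derivative values are assumed positive.

```python
from typing import List, Sequence

COEF = 3.0 / (2.0 * 2.0 ** (1.0 / 3.0))  # the factor 3 / (2 * cbrt(2)) from (19)


def fractions(f1: Sequence[float], F: Sequence[float]) -> List[float]:
    """tau_k proportional to f1_k * F_k * (1 - F_k); f1_k assumed positive."""
    w = [fk * Fk * (1.0 - Fk) for fk, Fk in zip(f1, F)]
    total = sum(w)
    return [wk / total for wk in w]


def amse_constants(tau: Sequence[float], f1: Sequence[float], F: Sequence[float]) -> List[float]:
    """Coefficient of T^(-2/3) in each AMSE when T_k = tau_k * T."""
    return [
        COEF * (fk * Fk * (1.0 - Fk)) ** (2.0 / 3.0) * tk ** (-2.0 / 3.0)
        for fk, Fk, tk in zip(f1, F, tau)
    ]


# Hypothetical inputs for l = 3 budget levels.
f1 = [0.5, 1.2, 2.0]
F = [0.3, 0.6, 0.8]

tau = fractions(f1, F)
consts = amse_constants(tau, f1, F)
print(tau)
print(consts)  # the three constants agree up to rounding
```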

PROOF OF COROLLARY 5.2. Notice that the asymptotic distribution of $\hat Q(p)$ in Theorem 5.1 is exactly the same as that of $\hat F(r)$ in Theorem 4.4, up to a constant factor of $1/F'(q)$ and with $F(r)$ substituted by $p$. Therefore the AMSE of $\hat Q(p)$ is

\[
\mathrm{AMSE}\,\hat Q(p) = \frac{\mathrm{AMSE}\,\hat F(q)}{F'(q)^2}
= \Big[\frac{f_Y^{(1)}(0)^2}{4a^2} + \frac{p(1 - p)}{b}\Big]\frac{T^{-2/3}}{F'(q)^2}.
\]

PROOF OF COROLLARY 5.3. Let $F(r)$ be substituted by $p$. Corollary 5.2 has shown that $\mathrm{AMSE}\,\hat Q(p)$ takes the same form as $\mathrm{AMSE}\,\hat F(r)$, only scaled by a constant factor. It immediately follows that their optimal values also differ by the same factor, and the optimal values of $a$ and $b$ do not change.

PROOF OF THEOREM 5.4. Since the optimal value of each $\mathrm{AMSE}\,\hat Q_k(p)$ in Corollary 5.3 is of the same order as $\mathrm{AMSE}\,\hat F_k(r)$ in Corollary 4.6, the theorem follows by simply substituting in Theorem 4.8 with the new constant factor.

PROOF OF LEMMA 6.2. By the definition of covariance,

Cov (I(Zi(tk) ≤ qk + sk(T )), I(Zi(th) ≤ qh + sh(T )))

= E [I(Zi(tk) ≤ qk + sk(T )) · I(Zi(th) ≤ qh + sh(T ))]

− E I(Zi(tk) ≤ qk + sk(T )) · E I(Zi(th) ≤ qh + sh(T )).

We compute each of the expectations separately. First,

E I(Zi(tk) ≤ qk + sk(T )) = E [E [ I(Zi(tk) ≤ qk + sk(T ))|X(tk)]]

= E [P [Zi(tk) ≤ qk + sk(T )|X(tk)]] .

Conditional on $X(t_k)$, the weak law of large numbers [Billingsley 1986] ensures that

\[
Z_i(t_k) = \frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X(t_k)) \Rightarrow g(X(t_k)) = V(t_k)
\]

as $m_k \to \infty$, and Slutsky's Theorem [Billingsley 1986] gives

\[
Z_i(t_k) - s_k(T) \Rightarrow V(t_k).
\]

We assumed that P (V (tk) = qk) = 0, and hence

P [Zi(tk) ≤ qk + sk(T )|X(tk)] = P [Zi(tk)− sk(T ) ≤ qk|X(tk)]

→ P [V (tk) ≤ qk|X(tk)] .

Finally, by the above convergence and the dominated convergence theorem,

E I(Zi(tk) ≤ qk + sk(T )) = E [P [Zi(tk) ≤ qk + sk(T )|X(tk)]]

→ E [P [V (tk) ≤ qk|X(tk)]] = E I(V (tk) ≤ qk).

This also shows that

E I(Zi(th) ≤ qh + sh(T ))→ E I(V (th) ≤ qh).

Applying almost exactly the same argument establishes that

E [I(Zi(tk) ≤ qk + sk(T )) · I(Zi(th) ≤ qh + sh(T ))]→ E [I(V (tk) ≤ qk) · I(V (th) ≤ qh)]

as T →∞ and this completes the proof.

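PROOF OF THEOREM 6.6. From the condition $n \sim bT/\log T$ it follows immediately that

\[
\sqrt{\frac{T}{n \log T}} \to b^{-1/2}
\]

as $T \to \infty$. Therefore, by Slutsky's theorem, it suffices to show that

\[
\sqrt{n}\,\big[\hat F_k(r_k) - F_k(r_k)\big]_k \Rightarrow \mathcal{N}(0, \Lambda) \tag{23}
\]

as $T \to \infty$. The Cramér-Wold device [Billingsley 1986, p. 383] states that a necessary and sufficient condition for (23) to hold is that

\[
\sqrt{n} \sum_{k=1}^{l} \zeta_k \big[\hat F_k(r_k) - F_k(r_k)\big] \Rightarrow \zeta^\top \mathcal{N}(0, \Lambda)
\]

for any $\zeta \in \mathbb{R}^l$. Adopt the notation

\[
F_\zeta(r) := \sum_{k=1}^{l} \zeta_k \big[\hat F_k(r_k) - F_k(r_k)\big].
\]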

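For each $k = 1, \ldots, l$ and $i = 1, \ldots, n_k$, define

\[
\chi_{ki} := I\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big),
\]

and

\[
\tilde\chi_{ki} := \chi_{ki} - E\chi_{ki} = \chi_{ki} - P\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big)
\]

to be the centered version of $\chi_{ki}$. Then

\begin{align*}
\sqrt{n}\,F_\zeta(r) &= \sqrt{n} \sum_{k=1}^{l} \zeta_k \big[\hat F_k(r_k) - F_k(r_k)\big] \\
&= \sum_{k=1}^{l} \zeta_k n^{1/2} \Big[\frac{1}{n} \sum_{i=1}^{n} \tilde\chi_{ki} + P\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big) - F_k(r_k)\Big] \\
&= \sum_{k=1}^{l} \zeta_k n^{-1/2} \sum_{i=1}^{n} \tilde\chi_{ki} \tag{24} \\
&\quad + \sum_{k=1}^{l} \zeta_k n^{1/2} \Big[P\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big) - F_k(r_k)\Big]. \tag{25}
\end{align*}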

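For (25), for each $k = 1, \ldots, l$, the proof of Theorem 4.1 in Lee and Glynn [2003] argues that each of the summands satisfies

\[
n^{1/2} \Big[P\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big) - F_k(r_k)\Big] \Rightarrow 0.
\]

By Slutsky's Theorem, it then suffices to show that for (24),

\[
\sum_{k=1}^{l} \zeta_k n^{-1/2} \sum_{i=1}^{n} \tilde\chi_{ki} \Rightarrow \zeta^\top \mathcal{N}(0, \Lambda),
\]

where $\tilde\chi_{ki}$ denotes the centered version of $\chi_{ki}$. Define $S_i := \sum_{k=1}^{l} \zeta_k \tilde\chi_{ki}$ for each $i = 1, \ldots, n$, and then (24) becomes

\[
\sum_{k=1}^{l} \zeta_k n^{-1/2} \sum_{i=1}^{n} \tilde\chi_{ki} = n^{-1/2} \sum_{i=1}^{n} S_i.
\]

If the sequence $\{S_i : i = 1, \ldots, n\}$ satisfies the conditions in Lemma D.1 [Lee and Glynn 2003] with $\sigma^2 = \zeta^\top \Lambda \zeta$, then the Lindeberg-Feller Theorem holds here; cf. [Billingsley 1986]. That is, as $T \to \infty$, we have the convergence

\[
\sum_{k=1}^{l} \zeta_k n^{-1/2} \sum_{i=1}^{n} \tilde\chi_{ki} \Rightarrow \zeta^\top \mathcal{N}(0, \Lambda).
\]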

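Now we verify that $\{S_i : i = 1, \ldots, n\}$ indeed satisfies the appropriate conditions in Lemma D.1:

(a) The sample path quantities $X_i(t_k)$, $k = 1, \ldots, l$, are independent for different $i = 1, \ldots, n$. Each of the sample paths is estimated from the same simulation-optimization algorithm, and hence has the same distribution. Since each $S_i$ is defined as the same function of $X_i(t_k)$, $k = 1, \ldots, l$, the values of $S_i$, $i = 1, \ldots, n$, are independent and identically distributed.

(b) Since $\tilde\chi_{ki}$ is the centered version of $\chi_{ki}$, we have $E\tilde\chi_{ki} = 0$ for each $k = 1, \ldots, l$. It then follows that $E S_i = 0$. Meanwhile, for each $i = 1, \ldots, n$,

\begin{align*}
\mathrm{Var}\,S_i &= \sum_{k=1}^{l} \zeta_k^2\, \mathrm{Var}\,\tilde\chi_{ki} + 2 \sum_{1 \le k < h \le l} \zeta_k \zeta_h\, \mathrm{Cov}(\tilde\chi_{ki}, \tilde\chi_{hi}) \\
&= \sum_{k=1}^{l} \zeta_k^2\, \mathrm{Var}\,\chi_{ki} + 2 \sum_{1 \le k < h \le l} \zeta_k \zeta_h\, \mathrm{Cov}(\chi_{ki}, \chi_{hi}) \\
&= \sum_{k=1}^{l} \zeta_k^2\, (\Lambda_T)_{kk} + 2 \sum_{1 \le k < h \le l} \zeta_k \zeta_h\, (\Lambda_T)_{kh} = \zeta^\top \Lambda_T \zeta.
\end{align*}

(c) Corollary 6.3 established that $\Lambda_T \to \Lambda$ entry-wise. As a result,

\[
\lim_{T \to \infty} \Lambda_T = \Lambda.
\]

(d) Notice that, for each $k = 1, \ldots, l$,

\[
|\tilde\chi_{ki}| = |\chi_{ki} - E\chi_{ki}|
= \Big| I\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big) - P\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big) \Big| \le 1.
\]

Therefore, $|S_i| \le \sum_{k=1}^{l} |\zeta_k| \cdot |\tilde\chi_{ki}| \le \sum_{k=1}^{l} |\zeta_k|$. Hence $S_i$, $i = 1, \ldots, n$, are uniformly bounded, and therefore uniformly integrable.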
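Conditions (b) and (d) can be checked exactly on a small example. The sketch below assumes $l = 2$ and a hypothetical joint distribution for a pair of indicators; the names `pmf` and `zeta` and all numerical values are illustrative and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical joint pmf of two indicators (chi_1, chi_2) for a single sample path i.
pmf: Dict[Tuple[int, int], float] = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
zeta: List[float] = [1.5, -2.0]  # arbitrary multipliers

# Means and covariance matrix of the indicators (the role played by Lambda_T).
mean = [sum(p * x[k] for x, p in pmf.items()) for k in range(2)]
cov = [
    [sum(p * (x[k] - mean[k]) * (x[h] - mean[h]) for x, p in pmf.items()) for h in range(2)]
    for k in range(2)
]


def s_value(x: Tuple[int, int]) -> float:
    """S = sum_k zeta_k * (chi_k - E chi_k) at the outcome x."""
    return sum(z * (x[k] - mean[k]) for k, z in enumerate(zeta))


mean_S = sum(p * s_value(x) for x, p in pmf.items())
var_S = sum(p * s_value(x) ** 2 for x, p in pmf.items())  # valid because E S = 0
quad_form = sum(zeta[k] * cov[k][h] * zeta[h] for k in range(2) for h in range(2))
max_abs_S = max(abs(s_value(x)) for x in pmf)
bound = sum(abs(z) for z in zeta)

print(mean_S, var_S, quad_form, max_abs_S, bound)
```

The variance of $S$ matches the quadratic form in the covariance matrix, and $|S|$ never exceeds $\sum_k |\zeta_k|$, as claimed in (b) and (d).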

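PROOF OF THEOREM 6.7. First consider the context of Theorem 6.6, and recall that $\hat F_k(r_k) := \frac{1}{n} \sum_{i=1}^{n} I(Z_i(t_k) \le r_k)$ as defined in (17). In this context, $b_k$ does not depend on $k$, so $b$ can be viewed as a scalar. Scale (20) by $n(\log T)/T \sim b$ to get

\[
\sqrt{\frac{\log T}{T}} \Big[\sum_{i=1}^{n} I(Z_i(t_k) \le r_k) - n F(r_k)\Big]_k \Rightarrow \mathcal{N}(0, b\Lambda). \tag{26}
\]

Recall that the sample paths $(X_i(t_k), k = 1, \ldots, l)$ are independent in $i$. To generalize to the case with distinct $n_k$, decompose the vector into independent summands:

\[
\Big[\sum_{i=1}^{n_k} I(Z_i(t_k) \le r_k) - n_k F(r_k)\Big]_k
= \sum_{k=1}^{l}
\begin{pmatrix}
\sum_{i=n_{k+1}+1}^{n_k} I(Z_i(t_1) \le r_1) - (n_k - n_{k+1}) F(r_1) \\
\vdots \\
\sum_{i=n_{k+1}+1}^{n_k} I(Z_i(t_k) \le r_k) - (n_k - n_{k+1}) F(r_k) \\
0 \\
\vdots \\
0
\end{pmatrix},
\]

where we define $n_{l+1} = 0$. For each summand, notice that $n_k - n_{k+1} \sim (b_k - b_{k+1}) T/\log T$ as $T \to \infty$. Therefore (26) implies that

\[
\sqrt{\frac{\log T}{T}}
\begin{pmatrix}
\sum_{i=n_{k+1}+1}^{n_k} I(Z_i(t_1) \le r_1) - (n_k - n_{k+1}) F(r_1) \\
\vdots \\
\sum_{i=n_{k+1}+1}^{n_k} I(Z_i(t_k) \le r_k) - (n_k - n_{k+1}) F(r_k) \\
0 \\
\vdots \\
0
\end{pmatrix}
\Rightarrow \mathcal{N}(0, (b_k - b_{k+1}) I_k \circ \Lambda),
\]

where $I_k$ is an $l$-dimensional square matrix with the top-left $k \times k$ block having all entries equal to 1 and the other entries equalling 0. The effect of $I_k$ is to preserve the top-left $k \times k$ block and to truncate the rest. By summing over $k$, we get

\[
\sqrt{\frac{\log T}{T}} \Big[\sum_{i=1}^{n_k} I(Z_i(t_k) \le r_k) - n_k F(r_k)\Big]_k
\Rightarrow \sum_{k=1}^{l} \mathcal{N}_k(0, (b_k - b_{k+1}) I_k \circ \Lambda) = \mathcal{N}(0, B_{\max} \circ \Lambda),
\]

where each $\mathcal{N}_k$ represents an independent normal random vector, and $B_{\max}$ is an $l$-dimensional square matrix with entries $B_{\max}(k, h) := b_{k \vee h}$. Divide the $k$th entry by $(\log T)/T \cdot n_k \sim b_k$ for each $k$ to get

\[
\sqrt{\frac{T}{\log T}} \big[\hat F_k(r_k) - F(r_k)\big]_k \Rightarrow b^{-1} \circ \mathcal{N}(0, B_{\max} \circ \Lambda) = \mathcal{N}(0, B_{\min} \circ \Lambda).
\]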
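The two matrix identities used above can be checked on a small example. This sketch assumes nonincreasing $b_1 \ge \cdots \ge b_l > 0$ (so that $n_k - n_{k+1} \ge 0$) and uses hypothetical values. The definition of $B_{\min}$ lies outside this section; the code only reports what the entrywise rescaling of $B_{\max}$ by $1/(b_k b_h)$ produces, namely $1/b_{k \wedge h}$.

```python
from typing import List

Matrix = List[List[float]]


def block_ones(l: int, k: int) -> Matrix:
    """I_k: l-by-l matrix with ones in the top-left k-by-k block, zeros elsewhere."""
    return [[1.0 if g < k and h < k else 0.0 for h in range(l)] for g in range(l)]


def b_max_by_sum(b: List[float]) -> Matrix:
    """sum over k of (b_k - b_{k+1}) * I_k, with b_{l+1} = 0."""
    l = len(b)
    ext = list(b) + [0.0]
    out = [[0.0] * l for _ in range(l)]
    for k in range(1, l + 1):
        weight = ext[k - 1] - ext[k]
        ones = block_ones(l, k)
        for g in range(l):
            for h in range(l):
                out[g][h] += weight * ones[g][h]
    return out


# Hypothetical nonincreasing limits b_1 >= b_2 >= b_3.
b = [4.0, 2.0, 1.0]
l = len(b)

summed = b_max_by_sum(b)
direct = [[b[max(g, h)] for h in range(l)] for g in range(l)]  # b at the larger index
rescaled = [[summed[g][h] / (b[g] * b[h]) for h in range(l)] for g in range(l)]
inv_min = [[1.0 / b[min(g, h)] for h in range(l)] for g in range(l)]

print(summed)
print(rescaled)
```

When all $b_k$ equal a common $b$, the rescaled matrix has every entry equal to $1/b$, matching the covariance $b^{-1}\Lambda$ of the equal-$n$ case.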

PROOF OF THEOREM 6.8. From the condition $T^{-2/3} n \to b$ it follows immediately that

\[
\frac{T^{1/3}}{\sqrt{n}} \to b^{-1/2}.
\]

Therefore, by Slutsky's theorem, it suffices to show that

\[
\sqrt{n}\,\big[\hat F_k(r_k) - F_k(r_k)\big]_k \Rightarrow \mathcal{N}(0, \Lambda) + \Big[\frac{f_{Y_k}^{(1)}(0) \sqrt{b}}{2a_k}\Big]_k.
\]

Apply the Cramér-Wold theorem in the same way as in the proof of Theorem 6.6, and it is then sufficient to consider the asymptotic property of $F_\zeta(r) := \sum_{k=1}^{l} \zeta_k \big[\hat F_k(r_k) - F_k(r_k)\big]$. Adopting the same notation as in the proof of Theorem 6.6, with $\tilde\chi_{ki}$ the centered version of $\chi_{ki}$, we have the following identity:

\begin{align*}
\sqrt{n}\,F_\zeta(r) &= \sum_{k=1}^{l} \zeta_k n^{-1/2} \sum_{i=1}^{n} \tilde\chi_{ki} \tag{27} \\
&\quad + \sum_{k=1}^{l} \zeta_k n^{1/2} \Big[P\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big) - F_k(r_k)\Big]. \tag{28}
\end{align*}

As in the proof of Theorem 6.6, (27) converges in distribution to $\zeta^\top \mathcal{N}(0, \Lambda)$. Hence it remains to discuss the asymptotic property of (28). To this end, for each $k = 1, \ldots, l$, the proof of Theorem 4.4 [Lee 1998, p. 56] ensures that

\[
n^{1/2} \Big[P\Big(\frac{1}{m_k} \sum_{j=1}^{m_k} G_j(X_i(t_k)) \le r_k\Big) - F_k(r_k)\Big] \Rightarrow \frac{f_{Y_k}^{(1)}(0) \sqrt{b}}{2a_k},
\]

completing the proof.

PROOF OF THEOREM 6.9. Apply the technique used in the proof of Theorem 6.7 to decompose the random vector in question, and the theorem follows directly from the result of Theorem 6.8.

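PROOF OF THEOREM 6.10. We follow the argument in Lee [1998, p. 69]. For each $k = 1, \ldots, l$, let $A_k$ be a constant whose value is determined later. For $s \in \mathbb{R}^l$, define

\begin{align*}
G(s, m, n) &:= P\Big(\frac{n^{1/2} (\hat Q_k(p_k) - q_k(p_k))}{A_k} \le s_k, \forall k\Big) \\
&= P\big(\hat Q_k(p_k) \le q_k(p_k) + A_k s_k n^{-1/2}, \forall k\big) \\
&= P\big(p_k \le \hat F_k(q_k(p_k) + A_k s_k n^{-1/2}), \forall k\big) \\
&= P\Big(n p_k \le \sum_{i=1}^{n} \chi_{ki}, \forall k\Big) \\
&= P\Big(0 \le n^{-1/2} \sum_{i=1}^{n} (\chi_{ki} - p_k), \forall k\Big), \tag{29}
\end{align*}

where $\chi_{ki} := I(Z_i(t_k) \le q_k(p_k) + A_k s_k n^{-1/2})$ for each $k = 1, \ldots, l$ and $i = 1, \ldots, n$.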

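To compute the value of (29), consider the joint distribution of $n^{-1/2} \sum_{i=1}^{n} (\chi_{ki} - p_k)$ for all $k$. We will apply the Cramér-Wold device. For some $A_0 = A_0(T)$ that we specify later, a vector of arbitrary multipliers $\zeta$, and any $s_0 \in \mathbb{R}$, define

\begin{align*}
G(s_0) &:= P\Big(s_0 \le \frac{\sum_{k=1}^{l} \zeta_k n^{-1/2} \sum_{i=1}^{n} (\chi_{ki} - p_k)}{A_0}\Big) \\
&= P\Big(\frac{\sum_{i=1}^{n} \sum_{k=1}^{l} \zeta_k (\chi_{ki} - \Delta_k)}{n^{1/2} A_0} \ge s_0 - \frac{n^{1/2} \sum_{k=1}^{l} \zeta_k (\Delta_k - p_k)}{A_0}\Big),
\end{align*}

where $\Delta_k := E\chi_{ki} = P(Z_i(t_k) \le q_k(p_k) + A_k s_k n^{-1/2})$. Let

\[
A_0 := \mathrm{Var}^{1/2}\Big(\sum_{k=1}^{l} \zeta_k \chi_{ki}\Big) = \sqrt{\zeta^\top \Lambda_T \zeta},
\]

where $\Lambda_T$ is the covariance matrix in Definition 6.4 (with an appropriate value for $r_k$). Define

\[
Z_n := \sum_{i=1}^{n} \sum_{k=1}^{l} \zeta_k \chi_{ki}
\]

and its standardized form

\[
Z_n^* := \frac{\sum_{i=1}^{n} \sum_{k=1}^{l} \zeta_k (\chi_{ki} - \Delta_k)}{n^{1/2} \sqrt{\zeta^\top \Lambda_T \zeta}}.
\]

We have shown that

\[
G(s_0) = P(Z_n^* \ge -c_{m,n}) = 1 - P(Z_n^* < -c_{m,n}) = \Phi(c_{m,n}) + \Phi(-c_{m,n}) - P(Z_n^* < -c_{m,n}),
\]

where

\[
c_{m,n} := -s_0 + \frac{n^{1/2} \sum_{k=1}^{l} \zeta_k (\Delta_k - p_k)}{\sqrt{\zeta^\top \Lambda_T \zeta}}.
\]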

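Now utilizing the Berry-Esseen inequality, we write

\[
\sup_{-\infty < x < \infty} |P(Z_n^* < x) - \Phi(x)| \le C\,\frac{\rho}{\sigma^3 n^{1/2}},
\]

where $C$ is a universal constant, $\sigma^2 := \mathrm{Var}\,Z_1 = \zeta^\top \Lambda_T \zeta$, and

\[
\rho := E\Big|Z_1 - \sum_{k=1}^{l} \zeta_k \Delta_k\Big|^3 \le \Big(\sum_{k=1}^{l} |\zeta_k|\Big)^3.
\]

Hence we have

\[
|P(Z_n^* < -c_{m,n}) - \Phi(-c_{m,n})| \le \sup_{-\infty < x < \infty} |P(Z_n^* < x) - \Phi(x)| \le C\,\frac{\rho}{\sigma^3 n^{1/2}} \to 0
\]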

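as $T \to \infty$.

It remains to investigate $c_{m,n}$ as $m, n \to \infty$. But $c_{m,n}$ is a linear combination of analogous quantities studied in Lee [1998, p. 71], and so it satisfies an asymptotic property derived there, namely that

\begin{align*}
c_{m,n} &= -s_0 + \frac{n^{1/2} \sum_{k=1}^{l} \zeta_k (\Delta_k - p_k)}{\sqrt{\zeta^\top \Lambda_T \zeta}}
= -s_0 + \sum_{k=1}^{l} \zeta_k \frac{n^{1/2} (\Delta_k - p_k)}{\sqrt{\zeta^\top \Lambda_T \zeta}} \\
&\to -s_0 + \sum_{k=1}^{l} \zeta_k \Big[\frac{F_k'(q_k(p_k))}{\sqrt{\zeta^\top \Lambda \zeta}}\, s_k A_k + \frac{\sqrt{b}}{2a_k}\,\frac{1}{\sqrt{\zeta^\top \Lambda \zeta}}\, f_{Y_k}^{(1)}(0)\Big].
\end{align*}

Set $A_k := \sqrt{b}/F_k'(q_k(p_k))$ for each $k = 1, \ldots, l$. Then, for any $s_0 \in \mathbb{R}$,

\begin{align*}
\lim_{T \to \infty} G(s_0) = \lim_{T \to \infty} \Phi(c_{m,n})
&= \Phi\Big(-s_0 + \frac{1}{\sqrt{\zeta^\top \Lambda \zeta}} \sum_{k=1}^{l} \zeta_k \sqrt{b}\,\Big[s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big]\Big) \\
&= 1 - \Phi\Big(s_0 - \frac{1}{\sqrt{\zeta^\top \Lambda \zeta}} \sum_{k=1}^{l} \zeta_k \sqrt{b}\,\Big[s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big]\Big),
\end{align*}

which allows us to conclude that

\[
\sum_{k=1}^{l} \zeta_k n^{-1/2} \sum_{i=1}^{n} (\chi_{ki} - p_k) \Rightarrow \mathcal{N}(0, \zeta^\top \Lambda \zeta) + \sum_{k=1}^{l} \zeta_k \sqrt{b}\,\Big[s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big].
\]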



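Since $\zeta \in \mathbb{R}^l$ was arbitrary, the Cramér-Wold device ensures that

\[
\Big[n^{-1/2} \sum_{i=1}^{n} (\chi_{ki} - p_k)\Big]_k \Rightarrow \mathcal{N}(0, \Lambda) + \Big[\sqrt{b}\,\Big(s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big)\Big]_k. \tag{30}
\]

Therefore, as $T \to \infty$,

\[
G(s, m, n) \to P\Big(0 \le \mathcal{N}(0, \Lambda) + \Big[\sqrt{b}\,\Big(s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big)\Big]_k\Big)
= P\Big(\mathcal{N}(0, b^{-1}\Lambda) - \Big[\frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big]_k \le s\Big),
\]

which means that

\[
\big[T^{1/3} (\hat Q_k(p_k) - q_k(p_k))\, F_k'(q_k(p_k))\big]_k \Rightarrow \mathcal{N}(0, b^{-1}\Lambda) - \Big[\frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big]_k.
\]

The theorem then follows by dividing each entry by $F_k'(q_k(p_k))$.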

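PROOF OF THEOREM 6.11. In the context of Theorem 6.10, $b_k$ does not depend on $k$, so $b$ can be viewed as a scalar. Recall the intermediate result (30) that appeared in the proof of Theorem 6.10, which can be re-written as

\[
T^{-1/3} \Big[\sum_{i=1}^{n} (\chi_{ki} - p_k)\Big]_k \Rightarrow \mathcal{N}(0, b\Lambda) + b\,\Big[s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big]_k.
\]

If the assumption $n_k = n$ is relaxed, then the left-hand side can be decomposed in the same way as in the proof of Theorem 6.7 and Theorem 6.9:

\[
T^{-1/3} \Big[\sum_{i=1}^{n_k} (\chi_{ki} - p_k)\Big]_k
= \sum_{k=1}^{l} T^{-1/3}
\begin{pmatrix}
\sum_{i=n_{k+1}+1}^{n_k} (\chi_{1i} - p_1) \\
\vdots \\
\sum_{i=n_{k+1}+1}^{n_k} (\chi_{ki} - p_k) \\
0 \\
\vdots \\
0
\end{pmatrix}.
\]

Each summand then has the same asymptotic property:

\[
T^{-1/3}
\begin{pmatrix}
\sum_{i=n_{k+1}+1}^{n_k} (\chi_{1i} - p_1) \\
\vdots \\
\sum_{i=n_{k+1}+1}^{n_k} (\chi_{ki} - p_k) \\
0 \\
\vdots \\
0
\end{pmatrix}
\Rightarrow \mathcal{N}(0, (b_k - b_{k+1}) I_k \circ \Lambda) +
\begin{pmatrix}
(b_k - b_{k+1}) \Big(s_1 + \frac{f_{Y_1}^{(1)}(0)}{2a_1}\Big) \\
\vdots \\
(b_k - b_{k+1}) \Big(s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big) \\
0 \\
\vdots \\
0
\end{pmatrix}.
\]

(Take $b_{l+1} = 0$.) Summing over $k = 1, \ldots, l$, we get

\[
T^{-1/3} \Big[\sum_{i=1}^{n_k} (\chi_{ki} - p_k)\Big]_k \Rightarrow \mathcal{N}(0, B_{\max} \circ \Lambda) + \Big[b_k \Big(s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big)\Big]_k. \tag{31}
\]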



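Also, modify the definition of $G(s, m, n)$:

\[
G(s, m, n) := P\Big(\frac{n_k^{1/2} (\hat Q_k(p_k) - q_k(p_k))}{A_k} \le s_k, \forall k\Big)
\]

with $A_k := \sqrt{b_k}/F_k'(q_k(p_k))$. Follow the analysis in the proof of Theorem 6.10, and apply (31), to get

\begin{align*}
G(s, m, n) &= P\Big(0 \le T^{-1/3} \sum_{i=1}^{n_k} (\chi_{ki} - p_k), \forall k\Big) \\
&\to P\Big(0 \le \mathcal{N}(0, B_{\max} \circ \Lambda) + \Big[b_k \Big(s_k + \frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big)\Big]_k\Big) \\
&= P\Big(\mathcal{N}(0, B_{\min} \circ \Lambda) - \Big[\frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big]_k \le s\Big),
\end{align*}

which means that

\[
T^{1/3} \big[(\hat Q_k(p_k) - q_k(p_k))\, F_k'(q_k(p_k))\big]_k \Rightarrow \mathcal{N}(0, B_{\min} \circ \Lambda) - \Big[\frac{f_{Y_k}^{(1)}(0)}{2a_k}\Big]_k.
\]

Hence the theorem follows by dividing each entry by $F_k'(q_k(p_k))$.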

