
Design and Analysis of Computer Experiments

Hongquan Xu, UCLA Department of Statistics

Uniform Projection Designs

Outline

1 Introduction

2 Modeling Computer Experiments

3 Design for Computer Experiments

4 Uniform Projection Designs

5 Conclusions

6 References


Computer Experiments

What are computer experiments?

Computer experiments are increasingly being used to explore the behavior of complex physical systems.

A computer model is a large computer code that implements a complex mathematical model of a physical process, e.g., a simultaneous differential equation solver, finite element analysis, or computational fluid dynamics.


A typical engineering model (P.1 of 3, in Liao and Wang, 1995)

[Diagram: inputs x1, ... → System / Meta-model → output y]

Figure 1: Computer experiment

Characteristics of computer experiments

Mostly deterministic (lack of random error)

May take hours or even days to produce a single output

Many input variables

The performance of the predictor depends upon the choice of the training data (design).

Principles in traditional DOE are irrelevant

Replication

Blocking

Randomization

Kriging Modeling: Gaussian Process Model

For $x \in \mathbb{R}^m$, treat the deterministic response $y(x)$ as a realization of a stochastic process

$$Y(x) = \sum_{j=1}^{k} \beta_j f_j(x) + Z(x),$$

where $f_j(x)$ are known functions, $\beta_j$ are unknown parameters and $Z(\cdot)$ is a stochastic process with mean 0 and covariance

$$\operatorname{cov}(Z(w), Z(x)) = \sigma^2 R(w, x).$$


Prediction

Given a design $S = \{s_1, \ldots, s_n\}$ and data $y_S = (y(s_1), \ldots, y(s_n))'$, consider the linear predictor

$$\hat{y}(x) = c'(x)\, y_S.$$

Frequentists replace $y_S$ by the random vector $Y_S = (Y(s_1), \ldots, Y(s_n))'$ and compute the MSE.

The Best Linear Unbiased Predictor (BLUP): choose $c(x)$ to minimize

$$\mathrm{MSE}[\hat{y}(x)] = E[c'(x) Y_S - Y(x)]^2$$

subject to

$$E[\hat{y}(x)] = E[c'(x) Y_S] = E[Y(x)].$$

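Notation

$$f(x) = (f_1(x), \ldots, f_k(x))'$$

$$F = (f(s_1), \ldots, f(s_n))' = (f_j(s_i))_{n \times k}$$

$$R = (R(s_i, s_j))_{n \times n}$$

$$r(x) = (R(s_1, x), \ldots, R(s_n, x))'$$

Thus,

$$Y(x) = f'(x)\beta + Z(x)$$

$$Y_S = F\beta + Z, \quad \operatorname{cov}(Z) = \sigma^2 R$$

The BLUP and the generalized LS estimate are

$$\hat{y}(x) = f'(x)\hat{\beta} + r'(x) R^{-1}(Y_S - F\hat{\beta})$$

$$\hat{\beta} = (F' R^{-1} F)^{-1} F' R^{-1} Y_S$$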

Correlation Functions

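The correlation $R(w, x)$ has to be specified.

$$R(w, x) = \prod_{j=1}^{m} \exp(-\theta_j |w_j - x_j|^{p_j}), \quad 0 < p_j \le 2,$$

$$R(w, x) = \prod_{j=1}^{m} K(|w_j - x_j|; \theta_j),$$

where $K(\cdot)$ is the Matérn correlation function with parameter $\nu = 5/2$. In the standard form, with $\theta$ as a range parameter,

$$K(h; \theta) = \left(1 + \frac{\sqrt{5}\,h}{\theta} + \frac{5h^2}{3\theta^2}\right) \exp\left(-\frac{\sqrt{5}\,h}{\theta}\right).$$

The correlation parameters (e.g., $\theta_j$, $p_j$) need to be specified or estimated (by cross validation or MLE).

Given the correlation parameters, the MLEs are

$$\hat{\beta} = \text{generalized LS estimate}$$

$$\hat{\sigma}^2 = \frac{1}{n}(Y_S - F\hat{\beta})' R^{-1} (Y_S - F\hat{\beta})$$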
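The predictor and the estimates above can be tried numerically. The following is a minimal sketch, not the talk's implementation: it assumes a constant mean (k = 1, f_1 ≡ 1), the power-exponential correlation with p = 2, and a fixed θ = 1 instead of an estimated one. The names `corr`, `fit_kriging` and `predict` are illustrative only.

```python
import numpy as np

def corr(a, b, theta=1.0, p=2.0):
    """Power-exponential correlation exp(-theta * |a - b|^p) for 1-D inputs."""
    return np.exp(-theta * np.abs(a[:, None] - b[None, :]) ** p)

def fit_kriging(s, y, theta=1.0, p=2.0):
    """Fit Y(x) = mu + Z(x) with theta and p held fixed (an assumption)."""
    n = len(s)
    F = np.ones((n, 1))                              # constant-mean regression matrix
    R = corr(s, s, theta, p)                         # n x n correlation matrix
    Ri_F = np.linalg.solve(R, F)
    Ri_y = np.linalg.solve(R, y)
    beta = np.linalg.solve(F.T @ Ri_F, F.T @ Ri_y)   # generalized LS estimate
    resid = y - F @ beta
    Ri_resid = np.linalg.solve(R, resid)
    sigma2 = float(resid @ Ri_resid) / n             # MLE of sigma^2
    return {"s": s, "theta": theta, "p": p, "beta": beta,
            "Ri_resid": Ri_resid, "sigma2": sigma2}

def predict(model, x):
    """BLUP: f'(x) beta + r'(x) R^{-1} (y_S - F beta)."""
    r = corr(x, model["s"], model["theta"], model["p"])  # each row is r'(x)
    return model["beta"][0] + r @ model["Ri_resid"]

# Toy data: y = sin(2x) / (1 + x) observed at 8 equally spaced points.
s = np.linspace(0.0, 5.0, 8)
y = np.sin(2 * s) / (1 + s)
model = fit_kriging(s, y)
grid = np.linspace(0.0, 5.0, 101)
y_hat = predict(model, grid)
```

Because the model has no random error term, the predictor interpolates the training data: evaluating `predict(model, s)` returns `y` up to rounding.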

Examples of Matern ν = 5/2 correlation functions

[Figure: Matérn ν = 5/2 correlation K(h, θ) against input distance h from 0 to 5, for θ = 0.1, 0.5, 1, 2.]

A toy example: Kriging

[Figure: Kriging fit to the data, x from 0 to 5.]

Data: y = sin(2x)/(1 + x); Kriging: Y = µ+ Z(x).

A toy example: Kriging vs Polynomial models

[Figure: four panels, x from 0 to 5. Recoverable panel titles: Kriging; Polynomial degree = 3.]

Data: y = sin(2x)/(1 + x); Kriging: Y = µ+ Z(x).

Design Criteria

Integrated Mean Squared Error (IMSE)

$$\min: \int \mathrm{MSE}[\hat{y}(x)]\, \phi(x)\, dx,$$

where $\phi(x)$ is a given weight function.

Maximum Mean Squared Error (MMSE)

$$\min: \max_{x \in X} \mathrm{MSE}[\hat{y}(x)].$$

Entropy (Gaussian process)

$$\max: \det(R) = \det(R(s_i, s_j)).$$

Maximin distance criterion:

$$\max: \min_{i<j} d(x_i, x_j)$$


Computer experimental designs

Constructing a “good” design is crucial for the success of a computer experiment.

A “good” design should be space-filling (i.e., cover as much space as possible), and have good projection properties.

Let $(n, s^m)$ denote an n × m design of s levels, where each level occurs n/s times in each column.

Latin hypercube design (LHD) [McKay, Beckman & Conover (1979)]: s = n.

Optimality criteria: maximin distance, minimax distance, column-orthogonality, uniformity (discrepancy), etc.

Figure 2: Maximin LHD (left) and Minimax LHD (right) with n = 7 and m = 2

Maximin distance designs

For an $(n, s^m)$ design $D = (x_{ik})_{n \times m}$, define the distance between rows $x_i$ and $x_j$ as

$$d_p(x_i, x_j) = \sum_{k=1}^{m} |x_{ik} - x_{jk}|^p.$$

Define the $L_p$-distance of D as

$$d_p(D) = \min\{d_p(x_i, x_j) : 1 \le i < j \le n\}.$$

Maximin distance design: maximize $d_p(D)$ among all designs.

Maximin distance designs are asymptotically D-optimal under the kriging model as the correlations become weak (Johnson et al. 1990).
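As a small illustration of the definition, the sketch below computes d_p(D) by brute force over all row pairs. The function name `lp_distance` and the 4-run design are made up for this example.

```python
import numpy as np
from itertools import combinations

def lp_distance(D, p=1):
    """d_p(D): the smallest sum_k |x_ik - x_jk|^p over all row pairs i < j."""
    D = np.asarray(D, dtype=float)
    return min(float(np.sum(np.abs(D[i] - D[j]) ** p))
               for i, j in combinations(range(len(D)), 2))

# A 4-run, 2-factor Latin hypercube used only for illustration.
D = [[0, 1], [1, 3], [2, 0], [3, 2]]
d1 = lp_distance(D, p=1)   # L1-distance of the design
d2 = lp_distance(D, p=2)   # L2-distance (sum of squares, no square root)
```

Note that, following the definition above, no p-th root is taken, so `d2` is a squared Euclidean distance.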

Most constructions are based on stochastic algorithms: Morris and Mitchell (1995), Joseph and Hung (2008), Ba, Myers and Brenneman (2015, R package SLHD), etc.

These algorithms are flexible but not effective for large designs.

Good Lattice Point (GLP) Designs

GLP designs are LHDs and often used to construct uniform designs (Fang and Wang, 1994).

Let $h_1 < \cdots < h_p$ be p integers (from 1 to n) coprime to n, and let

$$D = (x_{ij}) \quad \text{with} \quad x_{ij} = i \times h_j \pmod{n}.$$

An example with n = 7:

    1 2 3 4 5 6
    2 4 6 1 3 5
    3 6 2 5 1 4
    4 1 5 2 6 3
    5 3 1 6 4 2
    6 5 4 3 2 1
    0 0 0 0 0 0

with d(D) = 12 (while d_upper = 16)
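The n = 7 example can be reproduced with a few lines. This is a minimal sketch that takes all integers 1, ..., n − 1 coprime to n as generators and uses the L1-distance; `glp_design` and `l1_distance` are illustrative names, and the upper bound is the ⌊(N + 1)n/3⌋ formula for an N-run, n-factor LHD quoted in this talk.

```python
import numpy as np
from math import gcd
from itertools import combinations

def glp_design(n):
    """n-run GLP design with x_ij = i * h_j (mod n), using every h coprime to n."""
    h = [k for k in range(1, n) if gcd(k, n) == 1]
    i = np.arange(1, n + 1).reshape(-1, 1)       # run index i = 1, ..., n
    return (i * np.array(h)) % n

def l1_distance(D):
    """Minimum pairwise L1-distance between rows of D."""
    return min(int(np.abs(D[i] - D[j]).sum())
               for i, j in combinations(range(len(D)), 2))

D = glp_design(7)
d = l1_distance(D)                               # 12 for n = 7
d_upper = (D.shape[0] + 1) * D.shape[1] // 3     # floor((N + 1) n / 3) = 16
```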

GLP designs

Results from Zhou and Xu (2015, Biometrika)

An upper bound (for L1-distance): For any N × n LHD D,

$$d(D) \le d_{upper} = \lfloor (N + 1)n/3 \rfloor,$$

where $\lfloor x \rfloor$ is the integer part of x.

Obtain the distances for four classes of GLP designs.

For an odd prime n, the n × (n − 1) GLP design has d(D) = (n + 1)(n − 1)/4, while the upper bound is d_upper = (n + 1)(n − 1)/3.

The distance efficiency d_eff(D) = d(D)/d_upper for such a GLP design is 75%.

A surprising result: any linear level permutation of any column does not decrease the distance d(D).

GLP + Linear Permutation

Example: n = 7: Total 7^6 = 117,649 linear permutations.

Consider only 7 simple permutations: D_i = D + i (mod n)

D → D1 = D + 1 (mod n)

    D              D1
    1 2 3 4 5 6    2 3 4 5 6 0
    2 4 6 1 3 5    3 5 0 2 4 6
    3 6 2 5 1 4    4 0 3 6 2 5
    4 1 5 2 6 3    5 2 6 3 0 4
    5 3 1 6 4 2    6 4 2 0 5 3
    6 5 4 3 2 1    0 6 5 4 3 2
    0 0 0 0 0 0    1 1 1 1 1 1

d(D) = 12, d(D1) = 13

After linear permutations, d_eff is about 90% for large n.
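The effect of the simple shifts D + b (mod n) can be tabulated by brute force. The sketch below is an illustration for n = 7 only; `l1_distance` and `dist_by_shift` are hypothetical names.

```python
import numpy as np
from itertools import combinations

def l1_distance(D):
    """Minimum pairwise L1-distance between rows of D."""
    return min(int(np.abs(D[i] - D[j]).sum())
               for i, j in combinations(range(len(D)), 2))

n = 7
i = np.arange(1, n + 1).reshape(-1, 1)
D = (i * np.arange(1, n)) % n                    # 7 x 6 GLP design

# L1-distance of every shifted design D + b (mod n), b = 0, ..., n - 1.
dist_by_shift = {b: l1_distance((D + b) % n) for b in range(n)}
best = max(dist_by_shift.values())
```

For n = 7 no shift falls below the unshifted distance of 12, and the best shift reaches 13.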

How About Nonlinear Permutations?

Given an integer n, for x = 0, ..., n − 1, define

$$W(x) = \begin{cases} 2x, & 0 \le x < n/2; \\ 2(n - x) - 1, & n/2 \le x < n. \end{cases}$$

W is a permutation of {0, ..., n − 1}.
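A direct transcription of W makes the permutation property easy to check. This is a minimal sketch; the function name `williams` is an illustrative choice.

```python
def williams(x, n):
    """Williams transformation W(x) on the levels 0, ..., n - 1."""
    if not 0 <= x < n:
        raise ValueError("x must lie in 0, ..., n - 1")
    return 2 * x if 2 * x < n else 2 * (n - x) - 1

n = 7
image = [williams(x, n) for x in range(n)]   # [0, 2, 4, 6, 5, 3, 1]
```

The first half of the levels is sent to the even values in increasing order and the second half to the odd values in decreasing order, so every level is hit exactly once.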


The W has been useful in

1. Latin squares, Williams (1949)

2. Orthogonal designs, Bailey (1982), Edmondson (1993)

3. Orthogonal LHDs under a second-order Fourier model, Butler (2001)

GLP + Williams Transformation

Algorithm (Wang, Xiao and Xu, 2018, Annals of Statistics)

Step 1. Generate an n× p GLP design D.

Step 2. For b = 0, . . . , n− 1, generate Db = D + b (mod n).

Step 3. Let Eb = W(Db).

Step 4. Find the best Eb which maximizes d(Eb).

Example: n = 7, b = 1

D → D1 = D + 1 (mod n) → E1 = W(D1)

    D              D1             E1
    1 2 3 4 5 6    2 3 4 5 6 0    4 6 5 3 1 0
    2 4 6 1 3 5    3 5 0 2 4 6    6 3 0 4 5 1
    3 6 2 5 1 4    4 0 3 6 2 5    5 0 6 1 4 3
    4 1 5 2 6 3    5 2 6 3 0 4    3 4 1 6 0 5
    5 3 1 6 4 2    6 4 2 0 5 3    1 5 4 0 3 6
    6 5 4 3 2 1    0 6 5 4 3 2    0 1 3 5 6 4
    0 0 0 0 0 0    1 1 1 1 1 1    2 2 2 2 2 2

d(D) = 12, d(D1) = 13, d(E1) = 16 (= d_upper)
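Steps 1 to 4 of the algorithm can be sketched as a brute-force search over b. This is an illustration under stated assumptions, not the authors' code: n is taken to be prime, all generators h = 1, ..., n − 1 are used, and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def l1_distance(D):
    """Minimum pairwise L1-distance between rows of D."""
    return min(int(np.abs(D[i] - D[j]).sum())
               for i, j in combinations(range(len(D)), 2))

def williams(D, n):
    """Williams transformation W applied entrywise to levels 0, ..., n - 1."""
    D = np.asarray(D)
    return np.where(2 * D < n, 2 * D, 2 * (n - D) - 1)

def glp_williams(n):
    """Search over b for the E_b = W(D + b mod n) with the largest L1-distance."""
    i = np.arange(1, n + 1).reshape(-1, 1)
    D = (i * np.arange(1, n)) % n              # Step 1: n x (n - 1) GLP design
    best_b, best_E, best_d = None, None, -1
    for b in range(n):                         # Step 2: D_b = D + b (mod n)
        E = williams((D + b) % n, n)           # Step 3: E_b = W(D_b)
        d = l1_distance(E)
        if d > best_d:                         # Step 4: keep the largest d(E_b)
            best_b, best_E, best_d = b, E, d
    return best_b, best_E, best_d

b, E, d = glp_williams(7)
d_upper = (7 + 1) * (7 - 1) // 3               # upper bound for a 7 x 6 LHD
```

For n = 7 the search returns d(E_b) = 16, which equals d_upper. The search costs n distance evaluations of O(n^3) each, so it is only meant for small n.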

Comparison of Various n× (n− 1) LHDs

[Figure: comparison against n from 20 to 100. Methods shown: GLP+WT; Zhou and Xu 2015; Xiao and Xu 2017 I; Xiao and Xu 2017 II; GLP; SLHD.]

Key Result

$$b = W^{-1}\left(\frac{n - 1}{2} \pm c\right), \quad \text{where } c = \left\lfloor \sqrt{(n^2 - 1)/12} \right\rfloor.$$

Theorem

Given a prime n and p = n − 1, the b so defined leads to the best E_b, with

$$d_{eff}(E_b) \ge 1 - \frac{2}{\sqrt{3(n^2 - 1)}}.$$

As n → ∞, d_eff(E_b) → 1.

No need for computer search: D → D_b → E_b = W(D_b)

Guaranteed efficiency

Larger n, better design
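The closed-form choice of b can be evaluated directly. The sketch below assumes n is an odd prime, so that (n − 1)/2 is an integer, and inverts W by parity; `williams_inverse`, `best_shifts` and `efficiency_bound` are illustrative names.

```python
from math import isqrt, sqrt

def williams_inverse(y, n):
    """Inverse of W: even values come from x < n/2, odd values from x >= n/2."""
    return y // 2 if y % 2 == 0 else n - (y + 1) // 2

def best_shifts(n):
    """b = W^{-1}((n - 1)/2 +/- c) with c = floor(sqrt((n^2 - 1)/12)), n odd."""
    c = isqrt((n * n - 1) // 12)          # floor(sqrt(x)) equals isqrt(floor(x))
    mid = (n - 1) // 2
    return sorted({williams_inverse(mid - c, n), williams_inverse(mid + c, n)})

def efficiency_bound(n):
    """Lower bound 1 - 2 / sqrt(3 (n^2 - 1)) on the distance efficiency."""
    return 1 - 2 / sqrt(3 * (n * n - 1))

shifts_7 = best_shifts(7)                 # [4, 6]
bound_7 = efficiency_bound(7)             # 1 - 2/12
```

For n = 7 this gives c = 2 and b in {4, 6}, with a guaranteed efficiency of at least 5/6.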

Correlations

For any n × p design $D = (x_{ij})$, define

$$\rho_{ave}(D) = \frac{\sum_{j \ne k} |\rho_{jk}|}{p(p - 1)},$$

where $\rho_{jk}$ is the correlation between columns j and k of D.
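The average absolute correlation is straightforward to compute from the column correlation matrix. A minimal sketch using NumPy's `corrcoef`; the function name `rho_ave` and the two tiny designs are illustrative.

```python
import numpy as np

def rho_ave(D):
    """Average absolute correlation over all ordered column pairs j != k."""
    D = np.asarray(D, dtype=float)
    p = D.shape[1]
    R = np.corrcoef(D, rowvar=False)                   # p x p column correlations
    off_diag = np.abs(R).sum() - np.abs(np.diag(R)).sum()
    return off_diag / (p * (p - 1))

identical = rho_ave([[0, 0], [1, 1], [2, 2]])          # perfectly correlated columns
partial = rho_ave([[0, 1], [1, 0], [2, 2]])            # correlation 0.5
```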

Comparison of ρave for n× (n− 1) Designs

[Figure: ρ_ave against n from 20 to 100. Methods shown: GLP+WT; Zhou and Xu 2015; Xiao and Xu 2017 I; Xiao and Xu 2017 II; GLP.]

Method I: ρave(Eb) < 2/(n − 2)

Uniform design

Idea: choose design points from the design region with empirical distribution as “uniform” as possible (Fang et al., 2006).


Uniform Designs and Centered L2-Discrepancy

For an n × m design D over $[0, 1]^m$,

$$\mathrm{Disc}(D) = \left\{ \int_{[0,1]^m} \left| \mathrm{Vol}(J(a_x, x)) - \frac{N(D \cap J(a_x, x))}{n} \right|^2 dx \right\}^{1/2}.$$

The (squared) centered L2-discrepancy is defined by

$$CD(D) = \sum_{u \subseteq \{1:m\}} |\mathrm{Disc}(D_u)|^2,$$

where $D_u$ is the projected design of D onto dimensions indexed by the elements of u.

Uniform designs may have poor projections in lower dimensional spaces.

A New Criterion

Focus on 2-dim projection uniformity

Uniform projection criterion (Sun, Wang and Xu, 2019, Annals of Statistics)

$$\phi(D) = \frac{2}{m(m - 1)} \sum_{|u| = 2} CD(D_u), \qquad (1)$$

A design achieving the minimum φ(D) value is a uniform projection design (UPD).
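The criterion in (1) can be evaluated numerically. The sketch below rests on two assumptions that the talk does not spell out: CD is computed with the widely used closed-form expression for the squared centered L2-discrepancy, and the levels 0, ..., s − 1 are mapped to the cell midpoints (x + 0.5)/s in [0, 1]. The function names are illustrative.

```python
import numpy as np
from itertools import combinations

def centered_l2(D, s):
    """Squared centered L2-discrepancy (closed form) of a design with levels 0..s-1."""
    z = (np.asarray(D, dtype=float) + 0.5) / s - 0.5    # centered coordinates
    n, m = z.shape
    a = np.abs(z)
    term1 = (13.0 / 12.0) ** m
    term2 = (2.0 / n) * np.prod(1 + 0.5 * a - 0.5 * a ** 2, axis=1).sum()
    diff = np.abs(z[:, None, :] - z[None, :, :])        # |z_ik - z_jk| for all i, j
    term3 = np.prod(1 + 0.5 * a[:, None, :] + 0.5 * a[None, :, :] - 0.5 * diff,
                    axis=2).sum() / n ** 2
    return term1 - term2 + term3

def phi(D, s):
    """Uniform projection criterion: average CD over all two-column projections."""
    D = np.asarray(D)
    pairs = list(combinations(range(D.shape[1]), 2))
    return sum(centered_l2(D[:, list(u)], s) for u in pairs) / len(pairs)

# A 3-run, 3-factor Latin hypercube used only for illustration.
D = np.array([[0, 1, 2], [1, 2, 0], [2, 0, 1]])
value = phi(D, s=3)
```

Dividing by the number of column pairs, m(m − 1)/2, is the same as multiplying the sum by 2/(m(m − 1)) as in (1).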

Two 25× 3 Latin hypercubes

    Uniform design D1      UPD D2
    18  16  14              2   3   2
    19   9   0              4   5  13
    11   1   2              0  11   9
    16  20   3              3  16  17
    20  22  12              1  22  22
    14   7  10              8   0   7
     4  17   1              6   8  19
    12  12   7              9  14  24
    10  15  24              5  18   4
    22  14   5              7  20  11
     2  21  22             12   2  21
    15  11  21             10   9   0
     1   5   4             14  12  14
     3  10  11             13  15   6
    23   3   8             11  24  15
     0  13  15             17   4  10
     8  23   6             15   7   5
     7   8  18             19  10  18
     9   4  13             16  19  20
     6  19   9             18  23   1
    24  18  19             21   1  16
    21   6  23             23   6  23
    13  24  17             22  13   3
    17   0  16             20  17  12
     5   2  20             24  21   8

Why we need a new criterion

Figure 3: Bivariate projections of D1 and D2, where ‘X’ means that there are no points in the grid.

[Panels: D1[1,2], D1[1,3], D1[2,3], D2[1,2], D2[1,3], D2[2,3]; axes run from −0.5 to 24.5.]

Application: Design and Modeling Comparison

A 3-drug combination experiment on lung cancer (Al-Shyoukh et al. 2011).

A 512-run and 8-level full factorial design to study 3 drugs.

The response was the ATP level of the cells after the drug treatments.

Table 1: Comparison of 1000×MSE for different models and designs

    Normal Cell
                  D512    RD80    MPD25   UPD25
    Kriging       0.002   0.21    0.62    0.22
    NN            0.37    1.28    3.12    1.79
    Polynomial    0.48    1.16    3.22    0.74

    Cancer Cell
                  D512    RD80    MPD25   UPD25
    Kriging       0.003   0.37    1.87    0.21
    NN            0.47    1.57    4.10    2.93
    Polynomial    2.98    6.77    10.04   4.42

RD80: Random 80-run design; MPD25: MaxPro 25-run design.

Comparison of projection properties

We compare four LHD(19, 18)’s:

1 The uniform design (UD) from the uniform design website.

2 The maximin distance design via R package SLHD (Ba, Myers and Brenneman, 2015, Technometrics).

3 The maximum projection (MaxPro) design, constructed via R package MaxPro (Joseph et al., 2015, Biometrika).

4 The uniform projection design (UPD): Eb.

We ran the R commands maximinSLHD (with slice parameter t = 1) and MaxProLHD 100 times with default settings and chose the best designs.

Comparison of projection properties

Four criteria will be used in the comparison:

1 minimum Euclidean distance

2 maximum projection criterion (Joseph et al. 2015)

$$\psi(D) = \left\{ \frac{1}{\binom{n}{2}} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{1}{\prod_{k=1}^{m} (x_{ik} - x_{jk})^2} \right\}^{1/m}$$

3 relative maximum centered L2-discrepancy (CD)

4 maximum correlation ρave.

For each k, we evaluate all $\binom{m}{k}$ projected designs and determine the worst projection with respect to the four criteria.

[Figure: four panels plotting each criterion against projection dimension k (axis ticks 3 to 18) for the UPD, MaxPro, Maximin and Uniform designs.]

Figure 4: (a) minimum Euclidean distance (the larger the better), (b) maximum ψ(D) (the smaller the better), (c) relative maximum CD (the smaller the better), and (d) maximum ρave (the smaller the better).

Some Theoretical Results

Theorem 1

For a balanced $(n, s^m)$ design D and any 2 ≤ k ≤ m,

$$\binom{m}{k}^{-1} \sum_{|u| = k} \phi(D_u) = \phi(D),$$

where $D_u$ is the projected design onto k factors indexed by u.

UPDs have good space-filling properties not only in two dimensions, but also in all dimensions.

Some Theoretical Results

Theorem 2

For a balanced $(n, s^m)$ design $D = (x_{ik})$,

$$\phi(D) = \frac{g(D)}{4m(m - 1)n^2 s^2} + C(m, s), \qquad (2)$$

where

$$g(D) = \sum_{i=1}^{n} \sum_{j=1}^{n} d_1^2(x_i, x_j) - \frac{2}{n} \sum_{i=1}^{n} \left( \sum_{j=1}^{n} d_1(x_i, x_j) \right)^2.$$

φ(D) is a function of pairwise L1-distances of the rows of D.

An equidistant design under the L1-distance is a UPD.

The construction of UPDs via GLP + Williams transform.

Theorem 3 (Optimality of Eb)

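(i) $\phi(E_b) = LB + f^2(b)/[(n - 2)n^4]$, where

$$f(b) = (W(b) - (n - 1)/2)^2 - (n^2 - 1)/12,$$

$$LB = (12n^3 + 154n^2 - 12n - 29)/(720n^4);$$

(ii) Let $c_0 = \lfloor \sqrt{(n^2 - 1)/12} \rfloor$, where $\lfloor x \rfloor$ is the integer part of x. Let

$$c = \begin{cases} c_0, & c_0 \ge \sqrt{(n^2 - 4)/12} - 1/2, \\ c_0 + 1, & c_0 < \sqrt{(n^2 - 4)/12} - 1/2, \end{cases}$$

and

$$b^* = W^{-1}\left((n - 1)/2 \pm c\right). \qquad (4)$$

Then $b^*$ minimizes $\phi(E_b)$ over b.

(iii) $E_{b^*}$ has φ-efficiency

$$\phi_{eff}(E_{b^*}) = \frac{LB}{\phi(E_{b^*})} > \frac{n^2}{n^2 + 5} \to 1 \quad (n \to \infty).$$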

Example for n = 19

Table 2: The φ values (multiplied by 1000) and φ-efficiencies of Db and Eb for n = 19 and b ∈ Zn.

     b   φ(Db)   φeff(Db)   φ(Eb)   φeff(Eb)
     0   2.107   .696       2.641   .555
     1   1.592   .922       1.630   .900
     2   1.703   .861       1.478   .992
     3   1.757   .835       1.666   .881
     4   1.773   .828       1.847   .794
     5   1.788   .820       1.847   .794
     6   1.757   .835       1.666   .881
     7   1.685   .871       1.478   .992
     8   1.662   .882       1.630   .900
     9   2.350   .624       2.641   .555
    10   1.662   .882       1.989   .738
    11   1.685   .871       1.483   .989
    12   1.757   .835       1.555   .943
    13   1.788   .820       1.772   .828
    14   1.773   .828       1.873   .783
    15   1.757   .835       1.772   .828
    16   1.703   .861       1.555   .943
    17   1.592   .922       1.483   .989
    18   2.107   .696       1.989   .738

The φ-efficiency: φeff(D) = LB/φ(D).
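The E_b columns of Table 2 follow from the closed-form expressions in Theorem 3(i), so they can be recomputed without constructing any design. A minimal sketch; the function names `lower_bound` and `phi_E` are illustrative.

```python
def williams(x, n):
    """Williams transformation W(x) on the levels 0, ..., n - 1."""
    return 2 * x if 2 * x < n else 2 * (n - x) - 1

def lower_bound(n):
    """LB = (12 n^3 + 154 n^2 - 12 n - 29) / (720 n^4)."""
    return (12 * n**3 + 154 * n**2 - 12 * n - 29) / (720 * n**4)

def phi_E(b, n):
    """phi(E_b) = LB + f(b)^2 / ((n - 2) n^4), with f(b) as in Theorem 3(i)."""
    f = (williams(b, n) - (n - 1) / 2) ** 2 - (n * n - 1) / 12
    return lower_bound(n) + f * f / ((n - 2) * n**4)

n = 19
# For each b: (1000 * phi(E_b), phi-efficiency LB / phi(E_b)).
table = {b: (1000 * phi_E(b, n), lower_bound(n) / phi_E(b, n)) for b in range(n)}
smallest = min(v[0] for v in table.values())
best = [b for b in table if abs(table[b][0] - smallest) < 1e-12]
```

For n = 19 the minimum is attained at b = 2 and b = 7, in agreement with (4): c = 5 and W^{-1}(9 ± 5) = {2, 7}.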

Summary

Many algorithms are available for constructing space-filling designs, but they are not efficient for constructing large designs.

The proposed methods make a breakthrough:

The first to construct a broad class of infinite maximin distance LHDs without computer search

Large distance efficiencies: d_eff = 1 (or → 1)

Low average correlation: ρave → 0 as n → ∞

Uniform projection designs have good projection properties on all dimensions (Sun et al. 2019, Annals of Statistics).

The methods can be used to further construct optimal multi-level fractional factorial designs (Wang and Xu, 2019).


Summary

Uniform projection designs are suitable when only a subset of the input variables are active.

Uniform projection designs have good space-filling properties not only in two dimensions, but also in all dimensions.

The uniform projection criterion is equivalent to the maximin L1-distance criterion if L1-equidistant designs exist.

The theoretical results can be easily extended to other commonly used discrepancies such as the wrap-around L2-discrepancy.

Optimal or highly efficient designs under the φ(D) criterion perform well when the design criterion changes; thus they are robust under other design criteria.


Selected References

Fang, K. T., Li, R. Z. and Sudjianto, A. (2006). Design and Modeling for Computer Experiments. Chapman and Hall/CRC, New York.

Johnson, M. E., Moore, L. M. and Ylvisaker, D. (1990). Minimax and maximin distance designs. J. Statist. Plan. Infer. 26, 131–48.

Sacks, J., Welch, W. J., Mitchell, T. J. and Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science 4, 409–423.

Santner, T. J., Williams, B. J. and Notz, W. (2003). The Design and Analysis of Computer Experiments. Springer.

Sun, F., Wang, Y. and Xu, H. (2019). Uniform projection designs. Annals of Statistics 47, 641–661.

Wang, L., Xiao, Q. and Xu, H. (2018). Optimal maximin L1-distance Latin hypercube designs based on good lattice point designs. Annals of Statistics 46, 3741–3766.

Xiao, Q., Wang, L. and Xu, H. (2019). Application of kriging models for a drug combination experiment on lung cancer. Statistics in Medicine 38, 236–246.

Zhou, Y. D. and Xu, H. (2015). Space-filling properties of good lattice point sets. Biometrika 102, 959–966.
