Bootstrap confidence intervals for reservoir model
selection techniques
Céline Scheidt and Jef Caers
Department of Energy Resources Engineering
Stanford University
Abstract
Stochastic spatial simulation allows rapid generation of multiple,
alternative realizations of spatial variables. Quantifying uncertainty on response
resulting from those multiple realizations would require the evaluation of a
transfer function on every realization. This is not possible in real applications,
where one transfer function evaluation may be very time consuming (several
hours to several days). One must therefore select a few representative
realizations for transfer function evaluation and then derive the production
statistics of interest (typically the P10, P50 and P90 quantiles of the response).
By selecting only a few realizations one may risk biasing the P10, P50 and P90
estimates as compared to the original multiple realizations.
The principal objective of this study is to develop a methodology to
quantify confidence intervals for the estimated P10, P50 and P90 quantiles
when only a few models are retained for response evaluation. Our approach is
to use the parametric bootstrap technique, which allows evaluating the
variability of the statistics obtained from uncertainty quantification and
constructing confidence intervals. A second objective is to compare the confidence
intervals when using two selection methods available to quantify uncertainty
given a set of geostatistical realizations: traditional ranking technique and the
distance-based kernel clustering technique (DKM). The DKM has been
recently developed and has been shown to be effective in quantifying
uncertainty.
The methodology is demonstrated using two examples. The first example is
a synthetic example, which uses bi-normal variables and serves to demonstrate
the technique. The second example is from an oil field in West Africa where
the uncertain variable is the cumulative oil production coming from 20 wells.
The results show that for the same number of transfer function evaluations, the
DKM method has equal or smaller error and confidence interval compared to
ranking.
1. Introduction
Uncertainty quantification of subsurface spatial phenomena is done in the context
of decision making, often by estimating low, mean and high quantile values (typically
P10, P50, and P90) of the response of interest. Often, an exhaustive sampling of all
uncertain parameters is unfeasible, and only a small subset of reservoir model
realizations of the phenomena can be created. Due to high computational
requirements, the transfer function must be evaluated on an even smaller subset of
realizations. Therefore, any quantiles that are estimated from this subset are
themselves subject to uncertainty, and may vary depending on the selection method,
the number of transfer function evaluations, the initial set of realizations, the use of a
proxy response, etc.
The objective of the study is to be able to quantify confidence intervals for the
estimated P10, P50 and P90 quantiles when only a few models are retained for
response evaluation. The magnitude of the confidence intervals can then be used to
decide whether or not more flow simulations are required to establish a better
quantification of response uncertainty. The methodology developed uses the
parametric bootstrap technique, a statistical method for constructing confidence
intervals on the estimated statistics. Such confidence intervals provide an idea of the
variability of the statistics inferred by selecting only a few models for evaluation.
The workflow can be applied using any technique of reservoir model selection. In
this paper, we compare the behavior of the estimated quantiles using three different
selection techniques. The first method is the traditional ranking technique (Ballin et
al., 1992), which selects realizations according to a ranking measure. The second
method has been developed recently and is called the distance-based kernel technique
(DKM, Scheidt and Caers, 2008). Finally, we use a random selection for comparison.
It should be noted that the proposed bootstrap technique applies to any model
selection methodology.
The paper is organized as follows. In the next section, we give a description of
the two methods employed to quantify uncertainty in spatial parameters. Then, we
give a brief overview of the basic ideas of the bootstrap methodology in the context
of parametric inference, illustrated by a typical example. We then describe our
workflow which is applied to cases where we have a proxy response which can be
evaluated rapidly for each realization, and a true response which cannot be evaluated
for every realization. The subsequent section is devoted to the application of the
specific workflow to two examples, the first being a synthetic example, the second is
an example from an oil field in West Africa. Finally, we discuss the results obtained
as well as some concluding remarks.
2. Quantification of uncertainty – methodologies
Uncertainty quantification of a spatial phenomenon aims at characterizing the
statistics (P10, P50 and P90) of the response(s) of interest. In real applications where
one transfer function evaluation can be very time consuming, it may not be possible
to perform a transfer function evaluation on every realization of the reservoir. This
difficulty can be overcome by selecting a representative set of realizations from the
initial set. In this paper, we consider two different ways of selecting realizations for
transfer function evaluation. The first method is the traditional ranking technique,
which was introduced by Ballin et al. in 1992. The second method, denoted the
Distance-Kernel Method (DKM) is more recent and was first presented in 2008
(Scheidt and Caers) and applied to a real case in Scheidt and Caers (2009).
2.1. Traditional Ranking
The traditional ranking technique was introduced by Ballin et al. (1992) in the context of
stochastic reservoir modeling. The basic idea behind ranking is to define a rapidly
calculable ranking measure, which can be evaluated for each realization. Most of the
time, the ranking measure is static (e.g. original oil-in-place); however, more recent
studies employ more complex measures, such as connectivity (McLennan and
Deutsch, 2005), streamline-based (Gilman et al., 2002) or tracer-based measures (Ballin et
al., 1992; Saad et al., 1996). The ranking measure acts as a proxy of the response of
interest for each realization. To be effective, therefore, ranking requires a good
correlation between the ranking measure and the response. The realizations are
ranked according to this measure, and the realizations corresponding typically to the
P10, P50 and P90 quantiles of the measure are subsequently selected.
Full flow simulation is then performed on these selected realizations, and the P10,
P50 and P90 values are derived from the simulation results.
In previous work (Scheidt and Caers, 2009), we showed that selecting only 3
realizations to derive the P10, P50, and P90 quantiles can result in very inaccurate
estimates. In this study, contrary to the standard ranking approach, we propose to
select more than 3 realizations, and compare ranking with the Distance-Kernel
Method proposed below. The realizations are selected equally-spaced according to
the ranking measure, and we derive the P10, P50 and P90 quantiles by interpolation
from the distribution of the selected points.
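The selection scheme described above can be sketched as follows. This is our own illustrative code (assuming Python/NumPy; function names and the synthetic data are ours, not the authors'): realizations are sorted by the ranking measure, equally spaced positions along the ranked list are retained, and the quantiles are interpolated from the responses of the selected realizations.

```python
import numpy as np

def ranking_select(proxy, n_select):
    """Select n_select realizations equally spaced along the ranking measure.

    Illustrative sketch only; 'proxy' is the ranking measure per realization.
    """
    order = np.argsort(proxy)                          # rank realizations by proxy
    # equally spaced positions along the ranked list
    pos = np.linspace(0, len(proxy) - 1, n_select).round().astype(int)
    return order[pos]                                  # indices of the selected realizations

def quantiles_from_subset(responses, qs=(0.10, 0.50, 0.90)):
    """Interpolate P10/P50/P90 from the responses of the selected realizations."""
    return np.quantile(responses, qs)

# usage: 100 realizations, proxy reasonably well correlated with the true response
rng = np.random.default_rng(0)
true = rng.normal(5, np.sqrt(2), 100)
proxy = 0.9 * true + rng.normal(0, 0.6, 100)
idx = ranking_select(proxy, 9)
p10, p50, p90 = quantiles_from_subset(true[idx])
```

In a real application only the proxy would be known for all realizations; the transfer function would then be run on the `idx` subset.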
2.2. Distance-Kernel Method
In this section, we describe the main principle of the Distance-Kernel Method
(DKM), illustrated in Figure 1. Starting from a large number of model realizations,
the first step is to define a dissimilarity distance between the realizations. This
distance is a measure of the dissimilarity between any two realizations, and should be
tailored to the application and the response(s) of interest (just as in ranking), in order
to make uncertainty quantification more efficient. The distance is evaluated between
any two realizations, and a dissimilarity distance table (NR x NR) is then derived.
Multi-dimensional scaling (MDS) is then applied using the distance table (Borg and
Groenen, 1997). This results in a map (usually 2D or 3D) of the realizations, where the
Euclidean distance between any two realizations approximates the corresponding entry
in the distance table. Note that only the distances between the realizations in the new
space matter - the actual positions of the realizations are irrelevant. Once the realizations are in MDS space, one
could classify realizations and select a subset using clustering techniques. However,
often the points in MDS space do not vary linearly and thus classical clustering
methods would result in inaccurate classification. To overcome the nonlinear
variation of the points, Schölkopf et al. (2002) introduced kernel methods to
improve the clustering results. The main idea behind kernel methods is to introduce a
highly non-linear function Φ and map the realizations from the MDS space to a new
space, called feature space. The high dimensionality of that space makes the points
behave more linearly and thus standard classification tools, such as clustering, can be
applied more successfully. In this paper, we employ kernel k-means to select
representative realizations of the entire set. Transfer function evaluation is then
applied on the closest realization to the centroids and the statistics (P10, P50 and P90)
are computed on the small subset of realizations.
Figure 1: DKM for uncertainty quantification: (a) distance between two models,
(b) distance matrix, (c) models mapped in Euclidean space, (d) feature space, (e) pre-
image construction, (f) P10, P50, P90 estimation
For more details about the methodology, we refer to Scheidt and Caers (2008).
3. Parametric Bootstrap – Methodology
3.1. General introduction to Bootstrap
Bootstrap methods fall within the broader category of resampling methods. The
concept of the bootstrap was first introduced by Efron (1979). In his paper, Efron
considered two types of bootstrap procedures (nonparametric and parametric
inference). Bootstrap is a Monte-Carlo simulation technique that uses sampling
theory to estimate the standard error and the distribution of a statistic. In many recent
statistical texts, bootstrap methods are recommended for estimating sampling
distributions, finding standard errors and confidence intervals. A bootstrap procedure
is the practice of estimating properties of an estimator (such as its variance) by
measuring those properties when sampling from an approximate distribution. In the
parametric bootstrap, we consider an unknown distribution F to be a member of
some prescribed parametric family and obtain the empirical distribution F̂n by
estimating the parameters of the family from the data. Then, a new random sequence,
called a resample, is generated from the distribution F̂n.
The parametric bootstrap procedure works as follows. First, the statistics θ̂ of
the distribution of the initial sample are computed (for example the mean and
variance). Then, the distribution F̂n is estimated using those statistics. We assume
that the distribution F̂n is the true distribution and we use Monte-Carlo simulation to
generate B new samples, each the size of the initial sample, from the distribution F̂n.
Next, we apply the same estimation technique to these “bootstrapped” data to get a
total of B bootstrap estimates of θ̂, denoted θ̂*b, b = 1,…,B. Using these B bootstrap
estimates, we can compute confidence intervals or any other statistical measure of
error.
Simple illustrative example
A simple example illustrating the parametric bootstrap method is presented in
Figure 2. Suppose we have NR = 15 values X = (x1,…,xNR) drawn from a normal
distribution N(µ, σ) and we are interested in the estimation of the unknown
parameters µ and σ. The first step is to assume that X has a normal distribution Fn
and then to estimate the mean and variance of the distribution:

   µ̂ = x̄   and   σ̂² = (1/NR) Σi=1..NR (xi − x̄)²
We assume that µ̂ and σ̂ are the true parameters and we generate B = 1000 new
samples X*b (b = 1,…,B) from F̂n = N(µ̂, σ̂) using Monte-Carlo simulation, each
sample containing NR = 15 values. For each sample, the bootstrap estimates of the
mean and variance of the distribution can be calculated:

   µ̂*b = x̄*b   and   (σ̂*b)² = (1/NR) Σi=1..NR (x*b,i − x̄*b)²

Having computed θ̂*b = (µ̂*b, (σ̂*b)²), one can now construct a histogram of the
mean and the variance to display the probability distribution of the bootstrap
estimator (Figure 2). From this distribution, one can obtain an idea of the statistical
properties of the estimates µ̂ and σ̂². In Figure 2, the red line represents the
estimate of the mean µ̂ and variance σ̂² of the initial sample.
Figure 2: Application of the parametric bootstrap procedure to a simple example
The histograms of the bootstrap estimations of the mean and the variance are
informative about the variability of the statistics obtained. Confidence intervals of
the estimated mean and variance (or any quantiles) can then be calculated from the B
estimates of the mean and variance.
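The example above can be sketched in a few lines. This is our own illustration (the seed, true parameters and the use of NumPy are assumptions, not part of the paper): fit a normal distribution to the initial sample, resample B times from the fitted distribution, re-estimate the statistics on each resample, and read confidence intervals off the bootstrap distributions.

```python
import numpy as np

rng = np.random.default_rng(42)

# initial sample: NR = 15 draws from a normal distribution (unknown to the analyst)
NR, B = 15, 1000
x = rng.normal(5.0, 3.0, NR)

# step 1: fit the parametric family to the data
mu_hat, var_hat = x.mean(), x.var()              # (1/NR) variance, as in the text

# step 2: resample B times from the fitted distribution F_n = N(mu_hat, sigma_hat)
boot = rng.normal(mu_hat, np.sqrt(var_hat), size=(B, NR))

# step 3: re-estimate the statistics on every bootstrap sample
boot_mu = boot.mean(axis=1)
boot_var = boot.var(axis=1)

# 90% confidence intervals from the bootstrap distributions
ci_mu = np.quantile(boot_mu, [0.05, 0.95])
ci_var = np.quantile(boot_var, [0.05, 0.95])
```

Histograms of `boot_mu` and `boot_var` would reproduce the kind of display shown in Figure 2.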
The next section shows how to apply the bootstrap method in the context of
uncertainty quantification where a proxy value can be rapidly calculated for many
realizations of a spatial phenomenon.
3.2. Workflow of the study
Contrary to the previous example where the data are univariate, in the context of
reservoir model selection techniques, a proxy response is employed for the selection
using DKM or ranking and thus two variables are necessary: the response of interest
and the proxy response.
Therefore, we consider a bivariate variable X = [X1, X2,…, XNR], where:
• Xi = [xi,yi], i =1,…,NR, NR being the total number of samples/realizations
• xi represents the response of interest (e.g. cumulative oil production)
• yi represents the proxy response, which will serve as a ranking measure or
be transformed to a distance.
Note that for ranking and DKM to be effective, the response and its proxy should
be reasonably well correlated. In addition, for real applications, the values of the true
response xi for each realization are unknown.
In traditional uncertainty quantification, the proxy response serves as a guide to
select a few realizations which will be evaluated using the transfer function. The
response quantiles are then deduced from the evaluation of the realizations. Since the
resulting quantiles are subject to uncertainty, the bootstrap method illustrated
previously is well suited to the problem and can inform us on the accuracy of the
estimated quantiles and give an idea of the error resulting from the selection of a
small subset of realizations.
The workflow in the context of reservoir model selection is as follows; it is
illustrated in Figure 3.
1. Evaluate a proxy response yi for each of the i = 1,…,NR realizations.
2. Apply ranking or DKM using the proxy response in order to select N
samples/realizations for evaluation (note that N << NR).
3. Evaluate the response using the transfer function on the N selected
realizations.
4. Estimate the bivariate distribution F̂n from the N evaluated responses and
generate B new samples from F̂n.
5. For each of the B samples, apply the same selection and estimation
procedure to obtain B bootstrap estimates of the P10, P50 and P90
quantiles, from which we compute confidence intervals.
6. A single measure of accuracy of our quantile estimation is defined by
computing the dimensionless bootstrap error of the estimated quantiles
for each of the B new samples created (Eq. 1):

   error*b = (1/3) ( |x̂*P10,b − x̂P10| / x̂P10 + |x̂*P50,b − x̂P50| / x̂P50 + |x̂*P90,b − x̂P90| / x̂P90 )   (1)
The bootstrap error of the estimated quantiles is evaluated on each sample, and
thus can be represented as a histogram to visualize the variability between the
samples. From the histogram, we can quantify the variation of the bootstrap error of
the estimated quantiles.
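The bootstrap error of Eq. (1) can be computed per sample as follows (a sketch; the array layout and function name are our own):

```python
import numpy as np

def bootstrap_error(boot_quantiles, ref_quantiles):
    """Dimensionless bootstrap error of Eq. (1): mean relative deviation of the
    bootstrap P10/P50/P90 from the reference (initial-sample) quantiles.

    boot_quantiles: shape (B, 3), columns = P10, P50, P90 for each sample
    ref_quantiles:  shape (3,), the quantiles estimated on the initial sample
    """
    b = np.asarray(boot_quantiles, dtype=float)
    ref = np.asarray(ref_quantiles, dtype=float)
    return (np.abs(b - ref) / np.abs(ref)).mean(axis=1)   # one error per sample b

# usage: two bootstrap samples against reference quantiles (1.0, 5.0, 9.0)
errors = bootstrap_error([[1.1, 5.2, 8.7], [0.9, 4.8, 9.3]], [1.0, 5.0, 9.0])
```

The resulting vector of B errors is what the histograms discussed above display.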
Figure 3: Workflow of the bootstrap method applied to uncertainty quantification
The workflow described previously and illustrated in Figure 3 is performed for
several values of N, where N is the number of selected realizations for evaluation.
This is done to evaluate the influence of the number of transfer function evaluations
on the accuracy of the quantile estimation. For each value of N, the selected
realizations are obtained using DKM or ranking methods, and therefore the
realizations are different for each value of N.
Now that the basic idea and theory of the bootstrap method have been presented,
the next section shows some application examples of this technique in the context of
uncertainty quantification.
4. Application of the methodology to uncertainty
quantification
Two examples are presented in this section. The first one is illustrative and uses a
bivariate Gaussian distribution. The second example is more complex and is based
on a real oil field reservoir in West Africa (West Coast African reservoir) and uses
real production data.
In the case of DKM, the definition of a distance between any two realizations is
required. In this study, in order to compare the results of the DKM with those
obtained by ranking using the exact same information, we use simply the difference
of ranking measure (proxy response) as a distance between realizations. Note
however that, as opposed to the ranking measure, the distance can be calculated using
a combination of many different measures, and thus has more flexibility to be tailored
to the application. We will discuss the consequences of this in more detail below.
4.1. Bivariate Gaussian distribution
In the first example, we consider a bivariate Gaussian distribution:
X ~ biN(µ, Σ), where µ represents the mean and Σ the covariance matrix. In this
example, the mean of the sample is taken as µ = [5, 5], and the covariance is taken
as:

   Σ = [ 2    2ρ
         2ρ   2 ]

The parameter ρ defines the correlation coefficient between the
target response and the proxy response.
To set up an example, an initial sample X of NR = 100 values is generated using
the distribution given above. Note that for this illustrative example, we use the term
sample instead of realization, since no geostatistical realization is associated with
each bivariate value. Figure 4 shows an example of the probability density plot of the bi-
normal sample X, where the correlation coefficient between the target and proxy
responses was defined as ρ = 0.9.
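Generating such a sample might look like this (a sketch assuming NumPy; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
rho, NR = 0.9, 100
mu = np.array([5.0, 5.0])
Sigma = np.array([[2.0, 2.0 * rho],
                  [2.0 * rho, 2.0]])             # correlation coefficient = rho

# column 0: target response, column 1: proxy response
X = rng.multivariate_normal(mu, Sigma, size=NR)
r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]          # sample correlation, close to rho
```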
Figure 4: Probability density of X, which has a bi-normal distribution
Now that the initial data is defined, we assume that we only know the type of
distribution of X (bi-normal), but that we do not know the parameters defining the
distribution (the mean µ and the covariance Σ ). The bootstrap procedure illustrated
in Figure 3 is applied taking the sample X generated previously (Figure 4) and using
DKM to select N = 15 points. Estimates of the mean µ̂ and the covariance Σ̂ are
then obtained using the responses at the 15 selected points, and the estimated
bivariate distribution of the data is assumed to be the true distribution:
F̂n = biN(µ̂, Σ̂). B = 1000 new samples of this distribution can then be easily
derived, since the distribution is known. Uncertainty quantification is then performed
on those B samples, and an estimation of the variability of the quantiles is possible.
Examples of the bootstrap histograms of the P10, P50 and P90 quantiles are presented
in Figure 5.
Figure 5: Histograms of the P10, P50 and P90 quantiles estimated by bootstrap
(x̂*P10,b, x̂*P50,b, x̂*P90,b). The red line represents the estimated quantiles
x̂P10, x̂P50, x̂P90. The estimates are calculated using DKM to select 15 points.
We observe in Figure 5 that the distribution of the bootstrap quantiles is
Gaussian. In addition, there is a small bias in the estimation of the P10 and P90
quantiles for this example. Although this is not shown, ranking has the same effect.
The result is that, on average, x̂*P10,b is overestimated and x̂*P90,b is underestimated.
The biased estimates should not affect the determination of the confidence intervals.
In our study, we have found that the estimated mean µ̂ and covariance Σ̂ from
the initial sample had an impact on the confidence intervals. Since our goal in this
first example is to understand what the general behavior is when varying the number
of selected samples N, we propose to do a Monte-Carlo bootstrap, which basically
means that we randomize the initial sample and use C sets of initial samples, then
perform the workflow illustrated in Figure 3 on those C sets of initial samples. The
estimated statistics of each initial sample are averaged to obtain the final statistics. In
this study, we take C = 15. In the next few examples, the workflow illustrated in
Figure 3 has been performed by varying the number of selected samples (more
precisely, N = 5, 8, 10, 15 and 20), in order to examine the effect of the number of
transfer function evaluations on the bootstrap error. In addition, several correlation values
between the proxy response and the target response were used to explore the
influence of the correlation coefficient on the confidence intervals. Results are
presented in Figure 6, for ρ = 1, 0.9, 0.8, 0.7, 0.6 and 0.5 respectively. Figure 6
shows the confidence intervals of the error of the bootstrap estimated quantiles for
DKM (blue - square) and ranking (red - dot) for different values of N. The number of
bootstrap samples generated is B = 1000. The symbols represent the P50 value of
the error of the estimated quantiles, in other words, half of the estimated quantiles
have an error below this value and half above.
Figure 6: Confidence intervals (α = 10) of the bootstrap error of the estimated
quantiles as a function of the number of function evaluations for ρ = 1, 0.9, 0.8, 0.7,
0.6 and 0.5. The symbols represent the P50 value of the bootstrap error.
We observe in Figure 6 that the error globally decreases as the number of transfer
function evaluations increases. Also, the confidence intervals tend to narrow as the
number of transfer function evaluations increases, meaning that the error in our
estimates decreases. Both methods, DKM and ranking, provide similar results.
However, the error obtained by the DKM is slightly smaller than the one observed for
ranking. The same remark is valid for the confidence intervals. Finally, the results
provided by DKM vary more smoothly than those obtained by the ranking technique.
Note that each method selects N samples for evaluation optimally. Therefore, the
N = 8 models selected do not necessarily include the N = 5 models. This is true for
all N.
The bootstrap method can also be used to compute an estimate of the correlation
coefficient between the actual response and the proxy response. Figure 7 presents the
confidence intervals for the correlation corresponding to the results obtained in Fig. 6.
Figure 7: Bootstrap estimates (α = 10) of the correlation between the response
and the proxy. The black line represents the input correlation (ρ = 0.9, 0.8, 0.7, 0.6
and 0.5) used to generate the first sample.
We observe in Figure 7 that the correlation coefficient tends to
be overestimated, especially for small values of N. Figure 7 also shows that the
correlation estimate becomes more accurate and less prone to error as the number
of transfer function evaluations increases.
The next section illustrates the workflow using a real oil reservoir, located in
West Africa.
4.2. West Coast African reservoir
• Reservoir Description
The West Coast African (WCA) reservoir is a deepwater turbidite reservoir
located offshore in a slope valley, in 1600 feet of water and 4600 feet below sea
level. Four depositional facies were interpreted from the well logs: shale (Facies 1),
poor quality sand #1 (Facies 2), poor quality sand #2 (Facies 3) and good quality
channels (Facies 4). The description of the facies filling the slope valley is subject to
uncertainty. Twelve training images (TIs) are used in this case study, representing
uncertainty in the facies representations.
The reservoir is produced with 28 wells, of which 20 are production wells and 8
are water injection wells. The locations of the wells are displayed in Figure 8; wells
colored red are producers and wells colored blue are injectors.
Figure 8: Location of the 28 wells. Red are production wells and blue are
injection wells. Different colors in grid represent different fluid regions
Seventy-two geostatistical realizations were created using the multi-point
geostatistical algorithm snesim (Strebelle, 2002): to include spatial uncertainty, two
realizations were generated for each combination of TI and facies probability cube,
leading to a total of 72 possible realizations of the WCA reservoir. Each flow
simulation took 4.5 hours.
In a previous paper (Scheidt and Caers, 2009), uncertainty quantification on the
WCA reservoir was performed using only a small number of flow simulations. It was
shown that the statistics obtained by flow simulation on 7 realizations selected by the
DKM are very similar to those obtained by simulation on the entire set of 72
realizations. A comparison with the traditional ranking method showed that the
DKM easily outperforms the ranking technique without requiring any additional
information. However, in reality, one would not have access to the results of those
72 flow simulations; hence one would not know how accurate the P10, P50, P90
results of those 7 flow simulations are with respect to the entire set of 72 flow
simulations.
In this study, the response of interest is the cumulative oil production at 1200
days. We have evaluated the response for each of the 72 realizations, as a reference.
For the proxy response, we evaluated the cumulative oil production using streamline
simulation (Batycky et al., 1997). The correlation coefficient between the response
and the proxy is ρ = 0.92. In order to perform the parametric bootstrap procedure, we
must estimate the joint distribution of the cumulative oil production and its ranking
proxy, and be able to generate new samples from the bivariate density.
Because we do not know a priori the distribution of the cumulative oil production and
its proxy (contrary to the previous example), we propose to compute the bivariate
densities using a kernel smoothing technique (Bowman and Azzalini, 1997).
• Generation of a sample for a kernel smoothing density
Kernel smoothing (Bowman and Azzalini, 1997) is a spatial method that
generates a map of density values. The density at each location reflects the
concentration of points in the surrounding area. Kernel smoothing does not require
making any parametric assumption about the probability density function (pdf). The
kernel smoothing density of a variable X = [x1,…,xNR] is defined as follows:

   f̂(x, h) = (1/NR) Σi=1..NR (1/h) K((x − xi)/h),   x, xi ∈ ℝp

with K the kernel function and h the bandwidth.
In the case of a Gaussian rbf kernel, the kernel smoothing density is defined as:

   f̂(x, h) = (1 / (NR (2π)^(1/2) h)) Σi=1..NR exp(−(x − xi)² / (2h²))
Choosing the bandwidth for the kernel smoothing can be a difficult task, and is
generally a compromise between acceptable smoothness of the curve and fidelity to
the data. The choice of h has an impact on the overall appearance of the resulting
smooth curve, much more so than the choice of the kernel function, which is generally
held to be of secondary importance. In this work, we use a bandwidth which is a
function of the number of points in X.
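A univariate sketch of the Gaussian kernel smoothing density is given below. The bandwidth rule used here is a standard Silverman-style rule of thumb (which depends on the number of points), chosen for illustration; the paper's exact bandwidth choice is not given.

```python
import numpy as np

def gaussian_ksd(x_eval, data, h):
    """Gaussian kernel smoothing density evaluated at points x_eval (1-D sketch):
    f_hat(x) = 1 / (NR * sqrt(2*pi) * h) * sum_i exp(-(x - x_i)^2 / (2 h^2))
    """
    x_eval = np.atleast_1d(x_eval).astype(float)
    d = x_eval[:, None] - np.asarray(data, dtype=float)[None, :]
    return np.exp(-0.5 * (d / h) ** 2).sum(axis=1) / (len(data) * np.sqrt(2 * np.pi) * h)

# usage: density of 72 synthetic values, bandwidth from a rule of thumb
data = np.random.default_rng(3).normal(5, 1.4, 72)
h = 1.06 * data.std() * len(data) ** (-1 / 5)     # Silverman-style bandwidth
f = gaussian_ksd(np.linspace(0, 10, 50), data, h)
```

For the bivariate WCA case, the same construction applies with a product kernel over the response and its proxy.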
For example, Figure 9 shows the density distribution of the 72 data from the
WCA example, estimated by kernel smoothing using a Gaussian kernel.
Figure 9: Probability density of X for WCA
Once the density of the bivariate variable has been defined, new samples of the
same distribution can be generated using the Metropolis sampling technique.
• Overview of the Metropolis sampling algorithm
The Metropolis-Hastings technique is a Markov chain-based method which allows
generating a random variable having a particular distribution (Metropolis and Ulam,
1949; Metropolis et al., 1953). The Metropolis algorithm generates a sequence of
samples from a distribution f as follows:
1. Start with some initial value x0.
2. Given the current value xt-1, draw a candidate value x* from some
proposal distribution (we choose a uniform distribution).
3. Compute the ratio of the density at the candidate and the current points,
α = f(x*) / f(xt-1), and accept the candidate point with probability min(1, α).
4. Return to step 2 until the desired number of samples is obtained.
5. The resulting sample (x1,…,xt) has distribution f.
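The steps above can be sketched as follows: a one-dimensional illustration with an unnormalized Gaussian target (all parameter choices are ours, for illustration only; f need only be known up to a constant).

```python
import numpy as np

def metropolis(f, x0, n, width=1.0, seed=0):
    """Metropolis sampler with a uniform proposal centered on the current point."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    out = np.empty(n)
    for t in range(n):
        cand = x + rng.uniform(-width, width)    # step 2: uniform proposal
        alpha = f(cand) / f(x)                   # step 3: density ratio
        if rng.uniform() < alpha:                # accept with probability min(1, alpha)
            x = cand
        out[t] = x                               # chain value at step t
    return out

# usage: sample from an unnormalized N(5, 1) density
target = lambda x: np.exp(-0.5 * (x - 5.0) ** 2)
sample = metropolis(target, x0=5.0, n=20000, width=2.0)
```

In the WCA workflow, `f` would be the bivariate kernel smoothing density and the chain would propose candidate (response, proxy) pairs.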
An illustration of a sample generated by Metropolis sampling from the
density provided by kernel smoothing is presented in Figure 10. The contours
represent the probability density, which is calculated using N = 10 values of the
response selected by DKM. The red points show 70 values drawn from this density
by Metropolis sampling.
Figure 10: Generation of a new sample using Metropolis sampling. The contours
represent the probability density obtained by kernel smoothing and the red dots
represent the new sample generated by Metropolis sampling.
• Application of the bootstrap technique to the WCA case
In the context of uncertainty quantification in cumulative oil production, the
initial data are the flow simulations of the NR = 72 realizations of the WCA reservoir:
− x1,…,xNR: cumulative oil production obtained by full flow simulation
(target response)
− y1,…,yNR: cumulative oil production obtained by fast flow simulation
(proxy response)
The distance employed for the DKM is computed as the absolute value of the
difference in proxy response between any two realizations: dij = |yi − yj|.
The bootstrap procedure, illustrated in Figure 3, is performed for different numbers
N of transfer function evaluations: in this case N = 3, 5, 8, 10, 15 and 20. For each
value of N, the procedure to generate B bootstrap samples is as follows:
1. Select N realizations using the proxy response as ranking measure or as distance
measure d, according to the method used.
2. Evaluate the response using the transfer function (flow simulation) on
the N selected realizations.
3. Compute the bivariate density F̂n of the target response using kernel
smoothing on the N responses resulting from the selected realizations.
4. Use Metropolis sampling to generate B samples of the bivariate distribution
F̂n.
5. For each of the B samples generated, apply ranking or DKM to select N
realizations and compute the statistics of interest: θ̂̂* = (x̂̂*_P10, x̂̂*_P50, x̂̂*_P90).
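The five steps above can be sketched in miniature. The sketch below is a simplified 1D stand-in, not the paper's bivariate procedure: it fits a Gaussian kernel-smoothed density to N hypothetical response values, then draws bootstrap samples from it directly (sampling from a Gaussian KDE amounts to resampling a data point and adding kernel noise, so no Metropolis step is needed in 1D). The function name, bandwidth rule and data are all illustrative assumptions:

```python
import numpy as np

def smoothed_bootstrap_quantiles(responses, n_draw, B, bandwidth=None, seed=0):
    """Steps 3-5 in miniature: fit a Gaussian kernel-smoothed density to the
    evaluated responses, then for each of B bootstrap samples draw n_draw
    values from it and record the (P10, P50, P90) statistics."""
    rng = np.random.default_rng(seed)
    x = np.asarray(responses, dtype=float)
    # Silverman's rule of thumb for the kernel bandwidth.
    h = bandwidth or 1.06 * x.std() * len(x) ** (-1 / 5)
    stats = []
    for _ in range(B):
        centers = rng.choice(x, size=n_draw, replace=True)   # resample data points
        sample = centers + rng.normal(0.0, h, size=n_draw)   # add kernel noise
        stats.append(np.percentile(sample, [10, 50, 90]))    # theta* for this sample
    return np.array(stats)  # shape (B, 3)

# Hypothetical responses from N = 10 flow simulations:
resp = [6.1, 6.4, 6.6, 6.8, 7.0, 7.1, 7.3, 7.4, 7.6, 7.9]
theta = smoothed_bootstrap_quantiles(resp, n_draw=72, B=500)
```

Each row of `theta` is one bootstrap realization of (P10, P50, P90); the spread of these rows across the B samples is what yields confidence intervals on the quantiles.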
The workflow illustrated in Figure 11 gives more details than the general
workflow in Figure 3, by including the estimation of F̂n by kernel smoothing and the
generation of new samples by Metropolis sampling.
[Figure 11 depicts this as a flowchart: the proxy responses y1, …, yNR feed
DKM/ranking selection of N realizations; the transfer function is evaluated on the
N selected realizations; kernel smoothing on these N responses yields F̂n; Metropolis
sampling generates B new samples; and DKM/ranking selection followed by P10, P50
and P90 evaluation is applied to each sample.]
Figure 11: Workflow for confidence interval calculation.
The next section shows an application of the workflow illustrated above in Figure
11. The workflow is performed using 3 different methods for selecting the
realizations: DKM, ranking and random selection. Our objective is to see how each
method behaves as the number of transfer function evaluations increases and how the
methods compare to each other.
First, we compare the 3 methods by looking at the histograms of the bootstrap
error of the estimated quantiles for each method (Figure 12). The bootstrap error is
computed using Eq. 1 above. The blue, red and green bars represent the error
obtained for DKM, ranking and random selection respectively.
[Figure 12 shows six histogram panels, one for each of N = 3, 5, 8, 10, 15 and 20,
plotting frequency against bootstrap error (response value), with bars for DKM,
ranking and random selection.]
Figure 12: Histograms of the bootstrap error of the estimated quantiles for
different numbers of function evaluations and the 3 selection methods.
We observe that, in each case, the DKM method performs better than the ranking
technique. For all values of N, the errors are globally smaller for the DKM than for
ranking or random selection. In addition, the error variance is reduced with more
transfer function evaluations.
Figure 13 represents the bootstrap percentile intervals (α = 10%) of the bootstrap
error of the estimated quantiles. The symbol in each interval represents the P50 value
of the error.
[Figure 13 plots the error on quantile estimation against the number of function
evaluations for DKM, ranking and random selection.]
Figure 13: Confidence intervals (α = 10%) of the bootstrap error of the estimated
quantiles as a function of the number of function evaluations.
We observe in Figure 13 that the error tends to decrease as the number of
function evaluations increases. As observed before on the histograms, DKM performs
better than ranking, which performs better than random selection. This conclusion
was also reached in Scheidt and Caers (2009). In this example, we observe that for
the DKM, the results stabilize for N > 8. We can therefore conclude that 8 or 10 flow
simulations are necessary for the DKM selected models to have the same uncertainty
as the total set of 72. In a previous paper (Scheidt and Caers, 2009), it was concluded
that 7 simulations were satisfactory. Note however that the distance in that work was
slightly more correlated to the difference in response compared to the correlation in
this study.
Table 1 below reports the mean of the bootstrap error, computed from the
histograms presented in Figure 12.
DKM Ranking Random select.
N = 3 0.0356 0.0495 0.0561
N = 5 0.0333 0.0348 0.0407
N = 8 0.0280 0.0293 0.0324
N = 10 0.0253 0.0270 0.0322
N = 15 0.0250 0.0325 0.0367
N = 20 0.0270 0.0316 0.0340
Table 1: Mean of the dimensionless bootstrap error for each selection method.
This table, as well as the histograms and confidence intervals, can be very useful
to give an indication of the error resulting from the quantile estimation of the
response, based on the N selected realizations. For example, suppose we are limited
in time and can only perform 5 transfer function evaluations, but we want to
be confident in the uncertainty quantification results derived from
those 5 simulations. From Table 1, we can see that the mean error for N = 5 is
0.0333 for DKM and 0.0348 for ranking. If we had a little more time and had
performed N = 8 simulations, the errors would be 0.0280 and 0.0293, an
improvement of 16% for DKM (15.8% for ranking) compared to the results for N = 5.
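As a quick check, the improvement figures quoted above follow directly from the Table 1 entries (a small Python sketch; `improvement` is our own helper name, not from the paper):

```python
# Mean bootstrap errors from Table 1, keyed by N.
dkm = {3: 0.0356, 5: 0.0333, 8: 0.0280, 10: 0.0253, 15: 0.0250, 20: 0.0270}
ranking = {3: 0.0495, 5: 0.0348, 8: 0.0293, 10: 0.0270, 15: 0.0325, 20: 0.0316}

def improvement(errors, n_from, n_to):
    """Relative error reduction when moving from n_from to n_to simulations."""
    return (errors[n_from] - errors[n_to]) / errors[n_from]

# Going from N = 5 to N = 8 reduces the mean error by about 16% for DKM
# and about 15.8% for ranking, matching the figures in the text.
```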
Another way of looking at the results is to show the confidence intervals for each
quantile individually. This is illustrated in Figure 14.
[Figure 14 shows three panels giving, for P10, P50 and P90 (in BBL ×10^4), the
confidence intervals as a function of the number of function evaluations for DKM,
ranking and random selection.]
Figure 14: Confidence intervals of the bootstrap estimates of the quantiles P10,
P50 and P90 (BBL) as a function of the number of function evaluations. The line
represents the quantiles derived from the entire set of realizations.
Figure 14 shows that DKM and ranking produce very accurate estimates of the
P50 quantile of the target response, even for small numbers of transfer function
evaluations (medians are easier to estimate than extremes). In addition, the P10
quantiles tend to be slightly underestimated, but DKM is closer to the reference
value than the other techniques. The same conclusions hold for the P90, except
that we observe an overestimation of the quantiles. The underestimation of P10 and
overestimation of P90 are most likely due to the use of kernel smoothing to estimate
the density, which increases the variability of the response compared to the
original 72 realizations.
As mentioned in the beginning of the paper, the proxy measure should be
correlated with the target response for DKM and ranking to be effective. However,
the correlation coefficient between the two responses is not known a priori, since the
target response is unknown for all realizations. Once a selection method is applied
and the transfer function is evaluated on the selected realizations, the correlation
coefficient can be estimated. The quality of the estimated correlation coefficient can
be studied in exactly the same way as the estimated quantiles, by parametric
bootstrap. Figure 15 represents the confidence intervals obtained for different values
of N, the correlations being estimated on the same samples used to estimate the
quantile error. The symbols show the initial estimates ρ̂ of the correlation coefficient.
[Figure 15 plots the estimated correlation coefficient against the number of
function evaluations.]
Figure 15: Bootstrap estimated correlation coefficient on the WCA test case.
Figure 15 shows that the first estimates ρ̂ of the correlation coefficient between
the 2 responses are accurate, with a slight overestimation for small numbers of
transfer function evaluations (N = 3 and 5). However, the bootstrap estimated
correlation coefficients are largely underestimated. We believe that this is due to the
kernel smoothing technique, which tends to smooth the density of the bivariate data
and therefore allows Metropolis sampling to draw points in a "wider" area than it
should. This was not the case for the bi-normal example in Section 4.1. However,
we can still derive conclusions from the confidence intervals provided. We observe
that DKM tends to have less uncertainty in the correlation coefficient than ranking,
except for N = 8.
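A bootstrap check of the correlation coefficient along these lines can be sketched as follows. This simplified version resamples (target, proxy) pairs directly rather than going through the smoothed density and Metropolis sampling, and the paired values are invented for illustration:

```python
import numpy as np

def bootstrap_correlation(target, proxy, B=1000, seed=0):
    """Nonparametric bootstrap sketch of the correlation coefficient rho
    between target and proxy responses on the N evaluated realizations.
    (A stand-in for the paper's parametric bootstrap through F_hat_n.)"""
    rng = np.random.default_rng(seed)
    x = np.asarray(target, dtype=float)
    y = np.asarray(proxy, dtype=float)
    n = len(x)
    rhos = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # resample pairs with replacement
        rhos.append(np.corrcoef(x[idx], y[idx])[0, 1])
    rhos = np.array(rhos)
    return rhos.mean(), np.percentile(rhos, [5, 95])  # estimate + 90% interval

# Invented responses for N = 10 realizations (proxy correlated with target):
x = [6.1, 6.4, 6.6, 6.8, 7.0, 7.1, 7.3, 7.4, 7.6, 7.9]
y = [5.9, 6.5, 6.5, 6.9, 6.9, 7.2, 7.2, 7.5, 7.7, 7.8]
rho_hat, (lo, hi) = bootstrap_correlation(x, y)
```

A high `rho_hat` with a narrow (lo, hi) interval is the situation in which the distance or ranking measure can be trusted.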
5. Discussion on distances
The above examples demonstrate that using the same measure for ranking and
distance provides for similar accuracy in uncertainty quantification for the Gaussian
case. We should emphasize however that the bootstrap method applied in the context
of the paper is clearly unfavorable to DKM. In order to compare ranking and the
DKM, we calculated the distance between 2 realizations as the difference of the
ranking measure between the realizations. This leads to a representation of
uncertainty in a 1D MDS-space, and therefore the use of kernel methods does not
have the same impact as in a higher-dimensional MDS-space. The distance in this
study is very simple, whereas in many applications the distance can be much more
complex and can take into account many measures of influential factors on the
response. For example, a distance can be a function of many parameters, such as the
cumulative oil production at different times and the water-cut of a group of wells
(Scheidt and Caers, 2009). Using traditional ranking techniques may require multiple independent studies
if one is interested in uncertainty in several responses. In the case of DKM, a single
study is enough if the distance is well chosen.
6. Conclusions
We have established a new workflow to construct confidence intervals on
quantile estimations in model selection techniques. We would like to state explicitly
that we do not treat the question of whether or not the uncertainty model, i.e. the
possibly large set of reservoir models that can be generated by varying several input
parameters, is realistic. Uncertainty quantification by itself is inherently subjective
and any confidence estimates of the uncertainty model itself are therefore useless. In
this paper we assume there is a larger set of model realizations and assume that this
set provides a realistic representation of uncertainty. Then, the proposed bootstrap
allows quantifying error on uncertainty intervals or quantiles when only a few models
from the larger set are selected.
The workflow uses model selection methods – in this work DKM or ranking -
and employs a parametric bootstrap procedure to construct confidence intervals on
the quantiles retained by the model selection techniques. Examples show that DKM
provides more robust results compared to ranking, especially for small number of
transfer function evaluations. The study of the uncertainty resulting from model
selection can be very informative - it shows if we can be confident or not in the
estimated statistics. The confidence interval is a function of the estimated variance of
the response and the estimated correlation coefficient between the proxy measure and
the response. Since the user does not know the correlation coefficient a priori, we
propose performing a bootstrap procedure between the response and its proxy to
estimate the quality of the distance. If the estimated correlation coefficient is high
and its associated uncertainty low, then we can be confident in the uncertainty
quantification results. If after N transfer function evaluations the uncertainty is large
and a poor correlation is found, then the results should be improved by either using a
better proxy response or doing more transfer function evaluations.
Nomenclature
NR = number of initial realizations
N = number of selected realizations for transfer function evaluation
X = [X1,…, XNR]
Xi = [xi, yi]
xi = target response value for realization i
yi = proxy response value for realization i
dij = distance between realizations i and j
ρ = correlation coefficient between the target and proxy responses
B = number of samples generated in the bootstrap procedure
e*_b = bootstrap error of the estimated quantiles for sample b
x̂*_P10, x̂*_P50, x̂*_P90 = estimated P10, P50 and P90 after the first selection method
x̂̂*_P10, x̂̂*_P50, x̂̂*_P90 = bootstrap estimated quantiles for the second selection method
References
Ballin, P.R., Journel, A.G., and Aziz, K. 1992. Prediction of Uncertainty in
Reservoir Performance Forecast. JCPT, no. 4.
Batycky, R.P., Blunt, M.J. and Thiele, M.R. 1997. A 3D Field-Scale
Streamline-Based Reservoir Simulator. SPERE 12(4): 246-254.
Borg, I., and Groenen, P. 1997. Modern Multidimensional Scaling: Theory and
Applications. New York: Springer.
Bowman, A.W., and Azzalini, A. 1997. Applied Smoothing Techniques for Data
Analysis. Oxford University Press.
Efron, B. 1979. Bootstrap methods: Another look at the jackknife. The Annals
of Statistics 7(1): 1-26.
Hastings, W. K. 1970. Monte Carlo sampling methods using Markov Chains and
their applications. Biometrika 57: 97–109.
McLennan, J.A., and Deutsch, C.V. 2005. Ranking Geostatistical Realizations by
Measures of Connectivity, Paper SPE/PS-CIM/CHOA 98168 presented at the SPE
International Thermal Operations and Heavy Oil Symposium, Calgary, Alberta,
Canada, 1-3 November.
Metropolis, N., and S. Ulam. 1949. The Monte Carlo method. J. Amer. Statist.
Assoc. 44: 335–341.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E.
1953. Equations of state calculations by fast computing machines. Journal of
Chemical Physics 21: 1087-1091.
Saad, N., Maroongroge, V. and Kalkomey C. T. 1996. Ranking Geostatistical
Models Using Tracer Production Data, Paper presented at the European 3-D
Reservoir Modeling Conference, Stavanger, Norway, 16-17 April.
Scheidt, C., and Caers, J. 2008. Representing Spatial Uncertainty Using Distances
and Kernels. Mathematical Geosciences, DOI:10.1007/s11004-008-9186-0.
Scheidt, C., and Caers, J. 2009, A new method for uncertainty quantification
using distances and kernel methods. Application to a deepwater turbidite reservoir.
Accepted in SPEJ. To be published.
Schoelkopf B., Smola A. (2002) Learning with kernels, MIT Press, Cambridge,
664p.
Strebelle, S. 2002. Conditional Simulation of Complex Geological Structures
using Multiple-point Statistics, Mathematical Geology, 34(1): 1-22.