Bootstrap confidence intervals for reservoir model
selection techniques
Céline Scheidt and Jef Caers
Department of Energy Resources Engineering
Stanford University
Abstract
Stochastic spatial simulation allows rapid generation of multiple,
alternative realizations of spatial variables. Quantifying uncertainty on response
resulting from those multiple realizations would require the evaluation of a
transfer function on every realization. This is not possible in real applications,
where one transfer function evaluation may be very time consuming (several
hours to several days). One must therefore select a few representative
realizations for transfer function evaluation and then derive the production
statistics of interest (typically the P10, P50 and P90 quantiles of the response).
By selecting only a few realizations one may risk biasing the P10, P50 and P90
estimates as compared to the original multiple realizations.
The principal objective of this study is to develop a methodology to
quantify confidence intervals for the estimated P10, P50 and P90 quantiles
when only a few models are retained for response evaluation. Our approach is
to use the parametric bootstrap technique, which allows evaluating the
variability of the statistics obtained from uncertainty quantification and
constructing confidence intervals. A second objective is to compare the confidence
intervals when using two selection methods available to quantify uncertainty
given a set of geostatistical realizations: traditional ranking technique and the
distance-based kernel clustering technique (DKM). The DKM has been
recently developed and has been shown to be effective in quantifying
uncertainty.
The methodology is demonstrated using two examples. The first example is
a synthetic example, which uses bi-normal variables and serves to demonstrate
the technique. The second example is from an oil field in West Africa where
the uncertain variable is the cumulative oil production coming from 20 wells.
The results show that for the same number of transfer function evaluations, the
DKM method has equal or smaller error and confidence interval compared to
ranking.
1. Introduction
Uncertainty quantification of subsurface spatial phenomena is done in the context
of decision making, often by estimating low, mean and high quantile values (typically
P10, P50, and P90) of the response of interest. Often, an exhaustive sampling of all
uncertain parameters is unfeasible, and only a small subset of reservoir model
realizations of the phenomena can be created. Due to high computational
requirements, the transfer function must be evaluated on an even smaller subset of
realizations. Therefore, any quantiles that are estimated from this subset are
themselves subject to uncertainty, and may vary depending on the selection method,
the number of transfer function evaluations, the initial set of realizations, the use of a
proxy response, etc.
The objective of the study is to be able to quantify confidence intervals for the
estimated P10, P50 and P90 quantiles when only a few models are retained for
response evaluation. The magnitude of the confidence intervals can then be used to
decide whether or not more flow simulations are required to establish a better
quantification of response uncertainty. The methodology developed uses the
parametric bootstrap technique, a statistical method for constructing confidence
intervals on the estimated statistics. Such confidence intervals provide an idea of the
variability of the statistics inferred by selecting only a few models for evaluation.
The workflow can be applied using any technique of reservoir model selection. In
this paper, we compare the behavior of the estimated quantiles using three different
selection techniques. The first method is the traditional ranking technique (Ballin et
al., 1992), which selects realizations according to a ranking measure. The second
method has been developed recently and is called the distance-based kernel technique
(DKM, Scheidt and Caers, 2008). Finally, we use a random selection for comparison.
It should be noted that the proposed bootstrap technique applies to any model
selection methodology.
The paper is organized as follows. In the next section, we give a description of
the two methods employed to quantify uncertainty in spatial parameters. Then, we
give a brief overview of the basic ideas of the bootstrap methodology in the context
of parametric inference, illustrated by a typical example. We then describe our
workflow which is applied to cases where we have a proxy response which can be
evaluated rapidly for each realization, and a true response which cannot be evaluated
for every realization. The subsequent section is devoted to the application of the
specific workflow to two examples, the first being a synthetic example, the second is
an example from an oil field in West Africa. Finally, we discuss the results obtained
as well as some concluding remarks.
2. Quantification of uncertainty – methodologies
Uncertainty quantification of a spatial phenomenon aims at characterizing the
statistics (P10, P50 and P90) of the response(s) of interest. In real applications where
one transfer function evaluation can be very time consuming, it may not be possible
to perform a transfer function evaluation on every realization of the reservoir. This
difficulty can be overcome by selecting a representative set of realizations from the
initial set. In this paper, we consider two different ways of selecting realizations for
transfer function evaluation. The first method is the traditional ranking technique,
which was introduced by Ballin et al. in 1992. The second method, denoted the
Distance-Kernel Method (DKM) is more recent and was first presented in 2008
(Scheidt and Caers) and applied to a real case in Scheidt and Caers (2009).
2.1. Traditional Ranking
The traditional ranking technique was introduced by Ballin et al. (1992) in the context of
stochastic reservoir modeling. The basic idea behind ranking is to define a rapidly
calculable ranking measure, which can be evaluated for each realization. Most of the
time, the ranking measure is static (e.g. original oil-in-place); however, more recent
studies employ more complex measures, such as connectivity (McLennan and
Deutsch, 2005), streamline-based (Gilman et al., 2002) or tracer-based measures (Ballin et
al., 1992; Saad et al., 1996). The ranking measure acts as a proxy of the response of
interest for each realization. To be effective, therefore, ranking requires a good
correlation between the ranking measure and the response. The realizations are
ranked according to this measure, and the realizations corresponding typically to the
P10, P50 and P90 quantiles of the measure are subsequently selected.
Full flow simulation is then performed on these selected realizations, and the P10,
P50 and P90 values are derived from the simulation results.
In previous work (Scheidt and Caers, 2009), we showed that selecting only 3
realizations to derive the P10, P50, and P90 quantiles can result in very inaccurate
estimates. In this study, contrary to the standard ranking approach, we propose to
select more than 3 realizations, and compare ranking with the Distance-Kernel
Method proposed below. The realizations are selected equally-spaced according to
the ranking measure, and we derive the P10, P50 and P90 quantiles by interpolation
from the distribution of the selected points.
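The selection scheme described above can be sketched as follows. This is our own illustrative code (assuming Python/NumPy; function names and the synthetic data are ours, not the authors'): realizations are sorted by the ranking measure, equally spaced positions along the ranked list are retained, and the quantiles are interpolated from the responses of the selected realizations.

```python
import numpy as np

def ranking_select(proxy, n_select):
    """Select n_select realizations equally spaced along the ranking measure.

    Illustrative sketch only; 'proxy' is the ranking measure per realization.
    """
    order = np.argsort(proxy)                          # rank realizations by proxy
    # equally spaced positions along the ranked list
    pos = np.linspace(0, len(proxy) - 1, n_select).round().astype(int)
    return order[pos]                                  # indices of the selected realizations

def quantiles_from_subset(responses, qs=(0.10, 0.50, 0.90)):
    """Interpolate P10/P50/P90 from the responses of the selected realizations."""
    return np.quantile(responses, qs)

# usage: 100 realizations, proxy reasonably well correlated with the true response
rng = np.random.default_rng(0)
true = rng.normal(5, np.sqrt(2), 100)
proxy = 0.9 * true + rng.normal(0, 0.6, 100)
idx = ranking_select(proxy, 9)
p10, p50, p90 = quantiles_from_subset(true[idx])
```

In a real application only the proxy would be known for all realizations; the transfer function would then be run on the `idx` subset.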
2.2. Distance-Kernel Method
In this section, we describe the main principle of the Distance-Kernel Method
(DKM), illustrated in Figure 1. Starting from a large number of model realizations,
the first step is to define a dissimilarity distance between the realizations. This
distance is a measure of the dissimilarity between any two realizations, and should be
tailored to the application and the response(s) of interest (just as in ranking), in order
to make uncertainty quantification more efficient. The distance is evaluated between
any two realizations, and a dissimilarity distance table (NR x NR) is then derived.
Multi-dimensional scaling (MDS) is then applied using the distance table (Borg and
Groenen, 1997). This results in a map (usually 2D or 3D) of the realizations, where the
Euclidean distance between any two realizations approximates the corresponding entry
in the distance table. Note that only the distances between the realizations in the new
space matter - the actual positions of the realizations are irrelevant. Once the realizations are in MDS space, one
could classify realizations and select a subset using clustering techniques. However,
often the points in MDS space do not vary linearly and thus classical clustering
methods would result in inaccurate classification. To overcome the nonlinear
variation of the points, Schölkopf et al. (2002) introduced kernel methods to
improve the clustering results. The main idea behind kernel methods is to introduce a
highly non-linear function Φ and map the realizations from the MDS space to a new
space, called feature space. The high dimensionality of that space makes the points
behave more linearly and thus standard classification tools, such as clustering, can be
applied more successfully. In this paper, we employ kernel k-means to select
representative realizations of the entire set. Transfer function evaluation is then
applied on the closest realization to the centroids and the statistics (P10, P50 and P90)
are computed on the small subset of realizations.
Figure 1: DKM for uncertainty quantification: (a) distance between two models,
(b) distance matrix, (c) models mapped in Euclidean space, (d) feature space, (e) pre-
image construction, (f) P10, P50, P90 estimation
For more details about the methodology, we refer to Scheidt and Caers (2008).
3. Parametric Bootstrap – Methodology
3.1. General introduction to Bootstrap
Bootstrap methods fall within the broader category of resampling methods. The
concept of the bootstrap was first introduced by Efron (1979). In his paper, Efron
considered two types of bootstrap procedures (nonparametric and parametric
inference). Bootstrap is a Monte-Carlo simulation technique that uses sampling
theory to estimate the standard error and the distribution of a statistic. In many recent
statistical texts, bootstrap methods are recommended for estimating sampling
distributions, finding standard errors and confidence intervals. A bootstrap procedure
is the practice of estimating properties of an estimator (such as its variance) by
measuring those properties when sampling from an approximate distribution. In the
parametric bootstrap, we consider an unknown distribution F to be a member of
some prescribed parametric family and obtain the empirical distribution F̂n by
estimating the parameters of the family from the data. Then, a new random sequence,
called a resample, is generated from the distribution F̂n.
The parametric bootstrap procedure works as follows. First, the statistics θ̂ of
the distribution of the initial sample are computed (for example the mean and
variance). Then, the distribution F̂n is estimated using those statistics. We assume
that the distribution F̂n is the true distribution and we use Monte-Carlo simulation to
generate B new samples, each the size of the initial sample, from the distribution F̂n.
Next, we apply the same estimation technique to these “bootstrapped” data to get a
total of B bootstrap estimates of θ̂, denoted θ̂*b, b = 1,…,B. Using these B bootstrap
estimates, we can compute confidence intervals or any other statistical measure of
error.
Simple illustrative example
A simple example illustrating the parametric bootstrap method is presented in
Figure 2. Suppose we have NR = 15 values X = (x1,…,xNR) drawn from a normal
distribution N(µ, σ) and we are interested in the estimation of the unknown
parameters µ and σ. The first step is to assume that X has a normal distribution Fn
and then to estimate the mean and variance of the distribution:

   µ̂ = x̄   and   σ̂² = (1/NR) Σi=1..NR (xi − x̄)²
We assume that µ̂ and σ̂ are the true parameters and we generate B = 1000 new
samples X*b (b = 1,…,B) from F̂n = N(µ̂, σ̂) using Monte-Carlo simulation, each
sample containing NR = 15 values. For each sample, the bootstrap estimates of the
mean and variance of the distribution can be calculated:

   µ̂*b = x̄*b   and   (σ̂*b)² = (1/NR) Σi=1..NR (x*b,i − x̄*b)²

Having computed θ̂*b = (µ̂*b, (σ̂*b)²), one can now construct a histogram of the
mean and the variance to display the probability distribution of the bootstrap
estimator (Figure 2). From this distribution, one can obtain an idea of the statistical
properties of the estimates µ̂ and σ̂². In Figure 2, the red line represents the
estimate of the mean µ̂ and variance σ̂² of the initial sample.
Figure 2: Application of the parametric bootstrap procedure to a simple example
The histograms of the bootstrap estimations of the mean and the variance are
informative about the variability of the statistics obtained. Confidence intervals of
the estimated mean and variance (or any quantiles) can then be calculated from the B
estimates of the mean and variance.
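The example above can be sketched in a few lines. This is our own illustration (the seed, true parameters and the use of NumPy are assumptions, not part of the paper): fit a normal distribution to the initial sample, resample B times from the fitted distribution, re-estimate the statistics on each resample, and read confidence intervals off the bootstrap distributions.

```python
import numpy as np

rng = np.random.default_rng(42)

# initial sample: NR = 15 draws from a normal distribution (unknown to the analyst)
NR, B = 15, 1000
x = rng.normal(5.0, 3.0, NR)

# step 1: fit the parametric family to the data
mu_hat, var_hat = x.mean(), x.var()              # (1/NR) variance, as in the text

# step 2: resample B times from the fitted distribution F_n = N(mu_hat, sigma_hat)
boot = rng.normal(mu_hat, np.sqrt(var_hat), size=(B, NR))

# step 3: re-estimate the statistics on every bootstrap sample
boot_mu = boot.mean(axis=1)
boot_var = boot.var(axis=1)

# 90% confidence intervals from the bootstrap distributions
ci_mu = np.quantile(boot_mu, [0.05, 0.95])
ci_var = np.quantile(boot_var, [0.05, 0.95])
```

Histograms of `boot_mu` and `boot_var` would reproduce the kind of display shown in Figure 2.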
The next section shows how to apply the bootstrap method in the context of
uncertainty quantification where a proxy value can be rapidly calculated for many
realizations of a spatial phenomenon.
3.2. Workflow of the study
Contrary to the previous example where the data are univariate, in the context of
reservoir model selection techniques, a proxy response is employed for the selection
using DKM or ranking and thus two variables are necessary: the response of interest
and the proxy response.
Therefore, we consider a bivariate variable X = [X1, X2,…, XNR], where:
• Xi = [xi,yi], i =1,…,NR, NR being the total number of samples/realizations
• xi represents the response of interest (e.g. cumulative oil production)
• yi represents the proxy response, which will serve as a ranking measure or
be transformed to a distance.
Note that for ranking and DKM to be effective, the response and its proxy should
be reasonably well correlated. In addition, for real applications, the values of the true
response xi for each realization are unknown.
In traditional uncertainty quantification, the proxy response serves as a guide to
select a few realizations which will be evaluated using the transfer function. The
response quantiles are then deduced from the evaluation of the realizations. Since the
resulting quantiles are subject to uncertainty, the bootstrap method illustrated
previously is well suited to the problem and can inform us on the accuracy of the
estimated quantiles and give an idea of the error resulting from the selection of a
small subset of realizations.
The workflow in the context of reservoir model selection is as follows; it is
illustrated in Figure 3.
1. Evaluate a proxy response yi for each of the i = 1,…,NR realizations.
2. Apply ranking or DKM using the proxy response in order to select N
samples/realizations for evaluation (note that N << NR).
3. Evaluate the response using the transfer function on the N selected
realizations.
4. Estimate the bivariate distribution F̂n from the N evaluated responses and
generate B new samples from F̂n.
5. For each of the B samples, apply the same selection and estimation
procedure to obtain B bootstrap estimates of the P10, P50 and P90
quantiles, from which we compute confidence intervals.
6. A single measure of accuracy of our quantile estimation is defined by
computing the dimensionless bootstrap error of the estimated quantiles
for each of the B new samples created (Eq. 1):

   error*b = (1/3) ( |x̂*P10,b − x̂P10| / x̂P10 + |x̂*P50,b − x̂P50| / x̂P50 + |x̂*P90,b − x̂P90| / x̂P90 )   (1)
The bootstrap error of the estimated quantiles is evaluated on each sample, and
thus can be represented as a histogram to visualize the variability between the
samples. From the histogram, we can quantify the variation of the bootstrap error of
the estimated quantiles.
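The bootstrap error of Eq. (1) can be computed per sample as follows (a sketch; the array layout and function name are our own):

```python
import numpy as np

def bootstrap_error(boot_quantiles, ref_quantiles):
    """Dimensionless bootstrap error of Eq. (1): mean relative deviation of the
    bootstrap P10/P50/P90 from the reference (initial-sample) quantiles.

    boot_quantiles: shape (B, 3), columns = P10, P50, P90 for each sample
    ref_quantiles:  shape (3,), the quantiles estimated on the initial sample
    """
    b = np.asarray(boot_quantiles, dtype=float)
    ref = np.asarray(ref_quantiles, dtype=float)
    return (np.abs(b - ref) / np.abs(ref)).mean(axis=1)   # one error per sample b

# usage: two bootstrap samples against reference quantiles (1.0, 5.0, 9.0)
errors = bootstrap_error([[1.1, 5.2, 8.7], [0.9, 4.8, 9.3]], [1.0, 5.0, 9.0])
```

The resulting vector of B errors is what the histograms discussed above display.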
Figure 3: Workflow of the bootstrap method applied to uncertainty quantification
The workflow described previously and illustrated in Figure 3 is performed for
several values of N, where N is the number of selected realizations for evaluation.
This is done to evaluate the influence of the number of transfer function evaluations
on the accuracy of the quantile estimation. For each value of N, the selected
realizations are obtained using DKM or ranking methods, and therefore the
realizations are different for each value of N.
Now that the basic idea and theory of the bootstrap method have been presented,
the next section shows some application examples of this technique in the context of
uncertainty quantification.
4. Application of the methodology to uncertainty
quantification
Two examples are presented in this section. The first one is illustrative and uses a
bivariate Gaussian distribution. The second example is more complex and is based
on a real oil field reservoir in West Africa (West Coast African reservoir) and uses
real production data.
In the case of DKM, the definition of a distance between any two realizations is
required. In this study, in order to compare the results of the DKM with those
obtained by ranking using the exact same information, we use simply the difference
of ranking measure (proxy response) as a distance between realizations. Note
however that, as opposed to the ranking measure, the distance can be calculated using
a combination of many different measures, and thus has more flexibility to be tailored
to the application. We will discuss the consequences of this in more detail below.
4.1. Bivariate Gaussian distribution
In the first example, we consider a bivariate Gaussian distribution:
X ~ biN(µ, Σ), where µ represents the mean and Σ the covariance matrix. In this
example, the mean of the sample is taken as µ = [5, 5], and the covariance is taken
as:

   Σ = [ 2    2ρ
         2ρ   2 ]

The parameter ρ defines the correlation coefficient between the
target response and the proxy response.
To set up an example, an initial sample X of NR = 100 values is generated using
the distribution given above. Note that for this illustrative example, we use the term
sample instead of realization, since no geostatistical realization is associated with
each bivariate value. Figure 4 shows an example of the probability density plot of the bi-
normal sample X, where the correlation coefficient between the target and proxy
responses was defined as ρ = 0.9.
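Generating such a sample might look like this (a sketch assuming NumPy; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
rho, NR = 0.9, 100
mu = np.array([5.0, 5.0])
Sigma = np.array([[2.0, 2.0 * rho],
                  [2.0 * rho, 2.0]])             # correlation coefficient = rho

# column 0: target response, column 1: proxy response
X = rng.multivariate_normal(mu, Sigma, size=NR)
r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]          # sample correlation, close to rho
```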
Figure 4: Probability density of X, which has a bi-normal distribution
Now that the initial data is defined, we assume that we only know the type of
distribution of X (bi-normal), but that we do not know the parameters defining the
distribution (the mean µ and the covariance Σ ). The bootstrap procedure illustrated
in Figure 3 is applied taking the sample X generated previously (Figure 4) and using
DKM to select N = 15 points. Estimates of the mean µ̂ and the covariance Σ̂ are
then obtained using the responses at the 15 selected points, and the estimated
bivariate distribution of the data is assumed to be the true distribution:
F̂n = biN(µ̂, Σ̂). B = 1000 new samples of this distribution can then be easily
derived, since the distribution is known. Uncertainty quantification is then performed
on those B samples, and an estimation of the variability of the quantiles is possible.
Examples of the bootstrap histograms of the P10, P50 and P90 quantiles are presented
in Figure 5.
Figure 5: Histograms of the P10, P50 and P90 quantiles estimated by bootstrap
(x̂*P10,b, x̂*P50,b, x̂*P90,b). The red line represents the estimated quantiles
x̂P10, x̂P50, x̂P90. The estimates are calculated using DKM to select 15 points.
We observe in Figure 5 that the distribution of the bootstrap quantiles is
Gaussian. In addition, there is a small bias in the estimation of the P10 and P90
quantiles for this example. Although this is not shown, ranking has the same effect.
The result is that, on average, x̂*P10,b is overestimated and x̂*P90,b is underestimated.
The biased estimates should not affect the determination of the confidence intervals.
In our study, we have found that the estimated mean µ̂ and covariance Σ̂ from
the initial sample had an impact on the confidence intervals. Since our goal in this
first example is to understand what the general behavior is when varying the number
of selected samples N, we propose to do a Monte-Carlo bootstrap, which basically
means that we randomize the initial sample and use C sets of initial samples, then
perform the workflow illustrated in Figure 3 on those C sets of initial samples. The
estimated statistics of each initial sample are averaged to obtain the final statistics. In
this study, we take C = 15. In the next few examples, the workflow illustrated in
Figure 3 has been performed by varying the number of selected samples (more
precisely, N = 5, 8, 10, 15 and 20), in order to examine the effect of the number of
transfer function evaluations on the bootstrap error. In addition, several correlation values
between the proxy response and the target response were used to explore the
influence of the correlation coefficient on the confidence intervals. Results are
presented in Figure 6, for ρ = 1, 0.9, 0.8, 0.7, 0.6 and 0.5 respectively. Figure 6
shows the confidence intervals of the error of the bootstrap estimated quantiles for
DKM (blue - square) and ranking (red - dot) for different values of N. The number of
bootstrap samples generated is B = 1000. The symbols represent the P50 value of
the error of the estimated quantiles, in other words, half of the estimated quantiles
have an error below this value and half above.
Figure 6: Confidence intervals (α = 10) of the bootstrap error of the estimated
quantiles as a function of the number of function evaluations for ρ = 1, 0.9, 0.8, 0.7,
0.6 and 0.5. The symbols represent the P50 value of the bootstrap error.
We observe in Figure 6 that the error globally decreases as the number of transfer
function evaluations increases. Also, the confidence intervals tend to narrow as the
number of transfer function evaluations increases, meaning that the error in our
estimates decreases. Both methods, DKM and ranking, provide similar results.
However, the error obtained by the DKM is slightly smaller than the one observed for
ranking. The same remark is valid for the confidence intervals. Finally, the results
provided by DKM vary more smoothly than those obtained by the ranking technique.
Note that each method selects N samples for evaluation optimally. Therefore, the
N = 8 models selected do not necessarily include the N = 5 models. This is true for
all N.
The bootstrap method can also be used to compute an estimate of the correlation
coefficient between the actual response and the proxy response. Figure 7 presents the
confidence intervals for the correlation corresponding to the results obtained in Fig. 6.
Figure 7: Bootstrap estimates (α = 10) of the correlation between the response
and the proxy. The black line represents the input correlation (ρ = 0.9, 0.8, 0.7, 0.6
and 0.5) used to generate the first sample.
We observe in Figure 7 that the correlation coefficient tends to
be overestimated, especially for small values of N. Figure 7 also shows that the
correlation estimate becomes more accurate and less prone to error as the number
of transfer function evaluations increases.
The next section illustrates the workflow using a real oil reservoir, located in
West Africa.
4.2. West Coast African reservoir
• Reservoir Description
The West Coast African (WCA) reservoir is a deepwater turbidite reservoir
located offshore in a slope valley, in 1600 feet of water and 4600 feet below sea
level. Four depositional facies were interpreted from the well logs: shale (Facies 1),
poor quality sand #1 (Facies 2), poor quality sand #2 (Facies 3) and good quality
channels (Facies 4). The description of the facies filling the slope valley is subject to
uncertainty. Twelve training images (TIs) are used in this case study, representing
uncertainty in the facies representations.
The reservoir is produced with 28 wells, of which 20 are production wells and 8
are water injection wells. The locations of the wells are displayed in Figure 8; wells
colored red are producers and wells colored blue are injectors.
Figure 8: Location of the 28 wells. Red are production wells and blue are
injection wells. Different colors in grid represent different fluid regions
Seventy-two geostatistical realizations were created using the multi-point
geostatistical algorithm snesim (Strebelle, 2002): to include spatial uncertainty, two
realizations were generated for each combination of TI and facies probability cube,
leading to a total of 72 possible realizations of the WCA reservoir. Each flow
simulation took 4.5 hours.
In a previous paper (Scheidt and Caers, 2009), uncertainty quantification on the
WCA reservoir was performed using only a small number of flow simulations. It was
shown that the statistics obtained by flow simulation on 7 realizations selected by the
DKM are very similar to those obtained by simulation on the entire set of 72
realizations. A comparison with the traditional ranking method showed that the
DKM easily outperforms the ranking technique without requiring any additional
information. However, in reality, one would not have access to the results of those
72 flow simulations; hence one would not know how accurate the P10, P50, P90
results of those 7 flow simulations are with respect to the entire set of 72 flow
simulations.
In this study, the response of interest is the cumulative oil production at 1200
days. We have evaluated the response for each of the 72 realizations, as a reference.
For the proxy response, we evaluated the cumulative oil production using streamline
simulation (Batycky et al., 1997). The correlation coefficient between the response
and the proxy is ρ = 0.92. In order to perform the parametric bootstrap procedure, we
must estimate the joint distribution of the cumulative oil production and its ranking
proxy, and be able to generate new samples from the bivariate density.
Because we do not know a priori the distribution of the cumulative oil production and
its proxy (contrary to the previous example), we propose to compute the bivariate
densities using a kernel smoothing technique (Bowman and Azzalini, 1997).
• Generation of a sample for a kernel smoothing density
Kernel smoothing (Bowman and Azzalini, 1997) is a spatial method that
generates a map of density values. The density at each location reflects the
concentration of points in the surrounding area. Kernel smoothing does not require
making any parametric assumption about the probability density function (pdf). The
kernel smoothing density of a variable X = [x1,…,xNR] is defined as follows:

   f̂(x, h) = (1/NR) Σi=1..NR (1/h) K((x − xi)/h),   x, xi ∈ ℝp

with K the kernel function and h the bandwidth.
In the case of a Gaussian rbf kernel, the kernel smoothing density is defined as:

   f̂(x, h) = (1 / (NR (2π)^(1/2) h)) Σi=1..NR exp(−(x − xi)² / (2h²))
Choosing the bandwidth for the kernel smoothing can be a difficult task, and is
generally a compromise between acceptable smoothness of the curve and fidelity to
the data. The choice of h has an impact on the overall appearance of the resulting
smooth curve, much more so than the choice of the kernel function, which is generally
held to be of secondary importance. In this work, we use a bandwidth which is a
function of the number of points in X.
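A univariate sketch of the Gaussian kernel smoothing density is given below. The bandwidth rule used here is a standard Silverman-style rule of thumb (which depends on the number of points), chosen for illustration; the paper's exact bandwidth choice is not given.

```python
import numpy as np

def gaussian_ksd(x_eval, data, h):
    """Gaussian kernel smoothing density evaluated at points x_eval (1-D sketch):
    f_hat(x) = 1 / (NR * sqrt(2*pi) * h) * sum_i exp(-(x - x_i)^2 / (2 h^2))
    """
    x_eval = np.atleast_1d(x_eval).astype(float)
    d = x_eval[:, None] - np.asarray(data, dtype=float)[None, :]
    return np.exp(-0.5 * (d / h) ** 2).sum(axis=1) / (len(data) * np.sqrt(2 * np.pi) * h)

# usage: density of 72 synthetic values, bandwidth from a rule of thumb
data = np.random.default_rng(3).normal(5, 1.4, 72)
h = 1.06 * data.std() * len(data) ** (-1 / 5)     # Silverman-style bandwidth
f = gaussian_ksd(np.linspace(0, 10, 50), data, h)
```

For the bivariate WCA case, the same construction applies with a product kernel over the response and its proxy.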
For example, Figure 9 shows the density distribution of the 72 data from the
WCA example, estimated by kernel smoothing using a Gaussian kernel.
Figure 9: Probability density of X for WCA
Once the density of the bivariate variable has been defined, new samples of the
same distribution can be generated using the Metropolis sampling technique.
• Overview of the Metropolis sampling algorithm
The Metropolis-Hastings technique is a Markov chain-based method which allows
generating a random variable having a particular distribution (Metropolis and Ulam,
1949; Metropolis et al., 1953). The Metropolis algorithm generates a sequence of
samples from a distribution f as follows:
1. Start with some initial value x0.
2. Given the current value xt-1, draw a candidate value x* from some
proposal distribution (we choose a uniform distribution).
3. Compute the ratio of the density at the candidate and the current points,
α = f(x*) / f(xt-1), and accept the candidate point with probability min(1, α).
4. Return to step 2 until the desired number of samples is obtained.
5. The resulting sample (x1,…,xt) has distribution f.
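The steps above can be sketched as follows: a one-dimensional illustration with an unnormalized Gaussian target (all parameter choices are ours, for illustration only; f need only be known up to a constant).

```python
import numpy as np

def metropolis(f, x0, n, width=1.0, seed=0):
    """Metropolis sampler with a uniform proposal centered on the current point."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    out = np.empty(n)
    for t in range(n):
        cand = x + rng.uniform(-width, width)    # step 2: uniform proposal
        alpha = f(cand) / f(x)                   # step 3: density ratio
        if rng.uniform() < alpha:                # accept with probability min(1, alpha)
            x = cand
        out[t] = x                               # chain value at step t
    return out

# usage: sample from an unnormalized N(5, 1) density
target = lambda x: np.exp(-0.5 * (x - 5.0) ** 2)
sample = metropolis(target, x0=5.0, n=20000, width=2.0)
```

In the WCA workflow, `f` would be the bivariate kernel smoothing density and the chain would propose candidate (response, proxy) pairs.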
An illustration of a sample generated by Metropolis sampling from the
density provided by kernel smoothing is presented in Figure 10. The contours
represent the probability density, which is calculated using N = 10 values of the
response selected by DKM. The red points show 70 values drawn from this density
by Metropolis sampling.
Figure 10: Generation of a new sample using Metropolis sampling. The contours
represent the probability density obtained by kernel smoothing and the red dots
represent the new sample generated by Metropolis sampling.
• Application of the bootstrap technique to the WCA case
In the context of uncertainty quantification in cumulative oil production, the
initial data are the flow simulations of the NR = 72 realizations of the WCA reservoir:
− x1,…,xNR: cumulative oil production obtained by full flow simulation
(target response)
− y1,…,yNR: cumulative oil production obtained by fast flow simulation
(proxy response)
The distance employed for the DKM is computed as the absolute value of the
difference in proxy response between any two realizations: dij = |yi − yj|.
The bootstrap procedure, illustrated in Figure 3, is performed for different numbers
N of transfer function evaluations: in this case N = 3, 5, 8, 10, 15 and 20. For each
value of N, the procedure to generate B bootstrap samples is as follows:
1. Select N realizations using the proxy response as ranking measure or as distance
measure d, according to the method used.
2. Evaluate the response using the transfer function (flow simulation) on
the N selected realizations.
3. Compute the bivariate density F̂n of the target response using kernel
smoothing on the N responses resulting from the selected realizations.
4. Use Metropolis sampling to generate B samples of the bivariate distribution
F̂n.
5. For each of the B samples generated, apply ranking or DKM to select N
realizations and compute the statistics of interest: θ̂̂* = (x̂̂*_P10, x̂̂*_P50, x̂̂*_P90).
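The five steps above can be sketched in miniature. The sketch below is a simplified 1D stand-in, not the paper's bivariate procedure: it fits a Gaussian kernel-smoothed density to N hypothetical response values, then draws bootstrap samples from it directly (sampling from a Gaussian KDE amounts to resampling a data point and adding kernel noise, so no Metropolis step is needed in 1D). The function name, bandwidth rule and data are all illustrative assumptions:

```python
import numpy as np

def smoothed_bootstrap_quantiles(responses, n_draw, B, bandwidth=None, seed=0):
    """Steps 3-5 in miniature: fit a Gaussian kernel-smoothed density to the
    evaluated responses, then for each of B bootstrap samples draw n_draw
    values from it and record the (P10, P50, P90) statistics."""
    rng = np.random.default_rng(seed)
    x = np.asarray(responses, dtype=float)
    # Silverman's rule of thumb for the kernel bandwidth.
    h = bandwidth or 1.06 * x.std() * len(x) ** (-1 / 5)
    stats = []
    for _ in range(B):
        centers = rng.choice(x, size=n_draw, replace=True)   # resample data points
        sample = centers + rng.normal(0.0, h, size=n_draw)   # add kernel noise
        stats.append(np.percentile(sample, [10, 50, 90]))    # theta* for this sample
    return np.array(stats)  # shape (B, 3)

# Hypothetical responses from N = 10 flow simulations:
resp = [6.1, 6.4, 6.6, 6.8, 7.0, 7.1, 7.3, 7.4, 7.6, 7.9]
theta = smoothed_bootstrap_quantiles(resp, n_draw=72, B=500)
```

Each row of `theta` is one bootstrap realization of (P10, P50, P90); the spread of these rows across the B samples is what yields confidence intervals on the quantiles.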
The workflow illustrated in Figure 11 gives more details than the general
workflow in Figure 3, by including the estimation of F̂n by kernel smoothing and the
generation of new samples by Metropolis sampling.
[Figure 11 depicts this as a flowchart: the proxy responses y1, …, yNR feed
DKM/ranking selection of N realizations; the transfer function is evaluated on the
N selected realizations; kernel smoothing on these N responses yields F̂n; Metropolis
sampling generates B new samples; and DKM/ranking selection followed by P10, P50
and P90 evaluation is applied to each sample.]
Figure 11: Workflow for confidence interval calculation.
The next section shows an application of the workflow illustrated above in Figure
11. The workflow is performed using 3 different methods for selecting the
realizations: DKM, ranking and random selection. Our objective is to see how each
method behaves as the number of transfer function evaluations increases and how the
methods compare to each other.
First, we compare the 3 methods by looking at the histograms of the bootstrap
error of the estimated quantiles for each method (Figure 12). The bootstrap error is
computed using Eq. 1 above. The blue, red and green bars represent the error
obtained for DKM, ranking and random selection respectively.
[Figure 12 shows six histogram panels, one for each of N = 3, 5, 8, 10, 15 and 20,
plotting frequency against bootstrap error (response value), with bars for DKM,
ranking and random selection.]
Figure 12: Histograms of the bootstrap error of the estimated quantiles for
different numbers of function evaluations and the 3 selection methods.
We observe that, in each case, the DKM method performs better than the ranking
technique. For all values of N, the errors are globally smaller for the DKM than for
ranking or random selection. In addition, the error variance is reduced with more
transfer function evaluations.
Figure 13 represents the bootstrap percentile intervals (α = 10%) of the bootstrap
error of the estimated quantiles. The symbol in each interval represents the P50 value
of the error.
[Figure 13 plots the error on quantile estimation against the number of function
evaluations for DKM, ranking and random selection.]
Figure 13: Confidence intervals (α = 10%) of the bootstrap error of the estimated
quantiles as a function of the number of function evaluations.
We observe in Figure 13 that the error tends to decrease as the number of
function evaluations increases. As observed before on the histograms, DKM performs
better than ranking, which performs better than random selection. This conclusion
was also reached in Scheidt and Caers (2009). In this example, we observe that for
the DKM, the results stabilize for N > 8. We can therefore conclude that 8 or 10 flow
simulations are necessary for the DKM selected models to have the same uncertainty
as the total set of 72. In a previous paper (Scheidt and Caers, 2009), it was concluded
that 7 simulations were satisfactory. Note however that the distance in that work was
slightly more correlated to the difference in response compared to the correlation in
this study.
Table 1 below reports the mean of the bootstrap error, computed from the
histograms presented in Figure 12.
DKM Ranking Random select.
N = 3 0.0356 0.0495 0.0561
N = 5 0.0333 0.0348 0.0407
N = 8 0.0280 0.0293 0.0324
N = 10 0.0253 0.0270 0.0322
N = 15 0.0250 0.0325 0.0367
N = 20 0.0270 0.0316 0.0340
Table 1: Mean of the dimensionless bootstrap error for each selection method.
This table, as well as the histograms and confidence intervals, can be very useful
to give an indication of the error resulting from the quantile estimation of the
response, based on the N selected realizations. For example, suppose we are limited
in time and can only perform 5 transfer function evaluations, but we want to
be confident in the uncertainty quantification results derived from
those 5 simulations. From Table 1, we can see that the mean error for N = 5 is
0.0333 for DKM and 0.0348 for ranking. If we had a little more time and had
performed N = 8 simulations, the errors would be 0.0280 and 0.0293, an
improvement of 16% for DKM (15.8% for ranking) compared to the results for N = 5.
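As a quick check, the improvement figures quoted above follow directly from the Table 1 entries (a small Python sketch; `improvement` is our own helper name, not from the paper):

```python
# Mean bootstrap errors from Table 1, keyed by N.
dkm = {3: 0.0356, 5: 0.0333, 8: 0.0280, 10: 0.0253, 15: 0.0250, 20: 0.0270}
ranking = {3: 0.0495, 5: 0.0348, 8: 0.0293, 10: 0.0270, 15: 0.0325, 20: 0.0316}

def improvement(errors, n_from, n_to):
    """Relative error reduction when moving from n_from to n_to simulations."""
    return (errors[n_from] - errors[n_to]) / errors[n_from]

# Going from N = 5 to N = 8 reduces the mean error by about 16% for DKM
# and about 15.8% for ranking, matching the figures in the text.
```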
Another way of looking at the results is to show the confidence intervals for each
quantile individually. This is illustrated in Figure 14.
[Figure 14 shows three panels giving, for P10, P50 and P90 (in BBL ×10^4), the
confidence intervals as a function of the number of function evaluations for DKM,
ranking and random selection.]
Figure 14: Confidence intervals of the bootstrap estimates of the quantiles P10,
P50 and P90 (BBL) as a function of the number of function evaluations. The line
represents the quantiles derived from the entire set of realizations.
Figure 14 shows that DKM and ranking produce very accurate estimates of the
P50 quantile of the target response, even for small numbers of transfer function
evaluations (medians are easier to estimate than extremes). In addition, the P10
quantiles tend to be slightly underestimated, but DKM is closer to the reference
value than the other techniques. The same conclusions hold for the P90, except
that we observe an overestimation of the quantiles. The underestimation of P10 and
overestimation of P90 are most likely due to the use of kernel smoothing to estimate
the density, which increases the variability of the response compared to the
original 72 realizations.
As mentioned in the beginning of the paper, the proxy measure should be
correlated with the target response for DKM and ranking to be effective. However,
the correlation coefficient between the two responses is not known a priori, since the
target response is unknown for all realizations. Once a selection method is applied
and the transfer function is evaluated on the selected realizations, the correlation
coefficient can be estimated. The quality of the estimated correlation coefficient can
be studied in exactly the same way as the estimated quantiles, by parametric
bootstrap. Figure 15 represents the confidence intervals obtained for different values
of N, the correlations being estimated on the same samples used to estimate the
quantile error. The symbols show the initial estimates ρ̂ of the correlation coefficient.
[Figure 15 plots the estimated correlation coefficient against the number of
function evaluations.]
Figure 15: Bootstrap estimated correlation coefficient on the WCA test case.
Figure 15 shows that the first estimates ρ̂ of the correlation coefficient between
the 2 responses are accurate, with a slight overestimation for small numbers of
transfer function evaluations (N = 3 and 5). However, the bootstrap estimated
correlation coefficients are largely underestimated. We believe that this is due to the
kernel smoothing technique, which tends to smooth the density of the bivariate data
and therefore allows Metropolis sampling to draw points in a "wider" area than it
should. This was not the case for the bi-normal example in Section 4.1. However,
we can still derive conclusions from the confidence intervals provided. We observe
that DKM tends to have less uncertainty in the correlation coefficient than ranking,
except for N = 8.
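A bootstrap check of the correlation coefficient along these lines can be sketched as follows. This simplified version resamples (target, proxy) pairs directly rather than going through the smoothed density and Metropolis sampling, and the paired values are invented for illustration:

```python
import numpy as np

def bootstrap_correlation(target, proxy, B=1000, seed=0):
    """Nonparametric bootstrap sketch of the correlation coefficient rho
    between target and proxy responses on the N evaluated realizations.
    (A stand-in for the paper's parametric bootstrap through F_hat_n.)"""
    rng = np.random.default_rng(seed)
    x = np.asarray(target, dtype=float)
    y = np.asarray(proxy, dtype=float)
    n = len(x)
    rhos = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # resample pairs with replacement
        rhos.append(np.corrcoef(x[idx], y[idx])[0, 1])
    rhos = np.array(rhos)
    return rhos.mean(), np.percentile(rhos, [5, 95])  # estimate + 90% interval

# Invented responses for N = 10 realizations (proxy correlated with target):
x = [6.1, 6.4, 6.6, 6.8, 7.0, 7.1, 7.3, 7.4, 7.6, 7.9]
y = [5.9, 6.5, 6.5, 6.9, 6.9, 7.2, 7.2, 7.5, 7.7, 7.8]
rho_hat, (lo, hi) = bootstrap_correlation(x, y)
```

A high `rho_hat` with a narrow (lo, hi) interval is the situation in which the distance or ranking measure can be trusted.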
5. Discussion on distances
The above examples demonstrate that using the same measure for ranking and
distance provides for similar accuracy in uncertainty quantification for the Gaussian
case. We should emphasize however that the bootstrap method applied in the context
of the paper is clearly unfavorable to DKM. In order to compare ranking and the
DKM, we calculated the distance between 2 realizations as the difference of the
ranking measure between the realizations. This leads to a representation of
uncertainty in a 1D MDS-space, and therefore the use of kernel methods does not
have the same impact as in a higher-dimensional MDS-space. The distance in this
study is very simple, whereas in many applications the distance can be much more
complex and can take into account many measures of influential factors on the
response. For example, a distance can be a function of many parameters, such as the
cumulative oil production at different times and the water-cut of a group of wells
(Scheidt and Caers, 2009). Using traditional ranking techniques may require multiple independent studies
if one is interested in uncertainty in several responses. In the case of DKM, a single
study is enough if the distance is well chosen.
6. Conclusions
We have established a new workflow to construct confidence intervals on
quantile estimations in model selection techniques. We would like to state explicitly
that we do not treat the question of whether or not the uncertainty model, i.e. the
possibly large set of reservoir models that can be generated by varying several input
parameters, is realistic. Uncertainty quantification by itself is inherently subjective
and any confidence estimates of the uncertainty model itself are therefore useless. In
this paper we assume there is a larger set of model realizations and assume that this
set provides a realistic representation of uncertainty. Then, the proposed bootstrap
allows quantifying error on uncertainty intervals or quantiles when only a few models
from the larger set are selected.
The workflow uses model selection methods – in this work DKM or ranking -
and employs a parametric bootstrap procedure to construct confidence intervals on
the quantiles retained by the model selection techniques. Examples show that DKM
provides more robust results compared to ranking, especially for small number of
transfer function evaluations. The study of the uncertainty resulting from model
selection can be very informative - it shows if we can be confident or not in the
estimated statistics. The confidence interval is a function of the estimated variance of
the response and the estimated correlation coefficient between the proxy measure and
the response. Since the user does not know the correlation coefficient a priori, we
propose performing a bootstrap procedure between the response and its proxy to
estimate the quality of the distance. If the estimated correlation coefficient is high
and its associated uncertainty low, then we can be confident in the uncertainty
quantification results. If after N transfer function evaluations the uncertainty is large
and a poor correlation is found, then the results should be improved by either using a
better proxy response or doing more transfer function evaluations.
Nomenclature
NR = number of initial realizations
N = number of selected realizations for transfer function evaluation
X = [X1,…, XNR]
Xi = [xi, yi]
xi = target response value for realization i
yi = proxy response value for realization i
dij = distance between realizations i and j
ρ = correlation coefficient between the target and proxy responses
B = number of samples generated in the bootstrap procedure
e*_b = bootstrap error of the estimated quantiles for sample b
x̂*_P10, x̂*_P50, x̂*_P90 = estimated P10, P50 and P90 after the first selection method
x̂̂*_P10, x̂̂*_P50, x̂̂*_P90 = bootstrap estimated quantiles for the second selection method
References
Ballin, P.R., Journel, A.G., and Aziz, K. 1992. Prediction of Uncertainty in
Reservoir Performance Forecast. JCPT, no. 4.
Batycky, R.P., Blunt, M.J. and Thiele, M.R. 1997. A 3D Field-Scale
Streamline-Based Reservoir Simulator. SPERE 12(4): 246-254.
Borg, I., and Groenen, P. 1997. Modern Multidimensional Scaling: Theory and
Applications. New York: Springer.
Bowman, A.W., and Azzalini, A. 1997. Applied Smoothing Techniques for Data
Analysis. Oxford University Press.
Efron, B. 1979. Bootstrap methods: Another look at the jackknife. The Annals
of Statistics 7(1): 1-26.
Hastings, W. K. 1970. Monte Carlo sampling methods using Markov Chains and
their applications. Biometrika 57: 97–109.
McLennan, J.A., and Deutsch, C.V. 2005. Ranking Geostatistical Realizations by
Measures of Connectivity, Paper SPE/PS-CIM/CHOA 98168 presented at the SPE
International Thermal Operations and Heavy Oil Symposium, Calgary, Alberta,
Canada, 1-3 November.
Metropolis, N., and S. Ulam. 1949. The Monte Carlo method. J. Amer. Statist.
Assoc. 44: 335–341.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E.
1953. Equations of state calculations by fast computing machines. Journal of
Chemical Physics 21: 1087-1091.
Saad, N., Maroongroge, V. and Kalkomey C. T. 1996. Ranking Geostatistical
Models Using Tracer Production Data, Paper presented at the European 3-D
Reservoir Modeling Conference, Stavanger, Norway, 16-17 April.
Scheidt, C., and Caers, J. 2008. Representing Spatial Uncertainty Using Distances
and Kernels. Mathematical Geosciences, DOI:10.1007/s11004-008-9186-0.
Scheidt, C., and Caers, J. 2009, A new method for uncertainty quantification
using distances and kernel methods. Application to a deepwater turbidite reservoir.
Accepted in SPEJ. To be published.
Schoelkopf B., Smola A. (2002) Learning with kernels, MIT Press, Cambridge,
664p.
Strebelle, S. 2002. Conditional Simulation of Complex Geological Structures
using Multiple-point Statistics, Mathematical Geology, 34(1): 1-22.