
Bootstrap confidence intervals for reservoir model selection techniques

Céline Scheidt and Jef Caers
Department of Energy Resources Engineering
Stanford University

Abstract

Stochastic spatial simulation allows rapid generation of multiple, alternative realizations of spatial variables. Quantifying uncertainty on the response resulting from those multiple realizations would require the evaluation of a transfer function on every realization. This is not possible in real applications, where one transfer function evaluation may be very time consuming (several hours to several days). One must therefore select a few representative realizations for transfer function evaluation and then derive the production statistics of interest (typically the P10, P50 and P90 quantiles of the response). By selecting only a few realizations, one risks biasing the P10, P50 and P90 estimates compared to those of the original multiple realizations.

The principal objective of this study is to develop a methodology to quantify confidence intervals for the estimated P10, P50 and P90 quantiles when only a few models are retained for response evaluation. Our approach is to use the parametric bootstrap technique, which allows us to evaluate the variability of the statistics obtained from uncertainty quantification and to construct confidence intervals. A second objective is to compare the confidence intervals produced by two selection methods available to quantify uncertainty given a set of geostatistical realizations: the traditional ranking technique and the distance-based kernel clustering technique (DKM). The DKM has been developed recently and has been shown to be effective in quantifying uncertainty.

The methodology is demonstrated using two examples. The first is a synthetic example, which uses bi-normal variables and serves to demonstrate the technique. The second is from an oil field in West Africa where the uncertain variable is the cumulative oil production from 20 wells. The results show that for the same number of transfer function evaluations, the DKM has equal or smaller error and confidence intervals compared to ranking.

1. Introduction

Uncertainty quantification of subsurface spatial phenomena is done in the context of decision making, often by estimating low, median and high quantile values (typically P10, P50 and P90) of the response of interest. Often, an exhaustive sampling of all uncertain parameters is unfeasible, and only a small subset of reservoir model realizations of the phenomena can be created. Due to high computational requirements, the transfer function must be evaluated on an even smaller subset of realizations. Therefore, any quantiles estimated from this subset are themselves subject to uncertainty, and may vary depending on the selection method, the number of transfer function evaluations, the initial set of realizations, the use of a proxy response, etc.

The objective of the study is to quantify confidence intervals for the estimated P10, P50 and P90 quantiles when only a few models are retained for response evaluation. The magnitude of the confidence intervals can then be used to decide whether or not more flow simulations are required to establish a better quantification of response uncertainty. The methodology developed uses the parametric bootstrap technique, a statistical method for constructing confidence intervals of the estimated statistics. Such confidence intervals provide an idea of the variability of the statistics inferred by selecting only a few models for evaluation.

The workflow can be applied using any technique of reservoir model selection. In this paper, we compare the behavior of the estimated quantiles using three different selection techniques. The first method is the traditional ranking technique (Ballin et al., 1992), which selects realizations according to a ranking measure. The second method has been developed recently and is called the distance-based kernel method (DKM; Scheidt and Caers, 2008). Finally, we use a random selection for comparison. It should be noted that the proposed bootstrap technique applies to any model selection methodology.

The paper is organized as follows. In the next section, we describe the two methods employed to quantify uncertainty in spatial parameters. Then, we give a brief overview of the basic ideas of the bootstrap methodology in the context of parametric inference, illustrated by a typical example. We then describe our workflow, which is applied to cases where we have a proxy response that can be evaluated rapidly for each realization and a true response that cannot be evaluated for every realization. The subsequent section is devoted to the application of the specific workflow to two examples, the first a synthetic example and the second an example from an oil field in West Africa. Finally, we discuss the results obtained as well as some concluding remarks.

2. Quantification of uncertainty – methodologies

Uncertainty quantification of a spatial phenomenon aims at characterizing the statistics (P10, P50 and P90) of the response(s) of interest. In real applications where one transfer function evaluation can be very time consuming, it may not be possible to perform a transfer function evaluation on every realization of the reservoir. This difficulty can be overcome by selecting a representative set of realizations from the initial set. In this paper, we consider two different ways of selecting realizations for transfer function evaluation. The first method is the traditional ranking technique, which was introduced by Ballin et al. in 1992. The second method, denoted the Distance-Kernel Method (DKM), is more recent: it was first presented in Scheidt and Caers (2008) and applied to a real case in Scheidt and Caers (2009).

2.1. Traditional Ranking

The traditional ranking technique was introduced by Ballin et al. (1992) in the context of stochastic reservoir modeling. The basic idea behind ranking is to define a rapidly calculable ranking measure which can be evaluated for each realization. Most of the time, the ranking measure is static (e.g. original oil-in-place); however, more recent studies employ more complex measures, such as connectivity (McLennan and Deutsch, 2005), streamline-based (Gilman et al., 2002) or tracer-based measures (Ballin et al., 1992; Saad et al., 1996). The ranking measure acts as a proxy of the response of interest for each realization. To be effective, therefore, ranking requires a good correlation between the ranking measure and the response. The realizations are ranked according to the measure, and realizations corresponding typically to the P10, P50 and P90 quantiles are subsequently selected. Full flow simulation is then performed on these selected realizations, and the P10, P50 and P90 values are derived from the simulation results.

In previous work (Scheidt and Caers, 2009), we showed that selecting only 3 realizations to derive the P10, P50 and P90 quantiles can result in very inaccurate estimations. In this study, contrary to the standard ranking approach, we propose to select more than 3 realizations, and compare ranking with the Distance-Kernel Method described below. The realizations are selected equally spaced according to the ranking measure, and we derive the P10, P50 and P90 quantiles by interpolation from the distribution of the selected points.
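As a concrete illustration, the equal-spacing selection just described can be sketched in a few lines of Python. This is a hypothetical sketch with numpy; the function names `ranking_select` and `quantiles_from_selection` and the toy proxy/response are our own, not from the paper:

```python
import numpy as np

def ranking_select(proxy, n_select):
    """Pick n_select realizations equally spaced along the ranked proxy.

    'proxy' is any rapidly computable ranking measure (e.g. original
    oil-in-place); returns the indices of the selected realizations.
    """
    order = np.argsort(proxy)                                # rank by proxy
    # equally spaced positions along the ranking, end points included
    positions = np.linspace(0, len(proxy) - 1, n_select).round().astype(int)
    return order[positions]

def quantiles_from_selection(response, idx, probs=(0.10, 0.50, 0.90)):
    """Interpolate P10/P50/P90 from the responses of the selected points."""
    return np.quantile(response[idx], probs)

# toy usage: 100 realizations, proxy reasonably well correlated with response
rng = np.random.default_rng(0)
response = rng.normal(5.0, np.sqrt(2.0), 100)
proxy = response + rng.normal(0.0, 0.5, 100)
sel = ranking_select(proxy, 9)
p10, p50, p90 = quantiles_from_selection(response, sel)
```

In a real application the transfer function would only be evaluated on the selected indices; here the full response is known only because the example is synthetic.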

2.2. Distance-Kernel Method

In this section, we describe the main principle of the Distance-Kernel Method (DKM), illustrated in Figure 1. Starting from a large number of model realizations, the first step is to define a dissimilarity distance between the realizations. This distance is a measure of the dissimilarity between any two realizations, and should be tailored to the application and the response(s) of interest (just as in ranking), in order to make uncertainty quantification more efficient. The distance is evaluated between any two realizations, and a dissimilarity distance table (NR x NR) is then derived. Multi-dimensional scaling (MDS) is then applied using the distance table (Borg and Groenen, 1997). This results in a map (usually 2D or 3D) of the realizations in which the Euclidean distance between any two realizations approximates the distance table. Note that only the distance between the realizations in the new space matters; the actual position of the realizations is irrelevant. Once the realizations are in MDS space, one could classify realizations and select a subset using clustering techniques. However, the points in MDS space often do not vary linearly, and thus classical clustering methods would result in inaccurate classification. To overcome the nonlinear variation of the points, Schölkopf et al. (2002) introduced kernel methods to improve the clustering results. The main idea behind kernel methods is to introduce a highly non-linear function Φ which maps the realizations from the MDS space to a new space, called the feature space. The high dimensionality of that space makes the points behave more linearly, and thus standard classification tools, such as clustering, can be applied more successfully. In this paper, we employ kernel k-means to select realizations representative of the entire set. Transfer function evaluation is then applied to the realizations closest to the cluster centroids, and the statistics (P10, P50 and P90) are computed on this small subset of realizations.
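The MDS step in particular is easy to make concrete. The following is a minimal sketch of classical (Torgerson) MDS applied to a distance table, assuming numpy; the helper name `classical_mds` and the toy distance table are ours:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical multi-dimensional scaling of a dissimilarity table D.

    Given an NR x NR table of pairwise distances, returns NR points in
    'dim' dimensions whose Euclidean distances approximate D.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigval, eigvec = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:dim]      # keep the largest components
    scale = np.sqrt(np.maximum(eigval[idx], 0.0))
    return eigvec[:, idx] * scale             # coordinates in MDS space

# toy usage: 4 models whose dissimilarities are distances along a line;
# classical MDS reproduces these distances exactly
x = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(x[:, None] - x[None, :])
coords = classical_mds(D, dim=2)
D_rec = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

For a distance table that is not exactly Euclidean, the reconstruction is only approximate, which is why the subsequent kernel mapping and clustering operate on the MDS coordinates rather than on the raw table.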

Figure 1: DKM for uncertainty quantification: (a) distance between two models, (b) distance matrix, (c) models mapped in Euclidean space, (d) feature space, (e) pre-image construction, (f) P10, P50, P90 estimation

For more details about the methodology, we refer to Scheidt and Caers (2008).

3. Parametric Bootstrap – Methodology

3.1. General introduction to Bootstrap

Bootstrap methods fall within the broader category of resampling methods. The concept of the bootstrap was first introduced by Efron (1979). In his paper, Efron considered two types of bootstrap procedures (nonparametric and parametric inference). Bootstrap is a Monte-Carlo simulation technique that uses sampling theory to estimate the standard error and the distribution of a statistic. In many recent statistical texts, bootstrap methods are recommended for estimating sampling distributions and finding standard errors and confidence intervals. A bootstrap procedure estimates properties of an estimator (such as its variance) by measuring those properties when sampling from an approximate distribution. In the parametric bootstrap, we consider an unknown distribution F to be a member of some prescribed parametric family and obtain the empirical distribution F̂n by estimating the parameters of the family from the data. Then, a new random sequence, called a resample, is generated from the distribution F̂n.

The parametric bootstrap procedure works as follows. First, the statistics θ̂ of the distribution of the initial sample are computed (for example the mean and variance). Then, the distribution F̂n is estimated using those statistics. We assume that the distribution F̂n is the true distribution and we use Monte-Carlo simulation to generate B new samples of the initial sample from the distribution F̂n. Next, we apply the same estimation technique to these "bootstrapped" data to get a total of B bootstrap estimates of θ̂, denoted θ̂*_b, b = 1,…,B. Using these B bootstrap estimates, we can compute confidence intervals or any other statistical measure of error.

Simple illustrative example

A simple example illustrating the parametric bootstrap method is presented in Figure 2. Suppose we have NR = 15 values X = (x1,…,xNR) from a normal distribution N(µ, σ) and we are interested in the estimation of the unknown parameters µ and σ. The first step is to assume that X has a normal distribution Fn and then to estimate the mean and variance of the distribution:

    µ̂ = x̄   and   σ̂² = (1/NR) Σ_{i=1}^{NR} (x_i − x̄)²
We assume that µ̂ and σ̂ are the true parameters and we generate B = 1000 new samples X*_b (b = 1,…,B) from F̂n = N(µ̂, σ̂) using Monte-Carlo simulation, each sample containing NR = 15 values. For each sample, the bootstrap estimates of the mean and variance of the distribution can be calculated:

    µ̂*_b = x̄*_b   and   (σ̂*_b)² = (1/NR) Σ_{i=1}^{NR} (x*_{i,b} − x̄*_b)²

Having computed θ̂*_b = (µ̂*_b, (σ̂*_b)²), one can now construct a histogram of the mean and the variance to display the probability distribution of the bootstrap estimator (Figure 2). From this distribution, one can obtain an idea of the statistical properties of the estimates µ̂ and σ̂. In Figure 2, the red line represents the estimates of the mean µ̂ and variance σ̂ of the initial sample.

Figure 2: Application of the parametric bootstrap procedure to a simple example. Left: histogram of the bootstrap mean; right: histogram of the bootstrap variance (frequency on the vertical axis).

The histograms of the bootstrap estimates of the mean and the variance are informative about the variability of the statistics obtained. Confidence intervals of the estimated mean and variance (or any quantiles) can then be calculated from the B estimates of the mean and variance.
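This simple example can be reproduced in a few lines of Python. The following is a sketch assuming numpy; the seed and the "true" parameters µ = 5, σ = 2 used to create the initial sample are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(42)
NR, B = 15, 1000

# initial sample of NR = 15 values from an (unknown) normal distribution
x = rng.normal(5.0, 2.0, NR)

# step 1: estimate the parameters of the assumed normal family
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()       # (1/NR) * sum (x_i - xbar)^2

# step 2: treat N(mu_hat, sigma_hat) as the true distribution F_n-hat
#         and draw B = 1000 resamples of size NR from it
X_star = rng.normal(mu_hat, np.sqrt(sigma2_hat), size=(B, NR))

# step 3: re-estimate the statistics on every bootstrap resample
mu_star = X_star.mean(axis=1)
sigma2_star = X_star.var(axis=1)

# e.g. a 90% confidence interval of the mean from the bootstrap distribution
ci_mean = np.percentile(mu_star, [5, 95])
```

Histograms of `mu_star` and `sigma2_star` correspond to the two panels of Figure 2.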

The next section shows how to apply the bootstrap method in the context of uncertainty quantification, where a proxy value can be rapidly calculated for many realizations of a spatial phenomenon.

3.2. Workflow of the study

Contrary to the previous example, where the data are univariate, in the context of reservoir model selection techniques a proxy response is employed for the selection using DKM or ranking, and thus two variables are necessary: the response of interest and the proxy response.

Therefore, we consider a bivariate variable X = [X1, X2,…, XNR], where:

• Xi = [xi, yi], i = 1,…,NR, NR being the total number of samples/realizations
• xi represents the response of interest (e.g. cumulative oil production)
• yi represents the proxy response, which will serve as a ranking measure or be transformed to a distance.

Note that for ranking and DKM to be effective, the response and its proxy should be reasonably well correlated. In addition, for real applications, the values of the true response xi for each realization are unknown.

In traditional uncertainty quantification, the proxy response serves as a guide to select a few realizations which will be evaluated using the transfer function. The response quantiles are then deduced from the evaluation of these realizations. Since the resulting quantiles are subject to uncertainty, the bootstrap method illustrated previously is well suited to the problem: it can inform us on the accuracy of the estimated quantiles and give an idea of the error resulting from the selection of a small subset of realizations.

The workflow in the context of reservoir model selection is as follows. It is illustrated in Figure 3.

1. Evaluate a proxy response yi for each of the i = 1,…,NR realizations.
2. Apply ranking or DKM using the proxy response in order to select N samples/realizations for evaluation (note that N

intervals.

6. A single measure of the accuracy of our quantile estimation is defined by computing the dimensionless bootstrap error of the estimated quantiles for each of the B new samples created (Eq. 1):

    error*_b = (1/3) [ |x̂*_{P10,b} − x̂_{P10}| / x̂_{P10} + |x̂*_{P50,b} − x̂_{P50}| / x̂_{P50} + |x̂*_{P90,b} − x̂_{P90}| / x̂_{P90} ]    (1)

The bootstrap error of the estimated quantiles is evaluated on each sample, and thus can be represented as a histogram to visualize the variability between the samples. From the histogram, we can quantify the variation of the bootstrap error of the estimated quantiles.
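The error measure of Eq. (1) is straightforward to compute once the quantile estimates are available. A minimal sketch assuming numpy, where the relative differences are taken in absolute value so the error is non-negative; the function name and the toy quantile values are ours:

```python
import numpy as np

def bootstrap_error(q_star_b, q_hat):
    """Dimensionless bootstrap error (Eq. 1) for one bootstrap sample b.

    q_star_b: bootstrap estimates of (P10, P50, P90) for sample b
    q_hat:    quantile estimates from the initial sample
    Averages the absolute relative difference over the three quantiles.
    """
    q_star_b = np.asarray(q_star_b, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    return np.mean(np.abs(q_star_b - q_hat) / q_hat)

# toy usage with made-up quantile values
err = bootstrap_error([3.1, 5.0, 7.4], [3.0, 5.0, 7.0])
```

Applying this function to each of the B bootstrap samples yields the histogram of errors discussed above.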

Figure 3: Workflow of the bootstrap method applied to uncertainty quantification

The workflow described previously and illustrated in Figure 3 is performed for several values of N, where N is the number of realizations selected for evaluation. This is done to evaluate the influence of the number of transfer function evaluations on the accuracy of the quantile estimation. For each value of N, the selected realizations are obtained using the DKM or ranking methods, and therefore the realizations are different for each value of N.

Now that the basic idea and theory of the bootstrap method have been presented, the next section shows some application examples of this technique in the context of uncertainty quantification.

4. Application of the methodology to uncertainty quantification

Two examples are presented in this section. The first one is illustrative and uses a bivariate Gaussian distribution. The second example is more complex: it is based on a real oil field reservoir in West Africa (the West Coast African reservoir) and uses real production data.

In the case of the DKM, the definition of a distance between any two realizations is required. In this study, in order to compare the results of the DKM with those obtained by ranking using the exact same information, we simply use the difference of the ranking measure (proxy response) as the distance between realizations. Note, however, that as opposed to the ranking measure, the distance can be calculated using a combination of many different measures, and thus has more flexibility to be tailored to the application. We will discuss the consequences of this in more detail below.

4.1. Bivariate Gaussian distribution

In the first example, we consider a bivariate Gaussian distribution: X ~ biN(µ, Σ), where µ represents the mean and Σ the covariance matrix. In this example, the mean of the sample is taken as µ = [5, 5], and the covariance is taken as:

    Σ = [ 2    2ρ
          2ρ   2 ]

The parameter ρ defines the correlation coefficient between the target response and the proxy response.

To set up an example, an initial sample X of NR = 100 values is generated using the distribution given above. Note that for this illustrative example, we use the term sample instead of realization, since no geostatistical realization is associated with each bivariate value. Figure 4 shows an example of the probability density plot of the bi-normal sample X, where the correlation coefficient between the target and proxy responses was defined as ρ = 0.9.


    Figure 4: Probability density of X, which has a bi-normal distribution

Now that the initial data is defined, we assume that we only know the type of distribution of X (bi-normal), but that we do not know the parameters defining the distribution (the mean µ and the covariance Σ). The bootstrap procedure illustrated in Figure 3 is applied by taking the sample X generated previously (Figure 4) and using the DKM to select N = 15 points. Estimates of the mean µ̂ and the covariance Σ̂ are then obtained using the response at the 15 selected points, and the estimated bivariate distribution of the data is assumed to be the true distribution: F̂n = biN(µ̂, Σ̂). B = 1000 new samples of this distribution can then be easily derived, since the distribution is known. Uncertainty quantification is then performed on those B samples, and an estimation of the variability of the quantiles is possible. Examples of the bootstrap histograms of the P10, P50 and P90 quantiles are presented in Figure 5.
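A simplified sketch of this experiment follows, assuming numpy. We stand in for the DKM with an equal-spacing selection along the proxy, since the full kernel k-means machinery is beyond a short example, and the seed is an arbitrary choice of ours:

```python
import numpy as np

rng = np.random.default_rng(7)
NR, N, B, rho = 100, 15, 1000, 0.9
mu = np.array([5.0, 5.0])
cov = np.array([[2.0, 2.0 * rho],
                [2.0 * rho, 2.0]])

# initial bivariate sample: column 0 = response, column 1 = proxy
X = rng.multivariate_normal(mu, cov, NR)

# stand-in for DKM/ranking: pick N points equally spaced along the proxy
order = np.argsort(X[:, 1])
sel = order[np.linspace(0, NR - 1, N).round().astype(int)]
pts = X[sel]

# fit the assumed bi-normal family on the N selected points
mu_hat = pts.mean(axis=0)
cov_hat = np.cov(pts.T)

# bootstrap: resample full-size sets from the fitted distribution and
# record the P10/P50/P90 of the response each time
q_star = np.empty((B, 3))
for b in range(B):
    xb = rng.multivariate_normal(mu_hat, cov_hat, NR)
    q_star[b] = np.quantile(xb[:, 0], [0.10, 0.50, 0.90])

# e.g. 90% confidence interval of the P50 estimate
ci_p50 = np.percentile(q_star[:, 1], [5, 95])
```

Histograms of the three columns of `q_star` correspond to the three panels of Figure 5.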

Figure 5: Histograms of the P10, P50 and P90 quantiles estimated by bootstrap (x̂*_{P10,b}, x̂*_{P50,b}, x̂*_{P90,b}). The red line represents the estimated quantiles x̂_{P10}, x̂_{P50}, x̂_{P90}. The estimates are calculated using the DKM to select 15 points.

We observe in Figure 5 that the distribution of the bootstrap quantiles is Gaussian. In addition, there is a small bias in the estimation of the P10 and P90 quantiles for this example. Although this is not shown, ranking has the same effect. The result is that, on average, x̂*_{P10} is overestimated and x̂*_{P90} is underestimated. These biased estimates should not affect the determination of the confidence intervals.

In our study, we found that the estimated mean µ̂ and covariance Σ̂ from the initial sample had an impact on the confidence intervals. Since our goal in this first example is to understand the general behavior when varying the number of selected samples N, we propose to do a Monte-Carlo bootstrap, which means that we randomize the initial sample, use C sets of initial samples, and then perform the workflow illustrated in Figure 3 on those C sets. The estimated statistics of each initial sample are averaged to obtain the final statistics. In this study, we take C = 15. In the following examples, the workflow illustrated in Figure 3 has been performed varying the number of selected samples (more precisely, N = 5, 8, 10, 15 and 20), in order to examine the effect of the number of transfer function evaluations on the bootstrap error. In addition, several correlation values between the proxy response and the target response were used to explore the influence of the correlation coefficient on the confidence intervals. Results are presented in Figure 6 for ρ = 1, 0.9, 0.8, 0.7, 0.6 and 0.5, respectively. Figure 6 shows the confidence intervals of the error of the bootstrap estimated quantiles for the DKM (blue squares) and ranking (red dots) for different values of N. The number of bootstrap samples generated is B = 1000. The symbols represent the P50 value of the error of the estimated quantiles; in other words, half of the estimated quantiles have an error below this value and half above.

Figure 6: Confidence intervals (α = 10) of the bootstrap error of the estimated quantiles as a function of the number of function evaluations, for ρ = 1, 0.9, 0.8, 0.7, 0.6 and 0.5. The symbols represent the P50 value of the bootstrap error.

We observe in Figure 6 that the error globally decreases as the number of transfer function evaluations increases. Also, the confidence intervals tend to narrow as the number of transfer function evaluations increases, meaning that the error in our estimates decreases. Both methods, DKM and ranking, provide similar results. However, the error obtained by the DKM is slightly smaller than that observed for ranking. The same remark is valid for the confidence intervals. Finally, the results provided by the DKM vary more smoothly than those obtained by the ranking technique.

Note that each method selects an optimal set of N samples for evaluation. Therefore, the N = 8 models selected do not necessarily include the N = 5 models. This is true for all N.

The bootstrap method can also be used to compute an estimate of the correlation coefficient between the actual response and the proxy response. Figure 7 presents the confidence intervals for the correlation corresponding to the results obtained in Figure 6.
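The same machinery yields the bootstrap distribution of the correlation coefficient. A sketch assuming numpy; the helper name `bootstrap_correlation` and the parameter values are ours:

```python
import numpy as np

def bootstrap_correlation(mu_hat, cov_hat, n, B=1000, seed=0):
    """Bootstrap distribution of the response/proxy correlation.

    Draws B resamples of size n from the fitted bi-normal distribution
    and returns the correlation coefficient estimated on each resample.
    """
    rng = np.random.default_rng(seed)
    r = np.empty(B)
    for b in range(B):
        xb = rng.multivariate_normal(mu_hat, cov_hat, n)
        r[b] = np.corrcoef(xb[:, 0], xb[:, 1])[0, 1]
    return r

# toy usage with the bivariate setup of section 4.1 (rho = 0.9, N = 15)
rho = 0.9
cov = np.array([[2.0, 2.0 * rho],
                [2.0 * rho, 2.0]])
r_star = bootstrap_correlation(np.array([5.0, 5.0]), cov, n=15)
```

Confidence intervals for the correlation, as in Figure 7, follow from the percentiles of `r_star`.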

Figure 7: Bootstrap estimates (α = 10) of the correlation between the response and the proxy. The black line represents the input correlation (ρ = 0.9, 0.8, 0.7, 0.6 and 0.5) used to generate the first sample.

We observe in Figure 7 that the correlation coefficient tends to be overestimated, especially for small values of N. Figure 7 also shows that the correlation estimate becomes more accurate and less prone to error as the number of transfer function evaluations increases.

The next section illustrates the workflow using a real oil reservoir, located in West Africa.

    4.2. West Coast African reservoir

    • Reservoir Description

The West Coast African (WCA) reservoir is a deepwater turbidite reservoir located in a slope valley, offshore in 1600 feet of water and 4600 feet below sea level. Four depositional facies were interpreted

    from the well logs: shale (Facies 1), poor quality sand #1 (Facies 2), poor quality sand

    #2 (Facies 3) and good quality channels (Facies 4). The description of the facies

    filling the slope valley is subject to uncertainty. 12 TIs are used in this case study,

    representing uncertainty on the facies representations.

    The reservoir is produced with 28 wells, of which 20 are production wells and 8

are water injection wells. The locations of the wells are displayed in Figure 8. Wells colored in red are producers and wells colored in blue are injectors.

    Figure 8: Location of the 28 wells. Red are production wells and blue are

    injection wells. Different colors in grid represent different fluid regions

    72 geostatistical realizations were created using the multi-point geostatistical

    algorithm snesim (Strebelle, 2002). To include spatial uncertainty, two realizations

    were generated for each combination of TI and facies probability cube, leading to a

    total of 72 possible realizations of the WCA reservoir. Each flow simulation took 4.5

    hours.


In a previous paper (Scheidt and Caers, 2009), uncertainty quantification on the WCA reservoir was performed using only a small number of flow simulations. It was shown that the statistics obtained by flow simulation on 7 realizations selected by the DKM are very similar to those obtained by simulation on the entire set of 72 realizations. A comparison with the traditional ranking method showed that the DKM easily outperforms the ranking technique without requiring any additional information. However, in reality, one would not have access to the results of those 72 flow simulations; hence one would not know how accurate the P10, P50 and P90 results from those 7 flow simulations are with respect to the entire set of 72 flow simulations.

    In this study, the response of interest is the cumulative oil production at 1200

    days. We have evaluated the response for each of the 72 realizations, as a reference.

    For the proxy response, we evaluated the cumulative oil production using streamline

    simulation (Batycky et al., 1997). The correlation coefficient between the response

and the proxy is ρ = 0.92. In order to perform the parametric bootstrap procedure, we must estimate the joint distribution of the cumulative oil production and its ranking proxy, and be able to generate new samples from the estimated bivariate density.

    Because we do not know a priori the distribution of the cumulative oil production and

    its proxy (contrary to the previous example), we propose to compute the bivariate

    densities using a kernel smoothing technique (Bowman and Azzalini, 1997).

    • Generation of a sample for a kernel smoothing density

    Kernel smoothing (Bowman and Azzalini, 1997) is a spatial method that

    generates a map of density values. The density at each location reflects the

    concentration of points in the surrounding area. Kernel smoothing does not require

    making any parametric assumption about the probability density function (pdf). The

kernel smoothing density of a variable X = [x_1, …, x_{N_R}] is defined as follows:

$$\hat{f}(x,h) = \frac{1}{N_R} \sum_{i=1}^{N_R} \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right), \qquad x, x_i \in \mathbb{R}^p$$

    with K the kernel function and h the bandwidth.

    In the case of a Gaussian rbf kernel, the kernel smoothing density is defined as:

$$\hat{f}(x,h) = \frac{1}{N_R \, (2\pi)^{1/2} \, h} \sum_{i=1}^{N_R} \exp\!\left( -\frac{1}{2} \, \frac{(x - x_i)^2}{h^2} \right)$$

    Choosing the bandwidth for the kernel smoothing can be a difficult task, and is

    generally a compromise between acceptable smoothness of the curve and fidelity to

    the data. The choice of h has an impact on the overall appearance of the resulting

    smooth curve, much more so than the choice of the kernel function which is generally

held to be of secondary importance. In this work, we use a bandwidth which is a function of the number of points in X.
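As an illustration of the estimator above, a minimal sketch of a one-dimensional Gaussian kernel smoothing density in Python; the Silverman rule-of-thumb bandwidth used here is an assumption, since the paper only states that h depends on the number of points in X:

```python
import numpy as np

def gaussian_kde_1d(x_eval, data, h=None):
    """Gaussian kernel smoothing density evaluated at x_eval,
    estimated from the sample `data` with bandwidth h."""
    data = np.asarray(data, dtype=float)
    n = data.size
    if h is None:
        # Rule-of-thumb bandwidth (an assumption; any rule that
        # depends on the sample size would fit the paper's description).
        h = 1.06 * data.std(ddof=1) * n ** (-1 / 5)
    # f_hat(x) = 1 / (n * sqrt(2*pi) * h) * sum_i exp(-0.5 * ((x - x_i) / h)**2)
    u = (np.asarray(x_eval, dtype=float)[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * np.sqrt(2 * np.pi) * h)
```

The same construction extends to the bivariate case used for the (response, proxy) pairs by taking a product of one-dimensional kernels.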


    For example, Figure 9 shows the density distribution of the 72 data from the

    WCA example, estimated by kernel smoothing using a Gaussian kernel.

    Figure 9: Probability density of X for WCA

Once the density of the bivariate variable has been defined, new samples from the same distribution can be generated using the Metropolis sampling technique.

    • Overview of the Metropolis sampling algorithm

The Metropolis–Hastings technique is a Markov chain-based method for generating a random variable with a particular distribution (Metropolis and Ulam 1949, Metropolis et al. 1953). The Metropolis algorithm generates a sequence of samples from a distribution f as follows:

    1. Start with some initial value x0

    2. Given this initial value, draw a candidate value x* from some

    proposal distribution (we choose a uniform distribution)

3. Compute the ratio α = f(x*) / f(x_{t−1}) of the density at the candidate point x* to the density at the current point x_{t−1}, and accept the candidate point with probability α (the candidate is always accepted when α ≥ 1)

4. Return to step 2 until the desired number of samples is obtained.

5. The resulting sample (x_1, …, x_t) follows the distribution f.
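The steps above can be sketched as a minimal one-dimensional Metropolis sampler; the uniform random-walk step size `step` is an assumed tuning parameter not specified in the paper:

```python
import numpy as np

def metropolis_sample(f, x0, n_samples, step=1.0, seed=0):
    """Generate n_samples points whose distribution follows the
    (possibly unnormalized) density f, starting from x0."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    samples = []
    for _ in range(n_samples):
        # Step 2: draw a candidate from a uniform proposal around x.
        x_star = x + rng.uniform(-step, step)
        # Step 3: acceptance ratio alpha = f(x*) / f(x_{t-1});
        # accept with probability alpha (always accept when alpha >= 1).
        alpha = f(x_star) / f(x)
        if rng.uniform() < alpha:
            x = x_star
        samples.append(x)
    return np.array(samples)
```

Because only the ratio f(x*)/f(x_{t−1}) is needed, the kernel smoothing density does not have to be normalized before sampling.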


    An illustration of a sample generated by Metropolis sampling associated with the

density provided by kernel smoothing is presented in Figure 10. The contours represent the probability density, calculated using N = 10 response values selected by DKM. The red points show 70 values drawn from this density by Metropolis

    sampling.

    Figure 10: Generation of a new sample using Metropolis sampling. The contours

represent the probability density obtained by kernel smoothing and red dots represent the new sample generated by Metropolis sampling

    • Application of the bootstrap technique to the WCA case

    In the context of uncertainty quantification in cumulative oil production, the

    initial data are the flow simulations at the NR = 72 realizations of the WCA reservoir:

− x_1, …, x_{N_R}: cumulative oil production obtained by full flow simulation (target response)

− y_1, …, y_{N_R}: cumulative oil production obtained by fast flow simulation (proxy response)

The distance employed for the DKM is computed as the absolute value of the difference of proxy response between any two realizations: d_ij = |y_i − y_j|.
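This distance is straightforward to assemble into an N_R × N_R matrix (a sketch):

```python
import numpy as np

def proxy_distance_matrix(y):
    """d_ij = |y_i - y_j|: pairwise distances between realizations
    based on the proxy response alone."""
    y = np.asarray(y, dtype=float)
    return np.abs(y[:, None] - y[None, :])
```

The resulting matrix is symmetric with a zero diagonal, as required of a distance.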

The bootstrap procedure, illustrated in Figure 3, is performed for different numbers N of transfer function evaluations: in this case N = 3, 5, 8, 10, 15 and 20. For each value of N, the procedure to generate B bootstrap samples is as follows:

1. Select N realizations using the proxy response as the ranking measure or the distance measure d, according to the method used

2. Evaluate the response using the transfer function (flow simulation) on the N selected realizations


3. Compute the bivariate density F̂_n of the target response using kernel smoothing on the N responses resulting from the selected realizations

4. Use Metropolis sampling to generate B samples of the bivariate distribution F̂_n

5. For each of the B samples generated, apply ranking or DKM to select N realizations and compute the statistics of interest: $\hat{\hat{\theta}}^* = (\hat{\hat{x}}_{P10}^*, \hat{\hat{x}}_{P50}^*, \hat{\hat{x}}_{P90}^*)$.
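Under simplifying assumptions, the loop above might be sketched as follows. The selection step is reduced to picking N realizations evenly spaced in the proxy ranking (a stand-in for the paper's ranking/DKM selection), and the kernel smoothing plus Metropolis step is replaced by resampling the evaluated (x, y) pairs; both simplifications are ours, not the paper's:

```python
import numpy as np

def select_by_ranking(x, y, n):
    """Pick n realizations evenly spaced in the proxy ranking
    (a stand-in for the paper's ranking/DKM selection)."""
    order = np.argsort(y)
    idx = order[np.linspace(0, len(y) - 1, n).round().astype(int)]
    return x[idx], y[idx]

def bootstrap_quantile_errors(x, y, n_select, n_boot=200, seed=0):
    """Bootstrap-style estimate of the error on (P10, P50, P90)
    when only n_select responses are evaluated."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: select realizations and "evaluate" their responses.
    x_sel, y_sel = select_by_ranking(np.asarray(x), np.asarray(y), n_select)
    theta_hat = np.percentile(x_sel, [10, 50, 90])
    errors = []
    for _ in range(n_boot):
        # Steps 3-4 (simplified): draw a bootstrap sample of NR pairs.
        idx = rng.integers(0, n_select, size=len(x))
        xb, yb = x_sel[idx], y_sel[idx]
        # Step 5: re-apply the selection and recompute the quantiles.
        xb_sel, _ = select_by_ranking(xb, yb, n_select)
        theta_b = np.percentile(xb_sel, [10, 50, 90])
        errors.append(np.abs(theta_b - theta_hat).mean())
    return theta_hat, np.array(errors)
```

The spread of the returned bootstrap errors is what the confidence intervals in the following figures summarize.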

The workflow illustrated in Figure 11 gives more details than the general workflow in Figure 3, by including the estimation of F̂_n by kernel smoothing and the generation of new samples by Metropolis sampling.

[Figure: workflow diagram. Proxy measure y_1, …, y_{N_R} → KKM/ranking to select N realizations → response evaluation on the N selected realizations → kernel smoothing on the N selected realizations → F̂_n → Metropolis sampling to generate a new sample (x_1^(b), …, x_{N_R}^(b); y_1^(b), …, y_{N_R}^(b)) → KKM/ranking to select N realizations → P10, P50 and P90 evaluation on the N selected realizations.]

Figure 11: Workflow for confidence interval calculation

    The next section shows an application of the workflow illustrated above in Figure

    11. The workflow is performed using 3 different methods for selecting the

    realizations: DKM, ranking and random selection. Our objective is to see how each

method behaves as the number of transfer function evaluations increases and how they

    compare to each other.

    First, we compare the 3 methods by looking at the histograms of the bootstrap

    error of the estimated quantiles for each method (Figure 12). The bootstrap error is

    computed using Eq. 1 above. The blue, red and green bars represent the error

    obtained for DKM, ranking and random selection respectively.


[Figure: six histogram panels (N = 3, 5, 8, 10, 15, 20); x-axis: response value; y-axis: frequency; legend: DKM, Ranking, Random.]

Figure 12: Histograms of the bootstrap error of the estimated quantiles for different numbers of function evaluations and 3 selection methods.

    We observe that, in each case, the DKM method performs better than the ranking

    technique. For all values of N, the errors are globally smaller for the DKM than for

    ranking or random selection. In addition, the error variance is reduced with more

    transfer function evaluations.


Figure 13 presents the bootstrap percentile intervals (α = 10) of the bootstrap error of the estimated quantiles. The symbol in each interval represents the P50 value of the error.
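Computing a percentile interval from the B bootstrap errors is straightforward; the sketch below assumes α = 10 means a 10% significance level, i.e. the 5th–95th percentile band:

```python
import numpy as np

def percentile_interval(boot_errors, alpha=10.0):
    """Bootstrap percentile interval: drop alpha/2 percent of the
    bootstrap values in each tail, and report the P50 as the symbol
    plotted inside the interval."""
    lo, med, hi = np.percentile(boot_errors, [alpha / 2, 50, 100 - alpha / 2])
    return lo, med, hi
```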

[Figure: x-axis: # of function evaluations; y-axis: error on quantile estimation; legend: KKM, Ranking, Random.]

Figure 13: Confidence intervals (α = 10) of the bootstrap error of the estimated quantiles as a function of the number of function evaluations

We observe in Figure 13 that the error tends to decrease as the number of

    function evaluations increases. As observed before on the histograms, DKM performs

    better than ranking, which performs better than random selection. This conclusion

    was also reached in Scheidt and Caers (2009). In this example, we observe that for

    the DKM, the results stabilize for N > 8. We can therefore conclude that 8 or 10 flow

    simulations are necessary for the DKM selected models to have the same uncertainty

    as the total set of 72. In a previous paper (Scheidt and Caers, 2009), it was concluded

    that 7 simulations were satisfactory. Note however that the distance in that work was

slightly better correlated with the difference in response than the distance used in this study.

The table below reports the mean of the bootstrap error, computed from the histograms presented in Figure 12.

          DKM      Ranking   Random select.
N = 3     0.0356   0.0495    0.0561
N = 5     0.0333   0.0348    0.0407
N = 8     0.0280   0.0293    0.0324
N = 10    0.0253   0.0270    0.0322
N = 15    0.0250   0.0325    0.0367
N = 20    0.0270   0.0316    0.0340

Table 1: Mean of the dimensionless bootstrap error for each selection method.


This table, as well as the histograms and confidence intervals, can be very useful in indicating the error resulting from the quantile estimation of the response based on the N selected realizations. For example, suppose we are limited in time and can only perform 5 transfer function evaluations, but want to be confident in the uncertainty quantification results derived from those 5 simulations. From Table 1, we can see that the mean error for N = 5 is 0.0333 for DKM and 0.0348 for ranking. If we had a little more time and had performed N = 8 simulations, the errors would be 0.0280 and 0.0293, an improvement of 16% for DKM (15.8% for ranking) compared to the results for N = 5.

Another way of looking at the results is to show the confidence intervals for each quantile individually. This is illustrated in Figure 14.

[Figure: three panels (P10, P50, P90); quantile estimates (×10⁴ BBL) vs. # of function evaluations; legend: KKM, Ranking, Random.]

Figure 14: Confidence intervals of the bootstrap estimates of the quantiles P10, P50 and P90 (BBL) as a function of the number of function evaluations. The line represents the quantiles derived from the entire set of realizations

    Figure 14 shows that DKM and ranking produce very accurate estimates of the

P50 quantile of the target response, even for small numbers of transfer function

    evaluations (medians are easier to estimate than extremes). In addition, the P10

quantiles tend to be slightly underestimated, but DKM is closer to the reference value than the other techniques. The same conclusions are valid for the P90, except

    that we observe an overestimation of the quantiles. The underestimation of P10 and

    overestimation of P90 are most likely due to the use of kernel smoothing to estimate

    the density, which will increase the variability of the response compared to the

    original 72 realizations.

As mentioned at the beginning of the paper, the proxy measure should be correlated with the target response for DKM and ranking to be effective. However, the correlation coefficient between the two responses is not known a priori, since the target response for all realizations is unknown. Once a selection method is applied and the transfer function is evaluated on the selected realizations, an estimate of the correlation coefficient can be inferred. The quality of the estimated correlation coefficient can be studied in exactly the same way as the estimated quantiles, using the parametric bootstrap. Figure 15 presents the confidence intervals obtained for different values of N, the correlations being estimated on the same samples used to estimate the quantile error. The symbols show the initial estimates ρ̂ of the correlation coefficient.

[Figure: x-axis: # of function evaluations; y-axis: estimated correlation coefficient.]

Figure 15: Bootstrap estimated correlation coefficient on the WCA test case.

Figure 15 shows that the first estimates ρ̂ of the correlation coefficient between the 2 responses are accurate, with a slight overestimation for small numbers of transfer function evaluations (N = 3 and 5). However, the bootstrap estimated correlation coefficients are largely underestimated. We believe that this is due to the kernel smoothing technique, which tends to smooth the density of the bivariate data and therefore allows Metropolis sampling to sample points in a "wider" area than it should.

    This was not the case for the bi-normal example in Section 4.1. However, we can

    still derive conclusions on the confidence intervals provided. We observe that DKM

    tends to have less uncertainty in the correlation coefficient than ranking, except for N

    = 8.


    5. Discussion on distances

    The above examples demonstrate that using the same measure for ranking and

distance provides similar accuracy in uncertainty quantification for the Gaussian

    case. We should emphasize however that the bootstrap method applied in the context

    of the paper is clearly unfavorable to DKM. In order to compare ranking and the

    DKM, we calculated the distance between 2 realizations as the difference of the

    ranking measure between the realizations. This leads to a representation of

    uncertainty in a 1D MDS-space, and therefore the use of kernel methods has not the

    same impact as for higher dimensional MDS-space. The distance in this study is very

    simple, whereas in many applications the distance can be much more complex, and

    can take into account many measures of influential factors on the response. For

    example, a distance can be a function of many parameters, such as the cumulative oil

production at different times, and the water-cut of a group of wells (Scheidt and Caers,

    2009). Using traditional ranking techniques may require multiple independent studies

    if one is interested in uncertainty in several responses. In the case of DKM, a single

    study is enough if the distance is well chosen.

    6. Conclusions

    We have established a new workflow to construct confidence intervals on

    quantile estimations in model selection techniques. We would like to state explicitly

    that we do not treat the question of whether or not the uncertainty model, i.e. the

    possibly large set of reservoir models that can be generated by varying several input

    parameters, is realistic. Uncertainty quantification by itself is inherently subjective

    and any confidence estimates of the uncertainty model itself are therefore useless. In

    this paper we assume there is a larger set of model realizations and assume that this

    set provides a realistic representation of uncertainty. Then, the proposed bootstrap

    allows quantifying error on uncertainty intervals or quantiles when only a few models

    from the larger set are selected.

The workflow uses model selection methods – in this work, DKM or ranking –

    and employs a parametric bootstrap procedure to construct confidence intervals on

    the quantiles retained by the model selection techniques. Examples show that DKM

provides more robust results compared to ranking, especially for small numbers of

    transfer function evaluations. The study of the uncertainty resulting from model

    selection can be very informative - it shows if we can be confident or not in the

    estimated statistics. The confidence interval is a function of the estimated variance of

    the response and the estimated correlation coefficient between the proxy measure and

    the response. Since the user does not know the correlation coefficient a priori, we

    propose performing a bootstrap procedure between the response and its proxy to

    estimate the quality of the distance. If the estimated correlation coefficient is high

and its associated uncertainty is low, then we can be confident in the uncertainty

    quantification results. If after N transfer function evaluations the uncertainty is large


    and a poor correlation is found, then the results should be improved by either using a

    better proxy response or doing more transfer function evaluations.

    Nomenclature

    NR = number of initial realizations

    N = number of selected realizations for transfer function evaluation

    X = [X1,…, XNR]

    Xi = [xi, yi]

    xi = target response value for realization i

    yi = proxy response value for realization i

    dij = distance between realizations i and j

    ρ = correlation coefficient between the target and proxy responses

    B = number of samples generated in the bootstrap procedure

e*_b = bootstrap error of the estimated quantiles for sample b

$\hat{x}_{P10}, \hat{x}_{P50}, \hat{x}_{P90}$ = estimated P10, P50 and P90 after the first selection method

$\hat{\hat{x}}_{P10}^*, \hat{\hat{x}}_{P50}^*, \hat{\hat{x}}_{P90}^*$ = bootstrap estimated quantiles for the second selection method

    References

Ballin, P.R., Journel, A.G., and Aziz, K. 1992. Prediction of Uncertainty in Reservoir Performance Forecast. JCPT, no. 4.

    Batycky, R. P., Blunt, M. J. and Thiele, M. R. 1997. A 3D Field-Scale

    Streamline-Based Reservoir Simulator, SPERE 12(4): 246-254.

Borg, I., and Groenen, P. 1997. Modern Multidimensional Scaling: Theory and Applications. New York, Springer.

Bowman, A.W., and Azzalini, A. 1997. Applied Smoothing Techniques for Data Analysis. Oxford University Press.

Efron, B. 1979. Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7(1): 1-26.

    Hastings, W. K. 1970. Monte Carlo sampling methods using Markov Chains and

    their applications. Biometrika 57: 97–109.

    McLennan, J.A., and Deutsch, C.V. 2005. Ranking Geostatistical Realizations by

    Measures of Connectivity, Paper SPE/PS-CIM/CHOA 98168 presented at the SPE

    International Thermal Operations and Heavy Oil Symposium, Calgary, Alberta,

    Canada, 1-3 November.

    Metropolis, N., and S. Ulam. 1949. The Monte Carlo method. J. Amer. Statist.

    Assoc. 44: 335–341.

Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087-1091.

    Saad, N., Maroongroge, V. and Kalkomey C. T. 1996. Ranking Geostatistical

    Models Using Tracer Production Data, Paper presented at the European 3-D

    Reservoir Modeling Conference, Stavanger, Norway, 16-17 April.

    Scheidt, C., and Caers, J. 2008. Representing Spatial Uncertainty Using Distances

    and Kernels. Mathematical Geosciences, DOI:10.1007/s11004-008-9186-0.

    Scheidt, C., and Caers, J. 2009, A new method for uncertainty quantification

    using distances and kernel methods. Application to a deepwater turbidite reservoir.

    Accepted in SPEJ. To be published.

Schoelkopf, B., and Smola, A. 2002. Learning with Kernels. Cambridge, MIT Press, 664p.

    Strebelle, S. 2002. Conditional Simulation of Complex Geological Structures

    using Multiple-point Statistics, Mathematical Geology, 34(1): 1-22.

