URTeC Control ID Number: 1922498

Statistical Characterization and Geological Correlation of Wells Using Automatic Learning Gaussian Mixture Models

David Lubo*, University of Oklahoma and Simon Bolivar University
Vikram Jayaram, University of Oklahoma
Kurt J. Marfurt, University of Oklahoma

Copyright 2014, Unconventional Resources Technology Conference (URTeC)

This paper was prepared for presentation at the Unconventional Resources Technology Conference held in Denver, Colorado, USA, 25-27 August 2014.

The URTeC Technical Program Committee accepted this presentation on the basis of information contained in an abstract submitted by the author(s). The contents of this paper have not been reviewed by URTeC and URTeC does not warrant the accuracy, reliability, or timeliness of any information herein. All information is the responsibility of, and is subject to corrections by, the author(s). Any person or entity that relies on any information obtained from this paper does so at their own risk. The information herein does not necessarily reflect any position of URTeC. Any reproduction, distribution, or storage of any part of this paper without the written consent of URTeC is prohibited.

Summary

Tying detailed well log measurements to lower resolution but areally extensive 3D seismic data volumes is key to

quantitative seismic interpretation. Ties using a poststack or prestack convolution model are routine, while

supervised classification tying well data to seismic attributes using neural networks and geostatistics is also well

established. However, unsupervised classification ties, where the objective is to identify unknown patterns in the data,

are less well established. In this paper, we use an automatic learning Gaussian Mixture Model to statistically

characterize the well logs, evaluate the probability distribution functions of different lithologies and then tie them

to corresponding 3D seismic attribute volumes. We precondition our four-dimensional data by projecting onto two

dimensions using Independent Component Analysis.

We apply this workflow to Diamond M Field within the San Andres Formation and the Horseshoe Atoll Reef

Complex, Scurry County, TX, and find the Gaussian Mixture Model is able to statistically characterize and resolve

lithological variations seen in the logs. In particular, we are able to clearly distinguish between lithologies from six

different wells in the region of interest. The final result is a probabilistic map that statistically measures the variability

of the seismic lithologies from the well logs.

Introduction

Tying sonic and density logs to poststack and prestack seismic data volumes using deterministic forward modeling

and impedance inversion is a central component of quantitative seismic interpretation. Geostatistical estimation of

porosity away from well logs using co-located co-kriging of seismic impedance measurements is also well

established. Other well measurements such as gamma ray response can be tied to seismic attribute volumes using

supervised learning neural networks (e.g. Verma, 2013). In contrast, unsupervised learning methods where we have

neither an explicit model nor a user-defined correlation of well logs to seismic attributes are rarely used to classify

seismic attribute facies. However, recent advances in pattern recognition and data mining algorithms coupled with

faster computers promise to make such quantitative interpretation workflows possible. The more popular

unsupervised mapping techniques include principal component analysis (PCA), self-organizing mapping (SOM),

and more recently the generative topographic mapping (GTM). All three of these methods are projection methods.

If we consider a 2D projection, PCA projects higher dimension attribute data onto the 2D plane that best fits the

data. In SOM and GTM, this plane is allowed to deform into a 2D surface or manifold that best fits the data.

Roy (2013) applied all three methods to map multiple 3D seismic attribute volumes. The actual “classification” in

these “manifold mapping” algorithms is done by the human interpreter, who either color-codes or uses cross-plots to

separate out different clusters of interest. A limitation of PCA and SOM is that they do not provide a probabilistic

measure of confidence as to whether a given data vector falls within a given cluster. GTM does provide such a

probabilistic measure, but current implementations assume a uniformly distributed manifold grid in a latent space

represented by univariate Gaussian distributions. In this paper, we evaluate an automatic learning Gaussian Mixture

Model (GMM) that can statistically characterize the well data and correlate these lithological variations to a 3D

P-Impedance attribute volume. With GMM, we do not assume a univariate representation of Gaussians, but rather a

scalable multivariate representation of the data set.

We begin our paper with a review of the Gaussian Mixture Model. We then statistically characterize our data using

first ICA and then GMM with the objective of statistically representing the original data. Finally, we compare our

predictions to lithological variations within the reservoir.

Gaussian Mixture Model

A Gaussian Mixture Model (GMM) is a parametric model of the probability distribution that provides greater

flexibility than traditional unsupervised clustering algorithms. Multidimensional data such as the well data or a suite

of seismic attributes can be modeled by a multidimensional Gaussian Mixture. As the name implies, the GMM is a

linear sum of M Gaussian probability density functions (PDFs) N, each characterized by a weight αm, a mean μjm and a J by J

covariance matrix Cm, for the jth of J attributes or well measurements aj(t) at time or depth t:

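$$p\left(a_j(t)\right) = \sum_{m=1}^{M} \alpha_m \, N\left(a_j(t), \mu_{jm}, \mathbf{C}_m\right) \qquad (1)$$

where,

$$N\left(a_j(t), \mu_{jm}, \mathbf{C}_m\right) = \frac{1}{(2\pi)^{J/2} \left|\mathbf{C}_m\right|^{1/2}} \exp\left\{ -\frac{1}{2} \left[a_j(t) - \mu_{jm}\right]^T \mathbf{C}_m^{-1} \left[a_j(t) - \mu_{jm}\right] \right\} \qquad (2)$$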
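As a minimal sketch of how equations 1 and 2 can be evaluated, the following Python fragment computes the mixture density for the two-dimensional case (J = 2), which matches the two ICA dimensions used later in the paper. The function names and the two-component parameter values are hypothetical and chosen only for illustration; they are not taken from the field data.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Equation 2 for J = 2: Gaussian density at the 2D point x."""
    (c11, c12), (c21, c22) = cov
    det = c11 * c22 - c12 * c21
    # Inverse of the 2 x 2 covariance matrix
    inv = ((c22 / det, -c12 / det), (-c21 / det, c11 / det))
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T C^{-1} (x - mu)
    quad = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
    # (2 pi)^(J/2) |C|^(1/2) with J = 2
    norm = (2.0 * math.pi) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm

def gmm_pdf_2d(x, weights, means, covs):
    """Equation 1: weighted sum of M Gaussian densities."""
    return sum(w * gaussian_pdf_2d(x, mu, c)
               for w, mu, c in zip(weights, means, covs))

# Hypothetical two-component mixture (illustrative values only)
weights = [0.6, 0.4]
means = [(0.0, 0.0), (3.0, 1.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((2.0, 0.5), (0.5, 1.0))]
density_at_origin = gmm_pdf_2d((0.0, 0.0), weights, means, covs)
```

The weights must sum to one for the mixture to integrate to one; the sketch does not enforce this.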

We use an Expectation Maximization Algorithm to estimate the model parameters, means 𝝁, covariances 𝐂 and

weights 𝜶, which can be represented by 𝛌 = {𝜶, 𝝁, 𝐂}.

Unfortunately, we cannot determine the true number of mixing components using only EM. To address this limitation,

we use a “dynamic” algorithm which is capable of adding and removing Gaussian components to better fit the data.

In other words, the algorithm uses a combination of covariance constraints to split, merge or dynamically prune the

mixture components to correctly fit the data and automate the learning process.
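The specific covariance constraints used to split, merge, and prune are not spelled out here. Purely as an illustration of what a pruning step might look like, the sketch below drops one-dimensional components whose variance has collapsed or whose weight has become negligible, then renormalizes the remaining weights. The function name and both thresholds (`min_weight`, `min_variance`) are assumptions made for this example, not the criteria of the algorithm described above.

```python
def prune_components(weights, means, variances,
                     min_weight=0.01, min_variance=1e-4):
    """Drop degenerate 1D components and renormalize the weights.

    Both thresholds are hypothetical; the paper's actual
    covariance constraints are not reproduced here.
    """
    kept = [(w, mu, var)
            for w, mu, var in zip(weights, means, variances)
            if w >= min_weight and var >= min_variance]
    if not kept:
        raise ValueError("every component was pruned")
    total = sum(w for w, _, _ in kept)
    new_weights = [w / total for w, _, _ in kept]
    new_means = [mu for _, mu, _ in kept]
    new_variances = [var for _, _, var in kept]
    return new_weights, new_means, new_variances

# Illustrative values: the third component has collapsed,
# the fourth carries almost no weight.
pruned_w, pruned_mu, pruned_var = prune_components(
    [0.60, 0.35, 0.045, 0.005],
    [0.0, 3.0, 5.0, 9.0],
    [1.0, 0.8, 1e-6, 0.5],
)
```

After a step of this kind, EM would be run again on the reduced mixture so that the remaining components can readjust to the data.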

Following Jayaram (2009), we implement the GMM using the following steps:

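Generate the a posteriori probability of each mixture component m given K data samples aj(tk), where tk is the time or depth of the kth sample:

$$\sum_{k=0}^{K} p\left(m \mid a_j(t_k), \mu_{jm}, \mathbf{C}_m\right) \qquad (3)$$

Compute the mixture weight

$$\alpha_m = \frac{1}{K} \sum_{k=0}^{K} p\left(m \mid a_j(t_k), \mu_{jm}, \mathbf{C}_m\right), \qquad (4)$$

the mean vector

$$\mu_{jm} = \frac{\sum_{k=0}^{K} p\left(m \mid a_j(t_k), \mu_{jm}, \mathbf{C}_m\right) a_j(t_k)}{\sum_{k=0}^{K} p\left(m \mid a_j(t_k), \mu_{jm}, \mathbf{C}_m\right)}, \qquad (5)$$

and the covariance matrix

$$\mathbf{C}_m = \frac{\sum_{k=0}^{K} p\left(m \mid a_j(t_k), \mu_{jm}, \mathbf{C}'_m\right) \left[a_j(t_k) - \mu_{jm}\right] \left[a_j(t_k) - \mu_{jm}\right]^T}{\sum_{k=0}^{K} p\left(m \mid a_j(t_k), \mu_{jm}, \mathbf{C}'_m\right)}, \qquad (6)$$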

where C and C’ are J by J matrices, and C’ is the covariance matrix of the previous iteration using EM

and applying the dynamic algorithm to add-remove Gaussian components.

Update the a posteriori probability by computing a convergence function Q:

$$Q(\lambda, \lambda') = \sum_{j=1}^{J} \sum_{m=1}^{M} p\left(m \mid a_j(t), \mu_{jm}, \mathbf{C}_m\right) \log\left[\alpha'_m \, N\left(a_j(t), \mu'_{jm}, \mathbf{C}'_m\right)\right] \qquad (7)$$

where 𝜆 = {𝜶, 𝝁, 𝐂} is the current set of mixture parameters and 𝜆′ = {𝜶′, 𝝁′, 𝐂′} is the set of Gaussian parameters from the previous iteration.

Stop if the increase in the value of the Q function at the current iteration (𝑸𝒓) relative to its value at the previous iteration (𝑸𝒓−𝟏) is less than a chosen threshold.
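The steps above can be sketched for the simplest case of a one-dimensional mixture with a fixed number of components. This is an illustration under stated assumptions, not the implementation of Jayaram (2009): it omits the dynamic addition and removal of components, it monitors the log-likelihood of the data rather than the Q function as the stopping criterion, and the synthetic data, starting values, and variance floor are all hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, weights, means, variances, tol=1e-6, max_iter=500):
    """Fit a 1D Gaussian mixture with a fixed number of components by EM."""
    weights, means, variances = list(weights), list(means), list(variances)
    prev_ll = None
    ll = float("-inf")
    for _ in range(max_iter):
        # E-step: a posteriori probability of each component per sample
        resp = []
        ll = 0.0
        for x in data:
            joint = [w * normal_pdf(x, mu, v)
                     for w, mu, v in zip(weights, means, variances)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        # Stop when the log-likelihood gain falls below the threshold
        if prev_ll is not None and ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: update weight, mean, and variance of each component
        for m in range(len(weights)):
            r_sum = sum(r[m] for r in resp)
            weights[m] = r_sum / len(data)
            means[m] = sum(r[m] * x for r, x in zip(resp, data)) / r_sum
            var = sum(r[m] * (x - means[m]) ** 2
                      for r, x in zip(resp, data)) / r_sum
            variances[m] = max(var, 1e-6)  # hypothetical variance floor
    return weights, means, variances, ll

# Synthetic two-cluster data (illustrative only)
rng = random.Random(0)
data = ([rng.gauss(-2.0, 0.5) for _ in range(300)]
        + [rng.gauss(3.0, 1.0) for _ in range(300)])
fit_w, fit_mu, fit_var, fit_ll = em_gmm_1d(
    data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Because the number of components is fixed at two here, the sketch also shows why EM alone cannot determine the number of mixing components: that number has to be supplied from outside.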

Independent Component Analysis (ICA)

Figure 1: Example of how Independent Component Analysis (ICA) works

To illustrate ICA we utilize the popular cocktail-party problem. Imagine that you are in a party room where two

people are speaking simultaneously. Further you have been given two microphones, which are recording the

combination of voices from two people as illustrated in Figure 1. Notice that each of these recorded signals, m1 and

m2, is a weighted sum of the signals s1 and s2 spoken by the two people, P1 and P2:

$$m_1 = w_{11} s_1 + w_{12} s_2 \qquad (8)$$

$$m_2 = w_{21} s_1 + w_{22} s_2 \qquad (9)$$

where 𝑤11, 𝑤12, 𝑤21 and 𝑤22 are weighting parameters that depend on the distances of the microphones to the

speakers. The goal is to estimate the two original speech signals s1 and s2, using only the recorded signals

𝑚1 and 𝑚2. For simplicity, we assume no time delays or other factors that would complicate this simple mixing

model.


Obviously, if we knew the parameters 𝑤𝑖𝑗, we could solve the linear equations (8) and (9) by classical methods. In

our case, we do not know these parameters. Independent Component Analysis assumes that s1 and s2 are statistically

independent, allowing us to write the equation

$$\mathbf{m} = \mathbf{W}\mathbf{s}, \qquad (10)$$

where 𝐖 is a mixing matrix. If the signal components are statistically independent, we have

$$\mathbf{s} = \mathbf{W}^{-1}\mathbf{m}. \qquad (11)$$
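To make equations 10 and 11 concrete, the short sketch below mixes two source values with a known 2 by 2 matrix and then recovers them with its inverse. The matrix entries and source values are hypothetical. In ICA the mixing matrix is not known, so the inverse has to be estimated from the recorded signals alone; this example only illustrates the algebra of mixing and unmixing.

```python
def mat_vec(W, v):
    """Product of a 2 x 2 matrix and a length-2 vector."""
    return [W[0][0] * v[0] + W[0][1] * v[1],
            W[1][0] * v[0] + W[1][1] * v[1]]

def inverse_2x2(W):
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[W[1][1] / det, -W[0][1] / det],
            [-W[1][0] / det, W[0][0] / det]]

W = [[0.8, 0.3], [0.4, 0.9]]   # hypothetical mixing weights w_ij
s = [1.5, -2.0]                # hypothetical source samples s1, s2
m = mat_vec(W, s)              # equation 10: recorded signals
s_recovered = mat_vec(inverse_2x2(W), m)  # equation 11
```

The recovery works only when the mixing matrix is invertible, which is why the sketch rejects a zero determinant.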

Based on the Central Limit Theorem, the arithmetic mean of a sufficiently large number of independent random

variables will approximate a Gaussian distribution. A mixture of independent sources is therefore closer to Gaussian than the sources themselves, so we can choose W⁻¹ to maximize the

non-Gaussian behavior of the recovered components. In order to quantify the non-Gaussian nature, we use kurtosis, which is a measure of the shape of

the distribution. The (excess) kurtosis is zero for a Gaussian random variable, and non-zero for most non-Gaussian random

variables.
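The role of kurtosis can be illustrated numerically. The sketch below estimates the excess kurtosis (fourth standardized moment minus three) of a Gaussian sample, of a uniform sample, and of an equal-weight mixture of two independent uniform sources. The sample size, seed, and choice of uniform sources are assumptions made for the example.

```python
import random

def excess_kurtosis(samples):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2) - 3.0

rng = random.Random(0)
n = 50000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
source1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
source2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# Equal-weight mixture of the two independent sources
mixed = [0.5 * a + 0.5 * b for a, b in zip(source1, source2)]

k_gaussian = excess_kurtosis(gaussian)
k_source = excess_kurtosis(source1)
k_mixed = excess_kurtosis(mixed)
```

The mixed signal has an excess kurtosis closer to zero than either source, which is the behavior the Central Limit Theorem argument predicts and the reason an unmixing matrix can be sought by maximizing non-Gaussianity.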

Statistical Characterization using Gaussian Mixture Models (GMM)

Diamond M Field is located in Scurry County, TX, approximately 80 mi northeast of Midland, Texas. The

trend is part of the Horseshoe Atoll Reef Complex (Figure 2), an arcuate chain of reef mounds, composed of mixed

types of bioclastic debris that accumulated in the interior part of the developing Midland basin during Late Paleozoic

time (Vest, 1970).

The atoll complex consists of three bioclastic carbonate units formed during late Pennsylvanian to early Permian

time when shallow water carbonate deposits dominated most of the deposition in the Permian basin: the Strawn,

Canyon, and Cisco formations, in ascending stratigraphic order. Core and log data indicate the Cisco formation has

a greater biogenic build-up, erosion, and karst. The Canyon and Strawn formations are more horizontally bedded.

These heterogeneous carbonate units are separated by locally correlative shale beds (Galloway et al., 1983).

According to Dutton et al. (2003), high variability of the sea level gives rise to a layering of tight and porous layers

and hence significant reservoir heterogeneity.

For its part, the San Andres Formation is characterized by a mainly carbonate prograding stratigraphic unit. The

lithology includes dolomite, limestone, salt and some siliciclastics facies (Ramondetta, 1982).

Figure 2: Location of the Diamond M Field, Scurry County, TX (Red Star) (Modified from Walker, 1995)


Figure 3 shows the location of the wells. Red and yellow colors define the carbonate buildup in the Horseshoe Atoll

Reef Complex (Davogustto, 2013). For our study, we used wells J, M05, K07, Garnet, Topaz and M08, extracting

Poisson’s Ratio (dimensionless), Density (in g/cm3), Compressional Velocity (in µs/ft) and Gamma Ray (in API

units). We then applied Independent Component Analysis (ICA) reducing our data from four attribute dimensions

to two ICA dimensions, ICA1 and ICA2 which then served as input to the Gaussian Mixture Model algorithm.

We grouped the wells with similar ICA PDFs and found that two pairs of wells had similar PDFs while the remaining

two wells had different PDFs.

Figure 3: Time-structure map showing the location of the wells Jade (J), M05, K07, M08, Garnet (G) and Topaz (T), in the Diamond M Field,

Scurry County, TX. Red and yellow colors define the carbonate buildup. (After Davogustto, 2013).

Wells K07 and M08

Figure 4: Gaussian mixture fit of the distribution of each well after reaching convergence using the EM Algorithm. Note the V-shape (indicated by red dashed lines), which is much more pronounced in Well M08 than in Well K07. Also note an abrupt cutoff indicated by the yellow

arrows. The dynamic algorithm found that 12 Gaussians were required to parameterize the data of K07 and 18 Gaussians for the data of M08.

In Figure 4 we see a V-shape in both wells, which is much more pronounced in well M08. Also, we clearly observe

the presence of two distinct clusters and an abrupt change in the cluster on the left. Applying GMM to the data, the

dynamic EM algorithm finds that we need 12 and 18 Gaussians to parameterize the data in wells K07 and M08, respectively, using

equations 1 and 2.

Analyzing the marginal PDFs of K07 and M08 (Figure 5), we observe that the GMM provides a good match to the

data. A marginal PDF of a random variable is just the integral of the joint PDF with respect to the other random

variable.
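For a Gaussian mixture this integral has a closed form: integrating a two-dimensional Gaussian over one variable leaves a one-dimensional Gaussian whose mean and variance are the corresponding entries of the mean vector and the diagonal of the covariance matrix. The sketch below uses that property to evaluate the marginal PDF of a two-dimensional mixture along one axis. The mixture parameters are hypothetical and are not the fitted values for any of the wells.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_marginal(x, weights, means, covs, axis):
    """Marginal PDF of a 2D Gaussian mixture along the given axis (0 or 1).

    Each component contributes a 1D Gaussian with mean means[m][axis]
    and variance covs[m][axis][axis].
    """
    return sum(w * normal_pdf(x, mu[axis], c[axis][axis])
               for w, mu, c in zip(weights, means, covs))

# Hypothetical two-component mixture (illustrative values only)
weights = [0.6, 0.4]
means = [(0.0, 0.0), (3.0, 1.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((2.0, 0.5), (0.5, 1.0))]

# Marginal along the first component, sampled on a regular grid
step = 0.01
grid = [-10.0 + i * step for i in range(2501)]
marginal = [gmm_marginal(x, weights, means, covs, axis=0) for x in grid]
area = sum(marginal) * step
```

A curve of this kind, plotted over a histogram of one independent component, is one way to compare the fitted mixture with the data along each axis.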

Figure 5: Marginal PDFs of wells K07 and M08 across each independent component. The Gaussian Mixture Model provides an excellent match to the original data

After obtaining these results, we analyzed the changes in acoustic impedance along the wells with the objective of

explaining why these distributions have similar features, i.e. why wells K07 and M08 are very similar to each other but

different from the other wells.

Figure 6: Vertical slices through the 3D acoustic impedance volume through wells K07 and M08. Note the colors of the acoustic impedance in

the horizontal layers within both the wells are quite similar but there are subtle differences between each other as shown by the yellow arrows.

Therefore we demonstrate that the GMM is sensitive to such subtle lateral and vertical changes that exist in our reservoir as shown in Figure 4.

In Figure 6, we see that the general behavior in the acoustic impedance volume around wells K07 and M08 is

similar. Away from the wells there is significant variation. This similarity, along with the fact that the PDFs and the

numbers of Gaussians needed to represent the data are similar, suggests that GMM is able to statistically represent the lateral

and vertical changes that exist in our reservoir.

Wells Garnet-Topaz

In Figure 7, we see that the clusters seen in wells K07 and M08 no longer form a V-shape but are almost

parallel to each other. Furthermore, the abrupt cutoff of the cluster on the left is now diffuse, with the trend smoother

and flatter in the Garnet well. There is greater spread in Topaz than in Garnet. When we apply GMM, we find we

need 6 Gaussians to represent Garnet and 7 Gaussians to represent Topaz.

Analyzing the marginal PDFs of Topaz and Garnet (Figure 8), we see that GMM correctly matches the PDF

of the input data.

Figure 7: Gaussian mixture fit of the distribution of each well after reaching convergence using the EM Algorithm. Unlike the V-shape in Figure

4, the clusters are now almost parallel while the abrupt edge in the cluster on the left is now more diffuse. There is a somewhat greater spread in Topaz than in Garnet.

Figure 8: Marginal PDFs of wells Topaz and Garnet across each independent component. The Gaussian Mixture Model correctly matches our

original data.

In Figure 9, we see that the general behavior in the acoustic impedance is almost the same. Areas where there are

significant changes are indicated by the yellow arrows.

Figure 9: The general behavior in the acoustic impedance about the Garnet and Topaz wells is similar, with the exception of the zones indicated

by the yellow arrows. Note that there are zones of high impedance in Garnet which do not appear in Topaz.

Wells Jade and M05


The GMM PDFs from wells Jade and M05 shown in Figure 10 are different from each other and from the PDFs

shown in Figures 4 and 7. We see two well-defined clusters in the Jade well PDF, with a trend in the cluster on the

left that is neither completely flat nor totally sharp. Given the location of well Jade, we assume that we are in a

transition between the properties of Garnet-Topaz and those of K07-M08. The PDF of well M05 exhibits the same abrupt

change in the cluster on the left; we also observe a twin-elongated shape. The GMM algorithm required 10 Gaussians

to represent well M05 and 14 to represent Jade.

Figure 10: Gaussian mixture fit of the distribution of each well after reaching convergence using the EM Algorithm. The clusters in the two wells behave independently. In Jade, the trend in the cluster on the left is neither completely flat nor totally sharp, so

we can assume that we are in a transition between the properties of Garnet-Topaz and those of K07-M08, which is plausible given the location of well Jade

(yellow arrow). For its part, M05 shows a twin-elongated shape, and the abrupt change reappears.

In Figure 11, we observe that the model is still matching the input data.

Figure 11: Marginal PDFs of wells Jade and M05 across each independent component. The Gaussian Mixture Model (GMM) correctly matches our original data.

Analyzing the acoustic impedance, we observe that each well has its own behavior. Note in Figure 12 that there are

some subtle changes in the acoustic impedance; this may be the reason why we observe different distributions in

the wells.


Figure 12: There are clear differences in the acoustic impedance within both wells; this may be the reason that we observe different distributions

in the wells.

Conclusions

We propose a workflow based on independent component analysis and Gaussian mixture models that statistically

represent the variability measured in four well logs. This characterization is derived without any user intervention.

For this reason, it is called an “automatic learning” GMM.

The variability captured by the Gaussian Mixture Model (GMM) represents the lateral and vertical changes seen in acoustic

impedance within the reservoir. Although we used only six wells, we feel this statistical workflow can be useful in

clustering the thousands of wells that are currently used in modern resource plays. Further correlation to seismic

attribute clusters may provide a means to identify and map sweet spots and geohazards.

In a situation where we have thousands of wells in a resource play, the proposed workflow can determine which wells

are alike and which are different. We can also examine whether these similarities and differences correlate with well performance, to

identify where we have good EUR, or to estimate the possibility of intersecting a geohazard within the reservoir.

Acknowledgements

We thank Parallel Petroleum LLC for the use of their data and the Attribute-Assisted Seismic Processing and

Interpretation (AASPI) consortium for its financial support. Graphics were made using licenses to Petrel, provided

to OU for research and education courtesy of Schlumberger.

References

V. Jayaram and B. Usevitch, “Active Learning schemes for reduced dimensionality Hyperspectral Classification”,

invited paper at the 2009 IEEE Asilomar Conference on Signals, Systems and Computers, Naval Postgraduate School,

Monterey, CA, November 2009.

L. Li, Z.H. Wan, S.F. Zhan, C.F. Tao, and X.H. Ran, 2013, Prediction of Geological Characteristic Using Gaussian

Mixture Model: 75th EAGE Conference & Exhibition incorporating SPE EUROPEC 2013, London, UK, 10-13

June 2013.

Roy, A., 2013, Latent Space Classification of Seismic Facies. Ph.D. Dissertation, University of Oklahoma.

Davogustto, O., 2013, Quantitative Geophysical Investigations at the Diamond M Field, Scurry County, Texas.

Ph.D. Dissertation, University of Oklahoma.


Vest, E. L., 1970, Oil fields of Pennsylvanian-Permian Horseshoe atoll, west Texas in Geology of giant petroleum

fields: American Association of Petroleum Geologists Special Volume 14, 185–203.

Walker, D. A., J. Golonka, A. M. Reid, and S. T. Reid, 1991, The effects of late Paleozoic paleolatitude and

paleogeography on carbonate sedimentation in the Midland Basin, Texas; Permian Basin plays: Society of Economic

Paleontologists and Mineralogists, Permian Basin Chapter, Tomorrow’s Technology Today, 141–162.

Galloway, W. E., T. E. Ewing, C. M. Garrett, N. Tyler, and D. G. Bebout, 1983, Atlas of major Texas oil reservoirs:

Bureau of Economic Geology.

Verma, S., A. Roy, R. Perez, and K. Marfurt, Mapping high frackability and high TOC zones in the Barnett Shale:

Supervised Probabilistic Neural Network vs unsupervised multi-attribute Kohonen SOM. In the Proceedings of the

82nd Annual Meeting of the Society of Exploration Geophysicists.

Ramondetta, P. J., 1982, Facies and stratigraphy of the San Andres Formation, northern and northwestern shelves of

the Midland Basin, Texas and New Mexico: Bureau of Economic Geology Report of Investigation No. 128.
