URTeC Control ID Number: 1922498
Statistical Characterization and Geological Correlation of Wells Using Automatic Learning Gaussian Mixture Models
David Lubo*, University of Oklahoma and Simon Bolivar University
Vikram Jayaram, University of Oklahoma
Kurt J. Marfurt, University of Oklahoma
Copyright 2014, Unconventional Resources Technology Conference (URTeC)
This paper was prepared for presentation at the Unconventional Resources Technology Conference held in Denver, Colorado, USA, 25-27 August 2014.
The URTeC Technical Program Committee accepted this presentation on the basis of information contained in an abstract submitted by the author(s). The contents of this paper
have not been reviewed by URTeC and URTeC does not warrant the accuracy, reliability, or timeliness of any information herein. All information is the responsibility of, and, is
subject to corrections by the author(s). Any person or entity that relies on any information obtained from this paper does so at their own risk. The information herein does not
necessarily reflect any position of URTeC. Any reproduction, distribution, or storage of any part of this paper without the written consent of URTeC is prohibited.
Summary
Tying detailed well log measurements to lower-resolution but areally extensive 3D seismic data volumes is key to
quantitative seismic interpretation. Ties using a poststack or prestack convolution model are routine, and
supervised classification that ties well data to seismic attributes using neural networks and geostatistics is also well
established. However, unsupervised classification ties, where the objective is to identify unknown patterns in the data,
are less well established. In this paper, we use an automatic learning Gaussian Mixture Model to statistically
characterize the well logs, evaluate the probability distribution functions of different lithologies and then tie them
to corresponding 3D seismic attribute volumes. We precondition our four-dimensional data by projecting onto two
dimensions using Independent Component Analysis.
We apply this workflow to Diamond M Field within the San Andres Formation and the Horseshoe Atoll Reef
Complex, Scurry County, TX, and find the Gaussian Mixture Model is able to statistically characterize and resolve
lithological variations seen in the logs. In particular, we are able to clearly distinguish between lithologies from six
different wells in the region of interest. The final result is a probabilistic map that statistically measures the variability
of the seismic lithologies from the well logs.
Introduction
Tying sonic and density logs to poststack and prestack seismic data volumes using deterministic forward modeling
and impedance inversion is a central component of quantitative seismic interpretation. Geostatistical estimation of
porosity away from well logs using co-located co-kriging of seismic impedance measurements is also well
established. Other well measurements such as gamma ray response can be tied to seismic attribute volumes using
supervised learning neural networks (e.g. Verma, 2013). In contrast, unsupervised learning methods where we have
neither an explicit model nor a user-defined correlation of well logs to seismic attributes are rarely used to classify
seismic attribute facies. However, recent advances in pattern recognition and data mining algorithms coupled with
faster computers promise to make such quantitative interpretation workflows possible. The more popular
unsupervised mapping techniques include principal component analysis (PCA), self-organizing maps (SOM),
and more recently generative topographic mapping (GTM). All three of these methods are projection methods.
If we consider a 2D projection, PCA projects higher-dimensional attribute data onto the 2D plane that best fits the
data. In SOM and GTM, this plane is allowed to deform into a 2D surface or manifold that best fits the data.
Roy (2013) applied all three methods to map multiple 3D seismic attribute volumes. The actual “classification” in
these “manifold mapping” algorithms is done by the human interpreter, who either color-codes or uses cross-plots to
separate out different clusters of interest. A limitation of PCA and SOM is that they do not provide a probabilistic
measure of confidence as to whether a given data vector falls within a given cluster. GTM does provide such a
probabilistic measure, but current implementations assume a uniformly distributed manifold grid in a latent space
represented by univariate Gaussian distributions. In this paper, we evaluate an automatic learning Gaussian Mixture
Model (GMM) that can statistically characterize the well data and correlate the resulting lithological variations to a 3D
P-Impedance attribute volume. With GMM, we do not assume a univariate representation of Gaussians, but rather a
scalable multivariate representation of the data set.
We begin our paper with a review of the Gaussian Mixture Model. We then statistically characterize our data using
first ICA and then GMM with the objective of statistically representing the original data. Finally, we compare our
predictions to lithological variations within the reservoir.
Gaussian Mixture Model
A Gaussian Mixture Model (GMM) is a parametric model of the probability distribution that provides greater
flexibility than traditional unsupervised clustering algorithms. Multidimensional data such as the well data or a suite
of seismic attributes can be modeled by a multidimensional Gaussian mixture. As the name implies, the GMM is a
linear sum of M Gaussian probability density functions (PDFs) N, each characterized by a weight αm, a mean μjm and a J by J
covariance matrix Cm, for the jth of J attributes or well measurements aj(t) at time or depth t as
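$$p\big(a_j(t)\big) = \sum_{m=1}^{M} \alpha_m\, N\big(a_j(t), \mu_{jm}, \mathbf{C}_m\big) \tag{1}$$
where
$$N\big(a_j(t), \mu_{jm}, \mathbf{C}_m\big) = \frac{1}{(2\pi)^{J/2}\,|\mathbf{C}_m|^{1/2}} \exp\left[-\frac{1}{2}\big(a_j(t)-\mu_{jm}\big)^{T}\,\mathbf{C}_m^{-1}\,\big(a_j(t)-\mu_{jm}\big)\right] \tag{2}$$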
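As a minimal sketch of equations (1) and (2), the Python fragment below evaluates a Gaussian mixture density at a single point for the two-dimensional case (J = 2), which is the dimensionality used after the ICA projection. The function names, the use of `weights` for the mixture weights, and all numerical values are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math

def gaussian_pdf_2d(a, mu, C):
    """Equation (2) for J = 2: Gaussian density at point a with mean mu, covariance C."""
    (c11, c12), (c21, c22) = C
    det = c11 * c22 - c12 * c21
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of the 2x2 covariance matrix.
    inv = ((c22 / det, -c12 / det), (-c21 / det, c11 / det))
    d0, d1 = a[0] - mu[0], a[1] - mu[1]
    # Quadratic form (a - mu)^T C^{-1} (a - mu).
    quad = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    norm = 2.0 * math.pi * math.sqrt(det)  # (2*pi)^{J/2} |C|^{1/2} with J = 2
    return math.exp(-0.5 * quad) / norm

def mixture_pdf(a, weights, means, covs):
    """Equation (1): weighted sum of M Gaussian densities."""
    return sum(w * gaussian_pdf_2d(a, mu, C) for w, mu, C in zip(weights, means, covs))

# Hypothetical two-component mixture (illustrative values only).
weights = [0.6, 0.4]
means = [(0.0, 0.0), (3.0, 1.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((2.0, 0.5), (0.5, 1.0))]
density_at_origin = mixture_pdf((0.0, 0.0), weights, means, covs)
```

The weights must sum to one for the mixture to integrate to one; the sketch does not enforce this.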
We use an Expectation Maximization (EM) algorithm to estimate the model parameters: the means 𝝁, covariances 𝐂 and
weights 𝜶, which we collectively denote by 𝛌 = {𝜶, 𝝁, 𝐂}.
Unfortunately, we cannot determine the true number of mixing components using only EM. To address this limitation,
we use a “dynamic” algorithm that is capable of adding and removing Gaussian components to better fit the data.
In other words, the algorithm uses a combination of covariance constraints to split, merge or dynamically prune the
mixture components to correctly fit the data and automate the learning process.
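The split, merge and prune criteria are not spelled out here, so the following Python sketch illustrates only one plausible pruning step: components whose weight falls below a threshold are removed and the remaining weights are renormalized. The function name `prune_components` and the threshold `min_weight` are hypothetical, and the covariance-based split and merge tests are not shown.

```python
def prune_components(weights, means, covs, min_weight=0.01):
    """Drop mixture components with weight below min_weight, then renormalize.

    This is an illustrative pruning rule only; min_weight is an assumed value.
    """
    kept = [i for i, w in enumerate(weights) if w >= min_weight]
    if not kept:
        raise ValueError("all components fell below min_weight")
    total = sum(weights[i] for i in kept)
    new_weights = [weights[i] / total for i in kept]  # weights sum to one again
    new_means = [means[i] for i in kept]
    new_covs = [covs[i] for i in kept]
    return new_weights, new_means, new_covs

# Hypothetical three-component mixture in which one component is negligible.
pruned_weights, pruned_means, pruned_covs = prune_components(
    [0.5, 0.495, 0.005],
    [(0.0, 0.0), (3.0, 1.0), (9.0, 9.0)],
    [((1.0, 0.0), (0.0, 1.0))] * 3,
)
```

In a dynamic scheme of this kind, such a step would typically be interleaved with EM iterations so that the surviving components are refit after each change in the number of components.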
Following Jayaram (2009), we implement the GMM using the following steps:
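Generate the a posteriori probability of each mixture component m given K data samples aj(t).
$$\sum_{k=0}^{K} p\big(m \mid a_j(k\Delta t), \mu_{jm}, \mathbf{C}_m\big) \tag{3}$$
Compute the mixture weight
$$\alpha_m = \frac{1}{K}\sum_{k=0}^{K} p\big(m \mid a_j(k\Delta t), \mu_{jm}, \mathbf{C}_m\big) \tag{4}$$
mean vector
$$\mu_{jm} = \frac{\sum_{k=0}^{K} p\big(m \mid a_j(k\Delta t), \mu_{jm}, \mathbf{C}_m\big)\, a_j(k\Delta t)}{\sum_{k=0}^{K} p\big(m \mid a_j(k\Delta t), \mu_{jm}, \mathbf{C}_m\big)} \tag{5}$$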
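and the covariance matrix
$$\mathbf{C}_m = \frac{\sum_{k=0}^{K} p\big(m \mid a_j(k\Delta t), \mu_{jm}, \mathbf{C}'_m\big)\,\big(a_j(k\Delta t)-\mu_{jm}\big)\big(a_j(k\Delta t)-\mu_{jm}\big)^{T}}{\sum_{k=0}^{K} p\big(m \mid a_j(k\Delta t), \mu_{jm}, \mathbf{C}'_m\big)} \tag{6}$$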
where C and C′ are J by J matrices, and C′ is the covariance matrix from the previous EM iteration, after the
dynamic algorithm has been applied to add or remove Gaussian components.
Update the a posteriori probability by computing a convergence function Q:
$$Q(\lambda, \lambda') = \sum_{j=1}^{J}\sum_{m=1}^{M} p\big(m \mid a_j(t), \mu_{jm}, \mathbf{C}_m\big)\,\log\big[\alpha'_m\, N\big(a_j(t), \mu'_{jm}, \mathbf{C}'_m\big)\big] \tag{7}$$
where 𝜆 is the current maximum-likelihood estimate of the Gaussian mixture parameters, 𝜆′ is the estimate from the previous
iteration, and 𝝁′, 𝜶′, 𝐂′ are the Gaussian parameters from the previous iteration.
Stop if the increase in the value of the Q function at the current iteration (𝑸𝒓) relative to its value at the
previous iteration (𝑸𝒓−𝟏) is less than a chosen threshold.
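The steps above can be sketched in a few lines of Python. To keep the example short, it makes several simplifying assumptions that differ from the workflow described in this paper: the data are one-dimensional (J = 1), the number of components is fixed (no dynamic add or remove step), and convergence is monitored with the log-likelihood rather than the Q function. The names `em_fit`, `tol`, `max_iter` and the variance floor are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    """One-dimensional Gaussian density (equation 2 with J = 1)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the current mixture."""
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)

def em_fit(data, weights, means, variances, tol=1e-6, max_iter=500):
    """EM for a 1D Gaussian mixture with a fixed number of components."""
    weights, means, variances = list(weights), list(means), list(variances)
    K, M = len(data), len(weights)
    previous = log_likelihood(data, weights, means, variances)
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each sample (cf. eq. 3).
        post = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step: update weights (eq. 4), means (eq. 5) and variances (eq. 6, J = 1).
        for m in range(M):
            resp = sum(p[m] for p in post)
            weights[m] = resp / K
            means[m] = sum(p[m] * x for p, x in zip(post, data)) / resp
            var = sum(p[m] * (x - means[m]) ** 2 for p, x in zip(post, data)) / resp
            variances[m] = max(var, 1e-6)  # assumed floor to avoid collapsed components
        # Stop when the improvement falls below the chosen threshold.
        current = log_likelihood(data, weights, means, variances)
        if current - previous < tol:
            break
        previous = current
    return weights, means, variances

# Synthetic data: two well-separated clusters (illustrative values only).
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(300)] + [random.gauss(3.0, 1.0) for _ in range(200)]
fit_weights, fit_means, fit_variances = em_fit(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Because each EM iteration cannot decrease the log-likelihood, the stopping test on its increase plays the same role here as the test on the increase of Q.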
Independent Component Analysis (ICA)
Figure 1: Example of how Independent Component Analysis (ICA) works
To illustrate ICA we use the popular cocktail-party problem. Imagine that you are in a room where two
people are speaking simultaneously. You have two microphones, each of which records a
combination of the two voices, as illustrated in Figure 1. Each of the recorded signals, m1 and
m2, is a weighted sum of the signals s1 and s2 spoken by the two people, P1 and P2:
𝑚1 = 𝑤11𝑠1 + 𝑤12𝑠2 (8)
𝑚2 = 𝑤21𝑠1 + 𝑤22𝑠2 (9)
where 𝑤11, 𝑤12, 𝑤21 and 𝑤22 are weighting parameters that depend on the distances of the microphones to the
speakers. The goal is to estimate the two original speech signals s1 and s2 using only the recorded signals
𝑚1 and 𝑚2. For simplicity, we assume no time delays or other factors that would complicate this simple
mixing model.
Obviously, if we knew the parameters 𝑤𝑖𝑗, we could solve the linear equations (8) and (9) by classical methods. In
our case, we do not know these parameters. Writing equations (8) and (9) in matrix form gives
m = Ws (10)
where 𝐖 is the mixing matrix. Independent Component Analysis assumes that s1 and s2 are statistically
independent. Under this assumption, we can estimate the sources as
s = W⁻¹m (11)
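For comparison, the case in which the weights are known reduces to inverting a 2 by 2 system. The short Python sketch below solves equations (8) and (9) by Cramer's rule; the function name `unmix_known` and the numerical weights are assumptions chosen for illustration.

```python
def unmix_known(m1, m2, w11, w12, w21, w22):
    """Solve equations (8) and (9) for s1 and s2 when the weights are known."""
    det = w11 * w22 - w12 * w21
    if det == 0.0:
        raise ValueError("mixing matrix is singular")
    s1 = (w22 * m1 - w12 * m2) / det
    s2 = (w11 * m2 - w21 * m1) / det
    return s1, s2

# Hypothetical sources and weights (illustrative values only).
true_s1, true_s2 = 1.5, -0.5
w11, w12, w21, w22 = 0.8, 0.3, 0.2, 0.9
m1 = w11 * true_s1 + w12 * true_s2
m2 = w21 * true_s1 + w22 * true_s2
est_s1, est_s2 = unmix_known(m1, m2, w11, w12, w21, w22)
```

ICA addresses the harder problem in which neither the weights nor the sources are known.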
Based on the Central Limit Theorem, the arithmetic mean of a sufficiently large number of independent random
variables will approximate a Gaussian distribution. A weighted sum of independent sources therefore tends to be
more Gaussian than the sources themselves. For that reason, we can choose W⁻¹ so that it maximizes the
non-Gaussian behavior of the estimated components. To quantify the non-Gaussian nature, we use kurtosis, which is a measure of the shape of
the distribution. The (excess) kurtosis is zero for a Gaussian random variable and non-zero for most non-Gaussian random
variables.
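To make the kurtosis criterion concrete, the Python sketch below separates two synthetic mixtures. It is a toy illustration and not the ICA implementation used in this study: the signals are first whitened, and a simple grid search then picks the rotation that maximizes the summed absolute excess kurtosis of the two outputs. The function names, the grid search, the source distributions and the mixing weights are all assumptions.

```python
import math
import random

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    fourth = sum((v - mean) ** 4 for v in x) / n
    return fourth / var ** 2 - 3.0

def whiten(m1, m2):
    """Center two signals and transform them to unit-variance, uncorrelated signals."""
    n = len(m1)
    mu1, mu2 = sum(m1) / n, sum(m2) / n
    x1 = [v - mu1 for v in m1]
    x2 = [v - mu2 for v in m2]
    c11 = sum(a * a for a in x1) / n
    c22 = sum(b * b for b in x2) / n
    c12 = sum(a * b for a, b in zip(x1, x2)) / n
    # Rotation onto the principal axes of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * a + s * b for a, b in zip(x1, x2)]
    y2 = [-s * a + c * b for a, b in zip(x1, x2)]
    sd1 = math.sqrt(sum(v * v for v in y1) / n)
    sd2 = math.sqrt(sum(v * v for v in y2) / n)
    return [v / sd1 for v in y1], [v / sd2 for v in y2]

def rotate(z1, z2, phi):
    c, s = math.cos(phi), math.sin(phi)
    return ([c * a + s * b for a, b in zip(z1, z2)],
            [-s * a + c * b for a, b in zip(z1, z2)])

def ica_two_sources(m1, m2, steps=90):
    """Pick the rotation of the whitened data that maximizes non-Gaussianity."""
    z1, z2 = whiten(m1, m2)
    best_phi, best_score = 0.0, -1.0
    for i in range(steps):
        phi = (math.pi / 2.0) * i / steps  # the score repeats every 90 degrees
        u1, u2 = rotate(z1, z2, phi)
        score = abs(excess_kurtosis(u1)) + abs(excess_kurtosis(u2))
        if score > best_score:
            best_phi, best_score = phi, score
    return rotate(z1, z2, best_phi)

# Synthetic non-Gaussian sources (assumed): uniform and double-exponential.
random.seed(1)
s1 = [random.uniform(-1.0, 1.0) for _ in range(2000)]
s2 = [random.expovariate(1.0) * random.choice((-1.0, 1.0)) for _ in range(2000)]
m1 = [0.8 * a + 0.3 * b for a, b in zip(s1, s2)]
m2 = [0.2 * a + 0.9 * b for a, b in zip(s1, s2)]
u1, u2 = ica_two_sources(m1, m2)
```

As with any ICA estimate, the recovered components are determined only up to order, sign and scale, so `u1` may correspond to either source.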
Statistical Characterization using Gaussian Mixture Models (GMM)
Diamond M Field is located in Scurry County, TX, approximately 80 mi northeast of Midland, Texas. The
field is part of the Horseshoe Atoll Reef Complex (Figure 2), an arcuate chain of reef mounds composed of mixed
types of bioclastic debris that accumulated in the interior part of the developing Midland basin during Late Paleozoic
time (Vest, 1970).
The atoll complex consists of three bioclastic carbonate units formed during late Pennsylvanian to early Permian
time, when shallow water carbonate deposits dominated most of the deposition in the Permian basin: the Strawn,
Canyon, and Cisco formations, in ascending stratigraphic order. Core and log data indicate that the Cisco formation exhibits
greater biogenic build-up, erosion, and karst. The Canyon and Strawn formations are more horizontally bedded.
These heterogeneous carbonate units are separated by locally correlative shale beds (Galloway et al., 1983).
According to Dutton et al. (2003), high variability of the sea level gives rise to alternating tight and porous layers
and hence significant reservoir heterogeneity.
The San Andres Formation, in turn, is a predominantly carbonate, prograding stratigraphic unit. The
lithology includes dolomite, limestone, salt and some siliciclastic facies (Ramondetta, 1982).
Figure 2: Location of the Diamond M Field, Scurry County, TX (red star) (modified from Walker, 1995).
Figure 3 shows the location of the wells. Red and yellow colors define the carbonate buildup in the Horseshoe Atoll
Reef Complex (Davogustto, 2013). For our study, we used wells Jade (J), M05, K07, Garnet (G), Topaz (T) and M08, extracting
Poisson’s Ratio (dimensionless), Density (in g/cm³), Compressional Velocity (in µs/ft) and Gamma Ray (in API
units). We then applied Independent Component Analysis (ICA) reducing our data from four attribute dimensions
to two ICA dimensions, ICA1 and ICA2 which then served as input to the Gaussian Mixture Model algorithm.
We grouped the wells with similar ICA PDFs and found that two pairs of wells had similar PDFs, while the remaining
two wells had distinct PDFs.
Figure 3: Time-structure map showing the location of the wells Jade (J), M05, K07, M08, Garnet (G) and Topaz (T), in the Diamond M Field,
Scurry County, TX. Red and yellow colors define the carbonate buildup. (After Davogustto, 2013).
Wells K07 and M08
Figure 4: Gaussian mixture fit of the distribution of each well after reaching convergence using the EM Algorithm. Note the V-shape (indicated by red dashed lines), which is much more pronounced in Well M08 than in Well K07. Also note an abrupt cutoff indicated by the yellow
arrows. The dynamic algorithm found that 12 Gaussians were required to parameterize the data of K07 and 18 Gaussians for the data of M08.
In Figure 4 we see a V-shape in both wells, which is much more pronounced in well M08. We also clearly observe
the presence of two distinct clusters and an abrupt change in the cluster on the left. Applying GMM to the data, the
dynamic EM algorithm finds that we need 12 and 18 Gaussians to parameterize the data in wells K07 and M08, respectively, using
equations 1 and 2.
Analyzing the marginal PDFs of K07 and M08 (Figure 5), we observe that the GMM provides a good match to the
data. A marginal PDF of a random variable is just the integral of the joint PDF with respect to the other random
variable.
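For a two-dimensional Gaussian mixture this integral has a closed form, because the marginal of each Gaussian component is itself Gaussian. As a short worked equation, write a_1 and a_2 for the two independent components (ICA1 and ICA2), and let the mixture have weights α_m, means μ_jm and covariances C_m. The notation N(x; mean, variance) for a one-dimensional Gaussian and the symbol [C_m]_11 for the first diagonal entry of C_m are assumptions introduced here for illustration.

```latex
p(a_1) \;=\; \int_{-\infty}^{\infty} p(a_1, a_2)\, \mathrm{d}a_2
       \;=\; \sum_{m=1}^{M} \alpha_m\, N\!\left(a_1;\ \mu_{1m},\ [\mathbf{C}_m]_{11}\right)
```

The marginal along the second component follows by exchanging the indices, so the marginal curves of a fitted mixture can be evaluated directly from its parameters, without numerical integration.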
Figure 5: Marginal PDFs of wells K07 and M08 across each independent component. The Gaussian Mixture Model provides an excellent match to the original data
After obtaining these results, we analyzed the changes in acoustic impedance along the wells in order to
explain why these distributions have similar features, i.e. why wells K07 and M08 are very similar to each other but
different from the other wells.
Figure 6: Vertical slices through the 3D acoustic impedance volume through wells K07 and M08. Note that the acoustic impedance in
the horizontal layers around both wells is quite similar, but there are subtle differences between them, as shown by the yellow arrows.
This suggests that the GMM is sensitive to the subtle lateral and vertical changes in our reservoir that are reflected in Figure 4.
In Figure 6, we see that the general behavior of the acoustic impedance volume around wells K07 and M08 is
similar. Away from the wells there is significant variation. This similarity, along with the fact that the PDFs and the
numbers of Gaussians needed to represent the data are similar, suggests that GMM is able to statistically represent the lateral
and vertical changes that exist in our reservoir.
Wells Garnet-Topaz
In Figure 7, we see that the clusters seen in wells K07 and M08 no longer form a V-shape but are almost
parallel to each other. Furthermore, the abrupt cutoff of the cluster on the left is now diffuse, with the trend smoother
and flatter in the Garnet well. There is greater spread in Topaz than in Garnet. When we apply GMM, we find we
need 6 Gaussians to represent Garnet and 7 Gaussians to represent Topaz.
Analyzing the marginal PDFs of Topaz and Garnet (Figure 8), we see that the GMM correctly matches the PDF
of the input data.
Figure 7: Gaussian mixture fit of the distribution of each well after reaching convergence using the EM Algorithm. Unlike the V-shape in Figure
4, the clusters are now almost parallel, while the abrupt edge in the cluster on the left is now more diffuse. There is a somewhat greater spread in Topaz than in Garnet.
Figure 8: Marginal PDFs of wells Topaz and Garnet across each independent component. The Gaussian Mixture Model correctly matches our
original data.
In Figure 9, we see that the general behavior of the acoustic impedance around the Garnet and Topaz wells is almost the same. Areas where there are
significant changes are indicated by the yellow arrows.
Figure 9: The general behavior in the acoustic impedance about the Garnet and Topaz wells is similar, with the exception of the zones indicated
by the yellow arrows. Note that there are zones of high impedance in Garnet which do not appear in Topaz.
Wells Jade and M05
The GMM PDFs from wells Jade and M05 shown in Figure 10 are different from each other and from the PDFs
shown in Figures 4 and 7. We see two well-defined clusters in the Jade well PDF, with a trend in the cluster on the
left that is neither completely flat nor totally sharp. Given the location of well Jade, we assume that we are in a
transition between the properties of Garnet-Topaz and K07-M08. The PDF of well M05 exhibits the same abrupt
change in the cluster on the left; we also observe a twin-elongated shape. The GMM algorithm required 10 Gaussians
to represent well M05 and 14 to represent Jade.
Figure 10: Gaussian mixture fit of the distribution of each well after reaching convergence using the EM Algorithm. The clusters in the two wells show independent behavior. In Jade, the trend in the cluster on the left is neither completely flat nor totally sharp (yellow arrow), so
we can assume that we are in a transition between the properties of Garnet-Topaz and K07-M08, which is plausible given the location of well Jade.
M05 shows a twin-elongated shape, and the abrupt change reappears.
In Figure 11, we observe that the model is still matching the input data.
Figure 11: Marginal PDFs of wells Jade and M05 across each independent component. The Gaussian Mixture Model (GMM) correctly matches our original data.
Analyzing the acoustic impedance, we observe that each well has its own behavior. Note in Figure 12 that there are
some subtle changes in the acoustic impedance; this may be the reason why we observe different distributions in
the wells.
Figure 12: There are clear differences in the acoustic impedance within both wells; this may be the reason that we observe different distributions
in the wells.
Conclusions
We propose a workflow based on independent component analysis and Gaussian mixture models that statistically
represents the variability measured in four well logs. This characterization is derived without any user intervention.
For this reason, it is called an “automatic learning” GMM.
The variability in the Gaussian Mixture Model (GMM) represents the lateral and vertical changes seen in acoustic
impedance within the reservoir. Although we used only six wells, we feel this statistical workflow can be useful in
clustering the thousands of wells that are currently used in modern resource plays. Further correlation to seismic
attribute clusters may provide a means to identify and map sweet spots and geohazards.
When thousands of wells are available in a resource play, the proposed workflow can determine which wells
are alike and which are different. We can then examine whether these similarities and differences correlate with
areas of good estimated ultimate recovery (EUR), or estimate the possibility of intersecting a geohazard within the reservoir.
Acknowledgements
We thank Parallel Petroleum LLC for the use of their data and the Attribute-Assisted Seismic Processing and
Interpretation (AASPI) consortium for its financial support. Graphics were made using licenses to Petrel, provided
to OU for research and education courtesy of Schlumberger.
References
Jayaram, V., and B. Usevitch, 2009, Active learning schemes for reduced dimensionality hyperspectral classification:
Invited paper, IEEE Asilomar Conference on Signals, Systems and Computers, Naval Postgraduate School,
Monterey, CA, November 2009.
Li, L., Z. H. Wan, S. F. Zhan, C. F. Tao, and X. H. Ran, 2013, Prediction of geological characteristic using Gaussian
mixture model: 75th EAGE Conference & Exhibition incorporating SPE EUROPEC 2013, London, UK, 10-13
June 2013.
Roy, A., 2013, Latent space classification of seismic facies: Ph.D. dissertation, University of Oklahoma.
Davogustto, O., 2013, Quantitative geophysical investigations at the Diamond M Field, Scurry County, Texas:
Ph.D. dissertation, University of Oklahoma.
Vest, E. L., 1970, Oil fields of Pennsylvanian-Permian Horseshoe atoll, west Texas, in Geology of giant petroleum
fields: American Association of Petroleum Geologists Special Volume 14, 185–203.
Walker, D. A., J. Golonka, A. M. Reid, and S. T. Reid, 1991, The effects of late Paleozoic paleolatitude and
paleogeography on carbonate sedimentation in the Midland Basin, Texas; Permian Basin plays: Society of Economic
Paleontologists and Mineralogists, Permian Basin Chapter, Tomorrow’s Technology Today, 141–162.
Galloway, W. E., T. E. Ewing, C. M. Garrett, N. Tyler, and D. G. Bebout, 1983, Atlas of major Texas oil reservoirs:
Bureau of Economic Geology.
Verma, S., A. Roy, R. Perez, and K. Marfurt, Mapping high frackability and high TOC zones in the Barnett Shale:
Supervised probabilistic neural network vs. unsupervised multi-attribute Kohonen SOM: Proceedings of the
82nd Annual Meeting of the Society of Exploration Geophysicists.
Ramondetta, P. J., 1982, Facies and stratigraphy of the San Andres Formation, northern and northwestern shelves of
the Midland Basin, Texas and New Mexico: Bureau of Economic Geology Report of Investigation No. 128.