Contentsreservoirsignals.com/wp...electrofaciesclassificationusingdata... · Priyank Srivastava (PE...

Priyank Srivastava (PE 5370: Mid-Term Project Report)

Contents

Executive Summary

PART-1: Identify Electrofacies from Given Logs using data mining algorithms

Selection of wells

Data cleaning and Preparation of data for input to data mining

Selection of data mining technique & Workflow

Mathematical Background of PCA and K-Means clustering

Interpretation of Results

Relationship of Predicted Electro facies with original variables

“The folly of trusting Data mining”

PART-2: Doing Clustering using SOM and “R” package

Clustering and SOM in ‘R’

PART-3: Clustering using Merged Dataset of all wells

Conclusion

Appendix – A: R Code for Part-III


Executive Summary

The objective of this project is to build a data mining model that estimates electrofacies from a set of open-hole well logs. The trained model can then be used as a predictive tool for estimating unknown logs at any new location. The workflow uses principal component analysis (PCA) and the K-means clustering algorithm to build the model.

This report is divided into three parts. In Part I, the data mining algorithm is run on individual wells, using different attributes for each well depending on availability. The resulting clusters are mapped back to the individual wells based on gamma ray values, which broadly show Facies 1 as high gamma ray, Facies 3 as a mixed sand-shale sequence and Facies 2 as low gamma ray. The presence of these facies is then correlated with the production rates of the different wells to assess the reservoir quality of each facies. Although K-means always converges, the answer it gives depends on the initial centers, and the centers it returns are averages of data points. Some of the wells (Young Joe; Flanik Randal) that do not have a complete dataset do not show any clusters, so it is difficult to generalize the interpretation from this model. Part I ends with a discussion of the disadvantages of K-means clustering. The unknown logs in these wells could be predicted with the present data mining model, but that is outside the scope of this project.

Data mining helps uncover hidden patterns in a dataset by exposing the relationships between attributes, but it also uncovers many patterns that are not useful. It is up to the domain expert to filter the patterns and accept only those that validly answer the objective question. Thus, in Part II some of the wells are clustered using self-organizing maps (SOM). In Part III, five attributes (GR, AT90, PEF, RHOB and NPHI) are merged for all 10 selected wells, and a similar workflow (PCA + K-means) is run to generate a generalized three-cluster model from which the different facies and their characteristics are identified.

To conclude, the findings of the Part III study are summarized in the following table:

Cluster name   Interpretation

1              Shales/sands with low porosity (0.09) and resistivity (9.12). Probably tight shales with high clay-bound water (since NPHI is high, 0.289).

3              Shales/sands with very low porosity (0.038) but higher resistivity (16.26) and grain density than Facies 1. Probably contains hydrocarbon saturation and less water.

2              Probably the hottest spot in this region, with good porosity and high hydrocarbon saturation. The well with the highest amount of Facies 2 will therefore be the most prolific producer.


PART-1: Identify Electrofacies from Given Logs using data mining algorithms

Selection of wells

I chose the wells according to their API numbers, so 10 wells in Parker County (API: 42-367) were selected. Not all wells have the same amount of data: some have processed logs and some do not. The table below gives the API number, well name and production rate for each chosen well.

API            Well name           Production rate* (Mscf/day)
42-367-34050   Moore               --
42-367-34447   Deaton              202
42-367-34576   Frank-Mask          830
42-367-34094   Sugar Tree          532
42-367-34227   Westhoff John       1029
42-367-34343   Flamik Randal       201
42-367-34385   Young Joe           779
42-367-34438   Kinyon              493
42-367-34744   Hagler              1365
42-367-34883   Lake Wheatherford   965

*From Drillinginfo.com

Based on the production rate, the wells can be divided into three categories. The goals of this project are (1) to classify each well into electrofacies and (2) to determine whether the performance of a well can be related to the newly classified electrofacies.

Data cleaning and Preparation of data for input to data mining

The logs given to us were processed and contain many redundant and missing parameters, so it is imperative to select and clean the data for the attributes we want as input to the data mining algorithms. We want to develop electrofacies for the upper and lower Barnett zones. The local subsurface stratigraphy is given in Figure 1; as observed, the Barnett Shale is divided into two parts by the Forestburg limestone. Thus, before feeding data into any data mining algorithm, we need to get rid of these limestone zones. Since the mud resistivity in all of the given logs is of the order of 0.4 ohm-meter, we can be sure that all the wells were drilled with water-based muds, and hence we can use the photoelectric (PE) log as a lithology indicator. Carbonates usually have high PE values of about 5, so the limestone can be screened out by keeping only the log values with PE < 4. Additional filtering is done by screening out all depths with density (RHOB) > 2.7 g/cc. Figure 2 shows the workflow used for cleaning and filtering the depths, so that the final output contains the depths and parameters of only the upper and lower Barnett Shale.


Figure 3 lists the attributes selected for each well. The Flamik Randal and Young Joe wells contain the fewest attributes.


Figure 1: General stratigraphy of the Ordovician to Pennsylvanian section in the Fort Worth Basin (Loucks & Ruppel, 2007)

Figure 2: Workflow for data cleaning

1. Select all the depths with PEF < 4.

2. Select all the depths with non-zero GR, RHOB and AT90, and with 0 < NPHI < 1.

3. Normalize every parameter with its mean and variance.
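The three cleaning steps of Figure 2 can be sketched in R as follows. This is only a minimal illustration: the data frame `logs` and all of its values are invented, the column names simply reuse the curve mnemonics from this report, and the normalization is assumed to be a z-score (centering on the mean and dividing by the standard deviation, which is what `scale()` does).

```r
# Hypothetical log table: column names follow the curve mnemonics used in this
# report, but every value below is invented for illustration.
logs <- data.frame(
  DEPTH = c(5000, 5001, 5002, 5003, 5004, 5005, 5006),
  GR    = c(150,  0,    140,  90,   160,  130,  120),
  PEF   = c(3.1,  3.0,  4.8,  3.3,  3.2,  2.9,  3.4),
  RHOB  = c(2.50, 2.52, 2.71, 2.55, 2.48, 2.60, 2.58),
  AT90  = c(20,   15,   300,  12,   0,    9,    14),
  NPHI  = c(0.20, 0.22, 0.05, 1.20, 0.25, 0.18, 0.21)
)
curves <- c("GR", "PEF", "RHOB", "AT90", "NPHI")

# Step 1: keep only depths with PEF < 4 (screens out the limestone)
keep <- logs$PEF < 4

# Step 2: require non-zero GR, RHOB and AT90, and 0 < NPHI < 1
keep <- keep & logs$GR != 0 & logs$RHOB != 0 & logs$AT90 != 0 &
  logs$NPHI > 0 & logs$NPHI < 1
clean <- logs[keep, ]

# Step 3: normalize each curve (assumed z-score: subtract mean, divide by sd)
scaled <- scale(clean[, curves])

print(clean$DEPTH)
print(round(scaled, 3))
```

The additional RHOB > 2.7 screen mentioned in the text could be added to `keep` in the same way.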


Figure 3: Summary of the meaningful curves that could be extracted from each well.

Moore (9 attributes): GR (Max: 368; Min: 18), PEF (Max: 6.2; Min: 2.2), AT90 (Max: 862; Min: 0.68), NPHI (Max: 0.397; Min: 0.002), RHOB (Max: 2.76; Min: 2.34), WCLC (Ave: 0.183), WILL (Ave: 0.69), WQUA (Ave: 0.471), VCL (Ave: 0.332)

Deaton (11 attributes): PEF (Max: 5.18; Min: 1.8), AT90, NPHI (Max: 0.374; Min: 0), RHOB (Max: 2.825; Min: 2.39), WCLC (Ave: 0.176), WDOL (Ave: 0.096), WILL (Ave: 0.136), WQUA (Ave: 0.474), WTOC (Ave: 0.022), VCL (Ave: 0.237)

Frank Mask (6 attributes): NPHI (Max: 0.30; Min: 0), RHOB (Max: 2.705; Min: 0), VCL (Ave: 0.289), PR (Ave: 0.227), CB (0.205)

Sugartree (10 attributes): PEF (Min: 0; Max: 9.776), AT90 (Min: 0.224; Max: 173), NPHI (Min: -0.014; Max: 0.569), RHOB (Min: 2.75; Max: 0.30), WILL, WQUA, VCL, PR, BULKMOD

Westhoff John (9 attributes): PEF (Min: 2.28; Max: 6.234), NPHI (Min: 0.002; Max: 0.397), RHOB (Max: 2.76; Min: 2.34), WCAR (Ave: 0.025), WCLC (Ave: 0.183), WILL (Ave: 0.311), WQUA (Ave: 0.471), VCL (Ave: 0.332)

Flamik Randal (5 attributes): GR (Min: 0; Max: 883), PEF (Min: 0; Max: 11.54), AT90 (Min: 0; Max: 927), NPHI (Min: 0; Max: 2.7), RHOB (Min: 0; Max: 164)

Young Joe (5 attributes): GR (Min: 0; Max: 883), PEF (Min: 0; Max: 11.54), AT90, NPHI, RHOB

Kinyon (7 attributes): GR, PEF, AT90, NPHI, RHOB, PR, YME

Hagler (9 attributes): GR, PEF, AT90, NPHI, RHOB, WCLC, WILL, WQUA, VCL

Lake Wheatherford: GR, PEF, AT90, NPHI, RHOB, WILL, WQUA, WPYR


Selection of data mining technique & Workflow

Because of the high volume of log data, it is desirable to start with unsupervised data mining techniques to find out whether the data contain any hidden trends or patterns. Since many wells have as many as 200 log attributes, it is necessary to reduce the dimensionality of the data before applying any clustering algorithm. I use principal component analysis (PCA) to reduce the data to three principal components and then use the K-means clustering algorithm to generate clusters in the data. Figure 4 gives the PCA and clustering density plots for the different wells in sequence. Clustering is done using the X-means algorithm, which automatically optimizes the number of clusters by iteration. However, given the uneven cluster sizes shown in Figure 4, it can be argued that this method is not giving us the clusters we want: in its quest to minimize the within-cluster sum of squared errors, X-means gave more weight to the larger clusters. This clustering technique is therefore poorly suited to this case, since K-means assumes that each cluster has roughly the same number of observations. Also, PCA is a method for correlated attributes, since it needs the variance to be concentrated in particular directions; if the data do not show any correlation, then applying PCA is not meaningful.

Table 1: Parameters used in X-means clustering and PCA analysis

PCA
  No. of components          Selected to retain 90% of the variance

X-means clustering
  Min. clusters              2
  Max. clusters              60
  Numerical measure          Euclidean distance
  Max. runs                  10
  Max. optimization steps    100
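The PCA setting in Table 1 (keep enough components to retain 90% of the variance) might be sketched in R as below. The curves are invented, deterministic stand-ins for real logs, and the variable names (`logs`, `cum_var`, `n_comp`, `scores`) are illustrative only; `prcomp` is base R, as used in Appendix A.

```r
# Hypothetical, deterministic "log curves": three correlated curves and one
# independent curve (invented values, not data from this report).
t    <- seq_len(60)
GR   <- 120 + 40 * sin(t / 6)
NPHI <- 0.10 + 0.001 * GR + 0.01 * cos(t / 2)
RHOB <- 2.80 - 0.002 * GR + 0.01 * sin(t / 3)
PEF  <- 3.2 + 0.3 * cos(t / 4)
logs <- data.frame(GR, NPHI, RHOB, PEF)

# PCA on centered and scaled attributes
pca <- prcomp(logs, center = TRUE, scale. = TRUE)

# Fraction of the total variance carried by each component, and its running sum
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components whose cumulative variance reaches 90%
n_comp <- which(cum_var >= 0.90)[1]

# Scores on the retained components: this is what would be passed to clustering
scores <- pca$x[, seq_len(n_comp), drop = FALSE]

print(round(cum_var, 3))
print(n_comp)
```

Because the first three curves are strongly correlated, fewer components than attributes are needed here, which is the situation in which PCA is useful.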


Figure 4: PCA density plots with X-means clustering for the following wells, in order from top left: 1. Moore 2. Deaton 3. Frankmask 4. Sugar tree 5. Westhoff John 6. Flanik Randal 7. Young Joe 8. Kinyon 9. Hagler 10. Lake Wheatherford. With X-means clustering, most of the wells can be described by three clusters in the PCA data, but wells 6 and 7 do not display any specific clusters.


Mathematical Background of PCA and K-Means clustering

PCA is a technique for reducing the dimensionality of a dataset with correlated attributes. The first principal component is the direction of maximum variance in the data, and the principal components are uncorrelated and orthogonal to each other. Every attribute needs to be scaled before PCA is applied. PCA is a very useful tool for exploratory data analysis and predictive modelling of high-dimensional datasets. While PCA helps reveal internal patterns in the data, the next step in data mining is clustering. Although the literature is rich with algorithms for efficient clustering, the fundamental workflow for clustering is shown in Table 2.

Table 2: Workflow for clustering algorithms

Interpretation of Results

Since the principal components themselves do not have any physical meaning, I have to map the predicted clusters back to the original data.

The table below gives the distribution of data points among the clusters for all the analyzed wells:

Well name           Data points used after cleaning   Cluster 1   Cluster 2   Cluster 3   Cluster 4
Moore               1884                              629         467         788         --
Deaton              2264                              1729        125         410         --
Frank mask          3212                              2642        570         --          --
Sugar tree          925                               581         56          288         --
Westhoff john       8016                              6539        1477        --          --
Flanik Randal       500                               240         115         124         21
Young Joe           80                                37          8           35          --
Kinyon              6462                              538         1680        4244        --
Hagler              2535                              1211        801         523         --
Lake Wheatherford   5085                              1178        3121        786         --
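Mapping clusters found in PCA space back to the original log variables, and counting the points in each cluster as in the table above, could look like the following sketch. The log values are invented and deliberately form three well-separated groups; the object names are illustrative, and the `aggregate` call follows the pattern used in Appendix A.

```r
# Hypothetical cleaned logs (illustrative values, not from the report)
logs <- data.frame(
  GR   = c(180, 175, 190, 90, 95, 85, 130, 135, 128),
  RHOB = c(2.45, 2.47, 2.44, 2.62, 2.60, 2.63, 2.55, 2.54, 2.56),
  NPHI = c(0.28, 0.27, 0.29, 0.10, 0.11, 0.09, 0.19, 0.18, 0.20)
)

# Cluster in PCA space, using the first two principal components
pca <- prcomp(logs, center = TRUE, scale. = TRUE)
set.seed(42)  # K-means starts from random centers
fit <- kmeans(pca$x[, 1:2], centers = 3, nstart = 25)

# Attach the cluster label to the original (physical) variables
labelled <- data.frame(logs, cluster = fit$cluster)

# Number of data points per cluster
counts <- table(labelled$cluster)

# Mean of each original variable per cluster, for physical interpretation
means <- aggregate(labelled[, c("GR", "RHOB", "NPHI")],
                   by = list(cluster = labelled$cluster), FUN = mean)

print(counts)
print(means)
```

The cluster numbers themselves are arbitrary labels; only the per-cluster averages of the original curves carry physical meaning.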

Workflow steps for clustering algorithms (Table 2):

1. Determine the number of clusters (centroids) to be placed.

2. Find the distance of each data point to each centroid and assign each data point to a centroid by minimizing the sum of distances.

3. Find the centroid of each cluster formed in the first iteration and reclassify each data point to its cluster.

4. Recompute the centroids and reclassify based on minimizing the sum of distances from the centroids.

5. Iterate until the assignments converge and the number of clusters is optimized.
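The workflow steps listed above can be written out as a short R function. This is a minimal Lloyd-style sketch with a fixed number of clusters, not the X-means implementation used for the results in this report (X-means additionally searches over the number of clusters); the function name `simple_kmeans` and the sample points are hypothetical.

```r
# Minimal Lloyd-style K-means with a fixed number of clusters.
# x: numeric matrix of points (one row per point)
# centers: matrix of initial centroids (one row per cluster)
simple_kmeans <- function(x, centers, max_iter = 100) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every centroid
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Assign each point to its nearest centroid
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop when no assignment changes
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Recompute each centroid as the mean of its members
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers)
}

# Two obvious groups of invented points; start from one point in each group
x <- rbind(c(1, 1), c(1.2, 0.8), c(0.9, 1.1),
           c(5, 5), c(5.1, 4.9), c(4.8, 5.2))
fit <- simple_kmeans(x, centers = x[c(1, 4), ])
print(fit$cluster)
print(fit$centers)
```

In practice the built-in `kmeans` function (used in Appendix A) would be preferred; the explicit loop is shown only to make the iteration visible.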


Relationship of Predicted Electro facies with original variables

Figure 5: The Moore well can be subdivided into three electrofacies using data mining, which can be correlated with gamma ray values. Facies 1 shows high gamma ray and is most probably a shale interval, while Facies 2 has lower radioactivity than Facies 1. Facies 3 has the lowest gamma ray reading.

[Plot: GR & Electrofacies For Moore Well. Depth (4400 to 5600) against a 0 to 400 scale. Legend: GR, ELECTROFACIES. Annotated intervals: Facies 1 Dominated, Facies 3 Dominated, Facies 2 Dominated.]


Figure 6: The Deaton well seems to contain only Facies 1 and Facies 3, with very little Facies 2. In the Frank mask well only two types of facies are present, but it is not easy to classify them based on the gamma ray log alone.

[Plot: GR & Electrofacies for Deaton Well. Depth (4900 to 6100) against a 0 to 400 scale. Legend: GR, ELECTROFACIES. Annotated intervals: Facies 1 Dominated, Facies 3 Dominated, Facies 1 Dominated.]

[Plot: GR & Electrofacies Frank mask. Depth (5400 to 6800) against a 0 to 400 scale. Legend: GR, ELECTROFACIES. Annotated intervals: Facies 1 Dominated, Facies 2 Dominated, Facies 1 Dominated.]


[Plot: GR & Electrofacies Kinyon. Depth (5600 to 7000) against a 0 to 400 scale. Legend: GR, Facies 3, Facies 2, Facies 1.]

[Plot: GR & Electrofacies Hagler. Depth (5600 to 7000) against a 0 to 400 scale. Legend: GR, Facies 3, Facies 2, Facies 1.]


“The folly of trusting Data mining”

Most data mining algorithms are heuristic processes that can be applied without any physical understanding. Data mining is supposed to show us hidden trends; however, applying any data mining task blindly can lead to completely wrong outputs. Some caveats of applying K-means clustering to real-life datasets are given below.

1. K-means assumes that the variance of the distribution of each attribute is spherical.

2. It does not work well on non-spherical datasets. Usually, the higher the dimensionality of the data, the more difficult it is to apply K-means efficiently.

3. The curse of unevenly sized clusters: K-means assumes that the prior probability of all K clusters is the same, i.e. each cluster has roughly the same number of observations, which is clearly not the case for our dataset.
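A further caveat, noted in the Executive Summary, is that the answer given by K-means depends on the initial centers. The sketch below illustrates this on four invented points at the corners of a 4-by-1 rectangle: with Lloyd's algorithm, one choice of starting centers stays at a poor split while another finds the natural left/right split. The point coordinates and starting centers are chosen purely for illustration.

```r
# Four points at the corners of a wide rectangle (invented for illustration)
x <- rbind(c(0, 0), c(0, 1), c(4, 0), c(4, 1))

# Start 1: centers between the left and right pairs, one per row of points
bad_start  <- rbind(c(2, 0), c(2, 1))
# Start 2: one center on the left pair, one on the right pair
good_start <- rbind(c(0, 0.5), c(4, 0.5))

# Lloyd's algorithm is requested explicitly so that the iteration is the plain
# assign-then-recompute loop
fit_bad  <- kmeans(x, centers = bad_start,  algorithm = "Lloyd")
fit_good <- kmeans(x, centers = good_start, algorithm = "Lloyd")

# Total within-cluster sum of squares for each solution
print(fit_bad$tot.withinss)
print(fit_good$tot.withinss)
print(fit_bad$cluster)
print(fit_good$cluster)
```

Both runs converge, but to different partitions with different within-cluster sums of squares, which is why multiple random starts (the `nstart` argument used in Appendix A) are commonly used.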

PART-2: Doing Clustering using SOM and “R” package

Figure 7 shows a self-organizing map (SOM) U-matrix plot with K-means clustering for all the wells, using the same attributes as in Part 1.

Figure 7: SOM clustering for Moore well

However, it is again difficult to evaluate the accuracy of the clustering.

Clustering and SOM in ‘R’

‘R’ provides some flexibility and quality checks for clustering. The filtered data obtained from the Part 1 data cleaning workflow, with the additional constraint of GR > 120, are used as input to R, and I used the K-means clustering technique to see how it performs. This is done for the following four wells: Moore, Deaton, Frankmask and Kinyon. This section describes the results of using ‘R’.

Figure 8 : Clustering Optimization for Moore well

Figure 9: Clustering optimization of Deaton Well


Figure 10 : Clustering Optimization for Frank mask well

Figure 11: Clustering optimization of Kinyon Well


Figure 12 : Clustering optimization of Hagler Well

PART-3: Clustering using Merged Dataset of all wells

Names of selected wells: this time I used only the wells that contain all five of these curves, i.e. GR, AT90, PEF, NPHI and RHOB. The following wells were selected for the analysis:

Bonds ranch C-1

Hyder 1H

Jerome Russell

John W Porter 3

Massey Unit

McFarland-Dixon

Moore-Price

Sol Carpenter Heirs

Sugar tree

Upham Joe Johnson
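Selecting only the wells that contain all five curves and stacking them into one merged table might be done as in the sketch below. The well names (`well_a`, `well_b`, `well_c`) and all values are hypothetical placeholders, not the wells or data listed above.

```r
# The five curves required for the merged analysis
curves <- c("GR", "AT90", "PEF", "NPHI", "RHOB")

# Hypothetical per-well tables (invented values); well_c lacks PEF and AT90
wells <- list(
  well_a = data.frame(DEPTH = c(5000, 5001), GR = c(150, 140),
                      AT90 = c(20, 18), PEF = c(3.1, 3.2),
                      NPHI = c(0.20, 0.21), RHOB = c(2.50, 2.52),
                      VCL = c(0.30, 0.31)),
  well_b = data.frame(DEPTH = c(6000, 6001, 6002), GR = c(130, 135, 128),
                      AT90 = c(12, 14, 13), PEF = c(3.3, 3.2, 3.4),
                      NPHI = c(0.18, 0.19, 0.17), RHOB = c(2.55, 2.56, 2.54)),
  well_c = data.frame(DEPTH = c(5500, 5501), GR = c(160, 155),
                      NPHI = c(0.25, 0.24), RHOB = c(2.48, 2.49))
)

# Keep only wells that have every required curve
has_all  <- vapply(wells, function(d) all(curves %in% names(d)), logical(1))
selected <- wells[has_all]

# Stack the common curves from the selected wells, tagging rows with the well
merged <- do.call(rbind, lapply(names(selected), function(w) {
  data.frame(WELL = w, selected[[w]][, curves])
}))

print(merged)
```

Extra curves that exist in only some wells (such as `VCL` in `well_a`) are dropped so that every row of the merged table has the same columns.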

Applying the same workflow to the merged dataset gives the following three clusters, as shown in Figure 13.

Figure 13: PCA clusters for merged dataset


The table below gives the centroid of each cluster:

Cluster number   PC1      PC2      Avg. GR (API)   Avg. DPHI   Avg. PEF   Avg. AT90   Avg. RHOB   Avg. NPHI
2                -1.455   0.08     154             0.124       3.13       152         2.49        0.177
3                1.5113   0.8375   137             0.038       3.19       16.26       2.64        0.191
1                1.2253   -1.647   134             0.09        3.33       9.12        2.55        0.289

Conclusion

The clusters can be interpreted as follows:

Cluster name   Interpretation

1              Shales/sands with low porosity (0.09) and resistivity (9.12). Probably tight shales with high clay-bound water (since NPHI is high, 0.289).

3              Shales/sands with very low porosity (0.038) but higher resistivity (16.26) and grain density than Facies 1. Probably contains hydrocarbon saturation and less water.

2              Probably the hottest spot in this region, with good porosity and high hydrocarbon saturation. The well with the highest amount of Facies 2 will therefore be the most prolific producer.


Appendix – A: R Code for Part-III

setwd("C:/Users/priya/Desktop/DMP_midterm/R")
# read the merged well-log table and set missing values to zero
ms <- read.table("Book1_final.csv", header = TRUE, sep = ",")
ms[is.na(ms)] <- 0
str(ms)
# keep only the columns used in the analysis
ms <- ms[, c(1, 2, 4, 5, 6, 7, 8)]
# keep rows with PEF < 4 and GR > 110
msfilter <- ms[(ms$PEF < 4 & ms$GR > 110), ]
## Doing k means clustering in r
par(mfrow = c(1, 3), mar = c(4, 4, 2, 1))
## applying PCA to the centered and scaled variables
mspca <- prcomp(msfilter, center = TRUE, scale. = TRUE, retx = TRUE)
fulldata <- data.frame(msfilter, mspca$x)
mydata <- mspca$x
# Determine number of clusters from the within-groups sum of squares
wss <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))
for (i in 2:15) wss[i] <- sum(kmeans(mydata, centers = i)$withinss)
plot(1:15, wss, type = "b", xlab = "Number of Clusters", ylab = "Within groups sum of squares")
# save a copy of the plot to PDF and close the PDF device
dev.copy(pdf, "myplot.pdf")
dev.off()
# K-means with three clusters and 50 random starts
fit <- kmeans(mydata, 3, iter.max = 100, nstart = 50)
# get cluster means
aggregate(mydata, by = list(fit$cluster), FUN = mean)
# append cluster assignment
mydata <- data.frame(fulldata, fit$cluster)
library(cluster)
clusplot(mydata, fit$cluster, color = TRUE, shade = TRUE, labels = 0, lines = 0)
write.table(mydata, "C:/Users/priya/Desktop/DMP_midterm/R/mergeddata.txt", sep = "\t")

