
Journal of Geochemical Exploration 103 (2009) 6–16

Contents lists available at ScienceDirect

Journal of Geochemical Exploration

journal homepage: www.elsevier.com/locate/jgeoexp

Kohonen neural network and factor analysis based approach to geochemical data pattern recognition

Xiang Sun a,b,⁎, Jun Deng a, Qingjie Gong a, Qingfei Wang a, Liqiang Yang a, Zhongying Zhao c

a State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences, 29 Xuyuan Street, Beijing, 100083, PR China
b Department of Resource and Environment, Liaoning Technical University, 47 Zhonghua Street, Fuxin, 123000, PR China
c Department of Resource and Information, China University of Petroleum, 18 Fuxue Street, Beijing, 102249, PR China

⁎ Corresponding author. State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences, 29 Xuyuan Street, Beijing, 100083, PR China. Tel.: +86 10 6234 9020.

E-mail address: [email protected] (X. Sun).

0375-6742/$ – see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.gexplo.2009.04.002

Article info

Article history:
Received 13 October 2008
Accepted 27 April 2009
Available online 20 May 2009

Keywords:
Kohonen neural network
Factor analysis
Pattern recognition
Geochemical data

Abstract

Kohonen neural network (KNN) and factor analysis are applied to regional geochemical pattern recognition for a Pb–Zn–Mo–Ag mining area around Sheduolong in Qinghai Province, China. Prior to factor analysis, the geochemical data are classified by KNN. The results demonstrate that the 4-factor model accounts for 67% of the variation in the data. Factor F1, a Pb–Zn–Mo factor, and Factor F4, an Au–Ag factor, correlate with monzonitic granite intrusions and particularly with Pb–Zn–Mo–Ag mineralization within those rocks. Factor F2, an As–Co factor, correlates with metamorphic rocks of the Paleoproterozoic Baishahe formation. Factor F3, a Bi–Cu factor, correlates with granodiorite intrusions. The factor score maps suggest a revised location of faults and their mineralization significance in the coarse geological map. The approach not only effectively interprets the geological significance of the factors, but also reduces the area of exploration targets.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Metallogenesis is a complicated dynamic process. The origin and evolution of deposits reflect not only the motion of matter, but also the combined effect of a variety of factors in geochemical fields (Yu et al., 1993; Zhai, 1999). Areas of mineralization usually show more obvious changes than the background, expressed in material composition, structure, geophysics and geochemistry (Zhang, 1992; Zhao et al., 1995). As a result, the pattern recognition of geochemical data is very important to mineral prospecting.

Since the 1970s, pattern recognition techniques have been applied to recognize the geological and economic mineralogical information hidden in geochemical data and to establish multivariate geochemical background patterns (Castillo-Munoz and Howarth, 1976; Gustavsson and Bjorklund, 1976; Xie, 1979; Lindqvist et al., 1987; Cheng, 1999; Li et al., 1999; Cheng, 2000; Agterberg, 2001; Cheng, 2004; Kaminskas, 2004). Recently, they have also been applied to investigate the relations between regional geochemical patterns and large ore deposits (Deng et al., 2001; Xie et al., 2004; Deng et al., 2007; Wang et al., 2007; Deng et al., 2008).

Typical pattern recognition methods used in geochemical exploration mainly consist of discriminant analysis, cluster analysis and factor analysis (Castillo-Munoz and Howarth, 1976; Ji and Chen, 1993; Clemens et al., 2002). The applications of discriminant analysis are limited because training samples are often hard to obtain, and those of cluster analysis and factor analysis are limited because it is difficult to cluster so many samples in large data sets, to show the spatial texture of geochemical data and to interpret the types of sample in terms of the corresponding types of variable (Cheng et al., 1994; Reimann et al., 2002; Ji et al., 2007).

Efforts to classify large data sets and study the spatial texture of geochemical data using artificial neural networks (ANN) have shown encouraging results (Li, 1999; Wang et al., 2002; Sun, 2007). The performance of ANN based techniques depends highly on the relationship between the patterns used in training the networks and the expected forecast patterns. If the diversity or the inconsistency between the training patterns and the expected forecast patterns is strong, the forecast errors of the ANN technique may be relatively high (Chow and Leung, 1996). The Kohonen neural network (KNN), usually performed in an unsupervised way to map high-dimensional data onto a low-dimensional framework of neurons, is one of the most fascinating topics in the neural network field. The advantages of the KNN model are that it enables and facilitates a thorough investigation of the high-dimensional data space, and that the algorithm does not require a teacher's signals and is not very complicated (Kohonen et al., 1996).

The objective of this paper is to apply both the Kohonen neural network and factor analysis to geochemical data pattern recognition for a Pb–Zn–Mo–Ag mining area around Sheduolong in Qinghai Province, China.

Fig. 1. Structure of Kohonen neural network. R: number of elements in input vector P; S: number of map nodes in SOM layer; n: number of weights.

2. Kohonen neural network

A neural network consists of numerous computational elements (neurons or nodes) that are highly interconnected. A weight is associated with every connection. Normally nodes are arranged into layers. During a training procedure input vectors are presented to the input layer with or without specifying the desired output. According to this difference neural networks can be classified as supervised or unsupervised (self-organizing) neural nets. Networks can also be classified according to the input values (binary or continuous). The learning procedure itself contains three main steps: the presentation of the input sample, the calculation of the output and the modification of the weights by specified training rules. These steps are repeated until the network is said to be trained (Rojas, 1996).

KNN, also known as self-organizing map (SOM), consists of two layers of neurons: an input layer and an SOM layer, which can be imagined as a rubber net stretched over the regions of the input space where input vectors occur (Fig. 1(1)). The array of input nodes operates simply as a flow-through layer for the input vectors. Each node in the SOM layer is fed by the input vector and is equipped with a weight vector. The weight vectors of the map nodes must have the same dimension as the input vectors, or the algorithm will not work (Kohonen, 1989; Song and Hopke, 1996; Kim et al., 2002; Hoffmann, 2005; Marini et al., 2005; Bianchi et al., 2007).

Fig. 2. Simplified geological map of Sheduolong area, Qinghai province, China. a: Quaternary; b: metamorphic rocks of Paleoproterozoic Baishahe formation; c: Indosinian granodiorite; d: Indosinian monzonitic granite; e: Indosinian syenites; f: fault; g: Ag–Pb–Zn–Mo mineralization land.

Table 1
Statistics summary of the total samples.

Element                             Au⁎      Ag⁎       As       Sn       Bi       Co       Cu       Pb       Zn       Mo
DL⁎                                 0.1     0.01      0.1      0.1      0.1      0.2      0.1      0.2      0.5      0.1
<DL⁎                                  0        0        0        0        0        0        0        0        0        0
Minimum                            0.50     1.00     1.50     0.24     0.12     2.56     5.70    11.17    10.00     0.16
Maximum                            7.80  5000.00    83.66    15.00   151.67    24.38   174.40   491.30   584.57    94.00
Mean                               1.50    45.72    14.50     1.15     8.84    11.23    25.37    28.06    70.15     1.09
Std. Deviation                     0.86   268.11     5.60     1.51     6.33     2.57     8.15    20.55    23.10     2.01
Coefficient of variation (%)      57.33   586.42    38.62   131.30    71.61    22.89    32.12    73.24    32.93   184.40
Lower quartile                     0.90     3.60    11.50     0.35     1.23     9.36    22.09    22.13    58.99     0.73
Median                             1.30     6.80    14.90     0.38    10.69    11.80    24.56    24.61    66.74     0.86
Upper quartile                     1.80    47.98    17.10     1.40    12.17    12.96    27.33    27.68    75.60     1.09

⁎ The unit of Au and Ag is 10⁻⁹, while that of the others is 10⁻⁶. The data are from the Non-Ferrous Metal Geological Exploration Bureau, Qinghai Province, China. DL: detection limit; <DL: % of samples below the detection limit.

Fig. 1(1) shows the structure of an SOM network with 15 map nodes arranged in 3 rows and 5 columns. There are R input nodes in the input layer. Each map node is connected to each input node, and map nodes are not connected to each other. Therefore, the total number of connections is 15×R; with 5 input nodes, for example, there are 75 connections between map nodes and input nodes. The nodes are organized in this manner because a 2-D grid makes it easy to visualize the results. This representation is also useful when the SOM algorithm is used. In this configuration, each input vector is an array of floating-point values, and each map node holds an array of floating-point weights and has a unique coordinate. This makes it easy to reference a node in the network and to calculate the distances between nodes. Because they are connected only to the input nodes, the map nodes are oblivious to the values of their neighbors. A map node updates its weights based only on what the input vector tells it. In Fig. 1(2), the DIST box accepts the input vector P and the input weight matrix W, and calculates the distances between P and W. The SOM transfer function C produces a 1 for the output element a_i corresponding to i, the winning neuron. All other output elements in a are 0.
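The DIST box and the transfer function C of Fig. 1(2) can be illustrated with a minimal Python sketch. The function names `dist` and `compet`, the example weight matrix and the example input are assumptions made for illustration only; they are not taken from the authors' program.

```python
import math

def dist(p, weights):
    """Euclidean distance between input vector p and each map node's weight vector."""
    return [math.sqrt(sum((pi - wi) ** 2 for pi, wi in zip(p, w))) for w in weights]

def compet(distances):
    """Winner-take-all output: 1 for the closest map node, 0 for all others."""
    winner = min(range(len(distances)), key=distances.__getitem__)
    return [1 if i == winner else 0 for i in range(len(distances))]

# Hypothetical example with R = 2 input elements and S = 3 map nodes.
W = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]   # one weight vector per map node
P = [0.9, 1.2]                             # input vector
a = compet(dist(P, W))
print(a)  # [0, 1, 0]: the second node is the winning neuron
```

Each weight vector has the same length as the input vector, which is the dimension requirement stated in the text.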

The KNN algorithm can be broken up into 8 steps (Guthikonda, 2005).

1) Randomize the map nodes' weight vectors.
2) Grab an input vector and present it to the network.
3) Traverse each node in the map.
4) Use the Euclidean distance formula to find the similarity between the input vector and each map node's weight vector (Eq. 1).
5) Track the node that produces the smallest distance (this node is the best matching unit, BMU).
6) Calculate the radius of the neighborhood of the BMU. This value starts large, typically at the radius of the network, and diminishes at each time step (Eqs. 2a, 2b).
7) Adjust any nodes found within the radius of the BMU, calculated in 6), to make them more like the input vector (Eqs. 3a, 3b). The closer a node is to the BMU, the more its weights are altered (Eq. 3c).
8) Repeat from 2) for N iterations.

The equations utilized by the algorithm are as follows:

Eq. 1: Calculate the BMU.

$$\mathrm{DistFromInput}^2 = \sum_{i=1}^{n} (I_i - W_i)^2$$

I = current input vector
W = node's weight vector
n = number of weights

Eq. 2a: Radius of the neighborhood.

$$\sigma(t) = \sigma_0 \, e^{-t/\lambda}$$

t = current iteration
λ = time constant (Eq. 2b)
σ0 = radius of the map

Eq. 2b: Time constant.

$$\lambda = \mathrm{numIterations} / \mathrm{mapRadius}$$

Eq. 3a: New weight of a node.

$$W(t+1) = W(t) + \Theta(t)\, L(t)\, \bigl(I(t) - W(t)\bigr)$$

Eq. 3b: Learning rate.

$$L(t) = L_0 \, e^{-t/\lambda}$$

Eq. 3c: Distance from BMU.

$$\Theta(t) = e^{-\mathrm{distFromBMU}^2 / (2\sigma^2(t))}$$
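The eight steps and Eqs. 1 to 3c can be combined into a short training loop. The following Python sketch is only an illustration under stated assumptions: the function names, the initial learning rate `l0 = 0.9`, the choice of half the larger grid dimension as the "radius of the map", the random seed and the sample values are all hypothetical and do not reproduce the authors' program.

```python
import math
import random

def squared_distance(x, w):
    """Eq. 1: squared Euclidean distance between input x and weight vector w."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

def best_matching_unit(x, nodes):
    """Steps 3-5: grid coordinate of the node whose weights are closest to x."""
    return min(nodes, key=lambda rc: squared_distance(x, nodes[rc]))

def train_som(data, rows, cols, num_iterations, l0=0.9, seed=0):
    rng = random.Random(seed)
    n = len(data[0])
    # Step 1: random initial weights for every node of the rows x cols grid.
    nodes = {(r, c): [rng.random() for _ in range(n)]
             for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0          # assumed "radius of the map"
    lam = num_iterations / sigma0           # Eq. 2b
    for t in range(num_iterations):         # Step 8: repeat for N iterations
        x = rng.choice(data)                # Step 2: present an input vector
        bmu = best_matching_unit(x, nodes)
        sigma = sigma0 * math.exp(-t / lam) # Eq. 2a (Step 6)
        rate = l0 * math.exp(-t / lam)      # Eq. 3b
        for (r, c), w in nodes.items():
            d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            if d2 <= sigma ** 2:            # Step 7: node lies inside the radius
                theta = math.exp(-d2 / (2.0 * sigma ** 2))   # Eq. 3c
                for i in range(n):
                    w[i] += theta * rate * (x[i] - w[i])     # Eq. 3a
    return nodes

# Hypothetical two-variable samples, scaled to [0, 1].
samples = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95], [0.5, 0.5]]
som = train_som(samples, rows=3, cols=3, num_iterations=500)
groups = [best_matching_unit(s, som) for s in samples]
print(groups)  # one (row, column) map node per sample
```

After training, the map node returned by `best_matching_unit` serves as the group label of a sample, so a 3×3 map yields at most 9 groups.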

3. Program for pattern recognition of geochemical data

A systematic program for pattern recognition of geochemical data can be built up based on KNN and factor analysis. Experience shows that the key tasks in geochemical data interpretation can be achieved with the following steps:

(1) Classification of the types of samples by using KNN.
(2) Factor analysis of all samples and of selected groups of samples.
(3) Interpretation and pattern recognition of the results of factor analysis.

4. Example of application

4.1. Study area and data

The study area, located in a Pb–Zn–Mo mining area around Sheduolong, Qinghai province, China, covers about 80 km² (Fig. 2). In order to define exploration targets for Ag–Pb–Zn–Mo deposits and enhance the reserves, the Qinghai Geology Survey carried out a 1:50,000 stream sediment survey. The geochemical data set, provided by the Qinghai Geology Survey, contains 2860 samples and 10 components, i.e. all samples were analyzed for Au, Ag, As, Sn, Bi, Co, Cu, Pb, Zn and Mo. Table 1 shows the statistics of the elements in the 2860 stream sediment samples. Ag has the highest standard deviation (268.11) and coefficient of variation (586.42%). Fig. 3(11) shows the medians, upper and lower quartiles, outliers and extremes of distributions within groups. Ag, Pb and Zn have more outliers and extremes, which may be indicative of mineralization.

Fig. 3. Box plot of data. Box: The difference between the upper and lower quartiles, also called the interquartile range. Outliers: Cases with values that are between 1.5 and 3 box lengths from either end of the box. Extremes: Cases with values more than 3 box lengths from either end of the box. Whiskers at the ends of the box show the distance from the end of the box to the largest and smallest observed values that are less than 1.5 box lengths from either end of the box. Fig. 3(1–10) are the box plots of groups, while Fig. 3(11) is that of total samples. Au, Au1, Au2, Au3, Au4, Au5, Au6, Au7, Au8 and Au9 represent the samples of 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th groups and total samples respectively. The same as Ag, As, Sn, Bi, Cu, Co, Pb, Zn and Mo.

Table 2
Descriptive statistics of the group samples.

Descriptive statistics               Au       Ag       As       Sn       Bi       Co       Cu       Pb       Zn       Mo

1st group (n=540)
Minimum                            0.50     1.00     7.48     0.28     7.26     9.84    20.74    15.88    53.95     0.16
Maximum                             2.5     15.7    21.00     3.63   151.67    17.13    52.54    62.81   110.70     2.87
Mean                               1.13     4.46    13.82     0.45    13.12    13.67    27.85    25.51    74.68     0.88
Concentration coefficient (a)      0.75     0.10     0.95     0.39     1.48     1.22     1.10     0.91     1.06     0.81
Std. Deviation                     0.39     2.80     2.74     0.23     7.73     1.02     4.37     5.93    11.41     0.27
Coefficient of variation (%)      34.51    62.78    19.83    51.11    58.92     7.46    15.69    23.25    15.28    30.68

2nd group (n=312)
Minimum                            0.50     1.00    11.20     0.29     8.75     7.43    18.52    14.21    46.48     0.46
Maximum                            1.70    22.80    31.90     2.40    35.45    14.72    59.60   112.90    97.67     2.29
Mean                               0.94     5.89    17.16     0.37    11.66    12.38    25.83    24.08    70.56     0.94
Concentration coefficient          0.63     0.13     1.18     0.32     1.32     1.10     1.02     0.86     1.01     0.86
Std. Deviation                     0.24     2.95     2.14     0.14     1.92     0.76     4.47     6.09     8.14     0.26
Coefficient of variation (%)      25.53    50.08    12.47    37.84    16.47     6.14    17.31    25.29    11.54    27.66

3rd group (n=179)
Minimum                            0.50     1.00     4.69     0.29     0.18     6.35    14.31    15.66    52.31     0.21
Maximum                            3.30   858.60    83.66    10.00    18.66    24.38   174.40   491.30   584.57    94.00
Mean                               1.32    91.44    20.50     2.10     4.86    13.44    37.17    54.54   106.37     2.03
Concentration coefficient          0.88     2.00     1.41     1.83     0.55     1.20     1.47     1.94     1.52     1.86
Std. Deviation                     0.57   121.27    11.44     1.50     5.51     2.56    19.62    63.59    60.27     7.39
Coefficient of variation (%)      43.18   132.62    55.80    71.43   113.37    19.05    52.78   116.59    56.66   364.04

4th group (n=302)
Minimum                            1.20     1.00     8.80     0.29     7.24     9.06    19.67    13.56    52.55     0.28
Maximum                            2.30    13.00    20.90     1.33    22.03    14.16    40.35    58.10    91.04     1.85
Mean                               1.77     5.62    15.36     0.36    11.89    12.07    24.86    23.36    69.35     0.88
Concentration coefficient          1.18     0.12     1.06     0.31     1.35     1.07     0.98     0.83     0.99     0.81
Std. Deviation                     0.25     2.66     2.17 7.56E−02     2.14     0.82     2.89     3.62     8.76     0.21
Coefficient of variation (%)      14.12    47.33    14.13    21.00    18.00     6.79    11.63    15.50    12.63    23.86

5th group (n=463)
Minimum                            0.50     1.00     9.54     0.28     7.76     4.61     6.39    11.17    20.03     0.33
Maximum                            2.60    14.80    32.39     0.87    35.86    12.62    37.02    35.03    77.94     2.06
Mean                               1.13     4.99    18.03     0.37    11.33     9.52    22.53    23.39    56.19     0.84
Concentration coefficient          0.75     0.11     1.24     0.32     1.28     0.85     0.89     0.83     0.80     0.77
Std. Deviation                     0.37     2.85     4.47 5.29E−02     2.27     1.55     2.72     3.33     6.50     0.25
Coefficient of variation (%)      32.74    57.11    24.79    14.30    20.04    16.28    12.07    14.24    11.57    29.76

6th group (n=165)
Minimum                            0.50    24.80     7.11     1.00     0.15     7.42     8.02    16.26    51.35     0.21
Maximum                            2.80   280.10    27.99     6.70     4.26    16.64    34.83    76.51   114.10     8.65
Mean                               1.20    79.62    13.70     2.74     0.60    11.04    23.65    28.70    69.17     1.25
Concentration coefficient          0.80     1.74     0.94     2.38     0.07     0.98     0.93     1.02     0.99     1.15
Std. Deviation                     0.48    34.01     3.42     1.23     0.58     1.27     4.75     8.27     8.58     1.15
Coefficient of variation (%)      40.00    42.72    24.96    44.89    96.67    11.50    20.08    28.82    12.40    92.00

7th group (n=336)
Minimum                            2.20     1.00     8.03     0.28     0.21     6.90    13.61    14.63    33.00     0.32
Maximum                            7.80   187.20    33.06     3.80    34.71    15.83    96.51    41.32   103.50     2.89
Mean                               3.21     7.63    15.64     0.41    11.59    12.22    26.64    23.55    70.23   0.8596
Concentration coefficient          2.14     0.17     1.08     0.36     1.31     1.09     1.05     0.84     1.00     0.79
Std. Deviation                     0.85    14.23     2.96     0.27     2.74     1.38     6.94     3.44    11.51   0.2292
Coefficient of variation (%)      26.48   186.50    18.93    65.85    23.64    11.29    26.05    14.61    16.39    26.66

8th group (n=63)
Minimum                            0.80     1.40     4.52     0.24     0.20     6.33     9.09    11.35    36.06     0.21
Maximum                            4.30   162.50    18.20     4.30    13.09    12.61    36.30    62.04    94.84     5.30
Mean                               2.20    39.08    12.08     0.99     5.68     8.88    20.55    25.04    56.30     0.82
Concentration coefficient          1.47     0.85     0.83     0.86     0.64     0.79     0.81     0.89     0.80     0.75
Std. Deviation                     0.76    43.56     2.74     0.81     4.71     1.42     4.39     8.01     9.74     0.80
Coefficient of variation (%)      34.55   111.46    22.68    81.82    82.92    15.99    21.36    31.99    17.30    97.56

9th group (n=500)
Minimum                            0.50    29.30     1.50     1.00     0.12     2.56     5.70    13.02    10.00     0.24
Maximum                            4.10  5000.00    18.66    15.00    10.00    15.18    68.65   178.40   232.90    24.13
Mean                               1.32   175.94     7.40     3.23     0.99     7.82    21.43    34.17    67.48     1.55
Concentration coefficient          0.88     3.85     0.51     2.81     0.11     0.70     0.84     1.22     0.96     1.42
Std. Deviation                     0.65   617.55     2.84     1.93     1.99     2.00     8.45    22.49    25.75     1.51
Coefficient of variation (%)      49.24   351.00    38.38    59.75   201.01    25.58    39.43    65.82    38.16    97.42

(a) Concentration coefficient is the ratio between the group mean and total mean. The number in brackets is the number of samples in the corresponding group. The unit of Au and Ag is 10⁻⁹, while that of the others is 10⁻⁶.

4.2. Classification of the types of samples

There is considerable variation in the equations used with the KNN algorithm, and much research is being done on the optimal parameters. The number of iterations, the learning rate and the neighborhood radius are particularly debated. Kohonen (1989) himself suggested, however, that the training should be split into two phases. Phase 1 reduces the learning rate from 0.9 to 0.1, and the neighborhood radius from half the diameter of the lattice to the immediately surrounding nodes. Phase 2 reduces the learning rate from 0.1 to 0.0, but over double or more the number of iterations in Phase 1. In Phase 2, the neighborhood radius should remain fixed at 1 (the BMU only). With these parameters, Phase 1 allows the network to quickly fill out the space, while Phase 2 fine-tunes the network to a more accurate representation (Guthikonda, 2005).

To determine the optimal size of the SOM layer, the training is split into Phase 1 and Phase 2 and various network sizes are simulated. In this application, the minimum network size investigated is a 3×3 node network, and training over 9000 epochs classifies the input vectors effectively into 9 groups (Table 2). The concentration coefficient is the ratio between the group mean and the total mean. If the concentration coefficients of the elements in a group are large, especially much greater than 1, the group is effectively classified.
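As a worked example of the concentration coefficient, the short Python sketch below divides the 3rd group means (Table 2) by the total means (Table 1). The variable and function names are illustrative assumptions; the numbers are copied from the two tables.

```python
ELEMENTS = ["Au", "Ag", "As", "Sn", "Bi", "Co", "Cu", "Pb", "Zn", "Mo"]

# Means of all 2860 samples (Table 1) and of the 3rd group (Table 2).
TOTAL_MEAN = [1.50, 45.72, 14.50, 1.15, 8.84, 11.23, 25.37, 28.06, 70.15, 1.09]
GROUP3_MEAN = [1.32, 91.44, 20.50, 2.10, 4.86, 13.44, 37.17, 54.54, 106.37, 2.03]

def concentration_coefficients(group_mean, total_mean):
    """Ratio of group mean to total mean for each element, rounded to 2 decimals."""
    return [round(g / t, 2) for g, t in zip(group_mean, total_mean)]

cc3 = concentration_coefficients(GROUP3_MEAN, TOTAL_MEAN)
enriched = [e for e, c in zip(ELEMENTS, cc3) if c > 1]

print(dict(zip(ELEMENTS, cc3)))
print(enriched)  # elements whose 3rd group mean exceeds the total mean
```

The computed ratios agree with the concentration coefficients listed for the 3rd group in Table 2, and eight of the ten elements exceed 1.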


Fig. 4. Locations of stream sediment samples in Sheduolong area. A: 679 samples in 3rd and 9th groups; B: 2860 samples in all groups.

Table 3
Correlation coefficients (based on 679 samples of 3rd and 9th groups).

          Au      Ag      As      Sn      Bi      Co      Cu      Pb      Zn      Mo
Au     1.000
Ag     0.080   1.000
As    −0.044  −0.005   1.000
Sn     0.090   0.280  −0.047   1.000
Bi     0.019   0.224   0.199  −0.202   1.000
Co    −0.012  −0.098   0.631  −0.079   0.390   1.000
Cu     0.014   0.056   0.271  −0.036   0.464   0.394   1.000
Pb     0.098   0.327   0.208   0.172   0.117   0.021   0.101   1.000
Zn     0.073   0.396   0.349   0.115   0.316   0.304   0.197   0.701   1.000
Mo    −0.033   0.093  −0.008   0.079  −0.006  −0.031   0.134   0.434   0.180   1.000

Table 4
Eigenvalues and proportions.

Samples of 3rd, 9th groups
Factor   Eigenvalue   % of variance   Cumulative %
F1             2.00           19.99          19.99
F2             1.79           17.89          37.88
F3             1.62           16.24          54.12
F4             1.34           13.41          67.54

All samples
Factor   Eigenvalue   % of variance   Cumulative %
F1             2.39           23.92          23.92
F2             2.28           22.83          46.75
F3             1.03           10.30          57.04

Table 5
Factor loading matrix.

            Samples of 3rd, 9th groups        All samples
Element      F1      F2      F3      F4        F1      F2      F3
Au        −0.13   −0.02    0.03    0.62      0.05    0.05    0.91
Ag         0.41   −0.18    0.21    0.63      0.62   −0.20    0.19
As         0.13    0.89    0.08   −0.04      0.03    0.64   −0.01
Sn         0.23    0.07   −0.45    0.48      0.49   −0.62   −0.05
Bi         0.07    0.14    0.87    0.14     −0.23    0.73    0.09
Co        −0.04    0.83    0.33   −0.05      0.03    0.77    0.05
Cu         0.12    0.27    0.68   −0.03      0.30    0.49   −0.06
Pb         0.86    0.14    0.01    0.20      0.85   −0.03   −0.11
Zn         0.66    0.38    0.21    0.34      0.81    0.27    0.07
Mo         0.75   −0.15    0.02   −0.32      0.50   −0.05   −0.36

The data in bold emphases represent the selected factors based on the threshold for loadings (0.50 absolute value).

Table 2 shows that the concentration coefficients of 8 elements in the 3rd group are greater than 1 (1.20 to 2.00), while the other groups have only a few elements with concentration coefficients greater than 1, which means that the 3rd group possibly represents the part of the total with high and anomalous values. The 9th group has the highest Ag concentration coefficient (3.85) and relatively high Pb and Mo concentration coefficients. Fig. 3 shows the box plots of the group and total samples. The 3rd and 9th groups contain most of the high and anomalous values of Ag, Pb and Zn, judging by the width, whiskers, extremes and outliers in the box plots. The other groups may represent relatively lower values or the background value compared to the 3rd and 9th groups. Furthermore, one major aim of the survey was to define exploration targets for Ag–Pb–Zn–Mo deposits. We therefore selected the 679 samples of the 3rd and 9th groups for factor analysis in order to eliminate the background value and delineate anomalies of Ag, Pb, Zn and Mo. The area covered by the 3rd and 9th group samples is much smaller than that of all groups (Fig. 4), which reduces the workload of geochemical data analysis and geological work.

4.3. Factor analysis

Factor analysis is a very data-sensitive technique, a fact that is often neglected. A careful univariate analysis should be carried out for any data set prior to its being used for factor analysis (Reimann et al., 2002). In factor analysis, two main methods exist for extracting the common factors: principal factor analysis (PFA) and the maximum likelihood method (ML). PFA works in principle like principal component analysis (PCA) but with a reduced correlation or covariance matrix. Only the common structure of all variables, and not any special behavior of each single variable, is thus used. ML, in contrast, uses a complicated statistical optimization procedure to extract the factors. ML requires not only a normal distribution for all the variables entered but also a multivariate normal distribution. When using PFA a normal distribution is not a must, but this method is based on the correlation or covariance matrix, and these are strongly affected by non-normally distributed data and the presence of outliers. Just as for many other statistical techniques, factor analysis is very sensitive to non-normally distributed data. Therefore, it should generally be tested whether or not all variables have a normal distribution. It is now well known amongst geochemists that regional geochemical data practically never show a normal distribution (Reimann and Filzmoser, 2000). So all entered variables should come as close to a normal distribution as possible. We investigated many different transformations (e.g. square root, log10, logit, double logarithmic including scale transformation), and the much more widespread log-transformation resulted in a nearly normal distribution. Furthermore, the method of factor rotation must be selected. Varimax (Kaiser, 1958), Promax (Hendrickson and White, 1964), Oblimin (Harman, 1976) or Quartimin (Carroll, 1953) are just some examples. Varimax is an orthogonal rotation, i.e. the rotated factors are not correlated. Promax, Oblimin and Quartimin are oblique rotation methods, i.e. the rotated factors can be correlated.
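The effect of the log-transformation can be checked on a small scale with the Python sketch below. The Ag values are hypothetical, chosen only to mimic a strongly right-skewed distribution, and the moment-based skewness is used here as a crude indicator of departure from normality; it is an assumption of this sketch and not the test applied by the authors.

```python
import math
import statistics

def skewness(values):
    """Moment-based sample skewness: 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical, strongly right-skewed Ag concentrations (not the survey data).
ag = [3.6, 4.1, 5.0, 6.8, 9.5, 20.0, 47.98, 160.0, 850.0, 5000.0]
log_ag = [math.log10(v) for v in ag]

raw_skew = skewness(ag)
log_skew = skewness(log_ag)
print(round(raw_skew, 2), round(log_skew, 2))
```

For these values the skewness of the log10-transformed data is much closer to zero than that of the raw data, which is the behavior the transformation is meant to achieve before the correlation matrix is computed.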

The factor analysis results obtained with the PFA method and Varimax rotation are summarized in Tables 3–5. Correlation coefficients are given in Table 3, Table 4 gives the eigenvalues, and Table 5 is the Varimax factor loading matrix. In the factor analysis of the 3rd and 9th groups, we interpret factor F1 as a Pb–Zn–Mo factor, factor F2 as an As–Co factor, factor F3 as a Bi–Cu factor, and factor F4 as an Au–Ag factor; the total explained variance is 67.54%. In the traditional factor analysis of all samples, we interpret factor F1 as an Ag–Pb–Zn factor, factor F2 as an As–Co–Bi–(−Sn) factor and factor F3 as an Au factor; the total explained variance is 57.04%.
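The link between the loadings, the eigenvalues and the explained variance can be verified with a few lines of Python. This sketch assumes that the analysis is based on the correlation matrix of the 10 standardized variables, so that each variable contributes one unit of variance; the variable names are illustrative, and the numbers are taken from Tables 4 and 5 (samples of the 3rd and 9th groups).

```python
# Factor F1 loadings for Au, Ag, As, Sn, Bi, Co, Cu, Pb, Zn, Mo (Table 5).
F1_LOADINGS = [-0.13, 0.41, 0.13, 0.23, 0.07, -0.04, 0.12, 0.86, 0.66, 0.75]
# Eigenvalues of factors F1 to F4 (Table 4).
EIGENVALUES = [2.00, 1.79, 1.62, 1.34]
NUM_VARIABLES = 10

# Sum of squared loadings of a rotated factor, comparable to its eigenvalue.
ssl = sum(l * l for l in F1_LOADINGS)
# Share of the total variance carried by F1, in percent.
pct = 100.0 * ssl / NUM_VARIABLES
# Cumulative explained variance of the 4-factor model, in percent.
cumulative = 100.0 * sum(EIGENVALUES) / NUM_VARIABLES

print(round(ssl, 2), round(pct, 1), round(cumulative, 1))
```

Within the rounding of the published values, the sum of squared F1 loadings is close to the tabulated eigenvalue of 2.00, and the cumulative share is close to the 67.54% reported in Table 4.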

Contrasting the results of the two approaches, we find that the factors of the traditional factor analysis do not include Mo. However, the geological survey shows that Pb, Zn and Mo mineralization are associated. It is therefore necessary to classify the samples prior to factor analysis in order to obtain a reasonable explanation of the factors.
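The naming of the factors follows from the loading threshold stated with Table 5 (0.50 absolute value). A minimal Python sketch of that selection, using the loadings of the 3rd and 9th group analysis, is given below; the dictionary layout and the function name `dominant_elements` are assumptions made for illustration.

```python
ELEMENTS = ["Au", "Ag", "As", "Sn", "Bi", "Co", "Cu", "Pb", "Zn", "Mo"]

# Varimax loadings from Table 5 (samples of 3rd and 9th groups), one list per factor.
LOADINGS = {
    "F1": [-0.13, 0.41, 0.13, 0.23, 0.07, -0.04, 0.12, 0.86, 0.66, 0.75],
    "F2": [-0.02, -0.18, 0.89, 0.07, 0.14, 0.83, 0.27, 0.14, 0.38, -0.15],
    "F3": [0.03, 0.21, 0.08, -0.45, 0.87, 0.33, 0.68, 0.01, 0.21, 0.02],
    "F4": [0.62, 0.63, -0.04, 0.48, 0.14, -0.05, -0.03, 0.20, 0.34, -0.32],
}

def dominant_elements(loadings, threshold=0.50):
    """Elements whose absolute loading reaches the threshold, per factor."""
    return {factor: [e for e, l in zip(ELEMENTS, values) if abs(l) >= threshold]
            for factor, values in loadings.items()}

factors = dominant_elements(LOADINGS)
print(factors)
```

With this threshold Mo loads on F1 together with Pb and Zn, which is the association the classification step was meant to recover.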

Fig. 5. Factor score maps. A: Factor F1 score maps; B: Factor F2 score maps; C: Factor F3 score maps; D: Factor F4 score maps; Legends of a, b, c, d, e, f and g are the same as in Fig. 2.

Factor analysis allows us to calculate a single value for each of these groupings. For example, instead of a quantitative analysis of three separate maps for Pb, Zn and Mo, we can establish a linear relationship (factor) existing among these variables and plot a single map (factor score map) showing relative amounts of each factor. Such maps are shown in Fig. 5. These maps should be compared with the general geological map of the region (Fig. 2). Several points are worth noting:

(1) Factor F1 is essentially a Pb–Zn–Mo factor. Abundant high values are found in samples from the southeastern part of the study area, which is partly or wholly underlain by monzonitic granite intrusions. It was, in fact, the presence of these intrusions that led to the geochemical survey from which the data used in this study were extracted. One major aim of the survey was to define exploration targets for Pb–Zn–Mo–Ag deposits. Additional exploration has been carried out in the area since our data were collected, and Pb–Zn–Mo–Ag mineralization occurs in areas with high factor F1 values (Fig. 5A). We therefore conclude that factor F1 has provided a useful guide for further intensive exploration.

(2) Factor F2, an As–Co factor, correlates with metamorphic rocks of the Paleoproterozoic Baishahe Formation.

(3) Factor F3 is a Bi–Cu factor. Abundant high values occur in granodiorite intrusions.

(4) Factor F4 is essentially an Au–Ag factor. Abundant high values are found in samples from the southeastern part of the study area and also relate to monzonitic granite intrusions. The area with high factor F4 values is the same as that of factor F1. It was, in fact, the presence of these intrusions that led to the Ag–Pb–Zn–Mo mineralization.
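The idea of replacing several element maps by one factor score per sample, as used for the factor score maps discussed above, can be sketched as follows. This is a hedged illustration only: the data are synthetic, the variable names are hypothetical, the single factor is approximated by the leading principal component of the correlation matrix rather than by a full PFA with rotation, and the scores are computed with the regression method (standardized data times the inverse correlation matrix times the loadings).

```python
import numpy as np

# Synthetic log-scale values for three elements sharing one latent factor
# (assumption: columns stand for Pb, Zn and Mo; not the survey data).
rng = np.random.default_rng(42)
n_samples = 400
latent = rng.normal(size=n_samples)
true_loadings = np.array([0.9, 0.8, 0.7])
noise = rng.normal(size=(n_samples, 3)) * np.sqrt(1.0 - true_loadings ** 2)
log_conc = latent[:, None] * true_loadings + noise

# Standardize each variable and compute the correlation matrix.
z = (log_conc - log_conc.mean(axis=0)) / log_conc.std(axis=0)
corr = np.corrcoef(z, rowvar=False)

# One-factor loadings from the leading eigenpair (principal-component approximation).
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
if loadings.sum() < 0:  # resolve the arbitrary sign of the eigenvector
    loadings = -loadings

# Regression-method factor scores: one value per sample.
weights = np.linalg.solve(corr, loadings)
scores = z @ weights

print("loadings:", np.round(loadings, 2))
print("correlation of scores with the latent factor:",
      round(float(np.corrcoef(scores, latent)[0, 1]), 3))
```

Each entry of `scores` would then be plotted at its sample location to give a single factor score map in place of three separate element maps.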

In addition, we conclude that Fault F3 is closely associated with the occurrence of Ag–Pb–Zn–Mo mineralization (Fig. 2). However, when we compared the F1 score map with the location of Fault F3, we found that the location of Fault F3 in the southeastern part of the coarse geological map (Fig. 5A) may be wrong and should be revised according to the high values of the F1 score map. The revised location of Fault F3 is shown in Fig. 5A as a dashed line, which has been confirmed by recent geological investigations.

5. Conclusions

The KNN effectively classifies the samples, eliminates many samples with background values, and passes only a reduced set of samples to factor analysis, which reduces the workload of geochemical data analysis and geological work. The great advantage of factor analysis is that the study of many variables can commonly be reduced to a few. In the present case, ten initial variables were reduced to four factors (factor F1, Pb–Zn–Mo; factor F2, As–Co; factor F3, Bi–Cu; and factor F4, Au–Ag), of which factor F1 and factor F2 appear particularly important for mineral exploration.

In this study, we have found that the KNN and factor analysis based approach appears to result in no significant loss of information and provides results that are easier to interpret and more reasonable than those of factor analysis alone. In particular, it identifies the location of faults and their significance for mineralization. The KNN and factor analysis based approach is an effective way to extract the abundant information related to geology and mineralization that is hidden in regional geochemical data.

Acknowledgements

We thank two anonymous reviewers for their constructive reviews and B. De Vivo for handling the manuscript. This research was supported by the National Basic Research Program of China (973 Program) (No. 2009CB421008), the Changjiang Scholars and Innovative Research Team in University (No. IRT0075), and the 111 Project of the Ministry of Education, China (No. B07011).

References

Agterberg, F., 2001. Multifractal simulation of geochemical map patterns. J. China Univ. Geosci. 12 (1), 31–39.

Bianchi, D., Calogero, R., Tirozzi, B., 2007. Kohonen neural networks and genetic classification. Math. Comput. Model. 45, 34–60.

Carroll, J.B., 1953. An analytic solution for approximating simple structure in factor analysis. Psychometrika 18, 23–38.

Castillo-Munoz, R., Howarth, R.J., 1976. Application of the empirical discriminant function to regional geochemical data from the United Kingdom. Geol. Soc. Amer. Bull. 87, 1567–1581.

Cheng, Q., 1999. Spatial and scaling modelling for geochemical anomaly separation. J. Geochem. Explor. 65 (5), 175–194.

Cheng, Q., 2000. Multifractal theory and geochemical element distribution pattern. Earth Sci.-J. China Univ. Geosci. 25 (3), 311–318 (in Chinese, with English abstract).

Cheng, Q., 2004. A new model for quantifying anisotropic scale invariance and for decomposition of mixing patterns. J. Math. Geol. 36 (3), 345–360.

Cheng, Q., Agterberg, F., Ballantyne, S., 1994. The separation of geochemical anomalies from background by fractal methods. J. Geochem. Explor. 51 (2), 109–130.

Chow, T.W.S., Leung, C.T., 1996. Neural network based short-term load forecasting using weather compensation. IEEE Trans. Power Syst. 11 (4), 1736–1742.

Clemens, R., Peter, F., Robert, G., 2002. Factor analysis applied to regional geochemical data: problems and possibilities. Appl. Geochem. 17, 185–206.

Deng, J., Fang, Y., Yang, L., 2001. Numerical modeling of ore-forming dynamics of fractal dispersive fluid systems. Acta Geol. Sin. 75, 220–232.

Deng, J., Wang, Q., Wan, L., et al., 2007. Singularity of Au distribution in altered rock type deposit: an example from Dayingezhuang gold ore deposit. In: Zhao, P. (Ed.), The 12th Conference of the International Association for Mathematical Geology. China University of Geosciences Press, Wuhan, pp. 44–47.

Deng, J., Wang, Q., Wan, L., Yang, L., Zhou, L., Zhao, J., 2008. Random difference of the trace element distribution in skarn and marbles from Shizishan Orefield, Anhui province, China. J. China Univ. Geosci. 19 (4), 319–326.

Gustavsson, N., Bjorklund, A., 1976. Lithological classification of tills by discriminant analysis. J. Geochem. Explor. 5, 393–395.

Guthikonda, M.G. 2005. http://www.shy.am/wp-content/uploads/2009/01/kohonen-self-organizing-maps-shyam-guthikonda.pdf.

Harman, H.H., 1976. Modern Factor Analysis, 3rd Edition. University of Chicago Press, Chicago.

Hendrickson, A.E., White, P.O., 1964. PROMAX: a quick method for rotation to oblique simple structure. Brit. J. Stat. Psychology 17, 65–70.

Hoffmann, M., 2005. Numerical control of Kohonen neural network for scattered data approximation. Numer. Algorithms 39, 175–186.

Ji, H., Chen, Y., 1993. Correspondence cluster analysis for qualitative data and its application. Comput. Tech. Geophys. Geochem. Explor. 15 (4), 300–306 (in Chinese, with English abstract).

Ji, H., Zeng, D., Shi, Y., et al., 2007. Semi-hierarchical correspondence cluster analysis and regional geochemical pattern recognition. J. Geochem. Explor. 93 (2), 109–119.

Kaminskas, D., 2004. Comparison of pattern-recognition techniques for classification of Silurian rocks from Lithuania based on geochemical data. Nor. J. Geol. 84 (2), 117–124.

Kaiser, H.F., 1958. The Varimax criterion for analytic rotation in factor analysis. Psychometrika 23, 187–200.

Kim, C.I., Yu, I.K., Song, Y.H., 2002. Kohonen neural network and wavelet transform based approach to short-term load forecasting. Electr. Power Syst. Res. 63, 169–176.

Kohonen, T., 1989. Self-organization and Associative Memory Process. Springer-Verlag, Berlin.

Kohonen, T., Oja, E., Simula, O., Visa, A., Kangas, J., 1996. Engineering application of the self-organising map. Proc. IEEE 84 (10), 1358–1384.

Li, C., 1999. Fractal, chaos and ANN in mineral exploration. Geological Press, Beijing, pp. 20–100 (in Chinese).

Li, C., Ma, T., Zhu, X., et al., 1999. Fractal brownian motion and geochemical survey: a fractal approach to the spatial distribution of element contents in the crust. Geol. Rev. 45 (1), 76–84 (in Chinese, with English abstract).

Lindqvist, L., Lundholm, I., Nisca, D., et al., 1987. Multivariate geochemical modeling and integration with petrophysical data. J. Geochem. Explor. 29, 279–294.

Marini, F., Zupan, J., Magri, A.L., 2005. Class-modeling using Kohonen artificial neural networks. Anal. Chim. Acta 544, 306–314.

Reimann, C., Filzmoser, P., 2000. Normal and lognormal data distribution in geochemistry: death of a myth. Consequences for the statistical treatment of geochemical and environmental data. Environ. Geol. 39, 1001–1014.

Reimann, C., Filzmoser, P., Garrett, R.G., 2002. Factor analysis applied to regional geochemical data: problems and possibilities. Appl. Geochem. 17, 185–206.

Rojas, R., 1996. Neural Networks. A Systematic Introduction. Springer-Verlag, Berlin.

Song, X.H., Hopke, P.K., 1996. Kohonen neural network as a pattern recognition method based on the weight interpretation. Anal. Chim. Acta 334, 57–66.

Sun, X., 2007. Prediction of fluorite deposit in Yixian based on fuzzy-neural network. J. China Univ. Geosci. 18, 279–281.

Wang, Y., Chen, S., Wang, H., et al., 2002. Species analysis of gold in geochemical samples by artificial neural network. Chin. J. Anal. Chem. 30 (1), 62–65 (in Chinese, with English abstract).

Wang, Q., Deng, J., Wan, L., 2007. Fractal analysis of element distribution in Damoqujia gold deposit, Shandong province, China. Proceedings of the 12th Conference of the International Association for Mathematical Geology, vol. 8, pp. 262–265.

Xie, X., 1979. Regional Geochemical Prospecting. Geological Press, Beijing, pp. 1–58 (in Chinese).


Xie, X., Liu, D., Xiang, Y., Yan, et al., 2004. Geochemical blocks for predicting large ore deposits: concept and methodology. J. Geochem. Explor. 84, 77–91.

Yu, C., Cen, K., Bao, Z., 1993. Hydrothermal Metallogenesis Dynamics. China University of Geosciences Press, Wuhan, p. 218 (in Chinese).

Zhai, Y., 1999. Regional Metallogeny. Geological Press, Beijing, pp. 52–142 (in Chinese).

Zhang, B., 1992. Basic concepts and methodology of geochemistry. Earth Sci. 17, 18–25 (in Chinese, with English abstract).

Zhao, P., Wang, J., Rao, M., Li, H., 1995. Geologic anomaly of China. Earth Sci. 20 (2), 117–127 (in Chinese, with English abstract).

