
4 Th Iranian chemometrics Workshop (ICW) Zanjan-2004

Date post: 26-Jan-2016
Page 1: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

4Th Iranian chemometrics Workshop (ICW)

Zanjan-2004

Page 2: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

The Problem of Factor Selection in PCA-Based Calibration Methods

By: Bahram Hemmateenejad

Medicinal & Natural Products Chemistry Research Center, Shiraz University of Medical Science

4Th

ICW 4Th ICW

Page 3: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Multivariate Calibration

Regression equation relating measurements on m samples to k different variables:

$\mathbf{y} = \mathbf{X}\mathbf{b}$

$\mathbf{y}$ $(m \times 1)$: dependent variable or predicted variable

$\mathbf{X}$ $(m \times k)$: independent variables or predictor variables

$\mathbf{b}$ $(k \times 1)$: regression coefficients


Page 4: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Multicomponent Analysis

y: concentration of the analyte

X: recorded analytical signals at k different channels, e.g. absorbance at different wavelengths

QSAR/QSPR Studies

y: chemical property or biological activity

X: molecular descriptors representing structural features of molecules numerically


Page 5: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Problems associated with MLR

• Collinearity between the independent variables (X)

• The number of independent variables (k) should be much lower than the number of samples (m)

A reduced number of variables must be used
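Both MLR problems can be seen numerically. The following is a minimal sketch on synthetic data (all variable names and sizes are illustrative assumptions, not from the slides): a nearly duplicated column makes $\mathbf{X}^T\mathbf{X}$ badly conditioned, and having more variables than samples makes it rank-deficient, so $(\mathbf{X}^T\mathbf{X})^{-1}$ is unreliable or does not exist.

```python
import numpy as np

rng = np.random.default_rng(0)

# Collinearity: the second column is almost a copy of the first.
m = 20
x1 = rng.normal(size=m)
X_collinear = np.column_stack(
    [x1, x1 + 1e-6 * rng.normal(size=m), rng.normal(size=m)]
)
cond_collinear = np.linalg.cond(X_collinear.T @ X_collinear)

# Independent columns, for comparison.
X_independent = rng.normal(size=(m, 3))
cond_independent = np.linalg.cond(X_independent.T @ X_independent)

# More variables than samples (k > m): rank(X^T X) = rank(X) <= m < k,
# so the k x k matrix X^T X is singular.
X_wide = rng.normal(size=(5, 12))
rank_wide = np.linalg.matrix_rank(X_wide)

print(f"condition number of X^T X, collinear X:   {cond_collinear:.3e}")
print(f"condition number of X^T X, independent X: {cond_independent:.3e}")
print(f"rank of X with m=5, k=12: {rank_wide} (X^T X is 12 x 12)")
```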

Page 6: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Feature selection: The variables are selected based on their generalization ability using selection methods such as stepwise variable selection, genetic algorithm, simulated annealing, …

Feature extraction: The variables are transformed into new coordinate axes with lower dimension

Principal Component Analysis (PCA) or Factor Analysis (FA)


Page 7: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

PCA or FA or PFA

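$\mathbf{X} = \mathbf{T}\mathbf{P}$

$\mathbf{X}$ $(m \times k)$, $\mathbf{T}$ $(m \times k)$, $\mathbf{P}$ $(k \times k)$

$\mathbf{T} = [\mathbf{t}_1\ \mathbf{t}_2\ \mathbf{t}_3\ \mathbf{t}_4\ \mathbf{t}_5\ \dots\ \mathbf{t}_k]$: scores

$\mathbf{P}^T = [\mathbf{p}_1^T\ \mathbf{p}_2^T\ \mathbf{p}_3^T\ \mathbf{p}_4^T\ \mathbf{p}_5^T\ \dots\ \mathbf{p}_k^T]$: loadings

$\boldsymbol{\lambda} = [\lambda_1\ \lambda_2\ \lambda_3\ \lambda_4\ \lambda_5\ \dots\ \lambda_k]$: eigen-values

$\lambda_1 > \lambda_2 > \lambda_3 > \lambda_4 > \lambda_5 > \dots > \lambda_k$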
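One common way to obtain such a decomposition is the singular value decomposition. The sketch below assumes m ≥ k, column mean-centering, and the convention that each loading vector is a row of P; the function name `pca_decompose` and the random data are illustrative, not part of the talk.

```python
import numpy as np

def pca_decompose(X):
    """Return scores T, loadings P (one loading vector per row) and eigen-values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U * s             # scores: column i is t_i
    P = Vt                # loadings: row i is the loading vector p_i
    eigenvalues = s ** 2  # eigen-values of X^T X, sorted in decreasing order
    return T, P, eigenvalues

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
X = X - X.mean(axis=0)    # mean-centering is an assumed preprocessing step

T, P, eigenvalues = pca_decompose(X)
print("X reproduced by T P:", np.allclose(T @ P, X))
print("eigen-values:", np.round(eigenvalues, 3))
```

With this construction the score vectors are mutually orthogonal and the squared length of each one equals its eigen-value, which is why the eigen-value measures the variance carried by that factor.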


Page 8: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Each vector of T or P is called an eigen-vector, PC, or factor

$\lambda_i$ shows the amount of variance in the X matrix that is explained by the corresponding eigen-vector ($\mathbf{t}_i$ or $\mathbf{p}_i$)

Only a reduced set of PCs is needed to reproduce the original data matrix without losing significant information


$\mathbf{X}_{(m \times k)} \approx \mathbf{T}_{(m \times f)}\,\mathbf{P}_{(f \times k)}$

Page 9: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

f is the number of significant factors

f is the rank of the original data matrix

f describes the complexity of the X matrix

Ideally, f is the number of nonzero eigen-values

f can be determined by the theory of FA

Scree plot, indicator function, imbedded error, real error, …
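As an illustration of these criteria, the sketch below computes Malinowski's real error (RE) and indicator function (IND) from the eigen-values of $\mathbf{X}^T\mathbf{X}$, using $RE(n) = \sqrt{\sum_{j=n+1}^{c}\lambda_j / (r(c-n))}$ and $IND(n) = RE(n)/(c-n)^2$ for an r × c matrix with r ≥ c. The synthetic rank-3 data, the noise level, and the function name are assumptions made for the example; the estimated f is taken as the n that minimizes IND.

```python
import numpy as np

def real_error_and_indicator(X):
    """Real error RE(n) and indicator function IND(n) for n = 1 .. c-1."""
    r, c = X.shape
    if r < c:
        raise ValueError("this sketch assumes at least as many rows as columns")
    eigenvalues = np.linalg.svd(X, compute_uv=False) ** 2  # eigen-values of X^T X
    n = np.arange(1, c)
    # Sum of the eigen-values discarded when only the first n factors are kept.
    residual = np.array([eigenvalues[i:].sum() for i in n])
    RE = np.sqrt(residual / (r * (c - n)))
    IND = RE / (c - n) ** 2
    return n, RE, IND

# Synthetic data: three underlying factors plus small random noise (assumed values).
rng = np.random.default_rng(1)
r, c, true_f = 60, 10, 3
X = rng.normal(size=(r, true_f)) @ rng.normal(size=(true_f, c))
X += 0.01 * rng.normal(size=(r, c))

n, RE, IND = real_error_and_indicator(X)
f_estimate = int(n[np.argmin(IND)])
print("estimated number of significant factors:", f_estimate)
```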


Page 10: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

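PCA-Based regression method

MLR (Classical Least Squares)

$\mathbf{y} = \mathbf{X}\mathbf{b}$

$\mathbf{b} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

$y_{new} = \mathbf{x}_{new}\mathbf{b}$

Principal Component Regression (PCR)

$\mathbf{X} = \mathbf{T}\mathbf{P}$

$\mathbf{y} = \mathbf{T}\mathbf{b}$

$\mathbf{b} = (\mathbf{T}^T\mathbf{T})^{-1}\mathbf{T}^T\mathbf{y}$

$\mathbf{t}_{new} = \mathbf{x}_{new}\mathbf{P}^T$ (the loading vectors are the orthonormal rows of $\mathbf{P}$)

$y_{new} = \mathbf{t}_{new}\mathbf{b}$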
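The PCR steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the data are mean-centered before PCA (not mentioned on the slide), the loadings are stored as rows of P so that new samples are projected with $\mathbf{P}^T$, and the simulated three-component "spectra" and the helper names `pcr_fit` and `pcr_predict` are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, f):
    """Fit PCR on the first f principal components (top-down by eigen-value)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:f]                              # f x k loadings, one per row
    T = Xc @ P.T                            # m x f scores
    b = np.linalg.solve(T.T @ T, T.T @ yc)  # b = (T^T T)^-1 T^T y
    return {"x_mean": x_mean, "y_mean": y_mean, "P": P, "b": b}

def pcr_predict(model, X_new):
    """Project new samples onto the loadings, then apply the regression vector."""
    T_new = (X_new - model["x_mean"]) @ model["P"].T
    return T_new @ model["b"] + model["y_mean"]

# Simulated multicomponent data: 3 analytes measured at 40 channels.
rng = np.random.default_rng(0)
C = rng.uniform(0.0, 1.0, size=(30, 3))   # concentrations
S = rng.uniform(0.0, 1.0, size=(3, 40))   # pure-component signals
X = C @ S + 1e-3 * rng.normal(size=(30, 40))
y = C[:, 0]                               # analyte of interest

model = pcr_fit(X[:20], y[:20], f=3)      # calibrate on the first 20 samples
y_pred = pcr_predict(model, X[20:])       # predict the remaining 10
rmse = np.sqrt(np.mean((y_pred - y[20:]) ** 2))
print(f"prediction RMSE with 3 PCs: {rmse:.5f}")
```

Note that 40 channels and only 20 calibration samples would make plain MLR impossible here, while the regression on three scores is well posed.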


Page 11: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Significance of factor selection

Some Questions

1. How many PCs must be used in PCR?

2. Which PCs should be considered in PCR modeling?

3. Is the magnitude of an eigen-value necessarily a measure of its significance for the calibration?

Page 12: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Top-down eigen-value ranking (ER)

Factors are entered into the model one after another in order of decreasing eigen-value

Once a new factor is entered, the regression model is built and its performance is validated by existing procedures such as cross-validation
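A minimal sketch of the ER procedure with leave-one-out cross-validation follows. It assumes mean-centering and recomputes the PCA inside every cross-validation fold; the simulated data and the function name `press_loo_eigenvalue_ranking` are illustrative only. PRESS is the sum of squared prediction errors over the left-out samples.

```python
import numpy as np

def press_loo_eigenvalue_ranking(X, y, max_factors):
    """Leave-one-out PRESS for PCR models using the first 1..max_factors PCs."""
    m = X.shape[0]
    press = np.zeros(max_factors)
    for i in range(m):
        train = np.arange(m) != i              # leave sample i out
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc, yc = X[train] - x_mean, y[train] - y_mean
        # Rows of Vt are loadings sorted by decreasing eigen-value.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        for f in range(1, max_factors + 1):
            P = Vt[:f]
            T = Xc @ P.T
            b = np.linalg.lstsq(T, yc, rcond=None)[0]
            y_hat = (X[i] - x_mean) @ P.T @ b + y_mean
            press[f - 1] += (y[i] - y_hat) ** 2
    return press

# Simulated three-component data (assumed sizes and noise level).
rng = np.random.default_rng(0)
C = rng.uniform(0.0, 1.0, size=(25, 3))
S = rng.uniform(0.0, 1.0, size=(3, 30))
X = C @ S + 1e-3 * rng.normal(size=(25, 30))
y = C[:, 0]

press = press_loo_eigenvalue_ranking(X, y, max_factors=6)
for f, value in enumerate(press, start=1):
    print(f"{f} PCs entered: PRESS = {value:.6f}")
```

The number of factors is then usually taken where PRESS reaches its minimum or stops decreasing significantly.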


Page 13: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004
Page 14: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004
Page 15: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004
Page 16: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004
Page 17: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004
Page 18: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004
Page 19: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

[Figure: logarithm of eigen-value for factors 1 to 16]

Page 20: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004
Page 21: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004


Top-down Correlation Ranking (CR)

First, the correlation between each factor and the dependent variable (concentration, y) is determined

Then the factors are entered into the model consecutively in order of decreasing correlation.
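The ranking step of CR can be sketched as follows. In this illustrative example (synthetic data, hypothetical function name `correlation_ranking`), y is deliberately constructed to depend on the fourth PC, a low-variance factor that the ER procedure would enter only after three less relevant ones. Absolute correlation is used for the ranking, which is an assumption of this sketch.

```python
import numpy as np

def correlation_ranking(T, y):
    """Rank PCs by decreasing absolute correlation between their scores and y."""
    yc = y - y.mean()
    Tc = T - T.mean(axis=0)
    r = (Tc.T @ yc) / (np.linalg.norm(Tc, axis=0) * np.linalg.norm(yc))
    order = np.argsort(-np.abs(r))   # PC indices, most correlated first
    return order, np.abs(r)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U * s                            # scores, sorted by decreasing eigen-value

# Artificial response driven by the 4th PC (index 3) plus a little noise.
y = T[:, 3] + 0.01 * rng.normal(size=40)

order, abs_r = correlation_ranking(T, y)
print("ER entry order (by eigen-value):", list(range(T.shape[1])))
print("CR entry order (by correlation):", order.tolist())
```

Model building then proceeds as in ER, except that the scores are entered in the CR order.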

Page 22: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

[Figure: correlation coefficient for factors 1 to 16]

Page 23: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004
Page 24: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Other factor selection methods

• Stepwise selection procedure

• Search algorithms
  • Simulated annealing
  • Genetic algorithm

Page 25: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Some references

1. Xie YL, Kalivas JH. Evaluation of principal component selection methods to form a global prediction model by principal component regression. Anal. Chim. Acta 1997; 348: 19-27.

2. Sutter JM, Kalivas JH. Which principal components to utilize for principal component regression. J. Chemometrics 1992; 6: 217-225.

3. Sun J. A correlation principal component regression analysis of NIR data. J. Chemometrics 1995; 9: 21-29.

4. Depczynski U, Frost VJ, Molt K. Genetic algorithms applied to the selection of factors in principal component regression. Anal. Chim. Acta 2000; 420: 217-227.

5. Barros AS, Rutledge DN. Genetic algorithm applied to the selection of principal components. Chemometrics Intell. Lab. Syst. 1998; 40: 65-81.

6. Verdu-Andres J, Massart DL. Comparison of prediction- and correlation-based methods to select the best subset of principal components for principal component regression and detect outlying objects. Appl. Spect. 1998; 52: 1425-1434.

7. Xie YL, Kalivas JH. Local prediction models by principal component regression. Anal. Chim. Acta 1997; 348: 29-38.

8. Ferre L. Selection of components in principal component analysis: a comparison of methods. Comput. Stat. Data Anal. 1995; 19: 669-682.

Page 26: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

A QSPR example

• Quantitative Structure-Electrochemistry Relationship Study of Some Organic Compounds

• Dependent variable
  • Half-wave reduction potential ($E_{1/2}$) of 69 compounds

• Independent variables
  • 1150 theoretical molecular descriptors calculated by DRAGON software

Page 27: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

[Figure: cumulative percent of variance for factors 1 to 31]

Page 28: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

[Figure: correlation coefficient for factors 1 to 31]

Page 29: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

[Figure: PRESS_CV versus number of entered PCs for the ER and CR procedures]

Page 30: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Principal Component-Artificial Neural Network (PC-ANN)

• ANN is a nonlinear, non-parametric modeling method

• Feature selection is more important for ANN

• Feature selection-based ANN modeling is a complex procedure

• Orthogonalization of the variables before introducing them to the network substantially decreases the computational time and increases the overall performance of the ANN

• PC-ANN is a feature extraction-based algorithm

Page 31: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

PC-GA-ANN Algorithm

• Genetic algorithm applied to the selection of factors in PC-ANN modeling

• The set of PCs selected by GA could model the structure-antagonist activity of the calcium channel blockers better than the ER procedure

• B. Hemmateenejad, M. Akhond, R. Miri, M. Shamsipur, J. Chem. Inf. Comput. Sci. 43 (2003) 1328.

• How are the factors ranked based on their correlation coefficient in PC-ANN?

Page 32: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

CR-PC-ANN Algorithm

• Correlation ranking procedure for factor selection in PC-ANN modeling

• The nonlinear relationship between each PC and the dependent variable (y) was modeled by a separate ANN model

• The subset of PCs selected by CR was nearly the same as that selected by GA, so the two factor selection procedures gave similar results

• B. Hemmateenejad, Chemometrics and Intelligent Laboratory Systems, 2004, Accepted.


Page 33: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

1. Application of ab initio theory to QSAR study of the 1,4-dihydropyridine-based calcium channel blockers using GA-MLR and PC-GA-ANN procedures, B. Hemmateenejad, M.A. Safarpour, R. Miri, F. Taghavi, Journal of Computational Chemistry 25 (2004) 1495.

2. Highly Correlating Distance-Connectivity-Based Topological Indices. 2: Prediction of 15 Properties of a Large Set of Alkanes Using a Stepwise Factor Selection-Based PCR Analysis, M. Shamsipur, R. Ghavami, B. Hemmateenejad, H. Sharghi, QSAR & Combinatorial Science, 2004, Accepted.

3. Quantitative Structure-Electrochemistry Relationship Study of some Organic Compounds using PCR and PC-ANN, B. Hemmateenejad, M. Shamsipur, Internet Electronic Journal of Molecular Design 3 (2004) 316.

4. Toward an Optimal Procedure for PC-ANN Model Building: Prediction of the Carcinogenic Activity of a Large Set of Drugs, B. Hemmateenejad, M.A. Safarpour, R. Miri, N. Nesari, Journal of Chemical Information and Computer Sciences, Revised

5. Optimal QSAR analysis of the carcinogenic activity of drugs by correlation ranking and genetic algorithm-based PCR, B. Hemmateenejad, Journal of Chemometrics, Submitted.

Page 34: 4 Th  Iranian chemometrics Workshop (ICW)   Zanjan-2004

Future Works

1. Selection of latent variables in PLS

2. Application of other selection algorithms such as the successive projections algorithm

3. Comparison between the importance of factor selection in multicomponent analysis and QSAR/QSPR studies

4. Application of factor selection-based ANN modeling in multicomponent analysis

5. Validation of the different factor selection algorithms by new criteria

