01.12.08 1
Chemometric functions in Excel
Oxana Rodionova & Alexey Pomerantsev
Semenov Institute of Chemical Physics
[email protected]@chph.ras.ru
Distance Learning Course in Chemometrics for Technological and Natural-Science Mastership Education
[Map with distances of 3000 km and 4000 km marked]
• Unmet need for chemometric education in Russia
• Few qualified specialists in chemometrics
• Large distances, e.g. Moscow – Barnaul is about 3000 km
• No modern chemometrics books in Russian
• No available chemometric software
• No support from officials: government, Academy, etc.

• Easily available everywhere => INTERNET
• Interactive layout: all calculations should be clear and repeatable
• Web-friendly environment for the calculations => EXCEL
• Need to make and use our own (free) software => EXCEL Add-In
Chemometric calculations in Excel
[Diagram: Excel user interface, VBA functions, C++ DLL; labels: Data, Input, Calculations, Results]
Gives the user the full Excel interface: worksheet calculations, worksheet functions, charts, etc.
VBA helps to simplify routine work
All calculations are made "on the fly" and are very fast
Installation
http://rcs.chph.ras.ru/down/sacs.zip
Put Chemometrics.xla in the AddIns folder (C:\Documents and Settings\<User>\Application Data\Microsoft\AddIns\)
Put Chemometrics.dll in your Windows folder (C:\WINDOWS\)
Load Chemometrics.xla via <Excel Options> <Add-Ins> in the open workbook
Matrix calculations in Excel
={TRANSPOSE(B6:F10)}
={MMULT(B6:F10,TRANSPOSE(Barr))}
B6:F10 is a cell range and Barr is a named range
Array formulas are entered with Ctrl-Shift-Enter
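For readers who want to check the two array formulas outside Excel, the same operations can be sketched in Python with NumPy. The arrays below are made-up stand-ins for the ranges B6:F10 and Barr; their values and the 2 x 5 shape chosen for Barr are assumptions, not data from the workbook.

```python
import numpy as np

# Hypothetical stand-ins for the worksheet ranges (illustrative values only).
B = np.arange(1.0, 26.0).reshape(5, 5)        # plays the role of B6:F10 (5 x 5)
Barr = np.array([[1.0, 0.0, 2.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0, 1.0, 0.0]])  # plays the role of Barr (assumed 2 x 5)

B_t = B.T              # counterpart of TRANSPOSE(B6:F10)
product = B @ Barr.T   # counterpart of MMULT(B6:F10, TRANSPOSE(Barr)), shape 5 x 2
```

As in Excel, the inner dimensions must agree: the number of columns of the first factor has to equal the number of rows of the second.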
Principal Component Analysis (PCA)
$X = TP^{T} + E$
X: initial data matrix (I × J)
T: score matrix (I × A)
P: loading matrix (J × A), so $P^{T}$ is A × J
E: error matrix (I × J)
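The decomposition above can be reproduced numerically. The sketch below uses a singular value decomposition in NumPy, which is one common way to obtain PCA scores and loadings; it is not necessarily the algorithm used inside the add-in. The function name `pca_decompose` and the random data are assumptions made for illustration.

```python
import numpy as np

def pca_decompose(X, n_pcs):
    """Return scores T (I x A), loadings P (J x A) and residuals E (I x J)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_pcs] * s[:n_pcs]   # scores
    P = Vt[:n_pcs].T               # loadings with orthonormal columns
    E = X - T @ P.T                # what the first A components do not explain
    return T, P, E

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))        # synthetic data with I = 8, J = 5
Xc = X - X.mean(axis=0)            # mean centering before PCA
T, P, E = pca_decompose(Xc, n_pcs=2)
```

With A = 2 the shapes are T: 8 × 2, P: 5 × 2, E: 8 × 5, and T P^T + E restores the centered data exactly.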
Chemometrics XLA. PCA Scores
={ScoresPCA(Xcal,5,1,Xtst)}
Arguments: Xcal (calibration set), 5 (nPC, the number of PCs), 1 (centering and/or weighting), Xtst (test set)
Chemometrics XLA. PCA Loadings
={TRANSPOSE(LoadingsPCA(Xcal,5,1))}
TRANSPOSE is an Excel worksheet function
Arguments of LoadingsPCA: Xcal (calibration set), 5 (nPC, the number of PCs), 1 (centering and/or weighting)
List of chemometric functions
PCA
ScoresPCA <for calibration or test samples>
LoadingsPCA
PLS
ScoresPLS <X-scores for calibration or test samples>
UScoresPLS <Y-scores for calibration or test samples>
LoadingsPLS <P-loadings>
WLoadingsPLS
QLoadingsPLS
PLS2
ScoresPLS2 <X-scores for calibration or test samples>
UScoresPLS2 <Y-scores for calibration or test samples>
LoadingsPLS2 <P-loadings>
WLoadingsPLS2
QLoadingsPLS2
Options:
• Centering AND/OR scaling
• Number of PCs
ScoresPCA
ScoresPCA (rMatrix [, nPCs] [, nCentWeightX] [, rMatrixNew])
rMatrix: X data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling (1 = centering, 2 = scaling, 3 = both)
rMatrixNew: test set
Input X[I×J]; output T[I×A]
Validation Rules
If rMatrixNew is omitted, only calibration scores are calculated
If rMatrixNew is specified, only test scores are calculated
If rMatrixNew coincides with rMatrix, cross-validation scores are calculated (10%-out cross-validation)
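The first two rules can be illustrated with a small Python sketch. It assumes mean centering only and projects the test samples with the mean and loadings of the calibration set, which is the usual way to score new samples in PCA. The function `scores_pca_sketch` is hypothetical and is not the add-in's code; the cross-validation rule is deliberately left out.

```python
import numpy as np

def scores_pca_sketch(X_cal, n_pcs, X_new=None):
    """Calibration scores if X_new is None, otherwise test scores (assumed behavior)."""
    mean = X_cal.mean(axis=0)                       # centering uses calibration data only
    _, _, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    P = Vt[:n_pcs].T                                # loadings from the calibration set
    target = X_cal if X_new is None else X_new
    return (target - mean) @ P

rng = np.random.default_rng(1)
X_cal = rng.normal(size=(10, 4))                    # synthetic calibration set
X_tst = rng.normal(size=(3, 4))                     # synthetic test set
T_cal = scores_pca_sketch(X_cal, 2)
T_tst = scores_pca_sketch(X_cal, 2, X_new=X_tst)
```

The test scores have one row per test sample and the same number of columns (A) as the calibration scores.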
LoadingsPCA
LoadingsPCA (rMatrix [, nPCs] [, nCentWeightX])
rMatrix: X data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling (1 = centering, 2 = scaling, 3 = both)
Input X[I×J]; output P[J×A]
Explorative Data Analysis
Case study 1: People
People
Dataset in Excel Workbook (People.xls)
Number of objects (n) = 32
Number of variables (m) = 12
Data Preprocessing
Aim: to transform the data into the most suitable form for data analysis
Autoscaling
mean centering + scaling = autoscaling
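A minimal Python sketch of autoscaling, assuming that scaling means dividing each column by its sample standard deviation (ddof = 1). The slide does not say which divisor the add-in uses, so that choice and the small made-up table are assumptions.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column, then scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)    # sample standard deviation (assumed divisor)
    return (X - mean) / std

# Made-up values in two very different units, for illustration only.
X = np.array([[170.0, 65.0],
              [180.0, 80.0],
              [160.0, 50.0]])
Z = autoscale(X)
```

After autoscaling, every column has zero mean and unit standard deviation, so variables measured on different scales contribute equally to the analysis.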
People: Scores & Loadings (PC1 vs. PC2)
[Figure: score plot ("Map of Samples"), t2 vs. t1, samples labeled FS, FN, MS, MN; loading plot ("Map of Variables"), P2 vs. P1, variables Height, Weight, Hairs, Shoes, Age, Income, Beer, Wine, Sex, Strength, Region, IQ]
People: Scores & Loadings (PC1 vs. PC3)
[Figure: score plot, t3 vs. t1, samples labeled FS, FN, MS, MN, and the same plot with numeric sample labels; loading plot, P3 vs. P1, variables Height, Weight, Hairs, Shoes, Age, Income, Beer, Wine, Sex, Strength, Region, IQ]
Case study 2: HPLC-DAD
Measurements
[Figure: absorbance (AU) as a function of time and wavelength]
Dataset in Excel Workbook
X (30 × 28)
Pure compounds A and B
$X = CS^{T} + E$
[Figure: spectra of A and B, AU vs. wavelength λ, nm (220 to 340); concentration profiles C(t) of A and B vs. time (0 to 30)]
If we observe X, can we predict C and S?
Score plot
[Figure: score plot, t2 vs. t1, points numbered 1 to 30, with regions A and B marked; concentration profiles C(t) of A and B vs. time (0 to 30)]
Conclusions from the Score Plot
1. Linear regions = Pure compounds
2. Curved line = Co-elution
3. Closer to the origin = Lower intensity
4. Number of bends = Number of different compounds
Factor analysis vs. PCA
Factor analysis: $X = CS^{T} + E_1$, where X and $E_1$ are I × J, C is I × 2, and $S^{T}$ is 2 × J
PCA: $X = TP^{T} + E_2$, where X and $E_2$ are I × J, T is I × A, and $P^{T}$ is A × J
Scores and Loadings
[Figure: S, P panel with spectra of A and B and loadings p1, p2 vs. wavelength (220 to 340); C, T panel with concentration profiles of A and B and scores t1, t2 vs. time (0 to 30)]
Procrustes transformation
$X \approx CS^{T}$
$X \approx TP^{T}$
$I = RR^{T}$ (identity matrix)
$X \approx T(RR^{T})P^{T} = (TR)(PR)^{T}$
$\hat{C} \approx TR, \quad \hat{S} \approx PR$
$R = R_{\text{stretch}} \times R_{\text{rotation}}$
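The idea can be tried on synthetic data. The sketch below is an illustration under stated assumptions: the Gaussian profiles are invented, the data are noise-free, and R is fitted by least squares against the known C only to show that a suitable transformation exists. Because a least-squares R is generally not orthogonal, the loadings are transformed with the inverse transpose of R, which reduces to PR when $RR^{T} = I$.

```python
import numpy as np

def gauss(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

t = np.arange(1, 31, dtype=float)        # 30 time points
lam = np.linspace(220.0, 340.0, 28)      # 28 wavelengths, nm

# Invented "true" profiles (assumed shapes, not the data from the slides).
C = np.column_stack([gauss(t, 12.0, 3.0), gauss(t, 17.0, 3.0)])          # 30 x 2
S = np.column_stack([gauss(lam, 250.0, 15.0), gauss(lam, 290.0, 20.0)])  # 28 x 2
X = C @ S.T                              # noise-free, so E = 0

# PCA without centering, A = 2.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :2] * s[:2]
P = Vt[:2].T

# Fit R so that T R is as close as possible to C.
R, *_ = np.linalg.lstsq(T, C, rcond=None)
C_hat = T @ R
S_hat = P @ np.linalg.inv(R).T           # equals P R when R is orthogonal
```

In this noise-free case the estimates reproduce C and S, and their product reproduces X.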
Scores Transformation
[Figure: score plots, t2 vs. t1, points numbered 1 to 30, shown before and after transformation; one step is labeled Stretching]
Procrustes analysis results
[Figure: concentration profiles C(t) of A and B with estimates Ĉ(t) (Ahat, Bhat) vs. time (0 to 30); spectra S(λ) of A and B with estimates Ŝ(λ) (Ahat, Bhat) vs. wavelength λ, nm (220 to 340)]
Conclusions
1. Scaling and centering are problem dependent
2. In this example, the number of PCs = the number of different compounds
Regression
Principal Component Regression (PCR)
1) PCA: $X = TP^{T} + E$, with scores $T = [t_1, \dots, t_A]$ and loadings $P = [p_1, \dots, p_A]$
2) MLR: $y = Ta + e$
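The two steps can be sketched in Python as follows. This is a minimal illustration with mean centering of X and y and an SVD-based PCA; the function names `pcr_fit` and `pcr_predict` and the synthetic data are assumptions, not part of the add-in.

```python
import numpy as np

def pcr_fit(X, y, n_pcs):
    """Step 1: PCA of centered X. Step 2: regress centered y on the scores T."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    T = U[:, :n_pcs] * s[:n_pcs]
    P = Vt[:n_pcs].T
    a, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)   # y = T a + e
    return x_mean, y_mean, P, a

def pcr_predict(model, X_new):
    x_mean, y_mean, P, a = model
    return (X_new - x_mean) @ P @ a + y_mean

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))                             # synthetic predictors
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 1.0            # exactly linear response
model = pcr_fit(X, y, n_pcs=4)
y_fit = pcr_predict(model, X)
```

With all four components kept, PCR coincides with ordinary least squares and fits this noise-free response exactly; using fewer components trades some fit for stability.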
Projection on Latent Structures (PLS)
[Diagram: X with scores T = [t_1, ..., t_A], loadings P = [p_1, ..., p_A], and weights W = [w_1, ..., w_A]; Y with scores U = [u_1, ..., u_A] and loadings Q = [q_1, ..., q_A]]
$Y = TB + E$
PLS and PLS2
PLS: $y = Tb + e$, with a single response (y has 1 column)
PLS2: $Y = TB + E$, with M responses (Y has M columns)
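For the single-response case, the quantities T, W, P and q can be computed with the textbook NIPALS procedure. The sketch below is one standard formulation with mean centering; whether the add-in uses this exact algorithm is not stated in the slides, and the function name `pls1_nipals` and the synthetic data are assumptions. Here the vector q plays the role of b in $y = Tb + e$.

```python
import numpy as np

def pls1_nipals(X, y, n_pcs):
    """Textbook PLS1 via NIPALS: returns T (I x A), W (J x A), P (J x A), q (A,)."""
    X = X - X.mean(axis=0)          # work on centered copies
    y = y - y.mean()
    n_rows, n_cols = X.shape
    T = np.zeros((n_rows, n_pcs))
    W = np.zeros((n_cols, n_pcs))
    P = np.zeros((n_cols, n_pcs))
    q = np.zeros(n_pcs)
    for a in range(n_pcs):
        w = X.T @ y
        w /= np.linalg.norm(w)      # weight vector of unit length
        t = X @ w                   # X-scores
        tt = t @ t
        p = X.T @ t / tt            # X-loadings
        q[a] = (y @ t) / tt         # y-loading (regression of y on t)
        X = X - np.outer(t, p)      # deflate X
        y = y - q[a] * t            # deflate y
        T[:, a], W[:, a], P[:, a] = t, w, p
    return T, W, P, q

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 4))        # synthetic predictors
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=15)
T, W, P, q = pls1_nipals(X, y, n_pcs=2)
```

The score vectors produced this way are mutually orthogonal, and each weight vector has unit length.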
ScoresPLS
ScoresPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)
nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)
rMatrixXNew: X test set
Input X[I×J], Y[I×1]; output T[I×A]
UScoresPLS
UScoresPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)
nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)
rMatrixXNew: X test set
rMatrixYNew: Y test set
Input X[I×J], Y[I×1]; output U[I×A]
WLoadingsPLS
WLoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)
nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)
Input X[I×J], Y[I×1]; output W[J×A]
LoadingsPLS
LoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)
nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)
Input X[I×J], Y[I×1]; output P[J×A]
QLoadingsPLS
QLoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)
nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)
Input X[I×J], Y[I×1]; output Q[1×A]
ScoresPLS2
ScoresPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)
nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)
rMatrixXNew: X test set
Input X[I×J], Y[I×K]; output T[I×A]
UScoresPLS2
UScoresPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)
nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)
rMatrixXNew: X test set
rMatrixYNew: Y test set
Input X[I×J], Y[I×K]; output U[I×A]
LoadingsPLS2, WLoadingsPLS2, QLoadingsPLS2
LoadingsPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])
WLoadingsPLS2 and QLoadingsPLS2 take the same arguments
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)
nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)
Input X[I×J], Y[I×K]; output P[J×A], W[J×A], or Q[K×A], respectively
Seventh Winter Symposium on Chemometrics
near Tula city, February 2010
[Map with a distance of 100 km marked]