Post on 15-Jul-2015
Adamu Mustapha PhD
A number of multivariate geostatistical analyses are used
in environmental studies to identify spatial and temporal
variation in datasets:
1. Hierarchical Agglomerative Cluster Analysis (HACA)
2. Principal Component Analysis (PCA)
3. Multiple Linear Regression (MLR)
4. Pearson’s Product Moment Correlation Analysis
5. Discriminant Analysis
1. Hierarchical Agglomerative Cluster Analysis (HACA)
HACA is a multivariate geostatistical technique whose primary
purpose is to group similar objects based on the characteristics
they possess (Shrestha and Kazama, 2007).
The levels of similarity at which observations are merged are
used to construct a dendrogram of clusters (Singh et al., 2004;
Chen et al., 2007; Juahir et al., 2011; Mustapha et al., 2012).
The resulting clusters exhibit high internal (within-cluster)
homogeneity and high external (between-cluster) heterogeneity.
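The merging procedure can be sketched in a few lines of Python. This is a minimal illustration only: the six sites and their two variables are hypothetical, and average linkage on Euclidean distance is an assumed choice, since the slides do not state which linkage or distance was used.

```python
from itertools import combinations
from math import dist


def haca(points, n_clusters):
    """Naive agglomerative clustering with average linkage (assumed choice)."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, merge_distance), in merge order
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise Euclidean distance between the two clusters.
            d = sum(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]                          # a < b, so index a is unaffected
    return clusters, merges


# Hypothetical data: six sampling sites, two standardized variables each.
sites = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
         (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

clusters, merges = haca(sites, n_clusters=2)
print(clusters)
for left, right, height in merges:
    print(left, "+", right, "merged at distance", round(height, 3))
```

The recorded merge distances are the heights at which branches would join in a dendrogram.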
[Figure: Source apportionment of the Jakara Basin (upstream), Nigeria. Map showing Jakara Dam and sampling points S1 to S30, with legend entries for domestic, industrial and agricultural sources and a 0 to 3 km scale bar.]
2. Principal Component Analysis and/or Factor Analysis (PCA)
PCA is a multivariate geostatistical technique that
examines the underlying pattern or relationships among a large number
of variables. It is used to obtain information about the inter-relationships
among a set of variables.
PCA groups the variables into a smaller and more meaningful set
of factors.
How do we determine the number of factors to retain?
We use Kaiser's criterion, also known as the eigenvalue
rule of > 1.
We also use Cattell's scree plot.
This is a plot of the eigenvalues. Looking for the point where the plot
becomes horizontal, Cattell recommends retaining all the
factors above this point.
Factors with eigenvalues greater than 1 contribute the most
variance in the dataset.
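As a rough illustration of Kaiser's criterion, the sketch below extracts the eigenvalues of a small correlation matrix with NumPy and keeps those greater than 1. The 4 x 4 matrix is hypothetical and chosen only so that the result is easy to check by hand.

```python
import numpy as np

# Hypothetical correlation matrix for four variables (not from the study):
# variables 1-2 are correlated with each other, as are variables 3-4.
R = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

# Eigenvalues of the correlation matrix, sorted from largest to smallest.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser's criterion: retain only components with eigenvalue > 1.
retained = eigenvalues[eigenvalues > 1.0]

# Percentage of total variance explained by each component.
explained = 100 * eigenvalues / eigenvalues.sum()

print("eigenvalues:", eigenvalues)
print("retained:", len(retained))
print("% variance:", explained)
```

Plotting `eigenvalues` against component number would give the scree plot used for Cattell's test.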
The important parameters in each factor are those with high factor
loadings. Liu et al. (2003) suggest the following classification of
loadings:
0 – 0.4 Low loading
0.5 – 0.7 Moderate loading
> 0.7 High loading
Parameter    Unit      PC1      PC2      PC3      PC4      PC5
Pb           mg L-1    0.960   -0.111    0.105    0.060   -0.098
Cd           mg L-1    0.953   -0.101    0.145   -0.049   -0.040
Cr           mg L-1    0.940   -0.037    0.148   -0.049   -0.060
Hg           mg L-1    0.854   -0.070    0.020   -0.086   -0.299
Fe           mg L-1    0.706    0.172    0.327   -0.353   -0.078
EC           µS/cm    -0.659   -0.139   -0.181    0.094   -0.144
Ni           mg L-1    0.620   -0.014   -0.331    0.602   -0.234
BOD5         mg L-1    0.537    0.594   -0.114   -0.399    0.399
DS           mg L-1   -0.156    0.835    0.557    0.074   -0.167
TS           mg L-1    0.042    0.670    0.514    0.133   -0.036
pH           -        -0.418    0.260    0.633    0.217   -0.073
DO           mg L-1    0.166   -0.583   -0.617   -0.017    0.240
COD          mg L-1    0.332    0.527    0.565    0.187    0.090
Turbidity    NTU       0.342   -0.009    0.126    0.788    0.191
Hardness     mg L-1    0.178   -0.257    0.367    0.162    0.809
Eigenvalue             5.53     2.21     1.96     1.42     1.13
% Variance            36.91    14.79    13.06     9.49     7.57
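The loading classes can be applied to the PC1 column of the table as in the sketch below. Two points are assumptions rather than something the slides state: the sign of a loading is ignored (absolute value is used), and loadings falling in the unlisted gap between 0.4 and 0.5 are treated as low.

```python
def classify_loading(loading):
    """Classify a factor loading by absolute size.

    Assumption: values in the unlisted 0.4-0.5 gap count as "low".
    """
    size = abs(loading)
    if size > 0.7:
        return "high"
    if size >= 0.5:
        return "moderate"
    return "low"


# PC1 loadings copied from the table above.
pc1 = {
    "Pb": 0.960, "Cd": 0.953, "Cr": 0.940, "Hg": 0.854, "Fe": 0.706,
    "EC": -0.659, "Ni": 0.620, "BOD5": 0.537, "DS": -0.156, "TS": 0.042,
    "pH": -0.418, "DO": 0.166, "COD": 0.332, "Turbidity": 0.342,
    "Hardness": 0.178,
}

high = [name for name, value in pc1.items()
        if classify_loading(value) == "high"]
moderate = [name for name, value in pc1.items()
            if classify_loading(value) == "moderate"]

# Cumulative variance explained by the five retained components.
variance = [36.91, 14.79, 13.06, 9.49, 7.57]
cumulative = round(sum(variance), 2)

print("High on PC1:", high)
print("Moderate on PC1:", moderate)
print("Cumulative % variance:", cumulative)
```

Under these assumptions PC1 is dominated by the heavy metals, and the five components together account for the summed percentage in the last row of the table.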
3. Multiple Linear Regression (MLR)
MLR is used to fit a model to the data and use it to predict the
value of Y (the dependent variable, DV) from one or more independent variables (IVs),
that is, to predict an outcome from one or several predictors.
The least squares method (LSM) is used to establish the line that
best describes the data.
Friday, November 28, 2014 11
Regression analysis derives a prediction equation of the form:
Y = b0 + b1x1 + b2x2 + b3x3 + ... + bpxp
Where:
Y = dependent variable
x1, ..., xp = independent variables
b0 = Y-intercept
b1, ..., bp = regression coefficients
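A least squares fit of this kind of equation can be sketched with NumPy as follows. The data are hypothetical and generated exactly from Y = 2 + 3x1 - x2, so the recovered coefficients can be checked by eye; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical predictors (IVs) and an outcome built from known coefficients.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 2.0 + 3.0 * x1 - 1.0 * x2

# Design matrix: a column of ones for the intercept b0, then the IVs.
A = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: minimizes the sum of squared residuals.
coef, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef

# Predicted values from the fitted equation.
y_pred = A @ coef

print("b0, b1, b2 =", b0, b1, b2)
```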
Before interpreting the results of MLR, the assumptions of
regression analysis must be checked, i.e. normality, linearity
and multicollinearity (Berry, 1993).
The normal P-P plot of regression standardized residuals shows that all
observed values fall roughly along the straight line. This indicates that the
residuals come from a normally distributed population.
Assumptions of the linear regression model
Collinearity/Multicollinearity
This is a problem of correlation between IVs: when
IVs are highly correlated, it is difficult to
determine the contribution of each IV. It is assessed with:
Tolerance value
Variance Inflation Factor (VIF)
Condition index
a. Tolerance
This is the amount of variability in an IV not explained by the other IVs.
A small tolerance value (smaller than 0.10) indicates high
multicollinearity.
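b. Variance Inflation Factor (VIF)
This is the inverse of the tolerance (VIF = 1/tolerance), so its minimum
value is 1.0. A tolerance below 0.10 corresponds to a VIF above 10, which
indicates high multicollinearity.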
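Tolerance and VIF can be computed by regressing each IV on the remaining IVs, as in the sketch below. The three predictors are simulated (hypothetical), with the second deliberately built as a near copy of the first; the helper name `tolerance_and_vif` is illustrative and not taken from any package.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return (tolerance, VIF) for each column of the predictor matrix X."""
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress IV j on the other IVs (with an intercept).
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        tol[j] = 1.0 - r2  # share of variability NOT explained by other IVs
    return tol, 1.0 / tol


# Simulated predictors: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

tol, vif = tolerance_and_vif(X)
flagged = tol < 0.10  # tolerance threshold quoted in the text

print("tolerance:", tol)
print("VIF:", vif)
print("flagged:", flagged)
```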
c. Condition index
The condition index (CI) is a measure of the relative amount of
variance associated with an eigenvalue. A large CI indicates a
high degree of collinearity.
A CI greater than 15 indicates a possible problem and an
index greater than 30 suggests a serious problem with collinearity
(Kutner et al., 2004).
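One common way to obtain condition indices is sketched below: scale each column of the design matrix (intercept included) to unit length, take its singular values, and divide the largest by each of them. This scaling convention is an assumption, since the slides do not say how the indices were computed, and the simulated predictors are hypothetical.

```python
import numpy as np


def condition_indices(X):
    """Condition indices of the design matrix [1, X] with unit-length columns."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    A = A / np.linalg.norm(A, axis=0)       # scale every column to length 1
    s = np.linalg.svd(A, compute_uv=False)  # s**2 are the eigenvalues of A'A
    return s.max() / s                      # equals sqrt(lambda_max / lambda_k)


# Simulated predictors: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.02 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

ci = condition_indices(X)
print("condition indices:", ci)
print("possible problem (CI > 15):", bool((ci > 15).any()))
print("serious problem (CI > 30):", bool((ci > 30).any()))
```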
Model summary (R = 0.986, R2 = 0.971); the columns from R Square Change to Sig. F Change are the change statistics.
Model  R      R Square  Adjusted R Square  SE of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      0.986  0.971     0.840              2.331               0.971            7.382     15   5    0.018          2.651
Model             Unstandardized BETA  Std. Error  Standardized BETA  t       Sig.   Tolerance  VIF
1 (Constant)      102.748              39.602                         2.594   0.018
  Iron mg/l       0.438                0.127       0.778              3.449   0.000  0.250      15.897
  Mercury mg/l    2.442                1.906       3.500              1.281   0.000  0.333      1304.69
  Chromium mg/l   -0.852               0.672       -3.188             -1.267  0.000  0.290      1105.85
  Cadmium mg/l    -5.695               2.019       -11.900            -2.821  0.000  0.540      3110.806
  Lead mg/l       3.719                1.317       12.358             2.823   0.001  0.889      3350.478
Estimates of coefficients for the model
From the table, the largest beta coefficient is 3.719 (lead); this
variable makes the strongest unique contribution to explaining the DV.
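4. Pearson’s Product Moment Correlation Analysis
This identifies significant relationships between pairs of variables (bivariate relationships). The correlation coefficient r is computed as:
$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$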
r value Interpretation
0.0 to 0.29 Negligible or little correlation
0.3 to 0.49 Low correlation
0.5 to 0.69 Moderate or marked correlation
0.7 to 0.89 High correlation
0.9 to 1.00 Very high correlation
Table 2 Guilford's rule of thumb for interpreting the correlation coefficient (r)
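A small worked example may help here. The sketch below computes r from the usual sums (n, Σx, Σy, Σxy, Σx², Σy²) and then looks the result up in the rule-of-thumb table. The five data pairs are hypothetical, and two choices are assumptions: the absolute value of r is used for interpretation, and values falling between two listed bands (for example 0.295) are assigned to the lower band.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the computational (raw sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def interpret(r):
    """Rule-of-thumb label for |r| (band edges are an assumption)."""
    size = abs(r)
    if size < 0.3:
        return "Negligible or little correlation"
    if size < 0.5:
        return "Low correlation"
    if size < 0.7:
        return "Moderate or marked correlation"
    if size < 0.9:
        return "High correlation"
    return "Very high correlation"


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4), interpret(r))
```

For these values the sums are n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30 / √1500.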
Correlation matrix (upper triangle); column numbers refer to the numbered row variables.
Variable          (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)     (12)     (13)
1 Headache        1        0.505    0.547    .788**   0.575    .679*    0.308    0.191    -0.419   0.268    0.043    0.184    -0.049
2 Fever                    1        0.624    0.525    0.571    .786**   0.183    0.498    0.085    .686*    0.204    .762*    0.619
3 Backpain                          1        0.537    .862**   .849**   .701*    0.543    0.156    .823**   0.056    0.344    0.352
4 JointPain                                  1        0.551    .672*    0.246    0.181    -0.246   0.422    -0.443   0.344    -0.171
5 StickInjuries                                       1        .827**   0.452    0.369    0.301    .778**   -0.083   0.207    0.064
6 Scabies                                                      1        0.58     0.389    -0.087   .816**   0.023    0.57     0.367
7 Rashes                                                                1        0.23     -0.058   0.528    -0.047   0.134    0.324
8 Catarrh                                                                        1        0.308    0.441    0.18     .656*    0.437
9 Cough                                                                                   1        0.314    -0.221   -0.012   0.054
10 Breathprob                                                                                      1        -0.141   0.534    0.346
11 Diarrhoea                                                                                                1        0.058    0.624
12 EyeProblem                                                                                                        1        0.602
13 StomachPain                                                                                                                1
Thank you