Post on 08-Apr-2018
8/6/2019 Hydrologic regionalization
Hydrologic Regionalization With Clustering
By
Nirdesh Kumar-06004008
Introduction
Regionalization
Areas with homogeneous hydrologic response.
Applications-hydrologic design, planning, management of water resources
systems, regional trend analysis and frequency analysis of floods, low flows and
other variables.
Attributes-factors influencing hydrology in the area.
Physiographic-drainage area, slope of the main channel in the drainage
basin, soil runoff coefficient and storage.
Location-latitude, longitude and elevation.
Meteorological-specific humidity, temperature, wind velocity, wind
direction and rainfall.
Feature vectors-sites represented by the values of the selected attributes.
Cluster-Region containing feature vectors with similar hydrologic response.
Optimum number of clusters obtained by application of cluster validity indices.
Tests applied to check the homogeneity of the region-Regional homogeneity test.
Regions adjusted to improve homogeneity.
Selection of variables influencing the hydrology in a region as attributes
Preparation of feature vectors using selected variables
Formation of clusters by applying clustering algorithm
Identification of optimum number of clusters
Validation of regions to test their homogeneity
Adjustment of heterogeneous regions
Clustering Techniques
Clustering-Variety of multivariate statistical procedures that are used to
investigate, interpret and classify given data into similar groups or clusters, which
may or may not be overlapping.
The data points within a cluster should be as similar as possible and the data
points of different clusters should be as dissimilar as possible.
Various algorithms are used for clustering: K-means, single linkage,
complete linkage and Ward's algorithm.
Clustering Algorithms
K-Means Algorithm
N feature vectors in n-dimensional attribute space.
$y_{ij}$ is the value of attribute j in the ith feature vector.
Each feature vector represents one of the N sites in the study region.
Rescaling-process necessary to nullify the effects of differences in the
variance and relative magnitudes of the attributes.
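$x_{ij}$ denotes the rescaled value of the raw attribute value $y_{ij}$:
$$x_{ij} = \frac{y_{ij} - \bar{y}_j}{\sigma_j}$$
$\sigma_j$ - standard deviation of attribute j.
$\bar{y}_j$ - mean value of attribute j over all N feature vectors.
The K-means algorithm minimizes the objective function
$$F = \sum_{k=1}^{K} \sum_{j=1}^{n} \sum_{i=1}^{N_k} \left( x_{ij}^{k} - x_{\cdot j}^{k} \right)^2$$
K - number of clusters.
$N_k$ - number of feature vectors in cluster k.
$x_{ij}^{k}$ - rescaled value of attribute j in the feature vector i assigned to cluster k.
$x_{\cdot j}^{k}$ - mean value of attribute j for cluster k, computed as
$$x_{\cdot j}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} x_{ij}^{k}$$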
By minimizing F, the distance of each feature vector from the centre of the cluster
to which it belongs is minimized.
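A minimal Python sketch of the rescaling step and of the objective function F may help here. It assumes that rescaling means standardizing each attribute to zero mean and unit standard deviation, and that F is the sum of squared deviations of the rescaled feature vectors from their cluster centroids. The function names `rescale` and `objective_f` and the four sample sites are hypothetical and do not come from the study.

```python
import math

def rescale(vectors):
    """Standardize each attribute to zero mean and unit standard deviation."""
    n_sites = len(vectors)
    n_attrs = len(vectors[0])
    means = [sum(v[j] for v in vectors) / n_sites for j in range(n_attrs)]
    # Assumption: population standard deviation (divide by N). A constant
    # attribute would give a zero divisor and should be dropped beforehand.
    stds = [
        math.sqrt(sum((v[j] - means[j]) ** 2 for v in vectors) / n_sites)
        for j in range(n_attrs)
    ]
    return [[(v[j] - means[j]) / stds[j] for j in range(n_attrs)] for v in vectors]

def objective_f(vectors, labels, k):
    """Sum of squared deviations of feature vectors from their cluster centroids."""
    n_attrs = len(vectors[0])
    total = 0.0
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(m[j] for m in members) / len(members) for j in range(n_attrs)]
        total += sum((m[j] - centroid[j]) ** 2 for m in members for j in range(n_attrs))
    return total

# Hypothetical sites: (drainage area, main channel slope)
sites = [[100.0, 0.01], [120.0, 0.02], [900.0, 0.20], [950.0, 0.25]]
scaled = rescale(sites)
f_good = objective_f(scaled, [0, 0, 1, 1], k=2)  # similar sites grouped together
f_bad = objective_f(scaled, [0, 1, 0, 1], k=2)   # dissimilar sites grouped together
print(f_good < f_bad)
```

Without rescaling, the drainage area would dominate the distance simply because its numerical range is larger than that of the slope.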
Steps involved in K-means algorithm to delineate clusters for a given value of K
are:
1- Set current iteration number t to 0 and maximum number of iterations to t_max.
2- Initialize K cluster centers to random values in the multidimensional feature vector
space.
3- Initialize the current feature vector number i to 1.
4- Determine the Euclidean distance of the ith feature vector from the centers of each of
the K clusters, and assign it to the cluster whose center is nearest to it.
5- If i < N, increment i to i + 1 and go to step 4; otherwise continue with step 6.
6- Update the centroid of each cluster by computing the average of the feature vectors
assigned to it. Then compute F for the current iteration t. If t = 0, increase t to t + 1
and go to step 3. If t > 0, compute the difference in the values of F for iterations t and
t - 1. Terminate the algorithm if the change in the value of F between two successive
iterations is insignificant; otherwise, continue with step 7.
7- If t < t_max, update t to t + 1 and go to step 3; otherwise, terminate the algorithm.
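The steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in the study. It assumes that the initial centers are K feature vectors drawn at random from the data (one common way to pick "random values" in the feature space), that an empty cluster keeps its previous center, and that a hypothetical tolerance `tol` defines an "insignificant" change in F.

```python
import random

def kmeans(vectors, k, t_max=100, tol=1e-9, seed=0):
    """Plain K-means following steps 1-7; returns (labels, centres, F)."""
    rng = random.Random(seed)
    n_attrs = len(vectors[0])
    # Step 2 (assumption): use k distinct feature vectors as initial centres.
    centres = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    prev_f = None
    for t in range(t_max + 1):  # steps 1 and 7: at most t_max + 1 passes
        # Steps 3-5: assign every vector to its nearest centre. Squared
        # distance is used because it gives the same nearest centre.
        for i, v in enumerate(vectors):
            dists = [sum((v[j] - c[j]) ** 2 for j in range(n_attrs)) for c in centres]
            labels[i] = dists.index(min(dists))
        # Step 6: update the centroids, then compute F.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(m[j] for m in members) / len(members)
                              for j in range(n_attrs)]
        f = sum(sum((v[j] - centres[lab][j]) ** 2 for j in range(n_attrs))
                for v, lab in zip(vectors, labels))
        if prev_f is not None and abs(prev_f - f) <= tol:
            break  # change in F between successive iterations is insignificant
        prev_f = f
    return labels, centres, f

# Hypothetical rescaled feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centres, f = kmeans(points, k=2)
print(labels, round(f, 4))
```

Because the result depends on the initial centers, in practice the algorithm is often run from several random starts and the partition with the smallest F is kept.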
Single linkage and complete linkage algorithms
Single linkage-Distance between the cluster [yi, yj], formed by merging clusters yi
and yj, and any other cluster yk is the smaller of the distances between yi and yk or
yj and yk.
Complete linkage-Distance between the new cluster [yi, yj] and any other singleton
cluster yk is the greater of the distances between yi and yk or yj and yk.
Figure- Single linkage and complete linkage.
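The two update rules differ only in taking the minimum or the maximum of the two distances. The sketch below applies them in a small agglomerative loop over a distance matrix. The function name `agglomerate` and the six one-dimensional sample sites are hypothetical, and the same update rule is assumed to apply when yk is itself a merged cluster.

```python
def agglomerate(dist, n_clusters, linkage="single"):
    """Merge the closest pair of clusters until n_clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise site distances.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    update = min if linkage == "single" else max
    clusters = {i: [i] for i in range(len(dist))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    while len(clusters) > n_clusters:
        a, b = min(d, key=d.get)  # closest pair of clusters, with a < b
        for c in clusters:
            if c not in (a, b):
                key_ac = (min(a, c), max(a, c))
                key_bc = (min(b, c), max(b, c))
                # Distance from the merged cluster [a, b] to cluster c.
                d[key_ac] = update(d[key_ac], d.pop(key_bc))
        del d[(a, b)]
        clusters[a] = clusters[a] + clusters.pop(b)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical sites on a line with slowly growing gaps between neighbours.
positions = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0]
dist = [[abs(p - q) for q in positions] for p in positions]
single = agglomerate(dist, 2, "single")
complete = agglomerate(dist, 2, "complete")
print(single)    # [[0, 1, 2, 3, 4], [5]]
print(complete)  # [[0, 1, 2, 3], [4, 5]]
```

On this sample, single linkage chains neighbouring sites into one long cluster, whereas complete linkage produces more compact groups.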
Ward's algorithm
Ward's algorithm minimizes the objective function, W, defined as the sum of squares of
deviations of the feature vectors from the centroid of their respective clusters.
At each step in the analysis, union of every possible pair of clusters is considered
and two clusters whose fusion results in the smallest increase in W are merged.
The change depends only on the relationship between the two merged clusters
and not on the relationships with other clusters.
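The statement that the change in W depends only on the two merged clusters can be checked numerically. For clusters A and B with sizes n_A and n_B and centroids c_A and c_B, the standard result is that the increase in W equals n_A n_B / (n_A + n_B) times the squared distance between c_A and c_B. The sketch below compares this formula with a direct computation; all function names and sample points are hypothetical.

```python
def centroid(members):
    n = len(members)
    return [sum(m[j] for m in members) / n for j in range(len(members[0]))]

def sse(members):
    """Sum of squared deviations of the members from their centroid."""
    c = centroid(members)
    return sum((m[j] - c[j]) ** 2 for m in members for j in range(len(c)))

def ward_increase(a, b):
    """Increase in W caused by merging clusters a and b (closed form)."""
    ca, cb = centroid(a), centroid(b)
    gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * gap

def best_merge(clusters):
    """Indices of the pair whose fusion gives the smallest increase in W."""
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda p: ward_increase(clusters[p[0]], clusters[p[1]]))

# Hypothetical clusters of rescaled feature vectors.
clusters = [[[0.0, 0.0], [0.0, 2.0]], [[1.0, 1.0]], [[10.0, 10.0], [12.0, 10.0]]]
direct = sse(clusters[0] + clusters[1]) - sse(clusters[0]) - sse(clusters[1])
closed_form = ward_increase(clusters[0], clusters[1])
pair = best_merge(clusters)
print(round(direct, 6), round(closed_form, 6), pair)
```

The closed form avoids recomputing W over all clusters at every step, which is why only the two candidate clusters need to be examined.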
Cluster Validity Indices
Identification of optimum number of compact and well separated clusters.
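Dunn's index
$$D = \min_{1 \le i \le K} \; \min_{1 \le j \le K,\, j \ne i} \left\{ \frac{\delta(C_i, C_j)}{\max_{1 \le k \le K} \Delta(C_k)} \right\}$$
$\delta(C_i, C_j)$ - Distance between clusters $C_i$ and $C_j$.
$\Delta(C_k)$ - Intracluster distance of cluster $C_k$.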
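A small sketch of Dunn's index follows. The slides do not say how the two distances are measured, so the sketch assumes the common choices: the distance between two clusters is the smallest distance between their members, and the intracluster distance is the cluster diameter (largest distance between two members). The function name `dunn_index` and the sample partitions are hypothetical.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    # Assumption: between-cluster distance = closest pair of members.
    inter = min(
        math.dist(p, q)
        for ci, cj in combinations(clusters, 2)
        for p in ci
        for q in cj
    )
    # Assumption: intracluster distance = diameter of the cluster.
    intra = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if intra == 0.0:
        raise ValueError("Dunn's index is undefined when every cluster is a single point")
    return inter / intra

# Hypothetical partitions: compact, well separated clusters versus loose ones.
compact = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 3.0)], [(4.0, 0.0), (4.0, 3.0)]]
d_compact = dunn_index(compact)
d_loose = dunn_index(loose)
print(d_compact, round(d_loose, 4))
```

Larger values indicate more compact and better separated clusters, so the partition with the largest index is preferred.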
Regional Homogeneity Test
Heterogeneity of the set of plausible regions obtained from the cluster analysis
is assessed.
Uses the advantages offered by sampling properties of L-moment ratios (LMRs).
Examines whether the between-site dispersion of the sample LMRs for the
group of sites under consideration is larger than the dispersion expected in a
homogeneous region.
$t^R$ - Regional average coefficient of L-variation (L-CV).
$t_3^R$ - Regional average L-skewness.
$t_4^R$ - Regional average L-kurtosis.
Weight applied to sample L-moment ratios at site i.
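As an illustration, the sample L-moment ratios at a site and their regional weighted averages can be computed as sketched below. The sketch uses the standard unbiased probability-weighted-moment estimators and assumes that each site is weighted by its record length, which is the usual choice in regional frequency analysis but is not stated on the slide. The function names and sample values are hypothetical.

```python
def sample_lmoment_ratios(data):
    """Return (L-CV, L-skewness, L-kurtosis) of a sample with at least 4 values."""
    x = sorted(data)
    n = len(x)
    # Unbiased probability weighted moments b0..b3 (i is the 0-based rank).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (
        n * (n - 1) * (n - 2) * (n - 3))
    # Sample L-moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l2 / l1, l3 / l2, l4 / l2

def regional_average(site_ratios, record_lengths):
    """Weighted averages (t^R, t3^R, t4^R); weights are the record lengths."""
    total = sum(record_lengths)
    return tuple(
        sum(n * r[m] for n, r in zip(record_lengths, site_ratios)) / total
        for m in range(3)
    )

# Hypothetical check: an evenly spaced sample is symmetric, so L-skewness is 0.
t, t3, t4 = sample_lmoment_ratios([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical at-site ratios for two sites with 10 and 30 years of record.
t_r, t3_r, t4_r = regional_average([(0.2, 0.1, 0.1), (0.4, 0.3, 0.2)], [10, 30])
print(round(t, 4), round(t3, 4), round(t_r, 4), round(t3_r, 4), round(t4_r, 4))
```

Weighting by record length gives sites with longer records, whose sample ratios are more reliable, a larger influence on the regional averages.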
Heterogeneity measures (HM) can be based on three measures of dispersion:
(1) weighted standard deviation of the at-site sample L-CVs ($V_1$);
(2) weighted average distance from the site to the group weighted mean in the
two-dimensional space of L-CV and L-skewness ($V_2$);
(3) weighted average distance from the site to the group weighted mean in the
two-dimensional space of L-skewness and L-kurtosis ($V_3$).
For each simulated realization (homogeneous region), $V_1$, $V_2$ and $V_3$ are computed.
$\mu_{V_1}$, $\mu_{V_2}$, $\mu_{V_3}$ are the means and $\sigma_{V_1}$, $\sigma_{V_2}$, $\sigma_{V_3}$ are the standard deviations of the
values computed from the simulated realizations.
HM
$$H_m = \frac{V_m - \mu_{V_m}}{\sigma_{V_m}}, \quad m = 1, 2, 3$$
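The computation can be sketched as follows. Generating the simulated homogeneous regions is outside the scope of this sketch, so the simulated values of a dispersion measure are simply passed in as a list. The thresholds in `classify` (below 1 acceptably homogeneous, 1 to 2 possibly homogeneous, 2 or more definitely heterogeneous) are the commonly used Hosking and Wallis guideline and are an assumption here, since the slides do not state them. All names and numbers are hypothetical.

```python
import math
import statistics

def dispersion_measures(site_ratios, weights):
    """Return (V1, V2, V3) from at-site (L-CV, L-skewness, L-kurtosis) triples."""
    total = sum(weights)
    t_r, t3_r, t4_r = (
        sum(w * r[m] for w, r in zip(weights, site_ratios)) / total for m in range(3)
    )
    # V1: weighted standard deviation of the at-site L-CVs.
    v1 = math.sqrt(sum(w * (r[0] - t_r) ** 2 for w, r in zip(weights, site_ratios)) / total)
    # V2: weighted average distance in the (L-CV, L-skewness) plane.
    v2 = sum(w * math.hypot(r[0] - t_r, r[1] - t3_r)
             for w, r in zip(weights, site_ratios)) / total
    # V3: weighted average distance in the (L-skewness, L-kurtosis) plane.
    v3 = sum(w * math.hypot(r[1] - t3_r, r[2] - t4_r)
             for w, r in zip(weights, site_ratios)) / total
    return v1, v2, v3

def heterogeneity_measure(v_observed, v_simulated):
    """Standardize the observed dispersion by the simulated mean and std. dev."""
    return (v_observed - statistics.mean(v_simulated)) / statistics.stdev(v_simulated)

def classify(h):
    # Assumed guideline thresholds; not taken from the slides.
    if h < 1:
        return "acceptably homogeneous"
    if h < 2:
        return "possibly homogeneous"
    return "definitely heterogeneous"

# Hypothetical at-site ratios for two equally weighted sites.
v1, v2, v3 = dispersion_measures([(0.2, 0.1, 0.1), (0.4, 0.3, 0.2)], [1, 1])

# Hypothetical simulated values of V1 and an observed value of 0.05.
simulated_v1 = [0.020, 0.025, 0.030, 0.035, 0.040]
h1 = heterogeneity_measure(0.05, simulated_v1)
print(round(v1, 4), round(v2, 4), round(v3, 4), round(h1, 2), classify(h1))
```

A large positive value means the observed between-site dispersion is much greater than what sampling variability alone would produce in a homogeneous region.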
(6) Merging a region with another or others;
(7) Merging two or more regions and redefining groups;
(8) Obtaining more data and redefining regions.
First three options are useful in reducing the values of heterogeneity measures of a region.
Options 7 help in ensuring that each region is sufficiently large in terms of collective
data length at all the sites in it.
Results and Discussion
The statistical homogeneity of each of the five India Meteorological Department (IMD)
summer monsoon rainfall (SMR) regions is tested using SMR data at grid points in the
region, as shown in Table 1.
The IMD regions are adjusted to improve their homogeneity, and the results are tabulated in Table 2.
Figure 2 shows the number of sites removed to make the regions acceptably
homogeneous.
Serial Number | Region Name | Number of Grid Points | HM 1 | HM 2 | HM 3 | Region Type
1 | Peninsular | 4 | 23.28 | 5. 3 | 0.26 | Definitely heterogeneous
2 | West Central | 86 | 10.8 | 0.64 | -1.33 | Definitely heterogeneous
3 | Northwest | 6 | 20. 6 | 5.87 | -1.08 | Definitely heterogeneous
4 | Central Northeast | 5 | 4.32 | -0.73 | -1. 0 | Definitely heterogeneous
5 | Northeast | 36 | 4.44 | -0. 1 | 1.06 | Definitely heterogeneous
Table 1- Characteristics of the IMD SMR Regions Determined Using Heterogeneity Measures
Figure 1- SMR regions that are considered as homogeneous by IMD
Figure 2- SMR regions after adjusting
Serial Number | Region Name | Number of Grid Points | HM 1 | HM 2 | HM 3 | Number of Grid Points Eliminated
1 | Peninsular | 27 | 0.75 | -0.34 | 1.35 | 22
2 | West Central | 62 | 0.80 | -1.17 | -2.03 | 24
3 | Northwest | 40 | 0.84 | -0.86 | -1. 0 | 2
4 | Central Northeast | 45 | 0.74 | -0.86 | -1.47 | 14
5 | Northeast | 32 | 0.45 | -1.30 | -1.06 | 04
Table 2- Characteristics of SMR regions after adjusting
To delineate new homogeneous SMR regions in the study region, 52 out of 60 NCEP
grid boxes covering India are considered.
Rain gauge density is low in the Himalayan region (8 boxes discarded).
Mean monthly values of each of the 15 atmospheric variables are considered at each
NCEP grid point for the summer monsoon months.
960 values (15 variables * 16 grid points * 4 months) are obtained for each grid box.
To reduce redundancy, the principal components of these values and the standardized
location attributes (latitude, longitude, and average elevation of terrain in each of the
NCEP grid boxes) are considered as attributes to form 52 feature vectors for K-means
cluster analysis.
Figure 3- Grid boxes covering India.
Atmospheric variables influencing rainfall in the hashed box are considered at 16
NCEP grid points shown as black dots surrounding the box.
To determine the number of regions, the K-means algorithm is applied and cluster
validity indices are computed to identify the optimum number of clusters.
Figure 4- Identification of optimal partition
provided by K-means clustering algorithm
The partition with the minimum value for the Davies-Bouldin index and the
maximum value for Dunn's and Calinski-Harabasz indices is considered
as the optimal partition.
Several of the clusters obtained using the K-means algorithm for the
choice of K greater than 15 are found to be quite small in size; therefore
the clusters obtained for K = 15 are selected as the optimal partition.
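To make the selection rule concrete, the sketch below computes the Davies-Bouldin and Calinski-Harabasz indices for two candidate partitions of the same points and picks the better one under each index. It uses the standard textbook definitions of both indices, which the slides do not spell out, and the function names and sample points are hypothetical. Dunn's index would be handled the same way, keeping the partition with the largest value.

```python
import math

def centroid(points):
    return tuple(sum(p[j] for p in points) / len(points) for j in range(len(points[0])))

def davies_bouldin(clusters):
    """Average over clusters of the worst (scatter_i + scatter_j) / separation_ij."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, ce) for p in c) / len(c) for c, ce in zip(clusters, cents)]
    k = len(clusters)
    return sum(
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k

def calinski_harabasz(clusters):
    """Ratio of between-cluster to within-cluster dispersion, scaled by degrees of freedom."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    overall = centroid(all_points)
    between = sum(len(c) * math.dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical data: two tight groups of four points each.
a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
b = [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)]
partitions = {2: [a, b], 4: [a[:2], a[2:], b[:2], b[2:]]}

db = {k: davies_bouldin(p) for k, p in partitions.items()}
ch = {k: calinski_harabasz(p) for k, p in partitions.items()}
best_by_db = min(db, key=db.get)  # smaller Davies-Bouldin is better
best_by_ch = max(ch, key=ch.get)  # larger Calinski-Harabasz is better
print(best_by_db, best_by_ch)
```

When the indices disagree, or when the preferred K yields very small clusters as noted above, the final choice remains a judgment call.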
Figure 5- Clusters in optimal partition obtained using K-means
algorithm.
Cluster Number | Cluster Size (in Number of IMD Grid Points) | HM 1 | HM 2 | HM 3
1 | 15 | -1.56 | -0.46 | 0.03
2 | 2 | 10. 1 | 2.7 | -0.83
3 | 22 | 17.27 | 5.46 | 1.22
4 | 25 | .65 | 1.03 | -0.33
5 | 38 | 5.20 | -1.71 | -3.11
6 | 53 | 5.43 | -0.40 | -1.78
7 |  | 4.27 | 0.36 | -1.07
8 | 6 | -0.43 | -1.85 | -1.55
9 | 13 | 6.08 | -1.22 | -1.86
10 | 53 | 12.37 | 0.30 | -2.17
11 |  | 8.51 | 6.18 | 4.55
12 | 4 | 2.45 | 0.86 | -0.14
13 | 46 | 12.67 | 2.74 | -0.16
14 | 20 | -0.20 | -1.24 | -1.07
15 | 11 | 2.46 | -0.05 | -1.06
Table 3- Characteristics of the Clusters in Optimal Partition
Obtained Using K-Means Algorithm.
Table 3 shows that clusters 8 and 14 are found to be acceptably homogeneous,
cluster 1 is possibly homogeneous, whereas the remaining clusters are heterogeneous.
Overall, 23 out of the 301 IMD grid points considered for regionalization are
unallocated, as they are eliminated from different regions to improve statistical
homogeneity.
Six sites are transferred to other regions, and 33 sites are separated from clusters to
form new regions.
Table 4- Details of Region Formation From Optimal Partition Obtained Using K-Means
Algorithm.
The regions are adjusted and all 17 regions are classified as either acceptably
homogeneous or possibly homogeneous.
Table 5- Characteristics of the Regions Formed by Adjusting Clusters Obtained Using
K-Means Algorithm
Figure 6- Homogeneous rainfall regions obtained by adjusting the clusters
The number of sites that had to be eliminated from the IMD SMR regions to
improve their statistical homogeneity is excessive, indicating that the IMD
SMR regions are not useful as precursors to derive homogeneous SMR regions.
New SMR regions are delineated using the proposed methodology.
Conclusion
Existing approaches are based on statistics computed from observed hydrology.
Independent validation of the delineated regions for homogeneity in hydrology is not
possible.
Uncertainty in forming homogeneous regions in areas with limited hydrological
data.
The proposed method can form regions irrespective of the available data
(rain gauges for this study).
However, as seen in this study, there is uncertainty in validating homogeneous regions in
areas having few rain gauges.
Thank You