Date posted: 18-Jan-2018
Mr. Idrissa Y. H.
Assistant Lecturer, Geography & Environment
Department of Social Sciences
School of Natural & Social Sciences
State University of Zanzibar
LECTURE 8a: SPATIAL STATISTICAL ANALYSIS
Introduction to spatial analysis
Judging spatial association visually
The concept of Clustering and Cluster analysis
Spatial Cross-Correlation
  Pearson, Spearman
Multivariate spatial association measures
Spatial statistics extends traditional statistics on two fronts. First, it seeks to map the variation in a data set. Second, it can uncover “numerical spatial relationships” within and among mapped data layers.
Tobler’s Law: “Everything is related to everything else, but near things are more related than distant things.”
3 major benefits of spatial analysis
  Pattern Analysis
  Feature count Analysis
May 6, 2023 Data Mining: Concepts and Techniques 4
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. Density-Based Methods
7. Grid-Based Methods
8. Model-Based Methods
9. Clustering High-Dimensional Data
10. Constraint-Based Clustering
11. Outlier Analysis
12. Summary
Cluster: a collection of data objects
  Similar to one another within the same cluster
  Dissimilar to the objects in other clusters
Cluster analysis
  Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
Unsupervised learning: no predefined classes
Typical applications
  As a stand-alone tool to get insight into data distribution
  As a preprocessing step for other algorithms
Pattern Recognition
Spatial Data Analysis
  Create thematic maps in GIS by clustering feature spaces
  Detect spatial clusters or for other spatial mining tasks
Image Processing
Economic Science (especially market research)
WWW
  Document classification
  Cluster Weblog data to discover groups of similar access patterns
Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
Land use: Identification of areas of similar land use in an earth observation database
Insurance: Identifying groups of motor insurance policy holders with a high average claim cost
City-planning: Identifying groups of houses according to their house type, value, and geographical location
Earthquake studies: Observed earthquake epicenters should be clustered along continental faults
A good clustering method will produce high-quality clusters with
  high intra-class similarity
  low inter-class similarity
The quality of a clustering result depends on both the similarity measure used by the method and its implementation
The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
Dissimilarity/Similarity metric: Similarity is expressed in terms of a distance function, which is typically a metric: d(i, j)
There is a separate “quality” function that measures the “goodness” of a cluster.
The definitions of distance functions are usually very different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables.
Weights should be associated with different variables based on applications and data semantics.
It is hard to define “similar enough” or “good enough”; the answer is typically highly subjective.
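To make the idea of a distance function d(i, j) concrete, here is a minimal sketch in Python. The function names, the optional `weights` argument, and the choice of simple matching for categorical variables are illustrative assumptions, not definitions taken from the lecture.

```python
import math

def euclidean(x, y, weights=None):
    """Weighted Euclidean distance for interval-scaled variables.

    `weights` is an illustrative parameter: one non-negative weight per variable.
    """
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def manhattan(x, y):
    """Manhattan (city-block) distance for interval-scaled variables."""
    return sum(abs(a - b) for a, b in zip(x, y))

def simple_matching(x, y):
    """Dissimilarity for categorical variables: the fraction of mismatching attributes."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

print(euclidean((0, 0), (3, 4)))                  # 5.0
print(euclidean((0, 0), (3, 4), weights=(1, 0)))  # 3.0, the second variable is ignored
print(manhattan((0, 0), (3, 4)))                  # 7
print(simple_matching(("red", "small"), ("red", "large")))  # 0.5
```

Changing the weights changes which objects count as "close", which is one reason the notion of "similar enough" depends on the application.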
Partitioning approach: Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
  Typical methods: k-means, k-medoids, CLARANS
Hierarchical approach: Create a hierarchical decomposition of the set of data (or objects) using some criterion
  Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
Density-based approach: Based on connectivity and density functions
  Typical methods: DBSCAN, OPTICS, DENCLUE
Grid-based approach: Based on a multiple-level granularity structure
  Typical methods: STING, WaveCluster, CLIQUE
Model-based: A model is hypothesized for each of the clusters, and the method tries to find the best fit of the data to that model
  Typical methods: EM, SOM, COBWEB
Frequent pattern-based: Based on the analysis of frequent patterns
  Typical methods: pCluster
User-guided or constraint-based: Clustering by considering user-specified or application-specific constraints
  Typical methods: COD (obstacles), constrained clustering
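The partitioning approach evaluates candidate partitions with a criterion such as the sum of squared errors. A minimal sketch of that criterion follows, assuming 2-D points stored as tuples; the helper names `centroid` and `sse` and the sample points are made up for illustration.

```python
def centroid(points):
    """Mean point of a non-empty cluster."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sse(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in cluster)
    return total

# Two candidate partitions of the same four (made-up) points
partition_a = [[(1, 1), (1, 2)], [(8, 8), (9, 8)]]
partition_b = [[(1, 1), (8, 8)], [(1, 2), (9, 8)]]

print(sse(partition_a))  # 1.0
print(sse(partition_b))  # 99.0
```

Under this criterion partition A is preferred, since it groups nearby points together and so has the smaller error.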
Given k, the k-means algorithm is implemented in four steps:
  1. Partition the objects into k nonempty subsets
  2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)
  3. Assign each object to the cluster with the nearest seed point
  4. Go back to Step 2; stop when no assignments change
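The four steps can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the round-robin initial partition, the `max_iter` safeguard, and the rule for an empty cluster (keep its previous center) are assumptions added here, and the sample points are made up.

```python
import math

def centroid(points):
    """Mean point of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))

def k_means(points, k, max_iter=100):
    # Step 1: partition the objects into k nonempty subsets.
    # Round-robin assignment is an assumption; it requires len(points) >= k.
    labels = [i % k for i in range(len(points))]
    centers = []
    for _ in range(max_iter):
        # Step 2: compute the centroid (mean point) of each current cluster.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(centroid(members) if members else centers[c])
        centers = new_centers
        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [nearest(p, centers) for p in points]
        # Step 4: stop when no assignment changes, otherwise repeat from Step 2.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = k_means(points, k=2)
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

On this small data set the loop settles after two passes: the three points near the origin form one cluster and the three points near (8, 8) form the other.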
Example (K = 2)
[Figure: a sequence of scatter plots, both axes running from 0 to 10, illustrating the steps below]
  Arbitrarily choose K objects as initial cluster centers
  Assign each object to the most similar center
  Update the cluster means
  Reassign
  Update the cluster means
  Reassign
Strength: Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
  ▪ Comparing: PAM: O(k(n-k)^2), CLARA: O(ks^2 + k(n-k))
Comment: Often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
Weakness
  Applicable only when the mean is defined; what about categorical data?
  Need to specify k, the number of clusters, in advance
  Unable to handle noisy data and outliers
  Not suitable for discovering clusters with non-convex shapes
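The sensitivity to outliers can be seen with a small one-dimensional sketch. The values are made up for illustration, and the comparison with a medoid (the representative used by k-medoids methods such as PAM) is added here as an assumption about what a more robust center looks like.

```python
def mean(values):
    """Arithmetic mean, the cluster center used by k-means."""
    return sum(values) / len(values)

def medoid(values):
    """The actual data value with the smallest total distance to all other values."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

cluster = [1, 2, 3, 4, 100]  # 100 is an outlier

print(mean(cluster))    # 22.0, pulled far from the bulk of the data
print(medoid(cluster))  # 3, stays within the bulk of the data
```

A single extreme value drags the mean away from where most objects lie, while the medoid remains a typical member of the cluster.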
Cluster analysis groups objects based on their similarity and has wide applications
Measures of similarity can be computed for various types of data
Clustering algorithms can be categorized into partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods
Outlier detection and analysis are very useful for fraud detection, etc. and can be performed by statistical, distance-based or deviation-based approaches
Many research issues in cluster analysis remain open