
Date post: 18-Jan-2018
Mr. Idrissa Y. H., Assistant Lecturer, Geography & Environment, Department of Social Sciences, School of Natural & Social Sciences, State University of Zanzibar

LECTURE 8a: SPATIAL STATISTICAL ANALYSIS

Page 2

- Introduction to spatial analysis
- Judging spatial association visually
- The concept of clustering and cluster analysis
- Spatial cross-correlation: Pearson, Spearman
- Multivariate spatial association measures

Page 3

Spatial statistics extends traditional statistics on two fronts. First, it seeks to map the variation in a data set; second, it can uncover “numerical spatial relationships” within and among mapped data layers.

Tobler’s Law: “Everything is related to everything else, but near things are more related than distant things.”

3 major benefits of spatial analysis:
- Pattern analysis
- Feature count analysis
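One way to look for a numerical relationship between two mapped layers is to correlate their values cell by cell, using the Pearson and Spearman coefficients named in the lecture outline. The sketch below is only an illustration under stated assumptions, not the lecture's own method: the two small grids (`elevation`, `rainfall`) hold made-up values, and the helper names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Ranks from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def flatten(grid):
    """Turn a 2-D grid (list of rows) into one list of cell values."""
    return [value for row in grid for value in row]

# Hypothetical 3 x 3 layers covering the same cells (made-up values).
elevation = [[10, 20, 30], [15, 25, 40], [22, 35, 60]]
rainfall = [[1.0, 3.0, 4.0], [2.0, 3.5, 7.0], [3.2, 6.0, 20.0]]

pearson_r = pearson(flatten(elevation), flatten(rainfall))
spearman_r = spearman(flatten(elevation), flatten(rainfall))
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_r:.3f}")
```

In this made-up data rainfall always rises with elevation, but not along a straight line, so the rank-based Spearman coefficient comes out higher than the Pearson coefficient.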

Page 4

May 6, 2023 Data Mining: Concepts and Techniques 4

1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. Density-Based Methods
7. Grid-Based Methods
8. Model-Based Methods
9. Clustering High-Dimensional Data
10. Constraint-Based Clustering
11. Outlier Analysis
12. Summary

Page 5

Cluster: a collection of data objects
- Similar to one another within the same cluster
- Dissimilar to the objects in other clusters

Cluster analysis: finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters

Unsupervised learning: no predefined classes

Typical applications:
- As a stand-alone tool to get insight into data distribution
- As a preprocessing step for other algorithms

Page 6

- Pattern recognition
- Spatial data analysis
  - Create thematic maps in GIS by clustering feature spaces
  - Detect spatial clusters, or support other spatial mining tasks
- Image processing
- Economic science (especially market research)
- WWW
  - Document classification
  - Cluster Weblog data to discover groups of similar access patterns

Page 7

Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs

Land use: Identification of areas of similar land use in an earth observation database

Insurance: Identifying groups of motor insurance policy holders with a high average claim cost

City-planning: Identifying groups of houses according to their house type, value, and geographical location

Earthquake studies: observed earthquake epicenters should be clustered along continental faults

Page 8

A good clustering method will produce high-quality clusters with:
- high intra-class similarity
- low inter-class similarity

The quality of a clustering result depends on both the similarity measure used by the method and its implementation

The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
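The idea of high intra-class and low inter-class similarity can be checked numerically. The sketch below is a rough illustration, assuming Euclidean distance as an inverse stand-in for similarity (small distance means high similarity). The function names and the two example groupings are hypothetical.

```python
import math
from itertools import combinations

def mean_intra_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    dists = [math.dist(p, q)
             for cluster in clusters
             for p, q in combinations(cluster, 2)]
    return sum(dists) / len(dists)

def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    dists = [math.dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a
             for q in b]
    return sum(dists) / len(dists)

# The same six made-up points, grouped in two different ways.
good = [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
poor = [[(1, 1), (8, 8), (2, 1)], [(1, 2), (8, 9), (9, 8)]]

for name, grouping in (("good", good), ("poor", poor)):
    intra = mean_intra_distance(grouping)
    inter = mean_inter_distance(grouping)
    print(f"{name}: intra = {intra:.2f}, inter = {inter:.2f}")
```

A grouping whose intra-cluster distance is small relative to its inter-cluster distance matches the description of a good clustering. Which ratio counts as "good enough" remains a judgment call.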

Page 9

Dissimilarity/similarity metric: similarity is expressed in terms of a distance function d(i, j), which is typically a metric

There is a separate “quality” function that measures the “goodness” of a cluster.

The definitions of distance functions are usually very different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables.

Weights should be associated with different variables based on applications and data semantics.

It is hard to define “similar enough” or “good enough”; the answer is typically highly subjective.
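For interval-scaled variables, one common family of distance functions d(i, j) is the Minkowski distance, and per-variable weights can be folded into it. The sketch below is a minimal illustration under that assumption. The function name, the default `p=2`, and the example weights are hypothetical, not taken from the slides.

```python
def weighted_minkowski(a, b, weights=None, p=2):
    """Weighted Minkowski distance between two numeric vectors.

    p=1 gives Manhattan distance and p=2 gives Euclidean distance.
    Weights are assumed to be non-negative; by default every variable
    has weight 1.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(a)
    total = sum(w * abs(x - y) ** p for w, x, y in zip(weights, a, b))
    return total ** (1 / p)

i = (0.0, 0.0)
j = (3.0, 4.0)
print(weighted_minkowski(i, j))                      # Euclidean
print(weighted_minkowski(i, j, p=1))                 # Manhattan
print(weighted_minkowski(i, j, weights=(1.0, 0.0)))  # ignore the second variable
```

Changing the weights changes which objects count as close, which is one reason the choice should follow the application and the meaning of each variable.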

Page 10

Partitioning approach:
- Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
- Typical methods: k-means, k-medoids, CLARANS

Hierarchical approach:
- Create a hierarchical decomposition of the set of data (or objects) using some criterion
- Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON

Density-based approach:
- Based on connectivity and density functions
- Typical methods: DBSCAN, OPTICS, DenClue

Page 11

Grid-based approach:
- Based on a multiple-level granularity structure
- Typical methods: STING, WaveCluster, CLIQUE

Model-based approach:
- A model is hypothesized for each of the clusters, and the method tries to find the best fit of the data to that model
- Typical methods: EM, SOM, COBWEB

Frequent pattern-based approach:
- Based on the analysis of frequent patterns
- Typical methods: pCluster

User-guided or constraint-based approach:
- Clustering by considering user-specified or application-specific constraints
- Typical methods: COD (obstacles), constrained clustering
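The sum-of-squared-errors criterion mentioned for the partitioning approach is commonly written as below. The notation is an assumption, since the slides do not define symbols: $k$ is the number of clusters, $C_i$ is the $i$-th cluster, $m_i$ is its mean, and $p$ ranges over the objects in $C_i$.

```latex
E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^{2}
```

A partition with a smaller $E$ has objects that sit closer to their own cluster means.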

Page 12

Given k, the k-means algorithm is implemented in four steps:
1. Partition objects into k nonempty subsets
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)
3. Assign each object to the cluster with the nearest seed point
4. Go back to Step 2; stop when there are no more new assignments
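The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation. The round-robin initial partition, the `max_iter` safety cap, the handling of emptied clusters, and the sample points are all assumptions made for the sketch.

```python
import math

def k_means(points, k, max_iter=100):
    """Cluster points with the four-step k-means loop.

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Step 1: partition the objects into k nonempty subsets. Round-robin
    # is an arbitrary choice made for this sketch.
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: compute each centroid as the mean point of its cluster.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop when no assignment changes, otherwise repeat from Step 2.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def sum_squared_error(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[label]) ** 2
               for p, label in zip(points, labels))

# Made-up sample data: two visibly separate groups of three points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = k_means(points, k=2)
sse = sum_squared_error(points, labels, centroids)
print(labels, centroids, round(sse, 3))
```

On this data the loop settles after two passes, with the first three points in one cluster and the last three in the other. A different initial partition can lead to a different result on less clearly separated data.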

Page 13

Example

[Figure: k-means example with K = 2, shown as a sequence of scatter plots with both axes running from 0 to 10. Panel labels, in order: arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign; update the cluster means; reassign.]

Page 14

Strength: relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
- Comparing: PAM: O(k(n-k)^2), CLARA: O(ks^2 + k(n-k))

Comment: often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.

Weakness:
- Applicable only when the mean is defined; what about categorical data?
- Need to specify k, the number of clusters, in advance
- Unable to handle noisy data and outliers
- Not suitable for discovering clusters with non-convex shapes
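To get a feel for the complexity expressions above, the sketch below simply plugs numbers into them. The values of n, k, t, and the CLARA sample size s are hypothetical. The results are order-of-growth counts that ignore constant factors, not measured running times.

```python
def k_means_cost(n, k, t):
    """O(tkn): t iterations, k clusters, n objects."""
    return t * k * n

def pam_cost(n, k):
    """O(k(n-k)^2), the PAM expression quoted above."""
    return k * (n - k) ** 2

def clara_cost(n, k, s):
    """O(ks^2 + k(n-k)), where s is the sample size CLARA works on."""
    return k * s ** 2 + k * (n - k)

# Hypothetical problem size chosen for illustration.
n, k, t, s = 10_000, 5, 20, 50

print(f"k-means: {k_means_cost(n, k, t):>12,}")
print(f"PAM:     {pam_cost(n, k):>12,}")
print(f"CLARA:   {clara_cost(n, k, s):>12,}")
```

With these inputs the PAM expression is several hundred times larger than the k-means one, which is the sense in which k-means is called relatively efficient.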

Page 15

Cluster analysis groups objects based on their similarity and has wide applications

Measures of similarity can be computed for various types of data

Clustering algorithms can be categorized into partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods

Outlier detection and analysis are very useful for fraud detection, etc. and can be performed by statistical, distance-based or deviation-based approaches

Many research issues in cluster analysis remain open
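As an illustration of the distance-based approach to outlier detection mentioned in the summary, the sketch below flags a point when too few of the other points lie near it. This is one possible reading of "distance-based", not a method given in the slides. The function name, the parameters `radius` and `min_fraction`, and the sample points are hypothetical.

```python
import math

def distance_based_outliers(points, radius, min_fraction):
    """Return the points with too few neighbours.

    A point is flagged when fewer than min_fraction of the other
    points lie within radius of it.
    """
    outliers = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        near = sum(1 for q in others if math.dist(p, q) <= radius)
        if near / len(others) < min_fraction:
            outliers.append(p)
    return outliers

# Made-up data: four points close together and one far away.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9)]
outliers = distance_based_outliers(points, radius=2.0, min_fraction=0.5)
print(outliers)
```

The result depends heavily on `radius` and `min_fraction`, so both would need tuning for a real data set.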

