Post on 11-Aug-2014
Developing a Tutorial for Grouping Analysis in ArcGIS
Daniel Pierre
May 29, 2014
1. Introduction
2. Data
3. Grouping Analysis Workflows
4. Tutorial Exercises
5. Conclusions: Recommendations
Presentation Outline
Lauren Rosenshein Bennett, MS
Geoprocessing Product Engineer, Esri
Lbennett@esri.com
Dr. Konrad Dramowicz
Faculty, Centre of Geographic Sciences
Konrad.Dramowicz@nscc.ca
Dr. Ela Dramowicz
Faculty, Centre of Geographic Sciences
Ela.Dramowicz@nscc.ca
Introduction
Project Sponsor & Supervisors
Introduction
• Experimental testing of tool with multiple datasets
• Incorporation of Grouping Analysis with other tools
• Review of technical literature on clustering algorithms
• Review of existing tutorials
Project Overview
Introduction
• Introduced at ArcGIS 10.1
• Available with Basic, Standard and Advanced license levels
• Found in the Spatial Statistics toolbox, within the Mapping Clusters toolset
• Script tool
Grouping Analysis Tool
Introduction
• “...Performs a classification procedure that tries to find natural clusters in your data.” - Esri
• An aid for data comprehension
• Feature similarity is based on attributes specified as analysis fields and, optionally, spatial constraints
• Given a number of groups, features within each output group are as similar as possible while groups are as different as possible
Grouping Analysis Tool
Introduction
• Two algorithm types: cluster analysis (traditional K-means) and regionalization (spatial K-means)
• Thirteen parameters (six required)
• Grouping results contingent on the number of groups, analysis fields, and type of spatial constraint
Grouping Analysis Tool
Data
Features:
• Esri
• City of Vancouver
Multivariate Data:
• World Bank
• BBC
• Weatherbase
• Statistics Canada
Data Sources
Data
• Data Enrichment (ArcGIS Online)
• HTML table import
• Spreadsheet reformatting
• Table joins
• Feature class edits
Data Preparation
Data
Selection Criteria:
• Two scales of analysis
• Illustration of various spatial constraint effects on results
• Sufficient number of features
• Visible spatial patterns in results
Tutorial Datasets
General Steps:
• Exploratory data analysis
• Preprocessing
• Determining appropriate Grouping Analysis settings
• Postprocessing, interpretation and evaluation of results
Grouping Analysis Workflows
Exploratory Data Analysis
1. Distribution of variable values
• Thematic mapping
• Spatial autocorrelation
2. Spatial relationships among features
• Contiguity of features and number of neighbours
• Spatial autocorrelation
Exploratory Data Analysis
Exploratory Data Analysis
• Explore distribution of dataset variables
• Choropleth maps and graduated symbol maps
• Identify set of variables to be used for Grouping Analysis
Thematic Mapping
Exploratory Data Analysis
• Analyze contiguity relationships among features
• Polygon Neighbors tool
• Determine relative connectivity of features by counting number of neighbours
• Frequency tool
Spatial Relationships
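The neighbour-counting step above can be sketched in plain Python. This is only an illustration of the idea, not the Polygon Neighbors or Frequency tools themselves: it assumes a hypothetical pair table with one row per (source, neighbour) pair, each shared boundary listed in both directions, and the feature IDs are invented.

```python
from collections import Counter


def neighbour_counts(pairs):
    """Count neighbours per feature from (source_id, neighbour_id) rows."""
    return Counter(src for src, _ in pairs)


def connectivity_frequency(counts):
    """Tabulate how many features have 1, 2, 3, ... neighbours."""
    return Counter(counts.values())


# Hypothetical pair table: feature B touches A, C and D.
pairs = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("B", "D"), ("D", "B")]
counts = neighbour_counts(pairs)
freq = connectivity_frequency(counts)
```

Features with very few neighbours in `counts` are the ones most likely to be affected by a contiguity constraint.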
Exploratory Data Analysis
• Analyze contiguity and/or proximity relationships among features using GeoDa
• Create spatial weights
• Display histogram of feature connectivity according to defined spatial relationships
• Histogram linked to map and attribute table
Alternative Approach
Exploratory Data Analysis
• Considers attribute values and location of features simultaneously
• Moran’s I statistic determines whether spatial pattern of values is dispersed, random or clustered
• Significance of pattern evaluated with corresponding z-score
• One variable at a time
Spatial Autocorrelation
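As a rough illustration of the statistic named above, the following sketch computes global Moran's I for one variable from a spatial weights matrix. The weights matrix and values are invented for the example, and the z-score used for significance testing is not computed here.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable.

    values  : list of attribute values, one per feature
    weights : square matrix (list of lists); weights[i][j] > 0 if i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)


# Hypothetical chain of four features: 0-1, 1-2, 2-3 are neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit side by side
alternating = morans_i([1, 5, 1, 5], chain)  # neighbours always differ
```

Positive values point toward clustering and negative values toward dispersion; values near zero are consistent with a random pattern.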
Preprocessing
Use hot spots to limit study area for Grouping Analysis:
• Calculate incremental spatial autocorrelation
• Identify distance band of most intense clustering
• Create hot spot map
• Select features from original dataset based on location of hot spots
Preprocessing
Grouping Analysis Settings
1. How many groups should be created?
2. Which analysis fields should be used?
3. Is a spatial constraint necessary? If so, which type is appropriate?
Grouping Analysis Settings: Key Considerations
Grouping Analysis Settings
• Default number is 2
• Sturges' rule:
C = 1 + 3.3 log10(n), where C is the number of groups and n is the number of features
• Evaluate the optimal number of groups (up to a maximum of 15)
Number of Groups
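A minimal sketch of the rule of thumb above. A base-10 logarithm is assumed, since 3.3 is approximately 1 / log10(2), and rounding to the nearest whole number is a choice made for this example.

```python
import math


def sturges_groups(n):
    """Suggested number of groups for n features: C = 1 + 3.3 * log10(n)."""
    return round(1 + 3.3 * math.log10(n))


few = sturges_groups(10)    # 1 + 3.3 * 1 = 4.3
many = sturges_groups(100)  # 1 + 3.3 * 2 = 7.6
```

The result is only a starting point; the number of groups still needs to be checked against the output report and the map.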
Grouping Analysis Settings
Two vs. Three Groups
Grouping Analysis Settings
• Generally driven by research purpose and objectives of grouping
• Guide selection of analysis fields with exploratory data analysis findings
• Spatial variables may be used as indirect spatial constraints
• Assess effectiveness of fields to distinguish features with output report
Analysis Fields
Grouping Analysis Settings
Temperature: Spatial Variable
Grouping Analysis Settings
• Choice of spatial constraint or no spatial constraint determines which algorithm is used for grouping
• No spatial constraint – traditional K-Means (data space only)
• Any spatial constraint – Spatial ‘K’luster Analysis by Tree Edge Removal (SKATER) method (spatial K-Means)
Spatial Constraints
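To make the "data space only" case concrete, here is a small sketch of traditional K-means on attribute vectors. It is not the tool's implementation: the starting centers are passed in by hand, the sample values are invented, and no spatial constraint or SKATER logic is involved.

```python
def kmeans(points, centers, iterations=10):
    """Plain K-means on attribute vectors; `centers` are the starting seeds."""
    for _ in range(iterations):
        # Assignment step: each feature joins the nearest center in data space.
        labels = [
            min(
                range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])),
            )
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[k])  # an empty group keeps its seed
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


# Hypothetical features described by two analysis fields each.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels, centers = kmeans(points, [(1.0, 1.0), (9.0, 9.5)])
```

Because location plays no part in the distance calculation, members of one group can end up scattered across the map.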
Grouping Analysis Settings
No Spatial Constraint vs. Spatial Constraint
Grouping Analysis Settings
• Contiguity – edges only (“rook” type) or edges and corners (“queen” type)
• Delaunay triangulation – contiguity of representations of features as Voronoi polygons
• Proximity – K nearest neighbours
• Spatial weights
Spatial Constraint Types
Grouping Analysis Settings
• Evaluate optimal number of groups
• Guide selection of analysis fields with calculated R² values
• Visually assess results of specified spatial constraint
Iterative Process for Optimizing Grouping Analysis
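The sketch below shows one way an R² value per analysis field can be computed, assuming the definition R² = (TSS - WSS) / TSS, where TSS is the total sum of squared deviations from the global mean and WSS is the sum of squared deviations from each group's own mean. That definition, the function name and the sample values are assumptions for illustration.

```python
def field_r2(values, labels):
    """Share of a field's variation that is retained by the grouping."""
    mean = sum(values) / len(values)
    tss = sum((v - mean) ** 2 for v in values)  # total sum of squares

    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)

    # Within-group sum of squares: deviations from each group's own mean.
    wss = sum(
        sum((v - sum(members) / len(members)) ** 2 for v in members)
        for members in groups.values()
    )
    return (tss - wss) / tss


values = [1, 2, 9, 10]
good = field_r2(values, [0, 0, 1, 1])  # groups separate low from high values
poor = field_r2(values, [0, 1, 0, 1])  # groups mix low and high values
```

A field with a value close to 1 discriminates well among the groups; a field with a value close to 0 is a candidate for removal in the next iteration.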
Interpretation & Evaluation
• Spatial distribution of groups (map)
• Global statistics (output report)
• Group and variable statistics (output report)
• Group profiles
Interpretation of Results
Interpretation & Evaluation
• Compare group means with each other and global range
Group Profiles
Interpretation & Evaluation
• Compare group means and ranges for each variable
Group Profiles (2)
• Consider global mean, median and range for each variable
Group Profiles (3)
Interpretation & Evaluation
Interpretation & Evaluation
• Global Moran’s I statistic
• Determine spatial pattern of group membership
• Measure spatial compactness of group membership
• Clustered groups generally desired
Evaluation of Results: Spatial Autocorrelation
[Figure: example spatial patterns labelled Dispersed, Random and Clustered]
Interpretation & Evaluation
• Ratio of the size of the smallest group to the size of the largest group
• Indicator of balance in group membership
• Balanced number of group members generally desired for comparison of statistics
• Frequency tool
Evaluation of Results: Cluster Size Ratio
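The ratio described above can be sketched as follows, assuming group membership is available as one label per feature. The labels are invented for the example.

```python
from collections import Counter


def cluster_size_ratio(labels):
    """Size of the smallest group divided by the size of the largest group."""
    sizes = Counter(labels).values()
    return min(sizes) / max(sizes)


unbalanced = cluster_size_ratio([1, 1, 1, 2, 2, 3])  # group sizes 3, 2 and 1
balanced = cluster_size_ratio([1, 1, 2, 2])          # group sizes 2 and 2
```

A ratio of 1 means perfectly balanced groups, and values near 0 mean that one group dominates.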
Interpretation & Evaluation
• Goodness measure that combines concepts of cohesion and separation
• Adapted from cluster analysis to consider attribute data and location
• Silhouette coefficient is calculated for every feature and the average is taken for the entire dataset
Evaluation of Results: Silhouette
Interpretation & Evaluation
(B – A) / max(A, B) where
A is the distance between a feature and its group center
B is the distance between the feature and its neighbouring group center
Silhouette Coefficient
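A minimal sketch of the coefficient as defined on this slide, using distances to group centers. It assumes that "neighbouring group center" means the nearest center of any other group, that distances are Euclidean, and that each point holds whatever attribute and location values are being evaluated; the sample data are invented.

```python
import math


def simplified_silhouette(points, labels, centers):
    """Average of (B - A) / max(A, B) over all features.

    A: distance from a feature to its own group center
    B: distance from the feature to the nearest other group center
    """
    scores = []
    for pt, lab in zip(points, labels):
        a = math.dist(pt, centers[lab])
        b = min(math.dist(pt, c) for key, c in centers.items() if key != lab)
        largest = max(a, b)
        scores.append((b - a) / largest if largest > 0 else 0.0)
    return sum(scores) / len(scores)


# Two hypothetical, well-separated groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
centers = {0: (0, 1), 1: (10, 1)}
score = simplified_silhouette(points, labels, centers)
```

Requires Python 3.8 or later for `math.dist`.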
Interpretation & Evaluation
• Range between –1 (poor) and 1 (excellent)
• < 0.2 indicates poor clustering
• > 0.5 indicates good partition of the data
Silhouette Coefficient Values
Tutorial Exercises
• Six exercises
• Two scenarios (3 exercises for each)
• Suitable for users at all levels of experience
• Exercises take the user through the steps of preprocessing, group creation, interpretation and evaluation of results outlined here
Grouping Analysis Tutorial
Tutorial Exercises
Exercises:
1. Data exploration
2. Grouping for exploratory data analysis
3. Using Spatial Statistics tools to target areas of interest
Scenario 1: Analysis of Crime in Chicago
Tutorial Exercises
Exercises:
4. Create groups and use results to write profiles
5. Explore effects of spatial constraints
6. Evaluation of results
Scenario 2: Analysis of Olympic Results
Tutorial Exercises
1. All tutorial exercises use polygon data exclusively; point features not covered
2. Space-time constraints using spatial weights matrix file not covered
3. Geared toward the general user; no exercises specifically target advanced users
Limitations
Recommendations
1. Exploratory data analysis
2. Grouping Analysis
3. Evaluation of results
Recommendations: Enhancements and Additional Tools
Recommendations
• Multi-step process using Polygon Neighbors, Frequency and table joins could be simplified
• Dynamic linking of objects can make use of existing ArcGIS functionality
Determining Spatial Relationships Among Features
Recommendations
• Expand types of spatial relationships that can be analyzed
• Enable the analysis of higher order relationships
Determining Spatial Relationships Among Features (continued)
Recommendations
• Tools for determining most useful diagnostic or predictor variables
• Guide selection of analysis fields for data partitioning
• Adapt neural networks or other data mining tools to work with spatial constraints
Identification of Useful Diagnostic Variables
Recommendations
Grouping Analysis Tool Enhancements
• Create unique identifier
• Replace null values
Recommendations
• Spatial weights matrix can be used as the spatial constraint for creating groups
• Custom weights require either manual table creation or programming
• Solution: interactive feature selection
User-defined spatial relationships among features
Recommendations
• Expand beyond R² and F-statistic values in output report
• Adapt methods used to evaluate cluster analysis algorithms (e.g. Silhouette)
• Challenge: universally applicable evaluation methods may not be feasible
Evaluation of Results
THANK YOU!