Copyright © 2013 by Luc Anselin, All Rights Reserved
Metadata, Provenance and Web Service for Spatial Analysis: The Case of Spatial Weights
Luc Anselin, Sergio Rey, Wenwen Li
GeoDa Center, School of Geographical Sciences and Urban Planning, Arizona State University
•Some Specific Project Goals
•Integrate and sustain a core set of composable, interoperable, manageable, and reusable CyberGIS software elements based on community-driven and open source strategies
•Challenge
•most current spatial analysis and spatial econometrics software is written for a single CPU
•rethink and rewrite analytical, algorithmic and processing facilities to integrate into a cyberinfrastructure
•address lack of interoperability
•Spatial Econometrics Workbench
•framework for supporting spatial econometric research in a cyberscience era (Anselin and Rey, IJGIS 2012)
•Leverage PySAL and CyberGIS
•Support scientific workflow
•PySAL
•open source library of Python routines for spatial analysis: geocomputation, spatial weights, spatial autocorrelation, spatial econometrics, regionalization
•http://pysal.org
•hosted on GitHub
•PySAL Progress Report
•current version is 1.6 (7th release)
•3.5 years of on-time semi-annual releases (one every six months)
•20,000+ downloads (10,000 in 2012)
•recognized in the open source scientific community (e.g., Anaconda)
Anaconda for big data analytics
•Migrating to CyberGIS
•performance = need for parallelization + refined algorithms
•interoperability = provide functionality as web services
•replicability: need for metadata and provenance tracking
•Example: Spatial Weights
•provenance includes the spatial data source, the type of weights (e.g., contiguity, distance), and any standardization or manipulation (e.g., higher order)
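The two kinds of manipulation named above can be sketched as follows, assuming weights are held as a plain Python dict that maps each observation id to a list of neighbor ids. `row_standardize` and `higher_order` are hypothetical helpers, not PySAL's API. The point is that each step has its own parameter (the standardization scheme, the order) that a provenance record must capture alongside the data source.

```python
from typing import Dict, List

Neighbors = Dict[int, List[int]]


def row_standardize(neighbors: Neighbors) -> Dict[int, List[float]]:
    """Give each neighbor of i the weight 1 / (number of neighbors of i).
    Islands (no neighbors) get an empty weight list."""
    return {i: [1.0 / len(js)] * len(js) if js else [] for i, js in neighbors.items()}


def higher_order(neighbors: Neighbors, order: int = 2) -> Neighbors:
    """Neighbors reachable in exactly `order` steps along shortest paths,
    excluding lower-order neighbors and the observation itself."""
    result: Neighbors = {}
    for i in neighbors:
        seen = {i}
        frontier = {i}
        for _ in range(order):
            reached = set()
            for j in frontier:
                reached.update(neighbors[j])
            frontier = reached - seen  # keep only newly reached observations
            seen |= frontier
        result[i] = sorted(frontier)
    return result
```

On a chain 0-1-2-3, `higher_order` returns `{0: [2], 1: [3], 2: [0], 3: [1]}`: the second-order neighbors with the first-order ones removed.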
•Lack of Interoperability
•different implementations
•no standards
•duplication of efforts
•hinders interoperability and workflow chaining
•Example: Weights Formats in PySAL
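A minimal sketch of two text layouts of the kind PySAL reads and writes, assuming a neighbor dict keyed by integer ids. The GAL layout used here (a count header, then an "id k" line followed by a line with the k neighbor ids) and the GWT-style header "0 n source id_var" follow the common GeoDa conventions as we understand them; `to_gal` and `to_gwt` are hypothetical names. Neither file records how the neighbors or weights were derived.

```python
from typing import Dict, List, Tuple


def to_gal(neighbors: Dict[int, List[int]]) -> str:
    """Serialize neighbors as GAL: header with n, then 'id k' and the k neighbor ids."""
    lines = [str(len(neighbors))]
    for i in sorted(neighbors):
        js = neighbors[i]
        lines.append(f"{i} {len(js)}")
        lines.append(" ".join(str(j) for j in js))
    return "\n".join(lines) + "\n"


def to_gwt(weights: Dict[int, List[Tuple[int, float]]], source: str, id_var: str) -> str:
    """Serialize (neighbor, weight) pairs in a GWT-style layout:
    header '0 n source id_var', then one 'origin destination weight' line per pair."""
    lines = [f"0 {len(weights)} {source} {id_var}"]
    for i in sorted(weights):
        for j, w in weights[i]:
            lines.append(f"{i} {j} {w:.6f}")
    return "\n".join(lines) + "\n"
```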
Example: PySAL spreg: what do we know about south_k6.gwt and south_ep_k20.kwt?
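What a file like south_k6.gwt can reveal about itself can be sketched by parsing its header line. This sketch assumes the GeoDa-style layout "0 n source id_var" and, as a further assumption, the same layout for .kwt kernel weights files; `read_gwt_header` is a hypothetical helper. Everything else survives at best in the file name.

```python
def read_gwt_header(text: str) -> dict:
    """Return what the header line of a GWT-style weights file records."""
    fields = text.splitlines()[0].split()
    if len(fields) == 1:  # bare observation count
        return {"n": int(fields[0])}
    header = {"n": int(fields[1])}  # fields[0] is a flag, usually 0
    if len(fields) >= 4:
        header["source"], header["id_var"] = fields[2], fields[3]
    return header


# Nothing in the header says k = 6, which distance metric was used, or
# (for .kwt) which kernel function and bandwidth produced the weights.
```

For example, `read_gwt_header("0 3 grid ID\n0 1 1.0\n")` returns `{'n': 3, 'source': 'grid', 'id_var': 'ID'}`.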
•Conceptual Framework
•separate data source from operations
•data source: polygon or coordinate files with standard metadata (projection, origin, etc.)
•operations: weights metadata
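The separation of data source from operations can be sketched as two small record types combined into one JSON document. The field names mirror the metadata passed in the GET request of the illustration (`input1` with `type` and `uri`, `weight_type`, `transform`, `parameters`). The `projection` field, and reading transform code "O" as "original, unstandardized" (as in PySAL's transform codes), are assumptions; this is not the actual wmd schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DataSource:
    """Where the geometry comes from, plus standard spatial metadata."""
    uri: str
    type: str = "shp"
    projection: Optional[str] = None  # hypothetical field, e.g. "EPSG:4326"


@dataclass
class WeightsOperation:
    """How the weights are derived from the data source."""
    weight_type: str                 # e.g. "rook", "queen", "knn"
    transform: str = "O"             # assumed: "O" = original (no standardization)
    parameters: dict = field(default_factory=dict)


def to_metadata(source: DataSource, op: WeightsOperation) -> str:
    """Combine source and operation into one JSON document; unset fields are dropped."""
    doc = {"input1": {k: v for k, v in asdict(source).items() if v is not None}}
    doc.update(asdict(op))
    return json.dumps(doc)
```

Because the source and the operation are separate records, the same shapefile description can be reused with different weights operations, and vice versa.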
weights vocabulary
weights metadata structure (wmd)
•Web service implementation (OGC WPS)
•wraps PySAL weights module
•(re)creates weights object from information in wmd file
•makes weights object available as a file
Workflow (diagram): a wmd file (JSON) passes through the Weights Parser and Dispatcher to PySAL; the output is a weights file plus its metadata.
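The parser/dispatcher step can be sketched as a lookup table from `weight_type` to a builder function. This is a minimal illustration under stated assumptions: the builders work on centroid coordinates only, so rook and queen contiguity (which need polygon geometry) are left out, and the type strings "knn" and "distance" and the parameter name `threshold` are hypothetical; only `k` appears in the original request.

```python
import json
from typing import Callable, Dict, List, Tuple

Points = List[Tuple[float, float]]
Neighbors = Dict[int, List[int]]


def knn(points: Points, params: dict) -> Neighbors:
    """k nearest neighbors by brute force; k defaults to 4."""
    k = int(params.get("k", 4))
    out: Neighbors = {}
    for i, (xi, yi) in enumerate(points):
        # Rank the other points by squared distance, breaking ties by id.
        ranked = sorted(((xi - xj) ** 2 + (yi - yj) ** 2, j)
                        for j, (xj, yj) in enumerate(points) if j != i)
        out[i] = [j for _, j in ranked[:k]]
    return out


def distance_band(points: Points, params: dict) -> Neighbors:
    """All points within `threshold` (Euclidean distance) of each point."""
    t2 = float(params["threshold"]) ** 2
    return {i: [j for j, (xj, yj) in enumerate(points)
                if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= t2]
            for i, (xi, yi) in enumerate(points)}


# Dispatcher table: weight_type string in the metadata -> builder function.
BUILDERS: Dict[str, Callable[[Points, dict], Neighbors]] = {
    "knn": knn,
    "distance": distance_band,
}


def dispatch(metadata_json: str, points: Points) -> Neighbors:
    """Parse wmd-style JSON and run the builder named by weight_type."""
    md = json.loads(metadata_json)
    builder = BUILDERS.get(md["weight_type"])
    if builder is None:
        raise ValueError(f"unsupported weight_type: {md['weight_type']}")
    return builder(points, md.get("parameters", {}))
```

Adding a new weights type then means registering one more builder, without touching the parser.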
Illustration
•Generate Weights from Shapefile
•NAT.shp available on server
•output format = gal
•GET Request
• http://spatial.gdta.asu.edu/cgi-bin/wps.cgi?request=Execute&service=WPS&version=1.0.0&identifier=weights_ws&status=false&datainputs=[outputformat=gal;metadata={"input1":{"type":"shp","uri":"http://toae.org/pub/NAT.shp"},"weight_type":"rook","transform":"O", "parameters":{"p":2,"k":4}}]
metadata input
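The request above can also be assembled programmatically, which takes care of percent-encoding the JSON inside `datainputs`. This sketch uses only the standard library. The endpoint and parameter values are copied from the request shown, although the 2013 server may no longer respond, and `build_execute_url` is a hypothetical helper.

```python
import json
from urllib.parse import quote, urlencode

# Endpoint copied from the request above; it may no longer be live.
WPS_ENDPOINT = "http://spatial.gdta.asu.edu/cgi-bin/wps.cgi"


def build_execute_url(metadata: dict, output_format: str = "gal") -> str:
    """Build a WPS 1.0.0 Execute request (key-value pairs) for the weights_ws process."""
    md_json = json.dumps(metadata, separators=(",", ":"))  # compact JSON
    datainputs = f"[outputformat={output_format};metadata={md_json}]"
    query = {
        "request": "Execute",
        "service": "WPS",
        "version": "1.0.0",
        "identifier": "weights_ws",
        "status": "false",
        "datainputs": datainputs,
    }
    # quote (not quote_plus) percent-encodes reserved characters such as { } " : / ;
    return WPS_ENDPOINT + "?" + urlencode(query, quote_via=quote)


metadata = {
    "input1": {"type": "shp", "uri": "http://toae.org/pub/NAT.shp"},
    "weight_type": "rook",
    "transform": "O",
    "parameters": {"p": 2, "k": 4},
}
url = build_execute_url(metadata)
```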
Server Response
Sample gal output
http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.gal
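A client can turn a returned GAL file back into a neighbor dict. The sketch below uses a hypothetical `parse_gal`. It assumes the header is either a bare count or the GeoDa-style "0 n source id_var" line, and it reads the body as a stream of tokens (id, neighbor count, neighbor ids), so islands with an empty neighbor line are handled. Ids stay strings because GAL ids need not be integers.

```python
from typing import Dict, List


def parse_gal(text: str) -> Dict[str, List[str]]:
    """Parse GAL text into {id: [neighbor ids]}."""
    lines = text.splitlines()
    header = lines[0].split()
    # Header is either "n" or "0 n source id_var".
    n = int(header[0]) if len(header) == 1 else int(header[1])
    # Treat the body as one token stream: id, k, then k neighbor ids.
    tokens = " ".join(lines[1:]).split()
    neighbors: Dict[str, List[str]] = {}
    pos = 0
    for _ in range(n):
        obs_id, k = tokens[pos], int(tokens[pos + 1])
        neighbors[obs_id] = tokens[pos + 2 : pos + 2 + k]
        pos += 2 + k
    return neighbors
```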
metadata (wmd) file
http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.wmd
Performance Evaluation
•How does PySAL scale as the amount of input data increases?
•Is the overhead of the web service framework acceptable?
•How does the web service framework scale when handling large numbers of concurrent requests?
Scale-up vs. Scale-out Solutions
•Scale-up
•High-end computer
•Configuration
• Processor: 2 x 2.93 GHz Quad-Core Intel Xeon
• Memory: 16 GB 1066 MHz DDR3 ECC
• Software: Mac OS X Lion 10.7.4 (11E53)
•Scale-out
•Web server cluster
Web Server Cluster
•Performance
•experiment using a grid layout for N = 10,000 to N = 100,000
•rook contiguity and k-nearest neighbors (k = 4)
•input shapefiles on a server in Utah, web service on a server at ASU
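For a regular grid, rook contiguity can be written down directly from cell positions, which is handy for checking the size of the weights the experiment produces. The sketch assumes square cells numbered in row-major order and uses a hypothetical `lattice_rook` helper. It is not how PySAL builds weights from a shapefile, which works from the polygon geometry.

```python
from typing import Dict, List


def lattice_rook(rows: int, cols: int) -> Dict[int, List[int]]:
    """Rook contiguity on a rows x cols grid: cells sharing an edge are neighbors."""
    w: Dict[int, List[int]] = {}
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c  # row-major cell id
            nbrs = []
            if r > 0:
                nbrs.append(i - cols)  # cell above
            if c > 0:
                nbrs.append(i - 1)     # cell to the left
            if c < cols - 1:
                nbrs.append(i + 1)     # cell to the right
            if r < rows - 1:
                nbrs.append(i + cols)  # cell below
            w[i] = nbrs
    return w
```

For a 100 x 100 grid (N = 10,000) the neighbor counts sum to 39,600, twice the 19,800 shared edges.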
•Experiment 1
•Timing: average over 5 experiments
•web server overhead, data transfer and computation
•explore effect of data size
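The averaging can be sketched with a small timing helper. In the experiment the timed unit was a full request (web server overhead, data transfer and computation); here `task` is any zero-argument callable, for example one that issues the GET request, and `average_time` is a hypothetical name.

```python
import statistics
import time
from typing import Callable


def average_time(task: Callable[[], object], runs: int = 5) -> float:
    """Call `task` `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)
```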
time for rook contiguity and KNN weights
•Experiment 2
•Scalability of the web service framework
•High-end computer (8 cores)
•Cluster (4 computing nodes, 2 cores each)
•Total processing time
•Speed-up
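For reference, these are the conventional definitions of speed-up and parallel efficiency, where T_1 is the total processing time with one worker and T_p the time with p workers (cores or nodes). The slides do not state which baseline the reported speed-up uses, so this is an assumption.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}
```

With hypothetical timings T_1 = 80 s and T_8 = 16 s (not results from this experiment), S_8 = 5 and E_8 = 0.625.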
Total processing time
Speed-up
•Experiment 3
•Scalability of the cluster as computing nodes are added
•Average response time
•128 concurrent requests
•Dataset: 10,000 polygons
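A load test of this kind can be sketched with a thread pool that fires all requests at once and averages their individual response times. `average_response_time` is a hypothetical helper and `request` is a stand-in callable. Against the real service it would wrap an HTTP call such as `urllib.request.urlopen(url).read()`. Threads suit this client because the work is I/O-bound: each thread spends its time waiting on the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable


def average_response_time(request: Callable[[], object], concurrency: int = 128) -> float:
    """Issue `concurrency` requests at once and return their mean response time (s)."""
    def timed_call(_: int) -> float:
        start = time.perf_counter()
        request()
        return time.perf_counter() - start

    # One worker per request so all of them are in flight together.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(timed_call, range(concurrency)))
    return mean(durations)
```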
Scalability - cluster
Next Steps
•Towards a Standard
•refine specification: flexible, expandable, deal with edge cases
•improve performance (parallelization)
•implement seek operations on distributed files
•interoperability with other software
Thank you!