University of Minnesota
GeoJinni: Spatial Data Processing with Hadoop
http://spatialhadoop.cs.umn.edu/
@spatialhadoop
Ahmed Eldawy
Claudius Ptolemy (AD 90 – AD 168)
Al Idrisi (1099–1165)
Cholera cases in the London epidemic of 1854
Cool technology..!! Can I use it in my application?
Oh..!! But, it is not made for me. Can’t make use of it as is
My pleasure. Here it is..
Kindly let me get the technology you have
Kindly let me understand your needs
HELP..!! I have too much data. Your technology is not helping me
mmm… Let me check with my good friends there.
My pleasure. Here it is..
Cool DBMS technology..!! Can I use it in my application?
Oh..!! But, it is not made for me. Can’t make use of it as is
Kindly let me understand your needs
Kindly let me get the technology you have
HELP..!! Again, I have too much data. Your technology is not helping me
Sorry, seems like the DBMS technology cannot scale more
Let me check with my other good friends there.
Cool MapReduce technology..!! Can I use it in my application?
Oh..!! But, it is not made for me. Can’t make use of it as is
My pleasure. Here it is..
Kindly let me understand your needs
Kindly let me get the technology you have
aka
GeoJinni
27
Tons of Spatial data out there…
Smart phones
Satellite Images
Medical data
Traffic data
Geotagged Microblogs
VGI (volunteered geographic information)
Sensor networks
Geotagged pictures
28
Spatial language
Built-in spatial data types
Spatial Indexes
Spatial Operations
GeoJinni
Website: http://spatialhadoop.cs.umn.edu/
Download source code, binary distribution, and instructions
Email us at: [email protected]
■ Released in March 2013; 75,000 downloads since then
29
The Built-in Approach of GeoJinni
[Figure: three approaches to spatial support in Hadoop]
The On-top Approach: Spatial Modules on top of the standard Hadoop stack (Storage (HDFS), MapReduce Runtime, Job Monitoring and Scheduling, Hadoop Java APIs, Pig Latin, User Programs)
From Scratch Approach: (Spatial) User Program + MapReduce APIs + Job Monitoring and Scheduling + MapReduce Runtime + Storage + …
The Built-in Approach (GeoJinni): the Hadoop stack (Storage (HDFS), MapReduce Runtime, Job Monitoring and Scheduling, Hadoop Java APIs, Pig Latin, User Programs) extended with Spatial Indexing, Early Pruning, Spatial Operators, and Spatial Language
30
Spatial Data & Hadoop
Hadoop (takes 193 seconds):
points = LOAD 'points' AS (id:int, x:int, y:int);
result = FILTER points BY
  x < xmax AND x >= xmin AND
  y < ymax AND y >= ymin;
GeoJinni (finishes in 2 seconds):
points = LOAD 'points' AS (id:int, location:point);
result = FILTER points BY
  IsOverlap(location, rectangle(xmin, ymin, xmax, ymax));
31
GeoJinni Architecture
Applications: MNTG [SSTD’13, ICDE’14]
SHAHED [ICDE’15] – TAREEG [SIGMOD’14, SIGSPATIAL’14]
Spatio-temporal Hadoop
Language: Pigeon [ICDE’14]
Operations: Basic [VLDB’13] – CG_Hadoop [SIGSPATIAL’13]
Data Mining – Visualization [Under submission]
MapReduce: Spatial File Splitter – Spatial Record Reader
Indexing: Grid File – R-tree – R+-tree [ICDE’15]
32
Language Layer: Pigeon
■ Extends Pig Latin with OGC-compliant primitives
Spatial data types (e.g., Polygon)
Basic operations (e.g., Area)
Spatial predicates (e.g., Touches)
Spatial analysis (e.g., Union)
Spatial aggregate functions (e.g., Convex Hull)
A. Eldawy and M. F. Mokbel. Pigeon: A Spatial MapReduce Language. In ICDE, 2014
cities = LOAD 'cities'
  AS (city_id: int, city_geom);
City_area = FOREACH cities
  GENERATE Area(city_geom) AS area;
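To make the Area operation concrete, here is a minimal Python sketch of the shoelace formula for a simple polygon in planar coordinates. It illustrates the kind of computation an area primitive performs and is not Pigeon's implementation; the function name polygon_area and the list-of-tuples ring representation are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Area of a simple polygon given as a list of (x, y) vertices.

    Uses the shoelace formula and assumes planar (not geodetic) coordinates.
    The ring may or may not repeat its first vertex at the end.
    """
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


if __name__ == "__main__":
    # A 4 x 3 rectangle
    print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))
```

The absolute value makes the result independent of whether the vertices are listed clockwise or counter-clockwise.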
33
Indexing Layer: R+-tree
34
Indexing Layer: Grid File
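As a rough illustration of grid partitioning, the Python sketch below assigns points to the cells of a uniform grid. It is a simplified stand-in, not GeoJinni's Grid File code; the function name grid_partition, its parameters, and the in-memory dictionary are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def grid_partition(points: List[Point], mbr: Rect,
                   cols: int, rows: int) -> Dict[Tuple[int, int], List[Point]]:
    """Assign each point to the (col, row) cell of a uniform cols x rows grid.

    mbr is the bounding rectangle of the whole dataset. Points lying on the
    maximum edge are clamped into the last column or row.
    """
    xmin, ymin, xmax, ymax = mbr
    cell_w = (xmax - xmin) / cols
    cell_h = (ymax - ymin) / rows
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        col = min(int((x - xmin) / cell_w), cols - 1)
        row = min(int((y - ymin) / cell_h), rows - 1)
        cells[(col, row)].append((x, y))
    return dict(cells)


if __name__ == "__main__":
    pts = [(0, 0), (4.9, 4.9), (5, 5), (10, 10), (7, 2)]
    for cell, members in sorted(grid_partition(pts, (0, 0, 10, 10), 2, 2).items()):
        print(cell, members)
```

In a distributed setting each cell would correspond to one file block, so that a query touching a small region only needs the blocks of the cells it overlaps.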
35
Non-indexed Heap File
36
Range Query
SpatialFileSplitter prunes blocks outside the query range
SpatialRecordReader passes local indexes to the map function
Map function selects records in range
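The pruning idea behind these three steps can be sketched in a few lines of Python. This is a single-process illustration, not the SpatialFileSplitter or SpatialRecordReader Java classes: the Block class, the overlaps and contains helpers, and the range_query function are hypothetical names, and the local index inside each block is replaced by a plain scan.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Block:
    """One partition of the indexed file: its bounding rectangle and records."""
    mbr: Rect
    points: List[Point]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(r: Rect, p: Point) -> bool:
    """Half-open containment, matching x >= xmin AND x < xmax in the Pig filter."""
    return r[0] <= p[0] < r[2] and r[1] <= p[1] < r[3]


def range_query(blocks: List[Block], query: Rect) -> Tuple[List[Point], int]:
    """Return the matching points and the number of blocks actually scanned."""
    result: List[Point] = []
    scanned = 0
    for block in blocks:
        if not overlaps(block.mbr, query):
            continue  # early pruning: the block cannot contain any answer
        scanned += 1
        result.extend(p for p in block.points if contains(query, p))
    return result, scanned


if __name__ == "__main__":
    blocks = [
        Block((0, 0, 10, 10), [(1, 1), (5, 5), (9, 9)]),
        Block((10, 0, 20, 10), [(11, 1), (15, 5)]),
        Block((0, 10, 10, 20), [(2, 12)]),
    ]
    print(range_query(blocks, (4, 4, 12, 8)))
```

In the usage example the third block is skipped without reading its records, which is where the saving over a full scan comes from.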
37
CG_Hadoop
■ Makes use of GeoJinni to speed up computational geometry algorithms
Polygon union, Skyline, Convex Hull, Farthest/Closest Pair
■ Single machine implementation
E.g., Skyline of 4 billion points takes three hours
■ Straightforward implementation in Hadoop
Hadoop parallel execution
■ More efficient implementation in GeoJinni
Spatial indexing
Early pruning
■ Free open source as part of GeoJinni
[Chart: speedup over the single-machine implementation. Single Machine: 1x, Hadoop: 29x, GeoJinni: 260x]
38
Convex Hull
Find the minimal convex polygon that contains all points
[Figure: Input, Output]
39
Convex Hull in CG_Hadoop
Hadoop CG_Hadoop
Partition
Pruning
Local hull
Global hull
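The partition, local hull, global hull pipeline can be sketched in Python as follows. This is a single-process illustration under stated assumptions: Andrew's monotone chain is used as the hull algorithm (the slides do not name one), partitions are simple slices of the x-sorted input, and the Pruning step is left out. The names cross, convex_hull, and partitioned_hull are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def partitioned_hull(points: List[Point], num_partitions: int) -> List[Point]:
    """Local hull per partition, then a global hull over the local hull vertices."""
    pts = sorted(points)
    size = max(1, -(-len(pts) // num_partitions))  # ceiling division
    candidates: List[Point] = []
    for start in range(0, len(pts), size):
        candidates.extend(convex_hull(pts[start:start + size]))  # local hull
    return convex_hull(candidates)  # global hull


if __name__ == "__main__":
    data = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (3, 1), (2, 0)]
    print(partitioned_hull(data, 3))
```

The approach works because every vertex of the global hull is also a vertex of the local hull of its own partition, so discarding interior points locally never loses part of the answer.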
40
Map rendering
■ Map rendering creates an image that represents the data
■ Visualization is an international language
■ Can reveal patterns that are otherwise hard to spot
■ The visual system occupies about one third of the human brain
210 LINESTRING (-2.3634904 51.3845649, -2.3634254 51.3843983, -2.3631927 51.3838436) [highway#primary,ref#A4,name#Gay Street]
420 LINESTRING (-1.8230973 52.5541131, -1.8230368 52.5540756, -1.8229324 52.5540109, -1.8227961 52.5539014, -1.8227365 52.5538461, -1.8226952 52.5538058, -1.8226204 52.5537103, -1.8223988 52.5534041, -1.8221814 52.5531498, -1.8218478 52.5528188, -1.8215581 52.5525626, -1.8213525 52.5524042) [source#GPS Survey,highway#residential,postal_code#B72,name#Moss Drive,is_in#Sutton Coldfield,maxspeed#30,abutters#residential]
490 LINESTRING (-0.1896508 51.6456414, -0.1895803 51.6456036, -0.1895245 51.645551, -0.1890055 51.6450801, -0.1887808 51.6448764, -0.1885605 51.6446756, -0.1883084 51.6443753, -0.1875496 51.6433375, -0.1864572 51.6415288, -0.1862165 51.6411939, -0.1859495 51.6406583, -0.1858855 51.6405461) [lit#yes,surface#asphalt,maxspeed#30 mph,highway#residential,abutters#residential,name#Sherrards Way]
770 LINESTRING (-1.8184653 52.5723683, -1.8182353 52.5723576, -
…
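The sample records above appear to follow the pattern "id LINESTRING (x y, x y, ...) [key#value,key#value]". A minimal Python sketch of a parser for that pattern is shown below. The layout is inferred from the sample lines only, and parse_record and RECORD are hypothetical names; tag values that themselves contain a comma or a closing bracket would need a more careful parser.

```python
import re
from typing import Dict, List, Tuple

# id, then a WKT-style LINESTRING, then a bracketed list of key#value tags
RECORD = re.compile(r"^(\d+)\s+LINESTRING\s*\(([^)]*)\)\s*\[(.*)\]$")


def parse_record(line: str) -> Tuple[int, List[Tuple[float, ...]], Dict[str, str]]:
    """Split one road-segment record into (id, coordinates, tags)."""
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError("unrecognized record: " + line)
    record_id, coord_text, tag_text = match.groups()
    coords = [tuple(float(v) for v in pair.split())
              for pair in coord_text.split(",")]
    tags = dict(item.split("#", 1)
                for item in tag_text.split(",") if "#" in item)
    return int(record_id), coords, tags


if __name__ == "__main__":
    sample = ("210 LINESTRING (-2.3634904 51.3845649, -2.3634254 51.3843983, "
              "-2.3631927 51.3838436) [highway#primary,ref#A4,name#Gay Street]")
    print(parse_record(sample))
```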
41
Smoothing
Input
Buffer Only
Buffer + Merge
42
Multi-level Image
■ Many images at different zoom levels
Pan
Zoom in/out
Fly to
■ More details as the zoom level increases
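One common way to organize images at several zoom levels is a tile pyramid in which level z splits the map into 2^z by 2^z tiles. The slides do not state the tiling scheme GeoJinni uses, so the Python sketch below is an assumption for illustration only; tile_index, tiles_at, and the default longitude/latitude extent are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def tile_index(x: float, y: float, zoom: int,
               extent: Rect = (-180.0, -90.0, 180.0, 90.0)) -> Tuple[int, int, int]:
    """Return (zoom, col, row) of the tile containing the point (x, y).

    Assumes 2**zoom tiles per axis and rows counted from the top edge.
    Points on the maximum edge are clamped into the last tile.
    """
    xmin, ymin, xmax, ymax = extent
    n = 2 ** zoom
    col = min(int((x - xmin) / (xmax - xmin) * n), n - 1)
    row = min(int((ymax - y) / (ymax - ymin) * n), n - 1)
    return zoom, col, row


def tiles_at(zoom: int) -> int:
    """Number of tiles at a zoom level: four times as many per extra level."""
    return 4 ** zoom


if __name__ == "__main__":
    for z in range(4):
        print(z, tiles_at(z), tile_index(-2.36, 51.38, z))
```

Because the tile count quadruples with each level, the same point falls into a progressively smaller tile, which is what lets deeper levels show more detail.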
43
MNTG - World-wide traffic generator for road networks
http://mntg.cs.umn.edu/
M. F. Mokbel, L. Alarabi, J. Bao, A. Eldawy, A. Magdy, M. Sarwat, E. Waytas, and S. Yackel. MNTG: An Extensible Web-based Traffic Generator. In SSTD, 2013
44
SHAHED – A tool for querying and visualizing spatio-temporal satellite data
http://shahed.cs.umn.edu/
A. Eldawy et al. SHAHED: A MapReduce-based System for Querying and Visualizing Spatio-temporal Satellite Data. In ICDE, 2015
45
World Temperature
46
Smooth World Temperature
47
World Heat Map on Google Earth
48
TAREEG – Web-based extractor for OpenStreetMap data using MapReduce
http://tareeg.net/
L. Alarabi, A. Eldawy, R. Alghamdi, and M. F. Mokbel. TAREEG: A MapReduce-Based Web Service for Extracting Spatial Data from OpenStreetMap. In SIGMOD, 2014
49
Extracted Road Network
50
Analyze your spatial data efficiently
Language: Spatial high-level language
Interact with the system and express your queries in a simple high-level language with built-in spatial support
Data types: Built-in spatial data types
Have all your spatial datasets ready to load in SpatialHadoop with the built-in spatial data types
Indexes: Spatial Indexes
Datasets are organized efficiently using spatial indexes (Grid or R-tree) that are adapted to MapReduce
Operations: Efficient Spatial Operations
Analyze your data on large clusters with built-in spatial operations that run efficiently using spatial indexes
GeoJinni
Website: http://spatialhadoop.cs.umn.edu/
Download source code, binary distribution, and instructions
Email us at: [email protected]