Spatial Data processing with Hadoop

Date post: 08-Jul-2015

University of Minnesota

GeoJinni: Spatial Data processing with Hadoop

http://spatialhadoop.cs.umn.edu/

@spatialhadoop

Ahmed Eldawy

Claudius Ptolemy (AD 90 – AD 168)

Al Idrisi (1099–1165)

Cholera cases in the London epidemic of 1854

Cool technology..!! Can I use it in my application?

Oh..!! But, it is not made for me. Can’t make use of it as is

My pleasure. Here it is..

Kindly let me get the technology you have

Kindly let me understand your needs

HELP..!! I have too much data. Your technology is not helping me

mmm… Let me check with my good friends there.

My pleasure. Here it is..

Cool DBMS technology..!! Can I use it in my application?

Oh..!! But, it is not made for me. Can’t make use of it as is

Kindly let me understand your needs

Kindly let me get the technology you have

HELP..!! Again, I have too much data. Your technology is not helping me

Sorry, seems like the DBMS technology cannot scale more

Let me check with my other good friends there.

Cool MapReduce technology..!! Can I use it in my application?

Oh..!! But, it is not made for me. Can’t make use of it as is

My pleasure. Here it is..

Kindly let me understand your needs

Kindly let me get the technology you have

Kindly let me understand your needs

Kindly let me get the technology you have

aka GeoJinni

Tons of spatial data out there…

Smart phones, satellite images, medical data, traffic data, geotagged microblogs, VGI (Volunteered Geographic Information), sensor networks, geotagged pictures

Spatial language, built-in spatial data types, spatial indexes, spatial operations

GeoJinni website: http://spatialhadoop.cs.umn.edu/

Download source code, binary distribution, and instructions

Email us at: [email protected]

■ Released in March 2013; 75,000 downloads since then

The Built-in Approach of GeoJinni

[Figure: three ways to add spatial support to the Hadoop stack (Storage (HDFS), MapReduce Runtime, Job Monitoring and Scheduling, Pig Latin and Hadoop Java APIs, User Programs). The On-top Approach places spatial modules above an unchanged Hadoop; the From Scratch Approach rebuilds the user program, MapReduce APIs, job monitoring and scheduling, MapReduce runtime, and storage; the Built-in Approach (GeoJinni) adds spatial indexing, early pruning, spatial operators, and a spatial language inside Hadoop.]

GeoJinni Architecture

Applications: MNTG [SSTD’13, ICDE’14] – SHAHED [ICDE’15] – TAREEG [SIGMOD’14, SIGSPATIAL’14]

Spatio-temporal Hadoop

Language: Pigeon [ICDE’14]

Operations: Basic [VLDB’13] – CG_Hadoop [SIGSPATIAL’13] – Data Mining – Visualization [Under submission]

MapReduce: Spatial File Splitter – Spatial Record Reader

Indexing: Grid File – R-tree – R+-tree [ICDE’15]

Language Layer: Pigeon

■ Extends Pig Latin with OGC-compliant primitives

Spatial data types (e.g., Polygon)

Basic operations (e.g., Area)

Spatial predicates (e.g., Touches)

Spatial analysis (e.g., Union)

Spatial aggregate functions (e.g., Convex Hull)

A. Eldawy and M. F. Mokbel. Pigeon: A Spatial MapReduce Language. In ICDE, 2014

cities = LOAD 'cities' AS (city_id: int, city_geom);
City_area = FOREACH cities GENERATE Area(city_geom) AS area;
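For readers who want to check what the script computes, here is a hedged single-machine sketch of the per-record `Area` step. It assumes each `city_geom` is a simple polygon in planar coordinates, given as a list of (x, y) vertices, whereas Pigeon works on OGC geometry types; `polygon_area` is a hypothetical helper based on the shoelace formula, not Pigeon's implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula (hypothetical helper).

    The ring may be open or closed (first vertex repeated at the end);
    the absolute value makes the result independent of vertex order.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the closing vertex so its edge is not counted twice
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Mimics: City_area = FOREACH cities GENERATE Area(city_geom) AS area;
cities = [
    (1, [(0, 0), (4, 0), (4, 3), (0, 3)]),  # 4 x 3 rectangle, open ring
    (2, [(0, 0), (2, 0), (0, 2), (0, 0)]),  # right triangle, closed ring
]
city_area = [polygon_area(geom) for _, geom in cities]
print(city_area)  # [12.0, 2.0]
```

In Pigeon the same computation runs once per record inside the FOREACH, so each mapper can evaluate it independently.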

Indexing Layer: R+-tree

Indexing Layer: Grid File
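A grid file partitions space into uniform cells. The sketch below is one possible single-machine illustration, assuming a fixed space extent, a grid of `cols` by `rows` cells, and that a rectangle overlapping several cells is copied into each of them; `grid_partition` is a hypothetical name, and GeoJinni's real partitioner may choose cell sizes and handle replication differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2), x1 <= x2 and y1 <= y2
Cell = Tuple[int, int]                    # (column, row)


def grid_partition(records: List[Rect], space: Rect, cols: int, rows: int) -> Dict[Cell, List[Rect]]:
    """Assign each rectangle to every uniform grid cell it overlaps (hypothetical)."""
    sx1, sy1, sx2, sy2 = space
    cell_w = (sx2 - sx1) / cols
    cell_h = (sy2 - sy1) / rows

    def clamp(v: int, hi: int) -> int:
        # Keep indices inside the grid; coordinates on the max edge map to the last cell.
        return max(0, min(v, hi - 1))

    cells: Dict[Cell, List[Rect]] = defaultdict(list)
    for rect in records:
        x1, y1, x2, y2 = rect
        c1 = clamp(int((x1 - sx1) // cell_w), cols)
        c2 = clamp(int((x2 - sx1) // cell_w), cols)
        r1 = clamp(int((y1 - sy1) // cell_h), rows)
        r2 = clamp(int((y2 - sy1) // cell_h), rows)
        for c in range(c1, c2 + 1):
            for r in range(r1, r2 + 1):
                cells[(c, r)].append(rect)  # replicate to each overlapped cell
    return dict(cells)


space = (0.0, 0.0, 100.0, 100.0)
records = [(10, 10, 12, 12), (45, 45, 55, 55), (90, 5, 95, 8)]
for cell, recs in sorted(grid_partition(records, space, 2, 2).items()):
    print(cell, recs)
```

Replicating boundary-crossing records keeps each cell self-contained, at the cost of duplicates that a query over several cells would need to remove.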

Non-indexed Heap File

Range Query

SpatialFileSplitter prunes blocks outside the query range

SpatialRecordReader passes local indexes to the map function

Map function selects records in range
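The three steps of a range query can be mimicked on one machine. In this hedged sketch, each block is a hypothetical (MBR, records) pair standing in for an indexed HDFS block; `split_blocks` plays the role of SpatialFileSplitter by dropping blocks whose MBR misses the query, and `map_block` plays the map function. A linear scan replaces the local index that SpatialRecordReader would hand to the map function, so only the pruning logic, not the local-index speedup, is shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Block = Tuple[Rect, List[Point]]          # hypothetical (block MBR, records)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles intersect (touching boundaries count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(r: Rect, p: Point) -> bool:
    """True if point p lies inside rectangle r (boundaries included)."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def split_blocks(blocks: List[Block], query: Rect) -> List[Block]:
    """Splitter step: keep only blocks whose MBR overlaps the query range."""
    return [blk for blk in blocks if overlaps(blk[0], query)]


def map_block(records: List[Point], query: Rect) -> List[Point]:
    """Map step: select the records of one block that fall inside the range."""
    return [p for p in records if contains(query, p)]


def range_query(blocks: List[Block], query: Rect) -> List[Point]:
    result: List[Point] = []
    for _, records in split_blocks(blocks, query):
        result.extend(map_block(records, query))
    return result


blocks = [
    ((0, 0, 50, 50), [(10, 10), (40, 35)]),
    ((50, 0, 100, 50), [(60, 20), (95, 5)]),
    ((0, 50, 50, 100), [(20, 80)]),
]
query = (30, 0, 70, 40)
print(len(split_blocks(blocks, query)))  # 2 (the third block is pruned)
print(range_query(blocks, query))        # [(40, 35), (60, 20)]
```

Pruning happens before any record is read, which is why a spatially partitioned file can answer small queries without touching most blocks.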

CG_Hadoop

■ Makes use of GeoJinni to speed up computational geometry algorithms

Polygon union, Skyline, Convex Hull, Farthest/Closest Pair

■ Single machine implementation

E.g., Skyline of 4 billion points takes three hours

■ Straightforward implementation in Hadoop

Hadoop parallel execution

■ More efficient implementation in GeoJinni

Spatial indexing

Early pruning

■ Free and open source as part of GeoJinni

[Chart: speedup relative to a single machine: Single Machine 1x, Hadoop 29x, GeoJinni 260x]
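The skyline example can be made concrete with a partition-and-merge sketch, assuming the max-max convention (a point is dropped if another point is at least as large in both coordinates and strictly larger in one). The round-robin partitioning stands in for Hadoop input splits, and the merge step relies on the fact that every global skyline point is also a skyline point of its own partition. CG_Hadoop's early pruning of whole partitions through the spatial index is not modeled here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def local_skyline(points: List[Point]) -> List[Point]:
    """Max-max skyline: keep points not dominated in both x and y.

    Sweep in decreasing x (ties: decreasing y); a point survives only if
    its y beats every y seen so far. Exact duplicates are kept once.
    """
    result: List[Point] = []
    best_y = float("-inf")
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


def partitioned_skyline(points: List[Point], num_parts: int) -> List[Point]:
    """Local skyline per partition (map side), then one merge pass (reduce side)."""
    parts = [points[i::num_parts] for i in range(num_parts)]
    candidates: List[Point] = []
    for part in parts:
        candidates.extend(local_skyline(part))
    return local_skyline(candidates)


pts = [(1, 9), (3, 7), (2, 2), (5, 5), (4, 6), (8, 1), (6, 3), (7, 2)]
print(partitioned_skyline(pts, 3))
```

Only the local skylines travel to the merge step, which is usually a small fraction of the input.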

Convex Hull

Find the minimal convex polygon that contains all points

[Figure panels: Input; Output]

Convex Hull in CG_Hadoop

[Figure: Hadoop vs. CG_Hadoop execution: Partition, Pruning, Local hull, Global hull]
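The partition, local hull, and global hull steps can be sketched on one machine. This assumes points in the plane and uses Andrew's monotone chain algorithm for each hull (the slide does not say which hull algorithm CG_Hadoop uses); partitions are formed round-robin, so the pruning step, which depends on a spatial partitioning, is omitted. Correctness rests on the fact that every vertex of the global hull is also a vertex of the hull of its own partition.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices counter-clockwise from the
    lowest-x (then lowest-y) point, with collinear points removed."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # last point of each chain starts the other


def partitioned_hull(points: List[Point], num_parts: int) -> List[Point]:
    """Local hull per partition, then a global hull over the local hull vertices."""
    parts = [points[i::num_parts] for i in range(num_parts)]
    candidates: List[Point] = []
    for part in parts:
        candidates.extend(convex_hull(part))  # local hull (map side)
    return convex_hull(candidates)           # global hull (reduce side)


pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (3, 1), (2, 0)]
print(partitioned_hull(pts, 3))
```

With a spatial partitioning instead of round-robin, whole partitions that lie strictly inside the hull could be skipped before computing local hulls, which is the role of the Pruning step in the figure.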

Map rendering

■ Map rendering creates an image that represents the data

■ Visualization is an international language

■ Can reveal patterns that are otherwise hard to spot

■ The visual system occupies about one third of the human brain

Sample input records:

210 LINESTRING (-2.3634904 51.3845649, -2.3634254 51.3843983, -2.3631927 51.3838436) [highway#primary,ref#A4,name#Gay Street]

420 LINESTRING (-1.8230973 52.5541131, -1.8230368 52.5540756, -1.8229324 52.5540109, -1.8227961 52.5539014, -1.8227365 52.5538461, -1.8226952 52.5538058, -1.8226204 52.5537103, -1.8223988 52.5534041, -1.8221814 52.5531498, -1.8218478 52.5528188, -1.8215581 52.5525626, -1.8213525 52.5524042) [source#GPS Survey,highway#residential,postal_code#B72,name#Moss Drive,is_in#Sutton Coldfield,maxspeed#30,abutters#residential]

490 LINESTRING (-0.1896508 51.6456414, -0.1895803 51.6456036, -0.1895245 51.645551, -0.1890055 51.6450801, -0.1887808 51.6448764, -0.1885605 51.6446756, -0.1883084 51.6443753, -0.1875496 51.6433375, -0.1864572 51.6415288, -0.1862165 51.6411939, -0.1859495 51.6406583, -0.1858855 51.6405461) [lit#yes,surface#asphalt,maxspeed#30 mph,highway#residential,abutters#residential,name#Sherrards Way]

770 LINESTRING (-1.8184653 52.5723683, -1.8182353 52.5723576, …
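As a minimal illustration of map rendering, the sketch below parses records in the `ID LINESTRING (...) [tags]` form shown above, scales coordinates into a fixed pixel grid, and rasterizes each segment with Bresenham's line algorithm. The record parsing, the caller-supplied MBR (every coordinate is assumed to fall inside it), and the binary on/off image are assumptions for illustration; the distributed execution a Hadoop-based renderer would need is not modeled.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Pixel = Tuple[int, int]                   # (col, row)


def parse_linestring(record: str) -> List[Point]:
    """Extract (x, y) pairs from a record like 'ID LINESTRING (x y, x y, ...) [tags]'."""
    match = re.search(r"LINESTRING\s*\(([^)]*)\)", record)
    if match is None:
        raise ValueError("record has no LINESTRING")
    points: List[Point] = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        points.append((float(x), float(y)))
    return points


def to_pixel(p: Point, mbr: Rect, width: int, height: int) -> Pixel:
    """Scale a coordinate inside mbr to a pixel; row 0 is the top (max y)."""
    x1, y1, x2, y2 = mbr
    col = int((p[0] - x1) / (x2 - x1) * (width - 1))
    row = int((y2 - p[1]) / (y2 - y1) * (height - 1))
    return col, row


def draw_line(img: List[List[int]], a: Pixel, b: Pixel) -> None:
    """Bresenham's line algorithm: turn on every pixel along segment a-b."""
    (c0, r0), (c1, r1) = a, b
    dc, dr = abs(c1 - c0), -abs(r1 - r0)
    sc = 1 if c0 < c1 else -1
    sr = 1 if r0 < r1 else -1
    err = dc + dr
    while True:
        img[r0][c0] = 1
        if c0 == c1 and r0 == r1:
            break
        e2 = 2 * err
        if e2 >= dr:
            err += dr
            c0 += sc
        if e2 <= dc:
            err += dc
            r0 += sr


def render(records: List[str], mbr: Rect, width: int, height: int) -> List[List[int]]:
    """Rasterize every LINESTRING record into a width x height binary image."""
    img = [[0] * width for _ in range(height)]
    for rec in records:
        pixels = [to_pixel(p, mbr, width, height) for p in parse_linestring(rec)]
        for a, b in zip(pixels, pixels[1:]):
            draw_line(img, a, b)
    return img


img = render(["1 LINESTRING (0 0, 10 10) [highway#primary]"], (0, 0, 10, 10), 5, 5)
for row in img:
    print("".join("#" if v else "." for v in row))
```

A heat map follows the same pattern with counts instead of on/off pixels, which is a natural fit for summing partial images produced by different machines.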

Smoothing

[Figure panels: Input; Buffer only; Buffer + Merge]

Multi-level Image

■ Many images at different zoom levels

Pan

Zoom in/out

Fly to

■ More details as the zoom level increases
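One common way to build such a multi-level image is a tile pyramid in which each zoom level splits every tile into four, so level z has 2^z by 2^z tiles. The slide does not specify the layout, so the sketch assumes that quadtree scheme over a plain longitude/latitude extent (not the Web Mercator projection web maps typically use); `tile_for_point` and `pyramid_path` are hypothetical helpers showing which tile covers a point at each level.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tile = Tuple[int, int, int]               # (zoom level, column, row)


def tiles_at_level(z: int) -> int:
    """Number of tiles at zoom level z in a quadtree-style pyramid."""
    return 4 ** z


def tile_for_point(x: float, y: float, extent: Rect, z: int) -> Tile:
    """Return (z, col, row) of the tile covering (x, y); row 0 is the top edge."""
    x1, y1, x2, y2 = extent
    n = 2 ** z  # tiles per side at this level
    col = int((x - x1) / (x2 - x1) * n)
    row = int((y2 - y) / (y2 - y1) * n)
    # Points on the max edge belong to the last tile instead of falling off the grid.
    return z, min(col, n - 1), min(row, n - 1)


def pyramid_path(x: float, y: float, extent: Rect, max_level: int) -> List[Tile]:
    """Tiles covering (x, y) from the coarsest level down to max_level."""
    return [tile_for_point(x, y, extent, z) for z in range(max_level + 1)]


world = (-180.0, -90.0, 180.0, 90.0)
print(pyramid_path(-93.2, 44.97, world, 3))
```

Because each level halves the tile width, a tile's parent is found by integer-dividing its column and row by two, which lets a viewer pan and zoom by fetching only the tiles in view.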

MNTG – World-wide traffic generator for road networks

http://mntg.cs.umn.edu/

M. F. Mokbel, L. Alarabi, J. Bao, A. Eldawy, A. Magdy, M. Sarwat, E. Waytas, and S. Yackel. MNTG: An Extensible Web-based Traffic Generator. In SSTD, 2013

SHAHED – A tool for querying and visualizing spatio-temporal satellite data

http://shahed.cs.umn.edu/

A. Eldawy et al. SHAHED: A MapReduce-based System for Querying and Visualizing Spatio-temporal Satellite Data. In ICDE, 2015

World Temperature

Smooth World Temperature

World Heat Map on Google Earth

TAREEG – Web-based extractor for OpenStreetMap data using MapReduce

http://tareeg.net/

L. Alarabi, A. Eldawy, R. Alghamdi, and M. F. Mokbel. TAREEG: A MapReduce-Based Web Service for Extracting Spatial Data from OpenStreetMap. In SIGMOD, 2014

Extracted Road Network

GeoJinni

Analyze your spatial data efficiently

Spatial high-level language: interact with the system and express your queries in a simple high-level language with built-in spatial support

Built-in spatial data types: have all your spatial datasets ready to load in SpatialHadoop with the built-in spatial data types

Spatial indexes: datasets are organized efficiently using spatial indexes (Grid or R-tree) that are adapted to MapReduce

Efficient spatial operations: analyze your data on large clusters with built-in spatial operations that run efficiently using spatial indexes

Website: http://spatialhadoop.cs.umn.edu/

Download source code, binary distribution, and instructions

Email us at: [email protected]

