BigData Spatial Analytics...Beyond Dashboard! Can have best ML, best model, best team, all useless...

Date post: 20-Jun-2020
BigData Spatial Analytics Mansour Raad
Story Time...

[Certificate slide]

is hereby granted to Mansour Raad

to certify that he/she has completed to satisfaction The CCDH Exam

Test Date: March 2, 2012

Date Granted: Mar 09, 2012

Cloudera, Inc. 210 Portage Avenue Palo Alto, CA 94306 www.cloudera.com

Finally, a big nail...

Input 1

U.S. Demographic Data

Demographic Info

• Location

• Gender

• Race

• Income

• Age

Input 2

~1000 Locations

Task...

For Each Location

For Each Demographic

50 Mile Heatmap
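The task above can be sketched in a few lines of Python. This is only an illustration, not the method used in the talk: the haversine distance, the `heat_cells` helper, the `(lon, lat)` record layout and the 0.1 degree cell size are all assumptions introduced here.

```python
import math
from collections import Counter

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def heat_cells(location, records, radius_miles=50.0, cell_deg=0.1):
    """Count records within radius_miles of location, binned into grid cells.

    location is a (lon, lat) pair; records is an iterable of (lon, lat) pairs
    (a hypothetical layout). Returns a Counter keyed by (column, row) cell index.
    """
    lon0, lat0 = location
    cells = Counter()
    for lon, lat in records:
        if haversine_miles(lon0, lat0, lon, lat) <= radius_miles:
            cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
            cells[cell] += 1
    return cells
```

Running `heat_cells` once per location and once per demographic subset gives the nested loop described on the slide; the per-cell counts are what a heatmap renderer would then color.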

“Traditional Way”

• 14 Days Later

• 850GB Raster

Gotta Be A Better Way !

Hadoop

$> cat input | map | sort | reduce > out

Advantage

• Parallelism

• Fast Input Stream

• Fast Computational Geometry

• Distributed Cache

Vector / Raster

Cooperative Processing

g.beginGradientFill(GradientType.RADIAL, [ 0xFF0000, 0x0000FF ], ...);
g.drawRect(x, y, 200, 200);
g.endFill();
bitmapData.draw(shape, null, null, BlendMode.SCREEN, null, true);

Where To Run 10 Nodes ?

~238 MB Vector

vs.

~850 GB Raster

Best Visualizer ?

What is Big Data ?

Great Story Telling Tool !

Data Democratizer!

Beyond Dashboard!

You can have the best ML, the best model, the best team; all useless if you cannot tell a story of the results!

What Is Big Data ?

(academic)

Beyond Traditional Means !

Traditional Processing

Traditional Database

• Too Big

• Too Fast

• Unstructured

Forcing new ways of thinking !

Big Data Sources...

Catch-all words, just like “Cloud” was 3 years ago!

WebLogs

“Internet Of Things”

Imagery

Health Records

VOLUME

VELOCITY

VARIETY

Volume

• Very Large Amount

• More Parameters

• Multi Node

• Storage

• Processing

- Simple math is more effective with large parameters

- Scalable storage

- Program to data rather than data to program

Velocity

• Rate of digital flow

• Streaming

• Event Processing

• Feedback Loop

• Recommendations

- Clicks, locations

- Mobile / Smartphones

- Last 5 min snapshot of traffic is no good when crossing the street

- CERN

Velocity Engines

• IBM InfoSphere Streams

• Twitter Storm

• Apache S4

Variety

• Unstructured

• Incomplete

• Semantically Different

Data is messy

Storage Variety

• NoSQL

• Columnar (HBase)

• Key/Value (Redis)

• Document (MongoDB)

• Graph (Neo4J)

Hadoop

HDFS

• Multi-TB Storage

• Inexpensive Nodes

• Fault Tolerant

• Concurrent Reading

• Brings Programs To Data

MapReduce

• Software Framework

• Parallel Processing

• Jobs Executed on HDFS

• Java / Python / C++

• Spatial Libraries

MapReduce Job

input | map | sort | reduce | output

Java Jars packaged and sent to data nodes for execution
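The `input | map | sort | reduce | output` pipeline can be imitated on one machine to see what each stage contributes. The sketch below is a local simulation only, not Hadoop's API; `run_job`, `mapper`, `reducer` and the `city,country` line layout are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def run_job(lines, mapper, reducer):
    """Run map, sort and reduce in sequence over an iterable of input lines."""
    mapped = [kv for line in lines for kv in mapper(line)]  # map: emit (key, value) pairs
    mapped.sort(key=itemgetter(0))                          # sort: bring equal keys together
    return [
        reducer(key, [value for _, value in group])         # reduce: one call per key
        for key, group in groupby(mapped, key=itemgetter(0))
    ]


def mapper(line):
    """Emit (country, 1) for a hypothetical 'city,country' CSV line."""
    city, country = line.strip().split(",")
    yield country, 1


def reducer(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)
```

For example, `run_job(["beirut,lebanon", "tripoli,lebanon", "paris,france"], mapper, reducer)` counts cities per country. On a cluster the map and reduce calls run in parallel on the nodes that hold the data, and the sort step becomes the shuffle between them.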

Apache Hive

“SQL”

MapReduce Job

HDFS: CSV, TSV, JSON, BINARY

MapReduce

hive> select * from cities where country='lebanon';

Spatial Storage

• CSV, TSV (Lat, Lon)

• Esri JSON format

• {geometry:{x:-123,y:45},attributes:{}}

• Custom
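As a small illustration of the JSON storage option, the sketch below reads one feature per line and pulls out its point coordinates. It assumes strict JSON with quoted keys (the slide shows a shorthand form) and point geometries only; `feature_xy` is a hypothetical helper name.

```python
import json


def feature_xy(line):
    """Return (x, y, attributes) from one JSON point feature stored on a single line."""
    feature = json.loads(line)
    geometry = feature["geometry"]
    return geometry["x"], geometry["y"], feature.get("attributes", {})
```

Calling `feature_xy('{"geometry":{"x":-123,"y":45},"attributes":{}}')` yields the coordinates and an empty attribute dictionary; a mapper could apply it to every input line.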

What About Spatial ?

User Defined Functions

• select tolower("ESRI");

• select * from mytable where cos(rad) < 0.1;

Spatial UDF !

select * from cities where near(x,y,-84.2,39.4);

select * from cities where contains(x,y,'#mypolys');
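The slides do not show how `near` and `contains` are implemented, so the following is only a guess at the kind of predicate such spatial functions evaluate per row. The `max_dist` parameter, the planar distance and the ray-casting test are assumptions introduced here, and the polygon is passed directly as a vertex list rather than by name.

```python
import math


def near(x, y, cx, cy, max_dist=1.0):
    """True if (x, y) lies within max_dist of (cx, cy), in coordinate units (planar)."""
    return math.hypot(x - cx, y - cy) <= max_dist


def contains(x, y, polygon):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Each function takes the row's `x` and `y` columns and returns a boolean, so it fits in a `where` clause the same way the scalar functions on the previous slide do.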

Python GeoProcessing

HDFS, RDBMS

“small data”, “big data”

Hadoop Tools

ArcMap, Catalog

Demo Time

The “Zoo”

• Pig - high-level language for Hadoop

• HBase - real-time random access to HDFS

• Flume - streaming data flow

• Mahout - machine learning

• Zookeeper - distributed state management

Processing Evolution

• Transactional - Batch

• Operational - Dashboard

• Analytical - Exploratory

• Intelligent - Real-Time, predictive

Fixed Schema

Variable Schema

“[T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don't know. But there are also unknown unknowns – there are things we do not know we don't know.”

—United States Secretary of Defense, Donald Rumsfeld

@mraad

Upcoming Events

Date | Event | Location

March 21, 2013 | Esri DC Meet Up – Big Data & Location Analytics | Washington, DC

April 18, 2013 | Esri DC Meet Up | Washington, DC

March 23–26, 2013 | Esri Partner Conference | Palm Springs, CA

March 25–28, 2013 | Esri Developer Summit | Palm Springs, CA

July 6–9, 2013 | Esri National Security Summit | San Diego, CA

July 8–12, 2013 | Esri International User Conference | San Diego, CA

