Warp10: collect, store and manipulate sensor data - BreizhCamp - 2016 03-24

Post on 09-Apr-2017

@FinistSeb @LostInBrittany #BzhCmp #Warp10

Warp 10: Collect, store and manipulate sensor data

Horacio Gonzalez, Sébastien Lambour

Horacio Gonzalez

@LostInBrittany

Cityzen Data

Spaniard lost in Brittany, developer, dreamer and all-around geek

Sébastien Lambour

@FinistSeb

Cityzen Data

Runner, 2 Kids, Geek, Handyman, Polyglot JVM Developer

Warp 10

Introduction: Geo-Time Series™

Image: Spacetime distortions

Time Series

Image: Mike Bostock

Time series storage and analysis

Image: Hamza Fessi and ABC Bourse

Not suited for your vanilla SQL RDBMS

One simple example: moving averages...
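To make the moving-average example concrete, here is a minimal Python sketch of a simple moving average over a list of values. The function name and the sample prices are made up for illustration; this is not how Warp 10 computes it.

```python
from collections import deque


def moving_average(values, window):
    """Yield the mean of the last `window` values once the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        if len(buf) == window:
            yield total / window


prices = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up sample values
averages = list(moving_average(prices, 3))
```

A few lines in a general-purpose language; expressing the same sliding window in plain SQL is much less direct.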

Geo-Time Series™

Image: AIS Vessel Tracking

Geo-Time Series

Geo-Time Series and the IoT

Image: LinkedIn

IoT means talking things. How fast are they talking?

Internet of very introverted Things

Long range transmissions

Internet of introverted Things

Personal Area Network

Internet of shy Things

Local Area Network, Cellular Networks

Lots of shy things generate a huge amount of data

Image: Universal Studios

Internet of chatty Things

10 000 Hz

670 000 sensors

20 000 metrics

per second

Internet of garrulous Things

Image: Google

Millions of metrics per second

Warp 10 : A software platform for IoT

Warp 10 is a software platform that
● Ingests and stores data
● Manipulates and analyzes data
● Is dedicated to data from sensors, meters, IoT and any real or virtual probe

Warp 10 General Synoptic

Storage Architecture

Language, Functions, Algorithms

Application access

Visualization, Real Time

#collect: How do you get these metrics?

Image: Games Radar

Using our own Sensision agent

Using our own Sensision agent

With queue forwarder

Using plugins for other collecting systems

Or simply pushing data directly
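As a rough idea of what pushing data directly can look like, the Python sketch below builds (but does not send) an HTTP POST carrying data points as text lines. The `/api/v0/update` path and the `X-Warp10-Token` header are assumptions recalled from the Warp 10 HTTP API and should be checked against the platform documentation; the base URL, the token, the sample line and the function name are placeholders.

```python
import urllib.request


def build_update_request(base_url, write_token, lines):
    """Build a POST request whose body has one data point per line."""
    body = "\n".join(lines).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v0/update",  # assumed ingestion endpoint
        data=body,
        method="POST",
    )
    req.add_header("X-Warp10-Token", write_token)  # assumed header name
    req.add_header("Content-Type", "text/plain")
    return req


# Placeholder URL, token and data point, for illustration only.
req = build_update_request(
    "http://localhost:8080/",
    "WRITE_TOKEN",
    ["1458810000000000// temperature{room=lab} 21.5"],
)
# urllib.request.urlopen(req) would actually send it.
```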

Choosing an input format

XML? 139 bytes

JSON? 108 bytes

Warp 10 GTS Input Format

57 bytes

But size isn't the most important reason: parsing time matters far more.

XML or even JSON parsing is slow and costly. Parsing the Warp 10 GTS input format isn't.

Warp 10 GTS Input Format

timestamp (µs by default)

latitude:longitude (WGS84)

elevation (millimeters)

classname*

labels (key=value)

value* (long, double, boolean or string)

* mandatory fields
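The fields above can be assembled into one text line per data point. The Python sketch below assumes the layout `TS/LAT:LON/ELEV CLASS{LABELS} VALUE`, with `T`/`F` for booleans and URL-encoded, single-quoted strings; that layout is recalled from the Warp 10 documentation rather than shown in these slides, and the function names and sample values are made up.

```python
from urllib.parse import quote


def format_value(value):
    """Encode a long, double, boolean or string value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "T" if value else "F"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + quote(str(value), safe="") + "'"


def format_datapoint(classname, value, labels=None,
                     ts_us=None, lat=None, lon=None, elev_mm=None):
    """Build one input line; only classname and value are mandatory."""
    ts = "" if ts_us is None else str(ts_us)
    geo = "" if lat is None or lon is None else str(lat) + ":" + str(lon)
    elev = "" if elev_mm is None else str(elev_mm)
    label_str = ",".join(
        quote(str(k), safe="") + "=" + quote(str(v), safe="")
        for k, v in sorted((labels or {}).items())
    )
    return (ts + "/" + geo + "/" + elev + " "
            + quote(classname, safe="") + "{" + label_str + "} "
            + format_value(value))


line = format_datapoint("fuel.price", 1.129, {"type": "diesel"},
                        ts_us=1458810000000000,
                        lat=48.115434, lon=-1.636877)
```

There is no nesting and no closing tags, so a parser only needs to split on a few separators.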

#store: From tiny to huge

Image: Games Radar

Warp 10 on Raspberry Pi B+

1 000 datapoints per second

Warp 10 on Raspberry Pi 2 B

3 000 datapoints per second

Warp 10 on a modern server

120 000 datapoints per second

Warp 10 on a cluster

3 million datapoints per second (our current record on input traffic)

#analyse: From tiny to huge

Image: Amazon

Many time-series solutions

TSAR

But they are only stores...

Fetching data is only the tip of the iceberg

Analysing the data

High level analysis must be done elsewhere

Algorithms are resource hungry

Your computer is not a datacenter

Manipulating GTS

To be scalable, analysis must be done on the Warp 10 platform, not on the user's computer

A true GTS analysis toolbox
○ Hundreds of functions
○ Manipulation frameworks
○ Analysis workflow

Manipulating GTS

GTS manipulation

Why not a simple REST API?
● One endpoint per function?
● How do you chain an analysis workflow?

A REST API is not suitable for complex manipulations

GTS manipulation

Why not a SQL dialect?
● How do you do a simple moving average in SQL?
● How do you do geo-time fencing in SQL?

SQL is not suited to (G)TS analysis!

GTS manipulation language

Our solution: a GTS manipulation language

WarpScript

A stack-based language
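For readers new to stack-based languages, here is a toy Python evaluator: values are pushed onto a stack, and each operator pops its operands and pushes the result. It only illustrates the idea; it is not WarpScript, and the four operators are chosen for the example.

```python
def evaluate(program):
    """Evaluate a whitespace-separated postfix program and return the stack."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in program.split():
        if token in ops:
            b = stack.pop()  # top of the stack is the right-hand operand
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))
    return stack


result = evaluate("3 4 + 2 *")  # (3 + 4) * 2
```

Because every step reads from and writes to the same stack, chaining operations into a workflow needs no intermediate variables.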

WarpScript

Non-compiled

Optimized functions, fast execution

Basic operations

Five frameworks

More than 500 functions

Time series functions

Geo-Time Series functions

Geo mapping (WKT)

Horizontal & vertical speed

Horizontal & vertical distance

Haversine...
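As a reminder of what the haversine formula computes, here is a short Python sketch of the great-circle distance between two latitude/longitude points. The mean Earth radius constant and the function name are choices made for this example, not taken from Warp 10.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# A quarter of the equator, from longitude 0 to longitude 90.
quarter_equator = haversine_m(0.0, 0.0, 0.0, 90.0)
```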

Quantum IDE

Enough teasing...

Fuel prices data

16 297 448 metrics

11 379 fuel stations

42 885 Geo Time Series

Basic analysis

Average diesel fuel prices in France since 2007

Image: LEGO Ideas

First, fetch the data (SQL vs WarpScript)

FETCH gives us a GTS list

FETCH gives us a GTS list

Timestamp (microseconds since epoch)
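Converting between a calendar date and microseconds since the epoch is a common first stumbling block. The Python helpers below are a small sketch of that conversion in UTC; the function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(dt):
    """Timezone-aware datetime -> integer microseconds since the epoch."""
    delta = dt - EPOCH
    # Integer arithmetic avoids the rounding a float timestamp could add.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_micros(us):
    """Integer microseconds since the epoch -> UTC datetime."""
    return EPOCH + timedelta(microseconds=us)


talk_day = datetime(2016, 3, 24, tzinfo=timezone.utc)
talk_day_us = to_micros(talk_day)
```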

FETCH gives us a GTS list

Location (latitude, longitude)

FETCH gives us a GTS list

Value

Calculate the average

Using Groovy:

Calculate the average with WarpScript

1- Calculate the mean price by station

Calculate the average with WarpScript

BUCKETIZE framework

Put the data of a GTS into regularly spaced buckets
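The bucketizing idea can be sketched in a few lines of Python: group the points of one series into fixed-width buckets and keep one value (here the mean) per bucket. This is a simplified illustration, not the WarpScript BUCKETIZE implementation; in particular, labelling each bucket by its start tick is a choice made for this example.

```python
from statistics import mean


def bucketize_mean(points, bucketspan):
    """Group (tick, value) pairs into buckets of width `bucketspan`.

    Returns a sorted list of (bucket_start_tick, mean_of_values).
    """
    buckets = {}
    for tick, value in points:
        start = (tick // bucketspan) * bucketspan
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Made-up irregular series: two points fall in [0, 10), one in [10, 20).
series = [(0, 1.0), (5, 3.0), (12, 10.0)]
regular = bucketize_mean(series, 10)
```

Once every series is on the same regular ticks, series can be compared or combined tick by tick.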

Calculate the average with WarpScript

Calculate the average with WarpScript

2- Reduce to get the global average

Calculate the average with WarpScript

REDUCE framework

Apply a function on a set of GTS tick by tick
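A tick-by-tick reduction can be pictured as follows: for every tick, collect the values that the series have at that tick and apply one function to them. The Python sketch below does this with the mean; it represents each series as a plain dictionary, which is an assumption of this example and not how Warp 10 stores a GTS.

```python
from statistics import mean


def reduce_mean(series_list):
    """Combine several {tick: value} series into one, tick by tick."""
    ticks = sorted(set().union(*[s.keys() for s in series_list]))
    return {
        tick: mean(s[tick] for s in series_list if tick in s)
        for tick in ticks
    }


# Made-up per-station series already aligned on the same ticks.
station_a = {0: 1.0, 10: 3.0}
station_b = {0: 3.0, 10: 5.0, 20: 7.0}
global_mean = reduce_mean([station_a, station_b])
```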

Too verbose? Write it differently

Even more concise

Basic analysis: mean of the last available diesel fuel prices in France

Image: LEGO Ideas

Fetching data (SQL vs WarpScript)

FETCH gives us a GTS list

Mean of those last prices

align ticks with BUCKETIZE framework

compute the average with REDUCE

Geo-time analysis

Find the cheapest fuel station near here

48.115434, -1.636877

WKT: Well-known text geometry
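As an illustration of the WKT notation, the Python sketch below writes a square search area around a point as a WKT polygon. WKT lists coordinates as x y, that is longitude then latitude, and a polygon ring must end on its first point. The function name and the half-side size in degrees are choices made for this example.

```python
def bbox_wkt(lat, lon, half_side_deg):
    """Return a WKT POLYGON for a square box centred on (lat, lon)."""
    south, north = lat - half_side_deg, lat + half_side_deg
    west, east = lon - half_side_deg, lon + half_side_deg
    ring = [
        (west, south), (east, south), (east, north), (west, north),
        (west, south),  # repeat the first point to close the ring
    ]
    coords = ", ".join("{:.6f} {:.6f}".format(x, y) for x, y in ring)
    return "POLYGON ((" + coords + "))"


area = bbox_wkt(48.0, -1.5, 0.5)
```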

WKT in WarpScript

Geo-filtering points of GTS

Geo-filtering points of GTS

MAPPER framework

Apply a function on values of a GTS that fall into a sliding window
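The sliding-window idea can be sketched in Python as below: for each point, take a window of neighbouring values and replace the point's value with the result of a function applied to that window. Expressing the window as a number of points before and after the current one is an assumption of this sketch, and the code is not the WarpScript implementation.

```python
def map_window(points, func, pre, post):
    """Apply `func` to a sliding window over a sorted list of (tick, value).

    The window covers `pre` points before and `post` points after each point.
    """
    out = []
    for i, (tick, _) in enumerate(points):
        lo = max(0, i - pre)
        window = [value for _, value in points[lo:i + post + 1]]
        out.append((tick, func(window)))
    return out


# Made-up series; running maximum over the current and previous point.
series = [(0, 1), (1, 5), (2, 2), (3, 8)]
running_max = map_window(series, max, pre=1, post=0)
```

With a window of zero points on each side the function sees one point at a time, which is the shape a point-by-point filter would take.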

The stations near my position

There can only be one
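The last step, keeping only the cheapest of the nearby stations, can be sketched in Python as follows. The station records, their prices and the simple square "nearby" test are all invented for this example; they are not the data set or the geo-filtering used in the talk.

```python
def cheapest_nearby(stations, lat, lon, half_side_deg):
    """Return the lowest-priced station inside a square box, or None."""
    nearby = [
        s for s in stations
        if abs(s["lat"] - lat) <= half_side_deg
        and abs(s["lon"] - lon) <= half_side_deg
    ]
    return min(nearby, key=lambda s: s["price"]) if nearby else None


# Invented stations: two close to the search point, one far away.
stations = [
    {"name": "A", "lat": 48.11, "lon": -1.64, "price": 1.05},
    {"name": "B", "lat": 48.12, "lon": -1.63, "price": 0.99},
    {"name": "C", "lat": 43.30, "lon": 5.37, "price": 0.90},
]
best = cheapest_nearby(stations, 48.115434, -1.636877, 0.05)
```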

And this is only the surface. Possibilities are endless

Think differently

Geo-Time Series are everywhere

Warp 10 platform and tools

Everything is on GitHub

https://github.com/cityzendata/

Thank you!