Time Series With OrientDB - Fosdem 2015

Date post: 14-Jul-2015
Upload: wolf4ood
Transcript

Time flows, on Graph

Managing event sequences and time series with a Document-Graph Database

FOSDEM 2015

Enrico Risa

Orient Technologies LTD

Twitter: @wolf4ood

Emanuele Tagliaferri

Orient Technologies LTD

Twitter: @tglman

Time What…?

Time series: A time series is a sequence of data points, typically consisting of successive measurements made over a time interval (Wikipedia)

Time What…?

Event sequences:

• A set of events with a timestamp

• A set of “happened before/after” relationships

• Cause and effect relationships

Graph approaches

• Nodes/Edges

• Index-free adjacency

• Fast traversal

• Dynamic structure

Graph approaches

Linked sequence

[Diagram: events e1, e2, e3, e4, e5 chained by "next" edges (e1 -next-> e2 -next-> e3 -next-> e4 -next-> e5); the timestamp is stored on each vertex]
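The linked-sequence model can be sketched in plain Python as follows. This is an in-memory illustration only, not OrientDB code; the `Event` class and the `append_event` and `walk` helpers are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Event:
    """A vertex that stores its own timestamp and one outgoing 'next' edge."""
    name: str
    timestamp: int
    next: Optional["Event"] = None


def append_event(tail: Optional[Event], event: Event) -> Event:
    """Link a new event after the current tail and return the new tail."""
    if tail is not None:
        tail.next = event
    return event


def walk(start: Event) -> Iterator[Event]:
    """Follow 'next' edges from a starting event, in insertion (time) order."""
    current: Optional[Event] = start
    while current is not None:
        yield current
        current = current.next


# Build e1 .. e5 with increasing (made-up) timestamps.
head = Event("e1", 1)
tail = head
for i in range(2, 6):
    tail = append_event(tail, Event(f"e{i}", i))

names = [e.name for e in walk(head)]
```

Appending is constant-time because only the tail is touched, while reading a time range means walking the chain from a known starting vertex.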

Graph approaches

linked sequence (tag based)

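[Diagram: events e1 to e5, each labelled with its tag list ([Tag1, Tag2], [Tag1], [Tag2]), chained by one edge type per tag: nextTag1 links consecutive events carrying Tag1, nextTag2 links consecutive events carrying Tag2]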
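A minimal sketch of the tag-based variant, again in plain Python rather than OrientDB. The `TaggedEvent` and `TaggedSequence` classes are hypothetical, and the assignment of tags to events below is invented for illustration; it is not read off the slide.

```python
from typing import Dict, List


class TaggedEvent:
    def __init__(self, name: str, tags: List[str]) -> None:
        self.name = name
        self.tags = tags
        # One outgoing "next" edge per tag, e.g. next_by_tag["Tag1"].
        self.next_by_tag: Dict[str, "TaggedEvent"] = {}


class TaggedSequence:
    """Keeps one linked sequence per tag over the same set of events."""

    def __init__(self) -> None:
        self.first: Dict[str, TaggedEvent] = {}
        self.last: Dict[str, TaggedEvent] = {}

    def append(self, event: TaggedEvent) -> None:
        for tag in event.tags:
            previous = self.last.get(tag)
            if previous is None:
                self.first[tag] = event              # first event with this tag
            else:
                previous.next_by_tag[tag] = event    # nextTag edge
            self.last[tag] = event

    def walk(self, tag: str) -> List[str]:
        """Return event names reachable by following one tag's edges."""
        names = []
        current = self.first.get(tag)
        while current is not None:
            names.append(current.name)
            current = current.next_by_tag.get(tag)
        return names


sequence = TaggedSequence()
# Illustrative tag assignment (an assumption, not taken from the diagram).
for name, tags in [
    ("e1", ["Tag1", "Tag2"]),
    ("e2", ["Tag1"]),
    ("e3", ["Tag2"]),
    ("e4", ["Tag1", "Tag2"]),
    ("e5", ["Tag1"]),
]:
    sequence.append(TaggedEvent(name, tags))

tag1_path = sequence.walk("Tag1")
tag2_path = sequence.walk("Tag2")
```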

Graph approaches

Hierarchy

[Diagram: tree of time-unit vertices with one level each for Days, Hours, Minutes and Seconds; events e1, e2, e3, ..., e60 hang off the Seconds level]
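One way to picture the hierarchy is as a tree keyed by day, hour, minute and second, with events attached at the leaves. The sketch below uses nested Python dictionaries in place of vertices and edges; `time_path`, `attach` and `events_in_minute` are hypothetical helper names, and the timestamps are made up.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple


def time_path(ts: datetime) -> Tuple[int, int, int, int]:
    """Path from the root down to the second: (day, hour, minute, second)."""
    return (ts.day, ts.hour, ts.minute, ts.second)


def attach(root: Dict[int, Any], ts: datetime, event: str) -> None:
    """Create missing time-unit nodes on the way down and attach the event."""
    *upper, second = time_path(ts)
    node = root
    for key in upper:
        node = node.setdefault(key, {})
    node.setdefault(second, []).append(event)


def events_in_minute(root: Dict[int, Any], day: int, hour: int, minute: int) -> List[str]:
    """Reach a minute by following three keys, then read its seconds in order."""
    seconds = root.get(day, {}).get(hour, {}).get(minute, {})
    return [event for s in sorted(seconds) for event in seconds[s]]


root: Dict[int, Any] = {}
attach(root, datetime(2015, 1, 30, 13, 5, 0), "e1")
attach(root, datetime(2015, 1, 30, 13, 5, 1), "e2")
attach(root, datetime(2015, 1, 30, 13, 6, 0), "e3")

found = events_in_minute(root, 30, 13, 5)
```

Finding the events of a given minute takes a fixed number of hops from the root, regardless of how many events the whole tree holds.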

Graph approaches

Mixed

[Diagram: the Days/Hours/Minutes/Seconds hierarchy combined with the sequence of events e1, e2, e3, ..., e60]

Current approaches

Advantages

• Flexible

• Events can be connected together in different ways

• You can navigate events following a path by time or tag.

Current approaches

Disadvantages

• Slow queries when the number of events is high

Optimization

● Data Pre-Aggregation

Optimization

Pre-aggregate

[Diagram, shown in three builds: Days/Hours/Minutes hierarchy of time-unit vertices above the event graph; the second and third builds add sum() annotations to the hierarchy]

Optimization

Aggregation logic

• Second 0 -> insert

• Second 1 -> insert

• …

• Second 57 -> insert

• Second 58 -> insert

• Second 59 -> insert + aggregate update

– Write aggregate value on minute vertex

– Minute == 59? Calculate aggregate on hour vertex
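The aggregation logic above can be sketched as follows. This is a plain Python model, not the plugin's code; the `Minute` and `Hour` classes and the `insert` function are hypothetical, and the sketch assumes events arrive in time order so that second 59 really is the last write of a minute.

```python
from typing import Dict, Optional


class Minute:
    def __init__(self) -> None:
        self.seconds: Dict[int, int] = {}
        self.aggregate: Optional[int] = None  # None until the minute is complete


class Hour:
    def __init__(self) -> None:
        self.minutes: Dict[int, Minute] = {}
        self.aggregate: Optional[int] = None  # None until the hour is complete


def insert(hour: Hour, minute: int, second: int, value: int) -> None:
    """Insert a value; roll aggregates up when a time unit completes."""
    m = hour.minutes.setdefault(minute, Minute())
    m.seconds[second] = value
    if second == 59:
        # Last second of the minute: write the aggregate on the minute vertex.
        m.aggregate = sum(m.seconds.values())
        if minute == 59:
            # Last minute of the hour: aggregate the minute values on the hour vertex.
            hour.aggregate = sum(
                x.aggregate for x in hour.minutes.values() if x.aggregate is not None
            )


hour = Hour()
for second in range(59):
    insert(hour, 0, second, 1)
before = hour.minutes[0].aggregate   # minute 0 is still incomplete
insert(hour, 0, 59, 1)
after = hour.minutes[0].aggregate    # minute 0 is now complete
```

Each insert does a constant amount of extra work except at unit boundaries, which is what keeps the write path cheap.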

OrientDB

How to aggregate

Hooks: Server-side triggers (Java or JavaScript), executed when DB operations happen (e.g. insert or update)

Java interface:

public RESULT onBeforeInsert(…);

public void onAfterInsert(…);

public RESULT onBeforeUpdate(…);

public void onAfterUpdate(…);

Optimization

[Diagram: Days/Hours/Minutes hierarchy annotated with pre-aggregated values. Complete vertices carry a sum (sum = 1000, sum = 15000, sum = 300); incomplete vertices have sum = null]

Optimization

Query logic:

• Traverse from root node to specified level (filtering based on vertex data)

• Is there aggregate value?

– Yes: return it

– No: go one level down and do the same

Aggregation on a level will be VERY fast if you have horizontal edges!
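The query logic amounts to a short recursion: use a vertex's aggregate if it has one, otherwise descend one level. Below is a minimal Python sketch under that reading; the `Node` class and `total` function are hypothetical, and the tree layout and the two per-second values (5 and 7) are made up, reusing the sums 1000, 15000 and 300 purely as sample numbers.

```python
from typing import List, Optional


class Node:
    def __init__(self, label: str, aggregate: Optional[int] = None,
                 children: Optional[List["Node"]] = None) -> None:
        self.label = label
        self.aggregate = aggregate      # None means "incomplete, not aggregated yet"
        self.children = children or []


def total(node: Node) -> int:
    """Return the pre-aggregated value if present, else sum the level below."""
    if node.aggregate is not None:
        return node.aggregate
    return sum(total(child) for child in node.children)


# An incomplete minute whose seconds hold raw values.
open_minute = Node("minute 2", None, [Node("second 0", 5), Node("second 1", 7)])
# An incomplete hour with two complete minutes and the open one.
open_hour = Node("hour 2", None, [Node("minute 0", 1000), Node("minute 1", 300), open_minute])
# A day with one complete hour and the open one.
day = Node("day 1", None, [Node("hour 1", 15000), open_hour])

result = total(day)
```

Only the incomplete branch is explored down to the seconds; the complete hour is answered from a single vertex.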

OrientDB

How to calculate aggregate values with a query

Input params:

- Root node (suppose it is #11:11)

select sum(aggregateVal) from (
  traverse out() from #11:11
  while in().aggregateVal is null
)

With the same logic you can query based on time windows
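A time-window query can reuse the same idea: take the pre-aggregated value of every unit that lies entirely inside the window and drop to the lower level only at the edges. The sketch below works on hours and minutes of a single day; `HourNode` and `window_sum` are hypothetical names, the window is expressed in minutes of the day, and the data is invented.

```python
from typing import Dict, Optional


class HourNode:
    def __init__(self, minutes: Dict[int, int], aggregate: Optional[int]) -> None:
        self.minutes = minutes      # minute -> value already aggregated per minute
        self.aggregate = aggregate  # None while the hour is incomplete


def window_sum(hours: Dict[int, HourNode], start: int, end: int) -> int:
    """Sum values for minutes of the day in the half-open window [start, end)."""
    total = 0
    for h, node in hours.items():
        lo, hi = h * 60, (h + 1) * 60
        if start <= lo and hi <= end and node.aggregate is not None:
            # Whole hour inside the window: use the pre-aggregated value.
            total += node.aggregate
        else:
            # Partial or incomplete hour: go one level down.
            total += sum(v for m, v in node.minutes.items() if start <= lo + m < end)
    return total


hours = {
    13: HourNode({m: 2 for m in range(60)}, 120),   # complete hour
    14: HourNode({m: 3 for m in range(30)}, None),  # incomplete hour
}

# 13:00 to 14:10 -> whole hour 13 plus the first ten minutes of hour 14.
wide = window_sum(hours, 13 * 60, 14 * 60 + 10)
# 13:30 to 14:00 -> only part of hour 13, so its minutes are summed.
narrow = window_sum(hours, 13 * 60 + 30, 14 * 60)
```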

Time Series Proof of Concept

POC Implementation

Core:

● As OrientDB Plugin

● Relies on Hooks

● Aggregation Engine

● Handles all time units

Data Visualization:

● Simple UI (Realtime/History)

● Query in Studio

Core

● Plugin that registers the hook and some input/output sources (WebSocket, message queue, socket, etc.)

● Hook on Event class (entry point)

- Events can be saved or not.

- Aggregations are made when the lower time unit changes.

- Pre-allocation of TimeUnit pointers.

● Time units tracked:

- Year

- Month

- Day

- Minute

- Second
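The "hook on the Event class" idea, with aggregation triggered when the lower time unit changes, might look roughly like this. The sketch is a simplified stand-in written in Python and does not use OrientDB's hook API; `Database`, `register_hook`, `MinuteAggregator` and the `ts` field (seconds since some origin) are all assumptions made for the example. Only the `bets` field name comes from the slides.

```python
from typing import Callable, Dict, List, Optional

Hook = Callable[[dict], None]


class Database:
    """Toy store that calls registered hooks after every insert."""

    def __init__(self) -> None:
        self.hooks: List[Hook] = []
        self.records: List[dict] = []

    def register_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def insert(self, record: dict) -> None:
        self.records.append(record)
        for hook in self.hooks:
            hook(record)


class MinuteAggregator:
    """Closes a minute's aggregate as soon as an event of another minute arrives."""

    def __init__(self) -> None:
        self.current_minute: Optional[int] = None
        self.running = 0
        self.closed: Dict[int, int] = {}

    def on_after_insert(self, record: dict) -> None:
        minute = record["ts"] // 60
        if self.current_minute is not None and minute != self.current_minute:
            # The lower time unit changed: store the finished aggregate.
            self.closed[self.current_minute] = self.running
            self.running = 0
        self.current_minute = minute
        self.running += record["bets"]


db = Database()
aggregator = MinuteAggregator()
db.register_hook(aggregator.on_after_insert)

db.insert({"ts": 0, "bets": 1})
db.insert({"ts": 30, "bets": 2})
db.insert({"ts": 61, "bets": 5})   # first event of the next minute
```

Triggering on a unit change rather than on "second 59" means a minute is still closed correctly when no event happens to land on its last second.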

Core

Advantages

● Simple (Few lines of code)

● No Indexes

● Easy to use

– Plain OrientDB SQL to insert an event:

insert into event set bets = 1, cpu = 50

● Fast (Especially in plocal mode)

Core

Disadvantages

● Too Simple (For now)

● Aggregator hardcoded (maybe a JavaScript aggregator?)

Data Visualization

Two Charts:

● Realtime data through WebSocket

The engine pushes the events received every second

● Range query for historical data

Using the powerful array range notation we can query for a specific time range

Let's Run It

Data Query Time unit

● Array Notation

select expand(m[1].d[30].h[13].m[5-10])
from year where time = 2015

● Traverse with Next

traverse next from (
  select expand(m[1].d[26].h[19].m[37])
  from year where time = 2015
) while $depth <= 3

Data Query Aggregation

● Array Notation

select sum(bets) from (
  select expand(m[1].d[30].h[13].m[5-10])
  from year where time = 2015
)

● Traverse with Next

select sum(bets) from (
  traverse next from (
    select expand(m[1].d[26].h[19].m[37])
    from year where time = 2015
  ) while $depth <= 3
)
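To make the two query styles concrete, here is a small Python model of a single hour whose minute vertices are chained by "next" edges. It is only an analogy for the SQL above: `MinuteVertex`, `build_hour`, `select_range` and `traverse_next` are hypothetical, the range m[5-10] is assumed to be inclusive, the start vertex of the traversal is assumed to be at depth 0, and the `bets` values are made up.

```python
from typing import Dict, List, Optional


class MinuteVertex:
    def __init__(self, minute: int, bets: int) -> None:
        self.minute = minute
        self.bets = bets
        self.next: Optional["MinuteVertex"] = None


def build_hour(bets_per_minute: List[int]) -> Dict[int, MinuteVertex]:
    """Create one vertex per minute and chain them with 'next' edges."""
    vertices = {i: MinuteVertex(i, b) for i, b in enumerate(bets_per_minute)}
    for i in range(len(bets_per_minute) - 1):
        vertices[i].next = vertices[i + 1]
    return vertices


def select_range(hour: Dict[int, MinuteVertex], first: int, last: int) -> List[MinuteVertex]:
    """Rough analogue of m[first-last]; the range is assumed inclusive."""
    return [hour[i] for i in range(first, last + 1) if i in hour]


def traverse_next(start: MinuteVertex, max_depth: int) -> List[MinuteVertex]:
    """Rough analogue of 'traverse next ... while $depth <= max_depth'."""
    result: List[MinuteVertex] = []
    current: Optional[MinuteVertex] = start
    depth = 0
    while current is not None and depth <= max_depth:
        result.append(current)
        current, depth = current.next, depth + 1
    return result


# Hypothetical data: minute i recorded i bets.
hour = build_hour(list(range(60)))

range_sum = sum(v.bets for v in select_range(hour, 5, 10))      # minutes 5..10
next_sum = sum(v.bets for v in traverse_next(hour[37], 3))      # minutes 37..40
```

The array notation addresses the vertices by position, while the traversal starts from one vertex and follows edges, so it keeps working across an hour boundary as long as the "next" chain continues.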

Multi-Model Optimization!

We got OrientDB

• Document database (schema-free, complex properties)

• Graph database (index-free adjacency, fast traversal)

• SQL (extended)

• Operational (schema - ACID)

• OO concepts (Classes, inheritance, polymorphism)

• REST/JSON interface

• Native JavaScript (extend query language, expose services, event hooks)

• Distributed architecture (multi-master replica/sharding)

● Studio 2.0

● Lucene & ETL in bundle

● WAL management (Fuzzy Checkpoint)

● Schema Driven Serialization

● Autosharding strategy on Distributed

OrientDB

First step: put them together

[Diagram, shown in two builds: Days/Hours/Minutes hierarchy (labelled "Graph"); the per-second values of a minute are stored as a document (labelled "Document")]

{
  0: 1000,
  1: 1500,
  …
  59: 96
}

<- IT’S A VERTEX TOO!!!

OrientDB

put them together

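[Diagram: Days/Hours hierarchy (labelled "Graph"); each hour vertex embeds its minutes and seconds as a nested document (labelled "Document")]

{
  0: {
    0: 1000,
    1: 1500,
    …
    59: 210
  },
  1: { … },
  …
  59: { … }
}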

Where should I stop?

It depends on my domain and requirements.
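As a rough illustration of the trade-off, the sketch below models an hour vertex that keeps graph-style links to its neighbours while embedding minutes and seconds as a nested document. It is plain Python, not OrientDB; the field names `hour`, `next` and `values` and the helper functions are hypothetical, and the sample numbers are only partly taken from the slide (the value for minute 1 is made up).

```python
from typing import Any, Dict


# An hour "vertex": one graph-style link plus an embedded document.
hour_vertex: Dict[str, Any] = {
    "hour": 13,
    "next": None,  # would point to the following hour vertex in a real graph
    "values": {    # minute -> {second -> value}, stored inside the vertex
        0: {0: 1000, 1: 1500, 59: 210},
        1: {0: 5},
    },
}


def minute_sum(vertex: Dict[str, Any], minute: int) -> int:
    """Aggregate one minute by reading the embedded document only."""
    return sum(vertex["values"].get(minute, {}).values())


def hour_sum(vertex: Dict[str, Any]) -> int:
    """Aggregate the whole hour without following any edge."""
    return sum(minute_sum(vertex, m) for m in vertex["values"])


first_minute = minute_sum(hour_vertex, 0)
whole_hour = hour_sum(hour_vertex)
```

Embedding more levels means fewer vertices and edges to maintain, at the cost of larger documents and coarser units that can be linked to or traversed individually.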

OrientDB

Third step: Complex domains

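[Diagram: Hours/Minutes hierarchy (labelled "Graph"); each second of a minute is a sub-document that can carry extra domain data (labelled "Document")]

{
  0: {val: 1000},
  1: {val: 1500},
  …
  59: {
    val: 96,
    eventTags: [tag1, tag2]
    …
  }
}

<- Enrich the domain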
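Once each second is a sub-document, domain fields such as tags can take part in the aggregation. A minimal Python sketch, assuming a minute document shaped like the one on the slide; `sum_for_tag` is a hypothetical helper.

```python
from typing import Any, Dict


# One minute document: second -> sub-document with a value and optional tags.
minute_doc: Dict[int, Dict[str, Any]] = {
    0: {"val": 1000},
    1: {"val": 1500},
    59: {"val": 96, "eventTags": ["tag1", "tag2"]},
}


def sum_for_tag(doc: Dict[int, Dict[str, Any]], tag: str) -> int:
    """Sum 'val' over the seconds whose eventTags contain the given tag."""
    return sum(entry["val"] for entry in doc.values()
               if tag in entry.get("eventTags", []))


tagged_total = sum_for_tag(minute_doc, "tag1")
overall_total = sum(entry["val"] for entry in minute_doc.values())
```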

One model is not enough

One of the most common issues of my customers is:

“I have a zoo of technologies in my application stack, and it’s getting worse every day”

My answer is: Multi-Model DB

of course ;-)

Thank you!

Enrico Risa

Orient Technologies LTD

Twitter: @wolf4ood

Emanuele Tagliaferri

Orient Technologies LTD

Twitter: @tglman

