Parallel Complex Event Processing

Post on 29-Jan-2018



Karol Grzegorczyk, 03-06-2013

Big Data classification

[http://en.wikipedia.org/wiki/File:3_states_of_data.jpg]

Event-driven architecture

Complex Event Processing solutions

● Open Source:

– Esper

– Drools Fusion

– Storm

– WSO2 Complex Event Processor

● Proprietary software:

– Oracle Complex Event Processing

– StreamBase Complex Event Processing

– Informatica RulePoint

– TIBCO Complex Event Processing

Esper

● Two editions:

― Open source library

― Enterprise server based on Jetty

● The core component of Esper is a CEP engine.

● The CEP engine works like a database turned upside-down: queries are stored and events flow through them

● Expressions are defined in the Event Processing Language (EPL)

― Declarative domain-specific language

― Similar to SQL, but uses views rather than tables and events instead of records (rows)

― Views are reused among EPL statements for efficiency!

select * from OrderEvent.win:length(5)
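To make the length window concrete, here is a minimal Python sketch (not Esper code) of what `win:length(5)` retains: the five most recent events. The function name `length_window` and the sample order names are hypothetical, chosen only for illustration.

```python
from collections import deque

def length_window(events, size):
    """Yield the window contents after each arriving event."""
    window = deque(maxlen=size)  # the oldest event is dropped automatically
    for event in events:
        window.append(event)
        yield list(window)

orders = [f"order-{i}" for i in range(1, 8)]
snapshots = list(length_window(orders, 5))
print(snapshots[-1])  # the five most recent orders: order-3 .. order-7
```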

Streams

● A complex event can be built from several data streams.

select * from AlertEvent as a, NewsEvent as n

where a.symbol = n.symbol

● Esper defines two types of data streams:

― Filter-based event stream

select * from OrderEvent(itemType='shirt')

― Pattern-based event stream

select * from pattern [

OrderEvent(itemType='shirt') -> OrderEvent(itemType='trousers')]

● Filter-based and pattern-based streams can be joined with each other!

● Events can be forwarded to other streams using the INSERT INTO keywords.

● An event can also be updated (using the UPDATE keyword) before it is applied to any select statements
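The two stream types can be sketched in plain Python (not Esper code). The sketch assumes events are dictionaries with an `itemType` key. `filter_stream` and `followed_by` are hypothetical helper names. `followed_by` mimics the `->` operator without the `every` keyword, so it reports only the first match.

```python
def filter_stream(events, item_type):
    """Filter-based stream: only events of the given type pass."""
    return [e for e in events if e["itemType"] == item_type]

def followed_by(events, first_type, second_type):
    """Pattern-based stream: first event of first_type, then the next
    event of second_type that arrives after it. Returns a pair or None."""
    first = None
    for e in events:
        if first is None:
            if e["itemType"] == first_type:
                first = e
        elif e["itemType"] == second_type:
            return (first, e)
    return None

events = [
    {"id": 1, "itemType": "trousers"},
    {"id": 2, "itemType": "shirt"},
    {"id": 3, "itemType": "hat"},
    {"id": 4, "itemType": "trousers"},
]
shirts = filter_stream(events, "shirt")
match = followed_by(events, "shirt", "trousers")
print([e["id"] for e in shirts], [e["id"] for e in match])  # [2] [2, 4]
```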

Views

● Events are derived from streams (both filter- and pattern-based) by views

● The default view encloses all events from the stream since the statement was added to the engine.

● View types:

– Data windows (e.g. length, time)

– Named windows

– Extension views (sorted window, ranked window, time-order view)

– Standard views (unique, grouped, size, lastevent)

– Statistics views (univariate, regression, correlation)

Esper processing

● Update listeners and subscriber objects are associated with EPL statements

● By default, listeners and subscribers are notified when a new event matching the EPL query arrives (insert stream)

● In addition, listeners and subscribers can be notified when an event matching the EPL query is removed from the stream because it leaves the data window (remove stream)

[Esper Reference]
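The insert stream and remove stream can be illustrated with a small Python sketch (not Esper code). `LengthWindowStatement` is a hypothetical class. Its listener is called with the newly arrived events and the events that just left the window.

```python
from collections import deque

class LengthWindowStatement:
    """Hypothetical stand-in for a statement with a length window."""

    def __init__(self, size, listener):
        self.size = size
        self.window = deque()
        self.listener = listener

    def send(self, event):
        self.window.append(event)
        old_events = []
        if len(self.window) > self.size:
            # the oldest event is pushed out of the window (remove stream)
            old_events.append(self.window.popleft())
        # the arriving event is reported on the insert stream
        self.listener([event], old_events)

log = []
stmt = LengthWindowStatement(2, lambda new, old: log.append((new, old)))
for e in ["e1", "e2", "e3"]:
    stmt.send(e)
print(log[-1])  # (['e3'], ['e1'])
```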

Filtering

Esper provides two types of filtering:

● Stream-level filtering

select * from OrderEvent(type= 'shirt')

● Post-data-window filtering

select * from OrderEvent where type = 'shirt'

[Esper Reference]


Stream-level filtering vs post-data-window filtering

select * from OrderEvent(type= 'shirt')

vs

select * from OrderEvent where type = 'shirt'

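The first form is generally preferred, because events are filtered out before they enter the data window, but post-data-window filtering is sometimes what is needed:

Keep the last one hundred orders and calculate the average price of the trousers among them.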

select avg(price) from OrderEvent.win:length(100) where type = 'trousers'
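The two filtering styles can give different answers for the same input, as this Python sketch (not Esper code) shows. It uses a window of 2 instead of 100 to keep the data small. The function names and the sample prices are hypothetical.

```python
from collections import deque

def average(values):
    values = list(values)
    return sum(values) / len(values) if values else None

def stream_level(events, item_type, size):
    # filter first, then window: the last `size` events of that type
    window = deque((e for e in events if e["type"] == item_type), maxlen=size)
    return average(e["price"] for e in window)

def post_data_window(events, item_type, size):
    # window first, then filter: matching events among the last `size` events
    window = deque(events, maxlen=size)
    return average(e["price"] for e in window if e["type"] == item_type)

events = [
    {"type": "trousers", "price": 10.0},
    {"type": "shirt", "price": 50.0},
    {"type": "trousers", "price": 30.0},
    {"type": "shirt", "price": 70.0},
]
print(stream_level(events, "trousers", 2))      # 20.0
print(post_data_window(events, "trousers", 2))  # 30.0
```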

Data Windows

● Basic windows:

― Length window (win:length)

― Length batch window (win:length_batch)

― Time window (win:time)

― Time batch window (win:time_batch)

● Advanced time windows

― Externally-timed window (win:ext_timed)

― Externally-timed batch window (win:ext_timed_batch)

― Time-Length combination batch window (win:time_length_batch)

― Time-Accumulating window (win:time_accum)

― Keep-All window (win:keepall)

― First Length (win:firstlength)

― First Time (win:firsttime)

[Esper Reference]

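A sliding length window moves one event at a time. A length batch window instead collects events and releases them in groups. A minimal Python sketch of the batch behaviour (not Esper code; `length_batch` is a hypothetical name):

```python
def length_batch(events, size):
    """Release events in batches of `size`; an incomplete batch is held back."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []

batches = list(length_batch(range(1, 8), 3))
print(batches)  # [[1, 2, 3], [4, 5, 6]]; event 7 waits for two more events
```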

Scaling Esper

● According to the documentation, Esper exceeds 500,000 events/s on dual-CPU 2 GHz Intel-based hardware, with average engine latency below 3 microseconds (below 10 microseconds with more than 99% predictability), on a VWAP benchmark with 1000 statements registered in the system. This peaks at 70 Mbit/s at 85% CPU usage.

● Parallel processing

– Within one machine

- Context partitions

– With multiple machines

- Partitioned stream

- Partition by use case

Context

● A context partition is the basic unit of locking

● By default there is a single context partition

● Context types:

― Keyed Segmented

― Hash Segmented

― Category Segmented

― Non-overlapping context

― Overlapping context

● Nesting context

Keyed Segmented Context

create context ByCustomerAndAccount partition by custId and account from BankTxn

context ByCustomerAndAccount select custId, account, sum(amount) from BankTxn

Implicit grouping in the select statement: each (custId, account) partition keeps its own sum.
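The effect of the keyed segmented context can be sketched in Python (not Esper code): one running sum per (custId, account) key, with no group-by step. `keyed_sums` and the sample transactions are hypothetical.

```python
from collections import defaultdict

def keyed_sums(txns):
    """Emit (custId, account, running sum) for each arriving transaction."""
    partitions = defaultdict(float)  # one partition state per key
    output = []
    for t in txns:
        key = (t["custId"], t["account"])
        partitions[key] += t["amount"]
        output.append((t["custId"], t["account"], partitions[key]))
    return output

txns = [
    {"custId": "c1", "account": "a1", "amount": 100.0},
    {"custId": "c1", "account": "a2", "amount": 50.0},
    {"custId": "c1", "account": "a1", "amount": 25.0},
    {"custId": "c2", "account": "a1", "amount": 10.0},
]
results = keyed_sums(txns)
print(results[2])  # ('c1', 'a1', 125.0)
```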

Hash Segmented Context

Assigns events to context partitions based on result of a hash function and modulo operation

create context SegmentedByCustomerHash coalesce by hash_code (custId) from BankTxn granularity 16 preallocate

context SegmentedByCustomerHash select custId, account, sum(amount) from BankTxn group by custId, account

No implicit grouping in the select statement!
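A Python sketch of the idea (not Esper code): several customers can share one of the 16 partitions, so grouping has to be explicit. As an assumption, `zlib.crc32` stands in for the hash function here. Python's built-in `hash` for strings varies between runs, and Esper's own `hash_code` behaviour is not reproduced. All names are hypothetical.

```python
import zlib
from collections import defaultdict

GRANULARITY = 16  # number of context partitions

def partition_of(cust_id):
    """Hash plus modulo picks the partition (crc32 is an assumed stand-in)."""
    return zlib.crc32(cust_id.encode("utf-8")) % GRANULARITY

def hashed_sums(txns):
    # all partitions are created up front, like `preallocate`
    partitions = [defaultdict(float) for _ in range(GRANULARITY)]
    output = []
    for t in txns:
        sums = partitions[partition_of(t["custId"])]
        key = (t["custId"], t["account"])  # explicit group by inside the partition
        sums[key] += t["amount"]
        output.append((*key, sums[key]))
    return output

txns = [
    {"custId": "c1", "account": "a1", "amount": 100.0},
    {"custId": "c2", "account": "a1", "amount": 10.0},
    {"custId": "c1", "account": "a1", "amount": 25.0},
]
hashed = hashed_sums(txns)
print(hashed[-1])  # ('c1', 'a1', 125.0)
```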

Category Segmented Context

Assigns events to context partitions based on the values of one or more event properties, using one or more predicate expressions to define context partition membership.

create context CategoryByTemp

group temp < 65 as cold,

group temp between 65 and 85 as normal,

group temp > 85 as large

from SensorEvent

context CategoryByTemp select context.label, count(*) from SensorEvent
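The category assignment can be sketched in Python (not Esper code). Each label has a predicate, and an event is counted under every label whose predicate it satisfies. `CATEGORIES` and `count_by_label` are hypothetical names. `between` is treated as inclusive, as in SQL.

```python
from collections import Counter

CATEGORIES = [
    ("cold", lambda temp: temp < 65),
    ("normal", lambda temp: 65 <= temp <= 85),
    ("large", lambda temp: temp > 85),
]

def count_by_label(temps):
    """Count events per category label."""
    counts = Counter()
    for temp in temps:
        for label, predicate in CATEGORIES:
            if predicate(temp):
                counts[label] += 1
    return counts

counts = count_by_label([60, 65, 85, 90, 100])
print(dict(counts))  # {'cold': 1, 'normal': 2, 'large': 2}
```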

Non-overlapping context

A non-overlapping context is created when the start condition is met and ended when the end condition is met. There is always either one or zero context partitions.

create context NineToFive start (0, 9, *, *, *) end (0, 17, *, *, *)

context NineToFive select * from TrafficEvent(speed >= 100)
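A rough Python equivalent of the nine-to-five context (not Esper code): events are only considered while the single partition is active. Treating 17:00 itself as outside the context is an assumption of this sketch. The names and timestamps are hypothetical.

```python
from datetime import datetime, time

START, END = time(9, 0), time(17, 0)

def context_active(ts):
    """True while the single nine-to-five partition exists (half-open interval)."""
    return START <= ts.time() < END

def fast_traffic(events):
    return [e for e in events if context_active(e["ts"]) and e["speed"] >= 100]

events = [
    {"ts": datetime(2013, 6, 3, 8, 59), "speed": 120},  # before the context starts
    {"ts": datetime(2013, 6, 3, 9, 0), "speed": 110},
    {"ts": datetime(2013, 6, 3, 12, 0), "speed": 80},   # too slow
    {"ts": datetime(2013, 6, 3, 17, 0), "speed": 130},  # context already ended
]
speeds = [e["speed"] for e in fast_traffic(events)]
print(speeds)  # [110]
```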

Overlapping context

This context initiates a new context partition when an initiating condition occurs, and terminates one or more context partitions when the terminating condition occurs.

create context CtxTrainEnter initiated by TrainEnterEvent as te terminated after 5 minutes

context CtxTrainEnter select t1 from pattern [t1=TrainEnterEvent -> timer:interval(5 min) and not TrainLeaveEvent(trainId = context.te.trainId)]
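The intent of this statement, as read here, is to report trains that enter and do not leave within five minutes. The Python sketch below (not Esper code) checks this over a finished list of events rather than incrementally. The tuple layout `(seconds, kind, train_id)` and the function name are assumptions.

```python
def trains_not_left(events, timeout=300):
    """Return train ids that entered and had no leave event within `timeout` seconds.

    Each enter event opens its own (possibly overlapping) five-minute scope.
    """
    stuck = []
    for t_enter, kind, train_id in events:
        if kind != "enter":
            continue
        left_in_time = any(
            k == "leave" and tid == train_id and t_enter < t <= t_enter + timeout
            for t, k, tid in events
        )
        if not left_in_time:
            stuck.append(train_id)
    return stuck

events = [
    (0, "enter", "A"),
    (60, "enter", "B"),    # overlaps with the scope opened by train A
    (120, "leave", "A"),   # A leaves in time
    (500, "leave", "B"),   # B leaves too late (after 60 + 300)
]
stuck = trains_not_left(events)
print(stuck)  # ['B']
```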

Context nesting

With nested contexts, the context declared first controls the lifecycle of the context(s) declared thereafter.

create context NineToFiveSegmented

context NineToFive start (0, 9, *, *, *) end (0, 17, *, *, *),

context SegmentedByCustomer partition by custId from BankTxn

context NineToFiveSegmented select custId, account, sum(amount) from BankTxn group by account

Partitioning without context declaration

Grouped data window std:groupwin()

What is the difference between:

select avg(price) from OrderEvent.std:groupwin(itemType).win:length(10)

And

select avg(price) from OrderEvent.win:length(10) group by itemType

?
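One way to see the difference is a Python sketch (not Esper code) with a window of 2 instead of 10. With `std:groupwin`, each itemType gets its own window and the average, lacking a group by, spans all retained events. With `group by`, there is one shared window and the average is computed per itemType. The function names and prices are hypothetical.

```python
from collections import defaultdict, deque

def avg_groupwin(events, size):
    """One window per itemType; a single average over everything retained."""
    windows = defaultdict(lambda: deque(maxlen=size))
    for item_type, price in events:
        windows[item_type].append(price)
    retained = [p for w in windows.values() for p in w]
    return sum(retained) / len(retained)

def avg_group_by(events, size):
    """One shared window; one average per itemType found in it."""
    window = deque(events, maxlen=size)
    prices = defaultdict(list)
    for item_type, price in window:
        prices[item_type].append(price)
    return {k: sum(v) / len(v) for k, v in prices.items()}

events = [("shirt", 10.0), ("shirt", 20.0), ("shirt", 30.0), ("trousers", 100.0)]
print(avg_groupwin(events, 2))  # 50.0, from shirts 20 and 30 plus trousers 100
print(avg_group_by(events, 2))  # {'shirt': 30.0, 'trousers': 100.0}
```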

Parallel processing on multiple machines

● Partitioned stream

● Partition by use case

[Esper Enterprise Edition Reference]

Thank you