Date posted: 28-Jul-2015
Category: Software
Uploaded by: patrick-di-loreto
Presented by Patrick Di Loreto, R&D Engineering Lead
14th June 2015
Site: https://developer.williamhill.com/
Blog: http://patricknoir.blogspot.com
Twitter: https://twitter.com/patricknoir
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
• WH Labs
• Omnia – Data Management Platform
  – Omnia Chronos – A distributed integration middleware with Akka and Kafka
  – Omnia Fates – The long-term memory with Apache Cassandra
  – Omnia NeoCortex – Real time and machine learning using Apache Spark
  – Omnia Hermes – Serving layer with Akka CQRS
  – Omnia Infrastructure – Mesos, Marathon and Docker
Introduction
We're Hiring: https://careers.williamhill.com
• WH Apple Watch App
• Interactive Scoreboard
• Virtual Reality Horse Race (Oculus Rift)
Omnia – Data Management Platform
Based on a Lambda Architecture, respecting Reactive principles:
• Chronos – Data Source
• Fates – Batch Layer
• NeoCortex – Speed Layer
• Hermes – Serving Layer
[Diagram: Omnia and its four components: Chronos, Fates, Hermes and NeoCortex]
Omnia & Lambda Architecture
[Diagram: Chronos (Data Source), NeoCortex (Speed Layer), Fates (Batch Layer) and Hermes (Serving Layer) arranged as a Lambda Architecture]
Omnia Chronos
Chronos is in charge of collecting data from different sources and organising it into a stream of observable events.
Observable [ ]
• Social media • Facebook • Twitter • Affiliates
• Page viewing • Articles read, following and followers, bets etc…
• Sports related • Tweets • News • Gaming
• Web Analytics • Activities within our applications
Internal Product Centric
External Customer Centric
{
  "type": "bet",
  "version": "1.0",
  "time": "2015-06-03 08:00:31",
  "acquisitionTime": ". . .",
  "source": "WHBetSystem",
  "payload": { … any valid json }
}
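The envelope above could be decoded along these lines. This is a minimal Python sketch, not the Chronos implementation (which the slides describe as built on Akka): the `Incident` class, the `parse_incident` helper, and the sample `acquisitionTime` and `payload` values are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Incident:
    """One event in a stream, mirroring the envelope fields shown in the slides."""
    type: str
    version: str
    time: datetime
    acquisition_time: str  # kept as raw text: its format is not shown in the slides
    source: str
    payload: Any           # any valid JSON value


def parse_incident(raw: str) -> Incident:
    """Decode a JSON envelope; a missing field raises KeyError."""
    doc = json.loads(raw)
    return Incident(
        type=doc["type"],
        version=doc["version"],
        time=datetime.strptime(doc["time"], "%Y-%m-%d %H:%M:%S"),
        acquisition_time=doc["acquisitionTime"],
        source=doc["source"],
        payload=doc["payload"],
    )


# Hypothetical sample: the acquisitionTime and payload values are made up.
sample = """{
  "type": "bet", "version": "1.0",
  "time": "2015-06-03 08:00:31",
  "acquisitionTime": "2015-06-03 08:00:32",
  "source": "WHBetSystem",
  "payload": {"stake": 5}
}"""
incident = parse_incident(sample)
```

Keeping `payload` untyped reflects the "any valid json" note: the envelope is fixed, while the content varies per incident type and version.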
Omnia Chronos
In Chronos you define streams that collect data, convert it and persist it as a stream of Observable[Incident].
[Diagram: Chronos hosting several streams (Stream 1, Stream 2, Stream 3)]
Omnia Chronos
• Each stream is an actor which supervises its children:
  – Adapter Actor
  – Converter Actor
  – Persistence Manager Actor
• Stream actors are referentially transparent through the use of Akka Cluster. We have extended Akka Cluster to migrate the stream actors based on resource KPIs.
• Data is persisted in Kafka for durability.
• Chronos is built on top of Akka, ScalaRx and the Play framework; a migration to Akka Streams is planned.
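The adapter, converter and persistence roles listed above can be pictured as a three-step pipeline. The sketch below is plain Python rather than Akka, so it shows only the data flow, not real actor supervision or clustering. The `Stream` class, the `to_incident` converter, the comma-separated input format and the in-memory `log` (standing in for a Kafka topic) are all assumptions.

```python
from typing import Callable, Dict, Iterable, List


class Stream:
    """Plain-Python stand-in for a stream actor and its three children."""

    def __init__(self, name: str,
                 adapter: Callable[[], Iterable[str]],
                 converter: Callable[[str], Dict],
                 persist: Callable[[Dict], None]) -> None:
        self.name = name
        self.adapter = adapter      # reads raw records from a source
        self.converter = converter  # turns a raw record into an incident
        self.persist = persist      # stores the incident durably
        self.failures = 0

    def run(self) -> List[Dict]:
        """Push every raw record through the pipeline and return the incidents."""
        incidents = []
        for raw in self.adapter():
            try:
                incident = self.converter(raw)
                self.persist(incident)
            except ValueError:
                # Simplified supervision: count the failure and carry on.
                self.failures += 1
                continue
            incidents.append(incident)
        return incidents


log: List[Dict] = []  # in-memory list standing in for a Kafka topic


def to_incident(raw: str) -> Dict:
    """Hypothetical converter for 'customer,stake' records."""
    customer, stake = raw.split(",")  # raises ValueError on malformed input
    return {"type": "bet", "source": "demo",
            "payload": {"customer": customer, "stake": float(stake)}}


stream = Stream("bets", lambda: ["alice,5", "broken", "bob,2.5"],
                to_incident, log.append)
incidents = stream.run()
```

A malformed record is counted and skipped so that one bad input does not stop the stream, which is a much reduced version of what a supervising parent actor would decide.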
Fates represents the long-term memory of Omnia. It is in charge of organising all the incidents recorded by Chronos into timelines, and of creating new information as views by using machine learning, logical reasoning and time series analysis.
• A timeline represents the history: the sequence of incidents performed by a specific entity over time. Timelines are organised by category. One example is the customer timeline, which might contain all the bets placed, deposit and withdrawal activities, tweets, etc. performed by a specific customer. A timeline category is not limited to customers; it can be anything, for example a sport event (football match, competition).
• Views are the result of job tasks that process data from:
  – Timelines
  – Other Views
Omnia Fates
Timelines are created from timeline streams: each timeline stream reads data from a Chronos stream and feeds the right timeline.
Omnia Fates
[Diagram: timeline streams reading from Chronos and feeding Fates]
• Fates persists timelines of incidents.
• Column family name: <TimelineCategory>_tl
• Key definition: ( (entityId, date), timestamp )
• The partition key is a strong hash key, which gives a well-balanced Cassandra cluster.
• Composite key: incidents are ordered by timestamp under a specific entity within a day (date = yyyy-MM-dd).
Omnia Fates - Cassandra
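To make the key definition concrete, the sketch below emulates it in memory: rows are grouped by the partition key (entityId, date) and kept ordered by timestamp inside each partition. It is an illustration under assumptions, not the Fates schema: the `Timeline` class, the CQL column names in the comment and the sample incidents are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# A CQL table with the same key shape (column names are assumptions):
#   CREATE TABLE customer_tl (
#     entity_id text, date text, ts timestamp, incident text,
#     PRIMARY KEY ((entity_id, date), ts)
#   );

PartitionKey = Tuple[str, str]  # (entityId, yyyy-MM-dd)


def partition_key(entity_id: str, ts: datetime) -> PartitionKey:
    """One partition per entity per day."""
    return (entity_id, ts.strftime("%Y-%m-%d"))


class Timeline:
    """In-memory stand-in for a <TimelineCategory>_tl column family."""

    def __init__(self) -> None:
        self.partitions: Dict[PartitionKey, List[Tuple[datetime, str]]] = defaultdict(list)

    def append(self, entity_id: str, ts: datetime, incident: str) -> None:
        rows = self.partitions[partition_key(entity_id, ts)]
        rows.append((ts, incident))
        rows.sort(key=lambda row: row[0])  # clustering order: by timestamp

    def day(self, entity_id: str, date: str) -> List[str]:
        """Incidents of one entity on one day, oldest first."""
        return [incident for _, incident in self.partitions.get((entity_id, date), [])]


customer_tl = Timeline()
customer_tl.append("cust-1", datetime(2015, 6, 3, 9, 30), "withdraw")
customer_tl.append("cust-1", datetime(2015, 6, 3, 8, 0), "bet")
customer_tl.append("cust-1", datetime(2015, 6, 4, 10, 0), "deposit")
```

Including the date in the partition key bounds the size of each partition to one day of activity per entity, at the cost of one query per day when reading a longer history.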
Omnia Fates
• We build views with jobs able to perform:
  – Logical reasoning: deduction, induction, abduction
  – Timeline analysis: trends, cycles, seasonality
  – Other ML: classification, clustering, predictions
• Jobs are performed on top of NeoCortex.
Omnia NeoCortex
• NeoCortex is a library developed on top of Apache Spark to give developers an easy way to write microservices on top of Omnia.
• In NeoCortex we use the distributed nature of Spark to perform fast, real-time data processing, and we hide from the developer the problems of connecting to the source system (Chronos) and to the publishing layer.
• Typeclass definitions for: Timeline, View, ChronosStream, etc.
• Typeclass definitions for algebraic structures:
  – Monoids, Rings, Groups, providing advanced functions for moving averages, ARX, ARMA, etc.
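As an illustration of why an algebraic structure helps here, an average can be modelled as a monoid over (sum, count) pairs: partial results computed anywhere can be combined in any grouping. The sketch below is a Python analogy, not the NeoCortex typeclass API, which the slides do not show; `Avg`, `EMPTY` and `moving_average` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass(frozen=True)
class Avg:
    """Running average as a monoid element: (sum, count)."""
    total: float = 0.0
    count: int = 0

    def combine(self, other: "Avg") -> "Avg":
        """Associative operation: add sums and counts component-wise."""
        return Avg(self.total + other.total, self.count + other.count)

    @property
    def value(self) -> float:
        return self.total / self.count if self.count else 0.0


EMPTY = Avg()  # identity element: combining with it changes nothing


def moving_average(values: List[float], window: int) -> List[float]:
    """Simple moving average: fold each full window with the monoid."""
    out = []
    for end in range(window, len(values) + 1):
        chunk = (Avg(v, 1) for v in values[end - window:end])
        out.append(reduce(Avg.combine, chunk, EMPTY).value)
    return out


averages = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```

Because `combine` is associative and has an identity, the same fold can be split across workers and merged afterwards, which is the property a distributed engine relies on.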
Omnia NeoCortex - Parallelism
[Diagram: a Chronos stream processed by a Spark driver and Executors 1 to 4 in two stages, Stage 1 (map) and Stage 2 (reduceByKey), alongside Fates timelines and views and Hermes (Serving Layer)]
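The two stages named in the diagram can be imitated without a cluster. The sketch below uses plain Python in place of Spark, so nothing runs in parallel; it only shows what a map stage followed by a reduceByKey stage computes. The `reduce_by_key` helper and the sample incidents are assumptions, not part of the talk.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple


def reduce_by_key(pairs: Iterable[Tuple[str, float]],
                  fn: Callable[[float, float], float]) -> Dict[str, float]:
    """Group values by key, then fold each group with fn."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}


# Hypothetical incidents, already decoded from a stream.
incidents = [
    {"customer": "alice", "stake": 5.0},
    {"customer": "bob", "stake": 2.5},
    {"customer": "alice", "stake": 1.0},
]

# Stage 1 (map): turn each incident into a (key, value) pair.
pairs = [(i["customer"], i["stake"]) for i in incidents]

# Stage 2 (reduceByKey): total stake per customer.
totals = reduce_by_key(pairs, lambda a, b: a + b)
```

In Spark the second stage requires a shuffle so that all values for a key reach the same executor, which is why the diagram draws it as a separate stage.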
Hermes is the layer in which data is presented for consumption, both B2B and B2C. At its foundation, microservices, notifications and data as an API are key aspects of the design.
• Scalable and simple full-duplex communication for the web
• Expresses the correlation between the entities of the model
• Inspired by Falcor (Netflix) and GraphQL (Facebook)
Hermes
[Diagram: a Hermes Node (Local Cache, Subscription Manager, Client Manager, Authentication Handler, Dispatcher) backed by the Hermes Distributed Cache and reached over HTTP, WS and TCP by the browser (Hermes JS) and WH Apps]
Use Omnia on Omnia
[Diagram: Mesos, Marathon and Docker (Application Repository) running Omnia Apps in Docker containers; JMX metrics from the apps feed a Health Stream into Chronos, NeoCortex (Speed Layer) and Fates (Batch Layer)]