Presented by Andrew Yu
Introduction
Huawei European Research Center in Munich, Germany
Problem
■ High Density Writes (100,000 events/s)
■ High Complexity Reads (100 ad-hoc queries/s)
■ High Freshness requirement (ad-hoc queries must see writes that are less than 1 s old)
Old Solutions: OLTP & OLAP
Scales Poorly!
Old Solutions: Apache
■ Writes
– Storm
■ Reads
– Hadoop
– Spark
■ Real-time Analytics
– Druid
Not Fast Enough!
Crappy + Slow
Analytics in Motion (AIM System)
[Diagram: separate Writes, Reads, and Stores components]
AIM System: Goals
■ Scale separately
■ Scale seamlessly
■ Scale with performance
Use case (the Huawei Marketing case)
■ As a telecommunications provider
■ Gather cell-phone usage data from customers
■ Transform raw data to marketing-related attributes
■ Return real-time advertisements, promotions, and abuse warnings
■ Goal: Make a highly customized AIM System that caters to Huawei’s needs
Definitions/Eventflow
■ Event Stream Processes (ESPs): ingest raw write data from customers
■ Business Rules (BRs): simple rules/triggers derived from ESPs for high-priority responses
■ Analytics Matrices (AMs): collection of marketing-related attributes that must be calculated from raw data
■ Real-Time Analytics (RTAs): complex BI queries served on the read path
Event Stream Processes (ESPs)
■ Get raw data -> update AMs
■ Only update attributes as necessary
■ (format AMs to be receptive to atomic updates)
■ e.g. call time, call location, caller, receiver, call duration, call cost
Business Rules (BRs)
■ Rules must be simple
■ Evaluations must be fast
■ Optimize evaluation algorithms for fast fail/fast success
■ e.g. unusual call location/receiver -> send customer a warning about stolen device
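The fast-fail idea above can be pictured with a minimal Python sketch. The rule, predicates, and field names are invented for illustration (the real BRs are evaluated by optimized low-level code): a rule is a conjunction of cheap checks, and short-circuit evaluation lets most events fail on the first predicate.

```python
# Illustrative business rule: "unusual call location AND unusual receiver
# -> warn the customer about a possibly stolen device".
# Predicates and event fields are invented for this sketch.

def unusual_location(event):
    return event["cell"] not in event["home_cells"]

def unusual_receiver(event):
    return event["callee"] not in event["frequent_callees"]

# Order the cheapest / most selective predicate first for fast fail.
stolen_device_rule = [unusual_location, unusual_receiver]

def fire(rule, event):
    # all() short-circuits: evaluation stops at the first failing predicate.
    return all(pred(event) for pred in rule)

suspicious = {"cell": "X9", "home_cells": {"A1", "B2"},
              "callee": "555-0000", "frequent_callees": set()}
print(fire(stolen_device_rule, suspicious))  # True -> send warning
```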
Analytics Matrices (AMs)
■ Marketing-related base attributes
■ Lots of aggregates
■ Building blocks for RTAs
■ ~80M rows, ~2K columns
■ e.g. call density per unit time
Real-Time Analytics (RTAs)
■ Typical BI questions that Huawei might ask
■ Ad-hoc
■ e.g. call-density at given times in a given location (roll-ups and drill-downs)
INTEGRATING WRITES AND READS
Separate the Processes
■ Copy-on-Write
– Use UNIX fork() paradigm: RTA queries run on forked snapshots of the AM memory state, isolated from concurrent ESP updates
■ Differential Updates
– Delta “copy” receives the ESP updates
– Main “copy” serves RTAs, and is periodically brought up to date by merging in the delta copies
– A merge can be forced immediately if an RTA is high-priority
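The copy-on-write approach can be illustrated with a tiny Python sketch using os.fork(): the forked child (standing in for an RTA query) keeps seeing the state as it was at fork time, while the parent (standing in for the ESP) continues to apply updates. The state and numbers here are invented; the real system snapshots far larger AM state.

```python
import os

state = {"calls": 10}  # stand-in for AM state maintained by the ESP

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child ("RTA"): sees a copy-on-write snapshot taken at fork time.
    os.close(r)
    os.write(w, str(state["calls"]).encode())
    os._exit(0)
else:
    # Parent ("ESP"): keeps writing; the child's snapshot is unaffected.
    os.close(w)
    state["calls"] += 5
    snapshot = int(os.read(r, 64))
    os.waitpid(pid, 0)
    print(snapshot, state["calls"])  # prints: 10 15
```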
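The delta/main split can be sketched as follows (function names, the merge policy, and the data are illustrative, not the system's actual interface): ESP writes land in a small delta, RTA reads scan the main copy, and a merge step folds the delta into main, either periodically or on demand for a high-priority query.

```python
main = {}    # main "copy": read by RTA queries
delta = {}   # delta "copy": receives recent ESP updates

def esp_update(key, amount):
    delta[key] = delta.get(key, 0) + amount

def merge():
    # Periodically fold the delta into the main copy.
    for key, amount in delta.items():
        main[key] = main.get(key, 0) + amount
    delta.clear()

def rta_read(key, fresh=False):
    if fresh:  # a high-priority RTA can force an immediate merge
        merge()
    return main.get(key, 0)

esp_update("alice", 3)
esp_update("alice", 2)
print(rta_read("alice"))              # stale: delta not merged yet -> 0
print(rta_read("alice", fresh=True))  # forced merge -> 5
```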
Threads
■ Allow multiple threads on same data
■ Partition data; each thread gets one partition
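A minimal sketch of the partition-per-thread idea (partition count, hashing scheme, and data are invented): because each worker owns its partition exclusively, updates need no locking.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4
partitions = [dict() for _ in range(NUM_PARTITIONS)]  # one dict per thread

def partition_of(key):
    return hash(key) % NUM_PARTITIONS

def apply_updates(pid, updates):
    # Only this thread ever touches partitions[pid] -> no locks needed.
    part = partitions[pid]
    for key, amount in updates:
        part[key] = part.get(key, 0) + amount

# Route each event to its partition's bucket, then run one worker per partition.
events = [(f"user{i}", 1) for i in range(100)]
buckets = [[] for _ in range(NUM_PARTITIONS)]
for key, amount in events:
    buckets[partition_of(key)].append((key, amount))

with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as ex:
    for pid in range(NUM_PARTITIONS):
        ex.submit(apply_updates, pid, buckets[pid])

print(sum(sum(p.values()) for p in partitions))  # 100
```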
Architectural Layering
■ ESP, Storage, and RTA nodes can each be added as necessary
Data Placement & Join Processing
■ Multiple Storage Nodes = Multiple AM Fragments
■ Each node must have BRs and dimension tables (for joins and sorts)
■ But O.K. for the use case since they are relatively small
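Replicating the small dimension tables to every storage node means each node can join its AM fragment locally, with no data shuffled between nodes. A minimal sketch, with invented table contents:

```python
# Dimension table: small, so it is replicated on every storage node.
dimension_region = {"A1": "North", "B2": "South"}

# This node's AM fragment: (subscriber, cell, call_count).
fragment = [
    ("alice", "A1", 5),
    ("bob", "B2", 3),
]

def local_join():
    # Join the local fragment against the replicated dimension table;
    # no cross-node communication is needed.
    for subscriber, cell, calls in fragment:
        yield subscriber, dimension_region[cell], calls

print(list(local_join()))
```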
AIM SYSTEM IMPLEMENTATION
For Huawei Use Case
Use Case Personalization
■ Each raw-data record has a primary key
■ Small BRs and dimension tables
RTA Nodes
■ Lightweight
■ Only send queries to Storage Nodes
■ Asynchronous (send answers ASAP)
ESP Nodes
■ Heavyweight
■ Lots of writes (100,000/s)
■ BR processing
■ Synchronous (Threaded processes = sensitive to OS clock)
ESP and Storage Nodes on Same Machine
■ Advantage: Share memory
■ Semi-built AMs are too big for network transfer
AM Updates
■ Custom-built Kernels
■ Native data structures for AMs
■ No need for a higher-level language like C++
■ Run ESPs directly on memory:
– No conditional statements
– Aggregation functions are stored in kernel, not program
– All instructions are sequential
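One way to picture the kernel idea is a fixed, precompiled sequence of per-attribute update functions applied one after another, with no per-event branching in the update path. The attributes and functions below are invented for illustration; the actual kernels are low-level generated code, not Python.

```python
# Each "instruction" updates one AM attribute; the kernel is the fixed
# sequence of instructions, stored once rather than re-decided per event.
def add_duration(row, event): row["total_duration"] += event["duration"]
def add_cost(row, event):     row["total_cost"] += event["cost"]
def bump_calls(row, event):   row["call_count"] += 1

kernel = (add_duration, add_cost, bump_calls)  # sequential, branch-free

row = {"total_duration": 0, "total_cost": 0.0, "call_count": 0}
event = {"duration": 60, "cost": 0.25}
for fn in kernel:   # all instructions run in order; no conditionals
    fn(row, event)
print(row)
```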
BR Evaluation
■ 300 BRs
■ Huawei ruleset is small
■ No need for indexing
ColumnMap
■ AM data structure design
■ Designed to fit the CPU cache, not memory pages
■ Whole AMs stored in MEMORY (100s of GBs)!
■ 10MB L3 cache size
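A minimal sketch of the bucketed, column-major idea behind ColumnMap (bucket size, types, and class interface are illustrative; the real structure is a compact in-memory layout tuned to the 10 MB L3 cache): rows are grouped into fixed-size buckets and stored column-wise within each bucket, so scanning one attribute touches contiguous memory.

```python
import array

BUCKET_SIZE = 1024  # rows per bucket; sized so one column chunk stays cache-resident

class ColumnMap:
    def __init__(self, num_columns):
        self.num_columns = num_columns
        self.buckets = []  # each bucket holds one contiguous array per column
        self.count = 0

    def append(self, row):
        if self.count % BUCKET_SIZE == 0:  # start a fresh bucket when full
            self.buckets.append([array.array("q") for _ in range(self.num_columns)])
        bucket = self.buckets[-1]
        for col, value in enumerate(row):
            bucket[col].append(value)
        self.count += 1

    def scan(self, col):
        # Sequential pass over one attribute: contiguous within each bucket.
        for bucket in self.buckets:
            yield from bucket[col]

cm = ColumnMap(2)
for i in range(3000):
    cm.append((i, i * 2))
print(sum(cm.scan(1)))  # 8997000
```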
Results & Conclusions
Does it scale well?