Presented by Andrew Yu
Introduction
Huawei European Research Center in Munich, Germany
Problem
■ High Density Writes (100,000 events/s)
■ High Complexity Reads (100 ad-hoc queries/s)
■ High Freshness requirement (ad-hoc queries must see writes that are less than 1 s old)
Old Solutions: OLTP & OLAP
Scales Poorly!
Old Solutions: Apache
■ Writes
– Storm
■ Reads
– Hadoop
– Spark
■ Real-time Analytics
– Druid
Not Fast Enough!
Crappy + Slow
Analytics in Motion (AIM System)
[Diagram: separate Writes, Reads, and Stores components]
AIM System: Goals
■ Scale separately
■ Scale seamlessly
■ Scale with performance
Use case (the Huawei Marketing case)
■ As a telecommunications provider
■ Gather cell-phone usage data from customers
■ Transform raw data to marketing-related attributes
■ Return real-time advertisements, promotions, and abuse warnings
■ Goal: Make a highly customized AIM System that caters to Huawei’s needs
Definitions/Eventflow
■ Event Stream Processes (ESPs): ingest raw write data from customers
■ Business Rules (BRs): simple rules/triggers derived from ESPs for high-priority responses
■ Analytics Matrices (AMs): collection of marketing-related attributes that must be calculated from raw data
■ Real-Time Analytics (RTAs): complex BI queries served on the read path
Event Stream Processes (ESPs)
■ Get raw data -> update AMs
■ Only update attributes as necessary
■ (format AMs to be receptive to atomic updates)
■ e.g. call time, call location, caller, receiver, call duration, call cost
Business Rules (BRs)
■ Rules must be simple
■ Evaluations must be fast
■ Optimize evaluation algorithms for fast fail/fast success
■ e.g. unusual call location/receiver -> send customer a warning about stolen device
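The fast-fail idea above can be pictured with a minimal Python sketch. The rule, predicates, and field names are invented for illustration (the real BRs are evaluated by optimized low-level code): a rule is a conjunction of cheap checks, and short-circuit evaluation lets most events fail on the first predicate.

```python
# Illustrative business rule: "unusual call location AND unusual receiver
# -> warn the customer about a possibly stolen device".
# Predicates and event fields are invented for this sketch.

def unusual_location(event):
    return event["cell"] not in event["home_cells"]

def unusual_receiver(event):
    return event["callee"] not in event["frequent_callees"]

# Order the cheapest / most selective predicate first for fast fail.
stolen_device_rule = [unusual_location, unusual_receiver]

def fire(rule, event):
    # all() short-circuits: evaluation stops at the first failing predicate.
    return all(pred(event) for pred in rule)

suspicious = {"cell": "X9", "home_cells": {"A1", "B2"},
              "callee": "555-0000", "frequent_callees": set()}
print(fire(stolen_device_rule, suspicious))  # True -> send warning
```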
Analytics Matrices (AMs)
■ Marketing-related base attributes
■ Lots of aggregates
■ Building blocks for RTAs
■ ~80M rows, ~2K columns
■ e.g. call density per unit time
Real-Time Analytics (RTAs)
■ Typical BI questions that Huawei might ask
■ Ad-hoc
■ e.g. call-density at given times in a given location (roll-ups and drill-downs)
INTEGRATING WRITES AND READS
Separate the Processes
■ Copy-on-Write
– Use UNIX fork() paradigm: RTA queries run on forked snapshots of the AM memory state, isolated from concurrent ESP updates
■ Differential Updates
– Delta “copy” receives the ESP updates
– Main “copy” serves RTAs, and is periodically brought up to date by merging in the delta copies
– A merge can be forced immediately if an RTA is high-priority
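The copy-on-write approach can be illustrated with a tiny Python sketch using os.fork(): the forked child (standing in for an RTA query) keeps seeing the state as it was at fork time, while the parent (standing in for the ESP) continues to apply updates. The state and numbers here are invented; the real system snapshots far larger AM state.

```python
import os

state = {"calls": 10}  # stand-in for AM state maintained by the ESP

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child ("RTA"): sees a copy-on-write snapshot taken at fork time.
    os.close(r)
    os.write(w, str(state["calls"]).encode())
    os._exit(0)
else:
    # Parent ("ESP"): keeps writing; the child's snapshot is unaffected.
    os.close(w)
    state["calls"] += 5
    snapshot = int(os.read(r, 64))
    os.waitpid(pid, 0)
    print(snapshot, state["calls"])  # prints: 10 15
```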
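The delta/main split can be sketched as follows (function names, the merge policy, and the data are illustrative, not the system's actual interface): ESP writes land in a small delta, RTA reads scan the main copy, and a merge step folds the delta into main, either periodically or on demand for a high-priority query.

```python
main = {}    # main "copy": read by RTA queries
delta = {}   # delta "copy": receives recent ESP updates

def esp_update(key, amount):
    delta[key] = delta.get(key, 0) + amount

def merge():
    # Periodically fold the delta into the main copy.
    for key, amount in delta.items():
        main[key] = main.get(key, 0) + amount
    delta.clear()

def rta_read(key, fresh=False):
    if fresh:  # a high-priority RTA can force an immediate merge
        merge()
    return main.get(key, 0)

esp_update("alice", 3)
esp_update("alice", 2)
print(rta_read("alice"))              # stale: delta not merged yet -> 0
print(rta_read("alice", fresh=True))  # forced merge -> 5
```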
Threads
■ Allow multiple threads on same data
■ Partition data; each thread gets one partition
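A minimal sketch of the partition-per-thread idea (partition count, hashing scheme, and data are invented): because each worker owns its partition exclusively, updates need no locking.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4
partitions = [dict() for _ in range(NUM_PARTITIONS)]  # one dict per thread

def partition_of(key):
    return hash(key) % NUM_PARTITIONS

def apply_updates(pid, updates):
    # Only this thread ever touches partitions[pid] -> no locks needed.
    part = partitions[pid]
    for key, amount in updates:
        part[key] = part.get(key, 0) + amount

# Route each event to its partition's bucket, then run one worker per partition.
events = [(f"user{i}", 1) for i in range(100)]
buckets = [[] for _ in range(NUM_PARTITIONS)]
for key, amount in events:
    buckets[partition_of(key)].append((key, amount))

with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as ex:
    for pid in range(NUM_PARTITIONS):
        ex.submit(apply_updates, pid, buckets[pid])

print(sum(sum(p.values()) for p in partitions))  # 100
```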
Architectural Layering
■ ESP, Storage, and RTA nodes can each be added as necessary
Data Placement & Join Processing
■ Multiple Storage Nodes = Multiple AM Fragments
■ Each node must have BRs and dimension tables (for joins and sorts)
■ But O.K. for the use case since they are relatively small
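Replicating the small dimension tables to every storage node means each node can join its AM fragment locally, with no data shuffled between nodes. A minimal sketch, with invented table contents:

```python
# Dimension table: small, so it is replicated on every storage node.
dimension_region = {"A1": "North", "B2": "South"}

# This node's AM fragment: (subscriber, cell, call_count).
fragment = [
    ("alice", "A1", 5),
    ("bob", "B2", 3),
]

def local_join():
    # Join the local fragment against the replicated dimension table;
    # no cross-node communication is needed.
    for subscriber, cell, calls in fragment:
        yield subscriber, dimension_region[cell], calls

print(list(local_join()))
```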
AIM SYSTEM IMPLEMENTATION
For Huawei Use Case
Use Case Personalization
■ Each raw-data record has a primary key
■ Small BRs and dimension tables
RTA Nodes
■ Lightweight
■ Only send queries to Storage Nodes
■ Asynchronous (send answers ASAP)
ESP Nodes
■ Heavyweight
■ Lots of writes (100,000/s)
■ BR processing
■ Synchronous (Threaded processes = sensitive to OS clock)
ESP and Storage Nodes on Same Machine
■ Advantage: Share memory
■ Semi-built AMs are too big for network transfer
AM Updates
■ Custom-built Kernels
■ Native data structures for AMs
■ No need for a higher-level language like C++
■ Run ESPs directly on memory:
– No conditional statements
– Aggregation functions are stored in kernel, not program
– All instructions are sequential
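One way to picture the kernel idea is a fixed, precompiled sequence of per-attribute update functions applied one after another, with no per-event branching in the update path. The attributes and functions below are invented for illustration; the actual kernels are low-level generated code, not Python.

```python
# Each "instruction" updates one AM attribute; the kernel is the fixed
# sequence of instructions, stored once rather than re-decided per event.
def add_duration(row, event): row["total_duration"] += event["duration"]
def add_cost(row, event):     row["total_cost"] += event["cost"]
def bump_calls(row, event):   row["call_count"] += 1

kernel = (add_duration, add_cost, bump_calls)  # sequential, branch-free

row = {"total_duration": 0, "total_cost": 0.0, "call_count": 0}
event = {"duration": 60, "cost": 0.25}
for fn in kernel:   # all instructions run in order; no conditionals
    fn(row, event)
print(row)
```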
BR Evaluation
■ 300 BRs
■ Huawei ruleset is small
■ No need for indexing
ColumnMap
■ AM data structure design
■ Designed to fit the CPU cache, not memory pages
■ Whole AMs stored in MEMORY (100s of GBs)!
■ 10MB L3 cache size
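A minimal sketch of the bucketed, column-major idea behind ColumnMap (bucket size, types, and class interface are illustrative; the real structure is a compact in-memory layout tuned to the 10 MB L3 cache): rows are grouped into fixed-size buckets and stored column-wise within each bucket, so scanning one attribute touches contiguous memory.

```python
import array

BUCKET_SIZE = 1024  # rows per bucket; sized so one column chunk stays cache-resident

class ColumnMap:
    def __init__(self, num_columns):
        self.num_columns = num_columns
        self.buckets = []  # each bucket holds one contiguous array per column
        self.count = 0

    def append(self, row):
        if self.count % BUCKET_SIZE == 0:  # start a fresh bucket when full
            self.buckets.append([array.array("q") for _ in range(self.num_columns)])
        bucket = self.buckets[-1]
        for col, value in enumerate(row):
            bucket[col].append(value)
        self.count += 1

    def scan(self, col):
        # Sequential pass over one attribute: contiguous within each bucket.
        for bucket in self.buckets:
            yield from bucket[col]

cm = ColumnMap(2)
for i in range(3000):
    cm.append((i, i * 2))
print(sum(cm.scan(1)))  # 8997000
```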
Results & Conclusions
Does it scale well?