Date post: | 09-May-2015 |
Category: | Technology |
Upload: | cosmin-lehene |
© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Cosmin Lehene | Adobe
#bigdataro - 30 January 2013
Real-time “OLAP” for Big Data (+ use cases)
What we needed … and built
OLAP Semantics
Low Latency Ingestion
High Throughput
Real-time Query API
“Physical” Building Blocks
Logical Building Blocks
Dimensions, Metrics
Aggregations
Roll-up, drill-down, slicing and dicing, sorting
OLAP 101 – Queries example
Date        Country  City     OS       Browser  Sale
2012-05-21  USA      NY       Windows  FF        0.0
2012-05-21  USA      NY       Windows  FF       10.0
2012-05-22  USA      SF       OSX      Chrome   25.0
2012-05-22  Canada   Ontario  Linux    Chrome    0.0
2012-05-23  USA      Chicago  OSX      Safari   15.0

5 visits, 3 days
2 countries: USA: 4, Canada: 1
4 cities: NY: 2, SF: 1
3 OSes: Win: 2, OSX: 2
3 browsers: FF: 2, Chrome: 2
Sales: 50.0 total, 3 sales
OLAP 101 – Queries example
Rolling up to country level:
SELECT country, COUNT(*) AS visits, SUM(sale) AS sales
FROM visits
GROUP BY country

Country  visits  sales
USA      4       $50
Canada   1       $0

“Slice” by browser:
SELECT country, COUNT(*) AS visits, SUM(sale) AS sales
FROM visits
WHERE browser = 'FF'
GROUP BY country

Country  visits  sales
USA      2       $10
Canada   0       $0
(Canada has no FF visits; it is shown here as a zero row.)

Top browsers by sales:
SELECT browser, SUM(sale) AS sales, COUNT(*) AS visits
FROM visits
GROUP BY browser
ORDER BY sales DESC

Browser  sales  visits
Chrome   $25    2
Safari   $15    1
FF       $10    2
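The three example queries can be tried against the five sample visits. The following is a minimal sketch using Python's built-in sqlite3 module. The table name `visits`, the column names, and the function names are assumptions made for illustration; the slides do not name a table, and this is not how SaasBase executes queries.

```python
import sqlite3

# The five sample visits from the "OLAP 101" slide.
ROWS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]


def load_visits():
    """Create an in-memory table (hypothetical schema) holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits "
        "(day TEXT, country TEXT, city TEXT, os TEXT, browser TEXT, sale REAL)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def rollup_by_country(conn):
    """Roll up to country level: visits and sales per country."""
    return conn.execute(
        "SELECT country, COUNT(*) AS visits, SUM(sale) AS sales "
        "FROM visits GROUP BY country ORDER BY visits DESC"
    ).fetchall()


def slice_by_browser(conn, browser):
    """Slice: keep one browser, then roll up to country level."""
    return conn.execute(
        "SELECT country, COUNT(*) AS visits, SUM(sale) AS sales "
        "FROM visits WHERE browser = ? GROUP BY country",
        (browser,),
    ).fetchall()


def top_browsers(conn):
    """Top browsers by sales, highest first."""
    return conn.execute(
        "SELECT browser, SUM(sale) AS sales, COUNT(*) AS visits "
        "FROM visits GROUP BY browser ORDER BY sales DESC"
    ).fetchall()


if __name__ == "__main__":
    conn = load_visits()
    print(rollup_by_country(conn))
    print(slice_by_browser(conn, "FF"))
    print(top_browsers(conn))
```

Note that a plain WHERE filter returns no row at all for Canada in the slice, whereas the slide lists Canada with zero visits and zero sales.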
OLAP – Runtime Aggregation vs. Pre-aggregation
Aggregate at runtime:
Most flexible
Fast – scatter gather
Space efficient
But: I/O and CPU intensive, slow for larger data, low throughput
Pre-aggregate:
Fast
Efficient – O(1)
High throughput
But: more effort to process (latency), combinatorial explosion (space), no flexibility
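The trade-off can be made concrete with a small sketch. Below, runtime aggregation scans every row per query, while pre-aggregation materializes one result table per subset of dimensions, which is where the combinatorial explosion comes from: n dimensions give 2^n group-by combinations. The data layout and all function names are assumptions for illustration only, not the SaasBase implementation.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("country", "os", "browser")

# Sample visits from the OLAP 101 slide (date and city omitted for brevity).
VISITS = [
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 0.0},
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 10.0},
    {"country": "USA", "os": "OSX", "browser": "Chrome", "sale": 25.0},
    {"country": "Canada", "os": "Linux", "browser": "Chrome", "sale": 0.0},
    {"country": "USA", "os": "OSX", "browser": "Safari", "sale": 15.0},
]


def aggregate_at_runtime(rows, group_by):
    """Scan every row at query time: any grouping works, cost grows with data."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key][0] += 1            # COUNT(visits)
        totals[key][1] += row["sale"]  # SUM(sales)
    return {key: tuple(cell) for key, cell in totals.items()}


def pre_aggregate(rows, dimensions):
    """Materialize one table per subset of dimensions (2**n subsets in total)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, size):
            cube[group_by] = aggregate_at_runtime(rows, group_by)
    return cube


def query(cube, group_by, key):
    """Answer from the precomputed cube with dictionary lookups only."""
    return cube[group_by].get(key, (0, 0.0))


if __name__ == "__main__":
    cube = pre_aggregate(VISITS, DIMENSIONS)
    print(len(cube), "group-by combinations for", len(DIMENSIONS), "dimensions")
    print(query(cube, ("country",), ("USA",)))
```

The lookup is constant time, but only for groupings that were materialized in advance, which mirrors the "no flexibility" point on the slide.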
SaasBase Map
SaasBase Domain Model Mapping
SaasBase - Domain Model Mapping
SaasBase - Ingestion, Processing, Indexing, Querying
SaasBase - Ingestion, Processing, Indexing, Querying
Ingestion
Ingestion (ETL): throughput vs. latency
Historical data (large batches): optimize for throughput
Increments (latest data, smaller): optimize for latency
Processing
Processing
Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality), and generating cubes that can be queried in real time
“Super Processor” code running in Storm, Map-Reduce, HBase
Processing for OLAP semantics
GROUP BY (process, query)
COUNT, SUM, AVG, etc. (process, query)
SORT (process, query)
HAVING (mostly query, can define pre-process constraints)
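One way to read the "(process, query)" split above: processing stores mergeable partial aggregates per group, and the query side rolls them up further, derives AVG from SUM and COUNT, applies HAVING, and sorts. The sketch below illustrates that idea under stated assumptions; the cube layout (keyed by day and browser) and the function names are hypothetical and not taken from SaasBase.

```python
from collections import defaultdict

# (day, browser, sale) events taken from the sample visits table.
EVENTS = [
    ("2012-05-21", "FF", 0.0),
    ("2012-05-21", "FF", 10.0),
    ("2012-05-22", "Chrome", 25.0),
    ("2012-05-22", "Chrome", 0.0),
    ("2012-05-23", "Safari", 15.0),
]


def process(events):
    """Process time: GROUP BY (day, browser), keeping COUNT and SUM per cell."""
    cube = defaultdict(lambda: [0, 0.0])
    for day, browser, sale in events:
        cell = cube[(day, browser)]
        cell[0] += 1
        cell[1] += sale
    return dict(cube)


def query(cube, min_visits=0):
    """Query time: roll up days, derive AVG, apply HAVING, then SORT."""
    rolled = defaultdict(lambda: [0, 0.0])
    for (_day, browser), (count, total) in cube.items():
        rolled[browser][0] += count   # counts merge by addition
        rolled[browser][1] += total   # sums merge by addition
    rows = [
        (browser, count, total, total / count)  # AVG = SUM / COUNT
        for browser, (count, total) in rolled.items()
        if count >= min_visits                  # HAVING COUNT(*) >= min_visits
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)  # ORDER BY SUM DESC


if __name__ == "__main__":
    cube = process(EVENTS)
    for row in query(cube, min_visits=2):
        print(row)
```

AVG is not stored directly because averages of averages do not merge correctly, whereas COUNT and SUM can be added across cells and divided at the end.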
SaasBase vs. SQL Views Comparison
Query Engine
Always reads indexed, compact data
Query parsing
Scan strategy
Single vs. multiple scans
Start/stop rows (prefixes, index positions, etc.)
Index selection (volatile indexes with incremental processing)
Deserialization
Post-aggregation, sorting, fuzzy-sorting etc.
Paging
Custom dimension/metric class loading
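The "start/stop rows (prefixes ...)" item refers to bounding a scan over sorted row keys, as HBase scans do with an inclusive start row and an exclusive stop row. The sketch below shows how a prefix can be turned into such a range, using a plain sorted Python list in place of a table. The key layout (`dimension|value|day`) and all names are assumptions for illustration; this is not the HBase client API or the SaasBase index format.

```python
from bisect import bisect_left


def prefix_stop_row(prefix: bytes) -> bytes:
    """Smallest key greater than every key starting with prefix.

    Returns b"" (meaning "no upper bound") when the prefix is empty or all 0xFF.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def scan(sorted_rows, start_row: bytes, stop_row: bytes):
    """Return rows with start_row <= key < stop_row from a key-sorted list."""
    keys = [key for key, _ in sorted_rows]
    lo = bisect_left(keys, start_row)
    hi = bisect_left(keys, stop_row) if stop_row else len(keys)
    return sorted_rows[lo:hi]


def scan_prefix(sorted_rows, prefix: bytes):
    """Single scan covering exactly the keys that start with prefix."""
    return scan(sorted_rows, prefix, prefix_stop_row(prefix))


# Hypothetical pre-aggregated index: key -> (visits, sales).
INDEX = sorted([
    (b"country|USA|2012-05-21", (2, 10.0)),
    (b"country|USA|2012-05-22", (1, 25.0)),
    (b"country|USA|2012-05-23", (1, 15.0)),
    (b"country|Canada|2012-05-22", (1, 0.0)),
    (b"browser|FF|2012-05-21", (2, 10.0)),
])

if __name__ == "__main__":
    # All days for one country: a prefix scan.
    print(scan_prefix(INDEX, b"country|USA|"))
    # One day for one country: explicit start and stop rows.
    print(scan(INDEX, b"country|USA|2012-05-22", b"country|USA|2012-05-23"))
```

Because keys are sorted, each query touches only the contiguous slice it needs, which is consistent with the slide's point that the engine always reads indexed, compact data.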
Adobe Business Catalyst
Online business presence: e-commerce, marketing, web analytics etc.
Use case: Web Analytics (visitors, channels, content, e-commerce, campaigns, etc.)
BC - Workflow
Adobe Business Catalyst - Stats
3 active datacenters
Raw data ~6TB (from ~1TB 18 months ago)
Visits table: ~1TB each (compressed)
OLAP cubes (stats): 49GB – 64GB (compressed)
~30 minutes latency (from actual pageview/sale to chart in UI)
10s – 100s of milliseconds latency for queries
~3000/s max concurrent OLAP queries (actual traffic is much lower)
Adobe Pass for TV Everywhere
Authentication & Authorization
Single sign-on to Programmer content (e.g. Turner, NBC, Hulu, MTV) with Cable operator credentials (e.g. Comcast, Dish)
Adobe Pass – Use Case
Analytics use case: Operational metrics (users, devices, latencies, etc.)
Real-time ingestion in HBase
High Frequency Map Reduce jobs (every 2 minutes)
Adobe Pass - Stats (London Olympics 2012)
67M streams ~ 5.3M hours
1.5M concurrent streams
> 7M unique users
1 Technical & Engineering Emmy Award ;)
Adobe Primetime – Real-time Video Analytics
Unified video platform (acquisition, transcoding, broadcast, ads, analytics)
Adobe Primetime – Use Case
Use cases:
Audience metrics – minutes latency ok
Ads metrics – seconds to minutes ok
Streaming QoS metrics – seconds must
Requirements:
Massive throughput (millions of streams, multiple heartbeats every 10 seconds)
Low latency (end-to-end)
Conclusions
OLAP semantics on a simple data model
Data as first class citizen
Domain Specific “Language” for Dimensions, Metrics, Aggregations
Framework for vertical analytics systems
Tunable performance, resource allocation
Thank you!
Cosmin Lehene @clehene
http://hstack.org
Related
http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/
http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012