2015-12-03
Programmatic Bidding Data Streams & Druid
Charles Allen
We Are Hiring!
We’d love to connect! Our current open positions are: Engineering Director, UI Engineer and Distributed Systems Engineer.
We always have positions opening up, so feel free to connect with Sarah Carter (our Head of Recruiting) for future openings - [email protected].
What is Real-Time Bidding?
Real-Time Bidding is resolving advertising supply and demand at the moment of supply.
+ Best suited for systems with internet connectivity.
For the sake of this conversation, Real-Time Bidding (RTB) is the general method by which digital media supply and demand is commonly reconciled using programmatic methodologies over very short time frames.
What Happens in Real-Time Bidding?
1. User loads resources which contain ad space (supply is created by a Publisher).
2. Information / notification is generated and distributed to interested parties. The avail (a unit of supply of audience attention) is handled by an Exchange.
3. Information on the avail is distributed to potentially interested parties. We now have an auction.
4. Potentially interested parties judge the avail and either bid on the auction, or they do not.
5. The winner of the auction is determined by the exchange.
5b. 100 ms has passed. If a human can perceive that an auction took place YOU ARE TOO SLOW.
6. An attempt is made to serve the winning ad as an impression.
7. The impression hopefully turns into a click or conversion.
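The auction steps above can be sketched in a few lines of Python. This is only an illustration, not how any particular exchange works: the `Bid` fields, the simulated `latency_ms`, and the "highest price wins" rule are all assumptions made for the example. The only number taken from the slides is the 100 ms budget.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Bid:
    bidder: str        # hypothetical bidder name
    price: float       # hypothetical bid price
    latency_ms: float  # simulated time the bidder took to answer


# A bidder judges an avail and returns a Bid, or None if it does not bid.
Bidder = Callable[[dict], Optional[Bid]]


def run_auction(avail: dict, bidders: List[Bidder],
                deadline_ms: float = 100.0) -> Optional[Bid]:
    """Collect bids that arrive within the deadline and pick the highest one."""
    on_time = []
    for judge in bidders:
        bid = judge(avail)
        if bid is not None and bid.latency_ms <= deadline_ms:
            on_time.append(bid)
    # Assumption: the highest price wins. No bids means no winner.
    return max(on_time, key=lambda b: b.price, default=None)


avail = {"site": "example.com", "slot": "banner"}
bidders = [
    lambda a: Bid("bidder_a", 2.50, 40.0),
    lambda a: Bid("bidder_b", 3.10, 250.0),  # highest price, but too slow
    lambda a: None,                          # chooses not to bid
]
winner = run_auction(avail, bidders)
```

Here `bidder_b` offers more but misses the deadline, so the sketch awards the avail to `bidder_a`.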
Avail / Auction
Bid
Impression
Click / Conversion
???
Programmatic data is 100x larger than Wall Street
CERN - LHC
The LHC produces about 1 GB/s on average
http://home.cern/about/updates/2015/06/lhc-season-2-cern-computing-ready-data-torrent
MMX raw incoming stream data regularly exceeds this*
* 1hr average
General Architecture
+ Kafka
+ Samza / Kafka
+ Druid RT
+ Tranquility
+ Raw (S3)
+ Hadoop / Spark
+ Deep Storage (S3)
+ Druid Historical
+ UI / User
Druid for Queries!
.. But what is Druid?
+ Official - Druid is a fast column-oriented distributed data store
+ Me - Druid is a highly available Data Store designed for interactive, ad-hoc, OLAP style queries on time-series, denormalized data.
Key points for BEST use cases
+ Highly Available - No downtime for maintenance since 2011
+ Interactive - FAST
+ OLAP - Insightful
+ Ad-hoc - Dynamic
+ Time-series - Sequential
+ Denormalized - Flat
* By the way, it works on Streams (aka Real Real-Time)
Lifecycle of a Real-Time Datum
Mr. Charlie Event
+ Firehose: each Druid RT Peon (Peon 0, Peon 1)* reads events through its own Firehose
* Launched by Overlord by way of a Middle Manager
+ Parser: events from the Firehose are parsed into Druid RT 0's In Memory Write-Optimized Store
+ Rollup: events are rolled up inside Druid RT 0's In Memory Write-Optimized Store
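Rollup is easier to see with a toy example. The sketch below is not Druid's implementation; it only illustrates the general idea of collapsing events that share a time bucket and the same dimension values into one pre-aggregated row. The event fields (`publisher`, `bids`) and the 60-second bucket are made up for the example.

```python
from collections import defaultdict


def rollup(events, dimensions, metrics, granularity_s=60):
    """Group events by (time bucket, dimension values) and sum their metrics."""
    table = defaultdict(lambda: {"count": 0, **{m: 0 for m in metrics}})
    for event in events:
        # Truncate the timestamp to the start of its bucket.
        bucket = event["timestamp"] - event["timestamp"] % granularity_s
        key = (bucket,) + tuple(event[d] for d in dimensions)
        row = table[key]
        row["count"] += 1  # how many raw events were folded into this row
        for m in metrics:
            row[m] += event[m]
    return dict(table)


events = [
    {"timestamp": 0,  "publisher": "a", "bids": 1},
    {"timestamp": 10, "publisher": "a", "bids": 2},
    {"timestamp": 65, "publisher": "a", "bids": 4},
    {"timestamp": 20, "publisher": "b", "bids": 8},
]
rolled = rollup(events, dimensions=["publisher"], metrics=["bids"])
```

Four raw events become three stored rows: the two events for publisher `a` in the first minute are merged into a single row with `count` 2 and `bids` 3.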
+ Persist: on Time or Size, the In Memory Write-Optimized Store is persisted to a Memory Mapped Read-Only Store
+ Merge: several Memory Mapped Read-Only Stores are merged into one Memory Mapped Read-Only Store*
* Segment
+ Handoff*: the Memory Mapped Read-Only Store goes from Druid RT 0 to Deep Storage (S3, HDFS, Azure, Cassandra) and from there to a Druid Historical
* Orchestrated by Coordinator
+ Druid Historical: the Memory Mapped Read-Only Store moves across three tiers
+ Druid - Hot: Very Little Paging
+ Druid - Cold: Some Paging
+ Druid - Icy: Lots of Paging
Lifecycle rules tunable by datasource
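As a rough illustration of per-datasource lifecycle rules, the sketch below picks a tier for a segment based on its age. This is a simplified model, not Druid's actual rule format: the datasource names, the age thresholds, and the `(max_age_days, tier)` tuples are all hypothetical. The one behavior it tries to mirror is that rules are checked in order and the first match wins.

```python
# Hypothetical rules: (max_age_days, tier). None means "any age".
RULES = {
    "bids":    [(7, "hot"), (90, "cold"), (None, "icy")],
    "metrics": [(30, "hot")],  # nothing older than 30 days stays loaded
}
DEFAULT_RULES = [(None, "cold")]


def tier_for(datasource, segment_age_days, rules=RULES, default=DEFAULT_RULES):
    """Return the tier for a segment, or None if no rule keeps it loaded."""
    for max_age_days, tier in rules.get(datasource, default):
        if max_age_days is None or segment_age_days <= max_age_days:
            return tier
    return None


placements = {
    ("bids", 3): tier_for("bids", 3),
    ("bids", 30): tier_for("bids", 30),
    ("bids", 400): tier_for("bids", 400),
    ("metrics", 45): tier_for("metrics", 45),
    ("other", 1000): tier_for("other", 1000),
}
```

With these made-up rules a three-day-old `bids` segment lands on the hot tier, a 400-day-old one on the icy tier, and a 45-day-old `metrics` segment is not kept on any tier.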
Canary / Metrics cluster
Coordinator Console
Lifecycle of a Query
Query Router
Cold - Broker
Hot - Broker
XOR
+ Broker (Cache): sends the query to Druid RT (Peon), Druid Historical Hot, Druid Historical Cold and Druid Historical Icy
Define Stream Hooks
+ Druid Historical XYZ: Cache, Memory Mapped Read-Only Stores
+ Inside a Memory Mapped Read-Only Store*: Column Dictionary, Dimension Value Bitmaps, Metric Columns
* ByteBuffer slices
+ Iterator over the Dimension Value Bitmaps, Aggregators over the Metric Columns. Ready, set… GO!
+ Iterator to Aggregators: “Take 0, take 1, take 7, take 10”
+ Scan columns ONCE
+ Metrics and Dimensions: Memory Mapped Byte Buffers (Kernel disk cache)
+ Iterator and Aggregators: JVM managed memory
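The bitmap-and-aggregator scan can be sketched with plain Python structures. This is a toy model under stated assumptions: Python sets stand in for the per-value bitmaps, lists stand in for metric columns, and the column names (`country`, `device`, `impressions`, `clicks`) are invented. It only shows the shape of the technique: intersect the bitmaps to find matching rows, then visit each matching row once and feed it to every aggregator.

```python
def build_bitmaps(column):
    """Map each distinct dimension value to the set of row offsets holding it."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, set()).add(row)
    return bitmaps


class LongSum:
    """Sums one metric column over the rows it is told to take."""

    def __init__(self, column):
        self.column = column
        self.total = 0

    def take(self, row):
        self.total += self.column[row]


def scan(bitmaps, metric_columns, filters):
    # Intersect the bitmaps of every filtered dimension value.
    rows = None
    for dimension, value in filters.items():
        matched = bitmaps[dimension].get(value, set())
        rows = matched if rows is None else rows & matched
    aggregators = {name: LongSum(col) for name, col in metric_columns.items()}
    visited = sorted(rows or [])
    for row in visited:  # a single pass over the matching rows
        for aggregator in aggregators.values():
            aggregator.take(row)
    return visited, {name: a.total for name, a in aggregators.items()}


country = ["US", "US", "CA", "CA", "US", "CA", "CA", "US", "CA", "CA", "US", "CA"]
device = ["m", "m", "m", "d", "d", "d", "m", "m", "d", "d", "m", "d"]
bitmaps = {"country": build_bitmaps(country), "device": build_bitmaps(device)}
metric_columns = {
    "impressions": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "clicks": [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
}
visited, totals = scan(bitmaps, metric_columns, {"country": "US", "device": "m"})
```

With this data the intersection selects rows 0, 1, 7 and 10, and each aggregator reads only those four offsets of its metric column.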
+ Druid Historical XYZ: Intermediate Results are Merged (and Cached) into the Druid Historical XYZ Result
+ Broker: Merges the Druid Historical XYZ Result, the Druid RT DEF Result and the Druid Historical ABC Result
+ Done! Bubble up through the Router to the UI*
* Technically bubbles up to Business Logic layer
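The merge at the Broker can be pictured as combining per-node partial results by time bucket. The sketch below is a simplification that assumes every metric is a plain sum, so partial values can simply be added. The node names match the slide, but the timestamps and counts are invented.

```python
from collections import Counter


def merge_partials(partials):
    """Add up per-node partial results, keyed by time bucket."""
    merged = {}
    for partial in partials:
        for bucket, metrics in partial.items():
            # Counter.update adds values for keys that already exist.
            merged.setdefault(bucket, Counter()).update(metrics)
    return {bucket: dict(merged[bucket]) for bucket in sorted(merged)}


historical_abc = {"2015-12-03T00:00Z": {"count": 10},
                  "2015-12-03T00:01Z": {"count": 5}}
historical_xyz = {"2015-12-03T00:01Z": {"count": 7}}
realtime_def = {"2015-12-03T00:02Z": {"count": 3}}

result = merge_partials([historical_xyz, realtime_def, historical_abc])
```

Aggregates that are not simple sums would need their own combine step, which this sketch leaves out.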
Demo!
What was in the Demo?
Actual Druid Usage Data
Query load is about ½ million per day.
Actual Druid Indexing Data
Only 2.8M streaming events/sec yesterday during peak hour. Was a slow day.
Druid OSS Clients!
Official
+ R https://github.com/druid-io/RDruid
+ Python https://github.com/druid-io/pydruid
Community
+ Spark https://github.com/SparklineData/spark-druid-olap
+ SQL https://github.com/srikalyc/Sql4D
+ Many more! http://druid.io/docs/latest/development/libraries.html
  JavaScript, Node.js, Clojure, Ruby, (other) SQL, TypeScript
R Example
library(RDruid)

# Query window: the 24 hours ending at the current minute (UTC)
start_time <- as.POSIXlt(Sys.time(), "UTC", origin = "1970-01-01")
start_time$sec <- 0
end_time <- start_time
start_time$hour <- start_time$hour - 24
intvl <- interval(start_time, end_time)

# druid_query_url and hosts are expected to be defined elsewhere
segment_times <- druid.query.timeseries(
  url = druid_query_url, # bard endpoint
  intervals = intvl,
  dataSource = "mmx_metrics_druid",
  aggregations = list(count = longSum(metric("count")),
                      value = longSum(metric("value"))),
  filter = dimension("host") %=% hosts &
           dimension("metric") %=% "query/segment/time",
  granularity = "minute",
  context = list(useCache = T, populateCache = T))
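For readers who do not use R, the same request can be written as a native Druid timeseries query in JSON. The Python sketch below only builds and prints that JSON and does not contact a cluster. It is an approximation of what the R call asks for, not a dump of what RDruid actually sends: the host names and the interval string are placeholders, and expressing the `hosts` filter as an "or" of selector filters is an assumption.

```python
import json


def timeseries_query(data_source, interval, hosts, metric):
    """Build a Druid native timeseries query as a plain dict."""
    host_filter = {
        "type": "or",
        "fields": [{"type": "selector", "dimension": "host", "value": h}
                   for h in hosts],
    }
    metric_filter = {"type": "selector", "dimension": "metric", "value": metric}
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "minute",
        "intervals": [interval],  # ISO 8601 "start/end"
        "filter": {"type": "and", "fields": [host_filter, metric_filter]},
        "aggregations": [
            {"type": "longSum", "name": "count", "fieldName": "count"},
            {"type": "longSum", "name": "value", "fieldName": "value"},
        ],
        "context": {"useCache": True, "populateCache": True},
    }


query = timeseries_query(
    data_source="mmx_metrics_druid",
    interval="2015-12-02T00:00:00Z/2015-12-03T00:00:00Z",  # placeholder window
    hosts=["host-1.example.com", "host-2.example.com"],    # placeholder hosts
    metric="query/segment/time",
)
print(json.dumps(query, indent=2))
```

The resulting JSON would be POSTed to a broker's query endpoint, or passed through a client library such as pydruid.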
UI - Panoramix
https://github.com/mistercrunch/panoramix
UI - Grafana
https://github.com/Quantiply/grafana-plugins/tree/master/features/druid
Druid Speed
+ https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
+ http://druid.io/blog/2014/03/17/benchmarking-druid.html
We’re always getting faster!
+ A very common question in PRs is “How does this affect speed?” and PROVE IT
+ Micro-benchmarks in druid-io master branch: https://github.com/druid-io/druid/tree/master/benchmarks
+ Macro-benchmarks done at scale (see your metrics console for answers)