
Programmatic Bidding Data Streams & Druid

Date post: 16-Apr-2017
Upload: charles-allen
2015-12-03 Programmatic Bidding Data Streams & Druid Charles Allen
Transcript
Page 1: Programmatic Bidding Data Streams & Druid

2015-12-03

Programmatic Bidding Data Streams & Druid

Charles Allen

Page 2: Programmatic Bidding Data Streams & Druid

We Are Hiring!

We’d love to connect! Our current open positions are: Engineering Director, UI Engineer and Distributed Systems Engineer.

We always have positions opening up, so feel free to connect with Sarah Carter (our Head of Recruiting) for future openings - [email protected].

Page 3: Programmatic Bidding Data Streams & Druid

What is Real-Time Bidding?

Real-Time Bidding is resolving advertising supply and demand at the moment of supply.

+Best suited for systems with internet connectivity.

Page 4: Programmatic Bidding Data Streams & Druid

For the sake of this conversation, Real-Time Bidding (RTB) is the general method by which digital media supply and demand are commonly reconciled using programmatic methodologies over very short time frames.

Page 5: Programmatic Bidding Data Streams & Druid

What Happens in Real-Time Bidding?

1. User loads resources which contain ad space (supply is created by a Publisher)

Page 6: Programmatic Bidding Data Streams & Druid

What Happens in Real-Time Bidding?

2. Information / notification is generated and distributed to interested parties

Avail (a unit of supply of audience attention) is handled by an Exchange

Page 7: Programmatic Bidding Data Streams & Druid

What Happens in Real-Time Bidding?

3. Information on the avail is distributed to potentially interested parties

We now have an auction

Page 8: Programmatic Bidding Data Streams & Druid

What Happens in Real-Time Bidding?

4. Potentially interested parties judge the avail and either bid on the auction, or they do not.

Page 9: Programmatic Bidding Data Streams & Druid

What Happens in Real-Time Bidding?

5. The winner of the auction is determined by the exchange.

5b. 100 ms has passed

If a human can perceive that an auction took place YOU ARE TOO SLOW

Page 10: Programmatic Bidding Data Streams & Druid

What Happens in Real-Time Bidding?

6. An attempt is made to serve the winning ad as an impression

Page 11: Programmatic Bidding Data Streams & Druid

What Happens in Real-Time Bidding?

7. The impression hopefully turns into a click or conversion
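
The auction steps above can be sketched as a toy example. This is a minimal sketch, not how any particular exchange works: the `Bid` type, the `run_auction` helper, the "highest price wins" rule, and the sample bidders are all assumptions for illustration. Only the 100 ms budget comes from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional

DEADLINE_MS = 100  # from the slides: the whole auction has about 100 ms


@dataclass(frozen=True)
class Bid:
    bidder: str        # hypothetical bidder name
    price_cpm: float   # hypothetical bid price
    latency_ms: float  # how long the bidder took to answer


def run_auction(bids: List[Bid], deadline_ms: float = DEADLINE_MS) -> Optional[Bid]:
    """Pick a winner among the bids that arrived in time.

    Assumed rule: the highest price wins. Real exchanges may use other rules.
    """
    on_time = [b for b in bids if b.latency_ms <= deadline_ms]
    if not on_time:
        return None  # nobody bid in time; the avail goes unsold in this sketch
    return max(on_time, key=lambda b: b.price_cpm)


bids = [
    Bid("dsp_a", price_cpm=2.10, latency_ms=40),
    Bid("dsp_b", price_cpm=3.50, latency_ms=140),  # too slow, ignored
    Bid("dsp_c", price_cpm=2.75, latency_ms=85),
]
winner = run_auction(bids)
print(winner.bidder if winner else "no winner")
```

Note that the slowest bidder offered the most money and still lost, which is the point of the "YOU ARE TOO SLOW" slide.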

Page 12: Programmatic Bidding Data Streams & Druid

Avail / Auction

Bid

Impression

Click / Conversion

???

Page 13: Programmatic Bidding Data Streams & Druid

Programmatic data is 100x larger than Wall Street

Page 14: Programmatic Bidding Data Streams & Druid

Cern - LHC

The LHC produces about 1 GB/s average
http://home.cern/about/updates/2015/06/lhc-season-2-cern-computing-ready-data-torrent

MMX raw incoming stream data regularly exceeds this*

* 1hr average

Page 15: Programmatic Bidding Data Streams & Druid

Avail / Auction

Bid

Impression

Click / Conversion

???

Page 16: Programmatic Bidding Data Streams & Druid

General Architecture

Kafka

Samza/Kafka

Druid RT

Tranquility

Raw (S3)

Hadoop / Spark

Deep Storage (S3)

Druid Historical

UI / User

Page 17: Programmatic Bidding Data Streams & Druid

Druid for Queries! ... But what is Druid?

Official - Druid is a fast column-oriented distributed data store

Me - Druid is a highly available Data Store designed for interactive, ad-hoc, OLAP style queries on time-series, denormalized data.

Page 18: Programmatic Bidding Data Streams & Druid

Key points for BEST use cases

+ Highly Available - No downtime for maintenance since 2011
+ Interactive - FAST
+ OLAP - Insightful
+ Ad-hoc - Dynamic
+ Time-series - Sequential
+ Denormalized - Flat

* By the way, it works on Streams (aka Real Real-Time)

Page 19: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Mr. Charlie Event

Page 20: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Firehose

Firehose

Druid RT Peon 0

Druid RT Peon 1

* Launched by Overlord by way of a Middle Manager

Page 21: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Firehose

Druid RT 0

In Memory Write-Optimized Store

Parser

Page 22: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Druid RT 0

In Memory Write-Optimized Store

Page 23: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Druid RT 0

In Memory Write-Optimized Store

Rollup
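
Rollup can be illustrated with a small sketch: events that share the same truncated timestamp and the same dimension values collapse into one row whose metrics are summed. This is only an illustration of the idea, not Druid's implementation. The minute granularity, the field names, and the `rollup` helper are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def truncate_to_minute(ts):
    """Drop seconds and microseconds (assumed rollup granularity: one minute)."""
    return ts.replace(second=0, microsecond=0)


def rollup(events, dimensions, metrics):
    """Group events by (truncated time, dimension values) and sum the metrics."""
    table = defaultdict(lambda: {"count": 0, **{m: 0 for m in metrics}})
    for event in events:
        key = (truncate_to_minute(event["timestamp"]),) + tuple(
            event[d] for d in dimensions
        )
        row = table[key]
        row["count"] += 1          # how many raw events went into this row
        for m in metrics:
            row[m] += event[m]
    return dict(table)


utc = timezone.utc
events = [  # hypothetical bid events
    {"timestamp": datetime(2015, 12, 3, 10, 0, 5, tzinfo=utc), "exchange": "x1", "bids": 1},
    {"timestamp": datetime(2015, 12, 3, 10, 0, 42, tzinfo=utc), "exchange": "x1", "bids": 1},
    {"timestamp": datetime(2015, 12, 3, 10, 1, 7, tzinfo=utc), "exchange": "x1", "bids": 1},
]
rolled = rollup(events, dimensions=["exchange"], metrics=["bids"])
for key, row in sorted(rolled.items()):
    print(key[0].isoformat(), key[1:], row)
```

Three raw events become two stored rows here; the more events share a time bucket and dimension values, the larger the saving.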

Page 24: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Druid RT 0

In Memory Write-Optimized Store

Page 25: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

In Memory Write-Optimized Store

Time or Size

Memory Mapped Read-Only Store

Persist

Page 26: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Memory Mapped Read-Only Store

Memory Mapped Read-Only Store

Memory Mapped Read-Only Store

Merge

Memory Mapped Read-Only Store

* Segment
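
The persist and merge steps can be sketched as follows. This is a toy model under stated assumptions: the `RealtimeStore` class and its method names are hypothetical, the persist trigger is size only (the slides say "Time or Size"), and the read-only stores are plain sorted tuples rather than memory-mapped files.

```python
class RealtimeStore:
    """Toy model: a mutable in-memory store that spills into read-only snapshots."""

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.in_memory = {}   # write-optimized: key -> running sum
        self.persisted = []   # read-only snapshots: sorted tuples of (key, value)

    def add(self, key, value):
        self.in_memory[key] = self.in_memory.get(key, 0) + value
        if len(self.in_memory) >= self.max_rows:  # size trigger only in this sketch
            self.persist()

    def persist(self):
        """Freeze the in-memory rows into an immutable, sorted snapshot."""
        if self.in_memory:
            self.persisted.append(tuple(sorted(self.in_memory.items())))
            self.in_memory = {}

    def merge(self):
        """Combine all snapshots into one read-only store (the segment)."""
        self.persist()
        merged = {}
        for snapshot in self.persisted:
            for key, value in snapshot:
                merged[key] = merged.get(key, 0) + value
        return tuple(sorted(merged.items()))


store = RealtimeStore(max_rows=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]:
    store.add(key, value)
segment = store.merge()
print(len(store.persisted), segment)
```

The same key can appear in several snapshots, which is why the merge step has to combine values rather than just concatenate rows.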

Page 27: Programmatic Bidding Data Streams & Druid

Handoff

Lifecycle of a Real-Time Datum

Memory Mapped Read-Only Store

Druid RT 0

Druid Historical

Deep Storage (S3, HDFS, Azure, Cassandra)

Page 28: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Druid RT 0

Druid Historical

Deep Storage (S3, HDFS, Azure, Cassandra)

Memory Mapped Read-Only Store

* Orchestrated by Coordinator

Page 29: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Druid Historical

Memory Mapped Read-Only Store

Druid - Hot: Very Little Paging
Druid - Cold: Some Paging
Druid - Icy: Lots of Paging

Memory Mapped Read-Only Store

Page 30: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Druid Historical

Memory Mapped Read-Only Store

Druid - Hot: Very Little Paging
Druid - Cold: Some Paging
Druid - Icy: Lots of Paging

Memory Mapped Read-Only Store

Memory Mapped Read-Only Store

Page 31: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Druid Historical

Memory Mapped Read-Only Store

Druid - Hot: Very Little Paging
Druid - Cold: Some Paging
Druid - Icy: Lots of Paging

Memory Mapped Read-Only Store

Page 32: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Druid Historical

Memory Mapped Read-Only Store

Druid - Hot: Very Little Paging
Druid - Cold: Some Paging
Druid - Icy: Lots of Paging

Page 33: Programmatic Bidding Data Streams & Druid

Lifecycle of a Real-Time Datum

Lifecycle rules tunable by datasource
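
One way to picture per-datasource lifecycle rules is an ordered list where the first matching rule decides which tier holds a segment. The sketch below uses a simplified, made-up rule format (`max_age_days`, `tier`), not Druid's actual rule JSON. The datasource name, the age thresholds, and the `tier_for_segment` helper are assumptions; only the hot / cold / icy tier names come from the slides.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rules, evaluated top to bottom; the first match wins.
RULES = {
    "example_datasource": [
        {"max_age_days": 7, "tier": "hot"},
        {"max_age_days": 90, "tier": "cold"},
        {"max_age_days": None, "tier": "icy"},  # catch-all for everything older
    ]
}


def tier_for_segment(datasource, segment_end, now, rules=RULES):
    """Return the tier of the first rule whose age limit covers the segment."""
    age = now - segment_end
    for rule in rules[datasource]:
        limit = rule["max_age_days"]
        if limit is None or age <= timedelta(days=limit):
            return rule["tier"]
    return None  # no rule matched; this sketch leaves the segment unassigned


now = datetime(2015, 12, 3, tzinfo=timezone.utc)
for days_old in (1, 30, 400):
    end = now - timedelta(days=days_old)
    print(days_old, tier_for_segment("example_datasource", end, now))
```

Because the rules are keyed by datasource, a small, frequently queried datasource could keep everything hot while a large one ages out quickly.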

Page 34: Programmatic Bidding Data Streams & Druid

Canary / Metrics cluster

Coordinator Console

Page 35: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Query Router

Cold - Broker

Hot - Broker

XOR
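
The XOR in this slide means each query goes to exactly one broker. A minimal sketch of such a routing decision is below. The rule used here (route by how far back the query reaches) is an assumption for illustration, as are the 7-day window, the broker names, and the `choose_broker` helper; the slides do not say how the router decides.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)  # assumed: queries within this window stay "hot"


def choose_broker(query_start, now, hot_window=HOT_WINDOW):
    """Return exactly one broker for the query, never both."""
    if now - query_start <= hot_window:
        return "hot-broker"
    return "cold-broker"


now = datetime(2015, 12, 3, tzinfo=timezone.utc)
print(choose_broker(now - timedelta(hours=6), now))
print(choose_broker(now - timedelta(days=60), now))
```

Splitting brokers this way keeps long historical scans from competing with short interactive queries for the same broker resources.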

Page 36: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Broker

Druid RT (Peon)

Druid Historical Hot

Druid Historical Cold

Druid Historical Icy

Cache

Page 37: Programmatic Bidding Data Streams & Druid

Define Stream Hooks

Lifecycle of a Query

Cache

Druid Historical XYZ

Memory Mapped Read-Only Store

Memory Mapped Read-Only Store

Page 38: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Memory Mapped Read-Only Store

Column Dictionary

Dimension Value Bitmap

Dimension Value Bitmap

Dimension Value Bitmap

Metric Column

Metric Column

Metric Column

Metric Column
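
The pieces named on this slide (column dictionary, one bitmap per dimension value, metric columns) can be sketched in a few lines. This is an illustration of the general technique only: plain Python integers stand in for bitmaps, whereas a real store would use compressed bitmaps, and the column names and helper functions are assumptions.

```python
def build_dimension_index(values):
    """Dictionary-encode a dimension column and build one bitmap per value."""
    dictionary = sorted(set(values))               # id -> value
    ids = {v: i for i, v in enumerate(dictionary)}  # value -> id
    bitmaps = [0] * len(dictionary)
    for row, value in enumerate(values):
        bitmaps[ids[value]] |= 1 << row             # set bit `row` for this value
    return dictionary, bitmaps


def bitmap_for(dictionary, bitmaps, value):
    """Look up the bitmap of rows where the dimension equals `value`."""
    return bitmaps[dictionary.index(value)] if value in dictionary else 0


def rows_in(bitmap):
    """List the row numbers whose bit is set."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical five-row segment: two dimensions, one metric.
country = ["us", "ca", "us", "us", "ca"]
device = ["m", "m", "d", "m", "d"]
impressions = [10, 20, 30, 40, 50]

country_dict, country_bitmaps = build_dimension_index(country)
device_dict, device_bitmaps = build_dimension_index(device)

# Filter "country = us AND device = m" is a bitwise AND of two bitmaps.
matching = bitmap_for(country_dict, country_bitmaps, "us") & bitmap_for(
    device_dict, device_bitmaps, "m"
)
rows = rows_in(matching)
print(rows, sum(impressions[r] for r in rows))
```

The filter never touches the metric column; only the rows that survive the bitmap AND are read for aggregation.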

Page 39: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Memory Mapped Read-Only Store

Column Dictionary

Dimension Value Bitmap

Dimension Value Bitmap

Dimension Value Bitmap

Metric Column

Metric Column

Metric Column

Metric Column

* ByteBuffer slices

Page 40: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Dimension Value Bitmap

Dimension Value Bitmap

Metric Column

Metric Column

Metric Column

Iterator

Aggregator Aggregator Aggregator

Ready, set… GO!

Page 41: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Iterator

Aggregator

Aggregator

Aggregator

“Take 0, take 1, take 7, take 10”

Scan columns ONCE

Metrics

Dimensions
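
"Take 0, take 1, take 7, take 10" can be read as the iterator handing matching row offsets to every aggregator, so each column is walked once. A minimal sketch of that loop is below. The `LongSum` and `Count` classes, the column contents, and the `run_query` helper are assumptions for illustration, not Druid's classes.

```python
class LongSum:
    """Sums one metric column over the rows it is told to take."""

    def __init__(self, column):
        self.column = column
        self.total = 0

    def aggregate(self, row):
        self.total += self.column[row]


class Count:
    """Counts how many rows were taken."""

    def __init__(self):
        self.total = 0

    def aggregate(self, row):
        self.total += 1


def run_query(matching_rows, aggregators):
    """Single pass: every aggregator sees each matching row exactly once."""
    for row in matching_rows:
        for aggregator in aggregators:
            aggregator.aggregate(row)
    return [a.total for a in aggregators]


# Hypothetical metric columns with 11 rows each.
impressions = [5, 3, 9, 1, 4, 8, 2, 6, 7, 0, 10]
clicks = [1, 0, 2, 0, 1, 1, 0, 3, 0, 0, 2]

matching_rows = [0, 1, 7, 10]  # the offsets from the slide
result = run_query(matching_rows, [LongSum(impressions), LongSum(clicks), Count()])
print(result)
```

Adding another aggregation adds one more object in the inner loop, not another scan over the data.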

Page 42: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Iterator

Aggregator

Aggregator

Aggregator

Metrics

Dimensions

Memory Mapped Byte Buffers (Kernel disk cache)
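
Reading a column through a memory-mapped file can be sketched with Python's `mmap` and `memoryview`, which play roughly the role of the ByteBuffer slices mentioned earlier. The file layout here (columns of native-endian 64-bit integers written back to back) and the helper names are assumptions for illustration, not Druid's segment format.

```python
import mmap
import os
import struct
import tempfile


def write_columns(path, columns):
    """Write each column as packed int64 values; return {name: (offset, nbytes)}."""
    layout = {}
    with open(path, "wb") as f:
        for name, values in columns.items():
            start = f.tell()
            f.write(struct.pack(f"={len(values)}q", *values))
            layout[name] = (start, f.tell() - start)
    return layout


def sum_column(path, layout, name):
    """Map the file and sum one column through a zero-copy slice of the mapping."""
    start, nbytes = layout[name]
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The slice and the cast are views onto the mapping, not copies; the
        # pages they touch are served by the kernel's disk cache.
        with memoryview(mm) as whole, whole[start:start + nbytes] as raw, raw.cast("q") as column:
            return sum(column)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment.bin")  # hypothetical file name
    layout = write_columns(path, {"impressions": [10, 20, 30], "clicks": [1, 2]})
    totals = {name: sum_column(path, layout, name) for name in layout}
print(layout, totals)
```

Only the bytes of the requested column are touched, so a query on `clicks` never pages in `impressions`.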

Page 43: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Iterator

Aggregator

Aggregator

Aggregator

Metrics

Dimensions

JVM managed memory

Page 44: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Intermediate

Results

Intermediate

Results

Merge

Cache

Cache

Druid Historical XYZ Result

Page 45: Programmatic Bidding Data Streams & Druid

Lifecycle of a Query

Druid Historical XYZ Result

Druid RT DEF Result

Druid Historical ABC Result

Merge Broker

Done! Bubble up to UI

Router UI*

* Technically bubbles up to Business Logic layer
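
The merge step, whether on a historical node across segments or on the broker across nodes, can be sketched as combining partial results bucket by bucket. This sketch assumes additive aggregates (sums and counts) keyed by time bucket; the `merge_results` helper and the sample numbers are made up for illustration.

```python
def merge_results(partials):
    """Merge partial results of the form {time_bucket: {metric: value}} by summing."""
    merged = {}
    for partial in partials:
        for bucket, metrics in partial.items():
            target = merged.setdefault(bucket, {})
            for name, value in metrics.items():
                target[name] = target.get(name, 0) + value
    return dict(sorted(merged.items()))


# Hypothetical partial results from three nodes.
historical_xyz = {"2015-12-03T10:00Z": {"count": 4, "value": 40}}
realtime_def = {"2015-12-03T10:01Z": {"count": 1, "value": 7}}
historical_abc = {
    "2015-12-03T10:00Z": {"count": 2, "value": 5},
    "2015-12-03T10:01Z": {"count": 3, "value": 9},
}

final = merge_results([historical_xyz, realtime_def, historical_abc])
print(final)
```

Summation is what makes this safe to do in stages; aggregates that are not additive would need their own merge logic.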

Page 46: Programmatic Bidding Data Streams & Druid

Demo!

Page 47: Programmatic Bidding Data Streams & Druid

What was in the Demo?

Page 48: Programmatic Bidding Data Streams & Druid

Actual Druid Usage Data

Query load is about ½ million per day

Page 49: Programmatic Bidding Data Streams & Druid

Actual Druid Indexing Data

Only 2.8M streaming events/sec yesterday during peak hour. Was a slow day.

Page 50: Programmatic Bidding Data Streams & Druid

Druid OSS Clients!

Official
+ R https://github.com/druid-io/RDruid
+ Python https://github.com/druid-io/pydruid

Community
+ Spark https://github.com/SparklineData/spark-druid-olap
+ SQL https://github.com/srikalyc/Sql4D
+ Many more! http://druid.io/docs/latest/development/libraries.html

JavaScript, Node.js, Clojure, Ruby, (other) SQL, TypeScript

Page 51: Programmatic Bidding Data Streams & Druid

R Example

library(RDruid)

# Query window: the last 24 hours, ending at the start of the current minute (UTC)
start_time <- as.POSIXlt(Sys.time(), "UTC", origin = "1970-01-01")
start_time$sec <- 0
end_time <- start_time
start_time$hour <- start_time$hour - 24
intvl <- interval(start_time, end_time)

# druid_query_url and hosts are not defined on the slide; they must be set beforehand
segment_times <- druid.query.timeseries(
  url = druid_query_url, # bard endpoint
  intervals = intvl,
  dataSource = "mmx_metrics_druid",
  aggregations = list(count = longSum(metric("count")),
                      value = longSum(metric("value"))),
  filter = dimension("host") %=% hosts & dimension("metric") %=% "query/segment/time",
  granularity = "minute",
  context = list(useCache = T, populateCache = T))
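
For readers who do not use R, the same request can be approximated as a native Druid timeseries query body built with the Python standard library. This is a sketch under assumptions: it treats `hosts` as a list of host names matched with an OR of selector filters (the slide does not show what `hosts` contains), the host names are made up, and nothing is sent over the network.

```python
import json
from datetime import datetime, timedelta, timezone


def selector(dimension, value):
    """A Druid selector filter: dimension equals value."""
    return {"type": "selector", "dimension": dimension, "value": value}


def build_timeseries_query(hosts, end_time):
    """Build the JSON body for the last 24 hours ending at `end_time` (UTC)."""
    end_time = end_time.replace(second=0, microsecond=0)
    start_time = end_time - timedelta(hours=24)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "queryType": "timeseries",
        "dataSource": "mmx_metrics_druid",
        "granularity": "minute",
        "intervals": [f"{start_time.strftime(fmt)}/{end_time.strftime(fmt)}"],
        "aggregations": [
            {"type": "longSum", "name": "count", "fieldName": "count"},
            {"type": "longSum", "name": "value", "fieldName": "value"},
        ],
        "filter": {
            "type": "and",
            "fields": [
                {"type": "or", "fields": [selector("host", h) for h in hosts]},
                selector("metric", "query/segment/time"),
            ],
        },
        "context": {"useCache": True, "populateCache": True},
    }


query = build_timeseries_query(
    hosts=["host-a.example", "host-b.example"],  # hypothetical host names
    end_time=datetime(2015, 12, 3, 12, 34, 56, tzinfo=timezone.utc),
)
print(json.dumps(query, indent=2))
```

The resulting JSON would be POSTed to a broker or router query endpoint; the client libraries listed earlier wrap that step.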

Page 52: Programmatic Bidding Data Streams & Druid

UI - Panoramix

https://github.com/mistercrunch/panoramix

Page 53: Programmatic Bidding Data Streams & Druid

UI - Grafana

https://github.com/Quantiply/grafana-plugins/tree/master/features/druid

Page 54: Programmatic Bidding Data Streams & Druid

UI - Pivot

https://github.com/implydata/pivot

Page 55: Programmatic Bidding Data Streams & Druid

Druid Speed

+ https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
+ http://druid.io/blog/2014/03/17/benchmarking-druid.html

We’re always getting faster!

Very common question in PRs is “How does this affect speed?” and PROVE IT

Micro-benchmarks in druid-io master branch
https://github.com/druid-io/druid/tree/master/benchmarks

Macro-benchmarks done at scale (see your metrics console for answers)

Page 56: Programmatic Bidding Data Streams & Druid

We Are Hiring!

We’d love to connect! Our current open positions are: Engineering Director, UI Engineer and Distributed Systems Engineer.

We always have positions opening up, so feel free to connect with Sarah Carter (our Head of Recruiting) for future openings - [email protected].

