Date post: | 21-Feb-2017 |
Category: | Data & Analytics |
Upload: | spark-summit |
R&D To Product Pipeline Using Apache Spark in Adtech
Maximo Gurmendez, Dr. Sunanda Parthasarathy, Dr. Saket Mengle
DataXu Inc.
What to expect from this session
• Who is DataXu?
• Why Apache Spark?
• From R&D to product using Apache Spark Demo
• Analytics using Apache Spark
DataXu
Make Marketing Smarter Through Data Science
• Who
    • Spun out of MIT Labs
    • A petabyte-scale digital marketing platform
    • One of the fastest growing companies in Inc. 5000
• What
    • Help the world’s most valuable brands understand and engage with their consumers
    • Maximize ROI
Quick Statistics
• Billions of ads served per month
• ~10 ms round-trip response time
• 130+ TB of logs per day
• 3000+ servers powering the platform
• 13 regions, 24x7
Real Time Bidding
DataXu Machine Learning
[Diagram labels: Impressions, Clicks, Activities; Learn Models; Calibrate; Evaluate; Models; S3; Real Time Bidding]
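The learn, calibrate, and evaluate stages named in the diagram can be sketched in miniature. This is only an illustrative sketch in plain Python, not DataXu's pipeline: the slides do not say which algorithms or calibration method are used, so logistic regression trained by SGD, a log-odds offset for calibration, log loss for evaluation, the synthetic data, and every function name here are assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Raw log-odds for one row; w[0] is the bias term.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def learn(rows, labels, epochs=20, lr=0.1):
    """Learn step: fit logistic regression weights with plain SGD."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = sigmoid(score(w, x)) - y
            w[0] -= lr * err
            for i, xi in enumerate(x):
                w[i + 1] -= lr * err * xi
    return w


def calibrate(w, rows, labels):
    """Calibrate step (assumed method): find a log-odds offset so the mean
    predicted rate matches the observed positive rate on held-out rows."""
    target = sum(labels) / len(labels)
    logits = [score(w, x) for x in rows]
    lo, hi = -10.0, 10.0
    for _ in range(50):  # bisection; mean prediction is monotonic in offset
        mid = (lo + hi) / 2
        mean_pred = sum(sigmoid(z + mid) for z in logits) / len(logits)
        if mean_pred < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def evaluate(w, offset, rows, labels):
    """Evaluate step: mean log loss of calibrated predictions (lower is better)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(sigmoid(score(w, x) + offset), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def make_data(rng, n):
    # Synthetic stand-in for impression/click logs (made up for this sketch).
    rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    labels = [1 if rng.random() < sigmoid(3 * x[0] - 2 * x[1]) else 0 for x in rows]
    return rows, labels


def run_pipeline(seed=0):
    rng = random.Random(seed)
    train, calib, test = make_data(rng, 1500), make_data(rng, 500), make_data(rng, 500)
    w = learn(*train)
    offset = calibrate(w, *calib)
    return w, offset, evaluate(w, offset, *test)
```

Calling `run_pipeline()` returns the learned weights, the calibration offset, and the held-out log loss. Keeping the three stages as separate functions mirrors the separate boxes in the diagram and lets each stage use its own data split.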
Why is this hard?
Huge Scale
• 2.7 million bid decisions per second
• 3 PB of data processed daily
• Runs 24x7 on 5 continents
• Thousands of ML models trained per day
Unattended Operation
• Model training and deployment runs automatically every day
Changing Industry
• Need ability to adapt quickly to new customer requirements
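To put the scale figures in daily and per-second terms, a quick back-of-the-envelope calculation helps. It assumes the quoted rates are sustained averages over a full day and that "PB" means decimal petabytes (10^15 bytes); neither is stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

bids_per_second = 2_700_000   # from the slide
bytes_per_day = 3 * 10**15    # 3 PB, assuming decimal petabytes

bids_per_day = bids_per_second * SECONDS_PER_DAY
bytes_per_second = bytes_per_day / SECONDS_PER_DAY

print(f"{bids_per_day:,} bid decisions per day")
# prints "233,280,000,000 bid decisions per day"
print(f"{bytes_per_second / 10**9:.1f} GB processed per second on average")
# prints "34.7 GB processed per second on average"
```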
Demo
Benchmarks
[Chart: Training Time Comparison. Series: Logistic Regression, Decision Tree, Random Forest, each with a linear trend line. X axis: Number of Training Records (0 to 15,000,000). Y axis: Training Time (in sec), 0 to 500.]
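The "Linear (...)" series in the chart legend are linear trend lines fitted to the measured training times. A minimal sketch of such a fit, using ordinary least squares, is shown here. The function name `fit_line` is hypothetical, and the sample numbers are placeholders invented to exercise the code; they are not values read from the chart.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder measurements (NOT from the slides): record counts and seconds.
records = [1_000_000, 5_000_000, 10_000_000, 15_000_000]
seconds = [12.0, 52.0, 102.0, 152.0]

slope, intercept = fit_line(records, seconds)
# Extrapolate the trend to a larger, hypothetical training set.
predicted_20m = slope * 20_000_000 + intercept
```

A roughly constant slope suggests training time grows linearly with the number of records, which is the kind of reading a linear trend line supports.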
[Chart: Avg. Bidding Latency (milliseconds), axis 0 to 1.6. Bars: Current DataXu Model, Spark Random Forest.]
[Chart: Model Size in Memory (KB), axis 0 to 140. Bars: Random Forests, Logistic Regression, Naive Bayes, Decision Trees, DataXu Model.]
Why Apache Spark for Adv. Analytics
• Makes Advanced Analytics a reality – accelerated queries, graph processing, streaming analytics
• Speaks multiple languages (Python, Scala, SQL)
• Makes it easy – Compared to Java/Hadoop complexities
• Accelerates the analyst/data scientist workflow
[Diagram labels: Real Time Bidding Engine, Adv. Analytics Engine, S3 (metadata)]
Advanced Analytics at DataXu
[Diagram labels: Real Time Bidding Engine, Partner/Client Data, Analytics Engine, Dashboarding/Reporting]
Analytics Demo
Thank You.
!! We’re hiring !! Data Scientists, Data Science Engineers. FTEs, Interns