Post on 17-Aug-2015
Data Science: Lessons from the Field
Gaurav Kataria, Head of Product Adoption, Google for Work
Guest Lecturer at Stanford Business School
The views expressed here are my own and do not necessarily represent views of my employer
Lessons from the field
1. Create data-driven culture
2. Invest in the key capabilities
3. Iterate and adapt fast
4. Make the business tradeoffs
5. Don’t forget security and privacy
Lesson #1: Create data-driven culture
Data-Driven Decision Making vs. Decision-Driven Data Making (finding the data to support a decision that has already been made)
Lesson #2: Invest in the Capabilities
Describe the Trends
Predict the Future
Change the Future
360° DATA
MACHINE LEARNING
EXPERIMENTATION
“80% of a Data Scientist’s time is spent on cleaning and organizing the data”
“Correlation ≠ Causation”
“Let’s do it and then we’ll see what happens” ≠ Experiment
Lesson #2: Invest in the Capabilities (examples of common pitfalls)
Claim: Demand is inelastic
Reality: Channel partners did not pass on the discount
Claim: Feature is increasing user engagement
Reality: Users are frustrated because it takes them longer to get things done
Claim: Launch increased sales
Reality: Holidays increased sales; the launch actually depressed them
[Charts: Price Cut / Sales; Feature Launch / Time spent on site; Product Launch / Sales. Time axis: Sep, Oct, Nov, Dec]
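The launch-versus-holidays pitfall can be illustrated with a difference-in-differences comparison. This is only a sketch: the talk does not say how the real effect was found, and it assumes a comparable group that did not get the launch is available. The function name `diff_in_diff` and all sales numbers below are hypothetical.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Estimate the launch effect net of whatever also moved the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical monthly unit sales, NOT taken from the talk.
# Both groups see the holiday season; only the treated group sees the launch.
naive_lift = 120 - 100  # looks like the launch added 20 units
effect = diff_in_diff(
    treated_before=100, treated_after=120,
    control_before=100, control_after=130,
)

print(naive_lift)  # 20
print(effect)      # -10
```

With these made-up numbers the naive before/after view credits the launch with +20, while the control group shows the holidays alone were worth +30, so the launch estimate is negative.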
Lesson #3: Iterate and adapt fast (Examples)
Movie/Song Recommendations
● Trends change
● Users’ preferences change
● Licensing costs change
Lesson #3: Iterate and adapt fast (Examples)
Competition is not sleeping
● Big Data is getting democratized
● Machine learning offered as a service
● People are tuning their algorithms
© Shivon Zilis, Bloomberg Beta
Lesson #4: Make the business tradeoffs (Example)
● 100 customers: 90 will renew their subscription and 10 will not
● If we had a simple model that guessed our customers would always renew, it would be accurate 90% of the time
● However, we’d never be able to identify the 10 customers who won’t renew
● Most business data follows a similar pattern (called class imbalance)
● We need an intelligent model, not just an accurate model
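The 90/10 renewal example can be checked in a few lines. This is a minimal sketch using only the numbers on the slide; encoding non-renewal as 1 (the class we want to catch) is an assumption made for illustration.

```python
# Toy data matching the slide: 100 customers, 10 do not renew (1), 90 renew (0).
actual = [1] * 10 + [0] * 90

# The "simple model": always predict that the customer renews.
predicted = [0] * len(actual)

# Accuracy: share of customers whose outcome was guessed correctly.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall on non-renewers: share of actual non-renewers the model flagged.
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(actual)

print(f"accuracy = {accuracy:.0%}")  # accuracy = 90%
print(f"recall   = {recall:.0%}")    # recall   = 0%
```

The model scores 90% accuracy while finding none of the customers the business actually cares about, which is why accuracy alone is a poor yardstick under class imbalance.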
Lesson #4: Make the business tradeoffs (Example)
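Actual vs. Prediction (positive class: non-renew)
Actual Non-renew (10): predicted non-renew = True Positive (TP); predicted renew = False Negative (FN)
Actual Renew (90): predicted non-renew = False Positive (FP); predicted renew = True Negative (TN)
Precision = TP / (TP + FP) = 55%
Recall = TP / (TP + FN) = 60%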
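The two percentages can be reproduced from cell counts. The slide does not list the counts; the values below are inferred, as an assumption, from the 10 actual non-renewers and from the cost figures in this example ($1,100 at $100 per action gives 11 flagged customers; $900 at $150 per success gives 6 true positives). The helper name `precision_recall` is hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of those flagged, how many truly won't renew
    recall = tp / (tp + fn)     # of those who won't renew, how many were flagged
    return precision, recall


# Inferred counts (assumption, not printed on the slide).
tp, fp, fn, tn = 6, 5, 4, 85
assert tp + fn == 10  # actual non-renewers
assert fp + tn == 90  # actual renewers

precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision:.0%}")  # precision = 55%
print(f"recall    = {recall:.0%}")     # recall    = 60%
```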
Lesson #4: Make the business tradeoffs (Example)
● Cost plays a big role. For example,
○ Cost of action is $100/customer; total cost of action for all 11 predicted non-renewers: $1,100
○ Benefit of action is $150/customer; total benefit (6 true positives only): $900
○ Cost > Benefit, so acting on every prediction loses $200
● So, what is more important: Precision or Recall?
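The cost arithmetic can be written out as a small calculation. This is a sketch under the slide's figures ($100 per action, $150 per retained customer); the counts of 6 true positives and 5 false positives are inferred from the stated totals, and the function name `net_value` is hypothetical.

```python
def net_value(tp, fp, cost_per_action, benefit_per_tp):
    """Return (total_cost, total_benefit, net) of acting on every flagged customer."""
    total_cost = (tp + fp) * cost_per_action  # we pay for every prediction
    total_benefit = tp * benefit_per_tp       # only true positives pay back
    return total_cost, total_benefit, total_benefit - total_cost


cost, benefit, net = net_value(tp=6, fp=5, cost_per_action=100, benefit_per_tp=150)
print(cost, benefit, net)  # 1100 900 -200

# The action breaks even when precision * benefit equals cost,
# i.e. when precision reaches cost / benefit.
break_even_precision = 100 / 150
print(f"break-even precision = {break_even_precision:.0%}")  # break-even precision = 67%
```

Under these assumptions the model's 55% precision sits below the roughly 67% needed to break even, which is one way to see why the campaign loses money despite catching 60% of non-renewers.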
Lesson #4: Make the business tradeoffs (Example)
Importance of Precision: driven by the cost of a False Positive
Importance of Recall: driven by the cost of a False Negative
Examples: Automated Email, Large Discount, Product Recommendation, Customer Churn