1©MapR Technologies 2013
Which Algorithms Really Matter?
Me, Us
Ted Dunning, Chief Application Architect, MapR
Committer, PMC member: Mahout, Zookeeper, Drill
Bought the beer at the first HUG
MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard APIs
Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
Topic For Today
What is important?
What is not?
Why?
What is the difference from academic research?
Some examples
What is Important?
Deployable
– Clever prototypes don’t count if they can’t be standardized
Robust
– Mishandling is common
Transparent
– Will degradation be obvious?
Skillset and mindset matched?
– How long will your fancy data scientist enjoy doing standard ops tasks?
Proportionate
– Where is the highest value per minute of effort?
Academic Goals vs Pragmatics
Academic goals
– Reproducible
– Isolate theoretically important aspects
– Work on novel problems
Pragmatics
– Highest net value
– Available data is constantly changing
– Diligence and consistency have larger impact than cleverness
– Many systems feed themselves; exploration and exploitation are both important
– Engineering constraints on budget and schedule
Example 1: Making Recommendations Better
Recommendation Advances
What are the most important algorithmic advances in recommendations over the last 10 years?
Cooccurrence analysis?
Matrix completion via factorization?
Latent factor log-linear models?
Temporal dynamics?
The Winner – None of the Above
1. Result dithering
2. Anti-flood
The Real Issues
Exploration
Diversity
Speed
Not the last fraction of a percent
Result Dithering
Dithering is used to re-order recommendation results
– Re-ordering is done randomly
Dithering is guaranteed to make off-line performance worse
Dithering also has a near perfect record of making actual performance much better
“Made more difference than any other change”
Simple Dithering Algorithm
Generate synthetic score from log rank plus Gaussian
Pick noise scale to provide desired level of mixing
Typically
Oh… use floor(t/T) as the random seed, so the ordering stays stable within each time window of length T
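The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the function name `dither`, the parameters `period` and `user_key`, and the choice to treat ε as the standard deviation of the Gaussian are all assumptions made for the example.

```python
import math
import random

def dither(ranked_items, epsilon=0.5, t=0.0, period=600.0, user_key=""):
    """Re-order ranked_items by the synthetic score log(rank) + Gaussian noise.

    epsilon is used as the noise standard deviation (an assumption).
    t is the current time and period is the window length T, in the same units.
    """
    # Seeding with floor(t / period) keeps the ordering fixed within a window.
    # user_key is a hypothetical extra so different users get different noise.
    rng = random.Random(f"{user_key}:{math.floor(t / period)}")
    scored = [
        (math.log(rank) + rng.gauss(0.0, epsilon), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    # A lower synthetic score means nearer the top, as with the original rank.
    return [item for _, item in sorted(scored, key=lambda pair: pair[0])]

items = list("abcdefghij")
print(dither(items, epsilon=0.5, t=1200.0))
```

Because the score uses the log of the rank, items near the top are shuffled only among themselves while items deep in the list can occasionally jump a long way up, which is what gives later pages some exposure.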
Example … ε = 0.5
Example … ε = log 2 = 0.69
Exploring The Second Page
Lesson 1: Exploration is good
Example 2: Bayesian Bandits
Bayesian Bandits
Based on Thompson sampling
Very general sequential test
Near optimal regret
Trade-off exploration and exploitation
Possibly best known solution for exploration/exploitation
Incredibly simple
Thompson Sampling
Select each shell according to the probability that it is the best
Probability that it is the best can be computed using posterior
But I promised a simple answer
Thompson Sampling – Take 2
Sample θ
Pick i to maximize reward
Record result from using i
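The three steps above can be sketched for the simplest case. The example assumes Bernoulli rewards (click or no click) with a uniform Beta(1, 1) prior on each arm, which the slide does not specify; the names `thompson_step` and `run_bandit` and the simulated `true_rates` are hypothetical.

```python
import random

def thompson_step(successes, failures, rng):
    """Sample theta for every arm from its Beta posterior, then pick the max."""
    samples = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior plus observed counts
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_rates, steps=5000, seed=1):
    """Simulate the sample / pick / record loop against known reward rates."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(steps):
        i = thompson_step(successes, failures, rng)  # sample theta, pick i
        if rng.random() < true_rates[i]:             # record result from using i
            successes[i] += 1
        else:
            failures[i] += 1
    return successes, failures

successes, failures = run_bandit([0.1, 0.5])
print([s + f for s, f in zip(successes, failures)])  # pulls per arm
```

Sampling from the posterior and taking the maximum selects each arm with exactly the probability that it is the best one under current beliefs, so no explicit computation of that probability is needed.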
Fast Convergence
Thompson Sampling on Ads
An Empirical Evaluation of Thompson Sampling - Chapelle and Li, 2011
Bayesian Bandits versus Result Dithering
Many useful systems are difficult to frame in fully Bayesian form
Thompson sampling cannot be applied without posterior sampling
Can still do useful exploration with dithering
But better to use Thompson sampling if possible
Lesson 2: Exploration is pretty easy to do and pays big benefits.
Example 3: On-line Clustering
The Problem
K-means clustering is useful for feature extraction or compression
At scale and at high dimension, the desirable number of clusters increases
Very large number of clusters may require more passes through the data
Super-linear scaling is generally infeasible
The Solution
Sketch-based algorithms produce a sketch of the data
Streaming k-means uses adaptive dp-means to produce this sketch in the form of many weighted centroids which approximate the original distribution
The size of the sketch grows very slowly with increasing data size
Many operations such as clustering are well behaved on sketches
Fast and Accurate k-means For Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Brian Kulis, Michael Jordan.
An Example
The Cluster Proximity Features
Every point can be described by the nearest cluster
– 4.3 bits per point in this case
– Significant error that can be decreased (to a point) by increasing the number of clusters
Or by the proximity to the 2 nearest clusters (2 x 4.3 bits + 1 sign bit + 2 proximities)
– Error is negligible
– Unwinds the data into a simple representation
Or we can increase the number of clusters (an n-fold increase adds log n bits per point and decreases error by sqrt(n))
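A small sketch of the two encodings described above. The helper names are hypothetical, and the figure of about 20 clusters is an inference rather than something stated on the slide: 4.3 bits per point is consistent with log2(20) ≈ 4.32. The sign bit mentioned on the slide is not modeled here.

```python
import math

def bits_per_index(k):
    """Bits needed to name one of k clusters."""
    return math.log2(k)

def nearest_two(point, centroids):
    """Return (index, distance) for the two centroids closest to point."""
    dists = sorted((math.dist(point, c), i) for i, c in enumerate(centroids))
    (d1, i1), (d2, i2) = dists[0], dists[1]
    return (i1, d1), (i2, d2)

print(round(bits_per_index(20), 2))
print(nearest_two((0.0, 0.0), [(1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]))
```

The first encoding keeps only the nearest index; the second keeps both indexes and both distances, which is what lets the original position be recovered with much smaller error.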
Diagonalized Cluster Proximity
Lots of Clusters Are Fine
Typical k-means Failure
Selecting two seeds here cannot be fixed with Lloyd's algorithm
The result is that these two clusters get glued together
Streaming k-means Ideas
By using a sketch with lots (k log N) of centroids, we avoid pathological cases
We still get a very good result if the sketch is created
– in one pass
– with approximate search
In fact, adaptive dp-means works just fine
In the end, the sketch can be used for clustering or …
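The one-pass idea can be illustrated with a simplified dp-means-style sketch. This is a toy version under stated assumptions, not the Mahout streaming k-means code: the function names, the fixed starting `threshold`, the doubling rule, and the exact linear search (where a real system would use approximate search) are all choices made for the example.

```python
import math

def add_weighted(sketch, p, w, threshold):
    """Merge point p (weight w) into its nearest centroid, or start a new one."""
    best = min(sketch, key=lambda cw: math.dist(cw[0], p), default=None)
    if best is None or math.dist(best[0], p) > threshold:
        sketch.append([p, w])
    else:
        c, cw = best
        total = cw + w
        # Weighted mean keeps the centroid at the center of mass of its points.
        best[0] = tuple((ci * cw + pi * w) / total for ci, pi in zip(c, p))
        best[1] = total

def dp_means_sketch(points, threshold, max_centroids=1000):
    """Single pass over points; returns a list of [centroid, weight] pairs."""
    sketch = []
    for p in points:
        add_weighted(sketch, tuple(p), 1.0, threshold)
        if len(sketch) > max_centroids:
            # Adaptive step: loosen the threshold and re-sketch the centroids.
            threshold *= 2.0
            old, sketch = sketch, []
            for c, w in old:
                add_weighted(sketch, c, w, threshold)
    return sketch

points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
print(dp_means_sketch(points, threshold=1.0))
```

The output is a small set of weighted centroids that can be handed to an ordinary in-memory clustering step, since the weights preserve how much data each centroid stands for.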
Lesson 3: Sketches make big data small.
Example 4: Search Abuse
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
What else would Bob like?
Log Files
[Figure: log entries for Alice, Bob, and Charles]
History Matrix: Users by Items
[Figure: history matrix with one row per user (Alice, Bob, Charles) and a check mark for each item the user interacted with]
Co-occurrence Matrix: Items by Items
[Figure: items-by-items matrix of co-occurrence counts]
How do you tell which co-occurrences are useful?
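Going from the users-by-items history to items-by-items co-occurrence counts can be sketched as follows. The data is the small Alice, Bob, and Charles example from the earlier slides; the function name and the dictionary layout are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations

def cooccurrence(history):
    """Count, for each ordered item pair, how many users have both items."""
    counts = Counter()
    for items in history.values():
        for a, b in permutations(sorted(items), 2):
            counts[a, b] += 1
    return counts

history = {
    "Alice": {"apple", "puppy"},
    "Bob": {"apple"},
    "Charles": {"bicycle"},
}
counts = cooccurrence(history)
print(counts["apple", "puppy"], counts["apple", "bicycle"])
```

The diagonal (an item with itself) is left out, and the result is symmetric. Raw counts alone do not say which pairs are interesting, which is the question the slide raises.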
Co-occurrence Binary Matrix
[Figure: binary co-occurrence matrix]
Indicator Matrix: Anomalous Co-Occurrence
Result: The marked row will be added to the indicator field in the item document…
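The slide does not name the test that decides which co-occurrences count as anomalous. One common choice, assumed here purely for illustration, is the log-likelihood ratio (G²) test on a 2x2 contingency table, where `k11` is the number of users with both items, `k12` and `k21` the users with only one of them, and `k22` the users with neither.

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score; larger means a more surprising co-occurrence."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

print(llr(10, 10, 10, 10))  # items are independent
print(llr(10, 0, 0, 10))    # items always appear together
```

Pairs whose score exceeds a chosen threshold would get a mark in the indicator matrix; the threshold itself is a tuning decision not covered by this sketch.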
Indicator Matrix
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to meta-data for a document in Solr index. You don’t need to create a separate index for the indicators.
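How a search over the indicator field acts as a recommender can be shown with a toy in-memory version. This does not use Solr and is not its API; the `recommend` function, the overlap-count scoring, and the `t1` document are hypothetical stand-ins, and a real search engine would apply its own relevance weighting.

```python
def recommend(docs, history, limit=3):
    """Rank item documents by how many history items appear in their indicators."""
    scored = []
    for doc in docs:
        if doc["id"] in history:
            continue  # design choice: skip items the user already has
        hits = len(set(doc["indicators"]) & set(history))
        if hits:
            scored.append((hits, doc["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]

docs = [
    {"id": "t1", "title": "apple", "indicators": []},      # hypothetical document
    {"id": "t4", "title": "puppy", "indicators": ["t1"]},  # from the slide
]
print(recommend(docs, history={"t1"}))
```

The user's recent history is the query and the indicator field is the text being searched, so the recommendation step needs nothing beyond an ordinary search request.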
Internals of the Recommender Engine
Looking Inside LucidWorks
What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles?
Recommendation is “1710 : Chuck Berry”
Real-time recommendation query and results: Evaluation
Real-life example
Lesson 4: Recursive search abuse pays
Search can implement recs
Which can implement search
Summary