Building Personalized Applications at Scale
Garrett Wu, Director of Engineering
Odiago, Inc.
Personalized Applications
Examples
● Recommendations
  ○ Amazon
  ○ Netflix
● Ad Targeting
  ○ Hulu
  ○ YouTube
● Fraud Detection
  ○ Visa
  ○ JPMC
● Spam
  ○ Gmail
● Search Personalization
  ○ Google
Overall Requirements
● React to events in near real time.
  ○ Low-latency reads and writes.
  ○ Event-driven analysis (not just batch).
● Web scale: hundreds of millions of users.
  ○ High-throughput reads and writes.
● Reliable.
  ○ Distributed, fault tolerant, graceful degradation.
● Flexible.
  ○ Evolvable schema.
  ○ Support ad-hoc experimentation and analyses.
Data Flow
Datastore Requirements
1. Random writes.
2. Analysis (MapReduce).
3. Random reads.
Data Model Requirements
1. Write user-centric data.
  ○ "Bob bought the Hunger Games book."
  ○ "Sally viewed product page X."
2. Query user-centric data.
  ○ "What were Jim's most recent 5 purchases?"
  ○ "What are Sue's top 3 recommendations?"
Given everything we know about John:
● Transactions.
● Tweets.
● Likes.
... recommend, classify, predict, cluster, profile.
User-centric Data Model
<column>
  <name>email</name>
  <description>Email address</description>
  <schema>"string"</schema>
</column>
Cells have Avro schemas for evolvable storage and retrieval.
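To make "evolvable" concrete: in Avro, data written with an older record schema can be read with a newer one, and a field that the writer never stored takes the default declared in the reader's schema. The sketch below imitates only that one rule in plain Python, without the Avro library. The `UserInfo` record, the `zip_code` field, and the `resolve` helper are hypothetical names introduced for illustration, not part of the original slides.

```python
# Writer schema: what was in effect when the cell was written (hypothetical).
WRITER_SCHEMA = {
    "type": "record",
    "name": "UserInfo",
    "fields": [{"name": "email", "type": "string"}],
}

# Reader schema: a later version that adds a field with a default value.
READER_SCHEMA = {
    "type": "record",
    "name": "UserInfo",
    "fields": [
        {"name": "email", "type": "string"},
        {"name": "zip_code", "type": "string", "default": ""},
    ],
}


def resolve(datum, reader_schema):
    """Project a decoded record onto the reader schema, filling in defaults."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in datum:
            out[name] = datum[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return out


# A cell written under the old schema is still readable under the new one.
old_cell = {"email": "bob@example.com"}
print(resolve(old_cell, READER_SCHEMA))
# -> {'email': 'bob@example.com', 'zip_code': ''}
```

Adding a field without a default would make old cells unreadable under the new schema, which is why the sketch raises in that case.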
● 3-D storage with timestamps.
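One way to picture the three dimensions is row (user), column, and timestamp. The following is a minimal in-memory sketch of that layout, assuming integer timestamps. The `UserTable` class, its method names, the column names, and the second purchase are hypothetical and only illustrate the idea; they are not the actual storage API.

```python
from collections import defaultdict


class UserTable:
    """Toy table: row (user) -> column -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, user, column, timestamp, value):
        """Random write: store one cell version for one user."""
        self._rows[user][column][timestamp] = value

    def get_recent(self, user, column, n=1):
        """Random read: up to n (timestamp, value) pairs, newest first."""
        cells = self._rows.get(user, {}).get(column, {})
        return sorted(cells.items(), reverse=True)[:n]


table = UserTable()
table.put("bob", "purchases", 1, "The Hunger Games")
table.put("bob", "purchases", 2, "product Y")  # hypothetical second purchase
table.put("sally", "page_views", 3, "product page X")

# "What were Bob's most recent 5 purchases?"
print(table.get_recent("bob", "purchases", n=5))
# -> [(2, 'product Y'), (1, 'The Hunger Games')]
```

Because every version of a cell is kept under its own timestamp, a "most recent N" query is a read of a single row rather than a scan over all users.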
Analyzing Data: Producers
● produce() generates derived data for a single row:
  ○ recommend
  ○ profile
  ○ classify
  ○ etc.
Analyzing Data: Gatherers
● gather() aggregates data across all rows.
  ○ Build association rules for collaborative filtering.
  ○ Train classifier models.
  ○ Compute prior probabilities for events.
  ○ etc.
Example: Ad Targeting

User   Games                                 Interests  Recommended Ads
Alex   MiniGolf Pro, Extreme Pond Fishing
Bob    Kitten Krash
Carol  Apples Everywhere, Underground Racer

Game               Categories
MiniGolf Pro       Golf, Sports
Kitten Krash       Cats, Racing
Apples Everywhere  Puzzles
Example: Ad Targeting

User   Games                                 Interests     Recommended Ads
Alex   MiniGolf Pro, Extreme Pond Fishing    Golf, Sports
Bob    Kitten Krash
Carol  Apples Everywhere, Underground Racer

Game               Categories
MiniGolf Pro       Golf, Sports
Kitten Krash       Cats, Racing
Apples Everywhere  Puzzles

Producer: fills in Interests from each user's Games and the Game Categories table.
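The producer step in this example can be sketched as a function that sees one user's row and looks each game up in the category table. The data below is copied from the slide; the function name `produce_interests` and the dictionary layout are assumptions made for illustration, not the actual producer API.

```python
# Game -> categories, as shown in the slide's lookup table.
GAME_CATEGORIES = {
    "MiniGolf Pro": ["Golf", "Sports"],
    "Kitten Krash": ["Cats", "Racing"],
    "Apples Everywhere": ["Puzzles"],
}

# One row per user; only the games column is populated at first.
USERS = {
    "Alex": {"games": ["MiniGolf Pro", "Extreme Pond Fishing"]},
    "Bob": {"games": ["Kitten Krash"]},
    "Carol": {"games": ["Apples Everywhere", "Underground Racer"]},
}


def produce_interests(row):
    """Derive interests for a single row from the games that user plays."""
    interests = []
    for game in row["games"]:
        # Games missing from the lookup table contribute nothing.
        for category in GAME_CATEGORIES.get(game, []):
            if category not in interests:
                interests.append(category)
    return interests


# The producer runs independently on each row and writes its result back.
for user, row in USERS.items():
    row["interests"] = produce_interests(row)

print(USERS["Alex"]["interests"])  # -> ['Golf', 'Sports']
```

Since the function reads only one row plus a small shared lookup table, it can run over every row in a batch job or on a single row when a new event arrives.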
Example: Ad Targeting

User   Games                                 Interests     Recommended Ads
Alex   MiniGolf Pro, Extreme Pond Fishing    Golf, Sports  ESPN.com
Bob    Kitten Krash
Carol  Apples Everywhere, Underground Racer

Category  Advertisement
Golf      ESPN.com
Animals   Petco.com
Racing    Nascar.com

Producer: fills in Recommended Ads from Interests and the Category/Advertisement table.
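A second producer can turn interests into recommended ads in the same per-row fashion. This is a sketch under the assumption that the recommendation is a plain lookup of each interest in the category-to-advertisement table from the slide; `produce_recommended_ads` is a hypothetical name.

```python
# Category -> advertisement, as shown in the slide.
AD_FOR_CATEGORY = {
    "Golf": "ESPN.com",
    "Animals": "Petco.com",
    "Racing": "Nascar.com",
}


def produce_recommended_ads(row):
    """Derive recommended ads for a single row from that user's interests."""
    ads = []
    for category in row["interests"]:
        ad = AD_FOR_CATEGORY.get(category)
        # Skip categories with no associated ad and avoid duplicates.
        if ad is not None and ad not in ads:
            ads.append(ad)
    return ads


alex = {
    "games": ["MiniGolf Pro", "Extreme Pond Fishing"],
    "interests": ["Golf", "Sports"],
}
print(produce_recommended_ads(alex))  # -> ['ESPN.com']
```

The category-to-advertisement table is simply taken as given in this sketch.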
Wait, where did this come from?
Example: Gathering Associations

User   Games                                 Interests     Clicked Ads
Alex   MiniGolf Pro, Extreme Pond Fishing    Golf, Sports
Bob    Kitten Krash
Carol  Apples Everywhere, Underground Racer
[Figure: gathering associations, with a Map stage followed by a Reduce stage]
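The Map and Reduce stages can be sketched as follows. The slides do not show what is emitted, so this assumes the map step emits one count for every (interest, clicked ad) pair in a row, and the reduce step sums those counts and keeps the most-clicked ad per interest. All click data, the user "Dana", and every function name here are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical rows: interests plus the ads each user clicked.
ROWS = {
    "Alex": {"interests": ["Golf", "Sports"], "clicked_ads": ["ESPN.com"]},
    "Bob": {"interests": ["Cats", "Racing"], "clicked_ads": ["Nascar.com"]},
    "Dana": {"interests": ["Golf"], "clicked_ads": ["ESPN.com", "Nascar.com"]},
}


def gather(row):
    """Map step: emit ((interest, ad), 1) for each co-occurrence in one row."""
    for interest, ad in product(row["interests"], row["clicked_ads"]):
        yield (interest, ad), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (interest, ad) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def best_ad_per_interest(counts):
    """Keep the ad with the highest click count for each interest."""
    best = {}
    for (interest, ad), n in counts.items():
        if interest not in best or n > best[interest][1]:
            best[interest] = (ad, n)
    return {interest: ad for interest, (ad, _) in best.items()}


emitted = (pair for row in ROWS.values() for pair in gather(row))
counts = reduce_counts(emitted)
associations = best_ad_per_interest(counts)

print(counts[("Golf", "ESPN.com")])  # -> 2
print(associations["Golf"])          # -> ESPN.com
```

The resulting interest-to-ad mapping plays the role of the lookup table that the ad-targeting producer consults, which is one way the aggregate and per-row analyses can feed each other.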
Final Thoughts
● A user-centric data storage model has great advantages:
  ○ Fast per-user reads and writes.
  ○ Already pivoted by your most common analysis.
● HBase provides fast, reliable random access and scans.
  ○ Billions of rows, millions of columns.
  ○ Integrates well with MapReduce for analysis.
● Build scalable personalized applications with WibiData.
  ○ Check out www.wibidata.com
Garrett Wu | [email protected]