
Netflix and Beyond

Date post: 25-Feb-2016
Transcript
Page 1: Netflix and Beyond

Netflix and Beyond

Tuning Solr for great results.

Walter Underwood
http://wunderwood.org/most_casual_observer/

Page 2: Netflix and Beyond

Typical Web Query Mix

• informational
• navigational (known-site)
• transactional (known-item)

(Andrei Broder, AltaVista, 2002)

Page 3: Netflix and Beyond

“talking rat movie”

Page 4: Netflix and Beyond

Top Queries October 2006

• finding neverland
• bridget jones
• closer
• the incredibles
• incredibles
• ladder 49
• fat albert
• being julia
• ray
• national treasure
• alfie
• spanglish
• star wars
• meet the fockers
• final cut
• hotel rwanda
• neverland
• after the sunset
• million dollar baby
• hitch

Page 5: Netflix and Beyond

Netflix Queries

• 92% movie titles
• 5% genres and categories
• 3% people

Known-item queries make up 95% of Netflix traffic.

Page 6: Netflix and Beyond

Zipf Plot of Search Queries

[Figure: log(freq) plotted against log(rank) for search queries; x-axis ticks run from 0 to 5, y-axis ticks from 0 to 4.5]
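The points for a plot like this can be derived from a raw query log. The following is a minimal sketch, not the method used for the slide: the function name `zipf_points`, the lowercase/strip normalization, and the use of base-10 logarithms are all assumptions.

```python
import math
from collections import Counter


def zipf_points(queries):
    """Return (log10(rank), log10(freq)) pairs, most frequent query first.

    Normalization (strip + lowercase) is an illustrative assumption.
    """
    counts = Counter(q.strip().lower() for q in queries)
    freqs = sorted(counts.values(), reverse=True)
    return [
        (math.log10(rank), math.log10(freq))
        for rank, freq in enumerate(freqs, start=1)
    ]


if __name__ == "__main__":
    sample = ["ray", "ray", "Ray ", "closer", "hitch", "hitch"]
    for x, y in zipf_points(sample):
        print(f"{x:.3f}\t{y:.3f}")
```

A query log that follows Zipf's law shows up as a roughly straight, downward-sloping line when these points are plotted.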

Page 7: Netflix and Beyond

Problematic User Behavior

• One or two words?
• Partial words
• Misspellings

Page 8: Netflix and Beyond

One or Two Words?

Page 9: Netflix and Beyond

Partial Words

• People don’t like to make mistakes:
– rat, rata, ratat
– apoc
– koyaanisq
• Phonetic encoding (soundex) assumes complete words
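To see why a phonetic code struggles with partial words, here is a small sketch of the standard American Soundex rules. It is a textbook implementation written for illustration, not the encoder used at Netflix or in Solr; the function name `soundex` and the `CODES` table are introduced here.

```python
# Consonant-to-digit table for American Soundex; vowels, h, w, y have no code.
CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word):
    """Return the 4-character Soundex code for a word ("" if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            out += code
        # h and w do not separate repeated codes; vowels do.
        if c not in "hw":
            prev = code
    return (out + "000")[:4]


if __name__ == "__main__":
    for w in ["rat", "ratat", "ratatouille", "apoc", "apocalypto"]:
        print(w, soundex(w))
```

The partial word and the full title get different codes because the code depends on consonants the user has not typed yet, so an exact match on the code fails.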

Page 10: Netflix and Beyond

Autocomplete Finishes Words

• Load movie titles and popular people
• 10% improvement in search quality (MRR)
• 10X as much traffic as search queries
• Dedicated Solr with RAMDirectory
• Front-end HTTP cache, 1 hour lifetime, 80% hit rate
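The idea of completing a partial entry against a preloaded list can be sketched without Solr. This is an illustration only: the class `PrefixCompleter` and its methods are hypothetical, it keeps a sorted in-memory list rather than a Solr index, and it matches only on the start of the whole title.

```python
import bisect


class PrefixCompleter:
    """In-memory prefix lookup over a fixed list of titles (hypothetical helper)."""

    def __init__(self, titles):
        # Sort by lowercased key so lookups are case-insensitive.
        self._keys = sorted((t.lower(), t) for t in titles)

    def complete(self, prefix, limit=5):
        p = prefix.lower()
        # (p,) sorts before every (key, title) whose key starts with p.
        start = bisect.bisect_left(self._keys, (p,))
        results = []
        for key, title in self._keys[start:]:
            if not key.startswith(p):
                break
            results.append(title)
            if len(results) == limit:
                break
        return results


if __name__ == "__main__":
    completer = PrefixCompleter(
        ["Ratatouille", "Rat Race", "Apocalypto", "Koyaanisqatsi"]
    )
    for partial in ["rat", "ratat", "apoc", "koyaanisq"]:
        print(partial, "->", completer.complete(partial))
```

Because the list of titles changes slowly, responses for a given prefix are stable, which is what makes a front-end HTTP cache with a one-hour lifetime effective.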

Page 11: Netflix and Beyond

Some Misspellings

• shakespear
• the incredables
• seven samarai
• breakfast at tiffiney
• blazing sadles
• selen
• scorupko
• taeku
• christopher walkin
• return to lonsom dove
• teh matrix
• comdy tv
• pirhana
• dungens and dragons
• pufi yami
• al pachino
• incredables
• gundan seed mobile suit
• chatterluy
• white fany to the rsecue
• meet the faulkers
• brigette joes diary
• oh brother where are thou?
• pirartes of the carr

Page 12: Netflix and Beyond

Switch from Phonetic to Fuzzy

• Tested a dozen algorithms with users
• 250K queries per test cell
• JaroWinkler slightly better than Levenshtein
• JaroWinkler with 0.7 is a very, very broad match
– “koyaanisqatsi” matches “koy” (yuck!)
– but “1048” matches “1408”
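Both examples on this slide can be checked with a textbook Jaro-Winkler similarity. The sketch below is not Lucene's or Solr's implementation; the function names, the 0.1 prefix scale, and the 4-character prefix cap are the conventional defaults, assumed here, and 0.7 is treated as a minimum similarity.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix (Winkler's adjustment)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


if __name__ == "__main__":
    for a, b in [("koy", "koyaanisqatsi"), ("1048", "1408")]:
        print(a, b, round(jaro_winkler(a, b), 3))
```

Under these assumptions both pairs score above 0.7: the short prefix "koy" is rewarded because every one of its characters matches, and the transposed digits in "1048" cost only a small penalty.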

Page 13: Netflix and Beyond

Problematic Corpus Behavior

• Missing movies
– Ollie Hopnoodle’s Haven of Bliss
– CJ7
• Hard-to-spell names
– Ratatouille
– Coraline
– Inglourious Basterds
• Hard-to-remember names
– Click
– Apocalypto
– Seven Up Plus Seven

Page 14: Netflix and Beyond
Page 15: Netflix and Beyond

Metrics: MRR

• Mean Reciprocal Rank
• Weighted clickthrough, measured on site traffic
– #1 is a full click
– #2 is a half click
– #3 is one third click
– etc.
• Daily, weekly, and seasonal variations
• Overall customer satisfaction
• Good for A/B tests, weak for finding bugs
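The weighting above can be written down in a few lines. This is a minimal sketch: the function name and input shape are hypothetical, and counting a search with no click as zero is an assumption the slide does not state.

```python
def mean_reciprocal_rank(click_ranks):
    """Average of 1/rank over all searches.

    click_ranks holds one entry per search: the 1-based rank of the
    clicked result, or None when nothing was clicked (counted as 0,
    which is an assumption).
    """
    if not click_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in click_ranks if rank is not None)
    return total / len(click_ranks)


if __name__ == "__main__":
    # Clicks on #1, #2, #3, and one abandoned search.
    print(mean_reciprocal_rank([1, 2, 3, None]))
```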

Page 16: Netflix and Beyond

Per-query Metrics

• Useful for finding problems
• MRR
• Clickthrough percent
• Most-clicked rank (#1 is good)
• Percentage of clicks on most-clicked
– known-item queries are over 80%
– categories are under 50%
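These per-query numbers can all be computed from the same click data. The sketch below assumes a hypothetical input: for one query string, a list with the clicked rank of each search (None for no click). The function and key names are illustrative, and ties for most-clicked go to the better rank by choice.

```python
from collections import Counter


def per_query_metrics(click_ranks):
    """Compute the per-query metrics for one query string."""
    searches = len(click_ranks)
    if searches == 0:
        return None
    clicks = [r for r in click_ranks if r is not None]
    if clicks:
        counts = Counter(clicks)
        # Highest click count wins; ties go to the lower (better) rank.
        top_rank, top_count = min(
            counts.items(), key=lambda kv: (-kv[1], kv[0])
        )
        top_share = top_count / len(clicks)
    else:
        top_rank, top_share = None, 0.0
    return {
        "mrr": sum(1.0 / r for r in clicks) / searches,
        "clickthrough": len(clicks) / searches,
        "most_clicked_rank": top_rank,
        "most_clicked_share": top_share,
    }


if __name__ == "__main__":
    # Five searches for one query: three clicks on #1, one on #2, one abandoned.
    print(per_query_metrics([1, 1, 1, 2, None]))
```

A query whose most-clicked rank is not #1, or whose most-clicked share is low for what looks like a title search, is a candidate for manual inspection.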

Page 17: Netflix and Beyond

Success Looks Like …

• MRR consistently over 0.5
• 85% of clicks on #1

Page 18: Netflix and Beyond

Questions?

