Date posted: 27-Feb-2018
Category: Documents
Uploaded by: aman-pandey
7/25/2019 Discoverweeklydataengconf 151116205329 Lva1 App6892
1/50
From Idea to
Execution: Spotify's Discover Weekly
Chris Johnson :: @MrChrisJohnson
Edward Newett :: @scaladaze
DataEngConf NYC Nov 2015
Or: 5 lessons in building
recommendation products at scale
Who are We??
Chris Johnson Edward Newett
Spotify in Numbers
Started in 2006, now available in 58 markets
75+ Million active users, 20 Million paying subscribers
30+ Million songs, 20,000 new songs added per day
1.5 Billion user generated playlists
1 TB user data logged per day
1,700 node Hadoop cluster
10,000+ Hadoop jobs run daily
Challenge: 30M songs. How do we recommend music to users?
Discover
Radio
Related Artists
Discover Weekly
The Road to Discover Weekly
2013 :: Discover Page v1.0
Personalized News Feed of recommendations
Artists, Album Reviews, News Articles, New Releases, Upcoming Concerts, Social Recommendations, Playlists
Required a lot of attention and digging to engage with recommendations
No organization of content
2014 :: Discover Page v2.0
Recommendations grouped into strips (a la Netflix)
Limited to Albums and New Releases
More organized than News-Feed but still requires active interaction
Insight: users spending more time on editorial Browse playlists than Discover
Idea: combine the personalized experience
of Discover with the lean-back ease of Browse
Meanwhile 2014 Year In Music
Play it forward: same content as the Discover Page, but... a playlist
Lesson 1:
Be data driven from
start to finish
Slide from Dan McK
2008 2012 2015
Reach: How many users are you reaching?
Depth: For the users you reach, what is the depth of reach?
Retention: For the users you reach, how many do you retain?
Define success metrics BEFORE you release your test
Reach: DW WAU / Spotify WAU
Depth: DW Time Spent / Spotify WAU
Retention: DW week-over-week retention
Discover Weekly Key Success Metrics
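The three ratios above can be sketched in a few lines of Python. This is only an illustration, not Spotify's pipeline: the user IDs, listening times, and function names are invented, and week-over-week retention is assumed to mean the share of last week's Discover Weekly (DW) listeners who listened again this week.

```python
def reach(dw_users, all_users):
    """DW WAU / Spotify WAU."""
    return len(dw_users) / len(all_users)


def depth(dw_seconds_by_user, all_users):
    """DW time spent / Spotify WAU (seconds per active user)."""
    return sum(dw_seconds_by_user.values()) / len(all_users)


def retention(dw_users_last_week, dw_users_this_week):
    """Assumed definition: share of last week's DW users who came back."""
    returning = dw_users_last_week & dw_users_this_week
    return len(returning) / len(dw_users_last_week)


# Hypothetical toy data: five weekly active users, three of whom used DW.
spotify_wau = {"a", "b", "c", "d", "e"}
dw_last_week = {"a", "b", "c", "d"}
dw_this_week = {"a", "b", "e"}
dw_seconds = {"a": 600, "b": 300, "e": 100}

print(reach(dw_this_week, spotify_wau))       # 0.6
print(depth(dw_seconds, spotify_wau))         # 200.0
print(retention(dw_last_week, dw_this_week))  # 0.5
```

Writing the metrics as pure functions of user sets makes it easy to fix the definitions before a test goes live and to reuse them for every test cell.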
2008 2012 2015
Slide from Dan McK
Step 1: Prototype (employee test)
Results of Employee Test were very positive
2008 2012 2015
Slide from Dan McK
Step 2: Release AB Test to 1% of Users
Google Form 1% Results
Personalized image resulted in 10% lift in WAU
Initial 0.5% user test
1% Spaceman image
1% Personalized image
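A lift figure like the one above is a relative difference between two test cells. The sketch below shows one way to compute it and to check whether the difference is larger than noise. The user counts are invented to reproduce a 10% lift, and the two-proportion z-test is an assumption for illustration; the slides do not say which significance test was used.

```python
import math


def relative_lift(control_rate, treatment_rate):
    """Relative change of the treatment rate over the control rate."""
    return (treatment_rate - control_rate) / control_rate


def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference between two proportions (pooled)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


# Hypothetical cells: weekly active DW users out of users in each cell.
spaceman_active, spaceman_total = 3000, 10000
personalized_active, personalized_total = 3300, 10000

lift = relative_lift(spaceman_active / spaceman_total,
                     personalized_active / personalized_total)
z = two_proportion_z(spaceman_active, spaceman_total,
                     personalized_active, personalized_total)

print(f"lift = {lift:.1%}, z = {z:.2f}")
```

A z value above roughly 1.96 corresponds to significance at the 5% level for a two-sided test.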
Lesson 2: Reuse existing
infrastructure in creative ways
Discover Weekly Data Flow
Recommendation
Models
Implicit Matrix Factorization
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
Aggregate all (user, track) streams into a large matrix
Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices by minimizing the weighted RMSE (root mean squared error), using a function of plays, context, and recency as the weight
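Rows: Users. Columns: Songs. Factor matrices: X (users) and Y (songs).

$$\min_{x, y, \beta} \sum_{u, i} c_{ui} \left( p_{ui} - x_u^T y_i - \beta_u - \beta_i \right)^2 + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)$$

$c_{ui}$ = weight (a function of plays, context, and recency)
$\beta_u$ = bias for user
$\beta_i$ = bias for item
$\lambda$ = regularization parameter
$p_{ui}$ = 1 if user streamed track else 0
$x_u$ = user latent factor vector
$y_i$ = item latent factor vector
[1] Hu Y., Koren Y. & Volinsky C. (2008). Collaborative Filtering for Implicit Feedback Datasets. 8th IEEE International Conference on Data Mining.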
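The weighted-error goal described above can be tried on the small matrix from the slide. The sketch below is a toy, not the production model. It minimizes a weighted sum of squared errors plus an L2 penalty with plain full-batch gradient descent, whereas [1] uses alternating least squares. The weight 1 + ALPHA * p stands in for the function of plays, context, and recency, and K, ALPHA, LAMBDA, and STEP are assumed values.

```python
import random

# Binary preference matrix from the slide: rows are users, columns are songs.
P = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
]

K = 2          # number of latent factors (assumed)
ALPHA = 10.0   # extra weight on observed streams (assumed)
LAMBDA = 0.1   # regularization parameter (assumed)
STEP = 0.001   # gradient descent step size (assumed)


def weight(p):
    """Stand-in for the slide's function of plays, context, and recency."""
    return 1.0 + ALPHA * p


def predict(x_u, y_i, b_u, b_i):
    return sum(a * b for a, b in zip(x_u, y_i)) + b_u + b_i


def loss(X, Y, bu, bi):
    """Weighted squared error over every cell plus an L2 penalty."""
    total = 0.0
    for u, row in enumerate(P):
        for i, p in enumerate(row):
            err = p - predict(X[u], Y[i], bu[u], bi[i])
            total += weight(p) * err * err
    penalty = sum(v * v for vec in X + Y for v in vec)
    return total + LAMBDA * penalty


def gradient_step(X, Y, bu, bi):
    """One full-batch gradient descent update, applied in place."""
    gX = [[2.0 * LAMBDA * v for v in x] for x in X]
    gY = [[2.0 * LAMBDA * v for v in y] for y in Y]
    gbu = [0.0] * len(bu)
    gbi = [0.0] * len(bi)
    for u, row in enumerate(P):
        for i, p in enumerate(row):
            err = p - predict(X[u], Y[i], bu[u], bi[i])
            g = -2.0 * weight(p) * err
            for f in range(K):
                gX[u][f] += g * Y[i][f]
                gY[i][f] += g * X[u][f]
            gbu[u] += g
            gbi[i] += g
    for u in range(len(X)):
        X[u] = [v - STEP * d for v, d in zip(X[u], gX[u])]
        bu[u] -= STEP * gbu[u]
    for i in range(len(Y)):
        Y[i] = [v - STEP * d for v, d in zip(Y[i], gY[i])]
        bi[i] -= STEP * gbi[i]


def train(steps=1000, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in P]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in P[0]]
    bu = [0.0] * len(P)
    bi = [0.0] * len(P[0])
    start = loss(X, Y, bu, bi)
    for _ in range(steps):
        gradient_step(X, Y, bu, bi)
    return X, Y, bu, bi, start, loss(X, Y, bu, bi)
```

Calling `train()` returns the learned user vectors, item vectors, and biases along with the loss before and after training, so the two losses can be compared directly.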
Can also use Logistic Loss!
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
Aggregate all (user, track) streams into a large matrix
Goal: Model the probability of a user playing a song as logistic, then maximize the log likelihood of the binary preference matrix, weighting positive observations by a function of plays, context, and recency
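Rows: Users. Columns: Songs. Factor matrices: X (users) and Y (songs).

$$\max_{x, y, \beta} \sum_{u, i} \alpha r_{ui} \left( x_u^T y_i + \beta_u + \beta_i \right) - \left( 1 + \alpha r_{ui} \right) \log \left( 1 + \exp \left( x_u^T y_i + \beta_u + \beta_i \right) \right) - \frac{\lambda}{2} \lVert x_u \rVert^2 - \frac{\lambda}{2} \lVert y_i \rVert^2$$

$\alpha r_{ui}$ = weight on positive observations (a function of plays, context, and recency)
$\beta_u$ = bias for user
$\beta_i$ = bias for item
$\lambda$ = regularization parameter
$x_u$ = user latent factor vector
$y_i$ = item latent factor vector
[2] Johnson C. (2014). Logistic Matrix Factorization for Implicit Feedback Data. NIPS Workshop on Distributed Matrix Computations.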
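The logistic variant can be sketched on the same toy matrix. The code below is a rough illustration rather than the method of [2]. Each cell contributes ALPHA * p * z - (1 + ALPHA * p) * log(1 + exp(z)), where z is the user-item score, so positive observations carry extra weight. It uses plain gradient ascent with a single L2 penalty over all vectors, and K, ALPHA, LAMBDA, and STEP are assumed values.

```python
import math
import random

# Binary preference matrix from the slide: rows are users, columns are songs.
P = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
]

K = 2         # number of latent factors (assumed)
ALPHA = 2.0   # weight on positive observations (assumed)
LAMBDA = 0.1  # regularization parameter (assumed)
STEP = 0.02   # gradient ascent step size (assumed)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x_u, y_i, b_u, b_i):
    return sum(a * b for a, b in zip(x_u, y_i)) + b_u + b_i


def objective(X, Y, bu, bi):
    """Weighted log likelihood minus an L2 penalty."""
    total = 0.0
    for u, row in enumerate(P):
        for i, p in enumerate(row):
            z = score(X[u], Y[i], bu[u], bi[i])
            total += ALPHA * p * z - (1.0 + ALPHA * p) * softplus(z)
    penalty = sum(v * v for vec in X + Y for v in vec)
    return total - 0.5 * LAMBDA * penalty


def ascent_step(X, Y, bu, bi):
    """One full-batch gradient ascent update, applied in place."""
    gX = [[-LAMBDA * v for v in x] for x in X]
    gY = [[-LAMBDA * v for v in y] for y in Y]
    gbu = [0.0] * len(bu)
    gbi = [0.0] * len(bi)
    for u, row in enumerate(P):
        for i, p in enumerate(row):
            z = score(X[u], Y[i], bu[u], bi[i])
            g = ALPHA * p - (1.0 + ALPHA * p) * sigmoid(z)
            for f in range(K):
                gX[u][f] += g * Y[i][f]
                gY[i][f] += g * X[u][f]
            gbu[u] += g
            gbi[i] += g
    for u in range(len(X)):
        X[u] = [v + STEP * d for v, d in zip(X[u], gX[u])]
        bu[u] += STEP * gbu[u]
    for i in range(len(Y)):
        Y[i] = [v + STEP * d for v, d in zip(Y[i], gY[i])]
        bi[i] += STEP * gbi[i]


def train(steps=500, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in P]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in P[0]]
    bu = [0.0] * len(P)
    bi = [0.0] * len(P[0])
    start = objective(X, Y, bu, bi)
    for _ in range(steps):
        ascent_step(X, Y, bu, bi)
    return X, Y, bu, bi, start, objective(X, Y, bu, bi)
```

After training, `sigmoid(score(...))` for a user and a song can be read as the modeled probability that the user plays that song.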
NLP Models on News and Blogs
NLP Models work great on Playlists
Playlist itself is a document
Songs in playlist are words
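Treating a playlist as a document and its songs as words means any text-similarity technique can be pointed at playlists. The sketch below uses a deliberately simple stand-in, co-occurrence counts compared by cosine similarity, rather than whichever NLP models the team actually ran. The playlists and song names are invented.

```python
import math
from collections import Counter

# Hypothetical playlists: each one is a "document" whose "words" are songs.
playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b", "song_d"],
    ["song_e", "song_f"],
    ["song_e", "song_f", "song_c"],
]


def cooccurrence_vectors(playlists):
    """Map each song to counts of the songs it shares a playlist with."""
    vectors = {}
    for playlist in playlists:
        songs = set(playlist)
        for song in songs:
            context = vectors.setdefault(song, Counter())
            for other in songs:
                if other != song:
                    context[other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(song, vectors, top_n=3):
    """Songs whose playlist contexts look most like the given song's."""
    target = vectors.get(song, Counter())
    scores = {
        other: cosine(target, vec)
        for other, vec in vectors.items()
        if other != song
    }
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_n]


vectors = cooccurrence_vectors(playlists)
print(most_similar("song_a", vectors))
```

Songs that keep the same company across playlists end up with similar vectors, which is the same distributional idea that word-embedding models exploit on text.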
Deep Learning on Audio
[3] http://benanne.github.io/2014/08/05/spotify-cnns.html
Songs in a Latent Space representation
normalized item-vectors
user-vector in same space
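Once songs and users live in the same latent space, recommending reduces to finding the item vectors that point in the direction of the user vector. The sketch below assumes both kinds of vector are already available; the two-dimensional values, track names, and the choice to filter out already played tracks are illustrative assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def recommend(user_vector, normalized_items, already_played=(), top_n=2):
    """Rank unplayed tracks by cosine similarity to the user vector."""
    user = normalize(user_vector)
    scores = {
        track: dot(user, vec)
        for track, vec in normalized_items.items()
        if track not in already_played
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical 2-dimensional item vectors; real models use many more factors.
item_vectors = {
    "track_a": [3.0, 4.0],
    "track_b": [1.0, 0.0],
    "track_c": [-2.0, 1.0],
    "track_d": [0.0, 2.0],
}
normalized_items = {t: normalize(v) for t, v in item_vectors.items()}

print(recommend([1.0, 2.0], normalized_items))
print(recommend([1.0, 2.0], normalized_items, already_played={"track_a"}))
```

Normalizing the item vectors once up front means each score is a single dot product, and popular tracks with large vector norms do not dominate the ranking.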
Lesson 3: Don't scale until
you need to
Scaling to 100%: Rollout Challenges
- Create and publish 75M playlists every week
- Downloading and processing Facebook images
- Language translations
Scaling to 100%: Weekly refresh
- Time sensitive updates
- Refresh 75M playlists every Sunday night
- Take timezones into account
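Taking timezones into account means a playlist's deadline depends on where its owner is. The sketch below computes, for a given moment, when a user's next refresh is due in UTC. It assumes the deadline is local Monday 00:00 (the end of Sunday night) and uses fixed UTC offsets, which ignore daylight saving time. The function name and both assumptions are illustrative and not taken from the talk.

```python
from datetime import datetime, timedelta, timezone


def next_refresh_utc(now_utc, utc_offset_hours):
    """UTC instant of the user's next local Monday 00:00."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(local_zone)
    days_ahead = (7 - local_now.weekday()) % 7  # Monday is weekday 0
    candidate = (local_now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if candidate <= local_now:
        # It is already Monday locally, so aim for next week's refresh.
        candidate += timedelta(days=7)
    return candidate.astimezone(timezone.utc)


# Friday 13 Nov 2015, 12:00 UTC
now = datetime(2015, 11, 13, 12, 0, tzinfo=timezone.utc)
for offset in (9, 0, -5):
    print(offset, next_refresh_utc(now, offset).isoformat())
```

Sorting users by this deadline gives an order in which to publish, so that users in the earliest timezones are served first.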
Discover Weekly publishing flow
What's next? Iterating on content quality and interface
enhancements
Iterating on quality and adding a feedback loop
DW feedback comes at the expense of presentation bias
Lesson 4: Users know best. In the
end, AB Test everything!
Lesson 5 (final lesson!): Empower bottom-up
innovation in your org and
amazing things will happen.
Thank You! (btw, we're hiring Machine Learning and Data Engineers, come chat with us!)