Discoverweeklydataengconf 151116205329 Lva1 App6892

Date post: 27-Feb-2018
Upload: aman-pandey

Transcript
  • 7/25/2019 Discoverweeklydataengconf 151116205329 Lva1 App6892

    1/50

    From Idea to Execution: Spotify's Discover Weekly

    Chris Johnson :: @MrChrisJohnson
    Edward Newett :: @scaladaze

    DataEngConf NYC Nov 2015

    Or: 5 lessons in building recommendation products at scale

    2/50

    Who are We??

    Chris Johnson Edward Newett

    3/50

    Spotify in Numbers

    Started in 2006, now available in 58 markets

    75+ million active users, 20 million paying subscribers

    30+ million songs, 20,000 new songs added per day

    1.5 billion user-generated playlists

    1 TB user data logged per day

    1,700-node Hadoop cluster

    10,000+ Hadoop jobs run daily

    4/50

    Challenge: with 30M songs, how do we recommend music to users?

    5/50

    Discover

    6/50

    Radio

    7/50

    Related Artists

    8/50

    Discover Weekly

    9/50

    The Road to Discover Weekly

    10/50

    2013 :: Discover Page v1.0

    Personalized News Feed of recommendations

    Artists, Album Reviews, News Articles, New Releases, Upcoming Concerts, Social Recommendations, Playlists

    Required a lot of attention and digging to engage with recommendations

    No organization of content

    11/50

    2014 :: Discover Page v2.0

    Recommendations grouped into strips (a la Netflix)

    Limited to Albums and New Releases

    More organized than News Feed, but still requires active interaction

    12/50

    Insight: users spending more time on editorial Browse playlists than Discover

    13/50

    Idea: combine the personalized experience of Discover with the lean-back ease of Browse

    14/50

    Meanwhile: 2014 Year In Music

    15/50

    Play it forward: same content as the Discover Page, but... a playlist

    16/50

    Lesson 1:

    Be data driven from

    start to finish

    17/50

    Slide from Dan McK

    2008 2012 2015

    18/50

    Reach: How many users are you reaching?

    Depth: For the users you reach, what is the depth of reach?

    Retention: For the users you reach, how many do you retain?

    Define success metrics BEFORE you release your test

    19/50

    Reach: DW WAU / Spotify WAU

    Depth: DW Time Spent / Spotify WAU

    Retention: DW week-over-week retention

    Discover Weekly Key Success Metrics
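
As an illustration only, the three ratios on this slide could be computed from weekly roll-ups along the following lines. The `WeeklyUsage` record and its field names are hypothetical (they do not come from the talk), and retention is assumed here to mean the share of last week's Discover Weekly users who return this week.

```python
from dataclasses import dataclass


@dataclass
class WeeklyUsage:
    """Hypothetical weekly roll-up; field names are illustrative only."""
    spotify_wau: int    # users active anywhere on Spotify this week
    dw_users: set       # ids of users who streamed from Discover Weekly (DW)
    dw_seconds: float   # total listening time inside Discover Weekly


def reach(week):
    # Reach: DW WAU / Spotify WAU
    return len(week.dw_users) / week.spotify_wau


def depth(week):
    # Depth: DW time spent / Spotify WAU (the ratio as written on the slide)
    return week.dw_seconds / week.spotify_wau


def retention(prev, cur):
    # Assumed definition: share of last week's DW users who came back this week
    if not prev.dw_users:
        return 0.0
    return len(prev.dw_users & cur.dw_users) / len(prev.dw_users)
```

Fixing these definitions in code before launch is one way to make sure every test cell is scored with the same yardstick.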

    20/50

    2008 2012 2015

    Slide from Dan McK

    21/50

    Step 1: Prototype (employee test)

    22/50

    Step 1: Prototype (employee test)

    23/50

    Results of Employee Test were very positive

    24/50

    2008 2012 2015

    Slide from Dan McK

    25/50

    Step 2: Release AB Test to 1% of Users

    26/50

    Google Form 1% Results

    27/50

    Personalized image resulted in 10% lift in WAU

    Initial 0.5% user test

    1% Spaceman image

    1% Personalized image
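
A minimal sketch of how a lift like the one on this slide might be checked, assuming each test cell reports how many of its users were weekly active. The counts below are made up for illustration and are not figures from the talk; the two-proportion z-test is a standard choice swapped in here, not necessarily the analysis the speakers ran.

```python
import math


def relative_lift(control_rate, treatment_rate):
    """Relative change of the treatment rate over the control rate."""
    return (treatment_rate - control_rate) / control_rate


def two_proportion_z(active_a, n_a, active_b, n_b):
    """z statistic for the difference between two proportions (pooled variance)."""
    p_a, p_b = active_a / n_a, active_b / n_b
    pooled = (active_a + active_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Hypothetical counts, not figures from the talk
spaceman_active, spaceman_n = 20_000, 100_000   # cell shown the spaceman image
personal_active, personal_n = 22_000, 100_000   # cell shown the personalized image

lift = relative_lift(spaceman_active / spaceman_n, personal_active / personal_n)
z = two_proportion_z(spaceman_active, spaceman_n, personal_active, personal_n)
```

With cells this large, a 10% relative lift sits far outside what sampling noise alone would produce; with much smaller cells the same lift could easily be noise.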

    28/50

    Lesson 2: Reuse existing

    infrastructure in creative ways

    29/50

    Discover Weekly Data Flow

    30/50

    Recommendation Models

    Implicit Matrix Factorization

    31/50

    1 0 0 0 1 0 0 1
    0 0 1 0 0 1 0 0
    1 0 1 0 0 0 1 1
    0 1 0 0 0 1 0 0
    0 0 1 0 0 1 0 0
    1 0 0 0 1 0 0 1

    Aggregate all (user, track) streams into a large matrix

    Goal: Approximate binary preference matrix by inner product of 2 smaller matrices by minimizing the weighted RMSE (root mean squared error) using a function of plays, context, and recency as weight

    (Users x Songs) ≈ X Y

    β_u = bias for user

    β_i = bias for item

    λ = regularization parameter

    p_ui = 1 if user streamed track else 0

    x_u = user latent factor vector

    y_i = item latent factor vector
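
The objective itself did not survive in this transcript. A plausible reconstruction, following the formulation in [1] with the bias terms listed in the legend added, is sketched below. The weight symbol c_ui (the "function of plays, context, and recency") and the bias symbols β_u and β_i are assumptions; p_ui is the binary preference, x_u and y_i are the user and item latent factor vectors, and λ is the regularization parameter.

```latex
\[
\min_{x,\,y,\,\beta}\;
\sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i - \beta_u - \beta_i\bigr)^{2}
\;+\; \lambda \Bigl(\sum_{u} \lVert x_u \rVert^{2} + \sum_{i} \lVert y_i \rVert^{2}\Bigr)
\]
```

Every (user, track) cell contributes to the sum, including the zeros; the weight c_ui is what lets heavily played tracks count for more than tracks the user never streamed.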

    [1] Hu Y., Koren Y. & Volinsky C. (2008). Collaborative Filtering for Implicit Feedback Datasets. 8th IEEE International Conference on Data Mining.

    Implicit Matrix Factorization
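
To make the idea concrete, here is a small pure-Python sketch that factorizes the 6 x 8 example matrix from this slide with stochastic gradient descent on a weighted squared error. It is not the production approach described in the talk: the weight `1 + alpha * p` is an assumed stand-in for the "function of plays, context, and recency", and all hyperparameters are illustrative.

```python
import random

# Binary preference matrix from the slide: rows are users, columns are songs.
P = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def weighted_loss(P, X, Y, bu, bi, alpha, lam):
    """Weighted squared error plus an L2 penalty on the latent factors."""
    err = 0.0
    for u, row in enumerate(P):
        for i, p in enumerate(row):
            c = 1.0 + alpha * p  # assumed weight: streamed cells count more
            err += c * (p - dot(X[u], Y[i]) - bu[u] - bi[i]) ** 2
    reg = sum(dot(x, x) for x in X) + sum(dot(y, y) for y in Y)
    return err + lam * reg


def factorize(P, k=2, alpha=4.0, lam=0.05, lr=0.02, epochs=300, seed=0):
    """SGD over every cell of P; returns user factors, item factors, biases."""
    rng = random.Random(seed)
    n_users, n_items = len(P), len(P[0])
    X = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    bu, bi = [0.0] * n_users, [0.0] * n_items
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                p = P[u][i]
                c = 1.0 + alpha * p
                e = p - dot(X[u], Y[i]) - bu[u] - bi[i]  # residual for this cell
                for f in range(k):
                    xu, yi = X[u][f], Y[i][f]
                    X[u][f] += lr * (c * e * yi - lam * xu)
                    Y[i][f] += lr * (c * e * xu - lam * yi)
                bu[u] += lr * c * e
                bi[i] += lr * c * e
    return X, Y, bu, bi
```

At Spotify's scale the paper cited as [1] uses alternating least squares rather than per-cell SGD; the sketch only shows what is being fitted, not how to fit it efficiently.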

    Can also use Logistic Loss!

    32/50

    1 0 0 0 1 0 0 1
    0 0 1 0 0 1 0 0
    1 0 1 0 0 0 1 1
    0 1 0 0 0 1 0 0
    0 0 1 0 0 1 0 0
    1 0 0 0 1 0 0 1

    Aggregate all (user, track) streams into a large matrix

    Goal: Model probability of user playing a song as logistic, then maximize log likelihood of binary preference matrix, weighting positive observations by a function of plays, context, and recency

    (Users x Songs) ≈ X Y

    β_u = bias for user

    β_i = bias for item

    λ = regularization parameter

    x_u = user latent factor vector

    y_i = item latent factor vector
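
The slide's formulas are missing from this transcript. As a hedged reconstruction in the spirit of [2], the play probability and the regularized log likelihood might be written as follows. The symbol c_ui for the weight on positive observations is an assumption (zero when the user never played the track), as are the bias symbols β_u and β_i; l_ui denotes the event that user u plays track i.

```latex
\[
p(l_{ui} = 1 \mid x_u, y_i, \beta_u, \beta_i)
  = \frac{\exp(x_u^{\top} y_i + \beta_u + \beta_i)}
         {1 + \exp(x_u^{\top} y_i + \beta_u + \beta_i)}
\]

\[
\max_{x,\,y,\,\beta}\;
\sum_{u,i} \Bigl[ c_{ui}\,(x_u^{\top} y_i + \beta_u + \beta_i)
  - (1 + c_{ui}) \log\bigl(1 + \exp(x_u^{\top} y_i + \beta_u + \beta_i)\bigr) \Bigr]
  - \frac{\lambda}{2} \sum_{u} \lVert x_u \rVert^{2}
  - \frac{\lambda}{2} \sum_{i} \lVert y_i \rVert^{2}
\]
```

Each cell is treated as c_ui positive observations plus one negative observation, which is why the weight appears both on the linear term and inside the factor (1 + c_ui).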

    [2] Johnson C. (2014). Logistic Matrix Factorization for Implicit Feedback Data. NIPS Workshop on Distributed Matrix Computations.

    Can also use Logistic Loss!

    NLP Models on News and Blogs

    33/50

    NLP Models on News and Blogs

    NLP Models work great on Playlists

    34/50

    Playlist itself is a document

    Songs in playlist are words

    NLP Models work great on Playlists
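
The "playlist as document, song as word" analogy can be sketched without any NLP library. The example below is a deliberately simple stand-in: it builds co-occurrence count vectors over a sliding window and compares tracks by cosine similarity, rather than training the embedding models the talk alludes to. The playlists and track ids are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical playlists; each one is treated as a "document" of track ids.
playlists = [
    ["track_a", "track_b", "track_c"],
    ["track_a", "track_b", "track_d"],
    ["track_e", "track_f"],
    ["track_e", "track_f", "track_g"],
]


def cooccurrence_vectors(playlists, window=2):
    """For every track, count the tracks that appear within `window` positions."""
    vectors = defaultdict(Counter)
    for tracks in playlists:
        for pos, track in enumerate(tracks):
            lo, hi = max(0, pos - window), min(len(tracks), pos + window + 1)
            for other in tracks[lo:pos] + tracks[pos + 1:hi]:
                vectors[track][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(track, vectors, top_n=3):
    """Rank the other tracks by similarity of their playlist contexts."""
    target = vectors.get(track, Counter())
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != track]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


vectors = cooccurrence_vectors(playlists)
neighbours = most_similar("track_c", vectors)
```

Two tracks that never share a playlist can still come out as similar if they keep the same company, which is the property that makes the document analogy useful.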

    Deep Learning on Audio

    35/50

    [3] http://benanne.github.io/2014/08/05/spotify-cnns.html

    Deep Learning on Audio

    Songs in a Latent Space representation

    36/50

    normalized item-vectors

    Songs in a Latent Space representation

    37/50

    user-vector in same space

    Songs in a Latent Space representation
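
Once songs and users live in the same latent space, recommending can be as simple as ranking normalized item vectors by their dot product with the user vector. The sketch below assumes tiny hand-written 2-d vectors and a hypothetical `already_played` filter; real vectors would come from models like the ones on the earlier slides.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def recommend(user_vec, item_vecs, already_played, top_n=2):
    """Rank unplayed items by cosine similarity to the user vector."""
    user = normalize(user_vec)
    scores = {}
    for item, vec in item_vecs.items():
        if item in already_played:
            continue  # only surface tracks the user has not heard yet
        scores[item] = sum(u * v for u, v in zip(user, normalize(vec)))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical 2-d latent vectors, for illustration only
item_vecs = {
    "song_a": [0.9, 0.1],
    "song_b": [0.8, 0.3],
    "song_c": [-0.2, 1.0],
    "song_d": [1.0, 0.0],
}
user_vec = [1.0, 0.2]
picks = recommend(user_vec, item_vecs, already_played={"song_a"})
```

A brute-force scan like this is fine for a handful of songs; at catalogue scale an approximate nearest-neighbour index would normally replace the loop.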

    38/50

    Lesson 3: Don't scale until

    you need to

    39/50

    Scaling to 100%: Rollout Challenges

    • Create and publish 75M playlists every week

    • Downloading and processing Facebook images

    • Language translations

    40/50

    Scaling to 100%: Weekly refresh

    • Time-sensitive updates

    • Refresh 75M playlists every Sunday night

    • Take timezones into account
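
One way to reason about the timezone constraint is to compute, per user, the UTC instant by which the new playlist has to be in place. The sketch below assumes the deadline is Monday 00:00 in the user's local time (an assumption; the slide only says "Sunday night") and uses fixed UTC offsets for simplicity. A real `zoneinfo.ZoneInfo` object could be passed as `user_tz` instead.

```python
from datetime import datetime, timedelta, timezone


def next_monday_midnight_utc(now_utc, user_tz):
    """Return the UTC instant of the next Monday 00:00 in the user's timezone."""
    local_now = now_utc.astimezone(user_tz)
    # Monday is weekday 0; if it is already Monday, target the following one.
    days_ahead = (7 - local_now.weekday()) % 7 or 7
    local_midnight = (local_now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return local_midnight.astimezone(timezone.utc)


# Fixed offsets stand in for real timezone rules in this illustration.
now = datetime(2015, 11, 14, 12, 0, tzinfo=timezone.utc)       # a Saturday
stockholm_deadline = next_monday_midnight_utc(now, timezone(timedelta(hours=1)))
new_york_deadline = next_monday_midnight_utc(now, timezone(timedelta(hours=-5)))
```

Sorting users by this deadline gives a natural publishing order: eastern timezones first, with the batch job finishing each region before its local Monday begins.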

    41/50

    Discover Weekly publishing flow

    42/50

    43/50

    44/50

    45/50

    What's next? Iterating on content quality and interface enhancements

    46/50

    Iterating on quality and adding a feedback loop

    47/50

    DW feedback comes at the expense of presentation bias

    48/50

    Lesson 4: Users know best. In the

    end, AB Test everything!

    49/50

    Lesson 5 (final lesson!): Empower bottom-up

    innovation in your org and

    amazing things will happen.

    50/50

    Thank You! (btw, we're hiring Machine Learning and Data Engineers, come chat with us!)

