
Issuu Talk on Topic Models and Recommendation Systems

Date post: 31-Mar-2016
Upload: arngren
Description:
Issuu gave a talk at the Data Science and Machine Learning Meetup in Copenhagen, Nov. 2013.
Transcript
Page 1: Issuu Talk on Topic Models and Recommendation Systems

Topic Models Recommendations

Morten Arngren, Senior Data Scientist

Page 2: Issuu Talk on Topic Models and Recommendation Systems

About

Topic Modelling

Recommendations

Page 3: Issuu Talk on Topic Models and Recommendation Systems

"…YouTube for Publications…"

Page 4: Issuu Talk on Topic Models and Recommendation Systems

Started in 2006 by 5 dudes.

2013

15M publications (free)

7.5B page views / month

340M pages (25 km²)

83M unique visitors / month

Page 5: Issuu Talk on Topic Models and Recommendation Systems

Data Science Team (Copenhagen)

ML Gadgets

12x 2.6GHz

96GB RAM

2TB SSD

2TB Hard Drive

Amazon Web Services

Morten Arngren
Ph.D. in Machine Learning and AI (2011)
M.Sc.A.M. (2007)
B.Sc.E.E. (1997)
ISSUU, Data Scientist (2011 - present)
DTU & FOSS Analytical, Machine Learning in Food Quality (2008-2011)
Nokia Mobile Phones, Digital Signal Processing (2000-2007)
Alcatel Space Denmark, Building Rockets (1997-2000)

Andrius Butkus
Ph.D. in Digital Media Personalisation (2009)
M.Sc.E.E. (2004)
B.Sc.E.E. (2002)
ISSUU, Data Scientist (2011 - present)
DTU External Lecturer, Human Computer Interaction (2010 - present)
DTU Assistant Professor, Digital Media Engineering (2008-2010)

Page 6: Issuu Talk on Topic Models and Recommendation Systems

Data

Page 7: Issuu Talk on Topic Models and Recommendation Systems

Data

Page 8: Issuu Talk on Topic Models and Recommendation Systems

[Diagram: publication processing pipeline]

40k Pubs / Day

Layout (Quantify text and image boxes)

Article Extraction

OCR

Image: Cover Analysis, Explicit Detection, Doc. Type Classification

Text: Detect Language (56), Translate to English (from 24 languages), LDA Topics

Page Content

DB

Page 9: Issuu Talk on Topic Models and Recommendation Systems

Reader Activity

[Diagram: reading sessions of the example reader "Birdie Nam Nam" along a time axis, stored in the DB]

200GB / Day

Page 10: Issuu Talk on Topic Models and Recommendation Systems

Topic Modelling

Page 11: Issuu Talk on Topic Models and Recommendation Systems

LATENT DIRICHLET ALLOCATION

150 topics (preset parameter)

Topic model based on Bag-of-Words Data

http://radimrehurek.com/gensim/

Wikipedia Training Data: ~4.5M Single Articles (Pure Topics)

Example topics: arabic, Australia, history, business, islands, environment, hotels, poetic, food, design, arts, plants, animals

[Figure: Topic Distribution over topics 1 to 150]

D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
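As a rough illustration of the bag-of-words input named on this slide, the sketch below turns raw text into sparse (token id, count) pairs using only the standard library. The tokenizer, the helper names (`tokenize`, `build_vocabulary`, `to_bag_of_words`) and the two sample sentences are assumptions made for this example; the talk points to gensim for the actual model, and none of its API is reproduced here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (a deliberate simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Assign an integer id to every distinct token, in sorted order for determinism."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_bag_of_words(text, vocabulary):
    """Return sorted (token_id, count) pairs; word order is discarded."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocabulary)
    return sorted((vocabulary[tok], n) for tok, n in counts.items())


# Hypothetical sample documents, not taken from the talk.
docs = ["Islands and hotels", "Hotels near the islands and more hotels"]
vocab = build_vocabulary(docs)
bows = [to_bag_of_words(d, vocab) for d in docs]
```

A topic model such as LDA would then be trained on a corpus of such sparse vectors, with the number of topics (150 in the talk) fixed in advance.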

Page 12: Issuu Talk on Topic Models and Recommendation Systems

LATENT DIRICHLET ALLOCATION

Properties: each topic weight is in [0:1] and the weights sum to 1 (Σ = 1)

[Figure: Issuu Publications placed in LDA Space]
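The two properties above (every weight in [0:1], weights summing to 1) can be checked mechanically, and they also make it natural to compare two publications by a distance between distributions. The sketch below is only an illustration: the Hellinger distance is chosen here as one reasonable option and is not stated in the talk, and the function names and three-topic vectors are hypothetical.

```python
import math


def is_topic_distribution(theta, tol=1e-9):
    """True if every weight lies in [0, 1] and the weights sum to 1."""
    in_range = all(-tol <= p <= 1.0 + tol for p in theta)
    return in_range and abs(sum(theta) - 1.0) <= tol


def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def hellinger(p, q):
    """Hellinger distance between two distributions; 0 means identical, 1 means disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Hypothetical three-topic vectors for two publications.
doc_a = [0.5, 0.4, 0.1]
doc_b = normalize([2, 1, 1])
distance = hellinger(doc_a, doc_b)
```

Because the vectors live on a simplex rather than in unconstrained space, a distance defined for probability distributions is often a more natural fit than a plain Euclidean one, though either can be used.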

Page 13: Issuu Talk on Topic Models and Recommendation Systems

TOPIC CATEGORIES

(Learning from Wikipedia Dataset)

~4.5 Mio.

~9 Mio.

Density distribution not the same

Empty locations in LDA space.

Topic labels: Travel, Cocktails, Chemistry, Botanics, Drinks, Dancing

0.5 Travel, 0.4 Sports, 0.1

Page 14: Issuu Talk on Topic Models and Recommendation Systems

Recommendation System

Page 15: Issuu Talk on Topic Models and Recommendation Systems

READER ACTIVITY

Extract Implicit Rating…?

No Explicit Rating…

[Diagram: reading activity of "Birdie Nam Nam" along a time axis]

Page 16: Issuu Talk on Topic Models and Recommendation Systems

Session {
  UserName: 'Birdie-Nam-Nam'
  DocID: xxx-xxxxx
  Pages:
    1: [250, 725, 569, 134, ...]
    2: [1056, 1259, ...]
    3: [1056, 1259, ...]
    4: [102, 356, 208, 438]
    5: [102, 356, 208, 438]
    6: [5250, 3567, 809]
    7: [5250, 3567, 809]
    ...
  TimeStamp: 1378935850
  DocID: yyy-yyyyy
}

Pages: [1,2,3,6,7]
ReadTime: 25789 ms.
TimeStamp: 1378935850

Browsing or Reading?
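One way to approach the "Browsing or Reading?" question is to look at the average time spent per viewed page and turn sessions that look like reading into an implicit rating. The sketch below assumes this approach for illustration only; the talk does not say how Issuu makes the distinction. The 3000 ms threshold, the coverage-based rating, the 10-page document length, and all names (`Session`, `ms_per_page`, `is_reading`, `implicit_rating`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    pages: list        # page numbers viewed in the session
    read_time_ms: int  # total read time in milliseconds
    timestamp: int     # Unix timestamp of the session


def ms_per_page(session):
    """Average time per viewed page; 0.0 for an empty session."""
    if not session.pages:
        return 0.0
    return session.read_time_ms / len(session.pages)


def is_reading(session, min_ms_per_page=3000):
    """Assumed rule: slow page turns count as reading, fast ones as browsing."""
    return ms_per_page(session) >= min_ms_per_page


def implicit_rating(session, doc_pages, min_ms_per_page=3000):
    """Assumed rating: share of the document covered, or 0.0 when only browsing."""
    if doc_pages <= 0:
        raise ValueError("doc_pages must be positive")
    if not is_reading(session, min_ms_per_page):
        return 0.0
    return min(1.0, len(set(session.pages)) / doc_pages)


# The summarized session from the slide; the 10-page document length is made up.
session = Session(pages=[1, 2, 3, 6, 7], read_time_ms=25789, timestamp=1378935850)
rating = implicit_rating(session, doc_pages=10)
```

With these assumed settings the slide's session averages about 5.2 seconds per page and is treated as reading; a lower or higher threshold would shift that boundary.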

[Diagram: Readers x Publications matrix]

Page 17: Issuu Talk on Topic Models and Recommendation Systems

Item2Item Matrix

Reader indexed learning

Pages: [1,6,7,10,11]
ReadTime: 11250 ms.
TimeStamp: 1385437850

[Figure: read times along a time axis in weeks]

Decay function (decay per week)
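A minimal sketch of how reader-indexed learning with a weekly decay might look: each read is weighted by its age in weeks, and every pair of publications read by the same reader adds to a symmetric item-to-item score. The exponential form, the 0.9 factor, the product used to combine two weights, and the names `decay_weight` and `build_item2item` are assumptions for illustration; the slide only names a decay function with a per-week decay.

```python
from collections import defaultdict
from itertools import combinations

SECONDS_PER_WEEK = 7 * 24 * 3600


def decay_weight(read_ts, now_ts, decay_per_week=0.9):
    """Assumed exponential decay: the weight shrinks by a fixed factor per week of age."""
    weeks = max(0.0, (now_ts - read_ts) / SECONDS_PER_WEEK)
    return decay_per_week ** weeks


def build_item2item(reads, now_ts, decay_per_week=0.9):
    """Build a symmetric item-to-item score table from (reader, doc_id, timestamp) reads."""
    by_reader = defaultdict(dict)
    for reader, doc, ts in reads:
        weight = decay_weight(ts, now_ts, decay_per_week)
        # Keep the freshest (largest) weight if a reader opened a document twice.
        by_reader[reader][doc] = max(weight, by_reader[reader].get(doc, 0.0))

    matrix = defaultdict(float)
    for docs in by_reader.values():
        for a, b in combinations(sorted(docs), 2):
            score = docs[a] * docs[b]
            matrix[(a, b)] += score
            matrix[(b, a)] += score
    return dict(matrix)


# The two timestamps come from the slides; the document ids are placeholders.
now = 1385437850
reads = [
    ("Birdie-Nam-Nam", "doc-a", 1378935850),
    ("Birdie-Nam-Nam", "doc-b", 1385437850),
]
item2item = build_item2item(reads, now)
```

Iterating reader by reader keeps the work proportional to each reader's history instead of requiring a comparison of every pair of publications in the catalogue.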

Page 18: Issuu Talk on Topic Models and Recommendation Systems

RECOMMENDING

Read History

Likes

Stacks

Item2Item Matrix

Item Matrix Weight Mapping Function

[Figure: read history items along a time axis]

Page 19: Issuu Talk on Topic Models and Recommendation Systems

RECOMMENDING

Item Matrix Weight Mapping Function

Item Weights

Weighted Sampling
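The last two steps on this slide, mapping the item matrix to item weights and then sampling, could be sketched as follows. This is an illustration under stated assumptions, not the talk's mapping function: candidate weights are taken as the plain sum of item-to-item scores from the reader's history, sampling is done without replacement, and the names `item_weights` and `weighted_sample` and the sample scores are hypothetical.

```python
import random


def item_weights(item2item, history):
    """Sum the scores from every item in the history to each unseen candidate."""
    seen = set(history)
    weights = {}
    for (src, dst), score in item2item.items():
        if src in seen and dst not in seen:
            weights[dst] = weights.get(dst, 0.0) + score
    return weights


def weighted_sample(weights, k, rng=None):
    """Draw up to k distinct items, each draw proportional to the remaining weights."""
    rng = rng or random.Random()
    pool = {item: w for item, w in weights.items() if w > 0}
    picked = []
    while pool and len(picked) < k:
        items = list(pool)
        choice = rng.choices(items, weights=[pool[i] for i in items], k=1)[0]
        picked.append(choice)
        del pool[choice]  # without replacement: an item is recommended at most once
    return picked


# Made-up item-to-item scores and read history.
scores = {("a", "b"): 2.0, ("a", "c"): 1.0, ("b", "c"): 0.5, ("c", "a"): 3.0, ("a", "d"): 0.2}
candidates = item_weights(scores, history=["a", "b"])
recommendations = weighted_sample(candidates, k=2, rng=random.Random(0))
```

Sampling instead of always taking the top-weighted items gives lower-weighted publications an occasional chance to be shown, so the recommendations vary between visits.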

Page 20: Issuu Talk on Topic Models and Recommendation Systems

Max. Rank

Page 21: Issuu Talk on Topic Models and Recommendation Systems

Tuned Parameters

Page 22: Issuu Talk on Topic Models and Recommendation Systems

Master Student Projects

Lars Maaløe
Deep Belief Network Model
Bag-of-Words model, Training Data
2000, 500, 20, 2

Kasper Johansen
Collaborative Filtering Using Social Media Knowledge

Page 23: Issuu Talk on Topic Models and Recommendation Systems

Master Student Project

Morten Arngren

Senior Data Scientist

