
Parallelizing Random Walk with Restart for Large-Scale Query Recommendation

Date post: 31-Jan-2016
Meng-Fen Chiang, Tsung-Wei Wang and Wen-Chih Peng
Department of Computer Science
National Chiao Tung University (R.O.C.)

Outline

• Introduction
• Related Work
• Problem Definition
• Parallel RWR
  – Temporal following pattern mining
  – Recommendation graph construction
  – Random walk with restart for multiple queries
• Experimental Results
• Conclusion


Introduction

• Yahoo! Asia Knowledge Plus (AKP)

[Figure: an AKP page showing a question and its answer]


Introduction (contd.)

• User access log
  – Consider a QA pair as an item
  – A sequence of items clicked by a user
  – Typically, what a user looks for within a short period shares certain topics
    • E.g., a session of 4 min 18 sec. on "Upload photos to Facebook"


Introduction (contd.)

• Random Walk with Restart (RWR)
  – Computes the relevance scores of a set of nodes with respect to a query node
  – Example with query node Node 4:
    Node 1: 0.13, Node 2: 0.10, Node 3: 0.13, Node 4: 0.22, Node 5: 0.13, Node 6: 0.05,
    Node 7: 0.05, Node 8: 0.08, Node 9: 0.04, Node 10: 0.03, Node 11: 0.04, Node 12: 0.02

[Figure: example graph of Nodes 1 to 12, each node labeled with its relevance score w.r.t. Node 4]
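The scores above can be computed by power iteration. The sketch below is a minimal single-machine version built on our own assumptions. Edge weights are normalized into transition probabilities per source node. The restart probability is 0.15, because the slides do not give a value. Walkers that reach a node with no out-edges jump back to the query. The names `rwr` and `top_k` are illustrative and do not come from the paper.

```python
def rwr(graph, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart by power iteration (single machine).

    graph: {node: {neighbor: weight}} of outgoing, weighted edges.
    restart: probability of jumping back to `query` at each step (assumed value).
    Returns {node: relevance score}; the scores sum to 1.
    """
    nodes = set(graph) | {query}
    for nbrs in graph.values():
        nodes.update(nbrs)
    out_weight = {u: sum(graph.get(u, {}).values()) for u in nodes}
    scores = {u: 0.0 for u in nodes}
    scores[query] = 1.0
    for _ in range(max_iter):
        new = {u: 0.0 for u in nodes}
        new[query] += restart
        for u in nodes:
            if out_weight[u] == 0:
                # Dangling node: its mass restarts at the query (our assumption).
                new[query] += (1 - restart) * scores[u]
                continue
            for v, w in graph[u].items():
                new[v] += (1 - restart) * scores[u] * w / out_weight[u]
        change = sum(abs(new[u] - scores[u]) for u in nodes)
        scores = new
        if change < tol:
            break
    return scores


def top_k(scores, query, k):
    """Recommend the k highest-scoring nodes other than the query itself."""
    ranked = sorted((u for u in scores if u != query), key=lambda u: scores[u], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical 3-node cycle 0 -> 1 -> 2 -> 0 with query node 0.
    cycle = {0: {1: 1.0}, 1: {2: 1.0}, 2: {0: 1.0}}
    print(top_k(rwr(cycle, query=0), query=0, k=2))  # prints [1, 2]
```

On the cycle, nodes closer to the query along the walk receive higher scores. Recommending an item then amounts to ranking all other nodes by their score.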


Related Work

• Random Walk with Restart (RWR)
  – Off-line mode
    • Pre-compute the required information off-line
    • Pros: fast on-line recommendation for a query
    • Cons: prohibitive storage consumption
  – On-line mode
    • Compute the recommendation for a query on-line
    • Pros: less storage consumption
    • Cons: longer response time
  – Fast RWR
    • Less storage consumption
    • Fast on-line response time for a query
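The storage versus response-time trade-off follows from the closed form of RWR. We use the standard formulation here, with our own notation rather than the slides'. W is the column-normalized n × n transition matrix of the graph, c is the restart probability, and e_q is the indicator vector of query node q. The relevance vector r_q is the fixed point of the left-hand equation:

```latex
\mathbf{r}_q = (1-c)\,W\,\mathbf{r}_q + c\,\mathbf{e}_q
\quad\Longrightarrow\quad
\mathbf{r}_q = c\,\bigl(I - (1-c)\,W\bigr)^{-1}\,\mathbf{e}_q
```

Off-line mode corresponds to pre-computing the matrix c(I − (1 − c)W)^{-1}, so that answering a query is a single column lookup. That matrix is generally dense, however, and needs n² entries: about 1.6 × 10^13 for the 4 M items of the AKP data set. On-line mode instead iterates the left-hand equation for each query. It needs only the sparse W, but each query takes several iterations.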


Related Work (contd.)

• Scalable recommendation
  – SmartMiner
    • Identifies user sessions
    • Mines frequent navigation patterns
  – Personalized community recommendation
    • 312 K active users, 109 K popular communities
    • Training time ~14 min (200 nodes)
  – Personalized news recommendation
    • Handles streaming content
    • No explicit runtime analysis of off-line training and on-line recommendation


Problem Definition

• Goal
  – Given user click logs and a query item I
  – Recommend items relevant to I
• Requirements
  – Effectiveness
    • Mine frequent navigation patterns from click logs
  – Scalability
    • Efficiently handle large-scale click logs within a few hours
  – Parallelization of RWR
  – Parallelization of RWR for multiple query nodes


Outline

• Introduction
• Related Work
• Problem Definition
• A framework for scalable recommendation
  – Temporal following pattern mining
  – Recommendation graph construction
  – Random walk with restart for multiple queries
• Experimental Results
• Conclusion


System Architecture

• Input: user access logs
• Off-line computation
  – Temporal following pattern mining (parameters: 1. window size, 2. bin size), stored as Item ID : <Item List> . . .
  – Recommendation graph construction
  – Random walk with restart for the query items (Item 1, Item 2, . . .), stored as Item ID : <Item List> . . .


Mining Temporal Following Patterns in Parallel


Temporal Following Relation

• Frequent QA browsing behaviors of users within a pre-defined time window
  – E.g., window size = 150 sec.

User click stream: Item 1 (t = 0), Item 2 (t = 30), Item 3 (t = 70), Item 4 (t = 160)

Temporal following relations:
<Item 1, Item 2> : dt = 30
<Item 1, Item 3> : dt = 70
<Item 1, Item 4> : dt = 160
. . .
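The following sketch extracts temporal following relations from one user's click stream. It assumes that a relation <Item i, Item j> is emitted whenever Item j is clicked after Item i with a gap dt no larger than the window size. The function name and the inclusive window bound are our assumptions.

```python
def temporal_following_pairs(clicks, window):
    """Return (item_i, item_j, dt) for every later click item_j within `window` seconds of item_i.

    clicks: list of (timestamp_seconds, item) tuples for one user.
    """
    clicks = sorted(clicks)
    pairs = []
    for i, (t_i, item_i) in enumerate(clicks):
        for t_j, item_j in clicks[i + 1:]:
            dt = t_j - t_i
            if dt > window:
                break  # clicks are time-sorted, so every later click is farther away
            if item_j != item_i:
                pairs.append((item_i, item_j, dt))
    return pairs


if __name__ == "__main__":
    stream = [(0, "Item 1"), (30, "Item 2"), (70, "Item 3"), (160, "Item 4")]
    for pair in temporal_following_pairs(stream, window=150):
        print(pair)
```

With the slide's stream and a 150-second window, this sketch yields <Item 1, Item 2> (dt = 30) and <Item 1, Item 3> (dt = 70). It does not yield <Item 1, Item 4>, whose dt = 160 exceeds the window. It also emits the pairs that start from Item 2 and Item 3.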


Temporal Following Pattern Mining

[Figure: MapReduce data flow]
• Input: user click logs and parameters
• Mappers 1 . . . N: emit temporal following pairs for each item, <Itemi , Itemj:cntij> (temporal following relations)
• Reducers 1 . . . N: aggregate the temporal following relations for each item, <Itemi , <Itemj:cntij, …, Itemz:cntiz>> (temporal following patterns)
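The two phases can be simulated in plain Python. This is a sketch under our own assumptions. Relations are counted, matching the <Itemi , Itemj:cntij> form on this slide. Each shard stands in for one mapper's input split. `min_count` is a hypothetical support threshold that the slides do not specify.

```python
from collections import Counter, defaultdict


def mapper(user_streams, window):
    """Map: emit (item_i, (item_j, cnt_ij)) for one shard of per-user click streams.

    user_streams: iterable of lists of (timestamp_seconds, item), one list per user.
    Counts are summed locally first, as a MapReduce combiner would.
    """
    local = Counter()
    for clicks in user_streams:
        clicks = sorted(clicks)
        for i, (t_i, a) in enumerate(clicks):
            for t_j, b in clicks[i + 1:]:
                if t_j - t_i > window:
                    break
                if a != b:
                    local[(a, b)] += 1
    for (a, b), cnt in local.items():
        yield a, (b, cnt)


def shuffle(mapper_outputs):
    """Group all mapper output by key (item_i), as the framework does between phases."""
    groups = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups


def reducer(item_i, values, min_count=1):
    """Reduce: build <item_i, <item_j: cnt_ij, ...>>, keeping pairs seen >= min_count times."""
    totals = Counter()
    for item_j, cnt in values:
        totals[item_j] += cnt
    return item_i, {j: c for j, c in totals.items() if c >= min_count}


def mine_patterns(shards, window, min_count=1):
    """Run map -> shuffle -> reduce over a list of shards (one per mapper)."""
    outputs = [mapper(shard, window) for shard in shards]
    return dict(reducer(k, v, min_count) for k, v in shuffle(outputs).items())
```

Pre-aggregating counts inside each mapper reduces the volume of data shuffled to the reducers. The result is the same as emitting one record per pair.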


Recommendation Graph Construction


Recommendation Graph Construction

• Goal
  – Transform the discovered temporal following patterns into a recommendation graph
• E.g.,
  Temporal following patterns:
    <Item 1, <Item 2:cnt12, Item 3:cnt13>>
    <Item 4, <Item 3:cnt43>>
  Recommendation graph: nodes n1, n2, n3, n4, with edges between n1 and n2 (weight cnt12), n1 and n3 (weight cnt13), and n4 and n3 (weight cnt43)
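Below is a sketch of this step. It assumes that each pattern entry <Item i, <Item j:cnt_ij>> becomes a directed edge i → j weighted by cnt_ij, and that RWR later walks along edges in proportion to these weights. The slide does not say whether the paper's graph is directed or how it normalizes weights.

```python
def build_graph(patterns):
    """Turn temporal following patterns into a weighted directed graph.

    patterns: {item_i: {item_j: cnt_ij, ...}}; each entry becomes an edge
    item_i -> item_j with weight cnt_ij.
    Returns adjacency {node: {neighbor: weight}}, including nodes with no out-edges.
    """
    graph = {}
    for src, followers in patterns.items():
        graph.setdefault(src, {})
        for dst, cnt in followers.items():
            graph[src][dst] = graph[src].get(dst, 0) + cnt
            graph.setdefault(dst, {})
    return graph


def transition_probabilities(graph):
    """Normalize edge weights so each node's outgoing probabilities sum to 1."""
    probs = {}
    for u, nbrs in graph.items():
        total = sum(nbrs.values())
        probs[u] = {v: w / total for v, w in nbrs.items()} if total else {}
    return probs


if __name__ == "__main__":
    # The slide's example with hypothetical counts cnt12 = 3, cnt13 = 1, cnt43 = 2.
    g = build_graph({1: {2: 3, 3: 1}, 4: {3: 2}})
    print(transition_probabilities(g))
```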


Parallelizing Random Walk with Restart


Parallelizing Random Walk with Restart

• With a single query

[Figure: the example graph of Nodes 1 to 12 with RWR relevance scores w.r.t. the query node, Node 4 (same scores as in the Introduction example)]


Parallelizing RWR with a Single Query

q: an item (the query)
• Input: user click logs and parameters
• Initialization: Machines 1 . . . N set the initial score for q
• RWR: Machines 1 . . . N calculate the relevance score of each item
• Convergence: Machines 1 . . . N calculate the difference between successive relevance score vectors
  – If not converged, repeat the RWR step; otherwise stop
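The loop in this flowchart can be sketched with a thread pool standing in for the N machines. The sketch makes several assumptions. Nodes are split round-robin among the machines. Each machine "pulls" scores along incoming edges. Convergence is declared when the summed L1 change falls below a tolerance. The restart probability is 0.15, and node IDs must be mutually comparable so that they can be sorted. Python threads only illustrate the data flow and give no real speed-up for this CPU-bound loop. In a cluster, each partition would live on a separate machine.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_rwr(graph, query, n_machines=3, restart=0.15, tol=1e-10, max_iter=1000):
    """RWR power iteration with the nodes partitioned across `n_machines` workers.

    graph: {node: {neighbor: weight}} of outgoing, weighted edges.
    Each iteration, every worker recomputes the scores of its own nodes from the
    shared previous vector and reports the L1 change on its slice; the driver
    sums the changes and stops once the total drops below `tol`.
    """
    nodes = sorted(set(graph) | {query} | {v for nbrs in graph.values() for v in nbrs})
    in_probs = {v: [] for v in nodes}  # v -> [(u, P(u -> v)), ...] so workers can pull
    dangling = []
    for u in nodes:
        nbrs = graph.get(u, {})
        total = sum(nbrs.values())
        if total == 0:
            dangling.append(u)  # no out-edges: mass restarts at the query (assumption)
            continue
        for v, w in nbrs.items():
            in_probs[v].append((u, w / total))
    partitions = [nodes[i::n_machines] for i in range(n_machines)]

    def update(part, prev, dangling_mass):
        """One machine's step: new scores and L1 change for the nodes it owns."""
        new, change = {}, 0.0
        for v in part:
            s = (1 - restart) * sum(prev[u] * p for u, p in in_probs[v])
            if v == query:
                s += restart + (1 - restart) * dangling_mass
            new[v] = s
            change += abs(s - prev[v])
        return new, change

    scores = {u: (1.0 if u == query else 0.0) for u in nodes}
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        for _ in range(max_iter):
            dangling_mass = sum(scores[u] for u in dangling)
            k = len(partitions)
            results = list(pool.map(update, partitions, [scores] * k, [dangling_mass] * k))
            scores, total_change = {}, 0.0
            for part_scores, part_change in results:
                scores.update(part_scores)
                total_change += part_change
            if total_change < tol:
                break
    return scores
```

The partitioning does not change the numbers. Each node's update reads only the previous vector, so any split of the nodes gives the same scores as a single machine.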


Parallelizing Random Walk with Restart

• With multiple queries

[Figure: several copies of the 12-node example graph, some annotated with the relevance scores computed for different query nodes]


Parallelizing RWR with Multiple Queries

Q: the set of query items
• Input: user click logs and parameters
• Initialization: Machines 1 . . . N set the initial scores for Q
• RWR, repeated until the maximum number of iterations:
  – Mappers 1 . . . N: calculate the diffusion score of each item w.r.t. each q
  – Reducers 1 . . . N: sum up the diffusion scores of each item w.r.t. each q
• Record format: <Itemi , <q1:rs1i, …, qz:rs1z> <adjacent list>>


Parallelizing RWR with Multiple Queries

• Diffusion score of each item w.r.t. q
• Sum up the diffusion scores of each item w.r.t. q
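Below is a single-process sketch of one map/shuffle/reduce round per iteration. It follows our own formulation, which may differ in detail from the paper's. For each query q, item i sends the diffusion score (1 − c) · r_q(i) · P(i → j) to every out-neighbor j. The reducer for j sums what it receives per query and adds the restart probability c when j = q. Records keep the <Itemi , <q1:score, …>, adjacency list> shape from the previous slide. The values restart = 0.15 and max_iter = 50 are assumptions.

```python
from collections import defaultdict


def mapper(item, query_scores, neighbors, restart):
    """Map: spread item's score for every query q along its out-edges.

    query_scores: {q: r_q[item]}; neighbors: {neighbor: transition probability}.
    Emits (neighbor, (q, diffusion score)).
    """
    for q, score in query_scores.items():
        if score == 0.0:
            continue  # nothing to diffuse for this query
        for nbr, p in neighbors.items():
            yield nbr, (q, (1 - restart) * score * p)


def reducer(item, values, queries, restart):
    """Reduce: sum the diffusion scores per query and add the restart term when item == q."""
    totals = dict.fromkeys(queries, 0.0)
    for q, d in values:
        totals[q] += d
    if item in totals:
        totals[item] += restart
    return totals


def multi_query_rwr(transitions, queries, restart=0.15, max_iter=50):
    """RWR for all queries at once, run for a fixed number of iterations.

    transitions: {item: {neighbor: probability}}; every item must appear as a key
    and each non-empty row should sum to 1 (mass at dangling items is dropped).
    Returns {item: {q: r_q[item]}}.
    """
    records = {i: {q: (1.0 if i == q else 0.0) for q in queries} for i in transitions}
    for _ in range(max_iter):
        shuffled = defaultdict(list)
        for item, query_scores in records.items():
            for key, value in mapper(item, query_scores, transitions[item], restart):
                shuffled[key].append(value)
        records = {i: reducer(i, shuffled[i], queries, restart) for i in transitions}
    return records
```

RWR is linear in the restart vector, so each query's column equals what a separate single-query run would produce. Batching the queries lets one pass over the adjacency lists per iteration serve all of them.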


Experimental Setup

• Yahoo! Asia Knowledge Plus (AKP)
  – Duration: one week in July 2009
  – #clicks: 90 M
  – #items: 4 M
  – #users: 2 M
• Performance evaluation
  – Quality study
  – Scalability study
  – Case study


Quality Study

• User access logs
  – Training: 80%
  – Test: 20%
• Groundtruth
  – For each item I clicked by user U, the set of items clicked by U after I within T sec.
• Measure the similarity with historical user click logs
  – Item-precision
  – Item-recall
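Below is a sketch of the evaluation. It assumes the usual set-based definitions. Item-precision is the fraction of recommended items that appear in the groundtruth set. Item-recall is the fraction of groundtruth items that were recommended. The slides do not give the value of T or how scores are averaged over test items. Here `horizon` plays the role of T, and the function names are hypothetical.

```python
def groundtruth(user_clicks, item, horizon):
    """Items that user U clicked after `item` within `horizon` seconds (T on the slide).

    user_clicks: time-sorted list of (timestamp_seconds, item) for one test user.
    Uses the user's first click on `item` (an assumption).
    """
    for idx, (t0, clicked) in enumerate(user_clicks):
        if clicked == item:
            return {x for t, x in user_clicks[idx + 1:] if t - t0 <= horizon and x != item}
    return set()


def item_precision_recall(recommended, truth):
    """Set-based precision and recall of a recommendation list against a groundtruth set."""
    hits = len(set(recommended) & truth)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    clicks = [(0, "A"), (20, "B"), (50, "C"), (500, "D")]
    truth = groundtruth(clicks, "A", horizon=60)  # {"B", "C"}
    print(item_precision_recall(["B", "D", "E"], truth))  # 1 hit: precision 1/3, recall 1/2
```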


Quality Study (contd.)

• Methods compared
  – Top-k hot items in the category of the test item (HC)
  – Temporal following patterns (TFP)
  – RWR based on temporal following patterns (RWRTFP)
    • Higher precision & recall
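The HC baseline can be sketched as a popularity ranking restricted to the test item's category. Two choices here are our assumptions: popularity is measured by training-set clicks, and the test item itself is excluded. The function name is hypothetical.

```python
from collections import Counter


def hot_in_category(train_clicks, category_of, test_item, k):
    """HC baseline sketch: the k most-clicked training items in test_item's category.

    train_clicks: iterable of clicked item ids; category_of: {item: category}.
    """
    cat = category_of[test_item]
    counts = Counter(i for i in train_clicks if category_of.get(i) == cat and i != test_item)
    return [item for item, _ in counts.most_common(k)]
```

HC returns the same list for every item in a category. TFP and RWRTFP, by contrast, depend on what users actually clicked after the query item.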


Scalability Study

• Temporal following patterns (TFP)
  – 4.1 M items
  – 40 sec.
• RWR based on temporal following patterns (RWRTFP)
  – Size of the input data
  – Number of computing nodes


Scalability Study (contd.)

• Computational cost drops significantly as the number of machines increases
• More queries make the computation more cost-effective
  – 0.74 sec. (2K queries) vs. 0.49 sec. (10K queries)


Case Study

• Query item
  – "What can I do if I do not have Word?"


Conclusion

• Proposes a parallel RWR for multiple-query recommendation
  – Parallelizes the mining of frequent navigation behavior
  – Parallelizes RWR
  – Computes RWR for multiple queries in parallel
• The recommender system is
  – General
  – Content-agnostic


Q & A


Temporal Following Pattern Mining

[Figure: MapReduce data flow]
• Input: user click logs and parameters
• Mappers 1 . . . N: emit temporal following pairs for each item, <Itemi , Itemj:dtij> (temporal following relations)
• Reducers 1 . . . N: aggregate the temporal following relations for each item, <Itemi , <Itemj:dtij, …, Itemz:dtiz>> (temporal following patterns)

