Parallelizing Random Walk with Restart for Large-Scale Query Recommendation

Post on 31-Jan-2016


description

Parallelizing Random Walk with Restart for Large-Scale Query Recommendation. Meng-Fen Chiang, Tsung-Wei Wang and Wen-Chih Peng, Department of Computer Science, National Chiao Tung University (R.O.C.). Outline: Introduction, Related Work, Problem Definition, Parallel RWR.

transcript

Parallelizing Random Walk with Restart for Large-Scale Query Recommendation

Meng-Fen Chiang, Tsung-Wei Wang and Wen-Chih Peng

Department of Computer Science

National Chiao Tung University (R.O.C.)

Outline

• Introduction
• Related Work
• Problem Definition
• Parallel RWR
– Temporal following pattern mining
– Recommendation graph construction
– Random walk with restart for multiple queries
• Experimental Results
• Conclusion

Introduction

• Yahoo! Asia Knowledge Plus (AKP)

[Figure: an AKP page, with the Question and Answer regions labeled]

Introduction (contd.)

• User access log
– Consider a QA pair as an item
– A sequence of items clicked by a user
– Typically, what a user looks for during a short period shares certain topics
• Within 4 min, 18 sec.: "Upload photos to Facebook"

Introduction (contd.)

• Random Walk with Restart (RWR)
– Compute relevance scores of a set of nodes for a query node

[Figure: example graph of 12 nodes annotated with RWR relevance scores for query Node 4]

Query node: Node 4

Node      Score
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02
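A minimal sketch of how such relevance scores can be computed, assuming the common iterative formulation r ← (1 − c)·W·r + c·e, where W is the normalized adjacency matrix, e is the indicator vector of the query node, and c is the restart probability. The toy graph, the value c = 0.15, and the name `rwr` are illustrative assumptions; they are not taken from the slides, and the toy graph is not the 12-node example above.

```python
def rwr(adj, query, restart=0.15, iters=100):
    """Random walk with restart by power iteration (illustrative sketch).

    adj maps each node to a list of out-neighbors (unweighted).
    Returns a dict of relevance scores with respect to `query`.
    """
    nodes = list(adj)
    scores = {n: 1.0 if n == query else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart) * scores[u]
            if not adj[u]:
                # Assumption: a walker stuck at a dead end jumps back to the query.
                nxt[query] += walk_mass
                continue
            share = walk_mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[query] += restart  # restart mass always returns to the query node
        scores = nxt
    return scores


# Hypothetical 4-node undirected graph: edges 1-2, 1-3, 2-3, 3-4.
toy = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
toy_scores = rwr(toy, query=4)
```

The scores sum to 1, and nodes close to the query node receive more mass than distant ones, which is the property the recommendation step relies on.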


Related Work

• Random Walk with Restart (RWR)
– Off-line mode
  • Pre-compute required information off-line
  – Pros: fast on-line recommendation for a query
  – Cons: prohibitive storage consumption
– On-line mode
  • Compute recommendation for a query on-line
  – Pros: less storage consumption
  – Cons: longer response time
– Fast RWR
  • Less storage consumption
  • Fast on-line response time for a query

Related Work (contd.)

• Scalable recommendation
– SmartMiner
  • Identify user sessions
  • Mine frequent navigation patterns
– Personalized community recommendation
  • 312 K active users, 109 K popular communities
  • Training time ~ 14 mins (200 nodes)
– Personalized news recommendation
  • Handle streaming content
  • No explicit runtime analysis of off-line training and on-line recommendation


Problem Definition

• Goal
– Given user click logs and a query item I
– Recommend relevant items w.r.t. I
• Requirements
– Effectiveness
  • Mine frequent navigation patterns from click logs
– Scalability
  • Efficiently manage large-scale click logs within a few hours
  – Parallelization of RWR
  – Parallelization of RWR for multiple query nodes

Outline

• Introduction
• Related Work
• Problem Definition
• A framework for scalable recommendation
– Temporal following pattern mining
– Recommendation graph construction
– Random walk with restart for multiple queries
• Experimental Results
• Conclusion

System Architecture

[Figure: system architecture pipeline]

User Access Logs
→ Temporal Following Pattern Mining (Parameters: 1. window size, 2. bin size)
→ Item ID : <Item List> . . .
→ Recommendation Graph Construction
→ Random Walk with Restart (Query Items: Item 1, Item 2, . . .)
→ Item ID : <Item List> . . .

Legend: Off-Line Computation, Storage, Input

Mining Temporal Following Patterns in Parallel

[Figure: system architecture diagram, repeated]

Temporal Following Relation

• Frequent QA browsing behaviors of users within a pre-defined time window
– E.g., window size = 150 sec.

User click stream: Item 1 (t = 0), Item 2 (t = 30), Item 3 (t = 70), Item 4 (t = 160)

Temporal following relations:
<Item 1, Item 2> : dt = 30
<Item 1, Item 3> : dt = 70
. . .
<Item 1, Item 4> : dt = 160
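The extraction of temporal following relations from one user's click stream can be sketched as follows. This is an illustration only: the function name `temporal_following` is hypothetical, and dropping pairs whose dt exceeds the window is an assumption about how the window is applied (the slide itself also lists the pair with dt = 160).

```python
def temporal_following(clicks, window):
    """clicks: list of (item, timestamp_sec) for one user, sorted by time.

    Returns (earlier_item, later_item, dt) for every ordered pair of
    distinct items whose time gap dt is at most `window` seconds.
    """
    pairs = []
    for i, (item_i, t_i) in enumerate(clicks):
        for item_j, t_j in clicks[i + 1:]:
            dt = t_j - t_i
            if dt > window:
                break  # clicks are sorted, so later clicks are even farther away
            if item_j != item_i:
                pairs.append((item_i, item_j, dt))
    return pairs


# Click stream from the slide, with a 150-second window.
stream = [("Item 1", 0), ("Item 2", 30), ("Item 3", 70), ("Item 4", 160)]
relations = temporal_following(stream, window=150)
```

Under this assumption, <Item 1, Item 4> is excluded because dt = 160 exceeds the 150-second window, while pairs that start at later items, such as <Item 2, Item 4> with dt = 130, are kept.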

Temporal Following Pattern Mining

[Figure: MapReduce data flow for temporal following pattern mining]

Input: user click logs, parameters
Mapper 1 . . . Mapper N: emit temporal following pairs for each item
Temporal following relations: <Item_i, Item_j : cnt_ij>
Reducer 1 . . . Reducer N: aggregate temporal following relation for each item
Temporal following patterns: <Item_i, <Item_j : cnt_ij, …, Item_z : cnt_iz>>
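The map and reduce roles above can be mimicked in plain Python as a rough sketch. The functions `map_pairs` and `reduce_counts`, the sample logs, and the window value are all hypothetical; the bin size parameter from the slides is not modeled here, and a real deployment would run on a MapReduce framework rather than in a single process.

```python
from collections import defaultdict


def map_pairs(user_clicks, window):
    """Mapper: emit (item_i, item_j) for each temporal following pair of one user."""
    for i, (item_i, t_i) in enumerate(user_clicks):
        for item_j, t_j in user_clicks[i + 1:]:
            if t_j - t_i > window:
                break  # clicks are time-sorted
            if item_j != item_i:
                yield item_i, item_j


def reduce_counts(emitted):
    """Reducer: aggregate follower counts per item, i.e. <Item_i, <Item_j:cnt_ij, ...>>."""
    patterns = defaultdict(lambda: defaultdict(int))
    for item_i, item_j in emitted:
        patterns[item_i][item_j] += 1
    return {i: dict(followers) for i, followers in patterns.items()}


# Made-up click logs for two users: (item, timestamp in seconds).
logs = {
    "u1": [("A", 0), ("B", 30), ("C", 70)],
    "u2": [("A", 5), ("B", 20)],
}
emitted = [pair for clicks in logs.values() for pair in map_pairs(clicks, window=150)]
patterns = reduce_counts(emitted)
```

Here item B follows item A for both users, so cnt_AB = 2, while every other pair is observed once.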


Recommendation Graph Construction

• Goal
– Transform discovered temporal following patterns to a recommendation graph
• E.g.,

Temporal following patterns:
<Item 1, <Item 2 : cnt12, Item 3 : cnt13>>
<Item 4, <Item 3 : cnt43>>

Recommendation graph (nodes n1, n2, n3, n4):
n1 → n2, weight cnt12
n1 → n3, weight cnt13
n4 → n3, weight cnt43
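One way to carry out this transformation is sketched below. The helper `build_graph` and the concrete counts are invented for illustration, and normalizing each node's outgoing weights into transition probabilities is an assumption made so the graph can feed a random walk directly; the slides only state that the counts become edge weights.

```python
def build_graph(patterns):
    """Turn <Item_i, <Item_j:cnt_ij, ...>> patterns into a weighted directed graph.

    Each node's outgoing weights are normalized to sum to 1 (an assumption),
    so graph[u][v] can be read as a transition probability from u to v.
    """
    nodes = set(patterns)
    for followers in patterns.values():
        nodes.update(followers)
    graph = {n: {} for n in nodes}
    for src, followers in patterns.items():
        total = sum(followers.values())
        for dst, cnt in followers.items():
            graph[src][dst] = cnt / total
    return graph


# Hypothetical counts: cnt12 = 3, cnt13 = 1, cnt43 = 2.
example_patterns = {"Item 1": {"Item 2": 3, "Item 3": 1}, "Item 4": {"Item 3": 2}}
example_graph = build_graph(example_patterns)
```

Items that only ever appear as followers (Item 2 and Item 3 here) become nodes with no outgoing edges.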


Parallelizing Random Walk with Restart

• With single query

[Figure: the 12-node example graph, shown with and without RWR relevance scores for query Node 4]

Scores for query Node 4: Node 1 0.13, Node 2 0.10, Node 3 0.13, Node 4 0.22, Node 5 0.13, Node 6 0.05, Node 7 0.05, Node 8 0.08, Node 9 0.04, Node 10 0.03, Node 11 0.04, Node 12 0.02

Parallelizing RWR With Single Query

[Figure: flowchart of parallel RWR for a single query]

Input: user click logs, parameters, q (an item)
Initialization: Machine 1 . . . Machine N: set initial score for q
RWR: Machine 1 . . . Machine N: calculate relevance score for each item
Convergence: Machine 1 . . . Machine N: calculate difference of relevance score vectors
Converged? No: back to the RWR step. Yes: finish.
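The three stages of the flowchart (initialization, RWR step, convergence check) can be imitated in a single process as a rough sketch. The "machines" are simulated by splitting the node list into chunks and processing them one after another; the names `partition`, `local_step` and `parallel_rwr`, the L1 convergence test, the tolerance, and the toy graph are all assumptions rather than details from the slides.

```python
def partition(nodes, n_machines):
    """Split the node list into n_machines roughly equal chunks."""
    return [nodes[k::n_machines] for k in range(n_machines)]


def local_step(chunk, graph, scores, restart):
    """Work of one simulated machine: diffuse its nodes' scores to their neighbors."""
    out = {}
    for u in chunk:
        for v, w in graph[u].items():
            out[v] = out.get(v, 0.0) + (1.0 - restart) * scores[u] * w
    return out


def parallel_rwr(graph, query, n_machines=2, restart=0.15, tol=1e-9, max_iter=1000):
    nodes = sorted(graph)
    # Initialization: all mass starts at the query item q.
    scores = {n: 1.0 if n == query else 0.0 for n in nodes}
    for _ in range(max_iter):
        # RWR: each chunk is processed independently, then partial results are merged.
        partials = [local_step(c, graph, scores, restart) for c in partition(nodes, n_machines)]
        new = {n: 0.0 for n in nodes}
        for part in partials:
            for v, s in part.items():
                new[v] += s
        new[query] += restart
        # Convergence: L1 difference between successive score vectors.
        diff = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if diff < tol:
            break
    return scores


# Hypothetical graph whose outgoing weights sum to 1 at every node.
g = {"a": {"b": 0.5, "c": 0.5}, "b": {"a": 1.0}, "c": {"a": 0.5, "b": 0.5}}
one_machine = parallel_rwr(g, "a", n_machines=1)
three_machines = parallel_rwr(g, "a", n_machines=3)
```

The result does not depend on how the nodes are split, which is what makes the step safe to distribute. Note that this sketch does not redistribute the mass of nodes without outgoing edges.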

Parallelizing Random Walk with Restart

• With multiple queries

[Figure: the 12-node example graph repeated for several query nodes, each with its own set of RWR relevance scores]

Parallelizing RWR With Multiple Queries

[Figure: MapReduce flowchart of parallel RWR for multiple queries]

Input: user click logs, parameters, Q (items)
Initialization: Machine 1 . . . Machine N: set initial score for Q
RWR (repeated until maximum iteration):
  Mapper 1 . . . Mapper N: calculate diffusion score for each item w.r.t. each q
  Reducer 1 . . . Reducer N: sum up diffusion score for each item w.r.t. q
Record format: <Item_i, <q_1 : rs_1i, …, q_z : rs_zi> <adjacent list>>

Parallelizing RWR With Multiple Queries

• Diffusion score for each item w.r.t. q
• Sum up diffusion scores for each item w.r.t. q
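The formulas on this slide did not survive in the transcript, so the following is only a sketch of one plausible reading: each item keeps a score per query, the mapper pushes a (1 − c)-weighted share of every score along each outgoing edge, and the reducer sums the shares per (item, q) and adds the restart mass c at the query item itself. The functions `mapper`, `reducer` and `multi_query_rwr`, the value c = 0.15, and the toy graph are assumptions.

```python
from collections import defaultdict


def mapper(scores_by_query, adjacency, restart):
    """Emit ((neighbor, q), diffusion score) for every query q held by one item."""
    for q, rs in scores_by_query.items():
        for nbr, w in adjacency.items():
            yield (nbr, q), (1.0 - restart) * rs * w


def reducer(emitted, queries, restart):
    """Sum diffusion scores per (item, q); add the restart mass at each query item."""
    totals = defaultdict(float)
    for key, value in emitted:
        totals[key] += value
    for q in queries:
        totals[(q, q)] += restart
    return totals


def multi_query_rwr(graph, queries, restart=0.15, max_iter=50):
    # records[item] = {q: relevance score of item w.r.t. q}
    records = {n: {q: 1.0 if n == q else 0.0 for q in queries} for n in graph}
    for _ in range(max_iter):  # "until maximum iteration"
        emitted = [kv for n in graph for kv in mapper(records[n], graph[n], restart)]
        totals = reducer(emitted, queries, restart)
        records = {n: {q: totals.get((n, q), 0.0) for q in queries} for n in graph}
    return records


# Hypothetical graph whose outgoing weights sum to 1 at every node.
mq_graph = {"a": {"b": 0.5, "c": 0.5}, "b": {"a": 1.0}, "c": {"a": 0.5, "b": 0.5}}
both = multi_query_rwr(mq_graph, ["a", "b"])
only_a = multi_query_rwr(mq_graph, ["a"])
```

Processing all queries in one pass yields the same scores as running each query separately, while every item's adjacency list is read only once per iteration.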


Experimental Setup

• Yahoo! Asia Knowledge Plus (AKP)
– Duration: 1 week in July 2009
– #clicks: 90 M
– #items: 4 M
– #users: 2 M
• Performance evaluation
– Quality study
– Scalability study
– Case study

Quality Study

• User access logs
– Train: 80%
– Test: 20%
• Ground truth
– For each item I clicked by user U
– The set of items clicked by U after I within T sec.
• Measure the similarity with historical user click logs
– Item-precision
– Item-recall
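The slides do not define item-precision and item-recall, so the sketch below assumes the usual set-based definitions: precision is the fraction of recommended items that appear in the ground truth, and recall is the fraction of ground-truth items that were recommended. The function name and the sample item IDs are made up.

```python
def item_precision_recall(recommended, ground_truth):
    """Set-based precision and recall of a recommendation list against
    the items the user actually clicked afterwards (assumed definitions)."""
    rec, truth = set(recommended), set(ground_truth)
    hits = len(rec & truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Four recommended items, three ground-truth items, two in common (B and D).
precision, recall = item_precision_recall(["B", "C", "D", "E"], ["B", "D", "F"])
```

In this made-up case the precision is 2/4 and the recall is 2/3.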

Quality Study (contd.)

– Top-k hot items in the category of the test item (HC)
– Temporal following pattern (TFP)
– RWR based on temporal following pattern (RWRTFP)
  • Higher precision & recall

Scalability Study

• Temporal following pattern (TFP)
– 4.1M items
– 40 sec.
• RWR based on temporal following pattern (RWRTFP)
– Size of input data
– Number of computing nodes

Scalability Study (contd.)

• Computational cost is significantly reduced as the number of machines increases
• The more queries, the more efficient the computation
– 0.74 sec. (2K queries) vs. 0.49 sec. (10K queries)

Case Study

• Query item
– "What can I do if I do not have Word?"

Conclusion

• Proposes a parallel RWR for multiple query recommendation
– Parallelize mining frequent navigation behavior
– Parallelize RWR
– Compute RWR for multiple queries in parallel
• The recommender system
– General
– Content-agnostic

Q & A

Temporal Following Pattern Mining

[Figure: MapReduce data flow for temporal following pattern mining, with time gaps dt in place of counts]

Input: user click logs, parameters
Mapper 1 . . . Mapper N: emit temporal following pairs for each item
Temporal following relations: <Item_i, Item_j : dt_ij>
Reducer 1 . . . Reducer N: aggregate temporal following relation for each item
Temporal following patterns: <Item_i, <Item_j : dt_ij, …, Item_z : dt_iz>>