
TrailMix RecSys2018 43 -...

Date post: 23-Sep-2020
Xing Zhao, Qingquan Song, James Caverlee and Xia Hu, Department of Computer Science and Engineering, Texas A&M University, USA
Transcript
Page 1: TrailMix RecSys2018 43 - people.tamu.edu/~zhaoxing623/slides/TrailMix_RecSys2018_43.pdf

Xing Zhao, Qingquan Song, James Caverlee and Xia Hu
Department of Computer Science and Engineering
Texas A&M University, USA

Page 2: Dataset Statistics

Items                        Quantity    Proportion
Playlists                    1,000,000
Unique tracks                2,262,292   100%
Unique tracks (freq ≥ 5)     599,341     96.05%
Unique tracks (freq ≥ 100)   70,229      80.67%
Unique albums                734,684
Unique artists               295,860

[Figure: number of remaining tracks (×10^6, left axis, 0 to 2.5) and cumulative share of positive samples (right axis, 0 to 1) versus the number of times a track appears in the training data (x-axis ticks: 1, 5, 10, 100, 1000, 10000, 40000).]

Therefore, in some parts of our methods, we consider only these frequent tracks for training.
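The effect of such a frequency threshold can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the Proportion column is the share of playlist-track occurrences (positive samples) that the kept tracks account for, and the function name and input format are hypothetical.

```python
from collections import Counter


def frequent_track_stats(playlists, min_freq):
    """Return (kept_tracks, coverage) for a frequency threshold.

    playlists: iterable of lists of track ids (hypothetical input format).
    coverage: share of all playlist-track occurrences that involve a kept track.
    """
    # Count how many times each track appears across all playlists.
    counts = Counter(track for playlist in playlists for track in playlist)
    kept = {track for track, n in counts.items() if n >= min_freq}
    total = sum(counts.values())
    covered = sum(counts[track] for track in kept)
    return kept, (covered / total if total else 0.0)


# Toy data: "a" appears 4 times, "b" 3 times, the rest once each.
playlists = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["a", "b", "e"]]
kept, coverage = frequent_track_stats(playlists, min_freq=3)
```

On the toy data, two of five tracks survive the threshold but still cover most occurrences, which mirrors the pattern in the table: few tracks, most of the positive samples.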

Page 3: Our Method - TrailMix

Playlist Continuation (for Tasks 2 to 10): DNCF, C-Tree

Cold Start (for Task 1): CC-Title

Page 4: CC-Title: Context Clustering using Title

[Diagram: a matrix of Words (9,817) by Tracks (2,262,292) is pre-processed and normalized, then clustered into pairs of word list and track list (Word list 1: Track list 1, Word list 2: Track list 2, Word list 3: Track list 3, ...). Tracks are recommended for a new title, e.g. Pop Punk 2018 Summer.]

Example cell (i, j) with value 6: track i appears in 6 playlists whose titles contain word j.
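The counting rule for a cell of the word-track matrix can be sketched as follows. This is an illustration under stated assumptions: the stop-word list and tokenizer are placeholders, stemming is omitted, and the sparse nested-dict layout is a choice made here, not something the slides specify.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"the", "a", "of", "my", "for"}


def title_words(title):
    """Lowercase, drop punctuation and emoji, remove stop words (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def build_word_track_counts(playlists):
    """playlists: iterable of (title, track_ids) pairs.

    Returns counts[word][track] = number of playlists that contain the track
    and whose title contains the word.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for title, tracks in playlists:
        for word in title_words(title):
            for track in set(tracks):  # count each playlist at most once
                counts[word][track] += 1
    return counts


playlists = [("Pop Punk!", ["t1", "t2"]), ("punk rock", ["t1"]), ("My Pop", ["t3"])]
counts = build_word_track_counts(playlists)
```

A sparse structure matters here because a dense 9,817 x 2,262,292 matrix would hold over 22 billion cells, almost all of them zero.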

Page 5: CC-Title: Cont.

Items                                        Quantity
Unique titles                                92,944
Unique normalized titles                     17,381
Unique non-stop normalized words             9,817
Playlists without title after processing     22,921

Steps:
1. Preprocessing: stemming, stop words, emoji, punctuation, etc.
2. Building a word-track matrix of size 9,817 x 2,262,292
3. Normalizing cells using 'IDF'
4. Clustering words based on row similarity
5. Recommending tracks in each cluster for a new title
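Steps 3 to 5 can be sketched as below. The slides name neither the exact IDF formula nor the clustering algorithm, so everything specific here is an assumption: the weight is count x log(W / df(track)), where W is the number of words and df(track) the number of words whose row contains the track, similarity is cosine, and clustering is a greedy single pass with a hypothetical threshold.

```python
import math
from collections import defaultdict


def idf_weight(counts):
    """counts[word][track] -> weights[word][track] = count * log(W / df(track))."""
    n_words = len(counts)
    df = defaultdict(int)
    for row in counts.values():
        for track in row:
            df[track] += 1
    return {w: {t: c * math.log(n_words / df[t]) for t, c in row.items()}
            for w, row in counts.items()}


def cosine(row_a, row_b):
    """Cosine similarity between two sparse rows stored as dicts."""
    dot = sum(v * row_b.get(t, 0.0) for t, v in row_a.items())
    na = math.sqrt(sum(v * v for v in row_a.values()))
    nb = math.sqrt(sum(v * v for v in row_b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_words(weights, threshold=0.5):
    """Greedy pass: a word joins the first cluster whose first word is similar enough."""
    clusters = []
    for word in sorted(weights):
        for cluster in clusters:
            if cosine(weights[word], weights[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def recommend(new_title_words, clusters, weights, k=3):
    """Sum the weighted rows of every cluster that shares a word with the new title."""
    scores = defaultdict(float)
    for cluster in clusters:
        if any(w in cluster for w in new_title_words):
            for word in cluster:
                for track, value in weights[word].items():
                    scores[track] += value
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:k]]


counts = {"punk": {"t1": 2, "t2": 1}, "emo": {"t1": 1, "t2": 2},
          "jazz": {"t3": 3}, "chill": {"t3": 1, "t4": 1}}
weights = idf_weight(counts)
clusters = cluster_words(weights)
top = recommend({"punk"}, clusters, weights)
```

In the toy data, "punk" and "emo" share tracks and end up in one cluster, so a new title containing "punk" also receives the tracks associated with "emo".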

Page 6: CC-Title: Cont.

Highlights:
1. CC-Title can handle large-scale matrix computation with high efficiency.
2. In some cases (clusters), the performance is very good.

Page 7: DNCF: Decorated Neural Collaborative Filtering

[Figure: Neural Collaborative Filtering. Source: He et al., "Neural Collaborative Filtering". WWW, 2017.]

Pros:
1. Simple and generic.
2. Combines the advantages of the basic matrix factorization model and MLP.

Cons: Not computationally efficient enough to be applied directly to the target problem, due to the huge item space and the matrix sparsity.

Page 8: DNCF: Cont.

Two modifications to address the efficiency issue:
1. Training Phase: Constrained Negative Sampling.
2. Testing Phase: Constrained Recommendation with Reordering.

Page 9: DNCF: Cont.

Training Phase: Constrained Negative Sampling.

1. Constrain the negative sampling space to the tracks that appear 100 or more times in the training data.
2. Positive samples are still drawn from the whole dataset during training, so that embeddings and predictions remain feasible for all of the testing data (Tasks 2-10).

[Figure: number of remaining tracks (×10^6, left axis, 0 to 2.5) and cumulative share of positive samples (right axis, 0 to 1) versus the number of times a track appears in the training data (x-axis ticks: 1, 5, 10, 100, 1000, 10000, 40000).]
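Constrained negative sampling can be sketched as follows: negatives come only from the pool of frequent tracks, while positives are left untouched. This is a minimal sketch with hypothetical function names, and uniform sampling from the pool is an assumption, since the slides do not state the sampling distribution.

```python
import random
from collections import Counter


def build_negative_pool(playlists, min_freq=100):
    """Tracks that appear at least min_freq times in the training playlists."""
    counts = Counter(track for playlist in playlists for track in playlist)
    return sorted(track for track, n in counts.items() if n >= min_freq)


def sample_negatives(playlist_tracks, pool, n_neg, rng):
    """Draw up to n_neg tracks from the frequent pool that are not in the playlist."""
    positives = set(playlist_tracks)
    candidates = [track for track in pool if track not in positives]
    return rng.sample(candidates, min(n_neg, len(candidates)))


# Toy data with a low threshold so that the pool is non-trivial.
playlists = [["a", "b", "x"], ["a", "b", "c"], ["a", "c", "y"]]
pool = build_negative_pool(playlists, min_freq=2)
negatives = sample_negatives(["a", "x"], pool, n_neg=5, rng=random.Random(0))
```

The rare track "x" stays a positive sample for its own playlist but is never drawn as a negative for any other playlist. With a pool of about 70,000 tracks, rejection sampling would avoid rebuilding the candidate list for every playlist.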

Page 10: DNCF: Cont.

Testing Phase: Constrained Recommendation with Reordering.

1. Constrain the recommendation space during the testing phase by recommending only the popular tracks (appearing ≥ 100 times), for a more targeted prediction.
2. Reorder the 500 predicted tracks with an ensemble trick that leverages two types of predictions provided by the Word2Vec embedding.

[Diagram labels: DNCF, Word2Vec (1), Word2Vec (2); L1, L2, L3; φ1, φ2, φ3; φ1 \ L1 ∪ L2 ∪ L3]
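The two testing-phase steps can be sketched as below. The first function restricts candidates to the popular tracks before taking the top k. The slides do not spell out the ensemble rule used for reordering, so the Borda-style rank fusion in the second function is only a stand-in to show the shape of the step; all names are hypothetical.

```python
def constrained_top_k(scores, allowed, k=500):
    """Keep only allowed (popular) tracks, then return the k best by score."""
    candidates = [track for track in scores if track in allowed]
    return sorted(candidates, key=lambda t: (-scores[t], t))[:k]


def borda_reorder(primary, *other_rankings):
    """Reorder the tracks of `primary` by summed Borda points over all rankings.

    Stand-in for the unspecified ensemble trick: a track at position p in a
    ranking earns len(primary) - p points; tracks outside `primary` are ignored.
    """
    n = len(primary)
    rank = {track: i for i, track in enumerate(primary)}
    points = {track: 0 for track in primary}
    for ranking in (primary, *other_rankings):
        for position, track in enumerate(ranking):
            if track in points:
                points[track] += max(n - position, 0)
    # Ties keep the order of the primary list.
    return sorted(primary, key=lambda t: (-points[t], rank[t]))


scores = {"a": 0.9, "b": 0.8, "c": 0.7, "rare": 0.99}
l1 = constrained_top_k(scores, allowed={"a", "b", "c"}, k=3)
reordered = borda_reorder(l1, ["c", "b", "a"], ["c", "a", "z"])
```

Here "rare" has the highest score but is excluded because it is not in the popular set, and the two auxiliary rankings lift "c" to the front without adding any new track to the list.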

Page 11: DNCF: Result

Highlights:
1. Results increase steadily, with maximum performance at seed 25;
2. It performs better for playlists with random seed tracks (R) than for playlists with sequential seed tracks.

Page 12: C-Tree: Constructed Tree

A playlist is:
1. A natural tree structure: a playlist consists of different tracks, and each track belongs to a specific album of an artist;
2. A meaningful cluster: the tracks in a specific playlist always share some latent similarity, such as genre, style, listening sense, etc.

[Figure: Phylogenetic tree. Source: https://www.creative-biostructure.com/custom-phylogenetic-tree-construction-service-399.htm]
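The tree view of a playlist can be sketched with nested dictionaries, artist to album to tracks. This is an illustrative data structure, assuming each track comes with its artist and album; the function names are hypothetical and the slides do not show the authors' representation.

```python
from collections import defaultdict


def build_playlist_tree(tracks):
    """tracks: iterable of (artist, album, track) tuples.

    Returns {artist: {album: {track, ...}}}, with the playlist as implicit root.
    """
    tree = defaultdict(lambda: defaultdict(set))
    for artist, album, track in tracks:
        tree[artist][album].add(track)
    # Convert to plain dicts so that lookups cannot create empty branches.
    return {artist: {album: set(leaves) for album, leaves in albums.items()}
            for artist, albums in tree.items()}


def tree_shape(tree):
    """Return (number of artists, number of albums, number of tracks)."""
    n_albums = sum(len(albums) for albums in tree.values())
    n_tracks = sum(len(leaves) for albums in tree.values() for leaves in albums.values())
    return len(tree), n_albums, n_tracks


playlist = [
    ("Band A", "Album 1", "Song 1"),
    ("Band A", "Album 1", "Song 2"),
    ("Band A", "Album 2", "Song 3"),
    ("Band B", "Album 3", "Song 4"),
]
tree = build_playlist_tree(playlist)
shape = tree_shape(tree)
```

Nesting albums under artists keeps two albums with the same name by different artists apart.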

Page 13: C-Tree: Cont.

A Real Example (PID: 11548):
• Playlist Title: Pop Puck
• 48 tracks belonging to 12 albums by 5 artists (2 rock bands and 3 pop punk bands)

[Diagram labels: Pop punk band, Rock band]

How do we compare the internal relationships?
How do we compare it with another tree (external)?

Page 14: C-Tree: Cont.

External comparison

Training Data: Complete Tree
Testing Data: Incomplete Tree

Incomplete Tree: a playlist that contains only part of its tracks (the seeds) and is waiting for recommendations.

Page 15: C-Tree: Cont.

Steps:
1. Building the forest: 1 million complete trees;
2. Comparing and normalizing the distance between the incomplete tree T-test and each complete tree T-train;
3. Recommending tracks (leaves) from each T-train to the incomplete tree T-test, based on the score of each leaf.

[Diagram: forest of complete trees: Playlist 1, Playlist 2, Playlist 3, Playlist 4, ..., Playlist n]
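Steps 2 and 3 can be sketched as below. The slides do not define the tree distance, so this sketch substitutes an assumed similarity: a weighted sum of Jaccard overlaps at the artist, album and track levels, with hypothetical weights, normalized over the forest. Trees are nested dicts of the form {artist: {album: {track, ...}}}.

```python
from collections import defaultdict


def levels(tree):
    """Flatten a tree into its sets of artists, albums and tracks."""
    artists = set(tree)
    albums = {(a, al) for a, als in tree.items() for al in als}
    tracks = {(a, al, t) for a, als in tree.items() for al, ts in als.items() for t in ts}
    return artists, albums, tracks


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def similarity(t_test, t_train, weights=(0.2, 0.3, 0.5)):
    """Assumed measure: weighted Jaccard overlap at artist, album and track level."""
    return sum(w * jaccard(a, b)
               for w, a, b in zip(weights, levels(t_test), levels(t_train)))


def recommend(t_test, forest, k=10):
    """Score each unseen leaf by the normalized similarity of the trees holding it."""
    sims = [similarity(t_test, t_train) for t_train in forest]
    total = sum(sims)
    seeds = levels(t_test)[2]
    scores = defaultdict(float)
    for sim, t_train in zip(sims, forest):
        if sim == 0:
            continue  # unrelated trees contribute no candidates
        for leaf in levels(t_train)[2] - seeds:
            scores[leaf] += sim / total
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [leaf for leaf, _ in ranked[:k]]


t_test = {"A": {"x": {"1"}}}
forest = [{"A": {"x": {"1", "2"}}}, {"B": {"y": {"9"}}}, {"A": {"z": {"3"}}}]
top = recommend(t_test, forest)
```

In the toy forest, the tree that shares artist, album and a track with the seed contributes its remaining leaf first, the tree that shares only the artist comes second, and the unrelated tree contributes nothing.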

Page 16: C-Tree: Result

Highlights:
1. Results increase steadily, with maximum performance at seed 25;
2. It performs better for playlists with random seed tracks (R) than for playlists with sequential seed tracks.

Page 17: TrailMix: Ensemble Model

[Diagram labels: CC-Title; ADNCF, BDNCF; AC-Tree, BC-Tree; Final Recommendation. Plot labels: Num_handout, Method 1, Method 2.]

Page 18: Experiment and Result

Experiment Setting:
• Training 80%, testing 20%: cross-validation for hyperparameter tuning
• Testing data strictly follows the rules designed by RecSys 2018

Page 19: Thank you!

Page 20: Q&A