TrailMix RecSys2018 43 -...

Xing Zhao, Qingquan Song, James Caverlee and Xia Hu

Department of Computer Science and Engineering

Texas A&M University, USA


Dataset Statistics

Items                        Quantity    Proportion
Playlists                    1,000,000
Unique tracks                2,262,292   100%
Unique tracks (freq ≥ 5)     599,341     96.05%
Unique tracks (freq ≥ 100)   70,229      80.67%
Unique albums                734,684
Unique artists               295,860

[Figure: number of remaining tracks (×10^6) and cumulative share of positive samples, plotted against the number of times a track appears in the training data (1 to 40,000).]

Therefore, in some parts of our method, we only consider these frequent tracks for training.
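The frequency cut-off can be illustrated with a minimal sketch. It assumes each playlist is a plain list of track IDs; the function name `frequency_coverage` and the toy data are hypothetical, not taken from the slides. It returns the tracks that meet a minimum frequency and the share of all playlist-track occurrences (positive samples) they cover.

```python
from collections import Counter

def frequency_coverage(playlists, min_freq):
    """Return (kept_tracks, share) for tracks appearing at least min_freq times."""
    # Count every occurrence of every track across all playlists.
    counts = Counter(track for playlist in playlists for track in playlist)
    kept = {track for track, n in counts.items() if n >= min_freq}
    total = sum(counts.values())
    covered = sum(n for track, n in counts.items() if track in kept)
    return kept, covered / total

# Toy data (hypothetical): three playlists over four tracks.
playlists = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
kept, share = frequency_coverage(playlists, min_freq=2)
print(sorted(kept), round(share, 3))
```

On the real data the same computation would be run with `min_freq=5` and `min_freq=100` to obtain the proportions listed in the table.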

Our Method - TrailMix

Components: DNCF, C-Tree, CC-Title

Playlist Continuation: for Tasks 2 to 10

Cold Start: for Task 1

[Figure: CC-Title pipeline with the stages Pre-process, Normalize, Cluster, Recommend. The word-track matrix has 9,817 words (rows) and 2,262,292 tracks (columns); each cluster pairs a word list with a track list. Example of a new title: "Pop Punk 2018 Summer".]

CC-Title: Context Clustering using Title

Matrix cell example: a value of 6 in row j, column i means that track i appears in 6 playlists whose titles contain word j.

Items                                       Quantity
Unique titles                               92,944
Unique normalized titles                    17,381
Unique non-stop normalized words            9,817
Playlists without title after processing    22,921


Steps:
1. Preprocessing: stemming, stop words, emoji, punctuation, etc.
2. Building a word-track matrix of size 9,817 x 2,262,292
3. Normalizing cells using 'IDF'
4. Clustering words based on row similarity
5. Recommending tracks in each cluster for a new title
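The five steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the slides do not give the exact IDF formula or the clustering algorithm, so the weighting `log(1 + n_words / df(track))`, the greedy cosine-threshold clustering, the tiny stop-word list, and all function names are hypothetical choices. Stemming is omitted to keep the sketch dependency-free.

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "my", "of", "for", "and"}  # tiny illustrative list

def tokenize(title):
    """Step 1: lowercase, drop punctuation/emoji, remove stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS]

def build_word_track_matrix(playlists):
    """Step 2: rows[word][track] = number of playlists titled with word that contain track."""
    rows = defaultdict(Counter)
    for title, tracks in playlists:
        for word in set(tokenize(title)):
            for track in set(tracks):
                rows[word][track] += 1
    return rows

def idf_normalize(rows):
    """Step 3 (assumed formula): scale each cell by log(1 + n_words / df(track))."""
    df = Counter(track for row in rows.values() for track in row)
    n_words = len(rows)
    return {w: {t: c * math.log(1 + n_words / df[t]) for t, c in row.items()}
            for w, row in rows.items()}

def cosine(u, v):
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def cluster_words(rows, threshold=0.5):
    """Step 4 (assumed method): a word joins the first cluster whose first word has a similar row."""
    clusters = []
    for word in sorted(rows):
        for cluster in clusters:
            if cosine(rows[word], rows[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters

def recommend(title, rows, clusters, k=3):
    """Step 5: rank tracks by summed weight over the clusters that the title's words hit."""
    words = set(tokenize(title))
    scores = Counter()
    for cluster in clusters:
        if words & set(cluster):
            for word in cluster:
                for track, weight in rows[word].items():
                    scores[track] += weight
    return [track for track, _ in scores.most_common(k)]

# Toy data (hypothetical): (title, track IDs) pairs.
playlists = [("Pop Punk!", ["t1", "t2"]), ("punk rock", ["t1", "t3"]),
             ("Chill Vibes", ["t4", "t5"]), ("chill", ["t4"])]
rows = idf_normalize(build_word_track_matrix(playlists))
clusters = cluster_words(rows)
print(clusters)
print(recommend("Summer Chill 2018", rows, clusters))
```

A title whose words never occurred in training hits no cluster and yields an empty list, which is consistent with the table's count of playlists left without a title after processing.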

CC-Title: Cont.


Highlight:
1. CC-Title handles large-scale matrix computation with high efficiency.
2. In some cases (clusters), the performance is very good.

CC-Title: Cont.

Pros:
1. Simple and generic
2. Combines the advantages of the basic matrix factorization model and the MLP.

Cons:
Not computationally efficient enough to be applied directly to the target problem, due to the huge item scope and the matrix sparsity.

DNCF: Decorated Neural Collaborative Filtering


He et al., "Neural Collaborative Filtering". WWW, 2017.

Neural Collaborative Filtering

DNCF: Cont.


Two modifications to address the efficiency issue:

Training Phase: Constrained Negative Sampling.

Testing Phase: Constrained Recommendation with Reordering.

1. Constrain the negative sampling space to the tracks that appear 100 or more times in the training data.

2. Positive samples are still drawn from the whole dataset during training, so that feasible embeddings and predictions are preserved for all the testing data. (Task 2-10)
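The two points above can be sketched as follows. This is a minimal illustration, not the authors' training code: the function names, the `negatives_per_positive` ratio, and the toy threshold of 2 (instead of 100) are assumptions made for the example. Positives cover every track in every playlist, while negatives are drawn only from the popular pool.

```python
import random
from collections import Counter

def popular_tracks(playlists, min_freq=100):
    """Tracks that appear at least min_freq times in the training playlists."""
    counts = Counter(track for playlist in playlists for track in playlist)
    return sorted(track for track, n in counts.items() if n >= min_freq)

def sample_training_pairs(playlists, negatives_per_positive=4, min_freq=100, seed=0):
    """Return (playlist_id, track, label) triples.

    Positives: every track of every playlist (label 1), rare tracks included.
    Negatives: sampled only from popular tracks absent from the playlist (label 0).
    """
    rng = random.Random(seed)
    pool = popular_tracks(playlists, min_freq)
    samples = []
    for pid, tracks in enumerate(playlists):
        present = set(tracks)
        candidates = [t for t in pool if t not in present]
        for track in tracks:
            samples.append((pid, track, 1))
            if candidates:  # skip negatives if the playlist already holds the whole pool
                for neg in rng.choices(candidates, k=negatives_per_positive):
                    samples.append((pid, neg, 0))
    return samples

# Toy data (hypothetical), with the threshold lowered to 2 so the pool is non-empty.
playlists = [["a", "b", "rare1"], ["a", "b", "c"], ["c", "rare2"]]
samples = sample_training_pairs(playlists, negatives_per_positive=2, min_freq=2)
print(len(samples), samples[:4])
```

Restricting the pool shrinks the negative space from millions of tracks to the popular subset, while rare tracks still receive gradient updates through their positive pairs.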

[Figure: number of remaining tracks (×10^6) and cumulative share of positive samples, plotted against the number of times a track appears in the training data (1 to 40,000).]

Training Phase: Constrained Negative Sampling.

DNCF: Cont.

1. Constrain the recommendation space during the testing phase by recommending only popular tracks (appearing 100 or more times), for a more targeted prediction.

2. Reorder the 500 predicted tracks with an ensemble trick that leverages two types of predictions provided by the Word2Vec embedding.

Testing Phase: Constrained Recommendation with Reordering.
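A rough sketch of the testing phase follows. The first function restricts candidates to an allowed (popular) set and drops seed tracks. The second is only a stand-in for the reordering step: the slides do not describe the ensemble trick, so a simple mean-rank fusion over the DNCF list and two other ranked lists is assumed here purely for illustration. All names and the toy scores are hypothetical.

```python
def constrained_top_k(scores, allowed, seeds, k=500):
    """Top-k tracks by score, limited to `allowed` and excluding the playlist's seeds."""
    seeds = set(seeds)
    candidates = [(score, track) for track, score in scores.items()
                  if track in allowed and track not in seeds]
    # Highest score first; ties broken by track ID for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [track for _, track in candidates[:k]]

def reorder_by_mean_rank(primary, *others):
    """Assumed stand-in for the ensemble reordering: sort `primary` by mean rank.

    A track missing from a list receives that list's length as its rank.
    The set of recommended tracks is unchanged; only their order moves.
    """
    def rank(ranked, track):
        return ranked.index(track) if track in ranked else len(ranked)

    lists = (primary, *others)
    return sorted(primary,
                  key=lambda t: (sum(rank(r, t) for r in lists) / len(lists),
                                 primary.index(t)))

# Toy data (hypothetical): model scores, popular set, and one seed track.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "rare": 0.95}
top = constrained_top_k(scores, allowed={"a", "b", "c"}, seeds=["a"], k=2)
final = reorder_by_mean_rank(top, ["c", "b"], ["c"])
print(top, final)
```

Note that the highest-scoring track `rare` is never recommended because it lies outside the allowed set, which is the trade-off the constraint accepts.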

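[Figure: reordering ensemble. The columns DNCF, Word2Vec (1), and Word2Vec (2) are labelled L1, L2, L3 and φ1, φ2, φ3; the diagram also shows the combination φ1 \ L1 ∪ L2 ∪ L3.]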

DNCF: Cont.


Highlight:
1. Results steadily increase, with maximum performance at seed 25;
2. It performs better for playlists with random seeding tracks (R) than for those with sequential seeding tracks.

DNCF: Result

C-Tree: Constructed Tree

A playlist is:
1. A natural tree structure: a playlist consists of different tracks, and each track belongs to a specific album of an artist;
2. A meaningful cluster: the tracks in a specific playlist always share latent similarity, such as genre, style, listening sense, etc.


Phylogenetic Tree. (Source: https://www.creative-biostructure.com/custom-phylogenetic-tree-construction-service-399.htm)


A Real Example (PID: 11548):
• Playlist Title: Pop Punk
• 48 tracks belong to 12 albums by 5 artists (2 rock bands and 3 pop punk bands)

[Figure legend: pop punk band, rock band]

How do we compare the internal relationships?
How do we compare it with another tree (external)?

C-Tree: Cont.


Training Data: Complete Tree
Testing Data: Incomplete Tree

External comparison
Incomplete Tree: a playlist that contains only part of its tracks (the seed) and is waiting for recommendations.

C-Tree: Cont.

Steps:

1. Building Forest: 1 million complete trees;

2. Comparing and normalizing the distance between the incomplete tree T-test and each complete tree T-train;

3. Recommending the tracks (leaves) from each T-train to the incomplete tree T-test, based on the score of each leaf.
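The three steps can be sketched as below. The slides do not specify the tree distance or the leaf score, so this sketch makes loud assumptions: each playlist tree (artist, then album, then track) is flattened into a set of labelled nodes, similarity between trees is the Jaccard index of those sets (already normalized to [0, 1]), and a leaf's score is the sum of similarities of the training trees that contain it. Function names and toy data are hypothetical.

```python
from collections import Counter

def tree_nodes(tracks):
    """Flatten a playlist tree (artist -> album -> track) into a set of labelled nodes."""
    nodes = set()
    for artist, album, track in tracks:
        nodes.add(("artist", artist))
        nodes.add(("album", artist, album))
        nodes.add(("track", artist, album, track))
    return nodes

def similarity(test_nodes, train_nodes):
    """Assumed stand-in for the normalized tree comparison: Jaccard index of node sets."""
    union = test_nodes | train_nodes
    return len(test_nodes & train_nodes) / len(union) if union else 0.0

def recommend_leaves(seed_tracks, forest, k=500):
    """Score each unseen leaf by the summed similarity of the training trees containing it."""
    test_nodes = tree_nodes(seed_tracks)
    seeds = set(seed_tracks)
    scores = Counter()
    for train_tracks in forest:
        sim = similarity(test_nodes, tree_nodes(train_tracks))
        if sim == 0.0:
            continue  # unrelated tree, contributes nothing
        for leaf in set(train_tracks) - seeds:
            scores[leaf] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [leaf for leaf, _ in ranked[:k]]

# Toy forest (hypothetical): each track is an (artist, album, track) triple.
forest = [
    [("A", "A1", "t1"), ("A", "A1", "t2"), ("B", "B1", "t3")],
    [("A", "A2", "t4"), ("C", "C1", "t5")],
    [("D", "D1", "t6")],
]
seed = [("A", "A1", "t1")]
result = recommend_leaves(seed, forest)
print(result)
```

Because artist and album nodes are part of the comparison, a training playlist that shares only an artist with the seed still contributes, with a lower weight than one sharing an album or a track.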

[Figure: forest of playlist trees: Playlist 1, Playlist 2, Playlist 3, Playlist 4, …, Playlist n.]

C-Tree: Cont.


C-Tree: Result

Highlight:
1. Results steadily increase, with maximum performance at seed 25;
2. It performs better for playlists with random seeding tracks (R) than for those with sequential seeding tracks.

TrailMix: Ensemble Model

[Figure: ensemble diagram. Labels: CC-Title; ADNCF, BDNCF; AC-Tree, BC-Tree; Num_handout; Method 1; Method 2; Final Recommendation.]

Experiment and Result

Experiment Setting:
• Training 80%, testing 20%: cross-validation for hyperparameter tuning
• Testing data strictly follows the rules designed by RecSys 2018


Thank you!


Q&A
