Date post: 22-Aug-2020
Temporal Skeletonization on Sequential Data

Temporal Skeletonization on Sequential Data: Patterns, Categorization, and Visualization

Chuanren Liu, Rutgers University

Chuanren Liu¹, Kai Zhang², Hui Xiong¹, Geoff Jiang², Qiang Yang³

¹ Rutgers University

² NEC Laboratories America, Inc.

³ Hong Kong University of Science and Technology

1 / 22

Background

Sequential Pattern Analysis

Diverse applications in dynamic business environments.

Challenge: determining the right granularity for sequential pattern mining under the curse of cardinality.

Cardinality of sequential data: the number of distinct symbols.

2 / 22

Introduction

Business-to-Business Purchase Pattern Analysis

Challenge: the items are symbolic, and their stages are unknown.

3 / 22

Introduction

Motivation: Curse of Cardinality

[Plot: running time (seconds, log scale) vs. support for GSP, SPADE, PrefixSpan, SPAM, and Ours.]

Complexity

p: pattern; c: cardinality; ℓ: pattern length

$\Pr(p) = c^{-\ell}$

Rareness
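The rareness formula can be checked numerically. The sketch below assumes that each position of a pattern is an independent, uniform draw from c symbols (an illustrative assumption; the slide does not state the data model), and the function names are hypothetical. It compares the closed form with a Monte Carlo estimate, then contrasts a length-3 pattern at the cardinality of the 5028 raw event symbols with the same length at 13 temporal clusters (both counts appear in the B2B study later in this deck).

```python
import random


def pattern_probability(c: int, length: int) -> float:
    """Closed form Pr(p) = c^(-length) for one fixed pattern.

    Assumes each of the `length` positions is an independent, uniform draw
    from c symbols (illustrative assumption, not stated on the slide).
    """
    return c ** (-length)


def estimate_pattern_probability(c: int, pattern, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of random windows that equal `pattern`."""
    rng = random.Random(seed)
    target = list(pattern)
    hits = 0
    for _ in range(trials):
        window = [rng.randrange(c) for _ in target]
        hits += window == target
    return hits / trials


if __name__ == "__main__":
    print(pattern_probability(4, 2), estimate_pattern_probability(4, (0, 1)))
    # Raw event symbols vs. temporal clusters, pattern length 3.
    print(pattern_probability(5028, 3), pattern_probability(13, 3))
```

Under this model every extra symbol in the pattern divides its probability by c, which is why long patterns over thousands of symbols fall below any practical support threshold.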

m,h,j,f,d,a,i,k,b

j,l,m,a,n,f,b,o,g

e,h,l,c,f,n,i,b,o

h,l,e,c,a,f,k,o,i

Granularity

m,k,j,f,d,a,i,h,b

j,l,m,a,n,f,b,o,g

e,h,l,c,f,n,i,b,o

h,l,e,c,m,f,k,o,i

Noise

4 / 22

Introduction

Existing Approaches

Figure: Taxonomy of symbols.

s     x   y   z
s1    *   *   *
s2    *   *   *
s3    *   *   *
s4    *   *   *
s5    *   *   *

Table: Features of symbols.

Disadvantages:

- It is difficult to obtain knowledge of the items.
- It is difficult to define distances between items and to cluster them.
- Items are grouped irrespective of the temporal content.
- Relevant temporal structure cannot be identified.

5 / 22

Introduction

Our Approach: Temporal Skeletonization

Proactively reduce cardinality by means of temporal clusters:

Summarize the temporal correlation in a graph.

Embed the graph in a low dimensional space.

Identify temporal clusters.

Re-encode sequences with temporal clusters.

[Figure: three scatter plots of the 2-D embedding (x vs. y): (a) Random sequences; (b) Seqs with temporal clusters; (c) Customer event sequences.]

Figure: The embedding of items in different types of sequence data.

6 / 22

Temporal Skeletonization

Graph Model for Temporal Skeletonization I

Notations:

Symbols $S = \{e_1, e_2, \dots, e_{|S|}\}$.

Sequence $S_n = (s^n_1, s^n_2, \dots, s^n_{T_n})$.

Sequences $\{S_n \mid n = 1, 2, \dots, N\}$.

Objective: a coding scheme $y = f(e) \in \{1, 2, \dots, K\}$ obtained by

$$\min_f \; \frac{1}{N} \sum_{n=1}^{N} \sum_{\substack{1 \le p, q \le T_n \\ |p - q| \le r}} \big(f(s^n_p) - f(s^n_q)\big)^2.$$

Relax the integer constraints to:

$y_i = f(e_i) \in \mathbb{R}$.
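To make the objective concrete, the sketch below evaluates it for a given integer coding f. It reads r as the maximum positional distance |p − q| at which two symbols count as temporally correlated, and it sums every ordered pair (p, q) in the window, including p = q, which contributes zero. The function name and the toy sequences are illustrative only.

```python
def skeleton_objective(sequences, f, r):
    """(1/N) * sum_n sum_{|p-q|<=r} (f(s_p) - f(s_q))^2 for a coding f given as a dict."""
    total = 0.0
    for seq in sequences:
        T = len(seq)
        for p in range(T):
            # every q with |p - q| <= r that stays inside the sequence
            for q in range(max(0, p - r), min(T, p + r + 1)):
                total += (f[seq[p]] - f[seq[q]]) ** 2
    return total / len(sequences)


# Two toy sequences that move from symbols {a, b} to symbols {c, d}.
seqs = ["ababcdcd", "babadcdc"]
good = {"a": 1, "b": 1, "c": 2, "d": 2}  # groups symbols that co-occur
bad = {"a": 1, "b": 2, "c": 1, "d": 2}   # splits co-occurring symbols
print(skeleton_objective(seqs, good, r=1), skeleton_objective(seqs, bad, r=1))  # prints: 2.0 14.0
```

The coding that keeps co-occurring symbols in the same group pays only at the single stage boundary of each sequence, so it scores far lower than the coding that separates them.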

7 / 22

Temporal Skeletonization

Graph Model for Temporal Skeletonization II

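Define the temporal graph $W$ as

$$W_{ij} = \frac{1}{N} \sum_{n=1}^{N} \sum_{\substack{1 \le p, q \le T_n \\ |p - q| \le r}} [s^n_p = e_i \wedge s^n_q = e_j],$$

where $[\cdot]$ is 1 if the condition holds and 0 otherwise.

Then our objective is to minimize

$$\sum_{i,j} W_{ij} (y_i - y_j)^2.$$

This is a standard graph-based optimization problem (with constraints that rule out trivial solutions).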
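The graph form can be checked against the pairwise objective. The sketch below (NumPy; helper names are hypothetical) accumulates W over every ordered pair (p, q) in the window, so W is symmetric. For symmetric W, expanding the square gives Σ_ij W_ij (y_i − y_j)² = 2 yᵀ(D − W)y with D = diag(Σ_j W_ij), the unnormalized graph Laplacian form; the code checks both identities on random data.

```python
import numpy as np


def temporal_graph(sequences, symbols, r):
    """W_ij = (1/N) * #ordered pairs (p, q), |p - q| <= r, with s_p = e_i and s_q = e_j."""
    index = {e: i for i, e in enumerate(symbols)}
    W = np.zeros((len(symbols), len(symbols)))
    for seq in sequences:
        T = len(seq)
        for p in range(T):
            for q in range(max(0, p - r), min(T, p + r + 1)):
                W[index[seq[p]], index[seq[q]]] += 1.0
    return W / len(sequences)


def pairwise_objective(sequences, y, symbols, r):
    """Direct evaluation of (1/N) * sum over window pairs of (y[s_p] - y[s_q])^2."""
    index = {e: i for i, e in enumerate(symbols)}
    total = 0.0
    for seq in sequences:
        T = len(seq)
        for p in range(T):
            for q in range(max(0, p - r), min(T, p + r + 1)):
                total += (y[index[seq[p]]] - y[index[seq[q]]]) ** 2
    return total / len(sequences)


rng = np.random.default_rng(0)
symbols = list("abcdef")
sequences = [list(rng.choice(symbols, size=12)) for _ in range(5)]
y = rng.normal(size=len(symbols))
W = temporal_graph(sequences, symbols, r=2)
L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian

graph_form = float(np.sum(W * (y[:, None] - y[None, :]) ** 2))
print(np.isclose(graph_form, pairwise_objective(sequences, y, symbols, r=2)))
print(np.isclose(graph_form, 2 * y @ L @ y))
```

The Laplacian form is what turns the relaxed problem into an eigenvector computation once a normalization constraint excludes the constant solution.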

8 / 22

Temporal Skeletonization

Embedding and Visualization

Compute eigenvectors of graph Laplacian as embedding.

Cluster the items in the embedding space.

Transform raw sequences to sequences of temporal clusters.
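The three steps above can be prototyped in a few lines. The sketch below is one plausible reading, not necessarily the exact procedure used here: it takes the eigenvectors of the unnormalized Laplacian L = D − W with the smallest eigenvalues after the trivial one as coordinates (assuming a connected graph, so only the first eigenvector is constant), clusters them with plain k-means using deterministic farthest-first initialization, and rewrites each sequence as its cluster labels. All names and the toy data are hypothetical.

```python
import numpy as np


def temporal_graph(sequences, symbols, r):
    """Symmetric co-occurrence counts within a window of width r, averaged over sequences."""
    index = {e: i for i, e in enumerate(symbols)}
    W = np.zeros((len(symbols), len(symbols)))
    for seq in sequences:
        for p in range(len(seq)):
            for q in range(max(0, p - r), min(len(seq), p + r + 1)):
                W[index[seq[p]], index[seq[q]]] += 1.0
    return W / len(sequences)


def laplacian_embedding(W, dim):
    """Eigenvectors of L = D - W for the `dim` smallest eigenvalues after the trivial one."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return vecs[:, 1:dim + 1]    # drop the constant eigenvector (connected graph assumed)


def kmeans(X, k, iters=100):
    """Plain Lloyd iterations with farthest-first initialisation (deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


symbols = list("abcdef")
sequences = [list("abcabcdefdef"), list("bcaacbfedfde"), list("cabdefefd")]
W = temporal_graph(sequences, symbols, r=1)
coords = laplacian_embedding(W, dim=1)
labels = kmeans(coords, k=2)
cluster_of = {e: int(c) for e, c in zip(symbols, labels)}

# Re-encode each raw sequence as a sequence of temporal clusters.
encoded = [[cluster_of[s] for s in seq] for seq in sequences]
print(cluster_of)
print(encoded[0])
```

In the toy data, {a, b, c} and {d, e, f} co-occur mostly among themselves, so the second eigenvector separates the two groups and the re-encoded sequences switch cluster label once, at the stage boundary.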

[Figure: three scatter plots of the 2-D embedding (x vs. y): (a) Random sequences; (b) Seqs with temporal clusters; (c) Customer event sequences.]

Figure: The embedding of symbols in different types of sequence data.

9 / 22

Temporal Skeletonization

Post-Temporal-Smoothing

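Objective: Further reduce temporal variations in individual sequences.

Solution: Gaussian mixture models with fused lasso.

Notation: given the soft stages $Y \in \mathbb{R}^{T \times K}$ of a sequence, compute a smoother version $X^n$ (written $X$ below):

Optimization:

$$\max_X \; \sum_{t=1}^{T} \sum_{k=1}^{K} X_{tk} Y_{tk} \quad \text{subject to} \quad \frac{1}{T-1} \sum_{t=1}^{T-1} \|X_t - X_{t+1}\|_1 \le \lambda, \qquad \sum_{k=1}^{K} X_{tk} = 1, \quad X_{tk} \ge 0 \;\; \text{for all } t, k.$$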
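Because the objective and constraints are linear once each ℓ1 term is bounded by auxiliary variables U_tk ≥ |X_tk − X_(t+1)k|, the smoothing step can be posed as a linear program. The sketch below solves it with SciPy's `linprog` (HiGHS method); it assumes Y is already available, for example as per-step soft stage memberships from a mixture model, and the function name is hypothetical. With λ = 0 the stage profile is forced to be constant over time, while a large λ lets each row move to its own highest-scoring stage.

```python
import numpy as np
from scipy.optimize import linprog


def smooth_stages(Y, lam):
    """Solve max sum_tk X_tk Y_tk  s.t.  mean_t ||X_t - X_{t+1}||_1 <= lam, rows of X on the simplex.

    Variable vector: X (T*K entries, row-major) followed by U ((T-1)*K entries).
    """
    T, K = Y.shape
    nx, nu = T * K, (T - 1) * K
    cost = np.concatenate([-Y.ravel(), np.zeros(nu)])  # linprog minimises, so negate

    rows, rhs = [], []
    for t in range(T - 1):
        for k in range(K):
            a, b, u = t * K + k, (t + 1) * K + k, nx + t * K + k
            for sign in (1.0, -1.0):  # sign * (X_tk - X_(t+1)k) - U_tk <= 0
                row = np.zeros(nx + nu)
                row[a], row[b], row[u] = sign, -sign, -1.0
                rows.append(row)
                rhs.append(0.0)
    rows.append(np.concatenate([np.zeros(nx), np.ones(nu)]))  # total-variation budget
    rhs.append(lam * (T - 1))

    A_eq = np.zeros((T, nx + nu))
    for t in range(T):
        A_eq[t, t * K:(t + 1) * K] = 1.0  # sum_k X_tk = 1

    res = linprog(cost, A_ub=np.array(rows), b_ub=np.array(rhs),
                  A_eq=A_eq, b_eq=np.ones(T), bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:nx].reshape(T, K)


# One-hot soft stages with a single outlier at t = 2.
Y = np.array([[1, 0], [1, 0], [0, 1], [1, 0], [1, 0]], dtype=float)
print(np.round(smooth_stages(Y, lam=0.0), 3))  # no change allowed between steps
print(np.round(smooth_stages(Y, lam=1.0), 3))  # budget allows two switches
```

The auxiliary-variable trick is the standard way to linearize an ℓ1 budget: any feasible U upper-bounds the true variation, and setting U to the absolute differences is always feasible, so both problems share the same optimal X.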

10 / 22

Temporal Skeletonization

Applications

1 Sequence visualization

Visualize each symbol by its embedding coordinates.
Visualize each sequence as a trajectory.

2 Sequential pattern mining

Temporal clusters serve as a new granularity.
The curse of cardinality can be relieved.

3 Sequence clustering

Noise is removed.
Features become more meaningful.
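For the pattern-mining application, once every symbol carries a temporal-cluster label, support can be counted at the coarser granularity. The sketch below uses the usual frequent-sequence-mining definition of support (the fraction of sequences containing the pattern as a possibly non-contiguous subsequence). Merging consecutive repeated labels is an extra assumption made here for readability, and the cluster mapping, names, and data are hypothetical.

```python
def reencode(seq, cluster_of, collapse=True):
    """Replace symbols by temporal-cluster labels; optionally merge consecutive repeats."""
    out = []
    for s in seq:
        c = cluster_of[s]
        if not (collapse and out and out[-1] == c):
            out.append(c)
    return out


def contains(seq, pattern):
    """True if `pattern` occurs in `seq` as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(any(x == p for x in it) for p in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


cluster_of = {"a": "A", "b": "A", "c": "B", "d": "B", "e": "C"}  # hypothetical clusters
raw = [list("abcde"), list("badc"), list("eac")]
encoded = [reencode(s, cluster_of) for s in raw]
print(encoded)                       # [['A', 'B', 'C'], ['A', 'B'], ['C', 'A', 'B']]
print(support(encoded, ["A", "B"]))  # 1.0
print(support(raw, ["a", "d"]))      # 2/3 of the raw sequences
```

The stage-level pattern A → B is supported by every sequence, while no single raw symbol pair captures that transition in all three.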

11 / 22

Simulated Study

Data

5000 sequences with:

Stages: A, B, C, D, E

Patterns: (A → B → C → D), (B → E → C)

Symbols: 25 for each stage.

Simulation process:

Determine the stage to sample from based on the pattern.

Sample $d$ symbols from the stage, where

$d \sim (1 - p)\,p^{d-1}$, $p = \frac{14}{15}$, $\mathbb{E}[d] = \frac{1}{1 - p} = 15$.
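The simulation protocol can be sketched as follows. Only the five stages, the two patterns, the 25 symbols per stage, the 5000 sequences, and the geometric duration with p = 14/15 come from the slide; drawing symbols uniformly within a stage, naming them like "A7", and alternating the two patterns across sequences are assumptions for illustration. Durations are drawn by inverting the geometric tail Pr(d > k) = p^k.

```python
import math
import random

SYMBOLS_PER_STAGE = 25
PATTERNS = [("A", "B", "C", "D"), ("B", "E", "C")]
P = 14 / 15  # E[d] = 1 / (1 - P) = 15


def sample_duration(rng, p=P):
    """Draw d with Pr(d) = (1 - p) * p**(d - 1), d = 1, 2, ..., via Pr(d > k) = p**k."""
    u = 1.0 - rng.random()  # u in (0, 1], so log(u) is finite
    return 1 + int(math.log(u) / math.log(p))


def simulate_sequence(rng, pattern):
    """Walk through the stages of `pattern`, emitting d symbols from each stage."""
    seq = []
    for stage in pattern:
        d = sample_duration(rng)
        # Assumption: symbols are uniform within a stage and named like "A7".
        seq.extend(f"{stage}{rng.randrange(SYMBOLS_PER_STAGE)}" for _ in range(d))
    return seq


def simulate(n=5000, seed=0):
    rng = random.Random(seed)
    # Assumption: the two ground-truth patterns alternate across sequences.
    return [(PATTERNS[i % len(PATTERNS)], simulate_sequence(rng, PATTERNS[i % len(PATTERNS)]))
            for i in range(n)]


data = simulate()
rng = random.Random(1)
mean_d = sum(sample_duration(rng) for _ in range(100_000)) / 100_000
print(len(data), round(mean_d, 2))
```

Keeping the ground-truth pattern next to each sequence makes it easy to score both stage recovery and sequence clustering afterwards.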

12 / 22

Simulated Study

Baselines I

Using the temporal clusters (stages) identified by our approach, the mining process finishes in less than 1 second and exactly recovers the two ground-truth stage-wise patterns (when the support threshold is at most 0.5).

[Plots: (left) running time (seconds, log scale) vs. support for GSP, SPADE, PrefixSpan, SPAM, and Ours; (right) number of patterns (log scale) vs. support for Others and Ours.]

Figure: FSM algorithms on the simulated data.

13 / 22

Simulated Study

Baselines II

HMMC iterations:

1 For all sequences in cluster $C_k$, estimate a transition matrix $\phi_k$ and an emission matrix $\theta_k$;

2 Reallocate each sequence $S_n$ to the cluster $C_k$ under whose transition and emission matrices it has the highest probability of being produced, i.e., $k = \arg\max_k \Pr(S_n \mid \phi_k, \theta_k)$.
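The reallocation step needs Pr(S_n | φ_k, θ_k), which for an HMM is computed with the forward algorithm. The sketch below implements a scaled forward pass and step 2 in plain Python. The slide does not mention an initial-state distribution, so one (π) is added as an assumption; step 1, re-estimating φ_k and θ_k (for example with Baum–Welch), is omitted. Names and numbers are hypothetical.

```python
import math


def log_likelihood(seq, pi, phi, theta):
    """log Pr(seq | pi, phi, theta) with the scaled forward algorithm.

    phi[i][j]   = Pr(state j at t + 1 | state i at t)   (transition matrix)
    theta[i][o] = Pr(symbol o | state i)                (emission matrix)
    pi[i]       = Pr(state i at t = 0)                  (assumed; not on the slide)
    """
    H = len(pi)
    alpha = [pi[i] * theta[i][seq[0]] for i in range(H)]
    loglik = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [sum(alpha[i] * phi[i][j] for i in range(H)) * theta[j][seq[t]]
                     for j in range(H)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik


def reallocate(sequences, models):
    """HMMC step 2: send each sequence to argmax_k Pr(S_n | model k)."""
    return [max(range(len(models)), key=lambda k: log_likelihood(s, *models[k]))
            for s in sequences]


phi = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
model_0 = (pi, phi, [[0.9, 0.1], [0.8, 0.2]])  # mostly emits symbol 0
model_1 = (pi, phi, [[0.1, 0.9], [0.2, 0.8]])  # mostly emits symbol 1
print(reallocate([[0, 0, 0, 0], [1, 1, 1, 0]], [model_0, model_1]))  # [0, 1]
```

Working with log-likelihoods and per-step scaling keeps the comparison stable for long sequences, where raw probabilities would underflow.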

Task        Sequence clustering    Stage recovery
Method      HMM      Ours          HMM      Ours
Precision   0.997    1             0.488    1
Recall      0.997    1             0.448    1

Table: Precision and recall of HMM and our method on two tasks.

14 / 22

Simulated Study

Our Results

We can perfectly recover the stages.

We can perfectly cluster the sequences.

We do not require prior knowledge of the data.

[Figure: (a) Five symbol clusters, labeled A, B, C, D, E, in the 2-D embedding; (b) Two sequence clusters.]

15 / 22

B2B Purchase Pattern Analysis

Data Description

A large volume of customer event data:

88040 customers

5028 event symbols

248725 event records

16 / 22

B2B Purchase Pattern Analysis

Embedding Results and Buying Stages

[Figure: 2-D embedding of the event symbols, grouped into clusters C1–C13.]

C     Top keywords                    Size
C1    Official Website                12
C2    Corporate Event, Direct Mail    20
C3    Trial Product Download          45
C4    Conference                      27
C5    Unsubscribe                     38
C6    Webinar                         101
C7    Trial Product Download          70
C8    Tradeshow                       37
C9    Corporate Event, Direct Mail    65
C10   Web Marketing Ads               13
C11   Webinar                         21
C12   Webinar                         42
C13   Search Engine                   12

17 / 22

B2B Purchase Pattern Analysis

Sequential Patterns in B2B customer event data

[Plots: (left) running time (seconds, log scale) vs. support (×10^-2) for GSP, SPADE, PrefixSpan, SPAM, and Ours; (right) number of patterns (log scale) vs. support (×10^-2) for Others and Ours.]

Figure: Sequential patterns in B2B customer event data.

18 / 22

B2B Purchase Pattern Analysis

Critical Buying Paths

[Figure: critical buying paths P1–P5 drawn over the 2-D symbol embedding, with labels Unsubscribe, Mail, Tradeshow, Conference, Search, Ads, Website, and Webinar.]

Class          P    Path                         Keywords                                            Size
Successful     P1   C10 → C1 → C7 → C12 → C8     Ads → Website → Download → Webinar → Tradeshow      933
               P2   C13 → C3 → C11 → C12 → C8    Search → Download → Webinar → Webinar → Tradeshow   1110
               P3   C6 → C11 → C12 → C8          Webinar → Webinar → Webinar → Tradeshow             702
Unsuccessful   P4   C2 → C9 → C5                 Mail → Corporate Event → Unsubscribe                423
               P5   C11 → C4 → C5                Webinar → Conference → Unsubscribe                  333

19 / 22

B2B Purchase Pattern Analysis

Comparison with baselines

[Figure: states H1–H5.]

State   Top keywords
H1      Seminar, Official Website, Trial Download
H2      Seminar, Corporate Event, Tradeshow
H3      Trial Download, Seminar, Corporate Event
H4      Seminar, Conference, Corporate Event
H5      Seminar, Unsubscribe, Corporate Event

20 / 22

Summary

Temporal skeletonization

A new approach to addressing the curse of cardinality:

Translate rich temporal content into a topological space.

Explore, quantify, and visualize temporal structures.

Applied to B2B customer event data:

Discover critical purchasing patterns.

Identify the dynamic buying stages of customers.

Improve marketing practice.

21 / 22

Summary

Thanks!

22 / 22

