FTW: Fast Similarity Search under the Time Warping Distance

Post on 19-Dec-2021

Yasushi Sakurai (NTT Cyber Space Labs)
Masatoshi Yoshikawa (Nagoya Univ.)
Christos Faloutsos (Carnegie Mellon Univ.)

PODS 2005 Y. Sakurai et al 2

Motivation

- Time-series data
  - Many applications: computational biology, astrophysics, geology, meteorology, multimedia, economics
- Similarity search
  - Euclidean distance
  - DTW (Dynamic Time Warping)
    - Useful for different sequence lengths
    - Different sampling rates
    - Scaling along the time axis

Mini-introduction to DTW

- DTW allows sequences to be stretched along the time axis
  - Minimize the distance of sequences
  - Insert 'stutters' into a sequence
  - THEN compute the (Euclidean) distance

[Figure: an original sequence and the same sequence with 'stutters' inserted]

Mini-introduction to DTW

- DTW is computed by dynamic programming
  - Warping path: set of grid cells in the time warping matrix

[Figure: time warping matrix for a data sequence P = (p1, …, pi, …, pN) of length N and a query sequence Q = (q1, …, qj, …, qM) of length M, showing p-stutters, q-stutters, and the optimum warping path (the best alignment)]

Mini-introduction to DTW

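- DTW is computed by dynamic programming

$$
D_{dtw}(P, Q) = f(N, M)
$$

$$
f(i, j) = \lVert p_i - q_j \rVert + \min
\left\{
\begin{array}{ll}
f(i, j-1) & \text{(p-stutter)} \\
f(i-1, j) & \text{(q-stutter)} \\
f(i-1, j-1) & \text{(no stutter)}
\end{array}
\right.
$$

Here f(i, j) covers the prefixes p1, p2, …, pi and q1, q2, …, qj.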
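As a rough illustration of the dynamic program on this slide, the following Python sketch fills the time warping matrix cell by cell. The function name `dtw`, the choice of |p_i - q_j| as the local distance, and the boundary values (f(0,0) = 0, infinity elsewhere on the border) are assumptions made for this example; the slide does not state them.

```python
import math

def dtw(p, q):
    """DTW distance between sequences p and q (local cost |p_i - q_j|, an assumption)."""
    n, m = len(p), len(q)
    # f[i][j] holds the DTW distance between the prefixes p[:i] and q[:j].
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0  # assumed boundary condition
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(p[i - 1] - q[j - 1])
            f[i][j] = cost + min(
                f[i][j - 1],      # p-stutter: p_i is matched again
                f[i - 1][j],      # q-stutter: q_j is matched again
                f[i - 1][j - 1],  # no stutter
            )
    return f[n][m]

# The second sequence repeats the value 2, so one stutter aligns them exactly.
print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The double loop visits every cell once, which is where the O(NM) cost mentioned later in the talk comes from.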

Mini-introduction to DTW

- Global constraints limit the warping scope
  - Warping scope: area that the warping path is allowed to visit

[Figure: two time warping matrices of P (p1 … pN) and Q (q1 … qM), one with a Sakoe-Chiba Band and one with an Itakura Parallelogram as the warping scope]

Mini-introduction to DTW

- Width of the warping scope W is user-defined

[Figure: two time warping matrices of P (p1 … pN) and Q (q1 … qM) with Sakoe-Chiba Bands of widths W1 and W2]
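One way to picture a user-defined warping scope is to restrict the dynamic program to cells with |i - j| <= W, which is a common way to express a Sakoe-Chiba band. The sketch below assumes that band definition, the local cost |p_i - q_j|, and the hypothetical name `dtw_band`; none of these details are given on the slide.

```python
import math

def dtw_band(p, q, w):
    """DTW restricted to a band |i - j| <= w (assumed Sakoe-Chiba band definition)."""
    n, m = len(p), len(q)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay at infinity.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(p[i - 1] - q[j - 1])
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]

p, q = [0, 1, 1], [0, 0, 1]
print(dtw_band(p, q, 0))  # 1.0: no warping allowed, plain point-wise distance
print(dtw_band(p, q, 1))  # 0.0: one step of warping aligns the sequences
```

With this definition the result is infinite whenever the lengths differ by more than w, because the final cell falls outside the band. A narrower band also means fewer cells to fill, so W trades accuracy of alignment against computation.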

Motivation

- Similarity search for time-series data
  - DTW (Dynamic Time Warping)
    - Scaling along the time axis
- But…
  - High search cost O(NM)
  - Prohibitive for long sequences

Our Solution, FTW

- Requirements:
  1. Fast
  2. No false dismissals
  3. No restriction on the sequence length
     - It should handle data sequences of different lengths
  4. Support for any, as well as for no, restriction on “warping scope”

Problem Definition

- Given
  - S time-series data sequences of unequal lengths {P1, P2, …, PS},
  - a query sequence Q,
  - an integer k,
  - (optionally) a warping scope W,
- Find the k-nearest neighbors of Q from the data sequence set by using DTW with W

Overview

- Introduction
- Related work
- Main ideas
- Experimental results
- Conclusions

Related Work

- Sequence indexing
  - Agrawal et al. (FODO 1998)
  - Keogh et al. (SIGMOD 2001)
  - …
- Subsequence matching
  - Faloutsos et al. (SIGMOD 1994)
  - Moon et al. (SIGMOD 2002)
  - …

Related Work

- Fast sequence matching for DTW
  - Yi et al. (ICDE 1998)
  - Kim et al. (ICDE 2001)
  - Chu et al. (SDM 2002)
  - Keogh (VLDB 2002)
  - Zhu et al. (SIGMOD 2003)
  - …
- None of the existing methods for DTW fulfills all the requirements

Main Idea (1) - LBS

- LBS (Lower Bounding distance measure with Segmentation)
- P^A: approximate sequences
  - p_i^R = (p_i^L : p_i^U): segment range
  - p_i^U: upper value
  - p_i^L: lower value
  - t: length of time intervals

[Figure: P^A drawn as four segment ranges p_1^R, p_2^R, p_3^R, p_4^R, each spanning a time interval of length t]
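The segmentation step can be sketched in a few lines of Python: cut the sequence into intervals of length t and keep only the lower and upper value of each interval. The function name `approximate`, the (lower, upper) tuple layout, and the handling of a shorter final interval are assumptions for illustration.

```python
def approximate(seq, t):
    """Return P^A as a list of (lower, upper) ranges, one per interval of length t.

    Assumption: if len(seq) is not a multiple of t, the last interval is shorter.
    """
    if t < 1:
        raise ValueError("t must be a positive interval length")
    ranges = []
    for start in range(0, len(seq), t):
        segment = seq[start:start + t]
        ranges.append((min(segment), max(segment)))
    return ranges

print(approximate([1, 3, 2, 5, 4, 4, 0], 3))  # [(1, 3), (4, 5), (0, 0)]
```

A larger t gives fewer, wider ranges: cheaper to compare, but a looser description of the original sequence.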

Main Idea (1) - LBS

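- Compute lower bounding distance
  - Distance of the two ranges p_i^R and q_j^R: distance of their two closest points

[Figure: two Value-vs-Time plots of a pair of segment ranges p_i^R and q_j^R; where the ranges are disjoint the lower bound is the gap between them, and where they overlap the lower bound = 0]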

Main Idea (1) - LBS

- Compute lower bounding distance
  - Distance of the two ranges p_i^R and q_j^R: distance of their two closest points

$$
D_{seg}(p_i^R, q_j^R) =
\begin{cases}
\lVert p_i^L - q_j^U \rVert & (p_i^L > q_j^U) \\
\lVert q_j^L - p_i^U \rVert & (q_j^L > p_i^U) \\
0 & (\text{otherwise})
\end{cases}
$$
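The range distance translates almost directly into code. In this sketch a range is a (lower, upper) tuple of scalars and the norm is taken to be the absolute difference; both are assumptions, as is the name `d_seg`.

```python
def d_seg(p_range, q_range):
    """Distance between the two closest points of two (lower, upper) ranges."""
    p_lo, p_hi = p_range
    q_lo, q_hi = q_range
    if p_lo > q_hi:   # p's range lies entirely above q's range
        return p_lo - q_hi
    if q_lo > p_hi:   # q's range lies entirely above p's range
        return q_lo - p_hi
    return 0.0        # the ranges overlap, so the closest points coincide

print(d_seg((5, 7), (1, 3)))  # 2
print(d_seg((1, 4), (3, 6)))  # 0.0
```

Because every value in a segment lies inside its range, this distance can never exceed the distance between any actual pair of points drawn from the two segments.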

Main Idea (1) - LBS

- Exact DTW distance

[Figure: the original sequences P and Q]

Main Idea (1) - LBS

- Compute lower bounding distance from P^A and Q^A
- Use a dynamic programming approach

[Figure: the approximate sequences P^A and Q^A]

$$
D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q)
$$
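To make the lower-bounding property concrete, here is a minimal Python sketch that runs a DTW-style dynamic program over the segment ranges. The slides do not spell out the recurrence for D_lbs, so this sketch uses a deliberately conservative variant that charges each visited block of the coarse matrix only once; it is a valid lower bound under the |p_i - q_j| local cost, but it should not be read as the authors' exact definition. All function names are hypothetical.

```python
import math

def approximate(seq, t):
    """Segment ranges (lower, upper) for intervals of length t."""
    return [(min(seq[s:s + t]), max(seq[s:s + t])) for s in range(0, len(seq), t)]

def d_seg(p_range, q_range):
    """Distance between the closest points of two ranges (0 if they overlap)."""
    return max(p_range[0] - q_range[1], q_range[0] - p_range[1], 0.0)

def lbs(pa, qa):
    """Conservative lower bound: DTW-style recurrence over segment ranges."""
    n, m = len(pa), len(qa)
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = d_seg(pa[i - 1], qa[j - 1]) + min(
                g[i][j - 1], g[i - 1][j], g[i - 1][j - 1])
    return g[n][m]

def dtw(p, q):
    """Exact DTW, obtained by treating every point as a zero-width range."""
    return lbs([(x, x) for x in p], [(y, y) for y in q])

p = [1, 2, 3, 4, 10, 11, 12, 13]
q = [1, 2, 3, 4, 5, 6, 7, 8]
print(lbs(approximate(p, 4), approximate(q, 4)), "<=", dtw(p, q))
```

The bound holds because any warping path over the original matrix passes through a connected chain of coarse blocks, and each cell it visits costs at least the range distance of its block. The coarse matrix has roughly (N/t)(M/t) cells, so the bound is much cheaper than the exact distance.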

Main Idea (1) - LBS

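- Compute lower bounding distance from P^A and Q^A
- Use a dynamic programming approach

[Figure: the approximate sequences P^A and Q^A shown together with the original sequences P and Q]

$$
D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q)
$$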

Main Idea (2) - EarlyStopping

- Exploit the fact that we have already found k nearest-neighbor candidates at distance d_cb
  - d_cb: k-nearest neighbor distance (the Current Best), i.e. the exact distance of the k-th best candidate so far

Main Idea (2) - EarlyStopping

- Exclude useless warping paths by using d_cb
  - Omit g(1,3) if g(1,2) > d_cb
  - Omit g(4,1) if g(3,1) > d_cb

[Figure: the warping matrix of P^A and Q^A with cells g(1,2) and g(3,1) marked]
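The pruning rule can be sketched as a dynamic program that refuses to extend any cell whose accumulated distance already exceeds d_cb. For brevity the sketch applies the rule to plain DTW over raw points rather than to the approximate matrix shown on the slide; that substitution, the |p_i - q_j| cost, and the name `dtw_early_stop` are assumptions.

```python
import math

def dtw_early_stop(p, q, d_cb):
    """DTW that drops cells above d_cb; returns inf if no path stays within d_cb."""
    n, m = len(p), len(q)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            best = min(cur[j - 1], prev[j], prev[j - 1])
            if best == math.inf:
                continue              # every predecessor was already omitted
            d = abs(p[i - 1] - q[j - 1]) + best
            if d <= d_cb:
                cur[j] = d            # keep the cell; otherwise it stays omitted
        if all(v == math.inf for v in cur):
            return math.inf           # the whole row is pruned: stop early
        prev = cur
    return prev[m]

print(dtw_early_stop([0, 0], [1, 1], 2.0))  # 2.0
print(dtw_early_stop([0, 0], [1, 1], 1.0))  # inf
```

Since local costs are non-negative, the accumulated distance never decreases along a path, so a cell above d_cb cannot lie on any path that ends at or below d_cb. Omitting it therefore cannot cause a false dismissal.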

Main Idea (3) - Refinement

- Q: How to choose t (length of time intervals)?

[Figure: the warping matrix of P^A and Q^A with cells g(1,2) and g(3,1) marked and the interval length t shown on both axes]

Main Idea (3) - Refinement

- Q: How to choose t (length of intervals)?
- A: Use multiple granularities, as follows:

[Figure: the warping matrix of P^A and Q^A with cells g(1,2) and g(3,1) marked and the interval length t shown on both axes]

Main Idea (3) - Refinement

- Compute the lower bounding distance from the coarsest sequences as the first refinement step
- Ignore P if D_lbs(P^A, Q^A) > d_cb, otherwise:

[Figure: the warping matrix of the coarsest P^A and Q^A with cells g(1,2) and g(3,1) marked]

Main Idea (3) - Refinement

- … compute the distance from more accurate sequences as the second refinement step
- … repeat

[Figure: P^A and Q^A at a finer granularity]

Main Idea (3) - Refinement

- … until the finest granularity
- Update the list of k-nearest neighbors if D_dtw(P, Q) <= d_cb

[Figure: the original sequences P and Q]
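Putting the refinement steps together, a k-nearest-neighbor search might look like the sketch below: try the coarsest approximation first, move to finer ones only while the lower bound stays within d_cb, and compute the exact DTW distance only for survivors. The helper names, the default interval lengths, and the conservative lower bound (each coarse block charged once) are assumptions for this example, not the authors' implementation.

```python
import heapq
import math

def _dp(n, m, cost):
    """DTW-style recurrence over an n x m grid with cell cost cost(i, j)."""
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = cost(i - 1, j - 1) + min(g[i][j - 1], g[i - 1][j], g[i - 1][j - 1])
    return g[n][m]

def dtw(p, q):
    return _dp(len(p), len(q), lambda i, j: abs(p[i] - q[j]))

def approximate(seq, t):
    return [(min(seq[s:s + t]), max(seq[s:s + t])) for s in range(0, len(seq), t)]

def lbs(pa, qa):
    # Conservative lower bound on dtw: range distance of each visited block.
    return _dp(len(pa), len(qa),
               lambda i, j: max(pa[i][0] - qa[j][1], qa[j][0] - pa[i][1], 0.0))

def knn_search(data, query, k, intervals=(8, 2)):
    """Return the k nearest (distance, index) pairs; intervals go coarse to fine."""
    query_approx = [approximate(query, t) for t in intervals]
    best = []  # max-heap of the k current best, stored as (-distance, index)
    for idx, p in enumerate(data):
        d_cb = -best[0][0] if len(best) == k else math.inf
        # Refinement: stop at the first granularity whose bound exceeds d_cb.
        if any(lbs(approximate(p, t), qa) > d_cb
               for t, qa in zip(intervals, query_approx)):
            continue
        d = dtw(p, query)  # finest granularity: the exact distance
        if d <= d_cb:
            if len(best) == k:
                heapq.heapreplace(best, (-d, idx))
            else:
                heapq.heappush(best, (-d, idx))
    return sorted((-neg_d, idx) for neg_d, idx in best)

data = [[0, 1, 2, 3, 4, 5], [9, 9, 9, 9], [0, 0, 1, 2, 3, 4, 5, 5], [5, 4, 3, 2, 1]]
print(knn_search(data, [0, 1, 2, 3, 4, 5], k=2))
```

Because `any` short-circuits, a sequence rejected by the coarse bound never reaches the finer ones. A real system would presumably precompute and store the approximations of the data sequences instead of rebuilding them per query, as this sketch does.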

Experimental results

- Setup
  - Intel Xeon 2.8GHz, 1GB memory, Linux
  - Datasets: Temperature, Fintime, RandomWalk
  - Four different time intervals (for n=2048): t1=2, t2=8, t3=32, t4=128
- Evaluation
  - Compared FTW with LB_PAA (the best so far)
  - Mainly computation time

Outline of experiments

- Speed vs db size
- Speed vs warping scope W
- Effect of filtering
- Effect of varying-length data sequences

Search Performance

- Itakura Parallelogram

[Figure: Itakura Parallelogram warping scope in the time warping matrix of P (p1 … pN) and Q (q1 … qM)]

Search Performance

- Wall clock time as a function of data set size
  - Temperature: FTW is up to 50 times faster!

Search Performance

- Wall clock time as a function of data set size
  - Fintime: FTW is up to 40 times faster!

Search Performance

- Wall clock time as a function of data set size
  - RandomWalk: FTW is up to 40 times faster!
  - More effective as the size grows

Search Performance

- Sakoe-Chiba Band

[Figure: Sakoe-Chiba Band warping scopes of widths W1 and W2 in the time warping matrix of P (p1 … pN) and Q (q1 … qM)]

Search Performance

- Wall clock time as a function of warping scope
  - Temperature: FTW is up to 220 times faster!

Search Performance

- Wall clock time as a function of warping scope
  - Fintime: FTW is up to 70 times faster!

Search Performance

- Wall clock time as a function of warping scope
  - RandomWalk: FTW is up to 100 times faster!

Effect of filtering

- Most data sequences are excluded by the coarser approximations (t4=128 and t3=32)
  - Using multiple granularities has significant advantages

[Figure: Frequency of approximation use]

Difference in Sequence Lengths

- 5 sequence data sets
  - Random(2048,0): length 2048 +/- 0
  - Random(2048,32): length 2048 +/- 16
  - Random(2048,64), Random(2048,128), Random(2048,256)

[Figure annotations: "Outperform by 2+ orders of magnitude"; "LB_PAA can not handle"]

Conclusions

- Design goals:
  1. Fast
  2. No false dismissals
  3. No restriction on the sequence length
  4. Support for any, as well as for no, restriction on “warping scope”

Conclusions

- Design goals:
  1. Fast (up to 220 times faster)
  2. No false dismissals
  3. No restriction on the sequence length
  4. Support for any, as well as for no, restriction on “warping scope”

Page Accesses

- Sequential scan of feature data should boost performance (speed-up factors SF=5, SF=10)
  - PA_ds: page accesses for data sequences
  - PA_fd: page accesses for feature data

$$
PA_{SF} = \frac{PA_{fd}}{SF} + PA_{ds}
$$