Time Series I
Page 1

Time Series I

Page 2

Syllabus

Nov 4 Introduction to data mining

Nov 5 Association Rules

Nov 10, 14 Clustering and Data Representation

Nov 17 Exercise session 1 (Homework 1 due)

Nov 19 Classification

Nov 24, 26 Similarity Matching and Model Evaluation

Dec 1 Exercise session 2 (Homework 2 due)

Dec 3 Combining Models

Dec 8, 10 Time Series Analysis

Dec 15 Exercise session 3 (Homework 3 due)

Dec 17 Ranking

Jan 13 Review

Jan 14 EXAM

Feb 23 Re-EXAM

Page 3

Why deal with sequential data?

• Because all data is sequential

• All data items arrive in the data store in some order

• Examples
  – transaction data
  – documents and words

• In some (or many) cases the order does not matter

• In many cases the order is of interest

Page 4

Time-series data: example

Financial time series

Page 5

Questions

• What is a time series?

• How do we compare time series data?

• What is the structure of sequential data?

• Can we represent this structure compactly and accurately?

Page 6

Time Series

• A sequence of observations:

– X = (x1, x2, x3, x4, …, xn)

• Each xi is a real number

– e.g., (2.0, 2.4, 4.8, 5.6, 6.3, 5.6, 4.4, 4.5, 5.8, 7.5)

[Figure: the example series plotted against a time axis and a value axis]

Page 7

Time Series Databases

• A time series is an ordered set of real numbers, representing the measurements of a real variable at equal time intervals
  – Stock prices
  – Volume of sales over time
  – Daily temperature readings
  – ECG data

• A time series database is a large collection of time series

Page 8

Time Series Problems

• The Similarity Problem

X = x1, x2, …, xn and Y = y1, y2, …, yn

• Define and compute Sim(X, Y) or Dist(X, Y)
  – e.g., do stocks X and Y have similar movements?

• Retrieve similar time series efficiently
  – Indexing for Similarity Queries

Page 9

Types of queries

• whole match vs subsequence match

• range query vs nearest neighbor query

Page 10

Examples

• Find companies with similar stock prices over a time interval

• Find products with similar sell cycles

• Cluster users with similar credit card utilization

• Find similar subsequences in DNA sequences

• Find scenes in video streams

Page 11

[Figure: three stock-price time series, $price versus day, over days 1–365]

distance function: by expert
(e.g., Euclidean distance)

Page 12

Problems

• Define the similarity (or distance) function

• Find an efficient algorithm to retrieve similar time series from a database
  – (faster than a sequential scan)

The Similarity function depends on the Application

Page 13

Metric Distances

• What properties should a similarity distance have to allow (easy) indexing?

– D(A, B) = D(B, A) (Symmetry)
– D(A, A) = 0 (Constancy of self-similarity)
– D(A, B) ≥ 0 (Positivity)
– D(A, B) ≤ D(A, C) + D(B, C) (Triangle inequality)

• Sometimes the distance function that best fits an application is not a metric

• Then indexing becomes interesting and challenging

Page 14

Euclidean Distance

14

• Each time series: a point in the n-dim space

• Euclidean distance
  – pair-wise point distance

X = x1, x2, …, xn

Y = y1, y2, …, yn
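
As a concrete illustration (a minimal Python sketch, not taken from the slides), the Euclidean distance is just the pair-wise point distance summed and square-rooted; X below is the example series from Page 6, while Y is an arbitrary made-up second series:

import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length time series."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# X is the example series from Page 6; Y is a hypothetical second series.
X = [2.0, 2.4, 4.8, 5.6, 6.3, 5.6, 4.4, 4.5, 5.8, 7.5]
Y = [2.1, 2.3, 4.5, 5.9, 6.0, 5.3, 4.6, 4.1, 5.5, 7.7]
print(euclidean_distance(X, Y))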

Page 15

Euclidean model

Euclidean distance between two time series Q = {q1, q2, …, qn} and X = {x1, x2, …, xn}

[Figure: a query Q of n datapoints is compared against every series in a database of n-datapoint series; the four example series obtain distances 0.98, 0.07, 0.21, 0.43 and ranks 4, 1, 2, 3, respectively]

Page 16

Advantages

• Easy to compute: O(n)
• Allows scalable solutions to other problems, such as
  – indexing
  – clustering
  – etc.

Page 17

Disadvantages

• Query and target lengths should be equal!
• Cannot tolerate noise:
  – time shifts
  – sequences out of phase
  – scaling in the y-axis

Page 18

Limitations of Euclidean Distance

Euclidean Distance: sequences are aligned “one to one”.

“Warped” Time Axis: nonlinear alignments are possible.

[Figure: query Q aligned to candidate C point-to-point (Euclidean) vs. with a warped time axis]

Page 19

Dynamic Time Warping [Berndt, Clifford, 1994]

• DTW allows sequences to be stretched along the time axis
  – insert ‘stutters’ into a sequence
  – then compute the (Euclidean) distance

[Figure: the original sequence and a version with ‘stutters’ inserted]

Page 20

Computation

• DTW is computed by dynamic programming
• Given two sequences
  – P = {p1, p2, …, pN}
  – Q = {q1, q2, …, qM}

f(i, j) = (p_i − q_j)² + min{ f(i − 1, j), f(i, j − 1), f(i − 1, j − 1) }

D_dtw(P, Q) = f(N, M)

(the first two options in the min insert a ‘stutter’ into one of the sequences; the third option matches with no stutter)
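
A minimal Python sketch of this recurrence (illustrative, not the slides' own code; it uses the squared point cost (p_i − q_j)² exactly as written above):

def dtw_distance(p, q):
    """Dynamic time warping by the standard O(len(p) * len(q)) dynamic program."""
    n, m = len(p), len(q)
    INF = float("inf")
    # f[i][j] = cost of the best warping of p[:i] against q[:j]
    f = [[INF] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (p[i - 1] - q[j - 1]) ** 2
            # three allowed predecessors: stutter in one sequence, in the other, or none
            f[i][j] = cost + min(f[i - 1][j], f[i][j - 1], f[i - 1][j - 1])
    return f[n][m]

For example, dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4]) is 0.0, because the repeated 2 is absorbed by a stutter, whereas Euclidean distance is not even defined for these two lengths.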

Page 21

DTW: Dynamic time warping (1/2)

• Each cell c = (i, j) is a pair of indices whose corresponding squared difference, (x_i − y_j)², is computed and included in the sum for the distance.

• Euclidean path:

– i = j always.

– Ignores off-diagonal cells.

[Figure: the X–Y alignment matrix; along the Euclidean path the cumulative sum grows as (x1 − y1)², then (x1 − y1)² + (x2 − y2)², and so on]

Page 22

DTW: Dynamic time warping (2/2)

• DTW allows any path.
• Examine all paths:

• Standard dynamic programming to fill in the table.
• The top-right cell contains the final result.

[Figure: cell (i, j) in the X–Y alignment matrix and its three predecessor cells (i−1, j), (i−1, j−1), (i, j−1); the off-diagonal moves stretch one sequence and shrink the other]

Page 23

• Warping path W:
  – set of grid cells in the time warping matrix

• DTW finds the optimum warping path W:
  – the path with the smallest matching score

Optimum warping path W (the best alignment)

Properties of a legal DTW path:

I. Boundary conditions
   W1 = (1, 1) and WK = (n, m)

II. Continuity
   Given Wk = (a, b), then Wk−1 = (c, d), where a − c ≤ 1 and b − d ≤ 1

III. Monotonicity
   Given Wk = (a, b), then Wk−1 = (c, d), where a − c ≥ 0 and b − d ≥ 0

[Figure: the warping path through the X–Y matrix]

Page 24

Properties of DTW

I. Boundary conditions
   W1 = (1, 1) and WK = (n, m)

II. Continuity
   Given Wk = (a, b), then Wk−1 = (c, d), where a − c ≤ 1 and b − d ≤ 1

III. Monotonicity
   Given Wk = (a, b), then Wk−1 = (c, d), where a − c ≥ 0 and b − d ≥ 0
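
These conditions can be checked mechanically. The sketch below (illustrative Python, not from the slides) validates a candidate warping path given as a list of 1-based (i, j) pairs for series of lengths n and m; the extra check that the path actually advances at every step is an assumption commonly made alongside the three stated conditions:

def is_legal_warping_path(w, n, m):
    """Check boundary, continuity and monotonicity of a warping path w."""
    if not w or w[0] != (1, 1) or w[-1] != (n, m):        # I. boundary conditions
        return False
    for (c, d), (a, b) in zip(w, w[1:]):                  # consecutive cells W(k-1), W(k)
        if not (0 <= a - c <= 1 and 0 <= b - d <= 1):     # II. continuity, III. monotonicity
            return False
        if (a, b) == (c, d):                              # assumed: the path must advance
            return False
    return True

For instance, is_legal_warping_path([(1, 1), (2, 1), (2, 2), (3, 3)], 3, 3) returns True.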

C. S. Myers and L. R. Rabiner. A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 60(7):1389-1409, Sept. 1981.

Sakoe, H. and Chiba, S., Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1), pp. 43–49, 1978, ISSN: 0096-3518

Page 25

Advantages

• Query and target need not have equal lengths
• Can tolerate noise:
  – time shifts
  – sequences out of phase
  – scaling in the y-axis

25

Page 26

Disadvantages

• Computational complexity: O(nm)
• May not be able to handle some types of noise
• It is not a metric (the triangle inequality does not hold)

26

Page 27

Global Constraints

• Slightly speed up the calculations and prevent pathological warpings
• A global constraint limits the indices of the warping path:
  wk = (i, j)k such that j − r ≤ i ≤ j + r,
  where r is a term defining the allowed range of warping for a given point in a sequence

[Figure: the Sakoe-Chiba Band (width r) and the Itakura Parallelogram constraint regions]

Page 28

Complexity of DTW

• Basic implementation = O(n²), where n is the length of the sequences
  – will have to solve the problem for each (i, j) pair

• If a warping window r is specified, then O(nr)
  – only solve for the (i, j) pairs where |i − j| ≤ r
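
Restricting the computation to such a band is a one-line change to the DTW sketch from the earlier pages (again an illustrative sketch; with a Sakoe-Chiba band of width r it assumes |len(p) − len(q)| ≤ r, otherwise the band never reaches the final cell and the result stays infinite):

def dtw_distance_banded(p, q, r):
    """DTW restricted to a Sakoe-Chiba band: only cells with |i - j| <= r are filled."""
    n, m = len(p), len(q)
    INF = float("inf")
    f = [[INF] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):   # only the banded (i, j) pairs
            cost = (p[i - 1] - q[j - 1]) ** 2
            f[i][j] = cost + min(f[i - 1][j], f[i][j - 1], f[i - 1][j - 1])
    return f[n][m]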

Page 29

Longest Common Subsequence Measures

(Allowing for Gaps in Sequences)

[Figure: a gap in one of the sequences is skipped]

Page 30

Longest Common Subsequence (LCSS)

[Figure: two series matched on two segments while the majority of the noise is ignored]

Advantages of LCSS:
A. Outlying values not matched
B. Distance/Similarity distorted less

Disadvantages of DTW:
A. All points are matched
B. Outliers can distort distance
C. One-to-many mapping

LCSS is more resilient to noise than DTW.

Page 31

Longest Common Subsequence

Similar dynamic programming solution as DTW, but now we measure similarity, not distance.

It can also be expressed as a distance.
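
A sketch of the corresponding dynamic program (illustrative Python). Since real-valued points rarely match exactly, this version treats two points as a match when they differ by at most a threshold epsilon, a common convention for LCSS on time series rather than something fixed by the slides; the distance form normalizes by the shorter length:

def lcss_similarity(x, y, epsilon):
    """Length of the longest common subsequence, with an epsilon match threshold."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) <= epsilon:
                L[i][j] = L[i - 1][j - 1] + 1              # match: extend the subsequence
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])    # skip a point in x or y (a gap)
    return L[n][m]

def lcss_distance(x, y, epsilon):
    """Express LCSS as a distance in [0, 1]; 1 means nothing matched at all."""
    return 1.0 - lcss_similarity(x, y, epsilon) / min(len(x), len(y))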

Page 32

Similarity Retrieval

• Range Query
  – Find all time series X where D(Q, X) ≤ ε

• Nearest Neighbor query
  – Find the k most similar time series to Q

• A method to answer the above queries:
  – Linear scan

• A better approach:
  – GEMINI [next time]

Page 33

Lower Bounding – NN search

Intuition:
• Try to use a cheap lower bounding calculation as often as possible
• Do the expensive, full calculations only when absolutely necessary

We can speed up similarity search by using a lower bounding function:
• D: distance measure
• LB: lower bounding function such that LB(Q, X) ≤ D(Q, X)

We assume a database of time series: DB = {X1, X2, …, XN}

1-NN Search Using LB:
  Set best = ∞
  For each Xi:
    if LB(Xi, Q) < best:
      if D(Xi, Q) < best:
        best = D(Xi, Q)
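
In code, the pruning scheme looks roughly like this (a sketch; lb and dist can be any pair of functions satisfying LB(Q, X) ≤ D(Q, X), for example a suitably wrapped lb_keogh together with dtw_distance from the other sketches):

import math

def nn_search_with_lb(db, q, lb, dist):
    """1-NN search over db that prunes candidates with a lower bound on dist."""
    best_dist = math.inf
    best_series = None
    for x in db:
        if lb(x, q) < best_dist:          # cheap test first
            d = dist(x, q)                # expensive computation only when the bound passes
            if d < best_dist:
                best_dist, best_series = d, x
    return best_series, best_dist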

Page 34

Lower Bounding – NN search

Intuition:
• Try to use a cheap lower bounding calculation as often as possible
• Do the expensive, full calculations only when absolutely necessary

We can speed up similarity search by using a lower bounding function:
• D: distance measure
• LB: lower bounding function such that LB(Q, X) ≤ D(Q, X)

We assume a database of time series: DB = {X1, X2, …, XN}

Range Query Using LB:
  For each Xi:
    if LB(Xi, Q) ≤ ε:
      if D(Xi, Q) < ε:
        report Xi
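
The range-query variant of the same idea, as a sketch with the same lb/dist conventions as the 1-NN sketch above:

def range_query_with_lb(db, q, eps, lb, dist):
    """Report every series in db whose true distance to q is below eps."""
    result = []
    for x in db:
        if lb(x, q) <= eps:      # candidates whose lower bound already exceeds eps are pruned
            if dist(x, q) < eps:
                result.append(x)
    return result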

Page 35

Problems

• How to define lower bounds for different distance measures?

• How to extract the features? How to define the feature space?
  – Fourier transform
  – Wavelet transform
  – Averages of segments (Histograms or APCA)
  – Chebyshev polynomials
  – … your favorite curve approximation …

Page 36

Some Lower Bounds on DTW

[Figure: the four features of a sequence (first, last, minimum, maximum) marked A–D]

Each sequence is represented by 4 features: <First, Last, Min, Max>

LB_Kim = maximum squared difference of the corresponding features

[Figure: LB_Yi uses max(Q) and min(Q) as an envelope around the query]
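
A sketch of LB_Kim exactly as stated here (the maximum squared difference over the four features; illustrative Python):

def lb_kim(q, c):
    """LB_Kim: lower bound on DTW from the <First, Last, Min, Max> features."""
    features_q = (q[0], q[-1], min(q), max(q))
    features_c = (c[0], c[-1], min(c), max(c))
    return max((fq - fc) ** 2 for fq, fc in zip(features_q, features_c))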

Page 37

LB_Keogh [Keogh 2004]

[Figure: the upper envelope U and lower envelope L built around the query Q, for the Sakoe-Chiba Band and for the Itakura Parallelogram, together with a candidate C]

Ui = max(q(i−r) : q(i+r))
Li = min(q(i−r) : q(i+r))

Page 38

[Figure: the envelope (U, L) around Q and the candidate C, for the Sakoe-Chiba Band and the Itakura Parallelogram; the parts of C that fall outside the envelope (gray lines) contribute to LB_Keogh]

LB_Keogh(Q, C) = Σ_{i=1..n} of:
  (c_i − U_i)²  if c_i > U_i
  (c_i − L_i)²  if c_i < L_i
  0             otherwise

LB_Keogh(Q, C) ≤ DTW(Q, C)
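
The envelope and the bound translate directly into code; the sketch below follows the squared-sum form written above (if you work with a square-rooted DTW distance, take the square root of the result as well):

def keogh_envelope(q, r):
    """Upper and lower envelope of query q for a warping window of width r."""
    U, L = [], []
    for i in range(len(q)):
        window = q[max(0, i - r): i + r + 1]
        U.append(max(window))
        L.append(min(window))
    return U, L

def lb_keogh(q, c, r):
    """LB_Keogh lower bound on the band-constrained DTW between q and c."""
    U, L = keogh_envelope(q, r)
    total = 0.0
    for ci, ui, li in zip(c, U, L):
        if ci > ui:
            total += (ci - ui) ** 2
        elif ci < li:
            total += (ci - li) ** 2
        # points inside the envelope contribute nothing
    return total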

Page 39

Tightness of LB

T = (lower-bound estimate of the dynamic time warping distance) / (true dynamic time warping distance)

0 ≤ T ≤ 1; the larger T, the better.

[Figure: LB_Keogh (Sakoe-Chiba), LB_Keogh (Itakura), LB_Yi and LB_Kim; the tightness of each bound is proportional to the length of the gray lines used in the illustrations]

Page 40

Lower Bounding

[Figure: an (initially empty) chart of distances between the query Q and the database series]

we want to find the 1-NN to our query data series, Q

Page 41

Lower Bounding

[Figure: the true distance D(S1, Q) plotted on the chart]

we compute the distance to the first data series in our dataset, D(S1,Q)

this becomes the best so far (BSF)

Page 42

Lower Bounding

[Figure: the chart now shows the BSF and the lower bound LB(S2, Q)]

we compute the lower bound LB(S2, Q) and it is greater than the BSF

we can safely prune S2, since D(S2, Q) ≥ LB(S2, Q)

Page 43

Lower Bounding

[Figure: the chart now shows the lower bound LB(S3, Q), which is below the BSF]

we compute the lower bound LB(S3, Q) and it is smaller than the BSF

we have to compute D(S3, Q) ≥ LB(S3, Q), since it may still be smaller than the BSF

Page 44

Lower Bounding

[Figure: the true distance D(S3, Q) added to the chart]

it turns out that D(S3, Q) > BSF, so we can safely prune S3

Page 45

Lower Bounding

[Figure: the chart so far — true distances for S1 and S3, the lower bound for S2, and the BSF]

Page 46

Lower Bounding

[Figure: the chart now shows the lower bound LB(S4, Q), which is below the BSF]

we compute the lower bound LB(S4, Q) and it is smaller than the BSF

we have to compute D(S4, Q) ≥ LB(S4, Q), since it may still be smaller than the BSF

Page 47

Lower Bounding

[Figure: the true distance D(S4, Q) added to the chart]

it turns out that D(S4, Q) < BSF, so S4 becomes the new BSF

Page 48

Lower Bounding

[Figure: the final chart — D(S4, Q) is the new BSF]

S1 cannot be the 1-NN, because S4 is closer to Q

Page 49

How about subsequence matching?

• DTW is defined for full-sequence matching:
  – All points of the query sequence are matched to all points of the target sequence

• Subsequence matching:
  – The query is matched to a part (subsequence) of the target sequence

[Figure: a short query sequence matched against a long data stream]

Page 50

Subsequence Matching

X: long sequence
Q: short sequence

What subsequence of X is the best match for Q?

Page 51

J-Position Subsequence Match

X: long sequence
Q: short sequence

What subsequence of X is the best match for Q, such that the match ends at position j?

Page 52

J-Position Subsequence Match

Naïve Solution: DTW
Examine all possible subsequences

[Figure: the query Q aligned against one candidate subsequence of X that ends at position j]

Page 53

[Figure: animation frame — further candidate subsequences of X ending at position j are examined with DTW]

Page 54

[Figure: animation frame — still more candidate subsequences are examined]

Page 55

J-Position Subsequence Match

Naïve Solution: DTW — examine all possible subsequences

Too costly!

Page 56

Why not ‘naive’?

• Compute the time warping matrices starting from every database frame
  – Need O(n) matrices, O(nm) time per frame

[Figure: for each starting frame x_tstart of X, an n × m warping matrix against Q captures the optimal subsequence starting from t = tstart and ending at x_tend]

Page 57

Key Idea

• Star-padding
  – Use only a single matrix (the naïve solution uses n matrices)
  – Prefix Q with ‘*’, which always gives zero distance
  – Instead of Q = (q1, q2, …, qm), compute distances with Q' = (q0, q1, q2, …, qm), where q0 = ‘*’
  – O(m) time and space (the naïve solution requires O(nm))

Page 58

SPRING: dynamic programming

Initialization
• Insert a “dummy” state ‘*’ at the beginning of the query
• ‘*’ matches every value in X with score 0

[Figure: the DP table with the query Q on the vertical axis and the database sequence X on the horizontal axis; the ‘*’ row is filled with 0s]

Page 59

SPRING: dynamic programming

Computation
• Perform the dynamic programming computation in a similar manner as standard DTW

[Figure: the same table; each cell (i, j) is reached from (i−1, j), (i−1, j−1) or (i, j−1)]

Page 60

SPRING: dynamic programming

• For each (i, j): compute the j-position subsequence match of the first i items of Q to X[s:j]
  – i.e., Q[1:i] is matched with X[s:j]

[Figure: cell (i, j) of the table records the best match of Q[1:i] to a subsequence of X that starts at some position s and ends at position j]

Page 61

SPRING: dynamic programming

• For each (i, j): compute the j-position subsequence match of the first i items of Q to X[s:j]
• Top row: the j-position subsequence match of Q for every j
• Final answer: the best among the j-position matches
  – Look at the answers stored in the top row of the table

[Figure: the top row of the table holds, for each j, the match of the full query ending at position j]
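
Putting the pieces together, here is a batch Python sketch of this computation (illustrative only; it uses the squared point cost from the earlier DTW recurrence, keeps just two columns of the table, and records the starting position of each cell's warping path so the best subsequence can be reported at the end — the real SPRING algorithm additionally reports matches online as the stream arrives, which this version omits):

import math

def spring_best_match(x, q):
    """Find the subsequence of the long series x best matching the short query q
    under DTW, using star-padding so a match may start at any position.
    Returns (best_distance, start_index, end_index), 0-based and inclusive."""
    m = len(q)
    INF = math.inf
    d_prev = [INF] * (m + 1)       # previous column of costs (column for x[j-1])
    s_prev = [0] * (m + 1)         # starting positions of the paths in that column
    best = (INF, -1, -1)
    for j, xj in enumerate(x):
        d_cur = [0.0] + [INF] * m  # row 0 is the '*' row: cost 0 at every column
        s_cur = [j] * (m + 1)      # a path entering the query at column j starts at j
        for i in range(1, m + 1):
            cost = (q[i - 1] - xj) ** 2
            # the three predecessors; the '*'-row option is listed first so that,
            # on ties, a fresh start at position j is preferred
            candidates = [
                (d_cur[i - 1], s_cur[i - 1]),    # from (i-1, j)
                (d_prev[i - 1], s_prev[i - 1]),  # from (i-1, j-1)
                (d_prev[i], s_prev[i]),          # from (i, j-1)
            ]
            prev_cost, start = min(candidates, key=lambda t: t[0])
            d_cur[i] = cost + prev_cost
            s_cur[i] = start
        if d_cur[m] < best[0]:                   # top row: full query matched, ending at j
            best = (d_cur[m], s_cur[m], j)
        d_prev, s_prev = d_cur, s_cur
    return best

For example, spring_best_match([5, 1, 2, 3, 9], [1, 2, 3]) returns (0.0, 1, 3): the query matches x[1..3] exactly.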

Page 62

Subsequence vs. full matching

[Figure: full matching aligns the query Q = (q1, …, qM) against all of P = (p1, …, pN); subsequence matching, via the star-padded row, aligns Q against only a part of the database sequence X]

Page 63

Computational complexity

• Assume that the database is one very long sequence
  – Concatenate all sequences into one sequence
• O(|Q| * |X|)
• But it can be computed faster by looking at only two adjacent columns

Page 64

STWM (Subsequence Time Warping Matrix)

• Problem of the star-padding: we lose the information about the starting frame of the match

• After the scan: “which is the optimal subsequence?”

• Elements of STWM
  – Distance value of each subsequence
  – Starting position!

• Combination of star-padding and STWM
  – Efficiently identify the optimal subsequence in a streaming fashion

Page 65

Up next…

• Time series summarizations

– Discrete Fourier Transform (DFT)

– Discrete Wavelet Transform (DWT)

– Piecewise Aggregate Approximation (PAA)

– Symbolic Aggregate approXimation (SAX)

• Time series classification

– Lazy learners

– Shapelets

