Date post: 21-Dec-2015
The Automatic Explanation of Multivariate Time Series (MTS)

Allan Tucker

The Problem - Data

• Datasets which are Characteristically:
– High Dimensional MTS
– Large Time Lags
– Changing Dependencies
– Little or No Available Expert Knowledge

The Problem - Requirement

• Lack of Algorithms to Assist Users in Explaining Events where:
– Model Complex MTS Data
– Learnable from Data with Little or No User Intervention
– Transparency Throughout the Learning and Explaining Process is Vital

Contribution to Knowledge

• Using a Combination of Evolutionary Programming (EP) and Bayesian Networks (BNs) to Overcome Issues Outlined

• Extending Learning Algorithms for BNs to Dynamic Bayesian Networks (DBNs) with Comparison of Efficiency

• Introduction of an Algorithm for Decomposing High Dimensional MTS into Several Lower Dimensional MTS

Contribution to Knowledge (Continued)

• Introduction of New EP-Seeded GA Algorithm

• Incorporating Changing Dependencies

• Application to Synthetic and Real-World Chemical Process Data

• Transparency Retained Throughout Each Stage

Framework

[Diagram: framework with boxes labelled Real Data, Synthetic Data, Pre-processing, Data Preparation, Variable Groupings, Search Methods, Model Building, Changing Dependencies, Explanation, Evaluation]

Key Technical Points 1: Comparing Adapted Algorithms

• New Representation
• K2/K3 [Cooper and Herskovits]
• Genetic Algorithm [Larranaga]
• Evolutionary Algorithm [Wong]
• Branch and Bound [Bouckaert]
• Log Likelihood / Description Length
• Publications:
– International Journal of Intelligent Systems, 2001

Key Technical Points 2: Grouping

• A Number of Correlation Searches
• A Number of Grouping Algorithms
• Designed Metrics
• Comparison of All Combinations
• Synthetic and Real Data
• Publications:
– IDA99
– IEEE Trans System Man and Cybernetics 2001
– Expert Systems 2000

Key Technical Points 3: EP-Seeded GA

• Approximate Correlation Search Based on the One Used in Grouping Strategy
• Results Used to Seed Initial Population of GA
• Uniform Crossover
• Specific Lag Mutation
• Publications:
– Genetic Algorithms and Evolutionary Computation Conference 1999 (GECCO99)
– International Journal of Intelligent Systems, 2001
– IDA2001
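The two GA operators named above can be sketched as follows. This is only an illustration under stated assumptions: that each GA individual is a fixed-length list of (a, b, lag) triples, that uniform crossover swaps whole triples position by position, and that lag mutation redraws only the lag of a triple. The function names, the `max_lag` bound and the lag range starting at 1 are hypothetical and do not come from the slides.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Swap triples between two equal-length parents, each position with probability 0.5."""
    child_a, child_b = [], []
    for ta, tb in zip(parent_a, parent_b):
        if rng.random() < 0.5:
            ta, tb = tb, ta
        child_a.append(ta)
        child_b.append(tb)
    return child_a, child_b


def lag_mutation(individual, max_lag, rate, rng):
    """With probability `rate`, redraw only the lag of a triple (assumed range 1..max_lag)."""
    return [
        (a, b, rng.randint(1, max_lag)) if rng.random() < rate else (a, b, lag)
        for a, b, lag in individual
    ]


rng = random.Random(0)
mother = [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
father = [(3, 0, 4), (2, 1, 5), (1, 0, 6)]
child_a, child_b = uniform_crossover(mother, father, rng)
mutated = lag_mutation(child_a, max_lag=60, rate=0.1, rng=rng)
```

Keeping the variable pair fixed while changing the lag lets the search refine when a dependency acts without discarding the dependency itself.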

Key Technical Points 4: Changing Dependencies

• Dynamic Cross Correlation Function for Analysing MTS
• Extend Representation
• Introduce a Heuristic Search - Hidden Controller Hill Climb (HCHC)
– Hidden Variables to Model State of the System
– Search for Structure and Hidden States Iteratively
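A minimal sketch of what a dynamic cross-correlation function could look like, assuming it means the lagged Pearson correlation between two series recomputed over a sliding window. The slides do not give the definition, so the function names and the `max_lag`, `winlen` and `winjump` parameters are illustrative assumptions.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x(t - lag) and y(t), for lag >= 0."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return float(np.corrcoef(x, y)[0, 1])


def dynamic_cross_correlation(x, y, max_lag, winlen, winjump):
    """Return an array of shape (n_windows, max_lag + 1): one row per window position."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    starts = range(0, len(x) - winlen + 1, winjump)
    return np.array([
        [lagged_correlation(x[s:s + winlen], y[s:s + winlen], lag)
         for lag in range(max_lag + 1)]
        for s in starts
    ])


# Synthetic example: y follows x at lag 3 in the first half and at lag 7 in the second.
rng = np.random.default_rng(0)
x = rng.normal(size=400)
y = np.empty(400)
y[:200] = np.roll(x, 3)[:200]
y[200:] = np.roll(x, 7)[200:]

dccf = dynamic_cross_correlation(x, y, max_lag=10, winlen=100, winjump=50)
best_lag = dccf.argmax(axis=1)  # strongest lag at each window position
```

A shift in the strongest lag from one window position to the next is the kind of changing dependency that a single cross-correlation over the whole series would blur.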

Future Work

• Parameter Estimation

• Discretisation

• Changing Dependencies

• Efficiency

• New Datasets
– Gene Expression Data
– Visual Field Data

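DBN Representation

[Figure: DBN drawn over time slices t-4, t-3, t-2, t-1, t. Nodes at time t: a0(t), a1(t), a2(t), a3(t), a4(t). Lagged nodes: a2(t-2), a3(t-2), a4(t-3), a3(t-4).]

Representation as a list of triples: (3,1,4) (4,2,3) (2,3,2) (3,0,2) (3,4,2)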
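The triple list can be unpacked into a per-variable parent list. The sketch below assumes each triple (a, b, lag) denotes a link from variable a at time t-lag to variable b at time t. That reading is not stated on the slide, but it reproduces the lagged nodes of the figure: a2(t-2), a3(t-2), a4(t-3) and a3(t-4). The function name is hypothetical.

```python
from collections import defaultdict


def triples_to_parents(triples):
    """Map each child variable to its list of (parent, lag) pairs."""
    parents = defaultdict(list)
    for parent, child, lag in triples:
        parents[child].append((parent, lag))
    return dict(parents)


triples = [(3, 1, 4), (4, 2, 3), (2, 3, 2), (3, 0, 2), (3, 4, 2)]
parents = triples_to_parents(triples)

for child in sorted(parents):
    links = ", ".join(f"a{p}(t-{lag})" for p, lag in parents[child])
    print(f"a{child}(t) <- {links}")
```

A flat list of triples is convenient for evolutionary operators, since adding, removing or altering one link only touches one list element.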

Sample DBN Search Results

[Figure: two plots of Description Length against Function Calls comparing K3, EP and GA. Panels: N = 5, MaxT = 10; N = 10, MaxT = 60.]

Grouping

[Diagram: One High Dimensional MTS (A) → 1. Correlation Search (EP) → List of triples (a, b, lag), indexed 1, 2, ..., R → 2. Grouping Algorithm (GGA) → Groups G, e.g. {0,3} {1,4,5} {2} → Several Lower Dimensional MTS]
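To make the input and output of the grouping step concrete, the sketch below turns a list of (a, b, lag) triples into groups of variables. It is not the GGA named on the slide: it simply merges every pair of variables that share a discovered correlation, using a union-find structure, and ignores the lag. The triples and the function name are hypothetical, chosen so that the result matches the example groups {0,3} {1,4,5} {2}.

```python
def group_variables(n_vars, triples):
    """Group variables that are linked by at least one (a, b, lag) triple."""
    parent = list(range(n_vars))

    def find(v):
        # Follow parent pointers to the group representative, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b, _lag in triples:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for v in range(n_vars):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


triples = [(0, 3, 2), (1, 4, 5), (4, 5, 1)]  # hypothetical correlation list
groups = group_variables(6, triples)
print(groups)  # [[0, 3], [1, 4, 5], [2]]
```

A variable with no discovered correlation, such as variable 2 here, ends up in a group of its own.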

Sample Grouping Results

Original Synthetic MTS Groupings (one group per line):
0 1 2
3 4 5 6 7
8 9 10 11 12
13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49
50 51 52 53 54 55
56 57 58
59 60

Groupings Discovered from Synthetic Data (the group boundaries for variables 0 to 22 are not recoverable from the transcript and appear as extracted):
0 6123 4 5 789 1011 121314 15 20 21 2216 17 18 19
23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49
50 51 52 53 54 55
56 57 58
59 60

[Figure: Sample of Variables from a Discovered Oil Refinery Data Group. Magnitude (Temp etc) against Time, with two value axes. Legend: 39, 11, 15.]

Parameter Estimation

• Simulate Random Bag (Vary R, s and c, e)
• Calculate Mean and SD for Each Distribution (the Probability of Selecting e from s)
• Test for Normality (Lilliefors’ Test)
• Symbolic Regression (GP) to Determine the Function for Mean and SD from R, s and c (e will be Unknown)
• Place Confidence Limits on the P(Number of Correlations Found e)

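EP-Seeded GA

[Diagram: EP → Final EPList → Initial GAPopulation → GA → DBN]

Final EPList (one triple per entry):
0: (a,b,l)
1: (a,b,l)
2: (a,b,l)
EPListSize: (a,b,l)

Initial GAPopulation (one list of triples per individual):
0: ((a,b,l),(a,b,l)…(a,b,l))
1: ((a,b,l),(a,b,l)…(a,b,l))
2: ((a,b,l),(a,b,l)…(a,b,l))
GAPopsize: ((a,b,l) … (a,b,l))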

EP-Seeded GA Results

[Figure: two plots of Description Length against Function Calls comparing EP, EP Seeded GA (c=20%) and EP Seeded GA (c=100%). Panels: N = 10, MaxT = 60; N = 20, MaxT = 60.]

Varying the value of c

[Figure: Log Likelihood against Function Calls for c=10%, c=20%, c=30%, c=50%, c=70%, c=100% and EP.]

[Figure: explanation laid out against time.]

Time: t, t-1, t-11, t-13, t-16, t-20, t-60

Explanation:
P(TGF in state_0) = 1.0
P(TT in state_0) = 1.0
P(BPF in state_3) = 1.0
P(TT in state_1) = 0.446
P(TGF in state_3) = 1.0
P(SOT in state_0) = 0.314
P(C2% in state_0) = 0.279
P(T6T in state_0) = 0.347
P(RinT in state_0) = 0.565

Changing Dependencies

[Figure: two plots of Variable Magnitude against Time (Minutes) for A/M_GB and TGF, each series on its own value axis.]

Dynamic Cross-Correlation Function

[Figure: surface plot of correlation over Time Lag (1 to 61) and Window Position (S1 to S85), with correlation bands from -0.3 to 0.4.]

Hidden Variable - OpState

[Figure: DBN drawn over time slices t-4, t-3, t-2, t-1, t, with nodes a2(t), OpState2, a2(t-1), a3(t-2) and a0(t-4).]

Hidden Controller Hill Climb

[Diagram: loop over < DBN_List > and < Segment_Lists > with a Score. Steps: Update Segment_Lists through Op_State Parameter Estimation; Update DBN_List through DBN Structure Search.]

HCHC Results - Oil Refinery Data

[Figure: Most Significant Correlation (-1 to 1) against Window Position (0 to 80) for A/M_GB, SOT and T6T, with Segments marked.]

HCHC Results - Synthetic Data

                               MTS 1    MTS 2    MTS 3
Number of Original Links       12       26       16
Spurious Links                 2.3      2.9      4.0
Implicit Links                 2.3      1.0      0.4
Missed Links                   1.0      2.8      1.4
Total SD                       5.6      6.7      5.8
Original Segmentation Length   1000     1000     500
Segmentation Error             15.89    16.08    14.157
Missed Segmentations           0.6      0.0      1.2
Spurious Segmentation          0.9      0.5      0.8

• Generate Data from Several DBNs
• Append each Section of Data Together to Form One MTS with Changing Dependencies
• Run HCHC
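In the synthetic results, the Total SD row equals the sum of the Spurious, Implicit and Missed Links rows for each MTS. The short check below confirms that arithmetic from the reported figures. The dictionary layout is only an illustrative way of holding the numbers.

```python
# Reported link errors per synthetic MTS, copied from the results table.
results = {
    "MTS 1": {"spurious": 2.3, "implicit": 2.3, "missed": 1.0, "total_sd": 5.6},
    "MTS 2": {"spurious": 2.9, "implicit": 1.0, "missed": 2.8, "total_sd": 6.7},
    "MTS 3": {"spurious": 4.0, "implicit": 0.4, "missed": 1.4, "total_sd": 5.8},
}

totals = {
    name: r["spurious"] + r["implicit"] + r["missed"]
    for name, r in results.items()
}

for name, total in totals.items():
    reported = results[name]["total_sd"]
    print(f"{name}: computed {total:.1f}, reported {reported:.1f}")
```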

[Figure: two explanations laid out against time.]

Time: t, t-1, t-3, t-5, t-6, t-9

Explanation:
P(OpState1 is 0) = 1.0, P(a1 is 0) = 1.0, P(a0 is 0) = 1.0, P(a2 is 1) = 1.0
P(OpState1 is 0) = 1.0, P(a1 is 1) = 1.0, P(a0 is 0) = 1.0, P(a2 is 1) = 1.0
P(a2 is 0) = 0.758
P(a2 is 0) = 0.545
P(a0 is 0) = 0.968
P(a0 is 1) = 0.517
P(OpState0 is 0) = 0.519
P(a0 is 1) = 0.778
P(OpState0 is 0) = 0.720

Time: t, t-1, t-3, t-5, t-6, t-7, t-9

Explanation:
P(OpState1 is 4) = 1.0, P(a1 is 0) = 1.0, P(a0 is 0) = 1.0, P(a2 is 1) = 1.0
P(OpState1 is 4) = 1.0, P(a1 is 1) = 1.0, P(a0 is 0) = 1.0, P(a2 is 1) = 1.0
P(a2 is 1) = 0.570
P(a2 is 1) = 0.974
P(a0 is 0) = 0.506
P(a0 is 1) = 0.549
P(OpState2 is 3) = 0.210
P(a2 is 0) = 0.882
P(OpState2 is 4) = 0.222

Process Diagram

[Diagram: process diagram with labelled variables TT, T6T, T36T, RBT, SOTT11, SOFT13, TGF, BPF, %C3, %C2, RINT, FF, PGM, PGB, AFT, C11/3T]

Typical Discovered Relationships

[Diagram: discovered links drawn between the variables TT, T6T, T36T, RBT, SOTT11, SOFT13, AFT, TGF, BPF, %C3, %C2, RINT, FF, C11/3T, PGM, PGB]

Parameters

DBN Search
           GA             EP
PopSize    100            10
MR         0.1            0.8
CR         0.8            ---
Gen        Based on FC    Based on FC

Correlation Search
c - Approx. 20% of s
R - Approx. 2.5% of s

Grouping GA
           Synth. 1    Synth. 2-6            Oil
PopSize    150         100                   150
CR         0.8         0.8                   0.8
MR         0.1         0.1                   0.1
Gen        150         100 (1000 for GPV)    150

Parameters

EP-Seeded GA
c - Approx. 20% of s
EPListSize - Approx. 2.5% of s
GAPopSize - 10
MR - 0.1
CR - 0.8
LMR - 0.1
Gen - Based on FC

HCHC
                  Oil       Synthetic
DBN_Iterations    1×10^6    5000
Winlen            1000      200
Winjump           500       50

