Date post: 20-Mar-2020
Estimating the Accuracy of Spectral Learning for HMMs Farhana Ferdousi Liza and Marek Grzes School of Computing, University of Kent, UK
Page 1: Farhana Ferdousi Liza and Marek Grzes School of Computing ... · Farhana Ferdousi Liza and Marek Grzes School of Computing, University of Kent, UK. Roadmap ... basis vector angle

The UK’s European university

Estimating the Accuracy of Spectral Learning for HMMs
Farhana Ferdousi Liza and Marek Grzes
School of Computing, University of Kent, UK


Roadmap

• Motivation
• Background
• Proposed method
• Experiments and results
• Conclusion
• Q&A


Motivation (Why Estimate the Accuracy?)

• When we see unexpected results, either the model is incorrect or the training data is insufficient.

• Unsupervised learning and model selection


Motivation (Why Spectral Learning for HMM?)

[Figure: likelihood (y-axis, 0 to 4.5) against θ (x-axis, 0 to 1). Points on the curve are labelled LM, 1_SL, 2_SL and 3_SL; one point carries the joint label "3_SL,LM".]


Background

• Spectral Learning
  • Moment-based parameter estimation
  • Uses information contained in the eigenvectors of a (item-item similarity) matrix to detect structure.
  • Provides certain (PAC-style) guarantees of performance (it is not purely heuristic).

• Hidden Markov Model
  • Described by three parameters: T, O and π (transition matrix, observation matrix and initial state distribution).


Spectral learning for HMM

OOM operator for HMM

Empirical low label moment calculation:

Transformed operators for HMM

UΣ𝑉∗ =
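The three steps named on this slide can be sketched as follows. This is a minimal sketch, assuming the standard observable-operator formulation of Hsu, Kakade and Zhang (2009); the slides may use a different variant. The function names, the column-stochastic convention for T, the n × m shape of O and the toy parameters are all illustrative assumptions. Exact (population) moments are used here so that the result can be compared with the forward algorithm.

```python
import numpy as np

def exact_moments(T, O, pi):
    """Population moments of the first three observations.

    Assumed conventions: T[i, j] = Pr(next state i | state j),
    O[x, j] = Pr(observation x | state j), pi = initial distribution.
    """
    n = O.shape[0]
    P1 = O @ pi                          # P1[i] = Pr(x1 = i)
    P21 = O @ T @ np.diag(pi) @ O.T      # P21[i, j] = Pr(x2 = i, x1 = j)
    # P3x1[x][i, j] = Pr(x3 = i, x2 = x, x1 = j)
    P3x1 = np.stack([
        O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T for x in range(n)
    ])
    return P1, P21, P3x1

def spectral_hmm(P1, P21, P3x1, m):
    """Transformed operators built from the top-m left singular vectors of P21."""
    U = np.linalg.svd(P21)[0][:, :m]
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    proj_pinv = np.linalg.pinv(U.T @ P21)
    B = np.stack([U.T @ P3 @ proj_pinv for P3 in P3x1])
    return b1, binf, B

def sequence_probability(b1, binf, B, seq):
    """Joint probability of seq using the transformed operators."""
    state = b1
    for x in seq:
        state = B[x] @ state
    return float(binf @ state)

def forward_probability(T, O, pi, seq):
    """Reference value: forward algorithm with OOM operators A_x = T diag(O[x])."""
    alpha = pi
    for x in seq:
        alpha = T @ (O[x] * alpha)
    return float(alpha.sum())

# Hypothetical toy HMM with m = 2 hidden states and n = 3 observations.
T = np.array([[0.7, 0.2], [0.3, 0.8]])
O = np.array([[0.6, 0.1], [0.3, 0.2], [0.1, 0.7]])
pi = np.array([0.6, 0.4])

b1, binf, B = spectral_hmm(*exact_moments(T, O, pi), m=2)
seq = [0, 2, 1, 1]
p_spectral = sequence_probability(b1, binf, B, seq)
p_forward = forward_probability(T, O, pi, seq)
print(p_spectral, p_forward)
```

With exact moments and full-rank T and O the two probabilities agree up to rounding. In practice P1, P21 and P3x1 would be replaced by empirical frequencies of observation singles, pairs and triples, and the accuracy then depends on the amount of training data.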


Observation 1

[Figure: two plots of the basis vectors. A 2-D plot (X axis, Y axis) shows FirstVec and SecondVec; a 3-D plot (X axis, Y axis, Z axis) shows FirstVec, SecondVec and ThirdVec.]

As expected, the basis vector (representing the subspace) rotates.


Observation 2

[Figure: basis vector angle change difference (y-axis) against training data size (x-axis), in eight panels. Panel titles: m = 2, m = 3, m = 4, m = 8, threshold = 0.1, threshold = 0.01, threshold = 0.001, threshold = 0.0001. The panels are grouped under the labels "Synthetic Dataset" and "Real Dataset".]

The difference between the rotating bases gets smaller on larger training subsets.


Ill-conditioned HMM

• A matrix is ill-conditioned if its condition number is too large or, equivalently, its inverse condition number (ICN) is too small.

• An HMM is ambiguous if the ICN of its characteristic matrices is too small (close to zero); in such a case, parameter estimation is difficult for any estimation technique.

• ICN was calculated as the ratio between the smallest and largest singular values of a row-augmented matrix of T and O.

• Example: ill-conditioned HMM
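The ICN computation described in the bullets above can be sketched as follows. This is an illustration only and not the example from the slide: "row augmented" is read here as stacking the rows of O beneath those of T, which assumes T is m × m and O is n × m. The function name and the two toy models are hypothetical.

```python
import numpy as np

def inverse_condition_number(T, O):
    """Ratio of the smallest to the largest singular value of [T; O].

    Assumes T is (m x m) and O is (n x m), so the stacked matrix is (m + n) x m.
    """
    stacked = np.vstack([T, O])
    s = np.linalg.svd(stacked, compute_uv=False)  # sorted in descending order
    return float(s[-1] / s[0])

# Hypothetical well-separated model: the two hidden states behave differently.
T_good = np.array([[0.9, 0.1], [0.1, 0.9]])
O_good = np.array([[0.8, 0.1], [0.1, 0.1], [0.1, 0.8]])

# Hypothetical ambiguous model: both hidden states behave identically.
T_bad = np.array([[0.5, 0.5], [0.5, 0.5]])
O_bad = np.array([[0.2, 0.2], [0.3, 0.3], [0.5, 0.5]])

icn_good = inverse_condition_number(T_good, O_good)
icn_bad = inverse_condition_number(T_bad, O_bad)
print(icn_good, icn_bad)
```

In the second model the columns of the stacked matrix are identical, so its smallest singular value is zero up to rounding and the two hidden states cannot be told apart from observations.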


Proposed Criteria

• Based on our observations, we propose a convergence criterion based on the basis vector angle change difference.

• Our claim is based on the second observation: as the training data increases, the angle change difference reduces, so the subspace stabilises and the point at which it does so can be determined.

• [Hsieh and Olsen] showed that the active subspace will never change in a neighborhood of the global minimum.

• The subset size and the threshold are application-specific and can be determined using cross-validation.
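One possible reading of this criterion is sketched below. The slides do not give the exact formula, so the details are assumptions: the "angle change" is taken to be the largest principal angle (in degrees) between the SVD bases obtained from two consecutive training subsets, and the "angle change difference" is the absolute difference between two consecutive angle changes. The function names are hypothetical.

```python
import numpy as np

def largest_principal_angle(U_a, U_b):
    """Largest principal angle in degrees between two subspaces.

    U_a and U_b must have orthonormal columns spanning the subspaces.
    The result does not depend on the sign or order of the basis vectors.
    """
    s = np.linalg.svd(U_a.T @ U_b, compute_uv=False)
    return float(np.degrees(np.arccos(np.clip(s.min(), -1.0, 1.0))))

def first_converged_index(bases, threshold):
    """Index into `bases` at which the angle change difference first drops
    below `threshold`, or None if it never does.

    `bases` holds one orthonormal basis per training subset, ordered by
    increasing subset size.
    """
    changes = [largest_principal_angle(a, b) for a, b in zip(bases, bases[1:])]
    for k in range(1, len(changes)):
        if abs(changes[k] - changes[k - 1]) < threshold:
            return k + 1  # changes[k] compares bases[k] with bases[k + 1]
    return None
```

Each basis would come from the top-m left singular vectors of the empirical pair-moment matrix computed on one training subset. Measuring angles between subspaces rather than between individual vectors avoids spurious jumps caused by the arbitrary sign of singular vectors.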


Experimental setting

Real Dataset: web-navigation data from msnbc.com

Synthetic Dataset: Configuration for the synthetic dataset


Evaluation 1: Error Measure for synthetic dataset (true model is known)

[Figure: normalized L1 error (y-axis) against threshold (log scaled, x-axis, -10 to 5), in four panels. Y-axis ranges: Example 1, 8.8 to 9.3; Example 2, 8.28 to 8.34; Example 15, 7.04 to 7.09; Example 4, 7.61 to 7.67.]


Evaluation 1: Error does not correspond with ill-conditioned HMM

[Figure: normalized L1 error (y-axis) against threshold (log scaled, x-axis, -10 to 5), in four panels. Y-axis ranges: Example 8, 0 to 12 ×10^4; Example 9, 0 to 5 ×10^4; Example 18, 0 to 10 ×10^4; Example 17, 0 to 10 ×10^5.]


Evaluation 2: Recovered Parameter and proposed criteria (Well-conditioned HMM)


Evaluation 2: Recovered Parameter and proposed criteria (Ill-conditioned HMM)

Threshold = 0.00001


Conclusion

• The angle change difference can be a useful criterion for checking convergence.

• Without a convergence criterion, it would be difficult to know whether the model is incorrect, or the model is correct but more training examples are required.


Future Work

• Problems with spectral learning
  • Cannot incorporate long-term dependencies
  • For large domains, the SVD can be time-consuming
  • Simplification of the domain space might be tricky and in some cases intractable


• Q & A
• Thanks


THE UK’S EUROPEAN UNIVERSITY

www.kent.ac.uk

