
1

A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union

Thomas Müller, Hinrich Schütze and Helmut Schmid

ACL, June 3-8, 2012

Reporter: Sitong Yang

ICT

2

Outline

• Introduction
• Modeling of morphology and shape
• Experimental Setup
• Results and Discussion
• Conclusion

4

Introduction

• Motivation

• Main idea

5

Motivation

• Language model?
• Frequent history: "potentially" → large, dangerous, serious
• Rare history: "hypothetically" → large, dangerous, serious
• How to transfer from the frequent history to the rare one? Morphology.

8

Main idea

• Goal
  • Perplexity reduction (PD) for a large number of languages

• Features
  • Morphology
  • Shape features

• Parameters
  • Frequency threshold θ
  • Number of suffixes used φ
  • Morphological segmentation algorithms
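The goal above is stated in terms of perplexity reduction. As a minimal sketch of how that quantity is usually computed (the function names and toy probabilities are illustrative assumptions, not taken from the slides), perplexity is the exponentiated average negative log-probability per token, and the reduction is measured relative to the baseline:

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def perplexity_reduction(pp_baseline, pp_model):
    """Relative reduction of pp_model with respect to pp_baseline."""
    return (pp_baseline - pp_model) / pp_baseline

# Toy numbers only (assumed for illustration): a baseline that assigns
# probability 0.01 to every token and a model that assigns 0.0125.
baseline_pp = perplexity([math.log(0.01)] * 4)
model_pp = perplexity([math.log(0.0125)] * 4)
reduction = perplexity_reduction(baseline_pp, model_pp)
```

In this toy case the model lowers perplexity from about 100 to about 80, a relative reduction of about 20%.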

10

Modeling of morphology and shape

• Morphology

• Shape features

• Similarity measure

11

Morphology

• Automatic suffix identification algorithms: Reports, Morfessor and Frequency

• Parameter: the φ most frequent suffixes are used
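A frequency-based suffix selection could be sketched roughly as follows. This is only an illustration of the idea of keeping the φ most frequent suffixes; the helper names, the maximum suffix length and the minimum stem length are assumptions, and the procedure used in the paper may differ in its details.

```python
from collections import Counter

def most_frequent_suffixes(vocabulary, phi, max_len=5, min_stem=2):
    """Return the phi most frequent word-final character sequences.

    max_len and min_stem are hypothetical settings for this sketch.
    """
    counts = Counter()
    for word in vocabulary:
        for k in range(1, max_len + 1):
            # Only count a suffix if a stem of at least min_stem characters remains.
            if len(word) - k >= min_stem:
                counts[word[-k:]] += 1
    return [suffix for suffix, _ in counts.most_common(phi)]

def suffix_feature(word, suffixes):
    """Longest selected suffix that the word ends in, or '' if none matches."""
    matches = [s for s in suffixes if word.endswith(s) and len(s) < len(word)]
    return max(matches, key=len, default="")

vocab = ["potentially", "hypothetically", "seriously", "largely", "walking", "talking"]
top = most_frequent_suffixes(vocab, phi=2)
feature = suffix_feature("hypothetically", top)
```

With this toy vocabulary, "potentially" and "hypothetically" receive the same suffix feature, which is the kind of sharing between frequent and rare words that the motivation slide describes.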

12

Shape features

• capitalization
• special characters
• word length
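The three kinds of shape information listed on this slide can be read off the surface string of a word. The sketch below is an assumed illustration; the feature names are hypothetical, and the exact feature set of the paper is defined in the authors' prior work rather than here.

```python
def shape_features(word):
    """Surface-form features of a word (illustrative feature names)."""
    return {
        # Capitalization
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        # Special characters: digits and anything that is not a letter or digit
        "has_digit": any(c.isdigit() for c in word),
        "has_special": any(not c.isalnum() for c in word),
        # Word length in characters
        "length": len(word),
    }

acronym = shape_features("EU-27")
plain = shape_features("large")
```

Words with identical shape features can then be grouped together even if they were rarely or never seen in training.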

13

Similarity measure

• The similarity measure and the details of the shape features are described in prior work (Müller and Schütze, 2011).

15

Experimental Setup

• Baseline

• Morphological class language model

• Distributional class language model

• Corpus

16

Experimental Setup

• Experiments:
  • SRILM, Kneser-Ney (KN), generic class implementation, optimal interpolation parameters

• Baseline
  • Modified KN model

17

Morphological class language model

Class-based language model:

Word emission probability:
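The formulas on this slide did not survive in the transcript. As a hedged stand-in, the standard class-based n-gram factorization is given here, with $g(\cdot)$ as assumed notation for the function mapping a word to its class and $c(\cdot)$ for training counts. The emission estimate shown is the plain maximum-likelihood one; the slide's own version may differ, for example in how the frequency threshold θ and unknown words are handled.

```latex
% Class-based language model (standard form, assumed notation):
P_C(w_i \mid w_{i-N+1}^{i-1})
  = P\bigl(g(w_i) \mid g(w_{i-N+1}) \dots g(w_{i-1})\bigr)\,
    P\bigl(w_i \mid g(w_i)\bigr)

% Word emission probability (maximum-likelihood estimate):
P\bigl(w \mid g(w)\bigr)
  = \frac{c(w)}{\sum_{w' :\, g(w') = g(w)} c(w')}
```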

18

Morphological class language model

Final model PM interpolates PC with a modified KN model:

Unknown word estimation:
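The interpolation formula itself is missing from the transcript. A minimal sketch of linear interpolation between a class model and a KN model is given below, assuming a single scalar weight λ chosen on validation data; the actual model may condition the weight on context, and all names here are hypothetical.

```python
import math

def interpolate(p_class, p_kn, lam):
    """Linear interpolation: lam * P_C + (1 - lam) * P_KN."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_class + (1.0 - lam) * p_kn

def best_lambda(pairs, grid):
    """Pick the weight from grid that minimizes validation negative log-likelihood.

    pairs holds one (p_class, p_kn) tuple per validation token; both
    probabilities are assumed to be strictly positive.
    """
    def neg_log_lik(lam):
        return -sum(math.log(interpolate(pc, pk, lam)) for pc, pk in pairs)
    return min(grid, key=neg_log_lik)

# Toy validation probabilities (assumed for illustration only).
validation_pairs = [(0.2, 0.1), (0.3, 0.1)]
lam = best_lambda(validation_pairs, grid=[0.0, 0.5, 1.0])
p_m = interpolate(0.2, 0.1, 0.5)
```

A grid search is used here only for brevity; any one-dimensional optimizer over the validation likelihood would serve the same purpose.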

19

Morphological class language model

modified class model PC'

20

Distributional class language model

• PD has the same form as PM

• The difference is that the classes are morphological for PM and distributional for PD

• Whole-context distributional vector space model

21

Corpus

• training set (80%)
• validation set (10%)
• test set (10%)
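An 80/10/10 split can be sketched as below. Whether the original split was contiguous or randomized is not stated on the slide; this sketch assumes a simple contiguous split over a list of sentences, and the function name is hypothetical.

```python
def split_corpus(sentences, train=0.8, valid=0.1):
    """Split a list of sentences into training, validation and test parts.

    The test part receives whatever remains after the first two parts.
    """
    n_train = int(len(sentences) * train)
    n_valid = int(len(sentences) * valid)
    train_set = sentences[:n_train]
    valid_set = sentences[n_train:n_train + n_valid]
    test_set = sentences[n_train + n_valid:]
    return train_set, valid_set, test_set

corpus = [f"sentence {i}" for i in range(10)]
train_set, valid_set, test_set = split_corpus(corpus)
```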

23

Results and Discussion

• Morphological model vs. Distributional model

• Sensitivity analysis of parameters

24

Morphological model vs. Distributional model

• MM: the more morphologically complex the language, the greater the perplexity reduction and the larger φ.

• MM: yields considerable perplexity reductions of 3%-11%.

• Frequency performs surprisingly well.

• In only 4 cases is DM better than MM.

• DM: clustering is restricted to less frequent words.

25

Morphological model vs. Distributional model

26

Sensitivity analysis of parameters

• Best and worst values of each parameter and the difference in perplexity improvement between the two.

• θ
  • strong influence on PD
  • positively correlated with morphological complexity

• φ and segmentation algorithms
  • negligible effect
  • Frequency performs best.

27

Sensitivity analysis of parameters

29

Conclusion

• Features: morphology and shape features

• Result: perplexity reductions of 3%-11%

• Parameters:
  • θ: considerable influence
  • φ and segmentation algorithms: small effect

30

Future Work

• A model that interpolates KN, the morphological class model and the distributional class model.

31

my thought

• Minority language model

32

Q&A?


33

Thank you!
