Date post: 30-Jan-2016
Hindi POS tagging and chunking : An MEMM approach Aniket Dalal Kumar Nagaraj Uma Sawant Sandeep Shelke Under the guidance of Prof. P. Bhattacharyya

Hindi POS tagging and chunking : An MEMM approach

Aniket Dalal, Kumar Nagaraj, Uma Sawant, Sandeep Shelke

Under the guidance of Prof. P. Bhattacharyya


Goal

Lexical Analysis
  Part-Of-Speech (POS) Tagging : Assigning a part of speech to each word, e.g. Noun, Verb, ...

Syntactic Analysis
  Chunking : Identifying phrases and labelling them as verb phrase, noun phrase, etc.

Language : Hindi
Approach : MEMM


Outline

Maximum Entropy Markov Model (MEMM)
  Principle
  Mathematical formulation

System overview
  Parameter estimation and classification

POS tagging features

Chunking features

Results and error analysis

Future work

Conclusion


Maximum Entropy Markov Model

Maximum entropy principle
  The least biased model that takes into account all known information is the one that maximizes entropy.

Entropy
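For reference, the Shannon entropy of a distribution $p$ over outcomes $x$ is $H(p) = -\sum_x p(x) \log p(x)$. The short Python sketch below (standard library only, with made-up tag distributions) computes it and illustrates the principle: with no information beyond the set of candidate tags, the uniform distribution has the highest entropy, i.e. it is the least biased choice.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping outcome -> probability."""
    # Outcomes with p = 0 contribute nothing, since p*log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Hypothetical distributions over three candidate tags for one word.
uniform = {"N": 1 / 3, "VFM": 1 / 3, "PN": 1 / 3}
skewed = {"N": 0.8, "VFM": 0.1, "PN": 0.1}

print(round(entropy(uniform), 4))  # 1.0986, i.e. log(3)
print(entropy(skewed) < entropy(uniform))  # True
```

Any extra knowledge (for instance, that one tag is much more frequent) would be encoded as a constraint, and the maximum entropy model is the flattest distribution that still satisfies it.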


Maximum Entropy Markov Model

Mathematical formulation...

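The distribution with the maximum entropy, subject to the constraint that the model's expected value of every feature equals its empirical expected value in the training data, is equivalent to the maximum likelihood log-linear model:

$$p(t \mid h) = \frac{1}{Z(h)} \exp\Big(\sum_{j=1}^{k} \lambda_j f_j(h, t)\Big), \qquad Z(h) = \sum_{t'} \exp\Big(\sum_{j=1}^{k} \lambda_j f_j(h, t')\Big)$$

where $h$ is the history (context) of the current word, $t$ is a candidate tag, $f_1, \dots, f_k$ are the feature functions and $\lambda_j$ their weights.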
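A maximum entropy model has the log-linear form $p(t \mid h) \propto \exp\big(\sum_j \lambda_j f_j(h, t)\big)$, where $h$ is the word's context and $t$ a candidate tag. A minimal sketch of evaluating it for a single word, assuming binary features identified by strings such as `word=aur&tag=CC`; the feature names and weights are hypothetical, not trained values from this system. Subtracting the largest score before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math

def tag_distribution(active_features, weights, tags):
    """Return p(t | h) for every candidate tag t under a log-linear model.

    active_features(t) -> names of the binary features f_j with f_j(h, t) = 1
    weights            -> dict feature name -> lambda_j (missing means 0)
    """
    scores = {t: sum(weights.get(f, 0.0) for f in active_features(t)) for t in tags}
    m = max(scores.values())  # cancels out in the normalisation by Z(h)
    unnorm = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(unnorm.values())
    return {t: u / z for t, u in unnorm.items()}

# Hypothetical history h: current word "aur", previous tag "PN".
def active(tag):
    return [f"word=aur&tag={tag}", f"prev=PN&tag={tag}"]

weights = {"word=aur&tag=CC": 2.0, "prev=PN&tag=CC": 0.5, "prev=PN&tag=N": 0.3}
p = tag_distribution(active, weights, ["CC", "N", "PN"])
print(max(p, key=p.get))  # CC
```

In an MEMM the history $h_i$ includes the previous tag $t_{i-1}$, so the probability of a whole tag sequence is the product of these per-word distributions.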


System overview

Parameter estimation and classification

GIS (Generalized Iterative Scaling)
  Finds the model parameters that define the maximum entropy classifier for a given feature set and training corpus

Beam Search
  Heuristic search algorithm; an optimization of best-first search
  Expands only the m most promising nodes at each depth
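A compact sketch of GIS for a conditional maximum entropy tagger, written from the standard algorithm (Darroch and Ratcliff) rather than from this project's code. Features are (predicate, tag) pairs seen in training, and every weight is updated in parallel by $\lambda_j \leftarrow \lambda_j + \frac{1}{C}\log\frac{\tilde{E}[f_j]}{E_p[f_j]}$, where $C$ is the constant total feature value per (context, tag) pair that GIS requires, enforced by an extra correction feature. The predicate strings, toy data and iteration count are hypothetical.

```python
import math
from collections import defaultdict

CORRECTION = "__correction__"  # hypothetical name for the GIS correction predicate

def predict(weights, active, ctx, tags):
    """p(tag | ctx) proportional to exp(sum_j lambda_j * f_j(ctx, tag))."""
    scores = {t: sum(weights[f] * v for f, v in active(ctx, t).items()) for t in tags}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def train_gis(data, iterations=200):
    """Generalized Iterative Scaling.

    data: list of (ctx, gold_tag), ctx being a list of string predicates.
    A feature (predicate, tag) has value 1 when the predicate is in ctx
    and the candidate tag matches; only pairs seen in training are kept.
    """
    tags = sorted({t for _, t in data})
    features = {(p, t) for ctx, t in data for p in ctx}
    # Feature values of every (ctx, tag) pair must sum to the constant C;
    # the correction feature takes up the slack (always >= 1 here).
    C = max(len(set(ctx)) for ctx, _ in data) + 1

    def active(ctx, tag):
        fs = {(p, tag): 1.0 for p in set(ctx) if (p, tag) in features}
        slack = C - len(fs)
        fs[(CORRECTION, tag)] = float(slack)
        return fs

    n = len(data)
    empirical = defaultdict(float)  # empirical expectation of each feature
    for ctx, gold in data:
        for f, v in active(ctx, gold).items():
            empirical[f] += v / n

    weights = defaultdict(float)
    for _iteration in range(iterations):
        expected = defaultdict(float)  # expectation under the current model
        for ctx, _gold in data:
            probs = predict(weights, active, ctx, tags)
            for t in tags:
                for f, v in active(ctx, t).items():
                    expected[f] += probs[t] * v / n
        # Parallel update: lambda_j += (1/C) * log(empirical_j / expected_j).
        for f, e in empirical.items():
            weights[f] += math.log(e / expected[f]) / C
    return weights, active, tags

data = [
    (["word=Ram", "prev=START"], "PN"),
    (["word=aur", "prev=PN"], "CC"),
    (["word=Sita", "prev=CC"], "PN"),
    (["word=shaadi", "prev=PN"], "N"),
]
weights, active, tags = train_gis(data)
probs = predict(weights, active, ["word=aur", "prev=PN"], tags)
print(max(probs, key=probs.get))  # CC
```

GIS is simple but can converge slowly; the correction feature is what lets every update use the same step factor $1/C$.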


What are features?

Feature function : An indicator function that captures a useful fact about the modelling task

For example,
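As a hypothetical illustration (not the slide's original example), here are two indicator features over a history `h` and a candidate tag: one tests a word suffix, the other the previous tag. The suffix `ne`, the tags chosen and the dictionary layout of `h` are assumptions made for this sketch.

```python
def f_suffix_ne_grnd(history, tag):
    """Hypothetical feature: 1 if the current word ends in "ne"
    and the candidate tag is GRND, else 0."""
    return 1 if history["word"].endswith("ne") and tag == "GRND" else 0

def f_prev_pn_cc(history, tag):
    """Hypothetical feature: 1 if the previous tag is PN
    and the candidate tag is CC, else 0."""
    return 1 if history["prev_tag"] == "PN" and tag == "CC" else 0

h = {"word": "karne", "prev_tag": "N"}
print(f_suffix_ne_grnd(h, "GRND"), f_suffix_ne_grnd(h, "VM"))  # 1 0
print(f_prev_pn_cc(h, "CC"))  # 0
```

Each feature pairs a property of the context with one specific tag, so the model can learn a separate weight for every such combination.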


POS tagging features

Context-based
  POS tag of previous word
  Current word

Word-dependent
  Suffixes
  Digits
  Special characters
  English words
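A sketch of how these context-based and word-dependent predicates might be extracted for word `i`. The suffix lengths (1 to 3), the feature-string format, and detecting "English words" as all-ASCII alphabetic tokens are assumptions; the last one presumes Hindi input in Devanagari and would not work on romanized text. Special characters are detected with Unicode punctuation/symbol categories, so Devanagari vowel signs are not misflagged.

```python
import unicodedata

def pos_features(words, i, prev_tag, suffix_lengths=(1, 2, 3)):
    """Hypothetical POS-tagging predicates for words[i]."""
    w = words[i]
    feats = [f"prev_tag={prev_tag}", f"word={w}"]  # context-based
    # Suffixes of a few assumed lengths, only when the word is longer.
    for k in suffix_lengths:
        if len(w) > k:
            feats.append(f"suffix{k}={w[-k:]}")
    if any(ch.isdigit() for ch in w):
        feats.append("has_digit")
    # Unicode categories starting with P (punctuation) or S (symbol).
    if any(unicodedata.category(ch)[0] in "PS" for ch in w):
        feats.append("has_special_char")
    # Assumption: an all-ASCII alphabetic token in Devanagari text is English.
    if w.isascii() and w.isalpha():
        feats.append("english_word")
    return feats

print(pos_features(["मैंने", "email", "भेजा।"], 1, "PRP"))
```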


POS tagging features

Dictionary-based
  Possible tags for the word, according to the dictionary

Corpus-driven
  Occurrence of a word and its tag(s) according to the training data
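A sketch of the dictionary-based and corpus-driven features, assuming training data as lists of (word, tag) pairs and a dictionary given as a mapping from word to its possible tags. The `dictionary` argument, the feature names, and the extra `corpus_top` and `unseen_word` predicates are illustrative choices, not features listed on the slide.

```python
from collections import Counter, defaultdict

def build_tag_lexicon(tagged_sentences):
    """Count how often each word carries each tag in the training data."""
    lexicon = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            lexicon[word][tag] += 1
    return lexicon

def lexical_features(word, dictionary, lexicon):
    """dictionary: hypothetical word -> set of possible tags
    lexicon:    word -> Counter of training tags (from build_tag_lexicon)"""
    feats = [f"dict_tag={t}" for t in sorted(dictionary.get(word, ()))]
    seen = lexicon.get(word)  # .get does not insert into the defaultdict
    if seen:
        feats += [f"corpus_tag={t}" for t in sorted(seen)]
        feats.append(f"corpus_top={seen.most_common(1)[0][0]}")
    else:
        feats.append("unseen_word")
    return feats

train = [[("Ram", "PN"), ("aur", "CC"), ("Sita", "PN")], [("aur", "CC"), ("ghar", "N")]]
lexicon = build_tag_lexicon(train)
dictionary = {"aur": {"CC"}, "ghar": {"N"}}  # made-up dictionary entries
print(lexical_features("aur", dictionary, lexicon))
# ['dict_tag=CC', 'corpus_tag=CC', 'corpus_top=CC']
```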


Chunking features

Context-based features
  Word itself (conditionally)
  POS tag
  Chunk label of previous word

Current POS tag based feature
  Tag class
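A sketch of the chunking features for word `i`. The slide does not say which condition controls use of the word itself or how tags are grouped into classes, so `WORD_FEATURE_TAGS`, `TAG_CLASS` and the extra previous-POS predicate below are hypothetical placeholders, as is the chunk label `NP` in the usage.

```python
# Hypothetical coarse classes over tags that appear in these slides.
TAG_CLASS = {"N": "noun", "PN": "noun",
             "VM": "verb", "VFM": "verb", "VAUX": "verb", "VNN": "verb", "GRND": "verb",
             "CC": "conj"}

# Hypothetical condition: use the word itself only for closed-class tags,
# where the identity of the word is informative.
WORD_FEATURE_TAGS = {"CC", "VAUX"}

def chunk_features(words, pos_tags, i, prev_chunk):
    """Chunking predicates for word i, given all POS tags and the chunk
    label already assigned to word i-1."""
    tag = pos_tags[i]
    feats = [f"pos={tag}", f"prev_chunk={prev_chunk}",
             f"tag_class={TAG_CLASS.get(tag, 'other')}"]
    if i > 0:  # assumed one-word window of POS context
        feats.append(f"prev_pos={pos_tags[i - 1]}")
    if tag in WORD_FEATURE_TAGS:
        feats.append(f"word={words[i]}")
    return feats

words, tags = ["Ram", "aur", "Sita"], ["PN", "CC", "PN"]
print(chunk_features(words, tags, 1, "NP"))
# ['pos=CC', 'prev_chunk=NP', 'tag_class=conj', 'prev_pos=PN', 'word=aur']
```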


Experimental Setup

26 POS tags

6 chunk labels

75 - 25 split of training and test data

Results averaged over 10 data sets
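One way to realise this protocol, assuming the 10 data sets are independent random 75/25 splits at the sentence level (the slide does not say how they were drawn). `train_and_evaluate` is a hypothetical callback that trains a tagger on one split and returns its test accuracy; the fixed seed makes the splits reproducible.

```python
import random
import statistics

def random_splits(sentences, n_runs=10, train_fraction=0.75, seed=0):
    """Yield (train, test) pairs: n_runs independent random splits."""
    rng = random.Random(seed)
    cut = int(len(sentences) * train_fraction)
    for _ in range(n_runs):
        shuffled = sentences[:]  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]

def best_and_average(sentences, train_and_evaluate, **kwargs):
    """Run train_and_evaluate(train, test) on every split; return (best, mean)."""
    scores = [train_and_evaluate(tr, te) for tr, te in random_splits(sentences, **kwargs)]
    return max(scores), statistics.mean(scores)
```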


Results

POS tagging accuracy
  Best : 89.346 %
  Average : 88.4 %

Chunk labelling accuracy (per word basis)
  Best : 87.399 %
  Average : 86.45 %
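Per-word accuracy counts every word whose predicted label matches the gold label, pooled over the whole test set. A minimal sketch, assuming labels are given as one list per sentence; the toy labels are made up.

```python
def per_word_accuracy(gold_sentences, predicted_sentences):
    """Percentage of words whose predicted label equals the gold label."""
    if len(gold_sentences) != len(predicted_sentences):
        raise ValueError("different numbers of sentences")
    correct = total = 0
    for gold, pred in zip(gold_sentences, predicted_sentences):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return 100.0 * correct / total

gold = [["PN", "CC", "PN"], ["N", "VM"]]
pred = [["PN", "CC", "N"], ["N", "VM"]]
print(per_word_accuracy(gold, pred))  # 80.0
```

The same function applies to POS tags or chunk labels; a per-chunk measure (counting whole chunks) would generally give a different number.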


Accuracy across runs


Error Analysis : POS tagging

Good performance for :
  VAUX, VFM, VNN
  Postpositions

Need to improve :
  Compound tags
  Proper nouns
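A breakdown like this one can be produced by computing accuracy separately for each gold tag and listing the most frequent confusions. A minimal sketch over flat tag lists; the tag names are reused from these slides, but the predictions are invented.

```python
from collections import Counter

def per_tag_accuracy(gold_tags, predicted_tags):
    """Accuracy per gold tag, plus (gold, predicted) confusions by frequency."""
    totals, correct, confusions = Counter(), Counter(), Counter()
    for g, p in zip(gold_tags, predicted_tags):
        totals[g] += 1
        if g == p:
            correct[g] += 1
        else:
            confusions[(g, p)] += 1
    accuracy = {t: correct[t] / totals[t] for t in totals}
    return accuracy, confusions.most_common()

gold = ["PN", "CC", "PN", "VAUX", "VAUX"]
pred = ["N", "CC", "PN", "VAUX", "VAUX"]
accuracy, confusions = per_tag_accuracy(gold, pred)
print(accuracy["PN"], confusions)  # 0.5 [(('PN', 'N'), 1)]
```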


Error Analysis : Chunking

Good performance for : Noun phrase

Need to improve : Verb phrase


Future Work

Morphological Features

Enriching dictionary

Hybrid models



Thank you!

Questions ?


Example

Ram/PN aur/CC Sita/PN Shaadi/N karne/GRND ja/VM rahen/VAUX hain/VAUX

(Ram and Sita are going to get married.)
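Tagged text in this `word/TAG` format can be read into (word, tag) pairs with a sketch like the following, which assumes every token carries exactly one tag after its last `/`.

```python
def parse_tagged(line):
    """Split a "word/TAG word/TAG ..." line into (word, tag) pairs.
    rsplit on the last '/' keeps any '/' inside the word itself."""
    return [tuple(token.rsplit("/", 1)) for token in line.split()]

pairs = parse_tagged("Ram/PN aur/CC Sita/PN Shaadi/N karne/GRND ja/VM rahen/VAUX hain/VAUX")
print(pairs[:2])  # [('Ram', 'PN'), ('aur', 'CC')]
```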


Beam Search

[Figure: beam search over candidate tags for "Ram" and "Aur"; recoverable node labels: N:0.3, CC:0.005, PN:0.4, CC:0.2, CC:0.15, CC:0.25, INJ:0.10, VA:0.05]
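A minimal sketch of beam search for an MEMM-style tagger: at each word it extends every kept partial tag sequence with each candidate tag, scores it by the sum of log probabilities, and keeps only the `beam_width` best (the m of the system overview). `toy_probs` is a hypothetical stand-in for the trained model, with invented numbers rather than the scores in the figure.

```python
import math

def beam_search(words, tag_probs, beam_width=3):
    """Left-to-right beam search.

    tag_probs(words, i, prev_tag) -> dict tag -> p(tag | history)
    Returns the highest-scoring complete tag sequence found.
    """
    beam = [(0.0, [])]  # (log probability, tags so far)
    for i in range(len(words)):
        candidates = []
        for logp, tags in beam:
            prev = tags[-1] if tags else "START"
            for tag, p in tag_probs(words, i, prev).items():
                if p > 0:
                    candidates.append((logp + math.log(p), tags + [tag]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]  # prune to the best beam_width
    return beam[0][1] if beam else []

def toy_probs(words, i, prev):
    # Hypothetical hand-set distributions, not the trained model's output.
    if words[i] == "Ram":
        return {"PN": 0.6, "N": 0.3, "VM": 0.1}
    if words[i] == "aur":
        return {"CC": 0.9, "N": 0.1} if prev == "PN" else {"CC": 0.5, "N": 0.5}
    return {"PN": 0.7, "N": 0.3}

print(beam_search(["Ram", "aur", "Sita"], toy_probs, beam_width=2))  # ['PN', 'CC', 'PN']
```

With `beam_width=1` this reduces to greedy left-to-right tagging; a wider beam lets a sequence that starts with a lower-scoring tag survive if later words favour it.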

