Classification of Verbs – Towards Developing Bengali Verb Subcategorization Lexicon

February 12, 2010

Somnath Banerjee, Dipankar Das and Sivaji Bandyopadhyay

Department of Computer Science & Engineering

Jadavpur University, Kolkata-700032, India

The NLP Research Group at Jadavpur University

• Computer Science and Engineering Department
• Teaching
• NLP in both Undergraduate and Postgraduate courses in Computer Science and Engineering
• Research

NLP Research

• National Consortium R & D Projects
  • Cross Lingual Information Access
  • English to Indian Languages Machine Translation Systems
  • Indian Languages to Indian Languages Machine Translation Systems

NLP Research

• International Research Projects
  • “Advanced Platform for Question Answering Systems” with Prof. Patrick Saint-Dizier, France
  • “Sentiment Analysis” with Prof. Jun'ichi Tsujii and Prof. Manabu Okumura, Japan
  • “Answer Validation through Textual Entailment” with Prof. Alexander Gelbukh, Mexico

NLP Research Areas

• Answer Validation Through Textual Entailment
• Machine Translation using EBMT approach (Manipuri – English)
• Opinion Mining and Opinion Summarization
• Emotion Analysis
• Temporal Relations Extraction

NLP Research Areas

• Named Entity Extraction
• Subcategorization Frame Acquisition
• Question Answering Systems
• Text Summarization
• Statistical Machine Translation
• Language Generation
• Multi Word Extraction

Outline

• Motivation
• Previous work
• Language Challenges
• Bengali Subcategorization Frames Acquisition
• Classification of Bengali Verbs
• Evaluation
• Future Task

Motivation

• Subcategorization refers to certain kinds of relations between words and phrases in a sentence.
• A subcategorization frame is a statement of what types of syntactic arguments a verb (or adjective) takes, such as objects, infinitives, that-clauses, participial clauses and subcategorized prepositional phrases.

Motivation

• No existing parser in Bengali
  - Subcategorization frame information helps in parsing
• Subcategorization frames
  • Phrase alignment in a SMT system
  • Question answering systems
• Goal: to build a verb subcategorization lexicon for Bengali using English VerbNet and a Bengali-English bilingual dictionary

Previous work (1/5)

• ANLT (Boguraev and Briscoe, 1987)
  - Alvey Natural Language Tools
  - Manually prepared machine readable subcategorization lexicon
• ACQUILEX (Copestake, 1992)
  - Acquisition of Lexical Knowledge for NLP Systems
  - Designed to support representation of multilingual lexical information extracted from machine readable dictionaries
• COMLEX Syntax (Grishman et al., 1994)
  - Computational lexicon consisting of syntactic information for approximately 38,000 English headwords

Previous work (2/5)

• FrameNet (Baker et al., 1998)
  - On-line lexical resource for English, based on frame semantics and supported by corpus evidence
  - Range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses
  - More than 11,600 lexical units, more than 6,800 of which are fully annotated in more than 960 semantic frames exemplified in more than 150,000 annotated sentences
• PropBank (Palmer et al., 2005)
  - A corpus annotated with verbal propositions and their arguments
  - Widely used for semantic role labeling and other NLP tasks
  - PropBank differs from FrameNet:
    - It commits to annotating all verbs in its data.
    - All arguments to a verb must be syntactic constituents.

Previous work (3/5)

• Valex (Korhonen et al., 2006)
  - 163 Subcategorization Frame (SCF) types, a superset of ANLT and COMLEX
  - Provides a lexical entry for each verb and SCF combination
  - Total 212,741 entries, 33 per verb on average
  - Suitable for statistical NLP, linguistic and psycholinguistic use

Previous work (4/5)

• VerbNet (Kipper-Schuler, 2006)
  - VerbNet associates the semantics of a verb with its syntactic frames
  - Hierarchical, domain-independent, broad-coverage verb lexicon based on Levin (1993) verb classes
  - Mappings to other lexical resources such as WordNet (Miller, 1990; Fellbaum, 1998), XTAG (XTAG Research Group, 2001), and FrameNet (Baker et al., 1998)
  - Verb class described by thematic roles (23), selectional restrictions on the arguments, and frames consisting of a syntactic description (55) and semantic predicates (94)
  - 274 verb classes and 5257 verb senses

Previous work (5/5)

• Ushioda, A., Evans, D.A., Gibson, T., Waibel, A.: The Automatic Acquisition of Frequencies of Verb Subcategorization Frames from Tagged Corpora. In: Boguraev, B., Pustejovsky, J. (eds.) Proceedings of the Workshop on Acquisition of Lexical Knowledge from Text, Columbus, Ohio, pp. 95–106 (1993)
• Lapata, M., Brew, C.: Using subcategorization to resolve verb class ambiguity. In: Fung, P., Zhou, J. (eds.) Proceedings of WVLC/EMNLP, pp. 266–274 (1999)
• Sarkar, A., Zeman, D.: Automatic extraction of subcategorization frames for Czech. In: Proceedings of COLING 2000 (2000)
• Muraki, K., Kamei, S., Doi, S.: A Left-to-right Breadth-first Algorithm for Subcategorization Frame Selection of Japanese Verbs. In: TMI (1997)

Language Challenges

• Properties of Bengali
  - Free phrase order language (each ordering below means “I will meet you tomorrow”)
      Ami kal tomar sathe dekha korbo
      Ami tomar sathe kal dekha korbo
      Ami kal dekha korbo tomar sathe
      Kal ami tomar sathe dekha korbo
      Kal ami dekha korbo tomar sathe
  - Rich morphology (Tense, Aspect and Person for verb)
  - Compound verbs (“dekha” (see), “kara” (do), “dekha kara” (meet))
  - Difficult to differentiate Arguments from Adjuncts
      [Mohit] [sakalbela] [kath diye] [bari] [toiri korchilo]
      [Mohit] [was preparing] [house] [out of wood] [in the morning]

Bengali Verb Subcategorization Frames Acquisition

• Hypothesis
• Corpus Preparation
• Attempts Done
• Target Verb Identification
• English Equivalent Verb Determination
• VerbNet Frames
• Bengali Verb Subcategorization Frames Acquisition

Hypothesis

• The verb subcategorization frames of the equivalent English verbs (in the same sense) of a Bengali verb give the initial set of verb subcategorization frames for that Bengali verb
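
The hypothesis can be read as a simple lookup-and-union step. The sketch below is only an illustration under assumptions: the function name `initial_frames`, the dictionary-shaped index and the toy entries are hypothetical, and the real frame inventory comes from VerbNet.

```python
from typing import Dict, Iterable, Set


def initial_frames(english_verbs: Iterable[str],
                   frame_index: Dict[str, Set[str]]) -> Set[str]:
    """Union of the frames listed for each English equivalent verb."""
    frames: Set[str] = set()
    for verb in english_verbs:
        # Verbs missing from the index simply contribute no frames.
        frames |= frame_index.get(verb, set())
    return frames


# Hypothetical toy index (not real VerbNet data).
toy_index = {"use": {"NP-PP"}, "apply": {"NP-PP"}}
print(initial_frames(["apply", "use"], toy_index))
```

The result is only an initial candidate set; the later corpus-based steps decide which of these frames actually occur for the Bengali verb.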

Corpus Preparation

• Bengali News Corpus (Ekbal and Bandyopadhyay, 2008) developed from web archive
  - Ekbal, A., Bandyopadhyay, S.: A Web-based Bengali News Corpus for Named Entity Recognition. Language Resources and Evaluation (LRE) Journal 42(2), 173–182 (2008)
  - 14000 sentences
• POS tagged using Maximum Entropy based POS tagger (Ekbal et al., 2008) (accuracy 88.2%)
  - Ekbal, A., Haque, R., Bandyopadhyay, S.: Maximum Entropy Based Bengali Part of Speech Tagging. In: Gelbukh, A. (ed.) Advances in Natural Language Processing and Applications, Research in Computing Science (RCS) Journal, vol. 33, pp. 67–78 (2008)
• Application of rule-based chunker (accuracy 89.4%) to get the chunked output

Attempts Done (1+10 = 11 most frequent Bengali verbs) (1/2)

Attempts Done (2/2)

• Most frequent verb (dekha) (see)
  - D. Das, A. Ekbal, and S. Bandyopadhyay. 2009. Acquiring Verb Subcategorization Frames in Bengali from Corpora. ICCPOL-09, LNAI-5459, 386-393, Hong Kong
• Next most frequent verb (kara) (do)
  - A special compound verb in Bengali
  - S. Banerjee, D. Das and S. Bandyopadhyay. 2009. Bengali Verb Subcategorization Frame Acquisition – A Baseline Model. 7th ALR Workshop (ACL-IJCNLP-2009), pp. 76-83, Suntec, Singapore

Target Verb Identification

• Stemming (accuracy 97.09%)
• Retrieve pattern {(VM)} for simple verbs and {[XXX] (NN) [kara] (VM)} for compound verbs
• Verb forms not considered in the current task
  - Passive forms (e.g. (karano), (kariye))
  - Special forms (e.g. (jhakjhak kara), (taktak kara))
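
A minimal sketch of the two retrieval patterns, assuming the input is a list of already stemmed (word, POS tag) pairs. The function name `find_target_verbs` and the token representation are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (stemmed word, POS tag)


def find_target_verbs(tokens: List[Token], light_verb: str = "kara") -> List[str]:
    """Collect simple verbs {(VM)} and compounds {[XXX](NN) [kara](VM)}."""
    verbs = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "VM":
            continue
        prev = tokens[i - 1] if i > 0 else None
        if word == light_verb and prev is not None and prev[1] == "NN":
            # Compound pattern: a noun immediately followed by "kara".
            verbs.append(f"{prev[0]} {word}")
        else:
            # Simple pattern: a main verb on its own.
            verbs.append(word)
    return verbs


print(find_target_verbs([("ami", "PRP"), ("kakatua", "NNP"), ("dekha", "VM")]))
print(find_target_verbs([("ram", "NNP"), ("khabar", "NN"),
                         ("prostut", "NN"), ("kara", "VM")]))
```

The excluded passive and special forms would need an additional filter, which is left out here.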

English Equivalent Verb Determination

• Bengali-English bilingual dictionary
  • http://home.uchicago.edu/~cbs2/banglainstruction.html
• Synonymous Verb Set (SVS)

< v. to apply, to use; to behave, to treat…>
< v. SVS1;SVS2;SVS3;…>
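
The entry layout above suggests that semicolons separate the synonymous verb sets and commas separate the verbs inside one set. Under that assumption, a small parser could look like the sketch below; `parse_svs` is a hypothetical helper, and the real dictionary file may need more cleaning.

```python
from typing import List


def parse_svs(entry: str) -> List[List[str]]:
    """Split a dictionary entry into synonymous verb sets (SVS)."""
    body = entry.strip().strip("<>").strip()
    if body.startswith("v."):
        body = body[2:]                 # drop the part-of-speech marker
    body = body.replace("…", "")        # drop the elision mark, do not guess its content
    svs_list = []
    for group in body.split(";"):       # one group per sense
        verbs = []
        for item in group.split(","):   # synonyms inside a sense
            verb = item.strip()
            if verb.startswith("to "):
                verb = verb[3:]
            if verb:
                verbs.append(verb)
        if verbs:
            svs_list.append(verbs)
    return svs_list


print(parse_svs("< v. to apply, to use; to behave, to treat…>"))
# [['apply', 'use'], ['behave', 'treat']]
```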

VerbNet Frames (1/3)

• The VerbNet files contain the verbs with their possible subcategorization frames and membership information in XML format.
• The XML files of VerbNet have been preprocessed to build a general list that contains all members (verbs) and their possible subcategorization frames (primary as well as secondary).
• This preprocessed list is searched to acquire the subcategorization frames for each member of the SVS of the Bengali verb.

VerbNet Frames (2/3)

..... <VNCLASS ID="use-105"…
    <MEMBERS>
        <MEMBER name="use" wn=""/>
        <MEMBER name="utilize" wn=""/>
        <MEMBER name="apply" wn=""/>
        <MEMBER name="employ" wn=""/>
    </MEMBERS>
.....
    <FRAME>
        <DESCRIPTION descriptionNumber="8.1" primary="NP-PP" secondary="for-PP" xtag="0.2"/>
        <EXAMPLES>
            <EXAMPLE>I spent the money for my training.</EXAMPLE>
        </EXAMPLES>
    </FRAME>……
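
A rough sketch of the preprocessing step, using Python's standard `xml.etree.ElementTree`. The sample string is a closed-off version of the fragment above (the closing `</VNCLASS>` tag is added here as an assumption so that it parses), and `index_frames` is a hypothetical name. Real VerbNet files also contain subclasses, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SAMPLE = """
<VNCLASS ID="use-105">
  <MEMBERS>
    <MEMBER name="use" wn=""/>
    <MEMBER name="utilize" wn=""/>
    <MEMBER name="apply" wn=""/>
    <MEMBER name="employ" wn=""/>
  </MEMBERS>
  <FRAME>
    <DESCRIPTION descriptionNumber="8.1" primary="NP-PP" secondary="for-PP" xtag="0.2"/>
    <EXAMPLES>
      <EXAMPLE>I spent the money for my training.</EXAMPLE>
    </EXAMPLES>
  </FRAME>
</VNCLASS>
"""


def index_frames(xml_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each member verb to the (primary, secondary) frames of its class."""
    root = ET.fromstring(xml_text.strip())
    frames = [(d.get("primary"), d.get("secondary"))
              for d in root.iter("DESCRIPTION")]
    # Simplification: every member receives every frame found in the class.
    return {m.get("name"): list(frames) for m in root.iter("MEMBER")}


index = index_frames(SAMPLE)
print(index["apply"])  # [('NP-PP', 'for-PP')]
```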

VerbNet Frames (3/3)

Bengali Verb Subcategorization Frames Acquisition (1/4)

• In simple sentences, an NNPC, NNP, NNC or NN tag that is preceded by a PRP (pronoun), NNP, NNC, NN or NNPC tag and followed by the verb gives a frame syntax similar to the “Basic Transitive” frame of VerbNet.

(ami)(PRP) (kakatua)(NNP) (dekhi)(VM)
I parrot see
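
The rule above can be sketched as a check on the two tags immediately before the verb. This is an illustrative reading that assumes a flat list of POS tags for a simple sentence; `is_basic_transitive` is a hypothetical name.

```python
from typing import List

NOUN_TAGS = {"NNPC", "NNP", "NNC", "NN"}
SUBJECT_TAGS = NOUN_TAGS | {"PRP"}


def is_basic_transitive(tags: List[str], verb_index: int) -> bool:
    """True if a noun tag sits right before the verb and a pronoun/noun tag before that."""
    return (verb_index >= 2
            and tags[verb_index] == "VM"
            and tags[verb_index - 1] in NOUN_TAGS
            and tags[verb_index - 2] in SUBJECT_TAGS)


# (ami)(PRP) (kakatua)(NNP) (dekhi)(VM)
print(is_basic_transitive(["PRP", "NNP", "VM"], 2))  # True
```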

Bengali Verb Subcategorization Frames Acquisition (2/4)

• The syntax of the “WHAT-S” frame for a Bengali sentence has been acquired by identifying the sentential complement part of the verb (dekha).
• The target verb followed by an NP chunk that consists of another main verb and a WQ tag (question word) helps to identify “WHAT-S” frames.

(ami)(PRP) (dekhlam)(VM) (NP)((tara)(PP) (ki)(WQ) (korche)(VM))
I saw they what did
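
A small sketch of the “WHAT-S” test, assuming chunked input in the form (chunk label, list of (word, tag) pairs). The chunk label "VG" for the verb group and the name `is_what_s` are assumptions made for this illustration.

```python
from typing import List, Tuple

Chunk = Tuple[str, List[Tuple[str, str]]]  # (chunk label, [(word, tag), ...])


def is_what_s(chunks: List[Chunk], target_index: int) -> bool:
    """True if the chunk after the target verb is an NP holding both a VM and a WQ tag."""
    if target_index + 1 >= len(chunks):
        return False
    label, tokens = chunks[target_index + 1]
    tags = {tag for _, tag in tokens}
    return label == "NP" and "VM" in tags and "WQ" in tags


sentence = [
    ("NP", [("ami", "PRP")]),
    ("VG", [("dekhlam", "VM")]),   # target verb chunk (label assumed)
    ("NP", [("tara", "PP"), ("ki", "WQ"), ("korche", "VM")]),
]
print(is_what_s(sentence, 1))  # True
```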

Bengali Verb Subcategorization Frames Acquisition (3/4)

• In order to acquire the “NP-ING-OC” frame, we have created a list of possible Bengali inflections that can appear for the English “-ING” inflection.
• These inflections usually occur in sentences made up of compound verbs with conjunctive participle form (-e) and infinitive form (-te).
• If the phrase contains any of these inflections followed by the target verb, then it gives a description similar to the VerbNet frame “NP-ING-OC”.

(ami)(PRP) (NP)((tader) (haste)) (dekhechi)
I them laughing have seen
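
A deliberately crude sketch of the “NP-ING-OC” check. The full inflection list is not given in the slides, so only the two forms named above (-e and -te) are used here, and a plain suffix match on transliterated text will over-generate. `is_np_ing_oc` and its arguments are hypothetical.

```python
from typing import Sequence, Set

# Only the two forms mentioned on the slide; the authors' full list is not shown.
ING_SUFFIXES = ("te", "e")


def is_np_ing_oc(chunk_words: Sequence[str], next_word: str,
                 target_forms: Set[str]) -> bool:
    """True if the phrase ends in an '-ING'-like inflection and the target verb follows."""
    return (bool(chunk_words)
            and chunk_words[-1].endswith(ING_SUFFIXES)
            and next_word in target_forms)


# (NP)((tader) (haste)) (dekhechi)
print(is_np_ing_oc(["tader", "haste"], "dekhechi", {"dekhechi"}))  # True
```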

Bengali Verb Subcategorization Frames Acquisition (4/4)

Max jar theke hatpakha ar achhadon tairi korechilen
NP(NN) NP(PRP) CCP(PSP) NP(NN CC NN) VGNF(NN VM)
1. Frame: NP-PP
Translation: From what Max made the hand fan and cover

Ram chitkar korlo je se ar kokhono asbe na
NP(NN) VGNF(NN VM) NP(DEM PRP CC NN VM NEG)
2. Frame: Sentential (S)
Translation: Ram shouted that he will never come back

Classification of Bengali Verbs

• Key classes for Key verbs
• Synonyms from Bengali Thesaurus
• Sense based Classification of Synonyms

Key classes for Key verbs

• Task
  - Key verb classes (Cbk) for each of the 11 most frequent verbs (Key verbs)
  - Only 8 key verbs considered in the present study
  - Subcategorization Frames Acquisition for each key verb class
• Assumption
  - Synonymous verbs of a key verb with the same sense are present in a Key verb Class for the key verb
  - These verbs share the same subcategorization frames as their Key verb

Synonyms from Bengali Thesaurus

Key Verb (Ybk) and Member Verb (Xbm)

[Figure: key verb Ybk and member verbs Xbm in the synonymous class Cbs]

- where the parenthesized symbol indicates the component part (kara) [do]
- Machine readable Bengali Thesaurus is being developed manually from a printed Bengali Thesaurus
- Entries of only eight key verbs present in Thesaurus
- Cbk: Bengali Key class for key verb Ybk
- Cbs: Bengali Synonymous Class

Sense based Classification of Synonyms (1/6)

• The key verb and its Bengali synonyms are searched in the Bengali to English bilingual dictionary to extract their English equivalent synonyms
  - ECk: English equivalent class of key verb Ybk
  - ECm: English equivalent class of a synonymous member verb

ECk = {KSVS1; KSVS2; …..; KSVSq}
ECm = {MSVS1; MSVS2; …..; MSVSp}

Sense based Classification of Synonyms (2/6)

Example

Ybk: (byabahar kara) (Cbk=2)
Xbm: (prayog kara)

[Figure labels: KSVS1, KSVS2, MSVS1, MSVS2]

- Sense of English synonyms (apply, use) is different from the sense (behave) for Bengali key verb (byabahar kara)

Sense based Classification of Synonyms (3/6)

• Algorithm
  • If there exists an Xbm such that Xbm belongs to Cbs
  • for i = 1 to p, for j = 1 to q
  • if (Zsi ∩ Zdj) ≠ φ
  • then Xbm is to be assigned to class Cbk

where Zsi belongs to MSVS and Zdj belongs to KSVS
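
One possible reading of the algorithm, sketched in Python: a member verb is assigned to the key class of every key-verb sense whose English verb set shares at least one verb with some sense of the member verb. The function name `classify_member`, the list-of-lists representation, and the example MSVS for the member verb are assumptions made for illustration.

```python
from typing import List


def classify_member(member_svs: List[List[str]],
                    key_svs: List[List[str]]) -> List[int]:
    """Return indices j of key-verb senses (KSVS_j) that overlap some MSVS_i."""
    assigned = []
    for j, ksvs in enumerate(key_svs):
        # A non-empty intersection means the two senses share an English verb.
        if any(set(msvs) & set(ksvs) for msvs in member_svs):
            assigned.append(j)
    return assigned


# Key verb (byabahar kara): senses taken from the dictionary entry shown earlier.
key = [["apply", "use"], ["behave", "treat"]]
# Member verb (prayog kara): hypothetical single sense for this example.
member = [["apply", "use"]]
print(classify_member(member, key))  # [0]
```

Here the member verb joins only the first key class, which matches the observation that the (apply, use) sense differs from the (behave) sense of the key verb.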

Sense based Classification of Synonyms (4/6)

[Figure labels: KSVS1, Zd1, Zd2, MSVS2, Zs1, Zs2]

Sense based Classification of Synonyms (5/6)

[Figure labels: KSVS2, MSVS1, Zs1, Zd1]

Sense based Classification of Synonyms (6/6)

[Figure labels: (Cb1), (Cb2)]

• The process terminates when no MSVS of a member verb is left unclassified

Snapshot

Evaluation

• The set of acquired subcategorization frames or the frame lexicon can be evaluated against a gold standard corpus
  • obtained either through manual analysis of corpus data, or
  • from subcategorization frame entries in a large dictionary, or
  • from the output of the parser made for that language

Evaluation (1/2)

• As there is no available parser, we consider mostly simple sentences for the current task
• Handling the phrase level tagging error caused by chunking
• 120 correctly chunked sentences prepared manually to make gold standard test data

Evaluation (2/2)

• Recall (r) - The percentage of subcategorization frame types in the gold standard that the system proposes
• Precision (p) - The percentage of subcategorization frame types proposed by the system that are correct according to the gold standard
• F-Measure - 2*p*r/(p+r)
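
The three measures can be computed over sets of frame types as in the sketch below. The frame sets in the example are hypothetical and are not the reported results; `evaluate` is an illustrative name.

```python
from typing import Set, Tuple


def evaluate(proposed: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) over frame types."""
    correct = len(proposed & gold)
    p = correct / len(proposed) if proposed else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


# Hypothetical frame sets, for illustration only.
proposed = {"Basic Transitive", "WHAT-S", "NP-PP"}
gold = {"Basic Transitive", "WHAT-S", "NP-ING-OC", "S"}
p, r, f = evaluate(proposed, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

The values are returned as fractions; multiply by 100 to report them as percentages.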

Evaluation result

Results (1/4)

Results (2/4)

Results (3/4)

Results (4/4)

New Findings

• The sense-wise classified verb synsets can be used for the verb entries in Bengali WordNet
• If a synonym has already been attempted in the classification process as a key verb, the synonym will not be considered again
• New frames possible in Bengali

Example (Basic Transitive frame for prostut kara)

Ram khabar prostut korche
(Ram is preparing the food)

Future Task (1/2)

• Dependency on the bilingual dictionary
• More analysis on Primary and/or Secondary frames
• Error analysis to recover from the loss in precision
• Machine learning approach
• Thematic roles for Bengali
• Semantic Role Labeling

Future Task (2/2)

• Full-fledged parser for Bengali
• Alignment issues in Machine Translation from English to Bengali
• Argument selection for Question-Answering systems

Thank you

Questions?

