
Discourse annotation for arabic

Date post: 11-Jun-2015
Page 1: Discourse annotation for arabic

Discourse Annotation for Arabic

Arwa Al-Zammam, Ruba Al-Homaid, Eman Al-Badr
Supervisor: Amal Al-Saif

Natural Language Processing - CS465
11-6-1434 H

Page 2: Discourse annotation for arabic

Outline

• Leeds Arabic Discourse Treebank
• Discourse Annotation
• Arabic language characteristics
• Discourse relations
• Characteristics of Modern Standard Arabic
• Arabic Discourse Connectives
• Agreement Studies
• Discourse Connective Recognition
• Result of Discourse Connective Recognition
• Discourse Relation Recognition
• Result of Discourse Relation Recognition
• Conclusion

Page 3: Discourse annotation for arabic

Leeds Arabic Discourse Treebank

• The Leeds Arabic Discourse Treebank (LADTB v1) is the first discourse treebank for Modern Standard Arabic (MSA).

• LADTB follows annotation principles similar to those of the PDTB project for English and of the Turkish, Hindi and Chinese discourse treebanks.

• LADTB was built to be a gold standard for automatic discourse processing studies.

Page 4: Discourse annotation for arabic

Discourse Annotation

• Discourse relations, such as CAUSAL or CONTRAST relations between textual units, play an important role in producing a coherent discourse.

Example: لم يسافر أحمد إلى مكة. لكنه سافر إلى الرياض لزيارة عمه المريض.
(Ahmad did not travel to Makkah. But he travelled to Riyadh to visit his sick uncle.)

Relations marked in the example: Contrast (تضاد) and Causal (سببية).

• Discourse connectives are defined as lexical expressions that relate two text segments (arguments) expressing abstract entities such as events, beliefs, facts or propositions, for example لكن/lkn/but and او/Aw/or.

Page 5: Discourse annotation for arabic

Discourse Annotation

Applications using discourse annotation:
• Automatic summarization
• Question answering
• Sentiment analysis
• Readability assessment

Page 6: Discourse annotation for arabic

Arabic Language Characteristics

• Arabic discourse connectives are ambiguous.
• Explicit discourse connectives.
• The variety of Arabic discourse connectives.
• The annotation principles designed to annotate discourse connectives in English in the PDTB2 can be applied to reliably annotate discourse connectives in Arabic newswire.
• Machine learning models can be used to identify discourse connectives and relations in Arabic newswire.
• Supervised machine learning models can identify Arabic discourse connectives and their relations with high reliability.

Page 7: Discourse annotation for arabic

Discourse Relations

• Explicit discourse relations:

[He took my photo,]Arg1 [while]DC [I was having dinner]Arg2

• Implicit discourse relations:

[He has to stay in bed.]Arg1 [He has the flu.]Arg2
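
The difference between the two kinds of relation can be made concrete with a small record type. This is only an illustrative sketch: the class name `DiscourseRelation` and its fields are assumptions made for this example and are not the LADTB or PDTB file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical record for one annotated relation (not the LADTB schema)."""
    arg1: str
    arg2: str
    connective: Optional[str] = None  # None: no connective is written in the text

    @property
    def is_explicit(self) -> bool:
        # A relation is explicit when a connective appears in the text.
        return self.connective is not None


# The two slide examples, encoded with the hypothetical record above.
explicit = DiscourseRelation(
    arg1="He took my photo,",
    arg2="I was having dinner",
    connective="while",
)
implicit = DiscourseRelation(
    arg1="He has to stay in bed.",
    arg2="He has the flu.",
)
```

In the implicit case the reader infers the relation between the two sentences without any connective, which is why the `connective` field is left empty.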

Page 8: Discourse annotation for arabic

Characteristics of Modern Standard Arabic

Page 9: Discourse annotation for arabic

Characteristics of Modern Standard Arabic

Al-maSdar noun:

Page 10: Discourse annotation for arabic

Characteristics of Modern Standard Arabic

Al-maSdar noun:

English | Al-maSdar noun | Morph. pattern | Root
swimming | سباحة | فعالة | Sbh/سبح
reflection | انعكاس | انفعال | Eks/عكس
experiment | تجربة | تفعلة | Jrb/جرب
war | حرب | فعل | Hrb/حرب
defence | دفاع | فعال | Dfe/دفع

Page 11: Discourse annotation for arabic

Characteristics of Modern Standard Arabic

• Word order in Arabic (verb-subject-object).

• Punctuation in Arabic.

Page 12: Discourse annotation for arabic

Arabic Discourse Connectives

• Conjunctions (لكن/lkn/but, او/Aw/or, و/w/and).
• Adverbials (طالما ... فـ / TAlmA.. f.. / as-long-as).
• Prepositional phrases: prepositions can also link discourse segments when one or both arguments are al-maSdar nouns.
• Some nouns, such as نتيجة/ntyjp/result, خشية/ks.yp/fear and بغية/bqyp/desire, are used as discourse connectives in Arabic.

Discourse connectives in Arabic might occur:
• Individually, such as لكن/lkn/however.
• In conjunction with other connectives using the coordinating conjunction و/w/and, such as لكن و قبل / lkn w qbl / however and before.
• As multiple connectives without a conjunction, such as الا بعد / AlA bEd / except after.
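
The three occurrence patterns can be told apart mechanically once a connective expression has been identified. The sketch below is a toy illustration under strong assumptions: the function name `occurrence_pattern` is hypothetical, the input is an already-recognized connective expression given as transliterated tokens, and the conjunction و/w is assumed to be split off as its own token.

```python
def occurrence_pattern(tokens):
    """Classify a connective expression by how its parts combine.

    tokens: transliterated tokens of one connective expression (assumed input).
    """
    if not tokens:
        raise ValueError("expected at least one token")
    if len(tokens) == 1:
        return "individual"
    # A 'w' strictly between two other tokens coordinates two connectives.
    if "w" in tokens[1:-1]:
        return "coordinated"
    return "multiple"


patterns = [
    occurrence_pattern(["lkn"]),              # however
    occurrence_pattern(["lkn", "w", "qbl"]),  # however and before
    occurrence_pattern(["AlA", "bEd"]),       # except after
]
```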

Page 13: Discourse annotation for arabic

Agreement Studies

• Task 1: measures whether annotators agree on the binary decision of whether an item constitutes a discourse connective in context.

• Task 2: measures whether annotators agree on which discourse relation an identified connective expresses.

Agreement was measured for the distinction between discourse and non-discourse usage, for relation assignment and for argument assignment:

agr(ann1 || ann2) = |ann1 matching ann2| / |ann1|
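
The agreement measure above can be computed directly from two sets of annotations. In this minimal sketch, "matching" is assumed to mean exact equality of annotation items, and the item encoding (token position, label) and the sample data are invented for illustration.

```python
def agr(ann1, ann2):
    """Directed agreement: share of ann1's items that ann2 also produced."""
    ann1, ann2 = set(ann1), set(ann2)
    if not ann1:
        raise ValueError("ann1 has no annotations")
    return len(ann1 & ann2) / len(ann1)


# Hypothetical annotations: (token position, label) pairs.
ann1 = {(3, "DC"), (10, "DC"), (17, "DC"), (25, "DC")}
ann2 = {(3, "DC"), (10, "DC"), (25, "DC"), (31, "DC"), (40, "DC")}

forward = agr(ann1, ann2)   # 3 of ann1's 4 items match
backward = agr(ann2, ann1)  # 3 of ann2's 5 items match
```

Because the denominator is |ann1|, the measure is directional: swapping the annotators generally gives a different value, so both directions are worth reporting.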

Page 14: Discourse annotation for arabic

Discourse Connective Recognition

• Surface features (SConn).
• Lexical features of surrounding words (Lex).
• Part-of-speech features (POS).
• Syntactic category of related phrases (Syn).
• Al-Masdar feature.

Discourse usage of و/w/and:
الأطفال يمكن [ان يصابوا بالإرهاق]Arg1 [و]DC [ان يشعروا بالنعاس]Arg2 خلال الدراسة اذا لم يناموا جيدا.
(Children may become exhausted and feel sleepy during school if they do not sleep well.)

Non-discourse usage of و/w/and:
المدرسة كبيرة وجميلة / ālmdrsh kbyrh w ǧmylh / the school is very large and beautiful.
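
The feature groups listed above can be pictured as a dictionary built for each connective candidate. The sketch below is an assumption-laden illustration: the function name `candidate_features`, the feature names, the reading of "surface features" as position in the sentence, and the POS tag labels are all hypothetical and are not taken from the slides. Syntactic and al-maSdar features are left out because they need a parsed treebank.

```python
def candidate_features(tokens, pos_tags, i):
    """Build a feature dict for the connective candidate at index i."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")

    def at(seq, j):
        # Pad beyond the sentence boundaries.
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    return {
        "conn": tokens[i],                  # the candidate string itself
        "sconn_position": i,                # surface: position in the sentence
        "sconn_sentence_initial": i == 0,   # surface: starts the sentence?
        "lex_prev": at(tokens, i - 1),      # lexical context
        "lex_next": at(tokens, i + 1),
        "pos_prev": at(pos_tags, i - 1),    # part-of-speech context
        "pos_next": at(pos_tags, i + 1),
    }


# Non-discourse use of w/and, with made-up POS tags.
tokens = ["almdrsh", "kbyrh", "w", "gmylh"]
pos_tags = ["NOUN", "ADJ", "CONJ", "ADJ"]
features = candidate_features(tokens, pos_tags, 2)
```

Here the candidate sits between two adjectives, which is the kind of context that separates a plain coordination from a discourse connective linking two clauses.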

Page 15: Discourse annotation for arabic

Result of Discourse Connective Recognition

Ref | Features | Accuracy (%) | Kappa (K)
- | Baseline (not conn) | 68.9 | 0
M1 | Conn only | 75.7 | 0.48

Tokenization by white space + auto tagger
M2 | Conn + SConn + Lex | 85.6 | 0.62
M3 | Conn + SConn + Lex + POS | 87.6 | 0.69
M4 | Conn + SConn + Lex + POS + Masdar | 88.5 | 0.70

ATB-based features
M5 | Conn + SConn + Lex | 86.2 | 0.65
M6 | Conn + SConn + Lex + Syn/POS | 91.2 | 0.79
M7 | Conn + SConn + Lex + Syn/POS + Masdar | 92.4 | 0.82
M8 | Conn + SConn + Syn | 91.2 | 0.79
M9 | SConn + Lex + Syn + Masdar | 91.2 | 0.79
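
The baseline row pairs a fairly high accuracy with a kappa of 0. Assuming K is Cohen's kappa between predicted and gold labels, this follows from the definition, since a classifier that always outputs the majority label agrees with the gold standard only as often as chance predicts. The sketch below uses invented toy labels, not LADTB data.

```python
from collections import Counter


def accuracy_and_kappa(gold, pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    n = len(gold)
    # Observed agreement is the plain accuracy.
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    # Expected agreement from the two label distributions.
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    p_e = sum(gold_counts[label] * pred_counts[label] for label in gold_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return p_o, (p_o - p_e) / (1 - p_e)


# Toy data: 7 non-connective and 3 connective candidates.
gold = ["not_conn"] * 7 + ["conn"] * 3
majority = ["not_conn"] * 10  # always predict the majority label

base_acc, base_kappa = accuracy_and_kappa(gold, majority)
perfect_acc, perfect_kappa = accuracy_and_kappa(gold, gold)
```

On this toy data the majority baseline reaches 70% accuracy with kappa 0, while a perfect prediction reaches kappa 1, which is why kappa is reported next to accuracy.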

Page 16: Discourse annotation for arabic

Discourse Relation Recognition

• Words and POS of arguments.
• Masdar.
• Tense and negation.
• Length, distance and order features.
• Production rules.

Page 17: Discourse annotation for arabic

Result of Discourse Relation Recognition

Ref | Features (number of features) | Accuracy (%) | Kappa (K)

All connectives (6039)
- | Baseline (CONJUNCTION) | 52.5 | 0
M1 | Conn only (1) | 77.2 | 0.60
M2 | Conn + Conn_f + Arg_f (37) | 78.8 | 0.66
M3 | Conn + Conn_f + Arg_f + Production rules (1237) | 78.3 | 0.65

Excluding wa at BOP (3813)
- | Baseline (CONJUNCTION) | 35 | 0
M1 | Conn only (1) | 74.3 | 0.65
M2 | Conn + Conn_f + Arg_f (37) | 77 | 0.69
M3 | Conn + Conn_f + Arg_f + Production rules (1237) | 76.7 | 0.69

Page 18: Discourse annotation for arabic

Result of Discourse Relation Recognition

Ref | Features (number of features) | Accuracy (%) | Kappa (K)

All connectives (6039)
- | Baseline (EXPANSION) | 62.4 | 0
M1 | Conn only (1) | 88.7 | 0.78
M2 | Conn + Conn_f + Arg_f (37) | 88.7 | 0.78

Excluding wa at BOP (3813)
- | Baseline (EXPANSION) | 41.8 | 0
M1 | Conn only (1) | 82.7 | 0.74
M2 | Conn + Conn_f + Arg_f (37) | 83.5 | 0.75

Page 19: Discourse annotation for arabic

Conclusion:

We discussed Arabic discourse annotation, covering discourse connectives and discourse relations. We also presented the characteristics of the Arabic language that relate to this subject, along with the recognition results.

