Hindi Discourse Relation Bank
Sudheer Kolachina, Dipti Misra Sharma
LTRC, IIIT-Hyderabad
joint work with
Rashmi Prasad (HIA, UWM)
Aravind Joshi (IRCS, UPENN)
HDRB: Introduction
● Objective
  ● Create a large-scale corpus of Hindi texts annotated with discourse relations
● Corpus
  ● 200K drawn from the 400K Hindi newspaper corpus on which morpho-syntactic annotation (POS, morph, chunk, syntactic dependency) is being carried out independently (Begum et al., 2008)
  ● Parallel development of HDRB and the multi-level, multi-representational Hindi treebank (Xia et al., 2009; Bhatt et al., 2009; Bhatia et al., 2010), leading to an enriched linguistic resource for Hindi
HDRB: Introduction
● PDTB annotation scheme
● Lexically grounded approach to discourse relations
  ● Explicit connectives
  ● Implicit connectives
  ● AltLex expressions
  ● EntRel
  ● NoRel
HDRB Experience
● Departures from PDTB annotation scheme
● Annotation task design
● Issues in Hindi discourse
● Statistics from data annotated so far
Departures from PDTB
● Semantically driven assignment of argument labels
● Elimination of argument-specific sense labels
● Uniform treatment of pragmatic relations of all types
● Addition of Goal and Similarity senses
Semantic Argument labels
● Cause after effect. Hence, Arg1-Arg2
प्रतियोगिता के बाद सोनल ने बताया कि विजेता के रूप में जब उसका नाम पुकारा गया तो उसे कुछ देर तक खुद पर विश्वास ही नहीं हुआ क्योंकि वह मान कर चल रही थी कि यह प्रतियोगिता फ़िक्स है. (PDTB: Cause.Reason; HDRB: Cause)
‘After the competition, Sonal said that [when her name was announced as the winner, she could not believe herself for some time], because {she was thinking that the competition was fixed}.’
● Cause before effect (Arg2-Arg1)
फैशन डिजाईनरों का कहना है कि सबसे ज्यादा नक़ल या चोरी मोनोपोली डिजाईन की होती है. डिजाईनर इन बातों को बखूबी जानते हैं इसलिए कई बार ध्यान नहीं देते हैं. (PDTB: Cause.Result; HDRB: Cause)
‘Fashion designers say that the most prevalent thefts or copies are of monopoly designs. {Designers know this fact very well} so [it does not matter to them many times].’
Pragmatic discourse relations
● Discourse relations are pragmatic when they have to be inferred from the propositional content of the arguments
● Uniform three-way classification of pragmatic senses of all types in HDRB
  ● Epistemic, Speech act (Sweetser, 1990)
  ● Propositional
Pragmatic discourse relations
● Pragmatic concession, Subtype - Propositional
इनमें से एक ड्राईवर ने हथियारों की जानकारी होने पर मामले से हाथ खींच लिया था लेकिन अदालत ने कहा कि अगर उसने उस वक्त पुलिस को सूचना दे दी होती तो विस्फोटों का षड्यंत्र भी विफल हो जाता
“[One of the drivers amongst them, after learning about the arms, withdrew from the action.] But {the court commented that had he informed the police about it at the time, the plot to cause blasts would not have been successful}.”
The Goal sense
● New sense type Goal under the “Contingency” class
● Applies where the situation described in one argument is the goal of the situation described in the other argument (which enables the achievement of the goal)
● The argument describing the goal is marked as Arg2, and the other argument is marked Arg1
● In PDTB, goal is subsumed by the result subtype
● Distinguishing between cause and goal can have important consequences, such as in the way questions are formulated over the relation
The Goal sense
सुभाष का आरोप है कि राजद अध्यक्ष राणा को इसलिए टिकेट देना चाहते हैं, जिससे चारे घोटाले में वे गवाह नहीं बन सके.
“[Subhash's allegation is that the RJD President wants to give an (election) ticket to Rana] so that {he does not become a witness in the fodder scam}.”
Annotation task design
● Annotation of discourse relations over continuous text
  ● Connective-wise annotation in PDTB
● All types of discourse relations annotated in a single pass
● Motivation
  ● Avoid multiple annotation passes over the same discourse
  ● Avoid incompatible interpretations of the text from different passes
● Higher annotation load?
Annotation task design
● Implicit connectives across paragraph boundaries also annotated
● Subordinators and particles excluded in Phase I
● Attribution to be annotated in next phase
Subordinators
Post-positions, verbal participles, and suffixes that introduce non-finite clauses
बा के जाने के बाद उन्होंने उस लड़के को अपने पास बुलाया. (Succession)
‘After {Baa left} [he called the boy to him].’
... खेलते हुए यह भूल जाता है की यदि उसका मित्र भी उसे अपने खिलौने को हाथ लगाने नहीं देता तो उसको कितना बुरा लगता. (Synchronous)
‘...while [playing] {he forgets that if his friend too didn’t let him touch his toy, then he would feel very bad too}.’
Particles
Particles such as भी, ही can function as discourse connectives in Hindi
लोग इसे दोनों देशों के बीच बढ़ते रिश्ते के परिणाम के रूप में देख रहे हैं. कश्मीरी लोग इससे एक राजनीतिक सबक भी ले रहे हैं. (Conjunction)
‘[People see this as a consequence of the improving relation between the two countries]. {The Kashmiris are} also {learning a political lesson from this}.’
Only instances where they indicate the inclusion of verbs are taken as discourse connectives
Restricted Backoff
● In PDTB, backoff to higher levels in the sense hierarchy allowed
● For example, in case of ambiguity between “Contrast” and “Concession”, the annotator backs off to “Comparison” at the class level
● In HDRB, backoff is restricted to the type level
● Restricted backoff due to semantic argument labels
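A minimal sketch of the backoff restriction above, assuming senses are written as dot-separated paths (class.type.subtype). The function name `back_off` and the level constants are hypothetical and not part of any HDRB or PDTB tool; the subtype names in the usage note come from the PDTB sense hierarchy.

```python
from typing import Optional, Sequence

# Depth of each level in a dot-separated sense path (hypothetical constants).
CLASS_LEVEL = 1  # e.g. "Comparison"
TYPE_LEVEL = 2   # e.g. "Comparison.Contrast"


def back_off(senses: Sequence[str], min_level: int) -> Optional[str]:
    """Return the most specific sense shared by all candidate senses.

    Returns None when the shared part of the hierarchy is shallower than
    min_level, i.e. when backing off that far is not permitted.
    """
    paths = [sense.split(".") for sense in senses]
    shared = []
    for parts in zip(*paths):
        if len(set(parts)) != 1:
            break
        shared.append(parts[0])
    if len(shared) < min_level:
        return None
    return ".".join(shared)


ambiguous = ["Comparison.Contrast", "Comparison.Concession"]
pdtb_style = back_off(ambiguous, CLASS_LEVEL)  # class-level backoff allowed
hdrb_style = back_off(ambiguous, TYPE_LEVEL)   # not allowed below type level
```

With the class-level floor the ambiguous pair backs off to "Comparison"; with the type-level floor no backoff label is available and the annotator has to choose a type. Two subtypes of the same type, such as "Comparison.Contrast.Juxtaposition" and "Comparison.Contrast.Opposition", still back off to "Comparison.Contrast" under either floor.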
Issues
● Implicit discourse relations between non-adjacent arguments
● AltLex vs Explicit connective
Implicit connectives between non-adjacent arguments
वरिष्ठ माकपा नेता नीलोत्पल बसु के मुताबिक आम बजट में न्यूनतम साझा कार्यक्रम की झलक साफ नजर आती है। वामदल लगातार सरकार पर शिक्षा, स्वास्थ्य व रोजगार क्षेत्रों में बजट प्रावधान बढ़ाने के लिए दबाव बनाए हुए थे। बजट में इन सभी क्षेत्रों पर ध्यान दिया गया है। ... Implicit = दूसरी ओर आरएसपी के अबनी रॉय ने कहा कि बजट में छोटे किसानों के लिए कुछ नहीं किया गया है। लघु उद्योग के लिए नई योजना नहीं है और कस्टम ड्यूटी में कटौती का सीधा असर घरेलू उद्योग पर पड़ेगा। ...
‘[According to senior CPI(M) leader Mr. Nilotpal Basu, the general budget clearly reflects aspects of the Common Minimum Programme.] The left parties had been continuously exerting pressure on the government to increase allocation to education, health and employment sectors. And the budget has focused on all these areas ... Implicit = On the other hand, {RSP’s Mr. Abani Roy said that the budget has nothing to offer to small-scale farmers.} There is no planning for small-scale industry and the drop in custom duty will have a direct adverse impact on cottage industries ...’
AltLex vs Explicit connective
● Distinction between AltLex expressions and Explicit connectives
  ● Connectives are frozen
● Expressions in Hindi
  ● Deictic element + Relation, e.g., इसके अलावा (apart from this), उसके अलावा (apart from that), जिसके अलावा (apart from which)
  ● Forms like इसी के साथ involving modification of the deictic element - (इस + ही) के साथ (in addition to (all) this)
● Explore classification of AltLex along the lines of Prasad et al. (2010)
HDRB Annotation Statistics
Relation type    # annotated tokens    % of total
Explicit                        835          32.6
Implicit                       1286          50
AltLex                           81           3.2
EntRel                          341          13.3
NoRel                            21           0.9
Total                          2564
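The percentage column can be recomputed from the token counts. The sketch below is illustrative only, and the helper name `relation_shares` is hypothetical. Dividing directly gives about 50.2 for Implicit and 0.8 for NoRel, where the slide lists 50 and 0.9, so the slide figures appear to have been rounded or adjusted by hand.

```python
# Token counts as listed on the statistics slide.
counts = {
    "Explicit": 835,
    "Implicit": 1286,
    "AltLex": 81,
    "EntRel": 341,
    "NoRel": 21,
}


def relation_shares(token_counts):
    """Return each relation type's share of all tokens, in percent (1 decimal)."""
    total = sum(token_counts.values())
    return {name: round(100 * n / total, 1) for name, n in token_counts.items()}


total_tokens = sum(counts.values())
shares = relation_shares(counts)
for name, share in shares.items():
    print(f"{name:<10}{counts[name]:>6}{share:>7}")
```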
HDRB: Publications
● Oza et al. (2009a), The Hindi Discourse Relation Bank (LAW 2009)
● Oza et al. (2009b), Experiments with Annotating Discourse Relations in the Hindi Discourse Relation Bank (ICON 2009)
● Kolachina et al. (2012), Evaluation of discourse relation annotation in the Hindi Discourse Relation Bank (to appear in LREC 2012)
Thanks !
Explicit connective vs EntRel
● समुद्री तूफान के बाद में बेस में हुए नुकसान का व्यक्तिगत रूप से जायजा लेने के बाद उन्होंने कहा कि एक वर्ष में हम बेस को पूरी तरह ऑपरेशनल बना देंगे। उन्होंने कहा कि समस्या जहाजों और विमानों से यहां से निर्माण सामग्री लाने की है। इसमें रनवे बिछाने के लिए कंक्रीट भी शामिल है। गौरतलब है कि दूसरे विश्व युद्ध के समय ब्रिटेन के साथ-साथ अमेरिका की सेनाएं दक्षिण एशिया में जापानी सेनाओं के खिलाफ अंडमान-निकोबार का इस्तेमाल करती थीं।
‘After a personal assessment of the damage incurred during the sea storm, he said that the base will be made fully operational in one year. He said that the main issue is of bringing construction material here on planes and ships. This includes concrete to lay the runway. It is noteworthy that during the second world war, British and American troops used the Andaman-Nicobar as a base to counter the advance of Japanese troops in South Asia.’
● Explicit connective
Subordinating Conjunctions
Lexical items conjoining finite adverbial clauses to their matrix clause
Typically occur clause-initially
Both single (e.g., क्योंकि (because)) and paired forms (e.g., अगर...तो (if...then))
[आज दीया जलाया गया है] क्योंकि {आज मेरी वर्षगाँठ है.} (Cause)
‘[Today the lamp has been lit] because {it is my birthday}.’
अगर [कोई आपसे कहे की नमक छोड़ दो] तो {आप भी नहीं छोड़ेंगे} (Conditional)
‘If [one were to ask you to quit taking salt] then {even you would not quit}.’
Coordinating Conjunctions
Lexical items conjoining clauses or phrases of the same syntactic status
Occur clause-initially, e.g., और (and), लेकिन (but)
Single as well as paired forms, e.g., न केवल...बल्कि (not only...but also)
[संघ के संगठन अनेक हैं] किन्तु {विचारधारा एक ही है.} (Concession)
‘[There are many groups in the Sangh] but {there is just one ideology.}’
Adverbials
Adverbial and prepositional phrases claimed to function as anaphoric discourse connectives (Webber et al., 2003)
Some examples of these are सो (so), फिर (then), नहीं तो (otherwise), वास्तव में (in fact), तभी (just then), इसके अलावा (in addition to this), etc.
[दानवी लहरों के कारण अंडमान के पश्चिमी तट पर तटीय वनस्पति पूरी तरह बर्बाद हो गयी.] इसके अलावा {मूंगे के चट्टानों को भी नुक्सान हुआ है.} (Expansion)
‘[The coastal vegetation on the west coast of the Andaman has been completely destroyed due to wild waves]. In addition, {the coral reefs have also been damaged}.’
Pied-piped Sentential Relativizers
Pied-piped relative phrases that conjoin a relative clause with the predication of its matrix clause (rather than some NP)
Examples are जिससे (so that), जिसके कारण (because of which)
[सारा काम छोड़कर वह उस चिड़िया को उठाकर दवाघर की ओर भागा] जिससे {उसका सही इलाज किया जा सके.} (Cause)
‘[Dropping all his work, he picked up the bird and ran towards the dispensary] so that {it could be given proper treatment}.’
The relative pronoun जिससे modifies the event expressed in the matrix clause
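The four syntactic classes of explicit connectives described on these slides could be kept in a small lexicon and used to flag candidate connectives in running text. This is only a sketch under stated assumptions: the names `LEXICON` and `find_candidates` are hypothetical, tokenization is plain whitespace splitting, and a match is a candidate only, since deciding whether a token really functions as a discourse connective remains the annotator's judgment.

```python
from typing import List, Sequence, Tuple

# (syntactic class, connective parts, English gloss), taken from the slide examples.
# Paired forms have two parts that must occur in order.
LEXICON = [
    ("subordinating conjunction", ("क्योंकि",), "because"),
    ("subordinating conjunction", ("अगर", "तो"), "if...then"),
    ("coordinating conjunction", ("और",), "and"),
    ("coordinating conjunction", ("लेकिन",), "but"),
    ("coordinating conjunction", ("न केवल", "बल्कि"), "not only...but also"),
    ("adverbial", ("नहीं तो",), "otherwise"),
    ("adverbial", ("इसके अलावा",), "in addition to this"),
    ("sentential relativizer", ("जिससे",), "so that"),
    ("sentential relativizer", ("जिसके कारण",), "because of which"),
]


def _find(tokens: Sequence[str], words: Sequence[str], start: int) -> int:
    """Index of the first occurrence of `words` in `tokens` at or after `start`, else -1."""
    n = len(words)
    for i in range(start, len(tokens) - n + 1):
        if list(tokens[i:i + n]) == list(words):
            return i
    return -1


def find_candidates(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (class, connective, gloss) for every lexicon entry found in the sentence."""
    tokens = sentence.split()
    hits = []
    for cls, parts, gloss in LEXICON:
        pos = 0
        for part in parts:
            words = part.split()
            idx = _find(tokens, words, pos)
            if idx < 0:
                break
            pos = idx + len(words)
        else:
            # Every part was found, in order.
            hits.append((cls, "...".join(parts), gloss))
    return hits


conditional_hits = find_candidates("अगर कोई आपसे कहे की नमक छोड़ दो तो आप भी नहीं छोड़ेंगे")
```

Keeping the parts of a paired form as a tuple lets the same matcher handle single and paired connectives without separate code paths.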
Arguments of Discourse Relations
• In PDTB, Arg2 is the argument “syntactically” associated with the connective, and Arg1 is the “other” argument.
• In HDRB, argument naming is based on the sense of the relation. Each relation definition specifies its own convention for argument naming.
• E.g., In the “cause” relation, one argument is the cause and the other is the effect. HDRB convention: Arg1=effect; Arg2=cause
• Advantages of semantic naming scheme: More meaningful, and simplifies the sense classification hierarchy.
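A minimal sketch of the naming difference for the “cause” relation, assuming a PDTB-style record that carries the Cause.Reason / Cause.Result sense labels. The class and function names are hypothetical and do not come from any HDRB tool. Under PDTB conventions, Arg2 is the cause for Cause.Reason and the effect for Cause.Result; the conversion fixes Arg1=effect and Arg2=cause, after which the single label "Cause" suffices.

```python
from dataclasses import dataclass


@dataclass
class PdtbRelation:
    arg1: str   # the "other" argument
    arg2: str   # the argument syntactically associated with the connective
    sense: str  # "Cause.Reason" or "Cause.Result"


@dataclass
class HdrbRelation:
    arg1: str   # effect
    arg2: str   # cause
    sense: str  # "Cause"


def to_hdrb_cause(rel: PdtbRelation) -> HdrbRelation:
    """Rename arguments semantically: Arg1 = effect, Arg2 = cause."""
    if rel.sense == "Cause.Reason":
        # The connective's clause states the cause, so the labels already match.
        return HdrbRelation(arg1=rel.arg1, arg2=rel.arg2, sense="Cause")
    if rel.sense == "Cause.Result":
        # The connective's clause states the effect, so the labels are swapped.
        return HdrbRelation(arg1=rel.arg2, arg2=rel.arg1, sense="Cause")
    raise ValueError(f"not a cause relation: {rel.sense}")


result_example = to_hdrb_cause(PdtbRelation(
    arg1="Designers know this fact very well",
    arg2="it does not matter to them many times",
    sense="Cause.Result",
))
```

Because the argument labels now carry the direction of causation, the Reason/Result distinction no longer needs its own sense labels, which is the simplification of the hierarchy mentioned above.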
December 2009, ICON HDRB, Umangi et al. 28
Arguments of Discourse Relations
● Cause after effect. Hence, Arg1-Arg2
प्रतियोगिता के बाद सोनल ने बताया कि [विजेता के रूप में जब उसका नाम पुकारा गया तो उसे कुछ देर तक खुद पर विश्वास ही नहीं हुआ] क्योंकि {वह मान कर चल रही थी कि यह प्रतियोगिता फ़िक्स है.} (Cause)
‘After the competition, Sonal said that [when her name was announced as the winner, she could not believe herself for some time], because {she was thinking that the competition was fixed}.’
● Cause before effect (Arg2-Arg1)
फैशन डिजाईनरों का कहना है कि सबसे ज्यादा नक़ल या चोरी मोनोपोली डिजाईन की होती है. {डिजाईनर इन बातों को बखूबी जानते हैं} इसलिए [कई बार ध्यान नहीं देते हैं.] (Cause)
‘Fashion designers say that the most prevalent thefts or copies are of monopoly designs. {Designers know this fact very well} so [it does not matter to them many times].’
Implicit Discourse Relations
For adjacent sentences not related by an explicit connective, four possibilities are considered in order:
(1) Infer a discourse relation and insert an “implicit” connective between them
{इस गेम के सारे खिलाड़ी सचिन तेंदुलकर से भी महान हैं.} IMPLICIT = इसलिए [इनको क्लीन बोल्ड करना किसी के बस की बात नहीं.] (Causal)
‘{All the players in this game are greater than even Sachin Tendulkar} so [it is not possible for anyone to get them clean bowled.]’
(2) If relation is inferred but insertion of connective leads to redundancy, find and annotate an “alternate lexicalization” (AltLex) of the relation
{बंगलादेश की कानून व्यवस्था की हालत में सुधार हुआ है} AltLex [इसी वजह से भारत ने सम्मेलन में शामिल होने का फ़ैसला लिया है]
‘{Bangladesh’s judiciary has seen an improvement}. That is why [India has decided to participate in the conference.]’
Other Relations
(3) If no discourse relation is inferred but coherence results from an entity-based relation, annotate relation as “EntRel”.
[फ़िल्म महोत्सव में प्रकाश झा की नयी फ़िल्म अपहरण का भी प्रीमियर होना है] EntRel {गंगाजल के बाद झा की यह किसी अलग विषय पर बनी दूसरी फ़िल्म है.}
‘[Prakash Jha’s latest film Apaharan will be premiered at the film festival.] {This is Jha’s second film on a different subject after Gangajal.}’
(4) If no discourse relation or EntRel is perceived, annotate relation as “NoRel”
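The four possibilities above form an ordered cascade, which can be sketched as a small decision function. This is an illustration only: the three boolean inputs stand for annotator judgments, which cannot be computed automatically, and the names `RelationType` and `classify_adjacent_pair` are hypothetical.

```python
from enum import Enum


class RelationType(Enum):
    IMPLICIT = "Implicit"
    ALTLEX = "AltLex"
    ENTREL = "EntRel"
    NOREL = "NoRel"


def classify_adjacent_pair(
    relation_inferred: bool,
    connective_redundant: bool,
    entity_based_coherence: bool,
) -> RelationType:
    """Apply the four options in order for two adjacent sentences
    that are not linked by an explicit connective."""
    if relation_inferred:
        if not connective_redundant:
            return RelationType.IMPLICIT  # (1) insert an implicit connective
        return RelationType.ALTLEX        # (2) annotate the alternate lexicalization
    if entity_based_coherence:
        return RelationType.ENTREL        # (3) entity-based relation only
    return RelationType.NOREL             # (4) nothing perceived


film_example = classify_adjacent_pair(
    relation_inferred=False,
    connective_redundant=False,
    entity_based_coherence=True,
)
```

The ordering matters: entity-based coherence is only considered once no discourse relation has been inferred, so a pair that shares an entity and also stands in a causal relation is still annotated as Implicit or AltLex.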