  • Hindi Discourse Relation Bank

    Sudheer Kolachina, Dipti Misra Sharma

    LTRC, IIIT-Hyderabad

    joint work with

    Rashmi Prasad (HIA, UWM)

    Aravind Joshi (IRCS, UPENN)

  • HDRB: Introduction

    ● Objective
        ● Create a large-scale corpus of Hindi texts annotated with discourse relations
    ● Corpus
        ● 200K words drawn from the 400K-word Hindi newspaper corpus on which morpho-syntactic annotation (POS, morph, chunk, syntactic dependency) is being carried out independently (Begum et al., 2008)
        ● Parallel development of HDRB and the multi-level, multi-representational Hindi treebank (Xia et al., 2009; Bhatt et al., 2009; Bhatia et al., 2010), leading to an enriched linguistic resource for Hindi

  • HDRB: Introduction

    ● PDTB annotation scheme
        ● Lexically grounded approach to discourse relations
        ● Explicit connectives
        ● Implicit connectives
        ● AltLex expressions
        ● EntRel
        ● NoRel

  • HDRB Experience

    ● Departures from PDTB annotation scheme
    ● Annotation task design
    ● Issues in Hindi discourse
    ● Statistics from data annotated so far

  • Departures from PDTB

    ● Semantically driven assignment of argument labels
    ● Elimination of argument-specific sense labels
    ● Uniform treatment of pragmatic relations of all types
    ● Addition of Goal and Similarity senses

  • Semantic Argument labels

    ● Cause after effect, hence Arg1-Arg2 order
      प्रतियोगिता के बाद सोनल ने बताया कि विजेता के रूप में जब उसका नाम पुकारा गया तो उसे कुछ देर तक खुद पर विश्वास ही नहीं हुआ क्योंकि वह मान कर चल रही थी कि यह प्रतियोगिता फ़िक्स है. (PDTB: Cause.Reason; HDRB: Cause)
      ‘After the competition, Sonal said that [when her name was announced as the winner, she could not believe herself for some time], because {she was thinking that the competition was fixed}.’

    ● Cause before effect, hence Arg2-Arg1 order
      फैशन डिजाईनरों का कहना है कि सबसे ज्यादा नक़ल या चोरी मोनोपोली डिजाईन की होती है. डिजाईनर इन बातों को बखूबी जानते हैं इसलिए कई बार ध्यान नहीं देते हैं. (PDTB: Cause.Result; HDRB: Cause)
      ‘Fashion designers say that the most prevalent thefts or copies are of monopoly designs. {Designers know this fact very well} so [it does not matter to them many times].’

  • Pragmatic discourse relations

    ● Discourse relations are pragmatic when they have to be inferred from the propositional content of the arguments
    ● Uniform three-way classification of pragmatic senses of all types in HDRB
        ● Epistemic, Speech act (Sweetser, 1990)
        ● Propositional

  • Pragmatic discourse relations

    ● Pragmatic concession, Subtype - Propositional
      इनमें से एक ड्राईवर ने हथियारों की जानकारी होने पर मामले से हाथ खींच लिया था लेकिन अदालत ने कहा कि अगर उसने उस वक्त पुलिस को सूचना दे दी होती तो विस्फोटों का षड्यंत्र भी विफल हो जाता
      “[One of the drivers amongst them, after learning about the arms, withdrew from the action.] But {the court commented that had he informed the police about it at the time, the plot to cause blasts would not have been successful}”


  • The Goal sense

    ● New sense type Goal under the “Contingency” class
    ● Applies where the situation described in one argument is the goal of the situation described in the other argument (which enables the achievement of the goal)
    ● The argument describing the goal is marked as Arg2, and the other argument is marked as Arg1
    ● In PDTB, goal is subsumed by the result subtype
    ● Distinguishing between cause and goal can have important consequences, for example in the way questions are formulated over the relation

  • The Goal sense

    सुभाष का आरोप है कि राजद अध्यक्ष राणा को इसलिए टिकेट देना चाहते हैं, जिससे चारे घोटाले में वे गवाह नहीं बन सके.

    “[Subhash's allegation is that the RJD President wants to give an (election) ticket to Rana] so that {he does not become a witness in the fodder scam.}”

  • Annotation task design

    ● Annotation of discourse relations over continuous text
        ● Connective-wise annotation in PDTB
    ● All types of discourse relations annotated in a single pass
    ● Motivation
        ● Avoid multiple annotation passes over the same discourse
        ● Avoid incompatible interpretations of the text from different passes
    ● Higher annotation load?

  • Annotation task design

    ● Implicit connectives across paragraph boundaries also annotated

    ● Subordinators and particles excluded in Phase I
    ● Attribution to be annotated in the next phase

  • Subordinators

    Post-positions, verbal participles, and suffixes that introduce non-finite clauses

    बा के जाने के बाद उन्होंने उस लड़के को अपने पास बुलाया. (Succession)
    ‘After {Baa left} [he called the boy to him].’

    ... खेलते हुए यह भूल जाता है की यदि उसका मित्र भी उसे अपने खिलौने को हाथ लगाने नहीं देता तो उसको कितना बुरा लगता. (Synchronous)
    ‘...while [playing] {he forgets that if his friend too didn’t let him touch his toy, then he would feel very bad too}.’

  • Particles

    Particles such as भी, ही can function as discourse connectives in Hindi

    लोग इसे दोनों देशों के बीच बड़ते रिश्ते के परिणाम के रूप में देख रहे हैं. कश्मीरी लोग इससे एक राजनीतिक सबक भी ले रहे हैं. (Conjunction)
    ‘[People see this as a consequence of the improving relation between the two countries]. {The Kashmiris are} also {learning a political lesson from this}.’

    Only instances where they indicate the inclusion of verbs are taken as discourse connectives

  • Restricted Backoff

    ● In PDTB, backoff to higher levels in the sense hierarchy is allowed

    ● For example, in case of ambiguity between “Contrast” and “Concession”, the annotator can back off to “Comparison” at the class level

    ● In HDRB, backoff is restricted to the type level

    ● Restricted backoff is due to the semantic argument labels
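
The backoff rule can be pictured as finding the most specific label that all candidate senses share. The following is only a minimal sketch, not the annotation tool used for HDRB: the hierarchy fragment, the function names and the `allow_class_level` flag are all assumptions made for illustration. The two Contrast subtypes are borrowed from the PDTB 2.0 tagset, and only senses named on these slides are otherwise included.

```python
# Hypothetical fragment of a PDTB-style sense hierarchy: child -> parent.
# Levels are class > type > subtype; this is NOT the full tagset.
PARENT = {
    "Contrast": "Comparison",
    "Concession": "Comparison",
    "Juxtaposition": "Contrast",   # PDTB 2.0 subtype of Contrast
    "Opposition": "Contrast",      # PDTB 2.0 subtype of Contrast
    "Cause": "Contingency",
    "Goal": "Contingency",
}
CLASSES = {"Comparison", "Contingency"}


def path_to_root(sense):
    """Return [sense, parent, ...] ending at the class level."""
    path = [sense]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def backoff(candidates, allow_class_level):
    """Most specific label shared by all candidate senses, or None.

    allow_class_level=True mimics the PDTB behaviour described above;
    allow_class_level=False mimics the HDRB restriction (type level at most).
    """
    paths = [path_to_root(s) for s in candidates]
    shared = [s for s in paths[0] if all(s in p for p in paths[1:])]
    if not shared:
        return None
    label = shared[0]  # paths run from specific to general
    if label in CLASSES and not allow_class_level:
        return None  # the annotator has to commit to one type instead
    return label


print(backoff(["Contrast", "Concession"], allow_class_level=True))
print(backoff(["Contrast", "Concession"], allow_class_level=False))
print(backoff(["Juxtaposition", "Opposition"], allow_class_level=False))
```

Under these assumptions the Contrast/Concession ambiguity resolves to “Comparison” only when class-level backoff is permitted, while an ambiguity between two subtypes of the same type can still back off to that type in both settings.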

  • Issues

    ● Implicit discourse relations between non-adjacent arguments

    ● AltLex vs Explicit connective

  • Implicit connectives between non-adjacent arguments

    वरिष्ठ माकपा नेता नीलोत्पल बसु के मुताबिक आम बजट में न्यूनतम साझा कार्यक्रम की झलक साफ नजर आती है। वामदल लगातार सरकार पर शिक्षा, स्वास्थ्य व रोजगार क्षेत्रों में बजट प्रावधान बढ़ाने के लिए दबाव बनाए हुए थे। बजट में इन सभी क्षेत्रों पर ध्यान दिया गया है। ... Implicit= दूसरी ओर आरएसपी के अबनी रॅय ने कहा कि बजट में छोटे किसानों के लिए कुछ नहीं किया गया है। लघु उद्योग के लिए नई योजना नहीं है और कस्टम ड्यूटी में कटौती का सीधा असर घरेलू उद्योग पर पड़ेगा। ...

    ● ‘[According to senior CPI(M) leader Mr. Nilotpal Basu, the general budget clearly reflects aspects of the Common Minimum Programme.] The left parties had been continuously exerting pressure on the government to increase allocation to education, health and employment sectors. And the budget has focused on all these areas ... Implicit=On the other hand, {RSP’s Mr. Avani Rai said that the budget has nothing to offer to small-scale farmers.} There is no planning for small-scale industry and the drop in custom duty will have a direct adverse impact on cottage industries ...’

  • AltLex vs Explicit connective

    ● Distinction between AltLex expressions and Explicit connectives
        ● Connectives are frozen
    ● Expressions in Hindi
        ● Deictic element + Relation, e.g., इसके अलावा (apart from this), उसके अलावा (apart from that), जिसके अलावा (apart from which)
        ● Forms like इसी के साथ involving modification of the deictic element: (इस + ही) के साथ (in addition to (all) this)
    ● Explore classification of AltLex along the lines of Prasad et al. (2010)

  • HDRB Annotation Statistics

    Relation type    # annotated tokens (% of total)
    Explicit          835 (32.6)
    Implicit         1286 (50)
    AltLex             81 (3.2)
    EntRel            341 (13.3)
    NoRel              21 (0.9)
    Total            2564
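
As a quick arithmetic check on the table, the shares can be recomputed from the token counts. This is only an illustrative sketch; the variable names are assumptions, and the counts are copied from the slide. Rounded to one decimal, the recomputed values agree with the slide for Explicit, AltLex and EntRel, while Implicit comes out at 50.2 and NoRel at 0.8, so the slide's figures for those two rows were presumably rounded differently.

```python
# Token counts per relation type, copied from the statistics slide.
counts = {
    "Explicit": 835,
    "Implicit": 1286,
    "AltLex": 81,
    "EntRel": 341,
    "NoRel": 21,
}

total = sum(counts.values())

# Percentage share of each relation type, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print("Total:", total)
for name, pct in shares.items():
    print(f"{name:<9}{counts[name]:>5}  ({pct}%)")
```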

  • HDRB: Publications

    ● Oza et al. (2009a), The Hindi Discourse Relation Bank (LAW 2009)

    ● Oza et al. (2009b), Experiments with Annotating Discourse Relations in the Hindi Discourse Relation Bank (ICON 2009)

    ● Kolachina et al. (2012), Evaluation of discourse relation annotation in the Hindi Discourse Relation Bank (to appear in LREC 2012)

  • Thanks !

  • Explicit connective vs EntRel

    ● समुद्री तूफान के बाद में बेस में हुए नुकसान का व्यक्तिगत रूप से जायजा लेने के बाद उन्होंने कहा कि एक वर्ष में हम बेस को पूरी तरह ऑपरेशनल बना देंगे। उन्होंने कहा कि समस्या जहाजों और विमानों से यहां से निर्माण सामग्री लाने की है। इसमें रनवे बिछाने के लिए कंक्रीट भी शामिल है। गौरतलब है कि दूसरे विश्व युद्ध के समय ब्रिटेन के साथ-साथ अमेरिका की सेनाएं दक्षिण एशिया में जापानी सेनाओं के खिलाफ अंडमान-निकोबार का इस्तेमाल करती थीं।

      ‘After a personal assessment of the damage incurred during the sea storm, he said that the base will be made fully operational in one year. He said that the main issue is of bringing construction material here on planes and ships. This includes concrete to lay the runway. It is noteworthy that during the second world war, Britain and American troops used the Andaman-Nicobar as base to counter the advance of Japanese troops in South Asia.’

    ● Explicit connective

  • Subordinating Conjunctions

    Lexical items conjoining finite adverbial clauses to their matrix clause

    Typically occur clause-initially

    Both single (e.g., क्योंकि (because)) and paired forms (e.g., अगर...तो (if...then))

    [आज दीया जलाया गया है] क्योंकि {आज मेरी वर्षगाँठ है.} (Cause)
    ‘[Today the lamp has been lit] because {it is my birthday}.’

    अगर [कोई आपसे कहे की नमक छोड़ दो] तो {आप भी नहीं छोड़ेंगे} (Conditional)
    ‘If [one were to ask you to quit taking salt] then {even you would not quit}.’

  • Coordinating Conjunctions

    Lexical items conjoining clauses or phrases of the same syntactic status

    Occur clause-initially, e.g., और (and), लेकिन (but)

    Single as well as paired forms, e.g., न केवल… बल्कि (not only...but also)

    [संघ के संगठन अनेक हैं] किन्तु {विचारधारा एक ही है.} (Concession)
    ‘[There are many groups in the Sangh] but {there is just one ideology.}’

  • Adverbials

    Adverbial and prepositional phrases claimed to function as anaphoric discourse connectives (Webber et al., 2003)

    Some examples of these are सो (so), फिर (then), नहीं तो (otherwise), वास्तव में (in fact), तभी (just then), इसके अलावा (in addition to this), etc.

    [दानवी लहरों के कारण अंडमान के पश्चिमी तट पर तटीय वनस्पति पूरी तरह बर्बाद हो गयी.] इसके अलावा {मूंगे के चट्टानों को भी नुक्सान हुआ है.} (Expansion)

    ‘[The coastal vegetation on the west coast of the Andaman has been completely destroyed due to wild waves]. In addition, {the coral reefs have also been damaged}.’

  • Pied-piped Sentential Relativizers

    Pied-piped relative phrases that conjoin a relative clause with the predication of its matrix clause (rather than some NP)

    Examples are जिससे (so that), जिसके कारण (because of which)

    [सारा काम छोड़कर वह उस चिड़िया को उठाकर दवाघर की ओर भागा] जिससे {उसका सही इलाज किया जा सके.} (Cause)

    ‘[Dropping all his work, he picked up the bird and ran towards the dispensary] so that {it could be given proper treatment}.’

    The relative pronoun जिससे modifies the event expressed in the matrix clause

  • Arguments of Discourse Relations

    • In PDTB, Arg2 is the argument “syntactically” associated with the connective, and Arg1 is the “other” argument.

    • In HDRB, argument naming is based on the sense of the relation. Each relation definition specifies its own convention for argument naming.

    • E.g., In the “cause” relation, one argument is the cause and the other is the effect. HDRB convention: Arg1=effect; Arg2=cause

    • Advantages of semantic naming scheme: More meaningful, and simplifies the sense classification hierarchy.

    December 2009, ICON HDRB, Umangi et al. 28
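
The difference between the two naming schemes can be sketched in a few lines of code. This is a hedged illustration rather than any HDRB tooling: the `Relation` class, the `CONNECTIVE_ROLE` table and both functions are hypothetical names introduced here. Only the convention itself (PDTB: Arg2 is the argument attached to the connective; HDRB Cause: Arg1 = effect, Arg2 = cause) comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    connective: str   # English gloss of the connective
    sense: str        # e.g. "Cause"
    conn_arg: str     # span syntactically attached to the connective
    other_arg: str    # the other span


# Hypothetical lookup: the semantic role played by the clause that the
# connective introduces ("because" introduces a cause, "so" an effect).
CONNECTIVE_ROLE = {"because": "cause", "so": "effect"}

# HDRB convention quoted on the slide for the Cause relation.
HDRB_CAUSE = {"effect": "Arg1", "cause": "Arg2"}


def pdtb_labels(rel):
    """Syntactic naming: Arg2 is whatever the connective attaches to."""
    return {"Arg1": rel.other_arg, "Arg2": rel.conn_arg}


def hdrb_labels(rel):
    """Semantic naming: labels follow the roles, not the syntax."""
    if rel.sense != "Cause":
        raise ValueError("this sketch only covers the Cause relation")
    role = CONNECTIVE_ROLE[rel.connective]
    other_role = "effect" if role == "cause" else "cause"
    return {
        HDRB_CAUSE[role]: rel.conn_arg,
        HDRB_CAUSE[other_role]: rel.other_arg,
    }


so_example = Relation(
    connective="so",
    sense="Cause",
    conn_arg="it does not matter to them many times",
    other_arg="Designers know this fact very well",
)
print(pdtb_labels(so_example))
print(hdrb_labels(so_example))
```

For a connective like "because" the two schemes agree, since the attached clause is the cause. For "so" they diverge: the attached clause is the effect, so it is Arg2 under syntactic naming but Arg1 under the semantic convention. That is why a single Cause label can replace the argument-specific Reason and Result labels.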

  • Arguments of Discourse Relations

    ● Cause after effect, hence Arg1-Arg2 order
      प्रतियोगिता के बाद सोनल ने बताया कि [विजेता के रूप में जब उसका नाम पुकारा गया तो उसे कुछ देर तक खुद पर विश्वास ही नहीं हुआ] क्योंकि {वह मान कर चल रही थी कि यह प्रतियोगिता फ़िक्स है.} (Cause)
      ‘After the competition, Sonal said that [when her name was announced as the winner, she could not believe herself for some time], because {she was thinking that the competition was fixed}.’

    ● Cause before effect, hence Arg2-Arg1 order
      फैशन डिजाईनरों का कहना है कि सबसे ज्यादा नक़ल या चोरी मोनोपोली डिजाईन की होती है. {डिजाईनर इन बातों को बखूबी जानते हैं} इसलिए [कई बार ध्यान नहीं देते हैं.] (Cause)
      ‘Fashion designers say that the most prevalent thefts or copies are of monopoly designs. {Designers know this fact very well} so [it does not matter to them many times].’


  • Implicit Discourse Relations

    For adjacent sentences not related by an explicit connective, four possibilities are considered in order:

    (1) Infer a discourse relation and insert an “implicit” connective between them

    {इस गेम के सारे खिलाड़ी सचिन तेंदुलकर से भी महान हैं.} IMPLICIT = इसलिए [इनको क्लीन बोल्ड करना किसी के बस की बात नहीं.] (Causal)
    ‘{All the players in this game are greater than even Sachin Tendulkar} so [it is not possible for anyone to get them clean bowled.]’

    (2) If a relation is inferred but insertion of a connective leads to redundancy, find and annotate an “alternative lexicalization” (AltLex) of the relation

    {बंगलादेश की कानून व्यवस्था की हालत में सुधार हुआ है} AltLex [इसी वजह से भारत ने सम्मलेन में शामिल होना का फ़ैसला लिया है]
    ‘{Bangladesh’s judiciary has seen an improvement}. That is why [India has decided to participate in the conference.]’

  • Other Relations

    (3) If no discourse relation is inferred but coherence results from an entity-based relation, annotate relation as “EntRel”.

    [फ़िल्म महोत्सव में प्रकाश झा की नयी फ़िल्म अपहरण का भी प्रीमियर होना है] EntRel {गंगाजल के बाद झा की यह किसी अलग विषय पर बनी दूसरी फ़िल्म है.}
    ‘[Prakash Jha’s latest film Apaharan will be premiered at the film festival.] {This is Jha’s second film on a different subject after Gangajal.}’

    (4) If no discourse relation or EntRel is perceived, annotate relation as “NoRel”
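
The four possibilities (1) to (4) form an ordered decision procedure, which can be sketched as follows. The three boolean inputs stand for judgments made by a human annotator; their names, the enum and the function are assumptions introduced for illustration and are not part of any HDRB tool.

```python
from enum import Enum


class RelationType(Enum):
    IMPLICIT = "Implicit"
    ALTLEX = "AltLex"
    ENTREL = "EntRel"
    NOREL = "NoRel"


def classify_adjacent_pair(relation_inferred,
                           connective_redundant,
                           entity_based_coherence):
    """Apply the four checks in the order given on the slides.

    Each argument is an annotator judgment about two adjacent sentences
    that are not linked by an explicit connective.
    """
    if relation_inferred:
        if not connective_redundant:
            return RelationType.IMPLICIT   # (1) insert an implicit connective
        return RelationType.ALTLEX         # (2) relation already lexicalized
    if entity_based_coherence:
        return RelationType.ENTREL         # (3) entity-based coherence only
    return RelationType.NOREL              # (4) nothing links the sentences


# The Tendulkar example: a causal relation is inferred and "so" can be inserted.
print(classify_adjacent_pair(True, False, False).value)
# The Apaharan example: no relation inferred, but both sentences are about the film.
print(classify_adjacent_pair(False, False, True).value)
```

Because the checks are ordered, an entity-based link is only considered once no discourse relation has been inferred, and NoRel is the fallback when every earlier check fails.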
