
An Unsupervised Approach to Recognizing Discourse Relations

Daniel Marcu and Abdessamad Echihabi
Information Sciences Institute and Department of Computer Science
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA, 90292
{marcu,echihabi}@isi.edu

Abstract

We present an unsupervised approach to recognizing discourse relations of CONTRAST, EXPLANATION-EVIDENCE, CONDITION and ELABORATION that hold between arbitrary spans of texts. We show that discourse relation classifiers trained on examples that are automatically extracted from massive amounts of text can be used to distinguish between some of these relations with accuracies as high as 93%, even when the relations are not explicitly marked by cue phrases.

1 Introduction

In the field of discourse research, it is now widely agreed that sentences/clauses are usually not understood in isolation, but in relation to other sentences/clauses. Given the high level of interest in explaining the nature of these relations and in providing definitions for them (Mann and Thompson, 1988; Hobbs, 1990; Martin, 1992; Lascarides and Asher, 1993; Hovy and Maier, 1993; Knott and Sanders, 1998), it is surprising that there are no robust programs capable of identifying discourse relations that hold between arbitrary spans of text. Consider, for example, the sentence/clause pairs below.

a. Such standards would preclude arms sales to states like Libya, which is also currently subject to a U.N. embargo.

b. But states like Rwanda before its present crisis would still be able to legally buy arms.

(1)

a. South Africa can afford to forgo sales of guns and grenades

b. because it actually makes most of its profits from the sale of expensive, high-technology systems like laser-designated missiles, aircraft electronic warfare systems, tactical radios, anti-radiation bombs and battlefield mobility systems.

(2)

In these examples, the discourse markers But and because help us figure out that a CONTRAST relation holds between the text spans in (1) and an EXPLANATION-EVIDENCE relation holds between the spans in (2). Unfortunately, cue phrases do not signal all relations in a text. In the corpus of Rhetorical Structure trees (www.isi.edu/~marcu/discourse/) built by Carlson et al. (2001), for example, we have observed that only 61 of 238 CONTRAST relations and 79 out of 307 EXPLANATION-EVIDENCE relations that hold between two adjacent clauses were marked by a cue phrase.

So what shall we do when no discourse markers are used? If we had access to robust semantic interpreters, we could, for example, infer from sentence 1.a that "cannot buy arms legally(libya)", infer from sentence 1.b that "can buy arms legally(rwanda)", use our background knowledge in order to infer that "similar(libya,rwanda)", and apply Hobbs's (1990) definitions of discourse relations to arrive at the conclusion that a CONTRAST relation holds between the sentences in (1). Unfortunately, the state of the art in NLP does not provide us access to semantic interpreters and general purpose knowledge bases that would support these kinds of inferences. The discourse relation definitions proposed by others (Mann and Thompson, 1988; Lascarides and Asher, 1993; Knott and Sanders, 1998) are not easier to apply either because they assume the ability to automatically derive, in addition to the semantics of the text spans, the intentions and illocutions associated with them as well.

In spite of the difficulty of determining the discourse relations that hold between arbitrary text spans, it is clear that such an ability is important in many applications. First, a discourse relation recognizer would enable the development of improved discourse parsers and, consequently, of high performance single document summarizers (Marcu, 2000). In multidocument summarization (DUC, 2002), it would enable the development of summarization programs capable of identifying contradictory statements both within and across documents and of producing summaries that reflect not only the similarities between various documents, but also their differences. In question-answering, it would enable the development of systems capable of answering sophisticated, non-factoid queries, such as "what were the causes of X?" or "what contradicts Y?", which are beyond the state of the art of current systems (TREC, 2001).

In this paper, we describe experiments aimed at building robust discourse-relation classification systems. To build such systems, we train a family of Naive Bayes classifiers on a large set of examples that are generated automatically from two corpora: a corpus of 41,147,805 English sentences that have no annotations, and BLIPP, a corpus of 1,796,386 automatically parsed English sentences (Charniak, 2000), which is available from the Linguistic Data Consortium (www.ldc.upenn.edu). We study empirically the adequacy of various features for the task of discourse relation classification and we show that some discourse relations can be correctly recognized with accuracies as high as 93%.

2 Discourse relation definitions and generation of training data

2.1 Background

In order to build a discourse relation classifier, one first needs to decide what relation definitions one is going to use. In Section 1, we simply relied on the reader's intuition when we claimed that a CONTRAST relation holds between the sentences in (1). In reality though, associating a discourse relation with a text span pair is a choice that is clearly influenced by the theoretical framework one is willing to adopt.

If we adopt, for example, Knott and Sanders's (1998) account, we would say that the relation between sentences 1.a and 1.b is ADDITIVE, because no causal connection exists between the two sentences, PRAGMATIC, because the relation pertains to illocutionary force and not to the propositional content of the sentences, and NEGATIVE, because the relation involves a CONTRAST between the two sentences. In the same framework, the relation between clauses 2.a and 2.b will be labeled as CAUSAL-SEMANTIC-POSITIVE-NONBASIC. In Lascarides and Asher's theory (1993), we would label the relation between 2.a and 2.b as EXPLANATION because the event in 2.b explains why the event in 2.a happened (perhaps by CAUSING it). In Hobbs's theory (1990), we would also label the relation between 2.a and 2.b as EXPLANATION because the event asserted by 2.b CAUSED or could CAUSE the event asserted in 2.a. And in Mann and Thompson's theory (1988), we would label sentence pairs 1.a, 1.b as CONTRAST because the situations presented in them are the same in many respects (the purchase of arms), because the situations are different in some respects (Libya cannot buy arms legally while Rwanda can), and because these situations are compared with respect to these differences. By a similar line of reasoning, we would label the relation between 2.a and 2.b as EVIDENCE.

The discussion above illustrates two points. First, it is clear that although current discourse theories are built on fundamentally different principles, they all share some common intuitions. Sure, some theories talk about "negative polarity" while others about "contrast". Some theories refer to "causes", some to "potential causes", and some to "explanations". But ultimately, all these theories acknowledge that there are such things as CONTRAST, CAUSE, and EXPLANATION relations. Second, given the complexity of the definitions these theories propose, it is clear why it is difficult to build programs that recognize such relations in unrestricted texts. Current NLP techniques do not enable us to reliably infer from sentence 1.a that "cannot buy arms legally(libya)" and do not give us access to general purpose knowledge bases that assert that "similar(libya,rwanda)".

The approach we advocate in this paper is in some respects less ambitious than current approaches to discourse relations because it relies upon a much smaller set of relations than those used by Mann and Thompson (1988) or Martin (1992). In our work, we decided to focus only on four types of relations, which we call: CONTRAST, CAUSE-EXPLANATION-EVIDENCE (CEV), CONDITION, and ELABORATION. (We define these relations in Section 2.2.) In other respects though, our approach is more ambitious because it focuses on the problem of recognizing such discourse relations in unrestricted texts. In other words, given as input sentence pairs such as those shown in (1)-(2), we develop techniques and programs that label the relations that hold between these sentence pairs as CONTRAST, CAUSE-EXPLANATION-EVIDENCE, CONDITION, ELABORATION or NONE-OF-THE-ABOVE, even when the discourse relations are not explicitly signalled by discourse markers.

2.2 Discourse relation definitions

The discourse relations we focus on are defined at a much coarser level of granularity than in most discourse theories. For example, we consider that a CONTRAST relation holds between two text spans if one of the following relations holds: CONTRAST, ANTITHESIS, CONCESSION, or OTHERWISE, as defined by Mann and Thompson (1988), CONTRAST or VIOLATED EXPECTATION, as defined by Hobbs (1990), or any of the relations characterized by this regular expression of cognitive primitives, as defined by Knott and Sanders (1998): (CAUSAL ∨ ADDITIVE) - (SEMANTIC ∨ PRAGMATIC) - NEGATIVE. In other words, in our approach, we do not distinguish between contrasts of semantic and pragmatic nature, contrasts specific to violated expectations, etc. Table 1 shows the definitions of the relations we considered.

The advantage of operating with coarsely defined discourse relations is that it enables us to automatically construct relatively low-noise datasets that can be used for learning. For example, by extracting sentence pairs that have the keyword "But" at the beginning of the second sentence, as the sentence pair shown in (1), we can automatically collect many examples of CONTRAST relations. And by extracting sentences that contain the keyword "because", we can automatically collect many examples of CAUSE-EXPLANATION-EVIDENCE relations. As previous research in linguistics (Halliday and Hasan, 1976; Schiffrin, 1987) and computational linguistics (Marcu, 2000) show, some occurrences of "but" and "because" do not have a discourse function; and others signal other relations than CONTRAST and CAUSE-EXPLANATION. So we can expect the examples we extract to be noisy. However, empirical work of Marcu (2000) and Carlson et al. (2001) suggests that the majority of occurrences of "but", for example, do signal CONTRAST relations. (In the RST corpus built by Carlson et al. (2001), 89 out of the 106 occurrences of "but" that occur at the beginning of a sentence signal a CONTRAST relation that holds between the sentence that contains the word "but" and the sentence that precedes it.) Our hope is that simple extraction methods are sufficient for collecting low-noise training corpora.

2.3 Generation of training data

In order to collect training cases, we mined in an unsupervised manner two corpora. The first corpus, which we call Raw, is a corpus of 1 billion words of unannotated English (41,147,805 sentences) that we created by catenating various corpora made available over the years by the Linguistic Data Consortium. The second, called BLIPP, is a corpus of only 1,796,386 sentences that were parsed automatically by Charniak (2000). We extracted from both corpora all adjacent sentence pairs that contained the cue phrase "But" at the beginning of the second sentence and we automatically labeled the relation between the two sentence pairs as CONTRAST. We also extracted all the sentences that contained the word "but" in the middle of a sentence; we split each extracted sentence into two spans, one containing the words from the beginning of the sentence to the occurrence of the keyword "but" and one containing the words from the occurrence of "but" to the end of the sentence; and we labeled the relation between the two resulting text spans as CONTRAST as well.
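As a concrete illustration, here is a minimal Python sketch of this mining step; the function and variable names are ours, since the paper does not publish its extraction code:

```python
import re

def extract_contrast_pairs(sentences):
    """Mine candidate CONTRAST span pairs from a list of sentences,
    following the two heuristics described above. A sketch, not the
    authors' implementation."""
    pairs = []
    # Heuristic 1: adjacent sentence pairs where the second starts with "But".
    for prev, curr in zip(sentences, sentences[1:]):
        if curr.startswith("But "):
            pairs.append((prev, curr))
    # Heuristic 2: a sentence-internal "but" splits the sentence into two spans.
    for sent in sentences:
        m = re.search(r"\bbut\b", sent)
        if m:
            left = sent[:m.start()].strip()   # words before "but"
            right = sent[m.start():].strip()  # from "but" to the end of sentence
            if left and right:
                pairs.append((left, right))
    return pairs
```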

Table 2 lists some of the cue phrases we used in order to extract CONTRAST, CAUSE-EXPLANATION-EVIDENCE, ELABORATION, and CONDITION relations and the number of examples extracted from the Raw corpus for each type of discourse relation.


CONTRAST: ANTITHESIS (M&T); CONCESSION (M&T); OTHERWISE (M&T); CONTRAST (M&T); VIOLATED EXPECTATION (Ho); (CAUSAL ∨ ADDITIVE) - (SEMANTIC ∨ PRAGMATIC) - NEGATIVE (K&S)

CAUSE-EXPLANATION-EVIDENCE: EVIDENCE (M&T); VOLITIONAL-CAUSE (M&T); NONVOLITIONAL-CAUSE (M&T); VOLITIONAL-RESULT (M&T); NONVOLITIONAL-RESULT (M&T); EXPLANATION (Ho); RESULT (A&L); EXPLANATION (A&L); CAUSAL - (SEMANTIC ∨ PRAGMATIC) - POSITIVE (K&S)

ELABORATION: ELABORATION (M&T); EXPANSION (Ho); EXEMPLIFICATION (Ho); ELABORATION (A&L)

CONDITION: CONDITION (M&T)

Table 1: Relation definitions as union of definitions proposed by other researchers (M&T - Mann and Thompson, 1988; Ho - Hobbs, 1990; A&L - Lascarides and Asher, 1993; K&S - Knott and Sanders, 1998).

CONTRAST - 3,881,588 examples
[BOS ... EOS] [BOS But ... EOS]
[BOS ... ] [but ... EOS]
[BOS ... ] [although ... EOS]
[BOS Although ... ,] [ ... EOS]

CAUSE-EXPLANATION-EVIDENCE - 889,946 examples
[BOS ... ] [because ... EOS]
[BOS Because ... ,] [ ... EOS]
[BOS ... EOS] [BOS Thus, ... EOS]

CONDITION - 1,203,813 examples
[BOS If ... ,] [ ... EOS]
[BOS If ... ] [then ... EOS]
[BOS ... ] [if ... EOS]

ELABORATION - 1,836,227 examples
[BOS ... EOS] [BOS ... for example ... EOS]
[BOS ... ] [which ... ,]

NO-RELATION-SAME-TEXT - 1,000,000 examples
Randomly extract two sentences that are more than 3 sentences apart in a given text.

NO-RELATION-DIFFERENT-TEXTS - 1,000,000 examples
Randomly extract two sentences from two different documents.

Table 2: Patterns used to automatically construct a corpus of text span pairs labeled with discourse relations.

In the patterns in Table 2, the symbols BOS and EOS denote BeginningOfSentence and EndOfSentence boundaries, the "..." stands for occurrences of any words and punctuation marks, the square brackets stand for text span boundaries, and the other words and punctuation marks stand for the cue phrases that we used in order to extract discourse relation examples. For example, the pattern [BOS Although ... ,] [ ... EOS] is used in order to extract examples of CONTRAST relations that hold between a span of text delimited to the left by the cue phrase "Although" occurring in the beginning of a sentence and to the right by the first occurrence of a comma, and a span of text that contains the rest of the sentence to which "Although" belongs.
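For instance, the "Although" pattern could be realized as a regular expression along the following lines; this is one hypothetical rendering, as the paper does not specify its matching machinery:

```python
import re

# [BOS Although ... ,] [ ... EOS]: the left span runs from "Although" to the
# first comma; the right span is the rest of the sentence.
ALTHOUGH_PATTERN = re.compile(r"^(Although\s+[^,]*,)\s*(.+)$")

def split_although(sentence):
    m = ALTHOUGH_PATTERN.match(sentence)
    if m is None:
        return None
    return m.group(1), m.group(2)  # (left span, right span), labeled CONTRAST
```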

We also extracted automatically 1,000,000 examples of what we hypothesize to be non-relations, by randomly selecting non-adjacent sentence pairs that are at least 3 sentences apart in a given text. We label such examples NO-RELATION-SAME-TEXT. And we extracted automatically 1,000,000 examples of what we hypothesize to be cross-document non-relations, by randomly selecting two sentences from distinct documents. As in the case of CONTRAST and CONDITION, the NO-RELATION examples are also noisy because long distance relations are common in well-written texts.
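Sampling the two NO-RELATION sets is straightforward; a sketch under the assumption that each text is given as a list of sentences (the function names are ours, and the distance threshold follows Table 2):

```python
import random

def sample_no_relation_same_text(sentences, n_examples):
    """Sample NO-RELATION-SAME-TEXT pairs: two sentences from the same text
    that are more than 3 sentences apart."""
    pairs = []
    while len(pairs) < n_examples:
        i, j = sorted(random.sample(range(len(sentences)), 2))
        if j - i > 3:  # non-adjacent, more than 3 sentences apart
            pairs.append((sentences[i], sentences[j]))
    return pairs

def sample_no_relation_different_texts(texts, n_examples):
    """Sample NO-RELATION-DIFFERENT-TEXTS pairs: one sentence from each of
    two randomly chosen, distinct documents."""
    pairs = []
    while len(pairs) < n_examples:
        d1, d2 = random.sample(range(len(texts)), 2)
        pairs.append((random.choice(texts[d1]), random.choice(texts[d2])))
    return pairs
```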

3 Determining discourse relations using Naive Bayes classifiers

We hypothesize that we can determine that a CONTRAST relation holds between the sentences in (3) even if we cannot semantically interpret the two sentences, simply because our background knowledge tells us that good and fails are good indicators of contrastive statements.

John is good in math and sciences.

Paul fails almost every class he takes.

(3)

Similarly, we hypothesize that we can determine that a CONTRAST relation holds between the sentences in (1), because our background knowledge tells us that embargo and legally are likely to occur in contexts of opposite polarity. In general, we hypothesize that lexical item pairs can provide clues about the discourse relations that hold between the text spans in which the lexical items occur.

To test this hypothesis, we need to solve two problems. First, we need a means to acquire vast amounts of background knowledge from which we can derive, for example, that the word pairs good - fails and embargo - legally are good indicators of CONTRAST relations. The extraction patterns described in Table 2 enable us to solve this problem.[1] Second, given vast amounts of training material, we need a means to learn which pairs of lexical items are likely to co-occur in conjunction with each discourse relation and a means to apply the learned parameters to any pair of text spans in order to determine the discourse relation that holds between them. We solve the second problem in a Bayesian probabilistic framework.

We assume that a discourse relation $r_k$ that holds between two text spans, $W_1$ and $W_2$, is determined by the word pairs in the cartesian product defined over the words in the two text spans: $(w_i, w_j) \in W_1 \times W_2$. In general, a word pair $(w_i, w_j) \in W_1 \times W_2$ can "signal" any relation $r_k$. We determine the most likely discourse relation that holds between two text spans $W_1$ and $W_2$ by taking $\arg\max_k P(r_k \mid W_1, W_2)$, which, according to Bayes' rule, amounts to taking $\arg\max_k P(W_1, W_2 \mid r_k)\,P(r_k)$. If we assume that the word pairs in the cartesian product are independent, $P(W_1, W_2 \mid r_k)$ is equivalent to $\prod_{(w_i, w_j) \in W_1 \times W_2} P((w_i, w_j) \mid r_k)$. The values $P((w_i, w_j) \mid r_k)$ are computed using maximum likelihood estimators, which are smoothed using the Laplace method (Manning and Schütze, 1999).
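A compact sketch of this model in Python; the class and method names are ours, and the training examples would come from the automatically extracted span pairs:

```python
from collections import Counter
from itertools import product
from math import log

class WordPairNaiveBayes:
    """Naive Bayes over the cartesian product of the words in two spans,
    as described above. A sketch, not the authors' implementation."""

    def __init__(self):
        self.pair_counts = {}        # relation -> Counter of (w_i, w_j) pairs
        self.rel_counts = Counter()  # relation -> number of training examples
        self.vocab = set()           # all (w_i, w_j) pairs seen in training

    def train(self, examples):
        # examples: iterable of (span1_words, span2_words, relation)
        for w1, w2, rel in examples:
            self.rel_counts[rel] += 1
            counts = self.pair_counts.setdefault(rel, Counter())
            for pair in product(w1, w2):
                counts[pair] += 1
                self.vocab.add(pair)

    def classify(self, w1, w2):
        total = sum(self.rel_counts.values())
        vocab_size = len(self.vocab)
        best_rel, best_score = None, float("-inf")
        for rel, n in self.rel_counts.items():
            counts = self.pair_counts[rel]
            size = sum(counts.values())
            # log P(r_k) + sum of log P((w_i, w_j) | r_k), Laplace-smoothed
            score = log(n / total)
            for pair in product(w1, w2):
                score += log((counts[pair] + 1) / (size + vocab_size))
            if score > best_score:
                best_rel, best_score = rel, score
        return best_rel
```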

For each discourse relation pair $(r_i, r_j)$, we train a word-pair-based classifier using the automatically derived training examples in the Raw corpus, from which we first removed the cue phrases used for extracting the examples. This ensures that our classifiers do not learn, for example, that the word pair if - then is a good indicator of a CONDITION relation, which would simply amount to learning to distinguish between the extraction patterns used to construct the corpus. We test each classifier on a test corpus of 5000 examples labeled with $r_i$ and 5000 examples labeled with $r_j$, which ensures that the baseline is the same for all combinations of $r_i$ and $r_j$, namely 50%.

[1] Note that relying on the list of antonyms provided by WordNet (Fellbaum, 1998) is not enough because the semantic relations in WordNet are not defined across word class boundaries. For example, WordNet does not list the "antonymy"-like relation between embargo and legally.

Table 3 shows the performance of all discourse relation classifiers. As one can see, each classifier outperforms the 50% baseline, the most accurate being the one that distinguishes between CAUSE-EXPLANATION-EVIDENCE and ELABORATION relations, which has an accuracy of 93%. We have also built a six-way classifier to distinguish between all six relation types. This classifier has a performance of 49.7%, with a baseline of 16.67%, which is achieved by labeling all relations as CONTRASTS.

We also examined the learning curves of various classifiers and noticed that, for some of them, the addition of training examples does not appear to have a significant impact on their performance. For example, the classifier that distinguishes between CONTRAST and CAUSE-EXPLANATION-EVIDENCE relations has an accuracy of 87.1% when trained on 2,000,000 examples and an accuracy of 87.3% when trained on 4,771,534 examples. We hypothesized that the flattening of the learning curve is explained by the noise in our training data and the vast amount of word pairs that are not likely to be good predictors of discourse relations.

To test this hypothesis, we decided to carry out a second experiment that used as predictors only a subset of the word pairs in the cartesian product defined over the words in two given text spans. To achieve this, we used the patterns in Table 2 to extract examples of discourse relations from the BLIPP corpus. As expected, the BLIPP corpus yielded much fewer learning cases: 185,846 CONTRAST; 44,776 CAUSE-EXPLANATION-EVIDENCE; 55,699 CONDITION; and 33,369 ELABORATION relations. To these examples, we added 58,000 NO-RELATION-SAME-TEXT and 58,000 NO-RELATION-DIFFERENT-TEXTS relations.

To each text span in the BLIPP corpus corresponds a parse tree (Charniak, 2000).


                   CONTRAST  CEV  COND  ELAB  NO-REL-SAME-TEXT  NO-REL-DIFF-TEXTS
CONTRAST              -       87   74    82         64                 64
CEV                            -   76    93         75                 74
COND                                -    89         69                 71
ELAB                                      -         76                 75
NO-REL-SAME-TEXT                                     -                 64

Table 3: Performances of classifiers trained on the Raw corpus. The baseline in all cases is 50%.

                   CONTRAST  CEV  COND  ELAB  NO-REL-SAME-TEXT  NO-REL-DIFF-TEXTS
CONTRAST              -       62   58    78         64                 72
CEV                            -   69    82         64                 68
COND                                -    78         63                 65
ELAB                                      -         78                 78
NO-REL-SAME-TEXT                                     -                 66

Table 4: Performances of classifiers trained on the BLIPP corpus. The baseline in all cases is 50%.

We wrote a simple program that extracted the nouns, verbs, and cue phrases in each sentence/clause. We call these the most representative words of a sentence/discourse unit. For example, the most representative words of the sentence in example (4) are its nouns, verbs, and the cue phrase "but".

Italy's unadjusted industrial production fell in January 3.4% from a year earlier but rose 0.4% from December, the government said. (4)
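A rough approximation of this filtering step; the authors read nouns and verbs off Charniak parse trees, whereas this sketch substitutes an off-the-shelf POS tagger, and the cue-phrase list is abbreviated:

```python
import nltk  # assumes NLTK's tokenizer and POS tagger models are installed

CUE_PHRASES = {"but", "because", "although", "if", "then", "thus"}  # abbreviated

def representative_words(sentence):
    """Keep only nouns, verbs, and cue phrases: the 'most representative
    words' of a sentence/discourse unit, as defined above."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return [w for w, tag in tagged
            if tag.startswith(("NN", "VB")) or w.lower() in CUE_PHRASES]
```

On example (4), this keeps roughly the nouns and verbs (production, fell, rose, said, and so on) plus the cue phrase "but".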

We repeated the experiment we carried out in conjunction with the Raw corpus on the data derived from the BLIPP corpus as well. Table 4 summarizes the results.

Overall, the performance of the systems trained on the most representative word pairs in the BLIPP corpus is clearly lower than the performance of the systems trained on all the word pairs in the Raw corpus. But a direct comparison between two classifiers trained on different corpora is not fair because with just 100,000 examples per relation, the systems trained on the Raw corpus are much worse than those trained on the BLIPP data. The learning curves in Figure 1 are illuminating as they show that if one uses as features only the most representative word pairs, one needs only about 100,000 training examples to achieve the same level of performance one achieves using 1,000,000 training examples and features defined over all word pairs. Also, since the learning curve for the BLIPP corpus is steeper than the learning curve for the Raw corpus, this suggests that discourse relation classifiers trained on most representative word pairs and millions of training examples can achieve higher levels of performance than classifiers trained on all word pairs (unannotated data).

[Figure 1: Learning curves for the ELABORATION vs. CAUSE-EXPLANATION-EVIDENCE classifiers, trained on the Raw and BLIPP corpora.]

4 Relevance to RST

The results in Section 3 indicate clearly that massive amounts of automatically generated data can be used to distinguish between discourse relations defined as discussed in Section 2.2.


              CONTR    CEV      COND     ELAB
# test cases    238     307      125     1761
CONTR            -    63 (56)  80 (65)  64 (88)
CEV                      -     87 (71)  76 (85)
COND                             -      87 (93)

Table 5: Performances of Raw-trained classifiers on manually labeled RST relations that hold between elementary discourse units. Each cell shows the accuracy of our classifier, with the majority baseline in parentheses.

What the experiments in Section 3 do not show is whether the classifiers built in this manner can be of any use in conjunction with some established discourse theory. To test this, we used the corpus of discourse trees built in the style of RST by Carlson et al. (2001). We automatically extracted from this manually annotated corpus all CONTRAST, CAUSE-EXPLANATION-EVIDENCE, CONDITION and ELABORATION relations that hold between two adjacent elementary discourse units. Since RST (Mann and Thompson, 1988) employs a finer grained taxonomy of relations than we used, we applied the definitions shown in Table 1. That is, we considered that a CONTRAST relation held between two text spans if a human annotator labeled the relation between those spans as ANTITHESIS, CONCESSION, OTHERWISE or CONTRAST. We then retrained all classifiers on the Raw corpus, but this time without removing from the corpus the cue phrases that were used to generate the training examples. We did this because when trying to determine whether a CONTRAST relation holds between two spans of texts separated by the cue phrase "but", for example, we want to take advantage of the cue phrase occurrence as well. We employed our classifiers on the manually labeled examples extracted from Carlson et al.'s corpus (2001). Table 5 displays the performance of our two-way classifiers for relations defined over elementary discourse units. The table displays in the second row, for each discourse relation, the number of examples extracted from the RST corpus. For each binary classifier, the table lists the accuracy of our classifier and, in parentheses, the majority baseline associated with it.

The resultsin Table 5 show that the classifierslearnedfrom automaticallygeneratedtraining data

canbe usedto distinguishbetweencertaintypesofRST relations. For example,the resultsshow thatthe classifierscan be usedto distinguishbetweenCONTRAST and CAUSE-EXPLANATION-EVIDENCE

relations,asdefinedin RST, but notsowell betweenELABORATION andany other relation. This resultis consistentwith thediscoursemodelproposedbyKnott etal. (2001),whosuggestthatELABORATION

relationsare too ill-defined to be part of any dis-coursetheory.

The analysis above is informative only from a machine learning perspective. From a linguistic perspective though, this analysis is not very useful. If no cue phrases are used to signal the relation between two elementary discourse units, an automatic discourse labeler can at best guess that an ELABORATION relation holds between the units, because ELABORATION relations are the most frequently used relations (Carlson et al., 2001). Fortunately, with the classifiers described here, one can label some of the unmarked discourse relations correctly.

For example, the RST-annotated corpus of Carlson et al. (2001) contains 238 CONTRAST relations that hold between two adjacent elementary discourse units. Of these, only 61 are marked by a cue phrase, which means that a program trained only on Carlson et al.'s corpus could identify at most 61/238 of the CONTRAST relations correctly. Because Carlson et al.'s corpus is small, all unmarked relations will likely be labeled as ELABORATIONs. However, when we run our CONTRAST vs. ELABORATION classifier on these examples, we can label correctly 60 of the 61 cue-phrase-marked relations and, in addition, we can also label 123 of the 177 relations that are not marked explicitly with cue phrases. This means that our classifier contributes to an increase in accuracy from 61/238 = 26% to (60+123)/238 = 77%! Similarly, out of the 307 CAUSE-EXPLANATION-EVIDENCE relations that hold between two discourse units in Carlson et al.'s corpus, only 79 are explicitly marked. A program trained only on Carlson et al.'s corpus would, therefore, identify at most 79 of the 307 relations correctly. When we run our CAUSE-EXPLANATION-EVIDENCE vs. ELABORATION classifier on these examples, we labeled correctly 73 of the 79 cue-phrase-marked relations and 102 of the 228 unmarked relations. This corresponds to an increase in accuracy from 79/307 = 26% to (73+102)/307 = 57%.
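Spelled out, the two accuracy gains above are:

```latex
\frac{61}{238} \approx 26\% \;\to\; \frac{60+123}{238} = \frac{183}{238} \approx 77\%,
\qquad
\frac{79}{307} \approx 26\% \;\to\; \frac{73+102}{307} = \frac{175}{307} \approx 57\%.
```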

5 Discussion

In a seminal paper, Banko and Brill (2001) have recently shown that massive amounts of data can be used to significantly increase the performance of confusion set disambiguators. In our paper, we show that massive amounts of data can have a major impact on discourse processing research as well. Our experiments show that discourse relation classifiers that use very simple features achieve unexpectedly high levels of performance when trained on extremely large datasets. Developing lower-noise methods for automatically collecting training data and discovering features of higher predictive power for discourse relation classification than the features presented in this paper appear to be research avenues that are worthwhile to pursue.

Over the last thirty years, the nature, number, and taxonomy of discourse relations have been among the most controversial issues in text/discourse linguistics. This paper does not settle the controversy. Rather, it raises some new, interesting questions because the lexical patterns learned by our algorithms can be interpreted as empirical proof of existence for discourse relations. If text production were not governed by any rules above the sentence level, we should not have been able to improve on any of the baselines in our experiments. Our results suggest that it may be possible to develop fully automatic techniques for defining empirically justified discourse relations.

Acknowledgments. This work was supported by the National Science Foundation under grant number IIS-0097846 and by the Advanced Research and Development Activity (ARDA)'s Advanced Question Answering for Intelligence (AQUAINT) Program under contract number MDA908-02-C-0007.

References

Michele Banko and Eric Brill. 2001. Scaling to very very large corpora for natural language disambiguation. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL'01), Toulouse, France, July 6-11.

Lynn Carlson, Daniel Marcu, and Mary Ellen Okurowski. 2001. Building a discourse-tagged corpus in the framework of rhetorical structure theory. In Proceedings of the 2nd SIGDIAL Workshop on Discourse and Dialogue, Eurospeech 2001, Aalborg, Denmark.

Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2000), pages 132-139, Seattle, Washington, April 29 - May 3.

DUC-2002. Proceedings of the Second Document Understanding Conference, Philadelphia, PA, July.

Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. The MIT Press.

Michael A.K. Halliday and Ruqaiya Hasan. 1976. Cohesion in English. Longman.

Jerry R. Hobbs. 1990. Literature and Cognition. CSLI Lecture Notes Number 21.

Eduard H. Hovy and Elisabeth Maier. 1993. Parsimonious or profligate: How many and which discourse structure relations? Unpublished manuscript.

Alistair Knott and Ted J.M. Sanders. 1998. The classification of coherence relations and their linguistic markers: An exploration of two languages. Journal of Pragmatics, 30:135-175.

Alistair Knott, Jon Oberlander, Mick O'Donnell, and Chris Mellish. 2001. Beyond elaboration: The interaction of relations and focus in coherent text. In T. Sanders, J. Schilperoord, and W. Spooren, editors, Text Representation: Linguistic and Psycholinguistic Aspects, pages 181-196. Benjamins.

Alex Lascarides and Nicholas Asher. 1993. Temporal interpretation, discourse relations, and commonsense entailment. Linguistics and Philosophy, 16(5):437-493.

William C. Mann and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243-281.

Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. The MIT Press.

Daniel Marcu. 2000. The Theory and Practice of Discourse Parsing and Summarization. The MIT Press.

James R. Martin. 1992. English Text. System and Structure. John Benjamin Publishing Company.

Deborah Schiffrin. 1987. Discourse Markers. Cambridge University Press.

TREC-2001. Proceedings of the Text Retrieval Conference, November. The Question-Answering Track.

