
    Noun Phrase Coreference as Clustering

Claire Cardie and Kiri Wagstaff
Department of Computer Science

    Cornell University

Ithaca, NY 14853
E-mail: cardie,[email protected]

Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 82-89, Association for Computational Linguistics, 1999.

Abstract

This paper introduces a new, unsupervised algorithm for noun phrase coreference resolution. It differs from existing methods in that it views coreference resolution as a clustering task. In an evaluation on the MUC-6 coreference resolution corpus, the algorithm achieves an F-measure of 53.6%, placing it firmly between the worst (40%) and best (65%) systems in the MUC-6 evaluation. More importantly, the clustering approach outperforms the only MUC-6 system to treat coreference resolution as a learning problem. The clustering algorithm appears to provide a flexible mechanism for coordinating the application of context-independent and context-dependent constraints and preferences for accurate partitioning of noun phrases into coreference equivalence classes.

    1 Introduction

Many natural language processing (NLP) applications require accurate noun phrase coreference resolution: They require a means for determining which noun phrases in a text or dialogue refer to the same real-world entity. The vast majority of algorithms for noun phrase coreference combine syntactic and, less often, semantic cues via a set of hand-crafted heuristics and filters. All but one system in the MUC-6 coreference performance evaluation (MUC-6, 1995), for example, handled coreference resolution in this manner. This same reliance on complicated hand-crafted algorithms is true even for the narrower task of pronoun resolution. Some exceptions exist, however. Ge et al. (1998) present a probabilistic model for pronoun resolution trained on a small subset of the Penn Treebank Wall Street Journal corpus (Marcus et al., 1993). Dagan and Itai (1991) develop a statistical filter for resolution of the pronoun it that selects among syntactically viable antecedents based on relevant subject-verb-object cooccurrences. Aone and Bennett (1995) and McCarthy and Lehnert (1995) employ decision tree algorithms to handle a broader subset of general noun phrase coreference problems.

This paper presents a new corpus-based approach to noun phrase coreference. We believe that it is the first such unsupervised technique developed for the general noun phrase coreference task. In short, we view the task of noun phrase coreference resolution as a clustering task. First, each noun phrase in a document is represented as a vector of attribute-value pairs. Given the feature vector for each noun phrase, the clustering algorithm coordinates the application of context-independent and context-dependent coreference constraints and preferences to partition the noun phrases into equivalence classes, one class for each real-world entity mentioned in the text. Context-independent coreference constraints and preferences are those that apply to two noun phrases in isolation. Context-dependent coreference decisions, on the other hand, consider the relationship of each noun phrase to surrounding noun phrases.

In an evaluation on the MUC-6 coreference resolution corpus, our clustering approach achieves an F-measure of 53.6%, placing it firmly between the worst (40%) and best (65%) systems in the MUC-6 evaluation. More importantly, the clustering approach outperforms the only MUC-6 system to view coreference resolution as a learning problem: The RESOLVE system (McCarthy and Lehnert, 1995) employs decision tree induction and achieves an F-measure of 47% on the MUC-6 data set. Furthermore, our approach has a number of important advantages over existing learning and non-learning methods for coreference resolution:

• The approach is largely unsupervised, so no annotated training corpus is required.

• Although evaluated in an information extraction context, the approach is domain-independent.

• As noted above, the clustering approach provides a flexible mechanism for coordinating context-independent and context-dependent coreference constraints and preferences for partitioning noun phrases into coreference equivalence classes.

As a result, we believe that viewing noun phrase coreference as clustering provides a promising framework for corpus-based coreference resolution.

The remainder of the paper describes the details of our approach. The next section provides a concrete specification of the noun phrase coreference resolution task. Section 3 presents the clustering algorithm. Evaluation of the approach appears in Section 4. Qualitative and quantitative comparisons to related work are included in Section 5.

    2 Noun Phrase Coreference

It is commonly observed that a human speaker or author avoids repetition by using a variety of noun phrases to refer to the same entity. While human audiences have little trouble mapping a collection of noun phrases onto the same entity, this task of noun phrase (NP) coreference resolution can present a formidable challenge to an NLP system. Figure 1 depicts a typical coreference resolution system, which takes as input an arbitrary document and produces as output the appropriate coreference equivalence classes. The subscripted noun phrases in the sample output constitute two noun phrase coreference equivalence classes: Class JS contains the five noun phrases that refer to John Simon, and class PC contains the two noun phrases that represent Prime Corp. The figure also visually links neighboring coreferent noun phrases. The remaining (unbracketed) noun phrases have no coreferent NPs and are considered singleton equivalence classes. Handling the JS class alone requires recognizing coreferent NPs in appositive and genitive constructions as well as those that occur as proper names, possessive pronouns, and definite NPs.

John Simon, Chief Financial Officer of Prime Corp. since 1986, saw his pay jump 20%, to $1.3 million, as the 37-year-old also became the financial-services company's president.

Coreference System

[JS John Simon], [JS Chief Financial Officer] of [PC Prime Corp.] since 1986, saw [JS his] pay jump 20%, to $1.3 million, as [JS the 37-year-old] also became [PC the financial-services company]'s [JS president].

Figure 1: Coreference System

3 Coreference as Clustering

Our approach to the coreference task stems from the observation that each group of coreferent noun phrases defines an equivalence class.¹ Therefore, it is natural to view the problem as one of partitioning, or clustering, the noun phrases. Intuitively, all of the noun phrases used to describe a specific concept will be near or related in some way, i.e. their conceptual distance will be small. Given a description of each noun phrase and a method for measuring the distance between two noun phrases, a clustering algorithm can then group noun phrases together: noun phrases with distance greater than a clustering radius r are not placed into the same partition and so are not considered coreferent.

¹The coreference relation is symmetric, transitive, and reflexive.

The subsections below describe the noun phrase representation, the distance metric, and the clustering algorithm in turn.

    3.1 Instance Representation

Given an input text, we first use the Empire noun phrase finder (Cardie and Pierce, 1998) to locate all noun phrases in the text. Note that Empire identifies only base noun phrases, i.e. simple noun phrases that contain no other smaller noun phrases within them. For example, Chief Financial Officer of Prime Corp. is too complex to be a base noun phrase. It contains two base noun phrases: Chief Financial Officer and Prime Corp.

Each noun phrase in the input text is then represented as a set of 11 features as shown in Table 1. This noun phrase representation is a first approximation to the feature vector that would be required for accurate coreference resolution. All feature values are automatically generated and, therefore, are not always perfect. In particular, we use very simple heuristics to approximate the behavior of more complex feature value computations:

Individual Words. The words contained in the noun phrase are stored as a feature value.
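Table 1 itself is not reproduced in this transcript. Purely as an illustration of the attribute-value representation described above, a base noun phrase might be encoded as follows; the field names here are our own placeholders, not the paper's exact feature set.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NounPhrase:
    """One base NP as an attribute-value feature vector.

    Field names are illustrative placeholders; the paper's actual
    11 features appear in its Table 1, which is missing from this
    transcript.
    """
    words: List[str]             # individual words in the NP
    head_noun: str               # rightmost noun, a common approximation
    position: int                # NP index within the document
    pronoun_type: Optional[str]  # e.g. "nominative", "possessive", or None
    article: Optional[str]       # "definite", "indefinite", or None
    number: str                  # "singular" or "plural"
    proper_name: bool            # does the NP look like a proper name?

np_example = NounPhrase(
    words=["the", "chairman"], head_noun="chairman", position=3,
    pronoun_type=None, article="definite", number="singular",
    proper_name=False)
```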


Coreference_Clustering(NPn, NPn-1, . . . , NP1)

1. Let r be the clustering radius.

2. Mark each noun phrase NPi as belonging to its own class, ci: ci = {NPi}.

3. Proceed through the noun phrases from the document in reverse order, NPn, NPn-1, . . . , NP1. For each noun phrase NPj encountered, consider each preceding noun phrase NPi.

   (a) Let d = dist(NPi, NPj).

   (b) Let ci = class of NPi and cj = class of NPj.

   (c) If d < r and All_NPs_Compatible(ci, cj), then cj = ci ∪ cj.

All_NPs_Compatible(ci, cj)

1. For all NPa ∈ cj:

   (a) For all NPb ∈ ci:

       i. If dist(NPa, NPb) = ∞, then return false.

2. Return true.

Figure 2: Clustering Algorithm
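Figure 2 translates almost line for line into code. The sketch below is ours, not the authors': it assumes only a dist function over two noun phrases that returns math.inf for incompatible pairs, and it represents each coreference class as a set of NP indices.

```python
import math

def coreference_clustering(nps, dist, r):
    """Greedy clustering in the style of Figure 2.

    nps  : noun phrases in document order, NP1 .. NPn
    dist : distance over two noun phrases; math.inf marks incompatibility
    r    : clustering radius
    Returns a list of coreference classes (sets of NP indices).
    """
    n = len(nps)
    classes = {i: {i} for i in range(n)}   # each NP starts in its own class

    # Walk the NPs in reverse order; for each NPj, consider every
    # preceding NPi (nearest first) and try to merge their classes.
    for j in range(n - 1, -1, -1):
        for i in range(j - 1, -1, -1):
            ci, cj = classes[i], classes[j]
            if ci is cj:
                continue                   # already in the same class
            if dist(nps[i], nps[j]) < r and all_nps_compatible(nps, dist, ci, cj):
                merged = ci | cj           # cj = ci ∪ cj in the figure
                for k in merged:
                    classes[k] = merged

    # Collect the distinct class objects for output.
    seen, result = set(), []
    for c in classes.values():
        if id(c) not in seen:
            seen.add(id(c))
            result.append(c)
    return result

def all_nps_compatible(nps, dist, ci, cj):
    """False iff some cross-class pair of NPs is incompatible (distance ∞)."""
    return all(dist(nps[a], nps[b]) != math.inf
               for a in cj for b in ci)
```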

in position is dependent on the length of the document. For illustration, assume that this is less than r. Thus, dist(NP1, NP3) < r. Their coreference classes, c1 and c3, are then considered for merging. Because they are singleton classes, there is no additional possibility for conflict, and both noun phrases are merged into c1.

The algorithm then considers NP2. Dist(NP1, NP2) = 11.0 plus a small penalty for their difference in position. If this distance is ≥ r, they will not be considered coreferent, and the resulting equivalence classes will be: {{The chairman, he}, {Ms. White}}. Otherwise, the distance is < r, and the algorithm considers c1 and c2 for merging. However, c1 contains NP3, and, as calculated above, the distance from NP2 to NP3 is ∞. This incompatibility prevents the merging of c1 and c2, so the resulting equivalence classes would still be {{The chairman, he}, {Ms. White}}.

In this way, the equivalence classes grow in a flexible manner. In particular, the clustering algorithm automatically computes the transitive closure of the coreference relation. For instance, if dist(NPi, NPj) < r and dist(NPj, NPk) < r then (assuming no incompatible NPs), NPi, NPj, and NPk will be in the same class and considered mutually coreferent. In fact, it is possible that dist(NPi, NPk) ≥ r, according to the distance measure; but as long as that distance is not ∞, NPi can be in the same class as NPk. The distance measure operates on two noun phrases in isolation, but the clustering algorithm can and does make use of intervening NP information: intervening noun phrases can form a chain that links otherwise distant NPs. By separating context-independent and context-dependent computations, the noun phrase representation and distance metric can remain fairly simple and easily extensible as additional knowledge sources are made available to the NLP system for coreference resolution.
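To make the chaining behavior concrete, here is a toy run of the sketch above with an invented distance table: NP1 and NP3 are farther apart than r, but each is within r of NP2, so all three land in one class.

```python
# Invented distances for NP1, NP2, NP3 (indices 0, 1, 2); r = 4.
# NP1 and NP3 exceed the radius, but NP2 is close to both.
TABLE = {(0, 1): 3.0, (1, 2): 3.0, (0, 2): 5.0}

def toy_dist(a, b):
    return TABLE[tuple(sorted((a, b)))]

print(coreference_clustering([0, 1, 2], toy_dist, r=4))
# -> [{0, 1, 2}]: one class, even though dist(NP1, NP3) >= r
```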

    4 Evaluation

We developed and evaluated the clustering approach to coreference resolution using the dry run and formal evaluation MUC-6 coreference corpora. Each corpus contains 30 documents that have been annotated with NP coreference links. We used the dryrun data for development of the distance measure and selection of the clustering radius r and reserved the formal evaluation materials for testing. All results are reported using the standard measures of recall and precision or F-measure (which combines recall and precision equally). They were calculated automatically using the MUC-6 scoring program (Vilain et al., 1995).

Table 4 summarizes our results and compares them to three baselines. For each algorithm, we show the F-measure for the dryrun evaluation (column 2) and the formal evaluation (column 4). (The adjusted results are described below.) For the dryrun data set, the clustering algorithm obtains 48.8% recall and 57.4% precision. The formal evaluation produces similar scores: 52.7% recall and 54.6% precision. Both runs use r = 4, which was obtained by testing different values on the dryrun corpus. Table 5 summarizes the results on the dryrun data set for r values from 1.0 to 10.0.³ As expected, increasing r also increases recall, but decreases precision. Subsequent tests with different values for r on the formal evaluation data set also obtained optimal performance with r = 4. This provides partial support for our hypothesis that r need not be recalculated for new corpora.
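Since the F-measure here combines recall and precision equally, it is the harmonic mean F = 2PR/(P + R). A quick check (our own arithmetic, not from the paper's tables) reproduces the 53.6% figure quoted in the abstract from the formal-evaluation scores above.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (equal weighting)."""
    return 2 * recall * precision / (recall + precision)

print(f_measure(48.8, 57.4))  # dryrun scores -> ~52.8
print(f_measure(52.7, 54.6))  # formal scores -> ~53.6, as in the abstract
```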

The remaining rows in Table 4 show the performance of the three baseline algorithms. The first baseline marks every pair of noun phrases as coreferent, i.e. all noun phrases in the document form one class. This baseline is useful because it establishes an upper bound for recall on our clustering algorithm (67% for the dryrun and 69% for the formal evaluation). The second baseline marks as corefer-

³Note that r need not be an integer, especially when the distance metric is returning non-integral values.


Algorithm     Recall  Precision  F-measure
Clustering    53      55         54
RESOLVE       44      51         47
Best MUC-6    59      72         65
Worst MUC-6   36      44         40

Table 6: Results on the MUC-6 Formal Evaluation

large number of non-learning approaches to coreference resolution. Table 6 provides a comparison of our results to the best and worst of these systems. Most implemented a series of linguistic constraints similar in spirit to those employed in our system. The main advantage of our approach is that all constraints and preferences are represented neatly in the distance metric (and radius r), allowing for simple modification of this measure to incorporate new knowledge sources. In addition, we anticipate being able to automatically learn the weights used in the distance metric.
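The full definition of the distance metric falls on a page missing from this transcript, but the discussion implies a weighted sum of term functions, with hard constraints contributing an infinite distance. The sketch below is our own guess at that shape (term names and weights invented), reusing the NounPhrase placeholder from Section 3.1; learning the weights would then amount to fitting the numbers in TERMS.

```python
import math

# Invented terms and weights; the paper's actual metric and features are
# defined on pages not reproduced here. Each term maps a pair of
# NounPhrase objects to a number; math.inf encodes a hard constraint.
def number_term(a, b):
    return 0.0 if a.number == b.number else math.inf   # number agreement

def words_term(a, b):
    wa, wb = set(a.words), set(b.words)
    return 1.0 - len(wa & wb) / max(len(wa | wb), 1)   # lexical overlap

TERMS = [(1.0, number_term), (1.0, words_term)]

def distance(np_a, np_b):
    """Weighted sum of term functions; inf short-circuits as incompatible."""
    total = 0.0
    for weight, term in TERMS:
        value = term(np_a, np_b)
        if math.isinf(value):
            return math.inf
        total += weight * value
    return total
```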

There is also a growing body of work on the narrower task of pronoun resolution. Azzam et al. (1998), for example, describe a focus-based approach that incorporates discourse information when resolving pronouns. Lappin and Leass (1994) make use of a series of filters to rule out impossible antecedents, many of which are similar to our ∞-incompatibilities. They also make use of more extensive syntactic information (such as the thematic role each noun phrase plays), and thus require a fuller parse of the input text. Ge et al. (1998) present a supervised probabilistic algorithm that assumes a full parse of the input text. Dagan and Itai (1991) present a hybrid full-parse/unsupervised learning approach that focuses on resolving it. Despite a large corpus (150 million words), their approach suffers from sparse data problems, but works well when enough relevant data is available. Lastly, Cardie (1992a; 1992b) presents a case-based learning approach for relative pronoun disambiguation.

Our clustering approach differs from this previous work in several ways. First, because we only require the noun phrases in any input text, we do not require a full syntactic parse. Although we would expect increases in performance if complex noun phrases were used, our restriction to base NPs does not reflect a limitation of the clustering algorithm (or the distance metric), but rather a self-imposed limitation on the preprocessing requirements of the approach. Second, our approach is unsupervised and requires no annotation of training data, nor a large corpus for computing statistical occurrences. Finally, we handle a wide array of noun phrase coreference, beyond just pronoun resolution.

    6 Conclusions and Future Work

We have presented a new approach to noun phrase coreference resolution that treats the problem as a clustering task. In an evaluation on the MUC-6 coreference resolution data set, the approach achieves very promising results, outperforming the only other corpus-based learning approach and producing recall and precision scores that place it firmly between the best and worst coreference systems in the evaluation. In contrast to other approaches to coreference resolution, ours is unsupervised and offers several potential advantages over existing methods: no annotated training data is required, the distance metric can be easily extended to account for additional linguistic information as it becomes available to the NLP system, and the clustering approach provides a flexible mechanism for combining a variety of constraints and preferences to impose a partitioning on the noun phrases in a text into coreference equivalence classes.

Nevertheless, the approach can be improved in a number of ways. Additional analysis and evaluation on new corpora are required to determine the generality of the approach. Our current distance metric and noun phrase instance representation are only first, and admittedly very coarse, approximations to those ultimately required for handling the wide variety of anaphoric expressions that comprise noun phrase coreference. We would also like to make use of cues from centering theory and plan to explore the possibility of learning the weights associated with each term in the distance metric. Our methods for producing the noun phrase feature vector are also overly simplistic. Nevertheless, the relatively strong performance of the technique indicates that clustering constitutes a powerful and natural approach to noun phrase coreference resolution.

    7 Acknowledgments

This work was supported in part by NSF Grants IRI 9624639 and EIA 9703470, and a National Science Foundation Graduate fellowship. We would like to thank David Pierce for his formatting and technical advice.

References

Chinatsu Aone and William Bennett. 1995. Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pages 122-129. Association for Computational Linguistics.

S. Azzam, K. Humphreys, and R. Gaizauskas. 1998. Evaluating a Focus-Based Approach to Anaphora Resolution. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and COLING-98, pages 74-78. Association for Computational Linguistics.

C. Cardie and D. Pierce. 1998. Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and COLING-98, pages 218-224, University of Montreal, Montreal, Canada. Association for Computational Linguistics.

C. Cardie. 1992a. Corpus-Based Acquisition of Relative Pronoun Disambiguation Heuristics. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 216-223, University of Delaware, Newark, DE. Association for Computational Linguistics.

C. Cardie. 1992b. Learning to Disambiguate Relative Pronouns. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 38-43, San Jose, CA. AAAI Press / MIT Press.

I. Dagan and A. Itai. 1991. A Statistical Filter for Resolving Pronoun References. In Y. A. Feldman and A. Bruckstein, editors, Artificial Intelligence and Computer Vision, pages 125-135. Elsevier Science Publishers, North Holland.

C. Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.

N. Ge, J. Hale, and E. Charniak. 1998. A Statistical Approach to Anaphora Resolution. In Eugene Charniak, editor, Proceedings of the Sixth Workshop on Very Large Corpora, pages 161-170, Montreal, Canada. ACL SIGDAT.

S. Lappin and H. Leass. 1994. An Algorithm for Pronominal Anaphora Resolution. Computational Linguistics, 20(4):535-562.

M. Marcus, M. Marcinkiewicz, and B. Santorini. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313-330.

J. McCarthy and W. Lehnert. 1995. Using Decision Trees for Coreference Resolution. In C. Mellish, editor, Proceedings of the Fourteenth International Conference on Artificial Intelligence, pages 1050-1055.

MUC-6. 1995. Proceedings of the Sixth Message Understanding Conference (MUC-6). Morgan Kaufmann, San Francisco, CA.

J. R. Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.

M. Vilain, J. Burger, J. Aberdeen, D. Connolly, and L. Hirschman. 1995. A Model-Theoretic Coreference Scoring Scheme. In Proceedings of the Sixth Message Understanding Conference (MUC-6), pages 45-52, San Francisco, CA. Morgan Kaufmann.

