Bar Ilan University
The Department of Computer Science
Using Semantic Knowledge for
Coreference Resolution
by
Chen Erez
Submitted in partial fulfillment of the requirements for the Master's
Degree in the Department of Computer Science, Bar-Ilan University
Ramat Gan, Israel, June 2009 (Sivan 5769)
This work was carried out under the supervision of Dr. Ido Dagan
The Department of Computer Science
Bar-Ilan University
Israel
Acknowledgements
This thesis has been accomplished with the help of a number of
people, and I wish to express my heartfelt thanks to them.
I am grateful to Dr. Ido Dagan at Bar-Ilan University for his
supervision of the thesis. It has been a great pleasure working with him, and I
have learned much.
I would like to thank Shachar Mirkin, Eyal Shnarch and Roy Bar Haim
for their guidance and helpful comments throughout the work.
I would like to thank our NLP group for their support.
I would like to thank my family and friends for their understanding,
support and encouragement.
Table of contents
Abstract
1. Introduction
2. Background and related work
3. Coreference resolution using semantic features
3.1 Baseline coreference resolution system
3.2 Additional semantic features
4. Results and analysis
4.1 The dataset
4.2 Evaluation measures
4.3 Experimental setup
4.4 Results
4.5 Detailed analysis
5. Conclusion and future work
6. References
7. Appendices
7.1 Decision trees example
Abstract
Coreference resolution is the process of matching pairs of natural
language expressions that refer to the same entity in the real world. For
example, consider the sentence: "Yesterday Danny had a birthday party and
he got a lot of presents". 'Danny' and 'he' refer to the same entity in the real
world; that is, Danny got a lot of presents.
This work deals mainly with coreference resolution between two noun
phrases in a document. The goal is to determine whether some noun phrase, called
the anaphor, refers to a preceding noun phrase in the text, called the
antecedent. In this thesis, we focus on a supervised machine learning approach
for coreference resolution, using classifiers to determine whether two noun
phrases are coreferent or not. After classifying all candidate coreference pairs,
we group together all those noun phrases in the text that were found as
referring to the same real world entity, based on the equivalence relation
induced by the pair-wise classification (due to the transitivity of the
coreference relation). This process is called coreference chaining, and the set
of noun phrases in the text that refer to the same real world entity are called a
coreference chain.
Learning approaches for coreference resolution try to improve
performance mainly by enhancing the feature set of the coreference classifier.
Soon et al. (2001) and Ng and Cardie (2002) introduced coreference resolution
systems with multiple simple lexical, grammatical and positional features.
However, their feature sets lack semantic knowledge – specifically,
recognizing whether the antecedent-anaphor candidate pair is related
semantically, for example through hypernymy or synonymy, which is most relevant
for coreference resolution.
This thesis presents new enhancements for coreference resolution,
adding semantic knowledge to the machine learning approach. The main
idea is using semantic resources in order to recognize relevant semantic
relations between the antecedent-anaphor candidates. The semantic
resources we use are the web, queried with Google through a pattern-based
approach, WordNet and Wikipedia. For each of these resources we
have created a new feature group.
We demonstrate that semantic knowledge is relevant to the
coreference resolution task by running the coreference resolution system
with a different group of features each time, comparing the results and
presenting the learned decision trees. We also give a thorough error
analysis: we analyze each resource and describe its usefulness and its
types of errors, revealing many directions for further research.
1. Introduction
The Coreference Resolution Task
Coreference resolution is the process of matching pairs of natural
language expressions that refer to the same entity in the real world. In the
process of identifying (resolving) coreference, we use the linguistic concept of
anaphora. Anaphora resolution is the process of finding the antecedent (or the
referent) of an anaphoric noun phrase. An anaphoric noun phrase (the
anaphor) is a noun phrase that refers to a previous noun phrase (the
antecedent), which has already appeared in the text. For example, consider
the following sentences:
"John gave Michael the Harry Potter book. He told Michael that the
book is marvelous."
In the above example we mark two noun phrases, John and He. John is
the antecedent and He is the anaphor. They corefer, that is, they refer to the
same real world entity (John). Other coreferring noun phrases are the two
mentions of Michael and the noun phrase 'the Harry Potter book' which is the
antecedent of the noun phrase 'the book'.
The next step is to create coreference chains. A coreference chain is based on
an equivalence relation, since coreference is reflexive, symmetric and transitive.
For instance, a coreference chain with three noun phrases includes an
anaphoric noun phrase A, which refers to an anaphoric noun phrase B (B is the
referent of A), where B refers to a noun phrase C (C is the referent of B).
Obviously, A, B and C all refer to the same real world entity.
The Importance of Coreference Resolution in NLP
Coreference resolution is a key task in many natural language
processing applications such as question answering and information retrieval.
Let us demonstrate the benefit of having a coreference resolution system as a
part of a question answering system:
Generally, the question-answering (QA) task is concerned with finding
an answer to a natural language question in a large text collection, or
determining that an answer cannot be found. The following example shows
a typical question and the benefit of a coreference resolution system:
Question: Where was Mozart born?
A part of the retrieved paragraph (identified as talking about Mozart):
"Mozart was a great musician … the musician was born in Salzburg…"
If we correctly identify Mozart as the antecedent of the musician, this assists in
answering the question.
The machine learning approach for coreference resolution
In our research we focus on a supervised machine learning approach,
using classifiers to determine whether two noun phrases are coreferent. When
referring to the coreference resolution task as a classification problem, we
refer to a binary classification problem: coreferent or not coreferent.
A coreference classification system includes the following steps and
definitions. First, we preprocess the text and perform various grammatical
and syntactic analyses, including the marking of noun phrases. Second, we prepare
a list of noun phrase pairs, where each pair contains an anaphor candidate and
an antecedent candidate. This list of noun phrase pairs defines our
classification instances. Third, we extract different features describing the
anaphor candidate, the antecedent candidate and different relationships
between them. This step is called feature extraction, and the different features
make up the feature set. The representation of an anaphor and an antecedent
noun phrase pair by these features is called a feature vector. We denote the
feature vector of a noun phrase pair as a coreference instance. Finally, a
classification algorithm creates a classification model, by tuning different
weights for these features, giving them preferences, or combining them
together to create decision trees. The created model is referred to as a
coreference classifier. The task of a trained coreference classifier is to
determine whether a coreference instance is coreferent or not.
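To make this pipeline concrete, the following sketch (in Python) illustrates instance creation, pair-wise classification and coreference chaining via the transitive closure of the positive decisions. It is a minimal sketch, not the thesis implementation: extract_features is a hypothetical function returning a feature vector for a candidate pair, and labels stands for the output of any trained classifier.

    def make_instances(noun_phrases, extract_features):
        # Build one classification instance per candidate pair: every noun
        # phrase is paired with every preceding noun phrase in the text.
        pairs, instances = [], []
        for j, anaphor in enumerate(noun_phrases):
            for i in range(j):
                pairs.append((noun_phrases[i], anaphor))
                instances.append(extract_features(noun_phrases[i], anaphor))
        return pairs, instances

    def chain(pairs, labels):
        # Coreference chaining: group mentions by the transitive closure of
        # the pairs classified as coreferent (a simple union-find).
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for (antecedent, anaphor), coreferent in zip(pairs, labels):
            if coreferent:
                parent[find(antecedent)] = find(anaphor)
        chains = {}
        for mention in parent:
            chains.setdefault(find(mention), []).append(mention)
        return list(chains.values())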
Using semantic resources for coreference resolution
Many pairs of coreferring noun phrases have a semantic relationship.
For example, the pair of words 'pet' and 'dog' have a hypernym relation
and are likely to be coreferent in a text like:
"Danny has a dog. He likes his pet very much."
Thus, extracting the semantic relation of the candidate antecedent-
anaphor pair seems to be useful for determining whether the candidate
pair is coreferent or not. More specifically, the most relevant semantic
relations are hypernymy and synonymy. The presence or absence of a strong
semantic relation between the pair of words helps to decide whether these
words corefer.
Therefore, in this research we investigate three significant semantic
resources for extracting hypernym and synonym relations: the web, queried
with Google through a pattern-based approach; WordNet; and Wikipedia.
The pattern-based module aims to identify joint occurrences of two
words within particular patterns, which typically indicate concrete semantic
relationships. For example, using the pattern 'NP1 and other NP2' (where NP
stands for Noun Phrase), a hyponymy relationship between copper and goods
can be identified from the sentence: “The Egyptians came hither from the land
of the Blacks bringing gold, which they exchange for copper and other
goods”. Thus, in our research we apply this module to each
antecedent-anaphor candidate pair, measuring the occurrences of each pair
with each of our patterns, as described in section 3.2.1.1.
WordNet (http://wordnet.princeton.edu/) is a large lexical database of English, in which nouns, verbs,
adjectives and adverbs are grouped into sets of cognitive synonyms (synsets),
each expressing a distinct concept. Synsets are interlinked by means of
semantic and lexical relations. Most synsets are connected to other synsets via
a number of semantic relations. These relations vary based on the type of
word, and include hypernym and synonym. In addition, Snow et al. (2006)
presented a probabilistic model for taxonomy induction which considers as
features paths in parse trees between related taxonomy nodes. They show that
the best performing taxonomy they acquired was the one adding 400,000
hyponyms to WordNet. Using these two resources seems to be significant for
extracting relevant semantic relations, thus creating new semantic features for
the coreference classifier, as described in 3.2.2. Notice that in the basic
features of Soon et al. (2001), WordNet is used for finding the semantic class of the
noun phrases (i.e., 'person', 'organization', 'date', 'time', 'money', etc.). Thus,
they only created a binary feature which determines whether the first senses of
a candidate pair are in the same semantic class or not, but did not check for
semantic relations such as synonymy or hypernymy.
Wikipedia (http://www.wikipedia.org/) is another resource we used. Utilizing Wikipedia is based
on the work of Shnarch and Dagan (2009), for extracting lexical rules from
Wikipedia by examining the context, links and titles of Wikipedia pages. They
present the extraction of a large scale rule base from Wikipedia designed to
cover a wide scope of the lexical reference relation. They examine the
potential of definition sentences as a source for lexical reference rules, since
when writing a concept definition, one aims to formulate a concise text that
includes the most characteristic aspects of the defined concept. They show that
a definition is a promising source for reference relations between the defined
concept and the definition terms. In addition, they extract lexical reference
rules from Wikipedia redirect and hyperlink relations. As their work is
dedicated to extracting lexical reference rules, we have adapted their method
to the coreference resolution task by adding semantic features, specifically
whether a candidate pair has a hypernym relation extracted from Wikipedia
(see section 3.2.3).
In this thesis we show that using only the semantic features achieves
results close to the base features of a standard coreference resolution system
(Soon 2001). In this case, the built tree (Appendix 1) includes semantic
features from all used resources, thus demonstrating that all semantic features
are relevant to the coreference resolution task. Furthermore, using
all semantic features together we obtain a higher score than using each of
the semantic feature groups alone.
Although using the base-system features together with the semantic
features did not obtain a higher F-measure, we give a thorough error analysis.
We analyze each resource and describe its usefulness and its types of errors,
revealing many directions for further research.
The rest of the thesis is outlined as follows: Section 2 gives background
about the coreference resolution task definition and importance, and related
works using the semantic resources. Section 3 explains in detail our
coreference application, including the way we utilize our new semantic
features. Section 4 describes our experiments and the obtained results and
gives a detailed error analysis concerning each of our semantic resources.
Section 5 concludes and presents suggested future work.
2. Background and related work
In this section we describe the work done in the different areas related
to ours: first, we describe coreference resolution using a machine learning
approach (2.1) and BART (2.1.1), the baseline toolkit for coreference resolution
which we have used. Next, we describe work on coreference resolution
using the web (2.2). Then, we describe the pattern-based module for
lexical entailment acquisition (2.3) and the extraction of lexical reference rules
from Wikipedia (2.5), since we applied these two works to coreference
resolution.
2.1 Coreference resolution using a machine learning approach
In the machine learning based approach to coreference resolution
(Soon et al., 2001; Ng & Cardie, 2002a; Kehler et al., 2004; Yang et al.,
2004, and others), we consider candidate <anaphor, antecedent> pairs,
and our task is to classify them as COREFERENT or NON-COREFERENT.
For each pair we collect different characteristics, called the
feature set, which relate to the coreference relationship. Using these
features, a machine learning classification algorithm builds the
classifier by training on a training set that contains annotated examples of
coreference relations. The training and testing instances are typically created
following the method of Soon et al. (2001). We create a positive training
instance from each pair of adjacent coreferent noun phrases <Pi,Pj>. Negative
training instances are obtained by pairing the anaphor <Pj> with any
noun phrase occurring between the anaphor <Pj> and the antecedent <Pi>.
During testing, each text is processed from left to right: each noun phrase is
paired as anaphor <Pj> with each preceding <Pk>, from right to left, until a
pair labeled as coreferent is output or the beginning of the document is
reached. This is because we assume there is at most one antecedent. The
main machine-learning classification algorithms, which were used for the
coreference resolution task, are decision trees (Quinlan 1993), maximum
entropy (Berger et al. 1996) and RIPPER (Cohen, 1995).
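The following sketch illustrates the instance creation and closest-first decoding just described. It is a simplified rendering of the method of Soon et al. (2001), under the assumption that mentions is the ordered list of noun phrases and that gold maps each mention to its gold chain id (None when unannotated); the helper names are ours.

    def training_instances(mentions, gold):
        # Positive instance: each anaphor with its closest preceding coreferent
        # antecedent; negatives: the anaphor paired with every noun phrase
        # between that antecedent and the anaphor.
        instances = []
        for j, anaphor in enumerate(mentions):
            if gold.get(anaphor) is None:
                continue
            antecedent = None
            for i in range(j - 1, -1, -1):
                if gold.get(mentions[i]) == gold[anaphor]:
                    antecedent = i
                    break
            if antecedent is None:
                continue
            instances.append((mentions[antecedent], anaphor, True))
            for k in range(antecedent + 1, j):
                instances.append((mentions[k], anaphor, False))
        return instances

    def resolve(mentions, classify):
        # Decoding: scan candidates from right to left and stop at the first
        # pair classified as coreferent (we assume at most one antecedent).
        links = []
        for j in range(len(mentions)):
            for i in range(j - 1, -1, -1):
                if classify(mentions[i], mentions[j]):
                    links.append((i, j))
                    break
        return links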
Soon et al. (2001) implemented one of the first simple domain-
independent machine learning coreference resolution systems. They built
a coreference resolution system based on a C4.5 decision tree classifier,
which uses only twelve features. Their features include grammatical
agreement features, syntactical features, positional features, string match
features, a proper name match feature, and a WordNet
semantic feature. In the next subsection we describe a modular toolkit
which implements this feature set.
2.1.1 The BART toolkit
BART is a modular toolkit for coreference resolution, which we use in
our research as the baseline coreference resolution application (more details in
section 3.1). An open-source version of BART is available from
http://www.sfs.uni-tuebingen.de/~versley/BART/.
BART implements the twelve features of Soon et al. (2001), described in Table
1. BART reaches 64.8% F1 on the MUC6 corpus and 62.9% F1
on the MUC7 corpus (for a description of the MUC6 and MUC7 corpora, see
subsection 4.1).
Num | Name | Values
1 | STRING MATCH | TRUE if REi and REj have the same spelling; else FALSE.
2 | ALIAS | TRUE if one RE is an alias of the other; else FALSE.
3 | I PRONOUN | TRUE if REi is a pronoun; else FALSE.
4 | J PRONOUN | TRUE if REj is a pronoun; else FALSE.
5 | J DEF | TRUE if REj starts with "the"; else FALSE.
6 | J DEM | TRUE if REj starts with "this", "that", "these", or "those"; else FALSE.
7 | NUMBER | TRUE if REi and REj agree in number; else FALSE.
8 | GENDER | UNKNOWN if either REi or REj has an undefined gender; TRUE if both are defined and agree; else FALSE.
9 | PROPER NAME | TRUE if both REi and REj are proper names; else FALSE.
10 | APPOSITIVE | TRUE if REj is in apposition with REi; else FALSE.
11 | WN CLASS | UNKNOWN if either REi or REj has an undefined WordNet semantic class (classes such as "person", "organization", "location", "date", "time"; see subsection 3.2.2.2); TRUE if both have a defined one and it is the same; else FALSE.
12 | DISTANCE | The number of sentences between REi and REj.
Table 1 – the features implemented in the BART toolkit
BART's design provides effective separation of concerns across
several tasks, including engineering new features that use different
sources of knowledge, designing improved or specialized preprocessing
methods, and improving the way that coreference resolution is mapped to
a machine learning problem.
BART's architecture includes four phases: preprocessing, feature
extraction, learning, and training/testing. Preprocessing consists of marking up
noun chunks and named entities, as well as additional information such as
part-of-speech tags, and merging this information into markables, which are the
starting point for the noun chunk mentions used by the coreference resolution.
In the feature extraction phase, each candidate pair of anaphor and antecedent
candidates is represented as a PairInstance object, which is enriched with
classification features by feature extractors realized as separate
classes, allowing for their independent development. For the learning, the
module uses the functionality of the WEKA machine learning toolkit. In the
training phase, the pairs that are to be used as training examples are
selected in a process of sample selection, whereas in the testing phase, it has to
be decided which pairs are given to the decision function and how to
group mentions into equivalence classes given the classifier decisions.
2.2 Using the web for coreference resolution
Most works using the machine learning approach build a
classifier trained on a corpus such as MUC or ACE. However, the size of
these corpora leads to data sparseness. To address this problem, we propose
using the web, the largest corpus available. Although the web has
already been used for coreference resolution, it has been used only for two
specific types of coreference: pronoun anaphora resolution (Yang et al.,
2005) and other-anaphora (Modjeska et al., 2003), as explained below.
Pronoun resolution
Pronoun resolution is a subtask of coreference resolution, in which
the antecedent must be found only for a pronoun anaphor.
In the work of Yang et al. (2005), for each pair of an anaphor and a candidate
antecedent, statistics are obtained by querying a web search engine such as
Google or AltaVista. In their method, three predicate-argument relationships
are considered: possessive-noun, subject-verb and verb-object. For these three
types of predicate-argument relationships, queries are constructed for each
candidate antecedent NPcandi, in the forms of “NPcandi VP” (for subject-verb),
“VP NPcandi” (for verb-object), and “NPcandi’s NP” or “NP of NPcandi” (for
possessive-noun).
For example, consider the sentence:
"Several experts suggested that IBM’s accounting grew much more
liberal since the mid 1980s as its business turned sour."
For the pronoun “its” and the candidate “IBM”, the two generated
queries are “business of IBM” and “IBM’s business”. To reduce data
sparseness, in an initial query only the nominal or verbal heads of the
candidate antecedent are retained. Also, each named entity (such as a
company, person or location) is replaced by the corresponding common
noun (e.g., “IBM’s business” OR “company’s business” and “business of
IBM” OR “business of company”).
The semantic compatibility of the candidate with the anaphor
can be represented simply in terms of frequency:
(1) StatSem(candi, ana) = count(candi, ana)
where count(candi, ana) is the hit count of the queries returned by the
search engine. Alternatively, it can be represented in terms of conditional
probability, P(candi, ana | candi), where the count of the pair is divided by
the count of the single candidate in the corpus. That is:
(2) StatSem(candi, ana) = count(candi, ana) / count(candi)
where count(candi) is the hit count of the query formed with only the
head of the candidate candi. In this way, the statistics are not biased
against candidates with lower frequency.
Thus, the values of Equation (1) and Equation (2) are used to create
new features for the coreference classifier. Their study shows that
the semantic compatibility obtained from the web significantly improves
the resolution of neutral pronouns.
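As a sketch, the two statistics can be computed as follows, assuming a hypothetical hits(query) helper that returns the search engine's hit count for a query:

    def stat_sem(candidate_head, anaphor_context, hits):
        # hits(query) -> hit count from the search engine (hypothetical helper).
        pair_count = hits(f'"{candidate_head} {anaphor_context}"')   # e.g. "IBM's business"
        cand_count = hits(f'"{candidate_head}"')
        raw = pair_count                                             # Equation (1)
        normalized = pair_count / cand_count if cand_count else 0.0  # Equation (2)
        return raw, normalized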
Other anaphora
Other-anaphora is a subtask of coreference resolution, aiming to
resolve 'other-anaphors', that is, referential noun phrases with the
modifiers 'other' or 'another'. For example, consider the following
sentences:
(1) An exhibition of American design and architecture opened in
September in Moscow and will travel to eight other Soviet cities.
(2) The alumni director of a Big Ten university: “I’d love to see
sports cut back and so would a lot of my counterparts at other
schools"
In example (1), 'other Soviet cities' means 'Soviet cities other than Moscow',
and in example (2), 'other schools' means 'schools other than the
above-mentioned Big Ten university'.
Modjeska et al. (2003) present a machine learning approach to
other-anaphora, using a Naive Bayes (NB) classifier. They show the
benefit of integrating web frequency counts, obtained for syntactic
patterns specific to other-anaphora, as an additional feature in the NB
algorithm.
They used the following pattern for other-anaphora:
1. (N1 {sg} OR N1 {pl}) and other N2 {pl}
For common noun antecedents, they instantiate the pattern by
substituting N1 with each possible antecedent from set A, and N2 with the
anaphor, because normally N1 is a hyponym of N2 in pattern (1), and the
antecedent is a hyponym of the anaphor. For example, an instantiated
pattern for sentence (2) above is: "(university OR universities) and
other schools".
For NE antecedents, they also instantiate pattern (1) by substituting N1
with the NE category of the antecedent, and N2 with the anaphor.
In addition, for NE antecedents they used the pattern:
2. N1 and other N2 {pl}
where N1 is instantiated with the original antecedent and N2 with
the anaphor. For example, in the sentence:
"Will Quinlan had not inherited a damaged retinoblastoma suppressor
gene and, therefore, faced no more risk than other children"
the instantiation gives:
"Will Quinlan and other children"
They submit these instantiations as queries to the Google search
engine, and the resulting frequencies are then used for calculating new
features for the NB classifier. The new features raise the other-anaphora
F-measure from 45.5% to 56.9%.
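As an illustration, the instantiation of pattern (1) can be sketched as follows; pluralize is a hypothetical morphological helper, and the exact query syntax is an assumption:

    def other_anaphora_query(antecedent, anaphor, pluralize):
        # Instantiate "(N1_sg OR N1_pl) and other N2_pl".
        n1_sg, n1_pl = antecedent, pluralize(antecedent)
        n2_pl = pluralize(anaphor)
        return f'("{n1_sg}" OR "{n1_pl}") and other "{n2_pl}"'

    # other_anaphora_query("university", "school", pluralize) ->
    #   '("university" OR "universities") and other "schools"'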
2.3 The pattern-based module for relation extraction
In their work, Mirkin and Dagan (2006) use a pattern-based
approach for lexical entailment acquisition. Their general pattern-based
extraction module receives as input a set of lexical-syntactic patterns (as
in Table 1) and either a target term or a candidate pair of terms. It then
searches the web for occurrences of the patterns with the input term(s).
1 NP1 such as NP2
2 Such NP1 as NP2
3 NP1 or other NP2
4 NP1 and other NP2
5 NP1 ADV known as NP2
6 NP1 especially NP2
7 NP1 like NP2
8 NP1 including NP2
9 NP1-sg is (a OR an) NP2-sg
10 NP1-sg (a OR an) NP2-sg
11 NP1-pl are NP2-pl
Table 1: The patterns Mirkin and Dagan (2006) used for lexical
entailment acquisition
A small set of queries is created for each pattern-terms
combination, in order to retrieve as much relevant data as possible with as
few queries as possible. Each pattern has two variable slots to be instantiated
by candidate terms for the sought relation.
In their research, the extraction module can be used in two modes:
(a) receiving a single target term as input and searching for instantiations
of the other variable, to identify candidate related terms; (b) receiving a
pair of terms and searching for pattern instances with both terms, in order to
validate and collect information about the relationship between the terms.
Google provides a useful tool for these purposes, as it allows using
a wildcard which may match either an un-instantiated term or optional
words such as modifiers.
For example, the query
"such ** as *** (war OR wars)"
is one of the queries created for the input pattern such NP1 as NP2 and
the input target term war, allowing new terms to match the first pattern
variable.
For the candidate entailment pair war- struggle, the first variable is
instantiated as well. The corresponding query would be:
"such *(struggle OR struggles) as *** (war OR wars)”.
The automatically constructed queries, covering the possible
combinations of multiple wildcards, are submitted to Google
(http://www.google.com/apis/) and a specified number of snippets is
downloaded. The snippets are processed using a word splitter and a sentence
splitter (available from the University of Illinois at Urbana-Champaign,
http://l2r.cs.uiuc.edu/~cogcomp/tools.php), and the sentences are processed
with the OpenNLP (http://opennlp.sourceforge.net/) POS tagger and NP
chunker. Then, pattern-specific regular expressions are used to extract
relationships from the chunked sentences, by verifying that the instantiated
pattern indeed occurs in the sentence and identifying variable instantiations.
In our research, we apply a similar pattern-based module to
coreference resolution (details in section 3.2.1).
2.4 Semantic features for coreference resolution
In this subsection we focus on the use of semantic features in the
coreference classifier. We give examples of semantic features from
the work of Ponzetto and Strube (2006).
2.4.1 WordNet
Ponzetto and Strube (2006) enrich the semantic information
available to the coreference classifier by using semantic similarity
measures based on the WordNet taxonomy (Pedersen et al., 2004). The
measures they use include path length based measures (Rada et al.,
1989; Wu & Palmer, 1994; Leacock & Chodorow, 1998), as well as ones
based on information content (Resnik, 1995; Jiang & Conrath, 1997; Lin,
1998).
In their work, the measures are obtained by computing the
similarity scores between the head lemmas of each potential antecedent-
anaphor pair. In order to overcome the sense disambiguation problem,
they factorize over all possible sense pairs: given a candidate pair, they
take the cross product of each antecedent and anaphor sense to form pairs
of synsets. For each similarity measure they compute the similarity score
for all synset pairs, and create the following features:
WN SIMILARITY BEST – the highest similarity score over all
senses of the antecedent and anaphor.
WN SIMILARITY AVG – the average similarity score over all
senses of the antecedent and anaphor.
Pairs containing a noun which cannot be mapped to WordNet synsets are
assumed to have a null similarity measure.
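The following sketch reproduces the two features with NLTK's WordNet interface and the path-length measure; the tooling is an assumption (Ponzetto and Strube used the WordNet::Similarity package), and only noun senses are considered.

    from itertools import product
    from nltk.corpus import wordnet as wn

    def wn_similarity_features(antecedent_head, anaphor_head):
        # Cross product of all noun senses of both heads; each pair of
        # synsets is scored with the path-length similarity measure.
        scores = [s1.path_similarity(s2) or 0.0
                  for s1, s2 in product(wn.synsets(antecedent_head, pos=wn.NOUN),
                                        wn.synsets(anaphor_head, pos=wn.NOUN))]
        if not scores:   # a head that cannot be mapped to any synset
            return {"WN SIMILARITY BEST": 0.0, "WN SIMILARITY AVG": 0.0}
        return {"WN SIMILARITY BEST": max(scores),
                "WN SIMILARITY AVG": sum(scores) / len(scores)}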
2.4.2 Wikipedia
Wikipedia is a multilingual web-based free-content encyclopedia.
The English version, as of 14 February 2006, contained 971,518 articles
with 16.8 million internal hyperlinks, thus providing a knowledge resource
with large coverage. In addition, it provides a taxonomy by
means of the category feature: articles can be placed in one or more
categories, which are further categorized to provide a category taxonomy.
In practice, the taxonomy is not designed as a strict hierarchy or tree of
categories, but allows multiple categorization schemes to co-exist
simultaneously. Because each article can appear under more than one
category, and each category can appear in more than one parent category,
the categories do not form a tree structure, but a more general directed
graph.
Ponzetto and Strube (2006) used Wikipedia as follows: given the
candidate referring expressions <Pi> and <Pj> they pull the Wikipedia
pages they refer to. This is accomplished by querying the page titled as
the head lemma. They follow all redirects and check for disambiguation
pages, i.e. pages for ambiguous entries which contain links only (e.g.
Lincoln). If a disambiguation page is hit, they first get all the hyperlinks in
the page. If a link containing the other queried noun is found (i.e., a link
containing president in the Lincoln page), the linked page (President of
the United States) is returned; otherwise, the first article linked in the
disambiguation page is returned. Given a candidate coreference pair
<Pi,Pj> and the related Wikipedia pages <PREi, PREj> they point to,
retrieved by querying pages with titles <TREi, TREj>, they extract the
following features:
I/J GLOSS CONTAINS: U if no Wikipedia page titled TREi/j is
available. T if the first paragraph of text of PREi/j contains
TREj/i ; else F.
I/J RELATED CONTAINS: U if no Wikipedia page titled TREi/j
is available. T if at least one Wikipedia hyperlink of PREi/j
contains TREj/i ; else F.
I/J CATEGORIES CONTAINS: U if no Wikipedia page titled as
TREi/j is available. T if the list of categories PREi/j belongs to
contains TREj/i ; else F.
GLOSS OVERLAP: the overlap score between the first paragraphs
of text of PREi and PREj. Following Banerjee & Pedersen (2003),
it is computed by summing m² for each m-word phrasal overlap.
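A simplified sketch of such a phrasal overlap score, in the spirit of Banerjee & Pedersen (2003), where each maximal m-word overlap contributes m²; this is an approximation, not Ponzetto and Strube's exact implementation.

    def gloss_overlap(gloss1, gloss2):
        # Greedily remove the longest common word sequence and add the
        # square of its length, until no common sequence remains
        # (quadratic, but the compared paragraphs are short).
        w1, w2 = gloss1.lower().split(), gloss2.lower().split()
        score = 0
        while True:
            best = None
            for i in range(len(w1)):
                for j in range(len(w2)):
                    k = 0
                    while (i + k < len(w1) and j + k < len(w2)
                           and w1[i + k] == w2[j + k]):
                        k += 1
                    if k and (best is None or k > best[2]):
                        best = (i, j, k)
            if best is None:
                break
            i, j, k = best
            score += k * k
            del w1[i:i + k]
            del w2[j:j + k]
        return score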
In addition, they used three more features based on the
Wikipedia category graph, following Rada et al. (1989).
Thus, most of their features are based on the Wikipedia category
graph, unlike the work of Shnarch and Dagan (2009) (described in the next
subsection), on which we based our Wikipedia features for the
coreference classifier (section 3.2.3).
2.5 Using Wikipedia for extracting lexical reference rules
A common need in applied semantic inference is to infer the
meaning of a target term from other terms in a text. For example, a Question
Answering system may infer the answer to a question regarding luxury cars
from a text mentioning Bentley, which provides a concrete reference to the
sought meaning.
Aiming to capture such lexical inferences, Shnarch and Dagan (2009)
followed Glickman et al. (2006), who coined the term lexical reference (LR)
to denote references in text to the specific meaning of a target term. They
analyzed the dataset of the First Recognizing Textual Entailment Challenge
(Dagan et al., 2006), which includes examples drawn from seven different
application scenarios. It was found that an entailing text indeed includes a
concrete reference to practically every term in the entailed (inferred) sentence.
Thus, the goal of Shnarch and Dagan (2009) is to utilize the broad knowledge of
Wikipedia to extract a knowledge base of lexical reference rules. Each
Wikipedia article provides a definition for the concept denoted by the title of
the article. As the most concise definition they take the first sentence of each
article, following Kazama and Torisawa (2007).
Since a concept definition usually employs more general terms than the
defined concept (Ide and Jean, 1993), the concept title is more likely to refer to
terms in its definition rather than vice versa. Therefore the title of the
Wikipedia article is taken as the left side of the constructed rule while an
extracted definition term is taken as its right side. As Wikipedia’s titles are
mostly noun phrases, the terms they extract as the right sides are the nouns and
noun phrases in the definition.
Their methods for extracting rules from Wikipedia are:
Be-Comp: They identify the 'is-a' pattern in the definition sentence
by extracting nominal complements of the verb 'be', taking them as the
right side of a rule whose left side is the article title.
All-N: The Be-Comp extraction method yields mostly hypernym
relations, which do not exploit the full range of lexical references
within the concept definition. Therefore, they further create rules for all
head nouns and base noun phrases within the definition.
Title Parenthesis: A common convention in Wikipedia to disambiguate
ambiguous titles is adding a descriptive term in parenthesis at the end
of the title, as in The Siren (Musical), The Siren (sculpture) and Siren
(amphibian). From such titles they extract rules in which the
descriptive term inside the parenthesis is the right side and the rest of
the title is the left side.
Redirect: Like any dictionary or encyclopedia, Wikipedia contains
redirect links that direct different search queries to the same article,
which has a canonical title. For instance, there are 86 different queries
that redirect the user to United States (e.g. U.S.A., America, Yankee
land). Redirect links are hand coded, specifying that both terms refer to
the same concept. They therefore generate a bidirectional entailment rule
for each redirect link.
Link: Wikipedia texts contain hyperlinks to articles. For each link they
generate a rule whose LHS is the linking text and RHS is the title of the
linked article. In this case they generate a directional rule, since links do
not necessarily connect semantically equivalent entities.
Based on this work and the described extraction methods, we derive new
features from Wikipedia, as described in section 3.2.3.
3. Coreference resolution using semantic features
In this section we describe the application of a range of semantic
resources to coreference resolution and the new features we have created
for the coreference classifier.
3.1 Baseline coreference resolution system
As our baseline system we use the BART toolkit, which
implements the twelve features used by Soon et al. (2001), described in 2.1.1.
These features include syntactic features, grammatical agreement
features, positional features, string match features, a proper name match
feature, and a basic WordNet semantic feature (section 2.1.1). As this
toolkit is very modular, we could use it as the baseline coreference
system and add our new semantic features to it.
In addition, as the machine learning approach to coreference
resolution includes building a classifier, in our research we tried
several classification algorithms implemented in Weka
(http://www.cs.waikato.ac.nz/ml/weka/). The main algorithms are the J48
decision tree and support vector machines (SVM).
In the next subsection we describe the new semantic features we added to
the BART toolkit (3.2).
3.2 Additional semantic features
In this subsection we describe the semantic features we developed
for the coreference resolution classifier: the pattern-based module using
Google (3.2.1), WordNet features (3.2.2), and Wikipedia features (3.2.3).
All of these features aim to test the semantic relations between
the word pairs given to the coreference classifier. The presence or absence
of a strong semantic relation between the pair of words helps to decide
whether these words corefer.
3.2.1 The web pattern-based module
In this subsection we first describe the module structure and its
steps (3.2.1.1), and then we explain how we apply the module for
creating the new coreference features (3.2.1.2).
3.2.1.1 The module structure
Following Mirkin and Dagan (2006), the main idea of the pattern-based
module (2.3) is to use the web, the largest corpus available, for
searching for specific patterns. The module includes four main steps,
performed for every pair of terms, as listed below. Next we give the
details of each of the steps.
1. Choosing the patterns
2. Query creation
3. Snippet processing - downloading, cleaning, filtering and syntactic
processing
4. Using regular expressions defined for the patterns
1. Choosing the patterns
In the web pattern-based approach we start with pre-defined
patterns, chosen according to the specific task – in our case,
coreference resolution.
Many co-referring words (antecedent-anaphor pairs) have a
hypernym relation, for example Sony and company, or TWA and airline.
In other words, a hypernym relation is indicative of a
coreference relation between the words.
Therefore, the patterns we used are patterns suitable for the
hypernym relation. The list of patterns and example sentences is
given in Table 1.
Num Pattern Sentence Example
1 NP1 such as NP2 Scientists such as Einstein
2 Such NP1 as NP2 Such scientists as Einstein
3 NP1 or other NP2 Einstein or other scientists
4 NP1 and other NP2 Einstein and other scientists
5 NP1 ADV known as NP2 Einstein known as an important scientist
6 NP1 especially NP2 Scientists, especially Einstein
7 NP1 like NP2 Scientists like Einstein
8 NP1 including NP2 Scientists including Einstein
Table 1 – the patterns we used for the coreference resolution task, with
example sentences
2. Query Creation
The query construction method must support queries with two
terms and a variable. When designing the query construction module, we
must consider that terms may contain several words and that queries are
an expensive resource, since the number of queries per day one can
submit to a commercial search engine is limited.
Queries were submitted to Google’s search engine through the Google API
(http://www.google.com/apis/).
Therefore, the queries are constructed using the engine’s syntax, while
complying with some restrictions posed by the API package, such as a
limit to the maximal number of words per query. We use a feature of the
search engine which allows using an asterisk instead of any single word
(some stop words are excluded from this count, such as a, the etc.). In our
research, up to one consecutive asterisk is supported.
In addition, the query is constructed with the singular and plural forms
of the terms, depending on the pattern.
For example, for the pair TWA-airline, here are some of the created queries:
"(airline OR airlines) such as * TWA"
"(airline OR airlines) especially * TWA"
"(airline OR airlines) including * TWA"
"TWA and other * (airline OR airlines)"
3. Snippet Processing
For each query submitted to the search engine, we download a
predefined number of snippets. Snippets are used for a practical reason –
we do not need to download and process the entire document for a single
instance of the pattern we’re after. The drawback in using snippets is that
many times they contain partial sentences, decreasing the accuracy of the
syntactic processing.
Each downloaded snippet is cleaned from HTML tags and is
converted to plain text format. Then, using a word splitter and a sentence
segmenter from The University of Illinois at Urbana-Champaign1, we
tokenize the snippets and split them into sentences. Each sentence that
does not contain the target terms is deleted. All duplicate sentences are
deleted as well. Then, using OpenNLP2 Part of Speech Tagger and NP-
Chunker, we processed each of the sentences. Here’s an example of a
sentence retrieved for Sony and Company, processed up to shallow
parsing:
[NP Sony,/NNP while/IN ] [VP well/RB known/VBN ] [PP
as/IN ] [NP THE/DT company/NN ] [NP that/WDT ] [VP
has/VBZ made/VBN ] [NP broadcast/NN television/NN
camerass,/NN ] [VP is/VBZ ] [NP the/DT owner/NN ] [PP
of/IN ] [NP the/DT old/JJ Minolta/NNP camera/NN
company,/JJ
4. Using regular expressions defined for the patterns
After processing the snippets, pattern-specific regular expressions
are used over the chunked sentences to verify that the instantiated pattern
indeed occurs in the sentence.
Table 1 shows how the patterns’ expressions were constructed from smaller
building blocks, and Table 2 lists the regular expressions used for those
building blocks. The extraction stage, applied to the chunked sentences, was
designed to trade off precision and recall by handling some of the common
chunker errors, while not attempting to cover all cases. When compiling
the regular expressions we kept in mind that, when using the web, it
might not be necessary to extract information from complex text
structures; one can settle for simpler text while relying on the scale and
redundancy of the web (Etzioni, 2004).
Num | Pattern name | Sub-pattern expressions
1 | NP1 such as NP2 | NPWithPossCommaExp suchAsExp NPListExp
2 | Such NP1 as NP2 | suchExp asExp NPListExp
3 | NP1 or other NP2 | NPListExp orExp otherNPExp
4 | NP1 and other NP2 | NPListExp andExp otherNPExp
5 | NP1 ADV known as NP2 | NPWithPossCommaExp advKnownAsExp NPExp
6 | NP1 especially NP2 | NPWithPossCommaExp especiallyNPExp
7 | NP1 like NP2 | NPWithPossCommaExp likeExp NPListExp
8 | NP1 including NP2 | NPWithPossCommaExp includingExp NPListExp
Table 1 – the patterns we use, specified by sub-patterns (continued in Table 2)
Num | Pattern name | Regular expression
1 | NPExp | <NP [^>]* >
2 | suchAsExp | <PP such/JJ as/IN >( :/:)?
3 | suchExp | <NP (s|S)uch/JJ [^>]*>
4 | asExp | <PP as/IN >
5 | NPListEx1 | NPExp (( ,/, NPExp ){0,10} (and|or)/CC NPExp )?
6 | NPandNPExp | <NP [^>]* (and|or)/CC [^>]* >
7 | NPListEx2 | (NPExp ,/, )* NPandNPExp
8 | NPListExp | ((NPListEx1)|(NPListEx2))
9 | NPWithCommaExp | <NP [^>]* > ,/,
10 | NPWithPossCommaExp | ((NPWithCommaExp)|(NPExp))
11 | includingExp | <(PP|VP) including/VBG >
12 | especiallyNPExp1 | (and/CC)?<ADVP especially/RB > NPExp
13 | especiallyNPExp2 | (and/CC)?<NP especially/RB [^>]* >
14 | especiallyNPExp | ((especiallyNPExp1)|(especiallyNPExp2))
15 | likeExp | <PP like/IN >
16 | advExp | (([^/]*/RB)|(<ADVP [^/]*/RB >))
17 | advKnownAsExp | ((<VP (advExp)?known/VBN >|((advExp )?<VP known/VBN >)) <PP as/IN >)
18 | otherNPExp | <NP other/JJ [^>]* >
Table 2 – the sub-patterns and their regular expressions
3.2.1.2 Pattern-based feature extraction
After applying the module as described above, we use its output
for creating new features for the coreference classifier.
We tried two sets of features, binary features and numeric features:
In the numeric mode, the value of each feature is the number of snippets
retrieved by its pattern. In the binary mode, the value of each feature is
true when there is at least one snippet returned by its pattern.
For example, the feature values for the pair TWA-airline are described in
Table 3.
Feature name Numeric Value Binary value
NP1 such as NP2 11 1
Such NP1 as NP2 0 0
NP1 or other NP2 3 1
NP1 and other NP2 7 1
NP1 ADV known as NP2 0 0
NP1 especially NP2 0 0
NP1 like NP2 1 1
NP1 including NP2 6 1
Table 3 – feature values for the pair TWA-airline
In addition to the features created from the patterns, we used two
global features, which look at the overall occurrences rather than only at
the occurrences of each pattern separately:
1. 'SUM OF ALL PATTERNS' – the sum of the values of all 8 pattern
features. Formally:
Score1(w1,w2) = Σ_P count(w1,w2,P)
where count(w1,w2,P) is the number of snippets returned for the word
pair (w1,w2) with the specific pattern P (which is exactly the value of the
numeric feature for pattern P).
2. 'NUMBER OF UNIQUE PATTERNS' – the number of features whose
values are greater than 0. Formally:
ScoreF(w1,w2) = |{P : count(w1,w2,P) > 0}|
where count(w1,w2,P) is as defined above.
Both features seem relevant because they consider the results from all
patterns. While feature (1) considers the number of matches from all
patterns, feature (2) relates to the number of different patterns which
retrieved a match.
Continuing the example of Table 3, the value of the 'SUM OF ALL
PATTERNS' feature is 11+3+7+1+6 = 28 and the value of 'NUMBER OF
UNIQUE PATTERNS' is 5.
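The following sketch derives the numeric features, the binary features and the two global features from the per-pattern snippet counts (cf. Table 3):

    def pattern_features(snippet_counts):
        # snippet_counts: pattern name -> number of matching snippets.
        numeric = dict(snippet_counts)
        binary = {name: int(count > 0) for name, count in snippet_counts.items()}
        numeric["SUM OF ALL PATTERNS"] = sum(snippet_counts.values())      # Score1
        numeric["NUMBER OF UNIQUE PATTERNS"] = sum(
            1 for count in snippet_counts.values() if count > 0)           # ScoreF
        return numeric, binary

    # For the TWA-airline counts of Table 3 this yields
    # SUM OF ALL PATTERNS = 28 and NUMBER OF UNIQUE PATTERNS = 5.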
Using synonyms:
In order to increase recall, we use the words' synonyms from
WordNet: for each word in each pair we find the first synonym in the
first sense.
Then, we create queries not only for the original pair, but also for the
words' synonyms.
For example, if the input is the pair 'acquisition - transaction', we create
queries for the pairs 'acquisition - transaction' and 'acquisition - dealing',
since 'dealing' is a synonym in the first sense of 'transaction', while
'acquisition' does not have a synonym in its first sense.
Then, the value of each feature for the original pair is the sum of
the values over the derived pairs (including the original pair).
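A sketch of this synonym expansion, using NLTK's WordNet interface (an assumption about tooling) and summing over the pairs with one word replaced:

    from nltk.corpus import wordnet as wn

    def first_sense_synonym(word):
        # First synonym in the first (most frequent) noun sense, if any.
        synsets = wn.synsets(word, pos=wn.NOUN)
        if synsets:
            for lemma in synsets[0].lemma_names():
                if lemma.lower() != word.lower():
                    return lemma.replace("_", " ")
        return None

    def expanded_value(w1, w2, feature_value):
        # Feature value for the original pair plus the pairs derived by
        # replacing a word with its first-sense synonym (when one exists).
        variants = {(w1, w2)}
        s1, s2 = first_sense_synonym(w1), first_sense_synonym(w2)
        if s1:
            variants.add((s1, w2))
        if s2:
            variants.add((w1, s2))
        return sum(feature_value(pair) for pair in variants)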
3.2.2 WordNet features
3.2.2.1 WordNet as a lexical tool
WordNet is another resource for finding synonym and
hypernym relations, which, as said before, are relevant for coreference.
For the WordNet feature we used WordNet 3.0 and the
Snow 400k resource (Snow et al., 2006), which is a statistical extension of
WordNet. WordNet is a large lexical database of English, developed
under the direction of George A. Miller. Nouns, verbs, adjectives and
adverbs are grouped into sets of cognitive synonyms (synsets), each
expressing a distinct concept. Synsets are interlinked by means of
conceptual-semantic and lexical relations. Thus, for each pair of words in
our classifier instances, we check whether it appears in a hypernym
relation. More statistics about WordNet 3.0 can be found in Table 1.
In addition, Snow et al. (2006) presented a probabilistic model for
taxonomy induction which considers as features paths in parse trees
between related taxonomy nodes. They show that the best performing
taxonomy was the one adding 400,000 hyponyms to WordNet.
POS | Unique strings | Synsets | Total word-sense pairs
Noun | 117798 | 82115 | 146312
Verb | 11529 | 13767 | 25047
Adjective | 21479 | 18156 | 30002
Adverb | 4481 | 3621 | 5580
Totals | 155287 | 117659 | 206941
Table 1 – WordNet 3.0 statistics
3.2.2.2 WordNet Feature
WordNet is utilized within one of the twelve features of Soon et al.
(2001). This feature is called the 'Semantic Class Agreement Feature'
(SEMCLASS); its possible values are True, False, or Unknown. They
defined the following semantic classes: "female," "male," "person,"
"organization," "location," "date," "time," "money," "percent," and
"object." These semantic classes are arranged in a simple IS-A hierarchy.
Each of the "female" and "male" semantic classes is a subclass of the
semantic class "person," while each of the semantic classes
"organization," "location," "date," "time," "money," and "percent" is a
subclass of the semantic class "object." Each of these defined semantic
classes is then mapped to a WordNet synset. For example, "male" is
mapped to the second sense of the noun “male” in WordNet; "location" is
mapped to the first sense of the noun “location”, and so on.
In addition, they assume that the semantic class for every markable
extracted is the first sense of the head noun of that markable. Since
WordNet orders the senses of a noun by their frequency, this is equivalent
to choosing the most frequent sense as the semantic class. If the selected
semantic class of a markable is a subclass of one of the defined semantic
classes C then the semantic class of the markable is considered to be C;
else its semantic class is "Unknown." The semantic classes of markables i
and j are in agreement if one is the parent of the other (e.g., “chairman”
with semantic class "person" and “Mr. Lim” with semantic class "male"),
or if they are the same (e.g., “Mr. Lim” and “he”, both of semantic class
"male"). The value returned for such cases is “True”. If the semantic
classes of i and j are not the same (e.g., “IBM” with semantic class
"Organization" and “Mr. Lim” with semantic class "Male"), the returned
value is “False”. If either semantic class is "Unknown" then the head noun
strings of both markables are compared. If they are the same, return True;
else return Unknown.
In our research we used WordNet in order to find hypernym
and synonym relations between the anaphor and the antecedent. Our
additional feature can take 3 possible values: true, false and unknown.
1. True – if WordNet contains a hypernym path between the two words
in the pair.
For example, given the pair 'dog' and 'animal', WordNet contains
the path:
dog -> domesticated animal -> animal
Another example is the pair CEO and person. In this case, the path is:
chief executive officer -> corporate executive -> executive ->
administrator -> leader -> person
We consider a path of at most 10 words (considering running time
and relation relevancy).
2. False – if the two words appear in WordNet, but WordNet does not
include a hypernym path between them.
3. Unknown – if at least one word from the pair does not appear in
WordNet, or if one word of the pair is a pronoun.
Separating case 2 (False) from case 3 (Unknown) aims to address
the sparseness of WordNet: the fact that the two words appear in WordNet
without having the appropriate relation is more indicative of
non-coreference than the fact that the words do not appear in WordNet
at all.
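The following sketch implements the three-valued feature with NLTK's WordNet (an assumption about tooling; it also uses only the first sense of each word and omits the Snow 400k extension). is_pronoun is a hypothetical helper.

    from nltk.corpus import wordnet as wn

    def _reaches(start, target, max_depth):
        # Breadth-first walk up the hypernym hierarchy, level by level.
        frontier = {start}
        for _ in range(max_depth):
            frontier = {h for s in frontier for h in s.hypernyms()}
            if target in frontier:
                return True
            if not frontier:
                return False
        return False

    def wordnet_feature(w1, w2, is_pronoun, max_depth=10):
        if is_pronoun(w1) or is_pronoun(w2):
            return "unknown"
        syns1 = wn.synsets(w1, pos=wn.NOUN)
        syns2 = wn.synsets(w2, pos=wn.NOUN)
        if not syns1 or not syns2:
            return "unknown"   # at least one word is missing from WordNet
        if (_reaches(syns1[0], syns2[0], max_depth)
                or _reaches(syns2[0], syns1[0], max_depth)):
            return "true"      # e.g. dog -> domesticated animal -> animal
        return "false"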
3.2.3 Wikipedia features
For the Wikipedia feature, we adapted the method of Shnarch and
Dagan (2009), which uses Wikipedia for extracting lexical reference rules
(subsection 2.5), to coreference resolution.
As already mentioned, the hypernym relation is highly relevant for
coreference resolution. The work of Shnarch and Dagan (2009) (subsection 2.5)
generates rules from Wikipedia containing many hypernym rules (in
addition to other types of lexical references); for example, the Be-Comp
extraction method yields mostly hypernym relations. Thus, in our
research, we use a corresponding feature which can take 3 possible values:
1. True – where the pair of words (w1,w2) corresponds to a rule from
Wikipedia, extracted by any of the extraction methods described in 2.5
(Be-Comp, All-N, Title Parenthesis, Redirect and Link).
For example, consider the pair (Clinton, president):
The first sentence in the page titled as Bill Clinton is:
William Jefferson "Bill" Clinton (born William Jefferson Blythe III,
August 19, 1946) served as the 42nd President of the United States
from 1993 to 2001.
Thus, the rule 'Clinton -> President' is extracted, and the value of the
feature is true.
2. False – where the pair of words (w1,w2) does not correspond to any rule
from Wikipedia.
3. Unknown – where at least one word of the pair is a pronoun.
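Given the extracted rule base, the feature itself is a simple lookup; the sketch below assumes the rules are loaded as a set of lower-cased (lhs, rhs) pairs and that is_pronoun is a hypothetical helper:

    def wikipedia_feature(w1, w2, rules, is_pronoun):
        # rules: set of (lhs, rhs) pairs extracted by Be-Comp, All-N,
        # Title Parenthesis, Redirect and Link.
        if is_pronoun(w1) or is_pronoun(w2):
            return "unknown"
        return "true" if (w1.lower(), w2.lower()) in rules else "false"

    # With ("clinton", "president") in the rule base:
    # wikipedia_feature("Clinton", "president", rules, is_pronoun) -> "true"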
4. Results and analysis
In this section we describe the dataset and annotation scheme of our
experiments (4.1) and detail the evaluation measures (4.2). Next, we
describe the experimental setup (4.3), our results (4.4) and a detailed
analysis (4.5).
4.1 The dataset
MUC (Message Understanding Conference) was a series of
conferences, supported by DARPA, that aimed to develop technologies for
information extraction. Coreference resolution is considered one
of the important layers of the information extraction task.
The coreference annotation can be considered a kind of
hyperlinked version of the text, where the links connect all the mentions
of a given entity. To assist in building and evaluating such a system, MUC
gathered a set of articles annotated with pair-wise coreference links
between anaphors and antecedents, thus creating a text annotated with
coreference chains. The coreference chains are used to train and test a
coreference resolution system. In the evaluation, we use one of the
standard coreference corpora, the MUC 7 (MUC 7, 1998) dataset.
MUC 7 – Statistics and Structure
The MUC corpus, and the MUC 7 coreference dataset in particular,
is composed exclusively of newswire articles. The following is an example
of a short annotated text from MUC 7:
At the White House, shortly before departing for New York,
<Clinton> said the deaths of the <Force crew members> were
painful to <him> because <they> worked for…
The noun phrases which are part of a coreference relationship are
marked in angle brackets. There are two coreference relationships
between noun phrases in this paragraph:
1. Clinton and him
2. Force crew members and they
The MUC 7 dataset includes fifty documents annotated for
coreference, divided into 30 training documents, referred to as
dry-run documents, and 20 test documents, referred to as formal
documents. Statistics concerning the training set and the test set are
summarized in Table 1.
| All database | Train set | Test set | Avg. per doc
Num of docs | 50 | 30 | 20 |
Num of paragraphs | 635 | 258 | 377 | 12.7
Num of sentences | 1033 | 620 | 413 | 20.66
Num of tokens | 25583 | 15331 | 10252 | 511.66
Num of coreference nodes | 3780 | 2259 | 1521 | 75.6
Num of coreference links | 2738 | 1624 | 1114 | 54.76
Num of coreference groups/clusters | 977 | 603 | 374 | 19.54
Table 1 – MUC7 statistics. Coreference nodes are noun phrases participating
in a coreference relation, coreference links are the links annotated between the
anaphor and the antecedent, and coreference groups are the chains created by
the coreference links.
4.2 Evaluation measures
In this subsection we describe how the results of the coreference
resolution system are evaluated, as implemented in the MUC scoring
metric (Vilain et al., 1995).
4.2.1 Introduction
As mentioned previously, the coreference clustering task aims to
group all the coreference links into clusters, where each cluster represents
the mentions in the document of a single entity in the real world. When
evaluating the performance of the coreference clusters, we need to
compare the clusters which are implicitly created by the gold standard
key, i.e., the annotated coreference links, with the ones which are created
by the response, i.e., the output of our coreference resolution system. The
key links constitute a minimal spanning set, since each noun phrase in the
annotation can have at most one coreference link (as an anaphor,
referring to some other noun phrase), and each noun phrase can be
referred to only once (as an antecedent). On the other hand, in the case of
a cluster from the response, an anaphor does not necessarily have a unique
antecedent, so the response links are not a minimal spanning set. Hence,
a cluster might be created by a different set of links, depending on its
source: the key or the response. Therefore, the idea of MUC’s scoring
scheme is not to compare the links specified in a key cluster with the links
of a response cluster, but rather to compare the nodes constituting a key
cluster with the nodes constituting a response cluster.
The scoring scheme produces three evaluation measures: precision,
recall and the F-measure. In the next subsection we formally
explain these measures.
4.2.2 Evaluating recall and precision
As explained above, MUC's scoring scheme compares the nodes
constituting a key cluster with those constituting the corresponding
response clusters, rather than comparing the links directly. In this
subsection we formally define the recall, precision and F1 measures.
In order to evaluate recall and precision, let us first define the
following:
C – a cluster of coreferring nodes.
e(C) – the minimal number of edges needed for spanning the cluster C.
This number equals |C| - 1.
Let Si be a cluster from the key, and let p(Si) be the partition of Si
induced by the response clusters, i.e., the pieces into which Si is split by
the response (nodes of Si missing from the response count as singleton
pieces). Recalli, the recall for cluster Si, is then:
Recalli = (|Si| - |p(Si)|) / e(Si) = (|Si| - |p(Si)|) / (|Si| - 1)
That is, the number of edges needed for spanning the common
nodes of the cluster in the key and in the response, divided by the number
of edges needed for spanning the cluster in the key. This is the
fraction of the correct links found in the response out of the
number of edges needed for spanning the cluster in the key.
Extending this to all clusters of the key over the entire test set, we
get:
Total recall = Σi (|Si| - |p(Si)|) / Σi (|Si| - 1)
Analogously, let R_i be a cluster from the response, and let p'(R_i) be its partition induced by the key clusters. Precision_i, the precision for R_i, is then:

    Precision_i = (|R_i| - |p'(R_i)|) / (|R_i| - 1)

That is, the number of edges needed to span the nodes of the response cluster that are correctly linked according to the key, divided by the number of edges needed to span the cluster in the response. This is the fraction of the correct links in the response cluster out of the links needed to span it.
Extending this to all response clusters over the entire test set, we get:

    Precision = Σ_i (|R_i| - |p'(R_i)|) / Σ_i (|R_i| - 1)
For example, consider the key clusters S_1 = {1, 2}, S_2 = {3, 4, 5} and the response clusters R_1 = {1, 2, 3}, R_2 = {4, 5, 6}. Then:

    Recall = ((2 - 1) + (3 - 2)) / ((2 - 1) + (3 - 1)) = (1 + 1) / (1 + 2) = 2/3
    Precision = ((3 - 2) + (3 - 2)) / ((3 - 1) + (3 - 1)) = (1 + 1) / (2 + 2) = 2/4
F1 is a standard measure combining precision and recall. In the coreference clustering evaluation we report F1, defined here in the usual way:

    F1 = (2 · Precision · Recall) / (Precision + Recall)

That is, F1 is the harmonic mean of recall and precision.
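To make the computation concrete, the following is a minimal sketch of this scoring scheme in Python; the function names and the representation of clusters as sets of node identifiers are illustrative assumptions of the sketch, not part of our system.

def muc_scores(key_clusters, response_clusters):
    """MUC recall, precision and F1 for two clusterings (lists of node sets)."""

    def partition_size(cluster, other_clustering):
        # |p(C)|: the number of pieces C splits into relative to the other
        # clustering; nodes absent from every cluster there are singletons.
        pieces, covered = 0, set()
        for other in other_clustering:
            common = cluster & other
            if common:
                pieces += 1
                covered |= common
        return pieces + len(cluster - covered)

    def score(clusters, against):
        # sum_i (|C_i| - |p(C_i)|) / sum_i (|C_i| - 1)
        numer = sum(len(c) - partition_size(c, against) for c in clusters)
        denom = sum(len(c) - 1 for c in clusters)
        return numer / denom if denom else 0.0

    recall = score(key_clusters, response_clusters)
    precision = score(response_clusters, key_clusters)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1

# The worked example above: recall 2/3, precision 2/4, F1 = 4/7
key = [{1, 2}, {3, 4, 5}]
response = [{1, 2, 3}, {4, 5, 6}]
print(muc_scores(key, response))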
4.3 Experimental setup
In this subsection we describe our experimental setting: choosing a subset of the training and testing documents from MUC (4.3.1), the way we apply our new features (4.3.2), and using them within the BART toolkit (4.3.3).
4.3.1 Choosing documents from MUC
In our experiments we used 15 documents from the MUC training set and 5 documents from its test set, due to the running time and the number of queries we needed to submit to Google. For these 20 documents we obtained over 40,000 pairs of words participating in a coreference relation in the training or test set of the coreference classifier. For each pair we check 8 patterns, and for each pattern we create a query with 0 or 1 asterisk. Thus, we submitted around 400,000 queries to Google.
For each query we processed up to the first 30 snippets. Overall, we had to process over 1.5 million snippets (some queries returned no results or fewer than 30 snippets), running our regular expressions over each processed snippet. For these reasons, we decided to randomly choose 20 documents for the training and test sets. To apply the module to all documents, one should use a faster regular-expression matching method and a faster chunker, or, alternatively, use fewer queries and fewer snippets, which might harm recall.
4.3.2 Applying our new features
The first step after choosing the documents is applying our new features (later used within the BART toolkit). This was done in three modules, described below: the pattern-based module, the WordNet feature and the Wikipedia features.
1. The pattern-based module is applied to the pairs of words participating in the training or test instances of the coreference classifier. At the end of the pattern-based module's processing, we have a database table containing, for each word pair, the values of the pattern-based features (3.2.1). This database table is used when running the BART toolkit (4.3.3) during training and testing. As mentioned above (section 4.3.1), we save the feature values for over 40,000 pairs of words; that is because, as described in 3.2.1.2, we also include the synonyms of the words appearing in the dataset.
In query creation, we allow only one asterisk on each side of the pattern words, and we consider only the first 30 returned snippets (see the query-generation sketch after this list).
2. As described in 3.2.2, for the WordNet feature we use WordNet 3.0 and Snow 400k. For each pair of words in the training or test set we check whether WordNet includes a hypernym path of length at most 10. Since a longer path indicates less relatedness, and because of running time, we had to set a threshold; a length of 10 seems to suffice (a sketch of this check is also given after this list). Clearly, there is no use in checking pairs that include pronouns, so such pairs get the value UNKNOWN.
3. As described in section 3.2.3, for the Wikipedia features we used the work of Shnarch (2009). For each pair of words in the training or test set we check whether the method extracted a reference rule. We used all of the extraction methods described in section 2.5 (Be-Comp, All-N, Title Parenthesis, Redirect and Link).
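As an illustration of the query-creation step in (1), the following sketch builds the queries for a single word pair. The pattern list shown is a small illustrative subset rather than our actual eight patterns, and the exact placement of the asterisk is an assumption of the sketch.

# Illustrative pattern subset; the real module uses 8 patterns (see 3.2.1).
PATTERNS = ['{x} known as {y}', '{y} such as {x}',
            '{x} and other {y}', '{x} like {y}']

def build_queries(word1, word2):
    """Yield exact-phrase web queries with zero or one '*' wildcard."""
    for pattern in PATTERNS:
        yield '"%s"' % pattern.format(x=word1, y=word2)        # no asterisk
        # one wildcard next to the first word (assumed placement)
        yield '"%s"' % pattern.format(x=word1 + ' *', y=word2)

for query in build_queries('acquisition', 'transaction'):
    print(query)

Similarly, the WordNet check in (2) can be sketched as a bounded search up the hypernym hierarchy, here with NLTK's WordNet interface. Note that this covers plain WordNet 3.0 only; the Snow 400k extension is not bundled with NLTK.

from nltk.corpus import wordnet as wn

def hypernym_path_within(anaphor, antecedent, max_len=10):
    """True if some noun sense of `anaphor` reaches some noun sense of
    `antecedent` through at most `max_len` hypernym steps."""
    targets = set(wn.synsets(antecedent, pos=wn.NOUN))
    frontier = set(wn.synsets(anaphor, pos=wn.NOUN))
    for _ in range(max_len + 1):
        if frontier & targets:
            return True
        frontier = {h for s in frontier for h in s.hypernyms()}
        if not frontier:
            return False
    return False

print(hypernym_path_within('dog', 'animal'))  # expected: True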
4.3.3 Running the BART toolkit
We added our new features to the BART toolkit. Thus, we could run BART on 24 features: twelve base features, ten pattern-based features, one WordNet feature and one Wikipedia feature.
In our experiments, we ran BART each time with a different subset of our features, in order to check their contribution. The subsets are:
1. Base features
2. Web patterns + WordNet feature + Wikipedia feature
3. Base + one of our new features at a time
4. Each of our features alone
5. All 24 features
The classifier algorithms used are J48 [1] and SVM [2], as implemented in Weka.
4.4 Results
Even though we did not manage to improve BART's original F-measure results, we did reach some interesting findings. In this subsection we present the results in detail. Table 1 presents comparative results for the coreference resolution task using the selected test/training documents and J48. The following subsections provide further explanation of the results and a detailed error analysis.
Method                                      Recall   Precision   F1
Base features                               0.622    0.483       0.538
Patterns-Based + WN + Wikipedia             0.612    0.401       0.509
Patterns-Based features (using synonyms)    0.623    0.311       0.461
WN feature                                  0.503    0.412       0.485
Wikipedia feature                           0.537    0.423       0.497
Base + Patterns-Based (using synonyms)      0.621    0.421       0.517
Base + WordNet                              0.619    0.430       0.522
Base + Wikipedia                            0.621    0.425       0.518
All features                                0.621    0.430       0.521

Table 1 – Results obtained using the SMO classifier / J48 decision tree on the selected testing documents

[1] http://grb.mnsu.edu/grbts/doc/manual/J48_Decision_Trees.html
[2] http://en.wikipedia.org/wiki/Support_vector_machine
Results details
When analyzing Table 1, one can notice the following aspects:
- Using only our new features (Patterns-Based + WN + Wikipedia) scores less than 3% below the F-measure obtained with the base features. Thus, utilizing semantic features does seem to be relevant for coreference resolution. Moreover, in this case the learned decision tree (appendix 1) includes features from all of the semantic resources, demonstrating that all resources are relevant for classification. This fact also affects the results: when applying all of the feature groups together, we get a higher score than from each group separately.
- The web-patterns extension of using synonyms (described in 3.2.1.2) does increase recall (compared to not using the extension) without harming precision. For example, the synonym pair 'acquisition – dealing' was an extension of the coreference pair 'acquisition – transaction', which helped increase recall. Precision remained stable since we used only the first synonym of the first sense from WordNet. Moreover, we noticed that in many cases extending with a hypernym would also help. For example, in the case of 'Sony – company', the pair 'Sony – organization' would be a good extension, where organization is a hypernym of company. However, including hypernyms would most likely harm precision significantly.
- Applying one semantic group at a time, we could analyze the quality of each resource. While the pattern-based features yield relatively high recall but low precision, the Wikipedia and WordNet features achieve lower recall but higher precision. This can be explained by the error analysis we conducted for each of our features (subsection 4.5).
- Adding the semantic features to the base features (which include grammatical agreement features, syntactic features, positional features, string match features and a proper name match normalizer feature) does not achieve higher results. The errors described in the next subsection (4.5) explain this outcome; there and in section 5 we also suggest possible solutions, which may improve the results in future work.
Learned decision trees
Using decision trees is useful for inspecting the classifier model, since they are quite easy to understand. The learned decision tree models and their explanation are given in appendix 1. These trees show that when using only our features, the tree built includes four features: the WordNet feature, the Wikipedia feature, and two features from the pattern-based module – 'All Patterns' and 'Unique Patterns' (described in 3.2.1.2). In addition, when using the base features, the first feature checked is the string match feature, then the alias feature, and so on; the total number of features used is 9.
4.5 Detailed analysis
Below is a detailed error analysis concerning each of our new feature groups. We first describe the pattern-based feature errors (4.5.1), then the WordNet errors (4.5.2) and the Wikipedia errors (4.5.3).
4.5.1 Pattern-based features
The pattern-based module features indeed provide an indication of coreference for many coreferent pairs of words. A sample of these pairs: TWA -> Airlines, Bush -> President, Officer -> Position, Period -> year, Issues -> news, etc. However, in this subsection we describe the pattern-based feature errors. We analyze two types of errors: false positive errors, where snippets were retrieved for non-coreferent pairs, and false negative errors, where the system did not retrieve any snippets for coreferent pairs.
False positive errors
Analyzing the retrieved snippets, we noticed that 40% of them are false positives. Below we describe the classes of 'wrong' snippets, i.e., snippets retrieved for non-coreferent words. In order to determine the types of errors, we analyzed 500 randomly selected snippets. For each group of errors we describe the error type, give examples, analyze potential ways of fixing the error, and report the number of occurrences.
1. Error type: Description / clause

Examples:
- place – post: [NP Interviews/NNS ] [PP for/IN ] [NP posts/NNS ] [VP taking/VBG ] [NP place/NN ] and/CC [NP other/JJ posts/NNS ] [VP advertised./VBN ] [NP Bid/NNP ] for…
- news – area: [VP send/VB ] [NP you/PRP ] [PP updates/NNS ] [PP on/IN ] [NP security/NN issues/NNS ] [PP in/IN ] [NP our/PRP$ area/NN ] and/CC [NP other/JJ news/NN ] [PP about/IN ] …
- year – bank: [Madison/NNP Bank/NNP Ltd.,/NNP Established/NNP ] [PP in/IN ] [NP the/DT year/NN 1966,/CD ] [VP known/VBN ] [PP as/IN ] [NP "AMCO/NNP BANK"/NNP ] …

Error explanation: In the pattern-based module, the first noun in the pattern may be the last noun of a clause in the snippet, while the correct noun for the pattern is another noun in the clause – the head of the clause.
Consider the example of the pair 'news – area'. The found pattern is 'area and other news'. However, notice that 'area' is part of the noun phrase 'security issues in our area'. Thus, the correct extraction of the pattern should be 'news – issues' rather than 'news – area'.
Possible solution: Using full parsing would solve the problem, since we could check that the first word of the pair does not appear as a modifier in the snippet (a sketch of such a check follows this list of error types).
Number of occurrences: 48
2. Error type: False reference of the pattern

Examples:
- Bush – taxes: [NP democrats/NNS ] [VP would/MD lower/VB ] [ taxes/NNS ] [PP like/IN ] [NP BUSH/NNP ] [VP wanted ,perhaps/VBZ ] [NP we/PRP ] [VP would/MD have seen/VB ] [ADJP many/JJ ] [PP of/IN ] [NP those/DT jobs/NNS ] [ADVP back/RB ]
- Bush – taxes: [NP He/PRP ] [NP deficit/NN ] [VP spent/VBD ] [PP like/IN ] [NP Bush/NNP ] and/CC [NP he/PRP ] [VP cuts/VBZ ] [NP taxes/NNS ] [PP like/IN ] [NP Bush/NNP ] [VP to/TO get/VB ] [NP us/PRP ] [PP out/IN ] [PP of/IN ] [NP those numbers./DT]
- Bush – taxes: [NP John/NNP McCain/NNP ] [VP will/MD lower/VB ] [NP taxes/NNS ] [PP like/IN ] [NP Bush/NNP ] [VP did?/VBD ]

Error explanation: In the pattern-based module, the first noun in the pattern can be the last noun of a verb phrase.
Consider the first example, for the pair 'Bush – taxes'. The found pattern is 'taxes like Bush'. However, notice that 'like' relates to the verb phrase 'lower taxes'.
Possible solution: Using full parsing would solve the problem, since we could check that the second word of the pair is not part of a clause (see the sketch following this list).
Number of occurrences: 27
3. Error type: Separation of a proper name

Examples:
- section – act: [NP Government/NN Code/NNP Section/NNP] [VP known/VBN ] [PP as/IN ] [NP the/DT Maddy/NNP Act,/NNP ] [VP requires/VBZ ] [PP to/TO ] ..
- week – market: [NP Independent/NNP Film/NNP Week,/NNP ] [ADVP formerly/RB ] [VP known/VBN ] [PP as/IN ] [NP IFP/NNP Market/NNP ] [VP |/VBD ]
- trust – bank: [NP Arizona/NNP Bank/NNP and/CC Trust/NNP ] [VP (formerly/RB known/VBN] [PP as/IN] [NP Bank/NNP of/IN the/DT Southwest)/NNP] …

Error explanation: In this group, the first or second noun we send to Google is part of a proper noun in the snippet, so the snippet is correct only when the whole noun phrase in the snippet is considered.
Consider the example of the pair 'week – market'. The pattern is correct only when considering the full proper names 'Independent Film Week' and 'IFP Market'. Otherwise, obviously, week is not coreferent with market.
Possible solution: Ignoring a snippet whenever we match only part of a proper name would solve such examples. This can be done by changing the regular expressions to eliminate these cases. However, by doing so we would miss some cases in which we should not ignore the snippet. For example, the pair (bank, place) is a coreference pair in our annotation, so we would not want to miss the snippet:
[NP 20/CD Churchill/NNP Place/NNP],[VP also/RB known/VBN ] [PP as/IN ] [NP Street/NNP Bank/NNP ],…
even though both words in the pair are part of a proper name in the snippet.
Number of occurrences: 65
4. Error type: Incorrect head is checked
In this group, the word extracted from the text, which we use in the query, is not the correct head of the noun phrase in the text.

Examples:
- unit – Vice: [NP The/DT special/JJ investigations/NNS unit/NN ] [ADVP also/RB ] [VP known/VBN ] [PP as/IN ] [NP Vice/NNP and/CC Narcotics/NNP ] [VP is/VBZ involved/VBN ] [NP inscovert/JJ investigations/NNS ] [PP of/IN ] [NP illegal/JJ activity/NN and/CC enforcement/NN ] [PP of/IN ] [NP liquor/NN codes./NN ]
- Inc – Effect: [NP NetLearn/NNP Ventures,/NNP Inc./NNP ] [VP formerly/RB known/VBN ] [PP as/IN ] [NP Net/JJ Effect,/NN ] [VP is/VBZ ] [NP an/DT IT/NNP consulting company/NN ] [VP established/VBN ] [PP in/IN ] [NP 2001./CD]

Error explanation: When creating the query to Google, we consider the pattern and the pair of nouns which are the heads of the noun phrases from the text. In some cases, we extract a wrong head. For example, in some cases we get 'Inc' as a noun phrase in the query. Clearly, searching Google with too small a head can retrieve wrong snippets.
Possible solution: To extract the head of a noun phrase we used the implementation in BART; trying other tools for extracting noun phrase heads is probably needed. Another option is submitting to Google more than just the head of the noun phrase, though submitting the whole noun phrase would cause very low recall.
Number of occurrences: 31
5. Error type: Potentially co-referring words
In this group, the pair of words may corefer in some context, so the retrieved snippets make sense. However, the two words do not corefer in the specific example given in our corpus.

Examples:
- post – chairman: [NP Dr/NNP Sardar/NNP Singh/NNP Johl,/NNP ] [NP a/DT well-known/JJ economist/NN ] and/CC [NP former/JJ vice-chairman/NN ] [PP of/IN ] [NP the planning/DT board/NN ] [VP has/VBZ held/VBN ] [NP many/JJ illustrious/JJ posts/NNS ] [PP like/IN ] [NP chairman/NN ] [PP of/IN ] [NP Commission/NNP fors/NNS ]
- Sony – company: [NP Sony,/NNP while/IN ] [VP well/RB known/VBN ] [PP as/IN ] [NP THE/DT company/NN ] [NP that/WDT ] [VP has/VBZ made/VBN ] [NP broadcast/NN television/NN camerass,/NN ] [VP is/VBZ ] [NP the/DT owner/NN ] [PP of/IN ] [NP the/DT old/JJ Minolta/NNP camera/NN company,/JJ ] [VP known/VBN ] [PP as/IN ] [NP a/DT superb/NN ]
- heads – post: [ADVP (a)/LS ] [NP how/WRB many/JJ departmental/JJ heads/NNS ] and/CC [NP other/JJ senior/JJ posts/NNS ] [VP have/VBP been/VBN appointed/VBN ] [PP from outside/IN ] [PP of/IN ] [NP the/DT Island/NNP ] [PP during/IN ] [NP the/DT last/JJ ten/CD years?/NN ]

Error explanation: In this group, the retrieved snippets are not wrong – they certainly make sense in some contexts. For example, 'Sony – company' can actually be coreferent. However, the words are not coreferent under the specific annotation of our corpus – i.e., 'company' refers to a different company in the text.
Possible solution: This is probably the most fundamental problem of the pattern-based module: it does not consider the context of the words. This group shows that finding patterns without relating to the context of the examined text can cause errors.
Number of occurrences: 19
The distribution of the errors is shown in figure 1.
Figure 1: False positive distribution by groups of errors
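As an illustration of the full-parsing check proposed for error types 1 and 2, the following sketch uses spaCy to test whether a queried word is merely buried inside a prepositional modifier of another noun. The choice of spaCy, the model name and the exact dependency test are assumptions of the sketch; the thesis does not commit to a particular parser.

import spacy

nlp = spacy.load('en_core_web_sm')  # assumed small English model

def buried_in_np(snippet_text, word):
    """Return the governing noun if `word` sits inside a prepositional
    modifier of another noun, i.e. it is not the head the pattern wants."""
    doc = nlp(snippet_text)
    for tok in doc:
        if tok.text.lower() == word.lower() and tok.dep_ == 'pobj':
            prep = tok.head  # climb: pobj -> prep -> governing noun
            if prep.dep_ == 'prep' and prep.head.pos_ in ('NOUN', 'PROPN'):
                return prep.head.text
    return None

# 'area' is inside 'security issues in our area', so the head the pattern
# should use is 'issues', not 'area':
print(buried_in_np('security issues in our area and other news', 'area'))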
False negative errors
Another analysis we performed checks the false negatives, i.e., pairs of coreferent words for which no snippets were retrieved. We analyzed about 100 false negative cases, that is, 100 pairs which did not yield any retrieved snippet even though they are coreferent pairs. We found the following groups:

1. Named entity which does not appear on the web
Although the web is the largest corpus available, in some cases it does not include a specific entity or number. For example: 20000$ -> sum, Mischinski -> director, etc.
Number of occurrences: 36
2. Pattern-matching errors over the snippets
As described in (3.2.1), the last step of our pattern-based module matches the parsed snippet against a specific regular expression. In some cases, even though the snippet is correct, it does not pass the pattern-matching step, for example because of a parsing error (see the illustrative sketch after this list).
Number of occurrences: 45
3. Too complex an expression / understanding the context
In some cases, the pair of words corefers only in a very specific context, so it cannot be found by a specific pattern of the pattern-based module. For example, the pair scale -> issue (when talking about a two-tier wage scale and the major outstanding issue) and the pair filing -> application (when talking about the new filing and a serious application).
Number of occurrences: 17
The percentages of the errors are shown in figure 2.
Figure 2: False negative distribution
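To illustrate how brittle this matching step can be, here is a sketch of a regular expression for just the 'known as' pattern over chunk-tagged snippets like those shown in the tables above. Our module's actual regular expressions are more elaborate; this simplified stand-in is an assumption for illustration.

import re

# [NP ... w1/NN* ] ([ADVP ...])? [VP ... known/VBN ... ] [PP as/IN ] [NP ... w2/NN* ]
KNOWN_AS = re.compile(
    r'\[NP [^\]]*?(?P<w1>\w+)/NN\w*\s+\]\s*'
    r'(?:\[ADVP [^\]]*\]\s*)?'
    r'\[VP [^\]]*known/VBN[^\]]*\]\s*'
    r'\[PP as/IN\s*\]\s*'
    r'\[NP [^\]]*?(?P<w2>\w+)/NN\w*\s+\]')

snippet = ('[NP The/DT special/JJ investigations/NNS unit/NN ] [ADVP also/RB ] '
           '[VP known/VBN ] [PP as/IN ] [NP Vice/NNP and/CC Narcotics/NNP ]')
m = KNOWN_AS.search(snippet)
print(m and (m.group('w1'), m.group('w2')))  # ('unit', 'Narcotics')

# Any deviation from the chunker's expected output - a stray bracket, an
# unexpected tag, a parenthesis as in '(formerly/RB known/VBN' - makes the
# match fail and yields a false negative.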
4.5.2 WordNet feature
WordNet seems to be an important tool for coreference resolution. Pairs of coreferent words such as position -> job and merger -> integrating are recognized by the feature as coreferring pairs. Yet, in this subsection we analyze the false positive and false negative errors of the feature.
False positive errors
Examining the false positives, we noticed that the main problem is the large number of senses that WordNet includes for many words in our corpus (senses are sorted by frequency, from most to least frequent). For example, WordNet does have a path between the words 'offer' and 'attempt', which are unlikely to corefer; the path is derived from the last sense of 'offer', out of 4 possible senses. Another example is 'sale' and 'income': these words relate to the same topic but are not likely to corefer. In this case, the path is derived from the last sense of 'sale', out of 6 possible senses.
Moreover, as with the pattern-based features, we noticed the error of potentially co-referring words which are not coreferent in our corpus.
False negative errors
The main problem of WordNet causing false negatives is its sparseness. This is realized mainly in:
- Specific named entities, so we miss pairs such as John Krieger -> a spokesman, Bush -> President, 12.3.1997 -> today, etc.
- Numbers, so we miss pairs such as 20000$ -> sum, 1987 -> the year, etc.
Moreover, in many cases the words appear in WordNet but have no hypernym path between them, mainly because the coreference requires understanding the context. This includes pairs such as the women -> the members, or the pair scale -> issue, which corefers only in a specific context (talking about the 'two-tier wage scale' and the 'major outstanding issue').
4.5.3 Wikipedia feature
Wikipedia seems to be another resource likely to contribute to the coreference resolution task. By processing Wikipedia pages we extract synonyms and hypernyms; for example, the alias Minn is recognized as connected to Minnesota. Yet, in this subsection we analyze the false positive and false negative errors of the Wikipedia feature.
False positive errors
For the false positive analysis, we examined 80 false positive pairs. We noticed that the main problem of the feature is extracting semantic relations other than synonyms and hypernyms. For example, we sometimes get a derivation relation, as in the pair 'employment' and 'employee'. This happens because the first sentence of the Wikipedia page for 'employment' is:
"Employment is a contract between two parties, one being the employer and the other being the employee"
Another relation is meronymy, as in the pair 'price' and 'service'. This happens because the first sentence of the Wikipedia page for 'price' is:
"Price in economics and business is the result of an exchange and from that trade we assign a numerical monetary value to a good, service or asset."
Moreover, as with the previous semantic features, we noticed the error of potentially co-referring words which are not coreferent in our corpus, for example the pair '1987' and 'year', and the pair 'yesterday' and 'the day'.
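To see how such noise arises, consider a crude sketch of a Be-Comp style extraction over a page's first sentence. The actual extraction rules are those of Shnarch (2009); this simplification, which takes every later noun-like word after the copula as a candidate, is only an assumed illustration of the failure mode.

import re

def be_comp_candidates(first_sentence):
    """Crudely collect candidate complements following the copula."""
    m = re.search(r'\b(?:is|are)\b(.*)', first_sentence)
    if not m:
        return []
    # naive: every lowercase word of 4+ letters after 'is/are'
    return re.findall(r'\b[a-z]{4,}\b', m.group(1))

sentence = ('Employment is a contract between two parties, one being the '
            'employer and the other being the employee')
print(be_comp_candidates(sentence))
# 'employee' appears among the candidates, which is how the spurious
# derivation pair (employment, employee) described above can arise.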
False negative errors
As with WordNet, when using Wikipedia we suffer from its sparseness. Though many named entities have a Wikipedia page (such as IBM, George W. Bush, Minnesota), in many cases the extracted relation is not correct, as in the pair 'John Krieger' and 'spokesman'. Wikipedia's sparseness also shows for common nouns, as for the pair 'talks' and 'negotiations'.
Another problem is the number of senses a word can have. For example, Wikipedia does contain a page for 'party', but misses the pair 'parties' and 'sides', since the page relates to another sense of 'party'.
5. Conclusions and future work
Coreference resolution is a significant task in text understanding, and can serve as a subtask of many other natural language processing applications, such as question answering and information extraction. In this thesis we have presented a coreference resolution application which includes new semantic features, drawing on varied semantic resources: WordNet, Wikipedia and Google (through the pattern-based module).
As a baseline coreference application, we used the BART system, which implements twelve features, including grammatical agreement features, syntactic features, positional features, string match features, a proper name match normalizer feature, and a WordNet semantic feature. However, these features lack semantic knowledge about the relation between the candidate words.
Using our semantic resources, we aimed to identify the semantic relation between the words of a candidate antecedent-anaphor pair, where the most relevant relations are synonymy and hypernymy. Having such a relation between the candidate antecedent and anaphor increases the likelihood of coreference between the pair of words.
We ran the coreference resolution application each time with a different group of features. Applying it using only our new semantic features (without the base features of a standard coreference resolution system (Soon 2001)), we obtain an F1 score close to the F1 obtained with the base features. We showed that in this case the features appearing in the decision tree come from all semantic resources, that is, all feature groups are relevant. This fact also influences the results: when applying all of the feature groups, we get a higher score than from each group separately. Thus, the results indicate that the semantic features are relevant for the coreference resolution task. Also, when applying the pattern-based module, we applied the synonyms extension (subsection 3.2.1.2), thus increasing recall without harming precision. Next, we tried using both the baseline features and the semantic features; in this case the semantic features did not increase the F1 score.
We reported a detailed error analysis for each of the semantic resources we used, grouping the errors into main types, illustrating each group with examples and reporting its frequency. Our analysis reveals many directions for future research. For the pattern-based module, we note that full parsing of the retrieved snippets is needed in order to discard cases where the patterns do not actually relate to the checked pair of words. Also, querying not only the heads of the noun phrases but also larger parts of the noun phrases should be examined, since sometimes we extract too small a head and thus retrieve unwanted snippets. As for WordNet, we showed that searching for a hypernym path between the candidate antecedent-anaphor pair can be noisy when the path involves uncommon senses of the words; therefore, considering only paths through the common sense or senses of the words should be examined. As for Wikipedia, noise should be reduced by ignoring unwanted relations, mainly derivations and meronyms.
References

Banerjee, S. and Pedersen, T. 2003. Extended gloss overlap as a measure of semantic relatedness. In Proceedings of IJCAI-03, pages 805–810.

Dagan, I., Glickman, O. and Magnini, B. 2006. The PASCAL recognising textual entailment challenge. In Lecture Notes in Computer Science, volume 3944, pages 177–190.

Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D.S. and Yates, A. 2004. Web-scale information extraction in KnowItAll. In Proceedings of WWW-04, New York, NY, USA.

Mirkin, S., Dagan, I. and Geffet, M. 2006. Integrating pattern-based and distributional similarity methods for lexical entailment acquisition. In Proceedings of COLING-ACL Poster Sessions, pages 579–586.

Modjeska, N., Markert, K. and Nissim, M. 2003. Using the web in machine learning for other-anaphora resolution. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.

Ng, V. and Cardie, C. 2002. Improving machine learning approaches to coreference resolution. In Proceedings of ACL 2002, pages 104–111.

Ponzetto, S.P. and Strube, M. 2006. Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, New York, NY, 4-9 June 2006, pages 192–199.

Quinlan, J.R. 1993. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

Shnarch, E., Barak, L. and Dagan, I. 2009. Extracting lexical reference rules from Wikipedia. In Proceedings of ACL 2009.

Snow, R., Jurafsky, D. and Ng, A.Y. 2006. Semantic taxonomy induction from heterogenous evidence. In Proceedings of COLING-ACL 2006, pages 801–808.

Soon, W., Ng, H. and Lim, D. 2001. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544.

Versley, Y., Ponzetto, S., Poesio, M., Eidelman, V., Jern, A., Smith, J., Yang, X. and Moschitti, A. 2008. BART: A modular toolkit for coreference resolution. In Companion Volume of the Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 16-18, 2008.

Yang, X., Su, J. and Tan, C.-L. 2005. Improving pronoun resolution using statistics-based semantic compatibility information. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05), pages 427–434.
Appendix A – Tree examples and explanation

Tree 1 – Base features and pattern-based features (diagram; details below). The tree branches on: string match, alias, unique pattern, appositive, both proper name, number, all patterns, pronoun anaphor, gender, number agreement and sentence distance.

Tree 2 – Pattern-based features + WN feature + Wikipedia feature (diagram; details below). The tree branches on: Wikipedia, WordNet, all patterns and unique pattern.
Trees explanation
In the given trees, the nodes are feature names and the edges are feature values. Each leaf of the tree specifies the pair classification – true for a coreferent pair and false for a non-coreferent pair.
Tree 1 is the model built for the base features together with our pattern-based features. In this tree we start by checking the string match feature, then the alias feature, continuing with the gender and unique-patterns features, etc. Notice that two of our pattern-based features appear in the tree, indicating that the pattern-based features are indeed indicative of a coreference relation.
Tree 2 is the model built only for the new semantic features: the pattern-based features, the WordNet feature and the Wikipedia feature. Notice that the features appearing in the tree come from all semantic resources, indicating that all feature groups are relevant. This fact also impacts the results – when applying all of the feature groups we get a higher score than from each group separately.
Abstract (Hebrew)

Anaphora is a common phenomenon in speech and text: two expressions in a text refer to the same entity in the real world. In other words, the goal of the task is to discover whether some noun phrase, called the anaphor, refers to a noun phrase that precedes it in the document, called the antecedent. For example, in the sentence 'Danny had a birthday party and he got a lot of presents', the words 'Danny' and 'he' refer to the same entity – Danny; that is, Danny is the one who got the presents.

In this work we use machine learning methods and treat the problem as a classification problem: for each pair of words in the text, various features are collected, for example the number of words separating them in the text, or agreement in number and gender. From these features a classifier is built that decides whether an anaphoric relation holds between a pair of words. After the classification stage of all pairs of noun phrases is completed, all the pairs found to stand in an anaphoric relation are collected, and the noun phrases in the text that refer to the same real-world entity are grouped together. This process is called coreference chaining, and the set of noun phrases in the text that refer to the same real-world entity is called a coreference chain.

Learning-based approaches to coreference resolution try to improve system performance mainly by extending the feature set collected for the candidate word pairs of the coreference classifier (Soon (2001), Ng and Cardie (2002)). However, this feature set lacks semantic knowledge about the word pairs, for example identifying a semantic relation between the candidate pair, such as synonymy or hypernymy.

The goal of this work is therefore to present a method for using semantic knowledge for the coreference problem, where the knowledge is used to identify semantic relations between pairs of words. The existence of such a relation between a pair of words is an indication for the existence or absence of a coreference relation. The semantic tools we used are Google (via the pattern-based model), WordNet and Wikipedia. For each semantic tool we created a new group of features for the coreference classifier.

In this work we show that semantic knowledge is indeed relevant for coreference resolution, by running the coreference system with a different feature group each time, comparing the results, and presenting the decision trees obtained. In addition, we present an in-depth results analysis for each of the tools, examining the advantages, disadvantages and error types of each. This error analysis reveals many directions for future research.