Unsupervised Acquisition of Axioms to Paraphrase Noun Compounds and Genitives
CICLING 2012, New Delhi
Anselmo Peñas, NLP & IR Group, UNED, Spain
Ekaterina Ovchinnikova, USC Information Sciences Institute, USA
UNED
nlp.uned.es
Texts omit information
• Humans optimize language generation effort
• We omit information that we know the receiver is able to predict and recover
• Our research goal is to make explicit the omitted information in texts
Implicit predicates
In particular, some noun compounds and genitives leave a predicate implicit
In these cases, we want to recover the implicit predicate. For example:
• Morning coffee -> coffee drunk in the morning
• Malaria mosquito -> mosquito that carries malaria
How to find the candidates?
Nakov & Hearst 2006: search the web
• N1 N2 -> N2 THAT * N1
• Malaria mosquito -> mosquito THAT * malaria
Here we use Proposition Stores:
• Harvest a text collection that will serve as context
• Parse documents
• Count N-V-N, N-V-P-N, N-P-N, … structures
• Build Proposition Stores (Peñas & Hovy, 2010)
Proposition Stores
Example: propositions that relate bomb, attack
• npn:[bomb:n, in:in, attack:n]:13.
• nvpn:[bomb:n, explode:v, in:in, attack:n]:11.
• nvnpn:[bomb:n, kill:v, people:n, in:in, attack:n]:8.
• npn:[attack:n, with:in, bomb:n]:8.
• …
All of them could be paraphrases for the noun compound “bomb attack”
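A proposition store of this kind can be sketched as a counter over typed tuples. This is an illustrative reconstruction, not the authors' actual code: we assume parsing has already produced propositions such as ("npn", "bomb", "in", "attack"), and we simply count them and look up the propositions that relate two given nouns.

```python
from collections import Counter

class PropositionStore:
    """Minimal sketch of a proposition store (in the spirit of Peñas & Hovy, 2010)."""

    def __init__(self):
        self.counts = Counter()

    def add(self, proposition):
        # proposition: a typed tuple, e.g. ("nvpn", "bomb", "explode", "in", "attack")
        self.counts[proposition] += 1

    def relating(self, noun1, noun2):
        # All propositions whose arguments include both nouns --
        # candidate paraphrases for the compound "noun1 noun2".
        return sorted(
            ((p, n) for p, n in self.counts.items()
             if noun1 in p and noun2 in p),
            key=lambda pn: -pn[1])

store = PropositionStore()
for _ in range(13):
    store.add(("npn", "bomb", "in", "attack"))
for _ in range(11):
    store.add(("nvpn", "bomb", "explode", "in", "attack"))
store.add(("nvn", "people", "watch", "game"))  # unrelated noise

best = store.relating("bomb", "attack")
print(best[0])  # most frequent proposition relating the two nouns
```

The most frequent proposition relating the two nouns becomes the strongest paraphrase candidate.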
NE Semantic Classes
Now, what happens if we have a Named Entity?
Shakespeare’s tragedy -> write
Why? Consider:
• John’s tragedy
• Airbus’ tragedy
NE Semantic Classes
We are considering the “semantic classes” of the NE
Shakespeare -> writer
writer, tragedy -> write
Class-Instance relations
Fortunately, relevant semantic classes are pointed out in texts through well-known structures
• appositions, copulative verbs, “such as”, …
Here we take advantage of dependency parsing to get class-instance relations:
• NNP –nn– NN (noun compound)
• NNP –appos– NN (apposition)
• NNP –be– NN (copula)
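The harvesting step over these three patterns can be sketched as pattern matching over dependency edges. The edge format and tag names below are illustrative assumptions, not any specific parser's output:

```python
from collections import Counter

def class_instances(edges):
    """Count (class, instance) pairs from dependency edges.

    edges: (relation, head_word, head_tag, dep_word, dep_tag) tuples.
    """
    pairs = Counter()
    for rel, head, htag, dep, dtag in edges:
        if rel == "nn" and htag == "NNP" and dtag == "NN":
            # noun compound: "quarterback Favre" -- common-noun modifier names the class
            pairs[(dep, head)] += 1
        elif rel == "appos" and htag == "NNP" and dtag == "NN":
            # apposition: "Arafat, the leader, ..."
            pairs[(dep, head)] += 1
        elif rel == "be" and htag == "NN" and dtag == "NNP":
            # copula: "Favre is a quarterback" -- class as predicate nominal
            pairs[(head, dep)] += 1
    return pairs

edges = [
    ("nn", "Favre", "NNP", "quarterback", "NN"),
    ("appos", "Arafat", "NNP", "leader", "NN"),
    ("be", "quarterback", "NN", "Favre", "NNP"),
]
pairs = class_instances(edges)
print(pairs)
```

Aggregated over the collection, these counts yield the has_instance facts shown on the next slide.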
Class-Instance relations
World News:
has_instance(leader,'Yasir':'Arafat'):1491.
has_instance(spokesman,'Marlin':'Fitzwater'):1001.
has_instance(leader,'Mikhail':'S.':'Gorbachev'):980.
has_instance(chairman,'Yasir':'Arafat'):756.
has_instance(agency,'Tass'):637.
has_instance(leader,'Radovan':'Karadzic'):611.
has_instance(adviser,'Condoleezza':'Rice'):590.
…
So far
Propositions: <p, a> with P(p, a)
• p: predicate
• a: list of arguments <a1 … an>
• P(p, a): joint probability
Class-instance relations: <c, i> with P(c, i)
• c: class
• i: instance
• P(c, i): joint probability
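Both joint probabilities can be read off the store counts with a simple maximum-likelihood estimate. A minimal sketch (names are illustrative):

```python
from collections import Counter

def joint_probs(counts):
    # Maximum-likelihood estimate: normalize raw store counts to joint probabilities.
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

# Toy proposition counts, mirroring the "bomb attack" example above.
prop_counts = Counter({
    ("npn", "bomb", "in", "attack"): 13,
    ("nvpn", "bomb", "explode", "in", "attack"): 11,
})
P_pa = joint_probs(prop_counts)
print(P_pa[("npn", "bomb", "in", "attack")])  # 13/24
```

The same normalization applied to has_instance counts gives P(c, i).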
Probability of a predicate
Let’s consider the following example: Favre pass
Assume the text has pointed out he is a quarterback
What is Favre doing with the pass? The same as other quarterbacks
• The quarterbacks we observed before in the background collection – Proposition Store
Probability of a predicate
• Favre pass -> p | P(p|i)
• Favre -> quarterback | P(c|i)
• quarterback, pass -> throw | P(p|c)
We already have P(c|i)
We need to estimate P(p|c) (what other quarterbacks do with passes)
P(p|i) = Σ_c P(p|c) · P(c|i)

P(c_k|i) = P(c_k, i) / Σ_{j=1..n} P(c_j, i)
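The mixture estimate above can be sketched directly: normalize the joint counts P(c, i) over the classes of the instance, then mix the per-class predicate distributions. All numbers below are toy values, not from the paper:

```python
def cond_class_given_instance(joint_ci, instance):
    # P(c|i): normalize joint counts P(c, i) over all classes of this instance.
    rows = {c: n for (c, i), n in joint_ci.items() if i == instance}
    z = sum(rows.values())
    return {c: n / z for c, n in rows.items()}

def predicate_given_instance(P_p_given_c, joint_ci, instance):
    # P(p|i) = sum_c P(p|c) * P(c|i)
    out = {}
    for c, w in cond_class_given_instance(joint_ci, instance).items():
        for p, q in P_p_given_c.get(c, {}).items():
            out[p] = out.get(p, 0.0) + w * q
    return out

# Toy data: "Favre" was observed 8 times as quarterback, 2 times as player.
joint_ci = {("quarterback", "Favre"): 8, ("player", "Favre"): 2}
P_p_given_c = {"quarterback": {"throw": 0.7, "complete": 0.3},
               "player": {"catch": 1.0}}
P_p_i = predicate_given_instance(P_p_given_c, joint_ci, "Favre")
print(P_p_i)
```

With these toy numbers, "throw" dominates because "quarterback" is the most likely class for "Favre".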
Probability of a predicate
quarterback pass -> p | P(p|c)
• Steve:Young pass -> throw | P(p|i)
• Culpepper pass -> complete | P(p|i)
• …
We already have P(c, i), and P(p|i) comes from previous observation: the Proposition Store
P(p|c) = Σ_i P(p|i) · P(i|c)

P(i_k|c) = P(i_k, c) / Σ_{j=1..n} P(i_j, c)
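The symmetric estimate P(p|c) mixes the per-instance predicate distributions, weighting each instance by its share of the class counts. Again, the numbers are toy values for illustration:

```python
def cond_instance_given_class(joint_ci, cls):
    # P(i|c): normalize joint counts P(c, i) over all instances of this class.
    rows = {i: n for (c, i), n in joint_ci.items() if c == cls}
    z = sum(rows.values())
    return {i: n / z for i, n in rows.items()}

def predicate_given_class(P_p_given_i, joint_ci, cls):
    # P(p|c) = sum_i P(p|i) * P(i|c)
    out = {}
    for i, w in cond_instance_given_class(joint_ci, cls).items():
        for p, q in P_p_given_i.get(i, {}).items():
            out[p] = out.get(p, 0.0) + w * q
    return out

# Toy data: two observed quarterbacks with different "pass" predicates.
joint_ci = {("quarterback", "Steve:Young"): 6, ("quarterback", "Culpepper"): 4}
P_p_given_i = {"Steve:Young": {"throw": 1.0},
               "Culpepper": {"complete": 1.0}}
P_p_c = predicate_given_class(P_p_given_i, joint_ci, "quarterback")
print(P_p_c)
```

This is how "what quarterbacks do with passes" is assembled from individual quarterbacks observed in the background collection.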
Evaluation
We want to address the following questions:
• Do we find the paraphrases required to enable Textual Entailment?
• Do all the noun-noun dependencies need to be paraphrased?
• How frequently do NEs appear in them?
Experimental setting
Proposition Store from World News:
• 216,303 documents
• 7,800,000 sentences parsed
RTE-2 (Recognizing Textual Entailment):
• 83 entailment decisions depend on noun-noun paraphrases
• 77 different noun-noun paraphrases
Results
How frequently do NEs appear in these pairs?
• 82% of paraphrases contain at least one NE
• 62% are paraphrasing NE-N (e.g. Vikings quarterback)
Results
Do all the noun-noun dependencies need to be paraphrased? No, only 54% in our test set
Some compounds encode semantic relations such as:
• 12% are locative relations (e.g. New York club)
• Temporal relations (e.g. April 23rd strike, Friday semi-final)
• Class-instance relations (e.g. quarterback Favre)
• Measure, …
Some are trivial: 27% are paraphrased with “of”
Results
Do we find the paraphrases required to enable Textual Entailment? Yes, in 63% of non-trivial cases

Proposition type | Paraphrase
NPN   | Jackson trial ↔ trial against Jackson
      | engine problem ↔ problem with engine
NVN   | U.S. Ambassador ↔ Ambassador represents the U.S.
      | ETA bombing ↔ ETA carried_out bombing
NVNPN | wife of Joseph Wilson ↔ wife is married to Joseph Wilson
NVPN  | Vietnam veteran ↔ veteran comes from Vietnam
      | Shapiro’s office ↔ Shapiro works in office
      | Germany's people ↔ people live in Germany
      | Abu Musab al-Zarqawi's group ↔ group led by Abu Musab al-Zarqawi
Results
RTE-2 pair 485: paraphrase not found
United Nations vehicle ↔ United Nations produces vehicles
“United Nations” doesn’t share any class with the instances that “produce vehicles”:
Toyota vehicle -> develop, build, sell, produce, make, export, recall, assemble, …
Conclusions
• A significant proportion of noun-noun dependencies includes Named Entities
• Some noun-noun dependencies don’t require the retrieval of implicit predicates
• The method proposed is sensitive to different NEs: different NEs retrieve different predicates
• Current work: to select the most relevant paraphrase according to the text; we are exploring weighted abduction
Thanks!