
Introduction into Inference-Based Natural Language Understanding

Ekaterina Ovchinnikova, ISI, University of Southern California

katya@isi.edu

February 28th, UNED

Outline

1. Introduction: natural language understanding (NLU)

2. Knowledge for Reasoning

3. Automatic reasoning for NLU

4. Experiments

5. Conclusion

Introduction: Natural Language Understanding

Natural Language Understanding (NLU)

In order to understand natural language, we need to know a lot about the world and be able to draw inferences.

Text: “Romeo and Juliet” is one of Shakespeare’s early tragedies. The play has been highly praised by critics for its language and dramatic effect.


Knowledge:

• tragedies are plays

• plays are written in some language and have dramatic effect

• Shakespeare is a playwright; playwrights write plays

• “early” indicates time; time modifies events

• ...

Computational NLU: applications

Text: “Romeo and Juliet” is one of Shakespeare’s early tragedies. The play has been highly praised by critics for its language and dramatic effect.

Queries:

Shakespeare is the author of “Romeo and Juliet”.

Shakespeare went through a tragedy.

...

Applications: question answering, information extraction, automatic text summarization, semantic search, ...

Computational NLU: approaches

Shallow NLP methods are based on:

lexical overlap, pattern matching, distributional similarity, ...

(between the two lies a continuum of methods)

Deep NLP methods are based on:

semantic analysis, lexical and world knowledge, logical inference, ...

Inference-based NLU

TEXT: “Romeo and Juliet” is one of Shakespeare’s tragedies.

LOGICAL REPRESENTATION: “Romeo and Juliet”(x) ∧ tragedy(x) ∧ Shakespeare(y) ∧ rel(y,x)

KNOWLEDGE BASE:
Shakespeare(y) → playwright(y)
playwright(y) ∧ play(x) → write(y,x)
tragedy(x) → play(x)

INTERPRETATION: “Romeo and Juliet”(x) ∧ tragedy(x) ∧ play(x) ∧ Shakespeare(y) ∧ write(y,x)

Inference-based NLU pipeline

Text → Semantic parser → Logical representation → Inference machine → Final application (queries)

The semantic parser draws on knowledge about language (lexicon, grammar); the inference machine draws on a knowledge base about the world.
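To make the pipeline concrete, here is a minimal, self-contained Python sketch; the "parser" output and the Horn-rule knowledge base are toy stand-ins for the running example (hypothetical, not the actual system components, which would be a real semantic parser and a theorem prover or abductive reasoner):

```python
def semantic_parser(text):
    # Pretend-parse of the running example into a set of literals.
    return {"Romeo_and_Juliet(x)", "tragedy(x)", "Shakespeare(y)", "rel(y,x)"}

# Knowledge base: Horn rules as (set of premise literals, conclusion literal).
KB = [
    ({"tragedy(x)"}, "play(x)"),
    ({"Shakespeare(y)"}, "playwright(y)"),
    ({"playwright(y)", "play(x)"}, "write(y,x)"),
]

def inference_machine(logical_form, kb):
    """Forward-chain over the KB until no new literal can be added."""
    interpretation = set(logical_form)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in kb:
            if premises <= interpretation and conclusion not in interpretation:
                interpretation.add(conclusion)
                changed = True
    return interpretation

interpretation = inference_machine(semantic_parser("..."), KB)
print("write(y,x)" in interpretation)  # True: Shakespeare wrote the play
```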

Knowledge for Reasoning

Sources of machine-readable world knowledge

1. Lexical-semantic dictionaries

2. Distributional resources

3. Ontologies

Lexical-semantic dictionaries

Manually developed resources

Encode relations defined on word senses

Examples:  WordNet, FrameNet

[Diagram: a WordNet fragment (pine and pandanus/screw pine as types of tree; trunk/bole; forest/wood) and a FrameNet fragment (give, with roles donor/recipient/theme, causes get, with roles source/recipient/theme).]

Type of inference

∀x (dog(x) → animal(x)): Pluto is a dog ⇒ Pluto is an animal

∀x,y,z (give(x,y,z) → get(y,z)): John gave Mary a book ⇒ Mary got a book

Distributional resources

Automatically learned from corpora

Encode distributional properties of words

Examples:  VerbOcean, DIRT, Proposition Store, WikiRules

X finds a solution to Y ≈ Y is solved by X ≈ X resolves Y

John finds a solution to the problem ≈ the problem is solved by John ≈ John resolves the problem

Type of inference

∀x,y (solve(x,y) → ∃z (find(x,z) ∧ solution(z) ∧ to(z,y)))
John solves a puzzle ⇒ John finds a solution for a puzzle

Ontologies

Manually developed resources

Encode relationships between concepts

Examples:  SUMO, OpenCyc, DOLCE

∀x (French_Polynesia_Island(x) → Pacific_Island(x))
∀x (Pacific_Island(x) → Island(x) ∧ ∃y (located_in(x,y) ∧ Pacific_Ocean(y)))

Type of inference

Tahiti is a Pacific island ⇒ Tahiti is located in the Pacific Ocean

Three types of sources of world knowledge for NLU

| | 1. Lexical-semantic dictionaries | 2. Distributional resources | 3. Ontologies |
| relations between | word senses (tree1 → plant1, tree2 → structure1) | words (tree – wood, tree – leaf) | concepts (∀x (tree(x) → ∃y (part(y,x) ∧ branch(y)))) |
| knowledge/updates | common-sense / static | common-sense / dynamic | “scientific” / static |
| designed for reasoning (consistent) | no | no | yes |
| language-dependent | yes | yes | no |
| domain-dependent | no | no/yes | yes/no |
| constructed | manually | automatically | manually |
| structure | simple | no/simple | complex |
| lexicalized | yes | yes | no |
| probabilistic | poor | yes | no |

These three types of knowledge sources:
• contain disjoint knowledge with different properties
• are all useful for NLU
• can be used in combination

Logical Inference for NLU

Logical inference: deduction and abduction

Deduction: valid logical inference

∀x (p(x) → q(x))    Dogs are animals.
p(A)                Pluto is a dog.
------------------------------------
q(A)                Pluto is an animal.

Abduction: inference to the best explanation

∀x (p(x) → q(x))    If it rains then the grass is wet.
q(A)                The grass is wet.
------------------------------------
p(A)                It rains.
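The asymmetry between the two schemas fits in a few lines of Python; this is a toy sketch with a single propositional rule, not a general reasoner:

```python
# One rule p -> q, applied in the two directions discussed above.
RULES = [("rain", "wet_grass")]  # (p, q) encodes p -> q

def deduce(facts, rules):
    """Deduction: from p and p -> q, conclude q (always valid)."""
    conclusions = set(facts)
    for p, q in rules:
        if p in conclusions:
            conclusions.add(q)
    return conclusions

def abduce(observation, rules):
    """Abduction: from q and p -> q, hypothesize p.
    A plausible explanation, not a valid inference (q may have other causes)."""
    return {p for p, q in rules if q == observation}

print(deduce({"rain"}, RULES))     # {'rain', 'wet_grass'}
print(abduce("wet_grass", RULES))  # {'rain'}
```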

Deduction for NLU

Blackburn and Bos (2005): valid inference given a FOL representation of text

1. Proving that one text follows from another text and the KB
Text1: Pluto is a dog.        ∃Pluto (dog(Pluto))
Text2: Pluto is an animal.    ∃Pluto (animal(Pluto))    Text1 entails Text2
KB: All dogs are animals.     ∀x (dog(x) → animal(x))

2. Constructing models
Text: John saw a house. The door was open.    ∃John,h,d (see(John,h) ∧ house(h) ∧ door(d) ∧ open(d))
KB: All houses have doors as their parts.     ∀x (house(x) → ∃y (door(y) ∧ part-of(y,x)))
Model: see(John,h) ∧ house(h) ∧ door(d1) ∧ open(d1) ∧ door(d2) ∧ part-of(d2,h)
Min. model: see(John,h) ∧ house(h) ∧ door(d) ∧ open(d) ∧ part-of(d,h)
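A toy illustration of why the minimal model matters (a sketch; real model builders search for models by increasing domain size, which yields the identification automatically):

```python
# Two candidate models of "John saw a house. The door was open." plus the
# axiom that houses have doors as parts.
model_two_doors = {"see(John,h)", "house(h)", "door(d1)", "open(d1)",
                   "door(d2)", "part-of(d2,h)"}   # open door != house's door
model_minimal = {"see(John,h)", "house(h)", "door(d)", "open(d)",
                 "part-of(d,h)"}                  # the two doors identified

# Preferring the smaller model captures the intuitive reading that "the
# door" is the door of the house John saw.
print(min([model_two_doors, model_minimal], key=len) == model_minimal)  # True
```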

Weighted abduction for NLU

Hobbs et al. (1993): best interpretation of text based on meaning overlap

Text: John composed a sonata.

KB: (1) ∀x (sonata(x)^1.2 → work-of-art(x))
    (2) ∀x,y (put-together(x,y)^0.6 ∧ collection(y)^0.6 → compose(x,y))
    (3) ∀x,y (create(x,y)^0.6 ∧ work-of-art(y)^0.6 → compose(x,y))

Input: compose(John,s)$10 ∧ sonata(s)$10 : $30

Backchaining: put-together(John,x)$6 ∧ collection(x)$6 (axiom 2), or create(John,y)$6 ∧ work-of-art(y)$6 (axiom 3); backchaining on work-of-art(y) via (1) yields sonata(y)$12

Meaning overlap: sonata(y) unifies with sonata(s), so it is paid only once, which makes the create/work-of-art reading the cheapest interpretation.
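A schematic cost computation for this example, under simplifying assumptions (propositional literals, costs propagated as weight × cost, unified literals paid once at the cheaper price); the slide's exact dollar figures may come from a slightly different cost scheme:

```python
# Backchaining axioms: conclusion -> list of alternative antecedent sets,
# each antecedent carrying its weight.
AXIOMS = {
    "compose": [[("create", 0.6), ("work_of_art", 0.6)],        # axiom (3)
                [("put_together", 0.6), ("collection", 0.6)]],  # axiom (2)
    "work_of_art": [[("sonata", 1.2)]],                         # axiom (1)
}

def backchain(literal, cost, alternative=0):
    """Replace a literal by one axiom's antecedents, scaling costs by weight."""
    return [(ant, w * cost) for ant, w in AXIOMS[literal][alternative]]

# Input: compose(John,s)$10 & sonata(s)$10.
create, work_of_art = backchain("compose", 10.0, 0)        # $6 each
(sonata_assumed,) = backchain("work_of_art", work_of_art[1])
# Meaning overlap: the assumed sonata unifies with the input sonata(s)$10,
# so it is paid once, at the cheaper of the two prices.
cost_create_reading = create[1] + min(sonata_assumed[1], 10.0)

put_together, collection = backchain("compose", 10.0, 1)
cost_put_together_reading = put_together[1] + collection[1] + 10.0  # no overlap

print(cost_create_reading, cost_put_together_reading)  # 13.2 vs 22.0
```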

Experiments: 1. Recognizing Textual Entailment

Recognizing textual entailment (RTE)

Task: given a Text‐Hypothesis (T‐H) pair, predict entailment

Text: John gave a book to Mary.

Hypothesis: Mary got a book.

Entailment: YES

Text: John gave a book to Mary.

Hypothesis: Mary read a book.

Entailment: NO

Experimental data: Second RTE Challenge (RTE-2) datasets. Development and test sets contain 800 T-H pairs each. Evaluation measure: accuracy, the percentage of pairs correctly judged.

Deduction for RTE

1. Convert T and H into FOL logical forms
2. Query theorem prover
   IF KB ∧ T → H proven THEN return “entailment”
   IF ¬(KB ∧ T ∧ H) proven THEN return “inconsistent”
3. Query model builder
   IF models of T ⊭ H towards KB THEN return “no entailment”
   IF models of T ⊨ H towards KB THEN return “entailment possible”

Related work: Akhmatova and Molla (2005), Fowler et al. (2005), Bos and Markert (2006), Tatu and Moldovan (2007)
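The decision procedure above can be sketched as follows; the three stub functions are hypothetical interfaces where an off-the-shelf FOL theorem prover and model builder would be plugged in:

```python
def prove(premises, goal):
    """Stub: True iff premises logically entail goal (theorem prover)."""
    raise NotImplementedError

def satisfiable(formulas):
    """Stub: True iff the formulas have a model (model builder)."""
    raise NotImplementedError

def holds_in_models(premises, goal):
    """Stub: True iff goal holds in the models built for the premises."""
    raise NotImplementedError

def rte_by_deduction(kb, t, h):
    # Step 2: query the theorem prover.
    if prove(kb + [t], h):
        return "entailment"
    if not satisfiable(kb + [t, h]):
        return "inconsistent"
    # Step 3: fall back on the model builder when no proof is found.
    if not holds_in_models(kb + [t], h):
        return "no entailment"
    return "entailment possible"
```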

RTE-2: results for deductive reasoner Nutcracker

RTE system Nutcracker (Bos and Markert, 2005)

Knowledge Base

WordNet inheritance: ∀x (dog(x) → animal(x)); synonymy: ∀x (heat(x) ↔ warmth(x)) (ca. 619,000 axioms)

FrameNet relations: ∀x,y,z (give(x,y,z) → get(y,z)) (ca. 1,500 axioms)

| KB | Proof found (no. of pairs) | No proof (no. of pairs) | Total (no. of pairs) |
| No KB | 19 (2.4%) | 781 | 800 |
| WordNet and FrameNet axioms | 22 (2.8%) | 778 | 800 |

Deductive reasoning for RTE: Discussion (1)

1. Finding proofs failed because the KB was incomplete

T: This dog barks.
H: Some animal is noisy. ?

Solutions: LF decomposition (proposition-to-proposition entailment); introduction of heuristics (model size, sentence length, semantic similarity)

2. No handling of ambiguity

∀x (tree(x) → plant(x))
∀x (tree(x) → structure(x))

Solution: statistical disambiguation before reasoning (not based on knowledge, depends on annotated corpora)

Deductive reasoning for RTE: Discussion (2)

3. Unbounded inference

dog(x) → animal(x) → creature(x) → physical_object(x) → entity(x)

Everything will be a part of the model!

4. Complexity of reasoning (FOL is undecidable)

ca. 30 min per solved problem on average (20 words per sentence on average)

Abduction for RTE

1. Construct best interpretations of T and H towards KB

2. Add interpretation of T to KB

3. Construct best interpretation of H towards KB + T

4. Compare the cost of the interpretation of H towards KB and towards KB + T
5. If the cost difference exceeds a trained threshold, return “entailment”. Otherwise, return “no entailment”.

Related work: Raina et al. (2005), Ovchinnikova et al. (2011)
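In code form, the procedure reduces to one cost comparison; best_cost() is a hypothetical stub standing in for the weighted-abduction engine:

```python
def best_cost(logical_form, kb):
    """Stub: cost of the best abductive interpretation of logical_form
    with respect to kb (the abductive reasoner goes here)."""
    raise NotImplementedError

def rte_by_abduction(kb, t, h, threshold):
    # Steps 1-2: in the full procedure, the best interpretation of T (not
    # the raw logical form) is added to the KB; kb + [t] abbreviates that.
    cost_h_alone = best_cost(h, kb)            # H towards KB
    cost_h_given_t = best_cost(h, kb + [t])    # H towards KB + T
    # Steps 4-5: T entails H if knowing T makes H much cheaper to assume.
    if cost_h_alone - cost_h_given_t > threshold:
        return "entailment"
    return "no entailment"
```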

RTE-2: results for abductive reasoner Mini-TACITUS

Abductive reasoner Mini-TACITUS (Mulkar et al., 2007)

Knowledge Base

WordNet inheritance, instantiation, entailment, ... (ca. 507,500 axioms)

FrameNet synonymy and relations (ca. 55,300 axioms)

| Knowledge base | Accuracy | Avg. axioms per sentence (T) | Avg. axioms per sentence (H) |
| No KB | 57.3% | 0 | 0 |
| WordNet | 59.6% | 294 | 111 |
| FrameNet | 61.1% | 1233 | 510 |
| WordNet+FrameNet | 62.6% | 1527 | 621 |

Best run outperforms 21 of 24 RTE-2 participants (2 systems: 73% and 75%; 2 systems: 62% and 63%; 20 systems: from 55% to 61%)

Abductive reasoning for RTE: Discussion (1)

1. No treatment of logical connectives, quantifiers, and modality in natural language. Without such treatment:
“If A then B” entails “A and B”
“A is not B” entails “A is B”

Solution: explicitly axiomatizing logical connectives
... if(e1,e2)
... or(e1,e2)

2. Over‐unification

John eats an apple and Bill eats an apple.

Solution:

John(x1) ∧ eat(e1,x1,y1) ∧ apple(y1) ∧ Bill(x2) ∧ eat(e2,x2,y2) ∧ apple(y2) ∧ y1≠y2
But how to formulate the constraints?

Abductive reasoning for RTE: Discussion (2)

3. Complexity of reasoning (Horn clauses: exponential). In 30 min per sentence, optimal solutions were found for 6% of the cases.

Mini-TACITUS was not designed for large-scale processing; substantial optimization is possible.

Deduction vs. abduction for NLU

| | Deduction | Weighted abduction |
| ambiguity | unable to choose between alternative readings if both are consistent | choice between readings based on meaning overlap |
| unbounded inference | unlimited inference chains | an inference is appropriate if it is part of the lowest-cost proof |
| incomplete knowledge | if a piece of relevant knowledge is missing, fails to find a proof | assumptions are allowed |
| expressivity/complexity | FOL: expressive enough to represent most of NL; reasoning is computationally complex | Horn clauses: restricted quantification and logical connectives; reasoning is computationally cheaper |

Experiments: 2. Semantic Role Labeling

Semantic role labeling (SRL)

Task: given a predicate, disambiguate it and label its arguments with semantic roles

Text → senses of “take”:
John took Mary home. → Bringing [agent, theme, goal]
John took drugs. → Ingest_substance [ingestor, substance]
John took a bus. → Ride_vehicle [theme, vehicle]
The walk took 30 minutes. → Taking_time [activity, time_length]

Experimental data: RTE-2 Challenge test data annotated with FrameNet frames and roles (Burchardt and Pennacchiotti, 2008), used as a gold standard

Abduction for SRL

In the abductive framework, SRL is a by-product of constructing best interpretations

Text: John took the bus. He got off at 10th street.

LF: John(x1) ∧ take(e1,x1,x2) ∧ bus(x2) ∧ get_off(e2,x1) ∧ at(e2,x3) ∧ 10th_street(x3)

Axioms from FrameNet:
1. Ride_vehicle(e1,x1,x2) → take(e1,x1,x2)
2. Taking_time(e1,x1,x2) → take(e1,x1,x2)
3. Disembarking(e1,x1,x2) → get_off(e1,x1) ∧ at(e1,x2)
4. Disembarking(e1,x1,x2) → Ride_vehicle(e2,x1,x3)

Interpretation: John(x1) ∧ take(e1,x1,x2) ∧ bus(x2) ∧ Ride_vehicle(e1,x1,x2) ∧ get_off(e2,x1) ∧ at(e2,x3) ∧ 10th_street(x3) ∧ Disembarking(e2,x1,x3)
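Once the interpretation is built, the SRL output can be read directly off the frame literals it contains. A toy sketch; the data structures and role inventories below are illustrative simplifications, not FrameNet's exact role names:

```python
# Frame literals from the interpretation above (toy representation).
interpretation = [
    ("Ride_vehicle", "e1", ["x1", "x2"]),
    ("Disembarking", "e2", ["x1", "x3"]),
]

# Illustrative role inventories (assumed, simplified from FrameNet).
FRAME_ROLES = {
    "Ride_vehicle": ["theme", "vehicle"],
    "Disembarking": ["theme", "goal"],
}

# SRL labels fall out of the interpretation: each frame literal pairs a
# disambiguated sense with its role fillers.
for frame, event, args in interpretation:
    print(event, frame, dict(zip(FRAME_ROLES[frame], args)))
# e1 Ride_vehicle {'theme': 'x1', 'vehicle': 'x2'}
# e2 Disembarking {'theme': 'x1', 'goal': 'x3'}
```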

SRL for RTE-2 test set: results

Frame-Annotated Corpus for Textual Entailment (FATE) used as a gold standard:
• annotates the RTE-2 test set
• annotates only frames relevant for computing entailment

Results are compared against Shalmaneser, the state-of-the-art system for assigning FrameNet frames and roles (alone and boosted with the WordNet Detour to FrameNet).
• only recall is considered for frame match

| System | Frame match: Recall | Role match: Precision | Role match: Recall |
| Shalmaneser | 0.55 | 0.54 | 0.37 |
| Shalmaneser+Detour | 0.85 | 0.52 | 0.36 |
| Mini-TACITUS | 0.65 | 0.55 | 0.30 |

Experiments: 3. Paraphrasing Noun-Noun Dependencies

Paraphrasing noun-noun dependencies in RTE

Task: given a noun-noun construction (noun compound or possessive) in an entailment pair, find the best paraphrases

T: Muslims make up some 3.2 million of Germany’s 82 million people...
H: 82 million people live in Germany.

Text: Germany’s people ...
Paraphrases: Germany has people; people from Germany; people live in Germany; ...

Experimental data: 1600 pairs of the RTE-2 set have been manually investigated; only those NN-dependencies which are crucial for inferring entailment have been considered (93 T-H pairs)

Abduction for paraphrasing NN-dependencies

Text: Shakespeare’s poem was written...

LF: Shakespeare(x) ∧ poem(y) ∧ of(y,x) ∧ write(e,z,y)

Axioms from Proposition Store:
Shakespeare(x) ∧ write(e,x,y) ∧ poem(y) → Shakespeare(x) ∧ poem(y) ∧ of(y,x)
Shakespeare(x) ∧ have(e,x,y) ∧ poem(y) → Shakespeare(x) ∧ poem(y) ∧ of(y,x)

Interpretation: Shakespeare(x) ∧ poem(y) ∧ of(y,x) ∧ write(e,x,y)

Paraphrasing axioms from Proposition Store

Peñas and Hovy (2010)

Dependency parses of newspaper texts are used to generate propositions

Propositions from the sentence:

Steve Walsh threw a pass to Brent Jones in the first quarter.

[Steve_Walsh:noun, throw:verb, pass:noun]

[Steve_Walsh:noun, throw:verb, pass:noun, to:prep, Brent_Jones:noun]

[Steve_Walsh:noun, throw:verb, pass:noun, in:prep, quarter:noun]

Propositions containing the nouns Germany and people:

[people:noun, in:prep, Germany:noun] :6433

[Germany:noun, have:verb, people:noun] :2035

[people:noun, live:verb, in:prep, Germany:noun] :288
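One way such counts can feed the abductive reasoner is by converting relative frequency into assumption cost, so frequent paraphrases are cheaper to assume. A sketch under that assumption (the actual weighting scheme may differ):

```python
import math

# Proposition Store counts for Germany/people (from the slide).
propositions = {
    ("people", "in", "Germany"): 6433,
    ("Germany", "have", "people"): 2035,
    ("people", "live", "in", "Germany"): 288,
}
total = sum(propositions.values())

# Illustrative weighting: cost = -log relative frequency, so the commonest
# paraphrase of "Germany's people" is the cheapest to assume.
for prop, count in sorted(propositions.items(), key=lambda kv: -kv[1]):
    cost = -math.log(count / total)
    print(" ".join(prop), f"count={count} cost={cost:.2f}")
```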

NN-dependencies in RTE-2: results for abduction

Peñas and Ovchinnikova (2012)

KB:  WordNet, FrameNet, and Proposition Store

| | Number of pairs |
| Correct paraphrasing | 42 |
| Wrong paraphrasing | 1 |
| No NN construction found | 27 |
| No relevant paraphrase found | 23 |

Outcomes:

• Integrating heterogeneous knowledge gives advantages (e.g., legalization of marijuana ⇒ drugs legalization)

• Not all applied paraphrases (18 out of 42) were the most frequent ones

Conclusions

World knowledge is not a bottleneck anymore

Inference-based approaches can compete with purely statistical ones

Weighted abduction seems to be more promising for NLU than classical deduction

Complexity of reasoning is still an issue

Outlook

Main obstacles to large-scale inference-based NLU:

1. Lack of structured world knowledge applicable for reasoning

2. Computational complexity of reasoning

The situation is changing:
• a lot of machine-readable knowledge is available
• computational capacities increase
• new reasoners are developed

It’s time to look again at inference-based NLU!

Thank you!