Getting Computers to Understand What They Read (Or Hear)
Christopher Manning
http://nlp.stanford.edu/
Computer Forum 2012
The future was …
A vast quantity of information, contained in knowledge bases, with artificial intelligence systems for
reasoning over it
The future is …
A vast quantity of information in an ugly mess known as The Web.
But it’s all indexed and easily searchable, and, for humans,
most of the time it actually works
amazingly well.
But how can we use it to get computers to do more advanced tasks
which require getting knowledge from language and putting facts together?
We need machine reading.
We need more than word counts
Extracting Knowledge
Textual abstract: A summary for humans
“The Lawrence Livermore National Laboratory (LLNL) in Livermore, California is a scientific research laboratory founded by the University of California in 1952.”
Structured knowledge: A summary for machines (entity–relation–entity triples)
LLNL EQ Lawrence Livermore National Laboratory
LLNL LOC-IN California
Livermore LOC-IN California
LLNL IS-A scientific research laboratory
LLNL FOUNDED-BY University of California
LLNL FOUNDED-IN 1952
Machine Reading with Distant Supervision
[Mintz, et al. ACL 2009; Surdeanu et al. 2011]
• If we had relations marked in texts, we could train a conventional relation extraction system …
• Can we exploit the abundant relation information already available in resources such as DBpedia or Freebase to bootstrap systems for machine reading?
• Method: use the database as “distant supervision” for text
• The challenge is dealing with the “noise” that this introduces
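The distant supervision method can be sketched as follows — a minimal illustration with a toy knowledge base and corpus (the entities and sentences here are only examples, not the actual training data):

```python
# Minimal sketch of distant supervision for relation extraction.
# Idea: any sentence that mentions both entities of a KB triple is
# (noisily) labeled with that triple's relation and used as training data.

kb = {
    ("LLNL", "University of California"): "FOUNDED-BY",
    ("Montmartre", "Paris"): "IS-IN",
}

corpus = [
    "LLNL was founded by the University of California in 1952.",
    "Montmartre is a hill in the north of Paris.",
    "The University of California has no campus near Montmartre.",
]

def distant_label(sentences, knowledge_base):
    """Pair each sentence with a KB relation when both entities appear in it."""
    training_data = []
    for sent in sentences:
        for (e1, e2), rel in knowledge_base.items():
            if e1 in sent and e2 in sent:
                training_data.append((sent, e1, e2, rel))
    return training_data

examples = distant_label(corpus, kb)
for sent, e1, e2, rel in examples:
    print(f"{e1} {rel} {e2}  <-  {sent}")
```

The “noise” arises because a sentence can mention both entities without expressing the relation (as the third sentence would, if its entity pair were in the KB); the learning method has to tolerate such mislabeled examples.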
Results
• Precision of extracted facts: about 70%
• New relations learned:
Montmartre IS-IN Paris
Fort Erie IS-IN Ontario
Vince McMahon FOUNDED WWE
Fyodor Kamensky DIED-IN Clearwater
Upton Sinclair WROTE Lanny Budd
Thomas Mellon HAS-PROFESSION Judge
Where syntactic knowledge helps
How useful are syntactic representations for this goal?
Back Street is a 1932 film made by Universal Pictures, directed by John M. Stahl, and produced by Carl Laemmle Jr.
– Back Street and John M. Stahl are far apart in the surface string
– But they are close together in a dependency parse
Stanford Dependencies as a representation for relation extraction
The little boy jumped over the fence.

det(boy-3, The-1)
amod(boy-3, little-2)
nsubj(jumped-4, boy-3)
det(fence-7, the-6)
prep_over(jumped-4, fence-7)

[Figure: the sentence shown both as a phrase-structure (S/NP/VP) tree and as a dependency graph rooted at “jumped”]
[de Marneffe & Manning 2008]
Stanford Dependencies as a representation for relation extraction
• Stanford Dependencies favor short paths between related content words
[Figure from Björne et al. 2009: over ¾ of the dependency paths between related content words are short]
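The “Back Street” point can be made concrete with a shortest-path search over dependency edges. The edge list below is a hand-constructed, illustrative fragment of a parse of that sentence (not actual Stanford parser output):

```python
from collections import deque

# Hand-constructed (illustrative) dependency edges for:
# "Back Street is a 1932 film made by Universal Pictures, directed by
#  John M. Stahl, and produced by Carl Laemmle Jr."
# Each pair links a head word to one of its dependents.
edges = [
    ("film", "Street"),     # subject of the copular clause
    ("film", "made"),       # participial modifier
    ("film", "directed"),   # participial modifier
    ("directed", "Stahl"),  # agent of "directed"
    ("made", "Pictures"),   # agent of "made"
]

def shortest_path(edge_list, start, goal):
    """Breadth-first search over the (undirected) dependency graph."""
    graph = {}
    for head, dep in edge_list:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt == goal:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# "Street" and "Stahl" are many words apart in the surface string,
# but only three dependency edges apart:
print(shortest_path(edges, "Street", "Stahl"))
# ['Street', 'film', 'directed', 'Stahl']
```

Relation extraction features are typically read off exactly such short head-to-head paths, which is why the dependency representation helps.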
How do we design a human language understanding system?
• Most systems use a pipeline of processing stages:
  – Tokenize
  – Part-of-speech tagging
  – Named entities
  – Syntactic parse
  – Semantic roles
  – Coreference
  – …
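Such a pipeline is just a chain of stage functions, each consuming the previous stage's output — a minimal sketch with toy stand-ins for the real components (the stage logic here is deliberately trivial and hypothetical):

```python
# Sketch of an NLP pipeline as a chain of stage functions. Real systems
# would call a tokenizer, tagger, parser, etc.; these are toy stand-ins.

def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def tag_pos(doc):
    # Toy tagger: capitalized tokens -> NNP, everything else -> X.
    doc["pos"] = ["NNP" if t[0].isupper() else "X" for t in doc["tokens"]]
    return doc

def find_named_entities(doc):
    doc["entities"] = [t for t, p in zip(doc["tokens"], doc["pos"]) if p == "NNP"]
    return doc

PIPELINE = [tokenize, tag_pos, find_named_entities]

def run(text):
    doc = {"text": text}
    for stage in PIPELINE:      # each stage consumes the previous output
        doc = stage(doc)
    return doc

result = run("LLNL is in Livermore California")
print(result["entities"])
# ['LLNL', 'Livermore', 'California']
```

The weakness of this design is that an error in an early stage propagates to every later stage — the motivation for modeling the stages jointly.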
Probabilistic joint inference helps component tasks [Finkel & Manning, NAACL 2009, 2010]
[Chart: Named Entity Recognition F1 score on OntoNotes, by section (ABC, CNN, MNB, NBC, PRI, VOA), comparing the Baseline against Joint Inference; F1 values range roughly from 55 to 90]
Goal: joint modeling of the many phases of linguistic analysis – here, parsing and named entities
Fixed 24% of named entity boundary errors and incorrect label errors
22% improvement in parsing scores
How can we understand relationships between pieces of text?
• Can one conclude one piece of text from another?
  – Emphasis is on handling the variability of linguistic expression
• This textual inference technology would enable:
  – Semantic search: “lobbyists attempting to bribe U.S. legislators”
    → The A.P. named two more senators who received contributions engineered by lobbyist Jack Abramoff in return for political favors.
  – Question answering: “Who bought J.D. Edwards?”
    → Thanks to its recent acquisition of J.D. Edwards, Oracle will soon be able …
  – Customer email response
  – Paraphrase and contradiction detection
Natural Logic [MacCartney & Manning 2008, 2009]
Natural logic attempts to capture valid inferences in terms of their surface linguistic forms – a revival of Aristotelian syllogistics.

An example:
P: Jimmy Dean refused to move without blue jeans.
H: James Dean didn’t dance without pants.
→ yes

OK, the example is contrived, but it compactly exhibits containment, exclusion, and implicativity….
7 basic entailment relations

symbol  name                                    example
P = Q   equivalence                             couch = sofa
P ⊏ Q   forward entailment (strict)             crow ⊏ bird
P ⊐ Q   reverse entailment (strict)             European ⊐ French
P ^ Q   negation (exhaustive exclusion)         human ^ nonhuman
P | Q   alternation (non-exhaustive exclusion)  cat | dog
P _ Q   cover (exhaustive non-exclusion)        animal _ nonhuman
P # Q   independence                            hungry # hippo

(Venn diagram for each relation omitted)
Relations are defined for all semantic types: tiny ⊏ small, hover ⊏ fly, kick ⊏ strike, this morning ⊏ today, in Beijing ⊏ in China, everyone ⊏ someone, all ⊏ most ⊏ some
Lexical entailment classification

P: Jimmy Dean refused to move without blue jeans
H: James Dean didn’t dance without pants

edit index   1    2    3    4    5    6    7    8
edit type    SUB  DEL  INS  INS  SUB  MAT  DEL  SUB
lex feats    strsim=0.67, implic:–/o, cat:aux, cat:neg, hypo, hyper
lex entrel   =    |    =    ^    ⊐    =    ⊏    ⊏
Entailment projection

P: Jimmy Dean refused to move without blue jeans
H: James Dean didn’t dance without pants

edit index     1    2    3    4    5    6    7    8
edit type      SUB  DEL  INS  INS  SUB  MAT  DEL  SUB
lex entrel     =    |    =    ^    ⊐    =    ⊏    ⊏
projectivity   ↑    ↑    ↑    ↑    ↓    ↓    ↑    ↑
atomic entrel  =    |    =    ^    ⊏    =    ⊏    ⊏
Entailment composition

P: Jimmy Dean refused to move without blue jeans
H: James Dean didn’t dance without pants

edit index     1    2    3    4    5    6    7    8
edit type      SUB  DEL  INS  INS  SUB  MAT  DEL  SUB
atomic entrel  =    |    =    ^    ⊏    =    ⊏    ⊏
composition    =    |    |    ⊏    ⊏    ⊏    ⊏    ⊏

Final answer: ⊏ (the hypothesis is entailed)

For example, composing alternation with negation:
fish | human
human ^ nonhuman
∴ fish ⊏ nonhuman  ✓
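The projection and composition steps above can be sketched in code — a minimal, illustrative subset encoding only the handful of table entries needed for these examples (MacCartney & Manning define the full projection and join tables):

```python
# Minimal sketch of natural logic machinery (illustrative subset only).

# Projection: how an atomic relation is transformed by its context.
# In an upward-monotone context the relation passes through unchanged;
# in a downward-monotone context forward and reverse entailment swap.
def project(relation, monotonicity):
    if monotonicity == "down":
        return {"⊏": "⊐", "⊐": "⊏"}.get(relation, relation)
    return relation

# Partial join table: composing relation r1 followed by r2.
JOIN = {
    ("|", "^"): "⊏",   # alternation then negation => forward entailment
    ("⊏", "⊏"): "⊏",   # forward entailment is transitive
    ("=", "|"): "|",
    ("⊏", "="): "⊏",
}

def compose(relations):
    """Fold the join table across a sequence of relations."""
    result = relations[0]
    for rel in relations[1:]:
        result = JOIN.get((result, rel), "#")  # unknown joins -> independence
    return result

# "move" -> "dance" is ⊐ (reverse entailment), but the edit sits under
# "refused ... without", a downward-monotone context, so it projects to ⊏:
print(project("⊐", "down"))   # ⊏

# fish | human, human ^ nonhuman  =>  fish ⊏ nonhuman
print(compose(["|", "^"]))    # ⊏
```

Any join not covered by the (here deliberately tiny) table falls back to # (independence), which is the safe "no conclusion" answer.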
Multiword paraphrases
• But this system is not so good at working out “multiword paraphrases”
– “walked inland” vs. “moved away from the coast”
– “Pollack said the plaintiffs failed to show that Merrill and Blodget directly caused their losses” vs. “Basically, the plaintiffs did not show that omissions in Merrill’s research caused the claimed losses”
Hierarchical Deep Learning: Unsupervised Recursive Autoencoder
Recursive autoencoders capture semantic similarity
Recursive autoencoders for full-sentence paraphrase detection
Experiments on Microsoft Research Paraphrase Corpus (Dolan et al. 2004)
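One composition step of a recursive autoencoder can be sketched as follows — an illustrative toy with random, untrained weights (the actual model learns the weights by minimizing reconstruction error while recursing over a parse tree):

```python
import numpy as np

# Minimal sketch of one recursive-autoencoder step. Two child vectors
# are encoded into a parent vector of the same size; decoding the parent
# back and scoring the reconstruction gives the (unsupervised) training
# signal, so no paraphrase labels are needed.

rng = np.random.default_rng(0)
d = 4                                            # toy embedding dimension

W_enc = rng.standard_normal((d, 2 * d)) * 0.1    # encoder weights (untrained)
W_dec = rng.standard_normal((2 * d, d)) * 0.1    # decoder weights (untrained)

c1 = rng.standard_normal(d)                      # child vector, e.g. "blue"
c2 = rng.standard_normal(d)                      # child vector, e.g. "jeans"

def encode(child1, child2):
    """Compose two child vectors into one parent vector."""
    return np.tanh(W_enc @ np.concatenate([child1, child2]))

def reconstruction_error(child1, child2):
    """Decode the parent and score how well the children are recovered."""
    parent = encode(child1, child2)
    recon = W_dec @ parent
    target = np.concatenate([child1, child2])
    return float(np.sum((recon - target) ** 2))

parent = encode(c1, c2)
print(parent.shape)                  # same size as each child, so the
print(reconstruction_error(c1, c2))  # step can be applied recursively
```

Because the parent has the same dimension as each child, the step can be applied bottom-up over a whole parse tree, yielding phrase and sentence vectors that can then be compared for paraphrase detection.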
Language is inherently connected to people
“… the common misconception [is] that language use has primarily to do with words and what they mean. It doesn’t. It has primarily to do with people and what they mean.”
Asking questions and influencing answers
Clark & Schober, 1992
What does it mean?
A: Was the movie good?
B: Hysterical. We laughed so hard.
Was it a good movie? YES/NO ?
The outpouring of social language use on the web lets us learn what people mean as never before.
Review ratings can teach modifier scales
Grounded learning of answer interpretations
A: Is this hurricane season extraordinary?
B: Very unusual in the sense of how many storms we've had.
• We learn “contingent oppositions”
A: Is Obama qualified?
B: I think he is young.
Envoi
• Probabilistic models have given us very good tools for analyzing human language sentences
• We can extract participants and their relations with good accuracy
• There is exciting work in text understanding and inference based on these foundations
• This provides a basis for computers to do higher-level tasks that involve knowledge & reasoning
• But much work remains to achieve the language competence of science fiction robots….