
Formal Semantics

Slides by Julia Hockenmaier, Laura McGarrity, Bill McCartney, Chris Manning, and Dan Klein

Formal Semantics

It comes in two flavors:
• Lexical semantics: the meaning of words
• Compositional semantics: how the meanings of individual units combine to form the meaning of larger units

What is meaning?

• Meaning ≠ Dictionary entries
– Dictionaries define words using words. Circularity!

Reference

• Referent: the thing/idea in the world that a word refers to

• Reference: the relationship between a word and its referent

Reference

[Diagram: “Barack Obama” and “the president” both picking out the same individual]

The president is the commander-in-chief.
= Barack Obama is the commander-in-chief.

Reference

[Diagram: “Barack Obama” and “the president” both picking out the same individual]

I want to be the president.
≠ I want to be Barack Obama.

Reference

• Tooth fairy?

• Phoenix?

• Winner of the 2016 presidential election?

What is meaning?

• Meaning ≠ Dictionary entries
• Meaning ≠ Reference

Sense

• Sense: The mental representation of a word or phrase, independent of its referent.

Sense ≠ Mental Image

• A word may have different mental images for different people.
– E.g., “mother”

• A word may conjure a typical mental image (a prototype), but can signify atypical examples as well.

Sense v. Reference

• A word/phrase may have sense, but no reference:
– King of the world
– The camel in CIS 8538
– The greatest integer
– The

• A word may have reference, but no sense:
– Proper names: Dan McCloy, Kristi Krein (who are they?!)

Sense v. Reference

• A word/phrase may have the same referent, but more than one sense:
– The morning star / the evening star (Venus)

• A word may have one sense, but multiple referents:
– Dog, bird

Some semantic relations between words

• Hyponymy: subclass
– Poodle < dog
– Crimson < red
– Red < color
– Dance < move

• Hypernymy: superclass

• Synonymy:
– Couch / sofa
– Manatee / sea cow

• Antonymy:
– Dead / alive
– Married / single

Lexical Decomposition

• Word sense can be represented with semantic features:
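E.g., a classic decomposition represents woman as [+human, +adult, +female] and girl as [+human, −adult, +female]. A minimal Python sketch (the feature inventory here is illustrative, not exhaustive):

```python
# Word senses as bundles of binary semantic features.
FEATURES = {
    "woman": {"human": True,  "adult": True,  "female": True},
    "girl":  {"human": True,  "adult": False, "female": True},
    "man":   {"human": True,  "adult": True,  "female": False},
    "boy":   {"human": True,  "adult": False, "female": False},
}

def shared_features(w1, w2):
    """Features on which two senses agree."""
    return {k: v for k, v in FEATURES[w1].items() if FEATURES[w2].get(k) == v}

print(shared_features("woman", "girl"))  # {'human': True, 'female': True}
```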

Compositional Semantics

• The study of how meanings of small units combine to form the meaning of larger units

The dog chased the cat ≠ The cat chased the dog.
i.e., the whole does not equal the sum of the parts.

The dog chased the cat = The cat was chased by the dog.
i.e., syntax matters in determining meaning.

Principle of Compositionality

The meaning of a sentence is determined by the meaning of its words in conjunction with the way they are syntactically combined.

Exceptions to Compositionality

• Anomaly: when phrases are well-formed syntactically, but not semantically
– Colorless green ideas sleep furiously. (Chomsky)
– That bachelor is pregnant.

Exceptions to Compositionality

• Metaphor: the use of an expression to refer to something that it does not literally denote in order to suggest a similarity
– Time is money.
– The walls have ears.

Exceptions to Compositionality

• Idioms: phrases with fixed meanings that are not composed of the literal meanings of the words
– Kick the bucket = ‘die’ (*The bucket was kicked by John.)
– When pigs fly = ‘it will never happen’ (*She suspected pigs might fly tomorrow.)
– Bite off more than you can chew = ‘to take on too much’ (*He chewed just as much as he bit off.)

Idioms in other languages

Logical Foundations for Compositional Semantics

• We need a language for expressing the meaning of words, phrases, and sentences

• Many possible choices; we will focus on:
– First-order predicate logic (FOPL) with types
– Lambda calculus

Truth-conditional Semantics

• Linguistic expressions:
– “Bob sings.”

• Logical translations:
– sings(bob)
– but could just as well be p_5789023(a_257890)

• Denotation:
– [[bob]] = some specific person (in some context)
– [[sings(bob)]] = true in situations where Bob is singing; false otherwise

• Types on translations:
– bob : e (entity)
– sings(bob) : t (true or false, a boolean type)

Truth-conditional Semantics

Some more complicated logical descriptions of language:

– “All girls like a video game.”
∀x:e . ∃y:e . girl(x) → [video-game(y) ∧ likes(x,y)]

– “Alice is a former teacher.”
(former(teacher))(Alice)

– “Alice saw the cat before Bob did.”
∃x:e, y:e, z:e, t1:e, t2:e . cat(x) ∧ see(y) ∧ see(z) ∧ agent(y, Alice) ∧ patient(y, x) ∧ agent(z, Bob) ∧ patient(z, x) ∧ time(y, t1) ∧ time(z, t2) ∧ <(t1, t2)

FOPL Syntax Summary

• A set of types T = {t1, … }

• A set of constants C = {c1, …}, each associated with a type from T

• A set of relations R = {r1, …}, where each ri is a subset of Cⁿ for some n.

• A set of variables X = {x1, …}

• Logical symbols ∧, ∨, ¬, →, ∀, ∃, and the punctuation symbols “.” and “:”
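A toy rendering of such a signature in Python (constants and relations invented for illustration):

```python
# Toy FOPL signature: typed constants, and relations as subsets of C^n.
CONSTANTS = {"bob": "e", "amy": "e"}      # C, each with a type from T
RELATIONS = {
    "sings": {("bob",)},                  # unary relation: subset of C^1
    "likes": {("bob", "amy")},            # binary relation: subset of C^2
}

def holds(relation, *args):
    """Atomic formula evaluation: r(c1, ..., cn) is true iff the tuple is in r."""
    return args in RELATIONS[relation]

print(holds("likes", "bob", "amy"))       # True
print(holds("sings", "amy"))              # False
```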

Truth-conditional Semantics

• Proper names:
– Refer directly to some entity in the world
– Bob : bob

• Sentences:
– Are either t or f
– Bob sings : sings(bob)

• So what about verbs and VPs?
– sings must combine with bob to produce sings(bob)
– The λ-calculus is a notation for functions whose arguments are not yet filled.
– sings : λx.sings(x)
– This is a predicate: a function that returns a truth value. In this case, it takes a single entity as an argument, so we can write its type as e → t

• Adjectives?

Lambda calculus

• FOPL + λ (a new quantifier) will be our lambda calculus

• Intuitively, λ is just a way of creating a function
– E.g., girl() is a relation symbol; but λx . girl(x) is a function that takes one argument.

• New inference rule: function application
(λx . L1(x)) (L2) → L1(L2)
E.g., (λx . x²) (3) → 3²
E.g., (λx . sings(x)) (bob) → sings(bob)

• Lambda calculus lets us describe the meaning of words individually.
– Function application (and a few other rules) then lets us combine those meanings to come up with the meaning of larger phrases or sentences.
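Function application mirrors ordinary application in a programming language; a minimal Python sketch (the tiny model is invented):

```python
# (λx . x²)(3) → 3²
square = lambda x: x ** 2
print(square(3))                 # 9

# sings : λx.sings(x), a predicate of type e → t
singers = {"bob"}
sings = lambda x: x in singers
print(sings("bob"))              # sings(bob) → True
```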

Compositional Semantics with the λ-calculus

• So now we have meanings for the words
• How do we know how to combine the words?
• Associate a combination rule with each grammar rule:
– S : β(α) → NP : α VP : β (function application)
– VP : λx. α(x) ∧ β(x) → VP : α and : ∅ VP : β (intersection)

• Example: see the sketch below.
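For instance, “Bob sings and dances”: a minimal sketch of both rules in plain Python, with an invented toy model (entities as strings, predicates as functions of type e → t):

```python
# Toy model.
singers, dancers = {"bob", "amy"}, {"bob"}

bob = "bob"                          # NP : bob
sings = lambda x: x in singers       # VP : λx.sings(x)
dances = lambda x: x in dancers      # VP : λx.dances(x)

# VP → VP and VP rule: λx. α(x) ∧ β(x)
def vp_and(alpha, beta):
    return lambda x: alpha(x) and beta(x)

# S → NP VP rule: β(α), i.e., apply the VP meaning to the NP meaning
sings_and_dances = vp_and(sings, dances)
print(sings_and_dances(bob))         # True: sings(bob) ∧ dances(bob)
```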

Composition: Some more examples

• Transitive verbs:
– likes : λx.λy.likes(y,x)
– Two-place predicates, type e → (e → t)
– VP “likes Amy” : λy.likes(y,Amy) is just a one-place predicate

• Quantifiers:
– What does “everyone” mean?
– Everyone : λf.∀x.f(x) (see the sketch below)
– Some problems:
• Have to change our NP/VP rule
• Won’t work for “Amy likes everyone”
– What about “Everyone likes someone”?
– Gets tricky quickly!
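Why “everyone” is awkward here: its translation λf.∀x.f(x) takes the predicate as its argument, reversing the S → NP VP rule above. A sketch over a finite, invented domain:

```python
DOMAIN = {"bob", "amy"}
singers = {"bob", "amy"}

sings = lambda x: x in singers

# everyone : λf.∀x.f(x), a function from predicates to truth values
everyone = lambda f: all(f(x) for x in DOMAIN)

print(everyone(sings))   # True; note the subject applies to the VP here,
                         # the reverse of the S → NP VP rule above
```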

Composition: Some more examples

• Indefinites
– The wrong way:
• “Bob ate a waffle” : ate(bob,waffle)
• “Amy ate a waffle” : ate(amy,waffle)

– Better translation:
• ∃x.waffle(x) ∧ ate(bob, x)
• What does the translation of “a” have to be? (A sketch follows below.)
• What about “the”?
• What about “every”?
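One standard answer, sketched over a finite, invented domain: “a” takes a restrictor and a scope and quantifies existentially; “every” quantifies universally:

```python
DOMAIN = {"w1", "p1"}
waffle = lambda x: x in {"w1"}
ate_by_bob = lambda x: x in {"w1"}     # λx.ate(bob, x)

# a : λP.λQ.∃x.P(x) ∧ Q(x)
a = lambda P: lambda Q: any(P(x) and Q(x) for x in DOMAIN)
# every : λP.λQ.∀x.P(x) → Q(x)
every = lambda P: lambda Q: all((not P(x)) or Q(x) for x in DOMAIN)

print(a(waffle)(ate_by_bob))       # ∃x.waffle(x) ∧ ate(bob, x) → True
print(every(waffle)(ate_by_bob))   # True in this tiny model
```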

Denotation

• What do we do with the logical form?
– It has fewer (no?) ambiguities
– Can check the truth-value against a database
– More usefully: can add new facts, expressed in language, to an existing relational database
– Question-answering: can check whether a statement in a corpus entails a question-answer pair (a sketch follows below):
“Bob sings and dances”
Q: “Who sings?” has answer A: “Bob”

– Can chain together facts for story comprehension
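A sketch of the database-checking idea, with an invented fact set:

```python
# Toy "database" of facts as a set of tuples.
FACTS = {("sings", "bob"), ("dances", "bob"), ("sings", "amy")}

def true_in_db(pred, arg):
    return (pred, arg) in FACTS

# "Bob sings and dances" is true in the database...
print(true_in_db("sings", "bob") and true_in_db("dances", "bob"))  # True
# ...and it supports the QA pair Q: "Who sings?" A: "Bob"
print([x for p, x in FACTS if p == "sings"])  # candidate answers incl. 'bob'
```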

Grounding

• What does the translation likes : λx.λy.likes(y,x) have to do with actual liking?
• Nothing! (unless the denotation model says it does)
• Grounding: relating linguistic symbols to perceptual referents
– Sometimes a connection to a database entry is enough
– Other times, you might insist on connecting “blue” to the appropriate portion of the visual EM spectrum
– Or connect “likes” to an emotional sensation

• Alternative to grounding: meaning postulates
– You could insist, e.g., that likes(y,x) → knows(y,x)

More representation issues

• Tense and events
– In general, you don’t get far with verbs as predicates
– Better to have event variables e:
• “Alice danced” : danced(Alice) vs.
• “Alice danced” : ∃e.dance(e) ∧ agent(e, Alice) ∧ time(e) < now
– Event variables let you talk about non-trivial tense/aspect structures:
“Alice had been dancing when Bob sneezed”
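One plausible (and deliberately simplified) rendering uses two event variables; the temporal predicates start and time are assumptions of this sketch:
∃e1, e2 . dance(e1) ∧ agent(e1, Alice) ∧ sneeze(e2) ∧ agent(e2, Bob) ∧ start(e1) < time(e2) ∧ time(e2) < now
i.e., the dancing was already underway at the (past) time of the sneeze.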

More representation issues

• Propositional attitudes (modal logic)
– “Bob thinks that I am a gummi bear”
• thinks(bob, gummi(me))?
• thinks(bob, “He is a gummi bear”)?

– Usually, the solution involves intensions (^p), which are, roughly, the set of possible worlds in which predicate p is true.
• thinks(bob, ^gummi(me))

– Computationally challenging
• Each agent has to model every other agent’s mental state
• This comes up all the time in language
– E.g., if you want to talk about what your bill claims that you bought, vs. what you think you bought, vs. what you actually bought.

More representation issues

• Multiple quantifiers:
“In this country, a woman gives birth every 15 minutes. Our job is to find her, and stop her.” – Groucho Marx

• Deciding between readings:
– “Bob bought a pumpkin every Halloween.”
– “Bob put a warning in every window.”

More representation issues

• Other tricky stuff
– Adverbs
– Non-intersective adjectives
– Generalized quantifiers
– Generics
• “Cats like naps.”
• “The players scored a goal.”
– Pronouns and anaphora
• “If you have a dime, put it in the meter.”

– … etc., etc.

Mapping Sentences to Logical Forms

CCG Parsing

• Combinatory Categorial Grammar
– Lexicalized PCFG
– Categories encode argument sequences
• A/B means a category that can combine with a B to the right to form an A
• A\B means a category that can combine with a B to the left to form an A
– A syntactic parallel to the lambda calculus

Learning to map sentences to logical form

• Zettlemoyer and Collins (UAI 05, EMNLP 07)

Some Training Examples

CCG Lexicon

Parsing Rules (Combinators)

Application:
– Right: X : f(a) → X/Y : f Y : a
– Left: X : f(a) → Y : a X\Y : f

Additional rules:
• Composition
• Type-raising

CCG Parsing Example
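A minimal sketch of the two application combinators in plain Python, run on the Texas/borders/Kansas lexical entries that appear in the GENLEX slides below (the tuple encoding of categories is ad hoc):

```python
# Lexical entries: (category, meaning) pairs.
texas   = ("NP", "texas")
kansas  = ("NP", "kansas")
borders = ((("S", "\\", "NP"), "/", "NP"),           # (S\NP)/NP
           lambda x: lambda y: f"borders({y},{x})")  # λx.λy.borders(y,x)

def forward(left, right):
    """Right application: X/Y : f  +  Y : a  =>  X : f(a)"""
    (result, slash, arg), f = left
    cat, a = right
    assert slash == "/" and arg == cat
    return (result, f(a))

def backward(left, right):
    """Left application: Y : a  +  X\\Y : f  =>  X : f(a)"""
    cat, a = left
    (result, slash, arg), f = right
    assert slash == "\\" and arg == cat
    return (result, f(a))

vp = forward(borders, kansas)   # S\NP : λy.borders(y,kansas)
s  = backward(texas, vp)        # S : borders(texas,kansas)
print(s)                        # ('S', 'borders(texas,kansas)')
```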

Parsing a Question

Lexical Generation

Input Training Example
Sentence: Texas borders Kansas.
Logical form: borders(texas, kansas)

GENLEX

• Input: a training example (Si, Li)

• Computation:
– Create all substrings of consecutive words in Si

– Create categories from Li

– Create lexical entries that are the cross products of these two sets

• Output: Lexicon Λ
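A sketch of GENLEX’s cross product in Python; deriving the categories from Li is not shown (the category list is copied from the next slide):

```python
from itertools import product

def substrings(words):
    """All contiguous word spans of the sentence."""
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]

def genlex(sentence, categories):
    """Cross product of spans and categories (category derivation from Li
    is assumed to have happened already)."""
    return set(product(substrings(sentence.split()), categories))

cats = ["NP : texas", "NP : kansas", "(S\\NP)/NP : λx.λy.borders(y,x)"]
lexicon = genlex("Texas borders Kansas", cats)
print(len(lexicon))   # 18 = 6 substrings x 3 categories
```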

GENLEX Cross Product

Input Training Example
Sentence: Texas borders Kansas.
Logical form: borders(texas, kansas)

Output Substrings:
Texas
borders
Kansas
Texas borders
borders Kansas
Texas borders Kansas

× (cross product)

Output Categories:
NP : texas
NP : kansas
(S\NP)/NP : λx.λy.borders(y,x)

GENLEX Output Lexicon

Words → Category
Texas → NP : texas
Texas → NP : kansas
Texas → (S\NP)/NP : λx.λy.borders(y,x)
borders → NP : texas
borders → NP : kansas
borders → (S\NP)/NP : λx.λy.borders(y,x)
… → …
Texas borders Kansas → NP : texas
Texas borders Kansas → NP : kansas
Texas borders Kansas → (S\NP)/NP : λx.λy.borders(y,x)

Weighted CCG

Given a log-linear model with a CCG lexicon Λ, a feature vector f, and weights w:

The best parse is:

y* = argmax_y w · f(x, y)

where we consider all possible parses y for the sentence x given the lexicon Λ.
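A sketch of the scoring step with sparse feature dictionaries (feature names are invented; parsing itself is not shown):

```python
def score(w, feats):
    """w · f(x, y) for sparse feature vectors stored as dicts."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def best_parse(w, candidates):
    """candidates: (parse, feature-dict) pairs for all parses y of x."""
    return max(candidates, key=lambda c: score(w, c[1]))[0]

w = {"lex:Texas=NP:texas": 1.0, "rule:fwd-app": 0.5}
print(score(w, {"lex:Texas=NP:texas": 1, "rule:fwd-app": 2}))  # 2.0
```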

Parameter Estimation for Weighted CCG Parsing

Inputs: Training set {(Si, Li) | i = 1, …, n}; initial lexicon Λ; initial weights w; number of iterations T

Computation: For t = 1 … T, i = 1 … n:

Step 1: Check correctness
– Let y* = argmax_y w · f(Si, y)
– If L(y*) = Li, skip to next i

Step 2: Lexical generation
– Set λ = Λ ∪ GENLEX(Si, Li)
– Let y′ = argmax_{y s.t. L(y) = Li} w · f(Si, y), parsing with lexicon λ
– Define λi to be the lexical entries in y′; set Λ = Λ ∪ λi

Step 3: Update parameters
– Let y′′ = argmax_y w · f(Si, y)
– If L(y′′) ≠ Li, set w = w + f(Si, y′) – f(Si, y′′)

Output: Lexicon Λ and parameters w
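A condensed sketch of one (Si, Li) update in Python; parse (best parse overall), parse_constrained (best parse whose logical form is Li), genlex, and the feature function f are hypothetical placeholders:

```python
def train_step(S_i, L_i, lexicon, w, genlex, parse, parse_constrained, f):
    # Step 1: skip if the current best parse already yields L_i.
    y_star = parse(S_i, lexicon, w)
    if y_star.logical_form == L_i:
        return lexicon, w
    # Step 2: expand the lexicon, then keep only the entries used in the
    # best correct parse found with the expanded lexicon.
    expanded = lexicon | genlex(S_i, L_i)
    y_good = parse_constrained(S_i, L_i, expanded, w)
    lexicon = lexicon | set(y_good.lexical_entries)
    # Step 3: perceptron update toward the correct parse if still wrong.
    y_bad = parse(S_i, lexicon, w)
    if y_bad.logical_form != L_i:
        good, bad = f(S_i, y_good), f(S_i, y_bad)
        for k in set(good) | set(bad):
            w[k] = w.get(k, 0.0) + good.get(k, 0.0) - bad.get(k, 0.0)
    return lexicon, w
```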

Example Learned Lexical Entries

Challenge Revisited

Disharmonic Application

Missing Content Words

Missing Content-Free Words

A complete parse

Geo880 Test Set

Precision Recall F1

Zettlemoyer & Collins 2007 95.49 83.20 88.93

Zettlemoyer & Collins 2005 96.25 79.29 86.95

Wong & Mooney 2007 93.72 80.00 86.31

Summing Up

• Hypothesis: Principle of Compositionality
– Semantics of NL sentences and phrases can be composed from the semantics of their subparts

• Rules can be derived which map syntactic analysis to semantic representation (Rule-to-Rule Hypothesis)
– Lambda notation provides a way to extend FOPL to this end
– But coming up with rule-to-rule mappings is hard

• Idioms, metaphors, and other non-compositional aspects of language make things tricky (e.g., fake gun)