
Phrase Structures and Syntax
ANLP: Lecture 11

Shay Cohen

School of Informatics, University of Edinburgh

8 October 2019

1 / 57

Until now...

- Focused mostly on regular languages
  - Finite state machines and transducers
  - n-gram models
  - Hidden Markov Models
  - Viterbi search and friends

- ... Next: going up one level in the Chomsky hierarchy

2 / 57

Recap: The Chomsky hierarchy

(Diagram: Regular ⊂ Context-free ⊂ Context-sensitive ⊂ Recursively enumerable)

3 / 57

Side note: Is English Regular?

Centre-embedding:
[The cat1 likes tuna fish1].
[The cat1 [the dog2 chased2] likes tuna fish1].
[The cat1 [the dog2 [the rat3 bit3] chased2] likes tuna fish1].

Consider L = {(the N)^n TV^m likes tuna fish | n, m ≥ 0}, where
N = { cat, dog, rat, elephant, kangaroo, ... }
TV = { chased, bit, admired, ate, befriended, ... }

Clearly L is regular. However, L ∩ English is the language

{(the N)^n TV^(n−1) likes tuna fish | n ≥ 1}

The pumping lemma shows that L ∩ English is not regular; since regular languages are closed under intersection and L is regular, English itself cannot be regular.

Assumption 1: "(the N)^n TV^m likes tuna fish" is ungrammatical for m ≠ n − 1.
Assumption 2: "(the N)^n TV^(n−1) likes tuna fish" is grammatical for all n ≥ 1. (Is this reasonable? You decide!)
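A slightly more explicit version of the argument (my own formalisation of the slide's sketch, taking Assumptions 1 and 2 above for granted):

```latex
% L = { (the N)^n TV^m likes tuna fish | n, m >= 0 } is regular by construction.
% Under Assumptions 1 and 2,
%   L \cap English = { (the N)^n TV^(n-1) likes tuna fish | n >= 1 }.
% Pumping lemma: for any claimed pumping length p, take the member with n = p.
% Any pumpable substring lies inside the prefix (the N)^p, so pumping it yields
% a string that is no longer of the grammatical form (the N)^n TV^(n-1) likes
% tuna fish, i.e. a string outside L \cap English.
% Hence L \cap English is not regular, and since regular languages are closed
% under intersection:
\[
  L \in \mathsf{REG} \;\wedge\; L \cap \mathrm{English} \notin \mathsf{REG}
  \;\Rightarrow\; \mathrm{English} \notin \mathsf{REG}.
\]
```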

4 / 57

The NLP Pipeline

5 / 57

Grammar Writing Exercise

Date: October 25 (Friday during lecture time)

You will write a grammar for the English language

There will be a competition between the grammars for "precision" and "recall"

You should be able to start working on your grammar by the end of this class

More details here: http://www.inf.ed.ac.uk/teaching/courses/anlp/cgw

There will be prizes!

6 / 57

Computing meaning

A well-studied, difficult, and unsolved problem.

Fortunately, we know enough to have made partial progress (Watson won).

Over the next few weeks, we will work up to the study of systems that can assign logical forms that mathematically state the meaning of a sentence, so that they can be processed by machines.

Our first stop will be natural language syntax.

7 / 57

Natural language syntax

Syntax provides the scaffolding for semantic composition.

The brown dog on the mat saw the striped cat through the window.

The brown cat saw the striped dog through the window on the mat.

Do the two sentences above mean the same thing? What is the process by which you computed their meanings?

8 / 57

Constituents

Words in a sentence often form groupings that can combine with other units to produce meaning. These groupings, called constituents, can often be identified by substitution tests (much like parts of speech!).

Kim [read a book], [gave it to Sandy], and [left]

You said I should read the book and [read it] I did.

Kim read [a very interesting book about grammar].

9 / 57

Heads and Phrases

Noun (N): Noun Phrase (NP)
Adjective (A): Adjective Phrase (AP)
Verb (V): Verb Phrase (VP)
Preposition (P): Prepositional Phrase (PP)

- So far we have looked at terminals (words or POS tags).
- Today, we'll look at non-terminals, which correspond to phrases.
- The part of speech that a word belongs to is closely linked to the type of constituent that it is associated with.
- In an X-phrase (e.g. NP), the key occurrence of X (e.g. N) is called the head, and controls how the phrase interacts (both syntactically and semantically) with the rest of the sentence.
- In English, the head tends to appear in the middle of a phrase.

10 / 57

Constituents have structure

English NPs are commonly of the form:

(Det) Adj* Noun (PP | RelClause)*
NP: the angry duck that tried to bite me, head: duck.

VPs are commonly of the form:

(Aux) Adv* Verb Arg* Adjunct*
Arg → NP | PP
Adjunct → PP | AdvP | ...
VP: usually eats artichokes for dinner, head: eat.

In Japanese, Korean, Hindi, Urdu, and other head-final languages, the head is at the end of its associated phrase.

In Irish, Welsh, Scots Gaelic and other head-initial languages, the head is at the beginning of its associated phrase.
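As a toy way to make the NP pattern concrete, here is a minimal sketch (mine, not from the lecture) that matches the (Det) Adj* Noun core of the pattern against a string of POS tags; the tag names and the example sequence are illustrative assumptions, and the PP/RelClause postmodifiers are ignored:

```python
import re

# (Det) Adj* Noun, written as a regular expression over whitespace-separated tags.
NP_CORE = re.compile(r"\b(DET )?(ADJ )*NOUN\b")

tags = "DET ADJ NOUN VERB DET NOUN"     # e.g. "the angry duck bit the man"
print([m.group(0) for m in NP_CORE.finditer(tags)])
# ['DET ADJ NOUN', 'DET NOUN']
```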

11 / 57

WALS - Subject Verb Object order

(World map of subject, verb, and object order across languages.) Taken from https://wals.info/feature/81A#2/5.6/172.8

12 / 57

Desirable Properties of a Grammar

Chomsky specified two properties that make a grammar "interesting and satisfying":

- It should be a finite specification of the strings of the language, rather than a list of its sentences.
- It should be revealing, in allowing strings to be associated with meaning (semantics) in a systematic way.

We can add another desirable property:

- It should capture structural and distributional properties of the language. (E.g. where heads of phrases are located; how a sentence transforms into a question; which phrases can float around the sentence.)

13 / 57

Desirable Properties of a Grammar

- Context-free grammars (CFGs) provide a pretty good approximation.
- Some features of NLs are more easily captured using mildly context-sensitive grammars, as we'll see later in the course.
- There are also more modern grammar formalisms that better capture structural and distributional properties of human languages. (E.g. combinatory categorial grammar.)
- Programming language grammars (such as the ones used with compilers, like LL(1)) aren't enough for NLs.

14 / 57

A Tiny Fragment of English

Let's say we want to capture in a grammar the structural and distributional properties that give rise to sentences like:

A duck walked in the park.         NP, V, PP
The man walked with a duck.        NP, V, PP
You made a duck.                   Pro, V, NP
You made her duck. ?               Pro, V, NP
A man with a telescope saw you.    NP, PP, V, Pro
A man saw you with a telescope.    NP, V, Pro, PP
You saw a man with a telescope.    Pro, V, NP, PP

We want to write grammatical rules that generate these phrase structures, and lexical rules that generate the words appearing in them.

15 / 57

Grammar for the Tiny Fragment of English

Grammar G1 generates the sentences on the previous slide:

Grammatical rules      Lexical rules
S  → NP VP             Det  → a | the | her                  (determiners)
NP → Det N             N    → man | park | duck | telescope  (nouns)
NP → Det N PP          Pro  → you                            (pronoun)
NP → Pro               V    → saw | walked | made            (verbs)
VP → V NP PP           Prep → in | with | for                (prepositions)
VP → V NP
VP → V
PP → Prep NP
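To make G1 concrete, here is one way it could be written down and tried out with NLTK's CFG and chart-parsing tools (a tooling assumption on my part, not something the lecture prescribes):

```python
import nltk

# Grammar G1, transcribed into NLTK's rule notation.
G1 = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | Det N PP | Pro
VP -> V NP PP | V NP | V
PP -> Prep NP
Det  -> 'a' | 'the' | 'her'
N    -> 'man' | 'park' | 'duck' | 'telescope'
Pro  -> 'you'
V    -> 'saw' | 'walked' | 'made'
Prep -> 'in' | 'with' | 'for'
""")

parser = nltk.ChartParser(G1)
for tree in parser.parse("you made a duck".split()):
    print(tree)
# (S (NP (Pro you)) (VP (V made) (NP (Det a) (N duck))))
```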

16 / 57

Context-free grammars: formal definition

A context-free grammar (CFG) G consists of

- a finite set N of non-terminals,
- a finite set Σ of terminals, disjoint from N,
- a finite set P of productions of the form X → α, where X ∈ N, α ∈ (N ∪ Σ)*,
- a choice of start symbol S ∈ N.
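The four components can be encoded directly as data; the following minimal Python sketch (my own encoding, purely illustrative) spells out the 4-tuple for a few of G1's rules:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # N
    terminals: frozenset      # Sigma, disjoint from N
    productions: tuple        # pairs (X, alpha) with X in N, alpha a tuple over N and Sigma
    start: str                # S, a member of N

# A handful of G1's rules, just to show the encoding (abridged).
G1 = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "PP", "Det", "N", "Pro", "V", "Prep"}),
    terminals=frozenset({"a", "the", "her", "man", "park", "duck", "telescope",
                         "you", "saw", "walked", "made", "in", "with", "for"}),
    productions=(("S", ("NP", "VP")),
                 ("NP", ("Det", "N")),
                 ("VP", ("V", "NP")),
                 ("PP", ("Prep", "NP")),
                 ("Det", ("a",)),
                 ("N", ("duck",))),
    start="S",
)
```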

17 / 57

A sentential form is any sequence of terminals and nonterminals that can appear in a derivation starting from the start symbol.

Formal definition: The set of sentential forms derivable from G is the smallest set S(G) ⊆ (N ∪ Σ)* such that

- S ∈ S(G)
- if αXβ ∈ S(G) and X → γ ∈ P, then αγβ ∈ S(G).

The language associated with the grammar is the set of sentential forms that contain only terminals.

Formal definition: The language associated with G is defined by L(G) = S(G) ∩ Σ*.

A language L ⊆ Σ* is defined to be context-free if there exists some CFG G such that L = L(G).
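These definitions can be animated with a tiny, non-recursive example grammar; the breadth-first enumeration below (my own illustrative sketch, with an arbitrary length bound for safety) builds S(G) by rewriting one non-terminal at a time and then keeps only the all-terminal forms, i.e. L(G) = S(G) ∩ Σ*:

```python
from collections import deque

# A deliberately tiny grammar: S -> NP VP, NP -> Det N, VP -> V, plus a few words.
PRODUCTIONS = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V",)),
               ("Det", ("a",)), ("N", ("duck",)), ("V", ("walked",))]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}

def sentential_forms(max_len=6):
    """Enumerate S(G) breadth-first, starting from the start symbol S."""
    seen, queue = set(), deque([("S",)])
    while queue:
        form = queue.popleft()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        for i, sym in enumerate(form):
            for lhs, rhs in PRODUCTIONS:
                if lhs == sym:                     # rewrite one occurrence of sym
                    queue.append(form[:i] + rhs + form[i + 1:])
    return seen

forms = sentential_forms()
language = {f for f in forms if not any(s in NONTERMINALS for s in f)}
print(sorted(language))    # [('a', 'duck', 'walked')]
```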

18 / 57

Assorted remarks

- X → α1 | α2 | · · · | αn is simply an abbreviation for a bunch of productions X → α1, X → α2, . . . , X → αn.
- These grammars are called context-free because a rule X → α says that an X can always be expanded to α, no matter where the X occurs. This contrasts with context-sensitive rules, which might allow us to expand X only in certain contexts, e.g. bXc → bαc.
- Broad intuition: context-free languages allow nesting of structures to arbitrary depth. E.g. brackets, begin-end blocks, if-then-else statements, subordinate clauses in English, ...

19 / 57

Grammar for the Tiny Fragment of English

Grammar G1 generates the sentences on the previous slide:

Grammatical rules      Lexical rules
S  → NP VP             Det  → a | the | her                  (determiners)
NP → Det N             N    → man | park | duck | telescope  (nouns)
NP → Det N PP          Pro  → you                            (pronoun)
NP → Pro               V    → saw | walked | made            (verbs)
VP → V NP PP           Prep → in | with | for                (prepositions)
VP → V NP
VP → V
PP → Prep NP

Does G1 produce a finite or an infinite number of sentences?

20 / 57

Recursion

Recursion in a grammar makes it possible to generate an infinite number of sentences.

In direct recursion, a non-terminal on the LHS of a rule also appears on its RHS. The following rules add direct recursion to G1:

VP   → VP Conj VP
Conj → and | or

In indirect recursion, some non-terminal can be expanded (via several steps) to a sequence of symbols containing that non-terminal:

NP → Det N PP
PP → Prep NP
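The indirect recursion through NP and PP is what licenses arbitrarily deep nesting; a throwaway sketch (mine, not from the lecture) of the strings it produces:

```python
def np(depth):
    """Expand NP; at depth > 0 use NP -> Det N PP and PP -> Prep NP."""
    if depth == 0:
        return "a duck"                       # NP -> Det N
    return "a duck with " + np(depth - 1)     # NP -> Det N PP, PP -> Prep NP

for d in range(3):
    print("you saw " + np(d))
# you saw a duck
# you saw a duck with a duck
# you saw a duck with a duck with a duck
```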

21 / 57

Structural Ambiguity

You saw a man with a telescope.

Parse 1 (the PP attached inside the NP):

(S (NP (Pro You))
   (VP (V saw)
       (NP (Det a) (N man)
           (PP (Prep with) (NP (Det a) (N telescope))))))

Parse 2 (the PP attached inside the VP):

(S (NP (Pro You))
   (VP (V saw)
       (NP (Det a) (N man))
       (PP (Prep with) (NP (Det a) (N telescope)))))

This illustrates attachment ambiguity: the PP can be a part of the VP or of the NP. Note that there's no POS ambiguity here.
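A chart parser over G1 finds both attachments automatically; a minimal sketch using NLTK (a tooling assumption, with the sentence lower-cased to match G1's lexicon):

```python
import nltk

G1 = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | Det N PP | Pro
VP -> V NP PP | V NP | V
PP -> Prep NP
Det -> 'a' | 'the' | 'her'
N -> 'man' | 'park' | 'duck' | 'telescope'
Pro -> 'you'
V -> 'saw' | 'walked' | 'made'
Prep -> 'in' | 'with' | 'for'
""")

trees = list(nltk.ChartParser(G1).parse("you saw a man with a telescope".split()))
print(len(trees))     # 2: one parse per PP attachment
for t in trees:
    print(t)
```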

24 / 57

Structural Ambiguity

Grammar G1 only gives us one analysis of you made her duck.

(S (NP (Pro You))
   (VP (V made)
       (NP (Det her) (N duck))))

There is another, ditransitive (i.e., two-object) analysis of this sentence, one that underlies the pair:

What did you make for her?
You made her duck.

25 / 57

Structural Ambiguity

For this alternative, G1 also needs rules like:

NP → N
VP → V NP NP
Pro → her

Parse with her as a determiner (as before):

(S (NP (Pro You))
   (VP (V made)
       (NP (Det her) (N duck))))

Ditransitive parse with her as a pronoun:

(S (NP (Pro You))
   (VP (V made)
       (NP (Pro her))
       (NP (N duck))))

In this case, the structural ambiguity is rooted in POS ambiguity.
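With those extra rules added, G1 assigns two parses to you made her duck; a sketch in NLTK (tooling assumption as before; the third, small-clause analysis still needs the S1 rules introduced on the next slide):

```python
import nltk

G1_EXT = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | Det N PP | Pro | N
VP -> V NP PP | V NP | V | V NP NP
PP -> Prep NP
Det -> 'a' | 'the' | 'her'
N -> 'man' | 'park' | 'duck' | 'telescope'
Pro -> 'you' | 'her'
V -> 'saw' | 'walked' | 'made'
Prep -> 'in' | 'with' | 'for'
""")

trees = list(nltk.ChartParser(G1_EXT).parse("you made her duck".split()))
print(len(trees))     # 2: her as a determiner vs. the ditransitive V NP NP analysis
for t in trees:
    print(t)
```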

26 / 57

Structural Ambiguity

There is a third analysis as well, one that underlies the pair:

What did you make her do?
You made her duck.

(move head or body quickly downwards)

Here, the small clause (her duck) is the direct object of a verb.

Similar small clauses are possible with verbs like see, hear and notice, but not ask, want, persuade, etc.

G1 needs a rule that requires accusative case-marking on the subject of a small clause and no tense on its verb:

VP → V S1
S1 → NP(acc) VP(untensed)
NP(acc) → her | him | them

27 / 57

Structural Ambiguity

Now we have three analyses for you made her duck:

Analysis 1 (her as a determiner):

(S (NP (Pro You))
   (VP (V made)
       (NP (Det her) (N duck))))

Analysis 2 (ditransitive):

(S (NP (Pro You))
   (VP (V made)
       (NP (Pro her))
       (NP (N duck))))

Analysis 3 (small clause):

(S (NP (Pro You))
   (VP (V made)
       (S1 (NP(acc) her)
           (VP (V duck)))))

How can we compute these analyses automatically?
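One classical answer, previewed in the summary slide, is a backtracking top-down (recursive-descent) parser. The sketch below (my own plain-Python encoding of G1, purely illustrative) enumerates every analysis of a sentence; it is shown with the un-extended G1 and the PP-attachment example, but the same code works for the extended grammar:

```python
# Grammar G1 as a dictionary: non-terminal -> list of possible right-hand sides.
G1 = {
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "N", "PP"], ["Det", "N"], ["Pro"]],
    "VP":   [["V", "NP", "PP"], ["V", "NP"], ["V"]],
    "PP":   [["Prep", "NP"]],
    "Det":  [["a"], ["the"], ["her"]],
    "N":    [["man"], ["park"], ["duck"], ["telescope"]],
    "Pro":  [["you"]],
    "V":    [["saw"], ["walked"], ["made"]],
    "Prep": [["in"], ["with"], ["for"]],
}

def parse(symbol, tokens, i):
    """Yield (tree, j) for every way `symbol` can derive tokens[i:j]."""
    if symbol not in G1:                          # terminal: must match the next word
        if i < len(tokens) and tokens[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in G1[symbol]:                        # try each production (backtracking)
        yield from expand(symbol, rhs, tokens, i, [])

def expand(symbol, rest, tokens, i, children):
    """Match the remaining right-hand-side symbols against the input, left to right."""
    if not rest:
        yield (symbol, children), i
        return
    for child, j in parse(rest[0], tokens, i):
        yield from expand(symbol, rest[1:], tokens, j, children + [child])

sentence = "you saw a man with a telescope".split()
trees = [t for t, end in parse("S", sentence, 0) if end == len(sentence)]
print(len(trees), "parses")                       # 2 (the PP-attachment ambiguity)
```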

28 / 57

A Fun Exercise - Which is the VP?

(a) (b) (c)
A new one? (d) (e) (f)

saw the car from my house window with my telescope

E

(The candidate bracketings appear as tree diagrams on the original slide.)

29 / 57

Questions to Ask Yourself

- Can this context-free grammar formalism tackle all syntax phenomena?
- Where do we get the grammar from? How big would it be?
- How do we take a grammar and a sentence, and get a tree for the sentence from the grammar efficiently?
- How would we introduce probabilities into the use of a context-free grammar?

30 / 57

Summary

- We use CFGs to represent NL grammars
- Grammars need recursion to produce infinite sentences
- Most NL grammars have structural ambiguity
- A parser computes structure for an input automatically
- Recursive descent and shift-reduce parsing

31 / 57