I256 Applied Natural Language Processing Fall 2009

Page 1: I256  Applied Natural Language Processing Fall 2009

1

I256 Applied Natural Language

ProcessingFall 2009

Sentence Structure

Barbara Rosario

Page 2: I256  Applied Natural Language Processing Fall 2009

2

Resources for IR

• Excellent resources for IR:
  – Course syllabus of the Stanford course Information Retrieval and Web Search (CS 276 / LING 286): http://www.stanford.edu/class/cs276/cs276-2009-syllabus.html
  – Book: Introduction to Information Retrieval (http://informationretrieval.org/)

Page 3: I256  Applied Natural Language Processing Fall 2009

3

Outline
• Sentence Structure
• Constituency
• Syntactic Ambiguities
• Context Free Grammars (CFG)
• Probabilistic CFG (PCFG)
• Main issues
  – Designing grammars
  – Learning grammars
  – Inference (automatic parsing)
• Lexicalized Trees
• Review

Acknowledgments: Some slides are adapted and/or taken from Klein’s CS 288 course.

Page 4: I256  Applied Natural Language Processing Fall 2009

4

Analyzing Sentence Structure

• A key motivation is natural language understanding.
  – How much more of the meaning of a text can we access when we can reliably recognize the linguistic structures it contains?
  – With the help of sentence structure, can we answer simple questions about "what happened" or "who did what to whom"?

Page 5: I256  Applied Natural Language Processing Fall 2009

5

Phrase Structure Parsing

• Phrase structure parsing organizes syntax into constituents or brackets

new art critics write reviews with computers
[Parse tree over the sentence, with constituent labels S, NP, N’, VP, NP, PP]

Page 6: I256  Applied Natural Language Processing Fall 2009

6

Example Parse

Hurricane Emily howled toward Mexico 's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun , where frightened tourists squeezed into musty shelters .

Page 7: I256  Applied Natural Language Processing Fall 2009

7

Analyzing Sentence Structure

• How can we use a formal grammar to describe the structure of an unlimited set of sentences?

• How can we “discover” / design such a grammar?

Page 8: I256  Applied Natural Language Processing Fall 2009

8

Constituency Tests
• Words combine with other words to form units.
• How do we know what nodes go in the tree?
  – What is the evidence that a sequence of words forms a unit?
• Classic constituency tests:
  – Substitution
  – Question answers
  – Semantic grounds
    • Coherence
    • Reference
    • Idioms
  – Dislocation
  – Conjunction

Page 9: I256  Applied Natural Language Processing Fall 2009

9

Constituent structure: Substitution

• Substitutability: a sequence of words in a well-formed sentence can be replaced by a shorter sequence without rendering the sentence ill-formed.
  – The little bear saw the fine fat trout in the brook.

Page 10: I256  Applied Natural Language Processing Fall 2009

10

Constituent structure

Page 11: I256  Applied Natural Language Processing Fall 2009

11

Constituent structure
• Each node in this tree (including the words) is called a constituent.
  – The immediate constituents of S are NP and VP.

Page 12: I256  Applied Natural Language Processing Fall 2009

12

Conflicting Tests

• Constituency isn’t always clear
  – Units of transfer:
    • think about ~ penser à
    • talk about ~ hablar de
  – Phonological reduction:
    • I will go → I’ll go
    • I want to go → I wanna go
  – Coordination:
    • He went to and came from the store.

Example: La vélocité des ondes sismiques (“the velocity of seismic waves”)

Page 13: I256  Applied Natural Language Processing Fall 2009

13

PP Attachment

I cleaned the dishes from dinner
I cleaned the dishes with detergent
I cleaned the dishes in my pajamas
I cleaned the dishes in the sink

Page 14: I256  Applied Natural Language Processing Fall 2009

14

PP Attachment

Page 15: I256  Applied Natural Language Processing Fall 2009

15

Syntactic Ambiguities
• Prepositional phrases:
  They cooked the beans in the pot on the stove with handles.
• Particle vs. preposition:
  The puppy tore up the staircase.
• Gerund vs. participial adjective:
  Visiting relatives can be boring.
  Changing schedules frequently confused passengers.
• Modifier scope within NPs:
  impractical design requirements
  plastic cup holder
• Coordination scope:
  Small rats and mice can squeeze into holes or cracks in the wall.
• And others…

Page 16: I256  Applied Natural Language Processing Fall 2009

16

Context Free Grammar (CFG)
• Write symbolic or logical rules:

Grammar (CFG):
  ROOT → S
  S → NP VP
  NP → DT NN
  NP → NN NNS
  NP → NP PP
  VP → VBP NP
  VP → VBP NP PP
  PP → IN NP

Lexicon:
  NN → interest
  NNS → raises
  VBP → interest
  VBZ → raises

Page 17: I256  Applied Natural Language Processing Fall 2009

17

Context Free Grammar (CFG)

• In NLTK, context-free grammars are defined in the nltk.grammar module.

• Define a grammar (you can write your own grammars; see the sketch below).
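As a concrete illustration of the two bullets above, here is a minimal sketch that builds the grammar and lexicon from the previous slide with nltk.CFG.fromstring (that name is the NLTK 3 API; the 2009-era NLTK used nltk.parse_cfg instead). The test sentence is my own toy illustration, just a token sequence this grammar happens to admit.

```python
import nltk

# The grammar and lexicon from the previous slide; in NLTK's grammar
# format the terminals (words) must be quoted.
grammar = nltk.CFG.fromstring("""
ROOT -> S
S -> NP VP
NP -> DT NN | NN NNS | NP PP
VP -> VBP NP | VBP NP PP
PP -> IN NP
NN -> 'interest'
NNS -> 'raises'
VBP -> 'interest'
VBZ -> 'raises'
""")

print(grammar.start())              # ROOT
for prod in grammar.productions():  # every rule, phrasal and lexical
    print(prod)

# A toy sentence the grammar admits (both words are POS-ambiguous here)
parser = nltk.ChartParser(grammar)
for tree in parser.parse("interest raises interest raises".split()):
    print(tree)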

Page 18: I256  Applied Natural Language Processing Fall 2009

18

CFG: formal definition
• A context-free grammar is a tuple <N, T, S, R>
  – N: the set of non-terminals
    • Phrasal categories: S, NP, VP, ADJP, etc.
    • Parts-of-speech (pre-terminals): NN, JJ, DT, VB
  – T: the set of terminals (the words)
  – S: the start symbol
    • Often written as ROOT or TOP
  – R: the set of rules
    • Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
    • Examples: S → NP VP, VP → VP CC VP
    • Also called rewrites, productions, or local trees

Page 19: I256  Applied Natural Language Processing Fall 2009

19

CFG: parsing
• Parse a sentence admitted by the grammar
• Use deduction systems to prove parses from words
  – Simple 10-rule grammar: 592 parses
  – Real-size grammar: many millions of parses!
• This scales very badly and didn’t yield broad-coverage tools
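To see how quickly ambiguity multiplies, the sketch below uses the small "groucho" grammar from the NLTK book (not from this course): even this tiny grammar licenses two parses for one eight-word sentence, depending on whether the PP attaches to the NP or to the VP.

```python
import nltk

# Small ambiguous grammar (the "groucho" example from the NLTK book)
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

parser = nltk.ChartParser(groucho_grammar)
trees = list(parser.parse("I shot an elephant in my pajamas".split()))
print(len(trees))     # 2 -- the PP attaches either to the NP or to the VP
for tree in trees:
    print(tree)
```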

Page 20: I256  Applied Natural Language Processing Fall 2009

20

Treebank
• Access a treebank to develop broad-coverage grammars.
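For instance, NLTK ships a small sample of the Penn Treebank; a minimal sketch of reading its parsed sentences and pulling out the productions (the raw material a treebank grammar is induced from):

```python
import nltk
from nltk.corpus import treebank   # requires: nltk.download('treebank')

# One parsed Wall Street Journal sentence from the sample
t = treebank.parsed_sents('wsj_0001.mrg')[0]
print(t)

# Collect all productions used in the sample; a treebank grammar is
# built from counts over rules like these.
productions = [p for sent in treebank.parsed_sents() for p in sent.productions()]
print(len(productions), "rule occurrences,", len(set(productions)), "distinct rules")
```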

Page 21: I256  Applied Natural Language Processing Fall 2009

21

[Treebank tree fragment from the slide, with labels NP, NP, PP, CONJ, DET, ADJ, NOUN, PLURAL NOUN]

Treebank Grammar Scale
• Treebank grammars can be enormous
  – The raw grammar has ~10K states, excluding the lexicon
  – Better parsers usually make the grammars larger, not smaller
• Solution?

Page 22: I256  Applied Natural Language Processing Fall 2009

22

Probabilistic Context Free Grammar (PCFG)

• A context-free grammar that associates a probability with each of its productions
  – P(Y1 Y2 … Yk | X)

• The probability of a parse generated by a PCFG is simply the product of the probabilities of the productions used to generate it.
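A minimal sketch with an invented toy PCFG (the rules and probabilities below are my own illustration, not from the course): nltk.PCFG.fromstring checks that the probabilities for each left-hand side sum to 1, and ViterbiParser returns the most probable tree, whose .prob() is exactly the product of the production probabilities used.

```python
import nltk

# Toy PCFG; for every left-hand side the rule probabilities sum to 1.0
toy_pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> DT NN [0.6] | NP PP [0.4]
VP -> VBD NP [0.7] | VP PP [0.3]
PP -> IN NP [1.0]
DT -> 'the' [1.0]
NN -> 'dog' [0.5] | 'park' [0.5]
VBD -> 'saw' [1.0]
IN -> 'in' [1.0]
""")

parser = nltk.ViterbiParser(toy_pcfg)
for tree in parser.parse("the dog saw the dog in the park".split()):
    print(tree)
    print(tree.prob())   # product of the probabilities of the rules used
```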

Page 23: I256  Applied Natural Language Processing Fall 2009

23

Outline
• Sentence Structure
• Constituency
• Syntactic Ambiguities
• Context Free Grammars (CFG)
• Probabilistic CFG (PCFG)
• Main issues
  – Designing grammars
  – Learning grammars (learn the set of rules automatically)
  – Parsing (inference: analyze a sentence and automatically build a syntax tree)
• Lexicalized Trees

Page 24: I256  Applied Natural Language Processing Fall 2009

24

The Game of Designing a Grammar

• Annotation refines base treebank symbols to improve the statistical fit of the grammar
  – Parent annotation [Johnson ’98]

Page 25: I256  Applied Natural Language Processing Fall 2009

25

The Game of Designing a Grammar
• Annotation refines base treebank symbols to improve the statistical fit of the grammar
  – Parent annotation [Johnson ’98]
  – Head lexicalization [Collins ’99, Charniak ’00]

Page 26: I256  Applied Natural Language Processing Fall 2009

26

The Game of Designing a Grammar
• Annotation refines base treebank symbols to improve the statistical fit of the grammar
  – Parent annotation [Johnson ’98]
  – Head lexicalization [Collins ’99, Charniak ’00]
  – Automatic clustering

Page 27: I256  Applied Natural Language Processing Fall 2009

27

Learning

• Many complicated learning algorithms…
  – Another time )-;
  – Or take CS 288 in spring 2010 (recommended!)

Page 28: I256  Applied Natural Language Processing Fall 2009

28

Parsing with Context Free Grammar

• A parser processes input sentences according to the productions of a grammar, and builds one or more constituent structures that conform to the grammar (inference).
  – It is a procedural interpretation of the grammar.
  – It searches through the space of trees licensed by the grammar to find one that has the required sentence along its fringe.

Page 29: I256  Applied Natural Language Processing Fall 2009

29

Parsing algorithms

• Top-down method (aka recursive descent parsing)

• Bottom-up method (aka shift-reduce parsing)

• Left-corner parsing
• Dynamic programming techniques such as chart parsing
• Etc.

Page 30: I256  Applied Natural Language Processing Fall 2009

30

• Top-down (recursive descent) parser:
  – Begins with a tree consisting of the node S
  – At each stage it consults the grammar to find a production that can be used to enlarge the tree
  – When a lexical production is encountered, its word is compared against the input
  – After a complete parse has been found, the parser backtracks to look for more parses
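NLTK provides parsers for both strategies listed on the previous slide; below is a minimal sketch (the toy grammar and sentence are my own) using RecursiveDescentParser, whose backtracking top-down behaviour matches the steps just described, with ShiftReduceParser as the bottom-up counterpart. trace=2 prints each expansion, match, and backtrack.

```python
import nltk

# Tiny toy grammar, just for tracing the parsers' behaviour
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

# Top-down, backtracking parser; trace=2 shows every step
rd_parser = nltk.RecursiveDescentParser(grammar, trace=2)
for tree in rd_parser.parse("the dog chased the cat".split()):
    print(tree)

# Bottom-up (shift-reduce) alternative from the previous slide
sr_parser = nltk.ShiftReduceParser(grammar)
for tree in sr_parser.parse("the dog chased the cat".split()):
    print(tree)
```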

Page 31: I256  Applied Natural Language Processing Fall 2009

31

Issues

• Memory requirements• Computation time

Page 32: I256  Applied Natural Language Processing Fall 2009

32

Runtime: Practice
• Parsing with the vanilla treebank grammar (~20K rules, not an optimized parser): observed runtime exponent of about 3.6 in sentence length.
• Why’s it worse in practice?
  – Longer sentences “unlock” more of the grammar
  – All kinds of systems issues don’t scale

Page 33: I256  Applied Natural Language Processing Fall 2009

33

Problems with PCFGs

• If we do no annotation, these trees differ only in one rule:
  – VP → VP PP
  – NP → NP PP
• The parse will go one way or the other, regardless of the words
• Lexicalization allows us to be sensitive to specific words

Page 34: I256  Applied Natural Language Processing Fall 2009

34

Lexicalized Trees
• Add “headwords” to each phrasal node
  – Syntactic vs. semantic heads
  – Headship is not annotated in (most) treebanks
  – Usually use head rules, e.g.:
    • NP:
      – Take the leftmost NP
      – Take the rightmost N*
      – Take the rightmost JJ
      – Take the right child
    • VP:
      – Take the leftmost VB*
      – Take the leftmost VP
      – Take the left child
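A rough sketch of what such head rules might look like in code; the function and its exact priorities follow the list above but are my own illustration, not the full Collins/Charniak head tables.

```python
import nltk

def head_child(tree):
    """Pick the head child of an NP or VP node using the simple rules
    from the slide; anything else falls back to the left child.
    Illustrative only -- real head-finding tables cover every category."""
    kids = list(tree)
    label = lambda c: c.label() if isinstance(c, nltk.Tree) else c

    if tree.label() == 'NP':
        for c in kids:                       # take the leftmost NP
            if label(c) == 'NP':
                return c
        for c in reversed(kids):             # take the rightmost N*
            if str(label(c)).startswith('N'):
                return c
        for c in reversed(kids):             # take the rightmost JJ
            if label(c) == 'JJ':
                return c
        return kids[-1]                      # take the right child

    if tree.label() == 'VP':
        for c in kids:                       # take the leftmost VB*
            if str(label(c)).startswith('VB'):
                return c
        for c in kids:                       # take the leftmost VP
            if label(c) == 'VP':
                return c

    return kids[0]                           # take the left child
```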

Page 35: I256  Applied Natural Language Processing Fall 2009

35

Lexicalized PCFGs?
• Problem: we now have to estimate probabilities for fully lexicalized productions (rule plus headwords)
• We are never going to get these atomically off of a treebank
• Solution: break up the derivation into smaller steps

Page 36: I256  Applied Natural Language Processing Fall 2009

36

Resources
• Foundations of Statistical NLP (chapter 12)
• Dan Klein’s group (and his class CS 288)
  – http://www.cs.berkeley.edu/~klein
  – http://nlp.cs.berkeley.edu/Main.html#Parsing
• Speech and Language Processing, Jurafsky and Martin (chapters 12, 13, 14)
• Software:
  – Berkeley parser (Klein’s group): http://code.google.com/p/berkeleyparser/
  – Michael Collins’ parser: http://people.csail.mit.edu/mcollins/code.html

Page 37: I256  Applied Natural Language Processing Fall 2009

37

Dependency grammars

• Phrase structure grammar is concerned with how words and sequences of words combine to form constituents.

• A distinct and complementary approach, dependency grammar, focuses instead on how words relate to other words

• Dependency is a binary asymmetric relation that holds between a head and its dependents.

Page 38: I256  Applied Natural Language Processing Fall 2009

38

Dependency grammars

• Dependency graph: a labeled directed graph
  – nodes are the lexical items
  – labeled arcs represent dependency relations from heads to dependents
• Can be used to directly express grammatical functions as a type of dependency.
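In NLTK a dependency grammar is written as head -> dependent rules; the sketch below uses the small example grammar from the NLTK book (not from this course) together with a projective dependency parser.

```python
import nltk

# Each rule lists a head (left) and the words it can take as dependents
dep_grammar = nltk.DependencyGrammar.fromstring("""
'shot' -> 'I' | 'elephant' | 'in'
'elephant' -> 'an' | 'in'
'in' -> 'pajamas'
'pajamas' -> 'my'
""")

parser = nltk.ProjectiveDependencyParser(dep_grammar)
for tree in parser.parse("I shot an elephant in my pajamas".split()):
    print(tree)   # two analyses: 'in' depends on 'shot' or on 'elephant'
```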

Page 39: I256  Applied Natural Language Processing Fall 2009

39

Dependency grammars

• Dependency structure gives attachments.
• In principle, it can express any kind of dependency.
• How do we find the dependencies?

Shaw Publishing acquired 30 % of American City in March
(the slide marks the arguments of “acquired”: WHO, WHAT, WHEN)

Page 40: I256  Applied Natural Language Processing Fall 2009

40

Idea: Lexical Affinity Models
• Link up pairs of words with high mutual information
  – Mutual information measures how much one word tells us about another.
  – “The” doesn’t tell us much about what follows
    • i.e., “the” and “red” have small mutual information
  – “United”?

congress narrowly passed the amended bill
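NLTK's collocation tools can score adjacent word pairs by pointwise mutual information, which illustrates the idea above; a minimal sketch over the Brown corpus (the frequency filter is there because PMI wildly over-rates very rare pairs).

```python
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from nltk.corpus import brown       # requires: nltk.download('brown')

measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(brown.words())

# PMI for the pair ('the', 'red'); expected to be low, as the slide notes
print(finder.score_ngram(measures.pmi, 'the', 'red'))

finder.apply_freq_filter(5)                    # ignore pairs seen fewer than 5 times
print(finder.nbest(measures.pmi, 10))          # 10 adjacent pairs with highest PMI
```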

Page 41: I256  Applied Natural Language Processing Fall 2009

41

Problem: Non-Syntactic Affinity
• Words select other words (also) on syntactic grounds
• Mutual information between words does not necessarily indicate syntactic selection.

a new year begins in new york
expect brushbacks but no beanballs
congress narrowly passed the amended bill

Page 42: I256  Applied Natural Language Processing Fall 2009

42

Idea: Word Classes
• Individual words like congress are entwined with semantic facts about the world.
• Syntactic classes, like NOUN and ADVERB, are bleached of word-specific semantics.
• Automatic word classes are more likely to look like DAYS-OF-WEEK or PERSON-NAME.
• We could build dependency models over word classes. [cf. Carroll and Charniak, 1992]

congress narrowly passed the amended bill
NOUN     ADVERB   VERB   DET PARTICIPLE NOUN
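Mapping words to syntactic classes like these is just POS tagging; a minimal sketch with NLTK's default tagger (the Penn Treebank tags it returns differ from the coarse labels on the slide, and the exact output shown in the comment is illustrative).

```python
import nltk   # needs the default tagger model, e.g. nltk.download('averaged_perceptron_tagger')

sent = "congress narrowly passed the amended bill".split()
print(nltk.pos_tag(sent))
# Expected shape: [('congress', 'NN'), ('narrowly', 'RB'), ('passed', 'VBD'), ...]
```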

Page 43: I256  Applied Natural Language Processing Fall 2009

43

Review
• Python and NLTK
• Lower-level text processing (stemming, segmentation, …)
• Grammar
  – Morphology
  – Part-of-speech (POS)
  – Phrase-level syntax (PCFG, parsing)
• Semantics
  – Word sense disambiguation (WSD)
  – Lexical acquisition

Page 44: I256  Applied Natural Language Processing Fall 2009

44

Review
• “Higher level” apps
  – Information extraction
  – Machine translation
  – Summarization
  – Question answering
  – Information retrieval
• Intro to probability theory and graphical models (GM)
  – Example for WSD
  – Language Models (LM) and smoothing
• Corpus-based statistical approaches to tackle NLP problems
  – Data (corpora, labels, linguistic resources)
  – Feature extraction
  – Statistical models: classification and clustering

Page 45: I256  Applied Natural Language Processing Fall 2009

45

Review
• What I hope we achieved:
  – Given a language problem, know how to frame it in NLP terms and use the appropriate algorithms to tackle it
  – An overall idea of linguistic problems
  – An overall understanding of NLP tasks, both lower-level and higher-level applications
  – A basic understanding of statistical NLP
    • Corpora & annotation
    • Classification, clustering
    • The sparsity problem
  – Familiarity with Python and NLTK

