CS626-460: Language Technology for the Web/Natural Language Processing

Post on 08-Feb-2016


Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Constituent Parsing and Algorithms (with major contributions from Dr. Rajat Mohanty)


Syntax

Syntax is the study of the combination of words into phrases, clauses and sentences.

Syntax describes how sentences and their constituents are structured.

Grammar

A finite set of rules
• that generates only and all sentences of a language.
• that assigns an appropriate structural description to each one.

Grammatical Analysis Techniques

Two main devices:
• Breaking up a String: Sequential, Hierarchical, Transformational
• Labeling the Constituents: Morphological, Categorial, Functional

Hierarchical Breaking up and Categorial Labeling

Poor John ran away.

(S (NP (A Poor) (N John))
   (VP (V ran) (Adv away)))
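The hierarchical breakup with categorial labels can be held in a small data structure. The sketch below is only an illustration, assuming a tree is stored as nested tuples of the form (label, child, ...) with words as plain strings; the helper names `leaves` and `bracketed` are hypothetical and not part of the lecture.

```python
# Assumption: a constituent is (label, child, child, ...); a leaf is a word string.
tree = ("S",
        ("NP", ("A", "Poor"), ("N", "John")),
        ("VP", ("V", "ran"), ("Adv", "away")))

def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def bracketed(node):
    """Render a node in labeled-bracket notation."""
    if isinstance(node, str):
        return node
    parts = [node[0]] + [bracketed(child) for child in node[1:]]
    return "(" + " ".join(parts) + ")"

print(bracketed(tree))
print(" ".join(leaves(tree)))
```

Reading the leaves back in order recovers the original word string, which is a quick check that a breakup covers the whole sentence.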

Hierarchical Breaking up and Functional Labeling

Immediate Constituent (IC) Analysis

Construction types in terms of the function of the constituents:
• Predication (subject + predicate)
• Modification (modifier + head)
• Complementation (verbal + complement)
• Subordination (subordinator + dependent unit)
• Coordination (independent unit + coordinator)

An Example

In the morning, the sky looked much brighter.

(S (Modifier (Subordinator In)
             (DU (Modifier the) (Head morning)))
   (Head (Subject (Modifier the) (Head sky))
         (Predicate (Verbal looked)
                    (Complement (Modifier much) (Head brighter)))))

Noun Phrases

John
(NP (N John))

the student
(NP (Det the) (N student))

the intelligent student
(NP (Det the) (AdjP intelligent) (N student))

Phrases

Noun Phrase

his first five PhD students
(NP (Det his) (Ord first) (Quant five) (N PhD) (N students))

Noun Phrase

the five best students of my class
(NP (Det the) (Quant five) (AP best) (N students) (PP of my class))

Verb Phrases

can sing
(VP (Aux can) (V sing))

can hit the ball
(VP (Aux can) (V hit) (NP the ball))

Verb Phrase

can give a flower to Mary
(VP (Aux can) (V give) (NP a flower) (PP to Mary))

Verb Phrase

may make John the chairman
(VP (Aux may) (V make) (NP John) (NP the chairman))

Verb Phrase

may find the book very interesting
(VP (Aux may) (V find) (NP the book) (AP very interesting))

Prepositional Phrases

in the classroom
(PP (P in) (NP the classroom))

near the river
(PP (P near) (NP the river))

Adjective Phrases

intelligent
(AP (A intelligent))

very honest
(AP (Degree very) (A honest))

fond of sweets
(AP (A fond) (PP of sweets))

Adjective Phrase

very worried that she might have done badly in the assignment
(AP (Degree very) (A worried) (S’ that she might have done badly in the assignment))

A segment of English Grammar

S’ → (C) S
S → {NP/S’} VP
VP → (AP+) V (AP+) ({NP/S’}) (AP+) (PP+) (AP+)
NP → (D) (AP+) N (PP+)
PP → P NP
AP → (AP) A

PSG Parse Tree

John wrote those words in the Book of Proverbs.

(S (NP (PropN John))
   (VP (V wrote)
       (NP those words)
       (PP (P in)
           (NP (NP the book)
               (PP of proverbs)))))

Penn Treebank

John wrote those words in the Book of Proverbs.

(S (NP-SBJ (NP John))
   (VP wrote
       (NP those words)
       (PP-LOC in
               (NP (NP-TTL (NP the Book)
                           (PP of (NP Proverbs)))))))
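Bracketed treebank annotations of this kind are easy to read into a tree structure. The following is a minimal sketch, assuming balanced brackets and a label right after each opening bracket; the sample string is a shortened, simplified version of the annotation, and `tokenize` and `read_tree` are hypothetical helper names rather than part of any treebank toolkit.

```python
def tokenize(text):
    """Split a bracketed string into '(', ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read_tree(tokens, pos=0):
    """Read one constituent starting at tokens[pos].

    Returns (tree, next_pos), where tree is (label, child, ...)
    and a child is either a nested tree or a word string.
    """
    if pos + 1 >= len(tokens) or tokens[pos] != "(":
        raise ValueError("expected '(' followed by a label")
    label = tokens[pos + 1]
    pos += 2
    children = []
    while True:
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        if tokens[pos] == ")":
            return (label, *children), pos + 1
        if tokens[pos] == "(":
            child, pos = read_tree(tokens, pos)
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)

# Assumption: simplified sample, not the full treebank entry.
SAMPLE = "(S (NP-SBJ (NP John)) (VP wrote (NP those words)))"
tree, _ = read_tree(tokenize(SAMPLE))
print(tree)
```

A reader like this rejects input whose brackets do not close, which is a useful sanity check when annotations are copied by hand.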

PSG Parse Tree

Official trading in the shares will start in Paris on Nov 6.

(S (NP (NP (AP (A Official)) (N trading))
       (PP (P in) (NP the shares)))
   (VP (Aux will)
       (V start)
       (PP in Paris)
       (PP on Nov 6)))

Penn POS Tags

Official trading in the shares will start in Paris on Nov 6.

[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]
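The tagged and chunked line can be split mechanically into (word, tag) pairs and bracketed chunks. This is a small sketch under the assumption that every token has the form word/TAG and that square brackets are never nested; `read_chunked` is a hypothetical helper name.

```python
TAGGED = ("[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] "
          "will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]")

def read_chunked(text):
    """Return (tokens, chunks): all (word, tag) pairs and the bracketed word groups."""
    tokens, chunks, current = [], [], None
    for item in text.replace("[", " [ ").replace("]", " ] ").split():
        if item == "[":
            current = []              # a new chunk starts
        elif item == "]":
            chunks.append(current)    # the open chunk ends
            current = None
        else:
            word, tag = item.rsplit("/", 1)   # split on the last slash only
            tokens.append((word, tag))
            if current is not None:
                current.append(word)
    return tokens, chunks

tokens, chunks = read_chunked(TAGGED)
print(tokens)
print(chunks)
```

Splitting on the last slash keeps words that themselves contain a slash intact.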

Penn Treebank

Official trading in the shares will start in Paris on Nov 6.

( (S (NP-SBJ (NP Official trading)
             (PP in (NP the shares)))
     (VP will
         (VP start
             (PP-LOC in (NP Paris))
             (PP-TMP on (NP (NP Nov 6)))))))

Penn POS Tag Set

Adjective: JJ
Adverb: RB
Cardinal Number: CD
Determiner: DT
Preposition: IN
Coordinating Conjunction: CC
Subordinating Conjunction: IN
Singular Noun: NN
Plural Noun: NNS
Personal Pronoun: PRP
Proper Noun: NNP
Verb base form: VB
Modal verb: MD
Verb (3sg Pres): VBZ
Wh-determiner: WDT
Wh-pronoun: WP

Basic Parsing Strategy

A Fragment of English Grammar

S → NP VP
VP → V NP
NP → NNP | ART N
NNP → Ram
V → ate | saw
ART → a | an | the
N → rice | apple | movie

Derivation

S => NP VP (rewrite S)
=> NNP VP (rewrite NP)
=> Ram VP (rewrite NNP)
=> Ram V NP (rewrite VP)
=> Ram ate NP (rewrite V)
=> Ram ate ART N (rewrite NP)
=> Ram ate the N (rewrite ART)
=> Ram ate the rice (rewrite N)

• Multiple choice points: the rewrites of NP, V, ART and N each have more than one applicable rule.

• S is a special symbol called the start symbol.
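The derivation can be replayed mechanically. The sketch below assumes the grammar fragment is stored as a dictionary from each nonterminal to its list of alternatives, and that the choice made at each step is given as an index into that list; `GRAMMAR` and `leftmost_derivation` are names chosen for this illustration only.

```python
# The grammar fragment, one list of alternatives per nonterminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V": [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N": [["rice"], ["apple"], ["movie"]],
}

def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the alternative to apply, so every
    choice point in the derivation is made explicit.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has rewrite rules.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

# Choices for: S, NP -> NNP, NNP, VP, V -> ate, NP -> ART N, ART -> the, N -> rice
for step in leftmost_derivation([0, 0, 0, 0, 0, 1, 2, 0]):
    print("=>", step)
```

Changing an index at one of the choice points (for example the last one) yields a different sentence of the same fragment, which is exactly what makes parsing a search problem.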

Two Strategies: Top-Down & Bottom-Up

Top-down: Start with S and generate the sentence.

Bottom-up: Start with the words in the sentence and use the rewrite rules backwards to reduce the sequence of symbols to produce S.

The previous slide showed the top-down strategy.

Bottom-Up Derivation

Ram ate the rice
=> NNP ate the rice (rewrite Ram)
=> NNP V the rice (rewrite ate)
=> NNP V ART rice (rewrite the)
=> NNP V ART N (rewrite rice)
=> NP V ART N (rewrite NNP)
=> NP V NP (rewrite ART N)
=> NP VP (rewrite V NP)
=> S
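The same reduction sequence can be produced by a simple greedy procedure. This is a sketch only, assuming each word has exactly one category and that always reducing the leftmost matching span is good enough; it does no backtracking, so it is not a general bottom-up parser. `RULES`, `LEXICON` and `bottom_up` are illustrative names.

```python
# Phrase rules of the fragment, written as (left-hand side, right-hand side).
RULES = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["NNP"]),
    ("NP", ["ART", "N"]),
]
# Assumption: one category per word, taken from the lexical rules of the fragment.
LEXICON = {"Ram": "NNP", "ate": "V", "saw": "V", "a": "ART", "an": "ART",
           "the": "ART", "rice": "N", "apple": "N", "movie": "N"}

def bottom_up(words):
    """Return the sequence of sentential forms from the words up to (ideally) S."""
    form = list(words)
    steps = [" ".join(form)]
    # Phase 1: rewrite each word to its category, left to right.
    for i, word in enumerate(words):
        form[i] = LEXICON[word]
        steps.append(" ".join(form))
    # Phase 2: repeatedly reduce the leftmost span that matches a right-hand side.
    reduced = True
    while reduced and form != ["S"]:
        reduced = False
        for i in range(len(form)):
            for lhs, rhs in RULES:
                if form[i:i + len(rhs)] == rhs:
                    form[i:i + len(rhs)] = [lhs]
                    steps.append(" ".join(form))
                    reduced = True
                    break
            if reduced:
                break
    return steps

for step in bottom_up("Ram ate the rice".split()):
    print("=>", step)
```

If no rule applies before S is reached, the loop stops and the last form shows where the reduction got stuck.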

Parsing Algorithm

A procedure that “searches” through the grammatical rules to find a combination that generates a tree which stands for the structure of the sentence.

Top-Down Parsing (using A*)

DFS on the AND-OR graph

Data structures:
• Open List (OL): Nodes to be expanded
• Closed List (CL): Expanded nodes
• Input List (IL): Words of the sentence to be parsed
• Moving Head (MH): Walks over the IL

Trace of Top-Down Parsing

Input List (IL) at every step: Ram ate the rice

Step   OL              CL
T0     S               (empty)
T1     NP VP           S
T2     NNP ART N VP    S NP
T3     ART N VP        S NP NNP
T4     N VP            S NP NNP ART*
T5     VP              S NP NNP ART* N*
T6     V NP            S NP NNP ART* N*
T7     NP              S NP NNP ART* N* V
T8     NNP ART N       S NP NNP ART* N* V NP
T9     ART N           S NP NNP ART* N* V NNP*
T10    N               S NP NNP ART* N* V NNP ART
T11    (empty)         S NP NNP ART* N* V NNP ART N

T0 is the initial condition. MH marks the portion of the input consumed so far (first shown at T3).

(* indicates a ‘useless’ expansion)

Successful Termination: OL empty AND MH at the end of IL.
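A depth-first top-down recognizer with the same ingredients can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of the trace: each agenda entry holds one open list together with the moving-head position, alternatives are explored by backtracking, and the closed list simply records every symbol that was expanded or tried, including dead ends. All names (`GRAMMAR`, `top_down_parse`) are hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V": [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N": [["rice"], ["apple"], ["movie"]],
}

def top_down_parse(words, start="S"):
    """Depth-first top-down recognition; returns (accepted, closed_list)."""
    agenda = [((start,), 0)]   # stack of (open list, moving-head position)
    closed = []                # symbols expanded or tried, in order
    while agenda:
        open_list, head = agenda.pop()
        if not open_list:
            if head == len(words):
                return True, closed    # OL empty and MH at the end of IL
            continue                   # OL empty but input left: dead end
        first, rest = open_list[0], open_list[1:]
        closed.append(first)
        if first in GRAMMAR:
            # Push alternatives in reverse so the first-listed rule is tried first.
            for rhs in reversed(GRAMMAR[first]):
                agenda.append((tuple(rhs) + rest, head))
        elif head < len(words) and words[head] == first:
            agenda.append((rest, head + 1))   # word matched: advance the head
    return False, closed

accepted, closed = top_down_parse("Ram ate the rice".split())
print(accepted)
print(closed)
```

Note that a left-recursive rule (a nonterminal whose first right-hand-side symbol is itself) would make this loop forever; the fragment used here has none.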

Bottom-Up Parsing

Basic idea:
• Refer to words from the lexicon.
• Obtain all POSs for each word.
• Keep combining until S is obtained.

(to be continued)
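One common way to realise this idea is a CYK-style chart, shown here as a hedged sketch rather than as the method the lecture goes on to present. It assumes the grammar has only unary and binary phrase rules and that the lexicon maps each word to the set of all its categories; `LEXICON`, `UNARY`, `BINARY`, `close_unary` and `bottom_up_chart` are illustrative names.

```python
from itertools import product

# Assumption: lexicon built from the lexical rules of the grammar fragment;
# a word with several parts of speech would simply map to a larger set.
LEXICON = {"Ram": {"NNP"}, "ate": {"V"}, "saw": {"V"},
           "a": {"ART"}, "an": {"ART"}, "the": {"ART"},
           "rice": {"N"}, "apple": {"N"}, "movie": {"N"}}
UNARY = {"NNP": {"NP"}}                      # NP -> NNP
BINARY = {("NP", "VP"): {"S"},               # S  -> NP VP
          ("V", "NP"): {"VP"},               # VP -> V NP
          ("ART", "N"): {"NP"}}              # NP -> ART N

def close_unary(cats):
    """Add every category reachable from `cats` through unary rules."""
    result = set(cats)
    pending = list(cats)
    while pending:
        for parent in UNARY.get(pending.pop(), ()):
            if parent not in result:
                result.add(parent)
                pending.append(parent)
    return result

def bottom_up_chart(words):
    """chart[i, j] is the set of categories that span words[i:j]."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = close_unary(LEXICON.get(word, set()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cats = set()
            for k in range(i + 1, j):        # every split point of the span
                for left, right in product(chart[i, k], chart[k, j]):
                    cats |= BINARY.get((left, right), set())
            chart[i, j] = close_unary(cats)
    return chart

words = "Ram ate the rice".split()
chart = bottom_up_chart(words)
print("S" in chart[0, len(words)])
```

The sentence is accepted when S appears in the cell that spans the whole input; the intermediate cells record every constituent found along the way.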