
CS626-460: Language Technology for the Web/Natural Language Processing

Date post: 08-Feb-2016
Upload: pia
Description:
CS626-460: Language Technology for the Web/Natural Language Processing. Pushpak Bhattacharyya CSE Dept., IIT Bombay Constituent Parsing and Algorithms (with major contributions from Dr. Rajat Mohanty). Syntax. - PowerPoint PPT Presentation
Transcript
Page 1: CS626-460: Language Technology for the Web/Natural Language Processing

CS626-460: Language Technology for the Web/Natural Language Processing

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Constituent Parsing and Algorithms (with major contributions from Dr. Rajat Mohanty)

Page 2

Syntax

• Syntax is the study of the combination of words into phrases, clauses and sentences.
• Syntax describes how sentences and their constituents are structured.

Page 3

Grammar

A finite set of rules
• that generates all and only the sentences of a language;
• that assigns an appropriate structural description to each one.

Page 4

Grammatical Analysis Techniques

Two main devices:
• Breaking up a String: Sequential, Hierarchical, Transformational
• Labeling the Constituents: Morphological, Categorial, Functional

Page 5

Hierarchical Breaking up and Categorial Labeling

Poor John ran away.

(S (NP (A Poor) (N John))
   (VP (V ran) (Adv away)))
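The tree on this slide can be held in a program as a nested structure. The sketch below is only an illustration: the tuple encoding and the helper names `leaves` and `bracketed` are assumptions, not something the slides define.

```python
# A constituent tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings. This encoding is an illustrative assumption.
tree = ("S",
        ("NP", ("A", "Poor"), ("N", "John")),
        ("VP", ("V", "ran"), ("Adv", "away")))


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the category label
        words.extend(leaves(child))
    return words


def bracketed(node):
    """Render a node in bracketed notation."""
    if isinstance(node, str):
        return node
    return "(" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + ")"


print(" ".join(leaves(tree)))
print(bracketed(tree))
```

Reading the leaves left to right gives back the sentence, which is what "breaking up" a string hierarchically preserves.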

Page 6

Hierarchical Breaking up and Functional Labeling

Immediate Constituent (IC) Analysis

Construction types in terms of the function of the constituents:
• Predication (subject + predicate)
• Modification (modifier + head)
• Complementation (verbal + complement)
• Subordination (subordinator + dependent unit)
• Coordination (independent unit + coordinator)

Page 7

An Example

In the morning, the sky looked much brighter.

(S (Modifier (Subordinator In)
             (DU (Modifier the) (Head morning)))
   (Head (Subject (Modifier the) (Head sky))
         (Predicate (Verbal looked)
                    (Complement (Modifier much) (Head brighter)))))

Page 8

Noun Phrases

John: (NP (N John))
the student: (NP (Det the) (N student))
the intelligent student: (NP (Det the) (AdjP intelligent) (N student))

Page 9

Phrases

Page 10

Noun Phrase

his first five PhD students

(NP (Det his) (Ord first) (Quant five) (N PhD) (N students))

Page 11

Noun Phrase

the five best students of my class

(NP (Det the) (Quant five) (AP best) (N students) (PP of my class))

Page 12

Verb Phrases

can sing: (VP (Aux can) (V sing))
can hit the ball: (VP (Aux can) (V hit) (NP the ball))

Page 13

Verb Phrase

can give a flower to Mary

(VP (Aux can) (V give) (NP a flower) (PP to Mary))

Page 14

Verb Phrase

may make John the chairman

(VP (Aux may) (V make) (NP John) (NP the chairman))

Page 15

Verb Phrase

may find the book very interesting

(VP (Aux may) (V find) (NP the book) (AP very interesting))

Page 16

Prepositional Phrases

in the classroom: (PP (P in) (NP the classroom))
near the river: (PP (P near) (NP the river))

Page 17

Adjective Phrases

intelligent: (AP (A intelligent))
very honest: (AP (Degree very) (A honest))
fond of sweets: (AP (A fond) (PP of sweets))

Page 18

Adjective Phrase

• very worried that she might have done badly in the assignment

(AP (Degree very) (A worried) (S’ that she might have done badly in the assignment))

Page 19

A segment of English Grammar

S’ → (C) S
S → {NP/S’} VP
VP → (AP+) V (AP+) ({NP/S’}) (AP+) (PP+) (AP+)
NP → (D) (AP+) N (PP+)
PP → P NP
AP → (AP) A

Page 20

PSG Parse Tree

John wrote those words in the Book of Proverbs.

(S (NP (PropN John))
   (VP (V wrote)
       (NP those words)
       (PP (P in)
           (NP (NP the book)
               (PP of proverbs)))))

Page 21

Penn Treebank

John wrote those words in the Book of Proverbs.

(S (NP-SBJ (NP John))
   (VP wrote
       (NP those words)
       (PP-LOC in
               (NP (NP-TTL (NP the Book)
                           (PP of (NP Proverbs)))))))

Page 22

PSG Parse Tree

Official trading in the shares will start in Paris on Nov 6.

(S (NP (NP (AP (A official)) (N trading))
       (PP (P in) (NP the shares)))
   (VP (Aux will) (V start)
       (PP in Paris)
       (PP on Nov 6)))

Page 23

Penn POS Tags

[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]

Official trading in the shares will start in Paris on Nov 6.
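The tagged line above follows a simple convention: each token is word/TAG, and square brackets group base noun phrases. A minimal sketch of reading that format is shown here; the function names `read_tagged` and `chunks` are illustrative assumptions.

```python
tagged = ("[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] "
          "will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]")


def read_tagged(text):
    """Return (word, tag) pairs, ignoring the chunk brackets."""
    pairs = []
    for token in text.split():
        if token in ("[", "]"):
            continue
        # Split on the last slash so a word containing "/" stays intact.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def chunks(text):
    """Return the bracketed groups as strings of words."""
    groups, current = [], None
    for token in text.split():
        if token == "[":
            current = []
        elif token == "]":
            groups.append(" ".join(current))
            current = None
        elif current is not None:
            current.append(token.rpartition("/")[0])
    return groups


print(read_tagged(tagged)[:3])
print(chunks(tagged))
```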

Page 24

Penn Treebank

Official trading in the shares will start in Paris on Nov 6.

( (S (NP-SBJ (NP Official trading)
             (PP in (NP the shares)))
     (VP will
         (VP start
             (PP-LOC in (NP Paris))
             (PP-TMP on (NP (NP Nov 6)))))) )
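Bracketed trees of this kind are easy to read back into a nested structure. The following is a minimal sketch, assuming balanced brackets and a label as the first item of every bracket; `parse_brackets` and `words` are hypothetical helper names, and the outer unlabeled wrapper of the treebank file format is left out of the sample string.

```python
import re


def parse_brackets(text):
    """Turn a bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            node = []
            while tokens[pos] != ")":
                node.append(read())
            pos += 1  # consume ")"
            return node
        token = tokens[pos]
        pos += 1
        return token

    tree = read()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


def words(node):
    """Collect the words of a tree, skipping the label of each bracket."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node[1:]:
        out.extend(words(child))
    return out


sentence = (
    "(S (NP-SBJ (NP Official trading) (PP in (NP the shares)))"
    " (VP will (VP start"
    " (PP-LOC in (NP Paris))"
    " (PP-TMP on (NP (NP Nov 6))))))"
)
tree = parse_brackets(sentence)
print(tree[0], " ".join(words(tree)))
```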

Page 25

Penn POS Tag Set

• Adjective: JJ
• Adverb: RB
• Cardinal Number: CD
• Determiner: DT
• Preposition: IN
• Coordinating Conjunction: CC
• Subordinating Conjunction: IN
• Singular Noun: NN
• Plural Noun: NNS
• Personal Pronoun: PRP
• Proper Noun: NNP
• Verb base form: VB
• Modal verb: MD
• Verb (3sg Pres): VBZ
• Wh-determiner: WDT
• Wh-pronoun: WP

Page 26

Basic Parsing Strategy

Page 27

A Fragment of English Grammar

S → NP VP
VP → V NP
NP → NNP | ART N
NNP → Ram
V → ate | saw
ART → a | an | the
N → rice | apple | movie
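Because this fragment has no recursive rule, the set of sentences it generates is finite and can be listed. The sketch below stores the rules in a dictionary and enumerates them; the dictionary layout and the name `expansions` are illustrative assumptions.

```python
from itertools import product

# The grammar fragment: each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V": [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N": [["rice"], ["apple"], ["movie"]],
}


def expansions(symbol):
    """Yield every word sequence derivable from symbol (grammar is non-recursive)."""
    if symbol not in GRAMMAR:  # a word
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expansions(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(ws) for ws in expansions("S")]
print(len(sentences))
print("Ram ate the rice" in sentences, "Ram ate an rice" in sentences)
```

The enumeration also shows that the fragment overgenerates: it accepts "Ram ate an rice" because nothing in the rules ties the choice of article to the noun.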

Page 28

Derivation

S => NP VP (rewrite S)
=> NNP VP (rewrite NP)
=> Ram VP (rewrite NNP)
=> Ram V NP (rewrite VP)
=> Ram ate NP (rewrite V)
=> Ram ate ART N (rewrite NP)
=> Ram ate the N (rewrite ART)
=> Ram ate the rice (rewrite N)

Multiple Choice Points

• S is a special symbol called the start symbol.
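The derivation above always rewrites the leftmost nonterminal, and at several steps more than one rule is available. A minimal sketch of that process follows, with the rule choices passed in explicitly; the function name `leftmost_derivation` and the use of rule indices are assumptions made for illustration.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V": [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N": [["rice"], ["apple"], ["movie"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost nonterminal at each step using the chosen rule index."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has rules.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Rule indices for S, NP, NNP, VP, V, NP, ART, N in the order they are rewritten.
steps = leftmost_derivation("S", [0, 0, 0, 0, 0, 1, 2, 0])
print("\n=> ".join(steps))
```

Each index greater than zero, or any step where the nonterminal has several alternatives, is a choice point that a parser has to search over.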

Page 29

Two Strategies: Top-Down & Bottom-Up

Top down: Start with S and generate the sentence.

Bottom up: Start with the words in the sentence and use the rewrite rules backwards to reduce the sequence of symbols to produce S.

The previous slide showed the top-down strategy.

Page 30

Bottom-Up Derivation

Ram ate the rice
=> NNP ate the rice (rewrite Ram)
=> NNP V the rice (rewrite ate)
=> NNP V ART rice (rewrite the)
=> NNP V ART N (rewrite rice)
=> NP V ART N (rewrite NNP)
=> NP V NP (rewrite ART N)
=> NP VP (rewrite V NP)
=> S
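The same sequence of reductions can be produced mechanically. This is a greedy sketch that first replaces words by their categories and then always applies the leftmost applicable reduction; it does no backtracking, so it is only adequate for small unambiguous fragments like this one. The names `LEXICON`, `RULES`, `reduce_once` and `bottom_up` are illustrative assumptions.

```python
LEXICON = {"Ram": "NNP", "ate": "V", "saw": "V", "a": "ART", "an": "ART",
           "the": "ART", "rice": "N", "apple": "N", "movie": "N"}
RULES = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]),
         ("NP", ["NNP"]), ("NP", ["ART", "N"])]


def reduce_once(form):
    """Apply the leftmost applicable reduction; return None if none applies."""
    for i in range(len(form)):
        for lhs, rhs in RULES:
            if form[i:i + len(rhs)] == rhs:
                return form[:i] + [lhs] + form[i + len(rhs):]
    return None


def bottom_up(words):
    """Return the list of sentential forms from the words up to S."""
    form = list(words)
    steps = [" ".join(form)]
    for i, word in enumerate(words):  # rewrite each word by its category
        form[i] = LEXICON[word]
        steps.append(" ".join(form))
    while form != ["S"]:
        form = reduce_once(form)
        if form is None:
            raise ValueError("no reduction applies")
        steps.append(" ".join(form))
    return steps


steps = bottom_up("Ram ate the rice".split())
print("\n=> ".join(steps))
```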

Page 31

Parsing Algorithm

A procedure that “searches” through the grammatical rules to find a combination that generates a tree which stands for the structure of the sentence.

Page 32

Top-Down Parsing (using A*)

DFS on the AND-OR graph

Data structures:
• Open List (OL): Nodes to be expanded
• Closed List (CL): Expanded nodes
• Input List (IL): Words of the sentence to be parsed
• Moving Head (MH): Walks over the IL

Page 33

Trace of Top-Down Parsing

Initial Condition (T0):
OL: S
CL: (empty)
IL: Ram ate the rice

Page 34

Trace of Top-Down Parsing

T1:
OL: NP VP
CL: S
IL: Ram ate the rice

Page 35

Trace of Top-Down Parsing

T2:
OL: NNP ART N VP
CL: S NP
IL: Ram ate the rice

Page 36

Trace of Top-Down Parsing

T3:
OL: ART N VP
CL: S NP NNP
IL: Ram ate the rice
MH (portion of input consumed)

Page 37

Trace of Top-Down Parsing

T4:
OL: N VP
CL: S NP NNP ART*
IL: Ram ate the rice

(* indicates ‘useless’ expansion)

Page 38

Trace of Top-Down Parsing

T5:
OL: VP
CL: S NP NNP ART* N*
IL: Ram ate the rice

Page 39

Trace of Top-Down Parsing

T6:
OL: V NP
CL: S NP NNP ART* N*
IL: Ram ate the rice

Page 40

Trace of Top-Down Parsing

T7:
OL: NP
CL: S NP NNP ART* N* V
IL: Ram ate the rice

Page 41

Trace of Top-Down Parsing

T8:
OL: NNP ART N
CL: S NP NNP ART* N* V NP
IL: Ram ate the rice

Page 42

Trace of Top-Down Parsing

T9:
OL: ART N
CL: S NP NNP ART* N* V NNP*
IL: Ram ate the rice

Page 43

Trace of Top-Down Parsing

T10:
OL: N
CL: S NP NNP ART* N* V NNP ART
IL: Ram ate the rice

Page 44

Trace of Top-Down Parsing

T11:
OL: (empty)
CL: S NP NNP ART* N* V NNP ART N
IL: Ram ate the rice

Successful Termination: OL empty AND MH at the end of IL.
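The trace can be approximated in code with a depth-first search. The sketch below is not a reproduction of the slides' bookkeeping: it keeps each alternative as a separate state (pending symbols plus head position) instead of placing alternatives side by side on the open list, and all names in it are illustrative assumptions. It succeeds under the same condition, nothing left to expand and the head at the end of the input.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V": [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N": [["rice"], ["apple"], ["movie"]],
}


def top_down_parse(words, start="S"):
    """Depth-first search over (pending symbols, head position) states."""
    open_list = [((start,), 0)]  # OL: states still to be expanded
    closed_list = []             # CL: states already expanded
    while open_list:
        pending, mh = open_list.pop()  # DFS: most recently added state first
        closed_list.append((pending, mh))
        if not pending:
            if mh == len(words):       # nothing pending and MH at end of IL
                return True, closed_list
            continue
        first, rest = pending[0], pending[1:]
        if first in GRAMMAR:
            # Push alternatives in reverse so the first rule is tried first.
            for rhs in reversed(GRAMMAR[first]):
                open_list.append((tuple(rhs) + rest, mh))
        elif mh < len(words) and words[mh] == first:
            open_list.append((rest, mh + 1))  # word matched: advance MH
        # Otherwise the state is a dead end (a 'useless' expansion).
    return False, closed_list


ok, expanded = top_down_parse("Ram ate the rice".split())
print(ok, len(expanded))
```

This search terminates on the fragment because no rule is left-recursive; a grammar with a rule such as NP → NP PP would need a different strategy.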

Page 45

Bottom-Up Parsing

Basic idea:
• Refer to words from the lexicon.
• Obtain all POSs for each word.
• Keep combining until S is obtained.

(to be continued)
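The slides stop before giving the bottom-up algorithm, so the following is only one possible reading of the basic idea: a CYK-style chart that records every category found for each span of words. The second category given to "saw" is a hypothetical lexicon entry added to show a word with more than one POS; the table names and `recognize` are likewise assumptions.

```python
from collections import defaultdict

# Every POS for each word; "saw" as a noun is an assumed extra entry.
LEXICON = {"Ram": {"NNP"}, "ate": {"V"}, "saw": {"V", "N"},
           "a": {"ART"}, "an": {"ART"}, "the": {"ART"},
           "rice": {"N"}, "apple": {"N"}, "movie": {"N"}}
UNARY = {"NNP": {"NP"}}                      # NP -> NNP
BINARY = {("NP", "VP"): {"S"},               # S -> NP VP
          ("V", "NP"): {"VP"},               # VP -> V NP
          ("ART", "N"): {"NP"}}              # NP -> ART N


def close_unary(categories):
    """Add the categories reachable by one unary rule."""
    out = set(categories)
    for cat in categories:
        out |= UNARY.get(cat, set())
    return out


def recognize(words):
    """Return True if the category S spans the whole input."""
    n = len(words)
    chart = defaultdict(set)  # chart[(i, j)]: categories spanning words[i:j]
    for i, word in enumerate(words):
        chart[(i, i + 1)] = close_unary(LEXICON.get(word, set()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between two sub-spans
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((left, right), set())
            chart[(i, j)] = close_unary(chart[(i, j)])
    return "S" in chart[(0, n)]


print(recognize("Ram saw the movie".split()))
print(recognize("ate Ram the rice".split()))
```

Keeping all categories per span means the noun reading of "saw" is carried along without harm: it simply never combines into a larger constituent here.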

