Parallel Tools for Natural Language Processing

Post on 03-Jan-2016



Mark Brigham

Melanie Goetz

Andrew Hogue

6.338 / 18.337 - March 16, 2004

Sentence Parsing

• Consider the sentence:

“John ate the cookie on the table”

• We want to:

– Tag the sentence with parts of speech

– Group the words by phrase

Context Free Grammars

• Recursive set of rules

• Defines what syntactic structure can be applied to a phrase or word

• Top-level rule S defines the sentence

S → NP VP

NP → Det N

NP → NP PP

VP → VP PP

VP → V NP

N → ‘cookie’

N → ‘table’

Det → ‘the’

V → ‘ate’
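A minimal sketch of how the rules above might be stored for a parser, written in Python. The names `BINARY_RULES`, `LEXICON` and `tags_for` are illustrative and do not come from the slides. The rule list above has no entries for ‘John’ or ‘on’, so the entries NP → ‘John’, P → ‘on’ and PP → P NP are assumptions added here only so that the example sentence can be covered.

```python
# Phrase rules from the slide, stored as (left child, right child) -> parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},  # assumption: not listed on the slide
}

# Word rules from the slide, stored as word -> possible categories.
LEXICON = {
    "cookie": {"N"},
    "table": {"N"},
    "the": {"Det"},
    "ate": {"V"},
    "John": {"NP"},  # assumption: not listed on the slide
    "on": {"P"},     # assumption: not listed on the slide
}


def tags_for(sentence):
    """Return each word paired with its possible part-of-speech tags."""
    return [(w, sorted(LEXICON.get(w, set()))) for w in sentence.split()]


if __name__ == "__main__":
    for word, tags in tags_for("John ate the cookie on the table"):
        print(word, tags)
```

Keying the phrase rules by their right-hand side is a design choice that suits bottom-up parsing, where the parser already holds two adjacent constituents and asks which parent could cover them.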

• Applying a CFG to a sentence creates a parse tree for that sentence

• Top-down parse

• Bottom-up parse: parallelizable!

Ambiguity

More than one parse for a single sentence!
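For the example sentence, the ambiguity comes from where “on the table” attaches: to the noun phrase (NP → NP PP) or to the verb phrase (VP → VP PP). The sketch below counts the distinct parse trees bottom-up. It is an illustration, not the presenters' code: `count_parses` is a hypothetical helper, and the entries for ‘John’, ‘on’ and PP → P NP are assumptions, since the slides do not list them.

```python
from collections import defaultdict

# (parent, left child, right child)
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "VP", "PP"),
    ("VP", "V", "NP"),
    ("PP", "P", "NP"),  # assumption: not listed on the slides
]

LEXICON = {
    "John": ["NP"],  # assumption
    "ate": ["V"],
    "the": ["Det"],
    "cookie": ["N"],
    "on": ["P"],     # assumption
    "table": ["N"],
}


def count_parses(words, root="S"):
    """Count the distinct trees rooted at `root` that cover all of `words`."""
    n = len(words)
    # counts[(i, j)][X] = number of trees rooted at X over words[i:j]
    counts = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            counts[(i, i + 1)][tag] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    counts[(i, j)][parent] += (
                        counts[(i, k)][left] * counts[(k, j)][right]
                    )
    return counts[(0, n)][root]


if __name__ == "__main__":
    print(count_parses("John ate the cookie on the table".split()))  # 2
    print(count_parses("John ate the cookie".split()))               # 1
```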

Parallelization

• Bottom-up rule application appropriate for parallel processing

• Ambiguous parses also parallelizable

• Long, complex sentences may be most interesting

• Proust?
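One way to see why bottom-up rule application suits parallel processing: each adjacent pair of constituents can be checked against the rules without looking at any other pair. The sketch below runs one such pass with a thread pool. The names `reduce_pair` and `reducible_pairs` are hypothetical, the PP → P NP rule is an assumption not listed on the slides, and Python threads are used only to show the structure of the computation (CPython's global interpreter lock means they give no real speedup for work like this).

```python
from concurrent.futures import ThreadPoolExecutor

# (left child, right child) -> parent
RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("NP", "PP"): "NP",
    ("VP", "PP"): "VP",
    ("V", "NP"): "VP",
    ("P", "NP"): "PP",  # assumption: not listed on the slides
}


def reduce_pair(pair):
    """Return the parent category for an adjacent pair, or None."""
    return RULES.get(pair)


def reducible_pairs(tags):
    """Check every adjacent pair independently; return (position, parent)."""
    pairs = list(zip(tags, tags[1:]))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(reduce_pair, pairs))  # order is preserved
    return [(i, parent) for i, parent in enumerate(results) if parent]


if __name__ == "__main__":
    # Tags for "John ate the cookie on the table"
    tags = ["NP", "V", "Det", "N", "P", "Det", "N"]
    print(reducible_pairs(tags))  # [(2, 'NP'), (5, 'NP')]
```

Both “the cookie” and “the table” are found in the same pass, which is the kind of independent work a parallel parser could hand to separate processors.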

Chart Parsing

• Create a matrix where entries correspond to words/phrases

• If there is a valid CFG parse of a phrase [i,j], add it to that matrix cell

• A cell [i,j] may only depend on cells [m,n] for shorter phrases contained in it, i.e. i ≤ m ≤ n ≤ j with [m,n] ≠ [i,j].
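A sketch of a chart parser in this style, assuming a CYK-like fill order; the slides do not name the exact algorithm. In this sketch cell (i, j) holds the categories that cover `words[i:j]`, and each cell reads only cells for shorter spans inside it, so all cells of the same span length can be filled at the same time. `fill_cell`, `parse_chart` and `recognizes` are hypothetical names, and the entries for ‘John’, ‘on’ and PP → P NP are assumptions not listed on the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# (parent, left child, right child)
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "VP", "PP"),
    ("VP", "V", "NP"),
    ("PP", "P", "NP"),  # assumption: not listed on the slides
]

LEXICON = {
    "John": {"NP"},  # assumption
    "ate": {"V"},
    "the": {"Det"},
    "cookie": {"N"},
    "on": {"P"},     # assumption
    "table": {"N"},
}


def fill_cell(chart, i, j):
    """Categories covering words[i:j], built only from shorter spans."""
    labels = set()
    for k in range(i + 1, j):  # split point between the two children
        for parent, left, right in BINARY_RULES:
            if left in chart[(i, k)] and right in chart[(k, j)]:
                labels.add(parent)
    return labels


def parse_chart(words):
    """Fill the chart one span length at a time."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICON.get(w, ())) for i, w in enumerate(words)}
    with ThreadPoolExecutor() as pool:
        for length in range(2, n + 1):
            spans = [(i, i + length) for i in range(n - length + 1)]
            # Cells of equal length do not read each other, so they can be
            # computed concurrently; the chart is updated only afterwards.
            results = list(pool.map(lambda s: fill_cell(chart, *s), spans))
            chart.update(zip(spans, results))
    return chart


def recognizes(words, root="S"):
    """True if the whole sentence can be parsed as `root`."""
    return root in parse_chart(words)[(0, len(words))]


if __name__ == "__main__":
    print(recognizes("John ate the cookie on the table".split()))  # True
```

The thread pool only marks where the parallelism lies. A real implementation would more likely distribute each span length across processes or machines.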

[Figure, shown over 11 animation frames: the parse chart for “John ate the cookie on the table”, labeled with the words John, ate, the, cookie, on, the, table.]

Other Tools

• Considering parallelizing other NLP tools

• Word-stemming: Multiple finite state automata applied to a single word in parallel

• Automated part-of-speech recognition on large corpora
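As an illustration of the word-stemming idea, the sketch below runs several small suffix automata over one word concurrently and keeps the result of whichever one accepts. It is an assumption-heavy toy, not the presenters' design: the class `SuffixAutomaton`, the three suffixes, and the rule of preferring the shortest stem are all invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class SuffixAutomaton:
    """Linear-chain automaton that accepts words ending in `suffix`."""

    def __init__(self, suffix):
        self.suffix = suffix

    def run(self, word):
        """Return the word without the suffix, or None if it is rejected."""
        state = 0  # number of suffix characters matched so far
        for ch in reversed(word):
            if state == len(self.suffix):
                break  # accepting state reached
            if ch != self.suffix[-1 - state]:
                return None  # dead state: the word does not end in the suffix
            state += 1
        if state < len(self.suffix):
            return None  # word is shorter than the suffix
        return word[: len(word) - len(self.suffix)]


# Hypothetical rule set, chosen only for the example.
AUTOMATA = [SuffixAutomaton("ing"), SuffixAutomaton("ed"), SuffixAutomaton("s")]


def stem(word):
    """Run every automaton on the same word and keep the shortest stem."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda automaton: automaton.run(word), AUTOMATA)
        candidates = [stem_ for stem_ in results if stem_]  # drop None and ""
    return min(candidates, key=len) if candidates else word


if __name__ == "__main__":
    for w in ["cookies", "parsing", "parsed", "ate"]:
        print(w, "->", stem(w))
```

The same pattern of mapping one function over independent inputs would apply to part-of-speech tagging a large corpus, where each sentence or document can be tagged separately.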