
THEORIES AND TECHNIQUES OF RECOGNITION (Teorie e Tecniche del Riconoscimento). Computational linguistics in Python: syntactic analysis (parsing)

Date post: 11-Jan-2016
Upload: george-morgan
Transcript
Page 1

THEORIES AND TECHNIQUES OF RECOGNITION

Computational linguistics in Python: syntactic analysis (parsing)

Page 2

FROM CHUNKING TO FULL SYNTACTIC ANALYSIS

Page 3

PROBLEM: AMBIGUITY

While hunting in Africa, I shot an elephant in my pajamas. How an elephant got into my pajamas I'll never know.

Page 4

PROBLEM: AMBIGUITY

While hunting in Africa, I shot an elephant in my pajamas. How an elephant got into my pajamas I'll never know.

Page 5

CHARACTERIZING THE SYNTAX OF A LANGUAGE: CONTEXT-FREE GRAMMARS

• Slides ELN?

Page 6

CHARACTERIZING THE SYNTAX OF A LANGUAGE: CONTEXT-FREE GRAMMARS

• Capture constituency and ordering
– Ordering: what are the rules that govern the ordering of words and bigger units in the language?
– Constituency: how words group into units and how the various kinds of units behave

Page 7

Constituency

• E.g., Noun phrases (NPs)
– Three parties from Brooklyn
– A high-class spot such as Mindy’s
– The Broadway coppers
– They
– Harry the Horse
– The reason he comes into the Hot Box
• How do we know these form a constituent?

Page 8

Constituency (II)

– They can all appear before a verb:
• Three parties from Brooklyn arrive…
• A high-class spot such as Mindy’s attracts…
• The Broadway coppers love…
• They sit
– But individual words can’t always appear before verbs:
• *from arrive…
• *as attracts…
• *the is
• *spot is…
– Must be able to state generalizations like:
• Noun phrases occur before verbs

Page 9

Constituency (III)

• Preposing and postposing:
– On September 17th, I’d like to fly from Atlanta to Denver
– I’d like to fly on September 17th from Atlanta to Denver
– I’d like to fly from Atlanta to Denver on September 17th.
• But not:
– *On September, I’d like to fly 17th from Atlanta to Denver
– *On I’d like to fly September 17th from Atlanta to Denver

Page 10

Indicating constituents: brackets, trees

• [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]

[Figure: parse tree for "I prefer a morning flight", with nodes S, NP, VP, Pro, Verb, Det, Nom, Noun, Noun]
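The bracketed notation and the tree carry the same information. As a minimal sketch of that equivalence, the following plain-Python reader turns the bracketed string from this slide into nested tuples and prints it as an indented tree. The function names (`parse_brackets`, `leaves`, `show`) are illustrative, not part of any library, and the reader assumes well-formed input.

```python
def parse_brackets(text):
    """Read a labelled bracketing such as "[NP [Det a] [N flight]]".

    Each constituent becomes a tuple (label, child, child, ...);
    a word (leaf) stays a plain string. Assumes well-formed input.
    """
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "[", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, *children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the closing bracket")
    return tree


def leaves(tree):
    """Return the words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def show(tree, depth=0):
    """Print one node per line, indented by depth."""
    if isinstance(tree, str):
        print("  " * depth + tree)
    else:
        print("  " * depth + tree[0])
        for child in tree[1:]:
            show(child, depth + 1)


BRACKETED = "[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]"
tree = parse_brackets(BRACKETED)
show(tree)
print(" ".join(leaves(tree)))
```

Reading the leaves back from left to right recovers the original sentence, which is what makes the bracketing an analysis of that string.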

Page 11

Beyond regular languages: Context-Free Grammars

S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V

Det -> the
Det -> a
Noun -> flight
V -> left

Page 12

CFGs: set of rules

• S -> NP VP
– This says that there are units called S, NP, and VP in this language
– That an S consists of an NP followed immediately by a VP
– Doesn’t say that that’s the only kind of S
– Nor does it say that this is the only place that NPs and VPs occur

Page 13

Generativity

• As with FSAs you can view these rules as either analysis or synthesis machines
– Generate strings in the language
– Reject strings not in the language
– Impose structures (trees) on strings in the language
• How can we define grammatical vs. ungrammatical sentences?
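To make the "synthesis" and "analysis" views concrete, here is a minimal sketch in plain Python that uses the toy grammar from the "Beyond regular languages" slide. It enumerates every string the grammar generates and then uses that set to accept or reject sentences. The dictionary encoding and the names `expand` and `is_grammatical` are assumptions made for illustration, and exhaustive enumeration only works because this grammar has no recursion.

```python
from itertools import product

# Toy grammar: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["V"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
    "V": [["left"]],
}


def expand(symbol):
    """Yield every word sequence derivable from symbol.

    Terminates only because the toy grammar is not recursive.
    """
    if symbol not in GRAMMAR:  # a terminal symbol (a word)
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]


# Synthesis: generate the whole (finite) language.
LANGUAGE = {" ".join(words) for words in expand("S")}


def is_grammatical(sentence):
    """Analysis: accept a sentence only if the grammar generates it."""
    return " ".join(sentence.split()) in LANGUAGE


print(sorted(LANGUAGE))
print(is_grammatical("the flight left"))
print(is_grammatical("flight the left"))
```

For grammars with recursive rules the language is infinite, so membership has to be decided by a parser rather than by enumeration.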

Page 14

Derivations

• A derivation is a sequence of rules applied to a string that accounts for that string
– Covers all the elements in the string
– Covers only the elements in the string
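As an illustration, the sketch below searches for a leftmost derivation of a sentence with the toy grammar S -> NP VP, NP -> Det Nominal, and so on, and prints each intermediate form. The function name `leftmost_derivation` and the dictionary encoding are assumptions for this example. The search relies on the grammar having no empty right-hand sides and no cycles of unit rules.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["V"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
    "V": [["left"]],
}


def leftmost_derivation(form, target):
    """Return the list of forms leading from form to target, or None.

    At each step the leftmost non-terminal is rewritten with one rule.
    """
    if len(form) > len(target):
        return None  # no rule shrinks a form, so this branch cannot succeed
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            break  # found the leftmost non-terminal
        if symbol != target[i]:
            return None  # the words produced so far do not match the input
    else:
        # Only words are left: the derivation must cover the input exactly.
        return [form] if form == target else None
    for rhs in GRAMMAR[symbol]:
        rest = leftmost_derivation(form[:i] + rhs + form[i + 1:], target)
        if rest is not None:
            return [form] + rest
    return None


steps = leftmost_derivation(["S"], "the flight left".split())
for step in steps:
    print(" ".join(step))
```

The last line printed is the sentence itself: the derivation covers all the words of the string and nothing else.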

Page 15

Derivations as Trees

[Figure: parse tree for "I prefer a morning flight", with nodes S, NP, VP, Pro, Verb, Det, Nom, Noun, Noun]

Page 16

CFGs more formally

• A context-free grammar has 4 parameters (“is a 4-tuple”)

1) A set of non-terminal symbols (“variables”) $N$

2) A set of terminal symbols $\Sigma$ (disjoint from $N$)

3) A set of productions $P$, each of the form
• $A \rightarrow \beta$
• where $A$ is a non-terminal and $\beta$ is a string of symbols from the infinite set of strings $(\Sigma \cup N)^*$

4) A designated start symbol $S$
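The four components can be written down directly as a data structure. The following is a minimal sketch, not a library API: the class name `ContextFreeGrammar` and its field names are hypothetical, and the checks simply restate the conditions in the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextFreeGrammar:
    """A CFG as a 4-tuple (hypothetical helper class for illustration)."""

    nonterminals: frozenset  # the set of non-terminal symbols
    terminals: frozenset     # the set of terminal symbols
    productions: tuple       # pairs (lhs, rhs), rhs being a tuple of symbols
    start: str               # the designated start symbol

    def __post_init__(self):
        if self.nonterminals & self.terminals:
            raise ValueError("terminals and non-terminals must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            if not set(rhs) <= symbols:
                raise ValueError(f"unknown symbol in a rule for {lhs!r}")


toy = ContextFreeGrammar(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "Nominal", "Noun", "V"}),
    terminals=frozenset({"the", "a", "flight", "left"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "Nominal")),
        ("Nominal", ("Noun",)),
        ("VP", ("V",)),
        ("Det", ("the",)),
        ("Det", ("a",)),
        ("Noun", ("flight",)),
        ("V", ("left",)),
    ),
    start="S",
)
print(len(toy.productions), "productions, start symbol", toy.start)
```

Note that every left-hand side is a single non-terminal, with no surrounding context; that restriction is what makes the grammar context-free.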

Page 17

Defining a CF language via derivation

• A string A derives a string B if
– A can be rewritten as B via some series of rule applications
• More formally:
– If $A \rightarrow \beta$ is a production of $P$
– and $\alpha$ and $\gamma$ are any strings in the set $(\Sigma \cup N)^*$
– then we say that $\alpha A \gamma$ directly derives $\alpha \beta \gamma$, or $\alpha A \gamma \Rightarrow \alpha \beta \gamma$
– Derivation is a generalization of direct derivation
– Let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be strings in $(\Sigma \cup N)^*$, $m \geq 1$, s.t. $\alpha_1 \Rightarrow \alpha_2,\ \alpha_2 \Rightarrow \alpha_3,\ \ldots,\ \alpha_{m-1} \Rightarrow \alpha_m$
• We say that $\alpha_1$ derives $\alpha_m$, or $\alpha_1 \Rightarrow^* \alpha_m$
– We then formally define the language $L_G$ generated by grammar $G$
• A set of strings composed of terminal symbols derived from $S$
• $L_G = \{ w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \}$

Page 18

What “context free” means

Page 19

Derivations and languages

• The language $L_G$ GENERATED by a CFG $G$ is the set of strings of TERMINAL symbols that can be derived from the start symbol $S$ using the production rules in $G$
– $L_G = \{ w \mid w \in \Sigma^* \text{ and } S \text{ derives } w \}$
• The strings in $L_G$ are called GRAMMATICAL
• The strings not in $L_G$ are called UNGRAMMATICAL

Page 20

Grammar development

• One of the most basic skills in NLE is the ability to write a CFG for some fragment of a language (e.g., dates)

• We’ll briefly cover some of the issues to be addressed when writing small CFGs

Page 21

CFG IN PYTHON

• NLTK, 8.3
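As a minimal sketch of how a CFG is written in NLTK, the toy grammar from the "Beyond regular languages" slide can be loaded with `nltk.CFG.fromstring`. The sketch assumes NLTK 3 is installed. Terminals are quoted, and alternatives for the same left-hand side can be joined with `|`.

```python
import nltk

# The toy grammar from the earlier slide, in NLTK's grammar notation.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> 'the' | 'a'
Noun -> 'flight'
V -> 'left'
""")

# The start symbol is the left-hand side of the first rule.
print(grammar.start())

# Each alternative becomes its own production object.
for production in grammar.productions():
    print(production)
```

The resulting grammar object is what the NLTK parsers on the following slides take as their argument.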

Page 22

SYNTACTIC ANALYSIS (PARSING)

• TOP-DOWN search: the parse tree has to be rooted in the start symbol S
– EXPECTATION-DRIVEN parsing
– Example: RECURSIVE DESCENT
• BOTTOM-UP search: the parse tree must be an analysis of the input
– DATA-DRIVEN parsing
– Example: SHIFT-REDUCE

Page 23

TOP-DOWN PARSING WITH NLTK

• Recursive descent parsing (NLTK, 8.3)
– nltk.RecursiveDescentParser(grammar)
– nltk.app.rdparser()
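A minimal usage sketch of the recursive descent parser, assuming NLTK 3 and reusing the toy grammar from the earlier slides (the example sentence is chosen for illustration):

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> 'the' | 'a'
Noun -> 'flight'
V -> 'left'
""")

# Top-down: start from S and expand rules until the words are matched.
parser = nltk.RecursiveDescentParser(grammar)
trees = list(parser.parse("the flight left".split()))

for tree in trees:
    print(tree)
```

This grammar is safe for top-down search. With a left-recursive rule such as NP -> NP PP, a recursive descent parser keeps expanding the same symbol and never terminates.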

Page 24

BOTTOM-UP PARSING WITH NLTK

• Shift-reduce (NLTK, 8.3, p. 305)
– nltk.app.srparser()
– nltk.ShiftReduceParser(grammar)
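The bottom-up counterpart can be sketched the same way, again assuming NLTK 3 and the toy grammar from the earlier slides. The parser shifts words onto a stack and reduces the top of the stack whenever it matches the right-hand side of a rule.

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> 'the' | 'a'
Noun -> 'flight'
V -> 'left'
""")

# Bottom-up: build constituents from the words towards S.
parser = nltk.ShiftReduceParser(grammar)
trees = list(parser.parse("the flight left".split()))

for tree in trees:
    print(tree)
```

NLTK's shift-reduce parser does not backtrack, so it returns at most one parse and can fail to find a parse on other grammars even when one exists. Passing `trace=2` to the constructor prints the individual shift and reduce steps.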

Page 25

MORE ADVANCED PARSING MODELS

• Left corner (NLTK)• Chart (NLTK)

Page 26

DEPENDENCIES AND DEPENDENCY GRAMMAR (NLTK, 8.5)

Page 27

THE PROBLEM OF AMBIGUITY

• Ambiguity
– Church and Patil (1982): the number of attachment ambiguities grows like the Catalan numbers
• C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 42, C(6) = 132, C(7) = 429, C(8) = 1430
• Avoiding reparsing
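The Catalan numbers have the closed form C(n) = (2n choose n) / (n + 1). A short sketch in plain Python (the function name `catalan` is illustrative) shows how fast they grow:

```python
from math import comb


def catalan(n):
    """Return the n-th Catalan number, C(n) = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


for n in range(1, 10):
    print(n, catalan(n))
```

The values roughly quadruple at each step, which is why enumerating every attachment explicitly becomes impractical for long sentences.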

Page 28

COMMON STRUCTURAL AMBIGUITIES

• COORDINATION ambiguity
– OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN
• ATTACHMENT ambiguity:
– Gerundive VP attachment ambiguity
• I saw the Eiffel Tower flying to Paris
– PP attachment ambiguity
• I shot an elephant in my pajamas
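The PP attachment ambiguity can be reproduced with a small grammar and a chart parser. The sketch below assumes NLTK 3 and uses a grammar adapted from the NLTK book's treatment of this sentence. The PP "in my pajamas" can attach either to the noun phrase (NP -> Det N PP) or to the verb phrase (VP -> VP PP).

```python
import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

sentence = "I shot an elephant in my pajamas".split()

# A chart parser returns every analysis licensed by the grammar.
parser = nltk.ChartParser(groucho_grammar)
trees = list(parser.parse(sentence))

print(len(trees), "parses")
for tree in trees:
    print(tree)
```

In one tree the elephant is in the pajamas; in the other the shooting takes place in them. The grammar alone gives no reason to prefer either reading.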

Page 29

PP ATTACHMENT AMBIGUITY

Page 30

AMBIGUITY: SOLUTIONS

• Use a PROBABILISTIC GRAMMAR (not covered in this module)

• Use semantics

Page 31

WRITING A GRAMMAR

• NLTK, 8.6

