1
Mildly Context-Sensitive Grammars
Alessandro Mazzei
[email protected]
Dipartimento di Informatica, Università di Torino
2
Outline
● Syntax and generative grammars
● The Chomsky hierarchy and natural language
● Mildly context-sensitive languages and Tree Adjoining Grammars (TAG)
● Anatomy of a parser
3
Outline
● Syntax and generative grammars
● The Chomsky hierarchy and natural language
● Mildly context-sensitive languages and Tree Adjoining Grammars (TAG)
● Anatomy of a parser
4
Natural Language Processing
● Phonetics: acoustic and perceptual elements
● Phonology: inventory of basic sounds (phonemes) and basic rules for their combination, e.g. vowel harmony
● Morphology: how morphemes combine to form words; relationship of phonemes to meaning
● Syntax: sentence formation, word order, and the formation of constituents from word groupings
● Semantics: how word meanings recursively compose to form sentence meanings (from syntax to logical formulas)
● Pragmatics: meaning that is not part of compositional meaning
5
Syntax and Semantics
"Paolo ama Francesca" ("Paolo loves Francesca") vs "Francesca ama Paolo" ("Francesca loves Paolo"): the same words in different orders receive different parse trees and different meanings.
[Figure: the two parse trees, with S, NP, VP, N, V nodes]
Syntactic Parsing
8
Generative Grammars and Natural Languages
● Generative grammars can model natural language as a formal language
● The derivation tree can model the syntactic structure of a sentence
10
Constituency
● Constituent = group of contiguous (?!) words
  – that act as a unit [Fodor-Bever, Bock-Loebell]
  – that have syntactic properties (e.g. preposing/postposing, substitutability)
● Examples: Noun Phrases (NP), Verb Phrases (VP), ...
● In a CFG: constituents ⇔ non-terminal symbols V
11
Toy Grammar
● G4 = (Σ4, {S, NP, VP, V1, V2}, S, P4)
● Σ4 = {I, Anna, John, Harry, saw, see, swimming}
● P4 = {S → NP VP, VP → V1 S, VP → V2, NP → I | John | Harry | Anna, V1 → saw | see, V2 → swimming}
14
Toy Grammar: derivation of "I saw Harry swimming"

S ⇒ NP VP ⇒ I VP ⇒ I V1 S ⇒ I saw S ⇒ I saw NP VP ⇒ I saw Harry VP ⇒ I saw Harry V2 ⇒ I saw Harry swimming

Each step (slides 14–20) rewrites the leftmost non-terminal with one production of P4, growing the derivation tree one node at a time. [Tree figures]
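The derivation above can be replayed mechanically. A minimal Python sketch (the dict encoding of G4 and the helper function are ours, not from the slides):

```python
# Toy grammar G4 from the slides, as a dict from non-terminals to productions.
G4 = {
    "S":  [["NP", "VP"]],
    "VP": [["V1", "S"], ["V2"]],
    "NP": [["I"], ["John"], ["Harry"], ["Anna"]],
    "V1": [["saw"], ["see"]],
    "V2": [["swimming"]],
}
NONTERMINALS = set(G4)

def leftmost_derivation(choices, start="S"):
    """Apply the given production choices to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for lhs, rhs in choices:
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        assert form[i] == lhs and rhs in G4[lhs]   # choice must be a rule of P4
        form = form[:i] + rhs + form[i + 1:]
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation([
    ("S", ["NP", "VP"]), ("NP", ["I"]), ("VP", ["V1", "S"]),
    ("V1", ["saw"]), ("S", ["NP", "VP"]), ("NP", ["Harry"]),
    ("VP", ["V2"]), ("V2", ["swimming"]),
])
print(" => ".join(steps))  # ends with: I saw Harry swimming
```

The eight choices reproduce exactly the eight derivation steps shown on the slides.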
21
Outline
● Syntax and generative grammars
● The Chomsky hierarchy and natural language
● Mildly context-sensitive languages and Tree Adjoining Grammars (TAG)
● Anatomy of a parser
22
Languages: the Chomsky hierarchy
● Linear (A → aB): e.g. (ab)^n
● Context-free (S → aSb): e.g. a^n b^n
● Context-sensitive (Caa → aaCa): e.g. a^n b^n c^n, a^(2^n)
● Type 0 (Ψ → θ)
● L_Diag (outside the hierarchy)
[Figure: nested language classes]
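The example languages in the hierarchy figure can be illustrated with toy recognizers (function names and implementations are ours): a regular language like (ab)^n needs only a finite automaton (here a regex), while a^n b^n and a^n b^n c^n need unbounded memory.

```python
import re

def in_ab_star(s):    # (ab)^n: regular, a regex/finite automaton suffices
    return re.fullmatch(r"(ab)*", s) is not None

def in_anbn(s):       # a^n b^n: context-free, needs a stack/counter
    n = len(s) // 2
    return s == "a" * n + "b" * n

def in_anbncn(s):     # a^n b^n c^n: context-sensitive, not context-free
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

print(in_ab_star("abab"), in_anbn("aabb"), in_anbncn("aabbcc"))  # True True True
print(in_anbn("abab"), in_anbncn("aabbc"))                       # False False
```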
23
Outline
● Syntax and generative grammars
● The Chomsky hierarchy and natural language
● Mildly context-sensitive languages and Tree Adjoining Grammars (TAG)
● Anatomy of a parser
24
Mildly context sensitive languages [Joshi85]
● Include the context-free languages
● Nested and cross-serial dependencies
● Polynomially parsable
● Constant growth property
[Figure: dependency patterns a1 b1 c1 a2 b2 c2 (cross-serial) and a1 b1 c1 c2 b2 a2 (nested)]
25
Constant growth property
● Definition: A language L is constant growth if there is a constant c0 and a finite set of constants C such that for all w ∈ L with |w| > c0 there is a w' ∈ L such that |w| = |w'| + c for some c ∈ C.
● This property is the formal version of the linguistic intuition that the sentences of a natural language can be built from a finite set of bounded structures using the same linear operations [Wei88].
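The definition can be illustrated numerically (example values ours): for a^n b^n consecutive word lengths differ by a constant, while for a^(2^n) the gaps grow without bound, so it is not constant growth.

```python
# Word lengths for two languages and the gaps between consecutive lengths.
anbn_lengths = [2 * n for n in range(1, 8)]   # |a^n b^n| = 2n
a2n_lengths  = [2 ** n for n in range(1, 8)]  # |a^(2^n)| = 2^n

def gaps(lengths):
    return [b - a for a, b in zip(lengths, lengths[1:])]

print(gaps(anbn_lengths))  # [2, 2, 2, 2, 2, 2]   -> bounded: constant growth
print(gaps(a2n_lengths))   # [2, 4, 8, 16, 32, 64] -> unbounded: not constant growth
```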
26
Languages: the Chomsky hierarchy, extended with the mildly context-sensitive class
● Linear (A → aB): e.g. (ab)^n
● Context-free (S → aSb): e.g. a^n b^n
● Mildly context-sensitive (CB → f(C,B)): e.g. a^n b^n c^n
● Context-sensitive (Caa → aaCa): e.g. a^(2^n)
● Type 0 (Ψ → θ)
● L_Diag (outside the hierarchy)
[Figure: nested language classes, slides 26–27]
28
MCL ⇔ TAG, HG, LIG, CCG
● Tree Adjoining Grammars (Joshi et al. 1975)
● Head Grammars (Pollard 1984)
● Linear Indexed Grammars (Gazdar 1985)
● Combinatory Categorial Grammars (Steedman 1985)
➔ elementary structures
➔ combination rules
29
Tree Adjoining Grammars
Elementary structures = multilevel trees
[Figure: initial trees α1, α2, α3 with substitution nodes (A↓, B↓) and an auxiliary tree β1 with foot node C*]
36
TAG and MCSL
● TAG properly contains all context-free languages (finitely ambiguous). Theorem (Schabes 1990)
● TAG is polynomially parsable: O(n^6)
  – Embedded Push-Down Automata, CKY (Vijay-Shanker 1987)
  – Left-to-right parser (Schabes 1990)
37
TAG and MCSL
● TAG captures only certain types of dependencies
  – Cross-serial dependencies: verb-raising analysis (Kroch & Santorini 1991)
  – No mix languages
● TAG has the constant growth property (Weir 1988)
38
Lexicalized Tree Adjoining Grammars
● Extended domain of locality
● Recursion factorization via the adjoining operation
● Lexicalization
39
LTAG
Structures = multilevel trees
Operations = substitution, adjoining
[Figure: elementary trees αpleases (S → NP↓ VP, VP → V NP↓, V → pleases), αSue and αBill (NP → N), and auxiliary tree βoften (VP → ADV VP*, ADV → often); substitution and adjoining combine them into a derived tree]
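Substitution and adjoining can be sketched on bracketed trees. A minimal Python illustration (the (label, children) tree encoding, the "!"/"*" markers, and all helper names are ours, not from the slides):

```python
def replace_first(tree, pred, repl):
    """Rebuild tree, replacing the first preorder node whose label satisfies
    pred with repl(node); returns (new_tree, found)."""
    if pred(tree[0]):
        return repl(tree), True
    out, done = [], False
    for c in tree[1]:
        if not done:
            c, done = replace_first(c, pred, repl)
        out.append(c)
    return (tree[0], out), done

def substitute(tree, site, init):
    """Plug initial tree init into the first substitution node site + '!'."""
    new, ok = replace_first(tree, lambda l: l == site + "!", lambda n: init)
    assert ok, "no substitution site"
    return new

def adjoin(tree, site, aux):
    """Adjoin auxiliary tree aux at the first node labelled site:
    that node's subtree is moved to aux's foot node site + '*'."""
    def wrap(node):
        filled, ok = replace_first(aux, lambda l: l == site + "*", lambda f: node)
        assert ok, "no foot node"
        return filled
    new, ok = replace_first(tree, lambda l: l == site, wrap)
    assert ok, "no adjunction site"
    return new

def yield_(tree):
    """Read the leaf string off a derived tree."""
    label, children = tree
    return " ".join(yield_(c) for c in children) if children else label

# Elementary trees modelled on the slide's alpha/beta trees.
a_pleases = ("S", [("NP!", []), ("VP", [("V", [("pleases", [])]), ("NP!", [])])])
a_Bill    = ("NP", [("N", [("Bill", [])])])
a_Sue     = ("NP", [("N", [("Sue", [])])])
b_often   = ("VP", [("ADV", [("often", [])]), ("VP*", [])])

t = substitute(a_pleases, "NP", a_Bill)   # subject NP
t = substitute(t, "NP", a_Sue)            # object NP
t = adjoin(t, "VP", b_often)              # adjoin the adverb
print(yield_(t))  # Bill often pleases Sue
```

Adjoining splices βoften into the VP node, which is exactly the recursion factorization mentioned on the previous slide.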
40
Outline
● Syntax and generative grammars
● The Chomsky hierarchy and natural language
● Mildly context-sensitive languages and Tree Adjoining Grammars (TAG)
● Anatomy of a parser
42
Anatomy of a Parser
(1) Grammar: Context-Free, ...
(2) Algorithm:
  I. Search strategy: top-down, bottom-up, left-to-right, ...
  II. Memory organization: back-tracking, dynamic programming, ...
(3) Oracle: probabilistic, rule-based, ...
47
Parser 1
(1) Grammar: Context-Free, ...
(2) Algorithm:
  I. Search strategy: top-down, bottom-up, left-to-right, ...
  II. Memory organization: back-tracking, dynamic programming, ...
(3) Oracle: probabilistic, rule-based, ...
48
Parser 1 (1)
S → NP VP
S → AUX NP VP
NP → DET Nom
NP → PropN
AUX → does
DET → this
Nom → Noun
Noun → flight
VP → Verb
52
Ambiguity
● One sentence can have several "legal" parse trees
● 15 words ⇒ ~1,000,000 parse trees
⇒ Dynamic programming: the Earley algorithm
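The explosion is combinatorial: the number of distinct binary bracketings of n words is the Catalan number C(n-1), which for 15 words is on the order of the slide's rough figure. A sketch (illustration ours):

```python
from math import comb

def catalan(k):
    """k-th Catalan number: the number of binary trees with k+1 leaves."""
    return comb(2 * k, k) // (k + 1)

for n in (5, 10, 15):
    print(n, "words ->", catalan(n - 1), "binary bracketings")
# 15 words already admit C(14) = 2,674,440 bracketings, hence dynamic
# programming (Earley, CKY) instead of enumerating trees.
```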
55
PCFG
P(Ta) = .15 * .4 * .05 * .05 * .35 * .75 * .4 * .4 * .4 * .3 * .4 * .5 = 1.5 x 10^-6
P(Tb) = .15 * .4 * .4 * .05 * .05 * .75 * .4 * .4 * .4 * .3 * .4 * .5 = 1.7 x 10^-6
56
Parser 2 (CKY)
(1) Grammar: Context-Free, ...
(2) Algorithm:
  I. Search strategy: top-down, bottom-up, left-to-right, ...
  II. Memory organization: back-tracking, dynamic programming, ...
(3) Oracle: probabilistic, rule-based, ...
57
CKY idea
For words W1 W2 W3 W4 W5, combine smaller constituents into larger ones with the rules A → B C [pA] and D → B C [pD]:
P(1,4,A) = pA * P(1,2,B) * P(3,4,C)
P(1,4,D) = pD * P(1,2,B) * P(3,4,C)
[Figure: chart with B spanning W1–W2 and C spanning W3–W4, combined into A and D spanning W1–W4]
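The recurrence on the slide can be turned into a small bottom-up chart parser. A sketch of probabilistic CKY for a grammar in Chomsky normal form (the toy grammar and probabilities are ours, for illustration only):

```python
# Probabilistic CKY: P(i,k,A) = pA * P(i,j,B) * P(j,k,C) for a rule A -> B C [pA].
words = ["Bill", "pleases", "Sue"]
lexical = {("NP", "Bill"): 0.5, ("NP", "Sue"): 0.5, ("V", "pleases"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

n = len(words)
P = {}  # P[(i, k, A)] = best probability of A spanning words i..k-1
for i, w in enumerate(words):                     # width-1 spans from the lexicon
    for (A, word), p in lexical.items():
        if word == w:
            P[(i, i + 1, A)] = p
for width in range(2, n + 1):                     # widen spans bottom-up
    for i in range(n - width + 1):
        k = i + width
        for j in range(i + 1, k):                 # split point
            for (A, B, C), pA in binary.items():  # rule A -> B C [pA]
                if (i, j, B) in P and (j, k, C) in P:
                    cand = pA * P[(i, j, B)] * P[(j, k, C)]
                    if cand > P.get((i, k, A), 0.0):
                        P[(i, k, A)] = cand
print(P[(0, 3, "S")])  # 0.25
```

Filling the chart by increasing span width is the dynamic-programming memory organization from the parser anatomy.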
62
Parsing "Bill pleases Sue" with a top-down, left-to-right strategy and the grammar:
S → NP VP, VP → V NP, NP → N, N → Sue, N → Bill, V → pleases

S ⇒ NP VP ⇒ N VP ⇒ Bill VP ⇒ Bill V NP ⇒ Bill pleases NP ⇒ Bill pleases N ⇒ Bill pleases Sue

Each step (slides 62–69) expands the leftmost non-terminal, predicting structure top-down before matching the input words. [Tree figures]
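The same top-down, left-to-right strategy corresponds to a naive recursive-descent parser (code ours; note this simple version would loop forever on left-recursive rules):

```python
# Grammar from the slides, as a dict from non-terminals to productions.
GRAMMAR = {"S": [["NP", "VP"]], "VP": [["V", "NP"]], "NP": [["N"]],
           "N": [["Sue"], ["Bill"]], "V": [["pleases"]]}

def parse(sym, words, i):
    """Try to derive a prefix of words[i:] from sym, expanding non-terminals
    top-down, left-to-right; return (tree, next_pos) or None."""
    if sym not in GRAMMAR:                        # terminal: match the input word
        if i < len(words) and words[i] == sym:
            return (sym, []), i + 1
        return None
    for rhs in GRAMMAR[sym]:                      # try each production in order
        children, j, ok = [], i, True
        for s in rhs:
            r = parse(s, words, j)
            if r is None:                         # back-track to the next production
                ok = False
                break
            t, j = r
            children.append(t)
        if ok:
            return (sym, children), j
    return None

tree, end = parse("S", "Bill pleases Sue".split(), 0)
print(tree)
```

The recursion order visits the non-terminals exactly as in the slide-by-slide trace: NP and N before the words, then VP, V, and the object NP.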
70
Anatomy of a Parser
(1) Grammar: CFG, TAG, ...
(2) Algorithm:
  I. Search strategy: top-down, bottom-up, left-to-right, ...
  II. Memory organization: back-tracking, dynamic programming, ...
(3) Oracle: probabilistic, rule-based, ...
72
References
[Kamide-et-al03] Y. Kamide, G. T. M. Altmann, and S. L. Haywood. 2003. The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49.
[Milward1995] D. Milward. 1995. Incremental interpretation of categorial grammar. In Proceedings of EACL 1995.
[Phillips03] C. Phillips. 2003. Linear order and constituency. Linguistic Inquiry, 34.
73
References
[Stabler94] E. P. Stabler. 1994. The finite connectivity of linguistic structure. In Perspectives on Sentence Processing.
[Sturt-Lombardo04] P. Sturt and V. Lombardo. 2004. The time-course of processing of coordinate sentences. Poster presented at the 17th annual CUNY Sentence Processing Conference.
74
References
[Purver-Kempson04] M. Purver and R. Kempson. 2004. Incremental parsing, or incremental grammar? In ACL Workshop "Incremental Parsing: Bringing Engineering and Cognition Together", Barcelona, July 2004.
[Joshi85] A. Joshi. 1985. How much context-sensitivity is necessary for characterizing structural descriptions: tree adjoining grammars. In Natural Language Processing: Theoretical, Computational and Psychological Perspectives. Cambridge University Press.
[DHS00] C. Doran, B. Hockey, A. Sarkar, B. Srinivas, and F. Xia. 2000. Evolution of the XTAG system. In A. Abeillé and O. Rambow, editors, Tree Adjoining Grammars. Chicago Press.
75
References
[Marcus-et-al93] M. Marcus, B. Santorini, and M. A. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19.
[Bosco-et-al00] C. Bosco, V. Lombardo, D. Vassallo, and L. Lesmo. 2000. Building a treebank for Italian: a data-driven annotation schema. In LREC 2000, Athens.