Synchronous Grammars and Tree Automata
David Chiang and Kevin Knight, USC/Information Sciences Institute
Viterbi School of Engineering, University of Southern California
Why Worry About Formal Language Theory?
• They already figured out most of the key things back in the 1960s & 1970s
• Lucky us!
– Helps clarify our thinking about applications
– Helps modeling & algorithm development
– Helps promote code re-use
– Opportunity to develop novel, efficient algorithms and data structures that bring ancient theorems to life
• Formal grammar and automata theory are the daughters of natural language processing – let’s keep in touch
[Chomsky 57]
• Distinguish grammatical English from ungrammatical English:
– John thinks Sara hit the boy
– * The hit thinks Sara John boy
– John thinks the boy was hit by Sara
– Who does John think Sara hit?
– John thinks Sara hit the boy and the girl
– * Who does John think Sara hit the boy and?
– John thinks Sara hit the boy with the bat
– What does John think Sara hit the boy with?
– Colorless green ideas sleep furiously.
– * Green sleep furiously ideas colorless.
This Research Program has Contributed Powerful Ideas
[Figure: the formal language hierarchy, compiler technology, and context-free grammar.]
This Research Program has NLP Applications
Alternative speech recognition or translation outputs:
[Figure: candidate outputs scored on two axes – green = how grammatical, blue = how sensible. Pick the best one!]
This Research Program Has Had Wide Reach
• What makes a legal RNA sequence?
• What is the structure of a given RNA sequence?
Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian, Kimmen Sjölander, Rebecca C. Underwood and David Haussler. Stochastic Context-Free Grammars for tRNA Modeling. Nucleic Acids Research, 22(23):5112-5120, 1994.
This Research Program is Really Unfinished!
Type in your English sentence here:
Is this grammatical?
Is this sensible?
Acceptors and Transformers
• Chomsky’s program is about which utterances are acceptable
• Other research programs are aimed at transforming utterances
– Translate an English sentence into Japanese…
– Transform a speech waveform into transcribed words…
– Compress a sentence, summarize a text…
– Transform a syntactic analysis into a semantic analysis…
– Generate a text from a semantic representation…
Strings and Trees
• Early on, trees were realized to be a useful tool in describing what is grammatical
– A sentence is a noun phrase (NP) followed by a verb phrase (VP)
– A noun phrase is a determiner (DT) followed by a noun (NN)
– A noun phrase is a noun phrase (NP) followed by a prepositional phrase (PP)
– A PP is a preposition (IN) followed by an NP
• A string is acceptable if it has an acceptable tree …
• Transformations may take place at the tree level …
[Figure: a parse tree sketch – S over NP and VP, NP over NP and PP, PP over IN and NP.]
Natural Language Processing
• 1980s: Many tree-based grammatical formalisms
• 1990s: Regression to string-based formalisms
– Hidden Markov Models (HMMs), Finite-State Acceptors (FSAs) and Transducers (FSTs)
– N-gram models for accepting sentences [e.g., Jelinek 90]
– Taggers and other statistical transformations [e.g., Church 88]
– Machine translation [e.g., Brown et al 93]
– Software toolkits implementing generic weighted FST operations [e.g., Mohri, Pereira, Riley 00]
[Figure: a cascade of a WFSA and three WFSTs over strings w, e, j, k – “backwards application of string k through a composition of a transducer cascade, intersected with a weighted FSA language model.”]
Natural Language Processing
• 2000s: Emerging interest in tree-based probabilistic models
– Machine translation [Wu 97, Yamada & Knight 02, Melamed 03, Chiang 05, …]
– Summarization [Knight & Marcu 00, …]
– Paraphrasing [Pang et al 03, …]
– Question answering [Echihabi & Marcu 03, …]
– Natural language generation [Bangalore & Rambow 00, …]

What are the conceptual tools to help us get a grip on encoding and exploiting knowledge about linguistic tree transformations?
Goal of this tutorial
Part 1: Introduction
Part 2: Synchronous Grammars < break >
Part 3: Tree Automata
Part 4: Conclusion
Tree Automata
• David talked about synchronous grammars
• I’ll talk about tree automata
[Figure: a synchronous grammar generates two synchronized output trees; a tree transducer maps an input tree to an output tree.]
Steps to Get There

             | Strings                          | Trees
Grammars     | Regular grammar                  | ?
Acceptors    | Finite-state acceptor (FSA), PDA | ?
             | String Pairs (input→output)      | Tree Pairs (input→output)
Transducers  | Finite-state transducer (FST)    | ?
Context-Free Grammar
• Example:
– S → NP VP [p=1.0]
– NP → DET N [p=0.7]
– NP → NP PP [p=0.3]
– PP → P NP [p=1.0]
– VP → V NP [p=0.4]
– DET → the [p=1.0]
– N → boy [p=1.0]
– V → saw [p=1.0]
– P → with [p=1.0]
• Defines a set of strings
• Language described by a CFG can also be described by a push-down acceptor

Generative Process (growing the tree top-down, one rule at a time):
S
⇒ S(NP, VP)
⇒ S(NP(NP, PP), VP)
⇒ S(NP(NP(DET the, N boy), PP), VP)
⇒ S(NP(NP(DET the, N boy), PP(P, NP)), VP)
⇒ …
⇒ S(NP(NP(DET the, N boy), PP(P with, NP(DET the, N boy))), VP(V saw, NP(DET the, N boy)))
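The generative process is straightforward to simulate. Here is a minimal sketch in Python (the grammar is from the slide; the code and names are ours, and the lone VP rule is renormalized to 1.0 since its sibling rules are not shown):

    import random

    # Nonterminal -> list of (right-hand side, probability) pairs, from the slide.
    PCFG = {
        "S":   [(["NP", "VP"], 1.0)],
        "NP":  [(["DET", "N"], 0.7), (["NP", "PP"], 0.3)],
        "PP":  [(["P", "NP"], 1.0)],
        "VP":  [(["V", "NP"], 1.0)],   # slide says p=0.4; other VP rules are elided
        "DET": [(["the"], 1.0)],
        "N":   [(["boy"], 1.0)],
        "V":   [(["saw"], 1.0)],
        "P":   [(["with"], 1.0)],
    }

    def sample(symbol):
        # Symbols with no rules are terminals.
        if symbol not in PCFG:
            return [symbol]
        rhss = [rhs for rhs, _ in PCFG[symbol]]
        weights = [p for _, p in PCFG[symbol]]
        rhs = random.choices(rhss, weights=weights)[0]
        return [word for child in rhs for word in sample(child)]

    print(" ".join(sample("S")))   # e.g. "the boy saw the boy"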
Context-Free Grammar
• Is this what lies behind modern parsers like Collins and Charniak?
• No… they do not have a finite list of productions, but an essentially infinite list – a “Markovized” grammar
• Generative process is head-out: the head child is generated first, then its sisters are generated outward, left and right, until STOP:

S → VP                        Ph(VP | S)
S → NP VP                     Pleft(NP | VP, S)
S → PP NP VP                  Pleft(PP | VP, S)
S → STOP PP NP VP             Pleft(STOP | VP, S)
S → STOP PP NP VP ADVP        (right sister)
S → STOP PP NP VP ADVP STOP   (stop on the right)

then the process recurses into each child, e.g. VP → VBD, VBD NP, VBD NP NP, …
ECFG [Thatcher 67]
• Example:
– S → ADVP* NP VP PP*
– VP → VBD NP [NP] PP*
• Defines a set of strings
• Can model Charniak and Collins:
– VP → (ADVP | PP)* VBD (NP | PP)*
• Can even model the probabilistic version:
[Figure: a weighted FSA for the right-hand side of VP, with pre-head loops ADVP : Pleft(ADVP|VP) and PP : Pleft(PP|VP), ε-arcs weighted Pleft(STOP|VP) and Phead(VBD|VP), an arc VBD : 1.0, post-head arcs NP and PP, and an ε-arc weighted Pright(STOP|VP) into the final state.]
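Because each ECFG right-hand side is a regular expression over child categories, checking whether a node’s children are licensed reduces to ordinary regular-expression matching. A minimal sketch (our code; the unweighted VP pattern is from the slide):

    import re

    # Parent category -> regular expression over the space-joined child sequence.
    RHS = {"VP": r"((ADVP|PP) )*VBD( (NP|PP))*"}

    def rhs_ok(parent, children):
        # The children are licensed iff their sequence matches the RHS pattern.
        return re.fullmatch(RHS[parent], " ".join(children)) is not None

    print(rhs_ok("VP", ["PP", "ADVP", "VBD", "NP", "PP"]))  # True
    print(rhs_ok("VP", ["VBD", "ADVP"]))                    # False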
ECFG [Thatcher 67]
• Is ECFG more powerful than CFG?
• It is possible to build a CFG that generates the same set of strings as your ECFG
– As long as the right-hand side of every rule is regular
– Or even context-free! (see footnote, [Thatcher 67])
• BUT: the CFG won’t generate the same derivation trees as the ECFG
ECFG [Thatcher 67]
• For example:
– ECFG can (of course) accept the string language a*
– Therefore, we can build a CFG that also accepts a*
– But ECFG can do it via this set of derivation trees, if so desired – a single S node dominating n copies of a, for every n:
S(a), S(a, a), S(a, a, a), S(a, a, a, a), …
Tree Generating Systems
• Sometimes the trees are important
– Parsing, RNA structure prediction
– Tree transformation
• We can view CFG or ECFG as a tree-generating system, if we squint hard …
• Or, we can look at dedicated tree-generating systems
– Regular tree grammar (RTG) is standard in the automata literature
• Slightly more powerful than tree substitution grammar (TSG)
• Less powerful than tree-adjoining grammar (TAG)
– Top-down tree acceptors (TDTA) are also standard
What We Want
• Device or grammar D for compactly representing a possibly-infinite set S of trees (possibly with weights)
• Want to support operations like:
– Membership testing: Is tree X in the set S?
– Equality testing: Do the sets described by D1 and D2 contain exactly the same trees?
– Intersection: Compute the (possibly-infinite) set of trees described by both D1 and D2
– Weighted tree-set intersection, e.g.:
• D1 describes a set of candidate English translations of some Chinese sentence, including disfluent ones
• D2 describes a general model of fluent English
Regular Tree Grammar (RTG)
• Example:
– q → S(qnp, VP(V(run))) [p=1.0]
– qnp → NP(qdet, qn) [p=0.7]
– qnp → NP(qnp, qpp) [p=0.3]
– qpp → PP(qprep, qnp) [p=1.0]
– qdet → DET(the) [p=1.0]
– qprep → PREP(of) [p=1.0]
– qn → N(sons) [p=0.5]
– qn → N(daughters) [p=0.5]
• Defines a set of trees

Generative Process (start at state q; repeatedly rewrite states in place):
q
⇒ S(qnp, VP(V(run)))
⇒ S(NP(qnp, qpp), VP(V(run)))
⇒ S(NP(NP(qdet, qn), qpp), VP(V(run)))
⇒ S(NP(NP(DET(the), qn), qpp), VP(V(run)))
⇒ S(NP(NP(DET(the), N(sons)), qpp), VP(V(run)))
⇒ …
⇒ S(NP(NP(DET(the), N(sons)), PP(PREP(of), NP(DET(the), N(daughters)))), VP(V(run)))

P(t) = 1.0 × 0.3 × 0.7 × 0.5 × 0.7 × 0.5
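The RTG’s generative process can be simulated in a few lines. A minimal sketch (grammar from the slide; code and encoding ours), representing a tree as a (label, children) pair and a state as a bare string:

    import random

    RTG = {
        "q":     [(("S", ["qnp", ("VP", [("V", [("run", [])])])]), 1.0)],
        "qnp":   [(("NP", ["qdet", "qn"]), 0.7), (("NP", ["qnp", "qpp"]), 0.3)],
        "qpp":   [(("PP", ["qprep", "qnp"]), 1.0)],
        "qdet":  [(("DET", [("the", [])]), 1.0)],
        "qprep": [(("PREP", [("of", [])]), 1.0)],
        "qn":    [(("N", [("sons", [])]), 0.5), (("N", [("daughters", [])]), 0.5)],
    }

    def generate(node):
        # A bare string is a state: replace it using a randomly chosen rule.
        if isinstance(node, str):
            fragments = [f for f, _ in RTG[node]]
            weights = [p for _, p in RTG[node]]
            node = random.choices(fragments, weights=weights)[0]
        label, children = node
        return (label, [generate(child) for child in children])

    print(generate("q"))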
Relation Between RTG and CFG
• For every CFG, there is an RTG that directly generates its derivation trees
– TRUE (just convert the notation)
• For every RTG, there is a CFG that generates the same trees in its derivations
– FALSE

RTG:
q → NP(NN(clown), NN(killer))
q → NP(NN(killer), NN(clown))

This RTG accepts exactly the trees NP(NN(clown), NN(killer)) and NP(NN(killer), NN(clown)), but rejects “clown clown”. No CFG is possible: any CFG deriving both trees needs the rules NP → NN NN, NN → clown, and NN → killer, and so also derives NP(NN(clown), NN(clown)).
Relation Between RTG and TSG
• For every TSG, there is an RTG that directly generates its trees
– TRUE (just convert the notation)
• For every RTG, there is a TSG that generates the same trees
– FALSE
• Using states, an RTG can accept all trees (over symbols a and b) that contain exactly one a
Relation of RTG to Johnson [98] Syntax Model
Johnson relabels each nonterminal with its parent category, so the same category in different positions gets different probabilities, e.g. P(PRO | NP:S) = 0.21 but P(PRO | NP:VP) = 0.03 (pronouns are more likely in subject position).
[Figure: the tree TOP(S(NP(PRO), VP(VB, NP(PRO)))), first plain, then with every node annotated with its parent – S:TOP, NP:S, VP:S, PRO:NP, VB:VP, NP:VP.]
An RTG captures the same model with states:
RTG:
qstart → S(q.np:s, q.vp:s)
q.np:s → NP(q.pro:np)    p=0.21
q.np:vp → NP(q.pro:np)   p=0.03
q.pro:np → PRO(q.pro)
q.pro → he | she | him | her
Relation of RTG to Lexicalized Syntax Models
• Collins’ model assigns P(t) to every tree
• Can a weighted RTG (wRTG) implement P(t) for language modeling?
– Just like a WFSA can implement a smoothed n-gram language model…
• Something like yes.
– States can encode relevant context that is passed up and down the tree.
– Technical problems:
• “Markovized” grammar (Extended RTG?)
• Some models require keeping head-word information, and if we back off to a state that forgets the head-word, we can’t get it back.
Something an RTG Can’t Do
The set of perfectly balanced binary trees over a and b – b, a(b, b), a(a(b, b), a(b, b)), … – in which every a dominates two identical subtrees, is not a regular tree language.
Note also that the yield language is {b^(2^n) : n ≥ 0}, which is not context-free.
Language Classes in String World and Tree World
[Figure: in the String World, regular languages ⊂ CFLs ⊂ indexed languages ⊂ …; in the Tree World, regular tree languages (RTL) ⊂ context-free tree languages (CFTL) ⊂ …. The yield operation maps RTLs to CFLs and CFTLs to indexed languages.]
Picture So Far

             | Strings                          | Trees
Grammars     | Regular grammar, CFG             | DONE (RTG)
Acceptors    | Finite-state acceptor (FSA), PDA | ?
             | String Pairs (input→output)      | Tree Pairs (input→output)
Transducers  | Finite-state transducer (FST)    | ?
FSA Analog: Top-Down Tree Acceptor [Rabin 69; Doner 70]
For any RTG, there is a TDTA that accepts the same trees.
For any TDTA, there is an RTG that accepts the same trees.

RTG:
q → NP(NN(clown), NN(killer))
q → NP(NN(killer), NN(clown))

TDTA:
q NP → q2 q3
q NP → q3 q2
q2 NN → q2
q3 NN → q3
q2 clown → accept
q3 killer → accept

[Figure: the TDTA walking down NP(NN(clown), NN(killer)): state q at the root sends q2 down the clown branch and q3 down the killer branch, accepting at both leaves.]
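Acceptance by a top-down tree acceptor is a short nondeterministic recursion. A minimal sketch (rules from the slide; code and encoding ours):

    RULES = {  # (state, node label) -> possible child-state assignments
        ("q", "NP"):  [("q2", "q3"), ("q3", "q2")],
        ("q2", "NN"): [("q2",)],
        ("q3", "NN"): [("q3",)],
    }
    ACCEPT = {("q2", "clown"), ("q3", "killer")}  # accepting (state, leaf) pairs

    def accepts(state, tree):
        label, children = tree
        if not children:
            return (state, label) in ACCEPT
        return any(all(accepts(s, c) for s, c in zip(assign, children))
                   for assign in RULES.get((state, label), [])
                   if len(assign) == len(children))

    t = ("NP", [("NN", [("clown", [])]), ("NN", [("killer", [])])])
    print(accepts("q", t))                                          # True
    bad = ("NP", [("NN", [("clown", [])]), ("NN", [("clown", [])])])
    print(accepts("q", bad))                                        # False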
FSA Analog: Bottom-Up Tree Acceptor [Thatcher & Wright 68; Doner 70]
• A similar story for bottom-up acceptors…
• To summarize tree acceptor automata and RTGs:
– Tree acceptors are the analogs of string FSAs
– They are often used in proofs
– Their visual appeal is not as great as that of FSAs
– People often prefer to read and write RTGs
Properties of Language Classes

                           String Sets                        Tree Sets
                           RSL (FSA, regexp)  CFL (PDA, CFG)  RTL (TDTA, RTG)
Closed under union         YES                YES             YES
Closed under intersection  YES                NO              YES
Closed under complement    YES                NO              YES
Membership testing         O(n)               O(n^3)          O(n)
Emptiness decidable?       YES                YES             YES
Equality decidable?        YES                NO              YES

(references: Hopcroft & Ullman 79, Gécseg & Steinby 84)
Picture So Far

             | Strings                          | Trees
Grammars     | Regular grammar, CFG             | DONE (RTG)
Acceptors    | Finite-state acceptor (FSA), PDA | DONE (TDTA)
             | String Pairs (input→output)      | Tree Pairs (input→output)
Transducers  | Finite-state transducer (FST)    | ?
Transducers

String Transducer:
– An FST compactly represents a possibly-infinite set of string pairs
– A probabilistic FST assigns P(s2 | s1) to every string pair
– Can ask: What’s the best transformation of input string s1?

Tree Transducer:
– Tree transducers compactly represent a possibly-infinite set of tree pairs
– A probabilistic tree transducer assigns P(t2 | t1) to every tree pair
– Can ask: What’s the best transformation of tree t1?
Example 1: Machine Translation [Yamada & Knight 01]
[Figure: the noisy-channel pipeline from an English Parse Tree (E) to a Japanese Sentence (J), applied to the parse of “he adores listening to music”. Reorder: each node’s children are permuted, e.g. VB1 and VB2 swap. Insert: Japanese function words (ha, ga, no, desu) are inserted. Translate: each English leaf is translated (he → kare, music → ongaku, to → wo, listening → kiku, adores → daisuki). Take Leaves: reading off the leaves yields “Kare ha ongaku wo kiku no ga daisuki desu”.]
Example 2: Sentence Compression [Knight & Marcu 00]
[Figure: the parse tree of “he adores listening to good music at home” is compressed to the parse tree of “he adores listening to music” by deleting the subtrees JJ(good) and PP(at home).]
What We Want
• Device T for compactly representing a possibly-infinite set of tree pairs (possibly with weights)
• Want to support operations like:
– Forward application: Send input tree X through T, get all possible output trees, perhaps represented as an RTG.
– Backward application: What input trees, when sent through T, would output tree X?
– Composition: Given T1 and T2, can we build a transducer T3 that does the work of both in sequence?
– Equality testing: Do T1 and T2 contain exactly the same tree pairs?
– Training: How to set weights for T given a training corpus of input/output tree pairs?
Finite-State (String) Transducer

FST transitions (pronouncing “knight”):
q k → q2 ε
q2 n → q N
q i → q AY
q g → q3 ε
q3 h → q4 ε
q4 t → qfinal T

[Figure: the same FST drawn as a graph with states q, q2, q3, q4, qfinal and arcs k:ε, n:N, i:AY, g:ε, h:ε, t:T.]

Original input: k n i g h t

Transformation (the state marker moves left to right, emitting output):
q k n i g h t
⇒ q2 n i g h t
⇒ N q i g h t
⇒ N AY q g h t
⇒ N AY q3 h t
⇒ N AY q4 t
⇒ N AY T qfinal

Final output: N AY T
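Applying this (deterministic) FST is a single left-to-right pass. A minimal sketch (transitions from the slide; code ours):

    TRANS = {  # (state, input symbol) -> (next state, output symbol or "")
        ("q", "k"): ("q2", ""),  ("q2", "n"): ("q", "N"),
        ("q", "i"): ("q", "AY"), ("q", "g"): ("q3", ""),
        ("q3", "h"): ("q4", ""), ("q4", "t"): ("qfinal", "T"),
    }

    def apply_fst(symbols, state="q"):
        output = []
        for symbol in symbols:
            state, out = TRANS[(state, symbol)]
            if out:
                output.append(out)
        return output, state

    print(apply_fst(list("knight")))   # (['N', 'AY', 'T'], 'qfinal')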
Tree Transformation

Original input: the English parse tree of “he enjoys listening to music” – S(NP(PRO(he)), VP(VBZ(enjoys), …)).
Target output: the Japanese parse tree whose yield is “kare wa ongaku o kiku no ga daisuki desu”.

[Figure: the transformation proceeds top-down. First the entire input is marked with a single state: q S(…). Rules then split the state across subtrees – S(q.np NP(PRO(he)), S(q.vbz VBZ(enjoys), q.np NP(…))) – and build Japanese structure around the remaining state-marked English subtrees, e.g. the subject becomes NP(PRO(kare)) PN(wa). The process continues until the final output tree contains no state-marked subtrees.]
Top-Down Tree Transducer
• Introduced by Rounds (1970) & Thatcher (1970):
“Recent developments in the theory of automata have pointed to an extension of the domain of definition of automata from strings to trees … parts of mathematical linguistics can be formalized easily in a tree-automaton setting … Our results should clarify the nature of syntax-directed translations and transformational grammars …” (Mappings on Grammars and Trees, Math. Systems Theory 4(3), Rounds 1970)
• “FST for trees”
• Large literature
– e.g., Gécseg & Steinby (1984), Comon et al (1997)
• Many classes of transducers
– Top-down (R), bottom-up (F), copying, deleting, look-ahead…
Top-Down Tree Transducer
R (“root to frontier”) transducer.

Original input: S(PRO(he), VP(VB(likes), NP(spinach)))
Target output: X(VP(VB(likes), NP(spinach)), PRO(he), AUX(does))

Rule, in computer-speak: q S(x0, x1) → X(q x1, q x0, AUX(does))

Transformation:
q S(PRO(he), VP(VB(likes), NP(spinach)))
⇒ X(q VP(VB(likes), NP(spinach)), q PRO(he), AUX(does))
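One derivation step of a top-down tree transducer can be coded directly from the rule notation. A minimal sketch (the S rule is from the slide; the copy-through rules for the other labels and the encoding are ours):

    def transduce(state, tree, rules):
        label, children = tree
        if (state, label) not in rules:      # leaf with no rule: copy it
            return tree
        def build(template):
            if template[0] == "STATE":       # "q xi": recurse into child i
                _, new_state, i = template
                return transduce(new_state, children[i], rules)
            lab, kids = template             # fixed output structure
            return (lab, [build(k) for k in kids])
        return build(rules[(state, label)])

    # q S(x0, x1) -> X(q x1, q x0, AUX(does)); other labels copy themselves.
    rules = {
        ("q", "S"):   ("X", [("STATE", "q", 1), ("STATE", "q", 0),
                             ("AUX", [("does", [])])]),
        ("q", "PRO"): ("PRO", [("STATE", "q", 0)]),
        ("q", "VP"):  ("VP", [("STATE", "q", 0), ("STATE", "q", 1)]),
        ("q", "VB"):  ("VB", [("STATE", "q", 0)]),
        ("q", "NP"):  ("NP", [("STATE", "q", 0)]),
    }
    t = ("S", [("PRO", [("he", [])]),
               ("VP", [("VB", [("likes", [])]), ("NP", [("spinach", [])])])])
    print(transduce("q", t, rules))
    # ('X', [('VP', ...), ('PRO', [('he', [])]), ('AUX', [('does', [])])])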
Relation of Tree Transducers to (String) FSTs
• Tree transducers generalize FSTs (strings are “monadic trees”)

FST transition   Equivalent tree transducer rule
q –A/B→ r        q A(x0) → B(r x0)
q –A/ε→ r        q A(x0) → r x0
q –ε/B→ r        q x → B(r x)
Tree Transducer Simulating FST
R (“root to frontier”) transducer.

Write the input string vertically as a monadic tree: k(n(i(g(h(t))))). Rules like
q k(x0) → q2 x0
q2 n(x0) → N(q x0)
replay the FST transitions:
q k(n(i(g(h(t))))) ⇒ q2 n(i(g(h(t)))) ⇒ N(q i(g(h(t)))) ⇒ … (and so on)
Top-Down Transducers can Copy and Delete

R transducer rule that deletes (sentence compression):
q VP(x0, x1) → VP(q x0)      e.g. VP(VB, PP) ⇒ VP(q VB)

R transducer rule that copies (example from Rounds 70 – calculus: d sin(y) = cos(y) · d y):
d sin(x0) → MULT(cos(i x0), d x0)
so that d sin(y) ⇒ MULT(cos(i y), d y)
Complex Re-Ordering

Original input: S(PRO, VP(VB, NP))
Target output: S(VB, PRO, NP)

R transducer:
q S(x0, x1) → S(qleft x1, q x0, qright x1)
qleft VP(x0, x1) → q x0
qright VP(x0, x1) → q x1
q PRO → PRO
q VB → VB
q NP → NP

Transformation:
q S(PRO, VP(VB, NP))
⇒ S(qleft VP(VB, NP), q PRO, qright VP(VB, NP))
⇒ S(q VB, q PRO, qright VP(VB, NP))
⇒ S(VB, q PRO, qright VP(VB, NP))
⇒ S(VB, PRO, q NP)
⇒ S(VB, PRO, NP)

Final output: S(VB, PRO, NP) – the VP subtree is first copied, then each copy is partly deleted, achieving a re-ordering no single one-level rule could.
Extended Left-Hand Side Transducer: xR

Original input: S(PRO, VP(VB, NP))
Final output: S(VB, PRO, NP)

xR transducer (the left-hand side matches two levels of the input at once):
q S(x0:PRO, VP(x1:VB, x2:NP)) → S(q x1, q x0, q x2)
q PRO → PRO
q VB → VB
q NP → NP

Mentioned already in Section 4 of Rounds 1970, but not defined or used in proofs there. See [Knight & Graehl 05].
Tree-to-String Transducers: xRS
Syntax-directed translation for compilers [Aho & Ullman 71]

xRS transducer rule (the right-hand side is a string of output words and state-marked input subtrees):
q S(x0:NP, VP(x1:VBZ, x2:NP)) → q x0, q x2, q x1

Original input: the English parse tree of “he enjoys listening to music”.

Transformation: the rule above turns the tree into a string of state-marked subtrees –
q.np NP(PRO(he)), q.np NP(… listening to music …), q.vbz VBZ(enjoys) –
and further rules replace each state-marked subtree with Japanese words, e.g. the subject becomes “kare, wa”, until only words remain.

Final output: kare, wa, ongaku, o, kiku, no, ga, daisuki, desu

Used in a practical MT system [Galley et al 04].
Are Tree Transducers Expressive Enough for Natural Language?
• Can published tree-based probabilistic models be cast in this framework?
– This was previously done for string-based models, e.g.
• [Knight & Al-Onaizan 98] – word-based MT as FST
• [Kumar & Byrne 03] – phrase-based MT as FST
• Can published tree-based probabilistic models be naturally extended in this framework?
Are Tree Transducers Expressive Enough for Natural Language?
[Figure: the [Yamada & Knight 01] Reorder–Insert–Translate–Take-Leaves pipeline again, from Parse (E) to Sentence (J).]
The whole pipeline can be cast as a single 4-state tree transducer.
See [Graehl & Knight 04].
Extensions Possible within the Transducer Framework

Multilevel Re-Ordering:            S(x0:NP, VP(x1:VB, x2:NP2)) → x1, x0, x2
Non-constituent Phrases:           S(PRO(there), VP(VB(are), x0:NP)) → hay, x0
Lexicalized Re-Ordering:           NP(x0:NP, PP(P(of), x1:NP)) → x1, , x0
Phrasal Translation:               VP(VBZ(is), VBG(singing)) → está, cantando
Non-contiguous Phrases:            VP(VB(put), x0:NP, PRT(on)) → poner, x0
Context-Sensitive Word Insertion:  NPB(DT(the), x0:NNS) → x0

Practical MT systems [Galley et al 04, 06; Marcu et al 06] use 40m such automatically collected rules.
Limitations of the Top-Down Transducer Model

[Figure: relating “Who does John think Mary believes I saw?” to “John thinks Mary believes I saw who?”. Starting from q at the root, a state qwho must carry the fronted WhNP(who) down through an unbounded number of clause levels – S(NP VP(V S(NP VP(V S(…))))) – until it reaches the gap position after “saw”. A single word like “who” can be teleported this way, because a state can remember it.]

[Figure: the same construction for “Whose blue dog does John think Mary believes I saw?”, which must move the whole phrase WhNP(DT(whose), ADJ(blue), N(dog)) an unbounded distance. Can’t do this: a finite set of states cannot remember an arbitrarily large subtree.]
Tree Transducers: A Digression

[Figure, built up over several slides: a lattice of tree transducer classes, grown from WFST by adding branching left-hand-side rules, arranged along two axes – copying vs. non-copying and deleting vs. non-deleting. Classes shown include R, RL, RN, RLN | FLN, RT, RD, RR, RTD, F, FL, FLD, F2, R2, PDTT, and the extended-LHS classes xR, xRL, xRLN. Edges are labeled with the capability added, e.g. + delete, + copy (before/after nondeterministic processing), + complex re-order, + finite-check or reg-check before delete, + non-determinism. Bold circles mark classes closed under composition.]
Top Down Tree Transducers are Not Closed Under Composition [Rounds 70]

• An R transducer can take a monadic tree X(Y(X(…(a)))) and change its X’s and Y’s to X’s and Y’s non-deterministically. No problem!
• An R transducer can duplicate its input tree under a new root label Z, producing Z(t, t). No problem!
• But no single R transducer can do both in sequence – non-deterministically relabel, then duplicate the resulting tree under Z. Impossible for R: copying happens rule by rule, so the two copies are relabeled independently and need not come out identical.

NOTE: Deterministic R-transducers are composable.
Properties of Transducer Classes

                                  String Transducers | Tree Transducers
                                  FST    R        RL       RLN=FLN  F        FL
Closed under composition          YES    NO       NO       YES      NO       YES
Closed under intersection         NO     NO       NO       NO       NO       NO
Image of tree (or finite forest)  RSL    RTL      RTL      RTL      Not RTL  RTL
Image of RTL                      RSL    Not RTL  RTL      RTL      Not RTL  RTL
Inverse image of tree             RSL    RTL      RTL      RTL      RTL      RTL
Inverse image of RTL              RSL    RTL      RTL      RTL      RTL      RTL
Efficiently trainable             YES    YES      YES      YES

(references: Hopcroft & Ullman 79, Gécseg & Steinby 84, Baum et al 70, Graehl & Knight 04)
K-best Trees & Determinization
• String sets (FSA)
– [Dijkstra 59] gives the best path through an FSA
• O(e + n log n)
– [Eppstein 98] gives the k-best paths through an FSA, including cycles
• O(e + n log n + k log k) /* to print k-best lengths */
– [Mohri & Riley 02] give the k-best unique strings (determinization followed by k-best paths)
• Tree sets (RTG)
– [Knuth 77] gives the best derivation tree from a CFG (easily adapted to RTG)
• O(t + n log n) running time
– [Huang & Chiang 05] give the k-best derivations in an RTG
– [May & Knight 05] give the k-best unique trees (determinization followed by k-best derivations)
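For acyclic grammars, the best derivation can be computed by simple memoized recursion; [Knuth 77]’s priority-queue algorithm handles the general (cyclic) case. A minimal sketch (our code and toy grammar, not from the slides):

    from functools import lru_cache
    from math import prod

    RTG = {  # state -> list of ((label, child states), weight)
        "q":   [(("S", ["qnp", "qvp"]), 1.0)],
        "qnp": [(("NP", []), 0.7), (("NPB", []), 0.3)],
        "qvp": [(("VP", ["qnp"]), 1.0)],
    }

    @lru_cache(maxsize=None)
    def best(state):
        # Return (tree, weight) of the highest-weight derivation from state.
        candidates = []
        for (label, kids), w in RTG[state]:
            subs = [best(k) for k in kids]
            tree = (label, [t for t, _ in subs])
            candidates.append((tree, w * prod(s for _, s in subs)))
        return max(candidates, key=lambda c: c[1])

    print(best("q"))   # (('S', [('NP', []), ('VP', [('NP', [])])]), 0.49)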
The Training Problem [Graehl & Knight 04]
• Given:
– an xR transducer with a set of non-deterministic rules (r1…rm)
– a training corpus of input-tree/output-tree pairs
• Produce:
– conditional probabilities p1…pm that maximize

P(output trees | input trees)
  = Π<i,o> P(o | i)          over input/output tree pairs <i, o>
  = Π<i,o> Σd P(d, o | i)    over derivations d mapping i to o
  = Π<i,o> Σd Πr pr          over rules r in derivation d
Derivations in R

Input tree: A(B(D, E), C(F, G))

R Transducer rules (rule1 [1.0] rewrites the root into the output skeleton, re-ordering B and C; rule2 [1.0] maps B while deleting its subtrees; the rest, in computer-speak):
rule3: q C(x0, x1) → T(q x0, q x1)   [0.6]
rule4: q C(x0, x1) → T(q x1, q x0)   [0.4]
rule5: q F → V   [0.9]
rule6: q F → W   [0.1]
rule7: q G → V   [0.5]
rule8: q G → W   [0.5]

[Figure: the rule set drawn as tree fragments, plus the shared output tree.]

Packed Derivations:
qstart   → rule1(q.1.12, q.2.11)      1.0
q.1.12   → rule2                      1.0
q.2.11   → rule3(q.21.111, q.22.112)  0.6
q.2.11   → rule4(q.21.112, q.22.111)  0.4
q.21.111 → rule5   0.9
q.22.112 → rule8   0.5
q.21.112 → rule6   0.1
q.22.111 → rule7   0.5

Derivations (two ways of producing the same output tree):
rule1(rule2, rule3(rule5, rule8))   total weight = 1.0 × 1.0 × 0.6 × 0.9 × 0.5 = 0.27
rule1(rule2, rule4(rule6, rule7))   total weight = 1.0 × 1.0 × 0.4 × 0.1 × 0.5 = 0.02
Naïve EM Algorithm for xR

Initialize uniform probabilities
Until tired do:
  Make zero rule count table
  For each training case <i, o>:
    Compute P(o | i) by summing over derivations d
    For each derivation d for <i, o>:   /* exponential! */
      Compute P(d | i, o) = P(d, o | i) / P(o | i)
      For each rule r used in d:
        count(r) += P(d | i, o)
  Normalize counts to probabilities
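The naïve loop can be written down directly if derivations are enumerated by hand. A minimal sketch (our code; a derivation is the list of rules it uses, and the groups dictionary – with hypothetical state names like "q.A" – collects the rules competing under each state for normalization), run on the two derivations from the “Derivations in R” example:

    from collections import defaultdict
    from math import prod

    def naive_em(cases, groups, iterations=10):
        p = {r: 1.0 / len(rs) for rs in groups.values() for r in rs}
        for _ in range(iterations):
            count = defaultdict(float)                  # zero rule count table
            for derivations in cases:                   # one <i, o> training pair
                weights = [prod(p[r] for r in d) for d in derivations]
                total = sum(weights)                    # P(o | i)
                for d, w in zip(derivations, weights):  # exponential in general!
                    for r in d:
                        count[r] += w / total           # P(d | i, o), per rule use
            for rs in groups.values():                  # normalize to probabilities
                z = sum(count[r] for r in rs) or 1.0
                for r in rs:
                    p[r] = count[r] / z
        return p

    groups = {"q.A": ["rule1"], "q.B": ["rule2"], "q.C": ["rule3", "rule4"],
              "q.F": ["rule5", "rule6"], "q.G": ["rule7", "rule8"]}
    cases = [[["rule1", "rule2", "rule3", "rule5", "rule8"],
              ["rule1", "rule2", "rule4", "rule6", "rule7"]]]
    print(naive_em(cases, groups))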
Efficient EM for xR

Initialize uniform probabilities
Until tired do:
  Make zero rule count table
  For each training case <i, o>:
    Build packed derivation forest          O(q n² r) time/space
    Use inside-outside algorithm to collect rule counts:
      Inside pass                           O(q n² r) time/space
      Outside pass                          O(q n² r) time/space
      Count collection pass                 O(q n² r) time/space
  Normalize counts to probabilities (joint or conditional)

Per-example training complexity is O(n²) × “transducer constant” – the same as forward-backward training for string transducers [Baum & Welch 71]. A variation for xRS runs in O(q n⁴ r) time (if |RHS| = 2).
Training: Related Work
• Generalizes the specific MT model training algorithm in the appendix of [Yamada & Knight 01]
– xR training is not tied to a particular MT model, or to MT at all
• Generalizes forward-backward HMM training [Baum et al 70]
– Write strings vertically, and use xR training
• Generalizes inside-outside PCFG training [Lari & Young 90]
– Attach a fixed input tree to each “output” string, and use xRS training
• Generalizes synchronous tree-substitution grammar (STSG) training [Eisner 03]
– xR and xRS allow copying of subtrees
Picture So Far

             | Strings                          | Trees
Grammars     | Regular grammar, CFG             | DONE (RTG)
Acceptors    | Finite-state acceptor (FSA), PDA | DONE (TDTA)
             | String Pairs (input→output)      | Tree Pairs (input→output)
Transducers  | Finite-state transducer (FST)    | DONE
If You Want to Play Around …
• Tiburon: A Weighted Tree Automata Toolkit
– www.isi.edu/licensed-sw/tiburon
– Developed by Jonathan May at USC/ISI
– Portable, written in Java
• Implements operations on tree grammars and tree transducers
– k-best trees, determinization, application, training, …
• Tutorial primer comes with explanations, sample automata, and exercises
– Hands-on follow-up to a lot of what we covered today
• Beta version released June 2006
Part 1: Introduction
Part 2: Synchronous Grammars < break >
Part 3: Tree Automata
Part 4: Conclusion
Why Worry About Formal Languages?
• They already figured out most of the key things back in the 1960s & 1970s
• Lucky us!
Relation of Synchronous Grammars with Tree Transducers
• Developed largely independently
– Synchronous grammars
• [Aho & Ullman 69, Lewis & Stearns 68] inspired by compilers
• [Shieber and Schabes 90] inspired by syntax/semantics mapping
– Tree transducers
• [Rounds 70] inspired by transformational grammar
• Recent work is starting to relate the two
– [Shieber 04, 06] explains both in terms of abstract bimorphisms
– [Graehl, Hopkins, Knight in prep] relates STSG to xRLN and gives composition closure results
Wait, Maybe They Didn’t Figure Out Everything in the 1960s and 1970s …
• Is there a formal tree transformation model that fits natural language problems and is closed under composition?
• Is there a “Markovized” version of R-style transducers, with horizontal Thatcher-like processing of children?
• Are there transducers that can move material across unbounded distances in a tree? Are they trainable?
• Are there automata models that can work with string/string data?
• Can we build large, accurate, probabilistic syntax-based language models of English using RTGs?
• What about root-less transformations like
– JJ(big) NN(cheese) → jefe
• Can we build probabilistic tree toolkits as powerful and ubiquitous as string toolkits?
More To Do on Synchronous Grammars
• Connections between synchronous grammars and tree transducers and transfer results between them
• Degree of formal power needed in theory and practice for translation and paraphrase [Wellington et al 06]
• New kinds of synchronous grammars that are more powerful but not more expensive, or not much more
• Efficient and optimal algorithms for minimizing the rank of a synchronous CFG [Gildea, Satta, Zhang 06]
• Faster strategies for translation with an n-gram language model [Zollmann & Venugopal 06; Chiang, to appear?]
Further Readings
• Synchronous grammars
– Attached notes by David (with references)
• Tree automata
– [Knight and Graehl 05] overview and relationship to natural language processing
– Tree Automata textbook [Gécseg & Steinby 84]
– TATA textbook on the Internet [Comon et al 97]
References on These Slides

[Aho & Ullman 69] ACM STOC
[Aho & Ullman 71] Information and Control 19
[Bangalore & Rambow 00] COLING
[Baum et al 70] Ann. Math. Stat. 41
[Chiang 05] ACL
[Chomsky 57] Syntactic Structures
[Church 88] ANLP
[Comon et al 97] www.grappa.univ-lille3.fr/tata/
[Dijkstra 59] Numer. Math. 1
[Doner 70] J. Computer & Sys. Sci. 4
[Echihabi & Marcu 03] ACL
[Eisner 03] HLT
[Eppstein 98] SIAM J. Computing 28
[Galley et al 04, 06] HLT, ACL-COLING
[Gécseg & Steinby 84] textbook
[Gildea, Satta, Zhang 06] COLING-ACL
[Graehl & Knight 04] HLT
[Jelinek 90] Readings in Speech Recog.
[Knight & Al-Onaizan 98] AMTA
[Knight & Graehl 05] CICLing
[Knight & Marcu 00] AAAI
[Knuth 77] Info. Proc. Letters 6
[Kumar & Byrne 03] HLT
[Lari & Young 90] Comp. Speech and Lang. 4
[Lewis & Stearns 68] JACM
[Marcu et al 06] EMNLP
[May & Knight 05] HLT
[Melamed 03] HLT
[Mohri & Riley 02] ICSLP
[Mohri, Pereira, Riley 00] Theo. Comp. Sci.
[Pang et al 03] HLT
[Rabin 69] Trans. Am. Math. Soc. 141
[Rounds 70] Math. Syst. Theory 4
[Shieber & Schabes 90] COLING
[Shieber 04, 06] TAG+, EACL
[Thatcher 67] J. Computer & Sys. Sci. 1
[Thatcher 70] J. Computer & Sys. Sci. 4
[Thatcher & Wright 68] Math. Syst. Theory 2
[Wellington et al 06] ACL-COLING
[Wu 97] Comp. Linguistics
[Yamada & Knight 01, 02] ACL
[Zollmann & Venugopal 06] HLT