Minimalism and Merge Grammars
Matilde Marcolli
MAT1509HS: Mathematical and Computational Linguistics
University of Toronto, Winter 2019, T 4-6 and W 4, BA6180
MAT1509HS Win2019: Linguistics Merge Grammars
Main References:
E.P. Stabler, Computational perspectives on minimalism, in “Oxford Handbook of Linguistic Minimalism”, Oxford University Press, 2010, 616–641.
K. Vijay-Shanker, D. Weir, The equivalence of four extensions of context free grammar formalisms, Mathematical Systems Theory, 27 (1994) 511–545.
P. beim Graben, S. Gerth, Geometric representations for minimalist grammars, arXiv:1101.5076
T. Hunter, C. Dyer, Distributions on Minimalist Grammar Derivations, Proc. 13th Meeting of the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, pp. 1–11.
R.C. Berwick, M. Marcolli, Linguistic merge and the formalism of renormalization, work in preparation.
Extend the Context-Free Class to Mild Context Sensitivity
limited cross-serial dependencies
polynomial time parsing
semilinearity
Semilinearity
• a subset $V \subset \mathbb{Z}_+^k$ is semilinear if it is a finite union of sets of the form
$$\Big\{ c + \sum_{w\in P} \lambda_w\, w \;\Big|\; c \in C,\ \lambda_w \in \mathbb{Z}_+ \Big\}$$
for some finite sets $C, P \subset \mathbb{N}^k$
• a language $L \subset A^*$ with alphabet $\#A = k$ is semilinear iff for any monoid homomorphism
$$\phi : (A^*, \star) \to (\mathbb{Z}_+^k, +)$$
the image $\phi(L) \subset \mathbb{N}^k$ is a semilinear set
• context-free and tree-adjoining grammars have the semilinear property (Joshi and Yokomori, 1983)
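A minimal Python sketch of the two definitions above, assuming the Parikh map (count of each letter) as the homomorphism ϕ and a brute-force search with a bound on the scalars $\lambda_w$ so that membership testing terminates. The function names `parikh`, `in_component` and `in_semilinear` are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import product


def parikh(word, alphabet):
    """Parikh vector of `word`: how many times each letter of `alphabet` occurs."""
    counts = Counter(word)
    return tuple(counts[letter] for letter in alphabet)


def in_component(v, C, P, bound):
    """Is v = c + sum_w lambda_w * w for some c in C and 0 <= lambda_w <= bound?

    The bound on the lambda_w only keeps this brute-force search finite."""
    for c in C:
        for lambdas in product(range(bound + 1), repeat=len(P)):
            candidate = tuple(
                c[i] + sum(lam * w[i] for lam, w in zip(lambdas, P))
                for i in range(len(v))
            )
            if candidate == tuple(v):
                return True
    return False


def in_semilinear(v, components, bound=20):
    """A semilinear set is a finite union of components (C, P)."""
    return any(in_component(v, C, P, bound) for C, P in components)


# {a^n b^n}: its Parikh image {(n, n)} is one component, C = {(0,0)}, P = {(1,1)}.
AB = [({(0, 0)}, [(1, 1)])]
print(in_semilinear(parikh("aaabbb", "ab"), AB))  # True
print(in_semilinear(parikh("aab", "ab"), AB))     # False
```

For a genuinely semilinear set the bound can always be chosen large enough for a given vector; the sketch only checks membership and does not decide semilinearity of a language.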
Multiple Context Free Grammars (MCFG)
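• introduced by H. Seki, T. Matsumura, M. Fujii, T. Kasami, 1990
• Example: $L = \{a_1^n a_2^n \cdots a_{2m}^n \mid n \geq 0\}$ is generated by an m-MCFG
$$G = \Big(N = \{A, S\},\ T = \{a_i\}_{i=1}^{2m},\ O = \bigcup_{k=1}^m (T^*)^k,\ \{f, g\},\ P,\ S\Big)$$
with production rules P
$$f(x_1, x_2, \ldots, x_m) = (a_1 x_1 a_2,\ a_3 x_2 a_4,\ \ldots,\ a_{2m-1} x_m a_{2m})$$
$$g(x_1, x_2, \ldots, x_m) = x_1 x_2 \cdots x_m$$
$$A \to (\varepsilon, \varepsilon, \ldots, \varepsilon), \quad A \to f[A], \quad S \to g[A]$$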
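The derivation in this grammar can be simulated directly. A minimal Python sketch, assuming strings over T are stored as tuples of terminal names (so that $a_1$ and $a_{10}$ cannot be confused); `terminal`, `f`, `g` and `derive` are illustrative names, not part of any MCFG library.

```python
def terminal(i):
    """The terminal a_i, as a one-symbol string (strings are tuples of symbols)."""
    return (f"a{i}",)


def f(xs):
    """f(x_1, ..., x_m) = (a_1 x_1 a_2, a_3 x_2 a_4, ..., a_{2m-1} x_m a_{2m})."""
    return tuple(terminal(2 * i + 1) + x + terminal(2 * i + 2) for i, x in enumerate(xs))


def g(xs):
    """g(x_1, ..., x_m) = x_1 x_2 ... x_m: concatenate the m components."""
    out = ()
    for x in xs:
        out += x
    return out


def derive(m, n):
    """A -> (eps, ..., eps), then A -> f[A] applied n times, then S -> g[A]."""
    a = tuple(() for _ in range(m))  # the m-tuple of empty strings
    for _ in range(n):
        a = f(a)
    return g(a)


print(" ".join(derive(2, 2)))  # a1 a1 a2 a2 a3 a3 a4 a4
```

Each application of f adds one copy of every $a_i$ inside its own component, which is why the m components can keep 2m counts synchronized while a single context-free string could not.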
MCFG: general definition $G = (N, T, O, F, P, S)$
$O = \bigcup_{k=1}^m (T^*)^k$
finite set F of (partial) functions $f : O^{a(f)} \to O$ for some $a(f) \in \mathbb{N}$: each $f \in F$ is a function of $a(f)$ variables, and there are $0 \leq r(f), d_k(f) \leq m$, $k = 1, \ldots, a(f)$, with
$$f : \prod_{k=1}^{a(f)} (T^*)^{d_k(f)} \to (T^*)^{r(f)}$$
the functions $f(x_1, \ldots, x_{a(f)})$ are concatenations of constant strings in $T^*$ and variables in $X = \{x_{kj},\ k = 1, \ldots, a(f),\ j = 1, \ldots, d_k(f)\}$, with each $x_{kj}$ occurring at most once
$d : N \to \mathbb{N}$ with $d(S) = 1$; if $A \to f(A_1, \ldots, A_{a(f)})$ is in P then $r(f) = d(A)$ and $d_k(f) = d(A_k)$
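A minimal Python sketch of the conditions in this definition, assuming a hypothetical encoding of each function f as a list of r(f) components, where each component is a list of items `("const", s)` for a constant string or `("var", k, j)` for the variable $x_{kj}$ (0-based indices). It checks the "at most once" condition and the compatibility $r(f) = d(A)$, $d_k(f) = d(A_k)$, and evaluates f.

```python
from collections import Counter


def is_linear(template):
    """Each variable x_{kj} may occur at most once in f."""
    uses = Counter(item[1:] for comp in template for item in comp if item[0] == "var")
    return all(count <= 1 for count in uses.values())


def apply(template, args):
    """Evaluate f on args, where args[k] is a tuple of d_k(f) strings.

    The result is a tuple with r(f) = len(template) components."""
    result = []
    for comp in template:
        s = ""
        for item in comp:
            if item[0] == "const":
                s += item[1]
            else:
                _, k, j = item
                s += args[k][j]
        result.append(s)
    return tuple(result)


def consistent(template, d, lhs, rhs):
    """Rule lhs -> f(rhs[0], ...) needs r(f) = d(lhs) and j < d(rhs[k]) for every x_{kj}."""
    if len(template) != d[lhs]:
        return False
    return all(
        item[1] < len(rhs) and item[2] < d[rhs[item[1]]]
        for comp in template for item in comp if item[0] == "var"
    )


# The m = 2 instance of the example grammar, writing a, b, c, d for a_1, ..., a_4.
f2 = [[("const", "a"), ("var", 0, 0), ("const", "b")],
      [("const", "c"), ("var", 0, 1), ("const", "d")]]
g2 = [[("var", 0, 0), ("var", 0, 1)]]
d = {"S": 1, "A": 2}
print(apply(g2, [apply(f2, [apply(f2, [("", "")])])]))  # ('aabbccdd',)
```

The linearity check is what rules out copying functions such as $f(x) = xx$, which would otherwise break semilinearity.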
Properties:
$m$-MCFG $\subsetneq$ $(m+1)$-MCFG (strict inclusion of the generated language classes)
MCFGs are semilinear (Vijay-Shanker, Weir, Joshi, 1987)
tree adjoining grammars sit between CFG and 2-MCFG:
CFG = 1-MCFG $\subsetneq$ TAG $\subsetneq$ 2-MCFG
recognition ($w \in L_G$?) is decidable in polynomial time (but inclusion $L_{G_1} \subseteq L_{G_2}$ is undecidable)
MCFGs can be made stochastic in the same way as CFGs
Merge Grammars or Minimalist Grammars (MG)
• introduced in
Edward P. Stabler and Edward L. Keenan, Structural similarity within and among languages, Theoretical Computer Science, 293 (2003) 345–363.
• formalizing the derivations within Chomsky’s Minimalist Model in the setting of formal languages
• Minimalist Grammar $G = (A, Sel, Lic, Lex, c)$
A finite alphabet
Lic (licensing types) and Sel (selecting types) disjoint finite sets
Syn set of syntactic features: selectors =X and selectees X for X ∈ Sel, licensors +X and licensees −X for X ∈ Lic
lexicon: finite subset Lex ⊂ A∗ × (selectors ∪ licensors)∗ × selectees × licensees∗
c ∈ Sel type for completed expression
Examples of minimalist lexicon items in Lex
lexical categories: adjective A, adjective phrase AP, adverb Adv,adverb phrase AdvP, noun N, noun phrase NP, verb V, verb phraseVP, etc.
functional categories: coordinate conjunction C, determiner D,negation Neg, particle Par, preposition P, prepositional phrase PP,subordinate conjunction Sub, tense T, tense phrase TP, etc.
selection: =X selection of an X phrase
licensees: -X requirements forcing movement
licensors: features that satisfy licensee requirements, like +wh, +case, etc.
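The shape of a lexical item, (selectors ∪ licensors)∗ × selectees × licensees∗, can be checked mechanically. A minimal Python sketch, assuming features are written as strings with the prefixes =, + and − (typed as `-`) used above; the toy lexicon entries are hypothetical examples, not taken from the slides.

```python
KINDS = {"=": "selector", "+": "licensor", "-": "licensee"}


def classify(feature):
    """'=D' -> ('selector', 'D'), '+wh' -> ('licensor', 'wh'), 'D' -> ('selectee', 'D')."""
    if feature[0] in KINDS:
        return KINDS[feature[0]], feature[1:]
    return "selectee", feature


def well_formed(features):
    """Check the shape (selectors | licensors)* selectee licensees* of a lexical item."""
    kinds = [classify(f)[0] for f in features]
    i = 0
    while i < len(kinds) and kinds[i] in ("selector", "licensor"):
        i += 1
    if i == len(kinds) or kinds[i] != "selectee":
        return False  # exactly one selectee must follow the selectors/licensors
    return all(k == "licensee" for k in kinds[i + 1:])


# A toy lexicon (hypothetical items, written word :: features).
LEX = [
    ("which", ["=N", "D", "-wh"]),
    ("book", ["N"]),
    ("read", ["=D", "=D", "V"]),
    ("", ["=T", "+wh", "C"]),
]
for word, feats in LEX:
    print(f"{word or 'ε'} :: {' '.join(feats)}", well_formed(feats))
```

Here "which" selects an N phrase, is itself a D, and carries the licensee −wh that will force movement to a C head carrying +wh.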
Operations: instead of production rules, Minimalist Grammars have only two fixed kinds of operations
1 MERGE
2 MOVE
Merge and Move
• more formal description of MERGE and MOVE
• Merge: (α, β) ↦ {α, {α, β}} or {β, {α, β}}
• iterations: (γ, {α, {α, β}}) ↦ {γ, {γ, {α, {α, β}}}}
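The set-theoretic form of Merge can be written with Python frozensets standing in for the sets (an illustrative encoding, not a standard one). A minimal sketch:

```python
def merge(alpha, beta, label="first"):
    """Merge(alpha, beta) = {alpha, {alpha, beta}} if label == "first", else {beta, {alpha, beta}}."""
    head = alpha if label == "first" else beta
    return frozenset({head, frozenset({alpha, beta})})


ab = merge("α", "β")   # {α, {α, β}}
gab = merge("γ", ab)   # {γ, {γ, {α, {α, β}}}}
print(merge("α", "β") == merge("β", "α", label="second"))  # True
```

The last line shows that the result carries a label but no linear order, since sets are unordered: this is the non-planar reading of merge, in contrast with planar trees where the order of α and β is fixed.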
Example of derivation in Minimalist Grammars (embedded question)
MG = MCFG (Theorem 1 of Stabler 2010)
Main idea of how to transform an MG into an MCFG
• transform trees into tuples of strings (subscript 0: non-lexical expressions; subscript 1: lexical; also :: and : for lexical and derived)
• these tuples of strings give the production rules of an MCFG
• the start symbol of the MCFG is $\langle c \rangle_0$
Example: previous derivation in terms of tuples of strings
External and Internal Merge Operations
• MG operations of MERGE and MOVE unified as two aspects of the same merge operation
1 external merge
2 internal merge
under the shortest move constraint (SMC): exactly one head in the tree has −x as first feature; $t^M$ is the maximal projection of t
Head and Projection
labels > and < of merge identify where the head of the tree is: here leaf vertex number 8
maximal projection in T: a subtree of T that is not a proper subtree of any larger subtree with the same head
leaves {2, 3, 4} determine a subtree whose head vertex is the leaf numbered 3
any larger subtree in T would have a different head: this is a maximal projection
the same holds for the subtree determined by leaves {5, 6}
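A minimal Python sketch of the head and maximal-projection definitions, run on a hypothetical four-leaf tree (not the tree in the figure): leaves are integers and internal vertices are triples `(label, left, right)`. It relies on the observation that a subtree is a maximal projection exactly when its parent, if any, has a different head.

```python
def head(t):
    """Follow the arrows: '<' points to the left daughter, '>' to the right one."""
    while not isinstance(t, int):
        label, left, right = t
        t = left if label == "<" else right
    return t


def leaves(t):
    """Leaf numbers of t, in planar (left-to-right) order."""
    if isinstance(t, int):
        return [t]
    return leaves(t[1]) + leaves(t[2])


def maximal_projections(t, parent_head=None):
    """List (head, leaves) for every maximal projection in t.

    A subtree is a maximal projection iff its parent (if any) has a different head."""
    h = head(t)
    found = [] if h == parent_head else [(h, leaves(t))]
    if not isinstance(t, int):
        found += maximal_projections(t[1], h) + maximal_projections(t[2], h)
    return found


T = ("<", 1, (">", ("<", 2, 3), 4))   # hypothetical tree, leaves numbered 1..4
print(maximal_projections(T))
# [(1, [1, 2, 3, 4]), (4, [2, 3, 4]), (2, [2, 3]), (3, [3])]
```

A single leaf can be its own maximal projection (leaf 3 here), as happens for a complement that projects nothing further.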
Notation about features
in addition to the labels {>, <} of merge operations, a finite set of syntactic feature labels X ∈ {N, V, A, P, C, T, D, . . .}
selector features are denoted by the symbol σX for a head selecting a phrase XP (Note: the usual notation is =X rather than σX for the selector)
T[α]: tree whose head is labelled by an ordered set of syntactic features starting with α
operation T[α] ↦ T removes the α-feature from the head vertex
Merge and the Origin of Language
Robert C. Berwick, Noam Chomsky, Why only us? MIT Press, 2015.
• proposal of a single significant evolutionary change leading to the structure of human languages: a single computational operation capable of generating recursive structures, the merge operation
• is there a way to characterize merge as a fundamental structure of recursion in a mathematical sense?
Merge as a Universal Structure: Recursion and Renormalization
• planar rooted trees
rooted tree T: simply connected (no loops) finite graph, with vertex set V, distinguished element $v_r \in V$ (root), set of edges E oriented away from the root, source and target maps $s, t : E \to V$; leaves are the univalent vertices
a planar embedding determines a linear ordering of the leaves
vertex-decorated: $L_V : V \to D_V$ with $D_V$ a finite set; edge-decorated: $L_E : E \to D_E$ with $D_E$ a finite set of decorations; we assume vertex-decorated
binary: leaves univalent; root valence two; all internal vertices valence three (all binary splittings from root to leaves)
Merge trees
• the set of labels will correspond to syntactic features, as well as the symbols < and > for merge operations
[figure: binary tree with root labelled < and leaves α, β, standing for the merge tree with root α and leaves α, β]
• planar or non-planar trees: planar trees assume the linear ordering is determined for the result of merge (Stabler); non-planar trees for the version of linguistic minimalism where the linear order is not fixed when merge is performed but determined later in the derivation
Loday–Ronco Hopf algebra of planar rooted trees
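vector space $V_k$ spanned by planar binary rooted trees T with k internal vertices (hence k + 1 leaves)
$$\dim V_k = (\#D_V)^k \, \frac{(2k)!}{k!\,(k+1)!}$$
$\#D_V$ cardinality of the set $D_V$ of vertex labels (one label for each of the k internal vertices)
graded vector space $V = \oplus_{k\geq 0} V_k$ with $V_0 = \mathbb{Q}$
given a label $d \in D_V$, grafting operator $\wedge_d$
$$\wedge_d : V \otimes V \to V, \quad T_1 \otimes T_2 \mapsto T = T_1 \wedge_d T_2$$
with $\wedge_d : V_k \otimes V_\ell \to V_{k+\ell+1}$ (the new root adds one internal vertex), attaching the two roots $v_{r_1}$ of $T_1$ and $v_{r_2}$ of $T_2$ to a single root vertex v labelled by $d \in D_V$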
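The dimension formula can be checked by enumerating trees. A minimal Python sketch, assuming labels sit on internal vertices only (which is what the factor $(\#D_V)^k$ counts), with a leaf written `"|"` and an internal vertex written `(label, left, right)`; every tree is built with the grafting $\wedge_d$.

```python
from math import factorial

LEAF = "|"  # a leaf; an internal vertex is a triple (label, left, right)


def graft(t1, t2, d):
    """T1 ∧_d T2: attach the roots of t1 and t2 to a new root labelled d."""
    return (d, t1, t2)


def internal(t):
    """Number of internal vertices."""
    return 0 if t == LEAF else 1 + internal(t[1]) + internal(t[2])


def trees(k, labels):
    """All planar binary rooted trees with k internal vertices labelled from `labels`."""
    if k == 0:
        return [LEAF]
    return [graft(left, right, d)
            for i in range(k)  # i internal vertices on the left, k - 1 - i on the right
            for left in trees(i, labels)
            for right in trees(k - 1 - i, labels)
            for d in labels]


def dim(k, n_labels):
    """(#D_V)^k (2k)! / (k! (k+1)!)."""
    return n_labels ** k * factorial(2 * k) // (factorial(k) * factorial(k + 1))


print([len(trees(k, ["<", ">"])) for k in range(5)])  # [1, 2, 8, 40, 224]
```

Since every tree with k ≥ 1 internal vertices decomposes uniquely as a grafting of its left and right subtrees, the recursion enumerates each labelled tree exactly once.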
associative concatenation operations on planar binary rooted trees
given S and T, the tree $S \backslash T$ (S under T) is obtained by grafting the root of T to the rightmost leaf of S
the tree $T/S$ (T over S) is obtained by grafting the root of T to the leftmost leaf of S
the grafting operation is obtained from the concatenations:
$$T_1 \wedge_d T_2 = T_1 / S \backslash T_2,$$
with S the tree with a single internal vertex decorated by $d \in D_V$ (one root and two leaves)
each planar binary rooted tree is a grafting $T = T_\ell \wedge_d T_r$ of the trees stemming to the left and right of the root vertex
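A minimal Python sketch of the two concatenations and of the identity $T_1 \wedge_d T_2 = T_1 / S \backslash T_2$, using the same illustrative encoding (leaf `"|"`, internal vertex `(label, left, right)`); the function names `over`, `under` and `corolla` are hypothetical.

```python
LEAF = "|"  # a leaf; an internal vertex is (label, left, right)


def graft_leftmost(s, t):
    """Replace the leftmost leaf of s by the tree t."""
    if s == LEAF:
        return t
    d, left, right = s
    return (d, graft_leftmost(left, t), right)


def graft_rightmost(s, t):
    """Replace the rightmost leaf of s by the tree t."""
    if s == LEAF:
        return t
    d, left, right = s
    return (d, left, graft_rightmost(right, t))


def over(t, s):
    """T/S (T over S): graft the root of T onto the leftmost leaf of S."""
    return graft_leftmost(s, t)


def under(s, t):
    """S under T: graft the root of T onto the rightmost leaf of S."""
    return graft_rightmost(s, t)


def corolla(d):
    """The tree with a single internal vertex labelled d (one root, two leaves)."""
    return (d, LEAF, LEAF)


T1 = corolla("a")
T2 = ("b", corolla("c"), LEAF)
S = corolla("d")
print(under(over(T1, S), T2))  # ('d', ('a', '|', '|'), ('b', ('c', '|', '|'), '|'))
```

The printed tree has root d with $T_1$ on the left and $T_2$ on the right, which is exactly $T_1 \wedge_d T_2$; by associativity the bracketing of $T_1 / S \backslash T_2$ does not matter.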
Loday–Ronco Hopf algebra HLR
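vector space $V = \oplus_{k\geq 0} V_k$ with $V_0 = \mathbb{Q}$; multiplication and comultiplication defined inductively by degrees
trees $T = T_\ell \wedge T_r$ and $T' = T'_\ell \wedge T'_r$ with product
$$T \star T' = T_\ell \wedge (T_r \star T') + (T \star T'_\ell) \wedge T'_r$$
coproduct
$$\Delta(T) = \sum_{j,k} (T_{\ell,j} \star T_{r,k}) \otimes (T'_{\ell,n-j} \wedge T'_{r,m-k}) + T \otimes \bullet$$
with $T = T_\ell \wedge T_r$, $\Delta(T_\ell) = \sum_j T_{\ell,j} \otimes T'_{\ell,n-j}$ and $\Delta(T_r) = \sum_k T_{r,k} \otimes T'_{r,m-k}$ for $T_\ell \in V_n$ and $T_r \in V_m$
antipode on graded bialgebras defined inductively
$$S(X) = -X - \sum S(X')\, X''$$
for $\Delta(X) = X \otimes 1 + 1 \otimes X + \sum X' \otimes X''$ with $X', X''$ of lower degree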
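A minimal Python sketch of the product $T \star T'$ on unlabelled planar binary trees, assuming the leaf `"|"` is the unit (the tree with no internal vertex) and an internal vertex is a pair `(left, right)`; linear combinations are stored as `Counter` objects mapping trees to coefficients. The names `star` and `star_lin` are hypothetical.

```python
from collections import Counter

LEAF = "|"        # unit of H_LR: the tree with no internal vertex
Y = (LEAF, LEAF)  # the tree with one internal vertex


def star(t, s):
    """t ⋆ s as a Counter {tree: coefficient}, via the recursive formula
    t ⋆ s = t_l ∧ (t_r ⋆ s) + (t ⋆ s_l) ∧ s_r, with the leaf as unit."""
    if t == LEAF:
        return Counter({s: 1})
    if s == LEAF:
        return Counter({t: 1})
    t_l, t_r = t
    s_l, s_r = s
    out = Counter()
    for u, c in star(t_r, s).items():
        out[(t_l, u)] += c
    for u, c in star(t, s_l).items():
        out[(u, s_r)] += c
    return out


def star_lin(x, y):
    """Extend ⋆ bilinearly to linear combinations stored as Counters."""
    out = Counter()
    for t, a in x.items():
        for s, b in y.items():
            for u, c in star(t, s).items():
                out[u] += a * b * c
    return out


def internal(t):
    return 0 if t == LEAF else 1 + internal(t[0]) + internal(t[1])


y2 = star(Y, Y)
y3 = star_lin(y2, Counter({Y: 1}))
print(len(y2), len(y3))  # 2 5
```

The recursion terminates because each recursive call lowers the total degree, and each term of $T \star T'$ has degree $\deg T + \deg T'$. In this small case $Y \star Y \star Y$ is the sum of the five trees with three internal vertices, each with coefficient 1.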
External Merge in HLR
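• external merge $T = em(T_1[\sigma X], T_2[X])$ of trees $T_1[\sigma X]$ and $T_2[X]$
if $T_1[\sigma X]$ is a single root vertex labelled by the feature σX: T has root labelled < with subtrees $T_1$ (left) and $T_2$ (right)
in all other cases: T has root labelled > with subtrees $T_2$ (left) and $T_1$ (right)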
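• labelled grafting (with domain)
$X = X_0 X_1 \cdots X_r$ or $X = \sigma X_0 X_1 \cdots X_r$ string of syntactic features, σ selector
domain
$$\mathrm{Dom}(\wedge) = \{(T_1[X], T_2[Y]) \mid X = \sigma X_0 X_1 \cdots X_r \text{ and } Y = X_0 Y_1 \cdots Y_s\}$$
external merge as labelled grafting
$$T_1[X] \wedge T_2[Y] = \begin{cases} T_1[X_1 \cdots X_r] \wedge_< T_2[Y_1 \cdots Y_s] & |T_1| = 1 \\ T_2[Y_1 \cdots Y_s] \wedge_> T_1[X_1 \cdots X_r] & |T_1| > 1 \end{cases}$$
notation $\bar X$ for X with the first feature erased:
$$T_1[X] \wedge T_2[Y] = \begin{cases} T_1[\bar X] \wedge_< T_2[\bar Y] & |T_1| = 1 \\ T_2[\bar Y] \wedge_> T_1[\bar X] & |T_1| > 1 \end{cases}$$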
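A minimal Python sketch of external merge as labelled grafting, assuming selectors are written `"=X"` in place of σX and that a lexical item is a leaf carrying a word and a tuple of features; the classes and helper names are hypothetical. The function returns `None` outside $\mathrm{Dom}(\wedge)$.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    word: str
    features: Tuple[str, ...]


@dataclass(frozen=True)
class Node:
    label: str        # "<": head in the left subtree, ">": head in the right subtree
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def head(t):
    """Follow the < / > labels down to the head leaf."""
    while isinstance(t, Node):
        t = t.left if t.label == "<" else t.right
    return t


def replace_head(t, new_leaf):
    if isinstance(t, Leaf):
        return new_leaf
    if t.label == "<":
        return Node(t.label, replace_head(t.left, new_leaf), t.right)
    return Node(t.label, t.left, replace_head(t.right, new_leaf))


def erase_first(t):
    """T[X] -> T[X̄]: erase the first feature of the head."""
    h = head(t)
    return replace_head(t, Leaf(h.word, h.features[1:]))


def external_merge(t1, t2):
    """T1[=X0 ...] ∧ T2[X0 ...]; returns None outside Dom(∧)."""
    f1, f2 = head(t1).features, head(t2).features
    if not f1 or not f2 or f1[0] != "=" + f2[0]:
        return None
    lexical = isinstance(t1, Leaf)  # |T1| = 1
    t1, t2 = erase_first(t1), erase_first(t2)
    return Node("<", t1, t2) if lexical else Node(">", t2, t1)


read = Leaf("read", ("=D", "=D", "V"))
vp = external_merge(read, Leaf("books", ("D",)))  # complement on the right
vp = external_merge(vp, Leaf("Ann", ("D",)))      # specifier on the left
print(vp.label, head(vp))  # > Leaf(word='read', features=('V',))
```

The two cases of the definition show up as the placement of the selected phrase: to the right of a lexical head, to the left of a derived one.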
Internal Merge in HLR
given a binary rooted tree T with a subtree $T_1$ and another binary rooted tree $T_2$, define $T\{T_1 \to T_2\}$ as the planar binary rooted tree obtained by removing $T_1$ from T and replacing it with $T_2$
notation: σ for the feature selector (=), and ω and $\bar\omega$ for “licensor” and “licensee” (±)
domain of internal merge
$$\mathrm{Dom}(I) = \{T[X] \mid \exists\, T_1[Y] \subset T[X] \text{ with } Y = \bar\omega X_0 \bar Y,\ X = \omega X_0 \bar X\}$$
internal merge
$$I(T[X]) = T_1^M[\bar Y] \wedge_> T\{T_1^M[Y] \to \emptyset\}$$
Admissible Cuts on Trees
(planar) rooted tree T
admissible cut C of T: a selection of a number of edges of T such that every oriented path from the root to one of the leaves contains at most one of the selected edges
removing the edges in C gives a disjoint union $\rho_C(T) \sqcup \pi_C(T)$:
a (planar) tree $\rho_C(T)$ containing the root vertex, and a disjoint union $\pi_C(T) = \cup_i T_i$ of (planar) trees, each with a unique source vertex (root)
elementary admissible cut: a cut consisting of a single edge
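A minimal Python sketch that enumerates the nonempty admissible cuts of a small planar rooted tree and computes $\pi_C(T)$ and $\rho_C(T)$. It assumes a hypothetical encoding where a tree is the tuple of its subtrees (so `()` is a single vertex) and an edge is named by the path of child indices leading to its lower vertex; brute-force enumeration is only meant for small trees.

```python
from itertools import combinations


def edges(t, path=()):
    """Name each edge by the path of child indices leading to its lower vertex."""
    out = []
    for i, child in enumerate(t):
        out.append(path + (i,))
        out += edges(child, path + (i,))
    return out


def below(p, q):
    """True if edge q lies strictly below edge p on a path from the root."""
    return len(p) < len(q) and q[:len(p)] == p


def admissible_cuts(t):
    """Nonempty edge sets meeting every root-to-leaf path at most once,
    i.e. no selected edge lies below another selected edge."""
    es = edges(t)
    return [c for r in range(1, len(es) + 1)
            for c in combinations(es, r)
            if not any(below(p, q) for p in c for q in c)]


def subtree(t, path):
    for i in path:
        t = t[i]
    return t


def trunk(t, removed, path=()):
    """rho_C(T): the component that still contains the root."""
    return tuple(trunk(child, removed, path + (i,))
                 for i, child in enumerate(t) if path + (i,) not in removed)


def split(t, c):
    """Return (pi_C(T), rho_C(T)) for an admissible cut c."""
    return [subtree(t, p) for p in c], trunk(t, set(c))


cherry = ((), ())
for c in admissible_cuts(cherry):
    print(c, split(cherry, c))
```

Two edges lie on a common root-to-leaf path exactly when one is below the other, which is why the antichain test implements the "at most one edge per path" condition.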
Internal Merge via Admissible Cuts
the tree $T\{T_1 \to \emptyset\}$ is the same as the tree $\rho_C(T)$ of a single elementary cut
internal merge
$$I(T[X]) = T_1^M[\bar Y] \wedge_> T\{T_1^M[Y] \to \emptyset\} = \pi_C(T) \wedge_> \rho_C(T)$$
C is the elementary admissible cut specified by the subtree $T_1^M$
internal merge is modelled by the three combinatorial operations
$$T \mapsto \bullet \wedge T, \qquad T_1 \otimes T_2 \mapsto T_2 \wedge T_1, \qquad T \mapsto \rho_C(T) \otimes \pi_C(T) \mapsto \pi_C(T) \wedge \rho_C(T)$$
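A minimal Python sketch of internal merge as an elementary cut followed by a > grafting, under stated assumptions: licensors and licensees are written `"+f"` and `"-f"`, a subtree counts as a maximal projection when it is the non-head daughter of its parent, and after the cut the parent of $T_1^M$ is contracted so that the tree stays binary. All names and the toy trees are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    word: str
    features: Tuple[str, ...]


@dataclass(frozen=True)
class Node:
    label: str  # "<" or ">": which daughter contains the head
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def head(t):
    while isinstance(t, Node):
        t = t.left if t.label == "<" else t.right
    return t


def replace_head(t, leaf):
    if isinstance(t, Leaf):
        return leaf
    if t.label == "<":
        return Node("<", replace_head(t.left, leaf), t.right)
    return Node(">", t.left, replace_head(t.right, leaf))


def erase_first(t):
    h = head(t)
    return replace_head(t, Leaf(h.word, h.features[1:]))


def positions(t, path=()):
    """Yield (path, subtree); a path is a tuple of 'L'/'R' steps from the root."""
    yield path, t
    if isinstance(t, Node):
        yield from positions(t.left, path + ("L",))
        yield from positions(t.right, path + ("R",))


def get(t, path):
    for step in path:
        t = t.left if step == "L" else t.right
    return t


def is_maximal(t, path):
    """A non-root subtree is a maximal projection iff it is the non-head daughter."""
    parent = get(t, path[:-1])
    return (parent.label == "<") == (path[-1] == "R")


def cut(t, path):
    """Elementary cut: rho_C(T) = T{T1 -> ∅}, contracting the parent so T stays binary."""
    if len(path) == 1:
        return t.right if path[0] == "L" else t.left
    step, rest = path[0], path[1:]
    if step == "L":
        return Node(t.label, cut(t.left, rest), t.right)
    return Node(t.label, t.left, cut(t.right, rest))


def internal_merge(t):
    """pi_C(T) ∧_> rho_C(T), with the matching +f / -f features erased."""
    first = head(t).features[:1]
    if not first or not first[0].startswith("+"):
        return None
    licensee = "-" + first[0][1:]
    movers = [p for p, sub in positions(t)
              if p and is_maximal(t, p) and head(sub).features[:1] == (licensee,)]
    if len(movers) != 1:  # shortest move constraint
        return None
    p = movers[0]
    return Node(">", erase_first(get(t, p)), erase_first(cut(t, p)))


tp = Node(">", Leaf("who", ("-wh",)), Leaf("left", ()))
cp = Node("<", Leaf("", ("+wh", "C")), tp)
print(internal_merge(cp))
```

On `cp` the subtree headed by "who" is cut off, the remaining tree keeps its head (now with feature C), and the moved phrase is grafted on the left by >.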
Iterations of Internal Merge
matching label conditions for the domains
N-th iterate of internal merge: admissible cut C of the tree T with number of cut branches #C = N
$$I^{\#C}(T[X]) = \bigwedge^{1+\#C} \big(\pi_C(T)[\mathbf{Y}]\ \ \rho_C(T)[X_N]\big)$$
notation $\pi_C(T)[\mathbf{Y}]$ for the forest $\pi_C(T) = T_N^M \cdots T_1^M$, where the label $[\mathbf{Y}]$ means
$$\pi_C(T)[\mathbf{Y}] = T_N^M[Y^{(N)}] \cdots T_1^M[Y^{(1)}]$$
label $[X_N]$ of the tree $\rho_C(T)$: what remains of the original label X after the initial terms $\omega X_0\, \omega X_1 \cdots \omega X_{N-1}$ are removed
Connes–Kreimer Hopf algebra of rooted trees
polynomial algebra generated by the planar rooted trees T
coproduct: sum over all admissible cuts
$$\Delta(T) = T \otimes 1 + 1 \otimes T + \sum_C \pi_C(T) \otimes \rho_C(T)$$
grading by the span of the planar rooted trees with k internal vertices
antipode defined inductively on graded bialgebras
used as a reformulation of the Connes–Kreimer Hopf algebra of Feynman graphs in perturbative QFT
Comparison between Hopf algebras
there is a map of Hopf algebras $\phi : H_{CK} \to H_{LR}$
it maps the unit $1 \in H_{CK}$ (empty tree) to the binary tree consisting of a single root vertex •
it maps the single vertex tree • in $H_{CK}$ to the binary tree with a single internal vertex (one root and two leaves)
otherwise it maps
$$\phi(T) = \phi(F(T)) / \phi(\bullet)$$
with F(T) the forest obtained by removing the root of T, and / the concatenation operation grafting the root of φ(F(T)) to the left leaf of φ(•)
for a forest $F = T_1 \cdots T_n$ in $H_{CK}$ the image is
$$\phi(F) = \phi(T_1) \backslash \phi(T_2) \backslash \cdots \backslash \phi(T_n)$$
with \ the other concatenation operation, grafting the root of $\phi(T_{i+1})$ to the rightmost leaf of $\phi(T_i)$
this map is compatible with product, coproduct and antipode
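A minimal Python sketch of the map φ on planar rooted trees, assuming rooted trees are tuples of subtrees (`()` is the single vertex •) and binary trees are `"|"` or pairs `(left, right)`. It only implements the formulas for φ(T) and φ(F) above, not the Hopf algebra structure; all helper names are hypothetical.

```python
LEAF = "|"          # phi(1): the binary tree with no internal vertex
Y = (LEAF, LEAF)    # phi(•): one root, two leaves


def over(t, s):
    """t/s: graft the root of t onto the leftmost leaf of s."""
    return t if s == LEAF else (over(t, s[0]), s[1])


def under(s, t):
    """s under t: graft the root of t onto the rightmost leaf of s."""
    return t if s == LEAF else (s[0], under(s[1], t))


def phi_forest(forest):
    """phi(T_1 ... T_n) = phi(T_1) under ... under phi(T_n); empty forest -> LEAF."""
    result = LEAF
    for tree in forest:
        result = under(result, phi(tree))
    return result


def phi(tree):
    """phi(T) = phi(F(T)) / phi(•), where F(T) is the forest below the root."""
    return over(phi_forest(tree), Y)


def vertices(tree):
    return 1 + sum(vertices(c) for c in tree)


def internal(t):
    return 0 if t == LEAF else 1 + internal(t[0]) + internal(t[1])


def planar_trees(n):
    """All planar rooted trees with n >= 1 vertices."""
    return list(planar_forests(n - 1))


def planar_forests(n):
    """All ordered forests with n vertices in total."""
    if n == 0:
        return [()]
    return [(t,) + rest
            for k in range(1, n + 1)
            for t in planar_trees(k)
            for rest in planar_forests(n - k)]


print(phi(((), ())))  # (('|', ('|', '|')), '|')
```

Unwinding the definitions, the left subtree of φ(F) encodes the children of $T_1$ and the right subtree encodes $T_2 \cdots T_n$, so a rooted tree with n vertices goes to a binary tree with n internal vertices and distinct trees have distinct images.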
Example
References
M. Aguiar, F. Sottile, Structure of the Loday–Ronco Hopf algebra of trees, Journal of Algebra, Vol. 295 (2006) 473–511
A. Connes, D. Kreimer, Hopf algebras, Renormalization and Noncommutative geometry, Comm. Math. Phys. 199 (1998) 203–242
J.L. Loday, M. Ronco, Hopf algebra of the planar binary trees, Adv. Math. 139 (1998) N.2, 293–309
Recursive Structures and Dyson–Schwinger Equations in QFT
in the Connes–Kreimer setting
in perturbative QFT, the equations of motion are solved by a recursive combinatorial equation in the Feynman graphs: the Dyson–Schwinger equation
operators $B^+_d : H \to H$ with $d \in D_V$ a vertex decoration
$$B^+_d(T_1 \cdots T_m) = T$$
with T obtained by grafting the roots $v_{r_1}, \ldots, v_{r_m}$ of the trees $T_1, \ldots, T_m$ to a common vertex $\bullet_d$ labelled by $d \in D_V$
satisfies the Hochschild 1-cocycle condition
$$\Delta(B^+_d(X)) = B^+_d(X) \otimes 1 + (\mathrm{Id} \otimes B^+_d) \circ \Delta(X)$$
Dyson–Schwinger Equation
$$X = B^+(P(X))$$
fixed point of the nonlinear transformation $X \mapsto B^+(P(X))$, for some polynomial or formal power series $P(t) = \sum_{k\geq 0} a_k t^k$ with $a_0 = 1$
unique solution $X = \sum_{k\geq 1} x_k$ with
$$x_{n+1} = \sum_{k=1}^{n} \sum_{j_1+\cdots+j_k=n} a_k\, B^+(x_{j_1} \cdots x_{j_k})$$
initial step $x_1 = B^+(1)$
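A minimal Python sketch of the recursion for the $x_n$, under a simplifying assumption: forests are kept as ordered tuples (the noncommutative, planar setting), so $B^+(x_1 x_2)$ and $B^+(x_2 x_1)$ stay distinct rather than being merged as in the commutative Connes–Kreimer algebra. Trees are tuples of subtrees (`()` is •), `a` maps k to $a_k$, and `solve` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def B_plus(forest):
    """B+: graft the trees of an ordered forest onto a new common root."""
    return tuple(forest)


def compositions(n, k):
    """Ordered k-tuples of positive integers with sum n."""
    if k == 1:
        yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def solve(a, N):
    """x_1 = B+(1), x_{n+1} = sum_k sum_{j_1+...+j_k=n} a_k B+(x_{j_1} ... x_{j_k}).

    Each x_n is a Counter {planar rooted tree: coefficient}."""
    x = {1: Counter({B_plus(()): 1})}
    for n in range(1, N):
        x_next = Counter()
        for k in range(1, n + 1):
            if not a.get(k, 0):
                continue
            for js in compositions(n, k):
                for terms in product(*(x[j].items() for j in js)):
                    coeff = a[k]
                    for _, c in terms:
                        coeff *= c
                    x_next[B_plus(t for t, _ in terms)] += coeff
        x[n + 1] = x_next
    return x


x = solve({k: 1 for k in range(1, 10)}, 5)  # P(t) = 1/(1 - t)
print([len(x[n]) for n in range(1, 6)])      # [1, 1, 2, 5, 14]
```

With all $a_k = 1$ every planar rooted tree with n vertices appears exactly once in $x_n$, since the sizes of the root's subtrees form a unique composition of n − 1; with $P(t) = 1 + t$ the solution reduces to ladder trees.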
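more general form with vertex labels: variables $X = (X_\delta)_{\delta\in D_V}$
$$F_\delta(X) = \sum_{k_1,\ldots,k_N} a^{(\delta)}_{k_1,\ldots,k_N}\, X_{\delta_1}^{k_1} \cdots X_{\delta_N}^{k_N}$$
Dyson–Schwinger Equation
$$X_\delta = B^+_\delta(F_\delta(X))$$
unique solution $X_\delta = \sum_\tau x_\tau\, \tau$ with
$$x_\tau = \Big(\prod_{k=1}^N \frac{(\sum_{l=1}^{m_k} p_{k,l})!}{\prod_{l=1}^{m_k} p_{k,l}!}\Big)\, a^{(\delta)}_{\sum_{l} p_{1,l}, \ldots, \sum_{l} p_{N,l}}\; x_{\tau_{1,1}}^{p_{1,1}} \cdots x_{\tau_{N,m_N}}^{p_{N,m_N}}$$
for $\tau = B^+_\delta\big(\tau_{1,1}^{p_{1,1}} \cdots \tau_{1,m_1}^{p_{1,m_1}} \cdots \tau_{N,1}^{p_{N,1}} \cdots \tau_{N,m_N}^{p_{N,m_N}}\big)$, where $\tau_{k,1}, \ldots, \tau_{k,m_k}$ are distinct trees with root labelled $\delta_k$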
Dyson–Schwinger Equation in Loday–Ronco setting
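Dyson–Schwinger equation $X = B^+(P(X))$ becomes $\phi(X) = \phi(B^+(P(X)))$
for a forest $F = T_1 \cdots T_k$ the image is $\phi(B^+(F)) = \phi(F)/S$ with $S = \phi(\bullet)$
general form of the solution
$$\phi(x_{n+1}) = \sum_{k=1}^{n} \sum_{j_1+\cdots+j_k=n} a_k\, \phi(x_{j_1} \cdots x_{j_k})/S = \sum_{k=1}^{n} \sum_{j_1+\cdots+j_k=n} a_k\, (\phi(x_{j_1}) \backslash \cdots \backslash \phi(x_{j_k}))/S$$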
References about Dyson–Schwinger equations and Hopf algebras:
C. Bergbauer, D. Kreimer, Hopf algebras in renormalization theory: locality and Dyson-Schwinger equations from Hochschild cohomology, in “Physics and Number Theory”, pp. 133–164, IRMA Lect. Math. Theor. Phys. 10, Eur. Math. Soc., 2006
C. Delaney, M. Marcolli, Dyson-Schwinger equations in the theory of computation, in “Feynman amplitudes, periods and motives”, pp. 79–107, Contemp. Math. 648, Amer. Math. Soc., 2015.
L. Foissy, Classification of systems of Dyson–Schwinger equations in the Hopf algebra of decorated rooted trees, Advances in Math. 224 (2010) 2094–2150
K. Yeats, Rearranging Dyson-Schwinger Equations, Memoirs of the American Mathematical Society, 211, American Mathematical Society, 2011.
Dyson–Schwinger Equations and Generative Processes
generative process that identifies a family of planar binary rooted trees obtained recursively through the application of the operations
$$\phi(x_{j_1}), \ldots, \phi(x_{j_k}) \mapsto (\phi(x_{j_1}) \backslash \cdots \backslash \phi(x_{j_k}))/\phi(\bullet)$$
linear combinations and the coefficients $a_k$ are additional data that keep track of the weights assigned to the trees, so that each $\phi(x_{j_\ell})$ is itself a weighted combination of binary trees
general problem of how to make MGs probabilistic (doing so via the equivalent MCFGs is not suitable)
with this iteration equation method one can assign the coefficients $a_k$ and use them to assign probabilities consistently
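A minimal Python sketch of this generative process on the Loday–Ronco side, computing $\phi(x_{n+1})$ directly with the two concatenations and then turning the weights of trees of one size into probabilities. The normalization step (dividing by the total weight at each size) is a hypothetical convention chosen for illustration, not a method stated in the slides; the encoding (leaf `"|"`, internal vertex `(left, right)`) and function names are also assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

LEAF = "|"
S = (LEAF, LEAF)  # S = phi(•)


def over(t, s):
    """t/s: graft t onto the leftmost leaf of s."""
    return t if s == LEAF else (over(t, s[0]), s[1])


def under(s, t):
    """s under t: graft t onto the rightmost leaf of s."""
    return t if s == LEAF else (s[0], under(s[1], t))


def compositions(n, k):
    if k == 1:
        yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def generate(a, N):
    """phi(x_1) = S and
    phi(x_{n+1}) = sum_k sum_{j_1+...+j_k=n} a_k (phi(x_{j_1}) under ... under phi(x_{j_k})) / S."""
    y = {1: Counter({S: 1})}
    for n in range(1, N):
        y_next = Counter()
        for k in range(1, n + 1):
            if not a.get(k, 0):
                continue
            for js in compositions(n, k):
                for terms in product(*(y[j].items() for j in js)):
                    concat, coeff = LEAF, a[k]
                    for t, c in terms:
                        concat = under(concat, t)
                        coeff *= c
                    y_next[over(concat, S)] += coeff
        y[n + 1] = y_next
    return y


def probabilities(weighted):
    """Hypothetical normalization: weight / total weight among trees of the same size."""
    total = sum(weighted.values())
    return {t: Fraction(c) / total for t, c in weighted.items()}


y = generate({1: 2, 2: 1}, 3)
print(probabilities(y[3]))
```

With $a_1 = 2$ and $a_2 = 1$ the two binary trees of size 3 receive weights 4 and 1, hence probabilities 4/5 and 1/5; other normalizations are possible and the choice is left open here.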
Question: can derivations in minimalist linguistics be viewed as recursive solutions of Dyson–Schwinger type equations? Is this what is universal about merge?