More than 4,000 living languages
Most are resource-poor
Key Questions
Can we improve monolingual performance by exploiting multilingual connections?
Multilingual Learning
Linguistic Motivation:
Languages are related structurally and genetically
But differ systematically in patterns of expression and ambiguity
Goal:
• Induce individual language structures
• Induce cross-lingual connections
Motivation for Multilingual Learning
• Learn from differences in lexical ambiguity
fish/poissons [N] vs. fish/pêcher [V]
• Learn from differences in structural ambiguity: (1) the determiner “les” signals a noun
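A toy sketch of how an aligned translation can resolve a lexical ambiguity such as fish/poissons vs. fish/pêcher. It assumes each language already has a tag distribution per word and combines the evidence of two aligned words by multiplying and renormalizing; that combination rule, the probabilities, and the helper names `combine` and `tag_with_alignment` are hypothetical illustrations, not the method of any particular tagger.

```python
# Hypothetical tag distributions, made up for illustration only.
ENGLISH_TAGS = {"fish": {"N": 0.5, "V": 0.5}}  # ambiguous in English
FRENCH_TAGS = {
    "poissons": {"N": 0.95, "V": 0.05},  # plural noun: rarely a verb
    "pêcher": {"N": 0.05, "V": 0.95},    # treated here as mostly a verb
}


def combine(p: dict[str, float], q: dict[str, float]) -> dict[str, float]:
    """Multiply two tag distributions pointwise and renormalize."""
    scores = {tag: p[tag] * q.get(tag, 0.0) for tag in p}
    total = sum(scores.values())
    return {tag: s / total for tag, s in scores.items()}


def tag_with_alignment(en_word: str, fr_word: str) -> str:
    """Most probable English tag once the aligned French word is taken into account."""
    combined = combine(ENGLISH_TAGS[en_word], FRENCH_TAGS[fr_word])
    return max(combined, key=combined.get)


print(tag_with_alignment("fish", "poissons"))  # N
print(tag_with_alignment("fish", "pêcher"))    # V
```

The English word alone gives no preference; the aligned French word breaks the tie.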
Multilingual Learning for POS Tagging
Input: Untagged bilingual parallel corpus
Goal: Induce a POS tagger for each language (test on monolingual data)
Proposed Research
• Learn from non-parallel corpora
Benefit from the world’s wealth of language resources
• Move towards language-neutral semantic representation
he / הוא / وہ: num singular, person 3rd, animacy yes
smells / מריח / سونگھتا ہے: num singular, transitive yes, time present
flowers / פרחים / پھول: num plural, animacy no
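The representation idea can be illustrated with the slide's own example: one set of language-neutral concepts, each carrying shared features and a surface form per language, linearized with each language's basic word order (SVO for English and Hebrew, SOV for Urdu). The dictionary layout, role names, and the `realize` helper are hypothetical, and the features are a subset of those listed above.

```python
# Language-neutral concepts: shared features plus one surface form per language.
# The layout and role names are hypothetical.
LEXICON = {
    "he": {"features": {"num": "singular", "animacy": "yes"},
           "english": "he", "hebrew": "הוא", "urdu": "وہ"},
    "smells": {"features": {"num": "singular", "transitive": "yes", "time": "present"},
               "english": "smells", "hebrew": "מריח", "urdu": "سونگھتا ہے"},
    "flowers": {"features": {"num": "plural", "animacy": "no"},
                "english": "flowers", "hebrew": "פרחים", "urdu": "پھول"},
}

# Basic clause order per language: English and Hebrew are SVO, Urdu is SOV.
ORDER = {
    "english": ("subj", "verb", "obj"),
    "hebrew": ("subj", "verb", "obj"),
    "urdu": ("subj", "obj", "verb"),
}


def realize(frame: dict[str, str], lang: str) -> str:
    """Linearize a language-neutral frame {role: concept} as a sentence in lang."""
    return " ".join(LEXICON[frame[role]][lang] for role in ORDER[lang])


frame = {"subj": "he", "verb": "smells", "obj": "flowers"}
for lang in ORDER:
    print(lang, realize(frame, lang))
```

The same frame yields all three sentences; only the lexicon entries and the order table are language specific.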
Constrain unsupervised grammar induction using language-independent syntactic rules
Using Linguistic Universals for Structure Analysis
Universal dependency rules (head → dependent):
Root → Auxiliary    Noun → Adjective
Root → Verb    Noun → Article
Verb → Noun    Noun → Noun
Verb → Pronoun    Noun → Numeral
Verb → Adverb    Preposition → Noun
Verb → Verb    Adjective → Adverb
Auxiliary → Verb
(Naseem et al., EMNLP 2010)
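To make the rules concrete, here is a minimal sketch that checks which edges of a dependency parse match a universal rule. It assumes coarse tags named as in the rule list and a head-index encoding (0 = Root) chosen for illustration; `rule_edges` is a hypothetical helper.

```python
# The 13 universal rules listed above, as (head tag, dependent tag) pairs.
UNIVERSAL_RULES = {
    ("Root", "Auxiliary"), ("Noun", "Adjective"),
    ("Root", "Verb"), ("Noun", "Article"),
    ("Verb", "Noun"), ("Noun", "Noun"),
    ("Verb", "Pronoun"), ("Noun", "Numeral"),
    ("Verb", "Adverb"), ("Preposition", "Noun"),
    ("Verb", "Verb"), ("Adjective", "Adverb"),
    ("Auxiliary", "Verb"),
}


def rule_edges(tags: list[str], heads: list[int]) -> list[tuple[int, int]]:
    """Return the (head, dependent) edges of a parse that match a universal rule.

    tags[i] is the coarse tag of word i+1; heads[i] is the index of word i+1's
    head, with 0 standing for the artificial Root node.
    """
    matched = []
    for dep, head in enumerate(heads, start=1):
        head_tag = "Root" if head == 0 else tags[head - 1]
        if (head_tag, tags[dep - 1]) in UNIVERSAL_RULES:
            matched.append((head, dep))
    return matched


tags = ["Noun", "Verb", "Noun"]  # Kids eat apples
print(rule_edges(tags, [2, 0, 2]))  # [(2, 1), (0, 2), (2, 3)]: "eat" heads both nouns
print(rule_edges(tags, [0, 1, 2]))  # [(2, 3)]: "Kids" as root breaks two rules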
Using Universals for Structure Analysis
[Figure: results per language (English, Danish, Slovene, Spanish, Swedish, Portuguese), No rules vs. Universal Rules; y-axis from 20 to 80]
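Adding the Universal Rules
Model Posterior
[Figure: posterior probability over the parses of the data, e.g., alternative dependency parses of “Kids eat apples.”]
Each parse contributes its posterior probability times its count of edges that match a universal rule; for instance, a parse with 1 rule edge and posterior 0.005 contributes 0.005, and a parse with 3 rule edges and posterior 0.01 contributes 0.03:
$$\mathbb{E}[\text{edges} \in \text{rules}] = \sum_{\text{parses}} P(\text{parse}) \times \text{Count}(\text{edges} \in \text{rules}) = (\ldots + 0.005 + \ldots + \ldots + 0.03 + \ldots) = 2.79$$
Constraint, with a pre-specified threshold:
$$\mathbb{E}[\text{edges} \in \text{rules}] \geq 0.8 \times \text{total edges}$$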
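The expectation and threshold check can be sketched on a toy posterior. The three parses, their probabilities, and their rule-edge counts are made up for illustration. The slide only states the constraint; the `project` step, which finds the distribution closest in KL divergence to the posterior that satisfies it (the posterior-regularization idea), is an assumption about one standard way to enforce such a constraint.

```python
import math

# Hypothetical toy posterior over three parses of "Kids eat apples." (3 edges each).
POSTERIOR = [0.6, 0.3, 0.1]
RULE_EDGE_COUNTS = [3, 1, 1]  # edges in each parse that match a universal rule
TOTAL_EDGES = 3
THRESHOLD = 0.8  # pre-specified fraction of edges that must follow the rules


def expected_rule_edges(probs, counts):
    """E[edges in rules]: sum over parses of P(parse) * Count(edges in rules)."""
    return sum(p * c for p, c in zip(probs, counts))


def tilt(probs, counts, lam):
    """Reweight parses: q(parse) is proportional to p(parse) * exp(lam * count)."""
    weights = [p * math.exp(lam * c) for p, c in zip(probs, counts)]
    z = sum(weights)
    return [w / z for w in weights]


def project(probs, counts, target, iters=100):
    """Distribution q minimizing KL(q || p) subject to E_q[count] >= target.

    The minimizer is tilt(p, lam) for some lam >= 0, and E_q[count] increases
    with lam, so lam is found by bisection. Assumes every parse has p > 0.
    """
    if expected_rule_edges(probs, counts) >= target:
        return list(probs)  # already satisfied: lam = 0
    if max(counts) <= target:
        raise ValueError("no distribution over these parses meets the target")
    lo, hi = 0.0, 1.0
    while expected_rule_edges(tilt(probs, counts, hi), counts) < target:
        hi *= 2.0  # grow the bracket until the constraint holds at hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_rule_edges(tilt(probs, counts, mid), counts) < target:
            lo = mid
        else:
            hi = mid
    return tilt(probs, counts, hi)


target = THRESHOLD * TOTAL_EDGES
print(expected_rule_edges(POSTERIOR, RULE_EDGE_COUNTS))  # about 2.2, below the target of 2.4
q = project(POSTERIOR, RULE_EDGE_COUNTS, target)
print([round(x, 3) for x in q])  # about [0.7, 0.225, 0.075]
```

Enforcing the threshold shifts mass toward the parse whose edges all follow the rules.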
The Gap Remains
[Figure: bar chart, y-axis from 60 to 95]
• Unsupervised, Headden III et al. (2009): 68.8
• Universal rules, Naseem et al. (2010): 71.9
• Supervised, McDonald et al. (2006): 91.5
Leverage Language Diversity in Language Analysis
• Typological Analysis: compare languages based on structural patterns (aka typological parameters)
• Parameters encode dimensions of language variance
Subject Verb Object Positioning
Number of Genders
Definite Article
Feature: English | Russian | Hebrew
Exponence of Selected Inflectional Formatives: No case | Case + number | No case
Definite Articles: Definite word distinct from demonstrative | No definite or indefinite article | Definite affix
Systems of Gender Assignment: Semantic | Semantic and formal | Semantic and formal
Order of Adjective and Noun: Adjective-Noun | Adjective-Noun | Noun-Adjective
Hand and Arm: Different | Identical | Different
The World Atlas of Language Structures Online: 2,650 languages, 142 features
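One simple way to compare languages based on structural patterns is to count the parameters on which two languages disagree. The sketch below applies a Hamming distance, chosen here for illustration, to the five parameters in the table; the short feature names are hypothetical labels.

```python
from itertools import combinations

# Short names for the five parameters in the table (hypothetical labels).
FEATURES = ["inflectional_exponence", "definite_articles", "gender_assignment",
            "adjective_noun_order", "hand_and_arm"]

# Feature values copied from the table, in the same order as FEATURES.
LANGUAGES = {
    "English": ["No case", "Definite word distinct from demonstrative",
                "Semantic", "Adjective-Noun", "Different"],
    "Russian": ["Case + number", "No definite or indefinite article",
                "Semantic and formal", "Adjective-Noun", "Identical"],
    "Hebrew": ["No case", "Definite affix",
               "Semantic and formal", "Noun-Adjective", "Different"],
}


def differing_features(a, b):
    """Names of the parameters on which languages a and b take different values."""
    return [f for f, x, y in zip(FEATURES, LANGUAGES[a], LANGUAGES[b]) if x != y]


def typological_distance(a, b):
    """Hamming distance: how many parameters differ between a and b."""
    return len(differing_features(a, b))


for a, b in combinations(LANGUAGES, 2):
    print(a, b, typological_distance(a, b))
# English Russian 4
# English Hebrew 3
# Russian Hebrew 4
```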
From Typological Tables to Rule Distributions
[Figure: P(. | Verb) for English and Portuguese; y-axis from 0 to 0.4]
[Figure: P(. | Noun) for English and Portuguese; y-axis from 0 to 0.5]
Low Density Language
Unsupervised
Resource Rich Language
Supervised
Model for Low Density Language
Typology Reference
KL divergence between the model's p(. | NP) and the typology reference's p(. | NP)
Proposed Approach: Bilingual Scenario
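The penalty on this slide is a KL divergence between two distributions over NP expansions. A minimal sketch, assuming categorical distributions stored as dicts and the direction KL(model || reference), which the slide does not specify; the expansion labels and probabilities are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) = sum over x of p(x) * log(p(x) / q(x)), in nats."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p puts mass where q puts none
        total += px * math.log(px / qx)
    return total


# Hypothetical p(. | NP) distributions over two NP expansions.
model_np = {"Det Noun": 0.5, "Noun Adj": 0.5}        # low-density model
reference_np = {"Det Noun": 0.25, "Noun Adj": 0.75}  # typology reference
print(round(kl_divergence(model_np, reference_np), 4))  # 0.1438
```

The penalty is zero only when the model's distribution matches the reference exactly, so adding it to the training objective pulls the unsupervised model toward the typologically expected behavior.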
Arabic
Low Density Language
Unsupervised
p(. | NP)
English
Chinese
Typology Reference
Proposed Approach: Multilingual Scenario
Model for Low Density Language
He smells flowers
smells (x1, x2): pos verb, num singular, transitive yes, time present
smells (x1): pos verb, num singular, transitive no, time present
smells: pos noun, num plural, count yes
Semantic Ambiguity
smells/سونگھتا ہے flowers/پھول he/وہ
سونگھتا ہے / بدبو آتی ہے / بدبوئیں
ריחסמ מריח תרחו
פרחים/flowers הוא/he מריח/smells
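A small sketch of how a translation can select one reading of an ambiguous word. It pairs the three readings of English "smells" with the three Urdu forms listed above, assuming they appear in the same order as the readings; the sense labels and the `disambiguate` helper are hypothetical.

```python
# Readings of English "smells" from the slide, each paired with the Urdu form
# assumed to realize it (same order as listed on the slide).
SENSES = {
    "smells (x1, x2)": {"pos": "verb", "transitive": "yes", "urdu": "سونگھتا ہے"},
    "smells (x1)": {"pos": "verb", "transitive": "no", "urdu": "بدبو آتی ہے"},
    "smells (noun)": {"pos": "noun", "num": "plural", "urdu": "بدبوئیں"},
}


def disambiguate(aligned_urdu: str) -> list[str]:
    """Keep only the English readings whose Urdu realization matches the aligned word."""
    return [sense for sense, info in SENSES.items() if info["urdu"] == aligned_urdu]


# In "He smells flowers", "smells" aligns to the Urdu transitive verb form.
print(disambiguate("سونگھتا ہے"))  # ['smells (x1, x2)']
```

Because Urdu uses three distinct forms where English uses one, a single alignment is enough to pick the reading.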
Construct a Language Neutral Semantic Representation
• Align trees of a multi-parallel corpus
• Extract a minimal set of frequently occurring fragments
– Model with Dirichlet processes (adaptor grammar induction)
• Learn semantic parsing in a monolingual setting
• Project the representation into a low density language via a bilingual corpus
He smells flowers / הוא מריח פרחים / وہ پھول سونگھتا ہے
he / הוא / وہ: num singular, person 3rd, animacy yes
smells / מריח / سونگھتا ہے: num singular, transitive yes, time present
flowers / פרחים / پھول: num plural, animacy no
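Adaptor grammars cache whole fragments and reuse them with a probability that grows with how often they have been used, which pushes the model toward a small set of frequent fragments. A minimal sketch of the Dirichlet-process predictive rule behind that caching; the fragment strings, the concentration `alpha`, and the constant base probability are hypothetical, and a real adaptor grammar (which typically uses the more general Pitman-Yor process) would score new fragments with a grammar-based base distribution instead of a constant.

```python
from collections import Counter


def dp_predictive(fragment: str, cache: Counter, alpha: float, base_prob: float) -> float:
    """Predictive probability of the next fragment under a Dirichlet process.

    P(next = f) = (n_f + alpha * P0(f)) / (n + alpha), where n_f counts earlier
    uses of fragment f, n is the total number of earlier uses, and base_prob = P0(f).
    """
    n = sum(cache.values())
    return (cache[fragment] + alpha * base_prob) / (n + alpha)


# Hypothetical cache of fragments already extracted from aligned trees.
cache = Counter({"smells(x1, x2)": 3, "smells(x1)": 1})
alpha, p0 = 1.0, 0.1  # concentration and base probability (both assumed)
for f in ["smells(x1, x2)", "smells(x1)", "smells(x1, x2, x3)"]:
    print(f, round(dp_predictive(f, cache, alpha, p0), 3))
# smells(x1, x2) 0.62
# smells(x1) 0.22
# smells(x1, x2, x3) 0.02
```

Frequently reused fragments become cheap while unseen ones pay the base cost, which is the rich-get-richer effect that favors a compact fragment inventory.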
Benefits of Multilingual Semantic Representation
• Developing tools with scarce target language annotations
– Reduces the need for training data by abstracting over alternative surface realizations
• Developing tools with no target language annotations
– Supports cross-lingual transfer due to language-neutral features derived from the representation