Processing Learner Texts: from Annotation to . . .
Weiwei Sun
Wangxuan Institute of Computer Technology, Peking University
@BCLU 2020
give a topic and then discussion about it
Is it a good English sentence?
Can you guess the meaning?
1 of 60
English as a Second Language (ESL)
L1 speakers: 29%
L2 speakers: 71%
from Ethnologue (2019, 23rd edition)
898.4 million ESL speakers!
2 of 60
Learner texts are everywhere . . .
Language Tests
Social Networks
my paper
https://acl2020.org/blog/general-conference-statistics/
and perhaps yours
3 of 60
First languages, second languages, cross-lingual transfer
L1 has an influence on L2
4 of 60
Something like:
- a native speaker of Japanese learning Chinese: L2-chi, L1-jpn
- a native speaker of English learning Chinese: L2-chi, L1-eng
5 of 60
Universals
Noun Phrase Accessibility Hierarchy (Keenan & Comrie, 1977)
Subject > direct object > indirect object > oblique > genitive > object of comparison
If a language can relativize on a position on the hierarchy, then any other higher position can also be relativized on.
(1) a. the man who I am taller than (object of comparison)
b. the man whose father I know (genitive)
For example, if a language allows (1a), then it allows (1b).
A universal of SLA
L2 learners find relative clauses higher on the hierarchy easier to acquire.
6 of 60
Annotating learner texts
There is naturally a need to automatically annotate second language data with rich lexical, syntactic, semantic and even pragmatic information.
High-performance automatic annotation,
- from an engineering perspective, enables deriving high-quality information by structuring this specific type of data, and
- from a scientific perspective, enables quantitative studies of Second Language Acquisition, complementary to hands-on experience in interpreting second language phenomena.
Is this talk about annotating grammatical errors? Not really.
7 of 60
Data: Reddit (https://www.reddit.com)
Large-scale L2 texts are available!
250M native and non-native English sentences (3.8B tokens), covering over 45K authors from 50 countries (Rabinovich et al., 2018)
L1       Sentence
French   I have to go to the Dr. to do a rapid check on my heart stability.
French   Maybe put every name through a manual approbation pipeline so it ensures quality.
French   Polls have shown public approbation for this law is somewhere between 58% and 65%, and it has been a strong promise during the presidential campaign.
Italian  The event was even more shocking because the precedent evening he wasn't sick at all.
8 of 60
(Automatic) Annotation for learner languages
2010
POS tags (Díaz-Negrillo et al., 2010)
2012 2013 2014 2016 2017
POS tags, syntactic dependencies (Ragheb & Dickinson, 2012; Dickinson & Ragheb, 2013; Ragheb & Dickinson, 2014)
Phrase Structure (Nagata & Sakaguchi, 2016)
Universal Dependencies (Berzak et al., 2016)
UD for learner Chinese (Lee et al., 2017)
Annotated L2 texts are available!
Corpus       L2       L1                  #sent  Structure
TLE          English  Multiple            5,124  Universal Dependencies
Konan-JIEM   English  Japanese            3,260  Phrase Structure
ICNALE       English  10 Asian countries  1,930  Phrase Structure
Chinese-CFL  Chinese  Multiple              451  Universal Dependencies
9 of 60
Data: lang-8 (http://lang-8.com)
Large-scale L2-L1 parallel texts are available!
6.8M English sentence pairs and 720K Chinese sentence pairs (Mizumoto et al., 2011; Y. Zhao et al., 2018)
L2 speaker: 城市里的人能度过多方面的生活。  corrected: 城市里的人能过丰富多彩的生活。 ('people in cities can lead rich and colorful lives')
L2 speaker: You know what should I done.  corrected: You know what I should have done.
10 of 60
Patterns of cross-lingual transfer
Noun compounds:
  L2-English: bread smell (cf. Mandarin 面包 香气 'bread smell'); Standard English: smell of bread
  Tree-string pattern: NP(x0:NN x1:NN) → x1 x0
Adverb placement:
  L2-English: play often sports (cf. French faire souvent du sport); Standard English: often play sports
  Tree-string pattern: VP(x0:VB x1:ADVP x2:NP) → x1 x0 x2
Using patterns
Pipeline: Learner Texts → Tree-String Patterns → Vectors → Phylogenetic Structure
[Recovered tree over L1s: Russian, Polish, Ukrainian, Spanish, French, Portuguese (Brazilian), Italian, Persian, German, Hindi]
Zhao et al. (2020); arXiv:2007.09076
12 of 60
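The patterns-to-vectors step can be sketched as counting each L1 group's tree-string patterns and comparing the resulting count vectors; the patterns and counts below are invented for illustration, and the real pipeline clusters the vectors into a phylogenetic tree.

```python
from collections import Counter
from math import sqrt

def pattern_vector(patterns):
    """Turn a list of extracted tree-string patterns into a sparse count vector."""
    return Counter(patterns)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical pattern counts for three L1 backgrounds (toy numbers, not real data).
french  = pattern_vector(["VP(VB ADVP NP)->ADVP VB NP"] * 7 + ["NP(NN NN)->NN NN"] * 1)
italian = pattern_vector(["VP(VB ADVP NP)->ADVP VB NP"] * 6 + ["NP(NN NN)->NN NN"] * 2)
chinese = pattern_vector(["NP(NN NN)->NN NN"] * 8 + ["VP(VB ADVP NP)->ADVP VB NP"] * 1)

# The Romance L1s end up closer to each other than to Chinese.
assert cosine(french, italian) > cosine(french, chinese)
```

A clustering step (e.g. hierarchical agglomerative clustering over such similarities) would then recover the tree shown on the slide.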
(Automatic) Annotation for learner languages
2010
POS tags (Díaz-Negrillo et al., 2010)
2012 2013 2014 2016 2017 2018 2020
POS tags, syntactic dependencies (Ragheb & Dickinson, 2012; Dickinson & Ragheb, 2013; Ragheb & Dickinson, 2014)
Phrase Structure (Nagata & Sakaguchi, 2016)
Universal Dependencies (Berzak et al., 2016)
UD for learner Chinese (Lee et al., 2017)
Semantic Roles (Lin et al., 2018)
English Resource Semantics (Y. Zhao et al., 2020)
Negation Scope Resolution (under submission)
capture “meanings”
13 of 60
Can humans understand interlanguage robustly?
☹ It is difficult to define the syntactic formalism of learner language.
Grammaticality judgement
☺ But sometimes we can understand what they mean . . .
14 of 60
Research questions
- How can we capture meanings of L2s? How can we annotate L2 texts? Are there many differences from annotating L1 texts?
- How badly does an L1-data-trained semantic parser perform? Can state-of-the-art grammatical error correction systems help?
- What role does syntactic parsing play in processing L2 texts? What role does cross-lingual transfer play?
15 of 60
Shallow and not-that-shallow meaning representations
Semantic Role Labeling
  [Some boys]-ARG0 want [to go]-ARG1 .
Bi-lexical Semantic Dependency
  Some boys want to go .  (edges: BV, ARG1, ARG1, ARG2)
Conceptual Graphs
  [Graph: some_q -BV-> boy_n_1; want_v_to -ARG1-> boy_n_1, -ARG2-> go_v_1; go_v_1 -ARG1-> boy_n_1]
16 of 60
Interface Hypothesis
Language structures involving an interface between different language modules, such as the syntax-semantics interface and the semantics-pragmatics interface, are less likely to be acquired completely than structures that do not involve such an interface; see e.g. Sorace (2011).
[Syntactic tree of the learner sentence: [VP [V give] [NP [D a] [N topic]] [CONJ and then] [NP [N discussion] [PP [P about] [PN it]]]]]
17 of 60
Literal meaning versus intended meaning
Literal Meaning: conventional meaning; sentence meaning; features of the linguistic code.
Intended Meaning: speaker meaning; the author's intention; recovered through interpretation.
The two are often not the same.
[EDS graph of the literal reading, "give a topic and then discussion about it.": give_v_1, pron/pronoun_q, a_q, topic_n_of, and+then_c, udef_q, discussion_n_1, about_p, pron/pronoun_q, connected by BV, ARG1, ARG2, L-INDEX and R-INDEX edges]
[EDS graph of the intended reading, "Give a topic and then discuss it.": give_v_1, pron/pronoun_q, a_q, topic_n_of, and+then_c, discuss_v_1, pron/pronoun_q, connected by BV, ARG1, ARG2, L-INDEX and R-INDEX edges]
18 of 60
SemBanking in Natural Language Processing
compositionally
non-compositionally
manually-annotated
grammar-based
PropBank (Kingsbury & Palmer, 2002); FrameNet (Baker et al., 1998)
Redwoods Treebank (Oepen et al., 2004); TREPIL (Rosén et al., 2005); Groningen Meaning Bank (Basile et al., 2012)
Abstract Meaning Representation (Banarescu et al., 2013)
comprehensiveness, consistency, scalability
Bender, E.M., Flickinger, D., Oepen, S., Packard, W. and Copestake, A. Layers of interpretation: On grammar and compositionality. IWCS 2015.
19 of 60
Two languages, three tasks
Chinese as a Second Language
- Semantic Role Labeling
- Negation Scope Resolution
English as a Second Language
- Compositional Semantics
Semantic Graph Parsing
20 of 60
SemBanking in Natural Language Processing
compositionally
non-compositionally
manually-annotated
grammar-based
PropBank (Kingsbury & Palmer, 2002); FrameNet (Baker et al., 1998)
Redwoods Treebank (Oepen et al., 2004); TREPIL (Rosén et al., 2005); Groningen Meaning Bank (Basile et al., 2012)
Abstract Meaning Representation (Banarescu et al., 2013)
comprehensiveness, consistency, scalability ✓
Bender, E.M., Flickinger, D., Oepen, S., Packard, W. and Copestake, A. Layers of interpretation: On grammar and compositionality. IWCS 2015.
21 of 60
Semantic Role Labeling
- Argument (AN): Who did what to whom?
- Adjunct (AM): When, where, why and how?
[I]-A0 ate [breakfast]-A1 [quickly]-AM-MNR [in the car]-AM-LOC [this morning]-AM-TMP [because I was in a hurry]-AM-PRP
AM-PRP
Our work
Z. Lin, Y. Duan, Y. Zhao, W. Sun and X. Wan. Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Analysis and L2-L1 Parallel Data. EMNLP 2018.
22 of 60
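The PropBank-style analysis above can be held in a simple span-based data structure; the representation here is a hedged sketch for illustration, not the annotation tool's actual format.

```python
# Token list for the slide's example sentence.
sentence = ("I ate breakfast quickly in the car this morning "
            "because I was in a hurry").split()

# (label, start, end) spans over token indices, with end exclusive.
roles = [
    ("A0", 0, 1),        # I
    ("A1", 2, 3),        # breakfast
    ("AM-MNR", 3, 4),    # quickly
    ("AM-LOC", 4, 7),    # in the car
    ("AM-TMP", 7, 9),    # this morning
    ("AM-PRP", 9, 15),   # because I was in a hurry
]

def realize(label):
    """Recover the surface string filling a given role."""
    for lab, i, j in roles:
        if lab == label:
            return " ".join(sentence[i:j])
    return None

assert realize("AM-LOC") == "in the car"
assert realize("A0") == "I"
```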
Data source
Pipeline: initial collection (1,108,907 pairs) → clean-up (717,241 pairs) → manual selection of a reasonably sized subset → annotation
Typologically different mother tongues:
Chinese   Sino-Tibetan
Russian   Slavic
Arabic    Semitic
Japanese  ?
English   Germanic
23 of 60
Inter-annotator agreement
- Annotators: two students majoring in linguistics
- The first 50-sentence trial set: adapting and refining the Chinese PropBank specification
- The remaining 100-sentence set: used to report inter-annotator agreement
[Bar chart (90-100): inter-annotator agreement on L1 and L2 text, by mother tongue: ENG, JPN, RUS, ARA.]
24 of 60
Negation Scope Resolution
- Negation cue: the linguistic unit that expresses negation.
- Negation event: the event related to a cue.
- Negation scope: the maximal part(s) of the sentence that are influenced or negated by the negation cue.
(2) a. We needs actions and not thoughts .
b. He failed to catch the first train .
c. This is an un clean desk .
d. 换言说,没有 宗教生活与日常生活差距 。
e. 换言说, 宗教生活与日常生活之间 没有 距离 。
Our work
M. Zhang, W. Wang, Y. Zhao, S. Sun, W. Sun and X. Wan. Negation Scope Resolution for Chinese as a Second Language. (under submission)
25 of 60
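Cue and scope annotations like (2b) are often encoded as token-level tag sequences; the binary scheme below is illustrative only (corpora differ, e.g., on whether the cue itself belongs to the scope).

```python
# Token-level encoding of cue and scope for example (2b) on the slide.
tokens = "He failed to catch the first train .".split()
cue    = [0, 1, 0, 0, 0, 0, 0, 0]   # "failed" is the negation cue
scope  = [1, 0, 1, 1, 1, 1, 1, 0]   # tokens the cue negates (cue excluded here)

def scope_text(tokens, scope):
    """Recover the (possibly discontinuous) scope as a string."""
    return " ".join(t for t, s in zip(tokens, scope) if s)

assert scope_text(tokens, scope) == "He to catch the first train"
```

Scope resolution then becomes a sequence-labeling problem: predict the `scope` vector given the sentence and the cue.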
Inter-annotator agreement
                     C. F1   S. F1   T. F1   C. Kappa  S. Kappa
CD-SCO (2012)        94.88   85.04   91.53      -         -
SFU Review (2012)    92.79   81.88     -      92.70     87.20
BioScope (2008)      98.65   95.91     -        -         -
CNeSp (2015)           -       -       -      95.00     93.00
L2-chi, L1-eng      100.00   92.55   97.12   100.00     96.13
chiL2⇒L1, L1-eng    100.00   92.55   97.71   100.00     95.65
L2-chi, L1-jpn      100.00   90.09   94.62   100.00     92.15
chiL2⇒L1, L1-jpn    100.00   90.09   94.35   100.00     92.32
- Inter-annotator agreement w.r.t. negation cue and scope.
- Previous negation corpora reported cue-level F1 (C. F1) at 91%-95%, scope-level F1 (S. F1) at 76%-85%, token-level F1 (T. F1) at 88%-92%, and kappa at 87%-91%.
26 of 60
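The kappa columns in the table are Cohen's kappa over the two annotators' decisions, which can be computed as follows (the label sequences are toy data, not the corpus).

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement under independent annotators with these label frequencies.
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy token-level in-scope (I) / out-of-scope (O) decisions from two annotators.
ann1 = ["I", "I", "O", "I", "O", "O", "I", "O"]
ann2 = ["I", "I", "O", "I", "O", "I", "I", "O"]
assert round(cohen_kappa(ann1, ann2), 2) == 0.75
```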
SemBanking in Natural Language Processing
compositionally
non-compositionally
manually-annotated
grammar-based
PropBank (Kingsbury & Palmer, 2002); FrameNet (Baker et al., 1998)
Redwoods Treebank (Oepen et al., 2004); TREPIL (Rosén et al., 2005); Groningen Meaning Bank (Basile et al., 2012)
Abstract Meaning Representation (Banarescu et al., 2013)
comprehensiveness, consistency, scalability ✓
Bender, E.M., Flickinger, D., Oepen, S., Packard, W. and Copestake, A. Layers of interpretation: On grammar and compositionality. IWCS 2015.
27 of 60
Treebank of Learner English (Berzak et al., 2016)
TLE (http://esltreebank.org/)
- a collection of 5,124 ESL sentences
- manually annotated with POS tags and UD trees
- in original and error-corrected forms.
28 of 60
SemBanking by integrating TLE and ERG
Input sentence → ACE/PET parser (Packard, 2013) → Meaning Representations 1, 2, ..., K → Reranker → Meaning Representation
English Resource Grammar (ERG; Flickinger, 1999): a hand-crafted computational grammar; HPSG-based; 25+ person-years of work.
The reranker is informed by manually annotated Universal Dependencies (UD).
Given K candidate graphs G_1, G_2, ..., G_K and the gold UD tree T:
  G = argmax_{1 <= i <= K} score(G_i, T)
  score(G_i, T) = W^T F(f_{G_i}, f_T)
Target representation: Elementary Dependency Structures (EDS; Oepen and Lønning, 2006), among several others.
29 of 60
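The reranking rule above can be sketched with a toy feature function that counts dependency overlap between a candidate semantic graph and the gold UD tree; the real feature map F and the learned weight vector W are much richer than this stand-in.

```python
# Toy reranker: pick the candidate graph whose head-dependent pairs best
# overlap the gold UD tree. A single scalar weight w plays the role of W.
def score(graph_edges, ud_edges, w=1.0):
    return w * len(set(graph_edges) & set(ud_edges))

# Hypothetical gold UD edges for "Give a topic and then discuss it."
gold_ud = {("discuss", "topic"), ("discuss", "it")}

candidates = [
    {("discussion", "about"), ("about", "it")},   # literal reading of the L2 string
    {("discuss", "topic"), ("discuss", "it")},    # intended reading
]

best = max(candidates, key=lambda g: score(g, gold_ud))
assert best == {("discuss", "topic"), ("discuss", "it")}
```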
Performance of reranking
[Bar chart (85-100), smatch and EDM: no reranking, reranking over 50 and 500 candidates, and oracle scores over 50 and 500 candidates; the inter-annotator agreement level for EDM (IAA-EDM) is marked for reference.]
Evaluation on DeepBank
Inter-Annotator Agreement (IAA) of EDM is reported in Bender et al. (2015).
30 of 60
Research questions
- How can we capture meanings of L2s? How can we annotate L2 texts? Are there many differences from annotating L1 texts?
- How badly does an L1-data-trained semantic parser perform? Can state-of-the-art grammatical error correction systems help?
- What role does syntactic parsing play in processing L2 texts? What role does cross-lingual transfer play?
31 of 60
Two languages, three tasks
Chinese as a Second Language
- Semantic Role Labeling
- Negation Scope Resolution
English as a Second Language
- Compositional Semantics
Semantic Graph Parsing
[EDS graph: some_q -BV-> boy_n_1; want_v_to -ARG1-> boy_n_1, -ARG2-> go_v_1; go_v_1 -ARG1-> boy_n_1]
31 of 60
Evaluation – smatch (Cai & Knight, 2013)
Gold graph: a:want, b:boy, c:go with triples 1: (a, b, ARG0), 2: (a, c, ARG1), 3: (c, b, ARG0)
Predicted graph: a':want, b':boy, c':go with triples 1: (a', b', ARG0), 2: (a', c', ARG1)
Under the variable mapping v_{xx'} = 1 for x ∈ {a, b, c}, the matched triples are t_11 = 1 and t_22 = 1.
32 of 60
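Once a variable mapping is fixed, smatch reduces to an F1 over matched triples; finding the best mapping is the hard part, which smatch approximates by hill-climbing with random restarts. A sketch of the scoring step only, on the slide's example:

```python
def triple_f1(gold, pred, mapping):
    """F1 over relation triples once a gold-to-predicted variable mapping is fixed."""
    mapped = {(mapping[h], mapping[d], r) for h, d, r in gold}
    matched = len(mapped & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

# The example from the slide (a1 stands for a', etc.).
gold = {("a", "b", "ARG0"), ("a", "c", "ARG1"), ("c", "b", "ARG0")}
pred = {("a1", "b1", "ARG0"), ("a1", "c1", "ARG1")}
mapping = {"a": "a1", "b": "b1", "c": "c1"}

# 2 matched triples: precision = 2/2, recall = 2/3, F1 = 0.8.
assert round(triple_f1(gold, pred, mapping), 2) == 0.8
```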
Error-oriented smatch
[EDS graph of the literal reading of "give a topic and then discussion about it.": give_v_1, pron/pronoun_q, a_q, topic_n_of, and+then_c, udef_q, discussion_n_1, about_p]
- Most of the structure is good.
- We need to focus on errors!
→ Add weights
33 of 60
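A sketch of the weighting idea: triples touching nodes aligned to grammatical errors get a higher weight, so mistakes on the erroneous region of the sentence cost more. The weights, triples, and error-node set below are illustrative, not the paper's exact scheme.

```python
def weighted_f1(gold, pred, error_nodes, w_err=3.0, w_ok=1.0):
    """Triple F1 where triples touching error-aligned nodes are up-weighted."""
    weight = lambda t: w_err if t[0] in error_nodes or t[1] in error_nodes else w_ok
    matched = gold & pred
    if not matched:
        return 0.0
    precision = sum(map(weight, matched)) / sum(map(weight, pred))
    recall = sum(map(weight, matched)) / sum(map(weight, gold))
    return 2 * precision * recall / (precision + recall)

gold = {("give_v_1", "topic_n_of", "ARG2"), ("discuss_v_1", "pron", "ARG2")}
pred = {("give_v_1", "topic_n_of", "ARG2"), ("discussion_n_1", "about_p", "ARG1")}
errors = {"discuss_v_1", "discussion_n_1"}

# The parser got the easy triple right and the erroneous region wrong,
# so the error-oriented score (0.25) is lower than the uniform one (0.5).
assert weighted_f1(gold, pred, errors) == 0.25
assert weighted_f1(gold, pred, errors, w_err=1.0) == 0.5
```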
Node-relaxed smatch
[Predicted graph fragment: udef_q, discussion_n_1, about_p, pron, pronoun_q with edges BV, ARG1, ARG2, BV.
 Gold graph fragment: discuss_v_1, pron, pronoun_q with edges ARG2, BV.
 Can discussion_n_1 be matched with discuss_v_1?]
L2: We went this discuss .    L1: We went to this discussion .
Statistical Machine Translation (SMT) over large-scale L2-L1 parallel data (lang-8) yields a paraphrase table, e.g. discuss ↔ discussion.
34 of 60
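Node-relaxed matching can be sketched as node F1 in which the paraphrase table licenses extra matches; the tiny table and lemma inventory below are illustrative.

```python
# A toy paraphrase table, standing in for one mined from lang-8 with SMT alignment.
paraphrase_table = {("discuss", "discussion")}

def node_eq(a, b):
    """Two concept nodes match if identical or linked by the paraphrase table."""
    return a == b or (a, b) in paraphrase_table or (b, a) in paraphrase_table

def relaxed_node_f1(gold_nodes, pred_nodes):
    matched = sum(any(node_eq(g, p) for p in pred_nodes) for g in gold_nodes)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_nodes)
    recall = matched / len(gold_nodes)
    return 2 * precision * recall / (precision + recall)

gold = ["discuss", "pron"]
pred = ["discussion", "pron", "about"]
# discuss~discussion and pron match: precision = 2/3, recall = 2/2, F1 = 0.8.
assert round(relaxed_node_f1(gold, pred), 2) == 0.8
```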
Semantic parsers (Koller et al., 2019)
Parser families: factorization-based, composition-based, transition-based, translation-based
[Chart: parsing accuracy (0.5-0.8) over 2014-2019 for each family.]
Modeling syntactico-semantic derivation/composition vs. derived/composed structures
35 of 60
Compositionality
The meaning of an expression is a function of the meanings of its parts and of the way they are syntactically combined.
B. Partee
brown + cow = brown cow
Fodor and Lepore (2002)
36 of 60
Hyperedge Replacement Grammar-based parser
[HRG derivation for "and then discussion about it": the graph for and+then_c (edges CONJ, ARG1, R-INDEX) contains an NP-labeled hyperedge, which is rewritten by the subgraph for "discussion about it" (discussion_n_1, about_p, udef_q, pronoun_q, pron; edges BV, ARG1, ARG2, BV).]
37 of 60
Factorization-based parser
- Inspired by graph-based dependency parsers
- Explicitly models the target structure
[Diagram: encoders over "topic and then discussion" yield token representations; the concepts topic_n_of, and_c, then_a_1, discussion_n_1 are predicted by argmax; a biaffine function over the concept representations c_4 and c_6 computes ScoreEdge(and_c → discussion_n_1).]
Systems
Y. Chen, Y. Ye and W. Sun. 2019. Peking at MRP 2019: Factorization- and Composition-Based Parsing for Elementary Dependency Structures. CoNLL Shared Task on Cross-Framework Meaning Representation Parsing.
38 of 60
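The biaffine edge score in the diagram is a bilinear form over two token representations; a dependency-free sketch with toy vectors (real systems learn the matrix U, the bias, and the encodings jointly).

```python
def biaffine(r_head, r_dep, U, b=0.0):
    """score(head -> dep) = r_head^T U r_dep + b, with plain lists as vectors."""
    Ur = [sum(U[i][j] * r_dep[j] for j in range(len(r_dep)))
          for i in range(len(r_head))]
    return sum(h * u for h, u in zip(r_head, Ur)) + b

# Toy 2-d representations; with U = identity the score is just a dot product.
U = [[1.0, 0.0], [0.0, 1.0]]
and_c = [1.0, 2.0]          # stand-in encoding for and_c
discussion = [2.0, 1.0]     # stand-in encoding for discussion_n_1
assert biaffine(and_c, discussion, U) == 4.0
```

At decoding time the parser keeps the edges whose scores pass a threshold (or solves a constrained search over them) to assemble the graph.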
Results — Parsing to literal meanings
[Bar chart (80-100): composition-based vs. factorization-based parsers evaluated on DeepBank, L1, and L2 data.]
Model                Error-oriented  Node   Edge   All
Composition-Based    NO              89.04  82.14  85.75
                     YES             71.46  79.77  76.66
Factorization-Based  NO              90.96  84.48  87.86
                     YES             73.55  80.27  77.75
39 of 60
Results — Parsing to literal meanings
[Bar chart (40-80): parser scores by grammatical error type (WOadv, WOinc, Others, Wform, Pform, Nn, ArtOrDet), composition-based vs. factorization-based.]
- ArtOrDet: It is obvious to see that (internet → the internet) saves people time and also connects people globally.
- WOinc: (Someone having what kind of disease → What kind of disease someone has) is a matter of their own privacy.
40 of 60
Grammatical Error Correction (GEC)
Original Sentence
give a topic and thendiscussion about it .
Figure 1: Architecture of our multilayer convolutional modelwith seven encoder and seven decoder layers (only one en-coder and one decoder layer are illustrated in detail).
si ∈ Rd is given by si = w(si) + p(i), where w(si) is theword embedding and p(i) is the position embedding cor-responding to the position i of token si in the source sen-tence. Both embeddings are obtained from embedding ma-trices that are trained along with other parameters of the net-work.
The encoder and decoder are made up of L layers each.The architecture of the network is shown in Figure 1. Thesource token embeddings, s1, . . . , sm, are linearly mappedto get input vectors of the first encoder layer, h0
1, . . . ,h0m,
where h0i ∈ Rh and h is the input and output dimension of
all encoder and decoder layers. Linear mapping is done bymultiplying a vector with weights W ∈ Rh×d and addingthe biases b ∈ Rh:
h0i = Wsi + b
In the first encoder layer, 2h convolutional filters of dimen-sion 3 × h map every sequence of three consecutive inputvectors to a feature vector f1i ∈ R2h. Paddings (denoted by<pad> in Figure 1) are added at the beginning and end ofthe source sentence to retain the same number of output vec-tors as the source tokens after the convolution operations.
f1i = Conv(h0i−1,h
0i ,h
0i+1)
where Conv(·) represents the convolution operation. This isfollowed by a non-linearity using gated linear units (GLU)(Dauphin et al. 2016):
GLU(f1i ) = f1i,1:h ◦ σ(f1i,h+1:2h)
where GLU(f1i ) ∈ Rh, ◦ and σ represent element-wise mul-tiplication and sigmoid activation functions, respectively,and f1i,u:v denotes the elements of f1i from indices u to v(both inclusive). The input vectors to an encoder layer arefinally added as residual connections. The output vectors ofthe lth encoder layer are given by,
hli = GLU(f li ) + hl−1
i i = 1, . . . ,m
Each output vector of the final encoder layer, hLi ∈ Rh, is
linearly mapped to get the encoder output vector, ei ∈ Rd,using weights We ∈ Rd×h and biases be ∈ Rd:
ei = WehLi + be i = 1, . . . ,m
Now, consider the generation of the target word tn atthe nth time step in decoding, with n − 1 target wordspreviously generated. For the decoder, paddings are addedat the beginning. The two paddings, beginning-of-sentencemarker and the previously generated tokens, are embeddedas t−2, t−1, t0, t1, . . . , tn−1 in the same way as source to-ken embeddings are computed. Each embedding tj ∈ Rd islinearly mapped to g0
j ∈ Rh and passed as input to the firstdecoder layer. In each decoder layer, convolution operationsfollowed by non-linearities are performed on the previousdecoder layer’s output vectors gl−1
j , where j = 1, . . . , n:
ylj = GLU(Conv(gl−1
j−3,gl−1j−2,g
l−1j−1)
where Conv(·) and GLU(·) represent convolutions and non-linearities respectively, and yl
j becomes the decoder state atthe jth time step in the lth decoder layer. The number and sizeof convolution filters are the same as those in the encoder.
Each decoder layer has its own attention module. To com-pute attention at layer l before predicting the target tokenat the nth time step, the decoder state yl
n ∈ Rh is lin-early mapped to a d-dimensional vector with weights Wz ∈Rd×h and biases bz ∈ Rd, adding the previous target to-ken’s embedding:
zln = Wzyln + bz + tn−1
The attention weights αln,i are computed by a dot product of
the encoder output vectors e1, . . . , em with zln and normal-ized by a softmax:
αln,i =
exp(e�i zln)∑m
k=1 exp(e�k zln)i = 1, . . . ,m
The source context vector xln is computed by applying the
attention weights to the summation of the encoder outputvectors and the source embeddings. The addition of thesource embeddings helps to better retain information aboutthe source tokens.
xln =
m∑
i=1
αln,i(ei + si)
The context vector xln is then linearly mapped to cln ∈ Rh.
The output vector of the lth decoder layer, gln, is the summa-tion of cln, yl
n, and the previous layer’s output vector gl−1n .
gln = yl
n + cln + gl−1n
The final decoder layer output vector gLn is linearly mapped
to dn ∈ Rd. Dropout (Srivastava et al. 2014) is applied atthe decoder outputs, embeddings, and before every encoderand decoder layer. The decoder output vector is then mappedto the target vocabulary size (|Vt|) and softmax is computedto obtain target word probabilities.
on = Wodn + bo Wo ∈ R|Vt|×d,bo ∈ R|Vt|
5757
Chollampatt and Ng (2018)158
Copy Scores Vocabulary Distribution
Final Distribution
!1 !3!2 !4 "1 "3"2 "4 "5
#1 #3#2 #4
Encoder Decoder
Attention Distribution
h1src h2src h3src h4src h1trg h2
trg h3trg h4
trg h5trg
+α tcopy
×N×N
Token-level labeling output
Figure 1: Copy-Augmented Architecture.
and generating is controlled by a balancing factorαcopyt ∈ [0, 1] at each time step t.
pt(w) = (1−αcopyt )∗pgent (w)+(αcopy
t )∗pcopyt (w)(5)
The new architecture outputs the generationprobability distribution as the base model, by gen-erating the target hidden state. The copying scoreover the source input tokens is calculated with anew attention distribution between the decoder’scurrent hidden state htrg and the encoder’s hiddenstates Hsrc (same as hsrc1...N ). The copy attention iscalculated the same as the encoder-decoder atten-tions, listed in Equation 6, 7, 8 :
qt,K, V = htrgt W Tq , H
srcW Tk , H
srcW Tv (6)
At = qTt K (7)
P copyt (w) = softmax(At) (8)
The qt, K and V are the query, key, and valuethat needed to calculate the attention distributionand the copy hidden state. We use the normalizedattention distribution as the copy scores and usethe copy hidden states to estimate the balancingfactor αcopy
t .
αcopyt = sigmoid(W T
∑(AT
t · V )) (9)
The loss function is as described in Equation 4,but with respect to our mixed probability distribu-tion yt given in Equation 5.
3 Pre-training
Pre-training is shown to be useful in many tasks when lacking vast amounts of training data. In this section, we propose denoising auto-encoders, which enable pre-training our models with a large-scale unlabeled corpus. We also introduce a partial pre-training method to make a comparison with the denoising auto-encoder.

3.1 Denoising Auto-encoder
Denoising auto-encoders (Vincent et al., 2008) are commonly used for model initialization to extract and select features from inputs. BERT (Devlin et al., 2018) used a pre-trained bi-directional transformer model and outperformed existing systems by a wide margin on many NLP tasks. In contrast to denoising auto-encoders, BERT only predicts the 15% masked words rather than reconstructing the entire input. BERT denoises 15% of the tokens at random by replacing 80% of them with [MASK], 10% of them with a random word, and leaving 10% of them unchanged.

Inspired by BERT and denoising auto-encoders, we pre-train our copy-augmented sequence-to-sequence model by noising the One Billion Word Benchmark (Chelba et al., 2013), which is a large sentence-level English corpus. In our experiments, the corrupted sentence pairs are generated by the
Zhao et al. (2019)
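The excerpt breaks off before describing Zhao et al.'s exact noising scheme; as a stand-in, here is a sketch of the BERT-style 80%/10%/10% corruption the excerpt itself describes, producing (corrupted, original) pairs for denoising pre-training. The function name and rates are illustrative assumptions, not the paper's implementation.

```python
import random

def corrupt(tokens, vocab, mask_rate=0.15, seed=0):
    """BERT-style denoising: sample mask_rate of the positions; replace
    80% of the sampled tokens with [MASK], 10% with a random vocabulary
    word, and leave 10% unchanged. Returns a (corrupted, original)
    sentence pair for a denoising auto-encoder."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    for i in range(len(tokens)):
        if rng.random() < mask_rate:
            r = rng.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: keep the token unchanged (still a prediction target)
    return corrupted, list(tokens)

noisy, clean = corrupt("give a topic and then discuss it".split(),
                       vocab=["the", "a", "of"], mask_rate=0.5)
```

The corrupted sentence keeps the original length; the model is trained to reconstruct the clean side from the noisy side.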
manually annotated
Give a topic and then discuss it .
give a topic and then discuss it .
give a topic and then discuss it .
Semantic Parser
41 of 60
Results — Parsing to intended meanings
[Bar chart: smatch scores (70–90) on the original sentences, on the outputs of Chollampatt and Ng (2018) and Zhao et al. (2019), and on manually corrected sentences, under standard, error-oriented, and node-relaxed smatch.]

Model                       F0.5
Chollampatt and Ng (2018)   45.36
W. Zhao et al. (2019)       61.15
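The F0.5 scores in the table are the conventional GEC metric, which weights precision more heavily than recall. A minimal sketch of the formula (the example inputs are illustrative, not the systems' actual precision/recall):

```python
def f_beta(precision, recall, beta=0.5):
    """F_beta score; GEC is conventionally reported with beta = 0.5,
    so precision counts twice as much as recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

score = f_beta(0.6, 0.3)   # example values only
```

With beta = 0.5, a system with precision 0.6 and recall 0.3 scores 0.5, the same as one with 0.5/0.5: precision-heavy systems are rewarded.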
42 of 60
Research questions
I How can we capture meanings of L2s? How can we annotate L2 texts? Are there many differences from annotating L1 texts?
I How badly does an L1-data-trained semantic parser perform? Can state-of-the-art grammatical error correction systems help?
I What role does syntactic parsing play in processing L2 texts? What role does cross-lingual transfer play?
43 of 60
Two languages, three tasks
Chinese as a Second Language
I Semantic Role Labeling
I Negation Scope Resolution
English as a Second Language
I Compositional Semantics
Semantic Graph Parsing
用 汉语 也 说话 快 对 我 来 说 很 难
(intended: 'It is very hard for me to speak Chinese quickly.')
[Semantic graph over the sentence with arcs labeled ARG0, AM, AM.]
43 of 60
Three SRL systems
I PCFGLA-parser-based SRL system (Berkeley parser)
I Neural-parser-based SRL system (minimal span-based parser)
I Neural syntax-agnostic SRL system
The parsers are trained on the Chinese TreeBank, whose sentences have SRL annotation in CPB; the SRL systems are trained on the Chinese PropBank (CPB).
44 of 60
Syntax-agnostic SRL
B-A0 I-A0 I-A0 I-A0 I-A0 B-AM I-AM I-AM I-AM B-AM REL
用 汉语 也 说话 快 对 我 来 说 很 难
[Architecture: each word is embedded and fed to a BiLSTM; an MLP on each BiLSTM state predicts the word's BIO role label, e.g. 用/B-A0 汉语/I-A0 也/I-A0 说话/I-A0.]
45 of 60
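The BIO labels predicted above still have to be decoded into argument spans. A minimal sketch of that decoding step (the function name and span convention are ours, not the system's):

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into labeled argument spans
    [(label, start, end_exclusive), ...]; 'REL' marks the predicate
    as a single-token span."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel flushes last span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:                              # close any open span
            if label is not None:
                spans.append((label, start, i))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
        if tag == "REL":
            spans.append(("REL", i, i + 1))
    return spans

tags = ["B-A0", "I-A0", "I-A0", "I-A0", "I-A0",
        "B-AM", "I-AM", "I-AM", "I-AM", "B-AM", "REL"]
spans = bio_to_spans(tags)
```

For the example tag sequence this yields an A0 span over tokens 0–4, two AM spans, and the predicate.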
Syntax-based SRL
[Parse tree of 用汉语也说话快对我来说很难啊。with constituents IP 用汉语 'using Chinese', ADVP 也 'also', VP 说话快 'speaking quickly', PP 对我来说 'for me', ADVP 很 'very', ADJ 难 'hard', SP 啊 (mod), PU 。; candidate constituents receive the role labels REL, A0, AM, NULL, AM, AM.]
46 of 60
Evaluation and findings
[Bar chart: SRL performance (65–80) on L1 and L2 data, broken down by the learners' mother tongue (ENG, JPN, RUS, ARA) and overall (ALL), for the PCFGLA-parser-based, neural-parser-based, and neural syntax-agnostic systems.]
The syntax-based systems are more robust when handling learner texts.
47 of 60
Why is syntactic analysis important?
用 汉语 也 说话 快 对我来说 很 难 啊。
Using Chinese also speaking quickly to me very hard.
It is very hard for me to speak Chinese quickly.
[Diagram comparing the gold role spans (A0, rel) with the spans predicted by the syntax-based system and the neural end-to-end system (A0, AM, AM, AM, rel).]
48 of 60
Why is syntactic analysis important?
[Parse tree of the same sentence: IP 用汉语 'using Chinese', ADVP 也 'also', VP 说话快 'speaking quickly', PP 对我来说 'for me', ADVP 很 'very', ADJ 难 'hard', SP 啊 (mod), PU 。]
Though the whole structure is bad, some parts may be good.
49 of 60
Research questions
I How can we capture meanings of L2s? How can we annotate L2 texts? Are there many differences from annotating L1 texts?
I How badly does an L1-data-trained semantic parser perform? Can state-of-the-art grammatical error correction systems help?
I What role does syntactic parsing play in processing L2 texts? What role does cross-lingual transfer play?
50 of 60
Something like
Japanese native speakers learning Chinese → L2-chi, L1-jpn
English native speakers learning Chinese → L2-chi, L1-eng
50 of 60
Two languages, three tasks
Chinese as a Second Language
I Semantic Role Labeling
I Negation Scope Resolution
English as a Second Language
I Compositional Semantics
Semantic Graph Parsing
(3) a. We needs actions and not thoughts .
b. He failed to catch the first train .
c. This is an un clean desk .
d. 换言说,没有 宗教生活与日常生活差距 。('In other words, there is no gap between religious life and daily life.')
e. 换言说, 宗教生活与日常生活之间 没有 距离 。('In other words, there is no distance between religious life and daily life.')
51 of 60
L2-Japanese data is more useful
[Bar charts: recall of models with different training sets (ALL; L2-chi, L1-eng/jpn; L2-chi, L1-eng) on L2 and L1 test data for the ENG and JPN groups.]

Train \ Test         L2-chi, L1-eng   chiL2⇒L1, L1-eng   L2-chi, L1-jpn   chiL2⇒L1, L1-jpn
L2-chi, L1-eng       73.4/67.8/61.3   73.7/69.4/62.9     71.6/64.9/56.3   71.1/64.5/56.9
chiL2⇒L1, L1-eng     73.4/68.1/61.6   73.8/69.7/62.4     70.7/65.1/58.2   71.0/65.4/57.6
L2-chi, L1-jpn       73.2/65.6/57.3   74.7/68.0/60.7     75.2/68.5/62.8   74.2/68.4/61.5
chiL2⇒L1, L1-jpn     72.9/65.1/58.0   74.0/68.3/60.8     74.6/68.4/59.8   74.1/68.5/60.7

Table: Recall scores of BERT-BiLSTM / ELMo-BiLSTM / Random-BiLSTM.
52 of 60
Some examples
(4) a. 所以大多数的人觉得 受 不 了日本的夏天 。('So most people feel they cannot stand the Japanese summer.')
b. 我还 没有 上小学 的时候。('When I had not yet started primary school.')
c. 我们 不 应该恐怕说错 还有不好意思的事 。('We should not be afraid of saying something wrong or embarrassing.')
I pro-drop
I relative clause
I coordination
53 of 60
Conclusion and future work
I How can we capture meanings of L2s? How can we annotate L2 texts? Are there many differences from annotating L1 texts?
I How badly does an L1-data-trained semantic parser perform? Can state-of-the-art grammatical error correction systems help?
I What role does syntactic parsing play in processing L2 texts? What role does cross-lingual transfer play?
I How can we effectively enlarge annotated corpora?
I What is the best practice to annotate syntactic structures of second languages?
I What types of computational analyses can we develop for second language acquisition?
I Can we assess learners' language capability by annotating their language outputs?
54 of 60
Game over
THANK YOU
Joint work with Yuanyuan Zhao, Mengyu Zhang, Weiqi Wang, Zi Lin, Yuguang Duan
55 of 60
References I
Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics - Volume 1 (pp. 86–90).
Banarescu, L., Bonial, C., Cai, S., Georgescu, M., Griffitt, K., Hermjakob, U., . . . Schneider, N. (2013). Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (pp. 178–186).
Basile, V., Bos, J., Evang, K., & Venhuizen, N. (2012). Developing a large semantically annotated corpus. In Eighth International Conference on Language Resources and Evaluation (pp. 3196–3200).
Bender, E. M., Flickinger, D., Oepen, S., Packard, W., & Copestake, A. (2015). Layers of interpretation: On grammar and compositionality. In Proceedings of the 11th International Conference on Computational Semantics (pp. 239–249).
54 of 60
References II
Berzak, Y., Kenney, J., Spadine, C., Wang, J. X., Lam, L., Mori, K. S., . . . Katz, B. (2016, August). Universal Dependencies for learner English. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 737–746). Berlin, Germany: Association for Computational Linguistics. Retrieved from http://www.aclweb.org/anthology/P16-1070
Cai, S., & Knight, K. (2013). Smatch: an evaluation metric for semantic feature structures. In ACL 2013 (Vol. 2, pp. 748–752).
Chollampatt, S., & Ng, H. T. (2018, February). A multilayer convolutional encoder-decoder neural network for grammatical error correction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.
Díaz-Negrillo, A., Meurers, D., Valera, S., & Wunsch, H. (2010). Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. In Language Forum (Vol. 36, pp. 139–154).
55 of 60
References III
Dickinson, M., & Ragheb, M. (2013). Annotation for learner English guidelines.
Flickinger, D. (1999). The English Resource Grammar.
Fodor, J. A., & Lepore, E. (2002). The compositionality papers. Oxford University Press.
Kingsbury, P., & Palmer, M. (2002). From TreeBank to PropBank. In LREC (pp. 1989–1993).
Koller, A., Oepen, S., & Sun, W. (2019, July). Graph-based meaning representations: Design and processing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts (pp. 6–11). Florence, Italy: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P19-4002
Lee, J., Leung, H., & Li, K. (2017). Towards Universal Dependencies for learner Chinese. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017) (pp. 67–71).
56 of 60
References IV
Lin, Z., Duan, Y., Zhao, Y., Sun, W., & Wan, X. (2018). Semantic role labeling for learner Chinese: the importance of syntactic parsing and L2-L1 parallel data. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 3793–3802).
Mizumoto, T., Komachi, M., Nagata, M., & Matsumoto, Y. (2011). Mining revision log of language learning SNS for automated Japanese error correction of second language learners. In Proceedings of 5th International Joint Conference on Natural Language Processing (pp. 147–155).
Nagata, R., & Sakaguchi, K. (2016). Phrase structure annotation and parsing for learner English. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 1837–1847).
Oepen, S., Flickinger, D., Toutanova, K., & Manning, C. D. (2004). LinGO Redwoods. Research on Language and Computation, 2(4), 575–596.
57 of 60
References V
Oepen, S., & Lønning, J. T. (2006). Discriminant-based MRS banking. In LREC (pp. 1250–1255).
Packard, W. (2013). ACE: the Answer Constraint Engine. URL http://sweaglesw.org/linguistics/ace.
Rabinovich, E., Tsvetkov, Y., & Wintner, S. (2018). Native language cognate effects on second language lexical choice. Transactions of the Association for Computational Linguistics, 6, 329–342.
Ragheb, M., & Dickinson, M. (2012, December). Defining syntax for learner language annotation. In Proceedings of COLING 2012: Posters (pp. 965–974). Mumbai, India. Retrieved from http://cl.indiana.edu/~md7/papers/ragheb-dickinson12.html
58 of 60
References VI
Ragheb, M., & Dickinson, M. (2014). Developing a corpus of syntactically-annotated learner language for English. In Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories (TLT13) (pp. 292–300). Tübingen, Germany. Retrieved from http://cl.indiana.edu/~md7/papers/ragheb-dickinson14b.html
Rosen, V., Meurer, P., De Smedt, K., Butt, M., & King, T. H. (2005). Constructing a parsed corpus with a large LFG grammar. Proceedings of LFG'05, 371–387.
Sorace, A. (2011). Pinning down the concept of "interface" in bilingualism. Linguistic Approaches to Bilingualism, 1(1), 1–33.
Zhao, W., Wang, L., Shen, K., Jia, R., & Liu, J. (2019). Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data.
59 of 60
References VII
Zhao, Y., Jiang, N., Sun, W., & Wan, X. (2018). Overview of the NLPCC 2018 shared task: Grammatical error correction. In CCF International Conference on Natural Language Processing and Chinese Computing (pp. 439–445).
Zhao, Y., Sun, W., Cao, J., & Wan, X. (2020). Semantic parsing for English as a second language. In ACL.
60 of 60