
Direkt Profil: an automatic analyzer of texts written in French as a second language

Jonas Granfeldt (1), Pierre Nugues (2), Suzanne Schlyter (1), Malin Ågren (1), Edin Kukovic (1), Emil Persson (1), Jonas Thulin (2), Lisa Persson (2), Fabian Kostadinov (3)

(1) Lund University, Centre for languages and literature, French

(2) Lund Institute of Technology, Department of Computer Science

(3) University of Zürich, Department of Computer Science

http://profil.sol.lu.se

[email protected]

Page 2

OUTLINE

• Introduction
  – The idea
  – Rationale
  – The knowledge bases
  – Demo
• Theoretical background
  – Developmental sequences and developmental stages in L2 French
• Method
  – CEFLE - The development corpus
  – The Direkt Profil system
    • Overview of the system
    • Annotation
    • Defining profiles/stages with machine learning
• Results
  – Annotation
  – Defining profiles/stages
  – Example of an applied study with Direkt Profil
    • Direkt Profil and teachers’ assessments: a correlation study
• Conclusion
  – Problems
  – Future work

Page 3

• The idea was…
  – To provide researchers, teachers and learners with an easy-to-use tool for overall diagnostic assessment of developmental stage.
  – To base the assessment on current research on second language acquisition.
  – To automatically provide feedback to teachers and learners on language level and central target features of the language.
  – To use learners’ free written production as the basis of assessment (rather than cloze tests).

INTRODUCTION

Page 4

Rationale

• Language acquisition is a process which follows a specific and definable order.
• Learners and teachers want to know about the progress the learners make.
• Instruction is probably most effective if it is adapted to the learners’ present developmental level (cf. the Teachability Hypothesis, Pienemann, 1985).

INTRODUCTION

Page 5

The knowledge bases for the project

• Second Language Research
• Linguistics (French)
• Natural Language Processing
• Engineering

INTRODUCTION

Page 6

An example: a learner text from the corpus

<CORPUS>
<SAMPLE SUBJECT_ID="XXXX">
<TEXT>C'est deux personne, une fille et sa mère. La fille est grand et elle a une robe blue. Sa mère est petite mais grosse et elle a une robe vert. Elles va à L'Italie dans ses vacances. La fille pense à les garcons italien et sa mere pense du soleil. Elles sont derière un table avec une map. Elles boire des café. Leur voiture est vert. La voiture est trés petite est la bagage n'est pas fit. Maintenant elles à destination D'Italie. Elles check in. Le monsiuer fait une ronde tête est une grand moustache. Leur chambre est beau avec deux lis est une trés beaux vue. Elle est sur la plage. Sur la mere il y a des bateaux. Elles fait du soleil. Dans la soir elle a dîner dans une restaurant. À côté il y a un garcon avec une costume blue. Après le diner elles boire du vin rouge dans la bar. Les deux garcon d'italien ils voir la mère et sa fille. Ils sont d'amour. Ils parlent et boire de alcohol. Aprés ils fait du dancing. Le jour aprés ils fait du sightseeing avec Tony et son autobus rouge. Il est bold. Après le sightseeing ils visite un marche. La dame grosse a une hat rouge. Le monsieur grand a un hat noir. La fille grand amour le garcon petite mais grosse. Sur le soir ils separé - le grand monsieur avec la petite mais grosse dame et la grand fille avec le petite mais trés grand monsieur. Le jour après ils revenir a Suede avec les deux monsieurs. </TEXT>
<INFO TASK_NAME="VOYAGE_ITALIE" GROUP_SUBJECT="MAIN" SUBJECT_LEVEL="2" SOURCE_SCHOOL="XXXX"/>
</SAMPLE>

INTRODUCTION
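A corpus entry in this layout can be read with a standard XML parser. The following is a minimal sketch, not the project's own loader: the element and attribute names come from the slide, while the shortened text, the added closing </CORPUS> tag and the function name read_samples are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for one corpus entry; the closing </CORPUS> tag is
# added here so that the snippet is well-formed (the slide shows an excerpt).
SAMPLE_XML = """<CORPUS>
<SAMPLE SUBJECT_ID="XXXX">
<TEXT>C'est deux personne, une fille et sa mère. La fille est grand et elle a une robe blue.</TEXT>
<INFO TASK_NAME="VOYAGE_ITALIE" GROUP_SUBJECT="MAIN" SUBJECT_LEVEL="2" SOURCE_SCHOOL="XXXX"/>
</SAMPLE>
</CORPUS>"""


def read_samples(xml_string):
    """Return one dict per <SAMPLE> with its text and metadata."""
    root = ET.fromstring(xml_string)
    samples = []
    for sample in root.iter("SAMPLE"):
        info = sample.find("INFO")
        samples.append({
            "subject_id": sample.get("SUBJECT_ID"),
            "task": info.get("TASK_NAME"),
            "level": info.get("SUBJECT_LEVEL"),
            "text": sample.findtext("TEXT").strip(),
        })
    return samples


samples = read_samples(SAMPLE_XML)
print(samples[0]["task"], samples[0]["level"])
```

The learner text is kept exactly as written, errors included, since those errors are what the analysis measures.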

Page 7

DEMO HERE

INTRODUCTION

Page 8

French L2 in a developmental perspective

• Many projects since the 1980s (examples)
  – ESF project (Perdue, 1993; L2 French, different L1s)
  – InterFra project (Bartning, 1997 and later) (Swedish L1)
  – FIFI/DURS project (Schlyter, 1986 and later; Granfeldt, 2003) (Swedish L1)
  – Myles & Mitchell, Myles (2002 and later) (FLLOC project, English L1)
• Empirical objectives of this research:
  – arrive at rich and empirically valid descriptions of how French interlanguage develops over time.
  – identify features at different linguistic levels which are developmentally related.
• Some syntheses are emerging:
  – Bartning & Schlyter (2004): a proposal of six stages of development.
  – Véronique et al. (2009): a proposal of three stages.

THEORETICAL BACKGROUND

Page 9

Benchmarking grammatical development of French L2 (Bartning & Schlyter, 2004)

• Objectives:
  • Describe developmental sequences in French L2 for a number of morphosyntactic phenomena
  • Establish general learner stages/profiles with respect to grammatical development
• Data:
  • Oral corpora of French L2 (L1 = Swedish).
  • Post-puberty learners (N=35, 80 recordings)
• Method:
  • Frequency analysis and linguistic profiling
  • Manual and semi-automated tagging of transcriptions

THEORETICAL BACKGROUND

Page 10

A model with 6 profiles/stages (sample)

| Stage | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| % finite lexical verbs (elle boit vs elle boire) | 50-75 | 70-80 | 80-90 | 90-98 | 100 | 100 |
| % 3rd pers. plural irreg. lexical verbs (elles vient vs elles viennent) | _ | _ | a few cases | 50 | few errors | 100 |
| Tense use | Pres. | Pres. (P.C.) | Pres., P.C., (Impf) | Pres., P.C., Impf | Pres., P.C., Impf, P-Q-P, Cond | Pres., P.C., Impf, P-Q-P, Cond, P.Simp. |
| % gender agreement (NP art + N) | 55-75 | 60-80 | 65-85 | 70-90 | 75-95 | 90-100 |

Stage groups (low to high): Initial, Intermediate, Advanced

Granfeldt (2003); Bartning & Schlyter (2004)
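The percentage ranges in this table overlap between neighbouring stages, so a single feature value is usually compatible with more than one stage. A small sketch of that reading follows; the ranges are copied from the table, while the function names and the idea of intersecting two features are illustrative assumptions and not the classification method used by Direkt Profil.

```python
# Ranges copied from the table above, as (low, high) percentages per stage.
FINITE_VERB_RANGES = {1: (50, 75), 2: (70, 80), 3: (80, 90),
                      4: (90, 98), 5: (100, 100), 6: (100, 100)}
GENDER_AGREEMENT_RANGES = {1: (55, 75), 2: (60, 80), 3: (65, 85),
                           4: (70, 90), 5: (75, 95), 6: (90, 100)}


def candidate_stages(value, ranges):
    """Stages whose range contains the observed percentage."""
    return [stage for stage, (low, high) in sorted(ranges.items())
            if low <= value <= high]


def stages_consistent_with(finite_pct, gender_pct):
    """Stages compatible with both features at once (hypothetical helper)."""
    finite = set(candidate_stages(finite_pct, FINITE_VERB_RANGES))
    gender = set(candidate_stages(gender_pct, GENDER_AGREEMENT_RANGES))
    return sorted(finite & gender)


print(candidate_stages(72, FINITE_VERB_RANGES))   # 72% finite verbs fits stages 1 and 2
print(stages_consistent_with(72, 58))             # adding 58% gender agreement narrows it
```

Values that fall in a gap between ranges (for example 99% finite verbs) match no stage, which is one reason a single hand-written range check is a weak classifier on its own.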

Page 11

Direkt Profil

• Objectives:
  • To implement the model of Bartning & Schlyter (2004)
  • To develop an easy-to-use system for automated annotation, extraction and frequency analysis of as many as possible of the features in B&S’s work
  • To develop a system for defining developmental stages/profiles
• Method:
  • Constructing an interlanguage partial parser for L2 French
  • Connecting the parser to a module for machine learning
  • Constructing an interface
• We have expanded on B&S’s original work with respect to:
  • Type of data (written rather than oral)
  • Quantity of data
  • Additional features (more morphosyntactic features, lexical and quantitative features)

METHOD

Page 12

Overview of Direkt Profil

Page 13

The development corpus CEFLE

CEFLE corpus

| Task name | Elicitation type | Words |
|---|---|---|
| Homme | Pictures | 17260 |
| Souvenir | Pers. narrative | 14365 |
| Italie | Pictures | 30840 |
| Moi | Pers. narrative | 30355 |
| Total | | 92820 |

Selection of CEFLE analyzed

| Stage | Text length | Sent. length |
|---|---|---|
| Stage 1 (N=23) | 78 | 6.9 |
| Stage 2 (N=98) | 161 | 8.4 |
| Stage 3 (N=97) | 212 | 9.8 |
| Stage 4 (N=58) | 320 | 11.6 |
| Control (N=41) | 308 | 15.2 |

• CEFLE: Corpus Ecrit de Français Langue Etrangère
• 400 texts written under controlled conditions by 85 Swedish and 22 French students (317 texts used here), 4 texts per learner.
• Manual assignment of “stage” to one text from each learner using B&S criteria (Voyage en Italie)

Granfeldt, Nugues et al. (2006)

Page 14

ANNOTATION

• We developed an annotation scheme based on the B&S (2004) framework.
• The concept of a noun group or verb group is the grammatical representation of most phenomena in this framework.
  • Essential to the Direkt Profil annotation
• Many syntactic annotation frameworks for French take this into consideration
  – An example from Gendner et al. (2004): et mademoiselle qui <NV> appelait </NV> au secours ! ... ou plutôt non , <NV> on ne l' entendait </NV> plus ... <NV> elle était </NV> peut-être morte ...
• This annotation, however, makes no provision for the specific details in the B&S framework
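To make the group-based annotation concrete, the verb groups marked with <NV> in the example above can be pulled out with a regular expression. This is only an illustrative sketch; the function name verb_groups is made up here and nothing is implied about how the cited framework or Direkt Profil processes such markup.

```python
import re

# The annotated example quoted on the slide (Gendner et al., 2004).
ANNOTATED = ("et mademoiselle qui <NV> appelait </NV> au secours ! ... "
             "ou plutôt non , <NV> on ne l' entendait </NV> plus ... "
             "<NV> elle était </NV> peut-être morte ...")


def verb_groups(text):
    """Return the content of every <NV>...</NV> span, trimmed."""
    return [match.strip() for match in re.findall(r"<NV>(.*?)</NV>", text)]


groups = verb_groups(ANNOTATED)
print(groups)
```

The markup delimits the groups but records nothing about agreement or finiteness inside them, which is the gap the slide points to.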

Page 15

ANNOTATION (cont’d)

• The Direkt Profil annotation is an XML-based markup, split into 5 levels:

1. Tokenisation
2. Identification of prefabricated structures (c’est; je m’appelle etc.)

Page 16

ANNOTATION (cont’d)

3. POS-tagging (Det, Prep, Pron, V(être/avoir), Conj)
4. Group detection/chunking: rule-based (decision tree), using a set of grammatical words (« mots vides », Tesnière, 1959; Vergne, 1998)
5. Chunk classification: rule-based feature checking between elements.
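The first levels of such a pipeline can be sketched in a few lines. Everything below is a toy under stated assumptions: the word lists, the function names and the rule "a grammatical word opens a group that takes the next word" are invented for illustration, and the real system's rules and lists are not given in the slides.

```python
import re

# Tiny illustrative word lists (assumptions, not the system's actual lists).
PREFABS = [("c'", "est"), ("je", "m'", "appelle")]
DETERMINERS = {"le", "la", "les", "un", "une", "des"}
SUBJECT_PRONOUNS = {"je", "tu", "il", "elle", "on", "nous", "vous", "ils", "elles"}


def tokenise(text):
    """Level 1: split into tokens, keeping elided forms such as c' separate."""
    return re.findall(r"\w+'|\w+|[^\w\s]", text.lower())


def mark_prefabs(tokens):
    """Level 2: merge known prefabricated sequences into single units."""
    units, i = [], 0
    while i < len(tokens):
        for prefab in PREFABS:
            if tuple(tokens[i:i + len(prefab)]) == prefab:
                units.append(("PREFAB", " ".join(prefab)))
                i += len(prefab)
                break
        else:  # no prefab starts at position i
            units.append(("TOKEN", tokens[i]))
            i += 1
    return units


def open_groups(units):
    """Level 4, heavily simplified: a grammatical word opens a two-word group."""
    groups = []
    for kind, form in units:
        if kind == "TOKEN" and form in DETERMINERS:
            groups.append(["NP", form])
        elif kind == "TOKEN" and form in SUBJECT_PRONOUNS:
            groups.append(["VP", form])
        elif groups and kind == "TOKEN" and form.isalpha() and len(groups[-1]) < 3:
            groups[-1].append(form)
    return groups


print(open_groups(mark_prefabs(tokenise("Ils parlons dans la bar"))))
```

POS-tagging (level 3) and chunk classification (level 5) are left out; they would need the dictionary and the feature-checking rules, neither of which is reproduced here.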

Page 17

The sentence Ils parlons dans la bar is annotated as:

<segment class="c5148"><tag pos="pro:nom:pl:p3:mas"> Ils</tag> <tag pos="ver:impre:pl:p1"> parlons </tag></segment>
dans
<segment class="c3071"> <tag pos="det:fem:sg">la</tag> <tag pos="nom:mas:sg">bar</tag> </segment>

c5148 reads: “Lexical verb/Present tense/3rd.pers.PL/no_agreement”

c3071 reads: “Det_Noun_NP/singular_det/without_gender_agreement”

– Features are finally counted and raw occurrences are converted to percentages (where relevant)
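The counting step can be sketched as follows. The annotation is the one shown on the slide, wrapped in a <sentence> root element (an assumption, needed only so that it parses as a single XML document); the totals used for the percentage are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Slide annotation wrapped in an assumed <sentence> root element.
ANNOTATED = """<sentence>
<segment class="c5148"><tag pos="pro:nom:pl:p3:mas">Ils</tag> <tag pos="ver:impre:pl:p1">parlons</tag></segment>
dans
<segment class="c3071"><tag pos="det:fem:sg">la</tag> <tag pos="nom:mas:sg">bar</tag></segment>
</sentence>"""

# Readings of the two class codes, as given on the slide.
CLASS_LABELS = {
    "c5148": "Lexical verb/Present tense/3rd.pers.PL/no_agreement",
    "c3071": "Det_Noun_NP/singular_det/without_gender_agreement",
}


def count_classes(xml_string):
    """Count how often each segment class occurs in the annotated text."""
    root = ET.fromstring(xml_string)
    return Counter(segment.get("class") for segment in root.iter("segment"))


def percentage(part, whole):
    """Convert a raw count to a percentage, guarding against empty totals."""
    return 100.0 * part / whole if whole else 0.0


counts = count_classes(ANNOTATED)
# Hypothetical text-level totals: 8 Det+Noun NPs, 2 of them without gender agreement.
gender_agreement_pct = percentage(8 - 2, 8)
print(counts, gender_agreement_pct)
```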

Page 18

The dictionary

• The engine uses a dictionary of French inflected forms, freely available from the Association des Bibliophiles Universels (ABU).
• We have corrected and complemented it, and converted it to XML.
• We have also added frequency-of-use information from the Lexique database (New, Pallier & Ferrand, 2005)
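One use of such a dictionary is to measure how many word forms in a learner text are not found in it. The sketch below assumes a hypothetical mini-lexicon that maps inflected forms to lemmas; the entries, the layout and the function names are illustrative and do not reflect the actual ABU-derived XML file.

```python
import re

# Hypothetical mini-lexicon: inflected form -> lemma. The real dictionary is far larger.
LEXICON = {
    "une": "un",
    "robe": "robe",
    "sa": "son",
    "mère": "mère",
    "est": "être",
    "petite": "petit",
}


def words(text):
    """Lower-cased alphabetic tokens (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def unknown_word_pct(text, lexicon):
    """Percentage of tokens whose form is not in the lexicon."""
    tokens = words(text)
    if not tokens:
        return 0.0
    unknown = [token for token in tokens if token not in lexicon]
    return 100.0 * len(unknown) / len(tokens)


# "blue" (from the learner text) is not a French form, so it counts as unknown.
print(unknown_word_pct("une robe blue", LEXICON))
```

Frequency-of-use values could be stored alongside each lemma in the same mapping; none are shown here to avoid inventing figures.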

Page 19

DEFINING STAGES/PROFILES

• Using the criteria in Bartning & Schlyter (2004), two researchers manually classified 82 texts of the sub-corpus Le voyage en Italie (part of CEFLE).
• The classification was subsequently re-used for all texts from the same learner, resulting in 317 classified texts.
• We trained classifiers using automatically extracted phenomena as features representing the learners’ texts.
• Currently 142 phenomena (features/attributes) are used when establishing a learner’s profile/stage.
• We used C4.5 (Quinlan, 1986), LMT (Landwehr et al., 2003), and Support Vector Machines (Boser et al., 1992) from the Weka collection (Witten & Frank, 2005)
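Weka reads training data in its ARFF text format, so one plausible way to hand extracted features to the classifiers is to write each text as one ARFF data row. The sketch below is an assumption about the plumbing, not a description of the project's code; the attribute names are invented and the feature values are made up, with only two of the 142 features shown.

```python
def to_arff(relation, feature_names, rows, classes):
    """Build an ARFF document: numeric features plus a nominal class attribute."""
    lines = ["@relation " + relation, ""]
    for name in feature_names:
        lines.append("@attribute %s numeric" % name)
    lines.append("@attribute stage {%s}" % ",".join(classes))
    lines += ["", "@data"]
    for values, label in rows:
        lines.append(",".join(str(value) for value in values) + "," + label)
    return "\n".join(lines) + "\n"


# Invented values for two texts; real feature vectors would have 142 entries.
FEATURES = ["pct_gender_agreement", "avg_sentence_length"]
ROWS = [([62.0, 6.9], "1"), ([88.0, 9.8], "3")]
CLASSES = ["1", "2", "3", "4", "control"]

arff = to_arff("direkt_profil", FEATURES, ROWS, CLASSES)
print(arff)
```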

Page 20

RESULTS

Page 21

Annotation

[Figure: Direkt Profil (v.1.5.1) Recall and Precision. Chart of recall and precision (0% to 100%) for Sta. 1, Sta. 2, Sta. 3, Sta. 4 and Contr.]

Granfeldt, Nugues et al., 2005
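For reference, recall and precision for an annotation task are computed from the counts of correct, spurious and missed annotations. The sketch below uses the standard definitions with made-up counts; it does not reproduce the figures from the chart.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    """Standard precision, recall and balanced F-measure from raw counts."""
    found = true_positives + false_positives      # everything the system annotated
    expected = true_positives + false_negatives   # everything it should have annotated
    precision = true_positives / found if found else 0.0
    recall = true_positives / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up counts: 6 correct annotations, 2 spurious, 6 missed.
precision, recall, f_measure = precision_recall_f(6, 2, 6)
print(precision, recall, f_measure)
```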

RESULTS

Page 22

CLASSIFICATION using all features

Granfeldt & Nugues, 2007

RESULTS

Page 23

A sample decision tree

% NPs with gender agreement <= 93
| % nominative pronouns <= 4: 1 (7.0/1.0)
| % nominative pronouns > 4
| | % NPs with num+gen agreement <= 94: 1 (2.0)
| | % NPs with num+gen agreement > 94
| | | % pluperfect verbs in S-V agreement <= 0
| | | | S-V agreement w/ modal verbs <= 10
| | | | | Average sentence length <= 15
| | | | | | % of the next 2,000 words <= 0: 1 (2.0/1.0)
| | | | | | % of the next 2,000 words > 0
| | | | | | | % D-N-A in agreement <= 0: 2 (11.0)
| | | | | | | % D-N-A in agreement > 0
| | | | | | | | % D-A-N in agreement <= 50
| | | | | | | | | % of the next 2,000 words <= 1: 2 (8.0/1.0)
| | | | | | | | | % of the next 2,000 words > 1
| | | | | | | | | | % prepositions <= 9
| | | | | | | | | | | % vbs in the imperfect <= 0
| | | | | | | | | | | | % mod+inf verbs in S-V agreement <= 33: 2 (4.0)
| | | | | | | | | | | | % mod+inf verbs in S-V agreement > 33: 3 (3.0/1.0)
| | | | | | | | | | | % vbs in the imperfect > 0: 3 (2.0)
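In Weka's tree printout, a line ending in ": 1 (7.0/1.0)" is a leaf: the predicted class is 1, 7 training texts reached that leaf, and 1 of them was misclassified. A short sketch for pulling those numbers out of the text follows; the regular expression and function name are illustrative assumptions, applied to the first lines of the tree above.

```python
import re

# Matches leaf lines such as "| % nominative pronouns <= 4: 1 (7.0/1.0)".
LEAF = re.compile(
    r"^(?P<depth>(?:\| )*)(?P<test>.+?): (?P<stage>\S+) "
    r"\((?P<n>[\d.]+)(?:/(?P<errors>[\d.]+))?\)$"
)

TREE_LINES = [
    "% NPs with gender agreement <= 93",
    "| % nominative pronouns <= 4: 1 (7.0/1.0)",
    "| % nominative pronouns > 4",
    "| | % NPs with num+gen agreement <= 94: 1 (2.0)",
]


def leaves(lines):
    """Return (depth, test, stage, instances, errors) for every leaf line."""
    result = []
    for line in lines:
        match = LEAF.match(line)
        if match:  # internal nodes have no ": class (n)" suffix and are skipped
            depth = match.group("depth").count("|")
            errors = float(match.group("errors") or 0.0)
            result.append((depth, match.group("test"), match.group("stage"),
                           float(match.group("n")), errors))
    return result


for leaf in leaves(TREE_LINES):
    print(leaf)
```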

Page 24

Attribute selection

• We ran an attribute selection procedure in order to identify the best features at this point.
• To evaluate the 142 attributes, we measured the information gain for each attribute with respect to the class. This method is derived from ID3 and is part of the Weka software.

Top 10 features according to InfoGain metric

| Average merit | Feature |
|---|---|
| 0.4371 | % Determiner Noun agreement (gender errors) |
| 0.3351 | % Unknown words (i.e. not in dictionary) |
| 0.3232 | % NPs with gender agreement (including adjectives) |
| 0.2925 | Average sentence length |
| 0.2565 | % Prepositions (out of all parts-of-speech) |
| 0.2082 | % S-V agreement with modal verbs followed by infinitive |
| 0.1953 | % Noun Adjective with agreement (gender and number) |
| 0.1793 | % S-V agreement with auxiliary in passé composé |
| 0.1739 | % S-V agreement with être/avoir 3ppl (all tenses) |
| 0.153 | % K1Tokens (out of all tokens) |

Granfeldt & Nugues, 2007
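Information gain measures how much knowing an attribute reduces the entropy of the class. The sketch below computes it for a numeric attribute split at a single threshold, which is a simplification chosen for illustration: Weka's evaluator handles numeric attributes with its own discretisation, and the data here are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the values at the threshold."""
    left = [label for value, label in zip(values, labels) if value <= threshold]
    right = [label for value, label in zip(values, labels) if value > threshold]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


# Invented data: % gender agreement for four texts and their stages.
VALUES = [60, 65, 90, 95]
STAGES = ["1", "1", "4", "4"]
print(information_gain(VALUES, STAGES, 80))    # split separates the stages perfectly
print(information_gain(VALUES, STAGES, 100))   # split separates nothing
```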

Page 25

Results after feature selection

(top 20 attributes)

Granfeldt & Nugues, 2007

Page 26

Direkt Profil and teachers’ assessment: a correlation study

• An example of an applied study with Direkt Profil
• Several scholars have suggested that work on developmental sequences and stages could be used as a means of assessing the language development of a particular individual at a given time (Clahsen, 1985; Pienemann & Johnston, 1987; the Rapid Profile program, Pienemann & Mackay, 1992; Brindley, 1998)

Page 27

Research questions

1. What is the correlation between the developmental stage and teachers’ assessments of the same texts? (RQ1)

2. To what extent can the developmental stage predict teachers’ ranking of a particular text? (RQ2)

Page 28

Method

- 50 texts from the CEFLE corpus (Ågren, 2005) were selected (Task: Le voyage en Italie picture series)
- The learner texts had previously been manually analysed according to developmental stage following the criteria in B&S

| Stage 1 (man) | Stage 2 (man) | Stage 3 (man) | Stage 4 (man) | Natives |
|---|---|---|---|---|
| 10 texts | 10 texts | 10 texts | 10 texts | 10 texts |

The texts were also analysed by Direkt Profil, resulting in two separate indications of developmental stage (manual and automated)

Page 29

Method (cont’d)

7 experienced upper secondary school teachers rated the 50 texts on a six-point scale (6 = highest level)

They were asked to assess the texts in three domains:
(a) “Form”, i.e. language (grammar, lexicon, spelling etc.)
(b) “Content and Communication” (content in relation to the pictures, the communicative success of the text)
(c) “Overall”, i.e. combining a and b (in a way they found suitable)

For each assessment, the teachers also stated the degree of certainty with which they had rated the text (on a five-point scale, where 5 indicated completely certain and 1 indicated completely uncertain)

Page 30

RESULT: Median and distribution of ratings for form (language)

Granfeldt & Ågren, 2009

Page 31

RESULT: Inter-rater agreement between teachers

| Assessment | Krippendorff’s α (Kalpha) |
|---|---|
| Form | 0.738 |
| Content and communicative success | 0.749 |
| Overall | 0.750 |

Granfeldt & Ågren, 2009

Page 32

RESULT: Correlating developmental stage and teachers’ assessments

| | Assessment of form (median) | Assessment of content/communicative functions (median) | Overall assessment (median) |
|---|---|---|---|
| Developmental stage (man) (natives excluded) | 0.908 | 0.902 | 0.883 |
| Direkt Profil (natives excluded) | 0.872 | 0.876 | 0.865 |
| Instructional level (natives excluded) | 0.774 | 0.780 | 0.776 |

Answering Research Question 1:

The developmental stage is better correlated with the assessments of the teachers than instructional level is.

Granfeldt & Ågren, 2009
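The slides do not name the correlation coefficient. Since both stages and ratings are ordinal, a rank correlation such as Spearman's rho is one plausible choice; the sketch below implements it in plain Python under that assumption, with invented data and hypothetical function names.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1          # mean position of the tied block, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                       sum((b - mean_y) ** 2 for b in y))
    return covariance / spread


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: stages of five texts and the median rating each received.
stages = [1, 1, 2, 3, 4]
ratings = [1, 2, 3, 3, 5]
print(round(spearman(stages, ratings), 3))
```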

Page 33

RESULT: Regression analysis

| Predictor | Overall assessment (r²) |
|---|---|
| Developmental stage (excl. natives) | 0.735 |
| Direkt Profil (excl. natives) | 0.703 |
| Instructional level (excl. natives) | 0.566 |

Answering Research Question 2:

Approx. 70% of the variance in the teachers’ ranking of the texts can be explained by the developmental stage as analysed by Direkt Profil.
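The "variance explained" reading comes from the coefficient of determination: r² is the share of the variation in the ratings that a fitted line accounts for, so r² = 0.703 corresponds to roughly 70%. A minimal sketch with a one-predictor least-squares fit follows; the data are invented and the exact regression model used in the study is not stated in the slides.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) /
             sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    total = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - residual / total


# Invented example: predicted stage for three texts and the rating each received.
print(r_squared([1, 2, 3], [1, 2, 2]))
```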

Page 34

Conclusion

• We have presented a system for assessment of developmental stage/profile in French as a second language.
  – The system implements the current theory of stages/profiles of development in French.
• The system consists of
  – an interlanguage partial parser for French L2 called Direkt Profil and
  – a machine-learning module connected to it.
• Results:
  – An evaluation of the annotation showed mixed results, depending very much on the developmental stage of the writer.
  – Results from classification experiments show:
    • Best results with a 3-stage classification: a mean F of 0.82
    • Stage 1 is the most problematic
    • The texts from the natives are relatively easy to classify: a mean F of 0.91
    • A large feature set does not seem to be necessary (at least not for this data)
    • Using an attribute/feature selection method, we have identified a list of “10 best attributes”

Page 35

Problems

“Briefly, the language produced by learners is about the worst imaginable type of language for NLP.” (Tschichold, 2007)

– Lexical spelling (orthographe lexicale) is a problem: incorrect forms lead to increased ambiguity and to incorrect annotation.
– Attribute selection is not sufficiently studied.
– Amount of data is still insufficient.

Page 36

Future work

• Optimising annotation:
  • Procedures to address the spelling problem
  • Review the rules
  • Ongoing student tests with a stochastic parser (trained on the Le Monde corpus)
• Adding more texts from higher stages of development
• Expanding to other languages (Italian L2)
• Continue working with other assessment schemes, i.e. the Common European Framework of Reference (Granfeldt, 2008)

Page 37

• Thank you for your attention!
• Direkt Profil is free to use
• Available at this address:
  – http://profil.sol.lu.se
• Acknowledgments

The profiling team in Lund: Pierre Nugues, Suzanne Schlyter, Malin Ågren, Edin Kuckovic, Emil Persson, Fabian Kostadinov, Lisa Persson

This work was supported by the Swedish Research Council, grant number 2004-1674

