Learner Corpora in Use: A Taxonomy of Flemish Students’ Errors in Written Dutch Annelies Deveneyns & Jose Tummers Leeds, IVACS 21-22/06/2012
Page 1: Learner Corpora in Use: A Taxonomy of Flemish Students ... · The government is interested in the impact of social network sites, such as Facebook, MySpace, Netlog…, on social life.

Learner Corpora in Use: A Taxonomy of Flemish Students’ Errors in Written Dutch

Annelies Deveneyns & Jose Tummers

Leeds, IVACS 21-22/06/2012

Contents

1. Problem Statement

2. Project

3. Data Gathering

4. Error Analysis

5. Results

6. Conclusion

1. Problem Statement (1/3)

General social perception

• Deteriorating (written) language proficiency amongst youngsters and young adults, including proficiency in Dutch mother tongue

• Prominent position in policy plans Flemish Min. Education

• Tienkamp voor gelijke kansen ‘Decathlon for equal opportunities’ (Vandenbroucke 2007)

• Taalnota ‘Language note’ (Smet 2011)

1. Problem Statement (2/3)

Higher education

• Degrading quality papers: audit, work field

• Language gap between learning outcomes secondary education and (implicit) requirements higher education (Van den Branden 2010; Bogaert & Verheyden 2010)

• Language test (Millenaar 2007; Reijn 2008)

• Screening (Deygers & Kanobana 2010; De Wachter & Heeren 2011)

• Summer classes (Sterckx & Vanhoren 2011)

• Research: corroboration negative perception (Bogaert & Verheyden 2010; Peeters & Van Houtven 2010)

1. Problem Statement (3/3)

Paradox

• Acknowledgement problems mother tongue proficiency

• Research

• Focus on Dutch as Foreign Language

• Little research mother tongue proficiency young adults (Berckmoes & Rombouts 2009; Peeters & Van Houtven 2010)

Research written language

• Focus on process and on reader orientation to the detriment of the text as product (Hyland 2002; Weijen 2009)

• Communicative approach: reduced attention to formal properties

• Research based on fill-in exercises and multiple choice tests

→ Need for objective and authentic data about mother tongue proficiency in higher education

2. Project (1/2)

Goals

• Chart written mother tongue proficiency (Dutch) first year bachelor students at KHLeuven/Leuven University College

• Holistic and analytic scoring

• Error analysis

• Analysis based on proficiency measures

• Empirically based proposals for remediation

2. Project (2/2)

Methodology

• Corpus linguistics: tradition of corpus-based analyses in applied linguistics and language education, mainly in Anglo-Saxon tradition and ESOFL (Hunston 2002; Mukherjee 2006; Römer 2008)

• (Renewed) attention for text

• Readability research (Pitler & Nenkova 2008)

• Automatic text evaluation (Yannakoudakis et al. 2011) and correction (Albert et al. 2009)

• Learner corpora (Granger 2003; Pravec 2002)

3. Data Gathering (1/3)

Spontaneous language use in natural ‘settings’

• No fill-in nor multiple choice (Van Schrooten & De Glopper 1990)

• No ‘language assignment’

• Argumentative text

• Textual type characteristic of EQF-6 (EP & EC 2008)

• Conceptual structure and complexity (vs. narrative texts) (Yau & Belanger 1986)

• Structured around thematic elaboration (Berman & Nir-Sagiv 2007)

• Drawback

• Limited to one text type

• Labor intensive

3. Data Gathering (2/3)

Assignment

• Argumentative and persuasive text for broad audience

• Subject: social network sites

• Duration: max. 60’ for 500 words

• Resources

• Computer

• All resources deemed useful

• Assignment

The government is interested in the impact of social network sites, such as Facebook, MySpace, Netlog …, on social life.

Imagine you are a newspaper journalist who has to write a critical article on the use and the impact of social network sites. Formulate your opinion in a coherent and well-structured text of about 500 words. Convince your audience of your point of view, which can be positive or negative.

You have 60 minutes to formulate your opinion. You can use all resources deemed useful.

3. Data Gathering (3/3)

Sample

• Cluster sampling (McDaniel & Gates 2007)

• Leuven University College: 13 programs

• 1 group/class per program

• Administration

• Semester 2 in 1st bachelor

• Scheduled in timetable selected groups/classes

• Corpus: 346 texts

• Error analysis: random sample of 100 texts

4. Error Analysis (1/5)

Goal

• Error identification

• Error coding

→ Development course materials (Nesselhauf 2004)

→ Consciousness awakening

Research questions

1. What are the most frequent errors?

• Corpus frequency (CF)

• Recurrence of error within same text

2. What are the most typical errors?

• Document frequency (DF)

• Spread over texts, viz. students

4. Error Analysis (2/5)

Definition ‘error’

• No sinecure (Tono 2003)

• Vague and diffuse

• Reference to mother tongue

• Reference linguistic feeling mother tongue speaker

Stepwise solution

• “an unsuccessful bit of language” (James 1998)

• Error coding scheme in style learner corpora (Granger 2003; Pravec 2002): 55 codes resulting from incremental process

• Linguistic information: spelling, lexicon, syntax, textual grammar, document structure

• Error type: erroneous use, redundancy, missing, order, …

4. Error Analysis (3/5)

Normative reference point

• Actual language use: variation

• Codified norm: objective and replicable

• Spelling & Lexicon: Groene Boekje, Van Dale (EGVD 14.7)

• Grammar: ANS (Haeseryn et al. 1997)

• Drawback: stricter norm than actual language use

• Young adults who have successfully finished secondary education: high level of (written) language proficiency assumed (cf. SE)

• Gradation of error level (Albert et al. 2009): difficult to objectivize

4. Error Analysis (4/5)

Coding scheme

1. Spelling

2. Punctuation

3. Capitals

4. Lexicon

5. Grammar

5.1 Syntax

5.2 Textual structure

6. Document structure
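
The bracketed codes in the results matrix further on (for example <G.T.SC> or <P.M>) appear to mirror this scheme, with the first letter naming the top-level category. A minimal sketch of how such a tag could be decoded is given here; the letter-to-category mapping is an assumption inferred from the order of the scheme, not something stated on the slides, and `parse_code` is a hypothetical helper.

```python
# Assumed mapping from the first letter of an error code to the six
# top-level categories of the coding scheme (inferred, not from the slides).
TOP_LEVEL = {
    "S": "Spelling",
    "P": "Punctuation",
    "C": "Capitals",
    "L": "Lexicon",
    "G": "Grammar",
    "D": "Document structure",
}


def parse_code(tag: str) -> tuple[str, list[str]]:
    """Split a tag such as '<G.T.SC>' into (top-level category, sub-codes)."""
    parts = tag.strip("<>").split(".")
    return TOP_LEVEL[parts[0]], parts[1:]


print(parse_code("<G.T.SC>"))  # ('Grammar', ['T', 'SC'])
print(parse_code("<S>"))       # ('Spelling', [])
```

The sub-codes are left uninterpreted on purpose, since the slides do not list the full set of 55 codes.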

4. Error Analysis (5/5)

Quantitative measures

• DF (document frequency) error: percentage texts in sample in which error occurs

• CF (corpus frequency) error: average number of occurrences of error per 100 words

• Standardized score

• Computed based on the documents in which the error actually occurs: significant influence of documents without error on CF (95% CI of the difference = [0.02; 0.29]; t = 2.2019, df = 107.048, p = 0.02982)
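
The two measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the toy data, the code labels and the function names are hypothetical, and CF is assumed to be the per-text rate (occurrences per 100 words) averaged over texts. The `only_affected` switch shows the difference between averaging over all texts and averaging only over the texts in which the error occurs.

```python
from collections import Counter

# Hypothetical toy sample: each text has a word count and the list of
# error codes assigned by the annotator (one entry per occurrence).
texts = [
    {"words": 500, "errors": ["P.M", "P.M", "L.E", "S"]},
    {"words": 400, "errors": ["P.M", "S", "S"]},
    {"words": 250, "errors": ["L.E"]},
    {"words": 500, "errors": []},
]


def document_frequency(code, texts):
    """Share of texts in which the error code occurs at least once."""
    hits = sum(1 for t in texts if code in t["errors"])
    return hits / len(texts)


def corpus_frequency(code, texts, only_affected=True):
    """Mean number of occurrences of the code per 100 words.

    With only_affected=True the mean is taken over the texts that
    contain the error; with False, texts without the error count as 0.
    """
    rates = []
    for t in texts:
        n = Counter(t["errors"])[code]
        if n == 0 and only_affected:
            continue
        rates.append(100 * n / t["words"])
    return sum(rates) / len(rates) if rates else 0.0


print("DF:", document_frequency("P.M", texts))
print("CF (texts with the error):", round(corpus_frequency("P.M", texts), 4))
print("CF (all texts):", round(corpus_frequency("P.M", texts, only_affected=False), 4))
```

Including the error-free texts pulls CF down for every code that is not present everywhere, which is the effect the significance test on the slide refers to.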

5. Results (1/13)

Overview statistics

• Spread errors over texts (viz. students)

• Spread errors over error types

5. Results (2/13)

Overview statistics CF and DF scores over students (cumulative, viz. all errors taken together)

5. Results (3/13)

Overview statistics CF and DF scores over error codes (average)

5. Results (4/13)

Errors per linguistic category

Category             DF    CF
Spelling             0.90  0.63
Punctuation          0.98  2.21
Use of capitals      0.24  0.25
Lexicon              1.00  2.85
Syntax               1.00  2.97
Textual grammar      1.00  3.98
Document structure   1.00  0.95

5. Results (5/13)

Refinement – Punctuation

Error type   DF    CF
Missing      0.93  1.21
Erroneous    0.93  1.06
Redundant    0.21  0.36

5. Results (6/13)

Refinement – Lexicon

Error type              DF    CF
Non-existing word       0.47  0.34
Erroneous use           0.98  1.76
Erroneous combination   0.48  0.25
Redundant               0.56  0.47
Flemish                 0.16  0.27
Contamination           0.05  0.26
Pleonasm                0.18  0.24
MSN language            0.33  0.37
Borrowings              0.30  0.33
Totum pro parte         0.54  0.38
Pars pro toto           0.04  0.17

5. Results (7/13)

Refinement – Syntax

Error type                                 DF    CF
Word order: inversion                      0.24  0.37
Word order: V end group                    0.20  0.23
Word order: other                          0.86  0.52
Valence: transitivity                      0.05  0.17
Valence: PP                                0.57  0.29
Valence: completive clause                 0.02  0.19
Auxiliary verb                             0.31  0.20
Tempus                                     0.43  0.37
Gender                                     0.03  0.35
Inflection: V-dt                           0.43  0.26
Inflection: V-congr                        0.46  0.28
Inflection: N                              0.28  0.23
Inflection: A                              0.21  0.24
Inflection: other                          0.33  0.27
Redundance                                 0.82  0.60
Erroneous POS                              0.37  0.34
Missing words                              0.94  0.76
Erroneous contraction (Dutch: samentrekking)  0.52  0.27

5. Results (8/13)

Refinement – Textual grammar

Error type                  DF    CF
Anaphora                    0.96  1.24
Erroneous links             0.98  1.04
New referent                0.98  1.22
Coordinating conjunction    0.73  0.47
Subordinating conjunction   0.26  0.23
Pronominal adverb           0.19  0.22
Relative pronoun            0.28  0.29
Conjunctive adverb          0.28  0.24

5. Results (9/13)

Refinement – Document structure

Error type                       DF    CF
Title: absent                    0.16  n/a
Paragraph: absent                0     n/a
Paragraph: erroneous             0     n/a
Structure: intro                 0     n/a
Structure: body                  0     n/a
Structure: conclusion            0     n/a
Style: colloquial                0.30  0.34
Style: block letters             0.24  0.40
Style: stream of consciousness   0.79  0.51
Style: quotes                    0.02  0.20
Style: repetitive                0.53  0.35
Style: formal                    0.37  0.37

5. Results (10/13)

Error taxonomy: relation between DF and CF

• Correlation (Pearson): 0.837785

• Categorical

• DF: 4 groups

• DF < 0.25

• 0.25 ≤ DF < 0.50

• 0.50 ≤ DF < 0.75

• DF ≥ 0.75

• CF: skewed distribution

• CF < Decile 5

• Decile 5 ≤ CF < Decile 8

• Decile 8 ≤ CF < Decile 9

• CF ≥ Decile 9
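
The grouping above can be sketched as follows. The (DF, CF) pairs are copied from the refinement tables, but only a subset of the codes is used, so the decile cut points and the correlation printed here will not match the values on the slides. The function names are hypothetical, and the use of `statistics.quantiles` with its default method is an assumption; the slides do not say how the deciles were computed.

```python
import statistics

# (DF, CF) pairs copied from the refinement tables; a subset only, so
# the decile cut points differ from those behind the slides.
scores = {
    "Punctuation: missing": (0.93, 1.21),
    "Punctuation: erroneous": (0.93, 1.06),
    "Punctuation: redundant": (0.21, 0.36),
    "Lexicon: erroneous use": (0.98, 1.76),
    "Lexicon: redundant": (0.56, 0.47),
    "Lexicon: pars pro toto": (0.04, 0.17),
    "Syntax: missing words": (0.94, 0.76),
    "Syntax: tempus": (0.43, 0.37),
    "Textual grammar: anaphora": (0.96, 1.24),
    "Textual grammar: relative pronoun": (0.28, 0.29),
}


def df_group(df):
    """Index 0..3 of the DF group: <0.25, <0.50, <0.75, >=0.75."""
    return sum(df >= cut for cut in (0.25, 0.50, 0.75))


def cf_group(cf, deciles):
    """Index 0..3 of the CF group, cut at deciles 5, 8 and 9."""
    cuts = (deciles[4], deciles[7], deciles[8])
    return sum(cf >= cut for cut in cuts)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


dfs = [df for df, _ in scores.values()]
cfs = [cf for _, cf in scores.values()]
deciles = statistics.quantiles(cfs, n=10)  # nine cut points: deciles 1..9

for name, (df, cf) in scores.items():
    print(f"{name:35s} DF group {df_group(df)}  CF group {cf_group(cf, deciles)}")
print("Pearson r on this subset:", round(pearson(dfs, cfs), 3))
```

Cutting CF at deciles rather than at fixed values keeps the four CF groups meaningful even though the distribution is skewed towards low values.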

5. Results (11/13)

Columns: DF < 0.25 | 0.25 ≤ DF < 0.50 | 0.50 ≤ DF < 0.75 | DF ≥ 0.75

CF < Decile 5

<C.M> <L.PPT> <G.V> <G.T.A> <G.T.PA> <D.S.B> <D.S.Q> <D.T> <D.P> <D.MS>

<G.AV> <G.T.SC>

Decile 5 ≤ CF < Decile 8

<C.E> <L.F> <L.C> <L.P> <G.T.RP> <P.R>

<L.EC> <L.B> <L.U> <L.MSN> <G.T> <G.G> <G.POS> <D.S.C> <D.S.F>

<L.R> <L.TTP> <G.T.CC> <G.EC> <D.S.R>

Decile 8 ≤ CF < Decile 9

<S> <G.WO> <G.I> <G.R> <G.MW> <D.S.S>

CF ≥ Decile 9

<P.M> <P.E> <L.E> <G.T.A> <G.T.L> <G.T.R>

5. Results (12/13)

• Textual grammar: referential elements

• Punctuation

• Erroneous use

• Missing

• Lexicon: erroneous use

5. Results (13/13)

• Spelling

• Syntax

• Word order

• Missing elements

• Redundant elements

• Document: stream of consciousness

6. Conclusion (1/3)

General

• A lot of errors, in all texts in the sample

• Relation between error recurrence (CF) and document frequency (DF): small number of highly recurrent errors

• Great diversity of errors in texts

• Reminder: very strict criteria

6. Conclusion (2/3)

Errors

• (Highly) frequent errors concern ‘markers’

• Textual grammar: referential markers

• Syntax

• Lexicon (erroneous word use)

• Punctuation

• Negative impact on

• Textual understanding (Pitler & Nenkova 2008)

• Textual evaluation (Kloet et al. 2003)

• Persuasion (McCroskey & Mehrley 1969)

• Image author (Burgoon & Miller 1985)

6. Conclusion (3/3)

Recommendations Dutch language teaching in Flemish higher education

• Formal complement to communicative approach

• Knowledge: master formal aspects of language and several formal written text styles

• Skills: write error-free and well-structured texts

• Attitude: understand importance correct language use and implications sloppy written language

• Inclusive language teaching

• Ubiquitous position language as communication tool in curriculum: embedment in non-language modules

• Train the (non-language) teacher

• Work on the student’s knowledge, skills and attitudes

• Uniform language framework and evaluation criteria, with progressively growing severity

Any questions …

[email protected]

[email protected]
