Optimizing Test & Courseware Development
Lisbon 23 April 2016
John De Jong SVP Global Assessment Standards, Pearson
Professor of Language Testing VU University Amsterdam
PISA Programme for International Student Assessment
PISA Development over time
2000: Reading, Mathematics and Science
2003: Reading, Mathematics and Science
2006: Reading, Mathematics and Science
2009: Reading, Mathematics and Science
+ Optional Electronic Reading
2012: Reading, Mathematics and Science
+ Optional Electronic Mathematics
2015: Electronic: Reading, Mathematics and Science
+ Collaborative Problem Solving
2018: Reading, Mathematics and Science
+ Global Competence
Lessons from PISA
Major drivers of success of countries
• Clear standards defined at national level
• High level of teacher autonomy
… then, how to define standards?
Ranking CPS in higher education and workplace

Applied Skill                         Rank Educ   Rank Work
Oral Communications                   3           1
Teamwork / Collaboration              3           1
Problem Solving                       1           2
Written Communications                2           2
Information Technology Application    4           3
Lifelong Learning / Self Direction    2           4
Professionalism / Work Ethic          5           4
Ethics / Social Responsibility        6           4
Creativity / Innovation               3           5
Diversity                             7           6
Leadership                            7           7
Survey results
The CPS definition …                                  Agree %
is clearly described                                  97
matches my own understanding of CPS                   95
will help higher ed institutions to understand CPS    88
will help employers to understand CPS                 100
is what is taught in my country                       52
Crucial reformation targets
• Establish needs
• Define learning objectives
• Define coherent and realistic curriculum
• Engage students
Structural approach to defining objectives
[Figure axes: Difficulty; Domain; Language; Domains of language use / Topics]
Self / personal experience
Negotiating with others
Deal with new
Academic
Specialized
Jokes
GE: A1 A2 B1 B2 C1 C2
AE: General MBA
PE: Waiter Politician
Coherent bank of objectives
A General Model of Language Development
[Figure: “language age” (0, 1, 2, 3, 4, 5, etc.) plotted against “cognitive age” (0, 1, 2, 3, 4, 5, etc.); dimensions labelled General Cognition and Language Proficiency]
Measuring within one population of language learners measures both linguistic and general cognitive development.
Measuring across two populations of language learners may measure cognitive development only.
Including an appropriate native-speaker population can help to measure linguistic development only.
The Global Scale of English
Comparison of PTE Academic (GSE scale) with IELTS and TOEFL iBT
[Figure: score scales for IELTS and TOEFL iBT aligned against the GSE scale]
Sample page (from B1)
The Pearson Syllabus – General English
The need for English
Overview
• A vocabulary framework linked to the Global Scale of English (GSE) and the CEFR
• Organized by topics and subtopics based on the CoE Vantage specifications categorization
• Describing vocabulary targets for learners of general English
• A probabilistic model of productive vocabulary learning
• Based on the principle of incremental learning of word meanings, from basic to specialized
• Including 20k+ lemmas; 37k+ meanings; 80k+ collocations; 7k+ functional units
• Helping learners, teachers, and materials designers identify level-appropriate vocabulary
Methodology
Combines frequency data and teacher judgements via 4 main steps:
1. Corpus of 2.5 billion words > extraction of frequency list
2. Semantic annotation
• Manual tagging of 37k word meanings using the CoE ‘Vantage’ categorization
3. Teacher ratings
• Rating of each of the 37k word meanings by 10 teachers (scale: 1 to 5 + 99)
4. Statistical analysis
• Rank word meanings by combining frequency data and teacher ratings
• Fit the data onto a model, link each meaning to the CEFR/GSE
Lemmas and meanings
Structure vocabulary around pedagogically relevant
sets using the CoE Vantage categorization
Example:
Specific Notions (Topics)
Fork > FOOD&DRINKS_tableware
SPORT&HOBBIES_gardening
TRAVEL_directions
Theoretical assumptions
A model of vocabulary growth based on current literature:
• Basic (A1) > 500-1k words (500 words as minimum elementary level, Hill, 2001; 500-1k as general teaching target)
• Basic (A2) > boundary for high-frequency vocabulary set at 3k families for everyday conversation (Adolphs & Schmitt, 2003)
• Independent (B1) > 5k families to read authentic texts (Schmitt, 2007)
• Independent (B2) > minimum target of 10k lemmas at university level for Dutch (Hazenberg & Hulstijn, 1996); 8-9k families for unassisted comprehension (Nation, 2006)
• Proficient (C1 upwards) > 20k families known by educated L1 speakers (Nation, 2001); 50k words known by most L1 speakers (Crystal, 1981)
Hill, D. R. (2001). Survey: Graded readers. ELT Journal, 55(3), 300-324. Oxford University Press.
Adolphs, S. & Schmitt, N. (2003). Lexical coverage of spoken discourse. Applied Linguistics, 24(4), 425-438.
Schmitt, N. (2007). Current perspectives on vocabulary teaching and learning. In J. Cummins & C. Davison (eds.), International Handbook of English Language Teaching: Part II. NY: Springer, 827-841.
Hazenberg, S. & Hulstijn, J. H. (1996). Defining a minimal receptive second-language vocabulary for non-native university students: An empirical investigation. Applied Linguistics, 17(2), 145-163.
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59-82.
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge: Cambridge University Press, pp. 7-8.
Crystal, D. (1981). Clinical Linguistics. Vienna: Springer.
Data modelling 1
Fitted curve: $y = 0.006\,x^{3.539}$, $R^2 = 0.9842$
[Figure: "From GSE to ModelLem". x-axis: 10 to 90; y-axis: 0 to 60,000. Series: Hypothesis ('CumLem') and Model ('ModelLem')]
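The power curve reported on this slide (y = 0.006 · x^3.539) can be evaluated directly. The sketch below assumes, as the slide title suggests but does not state outright, that x is a GSE value between 10 and 90 and y is the modelled cumulative number of lemmas; the function name `model_lemmas` is hypothetical.

```python
def model_lemmas(gse: float) -> float:
    """Evaluate the fitted power curve y = 0.006 * x**3.539.

    Assumption: x is a GSE value and y the modelled cumulative lemma count.
    """
    if gse <= 0:
        raise ValueError("GSE value must be positive")
    return 0.006 * gse ** 3.539


if __name__ == "__main__":
    # Print the modelled value at a few points along the plotted range.
    for gse in (10, 30, 50, 70, 90):
        print(gse, round(model_lemmas(gse)))
```

Under these assumptions the curve stays within the plotted 0 to 60,000 range up to GSE 90.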
Meanings vs Lemmas
[Figure: average number of meanings per lemma (y-axis, 1.0 to 2.5) by level (<T, T, A1, A2, A2+, B1, B1+, B2, B2+, C1, C2)]
Vocabulary growth
[Figure: "Vocabulary growth by level". y-axis: 0 to 14,000; levels: PreT, T, A1, A2, A2+, B1, B1+, B2, B2+, C1, C2. Series: New Meanings, New Lemmas]
Cumulative vocabulary growth
[Figure: "Cumulative Vocabulary Growth by Level". y-axis: 0 to 60,000; levels: PreT, T, A1, A2, A2+, B1, B1+, B2, B2+, C1, C2. Series: Cumulative Meanings, Cumulative Lemmas]
The vocabulary usefulness rating
1 = Essential words learners would want to acquire first
2 = Important words that become necessary at a next stage
3 = Useful words enabling more detailed and specific language
4 = Nice to have words to express concepts more accurately
5 = Extra words some language users will use occasionally
99 = “Escape” words which are impossible to rate: you have never heard of the word before, or you cannot decide between widely different ratings
Teachers received online training and followed specific guidelines.
Each word was rated by a random 10 of the 19 raters in an overlapping design, using a pre-defined scale of 1-5.
Combine ratings and Frequency data
$$\mathit{Combine} = \frac{\mathit{Ra} \times \mathit{rRating} + \mathit{Frank} \times (1 - \mathit{rRating}) + \mathit{Frank}}{2}$$
Where
Combine is the optimal combination of ratings and Frequency data
Ra is the Rating average
rRating is the Reliability of rating data
Frank is the scaled frequency rank.
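As a minimal sketch of the combination rule defined above: the rating average is weighted by its reliability, the remaining weight goes to the scaled frequency rank, and the result is averaged with the frequency rank once more. The function and parameter names are hypothetical, and the sketch assumes the rating average and the scaled frequency rank are already expressed on the same scale and that reliability lies between 0 and 1.

```python
def combine(rating_avg: float, rating_reliability: float, freq_rank: float) -> float:
    """Combine = (Ra * rRating + Frank * (1 - rRating) + Frank) / 2.

    Assumptions: rating_avg (Ra) and freq_rank (Frank) share one scale,
    and rating_reliability (rRating) is a coefficient in [0, 1].
    """
    if not 0.0 <= rating_reliability <= 1.0:
        raise ValueError("rating_reliability must be between 0 and 1")
    # Reliability-weighted blend of teacher ratings and frequency rank.
    weighted = rating_avg * rating_reliability + freq_rank * (1.0 - rating_reliability)
    # Average the blend with the frequency rank, as in the slide formula.
    return (weighted + freq_rank) / 2.0


if __name__ == "__main__":
    # Illustrative values only, not taken from the source data.
    print(combine(rating_avg=2.0, rating_reliability=1.0, freq_rank=4.0))
```

With perfectly reliable ratings the result is the plain mean of rating and frequency rank; with a reliability of zero it collapses to the frequency rank alone.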
Adjectives in People & relationships [personal traits]
A1: happy (23), good (22)
A2: angry (34), kind (36)
A2+: noisy (39), silly (40)
B1: upset (47), lonely (48)
B1+: confident (51), nasty (53)
B2: creative (59), sympathetic (63)
B2+: kind-hearted (67), spoiled (70)
C1: hypocritical (76), bashful (80)
C2: shifty (86), sycophantic (88)
Fitted curve: $y = -3.8806\,x^2 + 42.05\,x - 24.081$, $R^2 = 0.9974$
[Figure: fitted curve over ratings 1 to 5 (x-axis; labels Essential, Important, Useful, Nice to have, Extra) against values 10 to 90 (y-axis; bands Tourist, A1, A2, B1, B2, C1, C2)]
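The quadratic reported on this slide (y = -3.8806x² + 42.05x - 24.081) can be evaluated as below. This is a sketch under an assumption the slide residue suggests but does not state: that x is the usefulness rating (1 = Essential to 5 = Extra) and y is the corresponding value on the 10 to 90 scale. The function name `rating_to_scale` is hypothetical.

```python
def rating_to_scale(rating: float) -> float:
    """Evaluate y = -3.8806*x**2 + 42.05*x - 24.081.

    Assumption: x is a usefulness rating in [1, 5] and y a value
    on the 10-90 scale shown on the figure's other axis.
    """
    if not 1.0 <= rating <= 5.0:
        raise ValueError("rating must be between 1 and 5")
    return -3.8806 * rating ** 2 + 42.05 * rating - 24.081


if __name__ == "__main__":
    # Evaluate the curve at each whole-number rating.
    for rating in range(1, 6):
        print(rating, round(rating_to_scale(rating), 1))
```

Under that reading, a rating of 1 maps to roughly 14 and a rating of 5 to roughly 89, which spans the plotted range.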
[Figure: likelihood of success (y-axis, 0% to 100%) against GSE task difficulty (x-axis, 10 to 90) for a learner at 25 on GSE; labelled items: "Girl, Mother" and "Boy, Father"]