Optimizing EFL Vocabulary Learning with IRT and Online Technology
Barrier-Free Vocabulary Project
Dr. Brent Culligan, EFL Instructor, Aoyama Gakuin Women’s College; Senior Scientist, Lexxica [email protected]
Dr. Charles Browne, Professor of Linguistics, Meiji Gakuin University; Co-founder, Lexxica [email protected]
Outline
1. Review key concepts of lexical coverage
2. How to create special purpose lexicons
3. How to identify each learner’s unknown words
4. Word Engine spaced repetition learning tools
5. Introducing V-Lexx - text analysis and control
Challenge #1
EFL learners are not meeting and acquiring enough high-frequency vocabulary to achieve sufficient lexical coverage of English
Part One
Review the key concepts of lexical coverage
Key coverage thresholds
Coverage describes the percentage of words that are known in a given text
Below 80 percent coverage, reading comprehension is almost impossible (Hu & Nation, 2001)
At 95 percent coverage, it becomes possible to read without the help of dictionaries (Laufer, 1989)
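The coverage definition above can be illustrated with a minimal Python sketch. It assumes coverage is counted over running words (tokens) and uses a simple regular-expression tokenizer with no lemmatization; the function name `lexical_coverage` and the sample word list are hypothetical and not part of the original slides.

```python
import re

def lexical_coverage(text, known_words):
    """Return the fraction of running words in `text` that appear in `known_words`."""
    # Assumption: lowercase alphabetic tokens, with an optional apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# Hypothetical learner who knows every word in the sample except "milled".
sample = "The forests are milled at the earliest opportunity"
known = {"the", "forests", "are", "at", "earliest", "opportunity"}
coverage = lexical_coverage(sample, known)
print(f"{coverage:.1%}")  # 7 of 8 running words are known: 87.5%
```

A real analysis would lemmatize the tokens first, so that inflected forms count toward the same headword.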
80 percent coverage
12 of 58 words missing
If * planting rates are * with planting targets satisfied in each * and the forests are * at the earliest opportunity, the * wood supplies could further increase to about 36 million * meters * in the * 2001-2015. The additional * wood supply should greatly * * *, even if much is used for energy production.
95 percent coverage
1 of 58 words missing
If current planting rates are maintained with planting targets satisfied in each region and the forests are milled at the earliest opportunity, the available wood supplies could further increase to about 36 million * meters annually in the period 2001-2015. The additional available wood supply should greatly exceed domestic requirements, even if much is used for energy production.
Relationship between high-frequency words and coverage
HF Words    Coverage    Research
1           7%          West (53), Nation (90)
100         50%         West (53), Nation (90)
1000        75%         West (53), Engels (68)
2000        85%         West (53), Nation (90)
4200        95%         Culligan (08)
Latest analysis of 1.2 trillion words in corpora identified the 4200 high-frequency words that provide 95 percent coverage of general texts
These 4200 words are the most direct route to attaining 95 percent coverage of general English
4200 words = 95 Percent Coverage
[Figure: lexical coverage (0 to 100%) plotted against the number of high-frequency words (0, 2000, 4200). The essential 2000 words are marked, and coverage reaches 95% at 4200 words.]
Challenge #2
Specific purpose vocabulary for the TOEIC and TOEFL exams is quite different from the vocabulary of general English
How can we help learners acquire high-frequency vocabulary for specific purposes?
Part Two
Creating special purpose lexicons
(The relatively easy part)
Examples of special purpose lexicons: TOEIC, TOEFL
Steps:
1. Assemble digital or scanned text
2. Filter out junk text
3. Implement standards of form and lemmatization
4. Implement Frequency Indexation rules
5. Organize words by coverage and cumulative coverage
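The steps for creating a special purpose lexicon can be sketched in Python as follows. This is only an illustration under simplifying assumptions: junk filtering is reduced to keeping alphabetic tokens, lemmatization is a hypothetical lookup table (`lemma_map`), and the "Frequency Indexation rules" are approximated by a plain frequency ranking. None of these details come from the original slides.

```python
import re
from collections import Counter

def build_lexicon(corpus_text, lemma_map=None):
    """Rank lemmas by frequency and report coverage and cumulative coverage."""
    lemma_map = lemma_map or {}
    # Steps 1-2 (simplified): keep only lowercase alphabetic tokens.
    tokens = re.findall(r"[a-z]+", corpus_text.lower())
    # Step 3 (simplified): map inflected forms to a headword where known.
    lemmas = [lemma_map.get(token, token) for token in tokens]
    counts = Counter(lemmas)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # Steps 4-5: order by frequency (ties broken alphabetically) and accumulate.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for rank, (word, count) in enumerate(ranked, start=1):
        coverage = count / total
        cumulative += coverage
        rows.append((rank, word, count, coverage, cumulative))
    return rows

corpus = "The meeting starts at nine. The meetings start late."
rows = build_lexicon(corpus, {"meetings": "meeting", "starts": "start"})
for rank, word, count, coverage, cumulative in rows:
    print(f"{rank:>2} {word:<8} {count} {coverage:.1%} {cumulative:.1%}")
```

Reading down the cumulative column shows how many headwords are needed to reach a given coverage target for that corpus.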
Corpus provenance
TOEIC 3461
TOEFL 5290
Analysis of 787,382 total words from 300 top-selling TOEIC preparation textbooks and 1000 TOEIC practice exams
Analysis of 1.24 million total words from 350 top-selling TOEFL preparation textbooks and 1300 TOEFL practice exams
The TOEIC and TOEFL corpora were prepared in cooperation with Compass Publishing
Lexicons by cumulative coverage
[Figure: TOEIC lexicon by cumulative coverage, up to 99%.]
Challenge #3
Words are not learned in order of their frequency within a specific purpose; rather, they are learned in order of their difficulty within each specific culture
Which high-frequency words are unknown to a given student in any particular country?
Part Three
Identifying each learner’s unknown high-frequency words
(The relatively difficult part)
Lexxica uses item response theory (IRT) to identify the statistical difficulty of each word
IRT models are defined by:
1. Number of estimated item parameters
2. Number of estimated person parameters
3. Number of intermediate steps in estimated parameters
4. Dimensionality
One item parameter, one person parameter model
$P(U = 1 \mid \theta, b)$
“The probability of an item being correct, P(U = 1), is conditioned on the ability of the student (θ), and the difficulty of the item (b).”
$$\theta = \ln\left(\frac{P(\theta)}{1 - P(\theta)}\right)$$
Where P(θ) is the probability of a correct response given by a student with ability θ.
$$b = \ln\left(\frac{1 - P(\theta)}{P(\theta)}\right)$$
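A one item parameter, one person parameter model can be sketched in a few lines of Python. The sketch assumes the standard logistic (Rasch) form, in which the log-odds of a correct response equal θ − b; the slides do not spell out the combined formula, so this form and the function names are assumptions.

```python
import math

def p_correct(theta, b):
    """Probability of a correct response for ability theta and item difficulty b."""
    # Assumed Rasch form: log-odds of success = theta - b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# When ability equals difficulty, the chance of a correct response is 50%.
print(p_correct(1.33, 1.33))
# With the difficulty fixed at 0, the log-odds recover the ability.
print(logit(p_correct(1.5, 0.0)))
```

Under this form, an item is "easier" when its difficulty b is lower, because the same student then has a higher probability of answering correctly.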
Item characteristic curves
[Figure: item characteristic curves for the words labelled stop, rage, and burn, plotted on a difficulty axis running from easier to more difficult.]
Probability can be interpreted as:
1. The number of items of the same or lower difficulty that are likely to be known to a student of a given ability
2. The number of students of the same ability who are likely to know an item, or set of items, of known difficulty
Taking advantage of item probability
1. Each word has an associated difficulty metric
2. The probability that any given word is known depends on the ability of the student
3. A student’s coverage of a specific purpose depends on the ability of the student and the difficulty of the subject’s high-frequency words
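The three points above suggest a way to estimate coverage without testing every word: weight each word's probability of being known by how often it occurs. The Python sketch below assumes the logistic one-parameter form and a lexicon given as (frequency, difficulty) pairs; the function names and the sample values are hypothetical.

```python
import math

def p_known(theta, b):
    """Probability that a word of difficulty b is known at ability theta (assumed Rasch form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_coverage(theta, lexicon):
    """Frequency-weighted probability of knowing a running word.

    `lexicon` is a list of (frequency, difficulty) pairs for one purpose.
    """
    total = sum(frequency for frequency, _ in lexicon)
    if total == 0:
        return 0.0
    weighted = sum(frequency * p_known(theta, b) for frequency, b in lexicon)
    return weighted / total

# Hypothetical mini-lexicon: frequent easy words and rarer hard words.
lexicon = [(500, -2.0), (120, 0.0), (55, 2.34), (25, 1.33)]
for theta in (0.0, 1.5, 3.0):
    print(theta, round(expected_coverage(theta, lexicon), 3))
```

Expected coverage rises with ability, and the same ability gives different coverage for lexicons with different difficulty profiles.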
An example of how frequency does not predict difficulty
Item                                                 injured    hurt
Frequency (average occurrences per million words)    25x        55x
Difficulty                                           1.33       2.34
If difficulty were correlated to frequency then hurt would be the easier word, because hurt occurs more frequently in English texts
Japanese people with a 1600-word vocabulary will know injured
Japanese people with a 2500-word vocabulary will know hurt
V-Check uses lexical decision tasks to identify the user’s ability
injured: easy word
hurt: not so easy word
ghart: many non-words are used to control for guessing
kohl: is it real, or not?
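How yes/no responses to words of known difficulty might be turned into an ability estimate can be sketched as follows. This is not V-Check's actual procedure, which the slides do not describe. It assumes the logistic one-parameter model and a simple grid search for the maximum-likelihood ability, and it leaves out the non-word items used to control for guessing. All names and values are hypothetical.

```python
import math

def p_known(theta, b):
    """Probability that a word of difficulty b is known at ability theta (assumed Rasch form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, responses):
    """Log-likelihood of (difficulty, said_yes) responses at ability theta."""
    total = 0.0
    for b, said_yes in responses:
        p = p_known(theta, b)
        total += math.log(p if said_yes else 1.0 - p)
    return total

def estimate_ability(responses, lo=-6.0, hi=6.0, steps=1201):
    """Return the grid point between lo and hi that maximizes the likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, responses))

# Hypothetical test: the learner recognizes the three easiest words only.
responses = [(-1.0, True), (0.0, True), (1.0, True), (2.0, False), (3.0, False)]
theta_hat = estimate_ability(responses)
print(round(theta_hat, 2))
```

A grid search is used here because it stays well behaved when a learner answers every item the same way; in that case the estimate simply lands on the edge of the grid.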
V-Check identifies which specific words are known and reports coverage by each purpose: General English, TOEFL, TOEIC, Interchange
V-Check creates personal target word lists
[Figure: words ranked from 1 (high-frequency) to 20,000 (low-frequency), with marks at 1,000, 2,000, and 10,000, divided into known words and a personal list of unknown words. Coverage goal: 99%.]
Challenge #4
How can we transfer word knowledge from short-term to long-term memory?
Part Four
Word Engine spaced repetition learning tools
(Available June 2008)
High-speed learning tools menu
All Word Engine learning tools utilize a spaced repetition system to build long-term retention
Words are repeated at increasing time intervals until fully acquired
Spaced repetition engine and database
Time intervals based on research by Ebbinghaus (1885), Leitner (1972), Pimsleur (1967), and Mondria (1994)
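The idea of repeating words at increasing intervals until they are acquired can be illustrated with a small Leitner-style scheduler in Python. The interval values, the rule that a miss sends a word back to the first box, and the function name `review` are all assumptions for illustration; the slides do not give Word Engine's actual intervals or rules.

```python
from datetime import date, timedelta

# Hypothetical review intervals, in days, for boxes 0 to 4.
INTERVALS_DAYS = [1, 2, 4, 8, 16]

def review(box, correct, today):
    """Return (new_box, next_due_date, acquired) after one review of a word."""
    if not correct:
        # A miss sends the word back to the first box for early review.
        return 0, today + timedelta(days=INTERVALS_DAYS[0]), False
    if box + 1 >= len(INTERVALS_DAYS):
        # A correct answer in the last box counts as fully acquired.
        return box, None, True
    new_box = box + 1
    return new_box, today + timedelta(days=INTERVALS_DAYS[new_box]), False

# A word answered correctly three times and then missed.
box, today = 0, date(2008, 6, 1)
for correct in (True, True, True, False):
    box, due, acquired = review(box, correct, today)
    print(box, due, acquired)
    today = due
```

Each correct answer pushes the next review further away, while a miss brings the word back for review the following day.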
Flashcards focus on comprehension
SightWords focuses on visual automaticity
SoundBubbles focuses on aural automaticity
All learning tools provide Session Reports
Challenge #5
Reading is a great way for learners to develop control of their vocabulary, grammar, and style
But reading materials have too many difficult words for learners – even the graded readers!
Part Five
V-Lexx supports text analysis and editing for extensive graded reading
(Available September 2008)
V-Lexx is a web-based application for creating lexically graded reading materials
V-Lexx analyses text coverage by ability and identifies words that are too difficult
Reading and practice materials may be edited so as to provide 95% coverage for any level of ability
V-Lexx displays the edited stories in channels (Available from Sept 2008)
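Identifying the words in a text that are too difficult for a given ability level could look roughly like the Python sketch below. It is an illustration only, not V-Lexx's implementation. It assumes the logistic one-parameter model, a hypothetical difficulty table, and a rule that flags any word whose probability of being known falls below a threshold. Words missing from the table are flagged as well.

```python
import math
import re

def p_known(theta, b):
    """Probability that a word of difficulty b is known at ability theta (assumed Rasch form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def flag_difficult_words(text, difficulty, theta, threshold=0.5):
    """Return the sorted set of words in `text` that a reader is unlikely to know."""
    flagged = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        b = difficulty.get(token)
        # Assumption: a word with no difficulty estimate is treated as too difficult.
        if b is None or p_known(theta, b) < threshold:
            flagged.add(token)
    return sorted(flagged)

# Hypothetical difficulty values for a four-word text.
difficulty = {"the": -3.0, "forests": 0.5, "are": -2.5, "milled": 3.0}
flagged = flag_difficult_words("The forests are milled", difficulty, theta=1.0)
print(flagged)  # ['milled']
```

An editor could then replace or gloss the flagged words until the remaining text reaches the target coverage for that ability level.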
Thank you!
Go to: www.lexxica.com
To download our vocabulary research
To use the free Word Engine software
To see links to other vocabulary support sites
To download this presentation
Dr. Charles Browne, Professor of Linguistics, Meiji Gakuin University; Co-founder, Lexxica [email protected]
Dr. Brent Culligan, EFL Instructor, Aoyama Gakuin Women’s College; Senior Scientist, Lexxica [email protected]