
PowerPoint PresentationTitle PowerPoint Presentation Author Tom Cobb Created Date 12/1/2019 10:43:53...

Date post: 07-Feb-2021
Transcript
  • https://lextutor.ca/nancy_present.pdf

    1

  • 2

  • Word families vs. lemmas as the counting unit in text coverage research – summary of the debate and resolution

    • The acquisition of vocabulary is primary in all aspects of language learning, and vocabulary is only manageable through computational analysis of spoken or written texts or corpora. This presentation will look at some issues in analysing vocabulary and learning to read in English.

    Text analysis requires some sort of grouping of words, the two main grouping principles being the word family and the lemma. Families include inflected and derived forms (analyse and analysis), lemmas only inflected (analyse and analyses). The word family is a pedagogical extension of the lemma, and has been used extensively in testing and coverage research into the amount of vocabulary that must be known for various kinds of reading.

    • Lately, however, this research has been challenged by supporters of the lemma, on the grounds that learners cannot be assumed to know all the derived forms of a typical word family. But do they have to? Are there enough derived forms in use in typical texts to affect the coverage research results? Morpholex is a text analysis program that was developed to answer these questions.

    • The presentation will be mainly in and about English, but connections to French and learning French will be elaborated.

    3


  • Outline of presentation

    Definitions
    • Family
    • Lemma
    • Coverage

    Family’s role in coverage research

    The lemmatizers’ challenge
    • Their coverage methodology
    • Retooled

    Methodology
    • Development of Morpholex
    • Making of a mini-corpus
    • Typical reading materials at 4 levels

    Results
    • What proportion of these texts are derived forms?
    • How many individual derived forms are involved?

    Next chapter
    • Nuclear lists

    5

  • Background issues

    Interpreting corpora in language pedagogy

    Complementarity of corpus + empirical findings

    Language teachability
    • And raising it through text analysis

    Coding as research

    6

  • Two kinds of groupings ~ Family v. lemma

    … expanded into lists of 1,000 (families or lemmas)

    7

  • 8

    1k by family: a, an, able, ability, abler, ablest, ably, abilities, unable, inability, about, absolute, absolutely, absolutist, absolutists, accept, acceptability, acceptable, acceptably, unacceptable, acceptance, accepted, accepting, accepts, unacceptably, account, accounted, accounting

    1k by lemma: a, an, able, abler, ablest, about, absolute, absolutes, absolutest, accept, accepted, accepting, accepts, account, accounts, accounted, accounting, achieve, achieved, achieving, achieves, across, act, acts, acted, acting, active, actives

    Lemma is
    • Base (or head) word
    • + Inflections

    Family is
    • Base (or head) word
    • + Inflections
    • + Derivations

    Derivation can involve
    • Change of POS (able → ability)
    • Change of meaning (able → unable)
    • Big change to base word (able → ability)
    • Few ‘rules’

    Inflection involves none of these (able → abler)
    • Hence ‘easier’

    Where do all these variant forms come from?
    • A corpus
    • Frequency > x
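The difference between the two groupings can be made concrete with a minimal sketch. It uses only the forms of "able" shown on the slide; the dictionary layout and the function names are illustrative assumptions, not the format of any actual Lextutor list.

```python
# Hypothetical mini-lists for one head word, using forms shown on the slide.
LEMMA = {"able": ["able", "abler", "ablest"]}
FAMILY = {"able": ["able", "abler", "ablest", "ability", "abilities",
                   "ably", "unable", "inability"]}

def build_lookup(groups):
    """Map every listed word form to its head word."""
    return {form: head for head, forms in groups.items() for form in forms}

def group_tokens(tokens, groups):
    """Return the head word for each token, or None if the form is not listed."""
    lookup = build_lookup(groups)
    return [lookup.get(token.lower()) for token in tokens]

tokens = ["able", "ability", "unable", "ablest"]
by_lemma = group_tokens(tokens, LEMMA)    # derived forms are not recognized
by_family = group_tokens(tokens, FAMILY)  # every form maps to "able"
```

Under the lemma grouping, "ability" and "unable" fall outside the "able" entry and would have to be listed as separate items; under the family grouping all four tokens count as one known word.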

  • 9

    1 2

  • And in French ~ French still uses lemmas (though maybe not for long)

    10

    âgée, though similar, is at a different k-level

    école does not include écolier

    économie and économique are both 1k, but count as 2 items

  • Why do we need groupings?

    Computationally
    • A modern corpus is millions of individual words (‘tokens’)
    • Impossible to manage individually

    Pedagogically
    • If a learner knows “cat” there is no reason to treat “cats” as a new word
    • But: “catty”?
    • Syntax: different part of speech
    • Semantics: rarely applied to cats themselves
    • Students may know cat and not really know catty
    • It is a question of how much to include in the groupings

    11

  • Why do we need groupings?

    • To profile the frequency of words in texts
    • In a clear and useful manner

    • And discover the lexical challenge of different texts
    • Especially in conjunction with vocabulary tests employing the same measure

    • What do you notice in this profile?

    12

  • Coverage – the magic numbers 95 and 98

    • Coverage = the extent to which a certain word list ‘covers’ (accounts for, contains) a given percentage of a text or corpus

    • The coverage points of pedagogical interest have been determined by empirical (not computational) research:
    • Texts can be comprehended with resources when 95% of individual word tokens are known
    • 95% typically corresponds to 5,000 families known

    • Texts can be comprehended independently when 98% of the individual word tokens are known
    • 98% typically corresponds to 8,000 families known

    • Those are the words learners need

    13

    Average text: 95% = c. 5,000 word families; 98% = c. 8,000 word families
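As a rough illustration of the coverage definition, the sketch below computes the percentage of running tokens found in a known-word set and maps it onto the 95% and 98% points. The toy sentence, the known-word set and the function names are assumptions made for the example; a real profiler would match tokens against family or lemma lists.

```python
def coverage(tokens, known_words):
    """Percentage of running tokens that appear in known_words."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token.lower() in known_words)
    return 100.0 * known / len(tokens)

def reading_band(pct):
    """Map a coverage percentage onto the two thresholds from the slide."""
    if pct >= 98:
        return "independent"
    if pct >= 95:
        return "with resources"
    return "below threshold"

# Toy text of 13 tokens; "mat" and "log" are the two unknown tokens.
text = "the cat sat on the mat and the dog sat on the log".split()
known = {"the", "cat", "sat", "on", "and", "dog"}
pct = coverage(text, known)  # 11 of 13 tokens are known
```

Coverage is counted over tokens, not types, so a frequent word such as "the" contributes to the percentage every time it occurs.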

  • So is this a ‘difficult’ or ‘easy’ text?

    14

    Average text: 95% = c. 5,000 word families; 98% = c. 8,000 word families

  • 15

  • How many words do learners know? For this we use receptive family-k-level-based testing

    16

    Typically, % score at a level × 1,000 families = learners’ receptive knowledge at that level

  • Now we can match ‘knows’ with ‘needs to know’ for particular texts (types)

    17

  • For example

    Suppose a learner’s score on the VST is this:

    1k = 80%, 2k = 70%, 3k = 0%

    And the text he is reading profiles like this (a typical situation):

    80% at 1k, 10% at 2k, 10% at 3k

    Then his 1k knowledge gives him 80% × 80% = 64% of the words in the text

    And his 2k knowledge gives him 70% × 10% = 7% of the words in the text

    So this learner is reading a text with 64% + 7% = 71% of its words known

    • While research suggests 95% is minimally needed

    • What does it feel like to read a text at 71% coverage? →

    18
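The worked example on this slide can be reproduced in a few lines. The figures are the ones given above; the function and variable names are hypothetical.

```python
def estimated_coverage(test_scores, text_profile):
    """Sum over k-levels of (share of the level known) x (share of the text at that level)."""
    return sum(test_scores.get(level, 0.0) * share
               for level, share in text_profile.items())

# Learner's test score per k-level and the text's profile per k-level (from the slide).
vst_scores = {"1k": 0.80, "2k": 0.70, "3k": 0.0}
text_profile = {"1k": 0.80, "2k": 0.10, "3k": 0.10}

known_share = estimated_coverage(vst_scores, text_profile)  # 0.64 + 0.07 + 0.00
gap_to_95 = 0.95 - known_share
```

The estimate treats test performance at a level as the probability of knowing any token from that level, which is the same simplification the slide makes.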

  • 71% coverage is far from 95%

    19

  • This research, however, is based on the family as the unit of word-counting

    • It would not be the same if we used the lemma as the counting unit

    • Look at 2 profiles → for the same text (Rex M.)

    • The difference being (presumably) ± the derived word forms

    20

  • As in text, so in a corpus

    21

  • Families & lemmas: pedagogical pros & cons

    FAMILY

    • Whole family is together in one place
    • Lemma will put very similar forms in widely separated lists
    • ADAPT k=3
    • ADAPTATION k=6
    • ADAPTABLE k=12
    • Seems inefficient
    • Lemma underestimates the learner

    FAMILY

    • A small number of family lists cover any text or even corpus
    • Exhaustive - every word will get classified by the profiler
    • While lemma needs many lists to profile a text
    • 50% more
    • With lots of redundancy
    • Probably unusable in practice
    • But family overestimates the learner?

    22

  • The issue: 100 of the best L2 reading studies in recent years…

    • Are based on the word family
    • E.g., Nation, Laufer, Schmitt, Grabe, …

    • But is it a convenience to use this unit or a principled decision?
    • Family = smaller set of lists; tidier computer output; easier for practitioners to understand; matches common sense (1k = speech, 3k+ = text, etc.)

    23

  • A group of researchers, primarily in Japan…

    • Working with learners who have little contact with English outside the classroom
    • And a very examination-driven approach to language learning

    • These researchers strongly dispute the use of the word family

    • These researchers argue that no knowledge beyond the lemma can be assumed in their learners
    • run, runs, running (assumed known)
    • runner ×, a run × (not assumed)

    • And therefore the coverage research, based on assumed word-family knowledge, does not describe most Japanese learners
    • Or many other learners worldwide

    24

  • → A serious problem

    Which has occupied vocab research conferences for the past five years

    25

  • The problem could be language interference

    • L1-L2 differences in affixation
    • Japanese affixation is like compounding?
    • Both parts remain identifiable

    • English can twist the base word quite severely to add an affix
    • Able - ability
    • Particularly affecting the pronunciation
    • Pronounce - pronunciation
    • Such that the base word becomes less identifiable

    • French may be less problematic here
    • “L’accent tonique” (tonic stress) means the base word is not lost?
    • In orthography or pronunciation
    • Fr: Science - scientifique
    • Eng: ScIence - scIenTIFic

    26

  • Fam v. Lem could have been just one more interminable debate…

    Of the type we know so well

    Except that one of the Japan researchers, Dale Brown, pointed to a way forward

    • Brown asked whether/how much derived forms are in fact used in texts
    • Hoping to show they are used a great deal
    • To explain his learners’ weak reading ability

    • Specifically, are derived forms needed to reach the 95% and 98% coverage points?
    • If yes, then the family-based coverage research does not apply to learners who know only lemma forms
    • For the words they know at all

    • A brilliant idea to measure word forms’ contributions to coverage
    • Except that Brown didn’t go all the way

    27

  • What Brown did: Method ~ For the first 5,000 head words in Nation’s BNC-COCA family lists ~

    • He took a random 100-head-word sample from each 1,000
    • (= 500 head words total)

    • Then looked up all the forms for each family in the online British National Corpus
    • Several look-ups for each family
    • Since BNC is lemmatized
    • … adding up the frequency figure for all inflected and derived forms in each family

    • (This must resemble Paul Nation’s original fleshing out of these lists as families – but for 500, not 25,000, words)

    28

  • 29

  • Example

    30

    Form          BNC frequency
    Able          29,657
    Ability        9,054
    Abilities      1,324
    Abler              0
    Ably              96
    Unable         6,134
    Inability      1,087
    Inabilities        5
    TOTAL         47,357

    So
    • Total word-forms: 47,357
    • Derived forms: 17,700 (all but ‘able’ and ‘abler’)
    • Percent derived forms: 37.37%

    And so on for 500 random families
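Brown's per-family calculation can be checked with a short sketch. The frequencies are the ones listed on this slide; the split into lemma forms and derived forms follows the slide's own rule (everything but "able" and "abler" counts as derived), and the variable names are illustrative.

```python
# BNC frequencies for the "able" family, as listed on the slide.
bnc_counts = {
    "able": 29657, "ability": 9054, "abilities": 1324, "abler": 0,
    "ably": 96, "unable": 6134, "inability": 1087, "inabilities": 5,
}
lemma_forms = {"able", "abler"}  # base word plus its inflection(s)

total = sum(bnc_counts.values())
derived = sum(count for form, count in bnc_counts.items()
              if form not in lemma_forms)
percent_derived = 100.0 * derived / total
```

The same loop would then be repeated over the 500 sampled families and the percentages averaged.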

  • So if c. 35% of a given family in the British National Corpus consists of derivations…

    • Then a learner who knew all the words in the corpus
    • But only as lemmas, not as derivations
    • Would be reading this corpus with c. 65% coverage
    • And comprehension would be low

    • But: do learners read a corpus?
    • No

    • Can what’s in a corpus be extrapolated to what’s in its constituent texts?
    • A good question
    • A clue to its answer is in another part of the BNC output →

    31

  • BNC is c. 100 million word tokens

    Comprising c. 4,000 texts
    • In 100 text-types
    • Of c. 25,000 words each

    • Output tracks each search term back to individual texts
    • So the derived form ‘ability’ is in 2,090/4,048 = 52% of BNC’s texts

    • So far, so good for Brown’s argument

    32

  • But BNC also gives the distribution of these texts in different parts of the corpus

    E.g., ‘ability’

    • Barely present in speech, or in fiction
    • Fiction: 438 hits in >16 million words
    • News: 718 hits in >9 million words

    • This raises the question
    • Can whole-corpus coverages be described as general?
    • And specifically: do they represent the type of texts that ESL learners typically read?

    33

  • How can we track derivations in texts that learners typically read?

    • Finding typical texts is no problem
    • Beginners - graded readers
    • Intermediates - novels and newspapers
    • Advanced and non-native TESL trainees - academic research articles

    But how to count up the proportion of derived forms in these texts?
    • Specifically, how to determine if there are enough to undermine 95% and 98% coverage for learners who know only lemmas
    • (For the words they know)

    34

  • Enter Morpholex

    Formerly a minor Lextutor routine

    • A “list profiler”
    • For cracking a family (or lemma) list into its base words and various morphologies

    • Example: here is k2 at Bauer and Nation Level 2 (with just inflected forms) →

    35

  • 36

  • Extended in 2019 from list profiler to text profiler

    37

  • 38

    To note:

    • Count (# tokens) and coverage (% of tokens) for each level are given
    • E.g., Level 3: 12 derivations, 1.7% of text

    • Cumulative percentages are given
    • Indicating the point where 95% and 98% are met or surpassed

    • Particular affixes are identified that were needed to reach 95% and 98%
    • E.g., base + inflections + 1 derivational affix (~ion) were needed to reach 95%

  • Baseword check

    • A few errors
    • 6/695 words (1%)

    • Many derivations where it is not the case that a base word could have been known without the affix
    • formidable - able = formid?

    39

  • Definitions & program design (= method): Levels 1-7?

    • These come from Bauer & Nation’s (1993) framework for identifying morphology levels
    • By frequency, transparency, regularity, degree of change imposed on the base word - ‘difficulty’

    LEVELS
    • 1 = base words
    • 2 = inflected forms
    • 3-7 = derived forms

    Total number in the B&N scheme: 100 (not exhaustive)

    The Morpholex flowchart

    Level by level, 1-7 ~
    • Each word is matched against a list of B&N prefixes and suffixes
    • If an affix is found, Morpholex asks: is the remainder without this affix present in a list of all possible words?
    • With adaptations like ‘stun’ → ‘stunn’ so that ‘stunning’ minus ‘~ing’ is a real word
    • If yes, it’s an inflected or derived form
    • If no, it’s a base word
    • I.e., it’s considered a derived form only if learners could have recognized the base word when not extended into a derived form ***

    40
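The matching logic described on this slide can be sketched roughly as follows. This is not Morpholex's actual code: the word list, the handful of affixes and their levels (taken from the B&N Level 2 and Level 3 rows), and the two spelling adaptations are small stand-ins chosen for illustration.

```python
# Hypothetical stand-ins for the "list of all possible words" and the B&N affix list.
KNOWN_WORDS = {"stun", "happy", "accept", "able", "teach"}
SUFFIXES = [("ing", 2), ("ed", 2), ("s", 2),        # Level 2: inflections
            ("ness", 3), ("able", 3), ("er", 3)]    # Level 3: derivations
PREFIXES = [("un", 3)]

def candidate_bases(stem):
    """Yield the stem itself plus simple spelling adaptations."""
    yield stem
    if len(stem) > 2 and stem[-1] == stem[-2]:
        yield stem[:-1]            # stunn -> stun
    if stem.endswith("i"):
        yield stem[:-1] + "y"      # happi -> happy

def classify(word):
    """Return (level, base); level 1 means the word is treated as a base word."""
    word = word.lower()
    for suffix, level in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            for base in candidate_bases(word[:-len(suffix)]):
                if base in KNOWN_WORDS:
                    return level, base
    for prefix, level in PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in KNOWN_WORDS:
            return level, word[len(prefix):]
    return 1, word  # no affix leaves a recognizable base word

results = {w: classify(w) for w in
           ["stunning", "happiness", "unable", "accepts", "formidable"]}
```

Because "formid" is not a real word, "formidable" stays at Level 1 even though it ends in ~able, which is the behaviour the last bullet describes.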

  • The B&N framework (FYI – not really needed in this presentation)

    41

    Level 1 Base words

    Level 2 Base words + inflections ('lemmas')

    -s (on noun or verb), -ed/-ing (on verb), -er (er2)/-est (on adjective), -th (on number), and -en (en2) on irregular verb

    -er2 and -en2 are named to separate them from verb+er in Level 3 and noun+en in Level 5

    All the rest involve derivations (change in meaning and/or part of speech)

    Level 3 Frequent and regular affixes with minimal change to the base word in speech or writing

    -able/ible, -er/-or (on verb), -ish, -less, -ly, -ness, -th, -y,

    non-, un-

    Level 4 Frequent orthographically regular affixes which often impose pronunciation change (admIre => admirAtion)

    -al (autumnal), -ation (admiration), -ess (fortress), -ful (plentiful), -ism (dogmatism), -ist (semanticist), -ity (solemnity

    in-

    Level 5 Less frequent but regular affixes

    -age (leakage), -al (arrival), -ally (idiotically), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary),

    anti- (anti-inflation), ante- (anteroom), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter

    Level 6 Frequent but irregular affixes (often with significant change to base word)

    -able (inscrutable), -ee (lessee), -ic (spastic), -ify (mollify), -ion (superstition), -ist (solipsist), -ition (transition), -ive

    pre-, re-

    Level 7 Classical affixes

    -ar (circular), -ate (electorate), -et (packet, casket), -some (troublesome), -ure (departure, exposure)

    ab- (abnormal), ad- (admixture), com- (commiserate), de- (demist), dis- (disintegrate), ex- (out - external), in-(in -

  • Method (2)

    Mini-corpus of 250,000 words

    5 instances each of 5 text types, run individually through Morpholex

    • Applied linguistics articles (5)
    • ‘Quality’ news stories (5)
    • Classic novels (5)
    • Simplified novels (7)

    42

  • Results

    How many derived forms by text type?

    43

  • 44

  • 45

  • 46

  • Summary 1

    So it seems extensive knowledge of derivations is needed only for academic and quality press

    • While classic literature needs less
    • Derivations are a little over 5% of words in texts
    • So almost 95% is just base words and inflections

    • And graded (simplified) stories virtually none
    • Base words and inflections alone get us to 95%, often to 98%

    A rather optimistic picture
    • And it gets better when we count up the number of different individual affixes (types) →

    47

  • Results (2)

    How many different affix types?

    Even in academic texts, knowledge of just 3 affixes gets the learner to 95%

    Another 9 to 98%

    48

  • Results (2)

    News texts similar

    Just 3 affixes get the learner to 95%

    And another 6 affixes to 98%

    49

  • Results (2)

    Novels even better

    Zero affixes get the learner to 95%

    And another 8 affixes to 98%

    50

  • Results (2)

    Graded novels better still

    Zero affixes get the learner to 95%

    And another 2 affixes to 98%

    With the average for all text types being 2 (to reach 95%) and 6 (to reach 98%) →

    51

  • Summary 2

    • A very small number of derived forms (affixes) is needed across the types to reach 95%
    • (= “comprehension with resources” - surely the typical situation of a learner)

    • And a manageable number to reach 98%

    • And it gets even better if we look at the repetition of particular affixes →

    52

  • 53

    Results (3) – Distribution of affixes across text types: 1. Commonalities

  • 54

    Results (3) – Distribution of affixes across text types: 2. Specificities

  • Overall picture is of a small handful of affixations in common use

    (First 18 of 36) →

    • Just 8 are > 80% of total
    • Just 12 are > 90%
    • Just 17 are > 95%

    55

  • (All 36) →

    A further 18 affixes make up the remaining 5% of all affixations

    56

  • Summary 3

    A small handful of affixations form the vast majority

    • These are easily within what an ESL learner can cope with
    • Once the affixes are identified

    • A slightly larger but still manageable number of affixes form the rest
    • But are little used
    • Forming fewer than 5% of affixations

    57

  • Summary 3b

    Brown’s picture of impossible texts with 35% of words unknowable to learners with only lemmas… is vastly exaggerated

    Many texts are available with almost no derived forms

    • And any text is accessible with minor preparation
    • = learning or being taught just 5-10 affixes, even in academic texts
    • Which will be re-encountered in whatever texts are presented thereafter
    • And would not need to be re-learned

    58

  • Summary 3c

    Tests yet again

    So family-test scores are reliable predictors of text coverage for test-takers who know only lemmas

    • So does a score of 80% at 1k mean that 80% × 70% = 56% of words are known at that level?
    • (i.e., with an adjustment of 30% derivatives subtracted)

    No!
    • It means that 80% × 95% = 76% of words are known at that level
    • (with 5% derivatives subtracted)
    • And fewer in many text types

    59
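The adjustment argued for on this slide is a single multiplication, sketched below with the slide's own figures. The function name and the assumption that derived forms are spread evenly across the level are simplifications made for the example.

```python
def lemma_only_coverage(family_score, derived_share):
    """Scale a family-based test score by the share of tokens that are NOT derived forms."""
    return family_score * (1.0 - derived_share)

# Corpus-based assumption: about 30% of a family's tokens are derived forms.
pessimistic = lemma_only_coverage(0.80, 0.30)   # 0.80 x 0.70
# Text-based finding: derivations are about 5% of tokens in actual texts.
text_based = lemma_only_coverage(0.80, 0.05)    # 0.80 x 0.95
```

The gap between the two results (56% versus 76%) is the overstatement that comes from extrapolating corpus totals to individual texts.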

  • Summary 3d: And the existing family-based coverage research is safe, valid, and worth carrying forward

    This research is family based, but ~

    • The vast majority of derived forms in a given family
    • While present in a 100-million-word corpus of 4,000 texts in 100 text types
    • + needed so that every word in a VocabProfile (VP) gets categorized

    • Will not be a significant component of particular texts that we might put in front of learners
    • Derivations will be well under 5% in most cases and under 2% in many
    • And any effort invested in learning these will be well repaid
    • Since they are few in number and much repeated

    • So it is a tale of many family members that we don’t see very often
    • Except when amassed in a corpus

    60

  • Summary 3e: It is worth mentioning that Paul Nation long ago proposed that lemma and family are not an opposition

    Just different stages in word knowledge

    It is something of an accident that the coverage work was done exclusively with families

    A parallel strand of coverage work could easily use the lemma
    • It’s just that it wasn’t done
    • → A kind of oversight

    61

  • How surprising is any of this?

    English has been losing its morphology for centuries

    Nominative, accusative, genitive, etc.

    Inflections→

    62

  • 63

    Derivations→

    A topic for another day

  • So - is there a reduction in derivations over time? Four non-literary texts over the centuries

    Could be a trend

    64

  • Further work

    So with all that, where does this work go next?

    • As well as supporting the future of the coverage research, it has some future of its own
    • To do with → unused family members
    • And developing word lists that accommodate these

    • The purpose of exhaustive lists is clear
    • But surely more targeted lists would also have their uses
    • Or call them ‘nuclear’ lists

    We saw that

    • While many affixations exist across text types…
    • ~ly, un~

    • …others form ‘a profile’ specific to particular text types
    • ~ful, ~y & ~ness in FICTION: wonderful, beautiful; dreamy, rainy; happiness, gladness
    • ~al & ~age in ACADEMIC/NEWS: societal, governmental, cultural; usage, coverage, percentage

    65

  • Toward non-exhaustive or ‘nuclear’ lists

    These too would have their uses
    • The word forms most used
    • Productive not just receptive

    How to get such lists?

    Method: ‘multiply’ complete generic family lists
    • Against particular small corpora or text collections
    • Science corpus; general corpus; course materials
    • And reduce the generic list to just the inflections and derivations actually found in the corpus
    • And even remove whole families that never occur

    • This should be a useful list
    • Small
    • Usable for both production & reception

    66
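The ‘multiply’ step can be sketched as a filter of a generic family list against the forms actually found in a target corpus. This is an illustration of the idea only, not the Lextutor nuclear list builder; the three families are taken from the 1k family slide and the corpus is a made-up sentence.

```python
from collections import Counter

# Generic family list (forms from the 1k family slide).
FAMILY_LIST = {
    "able": ["able", "ability", "abler", "ablest", "ably",
             "abilities", "unable", "inability"],
    "accept": ["accept", "acceptability", "acceptable", "acceptably",
               "unacceptable", "acceptance", "accepted", "accepting",
               "accepts", "unacceptably"],
    "absolute": ["absolute", "absolutely", "absolutist", "absolutists"],
}

def nuclear_list(family_list, corpus_tokens):
    """Keep only the forms that occur in the corpus, with their number of occurrences."""
    counts = Counter(token.lower() for token in corpus_tokens)
    reduced = {}
    for head, forms in family_list.items():
        found = {form: counts[form] for form in forms if counts[form] > 0}
        if found:                 # families with no attested form are dropped
            reduced[head] = found
    return reduced

# Hypothetical target corpus.
corpus = "patients unable to accept treatment accepted the ability test".split()
reduced = nuclear_list(FAMILY_LIST, corpus)
```

Starting from the full family list means no attested form is lost, while unattested forms and whole unattested families (here "absolute") simply disappear from the reduced list.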

  • 67

  • Size reductions at 1k

    68

    List                          Groups           Word types   Share of family types
    BNC-COCA 1k as FAMILIES       1,000 families   6,866
    BNC-COCA 1k as LEMMAS         998 lemmas       3,316        3316/6866 = 48%
    Family x Brown corpus         984 families     4,723        4723/6866 = 69%
    Family x BNC-Medical          814              3,459        3459/6866 = 50%
    Family x BNC-Law              505              2,569        2569/6866 = 37%
    Family x BAWE-Engineering     712              3,283        3283/6866 = 48%
    Graded story                  996              2,442        2442/6866 = 36%

    To note
    • Many of these reduced lists have fewer items (types) than the 1k lemma list
    • But without loss of valuable items (all these items are IN the target corpus x number of occurrences)

    And the reductions are even greater as we move away from the high-frequency 1k zone →

  • Size reductions at 4k

    69

    List                          Groups           Word types   Share of family types
    BNC-COCA 4k as FAMILIES       1,000 families   4,868
    BNC-COCA 4k as LEMMAS         998 lemmas       2,911        2911/4868 = 60%
    Family x Brown corpus         817 families     2,928        2928/4868 = 60%
    Family x BNC-Medical          395              1,742        1742/4868 = 36%
    Family x BNC-Law              505              648          648/4868 = 13%
    Family x BAWE-Engineering     334              528          528/4868 = 11%
    Graded story                  N/A              N/A

  • Conclusion: [Fam v. Lem] or [Present v. Absent]?

    • Rather than starting from a lemma list
    • With loss of valuable items
    • And guessing at what learners know and don’t know

    • It is better to start from a complete family list
    • And reduce it to what is present in a corpus
    • (properly chosen or designed)

    • And teach learners the few derivations likely to be involved
    • While maintaining all the useful research that has already been done in the family framework
    • And the tests
    • And the computer tools

    70

  • Uses of nuclear lists?

    • Students love lists
    • These are targeted

    • Great concordance exercises

    • Small size lends itself to flashcards

    • With Text Lex Compare, a nuclear list can be compared to an exam script
    • Exam should be no more than 2% novel lexis

    71

  • Applications to French: Some of these issues do not arise, because…

    Family lists have never been developed for French
    • Even lemma lists have attracted only minor interest within didactiques
    • Lextutor however does have a
    • Lemma-based French profiler and
    • Lemma-level vocabulary size test

    72

  • 73

  • The corpus used by Lextutor

    Corpus / List

    74

  • Lextutor has 25 French 1,000-lemma lists & calculates coverage

    75

  • The lexicon divided into groups of 1,000 lemmas: 1k, 2k, 3k, 4k, 5k, etc. … → 25k

    76

  • Tests of learners’ vocabulary size or level

    • Validated and published example (the “TTV”, RCLV, 2016)

    77

  • 78

  • But the lists themselves get little use

    • “Didactique du français” sees little value in lists
    • Because of the background of personnel?
    • Literature-oriented people find lists mechanical
    • But they also find testing, CALL, etc. mechanical
    • Corpus-oriented people are lemma-oriented – the lemma is easy to compute

    • But lemma lists make no sense pedagogically
    • Huge numbers of levels
    • Full of redundancy
    • École is K1, écolier is K8
    • So that once we get to about 5k there is less to learn in each ‘higher’ list →
    • In most of these the original lower-k base word is present and obvious

    79

  • Applications to French

    • Didactique du français has never seen much value in lists
    • Because reading is rarely a predominant goal?
    • No comparable stampede of overseas L2 learners to Francophone universities as in English
    • Where reading will be maker or breaker

    • So the burning questions of coverage research
    • Make less sense
    • Similarly the categorization of words it depends on

    80

  • This could be about to change

    • Gothenburg University (Sweden) French Dept has invited me to help them develop a set of family lists
    • March 2020
    • Christina Lindquist & colleagues

    • They want to develop a quantitative/computational approach to teaching French

    • And find the lemma useless for ~
    • Teaching
    • Testing
    • Sequencing materials
    • i.e., practice

    81

  • In which case… All the same questions would arise that arose in English

    • Do learners know words qua ~
    • Individual words
    • Lemma groups
    • Family groups?

    Ripe with opportunities for empirical research!

    My suspicion? Derived forms are less of a problem in French than in English

    So French family lists are worth looking at
    • And particularly the nuclear lists to be derived therefrom
    • Particularly to get rid of those long lists of little-used verb forms

    References →

    82

  • 83

    When this is (pre-)published the link will be here :

    lextutor.ca/research/ (Click ‘papers’)

  • Selected References

    COVERAGE

    • Laufer & Ravenhorst-Kalovski (2010). ‘Lexical threshold revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension,’ Reading in a Foreign Language 22: 15–30.

    • Schmitt, Jiang & Grabe (2011). ‘The percentage of words known in a text & reading comprehension,’ Modern Language Journal 95:26–43.

    FAMILIES

    • Bauer & Nation (1993). ‘Word families,’ International Journal of Lexicography 6: 253–79.

    FAMILY CRITICS

    • Brown (2018). ‘Examining the word family through word lists,’ Vocabulary Learning & Instruction 7: 51–65.

    VOCABPROFILE

    • https://www.lextutor.ca/vp/comp/

    FAMI/LEMMATIZER

    • https://www.lextutor.ca/familizer/

    LEVELS TESTS

    • https://www.lextutor.ca/tests/

    COVERAGE CALCULATOR

    • https://www.lextutor.ca/cover/

    MORPHOLEX

    • https://www.lextutor.ca/cgi-bin/morpho/lex/

    NUCLEAR LIST BUILDER

    • https://www.lextutor.ca/freq/nuclear/

    RESEARCH

    • https://www.lextutor.ca/research/

    84
