Transcript
  • NT2Lex: A CEFR-Graded Lexical Resource for Dutch as a Foreign Language

    Linked to Open Dutch WordNet

    Anaïs Tack 1,2, Thomas François 1, Piet Desmet 2, Cédrick Fairon 1

    1 CENTAL, Université catholique de Louvain, Louvain-la-Neuve, Belgium

    2 ITEC, imec, KU Leuven Kulak, Kortrijk, Belgium

    CEFR-GRADED LEXICONS

    A graded lexicon is a lexical database that includes lexical frequencies observed in texts graded along a difficulty scale.

    Foreign language (L2) materials

    • textbooks and readers / learner texts

    • CEFR scale [A1 > A2 > B1 > B2 > C1 > C2] (Council of Europe, 2001)

    CEFRLex: cental.uclouvain.be/cefrlex/

    ANALYSIS Semantics

    ANALYSIS Frequency

    KEY TAKEAWAYS

    NT2Lex

    • a new resource for Dutch as a foreign language (NT2)

    • 17,743 entries with graded frequency distributions

    • measure of receptive word difficulty

    • measure of word sense complexity through linkage to Open Dutch WordNet

    cental.uclouvain.be/nt2lex/

    French - FLELex (François et al., 2014)

    Swedish - SVALex (François et al., 2016)

    English - EFLLex (Dürlich & François, 2018)

    Swedish - SweLLex (Volodina et al., 2016)

    ANALYSIS Psycholinguistics

    NT2LEX

    Online tools for lexical complexity analysis

    • database search

    • CEFR-based complex word identification (Tack et al., 2016)

    Tools

    Corpus of reading materials

    • corpus of 461,088 tokens

    • 5 CEFR levels (A1, A2, B1, B2, C1)

    Preprocessing

    • part-of-speech tagging with Frog (van den Bosch et al., 2007)

    • SVM WSD tool trained on DutchSemCor (Vossen et al., 2012)

    • linkage to Open Dutch WordNet (Postma et al., 2016)

    Lexical frequencies

    • lexical entries with per-level observed frequency

    • normalised for lexical dispersion (Carroll et al., 1971)
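The poster does not spell out the dispersion formula. As a minimal sketch, the function below computes a normalised-entropy dispersion index, which is one common reading of the index D associated with Carroll et al. (1971) when all texts are of equal size. The function name and the input layout (one count per text) are assumptions made for illustration, not the NT2Lex implementation.

```python
import math

def dispersion(freqs):
    """Normalised entropy of a word's counts across n texts.

    Returns 0.0 when all occurrences sit in a single text and 1.0 when
    they are spread evenly. Assumes texts of equal size (hypothetical setup).
    """
    n = len(freqs)
    total = sum(freqs)
    if n < 2 or total == 0:
        return 0.0
    # Shannon entropy of the distribution of occurrences over texts
    entropy = -sum((f / total) * math.log(f / total) for f in freqs if f > 0)
    # Divide by the maximum possible entropy, log(n), to land in [0, 1]
    return entropy / math.log(n)

even = dispersion([5, 5, 5, 5])      # word spread evenly over four texts
clumped = dispersion([20, 0, 0, 0])  # same total, all in one text
```

A word with the same raw count can therefore score very differently, which is the reason for correcting raw frequencies by dispersion before comparing levels.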

    NT2LEX Resource

    lemma               pos   sense        synset               A1     A2     B1     B2     C1
    pakken (to grab)    WW()  pakken-v-1   odwn-10-101230891-v  35     117    101    5      -
    pakken (to defeat)  WW()  pakken-v-10  eng-30-01100145-v    -      51     12     -      -
    zijn (to exist)     WW()  zijn-v-1     eng-30-02603699-v    2,094  1,647  1,423  1,253  1,335
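Entries like the ones above can be read into a small record type. The sketch below assumes whitespace-separated rows (English glosses omitted), "-" for an unobserved level, and commas as thousands separators, as printed on the poster; the actual download format of NT2Lex may differ. `Entry`, `parse_row`, and `first_level` are hypothetical names, and taking the first level with an observed frequency is only one simple heuristic for receptive difficulty.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LEVELS = ("A1", "A2", "B1", "B2", "C1")

@dataclass
class Entry:
    lemma: str
    pos: str
    sense: str
    synset: str
    freq: Dict[str, Optional[float]]  # None = not observed at that level

def parse_count(cell: str) -> Optional[float]:
    # "-" marks a missing value; "2,094" uses a thousands separator
    return None if cell == "-" else float(cell.replace(",", ""))

def parse_row(row: str) -> Entry:
    lemma, pos, sense, synset, *counts = row.split()
    if len(counts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} frequency cells, got {len(counts)}")
    return Entry(lemma, pos, sense, synset,
                 dict(zip(LEVELS, map(parse_count, counts))))

def first_level(entry: Entry) -> Optional[str]:
    # Heuristic (assumption): the lowest level at which the sense is attested
    for level in LEVELS:
        if entry.freq[level] is not None:
            return level
    return None

defeat = parse_row("pakken WW() pakken-v-10 eng-30-01100145-v - 51 12 - -")
exist = parse_row("zijn WW() zijn-v-1 eng-30-02603699-v 2,094 1,647 1,423 1,253 1,335")
```

Under this heuristic the "to defeat" sense of pakken is first attested at A2, while zijn is attested from A1 onwards.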

    [Figure: dispersion (y-axis, 0.0 to 1.0) against frequency (x-axis, 0 to 80); r2 = 0.83]

    frequency

    • correlation Subtlex-NL (Keuleers et al., 2010)

    • Zipfian effects

    shorter = more frequent

    dispersion

    • theoretical familiarity

    • more dispersed = basic vocabulary

    [Figure: polysemes (y-axis, 0.5 to 4.0) by level (x-axis: A1, A2, B1, B2, C1, TOTAL)]

    semasiology

    • form > meaning mappings

    • easy = more polysemous

    onomasiology

    • meaning > form mappings

    • lower degree of synonymy

    • L2-specific lexicalisations
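The semasiological view (form > meaning) can be made concrete by counting how many senses of each lemma are attested at each level. The sketch below does this for the three example entries shown on the poster; three rows are far too few to say anything about the trend, so it only illustrates the computation. The tuple layout and the function name are assumptions.

```python
from collections import defaultdict

LEVELS = ("A1", "A2", "B1", "B2", "C1")

# (lemma, sense, per-level frequencies); None stands for "-" on the poster
ENTRIES = [
    ("pakken", "pakken-v-1", (35, 117, 101, 5, None)),
    ("pakken", "pakken-v-10", (None, 51, 12, None, None)),
    ("zijn", "zijn-v-1", (2094, 1647, 1423, 1253, 1335)),
]

def mean_senses_per_lemma(entries):
    """Average number of attested senses per lemma, computed per level."""
    result = {}
    for i, level in enumerate(LEVELS):
        senses = defaultdict(set)
        for lemma, sense, freqs in entries:
            if freqs[i] is not None:  # sense is attested at this level
                senses[lemma].add(sense)
        if senses:
            result[level] = sum(len(s) for s in senses.values()) / len(senses)
    return result

polysemy = mean_senses_per_lemma(ENTRIES)
```

The onomasiological direction (meaning > form) would group by synset instead of lemma and count distinct lemmas, which gives a degree of synonymy per level.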

    [Figure: density of age of acquisition (x-axis, 0 to 20) per level (A1, A2, B1, B2, C1, TOTAL)]

    [Figure: density of concreteness (x-axis, 0 to 6) per level (A1, A2, B1, B2, C1, TOTAL)]

    interplay of psycholinguistic norms (Brysbaert et al., 2014)


