
FACET AND PRISM BASED MODEL FOR PEDAGOGICAL INDEXATION OF TEXTS FOR LANGUAGE LEARNING

The Consequences of the Notion of Pedagogical Context

Mathieu Loiseau, Georges Antoniadis and Claude Ponton
LIDILEM, Université Stendhal Grenoble 3, Grenoble, France

{mathieu.loiseau, georges.antoniadis, claude.ponton}@u-grenoble3.fr

Keywords: Pedagogical indexation, Computer assisted language learning, Natural language processing, Modeling, Meta-data.

Abstract: In this article, we discuss the problem of pedagogical indexation of texts for language learning and address it from the perspective of the notion of “pedagogical context”. This prompts us to propose a new version of a model based on a pair of entities: prisms and facets. We first discuss the importance of material selection in the task of planning a language class, in order to introduce our interpretation of Yinger’s model of planning applied to language teachers’ search for texts. This is closely intertwined with the elaboration of the notion of pedagogical context, from which our model stems. This version, though similar in some ways to our first attempt, provides sounder notions on which to build.

1 PEDAGOGICAL INDEXATION

1.1 The MIRTO Project

The MIRTO project, started in 2001, stemmed from the observation of various recurrent issues in Computer Assisted Language Learning (CALL) systems: rigidity, inability to adapt the learning sequences to the learners, and the necessary adaptation of teachers who are not provided with means to manipulate concepts pertaining to their field of expertise (language didactics) (Antoniadis et al., 2004). The aim of Multi-apprentissages Interactifs par des Recherches sur des Textes et l’Oral (MIRTO) was to promote the use of Natural Language Processing (NLP) in order to address those problems by adding an abstraction layer between the user and the material he/she manipulates. Indeed, its designers consider that, to allow teachers to formulate their problems in didactically relevant terms, language should not be handled as character sequences but as a system of forms and concepts (Antoniadis et al., 2005). In order to do so, MIRTO proposes to separate treatments (e.g. a gap-filling exercise generation script) from the data to which they are applied (a text in this case).

1.2 Definition and Objectives

This made evident the need for a text base which, for consistency’s sake, would have to allow users to perform queries driven by language teaching. In other words, a subpart of the problem was the design of a system that could perform pedagogical indexation of texts. In this work we defined pedagogical indexation as “an indexation performed according to a documentary language that allows users to query for objects in order to use them for teaching” (Loiseau, 2009). Considering the aforementioned context, we are therefore working towards pedagogical indexation of texts for language learning.

Indeed, a study of the literature concerning the most often used language teaching methods and a series of interviews with language teachers prompted us not only to consider this problem in the context of the future use of the text in a CALL activity, but to consider the problem globally: few of the teachers we had interviewed were really computer savvy, yet they all underlined the importance of text search in their practices. We later got confirmation of this from a larger-scale study, which established text search as a common task in language teaching (Loiseau, 2009).

Having shifted the scope of our work (without completely cutting ties with MIRTO, since integration remained a perspective) to the design of a model for pedagogical indexation of texts for language teaching, we started to consider the existing means to achieve it.

1.3 Learning Resource Description Standards

A wide array of research tackles the definition and use of learning resource description standards. The principal standards we analyzed were Learning Object Metadata (LOM) (IEEE, 2002), Sharable Content Object Reference Model (SCORM) (SCORM, 2006) and some teaching-oriented application profiles of the Dublin Core Metadata Initiative (DCMI) (GEM, 2004; edna, 2006). As far as solving our problem is concerned, all the standards we studied came with the same flaws, most of which stem from the fact that these standards try to integrate, in the same model, entities of very different conceptual levels: the resources used to set up activities (low aggregation level in the LOM terminology) and the activities themselves (higher aggregation level) (Pernin, 2006). Balatsoukas et al. take the analysis a little further by pointing out that the lower the aggregation level of the learning object, the broader its spectrum (i.e. the range of activities that can be performed with it) (Balatsoukas et al., 2008). Indeed, in the particular case of texts (raw resources), the descriptors provided by the standards seem, at best, difficult to use: how does one assign a “Description” (“Comments on how this learning object is to be used” (IEEE, 2002)) when the resource could potentially be used in different contexts?

The approach advocated by Recker & Wiley proposes to treat differently what they call intrinsic properties (“derivable by simply having the resource at hand”) and extrinsic properties (which “describe the context in which the resource is used”) (Recker & Wiley, 2001). Nevertheless, their analysis cannot be directly transposed to our problem, for their aim is to provide a collaborative resource description system in which authoritative and non-authoritative annotations coexist. Our aim, on the other hand, is first and foremost to provide a model that would allow a system to automate the pedagogical indexation of texts as much as possible. User annotation is, in this context, more a potential extension of the system than a core feature. There was therefore, at this point, no clear-cut direction in which to go: the pedagogical properties seemed to constitute extrinsic properties of the raw resources that texts are, thus potentially ruling out educational metadata as a solution. We therefore decided to resort to an empirical study to confirm this hypothesis and get a grasp of teachers’ practices regarding text search.

2 PEDAGOGICAL CONTEXT

2.1 Empirical Study

Our empirical study took the form of a survey, which built on a series of interviews and an exploration of the literature, part of which we have just summed up above. Beyond confirming the hypothesis that texts can have multiple uses in language teaching, we aimed at obtaining a first look into the process of text search. We meant our point of view to be as general as possible, in the hope of extracting invariants that would remain unaffected by variables such as the language taught, the country in which it is taught or to whom. The survey was mostly filled in online, but also in paper form, both media adding up to 130 testimonies. Besides unequivocally confirming that texts can be used in various language teaching situations¹, the survey allowed us to extract a (not necessarily exhaustive) list of four practices that lead to texts being used in language learning: searching for a text to use in a precise activity, writing the text, encountering a text during personal reading, and texts on a syllabus (of any form). We will focus here on the provenance that is closest to the role of a pedagogically indexed text base, i.e. the search for a text in order to use it in a specific activity, which also happens to be the most widely represented practice (concerning nearly 97% of the teachers answering the survey).

2.2 Adaptation of Yinger’s Model

To describe the task of searching for a text for a given activity, we resorted to Yinger’s model of planning (Yinger, 1978), or more precisely to part of it. Yinger defines planning as a three-stage process: problem finding; problem formulation/solution; and finally implementation, evaluation, routinization (Yinger, 1978). In our task, the problem has already been found (the teacher has an activity in mind) and the search is supposed to provide a text to actually use in class, and thus precedes implementation. We focus here on problem formulation/solution, which according to Yinger is a “helicoidal” repetition of three phases: elaboration, investigation and adaptation (Yinger, 1978), which we adapt to our problem under the labels selection, evaluation and transformation (cf. figure 1) (Loiseau, 2009).

The dashed semi-ovoid at the bottom of figure 1 contains a set of texts the teacher has access to. The intensity of the gray inside the shape represents the extent to which they are pedagogically “connoted”. For instance, a text taken straight from a newspaper that has never been used in teaching (to the knowledge of the teacher) is not connoted, whereas a text recommended by peers or found inside a textbook has some sort of pedagogical connotation. The aim is not to evaluate this “connotation” or even to theorize it, but to acknowledge that the teacher can resort to sources with different statuses.

Figure 1: Yinger’s model adapted to text search (phases: selection, evaluation and transformation; projected and evaluated properties).

¹ 97.3% of the teachers who answered the question declare that they consider a given text can be used with various goals in different contexts, and 94.5% of them (92% of our sample) declare having done so.

ICSOFT 2010 - 5th International Conference on Software and Data Technologies

The selection phase consists of the teacher relying on their necessary preconceptions² to project onto the text properties linked in one way or another to the activity they are planning. An example of such behavior would be a teacher choosing an author based on some properties they attribute to the author’s writing: “Roald Dahl, [...] all his short stories are packed with these verbs [...] for emotion and gestures [...], that in French [require] a whole phrase [...].” (testimony from our study).

² Without preconceptions, this phase would consist of a random selection of texts.

Once the text is selected based on the properties that the teacher has attributed to it a priori, the text is actually in the hands of the teacher (or virtually so) for the first time in this planning sequence, and they can now attribute a new set of properties to it. These are no longer projected properties: they constitute the teacher’s actual perspective on the resource, based on the activity they want to set up with it. This set of properties can confirm or invalidate the ones assigned during the first phase, or concern totally different aspects of the text. For instance, it is entirely conceivable that the teacher quoted above would confirm her hypothesis but conclude that the short story could turn out to be difficult for her learners, which brings us to the last phase: taking action upon the evaluated properties. The action transforms the status of the text; there are three alternatives:

• the text is assigned a use context corresponding to the teacher’s current search and is transformed into actual teaching material (solid arrow in figure 1);

• the text, though considered unfit for this particular activity, is deemed usable in another context and can be kept for future use in a personal repository: it is transformed into potential teaching material (dotted arrow in figure 1);

• the text is not relevant from the teacher’s point ofview and is just discarded (not represented).

2.3 First Definition

The description of these three phases allowed us to specify the role of a pedagogically indexed text base: it is meant to assist the teacher in the selection phase and possibly allow them to perform it according to less instinctive criteria when applicable (for example concerning the linguistic content of the text). It also allowed us to introduce the notion of Pedagogical Context (PC)³ as the “set of features which describe the teaching situation” (Loiseau, 2009). This notion is especially useful to describe the process of text search and its integration in a learning sequence, because the various iterations of the above scenario correspond to a gradual definition of the PC: the material is a component of the teaching situation (Charlier, 1989), thus influencing it, and at the same time its choice is influenced by the other components of the PC, since the search is performed for a given activity. In order to achieve pedagogical indexation of texts for language learning, it therefore seems necessary to be able to take the PC into account, which means studying the link between components of the PC and the actual properties of the text.

3 PC AS AN INFLUENCE CAST ON TEXT PROPERTIES

³ In order to avoid exceedingly numerous repetitions, we will refer to it either as “PC” or by its complete form, “Pedagogical Context”.

One of the objectives of our second survey was to establish relations between properties of the PC and properties of the text. We cross-examined:

• the activity type (gap-filling exercises of three types, comprehension activity, introduction of new notions, whether vocabulary or syntax) with the size of the text, the number of representative elements of a notion (if the notion is the preterit, this is the number of verbs conjugated in the preterit in the text) and the tolerance to newness (vocabulary- and grammar-wise);

• the learners’ first language and tolerance to newness;

• the learners’ level and tolerance to newness.

The length of the text and the number of representative elements were numerical variables, asked for each activity type. In this case, the tolerance to newness was evaluated using two separate categorical variables, one concerning new vocabulary (other than the object of the lesson) and the other concerning new grammatical structures (other than the object of the lesson). Both variables could take one of three values: “proscribed”, “tolerated” and “sought”. For each activity type used, we asked the teachers to rate their tolerance to newness on this scale for both variables.

When crossed with the learners’ level and first language, the tolerance to newness was also the object of a closed-ended question. These questions allowed the teachers to state that the criterion was not relevant, or to decide not to answer. The other two possible answers depended on the question and did not distinguish vocabulary from grammar:

• first language: the more similar the mother tongue and the learned language, [the more/the less] one will accept unknown grammatical structures or vocabulary;

• level: the higher the level, [the more/the less] one will accept unknown grammatical structures or vocabulary.

The results can be summed up by figure 2⁴. Properties such as the length of a given text are totally independent from the Pedagogical Context and thus do not need it to be computed, but our study showed that the activity type had an effect on text length⁵, which means that, depending on the activity type, teachers will be looking for texts of different lengths. A text property such as the number of representative elements of a notion obviously depends on the notion, which in turn is a direct consequence of the pedagogical goals of the teachers. Likewise, the number of representative elements of a notion that the teacher considers appropriate will depend on the activity type (e.g. 4 or 5 occurrences might be enough to introduce a notion, whereas to practice it in the form of a gap-filling exercise teachers seek an average of 11 occurrences)⁶. Finally, although the amount of unknown vocabulary/structures is a property of the text, it cannot be evaluated unless we link it with the audience with whom the activity is going to be used. It directly depends on the level of the students, which is also used afterwards, in a different way, to decide whether or not to use the text: the higher the students’ level, the more tolerant the teachers will be regarding the presence of new vocabulary or structures (other than the object of the lesson). The activity type⁷ and the proximity between the learners’ language and the one that is taught also seem to have a significant effect on the tolerance to “newness”⁸.

Figure 2: Influence of the pedagogical context on the attribution of text properties (Pedagogical Context components: goals, audience (level, L1) and activity; text properties: text length, number of representative elements, and unknown vocabulary and structures).

⁴ Due to space restrictions we cannot include detailed statistics in this paper; they are available in section 5.3 (pp. 231–245) of (Loiseau, 2009).

⁵ ANOVA: F(143) = 3.362; p < .01. Post-hoc tests are significant when comparing “comprehension activity” with the various forms of “gap-filling exercises” (Loiseau, 2009).

The various tests we performed on the above series of variables tend to show that the Pedagogical Context indeed influences text properties. We lack data to precisely characterize the relations between text properties and the PC, but we have been able to demonstrate their existence. The fragmentary knowledge we have gathered has nonetheless allowed us to explore examples of ways of taking into account these concurrent influences that the Pedagogical Context has on text properties or on the way to act upon them. Interestingly, they all follow the same pattern: the properties which depend on the PC represent a sort of point of view on the text, reflecting the problem of the teacher in their search. The Pedagogical Context, while still representing the same entities in the real world, has become, thanks to a switch of focus, “a paradigm casting its influence on the texts’ properties”.

⁶ ANOVA: F(127) = 4.739; p < .005. Post-hoc tests are significant when comparing “introduction of a new notion” with “comprehension gap-filling exercise” and “introduction of a new syntactic notion” with “form aimed gap-filling exercises” (Loiseau, 2009).

⁷ χ²(10) = 32.2; p < .001 (Loiseau, 2009).

⁸ 81.3% of the teachers taking their learners’ first language into account considered that closer languages allow more tolerance (Loiseau, 2009), and 71.4% consider that the higher the level of the learners, the more unknown vocabulary/structures they will accept (Loiseau, 2009).

4 PRISM-FACET BASED MODEL

The following model aims at taking into account the role of the Pedagogical Context in the evaluation of text properties, in order to assist users in their selection task. It is a second version of the model introduced in (Loiseau et al., 2008). We will first describe this new version of the model, before concluding by explaining the main differences between the two versions.

4.1 Recursive Definitions

The model is articulated around a pair of indissociable notions: prism and facet. The prism ensures that the properties are computed coherently: “a prism is a mechanism, computerizable or not, associated with a property defined considering the texts’ later exploitation in teaching, which makes it possible to assign a value to this property for any text depending on a given pedagogical context”⁹.

This definition allows us to highlight the link with pedagogical indexation: the definition of the prism depends on the needs of the teachers. It revolves around the difference between the conceptual level of the properties (class of properties) and their value (after instantiation). The essence of the prism is the procedure that makes the transition from the former to the latter when the concept is applied to a given object (a text).

This leads us to the formalization of the property which, just as the prism depends on the property it is meant to describe, depends on its alter ego: “a text facet is a property of the text, which was defined with a view to its pedagogical exploitation in language teaching and for which an evaluation procedure can be defined and applied to any given couple (text, PC)”⁹.

⁹ Translation of the definitions on page 257 of (Loiseau, 2009).

4.2 Facet and Facet-Value

Before we go on to explore the consequences of the above definitions, we shall address a terminological issue. Like the term “property”, the word “facet” is, as we use it, polysemous. Depending on the context, it can designate either the concept or the attribute. For instance, “parallelism” is a property (concept) which is applicable to a certain type of object, and two planes (for instance the ground and a shelf) can have the property of being parallel. In the case of facets, we might use the word to designate either the property in its conceptual form (facet F_i, text facet, or just facet with no other precision) or its value for a given couple (text, PC) (a given text’s facet, F_i[PC](T)).
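The distinction between a facet (the concept) and its value F_i[PC](T) for a couple (text, PC) can be illustrated with a minimal Python sketch. This is our own illustration under stated assumptions, not the model's specification: the class names `Facet` and `Prism`, the `evaluate` method and the dictionary representation of the PC are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

# Hypothetical representation of a Pedagogical Context: component names
# (e.g. "notion", "activity", "level") mapped to their values.
PedagogicalContext = Mapping[str, Any]


@dataclass(frozen=True)
class Facet:
    """The concept: a text property defined for pedagogical exploitation."""
    name: str


@dataclass(frozen=True)
class Prism:
    """The mechanism assigning a value to its facet for any couple (text, PC)."""
    facet: Facet
    compute: Callable[[str, PedagogicalContext], Any]

    def evaluate(self, text: str, pc: PedagogicalContext) -> Any:
        # F_i[PC](T): the value of the facet for the couple (text, PC).
        return self.compute(text, pc)


# Illustrative facet whose value depends on the notion named in the PC
# (a naive, case-insensitive substring count).
notion_count = Prism(
    facet=Facet("occurrences of the notion"),
    compute=lambda text, pc: text.lower().count(pc["notion"].lower()),
)

print(notion_count.evaluate("Hay que leer. Hay que escribir.", {"notion": "hay que"}))  # 2
```

Keeping `Facet` separate from `Prism` mirrors the terminological point above: the facet only names the concept, while the prism carries the procedure that yields a value once a text and a PC are given.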

4.3 Constant Facets

From the point of view of the selection task, the facet is the central entity on the conceptual level: in the planning process, the facets represent the notions upon which the teachers base their reasoning. A pedagogically indexed text base will not be able (at least not in the near future) to take into account every teacher’s individual point of view on every facet presented to them; the usability of such a system therefore relies on the prisms, which offer consistency through their mechanical, systematic nature.

Going through some of the properties represented in figure 2 will allow us to explain the model further.

Figure 3: Prism examples and values for the corresponding facets (independent from PC): for a text T, the prism P_author gives F_author(T) = Андрей Курков and the prism P_wrdCount gives F_wrdCount(T) = 1101.

In figure 3, we show two examples of facets: the word count (F_wrdCount), which is exactly the same as the corresponding property in figure 2 (text length), and F_author, corresponding to the author of the text. The diagram also presents the values of these facets for a given text T. We introduce a functional notation based on the facets, even though, strictly speaking, the application that allows the computation of the values is defined inside the prisms (P_wrdCount and P_author here). This notation clarifies the status of both entities:

• the prism is a tool, materializing a process;

• the facet is a concept, a text property which has a value for every couple (text, PC).
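Constant facets can be sketched as prisms that accept a PC but never read it. The sketch below is an illustration under our own assumptions: whitespace splitting stands in for real tokenization, and the function names `p_wrd_count`, `p_author` and the dictionary text record are hypothetical.

```python
from typing import Any, Mapping

# Hypothetical text record: a body plus a minimal description.
TEXT = {
    "author": "Андрей Курков",
    "body": "Rabbits run. Dogs bark loudly.",
}


def p_wrd_count(text: Mapping[str, str], pc: Mapping[str, Any]) -> int:
    """Prism for F_wrdCount: the PC is accepted but ignored (constant facet)."""
    return len(text["body"].split())


def p_author(text: Mapping[str, str], pc: Mapping[str, Any]) -> str:
    """Prism for F_author: read from the text's description, PC ignored."""
    return text["author"]


# The value stays the same whatever the Pedagogical Context.
for pc in ({}, {"activity": "gap-filling"}, {"level": "B1", "notion": "preterit"}):
    assert p_wrd_count(TEXT, pc) == 5
    assert p_author(TEXT, pc) == "Андрей Курков"
```

Giving constant prisms the same (text, PC) signature as the others is a design choice: it lets a system handle constant and PC-dependent facets uniformly.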

4.4 The Pedagogical Context in the Model

In these first examples, the Pedagogical Context does not influence the value of the facet, which remains constant for a given text T for any PC. The aim of the model is to represent more complex properties. In figure 2, the number of representative elements is an example of such a facet. We represent it in figure 4, where a single prism (P_repEt) is shown revealing two facets for each text. Each of the two facets of T1 and T2 corresponds to a different Pedagogical Context, for which both texts could be compared in order to come to a decision. In the example of figure 4, T1 contains 4 occurrences of haber que structures¹⁰ and no preterit, while T2 contains 2 occurrences of haber que structures and 3 occurrences of preterit. Figure 4 also represents the metaphor behind the names prism and facet. In this metaphor, the Pedagogical Context is a light cast on a text through a prism, thus revealing one of its facets. Consistently with its optical counterpart, the prism divides the ray of the PC to keep only the components (frequencies) which are necessary to compute the value of the facet. Applied to a system which would assist users in their selection task, the choice of prisms would have an expressive function: the user would only be asked for the PC components required by the selected prisms, thus providing them with a means to describe the features of the teaching situation which are relevant for their search (figure 5).

Figure 4: Prism examples and corresponding facets for 4 different (text, PC) couples: the prism P_repEt applied to texts T1 and T2 under two pedagogical contexts (the notions “pretérito indefinido” and “haber* que + inf”) gives F_repEt[Pretérito](T1) = 0, F_repEt[haber* que + inf](T1) = 4, F_repEt[Pretérito](T2) = 3 and F_repEt[haber* que + inf](T2) = 2.

¹⁰ Used to express obligation in Spanish: Para soñar hay que dormir (to dream, one has to sleep), Habrá que resistir un tiempo más (one will have to go on resisting for a while).

Figure 5: Expressive function of the prisms (the PC is cast on a text T through prisms P1 to Pn).
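A PC-dependent facet such as F_repEt can be sketched as a prism that reads the notion from the PC and counts its occurrences in an annotated text. This is a hedged illustration only: the token format (form, lemma, tag), the tag names, the `NOTIONS` table and the two toy texts `text_a` and `text_b` are hypothetical and are not the T1 and T2 of figure 4.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical annotated token: (form, lemma, tag); tags are illustrative.
Token = Tuple[str, str, str]
Predicate = Callable[[Token], bool]

# Hypothetical notions, each described as a sequence of token predicates.
NOTIONS: Dict[str, List[Predicate]] = {
    "pretérito indefinido": [lambda t: t[2] == "V.PRET"],
    "haber* que + inf": [
        lambda t: t[1] == "haber",
        lambda t: t[0] == "que",
        lambda t: t[2] == "V.INF",
    ],
}


def p_rep_et(tokens: Sequence[Token], pc: Dict[str, str]) -> int:
    """Prism for F_repEt: counts occurrences of the notion named in the PC."""
    pattern = NOTIONS[pc["notion"]]
    n = len(pattern)
    # Slide a window of n tokens and test each token against its predicate.
    return sum(
        all(pred(tok) for pred, tok in zip(pattern, tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    )


text_a = [("Hay", "haber", "V.PRES"), ("que", "que", "CS"), ("dormir", "dormir", "V.INF")]
text_b = [("Llegó", "llegar", "V.PRET"), ("y", "y", "CC"), ("comió", "comer", "V.PRET")]

# One prism, two texts, two PCs: four (text, PC) couples.
for name, text in (("text_a", text_a), ("text_b", text_b)):
    for notion in NOTIONS:
        print(name, notion, p_rep_et(text, {"notion": notion}))
```

The same prism yields different values for the same text depending on the PC, which is exactly what distinguishes this facet from the constant ones.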

The notation introduced in figure 4 is meant to render the difference of status that exists in the model between the PC and the texts. This difference comes from the function of the model, namely to provide a framework for the implementation of a system of pedagogical indexation of texts for language learning. When performing a given iteration of the cycle shown in figure 1, the PC is constant. Of course, for a text search task to yield a text that is actually used in language learning, the Pedagogical Context might evolve during the various iterations of the cycle, but the PC will be constant inside a given selection sub-task (for which a system is supposed to provide assistance) of a given cycle. Yet each prism is evidently meant to be reusable from one cycle to the next and, by definition, has to be able to compute values of its associated facet for any PC¹¹, hence the notation.
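The expressive function of the prisms and the constancy of the PC within a selection sub-task can be sketched together. In this minimal illustration, which relies on our own assumptions, each prism declares the PC components it needs (the `required` field is hypothetical), the system asks only for their union, and one fixed PC is applied to every candidate text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Mapping


@dataclass(frozen=True)
class Prism:
    name: str
    # Hypothetical: the PC components this prism needs to compute its facet.
    required: FrozenSet[str]
    compute: Callable[[str, Mapping[str, Any]], Any]


def components_to_ask(prisms: List[Prism]) -> FrozenSet[str]:
    """Only the PC components required by the selected prisms are requested."""
    return frozenset().union(*(p.required for p in prisms))


def select(texts: Dict[str, str], prisms: List[Prism],
           pc: Mapping[str, Any]) -> Dict[str, Dict[str, Any]]:
    """One selection sub-task: the same, constant PC is applied to every text."""
    missing = components_to_ask(prisms) - set(pc)
    if missing:
        raise ValueError(f"PC lacks components: {sorted(missing)}")
    return {tid: {p.name: p.compute(t, pc) for p in prisms}
            for tid, t in texts.items()}


# Two prisms reusable across cycles; only rep_et needs a PC component.
word_count = Prism("wrdCount", frozenset(), lambda t, pc: len(t.split()))
rep_et = Prism("repEt", frozenset({"notion"}),
               lambda t, pc: t.lower().count(pc["notion"]))

print(sorted(components_to_ask([word_count, rep_et])))  # ['notion']
pc = {"notion": "hay que"}  # fixed for the whole selection sub-task
print(select({"T1": "Hay que dormir.", "T2": "Llegó tarde."}, [word_count, rep_et], pc))
```

In a later cycle, the same prism objects would simply be called with a different PC, which is why they must accept any PC.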

4.5 Prisms as a Means of Selection

By definition, indexation is essentially a description task (Bertrand et al., 1996), yet it aims at allowing users to easily spot the texts that satisfy their needs, which is an objective of discrimination. In our case, part of the discrimination task will not be automatable (e.g. when based on interestingness or on the ability to give rise to a debate); the other part will mostly rely on constraining the tolerated values of facets. We have concluded that the best way to model that kind of constraint is to integrate it inside the Pedagogical Context and thus to take it into account in the value of the facets. A constrained version of a facet just adds a phase to the mechanism associated with its computation: after the value of the non-constrained avatar of the facet is computed, a simple test instruction can be added, returning false if the constraint is not met and the computed value otherwise. In the constrained facet obtained, the expression of the constraint is part of the Pedagogical Context. Indeed, it is relevant to the teacher’s problem to decide, depending on the situation they want to use the text in, to exclude texts based on the value of facets such as their length.

¹¹ The implementation of certain facets, such as the number of occurrences of a given type of reported speech (direct, indirect, free indirect), would require manual intervention. All the same, a mechanism can be defined for a human to annotate it (making it a facet). In a system, such a facet could be implemented on a set of texts. To make such texts coexist in a system with non-annotated texts (treating them as a subcorpus), “not applicable” has to be an accepted value of a facet for a text in a certain pedagogical context.
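The constrained facet described above (an extra test phase after the unconstrained computation, with the constraint carried by the PC) can be sketched as a wrapper. This is a minimal illustration under our own assumptions: the helper `constrained` and the PC key naming scheme `"<facet>_max"` are hypothetical.

```python
from typing import Any, Callable, Mapping

Prism = Callable[[str, Mapping[str, Any]], Any]


def constrained(prism: Prism, facet_name: str) -> Prism:
    """Adds a test phase after the unconstrained computation.

    The constraint is read from the PC under the hypothetical key
    f"{facet_name}_max"; the result is False when the computed value
    exceeds it, and the computed value otherwise.
    """
    def compute(text: str, pc: Mapping[str, Any]) -> Any:
        value = prism(text, pc)  # the non-constrained avatar of the facet
        limit = pc.get(f"{facet_name}_max")
        if limit is not None and value > limit:
            return False
        return value
    return compute


def word_count(text: str, pc: Mapping[str, Any]) -> int:
    # Naive word count: whitespace-separated tokens.
    return len(text.split())


short_enough = constrained(word_count, "wrdCount")
print(short_enough("Para soñar hay que dormir.", {"wrdCount_max": 10}))  # 5
print(short_enough("Para soñar hay que dormir.", {"wrdCount_max": 3}))   # False
```

Because `False == 0` in Python, a caller should test the result with `is False` rather than equality, so that a genuine value of 0 is not mistaken for a failed constraint.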

We became convinced of this when considering higher-level facets. For instance, one can imagine developing a prism which would take into account the information we gathered in our study regarding the activity type¹²: let F_AN be the facet associated with this prism. F_AN could be a boolean property telling whether a text is potentially suitable for an activity. The PC components used would be the activity type and the notion on which to work. The treatment would rely on the facets we have called F_wrdCount and F_repEt, fixing threshold values for each activity type (for instance, a text for a gap-filling exercise could not be longer than n words and could not contain fewer than, say, 5 occurrences of the notion). The constraint on F_wrdCount and F_repEt is directly derivable from the PC of F_AN, which is a clue pointing towards our solution. But the decisive element is the fact that the threshold values that could be defined based on our study, despite lacking precision, come from teacher declarations. Teachers were given the possibility to consider the criterion not pertinent, which means that it very likely corresponds to a feature consciously expected in the text (if not explicitly evaluated), and thus qualifies as a component of the Pedagogical Context. We do not consider this the only solution, but find it a consistent and practical one.
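The higher-level facet F_AN can be sketched as a boolean prism whose PC (activity type, notion) determines the constraints applied to F_wrdCount and F_repEt. The thresholds below are placeholders we chose for illustration and are NOT values derived from the survey; the function names and the naive word and substring counts are likewise hypothetical stand-ins.

```python
from typing import Any, Dict, Mapping, Tuple

# Hypothetical thresholds per activity type: (max words, min occurrences).
# Placeholder numbers only, NOT values derived from the survey.
THRESHOLDS: Dict[str, Tuple[int, int]] = {
    "gap-filling": (300, 5),
    "introduction": (500, 4),
}


def f_wrd_count(text: str, pc: Mapping[str, Any]) -> int:
    # Naive word count: whitespace-separated tokens.
    return len(text.split())


def f_rep_et(text: str, pc: Mapping[str, Any]) -> int:
    # Naive stand-in for the real prism: substring count of the notion.
    return text.lower().count(pc["notion"].lower())


def f_an(text: str, pc: Mapping[str, Any]) -> bool:
    """Is the text potentially suitable for the activity in the PC?

    The PC of F_AN (activity type, notion) yields the constraints
    applied to F_wrdCount and F_repEt.
    """
    max_words, min_occurrences = THRESHOLDS[pc["activity"]]
    return (f_wrd_count(text, pc) <= max_words
            and f_rep_et(text, pc) >= min_occurrences)


text = " ".join(["Hay que practicar."] * 6)
print(f_an(text, {"activity": "gap-filling", "notion": "hay que"}))   # True
print(f_an(text, {"activity": "gap-filling", "notion": "tuvo que"}))  # False
```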

4.6 Facet vs Metadata

The notions of facet and prism make it possible to:

• associate the concept (facet) and its modeling, making explicit the sense of the concept handled by the tool (prism);

• model the influence of the Pedagogical Context on the properties of the objects (texts).

These two characteristics distinguish facets from metadata. According to Bourda, metadata is information on objects which can be understood by humans and processed by software (Bourda, 2002). Both facets and metadata are therefore meant to provide a global point of view of an object rather than highlight information contained in the document (for instance, F_repEt is meant to provide a single value associated with a structure, not to list all the occurrences of the structure). This similarity in the object of both notions is especially conspicuous for constant facets (cf. figure 3), which could be handled with metadata. But in the same way that constant functions such as f: x ↦ 0 are a particular case of functions, constant facets are only a particular case of a generic notion, which cannot be efficiently modeled with metadata.

¹² The actual implementation of such a facet would require much more experimentation: we only have declared practices, which lack precision.

This can be shown with the example of F_repEt. In order to implement a comparable description with metadata, one would need to anticipate every possible request made by teachers. The text “Rabbits run.” would require a descriptor saying it contains one occurrence of the form “rabbits”, but also one occurrence of a form whose lemma is “rabbit”. The text should also be found if the teacher is looking for the form “run”, but also if they are looking for a text containing an occurrence of the present simple of the verb to run. We already have 4 descriptors indicating one occurrence of a given structure. But it might also be pertinent to know that the text contains one occurrence of “rabbits run”, one of a form whose lemma is “rabbit” with the verb run, one occurrence of the form run associated with a plural subject, etc. And this only concerns a two-word text.

When the Pedagogical Context offers a certain variety of potential values, each of which should be associated with a value for each text, the fixedness of metadata requires anticipating every single one of them, which is potentially hazardous, or inefficient as far as storage is concerned (in our example, despite not being exhaustive, we found 7 descriptors for a single facet and a two-word text). Facets and prisms, by associating a property and a means to compute it, introduce flexibility and dynamicity in the description of resources, which seem necessary to handle the notion of Pedagogical Context.
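The storage argument can be made concrete with a toy enumeration. The sketch below, which rests on our own assumptions, uses a hypothetical annotation of “Rabbits run.” with three attributes per token (form, lemma, tag; tags are illustrative). It counts every descriptor a metadata record would have to store in advance, namely every contiguous token sequence seen through every attribute combination, and contrasts this with a prism-like function that evaluates only the query actually asked. With k attributes, a window of n tokens alone admits k^n descriptors. The resulting count (15) is not comparable with the 7 descriptors listed above, since the attribute scheme differs; it only shows how quickly such a set grows.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Descriptor = Tuple[Tuple[str, str], ...]

# Toy morphological annotation of "Rabbits run." (tags are illustrative).
TOKENS: List[Dict[str, str]] = [
    {"form": "rabbits", "lemma": "rabbit", "tag": "NNS"},
    {"form": "run", "lemma": "run", "tag": "VBP"},
]
ATTRS = ("form", "lemma", "tag")


def all_descriptors(tokens: List[Dict[str, str]], max_len: int) -> Set[Descriptor]:
    """Every descriptor metadata would need to store beforehand."""
    descriptors: Set[Descriptor] = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            for choice in product(ATTRS, repeat=n):
                descriptors.add(tuple((a, t[a]) for a, t in zip(choice, window)))
    return descriptors


def match_count(tokens: List[Dict[str, str]], query: Descriptor) -> int:
    """What a prism does instead: evaluate the one query actually asked."""
    n = len(query)
    return sum(
        all(t[a] == v for (a, v), t in zip(query, tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    )


print(len(all_descriptors(TOKENS, 2)))                                # 15
print(match_count(TOKENS, (("lemma", "rabbit"), ("lemma", "run"))))   # 1
```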

5 TOWARDS IMPLEMENTATION

The example of FrepEt leads us to consider implementation options. Indeed, to introduce flexibility and to make the computation of facet values possible, the information on the text provided by FrepEt relies on information added to the text. The values of FrepEt could be computed by first performing a morphological analysis of the text, and then applying regular expressions to the resulting annotated version of the text. We will refer to the information added to the text by the first part of the process as underlying properties of the text. These are analyzed to provide information on the text, namely facet values.
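The two-step computation of FrepEt described above can be sketched as follows. This is a minimal illustration under stated assumptions: a toy lookup table stands in for a real morphological analyser, the annotated text uses a hypothetical form|lemma|tag token format, and the Penn-style tags (VBD for the preterit) and the helper names `analyse` and `frep` are ours, not MIRTO's.

```python
import re

# Toy lexicon standing in for a real morphological analyser (assumption).
LEXICON = {
    "goldilocks": ("Goldilocks", "NNP"),
    "was": ("be", "VBD"),
    "hungry": ("hungry", "JJ"),
    "she": ("she", "PRP"),
    "tasted": ("taste", "VBD"),
    "the": ("the", "DT"),
    "porridge": ("porridge", "NN"),
}


def analyse(text):
    """Pre-processing: add underlying properties, one form|lemma|tag token per word."""
    tokens = []
    for form in re.findall(r"[A-Za-z]+", text):
        lemma, tag = LEXICON.get(form.lower(), (form.lower(), "UNK"))
        tokens.append(f"{form}|{lemma}|{tag}")
    return " ".join(tokens)


def frep(annotated, pattern):
    """Computation: number of occurrences of a structure, given as a regex."""
    return len(re.findall(pattern, annotated))


FIELD = r"[^|\s]+"  # one field of a token: no bar, no whitespace
PRETERIT = rf"{FIELD}\|{FIELD}\|VBD(?=\s|$)"
ANY_FORM_OF_BE = rf"{FIELD}\|be\|{FIELD}"
PRONOUN_THEN_PRETERIT = rf"{FIELD}\|{FIELD}\|PRP {PRETERIT}"

annotated = analyse("Goldilocks was hungry. She tasted the porridge.")
print(frep(annotated, PRETERIT))               # 2
print(frep(annotated, ANY_FORM_OF_BE))         # 1
print(frep(annotated, PRONOUN_THEN_PRETERIT))  # 1
```

Because the structure is a parameter of the computation rather than a stored descriptor, multi-word structures such as “pronoun followed by a preterit” cost nothing extra to support.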

When implementing this sequence of treatments with a view to indexing texts, the addition of underlying properties (morphological analysis for FrepEt), which will be referred to as pre-processing, should be performed once and for all, when the text is added to the system. On the other hand, to introduce the dynamicity that metadata lacks, the computation of facet values, which we will refer to as computation, needs to be performed when the user queries the system.
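The division of labour between insertion time and query time can be illustrated with a small in-memory text base. The sketch assumes a trivial stand-in for pre-processing (lower-cased word tokens) and a hypothetical `occurrences` computation whose parameter plays the role of the value chosen in the Pedagogical Context. Class and method names are illustrative only.

```python
import re
from typing import Any, Callable


def tokenise(text: str) -> list[str]:
    """Stand-in pre-processing (assumption): lower-cased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def occurrences(tokens: list[str], form: str) -> int:
    """Stand-in computation: occurrences of the form requested by the teacher."""
    return tokens.count(form)


class TextBase:
    def __init__(self, preprocess: Callable[[str], Any]):
        self._preprocess = preprocess
        self._underlying: dict[str, Any] = {}  # title -> underlying properties
        self.preprocess_calls = 0

    def add_text(self, title: str, text: str) -> None:
        # Pre-processing: performed once and for all, when the text is added.
        self._underlying[title] = self._preprocess(text)
        self.preprocess_calls += 1

    def query(self, computation: Callable[[Any, Any], Any], pc_value: Any) -> dict[str, Any]:
        # Computation: performed at query time, on the stored underlying properties.
        return {title: computation(props, pc_value) for title, props in self._underlying.items()}


base = TextBase(tokenise)
base.add_text("Rabbits", "Rabbits run. Rabbits eat.")
base.add_text("Bears", "Three bears ran.")
print(base.query(occurrences, "rabbits"))  # {'Rabbits': 2, 'Bears': 0}
print(base.query(occurrences, "bears"))    # {'Rabbits': 0, 'Bears': 1}
print(base.preprocess_calls)               # 2
```

However many queries are run, each text is pre-processed only once; only the cheap computation step is repeated with each new value.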

5.1 Prisms and Functions

This decomposition of the prism's mechanism into a sequence of treatments, split between pre-processing and computation, allows us to answer the question raised in note 11 (p. 6). When implementing a facet-based system, a prism mechanism can require human pre-processing, but computation needs to be fully automatable.

As far as implementing prisms is concerned, to allow the system to evolve and to take advantage of already developed tools (especially NLP procedures), we recommend reusing the concept of function as defined in MIRTO (Antoniadis et al., 2004). From this point of view, a prism is linked to a facet and composed of two sequences of functions: pre-processing and computation (cf. figure 6).
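Under this view, a prism can be represented as a facet name paired with two sequences of functions. The sketch below assumes that pre-processing functions take one argument and that each computation function also receives the value chosen in the Pedagogical Context. The `Prism` class and the example facet (number of sentences longer than a given number of words) are hypothetical, not taken from MIRTO.

```python
import re
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Sequence


@dataclass
class Prism:
    facet: str
    pre_processing: Sequence[Callable[[Any], Any]]
    computation: Sequence[Callable[[Any, Any], Any]]

    def preprocess(self, text: str) -> Any:
        # Chain the pre-processing functions: each one receives the previous output.
        return reduce(lambda data, fn: fn(data), self.pre_processing, text)

    def compute(self, underlying: Any, pc_value: Any) -> Any:
        # Chain the computation functions, passing the PC value to each of them.
        return reduce(lambda data, fn: fn(data, pc_value), self.computation, underlying)


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenise_sentences(sentences: list[str]) -> list[list[str]]:
    return [s.split() for s in sentences]


def sentence_lengths(sentences: list[list[str]], _pc: Any) -> list[int]:
    return [len(s) for s in sentences]


def count_longer_than(lengths: list[int], max_words: int) -> int:
    return sum(1 for n in lengths if n > max_words)


long_sentences = Prism(
    facet="sentences longer than N words",
    pre_processing=[split_sentences, tokenise_sentences],
    computation=[sentence_lengths, count_longer_than],
)
underlying = long_sentences.preprocess(
    "Goldilocks was hungry. She tasted the porridge from the first bowl. So she left."
)
print(long_sentences.compute(underlying, 5))  # 1
print(long_sentences.compute(underlying, 2))  # 3
```

Keeping the two sequences apart is what allows the output of `preprocess` to be stored when a text is added, while `compute` runs at query time with whatever value the teacher selects.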

[Figure 6 diagram: language teacher; interface / coordination module; text collection; functions Fn1, Fn2, Fn3, Fn4, …, Fni, …, Fnn; pre-processing; computation; prisms; views; scripts; selection; evaluation.]

Figure 6: Proposed general architecture for a facet-based system.

5.2 Views

In figure 6, prisms are not the only entities composed of functions. As an extension of the indexation system, and as a means for the user to interact with the system, we introduce the notion of views. Considering the complexity of certain properties involved in the search for a text to use in language teaching, and the difficulty of achieving reliability in NLP when moving away from the form, a realistic approach needs to acknowledge the amount of work left to the user during the evaluation phase. Among other considerations, the fact that “100% reliability is, and may stay in the future, an unattainable goal. Therefore it is more realistic to stress on ‘assisted’ rather than ‘fully automated’ approaches” (Blanchard et al., 2009) is at the origin of their “didactic triangulation strategy”. Adapted to our problem, views come as a means to assist language teachers in the evaluation phase. They are meant to give the user access to some of the underlying information in order to help them in their evaluation, adopting a qualitative point of view where prisms are quantitative. An example is given in figure 7.

[Figure 7 screenshots: two views for the PC “Preterit verbs”, each listing the candidate texts Pipeline (8), Pixies (5) and 3 bears (13), with List and Text tabs. List view of “The Story of Goldilocks and the Three Bears”: tasted (3), was (2), answered, came, explained, knocked, said, walked, went, were. Text view of the same story (“Once upon a time, there was a little girl named Goldilocks. She went for a walk in the forest. …”), with its preterit verbs highlighted.]

Figure 7: Example of views linked to FrepEt for the PC preterit.

A user looking for a text so that their learners can work on a structural exercise on the preterit tense in English might want a text with at least 7 occurrences of the tense. They might also want to make sure that the text contains irregular verbs, including “to be”. To discard a text, the list view would be sufficient and might be more convenient than the highlighted view (see figure 7). The latter offers the teacher an in-context glimpse of the verbs, which might be preferable to make sure that the resulting activity would not prove too difficult (or too easy) for the learners.
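Both views in figure 7 can be derived from the same underlying properties as FrepEt. The following is a minimal sketch, assuming the pre-processing produced (form, lemma, tag) triples with VBD marking the preterit. The bracket highlighting, the function names and the `meets_request` check for the request described above are illustrative assumptions.

```python
from collections import Counter

# Hypothetical underlying properties shared with FrepEt: (form, lemma, tag).
TOKENS = [
    ("Goldilocks", "Goldilocks", "NNP"), ("was", "be", "VBD"), ("hungry", "hungry", "JJ"),
    ("She", "she", "PRP"), ("tasted", "taste", "VBD"), ("the", "the", "DT"),
    ("porridge", "porridge", "NN"), ("She", "she", "PRP"), ("tasted", "taste", "VBD"),
    ("it", "it", "PRP"), ("again", "again", "RB"), ("She", "she", "PRP"),
    ("was", "be", "VBD"), ("full", "full", "JJ"), ("She", "she", "PRP"),
    ("left", "leave", "VBD"),
]


def list_view(tokens, tag="VBD"):
    """List view: matching forms, most frequent first, counts shown when > 1."""
    counts = Counter(form.lower() for form, _, t in tokens if t == tag)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{form} ({n})" if n > 1 else form for form, n in ordered]


def text_view(tokens, tag="VBD"):
    """Text view: the text itself, with matching forms highlighted in brackets."""
    return " ".join(f"[{form}]" if t == tag else form for form, _, t in tokens)


def meets_request(tokens, minimum=7, required_lemma="be", tag="VBD"):
    """Quantitative check such as 'at least 7 preterits, including to be'."""
    preterits = [(form, lemma) for form, lemma, t in tokens if t == tag]
    return len(preterits) >= minimum and any(lemma == required_lemma for _, lemma in preterits)


print(list_view(TOKENS))                  # ['tasted (2)', 'was (2)', 'left']
print(text_view(TOKENS))
print(meets_request(TOKENS))              # False: only 5 preterits
print(meets_request(TOKENS, minimum=5))   # True
```

The quantitative check corresponds to what a prism would answer, while the two views only rearrange the same annotations so that the teacher can judge them qualitatively.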

The notion of view has not been fully formalized yet. The link with facets has to be specified further: are some of the views completely independent from any facet (and thus prism), relying on their own pre-processing, or should they all be linked to a facet the way the views in figure 7 are linked to FrepEt? Should the views that are linked to specific facets be linked to them solely through their common pre-processing, or should they first and foremost be linked to a prism?

6 CONCLUSIONS

We introduced this model as a second version of a previous work (Loiseau et al., 2008). This new version is not only justified by a concern to make it clearer: despite being similar in philosophy, it comes after the theorization of the notion of Pedagogical Context. Even though it was present in the first version of the model, PC was only roughly defined. The work on this notion has allowed us to build the notions of facet and prism on a sounder basis, and their meaning has shifted accordingly. In the first version, the prism was a global module of the system handling all processes; it is now explicitly linked to a facet, thus underlining the tight link between the two.

Despite its simplicity, prism PwrdCount exemplifies this relation, the kind of approximation inherent to the task at hand and the usefulness of NLP in the implementation of such a system. Depending on the capacities of the pre-processing¹³, the definition of the facet can be altered (or the other way around). The word count can be based on a list of separators between which lie the words to be counted. But in this case the French “chou-fleur” could be counted as two words, while it actually designates a single object (cauliflower)¹⁴. The choice of treatment can stem from a didactic question: if one wants to evaluate the length of the text in order to give an idea of its size, counting the parts of compounds as separate words might not be a problem. But one might consider that the word count should be as consistent as possible with the linguistic definition of word. Teachers might actually be interested in counting only non-function words, in order to get a better grasp of the amount of vocabulary needed to understand the text. On the other hand, the choice of what the facet actually means might come from purely practical reasons: the available word count function works with no dictionary whatsoever and cannot distinguish function words from others, or even identify a compound. In both cases, the link between the concept behind the facet and the prism should remain unaltered, even if it means modifying the prism, the facet or both.

¹³ In this case the pre-processing could actually evaluate the property, due to its independence from the PC.

¹⁴ ‘-’ should be a separator in French since it is added when the verb and subject are inverted to form a question: Dort-elle ? Oui, elle dort comme une masse. (Is she sleeping? Yes, she is sleeping like a log.)
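The three definitions of the word-count facet discussed above can be contrasted in a short sketch. The compound list and function-word list are tiny hypothetical stand-ins for the dictionary a real pre-processing would need, and the separator set is an assumption. The point is only that the same text yields different facet values depending on which definition the prism implements.

```python
import re

# Hypothetical, deliberately tiny linguistic resources.
COMPOUNDS = {"chou-fleur"}
FUNCTION_WORDS = {"elle", "il", "un", "une", "le", "la", "les", "de", "du", "et"}


def _chunks(text):
    """Split on whitespace, apostrophes and punctuation, keeping hyphens for now."""
    return [c for c in re.split(r"[\s'.,;:!?]+", text) if c]


def count_by_separators(text):
    """Separator-only count: every hyphen splits, so 'chou-fleur' is two words."""
    return sum(len([p for p in c.split("-") if p]) for c in _chunks(text))


def words_with_compounds(text):
    """Hyphens split words (e.g. 'Dort-elle') unless the whole chunk is a known compound."""
    words = []
    for chunk in _chunks(text):
        if chunk.lower() in COMPOUNDS:
            words.append(chunk)
        else:
            words.extend(p for p in chunk.split("-") if p)
    return words


def count_with_compounds(text):
    return len(words_with_compounds(text))


def count_content_words(text):
    """Only non-function words, as a rough proxy for the vocabulary load."""
    return sum(1 for w in words_with_compounds(text) if w.lower() not in FUNCTION_WORDS)


sample = "Dort-elle ? Elle mange un chou-fleur."
print(count_by_separators(sample))   # 7
print(count_with_compounds(sample))  # 6
print(count_content_words(sample))   # 3
```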

The meaning of view has also changed (the view of this version of the model corresponds more or less to the visualization of the former one), which has led to changes in the implementation. The questions raised in the previous section by this extension to the evaluation task are among the various implementation questions at hand. We are implementing a prototype of this version of the model. It will undoubtedly raise more questions, such as the definition of a framework for prisms in order to make their development and integration easier.

Such a definition could also lead us to consider the problem of the system's adaptation to its users, up to allowing them to create their own prisms and facets. Indeed, we have seen with FAN that a new prism with didactic added value could be implemented with very little processing (definition of threshold values) beyond the grouping of two existing prisms. Careful analysis and specification of the implementation consequences of the properties of prisms might constitute a viable path toward end-user programming functionalities (Nardi, 1993) through the creation of compound prisms.
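A compound prism of this kind can be sketched as a higher-order function: it takes two existing prisms, combines their values and maps the result onto labelled bands delimited by threshold values. This is an illustration under assumptions, not the definition of FAN. Prisms are simplified here to functions from a text to a number, and the two base prisms, the ratio and the thresholds are invented for the example.

```python
from bisect import bisect_right
from typing import Callable, Sequence

PrismFn = Callable[[str], float]  # simplified prism: text -> facet value


def compound_prism(first: PrismFn, second: PrismFn,
                   combine: Callable[[float, float], float],
                   thresholds: Sequence[float], labels: Sequence[str]) -> Callable[[str], str]:
    """Group two existing prisms and classify the combined value with thresholds."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("expected exactly one more label than thresholds")

    def prism(text: str) -> str:
        value = combine(first(text), second(text))
        # bisect_right returns the index of the band the value falls into.
        return labels[bisect_right(thresholds, value)]

    return prism


def word_count(text: str) -> float:
    return len(text.split())


def long_word_count(text: str) -> float:
    return sum(1 for w in text.split() if len(w) > 6)


long_word_ratio = compound_prism(
    word_count, long_word_count,
    combine=lambda total, long_: long_ / total if total else 0.0,
    thresholds=[0.1, 0.25], labels=["low", "medium", "high"],
)
print(long_word_ratio("the cat sat on the mat"))                              # low
print(long_word_ratio("the big elephant sat down"))                           # medium
print(long_word_ratio("remarkable elephants wandered across the grassland"))  # high
```

The only new input required from the user is the combination rule and the thresholds; everything else is reused from the two existing prisms.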

REFERENCES

Antoniadis, G., Echinard, S., Kraif, O., Lebarbé, T., Loiseau, M., & Ponton, C. (2004). NLP-based scripting for CALL activities. In Lemnitzer, E. H. L., editor, COLING 2004 eLearning for Computational Linguistics and Computational Linguistics for eLearning, pages 18–25, Genève. COLING. Available from: http://hal.archives-ouvertes.fr/hal-00190373/fr/.

Antoniadis, G., Echinard, S., Kraif, O., Lebarbé, T., & Ponton, C. (2005). Modélisation de l'intégration de ressources TAL pour l'apprentissage des langues : la plateforme MIRTO. ALSIC, 8(Numéro spécial TALAL):65–79. Available from: http://alsic.u-strasbg.fr/v08/antoniadis/alsic_v08_04-rec4.htm.

Balatsoukas, P., Morris, A., & O'Brien, A. (2008). Learning objects update: Review and critical approach to content aggregation. Journal of Educational Technology & Society, 11(2):119–130. Available from: http://www.ifets.info/journals/11_2/11.pdf.

Bertrand, A., Cellier, J.-M., & Giroux, L. (1996). Expertise and strategies for the identification of the main ideas in document indexing. Applied Cognitive Psychology, 10(5):419–433. Available from: http://www3.interscience.wiley.com/journal/21437/abstract.

edna. edna resources - metadata application profile [online]. (2006). Available from: http://www.edna.edu.au/edna/webdav/site/myjahiasite/shared/edna_resources_metadata_1.0.pdf.



GEM. Listing of GEM 2.0 top-level elements [online]. (2004). Available from: http://www.thegateway.org/about/documentation/metadataElements/index_html.

Blanchard, A., Kraif, O., & Ponton, C. (2009). Mastering noise and silence in learner answers processing: simple techniques for analysis and diagnosis. CALICO Journal. Available from: http://tr.im/calicoabokcp.

Bourda, Y. (2002). Des objets pédagogiques aux dossiers pédagogiques (via l'indexation). Document numérique, 6(1-2):115–128. Available from: http://www.cairn.info/revue-document-numerique-2002-1-page-115.htm.

Charlier, E. (1989). Planifier un cours, c'est prendre des décisions. Pédagogies en développement. Série 5, Nouvelles pratiques de formation. De Boeck Université, Bruxelles; Paris.

IEEE (2002). Final 1484.12.1 LOM draft standard document. Technical report, IEEE LTSC WG12. Available from: http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf.

Loiseau, M. (2009). Élaboration d'un modèle pour une base de textes indexée pédagogiquement pour l'enseignement des langues. PhD thesis, Université Stendhal Grenoble 3. Available from: http://tel.archives-ouvertes.fr/tel-00440460/fr/.

Loiseau, M., Antoniadis, G., & Ponton, C. (2008). Model for pedagogical indexation of texts for language teaching. In Cordeiro, J., Shishkov, B., Ranchordas, A., & Helfert, M., editors, ICSOFT (ISDM/ABF), volume ISDM/ABF, pages 212–217. INSTICC Press. Available from: http://mathieu.loiseau.free.fr/bdtip/fichiers/articles/icsoft-2008.pdf.

Nardi, B. A. (1993). A Small Matter of Programming: Perspectives On End User Computing. MIT Press, second printing (1995) edition.

Pernin, J.-P. (2006). Normes et standards pour la conception, la production et l'exploitation des EIAH. In Grandbastien, M. & Labat, J.-M., editors, Environnements informatiques pour l'apprentissage humain, pages 201–222. Hermès et Lavoisier, Paris.

Recker, M. M. & Wiley, D. A. (2001). A non-authoritative educational metadata ontology for filtering and recommending learning objects. Interactive Learning Environments, 9(3):255–271. Available from: http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=5848430&site=ehost-live.

SCORM (2006). SCORM overview. Specification SCORM 2004 3rd Edition Content Aggregation Model Version 1.0, Advanced Distributed Learning. Available from: http://tr.im/scorm2004_3.

Yinger, R. J. (1978). A study of teacher planning: Description and a model of preactive decision making. East Lansing, MI. Michigan State University, Institute for Research on Teaching. Available from: http://www.eric.ed.gov/ERICWebPortal/detail?accno=ED152747.

APPENDIX: ACRONYMS

CALL Computer Assisted Language Learning

DCMI Dublin Core Metadata Initiative

edna Educational Network of Australia

EIAH Environnements Informatiques pour l'Apprentissage Humain (computer-based environments for human learning)

GEM the Gateway to Educational Materials

LOM Learning Object Metadata

MIRTO Multi-apprentissages Interactifs par des Recherches sur des Textes et l'Oral

NLP Natural Language Processing

PC Pedagogical Context

SCORM Sharable Content Object Reference Model

TAL Traitement Automatique des Langues (natural language processing)



1 · icsoft2010.bib · 2010-08-02 17:25 · Mathieu Loiseau

@proceedings{Cordeiro:2010fk,
Booktitle = {Proceedings of the Fifth International Conference on Software and Data Technologies},
Colloque = {ICSOFT 2010},
Dates = {22-24 juillet 2010},
Editor = {Jos{\'e} Cordeiro and Maria Virvou and Boris Shishkov},
Isbn = {978-989-8425-22-5},
Lieu = {Le Pir{\'e}e},
Year = {2010}}

@inproceedings{Loiseau:2010kx,
Address = {Lisboa},
Author = {Mathieu Loiseau and Georges Antoniadis and Claude Ponton},
Crossref = {Cordeiro:2010fk},
Organization = {INSTICC},
Pages = {413--422},
Title = {Facet and prism based model for pedagogical indexation of texts for language learning},
Volume = {2}}
