Learning Analytics Community Exchange
Interoperability Study – Assessment and Allied
Activities Draft for Public Comment
By: Adam Cooper
Published: 09 December 2014
Keywords: learning analytics, assessment, quiz, essay, survey, activity
data
This working document considers the kind of events that are likely to
occur in mainstream assessment processes, and allied activity such as
questionnaire responses. It does so from the perspective of asking:
“which events are of interest for learning analytics?” Its purpose is to
identify that set of events and attributes which may be considered to be
commonplace, and therefore a candidate for cross-platform data stores,
analysis-time mapping, improving interoperability, and standardisation.
Audience: this is a relatively technical document aimed at readers with
experience in software development and architecture, or development
of interoperability standards, etc.
Contents
1. Introduction
   Caveat
2. Assessment Analytics Context
3. A Simple Core Model
   Assumptions
   Data for All Events
   Core Model Events and Their Attributes
   Possible Missing Pieces
4. Current Standards
   PSLC DataShop
   ADL Experience API (xAPI)
   IMS LIS
   IMS Caliper
   MOOCdb
5. Source Material
   Technical
   References
6. About ...
1. Introduction
This working document1 explores the processes of various kinds of mainstream assessment and
allied activities from the perspective of asking “which events may be of interest for learning
analytics?” This exploration is not ab-initio, but takes IMS QTI as a starting-point conceptual model.
The events of interest include those triggered by the learner as well as events that pertain to the
learner’s actions, for example scoring.
The immediate purpose in conducting this study is to determine that set of events and attributes
which may be considered to be commonplace, and therefore a candidate for cross-platform data
stores, analysis-time mapping, improving interoperability, and potentially standardisation.
The following are considered to be in scope as “assessment and allied activities”:
- Objective questions delivered electronically and scored automatically.
- Manual essay marking, including using “electronic management of assessment” tools such as Turnitin Grademark, and automatic originality evaluation (often termed plagiarism detection).
- Surveys/questionnaires delivered electronically, but not scored. For example, end of module satisfaction surveys.
- Competency examinations, observation of candidate behaviour and assessment against objective descriptions of skills.
- Objective questions delivered physically and scored automatically (e.g. optical mark reading).
- Use of “clickers”.
- Module/course/unit level grading.
- Assessment of portfolio evidence.
- Double marking, moderation, and other managed quality assurance and adjustment activities.
- Automated score adjustment, such as lateness penalties, “extra credit” options, etc.
For those situations where electronic delivery occurs, this may be achieved in fully-online or online-offline-sync scenarios. Events may originate from one or both of the client (user device) and the server (responsible for delivery).
These activities do not involve an identical set of events but have sufficient overlap to motivate an
exploration of the common ground between them. Such a common vocabulary – the specific one
developed in this paper will be referred to as the Core Model – should make it easier to develop re-
usable analytical methods2, i.e. to be general purpose. Additions may be required to allow for
assessment activities that do not fit precisely into the assumed stereotypes.
Although a general purpose Core Model promises benefits through consistency, an over-general
approach forces a degree of abstraction, or extensive optionality, which would make consistent use
of the Core Model more difficult, consequently making analysis more difficult or impractical. Hence,
1 As a working document, it is subject to change without archival of previous revisions.
2 By considering a range of applications, it is assumed that the resulting Core Model is more likely to be general
purpose and not to only be suited to one or two models for how assessment happens.
the following are considered out of scope on the grounds that they represent a small fraction of the
assessment activity currently undertaken in educational/training establishments yet would require
the addition of complexity into a Core Model3:
- Adaptive testing.
- Peer assessment.
- Assessment in the context of Intelligent Tutoring Systems.
- Specialist assessment models/theories such as Item Response Theory.
Section 5, “Source Material”, gives references to the technical and other sources used.
Caveat
Although it rests on the existing, and widely implemented, basis of IMS QTI, this study should be considered speculative. It is a starting point from which to consider each of the bulleted applications given in the introduction, as general cases that summarise a considerable range of real-world variety.
2. Assessment Analytics Context
Before considering the specifics of events in assessment and allied activities, it is sensible to briefly
consider the ultimate application of the data for learning analytics. What is it that people4 are likely
to want to do with this data?
It seems reasonable to claim that assessment processes are the oldest source of data about learning
yet data from assessment has not received a great deal of attention in the literature on learning
analytics, except at a coarse level: predictions based on macro-level outcomes and grade point
averages, analytics looking at the relationship between summative outcomes and candidate
attributes, or simple visualisations of scores. This belies two realities: the ease with which assessment analytics can be aligned to current teaching and learning practice, and the extensive history of psychometrics. Brief comment is made on these two points (the place of assessment in teaching and learning practice is taken as self-evident), followed by comment on existing research literature from the Learning Analytics and Educational Data Mining communities.
Public discourse in the e-Learning space makes it clear that there is growing interest in the Electronic
Management of Assessment5 (EMA). Although the focus of attention is not on learning analytics,
these platforms are relevant in what they make possible. Combining this with the evolving support
for assessment activities in software such as Moodle, Blackboard, and Turnitin, and a sizable installed base of specialised e-Assessment platforms6, suggests that there is a good basis for assessment analytics.
3 They are, arguably, better dealt with in a special-purpose, rather than general-purpose, scheme.
4 The assumption here is that these people are working in the context of an education or training organisation, or a software/content provider to those organisations. Organisations specialising in testing and assessment are not the expected users of the Core Model.
5 For example, Jisc, which supports effective use of technology across UK Higher and Further Education, has a current (2014) project on the topic - http://www.jisc.ac.uk/research/projects/electronic-management-of-assessment .
6 The 2014 survey of Technology Enhanced Learning, http://www.ucisa.ac.uk/tel , conducted by the UK organisation UCISA, showed that the following percentages of respondents had these as centrally-supported e-assessment tools: Blackboard (50%), Moodle (29%), Questionmark Perception (23%).
Alignment of assessment analytics to current practice could be expanded upon in various ways, but
the overall idea is that current practice provides a jumping-off point for adapting practice to
incorporate more use of the results of data analysis. This might involve improving the identification
of learner uncertainty/difficulty, providing more actionable feedback, improving the assessment
instruments, identifying weaknesses in the learning activities and resources, etc. These practices
occur today, but they could be enhanced in scale and quality with access to data with suitable
structure and level of detail.
The learning analytics discourse has given surprisingly little attention to these aspects, given the way
assessment affects all learners and is embedded in teaching and learning practice (Ellis 2013). Ellis’s
work is interesting both because she identifies assessment as a good entry point for learning
analytics with teaching staff, and because she focuses particularly on electronic support for teacher-
marked essays, which challenges the common assumption that assessment analytics is only about
“quizzes” containing objective questions. The use cases emerging from the use of Electronic
Management of Assessment (EMA) systems for essay marking7 provide a useful counter-balance to
the prevailing view of “e-assessment emphasis”.
Although psychometrics – the theory and techniques of psychological measurement – is a field with
much that is accessible only to assessment specialists and research workers, it includes Classical Test
Theory (CTT), which should be accessible to anyone competent to carry out learning analytics8. CTT
can be employed for the following cases9 in a typical learning analytics setting:
- Item level difficulty, discrimination, reliability, etc. IMS QTI includes support for item level usage statistics.
- Test-level utility, determining whether the test tells us anything useful about the candidates.
- Inter-rater reliability, in which the consistency of scoring by two or more human or computer-based markers is compared statistically.
These do not cover all aspects of assessment-related analytics, being predominantly about the
instruments of assessment being exercised on a cohort/sample10. They are oriented towards, and
commonly used to evaluate, the quality of summative assessments, and are less suited to, and less commonly used for, supporting the processes of teaching and learning, although CTT is used to identify possible issues with learning activity design and learning resource content.
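To indicate the scale of analysis implied, the sketch below shows how item difficulty and a simple discrimination measure (an item-total correlation) might be computed at analysis time from a matrix of captured item scores. It is illustrative only – the Core Model is concerned with capturing the underlying data, not these statistics – and the function names and the choice of correlation are assumptions made for the example.

```python
# Illustrative classical test theory (CTT) item statistics, computed at analysis time
# from captured item-level scores. Rows are candidates, columns are items; 1 = correct.
from statistics import mean, pstdev

def item_difficulty(scores):
    """Proportion of candidates answering each item correctly (the classical p-value)."""
    n_candidates = len(scores)
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / n_candidates for i in range(n_items)]

def item_discrimination(scores):
    """Item-total (Pearson) correlation as a simple discrimination index."""
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    indices = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        if pstdev(item) == 0 or pstdev(totals) == 0:
            indices.append(0.0)  # no variation, so no meaningful discrimination
            continue
        covariance = mean(x * t for x, t in zip(item, totals)) - mean(item) * mean(totals)
        indices.append(covariance / (pstdev(item) * pstdev(totals)))
    return indices

# Four candidates attempting three items
scores = [[1, 1, 0],
          [1, 0, 0],
          [1, 1, 1],
          [0, 1, 1]]
print(item_difficulty(scores))       # [0.75, 0.75, 0.5]
print(item_discrimination(scores))   # item-total correlations
```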
The Educational Data Mining (EDM) conference proceedings and journal, the Learning Analytics and
Knowledge (LAK) conference proceedings, and the Journal of Learning Analytics contain numerous
7 Cath Ellis’s presentation to the ALT ELESIG – video at http://vimeo.com/85331242 – and to eLearning Forum Asia – slides at http://elfasia.org/2012/wp-content/uploads/2011/11/Breakout-Session2A-Speaker_2.pdf – make the case for assessment analytics to improve student learning.
8 The Questionmark white paper, “Item Analysis Analytics” by Greg Pope (available via https://www.questionmark.com/us/whitepapers/Pages/default.aspx), illustrates this scope of accessibility. Moodle also provides a “quiz statistics report”, which includes standard CTT measures (https://docs.moodle.org/28/en/Quiz_statistics_report). Software libraries exist for CTT, for example for R: http://cran.r-project.org/web/views/Psychometrics.html
9 Note: the Core Model will not consider these statistics per se as they are the product of analysis; we are only concerned with the capture of data to support these kinds of analysis.
10 For example, item difficulty is only meaningful when referred to a sample; the same question might be trivial to a mathematics undergraduate and impossible for a middle school student.
references to assessment. On the whole, however, these are not particularly relevant to this study,
oriented as it is to practical learning analytics activities in the near- to mid-term, because they:
- fall outside either the likely competence or educational practices of typical educational establishments; or
- deal with summative results rather than more granular records of activity.
There are some exceptions to this general statement, for example a poster at the 2014 EDM
Conference entitled “Towards Uncovering the Mysterious World of Math Homework” (Feng 2014). A
notable exception is a paper from the 2009 EDM Conference describing the use of process mining
tools to explore patterns of navigation and response events in the taking of online objective tests
(Pechenizkiy et al. 2009), further described in a chapter of the Handbook of Educational Data Mining
(Trcka et al. 2011). The process mining approach is also applicable to situations where events from
different sources overlap, for example where video watching and question answering combine to
give a richer picture of student activity outside the nominal “attempting a question”. The chapter by
Trcka et al. is also interesting in that it describes the use of an established piece of process mining
software, ProM, with an existing XML workflow logging language, MXML, although MXML is now
superseded by XES (Günther & Verbeek 2014), which has been submitted to IEEE for
standardisation.
In summary, the context imagined for this study includes four different kinds of situation in which
assessment and related events have a role in “closing the loop”:
- Assessment instruments are designed and used. Assessment analytics allows for the determination of reliability, efficiency, and validity and the identification of which assessments, items, or markers/scorers are in need of improvement.
- Courses are designed and delivered. Assessment analytics may allow for the objective determination of topics that require more clarity in presentation, opportunities to practice, underpinning material, etc. This may be achieved in “real time” during delivery or in periodic re-design.
- Informative (“instructional”) resources and low-stakes assessment are typically combined in LMSs and similar tools. Understanding patterns of activity across these two kinds of resource can help in the improvement of the informative resources.
- Students are accustomed to receiving marks and written feedback, but assessment analytics can help to pinpoint particular areas as next steps for improvement in knowledge, skills, or performance-acts (e.g. essay construction), etc. in ways that are more precise and convincing.
Finally, it is noted that learning analytics may also be used as an assessment instrument, where
outcome measures are derived from captured activity data. This kind of approach may be able to
extend the diversity of assessment opportunities, especially to include more natural/authentic
situations in a scalable way, for example to encompass ideas of membership and process, or skilled
practice. This is not, however, the focus of this paper.
3. A Simple Core Model
The summary from the previous section has implications at the level of data. It indicates the
potential for the gathering, management, and analysis of assessment-related data that goes well
beyond summative scores11 to incorporate:
- Detail on the strengths and weaknesses in submitted work in a form that allows for corrective action.
- Detail at per-item level.
- Detail in the sequence of events including time-on-task, pathways taken, etc.
These indicate where a Core Model should include detail.
Assumptions
Granularity of Events
It has been assumed that the Core Model will focus on events - at instances in time, not over
durations - that a learner would identify as significant in the assessment process. This does not
generally give data such as “time spent answering question N”, but such statistics may be computed
at analysis time. The data that would be used to compute such derived quantities should be stored
in any case, since it may be useful for computing numerous other metrics or identifying various kinds
of patterns. The principle of storing the most atomic data rather than derived data should offer the
greatest potential utility to a range of analytics situations, including some yet to be practiced. Non-
derived data is also very important for tracking down errors, and it may be a necessary component
of transparent analytics, in which later challenges or queries must be addressed.
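As an illustration of deriving such quantities at analysis time, the sketch below pairs an Access event with a Submit event to estimate time spent on an item. The event structure and field names are assumptions made for the sketch, not a prescription of the Core Model.

```python
from datetime import datetime

# Illustrative only: deriving "time spent on an item" at analysis time from atomic events.
# The field names and ISO-8601 device timestamps are assumptions for the sketch.
access_event = {"action": "Access", "item_ids": ["item-07"],
                "device_time": "2014-11-03T10:15:02"}
submit_event = {"action": "Submit", "item_ids": ["item-07"],
                "device_time": "2014-11-03T10:16:47"}

def seconds_between(earlier, later):
    """Elapsed device-clock time between two events."""
    t0 = datetime.fromisoformat(earlier["device_time"])
    t1 = datetime.fromisoformat(later["device_time"])
    return (t1 - t0).total_seconds()

print(seconds_between(access_event, submit_event))  # 105.0
```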
Treating the events as single atomic items also allows the same vocabulary to be used for the range
of different cases indicated as in scope in the introduction; there will be variety in the lifecycles of
some cases but these may be expressed using a common vocabulary. For example, whereas an
online delivery system might have a very well-defined idea that an attempt at an assessment item
can be defined by the interval between showing that question and a response being made, this is of
dubious utility when an essay is submitted and marked by a tutor. Yet both include a submission
event and the production of a score/grade.
IMS QTI as a Base Model and Vocabulary
IMS QTI has evolved over a number of years and is built on the experience and expertise of members
of the assessment industry, as well as having numerous implementations in software. This is good
reason to believe that it correctly captures many of the key concepts of assessment design and
online delivery, and does so with discrimination. Furthermore, the development of QTI by multiple
participants in the project team gives some assurance that it reflects a range of practices.
The scope of the Core Model, as outlined in the previous sections, differs from QTI in that the Core
Model focuses on the learner/candidate experience, whereas QTI focuses on expressing assessment
11 This document is not concerned with interoperability at the level of summative scores. While it may be an exaggeration to say that this is “a solved problem”, there are several approaches for which there is evidence: IMS LIS Outcomes (and a profile for IMS LTI), SCORM, ADL eXperience API, and IMS QTI Results.
content, how responses should be transformed to outcomes, and a means to convey results
(although the Results Reporting part is the least implemented).
Using IMS QTI as a base model and vocabulary for the Simple Core Model should both reduce the
effort required to describe the Core Model and increase its quality. The effort required becomes
more a case of re-expressing some QTI ideas from a candidate/learner perspective, and investigating
applicability outside the umbrella scenario of e-assessment in QTI (although the QTI Results
Reporting specification is explicit in its coverage of assessment other than by testing).
Minimal Assessment Resource Metadata, No Assessment Resource Content
In practice, there will generally be information about the assessment resource, in addition to the
resource itself, that describes or controls how it is delivered, scored, managed, etc. This information
would be applicable to all learners being assessed in a given instance of the assessment. While this
information will be necessary for some analyses, for the purposes of developing a Simple Core
Model, an approach of minimising metadata and avoiding content has been adopted because:
- It would greatly inflate the task in hand to propose a common model, or such a model already exists (e.g. IMS QTI);
- Capturing learner-level data in a consistent and unified form is taken to be the currently-significant missing piece of the educational technology jigsaw, whereas the metadata and content already exist (although not necessarily in a consistent and unified form);
- Candidate activity and metadata have very different lifecycles and rates of production/change;
- Light-weight event payloads are sought in the interest of scalability and performance.
In general, the operationalisation of “minimal assessment resource metadata” will be the inclusion
in the event payload of identifiers to related facts.
The aim is to strenuously avoid following a design path that leads to a data model that looks like a data warehouse schema for assessment, on the grounds that this would be to create something that is too challenging to adopt, as well as being questionable as an architecture for event logging. The use of identifiers allows later processing into an OLAP12 data cube, etc., should that be required.
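To illustrate the intent of “identifiers to related facts”, the sketch below joins minimal event payloads to a separately-held assessment metadata table at analysis time. The field names and use of in-memory dictionaries are assumptions made purely for illustration.

```python
# Illustrative sketch: event payloads carry only identifiers; descriptive metadata is held
# elsewhere and joined at analysis time. Names and structures are assumed for the example.
assessment_metadata = {
    "assess-2014-mod3-quiz1": {"title": "Module 3 Quiz 1", "max_score": 20},
}

events = [
    {"action": "Submit", "assessment_id": "assess-2014-mod3-quiz1", "user_id": "u-1001"},
]

# Analysis-time "join": enrich each event with the facts its identifier points to.
enriched = [
    {**event, **assessment_metadata.get(event["assessment_id"], {})}
    for event in events
]
print(enriched[0]["title"])  # Module 3 Quiz 1
```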
Individual as the Subject of Assessment
The discussion in this document assumes it is individuals that are the subject of assessment.
Data for All Events
Time
At least three times may be applicable: 1) the clock time of the device being used at the time of the
event; 2) the clock time of the server storing event logs for analytics; and 3) the clock time of the
delivery system (learning management or e-Assessment server). Time zone differences, and the
possibility of online-offline-sync uses, mean that discriminating between these may be very
important for analysis; from the point of view of analysing the learning process, the learner’s local
time is highly relevant.
12 OnLine Analytical Processing, an established business intelligence approach to efficiently querying multi-dimensional data.
The Core Model includes, for all events:
- Clock time of the device being used for access, since this is the best estimator of the learner’s local time.
- Clock time of the delivery system, since this will reflect course delivery timings (e.g. release times, deadlines).
It is assumed that the logging server stores its own time as a matter of course, and that this is available when event data is extracted from store for analysis.
Identifiers
The following are assumed to be identified for all events logged:
- A user identifier. This may require cross-mapping in analysis pre-processing13.
- A session identifier. This would show the authentication session for online use, or its equivalent for off-line cases (e.g. an observation session in a competency assessment).
- One or more identifiers for the assessment and its component parts. See below.
- Identification of the application and version that originates the data. This may influence the meaning or significance attributed to the data when it is analysed. It is equivalent to tool_consumer_info_* in IMS LTI.
In addition, when the assessment items are delivered electronically:
- An “attempt” session identifier. This has application-specific meaning for what constitutes an attempt on an item in an online delivery system. For IMS QTI compliant software, this is the Candidate Session identifier14.
- Identification of the containing learning resource/activity. This may include identification of the “learning context”. These should at least include equivalents for resource link and context ids as defined in IMS LTI. This provides information on where the assessment was launched from and the course/module it is part of.

13 It is common practice to expose different user ids to external tools, and to use different identifiers for data exports that may contain PII (e.g. forum text), as an aspect of data security.
14 This statement is in need of verification. The QTI ASI specification glossary defines Candidate Session in terms of interactions with items but does not make clear what the meaning of the term is when multiple items are viewed simultaneously.
Identifiers for the assessment and component parts have the following structure:
- The assessment. This is always present and must uniquely identify a single assessment opportunity. This is distinct from the QTI usage, which refers to the test (a measuring instrument). An assessment opportunity could be described as an instance of a test available in a limited time window, and generally to a limited number of candidates, but it also includes assessment other than by test. It is identical to the LineItem concept as defined in IMS LIS (Outcomes Management Service).
- An assessment part. Assessment parts distinguish independently-submittable units of the assessment. This must be unique within an assessment and must be present when a subset of items in an assessment is submitted. See the QTI specification of “submissionMode” for an account of the QTI equivalent “testPart” in simultaneous submission mode.
- A section. Section identifiers distinguish groups of assessment items that are presented together, but not with items in another section. For example, if sets of questions are shown one screen-page at a time, the section identifier would show which questions were presented together. There may be one or more sections in a part. If the items in a section are submittable (for outcome processing), then those items also comprise a part. This must be unique within an assessment and must be present when a subset of items in an assessment is shown.
- An item. An atomic unit of scoring that may contain information, instructions, and more than one unit of interaction (e.g. two related multiple choice selections). The concept of an item, and its relationship to interactions, is as described in the QTI Implementation Guide but is not dependent on a delivery system implementing QTI.
Identifiers for these will be abbreviated Aid, Pid, Sid, Iid in the following.
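Pulling the above together, a single logged event might carry an envelope along the lines of the sketch below. This is purely illustrative: the Core Model specifies what information should be captured, not a serialisation, and all field names and values here are assumptions.

```python
# Illustrative event envelope combining the "Data for All Events" elements.
# Field names and structure are assumptions; the Core Model prescribes the information
# to be captured, not a serialisation format.
event_envelope = {
    "device_time": "2014-11-03T10:15:02Z",            # learner's local clock
    "delivery_system_time": "2014-11-03T10:15:04Z",   # e.g. LMS / e-assessment server clock
    "user_id": "u-1001",
    "session_id": "sess-20141103-77",
    "application": {"id": "example-quiz-engine", "version": "2.3"},
    # Identifiers for the assessment and its component parts (Aid, Pid, Sid, Iid):
    "assessment_id": "assess-2014-mod3-quiz1",   # Aid: the assessment opportunity
    "part_id": "part-1",                         # Pid: independently submittable unit
    "section_id": "sect-2",                      # Sid: items presented together
    "item_ids": ["item-07"],                     # Iid(s)
    # For electronic delivery only (assumed names):
    "attempt_session_id": "cs-889",              # cf. QTI Candidate Session
    "resource_link_id": "rl-4520",               # cf. IMS LTI resource link
    "context_id": "course-EDU101-2014",          # cf. IMS LTI context
    # The action and its attributes (see Table 1) would follow:
    "action": "Access",
}
```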
Core Model Events and Their Attributes
Following on from the brief outline of learning analytics requirements, and taking IMS QTI as a
conceptual reference-point, a Core Model is advanced in the following two tables. These outline,
separately, the kinds of event (Table 1) and the nature of the attributes needed to capture the
necessary facts about the event (Table 2).
Action: Access
Event attributes: Aid; Sid; Array of Iid15; Resume; [NumAttempt]
Notes: At least one item is presented to the respondent16. If only some items in an assessment are presented, Sid must be specified. This event must always occur in cases of e-assessment and may also occur if, for example, the task description for an assignment is provided electronically but the activity of producing the assignment is not tracked.

Action: Response Change
Event attributes: Aid; Iid; Item response
Notes: This event, which only applies for fully-tracked electronic delivery (and is not required), allows for the capture of the current state of the response to a given item after it changes17, but before submission or leaving the item(s).

Action: TimeOut
Event attributes: Aid; Sid; Array of Iid
Notes: The delivery system reached the maximum time allowed for a response and prevented further interaction. This may, but need not be, immediately followed by submission of responses for scoring, signalled by a separate event.

Action: Get Hint
Event attributes: Aid; Sid; Array of Iid; Identifier for the hint resource
Notes: This is a no-penalty hint requiring no response processing. See section 7.6 in the QTI specification (infoControl class). Hints will usually be at the level of an item.

Action: Leave
Event attributes: Aid; Sid; Array of Iid; Array of item responses
Notes: The respondent discontinued interaction in a controlled manner (e.g. due to clicking “next”). The current state of their responses may be saved, according to application design, but there is no response processing. The state following this event is equivalent to the pendingSubmission value of sessionStatus in the QTI Results Reporting specification.

Action: Submit
Event attributes: Aid; Pid; Array of Iid; Array of item responses; KeyId; OutcomeExpected; [NumAttempt]
Notes: Response(s) have been submitted, either by an explicit user action or automatically (e.g. following a time-out). Depending on the assessment, it may be that items are submitted individually, in sets (“parts”), or for the whole assessment. In all cases, the 1..* values for the submitted items are included, along with the root Aid. The state following this event is equivalent to the pendingResponseProcessing value of sessionStatus in the QTI Results Reporting specification. For essay assignments and similar cases, the Iid could be considered to be redundant but it must be given for consistency. This may be the first event recorded for some assessments.

Action: Start observing
Event attributes: Aid; Pid; KeyId; Observer identity
Notes: This is for cases where observable behaviour, rather than a response to a question or assignment, is being assessed. This would apply to competence assessment of vocational skills, observation-based assessment of collaboration, etc. This event marks the start of an observation. In some senses, this is comparable to a submission, and the KeyId has equivalent purpose.

Action: Stop observing
Event attributes: KeyId
Notes: Signals that a period of observation has ended. Depending on the situation, outcomes may be determined after “stop observing” or between the start and end of an observation period.

Action: Outcome determined
Event attributes: OutcomeLevel; OutcomeStatus; Identifier(s) for assessment, part, or items (according to the value of OutcomeLevel); Array of OutcomeLists; Assessor identifier; Array of KeyIds
Notes: OutcomeLevel indicates whether the outcome is for an assessment, or an assessment part, or comprises a set of item-level outcomes. For cases of double marking, two outcomes would arise from one submission. An assessment-level outcome may arise from multiple submissions/observations, so multiple KeyIds may be required. Conversely, a single submission of all assessment items may lead to multiple “outcome determined” events, for example if objective questions are mixed with those requiring text responses and human marking. The state following this event is equivalent to the “final” value of sessionStatus in the QTI Results Reporting specification. NB: this does not mean that the outcome is finalised for the assessment; moderation or penalties may apply.

Action: Outcome adjusted
Event attributes: {as outcome determined}; AdjustmentReason
Notes: This allows for various kinds of adjustment to be applied, where the officially-recorded outcome includes a penalty or bonus, e.g. a lateness penalty. The pre-adjusted outcome is a more accurate indication of the student’s ability. This means that adjustments due to moderation (etc.) are not “outcome adjusted” events (see the explanation for the OutcomeStatus attribute below).

Table 1 – Core Model Events

15 Of length 1 for single item presentation and listing all item identifiers in the given assessment or section that is identified. The same applies to all cases where “Array of Iid” is indicated in this table.
16 This may be an item for which no response is possible, e.g. initial instructions.
17 Some detail may be required here if the Core Model were to be developed into a technical specification, although it may be best left as an application-specific rule (e.g. JavaScript onChange event in HTML) as to what constitutes a change.
Attribute: Adjustment reason (Enumeration)
Notes: Lateness, additional credit.

Attribute: Assessor identity
Notes: An identifier of the person or software responsible for declaring the outcome. Although the word “assessor” is used, this could also apply to cases where an outcome declaration is essentially a ratification or verification act.

Attribute: Identifier for the hint resource

Attribute: Item response (an array of {interaction type, cardinality, interaction response, response type})
Notes: Based on the IMS QTI item/interaction model and vocabulary (see below).

Attribute: NumAttempt (Integer)
Notes: The index number of the attempt according to the delivery engine, if known. IMS QTI compliant systems are required to maintain this information but it is not reasonable to expect all tracked applications to do so.

Attribute: Observer identity
Notes: An identifier of the person or software responsible for observing the performance. There will usually be an outcome-determined record with the same agent.

Attribute: OutcomeList (an array of {outcome label, outcome value, outcome data type, outcome reference})
Notes: One or more outcomes, which may be associated with an assessment, part, or item. This is substantially modelled on IMS QTI (see below).

Attribute: OutcomeExpected (Boolean)
Notes: An outcome is expected for this submission. A value of false would apply to a questionnaire. Although this could be inferred from the identity of the application (see “Identifiers”, above), an explicit attribute avoids the need for look-up tables.

Attribute: OutcomeLevel (enumerated value: assessment, part, item)
Notes: Specifies whether the outcome record contains outcomes for individual items (typically item scores), an outcome for an assessment part, or an outcome for the entire assessment.

Attribute: OutcomeStatus (enumerated value: provisional, final)
Notes: This allows for provisional outcomes to be appropriately marked, and final outcomes, determined by processes such as verification, ratification, moderation, etc., to be clearly identified.

Attribute: Resume (Boolean)
Notes: A flag to indicate previously saved responses were restored to the items presented in the Access event.

Attribute: KeyId
Notes: A unique identifier for the submission or observation, usable to ensure correlation of submission/observation with outcomes for cases where outcome processing is delayed and multiple submissions/observations are permitted. Timestamps should allow this correlation to be inferred; this identifier ensures the correlation is accurately known.

Table 2 – Attributes for Core Model Events
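As a worked illustration of Table 1 and Table 2, the following sketch shows what an Access event followed by a Submit event might look like for a single multiple-choice item. The dictionary representation, field names, and identifier values are assumptions made for the example; the interaction type, cardinality, and baseType values are drawn from IMS QTI 2.1.

```python
# Illustrative instances of two Core Model events (names and serialisation assumed).
access_event = {
    "action": "Access",
    "Aid": "assess-2014-mod3-quiz1",
    "Sid": "sect-2",
    "Iids": ["item-07"],
    "Resume": False,           # no previously saved responses restored
    "NumAttempt": 1,           # optional, if the delivery engine tracks it
    "device_time": "2014-11-03T10:15:02Z",
    "delivery_system_time": "2014-11-03T10:15:04Z",
}

submit_event = {
    "action": "Submit",
    "Aid": "assess-2014-mod3-quiz1",
    "Pid": "part-1",
    "Iids": ["item-07"],
    "item_responses": [
        # One item response: an array of interaction-level responses (here, length 1)
        [{"interaction_type": "choiceInteraction",   # IMS QTI interaction type
          "cardinality": "single",                   # IMS QTI cardinality
          "response": "choiceB",
          "response_type": "identifier"}]            # IMS QTI baseType
    ],
    "KeyId": "sub-5f2c",           # correlates this submission with later outcomes
    "OutcomeExpected": True,
    "device_time": "2014-11-03T10:16:47Z",
    "delivery_system_time": "2014-11-03T10:16:49Z",
}
```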
Item Response Details
IMS QTI models an assessment item as one or more interactions. In many online quizzes or
questionnaires, there will only be a single interaction, for example a single-selection multiple choice.
Multiple interaction items allow for cases where the score for an item depends on the response to
both interactions. Hence an Item Response is an array, although commonly of length 1.
The base set of interaction types (which may be extended if necessary) is as defined in the IMS QTI
2.1 ASI Information Model. The cardinality value, drawn from the IMS QTI enumeration, captures the
difference between, for example, a multiple choice where only one option may be chosen and
where multiple options may be chosen; although these are often described as different “question
types”, the QTI approach of using interaction types with a qualifier is adopted.
The interaction response type should be drawn from the baseType enumeration of IMS QTI and the
encoding of the response should follow the QTI specifications for how choices, pairings, etc. are
expressed.
Some responses may be files18, for example an essay, photograph, presentation slides, video, etc.
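By way of illustration, the sketch below shows an Item Response array for an item containing two interactions, and one where the response is a file. The interaction types and baseTypes named are from IMS QTI 2.1; the field names and the URL are assumptions made for the example.

```python
# Illustrative Item Response arrays (field names assumed; types from IMS QTI 2.1).

# An item made of two related interactions: its score depends on both responses.
two_interaction_item_response = [
    {"interaction_type": "choiceInteraction", "cardinality": "multiple",
     "response": ["choiceA", "choiceC"], "response_type": "identifier"},
    {"interaction_type": "textEntryInteraction", "cardinality": "single",
     "response": "photosynthesis", "response_type": "string"},
]

# A file response (e.g. an essay); the file stays in the submission system and is
# referenced by URL rather than duplicated into the analytics store (see footnote 18).
essay_item_response = [
    {"interaction_type": "uploadInteraction", "cardinality": "single",
     "response": "https://assignments.example.org/submissions/essay-20141103-u1001.pdf",
     "response_type": "file"},
]
```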
Outcome List Elements
For assessment based on objective questions, the computation of assessment-level outcomes from
item-level scores follows from mathematical formulae or algorithms, but for human-marked or
observation-based assessment there may be an equivalent structure in the form of a marking
scheme. Marking schemes have a role in learning analytics of non-objective assessment as they
naturally indicate strengths and weaknesses in relation to the intention of assessment. Marking
schemes may take a variety of forms:
- Marking guide in which the marker can freely assign a score against several dimensions of quality up to a specified maximum.
- A rubric19, a matrix approach to scoring performance in essays, competency examinations, etc. The matrix defines various dimensions of quality (attributes being assessed) and several descriptions of typical performance against each dimension that match certain scores or level values. For each dimension, the human marker chooses the description that best matches their subject to determine the score/level to assign to the outcome that corresponds to the dimension.
- Checklist of criteria, in which a yes/no decision maps to a non-zero/zero score for each criterion. This can be viewed as a special case of a rubric.
The approach to outcomes should allow for a level of detail beyond a simple summative score for
both objective item-based assessment and human-marked assessment against a scheme of some
kind. This leads to the following approach to capturing outcome information.
Each outcome for an assessment, part, or item (and there may be several outcomes) comprises:
- outcome label: a name for the outcome, which may be specific to an assessment, testing application, institution, etc. This includes the outcome variable names, and their specified usage, as defined in the QTI specification (SCORE, DURATION20 and PASSED). The approach of using section or part identifiers as prefixes when designating section or part SCOREs etc., as described in the QTI specification, is not necessary since the scope of the outcome label is given by the combination of “outcome level” and “identifier” attributes of the “outcome determined” event. Outcome labels may be used to identify dimensions of quality in a marking guide or rubric.
- outcome value: the value of the outcome, e.g. an integer or decimal score, a letter grade, etc.
- outcome data type: the data type of the outcome value, using the baseType enumeration from IMS QTI.
- outcome reference: an optional reference, by URI, to an externally-defined21 learning objective, performance criterion, etc. This should be interpreted as an imprecise mapping, and not necessarily as an indicator that the external outcome was achieved.

18 In practice, these are expected to be stored in the system handling the submission and not to be duplicated into an analytics data store. Consequently, the assumption is that a URL will be used to locate the response.
19 The word “rubric” is also often used to refer to instructions given to candidates in an assessment.
20 DURATION should only be used when the test delivery engine tracks time spent.
It may be useful to determine which items were not answered (this is common in applications of
classical test theory), or to generalise this to assessments. An outcome label of MISSING is reserved
for this purpose. Items that are not answered should use this outcome label; other cases where response(s) have not been submitted but where an outcome is recorded may also use MISSING.
It is commonplace for assignments to be subjected to some form of originality evaluation, commonly
referred to as plagiarism detection software. Strictly speaking, the determination of plagiarism is
generally a human judgement informed by the results of an automated originality evaluation. The
result of originality evaluation may be captured using the reserved outcome label of “ORIGINALITY”.
In addition, software to support the electronic management of assessment often supports the use of
“comment banks”, i.e. pre-defined comments that may be selected by the marker. Since comments
are usually used when a weakness is identified, these allow for patterns of weakness to be explored
within a cohort, or between cohorts undertaking the same assessment task, etc. To accommodate
this use case an outcome label of “COMMENT_TAGS” is proposed to contain a space-separated list
of comment tags (assessment-scoped comment identifiers) to be associated with the outcome.
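To illustrate how the reserved labels and marking-scheme dimensions might sit together, the sketch below shows a hypothetical “Outcome determined” event for a tutor-marked essay, combining a summative SCORE, a rubric dimension, and the reserved ORIGINALITY and COMMENT_TAGS labels. All identifiers, the non-reserved label, the reference URI, and the serialisation are assumptions made for the example.

```python
# Illustrative "Outcome determined" event for a human-marked essay (names assumed).
outcome_determined_event = {
    "action": "Outcome determined",
    "OutcomeLevel": "assessment",
    "OutcomeStatus": "provisional",       # e.g. before moderation
    "Aid": "assign-2014-mod3-essay",
    "assessor_id": "staff-042",
    "KeyIds": ["sub-5f2c"],               # correlates with the earlier Submit event
    "outcomes": [
        # Summative score, using the QTI-defined SCORE label
        {"label": "SCORE", "value": 62, "data_type": "integer", "reference": None},
        # A rubric dimension expressed as an additional outcome label (label assumed)
        {"label": "ARGUMENT_STRUCTURE", "value": 3, "data_type": "integer",
         "reference": "http://example.org/outcomes/essay-writing#structure"},
        # Result of automated originality evaluation (reserved label)
        {"label": "ORIGINALITY", "value": 0.12, "data_type": "float", "reference": None},
        # Space-separated comment-bank tags selected by the marker (reserved label)
        {"label": "COMMENT_TAGS", "value": "cb-weak-intro cb-missing-citations",
         "data_type": "string", "reference": None},
    ],
}
```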
Possible Missing Pieces
These are features omitted from the Core Model but which may have some merit, and may need
further thought or discussion.
Activity Type
It may be useful in practice to know what stereotype the delivery application conforms to since this
will indicate the event patterns expected. There is likely to be some variation so an activity type
indication is likely to be just an “indication”, a hint. This is left as a missing piece because it is not
clear how to balance specificity of type vs number of type definitions required. At one end of the
spectrum, it may simply hint at the stereotype – e.g. be a label such as quiz, assignment, survey,
grading, etc. – and at the other becomes equivalent to an identification of the application, or maybe
further broken down. Indeed, it may be most useful to simply declare that Activity Type is an
application-specific vocabulary, so as to allow analytics scripts to contain code to handle similarities
and differences according to local rules; attempts to specify stereotypes may be doomed.
Requesting Explanations
Some delivery systems can provide post-submission explanation that is intended to help the
candidate understand the problem with an incorrect response. It may be useful for the request of
such information to be recorded. Note: the act of requesting an explanation is assumed to be
educationally-relevant, rather than this information being automatically provided in the normal
course of events. The QTI endAttemptInteraction would lead to logging of a “request explanation”
event, in addition to submission and outcome events.
21 External to the assessment; this may be a reference to an intended learning outcome in an educational establishment’s module specifications, or to a state-wide educational standard.
Sequence Index of Items
Since items may sometimes be shuffled, it may be useful to record the sequence index of items to
indicate the actual order of presentation. This could be achieved by Iid ordering or a sequence index
being added to the Access event.
Assessment/item Metadata
Although a general principle of minimising metadata was advanced in the section “Minimal Assessment Resource Metadata, No Assessment Resource Content”, this may have been applied over-rigidly. In some cases, it may not be practical to avoid adding additional attributes. For example, an outcome label MAXSCORE is indicated in the QTI documentation, but is not referred to in the Core Model above.
Event Patterns, State Transitions and Lifecycles
The Core Model has avoided dealing with event sequences and their relationship to delivery system
state-transitions and the lifecycle of assessment and related processes. Practical implementation
work on capturing these events should attend to this temporal aspect. It is likely that there are some
recurrent patterns that are shared by similar kinds of activity; it would be useful to gather these to
support convergent practice, but it is felt to be over-speculative to propose such patterns in the
absence of evidence.
One existing state model is found in the IMS QTI 2.1 specification, which defines a state model for compliant delivery engines, as well as an enumeration for sessionStatus in the QTI Results Reporting specification (as indicated in Table 1). While not all “assessment and allied activities” will be QTI
compliant, and not all delivery engine transitions are necessarily useful for learning analytics – with a
learner/learning focus rather than a delivery-engine focus – a mapping would be generally
informative as well as being of particular relevance to QTI implementations.
4. Current Standards
The word “standards” is used loosely to include proposed generic data storage/access patterns.
PSLC DataShop
The Tutor Message format (v4) was consulted22, and found to contain a few details on assessment-
related semantics:
- The semantic_event accommodates RESULT, ATTEMPT, and HINT_REQUEST. A further free-form 30 character string permits expression of a subtype to the semantic event.
- The action_evaluation element has preferred values that include CORRECT and INCORRECT.
- TMF includes a skill data element, intended to associate a “knowledge component” (a concept specific to intelligent tutoring systems) with other data.
The mapping from existing TMF data to the Core Model is quite minimal. Alternatively, the Tutor
Message Format is general purpose, and it may be possible to profile it (describe how it should be
used, with definition of appropriate vocabularies) to fully express the Core Model.
22 http://pslcdatashop.web.cmu.edu/dtd/guide/tutor_message_dtd_guide_v4.pdf
ADL Experience API (xAPI)
The xAPI does not specify vocabularies for the events given in the Core Model, but it does specify data structures for interactions and outcomes/results and includes features that allow vocabularies for event types, which are referred to as activity verbs in the xAPI specification. ADL and the xAPI community expect that these vocabularies will be published online, separately from the core specification.
Concerning the Built-in Features
The relevant built-in features are listed below, with section references referring to xAPI v1.0.1
documentation.
- Interaction Activities (section 4.1.4). These are limited to interactions as defined in SCORM and this part of xAPI specifies how to describe the activity rather than the user’s activity.
- A Result object (section 4.1.5). This “represents a measured outcome related to the statement in which it is included.” This breaks the principle of atomic events (see the section “Assumptions”, above) since it includes the result as part of another statement of activity. In addition to breaking the principle – which is not, of course, un-challengeable – such bundling is not appropriate for some of the assessment and allied activities indicated in the introduction to this document, for example when assessment presentation, submission, and scoring are quite separate events, each with their own attributes.
- A Score object (section 4.1.5), which is part of the Result object. This handles only a single numerical outcome.
It would be possible to use Result and Score to capture some of the information in the Core Model in
some situations but, relative to the Core Model, a considerable loss of information would occur.
Correspondingly, it would not be possible to extract sufficient information from an xAPI statement to
express the events as per the Core Model. To get around this problem would require the addition of
quite a few extensions, essentially to capture a series of component events within an umbrella
assessment activity record.
In conclusion: if using xAPI to capture detailed assessment (and related) events, Result should be
avoided and externally-defined vocabularies preferred. This will give a more uniform approach than
using Results extensions, since it avoids making assessment a “special case”.
Concerning Externally-defined Vocabularies
The Tin Can Registry23 and the ADL xAPI vocabulary24 list event verbs that may be used with xAPI,
including “saved” and “submitted” (http://activitystrea.ms/schema/1.0/{save,submitted}), which are
borrowed from Activity Streams, and “answered” (http://adlnet.gov/expapi/verbs/answered). These
map on to the Core Model in the case when the object is an assessment or related activity. Another
verb is “completed”, also borrowed from Activity Streams, but the semantics of “completed” are not
a perfect match to the submission of an assignment; it is submission that would be tracked in
practice. There are also verbs “viewed” (http://id.tincanapi.com/verb/viewed) and “resumed” that could be correlated with the Access event in the Core Model. The verbs “passed”, “failed”, and “mastered” are essentially special cases of the Core Model concept of an outcome; they are insufficient to cope with variety but could be used alongside, as commonly understood summative outcomes.

23 https://registry.tincanapi.com/#home/verbs
24 http://adlnet.gov/expapi/verbs/
An application profile for capturing assessment and allied events would have to nuance these
definitions. It may be cleaner to coin new verbs for the Core Model event types and this would be
necessary for some of them in any case.
The Tin Can Registry also includes a recipe entitled Checklist Performance Observation25, which
describes how a session of pass/fail assessments of a set of predetermined tasks would be recorded
using the Experience API with standard verbs, with their particular interpretation signalled by an
identifier for the recipe. This matches one of the use cases for the Core Model, and illustrates one
way in which the other use cases could be mapped to xAPI. It also suggests that the Core Model
could be taken forward as a series of recipes using common concepts to address various use cases;
the Core Model could remain as a resource for new recipes, but the existence of defined recipes
would make it easier to select templates in common situations.
It would require working through quite a few use-case driven examples to clarify the relationship
between the Core Model and xAPI + vocabularies, but the tentative conclusions are that:
- Both could be used for self-contained objective tests with simple numerical outcomes and basic skill assessments.
- It would be possible to create new verbs and usage recipes for xAPI along the lines of the Core Model.
It would be necessary to use the xAPI extension feature to transport all of the attributes in Table 2,
again with a requirement to create at least one URI for unique identification of type.
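For orientation, the sketch below indicates roughly how a Core Model Submit event might be rendered as an xAPI statement using the “answered” verb, with an extension carrying attributes that have no xAPI home. The statement structure and verb follow xAPI 1.0.x; the extension URI and the packaging of Core Model attributes inside it are assumptions, not part of any published recipe.

```python
# Rough, illustrative mapping of a Core Model "Submit" event to an xAPI statement.
# The verb and statement structure follow xAPI 1.0.x; the extension URI and the
# packaging of Core Model attributes inside it are assumptions for the sketch.
xapi_statement = {
    "actor": {"account": {"homePage": "https://lms.example.org", "name": "u-1001"}},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/answered",
             "display": {"en-US": "answered"}},
    "object": {"id": "https://lms.example.org/assessments/assess-2014-mod3-quiz1/item-07",
               "objectType": "Activity"},
    "timestamp": "2014-11-03T10:16:47Z",
    "context": {
        "extensions": {
            # Hypothetical extension carrying Core Model attributes (see Table 2)
            "http://example.org/xapi/extensions/core-model-submit": {
                "Aid": "assess-2014-mod3-quiz1",
                "Pid": "part-1",
                "KeyId": "sub-5f2c",
                "OutcomeExpected": True,
            }
        }
    },
}
```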
IMS LIS
Overall, the scenario of use of LIS Outcomes is very different to the event tracking approach of the
Core Model. LIS is concerned with data synchronisation between student record systems and
learning management systems, and so is concerned with a small number of high-stakes summative outcomes. Nevertheless, these are of interest for learning analytics, so it might be useful to capture only these summative events into the same data store as, for example, video use, even if assessments are not
tracked in detail. LIS outcome data should also correlate precisely with final summative Core Model
outcome events.
The points of contact between the Core Model and LIS Outcomes can be summarised as follows. As noted in
“Identifiers”, the assessment entity in the Core Model is identical to the LineItem concept as defined
in IMS LIS (Outcomes Management Service). If replicating LIS interactions as tracking events26, the
Core Model would be used to capture the Result on completion of CreateResult(),
CreateByProxyResult(), ReplaceResult(), and UpdateResult(). OutcomeStatus is an equivalent of LIS
statusofResult, and not lineItemType, which conceptually aligns with the outcome label.
25 https://registry.tincanapi.com/#profile/20/recipes
26 This should not be understood as a recommendation that LIS web service calls be replicated as tracking events; LIS activity is unlikely to be synchronous with learner events.
The Core Model does not include an equivalent to the LIS ResultValue, which gives the permitted
range of outcomes, consequential to the principle of minimal metadata in the Core Model.
IMS Caliper
Work on IMS Caliper27 is in progress and will be published by IMS on completion. Subject to this
occurring during 2015, this “draft for public comment” document will be revised accordingly.
MOOCdb
MOOCdb28 includes some support for assessment and the documentation states:
“Due to the online nature of submissions, assessments are handled in different ways. Assessments
could be done by the computer via simple check mechanisms or automated algorithms, peer review,
evaluation by instructors and/or graders. For some courses multiple assessors are used. The MOOCdb
schema captures this [these] situations.”
MOOCdb is, however, very much focussed on the submission event and a minimal representation of
outcome (it uses “assessment” to refer to the outcome) as a single floating point number in the
range 0-1. It also appears to lack information, such as an outcome status, that would be necessary in
multiple-assessor, or staged assessment, scenarios.
It appears to be possible to express MOOCdb data in the Core Model except that:
- MOOCdb lacks any specification of interaction types and response format.
- MOOCdb includes assessment structure and metadata (e.g. deadline, weighting) that were intentionally not included in the Core Model.
Expressing Core Model in MOOCdb would be limited to the MOOCdb Submissions and Assessment
tables (the Problems table contains structure and metadata) and would only be possible if the
SCORE outcome is used. This transformation would lose quite a lot of information in many scenarios.
5. Source Material
Technical
ADL Experience API (xAPI)
This is sometimes known as Tin Can API, from the project that initially developed it. The developers
continue to provide information and software support to adopters.
Core resources are:
- The Experience API 1.0.1 specification - http://www.adlnet.gov/tla/experience-api/technical-specification/
- The ADL xAPI vocabulary - http://adlnet.gov/expapi/verbs/
- The Tin Can Registry - https://registry.tincanapi.com/#home/verbs
IMS LIS (Learning Information Services)
27 http://imsglobal.org/caliper
28 http://moocdb.csail.mit.edu/wiki/index.php?title=MOOCdb
LIS comprises a number of parts, supporting the principal data exchanges between student
record systems and learning management systems. Only the “outcomes service” is relevant to this
study.
- The original LIS v1.0 specification - http://www.imsglobal.org/lis/lisv2p0p1/OMSInfoModelv1p0.html
- A public draft v1.0 showing a compatible use with IMS LTI - http://www.imsglobal.org/lti/ltiv1p2pd/ltiOMIv1p0pd.html
IMS QTI (Question and Test Interoperability)
Whenever “IMS QTI” is written in this document, the reference should be understood to refer to
version 2.1. It is available from http://www.imsglobal.org/question/. Particular sections of relevance
are:
- Implementation Guide
- Assessment Test, Section and Item Information Model (ASI)
- Results Reporting
References
Ellis, C., 2013. Broadening the Scope and Increasing the Usefulness of Learning Analytics: The Case for Assessment Analytics. British Journal of Educational Technology. Available at: http://eprints.hud.ac.uk/16829/1/Ellis_BJET_submission.docx [Accessed July 11, 2014].
Feng, M., 2014. Towards Uncovering the Mysterious World of Math Homework. In Proceedings of the 7th International Conference on Educational Data Mining. pp. 425–426. Available at: http://educationaldatamining.org/EDM2014/uploads/procs2014/posters/101_EDM-2014-Poster.pdf [Accessed November 12, 2014].
Günther, C.W. & Verbeek, E., 2014. XES Standard Definition v2.0. Available at: http://www.xes-standard.org/_media/xes/xesstandarddefinition-2.0.pdf [Accessed April 29, 2013].
Pechenizkiy, M. et al., 2009. Process Mining Online Assessment Data. In T. Barnes et al., eds. Proceedings of the 2nd International Conference On Educational Data Mining. pp. 279–288.
Trcka, N., Pechenizkiy, M. & van der Aalst, W., 2011. Process Mining from Educational Data. In C. Romero et al., eds. Handbook of Educational Data Mining. CRC Press, pp. 123–142.
6. About ...
Acknowledgements
The author would like to thank Brian Kelly and Tore Hoel for reviewing the v0.2.1 draft.
This document was produced with funding from the European Commission Seventh
Framework Programme as part of the LACE Project, grant number 619424.
About the Author
Adam works for Cetis, the Centre for Educational Technology and Interoperability
Standards, at the University of Bolton, UK. He rather enjoys data wrangling and hacking
about with R. He is a member of the UK Government Open Standards Board, and a
member of the Information Standards Board for Education, Skills and Children’s
Services, and is a strong advocate of open standards and open system architecture.
Adam is leading the workpackage on interoperability and data sharing.
About this document
(c) 2014, Adam Cooper.
Licensed for use under the terms of the Creative Commons Attribution v4.0
licence. Attribution should be “by Adam Cooper, for the LACE Project
(http://www.laceproject.eu)”.
For more information, see the LACE Publication Policy: http://www.laceproject.eu/publication-
policy/. Note, in particular, that some images used in LACE publications may not be freely re-used.
This is a public draft document for comment; the latest version and an explanation of how
to comment is available from: http://www.laceproject.eu/dpc/assessment-events-
learning-analytics-interoperability-study/. The final version will be linked-to from there.
About LACE
The LACE project brings together existing key European players in the field of learning analytics and educational data mining who are committed to building communities of practice and sharing emerging best practice in order to make progress towards four objectives.
Objective 1 – Promote knowledge creation and exchange
Objective 2 – Increase the evidence base
Objective 3 – Contribute to the definition of future directions
Objective 4 – Build consensus on interoperability and data sharing
http://www.laceproject.eu @laceproject