02 December 2008
MyPlan - Similarity Metrics for Matching
Lifelong Learner Timelines
Nicolas Van Labeke
Using Similarity Metrics for Matching Lifelong Learners 2
The Context
• Lifelong Learners?
  – Learning opportunities
  – All ages, all contexts
• Role of Technology?
  – Ubiquitous access to resources and facilities
  – Learner-centred models of organising and delivering educational resources
• Better support for planning?
The MyPlan project
• funded by the JISC e-Learning Capital programme, 1/9/2006 – 30/11/2008 (RA 1/4/2007 – 30/7/2008)
• developing, deploying and evaluating new techniques and tools that allow personalised planning of lifelong learning
• building on and extending the earlier L4All project and software prototype, funded by the JISC Distributed e-Learning Pilots programme 1/2/2005 – 31/10/2006
Partners (MyPlan)
• Birkbeck College
  – 80% of students are part-time
• Institute of Education
• Community College Hackney
  – A Level, GCSE, adult learning courses, teacher training and vocational qualifications
• UCAS
  – UK central organisation through which applications are processed for entry to HE, providing information and services to prospective students and HE professionals
• Linking London Lifelong Learning Network (L4N)
  – supports lifelong learners in the London region, providing them with access to information and resources that facilitate their progression from Secondary Education, through Further Education (FE) and on into Higher Education
L4ALL – Approach
• Taking a holistic view of lifelong learners’ work and learning experience
• Based on the notion of learning pathways
• Sharing learning pathways with others:
  – identifying learning opportunities that may not otherwise have been considered
  – positioning successful learners “like me” as role models
L4ALL – Methodology
• User requirements elicitation, via interviews with HE and FE students, focus groups (educators, recruitment & careers specialists), workshop events, consultation with advisors:
  – use cases
  – examples of learning pathways
  – identification of critical decision points
• Technical requirements elicitation
  – development of tools and standards
  – use of existing e-services where possible
• User-centred design
  – Iterative & incremental prototyping
  – Usability
L4ALL – Supporting Engagement & Participation
1. Lifelong learners require support not only at the level of the individual user but also at the level of a group or team, and of the learning community as a whole
2. There are critical decision points or periods where lifelong learners need increased support
3. A partnership between the different stakeholders (e.g. lifelong learners themselves but also learning providers, career advisors, adult learning organisations) is an important element in offering a holistic approach to personal development.
L4ALL – Personalising the pathway through lifelong learning
• Breaking the “one size fits all” mould
• Recognition of diversity
• Different interaction at different stages of the journey
  – Motivation: Why should I learn? Personalised needs-benefits analysis; access to advice, guidance, learners’ case studies
  – Curriculum: What can I learn? Curriculum choice through HE partnerships; closer links to work and community
  – Logistic: How could I study? Flexible modes, locations etc.; mix of home, campus, overseas
  – Pedagogy: How will I learn? Adaptive, interactive learning; communication, collaboration
  – Assessment: How do I know I've learned? Assessment when ready; progress files, e-portfolios
  – Opportunity: Where will it take me? Access to information & guidance; qualifications - career options planner
L4ALL – Lifelong Learning for All The System
• Timeline: record of a user’s learning trail
  – Educational, professional and personal
• A web-based portal for lifelong learners
  – Access information about courses
  – Manage personal development plan
  – Annotate, Reflect & Share
• Pilot System
  – Incremental design
  – Simple Service-Oriented Architecture
  – Ontology-based Learner Model (RDF - JENA)
• Skeleton of a Social Network Platform?
L4ALL – System Architecture
[Architecture diagram. Components: Web Server (Apache Tomcat); L4All Portal (JSP) with Graphical User Interface for User Management, Timeline Search and Course Search; Web Services (Servlets); Java Beans; JENA Semantic Web Framework; Databases (MySQL): User, Course; LearnDirect API (Course Search); Google Maps API (Location Search)]
MyPlan - Introducing Personalised Functionalities
• To develop and evaluate user models that reflect the needs of the diverse population of lifelong learners
  – Lifelong learner ontology, interoperability (H. Baajour)
• To allow learners to role-play different learning and career progressions, by integrating game-based applications into the system
  – Second Life sessions (S. De Freitas)
• To enhance individual learners’ engagement with the lifelong learning process by developing, deploying and evaluating personalised functionalities for searching and recommendation of learning opportunities
  – Personalised search of timelines
  – Recommendations
Redesigning the GUI
SIMILE Javascript Timeline – http://simile.mit.edu/timeline/
Searching the L4ALL User Model
• A three-part model
  – User Profile: identification, personal information, …
  – Learning Profile: learning goals, skills, qualification, …
  – Timeline, as set of episodes: description, title, classification, start date, duration, …
• Search by keywords
Personalised search for “people like me”
  – Reflect structure and semantics of timelines
  – Detect “similarities” between learners’ pathways
Similarity Metrics
• Textual-based metrics with algorithm-specific indication of similarity between 2 strings
  – “SAM” / “SAMUEL”
• Levenshtein Distance (Edit Distance)
  – number of insertions, deletions and substitutions needed to transform one string into another
• Information integration & applied CS
  – bioinformatics, musicology, phonetics, etc.
  – ITS: sequence of instructional activities (
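As a minimal sketch of the edit distance described above (not the SimMetrics implementation), the function below counts the insertions, deletions and substitutions needed to turn one sequence into another. It accepts plain strings and, equally, lists of tokens; the name `levenshtein` is illustrative.

```python
def levenshtein(a, b):
    """Minimal number of insertions, deletions and substitutions turning a into b."""
    # previous[j] is the distance between the items of a seen so far and the first j items of b
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning i items of a into an empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("SAM", "SAMUEL"))  # 3 (insert U, E and L)
```

Because the function only compares items for equality, the same code applies to a timeline once it has been turned into a list of episode tokens.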
Our approach
• Black-box
  – Reusing existing metrics
  – Identifying behaviour in the context of timelines
• Different interpretations of “people like me”
• Focus on usability, not accuracy
Tokenisation of Timelines
Hypothesis 1 & 2 : Time
• Timelines are (obviously) time-dependent
  – Essential for user’s own pathways
  – No evidence for relevance in “people like me”
• Similar episode two years apart?
• Similar episode twice as long (part-time)?
Start dates and duration ignored
Gap between episodes ignored
Relative position used to sort episodes
Hypothesis 3 : Category of episode
• Different categories of episodes
  – Educational
  – Occupation
  – Personal
• Importance for own pathways – critical turning point
• Irrelevant for “people like me”?
Categories to be filtered out by user
Code Description
SC Attended school
CL Attended college
UN Attended University
DG Obtained a degree
CS Attended a particular course
WK Employed
VL Voluntary work in charity/voluntary organisation
BS Started a business
ML Attended military service
RE Retired
UE Unemployed
CR Home carer
MV Moved to a different location
TV Spent some time abroad
CH Birth in the family
AD Adopted a child
DE Death in the family
MA Got married
SE Divorced
DS Developed a (permanent) disability
IL Developed a (temporary) illness
OT Any user-defined episode not covered previously
Hypothesis 4 : Classification of episodes
Episode token: Episode Category (e.g. work, college, military service, …) - Primary classification (e.g. qualification, occupation) - Secondary classification (e.g. discipline, activity sector)
Example: WK - 2.3.2.1 - 6.4.0.0
Primary classification (excerpt)
0.0.0.0 Unknown
1.0.0.0 Managers and Senior Officials
2.0.0.0 Professional Occupations
2.3.0.0 Teaching and Research Professionals
2.3.2.0 Research Professionals
2.3.2.1 Scientific Researchers
2.3.2.2 Social Science Researchers
2.3.2.9 Researchers N.E.C.
Secondary classification (excerpt)
0.0.0.0 Unknown
1.0.0.0 Medicine and Dentistry
6.0.0.0 Mathematical and Computer Sciences
6.4.0.0 Computer Science
• Category of episode alone not sufficient
• Most important episodes have extra classifications
• But fine-grained description may not be useful
User to vary depth of classification
Tokenisation of Timelines
Expressivity (decreasing):
Cl-10.1.0.0-3.1.0.0 Dg-10.1.0.0-3.1.0.0 Wk-4.0.0.0-7.2.1.2 Wk-11.0.0.0-3.1.3.2 Wk-3.0.0.0-4.1.3.6 Mv-0.0.0.0-0.0.0.0 Un-6.4.0.0-6.3.0.0
Cl-10-3 Dg-10-3 Wk-4-7 Wk-11-3 Wk-3-4 Mv-0-0 Un-6-6
Cl-- Dg-- Wk-- Wk-- Wk-- Mv-- Un--
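The tokenisation shown above can be sketched as follows. This is an illustration only, assuming each episode is available as a (category, primary classification, secondary classification) triple; the function names and the `depth` parameter are hypothetical and not taken from the L4All code.

```python
def encode_episode(category, primary, secondary, depth):
    """Encode one episode as 'Category-primary-secondary', keeping `depth` levels of each code."""
    def truncate(code):
        # "10.1.0.0" with depth 1 becomes "10"; depth 0 gives an empty string
        return ".".join(code.split(".")[:depth])

    return f"{category}-{truncate(primary)}-{truncate(secondary)}"


def encode_timeline(episodes, depth):
    """Encode a chronologically sorted list of episodes as a space-separated token string."""
    return " ".join(
        encode_episode(category, primary, secondary, depth)
        for category, primary, secondary in episodes
    )


# First three episodes of the example timeline
timeline = [
    ("Cl", "10.1.0.0", "3.1.0.0"),
    ("Dg", "10.1.0.0", "3.1.0.0"),
    ("Wk", "4.0.0.0", "7.2.1.2"),
]
print(encode_timeline(timeline, 4))  # Cl-10.1.0.0-3.1.0.0 Dg-10.1.0.0-3.1.0.0 Wk-4.0.0.0-7.2.1.2
print(encode_timeline(timeline, 1))  # Cl-10-3 Dg-10-3 Wk-4-7
print(encode_timeline(timeline, 0))  # Cl-- Dg-- Wk--
```

Lowering the depth makes more episodes compare as equal, which is the trade-off the user is meant to control.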
Similarity Metrics
SimMetrics JAVA package – http://www.dcs.shef.ac.uk/~sam/simmetrics.html
Levenshtein
Needleman – Wunsch
Jaro
Matching Coefficient
Euclidean Distance
Block Distance
Jaccard Similarity
Cosine Similarity
Dice Similarity
Overlap Coefficient
Encoding of some timelines
ID | Description | Encoding
Source | The original timeline used as the source for the similarity measure. | Cl-00 Un-00 Mv-00 Wk-00
ID | A timeline identical to the source. | Cl-00 Un-00 Mv-00 Wk-00
RE | A timeline containing the same episodes as the source but in a totally different order (i.e. no episode is at the same position in the string). | Un-00 Wk-00 Cl-00 Mv-00
ADe | A new work episode (similar to an existing one) is added to the timeline. | Cl-00 Un-00 Mv-00 Wk-00 Wk-00
ADn | A new episode (different from all existing ones) is added to the timeline. | Cl-00 Un-00 Mv-00 Wk-00 Bs-00
RMw | The last episode is removed from the source timeline. | Cl-00 Un-00 Mv-00
RMu | One of the episodes of the source timeline is removed. | Cl-00 Mv-00 Wk-00
SBn | One of the episodes of the source timeline is substituted by a new one (different from all existing ones). | Cl-00 Un-00 Mv-00 Bs-00
SBe | One of the episodes of the source timeline is substituted by an existing episode. | Cl-00 Un-00 Mv-00 Un-00
SBv | One of the episodes of the source timeline is substituted by a variant of an existing episode. | Cl-00 Un-00 Mv-00 Wk-10
Comparison of Metrics
Metric | ID | RE | ADe | ADn | RMw | RMu | SBn | SBe | SBv
Levenshtein | 1 | 0 | 0.8 | 0.8 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75
Needleman - Wunsch | 1 | 0 | 0.8 | 0.8 | 0.75 | 0.75 | 0.75 | 0.75 | 0.88
Jaro | 1 | 0.72 | 0.93 | 0.93 | 0.92 | 0.92 | 0.83 | 0.83 | 0.83
Matching Coefficient | 1 | 1 | 0.8 | 0.8 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75
Euclidean Distance | 1 | 1 | 0.84 | 0.84 | 0.8 | 0.8 | 0.75 | 0.75 | 0.75
Block Distance | 1 | 1 | 0.89 | 0.89 | 0.86 | 0.86 | 0.75 | 0.75 | 0.75
Jaccard Similarity | 1 | 1 | 1 | 0.8 | 0.75 | 0.75 | 0.6 | 0.75 | 0.6
Cosine Similarity | 1 | 1 | 1 | 0.89 | 0.87 | 0.87 | 0.75 | 0.87 | 0.75
Dice Similarity | 1 | 1 | 1 | 0.89 | 0.86 | 0.86 | 0.75 | 0.86 | 0.75
Overlap Coefficient | 1 | 1 | 1 | 1 | 1 | 1 | 0.75 | 1 | 0.75
User-defined cost functions
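To make the behaviour of the token-based rows concrete, here is a small sketch using the textbook set-based definitions of the Jaccard, Dice, overlap and cosine measures. It is not the SimMetrics code, and the function names are illustrative; on the encodings listed earlier it gives the same rounded values as the table for the cases shown.

```python
import math


def token_sets(a, b):
    """Split two encoded timelines into sets of distinct tokens (order and repeats are lost)."""
    return set(a.split()), set(b.split())


def jaccard(a, b):
    x, y = token_sets(a, b)
    return len(x & y) / len(x | y)


def dice(a, b):
    x, y = token_sets(a, b)
    return 2 * len(x & y) / (len(x) + len(y))


def overlap(a, b):
    x, y = token_sets(a, b)
    return len(x & y) / min(len(x), len(y))


def cosine(a, b):
    x, y = token_sets(a, b)
    return len(x & y) / math.sqrt(len(x) * len(y))


source = "Cl-00 Un-00 Mv-00 Wk-00"
variants = {
    "RE": "Un-00 Wk-00 Cl-00 Mv-00",
    "ADn": "Cl-00 Un-00 Mv-00 Wk-00 Bs-00",
    "RMw": "Cl-00 Un-00 Mv-00",
    "SBn": "Cl-00 Un-00 Mv-00 Bs-00",
}
for name, encoding in variants.items():
    scores = [metric(source, encoding) for metric in (jaccard, cosine, dice, overlap)]
    print(name, [round(score, 2) for score in scores])
```

Since these measures work on sets, the reordered timeline (RE) and the timeline with a repeated episode (ADe) score 1, whereas the edit-based metrics penalise them. The sketch assumes non-empty timelines; an empty encoding would cause a division by zero.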
Search for “People like me”
• “Existential” search
• Filtering by
  – User profile
  – Episode categories
• Tuning by
  – Classification depth
  – Similarity Metrics
• Ranking by timeline similarity
Explaining Similarity Measures
• Needleman – Wunsch
• Computing alignment of strings
  – Copy/substituting tokens
  – Insertion/deletion
• Optimal score for alignment of the first i characters in T1 and the first j characters in T2
• Score indicates minimal edit distance
• Backtracking for alignment(s)
[Figure: score matrix for aligning the strings ABCD (rows) and ABCEBC (columns), with the backtracked alignment, annotated with the costs G (gap) and d (distance)]
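The procedure on this slide can be sketched as a distance-minimising variant of Needleman – Wunsch: fill the score matrix, then backtrack from the bottom-right cell to recover one optimal alignment. The function names, the default gap cost of 1 and the unit substitution distance are assumptions for illustration, not the settings used in the system.

```python
def unit_distance(x, y):
    """Substitution cost: 0 for identical tokens, 1 otherwise (an assumed default)."""
    return 0 if x == y else 1


def align(s1, s2, gap=1, dist=unit_distance):
    """Return (minimal cost, alignment) where the alignment is a list of (token1, token2) pairs.

    A None in a pair marks a gap (insertion or deletion).
    """
    n, m = len(s1), len(s2)
    # score[i][j]: minimal cost of aligning the first i items of s1 with the first j items of s2
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = min(
                score[i - 1][j - 1] + dist(s1[i - 1], s2[j - 1]),  # copy or substitute
                score[i - 1][j] + gap,                             # delete from s1
                score[i][j - 1] + gap,                             # insert from s2
            )
    # Backtrack from the bottom-right corner, preferring copy/substitute steps
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + dist(s1[i - 1], s2[j - 1]):
            pairs.append((s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((s1[i - 1], None))
            i -= 1
        else:
            pairs.append((None, s2[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


distance, alignment = align("ABCD", "ABCEBC")
print(distance)  # 3
print(" ".join(x or "_" for x, _ in alignment))
print(" ".join(y or "_" for _, y in alignment))
```

Several alignments can share the same minimal cost; the tie-breaking order in the backtracking loop decides which one is returned, so a different order may produce a different but equally good alignment.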
S1 (User’s Timeline): Cl-6.4-4.1 Wk-J.0-2.1 Un-9.1-6.3 Wk-R.0-2.4 Wk-C.0-3.5 Wk-P.0-2.3 Un-6.4-6.1 Wk-G.0-4.2 Wk-K.0-3.1
S2 (Target’s Timeline): Cl-0.0-4.1 Wk-R.0-2.4 Un-6.4-6.3 Wk-N.0-9.2 Un-6.4-6.1 Wk-S.0-3.1 Wk-J.0-3.1 Wk-J.0-2.1
Score matrix (rows: User’s Timeline S1; columns: Target’s Timeline S2)
S1 \ S2 | (start) | Cl-0.0-4.1 | Wk-R.0-2.4 | Un-6.4-6.3 | Wk-N.0-9.2 | Un-6.4-6.1 | Wk-S.0-3.1 | Wk-J.0-3.1 | Wk-J.0-2.1
(start) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Cl-6.4-4.1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Wk-J.0-2.1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 8
Un-9.1-6.3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 9
Wk-R.0-2.4 | 4 | 5 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Wk-C.0-3.5 | 5 | 6 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Wk-P.0-2.3 | 6 | 7 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Un-6.4-6.1 | 7 | 8 | 7 | 8 | 9 | 8 | 9 | 10 | 11
Wk-G.0-4.2 | 8 | 9 | 8 | 9 | 10 | 9 | 10 | 11 | 12
Wk-K.0-3.1 | 9 | 10 | 9 | 10 | 11 | 10 | 11 | 12 | 13
“What should I do next?”
• “Recommendation” is too strong a term
  – Suggests reliability & objectivity; difficulty of obtaining expert pathways
• Role Model
  – source of inspiration
  – This is what people have done after following a pathway similar to yours; why not consider a similar future?
Exploiting String alignments
• Identifying common patterns & possible future pathways
• Naïve “Rule of Thumb” approach
• Lack of semantics BETWEEN episodes
[Figure: alignment of the user’s timeline (S1) with the target’s timeline (S2)]
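One possible reading of the naïve “rule of thumb” is sketched below: find the episodes the two timelines have in common, then offer whatever the other learner did after the last shared episode as a possible future. This is a hypothetical heuristic, not the method used in the system, and it swaps in Python's standard `difflib.SequenceMatcher` for the matching step instead of a Needleman – Wunsch alignment.

```python
from difflib import SequenceMatcher


def suggest_next(user_tokens, target_tokens):
    """Return the target's episodes that follow the last block shared with the user's timeline."""
    matcher = SequenceMatcher(None, user_tokens, target_tokens, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel block, which is dropped here
    blocks = [block for block in matcher.get_matching_blocks() if block.size > 0]
    if not blocks:
        return []  # nothing in common, so nothing to suggest
    last = blocks[-1]
    return target_tokens[last.b + last.size:]


user = ["Cl-00", "Un-00", "Wk-00"]
target = ["Cl-00", "Un-00", "Wk-00", "Dg-00", "Bs-00"]
print(suggest_next(user, target))  # ['Dg-00', 'Bs-00']
```

The sketch shares the limitation noted on the slide: it knows nothing about dependencies between episodes, so the suggested continuation is only as meaningful as the token match that precedes it.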
Conclusions
• Different metrics, different aspects of string comparison
  – Not one particularly adequate or “better”
  – Context of use important: what does “people like me” mean?
• What are they good for?
  – Separation between encoding and matching
  – Encoding does not depend on context, embeds some – not all – of the timeline’s semantics
    • Persistent storage, indexing, RSS feed, alerts
• What are they not so good for?
  – Discrepancy between string similarity and timeline similarity
  – Lack of explanation on the reasons for similarity
• The way forward?
  – Identifying contexts of usage and deploying tailored mechanisms
  – User-defined mechanisms
Which Measure of (Dis)similarity?
• Needleman – Wunsch
  – Distance between tokens?
  – Cost functions
    • G: gap (insert/delete)
    • d: distance (substitute)
• Normalised Similarity?
  – algorithm-specific
[Figure: two example alignments with normalised scores 66% (4/6) and 50% (2/4), labelled Similarity / Dissimilarity; example tokens WK - 2.3.2.1 - 6.4.0.0 and WK - 1.0.0.0 - 4.2.0.0]
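As an illustration of what a user-defined substitution cost d could look like, the sketch below charges nothing for identical tokens, half a unit for tokens of the same episode category with different classifications, and a full unit otherwise. The 0 / 0.5 / 1 scale, the function names and the normalisation by the longer timeline are all assumptions; the slides do not state the cost function actually used.

```python
def token_distance(token1, token2):
    """Hypothetical substitution cost d between two encoded episode tokens."""
    if token1 == token2:
        return 0.0
    # The episode category is the part before the first hyphen, e.g. "Wk" in "Wk-2.3.2.1-6.4.0.0"
    category1 = token1.split("-", 1)[0]
    category2 = token2.split("-", 1)[0]
    return 0.5 if category1 == category2 else 1.0


def normalised_similarity(distance, length1, length2):
    """Map an alignment cost to [0, 1], assuming the cost is at most the longer length."""
    return 1.0 - distance / max(length1, length2)


print(token_distance("Wk-2.3.2.1-6.4.0.0", "Wk-1.0.0.0-4.2.0.0"))  # 0.5
print(token_distance("Wk-00", "Bs-00"))                            # 1.0
print(normalised_similarity(0.5, 4, 4))                            # 0.875
```

Under these assumptions, substituting Wk-00 by its variant Wk-10 in a four-episode timeline costs 0.5, giving 1 − 0.5/4 = 0.875, which is consistent with the 0.88 reported for Needleman – Wunsch on the SBv case in the comparison table.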
A Holistic Approach to Timeline Matching
Multiple String Alignments
S1 (User’s Timeline): Cl-6.4-4.1 Wk-J.0-2.1 Un-9.1-6.3 Wk-R.0-2.4 Wk-C.0-3.5 Wk-P.0-2.3 Un-6.4-6.1 Wk-G.0-4.2 Wk-K.0-3.1
S2 (Target’s Timeline): Wk-G.0-3.1 Un-6.4-6.1 Wk-I.0-3.1 Cl-0.0-4.1 Wk-K.0-3.1
[Figure: score matrix for S1 against S2, with two alternative alignments (Alignment 1, Alignment 2)]
Future Work (?)
• (Multiple) External Representations of timelines AND similarities
• Full-fledged Social Network functionalities
  – Reflection
  – Help & advice seeking, interventions (peers, institutions, …)
• “Recommendation”
  – Dependencies BETWEEN episodes
  – Domain knowledge (e.g. course entry profile, alternatives to top-down taxonomies)