© The author(s) of these slides, including research results of the KOM research network and TU Darmstadt; exceptions are noted on the respective slide.
21-Sep-13
Prof. Dr.-Ing. Ralf Steinmetz
KOM - Multimedia Communications Lab
Rensing_Crowdsourcing_ECTELmeetsECSCW__2013.pptx
Investigating Crowdsourcing as an
Evaluation Method for
(TEL) Recommender Systems
Mojisola Erdt
Florian Jomrich
Katja Schüler
Christoph Rensing
Source: http://www.digitalvisitor.com/cultural-differences-in-online-behaviour-and-customer-reviews/
Motivation & Context

Workplace Learning
Solving a particular work-related problem or preparing for a new task
Informal: communities of practice, using resources from the Web or the
company's intranet, or resources created by the employees
Training and practising the necessary competences in vocational training

Learning Application
Social tagging application
Helps to structure resources
Helps to retrieve resources
Reflection
…
see Proceedings of EC-TEL 2011

(TEL) Recommender Systems
Recommend relevant resources, tags or users
Quality Measures for Recommender Systems in TEL:
Relevance: “Relevance refers to the ability of a RS (Recommender System)
to provide items that fit the user’s preferences” [Epifania 2012].
Novelty: “Novelty (or discovery) is the extent to which users receive new
and interesting recommendations” [Pu 2011].
Diversity: “Diversity measures the diversity level of items in the
recommendation list” [Pu 2011].
Quality Measures for a User Experiment
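The slides quote these measures without giving formulas. Diversity in particular is often operationalized as the average pairwise dissimilarity within a recommendation list; a minimal sketch, assuming Jaccard dissimilarity over item tag sets (the metric choice, function names, and sample data are illustrative, not from the slides):

```python
# Sketch: intra-list diversity as the average pairwise Jaccard
# dissimilarity between the tag sets of recommended items.
# (The concrete metric is an assumption; the slides do not fix one.)
from itertools import combinations

def jaccard_dissimilarity(a: set, b: set) -> float:
    """1 - |a ∩ b| / |a ∪ b|; 1.0 means completely disjoint tag sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def intra_list_diversity(item_tags: list) -> float:
    """Average pairwise dissimilarity over one recommendation list."""
    pairs = list(combinations(item_tags, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_dissimilarity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical tag sets for three recommended resources
recs = [{"climate", "co2"}, {"climate", "policy"}, {"ocean", "ice"}]
print(round(intra_list_diversity(recs), 3))  # → 0.889
```

A higher value means the list spans more distinct topics, matching the intuition behind the Pu (2011) definition above.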
Evaluation Methods for TEL Recommender Systems

| Approach | Method | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Offline evaluation | Cross-validation using historical datasets | Fast; less effort; repeatable | New, unknown resources cannot be evaluated |
| Offline evaluation | Cross-validation using synthetically generated data | Suitable for testing and analysing | Might be biased towards an algorithm; artificial scenarios |
| Online evaluation | User experiments (lab & field) | User's perspective | A lot of effort and time; few users |
| Online evaluation | Real-life testing | Real-life setting | Needs a substantial number of users |
| Online evaluation | Crowdsourcing | Fast; less effort; enough users | Spamming |
Experiment

Generate a basis graph structure (extended folksonomy)
Five experts researched the topic of climate change for one hour
Using CROKODIL to create an extended folksonomy
About 70 resources were attached to 8 activities

Generate recommendations
Recommendations from the AScore and FolkRank algorithms (see Proceedings of EC-TEL 2012)

Three hypotheses:
1. AScore gives more relevant resources than the baseline (FolkRank)
2. AScore gives more novel resources than the baseline (FolkRank)
3. AScore gives more diverse resources than the baseline (FolkRank)

A questionnaire was created
10 questions per recommendation
3 questions for each hypothesis
1 control question to detect spammers
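The control question is the deck's defence against spamming, the main disadvantage of crowdsourcing listed earlier. A minimal sketch of how such a check could filter submissions (the field names, keyword threshold, and sample data are illustrative assumptions, not the authors' actual pipeline):

```python
# Sketch: filtering crowdworker submissions by a control question
# (here modelled on Q10b: "give 4 keywords describing the content").
# Field names and the threshold are illustrative assumptions.
def is_spammer(answers: dict, min_keywords: int = 4) -> bool:
    """Flag a submission whose control answer has too few keywords."""
    control = answers.get("Q10b", "")
    keywords = [w for w in control.replace(",", " ").split() if w.strip()]
    return len(keywords) < min_keywords

submissions = [
    {"Q1": 5, "Q10b": "climate warming emissions policy"},
    {"Q1": 5, "Q10b": "good"},  # likely spam: control answer too short
]
valid = [s for s in submissions if not is_spammer(s)]
print(len(valid))  # → 1
```

In practice the control answers would also be checked against the actual resource content, not just counted; the length check here is only the cheapest first filter.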
Crowdsourcing Evaluation Concept

Jobs on 2 crowdsourcing platforms
Microworkers.com: active users and good service quality
CrowdFlower.com: gives access to other platforms (e.g. Amazon MTurk, with a large number of crowdworkers)
Overview of Participants in the Evaluation
125 completed questionnaires were evaluated
Summary: Results

| | Hyp. 1: Relevance | Hyp. 2: Novelty | Hyp. 3: Diversity |
| --- | --- | --- | --- |
| p-value | 0.0065 < 0.05 | 0.0042 < 0.05 | 0.677 > 0.05 |
| Result | supported | supported | not supported |
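The slides report p-values but not which statistical test produced them. As one plausible way to obtain such values from per-resource ratings, a one-sided permutation test on the mean difference between the two algorithms can be sketched (the test choice and the sample Likert ratings are assumptions, not the authors' data):

```python
# Sketch: one-sided permutation test for H1: mean(a) > mean(b).
# The actual test used in the study is not stated on the slides.
import random

def permutation_test(a, b, n_perm=10000, seed=42):
    """Estimate the p-value by shuffling group labels n_perm times."""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b)
        if diff >= observed:
            hits += 1
    return hits / n_perm

ascore   = [5, 4, 5, 4, 5, 4, 5]  # hypothetical Likert ratings
folkrank = [3, 4, 3, 2, 4, 3, 3]
p = permutation_test(ascore, folkrank)
print(p < 0.05)  # clearly separated samples give a small p-value
```

With 125 questionnaires and three questions per hypothesis, the real analysis has far more ratings per group than this toy sample, which is why even the modest differences reported above can reach significance.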
Conclusion & Future Work

Crowdsourcing can be used as an evaluation method for TEL recommender systems
Further analysis of the data collected
Comparing results between crowdworkers and non-crowdworkers
Improve the crowdsourcing evaluation concept
Apply a more complex evaluation
Apply to further scenarios
Workshop-related research questions

How does a recommender have to be designed to support resource-based knowledge acquisition at the workplace?
Are relevance and novelty valid quality measures?
How does a recommender have to be designed to support resource-based learning?
Which are the valid quality measures?
Aggregation of relevance, novelty, diversity, …?
How can the recommendation be adapted to the current need of a learner?
What do we know about the learner? (Learning Analytics)
How can we support teachers in training and practising self-regulated learning competences in vocational training?
Questions & Contact
Questions for Hypotheses 1–3

Hypothesis 1: Relevance
Q1. The given internet resource supports me very well in my research about the topic.
Q2. If I could only use this resource, my research would still be very successful.
Q3. Without this resource, just by using my own resources, my research about the given topic would still be very good.

Hypothesis 2: Novelty
Q4. The internet resource gives me new insights and/or information for my task.
Q5. I would have found this resource on my own/anyway during my research.
Q6. There are lots of important aspects of the topic described in this resource that are lacking in other resources.

Hypothesis 3: Diversity
Q7. This internet resource differs strongly from my other resources.
Q8. This resource informs me comprehensively about my topic.
Q9. This resource covers the whole spectrum of research about the given topic.
Control Questions in Questionnaire
Q10a. How many pictures and tables that are relevant to the given research topic
does the given resource contain?
Q10b. Give a short summary of the recommended resource above by giving 4
keywords describing its content.
Q10c. Describe the content of the given resource in two sentences.
Crowdworkers: Age distribution