WG2 TF Crowdsourcing
CROWDSOURCING 2.X
From Microworkers to Customers: Lessons learned from crowdsourcing testing
Bruno GARDLO, FTW
8th General Qualinet Meeting, 7.10.–10.10.2014, Delft
Beginnings… back in the year 2010
Simple web app with a Flash player and an ACR-5 rating scale; 2 CCs ranging from 800 to 2000 kbps (ratings aggregated into MOS as sketched below)
Volunteers recruited via Facebook and email calls: 114 users, 3 months of data collection
Each respondent was approached directly via chat when starting and finishing the test (pseudo-controlled environment)
Users rated 10 videos in a row, repeated twice; overall assessment time ~12 minutes
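For readers unfamiliar with ACR-5, here is a minimal sketch (Python, with made-up ratings, not the study's data) of how such ratings are typically aggregated into a Mean Opinion Score (MOS) per test condition:

```python
from statistics import mean

# ACR-5 scale: 1 = Bad, 2 = Poor, 3 = Fair, 4 = Good, 5 = Excellent.
# Hypothetical ratings per test condition (e.g., one list per bitrate).
ratings = {
    "800_kbps":  [2, 3, 3, 2, 4, 3],
    "2000_kbps": [4, 5, 4, 4, 5, 3],
}

# MOS = arithmetic mean of the individual ACR ratings for a condition.
for condition, scores in ratings.items():
    print(f"{condition}: MOS = {mean(scores):.2f}")
```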
Facebook Study 2010
Facebook Study 2011
The procedure took rather long, so we looked for alternatives, still focusing on Facebook volunteers
Enhanced the web app to be more “standalone”: added content questions, shortened the testing time to be more “appealing” to the subjects
Reduced administrative burden (no chatting with respondents…)
220 subjects, yet only 812 reliable answers
2012: Move to Microworkers.com
Introduced a 2-stage design together with a “screen quality test”
Additional reliability measures: application monitoring, control questions concerning the playback, consistency questions, “gold” data (see the screening sketch below)
Initial number of users: 297
Number of users remaining after screening: 88 (29%)
Next steps: improving the efficiency and reliability of the campaigns
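A minimal sketch of how such screening might be implemented (Python; the field names, questions, and thresholds are illustrative assumptions, not the study's actual rules):

```python
# A worker is kept only if they answer the "gold" questions (known
# correct answers) correctly and respond consistently to repeated
# control questions.

def is_reliable(worker, gold_answers, max_gold_errors=0):
    gold_errors = sum(
        1 for question, correct in gold_answers.items()
        if worker["answers"].get(question) != correct
    )
    if gold_errors > max_gold_errors:
        return False
    # Consistency check: a repeated question must get the same answer twice.
    return all(first == second for first, second in worker["repeated_pairs"])

gold = {"playback_ok": "yes", "clips_watched": 10}   # hypothetical gold data
workers = [
    {"answers": {"playback_ok": "yes", "clips_watched": 10},
     "repeated_pairs": [(4, 4), (2, 2)]},            # passes both checks
    {"answers": {"playback_ok": "yes", "clips_watched": 7},
     "repeated_pairs": [(5, 1)]},                    # fails both checks
]

kept = [w for w in workers if is_reliable(w, gold)]
print(f"kept {len(kept)} of {len(workers)} workers")
```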
2013: In-Momento Approach
Engage the user’s attention with an interesting, easy-to-do task (and monitor reliability)
Retain the user’s attention with a short testing session and a simple UI
Assess the user’s reliability and find a way to communicate, creating a dialog: lead them through the app and keep them informed about their progress and how they are doing…
Once the user is in the app and proves reliable (see the flow sketch after this list):
- Try to engage them with more tasks
- Do not force them to continue
- Offer them some advantage over other users (higher payment)
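A minimal simulation of that flow (Python; the rates, threshold, and the random stand-in for control checks are all illustrative assumptions, not the actual campaign parameters):

```python
import random

BASE_RATE = 0.05           # $ per task (hypothetical)
BONUS_RATE = 0.08          # higher rate offered to reliable workers
RELIABILITY_THRESHOLD = 0.8

def run_session(n_tasks=5, n_extra=5, accepts_offer=True):
    """Short session: a few easy tasks with progress feedback, then an
    optional bonus offer if the worker proved reliable."""
    checks_passed = 0
    for i in range(1, n_tasks + 1):
        checks_passed += random.random() < 0.9  # stand-in for a control check
        print(f"Task {i}/{n_tasks} done - thanks, keep going!")  # the dialog
    reliability = checks_passed / n_tasks
    earned = n_tasks * BASE_RATE
    # The extra tasks are an *offer*, never an obligation.
    if reliability >= RELIABILITY_THRESHOLD and accepts_offer:
        earned += n_extra * BONUS_RATE
    return reliability, earned

print(run_session())
```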
2013: In-Momento Results
100 reliable ratings in several hours (instead of days)
Overall reliability: 86%
Increased cost efficiency: $0.26 vs. $0.08 per reliable task (see the cost sketch below)
Decreased administrative overhead!
MOS still higher than that achieved in the lab
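Where a “per reliable task” cost comes from: every submitted task is paid, but only the reliable ones are usable, so the effective cost is the total payout divided by the number of reliable tasks. The per-task payments below are illustrative back-calculations, not the actual campaign payouts:

```python
def cost_per_reliable(pay_per_task, n_submitted, n_reliable):
    # All submitted tasks are paid; only the reliable ones are usable.
    return pay_per_task * n_submitted / n_reliable

# A low screening yield (29% kept, as in the 2012 campaign) inflates the cost:
print(f"{cost_per_reliable(0.077, 297, 88):.2f} $")   # ~0.26 $
# High in-session reliability (86%) keeps it near the nominal rate:
print(f"{cost_per_reliable(0.07, 100, 86):.2f} $")    # ~0.08 $
```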
Facing the CS issues
There is a consistent pattern across all our past CS studies: MOS from CS assessments is higher than that achieved in the lab
Some QoE-related degradations are easier for workers to recognize than others
- e.g., users are more familiar with video stalling than with encoding-related artifacts
- Users are accustomed to the quality level they usually receive
The relation between lab and CS study results is often not easily discoverable, and may not even exist (one way to probe it is sketched below)
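One simple way to probe that relation (the MOS values below are made up, a sketch rather than the studies’ data): correlate per-condition MOS from both settings. A high correlation with a constant offset would mean CS preserves the quality ranking even though its MOS values are systematically higher:

```python
from statistics import correlation, mean   # correlation() needs Python 3.10+

lab_mos = [2.1, 2.8, 3.4, 4.0]   # hypothetical per-condition lab MOS
cs_mos  = [2.9, 3.4, 3.9, 4.4]   # same conditions, crowdsourced

r = correlation(lab_mos, cs_mos)        # Pearson's r
offset = mean(cs_mos) - mean(lab_mos)   # systematic MOS shift
print(f"r = {r:.2f}, CS offset = +{offset:.2f} MOS")
```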
THANK YOU FOR YOUR ATTENTION