WG2 TF Crowdsourcing
CROWDSOURCING 2.X
From Microworkers to Customers: Lessons learned from crowdsourcing testing
Bruno GARDLO, FTW
8th General Qualinet Meeting, 7.10.–10.10.2014, Delft
Beginnings… back in the year 2010
Simple web app with a Flash player and an ACR-5 rating scale; 2 CCs ranging from 800 to 2000 kbps (ratings aggregated into MOS as sketched below)
Volunteers recruited via Facebook and email calls: 114 users, 3 months of data collection
Each respondent was approached directly via chat when starting and finishing the test (pseudo-controlled environment)
Users rated 10 videos in a row, repeated twice; overall assessment time ~12 minutes
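For readers unfamiliar with ACR-5, here is a minimal sketch (Python, with made-up ratings, not the study's data) of how such ratings are typically aggregated into a Mean Opinion Score (MOS) per test condition:

```python
from statistics import mean

# ACR-5 scale: 1 = Bad, 2 = Poor, 3 = Fair, 4 = Good, 5 = Excellent.
# Hypothetical ratings per test condition (e.g., one list per bitrate).
ratings = {
    "800_kbps":  [2, 3, 3, 2, 4, 3],
    "2000_kbps": [4, 5, 4, 4, 5, 3],
}

# MOS = arithmetic mean of the individual ACR ratings for a condition.
for condition, scores in ratings.items():
    print(f"{condition}: MOS = {mean(scores):.2f}")
```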
Facebook Study 2010
Facebook Study 2011
The procedure took rather long, so we looked for alternatives, still focusing on Facebook volunteers
Enhanced the web app to be more “standalone”: added content questions, shortened the testing time to be more “appealing” to the subjects
Reduced administrative burden (no chatting with respondents…)
220 subjects, yet only 812 reliable answers
2012: Move to Microworkers.com
Introduced a 2-stage design together with a “screen quality test”
Additional reliability measures: application monitoring, control questions concerning the playback, consistency questions, “gold” data (see the screening sketch below)
Initial number of users: 297
Number of users remaining after screening: 88 (29%)
Next steps: improving the efficiency and reliability of the campaigns
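A minimal sketch of how such screening might be implemented (Python; the field names, questions, and thresholds are illustrative assumptions, not the study's actual rules):

```python
# A worker is kept only if they answer the "gold" questions (known
# correct answers) correctly and respond consistently to repeated
# control questions.

def is_reliable(worker, gold_answers, max_gold_errors=0):
    gold_errors = sum(
        1 for question, correct in gold_answers.items()
        if worker["answers"].get(question) != correct
    )
    if gold_errors > max_gold_errors:
        return False
    # Consistency check: a repeated question must get the same answer twice.
    return all(first == second for first, second in worker["repeated_pairs"])

gold = {"playback_ok": "yes", "clips_watched": 10}   # hypothetical gold data
workers = [
    {"answers": {"playback_ok": "yes", "clips_watched": 10},
     "repeated_pairs": [(4, 4), (2, 2)]},            # passes both checks
    {"answers": {"playback_ok": "yes", "clips_watched": 7},
     "repeated_pairs": [(5, 1)]},                    # fails both checks
]

kept = [w for w in workers if is_reliable(w, gold)]
print(f"kept {len(kept)} of {len(workers)} workers")
```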
2013: In-Momento Approach
Engage the user’s attention with an interesting, easy-to-do task (and monitor reliability)
Retain the user’s attention with a short testing session and a simple UI
Assess the user’s reliability and find a way to communicate, creating a dialog: lead them through the app and keep them informed about their progress and how they are doing…
Once the user is in the app and proves reliable (see the flow sketch after this list):
- Try to engage them with more tasks
- Do not force them to continue
- Offer them some advantage over other users (higher payment)
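A minimal simulation of that flow (Python; the rates, threshold, and the random stand-in for control checks are all illustrative assumptions, not the actual campaign parameters):

```python
import random

BASE_RATE = 0.05           # $ per task (hypothetical)
BONUS_RATE = 0.08          # higher rate offered to reliable workers
RELIABILITY_THRESHOLD = 0.8

def run_session(n_tasks=5, n_extra=5, accepts_offer=True):
    """Short session: a few easy tasks with progress feedback, then an
    optional bonus offer if the worker proved reliable."""
    checks_passed = 0
    for i in range(1, n_tasks + 1):
        checks_passed += random.random() < 0.9  # stand-in for a control check
        print(f"Task {i}/{n_tasks} done - thanks, keep going!")  # the dialog
    reliability = checks_passed / n_tasks
    earned = n_tasks * BASE_RATE
    # The extra tasks are an *offer*, never an obligation.
    if reliability >= RELIABILITY_THRESHOLD and accepts_offer:
        earned += n_extra * BONUS_RATE
    return reliability, earned

print(run_session())
```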
2013: In-Momento Results
100 reliable ratings in several hours (instead of days)
Overall reliability: 86%
Increased cost efficiency: $0.26 vs. $0.08 per reliable task (see the cost sketch below)
Decreased administrative overhead!
MOS still higher than that achieved in the lab
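Where a “per reliable task” cost comes from: every submitted task is paid, but only the reliable ones are usable, so the effective cost is the total payout divided by the number of reliable tasks. The per-task payments below are illustrative back-calculations, not the actual campaign payouts:

```python
def cost_per_reliable(pay_per_task, n_submitted, n_reliable):
    # All submitted tasks are paid; only the reliable ones are usable.
    return pay_per_task * n_submitted / n_reliable

# A low screening yield (29% kept, as in the 2012 campaign) inflates the cost:
print(f"{cost_per_reliable(0.077, 297, 88):.2f} $")   # ~0.26 $
# High in-session reliability (86%) keeps it near the nominal rate:
print(f"{cost_per_reliable(0.07, 100, 86):.2f} $")    # ~0.08 $
```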
Facing the CS issues
There is a consistent pattern across all our past CS studies: MOS from CS assessments is higher than that achieved in the lab
Some QoE-related degradations are easier for workers to recognize than others
- e.g., users are more familiar with video stalling than with encoding-related artifacts
- Users are accustomed to the quality level they usually receive
The relation between lab and CS study results is often not easily discoverable, and may not even exist (one way to probe it is sketched below)
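One simple way to probe that relation (the MOS values below are made up, a sketch rather than the studies’ data): correlate per-condition MOS from both settings. A high correlation with a constant offset would mean CS preserves the quality ranking even though its MOS values are systematically higher:

```python
from statistics import correlation, mean   # correlation() needs Python 3.10+

lab_mos = [2.1, 2.8, 3.4, 4.0]   # hypothetical per-condition lab MOS
cs_mos  = [2.9, 3.4, 3.9, 4.4]   # same conditions, crowdsourced

r = correlation(lab_mos, cs_mos)        # Pearson's r
offset = mean(cs_mos) - mean(lab_mos)   # systematic MOS shift
print(f"r = {r:.2f}, CS offset = +{offset:.2f} MOS")
```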
THANK YOU FOR YOUR ATTENTION