Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

Date post: 16-May-2015
Upload: mauro-cherubini
Description:
Speech and typed text are two common input modalities for mobile phones. However, little research has compared their ability to support annotation and retrieval of digital pictures on mobile devices. In this paper, we report the results of a month-long field study in which participants took pictures with their camera phones and could choose to add annotations using speech, typed text, or both. Subsequently, the same subjects participated in a controlled experiment in which they were asked to retrieve images based on annotations, as well as annotations based on images, in order to study how effectively each modality supports users' recall of the previously captured pictures. Results demonstrate that each modality has advantages and shortcomings for the production of tags and the retrieval of pictures. Several guidelines are suggested for designing tagging applications for portable devices.
Transcript
Page 1: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

Text vs. Speech: A Comparison of Tagging Input Modalities for Camera Phones

Research & Development

Mauro Cherubini, Xavier Anguera, Nuria Oliver, and Rodrigo de Oliveira

Page 2: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

people do not want to tag their pictures

intro → hypotheses → methodology → results → implications

Page 3: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

research question:

Assuming that users are willing to input at least one tag, which input modality better supports the production of tags and the retrieval of pictures?

Page 4: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

hypothesis 1

Speech is preferred to text as an annotation mechanism on mobile phones (objective measure)

Support: - Mitchard and Winkles (2002)

Page 5: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

hypothesis 1-bis

Speech annotations are preferred by users even if this means spending more time on the task (subjective measure)

Support: - Perakakis and Potamianos (2008)

Page 6: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

hypothesis 2

The longer the tag, the larger the advantage of voice over text for annotating pictures on mobile phones

Support: - Hauptmann and Rudnicky (1990)

Page 7: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

hypothesis 3

Retrieving pictures on mobile phones with speech is not faster than with text (objective measure)

Support: - Mills et al. (2000)

Page 8: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

the user study

field study (4 weeks)

controlled experiment

T1 - T2 - T3 - T4

3 experimental conditions:

a. Speech only

b. Text only

c. Speech and Text

Page 9: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

MAMI

Page 10: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

features of MAMI

•  processing is done entirely on the mobile phone

•  speech is not transcribed

•  to compare the waveforms of the audio tags, MAMI uses the Dynamic Time Warping algorithm
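
Since the speech is never transcribed, retrieval has to work by matching one recording against another. The sketch below illustrates the general Dynamic Time Warping idea: two sequences that differ only in speaking rate can still be aligned at low cost. It is a minimal textbook version, not MAMI's implementation. The function names `dtw_distance` and `best_match`, the tag names, and the use of plain one-dimensional number lists in place of real audio features are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook O(len(a) * len(b)) DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # local distance between two samples
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_match(query: Sequence[float], stored: Dict[str, Sequence[float]]) -> str:
    """Return the name of the stored tag closest to the query under DTW."""
    return min(stored, key=lambda name: dtw_distance(query, stored[name]))


# Hypothetical stored audio tags, reduced to toy 1-D sequences.
stored_tags = {"garden": [1, 1, 2, 3, 3], "stairs": [5, 5, 5]}
print(best_match([1, 2, 3], stored_tags))
```

A slower or faster utterance of the same tag repeats or skips samples, which the warping path absorbs; this is why the query `[1, 2, 3]` still matches the stretched `garden` sequence.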

Page 11: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

task 1: remember the tag

stimulus retrieval

Pictures taken during the field trial

Page 12: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

task 2: remember the context

stimulus retrieval

TASK 2 PICTURE 1

three little bushes Garden Tree Stairs

Page 13: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

task 3: remember the picture

stimulus retrieval

Text Audio tags were converted into

textual tags and vice versa

Page 14: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

task 4: remember the sequence

assignment retrieval

TASK 4

Three pictures among the oldest and three pictures among the newest.

Page 15: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

metrics

•  time to completion

•  false positives

•  retrieval errors

Page 16: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

results H1

Page 17: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

results H1-bis

All participants in the BOTH group felt that tagging with text was more effective than tagging with voice.

Voice: 3.33 [0.81]; Text: 4.34 [0.81] (mean [SD]; scale: 1 = completely agree, 5 = completely disagree)
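
For readers unfamiliar with the mean [SD] notation, the following minimal sketch shows how such a summary is computed from raw Likert ratings. The ratings are made up for illustration and are not the study's data. The helper name `summarize` is hypothetical, and the use of the sample standard deviation is an assumption, since the slide does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarize(ratings: List[int]) -> Tuple[float, float]:
    """Return (mean, sample SD) of Likert ratings, rounded to two decimals."""
    return round(mean(ratings), 2), round(stdev(ratings), 2)


# Hypothetical ratings on a 1-5 scale, NOT taken from the study.
example_ratings = [3, 4, 2, 4, 3, 4]
m, sd = summarize(example_ratings)
print(f"{m:.2f} [{sd:.2f}]")
```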

Page 18: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

results H2

Page 19: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

results H3

Page 20: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

results H3 - continued

Page 21: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

take away 1: speech is not a given

the advantage of audio as an input modality for tagging pictures on mobile phones is not a given

why?

1. retrieval precision

2. privacy

Page 22: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

take away 2: input mistakes

we address text input mistakes immediately; mistakes in audio recordings, by contrast, are less frequently addressed

Page 23: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

take away 3: memory

speech does not help users memorize the tags

Page 24: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

implication 1: allow multiple modalities

© Pixar, 2008

Page 25: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

implication 2: enable audio inspection

Page 26: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

implication 3: enable modality synesthesia

© Disney, 1940

Page 27: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

end

thanks

http://www.i-cherubini.it/mauro/blog/ http://research.tid.es/multimedia/


