Date posted: 16-May-2015
Category: Technology
Uploaded by: mauro-cherubini
Text vs. Speech: A Comparison of Tagging Input Modalities
for Camera Phones
Research & Development
Mauro Cherubini, Xavier Anguera, Nuria Oliver, and Rodrigo de Oliveira
people do not want to tag their pictures
intro → hypotheses → methodology → results → implications
research question:
Assuming that users are willing to input at least one tag, which input
modality can help the production and retrieval of the pictures?
hypothesis 1
Speech is preferred to text as an annotation mechanism on mobile
phones (objective measure)
Support: Mitchard and Winkles (2002)
hypothesis 1-bis
Speech annotations are preferred by users even if this means spending more time on the task (subjective measure)
Support: Perakakis and Potamianos (2008)
hypothesis 2
The longer the tag the larger the advantage of voice over text for
annotating pictures on mobile phones
Support: Hauptmann and Rudnicky (1990)
hypothesis 3
Retrieving pictures on mobile phones with speech is not faster than with text
(objective measure)
Support: Mills et al. (2000)
the user study
field study (4 weeks)
controlled experiment
T1 - T2 - T3 - T4
3 experimental conditions:
a. Speech only
b. Text only
c. Speech and Text
MAMI
features of MAMI
• processing is done entirely on the mobile phone
• speech is not transcribed
• to compare the waveforms of the audio tags, MAMI uses the Dynamic Time Warping algorithm
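Because the speech is not transcribed, retrieval has to match a spoken query against the stored audio tags directly. The following is a minimal sketch of Dynamic Time Warping for that purpose, not MAMI's actual implementation: the function names `dtw_distance` and `best_match` are hypothetical, and the plain numbers stand in for whatever per-frame audio features the system really extracts, which the slides do not specify.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance.

    `a` and `b` are sequences of numbers (assumed here to be one
    feature value per audio frame; an illustrative simplification).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_match(query, stored_tags):
    """Return the name of the stored tag closest to `query` under DTW.

    `stored_tags` maps a (hypothetical) tag name to its feature sequence.
    """
    return min(stored_tags, key=lambda name: dtw_distance(query, stored_tags[name]))
```

For example, `best_match([1, 2, 2, 3, 2, 1], {"garden": [1, 2, 3, 2, 1], "stairs": [5, 5, 6, 7]})` picks `"garden"`: the query is the same contour spoken slightly slower, and DTW absorbs that difference in timing.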
task 1: remember the tag
stimulus retrieval
Pictures taken during the field trial
task 2: remember the context
stimulus retrieval
TASK 2 PICTURE 1
three little bushes Garden Tree Stairs
task 3: remember the picture
stimulus retrieval
Audio tags were converted into
textual tags and vice versa
task 4: remember the sequence
assignment retrieval
TASK 4
Three pictures among the oldest and three pictures among the newest.
metrics
• time to completion
• false positives
• retrieval errors
results H1
results H1-bis
All participants in the BOTH group felt that tagging with text was more effective than tagging with voice.
Voice: 3.33 [0.81], Text: 4.34 [0.81] (Mean [SD]) 1 = completely agree; 5 = completely disagree
results H2
results H3
results H3 - continued
take away 1: speech is not a given
the advantage of audio as an input modality for tagging pictures on mobile phones is not a given
why?
1. retrieval precision
2. privacy
take away 2: input mistakes
we address text input mistakes immediately; mistakes in audio recordings, by contrast, are addressed less
frequently
take away 3: memory
speech does not help users memorize the tags
implication 1: allow multiple modalities
© Pixar, 2008
implication 2: enable audio inspection
implication 3: enable modality synesthesia
© Disney, 1940
end: thanks
http://www.i-cherubini.it/mauro/blog/ http://research.tid.es/multimedia/