The Browser Evaluation Test: A Proposal
Pierre Wellner, Mike Flynn
IDIAP, September 2003
Ricoh “MuVIE”, Lee et al.
Video editing, key frames, transcript search, embedded web browser, slides, whiteboard, minutes, perspective & panoramic views, speaker location, visual & audio activity
NOT TESTED ON HUMANS
Microsoft “Distributed Meetings”, Cutler et al.
Panoramic video, person-tracking, audio source localisation & beam-forming, speaker clustering & change, whiteboard camera, PC capture
SUBJECTIVELY TESTED
The Problem
• No assessment, or...
• Assessed by a scheme unique to each system
• Often very subjective
[from Cutler et al, “Distributed Meetings: A Meeting Capture and Broadcasting System”, ACM Multimedia, 2002]
– “I was able to get the information I needed […]”
– “I would use this system again if I had to miss a meeting.”
– “I would recommend the use of this system to my peers.”
• No standard Browsing task
→ Objective comparison not possible ←
Aims of the BET
• Performance, not judgment
• Independent of experimenter perception
• Directly comparable numeric scores
• Replicable
The Browsing Task
Find the maximum number of observations of interest in the minimum amount of time.
But what is an “observation of interest”?
BET Overview
[Figure: overview diagram of the BET, linking meeting participants, recording system, corpus, playback system, observers, observations, test sampling, subjects, media browser, answers, scoring and scores]
People
• Participants
• Observers
– Observer selection
– Many diverse interests
– Interesting for participants or absentees?
• Subjects
– Subject selection
Data
• Corpus
– Meeting types: discussion, presentation, decision, status…
– Normal meetings, if possible
– Reflect the common distribution of meeting types
• Observations
– Pairs of statements, one true, one false
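A minimal sketch of how one observation pair might be stored. The class and field names (ObservationPair, meeting_id, presented) and the example statements are hypothetical; the proposal only says that each observation is a pair of statements, one true and one false. Shuffling the two statements when a pair is shown is also an assumption, included as one way to stop the position of a statement from revealing which one is true.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ObservationPair:
    """One observer-written pair (hypothetical names): a true statement
    about a meeting and a false counterpart."""
    meeting_id: str
    true_statement: str
    false_statement: str

    def presented(self, rng: random.Random) -> Tuple[Tuple[str, str], int]:
        """Return both statements in random order, plus the index (0 or 1)
        of the true one, so position gives the subject no hint."""
        if rng.random() < 0.5:
            return (self.true_statement, self.false_statement), 0
        return (self.false_statement, self.true_statement), 1


# Hypothetical example pair; the statements are placeholders, not corpus data.
pair = ObservationPair(
    meeting_id="meeting-01",
    true_statement="The budget was approved before the break.",
    false_statement="The budget was postponed to the next meeting.",
)
statements, true_index = pair.presented(random.Random(0))
print(statements[true_index])  # the true statement, wherever it landed
```

Keeping the true/false roles in named fields, rather than in the display order, lets the same stored pair be presented in a fresh order for every subject.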
Tests & Scores
• Test: sample of observations
• Subjects must decide which statement of each pair is true, using the browser
• Score is correct minus incorrect answers
• Control scores established:
– Educated guesses, no media
– Same software as observers
– Well-known basic applications
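The test and scoring steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the pair ids, the seed and the answers are all hypothetical, and a seeded random sample without replacement is assumed as one reasonable way to draw a test. The score itself is exactly "correct minus incorrect"; unanswered pairs add nothing, and since every pair offers two choices, blind guessing has an expected score of zero. The same score function would apply unchanged to the control conditions.

```python
import random
from typing import Dict, List, Optional


def sample_test(pair_ids: List[str], size: int, seed: int) -> List[str]:
    """Draw one test: a random sample of observation pairs, without
    replacement. A fixed seed makes the test reproducible elsewhere."""
    return random.Random(seed).sample(pair_ids, size)


def score(answers: Dict[str, Optional[bool]]) -> int:
    """Score = correct answers minus incorrect answers.

    `answers` maps a pair id to True (answered correctly), False
    (answered incorrectly) or None (not answered, contributes 0).
    """
    correct = sum(1 for a in answers.values() if a is True)
    incorrect = sum(1 for a in answers.values() if a is False)
    return correct - incorrect


# Toy usage: hypothetical pair ids, a 20-pair test and made-up answers.
pair_ids = [f"pair-{i:03d}" for i in range(216)]
test = sample_test(pair_ids, size=20, seed=1)
toy_answers = {pid: (i % 4 != 0) for i, pid in enumerate(test)}  # 15 right, 5 wrong
print(score(toy_answers))  # 10
```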
Illustration
• Corpus
– 20 meetings @ ~40 minutes ≈ 13 hrs 20 mins of recordings
• Observations
– 60 observers
– 3 observers watch each meeting @ 18 observation-pairs per hour of observation time
– 6× real time ≈ 240 hours observation time
– 216 observation-pairs/meeting, or 4,320 observation-pairs total
• Testing
– 10 subjects each watch 8 meetings, in 2 hours 40 mins per subject
– 4 subjects watch each meeting, 26 hours 40 mins total subject time
– 1 answer per minute, 160 answers/subject, 1,600 answers total
• Significance
– Assume: binomial distribution of results, 90% answered correctly
– Confidence interval: 88.2% to 91.6%, with 95% confidence level
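The arithmetic in this illustration can be checked with a short script. It reads "18 observation-pairs/hour" as per hour of observer time and "6" as observers working at 6× real time; these readings are interpretations, chosen because they reproduce the 240 hours and 216 pairs per meeting above. The interval code is a sketch using two standard textbook methods for a binomial proportion (normal approximation and Wilson score), not necessarily the method behind the slide. For 1,600 answers at 90% correct, both give roughly 88.4% to 91.5%, a little narrower than the interval quoted above.

```python
import math

# Corpus
meetings = 20
minutes_per_meeting = 40
recording_minutes = meetings * minutes_per_meeting  # 800 min = 13 h 20 min

# Observations (interpretation: 18 pairs per observer-hour, at 6x real time)
observers_per_meeting = 3
real_time_factor = 6
pairs_per_observer_hour = 18
observers = meetings * observers_per_meeting  # 60, i.e. one meeting each
observation_hours = (recording_minutes * observers_per_meeting
                     * real_time_factor // 60)  # 240
pairs_total = observation_hours * pairs_per_observer_hour  # 4,320
pairs_per_meeting = pairs_total // meetings  # 216

# Testing
subjects = 10
meetings_per_subject = 8
minutes_per_subject = 160  # 2 h 40 min, i.e. 20 min per 40-min meeting
subjects_per_meeting = subjects * meetings_per_subject // meetings  # 4
subject_minutes = subjects * minutes_per_subject  # 1,600 min = 26 h 40 min
answers_total = subject_minutes * 1  # 1 answer per minute -> 1,600


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation (Wald) interval for a binomial proportion."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion."""
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half, centre + half


for name, (lo, hi) in [("Wald", wald_interval(0.9, answers_total)),
                       ("Wilson", wilson_interval(0.9, answers_total))]:
    print(f"{name}: {lo:.1%} to {hi:.1%}")
```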
Summary
• Performance, not judgment
– Subjects are measured on their performance of tasks
• Independent of experimenter perception
– Observers indirectly decide the tasks
• Directly comparable numeric scores
– Standard methods, standard scores
• Replicable
– Publicly accessible website
– All media available for download
– Tests and scoring online
Questions…?
• Is this a good method?
• Do you recognise the problem?
• Would you use this method?
• Do you have a browser to test?
• Do you know of an existing multimedia (MM) corpus?
• …