The TREC2001 Video Track: Information Retrieval on Digital Video Information
Alan F. Smeaton Centre for Digital Video Processing, Dublin City University, Ireland
Paul Over National Institute of Standards and Technology, USA
Cash J. Costello Applied Physics Laboratory, Johns Hopkins University, USA
Arjen P. de Vries CWI, Amsterdam, The Netherlands
David Doermann Laboratory for Language and Media Processing, University of Maryland, USA
Alexander Hauptmann School of Computer Science, Carnegie Mellon University, USA
Mark E. Rorvig School of Library and Information Sciences, University of North Texas, USA
John R. Smith IBM T.J. Watson Research Center, USA
Lide Wu Dept. of Computer Science, Fudan University, China
Presentation overview
• TREC2001
• TREC2001 Video Track
• TREC2001 Video Track Tasks
– Shot Boundary Detection Task
– Search Task
• Search Task
• Participants in Search Task & Their Focus
• Summary of approaches by participants
• Conclusion
TREC (Text REtrieval Conference)
• Annual activity (1992- ) to “benchmark the retrieval effectiveness of Information Retrieval tasks”
• Co-ordinator NIST (National Institute of Standards and Technology, US) defines & distributes:
– Test document corpus
– Topics (queries)
• Participating groups each develop an IR system, run the Topics against the Test document corpus, and send the results to NIST
• NIST generates relevance assessments and calculates performance in terms of precision & recall
• Annual conference in Gaithersburg, Maryland
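The precision & recall measures used in TREC evaluation can be illustrated with a minimal Python sketch. The segment identifiers and the function name `precision_recall` are hypothetical; this shows only the set-based definitions, not NIST's actual evaluation software.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single topic.

    retrieved: identifiers of the items a system returned (hypothetical IDs)
    relevant:  identifiers judged relevant by the assessors
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    # Precision: fraction of returned items that are relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant items that were returned.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


p, r = precision_recall(["s1", "s2", "s3", "s4"], ["s2", "s4", "s9"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Here two of the four returned items are relevant (precision 0.5), and two of the three relevant items were found (recall about 0.67).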
“Tracks” in TREC
• Different streams, each introduced to focus on a particular sub-problem in Information Retrieval
• 15 different “tracks” have been introduced, some stopped, some continuing, e.g.:
– Interactive track 1993-
– Chinese language track 1995-1998
– Web track 1998-
– Question Answering track 1998-
– Video track 2001-
Video Track in TREC2001
• 1st Video Track in 2001
• Promote progress in content-based retrieval from digital video via open, metrics-based evaluation
• 12 participating groups (5 USA, 2 Asia, 5 Europe) - contributing to the definition of corpus, topics and tasks via discussion, and to the running of the track
• Following the TREC framework, NIST co-ordinated and provided:
– Video document corpus
– Topic queries
Video Track in TREC2001
• Video document corpus - total 11.2 hours (85 video files in MPEG-1 format; 6.3 Gbytes), mostly documentary in nature, varying in age, style and quality, e.g.:
– “A New Horizon” (16 min; colour; documentary) - This Great Plains orientation tape explains the boundaries of the Great Plains Region, which is one of five regions that make up the Bureau of Reclamation
– “Challenge at Glen Canyon” (26 min; colour; documentary) - Shows how the repair of spillway damage caused by flooding along the Colorado River System was conducted
Video Track in TREC2001
• 74 Topics (queries) - with multimedia examples (audio/image/video) along with each topic, e.g.:
– Topic #8: “find clips showing the planet Jupiter” (with 2 images depicting Jupiter)
– Topic #32: “find clips with a chopper landing” (with 3 audio clips of a helicopter sound)
– Topic #54: “find clips showing Glen Canyon dam” (with a short video clip showing Glen Canyon dam)

Number of topics: 74
No. topics with image examples / Avg. number of images: 26 / 2.0
No. topics with audio examples / Avg. number of audio: 10 / 4.3
No. topics with video examples / Avg. number of videos: 51 / 2.4
Tasks in Video Track in TREC2001
• Two distinctive tasks:
– Shot Boundary Detection task: engineering exercise to evaluate the accuracy of automatically detecting camera shot boundaries in the video corpus
– Facilitates higher-level video indexing/browsing (e.g. scene detection/navigation, news story segmentation…)
[Figure labels: Video file; Camera shot]
Tasks in Video Track in TREC2001
• Two distinctive tasks:
– Search task: running topic queries against the video corpus, searching for the video segments that answer the queries
  • Automatic
  • Interactive
– Answer segments are submitted to NIST for evaluation
Participating Groups in Search Task
• Among the 12 participating groups in the TREC2001 Video Track:
– all 12 groups took part in the Shot Boundary Task
– 8 groups took part in the Search Task
• Participants in Search Task:
– Carnegie Mellon University, USA
– Dublin City University, Ireland
– Fudan University, China
– IBM Research, USA
– Johns Hopkins University, USA
– Lowlands Group (Netherlands)
– University of Maryland, USA
– University of North Texas, USA
Carnegie Mellon University (USA)
• Used Informedia Digital Video Library’s standard processing modules:
– Shot Boundary Detection (using color histogram comparison)
– Keyframe extraction
– Speech recognition (using Sphinx speech recogniser with 64,000 word vocabulary)
– Face detection
– Video OCR
– Image search based on color histogram features in different colour spaces and textures
• Informedia interface for Interactive track; users allowed to switch between multiple image search engines
• Image retrieval augmented to process I-frames (not only keyframes)
• Speaker identification component used to compare query audio example to the audio in the retrieved video segment
• Image retrieval & video OCR had the largest impact on performance
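Shot boundary detection by histogram comparison, as named in the CMU module list, can be sketched roughly as follows. This is an illustrative simplification and not the Informedia implementation: frames are assumed to be flat lists of 8-bit intensity values rather than colour images, and the bin count, threshold and function names are all assumptions.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalised intensity histogram of one frame (a flat list of 0-255 values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame) or 1  # avoid division by zero on an empty frame
    return [c / total for c in counts]


def histogram_distance(h1, h2):
    """Half the L1 distance between two normalised histograms, in [0, 1]."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2


def detect_cuts(frames, threshold=0.5):
    """Return indices i where frame i starts a new shot (hard cuts only)."""
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # A large histogram change between consecutive frames suggests a cut.
        if histogram_distance(previous, current) > threshold:
            cuts.append(i)
        previous = current
    return cuts


dark, bright = [10] * 16, [240] * 16
print(detect_cuts([dark, dark, bright, bright, dark]))  # [2, 4]
```

A fixed threshold of this kind only catches abrupt cuts; gradual transitions such as fades and dissolves would need a comparison over a longer window.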
Dublin City University (Ireland)
• Using Físchlár Digital Video System
• Shot boundary detection & Keyframe extraction
• Allowed users to browse through keyframes with different browsing interfaces including:
– Timeline browser (linear, spatial keyframe presentation)
– Slide Show browser (linear, temporal keyframe presentation)
– Hierarchical browser (hierarchical, spatial keyframe presentation)
• 30 test users (final year undergrads & research students) interacted with the system in a controlled environment
– 12 topic queries / user
– 6 minutes / topic query
– within-user setting (each user used all 3 browsers 4 times each, in round robin fashion)
• Timeline browser allowed the largest number of answer submissions, with the lowest precision; Slide Show browser vice versa
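One way to realise the round-robin, within-user design described for the DCU experiment is sketched below. The slide states only that each user used all 3 browsers 4 times each over 12 topics; the rotation by user index and the function name `browser_schedule` are assumptions made for illustration.

```python
BROWSERS = ["Timeline", "Slide Show", "Hierarchical"]


def browser_schedule(user_index, n_topics=12, browsers=BROWSERS):
    """Assign one browser per topic, cycling through the browsers in order.

    Offsetting the cycle by the user index (an assumption) varies which
    browser each user starts with.
    """
    return [browsers[(user_index + t) % len(browsers)] for t in range(n_topics)]


schedule = browser_schedule(0)
print(schedule[:4])  # ['Timeline', 'Slide Show', 'Hierarchical', 'Timeline']
print({b: schedule.count(b) for b in BROWSERS})  # each browser is used 4 times
```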
Fudan University (China)
• Tried 17 topics (including people searching, video text searching, camera motion, etc.)
• Feature extraction modules:
– qualitative camera motion analysis module
– face detection/recognition module (skin color based segmentation + motion/shape filtering, use of a new optimal discrimination criterion)
– video text detection/recognition module (vertical edge based methods to detect text blocks; improved logical level technique to binarize text blocks)
– speaker recognition / speaker clustering module
– Speech SDK (Microsoft) to get transcript
• Off-line indexing followed by on-line searching
IBM Research
• Members from IBM T.J. Watson Research Center & IBM Almaden Research Center
• Using IBM CueVideo System
– Shot Boundary Detection & Keyframe extraction
– MPEG-7 visual descriptors for indexing keyframes & answering automatic searches
– Statistical model for classifying & generating labels/scores for:
  • events (fire, smoke, launch)
  • scenes (greenery, land, outdoors, rock, sand, sky, water)
  • objects (airplane, boat, rocket, vehicle, faces)
– Query/filter pipelines to cascade content- & model-based searching, e.g. “shots that have similar colour to this image, have label ‘outdoors’ and show a ‘boat’ ”
• Compared performance of content/model-based system vs. speech-based system: best results obtained by combining the two methods
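The cascaded query/filter idea from the IBM slide (“similar colour to this image, label ‘outdoors’, shows a ‘boat’”) can be sketched as a content-based filter followed by model-based label filters. The data layout, thresholds and function names below are hypothetical and do not represent the CueVideo API.

```python
def colour_similarity(hist_a, hist_b):
    """1 minus half the L1 distance between two normalised histograms."""
    return 1.0 - sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / 2


def cascade_search(shots, query_hist, required_labels,
                   min_similarity=0.7, min_score=0.5):
    """Content-based stage first, then one model-based filter per label.

    shots: list of dicts with keys "id", "hist" (normalised colour histogram)
           and "labels" (label -> classifier confidence); all hypothetical.
    """
    scored = [(colour_similarity(s["hist"], query_hist), s) for s in shots]
    # Stage 1: keep shots whose colour is close to the query image.
    candidates = [(sim, s) for sim, s in scored if sim >= min_similarity]
    # Stage 2: keep shots whose classifier score is high for every label.
    for label in required_labels:
        candidates = [(sim, s) for sim, s in candidates
                      if s["labels"].get(label, 0.0) >= min_score]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [s["id"] for _, s in candidates]
```

Each stage narrows the candidate set produced by the previous one, so the label filters only ever examine shots that already passed the colour stage.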
Johns Hopkins University (USA)
• Automatic searching:
– Keyframes are indexed by color histogram & image texture
– Query representation consists of the image & video portions of the information need
– Similarity is measured by a weighted distance between the image features of the query representation and those of the indexed keyframes; the shots associated with the most similar keyframes are then retrieved
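The weighted-distance ranking described for the JHU system might look like the following sketch. The equal feature weights, the L1 distance, and the rule of scoring a shot by its closest keyframe are assumptions for illustration; the slide does not specify them.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_distance(query, keyframe, w_colour=0.5, w_texture=0.5):
    """Weighted sum of colour-histogram and texture distances."""
    return (w_colour * l1_distance(query["colour"], keyframe["colour"])
            + w_texture * l1_distance(query["texture"], keyframe["texture"]))


def rank_shots(query, keyframes, top_k=3):
    """Rank shots by the distance of their closest keyframe to the query.

    keyframes: list of (shot_id, features) pairs; a shot may appear
    several times if it has several keyframes.
    """
    best = {}
    for shot_id, features in keyframes:
        d = weighted_distance(query, features)
        if shot_id not in best or d < best[shot_id]:
            best[shot_id] = d
    return sorted(best, key=best.get)[:top_k]
```

Changing `w_colour` and `w_texture` shifts how much each feature type contributes to the final ranking.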
Lowlands Group (The Netherlands)
• Joint group comprising the database group of CWI, the multimedia group of TNO, the vision group of the University of Amsterdam and the language technology group of the University of Twente
• Retrieval engine based on:
– face detection
– camera motion detection (pan, tilt, zoom)
– monologue detection
– video OCR detection
• System heuristically selected a set of filters based on the detectors by analysing the query text with WordNet
• Compared performance with a transcript-based system (transcripts provided by CMU)
• Transcript-based system outperformed features-based system
University of Maryland (USA)
• Temporal Color Correlogram - to capture the spatio-temporal relationship of colors in a video shot
• Using MERIT system with VideoLogger video editing software (from Virage)
• Keyframe extraction (1st frame in the shot) => static image color correlogram calculation => temporal correlogram calculation (by shot segmentation in equal intervals) => shot features fed into CMRS retrieval system
• TREC topic queries were translated into example videos/images
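To give a feel for the static colour correlogram step, here is a heavily simplified autocorrelogram sketch. It assumes an image already quantised to small integer colour indices and checks only the four axis-aligned neighbours at a given distance, whereas a full correlogram considers every pixel at that distance; the temporal extension across frames of a shot is not shown. All names are hypothetical and this is not the Maryland implementation.

```python
def autocorrelogram(image, n_colours, distance=1):
    """For each colour c, estimate the probability that a pixel `distance`
    steps away (up, down, left or right) from a pixel of colour c also has
    colour c.

    image: list of rows, each a list of colour indices in range(n_colours).
    """
    height, width = len(image), len(image[0])
    same = [0] * n_colours
    total = [0] * n_colours
    offsets = [(distance, 0), (-distance, 0), (0, distance), (0, -distance)]
    for y in range(height):
        for x in range(width):
            colour = image[y][x]
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    total[colour] += 1
                    if image[ny][nx] == colour:
                        same[colour] += 1
    # Colours with no counted neighbours get probability 0.0.
    return [s / t if t else 0.0 for s, t in zip(same, total)]


print(autocorrelogram([[0, 0], [0, 0]], n_colours=2))  # [1.0, 0.0]
print(autocorrelogram([[0, 1], [1, 0]], n_colours=2))  # [0.0, 0.0]
```

Unlike a plain colour histogram, this feature distinguishes a uniform region from a checkerboard containing the same colours, because it records how colours are arranged spatially.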
University of North Texas (USA)
• Keyframe extraction (frames every 5 seconds)
• Redundant keyframe removal (to ensure presence of frames outside the prescribed normal distribution limits)
• Resulting keyframes placed into UNT’s Brighton Image Searcher application (retrieval based on mathematical measures that correspond to primitive image features)
• 2 group members worked on 13 topics, retrieving relevant keyframes for each topic
• Chosen keyframes were then used as exemplars to find other, similar keyframes
• Precision scores were better than expected due to the presence of human judgement
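Fixed-interval keyframe sampling of the kind described for UNT (one frame every 5 seconds) reduces to simple index arithmetic. The frame rate and function name below are assumptions; the slide does not state how frames were actually selected.

```python
def keyframe_indices(n_frames, fps, interval_seconds=5.0):
    """Indices of the frames sampled once every `interval_seconds`."""
    # Number of frames between consecutive keyframes, at least 1.
    step = max(1, round(fps * interval_seconds))
    return list(range(0, n_frames, step))


# A 16-second clip at an assumed 25 frames per second has 400 frames.
print(keyframe_indices(400, fps=25))  # [0, 125, 250, 375]
```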
Summary & Analysis of Approaches
• Varied approaches by different groups
– Interactive searching vs. automatic searching
– Speech recognition transcript vs. visual-only
– Various combinations of different features for retrieval
– Experienced groups vs. new groups in video retrieval
• Performance (Precision) results varied greatly:
– Interactive: Best group 0.6 - Worst group 0.23 (across same 31 topics)
– Automatic: 0.609 - 0.002
• The video track was still shaping itself in 2001 & not complete
– only small-scale comparisons possible (within-topic, between closely related system variants)
– cross-system comparison possible only after achieving better consistency in topic formulation, agreement on better measures, and larger numbers of data points
• Difficulties & unforeseen problems highlighted, tackled in 2nd Video track in TREC2002
Conclusions
• Revealed lots of issues to be addressed in evaluating the performance of retrieval on digital video information
• There are groups working in this area worldwide who have the capability and the systems to support real information retrieval on significant volumes of digital video content
• 2nd Video Track (2002)
– more than 20 participating groups
– 68.5 hours of video document corpus
– a focused set of 25 topic queries
– Tasks:
  • Shot Boundary Detection - as before
  • Semantic feature extraction task (face, indoor/outdoor, landscape/cityscape, speech/music/monologue, etc.)
  • Search - interactive or automatic as before
TREC2001 Video Track website with papers:
http://www-nlpir.nist.gov/projects/t01v/t01v.html
Authors’ Note: The authors wish to extend their sympathies to the family and friends of our co-author, Mark E. Rorvig, who passed away shortly before this paper was submitted.