Definition
A multimedia search tool allows the user to search the Web for non-textual materials such as audio, image, radio, television and video files
The search tool can be a directory, meta search engine or a single search engine
Types of multimedia
5 Types of Multimedia Search Engines:
– Video
– Audio
– Radio
– TV
– Image
Evaluation Criteria
Scope and Navigation
How big and comprehensive is the index? Is this clearly stated? How often is the index updated? Is this clearly stated? Is the site easy to navigate? Is the screen cluttered? Is the user inundated with advertising and flashy marketing?
Search Tool
Does the search tool use spiders to compile its own searchable database of the Web? Or does it search the databases of multiple individual search engines?
Search Options
Are simple and advanced search options available? Is it possible to conduct Boolean, phrase and wildcard or truncation searches? Are field searches available? Can searches be filtered?
Performance
Are the results returned quickly? How well do the results match the query? Are there dead links? Is there any obvious duplication of results?
Presentation
How are the search results grouped? Can the search results be customized? How is relevancy determined?
Support
Is there a link to help? Is the help information suitable for beginning and advanced searchers?
Examples
Multimedia search tools which allow the user to search for video files
– AllTheWeb: www.alltheweb.com
– AltaVista: www.altavista.com/video
– Lycos: http://multimedia.lycos.com
– Singingfish: www.singingfish.com
– Dogpile: www.dogpile.com
– Fazzle: www.fazzle.com
Overview
“Conventional” methods: catalogs, databases and analog previewing
Why digitize?
Discovering video structure
Automatic and manual indexing
Data models & user interfaces
Prospects for the future: mobile and web services
Conventional Methods
Structured databases:
– AV cataloging (AACR2, MARC 21)
– Shot lists
– Asset management systems
“Pathfinders” (librarians, archivists)
Embedded markers: hints, chapters, scenes (DVD)
Video logging systems
Hardware browse/skim: FF, slow-mo, etc.
Video Search in Libraries
Mainly MARC
– 245: Title (usually main entry)
– 300: Description (physical piece)
– 505: Contents
– 508: Credits
– 511: Performer note
– 520: Summary
520: Summary
505: Contents
650: Subject headings
Footage:
– Opening credits. Chocolate factory workers. Alan Coxon and Kathy Sykes preparing food. Man biting into chocolate bar (0'00"-0'50")
– Alan opening fridge and walking over to Kathy at table. Kathy grating orange. Alan showing ingredients for cheesecake. Cooking chocolate. Alan and Kathy breaking chocolate and smelling it. Breaking chocolate. Kathy tasting chocolate (0'51"-2'00")
Browse and Skim: Media Players
DVD player clones; can be enhanced with SDKs
Media Players are DECODERS
Pause, FF, rewind
Variable speed
Navigate menus, chapters, tracks
Insert markers
Change audio and subtitles
Show closed captioning
Shuttle/scrub
Media Player Example
Start, stop, pause, rewind to beginning, FF to end, advance by frame
File markers; added by end-user
Play speed settings: 0.5 >> 3X
What is video
Authored video has:
Series of still images @ 25-30 fps
Structure: frames >> shots >> scenes
MODALITIES
– (Audio tracks)
– (Text: captioning, subtitles, etc.)
– (Graphics: logos, running tickers, etc.)
Production metadata: timestamp, datestamp, flash on/off
Advantages of Digital Video
Store and deliver over networks
Allow analysis by computers
Allow auto & manual indexing
USING:
– Image processing
– Signal processing
– Information visualization
Why Compress Video?
– 1 frame (@TV brightness) = 0.9 megabytes (MB) of storage
– At 29 fps, each second = 26.1 MB of storage
– 30-minute film = about 47 gigabytes (GB) of storage (26.1 MB × 1,800 seconds)
OBJECT: Make file smaller; retain as much information as possible
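The storage figures can be checked with a few lines of arithmetic. This is only a sketch: the per-frame size and frame rate are the slide's numbers, while the use of decimal units (1 GB = 1,000 MB) and the function name are assumptions made for illustration.

```python
# Figures from the slide; decimal units (1 GB = 1000 MB) are an assumption.
MB_PER_FRAME = 0.9
FRAMES_PER_SECOND = 29
FILM_MINUTES = 30


def uncompressed_size_mb(minutes, fps=FRAMES_PER_SECOND, mb_per_frame=MB_PER_FRAME):
    """Return the raw (uncompressed) size in MB of a clip of the given length."""
    seconds = minutes * 60
    return seconds * fps * mb_per_frame


mb_per_second = FRAMES_PER_SECOND * MB_PER_FRAME
film_mb = uncompressed_size_mb(FILM_MINUTES)

print(f"{mb_per_second:.1f} MB per second")
print(f"{film_mb / 1000:.1f} GB for a {FILM_MINUTES}-minute film")
```

Half an hour of uncompressed video therefore runs to tens of gigabytes, which is the motivation for compression.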
Encoding Formats
These formats all use some kind of compression and similar encoding methods. There are many CODECS: some “lossy,” others “lossless.”
AVI: Audio Video Interleave
QuickTime
MPEG family: MPEG-1, 2, 4
H.261: for video conferencing
New: H.264; JPEG 2000
CODECS
Compressor/Decompressor, or Coder/Decoder
Produce and work with encoding formats
Central to compression and encoding; perform signal and image processing tasks
Examples: Cinepak, Indeo, Windows Media Video
MPEG-4: DivX, Xvid and 3ivX are implementations of certain compression recommendations of MPEG-4
How Do CODECS Work?
Movement creates “temporal aliasing”: human eye/brain fills in the gaps
Blurring produced by camera shutter softens edges
Modeled by CODECS and algorithms
Goal: acceptable facsimile of moving scene
Example
Jermyn, I. Psychovisual Evaluation of Image Database Retrieval and Image Segmentation
Encoding Methods: predictive
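Sampling: value of function @ regular intervals (example: brightness of pixels); the sampling rate sets how often a value is taken (1 in 10 vs. 1 in 100 frames)
Quantization: each sampled value is rounded to one of a limited set of levels; fewer levels mean smaller numbers to store but a coarser picture
Discrete cosine transform (DCT): an array of data (not just one pixel) is transformed into another set of values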
Inter-frame vs. Intra-frame encoding
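The three operations named above can be sketched on a single row of pixels. This is a toy illustration, not what any particular codec does: the function names, the 0-255 brightness scale and the unnormalized 1-D DCT-II are all assumptions chosen to keep the example short.

```python
import math


def sample(signal, step):
    """Sampling: keep every `step`-th value of the signal."""
    return signal[::step]


def quantize(values, levels, max_value=255):
    """Quantization: snap each value to the nearest of `levels` evenly spaced levels."""
    step = max_value / (levels - 1)
    return [round(round(v / step) * step) for v in values]


def dct(block):
    """Unnormalized 1-D DCT-II: turn a block of values into frequency coefficients."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]


every_third = sample(list(range(10)), 3)          # keeps positions 0, 3, 6, 9
four_levels = quantize([0, 40, 90, 200, 255], 4)  # levels are 0, 85, 170, 255

# A perfectly flat row of pixels: all of its energy lands in the first coefficient.
flat_row = [100] * 8
coeffs = dct(flat_row)
```

For the flat row, only the first DCT coefficient is non-zero, so one number describes eight pixels. Smooth image regions behave similarly, which is why transforming a block can be cheaper to store than the pixels themselves.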
Video Compression
Repetition and Patterns
– Say we have a video of a horse running. The movement of the horse’s legs is the same step after step.
– The codec recognizes this repetition: a string of numbers recurring over several frames forms a repeated pattern (the horse’s moving legs).
– The codec saves this data as “ditto, 319 more times”, or uses a single number token to represent the repeated pattern.
– Relatively lossless
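The “ditto, 319 more times” idea is essentially run-length encoding. A minimal sketch follows, assuming each stride of the horse is represented by a made-up tuple of numbers; real codecs use far more elaborate pattern matching.

```python
def rle_encode(values):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    out = []
    for v, count in runs:
        out.extend([v] * count)
    return out


stride = (12, 40, 77, 40)     # hypothetical numbers describing one stride
ending = (5, 5, 5, 5)         # something different at the end of the clip
data = [stride] * 320 + [ending]

encoded = rle_encode(data)    # the stride once, plus a count of 320
restored = rle_decode(encoded)
```

Decoding returns exactly the original sequence, so in this sketch nothing is lost: 321 entries are stored as two.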
Video Compression
Averaging
– Same method a JPEG uses
– Looks at a block of pixels and averages their color and brightness, saving one number rather than 4, 9, 16, etc.
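Block averaging can be sketched as follows. The example uses plain integer averaging on a tiny grayscale frame; it illustrates the idea only and is not a reproduction of the JPEG pipeline. The frame values and function name are invented for the example.

```python
def average_blocks(frame, size):
    """Replace each size x size block of pixels with its (integer) mean value."""
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(0, rows, size):
        out_row = []
        for c in range(0, cols, size):
            block = [
                frame[y][x]
                for y in range(r, min(r + size, rows))
                for x in range(c, min(c + size, cols))
            ]
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out


frame = [
    [10, 12, 200, 202],
    [14, 16, 204, 206],
    [50, 50, 90, 90],
    [50, 50, 90, 94],
]
small = average_blocks(frame, 2)  # 16 numbers become 4
```

With 2 x 2 blocks the data shrinks to a quarter of its size, at the cost of fine detail inside each block. This loss cannot be undone.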
Video Compression
Range Reduction
– The range in brightness of an original video might be on a scale of 1 to 500, meaning that the lightest part of the sky is 500 times brighter than the darkest shadow under a forest.
– This brightness is recorded as a number for each pixel in each frame.
– A wide range of brightness requires that a big number (8 or 16 bits) be recorded for each pixel.
– The codec reduces the range to a scale of, say, 1 to 100. The sky is still brighter, but only 100 times or so brighter.
– Because the number is saved millions of times in the movie file, reducing its size removes a lot of data from the file. Most people will not notice this range reduction.
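A small sketch of range reduction, using the 1-to-500 and 1-to-100 scales from the text. The linear rescaling and the function names are assumptions. The bit counts printed are the bare minimum for these two scales, not the 8 or 16 bits that real formats use.

```python
def bits_needed(max_value):
    """Smallest number of bits that can hold values up to max_value."""
    return max_value.bit_length()


def reduce_range(values, old_max=500, new_max=100):
    """Linearly rescale brightness values from 1..old_max to 1..new_max."""
    return [max(1, round(v * new_max / old_max)) for v in values]


pixels = [1, 3, 250, 499, 500]   # darkest shadow ... brightest sky
reduced = reduce_range(pixels)

print(reduced)
print(bits_needed(500), "bits per pixel before,", bits_needed(100), "after")
```

Note that the two darkest values and the two brightest values each collapse into a single number: that lost distinction is the data being removed.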
Video Compression
Frame-Difference
– For the first frame of a video, every pixel is recorded.
– For subsequent frames, only those pixels that have changed are recorded.
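Frame differencing can be sketched with frames represented as flat lists of pixel values. The representation and the function names are assumptions for illustration; real codecs work on blocks and motion vectors rather than individual pixels.

```python
def diff_frame(previous, current):
    """Record only the pixels that changed, as {pixel_index: new_value}."""
    return {
        i: new
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    }


def apply_diff(previous, changes):
    """Rebuild the next frame from the previous frame plus the recorded changes."""
    frame = list(previous)
    for i, value in changes.items():
        frame[i] = value
    return frame


first = [0, 0, 9, 9, 0, 0]    # stored in full
second = [0, 0, 0, 9, 9, 0]   # a bright object has moved one pixel to the right

changes = diff_frame(first, second)   # only two of the six pixels are stored
rebuilt = apply_diff(first, changes)
```

The player reconstructs the second frame from the first frame plus the two stored changes.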
Video Structure
Video
Scene
Shot
Frame
Keyframes
The MPEG-4 codec used to encode this video clip allows for “keyframes” to be inserted at fairly short intervals.
Keyframes are frames in which all the information has been encoded. We can navigate from keyframe to keyframe, gaining some quick info about the content.
Algorithms can be written to “grab” keyframes to create a storyboard, which can be used to make a visual index of the video.
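As a rough sketch of keyframe “grabbing”, suppose a decoder has already told us which frames are keyframes. The `Frame` class, its `is_keyframe` flag and the 12-frame keyframe interval below are hypothetical stand-ins; a real tool would read this information from the video container.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int          # position of the frame in the clip
    is_keyframe: bool   # hypothetical flag supplied by a decoder


def storyboard(frames, every=1):
    """Return the indices of keyframes, optionally keeping only every n-th one."""
    keys = [f.index for f in frames if f.is_keyframe]
    return keys[::every]


# Hypothetical 60-frame clip with a keyframe every 12 frames.
clip = [Frame(i, i % 12 == 0) for i in range(60)]

board = storyboard(clip)             # one thumbnail per keyframe
sparse = storyboard(clip, every=2)   # a shorter storyboard
```

The returned indices are the frames to render as thumbnails in the visual index.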
Shot Boundary Detection
Algorithms that compare the similarities between nearby frames. When the similarities fall below a pre-determined level, the limit of a “shot” is automatically defined:
– Edge detection
– Compare color histograms
– Compare motion vectors
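Histogram comparison is the easiest of these to sketch. The example below treats each frame as a flat list of grayscale values of equal length and uses histogram intersection as the similarity measure. The bin count, the threshold of 0.5 and the function names are assumptions, not values from any specific system.

```python
def histogram(frame, bins=4, max_value=256):
    """Count how many pixel values fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = max_value / bins
    for p in frame:
        counts[min(int(p / width), bins - 1)] += 1
    return counts


def similarity(h1, h2):
    """Histogram intersection scaled to 0..1 (1 means identical distributions)."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)


def shot_boundaries(frames, threshold=0.5):
    """Return the indices of frames that start a new shot."""
    hists = [histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if similarity(hists[i - 1], hists[i]) < threshold
    ]


dark = [10, 20, 30, 40]
dark_again = [12, 22, 28, 45]
bright = [200, 210, 220, 250]

cuts = shot_boundaries([dark, dark_again, bright, bright])
```

The two dark frames have the same histogram even though their pixels differ slightly, so no cut is reported between them. The jump to the bright frame drops the similarity to zero and marks a boundary.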
Spatial & Temporal Segmentation
1. Use shot boundary detection and keyframes to define shots & choose representative frames
2. Use CBIR (Content-based Image Retrieval) techniques to reveal features in representative frames (shapes, colors, textures)
CBIR Techniques
Images (frames) have no inherent semantic meaning: only arrays of pixel intensities
– Color Retrieval: compare histograms
– Texture Retrieval: relative brightness of pixel pairs
– Shape Retrieval: humans recognize objects primarily by their shape
– Retrieval by position within the image
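Texture retrieval can be illustrated with a very crude measure: the average brightness difference between horizontally adjacent pixel pairs. This measure, the tiny images and the function names are all assumptions made for the sketch; practical CBIR systems use richer texture descriptors.

```python
def texture_contrast(image):
    """Mean absolute brightness difference between horizontally adjacent pixel pairs."""
    diffs = [
        abs(row[x] - row[x + 1])
        for row in image
        for x in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs)


def rank_by_texture(query, collection):
    """Sort image names by how close their texture score is to the query's."""
    target = texture_contrast(query)
    return sorted(
        collection,
        key=lambda name: abs(texture_contrast(collection[name]) - target),
    )


# Hypothetical keyframes, stored as rows of grayscale values.
collection = {
    "sky": [[90, 91, 92], [90, 92, 93]],
    "fence": [[10, 240, 10], [240, 10, 240]],
}
smooth_query = [[100, 101, 102], [100, 101, 102]]
striped_query = [[0, 255, 0], [255, 0, 255]]

smooth_hits = rank_by_texture(smooth_query, collection)
striped_hits = rank_by_texture(striped_query, collection)
```

A smooth query ranks the smooth "sky" frame first and a striped query ranks "fence" first, although no semantic labels were consulted: the ranking comes purely from pixel intensities.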