Semantic Content-based Access To Hypervideo Databases

Haitao Jiang

Major Professor: Ahmed K. Elmagarmid

Computer Science Department

Purdue University

Organization Of The Talk

• Introduction And Review Of Related Work

• Logical Hypervideo Data Model (LHVDM)

• Semantic Content-based Video Queries

• A Web-based Logical Hypervideo Database (WLHVDB)

• Conclusion

Introduction

• Digital Video And Video Databases

• Basic Research Problems

• Research Motivation

• Research Goal

Unique Characteristics Of Video Data

• Semantics: rich and ambiguous

• Relationship: ill-defined

• Structure: unclear

• Dimension: spatial and temporal

• Volume: huge

Video Data Content

• Visual Content

• Audio Content

• Text Content

• Semantic Content

Research Problems

• Video Data Modeling

• Video Data Indexing

• Video Data Query

• Video Browsing

Video Data Model Requirements

• Content-based Data Access

• Video Data Abstraction

• Variable Data Access Granularity

• Dynamic And Incremental Video Annotation

Video Data Model Requirements (con.)

• Video Data Independence

• Spatial And Temporal Characteristics

• Video And Meta-data Sharing And Reuse

Related Work

• Video Data Modeling, Indexing, And Querying

• Video Objects

• Video Browsing

Video Data Modeling, Indexing, and Querying

• Traditional Database Approach

• Visual Content Or Segmentation-based Approach

• Stratification Or Annotation Layering Approach

Traditional Database Approach

• Categorize And Predefine Video Data Attributes/Values

• Use Traditional Databases And SQL

• Inflexible And Limited

• Examples: VISION, Video Database Browser

Segmentation-based Models

• Parse And Segment Video Streams

• Index On Visual Features Of RFrames

• Extract High Level Logical Structure And Semantics By Classifying Against Domain Models

Segmentation-based Models (con.)

• Can Be Fully Automated

• Lack Of Flexibility

• Limited Semantics

• Video Streams Need To Be Well-structured

• Examples: JACOB, QBIC, Informedia

Stratification

• Segment Video Semantics

• Concept Of Logical Video Data

• Allows For Semantic Content-based Video Access

• Annotation Can Be Tedious And Biased

• Examples: VideoSTAR, Algebraic Video

Stratification (con.)

Existing Models

– Have Limited Temporal Queries

– Have Limited Video Browsing Mechanisms

– Lack Multi-user Views And Data Sharing

– Lack Modeling Of Video Objects

– Lack Spatial And Spatial-Temporal Query Capabilities

Different Forms Of Video Annotation

• Multi-layer Icons - MediaStream

• Keywords

• Free Text Documents

• Other Types Of Annotation?

Sources Of Video Annotations

• Closed Caption

• Text In Video Frames: highlight detection and OCR

• Voice Recognition

• Manual Annotation

Annotation Support In A Video Data Model

• Annotation of Arbitrary Sequence

• Incremental Creation, Deletion, And Modification

• Multi-user Annotation Sharing

• Arbitrary Overlap Of Annotations
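
The annotation requirements listed here (arbitrary sequences, incremental updates, multi-user sharing, arbitrary overlap) can be illustrated with a minimal Python sketch. The `Annotation` class, its fields, and `annotations_over` are hypothetical names chosen for illustration, not part of any system discussed in the talk; frame ranges are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A description attached to an arbitrary, inclusive frame range (assumed layout)."""
    start: int
    end: int
    user: str
    text: str


def annotations_over(annotations, start, end):
    """Return every annotation whose frame range overlaps [start, end]."""
    return [a for a in annotations if a.start <= end and start <= a.end]


# Annotations from different users may overlap freely.
strata = [
    Annotation(0, 100, "system", "news opening"),
    Annotation(50, 150, "alice", "interview"),
    Annotation(200, 300, "bob", "weather report"),
]

# Incremental creation: adding an annotation does not disturb existing ones.
strata.append(Annotation(90, 210, "alice", "studio shots"))

overlapping = annotations_over(strata, 95, 120)
```

Because each annotation carries its own range, nothing forces the ranges to be disjoint or nested, which is the property the last bullet asks for.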

Video Objects

• Index On Spatial And Temporal Information

• MBR as the Spatial Representation

• Narrow Focus And Lack Of Data Abstraction

• Limited Video Queries

• Examples: AVIS, CVOT

Video Browsing

• Visual Content-based Browsing

– Film Strips

– Salient Images

– Scene Clustering Graph

• Need Semantic Content-based Browsing

• Need Inter-Video Navigation

Research Motivations

• Visual Content-based Video Access IS Important BUT Lacks Semantics

• Users Often Prefer Semantic Content-based Video Data Access

• Many Applications: Digital Video Library, Distance Learning, etc.

• Web Is An Emerging Way Of Information Sharing

Research Goal

• Goal: To Provide Effective And Flexible Semantic Content-based Video Data Access In A Distributed and Multi-user Sharing Environment

• Both Spatial And Temporal Video Queries

• Heterogeneous Applications And User Views

• Semantic Content-based Browsing

JACOB Project

Ardizzone and Cascia et al. 1997

• Visual Content-based Access To Images And Videos

• RFrames Are Extracted And Serve As Descriptors Of Video Segments

• Index On Visual Features (Color, Motion, Texture, etc.)

Informedia Project

M. A. Smith, T. Kanade, M. G. Christel, D. B. Winkler et al. CMU

• Video Abstraction: Title - Poster Frame - Film Strip - Skim Video

• Speech Recognition->Transcript->Natural Language Processing->Keywords->Align to Frames

• Face And Keyword Search

VISION Digital Library

K. M. Pua, S. Gauch et al. University of Kansas, 1993 - 1994

• Practical And Cost-effective Implementation But Very Limited

• Video Storage System + IR system (Illustra - An ORDBMS)

• Text Is Stored As One Table Entry Of Video Data

• Supports Boolean Operators

OVID System

Oomota and Tanaka, 1991

• Video Object: a set of arbitrary frame sequences with attributes and values

• Video Object Model Is Schemaless

• Data Description Sharing Via “Interval-inclusion Based Inheritance”

• Users Can Decide Which Attributes Are Shared

OVID System (con.)

• Video-Object Composition: merge, interval projection, and overlap

• VideoSQL

– SELECT: continuous/Incontinuous/anyObject

– WHERE: attribute is [value] / attribute contains [value] / defineOver [frames]

• Browsing: VideoChart - bar chart representation of video objects

Virtual Video Browser

Little et al., 1993

• Predefined Schema With Fixed Attributes

• Descriptions Cannot Be Overlapped or Nested

• Targeted at MOD: not suitable for dynamic creation and modification of video

• No Personalized View

• No Spatio-temporal Queries

Video Database Browser System

Rowe, Boreczky et al. 1994

• Classify Metadata Into: Bibliographic, Structural, And Content Data

• Use Relational Database Schema (POSTGRES RDBMS)

• Support Video Queries On Predefined Attributes

Video Stratification

Smith and Davenport, MIT, 1991 - 1992

• Associate Description To A Sequence Of Video Frames

• Simple Keyword Search

• Strata May Overlap

• Relation Among Strata Is Absent

BRAHMA

Dan et al., IBM T. J. Watson, 1996

• Browsing and Retrieval Architecture for Hierarchical Multimedia Annotations

• Each Annotation Node is an Attribute / Value Pair

• Nodes Can Be Dynamically Created and Shared by Multiple Users

Media Streams

Davis 1993

• Goal: overcome keyword annotation weaknesses

• Iconic Video Content Annotation

• Hierarchical: general -> specific

• Represent And Match Temporal Relations

• Fixed Vocabulary

• Doesn’t Address Textual Data, e.g. Closed Caption

Algebraic Video System

Weiss et al., MIT, 1995

• Goal: Temporal Video Composition

• Basic Approach: Stratification

Algebraic Video Data Model

• Video Expression:

– multi-window, spatial, temporal and content combination of raw video segments

– recursively constructed using video algebraic operators

• Video Algebraic Operators: creation, composition, output, and description

Algebraic Video Data Model

• Provides Multiple Coexisting Views (Nested Stratification)

• Video Query: Boolean combination of attributes

• Temporal Constraint Is Expressed As Attribute Values

• Video Browsing Within The Expression

VideoSTAR (STorage And Retrieval) System

Hjelsvold et al., 1995

• Goal: Multi-user Video Information Sharing

• Basic Approach: Stratification

VideoSTAR: Generic Video Data Model

• Continuous Media Objects (CMObjects)

• MediaStream:

– Virtual Video Streams (VideoStreams)

– Video/Audio Recordings (StoredMediaSegments)

• An Arbitrary StreamInterval can be annotated

VideoSTAR: Video Querying and Browsing

• Three Kinds of Video Context:

– Basic, Secondary, and Primary

– Unconditional context sharing

• VideoSTAR Query Algebra

– Boolean, Set, and Temporal Operators

– Based on Attribute/Value

– Users Need to Choose Query Context

VideoSTAR: Video Querying and Browsing

Two Browsing Operators

• Retrieve All Annotations Over a Video Stream or Interval

• Retrieve All Structures Defined Over an Interval

Advanced Video Information System (AVIS)

Adali, Candan, Chen, Erol, and Subrahmanian, University of Maryland. MSJ 1996

• Basic Approach: Spatial Indexes + RDB

• Entities: things of interest that may or may not actually appear in the movie, including video objects, activity types, and events (roles and teams)

• Raw Video Frame Sequences

Advanced Video Information System (AVIS)

• Associate Map: entities <--> frame sequences

• Index: frame segment tree + OBJECTARRAY + EVENTARRAY + ACTIVITYARRAY

• All Clips Must Be Equal Length With No Overlap

• No Spatial and Temporal Queries

• No Logical Video Abstractions

Common Video Object Model (CVOT)

J. Li and T. Ozsu et al. University of Alberta, 1998

• Focus On Salient Objects And Based On OODBMS

• CVO Tree: each leaf is a video interval with salient objects (similar to AVIS) attached

• Video Clips Can Be Overlapped To Model Special Editing Effects (Fade In etc.)

Common Video Object Model (CVOT)

• Query Language: MOQL

– based on OQL proposed by ODMG for ODBMSs

– has both temporal and spatial operators

• Symbolic Trajectory Representation And Matching

• Logical vs. Physical Salient Objects

• Only Addresses Salient Objects

• Representation Frames (RFrames)

– Sport Highlight [Yow95]

– Caption Detection [Smith95, Yeo96]

– Keyword Spotting [Smith95]

• Explicit Models (News Video) [Swanberg93, Zhang94]

Video Browsing (con.)

• Shot Clustering Based On Visual Similarity and Temporal Locality [Yeung95, Rui98]

– Scene Change Graph (CTG) [Yeung95]

Video->Shot Segmentation->Shot Clustering->Scene Segmentation

Logical Hypervideo Data Model (LHVDM)

• Definition

• Hierarchical Video Abstractions

• Hot Video Object Modeling

• Video Indexing

• Video Semantic Association And Hypervideo

• A Generic Video Database Architecture

Logical Hypervideo Data Model (con.)

(PV, PVS, LV, LVS, HO, CD, LINKS, UV, MAP)

PV: Set Of Physical Video Streams

PVS: Set Of Physical Video Segments

LV: Set Of Logical Video Streams

LVS: Set Of Logical Video Segments

HO: Set Of Hot Objects

CD: Set Of Content Descriptions

LINKS: Set Of Video Hyperlinks

UV: Set Of User Views

MAP: Set Of Mapping Relations

Logical Hypervideo Data Model (con.)

MAP includes

PV <--> PVS: Easy Data Manipulation

PVS <--> LV: Data Independence And Data Reuse

LV <--> LVS: Multi-user View

LV, LVS <--> HO: Effective Query

LV, LVS, HO, CD <--> UV: Multi-user View Sharing

LV, LVS, HO, LINKS <--> CD: Semantic Content-based Access

Video Hyperlinks: Effective Video Browsing
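
To make the PVS <--> LV mapping concrete, here is a minimal Python sketch of how a logical video stream could be assembled from physical segments, which is one way to obtain the data independence and reuse named above. The class names, fields, and the inclusive frame-range convention are assumptions made for illustration; the talk does not specify a concrete representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalSegment:
    """PVS: an inclusive frame range inside one physical video stream (PV)."""
    pv_id: str
    start_frame: int
    end_frame: int

    def length(self) -> int:
        return self.end_frame - self.start_frame + 1


@dataclass
class LogicalVideo:
    """LV: an ordered sequence of physical segments, possibly from different PVs."""
    lv_id: str
    segments: list = field(default_factory=list)

    def length(self) -> int:
        return sum(seg.length() for seg in self.segments)

    def to_physical(self, logical_frame: int):
        """Map a logical frame number to (pv_id, physical frame number)."""
        offset = logical_frame
        for seg in self.segments:
            if 0 <= offset < seg.length():
                return seg.pv_id, seg.start_frame + offset
            offset -= seg.length()
        raise IndexError("logical frame out of range")


# One logical stream reuses pieces of two physical recordings (hypothetical ids).
lv = LogicalVideo("lv1", [
    PhysicalSegment("tapeA", 100, 199),
    PhysicalSegment("tapeB", 0, 49),
])
```

Annotations, hot objects, and user views would then refer only to logical frame numbers, so the physical storage could change without touching them.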

Hierarchical Video Abstractions

[Figure: abstraction hierarchy of the Logical Hypervideo Data Model (LHVDM): Physical Video Streams (PVs), Physical Video Segments (PVSs), Logical Video Streams (LVs), Logical Video Segments (LVSs), Hot Objects (HOs), User Views (UVs), and Video Hyperlinks]

Hot Video Objects

• What Is A Hot Video Object?

– A Logical Video Abstraction

– A Sub-Frame Region That Is “Hot” In A Set Of Logical Frame Sequences

• Why Call Them “Hot” Objects?

– Target Of Interest

– Hyperlink Property (Hot Video Spot)

Hot Video Objects (con.)

• Implicit Hot Video Object

• Why Is Hot Object Modeling Important?

– More Precise Video Annotation And Query

– Capture Spatial Characteristics of Video Data

Hot Video Object Model

[Figure: a Hot Object (HO) and its components: Semantic Content (T), Geometric Content (G), Visual Content, Audio Content, Live Time Intervals, and Video Hyperlinks (LINK)]

Hot Object Tracking

• Template Matching

• Active Contour

$$E = \lambda E_{int} + (1 - \lambda) E_{ext}$$

$$E_{int}(V_i) = \|V_i'\|^2 + \|V_i''\|^2$$

$$E_{ext} = \int_c -\left| N(V(t)) \cdot \nabla I(V(t)) \right| \, dt$$

$$E_{ext}(V_i) = 1 - \left( \frac{D(V_i)}{255} + 1 \right) \left( -\left| N(V_i) \cdot \nabla I(V_i) \right| \right)$$

Experiments With Hot Video Object Tracking

Video Indexing

• Main Indexes: Semantic Content Descriptions

• Content Descriptions

– Hot Objects’ vs. LVSs’ vs. LVs’ vs. LINKS’

– System’s vs. Users’

– Sharing & Access Control

• Auxiliary Indexes: Various Mappings

Video Semantic Association and Hypervideo

• Why Is Semantic Association Important?

– More Effective Video Data Access

• Video Hyperlinks Represent Semantic Associations

– Hypervideo And Hypervideo Databases

– Flexible And User Adaptive Hypervideo Databases

A Generic Video Database System Architecture

[Figure: architecture diagram. Video Database Engines: Knowledge Inference Engine, Geometric Engine, Information Retrieval Engine, and an Integrator. Video Database: Video Hyperlinks, Video Indexes, Hot Objects, Video Data, Spatial Information, Video Annotations, and Knowledge Base]

Semantic Content-based Video Query Language

(Expr, Granularity, Scope, Space)

• Expr: video query expression

• Granularity:

– Logical Video Streams (v)

– Logical Video Segments (s)

– Hot Objects (o)

Semantic Content-based Video Query Language (con.)

• Scope:

– System Annotations Only (s)

– Including User Annotations (u)

– Including Sharable Other Users’ Annotations (a)

Semantic Content-based Video Query Language (con.)

• Space:

– Subset Of VDB

– Can Be Result Of Another Query

– Allows Recursive Query Refinement

– Example: Q2 = (expr, o, u, Q1)
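
The four-part query form and its recursive refinement can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's implementation: `VideoQuery`, `evaluate`, and the `matches` callback are hypothetical names, and the toy evaluator ignores granularity and scope and simply filters items, so it shows only how Space narrows the search.

```python
from dataclasses import dataclass
from typing import Optional

GRANULARITIES = {"v", "s", "o"}  # logical video streams, logical segments, hot objects
SCOPES = {"s", "u", "a"}         # system only, plus user, plus other users' sharable


@dataclass(frozen=True)
class VideoQuery:
    expr: str
    granularity: str
    scope: str
    space: Optional["VideoQuery"] = None  # None means the whole VDB

    def __post_init__(self):
        if self.granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {self.granularity!r}")
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope!r}")

    def depth(self) -> int:
        """Number of queries in the refinement chain."""
        return 1 if self.space is None else 1 + self.space.depth()


def evaluate(query, database, matches):
    """Evaluate the inner query first, then filter its result with this query.

    `matches(expr, item)` is a caller-supplied predicate (an assumption here).
    """
    if query.space is None:
        candidates = list(database)
    else:
        candidates = evaluate(query.space, database, matches)
    return [item for item in candidates if matches(query.expr, item)]


# Q2 = (expr, o, u, Q1): refine the result of Q1.
q1 = VideoQuery("car", "s", "s")
q2 = VideoQuery("red", "o", "u", q1)
toy_db = ["red car", "blue car", "red bird"]
result = evaluate(q2, toy_db, lambda expr, item: expr in item)
```

Passing a query as the Space of another query is what lets a user narrow a result step by step without restating the earlier conditions.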

Boolean and IR Query Operators

• AND, OR, NOT

• NOT Is Not Safe

• ADJ (adjacent)

• Regular Expression And Approximate Matching

Temporal Query Operators

Thirteen Interval Relations [Allen83]

[Figure: interval diagrams illustrating Before, After, Starts, Ends, Overlaps, and During]
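
As a worked illustration of the thirteen interval relations, the Python sketch below classifies a pair of intervals. It uses Allen's standard relation names, where "finishes" corresponds to the "Ends" label on this slide; the function name and the `(start, end)` tuple representation with `start < end` are assumptions made for the example.

```python
def allen_relation(a, b):
    """Return the Allen relation that interval a holds with respect to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only proper partial overlap remains at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

For example, `allen_relation((1, 3), (2, 5))` is `"overlaps"`, while swapping the arguments gives the inverse relation `"overlapped-by"`.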

Temporal Query Operators (con.)

• Operators For LVS

– Interval Temporal Operators

• Operators For HO

– Instance (or Point) Temporal Operators: more precise query specification

– Interval Temporal Operators

Spatial Query Operators

• Directional

• Topological

• Distance

[Figure: directional relations, with West and East labeled]
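
One example of each operator family (directional, topological, distance) can be sketched in Python over minimum bounding rectangles. Using MBRs for the geometry is an assumption made for this sketch, as are the `MBR` class and the function names; the talk does not state how these operators are computed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    """Axis-aligned bounding rectangle with x1 <= x2 and y1 <= y2."""
    x1: float
    y1: float
    x2: float
    y2: float


def west_of(a: MBR, b: MBR) -> bool:
    """Directional: a lies entirely to the west (left) of b."""
    return a.x2 <= b.x1


def east_of(a: MBR, b: MBR) -> bool:
    """Directional: a lies entirely to the east (right) of b."""
    return a.x1 >= b.x2


def intersects(a: MBR, b: MBR) -> bool:
    """Topological: the rectangles share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def touches(a: MBR, b: MBR) -> bool:
    """Topological: the rectangles meet only along an edge or at a corner."""
    if not intersects(a, b):
        return False
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return width == 0 or height == 0


def distance(a: MBR, b: MBR) -> float:
    """Distance: smallest Euclidean gap between the rectangles (0 if they meet)."""
    dx = max(b.x1 - a.x2, a.x1 - b.x2, 0)
    dy = max(b.y1 - a.y2, a.y1 - b.y2, 0)
    return math.hypot(dx, dy)
```

Applied per frame to two hot objects, predicates like these could back spatial operators such as Sright or Stouch in the query examples.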

Query Processing

• Recursive And Top-Down/Bottom-Up

• Supports Distributed Evaluation

• “Closed World Assumption”?

• Answer: No (Raw Video Data) And Yes (Within User’s View)

• Reason: Video Data Is Semantically Rich

Query Search Space

• User Definable

• System Owned Subset of VDB Is Always Searched

• A User’s Queries Are Processed Within That User’s View

• Determined By A Query’s Granularity, Scope, Space, And User

Efficient Query Processing

• Query Augmentation or Pre-filtering

• Query Evaluation Order

• Query Caching And Knowledge Base

Query Examples

• Simple Queries

“Find video clips that have a red BMW Z3 in them”

Q1 = ((red BMW

Query Examples (con.)

• Temporal Query

“Find video clips in which a scene with a bird flying appears after the scene with a child eating ice cream”

Q3 = ((bird flying) Tafter (child eating ice cream), o, -,-)

Q3’ = ((bird flying) Iafter (child eating ice cream), s, -,-)

• Spatial Query

“Find video clips in which Vice President Al Gore is standing to the right of President Clinton, who is giving his Union speech in Washington DC”

Q2a = ((Union speech Washington DC), s, -,-)

Q2b = (((Vice President Gore) Sright (President Clinton)), o, -, Q2a)

• Spatial Queries

“Find video clips in which a blue bird is flying over a kid’s head”

Q4 = ((blue bird flying) Sabove ((child kid boy girl) head), o, -,-)

• Spatio-Temporal Query

“Find video clips in which a police car with its siren on is chasing a red Porsche and hits it”

Qa = ((police car siren), o, -,-)

Qb = ((red Porsche), o, -,-)

Q = ((Qa Sapproach Qb) Ibefore (Qa Stouch Qb), o, -,-)

Web-based Logical Hypervideo Database System (WLHVDB)

• System Architecture

• Video Wrapper And Lazy Delivery

• Populating The Video Database

• Distributed Query Processing

• Access Control And User Profiling

System Architecture

[Figure: a Server and a Client connected over the Internet.

Components: Video Annotation Engine, Video Parser, Query Processing Server, IR Engine, Various Tools and Scripts, Account Manager, User Profile Manager, Access Control Manager, Server Cache Manager.

Data: Physical Video Data, Logical Video Data, Video Annotations, User Views, Video Indexes, User Profiles, Server Query Cache, Video Hyperlinks.

Client: Query Input, Query Result Presentation, Media Player, Client Query Cache, Client Cache Manager, Data Editor]

Information Retrieval (IR) and Glimpse

• Full Text Scanning

• Signature Files

• Inversion - almost all commercial systems

• Vector Model and Clustering: weighted and relevance feedback

Glimpse (GLobal IMPlicit SEarch)

• Small Index: 2-4%

• Full Text Boolean Queries

• Arbitrary Approximate And Regular Expression Matching

• Efficient (<500MB): 5 seconds for finding 10 occurrences among 4500 files of total size of 69MB

Video Wrapper And Lazy Delivery

• Why Need Them?

– Huge Data Volume vs. Limited Bandwidth

• Why Lazy Delivery?

– “Avoid” Sending Video Data Information

• What is a Video Wrapper?

– Multi-resolution Video Representation

– Adaptive Local Refinement Based On Interest

Current Video Wrapper Implementation

[Figure: wrapper levels labeled RFrames, Clip Posters, and Video Poster]

Populating the VDB

[Figure: workflow for populating the VDB. Labeled elements: Analog Video, Video Capture, Video Parsing and Segmentation, Video Representation Construction, Closed Caption Capture, Object Tracking, DBA, Users, LVs, LVSs and Annotations, Hot Objects, Video Indexes, Video Wrapper]

Query Processing

[Figure: query processing flow between Client and Server. Labeled steps: Video Query; Query Parsing and Syntax Checking (No Error / Show Error Message); Sending IR Sub-queries; Processing IR Sub-queries; Search User’s View; Sending Partial Results; Processing Boolean and Spatio-temporal Operators; Result]

Server and Client-Side Query Caching

[Figure: caching flow for IR sub-queries. Client: look up the Client-side Cache; on a Hit, Get Results; on a Miss, send the sub-query to the Server, then Update Cache with the Results. Server: look up the Server-side Cache; on a Hit, Get Result and Send Results; on a Miss, Query IR Engine, Update Cache, and Send Results]
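
The hit/miss flow for IR sub-queries can be sketched in Python as two chained caches. This is a minimal illustration under assumptions: `SubQueryCache`, `toy_ir_engine`, and the plain dictionary store are hypothetical, and real concerns such as eviction, invalidation, and network transport are left out.

```python
class SubQueryCache:
    """Cache keyed by sub-query text; misses are forwarded to `backend`."""

    def __init__(self, backend):
        self.backend = backend  # callable: sub-query string -> list of results
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, sub_query):
        if sub_query in self.store:
            self.hits += 1
            return self.store[sub_query]
        self.misses += 1
        results = self.backend(sub_query)  # ask the next level
        self.store[sub_query] = results    # update cache
        return results


ir_calls = []


def toy_ir_engine(sub_query):
    """Stand-in for the IR engine; records each call so reuse is visible."""
    ir_calls.append(sub_query)
    return [f"clip-for-{sub_query}"]


# The client cache falls back to the server cache, which falls back to the IR engine.
server_cache = SubQueryCache(toy_ir_engine)
client_cache = SubQueryCache(server_cache.get)

first = client_cache.get("red car")    # misses at both levels, reaches the IR engine
second = client_cache.get("red car")   # served from the client-side cache
```

Caching at the sub-query level rather than the whole-query level means different queries that share a sub-expression can reuse the same cached result.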

• Loop of Query-Browsing-Play

• Inter- And Intra-Video Browsing

• User Adaptive

• Video Wrapper Refinement Process

Video Browsing (con.)

[Figure: browsing flow with labeled elements Video Query, Video Posters, Video Board, RFrames, Annotations, Audio Stream, Video Hyperlinks, and Video Stream]

Access Control and User Profiling

• Different Categories Of Users And Groups

• Different Permissions On Video Data And Metadata

• Users Need To Authenticate Themselves

• User Activities And Local Environment Information Are Recorded

Summary of Major Summary of Major ContributionsContributions

• A Novel Video Data Model A Novel Video Data Model (LHVDM) That Supports(LHVDM) That Supports– Multi-level Video AbstractionMulti-level Video Abstraction

– Video Data IndependenceVideo Data Independence

– Multi-user Data SharingMulti-user Data Sharing

– Dynamic And Incremental View UpdateDynamic And Incremental View Update

– Variable Access GranularitiesVariable Access Granularities

Summary of Major Contributions (con.)

• A Novel Video Data Model (LHVDM) That Represents

– Both Spatial And Temporal Video Characteristics

– Hot Video Objects

– Video Semantics And Semantic Associations

• A Novel Video Data Model (LHVDM) That

– Supports User Adaptive Video Browsing

– Hyperlinks Video Entities For More Efficient Browsing

– Can Be Extended To Other Multimedia Data Such As Audio Data

• A Video Query Language That Allows

– Easy Query Formulation

– Video Semantic Content-based Queries

– Both Spatial And Temporal Constraints

– Hot Object-based Video Queries

– User Selectable Granularity, Space, and Scope

• A Generic Video Database System Architecture That Is

– Modular, Flexible, And Scalable

– Readily Distributed

– Easy To Implement

• The Design And Implementation Of A Web-based Prototype That Uses

– A Novel Video Wrapper And Lazy Evaluation Approach

– Distributed Query Processing And Sub-query Caching Scheme

– Multi-user Data Access Control And View Sharing

– User Profiling

Future Work

• Identify New Applications And Perform More Extensive Tests

• Explore And Integrate Other Forms Of Video Annotations Such As Visual Features

• Extend To Other Multimedia Data Such As Slides, Images, And Audio

Future Work (con.)

• Knowledge-based Video Access

• Automatic Generation of Video Wrappers

• Video Data Security and Access Control

Related Publications

• Survey and Books

– A. K. Elmagarmid and H. Jiang. Multimedia Video (chapter), Encyclopedia of Electrical and Electronics Engineering. John Wiley & Sons, 1998. In press.

– A. K. Elmagarmid, H. Jiang, et al. Video Database Systems: Issues, Products and Applications. Kluwer Academic Publishers, 1997.

– H. Jiang, A. Helal, A. K. Elmagarmid, and A. Joshi. Scene Change Detection Techniques for Video Database Systems. ACM Multimedia Sys., 6:186-195, May 1998.

– H. Jiang and A. K. Elmagarmid. Video Databases: State of the Art, State of the Market and State of Practice. Proc. 2nd Intl. Workshop on Multimedia Info. Sys., pages 87-91, West Point, New York, September 26-28, 1996.

Related Publications (con.)

• Video Analysis And Computer Vision

– H. Jiang and A. K. Elmagarmid. Extract Visual Content Representation in Video Databases. Proc. of Intl. Conf. on Imaging Sci., Sys., and Tech. (CISST'97), Las Vegas, Nevada, June 30 - July 3, 1997.

– H. Jiang and J. Dailey. A Video Database System for Studying Animal Behavior. Proc. SPIE Photonics East'96 - Multimedia Storage and Archiving Sys. Intl. Conf., pages 162-173, Volume SPIE-2916, Boston, MA, November 18-19, 1996.

• Video Data Model, Indexing, and Access

– H. Jiang, D. Montesi, and A. K. Elmagarmid. Integrate Video and Text for Content-based Accesses to Video Databases. J. of Multimedia Sys. and Tools, 1998. Accepted.

– H. Jiang and A. K. Elmagarmid. WVTDB - A Web-based VideoText Database System. Special Issue on Data and Knowl. Management in Multimedia Sys., IEEE Trans. on Data and Knowl. Eng., 1998. Accepted.

• Video Data Model, Indexing, and Access

– H. Jiang and A. K. Elmagarmid. Spatial and Temporal Content-based Queries in Hypervideo Databases. Special Issue on Multimedia Data Management, The Very Large Database J., 1998. Submitted.

– F. Kokkoras, H. Jiang, I. Valhavas, A. K. Elmagarmid, and E. N. Houstis. Smart VideoText: An Intelligent Video Database System. CSD-TR 97-049, Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, USA, 1997.

END OF THE PRESENTATION

THANK YOU VERY MUCH

MPEG-I

• A Bit Stream For Compressed Video And Audio

• Optimized For Data Rate Of 1.5 Mbps

• Non-interlaced

• Typical Compression Ratio 27:1

• Quality: VHS Video
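As a rough sanity check on how the 27:1 ratio relates to the 1.5 Mbps target, the arithmetic can be worked through as below. The frame size (352 x 240), frame rate (30 fps), and 12 bits per pixel (4:2:0 chroma subsampling) are assumptions not stated on the slide; they are a commonly quoted source format for MPEG-1.

```python
# Assumed source format (not given on the slide): SIF-sized frames, 4:2:0 sampling.
WIDTH, HEIGHT = 352, 240        # pixels per frame
FPS = 30                        # frames per second
BITS_PER_PIXEL = 12             # 8 bits luma + 4 bits chroma on average
COMPRESSION_RATIO = 27          # from the slide

raw_bps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL
compressed_bps = raw_bps / COMPRESSION_RATIO

print(f"uncompressed video: {raw_bps / 1e6:.1f} Mbps")
print(f"compressed video:   {compressed_bps / 1e6:.2f} Mbps")
```

Under these assumptions the compressed video comes to roughly 1.13 Mbps, which leaves headroom for audio and system overhead within a 1.5 Mbps stream.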

MPEG-II

• Digital Broadcasting Video (CCIR 601) At A Data Rate Of 4 - 9 Mbps

• Compatible With MPEG-I

• Coding Of Interlaced Video

• User-selectable DCT Precision

• Scalable Extension For Multi-resolution Coding

• Wide Range Of Frame Sizes

MPEG-IV

• Targets Multimedia Applications

• Coding Of Video Objects For Content-based Interactivity

• Improved Temporal Random Access

• More Efficient Coding

• Supports Very Low Video Data Rate (<64 Kbps)

• Robustness For Wireless Applications

MPEG-VII:

Digital Library Initiatives

• Stanford: integrated virtual library with new services and uniform access to networked information collections

• UC Berkeley: 1) computer vision for digital documents; 2) database protocols for client/server information retrieval; 3) data acquisition technologies; 4) content-based browser

Digital Library Initiatives (con.)

• UC Santa Barbara (Alexandria): a distributed system that provides a comprehensive range of library services for collections of spatially indexed and graphical information (digital maps and images)

• UC Berkeley: high quality search and display of Internet information (SGML documents)

Digital Library Initiatives (con.)

• University of Michigan: a diverse collection of earth and space sciences and cooperating software agents

• CMU (Informedia): content-search and retrieval of digital libraries using speech understanding, computer vision, natural language processing