Semantic Multimedia Analysis and Search - Future SOC Symposium 2013

Post on 16-Jan-2015




keynote at FutureSOC Symposium 2013, at HPI, Potsdam, 20-21.06.2013


Symposium on Future Trends in Service-Oriented Computing, HPI Potsdam, 20-21.06.2013

Semantic Multimedia Analysis and Search

Dr. Harald Sack

Hasso-Plattner-Institut for IT-Systems Engineering

University of Potsdam

Potsdam, 21/06/2013

Friday, 21 June 2013

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, LDW 2011, Magdeburg, 30. Sep. 2011

• Searching Multimedia: Web vs. Archive

• How to Open Up Multimedia Data? Automated Multimedia Analysis

• How to Determine the Meaning of (Multimedia) Metadata? Context-Driven Semantic Analysis

• How to Make Use of Semantic Metadata? Exploratory Search and Intelligent Recommendations

Semantic Multimedia Analysis and Search


Lecture Semantic Web, Dr. Harald Sack, Hasso-Plattner-Institut, University of Potsdam


Searching the Web



Google Knowledge Graph

= “search results with semantic-search information gathered from a wide variety of sources”


Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, Workshop ‘Corporate Semantic Web’, XInnovations 2011, Berlin, 19. Sep. 2011

Google Multimedia Search


‣Google Multimedia Search relies on text-based metadata and link context

How does Google find Multimedia?


Search by Media Content


The Ordinary Archive is a Small World...

Neil Armstrong


But wouldn’t it be nice if ...

Neil Armstrong

Sorry, no results found for ‘Neil Armstrong’ ...

... but maybe you are also interested in

- Buzz Aldrin (1 video)
- John Glenn (1 video)
- Yuri Gagarin (2 videos)
- Richard Nixon (3 videos)
- Apollo 11 (1 video)
- NASA (20 videos)
- Moon (14 videos)
- space exploration (34 videos)
- technology (1,205 videos)


How to Search in Multimedia Archives?


vfm Seminar: Metadata Management in Media Companies, 5 September 2012, Bonn, Jörg Waitelonis, Hasso-Plattner-Institut Potsdam

Content-Based Search in Multimedia Archives relies on text-based Metadata

Current Solution: Manual Annotation



(Selected) Automated Media Analysis

[Figure labels: Visual Analysis, Text Recognition, Face Detection, Logo Detection, text / images, Recognition, audio event detection]



Structural Video Analysis

• Decomposition of time-based media into meaningful media fragments of coherent content that can be used as basic elements for indexing and classification
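One common way to obtain such fragments is to cut a video into shots wherever consecutive frames differ strongly. The following is only a minimal sketch of that idea, not the method used in the talk: it assumes frames are given as non-empty flat lists of 8-bit grey values, and the histogram size and the threshold of 0.5 are arbitrary illustrative choices.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[int]  # flattened 8-bit grayscale pixel values (assumed representation)


def gray_histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalised intensity histogram of one (non-empty) frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def shot_boundaries(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Indices i at which a new shot is assumed to start (frame i differs from frame i-1)."""
    boundaries = []
    for i in range(1, len(frames)):
        prev_h = gray_histogram(frames[i - 1])
        cur_h = gray_histogram(frames[i])
        # L1 distance between two normalised histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(prev_h, cur_h))
        if distance > threshold:
            boundaries.append(i)
    return boundaries


def split_into_shots(frames: Sequence[Frame], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Half-open (start, end) frame ranges, one per detected shot."""
    cuts = [0] + shot_boundaries(frames, threshold) + [len(frames)]
    return list(zip(cuts, cuts[1:]))
```

Each returned range can then serve as one unit for indexing; gradual transitions such as fades would need a more robust criterion than a single threshold.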








Video Optical Character Recognition (OCR)

Fig. 1. Workflow of the proposed text detection method. (b) is the vertical edge map of (a). (c) is the vertical dilation map of (b). (d) is the binary map of (c). (e) is the result map of the subsequent connected component analysis. (f) shows the binary map after the adaptive projection profile refinement. (g) is the final detection result.

for text detection of natural scene images. The operator computes for each pixel the width of the most likely stroke containing the pixel. The output of the operator is a stroke-feature map, which has the same size as the input image, while each pixel represents the corresponding stroke width value of the input image.

Text detection is the first task of video OCR. Our approach determines whether a single frame of a video file contains text lines, for which a tight bounding box is returned. In order to manage detected text lines efficiently, we have defined a class “text line object” with the following properties: bounding box location (the top-left corner position) and bounding box size. After the first round of text detection, the refinement and verification procedures ensure the validity of the detection results in order to reduce false alarms.

3.1. Text detector

Before performing the text detection process, a Gaussian smoothing filter is applied to the images that have an entropy value larger than a predefined threshold T_entr. For our purpose, T_entr = 5.25 has proven to work best.
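The entropy-gated smoothing step can be sketched as follows. The excerpt does not say how the entropy is computed or which kernel is used, so this sketch assumes the Shannon entropy of the 8-bit grey-level histogram and a fixed 3x3 Gaussian kernel; only the threshold value 5.25 is taken from the text.

```python
import math
from typing import List

Image = List[List[int]]  # 8-bit grayscale image as rows of pixel values (assumed representation)

T_ENTR = 5.25  # threshold value quoted in the text


def entropy(image: Image) -> float:
    """Shannon entropy (in bits) of the grey-level histogram (assumed entropy definition)."""
    counts = [0] * 256
    n = 0
    for row in image:
        for v in row:
            counts[v] += 1
            n += 1
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def gaussian_smooth(image: Image) -> Image:
    """3x3 Gaussian blur with kernel [1 2 1; 2 4 2; 1 2 1] / 16 and replicated borders."""
    h, w = len(image), len(image[0])
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc // 16
    return out


def preprocess(image: Image, threshold: float = T_ENTR) -> Image:
    """Smooth only images whose entropy exceeds the threshold; return others unchanged."""
    return gaussian_smooth(image) if entropy(image) > threshold else image
```

Gating on entropy means that only visually busy frames pay the cost of smoothing, while flat frames pass through untouched.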

We have developed an edge-based text detector, subsequently referred to as the edge text detector. The advantage of our detector is its computational efficiency compared to machine-learning-based approaches, because no computationally expensive training period is required. However, for visually different video sequences a parameter adaptation has to be performed. The best-suited parameter combinations of our method were learned from test runs on the given test data.

Fig. 2. Workflow of the proposed adaptive text line refinement procedure

The processing workflow for a single frame is depicted in Fig. 1 (a-e). First, a vertical edge map is produced using a Sobel filter [8] (cf. Fig. 1 (b)). Then, the morphological dilation operation is adopted to link the vertical character edges together (cf. Fig. 1 (c)). Let MinW denote the detected minimal text line width. A rectangular kernel of size 1 × MinW is defined for the vertical dilation operator. Subsequently, a binary mask is generated by using Otsu’s thresholding method [9]. Ultimately, we create a binary map after Connected Component
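The first three stages of this workflow (edge map, dilation, Otsu binarisation) can be sketched in plain Python. This is an illustrative reading of the excerpt, not the authors' implementation: it assumes the 1 × MinW kernel is one row high and MinW columns wide, clips edge responses to 8 bits, and leaves out the connected component analysis and the refinement steps.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale image as rows of pixel values (assumed representation)


def vertical_edge_map(img: Image) -> Image:
    """Absolute horizontal Sobel response, which highlights vertical edges (borders stay 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            out[y][x] = min(abs(gx), 255)
    return out


def dilate_rows(img: Image, min_w: int) -> Image:
    """Grey-scale dilation along each row. The window is centred on the pixel,
    so for even min_w it spans min_w + 1 pixels."""
    half = min_w // 2
    w = len(img[0])
    return [[max(row[max(x - half, 0):min(x + half + 1, w)]) for x in range(w)]
            for row in img]


def otsu_threshold(img: Image) -> int:
    """Otsu's method: the threshold that maximises the between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def text_candidate_mask(img: Image, min_w: int = 3) -> Image:
    """Binary mask (1 = text candidate) from edge map, row dilation and Otsu threshold."""
    dilated = dilate_rows(vertical_edge_map(img), min_w)
    t = otsu_threshold(dilated)
    return [[1 if v > t else 0 for v in row] for row in dilated]
```

With a wide enough kernel, neighbouring character strokes merge into one horizontal band per text line, which is what the later connected component step would operate on.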

• Video OCR is much more difficult than traditional print OCR
• fast detection/filtering of text candidates
• verification of text candidates
• script separation from background
• visual quality enhancement
• application of standard OCR software
• spell correction w.r.t. context and temporal



Video Face Detection, Tracking & Clustering

• Face Detection: Detect candidate image regions in a video frame that depict a human face

• Face Tracking: Track a detected face in video over consecutive frames within shot boundaries

• Face Clustering: Group faces detected and tracked in videos into visually similar sets within a single video

• Face Recognition/Identification: Reliable identification of detected faces

[Figure labels: person, frontal face: 90%; not a person; person, profile face: 70%]
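The tracking step can be illustrated with a simple overlap-based linker. This is a hedged sketch rather than the tracker used in the talk: it assumes per-frame face boxes are already available for one shot, links them greedily by intersection over union (IoU), and uses an arbitrary minimum IoU of 0.5.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected face (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def track_faces(detections: List[List[Box]],
                min_iou: float = 0.5) -> List[List[Tuple[int, Box]]]:
    """Link per-frame detections of one shot into tracks of (frame_index, box)."""
    tracks: List[List[Tuple[int, Box]]] = []
    active: List[int] = []  # indices of tracks that were extended in the previous frame
    for frame_idx, boxes in enumerate(detections):
        next_active = []
        unmatched = list(boxes)
        for t in active:
            last_box = tracks[t][-1][1]
            best = max(unmatched, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                tracks[t].append((frame_idx, best))
                unmatched.remove(best)
                next_active.append(t)
        # Every detection that continues no existing track starts a new one.
        for box in unmatched:
            tracks.append([(frame_idx, box)])
            next_active.append(len(tracks) - 1)
        active = next_active
    return tracks
```

Calling `track_faces` once per shot keeps tracks from crossing shot boundaries; the resulting tracks are the natural input for the clustering step.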


Visual Concept Detection

• Adaptation of the traditional ‘Bag of Words’ approach from text retrieval

• Image is expressed as a vector (histogram) of dictionary codeword frequencies

• Concept assignment via machine learning (here: Support Vector Machines)
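The histogram representation can be sketched in a few lines. The sketch assumes that local feature descriptors and a codebook (the visual dictionary) already exist as plain numeric vectors; how they are extracted and learned is outside its scope, and the classifier step is omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])))


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Normalised histogram of codeword frequencies for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

The resulting fixed-length vector is what a classifier such as a Support Vector Machine would be trained on, one model per visual concept.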


Annotation of Audiovisual Data

Metadata Extraction

Metadata (e.g. MPEG-7)

...
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords> 480 150 620 480 </Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
  ...
</SpatialDecomposition>
...

• Multimedia data with spatiotemporal Annotations

Neil Armstrong
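A spatial annotation like the one on the slide can be read with Python's standard XML parser. The fragment below is a simplified, namespace-free stand-in for the excerpt shown above (the elided parts are left out); real MPEG-7 documents use XML namespaces and a much richer schema, so this is only a sketch of the idea.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the slide's MPEG-7 excerpt (assumption: no namespaces, no elided parts).
FRAGMENT = """
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords> 480 150 620 480 </Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
</SpatialDecomposition>
"""


def read_annotation(xml_text: str) -> dict:
    """Extract the keywords and the raw coordinate list of the first <Coords> element."""
    root = ET.fromstring(xml_text.strip())
    keywords = [k.text.strip() for k in root.iter("Keyword")]
    coords = [int(v) for v in root.find(".//Coords").text.split()]
    return {"keywords": keywords, "coords": coords}
```

The keyword and the coordinates together say what is depicted and where in the frame, which is the spatial half of a spatiotemporal annotation.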


• Authoritative Metadata
  • structured data
  • semi-structured data
  • natural language text
• Non-authoritative Metadata
  • (free) user tags and comments
  • restricted vocabularies
• (Media) Analysis Metadata
  • low level features
  • high level features
• etc.

How to Determine the Meaning of Metadata?





location dependency



level of abstraction


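‘Neil Armstrong’ is more than just a character string

[Diagram: ‘Neil Armstrong’ embedded in a graph of related concepts. Recoverable node labels: Neil Armstrong, Science Occupation, Cosmonaut, Yuri Gagarin. Recoverable edge labels: is a, has an, same as, is NOT a.]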


Where does the knowledge come from...?


Web of Data

[Diagram: Neil Armstrong, Astronaut, Person, and Science Occupation as nodes, connected by ‘is a’ and ‘has a’ relations.]
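Statements of this kind are usually stored as subject-predicate-object triples. The sketch below is a toy illustration only: the exact edges of the slide's diagram cannot be recovered from the text, so the three triples are assumed examples, and a real system would use RDF with URIs instead of plain strings.

```python
# Assumed example triples; the slide's actual edges are not recoverable from the text.
TRIPLES = {
    ("Neil Armstrong", "is a", "Astronaut"),
    ("Astronaut", "is a", "Person"),
    ("Astronaut", "is a", "Science Occupation"),
}


def types_of(entity, triples):
    """All classes reachable from `entity` by following 'is a' edges transitively."""
    found, stack = set(), [entity]
    while stack:
        node = stack.pop()
        for s, p, o in triples:
            if s == node and p == "is a" and o not in found:
                found.add(o)
                stack.append(o)
    return found
```

Following the edges lets a search system answer a query for a general class with a specific entity, which a plain string match on ‘Neil Armstrong’ cannot do.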


Web of Data = Linked Open Data

But what if there is no trivial unique identification?

Armstrong (user tag)


Semantic Web Technologies, Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam



Web of Data = Linked Open Data

Understanding requires Context





Semantic Analysis

Semantics is determined by Context

Context Item

N.Steinmetz, H.Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013

Text: “Armstrong landed the Eagle on the Moon.”

SEMEX Multimedia Context Model

Context Dimensions










Contextual Description


Level of Structure





Candidate entities for “Armstrong”: George Armstrong Custer; Neil Armstrong; The Armstrong Twins; Armstrong, Florida; Armstrong, Ontario; Armstrong Automobile; Joe Armstrong; Armstrong County, Texas; Armstrong Gun; Craig Armstrong; Armstrong (Moon Crater); Louis Armstrong; Armstrong Tunnel; Louis Armstrong International Airport; Armstrong’s Theorem; Sir Thomas Armstrong; Ian Armstrong; Armstrong (British Columbia); Karen Armstrong; Curtis Armstrong; Gillian Armstrong; Hilary Armstrong; William L. Armstrong

Candidate entities for “Eagle”: Eagle (Bird); Eagle (heraldry); The Eagle (2011 film); Eagle (song); John H. Eagle; Eagle (typeface); Eagle Falls (Washington); Eagle (Moon Crater); Eagle (comic); Eagle (lunar module); Eagle TV; The Eagle (Pub); War Eagle; The Eagle (newspaper); Eagle (racehorse); Angela Eagle; Linda Eagle; James Philip Eagle

Candidate entities for “Moon”: Man on the Moon (film); Moon (song); Moon Son-Ri; C Moon; The Moon (Tarot card); Edgar Moon; Moon OS; Moon (Band); Moon 44; Man on the Moon (soundtrack); William Moon; Lottie Moon; Mr. Moon (song); Man on the Moon (musical); Darvin Moon; Moon 83; Francis Moon; Gary Moon; Robert Charles Moon; Black Moon; Allan Moon; Ban Ki-moon; Fly me to the Moon (song)

Candidate set sizes shown on the slide: 95 entities, 448 entities, 156 entities

Semantic Analysis: Named Entity Mapping

“Armstrong landed the Eagle on the Moon.”

Consider all entities within the same context


Select matching entities from all possible candidate entities:
• Popularity-based strategies
• Linguistic strategies
• Statistical strategies
• Semantic-based strategies

General Approach
1. Make an assumption
2. Check whether the strategies support or contradict your assumption
3. Make a decision according to logical and probabilistic rules/constraints
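The three steps of the general approach can be sketched as a weighted vote over strategies. Everything concrete here is hypothetical: the popularity figures, the description words, the two strategy functions and the weights are invented for illustration and are not taken from the cited work.

```python
from typing import Callable, Dict, List, Tuple

# A strategy maps (candidate, surface form, context words) to a support score in [0, 1].
Strategy = Callable[[str, str, List[str]], float]

# Hypothetical popularity figures, invented for illustration only.
POPULARITY = {"Neil Armstrong": 0.6, "Louis Armstrong": 0.9, "Armstrong Gun": 0.1}

# Hypothetical description words per entity, standing in for a reference text corpus.
DESCRIPTIONS = {
    "Neil Armstrong": {"astronaut", "moon", "apollo", "eagle"},
    "Louis Armstrong": {"jazz", "trumpet", "singer"},
    "Armstrong Gun": {"artillery", "cannon"},
}


def popularity(candidate: str, surface: str, context: List[str]) -> float:
    """Popularity-based strategy: ignores the context entirely."""
    return POPULARITY.get(candidate, 0.0)


def context_overlap(candidate: str, surface: str, context: List[str]) -> float:
    """Statistical strategy: share of context words found in the entity description."""
    words = DESCRIPTIONS.get(candidate, set())
    return len(words & set(context)) / len(context) if context else 0.0


def select_entity(surface: str, candidates: List[str], context: List[str],
                  strategies: List[Tuple[float, Strategy]]) -> Tuple[str, Dict[str, float]]:
    """Return the best-supported candidate and the combined score of every candidate."""
    scores = {}
    for cand in candidates:  # step 1: assume the surface form refers to `cand`
        # step 2: collect the weighted support of all strategies for this assumption
        scores[cand] = sum(w * f(cand, surface, context) for w, f in strategies)
    return max(scores, key=scores.get), scores  # step 3: decide for the best-supported one
```

With popularity alone the tag ‘Armstrong’ resolves to the most popular candidate; adding the context strategy lets the words around the mention override that default.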

Semantic Analysis: Named Entity Recognition

N. Ludwig, H. Sack: “Named Entity Recognition for User-Generated Tags”, TIR 2011

• reference text corpus (Wikipedia)

• link graph (Wikipedia)

• semantic graph


Entity Selection Process



[Figure: the candidate entities for “Armstrong”, “Eagle”, and “Moon” from the previous slide, repeated]

Neil Armstrong

Eagle (lunar module)


Louis Armstrong

Fly me to the Moon (song)

Semantic Analysis: Named Entity Recognition

“Armstrong landed the Eagle on the Moon.”

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013

Entity Selection Process: (Semantic) Graph Analysis
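Graph analysis considers all terms of the sentence together and prefers the combination of candidates that is most densely connected. The sketch below illustrates that idea with a tiny hand-made link set; the links, the plain ‘Moon’ entity and the exhaustive search are assumptions for illustration, not the algorithm of the cited paper.

```python
from itertools import product

# Hypothetical undirected links between entities (e.g. taken from a link graph).
LINKS = {
    frozenset({"Neil Armstrong", "Eagle (lunar module)"}),
    frozenset({"Neil Armstrong", "Moon"}),
    frozenset({"Eagle (lunar module)", "Moon"}),
    frozenset({"Louis Armstrong", "Fly me to the Moon (song)"}),
}


def coherence(assignment):
    """Number of linked pairs among the chosen entities."""
    chosen = list(assignment)
    return sum(frozenset({a, b}) in LINKS
               for i, a in enumerate(chosen) for b in chosen[i + 1:])


def select_jointly(candidates_per_term):
    """Pick one candidate per term so that the chosen set is most densely linked.

    Tries every combination, which is only feasible for small candidate sets.
    """
    return max(product(*candidates_per_term.values()), key=coherence)
```

For the example sentence this picks the astronaut, the lunar module and the Moon together, because no other combination of candidates is linked as tightly.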



Semantically Annotated Multimedia

Video Analysis / Metadata Extraction

e.g., person xy, location yz, event abc

e.g., bibliographical data, geographical data, encyclopedic data, ...

Entity Recognition / Mapping

N. Ludwig, H. Sack: Named Entity Recognition for User-Generated Tags. In Proc. of the 8th Int. Workshop on Text-based Information Retrieval, IEEE CS Press, 2011



Entity Based Search

• linguistic ambiguities of traditional keyword based search can be avoided

• enables high precision and high recall retrieval

• Query string refinement / extension
• entity auto-suggestion
• interpretation of natural language queries

J. Osterhoff, J. Waitelonis, H. Sack, Widen the Peepholes! Entity-Based Auto-Suggestion as a rich and yet immediate Starting Point for Exploratory Search, IVDW 2012
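Entity auto-suggestion can be sketched as a prefix lookup over entity labels that returns entities rather than strings. The index, the `ex:` identifiers and the popularity scores below are hypothetical sample data, and ranking by popularity is one possible choice among many.

```python
from typing import List, Tuple

# Hypothetical entity index: (label, entity identifier, popularity score).
ENTITIES = [
    ("Neil Armstrong", "ex:Neil_Armstrong", 0.6),
    ("Louis Armstrong", "ex:Louis_Armstrong", 0.9),
    ("Armstrong Tunnel", "ex:Armstrong_Tunnel", 0.1),
    ("Apollo 11", "ex:Apollo_11", 0.7),
]


def suggest(prefix: str, limit: int = 5) -> List[Tuple[str, str]]:
    """Entities with a label token that starts with the typed prefix, most popular first."""
    p = prefix.strip().lower()
    if not p:
        return []
    hits = [(score, label, ident) for label, ident, score in ENTITIES
            if any(tok.startswith(p) for tok in label.lower().split())]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [(label, ident) for _, label, ident in hits[:limit]]
```

Because each suggestion carries an entity identifier, the query that is finally sent refers to one specific entity and avoids the ambiguity of the typed keyword.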



search facets

C. Hentschel, H. Sack, et al., Open up cultural heritage in video archives with mediaglobe, I2CS 2012



Explorative Search












dbpedia-owl:mission

J. Waitelonis, H. Sack: Towards exploratory video search using linked data, MTAP Volume 59, Number 2 (2012), 645-672
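Exploratory navigation along a property such as dbpedia-owl:mission can be sketched as looking for entities that share a property value. The triples below are a small hand-written sample in the style of Linked Data, assumed for illustration; a real system would query a Linked Data source and rank the results.

```python
from collections import defaultdict

# Hand-written sample triples (assumed data, not a query result).
TRIPLES = [
    ("Neil Armstrong", "dbpedia-owl:mission", "Apollo 11"),
    ("Buzz Aldrin", "dbpedia-owl:mission", "Apollo 11"),
    ("Neil Armstrong", "dbpedia-owl:mission", "Gemini 8"),
    ("David Scott", "dbpedia-owl:mission", "Gemini 8"),
    ("Louis Armstrong", "dbpedia-owl:genre", "Jazz"),
]


def related_entities(entity, triples):
    """Entities that share a (property, value) pair with `entity`, with the shared pairs."""
    by_pair = defaultdict(set)
    for s, p, o in triples:
        by_pair[(p, o)].add(s)
    related = defaultdict(list)
    for (p, o), subjects in by_pair.items():
        if entity in subjects:
            for other in subjects - {entity}:
                related[other].append((p, o))
    return dict(related)
```

Keeping the shared property with each suggestion lets the interface explain why a related entity is offered, for example that two astronauts flew the same mission.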




Exploratory Search and Serendipity

• Find something that you were not looking for on purpose ...






Explorative Search & Intelligent Recommendation with yovisto


Contact:
Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam

Homepage:
Twitter: lysander07 / biblionomicon / yovisto
Slides can be found at

Thank you very much

for your attention!
