21 Multimedia, Broadcasting, and eCulture

Lyndon Nixon¹, Stamatia Dasiopoulou², Jean-Pierre Evain³, Eero Hyvönen⁴, Ioannis Kompatsiaris², Raphaël Troncy⁵
¹Semantic Technology Institute (STI) International, Vienna, Austria
²Centre for Research and Technology Hellas (CERTH), Thermi-Thessaloniki, Greece
³European Broadcasting Union (EBU), Grand-Saconnex, Switzerland
⁴Aalto University and University of Helsinki, Aalto, Finland
⁵EURECOM, Sophia Antipolis, France

In: John Domingue, Dieter Fensel & James A. Hendler (eds.), Handbook of Semantic Web Technologies, DOI 10.1007/978-3-540-92913-0_21, © Springer-Verlag Berlin Heidelberg 2011

21.1 Introduction
21.2 Scientific and Technical Overview
21.2.1 Multimedia Semantics: Vocabularies and Tools
21.2.1.1 Multimedia Vocabularies on the Semantic Web
21.2.2 Semantic Web–Based Multimedia Annotation Tools
21.2.3 Semantic Multimedia Analysis
21.2.4 Semantics in Broadcasting
21.2.4.1 Metadata in Broadcasting from Its Origin
21.2.4.2 Metadata Standardization in Broadcasting
21.2.4.3 Using Ontologies: Metadata + Semantic
21.2.4.4 A Semantic Representation of TV-Anytime in a Nutshell
21.2.4.5 A Semantic Representation of Thesauri
21.2.4.6 The Holy Grail: Agreeing on a Class Model
21.2.4.7 Agreeing on Properties
21.3 Example Applications
21.3.1 Semantic Television
21.3.1.1 User Activity Capture
21.3.1.2 Enriched EPG Data
21.3.1.3 Alignment Between Vocabularies
21.3.1.4 Personalized TV Program Recommendation
21.3.2 Semantics in Cultural Heritage
21.3.2.1 Ontological Dimensions
21.3.2.2 Challenges of Content Creation
21.3.2.3 Syntactic and Semantic Interoperability
21.3.2.4 Semantic eCulture Systems
21.3.2.5 Semantic Search
21.3.2.6 Semantic Browsing and Recommending
21.3.2.7 Visualization
21.3.2.8 Cultural Heritage as Web Services
21.4 Related Resources Including Key Papers
21.4.1 Multimedia Ontologies, Annotation, and Analysis
21.4.2 Broadcaster Artifacts Online
21.4.2.1 Vocabularies and Ontologies
21.4.2.2 Metadata Schemas
21.4.2.3 Semantic Television
21.4.3 Cultural Heritage Artifacts Online
21.4.3.1 Vocabularies and Ontologies
21.4.3.2 Metadata Schemas
21.4.3.3 Semantic eCulture Systems Online
21.5 Future Issues
21.6 Cross-References
Abstract: This chapter turns to the application of semantic technologies to areas
where text is not dominant, but rather audiovisual content in the form of images,
3D objects, audio, and video/television. Non-textual digital content raises new
challenges for semantic technology in terms of capturing the meaning of that content
and expressing it in the form of semantic annotation. Where such annotations are
available in combination with expressive ontologies describing the target domain, such
as television and cultural heritage, new and exciting possibilities arise for multimedia
applications.
21.1 Introduction
Ever since computers became capable of processing data other than text (capturing, storing, and processing images, 3D models, audio, and video), the questions of how to describe these data so that they can be found again, and how to process them so that they can be reused in new contexts, have been studied in the field of multimedia systems. The subject of this chapter is the work emerging at the intersection of multimedia systems and semantic technology, leading to new insights in multimedia analysis
and annotation, and by extension new applications in areas like broadcasting (television)
and cultural heritage.
General cross-media queries are textual in nature, as text is considered the easiest medium for a computer system to handle. So that queries can be matched to media, the media objects are manually annotated with text; established text-matching algorithms then become applicable to multimedia retrieval. This additional annotation of data is often referred to as "metadata," meaning "data about data." In annotation-based systems, how the user forms the query strongly determines the success of retrieval, both because of the ambiguity of natural language and because the user may be unaware of how the media have been annotated. Such systems are also unaware of the broader meaning of the terms used in their metadata vocabulary, for example, that the keyword "Ford Orion" denotes a specific kind of "car," which is a "vehicle." Hence, retrieval is rather coarse: only media with the exact annotation searched for are returned, and similar media are missed. To overcome this, text-based approaches
such as Latent Semantic Indexing [1] analyze natural language and associate related
words. This type of approach is still very dominant on the Web, for example, Google
Image Search (possibly the most used image retrieval system on the Web at the time of
writing) associates images with the text closest to them on the HTML page. In all these
cases, the metadata are determinable only as a result of there already being natural
language text associated with the media.
The set of semantic technologies addressed previously offers a new solution to the
problems of multimedia retrieval and processing. Multimedia annotations can become
richer than just simple metadata with keywords, where the use of ontologies enables the
annotator to link annotation values to knowledge about the wider domain, whether that
domain is that of the media object’s representation (e.g., a picture of a Ford Orion linked
into an ontology about cars) or of the media object itself (e.g., metadata on an art painting
is described in terms of an ontology about art paintings, which can capture domain
knowledge about paintings such as the materials used, style employed, etc.). Hence, the
choice of an appropriate semantic schema to annotate multimedia content is important
with respect to its future (re)use and most multimedia schemas in use today are not
immediately usable together with ontologies and reasoners.
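To make the contrast with plain keyword matching concrete, the following minimal sketch uses the Python rdflib library; the ex: namespace, the tiny class hierarchy, and the instance data are hypothetical illustrations, not part of any ontology discussed in this chapter.

    # Minimal sketch: ontology-aware image retrieval with rdflib.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex:   <http://example.org/cars#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    ex:FordOrion rdfs:subClassOf ex:Car .
    ex:Car       rdfs:subClassOf ex:Vehicle .

    ex:myOrion a ex:FordOrion .        # a particular car
    ex:photo1 a foaf:Image ;           # an image annotated with it
        foaf:depicts ex:myOrion .
    """, format="turtle")

    # A flat keyword index annotated only with "FordOrion" misses a query
    # for "vehicle"; the SPARQL 1.1 property path below walks the subclass
    # hierarchy and retrieves the image anyway.
    q = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/cars#>
    SELECT ?img WHERE {
        ?img foaf:depicts ?thing .
        ?thing a/rdfs:subClassOf* ex:Vehicle .
    }
    """
    for row in g.query(q):
        print(row.img)   # -> http://example.org/cars#photo1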
> Section 21.2.1 begins the exploration of semantic multimedia with the current
status of multimedia ontologies for annotation and the work toward a shared Media
Ontology. This is complemented by an overview of current tools for multimedia anno-
tation in > Sect. 21.2.2.
Multimedia content selection is a very different and difficult problem in comparison
with textual retrieval where the query is usually also textual and is realized by string
matching in the content store, aided by devices such as stemming and synonyms. The key
problems in the case of multimedia retrieval are that the form of the query does not generally match the form of the media being queried, and that even when query and media share the same form (e.g., a user whistling to search an audio database), matching techniques are more complex than with text. In media industries, however, it is typical to search image data on the basis of an existing image, or audio on the basis of a note or sample. Here, MIR (multimedia information retrieval) research to improve so-called query by example focuses on low-level feature extraction and on developing classifiers that map these low-level features to a high-level concept. However, such low-level matching requires the query to be in the same form as the stored media and, conversely, the stored media to be all of a single form. Hence, mixed media stores are excluded from this approach, and queries are often not intuitive to the general user (e.g., much depends on the user's skill at drawing or whistling). The use of such classification techniques to support multimedia annotation not only reduces human annotation effort but also provides the means for cross-media search, or even mixed-form queries (query by example with identification of the concepts sought, to better rank or filter results). > Section 21.2.3
introduces semantic multimedia analysis techniques to better train the classifiers and
extract concepts from low-level features.
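As an illustration of this classifier-based approach (a generic sketch, not any specific system from the literature), the following example trains a support vector machine to map toy color-histogram features to a high-level concept; real systems use far richer features and much larger training sets.

    # Illustrative sketch: low-level color-histogram features are mapped
    # to a high-level concept by a trained classifier. Feature values and
    # labels are toy data, not from any real collection.
    import numpy as np
    from sklearn.svm import SVC

    # 4-bin color histograms (e.g., fractions of blue/green/sand/other pixels)
    X_train = np.array([
        [0.70, 0.05, 0.20, 0.05],   # mostly blue + sand -> "beach"
        [0.60, 0.10, 0.25, 0.05],   # -> "beach"
        [0.10, 0.75, 0.05, 0.10],   # mostly green -> "forest"
        [0.05, 0.80, 0.05, 0.10],   # -> "forest"
    ])
    y_train = ["beach", "beach", "forest", "forest"]

    clf = SVC(kernel="linear").fit(X_train, y_train)

    # An unseen image's histogram is classified to a semantic concept,
    # which can then serve as an annotation for cross-media search.
    query = np.array([[0.65, 0.08, 0.22, 0.05]])
    print(clf.predict(query))       # -> ['beach']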
The broadcasting industry relies on schemas and standards for its metadata, and a look
into developments toward a semantic schema standard for broadcasters in the future is
provided in > Sect. 21.2.4.
In turn, semantic data about multimedia objects allow them to be processed and
manipulated in similar ways to other instance data, for example, SPARQL-based retrieval
of matching objects, data mediation to ensure interoperability of schemas across
systems, or transformations to effect adaptation of the described media object to new
devices or contexts. In the rest of this chapter, in > Sect. 21.3, applications of semantic
technologies applied and adapted to the multimedia domain are presented, with examples
from the broadcasting (> Sect. 21.3.1) and cultural heritage (> Sect. 21.3.2) sectors,
respectively.
After providing some key papers on the current work in semantic multimedia
(> Sect. 21.4), > Sect. 21.5 will turn to the future trends in this area, considering
particularly how future TV viewers and explorers of 3D virtual worlds may benefit from
today’s research and enjoy the use of semantic technologies without being aware of it.
21.2 Scientific and Technical Overview
Before turning to applications of semantic multimedia and its future in society and
industry, it is necessary to introduce the current state of the art of the building blocks of
semantic multimedia: first, the vocabularies that are formalized as ontologies and used to semantically describe multimedia content; then the means of creating those annotations, both through manual editing in tools and through automated generation using multimedia analysis techniques; and finally, how the state of the art in the broadcasting industry is moving toward the use of semantic technology.
21.2.1 Multimedia Semantics: Vocabularies and Tools
The availability of interoperable semantic metadata is crucial for effectively handling the growing amount of multimedia assets encountered in the plethora of applications addressing both personal and professional multimedia data usage. The multimedia community is increasingly adopting Semantic Web technologies in order to enable large-scale interoperability between media descriptions and to benefit from the advantages that explicit semantics brings to the reuse, sharing, and processing of metadata.
Multimedia annotations present several challenges. One of them is to enable users to
describe the content of some assets with respect to specific domain ontologies, but
contrary to the annotation of textual resources, multimedia content does not contain
canonical units (similar to words) that would have a predefined meaning. In the case of
media annotation, particular requirements apply as a result of the intrinsically multidis-
ciplinary nature of multimedia content. Among the most fundamental of these is the
ability to localize and annotate specific subparts within a given media asset, such as
regions in a still image or moving objects in video sequences. The modeling of the
structural and decomposition knowledge involved in the localization of individual
media segments varies across vocabularies and has different levels of support among
the existing annotation tools. Further differences arise from the supported types of metadata, the granularity and expressivity of the annotations, the intended context of usage, and so on, leaving a rather unclear picture regarding the sharing and reuse of the generated multimedia annotations.
21.2.1.1 Multimedia Vocabularies on the Semantic Web
There has been a proliferation of metadata formats to express information about media
objects. For example, pictures taken by a camera come with EXIF metadata related to the
image data structure (height, width, orientation), the capturing information (focal length,
exposure time, flash), and the image data characteristics (transfer function, color space
transformation). These technical metadata are generally complemented by other standards aimed at describing the subject matter. DIG35 is a specification of the International
Imaging Association (I3A). It defines, within an XML Schema, metadata related to image
parameters, creation information, content description (who, what, when, and where),
history, and intellectual property rights. XMP provides a native RDF data model and
predefined sets of metadata property definitions such as Dublin Core, basic rights, and
media management schemas for describing still images. IPTC has itself integrated XMP in
its Image Metadata specifications.
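As a hedged sketch of how such embedded technical metadata can be lifted into RDF, the following example reads a few EXIF tags with the Pillow library and prints them as Turtle; the ex: namespace and property names are made up for illustration, and a real application would map to a standard vocabulary such as the Ontology for Media Resources discussed below.

    # Sketch: lifting a photo's embedded EXIF metadata into RDF.
    # Assumes a local file photo.jpg; ex: properties are hypothetical.
    from PIL import Image, ExifTags

    img = Image.open("photo.jpg")
    exif = img.getexif()

    print("@prefix ex: <http://example.org/photo#> .")
    print("<photo.jpg>")
    for tag_id, value in exif.items():
        name = ExifTags.TAGS.get(tag_id, str(tag_id))
        if name in ("ImageWidth", "ImageLength", "Orientation", "Model"):
            print(f'    ex:{name} "{value}" ;')   # one Turtle line per tag
    print("    a ex:Photo .")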
Video can be decomposed and described using MPEG-7, the Multimedia Content
Description ISO Standard. This language provides a large and comprehensive set of
descriptors including multimedia decomposition descriptors, management metadata
properties, audio and visual low-level features, and more abstract semantic concepts.
From the broadcast world, the European Broadcasting Union (EBU) has actively contributed to the video extension of the new version of IPTC NewsML-G2, based on IPTC's NAR architecture for describing videos, providing extensions that make it possible to associate metadata with arbitrary parts of videos and a vocabulary for rights management. The EBU has also developed the EBUCore, P-Meta, and TV-Anytime standards for production, archives, and electronic program guides (EPGs). Finally, video-sharing platforms generally provide their own lightweight metadata schemas and APIs, such as Yahoo!'s Media RSS or Google Video sitemaps.
Many of these formats are further described and discussed in [2]. On the one hand, one observes an environment of numerous languages and formats, often XML-based, which leads to interoperability problems and precludes linking to other vocabularies and
existing Web knowledge resources. On the other hand, there is a need for using and
combining some of these metadata formats on the Web and there has been research work
for enabling interoperability using Semantic Web technologies. The following first
describes the various attempts to bring the most famous standard, MPEG-7, into the
Semantic Web. Then an ontology is presented for media resources that aims to be a future
W3C recommendation.
Comparing Four Different MPEG-7 Ontologies
MPEG-7, formally named Multimedia Content Description Interface [3], is an ISO/IEC
standard developed by the Moving Picture Experts Group (MPEG) for the structural and
semantic description of multimedia content. MPEG-7 standardizes tools or ways to define
multimedia Descriptors (Ds), Description Schemes (DSs), and the relationships between
them. The descriptors correspond either to the data features themselves, generally low-
level features such as visual (e.g., texture, camera motion) and audio (e.g., spectrum,
harmony), or semantic objects (e.g., places, actors, events, objects). Ideally, most low-level
descriptors would be extracted automatically, whereas human annotation would be
required for producing high-level descriptors. The description schemes are used for
grouping the descriptors into more abstract description entities. These tools as well as
their relationships are represented using the Description Definition Language (DDL). After
a requirement specification phase, the W3C XML Schema recommendation has been
adopted as the most appropriate syntax for the MPEG-7 DDL.
The flexibility of MPEG-7 is therefore based on allowing descriptions to be associated
with arbitrary multimedia segments, at any level of granularity, using different levels of
abstraction. The downside of the breadth targeted by MPEG-7 is its complexity and its
ambiguity. Indeed, the MPEG-7 XML Schemas define 1,182 elements, 417 attributes, and
377 complex types, which make the standard difficult to manage. Moreover, the use of
XML Schema implies that a great part of the semantics remains implicit. For example,
very different syntactic variations may be used in multimedia descriptions with the same
intended semantics, while remaining valid MPEG-7 descriptions. Given that the standard
does not provide a formal semantics for these descriptions, this syntax variability causes
serious interoperability issues for multimedia processing and exchange [4–6]. The profiles
introduced by MPEG-7 and their possible formalization [7] concern, by definition, only
a subset of the whole standard. For alleviating the lack of formal semantics in MPEG-7,
four multimedia ontologies represented in OWL and covering the whole standard have
been proposed (> Table 21.1) [8–10]; their proposers have jointly compared and discussed the four modeling approaches [11]. Below, these ontologies are briefly described, and then their commonalities and differences are outlined using three
criteria: (1) the way the multimedia ontology is linked with domain semantics; (2) the
MPEG-7 coverage of the multimedia ontology; and (3) the scalability and modeling
rationale of the conceptualization.
In 2001, Hunter proposed an initial manual translation of MPEG-7 into RDFS
(and then into DAML+OIL) and provided a rationale for its use within the Semantic
Web [9]. This multimedia ontology was translated into OWL, and extended and harmo-
nized using the ABC upper ontology [12] for applications in the digital libraries [13] and
eResearch fields [14]. The current version is an OWL Full ontology containing classes
defining the media types (Audio, AudioVisual, Image, Multimedia, and Video) and
. Table 21.1
Summary of the different MPEG-7-based multimedia ontologies

                Hunter [9]            DS-MIRF [10]          Rhizomik [8]          COMM
Foundations     ABC                   None                  None                  DOLCE
Complexity      OWL Full              OWL DL                OWL DL                OWL DL
URL             metadata.net/mpeg7/   www.music.tuc.gr/     rhizomik.net/         multimedia.
                                      ontologies/           ontologies/           semanticweb.org/
                                      MPEG703.zip           mpeg7ontos            COMM/
Coverage        MDS+Visual            MDS+CS                All                   MDS+Visual
Applications    Digital libraries,    Digital libraries,    Digital rights        Multimedia analysis
                eResearch             eLearning             management,           and annotations
                                                            e-business
the decompositions from the MPEG-7 Multimedia Description Schemes (MDS) part. The
descriptors for recording information about the production and creation, usage, struc-
ture, and the media features are also defined. The ontology can be viewed in Protege
(http://protege.stanford.edu/) and has been validated using the WonderWeb OWL
Validator (http://www.mygrid.org.uk/OWL/Validator). This ontology has usually been
applied to describe the decomposition of images and their visual descriptors for use in
larger semantic frameworks. Harmonizing through an upper ontology, such as ABC,
enables queries for abstract concepts such as subclasses of events or agents to return
media objects or segments of media objects. While the ontology has most often been
applied in conjunction with the ABC upper model, it is independent of that ontology and
can also be harmonized with other upper ontologies such as SUMO [15] or DOLCE [16].
In 2004, Tsinaraki et al. proposed the DS-MIRF ontology that fully captures in OWL
DL the semantics of the MPEG-7 MDS and the Classification Schemes. The ontology can
be visualized with GraphOnto or Protege and has been validated and classified with the
WonderWeb OWL Validator. The ontology has been integrated with OWL domain
ontologies for soccer and Formula 1 in order to demonstrate how domain knowledge
can be systematically integrated in the general-purpose constructs of MPEG-7. This
ontological infrastructure has been utilized in several applications, including audiovisual
digital libraries and eLearning. The DS-MIRF ontology has been conceptualized manually,
according to the methodology outlined in [10]. The XML Schema simple datatypes
defined in MPEG-7 are stored in a separate XML Schema to be imported in the DS-
MIRF ontology. The names of the XML elements are generally kept in the rdf:IDs of the
corresponding OWL entities, except when two different XML Schema constructs have the
same names. The mapping between the original names of the MPEG-7 descriptors and
the rdf:IDs of the corresponding OWL entities is represented in an OWL DL mapping
ontology. Therefore, this ontology will represent, for example, that the Name element of
the MPEG-7 type TermUseType is represented by the TermName object property, while
the Name element of the MPEG-7 type PlaceType is represented by the Name object
property in the DS-MIRF ontology. The mapping ontology also captures the semantics of
the XML Schemas that cannot be mapped to OWL constructs such as the sequence
element order or the default values of the attributes. Hence, it is possible to return to an
original MPEG-7 description from the RDF metadata using this mapping ontology. This
process has been partially implemented in GraphOnto [17], for the OWL entities that
represent the SemanticBaseType and its descendants. The generalization of this
approach has led to the development of a transformation model for capturing the
semantics of any XML Schema in an OWL DL ontology [18]. The original XML Schema
is converted into a main OWL DL ontology, while an OWL DL mapping ontology keeps
track of the constructs mapped in order to allow circular conversions.
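A minimal sketch of this mapping-ontology idea follows: because two MPEG-7 elements can share a name (the Name example above), the mapping is keyed by the (type, element) pair, so that OWL property names stay unique and the conversion can be reversed. Only the two entries from the text are real; the code structure itself is illustrative.

    # Illustration of the DS-MIRF-style mapping idea: keyed by
    # (MPEG-7 type, element) so that RDF metadata can be converted
    # back to the original MPEG-7 description without ambiguity.
    MPEG7_TO_OWL = {
        ("TermUseType", "Name"): "TermName",
        ("PlaceType", "Name"): "Name",
    }
    # OWL property names are unique, so the inverse mapping is well defined.
    OWL_TO_MPEG7 = {owl: xml for xml, owl in MPEG7_TO_OWL.items()}

    def to_owl_property(mpeg7_type: str, element: str) -> str:
        return MPEG7_TO_OWL[(mpeg7_type, element)]

    def to_mpeg7_element(owl_property: str) -> tuple:
        # Round trip: recover the original (type, element) pair.
        return OWL_TO_MPEG7[owl_property]

    assert to_owl_property("TermUseType", "Name") == "TermName"
    assert to_mpeg7_element("TermName") == ("TermUseType", "Name")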
In 2005, Garcia and Celma presented the Rhizomik approach that consists of mapping
XML Schema constructs to OWL constructs, following a generic XML Schema to OWL mapping, together with an XML to RDF conversion [8]. Applied to the MPEG-7 schemas, the
resulting ontology covers the whole standard as well as the Classification Schemes and
TV-Anytime (http://tech.ebu.ch/tvanytime). It can be visualized with Protege or Swoop
(http://code.google.com/p/swoop) and has been validated and classified using the
WonderWeb OWL Validator and Pellet. The Rhizomik ontology was originally expressed
in OWL Full, since 23 properties must be modeled using an rdf:Property because they
have both a datatype and object-type range, that is, the corresponding elements are both
defined as containers of complex types and simple types. An OWL DL version of the
ontology has been produced, solving this problem by creating two different properties
(owl:DatatypeProperty and owl:ObjectProperty) for each of them. This change
is also incorporated into the XML2RDF step in order to map the affected input XML
elements to the appropriate OWL property (object or data type), depending on the kind of
content of the input XML element. The main contribution of this approach is that it
benefits from the great amount of metadata that has already been produced by the XML
community. Moreover, it allows the automatic mapping of input XML Schemas to OWL
ontologies and XML data based on them to RDF metadata following the resulting
ontologies. This approach has been used with other large XML Schemas in the Digital
Rights Management domain, such as MPEG-21 and ODRL [19], and in the eBusiness
domain [20].
In 2007, Arndt et al. have proposed COMM, the Core Ontology of Multimedia, for
annotation. Based on early work [21, 22], COMM has been designed manually by completely reengineering MPEG-7 according to the intended semantics of the written
standard. The foundational ontology DOLCE serves as the basis of COMM. More
precisely, the Description and Situation (D&S) and Ontology of Information Objects
(OIO) patterns are extended into various multimedia patterns that formalize the MPEG-7
concepts. The use of an upper-level ontology provides a domain-independent vocabulary
that explicitly includes formal definitions of foundational categories, such as processes or
physical objects, and eases the linkage of domain-specific ontologies because of the
definition of top-level concepts. COMM covers the most important part of MPEG-7
that is commonly used for describing the structure and the content of multimedia
documents. Current investigations show that parts of MPEG-7 that have not yet been
considered (e.g., navigation and access) can be formalized analogously to the other
descriptors through the definition of other multimedia patterns. COMM is an OWL
DL ontology that can be viewed using Protege. Its consistency has been validated using
FaCT++ v1.1.5. Other reasoners failed to classify it due to the large number of DL
axioms that are present in DOLCE. The presented OWL DL version of the core module is
just an approximation of the intended semantics of COMM since the use of OWL 1.1 (e.g.,
qualified cardinality restrictions for number restrictions of MPEG-7 low-level descrip-
tors) and even more expressive logic formalisms are required for capturing its complete
semantics.
To compare the four MPEG-7-based ontologies described above, consider a task to
annotate the famous ‘‘Big Three’’ picture, taken at the Yalta (Crimea) Conference, showing
the heads of government of the USA, the UK, and the Soviet Union during World War II.
The description could be obtained either manually or automatically from an annotation
tool. It could also be the result of an automatic conversion from an MPEG-7 description.
The annotation should contain the media identification and locator, define the still region
SR1 of the image, and provide the semantics of the region using http://en.wikipedia.org/
wiki/Churchill for identifying the resource Winston Churchill. > Figure 21.1 depicts
the RDF descriptions generated for these four ontologies.

[Fig. 21.1: An image described according to an MPEG-7 ontology: (a) Rhizomik approach; (b) DS-MIRF approach; (c) Hunter approach; (d) COMM approach. The image is a visual representation of the resource identified by http://en.wikipedia.org/wiki/Image:Yalta_Conference.jpg. The figure's RDF graphs (not reproducible here) use, among others, mpeg7:StillRegion, mpeg7:SpatialMask, and mpeg7:depicts nodes linked to dbpedia:Churchill.]
The link between a multimedia ontology and any domain ontologies is crucial. In the
example, a more complete description could include information about ‘‘Churchill’’ (a
person, a British Prime Minister, etc.) and about the event. In addition, details about the
provenance of the image (e.g., date taken, photographer, camera used) could also be
linked to complete the description. The statements contained in the descriptions above, in
conjunction with any of the four underlying ontologies presented in this paper, can then
be used to answer queries such as ‘‘find all images depicting Churchill ’’ or ‘‘find all media
depicting British Prime Ministers.’’ Furthermore, subjective queries such as ‘‘find images
with a ‘bright’ segment in them,’’ where ‘‘bright’’ is defined as mpeg7:DominantColor
greater than rgb(220,220,220), are also possible.
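The following hedged sketch shows how such a subjective query could be posed with rdflib. Real MPEG-7-based ontologies encode the dominant color as a structured descriptor; here a simplified, hypothetical encoding with one integer per channel keeps the example short.

    # Hedged sketch of the "bright segment" query; the ex: properties
    # are a simplified stand-in for a structured dominant-color descriptor.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/mm#> .

    ex:yalta ex:hasRegion ex:SR1 .
    ex:SR1 ex:dominantR 245 ; ex:dominantG 240 ; ex:dominantB 235 .
    """, format="turtle")

    q = """
    PREFIX ex: <http://example.org/mm#>
    SELECT ?img WHERE {
        ?img ex:hasRegion ?r .
        ?r ex:dominantR ?red ; ex:dominantG ?green ; ex:dominantB ?blue .
        FILTER (?red > 220 && ?green > 220 && ?blue > 220)
    }
    """
    for row in g.query(q):
        print(row.img)   # -> http://example.org/mm#yalta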
Hunter’s MPEG-7 and COMM ontologies both use an upper ontology approach to
relate with other ontologies (ABC and DOLCE). Hunter’s ontology uses either semantic
relations from MPEG-7, such as depicts, or defines external properties that use an
MPEG-7 class, such as mpeg7:Multimedia, as the domain or range. In COMM, the link
with existing vocabularies is made within a specific pattern: the Semantic Annotation
Pattern, reifying the DOLCE Ontology of Information Object (OIO) pattern. Conse-
quently, any domain-specific ontology goes under the dolce:Particular or owl:Thing class. The DS-MIRF ontology integrates domain knowledge by subclassing one
of the MPEG-7 SemanticBaseType: places, events, agents, etc. Furthermore, it fully
captures the semantics of the various MPEG-7 relationships represented as instances of
the RelationType. According to the standard, the value of these properties must come
from some particular classification schemes: RelationBaseCS, TemporalRelationCS, SpatialRelationCS, GraphRelationCS, and SemanticRelationCS.
A typed relationship ontology extending DS-MIRF has been defined for capturing all
these relationships.
An important modeling decision for each of the four ontologies is how much they are
tied to the MPEG-7 XML Schema. These decisions impact upon the ability of the ontology
to support descriptions generated automatically and directly from MPEG-7 XML output
and on the complexity of the resulting RDF. Therefore, the modeling choices also affect
the scalability of the systems using these ontologies and their ability to handle large media
datasets and cope with reasoning over very large quantities of triples. Both the DS-MIRF
and the Rhizomik ontologies are based on a systematic one-to-one mapping from the
MPEG-7 descriptors to equivalent OWL entities. For the DS-MIRF ontology, the mapping
has been carried out manually, while for the Rhizomik ontology, it has been automated
using an XSL transformation, and it is complemented with an XML to RDF mapping. This
has been a key motivator for the Rhizomik ontology and the ReDeFer tool where the
objective is to provide an intermediate step before going to a more complete multimedia
ontology, such as COMM. The advantage of the one-to-one mapping is that the trans-
formation of the RDF descriptions back to MPEG-7 descriptions may be automated later
on. In addition, this approach enables the exploitation of legacy data and allows existing
tools that output MPEG-7 descriptions to be integrated into a semantic framework. The
main drawback of this approach is that it does not guarantee that the intended semantics
of MPEG-7 is fully captured and formalized. Rather, the syntactic interoperability and conceptual ambiguity problems, such as the various ways of expressing a semantic annotation, remain.
The COMM ontology avoids a one-to-one mapping in order to resolve these ambiguities that come from the XML Schemas, while an MPEG-7-to-COMM converter is still
available for reusing legacy metadata. A direct translation from an MPEG-7 XML descrip-
tion using Hunter’s ontology is possible. However, in practice, the multimedia semantics
captured by the ontology have instead been used to link with domain semantics. There-
fore, rather than translating MPEG-7 XML descriptions into RDF, this ontology has been
used to define semantic statements about a media object and to relate these statements to
the domain semantics. This results in a smaller number of triples.
The MPEG-7-based ontologies discussed here aim to provide richer semantics and
better frameworks for multimedia description and exchange than can be addressed by
current standards. For further reading, the interested reader can also refer to [116] that
surveys the state of the art of MPEG-7-based ontologies. Related efforts to develop
multimedia ontologies include the following: the Visual Descriptor Ontology (VDO)
[23] is based on the MPEG-7 Visual part and used for image and video analysis; [24] have
proposed a visual ontology by extending WordNet with multimedia semantics from
Hunter’s ontology, specifically for use within the museums and art domain; [25] devel-
oped an MPEG-7-based ontology and applied it to annotating football (soccer) videos;
similar to the approach used in Hunter’s ontology and in COMM, this ontology uses the
decomposition and visual components of MPEG-7 and captures high-level domain
semantics in domain-specific ontologies.
Toward a Standardized Ontology for Media Resources
The Ontology for Media Resources currently being specified in W3C is a core vocabulary
that covers basic metadata properties to describe media resources (see www.w3.org/TR/
mediaont-10/). The goal of this ontology is to address the interoperability problem by
providing a common set of properties defining the basic metadata needed for media
resources and the semantic links between their values in different existing vocabularies.
The ontology can be used to attach different types of metadata to the media, such as the
duration, the target audience, the copyright, the genre, and the rating. Media fragments
can also be defined in order to have a smaller granularity and attach keywords or formal
annotations to parts of the video. The ontology will also be accompanied by an API that
provides uniform access to all elements defined by the ontology.
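As a small illustration of attaching a keyword to a media fragment, the sketch below combines a W3C Media Fragments URI (#t= for a time range, xywh= for a spatial region) with ma: properties named as in the draft ontology (see also > Fig. 21.2 below); the video URI and keyword are hypothetical.

    # Sketch: annotating a spatiotemporal part of a video with a keyword,
    # using a Media Fragments URI and Ontology for Media Resources terms.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ma: <http://www.w3.org/ns/ma-ont#> .

    # Seconds 10-20 of the video, cropped to a 320x240 region.
    <http://example.org/video.mp4#t=10,20&xywh=160,120,320,240>
        a ma:MediaFragment ;
        ma:keyword "train" .
    """, format="turtle")
    print(len(g))   # -> 2 triples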
The purpose of the mappings defined in the ontology is to enable different applica-
tions to share and reuse metadata represented in heterogeneous metadata formats. For
example, creator is a common property that is supported in many metadata formats.
Therefore, it is defined as one of the properties in the core vocabulary of the ontology for
media resources and aligned with other vocabularies. Ideally, the mappings defined in the
ontology should be used to reconcile the semantics of a term defined in a particular schema.

[Fig. 21.2: Media Ontology annotation (Courtesy Raphaël Troncy, from data.linkedevents.org):

    <http://data.linkedevents.org/media/4303994975> a ma:Image ;
        dc:title "Radiohead / Thom Yorke" ;
        ma:locator <http://farm3.static.flickr.com/2726/4303994975_74302c45b5_o.png> ;
        ma:createDate "2010-01-25T12:27:21"^^xsd:dateTime ;
        ma:frameWidth "1280"^^xsd:integer ;
        ma:frameHeight "720"^^xsd:integer ;
        ma:keyword "colin" ;
        ma:keyword "radiohead" . ]

However, this cannot be easily achieved, due to the many differences in the
semantics that are associated with each property in the mapped vocabularies. For exam-
ple, the property dc:creator from Dublin Core and the property exif:Artist
defined in EXIF are both aligned to the property ma:creator. However, the extension
of the property in the EXIF vocabulary (i.e., the set of values that the property can have) is
more specific than the corresponding set of values that this property can have in Dublin
Core. Therefore, mapping back and forth between properties from different schemas,
using this ontology as a reference, will induce a certain loss in semantics. The axioms
representing the mappings are defined as an exact, broader, or narrower mapping between
two properties (> Fig. 21.2).
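The direction of each mapping determines when value propagation is safe. The sketch below encodes the dc:creator/exif:Artist example from the text; the data structure and decision logic are illustrative, not part of the W3C specification.

    # Sketch of the lossy mapping issue: each alignment to the ma: pivot
    # property carries a direction; propagating a value is safe from the
    # narrower to the broader side, but not the reverse.
    MAPPINGS = {
        "dc:creator":  ("ma:creator", "exact"),
        "exif:Artist": ("ma:creator", "narrower"),  # EXIF values are more specific
    }

    def translate(source_prop: str, target_prop: str) -> str:
        pivot_s, dir_s = MAPPINGS[source_prop]
        pivot_t, dir_t = MAPPINGS[target_prop]
        if pivot_s != pivot_t:
            return "no mapping"
        if dir_s == "exact" and dir_t == "exact":
            return "lossless"
        if dir_s == "narrower":            # narrower -> broader is safe
            return "safe"
        return "potentially lossy"         # broader -> narrower

    print(translate("exif:Artist", "dc:creator"))  # -> safe
    print(translate("dc:creator", "exif:Artist"))  # -> potentially lossy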
21.2.2 Semantic Web–Based Multimedia Annotation Tools
As already sketched above, multimedia annotations come in a multilayered, intertwined fashion, encompassing among others descriptions of the conveyed subject matter (e.g., a train arriving at the station on a rainy day), content
structure (e.g., the specific image region depicting the train or the part of the video
capturing the train as it approaches), visual features (e.g., the values of the color descrip-
tors corresponding to the rainy sky image parts), administrative information (e.g.,
ownership and editing rights), and so forth. Different aspects pertain depending on the
annotation needs and the particular application context addressed each time, as illus-
trated in the description of state-of-the-art ontology-based annotation tools that follows.
SWAD (http://swordfish.rdfWeb.org/discovery/2004/03/w3photo/annotate.html),
though no longer maintained, constitutes one of the first Semantic Web–based
implementations addressing the manual annotation of images. Through a Web-based
interface, it allows the user to insert descriptions regarding who or what is depicted in
an image (person, object, and event), when and where it was taken, and additional
creation and licensing information. Annotations are exported in RDF. Despite the very
early stage of Semantic Web technologies uptake, SWAD used an impressive number of
RDF vocabularies, including FOAF, the Dublin Core element set, RDF iCalendar (http://www.w3.org/2002/12/cal/), as well as a then-experimental namespace for WordNet.
PhotoStuff (http://www.mindswap.org/2003/PhotoStuff/) is a platform-independent
ontology-based annotation tool that allows users to describe the contents of an image with
respect to ontology concepts, as well as administrative information about the image such
as the creation date [26]. Multiple RDF(S) or OWL ontologies can be simultaneously
loaded, facilitating the user in the creation of annotations distributed across many
ontologies. Classes from the ontologies can be associated to the entire image or specific
regions using one of the available drawing tools (circle, rectangle, and polygon), while the necessary definitions (region, localization, and so forth) are provided by a built-in image-region ontology. Annotations referring to relations can also be created by connecting
concept annotation instances that have been already identified in an image using prop-
erties from the uploaded ontologies. PhotoStuff also takes advantage of existing metadata
embedded in image files (such as EXIF) by extracting and encoding such information in
RDF/XML. The property depicts from FOAF and its inverse (depiction) are used to
link image (region) instances with domain ontology concept instances. Finally,
PhotoStuff supports browsing, searching, and managing digital image annotations
through a loosely coupled connection with a Semantic Web portal (> Fig. 21.3).

[Fig. 21.3: Annotation screenshot using PhotoStuff]
AktiveMedia (http://www.dcs.shef.ac.uk/~ajay/html/cresearch.html), developed
within the AKT (http://www.aktors.org/akt/) and X-Media (http://www.x-media-project.
org/) projects, is an ontology-based cross-media annotation system addressing text and
image assets [27]. In the image annotation mode, AktiveMedia supports content markup
with respect to multiple domain-specific ontologies. Unlike PhotoStuff, only a single
ontology is displayed each time in the ontology browser. The user-created annotations
may refer to an image or a region level (using the provided rectangular and circular
drawing facilities), and a simple built-in schema is used to capture localization informa-
tion. To describe an image as a whole, AktiveMedia provides three free text fields, namely, title, content, and comment. Using the text annotation mode, the user-entered descriptions can be subsequently annotated with respect to an ontology. The supported ontology languages include RDFS and OWL, as well as older Semantic Web languages such as DAML and DAML-ONT, while RDF is used for the export of the generated annotations.
An interesting feature of AktiveMedia, though not directly related to the task of image
annotation, is its ability to learn during textual annotation mode, so that suggestions can
be subsequently made to the user (> Fig. 21.4).

[Fig. 21.4: Annotation screenshot using AktiveMedia]
Following a different rationale, M-Ontomat-Annotizer, developed within the
aceMedia project (http://www.acemedia.org/aceMedia), extends typical media annotation by additionally enabling the formal representation of low-level visual features and
their linking with concepts from a domain ontology [28]. In order to formalize the linking
of domain concepts with visual descriptors, M-Ontomat-Annotizer employs the Visual
Annotation Ontology (VAO) and the Visual Descriptor Ontology (VDO) [29], both
hidden from the user. The VAO serves as a meta-ontology allowing one to model
domain-specific instances as prototype instances and to link them to respective descriptor
instances through the hasDescriptor property. The domain-specific instances, and by
analogy the extracted descriptor instances, may refer to a specific region or to the entire
image. For the identification of a specific region, the user may either make use of the
automatic segmentation functionality provided by the M-Ontomat-Annotizer or use one of the manual drawing tools, namely, the predefined shapes (rectangle and ellipse), free hand, and magic wand. The supported input ontology languages are RDFS and DAML. In
a subsequent release within the K-Space project (http://kspace.qmul.net), M-Ontomat 2.0
(http://mklab.iti.gr/m-onto2) provides support for descriptive and structural annotations
in the typical semantic search and retrieval sense (> Fig. 21.5).

[Fig. 21.5: Annotation screenshot using M-Ontomat-Annotizer]
The K-Space Annotation Tool (KAT) (https://launchpad.net/kat) is an ontology-based
framework developed within the K-Space project for the semiautomatic annotation of
multimedia content [30]. Its core provides the infrastructure of an API and set of services,
including configuration and access to Sesame and Sesame2 repositories that enable users
to implement in a plug-in-based fashion relevant functionalities such as visualization,
editing, and manipulation of semantic content annotations. The model and storage layer
of KAT is based on the Core Ontology of Multimedia (COMM) [31] and the MultiMedia
Metadata Ontology (M3O), a subsequent extension that addresses also the annotation of
rich multimedia presentations [32]. In the current release, concepts from an ontology can
be used to mark up images and respective regions that are localized manually, using either
the rectangle or the polygon drawing tools. Decomposition and localization information
is represented based on the respective COMM decomposition pattern. KAT’s flexible
architecture and COMM-based annotation model render it media independent, allowing
support for additional content types as long as respective media management function-
alities (e.g., video player) are implemented. Furthermore, the COMM-based annotation
model makes it quite straightforward to extend annotation level so as to include addi-
tional dimensions, such as low-level visual features for example, again as long as appro-
priate feature extraction plug-ins are available (> Fig. 21.6).
The Video and Image Annotation (VIA) tool, developed within the BOEMIE project
(http://www.boemie.org), provides a looser notion of ontology-based media markup.
Specifically, it supports the annotation of image and video assets using concepts from an
ontology, while allowing also for free text descriptions. Users may also add administrative
descriptions, including information about the creator of the annotations, the date of the
annotation creation, etc., based on a simple built-in schema. Image (and video frame)
annotation may address the entire image (video frame) or specific regions; in the case of
image annotation, the user can also select to extract MPEG-7 visual descriptors to enhance
annotations with low-level information. The localization of regions is performed either
semiautomatically, providing to the user a segmented image and allowing him or her to
correct it by region merging, or manually, using one of the drawing functionalities,
namely, free hand, polygon, circle, or rectangle.
Video annotation may refer to the entire video asset, video segments, moving regions,
frames, or still regions within a frame. It can be performed either in a successive frame-by-
frame fashion or in real time, where the user follows the movement of an object while the
video is playing, by dragging its bounding box. The annotations performed using VIA can
be saved as annotation projects, so that the original video, the imported ontologies, and
the annotations can be retrieved and updated at a later time. The produced metadata are
exported following a custom format, either in XML or in a more human-readable textual
form (> Fig. 21.7).
Following a different approach, a number of media annotation tools have been
developed based on MPEG-7 and custom multimedia description vocabularies. This
line of thought is particularly evident in the case of video annotation, where Semantic
Web–based technologies have hardly been employed; KAT is an exception, though cur-
rently it provides just the infrastructure and not an implementation, while VIA, though
allowing the use of a subject matter ontology for video annotation, uses a proprietary
format for encoding the generated metadata. Prominent examples of non-Semantic Web
compliant video annotation tools include IBM’s VideoAnnEx, Anvil, the Semantic Video
Annotation Suite, Ontolog, Elan, etc. (despite some names, none of these tools produces
metadata based on an ontology). For a complete list of relevant tools and resources for
the annotation of media content, the interested reader is referred to the Tools and
Resources page of W3C Multimedia Semantics Incubator Group (http://www.w3.org/
2005/Incubator/mmsem/wiki/Tools_and_Resources).
[Fig. 21.7: Annotation screenshot using VIA]
Besides the uneven uptake of Semantic Web technologies in the annotation of
video assets compared to images, the aforementioned tools lead to a number of consid-
erations. A critical one relates to the interoperability of the generated annotation
metadata. The different ontology schemas used by the individual tools to represent
media-related descriptions and the respective modeling approaches to the linking
with domain ontologies hamper the sharing and reuse of annotations across different
applications. The situation is further aggravated, as many tools choose to follow propri-
etary schemas. This discrepancy between the available theoretical results and their
uptake in practical applications brings forth once again the trade-off between the
complexity of the proposed formal representation models and the effective fulfillment
of real-world needs. It is also quite interesting to note that many tools do not allow
users to edit previously created annotations at a later point, treating annotation as
a onetime affair.
The ambiguity that characterizes the relation and interlinking between annotations
generated by different tools induces a subsequent vagueness when assessing the appro-
priateness of each tool for a given application. For semantic search and retrieval applications, which address content at the level of perceived meaning, possible selection criteria may be the expressivity level (are ontology classes adequate, or are relation descriptions also needed?) or the granularity of the annotations (depth of spatial or temporal content
decomposition). For applications that encompass multimedia analysis aspects too, tools
that support descriptions at the level of low-level features provide the means to capture
and share this additionally required information.
Concluding, although the main focus of semantic multimedia research continues to be the automatic extraction of multimedia content annotations, the ability to generate annotations manually or semiautomatically remains a crucial pursuit. Despite
the strenuous research efforts and successful results in sporadic application domains, the
automatic extraction of multimedia semantics is still at a very naive level compared to
practical user needs. Moreover, manual content annotations contribute actively to semantic multimedia research, serving as ground truth data for evaluation purposes and as training data to support knowledge acquisition and learning tasks.
21.2.3 Semantic Multimedia Analysis
Automated multimedia content understanding has strained researchers for years in the
painstaking quest to confront the so-called semantic gap challenge, namely, the lack of
correspondence between the low-level content descriptions that can be automatically
derived and the semantic interpretation a human would attribute [33].
Since its early days, research in content understanding has been intertwined with the
use of structured, prior knowledge in the pursuit of endowing computational systems
with the notion of informed (in terms of background knowledge–driven) interpretation.
This interrelation has rendered knowledge representation and reasoning as integral
elements of the undertaken investigations.
In the 1980s and early 1990s, semantic networks and frames were widely used to model
the relevant domain knowledge and support object recognition and scene interpretation
tasks, while formal approaches, based on first-order logic, were significantly sparser
[34, 35]. The lack of consensual semantics and the customized reasoning algorithms hindered the sharing and reuse of knowledge across systems, and often led to disparate
conclusions for the same problem. Furthermore, the underlying assumption that all
aspects involved in semantic content analysis should be explicitly modeled had resulted
in extremely elaborate conceptualizations and decision-making strategies. All these led
gradually to a period of rather receding interest in the use of explicit knowledge,
a tendency further corroborated by the momentum that statistical inference approaches
have gained as generic tools for semantic image and object classification. Soon though, the
limitations of relying solely on learning using perceptual information and similarity-based
associations became apparent, reviving interest in the role of knowledge and reasoning
in multimedia content understanding [36, 37]. Following the Semantic Web initiative, ontology [115] and Description Logic (DL) languages [116] became the prevalent formalisms for capturing and representing knowledge, shaping the current literature.
The following considers the use and role of Semantic Web technologies in the current
state of the art in semantic multimedia analysis and understanding, and concludes with
a brief discussion on open issues and challenges for future directions. As space constraints
have enforced several simplifications and omissions, the interested reader is referred to the
provided related resources for a thorough treatment of the topics addressed.
> Figure 21.8 depicts a typical example framework for semantic image analysis deploying Semantic Web–based technologies.

[Fig. 21.8: Semantic image analysis architecture. An image processing and analysis stage (segmentation, feature extraction, object/scene classification over edges, regions, texture, and color) produces graded assertions such as Building(r3) ≥ 0.85, Vegetation(r4) ≥ 0.75, and above(r3,r2) ≥ 0.58; an inference and semantic interpretation stage combines these with background knowledge (domain axioms and constraints over concepts such as Landscape, Seaside, Countryside_buildings, Grass, and Sky) to derive and check higher-level descriptions such as Seaside(image) ≥ 0.64.]

The interpretation process starts with an image processing stage, where the extraction of relevant low- and intermediate-level
representations pertaining to perceptual features, such as color and texture, takes place.
This is often carried out with the joint application of spatial (temporal) segmentation.
The extracted representations may then be used directly as input facts (assertions) in the
inference process, or may undergo further processing to acquire descriptions of higher
abstraction (e.g., concept classifiers for the detection of primitive objects, as in the
illustrated example), before inference is eventually invoked. The knowledge base encodes
logical associations and constraints necessary to admit valid content interpretations, as
well as appropriate abstraction layers so as to link and map information derived from
perceptual evidence to object and scene interpretations. A possible model for a building may thus span multiple levels of abstraction: starting with constituent parts such as windows and doors, including corresponding spatial and geometric characteristics, and reaching down to edge and spatial adjacency definitions at the level of image pixels.
Knowledge modeling and inference lie at the core of the interpretation process; being intertwined with the configuration of content interpretation, they constitute the chief dimensions of differentiation in the current state of the art. For example, approaches that
realize interpretation as a stepwise transition from low- tohigh-level content representations,
place naturally large emphasis on the modeling and linking of media-related knowledge to
domain-specific knowledge. Approaches that tackle instead the configuration of content
interpretation in a formally accountable way tend to concentrate on the implications and
adaptation requirements imposed on inferencing, and usually abstract lower-level represen-
tations. Additional differences can be traced to the intrinsic ambiguity involved in content
interpretation that gives rise to diverse imprecision handling methodologies, to conceptual
modeling requirements that affect the granularity and types of knowledge considered, to
the interplay between knowledge, inference, and analysis, and so forth, to name but a few.
The following presents representative examples from the current state of the art,
outlining their main characteristics, weaknesses, and insights, starting with approaches
that consider the formal representation of media-related knowledge; these can be further
classified into those adhering to standardized definitions, such as MPEG-7, and those
following proprietary ones. In [38], domain experts define, through a graphical interface,
rules that map particular combinations of low-level visual features (color, texture, shape, and
size) to high-level semantic concepts defined in the domain ontology. These rules are
subsequently applied in order to infer descriptions of the form ‘‘still region ri depicts cj,’’
where cj is an instance of a domain concept [39, 40]. In [23], MPEG-7 compliant visual
descriptions (color, texture, shape) are used to enrich domain concepts serving as
prototypical visual instantiations. These enriched domain representations are subse-
quently used as prior knowledge to train statistical classifiers for the automated detection
of the addressed semantic concepts. The semantic link between these prototypical
instances and the respective domain concepts is established through the M-Ontomat-
Annotizer graphical annotation tool [41] making use of the Multimedia Structure
ontology, the Visual Descriptor ontology, and the Visual Annotation ontology [29].
The MPEG-7 compliant media-related knowledge representations render the afore-
mentioned approaches particularly appealing in terms of reusing and sharing the knowl-
edge involved. The extracted low-level features can be interchanged straightforwardly
between different applications, saving significant time and effort that would be required
for their recalculation. Similarly, the inference rules defined by domain experts and
enriched with visual attributes can be shared across applications enabling them to more
effectively communicate human domain knowledge, where relevant.
Addressing similar considerations, a visual ontology-guided approach to object rec-
ognition is presented in [42], this time adopting a proprietary approach to the definition
of the media-related descriptions. Domain experts populate the knowledge base using the
color, texture, and spatial concepts provided by the visual ontology to describe semantic
objects of the domain. An image processing ontology is used to formally encode notions
pertaining to the image processing level, including entities such as edge and region, and
features, such as color histograms to numerically characterize visual properties. The
linking of the visual ontology descriptions with the image processing ontology
representations can be either manually predefined or learned following a mixed
bottom-up and top-down methodology [43, 44]. Another processing-related ontology is
presented in [45], this time accounting for the algorithmic aspects of the analysis process
for the detection of semantic objects in video sequences. A domain-specific ontology
provides object descriptions extended with low-level and qualitative visual knowledge,
and a set of rules determines the sequence of steps required to detect particular semantic
objects using the definitions provided by the analysis ontology. In [116], clusters of visually
similar content, computed based on low-level features, are used as concept definitions to
enrich (via subclass relations) the linguistic terms comprising the domain ontology.
Besides differences in the modeling and engineering of the background knowledge, the
aforementioned approaches share a common underlying assumption, modeling interpre-
tation as straight bottom-up deduction by augmenting the initial perceptual facts through
inference upon the available background knowledge. Although this assumption may be
true in given applications, the incompleteness, ambiguity, and complexity that charac-
terize the task of content interpretation, in general, render such a view of limited
applicability. To meet the challenge of selecting among plausible alternatives while
constructing an interpretation, a number of approaches have investigated more closely
the requirements imposed on inferencing.
In a series of works [46–48], Description Logics are examined for high-level scene
interpretation based on the notion of aggregates, that is, concepts that consist of multiple
parts that are constrained with respect to particular spatial (or temporal) relations. High-
level concepts are linked to corresponding view-concepts, which realize the grounding with
low-level processing evidence and initiate the (partial) instantiation of aggregates. The
interpretation process is modeled as a recursive search in the space of alternative inter-
pretations exploiting the logical structure of the aggregates in a mixed bottom-up and
top-down fashion. The existence of multiple models, though, leaves a great degree of
freedom in choosing which alternative to examine first each time. In [49],
a middle layer serves as mediator, attempting to match hypotheses from the high-level
interpretation layer to the available evidence; if a hypothesis is neither confirmed nor
refuted, low-level image processing is invoked again with accordingly updated parameters.
Work toward a probabilistic model for handling dependencies in compositional aggregate
hierarchies is sketched in [50], with the purpose of providing preference measures to
guide the interpretation steps.
In [51], the media interpretation configuration described above is extended to formalize
interpretation as abduction over Description Logic ABoxes. The set φ of input assertions
provided by image analysis is split into bona fide assertions φ1, which are considered
true by default, and bona fiat ones φ2, which need to be explained. Interpretation is
then formalized as the abductive problem of finding the explanations α such that
Σ, φ1, α ⊨ φ2 holds, where Σ denotes the background knowledge. In the presented
experimental setting, φ2 corresponds to the set of spatial relationship assertions. Such
a division, however, is arbitrary and cannot be justified formally; similar considerations
hold for the definition of the backward-chaining rules used to implement the abductive
reasoning. Preference over the possible explanations is determined in terms of the number
of (new) individuals that need to be hypothesized (as part of α) and the number of φ2
assertions that get explained.
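To make this concrete, consider a toy example under the rule-based reading described
above (all names illustrative). Suppose the background knowledge Σ contains the rule
above(x, y) ← Roof(x) ∧ Wall(y) ∧ partOf(x, z) ∧ partOf(y, z) ∧ Building(z), and analysis
delivers φ1 = {Roof(r1), Wall(r2)} and φ2 = {above(r1, r2)}. Read backward, the rule
yields the explanation α = {Building(b), partOf(r1, b), partOf(r2, b)}, which hypothesizes
one new individual b (the building aggregating roof and wall) and explains the observed
spatial relation, so that Σ, φ1, α ⊨ φ2.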
Advancing from purely deductive configurations to include nonstandard forms of
inference marks a significant turn, given the ill-defined transition from perceptual repre-
sentations into semantic descriptions. Automatic segmentation hardly ever results in
(semantically) meaningful partitions, while it is practically impossible to acquire unique
and reliable mappings between perceptual appearances and semantic notions. In such
a setting, the ability to cope with incompleteness and ambiguity is crucial, and abductive
reasoning, providing inference to the best explanation, presents an appealing direction for
future research.
However, extensions of this kind are not sufficient alone. The extraction of media
semantics at the level of objects, events, and scenes encompasses intrinsically a large
amount of imprecision. It permeates the extraction of features, the identification of
shapes, the matching of textures and colors, and carries through into the translation
from perceptual to symbolic representations performed by image analysis. The latter may
express either uncertainty, thus representing degrees of belief and plausibility, or
vagueness, expressing conformity through degrees of membership [52]. Yet, the majority
of the literature tends
to treat the descriptions upon which inference is performed as crisp facts (binary prop-
ositions), ignoring the probability or vagueness information captured in the accompany-
ing confidence degrees.
The need to accommodate vagueness was acknowledged early on. In [53],
a preliminary investigation into the use of Description Logics for object recognition is
reported, outlining the limitations involved with exact recognition; in a subsequent
investigation, the proposed framework has been extended with approximate reasoning
to assess composite shape matching and subsumption [54]. In [55], a fuzzy DL-based
reasoning framework is proposed to integrate possibly complementary, overlapping, and/or
conflicting classifications at object and scene level into a semantically coherent final
interpretation. The input classifications, obtained by means of statistical learning, are
modeled as fuzzy assertions, and a three-step procedure is followed in order to determine
the set of plausible interpretations, resolve inconsistencies by tracking the assertions and
axioms triggering them, and further enrich the interpretations by making explicit missing
descriptions [56]. Other approaches building on fuzzy semantics include [57], where
fuzzy DLs are used to merge over-segmented regions and to accordingly update the
degrees of classifications associated to the regions, and [58], where a fuzzy ontology
capturing spatial relations for image interpretation is presented.
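As a minimal illustration of this setting, using the degrees shown in > Fig. 21.8, the
classifier outputs become fuzzy assertions such as ⟨Building(r3) ≥ 0.85⟩,
⟨Vegetation(r4) ≥ 0.75⟩, and ⟨above(r3, r2) ≥ 0.58⟩; an axiom such as
∃contains.Building ⊓ ∃contains.Vegetation ⊑ Countryside_buildings then supports a
scene-level interpretation ⟨Countryside_buildings(image) ≥ 0.52⟩, with the exact degree
depending on the fuzzy semantics chosen (e.g., the minimum of the conjunct degrees under
the Gödel t-norm).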
Similar to fuzzy extensions [59, 60], probabilistic extensions to logical inference [61,
62] have been investigated in media interpretation approaches, though far more sparsely.
In [63], an appropriately defined ontology, which links visually extracted descriptors
with domain entities and semantic constraints, guides the design of the Bayesian network used
to perform probabilistic inference over automatically extracted video descriptions. In [64],
commonsense knowledge, encoded in the form of first-order logic production rules, is used to
deduce the topology of a Markov Logic Network [65] and semantically analyze parking lot
videos. The use of the Markov Logic Network makes it possible to formulate the uncertainty
involved in the detection of objects and movements, as well as the statistical ambiguity
characterizing part of the domain knowledge; while in [118], bilattice theory, which orders
knowledge along two axes that represent the degree of truth and the degree of belief,
respectively [124], is explored as the means to handle and reason under imprecision in
human detection applications. In [119], a reasoning framework that combines Semantic
Web technologies with rule-based and causality-based reasoning is investigated, while
highlighting challenges with respect to inconsistency and uncertainty handling. Finally, it
is worth mentioning two initiatives that consolidate the findings of a series of workshops
toward an ontology framework for representing video events. The Video Event Represen-
tation Language (VERL) models events in the form of changes of states, while the Video
Event Markup Language (VEML) serves as a complementary annotation framework [66].
Though less rigorous than respective logic-based formalisms for representing actions
and effects and temporal semantics (e.g., the Event Calculus [120]), such initiatives
manifest a continuously increasing awareness of, and interest in, cross-disciplinary results
and experiences.
The aforementioned work outlines an intriguing amalgam of valuable results and
insightful observations. As illustrated by the current state of the art, formal knowledge
representation and reasoning bring in a tremendous potential to inject semantics into the
otherwise data-driven statistical learning and inferencing technologies used in media
interpretation. Intrinsic traits of the task challenge the typical deductive reasoning
scheme as much as the classical binary-valued semantics, demanding a profound
investigation of the extensions and adaptations necessary to currently available
inference mechanisms. In this
quest, the management of imprecision is crucial, especially with regard to the effective
combination of probabilistic and fuzzy semantics under a formal, coherent framework:
the distinction between probable and plausible interpretations is key both to forming and
ranking alternative interpretations.
Supporting hybrid inference schemes that allow for imprecision, though, is not
sufficient on its own to handle the missing and incomplete descriptions obtained by
means of typical media processing. Building on the classical logic paradigm, Semantic
Web languages adopt the open-world assumption. Low-level representations serve as
evidence that determine the set of possible interpretations and formal knowledge
is expected to further restrict them into valid ones based on coherency and consistency
considerations; yet compositional semantics are hardly encountered in the existing liter-
ature. The investigated interpretation configurations implicitly espouse a closed world-
view, focusing only on explicitly asserted facts, while poorly exploiting the supported
open semantics in the involved knowledge modeling and engineering tasks [66].
Such requirements are intertwined with another critical challenge, namely, the transition
from pipeline-like interpretation configurations, where successive steps of statistical and
logical inference take place, to more interactive schemes that exploit jointly learning and
logical inference. The existing approaches address the aforementioned considerations only
fragmentarily and partially, paving an interesting roadmap for future research
activities.
21.2.4 Semantics in Broadcasting
21.2.4.1 Metadata in Broadcasting from Its Origin
Although it was not called ''metadata,'' the audiovisual industry, and broadcasters in
particular, have been managing such content-related information for decades. Archives,
from which content has to be found and retrieved, were the place where the need for
accurate documentation first arose.
Metadata is the modern IT equivalent of a label on a tape or film reel (title, short
description) with potentially more structured machine-readable information (technical
details, broadcast time, storage location). With a growing quantity of content being
produced every year (thousands of hours of audio and video material), the business
rationale behind well-documented metadata is more justified than ever: ‘‘if you can’t find
it, it’s like you don’t have it, hence you must pay for it again!’’
Although the first databases date back to the 1960s, their real expansion came with
the democratization, ease of use, and reasonable computing power of computers in the
mid-1980s. Within the broadcasting community, the ''early adopters'' waited until the
mid-1990s (a decade later) to grasp the potential of metadata and information
management in databases. Still, it is only recently that the role of metadata has been
fully recognized.
In an analog world, the first broadcaster’s need for metadata was internal to recover all
the information historically available on tags, cards, production forms, and a reference to
a physical location of the media (e.g., tape, film, and disk on a shelf). Digitization has been
the opportunity for generating and managing more data like restoration information
(e.g., tools, methods, parameters, results). In a file-based production environment,
metadata is vital: what is the format of the video or audio file? What editorial content
does the file contain? Where is the resource within petabytes of distributed mass storage?
The exchange of content (e.g., between the post-producer of an advertising spot and the
broadcaster in charge of exploiting it) is also greatly facilitated by metadata to search
material, publish information on programs available, and provide information on the file
being provided.
However, although all the technical conditions are now met to develop effective
metadata solutions for production, the cost of generating metadata remains a barrier
and the next challenge is to develop tools to automatically extract or generate metadata.
This includes speech-to-text recognition, face recognition, format detection, and content
summarization (e.g., reduce a 40-min program into a 3-min clip made of representative
key scenes and synchronized metadata).
Last but not least, the objective of broadcasters is to have their programs easily
accessed and seen across a variety of delivery media and platforms including traditional
linear broadcast, but also Internet (live streaming, video-on-demand, catch-up TV),
mobiles, and any hybrid combination like hybrid broadcast–broadband. In this rich and
ubiquitous context, metadata is vital.
21.2.4.2 Metadata Standardization in Broadcasting
In this section, different metadata standards for production and distribution will be
mentioned. Proprietary metadata solutions from MAM (Media Asset Management)
solution providers or consumer electronics manufacturers (proposing competing pro-
gram guides accessible through their respective products for an additional fee) are
intentionally out of scope.
Different groups are working on broadcasting standards. The Advanced Media Workflow
Association (AMWA) focuses on metadata associated with container formats that also carry
metadata (the Advanced Authoring Format and the Material Exchange Format, AAF and MXF).
The European Broadcasting Union (EBU) develops technical specifications related to all
domains of broadcasting technology, including metadata. The Society of Motion Picture and
Television Engineers (SMPTE) develops specifications for audiovisual production.
Harmonization is
desired although difficult to achieve. But, it must be noted that several of the existing
standards correspond to different needs or have only a regional impact.
Why Are Standards Necessary?
The ‘‘business-to-business exchange’’ application led to the necessity to propose a solu-
tion for interoperability, that is, using information understandable to the sending
and receiving parties. It is critically needed in a broadcasting environment in which
data, aggregated from different providers, have to be forwarded in a common format to
receiving devices from different consumer electronics manufacturers. It remains true for
hybrid broadcast–broadband services where data are also aggregated from different
sources and represented in a common format, for example, for display on a portal page
or for transmission to devices.
What Is Meant by Interoperability?
The first level of interoperability is the identification of a common set of structured
attributes characterizing content with agreed names and detailed semantics. Some exam-
ples: DMS-1 has been defined by AMWA as a set of attributes to be associated with
audiovisual material in MXF (Material Exchange Format) containers. RP210 is an SMPTE
dictionary of metadata attributes commonly met in television and radio production.
EBUCore is defined by the EBU as a core set of metadata, based on Dublin Core, to
facilitate the aggregation and exchange of metadata with audiovisual archive portals.
ETSI TV-Anytime was developed to facilitate the use of personal video recorders through
harmonized electronic program guide data. DVB Service Information is a minimum set of
information related to programs and services, which is broadcast in DVB streams.
The second level of interoperability is the representation format, which defines how the
structure of description attributes is being digitally serialized. Some examples of repre-
sentation formats are:
● SMPTE KLV (Key, Length, Value)
● W3C XML, and RDF/OWL serialized as N3 or Turtle
● JSON (JavaScript Object Notation)
● DVB SI (binary encoding of service information)
The third level of interoperability is the definition of delivery mechanisms (e.g.,
standardized by DVB in Europe, ARIB in Japan, or ATSC in the USA) over, for example,
MPEG Transport Stream (MPEG-TS) or Internet Protocols (IP). This includes solutions
adapted to the bandwidth of the different media such as data fragmentation and partial
updates.
21.2.4.3 Using Ontologies: Metadata + Semantic
One motivation for broadcast metadata is to provide search and personalization
functionality through access to richer information, in order to facilitate faster queries
and deliver results more relevant to users. This, in proportion to the large volumes of
audiovisual material being produced, requires ever more metadata augmented
with richer semantics. An important question to answer before designing an ontology
and its associated properties is ''what is it that the implementer wants users to search for?''
Because of the close relation of the Semantic Web initiative to W3C, the use of
semantic descriptions of audiovisual content was initially thought to have a de facto
focus on distribution, that is, targeting access to content by the users. This is why work
primarily started from TV-Anytime (a metadata format for describing electronic program
guides and on-demand catalogs), which additionally proposes a consistent class model
and embryonic identifier-based entity relationships. Further work showed the high
potential value of also using semantic-based descriptions for metadata at the production
stage and broadcaster archives.
21.2.4.4 A Semantic Representation of TV-Anytime in a Nutshell
The TV-Anytime specification has been developed within an open forum of broadcasters,
manufacturers, operators, EPG providers, etc. It addresses linear broadcasting and online
nonlinear services. Although it was first published by ETSI in the series of specifications
TS 102 822 in 2005, it also fits new content access concepts like catch-up TV and other
mobile services.
TV-Anytime benefits from a solid class-oriented data model shown in > Fig. 21.9.
> Figure 21.9 shows different types of classes (e.g., ProgramGroup, Programme,
Person), entity/class relationships (object properties in blue), and data properties
(in red). Considering how TV-Anytime could be represented in a Semantic Web model:
– The set of classes forms the backbone of the model. All the classes (e.g., ProgramGroup,
Programme, Segment) represented in > Fig. 21.9 are properly identified (as they
would likely be recorded in a database) and can easily be attributed a URI, which is
a key eligibility criterion for a class in the Semantic Web. The fact that CreditsItem
does not have an identifier is not essential in XML, but it is more critical in
a semantic model in order to avoid using blank nodes through which a Person or
Organisation class instance would be linked to a Programme with the addition of
the role data property. However, this means that credit items should be managed as
individually identified entities in the broadcaster's database, which exists but is not
necessarily common practice. Best practice would dictate that a forthcoming version
of TV-Anytime contain an optional identifier per credit item.
– TV-Anytime relations such as MemberOf, AggregationOf, or RelatedMaterial are directly
eligible to become object properties. It must be noted that several of these relations have
their inverse also defined, which is another important feature in support of semantic
models.
– IDRef relationships (from the XML Schema) are also implicit object properties for which
better names can be found.
– XML implicit relationships: for example, ScheduleEvent, an element of the Schedule
complex type, would need to be given a proper identifier to become a ScheduleEvent
class, for which an object property such as HasScheduleEvent would be used to create an
association with the Schedule class.
– As far as data properties are concerned, the transformation is rather straightforward,
with the exception that reusable complex-type structures should be replaced by flat
structures directly referring to a class. This, again, is to avoid blank nodes.
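Applying these rules, the formerly implicit ScheduleEvent could be given its own
identifier and object property, sketched below in the Turtle style of > Fig. 21.10 (the
hasScheduleEvent and forProgramme names are illustrative, not part of TV-Anytime):

tvshows:schedule_1 a tva:Schedule ;
    tva:hasScheduleEvent tvshows:event_4711 .

tvshows:event_4711 a tva:ScheduleEvent ;
    tva:forProgramme tvshows:102587 ;
    tva:start "2011-04-01T20:15:00"^^xsd:dateTime .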
Starting with the transformation rules mentioned above, it becomes easy to transform
the most significant part of the TV-Anytime model into an ontology written in RDF
(Resource Description Framework) and OWL (Web Ontology Language). As an example,
the statement ‘‘a TV Program has the title ‘Tonight’s Show’’’ could be expressed in RDF/
OWL as shown in > Fig. 21.10.
However, it is not necessarily optimal to work only in RDF/OWL. For example,
cardinalities cannot be managed with the same flexibility as in XML Schema. An option
would therefore be to generate instances in the strongly validated XML environment, and
to transform the results into instances of the equivalent ontology, as shown in
> Fig. 21.11.
The use of an instance template is attractive to users as it hides from them the complexity
of the ontology. However, generating instance templates for complex ontologies such as for
audiovisual services is a challenge. Tools to facilitate this are currently missing.
. Fig. 21.9
Entity relationship diagram of the TV-Anytime class model: classes such as ProgramGroup,
Programme, Segment, SegmentGroup, CreditsItem, Person, Organisation, Schedule,
ScheduleEvent, and Service, connected by object properties such as MemberOf,
AggregationOf, EpisodeOf, DerivedFrom, and RelatedMaterial, and carrying data properties
such as identifiers (TVAID, CRID, segment and group IDs), titles, genres, roles, and
start/end times; CreditsItem and ScheduleEvent lack identifiers of their own in the XML
model
. Fig. 21.10
Example of RDF statement, schema, and instance:

Ontology:
tva:hasTitle a owl:DatatypeProperty ;
    rdfs:domain tva:Programme ;
    rdfs:range xsd:string .

Instance:
tvshows:102587 a tva:Programme ;
    tva:hasTitle "Tonight's Show" .

. Fig. 21.11
Combining XML and RDF/OWL: the data model is expressed both as an XML Schema
(representation, cardinalities, datatypes) and as an ontology (minimum constraints, e.g.,
''some'' and string, with a focus on semantic links); validated metadata instances
produced in the XML environment are mapped, via instance templates, onto instances of the
ontology, which are stored in a database, crawled, and exploited by reasoners and
applications such as search engines and EPGs
The main advantages of ontologies for broadcasters are:
– The simplicity of flat statements about resources
– The scalability to create new classes and properties, for example, for customization or
particular applications, in a backward compatible manner
– The possibility to infer properties
– The flexibility to use new query approaches.
Some of the disadvantages of ontologies for broadcasters are:
– A steep learning curve
– The danger of confusing concepts and misusing, for example, classes and subclasses
– The management of cardinalities
– The nontrivial conversion of XML structures into RDF
– The lack of editing and validating tools
21.2.4.5 A Semantic Representation of Thesauri
Finally, another important part of the TV-Anytime specification is its classification
schemes, such as the controlled lists of genres and roles. The EBU has converted some of
the TV-Anytime classification schemes into SKOS (Simple Knowledge Organization
System); see http://www.ebu.ch/metadata/ontologies/skos/.
SKOS is a vocabulary that is very convenient for representing classification schemes,
with hierarchical object properties such as skos:broader and skos:narrower, and mapping
properties such as exactMatch, narrowMatch, and broadMatch.
As shown in > Fig. 21.12, each term of a classification scheme (or thesaurus) is
independently subject to a series of statements and is no longer part of a hierarchical
XML structure such as those used by MPEG-7, TV-Anytime, or DVB. Nevertheless, the
hierarchical structure can be reconstituted by reasoners, as shown in > Fig. 21.13, and
can also include machine-interpretable statements about mappings to other external
classification schemes. Ontologies and class models like SKOS are the answer to
resolving access to classification scheme terms:
– In MPEG-7, TV-Anytime, or DVB, classification schemes are defined as hierarchical
lists of terms identified by a termID (the access key). Each term has at least a name and
a definition. In the XML space, resolving a URI with a termID into a term name requires
putting in place additional resolving mechanisms (e.g., developing a particular software
interface or API).
– In RDF, an object property will point to the SKOS class called ‘‘concept’’ identified by
its URI (e.g., the classification scheme locator and termID). If the classification scheme
has been imported or can be connected to, all data properties of the concept are
directly accessible. This is the lowest but very demonstrative level of ‘‘linked data.’’ Any
ontology can therefore refer to any term of a SKOS classification scheme.
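By way of example, once such a scheme is loaded into a triple store, resolving a termID
into its label is a one-line SPARQL query; the ebu: namespace URI below is abbreviated
and illustrative:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ebu:  <http://www.ebu.ch/metadata/cs/ebu_ContentGenreCS.skos.xml#>

SELECT ?label ?broader WHERE {
  ebu:3.6.8.16.4 skos:prefLabel ?label ;
                 skos:broader ?broader .
}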
Other mechanisms than SKOS could be defined in order to describe classification
schemes. However, the need for interoperability requires agreeing on a set of well-defined
classes and properties, which SKOS successfully proposes for controlled vocabularies.
ebu:ebu_ContentGenreCS.skos.xml#3.6.8.16.4
    a skos:Concept ;
    skos:note "Valid" ;
    skos:historyNote "2007-04-12" ;
    skos:changeNote "First version" ;
    skos:prefLabel "Dubstep" ;
    skos:narrowMatch <http://www.ebu.ch/cs/tva/ContentCS.xml#3.6.8.16.4> ;
    skos:broader ebu:ebu_ContentGenreCS.skos.xml#3.6.8.16 .
. Fig. 21.12
Extract from EditorialFormatCodeCS
. Fig. 21.13
Screenshot of a SKOS view of EditorialFormatCodeCS using Protégé
21.2.4.6 The Holy Grail: Agreeing on a Class Model
Standardization groups like the European Broadcasting Union (EBU), the International
Press Telecommunications Council (IPTC), and the W3C Media Annotation Working
Group (MAWG) are now paying more attention to the Semantic Web and linked data,
generally starting with the ''SKOSification'' of their classification schemes. More
interestingly, there is also an attempt coordinated by the EBU to define a common basic
class model
for audiovisual content. These are the main classes used by some audiovisual schemas
(among several others):
– BBC ‘‘Programme Model’’ (http://www.bbc.co.uk/ontologies/programmes): brand,
series, episode, program, program item, version (of the program), segment, broadcast
(event), service, channel, broadcaster (role), person
– CableLabs: asset, chapter, distributor, provider, person, actor, director, producer, studio
– EBUCore (http://tech.ebu.ch/docs/tech/tech3293v1_1.pdf), EBU P-META (http://tech.
ebu.ch/docs/tech/tech3295v2_1.pdf) & W3C MAWG: resource, creator, contributor,
publisher, location, collection/group, fragment/part/segment, concept (classification),
person, organization
– ETSI TV-Anytime: program group (including brand, series, etc.), program, segment
group, segment, service, schedule, location (broadcast event, schedule event,
on-demand program, push download program), person, organization, concept
(classification)
– FRBR: work, expression, manifestation, item, person, corporate body, event, place,
concept, object
– IPTC NewsML-G2 (http://www.iptc.org/cms/site/index.html?channel=CH0111):
news item, part, creator, contributor, person, organization, concept
– ISO/IEC MPEG-7: audiovisual segment, video segment, audio segment, text segment,
segment group, audiovisual region, fragment, collection, agent, person, organization,
place, event, object
– PBCore: resource, creator, contributor, publisher, concept (classification)
As can be seen from the examples above, nothing but a lack of willingness should prevent
a minimum of harmonization. To be finalized, this model will also require detailed
semantics for every class. Furthermore, several classes are eligible to become
subclasses. Of course, the model can be complemented with user-defined classes, or a
user can utilize only a subset of the classes defined above.
21.2.4.7 Agreeing on Properties
A first level of interoperability is achieved by defining a common set of classes. The effort
needs to be repeated on properties. There are two main types of properties in semantic
modeling:
– Object properties defining relations between classes/objects. EpisodeOf,
AggregationOf, and MemberOf, shown in > Fig. 21.9, are very explicit examples.
– Data properties qualifying a class/object, of which typical examples are ‘‘Title,’’ ‘‘Iden-
tifier,’’ ‘‘Description.’’
Properties must be selected carefully. The most important criterion is to define the
properties on which queries will be run: what is it that users will, or should, be
looking for? The second criterion is the definition of inverse and transitive properties,
which, by inference, enrich the number of triples in the stores being queried,
thereby maximizing the chances of positive query hits.
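A brief sketch of this second criterion (property names illustrative): declaring an
inverse once lets a reasoner materialize the reverse triples automatically.

tva:episodeOf owl:inverseOf tva:hasEpisode .
tvshows:102587 tva:episodeOf tvshows:series_42 .
# a reasoner now also infers:
# tvshows:series_42 tva:hasEpisode tvshows:102587 .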
In a linked data environment, the third criterion is to reuse existing ontologies
defining classes and properties, such as FOAF (Friend of a Friend) for persons and
contacts. Of course, the choice of such links to existing ontologies should not prevail
over the efficiency of a solution developed for a particular application within a
specific ecosystem. All that matters is the interoperability requirement, which may vary.
Linked data also raise issues of persistence and (e.g., editorial) quality. While in XML
agreeing on properties is more problematic, because the model is often closely tied to a
particular application, in RDF properties and classes can complement those in an existing
ontology. > Figure 21.14 shows a possible high-level class model encompassing the
commonalities of the schemas listed above. This may be a first step toward a
harmonization of audiovisual content metadata.
. Fig. 21.14
A unified class model? A possible high-level class model encompassing, among others:
Programme (resource, work, asset — e.g., a TV/radio programme, movie, or tune) and
ProgrammeGroup (collection, mini-series, series, concept, show, theme); Item (segment,
e.g., an edited news item or part), ItemGroup (related items), and ItemBlock
(time-related and ordered items); MediaObject (physical clip, newsfeed, rushes, shot,
background music, fragment) with audio and video tracks; EditorialObject (dope sheet,
script, report); Version/Manifestation and Instance (work); delivery-related classes such
as Service, Brand, Channel, Schedule, BroadcastEvent, Catalog (VoD, catch-up TV), Portal,
Location, and PushDownload; descriptive classes such as Agent, Person, Organisation, Role
(contributor, credits), Event, and Concept (classification schemes); and production-side
classes such as Camera and Lens with their parameters
21.3 Example Applications
In this section, two major domains for the technologies and tools of semantic multimedia
are introduced with examples of their application: television and cultural heritage.
> Section 21.3.1 is partially based on [69] with acknowledgment to the coauthors.
21.3.1 Semantic Television
Television has traditionally meant the broadcast industry on the one hand, and on the
other now incorporates the growing Web-based video domain, which is converging with
classical broadcast TV in the end device. In this domain, the main atomic object for
semantic description is the individual TV program (or video item), while there may be a
further, higher-level description of the structure of those programs (EPG metadata, or a
video playlist). The main challenges in the television (and, by extension, Web video)
domain are the scale of the content available and the need for filtering and
personalization of the content. The NoTube project (notube.tv) considers three
particularly representative scenarios for future television enabled by semantic
technology:
(a) The RAI demonstrator shows how news programs can be enriched with concepts
(people, places, themes) that allow personalized delivery of a news stream and easy
browsing to additional information. This demonstrator focuses on the value of
passive personalization of on-demand TV content.
(b) The Stoneroos demonstrator enables a user to create an interests profile in a simple
fashion, which can be used to generate TV program recommendations within their
personal EPG. This demonstrator focuses on the value of a multi-platform and
multilingual platform for personal content and ads.
(c) The BBC demonstrator shows how TV can be personalized using Social Web data,
facilitating a personalized TV experience without an intrusive user profiling process.
This demonstrator focuses on the value of active personalization of TV which is
integrated with the user’s Social Web activities.
To illustrate what is envisaged more generally by semantic TV, Jana is introduced as an
example future user of the NoTube infrastructure. She is socially active on the Web and
does not see the need to explicitly define her preferences or wait until she has used the
recommender system long enough for it to learn her preferences. In the first use case,
Jana’s recommendations are generated based on her online social activity. In the second
use case, Jana is interested in a program and uses the ''I would like to know more''
button. Jana then gets information about this program, which contains links to Wikipedia,
IMDB, or other online information sources. Next to this, she also gets recommendations of
related programs. With the ''why'' button, Jana can see why each program has been
recommended to her. Since enriched TV program descriptions are used, the reasons
for recommendations are often based on interesting semantic relations between entities.
For example, when Jana is watching an episode of ‘‘True Blood,’’ this makes her curious
about the series, so she picks up her smartphone to find out more about it using the
NoTube application. When she presses the ''I want to know more'' button, the Wikipedia
page is shown as well as some recommendations. One of the recommendations is the pilot
of the series ''Six Feet Under,'' which she already knows. She is curious about the
reason for the recommendation, so she presses the ''why?'' button next to it and sees
that both series were created by Alan Ball and that they share two genres: ''black
comedy'' and ''drama.'' She is happy to learn that the two series were created by the
same man and continues by
looking up information about Alan Ball.
The open NoTube TV and Web infrastructure is illustrated in > Fig. 21.15. The front
end, which is what the user sees, is any device connected to the Internet and able to
consume NoTube services, whether a TV, PC, or mobile device, including a so-called
second screen (where a smaller mobile device presents auxiliary content in
synchronization with the TV signal on the larger screen device). The Application Logic
implements the workflow of data and processes that realize the NoTube service to the
end device. It relies for this on the Middleware and Broker layer, which makes use of
Semantic Web Service technology to dynamically discover and resolve adequate, available
services into more complex service workflows.
. Fig. 21.15
NoTube open TV and Web infrastructure

To enable this, sets of services for users, metadata, and TV content are developed,
which are described semantically and mediated by the broker; in the front end, specific
applications are built on top of those services to provide the desired functionalities,
for example, user activity capture and content recommendation. This section considers,
from the user services, the Beancounter service, which harvests user
data from online social profiles, which are used to generate a semantic user-interests
profile. On the basis of that interests profile, TV program descriptions are analyzed and
recommendations are made to the viewer. From the content services, the Data Warehouse
service collects and enriches EPG data. Existing EPG harvesting services, such as XMLTV,
are used to obtain EPG data. The descriptions of TV programs are then enriched by
metadata services, such as the Named Entity Recognition service, which identifies entities
from the linked data cloud. The NoTube vocabulary alignment service identifies links
between concepts of different vocabularies.
21.3.1.1 User Activity Capture
The huge number of interactions that a user performs on the Web represents a powerful
and extraordinary source for mining his or her interests and preferences. Although
several attempts have already been made to infer a user's profile from his or her
behavior on the Web [121–123], the NoTube approach focuses on the opportunities
unleashed by the so-called Social and Real-time Web, where users discover, consume, and
share content within their social graph, in a real-time manner, often using mobile
devices. In such environments, each user produces a rich and machine-readable flow of
activities from which implicit and explicit information regarding his or her preferences
can be extracted.
This scenario considers a generic user who holds at least two accounts on different
social Web applications: Last.fm and Glue.com. Last.fm tracks the user's listening,
sharing, and live-event activities. Glue.com acts as a user log, realized as a browser
plug-in: it makes available, through a set of Web APIs, an exhaustive set of the Web
resources the user visited, shared, or liked, enriching them with an internal
categorization.
The following sections show how data are aggregated from these sources, linked to
information in several linked data clouds, and how reasoning over the data can make
explicit the user’s interests in a user profile. The information aggregation is achieved
through identity resolution made against different ontologies.
To uniformly represent user activity data from different sources in a single graph, the
Atom Activity Streams in RDF vocabulary (http://xmlns.notu.be/aair/) is used to
represent user activities. To determine the objects of activity, a named entity
recognition service is used. An alignment service is used to link the objects of
different vocabularies; for example, Last.fm artists are linked to DBpedia entities and
the BBC Music catalog (http://www.bbc.co.uk/music). Vocabularies are defined for
TV-related user activities, that is, verbs such as ''watching,'' ''reviewing,'' and
''rating.''
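A sketch of a single activity in such a graph is given below; the aair: terms follow the
Atom Activity Streams in RDF draft cited above, but both they and the remaining names
should be read as illustrative rather than normative:

:act123 a aair:Activity ;
    aair:activityActor :jana ;                                     # the user
    aair:activityVerb :listenedTo ;                                # verb vocabulary term
    aair:activityObject <http://dbpedia.org/resource/Radiohead> .  # resolved via NER/alignment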
The data generated by the collection and enrichment process form a potentially
huge set of activities for each individual user. The challenge is to derive general user
interests from this set of user activities. This is done by using the DBpedia SKOS
vocabulary, which is the semantic counterpart of the Wikipedia categories. If a user
listens to bands or musicians sharing the same subject, then it could be reasonable to infer
that the subject represents an interest for that user. Moreover, the rich and complex SKOS
hierarchy of DBpedia allows one to extract a lot of other interesting information. For
example, if a user is particularly interested in movies where a particular actor or actress
appeared, more information will be available about this in the system, since it is highly
probable that DBpedia contains some SKOS subjects describing this. Similarly, if a user
listens to bands originating from a specific geographical region then this could be useful to
perform recommendations of other bands and artists.
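The inference sketched here can be approximated by a query of the following shape (the
profile predicates are illustrative; dcterms:subject is the property DBpedia uses to link
resources to their Wikipedia categories): count, per category, the distinct artists the
user has listened to, and treat sufficiently frequent categories as interests.

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX :        <http://example.org/profile#>

SELECT ?category (COUNT(DISTINCT ?artist) AS ?n) WHERE {
  :jana :listenedTo ?artist .            # aggregated activity data
  ?artist dcterms:subject ?category .    # DBpedia category links
}
GROUP BY ?category
HAVING (COUNT(DISTINCT ?artist) >= 3)    # naive interest threshold
ORDER BY DESC(?n)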
21.3.1.2 Enriched EPG Data
In general, EPG data are produced by broadcast companies. Some broadcast companies,
such as the BBC, have made their EPG data machine-readable and publicly available.
Other EPG data are harvested from websites using existing tools, such as XMLTV, and
converted to a machine-readable format.
Enrichments of EPG metadata are used to provide the end users with extra information
about the content in which they are interested. For example, a scheduled broadcast of a movie
could have an enrichment that enumerates the main actors together with pointers to
IMDB. Recommender algorithms that fall into the category of content-based filtering
use content descriptions of the items for determining their relevance to users. To give
a simple example, when a user often watches content annotated with the Western concept,
then other content annotated with the same or related concepts may be interesting to him
or her.
In the NoTube project, broadcast data are enriched using the linked data cloud, by
linking existing data sources, for example, DBpedia (subject data), YAGO (data about
people), and IMDB (data about movies), to broadcast metadata. By enriching the EPG
data, links to semantic entities in the linked data cloud are added to the metadata of TV
programs. The interconnected entities in the linked data cloud allow for finding
interesting relationships between entities, for example, that two movies have been made
by people who share an interest in film noir. The relationship between entities is often
typed, for example, by SKOS relationships. Since not all relationships between entities
are considered interesting, the types of the relationships must be taken into account
during the recommendation process. This can be done by using relationships of specific
patterns and/or assigning a certain weight to specific relations.
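An enriched EPG entry might then carry triples of the following kind (all property names
illustrative; the DBpedia URIs stand for the entities identified during enrichment):

:broadcast_991 a :Broadcast ;
    :broadcastOf :movie_17 .

:movie_17 :genre <http://dbpedia.org/resource/Western_(genre)> ;
    :directedBy <http://dbpedia.org/resource/John_Ford> .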
21.3.1.3 Alignment Between Vocabularies
The data sources described above are often already annotated with a fixed set of
concepts. For example, EPG data from the BBC (http://www.bbc.co.uk/programmes) are
annotated with a BBC-defined genre hierarchy, and IMDB categorizes TV series and films
into a similar set of genres. Vocabularies can be domain-independent; for example,
Princeton WordNet (http://wordnet.princeton.edu) provides a set of lexical concepts that
match words (e.g., in English) and provides semantic relations between those words. Such
vocabularies can be used to annotate domain-specific data. For example, the description
of a movie could be a set of WordNet concepts. For some datasets, the Semantic Web
community has already converted the vocabularies and schemas into RDF, like the BBC
Programmes ontology (http://bbc.co.uk/ontologies/programmes), the TV-Anytime
schemas and vocabularies, and W3C WordNet. To cover multiple perspectives, additional
(extracted) genre vocabularies for sources like YouTube are also created.
21.3.1.4 Personalized TV Program Recommendation
Given the availability of the semantically enriched EPG/TV program data, the semanti-
cally enriched user activity and interests profile, and the alignment between different
vocabularies, the NoTube component Beancounter is able to process the combination of
these data to provide a personalized TV program recommendation. The recommendation
strategy in NoTube takes a content-based approach, in which the closeness between
concepts in a classification scheme (e.g., the DBpedia categorization model) is used to
provide a weighting of the topics with which a program is annotated with respect to the
set of topics in the user's profile. To detail this further:
1. Identify weighted sets of DBpedia resources from user activity objects.
2. Compute the distance between DBpedia concepts in the user profile and in the
program schedule through a SKOS-based categorization scheme.
3. Choose the matches above a certain threshold for TV program recommendation.
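A rough sketch of steps 1–3 as a single scoring query is given below (schema and
threshold illustrative; a real deployment would also fold the SKOS-based distance of
step 2 into the weights):

PREFIX : <http://example.org/notube#>

SELECT ?programme (SUM(?w) AS ?score) WHERE {
  ?programme a :TVProgramme ;
             :hasTopic ?topic .      # DBpedia resources from EPG enrichment
  :janaProfile :interest ?i .
  ?i :topic ?topic ;
     :weight ?w .                    # weights from the user-interests profile
}
GROUP BY ?programme
HAVING (SUM(?w) > 0.5)               # step 3: recommendation threshold
ORDER BY DESC(?score)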
As a result, it should be possible to highlight, in the user's EPG, the TV programs that
should interest them, and also to provide some explanation for the recommendation, as
shown in the mock-up of a personalized EPG below (> Fig. 21.16). The same technologies
are used in the back end to enable the other NoTube scenarios, such as a personalized
news stream or pushing personalized advertising.
21.3.2 Semantics in Cultural Heritage
Objects and content in cultural heritage (CH) are both textual and non-textual,
interrelated with each other in various ways, and produced by many different
organizations and individuals. As a result, producing, harvesting, aggregating,
publishing, and utilizing cultural heritage content on the Web is difficult in many ways.
In this section,
three major problem areas are covered:
(1) Semantics for cultural heritage. Ontologies and metadata formats are the key
components needed in representing CH content on the Semantic Web. Rich ontol-
ogies and complexmetadata models coveringmore or less all aspects of human life are
needed for representing the semantics of culture for machines.
. Fig. 21.16
Mocked-up personalized EPG from NoTube project

(2) Content creation challenges. Cultural heritage content is produced in a distributed
creation process by various organizations and individuals from different cultures using
different languages. The content is typically very heterogeneous, both in terms of
metadata formats and vocabularies/ontologies used. However, from the end user’s
viewpoint, content should be accessible seamlessly using different languages and vocab-
ularies from different eras, which means that the content should be made semantically
interoperable.
(3) Semantic eCulture systems. Semantic computing facilitates searching, linking, and
presenting the semantically interlinked, heterogeneous, multi-format, and multilin-
gual CH content. This applies to both professional and layman end users, as well as to
machines using CH repositories through service APIs.
Semantic Web technologies provide new solution approaches to all these areas, and
cultural heritage (CH) has become an important application domain for semantic technol-
ogies. This section presents an overview of issues and solution approaches related to
representing ontologies and metadata of cultural heritage, to creating syntactically and
semantically interoperable content, and to creating intelligent end-user applications on the
Semantic Web.
In journalism and multimedia, content is often collected, described, and searched in
terms of the ‘‘Five Ws and one H’’:
● Who? Who was involved?
● What? What happened and what was involved?
● Where? Where did it take place?
● When? When did it take place?
● Why? Why did it happen?
● How? How did it happen?
In the following, properties of semantic cultural content are first discussed along
these ontological dimensions.
21.3.2.1 Ontological Dimensions
To answer the who-question, vocabularies, authority files, and ontologies of persons,
organizations, and fictive actors have been created. The problems of identifying and
describing, for example, persons are well known in the library domain [69]: similar
names are shared by many individuals (e.g., John Smith), names change over time (e.g.,
upon marriage), names are transliterated differently in different languages, and people
use pseudonyms and are known by nicknames. An example of an extensive authority system
is the Library of Congress Authority Files (http://authorities.loc.gov). The Union List
of Artist Names (ULAN) (http://www.getty.edu/research/conductingresearch/vocabularies/ulan/)
of the Getty Foundation is widely used in cultural institutions and Semantic Web systems.
The what-question involves both events that take place and tangible objects that
participate in events. Events are a central category in knowledge representation of artificial
intelligence (AI) [70], and have been employed in semantic cultural heritage systems, too.
Events form the core of the CIDOC CRM system [71], an ontological system for
harmonizing cultural heritage and library content. By describing what actually is hap-
pening in the real world, heterogeneous content can be harmonized and made interop-
erable in a deeper semantic sense [73]. In ontologies, such as DOLCE [72] and WordNet
[74], events are separated from other ontological concepts. Events remain a difficult
concept to represent through an ontology as they are complex and necessarily involve
many other concepts in a particular relationship to one another.
For representing tangible objects, cultural heritage thesauri such as the Art and
Architecture Thesaurus (AAT) are available. These thesauri make it possible to identify
and disambiguate object types from each other and harmonize references to them.
Additional ontological descriptions are needed for more refined semantic descriptions
of objects. One dimension here is the structure of the object. This involves, for
example, describing various part-of relations [74], such as area inclusion, member-of,
made-of, and consists-of relations. For example, a material ontology can be used for
describing the materials of which objects are made. A consists-of relation may describe
the composition of objects, for example, that legs are parts of chairs. The function of
objects is also often important to know, for example, that ships are used for sailing.
Such relations are needed, for example, when aggregating related information and objects
together in search and recommender systems.
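Rendered in RDF, such descriptions could look roughly as follows (all names illustrative):

:chair_17 a :Chair ;
    :consistsOf :leg_17a , :seat_17b ;   # structural composition
    :madeOf :oak ;                       # term from a material ontology
    :usedFor :sitting .                  # function of the object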
A research area of its own is creating and searching 3D representations of
cultural artifacts and buildings [112]. For example, there are 3D models of CH buildings
and cities, such as the virtual Kyoto [75], using platforms such as Second Life and Google
Earth.
The where-dimension in the CH domain is challenging, because one has to deal with
not only modern geographical places, available in resources such as GeoNames and
various national geographical gazetteers, but also with historical places that may not
even exist today. The Getty Thesaurus of Geographic Names (TGN) is a resource in which
many historical places can be found. A problem in dealing with historical places is that
cultural content is typically indexed (annotated) using historical names (e.g., Carthage)
but can be queried using names from different time periods (e.g., modern geography),
too. To address the problem of changing boundaries and names of historical places,
a model and ontology is presented in [76]. In a more general setting, the concept of
places is complex and involves not only geographical information. For example, what does
‘‘Germany’’ actually mean in terms of location, time, and culture?
Time and the when-question are of central importance in the cultural heritage domain
that deals with history. A special problem of interest here is that time periods are often
imprecise in different ways [77, 78]. Firstly, the time may be exact but not known. For
example, an artifact may have been manufactured on a certain day, but it is not known
exactly when; only an estimate of the year, decade, or century may be known. This kind of
uncertainty of time can be modeled, for example, using time intervals and probability
theory. Secondly, the time may be fuzzy in nature. For example, a castle may have been
built during a longer period (or periods) of time with different intensity, so it is not
possible to state exactly when it was actually built. A modeling option here is to use fuzzy
sets for representing time. It should be noted also that time periods may not be absolute
but are conditioned by places. For example, the notion of the ‘‘bronze age’’ and stylistic
periods of art, for example, ‘‘art nouveau,’’ may be different in different countries and
cultures.
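A common representation pattern for such estimates is sketched below with CIDOC
CRM-style properties, where P82a/P82b give the outer bounds of an imprecise time-span
(exact identifiers vary between CRM RDF encodings; instance names illustrative):

:production_ev a crm:E12_Production ;
    crm:P4_has_time-span :ts_1 .

:ts_1 a crm:E52_Time-Span ;
    crm:P82a_begin_of_the_begin "1800-01-01"^^xsd:date ;   # earliest possible start
    crm:P82b_end_of_the_end "1899-12-31"^^xsd:date .       # latest possible end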
From a machine viewpoint, formal time representation can be used for reasoning, like
in the interval calculus [79], and when matching query time periods with indexing time
periods. From the human–computer interaction viewpoint, a key question is how one
perceives uncertain time intervals in information retrieval, that is, when querying with
an imprecise time period. For example, when querying on ''the middle ages,'' what time
periods should be included in the answer set, and how relevant are they?
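One simple, illustrative way to match a query period against indexing periods is to score each indexed interval by its overlap with the query interval. The Python sketch below assumes crisp year intervals and a particular (assumed, not standard) normalization by query length.

```python
# A sketch of one simple relevance score for time-period matching:
# the ratio of overlap length to query length.
def overlap_score(query, indexed):
    """query, indexed: (start_year, end_year) tuples; returns 0.0-1.0."""
    q0, q1 = query
    i0, i1 = indexed
    overlap = max(0, min(q1, i1) - max(q0, i0))
    return overlap / (q1 - q0) if q1 > q0 else 0.0

# Querying 'the middle ages' (taken here, for illustration, as 500-1500):
middle_ages = (500, 1500)
print(overlap_score(middle_ages, (1200, 1600)))  # 0.3: partial overlap
print(overlap_score(middle_ages, (800, 900)))    # 0.1: fully inside, but short
```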
The question ‘‘why’’ has not been addressed much in CH systems for the Semantic
Web. There are, however, several approaches to this. Firstly, it is possible to model causal
chains explicitly in annotations. For example, the history ontology HISTO (http://
www.seco.tkk.fi/ontologies/histo/) contains some 1,200 historical events, some
of which are related to each other using causal chains. Explicit links, or transitive link
chains based on them, can then be shown to the end user, illustrating the why-dimension.
A problem here is that there may be disagreements about historical causality and other
facts between the historians creating the ontologies or annotating the content. Indeed, it is
not uncommon that knowledge in the humanities is based on differing opinions. Metadata
can then be used for encoding different opinions, for example, about what caused
World War II. Secondly, on a reasoning level, implicit relations between related objects of
interest can be explained based on the rules used during reasoning, as is customary in some
expert systems of artificial intelligence research. For example, in [80], the semantic
recommendation links between related artifacts are produced using Prolog rules, and
a simple explanation of the reasoning chain in natural language is exposed to the end user.
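The following sketch illustrates the first approach with rdflib: a hypothetical ex:causes property links events, and a transitive traversal renders the chain as a simple explanation for the end user. The namespace, property, and event names are illustrative only.

```python
# A sketch of exposing a causal chain: follow a hypothetical ex:causes
# property transitively and render the chain as an explanation.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/histo/")  # hypothetical namespace
g = Graph()
g.add((EX.TreatyOfVersailles, EX.causes, EX.EconomicCrisis))
g.add((EX.EconomicCrisis, EX.causes, EX.PoliticalUnrest))

def causal_chain(graph, start):
    """Follow ex:causes links from a start event, guarding against cycles."""
    chain, node = [start], start
    while True:
        nxt = graph.value(node, EX.causes)  # first successor event, if any
        if nxt is None or nxt in chain:
            return chain
        chain.append(nxt)
        node = nxt

print(" -> ".join(e.split("/")[-1] for e in causal_chain(g, EX.TreatyOfVersailles)))
# TreatyOfVersailles -> EconomicCrisis -> PoliticalUnrest
```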
Semantic CH systems have the potential to address the why-question by exposing and
presenting cultural content in novel ways that help one in understanding cultural
phenomena and processes. Semantic techniques and computing can be used as a tool of
research for making explicit something useful or new that is only implicitly present in
a repository of cultural content. At this point, one enters the field of digital humanities
[81]. For example, the idea of associative or relational search, developed for security
applications [82], can be used as an approach to answer why-questions. In [83], for
instance, one can query how two persons are related to each other based on the social network
of the ULAN registry of historical persons. Relational search is also available in [84].
Finally, the how-question addresses the problem of describing how things happen(ed).
By chaining and relating events with each other and by decomposing them into subevents,
the semantics of narrative structures such as stories can be represented. For example, in
[85], modeling CH processes and stories using RDF(S) is discussed. In [83], the narrative
structures of the epic Kalevala, as well as the processes of making leather boots, making
ceramics, and farming, have been modeled as narrative structures and interlinked with
related CH contents, such as objects in museum collections.
21.3.2.2 Challenges of Content Creation
Cultural heritage content is available in various forms, is semantically heterogeneous, is
interlinked, and is published at different locations on the Web. From the end user’s
viewpoint, it would be useful if content related to some topic, person, location, or other
resource could be aggregated and integrated, in order to provide richer and more
complete, seamless views of the contents. This is possible only if interoperability of
heterogeneous content can be achieved on both the syntactic and semantic levels.
There are two major ways to address interoperability problems: one can either try to
prevent them during original content creation or one can try to solve the problems
afterward when aggregating content and creating applications. Preventing interoperability
problems is the goal of various efforts aiming at developing standards and harmonized
ways of expressing content, metadata, and vocabularies as well as best practices for
cataloging. The process of producing content can be supported by various shared tools,
such as shared ontology services and metadata format repositories.
Although harmonizing content creation would in general be the optimal strategy to
address interoperability issues, this is in practice possible only to some extent. As a result,
considerable post-processing effort is needed to solve interoperability problems afterward,
when making existing non-harmonized content syntactically and semantically interoperable.
21.3.2.3 Syntactic and Semantic Interoperability
Syntactic interoperability means that data are represented in similar formats or structures.
It requires that similar fields are used in metadata structures, and
that their values are filled using similar formats. For example, one may demand that the
name of a person in two interoperable metadata schemas should be expressed using
separate properties ''firstName'' and ''lastName,'' and that the name strings used as
values are transliterated using the same system. For example, the name of Ivan Ayvazovsky,
the Russian painter (1817–1900), has 13 different labels in ULAN (Ajvazovskij, Aivazovski,
Aiwasoffski, etc.), each correctly transliterated in its own way.
Since Semantic Web content is represented using ontologies and metadata schemas,
semantic interoperability issues come in two major forms. First, there is the problem
of schema interoperability, that is, how two different metadata schemas of similar or
different content types can be made mutually interoperable. For example, the ‘‘painter’’
of a painting and the ‘‘author’’ of a novel should somehow be declared semantically related
as creators of a piece of art; otherwise, not all creators can be found and related. It
is also possible that syntactically similar property names in two schemas have different
meanings, which leads to semantic confusion. Second, there is the problem of vocabulary
interoperability. Here, values of a metadata schema field in content from different orga-
nizations may have been taken from different vocabularies that are related, but this
relation has not been explicated. For example, a vocabulary may have the concept of
‘‘chair’’ while ‘‘sofa’’ is used in another one. It is also possible that the same label has
different interpretations in different vocabularies, for example, ''chair'' as a piece
of furniture or as the chairperson of an organization. CH concerns various topic areas,
such as art, history, handicraft, etc. in which different thesauri and vocabularies are used.
Even within a single topic area, different mutually non-interoperable vocabularies may
be used.
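The sketch below illustrates how such vocabulary relations can be explicated with SKOS in rdflib; the two vocabulary namespaces and the concept URIs are hypothetical.

```python
# A sketch of explicating relations between two vocabularies so that 'chair'
# and 'sofa' become semantically related, while the two senses of 'chair'
# remain distinct concepts with their own URIs.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

V1 = Namespace("http://example.org/vocab1/")
V2 = Namespace("http://example.org/vocab2/")
g = Graph()

g.add((V1.chair, SKOS.prefLabel, Literal("chair", lang="en")))
g.add((V2.sofa, SKOS.prefLabel, Literal("sofa", lang="en")))
g.add((V1.chair, SKOS.broader, V1.seatingFurniture))
g.add((V2.sofa, SKOS.broadMatch, V1.seatingFurniture))  # cross-vocabulary link

# The chairperson sense lives under a different URI in the other vocabulary:
g.add((V2.chair_role, SKOS.prefLabel,
       Literal("chair (of an organization)", lang="en")))
```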
There seem to be at least three approaches to obtaining metadata schema interoperability.
First, a minimal ''core'' schema can be specified that defines the common parts of all schemas
in focus. Then, more refined schemas, called applications, can be derived from the core by
introducing new fields and refining original ones. This approach has been adopted by the
Dublin Core (DC) Metadata Initiative. For example, ‘‘date’’ is a DC element that can further
be specified as ‘‘date published’’ or ‘‘date last modified.’’ The core elements can be refined or
qualified in an interoperable way by formal expressions, which relate the refinement or
qualification back to the core element, for example, the relationship between a core property
and its refinements can be represented in RDFS using the property rdfs:subPropertyOf. An
example of a DC application is VRA Core for representing metadata about works of visual
culture as well as the images that document them.
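The following sketch illustrates this refinement pattern with rdflib: a hypothetical refined property ex:datePublished is linked to the core element dcterms:date via rdfs:subPropertyOf, so that a SPARQL query phrased against the core element (here using a property path) still finds the refined metadata.

```python
# A sketch of the Dublin Core refinement pattern with rdfs:subPropertyOf.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS, DCTERMS

EX = Namespace("http://example.org/schema/")  # hypothetical application schema
g = Graph()
g.add((EX.datePublished, RDFS.subPropertyOf, DCTERMS.date))
g.add((EX.doc1, EX.datePublished, Literal("1925")))

# Query against the core element; the * path also follows refinements.
q = """
SELECT ?doc ?value WHERE {
  ?p rdfs:subPropertyOf* dcterms:date .
  ?doc ?p ?value .
}"""
for row in g.query(q, initNs={"rdfs": RDFS, "dcterms": DCTERMS}):
    print(row.doc, row.value)  # ex:doc1 1925
```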
Second, it is possible to define a harmonizing ontology or schema that is capable of
representing all metadata schemas to be integrated. Semantic interoperability on a schema
level is then obtained by transforming the metadata of different forms into this harmo-
nized ontology. A well-known example of this is the CIDOC Conceptual Reference Model
(CIDOC CRM) [3], the ISO standard 21127:2006. This model provides definitions and
a formal structure for describing the implicit and explicit concepts and relationships used in
cultural heritage documentation. The framework includes 81 classes, such as crm:Man-
MadeObject, crm:Place, and crm:Time-Span, and a set of 132 properties relating
events and entities to each other, such as crm:HasTime-Span and crm:IsIdentifiedBy.
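As a rough illustration of such event-centric harmonization, the sketch below describes one museum object using the class and property names quoted above. The crm: namespace URI, the instance URIs, and the linking properties ex:produced and ex:tookPlaceAt are simplified placeholders, not the normative CIDOC CRM encoding.

```python
# A simplified sketch of harmonizing one record into a CRM-style structure.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

CRM = Namespace("http://example.org/cidoc-crm#")  # placeholder namespace URI
EX = Namespace("http://example.org/museum/")
g = Graph()

# The object, its identification, and its place and time of production:
g.add((EX.vase7, RDF.type, CRM["Man-MadeObject"]))
g.add((EX.vase7, CRM["IsIdentifiedBy"], Literal("Attic red-figure vase")))
g.add((EX.athens, RDF.type, CRM["Place"]))
g.add((EX.span7, RDF.type, CRM["Time-Span"]))

# A production event relates the entities; the linking properties below are
# simplified stand-ins for the corresponding CRM properties.
g.add((EX.production7, CRM["HasTime-Span"], EX.span7))
g.add((EX.production7, EX.produced, EX.vase7))
g.add((EX.production7, EX.tookPlaceAt, EX.athens))
```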
Third, it is possible to transform all metadata into a knowledge representation about
the events of the world, as is customary in AI. This approach involves developing and using
domain ontologies/vocabularies for representing events and objects, aspects not considered
in the CIDOC CRM standard, which focuses on schema semantics [72].
A major area of research in semantic CH applications is the (semi)automatic anno-
tation of contents. If contents are described using texts, then named entity, concept, and
relation extraction techniques [87] can be employed first in order to move from the literal
to the concept space. For non-textual multimedia content, for example, images, videos,
speech, and music, problems of crossing the semantic gap have to be addressed [94].
21.3.2.4 Semantic eCulture Systems
A major application type of semantic technologies in the CH domain has been semantic
portals [87]. Examples of such systems include MuseumFinland [80], presenting
artifacts from various museums; the MultimediaN E-Culture demonstrator [84],
presenting art and artists from various museums; CultureSampo [83], presenting virtually all
kinds of cultural contents (objects, persons, art, maps, narratives, music, etc.); CHIP [88],
for personalized mobile access to art collections; and DBpedia Mobile [89], for
mobile access to linked data contents. Systems such as Wikipedia (DBpedia) and Freebase
contain a great deal of semantically linked CH content. In addition to systems utilizing Semantic Web
technologies, there are even more eCulture sites, portals, and applications on the Web
implemented using more traditional technologies. Many of these systems have been reported
since 1997 in the Museums and the Web conference series (http://www.archimuse.com/
conferences/mw.html). A typical eCulture application here is a tailored application for
explaining and teaching a particular CH topic with a nice graphical Flash-based user interface.
In this section, research on semantic eCulture portals is described that focuses on publishing
CH content from the collections of museums, libraries, archives, media organizations, and
other sources on the Semantic Web. A common goal in such CH portals is to create a global
view over CH collections that are distributed over the Web, as if the collections were a single
uniform repository. This idea, developed originally in some national research projects, has also
been adapted on an international level in projects such as The European Library and Europeana.
These large-scale systems are, however, still based on traditional rather than semantic technologies.
There is, though, a demonstration of Europeana's semantic search based on
the MultimediaN E-Culture system (http://eculture.cs.vu.nl/europeana/session/search).
In order to survive on the Web, a CH portal should be beneficial to both content
providers and their customers. We describe below an ideal ''business model'' of
a semantic CH portal, based on CultureSampo [83], clarifying the challenges and benefits
of utilizing semantic technologies in CH portals.
There are two major categories of challenges involved when creating a CH portal: semantic
and organizational. First, semantic challenges arise from the fact that cultural heritage content
is semantically heterogeneous and available in various forms (documents, images, audio
tracks, videos, collection items, learning objects, etc.), concerns various topics (art, history,
handicraft, etc.), is written in different languages, and is targeted at both laymen and experts.
Furthermore, the content is semantically interlinked, as depicted in > Fig. 21.17.
Second, organizational challenges arise from the fact that memory, media, and other
organizations and citizens that create the contents work independently according to their
own goals and practices, as illustrated in > Fig. 21.18. This freedom and independence of
publication is essential and empowers the whole Web, but it also results in redundant work
in content creation and means that interoperability of content between providers cannot be
achieved easily. For example, redundant information about cultural persons, places,
historical events, etc., has to be collected and maintained in many organizations because
of the lack of collaboration between them. Each organization will have its own
database/metadata schema, which cannot be changed.
The Semantic Web–based solution approach to these problems is illustrated in
> Fig. 21.19, using elements of > Figs. 21.17 and > 21.18.
. Fig. 21.17
Semantic challenges of CH portals: cultural heritage content comes in many forms (videos, maps, artifacts, encyclopedias, narratives, literature, music, fine arts, biographies, cultural sites, buildings) and is interlinked
. Fig. 21.18
Organizational challenge of CH portals: content is produced by independent organizations and individuals for their own purposes with little collaboration
The apparatus produces
harmonized RDF content for a global knowledge base. In the center are the ontologies
forming a conceptual backbone of the system. The collection items around the ontologies
are attached to the ontologies by metadata that is produced in terms of harmonized and
interlinked metadata schemas and vocabularies. The content providers depicted around
the circle, that is, the portal system, publish metadata locally and independently by using
shared metadata schemas and ontologies. The result is a large global semantic RDF
network linking different contents together in ontologically meaningful ways. When an
organization or an individual person submits a piece of (meta)data into the system, the
new data get automatically semantically linked to related materials, that is, semantically
enriched. At the same time, all related materials get enriched by references to the new piece
of knowledge, and through it to other contents. The collaborative business model works,
because each additional piece of knowledge is (in the ideal world) beneficial to everybody
participating in the system. An additional benefit is that content providers can share
efforts in developing the ontology infrastructure, reducing redundant work.
> Figure 21.19 shows that a semantic CH portal is far more than the portal pages, as seen
by the customer on the Web. Firstly, a collaborative ontology infrastructure is needed. This
includes a set of cross-domain ontologies (covering artifacts, places, actors, events, time, etc.),
ontology alignments [90], and a selection of metadata schemas and their alignments.
. Fig. 21.19
Solution approach – harmonized, distributed production of content, linked together into
a global RDF knowledge base
Secondly, a content production and harvesting system is needed for the content providers
and the portal for producing and maintaining the content. The most important question
here is at what point semantic content is produced: during cataloging by the content
providers or afterward when harvesting the content at the portal. The choice depends
on the case at hand, but in general high-quality semantic content can be produced best at
the organizations producing the content, and shared tools supporting this using the
underlying ontology infrastructure can be very useful. For example, in CultureSampo
the national FinnONTO infrastructure [91] with its ontology services is used as a basis.
Semantics can be used to provide end users, both humans and machines, with
intelligent services for finding, relating, and learning the right information based on their
own preferences and the context of using the system. Major functionalities of human user
interfaces include:
● Semantic search, that is, finding objects of interest
● Semantic browsing, that is, linking and aggregating content based on their meaning
● Visualization, that is, presenting the search results, contents, and browsing options in
useful ways
In the following, these possibilities of providing the end users with intelligent services
are briefly explored.
21.3.2.5 Semantic Search
On the Semantic Web, search can be based on finding the concepts related to the
documents at the metadata and ontology levels, in addition to the actual text or other
features of the data. With such concept-based methods, document meanings and queries
can be specified more accurately, which usually leads to better recall and precision,
especially if both the query and the underlying content descriptions are concept-based.
In practice, semantic search is usually based on query expansion, where a query concept is
expanded into its subconcepts or related concepts in order to improve recall. For example,
the query ‘‘chair’’ could find ‘‘sofas’’ too, even if the word ‘‘chair’’ is not mentioned in the
metadata of sofas. However, care must be taken when expanding queries so that precision
of search is not lost. For example, the underlying ontological hierarchies, such as a SKOS
vocabulary, may not be transitive, leading to problems. For example, if the broader concept
of ‘‘makeup mirrors’’ is ‘‘mirrors,’’ and the broader concept of ‘‘mirrors’’ is ‘‘furniture,’’
then searching for furniture would return makeup mirrors, if query expansion is applied.
Part-of relations are especially tricky in terms of query expansion. When searching for
chairs in Europe, chairs from individual European countries should also be found. However,
''doors'' should not be returned when searching for ''buildings,'' even though doors are parts of
buildings.
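A minimal sketch of guarded query expansion is given below: narrower concepts are collected from a SKOS hierarchy only down to a given depth, so that questionable transitive steps (such as reaching makeup mirrors from furniture) are not followed blindly. The vocabulary and the depth limit are illustrative.

```python
# A sketch of depth-limited query expansion over skos:narrower.
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/vocab/")  # illustrative hierarchy only
g = Graph()
g.add((EX.furniture, SKOS.narrower, EX.chair))
g.add((EX.furniture, SKOS.narrower, EX.mirrors))
g.add((EX.chair, SKOS.narrower, EX.sofa))
g.add((EX.mirrors, SKOS.narrower, EX.makeupMirrors))

def expand(graph, concept, max_depth=1):
    """Collect concept plus narrower concepts down to max_depth hops."""
    frontier, result = {concept}, {concept}
    for _ in range(max_depth):
        frontier = {n for c in frontier for n in graph.objects(c, SKOS.narrower)}
        result |= frontier
    return result

print(expand(g, EX.furniture, max_depth=1))
# {furniture, chair, mirrors} -- makeup mirrors not reached at depth 1
```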
A problem of semantic search is mapping the literal search words, used by humans, to
underlying ontological concepts, used by the computer. Depending on the application,
only queries expressed by terms that are relevant to the domain and the available content
are successful; other queries result in frustrating ''no hits'' answers. A way to solve the
problem is to provide the end user with explicit vocabularies as facets in the user interface,
for example, a subject heading category tree as in Yahoo! and dmoz.org. By selecting
a category, related documents are retrieved. If content in semantic search is indexed using
language-neutral concept URIs, and their labels are available in different languages,
multilinguality can be supported.
A widely employed semantic search and browsing technique in semantic CH portals is
view-based or faceted search [95–99]. Here, the user can make several simultaneous
selections from orthogonal facets (e.g., object type, place, time, creator). The facets are exposed
to the end user in order to (1) provide him or her with the right query vocabulary and
(2) present the repository contents, the search results, and the number of hits in
facet categories. The number of hits resulting from a category selection is always shown to
the user before the selection. This eliminates queries leading to ‘‘no hits’’ dead ends, and
guides the user in making the next search selection on the facets. The result set can be
presented to the end user according to the facet hierarchies for better readability. This is
different from traditional full text search where results are typically presented as a hit list
ordered by decreasing relevance. Faceted search is not a panacea for all information
retrieval tasks. A Google-like keyword search interface is usually preferred if the user is
capable of expressing his or her information need accurately [101].
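The hit counts that drive this behavior can be computed with a simple aggregation over the current result set, as in the following sketch; the items and facet values are illustrative.

```python
# A sketch of computing the hit counts shown on facet categories: given the
# current result set, count how many items fall under each value of a facet.
from collections import Counter

items = {  # illustrative annotations: item -> facet values
    "teacup":   {"type": "vessel",    "place": "Helsinki", "century": "19th"},
    "chair":    {"type": "furniture", "place": "Turku",    "century": "19th"},
    "painting": {"type": "art",       "place": "Helsinki", "century": "18th"},
}

def facet_counts(result_set, facet):
    """Counts per category; zero-hit categories can then be hidden or greyed."""
    return Counter(items[i][facet] for i in result_set)

current_results = {"teacup", "chair", "painting"}
print(facet_counts(current_results, "place"))
# Counter({'Helsinki': 2, 'Turku': 1})
```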
Faceted search has been integrated with the idea of ontologies and the Semantic Web
[99]. The facets can be constructed algorithmically from a set of underlying ontologies
that are used as the basis for annotating search items. Furthermore, the mapping of search
items onto search facets can be defined using logic rules. This facilitates more intelligent
semantic search of indirectly related items. Methods for ranking the search results in
faceted search based on fuzzy logic and probability theory are discussed in [100].
Another search technique now abundant in semantic CH applications is autocompletion.
The idea here is to search feasible query word options dynamically as the user types in a query,
and to provide the options for him or her to choose from. Semantic autocompletion [102,
103] generalizes this idea by trying to guess, based on ontologies and reasoning, the search
concept that the user is trying to formulate after each input character in an input field, or
even by carrying the search through to the actual search objects dynamically.
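A minimal sketch of the idea is given below: typed prefixes are matched against a label-to-concept index, so that labels in several languages can lead to the same concept URI. The index and labels are illustrative; a real implementation would consult the underlying ontology service.

```python
# A sketch of semantic autocompletion: match the typed prefix against
# concept labels (in any language) and return candidate concept URIs.
labels = {  # illustrative label -> concept-URI index
    "chair": "ex:Chair",
    "chaise longue": "ex:ChaiseLongue",
    "tuoli": "ex:Chair",  # Finnish label for the same concept
    "sofa": "ex:Sofa",
}

def autocomplete(prefix):
    prefix = prefix.lower()
    return sorted({uri for label, uri in labels.items()
                   if label.startswith(prefix)})

print(autocomplete("cha"))  # ['ex:Chair', 'ex:ChaiseLongue']
```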
With non-textual cultural documents, such as paintings, photographs, and videos,
metadata-based search techniques are a must in practice. However, content-based
information retrieval (CBIR) methods [92], focusing on retrieving images, and multimedia
information retrieval (MIR) methods [93], focusing on retrieving multimedia content, can also be
used as complementary techniques. Here, the idea is to utilize actual document features
(at the data level), such as color, texture, and shape in images, as a basis for information
retrieval. For example, an image of Abraham Lincoln could be used as a query for finding
other pictures of him, or a piece of music could be searched for by humming it. Tools for
navigating, searching, and retrieving 2D images, 3D models, and textual metadata have
been developed, for example, in the Sculpteur project (http://www.sculpteurWeb.org).
Bridging the ‘‘semantic gap’’ between low-level image and multimedia features and
semantic annotations is an important but challenging research theme [94].
21.3.2.6 Semantic Browsing and Recommending
The idea of semantic browsing is to provide the end user with meaningful links to related
contents, based on the underlying metadata and ontologies of contents. RDF browsers
and tabulators are a simple form of a semantic browser. Their underlying idea has been
explicated as the linked data principle proposing that when an RDF resource (URI) is
rendered in a browser, the attached RDF links to related resources should be shown. When
one of these links is selected, the corresponding new resource is rendered, and so on.
A more developed and general idea is recommender systems [104, 107, 108]. Here, the
logic of selecting and recommending related resources can also be based on principles other
than the underlying RDF graph. For example, collaborative filtering is based on the browsing
statistics of other users. Logic rules on top of an RDF knowledge base can also be used for
creating semantic recommendation links and, at the same time, explanations telling the end
user why the recommendation link was selected in this context. Recommendations can be
based on a profile of the user's interests and the user's feedback or browsing log [109, 110].
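In the spirit of the rule-based linking described above (though not the actual Prolog rules of [80]), the following sketch shows a single rule that fires when two artifacts share a maker, producing both a recommendation link and a human-readable explanation; the triples are illustrative.

```python
# A sketch of a rule over a triple set that yields recommendation links
# together with natural-language explanations for the end user.
facts = [  # illustrative (subject, predicate, object) triples
    ("cup1", "madeBy", "ArabiaFactory"),
    ("plate9", "madeBy", "ArabiaFactory"),
    ("chair3", "madeBy", "UnknownCarpenter"),
]

def recommend_same_maker(item):
    makers = {o for s, p, o in facts if s == item and p == "madeBy"}
    for s, p, o in facts:
        if s != item and p == "madeBy" and o in makers:
            yield s, f"Recommended because {s} and {item} were both made by {o}."

for target, explanation in recommend_same_maker("cup1"):
    print(target, "-", explanation)
# plate9 - Recommended because plate9 and cup1 were both made by ArabiaFactory.
```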
Semantic recommending is related to relational search, where the idea is to try to
search and discover serendipitous semantic associations between different content items.
The idea is to make it possible for the end user to formulate queries such as ‘‘How is
X related to Y’’ by selecting the end-point resources, and the search result is a set of
semantic connection paths between X and Y [83, 84].
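A minimal sketch of relational search is given below: a breadth-first search over an undirected view of the triple set returns one shortest connection path between two resources. The example triples are illustrative, not actual ULAN data.

```python
# A sketch of relational search: BFS over an undirected view of a triple set,
# returning one shortest connection path between X and Y.
from collections import deque

edges = [  # illustrative relation triples (a ULAN-style social network)
    ("Gallen-Kallela", "studentOf", "Bouguereau"),
    ("Bouguereau", "colleagueOf", "Ferrier"),
]

def find_path(x, y):
    """Return one shortest path [node, predicate, node, ...] from x to y."""
    nbrs = {}
    for s, p, o in edges:
        nbrs.setdefault(s, []).append((p, o))
        nbrs.setdefault(o, []).append((p + "^-1", s))  # traverse both directions
    queue, seen = deque([(x, [x])]), {x}
    while queue:
        node, path = queue.popleft()
        if node == y:
            return path
        for p, nxt in nbrs.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [p, nxt]))
    return None

print(find_path("Gallen-Kallela", "Ferrier"))
# ['Gallen-Kallela', 'studentOf', 'Bouguereau', 'colleagueOf', 'Ferrier']
```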
The behavior of semantic CH applications should in many cases be dynamic, based on
the context of usage [104]. Users are usually not interested in everything found in the
underlying content repositories, and would like to get information at different levels of
detail (e.g., information for children, professionals, or busy travelers). An important aspect
of a CH application is then adaptation of the portal to different personal information
needs, interests, and usage scenarios, that is, the context of using an application. The
context concerns several aspects:
● Personal interests and the behavior of the end user. Material likely to be of interest to
the user should be preferred. Techniques such as collaborative filtering could be useful
in utilizing other users’ behavior.
● The social environment of the user (e.g., friends and other system users).
● The place and other environmental conditions (e.g., weather) in which the application is
used.
● The time of using the system (summer, night, etc.). For example, recommending a visit
to a beach for a swim to the end user during winter may not be wise due to snow, and
it would be frustrating to direct him or her to a museum on a Monday when it happens to
be closed.
● The computational environment at hand (WiFi, RFID, GPS, ad hoc networks, etc.).
21.3.2.7 Visualization
Visualization is an important aspect of the Semantic Web dealing with semantically
complex and interlinked contents [105]. In the cultural heritage domain, maps, timelines,
and methods for visualizing complicated and large semantic networks, result sets, and
recommendations are of special interest.
Maps are useful both in searching content and in visualizing the results. A widely used
approach to using maps in portals is to employ mash-up map services based on Google Maps
or similar services. For example, many Wikipedia articles have location information and
can be projected on maps [89]. Maps can also be used as navigational aids.
In the cultural heritage domain, historical maps are of interest in their own right. For
example, they depict old place names and borders no longer available in contemporary
maps. An approach to visualizing historical geographical changes is developed in [106].
Here, old maps are laid semitransparently on top of the contemporary maps and satellite
images of Google Maps, as a kind of historical lens. At the same time, articles from
Wikipedia and photos from services like Panoramio, as well as objects from museum
collections, can be visualized on top of the maps, giving an even richer historical and
contemporary perspective on the contents.
Maps are highly usable on mobile phones and in navigation systems. Many modern phones
include not only GPS for positioning, but also a compass for orientation. In some
augmented reality systems, it is possible to point the camera of the device in a direction
and get information about the nearby objects there. An example of this type of system is
Wikitude (http://www.wikitude.org).
Another important dimension for visualizing cultural content is time. A standard
approach for temporal visualization is to project objects of interest on a timeline.
A generic mash-up tool for creating timelines is the Simile timeline (http://simile.mit.
edu/timeline/). A timeline can be used both for querying and for visualizing and orga-
nizing search results.
21.3.2.8 Cultural Heritage as Web Services
The Semantic Web facilitates reusing and aggregating contents through Web APIs [83].
A starting point for this is to publish the CH repository as a SPARQL endpoint. It is also
possible to develop higher-level services for querying the RDF store. Both traditional Web
Services and lightweight mash-ups based on AJAX and REST can be used here. Using the
mash-up approach, the functionalities can be used in external applications with just a few
lines of JavaScript code added on the HTML level.
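For example, a client might consume such an endpoint as in the following sketch, written in Python with the SPARQLWrapper library; the endpoint URL and the query vocabulary are placeholders.

```python
# A sketch of querying a CH repository published as a SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/ch/sparql")  # placeholder endpoint
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?item ?title WHERE {
        ?item dcterms:title ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["item"]["value"], binding["title"]["value"])
```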
The possibility of reusing semantic CH content as a service is an important motivator for
organizations to join CH portal projects. In this way, one can not only gain more visibility
for one's content through a larger portal, but also enrich one's own content with others'
related content and reuse the enriched content in other applications.
21.4 Related Resources Including Key Papers
21.4.1 Multimedia Ontologies, Annotation, and Analysis
Semantic Multimedia. Staab, S., Scherp, A., Arndt, R., Troncy, R., Grzegorzek, M.,
Saathoff, C., Schenk, S., Hardman, L.: Semantic multimedia. In: Reasoning Web: Fourth
International Summer School, Venice, Italy, 7–11 September 2008. Tutorial Lectures.
Springer, pp. 125–170 (2008).
In this paper, issues of semantics in multimedia management are dealt with, covering
the representation of multimedia metadata using Semantic Web ontologies; the interpre-
tation of multimedia objects by various means of reasoning; the retrieval of multimedia
objects by means of low- and high-level (semantic) representations of multimedia; and
the further processing of multimedia facts in order to determine provenance, certainty,
and other meta-knowledge aspects of multimedia data.
Enquiring MPEG-7-based multimedia ontologies. Dasiopoulou, S., Tzouvaras, V.,
Kompatsiaris, I., Strintzis, M.G.: Enquiring MPEG-7-based multimedia ontologies.
Multimed Tools Appl 46(2–3) (January 2010).
Machine-understandable metadata form the main prerequisite for the intelligent
services envisaged in a Web which, going beyond mere data exchange, provides
for effective content access, sharing, and reuse. MPEG-7, despite providing
a comprehensive set of tools for the standardized description of audiovisual content, is
largely compromised by the use of XML that leaves the largest part of the intended
semantics implicit. Aspiring to formalize MPEG-7 descriptions and enhance multimedia
metadata interoperability, a number of multimedia ontologies have been proposed.
Though sharing a common vision, the developed ontologies are characterized by sub-
stantial conceptual differences, reflected both in the modeling of MPEG-7 description
tools as well as in the linking with domain ontologies. Delving into the principles
underlying their engineering, a systematic survey of state-of-the-art MPEG-7-based
multimedia ontologies is presented, and the issues that hinder interoperability, as
well as possible directions toward their harmonization, are highlighted.
COMM: A Core Ontology for Multimedia Annotation. Arndt, R., Troncy, R., Staab, S.,
Hardman, L.: COMM: a core ontology for multimedia annotation. In: Staab, S., Studer, R.
(eds.), Handbook on Ontologies, 2nd edn. International Handbooks on Information Sys-
tems. Springer Verlag, pp. 403–421 (2009).
This chapter analyzes the requirements underlying the semantic representation of media
objects, explains why the requirements are not fulfilled by most semantic multimedia
ontologies, and presents COMM, a core ontology for multimedia that has been built by
reengineering the current de facto standard for multimedia annotation, that is, MPEG-7,
and using DOLCE as its underlying foundational ontology to support conceptual clarity and
soundness as well as extensibility toward new annotation requirements.
A Description Logic for Image Retrieval. Di Sciascio, E., Donini, F.M., Mongiello, M.:
A description logic for image retrieval. In: Proceedings of AI*IA, September 1999.
This paper presents a Description Logic–based language that enables the description
of complex objects as compositions of simpler artifacts for the purpose of semantic image
indexing and retrieval. An extensional semantics is provided, which allows for the formal
definition of corresponding reasoning services.
Ontological inference for image and video analysis. Town, C.: Ontological inference
for image and video analysis. Mach Vis Appl 17(2), 94–115 (2006).
Though focusing solely on probabilistic aspects of the imprecision involved in image
and video analysis, the paper elaborates insightfully on the individual limitations of
ontological and Bayesian inference, and proposes an iterative, goal-driven hypothesize-
and-test approach to content interpretation.
21.4.2 Broadcaster Artifacts Online
21.4.2.1 Vocabularies and Ontologies
● BBC programmes ontology. This ontology aims at providing a simple vocabulary for
describing programs. It covers brands, series (seasons), episodes, broadcast events,
broadcast services, etc. The data at http://www.bbc.co.uk/programmes are annotated
using this ontology. http://bbc.co.uk/ontologies/programmes/
21.4.2.2 Metadata Schemas
● TV-Anytime. TV-Anytime is a set of specifications for the controlled delivery of multi-
media content to a user’s personal device (Personal Video Recorder (PVR)). It seeks to
exploit the evolution in convenient, high-capacity storage of digital information to
provide consumers with a highly personalized TV experience. Users will have access to
content from a wide variety of sources, tailored to their needs and personal preferences.
http://www.etsi.org/Website/technologies/tvanytime.aspx
21.4.2.3 Semantic Television
NoTube: making the Web part of personalized TV. Schopman, B., Brickley, D., Aroyo, L.,
van Aart C., Buser, V., Siebes, R., Nixon, L., Miller, L., Malaise, V., Minno, M., Mostarda, M.,
Palmisano, D., Raimond, Y.: NoTube: making the Web part of personalized TV. In: Pro-
ceedings of the WebSci10: Extending the Frontiers of Society Online, April 2010.
The NoTube project aims to close the gap between the Web and TV by means of semantics. Bits
and pieces of personal and TV-related data are scattered around the Web. NoTube aims to
put the user back in the driver’s seat by using data that are controlled by the user, for
example, from Facebook and Twitter, to recommend programs that match the user’s
interests. By using the linked data cloud, semantics can be exploited to find complex
relations between the user’s interests and background information on programs, resulting
in potentially interesting recommendations.
21.4.3 Cultural Heritage Artifacts Online
21.4.3.1 Vocabularies and Ontologies
● Art and Architecture Thesaurus (AAT). A hierarchical vocabulary of around 34,000
records, including 134,000 terms, descriptions, bibliographic citations, and other
information relating to fine art, architecture, decorative arts, archival materials,
archaeology, and other material culture. http://www.getty.edu/research/conducting_research/vocabularies/aat/
● Thesaurus of Geographical Names (TGN). A hierarchical vocabulary of around
895,000 records, including 1.1 million names, place types, coordinates, and descriptive
notes, focusing on places important for the study of art and architecture. http://www.
getty.edu/research/conducting_research/vocabularies/tgn/
● Union List of Artist Names (ULAN). A vocabulary of around 162,000 records,
including 453,000 names and biographical and bibliographic information for artists,
architects, firms, shops, and art repositories, including a wealth of variant names, pseu-
donyms, and language variants. http://www.getty.edu/research/conducting_research/
vocabularies/ulan/
● Iconclass. A classification system designed for art and iconography. It is the most
widely accepted scientific tool for the description and retrieval of subjects represented
in images (works of art, book illustrations, reproductions, photographs, etc.) and is
used by museums and art institutions. http://www.iconclass.nl/
● Library of Congress Subject Headings (LCSH). A very large subject classification
system for libraries, available also in SKOS. http://id.loc.gov/authorities/
21.4.3.2 Metadata Schemas
● Dublin Core. The Dublin Core Metadata Initiative, or ‘‘DCMI,’’ is an open organiza-
tion engaged in the development of interoperable metadata standards that support
a broad range of purposes and business models. http://dublincore.org/
● CIDOC CRM. The CIDOC Conceptual Reference Model (CRM) provides
definitions and a formal structure for describing the implicit and explicit con-
cepts and relationships used in cultural heritage documentation. http://www.cidoc-
crm.org/
21.4.3.3 Semantic eCulture Systems Online
● MuseumFinland. A semantic portal aggregating artifact collections from several
museums [80]. The content comes from Finnish museums and the system interface
is in Finnish (with an English tutorial). http://www.museosuomi.fi/
● MultimediaN eCulture Demonstrator. This cultural search engine gives access to
artworks from several museum collections using several large vocabularies [84]. http://
e-culture.multimedian.nl/demo/session/search
● CultureSampo. A semantic portal aggregating cultural content of different kinds from
tens of different organizations, Web sources, and the public [83]. The content comes from
Finnish and some international sources, and the user interface supports Finnish,
English, and Swedish. http://www.kulttuurisampo.fi/
21.5 Future Issues
Semantic technologies are seen as being nearly ready for mainstream adoption as the first
decade of the twenty-first century draws to an end. While the situation with respect
to textual content is quite mature, non-textual media present additional challenges
to the technology adoption. As a result, the wider adoption of semantic multimedia may
first follow the breakthrough of other applications of semantic technology – for example,
knowledge management, data integration, semantic search – applied to textual content.
The new challenges being faced by the media industry – the scale and complexity of media
being produced and shared – act as a market driver for technological advances in the semantic
multimedia field. Online media in particular needs improved retrieval, adaptation, and
presentation if content owners are to win market share in a broad and overcrowded market.
A few media organizations have begun to lead the way in using and demonstrating semantics;
for example, the BBC has begun to publish its online content with RDF.
The arts – that is, cultural heritage – are another sector in which semantics are gaining
traction. Museums, for example, have large amounts of metadata about their collections,
which cannot be easily interpreted or reused due to non-digital, non-semantic, and
proprietary approaches. Again, some pioneers, such as the Rijksmuseum in Amsterdam, are
taking the first steps to digitize and annotate their collections and to explore the new
possibilities that this opens up.
The media, arts, and entertainment sector looks to semantics as a clear future solution
to its problems with large scales of heterogeneous non-textual content, and to the
emerging challenges in realizing attractive and competitive content offerings on a ubiquitous
Web with millions of content channels. The cost of creating the semantic data tends to be
larger at present than the benefits gained from its creation, so while the potential benefit
from semantics will continue to grow as Web media becomes more ubiquitous (making
a Unique Selling Point ever more critical for a content owner and provider), the actual
costs of semantics must still fall through improved, more automated content annotation
tools and approaches. Let us look at trends and technology in two specific target areas for
semantic multimedia.
IP Television: IP Television refers to the convergence of the Internet and television,
which is also happening outside of the television set (e.g., in Web-based TV and mobile
TV). Currently, it is focused on new types of services around television such as EPGs,
programming on demand, and live TV pause. An emerging trend in IPTV is toward Web
integration through widgets, which are lightweight self-contained content items that
make use of open Web standards (HTML, JavaScript) and the IP back-channel to
communicate with the Web (typically in an asynchronous manner). Yahoo! and Intel,
for example, presented their Widget Channel at the CES in January 2009, where Web
content such as Yahoo! news and weather, or Flickr photos, could be displayed in on-screen
widgets on TV. Sony and Samsung will go to market in 2010 with Internet-enabled televisions.
A 2009 survey predicted a gradual but steady uptake of TV Internet usage,
with ''the mass market inflection point occurring over the next 3–5 years'' (from http://
oregan.net/press_releases.php?article=2009-01-07). Parallel to this, research into seman-
tic IPTV applications and solutions is being established in academic and industry labs.
A key focus for semantics is the formal description of the programming and user interests
to provide for a better personalization of the TV experience (EU project NoTube, http://
www.notube.tv, 2009–2012) as well as formal description of networks and content
to enable a better delivery of complex services (myMedia, http://www.semantic-iptv.de).
A major barrier to uptake by broadcasters and content providers is the lack of support
for semantic technology in the legacy broadcast systems. Shifts in the provider-side IT
infrastructure to Internet-based (even cloud-based) infrastructures should give an open-
ing for the introduction of semantics into the production systems of the television and
media companies. Vocabularies and technologies will need to converge on specific
standards to encourage industry acceptance, as discussed in this chapter's section on
broadcasting; these standards should emerge in this next ''uptake'' period. As Internet-TV reaches the
mass market point (possibly by 2014), companies will seek Unique Selling Points for their
products and services, which will drive the incorporation of semantic technologies into
IPTV infrastructures and packages.
Virtual Worlds and 3D: The third dimension has always been a part of human
perception, but in the digital world it has had a shorter existence. Today, on the other
hand, computers are capable of rendering highly complex 3D scenes, which can even be
mistaken for real by the human eye. 3DTV is on the cusp of market introduction. A new IT
segment must deal with the capture of 3D objects and their manipulation and management,
in application domains ranging from health care to cultural heritage.
Challenges in the 3D technology domain include how to describe 3D objects for their
indexing, storage, retrieval, and alteration. Semantics provide a means to improve the
description, search, and reuse of complex 3D digital objects. Awareness of the value and
potential use of this technology in the 3D media community is at an early stage [111]. It is
being promoted to industry through initiatives like FOCUS K3D (http://www.focusk3d.
eu), which has application working groups for the domains of medicine and bioinfor-
matics, gaming and simulation, product modeling, and archaeology. A survey on the state
of the art in cultural heritage [112] notes that progress is being made on standardized
metadata schemes and ontologies; current limitations relate to methodologies and the
lack of specialized tools for 3D knowledge management, yet this could be addressed in the
short to medium term.
Virtual worlds are a natural extension of 3D technology into reflecting the perceptive
realities of one's own world, and have also found practical application in domains such as
medicine, social analysis, education, and eCommerce. Making virtual worlds ''react'' more
realistically to actions and activities performed by the actors of that world requires
a (semantic) understanding of the objects rendered in the world and the events that (can)
occur between them. There is also a trend to more closely couple real and virtual
worlds through (real world) sensors, which generate data streams to cause the virtual world
to reflect the real in near-real time. This leads to a requirement to handle increasing scales of
heterogeneous, dirty data for information extraction and actionable inference within the
virtual world.
As in the 3D technology field, barriers to use of semantic technologies lie in the need to
agree on the vocabularies and schema for descriptions of the worlds (which now goes
beyond the form of objects, and encapsulates what can be done with them, how they react
to external events, etc.), as well as the availability of appropriate tools for the creation and
maintenance of semantic virtual worlds. Hence, it is likely that semantics will first need to
experience wide uptake in 3D technology systems before it also further develops into
a technology for virtual worlds in the medium to long term. Projects such as Semantic
Reality (http://www.semanticreality.org) provide exciting longer-term visions of a virtual
world tightly connected to the real world, with trillions of sensors able to ensure a close
correlation between the two [113].
Such visions of intelligent media and even intelligent worlds will be built on the building blocks of
semantic multimedia technology discussed in this chapter, once key barriers to uptake are
overcome. In particular, semantic annotation of non-textual media remains a significant
barrier.
Foundational technologies to (semi)automatically annotate non-textual resources
have been investigated by the multimedia semantics community, which spans more
broadly the areas of computer vision and multimedia analysis. These areas provide
means to analyze visual streams of information with the help of low-level feature extrac-
tion, object detection or high-level event inferencing. Despite promising advances,
approaches that can be generically and efficiently applied to automate annotation across
media still remain to be defined. In contrast to textual resources, which are annotated
automatically to a large extent, the semantic annotation of non-textual media relies heavily
on human input and is thus associated with significant costs.
The area of computer vision provides methods to make visual resources accessible
to machines. Recent years have seen considerable advancement in the range of things that
are detectable in still and moving images. This ranges from object detection, scaling up
to a considerable number of different objects for some tools, to object tracking in,
for example, surveillance videos. All of these approaches try to derive meaning from low-level
features (like color histograms, motion vectors, etc.) automatically. Despite constant
advances, these tools are still not capable of exploiting the full meaning of visual resources,
as not all meaning is localized in the visual features; some requires human interpretation.
Two directions are currently followed in research: the first is to provide rich human
annotations as training data for future automated analysis; the second relies purely on
analysis of raw content, which only performs well for specialized domains and settings in
which relevant concepts can be easily recognized. Richer semantics, capturing implicit
features and meaning derived from humans, cannot be extracted in this manner. Present
trends therefore put the human more and more into the loop by lowering the entry barrier
for his or her participation. This is done by adopting Web 2.0 or game-based approaches
to engage users in the annotation
of visual resources. Recent approaches, for example, try to support automatic analysis with
tagging or vice versa. What is still missing are approaches that are capable of exploiting
higher-level features even in visual resources of lower quality and that can be adapted
across domains.
Hence, in the foreseeable future, multimedia analysis will still have to be supported by end users
to a great extent. This is why recent years have seen a huge growth in available annotation
tools, which allow manual or semiautomatic annotation of visual resources. These tools are
targeted either at supporting analysis approaches by providing training data or at serving as
a means for users to organize media. They show varying complexity: while some allow
users to express complex statements about visual resources, others enable the provision of
tags. Some apply annotation or tag propagation and offer support based on
previously supplied annotations. Most of these approaches are still not mature and are only
applied in research. While approaches based on (complex) ontologies exist, some of them are
not suitable for most end users. At the other end of the spectrum, tagging-based approaches
are not suitable for capturing all the semantics in visual resources. What is still needed are
tools that allow one to capture subjective views of visual resources and combine these views
to deliver a consolidated, objective view that holds across users. While tagging-based
approaches have proven to ease large-scale uptake, motivating users to provide
more meaningful annotations is still an issue.
21.6 Cross-References
> Future Trends
References
1. Deerwester, S., Dumais, S.T., Furnas, G.W.,
Landauer, T.K., Harshman, R.: Indexing by latent
semantic analysis. J. Am. Soc. Inf. Sci. 41(6),
391–407 (1990)
2. Burger, T., Hausenblas, M.: Why real-world mul-
timedia assets fail to enter the semantic web. In:
Proceedings of the Semantic Authoring, Annota-
tion and Knowledge Markup Workshop
(SAAKM 2007) located at the Fourth Interna-
tional Conference on Knowledge Capture (KCap
2007), Whistler. CEUR Workshop Proceedings
289. CEUR-WS.org (2007)
3. MPEG-7: Multimedia Content Description Interface.
Standard No. ISO/IEC 15938 (2001)
4. van Ossenbruggen, J., Nack, F., Hardman, L.:
That obscure object of desire: multimedia meta-
data on the web (part I). IEEE Multimed. 11(4),
38–48 (2004)
5. Nack, F., van Ossenbruggen, J., Hardman, L.:
That obscure object of desire: multimedia meta-
data on the web (part II). IEEE Multimed. 12(1),
54–63 (2005)
6. Troncy, R., Carrive, J.: A reduced yet extensible
audio-visual description language: how to
escape from the MPEG-7 bottleneck. In: Proceed-
ings of the Fourth ACM Symposium on Docu-
ment Engineering (DocEng 2004), Milwaukee
(2004)
7. Troncy, R., Bailer, W., Hausenblas, M., Hofmair,
P., Schlatte, R.: Enabling multimedia metadata
interoperability by defining formal semantics
of MPEG-7 profiles. In: Proceedings of the First
International Conference on Semantics and
Digital Media Technology (SAMT 2006), Athens,
pp. 41–55 (2006)
8. Garcia, R., Celma, O.: Semantic integration and
retrieval of multimedia metadata. In: Proceedings
of the Fifth International Workshop on
Knowledge Markup and Semantic Annotation
(SemAnnot 2005), Galway, pp. 69–80 (2005)
9. Hunter, J.: Adding multimedia to the semantic
web – building an MPEG-7 ontology. In:
Proceedings of the First International Semantic
Web Working Symposium (SWWS 2001),
Stanford, pp. 261–281 (2001)
10. Tsinaraki, C., Polydoros, P., Christodoulakis, S.:
Interoperability support for ontology-based
video retrieval applications. In: Proceedings of
the Third International Conference on Image
and Video Retrieval (CIVR 2005), Dublin,
pp. 582–591 (2005)
11. Troncy, R., Celma, O., Little, S., Garcia, R.,
Tsinaraki, C.: MPEG-7 based multimedia
ontologies: interoperability support or interoper-
ability issue? In: SAMT 2007: Workshop on Mul-
timedia Annotation and Retrieval enabled by
Shared Ontologies (MAReSO 2007), Genoa
(2007)
12. Lagoze, C., Hunter, J.: The ABC ontology and
model (v3.0). J. Digit. Inf. 2(2) (2001)
13. Hunter, J.: Enhancing the semantic interoperabil-
ity of multimedia through a core ontology.
IEEE Trans. Circuits Syst. Video Technol. 13(1),
49–58 (2003)
14. Hunter, J., Little, S.: A framework to enable
the semantic inferencing and querying of
multimedia content. Int. J. Web Eng. Technol.
2(2/3), 264–286 (2005) (Special issue on the
Semantic Web)
15. Pease, A., Niles, I., Li, J.: The suggested upper
merged ontology: a large ontology for the seman-
tic web and its applications. In: Working Notes of
the AAAI-2002 Workshop on Ontologies and the
Semantic Web, Edmonton (2002)
16. Gangemi, A., Guarino, N., Masolo, C., Oltramari,
A., Schneider, L.: Sweetening ontologies with
DOLCE. In: Proceedings of the 13th International
Conference on Knowledge Engineering and
Knowledge Management (EKAW 2002),
Siguenza, pp. 166–181 (2002)
17. Polydoros, P., Tsinaraki, C., Christodoulakis, S.:
GraphOnto: OWL-based ontology management
and multimedia annotation in the DS-MIRF
framework. J. Digit. Inf. Manag. 4(4), 214–219
(2006)
18. Tsinaraki, C., Polydoros, P., Christodoulakis, S.:
Interoperability support between MPEG-7/21
and OWL in DS-MIRF. Trans. Knowl. Data Eng.
19(2), 219–232 (2007) (Special issue on the
Semantic Web Era)
19. Garcia, R., Gil, R., Delgado, J.: A web ontologies
framework for digital rights management. J. Artif.
Intell. Law 15, 137–154 (2007)
20. Garcia, R., Gil, R.: Facilitating business interop-
erability from the semantic web. In: Proceedings
of the Tenth International Conference on Busi-
ness Information Systems (BIS 2007), Poznan,
pp. 220–232 (2007)
21. Troncy, R.: Integrating structure and semantics
into audio-visual documents. In: Proceedings of
the Second International Semantic Web Confer-
ence (ISWC 2003), Sanibel Island. Lecture Notes
in Computer Science, vol. 2870, pp. 566–581.
Springer, Berlin (2003)
22. Isaac, A., Troncy, R.: Designing and using an
audio-visual description core ontology. In:Work-
shop on Core Ontologies in Ontology Engineer-
ing at the 14th International Conference on
Knowledge Engineering and Knowledge Manage-
ment (EKAW 2004), Whittlebury Hall (2004)
23. Bloehdorn, S., Petridis, K., Saathoff, C., Simou,
N., Tzouvaras, V., Avrithis, Y., Handschuh, S.,
Kompatsiaris, Y., Staab, S., Strintzis, M.G.:
Semantic annotation of images and videos for
multimedia analysis. In: Proceedings of the Sec-
ond European Semantic Web Conference (ESWC
2005), Heraklion. Lecture Notes in Computer
Science, vol. 3532, pp. 592–607. Springer, Berlin
(2005)
24. Hollink, L., Worring, M., Schreiber, A.Th.: Build-
ing a visual ontology for video retrieval. In: Pro-
ceedings of the 13th ACM International
Conference on Multimedia (MM 2005), Hilton
(2005)
25. Vembu, S., Kiesel, M., Sintek, M., Baumann, S.:
Towards bridging the semantic gap inmultimedia
annotation and retrieval. In: Proceedings of the
First International Workshop on Semantic Web
Annotations for Multimedia (SWAMM 2006),
Edinburgh (2006)
26. Halaschek-Wiener, C., Golbeck, J., Schain, A.,
Grove, M., Parsia, B., Hendler, J.: Annotation
and provenance tracking in semantic web photo
libraries. In: Proceedings of International
Provenance and Annotation Workshop (IPAW
2006), Chicago, pp. 82–89 (2006)
27. Chakravarthy, A., Ciravegna, F., Lanfranchi, V.:
Aktivemedia: cross-media document annotation
and enrichment. In: Poster Presentation at the
Proceedings of Fifth International Semantic Web
Conference (ISWC 2006), Athens, GA. Lecture
Notes in Computer Science, vol. 4273. Springer,
Berlin (2006)
28. Petridis, K., Bloehdorn, S., Saathoff, C.,
Simou, N., Dasiopoulou, S., Tzouvaras, V.,
Handschuh, S., Avrithis, Y., Kompatsiaris, I.,
Staab, S.: Knowledge representation and semantic
annotation of multimedia content. IEE Proc.
Vis. Image Signal Process. 153, 255–262 (2006)
(Special issue on Knowledge-Based Digital Media
Processing)
29. Simou, N., Tzouvaras, V., Avrithis, Y., Stamou, G.,
Kollias, S.: A visual descriptor ontology for mul-
timedia reasoning. In: Proceedings of the Work-
shop on Image Analysis for Multimedia
Interactive Services (WIAMIS 2005), Montreux
(2005)
30. Saathoff, C., Schenk, S., Scherp, A.: Kat: the
k-space annotation tool. In: Poster Session, Inter-
national Conference on Semantic and Digital
Media Technologies (SAMT 2008), Koblenz
(2008)
31. Arndt, R., Troncy, R., Staab, S., Hardman, L.,
Vacura, M.: COMM: designing a well-founded
multimedia ontology for the web. In: Proceedings
of Sixth International Semantic Web Conference
(ISWC 2007), Busan. Lecture Notes in Computer
Science, vol. 4825, pp. 30–43. Springer, Berlin
(2007)
32. Saathoff, C., Scherp, A.: Unlocking the seman-
tics of multimedia presentations in the web
with the multimedia metadata ontology. In:
Proceedings of 19th International Conference
on World Wide Web (WWW 2010), Raleigh,
pp. 831–840 (2010)
33. Smeulders, A.W.M., Worring, M., Santini, S.,
Gupta, A., Jain, R.: Content-based image retrieval
at the end of the early years. IEEE Trans. Pattern
Anal. Mach. Intell. 22(12), 1349–1380 (2000)
34. Rao, A., Jain, R.: Knowledge representation and
control in computer vision systems. IEEE Expert
3, 64–79 (1988)
35. Draper, B., Hanson, A., Riseman, E.: Knowledge-
directed vision: control, learning and integration.
Proc. IEEE 84(11), 1625–1681 (1996)
36. Snoek, C., Huurnink, B., Hollink, L., de Rijke, M.,
Schreiber, G., Worring, M.: Adding semantics to
detectors for video retrieval. IEEE Trans.
Multimed. 9(5), 975–986 (2007)
37. Hauptmann, A., Yan, R., Lin, W.H., Christel, M.,
Wactlar, H.: Can high-level concepts fill the
semantic gap in video retrieval? A case study
with broadcast news. IEEE Trans. Multimed.
9(5), 958–966 (2007)
38. Hunter, J., Drennan, J., Little, S.: Realizing the
hydrogen economy through Semantic Web
technologies. IEEE Intell. Syst. 19(1), 40–47
(2004)
39. Little, S., Hunter, J.: Rules-by-example – a novel
approach to semantic indexing and querying
of images. In: Proceedings of the Third Interna-
tional Semantic Web Conference (ISWC 2004),
Hiroshima. Lecture Notes in Computer
Science, vol. 3298, pp. 534–548. Springer, Berlin
(2004)
40. Hollink, L., Little, S., Hunter, J.: Evaluating the
application of semantic inferencing rules to
image annotation. In: International Conference
on Knowledge Capture (K-CAP 2005), Banff,
pp. 91–98 (2005)
41. Petridis, K., Anastasopoulos, D., Saathoff, C.,
Timmermann, N., Kompatsiaris, Y., Staab, S.:
M-OntoMat-Annotizer: image annotation,
linking ontologies and multimedia low-level fea-
tures. In: Proceedings of the 10th International
Conference on Knowledge-Based Intelligent
Information and Engineering Systems, Part 3
(KES 2006), Bournemouth. Lecture Notes in
Computer Science, vol. 4253, pp. 633–640.
Springer, Berlin (2006)
42. Maillot, N., Thonnat, M., Boucher, A.: Towards
ontology based cognitive vision. In: Proceed-
ings of the International Conference on Com-
puter Vision Systems (ICVS 2003), Graz,
pp. 44–53 (2003)
43. Hudelot, C., Maillot, N., Thonnat, M.: Symbol
grounding for semantic image interpretation:
from image data to semantics. In: Proceedings
of 10th International Conference on Computer
Vision Workshops (ICCV 2005), Beijing (2005)
44. Maillot, N., Thonnat, M.: A weakly supervised
approach for semantic image indexing and
retrieval. In: Proceedings of the Fourth Interna-
tional Conference on Image and Video Retrieval
(CIVR 2005), Singapore. Lecture Notes in Com-
puter Science, vol. 3568, pp. 629–638. Springer,
Berlin (2005)
45. Dasiopoulou, S., Mezaris, V., Kompatsiaris, I.,
Papastathis, V., Strintzis, M.: Knowledge-assisted
semantic video object detection. IEEE Trans.
Circuits Syst. Video Technol. 15(10), 1210–1224
(2005)
46. Neumann, B., Weiss, T.: Navigating through logic-
based scene models for high-level scene interpreta-
tions. In: Proceedings of the Third International
Conference on Computer Vision Systems (ICVS
2003), Graz. Lecture Notes in Computer Science,
vol. 2626, pp. 212–222. Springer, Berlin (2003)
47. Moller, R., Neumann, B., Wessel, M.: Towards
computer vision with description logics: some
recent progress. In: Workshop on Integration
of Speech and Image Understanding, Corfu,
pp. 101–115 (1999)
48. Neumann, B., Moller, R.: On scene interpretation
with description logics. Image Vis. Comput.
26(1), 247–275 (2008). http://dx.doi.org/
10.1016/j.imavis.2007.08.013
49. Hotz, L., Neumann, B., Terzic, K.: High-level
expectations for low-level image processing. In:
Proceedings of the 31st Annual German
Conference on AI (KI 2008), Kaiserslautern,
pp. 87–94 (2008)
50. Neumann, B.: Bayesian compositional hierar-
chies – a probabilistic structure for scene inter-
pretation. Technical report FBI-HH-B-282/08.
Department of Informatics, Hamburg University
(2008)
51. Peraldi, I.E., Kaya, A., Melzer, S., Moller, R.,
Wessel, M.: Towards a media interpretation
framework for the semantic web. In: Proceedings
of the International Conference on Web Intelli-
gence (WI 2007), Silicon Valley, pp. 374–380
(2007)
52. Dubois, D., Prade, H.: Possibility theory, proba-
bility theory and multiple-valued logics: a clarifi-
cation. Ann. Math. Artif. Intell. 32(1–4), 35–66
(2001)
53. Sciascio, E.D., Donini, F.: Description logics for
image recognition: a preliminary proposal. In:
International Workshop on Description Logics
(DL 1999), Linkoping (1999)
54. Sciascio, E.D., Donini, F., Mongiello, M.: Struc-
tured knowledge representation for image
retrieval. J. Artif. Intell. Res. 16, 209–257 (2002)
55. Dasiopoulou, S., Kompatsiaris, I., Strintzis, M.:
Using fuzzy DLs to enhance semantic image
analysis. In: Proceedings of Third International
Conference on Semantic and Digital Media Tech-
nologies (SAMT 2008), Koblenz, pp. 31–46 (2008)
56. Dasiopoulou, S., Kompatsiaris, I., Strintzis, M.:
Applying fuzzy DLs in the extraction of image
semantics. J. Data Semant. 14, 105–132 (2009)
57. Simou, N., Athanasiadis, T., Stoilos, G., Kollias,
S.: Image indexing and retrieval using expressive
fuzzy description logics. Signal Image Video
Process. 2(4), 321–335 (2008)
58. Hudelot, C., Atif, J., Bloch, I.: Fuzzy spatial rela-
tion ontology for image interpretation. Fuzzy Sets
Syst. 159(15), 1929–1951 (2008)
59. Straccia, U.: Reasoning within fuzzy description
logics. J. Artif. Intell. Res. 14, 137–166 (2001)
60. Stoilos, G., Stamou, G., Tzouvaras, V., Pan, J.,
Horrocks, I.: The fuzzy description logic
f-SHIN. In: Proceedings of the International
Workshop on Uncertainty Reasoning for the
Semantic Web (URSW 2005), Galway, pp. 67–76
(2005)
61. Ding, Z.: Bayesowl: a probabilistic framework for
semantic web. Ph.D. thesis, University of Mary-
land, Baltimore County (2005)
62. da Costa, P., Laskey, K.B., Laskey, K.J.: PR-OWL: a
Bayesian ontology language for the semantic web.
In: Proceedings of the Fourth International
Workshop on Uncertainty Reasoning for the
Semantic Web (URSW 2008), Karlsruhe.
Lecture Notes in Computer Science, vol. 5327,
pp. 88–107. Springer, Berlin (2008)
63. Town, C.: Ontological inference for image and
video analysis. Mach. Vis. Appl. 17(2), 94–115
(2006)
64. Tran, S., Davis, L.: Event modeling and recogni-
tion using Markov logic networks. In: Proceed-
ings of the 10th European Conference on
Computer Vision, Part II (ECCV 2008),
Marseille, pp. 610–623 (2008)
65. Richardson, M., Domingos, P.: Markov logic net-
works. Mach. Learn. 62(1–2), 107–136 (2006)
66. Francois, A., Nevatia, R., Hobbs, J., Bolles, R.:
Verl: an ontology framework for representing
and annotating video events. IEEE Multimed.
12(4), 76–86 (2005)
67. Dasiopoulou, S., Kompatsiaris, I.: Trends and
issues in description logics frameworks for
image interpretation. In: Proceedings of the
Sixth Hellenic Conference on Artificial Intelli-
gence: Theories, Models and Applications
(SETN 2010), Athens, pp. 61–70 (2010)
68. Schopman, B., Brickley, D., Aroyo, L., van Aart,
C., Buser, V., Siebes, R., Nixon, L., Miller, L.,
Malaise, V., Minno, M., Mostarda, M., Palmisano,
D., Raimond, Y.: NoTube: making the web part
of personalised TV. In: Proceedings of the
WebSci10: Extending the Frontiers of Society
Online, Raleigh (2010)
69. Taylor, A.: Introduction to Cataloging and Clas-
sification. Library and Information Science
Text Series. Libraries Unlimited, Santa Barbara
(2006)
70. Sowa, J.: Knowledge Representation. Logical,
Philosophical, and Computational Foundations.
Brooks/Cole, Pacific Grove (2000)
71. Doerr, M.: The CIDOC CRM – an ontological
approach to semantic interoperability of meta-
data. AI Mag. 24(3), 75–92 (2003)
72. Ruotsalo, T., Hyvonen, E.: An event-based
method for making heterogeneous metadata
schemas and annotations semantically interoper-
able. In: Proceedings of the Sixth International
Semantic Web Conference, Second Asian Seman-
tic Web Conference (ISWC 2007 + ASWC 2007),
Busan. Lecture Notes in Computer Science,
vol. 4825, pp. 409–422. Springer, Berlin (2007)
73. Borgo, S., Masolo, C.: Foundational choices in
DOLCE. In: Staab, S., Studer, R. (eds.) Handbook
on Ontologies. International Handbooks on
Information Systems, 2nd edn., pp. 403–421.
Springer, Dordrecht (2009)
74. Fellbaum, C. (ed.): WordNet. An Electronic
Lexical Database. MIT Press, Cambridge (2001)
75. Yano, K., Nakaya, T., Isoda, Y., Takase, Y.,
Kawasumi, T., Matsuoka, K., Seto, T.,
Kawahara, D., Tsukamoto, A., Inoue, M.,
Kirimura, T.: Virtual Kyoto: 4DGIS comprising
spatial and temporal dimensions. J. Geogr.
117(2), 464–476 (2008)
76. Kauppinen, T., Vaatainen, J., Hyvonen, E.: Creat-
ing and using geospatial ontology time series in a
semantic cultural heritage portal. In: Proceedings
of the Fifth European Semantic Web Conference
(ESWC 2008), Tenerife. Lecture Notes in Com-
puter Science, vol. 5021, pp. 110–123. Springer,
Berlin (2008)
77. Nagypal, G., Motik, B.: A fuzzy model for
representing uncertain, subjective, and vague
temporal knowledge in ontologies. In: On the
Move to Meaningful Internet Systems 2003:
CoopIS, DOA, and ODBASE – OTM Confeder-
ated International Conferences (CoopIS, DOA,
and ODBASE 2003), Catania, pp. 906–923 (2003)
78. Kauppinen, T., Mantegari, G., Paakkarinen, P.,
Kuittinen, H., Hyvonen, E., Bandini, S.: Deter-
mining relevance of imprecise temporal intervals
for cultural heritage information retrieval. Int. J.
Hum. Comput. Stud. 68(9), 549–560 (2010)
79. Allen, J.F.: Maintaining knowledge about tempo-
ral intervals. Commun. ACM 26(11), 832–843
(1983)
80. Hyvonen, E., Makela, E., Salminen, M., Valo, A.,
Viljanen, K., Saarela, S., Junnila, M., Kettula, S.:
MuseumFinland – Finnish museums on the
semantic web. J. Web Semant. 3(2), 224–241
(2005)
81. McCarty, W.: Humanities Computing. Palgrave
Macmillan, Basingstoke (2005)
82. Sheth, A., Aleman-Meza, B., Arpinar, I.B.,
Bertram, C., Warke, Y., Ramakrishnan, C.,
Halaschek, C., Anyanwu, K., Avant, D., Arpinar,
F.S., Kochut, K.: Semantic association iden-
tification and knowledge discovery for national
security applications. J. Database Manag. 16(1),
33–53 (2005)
83. Hyvonen, E., Makela, E., Kauppinen, T., Alm, O.,
Kurki, J., Ruotsalo, T., Seppala, K., Takala, J.,
Puputti, K., Kuittinen, H., Viljanen, K.,
Tuominen, J., Palonen, T., Frosterus, M., Sinkkila,
R., Paakkarinen, P., Laitio, J., Nyberg, K.:
CultureSampo – Finnish culture on the semantic
web 2.0. In: Proceedings of the Museums and the
Web (MW 2009), Indianapolis (2009)
84. Schreiber, G., Amin, A., Aroyo, L., van Assem, M.,
de Boer, V., Hardman, L., Hildebrand, M.,
Omelayenko, B., van Ossenbruggen, J.,
Tordai, A., Wielemaker, J., Wielinga, B.J.:
Semantic annotation and search of cultural-
heritage collections: The MultimediaN E-Culture
demonstrator. J. Web Semant. 6(4), 243–249
(2008)
85. Junnila, M., Hyvonen, E., Salminen, M.: Describ-
ing and linking cultural semantic content by
using situations and actions. In: Robering, K.
(ed.) Information Technology for the Virtual
Museum. LIT Verlag, Berlin (2008)
86. Byrne, K.: Populating the Semantic Web – Com-
bining text and relational databases as RDF
graphs. Ph.D. thesis, University of Edinburgh,
Supp. 32 (2009)
87. Hyvonen, E.: Semantic portals for cultural heri-
tage. In: Staab, S., Studer, R. (eds.) Handbook on
Ontologies, 2nd edn., pp. 757–778. Springer,
Dordrecht (2009)
88. van Hage, W.R., Stash, N., Wang, Y., Aroyo, L.:
Finding your way through the Rijksmuseum with
an adaptive mobile museum guide. In: The
Semantic Web: Research and Applications, Sev-
enth Extended Semantic Web Conference (ESWC
2010), Proceedings, Part I, Heraklion. Lecture
Notes in Computer Science, vol. 6088,
pp. 46–59. Springer, Berlin (2010)
89. Becker, C., Bizer, C.: DBpedia mobile: a
location-enabled linked data browser. In: Proceed-
ings of the First Workshop about Linked Data on
the Web (LDOW 2008), Beijing (2008);
Kobilarov, G., Scott, T., Raimond, Y., Oliver, S.,
Sizemore, C., Smethurst, M., Bizer, C., Lee, R.:
Media meets semantic web – how the BBC uses
DBpedia and linked data to make connections.
In: The Semantic Web: Research and Applica-
tions, Proceedings of the Seventh Extended
Semantic Web Conference (ESWC 2010), Part I,
Heraklion. Lecture Notes in Computer Science,
vol. 6088, pp. 723–737. Springer, Berlin (2010)
90. Euzenat, J., Shvaiko, P.: Ontology Matching.
Springer, Berlin (2007)
91. Hyvonen, E., Viljanen, K., Tuominen, J.,
Seppala, K.: Building a national semantic web
ontology and ontology service infrastructure –
the FinnONTO approach. In: Proceedings of the
Fifth European Semantic Web Conference
(ESWC 2008), Tenerife. Lecture Notes in Com-
puter Science, vol. 5021, pp. 95–109. Springer,
Berlin (2008)
92. Rui, Y., Huang, T., Chang, S.: Image retrieval:
current techniques, promising directions and
open issues. J. Vis. Commun. Image Represent.
10(1), 39–62 (1999)
93. Lew, M.S., Sebe, N., Djeraba, C., Jain, R.: Content-
based multimedia information retrieval: state of the
art and challenges. ACM Trans. Multimed. Comput.
Commun. Appl. 2, 1–19 (2006)
94. Hollink, L.: Semantic annotation for retrieval
of visual resources. Ph.D. thesis, Free University
of Amsterdam. SIKS Dissertation Series,
No. 2006-24 (2006)
95. Pollitt, A.S.: The key role of classification and
indexing in view-based searching. Technical
Report. University of Huddersfield, UK. http://
www.ifla.org/IV/ifla63/63polst.pdf (1998)
96. Hearst, M., Elliott, A., English, J., Sinha, R.,
Swearingen, K., Lee, K.-P.: Finding the flow in Web
site search. Commun. ACM 45(9), 42–49 (2002)
97. Hyvonen, E., Saarela, S., Viljanen, K.: Application
of ontology techniques to view-based semantic
search and browsing. In: The Semantic Web:
Research and Applications. Proceedings of the
First European Semantic Web Symposium
(ESWS 2004), Heraklion. Lecture Notes in Com-
puter Science, vol. 3053, pp. 92–106. Springer,
Berlin (2004)
98. Sacco, G.M.: Dynamic taxonomies: guided inter-
active diagnostic assistance. In: Wickramasinghe,
N. (ed.) Encyclopedia of Healthcare Information
Systems. Idea Group, Hershey (2005)
99. Hildebrand, M., van Ossenbruggen, J., Hardman,
L.: /facet: a browser for heterogeneous Semantic
Web repositories. In: Proceedings of the Fifth
International Semantic Web Conference (ISWC
2006), Athens, GA. Lecture Notes in Computer
Science, vol. 4273, pp. 272–285. Springer, Berlin
(2006)
100. Holi, M.: Crisp, fuzzy, and probabilistic faceted
semantic search. Dissertation, School of Science
and Technology, Aalto University, Espoo (2010)
101. English, J., Hearst, M., Sinha, R., Swearingen, K.,
Lee, K.-P.: Flexible search and navigation using
faceted metadata. Technical report. School of
Information Management and Systems, University
of California, Berkeley (2003)
102. Hyvonen, E., Makela, E.: Semantic
autocompletion. In: Proceedings of the First
Asian Semantic Web Conference (ASWC
2006), Beijing. Lecture Notes in Computer
Science, vol. 4185, pp. 739–751. Springer,
Heidelberg (2006)
103. Hildebrand, M., van Ossenbruggen, J.R.: Con-
figuring semantic web interfaces by data map-
ping. In: Proceedings of the Workshop on:
Visual Interfaces to the Social and the Semantic
Web (VISSW 2009), Sanibel Island (2009)
104. Burke, R.: Knowledge-based recommender sys-
tems. In: Kent, A. (ed.) Encyclopedia of Library
and Information Systems, vol. 69. Marcel
Dekker, New York (2000)
105. Geroimenko, V., Chen, C. (eds.): Visualizing the
Semantic Web: XML-Based Internet and Infor-
mation Visualization. Springer, Berlin (2002)
106. Kauppinen, T., Henriksson, R., Vaatainen, J.,
Deichstetter, C., Hyvonen, E.: Ontology based
modeling and visualization of cultural spatio-
temporal knowledge. In: Semantic Web at
Work – Proceedings of STeP 2006, Finnish AI
Society, Espoo (2006)
107. Adomavicius, G., Tuzhilin, A.: Toward the next
generation of recommender systems: a survey of the
state-of-the-art and possible extensions. IEEE
Trans. Knowl. Data Eng. 17(6), 734–749 (2005)
108. Viljanen, K., Kansala, T., Hyvonen, E., Makela,
E.: ONTODELLA – a projection and linking
service for Semantic Web applications. In: Pro-
ceedings of the 17th International Conference
on Database and Expert Systems Applications
(DEXA 2006), Krakow (2006)
109. Ruotsalo, T., Makela, E., Kauppinen, T.,
Hyvonen, E., Haav, K., Rantala, V., Frosterus, M.,
Dokoohaki, N., Matskin, M.: Smartmuseum:
personalized context-aware access to digital
cultural heritage. In: Proceedings of the Inter-
national Conferences on Digital Libraries and
the Semantic Web 2009 (ICSD 2009), Trento
(2009)
110. Wang, Y., Stash, N., Aroyo, L., Gorgels, P.,
Rutledge, L., Schreiber, G.: Recommendations
based on semantically-enriched museum collec-
tion. J. Web Semant. 6(4), 283–290 (2008)
111. Spagnuolo, M., Falcidieno, B.: 3D media and the
semantic web. IEEE Intell. Syst. 24(2), 90–96
(2009)
112. State of the art report on 3D content in
archaeology and cultural heritage. FOCUS K3D
deliverable 2.4.1 (2009)
113. Hauswirth, M., Decker, S.: Semantic reality –
Connecting the real and the virtual world. In:
Position Paper at Microsoft SemGrail Work-
shop, Redmond (2007)
114. Dasiopoulou, S., Tzouvaras, V., Kompatsiaris, I.,
Strintzis, M.G.: Enquiring MPEG-7 based ontol-
ogies. Multimed. Tools Appl. 46(2), 331–370
(2010)
115. Staab, S., Studer, R. (eds.): Handbook on
Ontologies. Springer, Dordrecht (2004)
116. Baader, F., Calvanese, D., McGuinness, D.L.,
Nardi, D., Patel-Schneider, P.F.: The Description
Logic Handbook: Theory, Implementation, and
Applications. Cambridge University Press,
Cambridge (2003)
117. Bertini, M., Del Bimbo, A., Serra, G., Torniai, C.,
Cucchiara, R., Grana, C., Vezzani, R.: Dynamic
pictorially enriched ontologies for digital video
libraries. IEEE Multimed. 16(2), 42–51 (2009)
118. Shet, V.D., Neumann, J., Ramesh, V., Davis, L.S.:
Bilattice-based logical reasoning for human
detection. In: Proceedings of IEEE Conference
on Computer Vision and Pattern Recognition
(CVPR 2007), Minneapolis, pp. 1–8 (2007)
119. Patkos, T., Chrysakis, I., Bikakis, A., Plexousakis,
D., Antoniou, G.: A reasoning framework for
ambient intelligence. In: Proceedings of the
Sixth Hellenic Conference on Artificial Intelli-
gence: Theories, Models and Applications
(SETN 2010), Athens, pp. 213–222 (2010)
120. Kowalski, R.A., Sergot, M.J.: A logic-based cal-
culus of events. In: Foundations of Knowledge
Base Management, pp. 23–55. Springer, Berlin
(1985). ISBN 3-540-18987-4
121. Biancalana, C., Micarelli, A., Squarcella, C.:
Nereau: a social approach to query expan-
sion. In: Proceedings of the 10th ACM Interna-
tional Workshop on Web Information and Data
Management (WIDM 2008), Napa Valley,
pp. 95–102. ACM, New York (2008)
122. Brusilovsky, P., Maybury, M.T.: From adaptive
hypermedia to the adaptive web. Commun.
ACM 45(5), 30–33 (2002)
123. Carmagnola, F., Cena, F., Gena, C., Torre, I.: A
semantic framework for adaptive web-based sys-
tems. In: Bouquet, P., Tummarello, G. (eds.)
Semantic Web Applications and Perspectives
(SWAP 2005), Proceedings of the Second Italian
Semantic Web Workshop, Trento. CEUR Work-
shop Proceedings, vol. 166. CEUR-WS.org
(2005)
124. Ginsberg, M.L.: Multivalued logics: a uniform
approach to reasoning in artificial intelligence.
Comput. Intell. 4, 265–316 (1988)