21 Multimedia, Broadcasting, and eCulture

Lyndon Nixon¹, Stamatia Dasiopoulou², Jean-Pierre Evain³, Eero Hyvönen⁴, Ioannis Kompatsiaris², Raphaël Troncy⁵
¹Semantic Technology Institute (STI) International, Vienna, Austria
²Centre for Research and Technology Hellas (CERTH), Thermi-Thessaloniki, Greece
³European Broadcasting Union (EBU), Grand-Saconnex, Switzerland
⁴Aalto University and University of Helsinki, Aalto, Finland
⁵EURECOM, Sophia Antipolis, France

In: John Domingue, Dieter Fensel & James A. Hendler (eds.), Handbook of Semantic Web Technologies, DOI 10.1007/978-3-540-92913-0_21, © Springer-Verlag Berlin Heidelberg 2011

21.1 Introduction
21.2 Scientific and Technical Overview
21.2.1 Multimedia Semantics: Vocabularies and Tools
21.2.1.1 Multimedia Vocabularies on the Semantic Web
21.2.2 Semantic Web–Based Multimedia Annotation Tools
21.2.3 Semantic Multimedia Analysis
21.2.4 Semantics in Broadcasting
21.2.4.1 Metadata in Broadcasting from Its Origin
21.2.4.2 Metadata Standardization in Broadcasting
21.2.4.3 Using Ontologies: Metadata + Semantic
21.2.4.4 A Semantic Representation of TV-Anytime in a Nutshell
21.2.4.5 A Semantic Representation of Thesauri
21.2.4.6 The Holy Grail: Agreeing on a Class Model
21.2.4.7 Agreeing on Properties
21.3 Example Applications
21.3.1 Semantic Television
21.3.1.1 User Activity Capture
21.3.1.2 Enriched EPG Data
21.3.1.3 Alignment Between Vocabularies
21.3.1.4 Personalized TV Program Recommendation
21.3.2 Semantics in Cultural Heritage
21.3.2.1 Ontological Dimensions
21.3.2.2 Challenges of Content Creation
21.3.2.3 Syntactic and Semantic Interoperability
21.3.2.4 Semantic eCulture Systems
21.3.2.5 Semantic Search
21.3.2.6 Semantic Browsing and Recommending
21.3.2.7 Visualization
21.3.2.8 Cultural Heritage as Web Services
21.4 Related Resources Including Key Papers
21.4.1 Multimedia Ontologies, Annotation, and Analysis
21.4.2 Broadcaster Artifacts Online
21.4.2.1 Vocabularies and Ontologies
21.4.2.2 Metadata Schemas
21.4.2.3 Semantic Television
21.4.3 Cultural Heritage Artifacts Online
21.4.3.1 Vocabularies and Ontologies
21.4.3.2 Metadata Schemas
21.4.3.3 Semantic eCulture Systems Online
21.5 Future Issues
21.6 Cross-References
Abstract: This chapter turns to the application of semantic technologies to areas
where text is not dominant, but rather audiovisual content in the form of images,
3D objects, audio, and video/television. Non-textual digital content raises new
challenges for semantic technology in terms of capturing the meaning of that content
and expressing it in the form of semantic annotation. Where such annotations are
available in combination with expressive ontologies describing the target domain, such
as television and cultural heritage, new and exciting possibilities arise for multimedia
applications.
21.1 Introduction
Ever since computers became capable of processing data other than text (capturing, storing, and processing images, 3D models, audio, and video), the questions of how to describe these data so that they can be found again, and how to process them so that they can be reused in new contexts, have been studied in the field of multimedia systems. The subject of this chapter is the work emerging at the intersection of multimedia systems and semantic technology, leading to new insights in multimedia analysis
and annotation, and by extension new applications in areas like broadcasting (television)
and cultural heritage.
General cross-media queries are textual in nature, as text is considered the easiest medium for a computer system to handle. So that queries can be matched to media, the media objects are manually annotated with text; established text-matching algorithms then become applicable to multimedia retrieval. This additional annotation of data is often referred to as "metadata," meaning "data about data." In annotation-based systems, how the user forms the query strongly determines the success of retrieval, both because of the ambiguity of natural language and because the user may be unaware of how the media have been annotated. Such systems are also unaware of the broader meaning of the terms used in their metadata vocabulary, for example, that the keyword "Ford Orion" denotes a specific kind of "car," which is a "vehicle." Hence, retrieval is rather coarse: only media with the exact annotation searched for are returned, and similar media are missed. To overcome this, text-based approaches
such as Latent Semantic Indexing [1] analyze natural language and associate related
words. This type of approach is still very dominant on the Web, for example, Google
Image Search (possibly the most used image retrieval system on the Web at the time of
writing) associates images with the text closest to them on the HTML page. In all these
cases, the metadata are determinable only as a result of there already being natural
language text associated with the media.
The set of semantic technologies addressed previously offers a new solution to the
problems of multimedia retrieval and processing. Multimedia annotations can become
richer than just simple metadata with keywords, where the use of ontologies enables the
annotator to link annotation values to knowledge about the wider domain, whether that
domain is that of the media object’s representation (e.g., a picture of a Ford Orion linked
into an ontology about cars) or of the media object itself (e.g., metadata on an art painting
is described in terms of an ontology about art paintings, which can capture domain
knowledge about paintings such as the materials used, style employed, etc.). Hence, the
choice of an appropriate semantic schema to annotate multimedia content is important
with respect to its future (re)use and most multimedia schemas in use today are not
immediately usable together with ontologies and reasoners.
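To make the contrast with plain keyword matching concrete, the following minimal sketch uses the Python rdflib library; the ex: namespace, the tiny class hierarchy, and the instance data are hypothetical illustrations, not part of any ontology discussed in this chapter.

    # Minimal sketch: ontology-aware image retrieval with rdflib.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex:   <http://example.org/cars#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    ex:FordOrion rdfs:subClassOf ex:Car .
    ex:Car       rdfs:subClassOf ex:Vehicle .

    ex:myOrion a ex:FordOrion .        # a particular car
    ex:photo1 a foaf:Image ;           # an image annotated with it
        foaf:depicts ex:myOrion .
    """, format="turtle")

    # A flat keyword index annotated only with "FordOrion" misses a query
    # for "vehicle"; the SPARQL 1.1 property path below walks the subclass
    # hierarchy and retrieves the image anyway.
    q = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/cars#>
    SELECT ?img WHERE {
        ?img foaf:depicts ?thing .
        ?thing a/rdfs:subClassOf* ex:Vehicle .
    }
    """
    for row in g.query(q):
        print(row.img)   # -> http://example.org/cars#photo1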
> Section 21.2.1 begins the exploration of semantic multimedia with the current
status of multimedia ontologies for annotation and the work toward a shared Media
Ontology. This is complemented by an overview of current tools for multimedia anno-
tation in > Sect. 21.2.2.
Multimedia content selection is a very different and difficult problem in comparison
with textual retrieval where the query is usually also textual and is realized by string
matching in the content store, aided by devices such as stemming and synonyms. The key
problems in the case of multimedia retrieval are that the form of the query does not generally match the form of the media being queried, and that even when query and media share the same form (e.g., a user whistling to search an audio database), matching techniques are more complex than with text. In media industries, however, it is typical to search image data on the basis of an existing image, or audio on the basis of a note or sample. Here, MIR (multimedia information retrieval) research to improve so-called query by example focuses on low-level feature extraction and on developing classifiers that map these low-level features to a high-level concept. However, such low-level matching requires the query to be in the same form as the stored media and, conversely, the stored media to be all of a single form. Hence, mixed media stores are excluded from this approach, and queries are often not intuitive to the general user (e.g., much depends on the user's skill at drawing or whistling). The use of such classification techniques to support multimedia annotation not only reduces human annotation effort but also provides the means for cross-media search, or even mixed-form queries (query by example with identification of the concepts sought, to better rank or filter results). > Section 21.2.3
introduces semantic multimedia analysis techniques to better train the classifiers and
extract concepts from low-level features.
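As an illustration of this classifier-based approach (a generic sketch, not any specific system from the literature), the following example trains a support vector machine to map toy color-histogram features to a high-level concept; real systems use far richer features and much larger training sets.

    # Illustrative sketch: low-level color-histogram features are mapped
    # to a high-level concept by a trained classifier. Feature values and
    # labels are toy data, not from any real collection.
    import numpy as np
    from sklearn.svm import SVC

    # 4-bin color histograms (e.g., fractions of blue/green/sand/other pixels)
    X_train = np.array([
        [0.70, 0.05, 0.20, 0.05],   # mostly blue + sand -> "beach"
        [0.60, 0.10, 0.25, 0.05],   # -> "beach"
        [0.10, 0.75, 0.05, 0.10],   # mostly green -> "forest"
        [0.05, 0.80, 0.05, 0.10],   # -> "forest"
    ])
    y_train = ["beach", "beach", "forest", "forest"]

    clf = SVC(kernel="linear").fit(X_train, y_train)

    # An unseen image's histogram is classified to a semantic concept,
    # which can then serve as an annotation for cross-media search.
    query = np.array([[0.65, 0.08, 0.22, 0.05]])
    print(clf.predict(query))       # -> ['beach']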
The broadcasting industry relies on schemas and standards for its metadata, and a look
into developments toward a semantic schema standard for broadcasters in the future is
provided in > Sect. 21.2.4.
In turn, semantic data about multimedia objects allow them to be processed and
manipulated in similar ways to other instance data, for example, SPARQL-based retrieval
of matching objects, data mediation to ensure interoperability of schemas across
systems, or transformations to effect adaptation of the described media object to new
devices or contexts. In the rest of this chapter, in > Sect. 21.3, applications of semantic
technologies applied and adapted to the multimedia domain are presented, with examples
from the broadcasting (> Sect. 21.3.1) and cultural heritage (> Sect. 21.3.2) sectors,
respectively.
After providing some key papers on the current work in semantic multimedia
(> Sect. 21.4), > Sect. 21.5 will turn to the future trends in this area, considering
particularly how future TV viewers and explorers of 3D virtual worlds may benefit from
today’s research and enjoy the use of semantic technologies without being aware of it.
21.2 Scientific and Technical Overview
Before turning to applications of semantic multimedia and its future in society and
industry, it is necessary to introduce the current state of the art of the building blocks of
semantic multimedia: first, the vocabularies that are formalized as ontologies and used to semantically describe multimedia content; then the means of creating those annotations, both through manual editing in tools and through automated generation using multimedia analysis techniques; and finally, how the state of the art in the broadcasting industry is moving toward the use of semantic technology.
21.2.1 Multimedia Semantics: Vocabularies and Tools
The availability of interoperable semantic metadata is crucial for effectively handling the growing amount of multimedia assets encountered in the plethora of applications addressing both personal and professional multimedia data usage. The multimedia community is increasingly adopting Semantic Web technologies in order to enable large-scale interoperability between media descriptions and to benefit from the advantages that explicit semantics brings to the reuse, sharing, and processing of metadata.
Multimedia annotations present several challenges. One of them is to enable users to
describe the content of some assets with respect to specific domain ontologies, but
contrary to the annotation of textual resources, multimedia content does not contain
canonical units (similar to words) that would have a predefined meaning. In the case of
media annotation, particular requirements apply as a result of the intrinsically multidis-
ciplinary nature of multimedia content. Among the most fundamental of these is the
ability to localize and annotate specific subparts within a given media asset, such as
regions in a still image or moving objects in video sequences. The modeling of the
structural and decomposition knowledge involved in the localization of individual
media segments varies across vocabularies and has different levels of support among
the existing annotation tools. Further differences arise from the supported types of metadata, the granularity and expressivity of the annotations, the intended context of usage, and so on, leaving a rather unclear picture regarding the sharing and reuse of the generated multimedia annotations.
21.2.1.1 Multimedia Vocabularies on the Semantic Web
There has been a proliferation of metadata formats to express information about media
objects. For example, pictures taken by a camera come with EXIF metadata related to the
image data structure (height, width, orientation), the capturing information (focal length,
exposure time, flash), and the image data characteristics (transfer function, color space
transformation). These technical metadata are generally complemented by other standards aimed at describing the subject matter. DIG35 is a specification of the International
Imaging Association (I3A). It defines, within an XML Schema, metadata related to image
parameters, creation information, content description (who, what, when, and where),
history, and intellectual property rights. XMP provides a native RDF data model and
predefined sets of metadata property definitions such as Dublin Core, basic rights, and
media management schemas for describing still images. IPTC has itself integrated XMP in
its Image Metadata specifications.
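As a hedged sketch of how such embedded technical metadata can be lifted into RDF, the following example reads a few EXIF tags with the Pillow library and prints them as Turtle; the ex: namespace and property names are made up for illustration, and a real application would map to a standard vocabulary such as the Ontology for Media Resources discussed below.

    # Sketch: lifting a photo's embedded EXIF metadata into RDF.
    # Assumes a local file photo.jpg; ex: properties are hypothetical.
    from PIL import Image, ExifTags

    img = Image.open("photo.jpg")
    exif = img.getexif()

    print("@prefix ex: <http://example.org/photo#> .")
    print("<photo.jpg>")
    for tag_id, value in exif.items():
        name = ExifTags.TAGS.get(tag_id, str(tag_id))
        if name in ("ImageWidth", "ImageLength", "Orientation", "Model"):
            print(f'    ex:{name} "{value}" ;')   # one Turtle line per tag
    print("    a ex:Photo .")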
Video can be decomposed and described using MPEG-7, the Multimedia Content
Description ISO Standard. This language provides a large and comprehensive set of
descriptors including multimedia decomposition descriptors, management metadata
properties, audio and visual low-level features, and more abstract semantic concepts.
From the broadcast world, the European Broadcasting Union (EBU) has actively contributed to the video extension of the new version of IPTC NewsML-G2, based on IPTC's NAR architecture for describing videos, providing extensions that make it possible to associate metadata with arbitrary parts of videos and a vocabulary for rights management. The EBU has also developed the EBUCore, P-Meta, and TV-Anytime standards for production, archives, and electronic program guides (EPGs). Finally, video-sharing platforms generally provide their own lightweight metadata schemas and APIs, such as Yahoo!'s Media RSS or Google Video sitemaps.
Many of these formats are further described and discussed in [2]. On the one hand, one observes an environment of numerous languages and formats, often XML-based, which leads to interoperability problems and precludes linking to other vocabularies and
existing Web knowledge resources. On the other hand, there is a need for using and
combining some of these metadata formats on the Web and there has been research work
for enabling interoperability using Semantic Web technologies. The following first
describes the various attempts to bring the most famous standard, MPEG-7, into the
Semantic Web. Then an ontology is presented for media resources that aims to be a future
W3C recommendation.
Comparing Four Different MPEG-7 Ontologies
MPEG-7, formally named Multimedia Content Description Interface [3], is an ISO/IEC
standard developed by the Moving Picture Experts Group (MPEG) for the structural and
semantic description of multimedia content. MPEG-7 standardizes tools or ways to define
multimedia Descriptors (Ds), Description Schemes (DSs), and the relationships between
them. The descriptors correspond either to the data features themselves, generally low-
level features such as visual (e.g., texture, camera motion) and audio (e.g., spectrum,
harmony), or semantic objects (e.g., places, actors, events, objects). Ideally, most low-level
descriptors would be extracted automatically, whereas human annotation would be
required for producing high-level descriptors. The description schemes are used for
grouping the descriptors into more abstract description entities. These tools as well as
their relationships are represented using the Description Definition Language (DDL). After
a requirement specification phase, the W3C XML Schema recommendation has been
adopted as the most appropriate syntax for the MPEG-7 DDL.
The flexibility of MPEG-7 is therefore based on allowing descriptions to be associated
with arbitrary multimedia segments, at any level of granularity, using different levels of
abstraction. The downside of the breadth targeted by MPEG-7 is its complexity and its
ambiguity. Indeed, the MPEG-7 XML Schemas define 1,182 elements, 417 attributes, and
377 complex types, which make the standard difficult to manage. Moreover, the use of
XML Schema implies that a great part of the semantics remains implicit. For example,
very different syntactic variations may be used in multimedia descriptions with the same
intended semantics, while remaining valid MPEG-7 descriptions. Given that the standard
does not provide a formal semantics for these descriptions, this syntax variability causes
serious interoperability issues for multimedia processing and exchange [4–6]. The profiles
introduced by MPEG-7 and their possible formalization [7] concern, by definition, only
a subset of the whole standard. For alleviating the lack of formal semantics in MPEG-7,
four multimedia ontologies represented in OWL and covering the whole standard have
been proposed (> Table 21.1) [8–10]; their proposers have jointly compared and discussed the four modeling approaches [11]. Below, these ontologies are briefly described, and then their commonalities and differences are outlined using three
criteria: (1) the way the multimedia ontology is linked with domain semantics; (2) the
MPEG-7 coverage of the multimedia ontology; and (3) the scalability and modeling
rationale of the conceptualization.
In 2001, Hunter proposed an initial manual translation of MPEG-7 into RDFS
(and then into DAML+OIL) and provided a rationale for its use within the Semantic
Web [9]. This multimedia ontology was translated into OWL, and extended and harmo-
nized using the ABC upper ontology [12] for applications in the digital libraries [13] and
eResearch fields [14]. The current version is an OWL Full ontology containing classes
defining the media types (Audio, AudioVisual, Image, Multimedia, and Video) and
. Table 21.1
Summary of the different MPEG-7-based multimedia ontologies

                Hunter [9]            DS-MIRF [10]          Rhizomik [8]          COMM
Foundations     ABC                   None                  None                  DOLCE
Complexity      OWL Full              OWL DL                OWL DL                OWL DL
URL             metadata.net/mpeg7/   www.music.tuc.gr/     rhizomik.net/         multimedia.
                                      ontologies/           ontologies/           semanticweb.org/
                                      MPEG703.zip           mpeg7ontos            COMM/
Coverage        MDS+Visual            MDS+CS                All                   MDS+Visual
Applications    Digital libraries,    Digital libraries,    Digital rights        Multimedia analysis
                eResearch             eLearning             management,           and annotations
                                                            e-business
the decompositions from the MPEG-7 Multimedia Description Schemes (MDS) part. The
descriptors for recording information about the production and creation, usage, struc-
ture, and the media features are also defined. The ontology can be viewed in Protege
(http://protege.stanford.edu/) and has been validated using the WonderWeb OWL
Validator (http://www.mygrid.org.uk/OWL/Validator). This ontology has usually been
applied to describe the decomposition of images and their visual descriptors for use in
larger semantic frameworks. Harmonizing through an upper ontology, such as ABC,
enables queries for abstract concepts such as subclasses of events or agents to return
media objects or segments of media objects. While the ontology has most often been
applied in conjunction with the ABC upper model, it is independent of that ontology and
can also be harmonized with other upper ontologies such as SUMO [15] or DOLCE [16].
In 2004, Tsinaraki et al. proposed the DS-MIRF ontology that fully captures in OWL
DL the semantics of the MPEG-7 MDS and the Classification Schemes. The ontology can
be visualized with GraphOnto or Protege and has been validated and classified with the
WonderWeb OWL Validator. The ontology has been integrated with OWL domain
ontologies for soccer and Formula 1 in order to demonstrate how domain knowledge
can be systematically integrated in the general-purpose constructs of MPEG-7. This
ontological infrastructure has been utilized in several applications, including audiovisual
digital libraries and eLearning. The DS-MIRF ontology has been conceptualized manually,
according to the methodology outlined in [10]. The XML Schema simple datatypes
defined in MPEG-7 are stored in a separate XML Schema to be imported in the DS-
MIRF ontology. The names of the XML elements are generally kept in the rdf:IDs of the
corresponding OWL entities, except when two different XML Schema constructs have the
same names. The mapping between the original names of the MPEG-7 descriptors and
the rdf:IDs of the corresponding OWL entities is represented in an OWL DL mapping
ontology. Therefore, this ontology will represent, for example, that the Name element of
the MPEG-7 type TermUseType is represented by the TermName object property, while
the Name element of the MPEG-7 type PlaceType is represented by the Name object
property in the DS-MIRF ontology. The mapping ontology also captures the semantics of
the XML Schemas that cannot be mapped to OWL constructs such as the sequence
element order or the default values of the attributes. Hence, it is possible to return to an
original MPEG-7 description from the RDF metadata using this mapping ontology. This
process has been partially implemented in GraphOnto [17], for the OWL entities that
represent the SemanticBaseType and its descendants. The generalization of this
approach has led to the development of a transformation model for capturing the
semantics of any XML Schema in an OWL DL ontology [18]. The original XML Schema
is converted into a main OWL DL ontology, while an OWL DL mapping ontology keeps
track of the constructs mapped in order to allow circular conversions.
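A minimal sketch of this mapping-ontology idea follows: because two MPEG-7 elements can share a name (the Name example above), the mapping is keyed by the (type, element) pair, so that OWL property names stay unique and the conversion can be reversed. Only the two entries from the text are real; the code structure itself is illustrative.

    # Illustration of the DS-MIRF-style mapping idea: keyed by
    # (MPEG-7 type, element) so that RDF metadata can be converted
    # back to the original MPEG-7 description without ambiguity.
    MPEG7_TO_OWL = {
        ("TermUseType", "Name"): "TermName",
        ("PlaceType", "Name"): "Name",
    }
    # OWL property names are unique, so the inverse mapping is well defined.
    OWL_TO_MPEG7 = {owl: xml for xml, owl in MPEG7_TO_OWL.items()}

    def to_owl_property(mpeg7_type: str, element: str) -> str:
        return MPEG7_TO_OWL[(mpeg7_type, element)]

    def to_mpeg7_element(owl_property: str) -> tuple:
        # Round trip: recover the original (type, element) pair.
        return OWL_TO_MPEG7[owl_property]

    assert to_owl_property("TermUseType", "Name") == "TermName"
    assert to_mpeg7_element("TermName") == ("TermUseType", "Name")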
In 2005, Garcia and Celma presented the Rhizomik approach that consists of mapping
XML Schema constructs to OWL constructs, following a generic XML Schema to OWL mapping, together with an XML to RDF conversion [8]. Applied to the MPEG-7 schemas, the
resulting ontology covers the whole standard as well as the Classification Schemes and
TV-Anytime (http://tech.ebu.ch/tvanytime). It can be visualized with Protege or Swoop
(http://code.google.com/p/swoop) and has been validated and classified using the
WonderWeb OWL Validator and Pellet. The Rhizomik ontology was originally expressed
in OWL Full, since 23 properties must be modeled using an rdf:Property because they
have both a datatype and object-type range, that is, the corresponding elements are both
defined as containers of complex types and simple types. An OWL DL version of the
ontology has been produced, solving this problem by creating two different properties
(owl:DatatypeProperty and owl:ObjectProperty) for each of them. This change
is also incorporated into the XML2RDF step in order to map the affected input XML
elements to the appropriate OWL property (object or data type), depending on the kind of
content of the input XML element. The main contribution of this approach is that it
benefits from the great amount of metadata that has already been produced by the XML
community. Moreover, it allows the automatic mapping of input XML Schemas to OWL
ontologies and XML data based on them to RDF metadata following the resulting
ontologies. This approach has been used with other large XML Schemas in the Digital
Rights Management domain, such as MPEG-21 and ODRL [19], and in the eBusiness
domain [20].
In 2007, Arndt et al. have proposed COMM, the Core Ontology of Multimedia, for
annotation. Based on early work [21, 22], COMM has been designed manually by completely reengineering MPEG-7 according to the intended semantics of the written
standard. The foundational ontology DOLCE serves as the basis of COMM. More
precisely, the Description and Situation (D&S) and Ontology of Information Objects
(OIO) patterns are extended into various multimedia patterns that formalize the MPEG-7
concepts. The use of an upper-level ontology provides a domain-independent vocabulary
that explicitly includes formal definitions of foundational categories, such as processes or
physical objects, and eases the linkage of domain-specific ontologies because of the
definition of top-level concepts. COMM covers the most important part of MPEG-7
that is commonly used for describing the structure and the content of multimedia
documents. Current investigations show that parts of MPEG-7 that have not yet been
considered (e.g., navigation and access) can be formalized analogously to the other
descriptors through the definition of other multimedia patterns. COMM is an OWL
DL ontology that can be viewed using Protege. Its consistency has been validated using
FaCT++ v1.1.5. Other reasoners failed to classify it due to the large number of DL
axioms that are present in DOLCE. The presented OWL DL version of the core module is
just an approximation of the intended semantics of COMM since the use of OWL 1.1 (e.g.,
qualified cardinality restrictions for number restrictions of MPEG-7 low-level descrip-
tors) and even more expressive logic formalisms are required for capturing its complete
semantics.
To compare the four MPEG-7-based ontologies described above, consider a task to
annotate the famous ‘‘Big Three’’ picture, taken at the Yalta (Crimea) Conference, showing
the heads of government of the USA, the UK, and the Soviet Union during World War II.
The description could be obtained either manually or automatically from an annotation
tool. It could also be the result of an automatic conversion from an MPEG-7 description.
The annotation should contain the media identification and locator, define the still region
SR1 of the image, and provide the semantics of the region using http://en.wikipedia.org/
wiki/Churchill for identifying the resource Winston Churchill. > Figure 21.1 depicts
the RDF descriptions generated for these four ontologies.

[Fig. 21.1: An image described according to an MPEG-7 ontology: (a) Rhizomik approach; (b) DS-MIRF approach; (c) Hunter approach; (d) COMM approach. The image is a visual representation of the resource identified by http://en.wikipedia.org/wiki/Image:Yalta_Conference.jpg. The figure's RDF graphs (not reproducible here) use, among others, mpeg7:StillRegion, mpeg7:SpatialMask, and mpeg7:depicts nodes linked to dbpedia:Churchill.]
The link between a multimedia ontology and any domain ontologies is crucial. In the
example, a more complete description could include information about ‘‘Churchill’’ (a
person, a British Prime Minister, etc.) and about the event. In addition, details about the
provenance of the image (e.g., date taken, photographer, camera used) could also be
linked to complete the description. The statements contained in the descriptions above, in
conjunction with any of the four underlying ontologies presented in this paper, can then
be used to answer queries such as ‘‘find all images depicting Churchill ’’ or ‘‘find all media
depicting British Prime Ministers.’’ Furthermore, subjective queries such as ‘‘find images
with a ‘bright’ segment in them,’’ where ‘‘bright’’ is defined as mpeg7:DominantColor
greater than rgb(220,220,220), are also possible.
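The following hedged sketch shows how such a subjective query could be posed with rdflib. Real MPEG-7-based ontologies encode the dominant color as a structured descriptor; here a simplified, hypothetical encoding with one integer per channel keeps the example short.

    # Hedged sketch of the "bright segment" query; the ex: properties
    # are a simplified stand-in for a structured dominant-color descriptor.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/mm#> .

    ex:yalta ex:hasRegion ex:SR1 .
    ex:SR1 ex:dominantR 245 ; ex:dominantG 240 ; ex:dominantB 235 .
    """, format="turtle")

    q = """
    PREFIX ex: <http://example.org/mm#>
    SELECT ?img WHERE {
        ?img ex:hasRegion ?r .
        ?r ex:dominantR ?red ; ex:dominantG ?green ; ex:dominantB ?blue .
        FILTER (?red > 220 && ?green > 220 && ?blue > 220)
    }
    """
    for row in g.query(q):
        print(row.img)   # -> http://example.org/mm#yalta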
Hunter’s MPEG-7 and COMM ontologies both use an upper ontology approach to
relate with other ontologies (ABC and DOLCE). Hunter’s ontology uses either semantic
relations from MPEG-7, such as depicts, or defines external properties that use an
MPEG-7 class, such as mpeg7:Multimedia, as the domain or range. In COMM, the link
with existing vocabularies is made within a specific pattern: the Semantic Annotation
Pattern, reifying the DOLCE Ontology of Information Object (OIO) pattern. Conse-
quently, any domain-specific ontology goes under the dolce:Particular or owl:Thing class. The DS-MIRF ontology integrates domain knowledge by subclassing one
of the MPEG-7 SemanticBaseType: places, events, agents, etc. Furthermore, it fully
captures the semantics of the various MPEG-7 relationships represented as instances of
the RelationType. According to the standard, the value of these properties must come
from some particular classification schemes: RelationBaseCS, TemporalRelationCS, SpatialRelationCS, GraphRelationCS, and SemanticRelationCS.
A typed relationship ontology extending DS-MIRF has been defined for capturing all
these relationships.
An important modeling decision for each of the four ontologies is how much they are
tied to the MPEG-7 XML Schema. These decisions impact upon the ability of the ontology
to support descriptions generated automatically and directly from MPEG-7 XML output
and on the complexity of the resulting RDF. Therefore, the modeling choices also affect
the scalability of the systems using these ontologies and their ability to handle large media
datasets and cope with reasoning over very large quantities of triples. Both the DS-MIRF
and the Rhizomik ontologies are based on a systematic one-to-one mapping from the
MPEG-7 descriptors to equivalent OWL entities. For the DS-MIRF ontology, the mapping
has been carried out manually, while for the Rhizomik ontology, it has been automated
using an XSL transformation, and it is complemented with an XML to RDF mapping. This
has been a key motivator for the Rhizomik ontology and the ReDeFer tool where the
objective is to provide an intermediate step before going to a more complete multimedia
ontology, such as COMM. The advantage of the one-to-one mapping is that the trans-
formation of the RDF descriptions back to MPEG-7 descriptions may be automated later
on. In addition, this approach enables the exploitation of legacy data and allows existing
tools that output MPEG-7 descriptions to be integrated into a semantic framework. The
main drawback of this approach is that it does not guarantee that the intended semantics
of MPEG-7 is fully captured and formalized. Rather, the syntactic interoperability and conceptual ambiguity problems, such as the various ways of expressing a semantic annotation, remain.
The COMM ontology avoids a one-to-one mapping in order to resolve these ambiguities that come from the XML Schemas, while an MPEG-7-to-COMM converter is still
available for reusing legacy metadata. A direct translation from an MPEG-7 XML descrip-
tion using Hunter’s ontology is possible. However, in practice, the multimedia semantics
captured by the ontology have instead been used to link with domain semantics. There-
fore, rather than translating MPEG-7 XML descriptions into RDF, this ontology has been
used to define semantic statements about a media object and to relate these statements to
the domain semantics. This results in a smaller number of triples.
The MPEG-7-based ontologies discussed here aim to provide richer semantics and
better frameworks for multimedia description and exchange than can be addressed by
current standards. For further reading, the interested reader can also refer to [116] that
surveys the state of the art of MPEG-7-based ontologies. Related efforts to develop
multimedia ontologies include the following: the Visual Descriptor Ontology (VDO)
[23] is based on the MPEG-7 Visual part and used for image and video analysis; [24] have
proposed a visual ontology by extending WordNet with multimedia semantics from
Hunter’s ontology, specifically for use within the museums and art domain; [25] devel-
oped an MPEG-7-based ontology and applied it to annotating football (soccer) videos;
similar to the approach used in Hunter’s ontology and in COMM, this ontology uses the
decomposition and visual components of MPEG-7 and captures high-level domain
semantics in domain-specific ontologies.
Toward a Standardized Ontology for Media Resources
The Ontology for Media Resources currently being specified in W3C is a core vocabulary
that covers basic metadata properties to describe media resources (see www.w3.org/TR/
mediaont-10/). The goal of this ontology is to address the interoperability problem by
providing a common set of properties defining the basic metadata needed for media
resources and the semantic links between their values in different existing vocabularies.
The ontology can be used to attach different types of metadata to the media, such as the
duration, the target audience, the copyright, the genre, and the rating. Media fragments
can also be defined in order to have a smaller granularity and attach keywords or formal
annotations to parts of the video. The ontology will also be accompanied by an API that
provides uniform access to all elements defined by the ontology.
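As a small illustration of attaching a keyword to a media fragment, the sketch below combines a W3C Media Fragments URI (#t= for a time range, xywh= for a spatial region) with ma: properties named as in the draft ontology (see also > Fig. 21.2 below); the video URI and keyword are hypothetical.

    # Sketch: annotating a spatiotemporal part of a video with a keyword,
    # using a Media Fragments URI and Ontology for Media Resources terms.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ma: <http://www.w3.org/ns/ma-ont#> .

    # Seconds 10-20 of the video, cropped to a 320x240 region.
    <http://example.org/video.mp4#t=10,20&xywh=160,120,320,240>
        a ma:MediaFragment ;
        ma:keyword "train" .
    """, format="turtle")
    print(len(g))   # -> 2 triples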
The purpose of the mappings defined in the ontology is to enable different applica-
tions to share and reuse metadata represented in heterogeneous metadata formats. For
example, creator is a common property that is supported in many metadata formats.
Therefore, it is defined as one of the properties in the core vocabulary of the ontology for
media resources and aligned with other vocabularies. Ideally, the mappings defined in the
ontology should be used to reconcile the semantics of a term defined in a particular schema.

[Fig. 21.2: Media Ontology annotation (Courtesy Raphaël Troncy, from data.linkedevents.org):

    <http://data.linkedevents.org/media/4303994975> a ma:Image ;
        dc:title "Radiohead / Thom Yorke" ;
        ma:locator <http://farm3.static.flickr.com/2726/4303994975_74302c45b5_o.png> ;
        ma:createDate "2010-01-25T12:27:21"^^xsd:dateTime ;
        ma:frameWidth "1280"^^xsd:integer ;
        ma:frameHeight "720"^^xsd:integer ;
        ma:keyword "colin" ;
        ma:keyword "radiohead" . ]

However, this cannot be easily achieved, due to the many differences in the
semantics that are associated with each property in the mapped vocabularies. For exam-
ple, the property dc:creator from Dublin Core and the property exif:Artist
defined in EXIF are both aligned to the property ma:creator. However, the extension
of the property in the EXIF vocabulary (i.e., the set of values that the property can have) is
more specific than the corresponding set of values that this property can have in Dublin
Core. Therefore, mapping back and forth between properties from different schemas,
using this ontology as a reference, will induce a certain loss in semantics. The axioms
representing the mappings are defined as an exact, broader, or narrower mapping between
two properties (> Fig. 21.2).
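The direction of each mapping determines when value propagation is safe. The sketch below encodes the dc:creator/exif:Artist example from the text; the data structure and decision logic are illustrative, not part of the W3C specification.

    # Sketch of the lossy mapping issue: each alignment to the ma: pivot
    # property carries a direction; propagating a value is safe from the
    # narrower to the broader side, but not the reverse.
    MAPPINGS = {
        "dc:creator":  ("ma:creator", "exact"),
        "exif:Artist": ("ma:creator", "narrower"),  # EXIF values are more specific
    }

    def translate(source_prop: str, target_prop: str) -> str:
        pivot_s, dir_s = MAPPINGS[source_prop]
        pivot_t, dir_t = MAPPINGS[target_prop]
        if pivot_s != pivot_t:
            return "no mapping"
        if dir_s == "exact" and dir_t == "exact":
            return "lossless"
        if dir_s == "narrower":            # narrower -> broader is safe
            return "safe"
        return "potentially lossy"         # broader -> narrower

    print(translate("exif:Artist", "dc:creator"))  # -> safe
    print(translate("dc:creator", "exif:Artist"))  # -> potentially lossy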
21.2.2 Semantic Web–Based Multimedia Annotation Tools
As already sketched above, multimedia annotations come in a multilayered, intertwined fashion, encompassing among others descriptions of the conveyed subject matter (e.g., a train arriving at the station on a rainy day), content
structure (e.g., the specific image region depicting the train or the part of the video
capturing the train as it approaches), visual features (e.g., the values of the color descrip-
tors corresponding to the rainy sky image parts), administrative information (e.g.,
ownership and editing rights), and so forth. Different aspects pertain depending on the
annotation needs and the particular application context addressed each time, as illus-
trated in the description of state-of-the-art ontology-based annotation tools that follows.
SWAD (http://swordfish.rdfWeb.org/discovery/2004/03/w3photo/annotate.html),
though no longer maintained, constitutes one of the first Semantic Web–based
implementations addressing the manual annotation of images. Through a Web-based
interface, it allows the user to insert descriptions regarding who or what is depicted in
an image (person, object, and event), when and where it was taken, and additional
creation and licensing information. Annotations are exported in RDF. Despite the very
early stage of Semantic Web technologies uptake, SWAD used an impressive number of
RDF vocabularies, including FOAF, the Dublin Core element set, RDF iCalendar (http://www.w3.org/2002/12/cal/), as well as a then-experimental namespace for WordNet.
PhotoStuff (http://www.mindswap.org/2003/PhotoStuff/) is a platform-independent
ontology-based annotation tool that allows users to describe the contents of an image with
respect to ontology concepts, as well as administrative information about the image such
as the creation date [26]. Multiple RDF(S) or OWL ontologies can be simultaneously
loaded, facilitating the user in the creation of annotations distributed across many
ontologies. Classes from the ontologies can be associated to the entire image or specific
regions using one of the available drawing tools (circle, rectangle, and polygon), while the necessary definitions (region, localization, and so forth) are provided by a built-in image-region ontology. Annotations referring to relations can also be created by connecting
concept annotation instances that have been already identified in an image using prop-
erties from the uploaded ontologies. PhotoStuff also takes advantage of existing metadata
embedded in image files (such as EXIF) by extracting and encoding such information in
RDF/XML. The property depicts from FOAF and its inverse (depiction) are used to
link image (region) instances with domain ontology concept instances. Finally,
PhotoStuff supports browsing, searching, and managing digital image annotations
through a loosely coupled connection with a Semantic Web portal (> Fig. 21.3).

[Fig. 21.3: Annotation screenshot using PhotoStuff]
AktiveMedia (http://www.dcs.shef.ac.uk/~ajay/html/cresearch.html), developed
within the AKT (http://www.aktors.org/akt/) and X-Media (http://www.x-media-project.
org/) projects, is an ontology-based cross-media annotation system addressing text and
image assets [27]. In the image annotation mode, AktiveMedia supports content markup
with respect to multiple domain-specific ontologies. Unlike PhotoStuff, only a single
ontology is displayed each time in the ontology browser. The user-created annotations
may refer to an image or a region level (using the provided rectangular and circular
drawing facilities), and a simple built-in schema is used to capture localization informa-
tion. To describe an image as a whole, AktiveMedia provides three free text fields, namely, title, content, and comment. Using the text annotation mode, the user-entered descriptions can be subsequently annotated with respect to an ontology. The supported ontology languages include RDFS and OWL, as well as older Semantic Web languages such as DAML and DAML-ONT, while RDF is used for the export of the generated annotations.
An interesting feature of AktiveMedia, though not directly related to the task of image
annotation, is its ability to learn during textual annotation mode, so that suggestions can
be subsequently made to the user (> Fig. 21.4).

[Fig. 21.4: Annotation screenshot using AktiveMedia]
Following a different rationale, M-Ontomat-Annotizer, developed within the
aceMedia project (http://www.acemedia.org/aceMedia), extends typical media annotation by additionally enabling the formal representation of low-level visual features and
their linking with concepts from a domain ontology [28]. In order to formalize the linking
of domain concepts with visual descriptors, M-Ontomat-Annotizer employs the Visual
Annotation Ontology (VAO) and the Visual Descriptor Ontology (VDO) [29], both
hidden from the user. The VAO serves as a meta-ontology allowing one to model
domain-specific instances as prototype instances and to link them to respective descriptor
instances through the hasDescriptor property. The domain-specific instances, and by
analogy the extracted descriptor instances, may refer to a specific region or to the entire
image. For the identification of a specific region, the user may either make use of the
automatic segmentation functionality provided by the M-Ontomat-Annotizer or use one of the manual drawing tools, namely, the predefined shapes (rectangle and ellipse), free hand, and magic wand. The supported input ontology languages are RDFS and DAML. In
a subsequent release within the K-Space project (http://kspace.qmul.net), M-Ontomat 2.0
(http://mklab.iti.gr/m-onto2) provides support for descriptive and structural annotations
in the typical semantic search and retrieval sense (> Fig. 21.5).

[Fig. 21.5: Annotation screenshot using M-Ontomat-Annotizer]
The K-Space Annotation Tool (KAT) (https://launchpad.net/kat) is an ontology-based
framework developed within the K-Space project for the semiautomatic annotation of
multimedia content [30]. Its core provides the infrastructure of an API and set of services,
including configuration and access to Sesame and Sesame2 repositories that enable users
to implement in a plug-in-based fashion relevant functionalities such as visualization,
editing, and manipulation of semantic content annotations. The model and storage layer
of KAT is based on the Core Ontology of Multimedia (COMM) [31] and the MultiMedia
Metadata Ontology (M3O), a subsequent extension that addresses also the annotation of
rich multimedia presentations [32]. In the current release, concepts from an ontology can
be used to mark up images and respective regions that are localized manually, using either
the rectangle or the polygon drawing tools. Decomposition and localization information
is represented based on the respective COMM decomposition pattern. KAT’s flexible
architecture and COMM-based annotation model render it media independent, allowing
support for additional content types as long as respective media management function-
alities (e.g., video player) are implemented. Furthermore, the COMM-based annotation
model makes it quite straightforward to extend annotation level so as to include addi-
tional dimensions, such as low-level visual features for example, again as long as appro-
priate feature extraction plug-ins are available (> Fig. 21.6).
The Video and Image Annotation (VIA) tool, developed within the BOEMIE project
(http://www.boemie.org), provides a looser notion of ontology-based media markup.
Specifically, it supports the annotation of image and video assets using concepts from an
ontology, while allowing also for free text descriptions. Users may also add administrative
descriptions, including information about the creator of the annotations, the date of the
annotation creation, etc., based on a simple built-in schema. Image (and video frame)
annotation may address the entire image (video frame) or specific regions; in the case of
image annotation, the user can also select to extract MPEG-7 visual descriptors to enhance
annotations with low-level information. The localization of regions is performed either
semiautomatically, providing to the user a segmented image and allowing him or her to
correct it by region merging, or manually, using one of the drawing functionalities,
namely, free hand, polygon, circle, or rectangle.
Video annotation may refer to the entire video asset, video segments, moving regions,
frames, or still regions within a frame. It can be performed either in a successive frame-by-
frame fashion or in real time, where the user follows the movement of an object while the
video is playing, by dragging its bounding box. The annotations performed using VIA can
be saved as annotation projects, so that the original video, the imported ontologies, and
the annotations can be retrieved and updated at a later time. The produced metadata are
exported following a custom format, either in XML or in a more human-readable textual
form (> Fig. 21.7).
Following a different approach, a number of media annotation tools have been
developed based on MPEG-7 and custom multimedia description vocabularies. This
line of thought is particularly evident in the case of video annotation, where Semantic
Web–based technologies have hardly been employed; KAT is an exception, though cur-
rently it provides just the infrastructure and not an implementation, while VIA, though
allowing the use of a subject matter ontology for video annotation, uses a proprietary
format for encoding the generated metadata. Prominent examples of non-Semantic Web
compliant video annotation tools include IBM’s VideoAnnEx, Anvil, the Semantic Video
Annotation Suite, Ontolog, Elan, etc. (despite some names, none of these tools produces
metadata based on an ontology). For a complete list of relevant tools and resources for
the annotation of media content, the interested reader is referred to the Tools and
Resources page of W3C Multimedia Semantics Incubator Group (http://www.w3.org/
2005/Incubator/mmsem/wiki/Tools_and_Resources).
[Fig. 21.7: Annotation screenshot using VIA]
Besides the uneven uptake of Semantic Web technologies in the annotation of
video assets compared to images, the aforementioned tools lead to a number of consid-
erations. A critical one relates to the interoperability of the generated annotation
metadata. The different ontology schemas used by the individual tools to represent
media-related descriptions and the respective modeling approaches to the linking
with domain ontologies hamper the sharing and reuse of annotations across different
applications. The situation is further aggravated, as many tools choose to follow propri-
etary schemas. This discrepancy between the available theoretical results and their
uptake in practical applications brings forth once again the trade-off between the
complexity of the proposed formal representation models and the effective fulfillment
of real-world needs. It is also quite interesting to note that many tools do not allow
users to edit previously created annotations at a later point, treating annotation as
a onetime affair.
The ambiguity that characterizes the relation and interlinking between annotations
generated by different tools induces a subsequent vagueness when assessing the appro-
priateness of each tool for a given application. For semantic search and retrieval applications, which address content at the level of perceived meaning, possible selection criteria may be the expressivity level (are ontology classes adequate, or are relation descriptions also needed?) or the granularity of the annotations (depth of spatial or temporal content
decomposition). For applications that encompass multimedia analysis aspects too, tools
that support descriptions at the level of low-level features provide the means to capture
and share this additionally required information.
Concluding, although the main focus of semantic multimedia research continues to be the automatic extraction of multimedia content annotations, the ability to generate annotations manually or semiautomatically remains a crucial pursuit. Despite
the strenuous research efforts and successful results in sporadic application domains, the
automatic extraction of multimedia semantics is still at a very naive level compared to
practical user needs. Moreover, manual content annotations contribute actively to semantic multimedia research, serving as ground truth data for evaluation purposes and as training data to support knowledge acquisition and learning tasks.
21.2.3 Semantic Multimedia Analysis
Automated multimedia content understanding has strained researchers for years in the
painstaking quest to confront the so-called semantic gap challenge, namely, the lack of
correspondence between the low-level content descriptions that can be automatically
derived and the semantic interpretation a human would attribute [33].
Since its early days, research in content understanding has been intertwined with the
use of structured, prior knowledge in the pursuit of endowing computational systems
with the notion of informed (in terms of background knowledge–driven) interpretation.
This interrelation has rendered knowledge representation and reasoning as integral
elements of the undertaken investigations.
In the 1980s and early 1990s, semantic networks and frames were widely used to model
the relevant domain knowledge and support object recognition and scene interpretation
tasks, while formal approaches, based on first-order logic, were significantly sparser
[34, 35]. The lack of consensual semantics and the customized reasoning algorithms hindered the sharing and reuse of knowledge across systems, and often led to disparate
conclusions for the same problem. Furthermore, the underlying assumption that all
aspects involved in semantic content analysis should be explicitly modeled had resulted
in extremely elaborate conceptualizations and decision-making strategies. All these led
gradually to a period of rather receding interest in the use of explicit knowledge,
a tendency further corroborated by the momentum that statistical inference approaches
have gained as generic tools for semantic image and object classification. Soon though, the
limitations of relying solely on learning using perceptual information and similarity-based
associations became apparent, reviving interest in the role of knowledge and reasoning
in multimedia content understanding [36, 37]. Following the Semantic Web initiative, ontology [115] and Description Logic (DL) languages [116] became the prevalent formalisms for capturing and representing knowledge, shaping the current literature.
The following considers the use and role of Semantic Web technologies in the current
state of the art in semantic multimedia analysis and understanding, and concludes with
a brief discussion on open issues and challenges for future directions. As space constraints
have enforced several simplifications and omissions, the interested reader is referred to the
provided related resources for a thorough treatment of the topics addressed.
> Figure 21.8 depicts a typical example framework for semantic image analysis deploying Semantic Web–based technologies.

[Fig. 21.8: Semantic image analysis architecture. An image processing and analysis stage (segmentation, feature extraction, object/scene classification over edges, regions, texture, and color) produces graded assertions such as Building(r3) ≥ 0.85, Vegetation(r4) ≥ 0.75, and above(r3,r2) ≥ 0.58; an inference and semantic interpretation stage combines these with background knowledge (domain axioms and constraints over concepts such as Landscape, Seaside, Countryside_buildings, Grass, and Sky) to derive and check higher-level descriptions such as Seaside(image) ≥ 0.64.]

The interpretation process starts with an image processing stage, where the extraction of relevant low- and intermediate-level
representations pertaining to perceptual features, such as color and texture, takes place.
This is often carried out with the joint application of spatial (temporal) segmentation.
The extracted representations may then be used directly as input facts (assertions) in the
inference process, or may undergo further processing to acquire descriptions of higher
abstraction (e.g., concept classifiers for the detection of primitive objects, as in the
illustrated example), before inference is eventually invoked. The knowledge base encodes
logical associations and constraints necessary to admit valid content interpretations, as
well as appropriate abstraction layers so as to link and map information derived from
perceptual evidence to object and scene interpretations. A possible model for a building may thus span multiple levels of abstraction: starting with constituent parts such as windows and doors, including corresponding spatial and geometric characteristics, and reaching down to edge and spatial adjacency definitions at the level of image pixels.
Knowledge modeling and inference lie at the core of the interpretation process; being intertwined with the configuration of content interpretation, they constitute the chief dimensions of differentiation in the current state of the art. For example, approaches that
realize interpretation as a stepwise transition from low- tohigh-level content representations,
place naturally large emphasis on the modeling and linking of media-related knowledge to
domain-specific knowledge. Approaches that tackle instead the configuration of content
interpretation in a formally accountable way tend to concentrate on the implications and
adaptation requirements imposed on inferencing, and usually abstract lower-level represen-
tations. Additional differences can be traced to the intrinsic ambiguity involved in content
interpretation that gives rise to diverse imprecision handling methodologies, to conceptual
modeling requirements that affect the granularity and types of knowledge considered, to
the interplay between knowledge, inference, and analysis, and so forth, to name but a few.
The following presents representative examples from the current state of the art,
outlining their main characteristics, weaknesses, and insights, starting with approaches
that consider the formal representation of media-related knowledge; these can be further
classified into those adhering to standardized definitions, such as MPEG-7, and those
following proprietary ones. In [38], domain experts define, through a graphical interface,
rules that map particular combinations of low-level visual features (color, texture, shape, and
size) to high-level semantic concepts defined in the domain ontology. These rules are
subsequently applied in order to infer descriptions of the form ‘‘still region ri depicts cj,’’
where cj is an instance of a domain concept [39, 40]. In [23], MPEG-7 compliant visual
descriptions (color, texture, shape) are used to enrich domain concepts serving as
prototypical visual instantiations. These enriched domain representations are subse-
quently used as prior knowledge to train statistical classifiers for the automated detection
of the addressed semantic concepts. The semantic link between these prototypical
instances and the respective domain concepts is established through the M-Ontomat-
Annotizer graphical annotation tool [41] making use of the Multimedia Structure
ontology, the Visual Descriptor ontology, and the Visual Annotation ontology [29].
The MPEG-7 compliant media-related knowledge representations render the afore-
mentioned approaches particularly appealing in terms of reusing and sharing the knowl-
edge involved. The extracted low-level features can be interchanged straightforwardly
between different applications, saving significant time and effort that would be required
for their recalculation. Similarly, the inference rules defined by domain experts and
enriched with visual attributes can be shared across applications enabling them to more
effectively communicate human domain knowledge, where relevant.
Addressing similar considerations, a visual ontology-guided approach to object rec-
ognition is presented in [42], this time adopting a proprietary approach to the definition
of the media-related descriptions. Domain experts populate the knowledge base using the
color, texture, and spatial concepts provided by the visual ontology to describe semantic
objects of the domain. An image processing ontology is used to formally encode notions
pertaining to the image processing level, including entities such as edge and region, and
features, such as color histograms to numerically characterize visual properties. The
linking of the visual ontology descriptions with the image processing ontology
representations can be either manually predefined or learned following a mixed
bottom-up and top-down methodology [43, 44]. Another processing-related ontology is
presented in [45], this time accounting for the algorithmic aspects of the analysis process
for the detection of semantic objects in video sequences. A domain-specific ontology
provides object descriptions extended with low-level and qualitative visual knowledge,
and a set of rules determines the sequence of steps required to detect particular semantic
objects using the definitions provided by the analysis ontology. In [116], clusters of visually
similar content, computed based on low-level features, are used as concept definitions to
enrich (via subclass relations) the linguistic terms comprising the domain ontology.
Besides differences in the modeling and engineering of the background knowledge, the
aforementioned approaches share a common underlying assumption, modeling interpre-
tation as straight bottom-up deduction by augmenting the initial perceptual facts through
inference upon the available background knowledge. Although this assumption may be
true in given applications, the incompleteness, ambiguity, and complexity that charac-
terize the task of content interpretation, in general, render such a view of limited
applicability. To meet the challenge of selecting among plausible alternatives while
constructing an interpretation, a number of approaches have investigated more closely
the requirements imposed on inferencing.
In a series of works [46–48], Description Logics are examined for high-level scene
interpretation based on the notion of aggregates, that is, concepts that consist of multiple
parts that are constrained with respect to particular spatial (or temporal) relations. High-
level concepts are linked to corresponding view-concepts, which realize the grounding with
low-level processing evidence and initiate the (partial) instantiation of aggregates. The
interpretation process is modeled as a recursive search in the space of alternative inter-
pretations exploiting the logical structure of the aggregates in a mixed bottom-up and
top-down fashion. The existence of multiple models, though, leaves a great degree of
freedom in choosing which alternative to examine first each time. In [49],
a middle layer serves as mediator, attempting to match hypotheses from the high-level
interpretation layer to the available evidence; if a hypothesis is neither confirmed nor
refuted, low-level image processing is invoked again with accordingly updated parameters.
Work toward a probabilistic model for handling dependencies in compositional aggregate
hierarchies is sketched in [50], with the purpose of providing preference measures to
guide the interpretation steps.
In [51], the media interpretation configuration described above is extended to formalize
interpretation as abduction over Description Logic ABoxes. The set φ of input assertions
provided by image analysis is split into bona fide assertions φ1, which are considered
true by default, and bona fiat ones φ2, which need to be explained. Interpretation is
then formalized as the abductive problem of finding the explanations α such that
Σ, φ1, α ⊨ φ2 holds, where Σ denotes the background knowledge. In the presented
experimental setting, φ2 corresponds to the set of spatial relationship assertions. Such
a division, however, is arbitrary and cannot be justified formally; similar considerations
hold for the definition of the backward-chaining rules used to implement the abductive
reasoning. Preference over the possible explanations is determined in terms of the number
of (new) individuals that need to be hypothesized (as part of α) and the number of φ2
assertions that get explained.
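To make this concrete, consider a toy example under the rule-based reading described
above (all names illustrative). Suppose the background knowledge Σ contains the rule
above(x, y) ← Roof(x) ∧ Wall(y) ∧ partOf(x, z) ∧ partOf(y, z) ∧ Building(z), and analysis
delivers φ1 = {Roof(r1), Wall(r2)} and φ2 = {above(r1, r2)}. Read backward, the rule
yields the explanation α = {Building(b), partOf(r1, b), partOf(r2, b)}, which hypothesizes
one new individual b (the building aggregating roof and wall) and explains the observed
spatial relation, so that Σ, φ1, α ⊨ φ2.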
Advancing from purely deductive configurations to include nonstandard forms of
inference marks a significant turn, given the ill-defined transition from perceptual repre-
sentations into semantic descriptions. Automatic segmentation hardly ever results in
(semantically) meaningful partitions, while it is practically impossible to acquire unique
and reliable mappings between perceptual appearances and semantic notions. In such
a setting, the ability to cope with incompleteness and ambiguity is crucial, and abductive
reasoning, providing inference to the best explanation, presents an appealing direction for
future research.
However, extensions of this kind are not sufficient alone. The extraction of media
semantics at the level of objects, events, and scenes encompasses intrinsically a large
amount of imprecision. It permeates the extraction of features, the identification of
shapes, the matching of textures and colors, and carries through into the translation
from perceptual to symbolic representations performed by image analysis. The latter may
express either uncertainty, thus representing degrees of belief and plausibility, or
vagueness, expressing conformity through degrees of membership [52]. Yet, the majority
of the literature tends
to treat the descriptions upon which inference is performed as crisp facts (binary prop-
ositions), ignoring the probability or vagueness information captured in the accompany-
ing confidence degrees.
The need to accommodate vagueness was acknowledged early on. In [53],
a preliminary investigation into the use of Description Logics for object recognition is
reported, outlining the limitations involved with exact recognition; in a subsequent
investigation, the proposed framework has been extended with approximate reasoning
to assess composite shape matching and subsumption [54]. In [55], a fuzzy DL-based
reasoning framework is proposed to integrate possibly complementary, overlapping, and/or
conflicting classifications at object and scene level into a semantically coherent final
interpretation. The input classifications, obtained by means of statistical learning, are
modeled as fuzzy assertions, and a three-step procedure is followed in order to determine
the set of plausible interpretations, resolve inconsistencies by tracking the assertions and
axioms triggering them, and further enrich the interpretations by making explicit missing
descriptions [56]. Other approaches building on fuzzy semantics include [57], where
fuzzy DLs are used to merge over-segmented regions and to accordingly update the
degrees of classifications associated to the regions, and [58], where a fuzzy ontology
capturing spatial relations for image interpretation is presented.
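As a minimal illustration of this setting, using the degrees shown in > Fig. 21.8, the
classifier outputs become fuzzy assertions such as ⟨Building(r3) ≥ 0.85⟩,
⟨Vegetation(r4) ≥ 0.75⟩, and ⟨above(r3, r2) ≥ 0.58⟩; an axiom such as
∃contains.Building ⊓ ∃contains.Vegetation ⊑ Countryside_buildings then supports a
scene-level interpretation ⟨Countryside_buildings(image) ≥ 0.52⟩, with the exact degree
depending on the fuzzy semantics chosen (e.g., the minimum of the conjunct degrees under
the Gödel t-norm).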
Similar to fuzzy extensions [59, 60], probabilistic extensions to logical inference [61,
62] have been investigated in media interpretation approaches, though far more sparsely.
In [63], an appropriately defined ontology, which links visually extracted descriptors
with domain entities and semantic constraints, guides the design of the Bayesian network used
to perform probabilistic inference over automatically extracted video descriptions. In [64],
commonsense knowledge, encoded in the form of first-order logic production rules, is used to
deduce the topology of a Markov Logic Network [65] and semantically analyze parking lot
videos. The use of the Markov Logic Network makes it possible to formulate the uncertainty
involved in the detection of objects and movements, as well as the statistical ambiguity
characterizing part of the domain knowledge; while in [118], bilattice theory, which orders
knowledge along two axes that represent the degree of truth and the degree of belief,
respectively [124], is explored as the means to handle and reason under imprecision in
human detection applications. In [119], a reasoning framework that combines Semantic
Web technologies with rule-based and causality-based reasoning is investigated, while
highlighting challenges with respect to inconsistency and uncertainty handling. Finally, it
is worth mentioning two initiatives that consolidate the findings of a series of workshops
toward an ontology framework for representing video events. The Video Event Represen-
tation Language (VERL) models events in the form of changes of states, while the Video
Event Markup Language (VEML) serves as a complementary annotation framework [66].
Though less rigorous than respective logic-based formalisms for representing actions
and effects and temporal semantics (e.g., the Event Calculus [120]), such initiatives
manifest a continuously increasing awareness of, and interest in, cross-disciplinary results
and experiences.
The aforementioned work outlines an intriguing amalgam of valuable results and
insightful observations. As illustrated by the current state of the art, formal knowledge
representation and reasoning bring in a tremendous potential to inject semantics into the
otherwise data-driven statistical learning and inferencing technologies used in media
interpretation. Intrinsic traits of the task challenge the typical deductive reasoning
scheme as much as the classical binary-valued semantics, demanding a profound
investigation of the extensions and adaptations necessary to currently available
inference mechanisms. In this
quest, the management of imprecision is crucial, especially with regard to the effective
combination of probabilistic and fuzzy semantics under a formal, coherent framework:
the distinction between probable and plausible interpretations is key both to forming and
ranking alternative interpretations.
Supporting hybrid inference schemes that allow for imprecision, though, is not
sufficient on its own to handle the missing and incomplete descriptions obtained by
means of typical media processing. Building on the classical logic paradigm, Semantic
Web languages adopt the open-world assumption. Low-level representations serve as
evidence that determine the set of possible interpretations and formal knowledge
is expected to further restrict them into valid ones based on coherency and consistency
considerations; yet compositional semantics are hardly encountered in the existing liter-
ature. The investigated interpretation configurations implicitly espouse a closed world-
view, focusing only on explicitly asserted facts, while poorly exploiting the supported
open semantics in the involved knowledge modeling and engineering tasks [66].
Such requirements are intertwined with another critical challenge, namely, the transition
from pipeline-like interpretation configurations, where successive steps of statistical and
logical inference take place, to more interactive schemes that exploit jointly learning and
logical inference. The existing approaches address the aforementioned considerations only
fragmentarily and partially, paving an interesting roadmap for future research
activities.
21.2.4 Semantics in Broadcasting
21.2.4.1 Metadata in Broadcasting from Its Origin
Although it was not called ''metadata,'' the audiovisual industry, and broadcasters in
particular, have been managing such content-related information for decades. Archives,
from which content has to be found and retrieved, were the place where the need for
accurate documentation first arose.
Metadata is the modern IT equivalent of a label on a tape or film reel (title, short
description) with potentially more structured machine-readable information (technical
details, broadcast time, storage location). With a growing quantity of content being
produced every year (thousands of hours of audio and video material), the business
rationale behind well-documented metadata is more justified than ever: ‘‘if you can’t find
it, it’s like you don’t have it, hence you must pay for it again!’’
Although the first databases date back to the 1960s, their real expansion came with
the democratization, ease of use, and reasonable computing power of computers in the
mid-1980s. Within the broadcasting community, the ''early adopters'' waited until the
mid-1990s (a decade later) to grasp the potential of metadata and information
management in databases. Still, it is only recently that the role of metadata has been
fully recognized.
In an analog world, the first broadcaster’s need for metadata was internal to recover all
the information historically available on tags, cards, production forms, and a reference to
a physical location of the media (e.g., tape, film, and disk on a shelf). Digitization has been
the opportunity for generating and managing more data like restoration information
(e.g., tools, methods, parameters, results). In a file-based production environment,
metadata is vital: what is the format of the video or audio file? What editorial content
does the file contain? Where is the resource within petabytes of distributed mass storage?
The exchange of content (e.g., between the post-producer of an advertising spot and the
broadcaster in charge of exploiting it) is also greatly facilitated by metadata to search
material, publish information on programs available, and provide information on the file
being provided.
However, although all the technical conditions are now met to develop effective
metadata solutions for production, the cost of generating metadata remains a barrier
and the next challenge is to develop tools to automatically extract or generate metadata.
This includes speech-to-text recognition, face recognition, format detection, and content
summarization (e.g., reduce a 40-min program into a 3-min clip made of representative
key scenes and synchronized metadata).
Last but not least, the objective of broadcasters is to have their programs easily
accessed and seen across a variety of delivery media and platforms including traditional
linear broadcast, but also Internet (live streaming, video-on-demand, catch-up TV),
mobiles, and any hybrid combination like hybrid broadcast–broadband. In this rich and
ubiquitous context, metadata is vital.
21.2.4.2 Metadata Standardization in Broadcasting
In this section, different metadata standards for production and distribution will be
mentioned. Proprietary metadata solutions from MAM (Media Asset Management)
solution providers or consumer electronics manufacturers (proposing competing pro-
gram guides accessible through their respective products for an additional fee) are
intentionally out of scope.
Different groups are working on broadcasting standards. The Advanced Media Workflow
Association (AMWA) focuses on metadata associated with container formats that also carry
metadata (the Advanced Authoring Format and the Material Exchange Format, AAF and MXF).
The European Broadcasting Union (EBU) develops technical specifications related to all
domains of broadcasting technology, including metadata. The Society of Motion Picture and
Television Engineers (SMPTE) develops specifications for audiovisual production.
Harmonization is
desired although difficult to achieve. But, it must be noted that several of the existing
standards correspond to different needs or have only a regional impact.
Why Are Standards Necessary?
The ‘‘business-to-business exchange’’ application led to the necessity to propose a solu-
tion for interoperability, that is, using information understandable to the sending
and receiving parties. It is critically needed in a broadcasting environment in which
data, aggregated from different providers, have to be forwarded in a common format to
receiving devices from different consumer electronics manufacturers. It remains true for
hybrid broadcast–broadband services where data are also aggregated from different
sources and represented in a common format, for example, for display on a portal page
or for transmission to devices.
What Is Meant by Interoperability?
The first level of interoperability is the identification of a common set of structured
attributes characterizing content with agreed names and detailed semantics. Some exam-
ples: DMS-1 has been defined by AMWA as a set of attributes to be associated with
audiovisual material in MXF (Material Exchange Format) containers. RP210 is an SMPTE
dictionary of metadata attributes commonly met in television and radio production.
EBUCore is defined by the EBU as a core set of metadata, based on Dublin Core, to
facilitate the aggregation and exchange of metadata with audiovisual archive portals.
ETSI TV-Anytime was developed to facilitate the use of personal video recorders through
harmonized electronic program guide data. DVB Service Information is a minimum set of
information related to programs and services, which is broadcast in DVB streams.
The second level of interoperability is the representation format, which defines how the
structure of description attributes is being digitally serialized. Some examples of repre-
sentation formats are:
● SMPTE KLV (Key, Length, Value)
● W3C XML, and RDF/OWL serialized as N3 or Turtle
● JSON (JavaScript Object Notation)
● DVB SI (binary encoding of service information)
The third level of interoperability is the definition of delivery mechanisms (e.g.,
standardized by DVB in Europe, ARIB in Japan, or ATSC in the USA) over, for example,
MPEG Transport Stream (MPEG-TS) or Internet Protocols (IP). This includes solutions
adapted to the bandwidth of the different media such as data fragmentation and partial
updates.
21.2.4.3 Using Ontologies: Metadata + Semantic
One motivation for broadcast metadata is to provide search and personalization
functionality through access to richer information, in order to facilitate faster queries
and deliver results more relevant to users. This, in proportion to the large volumes of
audiovisual material being produced, requires ever more metadata augmented
with richer semantics. An important question to answer before designing an ontology
and its associated properties is ''what is it that the implementer wants users to search for?''
Because of the close relation of the Semantic Web initiative to W3C, the use of
semantic descriptions of audiovisual content was initially thought to have a de facto
focus on distribution, that is, targeting access to content by the users. This is why work
primarily started from TV-Anytime (a metadata format for describing electronic program
guides and on-demand catalogs), which additionally proposes a consistent class model
and embryonic identifier-based entity relationships. Further work showed the high
potential value of also using semantic-based descriptions for metadata at the production
stage and broadcaster archives.
21.2.4.4 A Semantic Representation of TV-Anytime in a Nutshell
The TV-Anytime specification has been developed within an open forum of broadcasters,
manufacturers, operators, EPG providers, etc. It addresses linear broadcasting and online
nonlinear services. Although it was first published by ETSI in the series of specifications
TS 102 822 in 2005, it also fits new content access concepts like catch-up TV and other
mobile services.
TV-Anytime benefits from a solid class-oriented data model shown in > Fig. 21.9.
> Figure 21.9 shows different types of classes (e.g., ProgramGroup, Programme,
Person), entity/class relationships (object properties in blue), and data properties
(in red). Considering how TV-Anytime could be represented in a Semantic Web model:
– The set of classes forms the backbone of the model. All the classes (e.g., ProgramGroup,
Programme, Segment) represented in > Fig. 21.9 are properly identified (as they
would likely be recorded in a database) and can easily be attributed a URI, which is
a key eligibility criterion for a class in the Semantic Web. The fact that CreditsItem
does not have an identifier is not essential in XML, but it is more critical in
a semantic model in order to avoid using blank nodes through which a Person or
Organisation class instance would be linked to a Programme with the addition of
the role data property. However, this means that credit items should be managed as
individually identified entities in the broadcaster's database, which exists but is not
necessarily common practice. Best practice would dictate that a forthcoming version
of TV-Anytime contain an optional identifier per credit item.
– TV-Anytime relations such as MemberOf, AggregationOf, or RelatedMaterial are directly
eligible to become object properties. It must be noted that several of these relations have
their inverse also defined, which is another important feature in support of semantic
models.
– IDRef relationships (from the XML Schema) are also implicit object properties for which
better names can be found.
– XML implicit relationships: for example, ScheduleEvent, an element of the Schedule
complex type, would need to be given a proper identifier to become a ScheduleEvent
class, for which an object property such as HasScheduleEvent would be used to create an
association with the Schedule class.
– As far as data properties are concerned, the transformation is rather straightforward,
with the exception that reusable complex-type structures should be replaced by flat
structures directly referring to a class. This, again, is to avoid blank nodes.
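Applying these rules, the formerly implicit ScheduleEvent could be given its own
identifier and object property, sketched below in the Turtle style of > Fig. 21.10 (the
hasScheduleEvent and forProgramme names are illustrative, not part of TV-Anytime):

tvshows:schedule_1 a tva:Schedule ;
    tva:hasScheduleEvent tvshows:event_4711 .

tvshows:event_4711 a tva:ScheduleEvent ;
    tva:forProgramme tvshows:102587 ;
    tva:start "2011-04-01T20:15:00"^^xsd:dateTime .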
Starting with the transformation rules mentioned above, it becomes easy to transform
the most significant part of the TV-Anytime model into an ontology written in RDF
(Resource Description Framework) and OWL (Web Ontology Language). As an example,
the statement ‘‘a TV Program has the title ‘Tonight’s Show’’’ could be expressed in RDF/
OWL as shown in > Fig. 21.10.
However, it is not necessarily optimal to work only in RDF/OWL. For example,
cardinalities cannot be managed with the same flexibility as in XML Schema. An option
would therefore be to generate instances in the strongly validated XML environment, and
to transform the results into instances of the equivalent ontology, as shown in
> Fig. 21.11.
The use of an instance template is attractive to users as it hides from them the complexity
of the ontology. However, generating instance templates for complex ontologies such as for
audiovisual services is a challenge. Tools to facilitate this are currently missing.
. Fig. 21.9
Entity relationship diagram of the TV-Anytime class model: classes such as ProgramGroup,
Programme, Segment, SegmentGroup, CreditsItem, Person, Organisation, Schedule,
ScheduleEvent, and Service, connected by object properties such as MemberOf,
AggregationOf, EpisodeOf, DerivedFrom, and RelatedMaterial, and carrying data properties
such as identifiers (TVAID, CRID, segment and group IDs), titles, genres, roles, and
start/end times; CreditsItem and ScheduleEvent lack identifiers of their own in the XML
model
. Fig. 21.10
Example of RDF statement, schema, and instance:

Ontology:
tva:hasTitle a owl:DatatypeProperty ;
    rdfs:domain tva:Programme ;
    rdfs:range xsd:string .

Instance:
tvshows:102587 a tva:Programme ;
    tva:hasTitle "Tonight's Show" .

. Fig. 21.11
Combining XML and RDF/OWL: the data model is expressed both as an XML Schema
(representation, cardinalities, datatypes) and as an ontology (minimum constraints, e.g.,
''some'' and string, with a focus on semantic links); validated metadata instances
produced in the XML environment are mapped, via instance templates, onto instances of the
ontology, which are stored in a database, crawled, and exploited by reasoners and
applications such as search engines and EPGs
The main advantages of ontologies for broadcasters are:
– The simplicity of flat statements about resources
– The scalability to create new classes and properties, for example, for customization or
particular applications, in a backward compatible manner
– The possibility to infer properties
– The flexibility to use new query approaches.
Some of the disadvantages of ontologies for broadcasters are:
– A steep learning curve
– The danger of confusing concepts and misusing, for example, classes and subclasses
– The management of cardinalities
– The nontrivial conversion of XML structures into RDF
– The lack of editing and validating tools
21.2.4.5 A Semantic Representation of Thesauri
Finally, another important part of the TV-Anytime specification is its classification
schemes, such as the controlled lists of genres and roles. The EBU has converted some of
the TV-Anytime classification schemes into SKOS (Simple Knowledge Organization
System); see http://www.ebu.ch/metadata/ontologies/skos/.
SKOS is a vocabulary that is very convenient for representing classification schemes,
with hierarchical object properties such as skos:broader and skos:narrower, and mapping
properties such as exactMatch, narrowMatch, and broadMatch.
As shown in > Fig. 21.12, each term of a classification scheme (or thesaurus) is
independently subject to a series of statements and is no longer part of a hierarchical
XML structure such as those used by MPEG-7, TV-Anytime, or DVB. Nevertheless, the
hierarchical structure can be reconstituted by reasoners, as shown in > Fig. 21.13, and
can also include machine-interpretable statements about mappings to other external
classification schemes. Ontologies and class models like SKOS are the answer to
resolving access to classification scheme terms:
– In MPEG-7, TV-Anytime, or DVB, classification schemes are defined as hierarchical
lists of terms identified by a termID (the access key). Each term has at least a name and
a definition. In the XML space, resolving a URI with a termID into a term name requires
putting in place additional resolving mechanisms (e.g., developing a particular software
interface or API).
– In RDF, an object property will point to the SKOS class called ‘‘concept’’ identified by
its URI (e.g., the classification scheme locator and termID). If the classification scheme
has been imported or can be connected to, all data properties of the concept are
directly accessible. This is the lowest but very demonstrative level of ‘‘linked data.’’ Any
ontology can therefore refer to any term of a SKOS classification scheme.
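By way of example, once such a scheme is loaded into a triple store, resolving a termID
into its label is a one-line SPARQL query; the ebu: namespace URI below is abbreviated
and illustrative:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ebu:  <http://www.ebu.ch/metadata/cs/ebu_ContentGenreCS.skos.xml#>

SELECT ?label ?broader WHERE {
  ebu:3.6.8.16.4 skos:prefLabel ?label ;
                 skos:broader ?broader .
}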
Other mechanisms than SKOS could be defined in order to describe classification
schemes. However, the need for interoperability requires agreeing on a set of well-defined
classes and properties, which SKOS successfully proposes for controlled vocabularies.
ebu:ebu_ContentGenreCS.skos.xml#3.6.8.16.4
    a skos:Concept ;
    skos:note "Valid" ;
    skos:historyNote "2007-04-12" ;
    skos:changeNote "First version" ;
    skos:prefLabel "Dubstep" ;
    skos:narrowMatch <http://www.ebu.ch/cs/tva/ContentCS.xml#3.6.8.16.4> ;
    skos:broader ebu:ebu_ContentGenreCS.skos.xml#3.6.8.16 .
. Fig. 21.12
Extract from EditorialFormatCodeCS
. Fig. 21.13
Screenshot of a SKOS view of EditorialFormatCodeCS using Protégé
21.2.4.6 The Holy Grail: Agreeing on a Class Model
Standardization groups like the European Broadcasting Union (EBU), the International
Press Telecommunications Council (IPTC), and the W3C Media Annotation Working
Group (MAWG) are now paying more attention to the Semantic Web and linked data,
generally starting with the ''SKOSification'' of their classification schemes. More
interestingly, there is also an attempt coordinated by the EBU to define a common basic
class model
for audiovisual content. These are the main classes used by some audiovisual schemas
(among several others):
– BBC ‘‘Programme Model’’ (http://www.bbc.co.uk/ontologies/programmes): brand,
series, episode, program, program item, version (of the program), segment, broadcast
(event), service, channel, broadcaster (role), person
– CableLabs: asset, chapter, distributor, provider, person, actor, director, producer, studio
– EBUCore (http://tech.ebu.ch/docs/tech/tech3293v1_1.pdf), EBU P-META (http://tech.
ebu.ch/docs/tech/tech3295v2_1.pdf) & W3C MAWG: resource, creator, contributor,
publisher, location, collection/group, fragment/part/segment, concept (classification),
person, organization
– ETSI TV-Anytime: program group (including brand, series, etc.), program, segment
group, segment, service, schedule, location (broadcast event, schedule event,
on-demand program, push download program), person, organization, concept
(classification)
– FRBR: work, expression, manifestation, item, person, corporate body, event, place,
concept, object
– IPTC NewsML-G2 (http://www.iptc.org/cms/site/index.html?channel=CH0111):
news item, part, creator, contributor, person, organization, concept
– ISO/IEC MPEG-7: audiovisual segment, video segment, audio segment, text segment,
segment group, audiovisual region, fragment, collection, agent, person, organization,
place, event, object
– PBCore: resource, creator, contributor, publisher, concept (classification)
As can be seen from the examples above, nothing but a lack of willingness should prevent
a minimum of harmonization. To be finalized, this model will also require detailed
semantics for every class. Furthermore, several classes are eligible to become
subclasses. Of course, the model can be complemented with user-defined classes, or a
user can utilize only a subset of the classes defined above.
21.2.4.7 Agreeing on Properties
A first level of interoperability is achieved by defining a common set of classes. The effort
needs to be repeated on properties. There are two main types of properties in semantic
modeling:
– Object properties defining relations between classes/objects. EpisodeOf,
AggregationOf, and MemberOf, shown in > Fig. 21.9, are very explicit examples.
– Data properties qualifying a class/object, of which typical examples are ‘‘Title,’’ ‘‘Iden-
tifier,’’ ‘‘Description.’’
Properties must be selected carefully. The most important criterion is to define the
properties on which queries will be run: what is it that users will, or should, be
looking for? The second criterion is the definition of inverse and transitive properties,
which, by inference, enrich the number of triples in the stores being queried,
thereby maximizing the chances of positive query hits.
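A brief sketch of this second criterion (property names illustrative): declaring an
inverse once lets a reasoner materialize the reverse triples automatically.

tva:episodeOf owl:inverseOf tva:hasEpisode .
tvshows:102587 tva:episodeOf tvshows:series_42 .
# a reasoner now also infers:
# tvshows:series_42 tva:hasEpisode tvshows:102587 .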
In a linked data environment, the third criterion is to reuse existing ontologies
defining classes and properties, such as FOAF (Friend of a Friend) for persons and
contacts. Of course, the choice of such links to existing ontologies should not prevail
over the efficiency of a solution developed for a particular application within a
specific ecosystem. All that matters is the interoperability requirement, which may vary.
Linked data also raise issues of persistence and (e.g., editorial) quality. While in XML
agreeing on properties is more problematic, because the model is often closely tied to a
particular application, in RDF properties and classes can complement those in an existing
ontology. > Figure 21.14 shows a possible high-level class model encompassing the
commonalities of the schemas listed above. This may be a first step toward a
harmonization of audiovisual content metadata.
. Fig. 21.14
A unified class model? A possible high-level class model encompassing, among others:
Programme (resource, work, asset — e.g., a TV/radio programme, movie, or tune) and
ProgrammeGroup (collection, mini-series, series, concept, show, theme); Item (segment,
e.g., an edited news item or part), ItemGroup (related items), and ItemBlock
(time-related and ordered items); MediaObject (physical clip, newsfeed, rushes, shot,
background music, fragment) with audio and video tracks; EditorialObject (dope sheet,
script, report); Version/Manifestation and Instance (work); delivery-related classes such
as Service, Brand, Channel, Schedule, BroadcastEvent, Catalog (VoD, catch-up TV), Portal,
Location, and PushDownload; descriptive classes such as Agent, Person, Organisation, Role
(contributor, credits), Event, and Concept (classification schemes); and production-side
classes such as Camera and Lens with their parameters
21.3 Example Applications
In this section, two major domains for the technologies and tools of semantic multimedia
are introduced with examples of their application: television and cultural heritage.
> Section 21.3.1 is partially based on [69] with acknowledgment to the coauthors.
21.3.1 Semantic Television
Television has traditionally meant the broadcast industry on the one hand, and on the
other now incorporates the growing Web-based video domain, which is converging with
classical broadcast TV in the end device. In this domain, the main atomic object for
semantic description is the individual TV program (or video item), while there may be a
further, higher-level description of the structure of those programs (EPG metadata, or a
video playlist). The main challenges in the television (and, by extension, Web video)
domain are the scale of the content available and the need for filtering and
personalization of the content. The NoTube project (notube.tv) considers three
particularly representative scenarios for future television enabled by semantic
technology:
(a) The RAI demonstrator shows how news programs can be enriched with concepts
(people, places, themes) that allow personalized delivery of a news stream and easy
browsing to additional information. This demonstrator focuses on the value of
passive personalization of on-demand TV content.
(b) The Stoneroos demonstrator enables a user to create an interests profile in a simple
fashion, which can be used to generate TV program recommendations within their
personal EPG. This demonstrator focuses on the value of a multi-platform and
multilingual platform for personal content and ads.
(c) The BBC demonstrator shows how TV can be personalized using Social Web data,
facilitating a personalized TV experience without an intrusive user profiling process.
This demonstrator focuses on the value of active personalization of TV which is
integrated with the user’s Social Web activities.
To illustrate what is envisaged more generally by semantic TV, Jana is introduced as an
example future user of the NoTube infrastructure. She is socially active on the Web and
does not see the need to explicitly define her preferences or wait until she has used the
recommender system long enough for it to learn her preferences. In the first use case,
Jana’s recommendations are generated based on her online social activity. In the second
use case, Jana is interested in a program and uses the ''I would like to know more''
button. Jana then gets information about this program, which contains links to Wikipedia,
IMDB, or other online information sources. Next to this, she also gets recommendations of
related programs. With the ''why'' button, Jana can see why each program has been
recommended to her. Since enriched TV program descriptions are used, the reasons
for recommendations are often based on interesting semantic relations between entities.
For example, when Jana is watching an episode of ‘‘True Blood,’’ this makes her curious
about the series, so she picks up her smartphone to find out more about it using the
NoTube application. When she presses the ''I want to know more'' button, the Wikipedia
page is shown as well as some recommendations. One of the recommendations is the pilot
of the series ''Six Feet Under,'' which she already knows. She is curious about the
reason for the recommendation, so she presses the ''why?'' button next to it and sees
that both series were created by Alan Ball and that they share two genres: ''black
comedy'' and ''drama.'' She is happy to learn that the two series were created by the
same man and continues by
looking up information about Alan Ball.
The open NoTube TV and Web infrastructure is illustrated in > Fig. 21.15. The front
end, which is what the user sees, is any device connected to the Internet and able to
consume NoTube services, whether a TV, PC, or mobile device, including a so-called
second screen (where a smaller mobile device presents auxiliary content in
synchronization with the TV signal on the larger screen device). The Application Logic
implements the workflow of data and processes that realize the NoTube service to the
end device. It relies for this on the Middleware and Broker layer, which makes use of
Semantic Web Service technology to dynamically discover and resolve adequate, available
services into more complex service workflows.
. Fig. 21.15
NoTube open TV and Web infrastructure

To enable this, sets of services for users, metadata, and TV content are developed,
which are described semantically and mediated by the broker; in the front end, specific
applications are built on top of those services to provide the desired functionalities,
for example, user activity capture and content recommendation. This section considers,
from the user services, the Beancounter service, which harvests user
data from online social profiles, which are used to generate a semantic user-interests
profile. On the basis of that interests profile, TV program descriptions are analyzed and
recommendations are made to the viewer. From the content services, the Data Warehouse
service collects and enriches EPG data. Existing EPG harvesting services, such as XMLTV,
are used to obtain EPG data. The descriptions of TV programs are then enriched by
metadata services, such as the Named Entity Recognition service, which identifies entities
from the linked data cloud. The NoTube vocabulary alignment service identifies links
between concepts of different vocabularies.
21.3.1.1 User Activity Capture
The huge number of interactions that a user performs on the Web represents a powerful
and extraordinary source for mining his or her interests and preferences. Although
several attempts have already been made to infer a user's profile from his or her
behavior on the Web [121–123], the NoTube approach focuses on the opportunities
unleashed by the so-called Social and Real-time Web, where users discover, consume, and
share content within their social graph, in a real-time manner, often using mobile
devices. In such environments, each user produces a rich and machine-readable flow of
activities from which implicit and explicit information regarding his or her preferences
can be extracted.
This scenario considers a generic user who holds at least two accounts on different
social Web applications: Last.fm and Glue.com. Last.fm tracks the user's listening,
sharing, and live-event activities. Glue.com acts as a user log, realized as a browser
plug-in: it makes available, through a set of Web APIs, an exhaustive set of the Web
resources the user visited, shared, or liked, enriching them with an internal
categorization.
The following sections show how data are aggregated from these sources, linked to
information in several linked data clouds, and how reasoning over the data can make
explicit the user’s interests in a user profile. The information aggregation is achieved
through identity resolution made against different ontologies.
To uniformly represent user activity data from different sources in a single graph, the
Atom Activity Streams in RDF vocabulary (http://xmlns.notu.be/aair/) is used to
represent user activities. To determine the objects of activity, a named entity
recognition service is used. An alignment service is used to link the objects of
different vocabularies; for example, Last.fm artists are linked to DBpedia entities and
the BBC Music catalog (http://www.bbc.co.uk/music). Vocabularies are defined for
TV-related user activities, that is, verbs such as ''watching,'' ''reviewing,'' and
''rating.''
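A sketch of a single activity in such a graph is given below; the aair: terms follow the
Atom Activity Streams in RDF draft cited above, but both they and the remaining names
should be read as illustrative rather than normative:

:act123 a aair:Activity ;
    aair:activityActor :jana ;                                     # the user
    aair:activityVerb :listenedTo ;                                # verb vocabulary term
    aair:activityObject <http://dbpedia.org/resource/Radiohead> .  # resolved via NER/alignment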
The data generated by the collection and enrichment process form a potentially
huge set of activities for each individual user. The challenge is to derive general user
interests from this set of user activities. This is done by using the DBpedia SKOS
vocabulary, which is the semantic counterpart of the Wikipedia categories. If a user
listens to bands or musicians sharing the same subject, then it could be reasonable to infer
that the subject represents an interest for that user. Moreover, the rich and complex SKOS
hierarchy of DBpedia allows one to extract a lot of other interesting information. For
example, if a user is particularly interested in movies where a particular actor or actress
appeared, more information will be available about this in the system, since it is highly
probable that DBpedia contains some SKOS subjects describing this. Similarly, if a user
listens to bands originating from a specific geographical region then this could be useful to
perform recommendations of other bands and artists.
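The inference sketched here can be approximated by a query of the following shape (the
profile predicates are illustrative; dcterms:subject is the property DBpedia uses to link
resources to their Wikipedia categories): count, per category, the distinct artists the
user has listened to, and treat sufficiently frequent categories as interests.

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX :        <http://example.org/profile#>

SELECT ?category (COUNT(DISTINCT ?artist) AS ?n) WHERE {
  :jana :listenedTo ?artist .            # aggregated activity data
  ?artist dcterms:subject ?category .    # DBpedia category links
}
GROUP BY ?category
HAVING (COUNT(DISTINCT ?artist) >= 3)    # naive interest threshold
ORDER BY DESC(?n)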
21.3.1.2 Enriched EPG Data
In general, EPG data are produced by broadcast companies. Some broadcast companies,
such as the BBC, have made their EPG data machine-readable and publicly available.
Other EPG data are harvested from websites using existing tools, such as XMLTV, and
converted to a machine-readable format.
Enrichments of EPG metadata are used to provide the end users with extra information
about the content in which they are interested. For example, a scheduled broadcast of a movie
could have an enrichment that enumerates the main actors together with pointers to
IMDB. Recommender algorithms that fall into the category of content-based filtering
use content descriptions of the items for determining their relevance to users. To give
a simple example, when a user often watches content annotated with the Western concept,
then other content annotated with the same or related concepts may be interesting to him
or her.
In the NoTube project, broadcast data are enriched using the linked data cloud, by
linking existing data sources, for example, DBpedia (subject data), YAGO (data about
people), and IMDB (data about movies), to broadcast metadata. By enriching the EPG
data, links to semantic entities in the linked data cloud are added to the metadata of TV
programs. The interconnected entities in the linked data cloud allow for finding
interesting relationships between entities, for example, that two movies have been made
by people who share an interest in film noir. The relationship between entities is often
typed, for example, by SKOS relationships. Since not all relationships between entities
are considered interesting, the types of the relationships must be taken into account
during the recommendation process. This can be done by using relationships of specific
patterns and/or assigning a certain weight to specific relations.
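An enriched EPG entry might then carry triples of the following kind (all property names
illustrative; the DBpedia URIs stand for the entities identified during enrichment):

:broadcast_991 a :Broadcast ;
    :broadcastOf :movie_17 .

:movie_17 :genre <http://dbpedia.org/resource/Western_(genre)> ;
    :directedBy <http://dbpedia.org/resource/John_Ford> .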
21.3.1.3 Alignment Between Vocabularies
The data sources described above are often already annotated with a fixed set of
concepts. For example, EPG data from the BBC (http://www.bbc.co.uk/programmes) are
annotated with a BBC-defined genre hierarchy, and IMDB categorizes TV series and films
into a similar set of genres. Vocabularies can be domain-independent; for example,
Princeton WordNet (http://wordnet.princeton.edu) provides a set of lexical concepts that
match words (e.g., in English) and provides semantic relations between those words. Such
vocabularies can be used to annotate domain-specific data. For example, the description
of a movie could be a set of WordNet concepts. For some datasets, the Semantic Web
community has already converted the vocabularies and schemas into RDF, like the BBC
Programmes ontology (http://bbc.co.uk/ontologies/programmes), the TV-Anytime
schemas and vocabularies, and W3C WordNet. To cover multiple perspectives, additional
(extracted) genre vocabularies for sources like YouTube are also created.
21.3.1.4 Personalized TV Program Recommendation
Given the availability of the semantically enriched EPG/TV program data, the semanti-
cally enriched user activity and interests profile, and the alignment between different
vocabularies, the NoTube component Beancounter is able to process the combination of
these data to provide a personalized TV program recommendation. The recommendation
strategy in NoTube takes a content-based approach, in which the closeness between
concepts in a classification scheme (e.g., the DBpedia categorization model) is used to
provide a weighting of the topics with which a program is annotated with respect to the
set of topics in the user's profile. To detail this further:
1. Identify weighted sets of DBpedia resources from user activity objects.
2. Compute the distance between DBpedia concepts in the user profile and in the
program schedule through a SKOS-based categorization scheme.
3. Choose the matches above a certain threshold for TV program recommendation.
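A rough sketch of steps 1–3 as a single scoring query is given below (schema and
threshold illustrative; a real deployment would also fold the SKOS-based distance of
step 2 into the weights):

PREFIX : <http://example.org/notube#>

SELECT ?programme (SUM(?w) AS ?score) WHERE {
  ?programme a :TVProgramme ;
             :hasTopic ?topic .      # DBpedia resources from EPG enrichment
  :janaProfile :interest ?i .
  ?i :topic ?topic ;
     :weight ?w .                    # weights from the user-interests profile
}
GROUP BY ?programme
HAVING (SUM(?w) > 0.5)               # step 3: recommendation threshold
ORDER BY DESC(?score)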
As a result, it should be possible to highlight, in the user's EPG, the TV programs that
should interest them, and also to provide some explanation for the recommendation, as
shown in the mock-up of a personalized EPG below (> Fig. 21.16). The same technologies
are used in the back end to enable the other NoTube scenarios, such as a personalized
news stream or pushing personalized advertising.
21.3.2 Semantics in Cultural Heritage
Objects and content in cultural heritage (CH) are both textual and non-textual,
interrelated with each other in various ways, and produced by many different
organizations and individuals. As a result, producing, harvesting, aggregating,
publishing, and utilizing cultural heritage content on the Web is difficult in many ways.
In this section,
three major problem areas are covered:
(1) Semantics for cultural heritage. Ontologies and metadata formats are the key
components needed in representing CH content on the Semantic Web. Rich ontol-
ogies and complexmetadata models coveringmore or less all aspects of human life are
needed for representing the semantics of culture for machines.
. Fig. 21.16
Mocked-up personalized EPG from NoTube project

(2) Content creation challenges. Cultural heritage content is produced in a distributed
creation process by various organizations and individuals from different cultures using
different languages. The content is typically very heterogeneous, both in terms of
metadata formats and vocabularies/ontologies used. However, from the end user’s
viewpoint, content should be accessible seamlessly using different languages and vocab-
ularies from different eras, which means that the content should be made semantically
interoperable.
(3) Semantic eCulture systems. Semantic computing facilitates searching, linking, and
presenting the semantically interlinked, heterogeneous, multi-format, and multilin-
gual CH content. This applies to both professional and layman end users, as well as to
machines using CH repositories through service APIs.
Semantic Web technologies provide new solution approaches to all these areas, and
cultural heritage (CH) has become an important application domain for semantic technol-
ogies. This section presents an overview of issues and solution approaches related to
representing ontologies and metadata of cultural heritage, to creating syntactically and
semantically interoperable content, and to creating intelligent end-user applications on the
Semantic Web.
In journalism and multimedia, content is often collected, described, and searched in
terms of the ‘‘Five Ws and one H’’:
● Who? Who was involved?
● What? What happened and what was involved?
● Where? Where did it take place?
● When? When did it take place?
● Why? Why did it happen?
● How? How did it happen?
In the following, properties of semantic cultural content are first discussed along
these ontological dimensions.
21.3.2.1 Ontological Dimensions
To answer the who-question, vocabularies, authority files, and ontologies of persons,
organizations, and fictive actors have been created. The problems of identifying and
describing, for example, persons are well known in the library domain [69]: similar
names are shared by many individuals (e.g., John Smith), names change over time (e.g.,
upon marriage), names are transliterated differently in different languages, and people
use pseudonyms and are known by nicknames. An example of an extensive authority system
is the Library of Congress Authority Files (http://authorities.loc.gov). The Union List
of Artist Names (ULAN) (http://www.getty.edu/research/conductingresearch/vocabularies/ulan/)
of the Getty Foundation is widely used in cultural institutions and Semantic Web systems.
The what-question involves both events that take place and tangible objects that
participate in events. Events are a central category in knowledge representation of artificial
intelligence (AI) [70], and have been employed in semantic cultural heritage systems, too.
Events form the core of the CIDOC CRM system [71], an ontological system for
harmonizing cultural heritage and library content. By describing what actually is hap-
pening in the real world, heterogeneous content can be harmonized and made interop-
erable in a deeper semantic sense [73]. In ontologies, such as DOLCE [72] and WordNet
[74], events are separated from other ontological concepts. Events remain a difficult
concept to represent through an ontology as they are complex and necessarily involve
many other concepts in a particular relationship to one another.
For representing tangible objects, cultural heritage thesauri such as the Art and
Architecture Thesaurus (AAT) are available. These thesauri make it possible to identify
and disambiguate object types from each other and harmonize references to them.
Additional ontological descriptions are needed for more refined semantic descriptions
of objects. One dimension here is the structure of the object. This involves, for
example, describing various part-of relations [74], such as area inclusion, member-of,
made-of, and consists-of relations. For example, a material ontology can be used for
describing the materials of which objects are made. A consists-of relation may describe
the composition of objects, for example, that legs are parts of chairs. The function of
objects is also often important to know, for example, that ships are used for sailing.
Such relations are needed, for example, when aggregating related information and objects
together in search and recommender systems.
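Rendered in RDF, such descriptions could look roughly as follows (all names illustrative):

:chair_17 a :Chair ;
    :consistsOf :leg_17a , :seat_17b ;   # structural composition
    :madeOf :oak ;                       # term from a material ontology
    :usedFor :sitting .                  # function of the object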
A research area of its own is creating and searching 3D representations of
cultural artifacts and buildings [112]. For example, there are 3D models of CH buildings
and cities, such as the virtual Kyoto [75], using platforms such as Second Life and Google
Earth.
The where-dimension in the CH domain is challenging, because one has to deal with
not only modern geographical places, available in resources such as GeoNames and
various national geographical gazetteers, but also with historical places that may not
even exist today. The Getty Thesaurus of Geographic Names (TGN) is a resource in which
many historical places can be found. A problem in dealing with historical places is that
cultural content is typically indexed (annotated) using historical names (e.g., Carthage)
but can be queried using names from different time periods (e.g., modern geography),
too. To address the problem of changing boundaries and names of historical places,
a model and ontology is presented in [76]. In a more general setting, the concept of
places is complex and involves not only geographical information. For example, what does
‘‘Germany’’ actually mean in terms of location, time, and culture?
Time and the when-question are of central importance in the cultural heritage domain
that deals with history. A special problem of interest here is that time periods are often
imprecise in different ways [77, 78]. Firstly, the time may be exact but not known. For
example, an artifact may have been manufactured on a certain day, but it is not known
exactly when; only an estimate of the year, decade, or century may be known. This kind of
uncertainty of time can be modeled, for example, using time intervals and probability
theory. Secondly, the time may be fuzzy in nature. For example, a castle may have been
built during a longer period (or periods) of time with different intensity, so it is not
possible to state exactly when it was actually built. A modeling option here is to use fuzzy
sets for representing time. It should be noted also that time periods may not be absolute
but are conditioned by places. For example, the notion of the ‘‘bronze age’’ and stylistic
periods of art, for example, ‘‘art nouveau,’’ may be different in different countries and
cultures.
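A common representation pattern for such estimates is sketched below with CIDOC
CRM-style properties, where P82a/P82b give the outer bounds of an imprecise time-span
(exact identifiers vary between CRM RDF encodings; instance names illustrative):

:production_ev a crm:E12_Production ;
    crm:P4_has_time-span :ts_1 .

:ts_1 a crm:E52_Time-Span ;
    crm:P82a_begin_of_the_begin "1800-01-01"^^xsd:date ;   # earliest possible start
    crm:P82b_end_of_the_end "1899-12-31"^^xsd:date .       # latest possible end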
From a machine viewpoint, formal time representation can be used for reasoning, like
in the interval calculus [79], and when matching query time periods with indexing time
periods. From the human–computer interaction viewpoint, a key question is how one
perceives uncertain time intervals in information retrieval, that is, when querying with
an imprecise time period. For example, when querying on ''the middle ages,'' what time
periods should be included in the answer set, and how relevant are they?
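One simple, illustrative way to match a query period against indexing periods is to score each indexed interval by its overlap with the query interval. The Python sketch below assumes crisp year intervals and a particular (assumed, not standard) normalization by query length.

```python
# A sketch of one simple relevance score for time-period matching:
# the ratio of overlap length to query length.
def overlap_score(query, indexed):
    """query, indexed: (start_year, end_year) tuples; returns 0.0-1.0."""
    q0, q1 = query
    i0, i1 = indexed
    overlap = max(0, min(q1, i1) - max(q0, i0))
    return overlap / (q1 - q0) if q1 > q0 else 0.0

# Querying 'the middle ages' (taken here, for illustration, as 500-1500):
middle_ages = (500, 1500)
print(overlap_score(middle_ages, (1200, 1600)))  # 0.3: partial overlap
print(overlap_score(middle_ages, (800, 900)))    # 0.1: fully inside, but short
```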
The question ‘‘why’’ has not been addressed much in CH systems for the Semantic
Web. There are, however, several approaches to this. Firstly, it is possible to model causal
chains explicitly in annotations. For example, the history ontology HISTO (http://
www.seco.tkk.fi/ontologies/histo/) contains some 1,200 historical events, some
of which are related to each other using causal chains. Explicit links, or transitive link
chains based on them, can then be shown to the end user, illustrating the why-dimension.
A problem here is that there may be disagreements about historical causality and other
facts between the historians creating the ontologies or annotating the content. Indeed, it is
not uncommon that knowledge in the humanities is based on differing opinions. Metadata
can then be used for encoding different opinions, for example, about what caused
World War II. Secondly, on a reasoning level, implicit relations between related objects of
interest can be explained based on the rules used during reasoning, as is customary in some
expert systems of artificial intelligence research. For example, in [80], the semantic
recommendation links between related artifacts are produced using Prolog rules, and
a simple explanation of the reasoning chain in natural language is exposed to the end user.
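The following sketch illustrates the first approach with rdflib: a hypothetical ex:causes property links events, and a transitive traversal renders the chain as a simple explanation for the end user. The namespace, property, and event names are illustrative only.

```python
# A sketch of exposing a causal chain: follow a hypothetical ex:causes
# property transitively and render the chain as an explanation.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/histo/")  # hypothetical namespace
g = Graph()
g.add((EX.TreatyOfVersailles, EX.causes, EX.EconomicCrisis))
g.add((EX.EconomicCrisis, EX.causes, EX.PoliticalUnrest))

def causal_chain(graph, start):
    """Follow ex:causes links from a start event, guarding against cycles."""
    chain, node = [start], start
    while True:
        nxt = graph.value(node, EX.causes)  # first successor event, if any
        if nxt is None or nxt in chain:
            return chain
        chain.append(nxt)
        node = nxt

print(" -> ".join(e.split("/")[-1] for e in causal_chain(g, EX.TreatyOfVersailles)))
# TreatyOfVersailles -> EconomicCrisis -> PoliticalUnrest
```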
Semantic CH systems have the potential to address the why-question by exposing and
presenting cultural content in novel ways that help one in understanding cultural
phenomena and processes. Semantic techniques and computing can be used as a tool of
research for making explicit something useful or new that is only implicitly present in
a repository of cultural content. At this point, one enters the field of digital humanities
[81]. For example, the idea of associative or relational search, developed for security
applications [82], can be used as an approach to answer why-questions. In [83], for
instance, one can query how two persons are related to each other based on the social network
of the ULAN registry of historical persons. Relational search is also available in [84].
Finally, the how-question addresses the problem of describing how things happen(ed).
By chaining and relating events with each other and by decomposing them into subevents,
the semantics of narrative structures such as stories can be represented. For example, in
[85], modeling CH processes and stories using RDF(S) is discussed. In [83], the narrative
structures of the epic Kalevala, as well as the processes of making leather boots, making
ceramics, and farming, have been modeled as narrative structures and interlinked with
related CH contents, such as objects in museum collections.
21.3.2.2 Challenges of Content Creation
Cultural heritage content is available in various forms, is semantically heterogeneous, is
interlinked, and is published at different locations on the Web. From the end user’s
viewpoint, it would be useful if content related to some topic, person, location, or other
resource could be aggregated and integrated, in order to provide richer and more
complete, seamless views of the contents. This is possible only if interoperability of
heterogeneous content can be achieved on both the syntactic and semantic levels.
There are two major ways to address interoperability problems: one can either try to
prevent them during original content creation or one can try to solve the problems
afterward when aggregating content and creating applications. Preventing interoperability
problems is the goal of various efforts aiming at developing standards and harmonized
ways of expressing content, metadata, and vocabularies as well as best practices for
cataloging. The process of producing content can be supported by various shared tools,
such as shared ontology services and metadata format repositories.
Although harmonizing content creation would in general be the optimal strategy to
address interoperability issues, this is in practice possible only to some extent. As a result,
considerable post-processing effort is needed to solve interoperability problems afterward,
when making existing non-harmonized content syntactically and semantically interoperable.
21.3.2.3 Syntactic and Semantic Interoperability
Syntactic interoperability means that data are represented in similar formats or structures.
It requires that similar fields are used in metadata structures, and
that their values are filled using similar formats. For example, one may demand that the
name of a person in two interoperable metadata schemas should be expressed using
separate properties ''firstName'' and ''lastName,'' and that the name strings used as
values are transliterated using the same system. For example, the name of Ivan Ayvazovsky,
the Russian painter (1817–1900), has 13 different labels in ULAN (Ajvazovskij, Aivazovski,
Aiwasoffski, etc.), each correctly transliterated in its own way.
Since Semantic Web content is represented using ontologies and metadata schemas,
semantic interoperability issues come in two major forms. First, there is the problem
of schema interoperability, that is, how two different metadata schemas of similar or
different content types can be made mutually interoperable. For example, the ‘‘painter’’
of a painting and the ‘‘author’’ of a novel should somehow be declared semantically related
as creators of a piece of art; otherwise, not all creators can be found and related. It
is also possible that syntactically similar property names in two schemas have different
meanings, which leads to semantic confusion. Second, there is the problem of vocabulary
interoperability. Here, values of a metadata schema field in content from different orga-
nizations may have been taken from different vocabularies that are related, but this
relation has not been explicated. For example, a vocabulary may have the concept of
‘‘chair’’ while ‘‘sofa’’ is used in another one. It is also possible that the same label has
different interpretations in different vocabularies, for example, ''chair'' as a piece
of furniture or as the chairperson of an organization. CH concerns various topic areas,
such as art, history, handicraft, etc. in which different thesauri and vocabularies are used.
Even within a single topic area, different mutually non-interoperable vocabularies may
be used.
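The sketch below illustrates how such vocabulary relations can be explicated with SKOS in rdflib; the two vocabulary namespaces and the concept URIs are hypothetical.

```python
# A sketch of explicating relations between two vocabularies so that 'chair'
# and 'sofa' become semantically related, while the two senses of 'chair'
# remain distinct concepts with their own URIs.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

V1 = Namespace("http://example.org/vocab1/")
V2 = Namespace("http://example.org/vocab2/")
g = Graph()

g.add((V1.chair, SKOS.prefLabel, Literal("chair", lang="en")))
g.add((V2.sofa, SKOS.prefLabel, Literal("sofa", lang="en")))
g.add((V1.chair, SKOS.broader, V1.seatingFurniture))
g.add((V2.sofa, SKOS.broadMatch, V1.seatingFurniture))  # cross-vocabulary link

# The chairperson sense lives under a different URI in the other vocabulary:
g.add((V2.chair_role, SKOS.prefLabel,
       Literal("chair (of an organization)", lang="en")))
```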
There seem to be at least three approaches to obtaining metadata schema interoperability.
First, a minimal ''core'' schema can be specified that defines the common parts of all schemas
in focus. Then, more refined schemas, called applications, can be derived from the core by
introducing new fields and refining original ones. This approach has been adopted by the
Dublin Core (DC) Metadata Initiative. For example, ‘‘date’’ is a DC element that can further
be specified as ‘‘date published’’ or ‘‘date last modified.’’ The core elements can be refined or
qualified in an interoperable way by formal expressions, which relate the refinement or
qualification back to the core element, for example, the relationship between a core property
and its refinements can be represented in RDFS using the property rdfs:subPropertyOf. An
example of a DC application is VRA Core for representing metadata about works of visual
culture as well as the images that document them.
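The following sketch illustrates this refinement pattern with rdflib: a hypothetical refined property ex:datePublished is linked to the core element dcterms:date via rdfs:subPropertyOf, so that a SPARQL query phrased against the core element (here using a property path) still finds the refined metadata.

```python
# A sketch of the Dublin Core refinement pattern with rdfs:subPropertyOf.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS, DCTERMS

EX = Namespace("http://example.org/schema/")  # hypothetical application schema
g = Graph()
g.add((EX.datePublished, RDFS.subPropertyOf, DCTERMS.date))
g.add((EX.doc1, EX.datePublished, Literal("1925")))

# Query against the core element; the * path also follows refinements.
q = """
SELECT ?doc ?value WHERE {
  ?p rdfs:subPropertyOf* dcterms:date .
  ?doc ?p ?value .
}"""
for row in g.query(q, initNs={"rdfs": RDFS, "dcterms": DCTERMS}):
    print(row.doc, row.value)  # ex:doc1 1925
```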
Second, it is possible to define a harmonizing ontology or schema that is capable of
representing all metadata schemas to be integrated. Semantic interoperability on a schema
level is then obtained by transforming the metadata of different forms into this harmo-
nized ontology. A well-known example of this is the CIDOC Conceptual Reference Model
(CIDOC CRM) [3], the ISO standard 21127:2006. This model provides definitions and
a formal structure for describing the implicit and explicit concepts and relationships used in
cultural heritage documentation. The framework includes 81 classes, such as crm:Man-
MadeObject, crm:Place, and crm:Time-Span, and a set of 132 properties relating
events and entities to each other, such as crm:HasTime-Span and crm:IsIdentifiedBy.
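As a rough illustration of such event-centric harmonization, the sketch below describes one museum object using the class and property names quoted above. The crm: namespace URI, the instance URIs, and the linking properties ex:produced and ex:tookPlaceAt are simplified placeholders, not the normative CIDOC CRM encoding.

```python
# A simplified sketch of harmonizing one record into a CRM-style structure.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

CRM = Namespace("http://example.org/cidoc-crm#")  # placeholder namespace URI
EX = Namespace("http://example.org/museum/")
g = Graph()

# The object, its identification, and its place and time of production:
g.add((EX.vase7, RDF.type, CRM["Man-MadeObject"]))
g.add((EX.vase7, CRM["IsIdentifiedBy"], Literal("Attic red-figure vase")))
g.add((EX.athens, RDF.type, CRM["Place"]))
g.add((EX.span7, RDF.type, CRM["Time-Span"]))

# A production event relates the entities; the linking properties below are
# simplified stand-ins for the corresponding CRM properties.
g.add((EX.production7, CRM["HasTime-Span"], EX.span7))
g.add((EX.production7, EX.produced, EX.vase7))
g.add((EX.production7, EX.tookPlaceAt, EX.athens))
```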
Third, it is possible to transform all metadata into a knowledge representation about
the events of the world, as is customary in AI. This approach involves developing and using
domain ontologies/vocabularies for representing events and objects, aspects not considered
in the CIDOC CRM standard, which focuses on schema semantics [72].
A major area of research in semantic CH applications is the (semi)automatic anno-
tation of contents. If contents are described using texts, then named entity, concept, and
relation extraction techniques [87] can be employed first in order to move from the literal
to the concept space. For non-textual multimedia content, for example, images, videos,
speech, and music, problems of crossing the semantic gap have to be addressed [94].
21.3.2.4 Semantic eCulture Systems
A major application type of semantic technologies in the CH domain has been semantic
portals [87]. Examples of such systems include MuseumFinland [80], presenting
artifacts from various museums; the MultimediaN E-Culture demonstrator [84],
presenting art and artists from various museums; CultureSampo [83], presenting virtually all
kinds of cultural contents (objects, persons, art, maps, narratives, music, etc.); CHIP [88],
for personalized mobile access to art collections; and DBpedia Mobile [89], for
mobile access to linked data contents. Systems such as Wikipedia (DBpedia) and Freebase
contain a great deal of semantically linked CH content. In addition to systems utilizing Semantic Web
technologies, there are even more eCulture sites, portals, and applications on the Web
implemented using more traditional technologies. Many of these systems have been reported
since 1997 in the Museums and the Web conference series (http://www.archimuse.com/
conferences/mw.html). A typical eCulture application here is a tailored application for
explaining and teaching a particular CH topic with a nice graphical Flash-based user interface.
In this section, research on semantic eCulture portals is described that focuses on publishing
CH content from the collections of museums, libraries, archives, media organizations, and
other sources on the Semantic Web. A common goal in such CH portals is to create a global
view over CH collections that are distributed over the Web, as if the collections were a single
uniform repository. This idea, developed originally in some national research projects, has also
been adapted on an international level in projects such as The European Library and Europeana.
These large-scale systems are, however, still based on traditional rather than semantic technologies.
There is, though, a demonstration of Europeana's semantic search based on
the MultimediaN E-Culture system (http://eculture.cs.vu.nl/europeana/session/search).
In order to survive on the Web, a CH portal should be beneficial to both content
providers and their customers. We describe below an ideal ''business model'' of
a semantic CH portal, based on CultureSampo [83], clarifying the challenges and benefits
of utilizing semantic technologies in CH portals.
There are two major categories of challenges involved when creating a CH portal: semantic
and organizational. First, semantic challenges arise from the fact that cultural heritage content
is semantically heterogeneous and available in various forms (documents, images, audio
tracks, videos, collection items, learning objects, etc.), concerns various topics (art, history,
handicraft, etc.), is written in different languages, and is targeted at both laymen and experts.
Furthermore, the content is semantically interlinked, as depicted in > Fig. 21.17.
Second, organizational challenges arise from the fact that memory, media, and other
organizations and citizens that create the contents work independently according to their
own goals and practices, as illustrated in > Fig. 21.18. This freedom and independence of
publication is essential and empowers the whole Web, but it also results in redundant work
in content creation and means that interoperability of content between providers cannot be
achieved easily. For example, redundant information about cultural persons, places,
historical events, etc., has to be collected and maintained in many organizations because
of the lack of collaboration between them. Each organization will have its own
database/metadata schema, which cannot be changed.
The Semantic Web–based solution approach to these problems is illustrated in
> Fig. 21.19, using elements of > Figs. 21.17 and > 21.18.
. Fig. 21.17
Semantic challenges of CH portals: cultural heritage content comes in many forms (videos, maps, artifacts, encyclopedias, narratives, literature, music, fine arts, biographies, cultural sites, buildings) and is interlinked
. Fig. 21.18
Organizational challenge of CH portals: content is produced by independent organizations and individuals for their own purposes with little collaboration
The apparatus produces
harmonized RDF content for a global knowledge base. In the center are the ontologies
forming a conceptual backbone of the system. The collection items around the ontologies
are attached to the ontologies by metadata that is produced in terms of harmonized and
interlinked metadata schemas and vocabularies. The content providers depicted around
the circle, that is, the portal system, publish metadata locally and independently by using
shared metadata schemas and ontologies. The result is a large global semantic RDF
network linking different contents together in ontologically meaningful ways. When an
organization or an individual person submits a piece of (meta)data into the system, the
new data get automatically semantically linked to related materials, that is, semantically
enriched. At the same time, all related materials get enriched by references to the new piece
of knowledge, and through it to other contents. The collaborative business model works,
because each additional piece of knowledge is (in the ideal world) beneficial to everybody
participating in the system. An additional benefit is that content providers can share
efforts in developing the ontology infrastructure, reducing redundant work.
> Figure 21.19 shows that a semantic CH portal is far more than the portal pages, as seen
by the customer on the Web. Firstly, a collaborative ontology infrastructure is needed. This
includes a set of cross-domain ontologies (covering artifacts, places, actors, events, time, etc.),
ontology alignments [90], and a selection of metadata schemas and their alignments.
. Fig. 21.19
Solution approach – harmonized, distributed production of content, linked together into
a global RDF knowledge base
Secondly, a content production and harvesting system is needed for the content providers
and the portal for producing and maintaining the content. The most important question
here is at what point semantic content is produced: during cataloging by the content
providers or afterward when harvesting the content at the portal. The choice depends
on the case at hand, but in general high-quality semantic content can be produced best at
the organizations producing the content, and shared tools supporting this using the
underlying ontology infrastructure can be very useful. For example, in CultureSampo
the national FinnONTO infrastructure [91] with its ontology services is used as a basis.
Semantics can be used to provide end users, both humans and machines, with
intelligent services for finding, relating, and learning the right information based on their
own preferences and the context of using the system. Major functionalities of human user
interfaces include:
● Semantic search, that is, finding objects of interest
● Semantic browsing, that is, linking and aggregating content based on their meaning
● Visualization, that is, presenting the search results, contents, and browsing options in
useful ways
In the following, these possibilities of providing the end users with intelligent services
are briefly explored.
21.3.2.5 Semantic Search
On the Semantic Web, search can be based on finding the concepts related to the
documents at the metadata and ontology levels, in addition to the actual text or other
features of the data. With such concept-based methods, document meanings and queries
can be specified more accurately, which usually leads to better recall and precision,
especially if both the query and the underlying content descriptions are concept-based.
In practice, semantic search is usually based on query expansion, where a query concept is
expanded into its subconcepts or related concepts in order to improve recall. For example,
the query ‘‘chair’’ could find ‘‘sofas’’ too, even if the word ‘‘chair’’ is not mentioned in the
metadata of sofas. However, care must be taken when expanding queries so that precision
of search is not lost. For example, the underlying ontological hierarchies, such as a SKOS
vocabulary, may not be transitive, leading to problems. For example, if the broader concept
of ‘‘makeup mirrors’’ is ‘‘mirrors,’’ and the broader concept of ‘‘mirrors’’ is ‘‘furniture,’’
then searching for furniture would return makeup mirrors, if query expansion is applied.
Part-of relations are especially tricky in terms of query expansion. When searching for
chairs in Europe, chairs from individual European countries should also be found. However,
''doors'' should not be returned when searching for ''buildings,'' even though doors are parts of
buildings.
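A minimal sketch of guarded query expansion is given below: narrower concepts are collected from a SKOS hierarchy only down to a given depth, so that questionable transitive steps (such as reaching makeup mirrors from furniture) are not followed blindly. The vocabulary and the depth limit are illustrative.

```python
# A sketch of depth-limited query expansion over skos:narrower.
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/vocab/")  # illustrative hierarchy only
g = Graph()
g.add((EX.furniture, SKOS.narrower, EX.chair))
g.add((EX.furniture, SKOS.narrower, EX.mirrors))
g.add((EX.chair, SKOS.narrower, EX.sofa))
g.add((EX.mirrors, SKOS.narrower, EX.makeupMirrors))

def expand(graph, concept, max_depth=1):
    """Collect concept plus narrower concepts down to max_depth hops."""
    frontier, result = {concept}, {concept}
    for _ in range(max_depth):
        frontier = {n for c in frontier for n in graph.objects(c, SKOS.narrower)}
        result |= frontier
    return result

print(expand(g, EX.furniture, max_depth=1))
# {furniture, chair, mirrors} -- makeup mirrors not reached at depth 1
```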
A problem of semantic search is mapping the literal search words, used by humans, to
underlying ontological concepts, used by the computer. Depending on the application,
only queries expressed by terms that are relevant to the domain and the available content
are successful; other queries result in frustrating ''no hits'' answers. A way to solve the
problem is to provide the end user with explicit vocabularies as facets in the user interface,
for example, a subject heading category tree as in Yahoo! and dmoz.org. By selecting
a category, related documents are retrieved. If content in semantic search is indexed using
language-neutral concept URIs, and their labels are available in different languages,
multilinguality can be supported.
A widely employed semantic search and browsing technique in semantic CH portals is
view-based or faceted search [95–99]. Here, the user can make several simultaneous
selections from orthogonal facets (e.g., object type, place, time, creator). The facets are exposed
to the end user in order to (1) provide him or her with the right query vocabulary and
(2) present the repository contents, the search results, and the number of hits in
facet categories. The number of hits resulting from a category selection is always shown to
the user before the selection. This eliminates queries leading to ‘‘no hits’’ dead ends, and
guides the user in making the next search selection on the facets. The result set can be
presented to the end user according to the facet hierarchies for better readability. This is
different from traditional full text search where results are typically presented as a hit list
ordered by decreasing relevance. Faceted search is not a panacea for all information
retrieval tasks. A Google-like keyword search interface is usually preferred if the user is
capable of expressing his or her information need accurately [101].
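The hit counts that drive this behavior can be computed with a simple aggregation over the current result set, as in the following sketch; the items and facet values are illustrative.

```python
# A sketch of computing the hit counts shown on facet categories: given the
# current result set, count how many items fall under each value of a facet.
from collections import Counter

items = {  # illustrative annotations: item -> facet values
    "teacup":   {"type": "vessel",    "place": "Helsinki", "century": "19th"},
    "chair":    {"type": "furniture", "place": "Turku",    "century": "19th"},
    "painting": {"type": "art",       "place": "Helsinki", "century": "18th"},
}

def facet_counts(result_set, facet):
    """Counts per category; zero-hit categories can then be hidden or greyed."""
    return Counter(items[i][facet] for i in result_set)

current_results = {"teacup", "chair", "painting"}
print(facet_counts(current_results, "place"))
# Counter({'Helsinki': 2, 'Turku': 1})
```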
Faceted search has been integrated with the idea of ontologies and the Semantic Web
[99]. The facets can be constructed algorithmically from a set of underlying ontologies
that are used as the basis for annotating search items. Furthermore, the mapping of search
items onto search facets can be defined using logic rules. This facilitates more intelligent
semantic search of indirectly related items. Methods for ranking the search results in
faceted search based on fuzzy logic and probability theory are discussed in [100].
Another search technique now abundant in semantic CH applications is autocompletion.
The idea here is to search feasible query word options dynamically as the user types in a query,
and to provide the options for him or her to choose from. Semantic autocompletion [102,
103] generalizes this idea by trying to guess, based on ontologies and reasoning, the search
concept that the user is trying to formulate after each input character in an input field, or
even by carrying the search through to the actual search objects dynamically.
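A minimal sketch of the idea is given below: typed prefixes are matched against a label-to-concept index, so that labels in several languages can lead to the same concept URI. The index and labels are illustrative; a real implementation would consult the underlying ontology service.

```python
# A sketch of semantic autocompletion: match the typed prefix against
# concept labels (in any language) and return candidate concept URIs.
labels = {  # illustrative label -> concept-URI index
    "chair": "ex:Chair",
    "chaise longue": "ex:ChaiseLongue",
    "tuoli": "ex:Chair",  # Finnish label for the same concept
    "sofa": "ex:Sofa",
}

def autocomplete(prefix):
    prefix = prefix.lower()
    return sorted({uri for label, uri in labels.items()
                   if label.startswith(prefix)})

print(autocomplete("cha"))  # ['ex:Chair', 'ex:ChaiseLongue']
```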
With non-textual cultural documents, such as paintings, photographs, and videos,
metadata-based search techniques are a must in practice. However, content-based
information retrieval (CBIR) methods [92], focusing on retrieving images, and multimedia
information retrieval (MIR) methods [93], focusing on retrieving multimedia content, can also be
used as complementary techniques. Here, the idea is to utilize actual document features
(at the data level), such as color, texture, and shape in images, as a basis for information
retrieval. For example, an image of Abraham Lincoln could be used as a query for finding
other pictures of him, or a piece of music could be searched for by humming it. Tools for
navigating, searching, and retrieving 2D images, 3D models, and textual metadata have
been developed, for example, in the Sculpteur project (http://www.sculpteurWeb.org).
Bridging the ‘‘semantic gap’’ between low-level image and multimedia features and
semantic annotations is an important but challenging research theme [94].
21.3.2.6 Semantic Browsing and Recommending
The idea of semantic browsing is to provide the end user with meaningful links to related
contents, based on the underlying metadata and ontologies of contents. RDF browsers
and tabulators are a simple form of a semantic browser. Their underlying idea has been
explicated as the linked data principle proposing that when an RDF resource (URI) is
rendered in a browser, the attached RDF links to related resources should be shown. When
one of these links is selected, the corresponding new resource is rendered, and so on.
A more developed and general idea is recommender systems [104, 107, 108]. Here, the
logic of selecting and recommending related resources can also be based on principles other
than the underlying RDF graph. For example, collaborative filtering is based on the browsing
statistics of other users. Logic rules on top of an RDF knowledge base can also be used for
creating semantic recommendation links and, at the same time, explanations telling the end
user why the recommendation link was selected in this context. Recommendations can be
based on a profile of the user's interests and the user's feedback or browsing log [109, 110].
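In the spirit of the rule-based linking described above (though not the actual Prolog rules of [80]), the following sketch shows a single rule that fires when two artifacts share a maker, producing both a recommendation link and a human-readable explanation; the triples are illustrative.

```python
# A sketch of a rule over a triple set that yields recommendation links
# together with natural-language explanations for the end user.
facts = [  # illustrative (subject, predicate, object) triples
    ("cup1", "madeBy", "ArabiaFactory"),
    ("plate9", "madeBy", "ArabiaFactory"),
    ("chair3", "madeBy", "UnknownCarpenter"),
]

def recommend_same_maker(item):
    makers = {o for s, p, o in facts if s == item and p == "madeBy"}
    for s, p, o in facts:
        if s != item and p == "madeBy" and o in makers:
            yield s, f"Recommended because {s} and {item} were both made by {o}."

for target, explanation in recommend_same_maker("cup1"):
    print(target, "-", explanation)
# plate9 - Recommended because plate9 and cup1 were both made by ArabiaFactory.
```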
Semantic recommending is related to relational search, where the idea is to try to
search and discover serendipitous semantic associations between different content items.
The idea is to make it possible for the end user to formulate queries such as ‘‘How is
X related to Y’’ by selecting the end-point resources, and the search result is a set of
semantic connection paths between X and Y [83, 84].
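A minimal sketch of relational search is given below: a breadth-first search over an undirected view of the triple set returns one shortest connection path between two resources. The example triples are illustrative, not actual ULAN data.

```python
# A sketch of relational search: BFS over an undirected view of a triple set,
# returning one shortest connection path between X and Y.
from collections import deque

edges = [  # illustrative relation triples (a ULAN-style social network)
    ("Gallen-Kallela", "studentOf", "Bouguereau"),
    ("Bouguereau", "colleagueOf", "Ferrier"),
]

def find_path(x, y):
    """Return one shortest path [node, predicate, node, ...] from x to y."""
    nbrs = {}
    for s, p, o in edges:
        nbrs.setdefault(s, []).append((p, o))
        nbrs.setdefault(o, []).append((p + "^-1", s))  # traverse both directions
    queue, seen = deque([(x, [x])]), {x}
    while queue:
        node, path = queue.popleft()
        if node == y:
            return path
        for p, nxt in nbrs.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [p, nxt]))
    return None

print(find_path("Gallen-Kallela", "Ferrier"))
# ['Gallen-Kallela', 'studentOf', 'Bouguereau', 'colleagueOf', 'Ferrier']
```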
The behavior of semantic CH applications should in many cases be dynamic, based on
the context of usage [104]. Users are usually not interested in everything found in the
underlying content repositories, and would like to get information at different levels of
detail (e.g., information for children, professionals, or busy travelers). An important aspect
of a CH application is then adaptation of the portal to different personal information
needs, interests, and usage scenarios, that is, the context of using an application. The
context concerns several aspects:
● Personal interests and the behavior of the end user. Material likely to be of interest to
the user should be preferred. Techniques such as collaborative filtering could be useful
in utilizing other users’ behavior.
● The social environment of the user (e.g., friends and other system users).
● The place and other environmental conditions (e.g., weather) in which the application is
used.
● The time of using the system (summer, night, etc.). For example, recommending a visit
to a beach for a swim to the end user during winter may not be wise due to snow, and
it would be frustrating to direct him or her to a museum on a Monday when it happens to
be closed.
● The computational environment at hand (WiFi, RFID, GPS, ad hoc networks, etc.).
21.3.2.7 Visualization
Visualization is an important aspect of the Semantic Web dealing with semantically
complex and interlinked contents [105]. In the cultural heritage domain, maps, timelines,
and methods for visualizing complicated and large semantic networks, result sets, and
recommendations are of special interest.
Maps are useful both in searching content and in visualizing the results. A widely used
approach to using maps in portals is to employ mash-up map services based on Google Maps
or similar services. For example, many Wikipedia articles have location information and
can be projected on maps [89]. Maps can also be used as navigational aids.
In the cultural heritage domain, historical maps are of interest in their own right. For
example, they depict old place names and borders no longer available in contemporary
maps. An approach to visualizing historical geographical changes is developed in [106].
Here, old maps are laid semitransparently on top of the contemporary maps and satellite
images of Google Maps, as a kind of historical lens. At the same time, articles from
Wikipedia and photos from services like Panoramio, as well as objects from museum
collections, can be visualized on top of the maps, giving an even richer historical and
contemporary perspective on the contents.
Maps are highly usable on mobile phones and in navigation systems. Many modern phones
include not only GPS for positioning, but also a compass for orientation. In some
augmented reality systems, it is possible to point the camera of the device in a direction
and get information about the nearby objects there. An example of this type of system is
Wikitude (http://www.wikitude.org).
Another important dimension for visualizing cultural content is time. A standard
approach for temporal visualization is to project objects of interest on a timeline.
A generic mash-up tool for creating timelines is the Simile timeline (http://simile.mit.
edu/timeline/). A timeline can be used both for querying and for visualizing and orga-
nizing search results.
21.3.2.8 Cultural Heritage as Web Services
The Semantic Web facilitates reusing and aggregating contents through Web APIs [83].
A starting point for this is to publish the CH repository as a SPARQL endpoint. It is also
possible to develop higher-level services for querying the RDF store. Both traditional Web
Services and lightweight mash-ups based on AJAX and REST can be used here. Using the
mash-up approach, the functionalities can be used in external applications with just a few
lines of JavaScript code added on the HTML level.
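For example, a client might consume such an endpoint as in the following sketch, written in Python with the SPARQLWrapper library; the endpoint URL and the query vocabulary are placeholders.

```python
# A sketch of querying a CH repository published as a SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/ch/sparql")  # placeholder endpoint
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?item ?title WHERE {
        ?item dcterms:title ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["item"]["value"], binding["title"]["value"])
```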
The possibility of reusing semantic CH content as a service is an important motivator for
organizations to join CH portal projects. In this way, one can not only gain more visibility
for one's content through a larger portal, but also enrich one's own content with others'
related content and reuse the enriched content in other applications.
21.4 Related Resources Including Key Papers
21.4.1 Multimedia Ontologies, Annotation, and Analysis
Semantic Multimedia. Staab, S., Scherp, A., Arndt, R., Troncy, R., Grzegorzek, M.,
Saathoff, C., Schenk, S., Hardman, L.: Semantic multimedia. In: Reasoning Web: Fourth
International Summer School, Venice, Italy, 7–11 September 2008. Tutorial Lectures.
Springer, pp. 125–170 (2008).
In this paper, issues of semantics in multimedia management are dealt with, covering
the representation of multimedia metadata using Semantic Web ontologies; the interpre-
tation of multimedia objects by various means of reasoning; the retrieval of multimedia
objects by means of low- and high-level (semantic) representations of multimedia; and
the further processing of multimedia facts in order to determine provenance, certainty,
and other meta-knowledge aspects of multimedia data.
Enquiring MPEG-7-based multimedia ontologies. Dasiopoulou, S., Tzouvaras, V.,
Kompatsiaris, I., Strintzis, M.G.: Enquiring MPEG-7-based multimedia ontologies.
Multimed Tools Appl 46(2–3) (January 2010).
Machine-understandable metadata form the main prerequisite for the intelligent
services envisaged in a Web which, going beyond mere data exchange, provides
for effective content access, sharing, and reuse. MPEG-7, despite providing
a comprehensive set of tools for the standardized description of audiovisual content, is
largely compromised by the use of XML that leaves the largest part of the intended
semantics implicit. Aspiring to formalize MPEG-7 descriptions and enhance multimedia
metadata interoperability, a number of multimedia ontologies have been proposed.
Though sharing a common vision, the developed ontologies are characterized by sub-
stantial conceptual differences, reflected both in the modeling of MPEG-7 description
tools as well as in the linking with domain ontologies. Delving into the principles
underlying their engineering, a systematic survey of state-of-the-art MPEG-7-based
multimedia ontologies is presented, and the issues that hinder interoperability, as
well as possible directions toward their harmonization, are highlighted.
COMM: A Core Ontology for Multimedia Annotation. Arndt, R., Troncy, R., Staab, S.,
Hardman, L.: COMM: a core ontology for multimedia annotation. In: Staab, S., Studer, R.
(eds.), Handbook on Ontologies, 2nd edn. International Handbooks on Information Sys-
tems. Springer Verlag, pp. 403–421 (2009).
This chapter analyzes the requirements underlying the semantic representation of media
objects, explains why the requirements are not fulfilled by most semantic multimedia
ontologies, and presents COMM, a core ontology for multimedia that has been built by
reengineering the current de facto standard for multimedia annotation, that is, MPEG-7,
and using DOLCE as its underlying foundational ontology to support conceptual clarity and
soundness as well as extensibility toward new annotation requirements.
A Description Logic for Image Retrieval. Di Sciascio, E., Donini, F.M., Mongiello, M.:
A description logic for image retrieval. In: Proceedings of AI*IA, September 1999.
This paper presents a Description Logic–based language that enables the description
of complex objects as compositions of simpler artifacts for the purpose of semantic image
indexing and retrieval. An extensional semantics is provided, which allows for the formal
definition of corresponding reasoning services.
Ontological inference for image and video analysis. Town, C.: Ontological inference
for image and video analysis. Mach Vis Appl 17(2), 94–115 (2006).
Though focusing solely on probabilistic aspects of the imprecision involved in image
and video analysis, the paper elaborates insightfully on the individual limitations of
ontological and Bayesian inference, and proposes an iterative, goal-driven hypothesize-
and-test approach to content interpretation.
21.4.2 Broadcaster Artifacts Online
21.4.2.1 Vocabularies and Ontologies
● BBC programmes ontology. This ontology aims at providing a simple vocabulary for
describing programs. It covers brands, series (seasons), episodes, broadcast events,
broadcast services, etc. The data at http://www.bbc.co.uk/programmes are annotated
using this ontology. http://bbc.co.uk/ontologies/programmes/
21.4.2.2 Metadata Schemas
● TV-Anytime. TV-Anytime is a set of specifications for the controlled delivery of multi-
media content to a user’s personal device (Personal Video Recorder (PVR)). It seeks to
exploit the evolution in convenient, high-capacity storage of digital information to
provide consumers with a highly personalized TV experience. Users will have access to
content from a wide variety of sources, tailored to their needs and personal preferences.
http://www.etsi.org/Website/technologies/tvanytime.aspx
21.4.2.3 Semantic Television
NoTube: making the Web part of personalized TV. Schopman, B., Brickley, D., Aroyo, L.,
van Aart C., Buser, V., Siebes, R., Nixon, L., Miller, L., Malaise, V., Minno, M., Mostarda, M.,
Palmisano, D., Raimond, Y.: NoTube: making the Web part of personalized TV. In: Pro-
ceedings of the WebSci10: Extending the Frontiers of Society Online, April 2010.
The NoTube project aims to close the gap between the Web and TV by means of semantics. Bits
and pieces of personal and TV-related data are scattered around the Web. NoTube aims to
put the user back in the driver’s seat by using data that are controlled by the user, for
example, from Facebook and Twitter, to recommend programs that match the user’s
interests. By using the linked data cloud, semantics can be exploited to find complex
relations between the user’s interests and background information on programs, resulting
in potentially interesting recommendations.
21.4.3 Cultural Heritage Artifacts Online
21.4.3.1 Vocabularies and Ontologies
● Art and Architecture Thesaurus (AAT). A hierarchical vocabulary of around 34,000
records, including 134,000 terms, descriptions, bibliographic citations, and other
information relating to fine art, architecture, decorative arts, archival materials,
archaeology, and other material culture. http://www.getty.edu/research/conducting_research/vocabularies/aat/
● Thesaurus of Geographical Names (TGN). A hierarchical vocabulary of around
895,000 records, including 1.1 million names, place types, coordinates, and descriptive
notes, focusing on places important for the study of art and architecture. http://www.
getty.edu/research/conducting_research/vocabularies/tgn/
● Union List of Artist Names (ULAN). A vocabulary of around 162,000 records,
including 453,000 names and biographical and bibliographic information for artists,
architects, firms, shops, and art repositories, including a wealth of variant names, pseu-
donyms, and language variants. http://www.getty.edu/research/conducting_research/
vocabularies/ulan/
● Iconclass. A classification system designed for art and iconography. It is the most
widely accepted scientific tool for the description and retrieval of subjects represented
in images (works of art, book illustrations, reproductions, photographs, etc.) and is
used by museums and art institutions. http://www.iconclass.nl/
● Library of Congress Subject Headings (LCSH). A very large subject classification
system for libraries, available also in SKOS. http://id.loc.gov/authorities/
21.4.3.2 Metadata Schemas
● Dublin Core. The Dublin Core Metadata Initiative, or ‘‘DCMI,’’ is an open organiza-
tion engaged in the development of interoperable metadata standards that support
a broad range of purposes and business models. http://dublincore.org/
● CIDOC CRM. The CIDOC Conceptual Reference Model (CRM) provides
definitions and a formal structure for describing the implicit and explicit con-
cepts and relationships used in cultural heritage documentation. http://www.cidoc-
crm.org/
21.4.3.3 Semantic eCulture Systems Online
● MuseumFinland. A semantic portal aggregating artifact collections from several
museums [80]. The content comes from Finnish museums and the system interface
is in Finnish (with an English tutorial). http://www.museosuomi.fi/
● MultimediaN eCulture Demonstrator. This cultural search engine gives access to
artworks from several museum collections using several large vocabularies [84]. http://
e-culture.multimedian.nl/demo/session/search
● CultureSampo. A semantic portal aggregating cultural content of different kinds from
tens of different organizations, Web sources, and the public [83]. The content comes from
Finnish and some international sources, and the user interface supports Finnish,
English, and Swedish. http://www.kulttuurisampo.fi/
21.5 Future Issues
Semantic technologies are seen as being nearly ready for mainstream adoption as the first
decade of the twenty-first century draws to an end. While the situation with respect
to textual content is quite mature, non-textual media present additional challenges
to the technology adoption. As a result, the wider adoption of semantic multimedia may
first follow the breakthrough of other applications of semantic technology – for example,
knowledge management, data integration, semantic search – applied to textual content.
The new challenges being faced by the media industry – the scale and complexity of media
being produced and shared – act as a market driver for technological advances in the semantic
multimedia field. Online media in particular needs improved retrieval, adaptation, and
presentation if content owners are to win market share in a broad and overcrowded market.
A few media organizations have begun to lead the way in using and demonstrating semantics;
for example, the BBC has begun to publish its online content with RDF.
The arts – that is, cultural heritage – are another sector in which semantics are gaining
traction. Museums, for example, have large amounts of metadata about their collections,
which cannot be easily interpreted or reused due to non-digital, non-semantic, and
proprietary approaches. Again, some pioneers, such as the Rijksmuseum in Amsterdam, are
taking the first steps to digitize and annotate their collections and to explore the new
possibilities that this opens up.
The media, arts, and entertainment sector looks to semantics as a clear future solution
to its problems with large scales of heterogeneous non-textual content, and to the
emerging challenges in realizing attractive and competitive content offerings on a ubiquitous
Web with millions of content channels. The cost of creating the semantic data tends to be
larger at present than the benefits gained from its creation, so while the potential benefit
from semantics will continue to grow as Web media becomes more ubiquitous (making
a Unique Selling Point ever more critical for a content owner and provider), the actual
costs of semantics must still fall through improved, more automated content annotation
tools and approaches. Let us look at trends and technology in two specific target areas for
semantic multimedia.
IP Television: IP Television refers to the convergence of the Internet and television,
which is also happening outside of the television set (e.g., in Web-based TV and mobile
TV). Currently, it is focused on new types of services around television such as EPGs,
programming on demand, and live TV pause. An emerging trend in IPTV is toward Web
integration through widgets, which are lightweight self-contained content items that
make use of open Web standards (HTML, JavaScript) and the IP back-channel to
communicate with the Web (typically in an asynchronous manner). Yahoo! and Intel,
for example, presented their Widget Channel at the CES in January 2009, where Web
content such as Yahoo! news and weather, or Flickr photos, could be displayed in on-screen
widgets on TV. Sony and Samsung will go to market in 2010 with Internet-enabled televisions.
A 2009 survey predicted a gradual but steady uptake of TV Internet usage,
with ''the mass market inflection point occurring over the next 3–5 years'' (from http://
oregan.net/press_releases.php?article=2009-01-07). Parallel to this, research into seman-
tic IPTV applications and solutions is being established in academic and industry labs.
A key focus for semantics is the formal description of the programming and user interests
to provide for a better personalization of the TV experience (EU project NoTube, http://
www.notube.tv, 2009–2012) as well as formal description of networks and content
to enable a better delivery of complex services (myMedia, http://www.semantic-iptv.de).
A major barrier to uptake by broadcasters and content providers is the lack of support
for semantic technology in the legacy broadcast systems. Shifts in the provider-side IT
infrastructure to Internet-based (even cloud-based) infrastructures should give an open-
ing for the introduction of semantics into the production systems of the television and
media companies. Vocabularies and technologies will need to converge on specific
standards to encourage industry acceptance, as discussed in this chapter's section on
broadcasting; these standards should emerge in this next ''uptake'' period. As Internet-TV reaches the
mass market point (possibly by 2014), companies will seek Unique Selling Points for their
products and services, which will drive the incorporation of semantic technologies into
IPTV infrastructures and packages.
Virtual Worlds and 3D: The third dimension has always been a part of human
perception, but in the digital world it has had a shorter existence. Today, on the other
hand, computers are capable of rendering highly complex 3D scenes, which can even be
mistaken for real by the human eye. 3DTV is on the cusp of market introduction. A new IT
segment must deal with the capture of 3D objects and their manipulation and management,
in application domains ranging from health care to cultural heritage.
Challenges in the 3D technology domain include how to describe 3D objects for their
indexing, storage, retrieval, and alteration. Semantics provide a means to improve the
description, search, and reuse of complex 3D digital objects. Awareness of the value and
potential use of this technology in the 3D media community is at an early stage [111]. It is
being promoted to industry through initiatives like FOCUS K3D (http://www.focusk3d.
eu), which has application working groups for the domains of medicine and bioinfor-
matics, gaming and simulation, product modeling, and archaeology. A survey on the state
of the art in cultural heritage [112] notes that progress is being made on standardized
metadata schemes and ontologies; current limitations relate to methodologies and the
lack of specialized tools for 3D knowledge management, yet this could be addressed in the
short to medium term.
Virtual worlds are a natural extension of 3D technology into reflecting the perceptive
realities of one's own world, and have also found practical application in domains such as
medicine, social analysis, education, and eCommerce. Making virtual worlds ''react'' more
realistically to actions and activities performed by the actors of that world requires
a (semantic) understanding of the objects rendered in the world and the events that (can)
occur between them. There is also a trend to more closely couple real and virtual
worlds through (real world) sensors, which generate data streams to cause the virtual world
to reflect the real in near-real time. This leads to a requirement to handle increasing scales of
heterogeneous, dirty data for information extraction and actionable inference within the
virtual world.
As in the 3D technology field, barriers to use of semantic technologies lie in the need to
agree on the vocabularies and schema for descriptions of the worlds (which now goes
beyond the form of objects, and encapsulates what can be done with them, how they react
to external events, etc.), as well as the availability of appropriate tools for the creation and
maintenance of semantic virtual worlds. Hence, it is likely that semantics will first need to
experience wide uptake in 3D technology systems before it also further develops into
a technology for virtual worlds in the medium to long term. Projects such as Semantic
Reality (http://www.semanticreality.org) provide exciting longer-term visions of a virtual
world tightly connected to the real world, with trillions of sensors able to ensure a close
correlation between the two [113].
Such visions of intelligent media and even intelligent worlds will be built on the building blocks of
semantic multimedia technology discussed in this chapter, once key barriers to uptake are
overcome. In particular, semantic annotation of non-textual media remains a significant
barrier.
Foundational technologies to (semi)automatically annotate non-textual resources
have been investigated by the multimedia semantics community, which spans more
broadly the areas of computer vision and multimedia analysis. These areas provide
means to analyze visual streams of information with the help of low-level feature extrac-
tion, object detection or high-level event inferencing. Despite promising advances,
approaches that can be generically and efficiently applied to automate annotation across
media still remain to be defined. In contrast to textual resources, which are annotated
automatically to a large extent, the semantic annotation of non-textual media relies heavily
on human input and is thus associated with significant costs.
The area of computer vision provides methods to make visual resources accessible
to machines. Recent years have seen considerable advancement in the range of things that
are detectable in still and moving images. This ranges from object detection, scaling up
to a considerable number of different objects for some tools, to object tracking in,
for example, surveillance videos. All of these approaches try to derive meaning from low-level
features (like color histograms, motion vectors, etc.) automatically. Despite constant
advances, these tools are still not capable of exploiting the full meaning of visual resources,
as not all meaning is localized in the visual features; some requires human interpretation.
Two directions are currently followed in research: the first is to provide rich human
annotations as training data for future automated analysis; the second relies purely on
analysis of raw content, which only performs well for specialized domains and settings in
which relevant concepts can be easily recognized. Richer semantics, capturing implicit
features and meaning derived from humans, cannot be extracted in this manner. Present
trends therefore put the human more and more into the loop by lowering the entry barrier
for his or her participation. This is done by adopting Web 2.0 or game-based approaches
to engage users in the annotation
of visual resources. Recent approaches, for example, try to support automatic analysis with
tagging or vice versa. What is still missing are approaches that are capable of exploiting
higher-level features even in visual resources of lower quality and that can be adapted
across domains.
Hence, in the foreseeable future, multimedia analysis will still have to be supported by end users
to a great extent. This is why recent years have seen a huge growth in available annotation
tools, which allow manual or semiautomatic annotation of visual resources. These tools are
targeted either at supporting analysis approaches by providing training data or at serving as
a means for users to organize media. They show varying complexity: while some allow
users to express complex statements about visual resources, others enable the provision of
tags. Some apply annotation or tag propagation and offer support based on
previously supplied annotations. Most of these approaches are still not mature and are only
applied in research. While approaches based on (complex) ontologies exist, some of them are
not suitable for most end users. At the other end of the spectrum, tagging-based approaches
are not suitable for capturing all the semantics in visual resources. What is still needed are
tools that allow one to capture subjective views of visual resources and combine these views
to deliver a consolidated, objective view that holds across users. While tagging-based
approaches have proven to ease large-scale uptake, motivating users to provide
more meaningful annotations is still an issue.
21.6 Cross-References
> Future Trends
References
1. Deerwester, S., Dumais, S.T., Furnas, G.W.,
Landauer, T.K., Harshman, R.: Indexing by latent
semantic analysis. J. Am. Soc. Inf. Sci. 41(6),
391–407 (1990)
2. Burger, T., Hausenblas, M.: Why real-world mul-
timedia assets fail to enter the semantic web. In:
Proceedings of the Semantic Authoring, Annota-
tion and Knowledge Markup Workshop
(SAAKM 2007) located at the Fourth Interna-
tional Conference on Knowledge Capture (KCap
2007), Whistler. CEUR Workshop Proceedings
289. CEUR-WS.org (2007)
3. MPEG-7: Multimedia Content Description Interface.
Standard No. ISO/IEC 15938 (2001)
4. van Ossenbruggen, J., Nack, F., Hardman, L.:
That obscure object of desire: multimedia meta-
data on the web (part I). IEEE Multimed. 11(4),
38–48 (2004)
5. Nack, F., van Ossenbruggen, J., Hardman, L.:
That obscure object of desire: multimedia meta-
data on the web (part II). IEEE Multimed. 12(1),
54–63 (2005)
6. Troncy, R., Carrive, J.: A reduced yet extensible
audio-visual description language: how to
escape from the MPEG-7 bottleneck. In: Proceed-
ings of the Fourth ACM Symposium on Docu-
ment Engineering (DocEng 2004), Milwaukee
(2004)
7. Troncy, R., Bailer, W., Hausenblas, M., Hofmair,
P., Schlatte, R.: Enabling multimedia metadata
interoperability by defining formal semantics
of MPEG-7 profiles. In: Proceedings of the First
International Conference on Semantics and
Digital Media Technology (SAMT 2006), Athens,
pp. 41–55 (2006)
8. Garcia, R., Celma, O.: Semantic integration and
retrieval of multimedia metadata. In: Proceedings
of the Fifth International Workshop on
Knowledge Markup and Semantic Annotation
(SemAnnot 2005), Galway, pp. 69–80 (2005)
9. Hunter, J.: Adding multimedia to the semantic
web – building an MPEG-7 ontology. In:
Proceedings of the First International Semantic
Web Working Symposium (SWWS 2001),
Stanford, pp. 261–281 (2001)
10. Tsinaraki, C., Polydoros, P., Christodoulakis, S.:
Interoperability support for ontology-based
video retrieval applications. In: Proceedings of
the Third International Conference on Image
and Video Retrieval (CIVR 2005), Dublin,
pp. 582–591 (2005)
11. Troncy, R., Celma, O., Little, S., Garcia, R.,
Tsinaraki, C.: MPEG-7 based multimedia
ontologies: interoperability support or interoper-
ability issue? In: SAMT 2007: Workshop on Mul-
timedia Annotation and Retrieval enabled by
Shared Ontologies (MAReSO 2007), Genoa
(2007)
12. Lagoze, C., Hunter, J.: The ABC ontology and
model (v3.0). J. Digit. Inf. 2(2) (2001)
13. Hunter, J.: Enhancing the semantic interoperabil-
ity of multimedia through a core ontology.
IEEE Trans. Circuits Syst. Video Technol. 13(1),
49–58 (2003)
14. Hunter, J., Little, S.: A framework to enable
the semantic inferencing and querying of
multimedia content. Int. J. Web Eng. Technol.
2(2/3), 264–286 (2005) (Special issue on the
Semantic Web)
15. Pease, A., Niles, I., Li, J.: The suggested upper
merged ontology: a large ontology for the seman-
tic web and its applications. In: Working Notes of
the AAAI-2002 Workshop on Ontologies and the
Semantic Web, Edmonton (2002)
16. Gangemi, A., Guarino, N., Masolo, C., Oltramari,
A., Schneider, L.: Sweetening ontologies with
DOLCE. In: Proceedings of the 13th International
Conference on Knowledge Engineering and
Knowledge Management (EKAW 2002),
Siguenza, pp. 166–181 (2002)
17. Polydoros, P., Tsinaraki, C., Christodoulakis, S.:
GraphOnto: OWL-based ontology management
and multimedia annotation in the DS-MIRF
framework. J. Digit. Inf. Manag. 4(4), 214–219
(2006)
18. Tsinaraki, C., Polydoros, P., Christodoulakis, S.:
Interoperability support between MPEG-7/21
and OWL in DS-MIRF. Trans. Knowl. Data Eng.
19(2), 219–232 (2007) (Special issue on the
Semantic Web Era)
19. Garcia, R., Gil, R., Delgado, J.: A web ontologies
framework for digital rights management. J. Artif.
Intell. Law 15, 137–154 (2007)
20. Garcia, R., Gil, R.: Facilitating business interop-
erability from the semantic web. In: Proceedings
of the Tenth International Conference on Busi-
ness Information Systems (BIS 2007), Poznan,
pp. 220–232 (2007)
21. Troncy, R.: Integrating structure and semantics
into audio-visual documents. In: Proceedings of
the Second International Semantic Web Confer-
ence (ISWC 2003), Sanibel Island. Lecture Notes
in Computer Science, vol. 2870, pp. 566–581.
Springer, Berlin (2003)
22. Isaac, A., Troncy, R.: Designing and using an
audio-visual description core ontology. In:Work-
shop on Core Ontologies in Ontology Engineer-
ing at the 14th International Conference on
Knowledge Engineering and Knowledge Manage-
ment (EKAW 2004), Whittlebury Hall (2004)
23. Bloehdorn, S., Petridis, K., Saathoff, C., Simou,
N., Tzouvaras, V., Avrithis, Y., Handschuh, S.,
Kompatsiaris, Y., Staab, S., Strintzis, M.G.:
Semantic annotation of images and videos for
multimedia analysis. In: Proceedings of the Sec-
ond European Semantic Web Conference (ESWC
2005), Heraklion. Lecture Notes in Computer
Science, vol. 3532, pp. 592–607. Springer, Berlin
(2005)
24. Hollink, L., Worring, M., Schreiber, A.Th.: Build-
ing a visual ontology for video retrieval. In: Pro-
ceedings of the 13th ACM International
Conference on Multimedia (MM 2005), Hilton
(2005)
25. Vembu, S., Kiesel, M., Sintek, M., Baumann, S.:
Towards bridging the semantic gap inmultimedia
annotation and retrieval. In: Proceedings of the
First International Workshop on Semantic Web
Annotations for Multimedia (SWAMM 2006),
Edinburgh (2006)
26. Halaschek-Wiener, C., Golbeck, J., Schain, A.,
Grove, M., Parsia, B., Hendler, J.: Annotation
and provenance tracking in semantic web photo
libraries. In: Proceedings of International
Provenance and Annotation Workshop (IPAW
2006), Chicago, pp. 82–89 (2006)
27. Chakravarthy, A., Ciravegna, F., Lanfranchi, V.:
Aktivemedia: cross-media document annotation
and enrichment. In: Poster Presentation at the
Proceedings of Fifth International Semantic Web
Conference (ISWC 2006), Athens, GA. Lecture
Notes in Computer Science, vol. 4273. Springer,
Berlin (2006)
28. Petridis, K., Bloehdorn, S., Saathoff, C.,
Simou, N., Dasiopoulou, S., Tzouvaras, V.,
Handschuh, S., Avrithis, Y., Kompatsiaris, I.,
Staab, S.: Knowledge representation and semantic
annotation of multimedia content. IEE Proc.
Vis. Image Signal Process. 153, 255–262 (2006)
(Special issue on Knowledge-Based Digital Media
Processing)
29. Simou, N., Tzouvaras, V., Avrithis, Y., Stamou, G.,
Kollias, S.: A visual descriptor ontology for mul-
timedia reasoning. In: Proceedings of the Work-
shop on Image Analysis for Multimedia
Interactive Services (WIAMIS 2005), Montreux
(2005)
30. Saathoff, C., Schenk, S., Scherp, A.: Kat: the
k-space annotation tool. In: Poster Session, Inter-
national Conference on Semantic and Digital
Media Technologies (SAMT 2008), Koblenz
(2008)
31. Arndt, R., Troncy, R., Staab, S., Hardman, L.,
Vacura, M.: COMM: designing a well-founded
multimedia ontology for the web. In: Proceedings
of Sixth International Semantic Web Conference
(ISWC 2007), Busan. Lecture Notes in Computer
Science, vol. 4825, pp. 30–43. Springer, Berlin
(2007)
32. Saathoff, C., Scherp, A.: Unlocking the seman-
tics of multimedia presentations in the web
with the multimedia metadata ontology. In:
Proceedings of 19th International Conference
on World Wide Web (WWW 2010), Raleigh,
pp. 831–840 (2010)
33. Smeulders, A.W.M., Worring, M., Santini, S.,
Gupta, A., Jain, R.: Content-based image retrieval
at the end of the early years. IEEE Trans. Pattern
Anal. Mach. Intell. 22(12), 1349–1380 (2000)
34. Rao, A., Jain, R.: Knowledge representation and
control in computer vision systems. IEEE Expert
3, 64–79 (1988)
35. Draper, B., Hanson, A., Riseman, E.: Knowledge-
directed vision: control, learning and integration.
Proc. IEEE 84(11), 1625–1681 (1996)
36. Snoek, C., Huurnink, B., Hollink, L., de Rijke, M.,
Schreiber, G., Worring, M.: Adding semantics to
detectors for video retrieval. IEEE Trans.
Multimed. 9(5), 975–986 (2007)
37. Hauptmann, A., Yan, R., Lin, W.H., Christel, M.,
Wactlar, H.: Can high-level concepts fill the
semantic gap in video retrieval? A case study
with broadcast news. IEEE Trans. Multimed.
9(5), 958–966 (2007)
38. Hunter, J., Drennan, J., Little, S.: Realizing the
hydrogen economy through Semantic Web
technologies. IEEE Intell. Syst. 19(1), 40–47
(2004)
39. Little, S., Hunter, J.: Rules-by-example – a novel
approach to semantic indexing and querying
of images. In: Proceedings of the Third Interna-
tional Semantic Web Conference (ISWC 2004),
Hiroshima. Lecture Notes in Computer
Science, vol. 3298, pp. 534–548. Springer, Berlin
(2004)
40. Hollink, L., Little, S., Hunter, J.: Evaluating the
application of semantic inferencing rules to
image annotation. In: International Conference
on Knowledge Capture (K-CAP 2005), Banff,
pp. 91–98 (2005)
41. Petridis, K., Anastasopoulos, D., Saathoff, C.,
Timmermann, N., Kompatsiaris, Y., Staab, S.:
M-OntoMat-Annotizer: image annotation,
linking ontologies and multimedia low-level fea-
tures. In: Proceedings of the 10th International
Conference on Knowledge-Based Intelligent
Information and Engineering Systems, Part 3
(KES 2006), Bournemouth. Lecture Notes in
Computer Science, vol. 4253, pp. 633–640.
Springer, Berlin (2006)
42. Maillot, N., Thonnat, M., Boucher, A.: Towards
ontology based cognitive vision. In: Proceed-
ings of the International Conference on Com-
puter Vision Systems (ICVS 2003), Graz,
pp. 44–53 (2003)
43. Hudelot, C., Maillot, N., Thonnat, M.: Symbol
grounding for semantic image interpretation:
from image data to semantics. In: Proceedings
of 10th International Conference on Computer
Vision Workshops (ICCV 2005), Beijing (2005)
44. Maillot, N., Thonnat, M.: A weakly supervised
approach for semantic image indexing and
retrieval. In: Proceedings of the Fourth Interna-
tional Conference on Image and Video Retrieval
(CIVR 2005), Singapore. Lecture Notes in Com-
puter Science, vol. 3568, pp. 629–638. Springer,
Berlin (2005)
45. Dasiopoulou, S., Mezaris, V., Kompatsiaris, I.,
Papastathis, V., Strintzis, M.: Knowledge-assisted
semantic video object detection. IEEE Trans.
Circuits Syst. Video Technol. 15(10), 1210–1224
(2005)
46. Neumann, B., Weiss, T.: Navigating through logic-
based scene models for high-level scene interpreta-
tions. In: Proceedings of the Third International
Conference on Computer Vision Systems (ICVS
2003), Graz. Lecture Notes in Computer Science,
vol. 2626, pp. 212–222. Springer, Berlin (2003)
47. Moller, R., Neumann, B., Wessel, M.: Towards
computer vision with description logics: some
recent progress. In: Workshop on Integration
of Speech and Image Understanding, Corfu,
pp. 101–115 (1999)
48. Neumann, B., Moller, R.: On scene interpretation
with description logics. Image Vis. Comput.
26(1), 247–275 (2008). http://dx.doi.org/
10.1016/j.imavis.2007.08.013
49. Hotz, L., Neumann, B., Terzic, K.: High-level
expectations for low-level image processing. In:
Proceedings of the 31st Annual German
Conference on AI (KI 2008), Kaiserslautern,
pp. 87–94 (2008)
50. Neumann, B.: Bayesian compositional hierar-
chies – a probabilistic structure for scene inter-
pretation. Technical report FBI-HH-B-282/08.
Department of Informatics, Hamburg University
(2008)
51. Peraldi, I.E., Kaya, A., Melzer, S., Moller, R.,
Wessel, M.: Towards a media interpretation
framework for the semantic web. In: Proceedings
of the International Conference on Web Intelli-
gence (WI 2007), Silicon Valley, pp. 374–380
(2007)
52. Dubois, D., Prade, H.: Possibility theory, proba-
bility theory and multiple-valued logics: a clarifi-
cation. Ann. Math. Artif. Intell. 32(1–4), 35–66
(2001)
53. Sciascio, E.D., Donini, F.: Description logics for
image recognition: a preliminary proposal. In:
International Workshop on Description Logics
(DL 1999), Linkoping (1999)
54. Sciascio, E.D., Donini, F., Mongiello, M.: Struc-
tured knowledge representation for image
retrieval. J. Artif. Intell. Res. 16, 209–257 (2002)
55. Dasiopoulou, S., Kompatsiaris, I., Strintzis, M.:
Using fuzzy DLs to enhance semantic image
analysis. In: Proceedings of Third International
Conference on Semantic and Digital Media Tech-
nologies (SAMT 2008), Koblenz, pp. 31–46 (2008)
56. Dasiopoulou, S., Kompatsiaris, I., Strintzis, M.:
Applying fuzzy DLs in the extraction of image
semantics. J. Data Semant. 14, 105–132 (2009)
57. Simou, N., Athanasiadis, T., Stoilos, G., Kollias,
S.: Image indexing and retrieval using expressive
fuzzy description logics. Signal Image Video
Process. 2(4), 321–335 (2008)
58. Hudelot, C., Atif, J., Bloch, I.: Fuzzy spatial rela-
tion ontology for image interpretation. Fuzzy Sets
Syst. 159(15), 1929–1951 (2008)
59. Straccia, U.: Reasoning within fuzzy description
logics. J. Artif. Intell. Res. 14, 137–166 (2001)
60. Stoilos, G., Stamou, G., Tzouvaras, V., Pan, J.,
Horrocks, I.: The fuzzy description logic
f-SHIN. In: Proceedings of the International
Workshop on Uncertainty Reasoning for the
Semantic Web (URSW 2005), Galway, pp. 67–76
(2005)
61. Ding, Z.: Bayesowl: a probabilistic framework for
semantic web. Ph.D. thesis, University of Mary-
land, Baltimore County (2005)
62. da Costa, P., Laskey, K.B., Laskey, K.J.: PR-OWL: a
Bayesian ontology language for the semantic web.
In: Proceedings of the Fourth International
Workshop on Uncertainty Reasoning for the
Semantic Web (URSW 2008), Karlsruhe.
Lecture Notes in Computer Science, vol. 5327,
pp. 88–107. Springer, Berlin (2008)
63. Town, C.: Ontological inference for image and
video analysis. Mach. Vis. Appl. 17(2), 94–115
(2006)
64. Tran, S., Davis, L.: Event modeling and recogni-
tion using Markov logic networks. In: Proceed-
ings of the 10th European Conference on
Computer Vision, Part II (ECCV 2008),
Marseille, pp. 610–623 (2008)
65. Richardson, M., Domingos, P.: Markov logic net-
works. Mach. Learn. 62(1–2), 107–136 (2006)
66. Francois, A., Nevatia, R., Hobbs, J., Bolles, R.:
Verl: an ontology framework for representing
and annotating video events. IEEE Multimed.
12(4), 76–86 (2005)
67. Dasiopoulou, S., Kompatsiaris, I.: Trends and
issues in description logics frameworks for
image interpretation. In: Proceedings of the
Sixth Hellenic Conference on Artificial Intelli-
gence: Theories, Models and Applications
(SETN 2010), Athens, pp. 61–70 (2010)
68. Schopman, B., Brickley, D., Aroyo, L., van Aart,
C., Buser, V., Siebes, R., Nixon, L., Miller, L.,
Malaise, V., Minno, M., Mostarda, M., Palmisano,
D., Raimond, Y.: NoTube: making the web part
of personalised TV. In: Proceedings of the
WebSci10: Extending the Frontiers of Society
Online, Raleigh (2010)
69. Taylor, A.: Introduction to Cataloging and Clas-
sification. Library and Information Science
Text Series. Libraries Unlimited, Santa Barbara
(2006)
70. Sowa, J.: Knowledge Representation. Logical,
Philosophical, and Computational Foundations.
Brooks/Cole, Pacific Grove (2000)
71. Doerr, M.: The CIDOC CRM – an ontological
approach to semantic interoperability of meta-
data. AI Mag. 24(3), 75–92 (2003)
72. Ruotsalo, T., Hyvonen, E.: An event-based
method for making heterogeneous metadata
schemas and annotations semantically interoper-
able. In: Proceedings of the Sixth International
Semantic Web Conference, Second Asian Seman-
tic Web Conference (ISWC 2007 + ASWC 2007),
Busan. Lecture Notes in Computer Science,
vol. 4825, pp. 409–422. Springer, Berlin (2007)
73. Borgo, S., Masolo, C.: Foundational choices in
DOLCE. In: Staab, S., Studer, R. (eds.) Handbook
on Ontologies. International Handbooks on
Information Systems, 2nd edn., pp. 403–421.
Springer, Dordrecht (2009)
74. Fellbaum, C. (ed.): WordNet. An Electronic
Lexical Database. MIT Press, Cambridge (2001)
75. Yano, K., Nakaya, T., Isoda, Y., Takase, Y.,
Kawasumi, T., Matsuoka, K., Seto, T.,
Kawahara, D., Tsukamoto, A., Inoue, M.,
Kirimura, T.: Virtual Kyoto: 4DGIS comprising
spatial and temporal dimensions. J. Geogr.
117(2), 464–476 (2008)
76. Kauppinen, T., Vaatainen, J., Hyvonen, E.: Creat-
ing and using geospatial ontology time series in a
semantic cultural heritage portal. In: Proceedings
of the Fifth European Semantic Web Conference
(ESWC 2008), Tenerife. Lecture Notes in Com-
puter Science, vol. 5021, pp. 110–123. Springer,
Berlin (2008)
77. Nagypal, G., Motik, B.: A fuzzy model for
representing uncertain, subjective, and vague
temporal knowledge in ontologies. In: On the
Move to Meaningful Internet Systems 2003:
CoopIS, DOA, and ODBASE – OTM Confeder-
ated International Conferences (CoopIS, DOA,
and ODBASE 2003), Catania, pp. 906–923 (2003)
78. Kauppinen, T., Mantegari, G., Paakkarinen, P.,
Kuittinen, H., Hyvonen, E., Bandini, S.: Deter-
mining relevance of imprecise temporal intervals
for cultural heritage information retrieval. Int. J.
Hum. Comput. Stud. 68(9), 549–560 (2010)
79. Allen, J.F.: Maintaining knowledge about tempo-
ral intervals. Commun. ACM 26(11), 832–843
(1983)
80. Hyvonen, E., Makela, E., Salminen, M., Valo, A.,
Viljanen, K., Saarela, S., Junnila, M., Kettula, S.:
MuseumFinland – Finnish museums on the
semantic web. J. Web Semant. 3(2), 224–241
(2005)
81. McCarty, W.: Humanities Computing. Palgrave
Macmillan, Basingstoke (2005)
82. Sheth, A., Aleman-Meza, B., Arpinar, I.B.,
Bertram, C., Warke, Y., Ramakrishnan, C.,
Halaschek, C., Anyanwu, K., Avant, D., Arpinar,
F.S., Kochut, K.: Semantic association iden-
tification and knowledge discovery for national
security applications. J. Database Manag. 16(1),
33–53 (2005)
83. Hyvonen, E., Makela, E., Kauppinen, T., Alm, O.,
Kurki, J., Ruotsalo, T., Seppala, K., Takala, J.,
Puputti, K., Kuittinen, H., Viljanen, K.,
Tuominen, J., Palonen, T., Frosterus, M., Sinkkila,
R., Paakkarinen, P., Laitio, J., Nyberg, K.:
CultureSampo – Finnish culture on the semantic
web 2.0. In: Proceedings of the Museums and the
Web (MW 2009), Indianapolis (2009)
84. Schreiber, G., Amin, A., Aroyo, L., van Assem, M.,
de Boer, V., Hardman, L., Hildebrand, M.,
Omelayenko, B., van Ossenbruggen, J.,
Tordai, A., Wielemaker, J., Wielinga, B.J.:
Semantic annotation and search of cultural-
heritage collections: The MultimediaN E-Culture
demonstrator. J. Web Semant. 6(4), 243–249
(2008)
85. Junnila, M., Hyvonen, E., Salminen, M.: Describ-
ing and linking cultural semantic content by
using situations and actions. In: Robering, K.
(ed.) Information Technology for the Virtual
Museum. LIT Verlag, Berlin (2008)
86. Byrne, K.: Populating the Semantic Web – Com-
bining text and relational databases as RDF
graphs. Ph.D. thesis, University of Edinburgh,
Supp. 32 (2009)
87. Hyvonen, E.: Semantic portals for cultural heri-
tage. In: Staab, S., Studer, R. (eds.) Handbook on
Ontologies, 2nd edn., pp. 757–778. Springer,
Dordrecht (2009)
88. van Hage, W.R., Stash, N., Wang, Y., Aroyo, L.:
Finding your way through the Rijksmuseum with
an adaptive mobile museum guide. In: The
Semantic Web: Research and Applications, Sev-
enth Extended Semantic Web Conference (ESWC
2010), Proceedings, Part I, Heraklion. Lecture
Notes in Computer Science, vol. 6088,
pp. 46–59. Springer, Berlin (2010)
89. Becker, C., Bizer, C.: DBpedia mobile: a
location-enabled linked data browser. In: Proceed-
ings of the First Workshop about Linked Data on
the Web (LDOW 2008), Beijing (2008);
Kobilarov, G., Scott, T., Raimond, Y., Oliver, S.,
Sizemore, C., Smethurst, M., Bizer, C., Lee, R.:
Media meets semantic web – how the BBC uses
DBpedia and linked data to make connections.
In: The Semantic Web: Research and Applica-
tions, Proceedings of the Seventh Extended
Semantic Web Conference (ESWC 2010), Part I,
Heraklion. Lecture Notes in Computer Science,
vol. 6088, pp. 723–737. Springer, Berlin (2010)
90. Euzenat, J., Shvaiko, P.: Ontology Matching.
Springer, Berlin (2007)
91. Hyvonen, E., Viljanen, K., Tuominen, J.,
Seppala, K.: Building a national semantic web
ontology and ontology service infrastructure –
the FinnONTO approach. In: Proceedings of the
Fifth European Semantic Web Conference
(ESWC 2008), Tenerife. Lecture Notes in Com-
puter Science, vol. 5021, pp. 95–109. Springer,
Berlin (2008)
92. Rui, Y., Huang, T., Chang, S.: Image retrieval:
current techniques, promising directions and
open issues. J. Vis. Commun. Image Represent.
10(1), 39–62 (1999)
93. Lew, M.S., Sebe, N., Djeraba, C., Jain, R.: Content-
based multimedia information retrieval: state of the
art and challenges. ACM Trans. Multimed. Comput.
Commun. Appl. 2, 1–19 (2006)
94. Hollink, L.: Semantic annotation for retrieval
of visual resources. Ph.D. thesis, Free University
of Amsterdam. SIKS Dissertation Series,
No. 2006-24 (2006)
95. Pollitt, A.S.: The key role of classification and
indexing in view-based searching. Technical
Report. University of Huddersfield, UK. http://
www.ifla.org/IV/ifla63/63polst.pdf (1998)
96. Hearst, M., Elliott, A., English, J., Sinha, R.,
Swearingen, K., Lee, K.-P.: Finding the flow in Web
site search. Commun. ACM 45(9), 42–49 (2002)
97. Hyvonen, E., Saarela, S., Viljanen, K.: Application
of ontology techniques to view-based semantic
search and browsing. In: The Semantic Web:
Research and Applications. Proceedings of the
First European Semantic Web Symposium
(ESWS 2004), Heraklion. Lecture Notes in Com-
puter Science, vol. 3053, pp. 92–106. Springer,
Berlin (2004)
98. Sacco, G.M.: Dynamic taxonomies: guided inter-
active diagnostic assistance. In: Wickramasinghe,
N. (ed.) Encyclopedia of Healthcare Information
Systems. Idea Group, Hershey (2005)
99. Hildebrand, M., van Ossenbruggen, J., Hardman,
L.: /facet: a browser for heterogeneous Semantic
Web repositories. In: Proceedings of the Fifth
International Semantic Web Conference (ISWC
2006), Athens, GA. Lecture Notes in Computer
Science, vol. 4273, pp. 272–285. Springer, Berlin
(2006)
100. Holi, M.: Crisp, fuzzy, and probabilistic faceted
semantic search. Dissertation, School of Science
and Technology, Aalto University, Espoo (2010)
101. English, J., Hearst, M., Sinha, R., Swearingen, K.,
Lee, K.-P.: Flexible search and navigation using
faceted metadata. Technical report. School of
Information Management and Systems, University
of California, Berkeley (2003)
102. Hyvonen, E., Makela, E.: Semantic
autocompletion. In: Proceedings of the First
Asian Semantic Web Conference (ASWC
2006), Beijing. Lecture Notes in Computer
Science, vol. 4185, pp. 739–751. Springer,
Heidelberg (2006)
103. Hildebrand, M., van Ossenbruggen, J.R.: Con-
figuring semantic web interfaces by data map-
ping. In: Proceedings of the Workshop on:
Visual Interfaces to the Social and the Semantic
Web (VISSW 2009), Sanibel Island (2009)
104. Burke, R.: Knowledge-based recommender sys-
tems. In: Kent, A. (ed.) Encyclopedia of Library
and Information Systems, vol. 69. Marcel
Dekker, New York (2000)
105. Geroimenko, V., Chen, C. (eds.): Visualizing the
Semantic Web: XML-Based Internet and Infor-
mation Visualization. Springer, Berlin (2002)
106. Kauppinen, T., Henriksson, R., Vaatainen, J.,
Deichstetter, C., Hyvonen, E.: Ontology based
modeling and visualization of cultural spatio-
temporal knowledge. In: Semantic Web at
Work – Proceedings of STeP 2006, Finnish AI
Society, Espoo (2006)
107. Adomavicius, G., Tuzhilin, A.: Toward the next
generation of recommender systems: a survey of the
state-of-the-art and possible extensions. IEEE
Trans. Knowl. Data Eng. 17(6), 734–749 (2005)
108. Viljanen, K., Kansala, T., Hyvonen, E., Makela,
E.: ONTODELLA – a projection and linking
service for Semantic Web applications. In: Pro-
ceedings of the 17th International Conference
on Database and Expert Systems Applications
(DEXA 2006), Krakow (2006)
109. Ruotsalo, T., Makela, E., Kauppinen, T.,
Hyvonen, E., Haav, K., Rantala, V., Frosterus, M.,
Dokoohaki, N., Matskin, M.: Smartmuseum:
personalized context-aware access to digital
cultural heritage. In: Proceedings of the Inter-
national Conferences on Digital Libraries and
the Semantic Web 2009 (ICSD 2009), Trento
(2009)
110. Wang, Y., Stash, N., Aroyo, L., Gorgels, P.,
Rutledge, L., Schreiber, G.: Recommendations
based on semantically-enriched museum collec-
tion. J. Web Semant. 6(4), 283–290 (2008)
111. Spagnuolo, M., Falcidieno, B.: 3D media and the
semantic web. IEEE Intell. Syst. 24(2), 90–96
(2009)
112. State of the art report on 3D content in
archaeology and cultural heritage. FOCUS K3D
deliverable 2.4.1 (2009)
113. Hauswirth, M., Decker, S.: Semantic reality –
Connecting the real and the virtual world. In:
Position Paper at Microsoft SemGrail Work-
shop, Redmond (2007)
114. Dasiopoulou, S., Tzouvaras, V., Kompatsiaris, I.,
Strintzis, M.G.: Enquiring MPEG-7 based ontol-
ogies. Multimed. Tools Appl. 46(2), 331–370
(2010)
115. Staab, S., Studer, R. (eds.): Handbook on
Ontologies. Springer, Dordrecht (2004)
116. Baader, F., Calvanese, D., McGuinness, D.L.,
Nardi, D., Patel-Schneider, P.F.: The Description
Logic Handbook: Theory, Implementation, and
Applications. Cambridge University Press,
Cambridge (2003)
117. Bertini, M., Del Bimbo, A., Serra, G., Torniai, C.,
Cucchiara, R., Grana, C., Vezzani, R.: Dynamic
pictorially enriched ontologies for digital video
libraries. IEEE Multimed. 16(2), 42–51 (2009)
118. Shet, V.D., Neumann, J., Ramesh, V., Davis, L.S.:
Bilattice-based logical reasoning for human
detection. In: Proceedings of IEEE Conference
on Computer Vision and Pattern Recognition
(CVPR 2007), Minneapolis, pp. 1–8 (2007)
119. Patkos, T., Chrysakis, I., Bikakis, A., Plexousakis,
D., Antoniou, G.: A reasoning framework for
ambient intelligence. In: Proceedings of the
Sixth Hellenic Conference on Artificial Intelli-
gence: Theories, Models and Applications
(SETN 2010), Athens, pp. 213–222 (2010)
120. Kowalski, R.A., Sergot, M.J.: A logic-based cal-
culus of events. In: Foundations of Knowledge
Base Management, pp. 23–55. Springer, Berlin
(1985). ISBN 3-540-18987-4
121. Biancalana, C., Micarelli, A., Squarcella, C.:
Nereau: a social approach to query expan-
sion. In: Proceedings of the 10th ACM Interna-
tional Workshop on Web Information and Data
Management (WIDM 2008), Napa Valley,
pp. 95–102. ACM, New York (2008)
122. Brusilovsky, P., Maybury, M.T.: From adaptive
hypermedia to the adaptive web. Commun.
ACM 45(5), 30–33 (2002)
123. Carmagnola, F., Cena, F., Gena, C., Torre, I.: A
semantic framework for adaptive web-based sys-
tems. In: Bouquet, P., Tummarello, G. (eds.)
Semantic Web Applications and Perspectives
(SWAP 2005), Proceedings of the Second Italian
Semantic Web Workshop, Trento. CEUR Work-
shop Proceedings, vol. 166. CEUR-WS.org
(2005)
124. Ginsberg, M.L.: Multivalued logics: a uniform
approach to reasoning in artificial intelligence.
Comput. Intell. 4, 265–316 (1988)