Nov. 17, 2004 © Artem Chebotko, 2004 1

OntoELAN: An Ontology-Based Linguistic Multimedia Annotator

Speaker: Artem Chebotko ([email protected])

Department of Computer Science, Wayne State University

Coauthors

From left (all at Wayne State University):
  Ms. Yu Deng, graduated with an M.S. in Computer Science in 2004
  Prof. Shiyong Lu, Computer Science, my advisor
  Prof. Farshad Fotouhi, Computer Science, Chair of the department
  Prof. Anthony Aristar, Dept. of English, Linguistics Program

Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands:
  Hennie Brugman, Alexander Klassmann, Han Sloetjes, Albert Russel, Peter Wittenburg

Acknowledgements: Laura Buszard-Welcher and Andrea Berez, Dept. of English, Linguistics Program, WSU.

Outline of the Talk

Background and Motivation
The Limitations of Existing Tools
Our Approach and Advantages
An Overview of OntoELAN
Demo

Background and Motivation: Linguistics

Many languages are in serious danger of being lost

In fact, half of the world's approximately 6,500 languages may disappear in the next 100 years

Language data is critical to research in linguistics, anthropology, history, sociology, political science, and other fields.

Language data is also important for the community of that language.

Background and Motivation: Multimedia

Much language data is collected as audio and video recordings

Such recordings are difficult to index and retrieve because multimedia data is unstructured and its semantics are implicit in its content

Annotation of multimedia data provides an opportunity to make the semantics explicit

Background and Motivation: Ontology-based annotation

An ontology is an explicit specification of a shared conceptualization. It formalizes the knowledge of various concepts and their relationships in a particular domain.

Ontology-based annotation uses ontological terms, whose meaning is known and understood by the domain community.

Requirements for a Linguistic Multimedia Annotator

Support for the annotation of descriptive metadata such as title, authors, date, time, etc.
Support for a time axis and temporal segmentation of clips into slots
Support for multiple-tier annotation, with each tier providing one avenue for annotation
Support for ontology-based annotation to avoid incompatible formats and vocabularies

The Limitations of Existing Tools

Existing tools either do not support ontologies (IBM MPEG-7 Annotation Tool, ELAN) or provide limited support for multimedia (Protégé, ImageSpace, IBM MPEG-7 Annotation Tool).

Tool          Descriptive annotation   Temporal segmentation   Multi-tier annotation   Ontology support
Protégé       Yes                      No                      No                      Yes
IBM MPEG-7    Yes                      No                      No                      No
ImageSpace    Yes                      No                      No                      Yes
ELAN          Yes                      Yes                     Yes                     No

Our Approach and Advantages

We developed an ontology-based annotation tool, OntoELAN, for linguistic multimedia data that satisfies all the above requirements
The ontological approach eliminates multiple incompatible annotation formats
  if the whole community can agree upon one domain ontology
Annotations are formally defined and machine interpretable
  Deduction of additional, implicit information
  Search is precise and easier
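
The point about deduction and precise search can be illustrated with a minimal Python sketch. It assumes a tiny taxonomy built from the PartOfSpeech terms that the talk lists for GOLD; the dictionary layout, the function names `is_a` and `search`, and the sample annotations are hypothetical and are not taken from OntoELAN itself.

```python
# Hypothetical miniature taxonomy: child term -> parent term.
SUBCLASS_OF = {
    "Noun": "PartOfSpeech",
    "Verb": "PartOfSpeech",
    "Participle": "PartOfSpeech",
    "Preverb": "PartOfSpeech",
}


def is_a(term, ancestor):
    """Return True if term equals ancestor or is a transitive subclass of it."""
    while term is not None:
        if term == ancestor:
            return True
        term = SUBCLASS_OF.get(term)  # None once the top of the taxonomy is passed
    return False


def search(annotations, concept):
    """Return the annotated values whose ontological term falls under concept."""
    return [value for value, term in annotations if is_a(term, concept)]


# Made-up annotations: (annotated text, ontological term).
annotations = [("dog", "Noun"), ("runs", "Verb"), ("-ed", "Tense")]

part_of_speech_hits = search(annotations, "PartOfSpeech")
noun_hits = search(annotations, "Noun")
```

A query for `PartOfSpeech` also returns values annotated as `Noun` or `Verb`, even though no annotation names `PartOfSpeech` directly. A plain string match on free-text labels could not deduce this.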

An Overview of OntoELAN

Developed on top of the ELAN annotator
  Max Planck Institute for Psycholinguistics team
Features inherited from ELAN
  display of speech and/or video signals, together with their annotations;
  time linking of annotations to media streams;
  linking of annotations to other annotations;
  unlimited number of annotation tiers as defined by a user;
  different character sets;
  basic search facilities.

An Overview of OntoELAN

Ontology support
  Wayne State University team
New features
  language profile creation;
  ontology-based annotation;
  storing annotations in an XML format based on the General Multimedia Ontology and domain ontologies.

Linguistic Domain Ontology

One example is the General Ontology for Linguistic Description (GOLD)
  Developed at the University of Arizona
Expressions
  OrthographicExpression, Utterance, SignedExpression, Word, WordPart
Grammar
  Tense, Number, Agreement, PartOfSpeech
  PartOfSpeech: Noun, Verb, Participle, Preverb
Data structures
  A lexical entry, a phoneme table, and a syntactic tree
Metaconcepts
  Language itself

General Multimedia Ontology

Simple semantic framework for multimedia annotation
Developed at Wayne State University especially for OntoELAN
Concepts
  AnnotationDocument
  Tier
  TimeSlot
  Annotation
  AlignableAnnotation
  ReferringAnnotation
  AnnotationValue
  StringAnnotation
  OntologyAnnotation
  etc.
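
As a rough illustration of how these concepts might fit together, here is a minimal Python sketch. The class names come from the slide. Everything else is an assumption made for this example and is not taken from the actual ontology: the fields, the millisecond time unit, the split of annotation values into string and ontology variants, and the helper `time_span`.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TimeSlot:
    time_ms: int  # position on the media time axis (milliseconds assumed)


@dataclass
class StringAnnotation:
    text: str  # free-text annotation value


@dataclass
class OntologyAnnotation:
    term: str  # name of an ontological term, e.g. "Noun"


# Assumption: an annotation value is either free text or an ontological term.
AnnotationValue = Union[StringAnnotation, OntologyAnnotation]


@dataclass
class AlignableAnnotation:
    value: AnnotationValue
    begin: TimeSlot
    end: TimeSlot


@dataclass
class ReferringAnnotation:
    value: AnnotationValue
    refers_to: Union[AlignableAnnotation, "ReferringAnnotation"]


@dataclass
class Tier:
    name: str
    annotations: list = field(default_factory=list)


@dataclass
class AnnotationDocument:
    media_file: str
    tiers: List[Tier] = field(default_factory=list)


def time_span(annotation):
    """Follow references until an alignable annotation supplies the interval."""
    while isinstance(annotation, ReferringAnnotation):
        annotation = annotation.refers_to
    return annotation.begin.time_ms, annotation.end.time_ms


# Made-up example: a time-aligned word and an ontological label that refers to it.
word = AlignableAnnotation(StringAnnotation("dog"), TimeSlot(0), TimeSlot(400))
label = ReferringAnnotation(OntologyAnnotation("Noun"), word)
doc = AnnotationDocument("clip.mpg", [Tier("words", [word]), Tier("pos", [label])])
```

In this sketch a referring annotation carries no time slots of its own. It inherits its interval from the annotation it points to, which is one plausible reading of the alignable/referring distinction.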

Language Profile

A language profile is a subset of ontological terms, possibly renamed, that are used in the annotation of a particular multimedia resource. It contains:
  ontological terms
  user-defined terms
  a mapping between ontological terms and user-defined terms
  a reference to an ontology
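
The four ingredients listed above can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class `LanguageProfile`, its methods, and the sample user-defined terms are hypothetical, and the mapping allows several ontological terms per user-defined term only because the talk mentions combining terms. The ontology URL is the GOLD address given in the talk's references.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageProfile:
    ontology_uri: str  # reference to an ontology
    # user-defined term -> one or more ontological terms
    mapping: Dict[str, List[str]] = field(default_factory=dict)

    def define(self, user_term, *ontological_terms):
        """Register a user-defined term for one or more ontological terms."""
        self.mapping[user_term] = list(ontological_terms)

    def resolve(self, user_term):
        """Return the ontological terms behind a user-defined term."""
        return self.mapping[user_term]

    def ontological_terms(self):
        """Return the subset of the ontology that this profile actually uses."""
        return sorted({t for terms in self.mapping.values() for t in terms})


# Made-up profile: an abbreviation and a combined term.
profile = LanguageProfile("http://www.emeld.org/gold")
profile.define("N", "Noun")            # abbreviation of one term
profile.define("V+T", "Verb", "Tense")  # combines two ontological terms
```

Here `profile.resolve("V+T")` yields both underlying terms, so an annotation made with the short label remains interpretable against the shared ontology.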

Language Profile

Advantages
  Only a subset of ontological terms is useful for annotating a particular resource
  Ontological terms can be renamed, e.g. to use another language, an abbreviation, or a synonym
  The meaning of two or more ontological terms can be combined in one user-defined term
Disadvantage
  More work

Annotation Tiers and Linguistic Types

Annotation tiers
  contain annotation values
  can be either alignable or referring
  are associated with their linguistic types
Linguistic types
  None
  Time Subdivision
  Symbolic Subdivision
  Symbolic Association
Ontological tier
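
The relation between linguistic types and the alignable/referring distinction can be sketched in Python. The slide does not spell out which type yields which kind of tier. The grouping below is an assumption based on how these type names are commonly described for ELAN (None and Time Subdivision link to the time axis, the symbolic types do not), and `LinguisticType`, `ALIGNABLE`, and `tier_kind` are hypothetical names.

```python
from enum import Enum


class LinguisticType(Enum):
    NONE = "None"
    TIME_SUBDIVISION = "Time Subdivision"
    SYMBOLIC_SUBDIVISION = "Symbolic Subdivision"
    SYMBOLIC_ASSOCIATION = "Symbolic Association"


# Assumption: types whose annotations are linked directly to the time axis.
ALIGNABLE = {LinguisticType.NONE, LinguisticType.TIME_SUBDIVISION}


def tier_kind(linguistic_type):
    """Classify a tier as 'alignable' or 'referring' from its linguistic type."""
    return "alignable" if linguistic_type in ALIGNABLE else "referring"
```

For instance, a part-of-speech tier attached one-to-one to a word tier would use `LinguisticType.SYMBOLIC_ASSOCIATION` and therefore be treated as referring in this sketch.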

Linguistic Multimedia Annotation with OntoELAN

Language profile creation
Creation of tiers
Creation of annotations

Demos

Language profile creation
  profile01.swf, profile01.AVI
  profile02.swf, profile02.AVI
Creation of tiers and creation of annotations
  annotate01.swf, annotate01.AVI
  annotate02.swf, annotate02.AVI

Conclusions and Future Work

OntoELAN is the first attempt at annotating linguistic multimedia data with a linguistic ontology
Future work
  provide more channels for sharing data on the Web, such as the multimedia descriptions, the language words, etc.
  improve the current searching system
  integrate text document annotation

References

Artem Chebotko, Yu Deng, Shiyong Lu and Farshad Fotouhi. An Ontology-based Multimedia Annotator for the Semantic Web of Language Engineering. International Journal on Semantic Web and Information Systems, January, 2005.

Artem Chebotko et al. OntoELAN: An Ontology-based Linguistic Multimedia Annotator. Proc. of the IEEE Sixth International Symposium on Multimedia Software Engineering (IEEE-MSE'2004), Miami, FL, USA, December, 2004.

OntoELAN
  http://www.cs.wayne.edu/~yudeng/projects.htm
LangDL: A Digital Library For Language Engineering And Research
  http://database.cs.wayne.edu/proj/langdl/index.html
ELAN
  http://www.mpi.nl/tools/elan.html
E-MELD
  http://www.emeld.org
GOLD
  http://www.emeld.org/gold
General Multimedia Ontology
  http://database.cs.wayne.edu/proj/OntoELAN/multimedia.owl

Questions?

Contact information Artem Chebotko [email protected] 313-577-6711

