Nov. 17, 2004 © Artem Chebotko, 2004 1
OntoELAN: An Ontology-Based Linguistic Multimedia
Annotator
Speaker: Artem Chebotko ([email protected])
Department of Computer Science, Wayne State University
Coauthors
From left:
  Ms. Yu Deng, graduated with an M.S. in Computer Science in 2004
  Prof. Shiyong Lu, Computer Science, my advisor
  Prof. Farshad Fotouhi, Computer Science, chair of the department
  Prof. Anthony Aristar, Dept. of English, Linguistics Program
All at Wayne State University.
Hennie Brugman, Alexander Klassmann, Han Sloetjes, Albert Russel, Peter Wittenburg, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
Acknowledgements: Laura Buszard-Welcher and Andrea Berez, Dept. of English, Linguistics Program, WSU.
The Outline of the Talk
Background and Motivation
The Limitations of Existing Tools
Our Approach and Advantages
An Overview of OntoELAN
Demo
Background and Motivation: Linguistics
Many languages are in serious danger of being lost
In fact, half of the world's approximately 6,500 languages may disappear in the next 100 years
Language data is critical to research in linguistics, anthropology, history, sociology, political science, and other fields.
Language data is also important for the community of that language.
Background and Motivation: Multimedia
Much language data is collected as audio and video recordings
Multimedia data is difficult to index and retrieve because it is unstructured and its semantics are implicit in its content
Annotation of multimedia data provides an opportunity to make the semantics explicit
Background and Motivation: Ontology-based annotation
An ontology is an explicit specification of a shared conceptualization. It formalizes the knowledge of various concepts and their relationships in a particular domain
Ontology-based annotation uses ontological terms whose meaning is known and understood by the domain community
Requirements for a Linguistic Multimedia Annotator
Support for the annotation of descriptive metadata, such as title, authors, date, time, etc.
Support for a time axis and temporal segmentation of clips into slots
Support for multiple-tier annotation, with each tier providing one avenue for annotation
Support for ontology-based annotation to avoid incompatible formats and vocabularies
The Limitations of Existing Tools
Either don’t support ontology (IBM MPEG-7 Annotation Tool, ELAN)
or provide limited support of multimedia (Protégé, ImageSpace, IBM MPEG-7 Annotation Tool)

Tool         Descriptive annotation   Temporal segmentation   Multi-tier annotation   Ontology support
Protégé      Yes                      No                      No                      Yes
IBM MPEG-7   Yes                      No                      No                      No
ImageSpace   Yes                      No                      No                      Yes
ELAN         Yes                      Yes                     Yes                     No
Our Approach and Advantages
We developed an ontology-based annotation tool, OntoELAN, for linguistic multimedia data that satisfies all the above requirements
The ontological approach eliminates multiple incompatible annotation formats
  if the whole community can agree upon one domain ontology
Annotations are formally defined and machine interpretable
  Deduction of additional, implicit information
  Search is precise and easier
An Overview of OntoELAN
Developed on top of the ELAN annotator
  Max Planck Institute for Psycholinguistics team
Features inherited from ELAN
  display of speech and/or video signals, together with their annotations;
  time linking of annotations to media streams;
  linking of annotations to other annotations;
  unlimited number of annotation tiers as defined by a user;
  different character sets;
  basic search facilities.
An Overview of OntoELAN
Ontology support
  Wayne State University team
New features
  language profile creation;
  ontology-based annotation;
  storing annotations in the XML format based on the General Multimedia Ontology and domain ontologies.
Linguistic Domain Ontology
One example is the General Ontology for Linguistic Description (GOLD)
  Developed at the University of Arizona
Expressions
  OrthographicExpression, Utterance, SignedExpression, Word, WordPart
Grammar
  Tense, Number, Agreement, PartOfSpeech
  PartOfSpeech: Noun, Verb, Participle, Preverb
Data structures
  A lexical entry, a phoneme table and a syntactic tree
Metaconcepts
  Language itself
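
One way to see why ontology-based annotation makes search easier is to treat the groupings on this slide as a small concept hierarchy. The Python sketch below is only an illustration: the term names come from the slide, but reading each grouping as a parent link is an assumption, and `PARENT`, `ancestors`, `is_a` and `search` are hypothetical names, not GOLD's actual structure or any OntoELAN API.

```python
# Hypothetical parent links, read off the groupings on the slide.
# The real GOLD ontology is richer; this is not its actual class structure.
PARENT = {
    "OrthographicExpression": "Expressions",
    "Utterance": "Expressions",
    "SignedExpression": "Expressions",
    "Word": "Expressions",
    "WordPart": "Expressions",
    "Tense": "Grammar",
    "Number": "Grammar",
    "Agreement": "Grammar",
    "PartOfSpeech": "Grammar",
    "Noun": "PartOfSpeech",
    "Verb": "PartOfSpeech",
    "Participle": "PartOfSpeech",
    "Preverb": "PartOfSpeech",
}


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, category):
    """True if `term` is `category` itself or falls under it."""
    return term == category or category in ancestors(term)


def search(annotations, category):
    """Return the annotated texts whose term falls under `category`."""
    return [text for text, term in annotations if is_a(term, category)]


# Made-up sample annotations: (annotated text, ontological term).
annotations = [("dog", "Noun"), ("runs", "Verb"), ("past", "Tense")]

print(ancestors("Noun"))                    # ['PartOfSpeech', 'Grammar']
print(search(annotations, "PartOfSpeech"))  # ['dog', 'runs']
```

A query for PartOfSpeech finds the Noun and Verb annotations even though neither was labelled PartOfSpeech directly, which is the kind of implicit information a shared ontology lets a tool deduce.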
General Multimedia Ontology
Simple semantic framework for multimedia annotation
Developed at Wayne State University especially for OntoELAN
  AnnotationDocument
  Tier
  TimeSlot
  Annotation
  AlignableAnnotation
  ReferringAnnotation
  AnnotationValue
  StringAnnotation
  OntologyAnnotation
  etc.
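
To make the concept names above more concrete, here is a minimal Python sketch of how they might fit together as a data model. Only the class names come from the slide; every field, the millisecond unit, the file name, and the `time_span` helper are assumptions made for illustration, and the actual ontology (an OWL file) may relate these concepts differently.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TimeSlot:
    time_ms: int  # offset into the media stream; milliseconds is an assumed unit


@dataclass
class StringAnnotation:
    text: str  # free-text annotation value


@dataclass
class OntologyAnnotation:
    term: str  # name of an ontological term, e.g. "Noun"


# An annotation value is either free text or an ontological term (assumed).
AnnotationValue = Union[StringAnnotation, OntologyAnnotation]


@dataclass
class AlignableAnnotation:
    """Annotation linked directly to the time axis (assumed fields)."""
    start: TimeSlot
    end: TimeSlot
    value: AnnotationValue


@dataclass
class ReferringAnnotation:
    """Annotation linked to another annotation instead of to time slots."""
    parent: "Annotation"
    value: AnnotationValue


Annotation = Union[AlignableAnnotation, ReferringAnnotation]


@dataclass
class Tier:
    name: str
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class AnnotationDocument:
    media_file: str
    tiers: List[Tier] = field(default_factory=list)


def time_span(annotation):
    """Follow parent links until an alignable annotation gives the time span."""
    while isinstance(annotation, ReferringAnnotation):
        annotation = annotation.parent
    return annotation.start.time_ms, annotation.end.time_ms


# Made-up example: one utterance and a part-of-speech label that refers to it.
utterance = AlignableAnnotation(TimeSlot(0), TimeSlot(1500), StringAnnotation("hello"))
pos = ReferringAnnotation(utterance, OntologyAnnotation("Noun"))
doc = AnnotationDocument(
    "clip.mpg", [Tier("utterance", [utterance]), Tier("pos", [pos])]
)

print(time_span(pos))  # (0, 1500)
```

In this sketch a referring annotation carries no times of its own and inherits them from the annotation it points to, which is one plausible reading of the alignable/referring distinction.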
Language Profile
… is a subset of ontological terms, possibly renamed, that are used in the annotation of a particular multimedia resource
  ontological terms
  user-defined terms
  a mapping between ontological terms and user-defined terms
  a reference to an ontology
Language Profile
Advantages
  Only a subset of ontological terms is useful for a particular resource annotation
  Renaming ontological terms, e.g. use another language, give an abbreviation or a synonym
  Combining the meaning of two or more ontological terms in one user-defined term
Disadvantage
  More work
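
A language profile as described here can be pictured as a mapping from user-defined terms to ontological terms plus a reference to the ontology. The Python sketch below is a hedged illustration, not OntoELAN's implementation: the `LanguageProfile` class, its method names, and the sample terms "N", "Verbo" and "VT" are all hypothetical, and the GOLD URL stands in for whatever ontology reference a real profile would store.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageProfile:
    ontology_uri: str  # reference to the ontology the terms come from
    # user-defined term -> one or more ontological terms
    mapping: Dict[str, List[str]] = field(default_factory=dict)

    def define(self, user_term: str, *ontological_terms: str) -> None:
        """Map a user-defined term onto one or more ontological terms."""
        if not ontological_terms:
            raise ValueError("a user-defined term needs at least one ontological term")
        self.mapping[user_term] = list(ontological_terms)

    def resolve(self, user_term: str) -> List[str]:
        """Return the ontological terms behind a user-defined term."""
        return self.mapping[user_term]

    def ontological_terms(self) -> List[str]:
        """Return the subset of ontological terms this profile uses, sorted."""
        return sorted({t for terms in self.mapping.values() for t in terms})


profile = LanguageProfile("http://www.emeld.org/gold")
profile.define("N", "Noun")            # abbreviation
profile.define("Verbo", "Verb")        # name in another language
profile.define("VT", "Verb", "Tense")  # two ontological terms combined

print(profile.resolve("VT"))        # ['Verb', 'Tense']
print(profile.ontological_terms())  # ['Noun', 'Tense', 'Verb']
```

Because annotators work with their own labels while the profile keeps the link back to the ontology, annotations made with different profiles remain comparable at the level of ontological terms.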
Annotation Tiers and Linguistic Types
Annotation tiers
  contain annotation values
  can be either alignable or referring
  are associated with their linguistic types
Linguistic types
  None
  Time Subdivision
  Symbolic Subdivision
  Symbolic Association
Ontological tier
Linguistic Multimedia Annotation with OntoELAN
Language profile creation
Creation of tiers
Creation of annotations
Demos
Language profile creation
  profile01.swf profile01.AVI profile02.swf profile02.AVI
Creation of tiers & Creation of annotations
  annotate01.swf annotate01.AVI annotate02.swf annotate02.AVI
Conclusions and Future Work
OntoELAN is the first attempt at annotating linguistic multimedia data with a linguistic ontology
Future Work
  provide more channels for sharing data on the Web, such as the multimedia descriptions, the language words, etc.
  improve the current searching system
  integrate text document annotation
References
Artem Chebotko, Yu Deng, Shiyong Lu and Farshad Fotouhi. An Ontology-based Multimedia Annotator for the Semantic Web of Language Engineering. International Journal on Semantic Web and Information Systems, January, 2005.
Artem Chebotko et al. OntoELAN: An Ontology-based Linguistic Multimedia Annotator. Proc. of the IEEE Sixth International Symposium on Multimedia Software Engineering (IEEE-MSE'2004), Miami, FL, USA, December, 2004.
References
OntoELAN
  http://www.cs.wayne.edu/~yudeng/projects.htm
LangDL: A Digital Library For Language Engineering And Research
  http://database.cs.wayne.edu/proj/langdl/index.html
ELAN
  http://www.mpi.nl/tools/elan.html
E-MELD
  http://www.emeld.org
GOLD
  http://www.emeld.org/gold
General Multimedia Ontology
  http://database.cs.wayne.edu/proj/OntoELAN/multimedia.owl
Questions?
Contact information
Artem Chebotko
[email protected]
313-577-6711