Supervised by Prof. LYU, Rung Tsong Michael
Department of Computer Science & Engineering
The Chinese University of Hong Kong
Prepared by: Chan Pik Wah, Pat
Ngai Cheuk Han, Table
LYU0102
XML for Interoperable Digital Video Library
Outline
- Introduction to XVIP
  - Overview of Project
- Extraction Techniques
  - Face Detection
  - Speech Recognition
- Multimedia Transformation & Presentation
  - XSL
  - SMIL
  - Transformation
- Problems & Solutions
- Conclusion
Motivations
- Rapid increase in the usage of multimedia information
- New approach: Digital Video Library
Project Outline
Motivations
- Little attention has been paid to video information extraction and storage
- Scalability of the system in terms of adding new extraction components
- Lack of a generic framework for presenting and visualizing video information
Overview of XVIP
Achievements in Last Semester
- 2 extraction techniques: Scene Change, VOCR
- Integrate data into XML
- XML Editor
- Knowledge Enrichment
Achievements in This Semester
- 2 more extraction techniques: Face Detection, Speech Recognition
- New data integrated into XML
- XML to SMIL Transformer
Extraction Techniques
[Figure: Video → Scene Change, VOCD, Face Detection, Speech Recognition → XML]
Face Detection
- Object-presence detection is also an important technique.
- It identifies and indexes features to support image similarity matching; face detection is a good example.
Face Detection
- Names of the people appearing in the video
- How they interact with the environment
- Makes the video more searchable
Face Detection
- Neural network-based algorithm: the basic algorithm used for face detection
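The slide names a neural network-based algorithm without further detail. Detectors of this kind commonly apply a small fixed-size network to every window of an image pyramid. The sketch below shows only that scanning loop, assuming a 20×20 window, a scale step of 1.2 and a 2-pixel stride; classify_window is a hypothetical stand-in for the trained network, and image preprocessing and the merging of overlapping detections are omitted.

```python
from typing import Callable, Iterator, List, Tuple

# A candidate window: (x, y, size) in original-image pixel coordinates.
Window = Tuple[int, int, int]


def scan_windows(width: int, height: int, win: int = 20,
                 scale_step: float = 1.2, stride: int = 2) -> Iterator[Window]:
    """Yield every square window a fixed-size detector would examine.

    Shrinking the image by scale_step per pyramid level is equivalent to
    growing the window by scale_step in original coordinates, which is
    what this loop does. Window size, scale step and stride are assumptions.
    """
    size = float(win)
    while int(size) <= min(width, height):
        s = int(size)
        step = max(1, int(stride * size / win))  # keep the stride proportional to scale
        for y in range(0, height - s + 1, step):
            for x in range(0, width - s + 1, step):
                yield (x, y, s)
        size *= scale_step


def detect_faces(width: int, height: int,
                 classify_window: Callable[[Window], float],
                 threshold: float = 0.5) -> List[Window]:
    """Keep the windows that the (hypothetical) network scores above threshold."""
    return [w for w in scan_windows(width, height)
            if classify_window(w) > threshold]
```

Growing the window instead of resampling the image keeps the sketch free of image libraries; a real detector would resample pixels and feed each window to the network.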
Face Detection
- Face Recognition
- Facial Expression Analysis
- Enrich the XML
- Make it easier for users to search the content of the video
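Face detection results could enrich the XML in the same time-coded style as the project's other data. The XVIP schema is not shown in these slides, so the FACES, FACE and NAME elements and the begin/dur attributes below are hypothetical. A minimal ElementTree sketch of adding detections and then searching them by person:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (begin seconds, duration seconds, recognized name or None if unknown)
Detection = Tuple[float, float, Optional[str]]


def add_face_detections(root: ET.Element,
                        detections: List[Detection]) -> ET.Element:
    """Append a hypothetical <FACES> block describing when faces appear."""
    faces = ET.SubElement(root, "FACES")
    for begin, dur, name in detections:
        face = ET.SubElement(faces, "FACE", begin=f"{begin:g}", dur=f"{dur:g}")
        if name:  # unrecognized faces keep only their timing
            ET.SubElement(face, "NAME").text = name
    return faces


def times_person_appears(root: ET.Element, name: str) -> List[str]:
    """Search the enriched XML: begin times of every FACE with this NAME."""
    return [f.get("begin", "") for f in root.iter("FACE")
            if f.findtext("NAME") == name]
```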
Speech Recognition
- Speech recognition technology can make any spoken data useful for library indexing and retrieval
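One way spoken data becomes searchable is an inverted index from each recognized word to the times it is spoken. The sketch assumes the speech engine yields (time_in_seconds, word) pairs; this output format is an assumption, not a documented ViaVoice interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(words: List[Tuple[float, str]]) -> Dict[str, List[float]]:
    """Map each lower-cased recognized word to the times (seconds) it is spoken.

    Assumes the recognizer output has already been turned into (time, word) pairs.
    """
    index: Dict[str, List[float]] = defaultdict(list)
    for t, word in words:
        index[word.lower()].append(t)
    return dict(index)


def search(index: Dict[str, List[float]], query: str) -> List[float]:
    """Return, in time order, every moment at which any query word is spoken."""
    hits: List[float] = []
    for term in query.lower().split():
        hits.extend(index.get(term, []))
    return sorted(hits)
```

The returned times can be used directly as seek positions into the video.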
Speech Recognition Engine
Speech Recognition
- ViaVoice: error rate > 50%
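The slide does not state how the error rate was measured. A common metric is word error rate, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the minimum word-level edit between the reference transcript (N words) and the recognizer's output. A sketch assuming that definition:

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes WER is the intended metric; computed with a word-level
    Levenshtein distance.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitute, or match for free
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 100%: word_error_rate("a", "b c d") is 3.0.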
Usage of XML
- Indexing & Searching
- Combining with other XML for Knowledge Enrichment
- Presentation
- Exchanging data with different applications
Presentation of the Video Data
- XML is not presentable without processing
- HTML with images is possible, but it is static
- SMIL is good for multimedia presentation
- No existing tools integrate different XML data into a SMIL presentation
- Current transformation languages have many limitations in transforming XML to SMIL
SMIL
SMIL
- SMIL stands for Synchronized Multimedia Integration Language and is currently a W3C Recommendation.
- It is a markup language that can synchronize and integrate multimedia.
- It enables authors to specify what should be presented and when.
- Supported by RealPlayer, QuickTime and Internet Explorer
Advantages
- SMIL is text-based
  - Easy to develop with a text editor
- Generate customized presentations
  - Generate a customized SMIL file based on preferences recorded in the visitor's browser
- The SMIL effort is led by the W3C
  - The W3C tries to shape a specification that is beneficial to all parties involved.
- Avoid using container formats
  - SMIL can stream many media formats, so there is no need to merge clips into a single streaming file.
Timing and Synchronization
Sequence element:
<seq>
<img src="pix/0.jpg" dur="15" region="scene"/>
<img src="pix/15.jpg" dur="5" region="scene"/>
<img src="pix/20.jpg" dur="7" region="scene"/>
<img src="pix/27.jpg" dur="4" region="scene"/>
……
</seq>
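In the example, each keyframe file appears to be named after the second at which its scene starts, and each dur is the gap to the next scene change (15 - 0 = 15, 20 - 15 = 5, 27 - 20 = 7, and the last entry implies a further change at 27 + 4 = 31). Assuming that naming convention and a known end time for the final scene, the <seq> body can be generated from the list of times reported by scene change detection:

```python
from typing import List


def scenes_to_seq(change_times: List[int], end_time: int,
                  region: str = "scene") -> str:
    """Build a SMIL <seq> that shows one keyframe per scene for its duration.

    Assumption: keyframes are stored as pix/<start second>.jpg, as in the example.
    """
    bounds = list(change_times) + [end_time]
    lines = ["<seq>"]
    for start, nxt in zip(bounds, bounds[1:]):
        if nxt <= start:
            raise ValueError("scene-change times must be strictly increasing")
        lines.append(f'  <img src="pix/{start}.jpg" dur="{nxt - start}" '
                     f'region="{region}"/>')
    lines.append("</seq>")
    return "\n".join(lines)
```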
Parallel element:
<par>
<text src="text/transcript.rt" region="transcript" />
<text src="text/mapdetail.rt" region="mapdetail" />
<video src="news.mpg" region="video" fill="freeze"/>
…
</par>
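The two containers combine durations differently: a <seq> plays its children one after another, so its duration is the sum of theirs, while a <par> plays them together and, with SMIL's default endsync="last", ends when its last child ends. A small model of that rule, assuming every child starts at offset 0 and ignoring explicit begin/dur on containers and indefinite media:

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Media:
    dur: float  # seconds


@dataclass
class Container:
    kind: str  # "seq" or "par"
    children: List["Node"] = field(default_factory=list)


Node = Union[Media, Container]


def duration(node: Node) -> float:
    """Simplified SMIL timing: no begin offsets, default endsync for par."""
    if isinstance(node, Media):
        return node.dur
    child_durs = [duration(c) for c in node.children]
    if node.kind == "seq":
        return sum(child_durs)               # children play back to back
    if node.kind == "par":
        return max(child_durs, default=0.0)  # ends when the last child ends
    raise ValueError(f"unknown container {node.kind!r}")
```

For instance, the four keyframes listed in the <seq> example total 15 + 5 + 7 + 4 = 31 seconds.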
XSL
- XSL stands for "Extensible Stylesheet Language".
- XSL is the language defined by the W3C to add formatting information to XML data.
- XSLT is the most commonly used part of the XSL standard.
  - It transforms one XML document into another.
  - It is used in our final year project (FYP).
XSL Working Principle
[Figure: Source Tree + XSL Stylesheet → Transformation Process → Output]
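An XSLT processor matches template rules in the stylesheet against nodes of the source tree and builds the output tree from what those templates produce. The Python sketch below only illustrates that template-matching idea and is not an XSLT implementation: templates maps an element name to a function, unmatched elements fall through to their children (loosely like XSLT's built-in rule), and time_template is a hypothetical example stylesheet.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Recurse = Callable[[ET.Element], List[ET.Element]]
Template = Callable[[ET.Element, Recurse], List[ET.Element]]


def apply_templates(node: ET.Element,
                    templates: Dict[str, Template]) -> List[ET.Element]:
    """Produce output nodes for `node` using the template registered for its tag.

    Without a match, process the children instead, similar to XSLT's
    built-in rule for elements (built-in copying of text is omitted).
    """
    def recurse(child: ET.Element) -> List[ET.Element]:
        return apply_templates(child, templates)

    template = templates.get(node.tag)
    if template is not None:
        return template(node, recurse)
    out: List[ET.Element] = []
    for child in node:
        out.extend(recurse(child))
    return out


# Hypothetical stylesheet rule: turn each TIME entry into a timed <entry>.
def time_template(node: ET.Element, recurse: Recurse) -> List[ET.Element]:
    result = ET.Element("entry", begin=node.get("begin", ""), dur=node.get("dur", ""))
    result.text = node.findtext("NAME", "")
    return [result]
```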
Transformation
Input files
- XML file generated by XVIP
- XML files of additional information
Output files
- A SMIL file
- Some RealText files
Design 1: Build with VC++ solely
- Read all the input files and extract the information
- Create the output files for the SMIL presentation
Disadvantages
- The layout of the SMIL presentation needs to be hard-coded in the VC++ program.
- The layout becomes hard to change, and the transformer becomes hard to extend.
Design 1 with Modification
- Modification: provide an additional file or interface as a template for the user to define the layout of the SMIL presentation.
- Disadvantages: the flexibility provided is still limited, and it is not a standard way to define a template.
Design 2: Use XSLT to assist the transformation
- Users can define their own templates with XSL.
- Advantages: program-independent, extensible, standard templates
Limitations of XSLT
- It can only read one input data file and one XSL file, then generate one output.
- It cannot combine data from multiple files.
Design 2 Solutions
- Knowledge Enrichment: combine additional information with the XML file from XVIP before converting to SMIL
- Creating output files:
  - Use separate XSL files to generate the RealText files
  - Use separate XSL files to generate the layout of the presentation and the displaying order of objects in different regions, then combine them into a SMIL file
Knowledge Enrichment
[Figure: XML file from XVIP + Information of major cities → Combined XML file]
Combined XML file
- The XML file contains information about major cities related to the video.
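<COMBINE>
  <TIME begin="10" dur="11">
    <NAME>香港</NAME>  <!-- Hong Kong -->
    <DETAIL>中國南部一個沿海城市</DETAIL>  <!-- A coastal city in southern China -->
    <AREA>China</AREA>
  </TIME>
  <TIME begin="21" dur="20">
    <NAME>紐約</NAME>  <!-- New York -->
    <DETAIL>隸屬美國紐約州的城市</DETAIL>  <!-- A city in New York State, USA -->
    <AREA>America</AREA>
  </TIME>
</COMBINE>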
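The slides do not say how the combined file is built. Assuming XVIP's XML yields timed mentions of city names and the additional file maps each city to its detail and area, the enrichment step reduces to a join that emits the COMBINE/TIME/NAME/DETAIL/AREA structure of the example; both input shapes are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def enrich(mentions: List[Tuple[int, int, str]],
           cities: Dict[str, Tuple[str, str]]) -> ET.Element:
    """Join timed city mentions with city details into a <COMBINE> tree.

    mentions: (begin, dur, name), assumed to be extracted from the XVIP XML.
    cities:   name -> (detail, area), assumed to be read from the extra XML file.
    """
    combine = ET.Element("COMBINE")
    for begin, dur, name in mentions:
        if name not in cities:  # nothing to enrich this mention with
            continue
        detail, area = cities[name]
        entry = ET.SubElement(combine, "TIME", begin=str(begin), dur=str(dur))
        ET.SubElement(entry, "NAME").text = name
        ET.SubElement(entry, "DETAIL").text = detail
        ET.SubElement(entry, "AREA").text = area
    return combine
```

Producing one combined tree is what lets a single XSLT run, which reads only one input document, see both sources.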
Create RealText files
Geographical Information
Biographical Information
Video Transcript
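Each RealText file can be derived from the combined XML by emitting one timed entry per TIME element. The sketch assumes RealText's <window>, <time begin> and <clear/> tags and a fixed five-minute window duration; which field feeds which file (for example DETAIL for the geographical file) is also an assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def combine_to_realtext(combine: ET.Element, field: str = "DETAIL",
                        window_dur: str = "00:05:00") -> str:
    """Build RealText markup showing one field of each TIME entry.

    Each entry clears the window at its begin time and then displays its text.
    The window type and duration are assumptions, not values from the project.
    """
    parts = [f'<window type="generic" duration="{window_dur}">']
    for entry in combine.findall("TIME"):
        begin = entry.get("begin", "0")
        text = escape(entry.findtext(field, ""))  # keep &, <, > valid in markup
        parts.append(f'<time begin="{begin}"/><clear/>{text}')
    parts.append("</window>")
    return "\n".join(parts)
```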
Create SMIL file
- Layout
- Displaying order
SMIL Presentation
- Combining the temporary files
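The final step wraps the separately generated fragments into a single document. Assuming one temporary file holds a <layout> fragment and another holds the <body> with the displaying order (their contents are passed in here as strings), the combination can be sketched with ElementTree:

```python
import xml.etree.ElementTree as ET


def combine_smil(layout_xml: str, body_xml: str) -> str:
    """Wrap a <layout> fragment and a <body> fragment into one SMIL document.

    Assumes each temporary file holds exactly one well-formed fragment.
    """
    layout = ET.fromstring(layout_xml)
    body = ET.fromstring(body_xml)
    if layout.tag != "layout" or body.tag != "body":
        raise ValueError("expected a <layout> fragment and a <body> fragment")
    smil = ET.Element("smil")
    ET.SubElement(smil, "head").append(layout)  # layout lives in the head
    smil.append(body)                           # displaying order lives in the body
    return ET.tostring(smil, encoding="unicode")
```

Parsing the fragments instead of concatenating strings means a malformed temporary file is reported before a broken SMIL file is written.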
Problems & Solutions
Problem 1
- The output of the XSLT processor is UTF-8 encoded, but the SMIL presentation needs ANSI encoding.
- Solution: write a function "UTF8toANSI" for the conversion.
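The slides name the function but not its implementation. A minimal sketch in Python, assuming Traditional Chinese Windows, whose ANSI code page is cp950: decode the processor's UTF-8 output, then re-encode it in the target code page, replacing characters the code page cannot represent.

```python
def utf8_to_ansi(data: bytes, codepage: str = "cp950") -> bytes:
    """Convert UTF-8 bytes (e.g. XSLT output) to a Windows "ANSI" code page.

    Assumption: cp950 (Traditional Chinese Windows) is the target code page;
    characters it cannot represent become "?".
    """
    text = data.decode("utf-8-sig")  # also drops a leading UTF-8 byte-order mark
    return text.encode(codepage, errors="replace")
```

In VC++ the same two steps would typically be MultiByteToWideChar with CP_UTF8, followed by WideCharToMultiByte with CP_ACP.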
Problem 2
- XSLT has a limitation: it can only read one XML file and one XSL file and generate one output file.
- Our transformation process has more than one input file.
- Solution: perform knowledge enrichment and produce a combined XML result file before creating the output files.
Conclusion
XVIP contains:
- Four video information modalities: scene change detection, VOCD, speech recognition and face detection
- An information integration module with XML, for storing the extracted video data in XML format
- An XML editor, for editing the generated XML file
- A knowledge enrichment component, for adding additional information to the XML-based video data
- An XML to SMIL transformer, for converting the XML-based video data into a SMIL presentation
XVIP:
- provides multiple functions for extracting video information
- stores video information in a flexible and scalable way
- comprises a transformer to generate presentations of the information
Paper “XVIP: An XML-Based Video Information Processing System”, Michael Lyu, Edward Yau, C.H.Ngai, P.W.Chan, was accepted by COMPSAC 2002.
Q & A