Concepts of Multimedia Processing and Transmission
IT 481, Lecture #1
Dennis McCaughey, Ph.D.
22 January, 2007
01/22/2007 IT 481, Spring
Outline
Course Description
Instructor
Exams, Homework and Project
Grading
General Policies
Lecture Schedule
Course Description
Topics
– The fundamentals of signal and image processing, including algorithms for signal processing that have applications to multimedia
– Techniques for voice coding and recognition, CD and DVD technology, streaming video, WANs and LANs, and videoconferencing technology
Text: Multimedia Communications; Applications, Networks, Protocols and Standards, Fred Halsall, Addison-Wesley; 1st edition (2002), ISBN: 0-201-39818-4.
Instructor
Dennis McCaughey
– Contact Information
703-263-7425 (Office)
703-624-6830 (Cell)
[email protected] (e-mail)
Office Hours: one hour before class
– Background
PhD in EE, University of Southern California, 1977
– Thesis: Degrees of Freedom for Projection Imaging
Exams, Homework and Project
Mid-Term: 1 Hour Closed Book
– Covers the key topics from class and homework
Final: Format “To Be Determined”
Homework: 1) Reading assignments, 2) Written answers to selected questions based on reading assignments, 3) Some limited math problems
Project: Format (Preliminary): MATLAB implementations of selected multimedia processing applications.
More on the Project
A course project will explore aspects of multimedia signal processing and will be computer based using MATLAB.
Project topics will consist of a set of MATLAB implementations addressing multimedia concepts, assigned on a running basis over the semester.
Each student will be required to submit the project in the format of a final report.
The projects will be graded on the effort applied, not on MATLAB programming skills.
Details regarding topics, content, and format will be provided during the course.
Grading
The final grade will be determined by a weighted average of the homework assignments, a mid-term exam, a final exam and a project
Homework 10%
Mid-Term 20%
Project 30%
Final 40%
General Policies
Collaboration
– Students are permitted and encouraged to collaborate on homework assignments.
– All graded work, however, must be the original effort of the student submitting the paper.
Homework
– Homework will be collected at the beginning of each class period. Note: Late homework will be accepted provided the reason for the delay is coordinated with the instructor within 2 days of its assignment. Homework solutions will be discussed in class.
Make-up Exams
– Make-up exams will not be given unless detailed written clarification accompanied by documentation for the absence is provided. If this information is not provided, an F grade will be given for the exam. The location and time for a make-up exam will be decided by the instructor. Also, students are expected to be in class and on time for every class.
Lecture Schedule (Preliminary)
Week | Date | Chapter | Topic | Reading Assignment | Homework
1 | 1/22 | 1 | Lecture #1: Introduction to Multimedia Communications | 1,2 |
2 | 1/29 | None | Lecture #2: Signal Processing Fundamentals and Intro to MATLAB | |
3 | 2/5 | 2 | Lecture #3: Multimedia Information Representation | 3 |
4 | 2/12 | 3 | Lecture #4: Text Compression | 3 |
5 | 2/19 | 3 | Lecture #5: Image Compression | 4 |
7 | 2/26 | 4 | Lecture #6: Audio Compression | 4 |
8 | 3/5 | 1-4 | Mid-Term Exam & Project Review | |
9 | 3/12 | None | Spring Break | |
10 | 3/19 | 4 | Lecture #7: Video Compression | 5 |
11 | 3/26 | 5 | Lecture #8: Standards for Multimedia Communications | 6 |
12 | 4/2 | 6 | Lecture #9: Digital Communication Basics | 11 |
13 | 4/9 | 11 | Lecture #10: Entertainment Networks and High Speed Modems | TBD |
14 | 4/16 | TBS | Lecture #11: Data Privacy | TBD |
15 | 4/23 | TBS | Special Topics | 1-6,11 |
16 | 4/30 | 1-6,11 | Final Exam Review | |
 | 5/14 | | Final Exam 7:30pm | |
Multimedia Communications
What is Multimedia?
Multimedia is a combination of text, art, sound, animation, and video.
Slide: Courtesy, Hung Nguyen
Multimedia Components Simplified
Multimedia can be viewed as the combination of audio, video and data, and how they interact with the user (more than the sum of the individual components)
[Figure: Audio, Video and Data combining into Multimedia]
Background
Fast-paced emergence of applications in medicine, education, travel, etc.
Characterized by large documents that must be communicated with short delays
Glamorous applications such as distance learning and video teleconferencing
Applications that are enhanced by video are often seen as drivers for the development of multimedia networks
Forces Driving Communications That Facilitate Multimedia Communications
Evolution of communications and data networks
Increasing availability of almost unlimited bandwidth on demand
Availability of ubiquitous access to the network
Ever increasing amount of memory and computational power
Sophisticated terminals
Digitization of virtually everything
New Information System Paradigm
[Figure: Integration of Multimedia Integrated Communication (Broadband Link) and Multimedia Processing (Workstation, PC)]
Slide: Courtesy, Hung Nguyen
Elements of Multimedia Systems
Two key communication modes
– Person-to-person
– Person-to-machine
[Figure: person-to-person mode (User Interface, Transport, User Interface) and person-to-machine mode (User Interface, Transport, Processing, Storage and Retrieval)]
Slide: Courtesy, Hung Nguyen
Multimedia Networks
The world has been wrapped in copper and glass fiber and can be viewed as a “hair ball” with physical, wireless and satellite entry/exit points.
Physical: LAN-WAN connections
Wireless: Cellular telephony, wireless PC connectivity
Satellite: INMARSAT, THURAYA, ACeS, etc.
Multimedia Communication Model
Partitioning of information objects into distinct types, e.g., text, audio, video
Standardization of service components per information type
Creation of platforms at two levels – network service and multimedia communication
Define general applications for multiple use in various multimedia environments
Define specific applications, e.g. e-commerce, tele-training, … using building blocks from platform and general applications
Requirements
User Requirements
– Fast preparation and presentation
– Dynamic control of multimedia applications
– Intelligent support to users
– Standardization
Network Requirements
– High speed and variable bit rates
– Multiple virtual connections using the same access
– Synchronization of different information types
– Suitable standardized services along with support
Network Requirements
ATM-BISDN and SS7 have enabled the switching-based communications capabilities over the PSTN that support the necessary services
ATM-BISDN-SS7 will evolve to all-optical “switchless” networks based on packet transfer
Packet Transfer Concept
Allows voice, video and data to be dealt with in a common format
More flexible than circuit switching, which it can emulate, while allowing the multiplexing of varied bit rate data streams
Dynamic allocation of bandwidth
Handles Variable Bit Rate (VBR) directly
Considerations
Buffering required for constant bit rate data such as audio
Re-sequencing and recovery capabilities must be provided over networks where packets may be dropped or received in an order different from that transmitted
– In an ATM network some packets can be dropped while others may not (e.g. voice vs. bank transfer data packets)
– Optimum packet lengths for voice, video and data differ in an ATM network
– IP packets over the Internet may arrive in a different order or be dropped.
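A minimal sketch of the re-sequencing and recovery idea above, assuming each packet carries a sequence number. The function name resequence, the (seq, payload) packet format and the max_hold threshold are illustrative assumptions, not part of the lecture or of any particular protocol.

```python
def resequence(packets, first_seq=0, max_hold=3):
    """Yield payloads in sequence order from (seq, payload) pairs.

    Out-of-order packets wait in a buffer. If more than max_hold packets
    are waiting behind a gap, the missing packet is treated as dropped
    and delivery skips ahead (a simple recovery rule, assumed here).
    """
    buffer = {}
    expected = first_seq
    for seq, payload in packets:
        if seq < expected or seq in buffer:
            continue  # late or duplicate packet, ignore it
        buffer[seq] = payload
        if expected not in buffer and len(buffer) > max_hold:
            expected = min(buffer)  # give up on the missing packet(s)
        while expected in buffer:
            yield buffer.pop(expected)
            expected += 1
    # End of stream: deliver whatever is still waiting, in order.
    for seq in sorted(buffer):
        yield buffer[seq]


reordered = list(resequence([(0, "a"), (2, "c"), (1, "b"), (3, "d")]))
with_loss = list(resequence([(0, "a"), (2, "c"), (3, "d"), (4, "e"), (5, "f")]))
```

In the second call packet 1 never arrives, so the receiver waits until four later packets are buffered and then continues without it. A real-time audio receiver would more likely use a time-based deadline than a packet count.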
Digital Video Signal Transport
The following figure will be examined over the course of the semester
[Figure: digital video signal transport chain, from video source to users]
Video
Encoder: transformation, quantization, entropy coding, bit-rate control
Application: data structuring
Network multiplexing/routing: overhead (FEC), re-transmission
Error detection, loss detection, error correction, erasure correction
Application: re-synch
Decoder: de-quantization, entropy decode, inverse transform, loss concealment, post-processing
Users
Quality of Service (QoS)
The set of parameters that defines the properties of media streams
Can define four QoS layers:
1. User QoS: Perception of the multimedia data at the user interface (“qualitative”)
2. Application QoS: Parameters such as end-to-end delay (“quantitative”)
3. System QoS: Requirements on the communications services derived from the application QoS
4. Network QoS: Parameters such as network load and performance
Applications of Multimedia
Business - Business applications for multimedia include presentations, training, marketing, advertising, product demos, databases, catalogues, instant messaging, and networked communication.
Schools - Educational software can be developed to enrich the learning process.
Slide: Courtesy, Hung Nguyen
Applications of Multimedia (continued)
Home - Most multimedia projects reach the homes via television sets or monitors with built-in user inputs.
Public places - Multimedia will become available at stand-alone terminals or kiosks to provide information and help.
Slide: Courtesy, Hung Nguyen
Compact Disc Read-Only Memory (CD-ROM)
CD-ROM is the most cost-effective distribution medium for multimedia projects.
It can contain up to 80 minutes of full-screen video or sound.
CD burners are used for reading and writing discs and for converting content to audio, video, and data formats.
Slide: Courtesy, Hung Nguyen
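As a rough check on the 80-minute figure, the arithmetic below uses the standard CD parameters (75 sectors per second, 2352 audio bytes per sector, 2048 user-data bytes per sector in CD-ROM Mode 1). These constants come from the CD specifications rather than from the slide, and the function name is illustrative.

```python
SECTORS_PER_SECOND = 75        # CD playback rate at 1x
AUDIO_BYTES_PER_SECTOR = 2352  # CD audio payload per sector
DATA_BYTES_PER_SECTOR = 2048   # CD-ROM Mode 1 user data per sector


def cd_capacity_bytes(minutes, bytes_per_sector):
    """Bytes stored on a disc of the given playing time."""
    return minutes * 60 * SECTORS_PER_SECOND * bytes_per_sector


# Uncompressed CD audio: 44.1 kHz, 2 channels, 2 bytes per sample.
pcm_bytes_per_second = 44_100 * 2 * 2

audio_bytes = cd_capacity_bytes(80, AUDIO_BYTES_PER_SECTOR)
data_bytes = cd_capacity_bytes(80, DATA_BYTES_PER_SECTOR)
data_mib = data_bytes / 2**20

print(pcm_bytes_per_second, audio_bytes, data_bytes, data_mib)
```

The audio byte rate matches 75 sectors of 2352 bytes exactly, and an 80-minute disc holds about 703 MiB of Mode 1 data. Fitting 80 minutes of video into that space therefore depends on compression.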
Digital Versatile Disc (DVD)
Multilayered DVD technology increases the capacity of current optical technology to 18 GB.
DVD authoring and integration software is used to create interactive front-end menus for films and games.
DVD burners are used for reading and writing discs and for converting content to audio, video, and data formats.
Slide: Courtesy, Hung Nguyen
Multimedia Communications
Multimedia communications is the delivery of multimedia to the user by electronic or digitally manipulated means.
[Figure: Multimedia Communications as the combination of]
– Audio Communications (telephony, sound, broadcast)
– Video Communications (video telephony, TV/HDTV)
– Data, text, image Communications (data transfer, fax…)
Slide: Courtesy, Hung Nguyen
Multimedia Terms
Alternative Types of Media used in Multimedia Applications
Multimedia Communications Networks
Multimedia Networks and Their Services
Multimedia Networks and Their Services (continued)
Audio-Visual Integration
Application in Biometrics – Bimodal Person Verification
Existing methods for person verification are mainly based on a single modality, which limits their security and robustness
Audio-visual integration using a camera and microphone makes person verification more reliable
Slide: Courtesy, Hung Nguyen
Joint Audio-Video Coding
Correlation between audio and video can be used to achieve more efficient coding
– Predictive coding of audio and video information used to construct estimate of current frame (cross-modal redundancy)
– Difference between original and estimated signal can be transmitted as parameters
– Decision on what and how to send is based on Rate Distortion (R-D) criteria
Reconstruction done at receiver according to agreed-upon decoding rules
Slide: Courtesy, Hung Nguyen
Cross-Modal Predictive Coding
[Figure: Visual Analysis produces Parameter X; A-to-V Mapping produces the estimate X̂; the Decision Module (R-D) chooses to send Nothing, the difference X − X̂, or Parameter X]
Slide: Courtesy, Hung Nguyen
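The decision module can be sketched as a choice among three options (send nothing, send the difference X − X̂, or send X itself) by minimizing a Lagrangian cost J = D + λR. That cost form, the uniform quantizer, the bit budgets and the step size below are all illustrative assumptions; the slides only state that the decision uses R-D criteria.

```python
def quantize(value, step, bits):
    """Uniform quantizer with 2**bits levels; out-of-range values are clipped."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    index = max(lo, min(hi, round(value / step)))
    return index * step


def rd_decision(x, x_hat, lam, step=0.1, diff_bits=3, full_bits=8):
    """Pick the transmission mode with the lowest cost D + lam * R.

    x      visual parameter measured at the encoder
    x_hat  estimate of x predicted from the audio (available at both ends)
    lam    Lagrange multiplier trading distortion against rate (assumed)
    """
    candidates = {
        # mode: (bits spent, value the decoder would reconstruct)
        "nothing": (0, x_hat),
        "difference": (diff_bits, x_hat + quantize(x - x_hat, step, diff_bits)),
        "parameter": (full_bits, quantize(x, step, full_bits)),
    }
    costs = {
        mode: (x - recon) ** 2 + lam * bits
        for mode, (bits, recon) in candidates.items()
    }
    best = min(costs, key=costs.get)
    return best, candidates[best][1], costs[best]


good_prediction = rd_decision(x=1.0, x_hat=1.02, lam=0.01)[0]
fair_prediction = rd_decision(x=1.0, x_hat=0.7, lam=0.01)[0]
poor_prediction = rd_decision(x=5.0, x_hat=0.0, lam=0.01)[0]
```

With these made-up numbers, a good audio-based prediction sends nothing, a moderate error sends the short residual, and a large error falls back to sending the parameter itself because the 3-bit residual quantizer clips.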
Importance of Interaction
Multimedia is more than the combination of text, audio, video and data
Interaction among media is important
Consider a poorly dubbed movie
– Audio not synchronized with video
– Lip movements inconsistent with language
– Audio dynamic range inconsistent with the scene
Slide: Courtesy, Hung Nguyen
Media Interaction
[Figure: "Process and Model" diagram linking Audio, Text, Image and Video under Multimedia]
Labels in the figure:
– Lip synch, face animation
– Joint A/V coding
– Compression, synthesis, 3D sound
– Sign language, lip reading
– Speech recognition, text-to-speech
– Compression, graphics, database indexing/retrieval
– Translation, natural language
Slide: Courtesy, Hung Nguyen
Bimodality of Human Speech
Human speech is produced by vibration of the vocal cords and configuration of the vocal tract, using muscles that also generate facial expressions
Audio | Visual | Perceived
ba | ga | da
pa | ga | ta
ma | ga | na
Slide: Courtesy, Hung Nguyen
Basic Definitions
The basic unit of acoustic speech is called a phoneme
In the visual domain, the basic unit of mouth movement is called a viseme
– A viseme is the smallest visibly distinguishable unit of speech
– Several phonemes can look alike and thus form one viseme group
– A many-to-one mapping between phonemes and visemes
Slide: Courtesy, Hung Nguyen
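The many-to-one mapping can be pictured as a small look-up table. The grouping below is only an illustration, using place-of-articulation names as hypothetical viseme labels; it is not a standard viseme inventory and does not come from the slides.

```python
# Hypothetical viseme groups: phonemes in one group look alike on the lips.
VISEME_GROUPS = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
    "alveolar": ["t", "d", "n"],
    "velar": ["k", "g"],
}

# Invert the groups into a phoneme -> viseme look-up table.
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_GROUPS.items()
    for phoneme in phonemes
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence (many-to-one)."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


visemes = to_visemes(["b", "m", "p", "g"])
```

Here "b", "m" and "p" all land on the same viseme, which is why lip reading alone cannot tell them apart.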
Lip Reading System
Application to support hearing-impaired persons
People learn to understand spoken language by combining visual content with lexical, syntactic, semantic and pragmatic information
Automated lip reading systems
– Speech recognition possible using only visual information
– Integrated with speech recognition systems to improve accuracy
Slide: Courtesy, Hung Nguyen
Lip Synchronization
Applications
– In VTC (video teleconferencing), where video frames are dropped (low bandwidth requirement) but audio must still be continuous
– In non-real-time use such as dubbing in a studio, where the recorded voice is full of background noise
Time-warping commonly used in both audio and video modes
– Time-frequency analysis
– Video time-warping could be used for VTC
– Audio time-warping could be used for dubbing
Slide: Courtesy, Hung Nguyen
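One common form of time-warping is dynamic time warping (DTW), sketched below for two one-dimensional feature tracks. Treating the inputs as, say, an audio energy track and a mouth-opening track is an assumption made for illustration; the slides do not specify the features or the warping algorithm.

```python
def dtw_path(a, b):
    """Dynamic time warping between two numeric sequences.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs aligning a[i] with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# The second track repeats its first sample, as if one track were stretched.
distance, alignment = dtw_path([0, 1, 2, 3], [0, 0, 1, 2, 3])
```

The alignment maps the first sample of the shorter track onto both leading samples of the longer one, which is the kind of local stretching needed to keep lips and sound together.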
Lip Tracking
To prevent too much jerkiness in the motion rendering and too much loss in lip synchronization
Involves real-time analysis of the video signal, which is three-dimensional (two spatial dimensions plus one temporal dimension)
Produces meaningful parameters
– Classification of mouth images into visemes
– Measures of dimension, e.g. mouth widths and heights
Analysis tools: Fourier Transform, Karhunen-Loeve Transform (KLT), Probability Density Function (pdf) Estimation
Slide: Courtesy, Hung Nguyen
Audio-to-Visual Mapping for Lip Tracking
Conversion of acoustic speech to mouth shape parameters
A mapping of phonemes to visemes
Could be most precisely implemented with a complete speech recognizer followed by a look-up table
– High computational overhead plus table look-up complexity
– Do not need to recognize the spoken word to achieve audio-to-visual mapping
Physical relationships exist between vocal tract shape and sound produced, so functional relationships exist between speech and visual parameters
Slide: Courtesy, Hung Nguyen
Classification-Based Conversion Approaches for Lip Tracking
Two-step process
– Classification of acoustic signal using VQ (vector quantization), HMM (hidden Markov model) and NN (neural network)
– Mapping of the acoustic classes into corresponding visual outputs, then averaged to get centroid
Shortcomings
– Error resulting from averaging visual vector to get visual centroid
– Not a continuous mapping – finite output levels
Slide: Courtesy, Hung Nguyen
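The two-step process can be sketched with VQ as the classifier: each audio feature vector is assigned to its nearest codeword, and the visual vectors observed for each class are averaged into a centroid. The codebook, feature values and function names below are made up for illustration.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def train_visual_centroids(codebook, audio_vecs, visual_vecs):
    """Average the visual vectors that fall into each acoustic class."""
    sums, counts = {}, {}
    for audio, visual in zip(audio_vecs, visual_vecs):
        k = nearest(codebook, audio)
        if k not in sums:
            sums[k] = [0.0] * len(visual)
            counts[k] = 0
        sums[k] = [s + x for s, x in zip(sums[k], visual)]
        counts[k] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def audio_to_visual(codebook, centroids, audio_vec):
    """Step 1: classify the audio vector. Step 2: output that class's centroid."""
    return centroids[nearest(codebook, audio_vec)]


# Toy data: 2-D audio features, 2-D visual features (mouth width, height).
codebook = [[0.0, 0.0], [1.0, 1.0]]
audio_train = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0], [1.1, 0.8]]
visual_train = [[2.0, 1.0], [4.0, 3.0], [10.0, 6.0], [12.0, 8.0]]

centroids = train_visual_centroids(codebook, audio_train, visual_train)
mouth_shape = audio_to_visual(codebook, centroids, [0.2, 0.1])
```

Both shortcomings listed above show up here: the output [3.0, 2.0] matches neither training mouth shape in its class (averaging error), and any input can only ever produce one of two outputs (finite output levels).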
Classification-Based Conversion
[Figure: classes in Phoneme Space mapped to regions in Viseme Space, each represented by its Centroid]
Slide: Courtesy, Hung Nguyen
Audio and Visual Integration for Lip Reading Applications
Three major steps
– Audio-visual pre-processing: Principal Component Analysis (PCA) has been used for feature extraction
– Pattern recognition strategy (HMM, NN, time-warping…)
– Integration strategy (decision making)
Heuristic rules to incorporate knowledge of phonemes about the two modalities
Combination of independent evaluation scores for each modality
Slide: Courtesy, Hung Nguyen
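The last integration option, combining independent scores from each modality, is sketched below as a weighted sum of per-class scores. The weighting scheme, the score values (treated as log-likelihoods) and the function name are assumptions for illustration; the slides do not give a specific combination rule.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.5):
    """Combine per-class scores from two independent recognizers.

    Each argument maps a class label to a score where higher is better
    (for example a log-likelihood). Returns the winning label and the
    fused scores.
    """
    fused = {
        label: audio_weight * audio_scores[label]
        + (1.0 - audio_weight) * visual_scores[label]
        for label in audio_scores
    }
    best = max(fused, key=fused.get)
    return best, fused


# Made-up scores: the audio favors "ba", the lips favor "ga".
audio = {"ba": -1.0, "da": -1.2, "ga": -3.0}
visual = {"ba": -4.0, "da": -1.0, "ga": -0.9}

fused_label, fused = fuse_scores(audio, visual, audio_weight=0.5)
audio_only_label, _ = fuse_scores(audio, visual, audio_weight=1.0)
```

With equal weights the fused decision is "da", a class that neither modality ranks first on its own. With audio_weight=1.0 the result reduces to the audio-only decision "ba". In practice the weight would be tuned, for instance lowered when the audio is noisy.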