A Non-obtrusive Head Mounted Face Capture System
Thesis Committee:
Dr. George C. Stockman (Main Advisor)
Dr. Frank Biocca (Co-Advisor)
Dr. Charles Owen
Dr. Jannick Rolland (External Faculty)
Chandan K. Reddy
Master's Thesis Defense
Modes of Communication
Text only, e.g. mail, electronic mail
Voice only, e.g. telephone
PC camera based conferencing, e.g. web cam
Multi-user teleconferencing
Teleconferencing through virtual environments
Augmented reality based teleconferencing
Problem Definition
Face Capture System (FCS)
Virtual View Synthesis
Depth Extraction and 3D Face Modeling
Head Mounted Projection Displays
3D Tele-immersive Environments
High Bandwidth Network Connections
Thesis Contributions
Complete hardware setup for the FCS
Camera-mirror parameter estimation for the optimal configuration of the FCS
Generation of quality frontal videos from two side videos
Reconstruction of a texture mapped 3D face model from two side views
Evaluation mechanisms for the generated frontal views
Existing Face Capture Systems
Advantages: Freedom of head movement
Drawbacks: Obstruction of the user's field of view
Main applications: Character animation and mobile environments
Courtesy:
FaceCap3d, a product from Standard Deviation
Optical Face Tracker, a product from Adaptive Optics
Existing Face Capture Systems (Contd.)
Advantages: No burden on the user
Drawbacks: Highly equipped environments and restricted head motion
Main applications: Teleconferencing and collaborative work
Courtesy:
National Tele-immersion Initiative
Sea of Cameras (UNC Chapel Hill)
Proposed Face Capture System
A novel Face Capture System is being developed.
Two cameras capture the corresponding side views through the mirrors.
(F. Biocca and J. P. Rolland, “Teleportal face-to-face system”, Patent Filed, 2000.)
Advantages
User's field of view is unobstructed
Portable and easy to use
Gives accurate, high-quality face images
Can process in real time
Simple and user-friendly system
Static with respect to the human head
Flipping the mirror lets the cameras view the user's viewpoint
Applications
Mobile environments
Collaborative work
Multi-user teleconferencing
Medical areas
Distance learning
Gaming and entertainment industry
Others
Equipment Required
Hardware
  2 lipstick cameras
  2 lenses with focal length 12 mm
  2 mirrors with 1.5 inch diameter
  2 Matrox Meteor II standard cards
  Lighting equipment
  VGA to NTSC converter
  A projector
  A microphone
Software
  MIL-LITE 7.0
  Visual Studio 6.0
  Adobe Premiere 6.0
  Sound Recorder
Network
  Internet 2
  NAC 3000 MPEG Encoder
  NAC 4000 MPEG Decoder
Optical Layout
Three components to be considered
  Camera
  Mirror
  Human face
Specification Parameters
Camera
  Sensing area: 3.2 mm x 2.4 mm (1/4").
  Pixel dimensions: the sensed image is 768 x 494 pixels. The digitized image size is 320 x 240 due to restrictions of the RAM size.
  Focal length (Fc): 12 mm (VCL-12UVM).
  Field of view (FOV): 15.2° x 11.4°.
  Diameter (Dc): 12 mm
  F-number (Nc): 1, to admit the maximum amount of light.
  Minimum working distance (MWD): 200 mm.
  Depth of field (DOF): to be estimated
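The quoted field of view can be checked from the sensing area and the focal length. The sketch below assumes a simple thin-lens model focused at infinity, which is an assumption and not something the slides state; the function name is illustrative only.

```python
import math

def field_of_view_deg(sensor_mm: float, focal_length_mm: float) -> float:
    """Full angular field of view along one sensor dimension.

    Assumes a thin lens focused at infinity (illustrative model only).
    """
    return math.degrees(2.0 * math.atan(sensor_mm / (2.0 * focal_length_mm)))

# Values from the camera specification: 3.2 mm x 2.4 mm sensor, 12 mm lens.
horizontal_fov = field_of_view_deg(3.2, 12.0)
vertical_fov = field_of_view_deg(2.4, 12.0)
print(f"{horizontal_fov:.1f} x {vertical_fov:.1f} degrees")
```

Under this model the result agrees with the 15.2° x 11.4° figure to one decimal place.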
Specification Parameters (Contd.)
Mirror
  Diameter (Dm) / F-number (Nm)
  Focal length (fm)
  Magnification factor (Mm)
  Radius of curvature (Rm)
Human face
  Height of the face to be captured (H ~ 250 mm)
  Width of the face to be captured (W ~ 175 mm)
Distances
  Distance between the camera and the mirror (Dcm ~ 150 mm)
  Distance between the mirror and the face (Dmf ~ 200 mm)
Customization of Cameras and Mirrors
Off-the-shelf cameras
  Customizing a camera lens is a tedious task
  A trade-off has to be made between the field of view and the depth of field
  Sony DXC LS1 with a 12 mm lens is suitable for our application
Custom designed mirrors
  A plano-convex lens with 40 mm diameter is coated with black on the planar side
  The radius of curvature of the convex surface is 155.04 mm
  The thickness at the center of the lens is 5 mm
  The thickness at the edge is 3.7 mm
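The mirror dimensions are mutually consistent: for a spherical surface, the edge thickness of a plano-convex element is the center thickness minus the sagitta of the curved surface. The following is a small check under that geometric assumption; the function name is hypothetical.

```python
import math

def edge_thickness_mm(center_mm: float, radius_mm: float, diameter_mm: float) -> float:
    """Edge thickness of a plano-convex element with a spherical convex surface."""
    half_aperture = diameter_mm / 2.0
    # Sagitta: depth of the spherical cap over the given aperture.
    sag = radius_mm - math.sqrt(radius_mm ** 2 - half_aperture ** 2)
    return center_mm - sag

# Values from the mirror description: 5 mm center, R = 155.04 mm, 40 mm diameter.
edge = edge_thickness_mm(5.0, 155.04, 40.0)
print(f"edge thickness ~ {edge:.2f} mm")
```

The computed value is close to the stated 3.7 mm.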
Problem Statement
Generating a virtual frontal view from two side views
Data processing
Two synchronized videos are captured simultaneously in real time (30 frames/sec).
For effective capturing and processing, the data is stored in uncompressed format.
Machine specifications (Lorelei @ metlab.cse.msu.edu):
  Pentium III processor
  Processor speed: 746 MHz
  RAM size: 384 MB
  Hard disk write speed (practical): 9 MB/s
MIL-LITE is configured to use 150 MB of RAM
Data processing (Contd.)
Size of 1 second of video = 30 * 320 * 240 * 3 bytes = 6.59 MB
Using 150 MB of RAM, only about 10 seconds of video from two cameras can be captured
Why does the processing have to be offline?
  The calibration procedure is not automatic
  Disk writing speed must be at least 14 MB/s
  To capture 2 videos of 640 * 480 resolution, the disk writing speed must be at least 54 MB/s (?)
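The storage figures follow from simple arithmetic. The sketch below assumes 3 bytes per pixel (the factor of 3 in the slide, presumably 24-bit color) and 1 MB = 2^20 bytes, which is the convention that reproduces 6.59 MB; both are assumptions, and the names are illustrative.

```python
BYTES_PER_PIXEL = 3          # assumption: 24-bit color, matching the "* 3" factor
MIB = 1024 * 1024            # assumption: 1 MB taken as 2**20 bytes

def video_rate_mib_per_s(width: int, height: int, fps: int = 30, cameras: int = 1) -> float:
    """Uncompressed data rate in MiB per second."""
    return cameras * fps * width * height * BYTES_PER_PIXEL / MIB

one_camera = video_rate_mib_per_s(320, 240)               # about 6.59
two_cameras = video_rate_mib_per_s(320, 240, cameras=2)   # about 13.2
buffer_seconds = 150 / two_cameras                        # about 11.4 s in 150 MB
two_cameras_vga = video_rate_mib_per_s(640, 480, cameras=2)  # about 52.7

print(f"{one_camera:.2f} {two_cameras:.2f} {buffer_seconds:.1f} {two_cameras_vga:.1f}")
```

These raw rates sit just under the 14 MB/s and 54 MB/s thresholds quoted on the slide, which appear to be rounded up, and both exceed the 9 MB/s practical disk write speed of the capture machine.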
Structured Light technique
Projecting a grid on the frontal view of the face
A square grid in the frontal view appears as a quadrilateral (with curved edges) in the real side view
Color Balancing
Hardware based approach
White balancing of the cameras
Why is this more robust? Why not software based?
  There is no change in the input camera
  Better handling of varying lighting conditions
  No prior knowledge of the skin color is required
  No additional overhead
  It is enough if both cameras are color balanced relative to each other
Off-line Calibration Stage
[Diagram: Projector, Left Calibration Face Image, Right Calibration Face Image -> Transformation Tables]
Operational Stage
[Diagram: Left Face Image, Right Face Image + Transformation Tables -> Left Warped Face Image, Right Warped Face Image -> Mosaiced Face Image]
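The operational stage can be pictured as a table lookup followed by a merge. The sketch below is only an illustration under stated assumptions: each transformation table is taken to store, for every frontal-view pixel, the (row, column) of the side-image pixel to copy, with -1 marking pixels that the side view does not cover, and the mosaic simply prefers the left view where both are valid. The slides do not specify the table layout or the blending rule, and the function names are hypothetical.

```python
import numpy as np

def warp_with_table(side_image, table_rows, table_cols):
    """Warp a side image into the frontal view using a per-pixel lookup table.

    table_rows / table_cols hold the source pixel for each output pixel,
    or -1 where the side view provides no data (assumed convention).
    """
    valid = (table_rows >= 0) & (table_cols >= 0)
    out_shape = table_rows.shape + side_image.shape[2:]
    warped = np.zeros(out_shape, dtype=side_image.dtype)
    warped[valid] = side_image[table_rows[valid], table_cols[valid]]
    return warped, valid

def mosaic(left_warped, left_valid, right_warped, right_valid):
    """Merge the two warped halves; the left view wins where both are valid."""
    out = right_warped.copy()
    out[left_valid] = left_warped[left_valid]
    covered = left_valid | right_valid
    return out, covered
```

Because the tables are computed once in the off-line stage, the per-frame cost is a single indexed copy per camera, which is what makes this structure attractive for video.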
Comparison of the Frontal Views
First row: virtual frontal views
Second row: original frontal views
Video Synchronization (Eye blinking)
First row: virtual frontal views
Second row: original frontal views
Coordinate Systems
There are five coordinate systems in our application:
  World Coordinate System (WCS)
  Face Coordinate System (FCS)
  Left Camera Coordinate System (LCCS)
  Right Camera Coordinate System (RCCS)
  Projector Coordinate System (PCS)
Camera Calibration
Conversion from 3D world coordinates to 2D camera coordinates: Perspective Transformation Model
\[
\begin{bmatrix} s\,{}^{L}P_r \\ s\,{}^{L}P_c \\ s \end{bmatrix}
=
\begin{bmatrix}
c_{11} & c_{12} & c_{13} & c_{14} \\
c_{21} & c_{22} & c_{23} & c_{24} \\
c_{31} & c_{32} & c_{33} & 1
\end{bmatrix}
\begin{bmatrix} {}^{W}P_x \\ {}^{W}P_y \\ {}^{W}P_z \\ 1 \end{bmatrix}
\]
Eliminating the scale factor
\[ u_j = (c_{11} - c_{31} u_j)\, x_j + (c_{12} - c_{32} u_j)\, y_j + (c_{13} - c_{33} u_j)\, z_j + c_{14} \]
\[ v_j = (c_{21} - c_{31} v_j)\, x_j + (c_{22} - c_{32} v_j)\, y_j + (c_{23} - c_{33} v_j)\, z_j + c_{24} \]
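Each calibration point contributes two linear equations in the 11 unknown matrix entries, so six or more non-coplanar points allow a least-squares solution. The sketch below shows one way this could be set up with NumPy; it is not necessarily the solver used in the thesis, and the function names are illustrative.

```python
import numpy as np

def calibrate(world_pts, image_pts):
    """Least-squares estimate of the 3x4 camera matrix with c34 fixed to 1.

    world_pts: iterable of (x, y, z); image_pts: iterable of (u, v).
    Unknown order: c11 c12 c13 c14 c21 c22 c23 c24 c31 c32 c33.
    """
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        # u = c11 x + c12 y + c13 z + c14 - u (c31 x + c32 y + c33 z)
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        rhs.append(u)
        # v = c21 x + c22 y + c23 z + c24 - v (c31 x + c32 y + c33 z)
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        rhs.append(v)
    params, *_ = np.linalg.lstsq(np.array(rows, dtype=float),
                                 np.array(rhs, dtype=float), rcond=None)
    return np.append(params, 1.0).reshape(3, 4)

def project(camera_matrix, world_pt):
    """Apply the perspective model and divide out the scale factor s."""
    h = camera_matrix @ np.append(np.asarray(world_pt, dtype=float), 1.0)
    return h[:2] / h[2]
```

The residual between `project(C, p)` and the measured image point for each calibration point gives a direct measure of calibration error.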
Calibration sphere
A sphere can be used for calibration
Calibration points on the sphere are chosen in such a way that
  the azimuthal angle is varied in steps of 45°
  the polar angle is varied in steps of 30°
The location of these calibration points is known in the 3D coordinate system with respect to the origin of the sphere
The origin of the sphere defines the origin of the World Coordinate System
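The 3D locations of such calibration points follow from spherical coordinates. The sketch below assumes the polar angle is measured from the z axis, that the two poles are included as points, and a hypothetical radius of 100 mm; none of these details is given in the slides.

```python
import math

def sphere_calibration_points(radius, azimuth_step_deg=45, polar_step_deg=30):
    """World coordinates of calibration points, origin at the sphere center."""
    points = [(0.0, 0.0, radius), (0.0, 0.0, -radius)]  # poles (assumed included)
    for polar in range(polar_step_deg, 180, polar_step_deg):
        theta = math.radians(polar)
        for azimuth in range(0, 360, azimuth_step_deg):
            phi = math.radians(azimuth)
            points.append((radius * math.sin(theta) * math.cos(phi),
                           radius * math.sin(theta) * math.sin(phi),
                           radius * math.cos(theta)))
    return points

calibration_points = sphere_calibration_points(radius=100.0)  # radius is hypothetical
print(len(calibration_points))
```

With these steps there are 5 rings of 8 points plus the two poles; only the subset visible to a given camera or to the projector would be used for that device.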
Projector Calibration
Similar to camera calibration
2D image coordinates cannot be obtained directly from a 2D image.
A "blank image" is projected onto the sphere
The 2D coordinates of the calibration points on the projected image are noted
More points can be seen from the projector's point of view; some points are common to both camera views
Results appear to have slightly larger errors than the camera calibration
3D Face Model Construction
Why?
  To obtain different views of the face
  To generate the stereo pair to view it in the HMPD
Steps required
  Computation of 3D locations
  Customization of the 3D model
  Texture mapping
Computation of 3D points
3D point estimation using stereo
Stereo between the two cameras is not possible because of the occlusion by the facial features
Hence two stereo pair computations
  Left camera and projector
  Right camera and projector
Using stereo, compute 3D points of prominent facial feature points in the FCS
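Given two calibrated 3x4 matrices (for example, one camera and the projector) and the 2D coordinates of the same feature in both, the 3D point can be recovered by intersecting the two rays in a least-squares sense. The sketch below uses standard linear triangulation as an illustration; the thesis may use a different stereo formulation, and the function name is hypothetical.

```python
import numpy as np

def triangulate(matrix_a, uv_a, matrix_b, uv_b):
    """Least-squares 3D point from two views with 3x4 perspective matrices.

    For each view: (C_row1 - u * C_row3) . X = u * c34 - c14, and likewise for v.
    """
    rows, rhs = [], []
    for C, (u, v) in ((np.asarray(matrix_a, dtype=float), uv_a),
                      (np.asarray(matrix_b, dtype=float), uv_b)):
        rows.append(C[0, :3] - u * C[2, :3])
        rhs.append(u * C[2, 3] - C[0, 3])
        rows.append(C[1, :3] - v * C[2, :3])
        rhs.append(v * C[2, 3] - C[1, 3])
    point, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return point
```

Four equations constrain three unknowns, so measurement noise in either view is averaged rather than propagated directly.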
3D Generic Face Model
A generic face model with 395 vertices and 818 triangles
Left: front view. Right: side view.
Evaluation Schemes
Evaluation of facial expressions is not studied extensively in the literature
Evaluation can be done for facial alignment and face recognition for static images
Lip and eye movements in a dynamic event
Perceptual quality: how are the moods conveyed?
Two types of evaluation
  Objective evaluation
  Subjective evaluation
Objective Evaluation
Theoretical evaluation
No human feedback required
This evaluation can give us a measure of
  Face recognition
  Face alignment
  Facial movements
Methods applied
  Normalized cross-correlation
  Euclidean distance measures
Evaluation Images
5 frames were considered for objective evaluation
First row: virtual frontal views
Second row: original frontal views
Normalized Cross-Correlation
Regions considered for normalized cross-correlation
(Left: real image. Right: virtual image.)
Normalized Cross-Correlation (Contd.)
Let V be the virtual image and R be the real image
Let w be the width and h be the height of the images
The normalized cross-correlation between the two images V and R is given by
[Equation not preserved in this copy]
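Since the slide's formula is not available here, the sketch below shows one common form, the zero-mean normalized cross-correlation, as an assumption about what was used. It expects two grayscale regions of the same size w x h with non-constant intensity; the function name is illustrative.

```python
import numpy as np

def normalized_cross_correlation(virtual, real):
    """Zero-mean NCC of two equally sized image regions (assumed variant).

    Returns a value in [-1, 1]; 1 means identical up to gain and offset.
    """
    v = np.asarray(virtual, dtype=float)
    r = np.asarray(real, dtype=float)
    v = v - v.mean()
    r = r - r.mean()
    denominator = np.sqrt((v * v).sum() * (r * r).sum())
    return float((v * r).sum() / denominator)
```

The function would be applied separately to each region of interest (left eye, right eye, mouth, complete face) cropped from the real and virtual frames.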
Normalized Cross-Correlation (Contd.)
Video frame   Left eye   Right eye   Mouth   Eyes + Mouth   Complete face
Frame 1       0.988      0.987       0.993   0.989          0.989
Frame 2       0.969      0.972       0.985   0.978          0.985
Frame 3       0.969      0.967       0.992   0.978          0.986
Frame 4       0.991      0.989       0.993   0.990          0.990
Frame 5       0.985      0.986       0.992   0.988          0.989
Euclidean Distance measures
The Euclidean distance between two points i and j with image coordinates (x_i, y_i) and (x_j, y_j) is given by
\[ d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \]
Let R_ij be the Euclidean distance between two points i and j in the real image
Let V_ij be the Euclidean distance between two points i and j in the virtual image
D_ij = | R_ij - V_ij |
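The measure D_ij can be computed directly from landmark coordinates. The sketch below assumes landmarks are stored as 2D image coordinates keyed by single-letter labels; the coordinates in the usage example are made up for illustration and are not data from the thesis.

```python
import math

def distance_differences(real_pts, virtual_pts, pairs):
    """D_ij = |R_ij - V_ij| for each landmark pair (i, j)."""
    differences = {}
    for i, j in pairs:
        r_ij = math.dist(real_pts[i], real_pts[j])        # distance in the real image
        v_ij = math.dist(virtual_pts[i], virtual_pts[j])  # distance in the virtual image
        differences[i + j] = abs(r_ij - v_ij)
    return differences

# Hypothetical landmark positions, for illustration only.
real = {"a": (10.0, 20.0), "f": (13.0, 24.0)}
virtual = {"a": (10.0, 20.0), "f": (16.0, 28.0)}
example = distance_differences(real, virtual, [("a", "f")])
print(example)
```

Because only distances between landmarks are compared, the measure is insensitive to a global shift between the real and virtual images.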
Euclidean Distance measures (Contd.)
Frames    D_af   D_bf   D_cf   D_cg   D_dg   D_eg   Error
Frame 1   2.00   0.80   4.15   3.49   2.95   3.46   2.80
Frame 2   0.59   3.00   0.79   4.91   0.63   0.80   1.79
Frame 3   1.88   3.84   4.29   4.34   2.68   1.83   3.14
Frame 4   1.09   2.97   2.10   6.33   3.01   4.08   3.36
Frame 5   1.62   2.21   5.57   4.99   1.24   1.90   2.92
Subjective Evaluation
Evaluates human perception
Measurement of the quality of a talking face
Factors that might affect it
  Quality of the video
  Facial movements and expressions
  Synchronization of the two halves of the face
  Color and texture of the face
  Quality of audio
  Synchronization of audio
A preliminary study has been made to assess the quality of the generated videos
Conclusion and Future Work
Virtual Frontal Image
Texture Mapped 3D Face Model
Virtual Frontal Video
3D Facial Animation
Conclusion Future Work
Summary
Design and implementation of a novel Face Capture System
Generation of a virtual frontal view from two side views in a video sequence
Extraction of depth information using the stereo method
Texture mapped 3D face model generation
Evaluation of virtual frontal videos
Future Work
Online processing in real time
Automatic calibration
3D facial animation
Subjective evaluation of the virtual frontal videos
Data compression during processing and transmission
Customization of camera lenses
Integration with a Head Mounted Projection Display