3D Fingertip and Palm Tracking in Depth Image Sequences
Hui Liang, Junsong Yuan and Daniel Thalmann
Proceedings of the 20th ACM international conference on Multimedia 2012
Outline
• Introduction
• Related Work
• Proposed Method
• Experimental Results
• Conclusion
Introduction
Introduction
• The human hand is an essential body part for human-computer interaction.
• The positions of tracked fingertips can be used for hand pose estimation.
• Difficulties in fingertip tracking:
(Figure: three difficult cases, labeled "Side-by-side", "Bending" and "Nearby")
Introduction
• Many previous methods:
  • Only focus on extracting 2D fingertips
  • Cannot track fingertips robustly for a freely moving hand
• In this paper:
  • Present a robust fingertip and palm tracking scheme
  • Take depth images (Kinect) as input
  • Track the 3D fingertip positions accurately
Related Work
Related Work
• Focus only on 2D fingertips: [4][5][6][9]
• Based on contour analysis of the extracted hand region: [2][4][5][6]
• Usually can track the fingertips of stretched fingers only.
Related Work
• In [6]:
  • Fingertips are tracked in infrared image sequences.
  • A template matching strategy is used.
  • Fingertip tracking: Kalman filter
• In [2]:
  • Stereoscopic vision is adopted.
  • Maximize the distance between the center of gravity of the hand and the boundary.
Related Work
• In [9] (Kinect):
(Figure: processing stages labeled "Depth < Threshold", "Circular filter" and "Minimum depth")
Related Work
• [2] S. Conseil, S. Bourennane, and L. Martin. Three dimensional fingertip tracking in stereovision, 2005. Proc. of the 7th Int’l Conf. on Advanced Concepts for Intelligent Vision Systems.
• [4] K. Hsiao, T. Chen, and S. Chien. Fast fingertip positioning by combining particle filtering with particle random diffusion, 2008. Proc. IEEE Int’l Conf. on Multimedia and Expo.
• [5] I. Katz, K. Gabayan, and H. Aghajan. A multi-touch surface using multiple cameras, 2007. Proc. of the 9th Int’l Conf. on Advanced Concepts for Intelligent Vision Systems.
• [6] K. Oka, Y. Sato, and H. Koike. Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems, 2002. Proc. IEEE Int’l Conf. on Automatic Face and Gesture Recognition.
• [9] J. L. Raheja, A. Chaudhary, and K. Singal. Tracking of fingertips and centres of palm using Kinect, 2011. Proc. of the 3rd Int’l Conf. on Computational Intelligence, Modelling and Simulation.
Proposed Method
Overview
Hand and Palm Detection
• 1) Assume the hand is the nearest object to the camera
• 2) Constrain global hand rotation by:
• : global rotation angle of the hand
Foreground Segmentation → Palm Localization → Hand Segmentation
Hand and Palm Detection: Foreground Segmentation
• Threshold the depth frame to obtain the foreground F:
  $F = \{\, p \mid z(p) < z_0 + z_D \,\}$
• p : a pixel coordinate
• z(p) : depth value at pixel p
• z0 : the minimum depth value
• zD : depth threshold (0.2 m in the figure)
(Figure: the segmented foreground F)
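The thresholding step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the depth frame is a 2D list of metres, that a value of 0 marks an invalid reading (a sensor-encoding assumption not stated in the slides), and that the threshold is a strict inequality. The function name `segment_foreground` is hypothetical.

```python
def segment_foreground(depth, z_d=0.2):
    """Return a binary mask of pixels closer than z0 + z_d.

    depth: 2D list of depth values in metres; 0 marks an invalid reading
    (an assumption about the sensor encoding, not taken from the slides).
    """
    valid = [z for row in depth for z in row if z > 0]
    if not valid:
        return [[0 for _ in row] for row in depth]
    z0 = min(valid)  # the hand is assumed to be the nearest object
    return [[1 if 0 < z < z0 + z_d else 0 for z in row] for row in depth]


depth = [
    [0.00, 1.50, 1.50, 1.50],
    [0.62, 0.60, 0.75, 1.50],
    [0.61, 0.65, 0.85, 1.50],
]
mask = segment_foreground(depth)
```

Here z0 is 0.60 m, so only pixels closer than 0.80 m are kept as foreground.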
Hand and Palm Detection: Palm Localization
• The palm region is approximated with a circle Cp = (pp, rp):
  • pp : the palm center
  • rp : the radius
• Assume that the palm forms the globally largest blob in F.
• Cp equals the largest inscribed circle of the contour of F.
• A 2D Kalman filter is used to reduce computational complexity.
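One way to picture the largest inscribed circle is as the foreground pixel farthest from any background pixel. The sketch below is a brute-force illustration under that reading, not the method used in the paper. It treats everything outside the image as background, and the function name `largest_inscribed_circle` is hypothetical. A distance transform would give the same result far more efficiently on real frames.

```python
import math


def largest_inscribed_circle(mask):
    """Return ((row, col), radius) of the largest circle inside the foreground.

    Brute force: the radius at a foreground pixel is its distance to the
    nearest background pixel; a one-pixel ring outside the image counts
    as background.
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(-1, h + 1) for c in range(-1, w + 1)
                  if not (0 <= r < h and 0 <= c < w) or mask[r][c] == 0]
    best_center, best_radius = None, 0.0
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 0:
                continue
            radius = min(math.hypot(r - br, c - bc) for br, bc in background)
            if radius > best_radius:
                best_center, best_radius = (r, c), radius
    return best_center, best_radius


# Toy foreground: a 5x5 "palm" blob with a thin two-pixel "finger" attached.
mask = [[0] * 9 for _ in range(7)]
for r in range(1, 6):
    for c in range(1, 6):
        mask[r][c] = 1
mask[3][6] = mask[3][7] = 1
center, radius = largest_inscribed_circle(mask)
```

The thin finger does not pull the center away from the blob, which is the reason for approximating the palm by an inscribed circle.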
Hand and Palm Detection: Hand Segmentation
• Separate the hand from the forearm by a line that is:
  • 1) Tangent to Cp
  • 2) Perpendicular to the orientation of the forearm
• Orientation of the forearm: the eigenvector that corresponds to the largest eigenvalue of the covariance matrix of the contour pixel coordinates of F
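The eigenvector step has a closed form for 2D points, which the following sketch uses. It is an illustration only: the helper names `principal_axis` and `is_hand_pixel` are hypothetical, and the cut test assumes the direction vector has already been oriented to point from the palm toward the forearm (an eigenvector's sign is arbitrary, and the slides do not say how it is resolved).

```python
import math


def principal_axis(points):
    """Unit eigenvector for the largest eigenvalue of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # For a symmetric 2x2 matrix the principal direction has angle
    # 0.5 * atan2(2 * sxy, sxx - syy).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def is_hand_pixel(q, palm_center, radius, forearm_dir):
    """True if q lies on the hand side of the line tangent to the palm circle.

    forearm_dir: unit vector assumed to point from the palm toward the forearm.
    The cutting line is perpendicular to it and touches the circle.
    """
    proj = ((q[0] - palm_center[0]) * forearm_dir[0]
            + (q[1] - palm_center[1]) * forearm_dir[1])
    return proj <= radius


ux, uy = principal_axis([(0, 0), (1, 2), (2, 4), (3, 6)])  # points on y = 2x
```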
Hand region : FV(2D) → FD(3D)
Finger Detection and Tracking
• Constraints on possible fingertip locations:
  • 1) Fingertips lie only in depth-discontinuous regions (on the contour of FV)
  • 2) The depth differences between a point and its neighbors, |Depth(point) − Depth(neighbors)|, are important
  • 3) Utilize the 3D geodesic shortest path (GSP)
(Figure panels: "Fingertip vs. Non-fingertip", "Nearby Fingertips")
(Diagram blocks: "Initialization and Re-initialization", "Fingertip Tracking")
Fingertip Detection
(Section tabs: Fingertip detection, Fingertip tracking)
• Goal: detect all five fingertips in the depth image
• Based on three depth-based features
• Build a graph Gh = (Vh, Eh):
  • Vh : contains all points within FV (hand contour)
  • Eh : an edge joins each pair of vertices (p, q) such that 1) they are in the 8-neighborhood of each other and 2) their 3D distance is within a threshold τ
  • Edge weight: 3D Euclidean distance
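The graph construction above can be sketched as an adjacency list. This is a minimal illustration under assumed conventions: hand pixels are stored in a dict from (row, col) to a 3D point in metres, and the name `build_hand_graph` and the sample value of τ are hypothetical.

```python
import math


def build_hand_graph(points3d, tau):
    """Build the hand graph as an adjacency dict.

    points3d: dict mapping pixel (row, col) -> (x, y, z) in metres, hand pixels only.
    Returns pixel -> list of (neighbour pixel, edge weight).
    """
    graph = {p: [] for p in points3d}
    for (r, c), a in points3d.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q == (r, c) or q not in points3d:
                    continue  # skip the pixel itself and non-hand pixels
                d = math.dist(a, points3d[q])  # 3D Euclidean distance
                if d <= tau:
                    graph[(r, c)].append((q, d))
    return graph


points = {
    (0, 0): (0.000, 0.0, 0.60),
    (0, 1): (0.002, 0.0, 0.60),
    (0, 2): (0.004, 0.0, 0.90),  # adjacent in the image, far away in depth
}
graph = build_hand_graph(points, tau=0.01)
```

The third pixel is an image neighbor of the second but receives no edge, which is how the distance threshold keeps overlapping fingers from being connected across a depth jump.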
Fingertip Detection
• Calculate the geodesic distance dg(p):
  • From the palm center pp to each vertex p in Vh
  • Dijkstra graph search on Gh
• GSP point set Ug(p):
  • The set of vertices on the shortest path from pp to p
• Rectangle local feature RL(p):
  • Describes the neighborhood of a point p in FV
  • η(p) : ratio of 1s
(Figure: a binary neighborhood mask of 0s and 1s around a point X, with a 1 cm scale mark)
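The geodesic distance and the GSP point set both come out of a single Dijkstra search if parent pointers are recorded. The sketch below shows this on a toy graph; it assumes the adjacency-dict representation (vertex -> list of (neighbor, weight)), and the function names are hypothetical.

```python
import heapq


def geodesic_distances(graph, source):
    """Dijkstra search; returns (dist, parent) for vertices reachable from source."""
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def gsp_points(parent, p):
    """Vertices on the shortest path from the source to p, source first."""
    path = []
    while p is not None:
        path.append(p)
        p = parent[p]
    return path[::-1]


toy = {
    "palm": [("a", 1.0), ("b", 4.0)],
    "a": [("palm", 1.0), ("b", 1.5), ("tip", 5.0)],
    "b": [("palm", 4.0), ("a", 1.5), ("tip", 1.0)],
    "tip": [("a", 5.0), ("b", 1.0)],
}
dist, parent = geodesic_distances(toy, "palm")
path = gsp_points(parent, "tip")
```

Vertices that are unreachable from the palm center simply do not appear in `dist`.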
Fingertip Detection
• Calculate the geodesic distance dg(p):
(Figure: plots labeled dg(p) and η(p), with the value 0.4 marked)
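The "ratio of 1s" behind η(p) can be illustrated with a window count. The exact definition of the rectangle feature did not survive in these slides, so this sketch makes its own assumptions: the 1s are taken to be hand-mask pixels inside a rectangle centered on p, pixels outside the image count as 0, and the window size is arbitrary. The name `rectangle_ratio` is hypothetical.

```python
def rectangle_ratio(mask, center, half_h, half_w):
    """Fraction of 1s in the (2*half_h+1) x (2*half_w+1) window around center.

    Pixels outside the image count as 0 (an assumption of this sketch).
    """
    r0, c0 = center
    ones = total = 0
    for r in range(r0 - half_h, r0 + half_h + 1):
        for c in range(c0 - half_w, c0 + half_w + 1):
            total += 1
            if 0 <= r < len(mask) and 0 <= c < len(mask[0]):
                ones += mask[r][c]
    return ones / total


hand = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
eta_tip = rectangle_ratio(hand, (1, 2), 1, 1)       # protruding point
eta_interior = rectangle_ratio(hand, (3, 2), 1, 1)  # point inside the region
```

Under these assumptions a protruding point has a low ratio and an interior point a ratio close to 1, which is the intuition for using the ratio to separate fingertip candidates from other points.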
Fingertip Detection
• Fingertip labeling:
  • Estimate the probability that fingertip i has the label lj
  • i : fingertip
  • j : label
(Figure: 5 × 5 grid of fingertips i = 1..5 against labels j = 1..5; plot of the number of GSP points against frame number; Nmax = 6)
Fingertip Detection
• Fingertip labeling:
Fingertip Tracking
• Build a particle filter for each fingertip
• (x, ω) denotes a particle:
  • x : 2D position in FV
  • ω : the particle weight
• Constrain the position of each particle to the border point set UB to reduce the search space
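A single update of such a particle filter might look like the sketch below. It is a generic resample, diffuse and reweight loop, not the authors' tracker: the Gaussian diffusion, the snapping of each particle to its nearest border point, the `sigma` value and the toy likelihood are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def particle_filter_step(particles, border_points, likelihood, sigma=2.0, rng=random):
    """One resample-diffuse-reweight step.

    particles: list of (x, w) with x a 2D pixel position and w a weight.
    border_points: allowed positions (the border point set).
    likelihood: callable mapping a position to a non-negative score.
    """
    positions = [x for x, _ in particles]
    weights = [w for _, w in particles]
    # Resample in proportion to the previous weights.
    resampled = rng.choices(positions, weights=weights, k=len(particles))
    moved = []
    for px, py in resampled:
        # Diffuse with Gaussian noise, then snap to the nearest border point.
        qx, qy = px + rng.gauss(0.0, sigma), py + rng.gauss(0.0, sigma)
        moved.append(min(border_points,
                         key=lambda b: math.hypot(b[0] - qx, b[1] - qy)))
    scores = [likelihood(x) for x in moved]
    total = sum(scores)
    if total == 0:
        return [(x, 1.0 / len(moved)) for x in moved]
    return [(x, s / total) for x, s in zip(moved, scores)]


def estimate(particles):
    """Weighted mean position of the particle set."""
    return (sum(x[0] * w for x, w in particles),
            sum(x[1] * w for x, w in particles))


rng = random.Random(0)
border = [(0, c) for c in range(20)]
target = (0, 12)  # toy "true" fingertip position


def toy_likelihood(x):
    return math.exp(-((x[0] - target[0]) ** 2 + (x[1] - target[1]) ** 2) / 8.0)


particles = [(b, 1.0 / len(border)) for b in border]
for _ in range(10):
    particles = particle_filter_step(particles, border, toy_likelihood, rng=rng)
row, col = estimate(particles)
```

Restricting particles to the border set means the likelihood is only ever evaluated at contour positions, which is the search-space reduction the slide refers to.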
Fingertip Tracking
• Likelihood function:
(Equation annotations: metric parameters; geodesic distance difference; GSP points, Hausdorff difference; neighbor depth, feature difference)
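The likelihood equation itself is not recoverable from these slides, only the labels of its terms. As a rough illustration of how such terms could be combined, the sketch below computes a symmetric Hausdorff distance between two GSP point sets and feeds it, with the other two differences, into an exponential of a weighted sum. That exponential form and the `lambdas` weights are assumptions of this sketch, not the formula from the paper.

```python
import math


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(s, t):
        return max(min(math.dist(p, q) for q in t) for p in s)
    return max(directed(a, b), directed(b, a))


def likelihood(dg_diff, gsp_ref, gsp_obs, feat_diff, lambdas=(1.0, 1.0, 1.0)):
    """Hypothetical likelihood: exp of minus a weighted sum of three differences.

    dg_diff: geodesic distance difference.
    gsp_ref, gsp_obs: GSP point sets compared by Hausdorff distance.
    feat_diff: neighbor-depth feature difference.
    lambdas: stand-ins for the slide's "metric parameters".
    """
    l1, l2, l3 = lambdas
    cost = l1 * abs(dg_diff) + l2 * hausdorff(gsp_ref, gsp_obs) + l3 * abs(feat_diff)
    return math.exp(-cost)


ref = [(0, 0), (1, 0)]
obs = [(0, 0), (1, 0), (1, 3)]
h = hausdorff(ref, obs)
```

A candidate whose path, distance and local depth feature all match the reference gets a likelihood of 1, and the score decays as any of the three differences grows.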
Fingertip Tracking
• Likelihood function:
Experimental Results
Experimental Results
• Quantitative results on synthetic sequences:

Seq. No.   Motion
Seq. 1     grasping
Seq. 2     adduction/abduction
Seq. 3     successive single finger
Seq. 4     flexion
Seq. 5     global rotation
Seq. 6     combination of grasping and global rotation
Experimental Results
• Virtual object grasping:
Conclusion
Conclusion
• Using multiple depth-based features for accurate fingertip localization
• Adopting a particle filter to track the fingertips over successive frames
• Track the 3D positions of fingertips robustly
• Great potential for extension to other HCI applications