Vis Comput (2017) 33:993–1003
DOI 10.1007/s00371-017-1400-y

ORIGINAL ARTICLE

Toward accurate real-time marker labeling for live optical motion capture

Shihong Xia^1 · Le Su^1,2 · Xinyu Fei^1,2 · Han Wang^1,2

Published online: 15 May 2017
© The Author(s) 2017. This article is an open access publication

Abstract Marker labeling plays an important role in the optical motion capture pipeline, especially in real-time applications; however, the accuracy of online marker labeling is still unclear. This paper presents a novel, accurate real-time online marker labeling algorithm that simultaneously deals with missing and ghost markers. We first introduce a soft graph matching model that automatically labels the markers by using the Hungarian algorithm to find the globally optimal matching. The key idea is to formulate the problem in a combinatorial optimization framework. The objective function minimizes the matching cost, which simultaneously measures the difference of markers in the model and data graphs as well as their local geometrical structures consisting of edge constraints. To achieve high subsequent marker labeling accuracy, which may be influenced by limb occlusions or self-occlusions, we also propose an online high-quality full-body pose reconstruction process to estimate the positions of missing markers. We demonstrate the power of our approach by capturing a wide range of human movements and achieve state-of-the-art accuracy compared against alternative methods and a commercial system like VICON.

Electronic supplementary material The online version of this article (doi:10.1007/s00371-017-1400-y) contains supplementary material, which is available to authorized users.

✉ Shihong Xia
[email protected]

1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

2 University of Chinese Academy of Sciences, Beijing, China

Keywords Motion capture · Marker labeling · Graph matching · Point correspondence

1 Introduction

Motion capture technology has been widely used to create natural human animations in real-time live applications such as virtual training, virtual prototyping, computer games and computer-animated puppetry [1]. Passive optical motion capture systems, like VICON [2], are used in most applications because of their high precision and low intrusion. However, such a system only records 3D marker positions without any physical meaning (unlabeled). In addition, markers may often disappear and/or re-appear during the motion sequence due to limb occlusion or self-occlusion, which makes marker labeling for a live motion capture sequence a significant challenge.

The goal of a practical marker labeling method is to (1) solve the correspondence problem for moving markers while (2) providing a solution to deal with missing and/or ghost markers, which would otherwise lead to motion reconstruction ambiguities. Unlike marker labeling methods that operate offline [3], we mainly aim to achieve the second goal, especially when both accuracy and efficiency need to be considered in real-time live applications [4] and interactive applications [5].

In this paper, we present a novel online marker labeling approach based on a graph matching model and a human pose reconstruction process to produce accurate and efficient marker labeling results for real-time live applications with missing/ghost markers, as illustrated in Fig. 1. Specifically, by regarding the labeled markers at the previous frame and the unlabeled markers at the current frame as the model graph and the data graph, respectively, we formulate the marker labeling problem as soft graph matching, an essentially combinatorial optimization problem solved by the Hungarian algorithm to achieve high efficiency. In order to achieve high labeling accuracy, we also design a nonlinear optimization process to estimate the positions of missing markers.

Fig. 1 Online marker labeling process overview. Given previously labeled marker data (left three columns), our approach automatically labels the raw marker data (top right image) captured by the motion capture system

We demonstrate the power of our approach by comparing against alternative state-of-the-art methods and a commercial system (VICON) on a wide range of motion capture data with missing/ghost markers. First, we show superior accuracy and efficiency on single-subject motion sequences and two-subject interactive motion sequences (Sect. 5.1). Then, we show superior pose reconstruction accuracy on single-subject motion sequences (Sect. 5.2). Finally, we show accurate marker labeling results to demonstrate the capability of handling ghost markers as well as facial motions without rigid constraints (Sect. 5.3). Due to the page limitation, please see the supplementary video for more evaluation results. Note that, since we focus only on correctly solving the marker labeling problem, not motion denoising [4] or missing marker estimation [6,7], we only show evaluations against alternative marker labeling methods.

In summary, our main contributions are: (1) a novel accurate and efficient marker labeling process that runs in a real-time live manner; (2) a soft graph matching model that automatically labels the markers in successive frames by using the Hungarian algorithm to find the globally optimal matching solution.

2 Related work

Our online marker labeling method is related to point correspondence and graph matching methods.

2.1 Point correspondence

Yu [8] proposed an online tracking framework for multiple interacting subjects that constructs a motion model to find the best marker correspondences; it is a greedy algorithm that can get stuck in a local optimum and requires at least two visible markers on the same limb. Similar to [8], we also use a tracking framework, but we introduce a soft graph matching model instead of using example data [8] to improve the labeling accuracy.

Li [9] proposed a self-initializing identification labeling method on each segment for establishing local segmental correspondences. Li [10] designed a similarity k-d tree to identify markers from similar poses of two objects, but it cannot deal with missing data. Li [11] integrated key-frame-based self-initializing hierarchical segmental matching [9] with inter-frame tracking to label an articulated motion sequence represented by feature points, which is an offline approach.

Mundermann's articulated ICP algorithm with soft-joint constraints [12] is used to track limbs from dense images. In our case, full-body and facial motions are represented by sparse 3D points (markers) attached to the subjects' skin. When markers are missing, which happens frequently during motion capture, it is almost impossible for ICP-based methods [13,14] to find the marker correspondences in successive frames. Others [15,16] formulate dense points into lines, curves or surfaces to obtain non-rigid transformations. The necessary spatial data continuity is again not available in the case of sparse points [11].

Probabilistic inference with point topology is used to find point correspondences for 2D non-rigid points [17,18] and 3D dense surface points [19,20]. Different from these, we propose a soft graph matching model with a discrete combinatorial optimization algorithm to find sparse 3D marker correspondences in an online manner.

2.2 Graph matching

Graph matching plays a central role in solving correspondence problems. According to whether the graph edges are taken into account or not, graph matching can be divided into two categories: unary graph matching and binary graph matching.

Unary graph matching treats each node independently, discarding the relationships between nodes. This model was used by Veenman [21] to solve the point tracking problem; they proposed an adaptable framework that can be used in conjunction with a variety of cost functions [22,23] and can be solved by Hungarian algorithm optimization [24].

Binary graph matching considers both node and edge attributes. The problem is non-polynomial, and a lot of effort has been made in finding good approximate solutions [25]. Probably the fastest approximate solution is presented in [26], where the authors propose an efficient spectral method. After relaxing the mapping constraints and the integral constraints, the principal eigenvector of the matching cost matrix is interpreted as the confidence of the assignments. The assignment with the maximum confidence that is consistent with the constraints is accepted as a correct assignment. But as stated in [27,28], the correspondence accuracy of this method is not very satisfactory. Torresani [28] applies a "dual decomposition" approach that decomposes the original problem into simpler sub-problems, which are repeatedly solved independently and combined into a global solution. The authors claim that it is the first technique capable of reaching global optimality on various real-world image matching problems and that it outperforms existing graph matching algorithms. In practice, however, their method needs several seconds to process a picture with 30 nodes.

Existing graph matching methods cannot solve our problem with both accuracy and efficiency. Unary graph matching is efficient but inaccurate due to its neglect of edge constraints. Binary graph matching is more accurate because it takes both motion smoothness and edge constraints into account, but it is too complex to solve optimally in real time. In this paper, we take the advantages of both unary and binary graph models and present a soft graph matching model that merges the matching cost of the local geometrical structure, consisting of edges, into the matching cost of the graph nodes, achieving high accuracy and efficiency simultaneously.

3 Soft graph matching

We define the labeled marker set at the previous frame as the model graph G1 = (V1, E1) and the unlabeled marker set at the current frame as the data graph G2 = (V2, E2). V1 = {m_i : i = 1, ..., M} and V2 = {u_j : j = 1, ..., N} are the node (marker) sets, and E1 and E2 are the edge sets. m_i and u_j are labeled and unlabeled 3D marker positions, respectively. We connect two markers m_i and m_j by an edge if they are neighbors on the same limb and their relative position stays fixed over time; we call these local rigid constraints.

We assume the number of markers in V1 and V2 is the same, i.e., M = N. In the case of M ≠ N, which is caused by missing and ghost markers, we add dummy markers to V1 and V2 to make the condition hold. The matching cost related to the dummy markers is set to the maximum cost w_max.

Let {φ_ij : i = 1, ..., M; j = 1, ..., N} denote all possible matches between the model and data graphs, and {c_ij : i = 1, ..., M; j = 1, ..., N} be their matching costs. Let L be the correct labeling, which is a set of marker matches. x_ij is an indicator variable equal to 1 if φ_ij ∈ L, and 0 otherwise.

Our method considers a marker and its local geometrical structure simultaneously. The edges starting from a marker make up its local geometrical structure. Thus, the marker labeling problem can be formulated as the following soft graph matching.

min_x cost(x) = ∑_a [ω_p c_p(a) x_a + (1 − ω_p) c_lg(a) x_a]   (1)

c_p(a) = ‖m_1 − u_1‖²   (2)

where c_p and c_lg are the matching costs of a point and of its local geometrical structure, respectively, ω_p is a weight, and a is any possible match (i, j). We use the classic Hungarian algorithm to solve the above combinatorial optimization problem. In our experiments, we set ω_p = 0.5.

Figure 2 explains how to calculate the matching cost of a marker correspondence and its local geometrical structure. Let φ_a denote the correspondence of u_1 to m_1. The cost of the point match caused by φ_a is defined as the spatial distance in Eq. 2.

As we can see in Fig. 2, m_1 and m_2 are connected by an edge m_1 m_2. If u_1 is matched to m_1 and u_3 is matched to m_2, the relative position between u_1 and u_3 must meet the edge constraint of m_1 and m_2. We use c_e(m_i m_j, u_i′ u_j′) to denote the matching cost of an edge. Let m_j denote the markers connected with m_1 and u_j′ the candidate assignments of m_j; the local geometrical matching cost of φ_a is defined as:

c_lg(a) = (1/|m_j|) ∑_{m_j} min_{u_j′} c_e(m_1 m_j, u_1 u_j′)   (3)

Fig. 2 An example of the soft graph matching model, consisting of the model and data graphs. Markers m_1, ..., m_4 are nodes in the model graph at the previous frame, and unlabeled markers u_1, ..., u_7 are nodes in the data graph at the current frame. Markers within dashed circles are selected as candidate assignments. The involved edge matching costs are shown in Table 1


Table 1 The matching cost of edges related to φ_a

      m2                  m3                  m4
u3    ce(m1m2, u1u3)
u4                        ce(m1m3, u1u4)
u5                        ce(m1m3, u1u5)
u6                                            ce(m1m4, u1u6)
u7                                            ce(m1m4, u1u7)

The costs in the same column all relate to the same edge in the model graph, so their minimum is selected as the matching cost of that edge. The matching costs of the different edges are then averaged to form the local geometrical matching cost of φ_a

where |m_j| is the number of markers connected with m_1, and c_e(m_1 m_j, u_1 u_j′) is defined as:

c_e(m_1 m_j, u_1 u_j′) = (‖u_1 − u_j′‖ − d_{m1 mj})² + ω_a (1 − ((m_1 − m_j) · (u_1 − u_j′)) / (‖m_1 − m_j‖ ‖u_1 − u_j′‖))²   (4)

where d_{m1 mj} is the distance between m_1 and m_j, which is obtained from the previous frame and updated over time. The first term in Eq. 4 measures the difference in length between the two edges, and the second term the difference in their direction. The inconsistency between the unit of length and the unit of cosine angle is compensated by ω_a. In our experiments, we set ω_a = 1e4. The matching costs of the different edges are then averaged to form the local geometrical matching cost of φ_a.
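As a minimal sketch, the edge cost of Eq. 4 can be written as a small function; the function name, argument order and the example ω_a value are illustrative assumptions, not part of the paper:

```python
import numpy as np

def edge_cost(m1, mj, u1, uj, d_m1mj, omega_a=1e4):
    """Matching cost of mapping model edge (m1, mj) to data edge (u1, uj),
    per Eq. 4: squared length difference plus an omega_a-weighted squared
    direction difference (1 minus the cosine of the angle between edges)."""
    len_u = np.linalg.norm(u1 - uj)
    length_term = (len_u - d_m1mj) ** 2
    cos_angle = np.dot(m1 - mj, u1 - uj) / (np.linalg.norm(m1 - mj) * len_u)
    return length_term + omega_a * (1.0 - cos_angle) ** 2
```

An identical, unstretched edge yields zero cost; stretching or rotating the data edge raises the cost through the first or second term, respectively.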

Letting φ_a = φ_ij, our soft graph matching model is defined as follows:

min_x cost(x) = ∑_i ∑_j w_ij x_ij,   where

w_ij = ω_p c_p(a) + (1 − ω_p) c_lg(a)   if j ∈ b(i)
w_ij = w_max                            if j ∉ b(i)

s.t. x_ij ∈ {0, 1},  ∑_i x_ij = 1 for all j,  ∑_j x_ij = 1 for all i   (5)

where w_max is an experimentally defined maximum cost, and the candidate matches b(i) of marker i are the set of markers whose distance to the marker position predicted by a Kalman filter [30] is less than a specific threshold. We use the Hungarian algorithm to find the best matching, which is very fast because the matching cost is only computed for the selected candidate assignments of each marker.
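The assignment step of Eq. 5 can be sketched with SciPy's `linear_sum_assignment`, a Hungarian-style solver. This assumes the cost matrix has already been filled with w_max for non-candidate pairs (j ∉ b(i)); the function name and the w_max placeholder value are ours, not the paper's:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

W_MAX = 1e6  # placeholder for the experimentally defined maximum cost

def label_markers(cost, w_max=W_MAX):
    """Solve the soft graph matching of Eq. 5 for a possibly rectangular
    cost matrix (M model markers x N data markers). Dummy rows/columns
    with cost w_max pad the matrix square, absorbing missing and ghost
    markers; the solver then returns a one-to-one assignment."""
    M, N = cost.shape
    size = max(M, N)
    padded = np.full((size, size), w_max)
    padded[:M, :N] = cost
    rows, cols = linear_sum_assignment(padded)
    # keep only real (non-dummy) assignments
    return [(i, j) for i, j in zip(rows, cols) if i < M and j < N]
```

With more data markers than model markers, the extra (ghost) markers end up matched to dummy rows and are discarded.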

As our marker labeling method targets real-time live applications, we use a simple method to automatically label all markers at the first frame. Specifically, we first instruct the subjects to perform their motions starting from a T-pose with all markers visible. Then, based on prior knowledge of the current subject's skeleton T-pose model and the marker offsets relative to the inboard joints, we perform a nonlinear optimization that fits the model to the captured markers at the first frame by minimizing the distances between the markers on the model and the captured markers.
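A simplified stand-in for this first-frame fit, assuming only a rigid alignment rather than the paper's full joint-angle optimization, is the closed-form Kabsch least-squares alignment of the T-pose model markers to the captured markers (function name and setup are ours):

```python
import numpy as np

def fit_rigid(model_pts, captured_pts):
    """Least-squares rigid alignment (Kabsch) of T-pose model markers to
    the captured first-frame markers. Returns rotation R and translation t
    minimizing sum ||R @ model + t - captured||^2 over corresponding rows."""
    mc, cc = model_pts.mean(axis=0), captured_pts.mean(axis=0)
    H = (model_pts - mc).T @ (captured_pts - cc)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cc - R @ mc
    return R, t
```

In the paper's setting the fit additionally optimizes the skeleton's joint angles, but the rigid case illustrates the distance-minimization objective.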

4 Missing marker estimation

Motion capture raw data often contain missing markers due to limb occlusions and self-occlusions, which lead to low accuracy in the marker labeling process. Here we propose a nonlinear optimization process to solve this problem. First, we reconstruct the current pose using an inverse kinematics technique. Then, we use the reconstructed pose and the edge constraints to estimate the positions of the occluded markers.

We define the human body pose using a set of independent joint coordinates θ ∈ R^42, including the absolute root position and orientation as well as the relative joint angles of individual joints. The joints are the head (1 DoF), neck (2 DoF), lower back (3 DoF), and left/right shoulders (2 DoF), arms (3 DoF), forearms (1 DoF), hands (3 DoF), upper legs (3 DoF), lower legs (1 DoF), and feet (2 DoF).

We reconstruct the current frame pose θ^t by minimizing an objective function consisting of four terms:

min_{θ^t} λ_1 E_O + λ_2 E_P + λ_3 E_S + λ_4 E_C   (6)

where E_O, E_P, E_S and E_C represent the observed, predicted, smoothness and constraint terms, respectively. The weights λ_1, λ_2, λ_3 and λ_4 control the importance of each term and are experimentally set to 0.05, 0.15, 0.8 and 0.1, respectively. We describe the details of each term below.

The observed term measures the distance between the labeled observed markers and the corresponding markers of the reconstructed pose:

E_O = ∑_{i=1}^{M} [(1 − o_i^t)(e_i(θ^t) − m_i^t)²]   (7)

where e_i(θ^t) is the forward kinematics function that computes the i-th marker position from prior knowledge of the user's skeleton, s_v, and the markers' offsets, l_v, relative to the inboard joint. o_i^t is a binary weight equal to 1 if the i-th marker is occluded and 0 otherwise, so the sum runs over the visible markers. Reconstructing the motion sequence from only this constraint is the same as performing per-frame inverse kinematics as in [29].

The predicted term According to the Kalman filter [30], we can get a probabilistic distribution of the 3D position of the occluded markers. Suppose x_i^{t−1} is the hidden state vector containing the 3D position and velocity of marker i, and y_i^t is the measurement vector, i.e., the captured position or the estimated position (when the marker is occluded) of the same marker. The reconstructed pose should maximize the conditional distribution y_i^t | x_i^{t−1}, which is a normal probability distribution

P(y_i^t | x_i^{t−1}) = exp(−(1/2)(y_i^t − μ_i^t)^T (Γ_i^t)^{−1} (y_i^t − μ_i^t)) / ((2π)^{d/2} |Γ_i^t|^{1/2})   (8)

with the mean and variance

μ_i^t = v_i^t,   Γ_i^t = H_i^T H_i Λ + Σ   (9)

where d is the dimension of y_i^t, |Γ_i^t| is the determinant of the covariance matrix Γ_i^t, H_i is the measurement matrix relating the hidden state x_i^t to the measurement y_i^t, and Λ and Σ are the process noise covariance and the measurement noise covariance, respectively.

We minimize the negative log of P(y_i^t | x_i^{t−1}), yielding the formulation:

E_P = ∑_{i=1}^{M} [o_i^t (e_i(θ^t) − v_i^t)^T (Γ_i^t)^{−1} (e_i(θ^t) − v_i^t)]   (10)

The smoothness term enforces temporal smoothness by penalizing the velocity change between the current reconstructed pose θ^t and the two previous ones [θ^{t−1}, θ^{t−2}]:

E_S = ‖θ^t − 2θ^{t−1} + θ^{t−2}‖²   (11)

The constraint term prevents the pose from reaching an impossible posture by over-bending the joints. We limit the joint angles with:

E_C = ∑_{θ_i^t ∈ θ^t} [β̲(i)(θ_i^t − θ̲_i)² + β̄(i)(θ_i^t − θ̄_i)²]   (12)

where each body joint i is associated with conservative bounds [θ̲_i, θ̄_i]. For the bounds, we use the values measured in the biomechanical literature [31]. β̲(i) and β̄(i) are indicator functions: β̲(i) evaluates to 1 if θ_i^t < θ̲_i and to 0 otherwise, and β̄(i) equals 1 if θ_i^t > θ̄_i and to 0 otherwise.

We use quasi-Newton BFGS optimization [32] to solve the optimization problem in Eq. 6. We initialize the pose reconstruction process without the smoothness term for the first frame. Each frame takes 3–5 iterations to converge in most cases.
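A rough sketch of this BFGS minimization, here via SciPy: the four energy callables are hypothetical stand-ins for E_O, E_P, E_S and E_C (each mapping a pose vector θ to a scalar), and only the weights come from the paper:

```python
import numpy as np
from scipy.optimize import minimize

# Weights for the observed, predicted, smoothness and constraint terms (Eq. 6)
L1, L2, L3, L4 = 0.05, 0.15, 0.8, 0.1

def reconstruct_pose(theta0, e_obs, e_pred, e_smooth, e_constr):
    """Minimize the four-term objective of Eq. 6 with quasi-Newton BFGS,
    starting from an initial pose guess theta0 (R^42 in the paper)."""
    def objective(theta):
        return (L1 * e_obs(theta) + L2 * e_pred(theta)
                + L3 * e_smooth(theta) + L4 * e_constr(theta))
    res = minimize(objective, theta0, method='BFGS')
    return res.x
```

In practice the previous frame's pose is a natural choice for theta0, which is consistent with the few iterations per frame reported above.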

The pose reconstruction process keeps the motion tendency of the missing markers by maintaining rigid body constraints, so we can estimate the missing markers from the reconstructed pose. Specifically, by assuming that the relative position of two markers (m_j^t, m_i^t) on the same limb, which we call neighbor markers, is fixed at any time during the motion sequence, we can get the missing markers from the reconstructed pose:

m_i^t = (1/|j|) ∑_j [m_j^t − (e_j(θ^t) − e_i(θ^t))]   (13)

where j ranges over the neighbor markers of i that are visible at the current frame and |j| is their number.

When a marker and most of its neighbors are missing at the same time, we use an iterative scheme to recover the missing markers: we first recover the missing markers whose neighbors are visible, and then use the recovered markers to estimate the other occluded markers. If all markers on a limb are missing at the same time, we directly use the virtual markers on the reconstructed pose as the recovered ones.
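The per-marker recovery step of Eq. 13 can be sketched as follows; the function signature and container types are assumptions, and the iterative scheme above would call this repeatedly as markers become recoverable:

```python
import numpy as np

def recover_marker(i, visible, markers, model_markers):
    """Estimate missing marker i from its visible neighbors (Eq. 13):
    each neighbor j predicts m_i as m_j minus the offset between the
    corresponding markers e_j, e_i on the reconstructed pose; the
    predictions are averaged.
    markers: dict of captured 3D positions of visible markers.
    model_markers: positions of all markers on the reconstructed pose."""
    preds = [markers[j] - (model_markers[j] - model_markers[i]) for j in visible]
    return np.mean(preds, axis=0)
```

Each visible neighbor contributes one estimate, so the result degrades gracefully as neighbors disappear.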

5 Experimental results

We demonstrate the power of our approach by comparing against alternative state-of-the-art methods and a commercial system (VICON) on a wide range of motion capture data. First, we show superior accuracy and efficiency on single-subject and two-subject interactive motion sequences (Sect. 5.1). Then, we show superior pose reconstruction accuracy on single-subject motion sequences (Sect. 5.2). Finally, we show accurate marker labeling results demonstrating the capability of handling ghost markers as well as facial motions without rigid constraints (Sect. 5.3). All tests are done on a 4-core 2.4 GHz CPU with 2 GB RAM. We use the labeled markers at the first frame to initialize the Kalman filters and the relative positions between markers on the same limb. We use the identification rate of marker trajectories ζ, defined as the ratio of the number of correctly labeled marker trajectories to the total number of trajectories, to represent the marker labeling accuracy.

5.1 Performance on CMU motion capture data

We compare our method against alternative methods: Yu [8] (YLD), the closest-point-based approach (CP), binary graph matching [26] (LH) and original unary graph matching (UGM). The CP approach assumes that the correct correspondence is the closest point in the next frame. For binary and unary graph matching, the matching costs of point and edge correspondences take the forms of Eqs. 2 and 4, respectively. We test on 665 CMU MoCap sequences [33] (816,000 frames in total) including walk, run, jump, kick, punch, roll, dance, skateboard, basketball, etc. The original data are captured at 120 fps. All the data are classified according to the embedded noise level, defined as:

η = max_{t,i,j} (‖m_i^t − m_j^t‖ − d_ij) / d_ij.   (14)
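A small sketch of computing the embedded noise level η of Eq. 14 over a marker trajectory array; the data layout (frames × markers × 3) and names are assumptions:

```python
import numpy as np

def noise_level(traj, edges, d):
    """Embedded noise level eta (Eq. 14): the worst relative violation of
    an edge's rest length d_ij over all frames t and edges (i, j).
    traj: array of shape (T, num_markers, 3); edges: list of (i, j) pairs;
    d: dict mapping (i, j) to the rest length d_ij."""
    eta = 0.0
    for i, j in edges:
        lengths = np.linalg.norm(traj[:, i] - traj[:, j], axis=1)
        eta = max(eta, float(np.max((lengths - d[(i, j)]) / d[(i, j)])))
    return eta
```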

The results are shown in Fig. 3 and Table 2. At a high capture rate (120 fps), the 3D position of each marker does not change much between frames, so the result of UGM is as good as ours. But due to limited computing power, lower capture rates such as 60 or 30 fps are commonly used in practical applications. We therefore compared our method against UGM at capture rates of 60, 45, 30 and 25 fps and obtained better accuracy, as shown in Fig. 4. The efficiency results are shown in Table 3.

Fig. 3 Comparison results: identification rate of marker trajectory (ζ) versus embedded noise level (Eq. 14)

Table 2 Efficiency (fps) of different labeling methods

             YLD     CP      LH     UGM    Our method
Walk         263.3   263.2   52.7   288.7  258.5
Run          264.9   261.9   60.1   289.0  259.1
Jump         265.0   262.1   48.4   288.2  257.2
Kick         262.6   262.7   44.6   289.2  256.4
Punch        263.0   262.3   44.7   289.0  256.3
Roll         265.0   264.5   60.3   289.6  260.0
Dance        263.5   264.0   46.8   288.0  256.5
Skateboard   262.6   262.8   63.5   288.8  259.6
Basketball   263.5   260.5   54.9   288.1  256.9
All          263.3   262.4   48.1   288.8  257.1

Fig. 4 Comparison results with down-sampling: percentage of incorrectly labeled marker trajectories (1 − ζ) versus frame rate. The dashed lines represent results of the two methods on all MoCap data. The bars represent results on MoCap data with different embedded noise levels (Eq. 14; all: noise free, 0.2: 20% noise, 0.4: 40% noise)

Table 3 Efficiency (fps) comparison on single subject

        YLD    CP     LH     UGM    Our method
Label   278.5  261.7  56.3   288.6  250.7
IK      85.6   91.1   83.9   87.5   85.0
Total   65.4   67.6   33.7   67.1   63.5

We also test our approach on motion capture data of multiple interacting characters. By adding markers to the model graph, our method naturally extends to multiple subjects. The superior accuracy of our method compared against the alternative methods is shown in Fig. 5, and the efficiency results in Table 4.

5.2 Application: online human motion reconstruction

Based on our online marker labeling and pose reconstruction algorithms, we built an online motion reconstruction system. The motion capture system we use is a Vicon T-series system with 12 cameras. Our system takes unlabeled 3D marker positions as input and produces reconstructed poses in a real-time online manner. We compare the resulting animation of our method against Vicon and the alternative labeling methods YLD, UGM, LH and CP. The comparison results are best viewed in the supplementary video, although we show several examples in Figs. 6 and 7.

5.3 Discussion

For noisy motion capture data captured at 120 fps, the displacement of a marker between successive frames is very small, and the labeling accuracy of our method is clearly better than the alternative methods (CP, LH, YLD) and almost equal to UGM. This is because in the CP, LH and YLD methods, as the noise level within the motion capture data increases, the rigidity of the edges can no longer be maintained.


Fig. 5 Accuracy comparison results on motion capture data of two interacting subjects

However, the UGM method degrades as the motion capture rate decreases because it only considers the smoothness of each marker's trajectory. This indicates that the integrated use of the soft graph matching model and the missing marker estimation scheme helps to recover identification after most tracking is lost.

Fig. 6 Comparison results on a trampoline motion sequence

Table 4 Efficiency (fps) comparison on double subjects

        YLD    CP     LH     UGM    Our method
Label   108.7  125.4  13.1   139.1  123.6
IK      39.1   40.7   40.9   40.3   39.6
Total   28.7   31.7   9.9    31.2   30.0

Ghost markers: The original motion capture data contains noise markers. We randomly generate additional noise markers as ghost markers: we first specify a number (α) of ghost markers and then randomly generate their positions as well as their appearing times. The ghost markers are generated in two different ways (Fig. 8). In the first way, ghost markers are generated directly according to the original noise marker positions. In the second way, ghost markers are generated according to extra noise markers, which are sampled from the original noise marker positions with Gaussian noise N(0, 2) (cm). We test our marker labeling method on 500 randomly generated motions in both ways and find that even when α = |M|, the total number of wrongly labeled markers is still less than 10. These results demonstrate the capability of rejecting a large number of ghost markers.

Facial marker labeling: Unlike the human body, there are no limbs in the human face. As a result, local rigid constraints are invalid for facial motions, as the relative distances between markers may change considerably with the facial muscles and skin. So we only use the motion smoothness constraint to estimate the matching cost for different marker correspondences (the second term in Eq. 4). To correctly label all markers at the first frame, similar to the full-body case, we instruct the subject to perform facial motions starting from a "normal" expression. Figure 9 indicates the power of our method for accurate facial marker labeling applications.

6 Conclusion

In this paper, we present a new online marker labeling method for optical motion capture, which can be used to build real-time live applications. Experimental results demonstrate that the marker labeling accuracy of our method outperforms the state-of-the-art marker labeling methods, especially in the cases of missing/ghost markers and low-frequency capture rates. This benefits from the integrated use of the proposed soft graph matching model and the marker estimation scheme, which simultaneously considers the local geometrical structure and the full pose. Although the marker labeling efficiency of our method is not the best (it ranks 4th of the 5 methods) due to the use of the pose reconstruction process, it is still sufficient for real-time live applications.

6.1 Limitation and future work

The performance of our method degrades as the motion capture rate decreases. The main reason is that the current setting of the empirical weights is not optimal. In fact, when the MoCap data are down-sampled or the markers were previously occluded, the weight of the point correspondence cost should be decreased; and when a marker strongly violates the rigidity constraint (i.e., it is attached to soft tissue), the weight of the edge correspondence cost should be decreased. We plan to develop an automatic or experimental scheme for finding optimal weights suitable for various kinds of motion in future work. We also plan to explore a robust method for automatically detecting labeling failures and reinitializing the labeling process.
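The adaptation rule described above could be sketched as a simple heuristic schedule; every name and constant below is an illustrative assumption of ours, not a value from the paper:

```python
def adaptive_weights(fps, frames_occluded, rigidity_violation,
                     w_point=1.0, w_edge=1.0,
                     ref_fps=120.0, occlusion_decay=0.9, violation_scale=0.05):
    """Heuristic weight schedule for the two matching-cost terms.

    - A low capture rate or a long prior occlusion makes the predicted
      position unreliable, so the point-correspondence weight is scaled down.
    - A marker that repeatedly violates rigidity (e.g. attached to soft
      tissue) makes edge lengths unreliable, so the edge-correspondence
      weight is scaled down.
    """
    w_p = w_point * min(1.0, fps / ref_fps) * (occlusion_decay ** frames_occluded)
    w_e = w_edge / (1.0 + violation_scale * rigidity_violation)
    return w_p, w_e
```

A learned schedule fit on example data, as the authors propose, would replace these hand-picked decay constants.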


Fig. 7 Accuracy comparison results on a playing tennis sequence

Aiming to identify markers for live real-time applications, our method is easily extended to multi-actor interaction motions. Unfortunately, the estimated marker positions are not always accurate, especially when the markers on an arm or a leg are all occluded for a long period of time. Inaccurate estimation of missing markers may degrade the labeling algorithm, especially when a missing marker re-appears. As for multiple interacting characters, the quality of the reconstructed motion cannot be guaranteed when the interaction becomes more intense (for example, two people holding each other while rolling on the ground). We plan further study of data-driven approaches: First, when calculating Eq. 3, we implicitly assume the correspondence of the neighbors, but the labeling result may conflict with this assumption. This could be improved by using example data. Then, we would like to explore how to construct a reasonable statistical


Fig. 8 Labeling result on a walking sequence with 41 ghost markers. The two different ways of generating the positions of ghost markers are shown in the left and right two images, respectively
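The two ghost-marker generation schemes illustrated in Fig. 8 can be sketched as follows; this is our own minimal illustration of the procedure described earlier, where the function name, the uniform sampling of appearing times, and the default σ = 2 cm are assumptions based on that description:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_ghost_markers(noise_markers, alpha, n_frames,
                           mode="reuse", sigma=0.02):
    """Generate `alpha` ghost markers with random appearing times.

    mode="reuse":   place each ghost marker directly at an original
                    noise-marker position (the first scheme).
    mode="perturb": sample a noise-marker position and add Gaussian
                    noise N(0, 2 cm) per axis (the second scheme).
    Returns a list of (frame_index, position) pairs.
    """
    ghosts = []
    for _ in range(alpha):
        frame = int(rng.integers(0, n_frames))           # random appearing time
        base = noise_markers[rng.integers(0, len(noise_markers))]
        if mode == "perturb":
            base = base + rng.normal(0.0, sigma, size=3) # sigma = 2 cm in metres
        ghosts.append((frame, np.asarray(base, dtype=float)))
    return ghosts

# Example: 5 ghost markers over a 100-frame clip, from 3 original noise markers.
noise = np.array([[0.1, 1.2, 0.3], [0.4, 0.9, 0.2], [0.2, 1.5, 0.1]])
ghosts = generate_ghost_markers(noise, alpha=5, n_frames=100, mode="perturb")
```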

Fig. 9 Top row: input unlabeled markers (white dots) of facial motion capture data. Bottom row: marker labeling results (colored dots); the green marker links are only for viewing convenience, not an indication of rigid constraints

model from an example database so that better predictions can be derived for occluded markers. Finally, reconstructing the movement of multiple intensively interacting characters is another problem worth studying.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


Shihong Xia is a professor associated with the Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences. He received his bachelor's degree in mathematics from Sichuan Normal University (1996), Chengdu. He completed his master's degree in applied mathematics (1999) and Ph.D. degree in computer software and theory (2002) at the University of Chinese Academy of Sciences, Beijing. His research interests include computer graphics, virtual reality and artificial intelligence.

Le Su is a Ph.D. candidate associated with the University of Chinese Academy of Sciences and also with the Institute of Computing Technology, Chinese Academy of Sciences. He received his B.A. in Computer Science and Technology from Jilin University (2003), China. He completed his M.Eng. at Xiamen University (2008), China. His research interests include computer graphics and virtual reality.

Xinyu Fei received her B.A. in Computer Science and Technology from China Agricultural University (2009), China. She completed her M.Eng. at the University of Chinese Academy of Sciences and also the Institute of Computing Technology, Chinese Academy of Sciences (2011), China. Her research interests include computer graphics and virtual reality.

Han Wang is a Ph.D. candidate associated with the University of Chinese Academy of Sciences and also with the Institute of Computing Technology, Chinese Academy of Sciences. She received her B.A. in Computer Science and Technology from the China University of Petroleum (2009), China. Her research interests include computer graphics and virtual reality.
