
A Simple Oriented Mean-Shift Algorithm For Tracking

Jamil Drareni and Sebastien Roy

DIRO, Université de Montréal, Canada. {drarenij,roys}@iro.umontreal.ca

Abstract. Mean-shift tracking has gained a lot of popularity in the computer vision community because of its simplicity and robustness. However, the original formulation does not estimate the orientation of the tracked object. In this paper, we extend the original mean-shift tracker to estimate orientation. We use the gradient field as an orientation signature and introduce an efficient representation of the gradient-orientation space to speed up the estimation. No additional parameter is required, and the additional processing time is insignificant. The effectiveness of our method is demonstrated on typical sequences.

1 Introduction

Object tracking is a fundamental and challenging task in computer vision. It is used in several applications such as surveillance [1], eye tracking [2] and object-based video compression/communication [3].

Although many tracking methods exist, they generally fall into two classes: bottom-up and top-down [4]. In a bottom-up approach, objects are first identified and then tracked. A top-down approach, instead, uses hypotheses or signatures that discriminate the object of interest; tracking is then performed by hypothesis satisfaction.

Recently, a top-down algorithm based on mean-shift was introduced for blob tracking [5]. This algorithm is non-parametric and relies solely on intensity histograms. Tracking is performed by finding the mode of a statistical distribution that encodes the similarity between the original object's model and the model at the probing position. Because it is a top-down approach that does not rely on a specific model, the mean-shift tracker is well suited to real-time applications and robust to partial occlusions.

In [6], an extension was proposed to cope with scale variation. However, little has been done to extend the tracker to rotational motion [7]. In fact, the original mean-shift tracker as proposed in [5] is invariant to rotations and thus does not provide information on the target's orientation. This property stems from the inherent lack of spatial information in histograms. While this may not be a problem for objects with symmetrical dimensions, like circles or squares, it is no longer acceptable when the tracked objects are "thin" [7]. An example of a tracked thin object (an arm) is shown in Fig. 1.

Fig. 1. Result of tracking an arm using the presented oriented mean-shift tracker.

In [7], the authors used a simplified form of correlograms to encode pixel positions within the region of interest. Pairs of points at an arbitrarily fixed distance along the principal axis vote with their joint intensities and their angle relative to the patch's origin to generate an orientation-intensity correlogram. Once estimated, the correlogram is used in the mean-shift main loop just like a regular histogram. Unfortunately, no method was proposed to automatically select the fixed distance for pair sampling. Furthermore, since the pairs are only picked along the principal axis of the object, the generated correlogram does not encode a global representation of the object.

In this paper, we propose a fast and simple algorithm for oriented mean-shift tracking. We use the original mean-shift formulation to estimate the object's translation. For the rotational part, a histogram of the orientations of the spatial gradients within the region of interest is compared against a previously computed set of histograms, one per possible orientation, and the best match gives the object's orientation. The effectiveness of the proposed method is demonstrated in experiments with various types of images. As our experiments show, the method can also be applied to video stabilization. Real-time applications remain possible since the additional processing time is negligible.

The rest of the paper is organized as follows. Section 2 summarizes mean-shift tracking. Section 3 presents the gradient-orientation representation using a histogram look-up table (LUT). The implementation of the proposed method is described in Section 4. The experiments and results are reported in Section 5, and we conclude in Section 6.

2 Mean-shift and Limitations

The mean-shift algorithm, as initially proposed in [8], is a non-parametric method to estimate the mode of a density function. Let $S = \{x_i\}_{i=1}^{n}$ be a finite set of r-dimensional data points and $K(x)$ a multivariate kernel with window radius h. The sample mean at x is defined as:

$$ m(x) = \frac{\sum_{x_i \in S} K(x - x_i)\, x_i}{\sum_{x_i \in S} K(x - x_i)} $$

The quantity $m(x) - x$ is called the mean-shift vector. It has been proven that if $K(x)$ is an isotropic kernel with a convex and monotonically decreasing profile, the mean-shift vector always points in the direction of maximum increase of the density. Thus, following this direction recursively leads to a local maximum of the density spanned by S. Examples of such kernels are the Gaussian and Epanechnikov kernels. The reader is referred to [8, 9, 5] for further details on the mean-shift algorithm and the related proof of convergence.
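The recursion above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes a Gaussian kernel (one of the kernels named above), evaluates $m(x)$ with the formula given, and stops when the mean-shift vector $m(x) - x$ becomes negligible. The function names and the tolerance are hypothetical choices.

```python
import numpy as np


def gaussian_kernel(sq_dist, h):
    """Gaussian kernel K evaluated on squared distances, window radius h."""
    return np.exp(-0.5 * sq_dist / h ** 2)


def mean_shift_mode(points, x0, h=1.0, tol=1e-6, max_iter=500):
    """Move x along the mean-shift vector m(x) - x until it vanishes."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        sq_dist = np.sum((points - x) ** 2, axis=1)        # ||x - x_i||^2
        k = gaussian_kernel(sq_dist, h)                     # K(x - x_i)
        m = (k[:, None] * points).sum(axis=0) / k.sum()     # sample mean m(x)
        if np.linalg.norm(m - x) < tol:                     # mean-shift vector ~ 0
            return m
        x = m
    return x


# Two well-separated clusters: each start point climbs to its own mode.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(10.0, 0.5, (200, 2))])
print(mean_shift_mode(data, [1.0, 1.0]))   # close to (0, 0)
print(mean_shift_mode(data, [9.0, 9.0]))   # close to (10, 10)
```

Each starting point converges to the local maximum of the density in its own basin, which is the property the tracker relies on.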

2.1 Mean-Shift for Tracking

Comaniciu et al. [5] took advantage of this property and proposed an elegant method to track blobs based on intensity histograms. The algorithm finds the displacement $\Delta y$ of the object of interest S as a weighted sum:

$$ \Delta y = \frac{\sum_{x_i \in S} w_i\, K(x - x_i)\, x_i}{\sum_{x_i \in S} w_i\, K(x - x_i)} $$

where $w_i$ are weights derived from the likelihood between the model's and the target's intensity histograms. The estimation is repeated until the displacement magnitude $\|\Delta y\|$ vanishes (or falls below a predefined threshold).
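A minimal NumPy sketch of this tracking update follows. It assumes the square-root weights of Comaniciu et al. [5], $w_i = \sqrt{q_u / p_u(y)}$, where u is the intensity bin of pixel i, q is the model histogram and p the candidate histogram at the current position y. As a simplification (our assumption), it uses a flat square window instead of a kernel-weighted one, so the K terms above are constant and the weighted sum reduces to the $w_i$-weighted mean of the window's pixel coordinates. That mean is taken as the new position, and the step between successive positions is compared against the threshold. All function names are hypothetical.

```python
import numpy as np


def window(center, half):
    """Row/column indices of the square window of radius `half` around center."""
    r0, c0 = (int(np.floor(v + 0.5)) for v in center)
    rows, cols = np.mgrid[r0 - half:r0 + half + 1, c0 - half:c0 + half + 1]
    return rows.ravel(), cols.ravel()


def intensity_histogram(img, rows, cols, nbins):
    """Normalized histogram of intensities in [0, 1) quantized into nbins."""
    bins = np.minimum((img[rows, cols] * nbins).astype(int), nbins - 1)
    hist = np.bincount(bins, minlength=nbins).astype(float)
    return hist / hist.sum(), bins


def mean_shift_track(img, q, y0, half, nbins, tol=0.5, max_iter=20):
    """Iterate the weighted-mean update with weights w_i = sqrt(q_u / p_u)."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        rows, cols = window(y, half)
        p, bins = intensity_histogram(img, rows, cols, nbins)   # candidate at y
        w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12))     # per-pixel weights
        y_new = np.array([w @ rows, w @ cols]) / w.sum()       # weighted mean
        if np.linalg.norm(y_new - y) < tol:                    # step below threshold
            return y_new
        y = y_new
    return y


# Synthetic example: a bright 11x11 square moves 5 columns to the right.
frame0 = np.full((60, 60), 0.1)
frame0[25:36, 25:36] = 0.9
frame1 = np.full((60, 60), 0.1)
frame1[25:36, 30:41] = 0.9
rows, cols = window((30, 30), 5)
q, _ = intensity_histogram(frame0, rows, cols, 8)   # target model from frame 0
print(mean_shift_track(frame1, q, (30, 30), 5, 8))  # close to (30, 35)
```

Background pixels get zero weight here because the model histogram has no background bin, so each iteration pulls the window toward the bright pixels until it settles on the moved square.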

Unfortunately, the mean-shift tracker cannot infer the orientation of an object from its intensity histogram. To overcome this limitation, the tracker must use cues related to the spatial organization of the pixels or to parameters that describe texture. Among those cues, image gradients are good candidates because they are easy to compute and their orientations change when the image undergoes a rotation.

2.2 Gradients and Gradient Histograms

Let I be an image. The first-order gradient of I at position (x, y), denoted $\nabla I_{xy}$, is defined as:

$$ \nabla I_{xy} = \begin{bmatrix} I_x \\ I_y \end{bmatrix} = \begin{bmatrix} I(x+1, y) - I(x-1, y) \\ I(x, y+1) - I(x, y-1) \end{bmatrix} \qquad (1) $$

The orientation and the magnitude of the gradient vector $\nabla I_{xy}$ are given by:

$$ \theta(x, y) = \tan^{-1}\left(\frac{I_y}{I_x}\right), \qquad \mathrm{mag}(x, y) = \sqrt{I_x^2 + I_y^2} \qquad (2) $$
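Equations (1) and (2) map directly onto array operations. The sketch below is an illustration under two assumptions of ours: images are indexed `img[y, x]`, and the four-quadrant arctangent `np.arctan2(I_y, I_x)` is used in place of the plain $\tan^{-1}(I_y / I_x)$, since the plain form only covers a 180° range while the orientation LUTs of Section 3 span a full 360°. Border pixels, where the central differences are undefined, are simply given a zero gradient.

```python
import numpy as np


def gradient_field(img):
    """Gradient orientation (degrees, [0, 360)) and magnitude, eqs. (1)-(2).

    img is indexed img[y, x]; border pixels keep a zero gradient.
    """
    img = np.asarray(img, dtype=float)
    ix = np.zeros_like(img)
    iy = np.zeros_like(img)
    ix[:, 1:-1] = img[:, 2:] - img[:, :-2]            # I(x+1, y) - I(x-1, y)
    iy[1:-1, :] = img[2:, :] - img[:-2, :]            # I(x, y+1) - I(x, y-1)
    theta = np.degrees(np.arctan2(iy, ix)) % 360.0    # four-quadrant tan^-1(I_y / I_x)
    mag = np.hypot(ix, iy)                            # sqrt(I_x^2 + I_y^2)
    return theta, mag


# A horizontal ramp I(x, y) = x has gradient (2, 0): orientation 0, magnitude 2.
ramp = np.tile(np.arange(5.0), (5, 1))
theta, mag = gradient_field(ramp)
print(theta[2, 2], mag[2, 2])   # 0.0 2.0
```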

Clearly, the orientation is independent of image translation. However, a rotation of the image yields a rotation of the gradient field by the same amount. This property can be used to assign an orientation to the object of interest. Instead of keeping track of the gradient field itself, it is more convenient to build a histogram of gradient orientations. This representation has been used in Lowe's SIFT [10] to assign an orientation to keypoints.

In the present work, the m-bin orientation histogram O of an object is computed as:

$$ O_u = C \sum_{i=1}^{n} \mathrm{mag}(p_i)\, \delta[\theta(p_i) - u], \qquad u = 1, \dots, m \qquad (3) $$

where $p_1, \dots, p_n$ are the n pixels of the object of interest, $\delta[\cdot]$ is the Kronecker delta that selects the bin containing $\theta(p_i)$, and the normalization constant C is chosen to ensure that $\sum_{u=1}^{m} O_u = 1$. $\theta(p_i)$ and $\mathrm{mag}(p_i)$ return the orientation and the magnitude of the gradient at pixel $p_i$, as defined in (2). Unlike in a regular intensity histogram, each sample weights its contribution by its magnitude. The reason behind this choice is twofold: first, we generally observe that gradients with larger magnitudes tend to be more stable; second, the gradient is known to be very sensitive to noise, so weighting the votes by their magnitudes favors samples with a good signal-to-noise ratio.
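Equation (3) amounts to a weighted `np.bincount`. The sketch below assumes m equal-width bins covering [0°, 360°) and orientations given in degrees (as produced by a four-quadrant arctangent); the function name is hypothetical.

```python
import numpy as np


def orientation_histogram(theta, mag, nbins):
    """Magnitude-weighted, normalized orientation histogram, eq. (3).

    Each pixel votes mag(p_i) into the equal-width bin containing theta(p_i).
    """
    theta = np.ravel(theta) % 360.0
    mag = np.ravel(mag)
    bins = np.minimum((theta * nbins / 360.0).astype(int), nbins - 1)
    hist = np.bincount(bins, weights=mag, minlength=nbins)
    total = hist.sum()
    # C makes the bins sum to 1; a flat patch (all-zero magnitudes) is left as is
    return hist / total if total > 0 else hist


# Four pixels, four 90-degree bins; the zero-magnitude pixel casts no vote.
h = orientation_histogram([0.0, 10.0, 100.0, 200.0], [1.0, 1.0, 2.0, 0.0], nbins=4)
print(h)   # bins 0 and 1 each receive half of the total weight
```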

Unlike [10], we do not extract a dominant orientation from the histogram. Rather, we keep the whole histogram as an orientation signature.

Histograms and bin width. One of the major problems that arises when estimating a histogram (or any density function) from a finite set of data is determining the bin width. A large bin width gives an over-smoothed histogram with a coarse, blocky look, whereas a small bin width results in an under-smoothed, jagged histogram [11]. In [12], Scott showed that the bin width W that asymptotically minimizes the integrated mean squared error of the density estimate (for normally distributed data) is given by:

$$ W = 3.49\, \sigma\, N^{-1/3} \qquad (4) $$

where N is the number of samples and σ is the standard deviation of the distribution. We used a more robust formulation described in [13] (the Freedman-Diaconis rule):

$$ W = 2\, \mathrm{IQR}\, N^{-1/3} \qquad (5) $$

The interquartile range (IQR) is the difference between the 75th and 25th percentiles of the distribution. Since (5) relies on the IQR rather than σ, it is less sensitive to outliers. The bin width computed with (5) is the one we use throughout our experiments.
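Both rules are one-liners. The sketch below is an assumption-laden illustration: the text does not say which samples the IQR is computed on, so here it is taken over the raw orientation samples (ignoring their circular nature and the magnitude weights), σ is estimated with the sample standard deviation, and the hypothetical helper `bin_count` turns the width of (5) into a number of bins over the 360° orientation range.

```python
import numpy as np


def scott_bin_width(samples):
    """Eq. (4): W = 3.49 * sigma * N^(-1/3)."""
    x = np.asarray(samples, dtype=float)
    return 3.49 * x.std(ddof=1) * x.size ** (-1.0 / 3.0)


def iqr_bin_width(samples):
    """Eq. (5): W = 2 * IQR * N^(-1/3)."""
    x = np.asarray(samples, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)


def bin_count(samples, value_range=360.0):
    """Number of bins of width (5) needed to cover value_range."""
    return max(1, int(np.ceil(value_range / iqr_bin_width(samples))))


x = np.arange(100.0)          # IQR = 74.25 - 24.75 = 49.5, N = 100
print(iqr_bin_width(x))       # 2 * 49.5 / 100^(1/3), about 21.3
print(bin_count(x))           # 17
```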

3 Tracking with Gradient Histograms

A single orientation histogram encodes the gradient distribution for only one specific orientation. Thus, to infer the orientation from a gradient histogram, a LUT of gradient histograms corresponding to all image orientations must be built beforehand (at the initialization step). During tracking, the gradient histogram of the object is compared against the histograms in the LUT, and the sought orientation is the one that corresponds to the closest histogram in the LUT. Histogram similarity can be measured in different ways; we used the histogram intersection introduced in [14] for its robustness and ease of computation.

The intersection of two m-bin histograms $h_1$ and $h_2$ is defined as:

$$ h_1 \cap h_2 = \sum_{i=1}^{m} \mathrm{Min}(h_1[i], h_2[i]) \qquad (6) $$

where Min() returns the minimum of its arguments. The more similar the histograms, the larger the intersection score; for two normalized histograms, the score equals 1 exactly when they are identical. The look-up table of histograms captures the joint orientation-gradient space of the object and can also be seen as a 2D histogram.
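Equation (6) and the LUT lookup can be sketched as follows, assuming the LUT is stored as a 2D array with one normalized histogram per row (the "2D histogram" view above). The function names are hypothetical.

```python
import numpy as np


def intersection(h1, h2):
    """Histogram intersection, eq. (6): sum of the bin-wise minima."""
    return float(np.minimum(h1, h2).sum())


def best_match(h, lut):
    """Index of the LUT row with the largest intersection with h, plus all scores."""
    scores = np.minimum(lut, h[None, :]).sum(axis=1)   # eq. (6) for every row
    return int(np.argmax(scores)), scores


lut = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.5, 0.5, 0.0]])
h = np.array([0.4, 0.6, 0.0])
idx, scores = best_match(h, lut)
print(idx, scores)   # row 2 wins with a score of 0.9
```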

In the following subsections, two methods are introduced to construct the gradient histogram table: Image-Rotation Voting and Gradient-Rotation Voting.

3.1 Image-Rotation Voting

This is the simplest way to gather gradient histograms for different orientations. The image of the tracked object is rotated through a full 360° around its center in user-defined steps (2° in our experiments), and an orientation histogram is computed at each step. The resulting histograms are stored in a stack and form the gradient histogram LUT. To reduce noise due to intensity aliasing, rotations are performed with bicubic interpolation. This method is outlined in the algorithm below:

Given: the original image, the target's pixels {p_i}, i = 1..n, and a rotation step Δrot.

1. step ← 0, ndx ← 0.

2. Apply a Gaussian filter (typically 3 × 3) on {p_i} to reduce noise.

3. Compute {mag_i} and {θ_i}, the gradient magnitudes and orientations at {p_i}, according to (1) and (2).

4. Derive the orientation histogram O using {mag_i} and {θ_i} according to (3).

5. LUT[ndx] ← O.

6. ndx ← ndx + 1, step ← step + Δrot.

7. Rotate the original {p_i} by step degrees.

8. If step < 360, go to step 3.

9. Return LUT.
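The image-rotation procedure can be sketched in NumPy as below. This is an illustration, not the authors' implementation, and it makes several assumptions: bilinear interpolation is swapped in for the bicubic interpolation described above to keep the example dependency-free; step 2 uses a separable [1, 2, 1]/4 kernel as the 3 × 3 Gaussian; histograms are computed only over a central disk (our choice) so that the corners, which rotation fills with zeros, do not vote; and every rotated copy is produced from the original smoothed patch. All function names are hypothetical.

```python
import numpy as np


def smooth3x3(img):
    """3x3 Gaussian-like smoothing with the separable kernel [1, 2, 1] / 4."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    p = (p[:, :-2] + 2.0 * p[:, 1:-1] + p[:, 2:]) / 4.0      # horizontal pass
    return (p[:-2, :] + 2.0 * p[1:-1, :] + p[2:, :]) / 4.0    # vertical pass


def rotate_bilinear(img, angle_deg):
    """Rotate img about its center; samples from outside the image become 0.

    Content is rotated by +angle_deg in the (x, y) = (column, row) frame, so
    gradient orientations atan2(I_y, I_x) increase by angle_deg.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Inverse mapping: output pixel p samples the input at R(-a)(p - c) + c
    xs = np.cos(a) * (xx - cx) + np.sin(a) * (yy - cy) + cx
    ys = -np.sin(a) * (xx - cx) + np.cos(a) * (yy - cy) + cy
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    fx, fy = xs - x0, ys - y0
    valid = (x0 >= 0) & (y0 >= 0) & (x0 < w - 1) & (y0 < h - 1)
    x0, y0 = np.clip(x0, 0, w - 2), np.clip(y0, 0, h - 2)
    out = ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1]
           + (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])
    return np.where(valid, out, 0.0)


def gradient_field(img):
    """Orientation in degrees [0, 360) and magnitude from eqs. (1)-(2)."""
    ix, iy = np.zeros_like(img), np.zeros_like(img)
    ix[:, 1:-1] = img[:, 2:] - img[:, :-2]
    iy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.degrees(np.arctan2(iy, ix)) % 360.0, np.hypot(ix, iy)


def orientation_histogram(theta, mag, nbins):
    """Magnitude-weighted, normalized orientation histogram, eq. (3)."""
    bins = np.minimum((theta * nbins / 360.0).astype(int), nbins - 1)
    hist = np.bincount(bins, weights=mag, minlength=nbins)
    return hist / hist.sum() if hist.sum() > 0 else hist


def image_rotation_lut(patch, nbins, step_deg=2.0):
    """Image-rotation voting: one orientation histogram per rotated copy."""
    patch = smooth3x3(patch)                                  # step 2
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = min(h, w) / 2.0 - 2.0                                 # disk away from corners
    disk = (yy - (h - 1) / 2.0) ** 2 + (xx - (w - 1) / 2.0) ** 2 <= r ** 2
    lut = []
    for angle in np.arange(0.0, 360.0, step_deg):            # steps 6 and 8
        rotated = rotate_bilinear(patch, angle)               # step 7
        theta, mag = gradient_field(rotated)                  # step 3
        lut.append(orientation_histogram(theta[disk], mag[disk], nbins))  # steps 4-5
    return np.array(lut)


rng = np.random.default_rng(1)
lut = image_rotation_lut(rng.random((32, 32)), nbins=12)
print(lut.shape)   # (180, 12): one 12-bin histogram per 2-degree step
```

With 12 bins of 30°, the 90° entry (row 45) is the unrotated histogram shifted by three bins, because a 90° rotation maps the pixel grid onto itself; for other angles, interpolation and binning make the relation only approximate.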

3.2 Gradient-Rotation Voting

The second method is faster and produces better results in practice. Instead of rotating the image itself, the gradient field computed on the original image is incrementally rotated, and the result of each rotation votes in the corresponding histogram. Note that, due to histogram discretization, rotating a gradient field is not exactly equivalent to shifting the histogram by the same amount, because the histogram sampling generally differs from the rotation sampling. For instance, after the gradient field is rotated, some samples that voted for a specific bin may still vote for the same bin, whereas others may jump to an adjacent bin. The two would be equivalent if the gradient histogram had a bin width of 1° (i.e., 360 bins) and the rotation step were a whole number of degrees, which is not the case in practice. Gradient-rotation voting is outlined below:

Given: the original image, the target's pixels {p_i}, i = 1..n, and a rotation step Δrot.

1. step ← 0, ndx ← 0.

2. Apply Gaussian smoothing on {p_i} to reduce noise.

3. Compute {mag_i} and {θ_i}, the gradient magnitudes and orientations at {p_i}, according to (1) and (2).

4. Derive the orientation histogram O using {mag_i} and {θ_i} according to (3).

5. LUT[ndx] ← O.

6. ndx ← ndx + 1, step ← step + Δrot.

7. For each i = 1..n, do θ_i ← (θ_i + Δrot) mod 360°.

8. If step < 360, go to step 4.

9. Return LUT.
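Gradient-rotation voting needs no image resampling at all, which is why it is cheaper. The sketch below assumes the orientations (in degrees) and magnitudes of the target's pixels have already been computed once on the smoothed image (steps 2 and 3). Instead of accumulating the increment of step 7, it adds k·Δrot to the original orientations, which is equivalent up to floating-point rounding. The function name is hypothetical.

```python
import numpy as np


def gradient_rotation_lut(theta, mag, nbins, step_deg=2.0):
    """Gradient-rotation voting: rotate the orientations, keep the magnitudes."""
    theta = np.ravel(np.asarray(theta, dtype=float)) % 360.0
    mag = np.ravel(np.asarray(mag, dtype=float))
    lut = []
    for k in range(int(round(360.0 / step_deg))):
        rotated = (theta + k * step_deg) % 360.0                      # step 7
        bins = np.minimum((rotated * nbins / 360.0).astype(int), nbins - 1)
        hist = np.bincount(bins, weights=mag, minlength=nbins)       # step 4, eq. (3)
        lut.append(hist / hist.sum() if hist.sum() > 0 else hist)   # step 5
    return np.array(lut)


lut = gradient_rotation_lut([5.0, 95.0], [1.0, 1.0], nbins=4)
print(lut.shape)   # (180, 4)
print(lut[20])     # 40 degrees: both votes are still in bins 0 and 1
print(lut[45])     # 90 degrees: the votes have moved to bins 1 and 2
```

The 40° row illustrates the discretization effect described above: a rotation of almost half a 90° bin leaves this histogram unchanged, so it is not a fractional shift of the original one.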

4 Implementation

We implemented the proposed oriented mean-shift tracker as an extension to the original mean-shift tracker. The user supplies the initial location of the object to track, along with its bounding box and initial orientation. Images are first smoothed with a Gaussian filter (typically a 3 × 3 mask) to reduce noise; the smoothing is applied only in the neighborhood of the object. The orientation look-up table is generated using the gradient-rotation method with a 2° step. The orientation estimation can either be nested within the original mean-shift loop or performed separately after the translational part has been estimated. The complete algorithm is outlined below:

Given: the original sequence, the object's initial position (y_0) and orientation (θ_0).

1. Compute the LUT of orientation histograms at y_0 (see Section 3).

2. Initialize the mean-shift algorithm.

3. For each frame f_i:

(a) Update the object's position using the original mean-shift.

(b) Compute the gradient and estimate H, the orientation histogram, using (3).

(c) h_max ← argmax over h_i ∈ LUT of (H ∩ h_i).

(d) Update the object's orientation with the orientation that corresponds to h_max.

Note that the orientations are estimated relative to the initial orientation θ_0.

Even though histogram intersection is a fast operation, processing time can still be saved at step 3(c) by limiting the search to a specific range instead of the entire LUT. A typical range is ±20° around the object's previous orientation.
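Steps 3(c) and 3(d) with a restricted search range could look like the following sketch. It assumes that row k of the LUT holds the histogram for orientation θ_0 + k·Δrot (as built by gradient-rotation voting with Δrot = 2°) and that indices wrap around, so that a search near 0° also covers orientations near 360°. The function name and the wrap-around handling are our assumptions.

```python
import numpy as np


def estimate_orientation(h, lut, prev_idx, step_deg=2.0, search_deg=20.0):
    """Return (LUT index, angle in degrees relative to theta_0) of the best match.

    Only LUT rows within +/- search_deg of prev_idx are scored, with wrap-around.
    """
    n = lut.shape[0]
    half = int(round(search_deg / step_deg))
    candidates = (prev_idx + np.arange(-half, half + 1)) % n
    scores = np.minimum(lut[candidates], h[None, :]).sum(axis=1)   # eq. (6)
    best = int(candidates[np.argmax(scores)])
    return best, best * step_deg


# Toy LUT with one distinct histogram per 2-degree orientation.
lut = np.eye(180)
print(estimate_orientation(lut[5], lut, prev_idx=0))     # (5, 10.0)
print(estimate_orientation(lut[178], lut, prev_idx=2))   # (178, 356.0), i.e. -4 degrees
```

With a ±20° range and a 2° step, only 21 of the 180 histograms are compared per frame.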

Fig. 2. Some frames from the manually rotated sequence (with a fixed background).

5 Experimental Results

We tested the proposed oriented mean-shift algorithm on several motion sequences. Since we propose an orientation upgrade to the original mean-shift tracker, we mostly considered sequences dominated by rotational motion. We first ran our tracker on a synthetic sequence generated by fully rotating a real image (a chocolate box). Fig. 2 shows some frames from this sequence.

[Plot: orientation estimation error (%) versus number of bins (4 to 22); curves: Gradient-Rotation, Image-Rotation.]

Fig. 3. Errors in orientation estimation as a function of the number of histogram bins.

The orientation estimation error for different bin sizes is reported in Fig. 3. The green curve represents the error with a LUT generated by the image-rotation method, whereas the red curve is the error with a LUT generated by the gradient-rotation method. In both cases, the computed optimal number of bins was 14. As we can see, the gradient-rotation method gives better results and is less sensitive to variations in bin size. For the rest of the experiments, the gradient LUTs were generated using the gradient-rotation method.

Fig. 4. Results of tracking a rotating face. Sample frames: 78, 164 and 257

We further tested our method on face tracking. As the face underwent an almost pure roll, we estimated its orientation at each frame. We observe that the face is tracked accurately, although no exact ground truth is available in this case. The results are shown in Figs. 4 and 5.

Aerial surveillance is another field where tracking is useful. Due to the rectangular shape of common vehicles, oriented tracking is suitable, as shown in Fig. 6. Notice, however, that the motion is not truly 2D, as the view angle induces some perspective distortions that are not handled by our method.

Finally, we illustrate the effectiveness of the proposed method for video rectification. A hand-held camera was rotated by hand around its optical axis while viewing a static scene (see Fig. 8, left column). We tracked a rigid object attached to the scene and used the recovered motion to rectify the video and cancel its rotation. The tracking and rectification results are shown in Fig. 8, and the estimated orientations are plotted in Fig. 7. The rotation is well recovered, as can be seen in the estimated curve of Fig. 7 and the rectified images of Fig. 8. Notice that the rectified images are sometimes distorted by parallax effects that are not modelled by our algorithm.

[Plot: estimated orientation angle (rad, about −0.6 to 0.6) versus frame ID (0 to 350).]

Fig. 5. Estimated orientation for the rotating face sequence.

Fig. 6. Tracking results for the car pursuit sequence.

[Plot: estimated orientation angle (rad, −1 to 1) versus frame (0 to 800).]

Fig. 7. Estimated orientation for the shelf sequence.

Fig. 8. Results of tracking and rectifying images from a rolling-camera sequence. Left: results of the original tracking. Right: rectified sequence after rotation cancellation. Shown frames are 0, 69, 203, 421, 536 and 711.

6 Conclusion

We have presented a fast and simple extension of the original mean-shift tracker that allows the estimation of orientation. This rotation parameter is crucial when the tracked objects have a "thin" shape. We introduced the idea of a gradient-orientation space represented by gradient look-up tables. The LUT can, of course, be extended to other cues related to texture or pixel positions. As our experiments show, this representation is efficient: the proposed method ran comfortably in real time on a regular PC, tracking at 10-25 frames per second for typical objects of about 2000 pixels. In our implementation, the orientation was estimated independently from the translational shift. However, a combined mean-shift on intensity histograms and the gradient LUT is possible. In the future, we plan to add support for perspective deformation to better handle different types of rotations.

References

[1] J. Segen and S. G. Pingali. A camera-based system for tracking people in real time. In International Conference on Pattern Recognition, page 63, 1996.

[2] Zhiwei Zhu, Qiang Ji, and Kikuo Fujimura. Combining Kalman filtering and mean shift for real time eye tracking under active IR illumination (oral). 2002.

[3] J. Crowley and K. Schwerdt. Robust tracking and compression for video communication. In IEEE Computer Society Conference on Computer Vision, Workshop on Face and Gesture Recognition, 1999.

[4] Katja Nummiaro, Esther Koller-Meier, and Luc J. Van Gool. An adaptive color-based particle filter. Image Vision Comput., 21(1):99–110, 2003.

[5] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In Conference on Computer Vision and Pattern Recognition, pages 142–151, 2000.

[6] R. T. Collins. Mean-shift blob tracking through scale space. In Conference on Computer Vision and Pattern Recognition, volume 2, pages 234–240, 2003.

[7] Qi Zhao and Hai Tao. Object tracking using color correlogram. In Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pages 263–270, 2005.

[8] K. Fukunaga and L. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. In IEEE Transactions on Information Theory, pages 32–40, 1975.

[9] Yizong Cheng. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Analysis and Machine Intelligence, 17(8):790–799, 1995.

[10] D. Lowe. Distinctive image features from scale-invariant keypoints. In International Journal of Computer Vision, volume 20, pages 91–110, 2003.

[11] M. P. Wand. Data-based choice of histogram bin width. The American Statistician, 51(1):59, 1997.

[12] D. Scott. On optimal and data-based histograms. Number 66, 1979.

[13] A. J. Izenman. Recent developments in nonparametric density estimation. Volume 86, 1991.

[14] Michael J. Swain and Dana H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11–32, 1991.
