
T.-J. Cham et al. (Eds.): MMM 2007, LNCS 4351, Part I, pp. 166 – 175, 2007. © Springer-Verlag Berlin Heidelberg 2007

An Object Tracking Scheme Based on Local Density

Zhuan Qing Huang and Zhuhan Jiang

School of Computing and Mathematics, University of Western Sydney, NSW, 1797, Australia

[email protected], [email protected]

Abstract. We propose a method for tracking an object through a video sequence with a moving background, using the proximate distribution densities of local regions. The discriminating features of the object are extracted from a small neighborhood of the local region containing the tracked object. The object's location probability is estimated in a Bayesian framework, with the prior given by the approximated probabilities from the previous frame. The proposed method is both practical and general, since a great many video scenes fall into this category. For less-potent features, additional information such as motion is further integrated to help improve the estimation of the object's location probabilities. The non-statistical location of the object is then derived through thresholding and shape adjustment, and verified against the prior density of the object. The method is effective and robust to occlusion, illumination change, shape change and partial appearance change of the object.

Keywords: Object detection, object tracking, density estimation.

1 Introduction

Object tracking in video sequences is an important task in computer vision, with applications in a variety of fields such as surveillance, process control, medical treatment, machine intelligence and object copyright protection. The challenges are, firstly, that the appearance of the object changes continuously with pose, location, illumination and occlusion, as well as with internal changes of the object itself, so that an object model based on color or derived features may vary significantly throughout the video sequence. Secondly, the computational load is high and therefore critical for real-time applications. Fortunately, by the nature of the tracking process, many such applications need neither recognize the object precisely nor identify its shape accurately. Capturing the whereabouts of the object in the subsequent frames under different conditions is hence the main focus.

Though the appearance of an object may vary considerably within a video sequence, it bears only small changes between any two successive frames in most situations. Adaptive methods, such as the adaptive deformable template approach in the literature, have been adopted to improve tracking accuracy by updating the current information for the next frame. Features used for capturing an object are normally the
color, edge or shape, and those mathematically derived from them. In terms of color features, the algorithms include template matching, histogram matching and point matching [1,3,14-16]. In terms of shape features, there are active contour approaches, edge matching and deformable shape approaches [2,8-12]. Another important piece of information in video sequences is the temporal features, and many algorithms now integrate motion information to track an object [2,13]. However, purely motion-based approaches may fail when the object becomes static during certain time intervals. Paragios et al. [2] proposed a method that models the inter-frame difference data as a mixture density of motion and static components, and then integrates the motion information into geodesic active contour functions. The Condensation algorithm in [6] utilized "factored sampling" and learned dynamical models, together with the visual observations, to propagate the probability distribution through the frames. Color approaches, from early template matching or histogram matching to later probability estimation, have also been investigated. Comaniciu et al. [1] regularized the feature histogram-based target representation by spatially masking it with an isotropic kernel, so that the target localization problem becomes finding the local maxima of a similarity function.

To avoid intensive location searching or exhaustive pixel matching, we propose in this work a method that tracks the object in a video sequence based on the characteristics of the local region density. By approximating the local object and background densities, the object probability (referring to the probability of a pixel belonging to the object) is obtained within a Bayesian framework. The characteristics of the density features lead to a slight formula adjustment in the object density computation. This paper is organized as follows: Section 2 introduces the feature selection to be used in the object density estimation later on. Section 3 describes the tracking strategies based on the density feature. Implementation and experimental results are shown in Section 4, and finally Section 5 concludes the paper.

2 Feature Selection

Object features include color, shape, gradient and other derived properties. Some features may be more suitable than others for tracking, depending on the particular circumstances. In what follows, we investigate how to select proper object features to achieve better tracking performance.

2.1 Discriminating Feature

Significant features such as strong edges, smooth uniform color or regular texture are known to be good in general for capturing an object. On the other hand, the contrast of features between the object and the background is also important for tracking. It is naturally anticipated that, for effective tracking, an object feature that distinguishes itself from the background is a better choice than one that resembles the background. Object features that are similar to the background decrease the accuracy of capturing the object, and could even lead to a tracking
failure, whether or not the features are significant on their own. In a dull environment, a bright color of the object is a good choice for tracking, while in a bright environment a dark color of the object is more suitable. In this connection, a segmentation method [5] was proposed to extract bright targets filtered out by wavelet techniques, while a feature selection mechanism [4] was presented to evaluate features so as to improve tracking performance. In our work, we propose to select features for tracking based on the contrast of probability density between the object and its local background across different feature spaces. The main strategy is to make use of an optimal segment of the density distribution from different sources. The density segments sought for this purpose are used in the later steps, and the density of the object in the current frame is calculated from the approximate density of the previous frame.


Fig. 1. (a) Object and local background. (b) Object, local background and mixture density. (c) Difference of densities between object and local background.

In order to examine the local density, we first use a simple shape such as an ellipse to approximate the object, as shown in fig. 1(a) (or another shape, based on the characteristics of the object, that better covers it), while the local background is estimated on an annular area bounded by a larger ellipse. In fig. 1(b), the solid (blue) line is the object density pA(x) obtained from the smaller ellipse, the dotted (red) line is the local background density pB(x) obtained from the annular area, and the dashed (green) line represents the mixture density pM(x). The mixture density is the weighted sum of the object density and the local background density via

pM(x) = PA pA(x) + PB pB(x) ,  (1)

where x is the pixel intensity, and PA and PB are the weight values with PA + PB = 1. For two densities shaped as in fig. 1(b), where the object density and the local background density are well separated with some overlap, we propose to use an approximate density from the previous frame to obtain the object density in the subsequent frame, as described in detail in the next section. Here we describe only the initial processing that determines which density segment is to be selected, and how to tune the densities so that a low density in one feature can be complemented by a high density in another feature.
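The mixture in (1) can be sketched with normalized histograms standing in for the densities; the 8-bin values below are illustrative toy numbers, not from the paper's data:

```python
import numpy as np

def mixture_density(p_A, p_B, P_A=0.5):
    """Mixture density of eq. (1): pM(x) = PA*pA(x) + PB*pB(x)."""
    P_B = 1.0 - P_A               # the weights must sum to one
    return P_A * p_A + P_B * p_B

# Toy normalized 8-bin intensity histograms (illustrative values).
p_A = np.array([0.4, 0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0])  # object
p_B = np.array([0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.3, 0.1])  # local background
p_M = mixture_density(p_A, p_B, P_A=0.4)
```

Because pA and pB each sum to one, pM is again a valid density for any choice of PA in [0, 1].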

Let us define the difference of the two densities by f(x) = pA(x) - pB(x). A density segment is then suitable for object tracking if f(x) is sufficiently large there. If both pA(x) and pB(x) are very small on a segment, then the segment is not suitable for
detecting the object's presence, because it does not have enough discriminating power and the result would be unpredictable. We can measure the suitability by the relative value f(x)/pA(x), where a larger value corresponds to better tracking performance. If the density is approximated by a Gaussian distribution, we can also measure the performance by the model parameters. To reduce the approximation complexity, we can simply use the area centre of the positive difference f(x) as a landmark: the higher the centre is located, the larger the difference between object and local background.
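One possible reading of these two measures, sketched in NumPy; the function names, and the use of the vertical centroid of the area under the positive part of f(x) as the "area centre", are our own interpretation rather than the paper's exact definitions:

```python
import numpy as np

def suitability(p_A, p_B, eps=1e-12):
    """Relative difference f(x)/pA(x); larger values mean the bin
    discriminates the object from the background better."""
    return (p_A - p_B) / np.maximum(p_A, eps)

def positive_centre_height(p_A, p_B):
    """Height of the area centroid of the positive part of
    f(x) = pA(x) - pB(x); a higher centre suggests stronger
    object/background separation."""
    f = np.clip(p_A - p_B, 0.0, None)
    area = f.sum()
    if area == 0.0:
        return 0.0
    return float((f ** 2 / 2.0).sum() / area)

p_A = np.array([0.5, 0.3, 0.2, 0.0])   # illustrative object density
p_B = np.array([0.1, 0.1, 0.3, 0.5])   # illustrative background density
s = suitability(p_A, p_B)
h = positive_centre_height(p_A, p_B)
```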

2.2 Optimal Feature Density

The object densities from different feature spaces may have different characteristics. In most cases they will not all be the same; otherwise the object and its local background would be almost identical. Our goal is to find the candidate density with the maximum difference between the object and its local background; in other words, we choose the feature that best distinguishes the object from its local background. A density may be derived from the RGB space, the HSV space, or from other properties such as smoothness. The total difference between the object density and the local background density over the positive portion can be calculated as Ω+ = ∫f(x)>0 f(x) dx. We can compare Ω+, as well as the area centre of the positive part of f(x), among several different feature spaces, and choose the feature with the larger Ω+ and the higher centre point. For instance, the densities of object and background may overlap too much in RGB space but separate well in the saturation channel of HSV space, as shown in fig. 3(b) and (c); similarly, the densities in hue space can separate better than those in intensity, as shown in fig. 2(a), (c).
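On discrete histograms Ω+ reduces to a sum over the bins where f(x) > 0, so the feature comparison can be sketched as below; the channel names and histogram values are illustrative assumptions:

```python
import numpy as np

def omega_plus(p_A, p_B):
    """Omega+ = integral of f(x) over {f(x) > 0}, here a bin sum."""
    return float(np.clip(p_A - p_B, 0.0, None).sum())

def select_feature(obj_hists, bg_hists):
    """Pick the feature space whose object density differs most
    from the local background, as measured by Omega+."""
    scores = {k: omega_plus(obj_hists[k], bg_hists[k]) for k in obj_hists}
    return max(scores, key=scores.get), scores

obj = {"hue":        np.array([0.7, 0.3, 0.0, 0.0]),
       "saturation": np.array([0.4, 0.3, 0.2, 0.1]),
       "intensity":  np.array([0.25, 0.25, 0.25, 0.25])}
bg  = {"hue":        np.array([0.0, 0.1, 0.4, 0.5]),
       "saturation": np.array([0.3, 0.3, 0.2, 0.2]),
       "intensity":  np.array([0.25, 0.25, 0.25, 0.25])}
best, scores = select_feature(obj, bg)   # hue wins here by construction
```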

2.3 Complementary Feature Densities

As shown in fig. 1(c), the leftmost part of the object density from intensity is well separated from the local background density, yet the remaining part on the right will result in weaker object detection or a missing area. To better represent the object, we can combine the optimal parts of densities from different feature spaces, which may complement each other for a fuller object representation.

Though all densities exhibiting a large difference can be used, and using more densities may increase the accuracy, it also increases the computation unnecessarily if fewer suffice for the tracking purpose. One reason for looking into other feature densities is that the selected object density may have a substantial part overlapping with the background density, where the difference between them is not large, meaning some area would be missing from the object representation. Therefore, the size of the density overlap and the extent of the difference can be used to decide whether to utilize an additional feature property. In fig. 1 and fig. 2, for instance, we can observe a portion of the object density on the right that overlaps in fig. 1(b), and the difference of this part is small in fig. 1(c). We also observe that the object has another property, its smoothness, as shown in fig. 2(g). This indicates that we can combine these two sets of densities to better represent the object.


3 Proximate Density Approaches

Object tracking is conducted according to the features of the object. For an object whose feature density exhibits good discrimination, we propose to use the local densities of the object and its surrounding background for tracking; the method uses the proximate density based on the Bayesian rule. For non-discriminative densities, other information such as motion may be needed to improve the estimation.

3.1 Discriminative Density

Bayesian inference is a method in which the latest evidence or observation is utilized to update, or to newly infer, the probability that a hypothesis is true. Given a complete set of n+1 mutually exclusive hypotheses Hi, the estimated prior probability p(Ho) of a hypothesis Ho can be improved into the posterior probability p(Ho|D) if a set of additionally observed data D is used to further refine the probability estimation. More precisely, the posterior probability p(Ho|D) of the hypothesis Ho can be calculated from the prior probability of the hypothesis and the probabilities of the observed data under the different hypotheses:

p(Ho|D) = p(D|Ho) p(Ho) / Σi=0…n p(D|Hi) p(Hi) ,  (2)

where p(D|Hi) is the likelihood of the hypothesis Hi under the observed data D, p(Hi) is the prior probability of the hypothesis Hi. The denominator is essentially a normalization factor. According to the Bayesian rule, the object probability can be expressed as

pt(A|x) = pt(x|A) p(A) / [ pt(x|A) p(A) + pt(x|B) p(B) ] ,  (3)

where A and B denote that the current pixel of value x belongs to the object and the background respectively, p(x|A) is the likelihood of the current pixel of value x given that it belongs to the object, and p(A) and p(B) are the prior probabilities, estimated before inspecting the actual pixel value, of the pixel belonging to the object and the background respectively. In general we do not know the exact prior probability of a pixel being on the object or the background, so we assume they are constant. We also do not know the exact object and background densities at the current time t, but we know the densities, especially the object density, will be very close to those in the previous frame at time t-1, which we already know. So we can use pt-1(x|A) and pt-1(x|B) in (3) to calculate the approximate object density over the current frame. In fact, additional simplification of the formula can be made for the actual calculation. The advantage of this method over direct density matching is that it requires less iterative searching. The local object density will in general fall into this category when the local region excludes most of the background via its bounding ellipse or other shape.
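A per-pixel sketch of (3), using the previous frame's histograms as the likelihoods under the constant-prior assumption; the bin count and variable names are our own choices:

```python
import numpy as np

def object_probability(frame, p_prev_A, p_prev_B, prior_A=0.5):
    """Eq. (3): posterior pt(A|x) per pixel, substituting pt-1(x|A)
    and pt-1(x|B) for the unknown current densities."""
    prior_B = 1.0 - prior_A
    bins = np.clip(frame, 0, len(p_prev_A) - 1).astype(int)
    num = p_prev_A[bins] * prior_A
    den = num + p_prev_B[bins] * prior_B
    return np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)

# Illustrative setup: the object was dark, the background bright.
p_prev_A = np.zeros(256); p_prev_A[:64] = 1.0 / 64
p_prev_B = np.zeros(256); p_prev_B[128:] = 1.0 / 128
frame = np.array([[10, 200], [30, 150]])
post = object_probability(frame, p_prev_A, p_prev_B)
```

Dark pixels get posterior 1 and bright pixels 0 here, since the supports of the two stored densities do not overlap in this toy case.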


For the use of multiple feature densities, the object density from the combined features is

p(A|x) = Σi=1…n λi pi(A|x) ,  Σi=1…n λi = 1 ,  (4)

where n is the number of feature densities and λi ≥ 0 is the weight factor. This results in a more complete coverage of the object. For rough tracking, one distinguishing feature is enough; nevertheless, the multiple-density approach increases the robustness of tracking, with less impact from sudden appearance changes of the object.
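The weighted combination in (4) is a one-liner once the per-feature posteriors are available; normalizing the weights enforces Σλi = 1 (a minimal sketch with made-up numbers):

```python
import numpy as np

def combine_posteriors(posteriors, weights):
    """Eq. (4): p(A|x) = sum_i lambda_i * p_i(A|x), sum lambda_i = 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # enforce the weight constraint
    return sum(wi * p for wi, p in zip(w, posteriors))

p_hue = np.array([0.9, 0.1])             # posterior from hue (illustrative)
p_sat = np.array([0.5, 0.7])             # posterior from saturation
combined = combine_posteriors([p_hue, p_sat], [2.0, 2.0])
```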

3.2 Non-discriminative Density

When the object and background densities are not well separated from each other, but the difference over the main part of the density is large, the method above is still applicable. If, however, these densities are flat and/or quite close to each other, in other words the object is very similar to its local background with regard to the selected feature, the above method alone will not be sufficient to extract the object. One piece of information not used in the above density approach is the pixel spatial relationship; yet establishing the spatial relationship based on the object location often leads to iterative location searching. There is, however, another piece of information that can be exploited: the motion information. For a static background, we can estimate the motion from the difference data of successive frames. Similar to [2], we model the frame-difference data D'(s, t) = x(s, t) - x(s, t-1) (where s is the pixel location) as a mixture of the background density pb(d) and the motion density pm(d), similar to (1), and then determine the model parameters by maximizing the joint density with the maximum likelihood method. We then integrate the motion density into the calculation of the object probability as follows:

pt(A|x) ≈ pt-1(x|A) p(A) pm(d(x)) / [ pt-1(x|A) p(A) pm(d(x)) + pt-1(x|B) p(B) pb(d(x)) ] .  (5)

The object probability obtained this way is more distinguishable from the local background than that obtained without the motion information. For moving-background sequences, the motion information may be obtained by other methods.
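Equation (5) simply multiplies each likelihood term of (3) by the motion or background density evaluated at the frame difference d(x); one way to sketch it, with all inputs per pixel and names of our own choosing:

```python
import numpy as np

def posterior_with_motion(lik_A, lik_B, p_m_d, p_b_d, prior_A=0.5):
    """Eq. (5): colour likelihoods weighted by the motion density
    pm(d(x)) for the object term and pb(d(x)) for the background."""
    prior_B = 1.0 - prior_A
    num = lik_A * prior_A * p_m_d
    den = num + lik_B * prior_B * p_b_d
    return np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)

# Two pixels with identical, non-discriminative colour likelihoods:
lik_A = np.array([0.4, 0.4]); lik_B = np.array([0.4, 0.4])
p_m_d = np.array([0.9, 0.1])   # motion density at each pixel's d(x)
p_b_d = np.array([0.1, 0.9])   # background density at each d(x)
post = posterior_with_motion(lik_A, lik_B, p_m_d, p_b_d)
```

Colour alone would give 0.5 everywhere; the motion term pushes the moving pixel toward 1 and the static one toward 0.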

3.3 Object Shape Refinement

The region an object occupies is obtainable by thresholding the location probability in the local area while ensuring the coherence of its neighborhood, with techniques such as region growing and region consolidation [6]. The probabilities may not always represent the full object shape, as parts of the object very close to the background may not be well represented by the probability distribution. Such a problem can be largely alleviated by projecting the object ellipse from the previous frame onto the
current region. Next we compare the object density with the one in the previous frame; if the error is smaller than a permitted threshold, the local region is defined. The background density cannot play this role, since the background may vary greatly when motion is large. The ellipse needs to adapt as the object moves throughout the sequence; the criterion for fitting an ellipse onto an object is to ensure that the ellipse differs from the object region as little as possible.
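A moment-based ellipse fit is one common way to realize this adaptation step; the sketch below (thresholding plus second-order moments, our choice rather than the paper's exact procedure) recovers centre, axes and orientation from a binary region:

```python
import numpy as np

def fit_ellipse_moments(mask):
    """Fit an ellipse to a binary region via its second-order moments:
    centre = centroid, axes from the eigenvalues of the coordinate
    covariance, orientation from the principal eigenvector."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    cov = np.cov(np.stack([xs - cx, ys - cy]))
    evals, evecs = np.linalg.eigh(cov)              # ascending order
    axes = 2.0 * np.sqrt(np.maximum(evals, 0.0))    # semi-axis estimates
    angle = float(np.arctan2(evecs[1, -1], evecs[0, -1]))
    return (float(cx), float(cy)), axes, angle

prob_map = np.zeros((20, 20))
prob_map[5:9, 2:14] = 0.8          # a wide, flat high-probability blob
mask = prob_map > 0.5              # threshold the location probability
centre, axes, angle = fit_ellipse_moments(mask)
```

For this blob the recovered centre is (7.5, 6.5) and the major axis is the horizontal one, matching the region's shape.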

4 Implementation and Experiments

The following experiments illustrate the effect of density selection, multiple densities, and non-discriminative densities on tracking an object. One video sequence is shown in fig. 1(a) and another in fig. 4(a). The results are explained and discussed below.

First we inspect the effect of the density shape difference. For this purpose we examine a penguin sequence, also used in fig. 5. We observe that the hue densities in fig. 2(a) are better separated than the saturation and intensity densities depicted in fig. 2(b) and fig. 2(c) respectively, and the resulting object probability is better distinguished from its local background, as shown in fig. 2(d), (e) and (f). Thus hue is the better feature to use with the local proximate density method to capture the object in this sequence.


Fig. 2. (a) Hue density. (b) Saturation density. (c) Intensity density. (d) Object probability by hue. (e) Object probability by saturation. (f) Object probability by intensity. (g) Smoothness. (h) Object probability by smoothness. (i) Object probability by intensity and smoothness.



Fig. 3. (a) Hue density. (b) Saturation density. (c) Intensity density. (d) Object probability by hue. (e) Object probability by saturation. (f) Object probability by intensity. (g) Object probability by hue and saturation. (h) Object probability by hue and intensity. (i) Object probability by saturation and intensity.


Fig. 4. Non-discriminating density tracking

Additional experiments combining multiple features are conducted for the second sequence (see fig. 4(a)), as shown in fig. 3. From the densities in fig. 3(a) to (c), we observe that saturation separates the object from its local background better than the other two features, and the corresponding object probabilities in fig. 3(d) to (f) demonstrate this. The multiple-density approach is shown in fig. 3(g), (h) and (i); combining hue and saturation via (4) improves the results, as shown in fig. 3(g). This outcome is consistent with the observed density distributions. We note that other features or properties can play a similar role. For example, in fig. 2, the density distribution in fig. 2(g) shows that the object is fairly uniform in its color, and combining it with the intensity density yields a better result in fig. 2(i) than intensity alone in fig. 2(f). This shows that taking the smoothness (standard deviation) of color into consideration enhances the object probability.


Fig. 5. Tracking penguin

Fig. 6. Tracking a vehicle

We now experiment with non-discriminative densities such as those in fig. 3(c), where the object and its local background densities do not separate well. For this purpose we examine a video sequence starting from the image in fig. 4(a). We model the motion density and background density of the frame-difference data as Gaussians, and use the Expectation-Maximization algorithm to calculate the model parameters, with the resulting distributions depicted in fig. 4(b). We then use the motion information and apply (5) to calculate the object probability; the result is shown in fig. 4(c). Comparing it with fig. 3(f), which uses only the intensity probability, we see that the additional motion information greatly enhances the performance for the non-discriminating intensity density.
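The two-component EM fit described above can be sketched as a plain 1-D Gaussian-mixture EM; the initialization and iteration count below are our own choices, not from the paper:

```python
import numpy as np

def em_two_gaussians(d, iters=50):
    """EM for a 1-D two-Gaussian mixture of frame-difference data:
    a narrow component for the static background and a broader
    one for the moving object."""
    d = np.asarray(d, dtype=float)
    mu = np.array([0.0, d.mean()])
    var = np.array([0.1 * d.var() + 1e-6, d.var() + 1e-6])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: per-point responsibilities of the two components
        pdf = (pi / np.sqrt(2 * np.pi * var)) * \
              np.exp(-0.5 * (d[:, None] - mu) ** 2 / var)
        r = pdf / pdf.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0)
        pi = nk / len(d)
        mu = (r * d[:, None]).sum(axis=0) / nk
        var = (r * (d[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return pi, mu, var

rng = np.random.default_rng(0)
d = np.concatenate([rng.normal(0.0, 1.0, 800),    # static background
                    rng.normal(10.0, 3.0, 200)])  # moving region
pi, mu, var = em_two_gaussians(d)
```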

We now apply our scheme to a complete penguin sequence to assess the effectiveness of the tracking. Fig. 5 depicts the tracking through the video sequence, where the red dots on the penguin indicate the tracking results; on black-and-white prints this reddish shade is not easy to observe and instead looks like a dot grid on the penguin. Finally, fig. 6 illustrates the proposed method applied to a video sequence with a moving background: the tracked object is the vehicle, and the region obtained through the tracking is depicted in green dots, which fall on the red vehicle as expected.

5 Conclusion

We have proposed a fast object tracking method for video sequences based on local feature densities. The local features are properly selected, and the current local densities are approximated by the densities from the previous frame. The object's location probability is calculated within a Bayesian framework, and motion information is used to compensate for non-discriminating features. The method is efficient and robust for most tracking scenarios, and the experiments demonstrate its efficiency in that it does not involve the complicated modeling and heavy computation that many iterative searching algorithms do.


References

1. Comaniciu, D., Ramesh, V. and Meer, P.: Kernel-Based Object Tracking, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 25, No. 5, (2003) 564-577.

2. Paragios, N. and Deriche, R.: Geodesic Active Contours and Level Sets for the Detection and Tracking of Moving Objects, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.22, No.3, (2000) 266-280.

3. Sheikh, Y. and Shah, M.: Bayesian Modeling of Dynamic Scenes for Object Detection, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 27, No. 11, (2005) 1778-1792.

4. Collins, R.T., Liu, Y. and Leordeanu, M.: Online Selection of Discriminative Tracking Features, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, (2005) 1631-1642.

5. Zhang, X.P. and Desai, M.D.: Segmentation of Bright Targets Using Wavelets and Adaptive Thresholding, IEEE Trans. on Image Processing, Vol. 10, No. 7, (2001) 1020-1030.

6. Isard, M. and Blake, A.: CONDENSATION-Conditional density propagation for visual tracking, Int. J. Comput. Vis., Vol. 29(1), (1998) 5-28.

7. Mansouri, A.R.: Region tracking via level Set PDEs with Motion Computation, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No.7, (2002) 947-967.

8. Yusuf, A.S. and Kambhamettu, C.: A Coarse-to-Fine Deformable Contour Optimization Framework, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 25(2), (2003) 174-186.

9. DeCarlo, D. and Metaxas, D.: Adjusting Shape Parameters Using Model-Based Optical Flow Residuals, IEEE Trans. Pattern Analysis and Machine Intelligence. Vol. 24(6), (2002) 814-823.

10. Shen, D. and Davatzikos, C.: An Adaptive-Focus Deformable Model Using Statistical and Geometric Information, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22(8), (2000) 906-913.

11. Mukherjee, D.P., Ray, N. and Acton, S.T.: Level Set Analysis for Leukocyte Detection and Tracking, IEEE Trans. on Image Processing, Vol. 13(4), (2004) 562-572.

12. Huang, Z.Q. and Jiang, Z.: Tracking Camouflaged Objects with Weighted Region Consolidation, Proceedings of Digital Image Computing: Techniques and Application, (2005) 161-168.

13. Jepson, A.D., Fleet, D.J. and El-Maraghi, T.F.: Robust Online Appearance Models for Visual Tracking, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 25, No. 10, (2003) 1296-1311.

14. Elgammal, A., Duraiswami, R. and Davis, L.S.: Efficient Kernel Density Estimation Using the Fast Gauss Transform with Applications to Color Modeling and Tracking, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 25, No. 11, (2003) 1499-1504.

15. Comaniciu, D. and Meer, P.: Mean Shift: A Robust Approach Toward Feature Space Analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, (2002) 603-619.

16. Perez, P., Hue, C., Vermaak, J. and Gangnet, M.: Color-Based Probabilistic Tracking, Proc. European Conf. Computer Vision, Vol. I, (2002) 661-675.

