OpenCV Object Tracking

OBJECT TRACKING
TSBB13 Computer Vision System

Fredrik Lundell, Andreas Wallin, Andreas Vikman

Wednesday 28th April, 2010

Contents

1 Introduction
  1.1 The dataset
  1.2 Generalization and Assumptions

2 Methods
  2.1 Background modeling
    2.1.1 Median filter
    2.1.2 Gaussian mixture model
  2.2 Foreground segmentation
    2.2.1 Morphological operations
    2.2.2 Segmentation with Graph Cuts
  2.3 Identity
  2.4 Object identity over time

3 Results
  3.1 Detection
    3.1.1 Background model
    3.1.2 Segmentation
  3.2 Tracking
    3.2.1 Identity
    3.2.2 Identity over time

4 Conclusion

References
  .1 Code documentation

Chapter 1

Introduction

The aim of this project is to implement an object tracking system for the course Computer Vision Systems at Linköping University. The problem can be broken down into two major components: detection and tracking. Detection involves separating the foreground from the background, segmenting the foreground, and giving each detected object an identity. Tracking refers to maintaining that identity over time.

To realise such a system, techniques such as background modeling, graph cuts, contour extraction and temporal filtering are used.

Furthermore, one important criterion is that the implemented system performs in real time, or as near real time as possible. To that end, the chosen implementation language is C/C++, complemented by an open source computer vision library (OpenCV 2.0¹).

1.1 The dataset

The datasets used in the project come from the following sources:

• CAVIAR²: EC Funded CAVIAR project/IST 2001 37540

• ViSOR³: Video Surveillance Online Repository

1.2 Generalization and Assumptions

To simplify the task of object tracking, a number of assumptions have been made:

• The camera is assumed to be static, i.e. no ego-motion is present. All observed movement therefore comes from the scene and not from the camera itself.

• The background is mostly static. "Mostly" here means that the background consists of simple geometries, such as the inside of a building or a road, which do not change significantly over time (disregarding changes in lighting conditions).

• All objects in a scene move at a similar velocity across the scene.

¹ http://opencv.willowgarage.com/wiki/
² http://homepages.inf.ed.ac.uk/rbf/CAVIAR/
³ http://www.openvisor.org

Chapter 2

Methods

2.1 Background modeling

By removing the background from a scene, movement over time can be detected. However, accurately determining what is background and what is foreground in a scene is not a simple task. Noise, differing lighting conditions (a dark vs. a lit room) and false positives (trees moving in the wind) all make the task harder. The simplest method would be to take a single image as a background reference and subtract it from each frame. This approach, however, only works on scenes where most parameters are known and can be controlled, which makes it unsuitable for most real-world situations. A more robust way of describing the background is therefore needed.

2.1.1 Median filter

Assuming that the background is visible more often than the moving objects, the background can be modeled as the temporal median intensity of each pixel [3]. The method can be modified to only increment or decrement the median by a chosen step α for each pixel, depending on whether the current pixel intensity is larger or smaller than the median value, as outlined in [5]. This removes the need to explicitly calculate the median for each pixel in every frame.

Median filtering is a better approach than a static background image, but a poorly chosen α will either incorporate slow-moving objects into the background or leave trailing ghosts in the foreground. See Figure 3.1 for an example.
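The incremental update described above can be sketched in a few lines of plain C++. This is a minimal illustration and not the project's updateMedian implementation: the function name, the flat pixel buffers and the float storage type are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Approximate temporal median: nudge each background estimate one step
// of size alpha towards the current frame's intensity.
// (Function and parameter names are illustrative, not from the project code.)
void updateApproxMedian(std::vector<float>& median,
                        const std::vector<unsigned char>& frame,
                        float alpha) {
    assert(median.size() == frame.size());
    for (std::size_t i = 0; i < median.size(); ++i) {
        const float pixel = static_cast<float>(frame[i]);
        if (pixel > median[i]) {
            median[i] += alpha;   // frame is brighter: raise the estimate
        } else if (pixel < median[i]) {
            median[i] -= alpha;   // frame is darker: lower the estimate
        }                         // equal: leave the estimate unchanged
    }
}
```

With a small α the estimate moves slowly, so a passing object barely disturbs it, while a large α lets the estimate follow the frame quickly, which is the trade-off discussed in the paragraph above.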

2.1.2 Gaussian mixture model

An even more appealing approach is to model the background as a set of Gaussian distributions in each pixel, describing in each frame which object is closest to the camera. This approach also lends itself to a multi-modal interpretation of the background, where several such distributions can be considered background. This potentially solves the problem of periodically recurring movement, such as leaves moving in the wind [5].
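To illustrate the multi-modal idea, the sketch below tests whether a pixel intensity is explained by any sufficiently dominant Gaussian in a per-pixel mixture. It is not part of the implemented system, and it omits the update of the mixture over time. The struct, the function name and the two thresholds (2.5 standard deviations and a minimum weight of 0.25) are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One component of a per-pixel mixture (names are illustrative).
struct Gaussian {
    double mean;
    double variance;
    double weight;   // share of recent frames explained by this component
};

// A pixel counts as background if it lies close to the mean of any
// component whose weight is large enough to be considered background.
// maxSigmas and minWeight are assumed example values.
bool isBackground(double intensity, const std::vector<Gaussian>& mixture,
                  double maxSigmas = 2.5, double minWeight = 0.25) {
    for (const Gaussian& g : mixture) {
        assert(g.variance > 0.0);
        const bool closeEnough =
            std::fabs(intensity - g.mean) < maxSigmas * std::sqrt(g.variance);
        if (g.weight >= minWeight && closeEnough) {
            return true;
        }
    }
    return false;
}
```

A pixel that alternates between two intensities, such as a leaf swaying in front of a wall, would be covered by two dominant components and classified as background in both states.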

2.2 Foreground segmentation

Modeling the background is an important step in separating the foreground from the background. However, we need additional segmentation methods to separate objects from noise and unwanted artifacts.

2.2.1 Morphological operations

One approach to image segmentation is to use fundamental morphological operations. Dilation and erosion are the basic morphological operations, defined as logical decisions made as a structuring element slides over the image. Dilation acts like a union operator and causes objects to grow in size. Erosion acts like an intersection operator and causes objects to shrink. They can be used independently or combined to remove noise and artifacts and to isolate individual objects in a local neighborhood.
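As a small illustration of how erosion and dilation combine, the following sketch applies a 3x3 square structuring element to a binary mask and removes isolated noise by eroding and then dilating (an opening). The Mask type, the function names and the choice of a 3x3 element are assumptions for the example; the project's morphSegment function may work differently.

```cpp
#include <cassert>
#include <vector>

using Mask = std::vector<std::vector<int>>;  // 0 = background, 1 = foreground

// Applies a 3x3 square structuring element. Pixels outside the image count
// as background. dilate == true gives dilation, false gives erosion.
Mask morph3x3(const Mask& in, bool dilate) {
    const int rows = static_cast<int>(in.size());
    const int cols = rows > 0 ? static_cast<int>(in[0].size()) : 0;
    Mask out(rows, std::vector<int>(cols, 0));
    for (int r = 0; r < rows; ++r) {
        assert(static_cast<int>(in[r].size()) == cols);  // rectangular mask
        for (int c = 0; c < cols; ++c) {
            bool any = false;   // union over the neighbourhood (dilation)
            bool all = true;    // intersection over the neighbourhood (erosion)
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr;
                    const int cc = c + dc;
                    const bool set = rr >= 0 && rr < rows &&
                                     cc >= 0 && cc < cols && in[rr][cc] != 0;
                    any = any || set;
                    all = all && set;
                }
            }
            out[r][c] = (dilate ? any : all) ? 1 : 0;
        }
    }
    return out;
}

Mask erodeMask(const Mask& m)  { return morph3x3(m, false); }
Mask dilateMask(const Mask& m) { return morph3x3(m, true); }

// Opening: erosion removes small specks, dilation restores what survived.
Mask openMask(const Mask& m) { return dilateMask(erodeMask(m)); }
```

Applied to a mask containing a solid 3x3 object and a single stray pixel, openMask keeps the object and removes the stray pixel.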

2.2.2 Segmentation with Graph Cuts

Graph cuts [2] is a combination of existing image segmentation methods and simple machine learning techniques. Unlike morphological operations, graph cuts guarantees a global solution: it finds an optimal (minimal) cut which divides the foreground from the background. A graph is constructed with one node for each pixel in the image. Nodes are connected by edges weighted with some measure of similarity. Each node is also connected to a source node and a sink node, which represent the objects and the background respectively. The structure of the graph is illustrated in Figure 2.1.

Figure 2.1: Graph structure

The edge weights between nodes should be tuned so that similar pixels close to each other end up in the same object. In this implementation the similarity measure is based on the image intensity. The edge weights are given in equation 2.1, where I denotes the intensity at a node and σ the standard deviation.

$$W_{ij} = e^{-\frac{(I(i) - I(j))^2}{\sigma}} \tag{2.1}$$

The source and sink weights describe how likely it is that a node belongs to the foreground or the background. These weights are given in equations 2.2 and 2.3, where $I_F$ and $I_B$ denote the intensity of the foreground and the background respectively.

$$W_{si} = \frac{(I_F - I_i)}{(I_F - I_i) + (I_B - I_i)} \tag{2.2}$$

$$W_{ti} = \frac{(I_B - I_i)}{(I_F - I_i) + (I_B - I_i)} \tag{2.3}$$

With this setup the cost of the cut is very high inside objects of interest and low around the border of the objects. The MAXFLOW [1] algorithm calculates the minimal cut which separates the objects from the background.
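The weights in equations 2.1 to 2.3 can be computed as in the sketch below. Two points are assumptions made for this example and are not stated in the equations: the terminal weights use absolute differences so that they stay non-negative, and the degenerate case where both differences are zero falls back to equal weights. The function and struct names are illustrative, and the graph construction and the max-flow computation itself are not shown.

```cpp
#include <cassert>
#include <cmath>

// Equation 2.1: similarity weight between two connected pixels i and j.
double nLinkWeight(double Ii, double Ij, double sigma) {
    assert(sigma > 0.0);
    const double d = Ii - Ij;
    return std::exp(-(d * d) / sigma);
}

struct TerminalWeights {
    double source;  // W_si, equation 2.2
    double sink;    // W_ti, equation 2.3
};

// Equations 2.2 and 2.3. ASSUMPTION: absolute differences are used here,
// which the printed equations do not show, to keep both weights in [0, 1].
TerminalWeights tLinkWeights(double Ii, double IF, double IB) {
    const double dF = std::fabs(IF - Ii);
    const double dB = std::fabs(IB - Ii);
    const double sum = dF + dB;
    if (sum == 0.0) {
        return {0.5, 0.5};  // assumed fallback when IF == IB == Ii
    }
    return {dF / sum, dB / sum};
}
```

For identical neighbouring intensities nLinkWeight returns 1, which makes cutting between them expensive, and the weight decays towards 0 as the intensity difference grows.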

2.3 Identity

When the foreground has been predicted, the pixels belonging to it are grouped into individual objects and each object is assigned a label. OpenCV provides a function for finding contours in a binary image, cvFindContours. It creates a vector of vectors containing the point coordinates of each contour. The function uses an algorithm described in [4] to detect the contours. To visualize the identity of tracked objects, a rectangular bounding box is rendered around the points provided by cvFindContours.
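Given the points of one contour, the surrounding box is simply the extent of the points in x and y. The sketch below uses its own Point and Rect structs rather than OpenCV types so that it stands alone; these names, and the convention of counting both end pixels in the width and height, are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point { int x; int y; };
struct Rect  { int x; int y; int width; int height; };

// Smallest axis-aligned rectangle containing every contour point.
// Width and height include both end pixels, hence the "+ 1".
Rect boundingBox(const std::vector<Point>& contour) {
    assert(!contour.empty());
    int minX = contour[0].x, maxX = contour[0].x;
    int minY = contour[0].y, maxY = contour[0].y;
    for (const Point& p : contour) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    return {minX, minY, maxX - minX + 1, maxY - minY + 1};
}
```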

2.4 Object identity over time

The identity of an object is maintained by calculating the intersection, and hence the area overlap, of the bounding boxes found in the current and the previous frame. If the intersection area is above a certain threshold, the previous identity is carried over to the new box. Rectangles which do not satisfy the overlap condition are given a new unique identity.
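A minimal sketch of this matching step follows. The text does not say what the intersection area is measured against, so dividing by the area of the smaller box is an assumption made here, as are the Box struct, the function names and the default threshold of 0.4.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Box { int x; int y; int w; int h; int id; };

// Intersection area divided by the area of the smaller box.
// ASSUMPTION: this normalisation is chosen for the example only.
double overlapRatio(const Box& a, const Box& b) {
    const int ix = std::max(0, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const int iy = std::max(0, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const double inter = static_cast<double>(ix) * iy;
    const double smaller = std::min(static_cast<double>(a.w) * a.h,
                                    static_cast<double>(b.w) * b.h);
    return smaller > 0.0 ? inter / smaller : 0.0;
}

// Each current box inherits the id of the best-overlapping previous box,
// or receives a fresh id if no overlap exceeds minOverlap.
void assignIds(const std::vector<Box>& previous, std::vector<Box>& current,
               int& nextId, double minOverlap = 0.4) {
    assert(minOverlap > 0.0);
    for (Box& c : current) {
        double best = minOverlap;
        c.id = -1;
        for (const Box& p : previous) {
            const double ratio = overlapRatio(p, c);
            if (ratio > best) {
                best = ratio;
                c.id = p.id;
            }
        }
        if (c.id < 0) {
            c.id = nextId++;
        }
    }
}
```

Note that nothing in this scheme prevents two current boxes from inheriting the same previous identity, and a box that moves further than its own extent between frames always receives a new one.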

Chapter 3

Results

3.1 Detection

3.1.1 Background model

The background is modeled by the median filtering approach described earlier. A crude but effective ramp-up of the variable α is performed to quickly reach a state where the median background is usable.

A large α, e.g. 30, is used as the starting point for the first n, e.g. 20, frames. Then, after each block of n frames, both α and n are halved until α reaches 0.5, where it remains constant.
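One way to read this ramp-up is sketched below: the step size is held for a block of frames, then both the step size and the block length are halved, and the step size is clamped at its final value. The function name, the clamping of the block length to at least one frame and the clamping of α to exactly 0.5 are assumptions for the example, since repeated halving of 30 does not land on 0.5.

```cpp
#include <algorithm>
#include <cassert>

// Step size alpha to use for a given zero-based frame index.
// Defaults follow the example values in the text (30, 20, 0.5).
double alphaForFrame(int frame, double startAlpha = 30.0,
                     int startInterval = 20, double finalAlpha = 0.5) {
    assert(frame >= 0 && startInterval > 0);
    double alpha = startAlpha;
    int interval = startInterval;
    int boundary = interval;  // first frame index after the current block
    while (alpha > finalAlpha && frame >= boundary) {
        alpha = std::max(alpha / 2.0, finalAlpha);  // halve, but never below final
        interval = std::max(interval / 2, 1);       // halve, at least one frame
        boundary += interval;
    }
    return alpha;
}
```

Under this reading the schedule is 30 for frames 0 to 19, 15 for frames 20 to 29, 7.5 for frames 30 to 34, and so on, reaching 0.5 at frame 39.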

The choice of α depends on the apparent velocity of moving objects in a scene. In cars.wmv from the ViSOR dataset, an α of 0.5 is clearly too small at certain times, as can be seen in Figure 3.1.

Figure 3.1: Left: Absolute difference of median. Right: Result of the median failing to keep the background intact. cars.wmv, ViSOR

3.1.2 Segmentation

The Graph cut method used for foreground and background segmentation proved to be effective but rather slow. One constraint that Graph cuts made impossible to meet was performing all calculations in real time. We solved this problem by downsampling the image to half its size and using that as the input for the graph. This made it possible to run the algorithm in almost real time without any major loss in quality. To maintain object solidity we had to force the algorithm to deal only with rough structures. This resulted in better segmentations in most cases, but limited the ability to detect small moving objects.
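Halving the image size reduces the number of graph nodes to a quarter. The sketch below shows one simple way to do this, by averaging each 2x2 block of pixels. How the project actually downsamples is not stated, so the averaging scheme, the Image type and the function name are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<int>>;  // grey-level intensities

// Halves width and height by averaging each 2x2 block (integer division).
// A trailing odd row or column is dropped.
Image downsampleHalf(const Image& in) {
    assert(!in.empty());
    const std::size_t rows = in.size() / 2;
    const std::size_t cols = in[0].size() / 2;
    Image out(rows, std::vector<int>(cols, 0));
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const int sum = in[2 * r][2 * c] + in[2 * r][2 * c + 1] +
                            in[2 * r + 1][2 * c] + in[2 * r + 1][2 * c + 1];
            out[r][c] = sum / 4;
        }
    }
    return out;
}
```

The averaging also smooths away fine detail, which is consistent with the observation above that small moving objects become harder to detect.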

Figure 3.2: Left: Absolute difference of median. Right: Result of Graph cut segmentation. cars.wmv, ViSOR

Figure 3.3: Left: Absolute difference of median. Right: Result of Graph cut segmentation. Note that it removes the small object to the left. Walk3.mpg, CAVIAR

3.2 Tracking

3.2.1 Identity

The cvFindContours function in OpenCV works well for identifying the tracked objects. It gives a good prediction of which pixels belong together. The shape of a contour can change considerably between frames, though, causing the bounding box to expand and shrink rapidly. Interpolating between the current and the previous frame reduces this behaviour, and the result could be improved further by taking more than two frames into consideration.
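The interpolation between two frames can be sketched as a linear blend of the box coordinates. The blend factor of 0.5, the struct and the function name are assumptions for the example; the text does not state the weighting used in the project.

```cpp
#include <cassert>

struct BoxF { double x; double y; double w; double h; };

// Linear interpolation between the previous and the current box.
// t = 0 keeps the previous box, t = 1 uses the current box unchanged.
BoxF blendBoxes(const BoxF& previous, const BoxF& current, double t = 0.5) {
    assert(t >= 0.0 && t <= 1.0);
    return {previous.x + t * (current.x - previous.x),
            previous.y + t * (current.y - previous.y),
            previous.w + t * (current.w - previous.w),
            previous.h + t * (current.h - previous.h)};
}
```

A smaller t damps sudden changes in size more strongly, at the price of the drawn box lagging behind a fast-moving object.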

Group             Constant                     Value
Median filtering  update step α                0.5
Graph Cuts        downsampling factor          0.5
Graph Cuts        std. deviation σ             100.0
Graph Cuts        ThresholdFG                  40
Graph Cuts        ThresholdBG                  0
Tracking          size of bounding boxes       > 40 px²
Tracking          overlap of bounding boxes    > 40 %

Table 3.1: Constants used in the implementation

3.2.2 Identity over time

Maintaining identity between frames works well if objects are not moving too fast and do not overlap in the scene. If objects move too fast, their bounding boxes will not overlap between frames and the identity will not be preserved. Overlapping objects in the scene will probably get the same identity.

Figure 3.4: Left: Identity at a previous frame. Right: The identity is maintained over time. Walk3.mpg, CAVIAR

Chapter 4

Conclusion

With rather simple computer vision techniques we have developed a system capable of detecting and tracking objects in several simple cases. However, many further improvements can be made.

Median filtering is a simple approach to background modeling and is easy to implement, but it produces some errors that could have been avoided by using, for instance, Gaussian mixture models. Median filtering tends to leave trailing ghosts in the foreground, which create false positives.

The Graph cut segmentation algorithm is computationally expensive, which makes it hard to perform foreground segmentation in real time. On the other hand, it produces a far more stable result than, for instance, morphological operations, which need to be manually tuned for each scene.

When determining whether two bounding boxes ought to share an identity, we only take the previous frame into consideration. Averaging over several frames could have been used to predict the next most likely position and size of the bounding box. A more formal approach would be Kalman filtering, which would have given more accurate information when estimating the identity of objects.

Considering the short project time we are pleased with the results, and we don't think we could have produced a far better result without replacing some of the fundamental algorithms for detection and tracking of objects.

References

[1] Yuri Boykov and Vladimir Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell., 26(9):1124–1137, 2004.

[2] Anders P Eriksson, Olof Barr, and Kalle Åström. Image segmentation using minimal graph cuts. In Fredrik Georgsson and Niclas Börlin, editors, Proceedings SSBA 2006, pages 45–48, 2006.

[3] Milan Sonka, Vaclav Hlavac, and Roger Boyle. Image Processing, Analysis, and Machine Vision. Thomson, 2008.

[4] Satoshi Suzuki and Keiichi Abe. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing, 30(1):32–46, 1985.

[5] John Wood. Statistical background models with shadow detection for video based tracking. Master's thesis, Linköping University, 2007.

.1 Code documentation

/**
 * tracking.h
 */

#ifndef TRACKING_HPP_
#define TRACKING_HPP_

#include <iostream>
#include <vector>
#include <string>
#include "graph.h"
#include "cv.h"
#include "highgui.h"
#include "cxcore.h"
#include "bBox.hpp"

typedef Graph<double, double, double> GraphType;

#define MEDIAN_FINAL_ALPHA 0.5
#define SIGMA_GC 100.0
#define CONTOUR_MIN_OVERLAP 0.4
#define CONTOUR_MIN_DISTANCE 150
#define CONTOUR_MIN_AREA 40
#define DOWN_SAMPLE_RATE 0.5 // Graphcut downsampling factor.

using namespace std;

/**
 * Class tracking.
 */
class Tracking {
public:
    /**
     * Constructor
     * Input: cv::Mat image, sets up the graphcut graph
     * and median background modeling data.
     */
    Tracking(cv::Mat& input);

    /**
     * Initial setup of median background model.
     */
    void medianSetup(const cv::Mat& input);

    /**
     * Updates the median background model. Called from process().
     */
    void updateMedian(cv::Mat& input);

    /**
     * Initial setup of graphcut graph nodes and edges.
     */
    void graphCutSetup(cv::Mat& input);

    /**
     * Updates the graphcut graph. Called from process.
     */
    void updateGraph(const cv::Mat &input, cv::Mat &output);

    /**
     * Constructs an image from the data in the graphcut graph.
     */
    void buildGraphImage(const cv::Mat &input, cv::Mat &output);

    /**
     */
    void morphSegment(const cv::Mat &input, cv::Mat &output);

    /**
     * Finds and creates point lists of all contours found in an image.
     */
    void createContours(cv::Mat& input);

    /**
     * Called each time a new frame needs to be processed.
     */
    void process(cv::Mat& frame, int fps);

    /**
     * Updates the drawn bounding boxes in the final stage.
     */
    void updateBox();

    /**
     * Updates id's of bounding boxes found in each frame.
     */
    void updateIDs(cv::Mat& output);

private:
    /**
     * Stores the number of processed frames.
     */
    double frame;

    /**
     * Container of the median background data.
     */
    cv::Mat median;

    /**
     * The absolute difference of the current frame and the median.
     */
    cv::Mat absDiffMedian;

    /**
     * The image generated by the graphcut graph.
     */
    cv::Mat graphImage;

    /**
     * Storage container for contours.
     */
    cv::MemStorage storage;

    /**
     * Pointer to graphcut graph.
     */
    GraphType *g;

    /**
     * List of the previous frames bounding boxes.
     */
    vector<bbox> prevIDs;

    /**
     * List the current frames bounding boxes.
     */
    vector<bbox> newIDs;

    /**
     * List of the "unique" bounding boxes found.
     */
    vector<bbox> finalIDs;
};

#endif /* TRACKING_HPP_ */
