
2516 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 10, OCTOBER 2010

A Distributed Topological Camera Network Representation for Tracking Applications

Edgar Lobaton, Member, IEEE, Ramanarayan Vasudevan, Student Member, IEEE, Ruzena Bajcsy, Fellow, IEEE, and Shankar Sastry, Fellow, IEEE

Abstract—Sensor networks have been widely used for surveillance, monitoring, and tracking. Camera networks, in particular, provide a large amount of information that has traditionally been processed in a centralized manner employing a priori knowledge of camera location and of the physical layout of the environment. Unfortunately, these conventional requirements are far too demanding for ad-hoc distributed networks. In this article, we present a simplicial representation of a camera network called the camera network complex (CN-complex) that accurately captures topological information about the visual coverage of the network. This representation provides a coordinate-free calibration of the sensor network and demands no localization of the cameras or objects in the environment. A distributed, robust algorithm, validated via two experimental setups, is presented for the construction of the representation using only binary detection information. We demonstrate the utility of this representation in capturing holes in the coverage, performing tracking of agents, and identifying homotopic paths.

Index Terms—Multitarget tracking, network coverage, sensor networks, simplicial homology, smart camera networks.

I. INTRODUCTION

EVER increasing improvements to the resolution and frame rates of cameras have been problematic for camera networks, wherein data has traditionally been processed in an entirely centralized manner. Due to the high cost of transferring data, much of the processing has taken place offline, thus compromising the overall effectiveness of the network. This observation has driven the deployment of distributed camera networks.

An ad-hoc network is a distributed sensor network that is set up by placing sensors at random locations. Unfortunately, most vision-based algorithms for distributed networks demand a priori knowledge of camera and static object location in the environment, which is generally either too expensive, too error prone, or too time consuming to recover. Given the desire to maintain the inherent adaptability of ad-hoc networks, it becomes essential to develop distributed algorithms that perform tasks such as coverage verification and tracking without explicit localization information.

Manuscript received August 30, 2009; revised February 15, 2010; accepted May 02, 2010. Date of publication June 07, 2010; date of current version September 17, 2010. This work was supported in part by the ARO MURI grant W911NF-06-1-0076, in part by AFOSR grant FA9550-06-1-0267, in part by DARPA DSO HR0011-07-1-0002 via the project SToMP, and in part by the National Science Foundation under Grant #0937060 to the Computing Research Association for the CIFellows Project. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Janusz Konrad.

E. Lobaton is with the Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599 USA (e-mail: [email protected]).

R. Vasudevan, R. Bajcsy, and S. Sastry are with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2010.2052273

In this article, we consider a sensor network wherein each node is a camera capable of performing local computation to extract discrete observations corresponding to agents either entering or exiting its field of view, which may be transmitted to other nodes for further processing. These observations are used to build a representation of the network coverage without any prior localization information about the cameras or objects. This representation is an abstract simplicial complex referred to as the camera network complex, or CN-complex.

Our contributions in this article are threefold: first, we develop the CN-complex to accurately capture all topological information about the visual coverage of the camera network; second, we present and provide experimental validation of an algorithm to distributedly construct the representation; third, we describe the utility of the representation by constructing algorithms to perform tracking of multiple agents within the camera network. An example of the representation constructed by employing the algorithms proposed in this article can be found in Fig. 1. Importantly, notice that the representation is able to detect the hole in coverage corresponding to the static pillar in addition to correctly capturing overlap in coverage between cameras.

This article is organized as follows: a taxonomy of representations used to capture network coverage is presented in Section II; the tools from algebraic topology used throughout this article are reviewed in Section III; the mathematical models and assumptions of the environment under consideration are described in Section IV; the CN-complex and the distributed algorithm for its construction are presented in Sections V and VI, respectively; finally, an overview of how to perform homotopic path identification and tracking using our simplicial representation is introduced in Section VII.

II. RELATED WORK

In this section, we provide a taxonomy of the various camera network coverage representations. These representations can be placed on a spectrum according to the amount of geometric information they provide: at one extreme are Vision Graphs, which only provide information about coverage overlap between pairs of cameras, and at the other extreme are full-metric 3-D models, which explicitly capture camera and object localization. Though we employ this notion of information to distinguish between these various representations, the methods chosen to recover these representations (usually either an appearance or co-occurrence algorithm) are not mutually exclusive.

Fig. 1. CN-complex for a network of three cameras constructed using the methodology presented in this article: the views from different cameras (bottom-left), planar projection of the coverage of three cameras and their decomposed views (each region is given a different letter) due to occluding objects (top), the simplicial complex built by finding the overlap between the decomposed views of the cameras (bottom-right). Edges between decomposed views denote pairwise overlap in coverage and blue triangles denote three-way overlap in coverage. The simplicial complex, correctly, contains a single hole (i.e., the loop formed by five of the decomposed-view vertices) that corresponds to the column which acts as an occluding object in the physical coverage.

1) Vision Graph: A Vision Graph is a graph where each node represents a camera's coverage and edges specify an overlap in coverage. The graph provides connectivity information about the network, but provides no other geometric information about the network coverage (e.g., the holes in the coverage). Cheng et al. [1] build a Vision Graph distributedly by broadcasting feature descriptors of each camera view through the network to establish correspondences between cameras. In contrast, Marinakis et al. [2], [3] build a Vision Graph by comparing reports of detections between cameras; they then model the transition probabilities with a Markov model and minimize a functional using Markov Chain Monte Carlo sampling.

2) Simplicial Representation: Several authors have attempted to improve upon the connectivity information provided by the Vision Graph by incorporating geometric information from the environment into the representation. This work has focused on the detection and recovery of holes in the coverage due to the environment. Prior work has relied mostly on considering symmetric coverage (explicitly or implicitly) or high-density sensor coverage in the field. Vin de Silva et al. [4] obtain the Rips complex based upon the communication graph of the network and compute homologies using this representation. Their method assumes some symmetry in the coverage of each sensor node (such as circular coverage); however, this assumption is invalid in camera networks. Muhammad et al. [5] have also worked on the distributed computation of simplicial homology for more general sensor networks using a communication graph of the network, but their work provides no experimental validation.

The CN-complex, the focus of this manuscript, is a simplicial complex introduced by Lobaton et al. [6]. The construction relies on the decomposition of the image domain of each camera by using occluding contours corresponding to static objects. The CN-complex is proven to capture the homotopy type of the coverage of the camera network (e.g., it captures the holes in coverage and the overlap in coverage between cameras) for 3-D environments with vertical walls. However, there are no guarantees for generic 3-D environments. A distributed algorithm for its construction under noisy observations and multiple targets in the environment using reports of detections is also considered by Lobaton et al. [7]. This latter work only considered an indoor three-camera, two-target example which took several hours to set up. In this article, we extend their example to an indoor and outdoor eight-camera, five-participant setup that took approximately fifteen minutes to set up. Moreover, we describe how this additional geometric information can be exploited to perform tracking of multiple agents within the camera network.

3) Activity Topology: Activity Topology refers to the model obtained after identifying specific regions within the image domain from different camera views that correspond to the same physical location. Contrast this with the Vision Graph, wherein the entire image domain is compared to establish overlap in coverage. This representation moves closer to a full metric reconstruction; however, little effort has been made to exploit this information to characterize network coverage.

Makris et al. [8] construct this representation by applying an appearance model between observed data in order to determine overlap between different portions of views. Van den Hengel et al. [9] introduce an exclusion approach to calculate the Activity Topology by starting with all possible combinations of topological connections and removing inconsistent links, again using an appearance model. Detmold et al. [10] provide algorithms for large network setups, and an evaluation of the method and datasets are made available [11]. Though their method only relies on the detection of a target and avoids the use of appearance models, it has the unfortunate shortcoming of requiring the continuous streaming of detections from each camera.

4) Full-Metric Model: Full-metric models capture all geometric information about camera location (i.e., positions and orientations), which then determines the overlap between cameras as long as there exist no objects in their field of view. When objects are present in the environment, it is necessary to recover the locations of the objects in order to properly characterize the coverage of the network. Unfortunately, the amount of computation required and the difficulty of constructing robust algorithms to accurately localize cameras are nontrivial.


Stauffer et al. [12] determine connectivity between overlapping camera views by calculating correspondence models and extracting homographies. Lo Presti et al. [13] compute homographies by approximating tracks using piecewise linear segments and appearance models. Meingast et al. [14] utilize tracks and radio interferometry to fully localize the cameras. Rahimi et al. [15] describe a simultaneous calibration and tracking algorithm (using a network of nonoverlapping sensors) by using velocity extrapolation for a single target. All of these algorithms work in a centralized fashion. Funiak et al. [16] introduce a distributed algorithm for simultaneous localization and tracking with a set of overlapping cameras, but their algorithm is not robust to large changes in perspective.

Though all the representations considered in this section provide useful information about the deployment of cameras in a network, we employ the CN-complex since it provides the flexibility required in an ad-hoc network (i.e., it is robustly computable in a distributed fashion), while not sacrificing valuable geometric information about the environment.

III. MATHEMATICAL BACKGROUND

In this section, the concepts from algebraic topology used throughout this manuscript are introduced. This section contains material adapted from [4] and is not intended as a formal introduction to the topic. For a proper introduction, the reader is encouraged to read [17]–[19].

A. Simplicial Homology

In order to characterize network coverage, we employ the fundamental construct of algebraic topology: the simplex.

Definition 1: Given a collection of vertices $V$, a $k$-simplex is a set $\{v_0, v_1, \ldots, v_k\}$ where $v_i \in V$ and $v_i \neq v_j$ for all $i \neq j$. A $j$-simplex, $\sigma_1$, is a face of a $k$-simplex, $\sigma_2$, denoted $\sigma_1 \leq \sigma_2$, if the vertices of $\sigma_1$ form a subset of the vertices of $\sigma_2$. A finite collection of simplices, $X$, is called a simplicial complex if whenever a simplex lies in the collection then so does each of its faces.

Simplices are defined as purely combinatorial objects whose vertices are just labels requiring no coordinates in space. Constructing simplices given a collection of sets is the focus of this article, and we consider the simplest of such methods next.

Definition 2: The nerve complex of a collection of sets, $\mathcal{U} = \{U_i\}_{i=1}^{N}$, for some $N$, is the simplicial complex where vertex $v_i$ corresponds to the set $U_i$ and whose $k$-simplices correspond to nonempty intersections of $k+1$ distinct elements of $\mathcal{U}$.

It is also useful to consider various modifications that improve the overall utility of the simplicial complex.

Definition 3: The tracking graph, $G(X)$, of a simplicial complex, $X$, is a directed graph such that the vertices correspond to simplices in $X$ and an edge from $\sigma_1$ to $\sigma_2$ is present if $\sigma_1 \leq \sigma_2$ or $\sigma_2 \leq \sigma_1$.

To illustrate these concepts, consider the network in 2-D illustrated in Fig. 2. Each camera position has a label from 1 to 7, and their corresponding fields of view, $U_1, \ldots, U_7$, are shaded in the plane. The nerve complex consists of the simplices in Table I. The nerve of the collection is depicted graphically on the top-right of Fig. 2.

Fig. 2. Collection of sets corresponding to the coverage of a camera network(left) with corresponding nerve complex (top-right) and tracking graph (bottom-right).

TABLE ILIST OF SIMPLICES FOR EXAMPLE IN FIG. 2

of Fig. 2, where 0-simplices are represented by nodes, 1-sim-plices are represented by edges, and 2-simplices are representedby triangles. Although in this setup the nerve complex capturesall topological information in regards to network coverage, inSection V, we illustrate a setup where the nerve complex is un-able to accurately characterize the network coverage. The corre-sponding tracking graph for the simplicial complex is shown onthe bottom-right of Fig. 2. Beyond providing tools to rigorouslycharacterize network coverage, algebraic topology allows us todefine meaningful algebraic structures on these simplices.

Definition 4: Let $\{\sigma_i\}_{i=1}^{m}$ be the $k$-simplices of a given complex, for some $m$. Then, the group of $k$-chains, $C_k$, is the free Abelian group generated by $\{\sigma_i\}$. That is,

$c \in C_k$ if and only if $c = \sum_{i=1}^{m} \alpha_i \sigma_i$

for some $\alpha_i \in \mathbb{Z}$. If there are no $k$-simplices, then $C_k := 0$. Similarly, $C_{-1} := 0$.

This definition allows us to construct an algebraic operationwhich characterizes topological invariants.

Definition 5: Let the boundary operator $\partial_k$ applied to a $k$-simplex $\sigma = \{v_0, v_1, \ldots, v_k\}$ be defined by

$\partial_k \sigma = \sum_{i=0}^{k} (-1)^i \{v_0, \ldots, v_{i-1}, v_{i+1}, \ldots, v_k\}$

and extended to any $c \in C_k$ by linearity. A $k$-chain, $c \in C_k$, is called a $k$-cycle if $\partial_k c = 0$. The set of $k$-cycles, denoted by $Z_k$, is the kernel of $\partial_k$ and forms a subgroup of $C_k$. That is,

$Z_k = \ker \partial_k \subset C_k.$

A $k$-chain $c \in C_k$ is called a $k$-boundary if there exists $c' \in C_{k+1}$ such that $c = \partial_{k+1} c'$. The set of $k$-boundaries, denoted by $B_k$, is the image of $\partial_{k+1}$ and it is also a subgroup of $C_k$. That is,

$B_k = \mathrm{im}\, \partial_{k+1} \subset C_k.$

We can check that $\partial_k(\partial_{k+1} c) = 0$ for any $c \in C_{k+1}$, which implies that $B_k$ is a subgroup of $Z_k$.

We now make several important observations. First, the boundary operator, $\partial_k$, maps a $k$-simplex to its $(k-1)$-simplicial faces. Second, a calculation shows that the 1-simplices that form a closed loop correspond to the group of 1-cycles (a proof of this fact can be found in any of the algebraic topology books we reference at the beginning of this section). We are interested in detecting the holes in our domain due to static objects. Unfortunately, these types of holes are just a subset of $Z_1$; namely, 1-cycles can also be obtained from the 2-boundaries of a given complex. This observation motivates the definition of homology groups, which define a type of topological invariant.

Definition 6: The $k$th homology group is the quotient group

$H_k = Z_k / B_k.$

The homology of a complex is the collection of all homology groups. The rank of $H_k$, called the $k$th Betti number, $\beta_k$, gives us a coarse measure of the number of holes. In particular, $\beta_0$ is the number of connected components and $\beta_1$ is the number of loops that enclose different "holes" in the complex.

Considering again the example from Fig. 2, we can represent the group of 0-chains with the vector space $\mathbb{Z}^{n_0}$ by identifying the $n_0$ 0-simplices with the standard basis vectors $\{e_i\}$, where $e_1 = (1, 0, \ldots, 0)^T$ and so on. For $C_1$, we identify the $n_1$ 1-simplices of Table I with the standard basis vectors of $\mathbb{Z}^{n_1}$, and similarly for $C_2$ the $n_2$ 2-simplices are identified with the standard basis vectors of $\mathbb{Z}^{n_2}$.

As mentioned earlier, $\partial_k$ is the operator that maps a simplex to its boundary faces. For example,

$\partial_1 \{v_1, v_2\} = \{v_2\} - \{v_1\}.$

That is, $\partial_1$ can be expressed in matrix form with one column per 1-simplex, each column containing a $-1$ and a $+1$ in the rows of its two boundary vertices, and $\partial_2$ can be expressed analogously with one column per 2-simplex.

Since $Z_0 = \ker \partial_0 = C_0$ and $B_0 = \mathrm{im}\, \partial_1$, then

$\beta_0 = \mathrm{rank}(Z_0) - \mathrm{rank}(B_0) = n_0 - \mathrm{rank}(\partial_1) = 1.$

Hence, we recover the fact that there is only a single connected component in Fig. 2. Similarly, it can be verified that

$\beta_1 = \mathrm{rank}(Z_1) - \mathrm{rank}(B_1) = (n_1 - \mathrm{rank}(\partial_1)) - \mathrm{rank}(\partial_2) = 1$

which tells us that the number of holes in our coverage is 1. Observe that $\beta_k = 0$ for $k \geq 2$ (since $C_k = 0$).
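Since $\beta_0$ and $\beta_1$ reduce to matrix ranks, they can be computed with a few lines of linear algebra. The sketch below is an illustration of the rank formulas above, not the paper's code; it assumes real coefficients so that numerical rank applies, and verifies the computation on a hollow triangle, which has one component and one hole.

import numpy as np

def betti(d1, d2, n0):
    """n0: number of 0-simplices; d1: n0 x n1 boundary matrix of 1-simplices;
    d2: n1 x n2 boundary matrix of 2-simplices."""
    r1 = np.linalg.matrix_rank(d1) if d1.size else 0
    r2 = np.linalg.matrix_rank(d2) if d2.size else 0
    n1 = d1.shape[1]
    b0 = n0 - r1                # rank(Z_0) - rank(B_0) = n0 - rank(d1)
    b1 = (n1 - r1) - r2         # rank(Z_1) - rank(B_1)
    return b0, b1

# Toy complex: a hollow triangle (three vertices, three edges, no 2-simplex).
d1 = np.array([[-1,  0, -1],
               [ 1, -1,  0],
               [ 0,  1,  1]], dtype=float)
d2 = np.zeros((3, 0))
print(betti(d1, d2, 3))  # (1, 1): one component, one hole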

B. Cech Theorem

Next we introduce the Cech Theorem, which is proved by Bott et al. [20]. Before proceeding further, the following definition is required:

Definition 7: Given two spaces $X$ and $Y$, a homotopy between two continuous functions $f: X \to Y$ and $g: X \to Y$ is a 1-parameter family of continuous functions $h_t: X \to Y$ for $t \in [0, 1]$ connecting $h_0 = f$ to $h_1 = g$. Two spaces $X$ and $Y$ are said to be of the same homotopy type if there exist functions $f: X \to Y$ and $g: Y \to X$ with $g \circ f$ homotopic to the identity map on $X$ and $f \circ g$ homotopic to the identity map on $Y$. A set $X$ is contractible if the identity map on $X$ is homotopic to a constant map.

Put simply, two functions are homotopic if it is possible tocontinuously deform one into the other. A space is contractibleif it is possible to continuously deform it into a single point. Twospaces with the same homotopy type have the same homology.

Theorem 1 (Cech Theorem): If the sets $\{U_i\}_{i=1}^{N}$ (for some $N$) and all nonempty finite intersections are contractible, then the union $\bigcup_{i=1}^{N} U_i$ has the same homotopy type as the nerve complex.

If the required conditions are satisfied, then the topological structure dictated by the union of the sets is captured by the nerve complex. Observe that in Fig. 2 all intersections are contractible. Therefore, we conclude that the extracted nerve complex has the same homology as the space formed by the union of the coverage.

IV. ENVIRONMENT MODEL

In this section, the models used for the physical layout, camera sensors, and agents in the environment are made explicit.

The Environment: Consider a 3-D domain and a collection of objects with the following properties:

• there is a global coordinate system, $(x, y, z)$; points in this coordinate system are denoted by $q = (x, y, z)$;

• objects and cameras reside between the planes $\{z = 0\}$ (the "floor") and $\{z = z_{\max}\}$ (the "ceiling");

• objects are sets of the form

$O_i = \{(x, y, z) \mid (x, y) \in P_i,\ z \in [0, z_{\max}]\}$

where $P_i$ is a connected polygon with nonempty interior, and the number of objects, $N_O$, is finite.

Agents: Agents are represented by the following properties:


• they are line segments of the form

$A = \{(x_A, y_A, z) \mid z \in [0, h_A]\}$

where $0 < h_A \leq z_{\max}$;

• agents move continuously in the environment by translating along the floor plane and changing their $(x_A, y_A)$ location.

Cameras: Cameras are located at static unknown locations, are capable of detecting the agents, and satisfy the following properties:

• each camera $i$ has a local coordinate system, $(x_i, y_i, z_i)$, where the origin, $O_i$, corresponds to the camera position. Points in this coordinate system are denoted by $q_i$;

• the field of view, $\mathrm{FOV}_i$, of camera $i$ is the portion of the environment contained in $K_i$, where $K_i$ is the interior of a polyhedral convex cone based at $O_i$;

• the camera projection, $\pi_i : \mathrm{FOV}_i \to \mathbb{R}^2$, for camera $i$ is the perspective projection onto the camera's image plane. The image of this map is called the image domain, $\mathcal{I}_i$;

• the detection set, $\mathcal{D}_i(A)$, of agent $A$ in camera $i$ is the projection under $\pi_i$ of the points $q$ of the agent that are visible, i.e., for which the line segment joining $q$ and $O_i$ intersects no object. An agent is said to be visible by camera $i$ if $\mathcal{D}_i(A) \neq \emptyset$;

• the coverage, $\mathcal{C}_i$, of camera $i$ is given by

$\mathcal{C}_i = \{(x, y) : \text{an agent at } (x, y) \text{ is visible by camera } i\}.$

The network coverage, $\mathcal{C}$, is the union of the individual coverages of the cameras.

Objects are static elements in the environment, while agents are dynamic elements. We assume the existence of a global coordinate system for the sake of clarity while specifying the location of objects, agents, or cameras, but its calculation is not required. Although these assumptions may seem restrictive, most camera networks (indoor or outdoor) satisfy them. Several of the choices in our model (such as the vertical line target and polyhedral objects) are made in order to simplify analysis. We validate these assumptions in real-life scenarios through experiments. The example in Fig. 3 shows an agent and a camera with its corresponding field of view. On the right plot of the figure, we illustrate the coverage of the camera as a subset of $\mathbb{R}^2$.

Fig. 3. Mapping from 3-D to 2-D: A camera and its field of view are shown from multiple perspectives (left and middle), and its corresponding coverage as a set in 2-D (right). For the 3-D configuration, the planes displayed bound the space that can be occupied by an agent.

Fig. 4. Nerve complexes (bottom row) obtained from the collection of coverage sets (top row). One complex captures the correct topological information (bottom-left), but the other does not (bottom-middle) unless the coverage is properly decomposed (bottom-right).

There is a clear mapping from our 3-D scenario to a 2-D domain.

Our goal, in this article, is to obtain a representation thatcaptures the topological structure of camera network coverage,while not relying on a priori knowledge of camera or objectlocation.

V. CN-COMPLEX

The goal of this section is to outline the steps required to construct a simplicial complex that captures accurate topological information about the network coverage, $\mathcal{C}$. A naïve approach considers the nerve complex constructed from the collection of camera coverages, $\{\mathcal{C}_i\}$, as illustrated in Fig. 4 (left). Unfortunately, the middle diagram in this figure demonstrates that this approach fails. In this case, the hypothesis of the Cech Theorem is unsatisfied (one of the intersections is not contractible). If, on the other hand, we first decompose camera 1's coverage along lines corresponding to the object's boundaries and consider the nerve complex constructed from these four coverage sets as done in the right diagram of Fig. 4, then we capture the hole in the coverage. In fact, this construction accurately captures the topological structure of the camera network coverage.


This decomposition of a camera's coverage can be obtained via local detections from each camera without knowledge of camera position. To illustrate this approach, consider tracking the set of detections in the image domain, $\mathcal{I}_i$, of a camera as agents move in the physical environment. We would observe line segments that either leave through the boundary of the domain or disappear from its interior, outlining line segments corresponding to the occluding contours due to the vertical objects in the domain. If we extend these line segments, we obtain bisecting lines that decompose the image domain and, thus, the camera coverage. We describe the algorithm to perform this task in Section VI-A.

Definition 8: The nerve complex constructed after decomposing each camera's field of view by its corresponding bisecting lines is called the CN-complex.

Its construction consists of two steps:

1) identify all bisecting lines to decompose each camera's coverage;

2) construct the nerve complex on the resulting collection of sets by determining whether there is an overlap between cameras.

This construction guarantees that all finite intersections between the resulting sets are contractible, which together with the Cech Theorem yields the following result, whose proof can be found in [6]:

Theorem 2 (Decomposition Theorem): Given an environment and camera network that satisfy the modeling assumptions from Section IV, and given that the coverage of each camera is connected, the nerve complex of the collection of sets obtained after decomposing each camera's field of view by its corresponding bisecting lines has the same homotopy type as the network coverage.

Since we are only interested in recovering the homotopy type of a 2-D coverage, due to our environmental model, we need only consider a nerve complex constructed with a maximum of 2-simplices.

VI. DISTRIBUTED ROBUST CN-COMPLEX CONSTRUCTION

In this section, we describe the algorithms required to distributedly construct the CN-complex.

A. Finding Bisecting Lines

First, we address the problem of detecting the bisecting lines that decompose the image domain of a camera. To this end, we assume we have a background subtraction algorithm: thresholding the difference between a background image and the current frame is sufficient. An example of the output of this algorithm on several sample images can be found in Fig. 5. Unsurprisingly, this algorithm does not perform particularly well. We then utilize an algorithm presented by Jackson et al. [21], which consists of accumulating the boundary of foreground objects wherever partial occlusions are detected. In our case, we only store the detections at times when occlusion events (i.e., an agent appears or disappears from a camera's view) occur. We are uninterested in the exact boundary of the objects, but only in the bisecting lines. Hence, we take the approach of first approximating any occluding boundary with vertical lines and then refining the fit. This step is done locally at each node.

Fig. 5. Sample frames from a video sequence employed to find bisecting lines and construct the CN-complex presented in the experimental Section VI-D. Actual images (top) and corresponding detections obtained by frame differencing (bottom). Note the unreliability of the foreground segmentation results.

Fig. 6. Steps to find bisecting lines: for the original view (left), the boundaries of the foreground masks are accumulated whenever occlusion events are detected (top-middle). Vertical bisecting lines are estimated by aggregating observations over all rows and obtaining the indices of the columns with the highest detections (bottom-middle). Bisecting lines are further refined through a linear fit procedure using the accumulated observations (right).

In Fig. 6, we observe a camera view with several occluding boundaries due to walls and a column (left). The accumulated boundaries of the foreground detections are shown in the middle. Initial estimates for the boundaries are chosen at the peaks of the distributions of detections along each column (middle-bottom). Finally, the estimates are refined by performing a least-squares fit on the data with respect to all the points on the boundary that are close to the vertical line estimates. The final result is shown in the right plot.
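A rough sketch of this bisecting-line procedure follows. It is an illustration rather than the paper's implementation: the input array boundary_points, the window size, and the fixed number of candidate lines are assumptions introduced here.

import numpy as np

def bisecting_lines(boundary_points, width, n_lines=3, window=5):
    """boundary_points: (N, 2) array of (col, row) pixels accumulated over
    occlusion events; returns a list of (slope, intercept) line fits."""
    cols = boundary_points[:, 0].astype(int)
    hist = np.bincount(cols, minlength=width)
    peaks = np.argsort(hist)[-n_lines:]       # columns with most detections
    lines = []
    for c in peaks:
        near = boundary_points[np.abs(boundary_points[:, 0] - c) <= window]
        if len(near) < 2:
            continue
        # Fit col = a * row + b; near-vertical lines are stable in this form.
        a, b = np.polyfit(near[:, 1], near[:, 0], 1)
        lines.append((a, b))
    return lines

Fitting column as a function of row (rather than the reverse) keeps the least-squares problem well conditioned for near-vertical occluding boundaries.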

B. Finding Intersect Points

After the bisections within each camera view have been calculated, we must determine the connectivity between the resulting decomposed views. Throughout this section, when referring to a camera view, we mean one of the resulting decomposed regions. More specifically, we look for intersect points (i.e., points in the intersection of the fields of view of the cameras). No target identification is necessary, but recurrence over time is exploited. This is accomplished by approximating the probabilities of overlap using the number of times that an occlusion event occurs in a camera (hit count), and the count of concurrent detections with other cameras (match count). Each camera node is synchronized and has the following properties:

• each has a unique ID;

• agent detections are stored in detection sets, $D_f$, with corresponding times, $t_f$, where $f$ is the frame number;

• a list of intersect points, $L_1$, is maintained which contains the coordinates of each intersect point in its local image domain, the ID of the primary camera for which each point is also visible, the coordinates of each point in the image domain of the primary camera, and corresponding hit and match counts;

• a list of intersect points, $L_2$, is maintained which contains the coordinates of each intersect point in its local image domain, the IDs of the primary and secondary cameras for which each point is also visible, the coordinates of each point in the image domains of the primary and secondary cameras, and corresponding hit and match counts.

The role of the primary and secondary cameras and how to calculate the various lists are explained later in this section, but first we illustrate our approach by considering an example.

Assume we have two cameras in a room of area 1, with region $U_1$ in the coverage of camera 1 and region $U_2$ in the coverage of camera 2. Let $U_\cap = U_1 \cap U_2$ be the intersection of these regions. Also assume that we have $n$ independent agents, and that the probability of an agent's location is uniformly distributed over the room. We define $D_1$ as the event that there is a detection in $U_1$ at a given instance, $D_1^c$ as the complementary event, and $D_\cap$ as the event that there is a detection in the intersection. For simplicity, we assume just for this thought experiment that an agent is detected in $U_1$ if and only if it is actually present (i.e., there are no errors in detection). Hence, we have

$P(D_1) = 1 - |U_1^c|^n \qquad (1)$

where for a set $U$, $U^c$ is the set complement over the room and $|U|$ is its area. Then

$P(D_\cap) = 1 - |U_\cap^c|^n = 1 - (1 - |U_\cap|)^n. \qquad (2)$

Therefore, the probability of detecting a target in $U_\cap$ given a detection in $U_1$ is the conditional probability

$P(D_\cap \mid D_1) = \frac{P(D_\cap)}{P(D_1)} = \frac{1 - (1 - |U_\cap|)^n}{1 - (1 - |U_1|)^n}. \qquad (3)$

We observe that this quantity is a function of the overlap and measures the amount of detections that can be explained by observations in the intersection. If $U_1$ and $U_2$ are disjoint, then $P(D_\cap \mid D_1) = 0$. As the overlap increases, the probability increases. If $U_1 = U_2 = U_\cap$, then $P(D_\cap \mid D_1) = 1$. These observations are illustrated in Fig. 7. Intuitively, we expect a similar behavior even if the agents are not uniformly distributed and there are detection errors. Hence, we use the following quantity as a measure of the overlap between two regions:

$\rho_{1,2} = \min\{\,P(D_\cap \mid D_1),\ P(D_\cap \mid D_2)\,\}. \qquad (4)$

Note that this quantity has the following properties: $\rho_{1,2} = 0$ whenever $P(D_\cap \mid D_1) = 0$, or $P(D_\cap \mid D_2) = 0$; and $\rho_{1,2} = 1$ whenever $U_1 = U_2$. In our algorithm, $\rho_{1,2}$ is employed as a direct measure of the confidence of overlap in coverage between cameras 1 and 2, which corresponds to a 1-simplex in our representation.

Fig. 7. Geometric depiction illustrating different overlapping configurations and corresponding detection probabilities for 3 agents in a square room of area 1. Intuitively, whenever $U_1$ and $U_2$ are disjoint we obtain $P(D_\cap \mid D_1) = 0$. For a partial overlap, we expect a larger probability value (middle). If we have perfect overlap, then we observe that $P(D_\cap \mid D_1) = 1$.

The quantity $\rho_{1,2}$ seems an ideal measure to quantify overlap between camera regions; however, its direct computation requires the identification of detections in the overlap between two cameras, in order to compute $P(D_\cap \mid D_1)$, which is unknown. Nevertheless, by assuming that the detections in two disjoint regions are independent from one another, we observe that

$P(D_2^c \mid D_1) = P(D_{2 \setminus \cap}^c)\, P(D_\cap^c \mid D_1) \qquad (5)$

where $D_{2 \setminus \cap}$ represents a detection in the region $U_2 \setminus U_\cap$ and $D_{2 \setminus \cap}^c$ is similarly defined. We also have that $P(D_\cap^c \mid D_1) = 1 - \theta$, writing $\theta = P(D_\cap \mid D_1)$, and $P(D_2^c) = P(D_{2 \setminus \cap}^c)\, P(D_\cap^c)$ with $P(D_\cap^c) = 1 - \theta\, P(D_1)$. Then, we can express the previous equation as

$P(D_2^c \mid D_1) = q\,(1 - \theta) \qquad (6)$

where $q = (1 - P(D_2)) / (1 - \theta\, P(D_1))$. Hence

$P_\theta(D_2 \mid D_1) = 1 - \frac{(1 - P(D_2))(1 - \theta)}{1 - \theta\, P(D_1)}. \qquad (7)$

We emphasize the dependence of this probability on the parameter $\theta$ by using the notation $P_\theta(D_2 \mid D_1)$. If we are given $N$ samples $\{s_k\}_{k=1}^{N}$ from this distribution, we can solve for the maximum likelihood estimator by using the formula

$\hat{p} = \frac{1}{N} \sum_{k=1}^{N} s_k \qquad (8)$

where each sample indicates whether a detection in camera 1 was matched by a concurrent detection in camera 2; note that a detection in $U_\cap$ implies both $D_1$ and $D_2$, since $U_\cap$ is a subset of $U_1$ and of $U_2$.

Remember that $\hat{p}$ approximates $P_\theta(D_2 \mid D_1)$, and we can use this quantity to estimate $\rho_{1,2}$ using (4) given that $P(D_1)$ and $P(D_2)$ are known. Inverting (7) gives

$\hat{\theta} = \frac{\hat{p} - P(D_2)}{(1 - P(D_2)) - (1 - \hat{p})\, P(D_1)}. \qquad (9)$

In order to perform the computations to calculate $\hat{\theta}$, we require $P(D_1)$, $P(D_2)$, and samples from $P_\theta(D_2 \mid D_1)$. In our algorithms, detections occur whenever there are agents entering or leaving the coverage of a camera. $P(D_1)$ and $P(D_2)$ can be estimated by counting the number of detections in each camera over time. The samples from $P_\theta(D_2 \mid D_1)$ are obtained by having


camera 1 broadcast its detections to the network and camera 2 keep a count of concurrent and missed detections.
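Under the reconstruction of (7)–(9) above, the pairwise overlap estimate reduces to a ratio of counts. The following sketch uses hypothetical names: match and hit are the counts maintained in Algorithm 2, and p1 and p2 are the marginal detection rates counted over time. It computes the one-directional estimate of $P(D_\cap \mid D_1)$; the symmetric ratio in (4) would take the minimum of the two directions.

def overlap_ratio(hit, match, p1, p2):
    """Estimate P(D_cap | D_1) from hit/match counts and marginal rates."""
    if hit == 0:
        return 0.0
    p_hat = match / hit                    # MLE of P(D2 | D1), as in (8)
    denom = (1.0 - p2) - (1.0 - p_hat) * p1
    if denom <= 0:
        return 1.0                         # saturate: overlap explains everything
    theta = (p_hat - p2) / denom           # estimate from (9)
    return min(max(theta, 0.0), 1.0)       # clip to a valid probability

print(overlap_ratio(hit=50, match=40, p1=0.3, p2=0.35))  # ~0.76

As a sanity check, if the matched fraction equals the marginal rate of camera 2 (p_hat == p2), the estimate is 0, i.e., the concurrent detections are fully explained by chance.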

A similar analysis can be performed to estimate the probability of overlap between three regions by assuming that $P(D_1)$, $P(D_2)$, and $P(D_3)$ are known, and samples from $P(D_3 \mid D_1 \cap D_2)$ are available. Note that the pairwise overlap is estimated from the previous argument. In this case, we get

$\hat{\theta}_3 = \frac{\hat{p}_3 - P(D_3)}{(1 - P(D_3)) - (1 - \hat{p}_3)\, \hat{P}(D_1 \cap D_2)} \qquad (10)$

where $\hat{p}_3$ represents the sample estimate of $P(D_3 \mid D_1 \cap D_2)$, and (10) is obtained from (7) by replacing $D_1$ with $D_1 \cap D_2$ and $D_2$ with $D_3$. Then, we can define the overlap ratio

$\rho_{1,2,3} = \min\{\, \hat{P}(D_\cap \mid D_1 \cap D_2),\ \hat{P}(D_\cap \mid D_2 \cap D_3),\ \hat{P}(D_\cap \mid D_1 \cap D_3) \,\} \qquad (11)$

analogously to (4). Note that samples conditioned on a detection in the pairwise intersection are not directly available. Instead, we use samples from concurrent detections in the two conditioning cameras as long as their pairwise overlap ratio is close enough to 1 (we use a value of 0.9 in our experiments), which guarantees a large overlap between the two regions.

In our algorithm, $\rho_{1,2,3}$ is employed as a direct measure of the confidence of overlap between three regions, which corresponds to a 2-simplex in our representation. Since we are only interested in recovering the homotopy type of a 2-D coverage, we need only consider a nerve complex constructed with a maximum of 2-simplices. Hence, we do not need to consider any higher-"degree" overlap ratios.

It is possible to bound the estimated overlap ratios such that values above a given threshold are guaranteed to correspond to sufficient overlap between two regions. Nevertheless, such a bound would require a priori knowledge about the distribution of an agent's location, the number of agents, and the geometry of the environment; this information may be unavailable, and calculating an arbitrary cutoff may be impossible. Therefore, we employ an argument from algebraic topology called persistence to robustly analyze the observed data in order to avoid making undue assumptions when extracting topological information about the coverage. This approach is described in Section VI-C.

Next, we describe an algorithm to obtain samples from $P(D_2 \mid D_1)$ and $P(D_3 \mid D_1 \cap D_2)$ distributedly between two and three cameras, respectively. Locally, each camera makes observations and transmits detections after every occlusion event. The transmission of these detections serves as a trigger for other cameras to transmit their own detections. Once the detections have been shared, the estimates of the conditional probabilities are updated.

The algorithm's goal is to estimate the aforementioned conditional probabilities using the number of times that an occlusion event occurs in a camera, the hit count, and the count of concurrent detections with other cameras, the match count. In the case of pairwise detections, we have a local and a primary camera (the camera in which the event is observed). In the case of three cameras, we have a local, a primary (the camera in which the event is observed), and a secondary camera. These counts are used to compute the overlap ratios defined in (9) and (11). These ratios are computed over all intersect points.

Fig. 8. Experiment I setup: views from cameras 1 (top-left) through 3 (top-right) and corresponding detected bisecting lines (bottom row).

Fig. 9. CN-complex found for Experiment I using a threshold value chosen from the persistence diagram (left). The persistence diagrams obtained from the experiment (right). The bars in the diagram correspond to topological features that are tracked over a range of threshold values $\tau$. The start of a bar corresponds to the "birth" of the feature and its end corresponds to its "death" (e.g., a hole is introduced and then vanishes). Note, a single connected component and a single hole are the most persistent features. The hole is due to the column in the middle of the room.

We use the maximum of these quantities as an indicator of the likelihood of overlap between two or three cameras.

The first step in computing the hit and match counts for the intersect points is to perform occlusion event detection and transmit this information across the network. This process is outlined in Algorithm 1. Whenever an event is detected, it serves as a trigger for information sharing between camera nodes.

Algorithm 1 Event Detections

1: if current frame number, $f$, is greater than 1 then

2:   if $D_f \setminus D_{f-1} \neq \emptyset$ then   {an agent entered the view}

3:     Compute coordinates of detection points in $D_f \setminus D_{f-1}$.

4:     Add coordinates, time $t_f$, and camera ID to transmission queue.

5:   else if $D_{f-1} \setminus D_f \neq \emptyset$ then   {an agent left the view}

6:     Compute coordinates of detection points in $D_{f-1} \setminus D_f$.

7:     Add coordinates, camera ID, and time $t_{f-1}$ to transmission queue.

8:   end if

9: end if

Once an event transmission is received by another camera node, it is used to update the detection counts for the intersect points in $L_1$. This process is outlined in Algorithm 2. The


primary camera is the camera from which the event originated. Entries that are unreliable, based upon low estimates of the conditional probabilities with high enough confidence, are removed as described in line 3. An entry is labeled unreliable if there are more than 10 observations and its probability of detection is below 0.1. Note that the first loop of the algorithm (starting at line 4) takes care of updating the match counts, while the second loop (starting at line 13) takes care of updating the hit counts. At the end of the algorithm (line 18), the observations of the local camera are transmitted throughout the network. These observations play the role of secondary observations while updating $L_2$.

Algorithm 2 Update Detection Counts in $L_1$

1: Receive detection coordinates, time $t$, and camera ID from the primary camera.

2: Obtain local detections, $D_f$, at time $t$.

3: Remove any entries in the list of intersect points, $L_1$, that are unreliable.

4: for each detection coordinate from the primary camera do

5:   for each detection coordinate in $D_f$ do

6:     if an entry in $L_1$ exists with matching coordinates (same as the ones we are iterating over) and the received camera ID then

7:       Increase the match count of that entry by 1.

8:     else

9:       Create an entry in $L_1$ with the corresponding camera ID and coordinates, hit count set to 0 and match count set to 1.

10:     end if

11:   end for

12: end for

13: for each detection coordinate from the primary camera do

14:   if entries in $L_1$ exist with the corresponding primary coordinates and primary camera ID then

15:     Increase the hit count of those entries by 1.

16:   end if

17: end for

18: Add the local and primary coordinates of the detection points for which there was a match, the primary and local camera IDs, and the time to the transmission queue.

Once an event transmission and secondary observations are received by a camera node, the counts for the intersect points in $L_2$ are updated using a process identical to Algorithm 2. The only differences are that $L_1$ is replaced by $L_2$, mentions of the primary camera are replaced by the primary and secondary cameras, and no transmission is necessary at the end of the process. Data storage and processing occur distributedly, and each camera maintains a local copy of the representation.

At the completion of the algorithm, we have a list of simplices with associated likelihoods, which are computed by taking the maximum probability over all intersect points with vertices corresponding to the same simplex. This is a probabilistic version of the CN-complex that is used in the next section to extract a deterministic CN-complex.
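As a concrete illustration (not the paper's implementation), the sketch below collapses such a probabilistic complex to a deterministic one: each candidate simplex keeps the maximum likelihood over its intersect points, simplices below a threshold tau are dropped, and closure under taking faces is enforced so the result is a valid simplicial complex. The input point_likelihoods is an assumed mapping from candidate simplices to the overlap ratios of their intersect points.

def threshold_complex(point_likelihoods, tau):
    """point_likelihoods: dict mapping a simplex (tuple of vertex labels)
    to a list of overlap ratios; returns the kept simplices with scores."""
    kept = {s: max(v) for s, v in point_likelihoods.items() if max(v) >= tau}
    stack = list(kept)
    while stack:                          # enforce closure under taking faces
        s = stack.pop()
        if len(s) < 2:
            continue
        for v in s:
            face = tuple(x for x in s if x != v)
            if face not in kept:
                kept[face] = kept[s]
                stack.append(face)
    return kept

# Usage: threshold_complex({(1,): [1.0], (2,): [1.0], (1, 2): [0.4, 0.7]}, 0.5)
# keeps the edge (1, 2) with score 0.7 and both of its vertices.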

C. Computing Homology

As described in Section III, homology provides topological invariants. The homology of our representation can be extracted distributedly using the algorithms proposed by Muhammad et al. [5]. According to the results of the previous section, different complexes are obtained as a function of the threshold chosen on the simplex likelihood (i.e., maintaining only those simplices that have a likelihood above a specified threshold $\tau$). This of course means that the various homologies of these complexes are also a function of the threshold.

At this point, we employ a persistent homology approach [22], [23]. Namely, we do not choose a particular threshold on the probability, which would dictate only a single simplicial complex, but analyze the various homologies over the range of possible probabilities, $\tau \in [0, 1]$. The outcome of this approach is a set of barcodes whose initial position depicts the birth of a new topological feature (a new connected component or hole) and whose final position symbolizes the disappearance of that feature. We then choose a threshold according to the most persistent topological feature (i.e., a threshold chosen amongst the set of possible thresholds corresponding to the range with the most consistent feature). This allows us to choose a threshold on the conditional probabilities considered in the previous subsection without relying on a priori knowledge about the distribution of an agent's location, the number of agents, or the geometry of the environment. Though we do not describe how to compute persistence, there exist efficient algorithms to perform this computation [22], [23].
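A coarse proxy for this analysis can be sketched as follows. Instead of tracking individual features as true barcode algorithms [22], [23] do, the snippet below simply sweeps the threshold and records the ranges over which the Betti numbers stay constant; the longest run then suggests a stable threshold. Here betti_at is an assumed callback that builds the thresholded complex and returns $(\beta_0, \beta_1)$, e.g., via the routines sketched earlier.

def barcodes(betti_at, taus):
    """Return runs (tau_birth, tau_death, (b0, b1)) of constant Betti numbers."""
    readings = [betti_at(t) for t in taus]
    runs, start = [], 0
    for i in range(1, len(taus) + 1):
        if i == len(taus) or readings[i] != readings[start]:
            runs.append((taus[start], taus[i - 1], readings[start]))
            start = i
    return runs   # the longest run marks the most persistent topology

# Usage: runs = barcodes(my_betti_at, [i / 100 for i in range(101)])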

An example is illustrated in Fig. 9 for the experimental scenario considered in the next section. The x-axis in the right plot denotes a threshold over the likelihood, each blue line denotes the existence of a connected component or a hole as a function of this threshold, and the number of blue lines at a particular threshold denotes the number of connected components or holes.

D. Experiment

In this section, we consider two experimental setups with three and eight cameras, respectively. Video was recorded simultaneously from all cameras using several computers. The computers were synchronized using the network time protocol (NTP), and we use no prior knowledge about camera locations, no appearance or tracking models, and no knowledge about the number of targets. The data was processed on a single computer by simulating a distributed camera network. Namely, each camera was treated as an independent process on our computer. The amount of data transmitted and processing required


Fig. 10. Plots of the number of block detections per occlusion event over time for Experiment I. We note that the events are relatively sparse (over an 8.5-min period), and the number of blocks detected at each time step is under 15 in most cases. There were a total of 180, 42, and 270 events on each camera, respectively, for a total of 492 events in the network.

are small enough to occur distributedly on a sensor networkplatform such as CITRIC [24].

The simulation was performed in MATLAB by creating separate structures for each camera sensor, each of which maintained information about its current state and a list of in-queue and out-queue messages. A path connecting cameras with consecutive IDs is used as the communication graph. Every 0.1 s a single message is transmitted from each camera to its communication neighbors. Messages were processed locally once they were received. We keep track of the number of messages generated by each camera sensor and the amount of data that goes through them.

The data in both experiments was captured at a resolution of 320 × 240 for all the cameras at about 10 frames per second with continuous motion throughout the sequence. The number of people moving through the scene varied from one to five. We computed intersect points and corresponding probabilities as described in Section VI-B. However, instead of considering every possible pixel as an intersect point, we split the image domain into blocks of size 20 × 20 and use these regions. The raw transmission of these detections using a single bit per block would need to be streamed at approximately 14 kB/min from each camera. However, in our experiment we observe less than 2 kB/min of data generated by each camera (see Tables II and III). Since our algorithm is event driven, its communication cost is unaffected by devices with higher temporal resolution.
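The quoted streaming figure is easy to verify from the stated parameters; the short computation below reproduces it.

# Back-of-the-envelope check of the raw streaming figure quoted above:
# a 320x240 frame split into 20x20 blocks gives 16 * 12 = 192 blocks, so
# one bit per block at 10 frames/s comes to roughly 14 kB/min per camera.
blocks = (320 // 20) * (240 // 20)       # 192 blocks per frame
bytes_per_min = blocks / 8 * 10 * 60     # 1 bit/block, 10 fps, 60 s
print(bytes_per_min / 1000)              # ~14.4 kB/min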

1) Experiment I: In our first experiment, we utilized three cameras in an indoor environment. Each camera was connected to a different computer while recording the data. The sequence corresponds to about 8.5 min of recording, with the first 3.2 min corresponding to a single target, the next 3.3 min corresponding to a different single target, and the last 2 min corresponding to two targets moving in the environment. Images were captured at about 10 frames per second.

The physical setup of our experiment is shown at the top of Fig. 1. Views from the three cameras are shown in the top row of Fig. 8. Importantly, note that though there is significant overlap between the three cameras, finding common features between views would be difficult due to the large change in perspective. The decomposed camera views (after finding bisecting lines) are shown at the bottom of Fig. 8. There are three regions in camera 1, one region in camera 2, and five regions in camera 3 after decomposition. Fig. 9 (left) illustrates the corresponding complex after we threshold the simplex likelihoods.

TABLE IISUMMARY OF DATA TRANSMISSION FOR EXPERIMENT I

From the right plot, we observe that a single connected component and a single hole in the domain are the most persistent topological features in the coverage, as desired.

Since detections are only transmitted after an occlusion event, the transmission rate is low. Fig. 10 shows a summary of the number of blocks in which a detection was observed for each camera over time. Table II shows the average number of communication packets associated with the construction of 1-simplices and 2-simplices (columns 1 and 2) transmitted over the whole experiment, the average amount of data generated by each camera (column 3), and the average amount of data transmitted by each camera (column 4). The latter quantity includes messages delivered to the entire network that did not originate at the given camera. The data size of a 1-packet is obtained by assigning four bytes to encode the time stamp, two bytes to specify the ID of the camera source, and two bytes to encode the coordinates in which there was a block detection. The data size of a 2-packet is obtained by assigning four bytes to encode the time stamp, four bytes to specify the IDs of the cameras associated with the detection, and four bytes per block for coordinates. Note, no additional compression is performed, and the amount of data generated by each camera is less than 12% of the amount of data required if we were to stream all of the detections.

2) Experiment II: In our second experiment, we utilized eight cameras: four were indoors and four were outdoors. There were a total of four computers used for data recording (one for each nearest pair in the physical layout). The sequence corresponds to about 10.5 min, during which the number of targets varied between 2 and 5.

The physical layout of our experiment is shown in Fig. 11. Setting up the network for our experiment took less than 15 min by placing laptops at different locations and mounting cameras, which illustrates the inherent flexibility of an ad-hoc network. The decomposed camera views (after finding bisecting lines) are shown counterclockwise from the top-left to the bottom-right in the figure. Fig. 12 (left) illustrates the corresponding complex after we threshold the simplex likelihoods.


Fig. 11. Layout of Experiment II (top-right) and corresponding camera viewswith decomposed image domains (counterclockwise from top-left to bottom-right).

Fig. 12. CN-complex found for Experiment II using a threshold value chosen from the persistence diagram (left). The persistence diagrams obtained from the experiment (right). Note, a single connected component and a single hole are the most persistent features.

From the right plot, we observe that a single connected component and a single hole in the domain are the most persistent topological features in the coverage, as desired.

Table III shows the average number of packets associated with the construction of 1-simplices and 2-simplices (columns 1 and 2) transmitted over the whole experiment, the average amount of data generated by each camera (column 3), and the average amount of data transmitted from each camera (column 4). The sizes of the data packets are computed as before. Note that the amount of data generated by each camera is less than 7% of the amount of data required if we were to stream all of the detections.

TABLE III
SUMMARY OF DATA TRANSMISSION FOR EXPERIMENT II

Fig. 13. Coverage of a camera network (left), its corresponding tracking graph (right), and paths for two agents moving in the environment.


VII. WEAK TRACKING

In this section, we illustrate the utility of the CN-complex by describing how to perform tracking on the representation and how to use the produced data to detect homotopic paths. To begin, we make several observations. First, observe that a single agent's location is mapped to a simplex in the CN-complex by determining for which cameras the agent is visible. Second, given fast enough sampling, an agent in a simplex $\sigma$ can only move to a face of $\sigma$ or to a simplex for which $\sigma$ is a face. This observation is captured by the tracking graph introduced in Section III-A. Fig. 13 illustrates this mapping from simplicial complex to tracking graph using the example camera network considered in the background section. In the figure, we observe a pair of paths (left) and their corresponding projections onto the tracking graph (right). Detecting the location of agents in the tracking graph helps localize agents in the physical environment and allows for potentially more efficient deployment of valuable resources.
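The face relation that drives these transitions is straightforward to materialize as a graph. The sketch below is a minimal construction under the assumptions just stated: nodes are simplices, labeled by the set of cameras observing the agent, and a pair of directed edges joins each simplex to its faces and cofaces. The set-based encoding and the choice of the networkx library are our own illustrative choices, not the authors' implementation.

    import itertools
    import networkx as nx  # illustrative choice of graph library

    def tracking_graph(simplices):
        # Nodes are simplices of the CN-complex; edges join two simplices
        # whenever one is a face of the other, since with fast enough
        # sampling an agent can only move between a simplex and its
        # faces or cofaces.
        g = nx.DiGraph()
        g.add_nodes_from(simplices)
        for s, t in itertools.combinations(simplices, 2):
            if s < t or t < s:  # proper-subset test: one is a face of the other
                g.add_edge(s, t)
                g.add_edge(t, s)
        return g

    # Two cameras with an overlap region: an agent in {1} can move into
    # the overlap {1, 2}, but never jump directly from {1} to {2}.
    complex_ = [frozenset({1}), frozenset({2}), frozenset({1, 2})]
    print(list(tracking_graph(complex_).edges))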

The objective of this section is to characterize the motion of agents in the camera network coverage by identifying their occupancy in the tracking graph. We refer to this process as weak tracking. A path of an agent in the CN-complex is defined as the ordered list of simplices that it occupies over time. Since we employ no identification of agents, weak tracking results in some ambiguity when agents cross paths.

A. Tracking Multiple Targets

Our goal, in this subsection, is to specify the dynamics of moving agents in the environment via their dynamics on the tracking graph. Throughout this subsection, we assume that camera resolutions are high enough to identify the number of distinct agents in their field of view; however, the identification of the agents is not assumed.



Let $x(k)$ be the nodal count vector containing the number of agents at the different states in the tracking graph, where $k$ is the frame number. The transitions of agents between regions in the tracking graph are captured by a vector, $u(k)$, which specifies the number of transitions along each edge of the graph. Note, its sign specifies the direction of a transition. Since the tracking graph is a directed graph, we can specify an oriented incidence matrix, $D$, which satisfies the relation

$$x(k+1) = x(k) + D\,u(k). \qquad (12)$$

However, this formulation is unable to capture the fact that transitions leaving a node can never exceed the number of agents at that node at a given time. In order to quantify this property, we consider

$$u(k) = u_+(k) - u_-(k), \qquad (13)$$

where $(\cdot)_+$, for a vector or matrix, returns an element of the same dimension with entries equal to the original except that all negative entries are replaced by 0, $u_+(k) = (u(k))_+$, and $u_-(k) = (-u(k))_+$. We count the number of transitions out of a particular node by computing

$$z(k) = D_-\,u_+(k) + D_+\,u_-(k), \qquad (14)$$

where $D_+ = (D)_+$ and $D_- = (-D)_+$.

Each camera can only count the number of agents in its own coverage and cannot identify which agents are in the coverage of another camera. The number of agents counted by camera $i$, denoted $y_i(k)$, is equal to

$$y_i(k) = \sum_{\{j \,:\, i \in \sigma_j\}} x_j(k), \qquad (15)$$

where $\sigma_j$ denotes the simplex associated with node $j$ of the tracking graph.

The sensor count vector, $y(k)$, is then defined by its projections,

$$y(k) = P\,x(k), \qquad (16)$$

where $P_{ij} = 1$ if camera $i$ belongs to the simplex of node $j$ and 0 otherwise. Finally, we must acknowledge that agents can enter or exit the coverage of the network through different regions in the coverage. We capture this requirement via a vector, $b(k)$, that is added to the nodal count at every frame.

Given a tracking graph, we can define a corresponding incidence matrix, $D$, and a projection matrix, $P$, and if we know the count of agents at frames $k$ and $k+1$, then the transitions between nodes must satisfy

$$x(k+1) = x(k) + D\,u(k) + b(k), \quad y(k) = P\,x(k), \quad z(k) \le x(k). \qquad (17)$$
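To make the counting dynamics concrete, the sketch below advances the nodal counts by one frame and checks the departure constraint, following the reconstruction of (12)-(17) above. The toy incidence and projection matrices are invented for illustration.

    import numpy as np

    def positive_part(a):
        # The (.)_+ operator from the text: negative entries replaced by 0.
        return np.maximum(a, 0)

    def step(x, u, b, D, P):
        # Transitions out of each node, per (14): departures along an
        # edge's tail for positive u, along its head for negative u.
        z = positive_part(-D) @ positive_part(u) + positive_part(D) @ positive_part(-u)
        assert np.all(z <= x), "departures exceed the agents present at a node"
        x_next = x + D @ u + b   # counting dynamics plus boundary arrivals/exits
        return x_next, P @ x_next

    # Toy tracking graph: two nodes joined by one edge oriented 0 -> 1.
    D = np.array([[-1], [1]])    # oriented incidence matrix
    P = np.eye(2, dtype=int)     # each sensor observes exactly one node
    x0 = np.array([1, 0])        # a single agent at node 0
    x1, y1 = step(x0, u=np.array([1]), b=np.array([0, 0]), D=D, P=P)
    print(x1, y1)                # [0 1] [0 1]: the agent moved to node 1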

TABLE IV
LIST OF NODES OCCUPIED BY AGENTS OVER TIME AND CORRESPONDING COUNTS

Fig. 14. Plot of paths in the physical space joining two points (left) and their corresponding projection onto the tracking graph for the setup considered in the background section.

With these equations for the counting dynamics of agents in the graph and knowledge of the sensor counts $y(k)$, we can attempt to solve for the number of agents in the environment and their corresponding tracks. Hence, we consider these subproblems:

1) Counting Agents: Finding the number of agents in the coverage is a common surveillance problem. This can be addressed by either considering observations at a single frame or observations over multiple frames. In this scenario, we assume no boundary transitions. This problem can be posed as a question of feasibility under linear constraints by employing (17) and the constraint that the nodal counts sum to $N$, where $N$ is the number of agents in question.

2) Recovering Tracks: Finding the internal transitions of the agents, which define paths in the tracking graph, allows us to localize targets as they move through the physical space. This requires knowledge of the number of agents and when they enter or exit the coverage. If, in addition, we know transition probabilities and path probabilities, we can calculate an expected path in the environment when observations are unable to identify a unique path.

As an example, consider the paths displayed in Fig. 13. The physical and graph paths are displayed in the left and right plots of Fig. 13, respectively. An occupancy list for the agents over time in the tracking graph is included in Table IV (second column). From the counts $y(k)$ (see third column in Table IV) and knowledge of the CN-complex, we can formulate a simple optimization problem in order to recover the occupancy of the agents in the tracking graph. If we know


the initial configuration and set $b(k) = 0$, then we can pose the following binary programming problem:

$$\min_{U \in \mathcal{B}^n} \|U\|_1 \quad \text{subject to} \quad A\,U + B\,X = C, \qquad (18)$$

where $\mathcal{B}^n$ is the space of binary vectors, $U$ and $X$ are the vectors formed by stacking $u(k)$ and $x(k)$, respectively, and the matrices $A$, $B$, and $C$ are obtained by considering the dynamics from (17). In solving this problem, we look for the sparsest vector of transitions that explains the observed counts. The solution to this problem gives the recovered node occupancies shown in Table IV (last column). Note, most of the occupancies agree with the actual paths followed by the agents; however, there is a single discrepancy at one frame, which is still consistent with the observed counts. Hence, the recovered and actual paths are indistinguishable when only considering the count vectors.
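The following sketch solves a toy instance of this sparsest-transition recovery with an off-the-shelf mixed-integer solver. The three-node graph and the observed counts are invented, and we restrict each transition variable to {0, 1} for simplicity; this is a minimal illustration of the program in (18), not the authors' implementation.

    import numpy as np
    from scipy.optimize import milp, LinearConstraint, Bounds

    # Directed edges of a toy tracking graph on three nodes.
    edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
    D = np.zeros((3, len(edges)))
    for j, (tail, head) in enumerate(edges):
        D[tail, j], D[head, j] = -1.0, 1.0

    x_prev = np.array([1.0, 0.0, 0.0])   # nodal counts at frame k
    x_next = np.array([0.0, 1.0, 0.0])   # nodal counts at frame k + 1

    # Minimize the number of transitions subject to x_next - x_prev = D u,
    # with each u_j binary: the sparsest explanation of the observed counts.
    res = milp(
        c=np.ones(len(edges)),
        constraints=LinearConstraint(D, x_next - x_prev, x_next - x_prev),
        integrality=np.ones(len(edges)),
        bounds=Bounds(0, 1),
    )
    print(res.x)  # [1. 0. 0. 0.]: the single transition 0 -> 1 is recovered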

The tracking process, as described, on the CN-complex does not distinguish between agents locally, i.e., there is no way to separate crossing paths. Employing a local appearance model could resolve these ambiguities. In this context, it is also possible to build a distributed probabilistic process to perform tracking on a graph via the framework presented by Oh et al. [25].

B. Identifying Homotopic Tracks

Once tracks have been recovered, it is useful to identify paths with common start and end positions that are related to one another by deformation. This allows for clustering of paths when performing behavioral analysis of agents. For example, this can be used to identify (up to small deformations) the most common path followed by an individual when going from his office to a coffee shop.

Two paths are homotopic if they are continuous deformations of one another. First, let us consider paths in the CN-complex formed by 1-chains. In particular, we are interested in identifying when two paths, $\gamma_1$ and $\gamma_2$, with common start and end positions in the CN-complex are homotopic. Note that $\gamma_1 - \gamma_2$ forms a loop. If the two paths are homotopic, then $\gamma_1 - \gamma_2$ is the boundary of a 2-chain as a result of the comments following Definition 5. Hence, if the rank of the matrix $[\partial_2]$ does not equal the rank of the augmented matrix $[\partial_2 \mid \gamma_1 - \gamma_2]$, then $\gamma_1$ and $\gamma_2$ are not homotopic.
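This rank test is a one-line computation once the boundary matrix is in hand. The sketch below works over the rationals with a dense matrix, which suffices for illustration; the edge basis and the example complex (one filled triangle attached to an empty square) are our own.

    import numpy as np

    def possibly_homotopic(d2, loop):
        # Necessary condition for two paths to be homotopic: their
        # difference, a loop written as a 1-chain column vector, must not
        # increase the rank of [d2]. A failed test proves non-homotopy.
        return np.linalg.matrix_rank(d2) == np.linalg.matrix_rank(
            np.column_stack([d2, loop]))

    # Edge basis: e0=[0,1], e1=[1,2], e2=[0,2], e3=[2,3], e4=[0,3];
    # the only 2-simplex is [0,1,2], so its boundary is e0 + e1 - e2.
    d2 = np.array([[1], [1], [-1], [0], [0]])
    loop_filled = np.array([[1], [1], [-1], [0], [0]])  # bounds the triangle
    loop_hole = np.array([[1], [1], [0], [1], [-1]])    # circles the empty square
    print(possibly_homotopic(d2, loop_filled))  # True: test is inconclusive
    print(possibly_homotopic(d2, loop_hole))    # False: paths not homotopic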

When tracking agents using the tracking graph, paths are given by ordered lists of occupied simplices (not all of them have to be 1-simplices). In order to determine whether two such paths $\alpha_1$ and $\alpha_2$ with common starting and end locations are homotopic, we can form the loop $\alpha$ given by an ordered list of simplices. We can generate an ordered list of vertices by selecting the vertex with the smallest index from each simplex (i.e., if $\sigma = [v_{i_1}, v_{i_2}, \ldots, v_{i_m}]$ with $i_1 < i_2 < \cdots < i_m$, then $v_{i_1}$ is selected). After removing consecutively repeated vertices, a 1-chain, $\gamma$, can be built by selecting consecutive pairs of vertices as 1-simplices. Then, the rank check described previously can be performed.
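A sketch of this vertex-selection procedure is given below. The simplex path fed to it is hypothetical (the actual list from the worked example that follows is not reproduced here), but the extracted vertex sequence matches the one in the text.

    def path_to_chain(simplex_path):
        # Convert an ordered list of occupied simplices (each a list of
        # vertex indices) into a 1-chain: take the smallest-index vertex
        # of each simplex, drop consecutive repeats, then pair consecutive
        # vertices as 1-simplices.
        vertices = [min(s) for s in simplex_path]
        deduped = [v for i, v in enumerate(vertices)
                   if i == 0 or v != vertices[i - 1]]
        return list(zip(deduped, deduped[1:]))

    # A hypothetical loop through the tracking graph whose vertex list
    # matches the worked example: {5, 1, 2, 3, 4, 6, 5}.
    loop = [[5], [5, 1], [1], [1, 2], [2], [2, 3], [3],
            [3, 4], [4], [4, 6], [6], [6, 5], [5]]
    print(path_to_chain(loop))
    # [(5, 1), (1, 2), (2, 3), (3, 4), (4, 6), (6, 5)]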

In order to illustrate this process, consider the CN-complex from Fig. 14 and the paths $\alpha_1$ and $\alpha_2$. Observe that the resulting loop is given by the list of simplices shown at the top of the page. From this path, the list of vertices {5, 1, 2, 3, 4, 6, 5} is extracted and the 1-chain

$$\gamma = [v_5, v_1] + [v_1, v_2] + [v_2, v_3] + [v_3, v_4] + [v_4, v_6] + [v_6, v_5]$$

is constructed. Note that the rank of $[\partial_2 \mid \gamma]$ exceeds the rank of $[\partial_2]$, allowing us to conclude that the two paths are not homotopic.

VIII. CONCLUSION

In this article, a distributed algorithm for the robust construction of a simplicial representation of a camera network's coverage, the CN-complex, is presented. The utility of the representation in fully characterizing the topological structure of the network's coverage and tracking agents in the network is demonstrated. The construction proceeds by first decomposing the fields of view of the camera nodes locally and then determining overlap between pairs by transmitting sparse occlusion events across the network.

The strength of the representation is its ease of construction. By creating a representation that is able to accurately capture topological information about the network's coverage without making undue assumptions about the location of cameras or objects or employing agent identification, we bridge the gap between vision graphs, which provide limited information about overlaps in network coverage, and full metric representations, which falter in the ad-hoc distributed camera network context. This work provides a framework to integrate local image observations using a global algebraic framework and opens the door for new avenues of research in this nascent field.

REFERENCES

[1] Z. Cheng, D. Devarajan, and R. Radke, "Determining vision graphs for distributed camera networks using feature digests," EURASIP J. Appl. Signal Process., vol. 2007, no. 1, 2007, 10.1155/2007/57034.

[2] D. Marinakis and G. Dudek, "Topology inference for a vision-based sensor network," in Proc. Canadian Conf. Computer and Robot Vision, 2005, pp. 121–128.

[3] D. Marinakis, P. Giguere, and G. Dudek, "Learning network topology from simple sensor data," in Proc. Canadian Conf. Artificial Intelligence, 2007, pp. 417–428.

[4] V. de Silva and R. Ghrist, "Coordinate-free coverage in sensor networks with controlled boundaries via homology," Int. J. Robot. Res., vol. 25, pp. 1205–1221, 2006.

[5] A. Muhammad and A. Jadbabaie, "Decentralized computation of homology groups in networks by gossip," in Proc. Amer. Control Conf., 2007, pp. 3438–3443.

[6] E. Lobaton, A. Parvez, and S. Sastry, "Algebraic approach to recovering topological information in distributed camera networks," in Proc. Int. Conf. Information Processing in Sensor Networks, 2009, pp. 193–204.


[7] E. Lobaton, R. Vasudevan, S. Sastry, and R. Bajcsy, "Robust construction of the camera network complex for topology recovery," in Proc. ACM/IEEE Int. Conf. Distributed Smart Cameras, 2009, pp. 1–8.

[8] D. Makris, T. Ellis, and J. Black, "Bridging the gaps between cameras," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, 2004, vol. II, pp. 205–210.

[9] A. van den Hengel, A. Dick, and R. Hill, "Activity topology estimation for large networks of cameras," in Proc. IEEE Int. Conf. Video and Signal Based Surveillance, 2006, pp. 1486–1493.

[10] H. Detmold, A. van den Hengel, A. Dick, A. Cichowski, R. Hill, E. Kocadag, Y. Yarom, K. Falkner, and D. Munro, "Estimating camera overlap in large and growing networks," in Proc. ACM/IEEE Int. Conf. Distributed Smart Cameras, 2008, pp. 1–10.

[11] R. Hill, A. van den Hengel, A. Dick, A. Cichowski, and H. Detmold, "Empirical evaluation of the exclusion approach to estimating camera overlap," in Proc. ACM/IEEE Int. Conf. Distributed Smart Cameras, 2008, pp. 1–9.

[12] C. Stauffer and K. Tieu, "Automated multi-camera planar tracking correspondence modeling," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, 2003, vol. I, pp. 259–266.

[13] L. L. Presti and M. L. Cascia, "Real-time estimation of geometrical transformation between views in distributed smart-camera systems," in Proc. ACM/IEEE Int. Conf. Distributed Smart Cameras, 2008, pp. 1–8.

[14] M. Meingast, M. Kushwaha, S. Oh, X. Koutsoukos, A. Ledeczi, and S. Sastry, "Fusion-based localization for a heterogeneous camera network," in Proc. ACM/IEEE Int. Conf. Distributed Smart Cameras, 2008, pp. 1–8.

[15] A. Rahimi, B. Dunagan, and T. Darrell, "Simultaneous calibration and tracking with a network of non-overlapping sensors," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, 2004, vol. 1, pp. I-187–I-194.

[16] S. Funiak, C. Guestrin, M. Paskin, and R. Sukthankar, "Distributed localization of networked cameras," in Proc. Int. Conf. Information Processing in Sensor Networks, 2006, pp. 34–42.

[17] A. Hatcher, Algebraic Topology. Cambridge, U.K.: Cambridge Univ. Press, 2002.

[18] J. Munkres, Topology, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2000.

[19] T. Kaczynski, K. Mischaikow, and M. Mrozek, Computational Homology. New York: Springer-Verlag, 2003.

[20] R. Bott and L. Tu, Differential Forms in Algebraic Topology. New York: Springer-Verlag, 1995.

[21] B. Jackson, R. Bodor, and N. Papanikolopoulos, "Learning static occlusions from interactions with moving figures," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, 2004, vol. I, pp. 963–968.

[22] G. Carlsson, A. Zomorodian, A. Collins, and L. Guibas, "Persistence barcodes for shapes," in Proc. Eurographics/ACM SIGGRAPH Symp. Geometry Processing, New York, 2004, pp. 124–135.

[23] A. Zomorodian and G. Carlsson, "Computing persistent homology," Discrete Comput. Geom., vol. 33, no. 2, pp. 249–274, 2005.

[24] P. Chen et al., "CITRIC: A low-bandwidth wireless camera network platform," in Proc. ACM/IEEE Int. Conf. Distributed Smart Cameras, 2008, pp. 1–10.

[25] S. Oh and S. Sastry, "Tracking on a graph," in Proc. Int. Symp. Information Processing in Sensor Networks, 2005, pp. 195–202.

Edgar Lobaton (M'09) received the B.S. degrees in mathematics and electrical engineering from Seattle University, Seattle, WA, in 2004, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 2009.

He is currently a Post-Doctoral Researcher in the Department of Computer Science, University of North Carolina, Chapel Hill. He was previously engaged in research at Alcatel-Lucent Bell Labs in 2005 and 2009. His research interests include sensor networks, computer vision, tele-immersion, and motion planning.

Dr. Lobaton is the recipient of the 2009 Computing Innovation Fellows postdoctoral fellowship award, the 2004 Bell Labs Graduate Research Fellowship, and the 2003 Barry M. Goldwater Scholarship.

Ramanarayan Vasudevan (S'10) received the B.S. degree in electrical engineering and computer sciences and an Honors Degree in physics from the University of California, Berkeley, in 2006, and the M.S. degree in electrical engineering from the University of California, Berkeley, in 2009.

His research interests include sensor networks, computer vision, hybrid systems, and optimal control. He is the recipient of the 2002 Regent and Chancellor's Scholarship.

Ruzena Bajcsy (M'81–SM'88–F'92) received the M.S. and Ph.D. degrees in electrical engineering from Slovak Technical University, Bratislava, Slovakia, in 1957 and 1967, respectively, and the Ph.D. degree in computer science from Stanford University, Stanford, CA, in 1972.

She is currently a Professor of electrical engineering and computer sciences at the University of California, Berkeley, and Director Emeritus of the Center for Information Technology Research in the Interest of Science (CITRIS). Prior to joining Berkeley, she headed the Computer and Information Science and Engineering Directorate at the National Science Foundation. As a former faculty member of the University of Pennsylvania, she also served as the Director of the University's General Robotics and Active Sensory Perception Laboratory, which she founded in 1978, and chaired the Computer and Information Science Department from 1985 to 1990.

Dr. Bajcsy is a member of the National Academy of Engineering and the National Academy of Sciences Institute of Medicine as well as a Fellow of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers, and the American Association for Artificial Intelligence. In 2001, she received the ACM/Association for the Advancement of Artificial Intelligence Allen Newell Award, and was named one of the 50 most important women in science in the November 2002 issue of Discover Magazine. In 2008, she was the recipient of the Benjamin Franklin Medal for Computer and Cognitive Sciences. In 2010, she was the recipient of the IEEE Robotics and Automation Pioneer Award.

Shankar Sastry (F'95) received the B.Tech. degree from the Indian Institute of Technology, Bombay, in 1977, and the M.S. degree in EECS, the M.A. degree in mathematics, and the Ph.D. degree in EECS, all from the University of California, Berkeley, in 1979, 1980, and 1981, respectively.

He is currently Dean of the College of Engineering, University of California, Berkeley. He was formerly the Director of the Center for Information Technology Research in the Interest of Society (CITRIS) and the Banatao Institute at CITRIS Berkeley. He served as chair of the EECS Department from January 2001 through June 2004. In 2000, he served as Director of the Information Technology Office at DARPA. From 1996 to 1999, he was the Director of the Electronics Research Laboratory at Berkeley, an organized research unit on the Berkeley campus conducting research in computer sciences and all aspects of electrical engineering. He is the NEC Distinguished Professor of Electrical Engineering and Computer Sciences and holds faculty appointments in the Departments of Bioengineering, EECS, and Mechanical Engineering. Prior to joining the EECS faculty in 1983, he was a professor at MIT.

