Probabilistic color matching and tracking of human subjects

Abdeq M. Abdi,* Mendel Schmiedekamp, and Shashi Phoha

Information Science and Technology Division, Applied Research Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

*Corresponding author: [email protected]

Received 17 December 2009; revised 6 August 2010; accepted 10 August 2010; posted 13 August 2010 (Doc. ID 121581); published 8 September 2010

Pattern discovery algorithms based on the computational mechanics (CM) method have been shown to succinctly describe underlying patterns in data through the reconstruction of minimum probabilistic finite state automata (PFSA). We apply the CM approach toward the tracking of human subjects in real time by matching and tracking the underlying color pattern as observed from a fixed camera. Objects are extracted from a video sequence, and then raster scanned, decomposed with a one-dimensional Haar wavelet transform, and symbolized with the aid of a red–green–blue (RGB) color cube. The clustered causal state algorithm is then used to reconstruct the corresponding PFSA. Tracking is accomplished by generating the minimum PFSA for each subsequent frame, followed by matching the PFSAs to the previous frame. Results show that there is an optimum alphabet size and segmentation of the RGB color cube for efficient tracking. © 2010 Optical Society of America

OCIS codes: 100.4999, 100.0100, 150.1135, 150.6044.

1. Introduction

Video surveillance has been shown to be an effective tool for monitoring public order, traffic patterns, and public safety, and is currently being employed by various agencies, including Homeland Security, the Department of Defense, and intelligence agencies [1–3]. Typically, surveillance videos are either processed in real time by trained personnel or the footage is saved in video files for later analysis. The amount of video footage is vast compared to the number of qualified analysts available to process it. In addition, the storage space required to save these video files is considerable. Hence, it is very desirable to automate the video surveillance system and minimize the required video storage space. A basic task of a video surveillance process is tracking the individuals in a video sequence. This typically involves tracking the shape, face, or clothing worn by a target across a video sequence [1–3]. Current tracking methods include region-, active-contour-, feature-, and model-based tracking. These methods track changes in the region containing the objects, the boundary or shape of the objects, or specific biometric features, or they compare the objects with a human model for tracking [1]. Mathematical and probabilistic tools proposed for tracking include the Kalman filter, Bayesian networks, the hidden Markov model for behavior understanding, wavelet decomposition, hetero-associative joint transform correlation, backprojection, the mean shift method, and the camshift method [1–9]. In this paper, we report a real-time clustered causal state algorithm (CCSA) video color tracking algorithm that tracks human subjects based on their color patterns.

The color tracking algorithm proposed in this work is based on pattern discovery and computational mechanics (CM). Pattern discovery with CM is an approach that extracts meaningful and predictive patterns from data using statistical mechanics and Shannon information theory [10–13].

Unlike the previous probabilistic tools for tracking, and pattern recognition methods in general, pattern discovery algorithms do not require prior knowledge of the pattern or the structure that generated the process. CM represents the underlying pattern of a process in a minimum probabilistic finite state automaton (PFSA). The required storage space of these PFSAs is several orders of magnitude smaller than that of the actual data, depending on the complexity of the pattern [14]. An interesting aspect of CM is that a totally random or fully structured process tends to have the minimum number of states for the PFSA, whereas a mixture of random and structured data results in a more complex PFSA. The causal state splitting reconstruction (CSSR) algorithm can reconstruct an approximate minimum deterministic PFSA, or an ε-machine, given a sequence of symbols from a finite alphabet [13]. However, CSSR is time inefficient for real-time applications. The CCSA is a nondeterministic variant of CSSR and is able to reconstruct an approximate minimum nondeterministic PFSA, with some loss in fidelity, using efficient clustering technology to speed up the reconstruction process [14]. The principle behind CCSA, its time complexity, and its application in lossy video compression were described previously [14]. Results show that the time complexity of CCSA is linear with respect to the size of the symbol sequence, and thus CCSA is capable of real-time PFSA reconstruction. In this paper, we apply CCSA in a color tracking algorithm (CTA). The proposed CCSA-CTA extracts moving objects from a video frame using a standard background subtraction method [15]. The extracted objects are then raster scanned, and the pixel count is reduced with a one-dimensional fast Haar wavelet transform (1D FWT) [16]. The resulting red–green–blue (RGB) features are then quantized using a segmented RGB color cube. This value is then symbolized using symbols from a finite alphabet. Once symbolized, the CCSA is used to determine the underlying color pattern of each object by generating an approximate minimum PFSA from the symbol sequence. The PFSA for each object is reconstructed at each frame, and the resulting automata are matched and tracked across frames using a stat-metric approach [14]. The advantage of the PFSA method compared to other tracking methods is the ability to treat patterns as objects. The proposed tracking method is very intuitive: through the minimum PFSA, the patterns can be tracked, manipulated, compared, and efficiently stored, and the underlying patterns can be regenerated probabilistically.

This paper has six sections. Section 2 outlines computational mechanics concepts and the basis of the CCSA algorithm. Section 3 describes the color tracking algorithm, including the matching and tracking process. Section 4 describes simulations and experimental results, including the tracking of two subjects in a laboratory environment and a comparison with the standard backprojection method. Section 5 discusses the simulation and experimental results. Finally, we conclude in Section 6.

2. CCSA

A. Computational Mechanics

In this section we briefly describe CM concepts as outlined by other researchers [10,11,14]. Consider a stationary, discrete, bi-infinite stochastic process, $\dots S_{-2} S_{-1} S_0 S_1 S_2 \dots$, where $S_i$ can take on any value from a finite alphabet. The first task is to break the sequence into two sequences representing the past history $\overleftarrow{S} \in \{\dots S_{-2} S_{-1}\}$ and the future $\overrightarrow{S} \in \{S_0 S_1 S_2 \dots\}$. The mutual information between the past and the future, $I(\overrightarrow{S}; \overleftarrow{S})$, can be used to determine how much of the past history predicts the future sequences. To minimize the complexity, a partition $\mathcal{R}$ of the history $\overleftarrow{S}$ is determined that maximizes the mutual information between the future and the partition. Because computing resources are limited, the lengths of the past and future sequences being examined must be limited to what is computationally feasible. The mutual information between a sequence of $M$ symbols into the future, $\overrightarrow{S}^M$, and the partition is directly given by the entropy of the future sequences, $H(\overrightarrow{S}^M)$, and the conditional entropy, $H(\overrightarrow{S}^M \mid \mathcal{R})$, as given by the following equation [10]:

$$I(\overrightarrow{S}^M; \mathcal{R}) = H(\overrightarrow{S}^M) - H(\overrightarrow{S}^M \mid \mathcal{R}). \tag{1}$$

Because $H(\overrightarrow{S}^M)$ is the same for each part of the partition, the mutual information is maximized by minimizing the conditional entropy. The probability of generating $\overrightarrow{S}^M$ given $L$ previous symbols $\overleftarrow{s}^L$ is then given by the maximum likelihood estimate, as given by the following equation [10–13]:

$$P(\overrightarrow{S}^M = \overrightarrow{s}^M \mid \overleftarrow{s}^L) = \frac{\nu(\overleftarrow{S}^L = \overleftarrow{s}^L,\; \overrightarrow{S}^M = \overrightarrow{s}^M)}{\nu(\overleftarrow{S}^L = \overleftarrow{s}^L)}, \tag{2}$$

where the lowercase variables are a particular realization of the past or future variables, and $\nu(x)$ is the frequency of a sequence or a symbol $x$. A maximal entropy partition is then used to minimize the number of subdivisions of the partition while maximizing the predictive power of the partition. This requires all equivalent histories that have similar conditional probability distributions for future symbols to be merged. Two histories $\overleftarrow{s}$ and $\overleftarrow{s}'$ are equivalent for future symbols if

$$P(\overrightarrow{S}^M = \overrightarrow{s}^M \mid \overleftarrow{S} = \overleftarrow{s}) = P(\overrightarrow{S}^M = \overrightarrow{s}^M \mid \overleftarrow{S} = \overleftarrow{s}'). \tag{3}$$
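To make the estimate of Eq. (2) concrete, the following minimal C++ sketch (C++ being the language of our later implementation) accumulates the frequency counts $\nu$ with a sliding window for the CCSA case $M = 1$; the container layout and the names HistoryStats, countHistories, and condProb are illustrative assumptions, not the actual code.

// Sketch: frequency counts behind the maximum likelihood estimate of
// Eq. (2) for M = 1. A history is the string of the last L symbols.
#include <map>
#include <string>

struct HistoryStats {
    int total = 0;              // nu(history)
    std::map<char, int> next;   // nu(history, future symbol)
};

std::map<std::string, HistoryStats> countHistories(const std::string& seq,
                                                   int L) {
    std::map<std::string, HistoryStats> table;
    for (std::size_t i = L; i < seq.size(); ++i) {
        HistoryStats& h = table[seq.substr(i - L, L)];
        ++h.total;
        ++h.next[seq[i]];       // one symbol into the future (M = 1)
    }
    return table;
}

// Maximum likelihood estimate P(a | history) = nu(history, a) / nu(history).
double condProb(const HistoryStats& h, char a) {
    auto it = h.next.find(a);
    return it == h.next.end() ? 0.0 : double(it->second) / h.total;
}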

B. Minimum PFSA Reconstruction

A PFSA is a quintuple $\{Q, A, \delta, \tau, q_i\}$, where $Q$ is the set of states, $A$ is a finite alphabet, $\delta$ is the state transition map, $\tau$ is the state transition probability, and $q_i$ is the initial starting state [17]. The CCSA can reconstruct an approximate minimum nondeterministic PFSA and succinctly encode the conditional probability distribution described by Eqs. (2) and (3) in real time. In this case, the resulting partition contains the effective states of the PFSA. The stages of the CCSA are initialization, clustering, and finalization. In the initialization stage, a hash table is constructed from a sequence, and it contains unique histories of a fixed length $L$. The advantage of the hash table is the ability to give each unique history of a fixed length $L$ a unique position in the hash table, thereby making it easier for bookkeeping purposes and later clustering. The maximum size of the hash table is given by the cardinality of the alphabet set $A$ to the power of the fixed history length $L$, that is, $|A|^L$. To acquire the hash table, a history window of length $L$ and a future window of length $M$ are scanned over the symbol sequence, as shown in Fig. 1 for the case of a three-symbol alphabet $\{b, g, r\}$. Like CSSR, the CCSA examines only one symbol into the future ($M = 1$). The frequency count of each unique history is recorded, together with the frequency count of the corresponding future symbol. At this stage, the hash table and the corresponding symbol and history counts form a D-Markov process as described by others [11]. In the clustering stage, the CCSA merges histories that have similar conditional probability distributions. To satisfy Eq. (3), a distance metric between two histories, $h$ and $h'$, is used to determine whether the two histories belong to the same state. If $\nu(h)$ and $\nu(a_n)$ are the frequency of the history and of the corresponding future symbol, respectively, the distance metric between the conditional probabilities of the histories is given by the following:

$$d = \sum_{n}^{|A|} \left( \frac{\nu(a_n)}{\nu(h)} - \frac{\nu(a'_n)}{\nu(h')} \right)^2. \tag{4}$$
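As an illustration, the distance of Eq. (4) can be computed directly from the frequency counts of two histories. In this sketch, next/total and next2/total2 stand for the counts entering $\nu(a_n)/\nu(h)$ and $\nu(a'_n)/\nu(h')$; the function and parameter names are our own assumptions.

// Sketch of Eq. (4): squared distance between the empirical next-symbol
// distributions of two histories h and h'.
#include <map>
#include <string>

double historyDistance(const std::map<char, int>& next, int total,
                       const std::map<char, int>& next2, int total2,
                       const std::string& alphabet) {
    auto p = [](const std::map<char, int>& m, int tot, char a) {
        auto it = m.find(a);
        return it == m.end() ? 0.0 : double(it->second) / tot;
    };
    double d = 0.0;
    for (char a : alphabet) {
        double diff = p(next, total, a) - p(next2, total2, a);
        d += diff * diff;   // (nu(a_n)/nu(h) - nu(a'_n)/nu(h'))^2
    }
    return d;
}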

In the finalization stage, the CCSA uses the hash table to determine the transition map ($\delta$) between states and the corresponding output symbols, as shown in Fig. 2. If $\nu(\sigma_k, a_i, \sigma_l)$ is the frequency of transitions on symbol $a_i$ from $\sigma_k$ to $\sigma_l$, then the transition probability is given by

$$\tau(\sigma_k, a_i, \sigma_l) = \frac{\nu(\sigma_k, a_i, \sigma_l)}{\sum_{l=1}^{|Q|} \sum_{n=1}^{|A|} \nu(\sigma_k, a_n, \sigma_l)}, \tag{5}$$

where the denominator represents the total number of transitions from state $\sigma_k$ to all other states.
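A sketch of Eq. (5) follows; the three-dimensional count array trans[k][n][l] holding $\nu(\sigma_k, a_n, \sigma_l)$ is an assumed layout for illustration, not the paper's data structure.

// Sketch of Eq. (5): normalize the count of transitions on symbol a_n from
// state sigma_k to sigma_l by all transitions leaving sigma_k.
#include <vector>

using Counts3 = std::vector<std::vector<std::vector<int>>>; // [k][n][l]

double transitionProb(const Counts3& trans, int k, int n, int l) {
    long total = 0;                        // denominator of Eq. (5)
    for (const auto& bySymbol : trans[k])
        for (int c : bySymbol)
            total += c;
    return total == 0 ? 0.0 : double(trans[k][n][l]) / total;
}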

C. Agglomerative Clustering Algorithm and Parameters

Initially, the first history encountered in the hash table is set as the prototype for the first cluster, $C_1$. The distance metric of Eq. (4) is then used to determine whether the histories in the hash table are within a cluster radius $r$ of $C_1$, where $r$ is the tolerance radius used to satisfy the equivalence relation in Eq. (3). Histories that are within $r$ of the first cluster are aggregated and incorporated into the cluster. If a history is outside the cluster radius $r$, a new cluster is created, and that history becomes the prototype for a second cluster, $C_2$. If a history is within a cluster radius $r$ of an existing cluster, then the corresponding history is moved to that cluster. The process is repeated until every history in the hash table is included in a cluster. Each cluster then corresponds to a state $\sigma$ in the PFSA, such that $Q = \{\sigma_1, \sigma_2, \dots, \sigma_{|Q|}\} = \{C_1, C_2, \dots, C_{|Q|}\}$, as shown in Fig. 3. Currently, the optimum radius of a cluster is determined heuristically from the alphabet size and a constant $Z$ using the following formulation:

$$r = 1 - \frac{Z}{|A|^{0.5}}, \tag{6}$$

where $Z = 0.75$ was determined to be optimal for PFSA reconstruction [14]. With Eq. (6), an upper bound is placed on the number of possible states such that $|Q| \le |A|$ [14].
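The clustering stage can be summarized by the following sketch, which assigns each history (summarized here by its next-symbol probability vector) to the first cluster whose prototype lies within the radius of Eq. (6), and seeds a new cluster otherwise. The first-member-as-prototype policy and all names are our simplifying assumptions.

// Sketch of the agglomerative clustering stage using the radius of Eq. (6).
#include <cmath>
#include <vector>

using Dist = std::vector<double>;   // P(a | history), one entry per symbol

static double sqDist(const Dist& p, const Dist& q) {
    double d = 0.0;
    for (std::size_t n = 0; n < p.size(); ++n)
        d += (p[n] - q[n]) * (p[n] - q[n]);     // Eq. (4)
    return d;
}

// Returns a cluster label (PFSA state index) for every history.
std::vector<int> clusterHistories(const std::vector<Dist>& hist,
                                  int alphabetSize, double Z = 0.75) {
    const double r = 1.0 - Z / std::sqrt(double(alphabetSize)); // Eq. (6)
    std::vector<int> label(hist.size(), -1);
    std::vector<Dist> proto;                    // cluster prototypes
    for (std::size_t i = 0; i < hist.size(); ++i) {
        for (std::size_t c = 0; c < proto.size(); ++c)
            if (sqDist(hist[i], proto[c]) <= r) { label[i] = int(c); break; }
        if (label[i] < 0) {                     // outside every radius:
            label[i] = int(proto.size());       // seed a new cluster
            proto.push_back(hist[i]);
        }
    }
    return label;
}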

Fig. 1. (Color online) Method to count the histories and the future symbols using a sliding window. The window limits the past to $L$ symbols and the future to $M$ symbols.

Fig. 2. Probabilistic finite state automaton. The states contain equivalent histories, and transitions to other states occur on a symbol taken from a finite alphabet with a transition probability $\tau$.

Fig. 3. Clustering of equivalent histories. $C_1$ and $C_2$ are the centers of clusters 1 and 2, and $r$ is the clustering radius.


If $P$ is the size of the symbol sequence, $|s|$, the time complexity of the CCSA with agglomerative clustering is $O(|A|^{1/r+1} \cdot N + |A|^{2(1/r+1)+1})$ for the case $|A|^L > P$, and $O(|A|^{1/r+1} \cdot P + |A|^{2(1/r+1)})$ for the case $|A|^L < P$ [14]. The time complexity of the CCSA is $O(P)$, or linear with respect to the size of the symbol sequence, if $|A|^L < P$, $|A| \ll P$, and $|A|^L \gg |A|^{1/r+1}$, which is satisfied for most cases [14].

3. Color Tracking Algorithm

The purpose of the CCSA-CTA is to encode the color patterns of an object into a minimum PFSA, and then match and track the PFSA across video frames. The initial process of the CCSA-CTA starts with a video feed from a video camera system, as shown in Fig. 4. A frame generator then determines the average background frame and the corresponding standard deviation (sigma). A standard 3-sigma threshold was used for the background subtraction.
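A minimal sketch of the 3-sigma test is given below, assuming per-pixel mean and standard deviation images stored as flat float arrays; the names frame, meanBg, and sigmaBg are ours, not the paper's.

// Sketch: per-pixel 3-sigma foreground test for background subtraction.
#include <cmath>
#include <vector>

std::vector<bool> foregroundMask(const std::vector<float>& frame,
                                 const std::vector<float>& meanBg,
                                 const std::vector<float>& sigmaBg) {
    std::vector<bool> mask(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i)
        mask[i] = std::fabs(frame[i] - meanBg[i]) > 3.0f * sigmaBg[i];
    return mask;
}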

The background frame is then subtracted from a group of frames (GOF) containing the objects to be tracked. The objects are then segmented and raster scanned vertically to extract relevant RGB features. It is possible to have redundancy in the data compared to the observed pattern. Scaling down the objects to a smaller number of pixels in comparison to the pattern will not severely affect the accuracy of the PFSA reconstruction process, as long as there is sufficient information to satisfy the frequency count in Eq. (5) and the noise is minimized. A straightforward method that we have proposed and implemented is to decompose the image using a 1D FWT right after the raster scanning stage, as shown in Fig. 4(b). Wavelet transforms are multiresolution, and images can be scaled to arbitrary dimensions [9,16]. The standard wavelet and scaling coefficients of the Haar wavelet transform are given by the following:

$$\psi(x) = \begin{cases} 1 & 0 \le x < 1/2 \\ -1 & 1/2 \le x < 1 \\ 0 & \text{otherwise} \end{cases}, \qquad \phi(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}. \tag{7}$$

The 1D FWT routine decomposes the objects by taking the RGB average and the difference of each pair of neighboring pixel points. The RGB average is equivalent to a low-pass filter (LP), while the RGB difference is equivalent to a high-pass filter (HP). The pixel count is then downshifted, or reduced by one half, on each iteration of the Haar wavelet transform. This is done for each RGB channel. Figure 5 shows the wavelet decomposition filter banks diagram. The reduced RGB pixel features from the LP–LP segment are then used as coordinates on an RGB color cube, as shown in Fig. 6. The RGB color cube is subdivided into cube sections, and each cube section is given a unique coordinate in the RGB color cube. If $V$ is the number of divisions along each axis of the RGB color cube, and $R$, $G$, and $B$ are the pixel values of the red, green, and blue channels, respectively, then the coordinates of a cube subsection, $r$, $g$, and $b$, rounded to the nearest integer, are given by the following equations:

$$r = (R \cdot V / 256), \qquad g = (G \cdot V / 256), \qquad b = (B \cdot V / 256). \tag{8}$$

Fig. 4. (Color online) Color tracking algorithm block diagram. (a) Background and group-of-frames generation. (b), (c) Background subtraction, object extraction, segmentation, symbolization, and PFSA reconstruction. (d) PFSA matching with previously stored PFSA. The distance metric is used to determine the distance between the distributions generated by the two PFSAs.

Fig. 5. 1D FWT decomposition filter banks diagram.


A $V$-numbering system formulation is then used to give each cube section a unique base-10 value, as given by the following polynomial expansion:

$$F(r, g, b) = r \cdot V^2 + g \cdot V + b. \tag{9}$$

The maximum value of $F$ is given by the following:

$$F_{\max} = (V - 1) \cdot (V^2 + V + 1). \tag{10}$$

The $F(r, g, b)$ values are then uniformly sampled into color bins, where each bin is given a symbol from an alphabet set $A$ composed of $N$ symbols:

$$a_n = A\big( F(r, g, b) \cdot N / F_{\max} \big). \tag{11}$$

The resulting sequence is converted to an approximate minimum PFSA using the CCSA.
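The symbolization chain of Eqs. (8)–(11), together with one low-pass Haar averaging pass, can be sketched as follows. The rounding is clamped to $V - 1$ (our guard so that $F$ never exceeds $F_{\max}$), and the types and names are illustrative assumptions.

// Sketch: one Haar low-pass pass, then RGB cube quantization (Eq. (8)),
// V-numbering (Eqs. (9), (10)), and uniform binning into N symbols (Eq. (11)).
#include <vector>

struct Rgb { double r, g, b; };

// One low-pass (averaging) iteration of the 1D Haar FWT per channel;
// the pixel count is halved, as described above.
std::vector<Rgb> haarLowPass(const std::vector<Rgb>& px) {
    std::vector<Rgb> out;
    for (std::size_t i = 0; i + 1 < px.size(); i += 2)
        out.push_back({ (px[i].r + px[i + 1].r) / 2,
                        (px[i].g + px[i + 1].g) / 2,
                        (px[i].b + px[i + 1].b) / 2 });
    return out;
}

int symbolize(const Rgb& p, int V, int N) {
    // Eq. (8), rounded to the nearest integer and clamped to V - 1 (our guard).
    auto q = [V](double c) { int v = int(c * V / 256.0 + 0.5);
                             return v < V ? v : V - 1; };
    int r = q(p.r), g = q(p.g), b = q(p.b);
    int F    = r * V * V + g * V + b;                 // Eq. (9)
    int Fmax = (V - 1) * (V * V + V + 1);             // Eq. (10)
    int n = F * N / Fmax;                             // Eq. (11): uniform bins
    return n < N ? n : N - 1;                         // symbol index into A
}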

The final stage of the CCSA-CTA requires matching the PFSAs in subsequent frames. The stat-metric algorithm is used to compare the distributions generated by two or more PFSAs [14]. The algorithm takes a PFSA and runs the automaton down to a predetermined iteration or depth. During this process, the symbol probability vector $P(a)$ and the state probability vector $P(\sigma)$ for a PFSA are determined iteratively. The symbol probability vector for the $i$th iteration is given by the following equation:

$$P_i(\bar{a}) = \sum_{k}^{|Q|} \sum_{l}^{|Q|} P_{i-1}(\sigma_k) \cdot \tau(\sigma_k, \bar{a}, \sigma_l). \tag{12}$$

The next state probability vector is then given by the following equation:

$$P_i(\bar{\sigma}) = \sum_{n}^{|A|} \sum_{k}^{|Q|} P_{i-1}(\sigma_k) \cdot \tau(\sigma_k, a_n, \bar{\sigma}). \tag{13}$$

The symbol probability vector and the state probability vector must each sum to 1, such that the following are satisfied:

$$\sum_{k=1}^{|Q|} P(\sigma_k) = 1, \qquad \sum_{n=1}^{|A|} P(a_n) = 1. \tag{14}$$

The iterations start with an initial state probability vector at the start state, $P_0(\sigma)$, which is unity at the start state and zero for all other states. The next iteration of the symbol probability vector is then determined by inserting Eq. (13) into Eq. (12) and repeating the process. The distance metric, using the sum of squares over $I$ iterations, between a PFSA $X$ and a PFSA $Y$ is then given by the following equation [14,17]:

$$D = \frac{1}{I} \sum_{i}^{I} \sum_{n}^{|A|} \big( P_{X_i}(a_n) - P_{Y_i}(a_n) \big)^2. \tag{15}$$

Because it is possible for the PFSAs to start from different states, an offset on the start state is introduced into PFSA $X$ in order to determine the minimum distance metric in Eq. (15). The maximum offset possible is equal to the number of states in PFSA $X$. The distance metric in Eq. (15) is calculated at each offset, and the minimum distance metric is then used for the matching criterion [14]. In the CCSA-CTA, PFSA $Y$ is an array of PFSAs acquired in previous frames and stored in computer memory, and PFSA $X$ is the PFSA for the current frame. Each object and corresponding PFSA $X$ in the current frame is then compared against each PFSA $Y$ in the array. The PFSA $Y$ element that gives the minimum distance metric corresponds to a possible match for PFSA $X$; a match is declared if the distance metric is within a predefined threshold. Once the current object is matched, the corresponding PFSA $X$ replaces the PFSA $Y$ element if it produces a better match than the old element. Finally, if there is no match and the distance metric is outside the predefined threshold, a new object is assumed, and the corresponding PFSA is added to the PFSA $Y$ array. The process is repeated across frames.
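The following sketch implements the stat-metric comparison of Eqs. (12)–(15): both automata are run for $I$ iterations from their start states, and the squared differences of the symbol probability vectors are accumulated. The tau[k][n][l] layout is an assumed representation, and the two PFSAs are assumed to share the same alphabet.

// Sketch of the stat-metric distance of Eq. (15).
#include <vector>

struct Pfsa {
    int start;                                          // q_i
    std::vector<std::vector<std::vector<double>>> tau;  // tau[k][n][l]
};

// One iteration: from P_{i-1}(sigma), compute P_i(a) (Eq. (12)) and
// P_i(sigma) (Eq. (13)).
static void step(const Pfsa& m, std::vector<double>& pState,
                 std::vector<double>& pSym) {
    std::size_t Q = m.tau.size(), A = m.tau[0].size();
    std::vector<double> next(Q, 0.0);
    pSym.assign(A, 0.0);
    for (std::size_t k = 0; k < Q; ++k)
        for (std::size_t n = 0; n < A; ++n)
            for (std::size_t l = 0; l < Q; ++l) {
                pSym[n] += pState[k] * m.tau[k][n][l];  // Eq. (12)
                next[l] += pState[k] * m.tau[k][n][l];  // Eq. (13)
            }
    pState.swap(next);
}

double statMetric(const Pfsa& x, const Pfsa& y, int I) {
    std::size_t A = x.tau[0].size();
    std::vector<double> px(x.tau.size(), 0.0), py(y.tau.size(), 0.0);
    px[x.start] = 1.0; py[y.start] = 1.0;               // P_0(sigma)
    std::vector<double> sx, sy;
    double D = 0.0;
    for (int i = 0; i < I; ++i) {
        step(x, px, sx); step(y, py, sy);
        for (std::size_t n = 0; n < A; ++n)
            D += (sx[n] - sy[n]) * (sx[n] - sy[n]);     // Eq. (15) summand
    }
    return D / I;
}

In the full matching stage described above, this value would additionally be minimized over the start-state offsets of PFSA $X$ before being compared with the matching threshold.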

4. Experiments and Results

A. Object Matching Simulation and Results

The matching and tracking components of the CCSA-CTA are initially tested in MATLAB by matching a simple three-element rectangular object. The three colors and corresponding RGB values are red $R[200, 50, 50]$, blue $B[50, 50, 200]$, and green $G[50, 200, 50]$, with a black background $K[50, 50, 50]$. The dimensions are 12 pixels by 75 pixels for the objects and 24 pixels by 150 pixels for the background. The RGB cube is subdivided evenly by powers of 2, that is, by $V = 2$, 4, 8, or 16 divisions along each axis, corresponding to 8, 64, 512, and 4096 cubic elements, respectively. The corresponding $F(r, g, b)$ values for the cube sections occupied by the three object colors are $\{2, 4, 1\}$, $\{12, 48, 3\}$, $\{113, 393, 78\}$, and $\{963, 3123, 828\}$, respectively. The CCSA is initially set for $I = 500$ and $L = 3$, and the size of the alphabet set is varied over 2, 4, 8, and 16 symbols. No pixel reduction is done using the 1D FWT.

Fig. 6. RGB color cube. The RGB pixel values specify the coordinates on the color cube. The color cube is subdivided into cube sections, and each cube section is given a coordinate $(r, g, b)$.


Figure 7 shows a rectangular object and a triangular object and their corresponding PFSAs. As shown in Fig. 7, the color patterns for both objects are similar but produce two distinct PFSAs. This indicates that the PFSAs are weighted by the shape of the target as well as by the color pattern. For the triangular object, the surrounding region $K$ is in contact with each color region, as indicated by the PFSA as well as by its geometry. Note that the number of states of the PFSA is equal to the number of distinct color regions. By studying the PFSA, we can see how the objects are put together and the corresponding transitions between the regions. The probability of a transition between color regions is close to zero for both objects, since the transitions are abrupt and have a minimal area; however, the probability of staying in the current region is close to 1. In addition, the transition probabilities approach their respective limits of zero and 1 as the dimensions of the objects are scaled up. This is a direct result of Eq. (2), since more information, i.e., more pixels, results in a better approximation of the maximum likelihood estimate of the conditional probability. Two rectangular object targets, R-G-B and B-G-R, with the dimensions given above, are matched against a set composed of three rectangular objects: R-G-B, B-G-R, and R-B-G. Table 1 shows the matching results for both targets as a function of alphabet size and the number of subdivisions along the axes of the RGB color cube. As shown in Table 1, the B-G-R target is correctly identified in all cases. However, the R-G-B target is correctly identified only when the alphabet size increases along with $V$; otherwise, the R-B-G object is incorrectly identified as the R-G-B target. This is because R-B-G is closer to R-G-B in RGB space than B-G-R, as determined by Eq. (6), thus requiring more refinement in the alphabet size.

Figure 8 shows the effect of Gaussian noise on the number of states of the PFSA for the case $V = 16$ and $N = 8$. As shown by the fit, the number of states increases linearly from 3 to 11 below $\sigma = 10$ and then flattens after $\sigma = 30$; $\sigma = 25$ corresponds to a $\pm 2$ change in symbols, and $\sigma = 5$ corresponds to a $\pm 1$ change in symbols. After $\sigma = 30$, the process is maximally complex, essentially a stochastic process containing both random data and structured data. Increasing the history length under noisy conditions results in a larger number of states than a smaller history length does. Without noise, increasing the history length beyond $L = 3$ does not improve accuracy, since the color patterns are simple.

There is some redundancy in the data compared to the observed pattern. For example, the dimensions of the rectangular object are 12 × 75 pixels, or 900 total pixels in the object, while the color pattern is composed of only three colors. Reducing the pixel count by a factor of 4, to 225 pixels, using the 1D FWT gives similar matching results for the case of a B-G-R rectangular target with no noise.

Fig. 7. (Color online) Three-element color objects and corresponding minimum PFSAs. (a) Rectangular object and corresponding PFSA. (b) Triangular object and corresponding PFSA.

Table 1. Matching Results for R-G-B and B-G-R Object Targets as a Function of Alphabet Size and the Number of Subdivisions along the RGB Color Cube, V^a

                              Alphabet Size
V        2                4                8                16
2        RGB–RBG, BGR     RGB, BGR         RGB, BGR         RGB, BGR
4        RGB–RBG, BGR     RGB–RBG, BGR     RGB, BGR         RGB, BGR
8        RGB–RBG, BGR     RGB–RBG, BGR     RGB–RBG, BGR     RGB, BGR
16       RGB–RBG, BGR     RGB–RBG, BGR     RGB–RBG, BGR     RGB–RBG, BGR

^a The matching results are given as {X, Y}, where X is the matching result for the R-G-B target and Y is the matching result for the B-G-R target. The set of objects for matching is R-G-B, B-G-R, and R-B-G. An ambiguous match is shown as a combination of objects from the matching set, such as R-G-B–R-B-G for the R-G-B target.


Any amount of noise will be incorporated by the 1D FWT, decreasing the accuracy of the PFSA matching stage.

B. Single Camera Tracking Experiment and Results

The CCSA-CTA was implemented in Visual C++ using the results of the previous simulation as the initial starting conditions. The tracking experiment was conducted in our Robotics and Controls Laboratory, which has floor dimensions of 9 m × 9 m. The testing of the CCSA-CTA was conducted with an Internet video camera (Sony SNC-RZ30). The field of view of the camera was set to 45° horizontally and 22° vertically. The first part of the CCSA-CTA acquires an average background frame and a GOF for background subtraction and object extraction. Note that the background interferes with a foreground object when the colors of the foreground and background are similar. A 3-sigma threshold is used to separate the GOF from the background. To fill in missing foreground pixels, a small routine is written into the background subtraction algorithm that allows all vertical object pixels within 10 pixels of a neighboring object pixel to be treated as part of the object (sketched after this paragraph). The objects are then individually extracted, and the boundaries of the objects are determined. The parameters of the CCSA-CTA are $I = 500$, history length $L = 3$, and alphabet size $N = \{2, 4, 8, 16\}$, with the RGB color cube subdivided along each axis into $V = \{2, 4, 8\}$ divisions. No pixel reduction is done using the 1D FWT. In addition, the effect of object shape was minimized by skipping over the background pixels. This was done to reduce the variability of the PFSAs, since the subjects tend to change shape as they move. Two human subjects were tracked with the camera. Figure 9 shows the subjects' silhouettes and corresponding PFSAs for $V = 8$ and $N = 8$. Subject 1 has a blue shirt and dark (black) pants, and Subject 2 has a blue shirt and light (blue) pants. The subjects move in an irregular circular pattern while entering, leaving, and reemerging into the field of view of the camera. The subjects also occult each other as they move in this irregular circular pattern. About 1158 JPEG frames were sequentially acquired and processed by the CCSA-CTA at 10 frames/s. The tracking is successful for $V = 8$ and $N > 2$. Figure 10 shows the tracking image results for selected frames for $N = 8$ and $V = 8$. The CCSA-CTA identifies and bounds each object with a unique color as each object sequentially enters the scene, as shown in Figs. 10(a) and 10(b); in this case, a green boundary is given to Subject 2 and a red boundary to Subject 1. The CCSA-CTA erroneously identifies a fourth subject in Fig. 10(c) with a blue boundary, because the object extraction algorithm assumes a single object instead of two unique objects. The CCSA-CTA is able to reacquire the correct objects, as shown in Fig. 10(d), and to continue tracking until the objects occult each other, as in Fig. 10(f).
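The gap-filling step mentioned above can be sketched as a per-column pass over the foreground mask; the row-major mask layout and the name fillVerticalGaps are our assumptions for illustration.

// Sketch: within each image column, background runs of at most maxGap
// pixels between two foreground pixels are relabeled as foreground.
#include <vector>

void fillVerticalGaps(std::vector<bool>& mask, int w, int h, int maxGap = 10) {
    for (int x = 0; x < w; ++x) {
        int lastFg = -1;                        // row of last foreground pixel
        for (int y = 0; y < h; ++y) {
            if (!mask[y * w + x]) continue;
            if (lastFg >= 0 && y - lastFg <= maxGap)
                for (int yy = lastFg + 1; yy < y; ++yy)
                    mask[yy * w + x] = true;    // bridge the short gap
            lastFg = y;
        }
    }
}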

Fig. 8. (Color online) Number of states as a function of the standard deviation (sigma) of the Gaussian distribution used to simulate noise, for the case $L = 3$.

Fig. 9. (Color online) Subject silhouettes and corresponding PFSAs: (a) Subject 1, (b) Subject 2, (c) Subject 1 PFSA, (d) Subject 2 PFSA. Note that transition probabilities of less than 0.01 are not included. The red lines are counterclockwise transitions, and the blue lines are clockwise transitions.


The objects are then reacquired after occultation, as shown in Fig. 10(g). Figure 11 shows the resulting distance metric $D$ across frames. The distance metric is calculated between the PFSA for Subject 1 at each frame and the stored PFSA $Y$ prototypes for Subject 1 and Subject 2. A frame with no distance metric indicates that Subject 1 is not in the frame. The minimum distance metric is a match for Subject 1. Note that, for $0.15 < D < 0.25$ in Fig. 11, Subject 1 is either completely or partially occulted in the frame. Similar observations were made for the PFSA $X$ generated by Subject 2, as shown in Fig. 12. Overall, 38% of the frames are completely or partially occulted as the subjects move around the laboratory space. The CCSA-CTA was compared to the standard backprojection algorithm found in the OpenCV suite [18]. Instead of raster scanning the subject's silhouette, the backprojection method uses a three-dimensional RGB color histogram to determine the color features of a subject model. The normalized histogram is then backprojected to create a probabilistic image. The probabilistic image is then matched across frames using a predefined threshold on the sum of the absolute differences between corresponding probabilistic pixel values of the model and the test frame. For the backprojection method, Figs. 9(a) and 9(b) were used as models for Subject 2 and Subject 1, respectively. The parameters for the CCSA-CTA are $N = 8$ and $V = 4$, 8, and 16, and the dimensions of the backprojection histograms are $[4, 4, 4]$, $[8, 8, 8]$, and $[16, 16, 16]$. Figure 13 shows the CCSA-CTA and backprojection results for all frames. The measured noise level for the camera and resulting images is about 4% of the RGB range (or 255). Figure 13 also shows the effects of noise and pixel reduction on the accuracy of the CCSA-CTA. As shown in the figure, the minimum error for the CCSA-CTA is 2% at the nominal noise level and 4% with a 12% noise level. For the backprojection results, the minimum error is 0.7% at the nominal noise level and 3% with a 12% noise level. For both the backprojection and the CCSA-CTA results, the errors were attributed to the occulted frames. With the occulted frames removed, the accuracy of the tracking is 100% for both methods. However, the CCSA-CTA error was higher for the occulted frames, due to the dissimilarity between the occulted and nonocculted frames. This is a direct result of the local, rather than global, characteristics of the CCSA-CTA.

Fig. 10. (Color online) Tracking of Subject 1 (dark pants) and Subject 2 (light pants) in frames (a) 464, (b) 496, (c) 499, (d) 529, (e) 538, (f) 542, (g) 593, (h) 594, and (i) 614. The color tracking algorithm successfully tracks both objects when they are not occulted. There is ambiguity in the case of occlusion, as shown in (c) and (f). The CTA correctly differentiates the objects after occlusion.

Fig. 11. (Color online) Distance metric of the PFSA $X$ generated by Subject 1 when compared to the PFSA $Y$ prototypes of Subject 1 and Subject 2.

Fig. 12. (Color online) Distance metric of the PFSA $X$ generated by Subject 2 when compared to the PFSA $Y$ prototypes of Subject 1 and Subject 2.


No major improvements were observed as the history length was set to 4 and 8, since the color patterns of each object are simple. However, the CCSA-CTA was able to maintain tracking after the pixel count was reduced by a factor of 8, from about 2000 pixels on average to about 250 pixels, and with 3 times the nominal noise level, with only minor degradation in the tracking. The trade-off was an improvement in the PFSA reconstruction speed. With the occulted frames removed, the tracking accuracy reaches 100% for both methods given the test case and added noise.

5. Discussion

The CCSA-CTA demonstrates the feasibility of using the CCSA and pattern discovery algorithms in the color tracking of human subjects. We demonstrated efficient matching of simulated objects, and we used the results of the simulation in the experimental section to successfully track two human subjects under laboratory conditions. In particular, the experiments revealed several particularities of the CCSA-CTA. Occultation and partially visible objects in the scene confuse the CCSA-CTA; however, the CCSA-CTA recovers once the objects are in full view of the camera, as shown in Section 4. The effect of occultation is a major issue in other tracking algorithms as well, as shown in the literature, and is a continuing area of research [1]. Another issue with tracking is determining a threshold for the distance metric of Eq. (15). In the CCSA-CTA, the threshold was set manually and is not updated continuously. This is cumbersome, because a new threshold is required if the alphabet size changes, since the distance metric is related to the symbol set of Eq. (11). A potential solution is to use clustering technology to determine an optimum threshold level automatically. Increasing the divisions of the RGB cube while maintaining the alphabet size decreases the resolution of the CCSA-CTA. This is due to the number system formulation of Eq. (9); under it, green and blue are much closer in RGB space than red and green or red and blue. To maintain the resolution of the CCSA-CTA, the size of the alphabet must increase proportionally, which is not desirable, since this would reduce the speed of the reconstruction of the machines. A better option would be to use nonuniform sampling in Eq. (11), or to determine distinct regions using a histogram and give each region a symbol from an alphabet. Since we cannot predict the colors of objects beforehand, this would require continuous updates of the regions. The trade-off of this approach would be a reduction in speed if the updating is done for every frame.

Results show that the CCSA-CTA is comparable to the backprojection method, although the backprojection method is slightly more accurate. Backprojection utilizes all the bins, up to 4096 bins for $V = 16$, whereas the CCSA-CTA samples the bins. In effect, the use of the histogram gives a global view of the targets, while the CCSA gives a local view. A local view is an advantage when identifying subjects that have similar colors but different color distribution patterns. In addition, the backprojection method requires models for tracking, whereas the patterns are discovered by the CCSA-CTA. Since the color of clothing is heavily weighted compared to the face, neck, and other exposed body color, the CCSA-CTA will fail if the color patterns of the subjects are similar. A more robust scheme is to use other biometrics to increase the probability of a match. In the future, we plan to subdivide a human subject into zones composed of the head, body, and lower body parts, similar to what is done in the literature [19,20]. From this, we can extend the RGB color cube to include gaits, walking periodicity, object centroid, facial identification, and posture.

As shown in the simulations, there is some redundancy in the data compared to the observed pattern. Reducing the pixel count with the 1D FWT increased the PFSA reconstruction speed, since the number of pixels to process is reduced. This shows that scaling objects down to a smaller number of pixels in comparison to the color pattern will not severely affect the accuracy of the PFSA reconstruction process, as long as sufficient information is given to satisfy the frequency count. The speed of the CCSA-CTA can be increased further by minimizing the reconstruction of PFSAs at every frame. Since the color distribution of the objects does not change significantly from one frame to the next, it may be possible to reconstruct only the part of the PFSA that needs updating. Finally, the CCSA-CTA is heavily dependent on the quality of the object extraction routine. Background interference is a serious issue. A more sophisticated background subtraction routine is required, such as Gaussian mixtures, background cut, and Markov random field methods, as previously shown in the literature [21–23]. Another method we are interested in is using CM directly on the image without requiring background subtraction.

Fig. 13. (Color online) CCSA-CTA and backprojection results. The parameters are $V = 4$, 8, and 16 and $|A| = 8$ for the CCSA-CTA, and $[4, 4, 4]$, $[8, 8, 8]$, and $[16, 16, 16]$ RGB color bin histogram sizes for the backprojection method. The minimum and maximum pixel reductions are PR = 0 and PR = 8, respectively. The minimum and maximum noise levels are 4% and 12%, respectively.


If the background is fixed, then it may be possible to divide the image into a background PFSA and a foreground PFSA. We are currently exploring the use of the CCSA-CTA with a multispectral thermal infrared imager.

6. Conclusion

The CCSA-CTA demonstrates the feasibility of using the CCSA and pattern discovery algorithms in the color tracking of human subjects. Two human subjects were successfully tracked in a laboratory environment. Our results show that the CCSA-CTA is robust to noise and is able to maintain the distinction between the subjects, even after occultation. The optimum alphabet size for the CCSA for matching simple color patterns is between three and eight. However, to maintain the resolution of the CCSA-CTA, we have to increase the size of the alphabet proportionally to the subdivision of the RGB color cube, which is not desirable. A better option would be to use nonuniform sampling, or to determine distinct regions using a histogram and give each region a symbol from an alphabet. Since the color of clothing is heavily weighted compared to the face, neck, and other exposed body color, the CCSA-CTA will likely fail if the colors of the clothing of two or more targets are similar. A more robust scheme is to extend the RGB color cube and add other biometrics. Finally, the CCSA-CTA is heavily dependent on the quality of the object extraction routine. Background interference is a serious issue, and a more sophisticated background subtraction routine is required for large-scale tracking.

This material is based on work supported by the Department of Defense, the U.S. Army Research Laboratory, and the U.S. Army Research Office (USARO) under the eSensIF Multidisciplinary University Research Initiative (MURI) award W911NF-07-1-0376. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsor.

References

1. W. Hu, T. Tan, L. Wang, and S. Maybank, "A survey on visual surveillance of object motion and behavior," IEEE Trans. Syst. Man Cybern. C 34, 334–352 (2004).
2. R. T. Collins, A. J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O. Hasegawa, P. Burt, and L. Wixon, "A system for video surveillance and monitoring," Carnegie Mellon University report CMU-RI-TR-00-12 (2001).
3. C. Lerdsudwichai, M. Abdel-Mottaleb, and A. Ansari, "Tracking multiple people with recovery from partial and total occlusion," Pattern Recogn. 38, 1059–1070 (2005).
4. A. A. Argyros and M. I. A. Lourakis, "Three-dimensional tracking of multiple skin-colored regions by a moving stereoscopic system," Appl. Opt. 43, 366–377 (2004).
5. S. Weng, C. Kuo, and S. Tu, "Video object tracking using adaptive Kalman filter," J. Visual Commun. Image Represent. 17, 1190 (2006).
6. G. R. Bradski, "Computer video face tracking for use in a perceptual user interface," Intel Technol. J. Q2, 1–15 (1998).
7. M. J. Swain and D. H. Ballard, "Color indexing," Int. J. Comput. Vis. 7(1), 11–32 (1991).
8. M. S. Alam, J. Khan, and A. Bal, "Heteroassociative multiple-target tracking by fringe-adjusted joint transform correlation," Appl. Opt. 43, 358–365 (2004).
9. F. A. Sadjadi, "Infrared target detection with probability density functions of wavelet transform subbands," Appl. Opt. 43, 315–323 (2004).
10. C. R. Shalizi and J. P. Crutchfield, "Computational mechanics: pattern and prediction, structure, and simplicity," J. Stat. Phys. 104, 817–879 (2001).
11. A. Ray, "Symbolic dynamic analysis of complex systems for anomaly detection," Signal Process. 84, 1115–1130 (2004).
12. V. Rajagopalan and A. Ray, "Symbolic time series analysis via wavelet-based partitioning," Signal Process. 86, 3309–3320 (2006).
13. C. R. Shalizi and K. L. Shalizi, "Blind construction of optimal nonlinear recursive predictors for discrete sequences," in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (AUAI Press, 2004), Vol. 70, pp. 504–511.
14. M. Schmiedekamp, A. Subbu, and S. Phoha, "The clustered causal state algorithm: efficient pattern discovery for lossy data-compression applications," Comput. Sci. Eng. 8, 59–67 (2006).
15. M. Piccardi, "Background subtraction techniques: a review," in 2004 IEEE International Conference on Systems, Man, and Cybernetics (IEEE, 2004), pp. 3099–3104.
16. C. S. Burrus, R. A. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms: A Primer (Prentice-Hall, 1998).
17. E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R. C. Carrasco, "Probabilistic finite-state machines: Part I," IEEE Trans. Pattern Anal. Mach. Intell. 27, 1013–1025 (2005).
18. Open Source Computer Vision, http://opencv.willowgarage.com/wiki/.
19. T. Gevers, "Robust segmentation and tracking of colored objects in video," IEEE Trans. Circuits Syst. Video Technol. 14, 776–781 (2004).
20. M. W. Lee and R. Nevatia, "Body part detection for human pose estimation and tracking," in Proceedings of the 2008 IEEE Workshop on Motion and Video Computing (IEEE, 2008).
21. J. Rittscher, J. Kato, S. Joga, and A. Blake, "A probabilistic background model for tracking," in ECCV 2000, LNCS 1843, D. Vernon, ed. (Springer-Verlag, 2000), pp. 336–350.
22. J. Sun, W. Zhang, X. Tang, and H.-Y. Shum, "Background cut," in ECCV 2006, Part II, LNCS 3952, A. Leonardis, H. Bischof, and A. Pinz, eds. (Springer-Verlag, 2006), pp. 628–641.
23. Y. Shan and R. Wang, "Improved algorithms for motion detection and tracking," Opt. Eng. 45, 067201 (2006).
