1980 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 11, NOVEMBER 2013

A Heat-Map-Based Algorithm for Recognizing Group Activities in Videos

Weiyao Lin, Hang Chu, Jianxin Wu, Bin Sheng, and Zhenzhong Chen

Abstract—In this paper, a new heat-map-based algorithm is proposed for group activity recognition. The proposed algorithm first models human trajectories as a series of heat sources and then applies a thermal diffusion process to create a heat map (HM) for representing the group activities. Based on this HM, a new key-point-based (KPB) method is used for handling the alignments among HMs with different scales and rotations. A surface-fitting (SF) method is also proposed for recognizing group activities. Our proposed HM feature can efficiently embed the temporal motion information of the group activities, while the proposed KPB and SF methods can effectively utilize the characteristics of the HM for activity recognition. Section IV demonstrates the effectiveness of our proposed algorithms.

Index Terms—Activity recognition, group activity, heat map, surface matching.

I. Introduction

DETECTING group activities or human interactions has attracted increasing research interest in many applications such as video surveillance and human–computer interaction [1]–[6].

Many algorithms have been proposed for recognizing group activities or interactions [1]–[6], [24], [25]. Zhou et al. [2] proposed to detect pair activities by extracting causality, mean, and variance features from bi-trajectories. Ni et al. [3] further extended the causality features into three types: individual, pair, and group. Chen et al. [5] detected group activities by introducing connected active segmentations for representing the connectivity among people.

Manuscript received September 27, 2012; revised March 24, 2013; accepted May 8, 2013. Date of publication June 19, 2013; date of current version November 1, 2013. This work was supported in part by the National Science Foundation of China under Grants 61001146 and 61202154, the Open Project Program of the National Laboratory of Pattern Recognition, the SMC Grant of SJTU, the Shanghai Pujiang Program (12PJ1404300), and the Chinese National 973 Grants (2010CB731401). The basic idea of this paper appeared in our conference version [27]. In this version, we propose new KPB and SF methods to handle the heat map alignments, carry out detailed analysis, and present more performance results. The first two authors contributed equally to this paper. This paper was recommended by Associate Editor F. G. B. De Natale.

W. Lin and H. Chu are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]).

J. Wu is with the National Key Laboratory for Novel Software Technology, Department of Computer Science, Nanjing University, Nanjing 210023, China (e-mail: [email protected]).

B. Sheng is with the Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]).

Z. Chen is with MediaTek USA, Inc., San Jose, CA 95134 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2013.2269780

Cheng et al. [4] proposed the Group Activity Pattern, which represents group activities as Gaussian processes and extracts Gaussian parameters as features. However, most of the existing algorithms extract overall features from the activities' entire motion information (e.g., the statistical average of the motion trajectory). These features cannot suitably embed the activities' temporal motion information (e.g., they fail to indicate where a person is in the video at a certain moment). Thus, they have limitations when recognizing more complex group activities. Although some methods [6], [29] incorporate the temporal information with chain models such as hidden Markov models (HMMs), they have the disadvantage of requiring large-scale training data [17]. Other methods try to include the temporal information by attaching time stamps to trajectories and perform recognition by associating these time stamp labels [18], [19]. However, these methods are more suitable for scenarios with only one trajectory, or with trajectories in fixed correspondence. They become less effective or even infeasible when describing and differentiating the complicated temporal interactions among multiple trajectories in group activities. Furthermore, [24] and [25] give more extensive surveys of the existing techniques used in group activity recognition and crowd analysis.

In addition, handling motion uncertainty is another important issue in group activity recognition. Since the motions of people vary inherently in group activities, the recognition accuracy may be greatly affected by this uncertainty. Although some methods utilize Gaussian process estimation or filtering to handle this problem [3], [4], they do not simultaneously consider preserving the activities' temporal motion information.

Furthermore, the recognition method is a third key issue for recognizing group activities. Although popularly used models, such as linear discriminative analysis and HMMs [6], show good results in many scenarios, their training difficulty and the required training data scale increase substantially when the feature vector length becomes large or the group activity becomes complex. Therefore, it is also nontrivial to develop more flexible recognition methods for effectively handling the recognition task.

In this paper, we propose a new heat-map-based (HMB) algorithm for group activity recognition. The contributions of our work can be summarized as follows.

1) We propose a new heat map (HM) feature to represent group activities. The proposed HM can effectively capture the temporal motion information of the group activities.

1051-8215 © 2013 IEEE


Fig. 1. (a) Activity trajectory. (b) Corresponding heat source series. (c) HM diffused from the heat source series in (b). (d) HM surface of (c) in 3-D.

2) We propose to introduce a thermal diffusion process to create the HM. In this way, the motion uncertainty from different people can be efficiently addressed.

3) We propose a key-point-based (KPB) method to handle the alignments among HMs with different scales and rotations. In this way, the HM differences due to motion uncertainty can be further reduced and the follow-up recognition process can be greatly facilitated.

4) We also propose a new surface-fitting (SF) method to recognize the group activities. The proposed SF method can effectively capture the characteristics of our HM feature and perform recognition efficiently.

The remainder of this paper is organized as follows. Section II describes the basic ideas of our proposed HM feature as well as the KPB and SF methods. Section III presents the details of our HMB algorithm. The experimental results are shown in Section IV, and Section V concludes the paper.

II. Basic Ideas

A. Heat Map Feature

As mentioned, given the activities' motion information (i.e., motion trajectories in this paper), directly extracting global features will lose useful temporal information. In order to avoid such information loss, we propose to model the activity trajectory as a series of heat sources. As shown in Fig. 1, (a) is the trajectory of one person. In order to transfer the trajectory into a heat source series, we first divide the entire video scene into small nonoverlapping patches [i.e., the small squares in (b)]. If the trajectory goes through a patch, this patch is defined as one heat source. In this way, a trajectory can be transferred into a series of heat sources, as in Fig. 1(b). Furthermore, in order to further capture the temporal information of the trajectory, we also introduce a decay factor on different heat sources such that the thermal energies of the older heat sources (i.e., patches closer to the starting point of the trajectory) are smaller while the newer heat sources have larger thermal energies. In this way, the thermal values of the heat source series increase along the direction of the trajectory and the temporal information can be effectively embedded.
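The patch discretization and decay weighting described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the patch size, decay constant, and trajectory format are assumptions:

```python
import math

def trajectory_to_heat_sources(trajectory, patch_size=10, k_t=0.125):
    """Map a trajectory (list of (frame, x, y)) to heat-source patches.

    Each patch the trajectory passes through becomes a heat source whose
    energy decays with the time elapsed since the trajectory left it, so
    newer patches carry more energy than older ones.
    """
    t_cur = trajectory[-1][0]            # current (last) frame of the clip
    leave_time = {}                      # patch -> last frame it was visited
    for frame, x, y in trajectory:
        patch = (int(x // patch_size), int(y // patch_size))
        leave_time[patch] = frame        # overwritten => frame of leaving
    # Older patches (left earlier) receive smaller thermal energies.
    return {p: math.exp(-k_t * (t_cur - t_leave))
            for p, t_leave in leave_time.items()}

# A short straight-line trajectory: energies grow toward the newest patch.
traj = [(f, 5 + 10 * f, 5) for f in range(4)]
energies = trajectory_to_heat_sources(traj)
```

Here the decayed energies increase monotonically along the direction of motion, which is exactly the temporal ordering the HM feature is meant to embed.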

Furthermore, since people's trajectories may have large variations, features taken directly from the heat source series would be greatly affected by this motion fluctuation. Therefore, in order to reduce the motion fluctuation, we further propose to introduce a thermal diffusion process that diffuses the heat from the heat source series to the entire scene. We call this diffusion result the HM. With our HM feature, we can describe the activities' motion information by 3-D surfaces. Fig. 1(c) and (d) shows the HM of the trajectory in Fig. 1(a) in 2-D format and in 3-D surface format, respectively. Several points need to be mentioned about the HM in our paper.

1) Note that although heat diffusion has been introduced for object segmentation in some works [8], the mechanism and utilization of the HM in our algorithm are far different. To the best of our knowledge, this is the first work to introduce the HM into group activity recognition.

2) The definition of 'heat map' in this paper is also different from the ones used in some activity recognition methods [11], [12]. In those methods, the HMs are defined to reflect the number of transitions among different regions without considering the order of the passes. Thus, they are more focused on reflecting the popularity of regions (i.e., whether some regions are more often visited by people) while neglecting the temporal motion information as well as the interactions among trajectories.

3) With the HM features, we can perform offline activity recognition by creating HMs for the entire trajectories. This offline recognition is important in many applications such as video retrieval and surveillance video investigation [6], [15]. Furthermore, the HM features can also be used to perform online (or on-the-fly) recognition by using shorter sliding windows. This point will be further discussed in Section IV.

After the calculation of the HM features, we can use them for recognizing group activities. However, two problems need to be solved before performing recognition with HM features. They are described in the following.

B. Alignments Among HMs

Although the thermal diffusion process can reduce the motion fluctuation effect due to motion uncertainty or tracking biases, the resulting HMs can still differ considerably due to the various motion patterns of different activities. For example, in Fig. 2(a), since the trajectories of human activities vary in direction and length, the HMs for the same type of group activity show large differences in scale and rotation. Therefore, alignments are necessary to reduce these HM differences for facilitating the follow-up recognition process.

In this paper, we propose a new KPB method to handle the alignments among HMs. Since our HMs are featured with peaks [i.e., local maxima in the HM, as in Fig. 1(d)], the proposed KPB method extracts the peaks from the HMs as key points and then performs alignments according to these key points in an iterative way. In this way, the scale and rotation variations among HMs can be effectively removed. Fig. 2(b) shows the alignment results of the HMs in (a) by our KPB method. More details about the KPB method will be described in the next section.

Fig. 2. Alignments among HMs. (a) HMs for the group activity 'Gather' performed by different people. (b) Alignment results of the HMs in (a) by our KPB method.

Fig. 3. Process of the SF method.

C. Recognition Based on HMs

Since the HM feature includes rich information, the problem then becomes the selection of a suitable method for performing recognition based on this HM feature. In this paper, we further propose an SF method for activity recognition. In our SF method, a set of standard surfaces is first identified for representing different activities. Then, the similarities between the surface of the input HM and the standard surfaces are calculated. Finally, the best matched standard surface is selected and its corresponding activity becomes the recognized activity for the input HM. The process of our SF method is shown in Fig. 3.

With the basic ideas of the HM feature and the KPB and SF methods described above, we can present our HMB group activity recognition algorithm. It is described in detail in the following section.

III. HMB Algorithm

The framework of our HMB algorithm is shown in Fig. 4: the input group activities' trajectories are first transferred into heat source series, and then the thermal diffusion process is performed to create the HM feature for describing the input group activity. After that, the KPB method is used for aligning HMs, and finally the SF method is used for recognizing the group activities. As mentioned, the heat source series transfer, the thermal diffusion, the KPB method, and the SF method are the four major contributions of our proposed algorithm. Thus, we focus on describing these four parts in the following.

Fig. 4. Process of the HMB algorithm.

A. Heat Source Series Transfer

Assume that we have in total J trajectories in the current group activity. The thermal energy E_i of the heat source patch i can be calculated by

E_i = \sum_{j} E_{i,j} \cdot e^{-k_t (t_{cur} - t_{id,j})}    (1)

where e^{-k_t (t_{cur} - t_{id,j})} is the time decay term [10], k_t is the temporal decay coefficient, t_{cur} is the current frame number, and t_{id,j} is the frame number when the jth trajectory leaves patch i. E_{i,j} is the accumulated thermal energy for trajectory j in patch i and can be calculated by (2). From (1), we can see that newer heat sources of the trajectory carry more thermal energy than older ones.

E_{i,j} = \int_{0}^{t_{id,j} - t_{is,j}} C \cdot e^{-k_t t} \, dt = \frac{C}{k_t} \left( 1 - e^{-k_t (t_{id,j} - t_{is,j})} \right)    (2)

where t_{is,j} and t_{id,j} are the frame numbers when the jth trajectory enters and leaves patch i, respectively, k_t is the temporal decay coefficient as in (1), and C is a constant. In the experiments of our paper, C is set to 1. From (2), we can see that the accumulated thermal energy increases with the stay length of trajectory j in patch i. If j stays in i for a longer time, more thermal energy is accumulated in patch i. On the other hand, if no trajectory goes through patch i, the accumulated thermal energy of patch i is 0, indicating that patch i is not a heat source patch.
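Equations (1) and (2) can be sketched in a few lines. This is a schematic reading of the formulas, not the authors' code; the helper names, the visit bookkeeping, and the parameter values are illustrative assumptions:

```python
import math

def accumulated_energy(t_enter, t_leave, k_t=0.125, C=1.0):
    """Eq. (2): energy a trajectory accumulates in a patch while staying
    there, saturating toward C/k_t for long stays."""
    return (C / k_t) * (1.0 - math.exp(-k_t * (t_leave - t_enter)))

def patch_energy(visits, t_cur, k_t=0.125, C=1.0):
    """Eq. (1): total thermal energy of a heat-source patch.

    `visits` is a list of (t_enter, t_leave) frame pairs, one per trajectory
    that passed through the patch; each contribution decays with the time
    elapsed since that trajectory left the patch (t_cur - t_leave).
    """
    return sum(accumulated_energy(ts, td, k_t, C) * math.exp(-k_t * (t_cur - td))
               for ts, td in visits)

# A longer stay accumulates more energy (Eq. 2) ...
longer = accumulated_energy(0, 20)
shorter = accumulated_energy(0, 5)
# ... and for equal stays, a more recent visit decays less (Eq. 1).
recent = patch_energy([(80, 90)], t_cur=100)
old = patch_energy([(40, 50)], t_cur=100)
```

The two comparisons mirror the two claims in the text: energy is monotone in stay length, and newer heat sources dominate older ones.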

B. Thermal Diffusion

After getting the heat source series by (1), the thermal diffusion process is performed over the entire scene to create the HM. The HM value H_i at patch i after diffusion [10] can be calculated by

H_i = \frac{1}{N} \sum_{l=1}^{N} E_l \cdot e^{-k_p d(i,l)}    (3)

where E_l is the thermal energy of the heat source patch l, N is the total number of heat source patches, k_p is the spatial diffusion coefficient, and d(i, l) is the distance between patches i and l.

Fig. 5. Left column: two trajectory sets for group activity 'Gather.' Middle column: the corresponding heat source series. Right column: the corresponding HMs.

Fig. 6. HM surfaces for different group activities.

The advantage of the thermal diffusion process can be illustrated by Fig. 5, whose left column lists two different trajectory sets for the group activity 'Gather.' Due to the variation of human activity or tracking biases, these two trajectory sets are obviously different from each other, and these differences are directly transferred to their heat source series (the middle column). However, with the thermal diffusion process, the trajectory differences are suitably blurred, which makes their HMs (the right column) close to each other. At the same time, the temporal information of the two group activities is still effectively preserved in the HMs.
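The diffusion step of (3) amounts to smoothing the source energies over the patch grid with an exponential kernel. A minimal sketch, assuming Euclidean distance between patch centers and an illustrative grid size and coefficient:

```python
import math

def diffuse(heat_sources, grid_w, grid_h, k_p=2.0):
    """Eq. (3): HM value of each patch as the averaged, distance-decayed
    sum of the heat-source energies.

    `heat_sources` maps (col, row) -> thermal energy E_l; d(i, l) is taken
    as the Euclidean distance between patches, and N is the source count.
    """
    N = len(heat_sources)
    hm = [[0.0] * grid_w for _ in range(grid_h)]
    for row in range(grid_h):
        for col in range(grid_w):
            total = sum(E * math.exp(-k_p * math.hypot(col - c, row - r))
                        for (c, r), E in heat_sources.items())
            hm[row][col] = total / N
    return hm

# A single unit heat source at the grid center produces a smooth peak there.
hm = diffuse({(2, 2): 1.0}, grid_w=5, grid_h=5)
```

Small k_p spreads the energy widely (flattening the surface), while large k_p keeps it concentrated at the sources, matching the behavior discussed for Fig. 8 in Section IV.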

Also, Fig. 6 shows example HM surfaces for the different group activities defined in Table I. From Fig. 6, it is clear that our proposed HM can precisely capture the activities' temporal information and shows clearly distinguishable patterns among different activities.

Furthermore, it should be noted that our proposed HMB algorithm is not limited to trajectories. More generally, as long as we can detect patches with motion, we can use these motion patches as the heat sources to create HMs. Therefore, in practice, when reliable trajectories cannot be obtained, we can even skip the tracking process and use various low-level motion features (such as the optical flow [28]) to create the HMs for recognition. This point will be further demonstrated in Section IV.

TABLE I
Definitions of Different Human Group Activities

Fig. 7. Selection of the second key point for single-peak HM cases (the pink point is the selected second key point, the blue point is the peak point, and the red line is the contour line whose corresponding point values are all equal to half of the peak value; best viewed in color). (a) HM in 2-D view. (b) HM of (a) in 3-D view.

C. KPB HM Alignment Method

After generating the HM features, the alignment process is performed to remove the scale and rotation variations among HMs. In this paper, we borrow the idea of the active appearance model used in face fitting [7], [13] and propose a KPB HM alignment method. The process of using our KPB method to align an input HM with a target HM is described in Algorithm 1.

Furthermore, several points need to be mentioned about our KPB method.

1) When HMs with different peak numbers are aligned, only the peaks available in both HMs are used for alignment (e.g., when an HM with n1 peaks is aligned with an HM with n2 peaks and n1 < n2, we only use n1 peaks as the key points for alignment).

2) For an HM with only one peak, we add an additional key point for alignment. That is, we first pick the points whose heat values are half that of the peak point, and then the one farthest from the peak is selected as the second key point for alignment, as shown in Fig. 7. Since the direction from the peak to the additional key point represents the slowest-descending slope of the HM surface, the HMs can then be suitably aligned by matching this slope.

3) It should be noted that in steps 2, 3, and 4 of Algorithm 1, the key points are shifted, scaled, and rotated coherently (i.e., by the same parameters) in order to keep the overall shape of the HM during alignment.

Algorithm 1 KPB Method
1. For the input HM, extract the n largest peak points and use the locations of these peak points as the key points in the later alignment steps: (P_1, P_2, P_3, ..., P_n), where P_i = [x_i, y_i] is the location of the ith key point, with x_i and y_i being its x and y coordinates in the HM.
2. Organize the key points P_i of the input HM in descending order according to their heat values in the HM [i.e., H(P_i) > H(P_j) for i < j].
3. Shift the key points (P_1, P_2, P_3, ..., P_n) such that the gravity center of these points is at the center of the HM.
4. Scale the key points (P_1, P_2, P_3, ..., P_n) such that (\sum_{i=1}^{n} \sqrt{x_i^2 + y_i^2})/n = 1.
5. Align the key points of the input HM with those of the target HM: arg min_T (\sum_{i=1}^{n} |G_i - P_i \cdot T|^2), where T is a 2 × 2 matrix for aligning the key points P_i, and G_i are the key points of the target HM. T can be obtained by linear regression.
6. Apply the final shift, rotation, and scaling operations derived from steps 2 to 4 on the entire input HM to achieve the final aligned version.

4) In Algorithm 1, the key points G_i of the target HM are assumed to be already shifted and scaled properly. In our HMB algorithm, we perform clustering on the HMs in the training data and perform alignment within each cluster. After that, the mean of the aligned HMs in each cluster is used as the standard surface (i.e., the target HM) representing the cluster during recognition. The process of clustering the HMs and calculating the mean HM for each cluster is performed in an iterative way, as described by Algorithm 2. This point will be further discussed in the next section.
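The core of Algorithm 1 is a centering/normalization step followed by a small least-squares fit for the 2 × 2 matrix T. The sketch below shows one such alignment step with NumPy; solving the regression via `lstsq` is our reading of "T can be achieved by linear regression," and the key-point coordinates are illustrative:

```python
import numpy as np

def normalize_keypoints(P):
    """Steps 3-4: shift key points so their gravity center is at the
    origin, then scale so the mean distance to the origin is 1."""
    P = P - P.mean(axis=0)
    return P / np.linalg.norm(P, axis=1).mean()

def align(P, G):
    """Step 5: the 2x2 matrix T minimizing sum_i |G_i - P_i @ T|^2,
    obtained by linear least squares."""
    T, *_ = np.linalg.lstsq(P, G, rcond=None)
    return T

# A rotated copy of the target key points aligns back onto them.
G = normalize_keypoints(np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 2.0]]))
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
P = G @ R                      # input key points: target rotated by 30 deg
T = align(P, G)                # recovers the inverse rotation
```

Because the same T is applied to the whole HM afterward (step 6), the surface shape is preserved while its scale and rotation are brought in line with the target.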

D. SF Method for Activity Recognition

With the HM feature and the KPB alignment method, we can then perform recognition based on our SF method. The SF process can be described by

m^* = \arg\min_{m} \left( \min_{T_m} \| T_m \cdot S_{HM} - S_{SD,m} \| \right)    (4)

where m^* is the final recognized activity, S_{HM} is the HM surface of the input activity, S_{SD,m} is the standard surface for activity m, T_m is the alignment operator derived by Algorithm 1 for aligning with S_{SD,m}, and || · || is the absolute difference between two HM surfaces. From (4), we can see that the SF method includes two steps. In the first step, the input HM is aligned to fit each standard surface. In the second step, the standard surface that best fits the input HM surface is selected, and its corresponding activity is taken as the recognition result.
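With alignment already applied, (4) reduces to a nearest-template search over the standard surfaces. A schematic version, assuming HM surfaces are given as equally sized arrays and that the T_m alignment step has been performed beforehand (the activity labels and values are made up for illustration):

```python
import numpy as np

def sf_recognize(s_hm, standard_surfaces):
    """Eq. (4): pick the activity whose (pre-aligned) standard surface has
    the smallest absolute difference from the input HM surface."""
    return min(standard_surfaces,
               key=lambda m: np.abs(s_hm - standard_surfaces[m]).sum())

standards = {"gather": np.array([[0.0, 1.0], [1.0, 2.0]]),
             "separate": np.array([[2.0, 1.0], [1.0, 0.0]])}
label = sf_recognize(np.array([[0.1, 1.0], [0.9, 2.1]]), standards)
```

The inner min over T_m in (4) would sit inside the distance computation; here it is omitted because the surfaces are assumed pre-aligned.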

Algorithm 2 Clustering the HMs and Calculating the Mean HM Key Points for Each Cluster in the Training Set
1. Cluster the HMs in the training set according to their activity labels.
2. for each HM v in the training set do
3. Shift the key points (P_{1,v}, P_{2,v}, P_{3,v}, ..., P_{n,v}) of HM v such that the gravity center of these points is at the center of the HM.
4. Scale the key points (P_{1,v}, P_{2,v}, P_{3,v}, ..., P_{n,v}) of HM v such that (\sum_{i=1}^{n} \sqrt{x_{i,v}^2 + y_{i,v}^2})/n = 1.
5. end for
6. for each cluster u do
7. Randomly select an HM in cluster u as the initial mean HM and define the key points of this mean HM as (G^u_1, G^u_2, G^u_3, ..., G^u_n).
8. for each HM v in cluster u do
9. Scale the key points (P_{1,v}, P_{2,v}, P_{3,v}, ..., P_{n,v}) of HM v such that (\sum_{i=1}^{n} \sqrt{x_{i,v}^2 + y_{i,v}^2})/n = 1.
10. Align the key points of HM v with those of the current mean HM: arg min_{T_v} (\sum_{i=1}^{n} |G^u_i - P_{i,v} \cdot T_v|^2), where T_v is the alignment matrix for v.
11. Move the key points of HM v to the aligned places, i.e., P^{new}_{i,v} = P_{i,v} \cdot T_v.
12. end for
13. Update the key points of the mean HM of cluster u by G^{u,new}_i = (\sum_{v=1}^{NUM} P^{new}_{i,v})/NUM, where NUM is the number of HMs in cluster u.
14. If not converged and the iteration count ≤ 1000, return to step 8.
15. Align all the HMs in cluster u to the calculated key points of the mean HM. The final mean HM can then be obtained by averaging, or by selecting the best-fitting one among these aligned HMs.
16. end for

As shown in Algorithm 2, the standard surface can be obtained by clustering the training HMs and taking the mean HM of each cluster. However, since the HMs may still vary within the same activity, it may still be less effective to use one fixed HM as the standard surface for recognition. Therefore, in this paper, we further propose an adaptive surface-fitting (ASF) method that selects the standard surface in an adaptive way. The proposed ASF method can be described by

m^* = \arg\max_{m} \left( \sum_{S_{tr,m} \in N_w(S_{HM})} GA\left( \min_{T_{tr}} \| T_{tr} \cdot S_{HM} - S_{tr,m} \| \right) \right)    (5)

where S_{HM} is the HM surface of the input activity, S_{tr,m} is the HM surface for activity m in the training data, T_{tr} is the alignment operator for aligning with S_{tr,m}, N_w(S_{HM}) is the set containing the w most similar HM surfaces to S_{HM}, and GA(·) is the Gaussian kernel function defined by

GA(x) = \exp\left( -\frac{|x|^2}{2\sigma^2} \right)    (6)

where σ controls the steepness of the kernel.

From (5), we can see that the proposed ASF method adaptively selects the most similar HM surfaces as the standard surfaces for recognition. In this way, the in-class HM surface variation effect can be effectively reduced. Furthermore, by introducing the Gaussian kernel, different training surfaces S_{tr,m} can be allocated different importance weights according to their similarity to the input HM S_{HM} during the recognition process.

TABLE II
Sample Numbers for Different Group Activities for the Experiments in Tables III and IV and Figs. 8 and 9

Furthermore, several things need to be mentioned about the ASF method.

1) When w > 1 in N_w(S_{HM}), the ASF method can be viewed as an extended version of the k-nearest-neighbor methods [14], where the kernel-weighted distance between points is calculated as the absolute difference between the aligned HM surfaces.

2) When w = 1 in N_w(S_{HM}), the ASF method simplifies to finding the S_{tr,m} in the training set that best represents the input S_{HM}.
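Taken together, (5) and (6) behave like a kernel-weighted k-nearest-neighbor vote over training surfaces. A sketch of that reading, with alignment omitted (surfaces assumed pre-aligned) and illustrative values of w, σ, and the training data:

```python
import numpy as np

def asf_recognize(s_hm, training, w=3, sigma=1.0):
    """Eqs. (5)-(6): Gaussian-kernel-weighted vote over the w training
    surfaces most similar to the input HM surface.

    `training` is a list of (label, surface) pairs; the alignment T_tr is
    assumed to have been applied already.
    """
    dists = [(label, np.abs(s_hm - s).sum()) for label, s in training]
    nearest = sorted(dists, key=lambda t: t[1])[:w]     # N_w(S_HM)
    scores = {}
    for label, d in nearest:                            # GA(x) = exp(-x^2/2s^2)
        scores[label] = scores.get(label, 0.0) + np.exp(-d * d / (2 * sigma**2))
    return max(scores, key=scores.get)                  # arg max over m

train = [("gather", np.array([0.0, 1.0, 2.0])),
         ("gather", np.array([0.1, 1.1, 2.0])),
         ("separate", np.array([2.0, 1.0, 0.0]))]
label = asf_recognize(np.array([0.05, 1.0, 2.05]), train, w=3)
```

Setting w = 1 recovers the nearest-template behavior of note 2), while larger w gives the kernel-weighted k-NN interpretation of note 1).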

IV. Experimental Results

In this section, we show experimental results for our proposed HMB algorithm. The ASF method is used for recognition in our experiments. The patch size is set to 10 × 10 based on our experimental statistics, in order to achieve a satisfactory resolution of the HM surface while maintaining computational efficiency. Furthermore, for each input video clip, the HM is created over the entire clip.

A. Experimental Results on BEHAVE Dataset

In this section, we perform five different sets of experiments on the BEHAVE dataset to evaluate our proposed algorithm.

First of all, we vary the temporal decay parameter k_t and the thermal diffusion parameter k_p in (1) and (3) to see their effects on recognition performance. We select 200 video clips from the BEHAVE dataset [1] and recognize the six group activities defined in Table I. The sample number for each group activity is shown in Table II. Each video clip includes two to five trajectories. In order to examine the algorithm's robustness against tracking fluctuation and tracking bias, we perform five rounds of experiments where, in each round, different fluctuation and bias effects are added to the ground-truth trajectories. The final results are averaged over the five rounds. The recognition results under 75% training and 25% testing are shown in Tables III and IV.

Tables III and IV show the total error rate (TER) for different kt and kp values. The TER rate is calculated by Nt−miss/Nt−f, where Nt−miss is the total number of misdetected activities for both normal and abnormal activities and Nt−f is the total number of activity sequences in the test set [6], [15]. TER reflects the overall performance of the algorithm in recognizing all activity types [6], [15]. In Tables III and IV, our HMB algorithm is performed with w in (5) set to 1 (i.e., selecting only the most similar HM surface in the training set during recognition). Furthermore, the example HM

TABLE III
TER Rates of HMB Algorithm Under Different Spatial Diffusion Coefficient kp Values (When kt = 0.125)

TABLE IV
TER Rates of HMB Algorithm Under Different Temporal Diffusion Coefficient kt Values (When kp = 2)

Fig. 8. Example HM surfaces for activity "Together" with different kp values. (a) kp = 0.0001. (b) kp = 0.25. (c) kp = 1000.

surfaces under different kt and kp values are shown in Figs. 8 and 9, respectively.

From Table III and Fig. 8, we can see the following. 1) When kp is set to a very small number [such as in Fig. 8(a)], the thermal diffusion effect is so strong that the HM becomes close to a flat surface. In this case, the HM cannot take full effect and the recognition performance decreases. 2) On the contrary, if kp is set to be extremely large [such as in Fig. 8(c)], little thermal diffusion is performed and the HM surfaces are concentrated only on the heat-source patches. In this case, the recognition performance is also reduced. 3) The results for kp = 2 and kp = ∞ in Table III also show the usefulness of our proposed thermal diffusion process. Since no diffusion process is applied to the HM when kp = ∞, the HM is more vulnerable to tracking fluctuations or tracking biases, resulting in lower recognition accuracy. Comparatively, with the introduction of our thermal diffusion process, the tracking fluctuation effects can be greatly reduced and the performance is clearly improved. 4) A careful look at Fig. 8(c) reveals a needle-like peak in the middle of the HM. It is created because both trajectories traverse the same patch, which greatly amplifies the heat-source value at that patch location. If we directly used this HM for recognition, this noisy peak would affect the final performance. By using our heat diffusion process, however, this noisy peak can be blurred or removed [as in Fig. 8(b)] and the coherence among HMs can be effectively preserved. 5) Except for extremely small or large values, kp achieves good results within a wide range.

From Table IV and Fig. 9, we can see the effects of the temporal decay parameter kt. 1) For an extremely small kt value [such as in Fig. 9(a)], most heat sources will have the same values. In this case, the temporal information of the trajectory is lost in the HM and the performance will be

1986 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 11, NOVEMBER 2013

Fig. 9. Example HM surfaces for activity "Together" with different kt values. (a) kt = 0. (b) kt = 0.03. (c) kt = 1000.

TABLE V
Video-Clip Number for Different Group Activities for the Experiments in Tables VI and VII

Fig. 10. Examples of human group activities in the BEHAVE dataset [1].

reduced. 2) For an extremely large kt value [such as in Fig. 9(c)], the old heat sources decay quickly such that the HM concentrates only on the newest heat source. In this case, the trajectory's temporal information is also lost, leading to low performance. 3) Except for extremely small or large values, kt can also achieve good results within a wide range.

Based on the above discussion, kt and kp in (1) and (3) are set to 0.125 and 2, respectively, throughout our experiments.

Secondly, we compare our HMB algorithm with other algorithms. In order to include more activity samples, we further increase the sample number and select 325 video clips for the six activities (as in Table I) from the BEHAVE dataset [1]. The sample number distributions for the different activities are shown in Table V. Each video clip includes two to five trajectories. Fig. 10 shows some examples of the six activities. The following six algorithms are compared.

1) The weighted feature-support vector machine (WF-SVM) algorithm, which utilizes causalities between trajectories for group recognition [2] (WF-SVM).

TABLE VI
Miss, FA, and TER Rates for Different Algorithms on the BEHAVE Dataset

2) The localized causality feature-support vector machine (LC-SVM) algorithm, which includes the individual, pair, and group correlations for recognition [3] (LC-SVM).

3) The group-representative-based activity detection (GRAD) algorithm, which uses Markov chain models to model the temporal information for recognition [6] (GRAD).

4) Using our proposed HM as the input features and our KPB method for HM alignment. After that, principal component analysis (PCA) is used to reduce the HM feature vector length and SVM is used for activity recognition [16], [17] (HM-PCASVM).

5) Using the entire version of our proposed HMB algorithm with w in (5) set to 1 (HMB (w = 1)).

6) Using the entire version of our proposed HMB algorithm with w in (5) set to 3 (HMB (w = 3)).

Similarly, we split the dataset into 75% training and 25% testing parts and perform recognition on the testing part [6]. Six independent experiments are performed and the results are averaged. Furthermore, we use the ground-truth trajectories in this experiment. Note, however, that in practice, various object detection and tracking algorithms [9], [26], [30], [31] can be utilized to obtain trajectories. Even in cases when reliable trajectories cannot be obtained, other low-level features [28] can be used in our algorithm in place of the trajectories. This point will be further discussed in Section IV-C. Table VI shows the miss, false alarm (FA), and TER rates [6] for the different algorithms. The miss detection rate is defined as Nθ,fn/Nθ+, where Nθ,fn is the number of false negative (misdetected) sequences for activity θ and Nθ+ is the total number of positive sequences of activity θ in the test data [6], [15]. The FA rate is defined as Nθ,fp/Nθ−, where Nθ,fp is the number of false positive (FA) video clips for activity θ and Nθ− is the total number of negative video clips other than activity θ in the test data [6].
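The miss, FA, and TER definitions above can be computed directly from predicted and ground-truth activity labels; a minimal sketch (the function name and data layout are our own):

```python
def evaluate(y_true, y_pred, activities):
    """Per-activity miss/FA rates and overall TER, following the
    definitions used in the experiments.

    y_true, y_pred: ground-truth and predicted labels, one per clip
    activities:     list of activity names to evaluate
    """
    per_activity = {}
    total_miss = sum(t != p for t, p in zip(y_true, y_pred))
    for a in activities:
        pos = [p for t, p in zip(y_true, y_pred) if t == a]  # positives of a
        neg = [p for t, p in zip(y_true, y_pred) if t != a]  # negatives of a
        miss = sum(p != a for p in pos) / len(pos) if pos else 0.0
        fa = sum(p == a for p in neg) / len(neg) if neg else 0.0
        per_activity[a] = {"miss": miss, "fa": fa}
    ter = total_miss / len(y_true)  # total misdetections over all clips
    return per_activity, ter
```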

From Table VI, we can have the following observations.

1) Due to the complexity and uncertainty of human activities, the WF-SVM, LC-SVM, and GRAD algorithms still produce unsatisfactory results for some group activities such as "Gather." Compared to these algorithms,

LIN et al.: HEAT-MAP-BASED ALGORITHM FOR RECOGNIZING GROUP ACTIVITIES IN VIDEOS 1987

TABLE VII
Comparison of TER Rates With Different Trajectory Qualities (m Is the Noise Strength Parameter Which Measures the Average Pixel-Level Deviation From the Ground-Truth Trajectories)

algorithms based on our HM features (HMB (w = 1), HMB (w = 3), and HM-PCASVM) have better performances. This demonstrates that our HM features are able to precisely capture the characteristics of activities.

2) Comparing the HMB algorithms (HMB (w = 1), HMB (w = 3)) and the HM-PCASVM algorithm, we can see that the HMB algorithms achieve better results than the HM-PCASVM algorithm. This demonstrates the effectiveness of our SF recognition method. Note that the improvement of our HMB algorithm becomes more obvious on another dataset, as will be shown later.

3) The performance of the HMB (w = 1) algorithm is close to that of the HMB (w = 3) algorithm, and similar observations can be made for other datasets and other w values (when w < 5). Therefore, in practice, we can simply set w = 1 when implementing the ASF method.

Thirdly, in order to evaluate the influence of trajectory quality on the algorithm performance, we perform another experiment by adding Gaussian noise with different strengths to the ground-truth trajectories and performing recognition on these noisy trajectories. The results are shown in Table VII.

Table VII compares the TER of our HMB algorithm and the WF-SVM algorithm [2]. We select WF-SVM for comparison because it has the best performance among the compared methods in Table VI. The noise strength parameter m in Table VII is the average pixel-level deviation from the ground-truth trajectory. For example, m = 5 means that, on average, the noisy trajectory deviates from the ground-truth trajectory by five pixels. Note that m only reflects the average deviation; the actual noisy trajectories may show stronger fluctuation effects, for example, fluctuating with different deviation strengths around the ground-truth trajectory or deviating from it with large magnitudes.
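The text specifies only that m is the average pixel-level deviation, not how the noisy trajectories were generated. One plausible way is to scale 2-D Gaussian offsets so that their mean magnitude equals m; the Rayleigh-mean scaling below is our assumption, not the authors' procedure:

```python
import numpy as np

def add_trajectory_noise(trajectory, m, rng=None):
    """Add 2-D Gaussian noise to a trajectory so that the *average*
    pixel deviation from the ground truth is approximately m."""
    if rng is None:
        rng = np.random.default_rng(0)
    traj = np.asarray(trajectory, dtype=float)  # shape (T, 2)
    offsets = rng.normal(size=traj.shape)
    # The magnitude of a 2-D standard-normal offset is Rayleigh
    # distributed with mean sqrt(pi/2), so scale to hit an average
    # deviation of m pixels.
    scale = m / np.sqrt(np.pi / 2.0)
    return traj + scale * offsets
```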

From Table VII, we can see the following.

1) Our HMB algorithm still achieves fairly stable performance when the quality of the trajectories decreases (i.e., when the noise strength m increases). Comparatively, the performance decrease of the WF-SVM algorithm is more obvious. For example, when m = 5, the TER rate of WF-SVM increases by more than 3% while that of our HMB increases by less than 1%. This further demonstrates that the thermal diffusion process in our algorithm can effectively reduce the effect of trajectory fluctuations.

2) When the noise strength is extremely large (e.g., m = 25 in Table VII), the performance of our HMB algorithm also decreases. This is because extremely noisy and deviated trajectories become

Fig. 11. (a) Trajectories for the two complex activities. (b) Major feature values for the WF-SVM algorithm [2]. (c) HMs for the two complex activities.

far different from the standard ones and appear more like a different activity. This obviously affects the recognition performance. However, from Table VII, we can also see that, even in large-noise situations, our HMB algorithm still achieves better performance than the WF-SVM method.

3) More importantly, note that our HMB algorithm is not limited to trajectories. Instead, various low-level motion features such as the optical flow [28] can also be included in our algorithm to create HMs for recognition. Therefore, in cases when reliable trajectories cannot be obtained (such as the m = 25 case in Table VII), our algorithm can be extended by skipping the tracking step and directly utilizing other low-level motion features for group activity recognition. This point will be further discussed in Section IV-C.

Fourthly, in order to further demonstrate our HM features, we perform another experiment for recognizing two complex activities: "Exchange" (i.e., two people first approach each other, stay together for a while, and then separate) and "Return" (i.e., two people first separate and then approach each other again later). In this experiment, we extract 32 pair-trajectories from the BEHAVE dataset for the two complex activities and perform 75% training and 25% testing. Some example frames are shown in Fig. 12. In Fig. 11, (a) shows the trajectories of the two complex activities, (b) shows the values of the major features in the WF-SVM algorithm [2], and (c) shows the HM surfaces. From Fig. 11(b), we can see that the features in the WF-SVM algorithm do not show much difference between the two complex activities. Compared to (b), our HMs in (c) are obviously more distinguishable. The recognition results for the WF-SVM algorithm and our HMB algorithm are shown in Table VIII. The results in Table VIII further demonstrate the effectiveness of our HM features in representing complex group activities.

Finally, we evaluate our algorithm in recognizing subactivities. Note that our algorithm can be easily extended to recognize subactivities by using shorter sliding windows to obtain short-term trajectories instead of the entire trajectories. In this way, we can also achieve on-the-fly activity recognition at each time instant [6], [29]. In order to


TABLE VIII
Miss and TER Rates for the Complex Activities

Fig. 12. Example frames of the Exchange and Return sequences and the qualitative results of online subactivity recognition using a 30-frame-long sliding window. The bars represent the labels of each frame: red represents Approach, green represents Stay, and blue represents Separate.

demonstrate this point, Fig. 12 shows the results of applying a 30-frame-long sliding window to automatically recognize the subactivities inside the complex "Exchange" and "Return" video sequences. From Fig. 12, we can see that our HMB algorithm achieves satisfying recognition results for the subactivities inside the long-term sequences. Besides, our algorithm is also able to recognize both the long-term activities and the short-term activities by simultaneously introducing multiple sliding windows with different lengths. In this way, both the subactivities of the current clip and the complex activities of the long-term clip can be automatically recognized.
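The sliding-window extension can be sketched as follows; here `classify` stands in for the full HM-build, KPB-align, and ASF pipeline, which is not reproduced in this section:

```python
def sliding_window_labels(trajectories, classify, win=30, step=1):
    """On-the-fly subactivity recognition: classify each `win`-frame
    window of the trajectories, yielding a label sequence as in Fig. 12.

    trajectories: list of per-object trajectories of equal length
    classify:     callable mapping a list of trajectory windows to a label
                  (placeholder for the HM + KPB + ASF pipeline)
    """
    labels = []
    total_frames = len(trajectories[0])
    for start in range(0, total_frames - win + 1, step):
        window = [traj[start:start + win] for traj in trajectories]
        labels.append(classify(window))
    return labels
```

Running the same loop with several window lengths at once would give both the short-term subactivity labels and the long-term complex-activity label described above.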

B. Experimental Results for Traffic Dataset

In this section, we perform two experiments on the traffic datasets.

Firstly, we perform an experiment on a traffic dataset for recognizing group activities among vehicles at a crossroad. The dataset is constructed from 20 long surveillance videos taken by different cameras. Seven vehicle group activities are defined as in Table IX and some example activities are shown in Fig. 13. We select 245 video clips from the dataset, where each activity includes 35 video clips and each clip includes two trajectories. In this dataset, the trajectories are obtained by first using our proposed object detection method [26] to detect the vehicles and then using the particle-filtering-based tracking method [9], [31] to track the detected vehicles. The miss, FA, and TER rates of the different algorithms are shown in Table X.

From Table X, we can see that the LC-SVM algorithm produces less satisfactory results. This is because the group

TABLE IX

Definitions of the Vehicle Group Activities

Fig. 13. Examples of the defined vehicle group activities.

activities in this dataset are more complicated and not easily distinguishable by the causality and feedback features [3]. Also, the performances of the WF-SVM and GRAD algorithms are still unsatisfactory for several activities such as "Follow," "Overtake," and "Pass." Compared to these, the performances of our HM algorithms [HMB (w = 1) and HM-PCASVM] are obviously improved. Besides, the performance of HMB (w = 1) is also improved over the HM-PCASVM algorithm. These results further demonstrate the effectiveness of our proposed HM feature as well as our SF recognition method.

Furthermore, it should be noted that the traffic dataset has two important challenging characteristics. 1) The videos in the dataset are taken from different cameras (as in Fig. 13). This makes the trajectories vary greatly for the same activity. 2) Within each video, there are also large scale variations (i.e., the object size is much larger in the front region than in the far region, as shown in Fig. 13). Because of this, the same activities from different regions may also have large variations and are difficult to differentiate. These challenging characteristics partially lead to the low performance of the compared algorithms (WF-SVM, LC-SVM, and


TABLE X
Miss, FA, and TER for Different Algorithms on the Vehicle Group Activity Dataset

Fig. 14. Examples of vehicle group activities in the new dataset.

GRAD). Comparatively, these variations in scale and camera view affect our HM algorithms [HMB (w = 1) and HM-PCASVM] much less, since the proposed KPB alignment method eliminates the scale differences and the proposed HM effectively captures the common characteristics of the activities.

Secondly, we also perform another experiment with different camera settings. In this experiment, we use the traffic dataset in Fig. 13 to train the HMs and then directly use these HMs to recognize the activities in a new dataset, as shown in Fig. 14. The new dataset in Fig. 14 includes 65 video clips taken from a camera whose height, angle, and zoom are largely different from those in Fig. 13. The results are shown in Table XI.

From Table XI, we can see that when using the trained models to recognize the activities in a dataset with different

TABLE XI
Miss, FA, and TER for Different Algorithms by Using the HMs Trained From the Traffic Dataset in Fig. 13 to Recognize the New Dataset in Fig. 14

camera settings, the performances of the compared algorithms (WF-SVM, LC-SVM, and GRAD) decrease obviously. Comparatively, our HM algorithms [HMB (w = 1) and HM-PCASVM] can still produce reliable results. This demonstrates that: 1) our proposed KPB HM alignment method can effectively handle the HM differences due to different camera settings and 2) our HMB algorithm has the flexibility of directly applying the HMs trained under one camera setting to other camera settings.

C. Experimental Results for UMN Dataset

Finally, in order to demonstrate that our algorithm can also be extended to other low-level motion features [28], we perform another experiment using optical flow for recognition.

The experiment is performed on the UMN dataset [22], which contains videos of 11 different scenarios of an abnormal escape event in three different scenes, including both indoor and outdoor. Each video starts with normal behaviors and ends with the abnormal behavior (i.e., escape). In this experiment, we first compute the optical flow between the current and the previous frames. Then, patches with high optical-flow magnitudes are viewed as the heat sources for creating the HMs, and these HMs are utilized for activity recognition in our HMB algorithm. A sliding window of 30 frames is used as the basic video clip and one HM is generated from each clip. In this way, we obtain 257 video clips. We randomly select five normal-behavior HMs and five abnormal-behavior HMs as the training set to classify the remaining 247 video clips. Furthermore, we set w in (5) to 1.
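The optical-flow variant replaces trajectory points with high-magnitude flow patches as heat sources. A minimal sketch, assuming a precomputed dense flow field (e.g., from a Lucas–Kanade or Farneback estimator) and a hypothetical magnitude threshold:

```python
import numpy as np

def flow_heat_sources(flow, patch=10, thresh=1.0):
    """Mark patches with high mean optical-flow magnitude as heat
    sources (the trajectory-free variant used on the UMN dataset).

    flow:   dense flow field of shape (H, W, 2)
    patch:  patch size in pixels
    thresh: magnitude threshold (illustrative value, not from the paper)
    """
    mag = np.linalg.norm(flow, axis=2)          # per-pixel flow magnitude
    H, W = mag.shape
    gh, gw = H // patch, W // patch
    # Average magnitude inside each patch of the grid.
    patch_mag = mag[:gh * patch, :gw * patch] \
        .reshape(gh, patch, gw, patch).mean(axis=(1, 3))
    return patch_mag > thresh                   # boolean heat-source map
```

The resulting boolean map would then seed the same decay-and-diffusion HM construction that trajectory points seed in the tracking-based variant.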

Fig. 15 shows some example frames of the UMN dataset and compares the normal/abnormal classification results of our algorithm with the ground truth. Furthermore, Fig. 16 compares the ROC curves of our algorithm (HMB + Optical Flow) and three other algorithms: the optical-flow-only method (Optical Flow) [20], [28], the Social Force Model [20], and the Velocity-Field-Based method [21].
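For reference, ROC curves such as those in Fig. 16 are produced by sweeping a threshold over per-clip abnormality scores; a generic sketch, not tied to any specific scoring function in the paper:

```python
import numpy as np

def roc_points(scores, labels):
    """Compute (FPR, TPR) pairs by sweeping a threshold over
    abnormality scores; labels are 1 = abnormal, 0 = normal."""
    order = np.argsort(scores)[::-1]        # highest score first
    labels = np.asarray(labels)[order]
    pos = labels.sum()
    neg = len(labels) - pos
    tpr = np.cumsum(labels) / pos           # detections among abnormal clips
    fpr = np.cumsum(1 - labels) / neg       # false alarms among normal clips
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))
```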

From Figs. 15 and 16, we can have the following observations.


Fig. 15. Qualitative results of using our HMB algorithm for abnormal detection in the UMN dataset. The bars represent the labels of each frame: green represents normal and red represents abnormal.

1) From Fig. 15, we can see that the UMN dataset includes high-density crowds where reliable tracking is difficult. However, our HMB algorithm can still achieve satisfying normal/abnormal classification results by using the optical-flow features. This demonstrates that when reliable trajectories cannot be obtained, our algorithm can be extended by skipping the tracking step and directly utilizing low-level motion features to perform group activity recognition.

2) From Fig. 15, we can see that our algorithm can perform online normal/abnormal activity recognition at each time instant by using a 30-frame-long sliding window. This further demonstrates that our algorithm is extendable to on-the-fly and subactivity recognition.

3) Our HMB algorithm achieves similar or better results than the existing social-force-based methods [20], [21] on the UMN dataset. This demonstrates the effectiveness of our HMB algorithm. Although other social-force-based methods [23] may achieve further improved results on the UMN dataset, the performance of our HMB algorithm can also be further improved by: (a) using more reliable motion features (such as the trajectories of local spatio-temporal interest points [23]) in place of the optical flow, and (b) including more training samples (note that in this experiment, only five normal clips and five abnormal clips are used for training in our algorithm).

4) More importantly, compared with the social-force-based methods [20], [21], [23], our HMB algorithm also has the following advantages.

Fig. 16. ROC curves of different methods for abnormal detection in the UMN dataset.

Most social-force-based methods [20], [21], [23] focus more on the relative movements among the objects (e.g., whether two objects are approaching or splitting), while the objects' absolute movements in the scene are neglected (e.g., whether an object is standing still or moving in the scene). Thus, these methods have limitations in differentiating activities with similar relative movements but different absolute movements (such as "Wait" and "Gather" in Fig. 10 or "Confront" and "Bothturn" in Fig. 13). Comparatively, the HM features in our HMB algorithm can effectively embed both the relative and the absolute movements of the objects.

Since the interaction forces used in the social-force-based methods [20], [21], [23] cannot effectively reflect the correlation changes over time (e.g., two objects first approach and then split), these methods also have limitations in differentiating activities with complex correlation changes such as "Return" and "Exchange." Comparatively, since our HM features include rich information about the temporal correlation variations among objects, these complex activities can be effectively handled by our algorithm.

Since our HM features can effectively distinguish the motion patterns of normal and abnormal behaviors, our algorithm can achieve good classification results with only a small number of training samples (in this experiment, only five 30-frame-long normal clips and five 30-frame-long abnormal clips are needed for training). Comparatively, more training samples are required to construct reliable models for the social-force-based methods [20], [21], [23].

V. Conclusion

In this paper, we proposed a new HMB algorithm for group activity recognition. We proposed to create the HM for representing group activities. Furthermore, we also proposed to use a KPB method for aligning different HMs and a


SF method for recognizing activities. Section IV demonstrated the effectiveness of our algorithm.

References

[1] R. Fisher, G. Milne, S. Blunsden, and E. Andrade. (2007). BEHAVE Dataset [Online]. Available: http://homepages.inf.ed.ac.uk/rbf/BEHAVE

[2] Y. Zhou, S. Yan, and T. Huang, "Pair-activity classification by bi-trajectory analysis," in Proc. IEEE Conf. CVPR, 2008, pp. 1–8.

[3] B. Ni, S. Yan, and A. Kassim, "Recognizing human group activity with localized causalities," in Proc. IEEE CVPR, 2009, pp. 1470–1477.

[4] Z. Cheng, L. Qin, Q. Huang, S. Jiang, and Q. Tian, "Group activity recognition by Gaussian process estimation," in Proc. Int. Conf. Pattern Recognition (ICPR), 2010, pp. 3228–3231.

[5] Y. Chen, W. Lin, H. Li, H. Luo, Y. Tao, and D. Liu, "A new package-group-transmission-based algorithm for human activity recognition in videos," in Proc. IEEE VCIP, 2011, pp. 1–4.

[6] W. Lin, M.-T. Sun, R. Poovendran, and Z. Zhang, "Group event detection with a varying number of group members for video surveillance," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 8, pp. 1057–1067, Aug. 2010.

[7] C. Goodall, "Procrustes methods in the statistical analysis of shape," J. R. Stat. Soc., vol. 53, no. 3, pp. 285–339, 1991.

[8] Y. Fang, M. Sun, M. Kim, and K. Ramani, "Heat-mapping: A robust approach toward perceptually consistent mesh segmentation," in Proc. IEEE Conf. CVPR, 2011, pp. 2145–2152.

[9] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 564–574, May 2003.

[10] H. Carslaw and J. Jaeger, Conduction of Heat in Solids. Oxford, U.K.: Oxford Univ. Press, 1986.

[11] A. Girgensohn, F. Shipman, and L. Wilcox, "Determining activity patterns in retail spaces through video analysis," in Proc. ACM Multimedia, 2008, pp. 889–892.

[12] W. A. Hoff and J. W. Howard, "Activity recognition in a dense sensor network," in Proc. Int. Conf. SNA, 2009, pp. 67–72.

[13] T. F. Cootes and C. J. Taylor, "Statistical models of appearance for computer vision," Image Sci. Biomed. Eng., Univ. Manchester, Manchester, U.K., Tech. Rep., 2004.

[14] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning. New York, NY, USA: Springer, 2003.

[15] W. Lin, M.-T. Sun, R. Poovendran, and Z. Zhang, "Activity recognition using a combination of category components and local models for video surveillance," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 8, pp. 1128–1139, Aug. 2008.

[16] C. Chang and C. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, pp. 1–27, 2011.

[17] C. Wang, J. Zhang, J. Pu, X. Yuan, and L. Wang, "Chrono-gait image: A novel temporal template for gait recognition," in Proc. ECCV, 2010, pp. 257–270.

[18] L. Bao and S. Intille, "Activity recognition from user-annotated acceleration data," in Proc. Int. Conf. Pervasive Computing (PERVASIVE), vol. 3001, 2004, pp. 1–17.

[19] C. Rao, M. Shah, and T. Syeda-Mahmood, "Action recognition based on view invariant spatio-temporal analysis," in Proc. ACM Multimedia, 2003, pp. 518–527.

[20] R. Mehran, A. Oyama, and M. Shah, "Abnormal crowd behavior detection using social force model," in Proc. IEEE Conf. CVPR, 2009, pp. 935–942.

[21] J. Zhao, Y. Xu, X. Yang, and Q. Yan, "Crowd instability analysis using velocity-field based social force model," in Proc. VCIP, 2011, pp. 1–4.

[22] N. Papanikolopoulos. (2009). Unusual Crowd Activity Dataset [Online]. Available: http://mha.cs.umn.edu/Movies/Crowd-Activity-All.avi

[23] X. Cui, Q. Liu, M. Gao, and D. N. Metaxas, "Abnormal detection using interaction energy potentials," in Proc. IEEE Conf. CVPR, 2011, pp. 3161–3167.

[24] B. Zhan, D. N. Monekosso, P. Remagnino, S. A. Velastin, and L. Xu, "Crowd analysis: A survey," Mach. Vision Appl., vol. 19, pp. 345–357, 2008.

[25] J. C. S. Jacques Junior, S. R. Musse, and C. R. Jung, "Crowd analysis using computer vision techniques," Signal Process. Mag., vol. 27, pp. 66–77, 2010.

[26] X. Su, W. Lin, X. Zhen, X. Han, H. Chu, and X. Zhang, "A new local-main-gradient-orientation HOG and contour differences based algorithm for object classification," in Proc. IEEE ISCAS, 2013, pp. 21–24.

[27] H. Chu, W. Lin, J. Wu, X. Zhou, Y. Chen, and H. Li, "A new heat-map-based algorithm for human group activity recognition," in Proc. ACM Multimedia, 2012, pp. 1069–1072.

[28] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. Imaging Understanding Workshop, 1981, pp. 121–130.

[29] S. Chiappino, P. Morerio, L. Marcenaro, E. Fuiano, G. Repetto, and C. S. Regazzoni, "A multi-sensor cognitive approach for active security monitoring of abnormal overcrowding situations," in Proc. Int. Conf. Information Fusion (FUSION), 2012, pp. 2215–2222.

[30] G. Wu, Y. Xu, X. Yang, Q. Yan, and K. Gu, "Robust object tracking with bidirectional corner matching and trajectory smoothness algorithm," in Proc. Int. Workshop Multimedia Signal Processing, 2012, pp. 294–298.

[31] R. Hess and A. Fern, "Discriminatively trained particle filters for complex multi-object tracking," in Proc. IEEE Conf. CVPR, 2009, pp. 240–247.

Weiyao Lin received the B.E. and M.E. degrees from Shanghai Jiao Tong University, Shanghai, China, in 2003 and 2005, respectively, and the Ph.D. degree from the University of Washington, Seattle, WA, USA, in 2010, all in electrical engineering.

He is currently an Associate Professor in the Department of Electronic Engineering, SJTU. His current research interests include video processing, machine learning, computer vision, and video coding and compression.

Hang Chu received the B.E. degree in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 2013. He is currently a master's student at Cornell University, Ithaca, NY, USA.

His research interests include statistical learning and its applications in computer vision.

Jianxin Wu received the B.S. and M.S. degrees in computer science from Nanjing University, Nanjing, China, and the Ph.D. degree in computer science from the Georgia Institute of Technology, Atlanta, GA, USA.

He is currently a Professor at Nanjing University and was an Assistant Professor at Nanyang Technological University, Singapore. His research interests include computer vision and machine learning.

Bin Sheng received the B.A. degree in English and the B.E. degree in computer science from Huazhong University of Science and Technology, Wuhan, China, in 2004, the M.S. degree in software engineering from the University of Macau, Taipa, Macau, in 2007, and the Ph.D. degree in computer science from the Chinese University of Hong Kong, Shatin, Hong Kong, in 2011.

He is an Assistant Professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University, Shanghai, China. His research interests include virtual reality, computer graphics, and image-based techniques.


Zhenzhong Chen received the B.Eng. degree from Huazhong University of Science and Technology, Wuhan, China, and the Ph.D. degree from the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong, both in electrical engineering.

He is currently a member of Technical Staff at MediaTek USA, Inc., San Jose, CA, USA. He was a Lee Kuan Yew Research Fellow and a Principal Investigator with the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), Singapore, and a European Research Consortium for Informatics and Mathematics Fellow (ERCIM Fellow) with the National Institute for Research in Computer Science and Control (INRIA), France. He held visiting positions with Polytech'Nantes, Nantes, France, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium, and Microsoft Research Asia, Beijing, China. His research interests include visual perception, visual signal processing, and multimedia communications.

Dr. Chen is a voting member of the IEEE Multimedia Communications Technical Committee (MMTC) and was an invited member of the IEEE MMTC Interest Group of Quality of Experience for Multimedia Communications from 2010 to 2012. He has served as a Guest Editor of special issues for the IEEE MMTC E-Letter and the Journal of Visual Communication and Image Representation. He served as the Co-Chair of the International Workshop on Emerging Multimedia Systems and Applications 2012 (EMSA 2012). He has co-organized several special sessions at international conferences, including IEEE ICIP 2010, ICME 2010, and Packet Video 2010, and has served as a technical program committee member of IEEE ICC, GLOBECOM, CCNC, ICME, and others. He received the CUHK Faculty Outstanding Ph.D. Thesis Award, the Microsoft Fellowship, and the ERCIM Alain Bensoussan Fellowship. He is a member of SPIE.

