Real-time Abnormal Motion Detection in Surveillance Video

Nahum Kiryati, Tammy Riklin Raviv†, Yan Ivanchenko, Shay Rochel
Tel Aviv University, † Massachusetts Institute of Technology

[email protected], [email protected], [email protected], [email protected]

Abstract

Video surveillance systems produce huge amounts of data for storage and display. Long-term human monitoring of the acquired video is impractical and ineffective. An automatic abnormal motion detection system that can effectively attract operator attention and trigger recording is therefore the key to successful video surveillance in dynamic scenes, such as airport terminals. This paper presents a novel solution for real-time abnormal motion detection. The proposed method is well-suited for modern video-surveillance architectures, where limited computing power is available near the camera for compression and communication. The algorithm uses the macroblock motion vectors that are generated in any case as part of the video compression process. Motion features are derived from the motion vectors. The statistical distribution of these features during normal activity is estimated by training. At the operational stage, improbable-motion feature values indicate abnormal motion. Experimental results demonstrate reliable real-time operation.

1. Introduction

A video surveillance system covering a large office building or a busy airport can employ hundreds or even thousands of cameras. To avoid communication bottlenecks, the acquired video is often compressed by a local processor within the camera, or at a nearby video server. The compressed video is then transmitted to a central facility for storage and display.

Abnormal motion detection is the key to effective and economical video surveillance. The detection of an abnormal motion can trigger video transmission and recording, and can be used to attract the attention of a human observer to a particular video channel. The problem is characterized by three related challenges. One is the reliability requirement, meaning that irregular events should be consistently detected, while the false-alarm rate should be sufficiently low. The second is effective characterization of normal motion, allowing discrimination between normal and abnormal activity. Third, abnormal motion detection should be accomplished using the limited computational power available at or near the camera.

This paper presents a novel real-time abnormal motion detection scheme. The algorithm uses the macroblock motion vectors that are generated anyway as part of standard video compression methods [3]. Motion features are derived from the motion vectors. Normal activity is characterized by the joint statistical distribution of the motion features, estimated during a training phase at the inspected site. During online operation, improbable-motion feature values indicate abnormal motion. Relying on motion vectors rather than on pixel data reduces the input data rate by about two orders of magnitude, and allows real-time operation on limited computational platforms.

Previous works that rely on segmentation, grouping or tracking have been reported in [7, 2, 13, 20, 16, 14, 6, 10]. Steps towards liberation from segmentation and tracking in activity analysis have been taken by [11, 18, 15, 4, 9]. Activity analysis relying on anticipated characteristics of human motion, such as periodicity, gait or gestures, can be found in [1, 12, 19, 11]. In [11], principal component analysis of the macroblock motion vectors was used to match the detected activity in a video stream to known human activities (walking, running, kicking), and for selective access of details from the uncompressed domain. Novelty or activity detection in video using pixel-level motion analysis has been reported by [8, 5].

Unlike most previous methods for video analysis, the suggested approach completely avoids segmentation and tracking. Taken together with the reliance on macroblock motion vectors and the lack of a priori presumptions regarding normal motion, these design decisions distinguish our work from most of the available literature.

2. Method

2.1 From video to motion vectors

Common video compression schemes exploit both the spatial and the temporal (frame-to-frame) redundancy present in the image sequence [3]. A frame is either an intra-frame that is compressed as a full still image, eliminating its spatial redundancy, or an inter-frame represented by macroblock displacement vectors relative to (say) the previous frame, and an error image. Intra-frames are generated at constant intervals, to allow random access to the content, and to reduce accumulated errors. An intra-frame is also provided when there is a significant change in the scene (e.g., an editing cut), so that representation of the current frame in terms of the previous one is inefficient.

A motion vector $V_{i,j} = \{V_{x_{i,j}}, V_{y_{i,j}}\}$ is associated with each $M_h \times M_w$ macroblock $(i, j)$ in an inter-frame. In the current implementation $M_h = M_w = 16$ pixels. Generally, $i \in \{1, \ldots, i_{\max}\}$, $j \in \{1, \ldots, j_{\max}\}$. The motion vector points to the location of the most similar $M_h \times M_w$ block in the previous frame. Inter-frame $l$ is then represented by the set of $n = i_{\max} \times j_{\max}$ motion vectors

$$V^l = \{V^l_{i,j},\; i = 1, \ldots, i_{\max},\; j = 1, \ldots, j_{\max}\}$$

associated with its macroblocks, or by their $2n$ components $\{V^l_{x_{i,j}}, V^l_{y_{i,j}},\; i = 1, \ldots, i_{\max},\; j = 1, \ldots, j_{\max}\}$.

The difference between the current block and the reference block in the previous frame is compressed as part of the error image, and used for reconstruction. When the match between the current macroblock and the reference block is poor, the current block is compressed by itself and is referred to as an intra-block.
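
To make the data layout concrete, the following C++ sketch shows one way the per-frame motion data exposed by the compression stage could be held. It is an illustrative structure, not the paper's implementation; the names MotionVector and InterFrame are hypothetical, indices are 0-based, and only the 16 x 16 macroblock size and the intra-block flag are taken from the description above.

#include <cstddef>
#include <vector>

// One macroblock displacement (in pixels) relative to the previous frame.
struct MotionVector {
    float vx = 0.0f;
    float vy = 0.0f;
};

// Motion data of a single inter-frame: an imax x jmax grid of motion vectors,
// stored row-major. Macroblocks coded as intra-blocks (poor match in the
// previous frame) carry no meaningful displacement and are flagged instead.
struct InterFrame {
    std::size_t imax = 0;            // number of macroblock rows
    std::size_t jmax = 0;            // number of macroblock columns
    std::vector<MotionVector> mv;    // size imax * jmax
    std::vector<bool> isIntraBlock;  // same layout as mv

    const MotionVector& at(std::size_t i, std::size_t j) const {
        return mv[i * jmax + j];
    }
};

// The 16 x 16 pixel macroblock size used in the paper's implementation.
constexpr std::size_t kMacroblockSize = 16;

In this layout, the motion vector of macroblock (i, j) is reached with frame.at(i, j), mirroring the $V^l_{i,j}$ notation.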

2.2 From motion vectors to motion features

A small set $F^l$ of $m \ll n$ features is derived from the set of motion vectors $V^l$. Its regular probability distribution is estimated during training. In the course of online operation, the feature vector $F^l$ of the incoming frame is compared to the statistical model. If its probability is low, the frame is declared abnormal.

The surveillance domain knowledge allows “manual” selection of the $m$ features in $F^l$. The advantage of human-designed features with respect to blindly generated ones is their clear conceptual meaning, providing insight and promoting testability and maintainability. Let

$$|V^l_{i,j}| = \sqrt{(V^l_{x_{i,j}})^2 + (V^l_{y_{i,j}})^2}, \qquad \Phi^l_{i,j} = \arctan\!\left(V^l_{y_{i,j}} / V^l_{x_{i,j}}\right)$$

respectively denote the magnitude and direction of the motion vector $V^l_{i,j}$. The current implementation uses the following $m = 5$ features.
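
As a minimal sketch, assuming per-vector components vx and vy as in the structure suggested for Section 2.1, the magnitude and direction can be computed with standard library calls. std::atan2 is used here instead of a plain arctan of the ratio so that the full [-pi, pi] range needed in Section 2.2.3 is obtained and the vx = 0 case is handled.

#include <cmath>

// Magnitude |V_{i,j}| of a motion vector with components (vx, vy).
inline float mvMagnitude(float vx, float vy) {
    return std::hypot(vx, vy);
}

// Direction Phi_{i,j} in radians. std::atan2 keeps the quadrant information
// and returns a value in [-pi, pi], the range assumed in Section 2.2.3.
inline float mvDirection(float vx, float vy) {
    return std::atan2(vy, vx);
}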

2.2.1 Total absolute motion

$$F^l_{TAM} = \sum_{i,j} |V^l_{i,j}| \qquad (1)$$

This feature corresponds to the total motion in the scene. No distinction is made between the motion of ‘objects’ and the motion of, say, tree branches on a windy day.
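
A possible implementation of Eq. (1), reusing the hypothetical MotionVector type from the Section 2.1 sketch:

#include <cmath>
#include <vector>

struct MotionVector { float vx, vy; };  // as in the Section 2.1 sketch

// Eq. (1): total absolute motion, the sum of motion-vector magnitudes
// over all macroblocks of the current inter-frame.
float totalAbsoluteMotion(const std::vector<MotionVector>& mv) {
    float sum = 0.0f;
    for (const MotionVector& v : mv)
        sum += std::hypot(v.vx, v.vy);
    return sum;
}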

2.2.2 Regional information

Dividing the frame into $K$ rectangular sub-frames $A_k$, the area of dominant motion is obtained by:

$$F^l_{ADM} = k^* = \arg\max_k \Big( \sum_{i,j \in A_k} |V^l_{i,j}| \Big) \qquad (2)$$

This feature is the index of the sub-area of frame $l$ with the largest sum of absolute values of motion vectors. Informally, this is the part of the frame with the largest absolute motion. In the current implementation, $K = 9$.
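
The sketch below illustrates Eq. (2) under the assumption that the K = 9 sub-frames form a 3 x 3 grid of equal rectangles; the paper does not state the sub-frame layout, so the partition used here, as well as the function and variable names, are assumptions.

#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct MotionVector { float vx, vy; };  // as in the Section 2.1 sketch

// Eq. (2): index (0..8) of the sub-frame with the largest sum of
// motion-vector magnitudes, assuming a 3 x 3 grid of equal rectangles.
// The motion-vector grid is imax x jmax macroblocks, stored row-major.
std::size_t areaOfDominantMotion(const std::vector<MotionVector>& mv,
                                 std::size_t imax, std::size_t jmax) {
    std::array<float, 9> areaSum{};  // per-sub-frame sums, zero-initialised
    for (std::size_t i = 0; i < imax; ++i) {
        for (std::size_t j = 0; j < jmax; ++j) {
            // Map macroblock (i, j) to one of the 9 sub-frames.
            std::size_t k = (3 * i / imax) * 3 + (3 * j / jmax);
            const MotionVector& v = mv[i * jmax + j];
            areaSum[k] += std::hypot(v.vx, v.vy);
        }
    }
    std::size_t kStar = 0;
    for (std::size_t k = 1; k < areaSum.size(); ++k)
        if (areaSum[k] > areaSum[kStar]) kStar = k;
    return kStar;
}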

The ratio between the total absolute motion in the dominant area $A_{k^*}$ of frame $l$ and the total absolute motion $F^l_{TAM}$ is an indicator of motion homogeneity within the frame. Formally,

$$F^l_{MH} = \frac{\max_k \sum_{i,j \in A_k} |V^l_{i,j}|}{F^l_{TAM} + \varepsilon} \qquad (3)$$

The addition of the small positive constant $\varepsilon$ to the denominator prevents division by 0 in static frames.
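
Eq. (3) can then be sketched as follows, with the same hypothetical 3 x 3 partition and an illustrative default for epsilon (the paper does not give its value):

#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct MotionVector { float vx, vy; };  // as in the Section 2.1 sketch

// Eq. (3): motion homogeneity, the share of the frame's total absolute motion
// that falls inside the dominant sub-frame (same 3 x 3 partition as Eq. (2)).
// epsilon prevents division by zero in static frames.
float motionHomogeneity(const std::vector<MotionVector>& mv,
                        std::size_t imax, std::size_t jmax,
                        float epsilon = 1e-6f) {
    std::array<float, 9> areaSum{};
    float total = 0.0f;
    for (std::size_t i = 0; i < imax; ++i) {
        for (std::size_t j = 0; j < jmax; ++j) {
            float mag = std::hypot(mv[i * jmax + j].vx, mv[i * jmax + j].vy);
            areaSum[(3 * i / imax) * 3 + (3 * j / jmax)] += mag;
            total += mag;
        }
    }
    return *std::max_element(areaSum.begin(), areaSum.end()) / (total + epsilon);
}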

2.2.3 Directional information

The range of motion directions $[-\pi, \pi]$ is divided into $R$ equal fractions of size $\Delta\varphi = 2\pi/R$. Let $r = 0, \ldots, R-1$ be the angular fraction index. The principal motion direction is defined as the index of the most popular angular fraction:

$$F^l_{PMD} = r^* = \arg\max_r \sum_{i,j} \mathbb{1}\Big( |\Phi^l_{i,j} - r\Delta\varphi| < \frac{\Delta\varphi}{2} \Big) \qquad (4)$$

where $\mathbb{1}(\cdot)$ is the indicator function, so that the sum counts the macroblocks whose motion direction falls within the $r$-th angular fraction.
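
An illustrative implementation of Eq. (4): the circle of directions is split into R equal bins and the index of the most populated bin is returned. The bins here are anchored at -pi rather than centred at r*Delta-phi as in the formula, which is an equivalent partition of the circle into R equal fractions; the value R = 8 is only a placeholder, since the paper does not specify R.

#include <cmath>
#include <cstddef>
#include <vector>

struct MotionVector { float vx, vy; };  // as in the Section 2.1 sketch

// Eq. (4): principal motion direction, the index of the most populated of
// R equal direction bins covering [-pi, pi]. Every macroblock votes for the
// bin containing its motion direction. R = 8 is a placeholder value.
std::size_t principalMotionDirection(const std::vector<MotionVector>& mv,
                                     std::size_t R = 8) {
    const float pi = 3.14159265358979f;
    const float dphi = 2.0f * pi / static_cast<float>(R);
    std::vector<std::size_t> count(R, 0);
    for (const MotionVector& v : mv) {
        float phi = std::atan2(v.vy, v.vx);               // in [-pi, pi]
        std::size_t r = static_cast<std::size_t>((phi + pi) / dphi);
        if (r >= R) r = R - 1;                            // guard phi == +pi
        ++count[r];
    }
    std::size_t rStar = 0;
    for (std::size_t r = 1; r < R; ++r)
        if (count[r] > count[rStar]) rStar = r;
    return rStar;
}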

A measure for the dominance of the principal motion direction is obtained by the ratio of the total motion in the principal motion direction and the total absolute motion in the frame:

$$F^l_{DPM} = \frac{\sum_{i,j} |V^l_{i,j}| \,\mathbb{1}\Big( |\Phi^l_{i,j} - r^*\Delta\varphi| < \frac{\Delta\varphi}{2} \Big)}{F^l_{TAM} + \varepsilon} \qquad (5)$$
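
Eq. (5) can be sketched in the same style, taking the principal direction index r* from the Eq. (4) sketch; the bin-centre convention and the defaults for R and epsilon are assumptions, not values from the paper.

#include <cmath>
#include <cstddef>
#include <vector>

struct MotionVector { float vx, vy; };  // as in the Section 2.1 sketch

// Eq. (5): fraction of the total absolute motion carried by vectors whose
// direction lies within dphi/2 of the principal direction r* (obtained with
// the Eq. (4) sketch, which anchors its bins at -pi).
float dominanceOfPrincipalMotion(const std::vector<MotionVector>& mv,
                                 std::size_t rStar, std::size_t R = 8,
                                 float epsilon = 1e-6f) {
    const float pi = 3.14159265358979f;
    const float dphi = 2.0f * pi / static_cast<float>(R);
    const float centre = -pi + (static_cast<float>(rStar) + 0.5f) * dphi;
    float inBin = 0.0f;
    float total = 0.0f;
    for (const MotionVector& v : mv) {
        float mag = std::hypot(v.vx, v.vy);
        float diff = std::fabs(std::atan2(v.vy, v.vx) - centre);
        if (diff > pi) diff = 2.0f * pi - diff;           // wrap around the circle
        if (diff < 0.5f * dphi) inBin += mag;
        total += mag;
    }
    return inBin / (total + epsilon);
}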

2.3 Training and online detection

The feature vector $F^l$ corresponding to frame $l$ is represented by a point in an $m$-dimensional feature space. The essence of the training phase is estimation or modeling of the probability density function of the feature vectors during normal conditions. Having an estimate of the probability density function allows associating with each incoming frame, at the operational stage, the probability density of its feature vector under the normal-motion hypothesis. The requirement of real-time computation at the full video rate supports the selection of a histogram that holds a discrete approximation of the $m$-dimensional probability density function of the feature vectors obtained during the training stage. In the detection phase, the feature vector associated with each incoming frame is computed. When the probabilities of occurrence of the $k$ feature vectors associated with $k$ consecutive frames are all below a threshold $T$, the $k$-th frame is declared abnormal.
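
The following sketch ties the pieces together: a histogram over quantised feature vectors is filled during training, and at run time a frame is flagged when the estimated probability of its feature vector, and of the k - 1 preceding ones, falls below the threshold T. The class name, the assumption that features are pre-scaled to [0, 1), and the choices of bin count, k and T are all illustrative; the paper does not specify them.

#include <array>
#include <cstddef>
#include <deque>
#include <unordered_map>

// Sketch of the training / detection logic over the m = 5 features.
class AbnormalMotionDetector {
public:
    AbnormalMotionDetector(std::size_t binsPerFeature, std::size_t k, double T)
        : bins_(binsPerFeature), k_(k), T_(T) {}

    // Training phase: accumulate the histogram of quantised feature vectors.
    void train(const std::array<double, 5>& features) {
        ++histogram_[key(features)];
        ++totalTrainingFrames_;
    }

    // Detection phase: returns true if the last k frames were all improbable.
    bool update(const std::array<double, 5>& features) {
        double p = 0.0;
        auto it = histogram_.find(key(features));
        if (it != histogram_.end() && totalTrainingFrames_ > 0)
            p = static_cast<double>(it->second) / totalTrainingFrames_;
        recent_.push_back(p < T_);
        if (recent_.size() > k_) recent_.pop_front();
        if (recent_.size() < k_) return false;
        for (bool improbable : recent_)
            if (!improbable) return false;
        return true;
    }

private:
    // Quantise each feature (assumed pre-scaled to [0, 1)) and pack the
    // resulting bin indices into a single histogram key.
    std::size_t key(const std::array<double, 5>& f) const {
        std::size_t packed = 0;
        for (double v : f) {
            std::size_t b = static_cast<std::size_t>(v * bins_);
            if (b >= bins_) b = bins_ - 1;
            packed = packed * bins_ + b;
        }
        return packed;
    }

    std::size_t bins_;
    std::size_t k_;
    double T_;
    std::unordered_map<std::size_t, std::size_t> histogram_;
    std::size_t totalTrainingFrames_ = 0;
    std::deque<bool> recent_;
};

For instance, a detector built as AbnormalMotionDetector det(8, 3, 1e-4); would be trained with one call to det.train(...) per normal-activity frame and then queried with det.update(...) once per incoming frame.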

3. Experimental results

The suggested abnormal motion detection algorithm was successfully tested at an outdoor location. The algorithm was implemented in C++. The simplicity of the computations and the well-defined dynamic ranges allow fixed-point numerical representation. The code runs on a Pentium 4 2.8GHz PC with a Windows C++ graphical user interface at a rate of 75 frames per second, without optimization. This is three times faster than the video rate. In this experiment, the camera captured a pedestrian pathway from a nearby building. The complete movie, with analysis by our system, can be found at http://abn-motion.axspace.com. Abnormal/normal motion frames are framed with red/green respectively.

Roughly 50 minutes of video were acquired. About 41 minutes of normal pedestrian traffic were used for training. The 9-minute long test sequence contained normal and abnormal activity. The movie was captured using a SONY TRV900E PAL (25fps) digital video camera. It was then transferred to a computer in DV format and coded to MPEG-1 format using the generic MPEG-2 codec from the MPEG group website http://www.mpeg.org/MPEG/MSSG/#source.

Several representative frames from the video sequence are provided. Examples of frames showing normal behavior are presented in Fig. 1. A few frames detected as abnormal are presented in Fig. 2. The frames shown belong to a jumping episode, to a running and grass-crossing episode and to a service vehicle episode. Note that the semantic descriptions (jumping, running, grass-crossing) are provided merely for clarity. The operation of the algorithm is based on global motion features, without segmentation, tracking or any other attempt at semantic interpretation. These events are abnormal simply in the sense that similar motion patterns had not been observed (or, more generally, had only rarely been observed) during the training session.

Figure 1: Examples of normal behavior (frames 169, 993, 1044 and 1382).

4. Discussion

We presented a computationally efficient and reliable method for abnormal motion detection in compressed video streams. The input to the algorithm is the set of macroblock motion vectors (as well as intra-frame and intra-block flags) that are produced anyway by the compression process - an essential part of many modern video surveillance systems.

In the context of video analysis, ‘normal’ and ‘abnormal’ are fundamentally hard to define. The best currently available approach is therefore to learn the patterns of normal activity. Since the learning is based on global motion features, no attempt is made to associate the detected motion abnormality with any ‘object’ in the scene. From a practical point of view, since the algorithm is used mainly for triggering video recording for later human analysis, or for triggering transmission to a human observer, detecting the ‘object’ that generated the abnormal motion is much less important than the detection of the abnormality itself. Fundamentally, the algorithm can detect abnormal motion that cannot be associated with any specific object. For example, panic in a crowd of people at a subway station, or an unexpected tsunami wave at the seafront, are alarming situations characterized by abnormal motion, even though no particular ‘object’ in the scene can be associated with it.

Figure 2: Frames taken from a jumping episode (first row, frames 558 and 570), from a running and grass-crossing episode (second row, frames 11644 and 12311) and from a service vehicle episode (third row, frames 11132 and 11155) are detected as abnormal. The complete movie, with analysis by our system, can be found at http://abn-motion.axspace.com.

The algorithm is modular, in the sense that different feature vectors can be suggested and alternative probability-density estimation or modeling methods can be used. A further extension, given long training sequences, would be to learn normal patterns of activity from short frame sequences, using for example Markov chains.

References

[1] J.K. Aggarwal and Q. Cai, Human motion analysis: a review, CVIU, Vol. 73, No. 3, pp. 428-440, 1999.

[2] C. Beleznai, B. Fruhstuck and H. Bischof, Tracking multiple humans using fast mean shift mode seeking, Int. Workshop on Visual Surveillance, pp. 25-32, 2005.

[3] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Kluwer, 1997.

[4] O. Boiman and M. Irani, Detecting irregularities in images and in video, ICCV, 2005.

[5] A.A. Efros, A.C. Berg, G. Mori and J. Malik, Recognizing action at a distance, ICCV, 2003.

[6] R. Hamid, Y. Huang and I. Essa, ARGMode - Activity recognition using graphical models, CVPR, Vol. 4, pp. 38-44, 2003.

[7] I. Haritaoglu, D. Harwood and L.S. Davis, W4: Real-time surveillance of people and their activities, PAMI, Vol. 22, No. 8, pp. 809-830, 2000.

[8] R.S. Gaborski, V.S. Vaingankar, V.S. Chaoji, A.M. Teredesai and A. Tentler, VENUS: A system for novelty detection in video streams with learning, FLAIRS Conference, 2004.

[9] C. Kaas, J. Luettin, R. Mattone and K. Zahn, Evaluation of a self-learning event detector, in Video-Based Surveillance Systems: Computer Vision and Distributed Processing, Kluwer, 2002.

[10] G. Medioni, I. Cohen, F. Bremond, S. Hongeng and R. Nevatia, Event detection and analysis from video streams, PAMI, Vol. 23, No. 8, pp. 873-889, 2001.

[11] B. Ozer, W. Wolf and A.N. Akansu, Human activity detection in MPEG sequences, Workshop on Human Motion, p. 61, 2000.

[12] R. Polana and R. Nelson, Detection and recognition of periodic non-rigid motion, IJCV, Vol. 23, No. 3, pp. 261-282, 1997.

[13] S. Rao and P.S. Sastry, Abnormal activity detection in video sequences using learned probability densities, TENCON, Vol. 1, pp. 369-372, 2003.

[14] C. Rao, A. Yilmaz and M. Shah, View-invariant representation and recognition of actions, IJCV, Vol. 50, pp. 203-226, 2002.

[15] J. Sherrah and S. Gong, VIGOUR: A system for tracking and recognition of multiple people and their activities, ICPR, pp. 179-182, 2000.

[16] C. Stauffer and W.E.L. Grimson, Learning patterns of activity using real-time tracking, PAMI, Vol. 22, No. 8, pp. 747-757, 2000.

[17] A. Veeraraghavan, A.R. Chowdhury and R. Chellappa, Role of shape and kinematics in human movement analysis, CVPR, Vol. I, pp. 730-737, 2004.

[18] T. Xiang and S. Gong, Beyond tracking: modelling activity and understanding behavior, IJCV, Vol. 67, No. 1, pp. 21-51, 2006.

[19] Y. Yacoob and M.J. Black, Parametrized modeling and recognition of activities, ICCV, 1998.

[20] H. Zhong, J. Shi and M. Visontai, Detecting unusual activity in video, CVPR, Vol. 2, pp. 819-826, 2004.

