Real-Time Multiple Human Activity Recognition from Tracked Body Displacements

Sagar Medikeri
Shashank Pujar

under the guidance of
Dr Uma Mudenagudi

Department of Electronics and Communications, BVBCET, Hubli, India.


Contents

1 Introduction
  1.1 Introduction
  1.2 Motivation of the problem

2 Review of Literature

3 Human Activity Recognition from Tracked Body Displacements
  3.1 Human Detection Algorithm
  3.2 Multiple Human Tracking and Tagging the activities

4 Results and Conclusion
  4.1 Experimental Results
  4.2 Conclusion and Future Work
    4.2.1 Conclusion
    4.2.2 Future Work


List of Figures

3.1 Activity state recognition without and with backtracking
3.2 Outputs of various processes involved in the algorithm
4.1 Results of the Activity Recognition algorithm


Chapter 1

Introduction

1.1 Introduction

Recognizing human activity is a very challenging task, ranging from low-level sensing and feature extraction from sensory data to high-level inference algorithms used to infer the state of the subject from the dataset. We are interested in the scientific challenges of modeling simple activities of people such as walking, running and standing. We have developed a recognizer which is able to reject moving objects other than humans and recognize the activities of only the humans classified by a computer vision system. We assume that the camera position remains fixed, as in the case of a surveillance or security camera.

Human Activity Recognition is a complex problem involving several sub-processes. It begins with (1) detecting whether there are any humans in the video captured by the camera. (2) Once a person is detected, he is tracked, and (3) the activity being performed by the person is identified and tagged with one of the states from the defined activity set. We define the activity set as consisting of the following states: Run Left, Walk Left, Stand Still, Walk Right and Run Right.
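The report does not fix a concrete representation for these states; as a minimal illustration, they could be encoded as a C enumeration (the name ActivityState and the ordering are our own, not part of the report):

    /* Hypothetical encoding of the activity set, ordered from fast
       leftward motion to fast rightward motion. */
    typedef enum {
        RUN_LEFT,
        WALK_LEFT,
        STAND_STILL,
        WALK_RIGHT,
        RUN_RIGHT
    } ActivityState;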

1.2 Motivation of the problem

Human Activity Recognition has a lot of scope in many fields. Real-time imaging and human motion tracking systems are applicable to areas such as surveillance, robotics, law enforcement and defense. Many different applications have been studied by researchers in activity recognition; examples include assisting the sick and disabled. For example, Pollack et al. in [2] show that by automatically monitoring human activities, home-based rehabilitation can be provided for people suffering from traumatic brain injuries. One can find applications ranging from security-related applications and logistics support to location-based services. Due to its many-faceted nature, different fields may refer to activity recognition as plan recognition, goal recognition, intent recognition, behavior recognition, location estimation or location-based services.



Chapter 2

Review of Literature

In [3], authored by Rybski and Veloso, human activity is recognized by tracking face displacements. A Haar-trained classifier for human faces is used to detect and track faces. However, it is assumed that the person's face is always facing the camera. If the person's face, or in general any object the classifier is trained to detect, is even slightly turned away from the camera, the Haar classifier fails to detect it. This is the inherent disadvantage of using Haar classifiers, apart from their increased computation and delay.

In [1], authored by Hong, Turk and Huang, finite state machine models of gestures are constructed by learning the spatial and temporal information of the gestures separately from each other. However, it is assumed that the gestures are performed directly in front of the camera and that the individual features of the face and hands can be recognized and observed without error.

Much of the related work in activity modeling relies upon fixed cameras with known poses with respect to the objects and people that they are tracking. Our efforts focus on activity models that can be tracked and observed by uncalibrated vision systems which do not have the luxury of knowing their absolute position in the environment. This approach is attractive because it minimizes the cost of setting up the system and increases its general utility.


Chapter 3

Human Activity Recognition from Tracked Body Displacements

3.1 Human Detection Algorithm

The detection algorithm consists of the following processing steps:

1. Frame subtraction,

2. Thresholding,

3. Updating the motion history,

4. Median filtering,

5. Finding contours,

6. Computing centers of the contours,

7. Motion-vector extraction, and

8. Activity state classification.

All computations on each image frame are completed with one pass over the image, thereby providing high throughput. High throughput is a critical factor for achieving the goal of real-time tracking. The advantages of this algorithm include 1) identifying an object by its motion without relying on an a priori assumption of an object model, 2) using a single video camera, and 3) providing better computation speed and accuracy in detection and tracking compared to Haar classifiers.

(a) Frame Subtraction
Two adjacent image frames from the video sequence are denoted I1(x, y) and I2(x, y). The width and height of each frame are W and H respectively. It is safe to assume that the frame rate is sufficiently high with respect to the velocity of the movement. With this assumption, the difference between I1(x, y) and I2(x, y) contains information about the location and incremental movement of the object. Let

Id(x, y) = |I1(x, y) − I2(x, y)|.

Frame subtraction also serves the important function of eliminating the background and any stationary objects. This is done using the cvAbsDiff function.
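A minimal sketch of this step using the OpenCV 1.x C API named above (the wrapper function and variable names are illustrative, not from the report):

    #include <opencv/cv.h>

    /* Difference image between two consecutive frames. All images are
       pre-allocated single-channel 8-bit images of the same size W x H.
       The subtraction removes the static background, leaving non-zero
       pixels only where something moved between the two frames. */
    void frame_subtraction(const IplImage *I1, const IplImage *I2, IplImage *Id)
    {
        cvAbsDiff(I1, I2, Id);  /* Id(x, y) = |I1(x, y) - I2(x, y)| */
    }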

(b) Thresholding
The difference image is thresholded into a binary image (silhouette) according to

Isilh(x, y) = 1 if Id(x, y) ≥ α
Isilh(x, y) = 0 if Id(x, y) < α

where α is a threshold that determines the tradeoff between sensitivity and robustness of the tracking algorithm. This is done using the cvThreshold function.
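A corresponding sketch (assuming 255 as the "1" value of the binary silhouette, which is what the C API produces with CV_THRESH_BINARY):

    #include <opencv/cv.h>

    /* Threshold the difference image into a binary silhouette.
       alpha trades sensitivity for robustness: a low value picks up
       small motions (and more noise), a high value only large ones.
       Pixels above alpha become 255, all others 0. */
    void make_silhouette(const IplImage *Id, IplImage *Isilh, double alpha)
    {
        cvThreshold(Id, Isilh, alpha, 255, CV_THRESH_BINARY);
    }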

(c) Updating the Motion History
Here, the motion history image (MHI) is updated with the new silhouette image:

mhi(x, y) = timestamp    if silhouette(x, y) != 0
mhi(x, y) = 0            if silhouette(x, y) == 0 and mhi(x, y) < timestamp − duration
mhi(x, y) = mhi(x, y)    otherwise

That is, MHI pixels where motion occurs are set to the current timestamp, while pixels where motion happened long ago are cleared. This is done using the cvUpdateMotionHistory function.
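A sketch of this step (cvUpdateMotionHistory expects the MHI to be a single-channel 32-bit floating-point image):

    #include <opencv/cv.h>

    /* Update the motion history image from the current silhouette.
       timestamp is the current time in seconds; pixels that last saw
       motion before (timestamp - duration) are cleared, implementing
       the update rule given above. */
    void update_mhi(const IplImage *silhouette, IplImage *mhi,
                    double timestamp, double duration)
    {
        cvUpdateMotionHistory(silhouette, mhi, timestamp, duration);
    }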

(d) Median Filtering
Even though the camera mount is assumed to be fixed in position, salt-and-pepper noise is observed in the captured video. This can be due to the motion of tree leaves in the wind, waves on the surface of water and other such stray sources of noise. Eliminating these reduces the possibility of false detections. Median filtering is a widely adopted method for noise removal: it finds the median of a pixel neighborhood and assigns it to the center pixel. This is done using the cvSmooth function.
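A sketch of the filtering call (the 3x3 aperture is our assumption; the report does not state the neighborhood size):

    #include <opencv/cv.h>

    /* Suppress salt-and-pepper noise in the silhouette: each pixel is
       replaced by the median of its 3x3 neighborhood, which removes
       isolated noisy pixels while preserving the edges of the moving
       regions. */
    void denoise(const IplImage *src, IplImage *dst)
    {
        cvSmooth(src, dst, CV_MEDIAN, 3, 0, 0, 0);
    }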

(e) Finding Contours
Before finding contours, the image is dilated as a pre-processing step. Dilation merges all closely positioned motions of a person into a single lumped region of motion, which enables proper contour detection. A contour is detected for each person in the frame, and the center of each contour is found.
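A sketch of this step, again with the OpenCV 1.x C API. Taking the bounding-box center as the contour center is our simplification (the report does not specify how the center is computed), and note that cvFindContours modifies its input image:

    #include <opencv/cv.h>

    /* Dilate the silhouette so nearby motion blobs of one person merge
       into a single region, then find one external contour per person
       and its center. Returns the number of centers written. */
    int find_contour_centers(IplImage *silh, CvPoint *centers, int max_centers)
    {
        CvMemStorage *storage = cvCreateMemStorage(0);
        CvSeq *contour = NULL;
        int n = 0;

        cvDilate(silh, silh, NULL, 2);  /* default 3x3 kernel, 2 iterations */
        cvFindContours(silh, storage, &contour, sizeof(CvContour),
                       CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));

        for (; contour != NULL && n < max_centers; contour = contour->h_next) {
            CvRect r = cvBoundingRect(contour, 0);
            centers[n++] = cvPoint(r.x + r.width / 2, r.y + r.height / 2);
        }
        cvReleaseMemStorage(&storage);
        return n;
    }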

3.2 Multiple Human Tracking and Tagging the activities

At every frame of video, each contour that has been found returns an (x, y) position in image coordinates. The software then tracks each contour from frame to frame and stores a history of the contour positions. Because relying on absolute (x, y) positions would be brittle under the above constraints, we instead look at the difference of the contour center positions between subsequent frames of video, i.e. (∆x, ∆y), where

∆x = x_t − x_{t−1} and ∆y = y_t − y_{t−1}.
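In code, the per-frame displacement of a tracked center is straightforward (the struct is our own illustration, not from the report):

    #include <opencv/cv.h>

    /* One tracked contour center across two consecutive frames. */
    typedef struct {
        CvPoint prev;  /* center at frame t-1 */
        CvPoint curr;  /* center at frame t   */
    } Track;

    /* dx > 0 means motion toward the right edge of the image. */
    int delta_x(const Track *t) { return t->curr.x - t->prev.x; }
    int delta_y(const Track *t) { return t->curr.y - t->prev.y; }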

(a) Minimum distance contour correlation
When there is more than one person in the frames, there will be as many contours. To link a contour in one frame to a contour in the next, we find the Euclidean distances from every contour center in the previous frame to every contour center in the current frame. The contour center pair (one from the previous frame and one from the current frame) which produces the minimum distance is linked together. The same is done for all contour centers in the current frame. Let CONTOURS_PREVIOUS and CONTOURS_CURRENT represent the number of contours in the previous and current frames. Depending on these counts, three cases arise; a code sketch follows the three cases below.

i. CONTOURS_PREVIOUS = CONTOURS_CURRENT
Here, the number of persons detected in the current frame equals that in the previous frame. Thus, each contour center in the previous frame is linked to a contour center in the current frame.

ii. CONTOURS_PREVIOUS > CONTOURS_CURRENT
Here, the number of persons detected in the current frame is less than that in the previous frame. This implies that one or more persons have left the scene or are motionless. Each contour center in the current frame gets linked to one contour center in the previous frame. The unlinked contour centers in the previous frame are dropped.

iii. CONTOURS_PREVIOUS < CONTOURS_CURRENT
Here, the number of persons detected in the current frame is more than that in the previous frame. This implies that one or more persons have entered the scene or have begun moving from the Stand Still state. Since not every contour center in the current frame can be linked to a contour center in the previous frame, the matching process is reversed and each unlinked contour center is displayed at its own position.
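The sketch below illustrates one way to implement the linking described above; the greedy nearest-first pass and the fixed-size bookkeeping array are our own simplifications. Surplus previous-frame centers stay unused (case ii) and surplus current-frame centers stay unlinked (case iii):

    #include <math.h>
    #include <opencv/cv.h>

    /* Link each current contour center to the nearest unused center
       from the previous frame. link[i] receives the index of the
       previous-frame center matched to current center i, or -1 if
       the center could not be linked. Assumes n_prev <= 64. */
    void link_contours(const CvPoint *prev, int n_prev,
                       const CvPoint *curr, int n_curr, int *link)
    {
        int used[64] = {0};

        for (int i = 0; i < n_curr; i++) {
            double best = 1e30;
            link[i] = -1;
            for (int j = 0; j < n_prev; j++) {
                double dx = curr[i].x - prev[j].x;
                double dy = curr[i].y - prev[j].y;
                double d  = sqrt(dx * dx + dy * dy);
                if (!used[j] && d < best) {
                    best = d;
                    link[i] = j;
                }
            }
            if (link[i] >= 0)
                used[link[i]] = 1;
        }
    }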

(b) Fixed Length Backtracking
Once a contour center in the current frame is linked to a contour center in the previous frame, the horizontal displacement between them is calculated. However, comparing this value directly against the thresholds set to classify motion leads to erroneous results. While walking, when a person puts a foot forward, that foot is momentarily stationary while the motion of the trailing foot begins. When the contour center is computed in this situation, it shifts in a direction opposite to that of the person's walking. Figure 3.1 illustrates this situation. (Video source: www.istockphoto.com)

This problem can be circumvented by fixed length backtracking. We have defined a hybrid approach by which backtracking is done on the latest k frames. Thus, when the observation at time t is received, the state at [t − (k/FPS)], where FPS is the frames per second of the video, is inferred. In a real-time system this fixed-window approach causes a delay in the state estimate, but as long as the delay is not too long, the estimate may still be useful to act upon. The value of k thus represents a tradeoff between accuracy and estimation lag in a real-time system.
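A minimal sketch of the windowed displacement; averaging over the window (rather than, say, summing) is our own choice for illustration:

    /* Fixed-length backtracking: smooth the horizontal displacement
       over the latest k frames so the momentary backward shift of the
       contour center during a stride does not flip the classification.
       history[] holds the k most recent per-frame x-displacements of
       one contour. */
    double backtracked_dx(const int *history, int k)
    {
        double sum = 0.0;
        for (int i = 0; i < k; i++)
            sum += history[i];
        return sum / k;
    }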


Figure 3.1: Activity state recognition without and with backtracking

(c) Tagging activities and representation
The result of fixed length backtracking for each contour is compared against the set of thresholds to arrive at the activity state of each person. Once this is done, a rectangle is drawn around the contour and the activity state is displayed.
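A sketch of the tagging step, reusing the hypothetical ActivityState enum from Chapter 1. The threshold values WALK_T and RUN_T are tuning parameters the report does not specify:

    #include <opencv/cv.h>

    static const char *LABELS[] =
        { "Run Left", "Walk Left", "Stand Still", "Walk Right", "Run Right" };

    /* Classify the backtracked horizontal displacement dx against the
       walking/running thresholds, then draw a rectangle around the
       contour's bounding box and print the state above it. */
    void tag_activity(IplImage *frame, CvRect box, double dx,
                      double WALK_T, double RUN_T)
    {
        ActivityState s = STAND_STILL;
        if      (dx <= -RUN_T)  s = RUN_LEFT;
        else if (dx <= -WALK_T) s = WALK_LEFT;
        else if (dx >=  RUN_T)  s = RUN_RIGHT;
        else if (dx >=  WALK_T) s = WALK_RIGHT;

        CvFont font;
        cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, 0.5, 0.5, 0, 1, 8);
        cvRectangle(frame, cvPoint(box.x, box.y),
                    cvPoint(box.x + box.width, box.y + box.height),
                    CV_RGB(0, 255, 0), 2, 8, 0);
        cvPutText(frame, LABELS[s], cvPoint(box.x, box.y - 5), &font,
                  CV_RGB(0, 255, 0));
    }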

Figure 3.2 illustrates the various processes involved in the algorithm.


Figure 3.2: Outputs of various processes involved in the algorithm


Chapter 4

Results and Conclusion

4.1 Experimental Results

To test the algorithm, a test video was made on the college campus with an Olympus FE15 digital camera fixed on a tripod at a distance of about ten meters from the road. The camera lens was directed perpendicular to the motion of the people. Figure 4.1 depicts the results obtained at various instants of the test video. It can be seen that the algorithm correctly detected the activity states of the various persons in the video frames.

Figure 4.1: Results of the Activity Recognition algorithm

The recognizer performs well in normal lighting conditions. It is able to distinguish between walking and running activities, reject non-human moving objects such as cycles, bikes and cars, and track any number of persons in the video. Incorporating fixed length backtracking greatly increased the accuracy of the activity recognition algorithm. However, the algorithm makes a few false detections in the presence of shadows, and it is unreliable at detecting the Stand Still activity state.

4.2 Conclusion and Future Work

4.2.1 Conclusion

The advantage of using tracked body displacements is that a person is detected irrespective of his or her orientation with respect to the camera. This is not the case with Haar classifiers, where the person's features must closely match the features of the persons in the images used during the training of the classifier. The Haar classifier's advantage lies in the fact that it can detect even a stationary person.

We set out to achieve human activity recognition over the activity set consisting of the states Run/Walk Left/Right and Stand Still. Except for the Stand Still state, all states are satisfactorily recognized. We had not anticipated that shadows would cause a problem in the detection of humans, but we found that when there are heavy shadows, as in the evenings or mornings, false detections occur. Nevertheless, the algorithm recognizes the activities of multiple persons remarkably well.

4.2.2 Future Work

To make the algorithm more robust, we intend to incorporate statistical modelling of the activity states. We plan to reliably detect the Stand Still state by extracting features unique to that situation. We will incorporate adaptive thresholding to distinguish between the walking and running states reliably, irrespective of whether the person is near the capturing camera or far from it. We would also like to explore the option of using Haar training to tackle the aforesaid problems.


Bibliography

[1] P. Hong, M. Turk and T. S. Huang. Gesture modeling and recognition using finite state machines. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000.

[2] M. E. Pollack et al. Autominder: an intelligent cognitive orthotic system for people with memory impairment. Robotics and Autonomous Systems, 2003.

[3] P. E. Rybski and M. M. Veloso. Robust real-time human activity recognition from tracked face displacements. September 2005.

