CHAPTER 6
HUMAN BEHAVIOR UNDERSTANDING MODEL
6.1 INTRODUCTION
Analyzing human behavior in video sequences has been an active field
of research for the past few years. Vital applications of this field include
the monitoring of behaviors in secure installations, video surveillance,
video retrieval and human computer interaction systems. The main objective is
to recognize and predict the behavior and to detect abnormalities. Many
researchers have contributed approaches that predict the behavior as a
post-processing task. In this work, it is proposed to analyze the behavior of
the human action while it is in progress. In common scenarios, such as
parking lots and supermarkets, the visual surveillance system should detect
abnormal behavior (as an indication of theft) and raise an alarm to alert the
visual analysts. Hence, the action patterns of the people should be analyzed
and the state of action should be detected as either ‘normal’ or ‘abnormal’
to understand the behavior.
The characterization of human behavior is equivalent to dealing
with a sequence of video frames that contains both the spatial and temporal
information (Cadamo et al 2010). The temporal information conveys more
details for human behavior understanding. Normally, human posture analysis
is the basic step to extract the temporal information. During human posture
analysis, various human behavior patterns are exhibited in the form of key
postures like ‘turnleft’, ‘guardkick’, ‘falldown’ etc.
This chapter presents a novel human behavior understanding model
that analyses the human movements and learns the human posture status
either as ‘normal’ or ‘abnormal’ from the video sequences using Probabilistic
Global Action Graph (PGAG). According to the posture analysis, the status of
the human behavior can be predicted as either ‘normal’ or ‘abnormal’ using
the proposed approach. The process flow of the human behavior
understanding model is shown in Figure 6.1.
Figure 6.1 Process flow of the human behavior understanding model
The proposed human behavior understanding model consists of two
phases, namely training and testing phases. During the training phase, the
following pipeline of processes is involved: (i) Foreground segmentation, (ii)
Feature Extraction, (iii) Vector Quantization, and (iv) Probabilistic Global
Action Graph (PGAG) construction.
(i) The pixel layer based approach is used as an initial
preprocessing step to segment the foreground (human
silhouette) from the action video.
(ii) TSOC and COC are identified as features, which are extracted
from each silhouette.
(iii) In vector quantization, the aim is to group similar postures
together and create a finite number of key postures
representing the code book. The 35-dimensional shape feature
vector of each key posture is symbolized as a one-dimensional
‘VQ symbol’ in the code book (a minimal sketch follows this list).
(iv) A semiautomatic state space approach based human behavior
understanding model is simulated using the Probabilistic
Global Action Graph (PGAG).
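As an illustration of the vector quantization step (iii), the following is
a minimal sketch of codebook construction using k-means clustering. The use
of scikit-learn, the function name build_codebook, and its parameters are
illustrative assumptions, not the exact implementation used in this work.

import numpy as np
from sklearn.cluster import KMeans

def build_codebook(features: np.ndarray, n_key_postures: int = 82) -> KMeans:
    """Cluster training posture features (n_frames x 35) into key postures."""
    # Codebook size of 82 follows Section 6.4; feature dimension is 35 (TSOC+COC).
    kmeans = KMeans(n_clusters=n_key_postures, n_init=10, random_state=0)
    kmeans.fit(features)
    return kmeans  # kmeans.cluster_centers_ holds the key postures

# Each training frame is then mapped to its one-dimensional VQ symbol:
# vq_symbols = build_codebook(train_features).predict(train_features)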
In order to evaluate the designed model, the test phase is
formulated, which has two major steps, namely
(i) For the input silhouette sequence, the likelihood of the key
posture is identified using a similarity measure (see the
sketch after this list).
(ii) The key posture is analyzed as either ‘normal’ or ‘abnormal’
via the PGAG, and an alarm is raised for abnormal behavior.
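The first test step can be illustrated with a short, hedged sketch: given
the feature vector of a test silhouette, find the closest key posture in
the code book and convert the distance into a similarity score. The mapping
of distance to a score in (0, 1] used here is an assumption for illustration.

import numpy as np

def closest_key_posture(feature, codebook):
    """feature: (35,) test vector; codebook: (m, 35) key posture matrix."""
    dists = np.linalg.norm(codebook - feature, axis=1)  # Euclidean distances
    j = int(np.argmin(dists))                           # index = VQ symbol
    similarity = 1.0 / (1.0 + dists[j])                 # distance -> (0, 1]
    return j, similarity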
In any real-time system, the behavior model is essential for
capturing domain knowledge irrespective of the action. The rest of the
chapter focuses mainly on designing a domain-specific behavior model based
on key posture transitions.
6.2 PROBABILISTIC GLOBAL ACTION GRAPH (PGAG)
In general, every action has a finite number of key postures, and
there exists a bounded relation across actions due to the restricted
movement of the human body. Hence, related action sequences may share a few
of the key postures. The relations can be denoted in two cases:
(i) Considering the ‘walk’ and ‘turnaround’ normal actions, a few
of the frequently correlated postures are ‘heel-strike’, ‘toe-off’,
‘mid-stance’ etc.
(ii) Similarly, for the ‘shotgun’ and ‘falldown’ abnormal actions,
‘raise hand’, ‘point’, ‘bend-knee’, ‘lower body’ and ‘fall on
floor’ are the correlated postures. Normally, a few actions
exhibit similar key postures either at the beginning or ending
point of their occurrence.
A weighted directed action graph, termed PGAG, is constructed,
which acquires and distinguishes the posture transitions across various
composite actions globally. In this graph, each node represents a key posture.
The weighted link between nodes represents the transitional probability
between the two key postures. The temporal characteristic of each action is
obtained using the posture transitions. Under the state level hypothesis, the
transitions among nodes signify the occurrence of an event. Events can be
defined based on dominant and persistent characteristics of the posture
transitions. Hence, the PGAG possesses the characteristics of understanding
the behavior in terms of posture transitions.
6.2.1 Construction of PGAG
The PGAG is constructed using a probabilistic posture transition
matrix. The steps involved in the construction of the PGAG are as follows:
1. The number of nodes in the PGAG equals the number of VQ
symbols in the posture code book (i.e. # of PGAG nodes = # of
key postures).
2. For each pair of key postures ‘i’ and ‘j’, the posture transition
probability (Pij) is obtained as

$P_{ij} = \dfrac{\#\ \text{of transitions from Posture}\ i\ \text{to Posture}\ j}{\#\ \text{of transitions from Posture}\ i}$   (6.1)
3. The posture transition probabilities out of each posture are
constrained by

$\sum_{j=1}^{m} P_{ij} = 1$   (6.2)

where the sum of the transition probabilities from the ith posture to all
jth postures must be equal to ‘1’, and ‘m’ is the total number of key
postures. Hence, the posture transition matrix has the dimension m x m.
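The construction of the posture transition matrix (Equations 6.1 and 6.2)
can be sketched as follows; the function and variable names are illustrative
assumptions, not the thesis code.

import numpy as np

def build_transition_matrix(sequences, m):
    """sequences: iterable of VQ-symbol lists; m: number of key postures."""
    counts = np.zeros((m, m))
    for seq in sequences:
        for i, j in zip(seq[:-1], seq[1:]):
            counts[i, j] += 1                 # count transition i -> j
    row_sums = counts.sum(axis=1, keepdims=True)
    # Equation (6.1): row-normalize the counts; rows then satisfy (6.2).
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)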
The PGAG using six key postures of the ‘runstop’ action is depicted
in Figure 6.2. In this graph, the ‘runstop’ action is performed in multiple
views. The transition paths are normally cyclic in nature, and there exist
specific beginning and ending key postures as detailed in Figure 6.2.
Figure 6.2 PGAG with 6 nodes (P0-P5) and posture transition
probabilities for ‘runstop’ action
The corresponding posture transition matrix using the PGAG is listed
in Table 6.1. The posture transition matrix has mostly zero probabilities
below the main diagonal; the main reason is that the ‘runstop’ action has
temporally ordered key postures with largely left-to-right transitions.
Table 6.1 Posture transition matrix for ‘runstop’ action
Pij P0 P1 P2 P3 P4 P5
P0 0.109 0.165 0.226 0.125 0.245 0.130
P1 0.000 0.228 0.450 0.000 0.000 0.322
P2 0.000 0.130 0.200 0.370 0.000 0.300
P3 0.000 0.000 0.190 0.340 0.454 0.016
P4 0.000 0.000 0.000 0.520 0.480 0.000
P5 0.000 0.000 0.000 0.000 0.000 1.000
6.3 BEHAVIOR UNDERSTANDING MODEL
The constructed PGAG can be effectively used for analyzing the
frame level action dynamics in the form of key posture transitions. A human
behavior understanding model is simulated, which predicts the status of the
key postures either as ‘normal’ or ‘abnormal’ using a priori knowledge.
In the training phase, each VQ symbol has been assigned a unique
behavior status as either ‘normal’ or ‘abnormal’. The model notifies the
abnormal behavior in the sequence of events: it analyzes the current state
and the next state using the PGAG and the VQ symbol status (normal /
abnormal), and raises an alarm during an abnormal event.
The probabilistic state transitions are described in the form of four
cases which are depicted in Figures 6.3 (a) to 6.3(d).
Case 1: Initial state is normal
Figure 6.3(a) Case 1 of PGAG
Case 2: Current state is normal and the next state is most probably normal
Figure 6.3(b) Case 2 of PGAG
Case 3: Initial state is abnormal
Figure 6.3(c) Case 3 of PGAG
Case 4: Current state is abnormal and the next state is most probably
abnormal
Figure 6.3(d) Case 4 of PGAG
where, in Case 2 and Case 4, the likelihood term represents the maximum of
the posture transition probabilities from the current posture ‘i’ to any
posture ‘j’, i.e. the maximum value of the ith row in the posture
transition matrix.
In the testing phase, the behavior status of a video can be plotted
as the frame number versus the posture status indication (where
Normal = 0 and Abnormal = 1). Thus, the PGAG based human behavior
understanding model is capable of measuring the probabilistic likelihood of
the next state of the posture sequence and generating an appropriate alarm
for the concerned authorities in real-time.
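The four cases above can be condensed into a short, hedged sketch of the
alarm logic. The thesis states the cases in terms of probabilistic state
transitions; here the next state is approximated by the arg max of the
current row of the transition matrix, and an alarm is raised whenever the
current or predicted key posture carries the a priori ‘abnormal’ label. The
function name and this exact decision rule are illustrative assumptions.

import numpy as np

def behavior_status(current, P, is_abnormal):
    """current: VQ symbol of the current frame; P: (m, m) transition matrix;
    is_abnormal: boolean array with the a priori label of each key posture."""
    next_state = int(np.argmax(P[current]))   # most probable next posture
    if is_abnormal[current] or is_abnormal[next_state]:
        return 1                              # abnormal: raise an alarm
    return 0                                  # normal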
6.4 EXPERIMENTS
The proposed human behavior understanding model is
evaluated on the public video data set MuHAVi-MAS
(http://dipersec.king.ac.uk/MuHAVi-MAS/). The silhouette images are in
PNG format and each action combination can be downloaded as a small zip
file (between 1 and 3 MB). Also, the developers of MuHAVi-MAS have added
the 3 constant characters "GT-" to the beginning of every original image
name to label them as ground truth images. Here, 5 composite action classes,
namely ‘CA1-walkturnback’, ‘CA2-runstop’, ‘CA3-punch’, ‘CA4-kick’ and
‘CA5-shotguncollapse’, along with manually annotated action status are
available for the corresponding image frames. Also, it contains information
about actor, camera views and sample identity. Thus, the MuHAVi-MAS data
set has enough information to validate the performance of the proposals for
human behavior understanding model using PGAG. Sample frames from the
data set for five composite actions are shown in Figure 6.4.
(a) CA3 - Punch (b) CA4- Kick (c) CA5- ShotGunCollapse
(d) CA1 - WalkTurnBack (e) CA2 – RunStop
Figure 6.4 Sample image frames from MuHAVi-MAS dataset for 5
composite actions
This multi-view data set with five cameras contains the ground
truth, which is explicitly represented for each of the composite actions
performed by five actors. Also, these five composite actions have been
logically partitioned into 14 primitive actions as detailed in Table 6.2.
Table 6.2 Detailed specification about 14 primitive actions from
MuHAVi-MAS data set

Composite action (status)            Primitive action       Data set size (No. of samples x No. of frames)
CA1 WalkTurnBack (N - Normal)        C11 WalkRightToLeft    8 x 72  = 576
                                     C13 TurnBackRight      4 x 61  = 244
                                     C12 WalkLeftToRight    8 x 86  = 708
                                     C14 TurnBackLeft       4 x 54  = 216
CA2 Run_Stop (N - Normal)            C9  RunRightToLeft     8 x 66  = 528
                                     C13 TurnBackRight      4 x 52  = 208
                                     C10 RunLeftToRight     8 x 78  = 624
                                     C14 TurnBackLeft       4 x 51  = 204
CA3 Punch (AN - Abnormal)            C8  GuardToPunch       16 x 28 = 448
                                     C7  PunchRight         16 x 46 = 736
CA4 Kick (AN - Abnormal)             C6  GuardToKick        16 x 28 = 448
                                     C5  KickRight          16 x 47 = 752
CA5 ShotGunCollapse (AN - Abnormal)  C1  CollapseRight      8 x 84  = 672
                                     C3  StandupRight       8 x 120 = 960
                                     C2  CollapseLeft       8 x 93  = 744
                                     C4  StandupLeft        4 x 112 = 448
Based on the available ground truth, the data set with 140 video
samples contains 3308 normal frames and 5208 abnormal frames; as a
whole, 8516 frames are considered for the experimentation. The data set
detailed so far in Table 6.2 is uniformly partitioned into two data sets, namely
Train Set and Test Set. During this partitioning, the number of frames
considered per composite action and the corresponding ground truth for each
frame with status as either ‘normal’ or ‘abnormal’ is mentioned in detail in
Table 6.3.
Table 6.3 MuHAVi-MAS data set partitioning

           No. of frames per composite action
Data Set   CA1    CA2    CA3    CA4    CA5     Normal   Abnormal
Train set  872    782    592    600    1412    2248     2010
Test set   872    782    592    600    1412    2248     2010
In the training phase, the Train set is chosen to learn the behavior
by updating the PGAG. From the silhouettes of each action video, the TSOC
and COC features are extracted and then vector quantized into 82 key postures.
The recognized key postures are further subcategorized as 39 ‘normal’ and
43 ‘abnormal’ key postures. The detailed categorization of key postures is
listed in Table 6.4.
Table 6.4 Categorization of key postures per composite action
Composite
Action
No. of Key
Postures
VQ
symbols
No. of
normal
postures
No. of
abnormal
postures
WalkTurnBack 15 w1-w15 15 0
Kick 17 k1-k17 3 14
Punch 18 p1-p18 4 14
Runstop 12 r1-r12 12 0
ShotGunCollapse 20 s1-s20 5 15
The recognized 82 key postures represent the nodes of the PGAG,
and the 82 x 82 dimensional posture transition probability matrix is computed,
considering the similarity between the training postures and the key postures. The
‘runstop’ action has 12 key postures and their posture transition matrix is
listed in Table 6.5.
Table 6.5 Posture transition matrix for ‘runstop’ action using
MuHAVi-MAS data samples
Pij r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12
r1 0.351 0.149 0.020 0.007 0.041 0.027 0.027 0.074 0.074 0.047 0.115 0.068
r2 0.159 0.280 0.098 0.015 0.030 0.023 0.061 0.121 0.091 0.008 0.098 0.015
r3 0.041 0.107 0.339 0.132 0.008 0.116 0.017 0.190 0.000 0.000 0.041 0.008
r4 0.000 0.028 0.135 0.390 0.057 0.199 0.014 0.149 0.000 0.014 0.014 0.000
r5 0.050 0.042 0.034 0.050 0.210 0.084 0.000 0.143 0.025 0.185 0.134 0.042
r6 0.022 0.014 0.065 0.275 0.080 0.348 0.007 0.152 0.000 0.007 0.029 0.000
r7 0.064 0.085 0.213 0.043 0.043 0.021 0.277 0.106 0.064 0.000 0.085 0.000
r8 0.026 0.093 0.144 0.093 0.093 0.113 0.015 0.273 0.005 0.021 0.093 0.031
r9 0.126 0.165 0.000 0.000 0.000 0.010 0.019 0.010 0.505 0.000 0.029 0.136
r10 0.090 0.010 0.000 0.010 0.170 0.000 0.000 0.020 0.030 0.310 0.240 0.120
r11 0.099 0.033 0.007 0.013 0.139 0.033 0.033 0.159 0.040 0.106 0.265 0.073
r12 0.136 0.027 0.009 0.000 0.055 0.018 0.009 0.000 0.109 0.145 0.045 0.445
At any current state of the simulated model with the ith key posture,
the possible next state transition is evaluated using the similarity score,
which represents the most probable transitions of the ith key posture, i.e.
the maximum values of the ith row of the posture transition matrix; these
are highlighted in Table 6.5. Also, the few cells having the value ‘0.000’
imply that no transitions have occurred between the corresponding postures.
6.4.1 GUI Design for Human Behavior Analysis
The GUI design for human behavior analysis is implemented,
which includes the ‘Browse’ option and display provisions for frame number,
similarity score with the closest key posture, key posture label and current
action status. The ‘Browse’ option is used to select the test video sample for
verifying the model performance. The frame number indicates the current
silhouette being processed in the input video. The similarity score, in the
range [0, 1], provides the distance measure between the current frame and the
closest key posture across the code book. The posture label displays the VQ
symbol based on the action. The posture type indicates the behavior alarm as either ‘normal’ or
‘abnormal’. The simulated model performance for the given video is plotted
as the number of frames versus the status of the posture as either ‘normal’ or
‘abnormal’. The ‘normal’ or ‘abnormal’ status is scaled as 0 or 1 respectively.
In Figure 6.5, the summary of results obtained using the proposed
PGAG based behavior understanding model is illustrated for the test sample
of ‘kick’ abnormal action. This sample video sequence has 40 frames, out of
which 29 frames are categorized as ‘abnormal’ and 11 frames are accounted
as ‘normal’. According to the annotation provided by the data set, the result
achieved during behavior learning for this sample is 72.5%. Even though at
the action level ‘kick’ is categorized into ‘abnormal’ status, at the frame level
their starting and ending frame sequences exhibit ‘standing’ posture only,
which is a ‘normal’ one. Hence, the proposed model has attained correct
behavior understanding. Similarly, for the second test sample of
‘shotguncollapse’ abnormal action, the results are summarized in Figure 6.6.
This sample consists of 50 frames, of which 48 frames are categorized
as ‘abnormal’. Thus, the proposed behavior understanding model obtained
96% accuracy. Likewise, for the third test sample of ‘walkturnback’ normal
action as shown in Figure 6.7, the accuracy reported is 95%.
Figure 6.5 GUI based results for ‘kick’ action, where frame number 40
is alarmed as ‘ABNORMAL’ and the unique VQ symbol
from PGAG is 56. Overall performance plot shows out of 40
frames, 29 frames are identified as ‘ABNORMAL’ and
hence most probably ‘ABNORMAL’ status
Figure 6.6 GUI based Results for ‘shotguncollapse’ action, where frame
number 50 is alarmed as ‘ABNORMAL’ and the unique VQ
symbol from PGAG is 62. Overall performance plot shows out
of 50 frames, 48 frames are identified as ‘ABNORMAL’ and
hence most probably ‘ABNORMAL’ status
Figure 6.7 GUI based Results for ‘walkturnback’ action, where frame
number 40 is alarmed as ‘NORMAL’ and the unique VQ
symbol from PGAG is 23. Overall performance plot shows out
of 40 frames, 38 frames are identified as ‘NORMAL’ and
hence most probably ‘NORMAL’ status
6.4.2 Performance Analysis
The model is evaluated for predicting the ‘normal’ or ‘abnormal’
posture status using a test set of 56 video samples with 4258 frames. The test
outcome can be either ‘1’ i.e. predicting that the human has performed
‘abnormal’ action or ‘0’ i.e. predicting that the human has performed ‘normal’
action.
TP - true positives (abnormal, correctly declared as abnormal)
TN - true negatives (normal, correctly declared as normal)
FP - false positives (normal, incorrectly declared as abnormal)
FN - false negatives (abnormal, incorrectly declared as
normal)
The performance is measured based on the following metrics:
Accuracy – Proportion of true results in the result set.

$\text{Accuracy (Ac)} = \dfrac{\#TP + \#TN}{\#TP + \#TN + \#FP + \#FN}$   (6.3)

Precision – Proportion of true positives against all positive
results.

$\text{Precision (Pr)} = \dfrac{\#TP}{\#TP + \#FP}$   (6.4)

Sensitivity – Proportion of actual positives which are correctly
identified as such. This is also called the ‘recall rate’.

$\text{Sensitivity (Sen)} = \dfrac{\#TP}{\#TP + \#FN} \times 100$   (6.5)

Specificity – Proportion of negatives which are correctly
identified.

$\text{Specificity (Spe)} = \dfrac{\#TN}{\#TN + \#FP} \times 100$   (6.6)
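Equations (6.3) to (6.6) translate directly into code; the following sketch
uses illustrative names, with the Train-set counts of Table 6.6 as a worked
example.

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Equation (6.3)
    precision = tp / (tp + fp)                      # Equation (6.4)
    sensitivity = 100.0 * tp / (tp + fn)            # Equation (6.5), recall %
    specificity = 100.0 * tn / (tn + fp)            # Equation (6.6), %
    return accuracy, precision, sensitivity, specificity

# Train-set counts from Table 6.6:
# metrics(1691, 1931, 317, 319) -> (approx. 0.85, 0.84, 84.1, 85.9)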
The performance of the behavior model for the training and test
data set using the PGAG approach is reported in Table 6.6 and Table 6.7
respectively.
Table 6.6 Performance analysis of human behavior model on training set

           No. of frames
Data Set   N      AN      FP    FN    TP     TN     Ac    Pr    Sen (%)  Spe (%)
Train Set  2248   2010    317   319   1691   1931   0.85  0.84  84       86
In Table 6.6, the performance of the behavior model is reported for
the Train set with 4258 video frames. The proposed work has correctly
categorized the ‘normal’ status with 86% specificity and the ‘abnormal’
status with 84% sensitivity. Hence, out of 2010 ‘abnormal’ frames, 1691 are
correctly recognized; similarly, out of 2248 ‘normal’ frames, 1931 are
correctly recognized, yielding 85% accuracy.
Table 6.7 Performance analysis of human behavior model on test set

           No. of frames
Data Set   N      AN      FP    FN    TP     TN     Ac    Pr    Sen (%)  Spe (%)
Test Set   2248   2010    329   341   1669   1919   0.84  0.83  84       85
In Table 6.7, the performance of the behavior model for the Test set
with 4258 unknown video frames is considered. The proposed work has
correctly categorized the ‘normal’ status with 85% specificity and the
‘abnormal’ status with 84% sensitivity. Hence, out of 2010 ‘abnormal’
frames, 1669 are correctly recognized; similarly, out of 2248 ‘normal’
frames, 1919 are correctly recognized, yielding 84% accuracy.
For both the results reported in Table 6.6 and Table 6.7, the reason
for obtaining only around 84% is the imprecise ground truth information.
Based on the ground truth, the actions ‘kick’, ‘punch’ and ‘shotguncollapse’
have been marked as ‘abnormal’. But, as per human visual perception, even
though these are marked as ‘abnormal’ actions, the initial and end points,
amounting to 13% of the action frames, exhibit ‘normal’ postures only. Hence,
the reported result has lower accuracy in terms of sensitivity and specificity.
The probabilistic behavior model is well structured and
implemented with real-time data. The performance shows that the system is
highly reliable for behavior analysis.
6.5 SUMMARY
The main contribution of this chapter is the simulation of a human
behavior understanding model for a real-time environment. To meet this
objective, a state space approach is formulated. The PGAG has been
proposed to learn the action dynamics at the frame level. The ultimate
purpose of the system is to predict the behavior status either as ‘normal’ or
‘abnormal’. The human behavior model is designed and experimented using
a multi-view data set with a train set of 4258 frames and a test set of
4258 frames, 8516 frames in total. The system is evaluated with four metrics. The
performance results have achieved 86% specificity and 84% sensitivity for
train set. Similarly for the test set, the system achieved 85% specificity and
84% sensitivity. The simulated behavior understanding model can analyze
video contents and recognize human postures and status of the actions well in
advance. This proposed model can be effectively utilized to ease the real
world scenarios where behavior understanding is a complex task.
The forthcoming chapter presents concluding remarks and
summarizes the findings of this research work. Also, the future avenues for
further extension are highlighted.