Source: fusion.isif.org/proceedings/fusion12CD/html/pdf/301_420.pdf

A multi-sensor cognitive approach for active security monitoring of abnormal overcrowding situations

Simone Chiappino, Pietro Morerio, Lucio Marcenaro, Elisabetta Fuiano, Giulia Repetto, Carlo S. Regazzoni

DITEN, University of Genova
Via Opera Pia 11A, 16145 Genoa, Italy

Emails: {s.chiappino,pmorerio}@ginevra.dibe.unige.it, {lucio.marcenaro,carlo.regazzoni}@unige.it

Abstract—Intelligent camera networks have lately been employed for a wide range of heterogeneous purposes, concerning both security- and safety-oriented systems. Military and civil applications, ranging from border surveillance and public space monitoring to ambient intelligence and road safety, are representative of this variety. In this paper, a discussion on the exploitation of a cognitive-based architecture, coupling simulation tools to real scenarios for interaction modelling and analysis, is presented. The application of the proposed general framework, which is given the name Cognitive Node (CN), to crowd monitoring is then presented.

I. INTRODUCTION

A lot of work has been devoted in the last decade to linking traditional computer vision tasks to high-level context-aware functionalities such as scene understanding, behaviour analysis, interaction classification or the recognition of possible threats and dangerous situations [1], [2], [3], [4].

Among the several disciplines involved in the design of next-generation security and safety systems, the cognitive sciences [5] are among the most promising in terms of their capability of driving improvements over the state of the art. As a matter of fact, several recent studies have proposed the application of smart functionalities to camera and sensor networks in order to move from an object recognition paradigm to an event/situation recognition one [6]. The application of bio-inspired models to safety and security tasks represents a relevant added value: such models provide the capability not only of detecting the presence of an intruder in a forbidden area or recognizing the trajectory of an object in an urban scenario (e.g. a baggage in a station or a car on the road), but also of interpreting the behaviour of the entity in the monitored scene and of properly singling out events of interest (up to anomalous events) with respect to normal situations. In addition, to efficiently exploit cognitive capabilities in an intelligent sensor network, the role of data fusion algorithms is crucial [7], [8]. In the literature, several works deal with the data fusion problem applied to heterogeneous sensors, both for security [9], [10] and safety tasks [11], [12].

In this work, the features of a cognitive-based framework, inspired by the previously cited concepts, are described, and the application of the proposed architecture to crowd analysis is presented. The paper is organized as follows: in Section II, an overview of the state of the art in the field of crowd analysis is given. Sections III and IV are devoted to the exploitation of a cognitive-based architecture. Section V includes a practical application of the theory developed in the previous sections. Eventually, conclusions are drawn in Section VI, where future developments of this work are also given.

II. CROWD ANALYSIS

The crowd phenomenon has recently attracted increasing attention from researchers worldwide in several application domains such as, for instance, visual surveillance, serious gaming and public space design [13]. Different implications related to crowd behaviour analysis can be considered, since both its technical and its social aspects are still under investigation.

On the one hand, researchers in the psychology and sociology domains consider crowd behaviour modelling as a social phenomenon. Several examples can be found in the open literature dealing with the role and relevance of human interaction factors in characterizing the behaviour of a crowd. In [14], a simulation-based approach to the creation of a population of pedestrians is proposed. The authors aim at modelling the behaviour of up to 10,000 pedestrians in order to analyse several movement patterns and people's reactions typical of an urban environment. The impact of the emotions of individual agents in a crowded area has also been investigated by Liu et al. [15] in order to simulate and model the behaviour of groups of people. Similarly, Handford and Rogers [16] have recently proposed a framework for modelling drivers' behaviour during an evacuation in a post-disaster scenario, taking into account several social factors which can affect their behaviour in following a path to reach a safe spot.

On the other hand, technical aspects in crowd behaviour analysis applications mainly focus on the detection of events or the extraction of particular features by exploiting computer-vision-based algorithms. An estimate of the number of people in a crowd can be obtained by computing the number of foreground and edge pixels. Davies et al. propose a system using the Fourier transform for estimating the motion of the crowd [17]. Many researchers have tried to use segmentation


and shape-recognition techniques for detecting and tracking individuals and thus estimating the crowd. However, this kind of approach can hardly be applied to overcrowding situations, where people are typically severely occluded [18], [19]. Neural networks are used in [20] for estimating crowd density from texture analysis, but in this case an extensive training phase is needed to achieve good performance. A Bayesian model-based segmentation algorithm was proposed in [21]; this method uses shape models for segmenting individuals in the scene and is thus able to estimate the number of people in the crowd. The algorithm is based on Markov chain Monte Carlo sampling and is extremely slow for large crowds. Optical-flow-based techniques are used in [22], [23], while Rahmalan et al. [24] proposed a computer-vision-based approach relying on three different methods to estimate crowd density for outdoor surveillance applications.
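For illustration, the foreground-pixel counting idea mentioned above can be sketched as follows. This is a minimal sketch, not the method of any cited work: the static background image, the difference threshold and the pixels-per-person calibration constant are all assumptions that would be scene-dependent in practice.

```python
import numpy as np

def estimate_crowd_size(frame, background, diff_threshold=25, pixels_per_person=400):
    """Estimate the number of people from the foreground pixel count.

    Assumes a fixed empty-scene background image and a roughly linear
    relation between foreground area and crowd size; pixels_per_person
    is a hypothetical, scene-dependent calibration constant.
    """
    # Background subtraction: pixels differing strongly are foreground.
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    foreground = diff > diff_threshold
    return int(foreground.sum() / pixels_per_person)

# Toy example: an empty 100x100 background and a frame containing two
# 20x20 "person" blobs (400 foreground pixels each).
background = np.zeros((100, 100), dtype=np.uint8)
frame = background.copy()
frame[10:30, 10:30] = 200   # blob 1
frame[50:70, 50:70] = 200   # blob 2
print(estimate_crowd_size(frame, background))  # → 2
```

As the text notes, such a count degrades under severe occlusion, which is what motivates the segmentation and texture-based alternatives discussed above.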

As a matter of fact, the combination of technical and social aspects can represent an added value with respect to the works presented above. A first example can be found in [25], where the authors exploit a joint visual-tracking/Bayesian-reasoning approach to understand people and crowd behaviour in a metro station scenario. More recently [26], [27], [28], [29], a social force model describing the interactions among the individual members of a group of people has been proposed to detect abnormal events in crowd videos.
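The per-pedestrian force computation at the heart of such social-force-based abnormality detection can be sketched as below. This is a minimal sketch of the Helbing-style formulation: the relaxation time tau, the unit mass, and the idea of approximating the desired velocity v_des (e.g. from spatially averaged optical flow) are assumptions for illustration, not the exact settings of the cited works.

```python
import numpy as np

def interaction_force(v, v_prev, v_des, dt=1.0, tau=0.5, m=1.0):
    """Interaction force of one pedestrian under a social force model.

    F_int = m * dv/dt - F_personal, with the personal (desired) force
    F_personal = m * (v_des - v) / tau.  tau is the relaxation time;
    v_des might be approximated from averaged optical flow (assumption).
    """
    accel = (np.asarray(v) - np.asarray(v_prev)) / dt
    f_personal = m * (np.asarray(v_des) - np.asarray(v)) / tau
    return m * accel - f_personal

# A pedestrian moving steadily at the desired velocity feels no force...
print(interaction_force(v=[1.0, 0.0], v_prev=[1.0, 0.0], v_des=[1.0, 0.0]))
# ...while an abrupt deviation from the desired motion yields a large one,
# which is the cue used to flag abnormal events.
print(interaction_force(v=[2.0, 0.0], v_prev=[0.0, 0.0], v_des=[0.5, 0.0]))
```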

The Cognitive Node presented in this work can be applied to the crowd analysis domain to effectively join the technical and social aspects related to the behaviour of groups of people. In this scenario, the goal of the system is to analyse and classify crowd interactions in order to maintain a proper security level in the monitored area and to put effective countermeasures into action when panic or overcrowding situations are detected.

III. COGNITIVE MODEL

The proposed approach has been implemented according to a bio-inspired model of human reasoning and consciousness grounded in the work of the neurophysiologist A. Damasio [5]. Damasio's theories describe cognitive entities as complex systems capable of incremental learning based on the experience of the relationships between themselves and the external world. Two specific brain devices, called the proto-self and the core-self, can be defined to formalize this concept. These devices are specifically devoted to monitoring and managing, respectively, the internal status of an entity (proto-self) and its relationships with the external world (core-self). Thus, a crucial aspect in modelling a cognitive entity following Damasio's model is, first of all, the capability of accessing the entity's internal status and, secondly, the knowledge and analysis of the surrounding environment.

This approach can be mapped onto a sensing framework by dividing the sensors into endo-sensors (or proto-sensors) and eso-sensors (or core-sensors), according to whether they monitor, respectively, the internal or the external state of the interacting entities.

The core of the proposed architecture is the so-called Cognitive Node. It can be considered as a module that is able to receive data from sensors, to process them in order to find potentially dangerous or anomalous events and situations and, in some cases, to interact with the environment itself or to contact the human operator.

A. Cognitive Cycle for single and multiple entities representation

Within the proposed scheme, the representation of each entity has to be structured in a multi-level hierarchical way. As a whole, the closed processing loop realized by the cognitive node in case of a given interaction between an observed object and the system can be represented by means of the so-called Cognitive Cycle (CC, see Figure 1), which is composed of four main steps:

• Sensing: the system continuously acquires knowledge about the interacting objects and about its own internal status.

• Analysis: the collected raw knowledge is processed in order to obtain a precise and concise representation of the occurring causal interactions.

• Decision: the information provided by the analysis phase is processed and a decision strategy is selected according to the goal of the system.

• Action: the system puts the configuration provided by the decision phase into practice, in the form of a direct action on the environment or of a message delivered to the user.

Fig. 1. Cognitive Cycle (single object representation)

In addition, the learning phase is continuous and involvesall the stages (within certain limits) of the cognitive cycle.
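The four-step loop can be sketched as a minimal control cycle. Everything concrete here is a placeholder: the scalar "crowding" score produced by the analysis step and the 0.8 alert threshold are invented for illustration, not values from the paper.

```python
class CognitiveCycle:
    """Minimal sketch of the four-step cognitive cycle.

    The sense/analyse/decide/act bodies are placeholders; a real node
    would plug in sensor drivers, event detection and actuators.
    """

    def __init__(self, sensors, actuator):
        self.sensors = sensors      # callables returning raw observations
        self.actuator = actuator    # callable applying the chosen action

    def sense(self):
        # Sensing: acquire knowledge about objects and internal status.
        return [read() for read in self.sensors]

    def analyse(self, raw):
        # Analysis: condense raw data into a concise representation
        # (here: a single invented "crowding" score).
        return sum(raw) / len(raw)

    def decide(self, state):
        # Decision: select a strategy according to the system goal
        # (threshold 0.8 is an arbitrary placeholder).
        return "alert" if state > 0.8 else "monitor"

    def act(self, decision):
        # Action: direct action on the environment or message to the user.
        self.actuator(decision)

    def step(self):
        self.act(self.decide(self.analyse(self.sense())))

log = []
cycle = CognitiveCycle(sensors=[lambda: 0.9, lambda: 0.95], actuator=log.append)
cycle.step()
print(log)  # → ['alert']
```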

Thus, the CC can be viewed as a dispositional, embodied description of an object, as it includes the reactions the object generates in the cognitive system, i.e. the possible actions that the system can plan and perform when a situation involving that object is observed and predicted. According to this statement, the representation model depicted in Figure 1 can be referred to as an Embodied Cognitive Cycle (ECC). With respect to the security and safety domains, in which the ECC is here applied, the above-mentioned embodied description is associated with a precise objective: maintaining the stability of the equilibrium


between the object and the environment (i.e. the maintenance of the proper level of security and/or safety).

As a consequence, each entity is provided with a 'security/safety-oriented ECC' (S/S-ECC), which is representative of the entity itself within the Cognitive Node. The mapping of the S/S-ECC onto the Cognitive Node chain shown in Figure 2 can be viewed as the result of the interaction between two entities, each one described as a cognitive cycle as well. In particular, the external object (eso) and the internal autonomous system (endo) are represented as a couple of Interacting Virtual Cognitive Cycles (IVCCs). The IVCCs can be matched with the CN structure (i.e. the bottom-up and top-down chains) by associating parts of the knowledge related to the different ECC phases with the multilevel processing structure of the CN (Figure 3).

Fig. 2. Cognitive Node: Bottom-up analysis and top-down decision chain

Fig. 3. Embodied Cognitive Cycle, Interactive Virtual Cognitive Cycles and Cognitive Node matching representation

More in detail, the representation model of the ECC (top left corner of Figure 3) is centred on the cognitive system, which can itself be considered a cognitive entity. Therefore, it is possible to map the proposed representation as in the top right corner of Figure 3, where two IVCCs, one representing the entity (or object, IVCC_O) and the other representing the cognitive system (IVCC_S), interact in a given environment. In this model, the sensing and action blocks of the IVCC_S correspond to the sensing and action blocks of the ECC (see the bottom right corner of the figure). However, in the IVCC_S, such blocks assume a parallel virtual representation of the observed physical sensing and action, corresponding respectively to the Intelligent Sensing Node and the Actuator blocks in the general framework.

The proposed interpretation of the matching among the embodied cognitive model, the interactive virtual cycles representing the entities acting in the environment (including the system) and the cognitive node allows one to consider the CN as a universal machine for processing ECCs across a large variety of application domains. In general, each ECC starts with ISN (Intelligent Sensor Node) data including an interacting entity (eso-sensor) and a reflexive observation of the system (endo-sensor). The observed data (acquired from the system viewpoint) are considered from two different perspectives (the object's and the system's) by creating a description of the current state of the entities using knowledge learned in previous experiences. This process happens at the event detection and situation assessment sub-blocks. Then, a prediction of the future actions taken by the IVCC_O, contextualized with the self-prediction of the future planned actions of the system, occurs at the prediction sub-block. The use of the knowledge of the IVCC_O ends at this stage. Finally, the IVCC_S is completed by adjusting the plans of the system in the representation of its decision and action phases, which are, as stated above, a parallel virtualization of the ECC.

In addition, it is relevant to briefly point out that a similar decomposition can be adopted in the case where two interacting entities are observed. The description of the interacting subjects can be modelled by observing that the two entities can form a single meta-entity (ME), with which a meta cognitive cycle interacting with the autonomous system is associated. The meta-entity can simply be considered as a composition of the two cognitive cycles associated with the initial entity couple.

The advantage of the proposed representation, involving the description of an Embodied Cognitive Cycle by means of an IVCC couple, is that the same mechanism used to represent the interaction of a ME with the autonomous system can also be used to represent the interaction between two observed entities forming an observed meta-entity.

Dynamic Bayesian Networks (DBNs) [30] can be used to represent cognitive cycles and IVCCs based on an algorithm called Autobiographical Memory [31], and they provide a tool for describing embodied objects within the CN in a way that allows incremental learning from experience. In particular, it should be noted that the interaction between the operator and the autonomous system can also be represented as an IVCC. In that case, the operator-system interaction can be used as an internal reference for the CN, as the operator can be seen as a teaching entity addressing the most effective actions towards the goal of maintaining security/safety levels during the learning phase.


IV. COGNITIVE NODE

The general architecture of the Cognitive Node is depicted in Figure 4.

Intelligent sensors are able to acquire raw data from physical sensors and to generate feature vectors corresponding to the entities to be observed by the cognitive node. If they come from different intelligent sensors, the acquired feature vectors must be fused spatially and temporally in the first stages of the node.

Fig. 4. Cognitive Node Architecture

As briefly introduced in the previous section, the Cognitive Node is internally subdivided into two main parts: the analysis and decision blocks, linked through the cognitive refinement block. The analysis blocks are responsible for organizing sensor data and finding interesting or notable configurations of the observed entities at different levels. Those levels can communicate directly with the human operator through the network interfaces in the upper part of Figure 4. This is basically what can be done by a standard signal processing system able to alert a supervisor whenever a specific event is detected. A prediction module is able to use the stored experience of the node, through the internal Autobiographical Memory, to estimate a possible evolution of the observed environment.

All the processed data and predictions generated by the analysis steps are used as input to the cognitive refinement block. This module can be seen as a surrogate of the human operator: during the configuration of the system, it learns how to distinguish between different levels of potentially dangerous situations. This can be done by manually labelling different zones of the observed data or by implementing a specific algorithm for the particular cognitive application. In the on-line phase, the cognitive refinement module is able to detect whether a predicted condition is starting to drift away from the standard observed environment, thus bringing the overall system closer to a warning situation.
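One simple way such a drift check could be realized is sketched below. This is not the paper's actual refinement algorithm: the representation of the learned "normal" zone as a centroid with a radius, and the toy values, are assumptions made for illustration.

```python
import numpy as np

def warning_level(predicted, normal_centroid, normal_radius):
    """Ratio between the predicted state's distance from the learned
    'normal' region centre and the region radius.

    Values above 1.0 mean the prediction has drifted out of the normal
    zone.  normal_centroid and normal_radius would be learned off-line
    (e.g. from labelled zones of the observed data); toy values here.
    """
    dist = np.linalg.norm(np.asarray(predicted) - np.asarray(normal_centroid))
    return dist / normal_radius

centroid, radius = [0.2, 0.3], 0.5
print(warning_level([0.25, 0.35], centroid, radius) < 1.0)  # inside the normal zone
print(warning_level([0.9, 0.9], centroid, radius) > 1.0)    # drifting towards warning
```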

The decision modules of the cognitive node are responsible for selecting the best actions to be automatically performed by the system in order to avoid a dangerous situation. Those actions can be performed on the fully cooperative parts of the observed system; all the decisions taken by the cognitive node are made with the precise intent of maintaining the environment in a controllable, alarm-free state. If all the actions of the node are unable to keep the system in a standard state and the measured warning level continues to increase, the node itself can decide to stop the cognitive cycle and to give command of the controllable parts of the system back to the human operator. The operator always has the possibility of deciding to completely bypass the automatic system, or of being required to acknowledge each single action that the cognitive node transmits to the guarded environment.

A. Data fusion

The data fusion module is able to receive data from the intelligent sensors in the field and to fuse them from a temporal and a spatial point of view. Consider a set of S intelligent sensors: each sensor k ∈ {1, 2, ..., S} sends to the cognitive node a feature vector

x(k, t) = {x_1, x_2, ..., x_{N_k}}

at time instant t. The intelligent sensors send their feature vectors asynchronously to the cognitive node, which must be able to register them temporally and spatially before passing the data on to the upper-level processing modules.

From a temporal point of view, the data fusion module collects and stores in an internal buffer the newest measurement x_{k,t*_k} from each intelligent sensor k = {1, 2, ..., S}, received at a certain time instant t*_k. The data acquisition time can vary from sensor to sensor.

As soon as a new feature vector is acquired from sensor k, the data fusion module can compute an extended feature vector by combining the latest measurements from all the considered intelligent sensors:

φ(t̂) = f(x_{1,t*_1}, x_{2,t*_2}, ..., x_{S,t*_S}),  where t̂ ≥ {t*_1, t*_2, ..., t*_S}.

Thus, the generation rate of the data fusion module can be estimated by considering the minimum time interval between two sequential measurements of the highest-frequency sensor. If Δt^n_k = (t^n_k − t^{n−1}_k) is the time interval between the arrival times of the feature vectors x(k, t^n) and x(k, t^{n−1}) for sensor k, the actual data rate of the fusion block can be estimated by computing min_k(Δt^n_k).

The analytic expression of the fusion function φ(t̂) depends on the physical relationship between the measured quantities and cannot be studied with a generic approach. In the following scenarios, the feature vectors are mainly generated by video analytics algorithms that process images acquired from video-surveillance cameras and extract scene descriptors (i.e., trajectories of moving objects, crowd densities within a certain environment, human-activity-related features, etc.). The fusion algorithm must be designed to be able to combine all the sensor data from the guarded environment. If one supposes, for instance, to have surveillance cameras with partially overlapping fields of view, the fusion algorithm will compute the mean of the measured data that refer to the same portion of the environment. On the other hand, if a set of disjoint video sensors is considered, the data fusion algorithm will compute the union of the considered feature vectors, thus giving to


the upper modules of the cognitive node a more completedescription of the considered world.
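The temporal buffering scheme described above can be sketched as follows. Plain concatenation stands in for the fusion function f purely for illustration; as the text notes, the real f depends on the physical relation between the measured quantities (e.g. a mean for overlapping fields of view, a union for disjoint sensors).

```python
class DataFusionModule:
    """Buffer the newest measurement x_{k,t*_k} from each asynchronous
    intelligent sensor and emit an extended (fused) feature vector
    whenever any sensor delivers a new one.

    Concatenation is used as the fusion function f only as a sketch.
    """

    def __init__(self):
        self.latest = {}  # sensor id k -> (arrival time t*_k, feature vector)

    def receive(self, k, features, t):
        """Register measurement x(k, t) and return (t_hat, phi(t_hat))."""
        self.latest[k] = (t, list(features))
        # t_hat is at least as late as every buffered arrival time t*_k.
        t_hat = max(ts for ts, _ in self.latest.values())
        # Combine the newest measurement of every sensor seen so far.
        fused = [x for _, (_, vec) in sorted(self.latest.items()) for x in vec]
        return t_hat, fused

fusion = DataFusionModule()
fusion.receive(1, [0.5, 0.1], t=10.0)        # only sensor 1 has reported so far
t_hat, fused = fusion.receive(2, [0.8], t=10.4)
print(t_hat, fused)  # → 10.4 [0.5, 0.1, 0.8]
```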

In any case one can suppose that the fused feature vectorsproduced as output of this module have the following form:

x(t) = {x_C, x_P} = {x_C1, x_C2, ..., x_Cn, x_P1, x_P2, ..., x_Pm}   (1)

Equation 1 expresses a general form for the global feature vector that is the result of the data fusion module. The vector x_C identifies features related to the so-called core objects, i.e., entities that are detected within the considered environment but are not part of the internal state of the system itself. The vector x_P identifies proto-object features, which are specific to entities that can be completely controlled by the cognitive node.

B. Event detection

The event detection step can be divided into an off-line and an on-line phase. During the off-line learning stage, the temporally and spatially aligned feature vectors received from the data fusion submodule are clustered.

A Self-Organizing Map (SOM) [32], an unsupervised classifier, can be employed to map the multidimensional proto and core vectors x_P(t) and x_C(t) onto a lower-dimensional map (layer) of dimension M (from here on we consider M = 2 without loss of generality), where the input vectors are clustered according to their similarities and a label is assigned to each cluster. Labels can be associated, in a supervised way, by a human operator or according to a priori information, with an ongoing situation belonging to a set of conditions to be identified, pertaining to the specific application. The choice of SOMs to perform the feature reduction and clustering processes is due to their capability of reproducing, in a plausible mathematical way, the global behaviour of the winner-takes-all and lateral inhibition mechanisms shown by distributed bio-inspired decision mechanisms.

The clustering process, applied to the internal and external data, allows one to obtain a mapping of the proto and core vectors x_P(t) and x_C(t) into 2-D vectors, corresponding to the positions of the neurons in the SOM map, which we call, respectively, proto Super-states Sx_P and core Super-states Sx_C. Each cluster of Super-states deriving from the SOM classifiers is then associated with a semantic label related to the contextual situation:

Sx^i_P ↦ l^i_P,  i = 1, ..., N_P
Sx^j_C ↦ l^j_C,  j = 1, ..., N_C   (2)

where the notation Sx^i_P and Sx^j_C indicates that the Super-state belongs, respectively, to the i-th proto label and to the j-th core label; N_P and N_C are, respectively, the maximum numbers of proto and core Super-state labels.
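As an illustration, the SOM-based mapping from feature vectors to 2-D Super-states might be sketched as below. This is a minimal SOM written from scratch: the grid size, learning-rate and neighbourhood schedules, and the synthetic "density" features are all assumptions, not the paper's settings.

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small 2-D Self-Organizing Map; returns weights of
    shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    w = rng.random((rows, cols, data.shape[1]))
    # Map coordinates of every neuron, used for neighbourhood updates.
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        sigma = sigma0 * (1 - epoch / epochs) + 0.5
        for x in data[rng.permutation(len(data))]:
            # Best-matching unit: the winner-takes-all step.
            bmu = np.unravel_index(
                np.argmin(((w - x) ** 2).sum(axis=2)), (rows, cols))
            # Gaussian neighbourhood: a lateral-inhibition-like decay.
            d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-d2 / (2 * sigma ** 2))[..., None]
            w += lr * h * (x - w)
    return w

def super_state(w, x):
    """Map a feature vector to its 2-D Super-state (BMU position)."""
    return np.unravel_index(np.argmin(((w - x) ** 2).sum(axis=2)), w.shape[:2])

# Two well-separated synthetic "situations" (e.g. low crowd density vs
# overcrowding features) end up on different regions of the map.
rng = np.random.default_rng(1)
low = rng.normal(0.1, 0.02, (50, 3))
high = rng.normal(0.9, 0.02, (50, 3))
w = train_som(np.vstack([low, high]))
print(super_state(w, low[0]), super_state(w, high[0]))
```

Assigning a semantic label to each region of the trained map then yields the Super-state labelling of Equation (2).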

The result of this process is thus a 2-D map divided into connected regions, labelled with a meaningful identifier related to the ongoing situation. Using this representation, it is possible to interpret the changes of the state vectors x_P(t) and x_C(t) from instant to instant as movements on a plane (the map), where each position is representative of a Super-state, i.e. of a particular circumstance. If changes of the state vectors x_P(t) and x_C(t) do not imply a change of the Super-state labels Sx^i_P ↦ l^i_P and Sx^j_C ↦ l^j_C, it means that the modifications are irrelevant from the point of view of the chosen semantic representation of the situation. On the other hand, when the Super-state labels Sx^i_P and Sx^j_C change in subsequent time instants, this fact entails a modification of the contextual situation, i.e. an event. Then, by sequentially analysing the dynamic evolution of the Super-states, proto and core events can be detected.

The resulting information becomes an approximation of what Damasio calls the Autobiographical Memory, where the interaction between user and system is memorized.

The output of the off-line process is a list of zones within the feature space. This module also considers dynamic aspects of the evolution of the clustered features: transition probabilities between the different zones are computed, in such a way that the outcome of the training process can ideally be compared to a Hidden Markov Model.
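The two off-line outcomes, events as label changes and an HMM-like transition structure, can be sketched together as below; the zone labels are invented for illustration.

```python
from collections import defaultdict

def detect_events(labels):
    """Turn a sequence of Super-state labels into events: an event is
    emitted only when the label actually changes (label-preserving
    movements on the map are semantically irrelevant)."""
    return [(a, b) for a, b in zip(labels, labels[1:]) if a != b]

def transition_probabilities(labels):
    """Estimate transition probabilities between labelled zones, the
    HMM-like outcome of the training process."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

seq = ["normal", "normal", "dense", "dense", "dense", "overcrowded"]
print(detect_events(seq))  # → [('normal', 'dense'), ('dense', 'overcrowded')]
print(transition_probabilities(seq)["dense"])  # P(next zone | 'dense')
```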

During the on-line phase, the input vectors from the data fusion module (1) are processed and a set of possible events is generated by this block. The core-proto descriptive vector is used as input to the Self-Organizing Map generated during the off-line step.

Instead of memorizing sequences of states to describe the interaction, we choose to consider events, that is, state changes, since they can be located in time and can better describe cause-effect relationships. Therefore, from the signals of (1), two sets of proto events ε^P_t and core events ε^C_t are considered.

The sequences of proto (internal) and core (external) events related to the two interacting elements are organized into triplets {ε_P, ε_C, ε_P} and {ε_C, ε_P, ε_C}, representing the causal relationships in terms of initial situation (first event), external cause (second event) and consequent effect on the examined entity (third event). The appropriate memorization of these triplets is what we call the autobiographical memory, and it is the basis for the algorithms described in the following sections. The basic idea behind the algorithms is to estimate the frequency of occurrence of the effects caused by a certain external event, in order to derive two probability distributions:

p(ε_P | ε_C, ε_P)   (3)
p(ε_C | ε_P, ε_C)   (4)

representing the causality of observed events in the interaction.

According to the above considerations, Autobiographical Memory formation is characterized by learning the changes in the proto Super-state caused by a core Super-state modification (core event). Therefore, the proto Super-state preceding the core event, the core event itself and the proto Super-state following it must be memorized. More precisely, considering a core event ($\varepsilon_C$) taking place at time $T_1$, the effects on the internal state must be taken into account in order to learn how the interaction with the external entity which provoked the core event occurred. To do that, a time window of duration $T^-_{max}$ is considered to detect the proto Super-state $S^{x^-}_P(t)$, with $T_1 - T^-_{max} < t < T_1$ (i.e. the initial internal condition), and its modification subsequent to the core event, i.e. $S^{x^+}_P(t)$, with $T_1 < t < T_1 + T^+_{max}$. Note that $T^+_{max}$ is the maximum time after which we consider it reasonable that the proto modification has been caused by the core event rather than having occurred autonomously.
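As a concrete sketch of this windowing step (function and variable names are illustrative assumptions; the paper does not prescribe an implementation), the proto events surrounding a core event at $T_1$ can be located as follows:

```python
def find_proto_context(core_time, proto_events, t_minus_max, t_plus_max):
    """Given a core event at core_time and a list of (time, label) proto
    events, return the last proto event inside (core_time - t_minus_max,
    core_time) and the first one inside (core_time, core_time + t_plus_max).
    None means no proto event fell inside that window."""
    before = [e for e in proto_events
              if core_time - t_minus_max < e[0] < core_time]
    after = [e for e in proto_events
             if core_time < e[0] < core_time + t_plus_max]
    # ε−P: most recent change before the core event (else steady pseudo-event)
    eps_minus = max(before, key=lambda e: e[0]) if before else None
    # ε+P: first change after the core event; no interaction if None
    eps_plus = min(after, key=lambda e: e[0]) if after else None
    return eps_minus, eps_plus
```

When `eps_plus` is `None`, no interaction is recorded, consistent with assumption 3) below; a `None` `eps_minus` corresponds to the steady-state pseudo-event discussed later.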

Three events are then memorized:

• $\varepsilon^-_P = S^{x_0}_P \rightarrow S^{x^-}_P$: proto event at the initial instant. It represents the modification of the proto Super-state from $S^{x_0}_P \mapsto l^i_P$ to $S^{x^-}_P \mapsto l^j_P$ occurring before the core event. The two labels $l^i_P$ and $l^j_P$ are the ones associated, respectively, with the Super-states $S^{x_0}_P$ and $S^{x^-}_P$. The event $\varepsilon^-_P$ thus stores the initial internal state $S^{x^-}_P$ and, at the same time, whether it changed within the time window $T^-_{max}$.

• $\varepsilon_C = S^{x^-}_C \rightarrow S^{x^+}_C$: core event. It describes the change of the external Super-state from $S^{x^-}_C \mapsto l^m_C$ to $S^{x^+}_C \mapsto l^n_C$.

• $\varepsilon^+_P = S^{x^-}_P \rightarrow S^{x^+}_P$: proto event following the core event. It represents the change of the proto Super-state from $S^{x^-}_P = l^j_P$ to $S^{x^+}_P = l^k_P$.

The above triplet $\{\varepsilon^-_P, \varepsilon_C, \varepsilon^+_P\}$ represents a core self instantiation that is associated with an element of the Autobiographical Memory, namely what, in Damasio's work, is called core consciousness.
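A minimal sketch of how such triplets could be accumulated into an Autobiographical Memory, with the conditional distributions of Equations 3-4 recovered as normalized frequency counts (the class and method names are assumptions for illustration, not the paper's implementation):

```python
from collections import defaultdict

class AutobiographicalMemory:
    """Stores {ε−P, εC, ε+P} triplets and estimates p(ε+P | εC, ε−P)
    as the relative frequency of each observed effect."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def store(self, eps_minus_p, eps_c, eps_plus_p):
        # condition on (initial situation, external cause); count the effect
        self.counts[(eps_minus_p, eps_c)][eps_plus_p] += 1

    def effect_distribution(self, eps_minus_p, eps_c):
        effects = self.counts[(eps_minus_p, eps_c)]
        total = sum(effects.values())
        return {e: n / total for e, n in effects.items()} if total else {}

# usage: two out of three times, cause 'lm->ln' moved the proto state on to 'lk'
am = AutobiographicalMemory()
am.store('li->lj', 'lm->ln', 'lj->lk')
am.store('li->lj', 'lm->ln', 'lj->lk')
am.store('li->lj', 'lm->ln', 'lj->lj')
print(am.effect_distribution('li->lj', 'lm->ln'))
```

The symmetric distribution of Equation 4 would be obtained by the same structure with the roles of proto and core events exchanged.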

This model of interactions relies on the following assumptions: 1) the sequence of events considered to describe the interaction to be stored in the passive memory is proto - core - proto (i.e. internal - external - internal with respect to the system); 2) just one core event is involved in the interaction, i.e. the proto state change is caused by only one core event; 3) an interaction takes place only if a proto event, i.e. a change in the proto Super-state, follows the core event within $T^+_{max}$; 4) if a proto event preceding the core event is involved in the interaction, it must occur within $T^-_{max}$. Note that, in order to model an interaction, it is not necessary that a proto event takes place before the core event within a given time range; in fact, the system could have been in a stable state for a long period of time since a core event last modified the internal status. Since our model memorizes events, the steady condition before the core event is treated as a pseudo-event in which the label of the Super-state does not change, i.e. $\varepsilon^-_P = (S^{x_0}_P \rightarrow S^{x^-}_P) \mapsto (l^i_P \rightarrow l^i_P)$.

The sequence of events is represented by a statistical graphical model in order to introduce a mathematical description of the proposed interaction model. This choice is due to the fact that the interaction pattern is composed of a temporal sequence of interdependent events and can therefore be seen as a stochastic process. An approach that models sequences of events through a probabilistic model is thus particularly appropriate.

The interaction patterns are modelled by a Dynamic Bayesian Network (DBN) in order to obtain a representation able to statistically encode the relevant data variability. DBNs provide a compact way to model trajectories that allows simple training and evaluation procedures, together with the possibility of comparing paths of different lengths. The proposed DBN graph, shown in Figure 5, aims at describing interactions taking into account the neuro-physiologically motivated model of the Autobiographical Memory. The conditional probability densities (CPDs) $p(\varepsilon^P_t \mid \varepsilon^P_{t-1})$ and $p(\varepsilon^C_t \mid \varepsilon^C_{t-1})$ encode the motion pattern of the objects in the environment regardless of the presence of other objects.

Fig. 5. Dynamic Bayesian Network model representing interactions with the Autobiographical Memory.

The interactions between the two objects are considered with the CPDs

$p(\varepsilon^P_t \mid \varepsilon^C_{t-\Delta t_C})$   (5)

$p(\varepsilon^C_t \mid \varepsilon^P_{t-\Delta t_P})$   (6)

In particular, Equation 5 describes the probability that the event $\varepsilon^C$, occurring at time $t-\Delta t_C$ and performed by the object associated with the core context, has caused the event $\varepsilon^P$ in the proto context. The reversed interpretation in terms of causal events should be given to $p(\varepsilon^C_t \mid \varepsilon^P_{t-\Delta t_P})$.

Considering the definition of the core consciousness, the causal relationships between the two entities are encoded in two conditional probability densities (CPDs):

$p(\varepsilon^P_t \mid \varepsilon^C_{t-\Delta t_C}, \varepsilon^P_{t-\Delta t_P})$   (7)

$p(\varepsilon^C_t \mid \varepsilon^P_{t-\Delta t_P}, \varepsilon^C_{t-\Delta t_C})$   (8)

As a matter of fact, the probability densities in Equations 7-8 consider both the interaction (i.e. Eq. 5 or Eq. 6) and the initial situation (i.e. $\varepsilon^P_{t-\Delta t_P}$ or $\varepsilon^C_{t-\Delta t_C}$).

C. Situation assessment

The main purpose of the situation assessment module is to reveal specific patterns of events in the acquired data. During an off-line phase, the module learns a set of specific pre-defined situations. Each situation is mapped into a data structure (the so-called Autobiographical Memory, AM) that is able to describe typical event evolutions when that specific interaction is considered. Then, given a specific event $\varepsilon_C$ as input, several scores are computed over a certain number of pre-defined known interactions: in this way the module is able to evaluate which is the most probable situation, as estimated from the temporal evolution of the measured data.

By using the Autobiographical Memory, the cognitive node can classify actions in an on-line manner, while they are taking place. To this end, an accumulative measure is computed exploiting the information encoded in the proposed DBN model. For each interaction $i$, $i = 1, \ldots, N_I$, where $N_I$ is the number of considered interactions, a set of couples of trajectories is used to train the model and to derive its parameters $\Theta_i$. Then the following measure is computed every time a new event $\varepsilon^{P,C}_t$ is detected:

$l^i_t = l^i_{t-\Delta t_{C,P}} + p(\varepsilon^{P,C}_{t-\Delta t_{P,C}}, \varepsilon^{C,P}_{t-\Delta t_{C,P}}, \varepsilon^{P,C}_t \mid \Theta_i)$   (9)

where $l^i_{t-\Delta t_{C,P}}$ is the measure computed at the time at which the previous event was observed, and $p(\varepsilon^{P,C}_{t-\Delta t_{P,C}}, \varepsilon^{C,P}_{t-\Delta t_{C,P}}, \varepsilon^{P,C}_t \mid \Theta_i)$ indicates the probability that the observed triplet belongs to the $i$-th interaction model. For each event, the interaction is then chosen according to $i^* = \arg\max_i l^i_t$, with $i = 1, \ldots, N_I$.
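The accumulative classification of Equation 9 can be sketched as follows; purely for illustration, each trained model $\Theta_i$ is assumed to be available as a lookup table mapping an observed triplet to its probability under that model (names and data layout are assumptions):

```python
def classify_online(triplet_stream, models):
    """models: one dict per interaction, mapping a triplet to its
    probability under that model (a stand-in for p(. | Θi)).
    Returns i* = argmax_i l^i after accumulating over all events,
    together with the per-model scores."""
    scores = [0.0] * len(models)
    for triplet in triplet_stream:
        for i, model in enumerate(models):
            # l^i_t = l^i_{t-Δt} + p(triplet | Θi); unseen triplets score 0
            scores[i] += model.get(triplet, 0.0)
    i_star = max(range(len(models)), key=lambda i: scores[i])
    return i_star, scores
```

In an on-line setting the loop body would run once per detected event, keeping `scores` as persistent state rather than recomputing from scratch.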

D. Prediction

The prediction block uses the very same models considered by the situation assessment module (IV-C). Autobiographical Memories are used here to predict future events $\varepsilon^+_C$ of the core objects; this prediction is performed for each known situation.

The knowledge base within the Autobiographical Memory can be used to predict near-future events by observing and processing internal and external events occurring within the scope of the system. This capability resembles the activation of the autobiographical self, which is the brain process of recovering neural images related to the arisen core self.

To perform the prediction task, when an external event $\varepsilon_{C(m,n)}$ is detected by the system, the proto map is analysed to establish which internal event $\varepsilon^-_{P(i,j)}$ previously occurred. The Autobiographical Memory is then examined to establish which internal event $\hat{\varepsilon}^+_{P(j,*)}$ is most likely to occur, carrying the internal state to the Super-state $l^*_P$, that is:

$\hat{\varepsilon}^+_{P(j,*)} = \arg\max_{\varepsilon^+_{P(j,*)}} \; p\left(\varepsilon^+_{P(j,*)} \mid \varepsilon_{C(m,n)}, \varepsilon^-_{P(i,j)}\right)$   (10)

Moreover, the temporal histogram can provide information about the time at which $\hat{\varepsilon}^+_{P(j,*)}$ might take place. All these data can be very useful for an Ambient Intelligence application to anticipate operations or to arrange the elements of the system that can be involved in the interaction with the external world.

The predicted proto event is therefore the one that maximizes the learned probability distribution $p(\varepsilon^+_P \mid \varepsilon_{C(m,n)}, \varepsilon^-_{P(i,j)})$. Different choices can be made to foresee the time when this event will take place: for example, either the mean or the median value of the temporal histogram related to the triplet $\{\varepsilon^-_{P(i,j)}, \varepsilon_{C(m,n)}, \hat{\varepsilon}^+_{P(j,*)}\}$, or the value related to the most frequent bin.
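Under the same frequency-count view of the AM used above (a hypothetical sketch, not the authors' code), the prediction step of Equation 10 plus the temporal estimate reduces to:

```python
from statistics import median

def predict(am_counts, time_samples, eps_c, eps_minus_p, use='median'):
    """am_counts[(eps_minus_p, eps_c)] -> {eps_plus_p: count};
    time_samples[(eps_minus_p, eps_c, eps_plus_p)] -> list of observed
    delays after the core event. Returns the most likely proto event
    (argmax of Eq. 10) and an estimate of when it will occur."""
    effects = am_counts.get((eps_minus_p, eps_c), {})
    if not effects:
        return None, None                      # triplet context never seen
    eps_hat = max(effects, key=effects.get)    # argmax of p(ε+P | εC, ε−P)
    delays = time_samples.get((eps_minus_p, eps_c, eps_hat), [])
    if not delays:
        return eps_hat, None
    t_hat = median(delays) if use == 'median' else sum(delays) / len(delays)
    return eps_hat, t_hat
```

The `use` flag mirrors the mean/median choice discussed above; the most-frequent-bin option would additionally require the histogram's bin edges.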

V. COGNITIVE NODE APPLICATION DOMAIN: CROWD ANALYSIS

The simulated monitored environment is shown in Fig. 6. The configuration of doors, walls and rooms is however customizable, and tests on different environments will certainly be run in the future to give more consistency to the theory developed so far. A graphical engine (freely available at http://www.horde3d.org/) has been adopted in order to make the simulation realistic during the Autobiographical Memory (III-A) training phase. Here a human operator acts

Fig. 6. The simulated monitored environment.

on the doors' configuration in order to prevent room overcrowding, based on the visual output of the simulator.

Crowd behaviour within the simulator is modelled based on Social Forces, which were briefly mentioned in Section II. This model assimilates each character on the scene to a particle subject to 2D forces, and treats it accordingly from a strictly physical point of view. Its motion equations are derived from Newton's law F = ma. The forces driving a character are essentially of three kinds: an attractive motivational force pulls characters toward some scheduled destination, while repulsive physical forces and interaction forces prevent collisions with physical objects and other characters. Although simplified with respect to [28], the model produces a qualitatively very realistic visual output. Characters are also animated to simulate walking motion.
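A minimal sketch of one such particle update follows; the gain values, the exponential repulsion term and all names are assumptions for illustration, while the simulator's actual force terms follow [28]:

```python
import math

def step(pos, vel, goal, obstacles, dt=0.05, mass=1.0,
         k_goal=1.5, k_rep=2.0, desired_speed=1.3):
    """One Euler step of a social-force particle: attraction toward the
    goal plus exponential repulsion from obstacle points (F = m*a)."""
    # attractive motivational force toward the scheduled destination
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    d = math.hypot(dx, dy) or 1e-9
    fx = k_goal * desired_speed * dx / d
    fy = k_goal * desired_speed * dy / d
    # repulsive forces from obstacles/other characters, decaying with distance
    for ox, oy in obstacles:
        rx, ry = pos[0] - ox, pos[1] - oy
        r = math.hypot(rx, ry) or 1e-9
        w = k_rep * math.exp(-r)
        fx, fy = fx + w * rx / r, fy + w * ry / r
    ax, ay = fx / mass, fy / mass            # Newton's law: a = F / m
    vel = (vel[0] + ax * dt, vel[1] + ay * dt)
    pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    return pos, vel
```

Calling `step` once per frame for every character yields the particle-level crowd motion described above.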

The simulator also includes (simulated) sensors. These try to reproduce the (processed) sensor data coming from different cameras looking at different subsets (rooms) of the monitored scene. A virtual people-estimation algorithm outputs the number of people by simply adding some noise to the exact number of people framed by the virtual camera.
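Such a virtual sensor can be emulated by perturbing the ground-truth count; the Gaussian noise model below is an assumption, since the paper only states that "some noise" is added:

```python
import random

def virtual_people_count(true_count, sigma=1.0, rng=random):
    """Return the ground-truth room occupancy corrupted by Gaussian
    noise, rounded and clamped to a non-negative integer."""
    return max(0, round(true_count + rng.gauss(0.0, sigma)))
```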

The state vector of the system (which corresponds to the external object (eso)) is

$X_{Cr}(t) = \{x_{Cr_1}(t), x_{Cr_2}(t), \ldots, x_{Cr_N}(t)\}$,   (11)

with $N = 6$ in our case (six cameras, one for each room); $x_{Cr_n}(t)$ is the number of people in room $n$. A 10×10 2D SOM is then trained in order to cluster the state-vector space. The SOM Super-states (or rather, their variations) define events. Such events are then labelled as formalized in IV-B. The internal (endo) state of the system (namely, the doors' configuration) is simply modelled by a binary vector storing the state of each door (true if open, false if closed). Variations of such a vector define proto events.
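Extracting proto events from the door-configuration vector then amounts to comparing consecutive samples (a straightforward sketch; the function name and event labels are assumptions):

```python
def proto_events(prev_doors, curr_doors):
    """Compare two binary door-state vectors (True = open) and return the
    list of (door_index, 'open'/'close') changes. An empty list corresponds
    to the steady condition, i.e. the pseudo-event whose label does not
    change."""
    return [(i, 'open' if curr else 'close')
            for i, (prev, curr) in enumerate(zip(prev_doors, curr_doors))
            if prev != curr]
```

Core events would be obtained analogously from variations of the SOM Super-state winning unit between consecutive state vectors $X_{Cr}(t)$.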

An Autobiographical Memory (III-A) is then trained by simulating a human operator who opens virtual gates in order to let the crowd stream outside when high-occupancy states are reached and, at the same time, to minimize the time the gates remain open. Such a memory basically stores the interaction between a human operator (proto-self) and the crowd (core-self), as formalized in III. The operator's reactions to crowd fluctuations stored in the AM are used on-line to choose an optimal strategy, i.e. to simulate the action of a human operator by predicting not only his behaviour but also the crowd's reaction to it. Results show that the system is able to predict and prevent overcrowding to a very good extent.

VI. CONCLUSIONS

A detailed description of the cognitive model, the cognitive node and the Autobiographical Memory frameworks has been given. A simple experimental set-up has been developed to give consistency to the developed theory, showing satisfactory results.

Future developments of this work include further testing, in the hope of obtaining more quantitative results. The simulator can be refined to better depict reality, and additional scenarios can be investigated. Tests based on real data would also give more consistency to the developed theory, but the testing scenario proposed here would be quite difficult to implement in practice. However, thanks to the scalability of the model, many other application domains can be explored, which can allow for real-data testing.

Eventually, investigations on the impact of SOM training and dimension, and of AM multiple-operator training, will be carried out.

REFERENCES

[1] P. Remagnino, S. A. Velastin, G. L. Foresti, and M. Trivedi, “Novel concepts and challenges for the next generation of video surveillance systems,” Mach. Vision Applications, vol. 18, no. 3, pp. 135–137, 2007.

[2] M. Trivedi, K. Huang, and I. Mikic, “Intelligent environments and active camera networks,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2000, pp. 804–809.

[3] A. Lipton, C. Heartwell, N. Haering, and D. Madden, “Automated video protection, monitoring & detection,” IEEE Aerospace and Electronic Systems Magazine, vol. 18, no. 5, pp. 3–18, May 2003.

[4] M. M. Trivedi, T. Gandhi, and J. McCall, “Looking-in and looking-out of a vehicle: Computer-vision-based enhanced vehicle safety,” Intelligent Transportation Systems, IEEE Transactions on, vol. 8, no. 1, pp. 108–120, 2007.

[5] A. R. Damasio, The Feeling of What Happens: Body, Emotion and the Making of Consciousness. Harvest Books, 2000.

[6] M. Valera and S. Velastin, “Intelligent distributed surveillance systems: a review,” Vision, Image and Signal Processing, IEE Proceedings, vol. 152, no. 2, pp. 192–204, April 2005.

[7] G. L. Foresti, C. S. Regazzoni, and P. K. Varshney, Multisensor Surveillance Systems: The Fusion Perspective. Kluwer Academic, Boston, 2003.

[8] R. Collins, A. Lipton, H. Fujiyoshi, and T. Kanade, “Algorithms for cooperative multisensor surveillance,” Proceedings of the IEEE, vol. 89, no. 10, pp. 1456–1477, October 2001.

[9] D. Smith and S. Singh, “Approaches to multisensor data fusion in target tracking: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 12, pp. 1696–1710, December 2006.

[10] A. Prati, R. Vezzani, L. Benini, E. Farella, and P. Zappi, “An integrated multi-modal sensor network for video surveillance,” in Proc. of the Third ACM International Workshop on Video Surveillance & Sensor Networks, November 2005.

[11] B. R. Chang, H. F. Tsai, and C.-P. Young, “Intelligent data fusion system for predicting vehicle collision warning using vision/GPS sensing,” Expert Systems with Applications, vol. 37, no. 3, pp. 2439–2450, 2010. [Online]. Available: http://www.sciencedirect.com/science/article/B6V03-4WXSJY7-8/2/ddc46fb18f2555045f0d99487c652c7f

[12] S. Wu, S. Decker, P. Chang, T. Camus, and J. Eledath, “Collision sensing by stereo vision and radar sensor fusion,” Intelligent Transportation Systems, IEEE Transactions on, vol. 10, no. 4, pp. 606–614, 2009.

[13] B. Zhan, D. N. Monekosso, P. Remagnino, S. A. Velastin, and L.-Q. Xu, “Crowd analysis: a survey,” Mach. Vision Appl., vol. 19, pp. 345–357, September 2008. [Online]. Available: http://portal.acm.org/citation.cfm?id=1416799.1416810

[14] C. Loscos, D. Marchal, and A. Meyer, “Intuitive crowd behavior in dense urban environments using local laws,” in Theory and Practice of Computer Graphics, 2003. Proceedings, 2003, pp. 122–129.

[15] B. Liu, Z. Liu, and Y. Hong, “A simulation based on emotions model for virtual human crowds,” in Image and Graphics, 2009. ICIG ’09. Fifth International Conference on, 2009, pp. 836–840.

[16] D. Handford and A. Rogers, “Modelling driver interdependent behaviour in agent-based traffic simulations for disaster management,” in The Ninth International Conference on Practical Applications of Agents and Multi-Agent Systems, Salamanca, Spain, April 2011.

[17] A. C. Davies, J. H. Yin, and S. A. Velastin, “Crowd monitoring using image processing,” Electronics and Communication Engineering Journal, vol. 7, pp. 37–47, 1995.

[18] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, “Pfinder: Real-time tracking of the human body,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 780–785, 1997.

[19] I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: Real-time surveillance of people and their activities,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 809–830, August 2000. [Online]. Available: http://dx.doi.org/10.1109/34.868683

[20] A. N. Marana, S. A. Velastin, L. F. Costa, and R. A. Lotufo, “Automatic estimation of crowd density using texture,” Safety Science, pp. 165–175, Apr. 1998. [Online]. Available: http://dx.doi.org/10.1016/S0925-7535(97)00081-7

[21] T. Zhao and R. Nevatia, “Bayesian human segmentation in crowded situations,” Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 2, p. 459, 2003.

[22] E. Andrade, S. Blunsden, and R. Fisher, “Hidden Markov models for optical flow analysis in crowds,” in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, vol. 1, September 2006, pp. 460–463.

[23] Y. Benabbas, N. Ihaddadene, and C. Djeraba, “Motion pattern extraction and event detection for automatic visual surveillance,” EURASIP Journal on Image and Video Processing, vol. 2011, p. 15, 2011.

[24] H. Rahmalan, M. Nixon, and J. Carter, “On crowd density estimation for surveillance,” in Crime and Security, 2006. The Institution of Engineering and Technology Conference on, 2006, pp. 540–545.

[25] F. Cupillard, A. Avanzi, F. Bremond, and M. Thonnat, “Video understanding for metro surveillance,” in Networking, Sensing and Control, 2004 IEEE International Conference on, vol. 1, 2004, pp. 186–191.

[26] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 935–942.

[27] S. Pellegrini, A. Ess, K. Schindler, and L. van Gool, “You’ll never walk alone: Modeling social behavior for multi-target tracking,” in International Conference on Computer Vision, 2009.

[28] M. Luber, J. A. Stork, G. D. Tipaldi, and K. O. Arras, “People tracking with human motion predictions from social forces,” in Proc. of the Int. Conf. on Robotics & Automation (ICRA), Anchorage, USA, 2010.

[29] B. E. Moore, S. Ali, R. Mehran, and M. Shah, “Visual crowd surveillance through a hydrodynamics lens,” Commun. ACM, vol. 54, no. 12, pp. 64–73, Dec. 2011. [Online]. Available: http://doi.acm.org/10.1145/2043174.2043192

[30] A. Dore and C. S. Regazzoni, “Bayesian bio-inspired model for learning interactive trajectories,” in Proc. of the IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2009, Genoa, Italy, September 2009.

[31] A. Dore, A. Cattoni, and C. Regazzoni, “Interaction modeling and prediction in smart spaces: a bio-inspired approach based on autobiographical memory,” Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, 2010.

[32] T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, Sep. 1990.


