Decision Making and Reasoning with Uncertain Image and Sensor Data
Pramod K Varshney, Kishan G Mehrotra, Chilukuri K Mohan
Main Themes
- Decentralized decision-making
- Multiple uncertain information streams
- Dynamically changing environments
- Algorithms for realistic battlefield scenarios
What is the agent’s current location?
What activities are other agents involved in?
What is the likelihood of damage at various locations?
What would be the safest paths to a goal/exit zone?
Main Contributions
- Scenario recognition from video sequences
- Improved activity recognition with audio + video information
- Development of new algorithms for path planning in a battlefield
- Formulation of path planning as a multi-objective optimization problem
- Development of a new multi-objective evolutionary algorithm
1. Scenario Recognition and Classification
Event recognition and scene analysis with real-time visual and audio information
Problem Formulation
- Detect moving objects and classify their activities
- Identify sounds indicative of specific events
- Quantify uncertainty in activity classification
- Develop an enhanced scene representation by integrating audio and visual information
1.1 Video Component
Goal: To detect and track moving objects and classify their activity in real time
Input: Real-time video stream
Output: Detected moving objects and their activity classification
Video Processing Pipeline (cont'd.)
Goal: Recognition of a moving object's activities from a sequence of images (video)
Sequence of frames → Low-level processing (filtering, detection, tracking, feature extraction) → Extracted features → High-level processing (frame classification, scenario recognition) → Extracted scenarios
Video Processing Pipeline
Real-time video acquisition → Detection → Tracking → Feature extraction → Classification → Scene description generator → Visualization
Features Extracted
- Aspect Ratio (AR) = d / (a + b + c)
- Relative Upper Density (RUD) = a / (a + b + c)
- Relative Middle Density (RMD) = b / (a + b + c)
- Relative Lower Density (RLD) = c / (a + b + c)
- Velocity and centroid
[Figure: bounding box of the moving object with the quantities a, b, c and d marked]
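The ratio features above can be computed directly from the labeled quantities. The following Python sketch is illustrative only: it treats a, b, c and d as plain numbers read off the bounding box, and the function names and the (x, y, width, height) box layout used for the centroid and velocity are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ShapeFeatures:
    ar: float   # Aspect Ratio = d / (a + b + c)
    rud: float  # Relative Upper Density = a / (a + b + c)
    rmd: float  # Relative Middle Density = b / (a + b + c)
    rld: float  # Relative Lower Density = c / (a + b + c)

def shape_features(a: float, b: float, c: float, d: float) -> ShapeFeatures:
    """Compute the four ratio features from the labeled quantities a, b, c, d."""
    total = a + b + c
    if total == 0:
        raise ValueError("a + b + c must be non-zero")
    return ShapeFeatures(ar=d / total, rud=a / total, rmd=b / total, rld=c / total)

def centroid(box):
    """Centroid of a bounding box given as (x, y, width, height) (hypothetical layout)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def velocity(prev_box, curr_box, dt=1.0):
    """Centroid displacement per frame interval dt between two consecutive boxes."""
    (x0, y0), (x1, y1) = centroid(prev_box), centroid(curr_box)
    return ((x1 - x0) / dt, (y1 - y0) / dt)

feats = shape_features(a=30, b=40, c=30, d=20)
print(feats)  # RUD, RMD and RLD sum to 1 by construction
print(velocity((10, 20, 40, 100), (14, 20, 40, 100)))  # (4.0, 0.0)
```

With a = 30, b = 40, c = 30 and d = 20 (arbitrary inputs), the sketch reproduces the Walking column of the example table below.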
Video Feature Analysis: Example
Feature Walking Bending
AR 0.2 0.3
RUD 0.3 0.2
RMD 0.4 0.5
RLD 0.3 0.3
Figure 1
Figure 2
Classification Algorithms Used for Activity Detection
- Multi-module back-propagation neural network
- Inductive decision tree learning (C5) algorithm
- Control chart approach
- Bayesian networks
Visualization of Activity with Uncertainty Measure
Example activities shown here: sitting, bending and standing.
Uncertainty is calculated from the classifier output for each event.
The blue pointer indicates the level of certainty in the classifier decision.
Control Chart Approach for Video Activity Classification
A control chart shows the variation of a feature's value over time, together with a graphical depiction of the upper and lower control limits for that feature.
High-level detection with control charts:
1. Identification of each activity.
2. Recognition of when each activity begins and ends.
Control Chart Example (with Upper and Lower Control Limits for each activity)
[Figure: aspect ratio plotted against frame number (frames 1 to 81), with the upper control limit (UCL) and lower control limit (LCL) for standing, sitting and bending]
1.2 Why Audio?
Role of Audio Component
- Obtain information which may not be acquired visually
- Provide additional, comprehensive information that enriches the scene context
Because the number of potential sounds to identify is large, the scope of the problem is very broad.
Audio Processing
Goal: To detect and classify sounds indicative of specific events
Input: A sample of sound in real time
Output: Detected class of the specific sound
Example: sound samples indicating specific objects/events such as explosions and vehicles
What's New?
- Fusion of audio and video for surveillance and scene analysis
- New audio features: spectrum shape modeling coefficients
Audio Processing Pipeline
Audio acquisition → Feature computation (histogram features; spectral features; relative band energies; linear predictive coding / cepstral coefficients) → Choose features → Multi-module back-propagation neural networks
Audio Features
- Amplitude histogram features (width, symmetry, skewness and kurtosis, calculated on a histogram of a 3-second clip)
- Spectral centroid and zero crossing rate
- Relative band energies
- Linear predictive coding coefficients
- Cepstral coefficients
- Spectrum shape modeling coefficients
Audio Enhanced Visual Processing
[Diagram components: video acquisition, video processing and classification, sound acquisition, audio processing and classification, fusion, uncertainty, description generation, visualization]
Audio Visual Classes
- 3 classes of video events: sitting, standing, bending
- 4 classes of sound events: silence, clear speech, babble or speech in noise, alarm sounds (smoke detector class)
Prototype Demonstration
Experimental Results - Video
Sub-scenario recognition accuracy of the Control Chart approach:
Video | Number of frames | Number of sub-scenarios | Number of recognized sub-scenarios
1 | 823 | 11 | 10
2 | 512 | 6 | 6
3 | 701 | 12 | 12
4 | 514 | 9 | 10
Experimental Results - Video
We used 4 different video sequences with a total of 2550 feature vectors: 1072 were used for training and the remaining 1478 for testing.
Classification accuracy using different methods:
- Neural network (back-propagation): 91.34%
- Decision tree (C5): 92.86%
- Naive Bayesian network: 89.61%
- Control chart: 95.70%
Experimental Results - Audio
In this 4-class problem, we obtain a classification accuracy of
- 92% on recorded data (off-line classification)
- 75% for real-time classification in the laboratory acoustic environment
Real-time accuracy is lower because the acoustics of each environment can differ, leading to misclassifications, and because of the characteristics of the recording equipment.
1.3 Representation Scheme
- Audio and visual processing yields information about the scene context
- A representation scheme is needed for the acquired audio-video information
- Generation of a document containing audio-visual information, which can be further processed
XML Based Description
We chose an XML-based representation:
- Widely accepted standard for information exchange
- More comprehensive forms such as XML Schema will be used for representation
- MPEG standards use XML-based audio-visual content management
- Semi-structured, allowing for the addition of user-defined data and information
An XML-based representation allows for standardization, flexibility and extensibility.
The XML-based description is generated automatically; the descriptor gives the state of the observed scenario over a certain time period.
Example Descriptor
[Figure: example XML descriptor showing the header, the moving-object features and activity class, and the complete descriptor]
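A descriptor of this kind can be generated automatically with Python's standard xml.etree.ElementTree module. The sketch below is a minimal illustration; the element and attribute names (SceneDescriptor, Header, MovingObject, Activity, Audio, certainty) and the sample values are hypothetical and are not the schema used in this work.

```python
import xml.etree.ElementTree as ET

def build_descriptor(source, start_frame, end_frame, objects, sound):
    """Return an XML string describing the scene over [start_frame, end_frame].

    objects: list of dicts with keys "id", "features" (name -> value),
             "activity" and "certainty"; sound: dict with "class" and "certainty".
    """
    root = ET.Element("SceneDescriptor")
    header = ET.SubElement(root, "Header")
    ET.SubElement(header, "Source").text = source
    ET.SubElement(header, "TimeInterval", start=str(start_frame), end=str(end_frame))
    for obj in objects:
        node = ET.SubElement(root, "MovingObject", id=str(obj["id"]))
        feats = ET.SubElement(node, "Features")
        for name, value in obj["features"].items():
            ET.SubElement(feats, name).text = f"{value:.3f}"
        activity = ET.SubElement(node, "Activity", certainty=f"{obj['certainty']:.4f}")
        activity.text = obj["activity"]
    audio = ET.SubElement(root, "Audio", certainty=f"{sound['certainty']:.2f}")
    audio.text = sound["class"]
    return ET.tostring(root, encoding="unicode")

doc = build_descriptor(
    "camera-1", 120, 180,
    [{"id": 1, "features": {"AR": 0.2, "RUD": 0.3, "RMD": 0.4, "RLD": 0.3},
      "activity": "standing", "certainty": 0.9063}],
    {"class": "clear speech", "certainty": 0.8},
)
print(doc)
```

Because the output is ordinary XML, downstream tools (data mining, case libraries, visualization) can parse it with any XML library.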
Descriptor Utility
The combined audio-visual descriptor can serve as a base for
- Data mining for unusual events or correlations between events and activities
- Building case libraries of interesting scenarios or particular cases
- Audio-visual fusion and visualization
Discussion
We have shown the feasibility of activity recognition using combined video and audio information.
Future work: integration, extension, elaboration
Next section (path planning): once activities are recognized, the battlefield decision-maker must act.
2. Personnel Movement Planning in a Battlefield
Path computation algorithms for risk minimization
2.1 Path Planning in a Battlefield
Goal: To determine (escape) paths for personnel in a battlefield
Input: A node-weighted graph in which each node represents a geographical location in the battlefield and the node's weight is the associated risk
Quality measure: The quality of an escape path is determined by the cumulative risk of the path
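Problem Formulation
A path P is an acyclic sequence (L_1, L_2, ..., L_n), where L_1 is the initial location of the personnel, L_n is a target or exit point, and each L_i is adjacent to L_{i+1} in the graph.
Determine escape paths which maximize the path quality Q(P), defined as
$$ Q(P) = \sum_{i=1}^{n} \log\bigl(1 - \mathrm{risk}(L_i)\bigr) $$
Equivalently, $Q(P) = \log \prod_{i=1}^{n} \bigl(1 - \mathrm{risk}(L_i)\bigr)$, the log-probability of traversing P without a high level of damage if the risks at different locations are independent.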
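Because each term is log(1 - risk(L_i)), Q(P) is never positive and equals 0 only for a risk-free path. A minimal Python sketch of the quality computation, assuming risks are given as a dictionary from location id to probability (a hypothetical representation):

```python
import math

def path_quality(path, risk):
    """Q(P) = sum over the path's locations of log(1 - risk(L_i)).

    `path` is a sequence of location ids; `risk` maps each id to a probability in [0, 1).
    """
    total = 0.0
    for loc in path:
        r = risk[loc]
        if not 0.0 <= r < 1.0:
            raise ValueError(f"risk({loc}) must lie in [0, 1)")
        total += math.log1p(-r)  # log(1 - r), accurate for small r
    return total

risk = {"A": 0.0, "B": 0.1, "C": 0.5, "D": 0.0}
q = path_quality(["A", "B", "D"], risk)
print(q, math.exp(q))  # exp(Q) is about 0.9: chance of no high-level damage, if independent
```

Paths with higher (less negative) Q(P) are preferred; for example, A B D scores higher than A C.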
Modeling Risks
We define risk as the probability of occurrence of a high level of damage to personnel traversing a path.
Two probabilistic risk models:
- Gaussian distribution: models risks due to specific events such as explosions and chemical threats
- Beta distribution: models risks due to the distribution of events throughout the entire geographical region
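The exact functional forms are not given here, so the sketch below is only one plausible reading: a Gaussian bump of risk around each localized event, and region-wide risk drawn from a Beta distribution. The peak values, sigma, alpha and beta, and the rule that combines several events as independent hazards, are all assumptions.

```python
import math
import random

def gaussian_risk(cell, events, sigma=5.0):
    """Risk at a grid cell from localized events (e.g. explosions, chemical threats).

    Assumed form: an event at (ex, ey) with peak risk p < 1 contributes
    p * exp(-dist^2 / (2 sigma^2)); events are combined as independent hazards,
    1 - prod(1 - r_k), so the result stays in [0, 1).
    """
    x, y = cell
    survive = 1.0
    for (ex, ey), peak in events:
        d2 = (x - ex) ** 2 + (y - ey) ** 2
        survive *= 1.0 - peak * math.exp(-d2 / (2.0 * sigma ** 2))
    return 1.0 - survive

def beta_risk_map(width, height, alpha=2.0, beta=5.0, seed=0):
    """Region-wide risk: each cell's risk drawn independently from Beta(alpha, beta)."""
    rng = random.Random(seed)
    return {(x, y): rng.betavariate(alpha, beta) for x in range(width) for y in range(height)}

events = [((10, 10), 0.9), ((30, 5), 0.6)]
print(round(gaussian_risk((10, 10), events), 3))  # at the first event: high risk
print(round(gaussian_risk((90, 90), events), 3))  # far from both events: close to 0
```

Either map can be fed to the path quality function as the per-node risk.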
Modeling Risks with Gaussian Distribution
Algorithms for Path Planning
- Uniform cost search: finds the optimal solution (Dijkstra's algorithm)
- Simulated annealing
- Evolution strategies (ES): µ+1 ES, stochastic ES, Evolutionary Quenching Strategy (EQS)
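Maximizing Q(P) is equivalent to minimizing the sum of the non-negative node costs -log(1 - risk(L_i)), which is why uniform cost search (Dijkstra's algorithm) finds the optimal path. A minimal Python sketch using the standard heapq module; the adjacency-dictionary graph representation and the function name are hypothetical.

```python
import heapq
import math

def safest_path(neighbors, risk, source, targets):
    """Uniform cost search maximizing Q(P) = sum of log(1 - risk(L_i)).

    neighbors: dict node -> iterable of adjacent nodes; risk: dict node -> [0, 1).
    Node cost is -log(1 - risk) >= 0, so Dijkstra's algorithm is optimal.
    Returns (path, Q) for the best reachable target, or (None, -inf).
    """
    def cost(n):
        return -math.log1p(-risk[n])

    targets = set(targets)
    best = {source: cost(source)}
    parent = {source: None}
    heap = [(best[source], source)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > best[node]:
            continue  # stale heap entry
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], -c
        for nxt in neighbors[node]:
            nc = c + cost(nxt)
            if nc < best.get(nxt, math.inf):
                best[nxt] = nc
                parent[nxt] = node
                heapq.heappush(heap, (nc, nxt))
    return None, -math.inf

neighbors = {"S": ["A", "B"], "A": ["S", "T"], "B": ["S", "T"], "T": ["A", "B"]}
risk = {"S": 0.0, "A": 0.5, "B": 0.1, "T": 0.0}
print(safest_path(neighbors, risk, "S", ["T"]))  # goes through the lower-risk node B
```

The first target popped from the heap is the best one, so several exit points can be searched at once.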
Evolution Strategies
- Initialize population
- Generate offspring at each iteration from a population of size µ
- Replacement strategy:
  µ+1 ES: deterministic replacement; only offspring of higher quality are accepted
  Stochastic ES: the probability of replacement is min[1, Q(offspring)/Q(parent)]
Key Principle of EQS
An evolution strategy which accepts solutions of lower quality with a probability that decreases as the number of iterations increases (the annealing principle):
- Ensures escape from local optima during early stages of the algorithm
- Emphasizes convergence to the optimal solution at later stages of the algorithm
Optimal Route Planning for Battlefield Risk Minimization
[Figure: risk map with source and goal locations; regions shaded as high risk, moderate risk, low risk and risk-free]
Optimal Route Planning for Battlefield Risk Minimization (Contd.)
Simulation Results
The algorithms were simulated on a 100x100 grid with 15 target nodes on the periphery of the grid.
In all instances of the problem, EQS approximates the optimal solution, outperforming simulated annealing and the other variants of ES.
EQS and the other ES variants require relatively little computation time (21 seconds) compared to uniform cost search (470 seconds).
Performance Comparison of Different Algorithms with a Gaussian Distribution for Risk Values
2.2 Multi-Objective Path Planning
In a battlefield, a path can be evaluated with respect to different objectives. Some crucial aspects of a path to be considered are:
- Cumulative risk
- Length of the path
- Reward associated with the target node
Multi-objective Evolutionary Algorithms
Goal: To discover a set of non-dominated solutions with significant diversity
Evolutionary algorithms are well suited to multi-objective optimization because they maintain a population and therefore explore multiple solutions simultaneously.
Multi-objective Evolutionary Algorithms (Contd.)
We have implemented three multi-objective evolutionary algorithms for the path planning problem:
- Pareto Archived Evolution Strategy (PAES): J. D. Knowles and D. W. Corne, "On metrics for comparing non-dominated sets," in Proc. IEEE Congress on Evolutionary Computation (CEC02), pp. 711-716, 2002.
- Non-dominated Sorting Genetic Algorithm (NSGA II): K. Deb, S. Agarwal, A. Pratap, and T. Meyarivan, "A fast and elitist multi-objective genetic algorithm: NSGA II," in Proc. Parallel Problem Solving from Nature VI, pp. 849-858, 2000.
- Evolutionary Multi-objective Crowding Algorithm (EMOCA)
Evolutionary Multi-objective Crowding Algorithm (EMOCA)
- EMOCA considers crowding density in data space for path planning
- Mating opportunities are given to better-quality as well as substantially different individuals
- A stochastic acceptance criterion is used, which depends on the crowding density difference between parent and offspring
Multi-objective Problem Scenario
[Figure: risk map with one source and three goals (Goal-1, Goal-2, Goal-3); regions shaded as high risk, moderate risk, low risk and risk-free]
Multi-objective Problem Scenario (contd.)
Paths are evaluated with respect to three different measures: risk, path length and reward.
Difficult tradeoffs exist: for example, should personnel follow a riskier path to increase the probability of finding a greater reward?
Illustrating Mutually Non Dominating Paths
[Figure: paths P1, P2 and P3 from the source to the goals (goal1, goal2, goal3) over the shaded risk map (high, moderate, low, risk-free)]
Path Quality with respect to Different Measures
Path | Risk | Path length | Reward
P1 | 0.7 | 9 | 0.2
P2 | 0.2 | 14 | 0.5
P3 | 0.7 | 12 | 1
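Best Choice of Path
W-risk | W-path length | W-reward | Best path
Low | High | Low | P1
High | Low | Low | P2
Low | Low | High | P3
(W-risk, W-path length and W-reward denote the weights given to risk, path length and reward.)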
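The best-choice table can be read as the outcome of weighting the three measures. One hypothetical way to reproduce it is a normalized weighted sum; the numeric values 0.1 for Low and 0.8 for High, and the normalization by each measure's maximum, are assumptions made for illustration, not the method used in this work.

```python
# Objective values from the path quality table: (risk, path length, reward)
PATHS = {"P1": (0.7, 9, 0.2), "P2": (0.2, 14, 0.5), "P3": (0.7, 12, 1.0)}
LOW, HIGH = 0.1, 0.8  # hypothetical numeric weights for "Low" and "High"

def best_path(w_risk, w_length, w_reward, paths=PATHS):
    """Pick the path with the lowest weighted score.

    Each measure is divided by its maximum over the candidates so the three are
    comparable; risk and length are penalties, reward is a bonus.
    """
    max_r = max(p[0] for p in paths.values())
    max_l = max(p[1] for p in paths.values())
    max_w = max(p[2] for p in paths.values())

    def score(p):
        risk, length, reward = p
        return w_risk * risk / max_r + w_length * length / max_l - w_reward * reward / max_w

    return min(paths, key=lambda name: score(paths[name]))

for weights in [(LOW, HIGH, LOW), (HIGH, LOW, LOW), (LOW, LOW, HIGH)]:
    print(weights, best_path(*weights))  # P1, P2, P3 respectively
```

A single weighted sum yields one path per weight setting; the multi-objective algorithms instead return the whole non-dominated set and leave the weighting to the decision-maker.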
Performance Comparison
We have used a well-known metric, the C metric, for performance comparison. Smaller values of C(Algorithm2, Algorithm1) indicate better performance of Algorithm1.
We have also obtained C metric values over multiple trials, comparing the solutions obtained by the different algorithms in each trial.
Simulation Results
EMOCA outperforms NSGA II and PAES over 100 trials:
- EMOCA obtains more non-dominated solutions and has lower C metric values than the other algorithms.
- These results indicate that EMOCA performs best of the three algorithms for the path planning application.
C-metrics for Various Pair-wise Algorithm Comparisons
Algorithm1 | Algorithm2 | C(Algorithm2, Algorithm1)
EMOCA (without crossover) | PAES | 0.15
EMOCA (with crossover) | PAES | 0.00
EMOCA (with crossover) | NSGA II | 0.06
Discussion
- Efficient algorithms for risk minimization
- Near-optimal solutions
- Modeled path planning as a multi-objective optimization problem
- Developed a new algorithm (EMOCA) outperforming state-of-the-art multi-objective evolutionary algorithms
Future Work
- Develop multi-objective evolutionary algorithms for other battlefield applications, such as wireless sensor networks employed in surveillance systems
- Develop algorithms for dynamic path planning
- Multiple object detection and tracking, and work on a multi-camera platform
- Develop a comprehensive library of recognizable sounds to provide richer context information
- New methodologies for audio-visual fusion
- Integration with VGIS
Mutation
The mutation step replaces a randomly chosen edge of the path by another sub-path between the same nodes.
For example, in mutating the path a b c d e, a randomly chosen edge of the path, say c d, is replaced by an alternate sub-path c f h d, yielding a b c f h d e.
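A minimal Python sketch of this mutation operator, assuming the battlefield graph is an adjacency dictionary. The alternate sub-path is found here by breadth-first search that avoids the replaced edge and the path's other nodes (to keep the path acyclic); that choice of search and the function names are assumptions.

```python
import random
from collections import deque

def detour(neighbors, u, v, forbidden):
    """Shortest (BFS) sub-path u -> v that avoids the direct edge u-v and every
    node in `forbidden`; returns None when no such detour exists."""
    prev = {u: None}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt in prev or nxt in forbidden:
                continue
            if node == u and nxt == v:
                continue  # skip the edge being replaced
            prev[nxt] = node
            if nxt == v:
                sub = [v]
                while prev[sub[-1]] is not None:
                    sub.append(prev[sub[-1]])
                return sub[::-1]
            queue.append(nxt)
    return None

def mutate(path, neighbors, rng=random):
    """Replace a randomly chosen edge (path[i], path[i+1]) by an alternate
    sub-path between the same two nodes, keeping the path acyclic."""
    if len(path) < 2:
        return list(path)
    i = rng.randrange(len(path) - 1)
    u, v = path[i], path[i + 1]
    sub = detour(neighbors, u, v, set(path) - {u, v})
    if sub is None:
        return list(path)  # no alternate route: leave the path unchanged
    return path[:i] + sub + path[i + 2:]

# The example graph: path a b c d e plus an alternate route c f h d.
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "f"], "d": ["c", "e", "h"],
        "e": ["d"], "f": ["c", "h"], "h": ["f", "d"]}
print(detour(nbrs, "c", "d", {"a", "b", "e"}))  # ['c', 'f', 'h', 'd']
```

When edge c d is the one chosen, mutate returns a b c f h d e, as in the example above.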
Simulated Annealing: Main Steps
- Initialize population: straight-line shortest paths from the source node to the target node
- Mutation of parent to produce offspring
- Stochastic replacement with probability $1 - e^{(Q(\mathrm{offspring}) - Q(\mathrm{parent}))/\mathrm{temperature}}$
Multi-objective Optimization: Preliminaries
The solution to a multi-objective optimization problem is a set of non-dominated vectors. A solution vector x dominates a solution vector y ($x \succ y$) if and only if
$$ \forall i \in \{1,\dots,m\}: f_i(x) \ge f_i(y) \quad\text{and}\quad \exists j \in \{1,\dots,m\}: f_j(x) > f_j(y), $$
where m is the number of objectives (all objectives are maximized). x and y are mutually non-dominating if neither dominates the other.
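The dominance test translates directly into code. In this hypothetical Python sketch, objectives that should be minimized (risk and path length) are negated so that every objective is maximized, as the definition assumes; applied to P1, P2 and P3 from the path quality table, it confirms that they are mutually non-dominating.

```python
def dominates(fx, fy):
    """True if objective vector fx dominates fy (all objectives maximized):
    fx[i] >= fy[i] for every i, and fx[j] > fy[j] for at least one j."""
    return all(a >= b for a, b in zip(fx, fy)) and any(a > b for a, b in zip(fx, fy))

def non_dominated(vectors):
    """Names whose objective vectors are not dominated by any other vector."""
    return [n for n, f in vectors.items()
            if not any(dominates(g, f) for m, g in vectors.items() if m != n)]

# (-risk, -path length, reward): minimized measures are negated.
paths = {"P1": (-0.7, -9, 0.2), "P2": (-0.2, -14, 0.5), "P3": (-0.7, -12, 1.0)}
print(non_dominated(paths))  # ['P1', 'P2', 'P3']: mutually non-dominating
```

Adding a path that is worse than P1 in every measure would leave this output unchanged, since P1 dominates it.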
EMOCA: Main Steps
- Initialize
- Generate mating population
- Generate offspring by crossover and mutation
- Create a new pool consisting of some parents and some offspring
- Trim the new pool to generate the population of the next iteration
Crossover
Two Point Path Crossover operator (2PTPX), which is less disruptive and preserves a major portion of the parent paths.
Consider two parent paths S N1 N3 E1 and S N2 N4 E2, where N1 and N2 are at least four path lengths away from E1 and E2, and nodes N3 and N4 are a few edges away from N1 and N2, respectively.
The crossover operator then generates the offspring S N1 N4 E2 and S N2 N3 E1.
Pareto Archived Evolution Strategy (PAES)
Uses a local search strategy and maintains an archive of non-dominated solutions.
- The parent is mutated to produce an offspring
- If the offspring dominates the parent, it is accepted
- If the offspring and parent are mutually non-dominating, the acceptance decision is based on the squeeze factor of the solutions
Non-dominated Sorting Genetic Algorithm (NSGA II)
- Generates an offspring population of size N from a mating population of size N by crossover and mutation
- Uses binary tournaments to select mating pairs
- Non-dominated sorting of the combined population (parents + offspring) is used to obtain the mating population for the next iteration
Crowding density
The data space crowding density of a path P is defined as density(P) = L / E, where E is the total number of edges in P and L is the sum, over the edges of P, of the number of paths in the current population passing through that edge.
A relatively low value of density(P) indicates that path P does not share many edges with other paths in the population, giving it a relatively high diversity rank.
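A minimal Python sketch of the crowding density L / E, assuming paths are node sequences and edges are undirected; the helper names are hypothetical.

```python
from collections import Counter

def edges(path):
    """Undirected edges of a path given as a node sequence."""
    return [frozenset(e) for e in zip(path, path[1:])]

def crowding_densities(population):
    """L / E for each path in the population: L sums, over the E edges of the
    path, the number of population paths (itself included) using that edge."""
    usage = Counter()
    for path in population:
        usage.update(set(edges(path)))  # count each path at most once per edge
    result = []
    for path in population:
        es = edges(path)
        result.append(sum(usage[e] for e in es) / len(es) if es else 0.0)
    return result

pop = [["S", "A", "B", "T"], ["S", "A", "C", "T"], ["S", "D", "E", "T"]]
print(crowding_densities(pop))  # the first two share edge S-A; the third shares nothing
```

The third path shares no edge with the others, so it gets the minimum possible density of 1.0 and the best diversity rank.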
Salient features of EQS
The acceptance probability of EQS depends on θ, where $\theta = \left(\frac{c + (1 - c)\,i}{M}\right)^{-\gamma}$, i is the current iteration, M is the maximum number of iterations, and c and γ are algorithm parameters.
During initial stages of the algorithm, when i = 0, $\theta = (c/M)^{-\gamma}$, and the probability of acceptance is high. During later stages of the algorithm, when i approaches M, $\theta = \bigl(c/M + (1 - c)\bigr)^{-\gamma}$, and the probability of accepting the offspring is relatively low.
Trimming the New Pool
The new pool is sorted based on the primary criterion of non-domination rank and the secondary criterion of diversity rank.
The new population consists of the first N elements of the sorted list, which contains solutions grouped into different fronts F_1, F_2, ..., F_n, where elements of F_{i+1} are dominated only by elements in F_1, F_2, ..., F_i.
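A minimal Python sketch of the trimming step under stated assumptions: fronts are found by repeatedly removing the currently non-dominated solutions (simple but O(n^3)), all objectives are maximized, and diversity rank 1 goes to the lowest crowding density, with equal densities sharing a rank.

```python
def dominates(fx, fy):
    """fx is no worse in every objective and better in at least one (maximization)."""
    return all(a >= b for a, b in zip(fx, fy)) and any(a > b for a, b in zip(fx, fy))

def non_domination_ranks(objs):
    """Rank 1 for members of front F1, 2 for F2, and so on."""
    ranks = [0] * len(objs)
    remaining = set(range(len(objs)))
    front = 1
    while remaining:
        current = {i for i in remaining
                   if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)}
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks

def diversity_ranks(densities):
    """Rank 1 = least crowded (lowest density); equal densities share a rank."""
    order = sorted(set(densities))
    return [order.index(d) + 1 for d in densities]

def trim(pool, objs, densities, n):
    """Keep the first n pool members sorted by (non-domination rank, diversity rank)."""
    nd, dv = non_domination_ranks(objs), diversity_ranks(densities)
    order = sorted(range(len(pool)), key=lambda i: (nd[i], dv[i]))
    return [pool[i] for i in order[:n]]

pool = ["p", "q", "r", "s"]
objs = [(1, 1), (2, 2), (3, 0), (0, 0)]
print(trim(pool, objs, [1.0, 2.0, 1.5, 1.0], 3))  # ['r', 'q', 'p']
```

Here q and r form F1, with r ranked first because it is less crowded; p is in F2 and s, in F3, is dropped.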
New Pool Generation
The offspring is compared with one of the parents to form the new pool. There are three possible cases:
- Case 1: If the offspring dominates the parent, the offspring is added to the new pool.
- Case 2: If the offspring is dominated by the parent, it is added to the new pool with probability $1 - \exp\bigl(\mathrm{density}(\mathrm{offspring}) - \mathrm{density}(\mathrm{parent})\bigr)$, where density(·) is the data space crowding density.
- Case 3: Otherwise, if the offspring has a lower crowding density than the parent, it is added to the new pool; else the parent is added to the new pool.
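The three cases can be sketched in Python as follows. This is an illustrative reading: f and density are hypothetical callables that return a solution's objective vector (maximized) and its crowding density, and keeping the parent when a dominated offspring is rejected in Case 2 is an assumption, since the description does not say what happens then.

```python
import math
import random

def dominates(fx, fy):
    """fx is no worse in every objective and better in at least one (maximization)."""
    return all(a >= b for a, b in zip(fx, fy)) and any(a > b for a, b in zip(fx, fy))

def new_pool_member(parent, offspring, f, density, rng=random):
    """Return the solution (parent or offspring) that enters the new pool."""
    fo, fp = f(offspring), f(parent)
    if dominates(fo, fp):  # Case 1
        return offspring
    if dominates(fp, fo):  # Case 2
        p = 1.0 - math.exp(density(offspring) - density(parent))
        # p <= 0 when the offspring is at least as crowded as the parent.
        if rng.random() < p:
            return offspring
        return parent  # assumption: the rejected offspring is replaced by the parent
    # Case 3: mutually non-dominating, so keep the less crowded solution.
    return offspring if density(offspring) < density(parent) else parent

objectives = {"par": (1, 2), "off": (2, 1)}
crowding = {"par": 2.0, "off": 1.0}
print(new_pool_member("par", "off", objectives.get, crowding.get))  # 'off' (Case 3)
```

The exponential term makes a dominated offspring more likely to survive the less crowded it is relative to its parent, which preserves diversity.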
Mating Population Generation
- Binary tournament selection is iterated to create the mating pool
- In each step, two randomly chosen members of the current population are compared
- The tournament to determine who enters the mating population is won by the solution with the lower total rank, i.e., the sum of its non-domination rank and diversity rank
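A minimal Python sketch of the binary tournament, assuming the non-domination and diversity ranks have already been computed for each member (lower is better). Equal totals are broken in favor of the first contestant, which is an arbitrary choice.

```python
import random

def mating_population(population, nd_rank, div_rank, size, rng=random):
    """Repeated binary tournaments: of two randomly chosen members, the one with
    the lower total rank (non-domination rank + diversity rank) is selected."""
    pool = []
    for _ in range(size):
        a, b = rng.sample(range(len(population)), 2)
        total_a = nd_rank[a] + div_rank[a]
        total_b = nd_rank[b] + div_rank[b]
        pool.append(population[a] if total_a <= total_b else population[b])
    return pool

print(mating_population(["p1", "p2", "p3"], [1, 1, 2], [2, 1, 1], 4, random.Random(1)))
```

Because selection uses ranks rather than raw objective values, both solution quality and diversity influence who gets to mate.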
Squeeze factor
The squeeze factor of a candidate solution is the number of archive elements located in the same cell of the objective function space, assuming that this space is a finite hypercube divided into (2^d)^m equal-sized, non-overlapping hypercubes (each of the m objective ranges is split into 2^d equal intervals).
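A minimal Python sketch of the squeeze factor, assuming known lower and upper bounds for each objective; each of the m objective ranges is split into 2^d equal intervals, and the function names are hypothetical.

```python
def grid_cell(f, lows, highs, d):
    """Index of the hypercube containing objective vector f when each objective
    range [low, high] is split into 2**d equal intervals."""
    cells = 2 ** d
    idx = []
    for x, lo, hi in zip(f, lows, highs):
        k = int((x - lo) / (hi - lo) * cells)
        idx.append(min(max(k, 0), cells - 1))  # clamp values on or outside the bounds
    return tuple(idx)

def squeeze_factor(candidate, archive, lows, highs, d):
    """Number of archive members in the same grid cell as the candidate."""
    target = grid_cell(candidate, lows, highs, d)
    return sum(1 for a in archive if grid_cell(a, lows, highs, d) == target)

archive = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9)]
print(squeeze_factor((0.3, 0.3), archive, (0, 0), (1, 1), d=1))  # 2
```

A candidate in a sparsely populated cell has a low squeeze factor and is preferred when parent and offspring are mutually non-dominating.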
C-metric
The C metric calculates the fraction of the solutions in one non-dominated set that are dominated by the non-dominated solutions of the other set.
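A minimal Python sketch of the C metric with all objectives maximized. Following the usual definition of this metric, a solution counts as covered when some member of the other set is at least as good in every objective (weak dominance); C(A, B) = 1 means every member of B is covered by A.

```python
def weakly_dominates(a, b):
    """a is at least as good as b in every (maximized) objective."""
    return all(x >= y for x, y in zip(a, b))

def c_metric(A, B):
    """C(A, B): fraction of the solutions in B covered by at least one solution in A."""
    if not B:
        return 0.0
    covered = sum(1 for b in B if any(weakly_dominates(a, b) for a in A))
    return covered / len(B)

A = [(3, 1), (1, 3)]
B = [(2, 1), (1, 1), (4, 4)]
print(c_metric(A, B), c_metric(B, A))  # about 0.667 and 1.0
```

C is not symmetric, so both C(A, B) and C(B, A) are usually reported; here B's point (4, 4) covers all of A, while A covers only two of B's three points.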
Significance of audio features
Histogram features (calculated on the amplitude histogram): width, symmetry, skewness, kurtosis
- Clear voice has an asymmetric, broad histogram
- Voice in noise has a narrower, more symmetric histogram
- Useful in detecting modulations in sound
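The histogram features can be sketched with the Python standard library. Width and symmetry are not defined precisely here, so this sketch uses hypothetical definitions (the standard deviation of the histogram, and the fraction of its mass at or above the mean); skewness and kurtosis are the standardized third and fourth moments. The sampling rate, bin count and test tone are also assumptions.

```python
import math

def amplitude_histogram(samples, bins=64, lo=-1.0, hi=1.0):
    """Bin centers and counts for samples assumed to be normalized to [lo, hi]."""
    counts = [0] * bins
    step = (hi - lo) / bins
    for s in samples:
        k = int((min(max(s, lo), hi) - lo) / step)
        counts[min(k, bins - 1)] += 1
    centers = [lo + (k + 0.5) * step for k in range(bins)]
    return centers, counts

def histogram_features(centers, counts):
    """Width, symmetry, skewness and kurtosis computed on the histogram."""
    n = sum(counts)
    mean = sum(c * w for c, w in zip(centers, counts)) / n
    m2 = sum(w * (c - mean) ** 2 for c, w in zip(centers, counts)) / n
    m3 = sum(w * (c - mean) ** 3 for c, w in zip(centers, counts)) / n
    m4 = sum(w * (c - mean) ** 4 for c, w in zip(centers, counts)) / n
    if m2 == 0:
        raise ValueError("constant signal: histogram has zero spread")
    sd = math.sqrt(m2)
    symmetry = sum(w for c, w in zip(centers, counts) if c >= mean) / n
    return {"width": sd, "symmetry": symmetry,
            "skewness": m3 / sd ** 3, "kurtosis": m4 / sd ** 4}

fs = 8000  # hypothetical sampling rate
samples = [0.8 * math.sin(2 * math.pi * 440 * t / fs) for t in range(3 * fs)]  # 3-second clip
feats = histogram_features(*amplitude_histogram(samples))
print(feats)
```

For a pure tone the histogram is symmetric, so the skewness is close to 0 and the kurtosis is close to 1.5, the value for a sinusoid.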
Other sound environments
We conducted experiments:
- To classify the following environments: air-conditioned rooms, construction site, factory, rail tunnel, warehouse
- To distinguish between types of power tools in a construction setting: drills, hammers, generators, compressors, electric motors
Significance of audio features (cont'd)
- Spectral centroid and zero crossing rate model the spectral distribution and the dominant frequency (pitch) of the sound
- Relative band energies measure the energy in several spectral bands. Speech mostly contains energy in the band below 1 kHz, whereas alarms might have a different distribution
- LPC coefficients and cepstral coefficients give a direct indication of the sampled sound in the time and quefrency domains, respectively
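The spectral centroid, zero crossing rate and relative band energies can be sketched with the standard library alone. In this illustration the spectrum is a direct DFT (slow, but dependency-free), and the band edges (0-1, 1-2 and 2-4 kHz), sampling rate and test tone are hypothetical choices.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """|X[k]| for k = 0..N/2 via a direct DFT (O(N^2); fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(frame)
    freqs = [k * fs / len(frame) for k in range(len(mags))]
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def relative_band_energies(frame, fs, edges=(0, 1000, 2000, 4000)):
    """Share of spectral energy in each band [edges[i], edges[i+1]) Hz."""
    mags = magnitude_spectrum(frame)
    freqs = [k * fs / len(frame) for k in range(len(mags))]
    energies = [sum(m * m for f, m in zip(freqs, mags) if lo <= f < hi)
                for lo, hi in zip(edges, edges[1:])]
    total = sum(energies)
    return [e / total for e in energies] if total else energies

fs, n = 8000, 256
frame = [math.sin(2 * math.pi * 500 * t / fs) for t in range(n)]  # 500 Hz test tone
print(spectral_centroid(frame, fs), zero_crossing_rate(frame), relative_band_energies(frame, fs))
```

For a 500 Hz tone the centroid is close to 500 Hz, roughly one sample pair in eight crosses zero, and nearly all energy falls in the band below 1 kHz.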
Complete XML descriptor
Related Work
- Interpretation system of dynamic scenes (INRIA, France, 2003)
- Robust, Online Event Detection and Classification for Video Monitoring (Cornell University)
- Video Surveillance and Monitoring (Carnegie Mellon University, 2000)
- Work dealing with situational context learning, such as computational auditory scene analysis, Wearable Audio Computing at MIT (2003), and the Technology for Enabling Awareness (TEA) project (2000)
Low Level Processing of Video
- Moving object detection: background subtraction (luminance contrast method); background/template updating
- Moving object tracking: dynamic template; Infinite Impulse Response (IIR)
- Feature extraction: the bounding box is identified, and useful features are extracted from it
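Uncertainty computation
$$ \mathrm{unc}(i) = \frac{\mathrm{output}(i)}{\sum_j \mathrm{output}(j)} $$
Module 1 (standing): output 0.987, unc(1) = 0.987 / (0.987 + 0.01 + 0.092) = 0.9063
Module 2 (standing): output 0.01, unc(2) = 0.01 / (0.987 + 0.01 + 0.092) = 0.0092
Module 3 (sitting): output 0.092, unc(3) = 0.092 / (0.987 + 0.01 + 0.092) = 0.0845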
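The normalization is a one-liner. This Python sketch (the function name is hypothetical) reproduces the example values 0.9063, 0.0092 and 0.0845 from the module outputs 0.987, 0.01 and 0.092.

```python
def uncertainties(outputs):
    """unc(i) = output(i) / sum over j of output(j): each module's share of the total."""
    total = sum(outputs)
    if total <= 0:
        raise ValueError("module outputs must have a positive sum")
    return [o / total for o in outputs]

outputs = [0.987, 0.01, 0.092]  # outputs of modules 1, 2 and 3 in the example
print([round(u, 4) for u in uncertainties(outputs)])  # [0.9063, 0.0092, 0.0845]
```

The normalized values sum to 1, so the largest one can be shown directly as the certainty of the classifier decision.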
Spectral shape coefficients
- Divide the spectrum into 5 bands
- Do a linear regression to find the best-fit line for the spectral envelope in each band
- The slopes of these lines give the coefficients, which indicate the shape of the spectrum
- Inspired by the Kates coefficients
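A minimal Python sketch of the spectral shape coefficients, assuming the spectral envelope is supplied as one value per frequency bin (for example, a smoothed magnitude spectrum in dB) and that the 5 bands have equal width; both are assumptions.

```python
def slope(xs, ys):
    """Least-squares slope of the best-fit line through the points (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def spectral_shape_coefficients(envelope, freqs, n_bands=5):
    """Split the envelope into n_bands equal-width bands (the last band takes any
    leftover bins) and return the slope of the best-fit line in each band."""
    size = len(envelope) // n_bands
    if size < 2:
        raise ValueError("need at least two bins per band")
    coeffs = []
    for b in range(n_bands):
        lo = b * size
        hi = len(envelope) if b == n_bands - 1 else lo + size
        coeffs.append(slope(freqs[lo:hi], envelope[lo:hi]))
    return coeffs

freqs = [k * 31.25 for k in range(129)]     # bins of a 256-point frame at 8 kHz
envelope = [-0.01 * f for f in freqs]        # a straight envelope falling 0.01 per Hz
print(spectral_shape_coefficients(envelope, freqs))  # five slopes of about -0.01
```

A straight envelope gives five equal slopes; real spectra give different slopes per band, and that pattern is what the coefficients capture.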
Frame based classification
Using the available training data, the mean value and standard deviation are computed for each feature f_k and for each class c_i to be discriminated.
For each class c_i, the upper and lower bounds associated with the control chart are obtained:
$$ \mathrm{upperBound}(f_k, c_i) = \mathrm{mean}(f_k, c_i) + \lambda_{f_k,c_i} \cdot \mathrm{standardDeviation}(f_k, c_i) $$
$$ \mathrm{lowerBound}(f_k, c_i) = \mathrm{mean}(f_k, c_i) - \lambda_{f_k,c_i} \cdot \mathrm{standardDeviation}(f_k, c_i) $$
where λ_{f_k,c_i} is a multiplier for feature f_k and class c_i.
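A minimal Python sketch of the bound computation, assuming training data grouped as class -> feature -> values. For simplicity it uses one multiplier k (hypothetical default 3) for every feature and class, whereas the method may use a different multiplier for each pair.

```python
from statistics import mean, stdev

def control_limits(training, k=3.0):
    """Control-chart limits per (feature, class): (mean - k*std, mean + k*std).

    training: dict class -> dict feature -> list of training values.
    """
    limits = {}
    for cls, feats in training.items():
        for feat, values in feats.items():
            m, s = mean(values), stdev(values)
            limits[(feat, cls)] = (m - k * s, m + k * s)
    return limits

training = {"standing": {"AR": [0.2, 0.3, 0.25]}, "sitting": {"AR": [0.5, 0.6, 0.55]}}
print(control_limits(training))  # AR limits of about (0.1, 0.4) and (0.4, 0.7)
```

statistics.stdev is the sample standard deviation; swapping in pstdev would give slightly narrower limits for small training sets.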
Decision in Classification
Final classification uses the majority rule. For instance, if [standing, standing, standing, bending] is the vector of single-feature classifications for the four features, the final conclusion is standing.
Ties are broken by giving priority to one feature:
- A tie between standing and bending is broken in favor of standing if the candidate object's RUD value is closer to mean(RUD, Standing) than to mean(RUD, Bending), and in favor of bending otherwise.
- A tie between standing and sitting is broken by AR.
- A tie between sitting and bending is broken by RLD.
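The majority rule and the pairwise tie-breaking rules can be sketched in Python as follows; the data layout (a list of votes, a dictionary of feature values, and training means keyed by (feature, class)) is hypothetical.

```python
from collections import Counter

# Feature used to break a tie between each pair of classes.
TIE_BREAKER = {
    frozenset({"standing", "bending"}): "RUD",
    frozenset({"standing", "sitting"}): "AR",
    frozenset({"sitting", "bending"}): "RLD",
}

def final_decision(votes, features, class_means):
    """Majority vote over single-feature decisions, with pairwise tie-breaking.

    votes: one class label per feature; features: feature -> observed value;
    class_means: (feature, class) -> mean from the training data.
    """
    counts = Counter(votes).most_common()
    top = counts[0][1]
    tied = [c for c, n in counts if n == top]
    if len(tied) == 1:
        return tied[0]
    if len(tied) == 2:
        a, b = tied
        f = TIE_BREAKER[frozenset(tied)]
        x = features[f]
        # The class whose training mean is closer to the observed value wins.
        return a if abs(x - class_means[(f, a)]) <= abs(x - class_means[(f, b)]) else b
    raise ValueError("three-way tie: not covered by the rules above")

means = {("RUD", "standing"): 0.3, ("RUD", "bending"): 0.2}
print(final_decision(["standing", "bending", "standing", "bending"], {"RUD": 0.28}, means))
```

With four features and three classes, the only possible ties are two-way (2 votes against 2), so the pairwise rules cover every case.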
Recognition of Sub-Scenario
If c (> 0) consecutive decisions at times t, t-1, ..., t-c+1 all differ from the decision made at time t-c, we conclude that a new sub-scenario commenced at time t-c+1.
Otherwise, we attribute the differences to noise and image quality, and presume that the sub-scenario has not changed.
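A minimal Python sketch of this rule. It interprets "the decision at time t-c" as the label of the sub-scenario in force at that time, so that a return to the previous label after a short noisy run is not reported as a new sub-scenario. It also labels a new sub-scenario with the most frequent decision in its first c frames. Both points are assumptions.

```python
from collections import Counter

def sub_scenario_starts(decisions, c):
    """Frame indices at which sub-scenarios start (index 0 starts the first one).

    A new sub-scenario starts at s = t - c + 1 when the c decisions at
    s, ..., t all differ from the label of the sub-scenario in force before s.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    if not decisions:
        return []
    starts, current = [0], decisions[0]
    for s in range(1, len(decisions) - c + 1):
        window = decisions[s:s + c]
        if all(d != current for d in window):
            starts.append(s)
            current = Counter(window).most_common(1)[0][0]  # assumed labeling rule
    return starts

frames = ["sit"] * 5 + ["stand"] * 6 + ["bend"] * 4
print(sub_scenario_starts(frames, c=3))  # [0, 5, 11]
```

Larger values of c suppress more noise but delay, and can miss, short sub-scenarios.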
Video features
Features derived from the moving object and used for activity detection are:
- Aspect ratio
- Velocity
- Relative densities of pixels in the upper, middle and lower bands of the bounding box
- Coordinates of the centroid of the bounding box