SIAM 2014 InvitedTalk

1/19

Learning hierarchical invariant spatio-temporal features for human action and activity recognition

Binu M Nair, Vijayan K Asari

07/08/2014


2/19

Introduction

Applications of activity/action recognition

Gaming (Kinect)

Autonomous visual control of fighter jets by air crew hand gestures

Research Objectives

To detect and recognize harmful activities of individuals of interest from a set/pair of surveillance cameras at long range.

Motivation: how security personnel monitor a crowded environment and locate suspicious activities

Security personnel create a temporary signature of each person in the scene (type of clothing, body shape, etc.)

They identify the action of each person (walking, running, etc.)

They locate the individual performing a suspicious action and then observe closely what that person is doing (from joint movements, etc.)


3/19

Introduction

An automated system that performs these tasks needs 4 different entities

Automatic Pedestrian Unique ID Tagger

Corresponds to security personnel knowing roughly what each person looks like

Human Action Recognition

Seeing which action each person performs: walking, running, bending, etc.

Automatic Detection and Tracking of Specific Body Joints

Closely examining what a particular individual (one performing a suspicious action) is doing

Inference of the activity performed, by joint trajectory analysis based on context

E.g., bending down to place a suitcase, pick up a box, or tie a shoelace


4/19

Motivation

Need a real-time system

Recognize an action or an activity from 15-20 frames of a streaming video

Should not depend on the initialization of action/gait cycle states (starting/ending points of an action cycle)

Should be invariant to speed of motion

Applications

Air crew hand gesture recognition for autonomous visual control of fighter jet

Decision to follow a person based on activity in surveillance.


5/19

Typical Data-flow for Generic Action Recognition System

Feature Extraction: Posture/motion cues (hierarchical invariant features)

Action Segmentation: Segmenting out action instances consistent with the training set

Action Learning and Classification: Learn statistical models to classify new feature observations (based on PCA-Generalized Regression Neural Networks)

[Block diagram with blocks: Video, Feature Extraction, Action Segmentation, Action Learning, Action Model Database, Action Classification]
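The slide names PCA followed by a Generalized Regression Neural Network (GRNN) but does not show the model itself. As a rough sketch of the GRNN part only: in its standard form, a GRNN prediction is a Gaussian-kernel-weighted average of the training targets. The function name `grnn_predict`, the `sigma` value, and the toy data are assumptions for illustration. The PCA projection is omitted, and the way the talk turns per-class regression outputs into a class decision is not specified here.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Standard GRNN output: kernel-weighted average of training targets.

    x        -- query feature vector (sequence of floats)
    train_x  -- list of training feature vectors
    train_y  -- list of scalar training targets
    sigma    -- Gaussian kernel width (hypothetical default)
    """
    weights = []
    for xi in train_x:
        # Squared Euclidean distance between the query and a training sample.
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training samples for this sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Toy usage: two 1-D training samples with targets 0 and 1.
train_x = [[0.0], [1.0]]
train_y = [0.0, 1.0]
midpoint = grnn_predict([0.5], train_x, train_y, sigma=0.5)
near_zero = grnn_predict([0.0], train_x, train_y, sigma=0.1)
```

A smaller `sigma` makes the prediction follow the nearest training sample more closely; a larger one smooths across samples.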


6/19

Feature Extraction and Feature Fusion

[Diagram: Input Frame; Optical Flow; Optical Flow Mag/Dir; Hierarchical Histogram of Oriented Flow, shown as one HOF (N) block and four HOF (N/2) blocks; Masked Region; Quantized Local Binary Pattern; Feature Fusion (+); Action Feature]
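The hierarchical histogram of oriented flow can be sketched as follows, assuming each level splits every region into four quadrants and halves the number of orientation bins, which is one reading of the HOF (N) and four HOF (N/2) blocks in the diagram. The function names, the magnitude weighting, and the per-region normalization are assumptions, and the masked-region step is left out. Under this reading, three levels with N = 20 give 20 + 4·10 + 16·5 = 140 elements, which matches the HHOF size quoted on the feature selection slide, although the talk does not state N.

```python
import math


def hof(flow, n_bins):
    """Magnitude-weighted histogram of flow directions over one region.

    flow is a list of rows; each row is a list of (dx, dy) flow vectors.
    """
    hist = [0.0] * n_bins
    for row in flow:
        for dx, dy in row:
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue  # no motion, so no direction to vote for
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            b = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
            hist[b] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def quadrants(flow):
    """Split a region into its four quadrants."""
    r, c = len(flow) // 2, len(flow[0]) // 2
    return [
        [row[:c] for row in flow[:r]],
        [row[c:] for row in flow[:r]],
        [row[:c] for row in flow[r:]],
        [row[c:] for row in flow[r:]],
    ]


def hhof(flow, n_bins, levels):
    """Concatenate HOFs over a quadrant pyramid, halving bins per level."""
    regions, feature = [flow], []
    for _ in range(levels):
        for region in regions:
            feature.extend(hof(region, n_bins))
        regions = [q for region in regions for q in quadrants(region)]
        n_bins //= 2
    return feature


# Toy usage: an 8x8 flow field moving uniformly to the right.
flow = [[(1.0, 0.0)] * 8 for _ in range(8)]
feature = hhof(flow, n_bins=20, levels=3)
```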

Feature Fusion: assuming that HHOF, LBFP, and the R-Transform (RT) are independent of each other, they can be concatenated one after the other to form the complete feature vector (as in feature fusion in biometric systems)
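Fusion by concatenation is simple enough to show directly. The sketch below assumes the three descriptors are already computed as flat sequences; the function name `fuse_features` and the placeholder zero vectors are illustrative only, with lengths taken from the feature selection slide.

```python
def fuse_features(hhof_vec, lbfp_vec, rt_vec):
    """Concatenate the three descriptors, in a fixed order, into one vector."""
    return list(hhof_vec) + list(lbfp_vec) + list(rt_vec)


# Placeholder descriptors with the element counts quoted in the talk.
hhof_vec = [0.0] * 140
lbfp_vec = [0.0] * 295
rt_vec = [0.0] * 180
action_feature = fuse_features(hhof_vec, lbfp_vec, rt_vec)
```

The order of concatenation has to stay the same between training and testing so that each element keeps its meaning.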


7/19

Feature Selection

Feature Set

3-Level HHOF (140 elements), 2-Level LBFP (295 elements), 2-Level R-Transform (180 elements): total feature set

Problem: the regression model for each action class overfits, becoming tuned to irrelevant and redundant feature elements, which lowers accuracy.

Methodology: Fast Correlation-Based Filter (FCBF) feature selection

Identify relevant features with large correlation values

Remove redundant features and choose a subset of features.

Correlation measure based on Information Theory

Symmetrical Uncertainty (SU) between two random variables X and Y

H(X): entropy of X; IG(X|Y): information about X gained from the knowledge provided by Y
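The equation itself did not survive on this slide. In its standard form, symmetrical uncertainty is SU(X, Y) = 2 · IG(X|Y) / (H(X) + H(Y)), with IG(X|Y) = H(X) − H(X|Y), so it ranges from 0 (independent) to 1 (each variable fully determines the other). The sketch below computes SU for discrete-valued sequences and applies the usual FCBF rule: keep features whose SU with the class exceeds a threshold, then drop any feature that is at least as correlated with an already kept feature as with the class. The function names, the threshold, and the toy data are assumptions, and continuous features would need to be discretized first.

```python
import math
from collections import Counter


def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X|Y) for two equally long discrete sequences."""
    n = len(xs)
    h = 0.0
    for y, count in Counter(ys).items():
        xs_given_y = [x for x, yy in zip(xs, ys) if yy == y]
        h += (count / n) * entropy(xs_given_y)
    return h


def symmetrical_uncertainty(xs, ys):
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0.0:
        return 0.0  # both variables are constant
    information_gain = hx - conditional_entropy(xs, ys)
    return 2.0 * information_gain / (hx + hy)


def fcbf(features, labels, threshold=0.0):
    """features maps a feature name to its list of discrete values."""
    relevance = {name: symmetrical_uncertainty(vals, labels)
                 for name, vals in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)
    selected = []
    for name in ranked:
        # Redundant if a kept feature explains it as well as the class does.
        redundant = any(
            symmetrical_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected)
        if not redundant:
            selected.append(name)
    return selected


# Toy usage: "a" predicts the class, "b" duplicates "a", "c" is irrelevant.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
selected = fcbf(features, labels)
```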


8/19

Algorithm (Training / Testing)


9/19

RESULTS


10/19

Weizmann dataset

10 different actions performed by 9 different persons

Low resolution video at 30 fps

Static background


11/19

Weizmann Dataset

Testing strategy: leave 10 out (the 10 sequences corresponding to one person)

Partial sequence: 15 frames with an overlap of 10 frames
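The testing protocol can be sketched in a few lines, assuming a 15-frame window with a 10-frame overlap means a stride of 5 frames and that each fold holds out every sequence of one person. The function names and the `(person, action, sequence)` tuple layout are hypothetical.

```python
def partial_sequences(frames, window=15, overlap=10):
    """Cut a frame list into overlapping fixed-length partial sequences."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the window size")
    return [frames[s:s + window]
            for s in range(0, len(frames) - window + 1, step)]


def leave_one_person_out(samples):
    """Yield (held_out_person, train, test) folds.

    samples is a list of (person_id, action, sequence) tuples.
    """
    persons = sorted({person for person, _, _ in samples})
    for held_out in persons:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Toy usage: a 30-frame clip, and 9 persons each performing 10 actions.
windows = partial_sequences(list(range(30)))
samples = [(p, a, None) for p in range(9) for a in range(10)]
folds = list(leave_one_person_out(samples))
```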


12/19

Robustness Test (Test for Deformity)

Test sequences shown: With Bag, With Dog, Knees Up, Limping, Moonwalk, Legs Occluded, Normal Walk, With Briefcase, With Pole, With Skirt

Test Seq               1st Best      2nd Best      Median to all actions
Swinging a bag         Walk 2.508    Skip 3.094    3.939
Carrying a briefcase   Walk 1.866    Skip 2.170    3.641
Walking with a dog     Walk 1.806    Skip 2.338    3.824
Knees Up               Walk 2.894    Side 3.270    4.091
Limping Man            Walk 2.224    Skip 2.922    3.821
Sleepwalking           Walk 1.892    Skip 2.132    3.663
Occluded Legs          Walk 1.883    Skip 2.594    2.624
Normal Walk            Walk 1.886    Skip 2.624    3.633
Occluded by a pole     Walk 2.149    Skip 2.945    3.880
Walking in a skirt     Walk 1.855    Skip 2.159    3.540


13/19

Cambridge Hand gesture

9 different hand gestures

Different combinations of shape and motion

5 different illumination conditions


14/19

KTH Action Dataset

6 human actions

25 subjects

4 different scenarios

600 sequences divided into 2391 subsequences

Low res: 160 x 120 at 25 fps



15/19

Results on 4 sets using proposed feature set


16/19

Results on all sets with STIP features


17/19

UCF Sports Dataset

High res: 720 x 480

200 video sequences

Contains 9 actions

Challenges: complex and varying background; wide range of scenes and viewpoint variations

Tested on 8 actions: dive, golf swing, lift, ride, run, skate, swing, and walk

Tested on a window size of 15 frames with an overlap of 10



18/19

Future work in action recognition

Testing on the UCF ARG Dataset

Multi-view human action dataset

Set of actions: boxing, carrying, clapping, digging, jogging, open-close trunk, running, throwing, walking, waving

Challenges: different resolutions across cameras; different kinds of features


19/19

Thank You

Questions?

