Activity Recognition in Video
Shashi Kant
Cognika
www.cognika.com
February 6, 2013
Cognika Introduction
2/7/2013 2
Machine Vision
Real-Time Search
Forensic “Search”
Real-Time Alerting
What we do
• “Search” within FMV
• By Image (OOI)
• By Video Clip
• By Text
• Real-Time
• Activity-based Searching – spatiotemporal querying
Inverted Indexing
Source: developer.apple.com
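A minimal sketch of an inverted index for text, assuming whitespace tokenization and AND semantics for multi-term queries. The function names and sample documents are illustrative and not taken from the slides.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*postings))

# Hypothetical sample documents
docs = {
    1: "red car parked",
    2: "person walking near car",
    3: "red truck moving",
}
index = build_inverted_index(docs)
```

A lookup touches only the posting lists of the query terms, never the full document set.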
Text Indexing Process
[Diagram: text indexing pipeline. Stages shown: Source Document; Analyze; Analyzer; Parser; Tokenizer; Stemmer; Tokens; Payloads; Write to Index; Inverted Index; Indexed Documents]
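The analysis stages named in this diagram (tokenizer, stemmer, tokens with payloads) might look roughly like the sketch below. The suffix rules are a toy stand-in for a real stemmer, and the token position stands in for a payload; both are assumptions made for illustration.

```python
import re

SUFFIXES = ("ing", "ed", "s")  # toy rules, not a real stemming algorithm

def tokenize(text):
    """Split text into lowercase alphanumeric tokens paired with positions."""
    return [(m.group().lower(), pos)
            for pos, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text))]

def stem(token):
    """Strip one common suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Run the tokenizer, then the stemmer; the position rides along as payload."""
    return [(stem(tok), pos) for tok, pos in tokenize(text)]

tokens = analyze("Vehicles parked, persons walking")
```

The resulting (term, position) pairs are what would be written to the inverted index.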
Vector Space Model
• Documents and Queries are “Vectors”
– $D_i = (w_{i,1}, w_{i,2}, w_{i,3}, \ldots, w_{i,n})$
– where $w_{i,j}$ is the weight of “term” $j$ in document $i$
• Cosine Similarity = Cosine of angle between query and stored document
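Written out for a document vector $D_i$ and a query vector $Q = (w_{q,1}, \ldots, w_{q,n})$ (the query notation is an assumption, chosen to mirror the document notation above), the cosine similarity is:

```latex
\cos\theta
  = \frac{D_i \cdot Q}{\lVert D_i \rVert \, \lVert Q \rVert}
  = \frac{\sum_{j=1}^{n} w_{i,j}\, w_{q,j}}
         {\sqrt{\sum_{j=1}^{n} w_{i,j}^{2}}\;\sqrt{\sum_{j=1}^{n} w_{q,j}^{2}}}
```

With non-negative weights the value lies in $[0, 1]$, and a larger value means the document points in a direction closer to the query.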
TF-IDF Vector Space Querying
Document: $d_j = (w_{1,j}, w_{2,j}, \ldots, w_{n,j})$
Query: $q = (w_{1,q}, w_{2,q}, \ldots, w_{n,q})$
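A small sketch of TF-IDF vector space querying over pre-tokenized documents. It assumes raw term counts for TF and ln(N/df) for IDF; the slides do not say which weighting variant is used, and the sample data is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, idf table, TF-IDF vectors) for tokenized documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = [vectorize(d, vocab, idf) for d in docs]
    return vocab, idf, vectors

def vectorize(tokens, vocab, idf):
    """Weight w = term count * idf for every vocabulary term."""
    counts = Counter(tokens)
    return [counts[t] * idf[t] for t in vocab]

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical tokenized documents and query
docs = [["vehicle", "parked"], ["person", "walking"], ["vehicle", "moving", "person"]]
vocab, idf, vectors = tfidf_vectors(docs)
q = vectorize(["vehicle", "parked"], vocab, idf)
ranking = sorted(range(len(docs)), key=lambda j: cosine(q, vectors[j]), reverse=True)
```

Here `ranking` lists document positions from most to least similar to the query.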
Video Indexing Process
[Diagram: video indexing pipeline. Stages shown: Source Video(s); Frame Extraction; Frames; Blob Extraction; Blob Descriptors; Training Set; Object Classification; Metadata; Document Construction; Index “Documents”; Inverted Index]
Simplified Example
[Diagram: simplified indexing example]
Training Image Set: Circle, Triangle
Frame Image
Index Document Representation: Circle <x1,y1>, Triangle <x2,y2>, <x4,y4>
Descriptors: Color, Shape, Texture, Contour
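One way to picture the simplified example: each classified blob in a frame becomes a pair of index tokens, the class label and the label tagged with a coarse grid cell. The grid quantization, the token format, and all names below are assumptions made for illustration, not the representation used in the actual system.

```python
from collections import defaultdict

def frame_to_document(blobs, cell=100):
    """Turn classified blobs into index tokens: label and label@grid-cell."""
    tokens = []
    for label, (x, y) in blobs:
        tokens.append(label)
        tokens.append(f"{label}@{x // cell},{y // cell}")
    return tokens

def index_frames(frames):
    """Build an inverted index from token to the set of frame ids."""
    index = defaultdict(set)
    for frame_id, blobs in frames.items():
        for token in frame_to_document(blobs):
            index[token].add(frame_id)
    return index

# Hypothetical frames: (class label, blob centroid in pixels)
frames = {
    "frame-1": [("circle", (120, 40)), ("triangle", (310, 220))],
    "frame-2": [("circle", (450, 60))],
}
index = index_frames(frames)
```

Once frames are expressed as token "documents", the text-search machinery (inverted index, TF-IDF ranking) applies unchanged.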
Flow Chart
[Flow chart. Nodes shown: Video Stream; Is Camera Moving? (Yes / No); Stabilization; Motion Compensation; Blob Tracking; Extract Blob; Feature Vector; Build Frameset (Sliding Window); In-Memory Index; Disk-based Index; Alerting; Search]
Sliding Window Approach
[Diagram: frames Frame-1, Frame-2, Frame-3, ..., Frame-k, ..., Frame-p, ..., Frame-q, ... grouped into sliding windows Window 1, Window 2, ..., Window w]
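A minimal sketch of building framesets with a sliding window over a frame stream. The `size` and `step` parameters are assumed names; the slides do not give the window length or the overlap.

```python
from collections import deque

def sliding_windows(frames, size, step=1):
    """Yield successive framesets of `size` frames, advancing `step` frames."""
    window = deque(maxlen=size)
    for i, frame in enumerate(frames):
        window.append(frame)
        # Emit once the window is full and its start index is a multiple of step
        if len(window) == size and (i + 1 - size) % step == 0:
            yield list(window)

# Frames 1..7, windows of 4 frames, advancing 2 frames at a time
windows = list(sliding_windows(range(1, 8), size=4, step=2))
```

Because it is a generator over any iterable, the same function works on a live stream without buffering more than one window.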
Sequences Hierarchy
Objects (e.g. Humans, Vehicles)
Events (e.g. Humans Moving, Vehicles Moving)
Activities (e.g. Persons Moving Away, Vehicles Driving Away)
Scenarios (e.g. Humans Gathering around Parked Vehicles)
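To illustrate how one level of the hierarchy can be derived from the level below, the sketch below infers an event ("moving") and an activity ("moving away") from a tracked object's positions. The displacement threshold, the reference point, and both function names are hypothetical; the slides do not describe the actual inference rules.

```python
import math

def is_moving(track, min_displacement=5.0):
    """Event: end-to-end displacement of the track meets a threshold."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    return math.hypot(x1 - x0, y1 - y0) >= min_displacement

def is_moving_away(track, reference, min_displacement=5.0):
    """Activity: the object moves and ends farther from `reference`."""
    if not is_moving(track, min_displacement):
        return False
    return math.dist(track[-1], reference) > math.dist(track[0], reference)

# Hypothetical tracks: centroid positions over consecutive frames
person = [(10, 10), (20, 20), (40, 40)]
parked_vehicle = [(5, 5), (5, 6)]
origin = (0, 0)
```

Scenarios would combine several such activities across objects within the same frameset.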
Video Index
[Diagram. Stages shown: Frameset; Blob Extraction; Object Classification; Metadata (e.g. Date-Time, Resolution etc.); Normalization to adjust for quality; Object-Frame Matrices; Inferred Latent Semantic Graph]
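A rough sketch of an object-frame matrix for one frameset, with a simple co-occurrence graph derived from it. The co-occurrence weighting is only a stand-in: the slides do not say how the latent semantic graph is inferred, and all names here are assumptions.

```python
from itertools import combinations

def object_frame_matrix(frameset):
    """Rows are object classes, columns are frames, cells are blob counts."""
    labels = sorted({label for frame in frameset for label in frame})
    matrix = [[frame.count(label) for frame in frameset] for label in labels]
    return labels, matrix

def cooccurrence_edges(labels, matrix):
    """Weight each pair of classes by the number of frames containing both."""
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(labels), 2):
        weight = sum(1 for x, y in zip(matrix[i], matrix[j]) if x and y)
        if weight:
            edges[(a, b)] = weight
    return edges

# Hypothetical frameset: class labels of the blobs found in each frame
frameset = [["person", "vehicle"], ["person", "person", "vehicle"], ["vehicle"]]
labels, matrix = object_frame_matrix(frameset)
edges = cooccurrence_edges(labels, matrix)
```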
What we Index
• Color histograms
• Shape Descriptors
• Contour Descriptors
• Video Metadata (e.g. date-time, resolution etc.)
• Contextual information (e.g. Geo-location etc.)
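As an illustration of the first item, a color histogram descriptor for a blob can be sketched as below. It assumes 8-bit RGB pixels, 4 bins per channel, and histogram intersection as the similarity measure; none of these choices are stated in the slides.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Quantize each 8-bit channel to a bin, then flatten to one index
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical blobs given as lists of (R, G, B) pixels
red = [(250, 10, 10)] * 4
blue = [(10, 10, 250)] * 4
mixed = [(250, 10, 10)] * 2 + [(10, 10, 250)] * 2
```

Normalizing by pixel count keeps the descriptor comparable across blobs of different sizes.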
Query Clip
Result Clips
Query Response Times
(mean response time averaged over 5 consecutive queries)

Activity Query                           | Mean Response Time (ms) | No. of Results
Parked Vehicle                           | 762                     | 482
Person Walking                           | 482                     | 891
Ingressing Vehicle                       | 319                     | 876
Egressing Vehicle                        | 410                     | 573
Moving Vehicles                          | 890                     | 1098
Vehicle Halting & Person Exiting         | 1028                    | 73
Person Entering Vehicle & Vehicle Moving | 1176                    | 48
Persons Gathering                        | 908                     | 382
Sub-second responses possible at terascale and larger
Prototype UI
Further Research
• Improved Feature Vectors (for sparse features)
• Improved Blob Classifiers
• Improved Stabilization, Background Subtraction & Motion Compensation
• “Super Resolution” Enhancements
We Are Hiring!
Machine Vision Engineers
• OpenCV, and other machine vision toolkits
• OpenGL, CUDA
• Bayesian methods, ANNs, SVMs etc.
• Video Background desirable
Search Engineers
• Lucene, Solr, Elasticsearch
• Hadoop, Katta, ZooKeeper
• Terascale+ Real-Time Search Experience desirable