Optimizing Pedestrian Detection For Real-time Automotive Applications
Vladimir Glavtchev, Automotive Computer Vision Engineer, NVIDIA
Agenda
§ Scope of this presentation
§ Introduction to Pedestrian Detection
§ Survey of Pedestrian Detection Techniques
§ Academic Focus
§ Production ADAS Focus
§ Optimizations
§ A Path to Production
§ Demo
Scope of the Presentation
§ What you will see in this talk: — Introduction to state-of-the-art techniques in Pedestrian Detection
— High-level concepts
— The gap between academia and industry in ADAS
— Optimizations needed to bridge this gap
— Hardware recommendations for various algorithms
§ What you will not see in this talk: — Low-level CUDA-specific optimizations
— Marketing pitch / jargon
— Code!
Introduction to Pedestrian Detection
§ Examples
Image courtesy: Dollár et al. – “Pedestrian Detection: An Evaluation of the State of the Art”
Research Examples
§ Example vehicles — VisLab
— Lexus
Image courtesy VisLab Image Courtesy Lexus
Our goal: Integrated system
§ Windshield mounted camera(s)
Image courtesy Valeo
Production Examples
§ Current night vision pedestrian detection systems — German automakers
— Volvo, Lexus, etc.
Image courtesy BMW AG
Terminology
§ Introduction to terminology: — True positive (correct alarm)
§ System correctly alarms when a pedestrian is present.
— False positive (false alarm) § System raises an alarm when no pedestrian is present.
— True negative (correct non-alarm) § System correctly does not alarm because no pedestrian is present.
— False negative (incorrect non-alarm) § System fails to alarm when a pedestrian is actually present.
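The four outcomes above can be tallied per frame. The following is a minimal sketch, not taken from the talk: the function names and the sample alarm and ground-truth flags are illustrative assumptions.

```python
def count_outcomes(alarms, pedestrians):
    """Tally the four outcomes from per-frame alarm and ground-truth flags."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for alarm, present in zip(alarms, pedestrians):
        if alarm and present:
            counts["TP"] += 1      # correct alarm
        elif alarm and not present:
            counts["FP"] += 1      # false alarm
        elif not alarm and not present:
            counts["TN"] += 1      # correct non-alarm
        else:
            counts["FN"] += 1      # missed pedestrian
    return counts


def detection_rate(counts):
    """Fraction of frames with a pedestrian in which the system alarmed."""
    actual = counts["TP"] + counts["FN"]
    return counts["TP"] / actual if actual else 0.0


def false_alarm_rate(counts):
    """Fraction of frames without a pedestrian in which the system alarmed."""
    absent = counts["FP"] + counts["TN"]
    return counts["FP"] / absent if absent else 0.0


# Hypothetical sample data: one flag per frame.
alarms      = [True, True,  False, False, True, False]
pedestrians = [True, False, False, True,  True, False]
counts = count_outcomes(alarms, pedestrians)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```

A production system would typically tune its thresholds to push `false_alarm_rate` toward zero while keeping `detection_rate` acceptable.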
Terminology
§ Introduction to terminology: — Preprocessing
§ Steps taken to rectify, undistort, enhance, and otherwise enable easier classification and tracking.
— Classification § Detecting a pedestrian in an image.
— Tracking § “Following” a pedestrian in a sequence of images after a classification.
Classification State-of-the-Art Techniques
• Haar wavelets
• Histograms of Oriented Gradients (HOG)
• Local Binary Patterns (LBP)
Classification State-of-the-Art Techniques
§ Current state of the art: — Sped-up version of the ChnFtrs1 classifier (Integral Channel Features)
— HOG-like features + Cascade structure + SVM2
1 Dollár et al. – “The Fastest Pedestrian Detector in the West”
2 Benenson et al. – “Pedestrian Detection at 100 Frames per Second”
Academic Focus
§ Accuracy
§ Novelty
§ Trade-offs
§ Grayscale / color
§ Depth / stereo
§ Public datasets — Not necessarily representative of in-vehicle camera footage (INRIA, TUD Brussels)
Academic Detection Pipeline
Input Frame → Preprocessing → Classification → False Positive Reduction → Tracking
Naïve approach: N models, 1 image scale
Traditional approach: 1 model, N image scales
Image courtesy: Benenson et al – “Pedestrian Detection at 100 frames per second”
Naïve Implementation
Traditional Academic Implementation
§ Runtime performance — ChnFtrs runs at ~1 fps on modern desktop HW
P. Dollár, C. Wojek, B. Schiele and P. Perona “Pedestrian Detection: An Evaluation of the State of the Art”
Academic Pipeline Performance
§ Scaling is expensive
§ Reduce scaling by a factor of K
Classification Optimization – Approach I
Dollár “FPDW” approach: 1 model, N / K image scales
§ Runtime performance — Optimization achieves 3–7 fps on average
— But that’s still not fast enough!
Image courtesy: Dollár et al - “Pedestrian Detection: An Evaluation of the State of the Art”
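The idea of reducing the number of rescaled images from N to N / K can be sketched as follows. This is an illustration only: the geometric scale spacing, the choice of 4 octaves with 8 scales per octave, and the function names are assumptions, not figures from the talk.

```python
def pyramid_scales(n_octaves, scales_per_octave):
    """Geometric scale factors for an image pyramid, from 1.0 downward."""
    n = n_octaves * scales_per_octave
    return [2.0 ** (-i / scales_per_octave) for i in range(n)]


def split_scales(scales, k):
    """Keep every k-th scale for real resizing; the rest would be
    approximated from the nearest computed scale."""
    computed = scales[::k]
    approximated = [s for i, s in enumerate(scales) if i % k != 0]
    return computed, approximated


scales = pyramid_scales(n_octaves=4, scales_per_octave=8)  # N = 32
computed, approximated = split_scales(scales, k=8)         # N / K = 4
print(len(scales), len(computed), len(approximated))       # 32 4 28
print(computed)                                            # [1.0, 0.5, 0.25, 0.125]
```

Only the `computed` scales require resizing the image and recomputing features, which is where the expensive work sits.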
Classification Optimization – Approach I
§ Don’t scale image
§ Scale model by a factor of K
Classification Optimization – Approach II
Benenson “100 fps” approach: N / K models, 1 image scale
Benenson et al – “Pedestrian detection at 100 frames per second”
Performance: Approach II
Core i7 870: 0.08 fps
NVIDIA GeForce GTX 470 (14 SMs): 1.38 fps
NVIDIA GeForce GT 640 (2 SMs): 0.20 fps
Target Automotive GPU (1 SM): 0.10 fps
§ Runtime performance — Performance for VGA (640x480) images
— GPU offers 15x performance boost over CPU
— Simple scaling of performance to number of SMs
Search Space Reduction: Approach II
§ Reduce search space using stixels
Benenson et al – “Fast stixel computation for fast pedestrian detection”
Production ADAS Focus
§ In-vehicle integration
§ Real-time operation
§ Accuracy
§ No false positives — False alarms can put the driver in dangerous situations
— False alarms reduce driver confidence in the system
§ Cost — ASICs, FPGAs, General-purpose processors
§ Power
From Academia to Production
§ Cameras — Color vs. Grayscale
§ Implications of grayscale: Retraining classifiers, reduced detection rate
— Monocular vs. Stereo § Implications of mono: No depth information, larger search space
— Infrared § Can significantly simplify night vision detection
§ Sensor fusion — Availability of lidar / radar
§ Can provide depth information for monocular cameras
Proposed Solution: Motion Estimation
§ Observation: To an observer on a moving vehicle, closer objects appear to move faster than objects farther away
Towards ADAS: Motion Estimation
§ Calculate motion vector for each N x N pixel block — Compute motion from previous frame to current frame
— Possible methods: § Optical flow algorithms
§ Lucas-Kanade, Block-matching, Horn and Schunck
§ Block-based Iterative Motion Estimation (used for video encoding)
— Great fit for GPU because these algorithms are very parallel § Most operate on blocks of pixels
- Function available in OpenCV library
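Per-block motion estimation can be illustrated with exhaustive block matching. This is a minimal pure-Python sketch under assumptions not in the talk: a sum-of-absolute-differences cost, a 4 x 4 block, a search range of 2 pixels, and hypothetical function names. A real system would use an optimized GPU or OpenCV routine instead.

```python
def sad(prev, curr, py, px, cy, cx, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(
        abs(prev[py + r][px + c] - curr[cy + r][cx + c])
        for r in range(n) for c in range(n)
    )


def block_motion(prev, curr, n=4, search=2):
    """Return {(block_row, block_col): (dy, dx)} from prev to curr."""
    h, w = len(prev), len(prev[0])
    vectors = {}
    for by in range(0, h - n + 1, n):
        for bx in range(0, w - n + 1, n):
            # Start from zero motion so static blocks keep (0, 0).
            best = (sad(prev, curr, by, bx, by, bx, n), 0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cy, cx = by + dy, bx + dx
                    if 0 <= cy <= h - n and 0 <= cx <= w - n:
                        cost = sad(prev, curr, by, bx, cy, cx, n)
                        if cost < best[0]:
                            best = (cost, dy, dx)
            vectors[(by // n, bx // n)] = (best[1], best[2])
    return vectors


def frame_with_patch(top, left, size=8):
    """Synthetic grayscale frame: black with one bright 2 x 2 patch."""
    img = [[0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            img[r][c] = 255
    return img


prev_frame = frame_with_patch(1, 1)
curr_frame = frame_with_patch(1, 2)   # patch moved one pixel to the right
vectors = block_motion(prev_frame, curr_frame)
print(vectors[(0, 0)])  # (0, 1)
```

Each block's search is independent of every other block's, which is why this kind of algorithm maps well onto a GPU.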
Towards ADAS: Motion Example
Towards ADAS: Motion Segmentation
§ Segment blocks of pixels which have a motion vector with: — High confidence
— High enough magnitude
— Similar direction
§ These segments represent the “foreground” — Objects which are moving faster than those around them
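The three segmentation criteria can be sketched as a filter over per-block motion vectors. This is a simplified illustration: the thresholds, the confidence values, and the use of a single dominant direction (rather than grouping neighboring blocks) are all assumptions made for this example.

```python
import math


def segment_foreground(vectors, min_conf=0.5, min_mag=1.0, max_angle_deg=30.0):
    """vectors: {block: (dy, dx, confidence)}. Returns the set of blocks
    that are confident, large enough, and close to the dominant direction."""
    # Criteria 1 and 2: high confidence and high enough magnitude.
    strong = {
        b: (dy, dx) for b, (dy, dx, conf) in vectors.items()
        if conf >= min_conf and math.hypot(dy, dx) >= min_mag
    }
    if not strong:
        return set()
    # Dominant direction taken as the direction of the vector sum.
    sum_dy = sum(v[0] for v in strong.values())
    sum_dx = sum(v[1] for v in strong.values())
    dominant = math.atan2(sum_dy, sum_dx)
    foreground = set()
    for b, (dy, dx) in strong.items():
        diff = abs(math.atan2(dy, dx) - dominant)
        diff = min(diff, 2 * math.pi - diff)   # wrap around +/- pi
        # Criterion 3: similar direction.
        if math.degrees(diff) <= max_angle_deg:
            foreground.add(b)
    return foreground


# Hypothetical vectors: (dy, dx, confidence) per block.
sample = {
    (0, 0): (0, 3, 0.9),     # strong, moving right
    (0, 1): (0, 2, 0.8),     # strong, moving right
    (1, 0): (0, 0.2, 0.9),   # magnitude too small
    (1, 1): (0, 3, 0.2),     # confidence too low
    (2, 0): (-2, 0, 0.9),    # strong, but moving in a different direction
}
foreground = segment_foreground(sample)
print(sorted(foreground))  # [(0, 0), (0, 1)]
```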
Towards ADAS: Geometry Reduction
§ Reducing the classification search space using geometric constraints
— Pedestrians cannot be taller than a certain height
— Pedestrians cannot be shorter than a certain height
— Pedestrians cannot be detached from the ground
§ What is needed? — Estimate of the ground plane
— Vehicle / camera pitch information
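The height constraints can be checked from a candidate box and the estimated horizon. The sketch below assumes a pinhole camera over flat ground, image rows that increase downward, and a horizon row already corrected for pitch. Under those assumptions an object's height is `camera_height * (bottom - top) / (bottom - horizon)`. The camera height, height limits, and function names are illustrative, not values from the talk.

```python
def estimated_height_m(top_row, bottom_row, horizon_row, camera_height_m):
    """Estimate the real-world height of an object standing on flat ground.

    A ground point at distance Z projects f * camera_height / Z rows below
    the horizon, so the focal length f cancels out of the height estimate.
    """
    rows_below_horizon = bottom_row - horizon_row
    if rows_below_horizon <= 0:
        return None  # base at or above the horizon: not on the ground
    return camera_height_m * (bottom_row - top_row) / rows_below_horizon


def plausible_pedestrian(box, horizon_row, camera_height_m,
                         min_height_m=1.0, max_height_m=2.2):
    """box = (top_row, bottom_row). True if the implied height is plausible."""
    top_row, bottom_row = box
    height = estimated_height_m(top_row, bottom_row, horizon_row, camera_height_m)
    return height is not None and min_height_m <= height <= max_height_m


HORIZON_ROW = 240      # assumed horizon estimate for a 480-row image
CAMERA_HEIGHT_M = 1.2  # assumed camera mounting height

candidates = {
    "person":    (200, 300),  # about 2.0 m tall
    "too_tall":  (100, 300),  # about 4.0 m tall
    "too_short": (280, 300),  # about 0.4 m tall
    "floating":  (150, 230),  # base above the horizon
}
results = {name: plausible_pedestrian(box, HORIZON_ROW, CAMERA_HEIGHT_M)
           for name, box in candidates.items()}
print(results)
```

Only candidates that pass this check need to be handed to the classifier, which is how the geometric constraints shrink the search space.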
[Figure: Geometry reduction example, labeling the tracking bounding box, the classification bounding box, the maximum and minimum allowed heights, the estimated horizon, and the estimated pedestrian base]
Pedestrian tracking
§ Why do we need to track? — Classification does not succeed in every frame
— Gives us a better approximation of a pedestrian’s trajectory
§ Tracking using motion information — Median Flow
§ Closed-loop tracking — MeanShift, CamShift, and TemplateMatching
- Function available in OpenCV library
Tracking: MeanShift
Histogram of tracking box in Hue space
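MeanShift tracking with a hue histogram can be sketched in three steps: build the histogram of the tracked box, back-project it onto the next frame, and move the box to the weighted centroid until it stops moving. The pure-Python version below is illustrative only. The 8-bin histogram, the 0 to 179 hue range, the synthetic frames, and the function names are assumptions; a real system would use an optimized library routine.

```python
import math


def hue_bin(h, bins=8, hue_max=180):
    """Map a hue value in [0, hue_max) to a histogram bin."""
    return min(h * bins // hue_max, bins - 1)


def hue_histogram(hue_patch, bins=8):
    """Normalized hue histogram of the tracked box."""
    hist = [0.0] * bins
    count = 0
    for row in hue_patch:
        for h in row:
            hist[hue_bin(h, bins)] += 1
            count += 1
    return [v / count for v in hist]


def back_project(hue_img, hist, bins=8):
    """Per-pixel weight: how likely each hue is under the target histogram."""
    return [[hist[hue_bin(h, bins)] for h in row] for row in hue_img]


def mean_shift(weights, box, max_iter=20):
    """box = (top, left, height, width). Shift the box to the weighted
    centroid of its contents until it no longer moves."""
    top, left, bh, bw = box
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sy = sx = 0.0
        for r in range(top, top + bh):
            for c in range(left, left + bw):
                w = weights[r][c]
                total += w
                sy += w * r
                sx += w * c
        if total == 0:
            break  # target lost: nothing to follow inside the window
        new_top = math.floor(sy / total - (bh - 1) / 2 + 0.5)
        new_left = math.floor(sx / total - (bw - 1) / 2 + 0.5)
        new_top = max(0, min(rows - bh, new_top))
        new_left = max(0, min(cols - bw, new_left))
        if (new_top, new_left) == (top, left):
            break  # converged
        top, left = new_top, new_left
    return (top, left, bh, bw)


def hue_frame(top, left, size=10, target_hue=10, background_hue=100):
    """Synthetic hue image with a 3 x 3 target on a uniform background."""
    img = [[background_hue] * size for _ in range(size)]
    for r in range(top, top + 3):
        for c in range(left, left + 3):
            img[r][c] = target_hue
    return img


frame1 = hue_frame(2, 2)
frame2 = hue_frame(3, 4)                       # target moved down 1, right 2
box = (2, 2, 3, 3)
patch = [row[2:5] for row in frame1[2:5]]      # contents of the tracked box
hist = hue_histogram(patch)
tracked = mean_shift(back_project(frame2, hist), box)
print(tracked)  # (3, 4, 3, 3)
```

The tracker only recovers the target while the old box still overlaps it, which is one reason periodic re-classification is still needed.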
Traditional Academic Pipeline: Input Frame → Preprocessing → Classification → False Positive Reduction → Tracking
Proposed Optimized Pipeline: Input Frame → Preprocessing → Geometry Reduction → Depth/Motion Segmentation → Classification → Tracking
[Figure annotations (Gain / HW): 15x, 15x, Motion Estimation Free, 8x, 44x]
Demo
§ Video demo
Benenson et al – “Pedestrian detection at 100 frames per second”
Integrated ADAS Performance
NVIDIA GeForce GTX 470 (14 SMs): 50 fps
NVIDIA GeForce GT 640 (2 SMs): 7.14 fps
Target Automotive GPU (1 SM): 3.57 fps
§ Monocular results
NVIDIA GeForce GTX 470 (14 SMs): 135 fps
NVIDIA GeForce GT 640 (2 SMs): 19.29 fps
Target Automotive GPU (1 SM): 9.64 fps
§ Stereo stixels + ground plane results — Motion Estimation & Geometric Reduction
Conclusions
§ Full-frame classification is not fast enough yet
§ GPU acceleration can lead to a large speed-up in classification
§ Classification search space must be significantly reduced to achieve real-time results
§ Recent academic research is adopting concepts from practical system deployment
§ Advances in sensors and processors will enable very high frame rates, which will free up resources for other tasks
Thank You!
§ Special thanks to:
— Elif Albuz
— Phillip Smith
— Shalini Gupta
— Khanh Duc