Confidential
Dr. Gaby Hayon, EVP R&D
Mobileye Sensing Status and Road Map
November 2019
Smart agent for harvesting, localization and dynamic information for REM based map
ADAS products working everywhere and at all conditions on millions of vehicles
Sensing state for ME policy under the strict rule of independence and redundancy.
The Challenge of Sensing for the automotive market
ME sensing has three demanding customers
True redundancy
Surround computer vision; Radar/Lidar sub-system
ME’s AD Perception
Surround computer vision: comprehensive env. model
Comprehensive CV Environmental Model
Full and unified surround coverage of all decision-relevant environment elements.
These are generally grouped into 4 categories:
Road Geometry (RG): All driving paths, whether explicitly, partially or implicitly indicated, together with their surface profile and surface type.
Road Boundaries (RB): Any delimiter of the drivable area, its 3D structure and semantics. Both laterally delimiting elements (FS) and longitudinally delimiting ones (general objects/debris).
Road Users (RU): 360-degree detection and inter-camera tracking of any movable road user, and the actionable semantic cues these users convey (light indicators, gestures).
Road Semantics (RS): Road-side directives (TFL/TSR), on-road directives (text, arrows, stop line, crosswalk) and their DP association.
Object detection DNNs
Texture engine, example
Structure engine, example
Robust CV Environmental Model
Multiple independent visual-processing engines overlap in their coverage of the 4 categories (RG, RB, RU, RS), in order to satisfy the extremely low nominal failure frequencies required of the CV sub-system.
Lanes detection DNN
Single view Parallax-net elevation map
Semantic Segmentation engine
Multi-view Depth network
Generalized-HPP (VF)
Wheels DNN
Road Semantic Networks
RG
RB, RU ,RS
RB, RU
RB, RU
RG
RU
RU
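As a rough illustration of why overlapping engines help reach very low failure frequencies: if the engines covering one category fail independently, the probability that all of them miss at the same time is the product of their individual miss probabilities. The sketch below assumes statistical independence; the function name and the miss rates are made up for illustration and are not taken from the deck.

```python
from math import prod

def joint_miss_probability(miss_probs):
    """Probability that every engine misses, assuming independent failures."""
    if not all(0.0 <= p <= 1.0 for p in miss_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return prod(miss_probs)

# Hypothetical per-engine miss rates for one category (illustrative only).
example = joint_miss_probability([1e-3, 1e-2, 1e-2])
```

The product only holds to the extent that the engines really fail independently, which is why the deck stresses engines built on different cues (texture, structure, segmentation).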
Actionable CV Environmental Model
Support of different driving decisions and planning requires extraction of an additional, essential set of contextual cues:
▪ Longitudinal and lateral driving plans / decisions
• Overtake: is the vehicle an obstacle?
• Lane change: “give-way” / “take-way” labeling of objects
• Assessment of objects' likely trajectories from the scene
▪ VRU-related drive planning: pedestrian trajectory, intentions (head/body pose), relevance, vulnerability and host-path access.
▪ Environmental limitations: visibility range, blockage, occlusions/view range, road friction.
▪ Safe-stop possibility: is the road shoulder drivable? Is it safe to stop?
▪ Emergency/enforcement response: emergency vehicle / personnel detection, gesture recognition.
Visual perception
Environment Model Elements
Road Users
360-degree detection and inter-camera tracking of any movable road user, and the actionable semantic cues these users convey (light indicators, gestures)
On top of the standalone object detection networks running on all cameras, two dedicated 360-stitching engines have been developed to ensure completeness and coherency of the unified objects map:
• Vehicle signature
• Very close (part-of) vehicle in FOV: face and limits
“Full Image Detection”: raw signal
“Full Image Detection” output: short-range precise detection
Road Users
Temporal tracker
Dimension net output
Metric physical dimensions estimation
Dramatically improves measurement quality using novel methods
Road Users
Wheels: a road-user part (relatively regular in shape) that we deliberately detect in order to confirm vehicle detections, 3D position and tracking for high-function customers.
Road Users
▪ The semantic segmentation captures all road users, providing redundancy to the dedicated networks.
▪ It also captures extremely small visible fragments of road users; these may potentially be used as scene-level contextual cues.
Road Users – open door
An open car door is given its own class, as it is extremely common, safety-critical, and has no ground intersection.
Road Users - VRU
Baby strollers and wheelchairs are detected through a dedicated engine on top of the highly mature pedestrian detection system
Surround-view stitched SR FS
Road Boundaries
Occupancy Grid:
▪ Fusion of the free-space signal from the 4 parking cameras and the front camera
▪ Main usages: a very accurate signal for handling crowded scenes, and a redundancy layer for object detection, specifically for general objects such as containers, cones, carts, etc.
▪ The known scene (road edges and detected objects) is compared with the occupancy grid; the differences are marked and reported as unknown objects.
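The comparison between the known scene and the occupancy grid can be pictured as a set difference followed by grouping of the leftover cells. The following is a minimal sketch under simplifying assumptions: the grid is a set of (row, col) cells, the known scene is already rasterized onto the same grid, and leftover cells are grouped by 4-connectivity. The function names and the sample cells are hypothetical and do not describe the production engine.

```python
def unknown_cells(occupied, known):
    """Return occupied grid cells that the known scene does not explain."""
    return occupied - known

def group_cells(cells):
    """Group 4-connected cells into candidate unknown objects."""
    remaining = set(cells)
    groups = []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    group.add(nb)
                    stack.append(nb)
        groups.append(group)
    return groups

# Hypothetical cells: occupancy from the fused free-space signal, and cells
# already explained by road edges and detected objects.
occupied = {(0, 0), (0, 1), (5, 5), (5, 6), (9, 2)}
known = {(0, 0), (0, 1)}
unknown_objects = group_cells(unknown_cells(occupied, known))
```

Each resulting group is a candidate "unknown object" (for example a cone or a cart) that the dedicated detectors did not report.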
Road Users
Emergency vehicle, light indicators
Pedestrian understanding
Road users semantics
▪ Head/pose orientation
▪ Pedestrian posture/gesture
▪ Vehicle light indicators
▪ Emergency vehicle/personnel classification
Road Users
Pedestrian Gesture Understanding
Come closer | Stop! | On the phone | You can pass
Road Users
• Redundant to the appearance-based engines
• Reinforces detection and measurements to support a higher level of end-functions
• E.g. dealing with “rear-protruding” objects, which hover above the object's ground intersection.
Dense Structure-based Object detection
Road Users
[Figure labels: 100°, 100°, 100°]
• Infers depth in "center" view using input from "center" and overlapping "surround" cameras
• Flexibility in camera placement and orientation compared to canonical stereo-baseline camera pair setups
• Covers blind regions using, e.g., a parking camera in the front region
• The learning-based approach allows finding good object shape priors and predicting depth in texture-less regions
• Angular resolution much higher than Lidar
• Provides independent measurement and detection modality
• Does not rely on manual labeling
• Predicts per-pixel depth independent of Lidar
DNN based multi-view stereo
How do we do this?
Leveraging Lidar Processing Module for Stereo Camera Sensing – “Pseudo-Lidar”
Road Users
Dense depth image from stereo cameras
High-res Pseudo-Lidar object detection
Upright obstacle ‘stick’ extraction
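The "Pseudo-Lidar" idea of feeding camera depth into a Lidar processing module rests on turning a dense depth image into a 3D point cloud. A minimal sketch of that back-projection step is shown below, assuming a simple pinhole camera model with known intrinsics (fx, fy, cx, cy); the function name and the data layout are illustrative assumptions, not the deck's implementation.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a dense depth image into 3D camera-frame points.

    depth: list of rows, depth[v][u] in meters (None or <= 0 means invalid).
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed known from calibration).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # skip pixels without a valid depth estimate
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# Tiny hypothetical 2x2 depth image with two valid pixels.
cloud = depth_to_points([[2.0, 0.0], [None, 4.0]], fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

The resulting point list can then be handed to the same kind of object detection that would normally consume Lidar returns.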
Road Users
• The RSS safety envelope should not be violated, even in areas with limited visibility
• To ensure that, we must determine whether an object goes undetected because it does not exist or because it is occluded
• The solution: creating a 360-degree visibility envelope and measuring the visibility range at all angles
• Computed from information gathered from all cameras and the following features:
- Free space and road edges
- Vehicle and pedestrian detection
- REM map and road elevation
View Range
knowing that you don’t know
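One way to picture a 360-degree visibility envelope is as a per-bearing table of how far the system can currently see. The sketch below is a simplified stand-in: each occluder is reduced to a bearing, a range and an angular half-width, and the envelope is the nearest occluder range in each angular bin. The names, the 100 m default range and the occluder format are assumptions made for illustration.

```python
import math

def visibility_envelope(occluders, max_range=100.0, bins=360):
    """Per-bearing visible range (meters) around the ego vehicle.

    occluders: iterable of (bearing_deg, range_m, half_width_deg) tuples,
    a simplified stand-in for detected vehicles, pedestrians, road edges, etc.
    """
    step = 360.0 / bins
    envelope = [max_range] * bins
    for bearing, rng, half_width in occluders:
        first = math.floor((bearing - half_width) / step)
        last = math.floor((bearing + half_width) / step)
        for i in range(first, last + 1):
            idx = i % bins  # wrap around 0/360 degrees
            envelope[idx] = min(envelope[idx], rng)
    return envelope

def is_occluded(envelope, bearing_deg, range_m):
    """True if a point lies beyond the visible range at its bearing."""
    step = 360.0 / len(envelope)
    idx = math.floor((bearing_deg % 360.0) / step) % len(envelope)
    return range_m > envelope[idx]

# Hypothetical occluder: a vehicle 20 m ahead spanning +/- 5 degrees.
env = visibility_envelope([(0.0, 20.0, 5.0)])
```

A query such as `is_occluded(env, 0.0, 30.0)` then distinguishes "nothing was detected there" from "that spot cannot be seen".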
Road Users
Policy-level applications: placing "fake targets" in occluded areas that intersect with the ego vehicle's planned path, assuming plausible speed and trajectory
Z-axis view range: coping with occlusions deriving from road elevation
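The policy-level use of the view range can be sketched as follows: walk along the planned path, and at the first waypoint that lies beyond the visible range, place a hypothetical target with an assumed speed. This is only an illustration of the idea; the function name, the waypoint format, the callable that returns visible range per bearing, and the 10 m/s default speed are all assumptions.

```python
import math

def place_ghost_target(path, visible_range_at, assumed_speed_mps=10.0):
    """Return a hypothetical target at the first occluded waypoint of the path.

    path: list of (x, y) waypoints in the ego frame, meters, ordered along the plan.
    visible_range_at: callable mapping a bearing in degrees to visible range in meters.
    assumed_speed_mps: illustrative speed assigned to the fake target.
    """
    for x, y in path:
        dist = math.hypot(x, y)
        bearing = math.degrees(math.atan2(y, x)) % 360.0
        if dist > visible_range_at(bearing):
            return {"x": x, "y": y, "speed_mps": assumed_speed_mps}
    return None  # whole path is visible: no ghost target needed

# Hypothetical straight path with a uniform 25 m visible range.
ghost = place_ghost_target([(10.0, 0.0), (20.0, 0.0), (30.0, 0.0)], lambda b: 25.0)
```

The planner can then treat the returned target like any detected road user when checking its safety envelope.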
[Figure labels: Visible range, Occluded, Ghost target]
View range origin legend: Main Front, Narrow Front, Front Right, Front Left, Rear Right, Rear Left, Rear
Road Boundaries
▪ Road
▪ Elevated
▪ Cars
▪ Bike, Bicycle
▪ Ped
▪ CA obj
▪ Guardrail
▪ Concrete
▪ Curbs
▪ Flat
▪ Snow
▪ Parking in
▪ Parking out
Full Surface Segmentation Road/nRoad
Detection of any delimiter of the road surface: its 3D structure and semantics. Both laterally delimiting elements (FS) and longitudinally delimiting ones (GO/debris).
The semantic segmentation engine provides rich, high-resolution pixel-level labeling. The SSN vocabulary is especially enriched to classify road delimiter types:
[Figure legend: Road, Edge, Car, Bike, Ped, General object, Guardrail, Concrete, Curb, Flat, Snow]
Surround Road/nRoad classification
Road Boundaries
Detection of any delimiter of the road surface, its 3D structure and semantics. Both laterally delimiting elements (FS) and longitudinally delimiting ones (GO/debris).
The Parallax Net engine provides an accurate understanding of structure by assessing the residual elevation (flow) relative to the locally governing road surface (homography).
It is therefore sensitive to extremely small objects and low-elevation lateral boundaries.
Debris detection identifies structural deviations from the road surface.
Structure-from-motion approach: geometry-based and appearance-invariant, so it detects any type of hazard.
Debris Detection
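The notion of residual flow relative to a road-surface homography can be illustrated with the classical plane-plus-parallax construction: warp each pixel with the homography induced by the road plane, then look at what motion is left over. Points on the plane leave almost no residual, while anything that rises above it does. The sketch below is a geometric illustration only, not the learned Parallax Net; the function names, the match format and the 1-pixel threshold are assumptions.

```python
def apply_homography(H, point):
    """Map a pixel (u, v) through a 3x3 homography given as nested lists."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)

def residual_flow(H, matches):
    """Residual displacement of each match after removing road-plane motion.

    matches: list of ((u0, v0), (u1, v1)) pixel correspondences between frames.
    """
    residuals = []
    for p0, p1 in matches:
        pu, pv = apply_homography(H, p0)
        residuals.append((p1[0] - pu, p1[1] - pv))
    return residuals

def flag_deviations(residuals, threshold_px=1.0):
    """Indices of matches whose residual magnitude exceeds a pixel threshold."""
    return [i for i, (du, dv) in enumerate(residuals)
            if (du * du + dv * dv) ** 0.5 > threshold_px]

# Hypothetical road-plane homography (pure 2 px shift) and two matches.
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matches = [((10.0, 10.0), (12.0, 10.0)), ((20.0, 20.0), (25.0, 20.0))]
flagged = flag_deviations(residual_flow(H, matches))
```

Here the first match moves exactly as the road plane predicts, while the second leaves a 3-pixel residual and is flagged as a structural deviation.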
Road Geometry - Road3 in production
https://www.youtube.com/watch?v=s7HCI33KVHA
Advanced lane applications (VW): Volkswagen Passat Travel Assist 2.0 with Mobileye camera
Road Geometry
Road4 technology provides deep lane understanding rather than “simple” lane-mark detection
▪ Severely occluded lane marks: endures gaps of over 20 m within a marker
▪ Semi-marked, partly marked or unmarked lanes
▪ Multi-geometry lane structures: merge, split, HWE, junctions
▪ Stable DP map, also when passing through junctions and construction areas
Botts' dots and occluded lane marks
Lane detection on wet roads at night
Merges and splits, and passing through junctions
Road Geometry
Parallax-Net provides a dense elevation model of the entire driving surface, as well as detailed local ‘longitudinal profile’ characteristics such as road bumps and ditches
Road Geometry
The Generalized HPP technology (VF) provides:
▪ Host driving path: geometry and center
▪ Any-object (point) driving path
▪ Any-object (point) lane assignment
▪ Road elevation, accounted for by inference
It does not involve explicit detection and modeling of lane-boundary evidence, but rather leverages top-down contextual understanding.
Road Semantics
▪ Road-side directives (TFL/TSR)
▪ On-road directives (text, arrows, stop line, crosswalk)
▪ Lane type: HOV, bicycle lane
▪ The DP association
▪ Road Friction
▪ Boundary type
▪ OCR
Lidar/Radar Sensing Subsystem
Lidar/Radar-only Subsystem Setup
Environment Modeling: Road Users & Free-Space Detection
Free-space detection via 3D Occupancy Engine: model-based approach
Road-user detection & tracking: model-based approach
Lidar Semantics - Shape Classification
Data-driven classification approach
Dedicated deep neural net fed with Lidar reflections to resolve semantic ambiguities.
Key use case: a static object near a crosswalk. Distinguish between:
Pedestrians: give way
Traffic signs: drive through
Lidar-Localization in Camera-Generated Map
Localization in a sparse semantic map is enabled by extracting rich Lidar features
Vehicle trajectory
Semantic map information & Lidar reflections projected onto front camera
Bird's-eye view display + map semantics