Avideh Zakhor
Video and Image Processing Lab
University of California, Berkeley
Fast, Automatic, Photo-Realistic, 3D Modeling of Building Interiors
Acknowledgements
Staff member:
– John Kua
Students:
– Matthew Carlberg, Nick Corso, Nikhil Naikal, George Chen, Jacky Chen, Tim Liu, Stephen Shum, Stewart He, Victor Sanchez
Outline
Problem statement
Localization
Model construction
Applications
Future work
3D Outdoor Modeling
•Combine airborne and ground-based laser scans and
camera images
•Acquisition vehicle in motion as data is being collected
[Diagram: ground-based modeling (building facades) and airborne modeling (rooftops & terrain) are registered and fused into a 3D city model]
Interactive Video of 3D Model for Downtown Berkeley
ABC News Clip on Indoor Modeling
Indoor Modeling
Goals and objectives:
– 3D, fast, automated, photo-realistic models of building interiors
– Enables virtual walk-throughs and fly-throughs
– Visualize exterior and interior; seamless transition between the two
Applications of indoor modeling:
– Virtual reality, games and entertainment, training & simulations, architecture, construction, real estate, first responders, emergency management, …
Today’s solutions for indoor mapping:
– Wheeled devices on even, smooth surfaces
– Output: 2D maps, rather than textured 3D models
What about uneven surfaces, e.g., staircases?
What about photo-realistic textured 3D models?
Proposed Approach to 3D Indoor Modeling
Use a human operator rather than wheeled devices in order to map/model uneven surfaces and tight environments
Concept of operation:
– Equip a backpack with sensors
– Walk around a building to acquire the data
– Process data offline
Challenges:
– Weight/power limitations for a human operator carrying a backpack
– Unlike outdoor modeling: no GPS inside buildings, and no aerial imagery to help with localization
– Unlike wheeled systems with only 3 degrees of freedom (x, y, yaw): need to recover six degrees of freedom for a human operator (x, y, z, yaw, pitch, roll)
Human Operators …
Data Acquisition
[Backpack diagram, two views, with sensors labeled: cameras C-L and C-R; horizontal laser scanner L-H1; vertical laser scanners L-V1, L-V2, L-V3; InterSense OMS; Honeywell HG9900; Applanix; laptop]
L = laser, C = camera, H = horizontal, V = vertical
System Components
• High-performance laptop with three striped RAID hard drives
• Two Point Grey Grasshopper cameras: 5 megapixel, with fisheye lenses with 183-degree field of view
• Five Hokuyo URG indoor laser range finders (LRFs): 40 Hz scan rate, 30 m range, 1081 points per scan, 0.25-degree angular resolution, 270-degree field of view
• InterSense InertiaCube3 Orientation Measurement Sensor (OMS)
• External 170 Wh lithium-ion battery pack with regulators
• Handheld Windows Mobile device to control start and stop of data collection
• Honeywell HG9900 navigation-grade Inertial Measurement Unit (IMU) and Applanix navigation computer
  • Used only as ground truth for localization
  • Price tag: $0.5M
  • Requires zero velocity updates (ZUPTs) after every 2 minutes of walking: walk 2 minutes, freeze 1 minute, …
  • Combines 3 ring laser gyros with bias stability < 0.003 deg/hr with 3 accelerometers with bias of less than 0.245 mm/s
Concept of Operation
• Localize:
  • Estimate position and orientation at each time instant: six degrees of freedom (x, y, z, yaw, pitch, roll)
  • Uses horizontal and vertical laser scanners & cameras
• Generate point cloud:
  • Stack vertical scans using localization
  • Uses vertical geometry laser scanners
• Reconstruct surface from the point cloud
• Texture map:
  • Uses cameras
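As a rough illustration of the point-cloud generation step, the sketch below maps each vertical scan line into the world frame using the 6-DoF pose estimated at its timestamp. The Z-Y-X Euler convention and the scanner-plane axes (x forward, z up) are assumptions for illustration, not the system's actual conventions.

```python
import numpy as np

def pose_matrix(x, y, z, yaw, pitch, roll):
    """Build a 4x4 rigid transform from a 6-DoF pose (Z-Y-X Euler order assumed)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])   # yaw
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # pitch
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [x, y, z]
    return T

def stack_scan(scan_xz, pose):
    """Map one vertical-scanner line (points in the scanner's x-z plane)
    into the world frame using the pose estimated at that time instant."""
    n = scan_xz.shape[0]
    pts = np.zeros((n, 4))
    pts[:, 0] = scan_xz[:, 0]   # forward component of the scan point
    pts[:, 2] = scan_xz[:, 1]   # up component of the scan point
    pts[:, 3] = 1.0             # homogeneous coordinate
    return (pose_matrix(*pose) @ pts.T).T[:, :3]
```

Stacking every scan line over the whole walk yields the 3D point cloud.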
System Diagram
Laser scanners (L-H1, L-V1, L-V2, L-V3) + OMS → incremental localization → loop closure (global localization; also uses cameras C-L, C-R) → 6DOF poses → point cloud generation → surface reconstruction → texture mapping (uses cameras) → 3D textured model
Incremental Localization: Iterative Closest Point (ICP) Scan Matching Algorithm
Scan 1: m = first scan; Scan 2: d = second scan
Iteratively finds the rotation R and translation t between two scans.
Works well if:
– Scans are in the same plane: two translation parameters and one rotation
– Scans are more or less pre-registered: R and t are small
– Environment has rich 3D geometric features
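A minimal 2D point-to-point ICP sketch under the slide's assumptions (roughly pre-registered planar scans). Brute-force nearest neighbors stand in for the spatial index a real implementation would use.

```python
import numpy as np

def icp_2d(m, d, iters=20):
    """Align scan d onto scan m; returns accumulated rotation R and translation t."""
    R, t = np.eye(2), np.zeros(2)
    for _ in range(iters):
        dd = (R @ d.T).T + t
        # nearest neighbour in m for each transformed point of d (brute force)
        idx = np.argmin(((dd[:, None, :] - m[None, :, :]) ** 2).sum(-1), axis=1)
        mm = m[idx]
        # closed-form rigid alignment (Kabsch/Procrustes) of the matched sets
        mu_d, mu_m = dd.mean(0), mm.mean(0)
        U, _, Vt = np.linalg.svd((dd - mu_d).T @ (mm - mu_m))
        Ri = Vt.T @ U.T
        if np.linalg.det(Ri) < 0:          # guard against reflections
            Vt[-1] *= -1
            Ri = Vt.T @ U.T
        ti = mu_m - Ri @ mu_d
        R, t = Ri @ R, Ri @ t + ti         # accumulate the increment
    return R, t
```

With a small initial misalignment the correspondences are correct from the first iteration and the closed-form step recovers R and t exactly, which is why pre-registration matters.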
Outdoor Pose Estimation Using Scan Matching of Horizontal Scanner
Overlapping horizontal laser scans:
– Continuously captured during vehicle motion (75 Hz)
– Relative 2D pose estimation by scan-to-scan matching: translation (u, v) and rotation between the scans at t = t0 and t = t1
[Pipeline: Scan 1 → line segment extraction → initial estimate (u′, v′, φ′) → rotation R(φ′) and translation t(u′, v′) → robust least squares against Scan 2, minimizing a saturating (exponential) cost of the squared point distances d_j²]
Resulting Path from Scan Matching
Length: 24.3 km; Driving time: 78 minutes; Scans: 665,000; Scan points: 85 million; Camera images: 19,200
[Map of the 2 data acquisitions, scale 200 m]
Use Airborne Data to Globally Correct Vehicle Pose
Idea: ground-based façade scans should match edges in aerial photographs or digital surface models (DSMs):
– Maximize overlap between scan points and airborne edges
Additional advantage: registered airborne DSM and ground-based data
Path After Global Registration
Length: 24.3 km; Driving time: 78 minutes; Scans: 665,000; Scan points: 85 million; Camera images: 19,200 (2 data acquisitions)
Estimating Yaw Using Horizontal Scanner: Indoors
[Top view: laser scan plot showing the two sidewalls]
Estimating Yaw Using Horizontal Scanner
[Top view: laser scan plot showing the two sidewalls]
ICP scan matching: x, y, and yaw
Use three orthogonally mounted laser scanners
[Side, top, and rear views showing the floor, ceiling, and sidewalls]
– L-H1: horizontal scanner for yaw
– L-V2: vertical scanner pointing forward for pitch
– L-V1: vertical scanner pointing sideways for roll
• Run ICP scan matching 3 times
Incremental Localization: 3XICP Algorithm
Use three planar laser scanners:
– Top horizontal yaw scanner: ∆x, ∆y, ∆yaw
– Side-looking vertical roll scanner: ∆y, ∆z, ∆roll
– Side-looking vertical pitch scanner: ∆x, ∆z, ∆pitch
The 6-DoF transformation is [∆x; ∆y; ∆z; ∆roll; ∆pitch; ∆yaw]
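A sketch of how the three planar ICP results could be assembled into one 6-DoF increment. Each translation component is observed by two scanners; averaging the redundant estimates is an assumption for illustration, not necessarily the fusion rule the system uses.

```python
def fuse_3xicp(yaw_scan, roll_scan, pitch_scan):
    """Combine three planar ICP results into a 6-DoF increment.
    yaw_scan   = (dx, dy, dyaw)   from the top horizontal scanner
    roll_scan  = (dy, dz, droll)  from the side-looking vertical scanner
    pitch_scan = (dx, dz, dpitch) from the other vertical scanner"""
    dx_h, dy_h, dyaw = yaw_scan
    dy_r, dz_r, droll = roll_scan
    dx_p, dz_p, dpitch = pitch_scan
    return [
        0.5 * (dx_h + dx_p),   # dx seen by yaw and pitch scanners
        0.5 * (dy_h + dy_r),   # dy seen by yaw and roll scanners
        0.5 * (dz_r + dz_p),   # dz seen by roll and pitch scanners
        droll, dpitch, dyaw,
    ]
```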
Incremental Localization: 2xICP + OMS Algorithm
Motivation: 3xICP can have large errors
OMS gives roll, pitch, and yaw, but the yaw values are not reliable:
– Affected by local magnetic fields, e.g., steel cabinets
New setup:
– Top horizontal yaw scanner ICP: ∆x, ∆y, ∆yaw
– Side-looking vertical scanner ICP: ∆z
– InterSense OMS: ∆roll, ∆pitch
Pitch/roll compensation of yaw scan matching
[Example scans at (roll, pitch) = (10°, 10°), (−10°, 10°), (0°, 0°), and (0°, 10°)]
• Assume walls are vertical
• Rotate the uncompensated scan into 3-D space:
  – Augment with zeros for the z-coordinate
  – Multiply by the rotation matrix defined by the estimated roll and pitch angles
• The compensated scan is the projection of the 3-D scan onto the x/y plane
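The compensation steps above map directly to a few lines of NumPy; the angle conventions (roll about x, pitch about y) are assumptions for illustration.

```python
import numpy as np

def compensate_yaw_scan(scan_xy, roll, pitch):
    """Pitch/roll-compensate a horizontal (yaw) scan: augment the 2-D points
    with z = 0, rotate by the OMS roll/pitch estimate, then project back
    onto the x/y plane (walls assumed vertical)."""
    pts = np.column_stack([scan_xy, np.zeros(len(scan_xy))])  # z = 0
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])     # roll
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])     # pitch
    rotated = (Ry @ Rx @ pts.T).T
    return rotated[:, :2]                                     # drop z
```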
Incremental Localization: 1xICP + OMS + Planar Algorithm
[Geometry: the pitch scanner sees the floor as a scan line y = mx + b in its own frame; z is the perpendicular distance from the pitch scanner to the floor, and the pitch angle is recovered from the slope m of the fitted line]
• Top horizontal yaw scanner ICP: ∆x, ∆y, ∆yaw
• Planarity: absolute z
• OMS: ∆roll, ∆pitch
Incremental Localization: Hybrid Algorithm
1xICP+OMS+Planar:
– Has very low error in z, but only works with planar floors
2xICP+OMS:
– Works with non-planar floors, e.g., stairs, but has larger error
Hybrid algorithm combines the best of both worlds:
– Use regression and line fitting to detect sections of the environment that have planar floors, and adaptively switch between algorithms
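A toy sketch of the switching test, assuming planarity is judged by the residual of a line fit to the floor points seen by the pitch scanner; the threshold `tol` is invented for illustration.

```python
import numpy as np

def floor_is_planar(floor_pts_xz, tol=0.02):
    """Fit a line to the floor points (x = forward distance, z = height) and
    call the section planar when the worst residual is below tol metres."""
    x, z = floor_pts_xz[:, 0], floor_pts_xz[:, 1]
    m, b = np.polyfit(x, z, 1)                  # least-squares line fit
    residual = np.abs(z - (m * x + b)).max()
    return residual < tol

def choose_algorithm(floor_pts_xz):
    """Adaptively switch between the two incremental localization algorithms."""
    return "1xICP+OMS+Planar" if floor_is_planar(floor_pts_xz) else "2xICP+OMS"
```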
Shortcoming of Incremental Localization
• Dead reckoning has unbounded error: errors accumulate over time
• Global error is large, even though incremental error is low
• Loop closure: revisit the same spot during acquisition to induce a “cycle” in the graph
• Build a graph G(V, E):
  • A node is the pose at a given time
  • The edge between two nodes is the incremental transformation T_i→j from node i to node j
  • T_i→j can come from any of the 4 incremental localization algorithms
[Chain: 1 →(T12)→ 2 →(T23)→ 3 →(T34)→ 4 →(T45)→ 5]
Loop Closure
Revisit the same location during acquisition:
– Estimate a 6-DOF transformation between the two poses of the two visits (the mismatch between them is the loop closure error)
Nonlinear graph optimization to estimate the “closed-loop” poses X:
– Each edge: transformation T_ij with covariance ∑_ij
– f(X_i, X_j): transformation from pose X_i to X_j
– Intuitively, the size of the covariance adjusts how strongly each edge pulls on the poses of the nodes

minimize over X:  ∑ over edges (i,j)∈E of  (f(X_i, X_j) − T_ij)^T ∑_ij^{−1} (f(X_i, X_j) − T_ij)

[Graph: nodes 1 through 10, with a loop-closure edge closing the cycle]
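A toy 1-D analogue of this optimization: with f(X_i, X_j) = x_j − x_i the problem becomes linear weighted least squares, which shows how a tight-covariance loop-closure edge pulls accumulated drift back. This is a sketch of the principle only, not the TORO solver the system actually uses.

```python
import numpy as np

def optimize_1d_pose_graph(edges, n):
    """Each edge (i, j, t_ij, var_ij) says x_j - x_i should equal t_ij,
    weighted by the inverse covariance 1/var_ij; x_0 is anchored at 0.
    Solved in one weighted least-squares step since f is linear here."""
    A = np.zeros((len(edges) + 1, n))
    b = np.zeros(len(edges) + 1)
    w = np.ones(len(edges) + 1)
    A[0, 0] = 1.0                          # anchor: x_0 = 0
    for r, (i, j, t, var) in enumerate(edges, start=1):
        A[r, i], A[r, j] = -1.0, 1.0       # row encodes x_j - x_i
        b[r] = t
        w[r] = 1.0 / var                   # inverse-covariance weight
    W = np.diag(w)
    x, *_ = np.linalg.lstsq(np.sqrt(W) @ A, np.sqrt(W) @ b, rcond=None)
    return x
```

With odometry alone the last pose drifts to the dead-reckoned value; adding one low-variance loop-closure edge redistributes the error along the whole chain.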
Summary of Localization
Estimate transformations T and their covariances ∑ between:
• Adjacent nodes
• Each loop closure event
Graph
Graph optimization via TORO
Final estimated poses
Question: How to detect loop closures?
Loop Closure Detection
Use FAB-MAP [1] to generate a rank-ordered list of candidate image pairs for potential loop closures
Computes the probability of each image belonging to each location:
– Uses SIFT features within a Bayesian inference and maximum likelihood framework
[Example loop closure event from a subset of Dataset 3]
[1] M. Cummins and P. Newman, IJRR 2008
Post-Process FAB-MAP with Keypoint Matching
Run 100 trials of FAB-MAP:
– Record image pairs with high probability
Post-process the image pairs with the highest counts to eliminate false positives:
– Keypoint matching (Lowe 2004)
For each candidate image pair (I1, I2), find the nearest and the second-closest neighbors in I2 for every feature of I1, and compute the ratio of their distances
Correct and incorrect matches from an image pair yield different ratio distributions:
– For a correct match, a larger percentage falls below the 0.6 threshold
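The ratio test can be sketched as follows, with brute-force distances over feature descriptors and the 0.6 threshold from the slide:

```python
import numpy as np

def ratio_test_score(desc1, desc2, thresh=0.6):
    """Fraction of features of I1 whose nearest/second-nearest distance
    ratio in I2 falls below the threshold. A high fraction suggests a
    correct loop-closure match; a low one, a false positive."""
    below = 0
    for d in desc1:
        dists = np.sort(np.linalg.norm(desc2 - d, axis=1))
        if dists[0] < thresh * dists[1]:   # Lowe's ratio test
            below += 1
    return below / len(desc1)
```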
[Bar chart: number of trials (out of 100) in which each of the top candidate image pairs in Dataset 1 was flagged]
[PDFs of the percentage of features with ratio below the 0.6 threshold, for incorrect vs. correct matches]
Example of Detected Loop Closures
Results (1): Effect of Loop Closure
• Loop closure reduces global error
Results for a simple loop in a 30 m hallway
Results (2): Comparing Algorithms
Set 1: operator 1; Set 2: operator 2
• “Planar” algorithms have the lowest global error in z
Results for a simple loop in a 30 m hallway
More Complex Environments: Stairs (2xICP+OMS)
[Results for Sets 1-4]

Dataset | Path length | Average position error
1       | 68.73 m     | 0.66 m
2       | 46.28 m     | 0.35 m
3       | 46.28 m     | 0.58 m
4       | 142.03 m    | 0.43 m

Global roll and yaw error is larger for staircases
3 Story Localization
Loop Closure Detection With Multiple Cameras
One Left-Facing Camera:
– Loop closures must be in a similar location and orientation
– Zero initial translation and rotation for yaw-scanner ICP
Two Opposing Cameras:
– Loop closures can have opposite heading
– Initial condition for yaw scanner: zero for translation, pitch, roll; 180 deg. for yaw
[Graphs: detected loop-closure edges among nodes 1-9 for each configuration]
Double Camera Loop Closure
• Prune candidate loop closures with erroneous transformations, using:
  – Metrics from scan matching
  – A metric based on the transformation of the 3D locations of matched image features between loop closure images
System Diagram
Generating Point Clouds
3D Colored Point Cloud: from outside
3D Colored Point Cloud: from inside
Staircase Point Clouds: two & three stories
Interactive video
Point cloud for a pushcart with wheels
System Diagram
Surface Reconstruction from Point Cloud
1. Piecewise planar surface model; assumes:
– Floor and ceiling are planar & horizontal
– Walls are planar & perpendicular to the floor
2. Surface models from scan line triangulation:
– Take advantage of the structure in the scan line ordering of the laser to generate a mesh of triangles
– Carlberg et al., 3DPVT 2008
Method 2 results in more geometrically rich models, but also produces more artifacts
[Diagram: points R, R+1 of one scan line triangulated with points N, N+1 of the next]
Overview of Plane Fitting
Principal component analysis (PCA) → cluster analysis → classification & segmentation → plane fitting (RANSAC)
PCA: for each input point p_i of the point cloud, perform a principal component analysis on the ball neighborhood of radius σ centered on p_i
The eigenvector associated with the smallest eigenvalue is the estimate of the normal vector n_i to the surface at p_i
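The PCA normal estimate maps to a few lines of NumPy:

```python
import numpy as np

def estimate_normal(cloud, p, sigma):
    """PCA normal estimate: take the ball neighbourhood of radius sigma
    around p, and return the eigenvector of the covariance matrix
    associated with the smallest eigenvalue."""
    nbrs = cloud[np.linalg.norm(cloud - p, axis=1) <= sigma]
    cov = np.cov((nbrs - nbrs.mean(0)).T)
    evals, evecs = np.linalg.eigh(cov)     # eigh: eigenvalues ascending
    return evecs[:, 0]                     # smallest-eigenvalue direction
```

The sign of the returned normal is arbitrary; a real pipeline would orient it consistently, e.g., toward the scanner.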
Point Classification and Segmentation
Each point p_i is classified as part of one of three different types of structures:
– Wall-X: “parallel” to the y-z plane
– Wall-Y: “parallel” to the x-z plane
– Ceiling-floor: “parallel” to the x-y plane
[Original point cloud of a T-shaped hall, and its “Wall-X” structure]
• Each structure is segmented and analyzed separately.
Cluster Analysis
Analyze each structure separately to divide it into “clusters”, e.g., wall, floor, ceiling
Use Euclidean distance
[“Wall-X” structure and a cluster for one wall]
Plane Fitting
Use RANSAC to fit a plane to each cluster
Inliers determine the extent of each plane
Hard to deal with stairs
[A cluster of points representing a wall; inliers projected onto the fitted plane]
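A minimal RANSAC plane-fit sketch; the inlier threshold and iteration count are assumed values for illustration.

```python
import numpy as np

def ransac_plane(pts, n_iter=200, tol=0.02, rng=None):
    """Repeatedly fit a plane to 3 random points and keep the plane
    with the most inliers (points within tol of the plane)."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(pts), dtype=bool)
    best_plane = None
    for _ in range(n_iter):
        a, b, c = pts[rng.choice(len(pts), 3, replace=False)]
        n = np.cross(b - a, c - a)
        if np.linalg.norm(n) < 1e-9:        # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        dist = np.abs((pts - a) @ n)        # point-to-plane distances
        inliers = dist < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, a)
    return best_plane, best_inliers
```

The inlier set then determines the extent of the fitted plane, as on the slide.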
System Diagram
Overall Approach to Texture Mapping
For every triangle:
1. Find a “candidate” set of images
2. Filter out undesirable images from the candidate set
3. Find the best image in the filtered candidate set
Candidate set → filter → optimize → image to be used to texture map
Filter & Optimize
[Geometry: N = triangle normal, C = camera look vector, d = distance from camera to triangle, α = angle between the view direction and the normal]
Filter: keep only cameras with d < d_t and α < α_t
Optimize: among all filtered candidates, choose the camera that is close to the triangle and “head on”, maximizing resolution:
max over i = 1 … number of cameras of  (1/d_i) · (−C_i · N)
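The filter-then-optimize rule can be sketched as below; the threshold values d_t and α_t are placeholders, not the system's actual settings.

```python
import numpy as np

def pick_texture_image(tri_center, tri_normal, cams, d_t=5.0, a_t=np.radians(60)):
    """cams: list of (position, unit look vector) pairs. Drop cameras that
    are too far (d > d_t) or too oblique (angle > a_t), then return the
    index maximizing (1/d) * (-C . N)."""
    best, best_score = None, -np.inf
    for i, (pos, look) in enumerate(cams):
        d = np.linalg.norm(tri_center - pos)
        cosang = -np.dot(look, tri_normal)         # head-on => cos near 1
        if d > d_t or cosang < np.cos(a_t):        # filter step
            continue
        score = cosang / d                         # optimize step
        if score > best_score:
            best, best_score = i, score
    return best                                    # None if all filtered out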
Choosing the Candidate Set
1. First, consider all images already used for a given plane i (neighborhood consistency)
2. If the filtered candidate set is empty, consider images with a timestamp within Δt seconds of the last used image: this deals with path localization error when the path passes the same wall at two different times
3. If the filtered candidate set from step 2 is empty, consider all available images
Piecewise Planar Model with Image Mosaicing
Looking from outside of hallway
Looking down hallway
Piecewise Planar Model with Image Mosaicing
Scan Line Triangulation Model of Cory Hall, Fourth Floor
East wall
Looking from outside of hallway
CeilingWest wall
Scan Line Triangulation Model of Cory Hall, Fourth Floor
Looking down hallway
Artifacts of Texture Mapped Models
Close-up of west wall: image j and image j+1 of the texture atlas
– Seams due to slight illumination differences between subsequent images
– 3D-to-2D mis-registration due to small errors in the world-to-image transformation (localization error)
Blending Refined Images
– Image blending removes discontinuities in texture caused by illumination differences between successive images
– For each triangle, the texture is a weighted average of pixels from two images:

p = α_i I_i(x_i, y_i) + α_{i+1} I_{i+1}(x_{i+1}, y_{i+1})

I_i(x_i, y_i): pixel from image i
I_{i+1}(x_{i+1}, y_{i+1}): pixel from image i+1
[Unblended vs. blended comparison]
[Image weights over time: α_i rises to 1 at the time image i was acquired and falls to 0 at the capture times of images i−1 and i+1]
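A sketch of the time-varying weight and the per-pixel blend; the triangular weight profile is read off the slide's figure and is an assumption about its exact shape.

```python
import numpy as np

def blend_weight(t, t_prev, t_i, t_next):
    """Weight alpha_i(t) for image i: 0 at the previous image's capture
    time, 1 at t_i, back to 0 at the next image's capture time."""
    if t <= t_i:
        return np.clip((t - t_prev) / (t_i - t_prev), 0.0, 1.0)
    return np.clip((t_next - t) / (t_next - t_i), 0.0, 1.0)

def blend_pixel(p_i, p_next, t, t_prev, t_i, t_next):
    """p = alpha_i * I_i(x_i, y_i) + alpha_{i+1} * I_{i+1}(x_{i+1}, y_{i+1}),
    with alpha_{i+1} = 1 - alpha_i so the weights sum to one."""
    a = blend_weight(t, t_prev, t_i, t_next)
    return a * p_i + (1.0 - a) * p_next
```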
Gain Compensation and Blending
Matthew Brown and David G. Lowe: Automatic Panoramic Image Stitching using Invariant Features IJCV 2007
Before:
After:
Interactive Rendering of a Planar model with Gain Compensation and Blending
Texture Mis-alignments
Laser-based localization is not pixel accurate
Use images to refine localization
Three approaches:
– Graph optimization using images
– Image stitching per plane (Matthew Brown and David G. Lowe: Automatic Panoramic Image Stitching using Invariant Features, IJCV 2007)
– MRF formulation
Graph Optimization Using Images
• Add new edges/transformations to the graph using image matching
• Use Levenberg-Marquardt nonlinear minimization to enforce consistency of photographic views of a set of 3D points
[Graph with added transformations derived from images]
Graph Optimization Results
Before
After
Localization Drift due to Image-Based Graph Optimization
• Z error increases from left to right
• Image matching changes the localization path & point cloud
Example of Graph Based Optimization
before
after
Example of Image Stitching
Interactive video
MRF Formulation
• Cast texture selection and alignment as a labeling problem [Lempitsky and Ivanov, 07]; include image transformations to generate more image candidates
• Minimize the labeling energy over the labels l:

min_l  ∑_{i=1}^{N} E_d(l_i) + ∑_{(i,j)} E_s(l_i, l_j)

E_d(l_i) = (Tri_i − Cam_{l_i})²  (quality term)
E_s(l_i, l_j) = ∑_{k=1}^{m} (I_{l_i}(p_k) − I_{l_j}(p_k))²  (smoothness term)

N: number of triangles; m: number of sampling points in an edge; Tri_i: position of the i-th triangle; l_i: image label for the i-th triangle; I_l: image with label l; Cam_l: camera position of image I_l; p_k: the k-th sampling point
• The minimization is a Markov Random Field (MRF) problem which can be solved using graph cuts [Boykov et al. 01]
[Comparison: quality only (min E_d) vs. quality and smoothness (min E_d + E_s)]
V. Lempitsky and D. Ivanov, Seamless Mosaicing of Image-Based Texture Maps, CVPR 2007.
Y. Boykov, O. Veksler and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001.
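A toy evaluation of this energy for a given labeling. Here `images[l]` directly maps a sampling-point index to a color, standing in for projecting the 3-D sample point into image l; a real system would minimize this energy with graph cuts rather than just evaluate it.

```python
def mrf_energy(labels, E_d, edges, sample_pts, images, lam=1.0):
    """Evaluate E = sum_i E_d(l_i) + lam * sum_(i,j) E_s(l_i, l_j).
    labels:     chosen image label per triangle
    E_d:        E_d[i][l] = quality cost of assigning label l to triangle i
    edges:      pairs (i, j) of adjacent triangles
    sample_pts: sampling-point indices along each shared edge
    images:     images[l][k] = color of image l at sampling point k"""
    E = sum(E_d[i][labels[i]] for i in range(len(labels)))
    for (i, j), pts in zip(edges, sample_pts):
        li, lj = labels[i], labels[j]
        # smoothness: squared color difference of the two chosen images
        E += lam * sum((images[li][k] - images[lj][k]) ** 2 for k in pts)
    return E
```

Assigning both triangles the same image zeroes the smoothness term; mixing images is penalized by their color mismatch along the shared edge.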
Interactive Rendering of Models
Application to Image-Based Rendering
Side products of 3D modeling:
– Pose of captured imagery: which images to render for a given view
– 3D point cloud and fitted planes: occlusions
3 steps:
– Choose the “best” image(s): head on and close by (angle and distance criteria)
– Check for occlusion: remove occluded images
– Mosaic & blend the remaining images
A user navigates with the keyboard and mouse to rotate, translate, and switch between ceiling, floor, and side cameras
Example: 5 images stitched together per frame results in 5 frames per second
Interactive video
Application to Mobile Indoor Augmented Reality
– Image-based localization is key in mobile AR applications: superimpose meta-data on the query image from a mobile device, e.g., location-based advertising
– Key ingredient for image-based localization: a geo-tagged image database with full pose for each image
– Easy outdoors: GPS, IMU
– Indoors: use the pose recovered from 3D indoor modeling to build the geo-tagged image database
Indoor Image Retrieval Performance
[Plot: match % (0.5 to 1.0) vs. top N (0 to 11), for cory-2 and cory-5]
Cory Hall fifth floor: offices; 2nd floor: display cases, posters, pictures
Examples of 2nd & 5th floor Cory pictures: database vs. cell phone query
Future work
Surface reconstruction needs more work:
– Planar and triangulation approaches are too naive
– Need more sophisticated schemes such as ball pivoting
– Need to deal with modeling staircases, etc.
Reduce texture mis-alignment
Improve localization:
– Pixel-level accuracy
– Small errors in localization result in noticeable visual artifacts: both geometry and texture suffer
Watertight models as an a priori constraint
Simplify models to reduce size:
– For rendering and interactivity
Characterize accuracy more systematically:
– Volumetric characterization rather than localization
Applications
Architecture:
– Reverse-engineering floor plans
– High-rise offices, residential, government offices
Plant and factory facilities:
– Keeping track of location and condition of equipment & assets in factories
Public areas such as schools, hospitals:
– Useful for firefighters, law enforcement, first responders
Airports, train/bus stations, transportation facilities
Music halls, stadiums, theatres, public event spaces
Gaming and entertainment
Underground mines and tunnels