Image formation
Agenda
• Perspective projection
• Rotations
• Camera models
Light as a wave + particle
Light as a wave (ignore for now)
Refraction, Diffraction
Image Formation
Digital Camera
The Eye
Film
Digital Image
Human eye
Pixel brightness
CS 217 Lecture 1 — April 1 Spring 2009
2. We measure the amount of light leaving a surface as

Radiance = power / (foreshortened area · solid angle)
         = watt / (meter² · sr)
         = δ²P / (δA · δω) ≈ P / (∆A · ∆ω)

Steradian = surface area of a unit-radius sphere cut out by a solid angle (ranges over 0 ∼ 4π);
in 1D this reduces to radian = arc length of a unit-radius circle cut out by an angle (0 ∼ 2π).

We need the foreshortened area because a patch δA directly overhead sees more of A than a patch viewed at a grazing angle.
1.1.3 Imaging a pixel

pixel intensity ∝ total irradiance
  = ∫[x, x+∆x] ∫[y, y+∆y] ∫[t=0, 1] ∫[−π, π] ∫[0, π/2] E[x, y, t, θ, φ] · f(θ, φ) dx dy dt dθ dφ

where f(θ, φ) is the sensor response, with 0 < f(θ, φ) < 1 ↪ tends to 1 for (θ, φ) directly overhead the patch
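A minimal numerical sketch of this integral, assuming a toy radiance E(x, y, t, θ, φ) and a cosine-like sensor response f; both functions, the grid resolution, and the pixel footprint below are hypothetical illustrations, not part of the lecture.

import numpy as np

def E(x, y, t, theta, phi):
    # toy scene radiance: slightly brighter near the center of the pixel footprint
    return 1.0 + 0.1 * np.cos(x) * np.cos(y)

def f(theta, phi):
    # sensor response in (0, 1), approaching 1 when light arrives head-on (phi = 0)
    return np.cos(phi) ** 2

def pixel_intensity(x0, y0, dx=1e-3, dy=1e-3, n=8):
    # crude Riemann sum over x, y, t, theta, phi
    xs = np.linspace(x0, x0 + dx, n)
    ys = np.linspace(y0, y0 + dy, n)
    ts = np.linspace(0.0, 1.0, n)
    thetas = np.linspace(-np.pi, np.pi, n)
    phis = np.linspace(0.0, np.pi / 2, n)
    total = 0.0
    for x in xs:
        for y in ys:
            for t in ts:
                for theta in thetas:
                    for phi in phis:
                        total += E(x, y, t, theta, phi) * f(theta, phi)
    # scale by the volume of each grid cell
    cell = (dx / n) * (dy / n) * (1.0 / n) * (2 * np.pi / n) * ((np.pi / 2) / n)
    return total * cell

print(pixel_intensity(0.0, 0.0))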
Q: Why do we not see an image of a scene on a piece of paper?
A: Restrict directions of incoming light with a pinhole.
Pinhole optics:
Right-handed coordinate system; place the scene at −z:

y′/f′ = y/z,   x′/f′ = x/z   ⇒   y′ = f′ · y/z,   x′ = f′ · x/z

(More on "light as physics" at the end of the semester.)
Pinhole optics
Pinhole camera
Camera Obscura
World’s largest photograph – El Toro Marine Corps, Irvine, CA, 2006
Accidental pinholes
(the view from Antonio’s hotel room)
what’s the dark stuff?
Torralba and Freeman, CVPR’12
Accidental pinhole and pinspeck cameras: revealing the scene outside the picture
Antonio Torralba, William T. Freeman
Computer Science and Artificial Intelligence Laboratory (CSAIL)
[email protected], [email protected]
Abstract
We identify and study two types of “accidental” images that can be formed in scenes. The first is an accidental pinhole camera image. These images are often mistaken for shadows, but can reveal structures outside a room, or the unseen shape of the light aperture into the room. The second class of accidental images are “inverse” pinhole camera images, formed by subtracting an image with a small occluder present from a reference image without the occluder. The reference image can be an earlier frame of a video sequence. Both types of accidental images happen in a variety of different situations (an indoor scene illuminated by natural light, a street with a person walking under the shadow of a building, etc.). Accidental cameras can reveal information about the scene outside the image, the lighting conditions, or the aperture by which light enters the scene.
1. Introduction
Researchers in computer vision have explored numerous ways to form images, including novel lenses, mirrors, coded apertures, and light sources (e.g. [1, 2, 7, 10]). The novel cameras are, by necessity, carefully designed to control the light transport such that images can be viewed from the data recorded by the sensors. In this paper, we point out that in scenes, accidental images can also form, and can be revealed within still images or extracted from a video sequence using simple processing, corresponding to accidental real and “inverse” pinhole camera images, respectively. These images are typically of poorer quality than images formed by intentional cameras, but they are present in many scenes illuminated by indirect light and often occur without us noticing them.
A child might ask: why don’t we see an image of the world around us when we view a blank surface? In a sense we do: light rays yielding images of the world do land on surfaces and then reflect back to our eye. But there are too many of them and they all wash out to the ambient illumination we observe in a room or outdoors.
Figure 1. a) Light enters the room via an open window. b) On the wall opposite the window, we can see a projected pattern of light and shadow. But, are the dark regions shadows? See Fig. 2.
Of course, if one restricts the set of light rays falling on a surface, we can reveal some particular one of the images. This is what a pinhole camera does. Only a restricted set of rays falls on the sensor, and we can observe an image if we look at a surface with light from only a pinhole falling on it. A second way to view an image when looking at a surface is to restrict the reflected rays from the surface by looking at a mirror surface. All rays impinge on the surface, but only those from a particular direction reflect properly into our eye and so we again see an image when viewing a surface.
There are many ways in which pictures are formed around us. The most efficient mechanisms are to use lenses or narrow apertures to focus light into a picture of what is in front. So a set of occluders (to form a pinhole camera) or a mirror surface (to capture only a subset of the reflected rays) will let us see an image as we view a surface. For those cases, an image is formed by intentionally building a particular arrangement of surfaces that will result in a camera.
However, similar arrangements appear naturally by accidental arrangements of surfaces in many places. Often the observer is not aware of the images produced by those accidental cameras. Fig. 1.b shows one example of a picture in which one can see a pattern of shadows and reflections projected on the walls of different scenes. Indeed, at first, one could misinterpret some of the dark patterns on the
CVPR 2012
Perspective projection
Closer objects appear larger
Closer objects are lower in the image
Parallel lines meet
Great reference
https://www.youtube.com/watch?v=q8xsXFU7dK0&list=PLc0IeyeoGt2xtmfaF2ST_uNdeptre3f9s&index=2
Pinhole Camera
How do we compute P’? [on board]
optical axis
[Aside: right-handed coordinate system]
Pinhole Camera
Image inversion
Image inversion
Perplexed folks for a while. But software (or the brain) can simply invert this.
Physical model that avoids inversion
COP = pinhole, camera center
Distance of COP to easel = focal length
“easel”
Visual angle
“easel”

θ = L / f

where θ is in radians and L = the length of the projection on the easel (for a spherical easel, the arc length).

Note: the math is easier for a spherical easel (e.g., the retina). Visual angle is a common unit in human vision.

Example: a human head is 9 inches high. At a distance of 9 feet (108 inches), it subtends 9/108 = 1/12 radians ≈ 4.8 degrees, regardless of focal length.
Field of view (FOV)
24mm
50mm
135mm
Field of View
FOV = total sensor size (diagonal) / focal length    (in radians)
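A small sketch of this formula. The 36 x 24 mm ("full-frame") sensor size is an assumption for illustration; the slide's formula is the small-angle approximation, and 2·atan(d/2f) is the exact diagonal field of view.

import math

diag = math.hypot(36.0, 24.0)  # sensor diagonal in mm, ~43.3 mm

for f in (24.0, 50.0, 135.0):  # focal lengths in mm
    fov_approx = diag / f                      # radians, small-angle approximation
    fov_exact = 2 * math.atan(diag / (2 * f))  # radians, exact
    print(f"f={f:>5.0f} mm  approx={math.degrees(fov_approx):5.1f} deg  "
          f"exact={math.degrees(fov_exact):5.1f} deg")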
Increasing the focal length and stepping back
© Marc Levoy
✦ changing the focal length lets us move back from a subject, while maintaining its size on the image
✦ but moving back changes perspective relationships
What happens to apparent object size and FOV when we double the distance to the object and double the focal length?
x_new = 2fX / (2Z) = fX / Z = x_old

FOV_new = sensor size / (2f) = (1/2) FOV_old
Decreasing the focal length and moving forward
© Marc Levoy
Perspective projection
Closer objects appear larger
Closer objects are lower in the image
Parallel lines meet
All these can be simply derived with x = fX/Z!
Vanishing point: proof

A 3D line through a point A = (A_x, A_y, A_z) with direction D = (D_x, D_y, D_z):

[X]   [A_x]       [D_x]
[Y] = [A_y] + λ · [D_y]
[Z]   [A_z]       [D_z]

x = fX/Z = f(A_x + λD_x) / (A_z + λD_z)  →  fD_x / D_z   as λ → ∞

y = fY/Z = f(A_y + λD_y) / (A_z + λD_z)  →  fD_y / D_z   as λ → ∞

3D lines with identical direction vectors converge to the same 2D image location
(parallel lines meet)
Compute projected point (x,y) as lambda approaches infinity [on board]:
[Figure: pinhole at COP, 3D point (X,Y,Z), image point (x,y,f)]
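A quick numerical check of the vanishing-point result; the particular point A, direction D, and focal length f below are arbitrary illustrative values.

import numpy as np

f = 1.0
A = np.array([1.0, 2.0, 5.0])   # point on the 3D line
D = np.array([3.0, 1.0, 2.0])   # direction of the line (D_z != 0)

for lam in (1.0, 10.0, 100.0, 1e4):
    X, Y, Z = A + lam * D
    print(f"lambda={lam:8.0f}  (x, y) = ({f * X / Z:.4f}, {f * Y / Z:.4f})")

# As lambda grows, (x, y) approaches (f*D_x/D_z, f*D_y/D_z) = (1.5, 0.5),
# independent of A -- so all lines with this direction share the vanishing point.
print("vanishing point:", (f * D[0] / D[2], f * D[1] / D[2]))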
Special case: Manhattan world
VP1 VP2
VP3
Consider a “city-block” world where all lines follow one of 3 directions
Special case: horizon line
Funny things happen… Parallel lines aren’t…
(Figure by David Forsyth)
Claim: all 3D lines on the ground plane meet at a horizon line
Horizon line: proof
For all points A on ground plane (Ax,-h,Az) with a direction D along ground plane (Dx,0,Dz), where will vanishing points converge to?
Equation of ground plane is Y = -h
[X]   [A_x]       [D_x]
[Y] = [−h ] + λ · [ 0 ]
[Z]   [A_z]       [D_z]

(x, y) → (fD_x / D_z, fD_y / D_z)   as λ → ∞

With D_y = 0, this limit is (fD_x / D_z, 0): the vanishing points of all ground-plane lines lie on the image line y = 0, the horizon line.
Why is horizon line not always at center of image?
Image y position: proof
Equation of ground plane is Y = -h
A point on ground plane will have y-coordinate=?
[Figure: points on the ground plane at depths Z1, Z2, Z3]
y = -fh/Z
Image height: proof
Bottom of tree: (X, -h, Z); top of tree: (X, L-h, Z)

y_top − y_bot = f(L − h)/Z − (−fh/Z) = fL/Z
Consequence of derivations for image height and parallel lines
distances and angles aren’t preserved in camera projection
Orthographic projection
Orthographic: x = X,  y = Y
Perspective:  x = fX/Z,  y = fY/Z

Life would be much simpler; we could trust angles and distances.
Scaled orthographic projection
Consider two points A = (A_x, A_y, Z) and B = (B_x, B_y, Z + ∆Z) at different depths that are far away from the camera.

If Z >> ∆Z, what happens to their image projections (e.g., a_x and b_x)?

a_x = fA_x / Z = αA_x

b_x = fB_x / (Z + ∆Z) ≈ fB_x / Z = αB_x    for ∆Z << Z,   where α = f/Z

We can approximate sets of such points with a scaled orthographic model.
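A small sketch comparing true perspective projection with the scaled orthographic approximation x ≈ αX, α = f/Z; the object distance, depth extent, and point coordinates are illustrative values only.

import numpy as np

f = 1.0
Z = 100.0          # distance to the object
alpha = f / Z

points = np.array([  # (X, Y, Z + dZ) with small depth variation dZ
    [10.0, 5.0, Z + 0.5],
    [12.0, 4.0, Z - 0.5],
    [11.0, 6.0, Z + 1.0],
])

for X, Y, Zp in points:
    x_persp = f * X / Zp            # true perspective
    x_ortho = alpha * X             # scaled orthographic approximation
    print(f"perspective x = {x_persp:.4f}   scaled-orthographic x = {x_ortho:.4f}")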
Perspective vs Orthographic
Wide angle Standard Telephoto
Scaled orthographic
Scaled orthographic
Perspective tends to matter for large objects
Funny things happen… (when the depth variation within the object is large relative to its distance from the camera)
A look back: dominant effects of perspective
• Parallel lines meet at vanishing points
• Objects further away are smaller
• Foreshortening
Fronto-parallel view Foreshortened view Perspective view
Affine “linear” warp Homography “nonlinear” warp
Rotation of far-away plane Rotation of close-by plane
From Computer Vision: Algorithms and Applications (September 3, 2010 draft):
Transformation        Matrix            # DoF   Preserves
translation           [ I | t ] (2x3)     2     orientation
rigid (Euclidean)      [ R | t ] (2x3)     3     lengths
similarity             [ sR | t ] (2x3)    4     angles
affine                 [ A ] (2x3)         6     parallelism
projective             [ H~ ] (3x3)        8     straight lines

Table 3.5 Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 2x3 matrices are extended with a third [0^T 1] row to form a full 3x3 matrix for homogeneous coordinate transformations.
…examples of such transformations, which are based on the 2D geometric transformations shown in Figure 2.4. The formulas for these transformations were originally given in Table 2.1 and are reproduced here in Table 3.5 for ease of reference.
In general, given a transformation specified by a formula x′ = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x), as given in (3.88)? Think about this for a minute before proceeding and see if you can figure it out.
If you are like most people, you will come up with an algorithm that looks something like Algorithm 3.1. This process is called forward warping or forward mapping and is shown in Figure 3.46a. Can you think of any problems with this approach?
procedure forwardWarp(f, h, out g):
    For every pixel x in f(x)
        1. Compute the destination location x′ = h(x).
        2. Copy the pixel f(x) to g(x′).

Algorithm 3.1 Forward warping algorithm for transforming an image f(x) into an image g(x′) through the parametric transform x′ = h(x).
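A minimal sketch of Algorithm 3.1 (forward warping), assuming a grayscale image and a 3x3 homogeneous transform H; both are illustrative choices, and no interpolation or hole-filling is attempted, which is exactly the problem the text hints at (destination pixels can be missed or hit more than once).

import numpy as np

def forward_warp(f, H):
    h, w = f.shape
    g = np.zeros_like(f)
    for y in range(h):
        for x in range(w):
            X, Y, W = H @ np.array([x, y, 1.0])   # destination in homogeneous coords
            xp, yp = int(round(X / W)), int(round(Y / W))
            if 0 <= xp < w and 0 <= yp < h:
                g[yp, xp] = f[y, x]                # copy source pixel to destination
    return g

# usage: rotate an image by 10 degrees about the origin
theta = np.deg2rad(10)
H = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
img = np.random.rand(64, 64)
warped = forward_warp(img, H)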
Figure 2.4 Basic set of 2D planar transformations (translation, Euclidean, similarity, affine, projective).
Translation. 2D translations can be written as x′ = x + t or

    x′ = [ I  t ] x̄                          (2.14)

where I is the (2×2) identity matrix, or

    x̄′ = [ I    t ] x̄                        (2.15)
         [ 0ᵀ   1 ]

where 0 is the zero vector. Using a 2×3 matrix results in a more compact notation, whereas using a full-rank 3×3 matrix (which can be obtained from the 2×3 matrix by appending a [0ᵀ 1] row) makes it possible to chain transformations using matrix multiplication. Note that in any equation where an augmented vector such as x̄ appears on both sides, it can always be replaced with a full homogeneous vector x̃.
Rotation + translation. This transformation is also known as 2D rigid body motion or the 2D Euclidean transformation (since Euclidean distances are preserved). It can be written as x′ = Rx + t or

    x′ = [ R  t ] x̄                          (2.16)

where

    R = [ cos θ  −sin θ ]                    (2.17)
        [ sin θ   cos θ ]

is an orthonormal rotation matrix with RRᵀ = I and |R| = 1.
Scaled rotation. Also known as the similarity transform, this transformation can be expressed as x′ = sRx + t where s is an arbitrary scale factor. It can also be written as

    x′ = [ sR  t ] x̄ = [ a  −b  t_x ] x̄,     (2.18)
                       [ b   a  t_y ]

where we no longer require that a² + b² = 1. The similarity transform preserves angles between lines.
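A short sketch of these 2D transforms in homogeneous coordinates: each 2x3 matrix is extended with a [0 0 1] row so that transforms chain by matrix multiplication. The angle, scale, and translations below are arbitrary illustrative values.

import numpy as np

def euclidean(theta, tx, ty):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0,  1]])

def similarity(scale, theta, tx, ty):
    T = euclidean(theta, tx, ty)
    T[:2, :2] *= scale          # sR block
    return T

p = np.array([1.0, 2.0, 1.0])   # augmented 2D point (x, y, 1)

T1 = euclidean(np.deg2rad(30), 5.0, 0.0)
T2 = similarity(2.0, np.deg2rad(-10), 0.0, 1.0)

# chaining: apply T1 first, then T2
p_out = T2 @ (T1 @ p)
p_out_chained = (T2 @ T1) @ p   # same result, composed as a single matrix
print(p_out, p_out_chained)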
2D Geometric Transformations
Let’s define families of transformations by the properties that they preserve
Where we are headed….
Euclidean (trans + rot) preserves lengths + angles
Euclidean
Affine
Projective
Affine: preserves parallel lines
Projective: preserves lines
but first, we’ll need tools from geometry
Agenda
• Perspective projection
• Rotations
• Camera models
Orthogonal transformations
Defn: Orthogonal transformations are linear transformations that preserve distances and angles:

aᵀb = T(a)ᵀ T(b)   where T(a) = Aa,  a ∈ Rⁿ,  A ∈ Rⁿˣⁿ

aᵀb = aᵀ AᵀA b   ⟺   AᵀA = I

[can conclude by setting a, b = coordinate vectors]

Defn: A is a rotation matrix if AᵀA = I, det(A) = 1
Defn: A is a reflection matrix if AᵀA = I, det(A) = −1
2D Rotations
R = [ cos θ  −sin θ ]
    [ sin θ   cos θ ]

1 DOF
3D Rotations
Think of R as a change of basis where r_i = R(i,:) are orthonormal basis vectors:

    [X]   [ r11  r12  r13 ] [X]
  R [Y] = [ r21  r22  r23 ] [Y]
    [Z]   [ r31  r32  r33 ] [Z]

[Figure: rotated coordinate frame with axes r1, r2, r3]

How many DOFs?
3 = (2 to point r1) + (1 to rotate about r1)
Euler’s rotation theorem
Any rotation of a rigid body in a three-dimensional space is equivalent to a pure rotation about a single fixed axis
https://en.wikipedia.org/wiki/Euler's_rotation_theorem
3D Rotations
Lots of parameterizations that try to capture the 3 DOFs
Helpful ones for vision: orthonormal matrix, axis-angle, exponential maps
Represent a 3D rotation with a unit vector pointed along the axis of rotation, and an angle of rotation about that vector
Shears

A = [ 1     h_xy  h_xz  0 ]
    [ h_yx  1     h_yz  0 ]
    [ h_zx  h_zy  1     0 ]
    [ 0     0     0     1 ]

Shears y into x
Rotations
• 3D rotations are fundamentally more complex than in 2D!
• 2D: amount of rotation
• 3D: amount and axis of rotation
(2D -vs- 3D)

[Source: 05-3DTransformations.key, February 9, 2015]
Review: dot and cross products
Dot product:   a · b = ||a|| ||b|| cos θ

Cross product:
    a × b = [ a2·b3 − a3·b2 ]
            [ a3·b1 − a1·b3 ]
            [ a1·b2 − a2·b1 ]

Cross product matrix:
    a × b = â b = [  0   −a3   a2 ] [ b1 ]
                  [  a3    0  −a1 ] [ b2 ]
                  [ −a2   a1    0 ] [ b3 ]
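A quick sketch of the cross-product ("hat") matrix, checking that hat(a) @ b equals np.cross(a, b); the two vectors are arbitrary examples.

import numpy as np

def hat(a):
    return np.array([[    0, -a[2],  a[1]],
                     [ a[2],     0, -a[0]],
                     [-a[1],  a[0],     0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(hat(a) @ b)        # [-3.  6. -3.]
print(np.cross(a, b))    # same result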
Approach
Rotate a vector x about a unit axis ω ∈ R³, ||ω|| = 1, by an angle θ.
https://en.wikipedia.org/wiki/Axis-angle_representation
Rodrigues' rotation formula
Rotate x about the unit axis ω ∈ R³, ||ω|| = 1, by angle θ; split x = x∥ + x⊥ relative to ω.

1. Write x as the sum of its components parallel and perpendicular to ω
2. Rotate the perpendicular component by a 2D rotation of θ in the plane orthogonal to ω

R = I + ω̂ sin θ + ω̂² (1 − cos θ)

where ω̂ is the cross-product matrix of ω.

[Rx can be simplified to cross and dot product computations]

https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula
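A minimal sketch of Rodrigues' formula R = I + sin(θ)·ω̂ + (1 − cos(θ))·ω̂², reusing the hat() function from the cross-product sketch above; the axis and angle values are arbitrary examples.

import numpy as np

def hat(w):
    return np.array([[    0, -w[2],  w[1]],
                     [ w[2],     0, -w[0]],
                     [-w[1],  w[0],     0]])

def rodrigues(w, theta):
    W = hat(w)
    return np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * (W @ W)

w = np.array([0.0, 0.0, 1.0])        # unit axis: rotate about z
R = rodrigues(w, np.deg2rad(90))
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))   # x-axis maps to y-axis
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))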
Exponential map representation
ω ∈ R³, ||ω|| = 1; let v = ωθ

R = exp(v̂) = I + v̂ + (1/2!) v̂² + …

[standard Taylor series expansion of exp(x) at x = 0: 1 + x + (1/2!)x² + …]

Implies that we can approximate the change in position of x due to a small rotation v as v × x, where v = ωθ.

[reduces to Rodrigues’ formula via the Taylor series expansions of sine and cosine]
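A short sketch of the exponential-map representation, R = expm(hat(ωθ)); it should agree with the Rodrigues sketch above. The hat() helper is repeated so the snippet is self-contained, and the axis/angle values are arbitrary examples.

import numpy as np
from scipy.linalg import expm

def hat(w):
    return np.array([[    0, -w[2],  w[1]],
                     [ w[2],     0, -w[0]],
                     [-w[1],  w[0],     0]])

w = np.array([1.0, 2.0, 2.0])
w = w / np.linalg.norm(w)            # unit rotation axis
theta = 0.3                          # rotation angle in radians
v = w * theta

R_exp = expm(hat(v))                 # matrix exponential of the skew matrix
R_rod = np.eye(3) + np.sin(theta) * hat(w) + (1 - np.cos(theta)) * (hat(w) @ hat(w))
print(np.allclose(R_exp, R_rod))     # True

# small-rotation approximation: the change in x is roughly v cross x
x = np.array([0.0, 1.0, 0.0])
print(R_exp @ x - x, np.cross(v, x))  # similar for small theta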
Agenda
• Perspective projection
• Rotations
• Camera models
Recall perspective projection
[Figure: pinhole at COP, 3D point (X,Y,Z), image point (x,y,1), camera axes x, y, z]

x = fX / Z
y = fY / Z
Perspective projection revisited
    [x]   [ f  0  0 ] [X]
  λ [y] = [ 0  f  0 ] [Y]
    [1]   [ 0  0  1 ] [Z]

Given (X,Y,Z) and f, compute (x,y) and λ:

λx = fX,   λ = Z   ⇒   x = λx / λ = fX / Z
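A tiny sketch of this projection, λ·[x, y, 1]ᵀ = diag(f, f, 1)·[X, Y, Z]ᵀ; the focal length and 3D point are arbitrary illustrative values.

import numpy as np

f = 2.0
P = np.array([4.0, 6.0, 10.0])            # (X, Y, Z)

K = np.diag([f, f, 1.0])
p_h = K @ P                                # homogeneous image point, lambda * (x, y, 1)
lam = p_h[2]                               # lambda equals the depth Z
x, y = p_h[:2] / lam
print(lam, (x, y))                         # 10.0, (0.8, 1.2)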
Special case: f = 1
Natural geometric intuition:
• 3D point is obtained by scaling the ray pointed at the image coordinate
• Scale factor = true depth of the point

    [x]   [X]
  Z [y] = [Y]
    [1]   [Z]

[Aside: given an image with a focal length f, resize by 1/f to obtain a unit-focal-length image]
Homogeneous notation

For now, think of the above as shorthand notation for:

[x; y; z] ∼ [X; Y; Z]   (also written [x; y; z] ≡ [X; Y; Z])

meaning   ∃ λ such that   λ [x; y; z] = [X; Y; Z]
Camera projection
3D point in world coordinates; camera extrinsics (rotation and translation); camera intrinsic matrix K (can include skew & non-square pixel size):

    [x]   [ f  0  0 ] [ r11  r12  r13  tx ] [X]
  λ [y] = [ 0  f  0 ] [ r21  r22  r23  ty ] [Y]
    [1]   [ 0  0  1 ] [ r31  r32  r33  tz ] [Z]
                                            [1]

[Figure: camera frame with axes r1, r2, r3 and translation T relative to the world coordinate frame]

[Aside: homogeneous notation is shorthand for x = λx / λ]
Fancier intrinsics

xs = sx · x,   ys = sy · y        } non-square pixels
x′ = xs + ox,  y′ = ys + oy       } shifted origin
x″ = x′ + sθ · y′                 } skewed image axes

K = [ sx  sθ  ox ]
    [ 0   sy  oy ]
    [ 0   0   1  ]

K [ f  0  0 ]   [ f·sx  f·sθ  ox ]
  [ 0  f  0 ] = [ 0     f·sy  oy ]
  [ 0  0  1 ]   [ 0     0     1  ]
Notation

    [x]   [ f·sx  f·sθ  ox ] [ r11  r12  r13  tx ] [X]
  λ [y] = [ 0     f·sy  oy ] [ r21  r22  r23  ty ] [Y]
    [1]   [ 0     0     1  ] [ r31  r32  r33  tz ] [Z]
                                                   [1]

        = K(3×3) [ R(3×3)  T(3×1) ] [X; Y; Z; 1]

        = M(3×4) [X; Y; Z; 1]

Claims (without proof):
1. A 3×4 matrix M can be a camera matrix iff its left 3×3 submatrix A is nonsingular (det(A) ≠ 0)
2. M is determined only up to a scale factor
[Using Matlab’s rows x columns]
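A sketch of the full projection M = K [R | T] applied to a world point. The intrinsics, rotation axis/angle, translation, and 3D point below are all arbitrary illustrative values (K here uses a focal length in pixels and a principal point, no skew); the rodrigues() helper repeats the earlier sketch.

import numpy as np

def hat(w):
    return np.array([[    0, -w[2],  w[1]],
                     [ w[2],     0, -w[0]],
                     [-w[1],  w[0],     0]])

def rodrigues(w, theta):
    W = hat(w)
    return np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * (W @ W)

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = rodrigues(np.array([0.0, 1.0, 0.0]), np.deg2rad(15))
T = np.array([[0.1], [0.0], [2.0]])

M = K @ np.hstack([R, T])                 # 3x4 camera matrix

Pw = np.array([0.5, -0.2, 4.0, 1.0])      # homogeneous world point
p = M @ Pw                                # lambda * (x, y, 1)
x, y = p[:2] / p[2]
print(x, y)

# M is only determined up to scale: scaling M gives the same (x, y)
p2 = (3.0 * M) @ Pw
print(p2[:2] / p2[2])                     # same image point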
Notation (more)

M(3×4) [X; Y; Z; 1] = [ A(3×3)  b(3×1) ] [X; Y; Z; 1] = A(3×3) [X; Y; Z] + b(3×1)

M = [ m1ᵀ ]      A = [ a1ᵀ ]      b = [ b1 ]
    [ m2ᵀ ]          [ a2ᵀ ]          [ b2 ]
    [ m3ᵀ ]          [ a3ᵀ ]          [ b3 ]
Applying the projection matrix

λ = [X Y Z] a3 + b3
x = (1/λ) ([X Y Z] a1 + b1)
y = (1/λ) ([X Y Z] a2 + b2)

Set of 3D points that project to x = 0:            [X Y Z] a1 + b1 = 0
Set of 3D points that project to y = 0:            [X Y Z] a2 + b2 = 0
Set of 3D points that project to x = ∞ or y = ∞:   [X Y Z] a3 + b3 = 0

Rows of the projection matrix describe the 3 planes defined by the image coordinate system.

[Figure: image plane with axes x, y; COP; plane normals a1, a2, a3; image point (x,y) and 3D point (X,Y,Z)]
What’s the set of (X,Y,Z) points that project to the same (x,y)?

[X; Y; Z] = λ w + c,   where   w = A⁻¹ [x; y; 1],   c = −A⁻¹ b

What’s the position of the COP / pinhole?

A [X; Y; Z] + b = 0   ⇒   [X; Y; Z] = −A⁻¹ b

(so c above is the COP: the points projecting to (x,y) form the ray through the COP with direction w)
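A quick sketch of recovering the pinhole / COP from a camera matrix M = [A | b]: the COP is −A⁻¹b, and every point λw + c on the back-projected ray of (x, y) projects to the same pixel. The matrix below is an illustrative example (identity rotation, arbitrary intrinsics and translation); any camera matrix with nonsingular A would do.

import numpy as np

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([[0.1], [0.0], [2.0]])
M = K @ np.hstack([R, T])

A, b = M[:, :3], M[:, 3]

cop = -np.linalg.inv(A) @ b
print("COP:", cop)

# back-project a pixel and check that all points on the ray project to it
x, y = 400.0, 260.0
w = np.linalg.inv(A) @ np.array([x, y, 1.0])
for lam in (1.0, 5.0, 50.0):
    P = lam * w + cop
    p = M @ np.append(P, 1.0)
    print(p[:2] / p[2])          # always (400, 260)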
Other geometric properties
Draw a plane in front of the pinhole. Write (x,y) for normalized coordinates and (u,v) for image coordinates.
Affine cameras
m3ᵀ = [0 0 0 1]
(perspective vs. weak perspective)
Affine cameras
Captures 3D affine transformation + orthographic projection + 2D affine transformation
[x; y] = A_2D(2×3) · Π(3×4) · A_3D(4×4) · [X; Y; Z; 1]

where Π = [1 0 0 0; 0 1 0 0; 0 0 1 0] drops the last coordinate (orthographic projection) and A_2D, A_3D are 2D and 3D affine transformations. Multiplying out:

[x]   [ a11  a12  a13  b1 ] [X; Y; Z; 1]
[y] = [ a21  a22  a23  b2 ]

    = [ a11  a12  a13 ] [X; Y; Z] + [ b1 ]
      [ a21  a22  a23 ]             [ b2 ]

x = A X + b

• Projection defined by 8 parameters
• Parallel lines project to parallel lines
• 2D points = linear projection of 3D points (+ 2D translation)
Affine Cameras
• Example: Weak-perspective projection model
• Projection defined by 8 parameters
• Parallel lines project to parallel lines
• The transformation can be written as a direct linear transformation plus an offset
Image coordinates (x,y) are an affine function of world coordinates (X,Y,Z)
m3ᵀ = [0 0 0 1]

x = [X Y Z] a1 + b1
y = [X Y Z] a2 + b2
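A brief sketch of an affine (weak-perspective) camera: image coordinates are an affine function of world coordinates, x = A X + b. The A, b, and points below are arbitrary illustrative values; note that parallel 3D lines stay parallel in the image, unlike under full perspective.

import numpy as np

A = np.array([[0.8, 0.1, 0.0],
              [0.0, 0.9, 0.1]])
b = np.array([10.0, 20.0])

def affine_project(P):
    return A @ P + b

# two parallel 3D lines (same direction D, different offsets)
D = np.array([1.0, 2.0, 0.5])
P1, P2 = np.array([0.0, 0.0, 5.0]), np.array([3.0, -1.0, 6.0])

d1 = affine_project(P1 + D) - affine_project(P1)   # image direction of line 1
d2 = affine_project(P2 + D) - affine_project(P2)   # image direction of line 2
print(d1, d2)                                      # identical: parallel lines stay parallel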
Geometric Transformations
Euclidean (trans + rot) preserves lengths + angles
Euclidean
Affine
Projective
Affine: preserves parallel lines
Projective: preserves lines