PhotoMeter: Easy-to-use MonoGraphoMetrics
Visual Metrology from Calibrated Images
Hendrik Müller
In cooperation with:

Prof. Jarek Rossignac
GVU Center, IRIS Cluster, College of Computing
Georgia Institute of Technology, Atlanta, GA

Dr. Markus Wacker
Department of Computer Graphics
University of Technology Dresden
Dresden, July 30th 2004
Acknowledgements

I have enjoyed my work at the GVU Center at the College of Computing at the Georgia Institute of Technology immensely. It has been a fantastic experience, extremely useful for my educational and personal growth. Being surrounded by helpful and outgoing people in a friendly atmosphere contributed immeasurably to the success of this work. I extend many thanks to Prof. Jarek Rossignac for giving me this opportunity. Additionally, I would like to thank Alexander Powell for his guidance and help in developing the user interface for PhotoMeter, and Lawrence Ibarria and Francisco Palop for their suggestions and support.
I also want to acknowledge Dr. Markus Wacker's seamless takeover of my project and his guidance upon my return from Atlanta, GA to Dresden. Finally, I express my gratitude to Mark Wittman, who proofread the whole thesis and suggested improvements to the grammar and fluency of my English.
Figure 1: Waterfall by M. C. Escher (1961). This lithograph illustrates perspective absurdities. The waterfall appears to be fed with its own water, yet the picture's perspective projection is correct. The effect occurs because, from a specific viewpoint, well-chosen parts of the flume overlap. Pictures like this one show why it is necessary to understand the issues of perspective. One way to find out whether this building could actually exist is to create a 3D model of it. Creating three-dimensional models from single images is a main topic of the present thesis.
Abstract

MonoGraphoMetrics is the science of computing 3D measures from a single image. Several recent research activities have produced theoretical principles and practical tools for
extracting 3D measures from uncalibrated photographs using either single- or multiple-view metrology. Some tools require that the user identify configurations of edges, which
are used to establish constraints for the camera model or to identify planes in the scene.
Other tools detect vanishing lines and vanishing points automatically to compute the cam-
era position as a prerequisite to compute several dimensions. The process involved is often
laborious and its application is limited to images where the required configurations of
edges are visible. In contrast, the work presented here is limited to photographs taken with
a calibrated camera. More specifically, I assume a horizontally oriented camera at a known height above the floor and with a known field of view. Under these simple
conditions, a single mouse operation (point-press-drag-release) provides enough information to compute the 3D position of any point P with respect to the coordinate system of the camera, provided that P and its projection F on the floor can be identified in the image. Horizontal and vertical displacements between pairs of points are displayed as 3D
measurements (arrows and dimension labels). This approach is the basis of the interactive
PhotoMeter system, in which a novice user can easily measure indoor and outdoor spaces
with only one or two mouse clicks per measurement. The resulting measurements yield the dimensions and positions of buildings, rooms, offices, windows, doors, pieces of furniture,
and even people. Through theoretical and experimental analysis, I will discuss how errors in lens distortion, camera calibration, height, and alignment impact the accuracy of the reported measurements. For example, I achieved errors of less than five centimeters when measuring walls, doors, or furniture at five meters from the camera. I envision applications of this simple technology in a variety of fields, including architecture, home acquisition, interior decoration, and the online purchase of furniture.
German Summary

MonoGraphoMetrics is the science of computing three-dimensional measures from a single image. Recent research activities have developed theoretical principles and practical tools for extracting 3D measures from uncalibrated photographs through single- or multiple-view metrology. Some tools assume that the user identifies configurations of edges in the image, which are later used to establish constraints for the camera model and to detect planes in the scene. Other tools automatically detect vanishing lines and vanishing points to determine the camera position, which serves as the prerequisite for computing numerous dimensions. The associated process is often laborious, and its applications are limited to images in which the required edge configurations are recognizable. In contrast, the work presented here is limited to photographs taken with a calibrated camera. I require a horizontally oriented camera at a known height above the floor and with a known field of view. Under these simple conditions, a single mouse operation (click-drag-release) provides enough information to compute the 3D position of any point P relative to the coordinate system of the camera, provided that the point P and its projection F onto the floor can be identified in the image. Horizontal and vertical displacements between pairs of points are displayed as 3D measures (arrows with dimensions). This approach forms the basis of the interactive PhotoMeter system, with which a novice can easily measure indoor and outdoor spaces, such as buildings, rooms, offices, windows, doors, pieces of furniture, and even people, with only one or two mouse clicks per measurement. Through theoretical and experimental analyses, I discuss how errors from lens distortion, calibration, and the height and alignment of the camera affect the reported measurements. For example, I achieve an inaccuracy of less than five centimeters for walls, doors, or furniture located five meters from the camera. Thanks to its simple technology, this application can be used in a variety of fields, such as architecture, interior decoration, and even the online purchase of furniture.
Table of Contents

Acknowledgements.........................................................................................................ii
Abstract........................................................................................................................iv
German Summary ..........................................................................................................v
Table of Contents ..........................................................................................................vi
List of Figures .............................................................................................................. viii
1 Introduction............................................................................................................. 1
1.1 Measurements from images.............................................................................1
1.2 Using vision vs. reality ....................................................................................2
1.3 Thesis outline .................................................................................................4
2 Projective Geometry ................................................................................................ 5
2.1 Projective geometry and art history..................................................................5
2.1.1 The visual pyramid & Alberti’s “open window”.....................................6
2.1.2 Alberti’s perspective construction.........................................................7
2.1.3 Pelerin’s “distance point construction”..................................................8
2.2 Camera models and perspective mappings ......................................................9
2.2.1 Pinhole camera model ........................................................................9
2.2.2 Plane-to-plane homography...............................................................11
2.2.3 Plane-to-plane homology...................................................................12
2.3 Distortion correction .....................................................................................13
2.4 Vanishing points and vanishing lines ..............................................................14
3 Previous & Related Work ....................................................................................... 16
3.1 Metrology & modeling from a single image ...................................................17
3.1.1 “Tour Into the Picture” ......................................................................18
3.1.2 “Single-view metrology” ...................................................................20
3.2 Metrology & modeling from multiple images ..................................................23
3.2.1 Using two views ...............................................................................23
3.2.2 Using three or more views.................................................................24
3.3 Metrology & modeling from panoramas?.......................................................24
4 PhotoMeter: Easy-to-use MonoGraphoMetrics ......................................................... 26
4.1 Introduction .................................................................................................26
4.2 Computations...............................................................................................27
4.2.1 Main Idea........................................................................................27
4.2.2 Mathematical/geometrical calculations ..............................................28
4.2.3 Validation with “3D Measurement Simulation” ...................................32
4.3 Requirements & Preliminaries ........................................................................34
4.4 User Interface ..............................................................................................36
4.5 Implementation & Documentation ..................................................................39
4.5.1 Implementation decisions ..................................................................39
4.5.2 Class diagrams.................................................................................40
4.5.3 Activity diagrams..............................................................................40
4.6 Results.........................................................................................................43
5 Conclusion ............................................................................................................ 47
Bibliography ............................................................................................................... 48
CD Contents................................................................................................................ 50
List of Figures

Figure 1: Waterfall by M. C. Escher (1961). ....................................................................iii
Figure 2: A three-dimensional visual measuring device. ....................................................1
Figure 3: Errors from input to output. ..............................................................................2
Figure 4: First proof of perspective effect by Leonardo. ....................................................3
Figure 5: From left to right: Filippo Brunelleschi and Leon Battista Alberti. ..........................5
Figure 6: The model of a visual pyramid..........................................................................6
Figure 7: Leonardo da Vinci The Annunciation (c. 1472-1475). ........................................7
Figure 8: Alberti’s perspective construction. .....................................................................8
Figure 9: Two applications of perspective construction with Alberti’s method......................8
Figure 10: Pelerin's distance point construction process. ...................................................9
Figure 11: Pinhole camera model. ..................................................................................9
Figure 12: Leonardo’s perspectograph; A perspective machine from Albrecht Dürer ........10
Figure 13: Plane-to-plane camera model. ......................................................................11
Figure 14: Inter-image homography. .............................................................................12
Figure 15: Planar homology is defined by a vertex, an axis, and a characteristic ratio. ....12
Figure 16: Radial distortion correction. .........................................................................14
Figure 17: Extracted parallel lines, vanishing points, and a vanishing line. .......................15
Figure 18: "Tour into the picture" process overview. .......................................................18
Figure 19: "Tour into the picture": a) spidery mesh; b) five regions. .................................19
Figure 20: Basic geometry of Criminisi’s approach. ........................................................21
Figure 21: Distance between two planes. ......................................................................21
Figure 22: Distance between two planes (2). .................................................................22
Figure 23: An example of results obtained with Criminisi’s single-view metrology. ............23
Figure 24: Stereovision system......................................................................................23
Figure 25: Vanishing point and vanishing circle for a spherical panorama image. .............25
Figure 26: Using a spherical image to create other TIP views. .........................................25
Figure 27: Using a cylindrical image to create other TIP views. .......................................25
Figure 28: PhotoMeter's applications. ...........................................................................26
Figure 29: The basic setup of the 3D scene (left) and basic labels in the scene (right). ......27
Figure 30: The sequence shows how to measure a window with two mouse clicks.............28
Figure 31: 3D view showing the geometric computation of the horizontal distance. ..........29
Figure 32: Top view showing how to compute the x and z coordinates. ...........................29
Figure 33: Side view of vertical section plane defined in Figure 32..................................29
Figure 34: Implementation of the basic PhotoMeter computations....................................31
Figure 35: The two different views of 3D Measurement Simulation. ................................33
Figure 36: A section of 3D Measurement Simulation. .....................................................33
Figure 37: The comparison process. ..............................................................................34
Figure 38: Initial camera calibration panel in PhotoMeter...............................................35
Figure 39: An inexpensive laser level for adjusting the horizontal. ..................................35
Figure 40: Start screen of PhotoMeter...........................................................................36
Figure 41: Menu of PhotoMeter with all submenus. ........................................................37
Figure 42: The user interface of PhotoMeter. .................................................................38
Figure 43: PhotoMeter panels. .....................................................................................38
Figure 44: Main UML class diagram of PhotoMeter. ......................................................41
Figure 45: UML activity diagram explaining the use of panels and dialogs. .....................42
Figure 46: UML activity diagram explaining mouse actions in the window. ......................43
Figure 47: Acceptable tilt as a function of depth. ...........................................................44
Figure 48: Error in computed height as a function of distance for a tilt of one degree. ......44
Figure 49: Position error as a function of inaccuracy of the camera height. ......................45
Figure 50: Error in computed height as a function of distance due to pixel resolution. ......45
Figure 51: The error in the dimensions measured with PhotoMeter. .................................46
1 Introduction

In this section, I will introduce and motivate metrology from images. I will present the basic model and differentiate between metrology from vision and metrology from reality.
1.1 Measurements from images

Images or sequences of images potentially carry a tremendous amount of geometrical information about the represented scene. It is one aim of computer graphics and computer
vision to extract this information in a quantifiable, accurate way. By interpreting photo-
graphs, general techniques are developed to reconstruct a three-dimensional digital model
of a scene. The leading idea is as follows (see Figure 2):
(i) A person takes some pictures of the scene (or the object) to be measured.
(ii) A computer creates a 3D metric model of the viewed scene by interpreting
those images.
(iii) Finally, the model is stored in a database, which may be queried at any
time for measurements via a graphical user interface.
Figure 2: A three-dimensional visual measuring device: (i) a photograph of a scene is taken; (ii) the image is transferred into a computer and interpreted; (iii) a 3D model of the viewed scene is reconstructed and interactively queried for measurements.
Such a device possesses several interesting features:
(i) It is user friendly. In fact, once the images are taken and the model is built, the user can virtually walk through it, view the scene from different locations, take measurements by querying the software interface and store them in a database, interact with the objects of the scene, place new, consistent objects in the scene (augmented reality), and create animations;
(ii) The capture process is rapid and simple since it only involves one camera
taking pictures of the environment to be measured;
(iii) The acquired data are stored digitally on a disk, ready for reuse at any time
when measurements of the original scene and its objects are needed;
(iv) The hardware involved is cheap and easy to use. No new, dedicated hard-
ware is necessary.
The work presented here first gives an overview of existing single- and multiple-view metrology methods and the simple geometric mathematics underlying them. As the main topic, it then presents a new way of extracting measurements from a single image taken with a calibrated camera, using a special graphical user interface.
The process of taking measurements is traditionally a difficult engineering task and, like all engineering tasks, must be accurate and robust. Even though the image is taken with a calibrated camera, the input data, all transformations, and the output measurement are affected by error (see Figure 3).
Figure 3: Errors: the input data are processed by the transformation T to obtain the required output measurement. Input data and transformations are affected by error, leading to error in the output measurement.
Because the process presented here is highly error-prone, I will provide a detailed error analysis at the end of this thesis.
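To make the error chain of Figure 3 concrete, the following toy sketch (my own illustration, not part of PhotoMeter) shows how an error in one input propagates through a transformation to the output. It assumes the simple calibrated-camera model used later in this thesis: a horizontal camera at height h sees a floor point y pixels below the image center at depth Z = f·h/y. All numbers are purely illustrative.

```python
# Toy illustration of Figure 3: an error in an input (the camera height h)
# propagates through the transformation T to the output (the depth Z).
# Assumed model: Z = f * h / y for a floor point imaged y pixels below
# the image center by a horizontal camera at height h.

def depth_from_floor_pixel(f, h, y):
    """Depth of a floor point whose image lies y pixels below the center."""
    return f * h / y

f = 1000.0        # focal length in pixels (illustrative)
y = 280.0         # vertical pixel offset of the floor point
h_true = 1.40     # true camera height in meters
h_measured = 1.43 # camera height with a 3 cm measurement error

z_true = depth_from_floor_pixel(f, h_true, y)        # 5.0 m
z_measured = depth_from_floor_pixel(f, h_measured, y)

# Since Z is proportional to h, the relative input error passes
# unchanged to the output: dZ/Z = dh/h (about 2.1 % here).
print(f"true depth {z_true:.3f} m, measured {z_measured:.3f} m")
print(f"relative error {abs(z_measured - z_true) / z_true:.4f}")
```

The same pattern applies to the other inputs of Figure 3 (pixel coordinates, calibration parameters), each contributing its own share to the output error.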
1.2 Using vision vs. reality

For digital measurements of the real world, several different methods of distance measuring are used. Devices that help to obtain these dimensions can be categorized as active
and passive devices. Active and passive devices differ in their acquisition of the measure-
ments. Active devices send signals into the environment and receive reflected ones back.
The returned signals are analyzed and compared with the outgoing ones to retrieve information related to distances. In contrast, passive devices do not use any active signals; they rely on subsequent analysis of the given data, e.g. a photograph taken with a camera.
Many distance-measuring systems have been based on ultrasonic technology. It is
possible to buy relatively cheap ultrasonic devices capable of measuring the distance from
the operator to an object (such as a wall) with an echo reflection time measurement sys-
tem. Ultrasonic scanners have, for instance, been successfully used in medical imaging and
in robotic problems [FM94, SS94]. The main problem of such an approach is that the
measurement returned is affected by spurious phenomena such as multiple reflections of the ultrasound waves on objects, thus leading to wrong estimates of the reflection time. A second approach for measuring depths is the use of laser range finders. These devices work
by directing laser beams onto the object to be measured and analyzing the phase or echo-
ing time of the reflected beams [Le99]. Those systems are extremely accurate, but they suffer from problems similar to those of ultrasonic devices. Other active devices employ cameras to
acquire images of an object illuminated by a regular light pattern projected by special
auxiliary devices. The shape of the object is computed from the deformation of the pro-
jected grid [Ma92]. Consequently, active devices are affected by reflections or interference and need to be used with care. Passive devices such as cameras do not suffer from
the above problems and are characterized by a wide range of applications. They can be
applied to measure the distance of the device from an object as well as the distance between two arbitrary points in space, surface areas, and angles. Cameras return two-dimensional data (rather than the one-dimensional data of active devices) by densely sampling the field of view. Thus, they can measure distant as well as nearby objects, as long as these are visible in the image. Speed is not an issue for such devices, because a photograph is taken in a few milliseconds. Despite these advantages, many issues and problems arise when using cameras.
Taking measurements of the world from images is complicated by the fact that in the
imaging process the 3D space is projected onto a planar image, with some unavoidable
loss of information. Reconstruction of the scene means retrieving that information from the
images and/or the camera’s internal parameters. In particular, perspective distortions oc-
cur during the acquisition stage (see Figure 4). For example, objects that are far away
from the eye or the camera look smaller than objects that are close. Furthermore, camera
properties lead to lens distortion and other issues.
Figure 4: First proof of the perspective effect by Leonardo: "Among objects of equal size that which is most remote from the eye will look smallest."
1.3 Thesis outline

The remainder of this thesis is organized as follows: In section 2, I give an overview of
mathematical foundations and the development of projective geometry. I start with a jour-
ney into art, describe camera models, and provide basics about vanishing points and van-
ishing lines. Previous and related work about metrology from single views, multiple views,
and panoramas are explained in section 3. I delve into “Tour into the picture” and single-
view metrology as two examples of visual metrology. Section 4 is dedicated to the results
of my research in single-view metrology. There, I will primarily describe the method used
to compute the location of 3D points from a single mouse click. This also includes descrip-
tions of the developed system’s user interface and its implementation decisions. Further-
more, results from simulations and real data are provided and discussed. Finally, I con-
clude this thesis with a short summary and a discussion of possible future work in the last
section.
2 Projective Geometry

In this section, I will introduce the mathematical foundations used in metrology from images. This involves a journey into art history, a presentation of different camera models, and a short discussion of distortion correction. Readers already familiar with this material may skip this section.
To begin with projective geometry, we first need to make sure we understand the
differences between transformations and projections. Transformations are mappings within
n-dimensional space, for example the movement of a point in a 3D space. In contrast, a
projection is a mapping from n-dimensional space down to lower-dimensional subspace,
e.g. the mapping of a point in 3D to a point on a plane (a 2D entity). Here, we are inter-
ested in such 3D to 2D projections where the plane is an image. Basically, there are two
different kinds of projection: orthographic projection and perspective projection. In orthographic projection, perspective effects do not occur: distant objects look the same size as near ones. In perspective projection, farther objects appear smaller than nearer ones, while straight lines are still mapped to straight lines.
In the present thesis, only the perspective projection model is used and explained, since images taken with real cameras follow this model. For further details on the topics of this section, I refer to [A1435], [Ae1435], and [Crim01].
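The contrast between the two projections can be illustrated with a few lines of code (a minimal sketch of my own, not part of any system discussed in this thesis): two vertical segments of equal height are projected, one near and one far. The orthographic image ignores depth entirely, while the perspective image scales heights by 1/Z.

```python
# Illustrative comparison of orthographic vs. perspective projection.
# Two vertical segments of equal height (1 m), at depths 2 m and 8 m.

def orthographic(x, y, z):
    """Orthographic projection: simply drop the depth coordinate."""
    return (x, y)

def perspective(x, y, z, f=1.0):
    """Perspective projection onto an image plane at focal distance f."""
    return (f * x / z, f * y / z)

near_top, near_bottom = (0.0, 1.0, 2.0), (0.0, 0.0, 2.0)
far_top, far_bottom = (0.0, 1.0, 8.0), (0.0, 0.0, 8.0)

def image_height(project, top, bottom):
    return project(*top)[1] - project(*bottom)[1]

# Orthographic: both segments have the same image height (1.0).
print(image_height(orthographic, near_top, near_bottom))   # 1.0
print(image_height(orthographic, far_top, far_bottom))     # 1.0

# Perspective: the far segment appears 4x smaller (0.5 vs. 0.125),
# in proportion to its 4x greater depth.
print(image_height(perspective, near_top, near_bottom))    # 0.5
print(image_height(perspective, far_top, far_bottom))      # 0.125
```

Straightness is preserved in both cases; only the depth-dependent scaling distinguishes the perspective model.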
2.1 Projective geometry and art history
Figure 5: From left to right: Filippo Brunelleschi and Leon Battista Alberti.
In the early renaissance, painters started using perspective projection to come up
with mathematically correct paintings. Ironically, the invention of mathematical rules for
correct perspective came not from a painter, but from the sculptor and architect Filippo
Brunelleschi (1377-1446). He did not write down his deliberations, and as a result the first
written account of a method to construct pictures in correct perspective is found in a trea-
tise written by the learned humanist Leon Battista Alberti (1404-1472). The first version,
written in Latin and dated 1435, was entitled "De Pictura" (On Painting), and was Alberti's effort to relate the development of painting in Florence to his own theories of art.
Later, in 1505, Jean Pelerin (1445-1522) published his book "De Artificiali Perspectiva", which describes another method of perspective construction. In the following, I will go into the details of the work of Alberti [A1435, Ae1435] and Pelerin.
2.1.1 The visual pyramid & Alberti's "open window"

Alberti used the established notion of the visual pyramid to understand perspective. He
described the visual pyramid as “…a figure of a body from whose base straight lines are
drawn upward, terminating in a single point. The base of this pyramid is a plane, which is
seen. The sides of the pyramid are those rays, which I have called extrinsic. The cuspid,
that is the point of the pyramid, is located within the eye where the angle of the quantity
is…” [Ae1435] (see Figure 6).
Figure 6: The model of a visual pyramid (also known as pyramid of vision or visual cone).
He defined a two-dimensional image as a cutting plane through the visual pyramid. To explain this, he proposed the following thought experiment: imagine an image plane made of glass. The painter/viewer is located at the peak of the pyramid. Now, when looking from
the peak through the pyramid, you can see lots of different planes cutting through the im-
age plane; some are parallel to the cutting edges, others are perpendicular, transversal,
or horizontal. Projecting these lines onto the image plane yields the correct perspective projection. Thus, the perspectively correct image arises from a linear projection of the base plane of the pyramid onto the image plane.
In a different way, this method was explained with an “open window” [A1435]: To
use linear perspective an artist must first imagine the picture surface as an “open window”
through which he sees the painted world. Straight lines are then drawn on the canvas to
represent the horizon. “Visual rays” connect the viewer's eye to a point in the distance.
The horizon line runs across the canvas at the eye level of the viewer. The horizon line is
where the sky appears to meet the ground. The vanishing point should be located near the
center of the horizon line. Accordingly, it is located where all the parallel lines running towards the horizon, which he called orthogonals, appear to come together like train tracks in the distance. Orthogonal lines are "visual rays" helping the viewer's eye to connect points from the edges of the canvas to the vanishing point. An artist uses them to align
the edges of walls and paving stones. Leonardo learned this method as an apprentice in
Florence and produced his painting “Annunciation” (see Figure 7) when he was only 21
years old.
Figure 7: Leonardo da Vinci, The Annunciation (c. 1472-1475): shows the correctness of perspective in his painting. The green horizontal line is the vanishing line, with the red circle representing the vanishing point, which is the location where all parallel lines (orthogonals, drawn in blue) cross.
2.1.2 Alberti's perspective construction

Using the previously described theory, Alberti came up with his perspective construction of
an image from the real world. He explained it with a floor or pavement. Referring to
Figure 8(a), the centric point C is chosen; this is the point in the picture directly opposite the viewer's eye. It is also known as the central vanishing point. The ground line AB in the picture is divided equally, and each division point is joined to C by a line. These lines represent the lines that run perpendicular to the plane of the picture. In Figure 8(b), the point R is determined by setting NR to the viewing distance. The viewing distance is how far the painter was from the picture, and thus how far a viewer should stand from it. R is known as the right diagonal vanishing point. The lines from the edge AB to the centric point C are intersected by the lines converging at R. As seen in Figure 8(c), these intersection points, highlighted as red dots, are the basis for drawing lines perpendicular to the line NB. These lines are called transversals and run parallel to the ground line AB of the picture. Notice that the diagonal points of the squares in the grid can be joined by a straight line. This indicates that Alberti's construction shows the ground plane of a picture in correct perspective. To summarize, the result of this construction is a checkerboard-like floor (green in Figure 8(c)) in correct perspective relative to a specific viewing
distance. The construction of the floor can be used to draw other objects such as cuboids or circles (see Figure 9). Alberti became famous for this method, even though he neither explained it completely nor proved its correctness.
Figure 8: Alberti’s perspective construction, original [A1435] (left). The construction process (right).
Figure 9: Two applications of perspective construction with Alberti's method: (a) the rectangle PQRS is the base plane of the cuboid. The perspective construction of this plane originates in the perspective floor; the whole cuboid is then created. (b) A circle drawn in perspective.
2.1.3 Pelerin's "distance point construction"

Historical records show that besides Alberti's construction, there were other methods for
constructing floors. One of them is known as the distance point construction, and was
found in the treatise of Jean Pelerin (1445-1522) entitled “De Artificiali Perspectiva”. Re-
ferring to Figure 10(a), the ground line is
€
AB is divided equally, and each of these divi-
9
9
sion points are joined to the centric point
€
C . Next, the distance point
€
D is chosen. The dis-
tance
€
CD is the viewing distance. As in Figure 10(b), the line
€
AD will intersect all the or-
thogonals. These intersection points are used to draw the transversals, which are parallel
to
€
AB . The result is a floor as accurate in perspective Alberti’s method would produce.
This could be proven geometrically, as discussed in more detail in [NUS04].
Figure 10: Pelerin's distance point construction process.
2.2 Camera models and perspective mappings
Now that we have gone through perspective constructions in art history, we need to address the mathematics. In current single- and multiple-view metrology, the image formation process must be modeled in a rigorous mathematical way. This section describes the camera models and the projective transformations that are relevant for these methods.
2.2.1 Pinhole camera model
The pinhole camera is the simplest, and the ideal, model of camera function. It has an infinitesimally small hole through which light enters before forming an inverted image on the camera surface facing the hole. The intersections of the light rays with the camera surface plane form the image of the object. To simplify things, we usually model a pinhole camera by placing the image plane between the focal point of the camera and the object, such that the image is not inverted. Such a mapping from three dimensions onto two dimensions is called perspective projection: all projection rays are straight lines that pass through a single point. A schematic pinhole camera model is presented in Figure 11.
Figure 11: Pinhole camera model: a point X in 3D space is imaged as x. Euclidean coordinates X, Y, Z and x, y are used for the world and image reference systems, respectively. O is the center of projection, the viewer.
Simple geometry shows that if we denote the distance of the image plane to the center of projection by f (also known as the focal length), then the image coordinates x = (x, y)^T are related to the object coordinates X = (X, Y, Z)^T by similar triangles:

    x = f·X/Z,  y = f·Y/Z.   (1)
These equations are non-linear. They can be made linear by introducing homogeneous coordinates, which effectively embeds Euclidean geometry into the projective framework. Each point (X, Y, Z) in three-space is mapped onto a line in four-space given by (WX, WY, WZ, W), where W is a dummy variable that sweeps out the line (W ≠ 0). In homogeneous coordinates, the perspective projection onto the plane is given by x = PX with

        | f 0 0 0 |        | x |   | f 0 0 0 | | X |
    P = | 0 f 0 0 | , i.e. | y | = | 0 f 0 0 | | Y |   (2)
        | 0 0 1 0 |        | w |   | 0 0 1 0 | | Z |
                                               | W |
The “=” in equation (2) stands for equality up to scale; the result must be multiplied by a scale factor to obtain the actual image coordinates. This, however, is not important here. The camera model is completely specified once the matrix P is determined. The matrix can be computed from the relative positioning of the world points and camera center, and from the camera's internal parameters (focal length); however, it can also be computed directly from image-to-world point correspondences as shown in (1).
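As a minimal sketch, the projection of equation (2) can be written in a few lines of Python with NumPy; the point coordinates are illustrative only:

```python
# Pinhole projection in homogeneous coordinates, as in equation (2).
import numpy as np

def project(X, f):
    """Project a 3D point X = (X, Y, Z) with focal length f via the
    homogeneous matrix P, then de-homogenize (divide by w)."""
    P = np.array([[f, 0, 0, 0],
                  [0, f, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
    Xh = np.append(np.asarray(X, dtype=float), 1.0)  # (X, Y, Z, 1)
    x, y, w = P @ Xh
    # Dividing by w recovers x = f*X/Z, y = f*Y/Z from equation (1).
    return float(x / w), float(y / w)

# Example: a point at depth Z = 4, focal length f = 2.
print(project((2.0, 1.0, 4.0), f=2.0))  # → (1.0, 0.5)
```

Note that the division by w is exactly the "equality up to scale" of equation (2) made explicit.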
Leonardo da Vinci and Albrecht Dürer were two of the first painters to use the tech-
nique of a pinhole camera model to paint pictures with correct perspective (see Figure
12). They used special machines instead of geometric calculations.
Figure 12: left: Leonardo's perspectograph. "The things approach the point of the eye in pyramids, and these pyramids are intersected on the glass plane." Leonardo da Vinci (1452-1519); right: A perspective machine from Albrecht Dürer's "Underweysung der Messung mit Zirckel und Richtscheyt" (1525).
2.2.2 Plane-to-plane homography
An interesting specialization of the general central projection described above is a plane-to-plane projection, a 2D-to-2D projective mapping. The camera model for perspective images of planes, mapping points on a world plane to points on the image plane (and vice versa), is well known [SeKn79]. Points on one plane are mapped to points on another plane by a plane-to-plane homography, also known as a planar projective transformation. It is an invertible mapping induced by the star of rays focused in the camera center (center of projection). Planar homographies arise, for example, when a planar world surface is imaged.
Figure 13: Plane-to-plane camera model: a point X in the world plane is imaged as x. Euclidean coordinates X, Y and x, y are used for the world and image planes, respectively. O is the viewer's position.
This homography can be described by a 3x3 homogeneous matrix. Figure 13 shows the imaging process. Under perspective projection, corresponding points X = (X, Y)^T in the world plane and x = (x, y)^T in the image plane are related homogeneously by x = HX with

        | f 0 0 |        | x |   | f 0 0 | | X |
    H = | 0 f 0 | , i.e. | y | = | 0 f 0 | | Y |   (3)
        | 0 0 1 |        | w |   | 0 0 1 | | W |
The W and w in this equation are necessary to apply matrix transformations such as rotation, scaling, and translation to the object. The “=” again stands for equality up to scale. The camera model is completely specified once the matrix is determined. Here too, the matrix can be computed from the relative positioning of the two planes and the camera center, together with the camera's internal parameters; however, it can also be computed directly from image-to-world correspondences.
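One common way to realize the "computed directly from image-to-world correspondences" remark is the standard direct linear transform (DLT). The sketch below estimates H (up to scale) from four point correspondences; the coordinates are hypothetical:

```python
# Hedged sketch: estimating a plane-to-plane homography H (x = HX, up to
# scale) from four point correspondences with the DLT method.
import numpy as np

def homography_from_points(world_pts, image_pts):
    """world_pts, image_pts: four (X, Y) / (x, y) pairs in correspondence."""
    A = []
    for (X, Y), (x, y) in zip(world_pts, image_pts):
        # Each correspondence contributes two linear equations in the
        # nine entries of H.
        A.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y, -x])
        A.append([0, 0, 0, X, Y, 1, -y * X, -y * Y, -y])
    # H is the null vector of A, taken from the SVD.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def apply_h(H, X, Y):
    x, y, w = H @ np.array([X, Y, 1.0])
    return float(x / w), float(y / w)

world = [(0, 0), (1, 0), (0, 1), (1, 1)]
image = [(1, 3), (3, 3), (1, 5), (3, 5)]   # here: scale by 2, shift by (1, 3)
H = homography_from_points(world, image)
print(apply_h(H, 0.5, 0.5))                # → close to (2.0, 4.0)
```

Because H is only defined up to scale, the recovered matrix may differ from the "true" one by an overall factor, which cancels in the final division by w.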
The plane-to-plane camera model can also be used to understand inter-image homography. A planar surface viewed from two different viewpoints induces a homography between the two images. Points on the world plane can be transferred from one image to the other by means of a homography mapping (see Figure 14). A few visible correspondences are enough to determine the projection matrix.
Figure 14: Inter-image homography: the floor viewed in both images induces a homography. Some points can be mapped from one image to the other.
2.2.3 Plane-to-plane homology
A planar homology is a plane-to-plane projective transformation and a specialization of the homography. It is characterized by a line of fixed points, called the axis, and a distinct fixed point not on the axis, known as the vertex (see Figure 15). Planar homologies arise in several imaging situations, for instance when different light sources cast shadows of an object onto the same plane.
Figure 15: A planar homology is defined by a vertex, an axis, and a characteristic ratio: its characteristic invariant is given by the cross-ratio (v, p1, p2, ip), where p1 and p2 are any pair of points mapped by the homology and ip is the intersection of the line through p1 and p2 with the axis. The point p1 is projected onto the point p2 under the homology, and similarly for q1 and q2.
Such a transformation is defined by a 3x3 matrix H with one distinct eigenvalue, whose corresponding eigenvector is the vertex, and two repeated eigenvalues, whose corresponding eigenvectors span the axis. A planar homology can be interpreted as a particular planar homography. The projective transformation representing the homology can be given directly in terms of the 3-vector a representing the axis, the 3-vector v representing the vertex, and a scalar factor µ as

    H = I + µ (v a^T) / (v · a).   (4)

The factor µ is the characteristic ratio, and it can be computed as the cross-ratio of four aligned points a, b, c, d. The cross-ratio µ = (a, b, c, d) is then given by

    (a, b, c, d) = ((c − a)(d − b)) / ((c − b)(d − a)).   (5)
This case can also be seen in Figure 15. For more details on cross-ratios see [Sam86].
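Equation (5) is short enough to check numerically. The sketch below computes the cross-ratio of four aligned points given as scalar positions along their common line, and also illustrates its defining property, invariance under projective maps of the line (the map t → t/(t+1) is an arbitrary illustrative choice):

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (a, b, c, d) of four aligned points, equation (5)."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

print(cross_ratio(0.0, 1.0, 2.0, 3.0))  # → 1.3333333333333333 (= 4/3)

# The cross-ratio is invariant under projective maps of the line; e.g.
# under t -> t / (t + 1) the four points move, but the ratio stays ≈ 4/3:
mapped = [t / (t + 1) for t in (0.0, 1.0, 2.0, 3.0)]
print(cross_ratio(*mapped))
```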
2.3 Distortion correction
A prerequisite of the theory treated in this thesis is that the camera behaves according to the pinhole model. But cheap wide-angle lenses, such as those used in security systems, violate this requirement. In such cases the grossest distortions from the pinhole model are usually radial. A correction step is, therefore, necessary before any metrology process may be performed. Several methods have been investigated to correct such a distortion. A simple correction has been proposed by Devernay and Faugeras [DeFa01], where only one image of the scene is necessary and the radial distortion model is computed from the deformation of the images of straight world edges. Devernay's algorithm first extracts edges from the image and measures how much they are bent compared to a straight line. On the basis of the degree of distortion, a correction factor f(rd) can be computed. It is then possible to correct the whole image in the following way: a point xc in the corrected image can be computed from its corresponding point xd in the distorted image, with a given center of distortion c, by

    xc = c + f(rd) · (xd − c).   (6)

Ideally, all previously detected lines that were bent by the lens distortion become straight, and the image can be considered to have been taken with a pinhole camera. In reality this is not possible for all edges, because not all bent edges can be detected. With such a corrected image, perspective and metrology algorithms can be safely performed.
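As a minimal sketch of equation (6), the snippet below undistorts a single point. The correction factor f(rd) is assumed here to have the common polynomial form 1 + k1·rd²; the actual factor in [DeFa01] is estimated from the bent edges, and the coordinates and k1 below are hypothetical:

```python
# Per-point application of equation (6) with an assumed radial model.
import math

def undistort_point(xd, yd, cx, cy, k1):
    rd = math.hypot(xd - cx, yd - cy)   # radius from the distortion center
    factor = 1.0 + k1 * rd * rd         # assumed form of f(rd)
    xc = cx + factor * (xd - cx)        # equation (6), per coordinate
    yc = cy + factor * (yd - cy)
    return xc, yc

print(undistort_point(110.0, 100.0, 100.0, 100.0, k1=1e-4))
# → roughly (110.1, 100.0): the point moves radially outward
```

In a full implementation this mapping (or its inverse) would be applied to every pixel, typically with interpolation.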
An example from [Crim01] shows an image captured by a cheap security-type camera which exhibits radial distortion (see Figure 16). Note how straight edges in the scene appear curved in the image. After extracting all edges, a set of edges assumed to be straight in the scene is selected. From those, the distortion parameters have been computed and the image can be corrected accordingly. Note that the images of straight world edges are now straight. I did not implement any distortion correction algorithm in my system because several solutions are already offered†. Thus, the lens distortion of the image from which measurements are going to be computed has to be corrected beforehand.
Figure 16: Radial distortion correction; a) original image showing radial distortion; b) lines corresponding to straight world edges have been selected in the image; c) corrected image; d) edges from corrected image. [Crim01]
2.4 Vanishing points and vanishing lines
Vanishing points and vanishing lines are extremely powerful geometric cues. They convey vast amounts of information about the direction of lines and the orientation of planes. These entities can be estimated directly from the image; no explicit knowledge of the relative geometry between camera and viewed scene is required. Often they lie outside the physical image, but this does not affect the computations.
† An example of software that corrects lens distortion is LensDoc from Andromeda Software (http://www.andromeda.com/info/lensdoc/).
After straight-line segments in the image have been detected with an edge detector and broken ones have been merged, vanishing points can be computed. The images of parallel world lines intersect each other in a common vanishing point. Note that these lines are parallel in the world, but because of perspective distortion they no longer look parallel in the image. A vanishing point is, therefore, defined by at least two such lines. Images of lines that are parallel to a common plane intersect in points on that plane's vanishing line. Therefore, two sets of such lines with different directions are sufficient to define the vanishing line (see Figure 17). A Maximum Likelihood Estimation algorithm [Crim01] can be employed when more than two lines or orientations are available to compute the most likely intersection.
Figure 17: Extracted parallel lines, vanishing points, and a vanishing line from an image.
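For the minimal case of exactly two lines, the vanishing point is a plain line intersection, which is particularly compact in homogeneous coordinates: the line through two points is their cross product, and the intersection of two lines is again a cross product. A sketch with hypothetical image coordinates:

```python
# Vanishing point as the intersection of two imaged lines, using
# homogeneous coordinates and cross products.
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines (assumed not parallel)."""
    x, y, w = np.cross(l1, l2)
    return float(x / w), float(y / w)

# Two imaged edges that are parallel in the world (hypothetical numbers):
l1 = line_through((0.0, 0.0), (2.0, 1.0))   # the line y = x/2
l2 = line_through((0.0, 2.0), (2.0, 2.0))   # the line y = 2
print(intersect(l1, l2))                    # → (4.0, 2.0)
```

With more than two lines, the Maximum Likelihood Estimate mentioned above replaces this exact intersection with a least-squares fit.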
3 Previous & Related Work
There are particular areas of research that are applicable to my work: multiple- and single-view modeling, rendering, and metrology. This section presents a survey of the most significant work in the field of three-dimensional reconstruction from two-dimensional images. The papers are arranged from single-view systems to multi-view ones. I will talk specifically about the use of panoramas and about the method "Tour into the picture" (TIP).
There have been various papers about image-based modeling and rendering and image-based metrology. These results are closely related to the approach presented here. Metrology only adds the knowledge of a reference distance in the image, on the basis of which other measurements can be obtained. For modeling purposes, multiple input images are necessary to get a 360-degree viewable model [DTM96]. On the other hand, for metrology, a single image can be enough [HAA97, LCZ99, CRZ00, Crim01, Crim02, KBB02, WW02, ElHa01, KCS03].
In 1996, Debevec et al. [DTM96] proposed a hybrid geometry- and image-based approach to model and render architecture from photographs. They developed a photogrammetric modeling method, which facilitates the recovery of the basic geometry of the photographed scene, and a model-based stereo algorithm to recover how the real scene deviates from the basic model. Finally, view-dependent texture mapping composites the multiple views of the scene.
"Tour Into the Picture" [HAA97], also known as TIP, presented in 1997, was one of the first approaches using a single image to create a 3D scene and an animation out of it. This will be described in more detail in section 3.1.1. In 1999, Criminisi et al. [LCZ99] started to contribute tremendously to multiple- and single-view metrology and modeling. Contrary to Debevec's approach, Criminisi's does not require multiple images or the camera's internal calibration. His algorithms use vanishing points and vanishing lines to establish planes by defining a square with four control points. These are rectified and can then be used to take measurements on that plane, given a reference distance of an object on the same plane. Furthermore, it is possible to compute internal camera parameters from three orthogonal vanishing points. This technique is restricted to particular images, e.g. ones where specific edge configurations can be identified. Later, he added the option to measure between parallel planes, emphasized the usage of uncalibrated images, and finally addressed the reconstruction of complete 3D scenes from single images [CRZ00, Crim01, Crim02]. He assumes images from which vanishing lines can be extracted. I will talk more about Criminisi's single-view metrology in section 3.1.2.
Several authors describe extensions of Criminisi's method. A vanishing-point-based method is of equal precision and robustness compared with Criminisi's homography-based approach, but with less complexity [WW02]. Another approach [ElHa01] needs neither vanishing lines nor calibrated images to create complete 3D models. Finally, Kushal et al. [KBB02] suggested a method using two planes in the scene, selected by the user, to construct a three-dimensional representation of the image. Furthermore, in [KCS03] they showed a method based on building the final model through high-level primitives like planes, spheres, cuboids, and others.
3.1 Metrology & modeling from a single image
In general, one view alone does not provide enough information for a complete 3D reconstruction. However, some metric quantities can be computed given some geometrical information, such as the relative position of points, lines, and planes in the scene. Generally, in order to do so, the intrinsic parameters of the camera need to be known. These parameters are: focal length, principal point, skew, and aspect ratio [Fau93].
A number of visual algorithms have been developed to compute the intrinsic parameters of a camera in case they are not known. This task is called camera calibration. Usually, calibration algorithms assume some of the camera's internal parameters to be known and derive the remaining ones. Common assumptions are: unit aspect ratio, zero skew, or coincidence of the principal point with the image center. The work of Tsai [Tsai87] has been popular in the field of camera calibration. From a single image of a known, planar calibration grid (e.g. a checkerboard), the algorithm estimates the focal length of the camera, its external position, and its orientation, assuming a known principal point. The problem of calibrating a camera is also discussed by several other researchers, for example by Faugeras [Fau93, DeFa01], as mentioned in section 2.3. He presents algorithms to compute the projection matrix (external calibration) and eventually the camera's internal parameters from only one view of a known 3D grid. He analyses linear and non-linear methods for estimating the 3D-to-2D projection matrix, the robustness of the estimate, and the best location of the reference points. An interesting problem is addressed in [KSH98] by Kim et al. In this paper, the authors compute the position of a ball from single images of a football game. By making use of shadows on the ground plane and simple geometric relationships based on similar triangles, the ball can be tracked throughout the sequence.
In the following sections, I will discuss TIP [HAA97] and Criminisi's "Single-view metrology" [LCZ99, CRZ01] in more detail. With TIP, it is not necessary to know the internal camera parameters or any reference distances; nothing is needed to create a kind of walk-through or fly-through animation. "Single-view metrology" suggests a way of measuring the scene without knowing any internal camera parameters; other scene constraints are necessary instead. I note that the approach developed in this thesis does need some internal camera parameters, but does not require special scenes as in "Single-view metrology".
3.1.1 "Tour Into the Picture"
TIP, presented in 1997 [HAA97], was one of the first approaches using a single image to create a 3D scene and an animation out of it. The main idea of this method is simply to provide a user interface which allows the user to easily and interactively perform the following operations. I follow [HAA97].
(i) Adding "virtual" vanishing points for the scene – The specification of the vanishing point should be done by the user.
(ii) Distinguishing foreground objects from background – The decision as to whether an object in the scene is near the viewer should be made by the user, since no 3D geometry of the scene is known. In other words, the user can freely position the foreground objects once the camera parameters are arranged.
(iii) Constructing the background scene and the foreground objects with simple polygons – In order to approximate the geometry of the background scene, several polygons are generated to represent the background. This model then has a polyhedron-like form with the vanishing point on its base. A "billboard"-like representation and its variations are used for foreground objects.
These three operations are closely related to each other, so the interactive user interface should allow their easy and simultaneous performance. A spidery mesh is the key to fulfilling this requirement.
Figure 18: "Tour into the picture" process overview.
Figure 19: "Tour into the picture": a) apply a spidery mesh on the image; b) define five regions.
This method is outlined as follows (Figure 18 shows the process flow): After an input image is digitized (Figure 18(a)), the 2D image of the background and the 2D mask image of the foreground objects are made (Figure 18(b), (c)). The background image is made by retouching the image with the foreground objects removed; this is easily done with 2D paint tools like Adobe Photoshop. TIP uses a spidery mesh, as seen in Figure 19(a), to prescribe a few perspective conditions, including the specification of a vanishing point (Figure 18(d)). This spidery mesh is a 2D construct consisting of a vanishing point and an inner rectangle. The inner rectangle is used to specify the rear window in 3D space (see Figure 19(a)). The rear window can be thought of as the border that the virtual camera cannot pass through when making the animation. The TIP GUI allows the user to deform and translate the inner rectangle and to translate the vanishing point. Next, the background is modeled with five 3D rectangles (Figure 18(e)). The outer rectangle in the spidery mesh is decomposed into five smaller regions: floor, right wall, left wall, rear wall, and ceiling (Figure 19(b)). Then, simple polygonal models for the foreground objects are constructed (Figure 18(f)). Based on the foreground mask information, a 3D polygonal model is constructed for each foreground object in the scene. Since one polygon is usually not enough to specify a complete object, several hierarchically ordered polygons are needed. After the scene is completely reconstructed, the last user input is to position the new virtual camera from which we want to see the scene (Figure 18(g)). For that, three parameters can be controlled by the user through the interface: the camera position, the view-plane normal, and the view angle. This allows rotations, translations, zoom, and view-angle changes of the generated 3D scene. Finally, the image seen from the virtual position can be rendered using ordinary texture mapping techniques (Figure 18(h)). On top of that, it is possible to render animations with a key-framing method.
There are a few things to note about TIP. In some images it is difficult for users to specify the vanishing point, and when the 2D image contains more than one vanishing point (see Figure 17) it is hard to create a background image and a foreground mask. But TIP shows that it is possible to create an animation from only a single photograph.
3.1.2 "Single-view metrology"
Single-view metrology is the science of obtaining measurements of scene structure (e.g. lengths, areas) from a single image. This field is strongly marked by Criminisi, whose work I will largely follow [CRZ00, Crim02]. His idea is to use scene constraints imposed by parallel lines and planes to obtain dimensions from a single image. These dimensions can be up to scale, meaning we only know ratios, or absolute metric values if a reference measurement in the scene is known. Criminisi assumes in his approach that the vanishing line of a reference plane in the scene can be determined from the image, together with a vanishing point for a reference direction (not parallel to the plane). He is concerned with three types of outputs:
(i) measurements of distances between any of the planes which are parallel to the reference plane;
(ii) measurements on these planes; and
(iii) the determination of the camera's position in terms of the reference plane and direction.
His measurement methods are independent of the camera's internal parameters. Furthermore, his ideas can be seen as reversing the rules for drawing perspective images given by Leon Battista Alberti in his treatise on perspective (1435), as described in the section "Introduction".
The basic geometry of the plane's vanishing line and the vanishing point that Criminisi uses is illustrated in Figure 20. The vanishing line l of the reference plane is the projection of the plane's line at infinity into the image; this line can also be described as the horizon. The vanishing point v is the image of the point at infinity in the reference direction. Note that the reference direction need not be vertical; if it is, the vanishing point is the image of the vertical "footprint" of the camera center on the reference plane. Similarly, the reference plane will often, but not necessarily, be the ground plane, in which case the vanishing line is commonly known as the "horizon".
Figure 20: Basic geometry of Criminisi's approach: The plane's vanishing line l is the intersection of the image plane with a plane parallel to the reference plane passing through the camera center. The vanishing point v is the intersection of the image plane with a line parallel to the reference direction through the camera center [Crim01].
First, let us take a look at Criminisi's approach to measuring the distance (in the reference direction) between two parallel planes, specified by the image points x and x'. Figure 21 shows the geometry, with points x and x' in correspondence. He formulated the following theorems to solve this problem.
Theorem 1: Given the vanishing line of a reference plane and the vanishing point for a reference direction, distances from the reference plane parallel to the reference direction can be computed from their imaged end points up to a common scale factor. The scale factor can be determined from one known reference length.

Theorem 2: Given a set of linked parallel planes, the distance between any pair of planes is sufficient to determine the absolute distance between any other pair. The link provides a chain of point correspondences between the set of planes.
Figure 21: Distance between two planes relative to the distance of the camera center from one of the two planes: a) in the real world; b) in the image [Crim01].
To understand this, we need to take a look at Figure 21. The four points x, x', c, v define a cross-ratio, where x is a point on the far plane X, x' is a point on the near plane X', c is the camera center, and v is the vanishing point. The value of the cross-ratio determines a ratio of distances, written d(a, b), between planes in the world as follows:

    cross = (d(x, c) · d(x', v)) / (d(x', c) · d(x, v)) = (d(X, c) · d(X', v)) / (d(X', c) · d(X, v))   (7)

Since the vanishing point V is at infinity in the world, the distances d(X', v) and d(X, v) are infinitely large, so their ratio tends to 1 and they can be eliminated from (7), which leads to:

    cross = d(X, c) / d(X', c)   (8)

Let us define the far-plane-to-camera distance as ZC and the near-plane-to-camera distance as ZC − Z:

    d(X, c) = ZC and d(X', c) = ZC − Z   (9)

    cross = ZC / (ZC − Z)   (10)

After some algebra this leads to:

    Z / ZC = 1 − 1/cross   (11)

So, with a known far-plane-to-camera distance ZC or near-plane-to-camera distance ZC − Z, we can use the cross-ratio measured in the image to compute the plane-to-plane distance Z (and vice versa). The plane-to-camera distance may be difficult or inconvenient to measure directly. Instead, we can use a known plane-to-plane distance (in the reference direction) to derive the unknown plane-to-plane distance (see Figure 22). If a reference distance Zr is known, we apply the cross-ratio as described above to v, cr, r2, r1 to obtain the plane πr-to-camera distance ZC. Then we use that distance and the cross-ratio of v, cs, s2, s1 to obtain the distance between πr and π.
Figure 22: Distance between two planes relative to the distance between two other planes: a) in the world; b) in the image [Crim01].
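The algebra of equations (10) and (11) can be checked with a small round-trip sketch; the distances ZC = 3 and Z = 1 are purely illustrative:

```python
# Round-trip check of equations (10) and (11).
def cross_from_distances(ZC, Z):
    """Equation (10): cross = ZC / (ZC - Z)."""
    return ZC / (ZC - Z)

def plane_distance(cross, ZC):
    """Equation (11) rearranged: Z = ZC * (1 - 1/cross)."""
    return ZC * (1.0 - 1.0 / cross)

# For ZC = 3 and Z = 1 equation (10) gives cross = 1.5,
# and equation (11) recovers Z:
cross = cross_from_distances(3.0, 1.0)
print(cross)                      # → 1.5
print(plane_distance(cross, 3.0)) # → approximately 1.0
```

In practice, cross would of course come from image measurements via equation (7) rather than from the world distances.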
If the reference plane π is affinely calibrated (its vanishing line is known), then from image measurements we can compute the ratios of lengths of parallel line segments on the plane and the ratios of areas on the plane. An affine calibration is an affine rectification. This mapping was described earlier as a plane-to-plane homology; to recall, x = HX with

    H = I + µ (v a^T) / (v · a)   (12)

As described before, the camera's distance ZC from a particular plane can be obtained from a single known reference distance Zr. Knowing ZC allows computing the complete position of the camera.

Now, there is the question of reconstructing the whole scene. Computing measurements of the relevant edges in the scene allows a complete model reconstruction. An example of a complete reconstruction is shown in Figure 23.
Figure 23: An example of results obtained with Criminisi's single-view metrology: a) original image; b) synthesized image; c) synthesized view with original camera location.
3.2 Metrology & modeling from multiple images
An extension of single-view metrology and modeling is multiple-view metrology and modeling. With more than one view, it is possible to reconstruct or measure a bigger portion of the scene.
3.2.1 Using two views
Figure 24: Stereovision system: two images of the same scene are captured. Three-dimensional structure can be computed from the analysis of those images.
The classical algorithms for 3D reconstruction use stereovision systems. Stereovision consists of capturing two images of a scene from different viewpoints and estimating the depth of the scene by analyzing the disparity between corresponding features (see Figure 24). This methodology finds its basis in trigonometry and is also employed by the human binocular vision system. The basic steps in reconstructing a scene from two images are:
(i) finding corresponding points in the two images; and
(ii) intersecting the corresponding rays in 3D space.
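Step (ii) can be sketched as follows. With noisy correspondences, two viewing rays rarely meet exactly, so a common choice (one of several; not claimed to be the method of any particular paper above) is the midpoint of the shortest segment between the rays. All coordinates below are hypothetical:

```python
# Triangulation of two viewing rays by the midpoint method.
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Rays: point = origin o + t * direction d. Returns the midpoint of
    the closest-approach segment between the two rays."""
    o1, d1, o2, d2 = (np.asarray(v, dtype=float) for v in (o1, d1, o2, d2))
    # Normal equations for minimizing |o1 + t1*d1 - (o2 + t2*d2)|^2:
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(o2 - o1) @ d1, (o2 - o1) @ d2])
    t1, t2 = np.linalg.solve(A, b)
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))

# Two camera centers on a baseline, both rays through the point (0, 0, 4):
p = triangulate_midpoint((-1, 0, 0), (1, 0, 4), (1, 0, 0), (-1, 0, 4))
print(p)  # prints a point close to [0, 0, 4]
```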
3.2.2 Using three or more views
Two views suffice to reconstruct a scene, but adding more images from different points of view can constrain the reconstruction problem by reducing the uncertainty in the estimated structure. This is particularly true if a line matching process is used rather than a point matching one (line matching is not possible with only two views). Furthermore, the use of three or more views allows a check on the consistency of the features matched using the first two views.
3.3 Metrology & modeling from panoramas?
Panoramas are an interesting extension of using multiple views for scene reconstruction and metrology methods. There has not been much work done in this field yet. In [KAS01], a walk-through of a panorama image is proposed as an extension to TIP. With this model, it is not possible to reconstruct the whole scene, but it is a beginning for creating a 3D model.
In a (spherical) panorama image, the environment viewed from a camera is mapped onto a base sphere at the camera position. As shown in Figure 25, parallel lines A and B on the ground plane are projected onto this sphere as arcs A' and B', respectively. A' and B' intersect at a point on the base sphere referred to as a vanishing point of the spherical image. Since A and B on the ground plane can take on any inclination, the set of all vanishing points on the sphere forms a circle: the intersection of the base sphere with the horizon plane. This circle is called the vanishing circle and is analogous to the vanishing line of a planar image. The vanishing circle divides the base sphere into two disjoint hemispheres. The lower one corresponds to the ground plane in the 3D environment, and the upper one corresponds to the space above the ground plane. Thus, the vanishing circle can be thought of as the horizon that separates the earth, represented by the ground plane, from the sky. Using these assumptions, it is easy to reconstruct the background and foreground image, and a TIP-like walk-through or fly-through animation can be computed.
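The vanishing-circle idea can be illustrated with a tiny sketch: on a unit base sphere, the vanishing point of a ground-plane direction is that unit direction itself, so every ground direction lands on the circle z = 0 (the horizon plane). This is an illustrative simplification, not code from [KAS01]:

```python
# Vanishing points of ground-plane directions on a unit base sphere.
import math

def spherical_vanishing_point(theta):
    """Vanishing point of the ground-plane direction at azimuth theta:
    the unit direction vector itself, which lies in the plane z = 0."""
    return (math.cos(theta), math.sin(theta), 0.0)

# All such points satisfy x^2 + y^2 = 1 and z = 0: the vanishing circle.
print(spherical_vanishing_point(0.0))  # → (1.0, 0.0, 0.0)
```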
Figure 26 and Figure 27 show examples of spherical and cylindrical panoramas that are used for a walk-through or fly-through animation.
Figure 25: Vanishing point and vanishing circle for a spherical panorama image.
Figure 26: Using a spherical image to create other TIP views.
Figure 27: Using a cylindrical image to create other TIP views.
4 PhotoMeter: Easy-to-use MonoGraphoMetrics
The solution proposed here is less general than the approaches mentioned in the previous section, but it offers two significant advantages. First, the user does not need to spend time establishing a reference plane, as in Criminisi's approach. Second, measurements may be taken even when only a small portion of the floor below the measured points is visible and no set of floor edges is available to define vanishing points (e.g. see Figure 28).
Figure 28: PhotoMeter helps, for example, to measure the height and width of a window (a) or the height, width, and depth of a locker (b).
4.1 Introduction
One of the primary challenges of Computer Vision and Computer Graphics is the automation of the creation of precise 3D models of real environments. The objective of the project described here is much more modest: I strive to provide users with a very simple and effective tool for performing real 3D measurements from a single photograph. Such a tool may be used for measuring buildings, rooms, windows, doors, furniture pieces, and even people.
The advantage of the proposed approach lies in its simplicity. A simple mouse op-
eration – position-press-drag-release – is sufficient to precisely measure the 3D location of
a point visible in the image. Relative vertical and horizontal dimensions between such a
user-defined 3D point and a previously defined point, serving as a temporary anchor, are
clearly shown on the screen as mark-up arrows with dimension labels. This tool has been
integrated within a complete interactive system, called PhotoMeter. The user of PhotoMe-
ter can load a picture, perform the desired measurements, save the marked image for future reference, or email it to a colleague, client, or sub-contractor.
The simplicity and effectiveness of PhotoMeter result from a design decision that restricts its use to pictures taken with a calibrated camera. More specifically, the user must
know the horizontal field of view of the camera and the height of the camera above the
floor or ground when the picture was taken. Furthermore, PhotoMeter relies heavily on the
assumption that the camera was perfectly leveled (horizontal). Consequently, errors in
camera calibration, in its height, and in its horizontal alignment will introduce errors in the
measurements displayed by PhotoMeter. I provide an error analysis and suggest an approach for camera calibration, but recommend that PhotoMeter be used with tripods that allow the camera tilt to be locked, thereby ensuring a horizontal orientation, and that make it easy to always set the camera at the same height. Alternatively, a simple stool may be used to support the camera.
4.2 Computations
In this section, I explain the necessary foundations for the computations and describe how the measurements are computed by PhotoMeter.
4.2.1 Main Idea
After the image has been prepared by correcting lens distortion and loaded, PhotoMeter can perform its work. PhotoMeter assumes the following scene settings:
(i) the aspect ratio of the image is set to 4:3;
(ii) the camera’s focal center is at C = (0, h, 0)^T, where h is the height above the floor; that means the origin of the world coordinate system is set to be on the floor exactly below the camera;
(iii) the center of the image is at S = (0, h, d)^T, where d is the distance between the camera center and the image center.
Figure 29: The basic setup of the 3D scene (left) and basic labels in the scene (right).
The scene in Figure 29 displays the image as a look-through screen. The distance d is set automatically by PhotoMeter depending on the horizontal field of view set by the user through the calibration dialog described in section 4.3. Furthermore, the scene relies on a left-handed coordinate system as used in OpenGL: the y-vector is the up-vector of the scene, the x-vector is the side-vector, and the z-vector is the depth-vector relative to the camera center. Using these settings for the scene, PhotoMeter needs two more coordinates. First, it needs the coordinates of the pixel P' = (p'_x, p'_y)^T, the pixel in the image where the object to be measured is situated. Second, the projection of the point P' onto the floor gives the pixel F' = (f'_x, f'_y)^T. A dragging constraint ensures that f'_x = p'_x. Finally, with this input, dimensions between all points in the image whose floor shadows are visible can be computed easily. These computations can be performed with similar triangles.
4.2.2 Mathematical/geometrical calculations
Figure 30: The sequence shows how to measure a window with two mouse clicks. (a) The user clicks the top left corner of the window, drags down, and releases when the line touches the floor. During the drag, the line is constrained to remain vertical. The 3D location of the top corner of the window is computed and becomes the anchor (datum). (b) The user clicks the opposite, lower-right corner of the window and drags down to the floor. The 3D location of the opposite corner is computed. (c) PhotoMeter displays the perspective projection of the vertical and horizontal displacements between the anchor and the new point. (d) The user presses ENTER to keep this series of measurements and to annotate the image with the associated dimensions.
To obtain the height of a world 3D point P in the scene, the user clicks at the pixel P' where P appears on the screen, drags the cursor down to the floor, and releases the mouse button at the pixel F' where the vertical projection of P onto the floor appears on the screen. To help the user ensure that the line from P' to F' is vertical, the horizontal motions of the cursor are temporarily disabled once P' is selected, until the mouse button is released. As shown in Figure 30, when P' identifies a point on a wall or on a vertical side of a piece of furniture, the location of F' is obvious. Furthermore, when P lies on the floor, no dragging is necessary; in this case P' equals F'. The first 3D point P identified during a series of measurements is used as an anchor (datum) for all subsequent measurements in the series. Subsequent 3D points may be added to the series until the ENTER key is pressed. PhotoMeter computes the vertical and horizontal displacements, in 3D, between each of these points and the anchor. It overlays these displacements on the image by drawing their perspective projections as red lines. When ENTER is pressed, the series is frozen and the labels with dimensions are added to the image. The next selected 3D point automatically becomes the anchor for the next series. To find the coordinates (p_x, p_y, p_z)^T of the selected 3D point P in the coordinate system of the floor projection of the camera, PhotoMeter uses P' and F'.
Figure 31: 3D view showing the geometric computation of the horizontal distance d_object from the object to the camera.
Figure 32: Top view showing how to compute the x and z coordinates of the object. The orange line defines a vertical section plane through the camera and the object.
Figure 33: Side view of the vertical section plane defined in Figure 32, explaining the computation of the height of the object.
To understand the internal computations of PhotoMeter, we first take a closer look at Figure 31. This figure shows how to compute d_object, the absolute distance from the world coordinate origin O = (0, 0, 0)^T (the vertical projection of the camera position onto the floor) to the foot point of the object, F = (f_x, 0, f_z)^T. To solve this problem, I use similar triangles (see the green triangles in Figure 31). The distance d_object can be computed as follows:

    d_object / d' = h / h',   with d' = √(p'_x² + d²) and h' = |f'_y|    (13)

    d_object = h · √(p'_x² + d²) / |f'_y|    (14)

Now, to compute p_x, I also use similar triangles (see Figure 32):

    p_x / p'_x = d_object / d',   so   p_x = p'_x · d_object / d'    (15)

After substituting d_object and d', I get the following result:

    p_x = (p'_x / √(p'_x² + d²)) · h · √(p'_x² + d²) / |f'_y|,   which leads to   p_x = p'_x · h / |f'_y|    (16)

I also use similar triangles to obtain p_y (see Figure 33); here, the camera height h has to be added afterwards:

    (p_y − h) / p'_y = d_object / d',   so   p_y = h + p'_y · d_object / d'    (17)

After substituting d_object and d', I get the following result:

    p_y = h + (p'_y / √(p'_x² + d²)) · h · √(p'_x² + d²) / |f'_y|,   which leads to   p_y = h + p'_y · h / |f'_y|    (18)

Then, only p_z is left; it is again computed by similar triangles (see Figure 32):

    p_z / d = d_object / d',   so   p_z = d · d_object / d'    (19)

After substituting d_object and d', I get the following result:

    p_z = (d / √(p'_x² + d²)) · h · √(p'_x² + d²) / |f'_y|,   which leads to   p_z = d · h / |f'_y|    (20)

To conclude, I get P = (p_x, p_y, p_z)^T with:

    p_x = p'_x · h / |f'_y|;   p_y = h + p'_y · h / |f'_y|;   p_z = d · h / |f'_y|    (21)
The implementation of these computations is shown in Figure 34.
class Calculation
{
public:
    // variable declarations
    ...
private:
    ...
};

// p and f values have to be relative to a screen center of (0,0)
Calculation::Calculation(GLfloat distCameraScreen, GLfloat cameraHeight,
                         GLfloat pPrimeX, GLfloat pPrimeY, GLfloat fPrimeY)
{
    // values given through user input
    camera.y = cameraHeight;
    object.pPrime.x = pPrimeX;
    object.pPrime.y = pPrimeY;
    object.fPrime.x = pPrimeX;   // dragging constraint: f'_x = p'_x
    object.fPrime.y = fPrimeY;
    screen.distFromCamera = distCameraScreen;

    // fixed scene values
    camera.x = 0.0f;
    camera.z = 0.0f;
    screen.center.x = 0.0f;
    screen.center.y = camera.y;
    screen.center.z = screen.distFromCamera;
}

void Calculation::Compute3DCoordinate()
{
    ComputeHeight();
    ComputeWidth();
    ComputeDepth();
}

// p_y = h + p'_y * h / |f'_y|   (equation (21))
void Calculation::ComputeHeight()
{
    object.p.y = camera.y + object.pPrime.y * camera.y / fabs(object.fPrime.y);
}

// p_x = p'_x * h / |f'_y|
void Calculation::ComputeWidth()
{
    object.p.x = object.pPrime.x * camera.y / fabs(object.fPrime.y);
}

// p_z = d * h / |f'_y|
void Calculation::ComputeDepth()
{
    object.p.z = screen.distFromCamera * camera.y / fabs(object.fPrime.y);
}
Figure 34: Implementation of the basic PhotoMeter computations.
Knowing the positions of more than one pixel projected back into the world coordinate system, it is possible to compute distances between these positions.
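The back-projection of equation (21) and the distance computation can be condensed into a short sketch. The free functions below mirror the Calculation class of Figure 34, but they are my own illustrative restatement, not an excerpt from PhotoMeter:

```cpp
#include <cassert>
#include <cmath>

struct Point3 { double x, y, z; };

// Equation (21): back-project a picked pixel P' = (pxPrime, pyPrime) with
// its floor-shadow ordinate fyPrime into world coordinates, given the
// camera height h and the camera-to-screen distance d. Pixel coordinates
// are relative to the screen center; fyPrime is negative (below the
// horizon), so its absolute value is used, as in Figure 34.
Point3 backProject(double pxPrime, double pyPrime, double fyPrime,
                   double h, double d) {
    double s = h / std::fabs(fyPrime);
    return Point3{pxPrime * s, h + pyPrime * s, d * s};
}

// Euclidean distance between two back-projected world points.
double distance(Point3 a, Point3 b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```

A useful sanity check: a pixel that coincides with its own floor shadow (pyPrime = fyPrime) back-projects to a point with height zero, i.e., onto the floor.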
4.2.3 Validation with “3D Measurement Simulation”
It is easy to prove the mathematical correctness of the functions stated above. To verify that these functions correctly compute the 3D position of a point represented by a pixel in the image, I implemented another program, which I called 3D Measurement Simulation. Using 3D Measurement Simulation, I altered and simplified these functions several times until I arrived at the final result presented above.
3D Measurement Simulation basically consists of two views (see Figure 35 and
Figure 36). The first one is a perspective view where you can see the scene with the setup
of a camera, a screen (image plane), a floor, several wall planes, and two objects used
for validation (cuboids). The second one is a camera view, which shows the picture taken
by the camera, i.e., the picture as seen through the camera.
To validate the correctness of the computed 3D position of the selected pixel in the
image, I put two spheres in the scene representing a specific position in 3D. Here, I will
only talk about the position of the upper right corner of the right cube. Since this is a generated scene, the correct position of this corner is known. Now, I need to display the computed position after clicking at this specific corner and to show the resulting difference, which is the error. To do so, I implemented a special section in the program, as seen in
Figure 36. Here is a short description of the outputs. The numbers refer to Figure 36:
1) The user can change the perspective matrix of the perspective view to set
values (camera view, navigation view).
2) This section provides the user with basic scene settings: Here, the camera is
at a height of 1, the distance between the screen and the camera is set to
1.5, and this results in a horizontal field of view of 28.07°.
3) Selection between two presets is given here: This sets the comparison object
(left or right cube). Also, their original positions are displayed.
4) This section presents the scanned position in the image and the computed
position corresponding to the same pixel.
5) The original and the computed positions are compared and the error in per-
cent is shown here.
The consistently low error (usually less than 1%) for different pixels representing the preset position of the sphere allows the conclusion that the computations are correct; the remaining error is due solely to pixel discretization. Besides the calculated error, there is another means of validation: the program draws a small sphere in the scene at the computed position, and you can check whether this sphere is at the point you intended to reach.
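The displayed error boils down to a per-coordinate relative comparison between the known and the computed position. A minimal sketch of such a comparison follows; the exact formula used by 3D Measurement Simulation may differ in detail:

```cpp
#include <cassert>
#include <cmath>

// Relative error (in percent) between a known original coordinate and the
// coordinate computed by back-projection. Guards against division by zero
// for coordinates that are exactly zero.
double errorPercent(double original, double computed) {
    if (original == 0.0) return computed == 0.0 ? 0.0 : 100.0;
    return std::fabs(computed - original) / std::fabs(original) * 100.0;
}
```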
Figure 35: The two different views of 3D Measurement Simulation.
Figure 36: The section of 3D Measurement Simulation where the computed error is displayed.
Figure 37 presents the results of computing the error for the right cube. The goal is
to calculate the position of the green sphere (see Figure 37(a) upper left). To do so, the
user clicks on that position, drags the mouse down to the floor, and releases the mouse button (a red arrow is drawn). Simultaneously, the new position and the error are displayed. Furthermore, the blue sphere is drawn at that position (see Figure 37(b)). The perspective view allows visual checking of its correctness. Figure 37(c) provides the results.
Figure 37: The comparison of the original and the computed position of a preset sphere.
4.3 Requirements & Preliminaries
When metrology is used for engineering or architectural applications, it is important to provide accurate measurements and to quantify the associated error. As in other MonoGraphoMetrics applications, the error in a measurement computed with PhotoMeter is due to the combined effect of several error sources:
(i) Camera calibration
(ii) Lens distortion
(iii) Horizontal alignment
(iv) Tripod height
(v) Error in selecting a point in the image
(vi) Error in selecting its floor shadow
The cumulative effects of these errors on the accuracy of the measurements are discussed
in section 4.6. Here, I briefly suggest how to reduce these errors through proper camera
calibration and image correction.
The cumulative effects of these errors on the accuracy of the measurements are discussed in section 4.6. Here, I briefly suggest how to reduce these errors through proper camera calibration and image correction. PhotoMeter uses the field of view of the camera. Test results have shown that specifications provided by camera manufacturers may be confusing or inaccurate. Therefore, it is preferable to measure the actual field of view manually. Figure 38 shows the starting panel in PhotoMeter, which explains graphically how to measure the horizontal field of view by placing the camera parallel to a wall and reporting the perpendicular distance (label “a” in Figure 38) from the camera to the wall and the horizontal length (label “b” in Figure 38) of the visible portion of the wall. The horizontal field-of-view angle (“α”) is computed automatically. To ensure that the camera viewing direction is perpendicular to the wall, I recommend placing the camera so that the left and right vertical edges of a wall, door, or window appear perfectly flush against the left and right borders of the image in the viewfinder. This camera calibration needs to be performed only once per camera.
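The angle computation behind the panel is a single arctangent: half of the visible wall length b subtends half of α at the perpendicular distance a. A sketch of that relation, with illustrative numbers of my own:

```cpp
#include <cassert>
#include <cmath>

const double PI = 3.14159265358979323846;

// Horizontal field of view (degrees) from the perpendicular distance a to
// the wall and the horizontal length b of the visible wall portion:
// tan(alpha / 2) = (b / 2) / a.
double fieldOfViewDeg(double a, double b) {
    return 2.0 * std::atan(b / (2.0 * a)) * 180.0 / PI;
}
```

For example, a camera two meters from a wall that sees one meter of it has a horizontal field of view of about 28.07°.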
Figure 38: Initial camera calibration panel in PhotoMeter.
To ensure that the camera is perfectly horizontal when used to take a picture for
PhotoMeter, one could simply place the camera on a horizontal surface, such as a stool or
a tripod with fixed orientation and height. To ensure that the stool or tripod is perfectly
horizontal, one could use expensive theodolites or a much cheaper laser level (Figure 39),
which includes a laser pointer that can be used to ensure that the height of the laser point
on the wall is identical in several directions.
Figure 39: An inexpensive laser level for adjusting the horizontal.
The height of the camera must also be measured accurately. Pointing the laser at a vertical edge in the room can ensure that the height is measured vertically. After the picture has been taken, distortions in the image caused by the camera lens should be corrected [DeFa01]. Several programs that correct lens distortion automatically or semi-automatically are available and should be used here, independently of PhotoMeter‡.
‡ An example of software that corrects lens distortion is LensDoc from Andromeda Software (http://www.andromeda.com/info/lensdoc/).
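For illustration, the dominant radial component of lens distortion is often modeled with a single polynomial coefficient. The sketch below shows such a one-parameter correction; the model choice and the coefficient k1 are assumptions for demonstration, not the method used by the cited tools:

```cpp
#include <cassert>
#include <cmath>

struct Pixel { double x, y; };

// One-parameter radial model: a point at radius r from the image center is
// moved to radius r * (1 + k1 * r^2). Coordinates are taken relative to
// the image center and normalized, so k1 is dimensionless.
Pixel correctRadial(Pixel p, double k1) {
    double r2 = p.x * p.x + p.y * p.y;
    double scale = 1.0 + k1 * r2;
    return Pixel{p.x * scale, p.y * scale};
}
```

Points at the image center are unaffected; with positive k1, points are pushed outward, which compensates barrel distortion.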
4.4 User Interface
I implemented PhotoMeter on the Mac OS X platform as non-portable software. In this section, I describe all features and buttons of PhotoMeter. To make PhotoMeter easily recognizable, I created an application icon (see Figure 40). The program can be opened in two different ways: by clicking on the application icon in the file browser or in the dock, or by dragging an image onto the application icon.
Figure 40: Start screen of PhotoMeter.
Right after the user has opened the program, another panel automatically appears from the top of the main window – the calibration panel (see Figure 38). Here, the user is asked to provide some values to calculate the camera’s internal parameters. After confirming this dialog, the program can be controlled by the buttons of the toolbar on the right (see Figure 40), by keyboard shortcuts, or by the menu. The buttons are explained as follows:
Starts an open panel to load an image.
Starts a save panel to save the measured image.
Starts a print panel that allows printing of the altered image.
Saves the current measurements permanently.
Clears the last performed measurement.
Clears all measurements to get the original image.
Opens the calibration dialog to change camera settings.
Keyboard shortcuts are displayed in the menu (see Figure 41).
Figure 41: Menu of PhotoMeter with all submenus.
When an image is loaded, either by dragging it onto the application icon, by dragging it into the program, or by using the menu or the button (see Figure 43(a)), PhotoMeter checks the file format, the aspect ratio, and the size of the image. The system can interpret images of the file formats JPEG, JPG, BMP, GIF, TIFF, TIF, PNG, PDF, PSD, PICT, and EPS. This variety is possible because of the Cocoa Application framework offered by Apple. If the image does not have an aspect ratio of 4:3, PhotoMeter refuses it (see Figure 43(f)). If the image does not have the right resolution (800x600 pixels) but has the right aspect ratio (4:3), PhotoMeter resizes it to 800x600 (see Figure 43(e)); 800x600 is a size that can be used as an OpenGL texture. After the image is loaded, the user can start measuring the scene. The current measurement is drawn as red arrows. The anchor, representing the first measured object, is marked with a red dot and is saved for further processing. After pressing ENTER or using the “Keep Red Arrows” button, the red arrows change their color, and dimensions are assigned and finally saved (see Figure 42). The user can change the color and the width of the arrows in the preferences dialog when they interfere with the image (see Figure 43(d)). After the user has finished his or her work, the result can be saved or printed (see Figure 43(b) and (c)). If the user opens another image without saving the previously edited one, the software asks whether the user wants to discard these changes or to save the measured image before proceeding (see Figure 43(e)). The same test takes place when the user quits the program.
Figure 42: The user interface of PhotoMeter. This screenshot also shows measurements that are already saved (blue arrows with dimensions) and a measurement series that is still in progress (red arrows with anchor).
Figure 43: PhotoMeter panels: a) open dialog; b) save dialog; c) print dialog; d) preferences panel; e) save changes dialog; f) aspect ratio dialog.
4.5 Implementation & Documentation
In this section, I describe the implementation of PhotoMeter. First, I give a short insight into important implementation decisions. I then go into the system’s class structure and explain some significant activities.
4.5.1 Implementation decisions
In the early stages, the question of the right platform arose. I first wanted to implement a portable program for the Mac OS X, Windows, and UNIX platforms. To reach this goal, I wanted to use OpenGL combined with GLUT. However, several disadvantages came up when thinking about user interface development: GLUT only allows a very spartan user interface, and problems with image loading could arise. In the end, I decided to implement my ideas for the Mac OS X platform.
Mac OS X development is typified by Cocoa. Cocoa is a rich set of object-oriented frameworks that allows for very rapid development of applications on Mac OS X. The Cocoa application environment is designed specifically for native Mac OS X-only applications and allows the development of applications and plug-ins that can make use of all the extensive features of Mac OS X. Using Apple’s free Xcode Tools with Cocoa allows for rapid interface prototyping and a seamless development process. Apple’s Xcode Tools consist of several programs:
(i) Xcode for the actual implementation;
(ii) Interface Builder to prototype the interface;
(iii) Icon Composer to create Apple-conforming icons; and
(iv) several further useful tools.
After reviewing this development package I decided to create a Cocoa application.
Then, there was the question of the right programming language: Java, C, C++, Objective-C, and Objective-C++ can be used. The Objective-C language is a simple computer language designed to enable sophisticated object-oriented programming, based on standard C. It combines several features:
(i) object-oriented techniques allow the use of the functionality that is packaged in the Cocoa framework;
(ii) because it is a standard ANSI C extension, existing C code can be adapted, and you can use all the benefits of C when working within Objective-C; and
(iii) it is a very simple and dynamic programming language.
The Objective-C++ language allows you to freely mix C++ and Objective-C code in the
same source file or project. Using Objective-C++, you can directly call Objective-C objects
from C++ code, and you can directly call C++ code from Objective-C objects. Thus, Objec-
tive-C++ allows you to use C++ class libraries directly from within the Cocoa application.
The filename extension for Objective-C files is .m (e.g. file.m); Objective-C++ uses .mm
(e.g. file.mm).
To summarize, I decided to implement a Cocoa application (to get support from its extensive frameworks) in Objective-C++ (because it allows me to use my previous knowledge of C and C++ while still benefiting from all Mac OS X features).
4.5.2 Class diagrams
In the UML class diagram presented in Figure 44, I show all classes used in my implementation and their inheritance from classes of several Cocoa frameworks. The main class in my implementation is MyNSOpenGLView. This class handles the user interface and all user interactions such as mouse and keyboard functions. Furthermore, all panels as well as the image saving and printing processes are controlled here. The basic OpenGL functionality is inherited from the Cocoa class NSOpenGLView. The appearance of the actual scene – the image and all arrows – is described in the class MyScene. This class is also connected to the classes MeasurementEntry and APTexture. MeasurementEntry represents all values of an arrow, including its start and end position, all connected measurements, its color, and so on. The class APTexture handles image loading using the Cocoa Application framework and the creation of textures for OpenGL. The class Calculation, implemented in C++, computes the 3D position of an object in the scene from the necessary input values. All these classes inherit from several Cocoa classes, mainly from NSWindow to create windows, from NSPanel to create panels, and, of course, from NSObject for basic object functionality.
4.5.3 Activity diagrams
PhotoMeter offers a variety of interaction possibilities:
(i) setting camera parameters,
(ii) changing scene settings,
(iii) loading an image,
(iv) measuring the scene,
(v) saving the measured image, and
(vi) printing the image.
Figure 45 shows a UML activity diagram of all loading, saving, and printing procedures as well as the usage of the camera calibration and scene settings dialogs. Displaying the main window without any panels on top characterizes the main state of the program. Clicking on buttons starts different processes. When the user opens the Calibration Panel and sets new values, the input is validated continuously. After applying the user input, these values are saved. The Preferences Panel, the Save Panel, and the Print Panel work in a similarly simple way: pressing the assigned button opens the panel; confirming the input closes the panel. Also, all panels can be cancelled without applying any input. The image loading process is a more complex procedure. First, it needs to be checked whether the current image has undergone any changes since it was opened or saved. Second, the selected image needs to be checked for its aspect ratio and its size. Based on the results and the related user inputs, the procedure takes different paths. Figure 46 describes the workflow of interactions when measuring a loaded picture. Here, we need to distinguish between pressing ENTER and mouse operations. When the user presses ENTER, the current measuring process is finished by saving and labeling the measurements. Using the mouse leads to new measurements: pressing the mouse button sets the start point of a new measurement. Before this position is set as the anchor, the software needs to check the number of arrows of the current measuring process. Dragging and then releasing the mouse ends the measurement process, and red arrows are drawn on the image.
Figure 44: Main UML class diagram of PhotoMeter.
Figure 45: UML activity diagram explaining the use of panels and dialogs of PhotoMeter.
Figure 46: UML activity diagram explaining mouse actions in the window.
4.6 Results
The accuracy of PhotoMeter depends strongly on a human task – taking the picture with a well-set-up camera. A small human error introduces considerable uncertainty into the computed measurements. Besides human errors, there are also errors caused by pixel discretization.
The measurement errors due to camera tilt and height error are also a function of depth (i.e., the distance between the measured object and the camera). These dependencies are illustrated as theoretical values in Figure 47, Figure 48, and Figure 49. Figure 47 shows the acceptable horizontal camera tilt as a function of depth such that the error in the object’s height does not exceed ten centimeters. A tilt of one degree would create an error of less than ten centimeters for an object located six meters from the camera. As the distance from the camera to the objects grows, the acceptable horizontal camera tilt for a ten-centimeter error drops rapidly to non-achievable values. Figure 48 shows the position error as a function of distance for a one-degree tilt. For example, at a distance of twenty meters from the camera, a one-degree camera tilt causes a position error of thirty-five centimeters. As above, the error grows linearly with depth and eventually becomes unacceptable. Figure 49 plots the error in the position of P as a function of the error in camera height. The error depends on the height of the measured point. I have plotted the range of errors (dotted lines) and the mean error (red line).
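The plotted numbers are consistent with a simple small-angle model in which a tilt of θ displaces a viewing ray at depth z by about z·tan θ. The sketch below is my reading of Figures 47 and 48, not an excerpt from PhotoMeter's error analysis:

```cpp
#include <cassert>
#include <cmath>

const double PI = 3.14159265358979323846;

// Approximate height error (meters) caused by a camera tilt of tiltDeg
// degrees for an object at the given depth (meters): the viewing ray is
// rotated by the tilt, displacing the hit point by about depth * tan(tilt).
double tiltHeightError(double depth, double tiltDeg) {
    return depth * std::tan(tiltDeg * PI / 180.0);
}

// Acceptable tilt (degrees) so that the height error at the given depth
// stays below maxError meters -- the quantity plotted in Figure 47.
double acceptableTiltDeg(double depth, double maxError) {
    return std::atan(maxError / depth) * 180.0 / PI;
}
```

At twenty meters and one degree of tilt this gives 0.349 m, matching the thirty-five centimeters quoted above.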
Figure 47: Acceptable tilt as a function of depth to ensure an accuracy of 10cm of the object’s height.
Figure 48: Error in computed height as a function of distance for a tilt of one degree.
Figure 49: Position error as a function of inaccuracy in the estimation of the camera height.
A further source of error comes from the inaccurate selection of points on the screen. It stems from the discretization of the image and from the difficulty of aligning the cursor with the intended pixel. Assuming that the user has selected the correct pixel, the maximal position error in the image is half the diagonal of a pixel. Figure 50 plots the maximal position error under correct pixel selection as a function of depth for points near the center of the screen and for points near the edge.
Figure 50: Error in computed height as a function of distance due to pixel resolution for points near the center of the image (red line) and points near the border (blue line).
To measure the cumulative effect of these errors, I have experimented with a variety of indoor scenes. The camera and tripod were calibrated as discussed in section 4.3. In practice, I obtain an accuracy of better than five centimeters when measuring the dimensions of windows, doors, or furniture pieces within five meters of the camera (see Figure 51).
Figure 51: The error in the dimensions measured with PhotoMeter and shown here does not exceed 5cm.
5 Conclusion
I have described a very simple-to-use tool for measuring vertical and horizontal distances between arbitrary 3D points from a single image, provided that the points and their vertical projections onto the floor can be identified in the image. The errors resulting from the cumulative effects of camera calibration, lens distortion, camera alignment, and pixel selection increase with distance from the camera. At five meters, I usually achieve an accuracy of better than five centimeters. I believe PhotoMeter is a viable alternative to physical measurements or to more elaborate photogrammetry techniques for a range of applications where speed and ease of use are important and where the inherent inaccuracy is acceptable. Such applications include floor and wall area estimation for interior decoration and planning the layout of office or kitchen furniture.
In the future, I will explore how higher resolution and local image-processing techniques can improve the accuracy of manual point selection on the screen. Furthermore, the method presented here can be extended to panoramas.
Bibliography
[A1435] L. B. ALBERTI. De Pictura. 1435. Reproduced and commented by Wissenschaftliche Buchgesellschaft Darmstadt, 2000.
[Ae1435] L. B. ALBERTI. On Painting. Translation of “De Pictura” into English at
http://www.noteaccess.com/Texts/Alberti/index.htm.
[Ber87] M. BERGER. Geometry II. Springer-Verlag. 1987.
[Crim01] A. CRIMINISI. Accurate Visual Metrology from Single and Multiple Uncali-
brated Images. Distinguished Dissertation Series. Springer-Verlag London
Ltd., September 2001.
[Crim02] A. CRIMINISI. Single-View Metrology: Algorithms and Applications. In Proc.
DAGM 2002 Symposium, Zürich, Switzerland, September 2002.
[CRZ00] A. CRIMINISI, I. REID, AND A. ZISSERMAN. Single view metrology. IJCV,
40(2): pages 123-148, 2000.
[DeFa01] F. DEVERNAY AND O. D. FAUGERAS. Straight lines have to be straight: Auto-
matic calibration and removal of distortion from scenes of structured envi-
ronments. Machine Vision and Applications, 1, pages 14-24. 2001.
[DTM96] P. E. DEBEVEC, C. J. TAYLOR, AND J. MALIK. Modeling and rendering architec-
ture from photographs: A hybrid geometry- and image- based approach. In
Proceedings, ACM SIGGRAPH, pages 11-20, 1996.
[HAA97] Y. HORRY, K. ANJYO, AND K. ARAI. Tour into the picture: Using a spidery
mesh interface to make animation from a single image. In SIGGRAPH, pages
225-232, 1997.
[ElHa01] S. F. EL-HAKIM. A flexible approach to 3D reconstruction from single images.
SIGGRAPH ’01 Sketches and Applications, 2001.
[Fau93] O. D. FAUGERAS. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993.
[FM94] F. FIGUEROA AND A. MAHAJAN. A robust method to determine the coordi-
nates of a wave source for 3-D position sensing. ASME Journal of Dynamic
Systems, Measurements and Control, 116: pages 505-511, September
1994.
[KAS01] H. KANG, S. PYO, K. ANJYO, AND S. SHIN. Tour into the picture using a vanishing line and its extension to panoramic images. In Proceedings of EUROGRAPHICS, 2001.
[KSH98] T. KIM, Y. SEO, AND K. HONG. Physics-based 3D position analysis of a soccer
ball from monocular image sequences. Proceedings of International Confer-
ence on Computer Vision, pages 721-726, 1998.
[KCS03] A. M. KUSHAL, G. CHANDA, K. SRIVASTAVA, M. GUPTA, S. SANYAL, T. V. N.
SRIRAM, P. KALRA, AND S. BANERJEE. Multilevel modelling and rendering of
architectural scenes. In Proceedings of EUROGRAPHICS. 2003.
[KBB02] A. M. KUSHAL, V. BANSAL, AND S. BANERJEE. A simple method for interactive
3D reconstruction and camera calibration from a single view. In Proceedings
Indian Conference in Computer Vision, Graphics and Image Processing,
2002.
[LCZ99] D. LEIBOWITZ, A. CRIMINISI, AND A. ZISSERMAN. Creating architectural models
from images. In Proceedings EUROGRAPHICS, pages 39-50, 1999.
[Le99] M. LEVOY. The digital Michelangelo project. In Proc. Eurographics, volume
18, September 1999.
[Maas92] H.-G. MAAS. Robust automatic surface reconstruction with structured light. In
International Archives of Photogrammetry and Remote Sensing, volume
XXIX of Part B5, pages 102-107. 1992.
[NUS04] H. ASLAKSEN, National University of Singapore. Perspective in Mathematics and Art. www.math.nus.edu.sg/aslaksen/projects/perspective/, 2004.
[Sam86] P. SAMUEL. Undergraduate Texts in Mathematics: Projective Geometry.
Springer-Verlag. 1986.
[SeKn79] J. SEMPLE AND G. KNEEBONE. Algebraic Projective Geometry. Oxford Uni-
versity Press. 1979.
[SS94] G. STERN AND A. SCHINDLER. Three-dimensional visualization of bone sur-
faces from ultrasound scanning. Technical report, A.A.Du Pont Institute,
1994.
[Tsai87] R. Y. TSAI. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE
Journal of Robotics and Automation, RA-3(4): pages 323-344, August
1987.
[WW02] G. H. WANG, Y. H. WU, AND Z. Y. HU. A Novel Approach for Single View
Based Plane Metrology. In Proc. International Conference on Pattern Rec-
ognition, vol.2, pages 556-559, Quebec City, Canada, August 2002.
CD Contents
Written formulations:
- Thesis as PDF
Source codes:
- 3D Measurements Simulation
- PhotoMeter (version 0.1)
Compilations:
- 3D Measurements Simulation
- PhotoMeter (version 1.0)