
Technische Universität Berlin
School IV - Electrical Engineering and Computer Science
Department of Computer Engineering and Microelectronics

Robotics and Biology Laboratory

Bachelor’s Thesis

Extended Visual Servoing for Manipulation

presented by

Sebastian Koch
Matr.-Nr.: 309837

[email protected]
Computer Engineering

Date of submission: December 1, 2010

Examiners: Prof. Dr. Oliver Brock, Prof. Dr. Olaf Hellwich
Advisors: Prof. Dr. Oliver Brock, Dipl.-Ing. Maximilian Laiacker


Declaration

I hereby declare in lieu of an oath that I have produced this work by myself. All used sources are listed in the bibliography, and content taken directly or indirectly from other sources is marked as such. This work has not been submitted to any other board of examiners and has not yet been published. I am fully aware of the legal consequences of making a false declaration.

Eidesstattliche Erklärung

Ich versichere an Eides statt, dass ich die vorliegende Arbeit selbstständig angefertigt habe. Alle benutzten Quellen sind im Literaturverzeichnis aufgeführt und die wörtlich oder inhaltlich entnommenen Stellen als solche kenntlich gemacht. Die Arbeit wurde bisher keiner anderen Prüfbehörde vorgelegt und noch nicht veröffentlicht. Ich bin mir bewusst, dass eine unwahre Erklärung rechtliche Folgen hat.

Place/Date Signature


Abstract

The interest in robots that are able to act in unstructured environments is increasing. The control of a robot in such an unstructured environment must rely on sensing and perception. Using visual perception for this offers comprehensive possibilities with regard to the safe manipulation of objects. This thesis presents a new approach for visual robot control (visual servoing). Existing approaches are extended such that depth and position information of objects is estimated during the movement of the robot. This is done by visually tracking an object throughout the trajectory. As a result, it becomes possible to manipulate objects in the environment of a robot with little previous knowledge. In addition, a force sensor is used for reliable manipulation in such unstructured environments.

Keywords: hybrid visual servoing, visual robot control, manipulation, unstructured environments, grasping, depth and position estimation

Kurzfassung

Das Interesse an Robotern, die in unstrukturierten Umgebungen agieren, ist wachsend. Die Steuerung eines Roboters in einer unstrukturierten Umgebung basiert auf der Abfrage von Sensoren und Wahrnehmungsfähigkeiten. Die Benutzung visueller Wahrnehmungsfähigkeiten erlaubt umfangreiche Möglichkeiten zur sicheren Manipulation von Objekten. In der vorliegenden Bachelorarbeit wird ein neuer Ansatz für die Steuerung eines Roboters mit Hilfe visueller Information präsentiert. Bisher existierende Verfahren werden so erweitert, dass die Tiefe und damit die Position eines unbekannten Objektes während der Bewegung des Roboters bestimmt werden kann. Dies ermöglicht es, Objekte in der unstrukturierten Umgebung eines Roboters mit relativ wenig Vorwissen zu manipulieren. Weiterhin stellt die Verwendung von Kraftsensoren zusätzlich zu den Kameras einen wichtigen Bestandteil für die Manipulation dar.

Titel: Erweiterte visuelle Steuerung eines Roboters für die Manipulation von Objekten.


Table of Contents

Preface
    Declaration
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Notation Remarks

1 Introduction
    1.1 Subject
    1.2 Motivation
    1.3 Objectives
    1.4 Overview

2 Methods and Techniques
    2.1 Related Work
    2.2 Robotics
        2.2.1 Control System
        2.2.2 Visual Servoing
        2.2.3 Coordinate Transformations
    2.3 Computer Vision
        2.3.1 Camera Model
        2.3.2 Epipolar Geometry
        2.3.3 Visual Features
        2.3.4 Object Tracking
        2.3.5 Depth Estimation
        2.3.6 Triangulation

3 Implementation
    3.1 Problem Description
        3.1.1 Input and Output
        3.1.2 Assumptions
    3.2 Algorithm Description
        3.2.1 Object Selection
        3.2.2 Initial Position Estimation
        3.2.3 Approach
        3.2.4 Manipulation
    3.3 Practical Details
        3.3.1 Software
        3.3.2 Hardware

4 Experiments and Results
    4.1 Performed Experiments
        4.1.1 Working Environment
    4.2 Quantitative Results
        4.2.1 Explanation of the Data Tables
        4.2.2 Interpretation of the Data
    4.3 Qualitative Analysis
        4.3.1 Trajectory Analysis
        4.3.2 Triangulation Analysis

5 Conclusion
    5.1 Summary and Contribution
    5.2 Discussion
    5.3 Future Work
    5.4 Final Remarks

References
    Images and Internet
    Bibliography


List of Figures

1.1 Two examples of recently developed robots
1.2 Overview of different kinds of autonomous robots

2.1 Operational space control scheme
2.2 Joint space control scheme
2.3 Different visual servoing control modes
2.4 Coordinate transformations
2.5 Pinhole camera model
2.6 Triangulation of 3D points

3.1 The robot at the initial position and before the manipulation
3.2 Object selection in an image
3.3 Initial position estimation and camera rotation
3.4 Camera images of start and intermediate position
3.5 Illustration of previous knowledge of image-based visual servoing
3.6 Overview of the manipulation procedure
3.7 Overview of the created software modules
3.8 Front view of the Mekabot
3.9 4-finger human-like robot hand

4.1 Overview of the objects that are used in the experiments
4.2 Explanation of the 2D tracking error
4.3 Plot of the object position estimation error
4.4 Trajectory analysis and plot of trial 13
4.5 Trajectory analysis and plot of trial 16
4.6 Backprojected rays from the camera into the scene

5.1 Key positions and tracked features of a complete trajectory
5.2 Slow movement trajectory based on force sensor input


List of Tables

4.1 Experimental results trial 1 – trial 10
4.2 Experimental results trial 11 – trial 20

Notation Remarks

The following notational conventions are used throughout this thesis:

• Vectors are lower case bold: v

• Matrices are upper case bold: M

• Scalars are mixed case italic: S, r

• Lower indices mark enumerations and components: a_i

• Upper indices indicate the coordinate system: x^c

• Transformation from coordinate system w to system c: T^w_c

Important information is marked in the following way:

These boxes highlight information that relates directly to the central theme. This includes conclusions of sections, practical information, or overviews of particular significance.


1 Introduction

This chapter introduces the subject of the thesis. A short motivation for the general field of research as well as a specific motivation for this thesis are presented. Furthermore, the objectives of this thesis are specified and the following chapters are outlined.

1.1 Subject

The domain of service and field robotics is developing rapidly with the emergence of new technologies and new needs. Figure 1.1 shows two examples of recently created robots that illustrate this development. This domain includes robots that operate in unstructured environments with little or even no prior information. In contrast to industrial robotics with its well-defined constraints, tasks are variable and have to be performed autonomously.

Figure 1.1: Two recently created robots. The robot on the left is Jaco [1]. This robot arm is built exclusively out of lightweight materials. It is used for personal assistance and medical applications. The right image shows iRobot [2]. It is a vacuum cleaning robot that works autonomously in household environments.

Obviously, the control of a robot in such an unstructured environment must rely on sensing and perception. There are different sensor types such as force/tactile, inertial, odometric, range and visual sensors. This facilitates a large number of approaches and techniques to plan and control the movement of a robot. Some techniques already work reliably for specific tasks and are used in consumer products (see figure 1.1). One technique for more comprehensive areas of operation is known under the term visual servoing [12]. The approach of visual servoing is to process the data of optical sensors such as cameras. By doing so, geometrical and qualitative information is obtained without physical interaction. This information is then used to plan and control the movement and the interaction.


As an example, consider a service robot that uses image data to manipulate objects in its environment.

The subject of this thesis is an extension of visual servoing. This extension allows the robot to estimate the depth and position of an object during its movement. Besides, the robot is able to work with any manipulable object without specific visual markers. In comparison, visual servoing (as it is described for example in [12] and [13]) uses specific information about objects. This information is either given as a reference image or as geometrical information about the object.

1.2 Motivation

The general motivation for field and service robotics originates from multiple fields, for example underwater and aerial robotics, medical and domestic applications, and robotics in hazardous environments [14]. Figure 1.2 shows several robots from different fields. One of the most important tasks in robotics is the manipulation (e.g. grasping, lifting, opening) of an object. In order to manipulate an object, it is necessary to interact with the environment and to establish physical contact with the object. A safe and reliable interaction necessitates extensive gathering of information about the environment.

Figure 1.2: Some examples of robots that operate autonomously in unstructured environments. From left to right: autonomous aerial robot [3], health care robot [4], domestic services robot [5].

One way to collect this information about the environment of a robot is the use of computer vision. Camera images are processed, and the information that is needed to control the robot is extracted. This idea is motivated by the fact that humans and animals primarily use their visual perception to interact with their environment.


Many different approaches, adaptations and improvements for vision-based control have been presented. Section 2.1 presents the most prominent related work.

Robust manipulation in unstructured environments is still an unsolved problem in the field of robotics. The use of visual perception is a promising approach to gather the information that is needed for interaction. The existing solutions either work only for particular tasks or depend on extensive previous knowledge. Accordingly, there is still a motivation for more general, more robust approaches that depend on less prior knowledge. The presented extension of visual servoing is one step in this direction.

1.3 Objectives

The objectives of this thesis can be outlined as follows:

• Extension of visual servoing loop for object depth/position estimation

• Previous knowledge (object model, desired image) becomes dispensable

• No use of special markers or tags on objects

• Implementation of one specific manipulation (grasping from above)

• Evaluation and testing of implementation

1.4 Overview

The structure of this thesis is as follows:

Chapter 2 presents related work and gives an overview of well-established methods and techniques in the field of robotics and computer vision.

Chapter 3 presents the implementation and the algorithm. The combination of the techniques that are described in chapter 2 is elaborated.

Chapter 4 contains the experiments performed to test the algorithm and a specific application. Experimental results are presented and analysed.

Chapter 5 concludes the thesis and discusses problems and possible future work. Final remarks and practical advice resulting from the thesis are listed.


2 Methods and Techniques

This chapter describes the important methods and techniques that were used for the implementation. These methods are all well established and described in detail in the literature. Therefore, the overview in this chapter covers only the parts relevant to this thesis.

2.1 Related Work

This section gives an overview of recent research results in the domain of visual servoing. For a detailed explanation of this technique, you might first want to read subsection 2.2.2. The term visual servoing was introduced in 1979 by Hill and Park [15]. Since then, the rapid technological development of cameras, computers and robots has opened up possibilities for improvements as well as new approaches and ideas. A recent comprehensive overview of all techniques that include visual systems in robotics is given by Kragic and Vincze [16]. Besides the general overview, they also provide introductions to specific fields, such as visual servoing. Furthermore, they cluster the existing branches into already working techniques and still open challenges.

Another recent survey of the domain of visual servoing is presented by Kragic and Christensen [17]. In this survey they summarize and classify about 100 different approaches according to the number of controlled degrees of freedom, the type of control model and the camera configuration.

Hutchinson, Hager and Corke [18] as well as Hutchinson and Chaumette [12],[19] provide tutorials as an introduction to this field. The latter is also part of the most recent robotics compendium [14]. This thesis is primarily based on these tutorials, but also uses ideas and techniques from other approaches. Differences and similarities between the approach presented here and the approach in the tutorials are pointed out in subsection 2.2.2 and subsection 3.2.3. Besides the scientific articles, multiple software solutions and frameworks for visual servoing exist [20],[6].

2.2 Robotics

Robotics is a highly multidisciplinary field of research. Given this fact, it is impossible to illustrate all the different branches in this section. As a result, only the most important techniques are presented, with regard to the necessary adjustments.


2.2.1 Control System

A control system enables the connection between the sensory and the actuation system: it retrieves feedback data from the sensory system and adjusts the executed actions in an intelligent way. Joint space control and operational space control are the two possibilities for this. Figure 2.1 shows a control scheme for operational space control, figure 2.2 the equivalent scheme for joint space control.

Figure 2.1: Operational space control scheme. Operational space control directly uses a desired position x_d and calculates the command forces f_c. See [14, p. 136].

Figure 2.2: Joint space control scheme. Joint space control transfers the desired position to joint values q_d and calculates the command torques τ. See [14, p. 136].

In joint space control, the control loop minimizes the task error in the joint space. Inverse kinematics is used to transfer the desired Cartesian position x_d into the vector q_d of joint coordinates. Typically, intermediate positions are calculated and the trajectory between these positions is interpolated. In contrast, the control loop of operational space control minimizes the task error in operational space. This is based on the dynamics of the robot expressed in operational space. A Jacobian matrix J(q) transforms the joint velocity vector q̇ into the end-effector velocity vector ẋ.

ẋ = J(q) · q̇    or    q̇ = J⁻¹(q) · ẋ        (2.1)
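As a minimal sketch of equation (2.1), the velocity mapping can be evaluated with a few lines of numpy. The planar 2-link Jacobian and all numerical values are illustrative assumptions, not taken from the robot used in this thesis; the pseudo-inverse stands in for J⁻¹ in the general, possibly non-square case.

```python
import numpy as np

def planar_2link_jacobian(q, l1=0.3, l2=0.25):
    """Geometric Jacobian of a planar 2-link arm (illustrative assumption)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],   # dx/dq1, dx/dq2
        [ l1 * c1 + l2 * c12,  l2 * c12],   # dy/dq1, dy/dq2
    ])

q = np.array([0.4, 0.8])          # joint angles [rad]
q_dot = np.array([0.1, -0.05])    # joint velocities [rad/s]

J = planar_2link_jacobian(q)
x_dot = J @ q_dot                          # end-effector velocity, eq. (2.1)
q_dot_back = np.linalg.pinv(J) @ x_dot     # inverse mapping via the pseudo-inverse
```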

Operational space control is typically the control mode for visual servoing. Unfortunately, it turned out that the chosen robot platform did not support this mode of control. Consequently, joint space control was used in this thesis, which led to several shortcomings (see section 5.2 for a discussion of the problems). Nevertheless, the presented work can easily be used with operational space control if the robot supports it.


2.2.2 Visual Servoing

Visual servoing is the process of using data from vision sensors in the control loop of robots. It is similar to active computer vision with regard to the gathering of information (provided that the camera is mounted on the robot). Basically, three forms of visual servoing are described in the literature: image-based, position-based and hybrid visual servoing. The categorization is based upon the formulation of the error. Image-based approaches try to minimize the error (the distance between the desired value and the current value) in the image space, whereas position-based approaches minimize the error in 3D space. Hybrid approaches deal with error functions in more than one space. Figure 2.3 shows the differences between the three control modes.

Figure 2.3: Classification of visual servoing based on the formulation of the control error. Top left: Position-based control. Top right: Image-based control. Bottom: Hybrid control. Image courtesy of [21].

Besides, there is a second criterion that subdivides visual servoing according to the camera configuration. The most popular groups are eye-in-hand and eye-to-hand mounting. In the former group the camera is fixed to the end-effector of the robot, in the latter the camera is stationary in the world and observes the end-effector.

In this thesis hybrid visual servoing is used with the camera mounted on the end-effector. One advantage of this approach is that the target object can easily be kept in the field of view of the camera. For more details see subsection 3.2.3. Besides, the target position is estimated during the movement.


2.2.3 Coordinate Transformations

Figure 2.4: The concept of coordinate transformations is illustrated in this image. The left image shows a robot with several relevant coordinate frames and the transformations between them. These are the world coordinate system w, the end-effector coordinate system e, the camera frame c and the coordinate system attached to the object o, together with the corresponding transformations T^a_b. The right image shows two coordinate systems a and b and is used as an example for a transformation.

Transformations describe the relation between coordinate systems and are used to transform coordinates from one system to another. Usually, they are expressed in homogeneous coordinates. The homogeneous transformation T^a_b of the example in figure 2.4 is given by

$$ \mathbf{T}^a_b = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

This transformation matrix is composed of a 3×3 rotation matrix and a translation vector from a to b. The columns of the rotation matrix are the three unit vectors of frame b expressed in frame a. The last row is the homogeneous augmentation. In the example, the coordinates of the point with regard to the system b are given as p^b. The transformation can then be used to transform the point to the coordinate system a. This is done by a simple multiplication of the transformation matrix with the point in homogeneous coordinates (augmented by 1):

$$ \mathbf{p}^a = \mathbf{T}^a_b \cdot \mathbf{p}^b $$

The concept of these transformations simplifies the change of coordinate systems to a large degree. More information on the relevant mathematics can be found in [22],[23].
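As a small illustration of the concept, the following sketch builds the transformation T^a_b from figure 2.4 with numpy and applies it to a point; the coordinates of the test point are made up for the example.

```python
import numpy as np

# Homogeneous transformation T^a_b from the example in figure 2.4:
# the upper-left 3x3 block is the rotation (columns = unit vectors of
# frame b expressed in frame a), the last column is the translation.
T_a_b = np.array([
    [1.0, 0.0,  0.0, 0.0],
    [0.0, 0.0, -1.0, 2.0],
    [0.0, 1.0,  0.0, 1.0],
    [0.0, 0.0,  0.0, 1.0],
])

p_b = np.array([0.5, 0.2, 0.0])            # point expressed in frame b (made up)
p_a = (T_a_b @ np.append(p_b, 1.0))[:3]    # p^a = T^a_b * p^b in homogeneous coordinates

# The inverse transformation T^b_a follows from the transposed rotation.
R, t = T_a_b[:3, :3], T_a_b[:3, 3]
T_b_a = np.eye(4)
T_b_a[:3, :3] = R.T
T_b_a[:3, 3] = -R.T @ t
```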


2.3 Computer Vision

2.3.1 Camera Model

A camera model describes the correspondence between objects in 3D space and their appearance in a 2D image. The well-known pinhole camera model applies to all cameras without complicated lens systems. This model is a simplification of the real world. Thus, it is extended with a distortion model that is used to deal with radial and tangential distortions of the lens. Figure 2.5 shows the principal idea behind the pinhole model. The mathematical formulation can be derived from this configuration using the rules of perspective projection. The result can be written down in the form of a matrix and is known as the camera projection matrix. Camera calibration is the process of determining this matrix.

Figure 2.5: The pinhole camera model that describes the projection of objects onto the image plane. Rays of light pass through the optical center and create the image on the sensor plane. Image courtesy of [21].

$$ \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \mathbf{K}^c_i \, \mathbf{E}^w_c \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} $$

The intrinsic matrix K^c_i and the extrinsic matrix E^w_c map the point p^w with the world coordinates (x, y, z) to the image coordinates (u, v). This mapping is ambiguous, since there is an infinite number of points in world coordinates that are mapped to the same image coordinates.


All these points lie on the ray from the camera's optical center o^c through the point p^w. The information about this degree of freedom is lost during the projection. Several methods exist to compensate for this loss; they are described briefly in subsection 2.3.5. More information on camera models, projections and camera calibration can be found in [24, p. 151ff].
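The projection equation above can be written compactly in numpy. The intrinsic and extrinsic values below are illustrative assumptions; in practice they come from the camera calibration and the current robot pose.

```python
import numpy as np

# Illustrative intrinsic parameters (focal lengths and principal point in pixels).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsic matrix E^w_c = [R | t]: here an identity rotation and a small
# translation along the optical axis (assumed values).
E = np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.5]])])

p_w = np.array([0.1, -0.05, 1.0, 1.0])    # world point in homogeneous coordinates

uvw = K @ E @ p_w                          # projection onto the image plane
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]    # divide by the homogeneous scale
```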

2.3.2 Epipolar Geometry

Epipolar geometry is a topic in computer vision that describes the geometric relations between images taken from different positions in space. A simplified version of epipolar geometry is used in this thesis for depth estimation. In this case the rotation between the two cameras is zero, and the images can be used as if they were rectified. Equation 3.2 then describes the depth of a point in the scene. For generic cases, the rotation and translation between the two views can either be calculated with image information or can be acquired by using the kinematic model of the robot and the encoder information. For a more detailed description that handles non-simplified cases see [24].

2.3.3 Visual Features

Visual features are distinct intensity patterns or measurable alignments of geometrical primitives in images. They are of interest because of their saliency, their reliability in recognition and tracking, their invariance and many other properties. Multiple possibilities exist to detect such features and to describe them in a unique way. Good features to track, as described in [25], were used in this implementation because they are reliable with regard to tracking.

2.3.4 Object Tracking

An object that is described by one or more visual features is tracked over a series of images. Since objects normally move slowly and normal cameras acquire about 20 images per second, the spatial difference between an object in one image and in the next image is small. Additionally, tracking features is one way to avoid the correspondence problem that exists with stereo camera systems. This is an important advantage when objects are only weakly textured.
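A minimal sketch of this detect-and-track pattern with OpenCV is shown below; the frame file names are hypothetical placeholders for two consecutive camera images.

```python
import cv2

# Two consecutive greyscale camera frames (hypothetical file names).
prev_img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr_img = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect "good features to track" [25] in the first frame.
features = cv2.goodFeaturesToTrack(prev_img, maxCorners=50,
                                   qualityLevel=0.01, minDistance=5)

# Follow the features into the next frame with pyramidal Lucas-Kanade optical flow.
new_pos, status, _ = cv2.calcOpticalFlowPyrLK(prev_img, curr_img, features, None)

# Keep only the successfully tracked features and update the object center (u_c, v_c).
tracked = new_pos[status.flatten() == 1].reshape(-1, 2)
object_center = tracked.mean(axis=0)
```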


2.3.5 Depth Estimation

Basically, there are two groups of depth estimation techniques based on visual information. These groups are divided by the number of images used to calculate the depth. One group is based on monocular images: only the information within one image is used to estimate the depth. The second group uses the information from multiple images and different viewpoint positions. The two interesting techniques of the second group are:

Depth from Stereo Two or even more images are used to calculate the depth. Corresponding features have to be found in all images and a disparity map is calculated. This disparity map represents relative depth information.

Depth from Camera Motion Equivalent to depth from stereo, except that the feature correspondences are generally better because the features are tracked during the movement. This is the reason why this method is used in this thesis.

2.3.6 Triangulation

Triangulation is the process of estimating the position of a point in 3D space given a set of corresponding feature locations and camera positions. Figure 2.6 shows the triangulation of a point given two camera views. The searched point is approximated here by the point that lies closest to both rays. For more details see [24, p. 310ff].

Figure 2.6: Triangulation of point p that is projected onto x_0 and x_1. Due to noise, the backprojected rays normally do not intersect. The point is approximated as the center of the shortest connection between the two rays. Image courtesy of [26].
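The midpoint construction from figure 2.6 reduces to a small linear system; the following sketch (with made-up ray origins and directions) solves it with numpy.

```python
import numpy as np

def triangulate_midpoint(o0, d0, o1, d1):
    """Midpoint of the shortest segment between two rays with origins o and unit directions d."""
    # Solve for the ray parameters s, t minimising |(o0 + s*d0) - (o1 + t*d1)|.
    A = np.array([[d0 @ d0, -d0 @ d1],
                  [d0 @ d1, -d1 @ d1]])
    b = np.array([(o1 - o0) @ d0, (o1 - o0) @ d1])
    s, t = np.linalg.solve(A, b)
    return 0.5 * ((o0 + s * d0) + (o1 + t * d1))

# Two camera centres and the backprojected viewing rays of a tracked point (illustrative values).
o0, d0 = np.zeros(3), np.array([0.0, 0.0, 1.0])
o1 = np.array([0.1, 0.0, 0.0])
d1 = np.array([-0.05, 0.0, 1.0]); d1 /= np.linalg.norm(d1)
p = triangulate_midpoint(o0, d0, o1, d1)
```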


3 Implementation

This chapter explains the implementation and the algorithm in detail. This comprises the combination of the techniques that are described in the previous chapter. The resulting generic skill is defined by a problem description as well as its input and output values.

3.1 Problem Description

As described in the introduction, the task is to move the end-effector from a starting position to the object in order to manipulate it. In the beginning, the robot is in an initial position and observes the manipulable objects (see figure 3.1, left image).

Figure 3.1: The robot is starting from its initial position. Manipulable objects lie on the table and are in the field of view of the camera (left image). The problem is to move the robot hand to a position at which the object can be manipulated (right image). The arrow in the middle thus represents the implemented technique that solves this problem.

The end-effector (in this case the robot hand) then moves towards the object. This is done by using the image information that is acquired by the camera. When the final position is reached, the object can be manipulated (see figure 3.1, right image).

The movement part of the described problem is solved as follows. First, an intermediate position is approached by means of visual servoing. Following this, the force sensor of the hand is used to detect physical contact.


3.1.1 Input and Output

The input of the created algorithm is the manipulation description and the image coordinates of an object. The manipulation description determines how the selected object is to be manipulated. This comprises the kind of manipulation (grasping, pushing, inserting, . . . ) and the relative position and orientation (pose) of the end-effector with regard to the object. Additionally, the movement trajectory of the robot arm depends on the kind of manipulation. For instance, if an object is to be pushed sideways, the end-effector has to approach the object at the same height; in such a scenario the orientation of the end-effector only plays a minor role. Conversely, if an object is to be grasped from above, the end-effector has to approach the object from above; in this case the orientation of the grasper is relevant.

The kind of manipulation influences the movement trajectory and the target pose of the robot hand. The main aspect of this thesis is the visual servoing part. Thus, the manipulation problem is simplified so that only a specific scenario is solved. This scenario consists of one simple manipulation, which is the grasping of an object from above. Additionally, it is assumed that the object is oriented in a way that allows the robot hand to grasp it.

The image coordinates of an object are given by the tuple (u, v) and represent one image point on the object. Figure 3.2 shows the selection of the coordinates and the object selection. The output of the algorithm is the performed manipulation (see figure 3.1, right image).

3.1.2 Assumptions

The following assumptions are made:

• Objects are located in the working area of the robot

• The size and shape of an object satisfies certain conditions (the robot hand must be able to manipulate it)

• The orientation of an object is either given or less important for the task

• No collision with other objects can occur

• The robot-camera system must be calibrated


3.2 Algorithm Description

The algorithm or implementation can be divided into the following parts:

• Object selection

• Initial position estimation

• Approach

• Manipulation

The main and more generic parts are the initial estimation and the approach. In contrast, the object selection and the manipulation parts are more specific.

3.2.1 Object Selection

The object to be manipulated is selected in the camera image. The selection contains the coordinates (u, v) of one point of the object. In the current implementation the user selects the object by clicking on it (see figure 3.2, left image). Alternatively, an automatic unsupervised selection based on object recognition could be realized. Possible techniques are for example bag of visual words (see [27],[28]) or classifier cascades based on Haar features (see [29],[30]). The basic procedure is similar for both techniques. At first, object categories are learned with the help of sample images. Afterwards, an object can be found and located in an image. If there are multiple objects in an image, classification results can be used to determine the object that is selected.

Figure 3.2: The left image shows the selection that is made by the user. Features are then searched for in the environment of the position (u, v). Finally, the object is selected and represented by the center of gravity (u_c, v_c) of all describing features. When features are moving away from the object center during the tracking, they are deleted from the selection.


After the coordinates are selected, features in the environment of the selection are searched for and added to the selection (see figure 3.2, right image). This is done by a combination of graph search and snakes (see [31]). At first, features are searched for in a distance graph and added to the selection. The snakes algorithm then optimizes the contour of the selection and further features are added. The features used in the implementation are good features to track (see [25]). They are used because tracking these features is robust and simple. More details on visual features are outlined in subsection 2.3.3. The feature descriptors could easily be exchanged with or extended by other feature descriptors. Accordingly, the robustness of the tracking can be improved by choosing and combining appropriate features. In the case that only a few features can be found on the object, features on the border between the object and the background are used. The advantage of this is that nearly every object – even plain objects – can be used as a target.
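A greatly simplified version of this selection step is sketched below: it only collects trackable features inside a circular region around the user's click and omits the graph search and the snakes refinement; the radius and feature counts are assumed values.

```python
import cv2
import numpy as np

def select_object(image_gray, click_uv, radius=40, max_features=30):
    """Collect trackable features around the user's click (simplified sketch;
    the thesis additionally adapts the selection to the object border)."""
    u, v = click_uv
    mask = np.zeros(image_gray.shape, dtype=np.uint8)
    cv2.circle(mask, (int(u), int(v)), radius, 255, -1)   # restrict the search region

    features = cv2.goodFeaturesToTrack(image_gray, maxCorners=max_features,
                                       qualityLevel=0.01, minDistance=5, mask=mask)
    if features is None:
        return None, None
    pts = features.reshape(-1, 2)
    center = pts.mean(axis=0)            # center of gravity (u_c, v_c) of the selection
    return pts, center
```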

A point in the image is selected by the user. The object selection is then adjusted to the object borders. In this step, additional features are added to the selection. This increases the robustness of the tracking and makes it possible to work without special markers on the objects. In this implementation the selection is not stored in any way. Thus, if it gets lost during the tracking, the object cannot be selected again.

3.2.2 Initial Position Estimation

After the object is selected in the image, an initial estimation of its position in 3D space is performed. In order to keep the object in the field of view, it is first centered in the image by a rotation of the camera. The rotation is determined by the following equations:

Roll: α = 0        Pitch: β = k · (v_c − c_v)        Yaw: γ = k · (u_c − c_u)        (3.1)

with k = gain, (u_c, v_c) = object center, (c_u, c_v) = image center

The angles are represented here in the more intuitive roll-pitch-yaw notation. However, internally the rotation is interpolated with quaternions. A short description and comparison is given in section 2.2.3. The coordinates of the image center (c_u, c_v) are subtracted to move the origin to the center. The rotation about the camera's optical axis cannot be estimated with one point and is set to 0.


Thus, two rotational degrees of freedom of the end-effector are controlled with a P-servo loop. The left and the middle image of figure 3.3 show the selection before and after the camera rotation. Admittedly, a PD- or PID-controller would be a better solution, but it turns out that the results are sufficient if the gain k is adjusted properly. Even though the two other controllers are theoretically easy to implement, the synchronization of the image retrieval loop and the control loop makes them more difficult to implement in practice.

Figure 3.3: Left: The center of the object selection (u_c, v_c) lies at a certain distance from the image center (c_u, c_v). The rotation of the camera is calculated with these coordinates and the result of the rotation is shown in the middle image. Middle: The coordinates of the object and the image center are approximately equal. The camera is then moved sideways. Right: After the sideways movement, there is a certain pixel disparity between the centers that is used for depth estimation.
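Equation (3.1) boils down to a proportional control law; the sketch below shows one evaluation of it, with an illustrative gain and image size.

```python
def centering_rotation(object_center, image_center, k=0.002):
    """Proportional control law from eq. (3.1): rotate the camera so that the
    tracked object center moves towards the image center (gain k is an assumed value)."""
    u_c, v_c = object_center
    c_u, c_v = image_center
    roll = 0.0                   # rotation about the optical axis cannot be estimated
    pitch = k * (v_c - c_v)      # vertical pixel offset drives the pitch
    yaw = k * (u_c - c_u)        # horizontal pixel offset drives the yaw
    return roll, pitch, yaw

# Example: object tracked at (400, 300) in a 640x480 image.
rpy = centering_rotation((400, 300), (320, 240))
```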

The next step is a translational movement of the camera. The resulting image after the movement can be seen in the right image of figure 3.3. This is necessary to be able to estimate the depth of the object. For best results, this translational movement should be directed perpendicular to the connecting line between the camera and the object. If the movement is along this line, the position cannot be estimated at all. Epipolar geometry (see subsection 2.3.2 for a detailed explanation) is then used to calculate the object depth. In order to do so, a simplified version of the general epipolar geometry is used. This simplification is only valid when the two image planes coincide. A controlled movement of the robot along a direction parallel to the image plane can guarantee this. The formula for the estimation of the depth then is:

depth = (camera distance × focal length) / pixel disparity        (3.2)

With the estimated depth, it is then possible to calculate the object position o^c. This necessitates the camera model and is done by a backprojection of image points (see subsection 2.3.1). The current camera transformation T^w_c is used to transform the object position o^c to the position o^w in world coordinates.


In general, it is also possible to already approach the object during the initial estimation. The reason to use the simplified solution (movement exclusively in the image plane) was the following: the inverse kinematics module often returned solutions that led to a loss of the image features when a movement towards the object was performed. With the use of operational space control it is possible to approach the object during this step.

3.2.3 Approach

After the object position is estimated, hybrid visual servoing is used to approach the object. According to [12], visual servoing can be described with the following error equation:

e(t) = s(m(t), a) − s*        (3.3)

with e(t) = error, m(t) = image data, a = knowledge, s = features, s* = desired features

The error e(t) is minimized during the servoing process. The image data m(t) represents the current information in the image. Together with the knowledge a (here: the camera model), the feature vector s(m(t), a) can be calculated. Before the initial estimation, the feature vector contains the current object selection coordinates (u_c, v_c). The vector of desired values contains the image center coordinates (c_u, c_v). After the 3D object position is estimated, the vector s* is extended with an intermediate position next to the object. This intermediate position is necessary because the object may disappear from the limited field of view when the robot hand approaches it. Figure 3.4 shows an image taken at the intermediate position. Furthermore, the camera position is added to the vector s.

The difference between visual servoing as described in [12] and this approach can be shown in equation 3.3. The described image-based visual servoing approach uses a fixed set of image points for s*. Thus, this approach necessitates a reference image and works only with textured or marked objects (see figure 3.5). The described position-based approach depends on additional previous knowledge in the form of a 3D model. Here, the desired values s* are calculated and changed during the movement.


The advantage of hybrid visual servoing is that the three translational degrees of freedom are controlled position-based, whereas the rotational degrees of freedom are controlled image-based. The object center coordinates (u_c, v_c) are used to control two rotational degrees of freedom (see equation 3.3). This keeps the object in the field of view of the camera, which is one requirement for the tracking. The translational degrees of freedom are controlled independently and are used to approach the object.

Figure 3.4: The left image shows the camera view at the beginning of the servoing loop. The initial estimation of the 3D object position is used to approach the intermediate position. During the movement, the 3D object position as well as the intermediate position are refined. The right image shows the camera view at the intermediate position. Obviously, the object is at the border of the image and the tracking of the position has to be stopped. The intermediate position is dependent on the kind of manipulation. For example, if an object is to be grasped from above, the position must be above the object.

In contrast to the rotational control, the control of the three translational degrees of freedom is more complex. The following requirements have to be fulfilled:

• Direct movement towards the intermediate position

• Lateral movement to refine the object position estimation

The first requirement is described by the position error, which is the vector between the camera position p^w and the intermediate position t^w.

p^w_{i+1} = p^w_i + k · (t^w − p^w_i)        (3.4)

with k = gain, p^w_{i+1} = new position, p^w_i = old position, t^w = intermediate position

In classical visual servoing only this error determines the movement towards the object. In contrast to this, here the lack of previous knowledge is compensated for by a more complex movement. This leads to the second requirement: the necessity to refine the object position estimation, which may be imprecise. The refinement is also important because the position of the object may change. The refinement is done here by a lateral (sideways) movement and a triangulation of the position. The best solution to determine the lateral movement is based on the reliability of the current estimation: in brief, if the estimation is less likely to be precise, a larger lateral movement is performed. At this point again, a simplified method is used. This method calculates a sinusoidal trajectory with an amplitude that depends on the distance to the intermediate position. The idea behind it is that the estimation quality increases as the distance gets smaller.
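One translational update of this approach phase can be sketched as follows: equation (3.4) provides the step towards the intermediate position and a distance-dependent sinusoidal term adds the lateral refinement motion. The gain, the amplitude and the choice of the lateral direction are illustrative assumptions, not the exact values used on the robot.

```python
import numpy as np

def approach_step(p_w, t_w, step_index, k=0.1, amplitude0=0.05):
    """One translational update: eq. (3.4) plus a sinusoidal lateral component
    whose amplitude shrinks with the remaining distance (values are assumed)."""
    error = t_w - p_w                              # vector to the intermediate position
    distance = np.linalg.norm(error)
    direction = error / distance

    # Lateral direction perpendicular to the approach direction (here: in the x-y plane).
    lateral = np.cross(direction, np.array([0.0, 0.0, 1.0]))
    if np.linalg.norm(lateral) > 1e-6:
        lateral /= np.linalg.norm(lateral)

    # The closer the hand gets, the better the estimate, so the amplitude decreases.
    amplitude = amplitude0 * distance
    return p_w + k * error + amplitude * np.sin(0.5 * step_index) * lateral
```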


Figure 3.5: Illustration of the previous knowledge that is used for image-based visual servoing. The left image illustrates the reference image of the object, the right image illustrates the current view. The control error is defined by feature positions and their relation or size in the reference image and the current image. In other words, the camera is moved to a position at which the features in the reference image and the current image are identical. In this case, they are identical at one position with a fourfold rotational ambiguity. Texture or special markers are used to define a unique solution.

The hybrid visual servoing approach that is used here allows one to superimpose different trajectories and motions. These are the rotational motion to keep the object in the field of view and the generated trajectories to approach the object and to refine the position estimation. As a result, it is possible to track the object and to estimate the object position without an object model. This is an advantage for the manipulation of unknown objects. The servoing schemes described in [12] are evaluated with regard to the system behaviour and the generated trajectory. Of course, the resulting trajectory here is not a straight movement towards the object. An analysis of the resulting trajectory is presented in section 4.3.


3.2.4 Manipulation

The last part of the algorithm is the execution of a specific manipulation. This manipulation is the grasping of the object from above. Consequently, the intermediate position lies above the object. In order to detect the necessary physical contact with the object, the force sensor of the hand is used. Hence, the hand moves downwards until the measured vertical force exceeds a certain threshold. The hand stops in this position as it is touching the object. Forces are applied to the fingers and the hand closes around the object. As a last step, the grasped object is lifted. All these parts of the manipulation are illustrated in figure 3.6.

Figure 3.6: This figure shows an overview of the manipulation procedure. At first, the robot hand is in the intermediate position above the object (left image). It then moves downwards until the measured force exceeds a certain threshold. When this is the case and the object has not moved in the meantime, the robot hand touches the object (middle image). The last image shows how the object is grasped and lifted.

A similar approach that combines force and visual feedback in industrial environments is described in [32]. For the chosen robot configuration, it was absolutely necessary to combine both techniques. One reason for this is the fact that the robot camera is attached to the side of the hand – not the center.
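The force-guarded descent can be sketched as a simple guarded-move loop. The `robot` object and its method names below are hypothetical stand-ins for a wrapper around the Meka M3 interface; the threshold, step size and timeout are assumed values.

```python
import time

def descend_until_contact(robot, force_threshold=2.0, step=0.005, timeout=10.0):
    """Lower the hand until the measured vertical force exceeds a threshold
    (sketch; `robot` is a hypothetical wrapper, its methods are illustrative)."""
    start = time.time()
    while time.time() - start < timeout:
        if abs(robot.vertical_force()) > force_threshold:   # physical contact detected
            return True
        pose = robot.end_effector_pose()
        pose[2] -= step                                      # move down by `step` metres
        robot.move_to(pose)
        time.sleep(0.05)
    return False                                             # no contact within the timeout
```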

The additional use of force feedback makes it possible to detect physical contact. Besides, the highly capable robot hand that was used enables relatively robust manipulation. Problems with this technique arise only when the object is moved during the last step or when it is not oriented the right way. The first problem can be avoided, for example, by using a second camera that observes the complete scene and detects object movements. To solve the second issue, the type of manipulation plays an important role. In this scenario (grasping from above), it can be addressed with the use of a simple 2D orientation detection.


3.3 Practical Details

3.3.1 Software

The following libraries and tools were used for the implementation in Python:

• OpenCV for the acquisition and processing of the images [7]

• YAML in order to save trajectories and parameters [8]

• CGKit for an implementation of quaternions [9]

• Meka M3 Framework to control the robot [10]

• Matplotlib for the evaluation and visualization [11]

The implementation is distributed over four classes, which can be seen in figure 3.7. This modularization allows a clean structuring of the different parts.


Figure 3.7: The four created software modules are shown here. The bottom modules Robot and Vision are used by the modules Servoing and Visualization. MEKA and OpenCV are integral parts of these two modules.

The module Robot acts as an interface to control the robot. Image acquisition and processing is performed by the Vision module. The Servoing module then combines the provided functionality in the servoing loop. Multithreading is used to enable efficient image processing and robust tracking. The use of OpenCV for the image processing simplifies the complete vision part to a large degree. MEKA, as the equivalent for the robot control part, also contains a lot of useful functionality. This is the reason for the strong integration of these libraries into the Robot and Vision modules. The module Visualization can plot the trajectory and behaviour during or after the movement. This is done by saving and loading the trajectory.


3.3.2 Hardware

The robot that was used is the Mekabot [10], which is shown in figure 3.8. It is a humanoid torso with two degrees of freedom and with two arms that have seven degrees of freedom each. The end-effector is a 4-finger human-like hand that is capable of difficult grasping tasks as well as measuring the applied force (see figure 3.9). Two conventional FireWire cameras are attached to the hands. They are able to capture images with a resolution of 640 × 480 pixels at a framerate of 30 frames per second. Objects can be put on a small table in front of the robot.

Figure 3.8: Front view of the Mekabot. The objects lie on the table in front of the robot. The camera is attached to the robot hand.

Figure 3.9: Side view of the 4-finger robot hand performing a pinch grasp. The camera is missing in this image. Image courtesy of [10].


4 Experiments and Results

This chapter presents the experiments and the results. Different aspects of the experiments are elaborated and analysed quantitatively and qualitatively. A short overview of the working environment and the manipulable objects that were used is given.

4.1 Performed Experiments

To test the implementation, the robot performed one specific manipulation several times. The type of manipulation was the grasping of an object from above. Different objects without special visual properties (color, markers, etc.) were used. The movement trajectory was recorded in the form of absolute Cartesian positions of the end-effector. These positions were calculated with encoder information and the kinematic model. Furthermore, the estimation of the object position and the feature coordinates were recorded.

The experiments are performed to analyse various aspects of the implementation. First, the overall reliability and stability of the presented technique is verified. Second, the resulting trajectory and the behaviour of the system are analysed. The third interesting aspect is the refinement of the object position estimation. And last, visual aspects such as the use of plain objects, the tracking error and triangulation results are tested and analysed. A conclusion that considers all different results is presented in chapter 5.

4.1.1 Working Environment

The working environment consisted of the robot (which is described in subsection 3.3.2) and a small table in front of it. Different objects were put on the table and the servoing loop was started by the user. This means that the user clicked on an object in the initial camera image. Subsequently, the hand approached the selected object and grasped it from above. Figure 4.1 shows the objects that were used for the experiments. These objects are all weakly textured but easy for the robot hand to grasp. All in all, the experiment was performed many times; nevertheless, only the data of the last 20 experiments is presented here.


Figure 4.1: The objects that were used in the experiments. From left to right: box, metal sphere (Sph1), wooden sphere (Sph2), bottle (Bot), frame (Fra) and wooden cube.

4.2 Quantitative Results

In this section, quantitative results of the performed experiments are presented. These results may not be completely representative, but they indicate a basic trend of the capabilities. Table 4.1 and table 4.2 contain the data. An explanation and interpretation of the presented data is given in the following subsections.

Table 4.1: Experimental results trial 1 – trial 10. For an explanation and interpretation of the results see subsection 4.2.1 and subsection 4.2.2.

Number     01     02     03     04     05     06     07     08     09     10
Object     Box    Box    Box    Box    Box    Box    Box    Box    Cube   Sph1
2D error   1      3      28     36     24     4      —      11     14     21
3D error   1.4    10.2   2.5    3.9    13.2   1.8    —      4.4    6.2    2.8
∆est       0.9    7.2    3.0    3.3    —      3.4    —      4.6    6.1    3.6
Distance   30.9   24.3   26.3   25.6   31.8   29.9   —      35.6   32.3   38.9
Success    Yes    No     Yes    Yes    No     Yes    No     Yes    Yes    Yes

4.2.1 Explanation of the Data Tables

The data of experiment or trial n is contained in the corresponding column. The row Object names the object that was used for the specific trial (see figure 4.1). The row 2D error represents the distance (in pixels) from the selection center before the movement to the selection center after the movement; figure 4.2 shows an example and explains what is meant by this. The row 3D error is the measured distance (in cm) between the optimal grasping position (determined by hand) and the reached position after the movement.

Table 4.2: Experimental results, trial 11 – trial 20. For an explanation and interpretation of the results see subsection 4.2.1 and subsection 4.2.2.

Number     11     12     13     14     15     16     17     18     19     20
Object     Fra    Fra    Bot    Bot    Sph1   Sph1   Box    Box    Box    Sph2
2D error   9      18     8      10–15  —      11     —      7      2      —
3D error   3.7    10.4   3.9    2.4    —      3.3    7.0    4.0    4.0    2.1
∆est       2.9    5.6    1.0    5.9    —      14.8   4.0    2.5    1.7    3.5
Distance   26.6   40.9   37.1   31.9   —      29.8   24.1   28.6   24.5   27.9
Success    Yes    Yes    No     Yes    No     Yes    Yes    Yes    Yes    Yes

The estimation change ∆est is the distance (in cm) between the initial estimation and the final estimation of the object position; it thus represents the total change of the estimation. For the improvement of the estimation over time, see figure 4.3. The row Distance shows the total distance (in cm) between the starting and the ending position of the robot hand. Success indicates whether the grasping and lifting was successful or not. For some trials, not all data could be acquired due to various problems (see subsection 4.2.2).
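
To make the table entries concrete, they can be read as plain Euclidean distances over the recorded data. The following minimal sketch shows how the 2D error, the 3D error and ∆est are computed; all values in it are illustrative placeholders, not numbers from the recorded trials.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two points given as coordinate sequences."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

# Illustrative placeholder values only
center_before_px = (320, 240)            # selection center before the movement (pixels)
center_after_px = (323, 238)             # selection center after the movement (pixels)
error_2d = euclidean(center_before_px, center_after_px)

reached_cm = (31.0, 4.2, 12.5)           # reached end-effector position (cm)
optimal_cm = (30.2, 4.0, 11.8)           # manually determined grasping position (cm)
error_3d = euclidean(reached_cm, optimal_cm)

initial_estimate_cm = (30.0, 5.0, 13.0)  # object position estimate before the movement (cm)
final_estimate_cm = (30.4, 4.1, 12.2)    # object position estimate after the movement (cm)
delta_est = euclidean(initial_estimate_cm, final_estimate_cm)
```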

Figure 4.2: The 2D tracking error is calculated as the pixel distance of the selected object center before and after the movement. The red point marks the center before the movement and the green point after the movement. This distance is only an estimation, since the original center was selected manually in the second image. Nevertheless, it shows the accuracy of the object tracking during the movement.

4.2.2 Interpretation of the Data

The total success rate of about 75% suggests that the presented technique works in principle. A further interpretation follows after the remarks on individual trials:

• Trial 4: The tracked features were partially out of the field of view at the end of the movement. Therefore, the selection center moved and the 2D tracking error was relatively large.

• Trial 5: The 3D position error was large due to a wrong initial estimation. The estimation error could not be calculated for the same reason.

• Trial 7/15: Due to a bad solution of the inverse kinematics, all tracked features left the field of view. As a result, the complete trial was stopped and no data could be acquired.

• Trial 13: The object (the bottle) could not be grasped because of its orientation.

• Trial 12/17: The large 3D error was mainly in the vertical direction and the objects could be grasped.

The errors in 3D space for successful trials were mainly caused by tracking errors of the features. However, a correlation between the 2D tracking error and the 3D error is not clearly observable. The reason for this is that the resulting 3D error is influenced by many different factors, for example the sensor noise of the encoders and errors in the estimation. Another point that was demonstrated is the refinement of the position during the movement. Even for large initial estimation errors (as in trial 16), the refinement worked and the object could be grasped (see also figure 4.3). Furthermore, the total distance between the object and the initial position played only a minor role for the overall performance.

As expected, the quantitative results only support some weak conclusions. They can be summarized as follows: First, the presented technique works for the relatively generic scenario, and further improvements can be made to increase its reliability. Second, even if the 3D error is large, the manipulation is executed successfully. And last, the refinement of the position estimation works even for larger errors (as caused, for example, by moving objects). For more representative results, the experiments would have to be performed under different conditions.

Figure 4.3: The images show the error of the object position estimation plotted over time. The reference position that is used to determine the error is the measured 3D position of the object. The left image shows four trials with a better initial estimation, the right image four trials with a worse initial estimation. Both plots show that the estimation error decreases globally over time, even if it increases locally in some cases. A detailed explanation of this behaviour requires further experiments, since many parameters influence this error.

4.3 Qualitative Analysis

This section contains the analysis of the movement trajectories, the system behaviour and some visual aspects. Two example trajectories of the experiments and the corresponding estimations of the object position are shown in figure 4.4 and figure 4.5.

Figure 4.4: The sampled trajectory of trial 13 is plotted here and different aspects are emphasized. See figure 4.5 for further details.

Figure 4.5: The sampled trajectory of trial 16 is plotted here and different aspects are emphasized. In the left image, the trajectory and the position estimation are plotted in 3D space. The corresponding 2D plot is shown in the upper right image. The lower right image shows the change of the (u, v) coordinates of the object center. Three different stages of the movement are emphasized (1–3). The initial estimation (S) as well as the final estimation (E) of the object position are marked. A detailed analysis is given in subsection 4.3.1.

4.3.1 Trajectory Analysis

The different stages of the movement are as follows (a simplified sketch of the overall loop is given after the list):

• Stage 1: The object is selected by the user and the selection is centered in the image (see the (u, v)-coordinates plot). Next, the initial estimation of the object position is done by moving the camera sideways. In both trials, the movement was along one camera axis, which can be seen in the (u, v)-plots.

• Stage 2: The movement towards the object is executed. The camera moves on a sinusoidal trajectory and keeps the object centered in the image. During this movement, the object position is refined by new triangulations. The end-effector moves to the intermediate position above the actual object position.

• Stage 3: The camera is not used anymore since the object leaves its field of view (see the (u, v)-plot). The hand is moved downwards until a force is detected. The final position is reached and the object is grasped.
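
The following sketch summarizes these three stages as one loop. The robot, camera and tracker objects stand in for the modules described in chapter 3, and all numeric values (baseline, number of waypoints, sinusoid amplitude, force threshold) are assumptions for illustration; the actual implementation differs in its details.

```python
import numpy as np

def grasp_from_above(robot, camera, tracker, click_uv,
                     hover_height=0.10, force_limit=2.0, step=0.005):
    """Simplified outline of the three movement stages (placeholder interfaces and values)."""
    # Stage 1: select features around the clicked point and obtain an initial estimate
    tracker.select(camera.grab(), click_uv)
    robot.move_sideways(0.05)                              # small sideways motion as a baseline
    estimate = tracker.triangulate(camera.grab(), robot.camera_pose())

    # Stage 2: approach on a sinusoidal path, keep the object centered and
    # refine the position estimate with a new triangulation in every view
    start = np.asarray(robot.end_effector_position(), dtype=float)
    for s in np.linspace(0.0, 1.0, 50):
        goal = np.asarray(estimate, dtype=float) + np.array([0.0, 0.0, hover_height])
        waypoint = (1.0 - s) * start + s * goal
        waypoint[2] += 0.03 * np.sin(np.pi * s)            # assumed sinusoidal height profile
        robot.move_to(waypoint, look_at=estimate)
        estimate = tracker.triangulate(camera.grab(), robot.camera_pose())

    # Stage 3: the object leaves the field of view; descend until contact is detected
    while robot.hand_force() < force_limit:
        robot.move_down(step)
    robot.close_hand()
```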

The unexpected movement behaviour that occurs at the end of stage 2 probably has several causes. The rapid change of the (u, v) coordinates indicates that the orientation of the camera is changing significantly, which means that the orientation control, the inverse kinematics or both induce this behaviour. Additionally, the fact that the object is closer to the camera intensifies this effect.

4.3.2 Triangulation Analysis

Section 4.2 already showed how the position estimation is refined during the movement. This is done by performing new triangulations, as described in subsection 2.3.6 and subsection 3.2.3. If the initial estimation is imprecise (as, for example, in figure 4.4), the wrong estimation can still be compensated for during the movement. In this case, the distance between the initial estimation and the final estimation is ∼15 cm.

Figure 4.6: The image shows the backprojection of the object center point into 3D space. The current camera frame and the image coordinates are used to do this. These rays are used for the refinement of the object position estimation. Under normal conditions, these rays are skew because of the noise in the camera frame and image coordinates. Thus, an approximation method is used to estimate a new object position.
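
The approximation method itself is not spelled out in the caption. One common choice for intersecting two noisy, skew back-projection rays is the midpoint of their shortest connecting segment, sketched below; treat it as an illustrative assumption rather than the exact method used in the implementation.

```python
import numpy as np

def ray_midpoint(p1, d1, p2, d2, eps=1e-9):
    """Midpoint of the shortest segment between two rays p1 + t*d1 and p2 + s*d2."""
    p1, d1 = np.asarray(p1, dtype=float), np.asarray(d1, dtype=float)
    p2, d2 = np.asarray(p2, dtype=float), np.asarray(d2, dtype=float)
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < eps:                  # rays (nearly) parallel: no reliable triangulation
        return None
    t = (b * e - c * d) / denom           # parameter of the closest point on ray 1
    s = (a * e - b * d) / denom           # parameter of the closest point on ray 2
    return 0.5 * ((p1 + t * d1) + (p2 + s * d2))
```

A new estimate computed from the current ray and an earlier one can then be used to update the previous object position estimate, which produces the refinement visible in figure 4.3.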

The qualitative analysis shows that even small movements of the target object can be tolerated. The behaviour of the system is relatively robust: even when small disturbances occur, the robot hand successfully executes the manipulation. Besides, the analysis shows that the simple triangulation that is used here already produces good results.

5 Conclusion

A final conclusion of the work is drawn in this chapter. The achieved goals are summarized and the existing problems are discussed. Furthermore, an overview of possible future work is presented and final remarks are outlined.

5.1 Summary and Contribution

The presented technique allows the manipulation of objects in unstructured environments. To achieve this, the position of objects is estimated during the visual servoing loop. This estimation depends on the known camera trajectory and the tracking of image features. Figure 5.1 shows key positions and tracked features of a complete trajectory. A force sensor is used in addition to the vision sensors to detect physical contact, which is necessary for manipulation tasks. This part of the algorithm is shown in figure 5.2.

Figure 5.1: The images show key positions and the corresponding camera image for a completed visual servoing loop. The last image shows the position before the end-effector moves downwards. At this point, the feature tracking is already stopped and the force sensor is used to detect physical contact (see figure 5.2).

Figure 5.2: The robot hand is slowly moving downwards until a force above a certain threshold is detected. This makes it possible to grasp and lift the object.
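
A minimal sketch of this guarded downward motion is given below; the interface and all numeric values (force threshold, step size, travel limit, averaging window) are assumptions, since the exact parameters are not stated here.

```python
def guarded_descent(robot, force_limit=2.0, step=0.005, max_travel=0.15, samples=5):
    """Move the hand down in small steps until the averaged force reading exceeds a threshold.

    robot is a placeholder interface; returns True if contact was detected, False if the
    allowed travel range was exhausted without contact.
    """
    travelled = 0.0
    while travelled < max_travel:
        readings = [robot.hand_force() for _ in range(samples)]   # average out sensor noise
        if sum(readings) / samples > force_limit:                 # contact detected
            return True
        robot.move_down(step)
        travelled += step
    return False
```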

The main contribution of this work is the described extension of the visual servoing loop. The resulting advantages are less reliance on previous knowledge and the possibility of manipulating plain objects.

The results of the experiments show that this method provides reliable manipulation of objects in weakly structured environments.

5.2 Discussion

Even though the position estimation and the manipulation work reliably, there are still problems that need to be discussed. Possible solutions to some of them are presented in section 5.3.

The most profound shortcoming arises from the use of inverse kinematics to calculate the trajectory. When a new position of the end-effector necessitates a larger movement of some joints, the tracked object can disappear from the field of view of the camera. As a result, the manipulation has to be started again.

The second problem is due to the fact that no orientational and structural information about the object is used. Consequently, the approach only works if the object is manipulable without a previously performed rotation and if its structure allows the manipulation. For example, if a bottle is standing on a table and the manipulation is grasping the bottle, this may only work if the robot grasps the bottle from the side and not from above.

Another point for discussion involves the simplifications that were made. Basically, they are used in order to decrease the complexity. As the results show, some shortcomings arise from these simplifications. Therefore, several improvements could easily be made by using the more general methods.

Other important issues are that there is no collision detection and that object movement during the servoing can only be handled if the movements are relatively small.

5.3 Future Work

One important improvement would be to use operational space control. Unfortunately, this mode could not be used in this implementation. Nevertheless, if the control software supports this mode, it is easy to implement velocity control in operational space: since the position of the object and the pose of the end-effector are known, a velocity vector can be calculated and used for the movement.

Further tasks could include the extension of the tracking module. In the presented implementation, the object is only represented by a set of features in image space and one point in 3D space. During the movement, the position in 3D space is estimated, but further information about the structure of the object is discarded.

A possible adaptation could include the estimation of the object's structure and orientation (Structure from Motion) in order to find contact points for the manipulation.

Of course, many more improvements and extensions could be implemented. For example, filtering of the object position estimate could make it easier to work with moving objects, and object recognition could be used for the unsupervised selection of objects. Additionally, there are more sophisticated ways to generate movement trajectories.

5.4 Final Remarks

• The use of quaternions for the interpolation between different orientations plays an important role (a short sketch follows this list).

• Image processing should run in a separate thread in order to work in real time.

• Precise intrinsic camera calibration is important for accurate position estimates.

• Using undistorted images narrows the field of view but enhances the accuracy
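
Regarding the first remark, orientation interpolation with quaternions is typically realized as spherical linear interpolation (slerp). The textbook formulation is sketched below; it is not necessarily the exact routine used in this work.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions q0, q1 for t in [0, 1]."""
    q0 = np.asarray(q0, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                   # nearly identical: fall back to normalized lerp
        q = (1.0 - t) * q0 + t * q1
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)
```

Interpolating the rotation in this way gives a constant angular velocity between the two orientations, which naive component-wise interpolation does not.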

References

Images and Internet

[1] http://www.kinovarehab.com/.

[2] http://www.iroboteurope.de/.

[3] http://robotics.usc.edu/~avatar/index.html.

[4] http://rtc.nagoya.riken.jp/RI-MAN/index_us.html.

[5] http://www.dlr.de/rm/en/desktopdefault.aspx/tabid-5471/.

[6] http://vstoolbox.sourceforge.net/.

[7] http://opencv.willowgarage.com/wiki.

[8] http://www.yaml.org/.

[9] http://cgkit.sourceforge.net/.

[10] http://www.mekabot.com/.

[11] http://matplotlib.sourceforge.net/.

Bibliography

[12] F. Chaumette and S. Hutchinson. Visual Servo Control, Part I: Basic Approaches. IEEE Robotics and Automation Magazine, volume 13 (4):82–90, 2006.

[13] P. I. Corke. Visual Control of Robot Manipulators – A Review. In Visual Servoing, 1–31. World Scientific, 1994.

[14] B. Siciliano and O. Khatib, editors. Springer Handbook of Robotics. Springer,first edition, 2008.

[15] J. Hill and T. W. Park. Real Time Control of a Robot with a Mobile Camera. 233–246, 1979.

[16] D. Kragic and M. Vincze. Vision for Robotics. now Publishers Inc., first edition,2010.

[17] D. Kragic and H. I. Christensen. Survey on Visual Servoing for Manipulation. Technical report, Computational Vision and Active Perception Laboratory, 2002.

[18] S. Hutchinson, G. D. Hager and P. I. Corke. A Tutorial on Visual Servo Control. IEEE Transactions on Robotics and Automation, volume 12 (5):651–670, 1996.

[19] F. Chaumette and S. Hutchinson. Visual Servo Control, Part II: Advanced Approaches. IEEE Robotics and Automation Magazine, volume 14 (1):109–118, 2007.

[20] E. Marchand, F. Spindler and F. Chaumette. ViSP for Visual Servoing: A Generic Software Platform with a Wide Class of Robot Control Skills. IEEE Robotics and Automation Magazine, volume 12 (4):40–52, 2005.

[21] V. Lippiello. Multi-Object and Multi-Camera Robotic Visual Servoing. 2003.

[22] O. Khatib and K. Kolarov. Introduction to Robotics, 2009.

[23] R. M. Murray, Z. Li and S. S. Sastry. A Mathematical Introduction to Robotic Manipulation, 1994.

[24] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.

[25] J. Shi and C. Tomasi. Good Features to Track. Proceedings of the Conference on Computer Vision and Pattern Recognition, 593–600, 1994.

[26] R. Szeliski. Computer Vision: Algorithms and Applications. Springer, 2010.

[27] P. Tirilly, V. Claveau and P. Gros. Language Modeling for Bag-of-Visual Words Image Categorization. CIVR '08: Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval, 249–258, 2008.

[28] J. Yang, Y.-G. Jiang, A. G. Hauptmann and C.-W. Ngo. Evaluating Bag-of-Visual-Words Representations in Scene Classification. MIR '07: Proceedings of the Workshop on Multimedia Information Retrieval, 197–206, 2007.

[29] P. Viola and M. Jones. Rapid Object Detection Using a Boosted Cascade of Simple Features. Computer Vision and Pattern Recognition, 2001 IEEE Computer Society Conference on, volume 1:I-511–I-518, 2001.

[30] R. Lienhart and J. Maydt. An Extended Set of Haar-like Features for Rapid Object Detection. Image Processing, 2002 International Conference on, volume 1:I-900–I-903, 2002.

[31] M. Kass, A. Witkin and D. Terzopoulos. Snakes: Active Contour Models. International Journal of Computer Vision, volume 1 (4):321–331, 1987.

[32] J. Baeten and J. De Schutter. Hybrid Vision/Force Control at Corners in Planar Robotic-Contour Following. Mechatronics, IEEE/ASME Transactions on, volume 7 (2):143–151, 2002.
