
Hough transform analysis of data from a planar array of image sensors

S Wright

Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK

This paper suggests that a planar array of cameras can provide a multidimensional image space which is more structured than conventional images. This simplifies the subsequent analysis using a Hough transform. Possible applications and choice of array configuration are discussed, an initial implementation is described and results are presented.

Keywords: scene analysis, image sensor arrays, robot vision

Industrial vision systems are of growing importance in flexible manufacturing systems (FMS) as they offer a noncontact sensor which can be programmed to cater for a wide range of applications. For example, in future robotic assembly systems, the vision system could be required to

• recognize the type, position and orientation of objects
• provide dynamic feedback of robot end effector position relative to a data item in the visual field
• inspect components and assemblies for compliance with specification

These applications all require some measure of general-purpose scene analysis if extensive setting up time and tooling costs are to be avoided. Fortunately, the industrial task allows the use of simplifying assumptions, such as that the geometry of all the objects in view is stored in a CAD database. Inferring object geometry from an intensity image is, however, a nontrivial task, owing to variable ambient lighting, surface finish, discoloration, reflections, shadows, etc. Ideally, an industrial sensor used in conjunction with such a database would, therefore, provide a depth map as the primary sensor information.

A number of such systems are available or being developed based on active lighting, scanning rangefinder, stereo image analysis or some combination of these techniques [1]. If these types of system are to be routinely applied in industrial robotic assembly, the sensor performance must not degrade even on rare 'difficult' images such as those containing very repetitive features, as the FMS will be expected to operate unsupervised on a wide variety of tasks.

Ideally the sensor will be self-contained, such that it is possible to mount it on the end of a robot arm so that the viewpoint and the scale of the image are under program control. This has the additional benefit that the dynamic range of such a depth mapping sensor need not be as large as with a fixed camera system since the area requiring maximum depth resolution will usually be close to the robot end effector. Even when this is not the case, the robot arm can move to the area of interest to take a closer ‘look’.

If the vision system is to be included in any continuous feedback loops then its frame rate must be fast enough not to limit the dynamic performance of the robot - this will require a depth map calculation delay of less than, say, 0.1 s for a typical current robot such as the IBM 7565. This will almost certainly require dedicated analysis hardware. If this is to be conveniently achieved, then the analysis algorithm must be as simple as possible. This paper presents a technique for constraining the 'range from motion' problem in order to simplify subsequent analysis while still meeting the requirements for the sensor to be self-contained and able to deal with 'difficult' images.

DEPTH MAPPING TECHNIQUES

There are numerous possible approaches to obtaining a depth map, each with advantages in particular applications. For instance, scanning rangefinders can avoid ambient lighting problems, but are inherently slow and, as precision mechanical devices, are also likely to remain expensive. Other approaches such as structured light can process a large number of points in parallel, but may require setting up for each application.


Figure 1. Variable parallax offset in apparent viewing position controlled by the angle of a Perspex block

Stereo image analysis requires that corresponding pixels in the two views are identified. This requires an iterative procedure which complicates implementation in hardware, and can also give ambiguous results in the presence of repetitive features in the image, or if features in the image are nearly aligned with the axis of the stereo pair [2].

An alternative technique for determining scene depth is 'range from motion'. This approach can avoid the correspondence problem by tracking features between adjacent images [3, 4], but if the full six degrees of freedom of motion are allowed, the calculations to recover the scene depth can be complicated, particularly if the relative position of each view is not known exactly. Problems with this approach also arise if there is any independent motion in the scene during the image sequence capture time.

This paper suggests that a passive array of well-matched vision sensors (or alternatively a single camera rapidly scanning through a fixed sequence of positions) can give the advantages of range from motion in avoiding any ambiguity in pixel correspondence, while the simultaneous image acquisition (or short sequence capture time) minimizes the problem of independent movement in the scene. The fixed set of camera positions simplifies calibration by removing the reliance on a separate motion sensing system, and careful choice of camera positions can then minimize the complexity of the subsequent calculations and eliminate the sensitivity to feature orientation. In particular, the degrees of freedom of viewpoint position can be reduced from six to two if the cameras are arranged in a planar array normal to their collective line of sight [5].

In the short term, a camera array would be expensive, bulky and difficult to calibrate, so the initial investigation of the technique has used the variable parallax offset produced by an inclined block of Perspex to scan a sequence of viewpoints (Figure 1). Two configurations have been investigated: a linear, and a circular scan sequence. These are seen as practical systems for inspection and object recognition, particularly when combined with a high frame rate CCD camera, but as the image sequence capture time would preclude use in dynamic feedback tasks, this approach is primarily an expedient way of testing the analysis algorithm. The following sections describe the analysis technique used and the trade-offs involved in the choice of camera configuration, and present the experimental results achieved. The next stage in this study is seen as an investigation of hardware implementation aspects.

Figure 2. Geometry of image offset caused by lateral camera displacement

IMAGE ARRAY ANALYSIS

The conventional stereo analysis approach would identify features in each individual camera image, then track the motion of these features between images in order to establish the magnitude of the parallax offsets, and hence the range. However, Bolles and Baker [6] have shown that the sequence of images resulting from lateral camera motion can be considered as a 3D data solid, with two spatial dimensions and one temporal, and that this data solid can be sectioned orthogonally to the image plane and parallel to the lateral camera motion to give a set of images with one spatial and one temporal dimension (hereafter referred to as the epipolar or orthogonal image) which consists entirely of linear structures. This data solid can also be generated by a linear array of cameras, where the temporal dimension is replaced by linear camera offset. The technique can be further extended to a 4D data solid, with two dimensions of camera offset needed to describe the spatial position of the camera taking the sample image.
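As an illustration of this construction, the following sketch (Python with NumPy is assumed here; array names and layout are illustrative, not from the paper) extracts the orthogonal images from a stack of images taken by a linear camera array:

```python
import numpy as np

def orthogonal_images(solid: np.ndarray):
    """solid: (n_cameras, height, width) image stack from a linear array,
    i.e. the 3D data solid with camera offset replacing time.
    Yields one (n_cameras, width) orthogonal image per scanline y, in which
    every object edge traces a straight line whose slope encodes its range."""
    n_cameras, height, width = solid.shape
    for y in range(height):
        # Section the solid orthogonally to the image plane and parallel
        # to the camera baseline: fix the row, keep all cameras and columns.
        yield solid[:, y, :]
```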

From inspection of Figure 2, it may be seen by similar triangles that the offset of an image point from the centre of the image (X_i) is related to the offset of the object point from the camera axis (X_o) by the equation

X_i = F X_o / Z    (1)

where F is the focal length of the camera, and Z + F is the length of the normal from the object point onto the image plane. The slope of the lines in the orthogonal image space is dX_i/dX_o. It may be seen from Equation (1) that this slope is inversely proportional to the Z-coordinate of the object point. Measurement of the slope of these image features will, therefore, provide a Cartesian depth map.

The Hough transform is a mapping from image space into parameter space, which was originally developed to identify the parametric form of straight line features in images, and has since been extended to analytic curves, arbitrary shapes, multidimensional pattern recognition and greyscale images [7-10]. It can, therefore, be applied to both the 3D and 4D data solids described above in order to find the slope of features in the orthogonal image set. Owing to the problem of illustration, the analysis will be described for the 3D solid; the extension to the fourth dimension will be assumed.

The algorithm is based on the following steps.

1. Partition the data into a set of orthogonal images.
2. Smooth, then differentiate, each orthogonal image.
3. Clip to provide a binary sign representation (Kennedy et al. [11]).
4. Map this binary data into the range parameter space.
5. Select the best range estimates in the parameter space to give a set of range estimates for each line in the original image.
6. Apply continuity constraints to the range image set (Baker [12]).
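The following minimal sketch illustrates steps 2-5 for a single orthogonal image (Python with NumPy assumed; the smoothing kernel, vote threshold, candidate slope set and focal length value are illustrative choices, not taken from the paper):

```python
import numpy as np

def hough_depth(ortho, slopes, F=500.0):
    """Sparse range profile from one orthogonal image.
    ortho:  (n_cam, width) array; rows are camera positions on the baseline.
    slopes: candidate slopes dX_i per camera step (assumed positive).
    F:      focal length in pixel units, with the camera spacing as the unit
            of baseline length, so Equation (1) gives Z = F / slope."""
    n_cam, width = ortho.shape
    # Step 2: smooth each scanline, then differentiate along x.
    kernel = np.array([1, 4, 6, 4, 1]) / 16.0
    smooth = np.apply_along_axis(np.convolve, 1, ortho, kernel, 'same')
    sign = (np.gradient(smooth, axis=1) > 0).astype(np.int8)
    # Step 3 output is the binary sign image; edge points are sign changes.
    cams, xs = np.nonzero(np.diff(sign, axis=1) != 0)
    # Step 4: each edge point (cam, x) votes for every line x = x0 + s*cam
    # through it, i.e. for intercept x0 = x - s*cam at each candidate slope.
    acc = np.zeros((width, len(slopes)), dtype=np.int32)
    for si, s in enumerate(slopes):
        x0 = np.round(xs - s * cams).astype(int)
        ok = (x0 >= 0) & (x0 < width)
        np.add.at(acc, (x0[ok], si), 1)
    # Step 5: a true feature gathers roughly one vote per camera, so accept
    # maxima with near-full support and convert slope to range via Eq. (1).
    slopes_arr = np.asarray(slopes, dtype=float)
    best = acc.argmax(axis=1)
    depth = np.where(acc.max(axis=1) >= n_cam - 1,
                     F / np.maximum(slopes_arr[best], 1e-9), np.nan)
    return depth, acc
```

Each accumulator maximum selects the slope of one linear feature, and Equation (1) converts that slope to range; repeating over all scanlines yields the sparse depth map.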

Having established a technique to determine a sparse depth map from a set of images, the practical considerations of accuracy, sensitivity to noise, sensor size, weight and cost must be considered.

CHOICE OF NUMBER AND SPACING OF CAMERAS

In a planar array the separation of the outermost cameras defines the accuracy with which the object range can be determined. The range quantization levels Z(n) for this camera pair can be obtained by substituting into Equation (1) the maximum baseline B and the image offset X_i = nS, where S is the pixel spacing and n is a positive integer, to give

Z(n) = FB / (nS)    (n = 1, 2, ...)    (2)

Equation (2) allows calculation of the furthest detectable range by evaluation of Z at n = 1. These calculations provide a guide to the size of array needed to fulfil a given requirement, but are not a theoretical limit to its performance, as it may be possible to locate edges to subpixel resolution in a process similar to hyperacuity in human vision [13]. It should be noted that the inverse law governing the separation of quantization levels implies that accurate range profiles are available for a very restricted range of depths.


In a practical system, if the scanning Perspex block is used, this range of fine sensitivity can be adjusted during operation by use of a zoom lens, as well as by changing the camera position.
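As a numerical illustration of Equation (2), the fragment below tabulates the quantization levels for an assumed array (the focal length, baseline and pixel spacing values are invented for the example, not taken from the paper):

```python
# Range quantization levels Z(n) = F*B/(n*S) for an assumed array:
# F = 16 mm focal length, B = 100 mm maximum baseline, S = 0.02 mm pixels.
F, B, S = 16.0, 100.0, 0.02
for n in range(1, 6):
    print(f"n = {n}: Z = {F * B / (n * S):.0f} mm")
# n = 1 gives the furthest detectable range (80 m here); successive levels
# crowd together as 1/n, so depth resolution is fine only at close range.
```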

Intermediate cameras in the array maintain the correspondence of pixels between any two views. This can be illustrated by considering the case of a linear array of cameras and a test object with regular vertical lines at a fixed range. For an infinite number of cameras, this arrangement would produce the orthogonal image shown in Figure 3. If the camera spacing is increased to an interval such that the image is displaced by one wavelength or greater between successive samples, then the mapping into the parameter space will produce ambiguous (or aliased) results, as illustrated by the set of samples marked # in Figure 3, where lines a, b and c could all be selected as giving the correct parallax angle.

In the range from motion analysis of an image sequence, the temporal spacing of the images, and hence the physical spacing, is approximately constant. This constraint does not apply to a sensor array. The benefit of this can be seen by adding a single extra sample (marked $) to the set of samples marked # in Figure 3. The Hough transform will now accumulate more votes for the correct range line (b) than for the aliased lines (a and c). In general, for an extra camera on the end of a line of N cameras, the actual position of an image feature in the additional image must not deviate from its expected position by more than one wavelength of the highest spatial frequency present in the image. This observation provides a theoretically optimum number of cameras for any given accuracy and resolution, based on an exponentially increasing camera spacing. In practice this theoretical optimum can be achieved only approximately, as the use of one vote to distinguish between alternative maxima in parameter space makes no allowance for sensor noise and inaccuracy.
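One reading of this constraint is sketched below: once the existing cameras span a baseline b, the feature slope is fixed to within roughly one wavelength over b, so a new camera at b' mispredicts the feature position by about (b' - b)/b wavelengths, and keeping that below one wavelength allows b' up to about 2b. The function, the doubling ratio and the values are illustrative assumptions, not the paper's design:

```python
# Sketch of exponentially spaced camera positions along the baseline.
# The first spacing must itself be small enough to avoid aliasing on the
# finest image detail; thereafter each camera may roughly double the span.
def camera_positions(first_spacing, max_baseline, ratio=2.0):
    positions = [0.0, first_spacing]
    while positions[-1] * ratio <= max_baseline:
        positions.append(positions[-1] * ratio)
    return positions

print(camera_positions(1.0, 100.0))  # [0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
```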

EDGE EFFECTS, OCCLUSIONS AND LIMITATIONS

It may be seen from Equation (1) that features in the orthogonal image space corresponding to short ranges in the parameter space may, if they are already close to the edge of that solid, exceed the boundary of the image solid without appearing in all the images. It may also be that they are interrupted by an occluding object. Noise in the image, and alignment or sensor matching problems, may cause the vote lines in parameter space to cluster rather than intersect at exactly one point for every object point. These effects are discussed further by Shapiro [8].

Figure 4. Overview of test scene

Errors due to changing alignment of the sensors can be minimized by the use of solid state cameras which are rigidly connected to each other. This removes drift in the control electronics as a source of alignment error, such as would have occurred with Vidicon tube cameras. Owing to the short baseline of the array, the mechanical constraints can be much more robust than is possible with a conventional stereo pair, but mechanical shock and thermal expansion of the mountings cannot be eliminated. For maximum accuracy it will, therefore, be necessary to update the calibration offsets automatically, using feedback from exact object positions as they become known. For less accurate work, as the sensor array is a single unit, calibration by the manufacturers can be sufficient. In either case, manual intervention in setting up for a new application is avoided.

Figure 5. A set of sample lines from the same position in the set of 32 test images

A limitation of any passive stereo matching procedure is that depth information can only be deduced where there are identifiable features in the image. This effect could be avoided if the scene were illuminated with a projected image similar to structured light, to produce artificial edges in otherwise featureless areas. As the analysis does not use the information of what pattern is projected, or where from, the projection system does not have any expensive requirements for accurate equipment or time-consuming setting-up procedures.

The alternative formulation of the technique, with a scanning Perspex block, avoids many of these calibration issues. Sensor matching is no longer a problem, and the parallelism of the effective viewpoints is determined by the parallelism of the sides of the Perspex block, which can be controlled to close tolerance. The remaining problem is that of determining the angular position of the Perspex block at each image position so that the parallax offset can be accurately calculated.

EXPERIMENTAL RESULTS AND CONCLUSIONS

The analysis assumes that the surfaces in the scene exhibit perfect Lambertian radiation properties. This should be a good approximation for most matt engineering materials, such as those in an automated assembly cell, provided that the illumination of the cell is sufficiently diffuse, particularly as the range of angles tested by a practical array would be small.

To simulate the data available from a planar array of image sensors, a sequence of images was obtained by rotating a parallel-sided block of Perspex through fixed angular increments in the line of sight of a single camera. This produces an effective offset δx given by

δx = T sin[θ - arcsin((sin θ)/N)] / cos[arcsin((sin θ)/N)]

where θ is the angular position of the Perspex block, T is its thickness and N its refractive index.
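A quick numerical check of this refraction relation (the block thickness and refractive index values below are assumed for illustration, not taken from the experimental setup):

```python
import math

def perspex_offset(theta_deg, T=25.0, N=1.49):
    """Lateral ray displacement (mm) through a parallel-sided block of
    thickness T (mm) and refractive index N, tilted at theta to the ray."""
    theta = math.radians(theta_deg)
    r = math.asin(math.sin(theta) / N)  # refraction angle inside the block
    return T * math.sin(theta - r) / math.cos(r)

for angle in (0, 10, 20, 30):
    print(f"theta = {angle:2d} deg: dx = {perspex_offset(angle):.2f} mm")
```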

An overhead view of the test scene is shown in Figure 4. This scene consists of a machine tool cutter and a spring, arranged on a surface marked out with a grid for calibration purposes, and in front of a backdrop tiled with acoustic tiles. These objects are set at ranges between 20 and 100 cm. The image was digitized using a 512 × 512 × 7-bit frame grabber at 32 equiangular increments of the Perspex block.


Figure 6. Hough transform of the set of sample lines shown in Figure 5

Figure 8. Range image thresholded in depth, showing acoustic tile backdrop to the test scene


An orthogonal image from the experimental data set is shown in Figure 5, clearly showing the parallax motion to be measured.

Figure 9. As Figure 8, but thresholded at a close range to add the machine tool cutter

The Hough transform was calculated for this line (Figure 6), and a single image of the test scene is shown in Figure 7. The calculation was repeated for each line in the image to produce a sparse depth map, which is illustrated by thresholding at three different depth values to give Figures 8-10. This shows good sensitivity to features which are normal both to the line of sight of the camera and to the line of traverse of the linear array.
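Thresholding the sparse depth map into range bands, as in Figures 8-10, reduces to a simple mask per band. A minimal sketch, assuming the map stores NaN where no feature (and hence no range estimate) was found:

```python
import numpy as np

def depth_slice(depth_map, z_min, z_max):
    """Binary mask of pixels whose estimated range lies in [z_min, z_max).
    depth_map: 2D array of ranges, with np.nan marking unestimated pixels
    (NaN comparisons are False, so those pixels are excluded automatically)."""
    with np.errstate(invalid='ignore'):
        return (depth_map >= z_min) & (depth_map < z_max)
```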

A second experiment has been constructed in which the rotation of the Perspex block is coaxial with the line of sight of the camera, producing a circular offset rather than a linear one. This should give equal sensitivity irrespective of feature orientation. Further experiments are planned to demonstrate this, and the effect of exponential camera spacing in reducing the number of calculations required for a given accuracy depth map.

The limited depth range over which accurate measurements can be achieved with a short baseline sensor can be seen by inspection of the range scale in Figure 6. For the dynamic control task this should not prove a problem, as fine control is usually only required at close range. For the general object recognition task, more time can be allowed; the range information can therefore be used in conjunction with other image analysis techniques - for instance, to provide scale information to constrain the search space of a shape-based algorithm [14].

The results obtained so far have shown that a planar array image sensor can avoid some of the ambiguities inherent in conventional stereo image analysis by the provision of extra information. This allows a simple analysis algorithm, operating on binary data in a single pass, to be used to derive a depth map. It is, therefore, potentially convenient to implement in hardware.

The next stage in this research is to investigate these hardware implementation aspects, and to build a full planar array of sensors to study the real-time application of this technique in robotic assembly tasks.

REFERENCES

1 Jarvis, R A 'A perspective on range finding techniques for computer vision' IEEE Trans. PAMI Vol 5 No 2 (March 1983)

2 Hildreth, E C 'Computations underlying the measurement of visual motion' AI Memo 761, MIT Artificial Intelligence Laboratory, Cambridge, MA, USA (March 1984)

3 Bridwell, N J and Huang, T S 'A discrete spatial representation for lateral motion stereo' Comput. Vision, Graph. and Image Proc. Vol 21 (1983) pp 33-57

4 Moravec, H P 'Visual mapping by a robot rover' Proc. 6th Int. Joint Conf. AI (1979) pp 598-620

5 Tsai, R Y 'Multiframe image point matching and 3D surface reconstruction' IEEE Trans. PAMI Vol 5 No 2 (March 1983) pp 159-174

6 Bolles, R C and Baker, H H 'Epipolar plane image analysis: a technique for analysing motion sequences' Proc. 3rd IEEE Workshop on Computer Vision: Representation and Control (1985)

7 Hough, P V C ‘Method and means for recognizing complex patterns’ US Patent 3069654 (1962)

8 Shapiro, S D 'Feature space transforms for curve detection' Pattern Recognition Vol 10 (1978) pp 129-143

9 Ballard, D H 'Generalizing the Hough transform to detect arbitrary shapes' Pattern Recognition Vol 13 No 2 (1981) pp 111-122

10 Sklansky, J 'On the Hough technique for curve detection' IEEE Trans. Comput. Vol C-27 No 10 (Oct 1978) pp 923-926

11 Kennedy, J, Migliau, A, Morasso, P, Sandini, G, Teulings, H L and Vernon, D 'Image and movement understanding' Presented at Esprit Technical Week, Brussels, Belgium (September 1985)

12 Baker, H H 'Edge based stereo correlation' Proc. ARPA Image Understanding Workshop, University of Maryland, USA (April 1980)

13 Huertas, A and Medioni, G 'Edge detection with subpixel precision' Proc. 3rd IEEE Workshop on Computer Vision: Representation and Control (1985)

14 Duda, R 0, Nitzon, D and Barrett, P ‘Use of range and reflectance data to find planar surface regions’ IEEE Trans. PAMI Vol 1 (July 1979) pp 259-271
