
OpenCV – PART II: Two case studies

Dario Cazzato, INO – CNR, dario.cazzato@ino.it

Università del Salento – Facoltà di Ingegneria

Image Processing (Elaborazione delle Immagini)

Academic Year (A.A.) 2012/2013

This lesson introduces the use of the OpenCV library in real cases.

Two case studies:
◦ Stereo Correspondence Problem;
◦ Segmentation of Video Sequences.

Introduction

Why OpenCV?
◦ Free, open source, suitable for real-time applications, cross-platform, constantly updated, strong partners and research.

C/C++:
◦ "cv::Mat vs CvMat".

Basic components:
◦ Matrices, vectors, rectangles, sizes, images, data types… Some examples.

What we've seen
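As a quick recap of the basic containers seen in Part I, here is a minimal C++ sketch (the values are arbitrary, only the types matter):

```cpp
#include <opencv2/core.hpp>
#include <iostream>

int main() {
    // A 3x3 single-channel float matrix initialized to zero.
    cv::Mat m = cv::Mat::zeros(3, 3, CV_32F);
    m.at<float>(1, 1) = 5.0f;

    // Basic geometric types: a rectangle and a size.
    cv::Rect roi(10, 20, 100, 50);   // x, y, width, height
    cv::Size s(640, 480);

    // A 3D point.
    cv::Point3f p(1.0f, 2.0f, 3.0f);

    std::cout << "m = " << m << "\nroi area = " << roi.area()
              << "\nimage size = " << s << "\npoint = " << p << std::endl;
    return 0;
}
```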

Stereo Correspondence Problem

What is Stereo Vision? Humans:

◦ Binocular vision.
◦ Average distance between the eyes: about 6 cm.
◦ An object/point seen with both eyes is perceived as one, although on the retina we have two images.

The combined image is more than the sum of its parts. It's not trivial!

What is Stereo Vision? Other configurations:

◦ Animals:
  Predators: binocular sight.
  Prey: lateral eyes (to enlarge the field of view).
◦ Intersecting lines of sight (typical in stereo vision!).

Let's look at some key concepts from a practical point of view, but the full problem is much larger!
◦ You will see more details about epipolar geometry and matrix computation in class.

Pinhole camera model
◦ The simplest model of the camera.
◦ Only a single ray enters from any particular point, through the pinhole aperture.
◦ This point is projected onto the image plane.
◦ The focal length is the distance from the pinhole aperture to the image plane.

Pinhole camera model
◦ A real point Q is projected onto the image plane by the ray passing through the center of projection.
◦ This intersection gives q.

Calibration Matrix (3×4)

From “Learning OpenCV”, G. Bradski, A. Kaehler, O’Reilly.

Pinhole camera model

Homogeneous coordinate system.

If you have N dimensions, use N+1 coordinates.
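To make the projection explicit (standard notation following “Learning OpenCV”; f_x, f_y, c_x, c_y are the intrinsic parameters):

```latex
q = \begin{bmatrix} x \\ y \\ w \end{bmatrix}
  = M\,Q
  = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
    \begin{bmatrix} X \\ Y \\ Z \end{bmatrix},
\qquad \text{pixel coordinates} = (x/w,\; y/w).
```

The 3×4 matrix mentioned a few slides above is presumably the full projection matrix, i.e. the intrinsics combined with the extrinsics, P = M\,[R \mid t], applied to homogeneous world points.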

Homography – Perspective Geometry

What is Stereo Vision?

One camera is not enough!

What is Stereo Vision? With two (or more) cameras we can compute depth by triangulation, if we are able to find homologous points in the two images.

Epipolar Geometry

What is Stereo Vision? Four steps:

1. Undistortion;
2. Rectification;
3. Disparity Map;
4. Triangulation.

What is Stereo Vision?
1. Undistortion: removal of tangential and radial lens distortion.

This problem concerns the single camera!

Distortion vector (1×5)
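For reference, OpenCV orders the five distortion coefficients as (k1, k2, p1, p2, k3). A sketch of the underlying radial/tangential model for the 5-coefficient case (omitting the higher-order terms OpenCV also supports), mapping ideal normalized coordinates (x, y) to distorted ones:

```latex
\begin{aligned}
x_{\mathrm{dist}} &= x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2\,(r^2 + 2x^2)\\
y_{\mathrm{dist}} &= y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1\,(r^2 + 2y^2) + 2 p_2 x y
\end{aligned}
\qquad r^2 = x^2 + y^2
```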


What is Stereo Vision?
2. Rectification: output row-aligned images (coplanar image planes, with the same y-coordinate).

With rectified images, we can search for a point of one image along the same line (y-coordinate) of the second one!

What is Stereo Vision?
Of course, a stereo calibration is needed (extrinsic and intrinsic parameters):
◦ Intrinsic: focal length, distortion.
◦ Extrinsic: the matrices R, T that align the two cameras (Essential Matrix E, you will see more in class).

We can divide the procedure into:
◦ Stereo Calibration: computation of the geometric relation between the two cameras in space;
◦ Stereo Rectification: "correction" of the individual images so that they appear as taken with row-aligned image planes and parallel optical axes.

What is Stereo Vision?

Look at the OpenCV samples, in particular the source stereo_calib.cpp.

Rectification puts the cameras into standard form!
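A minimal sketch of the rectification step, assuming the stereo calibration (cameraMatrix1/2, distCoeffs1/2, R, T) has already been computed, e.g. with cv::stereoCalibrate as in stereo_calib.cpp; variable and function names are illustrative:

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/core.hpp>

// Inputs are assumed available from a previous stereo calibration.
void rectifyPair(const cv::Mat& cameraMatrix1, const cv::Mat& distCoeffs1,
                 const cv::Mat& cameraMatrix2, const cv::Mat& distCoeffs2,
                 const cv::Mat& R, const cv::Mat& T, cv::Size imageSize,
                 const cv::Mat& left, const cv::Mat& right,
                 cv::Mat& leftRect, cv::Mat& rightRect) {
    cv::Mat R1, R2, P1, P2, Q;
    // Compute the rectification transforms for the two cameras.
    cv::stereoRectify(cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2,
                      imageSize, R, T, R1, R2, P1, P2, Q);

    // Build the undistortion + rectification maps and remap both images.
    cv::Mat map1x, map1y, map2x, map2y;
    cv::initUndistortRectifyMap(cameraMatrix1, distCoeffs1, R1, P1,
                                imageSize, CV_32FC1, map1x, map1y);
    cv::initUndistortRectifyMap(cameraMatrix2, distCoeffs2, R2, P2,
                                imageSize, CV_32FC1, map2x, map2y);
    cv::remap(left,  leftRect,  map1x, map1y, cv::INTER_LINEAR);
    cv::remap(right, rightRect, map2x, map2y, cv::INTER_LINEAR);
}
```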

What is Stereo Vision?

Example 1

What is Stereo Vision?

From “Learning OpenCV”, G. Bradski, A. Kaehler, O’Reilly.

What is Stereo Vision?
3. Disparity map: difference in the x-coordinates of the same point viewed by the two cameras.

A map is created by computing the disparity for all the points. It is encoded as a grayscale image, where farther points are darker.
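A minimal sketch of computing a disparity map on a rectified pair with OpenCV's block-matching implementation (SAD-based); this uses the OpenCV 3+ API, file names and parameter values are only indicative:

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

int main() {
    // Rectified grayscale pair (file names are placeholders).
    cv::Mat left  = cv::imread("left_rect.png",  cv::IMREAD_GRAYSCALE);
    cv::Mat right = cv::imread("right_rect.png", cv::IMREAD_GRAYSCALE);

    // Block matcher: 64 disparity levels, 15x15 SAD window (indicative values).
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 15);
    cv::Mat disp16, disp8;
    bm->compute(left, right, disp16);          // fixed-point disparity (16*d)

    // Normalize to 8 bits for display: far points (small d) appear darker.
    cv::normalize(disp16, disp8, 0, 255, cv::NORM_MINMAX, CV_8U);
    cv::imwrite("disparity.png", disp8);
    return 0;
}
```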

What is Stereo Vision?
4. Triangulation: depth is recovered from the disparity of the same point viewed by the two cameras.

Idea (similar triangles): d : T = f : Z
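Spelling out the proportion (T is the baseline between the cameras, f the focal length, d = x_l − x_r the disparity):

```latex
\frac{d}{T} = \frac{f}{Z}
\quad\Longrightarrow\quad
Z = \frac{f\,T}{d}
```

So depth is inversely proportional to disparity: nearby points have large disparity, far points small disparity (which is why they appear darker in the disparity map).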

Stereo Correspondence Problem
How to find homologous points?

◦ Correlation-based: checking whether one location in one image looks like another location in the other image;

◦ Feature-based: finding features in the images and checking whether the layout of a subset of features is similar in the two images.

So, why so challenging?

◦ Occlusions;
◦ Photometric transformations;
◦ Uniform regions;
◦ Noise;
◦ Specular surfaces;
◦ Perspective views;
◦ Repetition;
◦ "No result possible" detection: sometimes we would just like to say "there is no corresponding point in the other image for this point".

Stereo Correspondence Problem – Local Algorithm

Winner-takes-all (WTA) strategy

Sum of Absolute Differences (SAD):

Sum of Squared Differences (SSD):
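The formulas (the originals were in the slide images; the notation here is mine) for a window W centered on pixel (x, y) and disparity hypothesis d:

```latex
\begin{aligned}
\mathrm{SAD}(x,y,d) &= \sum_{(i,j)\in W} \bigl|\, I_L(x+i,\,y+j) - I_R(x+i-d,\,y+j) \,\bigr| \\
\mathrm{SSD}(x,y,d) &= \sum_{(i,j)\in W} \bigl( I_L(x+i,\,y+j) - I_R(x+i-d,\,y+j) \bigr)^2
\end{aligned}
```

For these dissimilarity measures the WTA strategy picks, for each pixel, the disparity with the minimum cost; for similarity measures such as the ZNCC below, the maximum is taken instead.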

WTA Strategy

Zero-mean Normalized Cross Correlation (ZNCC):

◦ Not just the window comparison, but also a fast normalization;
◦ ZNCC has range [-1, 1];
◦ We compute the ZNCC for each pixel (as window center);
◦ We take the maximum value.
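The ZNCC score for a candidate disparity d can be written as (again, notation is mine; the bars denote the means of each image over the corresponding window):

```latex
\mathrm{ZNCC}(x,y,d) =
\frac{\displaystyle\sum_{(i,j)\in W}\bigl(I_L(x+i,y+j)-\bar{I}_L\bigr)\bigl(I_R(x+i-d,y+j)-\bar{I}_R\bigr)}
     {\sqrt{\displaystyle\sum_{(i,j)\in W}\bigl(I_L(x+i,y+j)-\bar{I}_L\bigr)^2}\;
      \sqrt{\displaystyle\sum_{(i,j)\in W}\bigl(I_R(x+i-d,y+j)-\bar{I}_R\bigr)^2}}
```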

WTA Strategy

ZNCC

ZNCC
What can we do to enhance the model?

◦ Ratio between the first and the second maximum;
  Idea: if the global maximum and a local maximum have similar values, the probability of error increases, and we could reject these points (by putting a threshold).
◦ Spread;
  Idea: a flat peak means repetitive texture. Just discard maxima lying on flat peaks.
◦ Multiple windows;
◦ Kernel shape based on segmentation.

Computation time increases!

ZNCC – Our model
Two enhancements:

1. Check the epipolar line ± size:
  We can deal with noise in the epipolar geometry;
  For a fast computation, keep size small! (1, 2, 3).

2. Inverse check:
  We take the maximum and compute the ZNCC again starting from the right image;
  If the new winner isn't the starting point (or is farther than a threshold), an error occurred, so we discard the point.
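A minimal sketch of this left-right (inverse) consistency check, assuming two dense WTA disparity maps have already been computed (one matching left→right, one right→left); the function name and the threshold value are illustrative:

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Keep only disparities that survive the left-right consistency check.
// dispLR(y, x) is the WTA disparity computed from the left image,
// dispRL(y, x) the one computed from the right image; both CV_32F, in pixels.
cv::Mat leftRightCheck(const cv::Mat& dispLR, const cv::Mat& dispRL,
                       float threshold = 1.0f, float invalid = -1.0f) {
    cv::Mat out = dispLR.clone();
    for (int y = 0; y < dispLR.rows; ++y) {
        for (int x = 0; x < dispLR.cols; ++x) {
            float d = dispLR.at<float>(y, x);
            int xr = x - static_cast<int>(std::lround(d));   // matched column in the right image
            if (d < 0.0f || xr < 0 || xr >= dispRL.cols ||
                std::abs(dispRL.at<float>(y, xr) - d) > threshold) {
                out.at<float>(y, x) = invalid;               // inconsistent: discard the point
            }
            // otherwise the match "comes back" to (x, y): keep it
        }
    }
    return out;
}
```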

Demo 1

Segmentation of Video Sequences

Background subtraction
One of the video sequence segmentation algorithms.
Works well with a fixed camera and a static background.
High-level goals:

◦ People detection.
◦ Surveillance:
  Reactive;
  Proactive.

From the Web

Codebook Model
Background Subtraction (BS): subtract the current frame from the background model.
Two phases:

◦ Background training;
◦ Foreground detection.

Improvements to the base version.

Segmentation – background model
A codebook is built for every pixel;
A codebook is composed of codewords, boxes that grow to cover the common values seen over time;
The samples of each pixel are clustered into a set of codewords;
Incoming pixel:
◦ If it has a brightness within the brightness range AND a color distortion less than a threshold → BACKGROUND;
◦ Otherwise → FOREGROUND.
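A minimal sketch of this per-pixel test. This is not OpenCV's codebook implementation; the codeword fields and the color-distortion formula follow the spirit of the Kim et al. paper cited in the references, and eps/alpha/beta are illustrative thresholds:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One codeword of a pixel's codebook (simplified: an RGB mean plus the
// min/max brightness observed during training).
struct Codeword {
    double R, G, B;       // mean color of the codeword
    double Imin, Imax;    // min/max brightness seen in training
};

// Color distortion between a sample and a codeword.
static double colorDistortion(double r, double g, double b, const Codeword& cw) {
    double x2  = r * r + g * g + b * b;
    double v2  = cw.R * cw.R + cw.G * cw.G + cw.B * cw.B;
    double dot = r * cw.R + g * cw.G + b * cw.B;
    double p2  = (v2 > 0.0) ? (dot * dot) / v2 : 0.0;
    return std::sqrt(std::max(0.0, x2 - p2));
}

// Background test for one incoming pixel against its codebook.
// Illustrative thresholds, e.g. eps = 10, alpha = 0.6, beta = 1.2.
bool isBackground(double r, double g, double b,
                  const std::vector<Codeword>& codebook,
                  double eps, double alpha, double beta) {
    double I = std::sqrt(r * r + g * g + b * b);   // brightness of the sample
    for (const Codeword& cw : codebook) {
        bool brightnessOK = (I >= alpha * cw.Imax) &&
                            (I <= std::min(beta * cw.Imax, cw.Imin / alpha));
        if (brightnessOK && colorDistortion(r, g, b, cw) < eps)
            return true;    // matches a codeword -> background
    }
    return false;           // no codeword matched -> foreground
}
```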

Segmentation – codebook model

Segmentation – MNRL
MNRL (Maximum Negative Run-Length): lets us learn the background even while objects are moving in the scene.

It refines the codebook, separating codewords that may contain foreground from the real background.

Typical threshold: MNRL ≤ 50% of the training period.

Segmentation – detecting foreground

The foreground is simply detected by computing the distance of the sample from the nearest cluster (codeword) mean.

Segmentation – codebook algorithm

Segmentation – codebook improvements
◦ Objects left in the scene;
◦ The holes problem.

Segmentation – codebook improvements
Layered Modeling/Detection – 3 classes of codebooks and 3 parameters that allow switching between the categories:
◦ Permanent;
◦ Non-permanent;
◦ Training.

Adaptive Codebook Updating:
◦ Retraining is not the solution!!
◦ Global status update at each frame;
◦ Periodic cleaning of the old codebook.

Filtering the binary frame
◦ Median filter;
◦ Opening and Closing (Morphological Operators).
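A minimal sketch of these filters applied to the binary foreground mask with OpenCV (kernel and aperture sizes are only indicative):

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/core.hpp>

// Clean up a binary foreground mask (CV_8U, values 0/255).
void cleanMask(const cv::Mat& mask, cv::Mat& cleaned) {
    // 5x5 rectangular structuring element (indicative size).
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));

    cv::Mat opened, closed;
    // Opening: erosion followed by dilation, removes small white noise blobs.
    cv::morphologyEx(mask, opened, cv::MORPH_OPEN, kernel);
    // Closing: dilation followed by erosion, fills small holes inside blobs.
    cv::morphologyEx(opened, closed, cv::MORPH_CLOSE, kernel);
    // Median filter: removes the remaining salt-and-pepper noise.
    cv::medianBlur(closed, cleaned, 5);
}
```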

Object detection
Why object detection?

Not all the white pixels are of real interest (noise, holes not yet updated…).

An object detection and labeling algorithm is required.

Object detection algorithm
A: when an external contour point A is encountered for the first time, a complete trace of the contour is made. The procedure stops when A is found again. All those points receive the same label A;

B: when A' is encountered (an external contour point already labeled), a scan of the entire line is made, marking with the same label all the points encountered;

C: when an internal contour point B is encountered for the first time, it takes the same label. Then this contour is traced, again giving the same label to all the points met;

D: when an already labeled point is found, like B', a scan of the entire line is made, marking the detected points with the same label.

Filters – filter by area and ratio
We scan all the blobs, continuing to process only blobs with an area and an aspect ratio within a range:
◦ [min Area, max Area], in pixels;
◦ [min Ratio, max Ratio].

The ranges are decided from:
◦ Average person height (from 1.60 m to 2 m);
◦ Average width of the bounding box (from 10 cm to 60 cm);
◦ Distance from the camera (from 1 m to 5 m).

A first, necessary loss of generality.
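A sketch combining the labeling and the area/ratio filtering. The slides describe a contour-tracing labeling algorithm; as a shortcut, this sketch uses OpenCV's cv::connectedComponentsWithStats (OpenCV 3+), which yields equivalent blob labels and per-blob statistics. The function name and threshold parameters are illustrative:

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Keep only the blobs whose area and height/width ratio fall in the given ranges.
std::vector<cv::Rect> filterBlobs(const cv::Mat& mask,          // binary CV_8U mask
                                  int minArea, int maxArea,     // in pixels
                                  double minRatio, double maxRatio) {
    cv::Mat labels, stats, centroids;
    int n = cv::connectedComponentsWithStats(mask, labels, stats, centroids, 8, CV_32S);

    std::vector<cv::Rect> kept;
    for (int i = 1; i < n; ++i) {                     // label 0 is the background
        int area   = stats.at<int>(i, cv::CC_STAT_AREA);
        int width  = stats.at<int>(i, cv::CC_STAT_WIDTH);
        int height = stats.at<int>(i, cv::CC_STAT_HEIGHT);
        double ratio = static_cast<double>(height) / width;   // height/width of the box
        if (area >= minArea && area <= maxArea &&
            ratio >= minRatio && ratio <= maxRatio) {
            kept.emplace_back(stats.at<int>(i, cv::CC_STAT_LEFT),
                              stats.at<int>(i, cv::CC_STAT_TOP), width, height);
        }
    }
    return kept;
}
```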

Demo 2

References

Technical report (Camera Calibration):
◦ Z. Zhang, "A Flexible New Technique for Camera Calibration", 1998.

Papers (Stereo Vision):
◦ C.-H. Chen, H.-P. Huang, S.-Y. Lo, "Stereo-Based 3D Localization for Grasping Known Objects with a Robotic Arm System", Department of Mechanical Engineering, National Taiwan University, Taipei, Taiwan.

Papers (Codebook):
◦ K. Kim, T. H. Chalidabhongse, D. Harwood, L. Davis, "Real-Time Foreground-Background Segmentation Using Codebook Model", University of Maryland, College Park, USA / King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand, 2005.
◦ P. Fihl, R. Corlin, S. Park, T. B. Moeslund, M. M. Trivedi, "Tracking of Individuals in Very Long Video Sequences", Aalborg University, Denmark / University of California, San Diego, USA, 2006.

Stereo images and disparities (ground truth):
◦ http://vision.middlebury.edu/stereo

Camera Calibration and 3D Reconstruction with OpenCV:
◦ http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html

Motion Analysis with OpenCV:
◦ http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html (MOG)

Books:
◦ Codebook: "Learning OpenCV" (O'Reilly), Chapter 9.
◦ Stereo Vision: "Learning OpenCV" (O'Reilly), Chapter 12.

Me:
◦ dario.cazzato@ino.it