High Level Computer Vision
Basic Image Processing - April 26, 2017
Bernt Schiele, Mario Fritz
mpi-inf.mpg.de/hlcv
Today - Basics of Digital Image Processing
• Linear Filtering ‣ Gaussian Filtering
• Multi Scale Image Representation ‣ Gaussian Pyramid, Laplacian Pyramid
• Edge Detection ‣ ‘Recognition using Line Drawings’
‣ Image derivatives (1st and 2nd order)
• Hough Transform ‣ Finding parametrized curves, generalized Hough transform
• Object Instance Identification using Color Histograms
• (Several slides are taken from Michael Black @ Brown)
Computer Vision and its Components
• computer vision: ‘reverse’ the imaging process ‣ 2D (2-dimensional) digital image processing ‣ ‘pattern recognition’ / 3D image analysis
‣ image understanding
Information
Image Filtering: 2D Signals and Convolution
• Image Filtering ‣ to reduce noise,
‣ to fill-in missing values/information
‣ to extract image features (e.g. edges/corners), etc.
• Simplest case: ‣ linear filtering: replace each pixel by a linear combination of its neighbors
• 2D convolution (discrete): ‣ discrete Image: I[m,n]
‣ filter ‘kernel’: g[k,l]
‣ ‘filtered’ image: f[m,n]
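$$f[m,n] = (I \otimes g)[m,n] = \sum_{k,l} I[m-k,\, n-l]\; g[k,l]$$
‣ example for a single output pixel:
$$I = \begin{pmatrix} 8 & 5 & 2 \\ 7 & 5 & 3 \\ 9 & 4 & 1 \end{pmatrix}, \quad g = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}, \quad f = I \otimes g = 18$$
‣ can be expressed as matrix multiplication!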
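The convolution sum can be checked numerically. The following is a minimal sketch, not an optimized implementation; the helper name `convolve2d_valid` is made up for this example, and the 3x3 values are the ones from the slide's example.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Discrete 2D convolution, evaluated only where the kernel fits inside the image."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel; correlation would not
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for m in range(out_h):
        for n in range(out_w):
            out[m, n] = np.sum(image[m:m + kh, n:n + kw] * flipped)
    return out

I = [[8, 5, 2],
     [7, 5, 3],
     [9, 4, 1]]
g = [[-1, 0, 1],
     [-1, 0, 1],
     [-1, 0, 1]]
f = convolve2d_valid(I, g)
print(f)  # [[18.]]
```

Without the kernel flip (i.e. correlation) the same numbers would give -18, so the sign of the result shows which convention is in use.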
Linear Systems
• Basic Properties: ‣ homogeneity T[a X] = a T[X]
‣ additivity T[X1 + X2] = T[X1] + T[X2]
‣ superposition T[aX1 + bX2] = a T[X1] + b T[X2]
‣ linear systems <=> superposition
• examples: ‣ matrix operations (additions, multiplication)
‣ convolutions
Filtering to Reduce Noise
• “Noise” is what we’re not interested in ‣ low-level noise: light fluctuations, sensor noise, quantization effects, finite precision, …
‣ complex noise (not today): shadows, extraneous objects.
• Assumption: ‣ the pixel’s neighborhood contains information about its intensity
$$\begin{pmatrix} 2 & 3 & 3 \\ 3 & 20 & 2 \\ 3 & 2 & 3 \end{pmatrix} \;\rightarrow\; \begin{pmatrix} 2 & 3 & 3 \\ 3 & 3 & 2 \\ 3 & 2 & 3 \end{pmatrix}$$
Model: Additive Noise
• Image I = Signal S + Noise N:
S + N = I
Model: Additive Noise
• Image I = Signal S + Noise N ‣ I.e. noise does not depend on the signal
• we consider: ‣ $I_i$: intensity of the i-th pixel
‣ $I_i = s_i + n_i$ with $E(n_i) = 0$
- $s_i$ deterministic
- $n_i, n_j$ independent for $i \neq j$
- $n_i, n_j$ i.i.d. (independent, identically distributed)
• therefore: ‣ intuition: averaging noise reduces its effect
‣ better: smoothing as inference about the signal
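Why averaging helps can be made explicit with a short derivation. This is a sketch under two assumptions that are not on the slide: the noise has a finite variance $\sigma^2$, and the signal is constant ($s_i = s$) over the $n$ pixels being averaged.

```latex
\hat{s} = \frac{1}{n}\sum_{i=1}^{n} I_i = s + \frac{1}{n}\sum_{i=1}^{n} n_i,
\qquad
E[\hat{s}] = s,
\qquad
\operatorname{Var}(\hat{s}) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(n_i) = \frac{\sigma^2}{n}
```

Under these assumptions a 3x3 average ($n = 9$) reduces the noise standard deviation by a factor of 3; where the signal is not constant, the same averaging blurs it.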
Average Filter
• Average Filter ‣ replaces each pixel with an average of its neighborhood
‣ Mask with positive entries that sum to 1
• if all weights are equal, it is called a BOX filter
$$\frac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$
Gaussian Averaging (An Isotropic Gaussian)
• Rotationally symmetric
• Weights nearby pixels more than distant ones ‣ this makes sense as ‘probabilistic’ inference
• the pictures show a smoothing kernel proportional to
$$g(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$
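A discrete smoothing kernel can be obtained by sampling the isotropic Gaussian on an integer grid. The sketch below is one possible way to do it; the function name and the truncation at about 3σ are assumptions made for this example, not something the slides prescribe.

```python
import numpy as np

def gaussian_kernel_2d(sigma, radius=None):
    """Sample g(x, y) on an integer grid and normalise so the weights sum to 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumption: truncate at about 3 sigma
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()  # renormalise: truncation loses a little mass

kernel = gaussian_kernel_2d(sigma=1.0)
print(kernel.shape)  # (7, 7)
```

The renormalisation keeps the mask a proper average (positive entries summing to 1), as required for the average filter.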
Smoothing with a Gaussian
• Effects of smoothing: ‣ each column shows realizations of an image of Gaussian noise
‣ each row shows smoothing with Gaussians of different width
noise increase
smoothing increase
Smoothing with a Gaussian
• Example:
[Figure panels: original image; box-filtered; Gaussian-filtered]
Efficient Implementation
• Both the box filter and the Gaussian filter are separable: ‣ first convolve each row with a 1D filter
‣ then convolve each column with a 1D filter
‣ remember: convolution is linear, associative and commutative
• Example: separable BOX filter
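$$(f_x \otimes f_y) \otimes I = f_x \otimes (f_y \otimes I)$$
$$f_x \otimes f_y = \frac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 \end{pmatrix} \otimes \frac{1}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$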
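Separability can be verified numerically for the box filter. The sketch below compares row-then-column 1D filtering with a direct 2D convolution; the helper names are invented for this example and zero padding at the border is an assumption.

```python
import numpy as np

def convolve_rows(image, k):
    # Convolve every row with the 1D kernel k (zero padding, same length).
    return np.array([np.convolve(row, k, mode="same") for row in image])

def convolve_cols(image, k):
    # Convolve every column by transposing, filtering rows, and transposing back.
    return convolve_rows(image.T, k).T

def convolve2d_same(image, kernel):
    # Direct 2D convolution with zero padding, for comparison only.
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros_like(image, dtype=float)
    for m in range(image.shape[0]):
        for n in range(image.shape[1]):
            out[m, n] = np.sum(padded[m:m + kh, n:n + kw] * flipped)
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 8))
box_1d = np.ones(3) / 3.0
box_2d = np.outer(box_1d, box_1d)        # the 3x3 box filter with entries 1/9

separable = convolve_cols(convolve_rows(image, box_1d), box_1d)
direct = convolve2d_same(image, box_2d)
print(np.allclose(separable, direct))    # True
```

For a k x k kernel the separable version needs about 2k multiplications per pixel instead of k², which is where the efficiency gain comes from.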
Example: Separable Gaussian
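• Gaussian in x-direction
$$g(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$
• Gaussian in y-direction
$$g(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{y^2}{2\sigma^2}\right)$$
• Gaussian in both directions
$$g(x, y) = g(x)\,g(y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$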
Multi-Scale Image Representation
• In this class: ‣ Gaussian Pyramids
‣ Laplacian Pyramids -> later
• Example of a Gaussian Pyramid
Motivation: Search across Scales
Computation of Gaussian Pyramid
Gaussian Pyramid
• Questions of interest: ‣ which information is preserved over ‘scales’
‣ which information is lost over ‘scales’
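The computation slide shows the pyramid only as a figure. A common construction, assumed here, is to smooth each level and then keep every second row and column. The sketch below uses a 5-tap binomial kernel as a stand-in for the Gaussian; both the kernel and the function names are assumptions of this example.

```python
import numpy as np

def smooth(image, k):
    """Separable smoothing: filter rows, then columns, with the 1D kernel k."""
    rows = np.array([np.convolve(r, k, mode="same") for r in image])
    return np.array([np.convolve(c, k, mode="same") for c in rows.T]).T

def gaussian_pyramid(image, levels):
    """Return [level 0, level 1, ...], each level half the size of the previous one."""
    k = np.array([1, 4, 6, 4, 1]) / 16.0   # assumption: binomial approximation of a Gaussian
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        blurred = smooth(pyramid[-1], k)
        pyramid.append(blurred[::2, ::2])  # keep every second row and column
    return pyramid

image = np.random.default_rng(0).random((64, 64))
pyr = gaussian_pyramid(image, levels=4)
print([level.shape for level in pyr])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Smoothing before subsampling removes the fine detail that the coarser grid cannot represent, which is one way to think about what is lost over scales.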
Fourier Transform in Pictures
• a *very* brief look at the Fourier transform, in order to talk about spatial frequencies…
Another Example
• a bar ‣ in the big images is a hair (on the zebra’s nose)
‣ in smaller images, a stripe
‣ in the smallest image, the animal’s nose
Today - Basics of Digital Image Processing
• Linear Filtering ‣ Gaussian Filtering
• Multi Scale Image Representation ‣ Gaussian Pyramid, Laplacian Pyramid
• Edge Detection ‣ ‘Recognition using Line Drawings’
‣ Image derivatives (1st and 2nd order)
• Hough Transform ‣ Finding parametrized curves, generalized Hough transform
• Object Instance Identification using Color Histograms
• (Several slides are taken from Michael Black @ Brown)
Image Edges: What are edges? Where do they come from?
• Edges are changes in pixel brightness ‣ Foreground/Background Boundaries
‣ Object-Object Boundaries
‣ Shadow Edges
‣ Changes in Albedo or Texture
‣ Changes in Surface Normals
Line Drawings: Good Starting Point for Recognition?
Example of Recognition & Localization
• David Lowe ‣ 1. ‘filter’ image to find brightness changes
‣ 2. ‘fit’ lines to the raw measurements
‣ 3. ‘project’ model into the image and ‘match’ to lines (solving for 3D pose)
Class of Models
• Common Idea & Approach (in the 1980s) ‣ matching of models (wire-frame/geons/generalized cylinders...) to edges and lines
• so the ‘only’ remaining problem to solve is: ‣ reliably extract lines & edges that can be matched to these models...
Actual 1D profile
• Barbara Image: ‣ entire image
‣ line 250:
‣ line 250 smoothed with a Gaussian:
What are ‘edges’ (1D)
• Idealized Edge Types:
• Goals of Edge Detection: ‣ good detection: filter responds to edge, not to noise
‣ good localization: detected edge near true edge
‣ single response: one per edge
Edges
• Edges: ‣ correspond to fast changes
‣ where the magnitude of the derivative is large
[Figure: “image” of 2 step-edges; single line of the “image”; the same line after smoothing]
Edges & Derivatives…
1st derivative
2nd derivative
Compute Derivatives
$$\frac{d}{dx}f(x) = \lim_{h \to 0}\frac{f(x+h) - f(x)}{h} \approx f(x+1) - f(x)$$
• we can implement this as a linear filter: ‣ direct: $\begin{pmatrix} -1 & 1 \end{pmatrix}$
‣ or symmetric: $\begin{pmatrix} -1 & 0 & 1 \end{pmatrix}$
Edge-Detection
• based on 1st derivative: ‣ smooth with Gaussian
‣ calculate derivative
‣ find its maxima
[Figure panels: $f$; $g$; $g \otimes f$; $\frac{d}{dx}(g \otimes f)$]
Edge-Detection
• Simplification: ‣ remember: derivative as well as convolution are linear operations
$$\frac{d}{dx}(g \otimes f) = \left(\frac{d}{dx}g\right) \otimes f$$
‣ saves one operation
[Figure panels: $f$; $\frac{d}{dx}g$; $\left(\frac{d}{dx}g\right) \otimes f$]
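The simplification can be checked on a synthetic 1D signal. The sketch below is an illustration under assumed values (σ = 2, a noisy step at index 30, a two-tap difference as the discrete derivative); none of these come from the slides.

```python
import numpy as np

sigma = 2.0
x = np.arange(-8, 9)
g = np.exp(-x**2 / (2 * sigma**2))
g /= g.sum()                                   # sampled, normalised 1D Gaussian
d = np.array([1.0, -1.0])                      # as a convolution kernel: f(x) - f(x-1)

rng = np.random.default_rng(0)
f = np.concatenate([np.zeros(30), np.ones(30)]) + 0.05 * rng.standard_normal(60)

two_steps = np.convolve(np.convolve(f, g), d)  # smooth, then differentiate
dg = np.convolve(g, d)                         # derivative-of-Gaussian kernel, computed once
one_step = np.convolve(f, dg)                  # single convolution with the combined kernel

print(np.allclose(two_steps, one_step))        # True (associativity of convolution)

# Position of the strongest response, corrected for the kernel's half-width.
edge = int(np.argmax(one_step)) - len(g) // 2
print(edge)                                    # close to 30
```

The combined kernel `dg` depends only on σ, so it can be precomputed and reused for every signal, which is the operation that is saved.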
1D Barbara signal: note the amplification of small variations
• Barbara Image: ‣ entire image
‣ line 250 (smoothed):
‣ 1st derivative
thresholding the derivative?
Implementing 1D edge detection
• algorithmically: ‣ find peak in the 1st derivative
‣ but - should be a local maximum
- should be ‘sufficiently’ large
‣ hysteresis: use 2 thresholds - high threshold to start edge curves (maximum value of gradient should be sufficiently large)
- low threshold to continue them (in order to bridge “gaps” with lower magnitude)
- (really only makes sense in 2D...)
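The first two conditions (local maximum, sufficiently large) can be sketched as follows. Hysteresis is left out because, as noted, it only really makes sense in 2D. Taking the absolute value so that falling edges are found as well is an assumption of this example, as are the function name and the sample values.

```python
import numpy as np

def detect_edges_1d(response, threshold):
    """Indices where |response| is a local maximum and at least `threshold`."""
    mag = np.abs(response)
    edges = []
    for i in range(1, len(mag) - 1):
        if mag[i] >= threshold and mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]:
            edges.append(i)
    return edges

# Hypothetical 1st-derivative response with one rising and one falling edge.
response = np.array([0.0, 0.1, 0.3, 0.9, 0.4, 0.1, 0.2, 0.1, -0.2, -0.8, -0.3, 0.0])
print(detect_edges_1d(response, threshold=0.5))  # [3, 9]
```

The small bump at index 6 is a local maximum but stays below the threshold, so it is not reported.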
Extension to 2D Edge Detection: Partial Derivatives
• partial derivatives ‣ in x direction:
$$\frac{\partial}{\partial x}I(x, y) = I_x \approx I \otimes D_x$$
‣ in y direction:
$$\frac{\partial}{\partial y}I(x, y) = I_y \approx I \otimes D_y$$
‣ often approximated with simple filters (finite differences):
$$D_x = \frac{1}{3}\begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}, \quad D_y = \frac{1}{3}\begin{pmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}$$
Finite Differences
Finite Differences responding to noise
• increasing noise level (from left to right) ‣ noise: zero mean additive Gaussian noise
Again: Derivatives and Smoothing
• derivative in x-direction:
$$D_x \otimes (G \otimes I) = (D_x \otimes G) \otimes I$$
‣ in 1D:
‣ in 2D:
What is the gradient?
‣ intensity changes along x only (no change along y):
$$\left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right) = (k, 0)$$
‣ intensity changes along y only (no change along x):
$$\left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right) = (0, k)$$
‣ in general (small change in one direction, large change in the other):
$$\left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right) = (k_x, k_y)$$
• gradient direction is perpendicular to edge
• gradient magnitude measures edge strength
2D Edge Detection
• calculate derivative ‣ use the magnitude of the gradient
‣ the gradient is:
$$\nabla I = (I_x, I_y) = \left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right)$$
‣ the magnitude of the gradient is:
$$\|\nabla I\| = \sqrt{I_x^2 + I_y^2}$$
‣ the direction of the gradient is:
$$\theta = \operatorname{atan2}(I_y, I_x)$$
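Gradient magnitude and direction can be computed per pixel as sketched below. The example uses plain central differences rather than the 3x3 masks, and a synthetic step image; both choices, and the function name, are assumptions made to keep the example short.

```python
import numpy as np

def gradient(image):
    """Central-difference estimates of I_x and I_y (border pixels are left at 0)."""
    I = np.asarray(image, dtype=float)
    Ix = np.zeros_like(I)
    Iy = np.zeros_like(I)
    Ix[:, 1:-1] = (I[:, 2:] - I[:, :-2]) / 2.0   # change along x (columns)
    Iy[1:-1, :] = (I[2:, :] - I[:-2, :]) / 2.0   # change along y (rows)
    return Ix, Iy

image = np.zeros((5, 6))
image[:, 3:] = 1.0                     # vertical step edge: dark left, bright right
Ix, Iy = gradient(image)
magnitude = np.hypot(Ix, Iy)           # sqrt(Ix**2 + Iy**2)
theta = np.arctan2(Iy, Ix)             # two-argument arctangent, in radians

print(magnitude[2].tolist())           # [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
print(theta[2, 2])                     # 0.0
```

The direction 0 points along +x, i.e. across the vertical edge, in line with the gradient being perpendicular to the edge.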
2D Edge Detection
• the scale of the smoothing filter affects derivative estimates, and also the semantics of the edges recovered ‣ note: strong edges persist across scales
1 pixel 3 pixels 7 pixels
2D Edge Detection
• there are 3 major issues: ‣ the gradient magnitude at different scales is different; which should we choose?
‣ the gradient magnitude is large along a thick trail; how do we identify the significant points?
‣ how do we link the relevant points up into curves?
‘Optimal’ Edge Detection: Canny
• Assume: ‣ linear filtering
‣ additive i.i.d. Gaussian noise
• Edge Detection should have: ‣ good detection: filter responds to edge, not noise
‣ good localization: detected edge near true edge
‣ single response: one per edge
• then: optimal detector is approximately derivative of Gaussian
• detection/localization tradeoff: ‣ more smoothing improves detection
‣ and hurts localization
The Canny edge detector
[Figure panels: original image (Lena); norm (= magnitude) of the gradient; thresholding; thinning (non-maximum suppression)]
Non-maximum suppression
• Check if pixel is local maximum along gradient direction ‣ choose the largest gradient magnitude along the gradient direction
‣ requires checking interpolated pixels p and r
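A simplified version of this check is sketched below. Instead of interpolating the values p and r, it quantises the gradient direction to the nearest of four directions and compares against the two neighbouring pixels; this is a common shortcut, named here as a substitution, and the function name is made up for the example.

```python
import numpy as np

def non_maximum_suppression(magnitude, theta):
    """Keep a pixel only if it is not smaller than its two neighbours along the gradient direction."""
    H, W = magnitude.shape
    out = np.zeros_like(magnitude)
    # Quantise the direction to 0, 45, 90 or 135 degrees (modulo 180).
    angle = (np.rad2deg(theta) + 180.0) % 180.0
    sector = np.round(angle / 45.0).astype(int) % 4
    # (row, col) offset of the neighbour in the gradient direction, per sector.
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            dr, dc = offsets[sector[r, c]]
            p = magnitude[r + dr, c + dc]
            q = magnitude[r - dr, c - dc]
            if magnitude[r, c] >= p and magnitude[r, c] >= q:
                out[r, c] = magnitude[r, c]
    return out

magnitude = np.array([[0, 1, 3, 1, 0]] * 5, dtype=float)   # thick vertical ridge
theta = np.zeros_like(magnitude)                            # gradient along x everywhere
thin = non_maximum_suppression(magnitude, theta)
print(thin[2].tolist())  # [0.0, 0.0, 3.0, 0.0, 0.0]
```

The three-pixel-wide response is thinned to its single strongest column, which addresses the "thick trail" issue.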
Butterfly Example (Ponce & Forsyth)
line drawing vs. edge detection
Edges & Derivatives…
• recall: ‣ the zero-crossings of the second derivative tell us the location of edges
1st derivative
2nd derivative
Compute 2nd order derivatives
• 1st derivative:
$$\frac{d}{dx}f(x) = \lim_{h \to 0}\frac{f(x+h) - f(x)}{h} \approx f(x+1) - f(x)$$
• 2nd derivative:
$$\frac{d^2}{dx^2}f(x) = \lim_{h \to 0}\frac{\frac{d}{dx}f(x+h) - \frac{d}{dx}f(x)}{h} \approx \frac{d}{dx}f(x+1) - \frac{d}{dx}f(x) \approx f(x+2) - 2f(x+1) + f(x)$$
• mask for ‣ 1st derivative: $\begin{pmatrix} -1 & 1 \end{pmatrix}$ ‣ 2nd derivative: $\begin{pmatrix} 1 & -2 & 1 \end{pmatrix}$
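The last approximation step follows by applying the first-order finite difference twice. Written out, as a worked expansion rather than anything beyond the slide's own formulas:

```latex
\frac{d^2}{dx^2}f(x)
\approx \frac{d}{dx}f(x+1) - \frac{d}{dx}f(x)
\approx \bigl[f(x+2) - f(x+1)\bigr] - \bigl[f(x+1) - f(x)\bigr]
= f(x+2) - 2f(x+1) + f(x)
```

Equivalently, the 2nd-derivative mask is the 1st-derivative mask convolved with itself: $(-1 \;\; 1) \otimes (-1 \;\; 1) = (1 \;\; -2 \;\; 1)$.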
The Laplacian
• The Laplacian:
$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$
‣ just another linear filter:
$$\nabla^2 (G \otimes f) = (\nabla^2 G) \otimes f$$
Second Derivative of Gaussian
• in 1D:
• in 2D (‘Mexican hat’):
1D edge detection
• using Laplacian
[Figure panels: $f$; Laplacian of Gaussian operator $\frac{d^2}{dx^2}g$; $\left(\frac{d^2}{dx^2}g\right) \otimes f$]
Approximating the Laplacian
• Difference of Gaussians (DoG) at different scales:
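Why a difference of two Gaussians approximates the Laplacian can be seen from how the Gaussian changes with its scale. The relation below holds for the 2D Gaussian $G_\sigma$ defined earlier; the scale ratio $k$ (close to 1) is a symbol introduced for this sketch.

```latex
\frac{\partial G_\sigma}{\partial \sigma} = \sigma\,\nabla^2 G_\sigma
\quad\Longrightarrow\quad
G_{k\sigma} - G_{\sigma} \approx (k-1)\,\sigma\,\frac{\partial G_\sigma}{\partial \sigma} = (k-1)\,\sigma^2\,\nabla^2 G_\sigma
```

So, up to a constant factor, subtracting two smoothed versions of an image at nearby scales gives a Laplacian-of-Gaussian response.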
The Laplacian Pyramid
Edge Detection with Laplacian
• sigma = 4 • sigma = 2
Edge Detection Today
• Still a topic of active research after 40 years
• Today dominated by learning-based methods
• Quantitative evaluation, e.g. on the Berkeley Segmentation Data Set ‣ 500 images
‣ 5 annotations per image
[Figure: precision-recall curves with iso-F contours (x-axis: recall, y-axis: precision). Legend: Human, F = 0.80; SE, F = 0.74; SCG (Ren, Bo 2012), F = 0.74; Sketch Tokens (Lim, Zitnick, Dollar 2013), F = 0.73; gPb-owt-ucm (Arbelaez et al. 2010), F = 0.73; Mean Shift (Comaniciu, Meer 2002), F = 0.64; Normalized Cuts (Cour, Benezit, Shi 2005), F = 0.64; Felzenszwalb, Huttenlocher (2004), F = 0.61; Canny, F = 0.60]
Figure 5. Results for BSDS 500. See Table 1 and text for details.
• References
‣ P. Arbelaez, M. Maire, C. Fowlkes and J. Malik: Contour Detection and Hierarchical Image Segmentation; IEEE TPAMI, 2011
‣ P. Dollar, C. Lawrence Zitnick: Fast Edge Detection using Structured Forests; International Conference on Computer Vision 2013; to appear in IEEE TPAMI 2015
Today - Basics of Digital Image Processing
• Linear Filtering ‣ Gaussian Filtering
• Multi Scale Image Representation ‣ Gaussian Pyramid, Laplacian Pyramid
• Edge Detection ‣ ‘Recognition using Line Drawings’
‣ Image derivatives (1st and 2nd order)
• Hough Transform ‣ Finding parametrized curves, generalized Hough transform
• Object Instance Identification using Color Histograms
• (Several slides are taken from Michael Black @ Brown)
Discussion
• edge detection + contour extraction ‣ edges are defined as discontinuities in the image
‣ we can assemble them to obtain corresponding object contours
‣ but contours do not necessarily correspond to object boundaries
• problem: ‣ basically no knowledge about what object contours look like is used
‣ obviously humans use such knowledge to segment objects
‣ in principle: if we knew which object is in the image, it would be much simpler to segment the object
Hough Transformation
• detection of straight lines ‣ use the ‘knowledge’ that many contours belong to straight lines
• representation of a line: y = a x + b ‣ 2 parameters: a and b - determine all points of a line
‣ this corresponds to a transformation: (a,b) -> (x,y) - y = a x + b
‣ inverse interpretation: transformation of (x,y) -> (a,b) - b = (-x)a + y
‣ usage: points for which the magnitude of the first derivative is large potentially lie on a line
Hough Transformation
• for a particular point (x,y) determine all lines which go through this point: ‣ the parameters of all those lines are given by: b = (-x)a + y
‣ i.e. those lines are given by a line in the parameter space (a,b)
Hough Transformation
• implementation: ‣ the parameter space (a,b) has to be discretized
‣ for each candidate (x,y) for a line, store the line b = (-x)a + y
‣ in principle each candidate (x,y) votes for the discretized parameters
‣ the maxima in the parameter space (a,b) correspond to lines in the image
• problem of this particular parameterization ‣ the parameter ‘a’ can become infinite (for vertical lines)
‣ problematic for the discretization
Hough Transformation
• choose another parameterization:
$$x\cos(\theta) + y\sin(\theta) = \rho$$
‣ for this parameterization the domain is limited: - $\rho$ is limited by the size of the image - and $\theta \in [0, 2\pi]$
[Figure: a line in the (x, y) plane with its distance $\rho$ from the origin and the angle $\theta$ of its normal]
Examples
• Hough transform for a square (left) and a circle (right)
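The voting scheme with the angle/distance parameterization x cos(θ) + y sin(θ) = ρ can be sketched as below. The accumulator resolution (1 degree, 151 distance bins), the function name and the synthetic points are assumptions of this example; a real detector would feed in pixels with large gradient magnitude.

```python
import numpy as np

def hough_lines(points, rho_max, n_theta=360, n_rho=151):
    """Vote in a discretised (theta, rho) accumulator for x*cos(theta) + y*sin(theta) = rho."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)   # curve traced by this point
        valid = (rho >= 0) & (rho <= rho_max)           # keep rho inside the accumulator
        rho_idx = np.round(rho[valid] / rho_max * (n_rho - 1)).astype(int)
        acc[np.nonzero(valid)[0], rho_idx] += 1         # one vote per valid theta
    return acc, thetas

# Edge candidates on the horizontal line y = 5 (theta = 90 degrees, rho = 5).
points = [(x, 5) for x in range(0, 100, 5)]
acc, thetas = hough_lines(points, rho_max=150.0)
t_idx, r_idx = np.unravel_index(np.argmax(acc), acc.shape)
print(int(t_idx), int(r_idx), int(acc.max()))   # 90 5 20
```

All 20 points vote for the same accumulator cell, so the maximum in parameter space identifies the line; vertical lines pose no problem because θ and ρ stay bounded.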
Examples
Hough Transform
• the same idea can be used for other parameterized contours ‣ Example:
- circle: $(x-a)^2 + (y-b)^2 = r^2$ - 3 parameters: center point (a, b) and radius r
• Limitation: ‣ the parameter space should not become too large
‣ not all contours can be parameterized
Generalized Hough Transform
• Generalization for an arbitrary contour ‣ choose reference point for the contour (e.g. center)
‣ for each point on the contour remember where it is located w.r.t. the reference point
‣ e.g. if the center is the reference point: remember radius r and angle relative to the tangent of the contour
‣ recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points
Today - Basics of Digital Image Processing
• Linear Filtering ‣ Gaussian Filtering
• Multi Scale Image Representation ‣ Gaussian Pyramid, Laplacian Pyramid
• Edge Detection ‣ ‘Recognition using Line Drawings’
‣ Image derivatives (1st and 2nd order)
• Hough Transform ‣ Finding parametrized curves, generalized Hough transform
• Object Instance Identification using Color Histograms
• (Several slides are taken from Michael Black @ Brown)
Object Recognition (reminder)
• Different Types of Recognition Problems: ‣ Object Identification
- recognize your apple, your cup, your dog
- sometimes called: “instance recognition”
‣ Object Classification - recognize any apple, any cup, any dog
- also called: generic object recognition, object categorization, …
- typical definition: ‘basic level category’
Object Identification
• Example Database for Object Identification: ‣ COIL-100 - Columbia Object Image Library
‣ contains 100 different objects, some from the same object class (e.g. cars, cups)
Challenges = Modes of Variation
• Viewpoint changes ‣ Translation ‣ Image-plane rotation ‣ Scale changes ‣ Out-of-plane rotation
• Illumination • Clutter • Occlusion • Noise
[Figure: 3D object (rotations rx, ry) and its 2D image]
Appearance-Based Identification / Recognition
• Basic assumption ‣ Objects can be represented by a collection of images (“appearances”).
‣ For recognition, it is sufficient to just compare the 2D appearances.
‣ No 3D model is needed.
⇒ Fundamental paradigm shift in the 90’s
[Figure: 3D object (rotations rx, ry)]
Global Representation
• Idea ‣ Represent each view (of an object) by a global descriptor.
‣ For recognizing objects, just match the (global) descriptors.
‣ Modes of variation can be taken care of by:
- building them into the descriptor
• e.g. a descriptor can be made invariant to image-plane rotations, translation
- incorporating them in the training data or the recognition process
• e.g. viewpoint changes, scale changes, out-of-plane rotation
- robustness of the descriptor or recognition process (descriptor matching)
• e.g. illumination, noise, clutter, partial occlusion
Case Study: Use Color for Recognition
• Color: ‣ Color stays constant under geometric transformations ‣ Local feature
- Color is defined for each pixel
- Robust to partial occlusion
• Idea ‣ Directly use object colors for identification / recognition ‣ Better: use statistics of object colors
Color Histograms
• Color statistics ‣ Given: tri-stimulus R,G,B for each pixel ‣ Compute 3D histogram
- H(R,G,B) = #(pixels with color (R,G,B))
[Swain & Ballard, 1991]
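Counting pixels per color cell can be sketched with NumPy's multi-dimensional histogram. The choice of 8 bins per channel, the 0..255 value range and the function name are assumptions of this example; the slide defines H(R,G,B) per exact color.

```python
import numpy as np

def color_histogram(image, bins=8):
    """3D RGB histogram with `bins` cells per channel; image is (H, W, 3) with values 0..255."""
    pixels = np.asarray(image, dtype=float).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=((0, 256),) * 3)
    return hist

image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:2] = (255, 0, 0)          # top half pure red
image[2:] = (0, 0, 255)          # bottom half pure blue
H = color_histogram(image)
print(H.shape, H.sum())          # (8, 8, 8) 16.0
print(H[7, 0, 0], H[0, 0, 7])    # 8.0 8.0
```

Because only counts are kept, shuffling the pixel positions leaves the histogram unchanged, which is why the representation tolerates translation and image-plane rotation.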
Color Histograms
[Swain & Ballard, 1991]
• Robust representation ‣ presence of occlusion, rotation
Color
• One component of the 3D color space is intensity ‣ If a color vector is multiplied by a scalar, the intensity changes, but not the color itself.
‣ This means colors can be normalized by the intensity. - Intensity is given by: I = R + G + B
‣ “Chromatic representation”:
$$r = \frac{R}{R + G + B}, \quad g = \frac{G}{R + G + B}, \quad b = \frac{B}{R + G + B}$$
Color
• Observation: ‣ Since r + g + b = 1, only 2 parameters are necessary ‣ E.g. one can use r and g ‣ and obtains b = 1 - r - g
$$r + g + b = 1 \;\Rightarrow\; b = 1 - r - g$$
[Figure: the plane R + G + B = 1 in RGB space, with axes R, G, B]
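The intensity normalization can be sketched as follows. The function name and the small `eps` guard for black pixels (where R + G + B = 0) are assumptions added for this example.

```python
import numpy as np

def rg_chromaticity(image, eps=1e-12):
    """Map (H, W, 3) RGB values to (H, W, 2) chromaticities (r, g); b = 1 - r - g is implied."""
    rgb = np.asarray(image, dtype=float)
    intensity = rgb.sum(axis=-1, keepdims=True)      # I = R + G + B
    rgb_norm = rgb / np.maximum(intensity, eps)      # eps guards black pixels (assumption)
    return rgb_norm[..., :2]

pixel = np.array([[[50, 100, 50]]])
brighter = 2 * pixel                                 # same color, doubled intensity
print(rg_chromaticity(pixel)[0, 0].tolist())         # [0.25, 0.5]
print(np.allclose(rg_chromaticity(pixel), rg_chromaticity(brighter)))  # True
```

Scaling the color vector leaves (r, g) unchanged, so a histogram over (r, g) is insensitive to changes in illumination intensity.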
Recognition using Histograms
• Histogram comparison ‣ Database of known objects ‣ Test image of unknown object
test image
known objects
Recognition using Histograms
• Database with multiple training views per object
test image
Histogram Comparison
• Comparison measures ‣ Intersection
• Motivation ‣ Measures the common part of both histograms ‣ Range: [0,1] ‣ For unnormalized histograms, use the following formula
Histogram Comparison
• Comparison Measures ‣ Euclidean Distance
• Motivation ‣ Focuses on the differences between the histograms ‣ Range: [0,∞] ‣ All cells are weighted equally. ‣ Not very discriminant
Histogram Comparison
• Comparison Measures ‣ Chi-square
• Motivation ‣ Statistical background:
- Test if two distributions are different
- Possible to compute a significance score
‣ Range: [0,∞] ‣ Cells are not weighted equally!
- therefore more discriminant
- may have problems with outliers (therefore assume that each cell contains at least a minimum of samples)
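The formulas for the three measures did not survive in the slide text. The sketch below uses commonly used definitions, stated here as assumptions: intersection as the sum of bin-wise minima (for histograms normalized to sum 1), Euclidean distance as the sum of squared bin differences (without the square root), and chi-square as squared differences divided by the bin sums.

```python
import numpy as np

def intersection(q, v):
    """Histogram intersection for normalised histograms (1 = identical)."""
    return np.minimum(q, v).sum()

def euclidean(q, v):
    """Sum of squared bin differences (0 = identical)."""
    return ((q - v) ** 2).sum()

def chi_square(q, v, eps=1e-12):
    """Chi-square distance (0 = identical); eps avoids division by zero for empty bins."""
    return ((q - v) ** 2 / (q + v + eps)).sum()

q = np.array([0.5, 0.5, 0.0, 0.0])
v = np.array([0.5, 0.0, 0.5, 0.0])
print(intersection(q, v))   # 0.5
print(euclidean(q, v))      # 0.5
print(chi_square(q, v))     # approximately 1.0
```

Note the different orientation: intersection is a similarity (larger is better), while the other two are distances (smaller is better). The division in chi-square gives sparsely filled cells more weight, which is also why nearly empty cells can act as outliers.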
Histogram Comparison
• Which measure is best? ‣ Depends on the application…
‣ Both Intersection and χ² give good performance. - Intersection is a bit more robust.
- χ² is a bit more discriminative.
- Euclidean distance is not robust enough.
‣ There exist many other measures - e.g. statistical tests: Kolmogorov-Smirnov
- e.g. information theoretic: Kullback-Leibler divergence, Jeffrey divergence, ...
Recognition using Histograms
• Simple algorithm
1. Build a set of histograms H = {M1, M2, M3, ...} for each known object
- more exactly, for each view of each object
2. Build a histogram T for the test image.
3. Compare T to each Mk ∈ H
- using a suitable comparison measure
4. Select the object with the best matching score
- or reject the test image if no object is similar enough.
“Nearest-Neighbor” strategy
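The four steps can be sketched as a nearest-neighbor search over model histograms. The sketch assumes normalized histograms, uses histogram intersection as the comparison measure, and introduces a hypothetical rejection threshold `min_score`; the labels and histogram values are made-up toy data.

```python
import numpy as np

def recognize(test_hist, models, min_score=0.5):
    """Return the label of the best-matching model view, or None if nothing is similar enough."""
    best_label, best_score = None, -np.inf
    for label, model_hist in models:                      # one entry per view of each object
        score = np.minimum(test_hist, model_hist).sum()   # histogram intersection
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None

models = [
    ("cup",   np.array([0.7, 0.2, 0.1])),
    ("cup",   np.array([0.6, 0.3, 0.1])),   # second view of the same object
    ("apple", np.array([0.1, 0.1, 0.8])),
]
print(recognize(np.array([0.65, 0.25, 0.10]), models))   # cup
print(recognize(np.array([0.0, 1.0, 0.0]), models))      # None
```

With a distance such as χ² instead of intersection, the comparison would flip: pick the smallest value and reject when it exceeds a maximum.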
Color Histograms
• Recognition (here object identification) ‣ Works surprisingly well
‣ In the first paper (1991), 66 objects could be recognized almost without errors
[Swain & Ballard, 1991]
Discussion: Color Histograms
• Advantages ‣ Invariant to object translations
‣ Invariant to image rotations
‣ Slowly changing for out-of-plane rotations
‣ No perfect segmentation necessary
‣ Histograms change gradually when part of the object is occluded
‣ Possible to recognize deformable objects - e.g. pullover
• Problems ‣ The pixel colors change with the illumination (“color constancy problem”)
- Intensity
- Spectral composition (illumination color)
‣ Not all objects can be identified by their color distribution.