CSCE 643 Computer Vision: Extraction of Image Features
Jinxiang Chai
Good Image Features
• What are we looking for?
– Strong features
– Invariant to changes (affine and perspective, occlusion, illumination, etc.)
Feature Extraction
Why do we need to detect features?
- Features correspond to important points in both the world and image spaces
- Object detection/recognition
- Solve the problem of correspondence
• Locate an object in multiple images (i.e. in video)
• Track the path of the object, infer 3D structures, object and camera movement
Outline
Image Features
- Corner detection
- SIFT extraction
What are Corners?
Point features
Where two edges come together
Where the image gradient has significant components in the x and y direction
We will establish corners from the gradient rather than the edge images
Basic Ideas
What are gradients along x and y directions?
How to measure corners based on the gradient images?
- two major axes in the local window!
How to Find Two Major Axes?
• Principal component analysis (PCA)
The relative lengths of the two major axes depend on the ratio of the eigenvalues (λ1/λ2).
Corner Detection Algorithm
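Example 5x5 image patch I(x,y):
61 60 53 19 18
58 55 53 15 13
55 55 50 13 13
10 10 10 11 11
10 12 12 11 10
Gradient images: $I_x = \partial I(x,y) / \partial x$, $I_y = \partial I(x,y) / \partial y$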
1. Compute the image gradients
2. Define a neighborhood size as an area of interest around each pixel, e.g. a 3x3 neighborhood
3. For each image pixel (i,j), construct the following matrix from it and its neighborhood values
Corner Detection Algorithm (cont’d)
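For each pixel (i,j), sum over its neighborhood window:
$$C(i,j) = \sum \begin{bmatrix} I_x \\ I_y \end{bmatrix} \begin{bmatrix} I_x \\ I_y \end{bmatrix}^T = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}$$
[Slide example: C evaluated at pixel (3,3) of the example image patch]
Similar to the covariance matrix of $(I_x, I_y)^T$!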
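Steps 1 to 3 can be sketched in NumPy as follows. This is a minimal illustration, not the slide's own computation: the gradient operator (`np.gradient`, central differences) and the helper name `structure_matrix` are assumptions, and the patch values are the ones read from the slide. Index (2, 2) is the 0-based centre pixel, which corresponds to pixel (3,3) when counting from 1.

```python
import numpy as np

# Example 5x5 patch (values as read from the slide).
I = np.array([
    [61, 60, 53, 19, 18],
    [58, 55, 53, 15, 13],
    [55, 55, 50, 13, 13],
    [10, 10, 10, 11, 11],
    [10, 12, 12, 11, 10],
], dtype=float)

# Step 1: image gradients. np.gradient returns the derivative along
# rows (y) first, then along columns (x). This operator is an assumption.
Iy, Ix = np.gradient(I)

def structure_matrix(Ix, Iy, i, j, half=1):
    """Sum the outer products [Ix, Iy]^T [Ix, Iy] over a (2*half+1)^2 window."""
    wx = Ix[i - half:i + half + 1, j - half:j + half + 1]
    wy = Iy[i - half:i + half + 1, j - half:j + half + 1]
    return np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                     [np.sum(wx * wy), np.sum(wy * wy)]])

# Steps 2 and 3: 3x3 neighborhood around the centre pixel.
C = structure_matrix(Ix, Iy, 2, 2)
lam = np.linalg.eigvalsh(C)   # eigenvalues of the symmetric matrix, ascending
print(C)
print(lam)
```

Because C is a sum of outer products, it is symmetric and its eigenvalues are non-negative, which is what makes the eigenvalue test on the following slides meaningful.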
Corner Detection Algorithm (cont’d)
4. For each matrix C(i,j), determine the 2 eigenvalues λ(i,j) = [λ1, λ2].
Simple case: C is diagonal, C = diag(λ1, λ2)
- This means the dominant gradient directions align with the x or y axis.
- If either λ1 or λ2 is close to zero, then this is not a corner.
Simple case, by region type:
- Interior region: λ1 = λ2 = 0
- Edge: large λ1 and small λ2
- Corner: large λ1 and large λ2
- Isolated pixels: small λ1 and small λ2
General case: C = R^{-1} diag(λ1, λ2) R, where R is a 2D rotation
- This is just a rotated version of the simple (diagonal) case
- If either λ1 or λ2 is close to zero, then this is not a corner.
- The eigenvalues are invariant to 2D rotation
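The rotation invariance of the eigenvalues can be checked numerically. The sketch below uses made-up values (9 and 1) for the diagonal matrix and an arbitrary 30 degree rotation; both are illustrative assumptions.

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix for an angle given in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Simple case: gradients aligned with the axes (hypothetical eigenvalues).
C_simple = np.diag([9.0, 1.0])

# General case: the same structure in a rotated frame.
# For a rotation matrix, the inverse is the transpose.
R = rotation(np.deg2rad(30.0))
C_general = R.T @ C_simple @ R

print(np.linalg.eigvalsh(C_simple))
print(np.linalg.eigvalsh(C_general))
```

Both calls return the same pair of eigenvalues up to floating-point error, so a test based on λ1 and λ2 does not depend on how the corner is oriented in the image.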
Eigenvalues and Corners
- λ1 is large, λ2 is large: corner
- λ1 is large, λ2 is small: edge
- λ1 is small, λ2 is small: no corner
Corner Detection Algorithm (cont’d)
5. If both λ1 and λ2 are big, we have a corner (Harris also checks that the ratio of the λs is not too high)
ISSUE: The corners obtained will be a function of the threshold!
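Putting steps 1 to 5 together, one possible sketch is to threshold the smaller eigenvalue, since "both λ1 and λ2 are big" is the same as "the smaller one is big". The function name `min_eigenvalue_map`, the synthetic test image, and the threshold rule (half of the maximum response) are illustrative assumptions, not values from the slides.

```python
import numpy as np

def min_eigenvalue_map(image, half=1):
    """Smaller eigenvalue of the 2x2 gradient matrix C at every interior pixel."""
    Iy, Ix = np.gradient(image.astype(float))
    h, w = image.shape
    lam_min = np.zeros((h, w))
    for i in range(half, h - half):
        for j in range(half, w - half):
            wx = Ix[i - half:i + half + 1, j - half:j + half + 1]
            wy = Iy[i - half:i + half + 1, j - half:j + half + 1]
            a, b, c = np.sum(wx * wx), np.sum(wx * wy), np.sum(wy * wy)
            # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
            root = np.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
            lam_min[i, j] = (a + c) / 2.0 - root
    return lam_min

# Synthetic test image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 100.0

lam_min = min_eigenvalue_map(img)
threshold = 0.5 * lam_min.max()            # hypothetical threshold choice
corners = np.argwhere(lam_min > threshold)
print(corners)
```

Pixels along the straight sides of the square have one eigenvalue equal to zero and are rejected, while pixels next to the four corners of the square pass. Changing `threshold` changes how many pixels around each corner are reported, which is the threshold dependence noted above.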
Image Gradients
Closeup of image orientation at each pixel
The Orientation Field
Corners are detected where both λ1 and λ2 are big
Corner Detection Sample Results
[Figure: detected corners for Threshold=25,000, Threshold=10,000, and Threshold=5,000]
Outline
Image Features
- Corner detection
- SIFT extraction
Scale Invariant Feature Transform (SIFT)
• Choosing features that are invariant to image scaling and rotation
• Also, partially invariant to changes in illumination and 3D camera viewpoint
Motivation for SIFT
• Earlier methods
– Harris corner detector
  • Sensitive to changes in image scale
  • Finds locations in image with large gradients in two directions
– No method was fully affine invariant
  • Although the SIFT approach is not fully invariant, it allows for considerable affine change
  • SIFT also allows for changes in 3D viewpoint
Invariance
• Illumination
• Scale
• Rotation
• Affine
Readings
• David G. Lowe, "Object recognition from local scale-invariant features," ICCV 1999
• David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
SIFT Algorithm Overview
1. Scale-space extrema detection
2. Keypoint localization
3. Orientation Assignment
4. Generation of keypoint descriptors.
Scale Space
• Different scales are appropriate for describing different objects in the image, and we may not know the correct scale/size ahead of time.
Scale space (Cont.)
• Looking for features (locations) that are stable (invariant) across all possible scale changes
– use a continuous function of scale (scale space)
• Which scale-space kernel will we use?
– The Gaussian function
Scale-Space of Image
• $L(x, y, k\sigma) = G(x, y, k\sigma) * I(x, y)$
- $G(x, y, k\sigma)$: variable-scale Gaussian
- $I(x, y)$: input image
• To detect stable keypoint locations, find the scale-space extrema in the difference-of-Gaussian function
$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y)$
$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)$
Look familiar?
- bandpass filter!
Difference of Gaussian
1. A = Convolve image with vertical and horizontal 1D Gaussians, σ=sqrt(2)
2. B = Convolve A with vertical and horizontal 1D Gaussians, σ=sqrt(2)
3. DOG (Difference of Gaussian) = A – B
4. So how to deal with different scales?
- Downsample B with bilinear interpolation with pixel spacing of 1.5 (each new pixel is a linear combination of 4 adjacent pixels)
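Steps 1 to 3 of the difference-of-Gaussian computation can be sketched with separable 1D convolutions. The kernel truncation at about 3σ, the edge padding, the helper names, and the random stand-in image are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalised 1D Gaussian, truncated at about 3 sigma (an assumed cutoff)."""
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_separable(image, sigma):
    """Convolve with a 1D Gaussian along rows, then along columns."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(image.astype(float), r, mode="edge")
    # Horizontal pass over every row, then vertical pass over every column.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

sigma = np.sqrt(2.0)
rng = np.random.default_rng(0)
image = rng.random((32, 32))       # stand-in for a real grayscale image

A = blur_separable(image, sigma)   # step 1
B = blur_separable(A, sigma)       # step 2: blur A again
DOG = A - B                        # step 3
print(DOG.shape)
```

Since Gaussian variances add under repeated convolution, B corresponds to blurring the input with an effective σ of 2, so A - B keeps the band of spatial frequencies between the two blur levels.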
Difference of Gaussian Pyramid
[Figure: Input Image → Blur → A1 → Blur → B1; Downsample B1 → A2 → Blur → B2; Downsample B2 → A3 → Blur → B3. DOG1 = A1-B1, DOG2 = A2-B2, DOG3 = A3-B3]
Other issues
• Initial smoothing ignores highest spatial frequencies of images
- expand the input image by a factor of 2, using bilinear interpolation, prior to building the pyramid
• How to do downsampling with bilinear interpolation?
Bilinear Filter
Weighted sum of four neighboring pixels
[Figure: sample point (x,y) inside the cell formed by pixels (i,j), (i,j+1), (i+1,j), (i+1,j+1); axes labeled x, y and u, v]
Sampling at S(x,y), where a and b are the fractional offsets of the sample point from pixel (i,j) along the j and i directions (0 ≤ a, b ≤ 1):
S(x,y) = (1-a)*(1-b)*S(i,j) + (1-a)*b*S(i+1,j)
       + a*(1-b)*S(i,j+1) + a*b*S(i+1,j+1)
To optimize the above, do the following:
Si = S(i,j) + a*(S(i,j+1)-S(i,j))
Sj = S(i+1,j) + a*(S(i+1,j+1)-S(i+1,j))
S(x,y) = Si + b*(Sj-Si)
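The optimized three-step form translates directly into code. In this sketch, a is taken as the fractional offset along columns (j) and b as the fractional offset along rows (i); the function names, the clamping at the image border, and the output size rule in `downsample` are assumptions for illustration.

```python
def bilinear_sample(S, y, x):
    """Sample the 2D grid S (list of rows) at real-valued row y and column x."""
    i, j = int(y), int(x)            # top-left pixel (i, j) of the cell
    b, a = y - i, x - j              # fractional offsets along i and along j
    # Clamp so that (i+1, j+1) stays inside the grid at the last row/column.
    i1 = min(i + 1, len(S) - 1)
    j1 = min(j + 1, len(S[0]) - 1)
    Si = S[i][j] + a * (S[i][j1] - S[i][j])       # interpolate along row i
    Sj = S[i1][j] + a * (S[i1][j1] - S[i1][j])    # interpolate along row i+1
    return Si + b * (Sj - Si)                     # interpolate between the rows

def downsample(S, spacing=1.5):
    """Resample S on a grid whose pixel spacing is `spacing` input pixels."""
    h = int((len(S) - 1) / spacing) + 1
    w = int((len(S[0]) - 1) / spacing) + 1
    return [[bilinear_sample(S, r * spacing, c * spacing) for c in range(w)]
            for r in range(h)]

grid = [[0.0, 10.0, 20.0, 30.0],
        [10.0, 20.0, 30.0, 40.0],
        [20.0, 30.0, 40.0, 50.0],
        [30.0, 40.0, 50.0, 60.0]]
print(bilinear_sample(grid, 0.5, 0.5))   # midpoint of the first cell
print(downsample(grid))                  # 4x4 grid resampled at spacing 1.5
```

The optimized form needs three multiplications per sample instead of the eight in the expanded weighted sum, which matters when every pixel of every pyramid level is resampled.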
Pyramid Example
[Figure: example pyramid images A1, B1, DOG1; A2, B2; A3, B3, DOG3]
Feature Detection
• Find maxima and minima of scale space
• For each point on a DOG level:
– Compare to 8 neighbors at same level
– If max/min, identify corresponding point at pyramid level below
– Determine if the corresponding point is max/min of its 8 neighbors
– If so, repeat at pyramid level above
• Repeat for each DOG level
• Those that remain are key points
Identifying Max/Min
DOG L-1
DOG L
DOG L+1
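A simplified sketch of the max/min search is given below. It is a variant rather than the exact procedure above: it assumes all DOG levels have been brought to the same resolution (so the corresponding point has the same coordinates at every level) and compares each point against the full 3x3 patches in the levels below and above, i.e. 26 neighbours in total. The function name `find_extrema` is hypothetical.

```python
import numpy as np

def find_extrema(dog):
    """Return (level, row, col) of strict local maxima/minima.

    `dog` has shape (levels, height, width). A point is kept when it is the
    unique largest or smallest value in its 3x3x3 neighbourhood.
    """
    keys = []
    levels, height, width = dog.shape
    for l in range(1, levels - 1):
        for i in range(1, height - 1):
            for j in range(1, width - 1):
                v = dog[l, i, j]
                cube = dog[l - 1:l + 2, i - 1:i + 2, j - 1:j + 2]
                # v itself is inside the cube, so a strict extremum occurs once.
                unique = np.sum(cube == v) == 1
                if unique and (v == cube.max() or v == cube.min()):
                    keys.append((l, i, j))
    return keys

# Tiny example: a single bump in the middle level of a 3-level stack.
dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0
print(find_extrema(dog))
```

With a real pyramid whose levels differ in resolution, the coordinates of the corresponding point would first have to be mapped between levels (for example by the 1.5 pixel spacing used when downsampling).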
Refining Key List: Illumination
• For all levels, use the “A” smoothed image to compute
– Gradient Magnitude
• Threshold gradient magnitudes:
– Remove all key points with Mij less than 0.1 times the max gradient value
• Motivation: Low contrast is generally less reliable than high for feature points
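The contrast filter can be sketched as follows. The gradient operator (`np.gradient`), the function name `filter_low_contrast`, and the small step-edge test image are assumptions; only the 0.1 fraction comes from the slide.

```python
import numpy as np

def filter_low_contrast(A, keypoints, fraction=0.1):
    """Keep (row, col) key points whose gradient magnitude M_ij on the
    smoothed image A is at least `fraction` of the maximum magnitude."""
    gy, gx = np.gradient(A.astype(float))
    M = np.sqrt(gx ** 2 + gy ** 2)
    cutoff = fraction * M.max()
    return [(i, j) for (i, j) in keypoints if M[i, j] >= cutoff]

# Step edge: the left half is dark, the right half is bright.
A = np.zeros((8, 8))
A[:, 4:] = 10.0

candidates = [(2, 4), (2, 1)]      # one point on the edge, one in a flat area
print(filter_low_contrast(A, candidates))
```

The point in the flat region has zero gradient magnitude and is removed, while the point on the step edge is kept.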
SIFT Feature Orientation?
• We have now obtained the location and scale of SIFT features
• How can we obtain the orientation of features?
Assigning Canonical Orientation
• For each remaining key point:
– Choose surrounding N x N window at DOG level it was detected
[Figure: DOG image]
Assigning Canonical Orientation
• For all levels, use the “A” smoothed image to compute
– Gradient Orientation
[Figure: Gaussian Smoothed Image, with its Gradient Orientation and Gradient Magnitude]
Assigning Canonical Orientation
• Gradient magnitude weighted by 2D Gaussian
[Figure: Gradient Magnitude * 2D Gaussian = Weighted Magnitude]
Assigning Canonical Orientation
• Accumulate in histogram based on orientation
• Histogram has 36 bins with 10° increments
[Figure: histogram of Sum of Weighted Magnitudes (vertical axis) versus Gradient Orientation (horizontal axis)]
Assigning Canonical Orientation
• Identify peak and assign orientation and sum of magnitude to key point
[Figure: the same histogram of Sum of Weighted Magnitudes versus Gradient Orientation, with the peak marked]
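The orientation assignment can be sketched for a single N x N window as below. The Gaussian width (`sigma = N/2`), the gradient operator, and the choice to report the lower edge of the peak bin are assumptions; the 36 bins of 10° come from the slides.

```python
import numpy as np

def canonical_orientation(patch, sigma=None, bins=36):
    """Return (lower edge of the peak bin in degrees, histogram) for a square patch."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ori = np.degrees(np.arctan2(gy, gx)) % 360.0      # orientation in [0, 360)

    # 2D Gaussian weight centred on the patch (sigma is an assumed choice).
    n = patch.shape[0]
    if sigma is None:
        sigma = n / 2.0
    c = (n - 1) / 2.0
    yy, xx = np.mgrid[0:n, 0:n]
    weight = np.exp(-((yy - c) ** 2 + (xx - c) ** 2) / (2.0 * sigma ** 2))

    # Accumulate weighted magnitudes into equal-width orientation bins.
    idx = (ori // (360.0 / bins)).astype(int) % bins
    hist = np.bincount(idx.ravel(), weights=(mag * weight).ravel(), minlength=bins)
    peak = int(np.argmax(hist))
    return peak * (360.0 / bins), hist

# Intensity ramp whose gradient points at 45 degrees (in row/column axes).
rows, cols = np.mgrid[0:9, 0:9]
angle, hist = canonical_orientation(rows + cols)
print(angle, hist[int(angle // 10)])
```

The value `hist[peak]` is the sum of weighted magnitudes that the slide assigns to the key point together with the orientation.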
Local Image Description
• SIFT keys each assigned:
– Location
– Scale (analogous to level it was detected)
– Orientation (assigned in previous canonical orientation steps)
• Now: Describe local image region invariant to the above transformations
SIFT key example
Local Image Description
For each key point:
1. Identify 8x8 neighborhood (from DOG level it was detected)
2. Align orientation to x-axis
3. Calculate gradient magnitude and orientation map
4. Weight by Gaussian
5. Calculate histogram of each 4x4 region, with 8 bins for gradient orientation. Tally weighted gradient magnitude.
6. This histogram array is the image descriptor. (The example here is a vector of length 8*4=32. Best suggestion: a 128-element vector for a 16x16 neighborhood)
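Steps 3 to 6 can be sketched as follows, assuming the patch has already been rotated to its canonical orientation (step 2 is not shown). The function name `describe`, the Gaussian width, and the random test patches are assumptions for illustration.

```python
import numpy as np

def describe(patch, cell=4, bins=8):
    """Concatenate per-cell orientation histograms of a square patch.

    The patch is assumed to be already aligned to the canonical orientation.
    """
    patch = patch.astype(float)
    n = patch.shape[0]

    # Step 3: gradient magnitude and orientation map.
    gy, gx = np.gradient(patch)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ori = np.degrees(np.arctan2(gy, gx)) % 360.0

    # Step 4: weight by a Gaussian centred on the patch (assumed width n/2).
    c = (n - 1) / 2.0
    yy, xx = np.mgrid[0:n, 0:n]
    sigma = n / 2.0
    wmag = mag * np.exp(-((yy - c) ** 2 + (xx - c) ** 2) / (2.0 * sigma ** 2))

    # Step 5: one 8-bin orientation histogram per 4x4 cell.
    idx = (ori // (360.0 / bins)).astype(int) % bins
    desc = []
    for r in range(0, n, cell):
        for q in range(0, n, cell):
            h = np.bincount(idx[r:r + cell, q:q + cell].ravel(),
                            weights=wmag[r:r + cell, q:q + cell].ravel(),
                            minlength=bins)
            desc.append(h)

    # Step 6: the concatenated histograms form the descriptor.
    return np.concatenate(desc)

rng = np.random.default_rng(0)
print(len(describe(rng.random((8, 8)))))     # 4 cells * 8 bins
print(len(describe(rng.random((16, 16)))))   # 16 cells * 8 bins
```

An 8x8 neighborhood yields 4 cells and a 32-element vector, while a 16x16 neighborhood yields 16 cells and the 128-element vector suggested above.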
Applications: Image Matching
• Find all key points identified in source and target image
– Each key point will have 2D location, scale and orientation, as well as invariant descriptor vector
• For each key point in source image, search for corresponding SIFT features in target image.
• Find the transformation between two images using epipolar geometry constraints or affine transformation.
Image matching via SIFT features
Feature detection
Image matching via SIFT features
• Image matching via nearest neighbor search
- if the ratio of the closest distance to the 2nd closest distance is greater than 0.8, then reject as a false match.
• Remove outliers using epipolar line constraints.
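The nearest-neighbor search with the distance-ratio test can be sketched with a brute-force search. The function name `match_ratio_test`, the Euclidean distance, and the toy 2D descriptors are assumptions; real SIFT descriptors would be 128-dimensional, and outlier removal with epipolar constraints is not shown.

```python
import numpy as np

def match_ratio_test(src, dst, ratio=0.8):
    """Return (src_index, dst_index) pairs that pass the distance-ratio test.

    src and dst hold one descriptor per row; dst needs at least two rows.
    """
    matches = []
    for i, d in enumerate(src):
        dist = np.linalg.norm(dst - d, axis=1)
        nearest, second = np.argsort(dist)[:2]
        # Reject when the best match is not clearly better than the runner-up.
        if dist[nearest] <= ratio * dist[second]:
            matches.append((i, int(nearest)))
    return matches

# Toy descriptors (hypothetical values).
src = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
dst = np.array([[1.0, 0.1], [0.0, 1.0], [10.0, 10.0]])
print(match_ratio_test(src, dst))
```

The third source descriptor is about equally far from its two nearest candidates, so it is rejected as ambiguous; the first two have one clearly closest candidate and are kept.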
Image matching via SIFT features
Summary
• SIFT features are reasonably invariant to rotation, scaling, and illumination changes.
• We can use them for image matching and object recognition among other things.
• Efficient on-line matching and recognition can be performed in real time