Image Processing
Outline
• Logistics
• Motivation
• Convolution
• Filtering
Waitlist
• We are at 103 enrolled with 158 students on wait list. This room holds 107.
• I’m getting numerous requests of the form “how likely is it that I’ll get registered?” The honest answer: unlikely :(
• If you are considering dropping, please do so quickly
Some final class philosophies
• The diverse background of the class means some topics will be redundant for some folks and new for others (e.g., EE folks might be bored by today’s signal processing)
• I think 1-way lectures are boring (and such content can easily be found elsewhere). Discussions are way more fun! I encourage you to come to class.
• I hate PowerPoint. I’d rather write on the board, but this room is not conducive to it. I still encourage you to take notes.
• If you are going to come and check e-mail / Facebook, I’d rather you drop now to make room for someone else who’d get more out of lecture.
Outline
• Logistics
• Motivation
• Convolution
• Filtering
Slide credit: Fei-Fei Li & Andrej Karpathy
David Marr, 1970s
David Marr, 1982
Computational perspective: credited with an early computational approach to vision
David Marr
Low-level Mid-level High-level
Low-level vision
Finding edges, blobs, bars, etc….
Consider family of low-level image processing operations
Photoshop / Instagram filters: blur, sharpen, colorize, etc.
Are certain combinations redundant? Is there a mathematical way to characterize them?
Recall: what is a digital (grayscale) image?
Matrix of integer values
Let’s think of images as zero-padded functions
Images as height fields
F[i,j]
Characterizing image transformations
F[i,j] → T → G[i,j]
F[i] → T → G[i]
G[i] = T(F[i])
T(αF1 + βF2) = αG1 + βG2
G[i − j] = T(F[i − j])
(Abuse of notation: [i] does not mean the transformation is applied at each pixel separately)
G = T(F)
Example signal: 5 4 2 3 7 4 6 5 3 6
How do we characterize image processing operations ?
Properties of “nice” functional transformations
Additivity: T(F1 + F2) = T(F1) + T(F2)
Scaling: T(αF) = αT(F)
Shift invariance: G[i − j] = T(F[i − j])
Direct consequence (linearity): T(αF1 + βF2) = αG1 + βG2
Impulse response: δ[i] = 1 for i = 0 (0 otherwise) [also called delta function]
What does this look like for an image?
Any function can be written as a linear combination of shifted and scaled impulse responses
(figure: signal written as a sum of shifted, scaled impulses)
Figure 1: Staircase approximation to a continuous-time signal.
Representing signals with impulses. Any signal can be expressed as a sum of scaled and shifted unit impulses. We begin with the pulse or “staircase” approximation to a continuous signal, as illustrated in Fig. 1. Conceptually, this is trivial: for each discrete sample of the original signal, we make a pulse signal. Then we add up all these pulse signals to make up the approximate signal. Each of these pulse signals can in turn be represented as a standard pulse scaled by the appropriate value and shifted to the appropriate place.
As we let the pulse width approach zero, the approximation becomes better and better, and in the limit it equals the original signal. Also, in the limit, the summation approaches an integral, and the pulse approaches the unit impulse:
x(t) = ∫ x(τ) δ(t − τ) dτ    (1)
In other words, we can represent any signal as an infinite sum of shifted and scaled unit impulses. A digital compact disc, for example, stores whole complex pieces of music as lots of simple numbers representing very short impulses, and then the CD player adds all the impulses back together one after another to recreate the complex musical waveform.
This no doubt seems like a lot of trouble to go to, just to get back the same signal that we originally started with, but in fact, we will very shortly be able to use Eq. 1 to perform a marvelous trick.
Linear Systems
A system or transform T maps an input signal F into an output signal G:
G = T(F), where T denotes the transform, a function from input signals to output signals.
Systems come in a wide variety of types. One important class is known as linear systems. To see whether a system is linear, we need to test whether it obeys certain rules that all linear systems obey. The two basic tests of linearity are homogeneity and additivity.
F[i] = ?
F[i] = F[0] δ[i] + F[1] δ[i − 1] + …
F[i] = Σ_u F[u] δ[i − u]
T(F[i]) = Σ_u F[u] T(δ[i − u])
G[i] = Σ_u F[u] H[i − u],  where H[i] = T(δ[i]),  G[i] = T(F[i])
G = F ∗ H
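This derivation is exactly what a direct implementation computes. A minimal sketch in Python/NumPy (`convolve_1d` is a name chosen here, not from the slides), checked against `np.convolve`:

```python
import numpy as np

def convolve_1d(F, H):
    """Direct implementation of G[i] = sum_u F[u] * H[i - u].

    F and H are 1-D arrays treated as zero-padded outside their support.
    Output has the 'full' size len(F) + len(H) - 1.
    """
    N, M = len(F), len(H)
    G = np.zeros(N + M - 1)
    for i in range(N + M - 1):
        for u in range(N):
            if 0 <= i - u < M:
                G[i] += F[u] * H[i - u]
    return G

F = np.array([5, 4, 2, 3, 7, 4, 6, 5, 3, 6], dtype=float)
H = np.array([1, 2, 3], dtype=float)
assert np.allclose(convolve_1d(F, H), np.convolve(F, H))
```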
Convolution
impulse response, filter, kernel
F[i] = F[0] δ[i] + F[1] δ[i − 1] + …
F[i] = Σ_u F[u] δ[i − u]
T(F[i]) = Σ_u F[u] T(δ[i − u])
G[i] = Σ_u F[u] H[i − u],  where H[i] = T(δ[i]),  G[i] = T(F[i])
G = F ∗ H
Example
F = 5 4 2 3 7 4 6 5 3 6,  H = 1 2 3,  G = F ∗ H
Template, Deva Ramanan, January 20, 2015
G[i] = F[i] ∗ H[i] = Σ_u F[u] H[i − u]
     = H[i] ∗ F[i] = Σ_u H[u] F[i − u]
G[i] = F[i] ⊗ H[i] = Σ_u H[u] F[i + u]
     = F[i] ∗ H[−i]
G[i,j] = F ∗ H = Σ_u Σ_v F[u,v] H[i − u, j − v]
G[i,j] = F ∗ H = H ∗ F = Σ_u Σ_v H[u,v] F[i − u, j − v]
G[i,j] = F ⊗ H = Σ_u Σ_v H[u,v] F[i + u, j + v]
G[i,j] = F ⊗ H = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i + u, j + v]
H indices: 0 1 2;  F indices: 0 1 2 3 4 5 6 7 8 9
G[0] = ?   G[1] = ?
F = 5 4 2 3 7 4 6 5 3 6,  H = 1 2 3
Flipped filter: 3 2 1 (slides along indices −3 −2 −1 0 1 2 3 4 5 6 7 8 9)
G[0] = 5×1 = 5;  G[1] = 5×2 + 4×1 = 14;  G[2] = 5×3 + 4×2 + 2×1 = 25;  …
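The hand computation above can be checked mechanically; `np.convolve` implements the same sum G[i] = Σ_u F[u] H[i − u]:

```python
import numpy as np

F = [5, 4, 2, 3, 7, 4, 6, 5, 3, 6]
H = [1, 2, 3]
G = np.convolve(F, H)   # 'full' convolution, length 10 + 3 - 1 = 12

# Matches the worked values on the slide
assert G[0] == 5 and G[1] == 14 and G[2] == 25
```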
Preview of 2D
(figure: image f and 2-D filter h)
Properties of convolution
Commutative: F ∗ H = H ∗ F
Associative: (F ∗ H) ∗ G = F ∗ (H ∗ G)
Distributive: (F ∗ G) + (H ∗ G) = (F + H) ∗ G
Implies that we can efficiently implement complex operations
Powerful way to think about any image transformation that satisfies additivity, scaling, and shift-invariance
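These three identities can be spot-checked numerically on random signals (a sanity check, not a proof; 'full'-mode convolution of finite zero-padded signals satisfies all three exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
F, H = rng.random(8), rng.random(8)
K = rng.random(3)
c = np.convolve  # 'full' mode by default

assert np.allclose(c(F, K), c(K, F))                  # commutative
assert np.allclose(c(c(F, K), H), c(F, c(K, H)))      # associative
assert np.allclose(c(F, K) + c(H, K), c(F + H, K))    # distributive
```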
Proof of commutativity:
H ∗ F = Σ_u H[u] F[i − u] = Σ_{u′} H[i − u′] F[u′],  where u′ = i − u
      = Σ_u F[u] H[i − u] = F ∗ H
Conceptually wacky: allows us to interchange the filter and image
Size: given F of length N and H of length M, what’s the size of G = F ∗ H?
>> conv(F,H,'full')   → N + M − 1
>> conv(F,H,'valid')  → N − M + 1
>> conv(F,H,'same')   → N
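NumPy’s `np.convolve` exposes the same three output-size conventions as MATLAB’s `conv`:

```python
import numpy as np

F, H = np.ones(10), np.ones(3)   # N = 10, M = 3
assert len(np.convolve(F, H, 'full'))  == 10 + 3 - 1   # N + M - 1 = 12
assert len(np.convolve(F, H, 'valid')) == 10 - 3 + 1   # N - M + 1 = 8
assert len(np.convolve(F, H, 'same'))  == 10           # N, same as F
```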
A simpler approach
F = 5 4 2 3 7 4 6 5 3 6  (indices 0 1 2 3 4 5 6 7 8 9)
H = 1 2 3  (indices −1 0 1)
Template, Deva Ramanan, January 14, 2015
G[i,j] = F ⊗ H = Σ_u Σ_v H[u,v] F[i + u, j + v]
G[i,j] = F ∗ H = Σ_u Σ_v H[u,v] F[i − u, j − v]
Scan the original F instead of the flipped version. What’s the math?
(Cross) correlation
F = 5 4 2 3 7 4 6 5 3 6  (indices 0 1 2 3 4 5 6 7 8 9)
H = 1 2 3  (indices −1 0 1)
F[i] ⊗ H[i] = Σ_{u=−k}^{k} H[u] F[i + u]
Properties
The associative and commutative properties do not hold for correlation
… but correlation is easier to think about
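The relationship F ⊗ H = F ∗ H[−i] means correlation can always be computed by convolving with the flipped filter, e.g.:

```python
import numpy as np

rng = np.random.default_rng(1)
F, H = rng.random(10), rng.random(3)

# Cross-correlation equals convolution with the flipped filter
corr = np.correlate(F, H, 'full')
assert np.allclose(corr, np.convolve(F, H[::-1], 'full'))
```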
Convolution vs correlation (1-d)
Convolution (commutative property):
G[i] = F[i] ∗ H[i] = Σ_u F[u] H[i − u]
     = H[i] ∗ F[i] = Σ_u H[u] F[i − u]
Cross-correlation:
G[i] = F[i] ⊗ H[i] = Σ_u H[u] F[i + u]
     = F[i] ∗ H[−i]   (exercise for reader!)
2D correlation
Image filtering with h[·,·] = (1/9) × [1 1 1; 1 1 1; 1 1 1]  (3×3 mean filter; the 1/9 normalization is implied by the output values)
f[·,·]:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
g[·,·]:
0 10 20 30 30 30 20 10
0 20 40 60 60 60 40 20
0 30 60 90 90 90 60 30
0 30 50 80 80 90 60 30
0 30 50 80 80 90 60 30
0 20 30 50 50 60 40 20
10 20 30 30 30 30 20 10
10 10 10 0 0 0 0 0
g[m,n] = Σ_{k,l} h[k,l] f[m + k, n + l]
Credit: S. Seitz
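The grids above can be reproduced directly from the correlation formula. A sketch (the 1/9 mean-filter normalization and the exact input f are read off the slide):

```python
import numpy as np

# Input image f from the slide: a 90-valued block with one dark pixel
# inside it and one isolated bright pixel below it
f = np.zeros((10, 10))
f[2:7, 3:8] = 90
f[5, 4] = 0
f[8, 2] = 90

h = np.ones((3, 3)) / 9.0   # 3x3 mean filter

# g[m, n] = sum_{k,l} h[k, l] * f[m + k, n + l]  ('valid' output, 8x8)
g = np.zeros((8, 8))
for m in range(8):
    for n in range(8):
        g[m, n] = np.sum(h * f[m:m + 3, n:n + 3])

# First and third output rows match the slide
assert np.allclose(g[0], [0, 10, 20, 30, 30, 30, 20, 10])
assert np.allclose(g[2], [0, 30, 60, 90, 90, 90, 60, 30])
```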
Gaussian filtering
A Gaussian kernel gives less weight to pixels further from the center of the window
This kernel is an approximation of a Gaussian function:
(input image f as in the 2D correlation example above)
Kernel: (1/16) × [1 2 1; 2 4 2; 1 2 1]
Slide by Steve Seitz
G[i,j] = F ⊗ H = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i + u, j + v]
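The 1 2 1 / 2 4 2 / 1 2 1 kernel is separable: it is the outer product of the 1-D binomial filter [1 2 1]/4 with itself, which is why it can be applied as two cheap 1-D passes:

```python
import numpy as np

g1 = np.array([1, 2, 1]) / 4.0   # 1-D binomial approximation to a Gaussian
g2 = np.outer(g1, g1)            # separable 2-D kernel

target = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]]) / 16.0
assert np.allclose(g2, target)
assert np.isclose(g2.sum(), 1.0)   # smoothing filter: weights sum to 1
```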
Convolution vs correlation (2-d)
Convolution: >> conv2(H,F)
G[i,j] = F ∗ H = H ∗ F = Σ_u Σ_v H[u,v] F[i − u, j − v]
Correlation: >> filter2(H,F)
G[i,j] = F ⊗ H = Σ_u Σ_v H[u,v] F[i + u, j + v]
Can we compute correlation with convolution?
Border effects
Annoying details: what is the size of the output?
• MATLAB: filter2(g, f, shape)
• shape = ‘full’: output size is sum of sizes of f and g
• shape = ‘same’: output size is same as f
• shape = ‘valid’: output size is difference of sizes of f and g
(figure: filter g placed at the corners of image f for the full, same, and valid output sizes)
Border padding
From Szeliski, Computer Vision, 2010
Examples of correlation: linear filters
Original
(1/9) × [1 1 1; 1 1 1; 1 1 1]  =  Blur (with a mean filter)
Source: D. Lowe
Practice with linear filters
Filter: [0 0 0; 0 1 0; 0 0 0]  (impulse at center). Original → ?
Result: filtered image is unchanged (the identity).
Filter: [0 0 0; 0 0 1; 0 0 0]  (impulse shifted off center). Original → ?
Result: image shifted left by 1 pixel (under correlation).
What would this look like for convolution?
Source: D. Lowe
Filter: (1/16) × [1 2 1; 2 4 2; 1 2 1]  (Gaussian blur of the original)
What would this look like for convolution? (the same: this kernel is symmetric)
Sharpening: [0 0 0; 0 2 0; 0 0 0] − (1/16) × [1 2 1; 2 4 2; 1 2 1]
          = [0 0 0; 0 1 0; 0 0 0] + ([0 0 0; 0 1 0; 0 0 0] − (1/16) × [1 2 1; 2 4 2; 1 2 1])
(unit impulse plus the difference between the image and its blur)
Source: D. Lowe
Unsharp (sharpen) filter
(figure: sharpen filter = scaled impulse − Gaussian, resembling a Laplacian of Gaussian; applied to the image: unit impulse (identity), blurred image, sharpened image)
Examples: image rotation
Can rotations be represented with a convolution? Are they linear shift-invariant (LSI) operations G[i,j] = T(F[i,j])?
f[m,n] ⊗ h[m,n] = g[m,n]?  (unknown kernel h = [? ? ?; ? ? ?; ? ? ?])
Rotation is linear, but it is not a spatially invariant operation, so it is not a convolution.
Derivative filters (correlation)
[−1 1]  (horizontal),   [−1; 1]  (vertical)
Question: what happens as we repeatedly convolve an image F with filter H?
(figure: F and F ∗ H)
Aside for the probability junkies: the PDF of the sum of two independent random variables is the convolution of their PDFs. Repeated convolutions ⇒ repeated sums ⇒ CLT ⇒ Gaussian
Gaussian: (1/16) × [1 2 1; 2 4 2; 1 2 1]
Gaussian filters
σ = 1 pixel, σ = 5 pixels, σ = 10 pixels, σ = 30 pixels
Implementation
Gaussian Kernel
• Standard deviation σ: determines extent of smoothing
Source: K. Grauman
σ = 2 with 30 x 30 kernel
σ = 5 with 30 x 30 kernel
Matlab: >> G = FSPECIAL('gaussian',HSIZE,SIGMA)
(1/16) × [1 2 1; 2 4 2; 1 2 1]
Finite-support filters
Choosing kernel width
• The Gaussian function has infinite support, but discrete filters use finite kernels
Source: K. Grauman
What should HSIZE be?
Rule-of-thumb
Set the radius of the filter to 3σ
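The rule of thumb can be baked into a small helper (a sketch; `gaussian_kernel` is a name chosen here, analogous in spirit to MATLAB’s `fspecial`):

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at radius 3*sigma (rule of thumb),
    normalized so the weights sum to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

g = gaussian_kernel(2.0)
assert len(g) == 2 * 6 + 1          # radius = 6 for sigma = 2
assert np.isclose(g.sum(), 1.0)     # smoothing filter: sums to 1
```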
Useful representation: Gaussian pyramid
Figure 1: Gaussian Pyramid. Depicted are four levels of the Gaussian pyramid, levels 0 to 3 presented from left to right.
[2] P.J. Burt. Fast filter transforms for image processing. Computer Graphics and Image Processing, 1981.
[3] P.J. Burt. Fast algorithms for estimating local image properties. Computer Graphics and Image Processing, 1983.
[4] P.J. Burt and E.H. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communication, 31(4):532–540, April 1983.
[5] L.I. Larkin and P.J. Burt. Multi-resolution texture energy measures. In IEEE Conference on Computer Vision and Pattern Recognition, 1983.
Filter + subsample (to exploit redundancy in output)
Burt & Adelson 83: http://persci.mit.edu/pub_pdfs/pyramid83.pdf
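The filter-then-subsample loop can be sketched as below (using the separable [1 2 1]/4 kernel for brevity; Burt & Adelson use a 5-tap kernel):

```python
import numpy as np

def gaussian_pyramid(image, levels):
    """Sketch of a Gaussian pyramid: blur with a separable [1 2 1]/4
    kernel, then subsample by 2, repeated `levels` times."""
    g = np.array([1, 2, 1]) / 4.0
    out = [image]
    for _ in range(levels):
        im = out[-1]
        # separable blur: filter rows, then columns ('same' size)
        im = np.apply_along_axis(lambda r: np.convolve(r, g, 'same'), 1, im)
        im = np.apply_along_axis(lambda col: np.convolve(col, g, 'same'), 0, im)
        out.append(im[::2, ::2])   # subsample by 2
    return out

pyr = gaussian_pyramid(np.ones((16, 16)), 3)
assert [p.shape for p in pyr] == [(16, 16), (8, 8), (4, 4), (2, 2)]
```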
Smoothing vs edge filters
Gaussian filters: σ = 1, 5, 10, 30 pixels
How should filters behave on a flat region with value ‘v’?
Smoothing filter: output ‘v’ ⇒ Σ_{ij} H[i,j] = 1
Edge filter: output 0 ⇒ Σ_{ij} H[i,j] = 0
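The two sum conditions are easy to check on the filters seen so far (a quick sanity check, not from the slides):

```python
import numpy as np

mean3 = np.ones((3, 3)) / 9.0                               # smoothing
gauss = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0  # smoothing
deriv = np.array([-1.0, 1.0])                               # edge

assert np.isclose(mean3.sum(), 1.0)   # flat region of value v -> output v
assert np.isclose(gauss.sum(), 1.0)
assert np.isclose(deriv.sum(), 0.0)   # flat region -> output 0
```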
Template matching with filters
Goal: find the template in the image. Main challenge: what is a good similarity or distance measure between two patches?
• Correlation • Zero-mean correlation • Sum of squared differences • Normalized cross-correlation
Slide by Derek Hoiem
Can we use filtering to build detectors?
H[i,j]
F[i,j]
Attempt 1: correlate with eye patch
Matching with filters. Goal: find the eye in the image. Method 0: filter the image with the eye patch.
h[m,n] = Σ_{k,l} g[k,l] f[m + k, n + l]   (f = image, g = filter)
Input → Filtered Image
What went wrong?
Slide by Derek Hoiem
G[i,j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i + u, j + v] = Hᵀ F_ij = ||H|| ||F_ij|| cos θ,   H, F_ij ∈ R^((2k+1)²)
Useful way to think about correlation and convolution:
(figure: vectors H and F_ij separated by angle θ_ij)
Attempt 1.5: correlate with transformed eye patch
Let’s transform the filter so that its response on a flat region is 0
Matching with filters. Goal: find the eye in the image. Method 1: filter the image with a zero-mean eye patch.
h[m,n] = Σ_{k,l} (f[k,l] − f̄) g[m + k, n + l],  where f̄ is the mean of f
Input → Filtered Image (scaled) → Thresholded Image
True detections; false detections
Attempt 1.5: correlate with zero-mean eye patch
G[i,j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} (H[u,v] − H̄) F[i + u, j + v]
       = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v] F[i + u, j + v] − H̄ Σ_{u=−k}^{k} Σ_{v=−k}^{k} F[i + u, j + v]
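The second term shows why this works: on a flat patch the raw correlation responds to brightness, but subtracting the filter mean cancels it. A quick check (random 5×5 template with hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.random((5, 5))            # stand-in for the eye template
H0 = H - H.mean()                 # zero-mean version

flat = np.full((5, 5), 7.0)       # flat patch with value v = 7
assert not np.isclose(np.sum(H * flat), 0.0)   # raw filter fires on brightness
assert np.isclose(np.sum(H0 * flat), 0.0)      # zero-mean filter outputs 0
```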
Attempt 2: SSD
Matching with filters. Goal: find the eye in the image. Method 2: SSD.
h[m,n] = Σ_{k,l} (g[k,l] − f[m + k, n + l])²
Input → 1 − sqrt(SSD) → Thresholded Image
True detections
Can this be implemented with filtering?
SSD[i,j] = ||H − F_ij||² = (H − F_ij)ᵀ (H − F_ij)
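A direct (unoptimized) SSD map, as a sketch with names chosen here; expanding ||H − F_ij||² = HᵀH − 2HᵀF_ij + F_ijᵀF_ij is what lets the middle term be computed with filtering:

```python
import numpy as np

def ssd_map(F, H):
    """SSD between a k x k template H and every k x k patch of F ('valid')."""
    k = H.shape[0]
    rows, cols = F.shape[0] - k + 1, F.shape[1] - k + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum((H - F[i:i + k, j:j + k]) ** 2)
    return out

rng = np.random.default_rng(3)
F = rng.random((8, 8))
H = F[2:5, 1:4].copy()            # plant the template at row 2, col 1
s = ssd_map(F, H)
assert np.unravel_index(np.argmin(s), s.shape) == (2, 1)   # zero SSD at the true spot
```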
What’s the potential downside of SSD?
(figure: input where the eyes have been darkened by a 0.5 scale factor)
What will SSD find here? SSD will fire on the shirt.
Slide by Derek Hoiem
Normalized cross correlation
Matching with filters. Goal: find the eye in the image. Method 3: normalized cross-correlation.
Input → Normalized Cross-Correlation → Thresholded Image
True detections
Slide by Derek Hoiem
NCC[i,j] = Hᵀ F_ij / (||H|| ||F_ij||) = Hᵀ F_ij / (√(Hᵀ H) √(F_ijᵀ F_ij)) = cos θ,  where H, F_ij are mean-centered
(figure: vectors H and F_ij separated by angle θ_ij)
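A sketch of NCC for a single patch (mean-centering plus normalization makes the score a cosine, so it is invariant to gain and offset changes like the darkened eyes above):

```python
import numpy as np

def ncc(H, patch):
    """Normalized cross-correlation of two patches: cos(theta) after mean-centering."""
    h = (H - H.mean()).ravel()
    p = (patch - patch.mean()).ravel()
    return h @ p / (np.linalg.norm(h) * np.linalg.norm(p))

rng = np.random.default_rng(4)
patch = rng.random((5, 5))
assert np.isclose(ncc(patch, patch), 1.0)               # perfect match
assert np.isclose(ncc(patch, 0.5 * patch + 3.0), 1.0)   # invariant to gain & offset
```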
From Serge Belongie, CS 6670: Computer Vision:
One example of a nonlinear filter is the median filter, which slides a window over the image and returns the median value. Another example of a nonlinear filter is “Non-Local Means,” which we describe next.
In Non-Local Means, for every pixel p we look for patches elsewhere in the image that look similar to the patch surrounding p. We then average this set of patches to determine the filtered value of p.
One nice feature of NL-Means is that it is “edge preserving,” while other methods of smoothing/de-noising can result in blurry edges.
8.5. Looking ahead: modern applications of filter banks
The above approaches to filtering were largely hand designed. This is partly due to limitations in computing power and lack of access to large datasets in the 80s and 90s. In modern approaches to image recognition the convolution kernels/filtering operations are often learned from huge amounts of training data.
In 1998 Yann LeCun created a Convolutional Network (named “LeNet”) that could recognize hand-written digits using a sequence of filtering operations, subsampling, and assorted nonlinearities, the parameters of which were learned via stochastic gradient descent on a large, labeled training set. Rather than hand selecting the filters to use, part of LeNet’s training was to pick for itself the most effective set of filters. Modern ConvNets use basically the same structure as LeNet, but because of richer training sets and greater computing power we can recognize far more complex objects than handwritten digits (see, for example, GoogLeNet in 2014 and other submissions to the ImageNet Large-Scale Visual Recognition Challenge).
Modern filter banks: learn filters from training data to look for low-, mid-, and high-level features
Convolutional Neural Nets (CNNs), LeCun et al. 1998
A look back
• Any linear shift-invariant operation can be characterized by a convolution
• (Convolution) correlation intuitively corresponds to (flipped) matched filters
• Derive filters from continuous operations (derivative, Gaussian, …)
• Contemporary application: convolutional neural networks