MMSE Estimation for Sparse Representation Modeling
Michael Elad
The Computer Science Department, The Technion – Israel Institute of Technology, Haifa 32000, Israel
Joint work with
Irad Yavneh & Matan Protter
The CS Department, The Technion
April 6th, 2009
Noise Removal?
In this talk we focus on signal/image denoising …
Important: (i) Practical application; (ii) A convenient platform for testing basic ideas in signal/image processing.
Many Considered Directions: Partial differential equations, Statistical estimators, Adaptive filters, Inverse problems & regularization, Wavelets, Example-based techniques, Sparse representations, …
Main Message Today: Several sparse representations can be found and used for better denoising performance – we introduce, motivate, discuss, demonstrate, and explain this new idea.
[Figure: Remove additive noise?]
Agenda

1. Background on Denoising with Sparse Representations
2. Using More than One Representation: Intuition
3. Using More than One Representation: Theory
4. A Closer Look At the Unitary Case
5. Summary and Conclusions
Part I – Background on Denoising with Sparse Representations
Denoising By Energy Minimization

Many of the proposed signal denoising algorithms are related to the minimization of an energy function of the form

f(x) = \frac{1}{2}\|y - x\|_2^2 + \Pr(x)

where y is the given measurements and x is the unknown to be recovered. The first term expresses the relation to the measurements; the second is the prior (regularization). [Portrait: Thomas Bayes, 1702–1761]

This is in fact a Bayesian point of view, adopting the Maximum A-posteriori Probability (MAP) estimation.

Clearly, the wisdom in such an approach is within the choice of the prior – modeling the signals of interest.
Sparse Representation Modeling

[Figure: x = Dα, where x is of length N, D is a fixed N×K dictionary, and α is a sparse and random vector of length K.]

Every column in D (dictionary) is a prototype signal (atom).

The vector α is generated randomly with few (say L for now) non-zeros at random locations and with random values.
Back to Our MAP Energy Function

\hat{\alpha} = \arg\min_\alpha \frac{1}{2}\|y - D\alpha\|_2^2 \ \text{s.t.} \ \|\alpha\|_0 \le L, \qquad \hat{x} = D\hat{\alpha}

The L0 "norm" is effectively counting the number of non-zeros in α.

The vector α is the representation (sparse/redundant).

Bottom line: denoising of y is done by minimizing

\min_\alpha \|y - D\alpha\|_2^2 \ \text{s.t.} \ \|\alpha\|_0 \le L \quad \text{or} \quad \min_\alpha \|\alpha\|_0 \ \text{s.t.} \ \|y - D\alpha\|_2^2 \le \varepsilon^2.
The Solver We Use: Greedy-Based

\min_\alpha \|\alpha\|_0 \ \text{s.t.} \ \|y - D\alpha\|_2^2 \le \varepsilon^2

The MP is one of the greedy algorithms that finds one atom at a time [Mallat & Zhang ('93)]. Step 1: find the one atom that best matches the signal. Next steps: given the previously found atoms, find the next one to best fit the residual. The algorithm stops when the error \|y - D\alpha\|_2^2 is below the destination threshold.

The Orthogonal MP (OMP) is an improved version that re-evaluates the coefficients by Least-Squares after each round.
Orthogonal Matching Pursuit

OMP finds one atom at a time for approximating the solution of

\min_\alpha \|\alpha\|_0 \ \text{s.t.} \ \|y - D\alpha\|_2^2 \le \varepsilon^2

Initialization: n = 0, \alpha^0 = 0, S^0 = \emptyset, and the residual r^0 = y - D\alpha^0 = y.

Main Iteration (n \leftarrow n + 1):
1. Compute E(i) = \min_z \|z d_i - r^{n-1}\|_2^2 for 1 \le i \le K.
2. Choose i_0 s.t. \forall\, 1 \le i \le K, \ E(i_0) \le E(i).
3. Update Support: S^n = S^{n-1} \cup \{i_0\}.
4. LS: \alpha^n = \arg\min_\alpha \|y - D\alpha\|_2^2 \ \text{s.t.} \ \text{supp}\{\alpha\} = S^n.
5. Update Residual: r^n = y - D\alpha^n.
If \|r^n\|_2 is small enough, stop; otherwise, iterate.
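To make these steps concrete, here is a minimal NumPy sketch of OMP. This is our own illustration rather than code from the talk, and it assumes unit-norm atoms, so minimizing E(i) amounts to maximizing |d_i^T r^{n-1}|:

```python
import numpy as np

def omp(D, y, eps):
    """Orthogonal Matching Pursuit (sketch): greedily grow the support,
    re-evaluating the coefficients by Least-Squares after each round."""
    N, K = D.shape
    residual, support = y.copy(), []
    alpha = np.zeros(K)
    while np.linalg.norm(residual) > eps and len(support) < N:
        # E(i) = min_z ||z*d_i - r||^2 = ||r||^2 - (d_i^T r)^2 for unit-norm atoms
        E = np.linalg.norm(residual) ** 2 - (D.T @ residual) ** 2
        support.append(int(np.argmin(E)))
        # Least-Squares re-evaluation of all coefficients on the current support
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
        alpha[:] = 0.0
        alpha[support] = coeffs
    return alpha
```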
Part II – Using More than One Representation: Intuition
Back to the Beginning. What If …
Consider the denoising problem

\min_\alpha \|\alpha\|_0 \ \text{s.t.} \ \|y - D\alpha\|_2^2 \le \varepsilon^2

and suppose that we can find a group of J candidate solutions \{\alpha_j\}_{j=1}^{J} such that each is sparse and each attains a small error:

\|\alpha_j\|_0 \ll N \quad \text{and} \quad \|y - D\alpha_j\|_2^2 \le \varepsilon^2.
Basic Questions:
- What could we do with such a set of competing solutions in order to better denoise y?
- Why should this help?
- How shall we practically find such a set of solutions?
Relevant work: [Larsson & Selen ('07)], [Schniter et al. ('08)], [Elad & Yavneh ('08)].
Motivation – General
Why bother with such a set?
Because each representation conveys a different story about the desired signal.
Because pursuit algorithms are often wrong in finding the sparsest representation, and then relying on their solution is too sensitive.
… Maybe there are “deeper” reasons?
Our Motivation
There is an intriguing relationship between this idea and the common practice in example-based techniques, where several examples are merged.
Consider the Non-Local-Means [Buades, Coll, & Morel (‘05)]. It uses (i) a local dictionary (the neighborhood patches), (ii) it builds several sparse representations (of cardinality 1), and (iii) it merges them.
Why not take it further, and use general sparse representations?
Generating Many Representations

Our Answer: Randomizing the OMP

The algorithm is the OMP from before, with one change in the atom-selection step:

Initialization: n = 0, \alpha^0 = 0, S^0 = \emptyset, and r^0 = y - D\alpha^0 = y.

Main Iteration (n \leftarrow n + 1):
1. Compute E(i) = \min_z \|z d_i - r^{n-1}\|_2^2 for 1 \le i \le K.
2. Choose i_0 at random, with probability proportional to \exp\{-c \cdot E(i)\}.*
3. Update Support: S^n = S^{n-1} \cup \{i_0\}.
4. LS: \alpha^n = \arg\min_\alpha \|y - D\alpha\|_2^2 \ \text{s.t.} \ \text{supp}\{\alpha\} = S^n.
5. Update Residual: r^n = y - D\alpha^n.
If \|r^n\|_2 is small enough, stop; otherwise, iterate.

* Larsson and Schniter propose a more complicated and deterministic tree-pruning method.

For now, let's set the parameter c manually for best performance. Later we shall define a way to set it automatically.
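A matching sketch of the Random-OMP (again ours, not the talk's code); the only change from the OMP sketch above is that i_0 is drawn at random with probability proportional to exp{-c·E(i)}:

```python
import numpy as np

def random_omp(D, y, eps, c, rng=None):
    """Random-OMP (sketch): OMP with a randomized atom-selection step."""
    if rng is None:
        rng = np.random.default_rng()
    N, K = D.shape
    residual, support = y.copy(), []
    alpha = np.zeros(K)
    while np.linalg.norm(residual) > eps and len(support) < N:
        E = np.linalg.norm(residual) ** 2 - (D.T @ residual) ** 2
        p = np.exp(-c * (E - E.min()))   # shift the exponent for numerical stability
        p[support] = 0.0                 # do not re-draw already-chosen atoms
        p /= p.sum()
        support.append(int(rng.choice(K, p=p)))
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
        alpha[:] = 0.0
        alpha[support] = coeffs
    return alpha
```

Running it J times on the same (D, y) produces the set of candidate representations that we average later on.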
Let's Try

Proposed Experiment:
- Form a random dictionary D of size 100×200.
- Multiply by a sparse vector \alpha_0 with \|\alpha_0\|_0 = 10.
- Add Gaussian iid noise v with \sigma = 1 and obtain y = D\alpha_0 + v.
- Solve the problem \min_\alpha \|\alpha\|_0 \ \text{s.t.} \ \|y - D\alpha\|_2^2 \le 100 using OMP, and obtain \hat{\alpha}^{OMP}.
- Use Random-OMP and obtain \{\hat{\alpha}_j^{RandOMP}\}_{j=1}^{1000}.

Let's look at the obtained representations …
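The experiment can be reproduced along the following lines, using the omp and random_omp sketches from above; the seed and the value c = 0.5 are arbitrary choices of ours, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L, sigma, J = 100, 200, 10, 1.0, 1000

D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms

alpha0 = np.zeros(K)
s_true = rng.choice(K, size=L, replace=False)
alpha0[s_true] = rng.standard_normal(L)   # sparse vector with L = 10 non-zeros

y = D @ alpha0 + sigma * rng.standard_normal(N)

eps = np.sqrt(N * sigma**2)               # so that ||y - D*alpha||_2^2 <= 100
alpha_omp = omp(D, y, eps)
alphas_rand = [random_omp(D, y, eps, c=0.5, rng=rng) for _ in range(J)]
```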
Some Observations

[Figure: four panels comparing the OMP to 1000 Random-OMP runs – (i) a histogram of cardinalities, (ii) a histogram of the representation errors \|y - D\hat{\alpha}\|_2^2, (iii) a histogram of the noise attenuation \|D\hat{\alpha} - D\alpha_0\|_2^2 / \|y - D\alpha_0\|_2^2, and (iv) the noise attenuation versus the cardinality.]
We see that:
- The OMP gives the sparsest solution.
- Nevertheless, it is not the most effective for denoising.
- The cardinality of a representation does not reveal its efficiency.
The Surprise (at least for us) …

Let's propose the average

\hat{\alpha} = \frac{1}{1000}\sum_{j=1}^{1000} \hat{\alpha}_j^{RandOMP}

as our representation.

[Figure: the averaged representation versus the original and the OMP representations, coefficient by coefficient.]

This representation IS NOT SPARSE AT ALL, but it gives

\frac{\|D\hat{\alpha} - D\alpha_0\|_2^2}{\|y - D\alpha_0\|_2^2} = 0.06.
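Continuing the experiment sketch above, the averaging and the noise-attenuation measure look as follows (the helper name is ours):

```python
import numpy as np

alpha_avg = np.mean(alphas_rand, axis=0)   # dense, yet a better denoiser

def noise_attenuation(alpha_hat):
    """||D(alpha_hat - alpha0)||^2 / ||y - D alpha0||^2: values below 1 mean
    the estimate is closer to the clean signal than the noisy input."""
    return (np.linalg.norm(D @ alpha_hat - D @ alpha0) ** 2
            / np.linalg.norm(y - D @ alpha0) ** 2)

print(noise_attenuation(alpha_omp), noise_attenuation(alpha_avg))
```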
Is It Consistent? … Yes!

Here are the results of 1000 trials with the same parameters …

[Figure: scatter of the RandOMP denoising factor versus the OMP denoising factor over the 1000 trials, with the mean point marked; a few outliers correspond to cases of a zero solution.]
Part III – Using More than One Representation: Theory
Our Signal Model

[Figure: x = Dα, with D a fixed N×K dictionary.]

D is fixed and known.

The vector α is built by:
- Choosing the support s with probability P(s) from all the 2^K possibilities Ω. For simplicity, assume that |s| = k is fixed and known.
- Choosing the α_s coefficients using iid Gaussian entries drawn from N(0, σ_x²).

The ideal signal is x = Dα = D_s α_s. The p.d.f.'s P(α) and P(x) are clear and known.
Adding Noise

[Figure: y = x + v = Dα + v.]

Noise assumed: the noise v is an additive white Gaussian vector with probability P_v(v), so that

P(y|x) = C \cdot \exp\left\{-\frac{\|y - x\|_2^2}{2\sigma^2}\right\}

The conditional p.d.f.'s P(y|s), P(s|y), and even also P(x|y), are all clear and well-defined (although they may appear nasty).
The Key – The Posterior P(x|y)

We have access to P(x|y), and with it two estimators:

MAP: \hat{x}^{MAP} = \arg\max_x P(x|y)

MMSE: \hat{x}^{MMSE} = E\{x \,|\, y\}

Oracle: the estimate obtained when the support s is known.

The estimation of α and multiplication by D is equivalent to the above. These two estimators are impossible to compute, as we show next.
Let's Start with The Oracle (i.e., when s is known):

P(\alpha_s \,|\, y, s) = \frac{P(y \,|\, \alpha_s)\, P(\alpha_s \,|\, s)}{P(y)}

P(\alpha_s \,|\, s) \propto \exp\left\{-\frac{\|\alpha_s\|_2^2}{2\sigma_x^2}\right\}, \qquad P(y \,|\, \alpha_s) \propto \exp\left\{-\frac{\|y - D_s \alpha_s\|_2^2}{2\sigma^2}\right\}

P(\alpha_s \,|\, y) \propto \exp\left\{-\frac{\|y - D_s \alpha_s\|_2^2}{2\sigma^2} - \frac{\|\alpha_s\|_2^2}{2\sigma_x^2}\right\}

\hat{\alpha}_s = \left(\frac{1}{\sigma^2} D_s^T D_s + \frac{1}{\sigma_x^2} I\right)^{-1} \frac{1}{\sigma^2} D_s^T y \equiv Q_s^{-1} h_s
Comments:
• This estimate is both the MAP and MMSE.
• The oracle estimate of x is obtained by multiplication by Ds.
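In code, the oracle is a single regularized Least-Squares solve; a sketch following the formula above:

```python
import numpy as np

def oracle(D, y, s, sigma, sigma_x):
    """Oracle estimate alpha_s = Q_s^{-1} h_s for a known support s."""
    Ds = D[:, list(s)]
    Qs = Ds.T @ Ds / sigma**2 + np.eye(len(s)) / sigma_x**2
    hs = Ds.T @ y / sigma**2
    return np.linalg.solve(Qs, hs)
```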
The MAP Estimation

We have seen this as the oracle's probability for the support s:

P(\alpha_s \,|\, y) \propto \exp\left\{-\frac{\|y - D_s \alpha_s\|_2^2}{2\sigma^2} - \frac{\|\alpha_s\|_2^2}{2\sigma_x^2}\right\}

Integrating out \alpha_s, we get

P(s|y) \propto P(s)\, P(y|s) \propto P(s) \cdot \exp\left\{\frac{h_s^T Q_s^{-1} h_s}{2} + \frac{\log(\det(Q_s^{-1}))}{2}\right\}

and \hat{\alpha}^{MAP} = \arg\max_\alpha P(\alpha|y) is obtained by maximizing P(s|y) and then P(\alpha_s \,|\, y, s), leading to

\hat{s}^{MAP} = \arg\max_s P(s) \cdot \exp\left\{\frac{h_s^T Q_s^{-1} h_s}{2} + \frac{\log(\det(Q_s^{-1}))}{2}\right\}
The MAP Estimation

\hat{s}^{MAP} = \arg\max_s P(s) \cdot \exp\left\{\frac{h_s^T Q_s^{-1} h_s}{2} + \frac{\log(\det(Q_s^{-1}))}{2}\right\}

Implications:

The MAP estimator requires testing all the possible supports for the maximization. In typical problems, this is impossible, as there is a combinatorial set of possibilities.

This is why we rarely use the exact MAP, and we typically replace it with approximation algorithms (e.g., OMP).
The MMSE Estimation

E\{\alpha \,|\, y, s\} = Q_s^{-1} h_s \equiv \hat{\alpha}_s

This is the oracle for s, as we have seen before, and as before

P(s|y) \propto P(s) \cdot \exp\left\{\frac{h_s^T Q_s^{-1} h_s}{2} + \frac{\log(\det(Q_s^{-1}))}{2}\right\}

\hat{\alpha}^{MMSE} = E\{\alpha \,|\, y\} = \sum_s P(s|y) \cdot E\{\alpha \,|\, y, s\} = \sum_s P(s|y) \cdot \hat{\alpha}_s
The MMSE Estimation
\hat{\alpha}^{MMSE} = E\{\alpha \,|\, y\} = \sum_s P(s|y) \cdot \hat{\alpha}_s
Implications:
The best estimator (in terms of L2 error) is a weighted average of many sparse representations!!!
As in the MAP case, in typical problems one cannot compute this expression, as the summation is over a combinatorial set of possibilities. We should propose approximations here as well.
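For a small enough dictionary, though, the combinatorial sum can be evaluated directly. A sketch, under our simplifying assumption of a uniform P(s) over the supports of size k:

```python
import itertools
import numpy as np

def exact_mmse(D, y, k, sigma, sigma_x):
    """Exact MMSE: a weighted average of the oracles over ALL size-k supports."""
    N, K = D.shape
    log_w, estimates, supports = [], [], []
    for s in itertools.combinations(range(K), k):
        Ds = D[:, list(s)]
        Qs = Ds.T @ Ds / sigma**2 + np.eye(k) / sigma_x**2
        hs = Ds.T @ y / sigma**2
        a = np.linalg.solve(Qs, hs)                # the oracle for this support
        _, logdet = np.linalg.slogdet(Qs)
        log_w.append(0.5 * hs @ a - 0.5 * logdet)  # log P(s|y) up to a constant
        estimates.append(a)
        supports.append(s)
    w = np.exp(np.array(log_w) - max(log_w))       # normalize in the log domain
    w /= w.sum()
    alpha = np.zeros(K)
    for wi, a, s in zip(w, estimates, supports):
        alpha[list(s)] += wi * a
    return alpha
```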
The Case of |s| = k = 1

P(s|y) \propto P(s) \cdot \exp\left\{\frac{h_s^T Q_s^{-1} h_s}{2} + \frac{\log(\det(Q_s^{-1}))}{2}\right\} \propto \exp\left\{\frac{\sigma_x^2}{2\sigma^2(\sigma^2 + \sigma_x^2)} \left(d_k^T y\right)^2\right\}

where d_k is the k-th atom in D. The constant multiplying (d_k^T y)^2 is our c in the Random-OMP.

Based on this we can propose a greedy algorithm for both MAP and MMSE:

MAP – choose the atom with the largest inner product (out of K), and do so one at a time, while freezing the previous ones (almost OMP).

MMSE – draw an atom at random in a greedy algorithm, based on the above probability set, getting close to P(s|y) in the overall draw.
Bottom Line
The MMSE estimation we got requires a sweep through all supports (i.e., a combinatorial search) – impractical.

Similarly, an explicit expression for P(x|y) can be derived and maximized – this is the MAP estimation, and it also requires a sweep through all possible supports – impractical too.

The OMP is a (good) approximation for the MAP estimate.

The Random-OMP is a (good) approximation of the Minimum-Mean-Squared-Error (MMSE) estimate. It is close to the Gibbs sampler of the probability P(s|y), from which we should draw the weights.

Back to the beginning: why use several representations? Because their average leads to provably better noise suppression.
Comparative Results

The following results correspond to a small dictionary (20×30), where the combinatorial formulas can be evaluated as well.

Parameters:
- N = 20, K = 30
- True support = 3
- σ_x = 1
- J = 10 (RandOMP)
- Averaged over 1000 experiments

[Figure: relative mean-squared error as a function of σ for eight estimators – 1. Emp. Oracle, 2. Theor. Oracle, 3. Emp. MMSE, 4. Theor. MMSE, 5. Emp. MAP, 6. Theor. MAP, 7. OMP, 8. RandOMP – with the known-support (oracle) curves at the bottom.]
Part IV – A Closer Look At the Unitary Case: D^T D = D D^T = I
Few Basic Observations

Let us denote \beta = D^T y. Since D^T D = I, we get

Q_s = \frac{1}{\sigma^2} D_s^T D_s + \frac{1}{\sigma_x^2} I = \frac{\sigma^2 + \sigma_x^2}{\sigma^2 \sigma_x^2} I, \qquad h_s = \frac{1}{\sigma^2} D_s^T y = \frac{1}{\sigma^2} \beta_s

and the oracle becomes a simple shrinkage:

\hat{\alpha}_s = Q_s^{-1} h_s = \frac{\sigma_x^2}{\sigma^2 + \sigma_x^2} \beta_s \equiv c^2 \beta_s \qquad \text{(The Oracle)}
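In code, the unitary oracle is thus an element-wise shrinkage of β on the known support (a sketch of ours):

```python
import numpy as np

def oracle_unitary(D, y, s, sigma, sigma_x):
    """Unitary-case oracle: shrink beta = D^T y by c^2 on the support s."""
    beta = D.T @ y                                # valid since D^T D = I
    c2 = sigma_x**2 / (sigma_x**2 + sigma**2)     # the shrinkage constant c^2
    alpha = np.zeros(D.shape[1])
    alpha[list(s)] = c2 * beta[list(s)]
    return alpha
```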
Back to the MAP Estimation

\hat{s}^{MAP} = \arg\max_s \exp\left\{\frac{h_s^T Q_s^{-1} h_s}{2} + \frac{\log(\det(Q_s^{-1}))}{2}\right\}

We assume |s| = k fixed with equal probabilities. Since Q_s is the same for every support of size k, the determinant part becomes a constant and can thus be discarded, while

\frac{h_s^T Q_s^{-1} h_s}{2} = \frac{c^2}{2\sigma^2} \|\beta_s\|_2^2.

This means that MAP estimation can be easily evaluated by computing β, sorting its entries in descending order of magnitude, and choosing the k leading ones.
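So the unitary MAP boils down to a sort; a sketch of ours:

```python
import numpy as np

def map_unitary(D, y, k, sigma, sigma_x):
    """Unitary MAP: keep the k entries of beta with the largest magnitude."""
    beta = D.T @ y
    c2 = sigma_x**2 / (sigma_x**2 + sigma**2)
    s_map = np.argsort(-np.abs(beta))[:k]         # the k leading |beta_i|
    alpha = np.zeros(D.shape[1])
    alpha[s_map] = c2 * beta[s_map]               # shrink the selected entries
    return alpha
```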
Closed-Form Estimation
It is well-known that MAP enjoys a closed form and simple solution in the case of a unitary dictionary D.
This closed-form solution takes the structure of thresholding or shrinkage. The specific structure depends on the fine details of the model assumed.
It is also known that OMP in this case becomes exact.
What about the MMSE? Could it have a simple closed-form solution too?
The MMSE … Again

This is the formula we got:

\hat{\alpha}^{MMSE} = \sum_s P(s|y) \cdot c^2 \beta_s

We combine linearly many sparse representations (with proper weights); the result is one effective representation that is not sparse anymore.
The MMSE … Again

This is the formula we got:

\hat{\alpha}^{MMSE} = \sum_s P(s|y) \cdot c^2 \beta_s

We change the above summation to

\hat{\alpha}^{MMSE} = c^2 \sum_{j=1}^{K} q_j^k \, \beta_j \, e_j

where there are K contributions (one per each atom) to be found and used, with q_j^k the probability that the j-th atom participates in the support. We have developed a closed-form recursive formula for computing these q coefficients.
Towards a Recursive Formula

We have seen that the governing probability for the weighted averaging is given by

P(s|y) \propto \exp\left\{\frac{c^2 \|\beta_s\|_2^2}{2\sigma^2}\right\} = \prod_{i \in s} \exp\left\{\frac{c^2 \beta_i^2}{2\sigma^2}\right\} \equiv \prod_{i \in s} q_i

and thus

\hat{\alpha}^{MMSE} = \sum_s P(s|y) \cdot c^2 \beta_s = c^2 \sum_{j=1}^{K} q_j^k \, \beta_j \, e_j, \qquad q_j^k = \frac{\sum_{s:\,|s|=k} I_s(j) \prod_{i \in s} q_i}{\sum_{s:\,|s|=k} \prod_{i \in s} q_i}

where I_s(j) is an indicator function stating if j is in s.
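For small K, the weights q_j^k in the formula above can be checked by brute-force enumeration; a verification sketch of ours:

```python
import itertools
import numpy as np

def qk_bruteforce(q, k):
    """q_j^k: sum of prod(q_i) over all size-k supports containing j, normalized."""
    K = len(q)
    num, den = np.zeros(K), 0.0
    for s in itertools.combinations(range(K), k):
        p = np.prod(q[list(s)])
        den += p
        num[list(s)] += p
    return num / den
```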
The Recursive Formula

q_j^k = \frac{\sum_{s:\,|s|=k} I_s(j) \prod_{i \in s} q_i}{\sum_{s:\,|s|=k} \prod_{i \in s} q_i} = \frac{k \cdot q_j \left(1 - q_j^{k-1}\right)}{\sum_{i=1}^{K} q_i \left(1 - q_i^{k-1}\right)}, \qquad \text{where} \quad q_j^1 = \frac{q_j}{\sum_{i=1}^{K} q_i}

so the weights are computed by a sequence of cheap sweeps:

\{q_j^1\}_{j=1}^{K} \rightarrow \{q_j^2\}_{j=1}^{K} \rightarrow \{q_j^3\}_{j=1}^{K} \rightarrow \cdots \rightarrow \{q_j^k\}_{j=1}^{K}
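Putting the pieces together, here is a sketch of the closed-form unitary MMSE built on this recursion; the log-domain rescaling of the q_i (which cancels in all the ratios) is our addition for numerical stability:

```python
import numpy as np

def mmse_unitary(D, y, k, sigma, sigma_x):
    """Closed-form MMSE for a unitary D via the recursive weights q_j^k."""
    beta = D.T @ y
    c2 = sigma_x**2 / (sigma_x**2 + sigma**2)
    log_q = c2 * beta**2 / (2 * sigma**2)    # q_i = exp(c^2 beta_i^2 / (2 sigma^2))
    q = np.exp(log_q - log_q.max())          # rescaling cancels in the ratios below
    qk = q / q.sum()                         # base case: q_j^1
    for kk in range(2, k + 1):               # {q^1} -> {q^2} -> ... -> {q^k}
        t = q * (1.0 - qk)
        qk = kk * t / t.sum()
    return c2 * qk * beta                    # alpha_MMSE = c^2 * sum_j q_j^k beta_j e_j
```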
An Example

This is a synthetic experiment resembling the previous one, but with a few important changes:
- D is unitary
- The representation's cardinality is 5 (the higher it is, the weaker the Random-OMP becomes)
- Dimensions are different: N = K = 64
- J = 20 (RandOMP runs)

[Figure: relative mean-squared error as a function of σ, comparing Theor. MAP, OMP, the Recursive MMSE, Theor. MMSE, RandOMP, and the known-support oracle.]
Part V – Summary and Conclusions
Today We Have Seen that …

Sparsity and redundancy are used for denoising of signals/images.

How? By finding the sparsest representation and using it to recover the clean signal.

Can we do better? Today we have shown that averaging several sparse representations for a signal leads to better denoising, as it approximates the MMSE estimator.
More on these (including the slides and the relevant papers) can be found at http://www.cs.technion.ac.il/~elad