Statistical Modeling of Images and its Application to Denoising
What is statistics and why? A mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.
What is signal and noise? Jewelry vs. stones (but don't be fooled by the appearance).
What is the risk of a statistical approach? Data-driven vs. model-based.
EE565 Advanced Image Processing. Copyright Xin Li 2008.
Why do we Need a Statistical Model in the First Place?
Any image processing algorithm has to work on a collection (class) of images instead of a single one.
A mathematical model gives us an abstraction of the common properties of the images within the same class.
The model is our hypothesis and images are our observation data. In physics, can F=ma explain the relationship between force and acceleration? In image processing, can this model fit this class of images?
Introduction to Statistical Models
Motivating applications: Texture synthesis vs. image denoising
Statistical image modeling
Modeling correlation/dependency
Transform-domain texture synthesis
Nonparametric texture synthesis
Performance evaluation issues
Computer Graphics in SPORE
What is Image/Texture Model?
speech: Analysis extracts pitch, LPC coefficients, residues, …; Synthesis regenerates the signal from them.
texture: Analysis estimates P(X) (parametric or nonparametric); Synthesis samples from it.
How do we Tell the Goodness of a Model?
Synthesis (in statistical language, it is called sampling): draw a sample from the hypothesized model. Does the generated sample (experimental result) look like the data of our interest?
A fair coin? Flip the coin (computer simulation). Does the generated sequence (experimental result) contain the same number of Heads and Tails?
Discrete Random Variables (taken from EE465)
Example III: For a gray-scale image (L=256), we can use the notation p(rk), k = 0,1,…, L - 1, to denote the histogram of an image with L possible gray levels, rk, k = 0,1,…, L - 1, where p(rk) is the probability of the kth gray level (random event) occurring. The discrete random variables in this case are gray levels.
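As a sketch of the histogram notation p(rk) above, here is a short Python/NumPy example (the slides mention MATLAB, but Python is used throughout these examples; the random image is a stand-in for real data):

```python
import numpy as np

# Hypothetical 8-bit grayscale image (random values here, purely for illustration)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

L = 256
# counts[k] = number of pixels with gray level r_k
counts = np.bincount(img.ravel().astype(np.int64), minlength=L)
# p(r_k): empirical probability of the k-th gray level occurring
p = counts / img.size
```

Since every pixel falls into exactly one bin, p must sum to one.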
Question: What is wrong with viewing all pixels as being generated from an independent identically distributed (i.i.d.) random variable?
To Understand the Problem
Theoretically, if all pixels are indeed i.i.d., then random permutation of pixels should produce another image of the same class (natural images)
Experimentally, we can write a simple MATLAB function to implement and test the impact of random permutation
[Figure: permuted image with identical histogram to Lena]
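The permutation test described above can be sketched in a few lines; the slides suggest a MATLAB function, but here is a Python/NumPy version (the random image is a placeholder for a natural image such as Lena):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a natural image; replace with real image data to see the effect visually
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Randomly permute all pixel positions: the histogram is exactly preserved,
# but all spatial structure is destroyed -- evidence against the i.i.d. pixel model.
flat = img.ravel()
permuted = flat[rng.permutation(flat.size)].reshape(img.shape)

same_hist = np.array_equal(
    np.bincount(flat.astype(np.int64), minlength=256),
    np.bincount(permuted.ravel().astype(np.int64), minlength=256),
)
```

On a real photograph, `permuted` looks like pure noise even though its histogram matches the original exactly.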
Random Process
A random process is the foundation for doing research in the field of communication and signal processing (that is why EE513 is the core requirement for the qualifying exam).
A random process is the vector generalization of a (scalar) random variable.
Correlation and Dependency (N=2)
If the condition E(xy) = E(x)E(y) holds, then the two random variables are said to be uncorrelated. From our earlier discussion, we know that if x and y are statistically independent, then p(x, y) = p(x)p(y), in which case E(xy) = E(x)E(y).
Thus, we see that if two random variables are statistically independent then they are also uncorrelated. The converse of this statement is not true in general.
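A classic counterexample for the converse (not from the slides, but standard): take X symmetric about zero and Y = X^2. Then E(XY) = E(X^3) = 0 = E(X)E(Y), so they are uncorrelated, yet Y is a deterministic function of X. A quick numerical check in Python/NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200_000)  # symmetric about 0
y = x ** 2                                 # completely determined by x

# Empirical covariance: should be close to 0 (uncorrelated)
cov = np.mean(x * y) - np.mean(x) * np.mean(y)

# Dependence: conditioning on x being near 0 collapses y near 0,
# so the conditional variance is far below the unconditional variance.
cond_var = np.var(y[np.abs(x) < 0.1])
full_var = np.var(y)
```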
Covariance of two Random Variables
The central moment µ11 = E[(x - E(x))(y - E(y))] is called the covariance of x and y.
Recall: How to Calculate E(XY)?
Empirical solution: E(XY) ≈ (1/N) sum_{n=1}^{N} X_n Y_n, given N paired samples (X_n, Y_n).
Note: When Y=X, we are getting autocorrelation
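The empirical estimate above is a one-liner in Python/NumPy (the synthetic data and the coupling coefficient 0.5 are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.normal(size=N)            # standard normal: E[X] = 0, E[X^2] = 1
y = 0.5 * x + rng.normal(size=N)  # correlated with x by construction

# Empirical E[XY] = (1/N) * sum X_n * Y_n; here E[XY] = 0.5 * E[X^2] = 0.5
e_xy = np.mean(x * y)

# With Y = X this becomes the (lag-0) autocorrelation, i.e. E[X^2] = 1
e_xx = np.mean(x * x)
```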
Stationary Process*
P(X1,…,XN) = P(XK+1,…,XK+N) for any K, N: all statistics are invariant to a shift K in space/time location, for any order N of the statistics.
Gaussian Process
With mean vector m and covariance matrix C
For convenience, we often assume zero mean (if it is nonzero mean, we can subtract the mean)
The question is: is the distribution of observation data Gaussian or not?
A Gaussian process is stationary as long as its first and second order statistics are time-invariant.
The Curse of Dimensionality
Even for a small-size image such as 64-by-64, we need to model it by a random process in 4096-dimensional space (R^4096), whose covariance matrix is sized 4096-by-4096.
The curse of dimensionality was pointed out by Richard Bellman in the 1960s; even today's computing resources cannot handle brute-force nearest-neighbor search in relatively high-dimensional spaces.
Markovian Assumption
Andrei A. Markov (1856-1922)
Pafnuty L. Chebyshev (1821-1894)
Andrey N. Kolmogorov (1903-1987)
A Simple Idea
The future is determined by the present but is independent of the past
Note that stationarity and Markovianity are two "orthogonal" perspectives of imposing constraints on random processes.
Markov Process
Chain rule: P(X1,…,XM) = P(X1) P(X2|X1) … P(XM|XM-1,…,X1)
Markovian condition: P(Xk|Xk-1,…,X1) = P(Xk|Xk-1,…,Xk-N)
N-th order Markovian: the present depends only on the N past samples.
Parametric or non-parametric characterization
Autoregressive (AR) Model
Parametric model (linear prediction):
X_k = sum_{n=1}^{N} a_n X_{k-n} + w_k
An infinite impulse response (IIR) filter (z-transform):
X(z) = H(z) W(z),  H(z) = 1/A(z),  A(z) = 1 - sum_{n=1}^{N} a_n z^{-n}
Example: AR(1)
X_k = a X_{k-1} + w_k = a^2 X_{k-2} + a w_{k-1} + w_k = … = sum_{i>=0} a^i w_{k-i}
Autocorrelation function: r(k) = a^k, k = 0, 1, 2, …
[Figure: plot of r(k) vs. k for a = 0.9]
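A quick simulation confirms the geometric decay of the AR(1) autocorrelation (a = 0.9 as in the figure; the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a, M = 0.9, 200_000
w = rng.normal(size=M)

# Simulate X_k = a * X_{k-1} + w_k
x = np.empty(M)
x[0] = w[0]
for k in range(1, M):
    x[k] = a * x[k - 1] + w[k]

def acorr(x, lag):
    # Empirical autocorrelation E[X_k X_{k-lag}] for a zero-mean process
    return np.mean(x[lag:] * x[:len(x) - lag])

r0 = acorr(x, 0)
# Normalized autocorrelations r(k)/r(0): should be close to a, a^2, a^3
ratios = [acorr(x, k) / r0 for k in (1, 2, 3)]
```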
Yule-Walker Equation
Multiply both sides of X_k = sum_{n=1}^{N} a_n X_{k-n} + w_k by X_{k-l} and take expectations:
E(X_k X_{k-l}) = sum_{n=1}^{N} a_n E(X_{k-n} X_{k-l})
In matrix form, with the covariance matrix C built from the autocorrelation function r:

[ r(0)    r(1)   ... r(N-1) ] [ a_1 ]   [ r(1) ]
[ r(1)    r(0)   ... r(N-2) ] [ a_2 ] = [ r(2) ]
[ ...     ...    ... ...    ] [ ... ]   [ ...  ]
[ r(N-1)  r(N-2) ... r(0)   ] [ a_N ]   [ r(N) ]
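The Yule-Walker system above can be solved directly once r(0),…,r(N) are estimated from data. A Python/NumPy sketch, fitting an (over-parameterized) AR(3) model to simulated AR(1) data so the answer is known in advance (all constants here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, M, N = 0.9, 200_000, 3

# Simulate AR(1) data: X_k = 0.9 * X_{k-1} + w_k
w = rng.normal(size=M)
x = np.empty(M)
x[0] = w[0]
for k in range(1, M):
    x[k] = a_true * x[k - 1] + w[k]

# Empirical autocorrelations r(0)..r(N)
r = np.array([np.mean(x[l:] * x[:M - l]) for l in range(N + 1)])

# Toeplitz system: R[i, j] = r(|i - j|), right-hand side [r(1), ..., r(N)]
R = np.array([[r[abs(i - j)] for j in range(N)] for i in range(N)])
a_hat = np.linalg.solve(R, r[1:])
# For AR(1) data, a_hat should be close to [0.9, 0, 0].
```

In 1D the Toeplitz structure of R admits the fast Levinson-Durbin recursion mentioned later in the summary; the brute-force solve here is just for clarity.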
Wiener's Ideas
In practice, we do not know autocorrelation functions but only observation data X1,…,XM.
Approach 1: empirically estimate r(k) from X1,…,XM and solve the Yule-Walker equations.
Approach 2: formulate the minimization problem
MSE = sum_{k=1}^{M} ( X_k - sum_{n=1}^{N} a_n X_{k-n} )^2 = f(a_1,…,a_N)
and set ∂f/∂a_i = 0 for i = 1,…,N.
Exercise: you can verify they end up with the same results.
Least-Square Estimation
From X_k = sum_{n=1}^{N} a_n X_{k-n} + w_k, stack the M observed equations as y = C a:
y = [X(1), X(2), …, X(M)]^T,  a = [a_1, …, a_N]^T,
and C is the M-by-N matrix whose k-th row is [X(k-1), X(k-2), …, X(k-N)].
M equations, N unknown variables: y_{Mx1} = C_{MxN} a_{Nx1}
Least-Square Estimation (Cont'd)
Multiply both sides of y = C a by C^T: C^T y = C^T C a, so a = (C^T C)^{-1} (C^T y).
If you write it out, it is exactly the empirical way of estimating autocorrelation functions (R_xx a = r_x): now you have got the third approach.
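The least-squares route can be sketched as follows in Python/NumPy (again fitting simulated AR(1) data with a known answer; np.linalg.lstsq is used instead of forming (C^T C)^{-1} explicitly, purely for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, M, N = 0.9, 100_000, 2

# Simulate AR(1) data: X_k = 0.9 * X_{k-1} + w_k
w = rng.normal(size=M)
x = np.empty(M)
x[0] = w[0]
for k in range(1, M):
    x[k] = a_true * x[k - 1] + w[k]

# Stack y = C a: row k of C holds [X(k-1), ..., X(k-N)], y holds X(k)
y = x[N:]
C = np.column_stack([x[N - n:M - n] for n in range(1, N + 1)])

# Least-squares solution of the overdetermined system (equivalent to the normal equations)
a_hat, *_ = np.linalg.lstsq(C, y, rcond=None)
# a_hat should be close to [0.9, 0] for AR(1) data fit with N = 2.
```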
From 1D to 2D
[Figure: causal neighborhood of X_{m,n} (samples 1-5, above and to the left) vs. noncausal neighborhood (samples 1-8 surrounding X_{m,n})]
Causality of the neighborhood depends on different applications (e.g., coding vs. synthesis).
Experimental Justifications
[Figure: original texture; Analysis extracts AR model parameters; Synthesis regenerates the texture from random excitation]
Failure Example (I)
[Figure: analysis and synthesis of a disk image, N=8, M=4096]
Another way to look at it: if X and Y are two images of disks, will (X+Y)/2 produce another disk image?
Failure Example (II)
[Figure: analysis and synthesis, N=8, M=4096]
Note that the failure reason of this example is different from the last example (N is not large enough).
Summary of AR Modeling
Simple and admits a closed-form solution
Widely studied in time series analysis and speech processing applications
Known as 2D Kalman filtering and Gaussian MRF in the literature of image processing
Computational issues: in the 1D scenario, fast algorithms exist due to the Toeplitz property of the covariance matrix (e.g., Levinson-Durbin recursion)
Improvement over AR Model
Doubly stochastic process*: in a stationary Gaussian process, second-order statistics are time/spatially invariant; in a doubly stochastic process, second-order statistics (e.g., covariance) are themselves modeled by another random process with hidden variables.
Windowing technique: to estimate spatially varying statistics.
Why do We Need Windows? Nothing to do with Microsoft.
All images have finite dimensions: they can be viewed as the "windowed" version of natural scenes.
Any empirical estimation of statistical attributes (e.g., mean, variance) is based on the assumption that all N samples observe the same distribution. However, how do we know this assumption is satisfied?
1D Rectangular Window
Window size W = 2T+1; local mean estimate:
X̄_n = (1/(2T+1)) sum_{k=n-T}^{n+T} X_k
[Figure: signal X(n) with a sliding window of width W = 2T+1]
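The sliding-window local mean above is a plain moving average; a Python/NumPy sketch (the signal and T = 2 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)  # stand-in for the observed signal X(n)
T = 2
W = 2 * T + 1             # window size W = 2T+1 = 5

# Local mean at each interior position n:
# Xbar_n = (1/(2T+1)) * sum_{k=n-T}^{n+T} X_k
local_mean = np.convolve(x, np.ones(W) / W, mode="valid")
# local_mean[i] is the estimate centered at position i + T;
# "valid" mode keeps only positions where the full window fits.
```

The same idea extends directly to 2D by averaging over a W-by-W block around each pixel.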
2D Rectangular Window
[Figure: W-by-W window, W = 2T+1]
Loosely speaking, parameter estimation from a localized window is a compromised solution to handle spatially varying statistics.
Such an idea is common to other types of non-stationary signals too (e.g., short-time speech processing).
Example: as the window slides through the image, we will observe that AR model parameters vary from location to location.
[Figure: image with three window locations marked A, B, C]
Q: AR coefficients at B and C differ from those at A, but for different reasons. Why?