Explicit Signal to Noise Ratio in Reproducing Kernel Hilbert Spaces

Luis Gómez-Chova (1), Allan A. Nielsen (2), Gustavo Camps-Valls (1)

(1) Image Processing Laboratory (IPL), Universitat de València, Spain. luis.gomez-chova@uv.es, http://www.valencia.edu/chovago

(2) DTU Space - National Space Institute, Technical University of Denmark.

IGARSS 2011 – Vancouver, Canada

Outline

1 Introduction

2 Signal-to-noise ratio transformation

3 Kernel Minimum Noise Fraction

4 Experimental Results

5 Conclusions and Open questions

L. Gómez-Chova et al. Explicit Kernel Signal to Noise Ratio IGARSS 2011 – Vancouver 1/23

Motivation

Feature Extraction

Feature selection/extraction is essential before classification or regression:
- to discard redundant or noisy components
- to reduce the dimensionality of the data

Create a subset of new features by combinations of the existing ones

Linear Feature Extraction

Linear methods offer interpretability ∼ knowledge discovery
- PCA: projections maximizing the data set variance
- PLS: projections maximally aligned with the labels
- ICA: non-orthogonal projections with maximally independent axes

Drawbacks

1 Most feature extractors disregard the noise characteristics!
2 Linear methods fail when data distributions are curved (nonlinear relations)

Objectives

New nonlinear kernel feature extraction method for remote sensing data

Extract features robust to data noise

Method

Based on the Minimum Noise Fraction (MNF) transformation

Explicit Kernel MNF (KMNF):
- Noise is explicitly estimated in the reproducing kernel Hilbert space
- Deals with non-linear relations between the noise and signal features jointly
- Reduces the number of free parameters in the formulation to one

Experiments

PCA, MNF, KPCA, and two versions of KMNF (implicit and explicit)

Test feature extractors for real hyperspectral image classification

2 Signal-to-noise ratio transformation

Signal and noise

Signal vs noise

Signal: magnitude generated by an inaccessible system, $s_i$
Noise: magnitude generated by the medium corrupting the signal, $n_i$

Observation: signal corrupted by noise, $x_i$

Notation

Observations: $x_i \in \mathbb{R}^N$, $i = 1, \dots, n$

Matrix notation: $X = [x_1, \dots, x_n]^\top \in \mathbb{R}^{n \times N}$

Centered data sets: assume $X$ has zero mean

Empirical covariance matrix: $C_{xx} = \frac{1}{n} X^\top X$

Projection matrix: $U$ (size $N \times n_p$) → $X' = XU$ ($n_p$ extracted features)
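The notation above can be sketched in a few lines of NumPy. This is only an illustration on synthetic data: the sizes, the random data, and the placeholder projection matrix `U` are assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, n_p = 200, 5, 2                 # samples, input dimension, extracted features (illustrative)

X = rng.normal(size=(n, N))           # rows are the observations x_i
X = X - X.mean(axis=0)                # centered data: zero mean per feature
C_xx = X.T @ X / n                    # empirical covariance matrix, N x N

U = np.eye(N)[:, :n_p]                # placeholder projection matrix, N x n_p
X_proj = X @ U                        # X' = XU, n x n_p extracted features
print(C_xx.shape, X_proj.shape)
```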

Principal Component Analysis Transformation

Principal Component Analysis (PCA)

Find projections of $X = [x_1, \dots, x_n]^\top$ maximizing the variance of the projected data $XU$

PCA: maximize: $\mathrm{Tr}\{(XU)^\top (XU)\} \propto \mathrm{Tr}\{U^\top C_{xx} U\}$
subject to: $U^\top U = I$

Including Lagrange multipliers $\lambda$, this is equivalent to the eigenproblem

$C_{xx} u_i = \lambda_i u_i \;\rightarrow\; C_{xx} U = U D$

$u_i$ are the eigenvectors of $C_{xx}$ and they are orthonormal: $u_i^\top u_j = \delta_{ij}$ (1 if $i = j$, 0 otherwise)

PCA limitations

1 Axes rotation to the directions of maximum variance of data
2 It does not consider noise characteristics:
- Assumes noise variance is low → last eigenvectors with low eigenvalues
- Maximum variance directions may be affected by noise
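A minimal sketch of PCA as the eigenproblem of the empirical covariance, assuming NumPy and synthetic data. The function name `pca` and the test data are illustrative choices, not part of the slides.

```python
import numpy as np

def pca(X, n_p):
    """Top n_p eigenpairs of the empirical covariance of X (rows are samples)."""
    X = X - X.mean(axis=0)                     # center the data
    C_xx = X.T @ X / X.shape[0]                # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(C_xx)    # symmetric solver, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_p]    # keep the largest-variance directions
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
lam, U = pca(X, n_p=2)
X_proj = (X - X.mean(axis=0)) @ U              # projected data XU
```

The variance of each projected feature equals the corresponding eigenvalue, which is why noisy high-variance directions can dominate the first components.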

Minimum Noise Fraction Transformation

The SNR transformation

Find projections maximizing the ratio between signal and noise variances:

SNR: maximize: $\mathrm{Tr}\left\{ \dfrac{U^\top C_{ss} U}{U^\top C_{nn} U} \right\}$
subject to: $U^\top C_{nn} U = I$

Unknown signal and noise covariance matrices $C_{ss}$ and $C_{nn}$

The MNF transformation

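Assuming additive noise, $X = S + N$, that is orthogonal to the signal, $S^\top N = N^\top S = 0$, we have $C_{xx} = C_{ss} + C_{nn}$
Maximizing SNR is then equivalent to minimizing NF = 1/(SNR+1):

MNF: maximize: $\mathrm{Tr}\left\{ \dfrac{U^\top C_{xx} U}{U^\top C_{nn} U} \right\}$

This is equivalent to solving the generalized eigenproblem:

$C_{xx} u_i = \lambda_i C_{nn} u_i \;\rightarrow\; C_{xx} U = C_{nn} U D$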
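The generalized eigenproblem can be sketched with NumPy alone by whitening with a Cholesky factor of the noise covariance. This is one standard way to solve it, not necessarily the one used by the authors; `scipy.linalg.eigh(C_xx, C_nn)` solves the same problem. The helper names and the synthetic signal and noise are assumptions for illustration, and here the noise is known, which is not the case in practice.

```python
import numpy as np

def generalized_eigh(A, B):
    """Solve A u = lambda B u for symmetric A and positive definite B.
    Returned eigenvectors satisfy U^T B U = I."""
    L = np.linalg.cholesky(B)                           # B = L L^T
    M = np.linalg.solve(L, np.linalg.solve(L, A).T)     # L^{-1} A L^{-T}
    M = (M + M.T) / 2                                   # remove round-off asymmetry
    eigvals, V = np.linalg.eigh(M)
    return eigvals, np.linalg.solve(L.T, V)             # U = L^{-T} V

def mnf(C_xx, C_nn, n_p):
    eigvals, U = generalized_eigh(C_xx, C_nn)
    order = np.argsort(eigvals)[::-1][:n_p]             # largest SNR+1 first
    return eigvals[order], U[:, order]

rng = np.random.default_rng(0)
n = 5000
S = rng.normal(size=(n, 3)) @ rng.normal(size=(3, 3))       # synthetic signal
Nz = rng.normal(size=(n, 3)) * np.array([0.1, 0.5, 1.0])    # synthetic noise
S -= S.mean(axis=0)
Nz -= Nz.mean(axis=0)
X = S + Nz
C_xx = X.T @ X / n
C_nn = Nz.T @ Nz / n
lam, U = mnf(C_xx, C_nn, n_p=3)
```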

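Minimum Noise Fraction is equivalent to solving the generalized eigenproblem:

$C_{xx} u_i = \lambda_i C_{nn} u_i$

Since $U^\top C_{nn} U = I$, the eigenvalues $\lambda_i$ are the SNR+1 in the projected space

Need estimates of the data covariance $C_{xx} = \frac{1}{n} X^\top X$ and the noise covariance $C_{nn} \approx \frac{1}{n} N^\top N$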

The noise covariance estimation

Noise estimate: difference between the actual value and a reference ‘clean’ value

$N = X - X_r$

$X_r$ is obtained from the neighborhood, assuming the signal is spatially smoother than the noise
Assume wide-sense stationary processes:
- Differentiation: $n_i \approx x_i - x_{i-1}$
- Smoothing filtering: $n_i \approx x_i - \frac{1}{M} \sum_{k=1}^{M} w_k x_{i-k}$
- Wiener estimates
- Wavelet domain estimates
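The differentiation estimate can be sketched for an image cube as follows. The `(rows, cols, bands)` layout, the choice of the left-hand pixel as reference, and the function name are assumptions for illustration only.

```python
import numpy as np

def noise_by_differencing(cube):
    """cube has shape (rows, cols, bands).
    Returns X and the noise estimate Nz = X - X_r as (n, bands) matrices,
    where X_r holds the left-hand neighbor of each pixel."""
    bands = cube.shape[2]
    diff = cube[:, 1:, :] - cube[:, :-1, :]     # n_i ~ x_i - x_{i-1} along each row
    X = cube[:, 1:, :].reshape(-1, bands)       # pixels that have a left neighbor
    Nz = diff.reshape(-1, bands)
    return X, Nz

# Toy cube whose value grows by exactly 3 from one column to the next
cube = np.arange(24, dtype=float).reshape(2, 4, 3)
X, Nz = noise_by_differencing(cube)
C_nn = Nz.T @ Nz / Nz.shape[0]                  # noise covariance estimate
```

For independent noise of equal variance in neighboring pixels, the difference has twice the noise variance, so such an estimate is only correct up to scale.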

3 Kernel Minimum Noise Fraction

Kernel methods for non-linear feature extraction

Kernel methods

[Figure: input feature space and kernel feature space]

1 Map the data to a high-dimensional feature space, $\mathcal{H}$ ($d_{\mathcal{H}} \to \infty$)
2 Solve a linear problem there

Kernel trick

No need to know the $d_{\mathcal{H}} \to \infty$ coordinates of each mapped sample $\phi(x_i)$

Kernel trick: “if an algorithm can be expressed in the form of dot products, its non-linear (kernel) version only needs the dot products among mapped samples, the so-called kernel function:”

$K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$

Using this trick, we can implement K-PCA, K-PLS, K-ICA, etc.
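As an illustration of a kernel function, the sketch below builds a Gram matrix with a Gaussian (RBF) kernel. The slides only mention a kernel width, so the RBF form and the name `rbf_kernel` are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for the rows of A and B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))   # clip round-off negatives

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_kernel(X, X, sigma=1.0)      # n x n matrix of dot products in feature space
```

Only `K` is needed by the kernel methods that follow; the mapped samples themselves are never computed.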

Kernel Principal Component Analysis (KPCA)

Find projections maximizing the variance of the mapped data $\Phi = [\phi(x_1), \dots, \phi(x_n)]^\top$

KPCA: maximize: $\mathrm{Tr}\{(\Phi U)^\top (\Phi U)\} = \mathrm{Tr}\{U^\top \Phi^\top \Phi U\}$
subject to: $U^\top U = I$

The covariance matrix $\Phi^\top \Phi$ and projection matrix $U$ are $d_{\mathcal{H}} \times d_{\mathcal{H}}$ !!!

KPCA through kernel trick

Apply the representer theorem: $U = \Phi^\top A$ where $A = [\alpha_1, \dots, \alpha_n]$

KPCA: maximize: $\mathrm{Tr}\{A^\top \Phi \Phi^\top \Phi \Phi^\top A\} = \mathrm{Tr}\{A^\top K K A\}$
subject to: $U^\top U = A^\top \Phi \Phi^\top A = A^\top K A = I$

$K K \alpha_i = \lambda_i K \alpha_i \;\rightarrow\; K \alpha_i = \lambda_i \alpha_i$

Now matrix $A$ is $n \times n$ !!! (eigendecomposition of $K$)

Projections are obtained as $\Phi U = \Phi \Phi^\top A = K A$
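A minimal KPCA sketch under stated assumptions: an RBF kernel, explicit centering of the kernel matrix (the feature-space counterpart of assuming zero-mean data), and illustrative function names.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kpca(K, n_p):
    """Eigendecomposition of the centered kernel matrix; returns (lam, A, K_c A)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    K_c = H @ K @ H                            # kernel of the centered mapped data
    lam, A = np.linalg.eigh(K_c)               # K alpha_i = lambda_i alpha_i
    order = np.argsort(lam)[::-1][:n_p]
    lam, A = lam[order], A[:, order]
    A = A / np.sqrt(lam)                       # rescale so that A^T K_c A = I
    return lam, A, K_c @ A                     # projections Phi U = K A

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
lam, A, Z = kpca(rbf_kernel(X, X, sigma=1.5), n_p=3)
```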

Kernel MNF Transformation

KMNF through kernel trick

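Find projections maximizing the SNR of the mapped data $[\phi(x_1), \dots, \phi(x_n)]^\top$

Replace $X \in \mathbb{R}^{n \times N}$ with $\Phi \in \mathbb{R}^{n \times N_H}$

Replace $N \in \mathbb{R}^{n \times N}$ with $\Phi_n \in \mathbb{R}^{n \times N_G}$

$C_{xx} U = C_{nn} U D \;\Rightarrow\; \Phi^\top \Phi U = \Phi_n^\top \Phi_n U D$

Not solvable: matrices $\Phi^\top \Phi$ and $\Phi_n^\top \Phi_n$ are $N_H \times N_H$ and $N_G \times N_G$
Left multiply both sides by $\Phi$, and use the representer theorem, $U = \Phi^\top A$:

$\Phi \Phi^\top \Phi \Phi^\top A = \Phi \Phi_n^\top \Phi_n \Phi^\top A D \;\rightarrow\; K_{xx} K_{xx} A = K_{xn} K_{xn}^\top A D$

Now matrix $A$ is $n \times n$ !!! (eigendecomposition of $K_{xx}$ with respect to $K_{xn}$)
$K_{xx} = \Phi \Phi^\top$ is symmetric with elements $K(x_i, x_j)$

$K_{xn} = \Phi \Phi_n^\top = K_{nx}^\top$ is non-symmetric with elements $K(x_i, n_j)$

Easy and simple to program!
Potentially useful when signal and noise are nonlinearly related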
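The KMNF eigenproblem can be sketched as below. Several choices are assumptions made for illustration and are not stated in the slides: the RBF kernel, the ridge term `reg` added to keep the right-hand matrix positive definite, the Cholesky-based solver, the uncentered kernels, and the synthetic data with the previous sample as noise reference. Projections are taken as $K_{xx} A$ by analogy with KPCA.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def generalized_eigh(A, B):
    """Solve A u = lambda B u (A symmetric, B positive definite), with U^T B U = I."""
    L = np.linalg.cholesky(B)
    M = np.linalg.solve(L, np.linalg.solve(L, A).T)     # L^{-1} A L^{-T}
    eigvals, V = np.linalg.eigh((M + M.T) / 2)
    return eigvals, np.linalg.solve(L.T, V)

def kmnf(K_xx, K_xn, n_p, reg=1e-3):
    """Solve K_xx K_xx A = K_xn K_xn^T A D for the n_p largest eigenvalues."""
    n = K_xx.shape[0]
    lhs = K_xx @ K_xx
    rhs = K_xn @ K_xn.T
    rhs = rhs + reg * np.trace(rhs) / n * np.eye(n)     # ridge term (assumption)
    lam, A = generalized_eigh(lhs, rhs)
    order = np.argsort(lam)[::-1][:n_p]
    return lam[order], A[:, order], K_xx @ A[:, order]

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 80)
X = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])
X = X + 0.05 * rng.normal(size=X.shape)
Nz = X - np.roll(X, 1, axis=0)          # input-space noise estimate (wraps at the first sample)
K_xx = rbf_kernel(X, X, sigma=1.0)
K_xn = rbf_kernel(X, Nz, sigma=1.0)     # elements K(x_i, n_j)
lam, A, Z = kmnf(K_xx, K_xn, n_p=3)
```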

SNR in Hilbert spaces

Implicit KMNF: noise estimate in the input space

Estimate the noise directly in the input space: $N = X - X_r$

Signal-to-noise kernel:

$K_{xn} = \Phi \Phi_n^\top \;\rightarrow\; K(x_i, n_j)$

with $\Phi_n^\top = [\phi(n_1), \dots, \phi(n_n)]$

Kernels $K_{xx}$ and $K_{xn}$ deal with objects of a different nature → 2 parameters
Two different kernel spaces → the eigenvalues no longer have the meaning of SNR

Explicit KMNF: noise estimate in the feature space

Estimate the noise explicitly in the Hilbert space: $\Phi_n = \Phi - \Phi_r$

Signal-to-noise kernel:

$K_{xn} = \Phi \Phi_n^\top = \Phi (\Phi - \Phi_r)^\top = \Phi \Phi^\top - \Phi \Phi_r^\top = K_{xx} - K_{xr}$

Again it is not symmetric: $K(x_i, r_j) \neq K(r_i, x_j)$

Advantage: same kernel parameter for Kxx and Kxn
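A sketch of the explicit signal-to-noise kernel, assuming an RBF kernel and a reference set `X_r` built from the previous sample. Both choices, and the function names, are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def explicit_kernels(X, X_r, sigma):
    """Return K_xx and K_xn = K_xx - K_xr, both using the same kernel width."""
    K_xx = rbf_kernel(X, X, sigma)
    K_xr = rbf_kernel(X, X_r, sigma)        # elements K(x_i, r_j)
    return K_xx, K_xx - K_xr

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
X_r = np.roll(X, 1, axis=0)                 # reference: previous sample (wraps around)
K_xx, K_xn = explicit_kernels(X, X_r, sigma=1.0)
```

A single `sigma` serves both kernels, in contrast to the implicit version where `K(x_i, n_j)` compares samples with noise vectors and may need its own width.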

SNR in Hilbert spaces

Explicit KMNF: nearest reference

Differentiation in feature space: $\phi_{n_i} \approx \phi(x_i) - \phi(x_{i,d})$

$(K_{xn})_{ij} \approx \langle \phi(x_i), \phi(x_j) - \phi(x_{j,d}) \rangle = K(x_i, x_j) - K(x_i, x_{j,d})$

Explicit KMNF: averaged reference

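Difference to a local average in feature space (e.g. 4-connected neighboring pixels): $\phi_{n_i} \approx \phi(x_i) - \frac{1}{D} \sum_{d=1}^{D} \phi(x_{i,d})$

$(K_{xn})_{ij} \approx \langle \phi(x_i), \phi(x_j) - \frac{1}{D} \sum_{d=1}^{D} \phi(x_{j,d}) \rangle = K(x_i, x_j) - \frac{1}{D} \sum_{d=1}^{D} K(x_i, x_{j,d})$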

Explicit KMNF: autoregression reference

Weight the relevance of each kernel in the summation:

$(K_{xn})_{ij} \approx \langle \phi(x_i), \phi(x_j) - \sum_{d=1}^{D} w_d \phi(x_{j,d}) \rangle = K(x_i, x_j) - \sum_{d=1}^{D} w_d K(x_i, x_{j,d})$
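The three reference schemes differ only in the weights applied to the neighbor kernels, which the sketch below makes explicit. The RBF kernel, the `(n, D, N)` layout of the `neighbors` array, and the function names are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def explicit_kxn(X, neighbors, sigma, weights=None):
    """(K_xn)_ij = K(x_i, x_j) - sum_d w_d K(x_i, x_{j,d}).
    neighbors[j, d] holds x_{j,d}; weights=None gives the averaged reference (w_d = 1/D)."""
    D = neighbors.shape[1]
    if weights is None:
        weights = np.full(D, 1.0 / D)
    K_xn = rbf_kernel(X, X, sigma)
    for d in range(D):
        K_xn -= weights[d] * rbf_kernel(X, neighbors[:, d, :], sigma)
    return K_xn

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
X_r = np.roll(X, 1, axis=0)
# Nearest reference: a single neighbor with weight 1
K_nearest = explicit_kxn(X, X_r[:, None, :], sigma=1.0, weights=np.array([1.0]))
# Averaged reference: D = 2 neighbors with equal weights
nbrs = np.stack([np.roll(X, 1, axis=0), np.roll(X, -1, axis=0)], axis=1)
K_avg = explicit_kxn(X, nbrs, sigma=1.0)
```

Passing a non-uniform `weights` vector gives the autoregression reference.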

4 Experimental Results

Experimental results

Data material

AVIRIS hyperspectral image (220 bands): Indian Pines test site
145 × 145 pixels, 16 crop-type classes, 10366 labeled pixels
The 20 noisy bands in the water absorption region are intentionally kept

Experimental setup

PCA, MNF, KPCA, and two versions of KMNF (implicit and explicit)

The 220 bands transformed into a lower dimensional space of 18 features

Visual inspection: extracted features in descending order of relevance

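[Figure: extracted features shown in groups of three (features 1–3, 4–6, 7–9, 10–12, 13–15, 16–18), one row per method; the surviving row labels are implicit KMNF and explicit KMNF]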

Analysis of the eigenvalues: signal variance and SNR of transformed data

- Signal variance of the transformed data for PCA
- SNR of the transformed data for MNF and KMNF

[Figure: eigenvalues versus feature index (# feature, 0 to 20) for PCA, MNF, and KMNF]

The proposed approach provides the highest SNR!

LDA classifier: land-cover classification accuracy

[Figure: kappa statistic (κ) versus number of features (2 to 18) for PCA, MNF, KPCA, KMNF, and iKMNF, in two panels: original hyperspectral image, and multiplicative random noise (10%)]

Best results: linear MNF and the proposed KMNF

The proposed KMNF method outperforms MNF when the image is corrupted with non-additive noise

LDA classifier: land-cover classification maps

[Figure: classification maps for LDA-MNF and LDA-KMNF]

5 Conclusions and Open questions

Conclusions and open questions

Conclusions

Kernel method for nonlinear feature extraction maximizing the SNR

Good theoretical and practical properties for extracting noise-free features:
- Deals with non-linear relations between the noise and signal
- The only parameter is the width of the kernel
- Knowledge about noise can be encoded in the method

Simple optimization problem → eigendecomposition of the kernel matrix

Noise estimation in the kernel space with different levels of sophistication

Simple feature extraction toolbox (SIMFEAT) soon at http://isp.uv.es

Open questions and Future Work

Pre-images of transformed data in the input space

Learn kernel parameters in an automatic way

Test KMNF in more remote sensing applications: denoising, unmixing, ...
