Kernel based data fusion

Date posted: 08-Jan-2016
Uploaded by: aileen
Transcript
Page 1: Kernel based data fusion

Kernel based data fusion
Discussion of a Paper by G. Lanckriet

Page 2: Kernel based data fusion

Paper

Page 3: Kernel based data fusion

Overview

Problem: Aggregation of heterogeneous data

Idea: Different data are represented by different kernels

Question: How to combine different kernels in an elegant/efficient way?

Solution: Linear combination and SDP

Application: Recognition of ribosomal and membrane proteins

Page 4: Kernel based data fusion

Linear combination of kernels

$K = \sum_i \mu_i K_i$ (weight $\mu_i$, kernel $K_i$)

Resulting kernel $K$ is positive definite ($x^T K x > 0$ for all $x \neq 0$, provided $\mu_i > 0$ and $x^T K_i x > 0$)

Elegant aggregation of heterogeneous data

More efficient than training of individual SVMs

KCCA uses unweighted sum over individual kernels

[Figure: $x^T K x = x^2 K$ plotted against $x$ for scalar $x$; axes $x$ and $x^2 K$, origin 0]
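
A minimal sketch of the weighted sum in Python (standard library only). The two toy feature views, the weights `mu`, and the helper names `gram`, `combine` and `quad_form` are illustrative assumptions, not taken from the paper; linear kernels stand in for whatever kernel describes each data source.

```python
import random

def gram(features):
    """Linear-kernel Gram matrix K[i][j] = <f_i, f_j> for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]

def combine(kernels, mu):
    """Weighted sum K = sum_i mu[i] * kernels[i] of equally sized square matrices."""
    n = len(kernels[0])
    return [[sum(m * K[r][c] for m, K in zip(mu, kernels)) for c in range(n)]
            for r in range(n)]

def quad_form(K, x):
    """Quadratic form x^T K x."""
    n = len(x)
    return sum(x[r] * K[r][c] * x[c] for r in range(n) for c in range(n))

# Two hypothetical "views" of the same three samples (e.g. two data sources).
view_a = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
view_b = [[2.0], [1.0], [-1.0]]
K = combine([gram(view_a), gram(view_b)], mu=[0.7, 0.3])

# With non-negative weights the quadratic form should never go negative.
rng = random.Random(0)
values = [quad_form(K, [rng.uniform(-1, 1) for _ in range(3)]) for _ in range(1000)]
print(min(values) >= -1e-12)
```

Sampling random vectors only spot-checks the property; it does not prove it. The guarantee comes from the argument on the slide: a sum of non-negative multiples of non-negative quadratic forms is non-negative.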

Page 5: Kernel based data fusion

Support Vector Machine

slack variables

squared norm of the weight vector

penalty term

Hyperplane
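
For reference, a sketch of the textbook 1-norm soft-margin SVM that these labels describe. The symbols ($w$, $b$, $\xi_i$, $C$, labels $y_i \in \{-1, +1\}$) follow the common convention and are an assumption; the paper's own scaling of the objective may differ.

$$
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\,(w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
$$

Here $\|w\|^2$ is the squared norm of the weight vector, $C \sum_i \xi_i$ is the penalty term, the $\xi_i$ are the slack variables, and $w^T x + b = 0$ is the separating hyperplane.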

Page 6: Kernel based data fusion

Dual form

Lagrange multipliers

quadratic, convex

Maximization instead of minimization

Equality constraints

Lagrange multipliers instead of w, b, ξ

Quadratic program (QP)

positive definite

scalar 0
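
As a sketch, the textbook dual of the 1-norm soft-margin SVM reads as follows. The multipliers $\alpha_i$ and the kernel notation $K(x_i, x_j)$ are the usual convention, assumed here rather than copied from the slide.

$$
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C
$$

The objective is quadratic in $\alpha$ and concave whenever the kernel matrix is positive semidefinite, because the quadratic term $\alpha^T \mathrm{diag}(y)\, K\, \mathrm{diag}(y)\, \alpha$ is then a non-negative scalar. The primal variables $w$, $b$, $\xi$ no longer appear.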

Page 7: Kernel based data fusion

Inserting linear combination

Combined kernel must be within the cone of positive semidefinite matrices

Fixed trace, avoids trivial solution

ugly
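
A hedged sketch of what inserting the linear combination into the SVM dual looks like, written in textbook scaling (the paper's constants may differ). The notation $G(K) = \mathrm{diag}(y)\, K\, \mathrm{diag}(y)$, the all-ones vector $e$, and the trace constant $c$ are assumptions introduced for this illustration.

$$
\min_{\mu}\ \max_{\alpha}\ e^T \alpha - \tfrac{1}{2}\, \alpha^T G\Big(\sum_i \mu_i K_i\Big)\, \alpha
\quad \text{s.t.} \quad \alpha^T y = 0,\ \ 0 \le \alpha \le C,\ \ \sum_i \mu_i K_i \succeq 0,\ \ \mathrm{trace}\Big(\sum_i \mu_i K_i\Big) = c
$$

The inner maximization is the SVM dual for a fixed kernel; the outer minimization picks the weights. Without the trace constraint the kernel could be scaled freely, which is the trivial solution the slide refers to.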

Page 8: Kernel based data fusion

Cone and other stuff

http://www.convexoptimization.com/dattorro/positive_semidefinate_cone.html

The set of all symmetric positive semidefinite matrices of particular dimension is called the positive semidefinite cone.

Positive semidefinite: $x^T A x \ge 0 \ \ \forall x$

Positive semidefinite cone: $\{A : A = A^T,\ x^T A x \ge 0 \ \ \forall x\}$
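
A small Python illustration of why this set is a cone: non-negative combinations of positive semidefinite matrices stay positive semidefinite. The 2x2 test (trace and determinant both non-negative) and the example matrices are chosen for this sketch only; larger matrices would need an eigenvalue or Cholesky check instead.

```python
def is_psd_2x2(A, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are both >= 0."""
    (a, b), (c, d) = A
    assert abs(b - c) <= tol, "matrix must be symmetric"
    return a + d >= -tol and a * d - b * c >= -tol

def scale(A, s):
    """Multiply every entry of A by the scalar s."""
    return [[s * v for v in row] for row in A]

def add(A, B):
    """Entry-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2.0, 1.0], [1.0, 1.0]]    # PSD: trace 3, det 1
B = [[1.0, -1.0], [-1.0, 1.0]]  # PSD: trace 2, det 0
N = [[1.0, 2.0], [2.0, 1.0]]    # not PSD: det -3

print(is_psd_2x2(A), is_psd_2x2(B), is_psd_2x2(N))    # True True False
print(is_psd_2x2(add(scale(A, 3.0), scale(B, 0.5))))  # True
```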

Page 9: Kernel based data fusion

Semidefinite program (SDP)

positive semidefinite constraints

Fixed trace, avoids trivial solution
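
For orientation, the generic form of a semidefinite program, as a sketch. The names $u$, $c$, $F_i$, $A$, $b$ are generic placeholders and not the specific matrices of the kernel-learning problem.

$$
\min_{u}\ c^T u
\quad \text{s.t.} \quad F_0 + \sum_{i} u_i F_i \succeq 0, \quad A u = b
$$

with symmetric matrices $F_0, F_1, \dots$. The objective is linear and the constraint $F(u) \succeq 0$ is a linear matrix inequality. In the kernel setting, requiring $\sum_i \mu_i K_i \succeq 0$ is such an inequality in the weights $\mu$, and fixing $\mathrm{trace}(\sum_i \mu_i K_i)$ is a linear equality in $\mu$.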

Page 10: Kernel based data fusion

Dual form

Quadratically constrained quadratic program (QCQP)

QCQPs can be solved more efficiently than SDPs ($O(n^3)$ vs. $O(n^{4.5})$)

Interior point methods

quadratic constraint
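
A sketch of how a QCQP can arise, under assumptions that are not stated on the slide: the weights are restricted to $\mu_i \ge 0$, every $K_i$ is positive semidefinite with $r_i = \mathrm{trace}(K_i) > 0$, the total trace is fixed to $c$, and the SVM dual is written in textbook scaling with $G(K) = \mathrm{diag}(y)\, K\, \mathrm{diag}(y)$. With $\mu \ge 0$ the combined kernel is automatically positive semidefinite, so the matrix constraint drops out:

$$
\min_{\mu \ge 0,\ \sum_i \mu_i r_i = c}\ \max_{\alpha}\ e^T \alpha - \tfrac{1}{2} \sum_i \mu_i\, \alpha^T G(K_i)\, \alpha
$$

Swapping minimum and maximum, and noting that a linear function of $\mu$ over this set is maximized by putting all weight on a single kernel, gives

$$
\max_{\alpha}\ e^T \alpha - \tfrac{c}{2} \max_i \tfrac{1}{r_i}\, \alpha^T G(K_i)\, \alpha
\;=\;
\max_{\alpha,\, t}\ e^T \alpha - \tfrac{c}{2}\, t
\quad \text{s.t.} \quad t \ge \tfrac{1}{r_i}\, \alpha^T G(K_i)\, \alpha \ \ \forall i, \quad \alpha^T y = 0, \quad 0 \le \alpha \le C
$$

The objective is linear in $(\alpha, t)$ and there is one quadratic constraint per kernel.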

Page 11: Kernel based data fusion

Interior point algorithm

Linear program:

maximize $c^T x$

subject to $Ax \le b$, $x \ge 0$

Classical Simplex method follows edges of polyhedron

Interior point methods walk through the interior of the feasible region
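
To make "walking through the interior" concrete, here is a minimal log-barrier sketch for a two-variable linear program in plain Python. It is one simple member of the interior point family, not the algorithm used by any particular solver; the function name `barrier_lp_2d`, the toy problem, and all tolerances are assumptions made for illustration. The optimum of the toy problem is the vertex $(1.6, 1.2)$.

```python
import math

def barrier_lp_2d(c, A, b, x, t=1.0, growth=10.0, gap_tol=1e-6):
    """Maximize c.x subject to A x <= b for two variables with a log-barrier method.

    A and b must already contain the rows for x >= 0 (written as -x <= 0),
    and the starting point x must be strictly feasible.
    """
    m = len(A)

    def slacks(p):
        return [b[i] - A[i][0] * p[0] - A[i][1] * p[1] for i in range(m)]

    def phi(p, t):
        # Barrier objective: -t * c.x - sum(log(slack)); infinite outside the interior.
        s = slacks(p)
        if min(s) <= 0.0:
            return math.inf
        return -t * (c[0] * p[0] + c[1] * p[1]) - sum(math.log(v) for v in s)

    while m / t > gap_tol:        # m / t bounds the distance to the optimal value
        for _ in range(100):      # Newton steps toward the central point for this t
            s = slacks(x)
            g = [-t * c[k] + sum(A[i][k] / s[i] for i in range(m)) for k in range(2)]
            h00 = sum(A[i][0] * A[i][0] / s[i] ** 2 for i in range(m))
            h01 = sum(A[i][0] * A[i][1] / s[i] ** 2 for i in range(m))
            h11 = sum(A[i][1] * A[i][1] / s[i] ** 2 for i in range(m))
            det = h00 * h11 - h01 * h01
            # Newton direction dx = -H^{-1} g, using the closed-form 2x2 inverse.
            dx = [-(h11 * g[0] - h01 * g[1]) / det, -(h00 * g[1] - h01 * g[0]) / det]
            decrement = -(g[0] * dx[0] + g[1] * dx[1])  # squared Newton decrement
            if decrement < 1e-10:
                break
            step, f0 = 1.0, phi(x, t)
            # Backtracking keeps the iterate strictly inside the feasible region.
            while step > 1e-12 and (
                phi([x[0] + step * dx[0], x[1] + step * dx[1]], t)
                > f0 - 0.25 * step * decrement
            ):
                step *= 0.5
            if step <= 1e-12:     # line search stalled; move on to the next t
                break
            x = [x[0] + step * dx[0], x[1] + step * dx[1]]
        t *= growth
    return x

# maximize x1 + x2  s.t.  x1 + 2*x2 <= 4,  3*x1 + x2 <= 6,  x >= 0
A = [[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [4.0, 6.0, 0.0, 0.0]
x_opt = barrier_lp_2d([1.0, 1.0], A, b, x=[0.5, 0.5])
print([round(v, 3) for v in x_opt])
```

Every iterate keeps all slacks strictly positive, so the path never touches the boundary of the polyhedron; it only approaches the optimal vertex as `t` grows.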

Page 12: Kernel based data fusion

Application

Recognition of ribosomal and membrane proteins in yeast

3 types of data:
  Amino acid sequences
  Protein-protein interactions
  mRNA expression profiles

7 kernels:
  Empirical kernel map -> sequence homology: BLAST (B), Smith-Waterman (SW), Pfam
  FFT -> sequence hydropathy: KD hydropathy profiles, padding, low-pass filter, FFT, RBF
  Interaction kernel (LI) -> PPI
  Diffusion (D) -> PPI
  RBF (E) -> gene expression
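
A minimal sketch of the empirical kernel map idea in Python: each item is represented by its vector of similarity scores against a reference set, and the kernel is the inner product of those vectors. The score values below are made up, and the paper's actual preprocessing of BLAST or Smith-Waterman scores is not reproduced here.

```python
def empirical_kernel(scores):
    """Empirical kernel map: represent item i by its row of scores against all
    reference items, then take inner products of those rows (K = S S^T)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in scores] for ri in scores]

# Hypothetical pairwise similarity scores for three sequences; the raw score
# matrix need not be symmetric or positive semidefinite.
scores = [
    [5.0, 1.0, 0.2],
    [0.8, 4.0, 0.1],
    [0.3, 0.0, 6.0],
]
K_seq = empirical_kernel(scores)
for row in K_seq:
    print([round(v, 2) for v in row])
```

Because `K_seq` is built as a matrix of inner products, it is symmetric and positive semidefinite by construction, which is what makes an arbitrary similarity score usable as a kernel.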

Page 13: Kernel based data fusion

Results

Combination of kernels performs better than individual kernels

Gene expression (E) most important for ribosomal protein recognition

PPI (D) most important for membrane protein recognition

Page 14: Kernel based data fusion

Results

Small improvement compared to weights = 1

SDP robust in the presence of noise

How does SDP perform versus kernel weights derived from the accuracy of individual SVMs?

Membrane protein recognition

Other methods use sequence information only

TMHMM designed for topology prediction

TMHMM not trained on yeast only

Page 15: Kernel based data fusion

Why is this cool?

Everything you ever dreamed of:

Optimization of C included (2-norm soft margin SVM, $\tau = 1/C$)

Hyperkernels (optimize the kernel itself)

Transduction (learn from labeled & unlabeled samples in polynomial time)

SDP has many applications (graph theory, combinatorial optimization, …)

Page 16: Kernel based data fusion

Literature

Learning the kernel matrix with semidefinite programming. G.R.G. Lanckriet et al., 2004

Kernel-based data fusion and its application to protein function prediction in yeast. G.R.G. Lanckriet et al., 2004

Machine learning using hyperkernels. C.S. Ong, A.J. Smola, 2003

Semidefinite optimization. M.J. Todd, 2001

http://www-user.tu-chemnitz.de/~helmberg/semidef.html

Page 17: Kernel based data fusion

Software

SeDuMi (SDP)

Mosek (QCQP; Java, C++; commercial)

YALMIP (Matlab) …

http://www-user.tu-chemnitz.de/~helmberg/semidef.html

