Learning with Hypergraphs: Discovery of Higher-Order Interaction Patterns from High-Dimensional Data

Moscow State University, Faculty of Computational Mathematics and Cybernetics, Feb. 22, 2007, Moscow, Russia
Byoung-Tak Zhang
Biointelligence Laboratory, School of Computer Science and Engineering
Brain Science, Cognitive Science, Bioinformatics Programs, Seoul National University
Seoul 151-742, Korea
[email protected], http://bi.snu.ac.kr/
© 2007, SNU Biointelligence Lab, http://bi.snu.ac.kr/
Probabilistic Graphical Models (PGMs)

Represent the joint probability distribution over a set of random variables in graphical form. Two families: undirected PGMs and directed PGMs.
Generative: the probability distribution of some variables given values of other variables can be obtained by probabilistic inference.
By the chain rule:

P(A, B, C, D, E) = P(A) P(B|A) P(C|A, B) P(D|A, B, C) P(E|A, B, C, D)

Using the independencies encoded in the graph:

P(A, B, C, D, E) = P(A) P(B) P(C|A, B) P(D|B) P(E|C)

[Figure: directed graph with edges A → C, B → C, B → D, C → E]

• C and D are independent given B.
• C asserts dependency between A and B.
• B and E are independent given C.
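The factored joint can be checked numerically; a minimal Python sketch, with made-up CPT values (all probabilities below are illustrative assumptions, not from the slides):

```python
import itertools

# P(C=1|A,B), P(D=1|B), P(E=1|C) as lookup tables; A and B are root nodes.
p_A, p_B = 0.3, 0.6
p_C = {(a, b): 0.9 if a and b else 0.2 for a in (0, 1) for b in (0, 1)}
p_D = {b: 0.5 if b else 0.1 for b in (0, 1)}
p_E = {c: 0.8 if c else 0.3 for c in (0, 1)}

def bern(p1, v):
    """Probability of value v under a Bernoulli with P(1) = p1."""
    return p1 if v == 1 else 1.0 - p1

def joint(a, b, c, d, e):
    # P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|B) P(E|C)
    return (bern(p_A, a) * bern(p_B, b) * bern(p_C[(a, b)], c)
            * bern(p_D[b], d) * bern(p_E[c], e))

# Since each local distribution is normalized, the joint sums to 1
# over all 2^5 assignments.
total = sum(joint(*bits) for bits in itertools.product((0, 1), repeat=5))
```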
Kinds of Graphical Models

Graphical Models
- Undirected: Boltzmann Machines, Markov Random Fields
- Directed: Bayesian Networks, Latent Variable Models, Hidden Markov Models, Generative Topographic Mapping, Non-negative Matrix Factorization
Bayesian Networks

BN = (S, P) consists of a network structure S and a set of local probability distributions P:

p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid \mathbf{pa}_i)

• Structure can be found by relying on prior knowledge of causal relationships.

<BN for detecting credit card fraud>
From Bayes Nets to High-Order PGMs

(1) Naïve Bayes:

P(F \mid J, G, S, A) \propto P(J, G, S, A \mid F)\, P(F)

P(J, G, S, A \mid F) = P(J|F)\, P(G|F)\, P(S|F)\, P(A|F) = \prod_{x \in \{J, G, S, A\}} P(x \mid F)

(2) Bayesian Net:

P(F, J, G, S, A) = P(G|F)\, P(J|F, A, S) \cdots = \prod_{x \in \{F, J, G, S, A\}} P(x \mid pa(x))

(3) High-Order PGM:

P(F, J, G, S, A) = P(J, G|F)\, P(J, S|F)\, P(J, A|F)\, P(G, S|F)\, P(G, A|F)\, P(S, A|F) = \prod_{(x, y)} P(he(x, y) \mid F)

where he(x, y) \in \{(x, y) \mid x, y \in \{J, G, S, A\} \text{ and } x \neq y\}.
The Hypernetworks
Hypergraphs

A hypergraph is an (undirected) graph G whose edges connect a non-null number of vertices, i.e. G = (V, E), where V = {v1, v2, ..., vn}, E = {E1, E2, ..., En}, and Ei = {vi1, vi2, ..., vim}.

An m-hypergraph consists of a set V of vertices and a subset E of V[m], i.e. G = (V, V[m]), where V[m] is a set of subsets of V whose elements have precisely m members.

A hypergraph G is said to be k-uniform if every edge Ei in E has cardinality k.

A hypergraph G is k-regular if every vertex has degree k.

Rem.: An ordinary graph is a 2-uniform hypergraph.
An Example Hypergraph

G = (V, E)
V = {v1, v2, v3, ..., v7}
E = {E1, E2, E3, E4, E5}

E1 = {v1, v3, v4}
E2 = {v1, v4}
E3 = {v2, v3, v6}
E4 = {v3, v4, v6, v7}
E5 = {v4, v5, v7}

[Figure: the hypergraph drawn over vertices v1-v7]
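The example hypergraph is small enough to manipulate directly; a minimal Python sketch (the set-based representation is an illustrative choice, not prescribed by the slides):

```python
# The example hypergraph G = (V, E) as plain Python sets.
V = {f"v{i}" for i in range(1, 8)}
E = {
    "E1": {"v1", "v3", "v4"},
    "E2": {"v1", "v4"},
    "E3": {"v2", "v3", "v6"},
    "E4": {"v3", "v4", "v6", "v7"},
    "E5": {"v4", "v5", "v7"},
}

def degree(v):
    """Number of hyperedges containing vertex v."""
    return sum(v in edge for edge in E.values())

def is_k_uniform(k):
    """True iff every hyperedge has cardinality k."""
    return all(len(edge) == k for edge in E.values())
```

For instance, v4 lies in E1, E2, E4, and E5, so its degree is 4; this hypergraph is not k-uniform for any k because edge cardinalities range from 2 to 4.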
Hypernetworks

A hypernetwork is a hypergraph of weighted edges. It is defined as a triple H = (V, E, W), where V = {v1, v2, ..., vn}, E = {E1, E2, ..., En}, and W = {w1, w2, ..., wn}.

An m-hypernetwork consists of a set V of vertices and a subset E of V[m], i.e. H = (V, V[m], W), where V[m] is a set of subsets of V whose elements have precisely m members and W is the set of weights associated with the hyperedges.

A hypernetwork H is said to be k-uniform if every edge Ei in E has cardinality k.

A hypernetwork H is k-regular if every vertex has degree k.

Rem.: An ordinary graph is a 2-uniform hypergraph with w_i = 1.

[Zhang, DNA-2006]
A Hypernetwork

[Figure: a hypernetwork over vertices x1-x15]
Learning with Hypernetworks
The Hypernetwork Model of Learning

The hypernetwork is defined as

H = (X, S, W)

where X = (x_1, x_2, \ldots, x_n), S = S^{(2)} \cup S^{(3)} \cup \cdots \cup S^{(K)} is the set of hyperedges (S^{(k)} those of cardinality k), and W = \{W^{(2)}, W^{(3)}, \ldots, W^{(K)}\} are the weights.

Training set: D = \{\mathbf{x}^{(n)}\}_{n=1}^{N}.

The energy of the hypernetwork:

E(\mathbf{x}^{(n)}; W) = -\left( \frac{1}{2} \sum_{i_1, i_2} w^{(2)}_{i_1 i_2} x^{(n)}_{i_1} x^{(n)}_{i_2} + \frac{1}{6} \sum_{i_1, i_2, i_3} w^{(3)}_{i_1 i_2 i_3} x^{(n)}_{i_1} x^{(n)}_{i_2} x^{(n)}_{i_3} + \cdots \right)

The probability distribution:

P(\mathbf{x}^{(n)} \mid W) = \frac{1}{Z(W)} \exp\left[ -E(\mathbf{x}^{(n)}; W) \right] = \frac{1}{Z(W)} \exp\left( \sum_{k=2}^{K} c(k) \sum_{i_1, \ldots, i_k} w^{(k)}_{i_1 i_2 \ldots i_k} x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_k} \right)

where c(k) is the order-dependent coefficient (c(2) = 1/2, c(3) = 1/6) and the partition function is

Z(W) = \sum_{\mathbf{x}} \exp\left( \sum_{k=2}^{K} c(k) \sum_{i_1, \ldots, i_k} w^{(k)}_{i_1 \ldots i_k} x_{i_1} \cdots x_{i_k} \right)

[Zhang, 2006]
Deriving the Learning Rule

\ln P(\{\mathbf{x}^{(n)}\} \mid W) = \ln \prod_{n=1}^{N} P(\mathbf{x}^{(n)} \mid W^{(2)}, W^{(3)}, \ldots, W^{(K)})

= \sum_{n=1}^{N} \left[ \sum_{k=2}^{K} c(k) \sum_{i_1, \ldots, i_k} w^{(k)}_{i_1 i_2 \ldots i_k} x^{(n)}_{i_1} \cdots x^{(n)}_{i_k} - \ln Z(W) \right]

The learning rule follows from the gradient

\frac{\partial}{\partial w^{(k)}_{i_1 i_2 \ldots i_k}} \ln P(\{\mathbf{x}^{(n)}\} \mid W)
Derivation of the Learning Rule

\frac{\partial}{\partial w^{(k)}_{i_1 \ldots i_k}} \ln P(\{\mathbf{x}^{(n)}\} \mid W)
= \frac{\partial}{\partial w^{(k)}_{i_1 \ldots i_k}} \sum_{n=1}^{N} \left[ \sum_{k'=2}^{K} c(k') \sum_{i_1, \ldots, i_{k'}} w^{(k')}_{i_1 \ldots i_{k'}} x^{(n)}_{i_1} \cdots x^{(n)}_{i_{k'}} - \ln Z(W) \right]
= \sum_{n=1}^{N} \left[ x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_k} - \sum_{\mathbf{x}} x_{i_1} x_{i_2} \cdots x_{i_k} P(\mathbf{x} \mid W) \right]
= N \left( \langle x_{i_1} x_{i_2} \cdots x_{i_k} \rangle_{Data} - \langle x_{i_1} x_{i_2} \cdots x_{i_k} \rangle_{P(\mathbf{x}|W)} \right)

(the coefficients c(k) absorb the combinatorial multiplicity of the symmetric sums), where

\langle x_{i_1} \cdots x_{i_k} \rangle_{Data} = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}_{i_1} \cdots x^{(n)}_{i_k}

\langle x_{i_1} \cdots x_{i_k} \rangle_{P(\mathbf{x}|W)} = \sum_{\mathbf{x}} x_{i_1} \cdots x_{i_k} P(\mathbf{x} \mid W)
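A toy numerical check of this data-minus-model gradient, computing the model expectation by brute-force enumeration (the variables, data, and zero weights are made up for illustration; enumeration is feasible only for tiny n):

```python
import itertools
import math

def energy(x, weights):
    """weights maps an index tuple (i1,..,ik) -> w; E = -sum w * prod x_i."""
    return -sum(w * math.prod(x[i] for i in idx) for idx, w in weights.items())

def model_expectation(idx, weights, n_vars):
    """<prod_{i in idx} x_i> under P(x|W) = exp(-E)/Z, by enumeration."""
    states = list(itertools.product((0, 1), repeat=n_vars))
    ps = [math.exp(-energy(x, weights)) for x in states]
    Z = sum(ps)
    return sum(p * math.prod(x[i] for i in idx) for x, p in zip(states, ps)) / Z

def gradient(idx, weights, data):
    # sum_n x_{i1}..x_{ik}  -  N * <x_{i1}..x_{ik}>_{P(x|W)}
    data_term = sum(math.prod(x[i] for i in idx) for x in data)
    return data_term - len(data) * model_expectation(idx, weights, len(data[0]))

data = [(1, 1, 0), (1, 1, 1), (0, 1, 0)]      # three 3-bit examples
weights = {(0, 1): 0.0, (1, 2): 0.0}          # all-zero weights: P(x|W) is uniform
g = gradient((0, 1), weights, data)           # data term 2, model term 3 * 0.25
```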
Example: sampling hyperedges over three rounds (Rounds 1-3).

Four 15-bit training examples (active variables listed; all others 0):
- Example 1: x1=1, x4=1, x10=1, x12=1; y=1
- Example 2: x2=1, x3=1, x9=1, x14=1; y=0
- Example 3: x3=1, x6=1, x8=1, x13=1; y=1
- Example 4: x8=1, x11=1, x15=1; y=1

Cardinality-3 hyperedges sampled from each example and labeled with its class:
- From example 1: {x1, x4, x10 | y=1}, {x1, x4, x12 | y=1}, {x4, x10, x12 | y=1}
- From example 2: {x2, x3, x9 | y=0}, {x2, x3, x14 | y=0}, {x3, x9, x14 | y=0}
- From example 3: {x3, x6, x8 | y=1}, {x3, x6, x13 | y=1}, {x3, x8, x13 | y=1}
- From example 4: {x8, x11, x15 | y=1}
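The sampling step illustrated here can be sketched in Python (uniform sampling of k distinct active variables is an illustrative assumption):

```python
import random

def sample_hyperedges(example, label, k=3, n_samples=3, rng=random):
    """example: dict var_name -> {0,1}. Returns a list of (frozenset, label)
    pairs, each hyperedge drawn from the example's active variables."""
    active = [v for v, val in example.items() if val == 1]
    edges = []
    for _ in range(n_samples):
        picked = rng.sample(active, k)   # k distinct active variables
        edges.append((frozenset(picked), label))
    return edges

# Example 1 from the slide: x1, x4, x10, x12 active, class y = 1.
example1 = {f"x{i}": int(i in (1, 4, 10, 12)) for i in range(1, 16)}
edges = sample_hyperedges(example1, label=1)
```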
Molecular Self-Assembly of Hypernetworks

[Figure: hypernetwork representation over variables X1-X8, with hyperedges of the form (xi, xj, y), and its molecular encoding: a library of duplicated labeled elements such as (x1 x3 → Class), (x1 x2 x4 → Class), (x2 x3 → Class), (x1 x4 → Class), (x2 x3 x4 → Class), where copy numbers encode weights]
Encoding a Hypernetwork with DNA

a) Collection of (labeled) hyperedges:

z1 : (x1=0, x2=1, x3=0, y=1)
z2 : (x1=0, x2=0, x3=1, x4=0, x5=0, y=0)
z3 : (x2=1, x4=1, y=1)
z4 : (x2=1, x3=0, x4=1, y=0)

b) Library of DNA molecules corresponding to (a):

z1 : AAAACCAATTGGAAGGCCATGCGG
z2 : AAAACCAATTCCAAGGGGCCTTCCCCAACCATGCCC
z3 : AATTGGCCTTGGATGCGG
z4 : AATTGGAAGGCCCCTTGGATGCCC

where x1 = AAAA, x2 = AATT, x3 = AAGG, x4 = CCTT, x5 = CCAA, y = ATGC, and the values 0 and 1 are encoded by CC and GG.
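Aligning (a) with (b) gives the codeword scheme directly: each variable maps to a 4-mer, each binary value to a 2-mer. A minimal encoder sketch in Python:

```python
# Codewords read off from the slide: variable 4-mers and value 2-mers.
VAR = {"x1": "AAAA", "x2": "AATT", "x3": "AAGG",
       "x4": "CCTT", "x5": "CCAA", "y": "ATGC"}
VAL = {0: "CC", 1: "GG"}

def encode(hyperedge):
    """hyperedge: ordered list of (variable, value) pairs -> DNA string."""
    return "".join(VAR[var] + VAL[val] for var, val in hyperedge)

z1 = encode([("x1", 0), ("x2", 1), ("x3", 0), ("y", 1)])
z3 = encode([("x2", 1), ("x4", 1), ("y", 1)])
```

Both outputs reproduce the library strings in (b), which is how the codeword table was recovered.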
DNA Molecular Computing

- Self-assembly (heat / cool cycles, repeated)
- Polymer formation
- Self-replication
- Molecular recognition
- Nanostructures
Learning the Hypernetwork (by Molecular Evolution)

Library of combinatorial molecules + example:
1. Hybridize the library with the example.
2. Select the library elements matching the example.
3. Amplify the matched library elements by PCR.
4. Next generation.

[Zhang, DNA11]
Molecular Information Processing

[Video: MP4.avi]
The Theory of Bayesian Evolution

P_0(A_i) → P_1(A_i | D) → ... → P_g(A_i | D)
(generation 0 → generation g)

Evolution as a Bayesian inference process: evolutionary computation (EC) is viewed as an iterative process of generating individuals of ever higher posterior probability from the priors and the observed data.

[Zhang, CEC-99]
Evolutionary Learning Algorithm for Hypernetwork Classifiers

1. Let the hypernetwork H represent the current distribution P(X, Y).
2. Get a training example (x, y).
3. Classify x using H as follows:
   3.1 Extract all molecules matching x into M.
   3.2 From M, separate the molecules into classes:
       extract the molecules with label Y=0 into M0,
       extract the molecules with label Y=1 into M1.
   3.3 Compute y* = argmax_{Y∈{0,1}} |M_Y| / |M|.
4. Update H:
   If y* = y, then H_n ← H_{n-1} + {c(u, v)} for u ⊂ x and v = y, for (u, v) ∈ H_{n-1}.
   If y* ≠ y, then H_n ← H_{n-1} − {c(u, v)} for u ⊂ x and v ≠ y, for (u, v) ∈ H_{n-1}.
5. Go to step 2 if not terminated.
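The classification step (3.1-3.3) can be sketched in Python; representing molecules as (dict, label) pairs, with duplicates standing in for molecule counts, is an illustrative choice:

```python
def matches(hyperedge, x):
    """hyperedge: dict var -> value; x: dict var -> value (full assignment)."""
    return all(x[v] == val for v, val in hyperedge.items())

def classify(library, x):
    """library: list of (hyperedge, y) molecules; duplicates encode counts.
    Returns y* = argmax_Y |M_Y| / |M|, or None when nothing matches."""
    M = [(h, y) for h, y in library if matches(h, x)]      # step 3.1
    if not M:
        return None
    counts = {0: 0, 1: 0}                                   # step 3.2
    for _, y in M:
        counts[y] += 1
    return max(counts, key=counts.get)                      # step 3.3

library = [({"x1": 1, "x2": 0}, 1), ({"x1": 1, "x3": 1}, 1), ({"x2": 1}, 0)]
y_star = classify(library, {"x1": 1, "x2": 0, "x3": 1})
```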
Learning with Hypergraphs: Application Results
Biological Applications

DNA-Based Molecular Diagnosis / MicroRNA-Based Diagnosis / Aptamer-Based Diagnosis
DNA-Based Diagnosis

Gene expression data: 120 samples from 60 leukemia patients [Cheok et al., Nature Genetics, 2003].
Training hypernetworks with 6-fold validation.
Class: ALL/AML.
Learning Curve

Fitness evolution of the population of hyperedges.
Order Effects on Learning

Fitness curves for runs with fixed-cardinality hyperedges (card = 1, 4, 7, 10).
Aptamer-Based Cardiovascular Disease Diagnosis
Training Data

▷ Disease: Cardiovascular Disease (CVD)
▷ Classes: 4 classes [Normal / 1st / 2nd / 3rd stage]
▷ Number of samples: 135 [N: 40 / 1st: 38 / 2nd: 19 / 3rd: 18]
▷ Preprocessing: 3K aptamer array → converted to real values (3K real-valued data) → feature selection using gain ratio (150 real-valued features) → binarization using MDL (150 Boolean features)
▷ Simulation parameters: 1) order: 2-70; 2) sampling rate: 50; 3) each setting repeated 10 times and averaged
▷ Classification: majority voting with the sum of library element weights
▷ Training / test size: training 108 (80%) / test 27 (20%)
Learning & Classification by Hypernetworks

Pipeline: source data → binarization → training data / test data; a library is sampled from the training data; library weights are updated in a learning loop (evolution stage, adjusting the learning rate); the weighted library is then used on the test data.

[Figure: binarized training and test examples over X0-X149 with class C; sampled library elements, each starting at weight W = 1000 and ending with updated weights W' after learning; accuracy curves over 1,000 epochs]

Weight update rule (learning): error correction. When all index-values of a library element match the example:
If the class is correct, w ← w × 1.0001,
else w ← w × 0.95.
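The multiplicative update rule stated above, as a short sketch:

```python
def update_weight(w, matched, correct):
    """Error-correction update: applied only on a full index-value match.
    Correct class: small reward (x1.0001); wrong class: penalty (x0.95)."""
    if not matched:
        return w
    return w * 1.0001 if correct else w * 0.95

w = 1000.0
w = update_weight(w, matched=True, correct=False)   # wrong class -> 950.0
```

The asymmetry (tiny reward, large penalty) keeps frequently wrong library elements decaying quickly while barely inflating the rest.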
Simulation Result (1/3)

▷ Training & test accuracy as learning goes on (order k=12)

[Figure: accuracy (75-100%) vs. epoch (0-500) for training and test sets]
Simulation Result (2/3)

▷ Accuracy on test data as learning goes on

[Figure: accuracy (64-84%) vs. epoch (0-200), one curve per order (2, 4, 8, 12, 16, 20, 30, 40, 50, 60, 70)]
Simulation Result (3/3)

▷ The effect of learning

[Figure: accuracy (64-84%) vs. order (0-80), comparing learning against sampling only]
Mining Cancer-Related MicroRNA Modules from miRNA Expression Profiles
Gene Regulation by microRNAs

MicroRNAs (miRNAs) are endogenous RNAs of about 22 nt that can play important regulatory roles in animals, plants, and viruses.
Post-transcriptional gene regulation: binding target genes for degradation or translational repression.
Recently, miRNAs have been reported to be related to cancer development and progression.
Dataset

The miRNA expression microarray data: expression profiles of miRNA in human across 11 tumor types (bladder, breast, colon, kidney, lung, pancreas, prostate, uterus, melanoma, mesothelioma, and ovary tissue) (Lu et al., 2005).
This dataset consists of an expression matrix of 151 miRNAs (rows) and 89 samples (columns).
Tissue type    Cancer  Normal
Bladder        1       6
Breast         3       6
Colon          4       7
Kidney         3       4
Lung           2       5
Pancreas       1       8
Prostate       6       6
Uterus         1       10
Melanoma       0       3
Mesothelioma   0       8
Ovary          0       5
All tissues    21      68
Representing a Hypernetwork from miRNA Expression Data

Data items: 89 samples, each a binary vector over 151 miRNAs with a class label, e.g.
sample 1: x1=1, x2=0, x3=1, x4=1, x5=0, x6=1, ..., x151=0, class = cancer
sample 2: x1=0, x2=0, x3=0, x4=1, x5=0, x6=0, ..., x151=1, class = normal
...
sample 89: x1=1, x2=0, x3=0, x4=1, x5=0, x6=1, ..., x151=1, class = cancer

Library (normal or cancer classification rules): pairs of miRNA variables sampled from each data item and labeled with its class.

A hypernetwork H = (X, E, W) of DNA molecules.
Performance

Leave-one-out cross-validation

Algorithms                              Correct classification rate
Bayesian Network                        79.77 %
Naïve Bayes                             83.15 %
ID3                                     88.76 %
Hypernetworks                           90.00 %
Sequential Minimal Optimization (SMO)   91.01 %
Multi-layer perceptron (MLP)            92.13 %
Accuracy vs. Order for Test Data (sampling only)

[Figure: classification ratio (0.2-1.0) vs. order (20-140)]
Learning Curves for Training Data

[Figure: classification ratio (0.8-1.0) vs. epoch (0-60), one curve per order (2-7)]
miRNA Data Mining

miRNA modules related to cancer:

Weight        miRNA module (a, b)
7919.249184   hsa-miR-215 (1), hsa-miR-7 (1)
6787.927872   hsa-miR-194 (1), hsa-miR-30d (0)
6787.927872   hsa-miR-214 (1), hsa-miR-30e (0)
6084.600896   hsa-miR-21 (1), hsa-miR-321 (1)
5656.60656    hsa-miR-142-3p (1), hsa-miR-34b (0)
5656.60656    hsa-miR-142-3p (1), hsa-miR-96 (0)
5656.60656    hsa-miR-126 (1), hsa-miR-30c (0)
5324.025784   hsa-miR-26b (1), hsa-miR-29b (1)
5324.025784   hsa-let-7f (1), hsa-miR-9* (1)
5324.025784   hsa-miR-224 (1), hsa-miR-301 (0)

miRNAs related to cancer:

miRNAs           Weight
hsa-miR-155      295972.7
hsa-miR-105      283034.8
hsa-miR-223      280371.4
hsa-miR-21       277609.9
hsa-let-7c       270764.7
hsa-miR-142-3p   266700.1
hsa-miR-29b      263159
hsa-miR-224      260877.3
hsa-miR-183      260877.3
hsa-miR-184      260116.7
hsa-let-7a       256313.8
Non-Biological Applications

Digit Recognition / Face Classification / Text Classification / Movie Title Prediction
Digit Recognition: Dataset

Original data: handwritten digits (0-9). Training data: 2,630 (263 examples per class). Test data: 1,130 (113 examples per class).
Preprocessing: each example is an 8x8 binary matrix; each pixel is 0 or 1.
Pattern Classification

A "layered" hypernetwork: an input layer x1, ..., xn; a hidden layer of weighted hyperedges (x1x2, x1x3, ..., x1...xn with weights w1, w2, ..., wm); and an output layer producing the Class. The weight of a hyperedge is realized by its number of copies in a probabilistic library of DNA molecules, i.e. the library holds duplicated elements such as (x1 → Class), (x2 → Class), (x1 x2 → Class), (x1 x3 → Class), ..., (x1 ... xn → Class).

[Figure: the layered hypernetwork and the corresponding probabilistic library (DNA representation)]
Simulation Results – without Error Correction

|Train set| = 3,760, |Test set| = 1,797.
Performance Comparison
Methods Accuracy
MLP with 37 hidden nodes 0.941
MLP with no hidden nodes 0.901
SVM with polynomial kernel 0.926
SVM with RBF kernel 0.934
Decision Tree 0.859
Naïve Bayes 0.885
kNN (k=1) 0.936
kNN (k=3) 0.951
Hypernet with learning (k = 10) 0.923
Hypernet with sampling (k = 33) 0.949
Error Correction Algorithm

1. Initialize the library as before.
2. maxChangeCnt := librarySize.
3. For i := 0 to iteration_limit:
   1. trainCorrectCnt := 0.
   2. Run classification for all training patterns. For each correctly classified pattern, increase trainCorrectCnt.
   3. For each library element:
      1. Initialize its fitness value to 0.
      2. For each misclassified training pattern, if the library element matches that pattern:
         1. If it classified the pattern correctly, its fitness gains 2 points.
         2. Else it loses 1 point.
   4. changeCnt := max{ librarySize * (1.5 * (trainSetSize - trainCorrectCnt) / trainSetSize + 0.01), maxChangeCnt * 0.9 }.
   5. maxChangeCnt := changeCnt.
   6. Delete the changeCnt library elements of lowest fitness and resample library elements whose classes are those of the deleted ones.
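Step 4's replacement count can be sketched directly from the pseudocode (the names follow the slide; the example numbers are made up):

```python
def change_count(library_size, train_set_size, train_correct_cnt, max_change_cnt):
    """How many lowest-fitness library elements to replace this iteration:
    an error-proportional term, floored by 0.9x the previous iteration's cap."""
    error_term = library_size * (1.5 * (train_set_size - train_correct_cnt)
                                 / train_set_size + 0.01)
    return max(error_term, max_change_cnt * 0.9)

# Example: 1000 elements, 180 of 200 training patterns correct, previous cap 400.
# error_term = 1000 * (1.5 * 20/200 + 0.01) = 160; max(160, 360) = 360.
cnt = change_count(1000, 200, 180, 400)
```

The max with 0.9 * maxChangeCnt makes the replacement count decay gradually instead of collapsing as soon as training accuracy improves.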
Simulation Results – with Error Correction

iterationLimit = 37, librarySize = 382,300.
[Figures: classification ratio vs. iteration (0-35), one curve per order (6, 10, 14, 18, 22, 26/27): training data (0.90-1.00) and test data (0.87-0.93)]
Performance Comparison

Algorithms                      Correct classification rate
Random Forest (f=10, t=50)      94.10 %
kNN (k=4)                       93.49 %
Hypernetwork (order=26)         92.99 %
AdaBoost (weak learner: J48)    91.93 %
SVM (Gaussian kernel, SMO)      91.37 %
MLP                             90.53 %
Naïve Bayes                     87.26 %
J48                             84.86 %
Face Classification Experiments
Face Data Set

Yale dataset: 15 people, 11 images per person, 165 images in total.
Training Images of a Person

10 images for training; the remaining 1 for test.
Bitmaps for Training Data (Dimensionality = 480)
Classification Rate by Leave-One-Out
Classification Rate (Dimensionality = 64 by PCA)
Text Classification Experiments
Text Classification

1. Documents
2. Bag-of-words representation
3. Term vectors (e.g. 1 0 0 0 2 0 0 1)
4. Binary term-document matrix over terms (baseball, specs, graphics, hockey, unix, space) and documents d1, d2, d3, ..., dn
5. DNA-encoded kernel functions: labeled hyperedges over term variables, e.g. (x1=0, x2=1, x3=1, y=1), (x1=0, x2=0, x3=1, y=0), (x2=1, x3=0, y=0), ...

[Figure: the document-to-DNA encoding pipeline]
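Steps 2-4 of the pipeline can be sketched in a few lines (the example documents and terms are made up):

```python
# Documents -> bag of words -> binary term vectors -> term-document matrix.
docs = ["the unix box renders graphics",
        "hockey and baseball scores",
        "unix space probe"]
vocab = sorted({w for d in docs for w in d.split()})

def binary_vector(doc):
    """1 if the term occurs in the document, 0 otherwise (presence, not counts)."""
    words = set(doc.split())
    return [int(t in words) for t in vocab]

matrix = [binary_vector(d) for d in docs]   # one row per document
```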
Text Classification

Data from Reuters-21578 ('ACQ' and 'EARN').
Learning curves: average over 10 runs.
Performance Comparison

'ACQ' data (4,724 documents); 'EARN' data (7,888 documents).

Higher-dimensional kernel functions can improve the performance further.
Learning from Movie Captions Experiments
Learning Hypernets from Movie Captions

Order: sequential, range 2-3.
Corpus: Friends, Prison Break, 24.
Learning Hypernets from Movie Captions

[Figure-only slides]
Classification: query generation
- "I intend to marry her" : I ? to marry her / I intend ? marry her / I intend to ? her / I intend to marry ?
Matching
- "I ? to marry her" → order 2: I intend, I am, intend to, ...; order 3: I intend to, intend to marry, ...
Count the number of max-perfect-matching hyperedges.
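The query-generation and matching scheme can be sketched as follows (scoring candidates by summed n-gram counts is an illustrative simplification of max-perfect-matching counting; the toy corpus is made up):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_library(corpus_sentences):
    """Hyperedges are word n-grams of order 2-3; counts act as weights."""
    lib = Counter()
    for s in corpus_sentences:
        toks = s.split()
        for n in (2, 3):
            lib.update(ngrams(toks, n))
    return lib

def complete(query_tokens, blank, library):
    """Fill the '?' at position `blank` with the word best supported by
    n-grams matching the query everywhere except the blank."""
    scores = Counter()
    for gram, cnt in library.items():
        n = len(gram)
        for start in range(max(0, blank - n + 1), blank + 1):
            if start + n > len(query_tokens):
                continue
            window = query_tokens[start:start + n]
            if all(g == w for g, w in zip(gram, window) if w != "?"):
                scores[gram[blank - start]] += cnt
    return scores.most_common(1)[0][0] if scores else None

lib = build_library(["i intend to marry her", "i am here", "you intend to go"])
word = complete(["i", "?", "to", "marry", "her"], 1, lib)   # -> "intend"
```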
Learning Hypernets from Movie Captions

Completion & Classification Examples

Query: who are you (Corpus: Friends, 24, Prison Break)
  ? are you   → what are you   → Friends
  who ? you   → who are you    → Friends
  who are ?   → who are you    → Friends

Query: you need to wear it (Corpus: 24, Prison Break, House)
  ? need to wear it    → i need to wear it     → 24
  you ? to wear it     → you want to wear it   → 24
  you need ? wear it   → you need to wear it   → 24
  you need to ? it     → you need to do it     → House
  you need to wear ?   → you need to wear a    → 24
Conclusion

Hypernetworks are a graphical model that employs higher-order nodes explicitly, allowing a more natural representation for learning higher-order graphical models.

We introduce an evolutionary learning algorithm that makes use of the high information density and massive parallelism of molecular computing to address the combinatorial explosion problem.

Applied to pattern recognition (and completion) problems in IT and BT; obtained performance competitive with conventional ML classifiers.

Why does this work?
- It exploits the huge population size available in DNA computing to build an ensemble machine, i.e. a hypernetwork, of simple random hyperedges.
- It is a new kind of evolutionary algorithm in which very simple "molecular" operators are applied to a "huge" population of individuals in a "massively parallel" way.

Another potential of hypernetworks is their application to biological problems where the data are given as "wet" DNA or RNA molecules.
Acknowledgements

Simulation experiments: Joo-Kyoung Kim, Sun Kim, Soo-Jin Kim, Jung-Woo Ha, Chan-Hoon Park, Ha-Young Jang
Collaborating labs:
- Biointelligence Laboratory, Seoul National University
- RNomics Lab, Seoul National University
- DigitalGenomics, Inc.
- GenoProt, Inc.
Supported by:
- National Research Lab Program of Min. of Sci. & Tech. (2002-2007)
- Next Generation Tech. Program of Min. of Ind. & Comm. (2000-2010)
More information at:
- http://bi.snu.ac.kr/MEC/
- http://cbit.snu.ac.kr/