
Machine Learning in Systems Biology, Sep 25th

Identification of overlapping biclusters using Probabilistic Relational Models

Tim Van den Bulcke, Hui Zhao, Kristof Engelen, Bart De Moor, Kathleen Marchal


Overview

• Biclustering and biology

• Probabilistic Relational Models

• ProBic biclustering model

• Algorithm

• Results

• Conclusion


Overview

• Biclustering and biology
– What is biclustering?
– Why biclustering?

• Probabilistic Relational Models

• ProBic biclustering model

• Algorithm

• Results

• Conclusion


Biclustering and biology

What is biclustering?

• Definition in the context of gene expression data:

[Figure: gene expression matrix with axes labelled genes and conditions]

A bicluster is a subset of genes that show a similar expression profile under a subset of conditions.
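The definition can be made concrete with a toy matrix. The sketch below is only an illustration: the values, the index choices and the helper name `extract_bicluster` are made up, and rows are assumed to be genes and columns conditions.

```python
# Toy expression matrix (made-up values): rows are genes, columns are conditions.
expression = [
    [0.1, 2.0, 2.1, -0.3],   # gene 0
    [1.2, 1.9, 2.0, 0.8],    # gene 1
    [-0.7, 0.2, -1.1, 0.4],  # gene 2
]

def extract_bicluster(matrix, gene_idx, condition_idx):
    """Return the submatrix restricted to the chosen genes and conditions."""
    return [[matrix[g][c] for c in condition_idx] for g in gene_idx]

# Genes 0 and 1 show similar levels under conditions 1 and 2 only.
bicluster = extract_bicluster(expression, gene_idx=[0, 1], condition_idx=[1, 2])
print(bicluster)
```

Neither the gene subset nor the condition subset has to be contiguous, which is what distinguishes a bicluster from a plain block of the matrix.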


Biclustering and biology

Why biclustering?*

• Only a small set of the genes participates in a cellular process.

• A cellular process is active only in a subset of the conditions.

• A single gene may participate in multiple pathways that may or may not be coactive under all conditions.

* From: Madeira et al. (2004) Biclustering Algorithms for Biological Data Analysis: A Survey


Probabilistic Relational Models (PRMs)

[Figure: example relational schema with the classes Patient, Treatment, Virus strain and Contact]

Image: free interpretation from Segal et al., Rich probabilistic models


Probabilistic Relational Models (PRMs)

• Traditional approaches “flatten” relational data
– Causes bias
– Centered around one view of the data
– Loses relational structure

• PRMs
– Extension of Bayesian networks
– Combine advantages of probabilistic reasoning with relational logic

[Figure: the Patient and Contact tables being flattened into a single table]


ProBic biclustering model: notation

• g: gene

• c: condition

• e: expression

• g.Bk: gene-bicluster assignment for gene g to bicluster k (unknown, 0 or 1)

• c.Bk: condition-bicluster assignment for condition c to bicluster k (unknown, 0 or 1)

• e.Level: expression level value (known, continuous value)


ProBic biclustering model

• Dataset instance

Gene
ID    B1           B2
g1    ? (0 or 1)   ? (0 or 1)
g2    ? (0 or 1)   ? (0 or 1)

Condition
ID    B1           B2
c1    ? (0 or 1)   ? (0 or 1)
c2    ? (0 or 1)   ? (0 or 1)

Expression
g.ID  c.ID  level
g1    c1    -2.4
g1    c2    (missing value)
g2    c1    1.6
g2    c2    0.5
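As a rough illustration, the dataset instance can be held in three small Python structures. The variable names are hypothetical, and `None` is used here only as a stand-in for an assignment that is still unknown.

```python
# Gene and Condition objects: bicluster assignments B1, B2 are unknown (None) at the start.
genes = {
    "g1": {"B1": None, "B2": None},
    "g2": {"B1": None, "B2": None},
}
conditions = {
    "c1": {"B1": None, "B2": None},
    "c2": {"B1": None, "B2": None},
}

# Expression objects: one row per measured (gene, condition) pair.
# The g1/c2 value is missing, so no row is stored for it.
expression_rows = [
    ("g1", "c1", -2.4),
    ("g2", "c1", 1.6),
    ("g2", "c2", 0.5),
]

observed_pairs = {(g, c) for g, c, _ in expression_rows}
print(("g1", "c2") in observed_pairs)
```

Storing only the measured pairs is one way to keep the relational view: every Expression row refers to exactly one Gene and one Condition by ID.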


ProBic biclustering model

• Relational schema and PRM model

Notation:

• g: gene

• c: condition

• e: expression

• g.Bk: gene-bicluster assignment for gene g to bicluster k (0 or 1, unknown)

• c.Bk: condition-bicluster assignment for condition c to bicluster k (0 or 1, unknown)

• e.Level: expression level value (continuous, known)

[Figure: relational schema with the classes Gene (attributes B1, B2), Condition (attributes ID, B1, B2) and Expression (attribute level)]

P(e.level | g.B1, g.B2, c.B1, c.B2, c.ID) = Normal( μ_{g.B,c.B,c.ID}, σ_{g.B,c.B,c.ID} )


ProBic biclustering model

[Figure: the database instance (the Gene, Condition and Expression tables, with the g1/c2 expression value missing) and the PRM model together define the ground Bayesian network. Its nodes are g1.B1, g1.B2, g2.B1, g2.B2, c1.B1, c1.B2, c2.B1, c2.B2, c1.ID, c2.ID, level1,1, level2,1 and level2,2.]

PRM model: P(e.level | g.B1, g.B2, c.B1, c.B2, c.ID) = Normal( μ_{g.B,c.B,c.ID}, σ_{g.B,c.B,c.ID} )


ProBic biclustering model

• The joint probability distribution is defined as a product over the nodes of the Bayesian network.

• ProBic posterior (~ likelihood x prior), with the following factors:
– Expression level conditional probabilities
– Prior on the expression level parameters, the (μ, σ)'s
– Prior on the condition-to-bicluster assignments
– Prior on the gene-to-bicluster assignments
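The formulas themselves did not survive in this transcript. Going only by the four factor labels, the posterior plausibly has the following shape; this is a sketch under assumptions, in particular that the assignment priors factorize over genes, conditions and biclusters k, and that E denotes the set of observed Expression objects.

```latex
P(G.B,\, C.B,\, \mu,\, \sigma \mid E) \;\propto\;
\underbrace{\prod_{e \in E} P\big(e.\mathrm{level} \mid g.B,\, c.B,\, c.ID\big)}_{\text{expression level conditional probabilities}}
\cdot
\underbrace{P(\mu, \sigma)}_{\text{expression level prior}}
\cdot
\underbrace{\prod_{c} \prod_{k} P(c.B_k)}_{\text{condition assignment prior}}
\cdot
\underbrace{\prod_{g} \prod_{k} P(g.B_k)}_{\text{gene assignment prior}}
```

Here g and c inside the first product denote the gene and condition that the Expression object e refers to.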


Algorithm: choices

• Different approaches possible

• Only approximate algorithms are tractable:
– MCMC methods (e.g. Gibbs sampling)
– Expectation-Maximization (soft or hard assignment)
– Variational approaches
– Simulated annealing, genetic algorithms, …

• We chose a hard-assignment Expectation-Maximization (E.-M.) algorithm:
– Natural decomposition of the model into E.-M. steps
– Efficient
– Extensible
– Relatively good convergence properties for this model


Algorithm: Expectation-Maximization

• Maximization step:
– Maximize the posterior w.r.t. the μ, σ values (model parameters), given the current gene-bicluster and condition-bicluster assignments (= the hidden variables)

• Expectation step:
– Maximize the posterior w.r.t. the gene-bicluster and condition-bicluster assignments, given the current model parameters
– Two-step approach:
  • Step 1: maximize the posterior w.r.t. C.B, given G.B and the μ, σ values
  • Step 2: maximize the posterior w.r.t. G.B, given C.B and the μ, σ values


Algorithm: Simple Example


Overview

• Biclustering and biology

• Probabilistic Relational Models

• ProBic biclustering model

• Algorithm

• Results
– Noise sensitivity
– Bicluster shape
– Overlap
– Missing values

• Conclusion


Results: noise sensitivity

• Setup:
– Simulated dataset: 500 genes x 200 conditions
– Background distribution: Normal(0,1)
– Bicluster distributions: Normal( rnd(N(0,1)), σ ), with varying σ
– Shapes: three 50x50 biclusters
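A dataset of this kind could be simulated roughly as follows. The slides do not say where the biclusters sit or whether the random mean is drawn per bicluster or per condition; this sketch assumes one mean per condition within each bicluster and places three non-overlapping 50x50 blocks along the diagonal. The function name `simulate` and the seed are hypothetical.

```python
import random

def simulate(n_genes=500, n_conds=200, blocks=(), sigma=0.2, seed=0):
    """Background N(0,1) matrix with implanted biclusters N(mu_c, sigma)."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_conds)] for _ in range(n_genes)]
    for gene_range, cond_range in blocks:
        for c in cond_range:
            mu = rng.gauss(0.0, 1.0)  # assumed: one random mean per condition
            for g in gene_range:
                data[g][c] = rng.gauss(mu, sigma)
    return data

# Assumed placement: three 50x50 blocks on the diagonal.
blocks = [(range(k, k + 50), range(k, k + 50)) for k in (0, 50, 100)]
data = simulate(blocks=blocks, sigma=0.2)
print(len(data), len(data[0]))
```

Varying `sigma` while keeping the blocks fixed gives a series of datasets of increasing noise level, in the spirit of the noise-sensitivity experiment.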


Results: noise sensitivity

[Figure: four panels plotting precision (genes), recall (genes), precision (conditions) and recall (conditions) against σ, with annotations A and B]

Precision = TP / (TP+FP)    Recall = TP / (TP+FN)
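The two measures can be computed for the genes (or conditions) of a recovered bicluster against the implanted one. A minimal sketch, with a hypothetical function name and the convention (an assumption) that an empty set yields 0.0:

```python
def precision_recall(predicted, actual):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN) for two collections of IDs."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # correctly recovered members
    fp = len(predicted - actual)   # reported but not in the true bicluster
    fn = len(actual - predicted)   # true members that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if actual else 0.0
    return precision, recall

# Example: 4 genes reported, 2 of them correct, true bicluster has 6 genes.
p, r = precision_recall({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
print(p, r)
```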


Results: bicluster shape independence

• Setup:
– Dataset: 500 genes x 200 conditions
– Background distribution: N(0,1)
– Bicluster distributions: N( rnd(N(0,1)), 0.2 )
– Shapes: 80x10, 10x80, 20x20


Results: bicluster shape independence


Results: Overlap examples

• Two biclusters (50 genes, 50 conditions)

• Overlap: 25 genes, 25 conditions

• Two biclusters (10 genes, 80 conditions)

• Overlap: 2 genes, 40 conditions


Results: Missing values

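• The ProBic model has no concept of ‘missing values’: a value that was not measured simply contributes no expression level node to the ground Bayesian network
– No prior missing-value estimation is needed, which could otherwise bias the result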
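One way to see why no imputation is needed: the likelihood is a product over the Expression objects that exist, so a missing measurement just contributes no term. The sketch below assumes, for brevity, parameters keyed by condition only; in the model they also depend on the bicluster assignments. All names are hypothetical.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def log_likelihood(rows, params):
    """Sum log-densities over the observed (gene, condition, level) rows only."""
    total = 0.0
    for _gene, cond, level in rows:
        mu, sigma = params[cond]
        total += normal_logpdf(level, mu, sigma)
    return total

# The g1/c2 measurement is missing, so it has no row and no term in the sum.
rows = [("g1", "c1", -2.4), ("g2", "c1", 1.6), ("g2", "c2", 0.5)]
params = {"c1": (0.0, 1.0), "c2": (0.0, 1.0)}  # assumed N(0,1) everywhere
ll = log_likelihood(rows, params)
print(ll)
```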


Results: Missing values – one example

• 500 genes x 200 conditions

• Noise std: bicluster 0.2, background 1.0

• Missing values: 70%

• One bicluster 50x50


Results: Missing values

[Figure: four panels plotting precision (genes), recall (genes), precision (conditions) and recall (conditions) against the percentage of missing values]


Conclusion

• Noise robustness

• Naturally deals with missing values

• Relatively independent of bicluster shape

• Simultaneous identification of multiple overlapping biclusters

• Can be used in a query-driven way

• Extensible


Acknowledgements

KULeuven:

• whole BioI group, ESAT-SCD

– Tim Van den Bulcke

– Thomas Dhollander

• whole CMPG group (Centre of Microbial and Plant Genetics)

– Kristof Engelen

– Kathleen Marchal

UGent:

• whole Bioinformatics & Evolutionary Genomics group

– Tom Michoel


Thank you for your attention!


Near future

• Automated definition of algorithm parameter settings

• Application to biological datasets
– Dataset normalization

• Extend model with different overlap models

• Model extension from biclusters to regulatory modules: include motif + ChIP-chip data

[Figure: extended relational schema with the classes Promoter, Gene, Expression and Array; labels S1 to S4, R1 to R3, M1, M2, P1 to P3 and level; example sequence TTCAATACAGG]


Results: Missing values

• ProBic model has no concept of ‘missing values’

[Figure: database instance, PRM model and ground Bayesian network. The g1/c2 expression value is missing, so the ground Bayesian network contains the nodes level1,1, level2,1 and level2,2 but no level1,2 node.]

PRM model: P(e.level | g.B1, g.B2, c.B1, c.B2, c.ID) = Normal( μ_{g.B,c.B,c.ID}, σ_{g.B,c.B,c.ID} )


Algorithm: example


Algorithm properties

• Speed:
– 500 genes, 200 conditions, 2 biclusters: 2 min.
– Scaling:
  • ~ #genes · #conditions · 2^#biclusters (worst case)
  • ~ #genes · #conditions · (#biclusters)^p (in practice), p = 1..3
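To get a feel for the gap between the two scaling laws, the expressions can be evaluated directly. The function names are hypothetical and the results are unit-less operation counts, not run times.

```python
def worst_case_cost(n_genes, n_conds, n_biclusters):
    """#genes * #conditions * 2^#biclusters."""
    return n_genes * n_conds * 2 ** n_biclusters

def practical_cost(n_genes, n_conds, n_biclusters, p):
    """#genes * #conditions * (#biclusters)^p, with p between 1 and 3."""
    return n_genes * n_conds * n_biclusters ** p

for k in (2, 10, 30):
    print(k, worst_case_cost(500, 200, k), practical_cost(500, 200, k, p=3))
```

For 2 biclusters the two counts are of the same order, while for 30 biclusters the exponential term dominates by several orders of magnitude.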


Overview

• Biclustering and biology

• Probabilistic Relational Models

• ProBic biclustering model

• Algorithm

• Results

• Discussion

• Conclusion


Discussion: Expectation-Maximization

• Initialization:
– Initialization with (almost) all genes and conditions: convergence to a good local optimum
– Multiple random initializations: many initializations required (speed!)

• E.-M. steps:
– Limiting the changes in gene/condition-bicluster assignments in both E-steps results in higher stability (at the cost of slower convergence)


Probabilistic Relational Models (PRMs)

• Relational extension to Bayesian networks (BNs):
– BNs ~ a single flat table
– PRMs ~ relational data structure

• A relational schema (implicitly) defines a constrained Bayesian network

• In PRMs, probability distributions are shared among all objects of the same class

• Likelihood function (very similar to the chain rule in Bayesian networks):

$$P(I \mid \sigma, S, \theta) = \prod_{x \in \sigma} \; \prod_{A \in \mathcal{A}(x)} P\big(x.A \mid \mathrm{parents}(x.A)\big)$$

(outer product over objects, inner product over attributes)

• Learning a PRM, e.g. using the maximum likelihood principle


Algorithm: user-defined parameters

• P(μ_{g.B,c.B,c}, σ_{g.B,c.B,c}): prior distributions for μ, σ
– Conjugate:
  • Normal-Inverse-χ² distribution, or
  • Normal distribution with pseudocount
– Makes extreme distributions less likely

• P(c.Bk): prior probability that a condition is in bicluster k
– Prevents background conditions from being in biclusters
– Without a prior distribution P(μ, σ), conditions are always more likely to be in a bicluster due to statistical variations

• P(g.Bk): prior probability that a gene is in bicluster k
– Initialize biclusters with seed genes: query-driven biclustering

• P(c.Bk) and P(g.Bk):
– Both have an impact on the preference for certain bicluster shapes


Algorithm: Expectation-Maximization

• Expectation step 2:
– argmax_{G.B} log(posterior)

[Equation annotations: constant; without prior: independent per gene; approximation based on previous iteration; quasi-independence if small changes in assignment]


ProBic biclustering model

PRM model: P(e.level | g.B1, g.B2, c.B1, c.B2, c.ID) = Normal( μ_{g.B,c.B,c.ID}, σ_{g.B,c.B,c.ID} )

[Figure: relational schema with the classes Gene (B1, B2), Condition (ID, B1, B2) and Expression (level), illustrated for biclusters 1 and 2]

Example parameters for condition c=44:

g.B   c.B   c    μ    σ
1     1,2   44   -1   .5
2     1,2   44   +2   .6
-     *     44   0    1
*     -     44   0    1
1     1     44   -1   .5
2     2     44   +2   .6
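The table can be read as a lookup rule: if the gene and the condition share a bicluster, the parameters of that bicluster for condition 44 apply, otherwise the background N(0,1) applies. A small sketch of that reading follows; the names are hypothetical, and the case of a gene and condition sharing more than one bicluster is left open because the table does not cover it.

```python
BACKGROUND = (0.0, 1.0)  # (mu, sigma) when gene and condition share no bicluster

# (bicluster k, condition ID) -> (mu, sigma), values taken from the table above.
bicluster_params = {
    (1, 44): (-1.0, 0.5),
    (2, 44): (2.0, 0.6),
}

def lookup(gene_biclusters, cond_biclusters, cond_id):
    """Pick (mu, sigma) for one expression value from the assignments."""
    shared = set(gene_biclusters) & set(cond_biclusters)
    if not shared:
        return BACKGROUND
    if len(shared) > 1:
        # Overlap in both gene and condition: depends on the overlap model,
        # which this table does not specify.
        raise NotImplementedError("overlap model not specified")
    (k,) = shared
    return bicluster_params[(k, cond_id)]

print(lookup({1}, {1, 2}, 44))
print(lookup(set(), {1, 2}, 44))
```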


ProBic biclustering model

• Ground Bayesian network

[Figure: ground Bayesian network with the nodes g1.B1, g1.B2, g2.B1, g2.B2, c1.B1, c1.B2, c2.B1, c2.B2, c1.ID, c2.ID, level1,1, level2,1 and level2,2]


ProBic biclustering model

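• Likelihood function (~ chain rule in Bayesian networks):

$$P(I \mid \sigma, S, \theta) = \prod_{x \in \sigma} \; \prod_{A \in \mathcal{A}(x)} P\big(x.A \mid \mathrm{parents}(x.A)\big)$$

(outer product over objects, inner product over attributes)

• In PRMs, probability distributions are shared among all objects of the same class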


Algorithm: Expectation-Maximization

• Maximization step:
– argmax_{μ,σ} log(posterior)

[Equation annotations: constant; independent per condition; analytic solution based on sufficient statistics]
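For a normal distribution the sufficient statistics are the count, the sum and the sum of squares, so the update has a closed form per condition. The sketch below is a plain maximum-likelihood version for a single bicluster: it ignores the prior on (μ, σ) mentioned elsewhere in the slides, and the floor `min_sigma` and all names are assumptions. Missing values are represented as `None` and skipped.

```python
import math

def fit_normal(values, min_sigma=1e-3):
    """ML estimate of (mu, sigma) from count, sum and sum of squares."""
    n = len(values)
    s1 = sum(values)
    s2 = sum(v * v for v in values)
    mu = s1 / n
    var = max(s2 / n - mu * mu, 0.0)   # guard against tiny negative round-off
    return mu, max(math.sqrt(var), min_sigma)

def m_step(data, genes_in, conds_in):
    """Per-condition parameters of one bicluster, given hard assignments."""
    params = {}
    for c in conds_in:
        vals = [data[g][c] for g in genes_in if data[g][c] is not None]
        if vals:                        # a condition with no observed value is skipped
            params[c] = fit_normal(vals)
    return params

toy = [[1.0, None], [3.0, 5.0]]
print(m_step(toy, genes_in=[0, 1], conds_in=[0, 1]))
```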


Algorithm: Expectation-Maximization

• Expectation step 1: assign conditions
– argmax_{C.B} log(posterior)

[Equation annotations: constant; independent per condition]
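Because the maximization decomposes per condition, each condition can be decided on its own. The following is a simplified sketch for one bicluster with a fixed N(0,1) background: a condition joins the bicluster when its log-score under the bicluster parameters (plus log prior) beats the background score. The prior value 0.1 and all names are assumptions, not values from the slides.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def assign_conditions(data, genes_in, bicluster_params, prior_in=0.1, background=(0.0, 1.0)):
    """Hard assignment of conditions to one bicluster, one condition at a time."""
    chosen = []
    for c, (mu, sigma) in bicluster_params.items():
        vals = [data[g][c] for g in genes_in if data[g][c] is not None]
        score_in = math.log(prior_in) + sum(normal_logpdf(v, mu, sigma) for v in vals)
        score_out = math.log(1.0 - prior_in) + sum(normal_logpdf(v, *background) for v in vals)
        if score_in > score_out:
            chosen.append(c)
    return sorted(chosen)

toy = [[2.0, 0.0], [2.1, 1.5], [1.9, -1.2]]
params = {0: (2.0, 0.2), 1: (2.0, 0.2)}   # hypothetical bicluster parameters per condition
print(assign_conditions(toy, genes_in=[0, 1, 2], bicluster_params=params))
```

In the toy data, condition 0 is tightly centred on 2.0 for all three genes and is kept, while condition 1 looks like background and is rejected.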


Algorithm: Expectation-Maximization

• Expectation step 1:
– Evaluate the function for every condition and for every bicluster assignment, e.g. 200 conditions, 30 biclusters: 200 × 2^30 ≈ 200 billion evaluations (a lot)
– But this can be performed very efficiently:
  • Partial solutions can be reused among different bicluster assignments

[Figure: three biclusters labelled 1, 2 and 3, with candidate assignments c.B = 1? c.B = 1,2? c.B = 1,2,3?]


Algorithm: Expectation-Maximization

• Expectation step 1:
– Evaluate the function for every condition and for every bicluster assignment, e.g. 200 conditions, 30 biclusters: 200 × 2^30 ≈ 200 billion evaluations (a lot)
– But this can be performed very efficiently:
  • Partial solutions can be reused among different bicluster assignments
  • Only evaluate potentially good solutions: use an Apriori-like approach

[Figure: three biclusters labelled 1, 2 and 3, with candidate assignments a.B = 1? a.B = 2? a.B = 1,2?]


Algorithm: Expectation-Maximization

• Expectation step 1:
– Evaluate the function for every condition and for every bicluster assignment, e.g. 200 conditions, 30 biclusters: 200 × 2^30 ≈ 200 billion evaluations (a lot)
– But this can be performed very efficiently:
  • Partial solutions can be reused among different bicluster assignments
  • Only evaluate potentially good solutions: use an Apriori-like approach
  • Avoid background evaluations

[Figure: three biclusters labelled 1, 2 and 3]
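The size of the naive search, and the idea behind an Apriori-like pruning, can be sketched as follows. The slides do not spell out the pruning criterion, so `is_promising` is a hypothetical placeholder predicate; the level-wise rule shown (a set of k+1 biclusters is evaluated only if all its size-k subsets were promising) is the generic Apriori scheme, not necessarily the authors' exact procedure.

```python
from itertools import combinations

# Naive count: every condition against every subset of 30 biclusters.
evaluations = 200 * 2 ** 30
print(evaluations)  # 214748364800, i.e. roughly 200 billion

def apriori_search(items, is_promising):
    """Level-wise search over subsets, pruning supersets of unpromising sets."""
    level = [frozenset([i]) for i in items if is_promising(frozenset([i]))]
    kept = list(level)
    while level:
        current = set(level)
        candidates = set()
        for subset in level:
            for i in items:
                if i in subset:
                    continue
                cand = subset | {i}
                # Keep the candidate only if every subset one element smaller survived.
                if all(frozenset(s) in current for s in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
        level = [c for c in candidates if is_promising(c)]
        kept.extend(level)
    return kept

# Toy predicate: any assignment involving bicluster 3 is considered unpromising.
kept = apriori_search([1, 2, 3], is_promising=lambda s: 3 not in s)
print(sorted(sorted(s) for s in kept))
```

With the toy predicate only {1}, {2} and {1,2} are evaluated beyond the first level, instead of all seven non-empty subsets.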


Algorithm: Expectation-Maximization

• Expectation step 2:
– Analogous approach to step 1


Algorithm: iteration strategy

• Many implementation choices in generalized E.-M.:
– Single E-step per M-step
– Iterate E-steps until convergence per M-step
– Iteratively perform E-step 1, M-step, E-step 2, M-step
– …

• Our choice:
– Iteratively perform E-step 1, M-step, E-step 2, M-step
– Fast, with good convergence properties
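The chosen schedule can be written as a small driver loop. This is a skeleton only: the three step functions are passed in as callables, the `state` dictionary layout is hypothetical, and stopping when the hard assignments no longer change is an assumed convergence test.

```python
def run_hard_em(state, e_step_conditions, e_step_genes, m_step, max_iter=100):
    """Iterate E-step 1, M-step, E-step 2, M-step until assignments stop changing."""
    iterations = 0
    for _ in range(max_iter):
        before = (frozenset(state["genes"]), frozenset(state["conditions"]))
        state["conditions"] = e_step_conditions(state)  # E-step 1: assign conditions
        state["params"] = m_step(state)                 # M-step
        state["genes"] = e_step_genes(state)            # E-step 2: assign genes
        state["params"] = m_step(state)                 # M-step
        iterations += 1
        after = (frozenset(state["genes"]), frozenset(state["conditions"]))
        if after == before:
            break
    return state, iterations

# Trivial stand-in steps, just to show the control flow.
demo_state = {"genes": {0, 1}, "conditions": {0, 1}, "params": None}
final_state, n_iter = run_hard_em(
    demo_state,
    e_step_conditions=lambda s: {0},
    e_step_genes=lambda s: {1},
    m_step=lambda s: (len(s["genes"]), len(s["conditions"])),
)
print(n_iter, final_state)
```

Re-estimating the parameters after each of the two E-steps keeps them consistent with the latest assignments before the other set of assignments is updated.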


Results: 9 bicluster dataset (15 genes x 80 conditions)


Results: Missing values

• 500 genes, 300 conditions

• Noise stdev: 0.2 (bicluster), 1.0 (background)

• One 50x50 bicluster

[Figure: four panels plotting precision (genes), recall (genes), precision (conditions) and recall (conditions) against the percentage of missing values]

