
Measures of Variability for Graphical Models

Marco Scutari

[email protected]

Department of Statistical Sciences

University of Padova

March 14, 2011

Marco Scutari University of Padova


Graphical Models


Graphical models are defined by:

• a network structure, either an undirected graph (Markov networks [3], gene association networks, correlation networks, etc.) or a directed graph (Bayesian networks [9]). Each node corresponds to a random variable;

• a global probability distribution, which can be factorised into a small set of local probability distributions according to the topology of the graph.

This combination allows a compact representation of the joint distribution of large numbers of random variables and simplifies inference on its parameters.


A Simple Bayesian Network: Watson’s Lawn

RAIN
  TRUE   FALSE
  0.2    0.8

SPRINKLER (given RAIN)
  RAIN    TRUE   FALSE
  FALSE   0.4    0.6
  TRUE    0.01   0.99

GRASS WET (given SPRINKLER and RAIN)
  SPRINKLER  RAIN    TRUE   FALSE
  FALSE      FALSE   0.0    1.0
  FALSE      TRUE    0.8    0.2
  TRUE       FALSE   0.9    0.1
  TRUE       TRUE    0.99   0.01

Arcs: RAIN → SPRINKLER, RAIN → GRASS WET, SPRINKLER → GRASS WET.
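As an illustration of how the global distribution factorises into local ones, the following Python sketch multiplies the three local distributions of the lawn example and answers one query by enumeration. The table values are an assumed reading of the slide (SPRINKLER conditional on RAIN, GRASS WET conditional on both), and the function names are hypothetical.

```python
from itertools import product

# Local distributions; assumed reading of the slide's tables.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {  # P(SPRINKLER | RAIN), keyed by rain, then sprinkler
    False: {True: 0.4, False: 0.6},
    True: {True: 0.01, False: 0.99},
}
P_WET = {  # P(GRASS WET = TRUE | SPRINKLER, RAIN), keyed by (sprinkler, rain)
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def joint(rain, sprinkler, wet):
    """Joint probability as the product of the three local distributions."""
    p_wet = P_WET[(sprinkler, rain)]
    return (
        P_RAIN[rain]
        * P_SPRINKLER[rain][sprinkler]
        * (p_wet if wet else 1.0 - p_wet)
    )


def prob_rain_given_wet():
    """P(RAIN = TRUE | GRASS WET = TRUE), by summing the joint distribution."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(
        joint(r, s, True) for r, s in product((True, False), repeat=2)
    )
    return numerator / denominator


# The eight joint probabilities must sum to one.
total = sum(joint(r, s, w) for r, s, w in product((True, False), repeat=3))
print(round(total, 12), round(prob_rain_given_wet(), 4))
```

Only 2 + 2 + 4 = 8 table rows are needed here, but the same factorisation keeps the number of parameters manageable when the network has many more nodes.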


The Problem

Most literature on the analysis of graphical models focuses on the study of the parameters of local probability distributions (such as conditional probabilities or partial correlations).

• Comparing models learned with different algorithms is difficult, because they maximise different scores, use different estimators for the parameters, work under different sets of hypotheses, etc. [15].

• Unless the true global probability distribution is known, it is difficult to assess the quality of the estimated models.

• The few available measures of structural difference are completely descriptive in nature (e.g. Hamming distance [8] or SHD [21]), and are difficult to interpret.

Focusing on network structures sidesteps most of these issues.


Modelling Undirected Network Structures


Edges and Univariate Bernoulli Random Variables

Each edge $e_i$ in an undirected graph $U = (V, E)$ has only two possible states,

\[
e_i = \begin{cases} 1 & \text{if } e_i \in E \\ 0 & \text{otherwise.} \end{cases}
\]

Therefore it can be modelled as a Bernoulli random variable $E_i$,

\[
e_i \sim E_i = \begin{cases} 1 & e_i \in E \text{ with probability } p_i \\ 0 & e_i \notin E \text{ with probability } 1 - p_i, \end{cases}
\]

where $p_i$ is the probability that the edge $e_i$ appears in the graph. We will denote it as $E_i \sim Ber(p_i)$.


Edge Sets as Multivariate Bernoulli

The natural extension of this approach is to model any set $W$ of edges (such as $E$ or $\{V \times V\}$) as a multivariate Bernoulli random variable $W \sim Ber_k(p)$. $W$ is uniquely identified by the parameter set

\[
p = \{ p_w : w \subseteq W, w \neq \emptyset \},
\]

which represents the dependence structure [10] among the marginal distributions $W_i \sim Ber(p_i)$, $i = 1, \ldots, k$ of the edges.

The parameter set $p$ can be estimated using $m$ bootstrap samples [4] as suggested in Friedman et al. [5] or Imoto et al. [7].
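A minimal Python sketch of the bootstrap estimate, assuming the networks learned from the $m$ bootstrap samples are already available as sets of edges. Only the first- and second-order parameters ($p_i$ and $p_{ij}$) are estimated; the function name and the toy data are hypothetical.

```python
from itertools import combinations


def edge_probabilities(edge_sets):
    """Estimate p_i and p_ij as relative frequencies over m edge sets."""
    m = len(edge_sets)
    # Edges that never appear have estimated probability 0 and are omitted.
    edges = sorted(set().union(*edge_sets))
    p = {e: sum(e in s for s in edge_sets) / m for e in edges}
    p_joint = {
        (e, f): sum(e in s and f in s for s in edge_sets) / m
        for e, f in combinations(edges, 2)
    }
    return p, p_joint


# Hypothetical output of structure learning on m = 4 bootstrap samples.
boot = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("B", "C")},
    {("A", "C")},
]
p, p_joint = edge_probabilities(boot)
print(p[("A", "B")], p_joint[(("A", "B"), ("B", "C"))])
```

The covariance between two edges then follows as $p_{ij} - p_i p_j$.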


Second Order Properties

The marginal variances of the edges are bounded, because

\[
p_i \in [0, 1] \implies \sigma_{ii} = p_i - p_i^2 \in \left[0, \frac{1}{4}\right].
\]

Covariances are bounded in the same interval (in modulus). Similar bounds exist for the eigenvalues $\lambda_1, \ldots, \lambda_k$ of the covariance matrix $\Sigma$,

\[
0 \leq \lambda_i \leq \frac{k}{4} \quad \text{and} \quad 0 \leq \sum_{i=1}^{k} \lambda_i \leq \frac{k}{4}.
\]

Furthermore, two multivariate Bernoulli random variables $W_1$ and $W_2$ are independent if and only if their covariance matrix is null,

\[
W_1 \perp\!\!\!\perp W_2 \iff COV(W_1, W_2) = O.
\]
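The bounds on variances, covariances and eigenvalues follow from elementary arguments. The derivation below is a sketch filled in here, not taken from the slides; it relies only on the Cauchy-Schwarz inequality and on $\Sigma$ being positive semidefinite.

```latex
\begin{align*}
\sigma_{ii} &= p_i (1 - p_i) \leq \frac{1}{4},
  \quad \text{with equality at } p_i = \frac{1}{2}, \\
|\sigma_{ij}| &\leq \sqrt{\sigma_{ii} \sigma_{jj}} \leq \frac{1}{4}, \\
\sum_{i=1}^{k} \lambda_i &= \operatorname{tr}(\Sigma)
  = \sum_{i=1}^{k} \sigma_{ii} \leq \frac{k}{4}, \\
0 \leq \lambda_i &\leq \sum_{j=1}^{k} \lambda_j \leq \frac{k}{4}
  \quad \text{since every } \lambda_j \geq 0.
\end{align*}
```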


Measures of Structure Variability


Entropy of the Bootstrapped Network Structures

Consider the graphical models $U_1, \ldots, U_m$ learned from the bootstrap samples. Three scenarios are possible:

• minimum entropy: all the models learned from the bootstrap samples have the same structure. In this case:

\[
p_i = \begin{cases} 1 & \text{if } e_i \in E \\ 0 & \text{otherwise} \end{cases} \quad \text{and} \quad \Sigma = O;
\]

• intermediate entropy: several models are observed with different frequencies $m_b$, $\sum m_b = m$, so

\[
p_i = \frac{1}{m} \sum_{b \,:\, e_i \in E_b} m_b \quad \text{and} \quad p_{ij} = \frac{1}{m} \sum_{b \,:\, e_i \in E_b, e_j \in E_b} m_b;
\]

• maximum entropy: all possible models appear with the same frequency, which results in

\[
p_i = \frac{1}{2} \quad \text{and} \quad \Sigma = \frac{1}{4} I_k.
\]
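The two extreme scenarios can be checked numerically. The Python sketch below, with hypothetical names and $k = 3$ edges, encodes each learned structure as a 0/1 vector and computes the edge probabilities and covariance matrix from relative frequencies.

```python
from itertools import product


def moments(structures, k):
    """Edge probabilities and covariance matrix from 0/1 edge vectors."""
    m = len(structures)
    p = [sum(s[i] for s in structures) / m for i in range(k)]
    sigma = [
        [
            sum(s[i] * s[j] for s in structures) / m - p[i] * p[j]
            for j in range(k)
        ]
        for i in range(k)
    ]
    return p, sigma


k = 3
# Minimum entropy: the same structure is learned from every sample.
p_min, sigma_min = moments([(1, 0, 1)] * 8, k)
# Maximum entropy: each of the 2^k possible structures appears once.
p_max, sigma_max = moments(list(product((0, 1), repeat=k)), k)

print(p_min, sigma_min)
print(p_max, sigma_max)
```

In the first case every entry of the covariance matrix is zero; in the second the diagonal entries are 1/4 and the off-diagonal entries vanish.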


[Figure: entropy of the bootstrapped network structures, with the maximum entropy and minimum entropy configurations marked.]


Univariate Measures of Variability

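• The generalised variance

\[
VAR_G(\Sigma) = \det(\Sigma) = \prod_{i=1}^{k} \lambda_i \in \left[0, \frac{1}{4^k}\right].
\]

• The total variance (or total variability)

\[
VAR_T(\Sigma) = \operatorname{tr}(\Sigma) = \sum_{i=1}^{k} \lambda_i \in \left[0, \frac{k}{4}\right].
\]

• The squared Frobenius matrix norm

\[
VAR_N(\Sigma) = ||| \Sigma - \frac{k}{4} I_k |||_F^2 = \sum_{i=1}^{k} \left( \lambda_i - \frac{k}{4} \right)^2 \in \left[ \frac{k(k-1)^2}{16}, \frac{k^3}{16} \right].
\]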
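The three measures can be computed without an eigendecomposition, since the determinant, the trace and the squared Frobenius norm are all functions of the matrix entries. The following is a dependency-free Python sketch with hypothetical function names; a numerical library would normally be used for the determinant.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def generalised_variance(sigma):
    return det(sigma)


def total_variance(sigma):
    return sum(sigma[i][i] for i in range(len(sigma)))


def frobenius_measure(sigma):
    """Squared Frobenius norm of Sigma - (k/4) I_k, summed entry by entry."""
    k = len(sigma)
    return sum(
        (sigma[i][j] - (k / 4 if i == j else 0.0)) ** 2
        for i in range(k)
        for j in range(k)
    )


k = 3
sigma_max = [[0.25 if i == j else 0.0 for j in range(k)] for i in range(k)]
sigma_min = [[0.0] * k for _ in range(k)]
for sigma in (sigma_min, sigma_max):
    print(
        generalised_variance(sigma),
        total_variance(sigma),
        frobenius_measure(sigma),
    )
```

For $k = 3$ the two covariance matrices reproduce the interval endpoints above: the Frobenius measure takes its largest value, $k^3/16$, at minimum entropy and its smallest, $k(k-1)^2/16$, at maximum entropy.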


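All of these measures can be rescaled to vary in the [0, 1] interval and to associate high values with networks whose structures display a high entropy in the bootstrap samples:

\[
\overline{VAR}_T(\Sigma) = \frac{4}{k} VAR_T(\Sigma), \quad
\overline{VAR}_G(\Sigma) = 4^k VAR_G(\Sigma), \quad
\overline{VAR}_N(\Sigma) = \frac{k^3 - 16 \, VAR_N(\Sigma)}{k(2k - 1)}.
\]

Furthermore, these measures can be easily translated into asymptotic or Monte Carlo tests (via parametric bootstrap) having the maximum entropy covariance matrix as the null hypothesis.

\[
4m \operatorname{tr}(\Sigma) \mathrel{\dot\sim} \chi^2_{mk}
\qquad
\sqrt{n} \left[ 4^k \det(\Sigma) - 1 \right] \mathrel{\dot\sim} N(0, 2k)
\]

\[
\frac{mk}{2} \sqrt[k]{4^k \det(\Sigma)} \mathrel{\dot\sim} Ga\left( \frac{k(m + 1 - k)}{2}, 1 \right)
\qquad
||| \Sigma - \frac{1}{4} I_k |||_F^2 \mathrel{\dot\sim} \frac{1}{8m} \chi^2_{\frac{1}{2} k(k+1)}
\]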
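A Monte Carlo test of this kind can be sketched in Python for the normalised total variance. The sketch makes several assumptions that the slides do not spell out: the null hypothesis is simulated as $m$ independent draws of $k$ independent $Ber(1/2)$ edges, the variance is the plug-in estimate $\hat{p}_i (1 - \hat{p}_i)$, and low values of the statistic count as evidence against maximum entropy. All names are hypothetical.

```python
import random


def normalised_total_variance(samples):
    """(4 / k) tr(Sigma) from m edge indicator vectors of length k."""
    m, k = len(samples), len(samples[0])
    p = [sum(s[i] for s in samples) / m for i in range(k)]
    return 4.0 / k * sum(pi * (1.0 - pi) for pi in p)


def monte_carlo_p_value(samples, replicates=2000, seed=42):
    """Share of null replicates whose statistic is at most the observed one."""
    rng = random.Random(seed)
    m, k = len(samples), len(samples[0])
    observed = normalised_total_variance(samples)
    hits = 0
    for _ in range(replicates):
        # Assumed null model: every edge is an independent fair coin.
        null = [[rng.randint(0, 1) for _ in range(k)] for _ in range(m)]
        if normalised_total_variance(null) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (replicates + 1)


stable = [(1, 0, 1)] * 20  # the same structure in all 20 samples
balanced = [(0, 0), (1, 1), (0, 1), (1, 0)]  # every edge present half the time
print(normalised_total_variance(stable), monte_carlo_p_value(stable))
print(normalised_total_variance(balanced))
```

A perfectly stable set of structures has statistic 0 and a very small p-value, so maximum entropy is rejected; the balanced set attains the upper bound 1.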


Structure Variability (Total Variance)

[Figure: total variance of the network structures, with the maximum entropy and minimum entropy configurations marked.]


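Structure Variability (Squared Frobenius Matrix Norm)

[Figure: squared Frobenius matrix norm of the network structures, with the maximum entropy and minimum entropy configurations marked.]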


Modelling Directed Acyclic Network Structures


Edges and Univariate Trinomial Random Variables

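Each arc $a_{ij}$ in a directed acyclic graph $G = (V, A)$ has three possible states,

\[
a_{ij} = \begin{cases}
-1 & \text{if } a_{ij} = \overleftarrow{a_{ij}} = \{v_i \leftarrow v_j\} \\
0 & \text{if } a_{ij} \notin A, \text{ denoted with } \mathring{a}_{ij} \\
1 & \text{if } a_{ij} = \overrightarrow{a_{ij}} = \{v_i \rightarrow v_j\},
\end{cases}
\]

and therefore it can be modelled as a Trinomial random variable $A_i$, which is essentially a multinomial random variable with three states. Variability measures (and their normalised variants) can be extended from the undirected case as

\[
VAR(A_i) = VAR(E_i) + 4 P(\overrightarrow{a_{ij}}) P(\overleftarrow{a_{ij}}) \in [0, 1].
\]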
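The variance identity can be verified directly. The Python sketch below assumes that $E_i$ is the Bernoulli indicator that the arc is present in either direction, so that its probability is the sum of the two direction probabilities; the slide does not state this explicitly, and the function names are hypothetical.

```python
def var_arc(p_forward, p_backward):
    """Variance of A in {-1, 0, 1}: P(A = 1) = p_forward, P(A = -1) = p_backward."""
    mean = p_forward - p_backward
    second_moment = p_forward + p_backward  # E[A^2], since A^2 is 0 or 1
    return second_moment - mean ** 2


def var_edge(p_forward, p_backward):
    """Variance of the indicator that the arc is present in either direction."""
    p = p_forward + p_backward
    return p * (1.0 - p)


def var_arc_from_edge(p_forward, p_backward):
    """Right-hand side of the identity: VAR(E) + 4 P(forward) P(backward)."""
    return var_edge(p_forward, p_backward) + 4.0 * p_forward * p_backward


for pf, pb in [(0.1, 0.2), (0.5, 0.5), (0.3, 0.0), (0.25, 0.25)]:
    print(pf, pb, var_arc(pf, pb), var_arc_from_edge(pf, pb))
```

The upper bound 1 is reached when both directions have probability 1/2, that is, when the arc is always present but its direction is a coin flip.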


Edge Sets as Multivariate Trinomials

As before, the natural extension to model any set $W$ of arcs is to use a multivariate Trinomial random variable $W \sim Tri_k(p)$ and to estimate its parameters via nonparametric bootstrap.

However:

• the acyclicity constraint of Bayesian networks makes deriving exact results very difficult because it cannot be written in closed form;

• the score equivalence of most structure learning strategies makes inference on $Tri_k(p)$ tricky unless particular care is taken (i.e. both possible orientations of many arcs result in equivalent probability distributions, so the algorithms cannot choose between them).


Properties of the Multivariate Trinomial

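In the maximum entropy case we have the following approximate results [11]:

\[
P(\overrightarrow{a_{ij}}) = P(\overleftarrow{a_{ij}}) \simeq \frac{1}{4} + \frac{1}{4(n-1)}
\quad \text{and} \quad
P(\mathring{a}_{ij}) \simeq \frac{1}{2} - \frac{1}{2(n-1)},
\]

where $n$ is the number of nodes of the graph. Furthermore, we have that

\[
VAR(A_{ij}) \simeq \frac{1}{2} + \frac{1}{2(n-1)} \to \frac{1}{2} \quad \text{as } n \to \infty
\]

and

\[
|COV(A_{ij}, A_{kl})| \lesssim 4 \left[ \frac{3}{4} - \frac{1}{4(n-1)} \right]^2 \left[ \frac{1}{4} + \frac{1}{4(n-1)} \right]^2 \to \frac{9}{64} \quad \text{as } n \to \infty.
\]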
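To see how quickly these approximations approach their limits, the Python sketch below evaluates them for a few values of $n$. It only transcribes the formulas above; the function name and dictionary keys are hypothetical.

```python
def max_entropy_approximations(n):
    """Approximate maximum entropy quantities for a DAG with n >= 2 nodes."""
    p_directed = 0.25 + 1.0 / (4.0 * (n - 1))  # each of the two directions
    p_absent = 0.5 - 1.0 / (2.0 * (n - 1))
    variance = 0.5 + 1.0 / (2.0 * (n - 1))
    cov_bound = 4.0 * (0.75 - 1.0 / (4.0 * (n - 1))) ** 2 * p_directed ** 2
    return {
        "p_directed": p_directed,
        "p_absent": p_absent,
        "variance": variance,
        "cov_bound": cov_bound,
    }


for n in (5, 10, 100, 1000):
    approx = max_entropy_approximations(n)
    # The three state probabilities always sum to one.
    total = 2.0 * approx["p_directed"] + approx["p_absent"]
    print(n, approx, total)
```

The variance equals twice the probability of a single direction, which matches the trinomial variance when the two directions are equally likely.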


Measures of Structure Variability

Since variances are bounded in [0, 1] we can define again

\[
\overline{VAR}_T(\Sigma) = \frac{1}{k} VAR_T(\Sigma) \quad \text{and} \quad \overline{VAR}_G(\Sigma) = VAR_G(\Sigma).
\]

We can also compute $\overline{VAR}_N(\Sigma)$ using a Monte Carlo estimate for $COV(A_{ij}, A_{kl})$ based on Ide and Cozman's algorithm [6]. The same holds for hypothesis tests.


Determining Statistically Significant Functional Relationships


The Problem

• Transcriptions of regulatory (gene) networks controlling both myogenic and adipogenic differentiation are still under active investigation.

• Myogenic and adipogenic differentiation pathways are typically considered non-overlapping, but Taylor-Jones et al. [20] have shown that myogenic progenitors from aged mice co-express some aspects of both myogenic and adipogenic gene programs.

• Their balance is apparently regulated by Wnt signalling according to Vertino et al. [22], but there have been few efforts to understand the interactions between these two networks.


The Experimental Setting

The clonal gene expression data were generated from RNA isolated from 34 clones of myogenic progenitors obtained from 24-month-old mice, cultured to confluence and allowed to differentiate for 24 hours. RT–PCR was used to quantify the expression of 12 genes:

• myogenic regulatory factors: Myo-D1, Myogenin and Myf-5.

• adipogenesis-related genes: FoxC2, DDIT3, C/EBP and PPARγ.

• Wnt-related genes: Wnt5a and Lrp5.

• control genes: GAPDH, 18S and B2M.


Choosing the Right Structure Learning Algorithm

[Figure: normalised total variance VAR_T(Σ), on an axis from 0.0 to 1.0, for the structure learning algorithms GS, IAMB, Fast-IAMB, Inter-IAMB, HC, MMHC and Tabu.]


Choosing the Right Tuning Parameters

[Figure: normalised total variance VAR_T(Σ), on an axis from 0.0 to 1.0, for the tuning parameter settings COR, MI, MI-SH and ZF.]


Determining Significant Functional Relationships

[Figure: three panels showing the cumulative distribution functions F_p(i)(x), F_p~(i)(x; t) and F_p(i)(x), plotted against p(i), p~(i) and p(i) respectively, with both axes running from 0.0 to 1.0.]

Significant functional relationships can be selected by filtering out the noise in the data or by finding the closest minimum-entropy configuration.


Statistically Significant FRs

[Figure: network of statistically significant functional relationships among DDIT3, Wnt5a, FoxC2, Myogenin, Myo-D1, LRP5, Myf-5, CEBPα and PPARγ. Control genes: GAPDH, 18S, B2M.]


Conclusions


• In the literature, inference on the structure of graphical models is usually overlooked in favour of inference on the parameters of the global and local distributions.

• Rigorous inference on network structures is possible with the appropriate multivariate distributions: multivariate Bernoulli and multivariate Trinomial.

• In this setting we can define descriptive statistics and hypothesis tests which are easy to interpret and apply to any set of edges/arcs.


Thank you.


References


[1] R. B. Ash. Probability and Measure Theory. Academic Press, 2nd edition, 2000.

[2] S. S. Chavan, M. A. Bauer, M. Scutari, and R. Nagarajan. NATbox: a Network Analysis Toolbox in R. BMC Bioinformatics, 10(Suppl 11):S14, 2009. Supplement contains the Proceedings of the 6th Annual MCBIOS Conference (Transformational Bioinformatics: Delivering Value from Genomes).

[3] D. I. Edwards. Introduction to Graphical Modelling. Springer, 2nd edition, 2000.

[4] B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, 1993.

[5] N. Friedman, M. Goldszmidt, and A. Wyner. Data Analysis with Bayesian Networks: A Bootstrap Approach. In Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence, pages 206–215. Morgan Kaufmann, 1999.


[6] J. S. Ide and F. G. Cozman. Random Generation of Bayesian Networks. In Proceedings of the 16th Brazilian Symposium on Artificial Intelligence, pages 366–375. Springer-Verlag, 2002.

[7] S. Imoto, S. Y. Kim, H. Shimodaira, S. Aburatani, K. Tashiro, S. Kuhara, and S. Miyano. Bootstrap Analysis of Gene Networks Based on Bayesian Networks and Nonparametric Regression. Genome Informatics, 13:369–370, 2002.

[8] D. Jungnickel. Graphs, Networks and Algorithms. Springer, 3rd edition, 2008.

[9] K. Korb and A. Nicholson. Bayesian Artificial Intelligence. Chapman & Hall, 2004.

[10] F. Krummenauer. Limit Theorems for Multivariate Discrete Distributions. Metrika, 47(1):47–69, 1998.


[11] G. Melancon, I. Dutour, and M. Bousquet-Melou. Random Generation of Dags for Graph Drawing. Technical Report INS-R0005, Centre for Mathematics and Computer Sciences, Amsterdam, 2000.

[12] R. Nagarajan, S. Datta, and M. Scutari. Graphical Models in R. Use R! series. Springer, 2011. In preparation.

[13] R. Nagarajan, S. Datta, M. Scutari, M. L. Beggs, G. T. Nolen, and C. A. Peterson. Functional Relationships Between Genes Associated with Differentiation Potential of Aged Myogenic Progenitors. Frontiers in Physiology, 1(21):1–8, 2010.

[14] M. Scutari. Structure Variability in Bayesian Networks. Working Paper 13-2009, Department of Statistical Sciences, University of Padova, 2009. Deposited on arXiv in the Statistics - Methodology archive, available from http://arxiv.org/abs/0909.1685.


[15] M. Scutari. Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35(3):1–22, 2010.

[16] M. Scutari. Measures of Variability for Bayesian Network Graphical Structures. Journal of Multivariate Analysis, 2010. Submitted for publication.

[17] M. Scutari. bnlearn: Bayesian network structure learning, 2011. R package version 2.4, http://www.bnlearn.com/.

[18] M. Scutari and A. Brogini. Constraint-based Bayesian Network Learning with Permutation Tests. Communications in Statistics – Theory and Methods, 2011. Special Issue containing the Proceedings of the Conference “Statistics for Complex Problems: the Multivariate Permutation Approach and Related Topics”, Padova, June 14 – 15. In print.


[19] M. Scutari and K. Strimmer. Introduction to Graphical Modelling. In D. J. Balding, M. Stumpf, and M. Girolami, editors, Handbook of Statistical Systems Biology. Wiley, 2011. In print.

[20] J. M. Taylor-Jones, R. E. McGehee, T. A. Rando, B. Lecka-Czernik, D. A. Lipschitz, and C. A. Peterson. Activation of an Adipogenic Program in Adult Myoblasts with Age. Mechanisms of Ageing and Development, 123(6):649–661, 2002.

[21] I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65(1):31–78, 2006.

[22] A. M. Vertino, J. M. Taylor-Jones, K. A. Longo, E. D. Bearden, T. F. Lane, R. E. McGehee, O. A. MacDougald, and C. A. Peterson. Wnt10b Deficiency Promotes Coexpression of Myogenic and Adipogenic Programs in Myoblasts. Molecular Biology of the Cell, 16(4):2039–2048, 2005.
