BGX 1 Sylvia Richardson Natalia Bochkina Alex Lewin Centre for Biostatistics Imperial College,...


Sylvia Richardson, Natalia Bochkina, Alex Lewin

Centre for Biostatistics, Imperial College, London

Bayesian inference in differential expression experiments

BBSRC, www.bgx.org.uk

Biological Atlas of Insulin Resistance

Background

• Investigating changes in gene expression under different conditions is one of the key questions in many biological experiments

• Specificity of the context:
– High dimensional data (tens of thousands of genes) and few samples
  • Need to borrow information
– Many sources of variability
  • Important to adopt a flexible modelling framework

Bayesian hierarchical modelling captures important features of the data while maintaining the generalisability of the tools and techniques developed

Modelling differential expression

Differential expression parameter

Condition 1 Condition 2

Posterior distribution (flat prior)

Mixture modelling for classification

Hierarchical model of replicate variability and array effect

Start with given point estimates of expression

Outline

• Background
• Bayesian hierarchical models for differential expression experiments
• Decision rules based on tail posterior probabilities
– Comparison with existing approaches
– FDR estimation for tail posterior probabilities
• Extension of tail posterior probabilities to analysing multiclass experiments
• Illustration
• Discussion and further work

I -- Bayesian hierarchical model for differential expression (Lewin et al, Biometrics, 2006)

Data: $y_{gcr}$ = log gene expression for gene $g$, replicate $r$, condition $c$

$\alpha_g$ = gene effect

$\delta_g$ = differential effect for gene $g$ between 2 conditions

$\beta_{r(g)c}$ = array effect, modelled as a smooth (spline) function of $\alpha_g$

$\sigma_{gc}^2$ = gene specific variance

• 1st level

$y_{g1r} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$

$y_{g2r} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$

$\sum_r \beta_{r(g)c} = 0$, $\beta_{r(g)c}$ = function of $\alpha_g$, parameters $\{c, d\}$

• 2nd level

"Flat" priors for $\alpha_g$, $\delta_g$, $\{c, d\}$

$\sigma_{gc}^2 \sim f(a_c, b_c)$, with $f$ lognormal or inverse-gamma (exchangeable variances)

Joint modelling of array effects and differential expression

• Performs normalisation simultaneously with estimation
– Gives fewer false positives than plug in

• BHM set-up allows some of the modelling assumptions to be checked using mixed posterior predictive checks:
– the need for gene specific variances
– their 2nd level distribution

Found that a lognormal or a 2 parameter inverse-gamma distribution for the variances gave similar model checks

Selecting genes that are differentially expressed

• Interested in testing the null hypothesis

$H_0^{(g)}: \delta_g = 0$ versus $H_1^{(g)}: \delta_g \neq 0$

Two broad approaches have been used:

• P value type (moderated t statistic)
– Under H0: distributed as U[0,1]
– Under H1: close to 0
– References: Baldi and Long; Smyth 2004, …

• Mixture, based on $P(H_0 \mid y_{gcr})$
– Under H0: close to 1
– Under H1: close to 0
– References: Lonnstedt & Speed 02; Newton & Kendziorski 01, 03; Lonnstedt & Britton 05; Gottardo 06, …

Bayesian mixtures

• Relies on specification of a prior model for $\delta_g$:

$\delta_g \sim \pi_0 \delta_0 + (1 - \pi_0)\, h(\delta_g \mid \eta)$

• Choice of model for the alternative (see the poster by Alex Lewin)
– Could influence the performance of the classification
– Checking how well the alternative fits the data is non-standard

Investigate properties of Bayesian selection rules based on a "non informative" prior for $\delta_g$

II -- Bayesian selection rules for pairwise comparisons

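Extend the p value approach to consider the tail probabilities of an appropriate function of the parameters

• 1st level (no array effect):

$y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1}^2 \sim N(\alpha_g - \delta_g/2,\ \sigma_{g1}^2), \quad r = 1, \dots, m_1$

$y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2}^2 \sim N(\alpha_g + \delta_g/2,\ \sigma_{g2}^2), \quad r = 1, \dots, m_2$

• Hierarchical model:

$\alpha_g, \delta_g \sim 1; \quad \sigma_{gs}^2 \sim f(\sigma_{gs}^2 \mid \theta_s); \quad \theta_s \sim f(\theta_s)$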

Posterior distributions

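Let $\bar{y}_{gs} = \frac{1}{m_s}\sum_{r=1}^{m_s} y_{gsr}$ and $\bar{y}_g = \bar{y}_{g2} - \bar{y}_{g1}$, where $m_s$ is the number of replicates in condition $s$. Let $s_{gs}^2 = \frac{1}{m_s - 1}\sum_{r=1}^{m_s}(y_{gsr} - \bar{y}_{gs})^2$, $s = 1, 2$.

Let $w_g^2 = \sigma_{g1}^2/m_1 + \sigma_{g2}^2/m_2$ be the variance of $\bar{y}_g$.

• Define the "Bayesian T statistic": $t_g = \delta_g / w_g$

The following conditional distributions hold:

$\delta_g \mid \bar{y}_g, w_g \sim N(\bar{y}_g,\ w_g^2)$

$t_g \mid \bar{y}_g, w_g \sim N(\bar{y}_g / w_g,\ 1)$

• Posterior distributions:

$f(\delta_g \mid \bar{y}_g, s_{gs}^2) = \int_0^\infty \frac{1}{w_g}\,\varphi\big((\delta_g - \bar{y}_g)/w_g\big)\, f(w_g^2 \mid s_{gs}^2)\, d(w_g^2)$

$f(t_g \mid \bar{y}_g, s_{gs}^2) = \int_0^\infty \varphi(t_g - \bar{y}_g/w_g)\, f(w_g^2 \mid s_{gs}^2)\, d(w_g^2)$

where $f(w_g^2 \mid s_{gs}^2)$ denotes the posterior density of the variances.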

Tail posterior probabilities 1 (N. Bochkina and SR, 2006)

Summarise the distribution of $T_g$ by a tail area

• Use selection rules of the form:

$P\{T_g > T_g^{(\alpha)} \mid y_{gsr}\} \geq p_{cut}$,

where $T_g$ is a suitable function of the parameters of interest, and $T_g^{(\alpha)}$ denotes the $1 - \alpha/2$ percentile of $T_g$ obtained under the null hypothesis.

• Which 'statistic' to choose: $T_g = \delta_g$ or $t_g$?

• How to define its percentiles?
– we suppose that we could have observed data with $\bar{y}_g = 0$ (its expected value under the null)
– work out the percentiles using posterior distributions conditional on $\bar{y}_g = 0$

Tail posterior probabilities 2

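Recall

$f(\delta_g \mid \bar{y}_g, s_{gs}^2) = \int_0^\infty \frac{1}{w_g}\,\varphi\big((\delta_g - \bar{y}_g)/w_g\big)\, f(w_g^2 \mid s_{gs}^2)\, d(w_g^2)$

The corresponding distribution function involves numerical integration, which is computationally expensive

• But

$f(t_g \mid \bar{y}_g = 0, w_g) \sim N(0, 1)$

so the distribution function of $f(t_g \mid \bar{y}_g = 0, s_{gs}^2)$ does not involve gene specific parameters

The percentile

$t_g^{(\alpha)} = t^{(\alpha)} = \Phi^{-1}(1 - \alpha/2)$

is easy to calculate

Consider the tail probability:

$p(t_g, t^{(\alpha)}) = P\{|t_g| > t^{(\alpha)} \mid y_{gsr}\}$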
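A minimal sketch of how the tail posterior probability $p(t_g, t^{(\alpha)})$ could be evaluated for one gene. It assumes the equal-variance conjugate case (a single $\sigma_g^2$ with an InvGamma$(a, b)$ prior) and replaces the numerical integration over $w_g$ by a Monte Carlo average; the function name, the default number of draws and the example hyperparameter values are illustrative choices, not taken from the slides.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def tail_posterior_probability(ybar, s2, m1, m2, a, b,
                               alpha=0.05, draws=20000, seed=0):
    """Monte Carlo estimate of P{|t_g| > t^(alpha) | data} for one gene.

    Assumes sigma_g1^2 = sigma_g2^2 = sigma_g^2 with an InvGamma(a, b) prior,
    so the precision has a Gamma posterior (an assumption of this sketch).
    ybar is the difference of condition means, s2 the pooled sample variance.
    """
    rng = random.Random(seed)
    nu = m1 + m2 - 2                      # residual degrees of freedom
    shape = a + nu / 2.0                  # posterior shape of the precision
    rate = b + nu * s2 / 2.0              # posterior rate of the precision
    t_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    total = 0.0
    for _ in range(draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate
        precision = rng.gammavariate(shape, 1.0 / rate)
        w = ((1.0 / m1 + 1.0 / m2) / precision) ** 0.5   # sd of ybar
        centre = ybar / w                 # t_g | ybar, w ~ N(ybar / w, 1)
        total += (1.0 - STD_NORMAL.cdf(t_alpha - centre)
                  + STD_NORMAL.cdf(-t_alpha - centre))
    return total / draws


if __name__ == "__main__":
    # Hypothetical inputs: 3 replicates per condition, a = 2, b = 0.1.
    print(tail_posterior_probability(0.0, 0.05, 3, 3, 2.0, 0.1))
    print(tail_posterior_probability(1.0, 0.05, 3, 3, 2.0, 0.1))
```

With $\bar{y}_g = 0$ every draw contributes exactly $\alpha$, which reflects the fact that $t_g \mid \bar{y}_g = 0, w_g \sim N(0, 1)$ does not depend on gene specific parameters.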

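Density of $p(t_g, t^{(\alpha)})$ for $\alpha = 0.05$ for data generated under the null hypothesis, $\sigma_{g1} = \sigma_{g2}$

[Figure: density plot, with the 5% and 0.5% points marked]

Can simulate or numerically evaluate $F_0$, the distribution of $p(t_g, t^{(\alpha)})$ under H0

Key point: $F_0$ is gene independent (conjugate case)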

Another Bayesian rule

• A natural idea is to compare the parameter $\delta_g$ to 0, i.e. to consider:

$p(\delta_g, 0) = P\{\delta_g > 0 \mid y_{gsr}\}$

or its complementary, or the 2-sided alternative:

$\max\{p(\delta_g, 0),\ 1 - p(\delta_g, 0)\}$

• It turns out that this Bayesian selection rule behaves like a "p-value":
– The distribution of $p(\delta_g, 0)$ is uniform under H0
– There is equivalence with frequentist testing based on the marginal distribution of the test statistic under the null, in the spirit of the moderated t statistic introduced by Smyth 2004

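Link between $p(\delta_g, 0)$ and the moderated t statistic

Suppose $\sigma_{g1}^2 = \sigma_{g2}^2 = \sigma_g^2$ with conjugate prior InvGamma$(a, b)$

$f(\delta_g \mid \bar{y}_g, s_{gs}^2) = \int_0^\infty \frac{1}{w_g}\,\varphi\big((\delta_g - \bar{y}_g)/w_g\big)\, f(w_g^2 \mid s_{gs}^2)\, d(w_g^2)$, with the distribution of $w_g$ conjugate

$\delta_g \mid y_{gr}, a, b \sim t(2a + m_1 + m_2 - 2,\ \bar{y}_g,\ S_g/\sqrt{k})$

where $S_g^2 = E(\sigma_g^2 \mid y_{gr}, a, b) = \dfrac{2b + s_g^2 (m_1 + m_2 - 2)}{2a + m_1 + m_2 - 2}$ is the Bayesian posterior estimate of the variance and $k = (m_1^{-1} + m_2^{-1})^{-1}$.

$p(\delta_g, 0) = P\{\delta_g > 0 \mid y_{gr}, a, b\} = F_{T(2a + m_1 + m_2 - 2)}(\bar{y}_g \sqrt{k}/S_g) = 1 - P\{t > t_g^{mod} \mid H_0\}$

where $t_g^{mod} = \bar{y}_g \sqrt{k}/S_g$ is the moderated t statistic.

Under H0, the distribution of $p(\delta_g, 0)$ is uniform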
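As an illustration of this link, the sketch below computes the moderated t statistic $\bar{y}_g\sqrt{k}/S_g$ from the formulas on the slide and estimates $p(\delta_g, 0)$ by averaging $\Phi(\bar{y}_g\sqrt{k}/\sigma_g)$ over posterior draws of $\sigma_g$. The Monte Carlo step is a stand-in for the Student t distribution function (which the Python standard library does not provide); function names and example values are hypothetical.

```python
import random
from statistics import NormalDist


def moderated_t(ybar, s2, m1, m2, a, b):
    """Moderated t statistic ybar * sqrt(k) / S_g, using the slide's S_g^2 and k."""
    nu = m1 + m2 - 2
    k = 1.0 / (1.0 / m1 + 1.0 / m2)
    S2 = (2.0 * b + s2 * nu) / (2.0 * a + nu)
    return ybar * (k / S2) ** 0.5


def prob_delta_positive(ybar, s2, m1, m2, a, b, draws=20000, seed=0):
    """Monte Carlo estimate of p(delta_g, 0) = P{delta_g > 0 | data}.

    Assumes the conjugate equal-variance model: the precision 1/sigma_g^2 has
    a Gamma posterior and delta_g | sigma_g ~ N(ybar, sigma_g^2 / k).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    nu = m1 + m2 - 2
    k = 1.0 / (1.0 / m1 + 1.0 / m2)
    shape = a + nu / 2.0
    rate = b + nu * s2 / 2.0
    total = 0.0
    for _ in range(draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate
        precision = rng.gammavariate(shape, 1.0 / rate)
        total += std_normal.cdf(ybar * (k * precision) ** 0.5)
    return total / draws


if __name__ == "__main__":
    # Hypothetical inputs: 3 replicates per condition, a = 2, b = 0.1.
    print(moderated_t(0.5, 0.05, 3, 3, 2.0, 0.1))
    print(prob_delta_positive(0.5, 0.05, 3, 3, 2.0, 0.1))
```

Larger values of the moderated t statistic map monotonically to values of $p(\delta_g, 0)$ closer to 1, which is why ranking genes by either quantity gives the same order.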

Histograms of measure of differential expression: simulated data

[Figure: histograms of $p(t_g, t^{(\alpha)})$ and $p(\delta_g, 0)$, under H0 and under H1]

Tail posterior probabilities 3

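• Investigate the performance of selection rules based on

$p(t_g, t^{(\alpha)}) = P\{|t_g| > t^{(\alpha)} \mid y_{gsr}\} > p_{cut}$

In particular:
– What is the FDR associated with each value of $p_{cut}$?
– In the conjugate case:

$FDR(p_{cut}) = \dfrac{\pi_0\, P\{p(t_g, t^{(\alpha)}) > p_{cut} \mid H_0\}}{P\{p(t_g, t^{(\alpha)}) > p_{cut}\}}$

where $\pi_0$ is estimated using Storey's method, the probability under H0 is obtained from $F_0$, and the denominator is the observed proportion of genes above the cut-off

– How does this rule compare to rules based on $p(\delta_g, 0)$?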
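The FDR formula can be sketched as a plug-in estimate. In the hypothetical helper below, the null tail area is approximated from a sample of tail posterior probabilities simulated under H0 (standing in for $F_0$), the denominator is the observed proportion of genes above the cut-off, and $\pi_0$ is passed in as a number rather than estimated; all names and the toy inputs are assumptions of this sketch.

```python
def estimate_fdr(observed_tpp, null_tpp, pi0, p_cut):
    """Plug-in FDR estimate for the rule 'select gene g if tpp_g > p_cut'.

    observed_tpp: tail posterior probabilities computed on the real data
    null_tpp:     tail posterior probabilities simulated under H0 (approximates F0)
    pi0:          proportion of non differentially expressed genes (supplied by the caller)
    """
    if not observed_tpp or not null_tpp:
        raise ValueError("both samples must be non-empty")
    n_selected = sum(p > p_cut for p in observed_tpp)
    if n_selected == 0:
        return 0.0                       # nothing selected, no false discoveries
    null_tail = sum(p > p_cut for p in null_tpp) / len(null_tpp)
    observed_tail = n_selected / len(observed_tpp)
    return min(1.0, pi0 * null_tail / observed_tail)


if __name__ == "__main__":
    # Toy numbers, for illustration only.
    observed = [0.95, 0.99, 0.2, 0.1]
    null_sample = [0.1] * 9 + [0.95]
    print(estimate_fdr(observed, null_sample, pi0=0.9, p_cut=0.925))
```

Because $F_0$ is gene independent in the conjugate case, the same null sample can be reused for every cut-off.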

Comparison of estimated (solid line) and “true” FDR (dashed line) on simulated data

π0 = 0.95 π0 = 0.90 π0 = 0.70

III-- Data Sets and Biological questions

Biological Questions

• Understand the mechanisms of insulin resistance
• Cell line experiments where the reaction of a mouse muscle cell line to treatment by insulin or metformin (an insulin replacement drug) is observed after 2 and 12 hours

• Questions of interest related to simple and compound comparisons

3 replicates for each condition, Affymetrix MOE430A chip, 22690 genes per chip

Data pre-processed by RMA and normalised using intensity dependent LOESS normalisation

Volcano plots for muscle cell data: change between insulin and control at 2 hours

[Figure: two volcano plots, cut-off 0.925.
Panel $p(t_g, t^{(\alpha)})$, $\alpha = 0.05$: less peaked around zero, allows better separation.
Panel $2\max\{p(\delta_g, 0),\ 1 - p(\delta_g, 0)\} - 1$: peaked around zero, varies steeply as a function of $\bar{y}_g$.]

Insulin versus control: tail posterior probabilities and estimated FDR

• 2 hours: 1151 selected (FDR = 0.5%), π0 = 0.61
• 12 hours: 13 selected (FDR = 0.5%), π0 = 0.98

Metformin versus control: tail posterior probabilities and estimated FDR

• 2 hours: 1854 selected (FDR = 0.5%), π0 = 0.56
• 12 hours: 72 selected (FDR = 0.5%), π0 = 0.79

IV – Extension to the analysis of multi class data

• In our case study, 3 groups (control c=0, insulin c=1, metformin c=2) and 3 time points: t=0, t=1 (2 hours), t=2 (12 hours), each replicated 3 times

• ANOVA-like model formulation suited to the analysis of such multifactorial experiments:

$y_{gtcr} = \alpha_g + \gamma_{gt} + \delta_{gtc} + \varepsilon_{gtcr}$

$\varepsilon_{gtcr} \sim N(0, \sigma_g^2)$

$\sigma_g^{-2} \sim \Gamma(a, b)$ (global variance parametrisation, borrowing information)

$\alpha_g, \gamma_{gt}, \delta_{gtc} \sim 1; \quad t, c = 1, 2$

Joint tail posterior probabilities

• Interest is in testing a compound null hypothesis, i.e. one involving several differential parameters, e.g. testing jointly for the effect of insulin and metformin at 2 hours

• In this case, we are interested in a specific alternative:

$H_0^{(g)}: \delta_{g11} = 0\ \&\ \delta_{g12} = 0$ versus $H_1^{(g)}: \delta_{g11} \neq 0\ \&\ \delta_{g12} \neq 0$

Note: rejecting the null hypothesis in an ANOVA setting corresponds to a different alternative

• Define joint tail posterior probabilities:

$p_g^J = P\{|t_{g11}| > t^{(\alpha)}\ \&\ |t_{g12}| > t^{(\alpha)} \mid \text{data}\}$

where $t_{gtc} = \delta_{gtc}/w_g$ is the Bayesian T statistic for each treatment

Benefits of joint posterior probabilities

• Takes into account correlation of the differential expression measures between the conditions induced by sharing the same variance parameter

• Usual practice is to:
– Carry out pairwise comparisons
– Select genes for each comparison using the same cut-off on the pp
– Intersect lists and find genes common to both lists

• Joint pp shown to lead to fewer false positives in this case of positive correlation (simulation study)

Correlation of DE parameters and Bayesian T statistic for insulin and metformin (2 hours)

• With joint tail posterior probabilities and a cut-off of pcut = 0.92, 280 genes are selected as jointly perturbed at 2 hours

• Applying pairwise comparison and combining the lists adds another 47 genes to the list

[Figure panels: correlation between $t_{g11}$ and $t_{g12}$; correlation between $\delta_{g11}$ and $\delta_{g12}$]

Discussion 1

• Tail posterior probabilities (Tpp) are a generic tool that can be used in any situation where a large number of hypotheses related in a hierarchical fashion are to be tested

• We have derived the distribution of the Tpp under the null and proposed a corresponding estimate of FDR

• This distribution requires numerical integration but is gene independent (conjugate case), so only needs to be evaluated once

• Tpp is a smooth function of the amount of DE with a gradient that "spreads" the genes, thus allowing genes to be chosen with a desired level of uncertainty about their DE

• Interesting connection between Bayesian and frequentist inference for the differential expression parameter

Discussion 2

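• Interesting to compare performance of Tpp with that of mixture models
• E.g. Gamma mixtures (see poster by Alex Lewin)

$\delta_g \sim \pi_0 \delta_0 + \pi_1 G(-x \mid 1.5, \lambda_1) + \pi_2 G(x \mid 1.5, \lambda_2)$

Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$

Exp(1) hyper prior for $\lambda_1$ and $\lambda_2$

Normal and t mixtures have also been considered:

$\delta_g \sim \pi_0 \delta_0 + (1 - \pi_0)\, T(\nu, \mu, \tau)$ ($\mu \sim 1$; $\tau, \nu^{-1} \sim$ Exp(1))

$\delta_g \sim \pi_0 \delta_0 + (1 - \pi_0)\, N(\mu, \tau)$ ($\mu \sim 1$; $\tau \sim$ Exp(1))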

Simulated data

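• 3000 variables, 6 replicates, 2 conditions

$y_{g1r} \sim N(\alpha_g, \sigma_g^2)$

$y_{g2r} \sim N(\alpha_g + \delta_g, \sigma_g^2)$

$\sigma_g^2 \sim 0.03 + \mathrm{LogNorm}(-3.85, 0.82)$, $\alpha_g \sim \mathrm{Norm}(7, 25)$

• $\delta_g$: slightly asymmetric:

5%: $\delta_g \mid \delta_g > 0 \sim h(\delta_g)$,

10%: $\delta_g \mid \delta_g < 0 \sim h(-\delta_g)$,

85%: $\delta_g \sim N(0, 0.01)$,

where $h(|\delta_g| \mid \delta_g \neq 0) = 0.2\,U[0, 2.5] + 0.4\,U[0.07, 0.7] + 0.4\,N(0.7, 0.7)$.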
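A rough sketch of how data with this design could be generated. Several readings are assumptions of the sketch, since the slide does not spell them out: the second argument of Norm and N is treated as a variance, the second argument of LogNorm as the standard deviation on the log scale, "6 replicates" as 6 per condition, and the absolute value is taken for the normal component of $h$ so that it stays a magnitude. Function names and the seed are hypothetical.

```python
import math
import random


def draw_abs_effect(rng):
    """Draw |delta_g| from h = 0.2 U[0, 2.5] + 0.4 U[0.07, 0.7] + 0.4 N(0.7, 0.7)."""
    u = rng.random()
    if u < 0.2:
        return rng.uniform(0.0, 2.5)
    if u < 0.6:
        return rng.uniform(0.07, 0.7)
    # Assumption: 0.7 is a variance; abs() keeps the draw a magnitude.
    return abs(rng.gauss(0.7, math.sqrt(0.7)))


def simulate(n_genes=3000, n_rep=6, seed=1):
    """Simulate two-condition data: 5% up, 10% down, 85% near-zero effects."""
    rng = random.Random(seed)
    n_up = round(0.05 * n_genes)
    n_down = round(0.10 * n_genes)
    genes = []
    for g in range(n_genes):
        if g < n_up:
            delta = draw_abs_effect(rng)
        elif g < n_up + n_down:
            delta = -draw_abs_effect(rng)
        else:
            delta = rng.gauss(0.0, math.sqrt(0.01))      # N(0, 0.01), variance assumed
        # Assumption: 0.82 is the log-scale standard deviation.
        sigma2 = 0.03 + rng.lognormvariate(-3.85, 0.82)
        alpha = rng.gauss(7.0, math.sqrt(25.0))          # Norm(7, 25), variance assumed
        sd = math.sqrt(sigma2)
        y1 = [rng.gauss(alpha, sd) for _ in range(n_rep)]
        y2 = [rng.gauss(alpha + delta, sd) for _ in range(n_rep)]
        genes.append({"delta": delta, "sigma2": sigma2, "y1": y1, "y2": y2})
    return genes


if __name__ == "__main__":
    data = simulate()
    print(len(data), sum(g["delta"] > 0 for g in data[:150]))
```

Placing the up- and down-regulated genes in fixed blocks keeps the class proportions exact, which makes it easy to compute the "true" FDR of any selection rule on the simulated set.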

Comparison of mixture and tail pp

• Fit 3 mixture models (Gamma, Normal, t alternative) and flat model.

• Classification: posterior probability P{H1 | data} for the mixture models, tail posterior probability for the flat model.

Comparable performance, with a slight edge for the Gamma and Normal mixtures

BBSRC Exploiting Genomics grant Wellcome Trust BAIR consortium

Colleagues in the Biostatistics group: Marta Blangiardo, Anne Mette Hein, Maria de Iorio

Colleagues in the Biology group at Imperial: Tim Aitman, Ulrika Andersson, Dave Carling

Papers and technical reports: www.bgx.org.uk/

For the tail probability paper: www.bgx.org.uk/Natalia/Bochkina.ps

Thanks
