Post on 28-Mar-2015
transcript
1BGX
Bayesian inference in differential expression experiments
Sylvia Richardson, Natalia Bochkina, Alex Lewin
Centre for Biostatistics, Imperial College, London
BBSRC
www.bgx.org.uk
Biological Atlas of Insulin Resistance
2BGX
Background
• Investigating changes of gene expression under different conditions is one of the key questions in many biological experiments
• Specificity of the context is
  – High dimensional data (tens of thousands of genes) and few samples
    • Need to borrow information
  – Many sources of variability
    • Important to adopt a flexible modelling framework
Bayesian hierarchical modelling makes it possible to capture important features of the data while maintaining the generalisability of the tools and techniques developed
3BGX
Modelling differential expression
Differential expression parameter
Condition 1 Condition 2
Posterior distribution (flat prior)
Mixture modelling for classification
Hierarchical model of replicate variability and array effect
Start with given point estimates of expression
4BGX
Outline
• Background
• Bayesian hierarchical models for differential expression experiments
• Decision rules based on tail posterior probabilities
  – Comparison with existing approaches
  – FDR estimation for tail posterior probabilities
• Extension of tail posterior probabilities to analysing multiclass experiments
• Illustration
• Discussion and further work
5BGX
I -- Bayesian hierarchical model for differential expression (Lewin et al, Biometrics, 2006)
Data: $y_{gcr}$ = log gene expression for gene $g$, condition $c$, replicate $r$
$\alpha_g$ = gene effect
$\delta_g$ = differential effect for gene $g$ between the 2 conditions
$\beta_{r(g)c}$ = array effect, modelled as a smooth (spline) function of $\alpha_g$
$\sigma^2_{gc}$ = gene specific variance
• 1st level
  $y_{g1r} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma^2_{g1})$
  $y_{g2r} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma^2_{g2})$
  $\sum_r \beta_{r(g)c} = 0$, $\beta_{r(g)c}$ = function of $\alpha_g$ with parameters $\{c, d\}$
• 2nd level
  "Flat" priors for $\alpha_g$, $\delta_g$, $\{c, d\}$
  Exchangeable variances: $\sigma^2_{gc} \sim g(a_c, b_c)$ (lognormal or inverse-gamma)
6BGX
Joint modelling of array effects and differential expression
• Performs normalisation simultaneously with estimation
  – Gives fewer false positives than a plug-in approach
• The BHM set up makes it possible to check some of the modelling assumptions using mixed posterior predictive checks:
  – the need for gene specific variances
  – their 2nd level distribution
Found that a lognormal or a 2 parameter inverse gamma distribution for the variances gave similar model checks
7BGX
Selecting genes that are differentially expressed
• Interested in testing the null hypothesis
  $H_0^{(g)}: \delta_g = 0$ versus $H_1^{(g)}: \delta_g \neq 0$
Two broad approaches have been used:
                 P value type      Mixture, $P(H_0 \mid y_{gcr})$
  Under $H_0$    $U[0,1]$          close to 1
  Under $H_1$    close to 0        close to 0
References (P value type, moderated t stat): Baldi and Long; Smyth 2004; …
References (mixture): Lonnstedt & Speed 02; Newton & Kendziorski, 01, 03; Lonnstedt & Britton 05; Gottardo 06; …
8BGX
Bayesian mixtures
• Relies on specification of prior model for $\delta_g$:
  $\delta_g \sim \pi_0 \delta_0 + (1 - \pi_0)\, h(\delta_g \mid \eta)$
• Choice of model for the alternative (see the poster by Alex Lewin)
  – Could influence the performance of the classification
  – Checking how the alternative fits the data is non standard
Investigate properties of Bayesian selection rules based on “non informative” prior for $\delta_g$
9BGX
II -- Bayesian selection rules for pairwise comparisons
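• 1st level (no array effect):
  $y_{g1r} \mid \alpha_g, \delta_g, \sigma^2_{g1} \sim N(\alpha_g - \delta_g/2,\ \sigma^2_{g1}), \quad r = 1, \dots, m_1$
  $y_{g2r} \mid \alpha_g, \delta_g, \sigma^2_{g2} \sim N(\alpha_g + \delta_g/2,\ \sigma^2_{g2}), \quad r = 1, \dots, m_2$
• Hierarchical model
  $\alpha_g, \delta_g \sim 1; \quad \sigma^2_{gs} \sim f(\sigma^2_{gs} \mid \theta_s), \quad \theta_s \sim f(\theta_s)$
Extend the p value approach to consider the tail probabilities of an appropriate function of the parameters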
10BGX
Posterior distributions
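Let $\bar{y}_{gs} = \frac{1}{m_s} \sum_{r=1}^{m_s} y_{gsr}$ and $\bar{y}_g = \bar{y}_{g2} - \bar{y}_{g1}$, where $m_s$ is the number of replicates in condition $s$.
Let $s^2_{gs} = \frac{1}{m_s - 1} \sum_{r=1}^{m_s} (y_{gsr} - \bar{y}_{gs})^2$, $s = 1, 2$.
Let $w_g^2 = \sigma^2_{g1}/m_1 + \sigma^2_{g2}/m_2$ be the variance of $\bar{y}_g$.
• Define the “Bayesian T statistic”: $t_g = \delta_g / w_g$
The following conditional distributions hold:
  $\delta_g \mid \bar{y}_g, w_g \sim N(\bar{y}_g,\ w_g^2)$
  $t_g \mid \bar{y}_g, w_g \sim N(\bar{y}_g / w_g,\ 1)$
• Posterior distributions:
  $f(\delta_g \mid \bar{y}_g, s^2_{gs}) = \int_0^\infty \frac{1}{w_g}\, \varphi\big((\delta_g - \bar{y}_g)/w_g\big)\, f(w_g^2 \mid s^2_{gs})\, d(w_g^2)$
  $f(t_g \mid \bar{y}_g, s^2_{gs}) = \int_0^\infty \varphi(t_g - \bar{y}_g / w_g)\, f(w_g^2 \mid s^2_{gs})\, d(w_g^2)$
where $f(w_g^2 \mid s^2_{gs})$ denotes the posterior density of the variances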
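The per-gene summaries used on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and `w_squared` takes the variances as inputs because on the slide they are unknown parameters, not the sample variances.

```python
from statistics import mean, variance

def summary_statistics(y1, y2):
    """Sample summaries for one gene; y1 and y2 hold the replicates of conditions 1 and 2."""
    return {
        "ybar": mean(y2) - mean(y1),   # ybar_g = ybar_g2 - ybar_g1
        "s2_1": variance(y1),          # s^2_g1, with divisor m1 - 1
        "s2_2": variance(y2),          # s^2_g2, with divisor m2 - 1
        "m1": len(y1),
        "m2": len(y2),
    }

def w_squared(sigma2_1, sigma2_2, m1, m2):
    """Variance of ybar_g for given values of the two gene specific variances."""
    return sigma2_1 / m1 + sigma2_2 / m2

def bayesian_t_conditional_mean(ybar, w2):
    """Mean of t_g given ybar_g and w_g; the conditional variance is 1."""
    return ybar / w2 ** 0.5
```

For example, `summary_statistics([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` gives a difference of means of 2 with sample variances 1 and 4.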
11BGX
Tail posterior probabilities 1 (N. Bochkina and SR, 2006)
Summarise the distribution of the $T_g$ by a tail area
• Use selection rules of the form:
  $P\{T_g > T_g^{(\alpha)} \mid y_{gsr}\} \geq p_{cut}$,
  where $T_g$ is a suitable function of parameters of interest, and $T_g^{(\alpha)}$ denotes the $1 - \alpha/2$ percentile of $T_g$ obtained under the null hypothesis.
• What ‘statistic’ to choose: $T_g = \delta_g$ or $t_g$?
• How to define its percentiles?
  – we suppose that we could have observed data with $\bar{y}_g = 0$ (its expected value under the null)
  – work out the percentiles using posterior distributions conditional on $\bar{y}_g = 0$
12BGX
Tail posterior probabilities 2
Recall
  $f(\delta_g \mid \bar{y}_g, s^2_{gs}) = \int_0^\infty \frac{1}{w_g}\, \varphi\big((\delta_g - \bar{y}_g)/w_g\big)\, f(w_g^2 \mid s^2_{gs})\, d(w_g^2)$
The corresponding distribution function involves numerical integration: computationally expensive
• But $t_g \mid \bar{y}_g = 0, w_g \sim N(0, 1)$
  The distribution function of $f(t_g \mid \bar{y}_g = 0, s^2_{gs})$ does not involve gene specific parameters
The percentile $t_g^{(\alpha)} = t^{(\alpha)} = \Phi^{-1}(1 - \alpha/2)$ is easy to calculate
Consider the tail probability:
  $p(t_g, t^{(\alpha)}) = P\{|t_g| > t^{(\alpha)} \mid y_{gsr}\}$
13BGX
[Figure: density of $p(t_g, t^{(\alpha)})$ for $\alpha = 0.05$, for data generated under the null hypothesis with $\sigma_{g1} = \sigma_{g2}$; labels: 5%, 0.5%]
Can simulate or numerically evaluate $F_0$, the distribution of $p(t_g, t^{(\alpha)})$ under $H_0$
Key point: $F_0$ is gene independent (conjugate case)
Another Bayesian rule
• A natural idea is to compare the parameter $\delta_g$ to 0, i.e. to consider:
  $p(\delta_g, 0) = P\{\delta_g > 0 \mid y_{gsr}\}$
  or its complementary, or the 2-sided alternative:
  $\max\{p(\delta_g, 0),\ 1 - p(\delta_g, 0)\}$
• It turns out that this Bayesian selection rule behaves like a “p-value”:
  – The distribution of $p(\delta_g, 0)$ is uniform under $H_0$
  – There is equivalence with frequentist testing based on the marginal distribution of $\bar{y}_g$ under the null, in the spirit of the moderated t statistic introduced by Smyth 2004
15BGX
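Link between $p(\delta_g, 0)$ and the moderated t statistic
Suppose $\sigma^2_{g1} = \sigma^2_{g2} = \sigma^2_g$ with conjugate prior InvGamma$(a, b)$
  $f(\delta_g \mid \bar{y}_g, s^2_{gs}) = \int_0^\infty \frac{1}{w_g}\, \varphi\big((\delta_g - \bar{y}_g)/w_g\big)\, f(w_g^2 \mid s^2_{gs})\, d(w_g^2)$
The distribution of $w_g$ is conjugate, so
  $\delta_g \mid y_{gr}, a, b \sim t(2a + m_1 + m_2 - 2,\ \bar{y}_g,\ S_g/\sqrt{k})$
where $S_g^2 = E(\sigma^2_g \mid y_{gr}, a, b) = \dfrac{2b + s_g^2 (m_1 + m_2 - 2)}{2a + m_1 + m_2 - 2}$ is the Bayesian posterior estimate of the variance and $k = (m_1^{-1} + m_2^{-1})^{-1}$.
  $p(\delta_g, 0) = P\{\delta_g > 0 \mid y_{gr}, a, b\} = F_{T(2a + m_1 + m_2 - 2)}(\bar{y}_g \sqrt{k} / S_g) = 1 - P\{t > t_g^{mod} \mid H_0\}$
where $t_g^{mod} = \bar{y}_g \sqrt{k} / S_g$ is the moderated t statistic
Under $H_0$, the distribution of $p(\delta_g, 0)$ is uniform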
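The link between p(δ_g, 0) and the moderated t statistic can be checked numerically. The sketch below follows the conjugate formulas on this slide, with k taken as 1 / (1/m1 + 1/m2) so that the posterior scale of δ_g is S_g / √k. The Student t distribution function is computed by Simpson's rule to keep the example free of third party libraries. All function names are hypothetical.

```python
import math

def student_t_cdf(x, df, n_steps=2000):
    """CDF of a standard Student t, integrating the density on [0, |x|] by Simpson's rule."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    upper = abs(x)
    h = upper / n_steps                      # n_steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, n_steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def p_delta_positive(ybar, s2_pooled, m1, m2, a, b):
    """Return (P(delta_g > 0 | data), moderated t) in the conjugate equal variance case."""
    nu = m1 + m2 - 2
    df = 2 * a + nu
    S2 = (2 * b + s2_pooled * nu) / df       # Bayesian posterior estimate of the variance
    k = 1 / (1 / m1 + 1 / m2)
    t_mod = ybar * math.sqrt(k / S2)         # moderated t statistic
    return student_t_cdf(t_mod, df), t_mod
```

A gene with no observed difference gets probability 0.5, and swapping the sign of the observed difference maps the probability p to 1 − p, which is why the 2-sided version uses max{p, 1 − p}.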
16BGX
Histograms of measure of differential expression (simulated data)
[Figure: histograms of $p(t_g, t_g^{(\alpha)})$ and $p(\delta_g, 0)$, under $H_0$ and under $H_1$]
17BGX
Tail posterior probabilities 3
• Investigate the performance of selection rules based on
  $p(t_g, t^{(\alpha)}) = P\{|t_g| > t^{(\alpha)} \mid y_{gsr}\} > p_{cut}$
In particular:
  – what is the FDR associated with each value of $p_{cut}$?
  – In the conjugate case:
    $FDR(p_{cut}) = \dfrac{\pi_0\, P\{p(t_g, t^{(\alpha)}) > p_{cut} \mid H_0\}}{P\{p(t_g, t^{(\alpha)}) > p_{cut}\}}$
    For $\pi_0$: use Storey. For the numerator probability: use $F_0$. For the denominator: use the observed proportion.
  – How does this rule compare to rules based on $p(\delta_g, 0)$?
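The FDR formula above translates directly into a plug-in estimate. In this sketch the null tail probability is approximated from a sample of tail posterior probabilities simulated under H0 (standing in for F0), the denominator is the observed proportion of genes above the cut-off, and π0 is taken as an input; how π0 is estimated is outside the sketch. The function name and the convention of returning 0 when no gene is selected are assumptions.

```python
def estimate_fdr(observed_tpp, null_tpp, pi0, p_cut):
    """Plug-in estimate of FDR(p_cut) = pi0 * P(tpp > p_cut | H0) / P(tpp > p_cut)."""
    null_tail = sum(p > p_cut for p in null_tpp) / len(null_tpp)
    observed_tail = sum(p > p_cut for p in observed_tpp) / len(observed_tpp)
    if observed_tail == 0:
        return 0.0                      # no gene selected; convention assumed here
    return min(1.0, pi0 * null_tail / observed_tail)
```

With a quarter of the null sample and three quarters of the observed values above the cut-off, and π0 = 0.6, the estimate is 0.6 × 0.25 / 0.75 = 0.2.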
18BGX
Comparison of estimated (solid line) and “true” FDR (dashed line) on simulated data
[Figure panels: π0 = 0.95, π0 = 0.90, π0 = 0.70]
19BGX
III-- Data Sets and Biological questions
Biological Questions
• Understand the mechanisms of insulin resistance
• Cell line experiments where the reaction of a mouse muscle cell line to treatment by insulin or metformin (an insulin replacement drug) is observed after 2 and 12 hours
• Questions of interest related to simple and compound comparisons
3 replicates for each condition, Affymetrix MOE430A chip, 22690 genes per chip
Data pre-processed by RMA and normalised using intensity dependent LOESS normalisation
20BGX
Volcano plots for muscle cell data: change between insulin and control at 2 hours
[Panel: $p(t_g, t^{(\alpha)})$, $\alpha = 0.05$] Less peaked around zero; allows better separation
[Panel: $2\max\{p(\delta_g, 0),\ 1 - p(\delta_g, 0)\} - 1$] Peaked around zero; varies steeply as a function of $\bar{y}_g$
Cut-off: 0.925
21BGX
Insulin versus control
Tail posterior probabilities and estimated FDR
             2 hours                  12 hours
  π0         0.61                     0.98
  Selected   1151 (FDR = 0.5%)        13 (FDR = 0.5%)
22BGX
Metformin versus control
Tail posterior probabilities and estimated FDR
             2 hours                  12 hours
  π0         0.56                     0.79
  Selected   1854 (FDR = 0.5%)        72 (FDR = 0.5%)
23BGX
IV – Extension to the analysis of multi class data
• In our case study, 3 groups (control c=0, insulin c=1, metformin c=2) and 3 time points: t=0, t=1 (2 hours), t=2 (12 hours), each replicated 3 times
• ANOVA like model formulation suited to the analysis of such multifactorial experiments:
  $y_{gtcr} = \alpha_g + \gamma_{gt} + \delta_{gtc} + \varepsilon_{gtcr}, \quad \varepsilon_{gtcr} \sim N(0, \sigma^2_g)$
  $\alpha_g, \gamma_{gt}, \delta_{gtc} \sim 1, \quad t, c = 1, 2$
  $\sigma_g^{-2} \sim \Gamma(a, b)$
Global variance parametrisation (borrowing information)
24BGX
Joint tail posterior probabilities
• Interest is in testing a compound null hypothesis, i.e. one involving several differential parameters
  e.g. testing jointly for the effect of insulin and metformin at 2 hours
• In this case, we are interested in a specific alternative:
  $H_0^{(g)}: \delta_{g11} = 0\ \&\ \delta_{g12} = 0$ versus $H_1^{(g)}: \delta_{g11} \neq 0\ \&\ \delta_{g12} \neq 0$
  Note: Rejecting the null hypothesis in an ANOVA setting corresponds to a different alternative
• Define joint tail posterior probabilities:
  $p_g^J = P\{|t_{g11}| > t^{(\alpha)}\ \&\ |t_{g12}| > t^{(\alpha)} \mid \text{data}\}$
  where $t_{gtc} = \delta_{gtc} / w_g$ is the Bayesian T statistic for each treatment
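A joint tail posterior probability of this kind can be approximated by simulation. The sketch below is a simplification, not the model fitted in the talk: it looks at a single time point with three groups (control, insulin, metformin) of m replicates each, flat priors on the group means, a Gamma(a, b) prior on the common precision, and takes each differential parameter as the treatment mean minus the control mean. The function name and arguments are hypothetical.

```python
import math
import random
from statistics import NormalDist

def joint_tail_posterior_probability(ybar_control, ybar_insulin, ybar_metformin,
                                     s2_pooled, m, a, b,
                                     alpha=0.05, n_draws=4000, seed=0):
    """Monte Carlo estimate of P(|t_1| > t_alpha and |t_2| > t_alpha | data)."""
    rng = random.Random(seed)
    t_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    nu = 3 * (m - 1)                          # residual degrees of freedom, 3 groups
    shape, rate = a + nu / 2, b + nu * s2_pooled / 2
    hits = 0
    for _ in range(n_draws):
        sigma2 = 1 / rng.gammavariate(shape, 1 / rate)    # shared variance draw
        sd_mean = math.sqrt(sigma2 / m)
        mu0 = rng.gauss(ybar_control, sd_mean)            # posterior draws of group means
        mu1 = rng.gauss(ybar_insulin, sd_mean)
        mu2 = rng.gauss(ybar_metformin, sd_mean)
        w = math.sqrt(2 * sigma2 / m)                     # sd of a difference of two means
        t1, t2 = (mu1 - mu0) / w, (mu2 - mu0) / w         # Bayesian T statistics
        hits += abs(t1) > t_alpha and abs(t2) > t_alpha
    return hits / n_draws
```

Because both statistics use the same variance draw (and, in this simplified setting, the same control mean), they are positively correlated, which is the dependence the joint probability accounts for and an intersection of two separate gene lists ignores.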
25BGX
Benefits of joint posterior probabilities
• Takes into account correlation of the differential expression measures between the conditions induced by sharing the same variance parameter
• Usual practice is to:
  – Carry out pairwise comparisons
  – Select genes for each comparison using the same cut-off on the pp
  – Intersect lists and find genes common to both lists
• Joint pp shown to lead to fewer false positives in this case of positive correlation (simulation study)
26BGX
Correlation of DE parameters and Bayesian T statistic for insulin and metformin (2 hours)
• With joint tail posterior probabilities and a cut-off of $p_{cut} = 0.92$, 280 genes are selected as jointly perturbed at 2 hours
• Applying pairwise comparison and combining the lists adds another 47 genes to the list
[Figure panels: correlation between $t_{g11}$ and $t_{g12}$; correlation between $\delta_{g11}$ and $\delta_{g12}$]
27BGX
Discussion 1
• Tail posterior probabilities (Tpp) are a generic tool that can be used in any situation where a large number of hypotheses related in a hierarchical fashion are to be tested
• We have derived the distribution of the Tpp under the null and proposed a corresponding estimate of FDR
• This distribution requires numerical integration but is gene independent (conjugate case), so only needs to be evaluated once
• Tpp is a smooth function of the amount of DE with a gradient that “spreads” the genes, thus making it possible to choose genes with a desired level of uncertainty about their DE
• Interesting connection between Bayesian and frequentist inference for the differential expression parameter
28BGX
Discussion 2
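• Interesting to compare performance of Tpp with that of mixture models
• E.g. Gamma mixtures (see poster by Alex Lewin)
  $\delta_g \sim \pi_0 \delta_0 + \pi_1 G(-x \mid 1.5, \eta_1) + \pi_2 G(x \mid 1.5, \eta_2)$
  ($\pi_0 \delta_0$: $H_0$; Gamma components: $H_1$)
  Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$
  Exp(1) hyper prior for $\eta_1$ and $\eta_2$
Also Normal and t mixtures have been considered:
  $\delta_g \sim \pi_0 \delta_0 + (1 - \pi_0)\, T(\nu, \mu, \tau)$ (μ ~ 1, τ, ν -1 ~ Exp(1))
  $\delta_g \sim \pi_0 \delta_0 + (1 - \pi_0)\, N(\mu, \tau)$ (μ ~ 1, τ ~ Exp(1))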
29BGX
Simulated data
• 3000 variables, 6 replicates, 2 conditions
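  $y_{g1r} \sim N(\alpha_g,\ \sigma_g^2)$
  $y_{g2r} \sim N(\alpha_g + \delta_g,\ \sigma_g^2)$
  $\sigma_g^2 \sim 0.03 + \mathrm{LogNorm}(-3.85, 0.82)$
  $\alpha_g \sim \mathrm{Norm}(7, 25)$
• $\delta_g$: slightly asymmetric:
  5%: $\delta_g \mid \delta_g > 0 \sim h(\delta_g)$
  10%: $\delta_g \mid \delta_g < 0 \sim h(-\delta_g)$
  85%: $\delta_g \sim N(0, 0.01)$
  where $h(|\delta_g| \mid \delta_g \neq 0) = 0.2\, U[0, 2.5] + 0.4\, U[0.07, 0.7] + 0.4\, N(0.7, 0.7)$.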
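A data set of this shape can be generated with the standard library alone. The sketch below makes several assumptions that the slide does not settle: the second argument of every Norm and LogNorm is read as a variance, "6 replicates" is read as 6 per condition, and draws from the N(0.7, 0.7) component of h are folded to be non-negative. Function and field names are hypothetical.

```python
import math
import random

def draw_abs_effect(rng):
    """One draw of |delta_g| from h = 0.2 U[0, 2.5] + 0.4 U[0.07, 0.7] + 0.4 N(0.7, 0.7)."""
    u = rng.random()
    if u < 0.2:
        return rng.uniform(0.0, 2.5)
    if u < 0.6:
        return rng.uniform(0.07, 0.7)
    return abs(rng.gauss(0.7, math.sqrt(0.7)))   # folding at 0 is an assumption

def simulate(n_genes=3000, n_reps=6, seed=1):
    """Simulate log expression for two conditions; returns one dict per gene."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        sigma2 = 0.03 + rng.lognormvariate(-3.85, math.sqrt(0.82))
        alpha = rng.gauss(7.0, math.sqrt(25.0))
        u = rng.random()
        if u < 0.05:                               # 5%: positive effects
            delta, is_de = draw_abs_effect(rng), True
        elif u < 0.15:                             # 10%: negative effects
            delta, is_de = -draw_abs_effect(rng), True
        else:                                      # 85%: near-zero effects
            delta, is_de = rng.gauss(0.0, math.sqrt(0.01)), False
        sd = math.sqrt(sigma2)
        y1 = [rng.gauss(alpha, sd) for _ in range(n_reps)]
        y2 = [rng.gauss(alpha + delta, sd) for _ in range(n_reps)]
        genes.append({"delta": delta, "is_de": is_de, "y1": y1, "y2": y2})
    return genes
```

Keeping the `is_de` flag for each gene makes it possible to compare the estimated FDR of any selection rule with the proportion of false discoveries actually made on the simulated data.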
30BGX
Comparison of mixture and tail pp
• Fit 3 mixture models (Gamma, Normal, t alternative) and flat model.
• Classification: mixtures use P{H1 | data}; the flat model uses the tail posterior probability.
Comparable performance, with a slight edge for the Gamma and Normal mixtures
31BGX
BBSRC Exploiting Genomics grant
Wellcome Trust BAIR consortium
Colleagues in the Biostatistics group: Marta Blangiardo, Anne Mette Hein, Maria de Iorio
Colleagues in the Biology group at Imperial: Tim Aitman, Ulrika Andersson, Dave Carling
Papers and technical reports: www.bgx.org.uk/
For the tail probability paper: www.bgx.org.uk/Natalia/Bochkina.ps
Thanks