A Microarray-Based Screening Procedure for Detecting Differentially
Represented Yeast Mutants
Rafael A. Irizarry
Department of Biostatistics, JHU
http://biostat.jhsph.edu/~ririzarr
[Figure: experimental workflow. Labels: kanR, UPTAG, DOWNTAG; (A) circular pRS416 and (B) EcoRI-linearized pRS416 (MCS, CEN/ARS, URA3); transformation into deletion pool; select for Ura+ transformants; genomic DNA preparation; PCR; Cy5-labeled and Cy3-labeled PCR products; oligonucleotide array hybridization; NHEJ defective.]
Which mutants are NHEJ defective?
• Find mutants defective for transformation with linear DNA
• Dead in linear transformation (green)
• Alive in circular transformation (red)
• Look for spots with large log(R/G)
[Figure labels: YKU70, NEJ1, YKU80]
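The screening idea above (look for spots with large log(R/G)) can be sketched in a few lines. This is a minimal illustration, not the procedure used in the talk: the base-2 logarithm, the function name `rank_by_log_ratio`, and the intensities are all assumptions made for the example.

```python
import math

def rank_by_log_ratio(spots):
    """Return (name, log2(R/G)) pairs sorted from largest to smallest log ratio."""
    scored = [(name, math.log2(red / green)) for name, red, green in spots]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical intensities, not real data:
# (mutant name, red = circular transformation, green = linear transformation)
spots = [
    ("mutant_a", 1200.0, 1100.0),
    ("mutant_b", 1600.0, 100.0),   # alive in circular, nearly dead in linear
    ("mutant_c", 900.0, 950.0),
]

ranked = rank_by_log_ratio(spots)
for name, log_ratio in ranked:
    print(f"{name}: log2(R/G) = {log_ratio:.2f}")
```

Mutants at the top of the ranking are the candidates for an NHEJ defect; a ratio alone says nothing about whether R and G are individually high or low.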
Data
• 5718 mutants
• 3 replicates on each slide
• 5 Haploid slides, 4 Diploid slides
• Arrays are divided into 2 downtag and 3 uptag (2 of which are replicate uptags)
Average Red and Green Scatter Plot
Average Red and Green MVA plot
Improvement to usual approach
• Take into account that some mutants are dead and some alive
• Use a statistical model to represent this
• Mixture model?
• With ratios we lose information about R and G separately
• Look at them separately (absolute analysis)
Histograms
Using the model we can attach uncertainty to tests
For example, a posterior z-test:
a weighted average of z-tests, with weights given by the posterior probabilities (obtained from EM),
is Normal(0,1)
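One possible reading of the posterior z-test is sketched below. The slides do not give the formula, so everything here is an assumption: each class gets its own standard deviation, a z-statistic is computed under each class, and the statistics are averaged with the posterior class probabilities as weights. The function name and the numbers are hypothetical.

```python
import math

def posterior_weighted_z(mean_diff, n, class_sds, posteriors):
    """Average per-class z-statistics using posterior class probabilities.

    mean_diff  : average of the n replicate differences for one mutant
    class_sds  : assumed standard deviation of a single difference in each class
    posteriors : posterior probability of each class (must sum to 1)
    """
    if abs(sum(posteriors) - 1.0) > 1e-8:
        raise ValueError("posterior probabilities must sum to 1")
    # z-statistic the mutant would get if it belonged to each class
    z_by_class = [mean_diff / (sd / math.sqrt(n)) for sd in class_sds]
    return sum(p * z for p, z in zip(posteriors, z_by_class))

# Hypothetical example: two classes with different variability.
z = posterior_weighted_z(mean_diff=1.0, n=4, class_sds=[1.0, 2.0],
                         posteriors=[0.5, 0.5])
print(z)
```

The point of the weighting is that the variance used in the test reflects how likely the mutant is to be in each class, rather than a single pooled variance.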
QQ-Plot
Uptag/Downtag Z-Scores
Average Red and Green MVA Plot
Average Red and Green Scatter Plot
Results Table
 1  YMR106C   9.5  47    69.2  a  a  100
 2  YOR005C  19.7  35    44.9  a  d  100
 3  YLR265C   6.1  32    35.8  a  m  100
 4  YDL041W  10.4  32    35.6  a  m  100
 5  YIL012W  12.2  31    21.7  a  a  100
 6  YIL093C   4.8  29    30.8  a  a  100
 7  YIL009W   5.6  29   -23.5  a  a  100
 8  YDL042C  12.9  29    32.1  a  d  100
 9  YIL154C   1.8  28    91.3  m  m   82
10  YNL149C   1.7  27    93.4  m  d   71
11  YBR085W   2.5  26   -15.8  a  a   84
12  YBR234C   1.7  26    87.5  m  d   75
13  YLR442C   6.1  26  -100.0  a  a  100
Acknowledgements
• Siew Loon Ooi
• Jef Boeke
• Forrest Spencer
• Jean Yang
END
Summary
• Simple data exploration is a useful tool for quality assessment
• Statistical thinking is helpful for interpretation
• Statistical models may help find signals in noise
Acknowledgements
UC Berkeley Stat: Ben Bolstad, Sandrine Dudoit, Terry Speed, Jean Yang
MBG (SOM): Jef Boeke, Siew-Loon Ooi, Marina Lee, Forrest Spencer
Biostatistics: Karl Broman, Leslie Cope, Carlo Coulantoni, Giovanni Parmigiani, Scott Zeger
Gene Logic: Francois Colin, Uwe Scherf’s Group
PGA: Tom Cappola, Skip Garcia, Joshua Hare
WEHI: Bridget Hobbs, Natalie Thorne
Warning
• Absolute analyses can be dangerous for competitive hybridization slides
• We must be careful about “spot effect”
• A large R or G may only mean that the spot they were on had a large amount of cDNA
• Look at some facts that make us feel safer
Correlation between replicates
      R1    R2    R3    G1    G2    G3
R1  1.00  0.95  0.95  0.94  0.90  0.90
R2  0.95  1.00  0.96  0.90  0.95  0.91
R3  0.95  0.96  1.00  0.91  0.92  0.95
G1  0.94  0.90  0.91  1.00  0.96  0.96
G2  0.90  0.95  0.92  0.96  1.00  0.97
G3  0.90  0.91  0.95  0.96  0.97  1.00
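A table like the one above can be produced by computing the Pearson correlation between every pair of replicate columns. The sketch below assumes Pearson correlation (the slides do not say which correlation was used) and uses made-up values; `pearson` and `correlation_matrix` are illustrative names.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical log intensities for four spots; not real data.
columns = {
    "R1": [1.0, 2.0, 3.0, 4.0],
    "R2": [1.1, 2.0, 2.9, 4.2],
    "G1": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(columns)
for name, row in corr.items():
    print(name, " ".join(f"{value:5.2f}" for value in row.values()))
```

High correlation between R and G replicates from the same spot is what one would expect under a spot effect, which is why the within-channel and between-channel entries are worth comparing.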
Correlation between red, green, haploid, diploid, uptag, downtag
      RHD   RHU   RDD   RDU   GHD   GHU   GDD   GDU
RHD  1.00  0.59  0.56  0.32  0.95  0.58  0.54  0.37
RHU  0.59  1.00  0.38  0.56  0.58  0.95  0.40  0.58
RDD  0.56  0.38  1.00  0.58  0.54  0.39  0.92  0.64
RDU  0.32  0.56  0.58  1.00  0.33  0.53  0.58  0.89
GHD  0.95  0.58  0.54  0.33  1.00  0.62  0.56  0.39
GHU  0.58  0.95  0.39  0.53  0.62  1.00  0.41  0.58
GDD  0.54  0.40  0.92  0.58  0.56  0.41  1.00  0.73
GDU  0.37  0.58  0.64  0.89  0.39  0.58  0.73  1.00
BTW
The mean squared error across slides is about 3 times bigger than the mean squared error within slides
Mixture Model
We use a mixture model that assumes:
• There are three classes:
– Dead
– Marginal
– Alive
• Intensities are normally distributed, with the same correlation structure from gene to gene
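To make the three-class idea concrete, here is a heavily simplified sketch: a one-dimensional normal mixture with three components fitted by EM. The model on the slide is multivariate with a shared correlation structure; this example drops that and treats a single log intensity per mutant, so it illustrates only the mechanics. All names, starting values, and data are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def e_step(values, means, sds, weights):
    """Posterior probability of each class for each value."""
    posteriors = []
    for x in values:
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        posteriors.append([d / total for d in dens])
    return posteriors

def em_mixture(values, means, sds, weights, n_iter=50):
    """Fit a 1-D normal mixture by EM, starting from the given parameters."""
    means, sds, weights = list(means), list(sds), list(weights)
    for _ in range(n_iter):
        post = e_step(values, means, sds, weights)
        # M-step: update each class from its posterior-weighted data.
        for j in range(len(means)):
            resp = [p[j] for p in post]
            n_j = sum(resp)
            weights[j] = n_j / len(values)
            means[j] = sum(r * x for r, x in zip(resp, values)) / n_j
            var = sum(r * (x - means[j]) ** 2 for r, x in zip(resp, values)) / n_j
            sds[j] = max(math.sqrt(var), 1e-3)  # floor keeps a class from collapsing
    return means, sds, weights, e_step(values, means, sds, weights)

# Hypothetical log intensities: dead near 6, marginal near 9, alive near 12.
values = [5.8, 6.0, 6.2, 6.1, 5.9, 8.9, 9.0, 9.1, 11.8, 12.0, 12.2, 12.1, 11.9]
means, sds, weights, posteriors = em_mixture(
    values, means=[6.0, 9.0, 12.0], sds=[1.0, 1.0, 1.0], weights=[1 / 3] * 3
)
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

The posterior probabilities returned in the last element are the quantities that can later serve as weights when attaching uncertainty to a test.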
Random effect justification
Each x = (r1,…,r5,g1,…,g5) will have the following effects:
• Individual effect: same mutant, same expression (replicates are alike)
• Genetic effect: same genetics, same expression
• PCR effect: expect differences between uptag and downtag
Does it fit?
What can we do now that we couldn’t do before?
• Define a t-test that takes into account whether mutants are dead or not when computing the variance
• For each gene, compute likelihood ratios comparing two hypotheses: alive/dead vs. dead/dead or alive/alive
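The likelihood ratio in the last bullet can be illustrated with a toy calculation. This sketch makes strong assumptions that are not in the slides: one red and one green log intensity per gene, known normal densities for the dead and alive classes, independence of the two channels given the class labels, and equal weights on the two null configurations. All names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(r, g, dead, alive, null_weights=(0.5, 0.5)):
    """Likelihood of alive/dead relative to a mix of dead/dead and alive/alive.

    r, g  : log intensities for the circular (red) and linear (green) channels
    dead  : (mean, sd) assumed for the dead class
    alive : (mean, sd) assumed for the alive class
    """
    f_dead = lambda x: normal_pdf(x, *dead)
    f_alive = lambda x: normal_pdf(x, *alive)
    numerator = f_alive(r) * f_dead(g)          # alive in circular, dead in linear
    w_dd, w_aa = null_weights
    denominator = (w_dd * f_dead(r) * f_dead(g)
                   + w_aa * f_alive(r) * f_alive(g))
    return numerator / denominator

dead, alive = (6.0, 1.0), (12.0, 1.0)           # hypothetical class parameters
lr_candidate = likelihood_ratio(12.0, 6.0, dead, alive)   # looks alive/dead
lr_ordinary = likelihood_ratio(12.0, 12.0, dead, alive)   # looks alive/alive
print(lr_candidate, lr_ordinary)
```

A large ratio favours the alive/dead configuration, which is the pattern expected for an NHEJ-defective mutant; a ratio well below 1 favours one of the two uninteresting configurations.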
QQ-plot for new t-test
Better looking than others