Date posted: 29-Jul-2015
Category: Health & Medicine
Uploaded by: jtleek
fixing the leaks in the genomics
http://jhudatascience.org/
https://www.coursera.org/specialization/genomics/41
@simplystats http://simplystatistics.org
@jtleek http://www.jtleek.com
https://www.counsyl.com/
Their basic pitch was “Genomics is a fraud”
http://www.technologyreview.com/news/535771/a-contrarian-in-biotech/
“The explosive growth of next-generation sequencing data submitted into the SRA exceeds the growth rate of storage capacity”
http://www.ncbi.nlm.nih.gov/pubmed/22009675
3
cost
analyst variation
motivation
1 cost
costs
money
interpretability
http://arxiv.org/pdf/math/0606441.pdf
http://www.ncbi.nlm.nih.gov/pubmed/19276151
@leekgroup
http://www.ncbi.nlm.nih.gov/pubmed/25788628
[Figure: accuracy (0% to 100%) of the PAM Scaled, PAM Unscaled, and TSP predictors; panels: Agilent/Grade 1, Agilent/Grade 3, Illumina/Grade 1, Illumina/Grade 3]
http://www.ncbi.nlm.nih.gov/pubmed/25788628
algorithm
1. select useful pairs
2. screen pairs for association
3. build a simple CART predictor
http://www.ncbi.nlm.nih.gov/pubmed/19276151
Patil et al. (in prep)
Data:
$x_{ik}$ - value for feature $i$, sample $k$
$y_k$ - group indicator for sample $k$
TSP is the $(i,j)$ pair that maximizes:
$$\left| \widehat{\Pr}(x_{ik} < x_{jk} \mid y_k = 1) - \widehat{\Pr}(x_{ik} < x_{jk} \mid y_k = 0) \right|$$
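The score above can be sketched in a few lines. This is a minimal illustration that estimates each probability by a sample proportion; the function names, the dictionary layout for the features, and the toy data are all made up for the example and do not come from the talk.

```python
from itertools import combinations


def tsp_score(x_i, x_j, y):
    """Estimate |Pr(x_i < x_j | y=1) - Pr(x_i < x_j | y=0)| by sample proportions."""
    z = [1 if a < b else 0 for a, b in zip(x_i, x_j)]
    n1 = sum(1 for g in y if g == 1)
    n0 = sum(1 for g in y if g == 0)
    p1 = sum(zk for zk, g in zip(z, y) if g == 1) / n1
    p0 = sum(zk for zk, g in zip(z, y) if g == 0) / n0
    return abs(p1 - p0)


def top_scoring_pair(X, y):
    """X maps a feature name to its per-sample values; returns (score, (i, j))."""
    return max(
        (tsp_score(X[i], X[j], y), (i, j)) for i, j in combinations(sorted(X), 2)
    )


# Hypothetical toy data: three features, six samples, two groups.
X = {
    "g1": [1.0, 2.0, 1.5, 5.0, 6.0, 5.5],
    "g2": [3.0, 4.0, 2.5, 2.0, 1.0, 3.5],
    "g3": [0.5, 0.7, 0.2, 0.9, 0.1, 0.3],
}
y = [1, 1, 1, 0, 0, 0]
best_score, best_pair = top_scoring_pair(X, y)
```

In this toy data g1 is below g2 in every group-1 sample and never in group 0, so that pair gets the maximum possible score.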
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1989150/
$z_{ijk} = 1(x_{ik} < x_{jk})$
$E[z_{ijk} \mid y_k] = a_{0ij} + a_{1ij} y_k$
$\rightarrow \max_{(i,j)} |a_{1ij}| = \text{TSP}$
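The link between the regression and the TSP score rests on a standard fact: when the predictor is a 0/1 group indicator, the least-squares slope equals the difference in group means, which here is the difference in proportions. A small numerical check, with made-up data and a hypothetical helper name, might look like this:

```python
def ols_slope(x, y):
    """Slope from a least-squares regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy data for one pair of features (i, j).
x_i = [1.0, 2.0, 1.5, 5.0, 6.0, 0.5, 2.2]
x_j = [3.0, 4.0, 1.0, 2.0, 7.0, 3.5, 2.0]
y = [1, 1, 1, 0, 0, 0, 0]

z = [1 if a < b else 0 for a, b in zip(x_i, x_j)]

# Slope a_1 from regressing the indicator z on the group label y.
a1 = ols_slope(y, z)

# Difference in proportions, as in the TSP score (before taking |.|).
p1 = sum(zk for zk, g in zip(z, y) if g == 1) / y.count(1)
p0 = sum(zk for zk, g in zip(z, y) if g == 0) / y.count(0)
diff = p1 - p0
```

Up to floating-point error, `a1` and `diff` agree, so maximizing the absolute slope over pairs picks the same pair as maximizing the TSP score.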
Patil et al. (in prep)
• Not the same as TSP
• But |â/s.e.(â)| = |û/s.e.(û)|, algebraically
• “Variance regularized” TSP
• $z_{ijk}$ invariant to monotone transformations
• Fix parameters → find features
$E[y_k \mid z_{ijk}] = u_{0ij} + u_{1ij} z_{ijk}$
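The algebraic identity between the two t-statistics holds for any simple linear regression, because the slope t-statistic depends only on the sample correlation and the sample size. The sketch below checks it numerically on made-up data; the helper name is hypothetical.

```python
import math


def slope_and_t(x, y):
    """Return (slope, slope / s.e.) for a least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    # Residual sum of squares around the fitted line.
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


# Hypothetical group labels and pair indicators for eight samples.
y = [1, 1, 1, 0, 0, 0, 0, 0]
z = [1, 1, 0, 0, 1, 1, 0, 0]

a1, t_a = slope_and_t(y, z)  # E[z | y] = a0 + a1 * y
u1, t_u = slope_and_t(z, y)  # E[y | z] = u0 + u1 * z
```

Here the two slopes differ (`a1` is 4/15 and `u1` is 1/4), yet the t-statistics coincide, which is consistent with the "not the same as TSP, but the same t-statistic" bullets.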
Patil et al. (in prep)
1. Calculate t-statistic for all pairs
2. Choose top pair (or covariate)
3. Continue for a fixed number of pairs
$E[y_k \mid z_{ijk}] = u_{0ij} + u_{1ij} z_{ijk}$
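A rough sketch of the selection steps follows. It is a simplification and an assumption on my part: pairs are ranked once by |t| and the top k are kept, whereas the slides may intend a stepwise procedure that conditions on pairs already chosen. Function names and data are hypothetical, and the data are assumed to contain both groups.

```python
import math
from itertools import combinations


def pair_abs_t(x_i, x_j, y):
    """|t| for u1 in E[y | z] = u0 + u1 * z with z = 1(x_i < x_j); None if z is constant."""
    z = [1.0 if a < b else 0.0 for a, b in zip(x_i, x_j)]
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((v - mz) ** 2 for v in z)
    if szz == 0:
        return None  # the indicator never varies, so the slope is undefined
    szy = sum((v - mz) * (w - my) for v, w in zip(z, y))
    slope = szy / szz
    rss = sum((w - my - slope * (v - mz)) ** 2 for v, w in zip(z, y))
    if rss < 1e-12:
        return math.inf  # the pair separates the groups perfectly
    return abs(slope) / math.sqrt(rss / (n - 2) / szz)


def top_pairs(X, y, k):
    """Rank all feature pairs by |t| and return the k highest-ranked pairs."""
    scored = []
    for i, j in combinations(sorted(X), 2):
        t = pair_abs_t(X[i], X[j], y)
        if t is not None:
            scored.append((t, (i, j)))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:k]]


# Hypothetical toy data: three features, eight samples.
X = {
    "g1": [1.0, 2.0, 1.5, 3.0, 5.0, 6.0, 5.5, 1.0],
    "g2": [3.0, 4.0, 2.5, 2.0, 2.0, 1.0, 3.5, 4.0],
    "g3": [2.0, 3.0, 2.0, 2.5, 3.0, 2.0, 4.0, 0.5],
}
y = [1, 1, 1, 1, 0, 0, 0, 0]
selected = top_pairs(X, y, k=1)
```

Because the indicator only depends on the ordering within each sample, the ranking would be unchanged under any monotone transformation of the measurements.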
Patil et al. (in prep)
http://astor.som.jhmi.edu/~marchion//breastTSP.html
[Figure: decision tree built from the gene pairs USP7 < RP11-423C15.3, NM_018610 < MTCH1, and RND1 < LGALS14; each split branches No/Yes; leaves: No Recur (three) and Recur (one)]
Mammaprint
Patil et al. (in prep)
2 analyst variation
what went wrong?
2 things
what went wrong? transparency
The data/code weren’t reproducible
what went wrong? transparency
There was a lack of cooperation
what went wrong? expertise
They used silly prediction rules
(Pr(FEC) = 5/8[Pr(F) + Pr(E) + Pr(C)] – ¼)
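One observation about the rule quoted above, offered as a quick arithmetic check rather than as the talk's own argument: nothing constrains its output to lie between 0 and 1. The function name below is hypothetical; the formula is the one on the slide.

```python
def pr_fec(pr_f, pr_e, pr_c):
    """The quoted rule: 5/8 * [Pr(F) + Pr(E) + Pr(C)] - 1/4."""
    return 5 / 8 * (pr_f + pr_e + pr_c) - 1 / 4


# Extreme inputs show the rule leaving the valid range for a probability.
high = pr_fec(1.0, 1.0, 1.0)  # above 1
low = pr_fec(0.0, 0.0, 0.0)   # below 0
```

With all three inputs at 1 the rule returns 1.625, and with all three at 0 it returns -0.25.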
what went wrong? expertise
They had study design problems
(Batch effects)
what went wrong? expertise
Their predictions weren’t locked down
Today: Pr(FEC) = 0.8
Tomorrow: Pr(FEC) = 0.1
At the end of the day the Potti analysis was fully reproducible
The problem is that the analysis was wrong
http://bit.ly/10vS1yt
http://bit.ly/OgW3xv
Drinkel et al. Organometallics 2013
http://simplystatistics.tumblr.com/post/19646774024/laws-of-nature-and-the-law-of-patents-supreme-court
3 motivation
$ (from reducing sample size)
basic idea
randomization isn’t perfect
“rebalance” with baseline covariates
improve estimator precision
Ack Math!!!!
1. Estimate the probability of being in each arm given baseline covariates
2. Calculate an initial estimate for each person under each arm model, using propensity-score-weighted logistic regression (WLR)
3. Define a covariate as the residual from fitting the arm-level models minus the arm-level means, and fit new propensity models
4. Use these propensities to re-fit the WLR from (2), then average predictions to get the covariate-adjusted treatment effect
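To illustrate only the "rebalance with baseline covariates" idea, here is a small simulation. It is not the four-step propensity-weighted procedure above: it swaps in a much simpler linear (ANCOVA-style) adjustment on a single covariate, and every number in it (sample size, effect, covariate strength) is an assumption chosen for the example.

```python
import random
import statistics


def one_trial(rng, n=100, effect=1.0, beta=2.0):
    """Simulate one two-arm trial; return (unadjusted, adjusted) effect estimates."""
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]  # baseline covariate
    arm = [0] * (n // 2) + [1] * (n - n // 2)
    rng.shuffle(arm)                              # randomized assignment
    y = [effect * a + beta * wk + rng.gauss(0.0, 1.0) for a, wk in zip(arm, w)]

    def arm_mean(values, group):
        picked = [v for v, a in zip(values, arm) if a == group]
        return sum(picked) / len(picked)

    w_bar = {g: arm_mean(w, g) for g in (0, 1)}
    y_bar = {g: arm_mean(y, g) for g in (0, 1)}

    # Pooled within-arm slope of the outcome on the covariate.
    sxy = sum((wk - w_bar[a]) * (yk - y_bar[a]) for wk, yk, a in zip(w, y, arm))
    sxx = sum((wk - w_bar[a]) ** 2 for wk, a in zip(w, arm))
    slope = sxy / sxx

    unadjusted = y_bar[1] - y_bar[0]
    # Remove the part of the difference explained by chance covariate imbalance.
    adjusted = unadjusted - slope * (w_bar[1] - w_bar[0])
    return unadjusted, adjusted


rng = random.Random(0)
results = [one_trial(rng) for _ in range(500)]
var_unadjusted = statistics.variance([u for u, _ in results])
var_adjusted = statistics.variance([a for _, a in results])
```

Across the simulated trials the adjusted estimates vary less than the unadjusted ones, which is the sense in which adjustment can buy precision, or equivalently a smaller sample size for the same precision.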
http://astor.som.jhmi.edu/~marchion//breastTSP.html
Age, Tumor Size, Grade: 5.1%
Age, Tumor Size, Grade, ER Status: 4.9%
Mammaprint Risk Category (MRC): 5.4%
Age, Tumor Size, Grade, ER Status, MRC: 7.8%
Age, Tumor Size, Grade, ER Status, TSP: 6.2%
3
cost
analyst variation
motivation
acknowledgements
Leek group: Prasad Patil, Leo Collado Torres, Abhi Nellore, Claire Ruberman, Jack Fu, Kai Kammers
Collaborators: Michael Rosenblum, Benjamin Haibe-Kains, P.O. Bachant-Winner, Roger Peng
Prasad Patil http://www.biostat.jhsph.edu/~prpatil/
Links
https://github.com/leekgroup/sig2trial
http://jtleek.com/talks/