3rd Summer School in Computational Biology
September 8, 2014
Frank Emmert-Streib
Computational Biology and Machine Learning Laboratory
Center for Cancer Research and Cell Biology, Queen’s University Belfast, UK
Organizers of the summer school
General questions: Frank Emmert-Streib ([email protected]), Shu-Dong Zhang ([email protected])
Lecturers of the summer school
Shailesh Tripathi
Alexey Stupnikov
and Kevin Keenan, David Simpson, Caroline Meharg, Myrto Kostadima, Bori Mifsud
History of the summer school
[Figure: bar chart of the number of participants per year: 18 (2012), 25 (2013), 35 (2014).]
Organizational notes
• Coffee breaks (short - foyer)
• Lunch (1 hour)
• Sign-in sheets
• Internet access:
  – Students from QUB: Use your QUB account
  – External students: Guest account
Shailesh Tripathi
What will we learn?
• different high-throughput data types:
  – Microarray data
  – Sequencing data (DNA-seq, RNA-seq, ChIP-seq)
• basic statistics and machine learning methods
  – Hypothesis testing
  – Supervised & unsupervised learning
• basic data visualization
• importance of large-scale data in modern biology (systems biology)
Vision of the VC
Universities require interdisciplinary engagement in the educational and research effort.
Professor Patrick Johnston, President and Vice-Chancellor (VC) of Queen’s University
What will we not learn? (Adjusting expectations)
• Example:
  – When learning a foreign language, how much can you learn in 3 days?
• Analogy:
  – programming language
  – statistics/machine learning
  – biology
The time it takes to become proficient in computational biology is comparable to the time to learn a language.
Good news!
• The summer school in computational biology provides you with a guided start.
• If you are from Belfast:
  – Journal club: computational biology and biostatistics (every Monday in the HSB, 3pm)
  – Degree: MSc in Computational Genomics & Bioinformatics
  – General problems/questions: Frank Emmert-Streib
What is reproducible research?
Reproducible research means that an entire study can be reproduced, either by the same researcher or by an independent researcher.
In this context is important.
Example
To understand the meaning of reproducible research, let’s consider the following example.
Task: Produce the figure.
Approach: Adobe Illustrator, GIMP, CorelDRAW, PowerPoint
Summary:
– How long did it take? t = 30 min
– How did you do it? Describe it in a report.
Example
When you publish results, e.g.,
and someone wants to repeat the same or a similar analysis:
– How long does a re-analysis take? 30 min
– How is a re-analysis done? That depends on the report you provided and the availability of the software.
Alternative way to generate results
Create the figure by writing a program.
• LaTeX
• freely available
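To make the idea concrete, here is a minimal sketch of a figure produced entirely by a program. It is not the figure from the slides: it uses Python rather than LaTeX or R purely for illustration, draws a bar chart of the participant numbers from the history slide (the pairing of counts to years is assumed from the order in which they appear), and the function name `bar_chart_svg` and its layout parameters are hypothetical.

```python
# Participant counts read off the "History of the summer school" chart.
# Assumption: the values 18, 25, 35 belong to 2012, 2013, 2014 in that order.
PARTICIPANTS = {2012: 18, 2013: 25, 2014: 35}


def bar_chart_svg(counts, bar_width=60, gap=30, height=200, y_max=40):
    """Return a bar chart as an SVG string.

    The output depends only on the arguments, so anyone who runs the
    program with the same data obtains exactly the same figure.
    """
    margin = 30
    width = margin + len(counts) * (bar_width + gap)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height + 2 * margin}">'
    ]
    for i, (year, n) in enumerate(sorted(counts.items())):
        bar_height = height * n / y_max      # scale the count to pixels
        x = margin + i * (bar_width + gap)   # left edge of this bar
        y = margin + height - bar_height     # SVG y axis points downwards
        centre = x + bar_width / 2
        parts.append(
            f'<rect x="{x}" y="{y:.1f}" width="{bar_width}" '
            f'height="{bar_height:.1f}" fill="steelblue"/>'
        )
        # Value label above the bar, year label below the axis.
        parts.append(
            f'<text x="{centre}" y="{y - 5:.1f}" text-anchor="middle">{n}</text>'
        )
        parts.append(
            f'<text x="{centre}" y="{margin + height + 15}" '
            f'text-anchor="middle">{year}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Print the figure; redirect the output to a file to save it.
    print(bar_chart_svg(PARTICIPANTS))
```

Saved as, say, `figure.py`, the script can be run with `python figure.py > participants.svg`. Creating the figure again, by you or by anyone else, then takes well under a second and needs no licensed software.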
Comparison

                                                                   | Proprietary software with GUI | Programming language
Time for you to create the figure for the first time               | t = 30 min                    | t = 30 min
Time for you to create the figure for the n-th time                | ts < t (ts = 20 min)          | tp < t (tp << 1 sec)
Time for someone else to create the same figure for the first time | t’ ~ t (t’ = 30 min)          | t’’ ~ tp (t’’ << 1 sec)
Need to pay for a license?                                         | Yes                           | No
Figure reproducible by everyone?                                   | No                            | Yes
Back to data analysis
The same line of argumentation holds for the analysis of data.
• Create a figure -> conduct a data analysis
• Adobe Illustrator -> Partek, GenomeStudio etc.
In order to obtain reproducible results in ‘genomics’ we use R.
Reproducible research
• Analyze data by writing programs in R.
• Share your data & your programs with others.
Other groups can reproduce your results.
For this reason we use R in this summer school.
Data sharing
The US National Institutes of Health (NIH) requires that all genomics data generated with NIH funding be shared online.
Nature, 4 September 2014
Mandatory!