3rd Summer School in Computational Biology

September 8, 2014

Frank Emmert-Streib

Computational Biology and Machine Learning Laboratory

Center for Cancer Research and Cell Biology, Queen’s University Belfast, UK

Organizers of the summer school

General questions: Frank Emmert-Streib ([email protected]), Shu-Dong Zhang ([email protected])

Lecturers of the summer school

Shailesh Tripathi, Alexey Stupnikov, Kevin Keenan, David Simpson, Caroline Meharg, Myrto Kostadima, Bori Mifsud

We thank our sponsors

History of the summer school

[Figure: bar chart of the number of participants per year: 18 (2012), 25 (2013), 35 (2014)]

Organizational notes

• Coffee breaks (short - foyer)
• Lunch (1 hour)
• Sign-in sheets
• Internet access:
  – Students from QUB: Use your QUB account
  – External students: Guest account

Shailesh Tripathi

Schedule

What will we learn?

• different high-throughput data types:
  – Microarray data
  – Sequencing data (DNA-seq, RNA-seq, ChIP-seq)
• basic statistics and machine learning methods:
  – Hypothesis testing
  – Supervised & unsupervised learning
• basic data visualization
• importance of large-scale data in modern biology (systems biology)

Interdisciplinary summer school

Vision of the VC

Universities require interdisciplinary engagement in the educational and research effort.

Professor Patrick Johnston, President and Vice-Chancellor (VC) of Queen’s University

What will we not learn? (Adjusting expectations)

• Example:
  – When learning a foreign language, how much can you learn in 3 days?
• Analogy:
  – programming language
  – statistics/machine learning
  – biology

The time it takes to become proficient in computational biology is comparable to the time to learn a language.

Good news!

• The summer school in computational biology provides you with a guided start.

• If you are from Belfast:
  – Journal club: computational biology and biostatistics (every Monday in the HSB, 3pm)
  – Degree: MSc in Computational Genomics & Bioinformatics
  – General problems/questions: Frank Emmert-Streib

High-throughput data

Data Types

Central Dogma of Molecular Biology (Francis Crick, 1956)

Reproducible Research

What is reproducible research?

Reproducible research means that an entire study can be reproduced, either by the same researcher or by an independent researcher.

In this context is important.

Example

In order to understand the meaning of reproducible research, let’s consider the following example.

Task: Produce the figure.

Approach: Adobe Illustrator, Gimp, CorelDraw, Powerpoint

Summary:
• How long did it take? t = 30 min
• How did you do it? Describe it in a report.

Example

When you publish results and someone wants to repeat the same or a similar analysis:
  – How long does a re-analysis take? 30 min
  – How is a re-analysis done? That depends on the report you provided and on the availability of the software.

Alternative way to generate results

Create the figure by writing a program.

• LaTeX
• freely available
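A minimal sketch of what "creating the figure by writing a program" can look like. The slide lists LaTeX, and the school uses R for data analysis; Python with only its standard library is used here purely as an illustration. The participant counts are taken from the "History of the summer school" slide. The function name `bar_chart_svg`, the layout constants, and the output file name `participants.svg` are assumptions made for this example.

```python
# Participant counts from the "History of the summer school" slide.
PARTICIPANTS = {2012: 18, 2013: 25, 2014: 35}


def bar_chart_svg(data, width=320, height=220, margin=30):
    """Return a simple SVG bar chart of `data` (label -> value) as a string.

    Hypothetical helper for illustration; the layout constants are arbitrary.
    """
    max_value = max(data.values())
    plot_height = height - 2 * margin          # vertical space for the bars
    slot = (width - 2 * margin) / len(data)    # horizontal space per bar
    bar_width = slot * 0.7
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(data.items()):
        bar_height = plot_height * value / max_value
        x = margin + i * slot + slot * 0.15
        y = height - margin - bar_height
        center = x + bar_width / 2
        # The bar itself.
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_width:.1f}" '
            f'height="{bar_height:.1f}" fill="steelblue"/>'
        )
        # The value above the bar and the label (year) below it.
        parts.append(
            f'<text x="{center:.1f}" y="{y - 5:.1f}" '
            f'text-anchor="middle">{value}</text>'
        )
        parts.append(
            f'<text x="{center:.1f}" y="{height - margin + 15}" '
            f'text-anchor="middle">{label}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Re-running the script regenerates exactly the same figure.
    with open("participants.svg", "w", encoding="utf-8") as handle:
        handle.write(bar_chart_svg(PARTICIPANTS))
```

Anyone who has the script and the three numbers can regenerate the identical figure without a license or a written description of mouse clicks.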

Comparison

                                                                     Proprietary software with GUI   Programming language
Time for you to create the figure for the first time                 t = 30 min                      t = 30 min
Time for you to create the figure for the n-th time                  ts < t (ts = 20 min)            tp < t (tp << 1 sec)
Time for someone else to create the same figure for the first time   t’ ~ t (t’ = 30 min)            t’’ ~ tp (t’’ << 1 sec)
Need to pay for a license?                                           Yes                             No
Figure reproducible by everyone?                                     No                              Yes

Back to data analysis

The same line of argumentation holds for the analysis of data.

• Create a figure -> conduct a data analysis
• Adobe Illustrator -> Partek, GenomeStudio, etc.

In order to obtain reproducible results in ‘genomics’ we use R.

Reproducible research

• Analyze data by writing programs in R.
• Share your data & your programs with others.

Other groups can reproduce your results.

For this reason we use R in this summer school.

Data sharing

The US National Institutes of Health (NIH) require that all genomics data generated with NIH funding be shared online.

Nature, 4 September 2014

Mandatory!

Enjoy the summer school!

