Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014

Introduction to and

INF-BIO5121/9121

Sveinung Gundersen

ELIXIR.NO / Dept. of Informatics, UiO

Oct 7, 2014


Credit

• Parts of this presentation (most of the figures) are taken from the presentation “Introduction to Lifeportal” given by Karin Lagesen, provided under the CC-BY license (http://creativecommons.org/licenses/by/4.0/). Modifications have been made.


• We are doing science, also on the computer!

• Steps 4-5-6 are typically done on the computer anyway

• But the methods/software used in bioinformatics often give very varied results

• We should really think of computer analysis as part of the experiment, aiming for the same level of rigor and reproducibility!

Figure by Tiffany Ard, Nerdy Baby artwork, https://www.facebook.com/NerdyBabyLLC


Galaxy

• Developed at Penn State and Emory Universities, for over 10 years by a large development team

• Aims to be a framework for “supporting accessible, reproducible, and transparent computational research in the life sciences” (Goecks et al., Genome Biology 2010)


Accessible

• Users do not need to learn the command line

• Web-based solution, point-and-click

• Consistent look and feel

• Easy to upload your own datasets, or import datasets from established data warehouses


Reproducible

• Bioinformaticians get surprised every time they need to redo/modify previous analyses

• But bench biologists already know the importance of reproducibility!

• You also know that even with a detailed lab journal, reproduction is a challenge

• The question is then how this manifests itself when doing analysis on a computer


What is in silico reproducibility?

• Basically the same issues as at the bench:

• Materials -> Data sources

• Experiment conditions -> Analysis parameters

• Equipment (and models) -> Programs (and versions)

• And the same challenges:

• Are all relevant conditions described accurately?

• Will the same materials and equipment be available?
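
One way to make the mapping above concrete is to record, for each analysis step, the data source (identified by a checksum), the analysis parameters and the program version. The Python sketch below is only an illustration: the function name `provenance_record`, the field names and the example tool name and versions are assumptions made for this example, not part of Galaxy or of the original presentation.

```python
import hashlib
import json


def provenance_record(data: bytes, tool: str, version: str, params: dict) -> dict:
    """Describe one analysis step well enough to compare it with a later rerun."""
    return {
        # Materials -> data source, identified by a checksum of its content
        "data_sha256": hashlib.sha256(data).hexdigest(),
        # Equipment (and model) -> program (and version)
        "tool": tool,
        "version": version,
        # Experiment conditions -> analysis parameters
        "params": dict(sorted(params.items())),
    }


# Hypothetical example values, chosen only for illustration
reads = b"@read1\nACGT\n+\nIIII\n"
first = provenance_record(reads, "example_mapper", "1.0", {"seed": 1, "mismatches": 2})
again = provenance_record(reads, "example_mapper", "1.0", {"mismatches": 2, "seed": 1})
other = provenance_record(reads, "example_mapper", "1.1", {"seed": 1, "mismatches": 2})

print(json.dumps(first, indent=2))
print(first == again)  # same data, program version and parameters
print(first == other)  # only the program version differs
```

If two such records are equal, the same materials, conditions and equipment were described; any difference points to what changed between the runs.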


What is the current status of reproducibility?

• Less than half of selected microarray experiments published in Nature Genetics could be reproduced (Ioannidis et al., Nat Genet 2009)

• More than half [of surveyed papers] do not provide primary data and list neither the version nor the parameters used [for read mapping] (Nekrutenko and Taylor, Nat Rev Genet 2012)


Why should you care? (about making your analyses reproducible)

• Because it’s the right thing to do!

• ... and the one struggling to reproduce it is often your future self

• Journals are becoming aware of the issues

• Reviewers may value it

• Anyway, it’s the same as at the bench..


Galaxy supports reproducibility

• Automatically tracks metadata at every step

• Which are the datasets?

• What are the parameters?

• Which tools, and which version of the tool?

• What are the outputs?

• Users can annotate the steps to capture the intent of the analysis!


Galaxy supports reproducibility

• All jobs can be rerun later, by independent scientists

• Workflows capture common analysis sequences, i.e. typical experimental setups. Can be reused for other datasets and experiments
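
The idea of a reusable workflow can be illustrated outside Galaxy with a few lines of Python. This is a minimal sketch under stated assumptions: the step functions `trim` and `to_upper` and the helper `run_workflow` are hypothetical stand-ins for real tools, and Galaxy's own workflow format is not shown.

```python
from typing import Callable, List, Tuple

# A step is a tool (here: a plain function) plus the parameters it is run with.
Step = Tuple[Callable[..., str], dict]


def trim(seq: str, length: int) -> str:
    """Hypothetical tool: keep only the first `length` characters."""
    return seq[:length]


def to_upper(seq: str) -> str:
    """Hypothetical tool: convert the sequence to upper case."""
    return seq.upper()


def run_workflow(steps: List[Step], dataset: str) -> List[str]:
    """Apply the steps in order and keep every intermediate result,
    much like a history keeps every dataset that was produced."""
    history = [dataset]
    for tool, params in steps:
        history.append(tool(history[-1], **params))
    return history


# The workflow is defined once ...
workflow: List[Step] = [(trim, {"length": 4}), (to_upper, {})]

# ... and can be reused for other datasets.
print(run_workflow(workflow, "acgtacgt"))  # ['acgtacgt', 'acgt', 'ACGT']
print(run_workflow(workflow, "ttgacca"))   # ['ttgacca', 'ttga', 'TTGA']
```

Because the tools and their parameters are stored together, rerunning the same workflow on the same input gives the same sequence of results.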


Transparent

• “Enabling users to share and communicate their experimental results and outputs in a meaningful way” (Goecks et al., Genome Biology 2010)

• Everything can be shared: Datasets, histories (i.e. experimental logbook), tools, workflows

• Provides public repositories

• Galaxy Pages are web-based documents for publishing results. Every level of detail can be accessed by readers


• Galaxy installation at UiO, running on the Abel cluster

• Contains hundreds of tools, from phylogeny tools to High Throughput Sequencing analysis

• Available for all Feide users (all university users and several colleges)


lifeportal.uio.no

Select Feide login, press Academic Login


Select your institution

Select University of Oslo, then continue


Use UiO username/password

Your UiO username and password


Verify login information

Click User, verify that your email address is shown


Page orientation

Navigation bar, with workflows, shared data etc.

History panel - shows all the datasets you have analyzed and produced

Tool panel with many analysis programs

Detail panel - where the results are shown


Create a new history

When starting on a "new" thing, start with a clean history, and name it!


Getting data: uploading

Click on Upload File, then Upload File again

Select fastqsanger as sequence format
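
For readers unfamiliar with the format: in fastqsanger files each read takes four lines, and the quality line encodes one Phred score per base as an ASCII character with an offset of 33. A minimal Python sketch of decoding one record follows; the record itself and the function name `parse_fastq_record` are made up for illustration.

```python
from typing import List, Tuple

SANGER_OFFSET = 33  # fastqsanger stores Phred scores as ASCII code minus 33


def parse_fastq_record(record: str) -> Tuple[str, str, List[int]]:
    """Split one four-line FASTQ record into name, sequence and Phred scores."""
    header, sequence, separator, quality = record.strip().split("\n")
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(char) - SANGER_OFFSET for char in quality]
    return header[1:], sequence, scores


# Made-up example record
example = "@read1\nACGT\n+\n!5II\n"
name, seq, scores = parse_fastq_record(example)
print(name, seq, scores)  # read1 ACGT [0, 20, 40, 40]
```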


Uploading data

Select input file here


Uploaded data

Uploading data - not quite done


Look at data - eye symbol


Data annotation - pen symbol

Can add information about the data set here

Good for tracking data


Removing data set - X

NOTE: removed data sets are not gone, just not shown in your history

Need to do more to actually delete it


Analyzing data

Select program in left bar

Select input file here

?


The Abel computer cluster

• Lifeportal runs on the Abel computer cluster

• > 10 000 cores!

• > 40 TB memory!

• Lifeportal submits jobs to the Abel cluster

• Can use several cores for a single job


Choose job options


Job options

• # tasks = # cores you want to use

• # tasks per node:

– One node has 16 cores; sometimes programs run faster if all cores are in the same node

• Wall time: guesstimated runtime

– Note: jobs exceeding that will be killed!

• Memory per CPU: each CPU has 4 GB of memory - just leave this option as it is
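
As a rough sketch of how these options relate, the Python snippet below computes how many nodes and how much memory a request implies, using the figures from the slide (16 cores per node, 4 GB per CPU). The function name `job_request` and the returned fields are assumptions for illustration; the actual resource allocation is done by the cluster's queue system and may differ.

```python
import math

CORES_PER_NODE = 16    # from the slide: one node has 16 cores
MEMORY_PER_CPU_GB = 4  # from the slide: each CPU has 4 GB of memory


def job_request(tasks: int, tasks_per_node: int, wall_time_hours: float) -> dict:
    """Summarise a job request (hypothetical helper, not part of Lifeportal)."""
    if not 1 <= tasks_per_node <= CORES_PER_NODE:
        raise ValueError("tasks per node must be between 1 and 16")
    if tasks < 1 or wall_time_hours <= 0:
        raise ValueError("tasks and wall time must be positive")
    return {
        "nodes": math.ceil(tasks / tasks_per_node),
        "total_memory_gb": tasks * MEMORY_PER_CPU_GB,
        # Assumed upper bound: the job is killed once the wall time is exceeded
        "max_cpu_hours": tasks * wall_time_hours,
    }


# 16 tasks on a single node, with a guesstimated runtime of 2 hours
single_node = job_request(tasks=16, tasks_per_node=16, wall_time_hours=2)
# The same 16 tasks spread over nodes with 4 tasks each
spread_out = job_request(tasks=16, tasks_per_node=4, wall_time_hours=2)
print(single_node)
print(spread_out)
```

Both requests use the same number of cores and the same memory; only the number of nodes the tasks are spread over changes.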


CPU quotas

• Quotas calculated as # CPU hours

• All have 200 hrs to use

• Big projects should apply for their own quotas
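
Assuming that a job is charged its number of cores multiplied by its runtime in hours (the slide only states that quotas are counted in CPU hours, so the exact accounting rule is an assumption), the remaining quota can be estimated as in this Python sketch. The function name `remaining_quota` and the example jobs are hypothetical.

```python
DEFAULT_QUOTA_HOURS = 200  # from the slide: all users have 200 hrs to use


def remaining_quota(jobs, quota_hours=DEFAULT_QUOTA_HOURS):
    """Return the CPU hours left after a list of (cores, runtime_hours) jobs.

    Assumes each job costs cores * runtime_hours, which may not match
    the actual accounting on the cluster.
    """
    used = sum(cores * hours for cores, hours in jobs)
    return quota_hours - used


# Three example jobs: (cores, runtime in hours)
jobs = [(16, 2), (4, 10), (1, 0.5)]
print(remaining_quota(jobs))  # 200 - (32 + 40 + 0.5) = 127.5
```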


Running job status

• Colors show the status of the job

• Purple: data uploading

• Gray: analysis queued

• Yellow: running

• Green: done

• Red: error has occurred
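
The color coding above amounts to a simple lookup table. In Python it could be written as follows; the dictionary and function names are made up for this example and are not Galaxy identifiers.

```python
# Color of a dataset in the history panel -> what it means (from the list above)
STATUS_BY_COLOR = {
    "purple": "data uploading",
    "gray": "analysis queued",
    "yellow": "running",
    "green": "done",
    "red": "error has occurred",
}


def is_finished(color: str) -> bool:
    """A job has finished, successfully or not, when it is green or red."""
    return color.lower() in ("green", "red")


for color in ("Gray", "yellow", "green"):
    print(color, "->", STATUS_BY_COLOR[color.lower()], "| finished:", is_finished(color))
```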


Results show up as new data set!

Results from job show up as a new data set in history!

Basic statistics appear here

FastQC quality plot


Data sets know how they were made


Can easily run analyses again


What did I do again...?


Can look at old analyses


Share or publish histories

Can share via link or publish for all to see


Published histories open to all

NOTE: others can not only look at published histories, they can also copy data sets from them!

Practical way to share data!


Importing shared histories


Galaxy: other tutorials

• For more tutorials and exercises, check out:

http://wiki.g2.bx.psu.edu/Learn

• Article with step-by-step examples/protocols making use of Galaxy in different scenarios:

Blankenberg, D., et al. Galaxy: a web-based genome analysis tool for experimentalists. Current Protocols in Molecular Biology, Jan 2010, Chapter 19.

