Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Date post: 23-Dec-2015
Transcript
Page 1: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Presentation of the

CRG Bioinformatics Core facility

Jean-François Taly

Page 2: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

People in the BioCore

Jean-Francois
• @CRG 2009
• @BioCore 2012
• Acting head
• Structural bioinformatics
• MSA
• NGS analyst
• Galaxy server
• Training

Luca
• @BioCore 2010
• NGS analyst
• Small ncRNA prediction
• Motif analysis
• Training

Toni
• @BioCore 2009
• Wikis
• Web/DB development
• DB mirrors
• Structural bioinformatics
• Training

Sarah
• @BioCore 2014
• Micro-arrays
• NGS analyst
• Galaxy
• Training

Page 3: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Our mission

• Expertise in bioinformatics
  • Service
  • Consultation

• Training
  • Internal and external

• Support in infrastructures
  • In collaboration with the SIT and TIC

• Part of the CRG bioinformaticians network
  • 83 @ bioinformatics retreat
  • Many more in PRBB/CNAG

Page 4: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Our services

Analysis
• Microarray
• ChIP-seq
• RNA-seq DE and assembly
• Genome assembly
• Variant calling

Informatics support
• Wiki
• Web
• Server
• API

Training
• Galaxy, Perl, Linux, advanced bioinformatics

Page 5: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Fee per service

Item                                  PRBB fees       Public fees (without VAT)
Manual data analysis                  13.12 €/hour    39.36 €/hour
Automated data analysis (CPU time)    2.38 €/hour     7.16 €/hour
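
As a rough illustration of how the two hourly rates combine, the sketch below estimates the cost of a project from the fees on this slide. The function name, the customer labels and the example hour counts are hypothetical; only the four rates come from the slide.

```python
# Hourly rates in euros, copied from the fee slide (public fees exclude VAT).
RATES = {
    "prbb": {"manual": 13.12, "automated": 2.38},
    "public": {"manual": 39.36, "automated": 7.16},
}


def estimate_cost(manual_hours, cpu_hours, customer="prbb"):
    """Return the estimated cost in euros, rounded to cents."""
    rates = RATES[customer]
    total = manual_hours * rates["manual"] + cpu_hours * rates["automated"]
    return round(total, 2)


if __name__ == "__main__":
    # Hypothetical project: 10 hours of manual analysis and 20 CPU hours.
    print(estimate_cost(10, 20, "prbb"))
    print(estimate_cost(10, 20, "public"))
```

The public rates are roughly three times the PRBB rates, so the same project costs about three times more for an external user.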

Page 6: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Our contribution to projects

Project conception

Bioinfo exp. design

Bioinfo exp. realization

Bioinfo output interpretation

Project conclusions

Page 7: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Our contribution to projects

Project conception

Bioinfo exp. design

Bioinfo exp. realization

Bioinfo output interpretation

Project conclusions

Apply defined procedures

Page 8: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Our contribution to projects

Project conception

Bioinfo exp. design

Bioinfo exp. realization

Bioinfo output interpretation

Project conclusions

Customized analysis

Page 9: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

CRG bioinformatics community

Big Data WG
• EGA initiative
• Data Engineering
• NoSQL
• HPC

NGS Tech. Sem.
• RNA-seq
• G. assembly
• Variant Annot.
• Metagenomics

Other topics
• Integrated -omics
• Good practice in code dev.
• Galaxy dev.
• …

Page 10: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Micro-arrays

Gene expression array data analysis:
• Background correction and normalization
• Differential expression analysis
• Gene Ontology and pathway analysis
• Various graphics / plots

Additional array-based technologies the Bioinformatics unit supports include:
• qPCR arrays
• Comparative Genomic Hybridization arrays

Main tools are based on the R / Bioconductor environment.

Image source: Creative Commons, Wikipedia

Page 11: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

RNA-seq

Page 12: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

RNA-seq

Page 13: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

DNA-seq

Page 14: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

DNA-seq

Pevzner P A et al. PNAS 2001;98:9748-9753

Page 15: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

ChIP-seq

Page 16: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

ChIP-seq

Page 17: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Growing to the next level

From gene DE to transcript DE
• Users now have access to longer reads and deeper coverage

Metagenomics
• 16S ribosomal amplicon sequencing with MiSeq

Data integration framework
• Combining different data types into a single analysis: RNA-seq DE, histone marks, metabolomics data, proteomics

Data analysis workflows on Galaxy
• Leave the basic processing to users and focus on advanced analysis

Page 18: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Databases mirroring

Biological file sources:
• ENSEMBL
• UCSC
• NCBI Blast DBs
• UniProt
• PDB
• iGenomes (Illumina; only human for now, the rest is upcoming)

All indexed and formatted for:
• NCBI BLAST+ (makeblastdb for proteins and nucleic acids)
• Bowtie & Bowtie2
• BWA
• Fastaindex (Exonerate)
• GEM
• faToTwoBit
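
To give an idea of what indexing a mirrored genome involves, here is a minimal sketch that builds the command lines for four of the tools listed on this slide. It is not the BioCore's actual mirroring script: the function name, the file paths and the output layout are assumptions made for the example.

```python
from pathlib import Path


def index_commands(fasta, out_dir, dbtype="nucl"):
    """Return one argument list per indexing tool for a FASTA file.

    dbtype is passed to makeblastdb ("nucl" or "prot"); the other
    tools only make sense for nucleotide sequences.
    """
    fasta = Path(fasta)
    prefix = Path(out_dir) / fasta.stem  # e.g. indexes/hg19
    return {
        "blast": ["makeblastdb", "-in", str(fasta),
                  "-dbtype", dbtype, "-out", str(prefix)],
        "bowtie2": ["bowtie2-build", str(fasta), str(prefix)],
        "bwa": ["bwa", "index", "-p", str(prefix), str(fasta)],
        "2bit": ["faToTwoBit", str(fasta), str(prefix) + ".2bit"],
    }


if __name__ == "__main__":
    # Hypothetical input file and output directory.
    for tool, cmd in index_commands("genomes/hg19.fa", "indexes").items():
        print(tool, " ".join(cmd))
```

Each argument list can be handed to `subprocess.run` on a machine where the corresponding tool is installed.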

Page 19: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Where are they stored?

In CRG common storage: /db

More information: http://biocore.crg.cat/wiki/Category:Mirrors

IMPORTANT: /db/seq (former /seq) IS DEPRECATED

Page 20: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

WEB and Database services

Applications:
• Data and project management
• Platforms for big data analysis and complex information querying
• Promotion and publication of scientific results

Page 21: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

WEB and Database services Example

Superfly for Yogi Jaëger: visual catalogue of gene expression during embryo development in different fly species.

Page 22: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

WEB and Database services Example

PRGDB with Walter Sanseverino Wiki-based Database of plant resistance genes.

Page 23: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Activity per category in 2014

Page 24: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Presentation of the Galaxy platform

Jean-François Taly Bioinformatics Core Facility

CRG (Barcelona, Catalonia, Spain)

September 18th 2014

EMBO Global Exchange Course

Pasteur Institute of Tunis, Tunisia

Page 25: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Why Should I Use Galaxy?

Biologists:
• Linux-free data analysis with a graphical interface

Bioinformaticians:
• Ensure reproducibility when sharing analyses and workflows
• Teach their knowledge to a broad audience
• Get access to workflows for topics they are not familiar with

Software developers:
• Distribute their tools on a standardized platform

Page 26: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

The Galaxy Team

Galaxy is developed by:
• The Nekrutenko lab in the Center for Comparative Genomics and Bioinformatics at Penn State University
• The Taylor lab at Johns Hopkins University
• The community

https://wiki.galaxyproject.org/GalaxyTeam

Page 27: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Rationale behind Galaxy

From Goecks et al., Genome Biol. 2010.

“Computation has become an essential tool in life science research. This is exemplified in genomics, where first microarrays and now massively parallel DNA sequencing have enabled a variety of genome-wide functional assays, such as ChIP-seq and RNA-seq (and many others), that require increasingly complex analysis tools. However, sudden reliance on computation has created an 'informatics crisis' for life science researchers: computational resources can be difficult to use, and ensuring that computational experiments are communicated well and hence reproducible is challenging. Galaxy helps to address this crisis by providing an open, web-based platform for performing accessible, reproducible, and transparent genomic science.”

Page 28: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Why Should I Use Galaxy?

Biologists:
• Linux-free data analysis with a graphical interface

Bioinformaticians:
• Ensure reproducibility when sharing analyses and workflows
• Teach their knowledge to a broad audience
• Get access to workflows for topics they are not familiar with

Software developers:
• Distribute their tools on a standardized platform

Page 29: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Makes bioinformatics accessible

Page 30: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

From a command line …

Page 31: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

… to a graphical interface

Page 32: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

One step

Page 33: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Multi-step protocol (steps 1 to 5)

Page 34: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Workflow

Page 35: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Galaxy Tutorials https://usegalaxy.org/u/jeremy/p/galaxy-rna-seq-analysis-exercise

https://wiki.galaxyproject.org/Learn

Page 37: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Why Should I Use Galaxy?

Biologists:
• Linux-free data analysis with a graphical interface

Bioinformaticians:
• Ensure reproducibility when sharing analyses and workflows
• Teach their knowledge to a broad audience
• Get access to workflows for topics they are not familiar with

Software developers:
• Distribute their tools on a standardized platform

Page 38: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Reproducibility

Bioinformaticians suffer from it too!
• Results can change depending on
  • Libraries and software versions
  • Genome annotations
• Results are published without the code

Want to share your findings with everybody?
• Freeze an environment in a virtual machine
• Use an application container (Docker)
• Prepare a Galaxy workflow
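
Since results can depend on library and software versions, one lightweight complement to the options above is to store a version record next to the results. The sketch below shows one possible way to do this in Python; the function name and the package names are only examples, and it does not replace a virtual machine, a container or a Galaxy workflow.

```python
import json
import platform
from importlib import metadata


def record_environment(packages):
    """Collect interpreter, platform and package versions as a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names below are only examples.
    print(json.dumps(record_environment(["numpy", "pip"]), indent=2))
```

Saving this JSON alongside the output files makes it easier to explain later why two runs of the same analysis differ.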

Page 39: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Improve the visibility of a paper

“A Galaxy workflow and the corresponding wrappers are available to download at https://mylab.com. A virtual machine containing a pre-set up server can be downloaded at the same address.”

Why not have this as well?

Page 40: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Galaxy Workflows

Page 41: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Why Should I Use Galaxy?

Biologists:
• Linux-free data analysis with a graphical interface

Bioinformaticians:
• Ensure reproducibility when sharing analyses and workflows
• Teach their knowledge to a broad audience
• Get access to workflows for topics they are not familiar with

Software developers:
• Distribute their tools on a standardized platform

Page 42: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Wrapping software

Software

The wrapper prepares the command line

XML file

Page 43: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Simple wrapper example

Page 44: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

venn_diagram.sh

Wrappers can launch scripts
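
The idea of a wrapper, an XML description that is turned into a command line, can be sketched in a few lines of Python. This is a simplified stand-in, not Galaxy's own code: Galaxy fills its command templates with the Cheetah engine, whereas this sketch uses `string.Template`, and the parameter names and file names are hypothetical. Only the script name venn_diagram.sh comes from the slide.

```python
import shlex
import xml.etree.ElementTree as ET
from string import Template

# Simplified, hypothetical tool description in the spirit of a Galaxy wrapper.
TOOL_XML = """
<tool id="venn_diagram" name="Venn diagram">
  <command>venn_diagram.sh $input1 $input2 $output</command>
  <inputs>
    <param name="input1" type="data" />
    <param name="input2" type="data" />
  </inputs>
  <outputs>
    <data name="output" format="png" />
  </outputs>
</tool>
"""


def build_command(tool_xml, values):
    """Fill the <command> template with shell-quoted parameter values."""
    root = ET.fromstring(tool_xml.strip())
    template = Template(root.findtext("command").strip())
    # Names declared as inputs (<param>) and outputs (<data>).
    declared = [p.get("name") for p in root.iter("param")]
    declared += [d.get("name") for d in root.iter("data")]
    missing = [name for name in declared if name not in values]
    if missing:
        raise KeyError("missing values for: " + ", ".join(missing))
    quoted = {name: shlex.quote(str(values[name])) for name in declared}
    return template.substitute(quoted)


if __name__ == "__main__":
    print(build_command(TOOL_XML, {
        "input1": "genes_a.txt",
        "input2": "genes_b.txt",
        "output": "venn.png",
    }))
```

Quoting every value before substitution keeps file names with spaces from breaking the generated command line.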

Page 45: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

TopHat wrapper (1) XML file describing tophat parameters

Page 46: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

TopHat wrapper (2) XML file describing tophat parameters

Page 47: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Community Tools/Wrappers

Page 48: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Should I install Galaxy?

Galaxy public servers

Good points:
• Free
• No IT tasks
• Come with reference genomes and workflows

Bad points:
• Offer limited resources (disk/CPUs)
• Data transfer may take a long time
• Give access only to the tools their administrators choose
• Data security may not be guaranteed

Page 49: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Galaxy Public Servers https://wiki.galaxyproject.org/PublicGalaxyServers

Page 50: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Should I install Galaxy?

Galaxy local server

Good points:
• Total control over data and tools
• Limited only by your own disk and CPU resources
• Some companies sell a ready-to-use infrastructure
• The Tool Shed helps to install wrappers and software

Bad points:
• Cost of installation and maintenance
• Needs IT support for an advanced multi-user setup

Page 51: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Get Galaxy https://wiki.galaxyproject.org/Admin/GetGalaxy

Can only be installed on Linux or Mac

Page 52: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

[Figure: architecture of the Galaxy installation. Labels: User; FTP; Files > 2Gb; Galaxy server (Tools, DATA, Software); HPC; NFS; NFS:/software; NFS:/db (Sequences, Indexes); /scratch (Files, Back-up, tmp); 30 days max.]

Page 53: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Database engine
• The Galaxy team recommends PostgreSQL, but MySQL can also be used
• Stores user details and data information

Tools = wrappers
• File describing all possible parameters of a piece of software
• Script preparing the correct command line

Apache server

Page 54: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.
Page 55: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Shared file system
• NFS (2 PB)
• 10 €/TB/group/month
• Access to the shared biological resources
  • Ensembl, UCSC
  • Genomes and indexes
  • UniProt, Pfam, SMART, PDB
• Access to the shared software repository

High Performance Computing
• 7 nodes with 8 CPUs each (56 in total)
• 47 GB memory

Page 56: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.
Page 57: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

FTP server
• ProFTPD for the server side
• I recommend FileZilla for the client (multiplatform)

Upload from Galaxy
• Files are moved to the shared file system

Page 58: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

Summary

Galaxy is an open, web-based platform for computational biomedical research.
• Accessible: users without programming experience can run tools and workflows
• Reproducible: Galaxy captures analysis details
• Transparent: users can share and publish analyses

Wiki: https://wiki.galaxyproject.org/FrontPage

Page 59: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

http://galaxy.crg.es/

Demo on Galaxy@CRG

Page 60: Presentation of the CRG Bioinformatics Core facility Jean-François Taly.
