1st EELA Grid School, December 4th, 2006
Eduardo MURRIETA LEON
Romualdo ZAYAS-LAGUNAS
Pierre-Alain BRANGER
Jérôme VERLEYEN
Roberto RODRIGUEZ
César BONAVIDES
Alfredo HERNANDEZ
EMBOSS over a Grid
Index
• Bioinformatics
• EMBOSS
• Objectives
What is Bioinformatics?
• State of the art
- Analysis of gene expression
- Need for prediction of protein structure
- Sequence analysis
- A huge amount of knowledge to store
• Bioinformatics as a solution
- Supports the analysis of life-science data
- Used in many domains (e.g. the Human Genome Project)
Types of Tools
• Searching (knowledge extraction)
- BLAST (nucleotides, proteins)
• Alignment
- Clustal
• Phylogeny
- PHYLIP
Databases
• Maintained by several organizations
- NCBI: United States
- EMBL: Europe
- DDBJ: Japan
EMBOSS Overview
• The European Molecular Biology Open Software Suite
- From EMBnet
• Software package:
- A set of sequence analysis programs
- A toolkit for creating robust bioinformatics applications or workflows
- Database searching
- Motif identification
- Presentation tools for publication
Technical Characteristics
• Software requirements
- Linux distribution
- gcc compiler and graphics libraries
• Hardware requirements
- 100 to 400 MB of free disk space
- 512 MB of RAM
• Execution requirements
- Input data size: from 20 KB to 100 MB
- Output size: from 20 KB to 1 MB
EMBOSS Architecture
• Main parts
- ACD Files
- Programs (API)
- Inputs / Outputs (sequences, databases)
ACD Files
• ACD files
- AJAX Command Definition files
- Stored in $EMBOSS_DIR/acd
- Example (the intconv test application):
application: intconv [ documentation: "Convert ints to ajints" groups: "Test"]
section: input [ information: "Input section" type: "page"]
infile: infile [ parameter: "Y" knowntype: "integer long data" information: "Standard format information" ]
endsection: input
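
To make the structure of the intconv definition above concrete, here is a minimal Python sketch that splits such text into (type, name, attributes) entries. It is an illustration only: EMBOSS reads ACD files with its own C code, and the `parse_acd` helper and its regular expressions are assumptions that handle only the simple bracketed, quoted-attribute form shown on this slide (a `]` inside a quoted value, for instance, is not handled).

```python
import re

# The ACD text from the slide, reproduced as a string for the sketch.
ACD_TEXT = '''
application: intconv [ documentation: "Convert ints to ajints" groups: "Test"]
section: input [ information: "Input section" type: "page"]
infile: infile [ parameter: "Y" knowntype: "integer long data" information: "Standard format information" ]
endsection: input
'''

# "type: name [ ... ]" blocks; lines without brackets (endsection) are skipped.
BLOCK = re.compile(r'(\w+):\s*(\w+)\s*\[(.*?)\]', re.S)
# attribute: "value" pairs inside a block.
ATTR = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_acd(text):
    """Return a list of (type, name, attributes) tuples (hypothetical helper)."""
    entries = []
    for kind, name, body in BLOCK.findall(text):
        entries.append((kind, name, dict(ATTR.findall(body))))
    return entries


if __name__ == "__main__":
    for kind, name, attrs in parse_acd(ACD_TEXT):
        print(kind, name, attrs)
```

Each entry pairs a data type (application, section, infile) with a name and its attributes, which is the information EMBOSS uses to prompt for and validate a program's inputs.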
Programs
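• Programs
- Binaries written in C and stored in $EMBOSS_DIR/bin
- Built on two libraries:
AJAX (the EMBOSS core library for general-purpose tasks such as ACD handling and sequence I/O; unrelated to the web technique "Asynchronous JavaScript and XML")
NUCLEUS (functions specific to molecular sequence analysis)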
Input/Output
• Sequences
- A string of letters representing the structure of a real or hypothetical DNA molecule or protein
- ASCII text extracted from large databases
• EMBOSS can read several database formats
- EMBL, FASTA, GenBank, Swiss-Prot …
- Entries can be retrieved by gene ID, by description keywords …
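
As an illustration of "sequences as ASCII text", the following Python sketch reads FASTA-formatted text, the simplest of the formats listed above, into a dictionary keyed by entry ID. The `parse_fasta` helper and the two sample records are hypothetical and were written for this example; they are not EMBOSS code or real database entries.

```python
def parse_fasta(text):
    """Map each entry ID (first word after '>') to its sequence string."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the ID is the first word, the rest is a description.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[current] += line
    return records


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    sample = """>seq1 hypothetical example entry
ACGTACGT
ACGT
>seq2 another hypothetical entry
TTGACA
"""
    for seq_id, seq in parse_fasta(sample).items():
        print(seq_id, len(seq))
```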
GUIs for EMBOSS
• wEMBOSS, Jemboss …
Use of EMBOSS
• Study of the haptoglobin protein in different species
- Sequence extraction from the Swiss-Prot database (“seqret”)
- 10 mammalian species (including human, rat, mouse, rabbit)
- Alignment (“emma”)
- Calculation of the phylogenetic tree
• Example of a generated tree
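
The extraction and alignment steps of the haptoglobin study can be sketched as two command lines, built here in Python without being executed. This is a sketch under assumptions: the database name `swissprot` depends on the local EMBOSS configuration, the `hpt_*` entry pattern and the output file names are illustrative, and the qualifiers used (`-sequence`, `-outseq`, `-dendoutfile`, `-auto`) should be checked against the installed EMBOSS version. The slides do not name the program used for the final tree, so that step is left out.

```python
import shlex


def build_pipeline(usa, workdir="."):
    """Return the seqret and emma command lines as argument lists.

    usa     -- EMBOSS sequence address, e.g. "swissprot:hpt_*" (assumed name)
    workdir -- directory for the intermediate files (hypothetical layout)
    """
    seqs = f"{workdir}/haptoglobin.fasta"  # extracted sequences
    aln = f"{workdir}/haptoglobin.aln"     # multiple alignment
    dnd = f"{workdir}/haptoglobin.dnd"     # guide tree written by emma

    # Step 1: extract the sequences from the database.
    seqret = ["seqret", "-sequence", usa, "-outseq", seqs, "-auto"]
    # Step 2: align them; the output of step 1 is the input of step 2.
    emma = ["emma", "-sequence", seqs, "-outseq", aln,
            "-dendoutfile", dnd, "-auto"]
    return [seqret, emma]


if __name__ == "__main__":
    for cmd in build_pipeline("swissprot:hpt_*"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` on a machine where EMBOSS is installed, or to write them into a job script.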
Objectives
« Get EMBOSS running over a Grid »
- Execute EMBOSS jobs on a grid from the command line
- Retrieve job results
- Execute a complete sequence-analysis workflow / pipeline (e.g. the study shown under "Use of EMBOSS")
Complementary functions
• Searching EMBOSS-formatted ("EMBOSSed") databases
• Wrapping applications for EMBOSS over a Grid
• Web interface and project manager for EMBOSS
• A BioGrid portal
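
As a rough illustration of the first objective, command-line job execution, the sketch below generates a job description for one EMBOSS step. The slides do not name the grid middleware; the attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) assume a gLite/EDG-style Job Description Language. The `make_jdl` helper, the wrapper script `run_emma.sh`, and all file names are hypothetical.

```python
def make_jdl(executable, arguments, inputs, outputs):
    """Render a minimal JDL-style job description as text (assumed format)."""

    def quoted_list(items):
        # JDL lists are written as {"a", "b", ...}
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # Files shipped to the worker node with the job.
        f"InputSandbox = {quoted_list(inputs)};",
        # Files to bring back when retrieving the job results.
        f"OutputSandbox = {quoted_list(['std.out', 'std.err'] + list(outputs))};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_jdl(
        executable="run_emma.sh",            # hypothetical wrapper script
        arguments="haptoglobin.fasta",
        inputs=["run_emma.sh", "haptoglobin.fasta"],
        outputs=["haptoglobin.aln", "haptoglobin.dnd"],
    ))
```

The output sandbox lists the files that the "retrieve job results" objective would bring back to the user; chaining several such jobs would correspond to the workflow objective.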