EMBOSS as a DAS Client


EBI is an Outstation of the European Molecular Biology Laboratory.

EMBOSS as a DAS Client

Peter Rice

Mahmut Uludag

3rd March 2011.


EMBOSS: A quick introduction

• European Molecular Biology Open Software Suite
• Open source package for sequence analysis
• ANSI C source code
• GPL licensed applications, LGPL libraries
• 200+ applications
• 100+ third party applications in 15 associated packages
• Project started 1996 at Sanger Centre and HGMP
• Now based at EBI
• Release 6.3.0 15th July 2010
• Funded by UK-BBSRC and EMBL-EBI


EMBOSS history

• Project started at Sanger Centre and SEQNET August 1996
• Alan moved from SEQNET 1997 (Wellcome funding)
• Peter moved to Lion Bioscience 2000 (CCP11-BBSRC/MRC)
• Peter moved to EBI 2003
• HGMP closed 2005: Alan+Jon moved to EBI
• BBSRC funding (limited) 2006-2009
• BBSRC BBR funding 2009-2011

• Major new developments
  • New data types
  • New data sources
  • Built-in ontologies


EMBOSS command line interface

• EMBOSS applications run from the command line
• This is not the only interface

• There are over 100 interfaces and packaged systems available
  • Web interfaces
  • Graphical user interfaces (GUIs)
  • Web services

• All applications have a command definition file (.acd)
  • Defines all inputs, outputs, and other options
  • Read at startup
  • Contains all command line options with descriptions
  • Template for any other interface


EMBOSS command line example

% antigenic

Input protein sequence(s): uniprot:actb1_fugru

Minimum length of antigenic region [6]:

Output report [actb1_fugru.antigenic]:

% antigenic uniprot:actb1_fugru -auto
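
The two invocations above differ only in whether the prompts are answered interactively or suppressed with -auto. As a rough sketch of how a wrapper script might assemble the non-interactive form, the Python below builds the argument list. The helper names are hypothetical, and the -minlen and -outfile qualifiers are assumed from the parameter names in the antigenic ACD definition rather than taken from this example.

```python
import shutil
import subprocess


def build_antigenic_command(usa, minlen=None, outfile=None):
    """Return the argv list for a non-interactive antigenic run (hypothetical helper)."""
    argv = ["antigenic", usa]
    if minlen is not None:
        # Assumed qualifier name, taken from the ACD parameter "minlen"
        argv += ["-minlen", str(minlen)]
    if outfile is not None:
        # Assumed qualifier name, taken from the ACD parameter "outfile"
        argv += ["-outfile", outfile]
    # -auto accepts the defaults instead of prompting, as in the example above
    argv.append("-auto")
    return argv


def run_antigenic(usa, **options):
    """Run antigenic if it is on the PATH; return None when EMBOSS is not installed."""
    argv = build_antigenic_command(usa, **options)
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=True)


command = build_antigenic_command("uniprot:actb1_fugru")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when a sequence address contains characters such as brackets.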


EMBOSS ACD File

application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
]

section: input [
  information: "Input section"
  type: "page"
]

seqall: sequence [
  parameter: "Y"
  type: "proteinstandard"
]

endsection: input

section: required [
  information: "Required section"
  type: "page"
]

integer: minlen [
  standard: "Y"
  minimum: "1"
  maximum: "50"
  default: "6"
  information: "Minimum length of antigenic region"
]

endsection: required

section: output [
  information: "Output section"
  type: "page"
]

report: outfile [
  parameter: "Y"
  rformat: "motif"
  multiple: "Y"
  taglist: "int:pos=Max_score_pos"
]

endsection: output
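
Because the ACD file describes every input, output, and option, another interface can read it to discover what to ask the user. The Python below is a minimal sketch of that idea, not the EMBOSS ACD parser: it handles only the simple `datatype: name [ attribute: "value" ... ]` form shown above, and the function name `parse_acd` is hypothetical.

```python
import re

# 'datatype: name [ ... ]' where the body holds the attributes.
# Assumes attribute values never contain ']' (true for the example above).
DEFINITION = re.compile(r'(\w+):\s*(\w+)\s*\[(.*?)\]', re.DOTALL)
# 'attribute: "value"' pairs inside a definition body
ATTRIBUTE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_acd(text):
    """Return a list of (datatype, name, attributes) tuples in file order."""
    # Slide text often carries typographic quotes; normalise them first
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return [
        (datatype, name, dict(ATTRIBUTE.findall(body)))
        for datatype, name, body in DEFINITION.findall(text)
    ]


ACD_TEXT = '''
application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
]
section: required [ information: "Required section" type: "page" ]
integer: minlen [
  standard: "Y" minimum: "1" maximum: "50" default: "6"
  information: "Minimum length of antigenic region"
]
endsection: required
'''

definitions = parse_acd(ACD_TEXT)
# Keep only the option definitions, skipping the application and section headers
options = {
    name: attrs
    for datatype, name, attrs in definitions
    if datatype not in ("application", "section")
}
print(options["minlen"]["information"], "[" + options["minlen"]["default"] + "]")
```

The printed line mirrors the interactive prompt for the minimum length, which is the sense in which the ACD file serves as a template for other interfaces.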


EMBOSS ACD File with EDAM Annotation

application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
  relations: "EDAM:0000201 topic Immunological analysis"
  relations: "EDAM:0000416 operation Epitope mapping"
]

section: input [
  information: "Input section"
  type: "page"
]

seqall: sequence [
  parameter: "Y"
  type: "proteinstandard"
  relations: "EDAM:0001219 data Pure protein sequence"
  relations: "EDAM:0000849 data Sequence record"
  relations: "EDAM:0002178 data 1 or more"
]

endsection: input

section: required [
  information: "Required section"
  type: "page"
]

integer: minlen [
  standard: "Y"
  minimum: "1"
  maximum: "50"
  default: "6"
  information: "Minimum length of antigenic region"
  relations: "EDAM:0001249 data Sequence length"
]

endsection: required

section: output [
  information: "Output section"
  type: "page"
]

report: outfile [
  parameter: "Y"
  rformat: "motif"
  multiple: "Y"
  taglist: "int:pos=Max_score_pos"
  relations: "EDAM:0001534 data Peptide immunogenicity report"
]

endsection: output
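
Each relations attribute in the annotated file carries an EDAM identifier, a namespace (topic, operation, or data), and a label. As an illustration only, the Python sketch below pulls those three parts out of ACD text; the function name `edam_relations` is hypothetical and the pattern assumes the exact "EDAM:id namespace label" layout shown above.

```python
import re

# relations: "EDAM:<digits> <namespace> <label>"
RELATION = re.compile(r'relations:\s*"EDAM:(\d+)\s+(\w+)\s+([^"]+)"')


def edam_relations(acd_text):
    """Return (identifier, namespace, label) for every relations attribute."""
    # Normalise typographic quotes that appear in the slide text
    text = acd_text.replace("\u201c", '"').replace("\u201d", '"')
    return [
        ("EDAM:" + ident, namespace, label.strip())
        for ident, namespace, label in RELATION.findall(text)
    ]


SEQALL = '''
seqall: sequence [
  parameter: "Y"
  type: "proteinstandard"
  relations: "EDAM:0001219 data Pure protein sequence"
  relations: "EDAM:0000849 data Sequence record"
  relations: "EDAM:0002178 data 1 or more"
]
'''

relations = edam_relations(SEQALL)
for ident, namespace, label in relations:
    print(ident, namespace, label, sep="\t")
```

A list of tuples is used rather than a dictionary because the relations attribute repeats within one definition.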


Documentation & books

Three books at typesetting stage.

• Administrators’ Manual
• Users’ Manual
• Developers’ Manual

Concomitant major revision of EMBOSS website.

Automation of website content addition.

Books to form basis of new website content.


EMBOSS: Sequences

Uniform Sequence Address (USA): URL-style naming

Derived from the familiar "VMS logical name" syntax used by SRS and GCG.

database : entryname
• embl : ecompa (ID or accession can be used in this way)
• uniprot-id : opsd_bovin (SRS syntax for query by ID)
• embl-acc : x13776 (SRS syntax for query by accession)

format :: filename
• fasta :: /users/pmr/paamir.fa (filename with specific format)
• ecoompa.genbank (with no format, can try all formats)

format :: filename : entryname
• fasta :: unfinished : AH6.1 (most formats allow multiple sequences)

Also @listfile and asis::gctgactgactgatg

Queries: database-field:query (SRS syntax for id, acc, sv, des, key, org)
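
The USA forms listed for this slide can be told apart by their separators: a leading @ for list files, :: after a format name, and a single : before an entry name. The Python below is a simplified sketch of that classification, not the EMBOSS implementation. It assumes the address is written without spaces (as in uniprot:actb1_fugru), and it takes the set of defined database names as a parameter because a name such as embl can only be recognised as a database by consulting the local configuration. The function name `parse_usa` is hypothetical.

```python
def parse_usa(usa, databases=("embl", "uniprot")):
    """Classify a USA string into list file, literal sequence, database, or file."""
    if usa.startswith("@"):
        return {"kind": "listfile", "filename": usa[1:]}

    # Optional 'format::' prefix
    fmt, sep, rest = usa.partition("::")
    if not sep:
        fmt, rest = None, usa
    if fmt == "asis":
        # The text after asis:: is the sequence itself
        return {"kind": "asis", "sequence": rest}

    # Optional ':entryname' suffix
    head, sep, entry = rest.partition(":")
    entry = entry if sep else None

    # 'database-field' selects a query field such as id or acc
    name, _, field = head.partition("-")
    if fmt is None and name in databases:
        return {"kind": "database", "database": name,
                "field": field or None, "entry": entry}
    return {"kind": "file", "format": fmt, "filename": head, "entry": entry}


for example in ("embl:ecompa", "uniprot-id:opsd_bovin", "fasta::unfinished:AH6.1",
                "ecoompa.genbank", "@listfile", "asis::gctgactgactgatg"):
    print(example, "->", parse_usa(example))
```

File names that themselves contain a colon or a hyphenated database-like prefix are outside what this sketch handles.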


New data resources

• Aim to read “all” public data resources

• Follow cross-references (explicit and implied)
  • UniProt
  • EMBL/GenBank/DDBJ
  • Other

• Servers
  • Multiple data resources through a single server definition
  • DAS, Ensembl, BioMart, WsEbeye, DbFetch, SRS
  • Cache files of resource definitions for server

• Data resource catalogue (drcat)
  • 600+ data resources
  • Query terms and URLs
  • EDAM annotation of resources, formats, identifiers, terms


Data resource catalogue (drcat)

ID       ArachnoServer
Acc      DB-0145
Name     ArachnoServer
Desc     Spider toxin database
URL      http://www.arachnoserver.org
Cat      Organism-specific databases
Taxon    6845 | Arachnida
EDAMres  0000621 | Organism-specific
EDAMdat  0002400 | Toxin annotation
EDAMid   0002578 | ArachnoServer ID
Xref     SP_explicit | ArachnoServer ID;Toxin name
Query    Toxin annotation | HTML | ArachnoServer ID | www.arachnoserver.org/toxincard.html?id=%s
Example  ArachnoServer ID | AS000014
CCmisc   BMC Genomics 10:375-375(2009); [Pubmed: 19674480]
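
The Query line of the ArachnoServer entry holds a URL template with a %s placeholder, and the Example line supplies an identifier that fits it. The Python below is a small sketch of combining the two. It assumes that the Query fields are data type, format, identifier type, and URL template in that order, which is an interpretation of the entry shown here rather than a documented layout, and the function names are hypothetical.

```python
DRCAT_ENTRY = """\
ID       ArachnoServer
Name     ArachnoServer
Query    Toxin annotation | HTML | ArachnoServer ID | www.arachnoserver.org/toxincard.html?id=%s
Example  ArachnoServer ID | AS000014
"""


def parse_drcat(entry):
    """Map each line tag to a list of '|'-separated field lists."""
    record = {}
    for line in entry.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(" ")
        fields = [part.strip() for part in value.split("|")]
        # A tag may repeat, so collect every occurrence
        record.setdefault(tag, []).append(fields)
    return record


def example_urls(record):
    """Fill each Query URL template with the Example identifier of the same type."""
    examples = {id_type: ident for id_type, ident in record.get("Example", [])}
    urls = []
    for data_type, fmt, id_type, template in record.get("Query", []):
        if id_type in examples:
            # str.replace avoids trouble if the template contains other '%' characters
            urls.append(template.replace("%s", examples[id_type]))
    return urls


record = parse_drcat(DRCAT_ENTRY)
print(example_urls(record))
```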


EMBOSS Data Types

• Sequences
  • Nucleotide (DNA and RNA)
  • Protein

• Features
  • Attached to sequences
  • Independent data objects

• Bio-Ontologies (OBO)
• Taxonomy (NCBI)
• Data Resources
• Assembled reads
• Text
  • Text, HTML, XML


New data types

• Reuse “USA” syntax
  • [Server:] Dbname : identifier (database has an access method)
  • [Server:] Dbname-field : query (general field names)

• Data types: features, bio-ontologies, taxonomy, etc.

• Access methods: HTTP, DAS, BioMart, Ensembl, ...

• Multiple types and formats for a server/resource
  • type: "sequence features"
  • format: "embl fasta"


EMBOSS Query Language

• Query fields are now made general
  • Any field queryable by the access method (DAS, SRS, …)
  • Any index created by indexing applications
  • Any query term in the data resource catalogue

• Multiple queries combined
  • For one data resource
  • AND, OR, … to combine queries


DAS Server Definitions

SERVER das [
  method: "dassource"
  type: "sequence, features"
  url: "http://www.dasregistry.org/das/"
  comment: "access sequence/feature sources listed on das registry (http://www.dasregistry.org/das/)"
  cachefile: "server.dassource"
]


DAS Server Definitions

SERVER ensembldas [
  method: "dassource"
  type: "sequence, features"
  url: "http://www.ensembl.org/das/"
  comment: "access sequence/feature sources on ensembl das server (http://www.ensembl.org/das/)"
  cachefile: "server.ensembldas"
]
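
The two SERVER definitions share the same shape: a name followed by quoted attributes between square brackets. As an illustration of how such a block could be read by a script, the Python below parses SERVER definitions into dictionaries. It is a sketch under simplifying assumptions (every value is quoted and the closing bracket sits on its own line), not the EMBOSS configuration reader, and `parse_servers` is a hypothetical name.

```python
import re

# 'SERVER name [ ... ]' with the closing bracket on a line of its own
BLOCK = re.compile(r'SERVER\s+(\w+)\s*\[(.*?)\n\s*\]', re.DOTALL)
# 'attribute: "value"'; a value may continue over several lines
ATTRIBUTE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_servers(text):
    """Return {server name: {attribute: value}} for each SERVER block."""
    servers = {}
    for name, body in BLOCK.findall(text):
        servers[name] = {
            # Collapse line breaks inside multi-line values such as comments
            key: " ".join(value.split())
            for key, value in ATTRIBUTE.findall(body)
        }
    return servers


DEFINITIONS = '''
SERVER das [
  method: "dassource"
  type: "sequence, features"
  url: "http://www.dasregistry.org/das/"
  comment: "access sequence/feature sources listed on das registry
            (http://www.dasregistry.org/das/)"
  cachefile: "server.dassource"
]
SERVER ensembldas [
  method: "dassource"
  type: "sequence, features"
  url: "http://www.ensembl.org/das/"
  cachefile: "server.ensembldas"
]
'''

servers = parse_servers(DEFINITIONS)
for name, attrs in servers.items():
    print(name, attrs["url"], attrs["cachefile"])
```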


DAS Example

DB Ensembl_Human_Genes [
  method: das
  type: "Sequence, Features"
  taxon: "9606"
  format: "das, dasgff"
  url: http://www.ebi.ac.uk/das-srv/genedas/das/Homo_sapiens.Gene_ID.reference
  example: "ENSG00000139618"
  comment: "The Ensembl human Gene_ID reference source, serving sequences and non-location features."
  hasaccession: "N"
  identifier: "segment"
  fields: "segment, type, category, categorize, feature_id"
]
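
The fields listed in this definition (segment, type, category, categorize, feature_id) match the parameter names of the DAS features request. The Python below sketches how a request URL could be put together from the source URL and an identifier, following the DAS 1.x convention of `command?segment=ID:start,stop` with further parameters separated by semicolons. Whether EMBOSS builds its requests exactly this way is an assumption, `das_request_url` is a hypothetical helper, and the servers named in this talk may no longer be reachable.

```python
from urllib.parse import quote


def das_request_url(source_url, command, segment, start=None, stop=None, **filters):
    """Build a DAS request URL for one segment, optionally limited to a range."""
    region = quote(segment, safe="")
    if start is not None and stop is not None:
        # DAS writes a range as segment=ID:start,stop
        region += f":{start},{stop}"
    params = [f"segment={region}"]
    # Extra query fields such as type or category
    params += [f"{name}={quote(str(value), safe='')}" for name, value in filters.items()]
    return f"{source_url.rstrip('/')}/{command}?{';'.join(params)}"


GENES = "http://www.ebi.ac.uk/das-srv/genedas/das/Homo_sapiens.Gene_ID.reference"

# The example identifier from the definition above, as a features request
print(das_request_url(GENES, "features", "ENSG00000139618"))
# The same identifier as a sequence request
print(das_request_url(GENES, "sequence", "ENSG00000139618"))
```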


Ensembl DAS Example

DB Felis_catus_CAT_prediction_transcript [
  method: das
  type: "Nucfeatures"
  taxon: "9685"
  format: "dasgff"
  url: http://www.ensembl.org/das/Felis_catus.CAT.prediction_transcript
  example: "scaffold_209987[1:550]"
  comment: "Annotation source for Felis_catus prediction_transcript"
  hasaccession: "N"
  identifier: "segment"
  fields: "segment, type, category, categorize, feature_id"
]


EMBOSS Query Language

• das: ensembl_human_genes: ENSG00000139618
• ensembldas: Felis_catus_CAT_prediction_transcript: scaffold_209987[1:550]
• das: Homo_sapiens_GRCh37_transcript: 10[32889611:32973347]
• das: uniprot: P00280
• das: cath: 5pti
• das: uniparc: UPI000000000A
• das: Homo_sapiens_GRCh37_reference-{segment: 11 & type: supercontig}
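
The example queries follow two patterns: server:database:identifier with an optional [start:stop] range, and server:database-{field: value & field: value} for field queries. The Python below is a minimal sketch of splitting such a query into its parts. It is inferred only from the examples on this slide, handles only the & operator, ignores spaces, and is not the EMBOSS query parser; `parse_query` is a hypothetical name.

```python
import re

QUERY = re.compile(
    r"^(?P<server>\w+):(?P<db>\w+)"
    r"(?::(?P<id>[^\[\]{}]+?)(?:\[(?P<start>\d+):(?P<stop>\d+)\])?"  # :identifier[start:stop]
    r"|-\{(?P<fields>[^}]*)\})$"                                     # -{field:value & ...}
)


def parse_query(text):
    """Split an EMBOSS-style DAS query into server, database, and query parts."""
    # Spaces are only for readability in the examples, so drop them
    match = QUERY.match(re.sub(r"\s+", "", text))
    if match is None:
        raise ValueError(f"unrecognised query: {text!r}")
    parts = match.groupdict()
    result = {"server": parts["server"], "database": parts["db"]}
    if parts["fields"] is not None:
        # Field terms are joined with '&' in the example above
        result["fields"] = dict(term.split(":", 1) for term in parts["fields"].split("&"))
    else:
        result["identifier"] = parts["id"]
        if parts["start"] is not None:
            result["range"] = (int(parts["start"]), int(parts["stop"]))
    return result


print(parse_query("das: Homo_sapiens_GRCh37_transcript: 10[32889611:32973347]"))
print(parse_query("das: Homo_sapiens_GRCh37_reference-{segment: 11 & type: supercontig}"))
```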


EMBOSS Query Language: Future

• Ontology-based searches of data resources
  • Taxonomy
  • EDAM terms
    • Resources
    • Data types
    • Identifiers
  • Descriptions

• Search for applications matching data types
  • Sequences and features
  • Nucleotide and protein
  • …

• Support for DAS advanced query ...


Acknowledgements

• EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam

• RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop

• Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley

• LION: Mahmut Uludag, Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold

• National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina

• Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, ...

• IBM, Hewlett-Packard, (Compaq), Apple, SGI, Sun, LION bioscience, SciTegic, Cambridge University Press

• Open-Bio Foundation, Sourceforge, Debian, Fedora, CEH

... And the British Antarctic Survey

http://emboss.sourceforge.net

http://emboss.open-bio.org/wiki/Latest_developments

