DAS/2: Next Generation Distributed Annotation System Gregg Helt 1, Steve Chervitz 1, Andrew Dalke 3,...

Date post: 31-Mar-2015
DAS/2: Next Generation Distributed Annotation System

Gregg Helt (1), Steve Chervitz (1), Andrew Dalke (3), Allen Day (4), Ed Erwin (1), Andreas Prlic (2), and Lincoln Stein (4)

with many other contributors

(1) Affymetrix, Inc.; (2) Sanger Institute; (3) Dalke Scientific; (4) Cold Spring Harbor Laboratory


Development of DAS/2 Specification

DAS/2 development was initially motivated by numerous suggestions for improving DAS on the DAS mailing list, and by the series of RFCs collected on the biodas.org site

Though informal, still a long process!

NIH grant awarded June 2004 for development of next-generation DAS/2

The most recent DAS/2 specification is available at biodas.org/documents/das2/das2_protocol.html (tied to the CVS repository)

DAS/2.0 XML schema frozen since November 2006
– Specified with RELAX NG
– Available in the CVS repository at cvs.biodas.org, in file das/das2/das2_schemas.rnc

Feedback from the DAS developer and user communities will continue to guide future iterations of the DAS/2 specification

– Biweekly teleconference; everyone is welcome to join in the discussion
– DAS/2 mailing list (http://lists.open-bio.org/mailman/listinfo/das2)
– biodas.org site moving to a wiki (biodas.org/wiki)


“Things I would like to do with DAS, but currently can’t” (without extensions)

Achieve reasonable performance with large amounts of data

Represent features with more than two levels

Reliably refer to DAS features / sequences / etc. outside of DAS

Reliably relate feature types to a more structured ontology

Efficiently cache DAS feature queries

Easily identify when two DAS servers are using the same coordinate system (doable with the help of the Sanger DAS registry)

Have a standard way to create and edit DAS features


Preserving DAS1 Strengths in DAS/2

Specification is independent of implementation
– Many server implementations
– Many client implementations

Simple, simple, simple
– HTTP for transport
– URLs for queries
– XML for responses
– REST-like style

No central annotation authority

Focus on location-based annotations of biological sequences

Couple XML response formats to URL request formats
– Instead of XML formats on their own


Basic DAS/2 Queries

NetAffx examples: http://netaffxdas.affymetrix.com/das2/

Sources query: what genomes, and what versions of those genomes, are available?

Segments query: what annotated sequences are available?

Types query: what types of annotations are available?

Features query: get features / annotations
– Based on type
– Based on segment
– Based on segment range
– Based on annotation ID
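The four basic queries map naturally onto URLs. The sketch below builds such URLs for one versioned source; the `SOURCE` URI, the query names, and the filter parameter names (`segment`, `overlaps`, `type`) are illustrative assumptions, not values taken from the specification or from the NetAffx server. In practice a client would read the segments, types, and features URIs from the sources response rather than constructing them.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical versioned-source URI; a real client would discover the
# segments/types/features URIs from the sources response instead.
SOURCE = "http://example.org/das2/genome/H_sapiens_Mar_2006/"

def query_url(query: str, **filters: str) -> str:
    """Build a query URL for one versioned source.

    The query names ("segments", "types", "features") and the filter
    parameter names are illustrative assumptions.
    """
    url = urljoin(SOURCE, query)
    if filters:
        # Keep ':' and '/' readable in range values such as 1000:2000.
        url += "?" + urlencode(filters, safe=":/")
    return url

if __name__ == "__main__":
    print(query_url("segments"))
    print(query_url("types"))
    print(query_url("features", type="refseq"))
    print(query_url("features", segment="chr21", overlaps="1000:2000"))
```

Since responses are coupled to request URLs, a plain HTTP GET on any of these strings is all the client needs; there is no request body or session state.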


High Level Comparison: DAS/1 and DAS/2 are very similar

[Comparison table: DAS/1 vs. DAS/2]


DAS/2 Enhancements: Performance

One of the biggest complaints about DAS1: performance
– Very verbose annotation XML, which hinders performance at the server, network, and client

DAS/2 Solution #1: Refactoring annotation XML
– Much smaller minimum footprint

DAS/2 Solution #2: Alternative return formats
– All servers can return the defined das2xml annotation format
– Servers can also specify additional return formats per annotation type
– Clients can choose from alternative formats if they desire
– Not restricted to XML, or even text
– Examples: GFF3, BED, PSL, binaryPSL
– Extreme performance improvements possible
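Client-side format selection could look like the following sketch. The per-type format lists, the format identifiers (`gff3`, `bed`, `psl`, `binarypsl`), and the preference order are hypothetical; the only rule taken from the slide is that every server can return `das2xml`, which makes it a safe fallback.

```python
# Hypothetical per-type format lists, as a client might collect them from a
# server. Format identifiers are illustrative, not official DAS/2 names.
SERVER_FORMATS = {
    "refseq": ["das2xml", "bed", "psl", "binarypsl"],
    "exon_array_probes": ["das2xml", "gff3"],
}

# Client preference: compact binary/text formats first, das2xml last.
CLIENT_PREFERENCE = ["binarypsl", "psl", "bed", "gff3", "das2xml"]

def choose_format(annotation_type: str) -> str:
    """Pick the most preferred format the server offers for this type."""
    offered = SERVER_FORMATS.get(annotation_type, [])
    for fmt in CLIENT_PREFERENCE:
        if fmt in offered:
            return fmt
    # Every DAS/2 server can return das2xml, so fall back to it.
    return "das2xml"
```

The design keeps the decision on the client: the server only advertises what it can produce, and each client weighs parsing cost against its own needs.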


Redesigned XML for improved performance: minimal feature XML

DAS/2

<FEATURE uri="" type="">
  <LOC segment="" range="" />
</FEATURE>

DAS/1

<FEATURE id="">
  <TYPE id="" />
  <METHOD />
  <START> </START>
  <END> </END>
  <SCORE> </SCORE>
  <ORIENTATION> </ORIENTATION>
  <PHASE> </PHASE>
</FEATURE>
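To show how little a client has to read from the minimal DAS/2 feature XML, here is a parsing sketch using Python's standard `xml.etree.ElementTree`. The sample is simplified: it omits the XML namespace a real das2xml document would declare, and the `start:end:strand` syntax for the `range` attribute is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample. The "start:end[:strand]" range syntax
# is an assumption of this sketch.
SAMPLE = """<FEATURES>
  <FEATURE uri="feature/exon1" type="type/exon">
    <LOC segment="segment/chr21" range="1000:1200:1" />
  </FEATURE>
  <FEATURE uri="feature/exon2" type="type/exon">
    <LOC segment="segment/chr21" range="1500:1650" />
  </FEATURE>
</FEATURES>"""

def parse_range(text):
    """Split 'start:end' or 'start:end:strand' into ints (strand defaults to 0)."""
    parts = [int(p) for p in text.split(":")]
    strand = parts[2] if len(parts) > 2 else 0
    return parts[0], parts[1], strand

def read_features(xml_text):
    """Return one dict per FEATURE with its uri, type, and list of locations."""
    features = []
    for feat in ET.fromstring(xml_text).iter("FEATURE"):
        # A feature may carry several LOC elements (e.g. alignments).
        locs = [(loc.get("segment"), *parse_range(loc.get("range")))
                for loc in feat.findall("LOC")]
        features.append({"uri": feat.get("uri"), "type": feat.get("type"), "locs": locs})
    return features
```

Each feature needs only two elements and four attributes, compared with the seven child elements of the DAS/1 form.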


DAS/2 Enhancements: Resolving Ambiguities

Example: Ambiguous Range Queries

[Figure: a query range x:y and the differing responses of Servers 1 to 4. Overlap or containment? Parent based or separate?]
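The ambiguity in the figure can be made concrete. The sketch below defines four region filters (overlaps, contains, within, identical) as predicates over half-open intervals and applies them to one query range. The interval convention, and the reading of "contains" (the feature spans the query) versus "within" (the feature lies inside the query), are assumptions of this sketch, not definitions from the specification.

```python
# Intervals are (start, end) pairs treated as half-open [start, end); that
# convention and the meaning given to each filter name are assumptions.
def overlaps(f, q):
    """Feature and query share at least one position."""
    return f[0] < q[1] and q[0] < f[1]

def contains(f, q):
    """Feature spans the whole query range."""
    return f[0] <= q[0] and q[1] <= f[1]

def within(f, q):
    """Feature lies entirely inside the query range."""
    return q[0] <= f[0] and f[1] <= q[1]

def identical(f, q):
    """Feature has exactly the query's coordinates."""
    return f == q

QUERY = (1000, 2000)
FEATURES = {"A": (900, 1100), "B": (1200, 1800), "C": (500, 2500), "D": (1000, 2000)}

def hits(*predicates, negate=()):
    """Names of features satisfying every predicate and none of the negated ones."""
    return sorted(
        name for name, f in FEATURES.items()
        if all(p(f, QUERY) for p in predicates)
        and not any(p(f, QUERY) for p in negate)
    )

if __name__ == "__main__":
    for pred in (overlaps, contains, within, identical):
        print(pred.__name__, hits(pred))
    # A boolean combination: inside the range but not an exact match.
    print("within and not identical", hits(within, negate=(identical,)))
```

With these definitions the same query returns four different answers (A to D, C and D, B and D, or only D), which is the server-to-server disagreement the figure illustrates.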


DAS/2 Solution #1 – remove spec ambiguity

Example: Ambiguous Range Queries

Be specific about whether feature query range filter is overlap, containment, etc.

Add different region filters for different possibilities
– Overlaps
– Contains
– Within
– Identical

Allow boolean combinations of these and other filters in the query URL
– A smart client could use these combinations to optimize queries

Return full feature closure (all parents and parts)
– This also allows streaming processing
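Returning the full feature closure means that when any part of a feature hierarchy matches, the server also sends that part's parents and their parts. A minimal sketch of the traversal follows; the `FEATURES` table and its `parents`/`parts` fields are hypothetical stand-ins for a server's feature store.

```python
# Hypothetical feature store: each feature lists its parent and part URIs.
FEATURES = {
    "gene1": {"parents": [], "parts": ["mrna1"]},
    "mrna1": {"parents": ["gene1"], "parts": ["exon1", "exon2"]},
    "exon1": {"parents": ["mrna1"], "parts": []},
    "exon2": {"parents": ["mrna1"], "parts": []},
    "gene2": {"parents": [], "parts": []},
}

def closure(hits):
    """Return every feature reachable from the hits via parent or part links."""
    seen = set()
    stack = list(hits)
    while stack:
        uri = stack.pop()
        if uri in seen:
            continue
        seen.add(uri)
        # Walk both directions so the whole hierarchy is collected.
        stack.extend(FEATURES[uri]["parents"])
        stack.extend(FEATURES[uri]["parts"])
    return seen
```

A hit on `exon2` pulls in `mrna1`, `gene1`, and `exon1` but not the unrelated `gene2`. One way to read the streaming claim: once a hierarchy's closure is collected it is complete, so it can be emitted and processed as a unit without waiting for the rest of the response.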


Solution #2: DAS/2 Validation Suite

Verify whether a DAS/2 server is compliant with the specification.

– Critical for improving interoperability between clients and servers developed by different groups.

Standalone tool and web application, written in Python
– Enter a DAS/2 URL query or XML response
– Get an HTML report about DAS/2 compliance

Performs schema-based validation
– Also validates some parts of the protocol not formalized in the schema, such as URL query parameters

Web application at http://cgi.biodas.org:8080/
– Moving soon
– Plan is to eventually integrate it into the DAS/2 registry server
– Source code available at: http://sourceforge.net/projects/dasypus
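Besides schema checks, the validator also inspects URL query parameters. The sketch below shows what such a check might look like. It is not code from the Dasypus validator, and the `ALLOWED` parameter set and the `start:end` range pattern are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical whitelist of feature-query parameters; the authoritative
# list is defined by the DAS/2 specification, not by this sketch.
ALLOWED = {"segment", "overlaps", "inside", "type", "format"}
RANGE_PARAMS = {"overlaps", "inside"}
RANGE_RE = re.compile(r"(\d+):(\d+)")

def check_query(url):
    """Return a list of problems found in the query string of a features URL."""
    problems = []
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if key not in ALLOWED:
            problems.append(f"unknown parameter: {key}")
        elif key in RANGE_PARAMS:
            m = RANGE_RE.fullmatch(value)
            if not m:
                problems.append(f"bad range for {key}: {value}")
            elif int(m.group(1)) > int(m.group(2)):
                problems.append(f"start after end for {key}: {value}")
    return problems
```

A compliance report would then list these problems next to the schema errors for the response document.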


DAS/2 enhancements to integrate needs for DAS1 extensions

CAPABILITIES element – replaces the DAS1 X-Das-Capabilities header

Gene DAS
– DAS/2 feature is not required to have a location
– If it has a location, it is not required to specify a range

Protein DAS
– DAS/2 feature is not required to have any DNA-specific elements like phase or orientation

Alignment DAS
– DAS/2 feature can have multiple locations
– Each location can have an optional gap attribute, which is a CIGAR string
– Two locations: pairwise alignment
– More than two locations: multiple alignment

“Simple” DAS
– Server can choose not to support a capability by omitting its CAPABILITIES element (for example, no segments / entry-points query)
– Can specify that feature filters are not supported

Structural DAS

Others (3DEM, Interaction, ???)
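For Alignment DAS, the gap attribute is a CIGAR string. This sketch parses one and computes how many bases the alignment spans on the reference and on the aligned sequence. It assumes the SAM-style CIGAR convention (an explicit count before every operation), which may differ in detail from what a DAS/2 server emits.

```python
import re

# SAM-style CIGAR operations and which sequence each one consumes.
CONSUMES_REF = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Turn e.g. '10M2I5M' into [(10, 'M'), (2, 'I'), (5, 'M')]."""
    ops = CIGAR_RE.findall(cigar)
    # Reject strings containing characters the pattern did not consume.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in ops]

def aligned_lengths(cigar):
    """Return (reference span, query span) covered by the alignment."""
    ops = parse_cigar(cigar)
    ref = sum(n for n, op in ops if op in CONSUMES_REF)
    query = sum(n for n, op in ops if op in CONSUMES_QUERY)
    return ref, query
```

For `10M2I5M3D4M`, the insertion adds 2 bases to the query only and the deletion adds 3 to the reference only, giving spans of 22 and 21.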


More DAS/2 Enhancements

IDs are URIs
– Could be LSIDs or URLs
– Allows for integration with many other web technologies
– xml:base

“Writeback” spec to allow DAS/2 clients to create and edit annotations on DAS/2 servers
– Spec has been frozen, but client and server implementations are still preliminary

Ontologies for feature types

Feature hierarchies

DAS/2 Registry

And more…
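Because IDs are URIs, a document can carry short relative URIs and let `xml:base` supply the prefix. This sketch resolves feature URIs with `urllib.parse.urljoin`. The document content is made up, and for brevity the sketch honors `xml:base` only on the root element, although `xml:base` may also appear on nested elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

DOC = """<FEATURES xml:base="http://example.org/das2/genome/hg18/">
  <FEATURE uri="feature/exon1" type="type/exon" />
  <FEATURE uri="http://other.example.org/f/42" type="type/exon" />
</FEATURES>"""

def absolute_feature_uris(xml_text):
    """Resolve each FEATURE uri against the root's xml:base (nested bases ignored)."""
    root = ET.fromstring(xml_text)
    base = root.get(XML_BASE, "")
    # urljoin leaves already-absolute URIs unchanged.
    return [urljoin(base, f.get("uri")) for f in root.iter("FEATURE")]
```

The relative ID becomes `http://example.org/das2/genome/hg18/feature/exon1`, while the absolute one, which could point at another server, passes through untouched.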


DAS/2 Server Implementations

GMOD-based DAS/2 server
– Deployed at http://das.biopackages.net/das/genome
– Uses BioPerl for middleware
– Plugin architecture for data backend
– Currently the most developed plugin is for the CHADO database
– Source code available via anonymous CVS as part of GMOD; see http://www.gmod.org for access details.

Genometry DAS/2 server
– Deployed at http://netaffxdas.affymetrix.com/das2/sources
– Designed for performance: (mostly) in-memory object datastore; quickly transmits hundreds of thousands of features; quickly transmits millions of graph data points
– Only supports fairly simple annotations
– Supports alternative content formats
– Supports some DAS/2 caching via the If-Modified-Since header

Simple files exposed on a web server

Easing migration: DAS1-to-DAS/2 transformational proxy server

Other implementations?
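Caching via `If-Modified-Since` works as an ordinary HTTP conditional GET. The class below is a minimal client-side sketch using only the standard library. It is not how the Genometry server or IGB implements caching, and the in-memory cache layout is an assumption.

```python
import time
import urllib.error
import urllib.request
from email.utils import formatdate

class FeatureCache:
    """In-memory cache of feature responses keyed by query URL (a sketch)."""

    def __init__(self):
        # url -> (HTTP date string to send back, cached response body)
        self.entries = {}

    def build_request(self, url):
        req = urllib.request.Request(url)
        if url in self.entries:
            stamp, _ = self.entries[url]
            req.add_header("If-Modified-Since", stamp)
        return req

    def fetch(self, url):
        """Return the body for url, reusing the cached copy on 304 Not Modified."""
        req = self.build_request(url)
        try:
            with urllib.request.urlopen(req) as resp:
                body = resp.read()
                # Prefer the server's Last-Modified; fall back to the local clock (GMT).
                stamp = resp.headers.get("Last-Modified") or formatdate(time.time(), usegmt=True)
        except urllib.error.HTTPError as err:
            if err.code == 304 and url in self.entries:
                return self.entries[url][1]
            raise
        self.entries[url] = (stamp, body)
        return body
```

On a `304 Not Modified` reply, `urlopen` raises `HTTPError` with code 304, so the sketch catches that case and serves the cached body instead of re-downloading the features.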


DAS/2 Client Implementations

IGB (“ig-bee”), the Integrated Genome Browser: genome visualization application developed at Affymetrix
– Implemented in Java
– Supports data loading via a variety of formats and mechanisms
– Contains both DAS1 and DAS/2 clients
– Handles large amounts of genome-scale data: loads hundreds of thousands of sequence annotations at once; loads dense quantitative graphs with millions of data points; maintains real-time responsiveness to user interactions
– Includes features to support exploratory data analysis
– Plugin architecture for customized extensions
– Source code released under the Common Public License at http://genoviz.sourceforge.net
– Also available as a WebStart-managed application from the Affymetrix or SourceForge web sites

Other implementations?
– GBrowse
– Dasypus validator
– DAS/2 Registry
– ???


DAS/2 Registry

Main registry implementation developed by Andreas Prlic

Evolving from Sanger DAS1 registry

Multiple ways to access registry – Andreas’ talk later

One elegant way: a DAS/2 registry is simply a DAS/2 server
– Most of the information needed for a registry is already available in DAS/2 XML responses
– So any DAS/2 server that aggregates DAS/2 sources in its sources XML document can be considered a DAS/2 registry
– This works because of the RESTful approach to specifying URLs for accessing particular versioned source capabilities
– “Simple” DAS/2 registries can even be static documents
– Very useful for in-house DAS/2 registries

More sophisticated DAS/2 registries can have query filters for the sources query (not developed yet)
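A static sources document can act as a simple registry. The sketch below lists every versioned source that offers a given capability type. The element and attribute names (`SOURCE`, `VERSION`, `CAPABILITY`, `type`, `query_uri`) approximate DAS/2 sources XML and should be checked against the schema; the namespace is omitted and the URLs are invented.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sources document aggregating two servers.
# Element/attribute names approximate DAS/2 sources XML (assumption).
REGISTRY = """<SOURCES>
  <SOURCE uri="http://a.example.org/das2/genome" title="Lab A genomes">
    <VERSION uri="http://a.example.org/das2/genome/hg18" title="hg18">
      <CAPABILITY type="segments" query_uri="http://a.example.org/das2/genome/hg18/segments" />
      <CAPABILITY type="features" query_uri="http://a.example.org/das2/genome/hg18/features" />
    </VERSION>
  </SOURCE>
  <SOURCE uri="http://b.example.org/das2/genome" title="Lab B genomes">
    <VERSION uri="http://b.example.org/das2/genome/hg18" title="hg18">
      <CAPABILITY type="features" query_uri="http://b.example.org/das2/genome/hg18/features" />
    </VERSION>
  </SOURCE>
</SOURCES>"""

def find_capability(xml_text, cap_type):
    """List (version uri, query uri) pairs for every version offering cap_type."""
    root = ET.fromstring(xml_text)
    found = []
    for version in root.iter("VERSION"):
        for cap in version.findall("CAPABILITY"):
            if cap.get("type") == cap_type:
                found.append((version.get("uri"), cap.get("query_uri")))
    return found
```

Lab B's version has no `segments` capability, mirroring how a “simple” DAS/2 server omits capabilities it does not support; a client reading this registry learns that without contacting either server.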


DAS/2 Writeback

Uses HTTP POST

DAS2XML is POSTed to the DAS/2 writeback server

Atomic transactional unit is the HTTP call

Locking mechanism

Spec stable

Only partial client and server implementations so far; expect the spec to change as implementations are further developed
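A writeback client sends DAS2XML in the body of an HTTP POST. The sketch below only builds the request and does not send it. The writeback URI, the `WRITEBACK` root element, and the `application/xml` content type are placeholders, since the real values come from the server and the writeback specification. Locking is not modeled.

```python
import urllib.request

# Placeholder values: the real writeback URI comes from the server's
# capabilities, and the real document shape from the writeback spec.
WRITEBACK_URI = "http://example.org/das2/genome/hg18/writeback"
CONTENT_TYPE = "application/xml"

NEW_FEATURE = b"""<WRITEBACK>
  <FEATURE uri="feature/new1" type="type/exon">
    <LOC segment="segment/chr21" range="1000:1200:1" />
  </FEATURE>
</WRITEBACK>"""

def build_writeback(body: bytes) -> urllib.request.Request:
    """Wrap a DAS2XML payload in a POST request (not sent here)."""
    # One HTTP call is one atomic transaction, so all edits for a single
    # change belong in the same body. Locking is not modeled.
    return urllib.request.Request(WRITEBACK_URI, data=body, method="POST",
                                  headers={"Content-Type": CONTENT_TYPE})
```

Sending it would be a single `urllib.request.urlopen(build_writeback(NEW_FEATURE))` call against a server that actually implements writeback.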


Future DAS/2 developments

Short term
– More documentation of specification
– More documentation of existing client and server implementations
– Continued improvements to client and server implementations
– Most work needed on client and server writeback implementation

Help install and/or develop DAS/2 servers at model organism database sites

Mapping servers

Interclient communications protocol

Extreme DAS caching

[ 3D structure ]

Extensions
– Extended via CAPABILITIES element
– General principles: if an entity is independent enough to have an ID, the ID should be a URI …


Acknowledgements

DAS & DAS2 mailing list participants!

