
SADI for GMOD: Semantic Web Services for Model Organism Databases

Date post: 12-Jan-2015
Upload: benvvalk
Description:
SADI for GMOD is a collection of ready-made SADI services for accessing sequence feature data in RDF form. The services were developed as an add-on for the GMOD (Generic Model Organism Database) project, which is a popular toolkit for building model organism databases and their associated websites (e.g. FlyBase).
SADI for GMOD: Semantic Web Services for Model Organism Databases Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson James Hogg Research Centre, Heart + Lung Institute University of British Columbia http://code.google.com/p/sadi/wiki/SADIforGMOD
Transcript
Page 1: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI for GMOD: Semantic Web Services for Model Organism Databases

Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson

James Hogg Research Centre, Heart + Lung Institute, University of British Columbia

http://code.google.com/p/sadi/wiki/SADIforGMOD

Page 2: SADI for GMOD: Semantic Web Services for Model Organism Databases

Background

Page 3: SADI for GMOD: Semantic Web Services for Model Organism Databases

Background: Model Organism Databases

• several organisms are studied extensively by biologists, e.g. yeast, mouse, fruit fly

• each model organism has its own database: 

• sequences (DNA, RNA, protein)

• sequence features (e.g. genes)

• research publications

• experimental results

• biochemical pathways

• phenotype images

• evolutionary trees (for closely related species)

All images were obtained from Wikipedia and are in the public domain.

Page 4: SADI for GMOD: Semantic Web Services for Model Organism Databases

Background: Sequence Features

[Figure: genome browser view with position on the DNA sequence along the horizontal axis and three tracks: promoter track, gene track, transcript track]

Lincoln Stein, http://www.sequenceontology.org/gff3.shtml

• sequence features (a.k.a. sequence annotations) are regions of a DNA or protein sequence with a certain type (e.g. 'gene')

• in genome browsers, different types of sequence annotations are displayed in separate tracks
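A sequence feature as described on this slide can be modelled as a typed interval on a reference sequence. The following is a minimal Python sketch, assuming the nine tab-separated columns of the GFF3 format referenced above. The `SequenceFeature` class, the `parse_gff3_line` function, and the sample line are illustrative only and are not part of SADI for GMOD.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class SequenceFeature:
    """A typed region of a reference sequence (hypothetical helper class)."""
    seqid: str          # reference sequence, e.g. a chromosome
    feature_type: str   # e.g. 'gene'
    start: int          # 1-based, inclusive (GFF3 convention)
    end: int            # 1-based, inclusive
    strand: str         # '+', '-', '.' or '?'
    attributes: dict    # key/value pairs from column 9


def parse_gff3_line(line: str) -> SequenceFeature:
    """Parse one GFF3 feature line into a SequenceFeature."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line has nine tab-separated columns")
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[unquote(key)] = unquote(value)
    return SequenceFeature(seqid, ftype, int(start), int(end), strand, attributes)


# Made-up example line, for illustration only.
example = "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
feature = parse_gff3_line(example)
print(feature.feature_type, feature.start, feature.end, feature.attributes["ID"])
```

Each parsed line corresponds to one item that a genome browser would draw in the track for its feature type.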

Page 5: SADI for GMOD: Semantic Web Services for Model Organism Databases

Background: Sequence Features

autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/

Many types of biological data are represented as sequence features:

promoters, chromosome bands, genes, transcripts, CDSs, proteins, protein domains, transposons, non-coding RNAs, ESTs, many more...

Page 6: SADI for GMOD: Semantic Web Services for Model Organism Databases

Background: Distributed Annotation System (DAS)

autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/

[Diagram: four DAS servers, each receiving an HTTP GET request and returning DAS XML]

Page 7: SADI for GMOD: Semantic Web Services for Model Organism Databases

Background: Limitations of the Distributed Annotation System (DAS)

• integrating data from DAS servers requires specialized software (“DAS clients”)

• other types of data (e.g. biochemical pathways, experimental results) cannot be automatically integrated with sequence feature data

• most bioinformatics analysis software (e.g. BLAST) does not speak DAS

Page 8: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI for GMOD: Semantic Web Services for Model Organism Databases

Page 9: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI (Semantic Automated Discovery and Integration)

• Standard for Web services that consume/generate RDF

• Motivation: automated integration of bioinformatics data and software

GMOD (Generic Model Organism Database)

• Toolkit for building a model organism database and website

• Collection of related open source projects: e.g. Chado, GBrowse, Pathway Tools

• Many sites use GMOD components: FlyBase, BeetleBase, DictyBase, etc. 

Page 10: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI in a Nutshell

• to invoke a SADI service:

  o HTTP POST an RDF document to the service URL

  o e.g. $ curl --data @input.rdf http://sadiframework.org/examples/hello

• to get service metadata:

  o HTTP GET on service URL

  o returns an RDF document with service name, description, etc.

  o e.g. $ curl http://sadiframework.org/examples/hello

• structure of input/output data is described in OWL

  o service provider specifies one input OWL class and one output OWL class

• strengths of SADI

  o no framework-specific messaging formats or ontologies

  o supports batch processing of inputs

  o supports long-running services (asynchronous services)

more info: http://sadiframework.org/
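The two interactions described on this slide (POST to invoke, GET for metadata) can be sketched in Python with the standard library. This is a minimal sketch, not the SADI client API: the function names are hypothetical, and the `application/rdf+xml` content type is an assumption that the slides do not state (the curl command on the slide sends curl's default content type).

```python
import urllib.request

# Example service URL taken from the slide.
SERVICE_URL = "http://sadiframework.org/examples/hello"


def build_invoke_request(service_url: str, rdf_bytes: bytes,
                         content_type: str = "application/rdf+xml"):
    """Build the HTTP POST that invokes a SADI service with an RDF document.

    The default content type is an assumption, not taken from the slides.
    """
    return urllib.request.Request(
        service_url,
        data=rdf_bytes,
        headers={"Content-Type": content_type, "Accept": content_type},
        method="POST",
    )


def build_metadata_request(service_url: str):
    """Build the HTTP GET that fetches the service's RDF description."""
    return urllib.request.Request(
        service_url,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )


def send(request) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


# An empty RDF/XML document stands in for a real input here.
EMPTY_RDF = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
invoke = build_invoke_request(SERVICE_URL, EMPTY_RDF)
metadata = build_metadata_request(SERVICE_URL)
print(invoke.get_method(), metadata.get_method())
```

Calling `send(invoke)` or `send(metadata)` would perform the actual network request. Building the requests separately keeps the sketch inspectable without contacting the service.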

Page 11: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI for GMOD Services

• SADI services for accessing sequence feature data

• implemented as Perl CGI scripts

Service Name                    | Input               | Relationship              | Output
get_feature_info                | database identifier | is about                  | feature description
get_features_overlapping_region | genomic coordinates | overlaps                  | collection of feature descriptions
get_sequence_for_region         | genomic coordinates | is represented by         | DNA, RNA, or amino acid sequence
get_child_features              | feature description | has part / derives into   | collection of feature descriptions
get_parent_features             | feature description | is part of / derives from | collection of feature descriptions
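The 'overlaps' relationship used by get_features_overlapping_region can be illustrated with a small interval test. This is a sketch under stated assumptions: coordinates are taken to be 1-based and inclusive, the `Region` type and both functions are hypothetical, and the real Perl services query a feature database rather than a Python list. Only the first feature uses coordinates from these slides (26994 to 32391); the other two are made up.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """Genomic coordinates: reference sequence plus inclusive start/end."""
    seqid: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions on the same sequence share at least one position."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def features_overlapping(region: Region, features: List[Region]) -> List[Region]:
    """Return the features that overlap the query region."""
    return [f for f in features if overlaps(region, f)]


features = [
    Region("4", 26994, 32391),   # gene coordinates shown in the slides
    Region("4", 40000, 41000),   # made-up feature
    Region("X", 27000, 28000),   # made-up feature on another sequence
]
query = Region("4", 30000, 35000)
hits = features_overlapping(query, features)
print(hits)
```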

Page 12: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI for GMOD: Structure of Service Input/Output RDF

Input RDF (N3), sent to get_feature_info by HTTP POST:

@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix sio: <http://semanticscience.org/resource/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .

GeneID:49962 a lsrn:GeneID_Record;
    sio:SIO_000008 [                  # p = 'has attribute'
        a lsrn:GeneID_Identifier;
        sio:SIO_000300 "49962"        # p = 'has value'
    ] .

Output RDF (N3), returned by get_feature_info:

@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix sio: <http://semanticscience.org/resource/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .
@prefix FlyBase: <http://flybase.org/cgi-bin/sadi.gmod/feature?id=> .
@prefix GenBank: <http://lsrn.org/GB:> .
# declarations for the SO:, range: and strand: prefixes are omitted on the slide

# p = 'is about'
GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

# feature
FlyBase:FBgn0040037 a SO:SO_0000704;  # o = 'gene'
    range:position [
        a range:RangedSequencePosition;
        sio:SIO_000053 [              # p = 'has proper part'
            a range:StartPosition;
            sio:SIO_000300 26994
        ];
        sio:SIO_000053 [              # p = 'has proper part'
            a range:EndPosition;
            sio:SIO_000300 32391
        ];
        range:in_relation_to _:minus_strand_seq
    ] .

_:minus_strand_seq sio:SIO_000011 [   # p = 'represents'
        a strand:MinusStrand;
        sio:SIO_000093 GenBank:AE014135   # p = 'is proper part of'
    ] .

# reference feature (chromosome)
FlyBase:4                             # chromosome 4
    a SO:SO_0000105 .                 # o = 'chromosome arm'
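The input document for get_feature_info shown on this slide follows a fixed pattern, so it can be generated from a GeneID. The following Python sketch does this with a string template. The function name is hypothetical, and the `sio:` prefix URI is taken from the SPARQL example later in these slides rather than from this slide.

```python
# N3 template mirroring the input RDF shown on the slide.
INPUT_TEMPLATE = """\
@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix sio: <http://semanticscience.org/resource/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .

GeneID:{gene_id} a lsrn:GeneID_Record;
    sio:SIO_000008 [                  # 'has attribute'
        a lsrn:GeneID_Identifier;
        sio:SIO_000300 "{gene_id}"    # 'has value'
    ] .
"""


def gene_id_input_rdf(gene_id: str) -> str:
    """Return an N3 input document describing one GeneID record."""
    if not gene_id.isdigit():
        # Assumption of this sketch: GeneIDs are purely numeric.
        raise ValueError("expected a numeric GeneID, got %r" % gene_id)
    return INPUT_TEMPLATE.format(gene_id=gene_id)


document = gene_id_input_rdf("49962")
print(document)
```

The resulting text is what would be sent in the body of the HTTP POST to the service.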

Page 13: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI for GMOD Demo

Page 14: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI Client Software

• SHARE Query Engine: SPARQL Query => SADI Workflow

  http://biordf.net/cardioSHARE/query

• SADI Taverna Plugin: Design SADI Workflows

  http://sadiframework.org/content/2010/05/03/sadi-taverna-plugin-tutorial/

Page 15: SADI for GMOD: Semantic Web Services for Model Organism Databases

Demo with SHARE Query Engine

SPARQL Query SADI Workflow

"What proteins are homologous to FlyBase protein FBpp0091047?"

PREFIX FlyBase: <http://lsrn.org/FLYBASE:>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX sadi: <http://sadiframework.org/ontologies/properties.owl#>

SELECT ?homolog
WHERE {
    # SIO_000332 = 'is about'
    FlyBase:FBpp0091047 sio:SIO_000332 ?protein .
    ?protein sadi:hasSequence ?sequence .

    # SIO_010302 = 'is homologous to'
    ?protein sio:SIO_010302 ?homolog .
}
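The same question can be asked about other FlyBase proteins by substituting the identifier in the query. The following Python sketch builds the query text only; it does not submit it to SHARE. The function name and the `FBpp` plus digits validation pattern are assumptions of this sketch.

```python
import re

# SPARQL template mirroring the query on the slide; doubled braces are
# literal braces in str.format templates.
QUERY_TEMPLATE = """\
PREFIX FlyBase: <http://lsrn.org/FLYBASE:>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX sadi: <http://sadiframework.org/ontologies/properties.owl#>

SELECT ?homolog
WHERE {{
    # SIO_000332 = 'is about'
    FlyBase:{protein_id} sio:SIO_000332 ?protein .
    ?protein sadi:hasSequence ?sequence .

    # SIO_010302 = 'is homologous to'
    ?protein sio:SIO_010302 ?homolog .
}}
"""


def homolog_query(protein_id: str) -> str:
    """Return the homolog query for one FlyBase protein identifier."""
    if not re.fullmatch(r"FBpp\d+", protein_id):
        # Assumed identifier shape; rejects anything that could alter the query.
        raise ValueError("unexpected FlyBase protein ID: %r" % protein_id)
    return QUERY_TEMPLATE.format(protein_id=protein_id)


query_text = homolog_query("FBpp0091047")
print(query_text)
```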

Page 16: SADI for GMOD: Semantic Web Services for Model Organism Databases

Acknowledgements

Team

Mark Wilkinson: Principal Investigator

Luke McCarthy: Lead Programmer, SADI & SHARE

Edward Kawas: Perl Programmer, SADI

Funding

Microsoft Research

http://sadiframework.org/

Page 17: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI Training Course

 

“Web Publishing of Scientific Data and Services”

October 22nd-23rd, 2011

University of British Columbia (next door!)

Learn how to:

=> semantically describe service functionality in OWL

=> publish Semantic Web services using the SADI framework

More info: http://sadiframework.org/training

Page 18: SADI for GMOD: Semantic Web Services for Model Organism Databases

Extra Slides

Page 19: SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI for GMOD: Setting up the Services

1. Load your GFF files into a Bio::DB::SeqFeature::Store database (mysql)

2. Install SADI for GMOD dependencies with CPAN

3. Download the SADI for GMOD tarball and unpack into cgi-bin

4. Set DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf

   [GENERAL]
   db_adaptor = Bio::DB::SeqFeature::Store
   db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
   base_url = http://flybase.org/cgi-bin/sadi.gmod/

5. Configure Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf

   [DBXREF_TO_LSRN]
   SwissProt = UniProt
   UniProtKB = UniProt
   SwissProt/TrEMBL = UniProt
   ...

6. Register the services in public SADI registry: http://sadiframework.org/registry

more info: http://code.google.com/p/sadi/wiki/SADIforGMOD
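The Dbxref mapping in step 5 translates database names found in the GFF data into LSRN namespace names. The following Python sketch shows that lookup using only the three mappings visible on the slide. The services themselves are Perl and read dbxref.conf on their own, so the function names here are hypothetical, and passing unmapped names through unchanged is an assumption of this sketch.

```python
import configparser

# Only the three mappings shown on the slide; the real file may list more.
DBXREF_CONF = """\
[DBXREF_TO_LSRN]
SwissProt = UniProt
UniProtKB = UniProt
SwissProt/TrEMBL = UniProt
"""


def load_dbxref_mappings(text: str) -> dict:
    """Parse the [DBXREF_TO_LSRN] section into a plain dict."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep database names case-sensitive
    parser.read_string(text)
    return dict(parser["DBXREF_TO_LSRN"])


def to_lsrn(db_name: str, mappings: dict) -> str:
    """Map a Dbxref database name to its LSRN namespace.

    Unmapped names are returned unchanged (an assumption of this sketch).
    """
    return mappings.get(db_name, db_name)


mappings = load_dbxref_mappings(DBXREF_CONF)
print(to_lsrn("SwissProt/TrEMBL", mappings))
```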

