Date post: | 12-Jan-2015 |
Category: | Technology |
Upload: | benvvalk |
SADI for GMOD: Semantic Web Services for Model Organism Databases
Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson
James Hogg Research Centre, Heart + Lung Institute, University of British Columbia
http://code.google.com/p/sadi/wiki/SADIforGMOD
Background
Background: Model Organism Databases
• several organisms are studied extensively by biologists: e.g. yeast, mouse, fruit fly
• each model organism has its own database:
• sequences (DNA, RNA, protein)
• sequence features (e.g. genes)
• research publications
• experimental results
• biochemical pathways
• phenotype images
• evolutionary trees (for closely related species)
All images were obtained from Wikipedia and are in the public domain.
Background: Sequence Features
[Figure: genome browser view showing position on the DNA sequence, with separate promoter, gene, and transcript tracks]
Lincoln Stein, http://www.sequenceontology.org/gff3.shtml
• sequence features (a.k.a. sequence annotations) are regions of a DNA or protein sequence with a certain type (e.g. 'gene')
• in genome browsers, different types of sequence annotations are displayed in separate tracks
Background: Sequence Features
autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/
Many types of biological data are represented as sequence features:
promoters, chromosome bands, genes, transcripts, CDSs, proteins, protein domains, transposons, non-coding RNAs, ESTs, many more...
Background: Distributed Annotation System (DAS)
autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/
[Diagram: a genome browser sends an HTTP GET request to each of four DAS servers and receives DAS XML in response]
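The request/response pattern in this diagram can be sketched in a few lines. The snippet below is a minimal illustration that only builds the URL for a DAS 1.x `features` request covering one sequence segment. The server name `das.example.org` and the data source name `dmel` are hypothetical placeholders, not real endpoints.

```python
from urllib.parse import urlencode


def das_features_url(server, source, reference, start, stop):
    """Build a DAS 1.x 'features' request URL for one sequence segment.

    The segment is written as REFERENCE:START,STOP, e.g. 4:26994,32391.
    """
    # Keep ':' and ',' unescaped so the segment stays human-readable.
    query = urlencode({"segment": f"{reference}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


# Hypothetical server and data source, for illustration only.
url = das_features_url("http://das.example.org", "dmel", "4", 26994, 32391)
print(url)
```

A DAS client would issue an HTTP GET on a URL like this for every configured server and then merge the returned DAS XML documents itself.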
Background: Limitations of the Distributed Annotation System (DAS)
integrating data from DAS servers requires specialized software (“DAS clients”)
other types of data (e.g. biochemical pathways, experimental results) cannot be automatically integrated with sequence feature data
most bioinformatics analysis software (e.g. BLAST) does not speak DAS
SADI for GMOD: Semantic Web Services for Model Organism Databases
SADI (Semantic Automated Discovery and Integration)
• Standard for Web services that consume/generate RDF
• Motivation: automated integration of bioinformatics data and software
GMOD (Generic Model Organism Database)
• Toolkit for building a model organism database and website
• Collection of related open source projects: e.g. Chado, GBrowse, Pathway Tools
• Many sites use GMOD components: FlyBase, BeetleBase, dictyBase, etc.
SADI in a Nutshell
• to invoke a SADI service:
  o HTTP POST an RDF document to the service URL
  o e.g. $ curl --data @input.rdf http://sadiframework.org/examples/hello
• to get service metadata:
  o HTTP GET on service URL
  o returns an RDF document with service name, description, etc.
  o e.g. $ curl http://sadiframework.org/examples/hello
• structure of input/output data is described in OWL
  o service provider specifies one input OWL class and one output OWL class
• strengths of SADI
  o no framework-specific messaging formats or ontologies
  o supports batch processing of inputs
  o supports long-running services (asynchronous services)
more info: http://sadiframework.org/
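The two curl commands above map directly onto plain HTTP calls. The following is a minimal sketch using only the Python standard library. The helper names (`build_metadata_request`, `build_invoke_request`, `send`) are hypothetical, and the `application/rdf+xml` content type is an assumption about what the service accepts. Asynchronous (long-running) services need additional polling that is not shown here.

```python
import urllib.request

# Example service URL taken from the slide.
SERVICE_URL = "http://sadiframework.org/examples/hello"


def build_metadata_request(service_url):
    """HTTP GET on the service URL: asks for the service description as RDF."""
    return urllib.request.Request(
        service_url,
        headers={"Accept": "application/rdf+xml"},  # assumed media type
        method="GET",
    )


def build_invoke_request(service_url, rdf_bytes, content_type="application/rdf+xml"):
    """HTTP POST of an RDF document to the service URL: invokes the service."""
    return urllib.request.Request(
        service_url,
        data=rdf_bytes,
        headers={"Content-Type": content_type, "Accept": content_type},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return (status code, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With an input file on disk, `send(build_invoke_request(SERVICE_URL, open("input.rdf", "rb").read()))` would correspond to the first curl command, and `send(build_metadata_request(SERVICE_URL))` to the second.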
SADI for GMOD Services
• SADI services for accessing sequence feature data
• implemented as Perl CGI scripts
Service Name                    | Input               | Relationship              | Output
get_feature_info                | database identifier | is about                  | feature description
get_features_overlapping_region | genomic coordinates | overlaps                  | collection of feature descriptions
get_sequence_for_region         | genomic coordinates | is represented by         | DNA, RNA, or amino acid sequence
get_child_features              | feature description | has part / derives into   | collection of feature descriptions
get_parent_features             | feature description | is part of / derives from | collection of feature descriptions
SADI for GMOD: Structure of Service Input/Output RDF
Example: get_feature_info (input RDF is sent by HTTP POST; the service returns output RDF)
# note: the sio:, SO:, range: and strand: prefix declarations are omitted in both documents

Input RDF (N3):

@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .

GeneID:49962 a lsrn:GeneID_Record;
    sio:SIO_000008 [                       # p = 'has attribute'
        a lsrn:GeneID_Identifier;
        sio:SIO_000300 "49962"             # p = 'has value'
    ] .

Output RDF (N3):

@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .
@prefix FlyBase: <http://flybase.org/cgi-bin/sadi.gmod/feature?id=> .
@prefix GenBank: <http://lsrn.org/GB:> .

# p = 'is about'
GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

# feature
FlyBase:FBgn0040037 a SO:SO_0000704;       # o = 'gene'
    range:position [
        a range:RangedSequencePosition;
        sio:SIO_000053                     # p = 'has proper part'
            [ a range:StartPosition; sio:SIO_000300 26994 ];
        sio:SIO_000053                     # p = 'has proper part'
            [ a range:EndPosition; sio:SIO_000300 32391 ];
        range:in_relation_to _:minus_strand_seq
    ] .

_:minus_strand_seq sio:SIO_000011 [        # p = 'represents'
        a strand:MinusStrand;
        sio:SIO_000093 GenBank:AE014135    # p = 'is proper part of'
    ] .

# reference feature (chromosome)
FlyBase:4                                  # chromosome 4
    a SO:SO_0000105 .                      # o = 'chromosome arm'
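As a rough illustration of how a client might prepare the input RDF for get_feature_info, the sketch below fills a GeneID into an N3 template that mirrors the input document shown on this slide. The function name `geneid_input_n3` is hypothetical. The `sio:` prefix declaration is not shown on this slide and is taken from the SPARQL example later in the deck, and the digits-only check is an assumption about GeneID identifiers.

```python
# N3 template mirroring the slide's input document for get_feature_info.
INPUT_TEMPLATE = """\
@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .
@prefix sio: <http://semanticscience.org/resource/> .

GeneID:{gene_id} a lsrn:GeneID_Record;
    sio:SIO_000008 [                  # 'has attribute'
        a lsrn:GeneID_Identifier;
        sio:SIO_000300 "{gene_id}"    # 'has value'
    ] .
"""


def geneid_input_n3(gene_id):
    """Return an N3 document describing one GeneID record."""
    gene_id = str(gene_id)
    # Assumption: GeneID identifiers consist of digits only.
    if not gene_id.isdigit():
        raise ValueError(f"not a numeric GeneID: {gene_id!r}")
    return INPUT_TEMPLATE.format(gene_id=gene_id)


print(geneid_input_n3(49962))
```

The resulting text is what would be sent in the body of the HTTP POST to the service.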
SADI for GMOD Demo
SADI Client Software
SHARE Query Engine
• SPARQL Query => SADI Workflow
• http://biordf.net/cardioSHARE/query
SADI Taverna Plugin
• Design SADI Workflows
• http://sadiframework.org/content/2010/05/03/sadi-taverna-plugin-tutorial/
Demo with SHARE Query Engine
SPARQL Query => SADI Workflow
"What proteins are homologous to FlyBase protein FBpp0091047?"
PREFIX FlyBase: <http://lsrn.org/FLYBASE:>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX sadi: <http://sadiframework.org/ontologies/properties.owl#>

SELECT ?homolog
WHERE {
    # SIO_000332 = 'is about'
    FlyBase:FBpp0091047 sio:SIO_000332 ?protein .
    ?protein sadi:hasSequence ?sequence .
    # SIO_010302 = 'is homologous to'
    ?protein sio:SIO_010302 ?homolog .
}
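The same question can be asked for other proteins by swapping the identifier in the query. The sketch below generates the query text for a given FlyBase protein ID. The function name `homolog_query` is hypothetical, and the ID pattern (`FBpp` followed by seven digits) is an assumption based on the example ID above. Submitting the query to the SHARE query engine is not shown.

```python
import re

# SPARQL template mirroring the demo query; doubled braces are literal braces.
QUERY_TEMPLATE = """\
PREFIX FlyBase: <http://lsrn.org/FLYBASE:>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX sadi: <http://sadiframework.org/ontologies/properties.owl#>

SELECT ?homolog
WHERE {{
    # SIO_000332 = 'is about'
    FlyBase:{protein_id} sio:SIO_000332 ?protein .
    ?protein sadi:hasSequence ?sequence .
    # SIO_010302 = 'is homologous to'
    ?protein sio:SIO_010302 ?homolog .
}}
"""


def homolog_query(protein_id):
    """Return the demo SPARQL query for one FlyBase protein ID."""
    # Assumption: protein IDs look like FBpp0091047 (FBpp + 7 digits).
    if not re.fullmatch(r"FBpp\d{7}", protein_id):
        raise ValueError(f"unexpected FlyBase protein ID: {protein_id!r}")
    return QUERY_TEMPLATE.format(protein_id=protein_id)


print(homolog_query("FBpp0091047"))
```

Validating the ID before formatting keeps arbitrary text from being spliced into the query.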
Acknowledgements
Team
Mark Wilkinson: Principal Investigator
Luke McCarthy: Lead Programmer, SADI & SHARE
Edward Kawas: Perl Programmer, SADI
Funding
Microsoft Research
http://sadiframework.org/
SADI Training Course
“Web Publishing of Scientific Data and Services”
October 22nd-23rd, 2011
University of British Columbia (next door!)
Learn how to:
=> semantically describe service functionality in OWL
=> publish Semantic Web services using the SADI framework
More info: http://sadiframework.org/training
Extra Slides
SADI for GMOD: Setting up the Services
1. Load your GFF files into a Bio::DB::SeqFeature::Store database (mysql)
2. Install SADI for GMOD dependencies with CPAN
3. Download the SADI for GMOD tarball and unpack into cgi-bin
4. Set DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf
   [GENERAL]
   db_adaptor = Bio::DB::SeqFeature::Store
   db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
   base_url = http://flybase.org/cgi-bin/sadi.gmod/
5. Configure Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf
   [DBXREF_TO_LSRN]
   SwissProt = UniProt
   UniProtKB = UniProt
   SwissProt/TrEMBL = UniProt
   ...
6. Register the services in public SADI registry: http://sadiframework.org/registry
more info: http://code.google.com/p/sadi/wiki/SADIforGMOD
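The two configuration fragments on this slide are INI-style, so their content can be sanity-checked before deployment. The sketch below reads both with Python's standard `configparser`. It only illustrates the file layout: the helper name `parse_conf` is hypothetical, and the SADI for GMOD Perl scripts may parse these files differently.

```python
import configparser

# Fragments copied from the slide (the elided "..." entries are left out).
SADI_GMOD_CONF = """\
[GENERAL]
db_adaptor = Bio::DB::SeqFeature::Store
db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
base_url = http://flybase.org/cgi-bin/sadi.gmod/
"""

DBXREF_CONF = """\
[DBXREF_TO_LSRN]
SwissProt = UniProt
UniProtKB = UniProt
SwissProt/TrEMBL = UniProt
"""


def parse_conf(text):
    """Parse INI-style text into {section: {key: value}}."""
    # Only '=' separates keys from values, because values contain ':'.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive (e.g. SwissProt)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


general = parse_conf(SADI_GMOD_CONF)["GENERAL"]
dbxref = parse_conf(DBXREF_CONF)["DBXREF_TO_LSRN"]
print(general["base_url"])
print(sorted(dbxref))
```

Here each Dbxref database name on the left is mapped to the LSRN namespace on the right, so several GFF database names can resolve to the same namespace.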