1
caBIO ECCF Pilot
Konrad Rokicki
ICR Workspace Call
July 28, 2010
2
What is caBIO?
• Repository of molecular annotations loaded with data from many different sources
• Currently exposes data using many APIs and interfaces:
  • SDK-generated: Java API, REST API, SOAP API
  • Grid Data Service
  • Python API
  • Portlet
• History as a pilot project:
  • First caCORE generated system
  • First silver-level grid service
  • First caGrid portlet
  • First CBIIT iPhone app
3
Goals for the Pilot Project
• Leverage caBIO as a reference implementation of the NCI CBIIT ECCF
• Develop a set of ECCF-based Molecular Annotation Service specifications
• Implement and deploy a service based on service specifications
• Provide guidelines to assist other NCI CBIIT products in leveraging ECCF processes and developing ECCF artifacts
• Provide input on the ECCF Implementation Guide
• Develop guidelines that are pragmatic and useful
• Identify list of tools and infrastructure that will assist in the development of services and specifications
4
Team
• caBIO Team
  • Juli Klemm
  • Sharon Gaheen
  • Jim Sun
  • Liqun Qi
  • Konrad Rokicki
• ECCF Mentoring
  • Baris Suzek
  • Brian Davis
  • Raghu Chintalapati
5
ECCF Artifact Matrix
Enterprise/Business Viewpoint
Information Viewpoint
Computational Viewpoint
Engineering Viewpoint
Computation Independent Model (CIM)
Platform Independent Model (PIM)
Platform Specific Model (PSM)
6
RM-ODP Viewpoints
• Enterprise/Business Viewpoint
  • Purpose / Scope
  • Business cases / Storyboards
  • Industry standards
• Information Viewpoint
  • Information Models (DAM, PIM, PSM)
  • Semantic Profiles
• Computational Viewpoint
  • Capabilities / Operations
  • Functional Profiles
• Engineering Viewpoint
  • Non-functional Requirements
  • Deployment model
7
Levels of Abstraction
• Computation Independent Model (CIM)
  • Service Scope and Description Document
  • CIM Service Specification Document (CIMSS)
• Platform Independent Model (PIM)
  • PIM Service Specification Document (PIMSS)
• Platform Specific Model (PSM)
  • PSM Service Specification Document (PSMSS)
  • Service Integration Guide
• Implementation
  • Deployable System
8
Enterprise Service Specification Process
9
Project Plan
10
Scope and Service Description
• “The Molecular Annotation Service provides a set of interfaces for the annotation of experimental or other types of data with molecular information.”
• “The purpose of Molecular Annotations service is to provide specifications for a set of molecular annotations that may be integrated with user-facing applications.”
• “The development of a common, reusable set of interfaces provided by this service will facilitate standardization, integration, and interoperability between various systems that provide and consume molecular annotations.”
11
Mapping to LSBAM
LS BAM Use Case: Service Mapping Description
• Characterize/Organize the Data: The molecular annotations service supports the Characterize/Organize the Data use cases by providing annotations for molecular entities associated with data. For example, in characterizing experimental data, a researcher may look up reference annotations with the service to find which genes are mapped to the microarray used in the experiment.
• Integrate Data Sets: The molecular annotations service supports the Integrate Data Sets use case as it will provide the capability of retrieving annotations from the service to use as join points, or to display as an additional reference.
• Annotate Findings/Results: The molecular annotations service supports the Annotate Findings/Results use case as the service provides direct support for obtaining information associated with molecular entities to assist in annotating findings/results.
• Identify and Review Knowledge Bases and/or Databases: The molecular annotations service supports the Identify and Review Knowledge Bases and/or Databases use case as the service provides support for knowledge discovery via the integration of annotations across disparate data sources.
12
Business Storyboards
Outline: Bioinformatics developer wants to retrieve all diseases and agents associated with a target gene
Detail: John Smith is developing a web site that allows researchers to find all of the diseases associated with a specific gene. The site will also allow researchers to select a gene and obtain a list of agents (drugs) used to target that gene. By querying the molecular annotations service, John’s web application can retrieve a list of diseases and agents associated with a gene.
13
Scope
Item: Provide the ability to retrieve molecular annotations
  Scope / Out of Scope: Scope
  Source: Molecular Annotation Service Scope and Description
Item: Provide the ability to retrieve functional associations, cellular locations, and biological processes associated with a gene
  Scope / Out of Scope: Scope
  Source: Molecular Annotation Service Scope and Description
Item: Provide the ability to retrieve diseases and agents associated with a gene
  Scope / Out of Scope: Scope
  Source: Molecular Annotation Service Scope and Description
Item: Provide the ability to retrieve variations associated with a gene
  Scope / Out of Scope: Scope
  Source: Molecular Annotation Service Scope and Description
…
14
Semantic Profiles
Semantic Profile No.: MA-SP1
Semantic Profile Name: Molecular Annotation Domain Analysis Model
Constrained Information Model: LSDAM v1.1
Semantic Profile Description: The molecular annotation service will use semantics from the Life Science DAM. The following classes are included in the project-specific DAM (grouped by sub-domain):
• Gene
• NucleicAcidSequenceFeature
• MolecularSequenceAnnotation
• GeneticVariation
• SingleNucleotidePolymorphism
• NucleicAcidPhysicalLocation
• …
15
Project Analysis Model
class Molecular Annotation (class diagram)
Classes and attributes:
domain::MolecularSequenceAnnotation
  - date: TS
domain::Gene
  - symbol: ST
domain::GeneIdentifier
  - databaseName: CD
  - identifier: II
domain::NucleicAcidSequenceFeature
  - orientation: ST
domain::AdditionalOrganismName
  - comment: ST
  - source: CD
  - value: ST
domain::Organism
  - commonName: ST
  - ncbiTaxonomyId: CD
  - scientificName: CD
  - taxonomyRank: CD
domain::MolecularSequence
  - value: SC
domain::NucleicAcidPhysicalLocation
  - endCoordinate: INT
  - startCoordinate: INT
domain::NucleicAcidSequence
domain::GeneticVariation
domain::SingleNucleotidePolymorphism
domain::SingleNucleotidePolymorphismIdentifier
  - databaseName: CD
  - identifier: II
BRIDG 2.1 - ISO::TherapeuticAgent
  + identifier: II
  + statusCode: CD
  + statusDateRange: IVL<TS>
BRIDG 2.1 - ISO::Material
  - actualIndicator: BL
  + description: ST
  + formCode: CD
  + identifier: DSET<II>
  + name: DSET<EN.TN>
  + statusCode: CD
  + statusDateRange: IVL<TS>
BRIDG 2.1 - ISO::Product
  + classCode: DSET<CD>
  + expirationDate: TS
  + pre1938Indicator: BL
  + typeCode: CD
Association labels shown in the diagram:
• identifies / is identified by
• is included in / includes
• is designated by / designates
• reports / is reported by
• plays / is played by (+product 1, +therapeuticAgent 0..1)
• has component / used as component (+product 0..1, +productCollection 0..*)
16
Capabilities
Name: Description
• Get Gene By Symbol or Alias: Returns the gene named by the specified gene symbol or gene alias
• Get Gene By Microarray Reporter: Returns the gene associated with the specified microarray reporter
• Get Functional Associations: Returns annotations describing a gene's molecular function
• Get Cellular Locations: Returns annotations describing a gene's location within a cell
• Get Biological Processes: Returns annotations describing a gene's role in biological processes
• Get Disease Associations: Returns findings about a gene's role in diseases
• Get Agent Associations: Returns findings about agents which target a given gene
• Get Structural Variations: Returns variations which are located on a given gene
• Get Homologous Gene: Returns a gene’s homologous gene in a specified organism
17
Capability Details
Name [M]: Get Gene By Symbol or Alias
Description [M]: Returns the gene named by the specified gene symbol or gene alias and the gene’s organism
Pre-Conditions [M]: None
Security Pre-Conditions [M]: None
Inputs [M]: Gene Symbol or Alias; Organism Identifier
Outputs [M]: A collection of Gene objects
Post-Conditions [O]: None
Exception Conditions [M]: No matching genes found
Aspects left for Technical Bindings [O]: Format and data type for the Organism Identifier
Notes [O]: NA
18
Functional Profiles
Functional Profile No.: MA-FP1
Functional Profile Name: Gene Annotation Query Profile
Functional Profile Description: Contains all the capabilities for retrieving gene annotations
Capability Names:
• Get Gene By Symbol or Alias
• Get Gene By Microarray Reporter
• Get Functional Associations
• Get Cellular Locations
• Get Biological Processes
• Get Disease Associations
• Get Agent Associations
• Get Structural Variations
• Get Homologous Gene
19
Conformance Profiles
Conformance No: MA-CP1
Conformance Name: LSDAM-based Gene Annotation Conformance Profile
Description: This conformance profile defines the functionality for the Gene Annotation Service using LSDAM semantics
Usage Context: This profile would be used by a researcher wishing to access gene annotations
Mandatory: No
Functional Profile(s): MA-FP1 : Gene Annotation Query Profile
Semantic Profile(s): MA-SP1 : LSDAM v1.1
20
Activity Diagrams
act Query Service (activity diagram)
Swimlanes: Service User; Data Query Service; Data Source
Nodes: Start; Send Query; Compose Query; Identify Operation; Set Parameters; Receive Query; Execute Query; Return Results; Receive Results; Process Results; End
21
Conformance Statements
• Query Performance
  Type: Obligation
  Viewpoint: Engineering
  Description: The MA service should provide a response within 0.5 seconds to support a synchronous UI based client
  Test method: Test cases to include performance testing.
• Additional Functionality
  Type: Permission
  Viewpoint: Computational
  Description: The MA service can provide additional functionality other than specified in these specifications
  Test method: Design Review
• Semantic Model
  Type: Obligation
  Viewpoint: Informational
  Description: The MA service must provide traceability to classes in the LSDAM where applicable.
  Test method: Design Review
• Data Types
  Type: Obligation
  Viewpoint: Informational
  Description: The MA service must conform to NCI’s constrained list of ISO 21090 data types.
  Test method: Design Review
• Functional Profiles
  Type: Obligation
  Viewpoint: Computational
  Description: Functional Profiles shall be deployed as functional wholes. Ignoring or omitting functional behavior defined within a functional profile is not permitted, nor is diverging from the detailed functional specifications provided in Section 4.
  Test method: 1. Design Review 2. Test cases
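The Query Performance statement above (a response within 0.5 seconds) lends itself to an automated check. The following is a minimal sketch of such a test helper, not the pilot's actual test suite: the function names and the stand-in service call are hypothetical, and a real test would call the deployed MA service instead.

```python
import time

# Budget taken from the Query Performance conformance statement.
RESPONSE_BUDGET_SECONDS = 0.5


def timed_call(operation, *args):
    """Run operation(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = operation(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def within_budget(operation, *args, budget=RESPONSE_BUDGET_SECONDS):
    """Return True if a single call completes within the budget."""
    _, elapsed = timed_call(operation, *args)
    return elapsed <= budget


def fake_get_gene_by_symbol(symbol):
    # Hypothetical stand-in for a real service call.
    return [{"symbol": symbol.upper()}]


print(within_budget(fake_get_gene_by_symbol, "brca2"))
```

A single timing is noisy, so a real performance test would likely repeat the call and look at a percentile rather than one sample.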
22
ECCF Artifact Matrix
Enterprise/Business Viewpoint
Information Viewpoint
Computational Viewpoint
Engineering Viewpoint
Computation Independent Model (CIM)
Platform Independent Model (PIM)
Platform Specific Model (PSM)
23
Relation to CIM
Conceptual Functional Service Specification Name: Molecular Annotation Computation Independent Service Specification
Conceptual Functional Service Specification Version: 0.0.4
Description & Link to the Conceptual Functional Service Specification: https://gforge.nci.nih.gov/svnroot/cabiodb/ECCF/artifacts/conceptual/CIMSS_Molecular_Annotation_Service.doc
Deviation from the Conceptual Functional Service Specification / Reason for Deviation: None
24
Relationship to Standards
Standards: Description
• LSBAM v1.0: Service conforms to NCI’s Life Science Business Architecture Model
• LSDAM v1.1: Service conforms to the Life Sciences DAM version 1.1
• LSPIM v0.1: Service conforms to the Life Sciences PIM version 0.1
• ISO 21090: Service conforms to NCI’s version of ISO 21090 data types
• HUGO Gene Symbols: Service leverages gene symbols from the Human Genome Organization
• MGI Gene Symbols: Service leverages gene symbols from the International Committee on Standardized Genetic Nomenclature for Mice
25
What is the LSPIM?
• Life Science Platform Independent Model
• Based on the LSDAM v1.1
• A PIM is derived from a DAM by following some guidelines:
• Constrain to relevant classes and attributes
• Localize by adding attributes as needed
• Semantics are made explicit
• Enumeration of value domains
• Resolution of type codes into class hierarchies
• Resolution of all many to many relationships
• All required compliance with other models needs to be expressed at PIM level
• The LSPIM is currently being developed by the Information Representation Working Group (IRWG)
class Gene and related classes (class diagram; five «trace» dependencies are drawn)
Classes and attributes:
LSPIM_1_0::Gene
  - symbol: ST
domain::Gene
  - fullName: ST
  - symbol: ST
domain::GeneIdentifier (appears twice in the diagram)
  - databaseName: CD
  - identifier: II
LSPIM_1_0::NucleicAcidSequenceFeature
  - orientation: ST
domain::NucleicAcidSequenceFeature
  - gridIdentifier: TEL.URL
  - orientation: ST
LSPIM_1_0::NucleicAcidPhysicalLocation
  - endCoordinate: INT
  - startCoordinate: INT
domain::NucleicAcidPhysicalLocation
  - assembly: ST
  - endCoordinate: INT
  - featureType: CD
  - startCoordinate: INT
LSPIM_1_0::Organism
  - commonName: ST
  - ncbiTaxonomyId: CD
  - scientificName: CD
  - taxonomyRank: CD
domain::Organism
  - commonName: ST
  - gridIdentifier: TEL.URL
  - ncbiTaxonomyId: II
  - scientificName: ST
  - taxonomyRank: CD
Association labels shown in the diagram:
• is from / has
• is included in / includes
• identifies / is identified by
26
Platform Independent Model
• PIM is based on the LSPIM but it may be constrained and localized:
  • Add any attributes that are needed
  • Remove attributes which are unnecessary
  • Add associations
  • Add new classes
(Diagram labels: LSPIM, MAPIM)
27
NucleicAcidPhysicalLocation
Attribute Name (Type; Trace): Description
• startCoordinate (INT; trace: LSPIM): The start coordinate of the range (inclusive), given as an integer offset from the start of the sequence.
• endCoordinate (INT; trace: LSPIM): The end coordinate of the range (inclusive), given as an integer offset from the start of the sequence.
• featureType (CD; trace: New): The type of gene feature located, e.g. GENE, CDS, UTR, RNA, PSEUDO.
• assembly (ST; trace: New): The genome assembly which this location is defined in reference to.
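To make the inclusive coordinate semantics concrete, here is a small sketch of the four attributes as a Python data class. It is illustrative only: the ISO 21090 types (INT, CD, ST) are simplified to plain Python types, the check that the end does not precede the start is an assumption rather than something the specification states, and `length` and `overlaps` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NucleicAcidPhysicalLocation:
    start_coordinate: int                # startCoordinate (inclusive)
    end_coordinate: int                  # endCoordinate (inclusive)
    feature_type: Optional[str] = None   # featureType, e.g. "GENE", "CDS"
    assembly: Optional[str] = None       # assembly the location refers to

    def __post_init__(self):
        # Assumed sanity rule, not taken from the specification.
        if self.end_coordinate < self.start_coordinate:
            raise ValueError("endCoordinate must not precede startCoordinate")

    def length(self) -> int:
        # Both ends are inclusive, so the span is end - start + 1.
        return self.end_coordinate - self.start_coordinate + 1

    def overlaps(self, other: "NucleicAcidPhysicalLocation") -> bool:
        # Locations on different assemblies are not comparable.
        if self.assembly != other.assembly:
            return False
        return (self.start_coordinate <= other.end_coordinate
                and other.start_coordinate <= self.end_coordinate)


# Made-up coordinates and assembly name, for illustration only.
gene_location = NucleicAcidPhysicalLocation(100, 199, "GENE", "example-assembly")
print(gene_location.length())
```

Because both coordinates are inclusive, two locations that share only a single position still count as overlapping.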
28
Traceability for Information Models
29
Operations
Operation No.: Operation Name (Interface Name). Operation Description
• MA-INF1-OP1: getGeneBySymbol (MAGeneAnnotationQuery). Returns the gene named by the specified gene symbol or gene alias
• MA-INF1-OP2: getGeneByMicroarrayReporter (MAGeneAnnotationQuery). Returns the gene associated with the specified microarray reporter
• MA-INF1-OP3: getFunctionalAssociations (MAGeneAnnotationQuery). Returns annotations describing a gene’s molecular function
• MA-INF1-OP4: getCellularLocations (MAGeneAnnotationQuery). Returns annotations describing a gene’s location within a cell
• …
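Because functional profiles are meant to be deployed as functional wholes, a simple automated check can flag an implementation that omits an operation. The sketch below covers only the four operations named on this slide (the remaining, elided operations are deliberately not filled in); `missing_operations` and `PartialService` are hypothetical names used for illustration.

```python
# Operations of the MAGeneAnnotationQuery interface that are named on the
# slide. The elided remainder of the list is intentionally left out.
MA_GENE_ANNOTATION_QUERY_OPERATIONS = (
    "getGeneBySymbol",
    "getGeneByMicroarrayReporter",
    "getFunctionalAssociations",
    "getCellularLocations",
)


def missing_operations(service, required=MA_GENE_ANNOTATION_QUERY_OPERATIONS):
    """Return the required operation names that the service does not implement."""
    return [
        name for name in required
        if not callable(getattr(service, name, None))
    ]


class PartialService:
    """Hypothetical implementation that omits two of the operations."""

    def getGeneBySymbol(self, criteria):
        return []

    def getFunctionalAssociations(self, gene):
        return []


print(missing_operations(PartialService()))
```

A check like this only confirms that the operations exist; conformance of their behavior still needs design review and test cases.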
30
Operation Behavior Description
getGeneBySymbol
Returns the gene named by the specified gene symbol or gene alias and the gene’s organism.
Behavior Description:
• Client supplies a GeneSearchCriteria instance with a gene symbol or alias and an Organism to search within
• The case of the symbol or alias is ignored
• If the Organism is null then all Organisms are searched
• The system returns the matching Gene object(s), if any
Pre-Conditions: None
Security Pre-Conditions: None
Inputs: GeneSearchCriteria
Outputs: Return: Fully-populated instance(s) of the Gene class
Post-Conditions: None
Exception Conditions: No matching genes found
Additional Details: None
Notes: None
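The behavior described for getGeneBySymbol can be sketched against an in-memory list of genes. This is an illustration under stated assumptions, not the pilot's implementation: the Organism is simplified to a plain string, `NoMatchingGenesFound` is a hypothetical exception name, raising it (rather than returning an empty collection) is one possible binding of the "No matching genes found" condition, and the sample records are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    symbol: str
    organism: str          # simplification: the model uses an Organism class
    aliases: tuple = ()


@dataclass(frozen=True)
class GeneSearchCriteria:
    symbol_or_alias: str
    organism: Optional[str] = None   # None means "search all organisms"


class NoMatchingGenesFound(Exception):
    """Hypothetical exception for the 'No matching genes found' condition."""


def get_gene_by_symbol(criteria, genes):
    """Return all genes whose symbol or alias matches, ignoring case."""
    wanted = criteria.symbol_or_alias.lower()
    matches = [
        gene for gene in genes
        if (criteria.organism is None or gene.organism == criteria.organism)
        and (gene.symbol.lower() == wanted
             or wanted in (alias.lower() for alias in gene.aliases))
    ]
    if not matches:
        raise NoMatchingGenesFound(criteria.symbol_or_alias)
    return matches


# Illustrative sample data only.
SAMPLE_GENES = [
    Gene("BRCA2", "Homo sapiens", ("FANCD1",)),
    Gene("Brca2", "Mus musculus"),
]

for gene in get_gene_by_symbol(GeneSearchCriteria("brca2"), SAMPLE_GENES):
    print(gene.symbol, gene.organism)
```

Leaving the organism unset returns matches from every organism, which is why the output is a collection rather than a single Gene.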
31
Search Criteria (Inputs)
class Utility (class diagram)
Classes and attributes:
util::GeneSearchCriteria
  - symbolOrAlias: ST
domain::Organism
  - bigid: TEL.URL
  - commonName: ST
  - ncbiTaxonomyId: II
  - scientificName: ST
  - taxonomyRank: CD
util::ReporterSearchCriteria
  - reporterName: ST
domain::ArrayDesign
  - description: ST
  - LSID: II
  - manufacturer: ST
  - name: ST
Multiplicities shown: 0..1, 0..1
32
Dynamic Model
sd Dynamic Model (sequence diagram)
Participants: End User; «service» Interfaces::MAService; Data Store
Messages:
• getGeneBySymbol("BRCA2") :Gene
• executeQuery()
• Gene("BRCA2")
• getDiseaseAssociations("BRCA2") :DiseaseAssociation
• executeQuery()
• Collection(DiseaseAssociation)
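The interaction in this sequence diagram (the user calls the service, and the service delegates each call to the data store via executeQuery) can be sketched as follows. The class layout, the `execute_query` signature, and the sample records are assumptions made for illustration; only the operation names come from the diagram.

```python
class InMemoryDataStore:
    """Hypothetical data store that records each query it executes."""

    def __init__(self, genes, disease_associations):
        self._genes = genes
        self._disease_associations = disease_associations
        self.query_log = []

    def execute_query(self, kind, symbol):
        self.query_log.append((kind, symbol))
        if kind == "gene":
            return self._genes.get(symbol)
        return list(self._disease_associations.get(symbol, []))


class MAService:
    """Sketch of the service: every operation delegates to the data store."""

    def __init__(self, data_store):
        self._data_store = data_store

    def get_gene_by_symbol(self, symbol):
        return self._data_store.execute_query("gene", symbol)

    def get_disease_associations(self, symbol):
        return self._data_store.execute_query("disease", symbol)


# Placeholder records, not real annotation data.
store = InMemoryDataStore(
    genes={"BRCA2": {"symbol": "BRCA2"}},
    disease_associations={"BRCA2": ["example disease finding"]},
)
service = MAService(store)

# Same order as the diagram: look up the gene, then its disease associations.
gene = service.get_gene_by_symbol("BRCA2")
associations = service.get_disease_associations(gene["symbol"])
print(gene, associations)
```

Keeping the data store behind the service is what lets the service operations stay stable while the storage or query mechanism changes.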
Enterprise/Business Viewpoint
Information Viewpoint
Computational Viewpoint
Engineering Viewpoint
Computation Independent Model (CIM)
Platform Independent Model (PIM)
Platform Specific Model (PSM)
34
ECCF Artifact Matrix
35
Relation to PIM
Platform Independent Model Name and Service Specification: Molecular Annotation Service Platform Independent Model and Service Specification
Platform Independent Model and Service Specification Version: 0.1.0
Description & Link to the Platform Independent Model and Service Specification: http://gforge.nci.nih.gov/svnroot/cabiodb/ECCF/artifacts/logical/PIMSS_Molecular_Annotation_Service.doc
36
PSM Information Model
37
Service Interface
Implemented Interface No.: Supported Interface Name. Interface Description (Link)
• MA-INF1: MAGeneAnnotationQuery. Includes all operations for retrieving gene annotations. (Link: N/A)
• DS-INF1: Data Service Query. Contains the CQL query operation. (Link: https://ncisvn.nci.nih.gov/svn/cagrid/branches/caGrid-1_3_release/cagrid-1-0/caGrid/projects/data/schema/Data/DataService.wsdl)
38
Implementation
1. Leverage new releases of:
  • ISO 21090 Common Library
  • caCORE SDK
  • caGrid / Introduce
2. Create new MA database and map to MA PSM
3. Populate MA database with data from caBIO database
4. Generate caCORE-like system from the MA PSM
5. Generate data service with Introduce
6. Add custom methods to implement service operations
39
Deployment Plan
40
Resources
• caBIO
https://wiki.nci.nih.gov/display/caBIO/caBIO+Wiki+Home+Page
• caBIO ECCF Pilot Project
https://wiki.nci.nih.gov/display/caBIO/caBIO+ECCF
• SAIF Book
http://gforge.hl7.org/gf/download/docmanfileversion/5503/6972/saeaf_02_19_10.pdf
41
Questions?