
1 caBIO ECCF Pilot Konrad Rokicki ICR Workspace Call July 28, 2010.


caBIO ECCF Pilot

Konrad Rokicki

ICR Workspace Call

July 28, 2010


What is caBIO?

• Repository of molecular annotations loaded with data from many different sources

• Currently exposes data using many APIs and interfaces:
  • SDK-generated: Java API, REST API, SOAP API
  • Grid Data Service
  • Python API
  • Portlet

• History as a pilot project:
  • First caCORE generated system
  • First silver-level grid service
  • First caGrid portlet
  • First CBIIT iPhone app


Goals for the Pilot Project

• Leverage caBIO as a reference implementation of the NCI CBIIT ECCF

• Develop a set of ECCF-based Molecular Annotation Service specifications

• Implement and deploy a service based on the service specifications

• Provide guidelines to assist other NCI CBIIT products in leveraging ECCF processes and developing ECCF artifacts

• Provide input on the ECCF Implementation Guide

• Develop guidelines that are pragmatic and useful

• Identify a list of tools and infrastructure that will assist in the development of services and specifications


Team

• caBIO Team
  • Juli Klemm
  • Sharon Gaheen
  • Jim Sun
  • Liqun Qi
  • Konrad Rokicki

• ECCF Mentoring
  • Baris Suzek
  • Brian Davis
  • Raghu Chintalapati


ECCF Artifact Matrix

Viewpoints (columns): Enterprise/Business Viewpoint | Information Viewpoint | Computational Viewpoint | Engineering Viewpoint

Levels of abstraction (rows):
• Computation Independent Model (CIM)
• Platform Independent Model (PIM)
• Platform Specific Model (PSM)


RM-ODP Viewpoints

• Enterprise/Business Viewpoint
  • Purpose / Scope
  • Business cases / Storyboards
  • Industry standards

• Information Viewpoint
  • Information Models (DAM, PIM, PSM)
  • Semantic Profiles

• Computational Viewpoint
  • Capabilities / Operations
  • Functional Profiles

• Engineering Viewpoint
  • Non-functional Requirements
  • Deployment model


Levels of Abstraction

• Computation Independent Model (CIM)
  • Service Scope and Description Document
  • CIM Service Specification Document (CIMSS)

• Platform Independent Model (PIM)
  • PIM Service Specification Document (PIMSS)

• Platform Specific Model (PSM)
  • PSM Service Specification Document (PSMSS)
  • Service Integration Guide

• Implementation
  • Deployable System


Enterprise Service Specification Process


Project Plan


Scope and Service Description

• “The Molecular Annotation Service provides a set of interfaces for the annotation of experimental or other types of data with molecular information.”

• “The purpose of Molecular Annotations service is to provide specifications for a set of molecular annotations that may be integrated with user-facing applications.”

• “The development of a common, reusable set of interfaces provided by this service will facilitate standardization, integration, and interoperability between various systems that provide and consume molecular annotations.”


Mapping to LSBAM

LS BAM Use Case | Service Mapping Description

Characterize/Organize the Data | The molecular annotations service supports the Characterize/Organize the Data use cases by providing annotations for molecular entities associated with data. For example, in characterizing experimental data, a researcher may look up reference annotations with the service to find which genes are mapped to the microarray used in the experiment.

Integrate Data Sets | The molecular annotations service supports the Integrate Data Sets use case as it will provide the capability of retrieving annotations from the service to use as join points, or to display as an additional reference.

Annotate Findings/Results | The molecular annotations service supports the Annotate Findings/Results use case as the service provides direct support for obtaining information associated with molecular entities to assist in annotating findings/results.

Identify and Review Knowledge Bases and/or Databases | The molecular annotations service supports the Identify and Review Knowledge Bases and/or Databases use case as the service provides support for knowledge discovery via the integration of annotations across disparate data sources.


Business Storyboards

Outline: Bioinformatics developer wants to retrieve all diseases and agents associated with a target gene

Detail: John Smith is developing a web site that allows researchers to find all of the diseases associated with a specific gene. The site will also allow researchers to select a gene and obtain a list of agents (drugs) used to target that gene. By querying the molecular annotations service, John’s web application can retrieve a list of diseases and agents associated with a gene.


Scope

Items | Scope / Out of Scope | Source

Provide the ability to retrieve molecular annotations | Scope | Molecular Annotation Service Scope and Description

Provide the ability to retrieve functional associations, cellular locations, and biological processes associated with a gene | Scope | Molecular Annotation Service Scope and Description

Provide the ability to retrieve diseases and agents associated with a gene | Scope | Molecular Annotation Service Scope and Description

Provide the ability to retrieve variations associated with a gene | Scope | Molecular Annotation Service Scope and Description

… | … | …


Semantic Profiles

Semantic Profile No.: MA-SP1

Semantic Profile Name: Molecular Annotation Domain Analysis Model

Constrained Information Model: LSDAM v1.1

Semantic Profile Description: The molecular annotation service will use semantics from the Life Science DAM. The following classes are included in the project-specific DAM (grouped by sub-domain):

• Gene
• NucleicAcidSequenceFeature
• MolecularSequenceAnnotation
• GeneticVariation
• SingleNucleotidePolymorphism
• NucleicAcidPhysicalLocation
• …


Project Analysis Model

[UML class diagram "Molecular Annotation". Classes and attributes:]

domain::MolecularSequenceAnnotation
  - date: TS

domain::Gene
  - symbol: ST

domain::GeneIdentifier
  - databaseName: CD
  - identifier: II

domain::NucleicAcidSequenceFeature
  - orientation: ST

domain::AdditionalOrganismName
  - comment: ST
  - source: CD
  - value: ST

domain::Organism
  - commonName: ST
  - ncbiTaxonomyId: CD
  - scientificName: CD
  - taxonomyRank: CD

domain::MolecularSequence
  - value: SC

domain::NucleicAcidPhysicalLocation
  - endCoordinate: INT
  - startCoordinate: INT

domain::NucleicAcidSequence

domain::GeneticVariation

domain::SingleNucleotidePolymorphism

domain::SingleNucleotidePolymorphismIdentifier
  - databaseName: CD
  - identifier: II

BRIDG 2.1 - ISO::TherapeuticAgent
  + identifier: II
  + statusCode: CD
  + statusDateRange: IVL<TS>

BRIDG 2.1 - ISO::Material
  - actualIndicator: BL
  + description: ST
  + formCode: CD
  + identifier: DSET<II>
  + name: DSET<EN.TN>
  + statusCode: CD
  + statusDateRange: IVL<TS>

BRIDG 2.1 - ISO::Product
  + classCode: DSET<CD>
  + expirationDate: TS
  + pre1938Indicator: BL
  + typeCode: CD

Association labels: identifies / is identified by; is included in / includes; is designated by / designates; reports / is reported by; plays / is played by (+product 1, +therapeuticAgent 0..1); has component / used as component (+product 0..1, +productCollection 0..*)


Capabilities

Name | Description

Get Gene By Symbol or Alias | Returns the gene named by the specified gene symbol or gene alias

Get Gene By Microarray Reporter | Returns the gene associated with the specified microarray reporter

Get Functional Associations | Returns annotations describing a gene's molecular function

Get Cellular Locations | Returns annotations describing a gene's location within a cell

Get Biological Processes | Returns annotations describing a gene's role in biological processes

Get Disease Associations | Returns findings about a gene's role in diseases

Get Agent Associations | Returns findings about agents which target a given gene

Get Structural Variations | Returns variations which are located on a given gene

Get Homologous Gene | Returns a gene’s homologous gene in a specified organism


Capability Details

Name [M]: Get Gene By Symbol or Alias

Description [M]: Returns the gene named by the specified gene symbol or gene alias and the gene’s organism

Pre-Conditions [M]: None

Security Pre-Conditions [M]: None

Inputs [M]:
• Gene Symbol or Alias
• Organism Identifier

Outputs [M]: A collection of Gene objects

Post-Conditions [O]: None

Exception Conditions [M]: No matching genes found

Aspects left for Technical Bindings [O]: Format and data type for the Organism Identifier

Notes [O]: NA


Functional Profiles

Functional Profile No.: MA-FP1

Functional Profile Name: Gene Annotation Query Profile

Functional Profile Description: Contains all the capabilities for retrieving gene annotations

Capability Names:
• Get Gene By Symbol or Alias
• Get Gene By Microarray Reporter
• Get Functional Associations
• Get Cellular Locations
• Get Biological Processes
• Get Disease Associations
• Get Agent Associations
• Get Structural Variations
• Get Homologous Gene


Conformance Profiles

Conformance No.: MA-CP1

Conformance Name: LSDAM-based Gene Annotation Conformance Profile

Description: This conformance profile defines the functionality for the Gene Annotation Service using LSDAM semantics

Usage Context: This profile would be used by a researcher wishing to access gene annotations

Mandatory: No

Functional Profile(s): MA-FP1 : Gene Annotation Query Profile

Semantic Profile(s): MA-SP1 : LSDAM v1.1


Activity Diagrams

[UML activity diagram "Query Service". Swimlanes: Service User, Data Query Service. Nodes: Start, Send Query, Compose Query, Identify Operation, Set Parameters, Receive Query, Execute Query, Return Results, Receive Results, Process Results, End. Data store: Data Source.]


Conformance Statements

Name | Type | Viewpoint | Description | Test method

Query Performance | Obligation | Engineering | The MA service should provide a response within 0.5 seconds to support a synchronous UI based client | Test cases to include performance testing.

Additional Functionality | Permission | Computational | The MA service can provide additional functionality other than specified in these specifications | Design Review

Semantic Model | Obligation | Informational | The MA service must provide traceability to classes in the LSDAM where applicable. | Design Review

Data Types | Obligation | Informational | The MA service must conform to NCI’s constrained list of ISO 21090 data types. | Design Review

Functional Profiles | Obligation | Computational | Functional Profiles shall be deployed as functional wholes. Ignoring or omitting functional behavior defined within a functional profile is not permitted, nor is diverging from the detailed functional specifications provided in Section 4. | 1. Design Review 2. Test cases
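
The Query Performance statement calls for test cases that include performance testing. A minimal sketch of such a check is shown here, assuming a callable that issues one query per gene symbol. The names `slowest_response`, `meets_query_performance`, and `fake_query` are hypothetical, and treating exactly 0.5 seconds as a pass is an assumption; only the 0.5 second limit comes from the conformance statement.

```python
import time
from typing import Callable, Sequence

RESPONSE_LIMIT_SECONDS = 0.5  # limit taken from the Query Performance statement


def slowest_response(
    query: Callable[[str], object],
    symbols: Sequence[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the longest observed wall-clock time for a single query call."""
    worst = 0.0
    for symbol in symbols:
        started = clock()
        query(symbol)
        worst = max(worst, clock() - started)
    return worst


def meets_query_performance(worst_seconds: float) -> bool:
    # Assumption: a response taking exactly the limit still counts as a pass.
    return worst_seconds <= RESPONSE_LIMIT_SECONDS


def fake_query(symbol: str) -> str:
    # Hypothetical placeholder for a real service call; returns immediately.
    return symbol.upper()


worst = slowest_response(fake_query, ["BRCA2", "TP53"])
print(meets_query_performance(worst))
```

The clock is injectable so that the check itself can be exercised with fixed timings rather than real delays.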


ECCF Artifact Matrix

Viewpoints (columns): Enterprise/Business Viewpoint | Information Viewpoint | Computational Viewpoint | Engineering Viewpoint

Levels of abstraction (rows):
• Computation Independent Model (CIM)
• Platform Independent Model (PIM)
• Platform Specific Model (PSM)


Relation to CIM

Conceptual Functional Service Specification Name: Molecular Annotation Computation Independent Service Specification

Conceptual Functional Service Specification Version: 0.0.4

Description & Link to the Conceptual Functional Service Specification: https://gforge.nci.nih.gov/svnroot/cabiodb/ECCF/artifacts/conceptual/CIMSS_Molecular_Annotation_Service.doc

Deviation from the Conceptual Functional Service Specification | Reason for Deviation

None |


Relationship to Standards

Standards | Description

LSBAM v1.0 | Service conforms to NCI’s Life Science Business Architecture Model

LSDAM v1.1 | Service conforms to the Life Sciences DAM version 1.1

LSPIM v0.1 | Service conforms to the Life Sciences PIM version 0.1

ISO 21090 | Service conforms to NCI’s version of ISO 21090 data types

HUGO Gene Symbols | Service leverages gene symbols from the Human Genome Organization

MGI Gene Symbols | Service leverages gene symbols from the International Committee on Standardized Genetic Nomenclature for Mice


What is the LSPIM?

• Life Science Platform Independent Model

• Based on the LSDAM v1.1

• A PIM is derived from a DAM by following some guidelines:

• Constrain to relevant classes and attributes

• Localize by adding attributes as needed

• Semantics are made explicit

• Enumeration of value domains

• Resolution of type codes into class hierarchies

• Resolution of all many-to-many relationships

• All required compliance with other models needs to be expressed at PIM level

• The LSPIM is currently being developed by the Information Representation Working Group (IRWG)


Platform Independent Model

• PIM is based on the LSPIM but it may be constrained and localized:
  • Add any attributes that are needed
  • Remove attributes which are unnecessary
  • Add associations
  • Add new classes

[UML class diagram "Gene and related classes", with panels labeled LSPIM and MAPIM and «trace» links between classes. Classes and attributes:]

LSPIM_1_0::Gene
  - symbol: ST

LSPIM_1_0::NucleicAcidSequenceFeature
  - orientation: ST

LSPIM_1_0::NucleicAcidPhysicalLocation
  - endCoordinate: INT
  - startCoordinate: INT

LSPIM_1_0::Organism
  - commonName: ST
  - ncbiTaxonomyId: CD
  - scientificName: CD
  - taxonomyRank: CD

domain::Gene
  - fullName: ST
  - symbol: ST

domain::GeneIdentifier (shown twice in the diagram)
  - databaseName: CD
  - identifier: II

domain::NucleicAcidSequenceFeature
  - gridIdentifier: TEL.URL
  - orientation: ST

domain::NucleicAcidPhysicalLocation
  - assembly: ST
  - endCoordinate: INT
  - featureType: CD
  - startCoordinate: INT

domain::Organism
  - commonName: ST
  - gridIdentifier: TEL.URL
  - ncbiTaxonomyId: II
  - scientificName: ST
  - taxonomyRank: CD

Association labels: is from / has; is included in / includes; identifies / is identified by


NucleicAcidPhysicalLocation

Trace | Attribute Name | Type | Description

LSPIM | startCoordinate | INT | The start coordinate of the range (inclusive), given as an integer offset from the start of the sequence.

LSPIM | endCoordinate | INT | The end coordinate of the range (inclusive), given as an integer offset from the start of the sequence.

New | featureType | CD | The type of gene feature located, e.g. GENE, CDS, UTR, RNA, PSEUDO.

New | assembly | ST | The genome assembly which this location is defined in reference to.
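
The four attributes above map naturally onto a small record type. The following is a minimal sketch in Python, not the MA PIM's actual binding: the snake_case names, the use of plain `int` and `str` in place of the ISO 21090 types INT, CD, and ST, the ordering check, and the `length` helper are all assumptions made for illustration. It shows one consequence of both coordinates being inclusive: the length is end minus start plus one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleicAcidPhysicalLocation:
    """Illustrative stand-in for the MA PIM class of the same name."""

    start_coordinate: int  # inclusive offset from the start of the sequence
    end_coordinate: int    # inclusive offset from the start of the sequence
    feature_type: str      # e.g. "GENE", "CDS", "UTR", "RNA", "PSEUDO"
    assembly: str          # genome assembly the coordinates refer to

    def __post_init__(self) -> None:
        # Assumption: a location never ends before it starts.
        if self.end_coordinate < self.start_coordinate:
            raise ValueError("end_coordinate must not precede start_coordinate")

    def length(self) -> int:
        # Both ends are inclusive, so a location from 10 to 10 covers one base.
        return self.end_coordinate - self.start_coordinate + 1


# The assembly name here is a placeholder, not a real assembly identifier.
location = NucleicAcidPhysicalLocation(100, 199, "GENE", "hypothetical-assembly")
print(location.length())  # 100
```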


Traceability for Information Models


Operations

Operation No. | Operation Name | Interface Name | Operation Description

MA-INF1-OP1 | getGeneBySymbol | MAGeneAnnotationQuery | Returns the gene named by the specified gene symbol or gene alias

MA-INF1-OP2 | getGeneByMicroarrayReporter | MAGeneAnnotationQuery | Returns the gene associated with the specified microarray reporter

MA-INF1-OP3 | getFunctionalAssociations | MAGeneAnnotationQuery | Returns annotations describing a gene’s molecular function

MA-INF1-OP4 | getCellularLocations | MAGeneAnnotationQuery | Returns annotations describing a gene’s location within a cell

… | … | … | …


Operation Behavior Description

getGeneBySymbol

Returns the gene named by the specified gene symbol or gene alias and the gene’s organism.

Behavior Description:
• Client supplies a GeneSearchCriteria instance with a gene symbol or alias and an Organism to search within
• The case of the symbol or alias is ignored
• If the Organism is null then all Organisms are searched
• The system returns the matching Gene object(s), if any

Pre-Conditions: None

Security Pre-Conditions: None

Inputs: GeneSearchCriteria

Outputs: Return: Fully-populated instance(s) of the Gene class

Post-Conditions: None

Exception Conditions: No matching genes found

Additional Details: None

Notes: None
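
The behavior described for getGeneBySymbol can be sketched against an in-memory list. This is an illustration under stated assumptions, not the service's implementation: the `Gene` and `GeneSearchCriteria` shapes are simplified (Organism is reduced to a string), `NoMatchingGenesFound` is a hypothetical exception name, raising it rather than returning an empty collection is an assumption about how the "No matching genes found" condition is signaled, and the records are toy data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    symbol: str
    aliases: Tuple[str, ...]
    organism: str  # simplified: the PIM models Organism as its own class


@dataclass(frozen=True)
class GeneSearchCriteria:
    symbol_or_alias: str
    organism: Optional[str] = None  # None means "search all organisms"


class NoMatchingGenesFound(Exception):
    """Hypothetical exception for the 'No matching genes found' condition."""


def get_gene_by_symbol(criteria: GeneSearchCriteria, genes: List[Gene]) -> List[Gene]:
    wanted = criteria.symbol_or_alias.casefold()  # the case is ignored
    matches = [
        gene
        for gene in genes
        if (criteria.organism is None or gene.organism == criteria.organism)
        and wanted in {name.casefold() for name in (gene.symbol, *gene.aliases)}
    ]
    if not matches:
        raise NoMatchingGenesFound(criteria.symbol_or_alias)
    return matches


# Toy records for illustration only, not caBIO data.
GENES = [
    Gene("BRCA2", ("FANCD1",), "Homo sapiens"),
    Gene("Brca2", (), "Mus musculus"),
]

found = get_gene_by_symbol(GeneSearchCriteria("brca2"), GENES)
print([gene.organism for gene in found])  # ['Homo sapiens', 'Mus musculus']
```

Leaving the organism unset matches the gene in every organism, while passing one narrows the result to that organism.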


Search Criteria (Inputs)

[UML class diagram "Utility". Classes and attributes:]

util::GeneSearchCriteria
  - symbolOrAlias: ST

domain::Organism
  - bigid: TEL.URL
  - commonName: ST
  - ncbiTaxonomyId: II
  - scientificName: ST
  - taxonomyRank: CD

util::ReporterSearchCriteria
  - reporterName: ST

domain::ArrayDesign
  - description: ST
  - LSID: II
  - manufacturer: ST
  - name: ST

Multiplicities shown: 0..1, 0..1


Dynamic Model

[UML sequence diagram "Dynamic Model". Lifelines: End User, «service» Interfaces::MAService, Data Store.]

Messages, in order:
1. getGeneBySymbol("BRCA2") : Gene
2. executeQuery()
3. Gene("BRCA2")
4. getDiseaseAssociations("BRCA2") : DiseaseAssociation
5. executeQuery()
6. Collection(DiseaseAssociation)
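
The interaction in this diagram, where each service operation delegates to a query on the data store, can be sketched as follows. Everything here is a hypothetical stand-in: the class names `InMemoryDataStore` and `MAService`, the table names, the string-based results, and the `calls` log are assumptions for illustration, and the rows are placeholders rather than real annotations.

```python
from typing import Dict, List


class InMemoryDataStore:
    """Hypothetical stand-in for the Data Store lifeline."""

    def __init__(self, tables: Dict[str, Dict[str, List[str]]]) -> None:
        self._tables = tables
        self.calls: List[str] = []  # records each query for inspection

    def execute_query(self, table: str, gene_symbol: str) -> List[str]:
        self.calls.append(f"{table}:{gene_symbol}")
        return list(self._tables.get(table, {}).get(gene_symbol, []))


class MAService:
    """Hypothetical stand-in for the MAService lifeline."""

    def __init__(self, store: InMemoryDataStore) -> None:
        self._store = store

    def get_gene_by_symbol(self, symbol: str) -> List[str]:
        # Each operation is answered by one query against the data store.
        return self._store.execute_query("gene", symbol)

    def get_disease_associations(self, symbol: str) -> List[str]:
        return self._store.execute_query("disease_association", symbol)


# Placeholder rows, not real annotations.
store = InMemoryDataStore(
    {
        "gene": {"BRCA2": ["Gene(BRCA2)"]},
        "disease_association": {"BRCA2": ["DiseaseAssociation(placeholder-1)"]},
    }
)
service = MAService(store)
gene = service.get_gene_by_symbol("BRCA2")
associations = service.get_disease_associations("BRCA2")
print(store.calls)
```

Passing the store into the service keeps the two lifelines separate, so the data store can be swapped for a real database client without changing the service operations.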


ECCF Artifact Matrix

Viewpoints (columns): Enterprise/Business Viewpoint | Information Viewpoint | Computational Viewpoint | Engineering Viewpoint

Levels of abstraction (rows):
• Computation Independent Model (CIM)
• Platform Independent Model (PIM)
• Platform Specific Model (PSM)


Relation to PIM

Platform Independent Model Name and Service Specification: Molecular Annotation Service Platform Independent Model and Service Specification

Platform Independent Model and Service Specification Version: 0.1.0

Description & Link to the Platform Independent Model and Service Specification: http://gforge.nci.nih.gov/svnroot/cabiodb/ECCF/artifacts/logical/PIMSS_Molecular_Annotation_Service.doc


PSM Information Model


Service Interface

Implemented Interface No. | Supported Interface Name | Interface Description | Link

MA-INF1 | MAGeneAnnotationQuery | Includes all operations for retrieving gene annotations. | N/A

DS-INF1 | Data Service Query | Contains the CQL query operation | https://ncisvn.nci.nih.gov/svn/cagrid/branches/caGrid-1_3_release/cagrid-1-0/caGrid/projects/data/schema/Data/DataService.wsdl


Implementation

1. Leverage new releases of:
  • ISO 21090 Common Library
  • caCORE SDK
  • caGrid / Introduce

2. Create new MA database and map to MA PSM

3. Populate MA database with data from caBIO database

4. Generate caCORE-like system from the MA PSM

5. Generate data service with Introduce

6. Add custom methods to implement service operations


Deployment Plan


Resources

• caBIO

https://wiki.nci.nih.gov/display/caBIO/caBIO+Wiki+Home+Page

• caBIO ECCF Pilot Project

https://wiki.nci.nih.gov/display/caBIO/caBIO+ECCF

• SAIF Book

http://gforge.hl7.org/gf/download/docmanfileversion/5503/6972/saeaf_02_19_10.pdf


Questions?

