Science-driven informatics for pcori pprn

Date post: 23-Feb-2016
SCIENCE-DRIVEN INFORMATICS FOR PCORI PPRN Kristen Anton UNC Chapel Hill/ White River Computing Dan Crichton White River Computing February 3, 2014
Transcript
Page 1: Science-driven informatics for  pcori pprn

SCIENCE-DRIVEN INFORMATICS FOR PCORI PPRN

Kristen Anton
UNC Chapel Hill / White River Computing

Dan Crichton
White River Computing

February 3, 2014

Page 2: Science-driven informatics for  pcori pprn

From Information Design, Nathan Shedroff

White River Computing / UNC Chapel Hill

Page 3: Science-driven informatics for  pcori pprn

Architecture – what is it?

Architecture:

• The fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution (ANSI/IEEE Std. 1471-2000)

Architecture is decomposed into four core pieces:

• Process Architecture – describes the core processes for the system
• Data Architecture – describes the information models and data standards for the system
• Application Architecture – portals, tools, etc.
• Technology Architecture – infrastructure elements

Page 4: Science-driven informatics for  pcori pprn

Architecture Development Approach

• Identify the drivers and requirements
• Create an architectural description of the system – identified stakeholders, concerns and associated models
• Identify core architectural principles
• Separate the architecture into key viewpoints
• Create a decomposition of the system identifying the elements and mapping to the requirements
• Identify the high-level flows and analyze them from the process, information and application/technology perspectives
• Generate the architectural models

Page 5: Science-driven informatics for  pcori pprn

Communicating an Architecture

• One of the major challenges is communicating an architecture
• Who are the PCORI stakeholders that care about the architecture?
• How do we communicate their care-abouts?
• Determine a useful view of the system for the stakeholder
• Projects have suffered because a useful view wasn’t provided

The viewpoint is where you look from
The view is what you see
(Stakeholders)

Page 6: Science-driven informatics for  pcori pprn

Software Development

• The organization, implementation and deployment of the software should follow the identification of an architecture which aligns with the principles and needs of the stakeholders
• The separation of the architecture into concerns will let us determine what capabilities exist and what capabilities need to be developed
• Ultimately this will help to ensure that a system is deployed which will integrate

Page 7: Science-driven informatics for  pcori pprn

Recommended Software Development Approach

Project Formulation (Jan 2014 – Mar 2014)
Project Organization, Objectives, High Level Schedule and Project Plan

System Formulation/Architecture (Feb 2014 – June 2014)
High-Level Architecture for System and Data, Architecture, Data Flows, Initial Data Structure, etc.

Site Development (June 2014 – June 2015)
Development and deployment of the infrastructure and architecture; development of the core data model / consistent with PCORnet “universal” data model?

Page 8: Science-driven informatics for  pcori pprn

Supporting science-driven research needs:
Case Study – Early Detection Research Network (EDRN)

• Research network of collaborating scientists from more than 40 institutions – international network of networks
• Focus on identifying and validating biomarkers of cancer at early stage / preclinical

Bioinformatics challenges in EDRN:

Developing computing infrastructure that is “biomarker-centric.”

Improve research capability by enabling real-time access to a variety of information that crosses institutional boundaries.

Page 9: Science-driven informatics for  pcori pprn

Bioinformatics – Goals
Supporting science-driven research needs

• Coordinated discovery and validation of biomarkers across cancer research centers to increase accuracy of the results of studies
• Accommodating various data types
• Facilitation of analytics through data integration and single-point access
• Support workflows associated with various types of information
• Encouraging and supporting collaboration

Page 10: Science-driven informatics for  pcori pprn

Bioinformatics – Goals
Supporting science-driven research needs

• Linking highly diverse systems together to integrate and present data for analytics
• Defining a comprehensive information model for describing the problem space / ontology
• Providing software interfaces for capture, discovery, and access of data resources
• Providing a secure transfer and distribution infrastructure
• Enabling all data sources to be heterogeneous and distributed
• Providing integrated portal for access to distributed data
• Providing bioinformatics tools / pipelines for uniform data processing

Page 11: Science-driven informatics for  pcori pprn

Bioinformatics
EDRN Knowledge Environment

Functional architecture: Services

• Data capture
• Data discovery
• Data access
• Data retrieval
• Data processing
• Data distribution
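
To make the service list above more concrete, here is a minimal Python sketch of how capture, discovery and retrieval might look behind a single catalog object. Everything in it is an assumption made for illustration: the class names `DataProduct` and `InMemoryCatalog`, the metadata keys, and the in-memory storage are hypothetical and are not taken from the EDRN implementation or from Apache OODT. Processing and distribution are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataProduct:
    """A stored file or record plus descriptive metadata (hypothetical shape)."""
    product_id: str
    metadata: Dict[str, str]
    payload: bytes


@dataclass
class InMemoryCatalog:
    """Toy stand-in for a catalog/archive pair; not an actual EDRN or OODT component."""
    products: Dict[str, DataProduct] = field(default_factory=dict)

    def capture(self, product: DataProduct) -> None:
        # Data capture: register the product together with its metadata.
        self.products[product.product_id] = product

    def discover(self, **criteria: str) -> List[str]:
        # Data discovery: return ids whose metadata matches every criterion.
        return sorted(
            pid for pid, p in self.products.items()
            if all(p.metadata.get(k) == v for k, v in criteria.items())
        )

    def retrieve(self, product_id: str) -> bytes:
        # Data access/retrieval: hand back the stored payload.
        return self.products[product_id].payload


catalog = InMemoryCatalog()
catalog.capture(DataProduct("p1", {"organ": "lung", "protocol_id": "101"}, b"spectra"))
catalog.capture(DataProduct("p2", {"organ": "colon", "protocol_id": "101"}, b"arrays"))
lung_ids = catalog.discover(organ="lung")
```

Keeping each service as a separate method means a distributed deployment could, in principle, swap the in-memory dictionary for remote calls without changing callers.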

Page 12: Science-driven informatics for  pcori pprn

Bioinformatics
EDRN Knowledge Environment

Information architecture: Data Model across EDRN projects (“universal” data model)

• Representation of information associated with data objects managed within the knowledge system
• Models for:
  • Biomarkers
  • Studies
  • Participants
  • Organs
  • Data generated from instruments (e.g. mass spec, arrays)
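
As a rough illustration of what models for biomarkers, studies, participants, organs and instrument data could look like in code, the following Python sketch uses plain dataclasses. The field names (`protocol_id`, `participant_id`, `instrument`, and so on) and the sample values are assumptions chosen for the example; the slides do not give the actual EDRN schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Organ:
    name: str


@dataclass(frozen=True)
class Participant:
    participant_id: str


@dataclass
class Study:
    protocol_id: str
    title: str
    participants: List[Participant] = field(default_factory=list)


@dataclass
class InstrumentData:
    file_name: str
    instrument: str        # e.g. "mass spec" or "array"
    protocol_id: str
    participant_id: str


@dataclass
class Biomarker:
    name: str
    organs: List[Organ] = field(default_factory=list)
    protocol_ids: List[str] = field(default_factory=list)


# Hypothetical sample objects, invented for this sketch.
study = Study("P-001", "Example validation study", [Participant("S-17")])
marker = Biomarker("example-marker", [Organ("lung")], [study.protocol_id])
run = InstrumentData("run1.dat", "mass spec", study.protocol_id, "S-17")

# Objects from different models line up on the shared protocol identifier.
linked = run.protocol_id in marker.protocol_ids
```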

Page 13: Science-driven informatics for  pcori pprn

Bioinformatics
EDRN Knowledge Environment

Information architecture: Data Model

• Relationships between and among objects
• Standard set of metadata elements that can be used for annotating objects
• Multiple metadata schemata for machine usable explanations of the metadata descriptions
• Metadata descriptions describe the inception and composition of data
• Common language for describing data and associated attributes: Common Data Elements (CDEs)
• CDE has a Uniform Resource Identifier (URI) – URL form points to CDE definition page – used in XML standards
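
The idea of annotating objects with Common Data Elements that each carry a URI can be sketched as follows. This is an assumption-laden illustration, not the EDRN or NCI repository format: the `CommonDataElement` class, the `validate` helper, the permissible values and the `example.org` URIs are all placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CommonDataElement:
    uri: str                       # identifier; the URL form would resolve to a definition page
    name: str
    permissible_values: frozenset  # an empty set means free text is allowed


def validate(record: Dict[str, str], cdes: List[CommonDataElement]) -> List[str]:
    """Return a list of problems found when annotating a record with CDEs."""
    problems = []
    by_name = {c.name: c for c in cdes}
    for key, value in record.items():
        cde = by_name.get(key)
        if cde is None:
            problems.append(f"{key}: not a known CDE")
        elif cde.permissible_values and value not in cde.permissible_values:
            problems.append(f"{key}: value {value!r} not permitted by {cde.uri}")
    return problems


# The URIs below use the reserved example.org domain; they are placeholders.
ORGAN = CommonDataElement("http://example.org/cde/organ", "organ",
                          frozenset({"lung", "colon", "prostate"}))
SPECIMEN = CommonDataElement("http://example.org/cde/specimen_type", "specimen_type",
                             frozenset())

ok = validate({"organ": "lung", "specimen_type": "serum"}, [ORGAN, SPECIMEN])
bad = validate({"organ": "elbow", "colour": "red"}, [ORGAN, SPECIMEN])
```

Because each problem message cites the CDE's URI, a curator could follow it to the definition page to see which values are allowed.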

Page 14: Science-driven informatics for  pcori pprn

EDRN Knowledge Environment

[Diagram: components of the EDRN Knowledge Environment and the identifiers that link them]

Components: eCAS Science Warehouse, CDE Repository, ERNE, VSIMS, Participant DB, Protocol DB, Biomarker_DB, Public Portal, Distributed Specimen Databases, Bioinformatics Tools

Content labels: EDRN science data results (local, distributed and varying degrees of validation); descriptions of biomarkers and their use; descriptions of EDRN studies, participants, specimen tracking, etc.; protocols and their descriptions; data elements and their descriptions; participants and their characteristics

Linking keys: protocol_id, participant_id
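
The diagram labels several components with `protocol_id` and `participant_id`, which suggests that records in different stores are tied together through those identifiers. The Python sketch below shows one way such linking could work. The store contents, the `integrate` function and the extra field names are hypothetical; only the two key names come from the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical rows; each store is keyed the way the diagram labels suggest.
protocols: Dict[str, str] = {"P-001": "Example validation study"}
participants: Dict[Tuple[str, str], Dict[str, str]] = {
    ("P-001", "S-17"): {"age_group": "60-69"},
    ("P-001", "S-18"): {"age_group": "50-59"},
}
science_results: List[Dict[str, str]] = [
    {"protocol_id": "P-001", "participant_id": "S-17", "file": "run1.dat"},
    {"protocol_id": "P-001", "participant_id": "S-18", "file": "run2.dat"},
    {"protocol_id": "P-999", "participant_id": "S-01", "file": "orphan.dat"},
]


def integrate(results, protocols, participants):
    """Attach protocol and participant descriptions to each result row."""
    merged = []
    for row in results:
        key = (row["protocol_id"], row["participant_id"])
        if row["protocol_id"] not in protocols or key not in participants:
            continue  # skip rows that cannot be linked to both stores
        merged.append({**row,
                       "protocol_title": protocols[row["protocol_id"]],
                       **participants[key]})
    return merged


view = integrate(science_results, protocols, participants)
```

A real deployment would query distributed databases rather than dictionaries, but the join on shared identifiers would follow the same pattern.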

Page 15: Science-driven informatics for  pcori pprn

EDRN Knowledge Environment
Success?

• Biomarker Database holds 850 curated biomarkers, including panels / signatures of biomarkers
• Biomarker Database modeled to reflect the data model: activity in multiple organs, protocols, data files – facilitate single-point data access
• eSIS contains 165 protocols
• eCAS holds 56 data sets, with many files in each set, and more added daily – standard metadata around each set and each product
• Two bioinformatics tools implemented: Proteomics “pipeline” (generating standardized biomarker identification files); REDCap (standardized data definition and capture at the project level) – additional in progress
• Common Data Elements (CDEs) contributed to the NCI repository
• CDE has a Uniform Resource Identifier (URI) – URL form points to CDE definition page – used in XML standards
• Portal facilitates authorized access to almost 200,000 specimens
• Publications and Resources

Page 16: Science-driven informatics for  pcori pprn

EDRN Knowledge Environment
Technology

• Iterative development
• Open Source philosophy and tools
• Apache OODT (Object Oriented Data Technology)

Software components developed independent of any data model:

EDRN’s computing infrastructure can be replicated

Page 17: Science-driven informatics for  pcori pprn

EDRN Knowledge Environment
Technology

Page 18: Science-driven informatics for  pcori pprn

Bioinformatics – Goals
Supporting science-driven research needs: SHARE

Page 19: Science-driven informatics for  pcori pprn

Geisel School of Medicine at Dartmouth / UNC Chapel Hill

Bioinformatics – Goals
Supporting science-driven research needs: SHARE

Page 20: Science-driven informatics for  pcori pprn

Supporting science-driven research needs: PCORI PPRN

Page 21: Science-driven informatics for  pcori pprn

Opportunity to offer our architecture to PCORnet?

Synergy in data model
Query across CCFA PPRN network … network of networks?

