1
A National Virtual Specimen Database for Early Cancer Detection
June 26, 2003
Daniel Crichton, NASA Jet Propulsion Laboratory
Sean Kelly, NASA Jet Propulsion Laboratory
Mark Thornquist, Fred Hutchinson Cancer Research Center
Sudhir Srivastava, National Cancer Institute
Heather Kincaid, Fred Hutchinson Cancer Research Center
Donald Johnsey, National Cancer Institute
Marcy Winget, Fred Hutchinson Cancer Research Center
2
Vision
Development of a world-wide knowledge and informatics environment for sharing cancer specimen data across repositories
Data and computers interconnected to form a virtual database
Integrated Cancer Resources
• Specimens
• Images
• Assays
• Biomarkers
• etc.
3
Early Detection Research Network (EDRN)
5-year collaboration supported by NCI
Goal: Identify, evaluate, and validate promising biomarkers to support the early detection of cancer
Comprised of:
• 18 Biomarker Laboratories
• 9 Clinical and Epidemiology Centers
• 3 Biomarker Validation Laboratories
• Data Management and Coordinating Center
4
EDRN Resource Network Exchange (ERNE)
Virtual Specimen Repository (real-time access to distributed repositories)
Informatics infrastructure created for EDRN
Existing sites' specimen databases maintained locally
Uses EDRN Common Data Elements (CDEs)
Maps institutions' local data definitions to EDRN CDEs
Secure and Confidential
Secure Dynamic Portal
6
Information Infrastructure Progress
Initiation (10/00 - 3/01)
• Discuss Informatics at 2nd EDRN S.C. Meeting
• Present Mock Knowledge System at EDRN S.C. Meeting
Feasibility (4/01 - 10/01)
• Connect Moffitt and San Antonio
• Finalize EDRN CDEs used in knowledge system
• Create Dynamic Portal
• Present Feasibility at EDRN S.C. Meeting
Pilot (10/01 - 9/02)
• Implement four sites
• Finalize IRB Protocol template
• Create Online Mapping Tool
• Present at EDRN S.C. Meeting
Implementation (9/02 - 6/03)
• Implement three additional sites
• Present at EDRN S.C. Meeting
7
EDRN Bioinformatics Architecture
1. Bioinformatics tools and applications use "API"
2. Middleware creates the informatics infrastructure connecting systems and data
3. Repositories for storing and retrieving many data types
[Diagram labels: Visualization Tools; Analysis Tools; Web Search Tools; API; "OODT" Middleware; Metadata Mediation; Standard Metadata; EDRN Data Repositories; SPORE Data Repositories; Other Data Repositories]
8
Informatics Infrastructure
Connect local databases via the Internet
Query multiple institutional databases concurrently
Metadata-based distributed framework
Object Oriented Data Technology (OODT) framework (JPL)
• Combines semantic data model with distributed services to create a "grid" architecture
9
OODT Framework
Developed by NASA to support science data management for the robotic planetary program
Defines a reusable architectural pattern that enables
• information clustering and retrieval across distributed data resources
• intelligent query algorithm for scalability
• interoperability between disparate data models
• reusable software components
• domain independence
• plug-in for various distributed computing implementations
[Diagram labels: OODT/Science Web Tools; Local Client; Lab-wide Component Framework; Profile XML Data; Data System 1; Data System 2; Query Service; Product Service; Profile Service; Archive Service; Bridge to External Services]
10
Critical OODT Components
Query Server – Manages and routes concurrent queries to distributed resources. Combines results.
Profile Server – Enables resource discovery by providing information about which data resources are available (a resource is really an electronic object)
Product Server – Enables access and retrieval of data products from an online data source
Servers are written in Java and supported on Windows, Linux, Solaris, Mac OS X, etc.
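How the three server roles fit together can be illustrated with a small sketch. It is written in Python for brevity and is not OODT code: the site names, the `SPECIMEN_TYPE` element, and every function name are hypothetical stand-ins, and in-memory lists take the place of remote product servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for site product servers: each site holds a few
# local records. Real sites would be reached over the network.
SITE_DATA = {
    "site_a": [{"SPECIMEN_TYPE": "Blood"}, {"SPECIMEN_TYPE": "Urine"}],
    "site_b": [{"SPECIMEN_TYPE": "Blood"}],
    "site_c": [{"SPECIMEN_TYPE": "Tissue"}],
}


def profile_lookup():
    """Play the profile server's role: report which resources are available."""
    return sorted(SITE_DATA)


def product_query(site, element, value):
    """Play one product server's role: answer a query from that site's data."""
    return [dict(rec, site=site) for rec in SITE_DATA[site]
            if rec.get(element) == value]


def query_server(element, value):
    """Route the query to all known sites concurrently and combine the results."""
    sites = profile_lookup()
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        per_site = pool.map(lambda s: product_query(s, element, value), sites)
        return [rec for result in per_site for rec in result]


if __name__ == "__main__":
    for rec in query_server("SPECIMEN_TYPE", "Blood"):
        print(rec)
```

The thread pool here only stands in for whatever concurrency mechanism the real query server uses; the point is the division of labor between discovery, retrieval, and result merging.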
11
Software Component Deployment
[Diagram labels: User query; EDRN Secure Website (Query Client, Web server, search.jsp); Science Tools; DMCC – Fred Hutchinson Cancer Research Center; EDRN Profile Server; EDRN CDE Mapping Database; Product Servers at Moffitt, San Antonio, MD Anderson, Colorado, Creighton, GLNE, Pittsburgh, New York, and Brigham and Women's, each paired with a Specimen Database]
12
Semantic Architecture
Define a common data model for EDRN
• Common Data Elements
• Relationships between elements
Institutions have existing specimen repositories with locally defined data models
• Map local data elements to CDEs using EDRN CDE mapping and repository tools
• 39 CDEs Shared
Use Standards
• ISO/IEC 11179
• Resource Description Framework (RDF)
Use standard definitions for data exchange
• Communicate using a standard XML schema
13
Gender Mapping Example
Field               | EDRN CDE                            | Institution DE
Table Name          |                                     | M_Sput_Subject
Name                | BASELINE_DEMOGRAPHICS-GENDER_CODE   | SEX
Version             | 1.0                                 |
Data Type           | Integer                             | Character
Document Text       | Gender (What is your gender?)       | Gender
Permissible Values  | 1 Male, 2 Female, 9 Unknown/Refused | M, F, U
Mapping Type        | Match by Query                      |
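The gender example can be sketched as a pair of lookup functions. This is an illustration only: the slide lists both value sets but does not state which local code maps to which CDE code, so the pairing M→1, F→2, U→9 is an assumption, as is the choice to send unrecognized values to 9. The function names are hypothetical.

```python
# Assumed pairing of institution codes (SEX, Character) to EDRN CDE codes
# (BASELINE_DEMOGRAPHICS-GENDER_CODE, Integer). Not stated on the slide.
LOCAL_TO_CDE_GENDER = {"M": 1, "F": 2, "U": 9}
CDE_GENDER_LABELS = {1: "Male", 2: "Female", 9: "Unknown/Refused"}


def to_cde_gender(local_value):
    """Translate an institution SEX character into the CDE integer code."""
    key = (local_value or "").strip().upper()
    # Assumption: anything outside M/F/U is reported as 9 (Unknown/Refused).
    return LOCAL_TO_CDE_GENDER.get(key, 9)


def to_local_gender(cde_code):
    """Translate a CDE code used in a query back to the institution's character."""
    reverse = {code: char for char, code in LOCAL_TO_CDE_GENDER.items()}
    return reverse[cde_code]


if __name__ == "__main__":
    code = to_cde_gender("F")
    print(code, CDE_GENDER_LABELS[code])
```

Translating in both directions matters because a query arrives in CDE terms and must be rewritten for the local table, while results travel back in CDE terms.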
14
Security and Confidentiality
Highly Sensitive Information
Health Insurance Portability and Accountability Act (HIPAA)
• Removed Personal Health Information (PHI)
Security Measures
• 128-bit strong encryption using Secure Sockets Layer (SSL)
• Access limited to remote connections from specific IP(s) on specific ports. Firewalls augmented with rule set.
Institutions' IRBs
• Common Protocol
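The IP-and-port restriction can be pictured as an allowlist check. The sketch below is not the EDRN firewall configuration: the rule set, the port number, and the function name are hypothetical, and the addresses come from the ranges reserved for documentation.

```python
import ipaddress

# Hypothetical rule set: (allowed network, allowed port). The addresses are
# documentation-only ranges, not real EDRN hosts.
ALLOW_RULES = [
    (ipaddress.ip_network("192.0.2.10/32"), 8443),
    (ipaddress.ip_network("198.51.100.0/24"), 8443),
]


def is_allowed(remote_ip, port):
    """Accept a connection only if both its source address and port match a rule."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net and port == allowed_port
               for net, allowed_port in ALLOW_RULES)


if __name__ == "__main__":
    print(is_allowed("192.0.2.10", 8443))
    print(is_allowed("192.0.2.11", 8443))
```

In practice this kind of filtering would normally live in the firewall itself rather than in application code; the sketch only shows the matching logic.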
18
Number of Participants by Specimen Type
[Chart. Data labels: 19255; 186; 9751; 443616 253. Legend: Blood; Bone Marrow; Tissue; Bronchial Washings / Brushings; Sputum; Urine]
19
ERNE Achievements
Deployed Software Infrastructure to 10 institutions
• Process of connecting new sites well understood
Software Infrastructure Maturing
• Extensive nightly testing and monitoring of infrastructure
Team Maturing and Growing
Policy Challenges
Institutional Access
Science Support
20
More Information
EDRN – http://www.cancer.gov/edrn
OODT – http://www.jpl.nasa.gov
Contact:
• Heather Kincaid: [email protected]
• Dan Crichton: [email protected]
• Don Johnsey: [email protected]
22
Dynamic Portal
JSP-based implementation that queries informatics infrastructure
• Uses CDE terms for constructing query expression
Shows available servers
Limits available choices based on selected criteria
Quick Search
Advanced Search
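Constructing a query expression from CDE terms might look roughly like the sketch below. The expression syntax, the `build_query` helper, and the `SPECIMEN_TYPE_CODE` element are hypothetical; only `BASELINE_DEMOGRAPHICS-GENDER_CODE` appears in the slides, and the portal's real query format is not shown there.

```python
ALLOWED_OPS = {"=", "!=", "<", ">", "<=", ">="}


def build_query(criteria):
    """Join (CDE name, operator, value) triples into one AND-ed expression."""
    parts = []
    for name, op, value in criteria:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        parts.append(f"{name} {op} {value}")
    return " AND ".join(parts)


if __name__ == "__main__":
    selected = [
        ("BASELINE_DEMOGRAPHICS-GENDER_CODE", "=", 2),
        ("SPECIMEN_TYPE_CODE", "=", 1),  # hypothetical element and code
    ]
    print(build_query(selected))
```

Because the user picks from CDE terms rather than typing local field names, the same expression can be sent to every site and translated there.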