Date posted: 13-Jul-2015
Category: Education
Uploaded by: philip-bourne
The NIH as a Digital Enterprise: Implications for PAG
Philip E. Bourne, PhD
Associate Director for Data Science
National Institutes of Health
PAG San Diego
January 11, 2015
Biomedical Research is Becoming More Digital and FAIR
Finding, Accessing, Integrating, and Reusing digital research objects
This move from an observational science to a more analytical science is being driven by ever-increasing amounts of digital data
And This May Just be the Beginning
Evidence:
– Google car
– 3D printers
– Waze
– Robotics
From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
Further Perturbation: The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
Stephen Friend
47/53 “landmark” publications could not be replicated
[Begley, Ellis Nature, 483, 2012] [Carole Goble]
ADDS Mission Statement
To foster an open ecosystem that enables biomedical* research to be
conducted as a digital enterprise that enhances health, lengthens life and
reduces illness and disability
* Includes biological, biomedical, behavioral, social, environmental, and clinical studies that relate to understanding health and disease.
Some Goals of the Digital Enterprise
Cost savings through sharing of best practices
Sustainability of digital assets
Collaboration through identification of collaborators at the point of data collection, not publication
Improved reproducibility through data and methods sharing
Integration of data types and data and literature to accelerate discovery
Some of Today’s Observations
Bad news
– We do not yet have a data sustainability plan
– Global policies define the why but not the how
– We do not know how all the data we currently have are used
– We can’t estimate future supply and demand
– We need to ramp up training programs in data science
Good news
– Genuine willingness to address the problem
– Global communities are emerging
– Efficiencies can be achieved
– BD2K is the beginning of a plan
– We are beginning to quantify the issues
Elements of The Digital Enterprise
Community
Policy
Infrastructure
• Sustainability
• Collaboration
• Training
Virtuous Research Cycle
Policies – Now & Forthcoming
Data Sharing
– Genomic data sharing announced
– Data sharing plans on all research awards
– Data sharing plan enforcement
• Machine readable plan
• Repository requirements to include grant numbers
http://www.nih.gov/news/health/aug2014/od-27.htm
Policies - Forthcoming
Data Citation
– Goal: legitimize data as a form of scholarship
– Process:
• Machine readable standard for data citation (done)
• Endorsement of data citation for inclusion in NIH biosketch, grants, reports, etc.
• Example formats for human readable data citations
• Slowly work into NLM/NCBI workflow
[Diagram: six BD2K Centers, the DDICC, Software, Standards]
Infrastructure - The Commons
[Diagram: Labs connected to The Commons]
The Commons
– Digital Objects (with UIDs)
– Search (indexed metadata)
– Computing Platform
Vivien Bonazzi, George Komatsoulis
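The Commons elements listed on this slide (digital objects with UIDs, search over indexed metadata) can be illustrated with a toy in-memory registry. This is only a sketch under stated assumptions, not the actual Commons design: the class names `DigitalObject` and `CommonsIndex`, the use of random UUIDs as the UID scheme, the whitespace-token metadata index, and the sample metadata are all hypothetical.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """A research object (data, software, standard, ...) with a unique ID."""
    uid: str
    kind: str
    metadata: dict


class CommonsIndex:
    """Toy registry: stores objects by UID and indexes their metadata terms."""

    def __init__(self):
        self._objects = {}               # UID -> DigitalObject
        self._index = defaultdict(set)   # lowercase term -> set of UIDs

    def register(self, kind, metadata):
        # Assumption: a random UUID stands in for whatever UID scheme is used.
        obj = DigitalObject(uid=str(uuid.uuid4()), kind=kind, metadata=dict(metadata))
        self._objects[obj.uid] = obj
        # Index every whitespace-separated term of every metadata value.
        for value in obj.metadata.values():
            for term in str(value).lower().split():
                self._index[term].add(obj.uid)
        return obj

    def resolve(self, uid):
        """Return the object registered under this UID."""
        return self._objects[uid]

    def search(self, query):
        """Return objects whose metadata contains every term in the query."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return [self._objects[u] for u in sorted(hits)]


# Hypothetical example objects
commons = CommonsIndex()
ds = commons.register("data", {"title": "Maize genome assembly"})
sw = commons.register("software", {"title": "Genome alignment tool"})
matches = commons.search("maize genome")
```

Here `commons.search("genome")` would return both objects, while `commons.search("maize genome")` narrows the result to the dataset; `commons.resolve(ds.uid)` looks an object up directly by its UID.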
The Commons: Compute Platforms
The Commons Conceptual Framework
Public Cloud Platforms
Super Computing (HPC) Platforms
Other Platforms?
Google, AWS (Amazon), Microsoft (Azure), IBM, other?
In-house compute solutions
Private clouds, HPC
– Pharma
– The Broad
– Bionimbus
Traditionally low access by NIH
How Might PAG Participate?
Consider contributing digital research objects into the Commons – data, software, standards, narrative, course materials …
Initiate your own moves from cylinders of excellence to more integrated and multi-functional data sources
Work to define new business models for the scientific enterprise
Generic Needs
Homogenization of disparate large unstructured datasets
Deriving structure from unstructured data
Feature mapping and comparison from image data
Visualization and analysis of multi-dimensional phenotypic datasets
Causal modeling of large scale dynamic networks and subsequent discovery
Utilization of data that are sparsely and irregularly sampled and noisy
BD2K will offer reference datasets and points of domain expertise to explore these questions
Community: Training
Data Science Training Goals
1) Build an OPEN digital framework for data science training: NIH Data Science Workforce Development Center
2) Develop short-term training opportunities: courses, educational resources, etc.
3) Develop the discipline of biomedical data science and support cross-training – OPEN courseware
All goals have a diversity component and mandate
Associate Director for Data Science
Scientific Data Council, External Advisory Board
The Biomedical Research Digital Enterprise
Programmatic Theme | Deliverable | Example Features
Sustainability | Commons | Cloud – Data & Compute; Search; Security; Reproducibility Standards; App Store
Education | Training | Coordinate; Hands-on; Syllabus; MOOCs
Innovation | BD2K | Community; Centers; Training Grants; Catalogs; Standards; Analysis
Process | Efficiency | Data Resource Support; Metrics; Best Practices; Evaluation; Portfolio Analysis
Collaboration | Partnerships | ICs; Researchers; Federal Agencies; International Partners; Computer Scientists
Potential Outcomes
Mobility: improve the outcomes of surgeries in children with cerebral palsy and gait pathology
Wellness: markers derived from constantly monitored eHealth/mobile health devices – apply to smoking cessation, weight loss
Cancer: further personalization of treatment
Mental Health: better identify factors that resist and promote brain disease, e.g., schizophrenia, bipolar disorder, major depression, attention deficit hyperactivity disorder (ADHD), obsessive compulsive disorder (OCD), autism
Addiction: utilizing social media to track and treat drug use and addiction