Towards a Platform for Global Health
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
The National Institutes of Health
http://www.slideshare.net/[email protected]
Bias
• Worked on long-standing data resources
– PDB, IEDB
• Systems pharmacology with emphasis on the role of molecular structure
• AVC for innovation and industrial alliances at UCSD
• Chief data officer for the National Institutes of Health
• Open science zealot
https://www.boundless.com/psychology/textbooks/boundless-psychology-textbook/researching-psychology-2/bias-in-psychological-research-407/biases-in-experimental-design-validity-reliability-and-other-issues-132-12667/images/research-bias/
Before we look at platforms, and thinking as a funder, I want to describe an emerging effort that may hold valuable lessons for GA4GH in its future relationship with funders (something that should not be ignored).
Preprints
http://www.hdimagez.com/gifts-made-of-moneywallpapers/
What is a preprint?
• A complete manuscript or research report shared prior to, or instead of, publication; in arXiv, 80% of preprints are published at a later date
• Not formally peer reviewed, but may be commented on by the community, depending on the preprint service
http://asapbio.org/
Preprints, long the realm of physicists, are gaining traction in the life sciences
Speeds up dissemination
Record of priority
More informed grant review
Negative data
✘ Fear of scooping
✘ Career disadvantage
✘ Inability to publish
✘ Quality: moderation only; no peer review
Status
• ASAPbio is to issue an RFI on what a central preprint service should look like
• ~15 global funders (government and foundations), the coalition of the willing, have defined basic principles to support such a service
• Funders collectively expect to fund ASAPbio to award a contract to build the system
• While sustainability models should be sought, funders anticipate funding a central service for at least 5-10 years
Endpoint
• Accelerated scientific outcomes through a human- and machine-accessible corpus of open knowledge available to all
How should GA4GH view this development? …
Perceived critical mission
Strong leadership
Leading scientists engaged
Significant community support
✖ Obvious endpoint/singular message
✖ Funders: coalition of the willing
✖ Identified champions within each funding body
http://asapbio.org/
Obvious endpoint/singular message
Possible Touchpoint to Funders:
“The partners in the Global Alliance are working together to create a common framework of harmonized approaches to enable the responsible, voluntary, and secure sharing of genomic and clinical data.”
Funders, too, are increasingly looking to move from pipes to platforms (aka a common framework).
What would such a platform look like? …
Sangeet Paul Choudary
http://platformthinkinglabs.com/start-here/
Making Biomedical Research More Like Airbnb
I am not crazy, hear me out
• Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host)
• The platform focuses on maximizing both the exchange of services between supplier and consumer and the trust associated with each stakeholder
• It seems to be working:
– 60 million users searching 2 million listings in 192 countries
– An average of 500,000 stays per night
– A valuation of US $25bn
Is not biomedical research the same?
Why a comparison to Airbnb is not fair
• Airbnb was born digital
• The services exchanged on Airbnb are simple compared to what a platform supporting biomedical research must provide
Nevertheless, there is much to be learnt
Consider why this appeals to funders
Similar Processes Lead to Similar Resources
Journal article | Database entry
Author submission via the Web | Depositor submission via the Web
Syntax checking | Syntax checking
Review by scientists & editors | Review by annotators
Corrections by author | Corrections by depositor
Publish (Web accessible) | Release (Web accessible)
Bourne, PLoS Comp. Biol. 2005 1(3) e34
de Waard, Nature Precedings 2010 doi:10.1038/npre.2010.4742.1
What is different is the perceived value of each to the research enterprise. That difference in value is diminishing, in part because of openness, accessibility, policy, governance and increased data reuse, and let's not forget other forms of madness…
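The side-by-side steps for papers and data amount to one workflow with different actors. As a minimal sketch (the `Roles`, `curation_steps`, `JOURNAL` and `DATABASE` names are hypothetical, chosen only for illustration), the shared process can be written once and parameterized by who submits, who reviews and what the final step is called:

```python
from typing import NamedTuple


class Roles(NamedTuple):
    """Who plays each part in the workflow (hypothetical structure)."""
    submitter: str
    reviewer: str
    final_step: str


def curation_steps(roles: Roles) -> list[str]:
    """Return the shared five-step workflow, varying only the actors."""
    return [
        f"{roles.submitter} submission via the Web",
        "Syntax checking",
        f"Review by {roles.reviewer}",
        f"Corrections by {roles.submitter.lower()}",
        f"{roles.final_step} (Web accessible)",
    ]


# The two resources from the comparison: same process, different actors.
JOURNAL = Roles(submitter="Author", reviewer="scientists & editors", final_step="Publish")
DATABASE = Roles(submitter="Depositor", reviewer="annotators", final_step="Release")

if __name__ == "__main__":
    # Print the two workflows side by side, step for step.
    for paper_step, data_step in zip(curation_steps(JOURNAL), curation_steps(DATABASE)):
        print(f"{paper_step:40} | {data_step}")
```

The point of the sketch is that nothing in the function itself distinguishes a journal from a database; only the perceived value attached to the output differs.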
The Analog-Digital Data Knowledge Cycle
P.E. Bourne, 2016, There is No Intelligent Life Down There
Scholarly Workflow
Platforms - the situation today
In summary, there is currently no single, widely adopted platform for the exchange of services in biomedical research: either there is a platform per service or there is no platform at all. Why have we not done better, and what are the impediments today?
Impediments to a biomedical platform
• Current work practices by all stakeholders
• Entrenched business models
• Size of the undertaking (aka resources needed)
• Trust
• Incentives to use the platform
http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
Through the Big Data to Knowledge (BD2K) initiative, the NIH is experimenting with a platform, keeping in mind the need to overcome these impediments
Enter The Commons
https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg
Scholarly Workflow
Commons – Initial focus is on integrating two layers of the scholarly workflow
Commons topology
• Compute Platform: Cloud or HPC (IaaS)
• Services: APIs, Containers, Indexing (PaaS)
• Software: Services & Tools, scientific analysis tools/workflows (SaaS)
• Data: "Reference" Data Sets, User defined data
• Digital Object Compliance
• App store/User Interface
https://datascience.nih.gov/commons
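To make the layering concrete, here is a minimal sketch of the topology as an ordered stack. It assumes the conventional cloud mapping (compute as IaaS, services as PaaS, software as SaaS), a bottom-to-top order read from the diagram, and that Digital Object Compliance cuts across all layers rather than sitting in the stack. `Layer`, `COMMONS_STACK`, `CROSS_CUTTING` and `layers_beneath` are hypothetical names, not part of any NIH API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    """One horizontal layer of the (hypothetical) Commons stack."""
    name: str
    examples: tuple[str, ...]
    service_model: Optional[str]  # cloud service model, where one applies


# Ordered bottom (index 0) to top. The IaaS/PaaS/SaaS labels follow the
# conventional cloud mapping and are an assumption, not an NIH specification.
COMMONS_STACK = (
    Layer("Compute Platform", ("Cloud", "HPC"), "IaaS"),
    Layer("Services", ("APIs", "Containers", "Indexing"), "PaaS"),
    Layer("Software", ("Services & Tools", "scientific analysis tools/workflows"), "SaaS"),
    Layer("Data", ('"Reference" Data Sets', "User defined data"), None),
    Layer("App store/User Interface", (), None),
)

# Assumed to apply to every layer rather than occupy a position in the stack.
CROSS_CUTTING = ("Digital Object Compliance",)


def layers_beneath(layer_name: str) -> list[str]:
    """Return the names of the layers a given layer builds on, bottom first."""
    names = [layer.name for layer in COMMONS_STACK]
    if layer_name not in names:
        raise KeyError(f"unknown layer: {layer_name}")
    return names[: names.index(layer_name)]
```

For example, `layers_beneath("Software")` returns `["Compute Platform", "Services"]`: a tool in the software layer depends on the services and compute beneath it, which is what lets the compute layer be either cloud or HPC without changing the layers above.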
Commons compliance
• Treat products of research (data, methods, papers, etc.) as digital objects
• These digital objects exist in a shared virtual space
• Digital object compliance through the FAIR principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable
The FAIR Principles
http://www.nature.com/articles/sdata201618
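The FAIR principles themselves break down into finer-grained guiding principles; the sketch below collapses them into a deliberately crude checklist to show what digital object compliance could mean for a single object. The `DigitalObject` fields and the `unmet_fair_principles` function are hypothetical illustrations, not a Commons schema or an official FAIR assessment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalObject:
    """A research product (data set, method, paper...) in the shared space.

    All field names are hypothetical; they are not a Commons schema.
    """
    identifier: Optional[str] = None                # globally unique, persistent ID
    metadata: dict = field(default_factory=dict)    # descriptive metadata
    access_url: Optional[str] = None                # where the object can be retrieved
    data_format: Optional[str] = None               # open, community-standard format
    license: Optional[str] = None                   # terms of reuse
    provenance: Optional[str] = None                # how the object was produced


def unmet_fair_principles(obj: DigitalObject) -> list[str]:
    """Return the FAIR principles this object fails under a crude presence check."""
    unmet = []
    if not (obj.identifier and obj.metadata):
        unmet.append("Findable")
    if not obj.access_url:
        unmet.append("Accessible")
    if not obj.data_format:
        unmet.append("Interoperable")
    if not (obj.license and obj.provenance):
        unmet.append("Reusable")
    return unmet
```

For example, an object with an identifier, some metadata and an access URL, but no declared format, license or provenance, would be reported as failing `["Interoperable", "Reusable"]`. A real assessment would check the content of each field (is the identifier resolvable, is the format a community standard), not merely its presence.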
Mapping current BD2K activities to the commons topology
• NIH + Community defined data sets
• Possible FOAs and CCM
• BD2K Centers, MODS, HMP & Interoperability Supplements
• Cloud credits model (CCM)
• BioCADDIE/Other Indexing
• NCI & NIAID Cloud Pilots
(Activities overlaid on the Commons topology layers: Compute Platform, Services, Software, Data, Digital Object Compliance, App store/User Interface)
https://datascience.nih.gov/commons
Prediction: funder activities are about to accelerate, and here lies the opportunity…
What are the funder incentives? …
Incentives
• Airbnb
– Monetize unutilized space
– Ease of use
– New vacation experience
• Commons
– Need to improve rigor and reproducibility
– Productivity
– Sustainability
– Education and training
– Opportunity to undertake elastic compute on large complex data
NIH is committed to the following (and hopes other funders will join):
• The Commons and the FAIR principles
• Pilots that test the feasibility of the platform for larger-scale development/adoption
• Provision of two large, complex data sets in the Commons; TOPMed and GTEx are obvious choices, and others may surface
• Use cases that illustrate the feasibility and scientific value of:
– Access to a single data source
– Interoperability across data sources
Summary
• NIH has endorsed the Commons and the FAIR principles
• The Commons is the beginning of a platform from which to conduct biomedical research
• Over the next 1-2 years we are conducting pilots to evaluate the feasibility of the Commons
• If feasible, the intent is to expand into additional layers of the scholarly research lifecycle
• The global reach of GA4GH can foster a coalition of the willing
• Commons applications are an opportunity to provide a singular message
“I really admire Airbnb as a pioneer of the sharing economy and for building community. They've found an elegant way to help hosts make more money and for guests to have authentic experiences. It brings those people together in a unique way.”
Logan Green
“The Commons is an effort at creating a sharing economy and for building community. We hope for a more cost effective and productive research environment while bringing people together in a unique way.”
Phil Bourne
Speaking of a shared economy…
You are invited to contribute to a shared document that describes this concept.
You will be acknowledged, and the document will be put forward for NIH clearance to be blogged/preprinted/published…
http://tinyurl.com/hc4td5b
Acknowledgements
• ADDS Office: Vivien Bonazzi, Jennie Larkin, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurick
• CIT: Debbie Sinmao, Andrea Norris
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI Cloud Pilots/ GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke,
NIH…Turning Discovery Into Health
[email protected]
http://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/