
The Hutch Data Commonwealth - Amazon S3 · data to create an interoperable resource for the...

Date post: 09-Aug-2020
The Hutch Data Commonwealth Matthew Trunnell CIO & VP of IT Executive Director, Hutch Data Commonwealth Fred Hutchinson Cancer Research Center TechConnect Conference 13 March 2018

The Hutch Data Commonwealth

Matthew Trunnell
CIO & VP of IT
Executive Director, Hutch Data Commonwealth
Fred Hutchinson Cancer Research Center

TechConnect Conference
13 March 2018


© Fred Hutchinson Cancer Research Center 2


Photo credit: Len Rubenstein


[Figure: Cost per Genome. Labels: 200 TB, 1 PB, 10 PB]


New capabilities needed

IT infrastructure
• Expansive storage services with deep security and effective tools for consent management and access control
• An elastic computing infrastructure for running analyses without bound

Data services
• Biomedical data stores leveraging and driving global standard interfaces
• Methods stores that enable users and groups across the research community to share, discover and build off each other’s methods and ideas

Analysis engine
• A comprehensive suite of methods for high-dimensional data analysis for use in both research and clinical environments
• New approaches to data mining leveraging applied machine learning

Knowledge sharing and dissemination
• Knowledge bases that seamlessly weave internal and external knowledge and data
• Researcher and clinician portals that leverage all of the above to provide customized, user-driven views of the data


New competencies needed

• IT infrastructure: DevOps
• Data services: Data Engineering
• Analysis engine: Data Science
• Knowledge sharing and dissemination: UI/UX


RESEARCH EXCELLENCE

• 295 total faculty members
• 3000+ total employees
• NCI-Designated Comprehensive Cancer Center
• 5 Research Divisions: Basic Sciences, Clinical Research, Human Biology, Public Health Sciences, Vaccine & Infectious Disease
• 3 Nobel laureates
• $284,704,566 NIH funding, FY17
• 1,072 scientific papers published, FY18 YTD


JAMA. 2014;311(24):2479-2480. doi:10.1001/jama.2014.4228


Screening for Pancreatic Adenocarcinoma Using Signals From Web Search Logs: Feasibility Study and Results

John Paparrizos, MSc, Ryen W. White, PhD, and Eric Horvitz, MD, PhD

Columbia University, New York, NY; and Microsoft Research, Redmond, WA

DOI: 10.1200/JOP.2015.010504; published online ahead of print at jop.ascopubs.org on June 7, 2016.

ASSOCIATED CONTENT
Appendix DOI: 10.1200/JOP.2015.010504

Abstract

Introduction
People’s online activities can yield clues about their emerging health conditions. We performed an intensive study to explore the feasibility of using anonymized Web query logs to screen for the emergence of pancreatic adenocarcinoma. The methods used statistical analyses of large-scale anonymized search logs considering the symptom queries from millions of people, with the potential application of warning individual searchers about the value of seeking attention from health care professionals.

Methods
We identified searchers in logs of online search activity who issued special queries that are suggestive of a recent diagnosis of pancreatic adenocarcinoma. We then went back many months before these landmark queries were made, to examine patterns of symptoms, which were expressed as searches about concerning symptoms. We built statistical classifiers that predicted the future appearance of the landmark queries based on patterns of signals seen in search logs.

Results
We found that signals about patterns of queries in search logs can predict the future appearance of queries that are highly suggestive of a diagnosis of pancreatic adenocarcinoma. We showed specifically that we can identify 5% to 15% of cases, while preserving extremely low false-positive rates (0.00001 to 0.0001).

Conclusion
Signals in search logs show the possibilities of predicting a forthcoming diagnosis of pancreatic adenocarcinoma from combinations of subtle temporal signals revealed in the queries of searchers.
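
To put the false-positive range quoted in the abstract into concrete terms, the short sketch below converts a false-positive rate into an expected count of false alarms. The population of one million unaffected searchers is an assumption chosen only for illustration; it does not come from the paper, and the helper name is hypothetical.

```python
def expected_false_positives(false_positive_rate: float, unaffected_searchers: int) -> float:
    """Expected number of unaffected searchers flagged at a given false-positive rate."""
    return false_positive_rate * unaffected_searchers


# Hypothetical population size, chosen only for illustration (not from the paper).
POPULATION = 1_000_000

# The two endpoints of the false-positive range quoted in the abstract.
for fpr in (0.00001, 0.0001):
    flagged = expected_false_positives(fpr, POPULATION)
    print(f"FPR {fpr:g}: about {flagged:.0f} false alarms per {POPULATION:,} unaffected searchers")
```

Under that assumed population, the quoted range corresponds to roughly 10 to 100 false alarms.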

INTRODUCTION
Pancreatic adenocarcinoma poses a difficult and resistant challenge in oncology. It is the fourth leading cause of cancer death in the United States and is the sixth leading cause of cancer death in Europe.1 The illness is frequently diagnosed too late to be treated effectively2,3 and can progress from stage I to stage IV in just over 1 year.4 Approximately 75% of patients with pancreatic adenocarcinoma who are not candidates for surgery will die within 1 year of diagnosis, and only 4% will survive for 5 years postdiagnosis.5

Early signs and symptoms of pancreatic adenocarcinoma are subtle and often

Original Contribution: Focus on Quality

Copyright © 2016 by American Society of Clinical Oncology. jop.ascopubs.org


The Hutch Data Commonwealth


Vision: Enable investigators to leverage all possible data in the effort to eliminate disease by driving the development of data infrastructure and data science capabilities through collaborative research, strategic partnering and robust engineering.

Corollary: Enable investigators to trace backwards from publication to analysis results to laboratory results to biospecimens in freezers.
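
The backward trace named in the corollary can be sketched as a walk over provenance links. This is a minimal illustration, not the Commonwealth's actual system: the `Record` type, the `derived_from` field, and every identifier in the sample catalog are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    record_id: str
    kind: str                    # e.g. "publication", "analysis_result"
    derived_from: Optional[str]  # id of the upstream record; None at the origin


def trace_back(record_id: str, catalog: Dict[str, Record]) -> List[Record]:
    """Follow derived_from links from a record back to its origin."""
    chain: List[Record] = []
    seen = set()
    current: Optional[str] = record_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in provenance links at {current!r}")
        seen.add(current)
        record = catalog[current]
        chain.append(record)
        current = record.derived_from
    return chain


# Hypothetical catalog: one publication traced down to a biospecimen.
catalog = {r.record_id: r for r in [
    Record("pub-1", "publication", "analysis-1"),
    Record("analysis-1", "analysis_result", "lab-1"),
    Record("lab-1", "laboratory_result", "specimen-1"),
    Record("specimen-1", "biospecimen", None),
]}

for record in trace_back("pub-1", catalog):
    print(record.kind, record.record_id)
```

The sketch assumes each record has a single upstream source; real provenance is usually a graph with several parents per record.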


Areas of focus for the Commonwealth

• High-dimensional data integration

• Natural language processing (NLP) to extract structured information from clinical records

• Application of “deep learning” to medical image analysis

• Collection and analysis of data from mobile devices

• Data management and discovery

• Developing partnerships with regional technology and research organizations


Hutch Center IT (~90 FTE)
• Service Desk
• Systems Engineering
• Network Engineering
• Information Security
• Enterprise applications
• Project management office
• Scientific computing

Hutch Data Commonwealth (~45 FTE)
• Product engineering
• Software development
• Data engineering
• Product management
• Clinical Informatics
• Applied data science
• Partnership development


Data Commons


https://kids.nationalgeographic.com/explore/space/black-holes/

Data have gravity.


Disease-specific Commons

OpenAPS Data Commons, Nightscout Data Commons on the Open Humans platform

Data-specific Commons

Federal Research Commons

Institutional Commons


Data Commons: Cyber-infrastructure that co-locates data, storage, and computing infrastructure with commonly used tools for analyzing and sharing data to create an interoperable resource for the research community.

Requirements:
• Permanent digital IDs
• Permanent metadata
• API-based access
• Data portability
• Data peering
• Pay-for compute

Grossman, R. et al. 2017. A Case For Data Commons: Toward Science as a Service. arXiv:1604.02608 [cs.CY]
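
A few of the listed requirements can be illustrated with a toy in-memory registry. This is a sketch under stated assumptions, not the design from the cited paper or any real commons: the `MiniCommons` class, its method names, and the example storage location are all hypothetical. It covers permanent digital IDs, permanent metadata, API-style access, and portability; data peering and pay-for compute are out of scope.

```python
import json
import uuid
from types import MappingProxyType


class MiniCommons:
    """Toy in-memory registry; a stand-in for illustration, not a real data commons."""

    def __init__(self):
        self._records = {}

    def register(self, location: str, metadata: dict) -> str:
        """Mint an ID that is never reassigned and freeze the metadata attached to it."""
        digital_id = str(uuid.uuid4())
        self._records[digital_id] = {
            "location": location,
            # Read-only view over a private copy, so callers cannot alter it later.
            "metadata": MappingProxyType(dict(metadata)),
        }
        return digital_id

    def resolve(self, digital_id: str) -> dict:
        """API-style lookup: return a plain-dict copy of the record for an ID."""
        record = self._records[digital_id]
        return {
            "id": digital_id,
            "location": record["location"],
            "metadata": dict(record["metadata"]),
        }

    def export(self) -> str:
        """Serialize every record to JSON so it can be moved to another system."""
        return json.dumps([self.resolve(i) for i in self._records], sort_keys=True)


commons = MiniCommons()
# Hypothetical object location and metadata, for illustration only.
sample_id = commons.register("s3://example-bucket/sample.bam", {"assay": "WGS"})
print(commons.resolve(sample_id))
```

Returning copies from `resolve` keeps the stored metadata unchanged no matter what a caller does with the result, which is one simple way to read "permanent metadata".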


Key Takeaways


Big data requires new infrastructure, investment, and organization.


Research computing is becoming more costly.

It’s time we start sharing.
