Date post: 24-Jun-2020
Page 1: The Seven Bridges Cloud Ecosystem: Enabling Interoperable ... 20180403 Liz... · HIPAA-compliant on AWS and GCP deployments ISO 27001:2013 certified US Federal Information Security

sevenbridges.com © 2018 Seven Bridges

The Seven Bridges Cloud Ecosystem: Enabling Interoperable Data Access and Analysis

Liz Williams, PhD [email protected]

Page 2

The content of this presentation is solely the responsibility of Seven Bridges Genomics Inc and does not necessarily represent the official views of the National Cancer Institute or National Institutes of Health.

Page 3

The Seven Bridges Cloud Ecosystem Enables Precision Medicine

Data Users Infrastructure

Interoperability

Partnerships

Page 4

Infrastructure

Interoperability

Partnerships

The Seven Bridges Cloud Ecosystem

Page 5

Project Management, User Management, Authentication & Authorization, System Monitoring, Usage Logging, Notification Service, Backup Service, Billing Management

The Seven Bridges Platform

Web Application API

Task Execution API Data/Metadata Service

Cloud Storage & Compute

Resource Manager

Core Platform Infrastructure

Data Infrastructure Independent Core Services

Task Execution Infrastructure

Task Scheduler

Job Management Layer

Orchestration Layer

Page 6

Security & Compliance on the Seven Bridges Platform

●  HIPAA-compliant on AWS and GCP deployments

●  ISO 27001:2013 certified

●  US Federal Information Security Management Act (FISMA) Moderate certification based on NIST 800-53 Rev 4 controls for the CGC

●  NIH Trusted Partner for the CGC

●  Compliant with dbGaP Security Best Practices

●  US-EU Privacy Shield Program registered participant; preparing for GDPR

●  Support for CAP, CLIA, and GxP best practices

Page 7

Essential Features of an Interoperable Data Ecosystem

Collaborative Usable Reproducible Extendable Scalable

Findable Accessible Interoperable Reusable

+

Page 8

●  Secure, customizable workspaces

●  Managed billing

●  User-friendly interface

●  Easy data management

●  Industry-standard bioinformatics pipelines

●  Flexible & reproducible methods

●  Automated & accessible task logs

●  Developer-friendly tools

●  Portable bioinformatics pipelines

●  Scalable data storage

●  Cloud-optimized computation

Collaborative Usable Reproducible Extendable Scalable

Essential Features of an Interoperable Data Ecosystem

Page 9

2006 2014 2015 2016 2017

TCGA Pilot Program announced

Launched the CGC

2018

Awarded NCI Cancer Genomics Cloud (CGC) Pilot contract

Logged 3000th user & 450th year of compute time on the CGC

Registered 1000th CGC user

...

Growth of the Seven Bridges Cloud Ecosystem

CAVATICA selected as NIH Kids First Data Resource

Launched CAVATICA partnership with CHOP

Partnered with JAX to build NCI’s PDXNet Data Commons

Selected for NIH Data Commons Pilot

Launched CAVATICA

Page 10

[Figure: datasets available in the ecosystem; * = available by Q4 2018]

Data in the Seven Bridges Cloud Ecosystem

Page 11

An NCI Cancer Research Data Commons Cloud Resource

The Seven Bridges Cancer Genomics Cloud (CGC)

Page 12

The Seven Bridges CGC

The Seven Bridges Cancer Genomics Cloud has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Contract No. HHSN261201400008C and Task Order No. 17X146 under Contract No. HHSN261201500003I.

cancergenomicscloud.org

A Cloud Resource within the NCI Cancer Research Data Commons for secure storage, sharing & analysis of petabytes of public, multi-omic cancer datasets

Page 13

●  User-friendly web interface

●  Powerful RESTful API, Datasets API & object-oriented and user-friendly libraries in Python, R & Java

●  Comprehensive online documentation & training resources

●  Technical support from a team of 200+ expert scientists, bioinformaticians & engineers

Accessibility

Page 14

Collaboration Tools

●  Secure and customizable private workspaces for management of collaborators, data, tools & analysis results

●  Project description, note & notification features for communicating with collaborators around the world

●  Automatically generated, durable records of input/output files, apps, versions & parameters for every task run on the platform

Page 15

Petabytes of Public Datasets

[Figure: timeline of public dataset availability, 2015 to 2018; * = anticipated availability]

●  3 PB of multi-omic public datasets

●  20 PB of linked data

●  0.5 PB of private & derived data

Page 16

Interactive and Programmatic Query Tools

●  Web- and API-based metadata query tools to explore the data landscape and build cohorts for analysis

●  Semantic triple-store technology for dataset harmonization & cross-dataset query building
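The triple-store approach named above can be illustrated with a minimal sketch. It is not the CGC's query engine: the triples, the predicate names (`hasDiseaseType`, `belongsToCase`, `hasDataFormat`) and the helpers `match` and `cohort_files` are hypothetical, chosen only to show how subject-predicate-object facts let one cohort query combine conditions on cases and files.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical metadata facts; a real store would hold harmonized dataset metadata.
TRIPLES: Set[Triple] = {
    ("case-1", "hasDiseaseType", "Breast Invasive Carcinoma"),
    ("case-2", "hasDiseaseType", "Lung Adenocarcinoma"),
    ("file-a", "belongsToCase", "case-1"),
    ("file-b", "belongsToCase", "case-2"),
    ("file-c", "belongsToCase", "case-1"),
    ("file-a", "hasDataFormat", "BAM"),
    ("file-b", "hasDataFormat", "BAM"),
    ("file-c", "hasDataFormat", "VCF"),
}


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def cohort_files(triples: Iterable[Triple], disease: str, data_format: str) -> List[str]:
    """Files of a given format that belong to cases with a given disease type."""
    triples = list(triples)
    cases = {s for s, _, _ in match(triples, p="hasDiseaseType", o=disease)}
    in_cases = {s for s, _, o in match(triples, p="belongsToCase") if o in cases}
    with_format = {s for s, _, _ in match(triples, p="hasDataFormat", o=data_format)}
    return sorted(in_cases & with_format)


print(cohort_files(TRIPLES, "Breast Invasive Carcinoma", "BAM"))
```

Because every fact has the same three-part shape, adding a new dataset only means adding triples; the query function does not change.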

Page 17

Built-in Data Security

●  Per-file, per-user permissions management for third-party controlled-access data

●  A permissions management model extendable across datasets & data governance entities
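A per-file, per-user check of the kind described above might look like the following sketch. Everything here is an assumption made for illustration: the `FileRecord` fields, the `GRANTS` table, and the rule that open-access files are readable by anyone are hypothetical and do not describe the platform's actual permission model.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class FileRecord:
    file_id: str
    dataset: str      # dataset the file belongs to
    controlled: bool  # True for controlled-access data


# Hypothetical grants: user -> datasets that an external data governance
# entity has approved for that user.
GRANTS: Dict[str, Set[str]] = {
    "alice": {"dataset-A"},
    "bob": set(),
}


def can_access(user: str, record: FileRecord, grants: Dict[str, Set[str]]) -> bool:
    """Decide access for one user and one file."""
    if not record.controlled:
        return True  # assumption: open-access files need no grant
    return record.dataset in grants.get(user, set())


open_file = FileRecord("f1", "dataset-A", controlled=False)
protected = FileRecord("f2", "dataset-A", controlled=True)

print(can_access("alice", protected, GRANTS))  # alice holds a grant for dataset-A
print(can_access("bob", protected, GRANTS))    # bob holds no grants
```

Keying grants by dataset rather than by platform keeps the check the same when a new dataset or a new governance entity is added, which is one way to read "extendable" in the bullet above.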

Page 18

Tools To Connect Data

Import Data to the Platform

●  Command Line Uploader & CLI

●  Seven Bridges Uploader (GUI)

●  API import

●  HTTP(S) / FTP import

Connect the Platform to External Resources

●  Connect Cloud Storage (Volumes API)

●  SBFS (a FUSE-based file system)

Mount projects from your desktop

Page 19

Tools To Analyze Data

●  A curated collection of 350+ bioinformatics tools & workflows

●  Optimized for speed & cost in the cloud

●  Fully parameterized & customizable

●  Accessible via the GUI & API

Page 20

Tools To Ensure Analytical Reproducibility

●  Docker-containerized bioinformatics pipelines

●  Automatically generated and accessible logs for every task run on the platform

●  Tool & workflow versions

●  Parameters

●  Input & output files
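The items logged for each task (tool and workflow versions, parameters, input and output files) are what make a run repeatable. A minimal sketch of such a record follows; the `TaskRecord` class, its field names and the `fingerprint` method are hypothetical and are not the platform's log format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TaskRecord:
    task_id: str
    app: str
    app_version: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    input_files: List[str] = field(default_factory=list)
    output_files: List[str] = field(default_factory=list)

    def fingerprint(self) -> str:
        """Serialize what defines the analysis: app, version, parameters, inputs.

        The task id and the outputs are left out, so two runs of the same
        analysis produce the same string.
        """
        spec = asdict(self)
        del spec["task_id"]
        del spec["output_files"]
        spec["input_files"] = sorted(spec["input_files"])
        return json.dumps(spec, sort_keys=True)


first = TaskRecord("task-1", "aligner", "1.2.0", {"threads": 8}, ["s1.fastq"], ["s1.bam"])
rerun = TaskRecord("task-2", "aligner", "1.2.0", {"threads": 8}, ["s1.fastq"], ["s1.rerun.bam"])
changed = TaskRecord("task-3", "aligner", "1.3.0", {"threads": 8}, ["s1.fastq"], ["s1.v2.bam"])

print(first.fingerprint() == rerun.fingerprint())    # same app, version, parameters, inputs
print(first.fingerprint() == changed.fingerprint())  # app version differs
```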

Page 21

An Extendable Analysis Ecosystem

●  SBFS to connect data on the platform to local applications

●  Data Cruncher, a custom JupyterLab environment for interactive analysis, data visualization & implementation of custom tertiary analysis tools

Page 22

Tools To Port Your Own Pipelines to the Platform

●  An intuitive and flexible software development kit for developing and porting custom tools to the platform

●  Conformance with community standards to ensure pipeline portability & reproducibility
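Later slides in this deck name CWL (the Common Workflow Language) as the standard used for portable workflows. As a rough illustration of what a portable tool description looks like, the sketch below builds a minimal CWL v1.0 `CommandLineTool` as a Python dictionary and prints it as JSON, which CWL accepts alongside YAML. The tool itself (wrapping `echo` with a single `message` input) is a generic example and is not a tool from the platform.

```python
import json

# A minimal CWL v1.0 tool description: run `echo` with one string argument.
tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {
            "type": "string",
            "inputBinding": {"position": 1},  # first positional argument
        }
    },
    "outputs": [],
}

# Serialize to JSON so the description can be saved to a file and handed
# to any CWL-conformant runner.
document = json.dumps(tool, indent=2)
print(document)
```

The description carries no platform-specific fields, which is the point of conforming to a community standard: the same file should run unchanged on any conformant executor.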

Page 23

●  3,000+ registered users from 60+ countries

●  347,000+ completed tasks representing 465+ years of total compute time

Value of the CGC Ecosystem to the Research Community

[Figure: number of tasks run on the CGC (completed vs. failed tasks), Jan 2016 to Jan 2018; y-axis from 0 to 350,000]

Page 24

Case Study #1: TCGA Immune Response Working Group

●  Collaborative analysis with members of the Immune Response Working Group of The Cancer Genome Atlas (TCGA) Research Network

●  Outcome: cost-optimized (<$0.30/sample), high-throughput HLA typing across ~9,000 TCGA RNA-Seq (fastq) files

Case Study #2: PanCancer Analysis of Whole Genomes (PCAWG) Study

●  High-throughput, harmonized analysis by Seven Bridges of all tumor and matched genomes in the dataset (~1,350)

●  Outcome: rapid generation of ~65,000 output files (including ~5,000 VCFs) totaling 725 TB

Case Study #3: Independent Analysis on 45,000 Genomes

●  High-throughput analysis of 45,000 bacterial genomes accessed from SRA via API and analyzed using a custom workflow

●  Outcome: analysis completed in ~1 week by a novice CGC user with no substantive assistance from the CGC team
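The approximate figures quoted in the case studies imply a few derived numbers. The short calculation below works them out; the inputs are the rounded values from the slide, the decimal conversion (1 TB = 1,000 GB) is an assumption, and the results are rough estimates rather than reported statistics.

```python
# Case Study #1: cost ceiling for HLA typing.
samples = 9_000             # ~9,000 RNA-Seq files
max_cost_per_sample = 0.30  # "<$0.30/sample", so this is an upper bound
hla_cost_ceiling = samples * max_cost_per_sample

# Case Study #2: output volume per file and per genome.
total_tb = 725
output_files = 65_000
genomes = 1_350
avg_gb_per_file = total_tb * 1_000 / output_files  # assumes 1 TB = 1,000 GB
files_per_genome = output_files / genomes
tb_per_genome = total_tb / genomes

# Case Study #3: sustained throughput over roughly one week.
bacterial_genomes = 45_000
days = 7
genomes_per_day = bacterial_genomes / days

print(f"HLA typing cost ceiling: under ${hla_cost_ceiling:,.0f}")
print(f"PCAWG: about {avg_gb_per_file:.1f} GB per output file")
print(f"PCAWG: about {files_per_genome:.0f} files and {tb_per_genome:.2f} TB per genome")
print(f"Bacterial genomes: about {genomes_per_day:,.0f} per day")
```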

The CGC Enables Scalable, Cost-Effective Research

Page 25

An NCI-funded Resource for the Patient-Derived Xenograft Development and Trial Centers Research Network

The JAX-Seven Bridges PDXNet Data Commons

Page 26

The JAX-Seven Bridges PDXNet Data Commons

pdxnetwork.org/pdccc/

The JAX-Seven Bridges PDX Data Commons and Coordination Center is funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. 1U24CA224067-01.

A cloud-based environment for secure storage, sharing & analysis of data for the Patient-Derived Xenograft Development and Trial Centers Research Network (PDXNet)

Page 27

The JAX-Seven Bridges PDXNet Data Commons

Designed to:

●  Connect the PDX Development and Trial Centers (PDTCs) & the Patient-Derived Model Repository (PDMR)

●  Colocalize PDXNet data & bioinformatics resources to facilitate data harmonization, discovery & analysis

●  Integrate data from individual PDTCs & pilot projects to inform preclinical trials

●  Make PDXNet data & harmonized workflows FAIR and available to the broader research community

Page 28

Key Features of the PDXNet Data Commons

●  Collaborative

●  Usable: Custom data sharing features to enable phased release of consortium datasets to PDXNet participants & to the public

●  Reproducible: Use of Rabix & CWL for creating reproducible and portable workflows for consortium-wide data harmonization

●  Extendable:

○  Full integration with the Seven Bridges CGC to enable access to all available public datasets & bioinformatics resources

○  A harmonized metadata model that enables increasingly complex queries across public and private datasets using existing data query tools

●  Scalable

Page 29

The NIH Common Fund Gabriella Miller Kids First Pediatric Data Resource

CAVATICA

Page 30

CAVATICA & the Kids First Data Resource

cavatica.org

A cloud-based environment for secure storage, sharing & analysis of large volumes of genomic data from pediatric cancer & rare disease patients

Page 31

CAVATICA & the Kids First Data Resource

Designed to:

●  Integrate data for multiple rare pediatric diseases across dozens of hospitals & clinical sites

●  Colocalize consortium data & bioinformatics resources to facilitate data harmonization, discovery & analysis

●  Make Kids First data & harmonized workflows FAIR and available to the broader research community

Page 32

Key Features of CAVATICA & the Kids First Data Resource

●  Collaborative

●  Usable: Custom permissions management for fine-grained control of private dataset access

●  Reproducible: Use of Rabix & CWL for creating reproducible and portable workflows for consortium-wide harmonization

●  Extendable:

○  Interoperability with the CGC to enable authorized access to public datasets

○  A harmonized metadata model that enables queries across pediatric and adult datasets using existing data query tools

●  Scalable

Page 33

An NIH Data Commons Pilot Solution

FAIR4CURES

Page 34

FAIR4CURES

A data and standards ecosystem for making NIH data resources FAIR and for enabling secure data sharing & analysis

The FAIR4CURES project is funded in whole or in part with Federal funds from the National Institutes of Health.

in collaboration with the NIH Data Commons Pilot Phase Consortium (DCPPC)

Page 35

FAIR4CURES

Designed to:

●  Be a cloud-agnostic platform for making distributed NIH data resources FAIR and available for analysis by the broader research community

●  Establish community standards and generate resources for making digital objects FAIR

Findable:

○  GUIDs for digital objects

○  A common metadata model for indexing & search

Accessible: Standardized authentication / authorization

Interoperable:

○  Open API standards

○  Cross-platform interoperability

Reusable:

○  GUIDs for digital objects
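The role of GUIDs and a common metadata model in findability can be sketched with a toy registry. This is an assumption-laden illustration only: the `Registry` class, its methods and the metadata keys are hypothetical, and `uuid.uuid4` is a stand-in for whatever identifier scheme the project adopts.

```python
import uuid
from typing import Dict, List


class Registry:
    """Toy index that maps a GUID to a digital object's metadata."""

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, str]] = {}

    def register(self, metadata: Dict[str, str]) -> str:
        """Mint a new GUID for an object and store a copy of its metadata."""
        guid = str(uuid.uuid4())  # stand-in identifier scheme
        self._objects[guid] = dict(metadata)
        return guid

    def resolve(self, guid: str) -> Dict[str, str]:
        """Look an object up by its identifier."""
        return self._objects[guid]

    def search(self, **criteria: str) -> List[str]:
        """Return GUIDs whose metadata matches every given key/value pair."""
        return [
            guid
            for guid, metadata in self._objects.items()
            if all(metadata.get(key) == value for key, value in criteria.items())
        ]


registry = Registry()
vcf_id = registry.register({"type": "file", "format": "VCF", "study": "study-1"})
bam_id = registry.register({"type": "file", "format": "BAM", "study": "study-1"})

print(registry.search(format="VCF") == [vcf_id])
print(registry.resolve(bam_id)["format"])
```

The same identifier serves both the Findable and Reusable rows above: search returns it, and any later analysis can cite it to name exactly the object that was used.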

Page 36

Key Features of FAIR4CURES

●  Collaborative: GUIDs to promote data and tool publication & reuse

●  Usable: Workspaces connected to multiple cloud providers to enable compute where the data live

●  Reproducible: GUIDs to promote analytical reproducibility

●  Extendable:

○  A standardized authentication & authorization schema

○  Open API standards & cross-platform interoperability

○  A common metadata model that enables queries across increasingly diverse datasets & data types using existing data query tools

●  Scalable

Page 37

The Seven Bridges Cloud Ecosystem: Interoperable Data Access and Analysis to Drive Precision Medicine

Infrastructure

Interoperability

Partnerships

Page 38

Liz Williams, PhD [email protected]

