Date post: 29-May-2020
The ELIXIR Compute Platform: An environment for Analysing Life-Science Data

Steven Newhouse, Head of Technical Services, EMBL-EBI

ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.

Page 2

Genomics vs High Energy Physics

• Both are excellent examples of Big Data but Genomics data is:

• More complex and variable, used in more demanding ways

• Growth is accelerating faster than physics data

• Greater uncertainty on short timescales => less time to respond

• Less community-wide investment in software and infrastructure

• Sequencing and imaging machines provide thousands of data sources

• Research data deposited into repositories before publishing

• Health data retained inside organisational firewalls

• Tony Wildish, Genomics vs. Physics, HEPIX 2016, https://indico.cern.ch/event/531810/sessions/208405/#20161019

Page 3

EMBL: European Molecular Biology Laboratory

Over 1600 people and more than 80 nationalities

• Hamburg: Structural biology
• Heidelberg: Life sciences
• Rome: Epigenetics and neurobiology
• Cambridge (EMBL-EBI): Bioinformatics
• Grenoble: Structural biology
• Barcelona: Tissue biology and disease modelling

Page 4

Data Resources at EMBL-EBI

Literature & ontologies
• Experimental Factor Ontology
• Gene Ontology
• BioStudies
• Europe PMC

Chemical biology
• ChEBI
• ChEMBL
• SureChEMBL

Molecular structures
• Protein Data Bank in Europe
• Electron Microscopy Data Bank

Gene, protein & metabolite expression
• Expression Atlas
• Metabolights
• PRIDE
• RNA Central

Protein sequences, families & motifs
• InterPro
• Pfam
• UniProt

Genes, genomes & variation
• Ensembl
• Ensembl Genomes
• GWAS Catalog
• Metagenomics portal

Systems
• BioModels
• BioSamples
• Enzyme Portal
• IntAct
• Reactome

Molecular Archives
• European Nucleotide Archive
• European Variation Archive
• European Genome-phenome Archive
• ArrayExpress

Cross domain resources

Page 5

Ever Increasing Demands

Storage at EMBL-EBI is still growing by 40-50% a year.

Increasingly ‘interesting’ data is being generated and held in national or local repositories.

• Integration challenges
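To put the 40-50% annual growth figure in perspective, the short sketch below computes the implied doubling time and a five-year projection under compound growth. The 200 PB starting point is the storage capacity quoted elsewhere in the slides; using it as the baseline and assuming a constant growth rate are illustrative assumptions, not figures from the talk.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for capacity to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def projected_capacity(start_pb: float, annual_growth: float, years: int) -> float:
    """Capacity in PB after `years` of compound growth from `start_pb`."""
    return start_pb * (1 + annual_growth) ** years


# Assumed baseline of 200 PB; the rates are the 40% and 50% bounds from the slide.
for rate in (0.40, 0.50):
    print(
        f"{rate:.0%} growth: doubles in {doubling_time_years(rate):.2f} years, "
        f"{projected_capacity(200, rate, 5):.0f} PB after 5 years"
    )
```

At these rates capacity doubles roughly every 1.7 to 2.1 years, so a five-year plan has to allow for more than a fivefold increase.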

Page 6

Big data, big demand

• ~27 million requests to EMBL-EBI websites every day

• 200 petabytes of storage capacity in our data centres

• EMBL-EBI delivered 152 million jobs to its users in 2016

• Scientists at over 3.2 million unique IP addresses use EMBL-EBI websites

Page 7

Data Centre Infrastructure

• Campus (Hinxton) [90 racks]

• Leased Data Centre (Hemel Hempstead) [90 racks]

• Leased Data Centre (Slough) [10 racks]

• JANET – UK Academic Network

• Raw Storage:
  • Object Store – 101PB
  • NAS – 70PB
  • HPC Storage – 22PB
  • Tape – 22PB

• Analysis Capacity:
  • HTC: 22,000 job slots
  • HPC: 7,000 job slots
  • Cloud: 6,000 vCPUs
  • Virtual infrastructure: 1,500 cores

• Link speeds shown in the diagram: 20 Gb/s, 10 Gb/s, 1 Gb/s

Page 8

ELIXIR – Research Infrastructure for Life Science

• Compute: Access, Exchange & Compute on sensitive data

• Data: Sustain core data resources

• Tools: Services & connectors to drive access and exploitation

• Standards: Integration and interoperability of data and services

• Training: Professional skills for managing and exploiting data

Page 9

ELIXIR Compute Platform: Integration with communities

The transfer of large volume, electronic confidential, human data

https://www.elixir-europe.org/events/elixir-webinar-transfer-large-volume-data

Page 10

ELIXIR Compute Platform: Integrating Existing Services

https://www.elixir-europe.org/platforms/compute

The ELIXIR Nodes and their collaboration with European e-Infrastructures form the technical and resource foundation of the ELIXIR Compute Platform.

A geographically distributed Authentication & Authorisation Infrastructure (AAI) in operation.

Integrated Cloud & Compute and Storage & File Transfer Services that are provided by the individual ELIXIR Nodes and which will be discoverable through ELIXIR.

Moving data between sites is one key capability of the ELIXIR Compute Platform.

Raising the level of abstraction through platforms that promote distributed workflow execution

Page 11

ELIXIR Cloud & Compute

ELIXIR Cloud capacities surveyed at the DK, DE, EBI, FI, FR and SUI nodes. Confirmed capacity, counting only these nodes:

• > 60,000 compute cores

• > 24,000 TB of storage

• > 3,000 compute users

Resource allocation decisions are made by the nodes.

Page 12

ELIXIR Data Transfer and Storage

• PID and Metadata Registry

• Minimal metadata for tracking and downloading data available

• Example implementation integrating GridFTP and Handles capturing minimal metadata; automatic Handle resolving

• Next step: Integration with RDA collections API and specification

• File Transfer

• Deployed FTS3 integrated with ELIXIR AAI, supporting multiple protocols (GridFTP, HTTPS, S3, …)

• Command line and web UI

• Performance tests comparing GridFTP, Aspera, HTTP and other protocols are still ongoing (ELIXIR-ES)

• Reference Data Set Distribution Service

• RDSDS planned, designed and developed at EMBL-EBI with support from EUDAT2020

Page 13

Interaction with e-Infrastructures

[Diagram labels: Communities (three instances); ELIXIR Compute Platform; ELIXIR Nodes; EOSC (EGI, EUDAT & Indigo); Commercial Providers]

Page 14

At a GLIF meeting a long time ago, in a galaxy far, far away…

• EGI.eu still here!

• Coordinating EOSC-Hub

• EOSC: European Open Science Cloud

• Federating cloud resources and services

• Enabling open science around open data

• https://www.glif.is/meetings/2010/plenary/newhouse-egi.pdf

Page 15

Future Compute Platform: ELIXIR-GA4GH Analysis Environment

• Integrate user federation ELIXIR AAI into local compute and data deployments

• Rationalise an ELIXIR-wide Data Distribution Network – starting with Reference datasets

• Drive ELIXIR Compute Platform support for hybrid (public/private & cloud/HPC) deployments – e.g. OpenStack, SGE, etc.

• Develop Task Distribution Network using Task orchestration engines – e.g. Kubernetes

• Support national or regional workflow choreography engines – e.g. CWL, Nextflow, Galaxy, etc.

Page 16

Infrastructure Requirements

• Data Sources:

• EMBL-EBI has a lot of data! Science DMZ to improve access.

• Cloud Resources:

• From within ELIXIR Nodes, national providers, others all federated through EOSC

• Commercial cloud providers: HelixNebula (T-Systems & RHEA), AWS, GCP, MSA, …

• Data Sinks:

• Strategic placement of reference data sets & tactical placement of analysis data

Underlying Network Infrastructure with dynamic dedicated virtual links?

Page 17

Hybrid Cloud Future

• Cost model of public clouds:
  • Good for transitory activities, e.g. 1000 cores for 2 months
  • Bad for long-term activities, e.g. 17PB for 5 years growing 0.5PB/month in + out

• How can we present our on-site storage externally?
  • Replicate and sync to the cloud: Existing file based access model
  • On-demand caching: Existing file based access model with smart layer
  • Direct network access over http: Read/Write whole object

• How much bandwidth is needed to support these models?
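The long-term storage example and the bandwidth question above can be explored with back-of-the-envelope arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes), a 30-day month, and links running at full line rate with no protocol overhead. It also reads the 0.5PB/month figure as net growth of the stored data. These are simplifying assumptions, not figures from the talk; the 10 and 20 Gb/s rates are the link speeds listed on the data centre slide.

```python
PB_BITS = 8 * 10**15       # bits in one decimal petabyte (assumed unit)
SECONDS_PER_DAY = 86_400


def stored_after(start_pb: float, growth_pb_per_month: float, months: int) -> float:
    """Data held after `months` of linear growth, in PB."""
    return start_pb + growth_pb_per_month * months


def transfer_days(size_pb: float, link_gbps: float) -> float:
    """Days needed to move `size_pb` over a link running at full line rate."""
    return size_pb * PB_BITS / (link_gbps * 10**9) / SECONDS_PER_DAY


def sustained_gbps(pb_per_month: float, days_per_month: int = 30) -> float:
    """Average rate in Gb/s needed to move `pb_per_month` continuously."""
    return pb_per_month * PB_BITS / (days_per_month * SECONDS_PER_DAY) / 10**9


print(f"Stored after 5 years: {stored_after(17, 0.5, 60):.0f} PB")
for gbps in (10, 20):
    print(f"17 PB over {gbps} Gb/s: {transfer_days(17, gbps):.0f} days")
print(f"0.5 PB/month needs about {sustained_gbps(0.5):.2f} Gb/s sustained")
```

Under these assumptions the initial 17 PB alone would take roughly five months to copy over a 10 Gb/s link, which suggests why a full replicate-and-sync model is far more bandwidth-hungry than on-demand caching of the data actually in use.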

Page 18

Scaling out from EMBL-EBI’s Data Centres

[Diagram labels: EBI Hybrid Cloud; JANET/GEANT; Public Clouds; ELIXIR Clouds; Individual Users; http (Object Store); NFS (Scale out storage); Web Sites; Web Services]

Page 19

Summary

• Network remains key factor
  • Even more so for big data in clouds

• Business & service models remain complex
  • Both local ISP connectivity and public cloud providers

• Can network paths on demand help (or hinder) here?
  • Danger in just adding another complexity layer!
  • But could provide USP for performance & cost with public clouds

Page 20

www.elixir-europe.org

@ELIXIREurope /company/elixir-europe

Thank you

Contacts: ELIXIR Compute Platform & EMBL-EBI Technical Services

