Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Date post: 12-Jan-2017
Upload: danny-abukalam
eMedLab: Merging HPC and Cloud for Biomedical Research. Dr Bruno Silva, eMedLab Service Operations Manager, HPC Lead - The Francis Crick Institute. @brunodasilva, 16/2/2016
Transcript
Page 1: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

eMedLab: Merging HPC and Cloud for Biomedical Research

Dr Bruno Silva
eMedLab Service Operations Manager
HPC Lead - The Francis Crick Institute

@brunodasilva 16/2/2016

Page 2: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Institutional Collaboration

Page 3: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Medical Bioinformatics: Data-Driven Discovery for Personalised Medicine (UCLP/Crick/Sanger/EBI)

P.L. Beales (UCLP), M. Caulfield (UCLP), P.V. Coveney (UCLP), D. Hawkes (UCLP), H. Hemingway (UCLP), T.J. Hubbard (Sanger), D.A. Lomas (UCLP), N.M. Luscombe (UCLP, Crick), J.P. Overington (EBI), L. Smeeth (UCLP), J.C. Smith (Crick), C. Swanton (UCLP, Crick)

1. Objectives
2. The Partnership
3. Disease Types
4. eMedLab e-Infrastructure
5. Research and Training Academy
6. Coordinating Analytics Research: Academy Labs
7. Strategic Issues
8. Costs
9. Metrics for Success

1. Objectives

Our vision is to maximise the gains for patients and for medical research that will come from the explosion in human health data. To realise this potential we need to accumulate medical and biological data on an unprecedented scale and complexity, to coordinate it, to store it safely and securely, and to make it readily available to interested researchers. It is vital to develop people with the skills and expertise to exploit these data for the benefit of patients. Together, UCL Partners, the Francis Crick Institute, Sanger Institute and the European Bioinformatics Institute shall deliver the following:

1.1 Create a powerful eMedLab e-infrastructure (lead: Smith)

We are hampered in our work to generate new medical insights because of the fragmented accessibility of fundamental clinical and research data, and the lack of a high-performance computing (HPC) facility in which to analyse them. We shall build eMedLab, a shared computer cluster to integrate and share heterogeneous data from personal healthcare records, imaging, pharmacoinformatics and genomics. Through co-location, we will eliminate the delays and security risks that occur when data are moved. It also provides a platform to develop analytical tools that allow biomedical researchers to transform raw data into scientific insights and clinical outcomes. eMedLab will store data securely and its modular design will ensure sustainability through expansion and replacement. This will cost £6.8M.

1.2 Expand capacity: Medical Bioinformatics Research & Training Academy (lead: Lomas)

As part of the UK’s healthcare strategy, we will train the next generation of clinicians and scientists to ensure that the NHS’s ability to apply genomic and imaging data to clinical care is among the best in the world. We shall establish a Medical Bioinformatics Research and Training Academy where basic and clinical scientists, research fellows, post-docs and PhD students will be trained for world-leading computational biomedical science. The Academy will ensure that interactions cut across the traditional boundaries of disease types. We will fund 4 Career Development Fellowships (CDFs) to recruit outstanding junior faculty; successful fellows will choose their home institution from one, or a combination, of the 4 partners. Research activities will be coordinated by the Academy Labs. We will form synergistic links with the Farr Institute of Health Informatics Training Academy in e-Health and the UK-ELIXIR node for bioinformatics training. This will cost £2.1M.

1.3 Strategic overview

This is a strategically critical bid for establishing medical bioinformatics in the UK; it will enable us to build on our existing strengths to treat diseases (Fig 1). Secure partner data will be loaded into eMedLab, alongside public data from projects such as ENCODE and 1000 Genomes; it will also interface with industry-derived data and the new Global Alliance to allow secure sharing of genomic and clinical data. The consolidated, integrated information, along with associated tools and analytics, will drive the activities of the Academy, and provide the substrate for research performed by the CDFs as well as researchers among partners. This bid leverages >£10M of grant investment plus £1.8M industry investment, and provides opportunities to apply for additional funding for infrastructure and capacity growth.

For illustration, we describe exemplar projects that will be enabled by our partnership. (i) We highlight 3 disease domains in which we have unique strengths: rare diseases, cardiovascular diseases, and cancer. (ii) We focus on 3 data types in which we have outstanding skills: genomic (primarily genetic), imaging (ranging in scale from whole organs to histopathological samples) and e-Health information (patient records and deep phenotyping). Close links with the Farr Health Informatics Research Institute at UCL

Page 4: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Research Data

eMedLab

[Diagram: research data lifecycle. Labels: Plan, Discover (exists? Yes / No), Collect, Store, Transfer, Analyse, Share, Publish, Archive]

Page 5: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Multidisciplinary research

DIY...

Page 6: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Federated Institutional support

eMedLab Ops team

Inst. Support

Inst. Support

Inst. Support

Inst. Support

Inst. Support

Inst. Support

No funding available for dedicated staff!

Page 7: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

The Technical Design Group

• Mike Atkins – UCL (Project Manager)
• Andy Cafferkey – EBI
• Richard Christie – QMUL (Chair)
• Pete Clapham – Sanger
• David Fergusson – the Crick
• Thomas King – QMUL
• Richard Passey – UCL
• Bruno Silva – the Crick

Page 8: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Bid responses – interesting facts

• Majority providing OpenStack as the Cloud OS
• Half included an HPC and a Cloud environment
• One provided a VMware-based solution
• One provided an OpenStack-only solution
• Half of the tender responses offered Lustre
• One provided Ceph for VM storage

Page 9: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Winning bid

• 6048 cores (E5-2695v2)
• 252 IBM Flex servers, each with
  • 24 cores
  • 512GB RAM per compute server
  • 240GB SSD (2x120GB RAID0)
  • 2x10Gb Ethernet
• 3:1 Mellanox Ethernet fabric
• IBM GSS26 – Scratch 1.2PB
• IBM GSS24 – General Purpose (Bulk) 4.3PB
• Cloud OS – OpenStack
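As a quick sanity check of the figures on this slide, the following Python sketch derives the cluster totals from the per-server numbers. All input values come from the slide; the variable names are illustrative, and the use of decimal units (1 TB = 1000 GB) is an assumption.

```python
# Per-server figures taken from the slide
SERVERS = 252
CORES_PER_SERVER = 24
RAM_GB_PER_SERVER = 512
SSD_GB_PER_SERVER = 2 * 120  # two 120 GB SSDs in RAID 0

# Shared storage figures taken from the slide
SCRATCH_PB = 1.2
BULK_PB = 4.3

total_cores = SERVERS * CORES_PER_SERVER                # 6048, matching the slide
total_ram_tb = SERVERS * RAM_GB_PER_SERVER / 1000       # assumes decimal units
ram_gb_per_core = RAM_GB_PER_SERVER / CORES_PER_SERVER  # RAM available per core
total_local_ssd_tb = SERVERS * SSD_GB_PER_SERVER / 1000
total_shared_storage_pb = SCRATCH_PB + BULK_PB

print(f"cores:          {total_cores}")
print(f"RAM:            {total_ram_tb:.1f} TB ({ram_gb_per_core:.1f} GB per core)")
print(f"local SSD:      {total_local_ssd_tb:.1f} TB")
print(f"shared storage: {total_shared_storage_pb:.1f} PB")
```

The core count reproduces the headline figure of 6048 cores; the other totals are derived values that the slide does not state explicitly.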

OCF Response to Mini Competition under NSSA Framework

UCL Specification | Contractors Technical Solution

Table columns: Ref no | Requirement | Individual score weightings | Compliant? (FC/PC/NC/NA) | Contractor response

Contractor response: Please fill in your response to the specification here. If additional information is attached, please give document references. Please do not provide “yes” responses in the following sections. Where a requirement is met please explain how you meet or exceed this requirement.

may be acceptable in the network.

Our network design would accommodate connecting a number of devices directly into the 40Gb core switches using 40Gb connections. We propose using this for our scratch storage solution due to the high bandwidth required from it. Our compute nodes will also benefit from this in that fewer hops across the network will be required than from connecting the scratch storage into a leaf switch.

1.7.2 | Your proposed solution must provide a separate dedicated 1 Gb/s interconnect for monitoring and management. | 3 | FC

OCF have included a separate 1Gb/s Ethernet monitoring and management network which will be used to deploy and manage the solution.

Our proposed compute nodes reside in IBM Flex Chassis, allowing these nodes to connect via an internal mid-plane to internal IBM EN2092 1Gb switches, which will greatly reduce the amount of bulky Cat5e cables required for the compute nodes, simplifying maintenance and removing potential failure points.

The remainder of the management network is provided by rack-mounted IBM G8052 1Gb switches.

1.7 | Your proposed | 3 | FC | All nodes in our solution feature two separate network interface components to offer resiliency

Page 10: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research
Page 11: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Benchmark results (preliminary)

• Aggregate HPL (one run per server – embarrassingly parallel)
  • Peak 460 Gflops * 252 = 116 Tflops
  • Max – 94%
  • Min – 84%
• VM ≈ Bare metal HPL runs (16 core)
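The peak figure on this slide can be reproduced with a short calculation. The sketch below is a hedged reconstruction: the 2.4 GHz base clock and 8 double-precision flops per cycle per core are assumptions about the E5-2695 v2 that the slide does not state, and reading "Max" and "Min" as per-server HPL efficiency relative to peak is also an assumption.

```python
SERVERS = 252
CORES_PER_SERVER = 24
CLOCK_GHZ = 2.4          # assumption: base clock of the E5-2695 v2
DP_FLOPS_PER_CYCLE = 8   # assumption: AVX double-precision rate per core

SLIDE_PEAK_GFLOPS = 460  # per-server peak as quoted on the slide


def peak_gflops_per_server():
    """Theoretical peak of one server: cores x clock x flops per cycle."""
    return CORES_PER_SERVER * CLOCK_GHZ * DP_FLOPS_PER_CYCLE


def aggregate_tflops(per_server_gflops):
    """Sum of independent per-server runs, converted from Gflops to Tflops."""
    return per_server_gflops * SERVERS / 1000


def achieved_gflops(per_server_gflops, efficiency):
    """HPL result for one server at a given fraction of peak."""
    return per_server_gflops * efficiency


print(f"derived peak per server: {peak_gflops_per_server():.1f} Gflops")
print(f"aggregate peak:          {aggregate_tflops(SLIDE_PEAK_GFLOPS):.1f} Tflops")
print(f"best server (94%):       {achieved_gflops(SLIDE_PEAK_GFLOPS, 0.94):.1f} Gflops")
print(f"worst server (84%):      {achieved_gflops(SLIDE_PEAK_GFLOPS, 0.84):.1f} Gflops")
```

Because each server runs its own HPL instance, the aggregate is a simple sum and says nothing about interconnect performance.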

Page 12: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Benchmark results (preliminary) – bare metal only

• Storage throughput (gpfsperf, GB/s)

Operation  Pattern     Block  Bulk File System  Scratch File System
Create     Sequential  16M    100               141
Read       Sequential  16M     88                84
Read       Sequential  512K    86                83
Read       Random      16M    131               107
Read       Random      512K    22                20
Write      Sequential  16M     96               137
Write      Sequential  512K    97               137
Write      Random      16M     89               125
Write      Random      512K    60                28

Page 13: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

eMedLab Service

Page 14: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Federated Institutional support

Operations Team Support (Support to facilitators and Systems Administrators)

Institutional Support (direct support to research)

Tickets, Training, Documentation

elasticluster

Page 15: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Pilot Projects

Page 16: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Pilot Projects

• Spiros Denaxas - Integrating EHR into i2b2 data marts

Page 17: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Pilot Projects

• Taane Clark – Biobank Data Analysis – evaluation of analysis tools

Page 18: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Pilot Projects

• Michael Barnes - TranSMART

Page 19: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Pilot Projects

• Chela James - Gene discovery, rapid genome sequencing, somatic mutation analysis and high-definition phenotyping

[Diagram: VM provisioning workflow]

VM Image (Installing OS)
“Flavours” (CPU, RAM, Disk)
VM Instance 1 … VM Instance N
Network
Start/Stop/Hold/Checkpoint Instance
Access: Horizon Console; SSH – External IP; SSH – Tunnel; Web interface, etc…

Page 20: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Pilot Projects

• Peter Van Loo – Scalable, Collaborative, Cancer Genomics Cluster

elasticluster

Page 21: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Pilot Projects

• Javier Herrero - Collaborative Medical Genomics Analysis Using Arvados

Page 22: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Challenges

Page 23: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Challenges

Support Integration

Presentation

Performance

Security Allocation

Page 24: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Challenges

Support Integration

Presentation

Performance

Security Allocation

Page 25: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Security

Page 26: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Challenges - Security

• Presentation of GPFS shared storage to VMs raises security concerns
  • VMs will have root access – even with squash, user can sidestep identity
  • Re-export GPFS with a server-side authentication NAS protocol
  • Alternatively, abstract shared storage with another service such as iRODS

• Ability of OpenStack users to maintain security of VMs
  • Particularly problematic when deploying “from scratch” systems
  • Competent, dedicated PSAs (Project Systems Administrators) mitigate this… but they are hard to find

Page 27: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Performance

Page 28: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Challenges - Performance

• File System Block Re-Mapping
  • SS (Spectrum Scale, i.e. GPFS) performs extremely well with 16MB blocks – we want to leverage this

• Leveraging shared storage parallelism (POSIX)
  • Leveraging the increased throughput of parallel file systems is hampered by the Cinder – Nova 1:1 storage mapping
  • Multi-attach (for reads) or Manila might offer a way out for this

• Hypervisor overhead (not all cores used for compute)
  • Pagepool (GPFS cache) takes RAM (32GB) and some CPU
  • Minimise number of cores “wasted” on cloud management
  • On the other hand, fewer cores means more memory bandwidth

• VM IO performance potentially affected by virtual network stack
  • Leverage features available in the Mellanox NICs such as RoCE, SR-IOV, and offload capabilities
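To make the hypervisor overhead trade-off concrete, here is a minimal Python sketch of what remains for guests on one compute server. The 512GB RAM and 24 cores come from the winning bid and the 32GB pagepool from this slide; the number of cores reserved for cloud management is a hypothetical parameter, and the sketch ignores any other host-side memory use.

```python
RAM_GB = 512       # per compute server (from the winning bid)
CORES = 24         # per compute server (from the winning bid)
PAGEPOOL_GB = 32   # GPFS pagepool (from this slide)


def guest_resources(reserved_cores):
    """Return (guest cores, guest RAM in GB, RAM per guest core in GB).

    reserved_cores is a hypothetical count of cores kept back for the
    hypervisor and cloud management; the slides do not give a value.
    """
    if not 0 <= reserved_cores < CORES:
        raise ValueError("reserved_cores must be between 0 and CORES - 1")
    guest_cores = CORES - reserved_cores
    guest_ram_gb = RAM_GB - PAGEPOOL_GB
    return guest_cores, guest_ram_gb, guest_ram_gb / guest_cores


for reserved in (0, 2, 4):  # illustrative values only
    cores, ram, per_core = guest_resources(reserved)
    print(f"reserved={reserved}: {cores} guest cores, {ram} GB RAM, "
          f"{per_core:.1f} GB per core")
```

Reserving more cores lowers the compute available to VMs but raises the memory (and, by the slide's argument, memory bandwidth) available to each remaining core.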

Page 29: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Challenges – Performance: Block Re-Mapping

• SS (GPFS) is very good at handling many small files – by design
• VMs perform random IO reads and a few writes with their storage
• VM storage (and Cinder storage pools) are very large files on top of GPFS
• VM block size does not match SS (GPFS) block size
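The cost of this mismatch shows up in the bare-metal gpfsperf read figures from the benchmark slide. The Python sketch below compares large sequential reads with small random reads; the numbers are from the slides, but their assignment to columns assumes the flattened table reads in the order Create, Read (sequential, random), Write (sequential, random), each with 16M then 512K blocks.

```python
# gpfsperf read throughput in GB/s on bare metal, keyed by (pattern, block size).
# The column assignment is an assumption about the flattened benchmark table.
READ_GBPS = {
    "bulk": {
        ("sequential", "16M"): 88,
        ("sequential", "512K"): 86,
        ("random", "16M"): 131,
        ("random", "512K"): 22,
    },
    "scratch": {
        ("sequential", "16M"): 84,
        ("sequential", "512K"): 83,
        ("random", "16M"): 107,
        ("random", "512K"): 20,
    },
}


def slowdown(file_system):
    """How many times slower random 512K reads are than sequential 16M reads."""
    reads = READ_GBPS[file_system]
    return reads[("sequential", "16M")] / reads[("random", "512K")]


for name in READ_GBPS:
    print(f"{name}: random 512K reads are {slowdown(name):.1f}x slower "
          f"than sequential 16M reads")
```

Under that reading, small random reads reach roughly a quarter of the sequential throughput on both file systems, which is the access pattern the slide attributes to VMs.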


Page 30: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Challenges – Performance: Block Re-Mapping

• Turn random into sequential IO (at the QEMU/libvirt layer?)
• Standing bare metal cluster (nuclear option)
• GPFS on compute node + containers
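One generic way to turn random IO into more sequential IO is to sort pending requests by offset and merge those that touch. The Python sketch below illustrates that idea only; it is not how QEMU or libvirt work, and the function name and request format are hypothetical.

```python
KIB = 1024


def coalesce(requests):
    """Sort (offset, length) requests and merge contiguous or overlapping ones.

    Returns a list of (offset, length) tuples in ascending offset order.
    """
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This request starts inside or right at the end of the previous
            # one, so extend the previous request instead of issuing a new one.
            last_offset, last_length = merged[-1]
            end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Four 512 KiB requests arriving in random order (illustrative values)
pending = [
    (1536 * KIB, 512 * KIB),
    (0, 512 * KIB),
    (512 * KIB, 512 * KIB),
    (4096 * KIB, 512 * KIB),
]

for offset, length in coalesce(pending):
    print(f"offset={offset // KIB} KiB length={length // KIB} KiB")
```

In this example the first two sorted requests merge into a single 1 MiB read; requests separated by gaps stay separate, so the benefit depends on how clustered the guest's accesses are.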


Page 31: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Challenges – Performance: Shared Storage Parallelism

• GPFS presents a parallel POSIX file system, shared between compute nodes
• OpenStack Cinder storage is essentially a file in the POSIX sense
• Each Cinder volume can only be accessed by one VM
• QEMU/libvirt can be multi-threaded – enables VM file parallelism in GPFS
• Multi-attach can allow multiple machines to access the same block of storage
• Manila project might address all of this, if it ever comes to enable native GPFS support
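Since a volume is essentially one large POSIX file, parallelism has to come from several threads reading different byte ranges of that same file. The Python sketch below illustrates the pattern on a temporary file using `os.pread`, which is available on POSIX systems; the chunk size and worker count are arbitrary, and this is an illustration rather than a description of QEMU internals.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB per request; illustrative, not a tuned value


def read_chunk(fd, index):
    """Read one chunk at an explicit offset, without a shared file position."""
    return os.pread(fd, CHUNK, index * CHUNK)


def parallel_read(path, workers=4):
    """Read a whole file as disjoint chunks issued from several threads."""
    size = os.path.getsize(path)
    n_chunks = (size + CHUNK - 1) // CHUNK
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() returns results in chunk order, whatever order they finish in
            parts = list(pool.map(lambda i: read_chunk(fd, i), range(n_chunks)))
    finally:
        os.close(fd)
    return b"".join(parts)


def demo():
    """Write a small stand-in for a volume file and read it back in parallel."""
    data = os.urandom(3 * CHUNK + 123)
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(data)
        path = handle.name
    try:
        return parallel_read(path) == data
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print("parallel read matches original:", demo())
```

On a parallel file system, independent offset-based reads like these can be served concurrently, which is the behaviour the multi-threaded QEMU/libvirt bullet relies on.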

Page 32: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Future Developments

Page 33: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Future Developments

• VM and Storage performance analysis
  • Create optimal settings recommendations for Project Systems Administrators and Ops team

• Revisit Network configuration
  • Provide a simpler, more standard OpenStack environment
  • Simplify service delivery, account creation, other administrative tasks

• Research Data Management for Shared Data
  • Could be a service within the VM services ecosystem
  • iRODS is a possibility
  • Explore potential of Scratch

• Integration with Assent (Moonshot tech)
  • Access to infrastructure through remote credentials and local authorisation
  • First step to securely sharing data across sites (SafeShare project)

Page 34: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Operations Team

Thomas Jones (UCL)
William Hay (UCL)
Luke Sudbery (UCL)
Tom King (QMUL)
Bruno Silva (Ops Manager, Crick)
Adam Huffman (Crick)
Luke Raimbach (Crick)
Stefan Boeing (Data Manager, Crick)
Pete Clapham (Sanger)
James Beale (Sanger)
Andy Cafferkey (EMBL-EBI)
Rich Boyce (EMBL-EBI)
David Ocana (EMBL-EBI)

Page 35: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Institutional Support Teams

UCL:
Facilitator: David Wong
PSA: Faruque Sarker

Crick:
Facilitator: David Fergusson/Bruno Silva
PSA: Adam Huffman, Luke Raimbach, John Bouquiere

LSHTM:
Facilitator: Jackie Stewart
PSA: Steve Whitbread, Kuba Purebski

Page 36: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

Institutional Support Teams

Sanger:
Facilitator: Tim Cutts, Josh Randall
PSA: Peter Clapham, James Beal

EMBL-EBI:
Facilitator: Steven Newhouse/Andy Cafferkey
PSA: Gianni Dalla Torre

QMUL:
Tom King

Page 37: Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research

I’ll stop here…

Thank You!

