Transcript
Page 1: JASMIN/CEMS and EMERALD

JASMIN/CEMS and EMERALD

Scientific Computing Developments at STFC
Peter Oliver, Martin Bly
Scientific Computing Department
HEPiX Fall 2012, Beijing, 19th October 2012

Page 2: JASMIN/CEMS and EMERALD


Outline

• STFC
• Compute and Data
• National and International Services
• Summary

Page 3: JASMIN/CEMS and EMERALD

• Isaac Newton Group of Telescopes, La Palma
• UK Astronomy Technology Centre, Edinburgh
• Polaris House, Swindon, Wiltshire
• Chilbolton Observatory, Stockbridge, Hampshire
• Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
• Joint Astronomy Centre, Hawaii
• Rutherford Appleton Laboratory, Harwell Oxford Science and Innovation Campus


Page 4: JASMIN/CEMS and EMERALD


What we do…

• The nuts and bolts that make it work
• Enable scientists, engineers and researchers to develop world-class science, innovation and skills

Page 5: JASMIN/CEMS and EMERALD

SCARF


• Providing resources for STFC facilities, staff and their collaborators
  – ~2700 cores
  – Infiniband interconnect
  – Panasas filesystem
  – Managed as one entity
  – ~50 peer-reviewed publications/year
• Additional capacity each year for general use
  – Facilities such as the CLF add capacity using their own funds
• National Grid Service partner
  – Local access using MyProxy-SSO: users log in with their federal ID and password (see the sketch below)
  – UK e-Science Certificate access
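
As a rough sketch of how this kind of single sign-on access typically works from a user's machine: a short-lived X.509 proxy credential is retrieved from a MyProxy server using the federal ID and password. The server name, lifetime and output path below are placeholders, not SCARF's actual settings.

    import subprocess

    MYPROXY_SERVER = "myproxy.example.ac.uk"   # hypothetical server name
    username = input("Federal ID: ")

    # myproxy-logon prompts for the password itself and writes a short-lived
    # proxy credential that grid tools can then use.
    subprocess.run(
        ["myproxy-logon",
         "-s", MYPROXY_SERVER,          # MyProxy server to contact
         "-l", username,                # account name known to the server
         "-t", "12",                    # requested lifetime in hours
         "-o", "/tmp/x509_proxy"],      # where to write the credential
        check=True,
    )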

Page 6: JASMIN/CEMS and EMERALD


• NSCCS (National Service for Computational Chemistry Software)
  – Provides national and international compute, training and support
• EPSRC mid-range service
  – SGI Altix UV SMP system, 512 CPUs, 2TB shared memory
  – Large-memory SMP chosen over a traditional cluster as this best suits the computational chemistry applications
• Supports over 100 active users
  – ~70 peer-reviewed papers per year
  – Over 40 applications installed
• Authentication using NGS technologies
• Portal to submit jobs
  – Access for less computationally aware chemists

Page 7: JASMIN/CEMS and EMERALD

Tier-1 Architecture


[Architecture diagram: CPU farm and CASTOR storage pools (ATLAS, CMS, LHCb, Gen) connected via the OPN and SJ5]

• >8000 processor cores
• >500 disk servers (10PB)
• Tape robot (10PB)
• >37 dedicated T10000 tape drives (A/B/C)

Page 8: JASMIN/CEMS and EMERALD

E-infrastructure South


• Consortium of UK universities
  – Oxford, Bristol, Southampton, UCL
  – Formed the Centre for Innovation, with STFC as a partner
• Two new services (£3.7M)
  – IRIDIS – Southampton – x86-64
  – EMERALD – STFC – GPGPU cluster
• Part of a larger investment in e-infrastructure
  – A Midland Centre of Excellence (£1M), led by Loughborough University
  – West of Scotland Supercomputing Centre for Academia and Industry (£1.3M), led by the University of Strathclyde
  – E-Infrastructure Interconnectivity (£2.58M), led by the University of Manchester
  – MidPlus: A Centre of Excellence for Computational Science, Engineering and Mathematics (£1.6M), led by the University of Warwick

Page 9: JASMIN/CEMS and EMERALD


EMERALD

• Providing resources to the consortium and partners
  – Consortium of UK universities: Oxford, Bristol, Southampton, UCL, STFC
• Largest production GPU facility in the UK
  – 372 NVIDIA Tesla M2090 GPUs
• Scientific applications
  – Still under discussion; computational chemistry front runners:
    AMBER, NAMD, GROMACS, LAMMPS
  – Eventually hundreds of applications covering all sciences

Page 10: JASMIN/CEMS and EMERALD

EMERALD


• 6 racks

Page 11: JASMIN/CEMS and EMERALD

EMERALD HARDWARE I

• 15 x SL6500 chassis
  – 4 x GPU compute nodes per chassis, each with 2 x CPUs and 3 x NVIDIA M2090 GPUs = 8 CPUs & 12 GPUs per chassis, power ~3.9kW
  – SL6500 scalable-line chassis
  – 4 x 1200W power supplies, 4 fans
  – 4 x 2U, half-width SL390s servers
• SL390s nodes
  – 2 x Intel E5649 (2.53GHz, 6-core, 80 Watts)
  – 3 x NVIDIA M2090 GPGPUs (512 CUDA cores each)
  – 48GB DDR3 memory
  – 1 x 146GB 15k SAS HDD
  – HP QDR Infiniband & 10GbE ports
  – Dual 1Gb network ports


Page 12: JASMIN/CEMS and EMERALD

EMERALD HARDWARE II

• 12 x SL6500 chassis
  – 2 x GPU compute nodes per chassis, each with 2 x CPUs and 8 x NVIDIA M2090 GPUs = 4 CPUs & 16 GPUs per chassis, power ~4.6kW (totals checked in the sketch below)
  – SL6500 scalable-line chassis
  – 4 x 1200W power supplies, 4 fans
  – 2 x 4U, half-width SL390s servers
• SL390s nodes
  – 2 x Intel E5649 (2.53GHz, 6-core, 80 Watts)
  – 8 x NVIDIA M2090 GPGPUs (512 CUDA cores each)
  – 96GB DDR3 memory
  – 1 x 146GB 15k SAS HDD
  – HP QDR Infiniband & 10GbE
  – Dual 1Gb network ports
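
The per-chassis figures on the two hardware slides add up to the 372 GPUs quoted earlier; a quick arithmetic check:

    # GPU and CPU socket totals implied by the two hardware tranches above
    gpus = 15 * 4 * 3 + 12 * 2 * 8   # 180 + 192
    cpus = 15 * 4 * 2 + 12 * 2 * 2   # 120 + 48
    print(gpus)   # 372, matching the EMERALD overview slide
    print(cpus)   # 168 CPU sockets across 84 dual-socket nodes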


Page 13: JASMIN/CEMS and EMERALD

EMERALD


• System applications
  – Red Hat Enterprise Linux 6.x
  – Platform LSF (see the submission sketch below)
  – CUDA toolkit, SDK and libraries
  – Intel and Portland compilers
• Scientific applications
  – Still under discussion; computational chemistry front runners:
    AMBER, NAMD, GROMACS, LAMMPS
  – Eventually hundreds of applications covering all sciences
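
As a rough illustration of how a user might drive Platform LSF on a system like this, here is a minimal sketch that wraps bsub from Python; the job name, queue, slot count, output file and NAMD command line are placeholder assumptions, not EMERALD's actual configuration.

    import subprocess

    # Hypothetical job parameters; queue name, slot count and the NAMD
    # command line are placeholders, not EMERALD's real setup.
    cmd = [
        "bsub",
        "-J", "namd_test",     # job name
        "-q", "gpu",           # hypothetical GPU queue
        "-n", "12",            # request 12 slots
        "-o", "namd_%J.out",   # output file; %J expands to the LSF job ID
        "mpirun", "namd2", "apoa1.namd",
    ]
    subprocess.run(cmd, check=True)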

Page 14: JASMIN/CEMS and EMERALD

EMERALD


• Managing a GPU cluster
  – GPUs are more power efficient and give more Gflops/Watt than x86_64 servers
  – Reality: true, but each 4U chassis means ~1.2kW per U of rack space
    • A full rack would require 40+ kW and is hard to cool
    • Needs additional in-row coolers and cold-aisle containment
  – Uneven power demand stresses the air-conditioning and power infrastructure
    • A 240-GPU job takes the cluster from 31kW idle to 80kW instantly
• Measured GPU parallel MPI job (HPL) using 368 GPUs: ~1.4 Gflops/W
• Measured X5675 cluster parallel MPI job (HPL): ~0.5 Gflops/W (compared in the sketch below)
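
Taken at face value, the measured HPL figures put the GPU cluster at roughly 2.8x the energy efficiency of the X5675 cluster, and the 240-GPU load step works out to around 200W of extra draw per GPU; a quick check (the per-GPU figure is an inference from the slide numbers, not a measured value):

    # Efficiency ratio from the two measured HPL runs
    print(1.4 / 0.5)                  # ~2.8x more Gflops per watt on the GPUs

    # Power step when a 240-GPU job starts: 31 kW idle to 80 kW
    print((80 - 31) * 1000 / 240)     # ~204 W of extra draw per GPU under load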

Page 15: JASMIN/CEMS and EMERALD

JASMIN/CEMS

CEDA data storage & services
• Curated data archive
• Archive management services
• Archive access services (HTTP, FTP, Helpdesk, ...)

Data-intensive scientific computing
• Global / regional datasets & models
• High spatial and temporal resolution
• Private cloud

Flexible access to high-volume & complex data for the climate & earth observation communities
• Online workspaces
• Services for sharing & collaboration


Page 16: JASMIN/CEMS and EMERALD

JASMIN/CEMS

• Deadline (or funding gone!): 31st March 2012 for “doing science”
• Government procurement: £5M tender to order in under 4 weeks
• Machine room upgrades and a large cluster competing for time
• Bare floor to operation in 6 weeks
• 6 hours from power-off to 4.6PB of ActiveStor 11 mounted at RAL
• “Doing science” on 14th March
• 3 satellite-site installs in parallel (Leeds 100TB, Reading 500TB, ISIC 600TB)

[Timeline: Oct 2011 BIS funds, then tender, order, build, network, complete by 8-Mar-2012]


Page 17: JASMIN/CEMS and EMERALD

JASMIN/CEMS at RAL

[Floor-plan diagram: 12 JASMIN racks with 30kW in-row cooling units between them]

- 12 racks with mixed servers and storage
- 15kW/rack peak (180kW total)
- Enclosed cold aisle + in-aisle cooling
- 600kg/rack (7.2 tonnes total)
- Distributed 10Gb network (1 Terabit/s bandwidth)
- Single 4.5PB global file system
- Two VMware vSphere pools of servers with dedicated image storage
- 6 weeks from bare floor to working 4.6PB


Page 18: JASMIN/CEMS and EMERALD

JASMIN / CEMS Infrastructure

Configuration:

Storage: 103 Panasas ActiveStor 11 shelves (2,208 x 3TB drives in total)

Computing: ‘cloud’ of hundreds of virtual machines hosted on 20 Dell R610 servers

Networking: 10Gb Gnodal throughout; “lightpath” dedicated links to UK and EU supercomputers

Physical: 12 racks, enclosed aisle, in-row chillers

Capacity: 4.6PB usable at RAL (6.6PB raw), equivalent to 920,000 DVDs (a 1.47km-high tower of DVDs); see the check below

High performance: 1.03 Tb/s total storage bandwidth, equivalent to copying 1,500 DVDs per minute

Single namespace solution: one single file system, managed as one system

Status: the largest Panasas system in the world and one of the largest storage deployments in the UK
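
The capacity figures are consistent with the drive count quoted above; a quick check, assuming the usable/raw ratio simply reflects ObjectRAID parity and file-system overhead:

    # Raw capacity implied by the drive count on this slide
    raw_pb = 2208 * 3 / 1000      # 2,208 drives x 3 TB
    print(raw_pb)                 # 6.624 PB, matching the quoted 6.6 PB raw
    print(4.6 / raw_pb)           # ~0.69: roughly 70% of raw ends up usable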


Page 19: JASMIN/CEMS and EMERALD

JASMIN/CEMS Networking

• Gnodal 10Gb networking
  – 160 x 10Gb ports in a 4 x GS4008 switch stack
• Compute
  – 23 Dell servers for VM hosting (VMware vCenter + vCloud) and HPC access to storage
  – 8 Dell servers for compute
  – Dell EqualLogic iSCSI arrays (VM images)
  – All 10Gb connected
• Already upgraded the 10Gb network
  – To add 80 more Gnodal 10Gb ports
  – Compute expansion


Page 20: JASMIN/CEMS and EMERALD

What is Panasas Storage?

• “A complete hardware and software storage solution”
• Ease of management
  – Single management console for 4.6PB
• Performance
  – Parallel access via DirectFlow, NFS, CIFS
  – Fast parallel reconstruction
• ObjectRAID
  – All files stored as objects
  – RAID level per file
  – Vertical, horizontal and network parity
• Distributed parallel file system
  – Parts (objects) of files on every blade
  – All blades transmit/receive in parallel
• Global namespace
• Battery UPS
  – Enough to shut down cleanly
• 1 x 10Gb uplink per shelf
  – Performance scales with size (see the scaling check below)

[Diagram: Panasas shelf with one director blade and storage blades]
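
With one 10Gb uplink per shelf, the 103 shelves quoted on the infrastructure slide account for the 1.03 Tb/s aggregate bandwidth figure; a one-line check:

    # Aggregate storage bandwidth if every shelf contributes one 10Gb uplink
    print(103 * 10 / 1000)   # 1.03 Tb/s, as quoted on the JASMIN/CEMS Infrastructure slide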


Page 21: JASMIN/CEMS and EMERALD

PanActive Manager


Page 22: JASMIN/CEMS and EMERALD

Panasas in Operation

• Reliability
  – 1133 blades, 206 power supplies, 103 shelf network switches: 1442 components
  – Soak testing revealed 27 faults
  – In operation: 7 faults, with no loss of service
  – ~0.6% failure per year, compared to ~5% per year for commodity storage
• Performance
  – Random IO: 400MB/s per host
  – Sequential IO: 1GByte/s per host
• External performance
  – 10Gb connected
  – Sustained 6Gb/s

Page 23: JASMIN/CEMS and EMERALD


Infrastructure Solutions / Systems Management

• Backups
  – System and user data
• SVN
  – Codes and documentation
• Monitoring
  – Ganglia, Cacti, power management
• Alerting
  – Nagios (a minimal check-script sketch follows below)
• Security
  – Intrusion detection, patch monitoring
• Deployment
  – Kickstart, LDAP, inventory database
• VMware
  – Server consolidation, extra resilience
  – 150+ virtual servers supporting all e-Science activities
  – Development cloud
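
As an illustration of the alerting layer, a minimal Nagios-style check script is sketched below: it prints one status line and exits with the conventional plugin codes (0=OK, 1=WARNING, 2=CRITICAL). The mount point and thresholds are placeholders, not the actual production checks.

    import shutil, sys

    PATH = "/panfs"          # hypothetical mount point to watch
    WARN, CRIT = 80, 90      # placeholder thresholds, percent used

    usage = shutil.disk_usage(PATH)
    pct = 100 * usage.used / usage.total

    if pct >= CRIT:
        print(f"CRITICAL - {PATH} is {pct:.1f}% full")
        sys.exit(2)
    if pct >= WARN:
        print(f"WARNING - {PATH} is {pct:.1f}% full")
        sys.exit(1)
    print(f"OK - {PATH} is {pct:.1f}% full")
    sys.exit(0)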

Page 24: JASMIN/CEMS and EMERALD


e-Infrastructures

• Lead role in national and international e-infrastructures
• Authentication
  – Lead and develop the UK e-Science Certificate Authority
  – Total certificates issued ~30,000; ~3,000 current
  – Easy integration with the UK Access Management Federation
• Authorisation
  – Use existing EGI tools
• Accounting
  – Lead and develop EGI APEL accounting (a summarisation sketch follows below)
  – 500M records, 400GB of data
  – ~282 sites publish records
  – ~12GB/day loaded into the main tables
  – Detailed records usually kept for 13 months, summary data back to 2003
  – Integrated into existing HPC-style services
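
To make the scale of the accounting data concrete, the sketch below shows the kind of roll-up that turns individual job records into the long-lived per-site summaries mentioned above; the record fields and site names are illustrative only and are not the real APEL schema.

    from collections import defaultdict

    # Illustrative job records; the field names are made up for this sketch.
    records = [
        {"site": "SITE-A", "month": "2012-09", "cpu_hours": 1200.0},
        {"site": "SITE-A", "month": "2012-09", "cpu_hours": 300.5},
        {"site": "SITE-B", "month": "2012-09", "cpu_hours": 50.0},
    ]

    # Aggregate per site and month; summaries like this stay compact even
    # when the underlying job-level tables run to hundreds of millions of rows.
    summary = defaultdict(float)
    for rec in records:
        summary[(rec["site"], rec["month"])] += rec["cpu_hours"]

    for (site, month), hours in sorted(summary.items()):
        print(site, month, hours)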

Page 25: JASMIN/CEMS and EMERALD


e-Infrastructures

• Lead role in national and international e-infrastructures
• User management
  – Lead and develop the NGS UAS service
  – Common portal for project owners
  – Manage project and user allocations
  – Display trends, make decisions (policing)
• Information: what services are available?
  – Lead and develop the EGI information portal, GOCDB
  – 2,180 registered GOCDB users belonging to 40 registered NGIs
  – 1,073 registered sites hosting a total of 4,372 services
  – 12,663 downtime entries entered via GOCDB
• Training & support
  – Training Marketplace: a tool developed to promote training opportunities, resources and materials
  – SeIUCCR Summer Schools: supporting 30 students for a 1-week course (120 applicants)

Page 26: JASMIN/CEMS and EMERALD


Summary

• High-performance computing and data
  – SCARF
  – NSCCS
  – JASMIN
  – EMERALD
  – GridPP Tier-1
• Managing e-infrastructures
  – Authentication, authorisation, accounting
  – Resource discovery
  – User management, help and training

Page 27: JASMIN/CEMS and EMERALD


Information

• Website
  – http://www.stfc.ac.uk/SCD
• Contact: Pete Oliver
  – peter.oliver at stfc.ac.uk

Questions?

