Page 1

Page 2

Cloud Computing Infrastructure at CERN

Tim Bell, tim.bell@cern.ch

HEPTech, 30/03/2015

Page 3

About CERN

• CERN is the European Organization for Nuclear Research in Geneva

• Particle accelerators and other infrastructure for high energy physics (HEP) research

• Worldwide community
• 21 member states (+ 2 incoming members)
• Observers: Turkey, Russia, Japan, USA, India
• About 2300 staff
• >10'000 users (about 5'000 on-site)
• Budget (2014) ~1000 MCHF

• Birthplace of the World Wide Web

Page 4

Page 5

Page 6

Page 7

The Worldwide LHC Computing Grid

• Tier-0 (CERN): data recording, reconstruction and distribution
• Tier-1: permanent storage, re-processing, analysis
• Tier-2: simulation, end-user analysis

• > 2 million jobs/day
• ~350'000 cores
• 500 PB of storage
• nearly 170 sites, 40 countries
• 10-100 Gb links
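As a rough back-of-envelope reading of the figures above (an illustrative sketch only, using the quoted job rate and core count and assuming full occupancy; these are not official WLCG accounting numbers):

```python
# Illustrative arithmetic from the quoted WLCG figures (not official accounting).
jobs_per_day = 2_000_000   # "> 2 million jobs/day"
cores = 350_000            # "~350'000 cores"

jobs_per_core_per_day = jobs_per_day / cores
avg_job_hours = 24 / jobs_per_core_per_day   # assumes every core is busy all day

print(f"{jobs_per_core_per_day:.1f} jobs per core per day")  # ~5.7
print(f"~{avg_job_hours:.1f} hours per job on average")      # ~4.2
```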

Page 8

• CERN Archive: >100 PB
• CERN new data: 15 PB, 23 PB, 27 PB

Page 9

LHC data growth

• Expecting to record 400 PB/year by 2023
• Compute needs expected to be around 50x current levels, if the budget is available

(Chart: PB per year for 2010, 2015, 2018 and 2023)

Page 10

The CERN Meyrin Data Centre

http://goo.gl/maps/K5SoG

Page 11

New Data Centre in Budapest

Page 12

Page 13

Good News, Bad News

• Additional data centre in Budapest now online
• Increasing use of facilities as data rates increase

But…
• Staff numbers are fixed, no more people
• Materials budget decreasing, no more money
• Legacy tools are high maintenance and brittle
• User expectations are for fast self-service

Page 14

Innovation Dilemma

• How can we avoid the sustainability trap?
  • Define requirements
  • No solution available that meets those requirements
  • Develop our own new solution
  • Accumulate technical debt

• How can we learn from others and share?
  • Find compatible open source communities
  • Contribute back where there is missing functionality
  • Stay mainstream

Are CERN computing needs really special?

Page 15

O’Reilly Consideration

Page 16

Job Trends Consideration

Page 17

CERN Tool Chain

Page 18

OpenStack Cloud Platform

Page 19

OpenStack Governance

Page 20

OpenStack Status

• 4 OpenStack clouds at CERN
  • Largest is ~104,000 cores in ~4,000 servers
  • 3 other instances with 45,000 cores in total
  • 20,000 more cores being installed in April

• Collaborating with companies at the open design summits held every 6 months
  • The last one, in Paris, had 4,500 attendees
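For scale, a quick sketch of what those numbers imply when combined (simple arithmetic on the figures above; nothing here comes from the slides beyond the quoted counts):

```python
# Simple arithmetic on the quoted CERN OpenStack figures (illustrative only).
largest_cores, largest_servers = 104_000, 4_000
other_cores = 45_000
incoming_cores = 20_000

cores_per_server = largest_cores / largest_servers          # ~26 cores/server
total_cores = largest_cores + other_cores + incoming_cores  # ~169,000 cores

print(f"~{cores_per_server:.0f} cores per server in the largest cloud")
print(f"~{total_cores:,} cores across the 4 clouds once the April batch lands")
```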

Page 21

Cultural Transformations

Technology change needs cultural change
• Speed
  • Are we going too fast?
• Budget
  • Cloud quota allocation rather than CHF
• Skills inversion
  • Legacy skills value is reduced
• Hardware ownership
  • No longer a physical box to check

Page 22

CERN openlab in a nutshell

• A science-industry partnership to drive R&D and innovation, with over a decade of success

• Evaluate state-of-the-art technologies in a challenging environment and improve them

• Test in a research environment today what will be used in many business sectors tomorrow

• Train next generation of engineers/employees

• Disseminate results and outreach to new audiences

Page 23

Phase V Members

• Partners
• Contributors
• Associates
• Research

Page 24

Onwards the Federated Clouds

• CERN Private Cloud: 102K cores
• ATLAS Trigger: 28K cores
• CMS Trigger: 12K cores
• ALICE Trigger: 12K cores
• IN2P3 Lyon
• Brookhaven National Labs
• NecTAR Australia
• Public cloud, such as Rackspace
• Many others on their way

Page 25

Helix Nebula

(Diagram: broker(s) and front-ends linking publicly funded and commercial suppliers - Atos, CloudSigma, T-Systems, Interoute and the EGI Fed Cloud - over commercial/GEANT networks to academic users in big science and small and medium scale science, and to other market sectors such as government, manufacturing and oil & gas)

Page 26

Summary

• Open source tools have successfully replaced CERN's legacy fabric management system
• Private clouds provide a flexible base for High Energy Physics and a common approach with public resources
• Cultural change to an Agile approach has required time and patience but is paying off
• CERN's computing challenges, combined with industry and open source collaboration, foster sustainable innovation

Page 27

Thank You

• CERN OpenStack technical details at http://openstack-in-production.blogspot.fr

Page 28

Backup Slides

Page 29

Page 30

The LHC timeline

(Chart, courtesy L. Rossi: luminosity rising from L~7x10^33 with pile-up ~20-35, to L=1.6x10^34 with pile-up ~30-45, then L=2-3x10^34 with pile-up ~50-80, and ultimately L=5x10^34 with pile-up ~130-200)

Page 32

Scaling Architecture Overview

• Load balancer (Geneva, Switzerland)
• Top cell - controllers (Geneva, Switzerland)
• Child cell - controllers and compute nodes (Geneva, Switzerland)
• Child cell - controllers and compute nodes (Budapest, Hungary)
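As an illustration of how such a cells layout is typically declared (a minimal sketch only, not CERN's actual configuration; the option names are the standard Nova "cells v1" settings of that era and the cell names are placeholders):

```python
# Minimal sketch of Nova "cells v1" configuration for a top (API) cell and a
# child (compute) cell. Values are placeholders, not CERN's real settings.
from configparser import ConfigParser

top_cell = ConfigParser()
top_cell["cells"] = {
    "enable": "True",
    "name": "top",          # placeholder name for the top cell
    "cell_type": "api",     # the top cell fronts the API and routes to children
}

child_cell = ConfigParser()
child_cell["cells"] = {
    "enable": "True",
    "name": "budapest",       # placeholder child cell, e.g. Geneva or Budapest
    "cell_type": "compute",   # child cells own the compute nodes
}

# Write the two nova.conf fragments side by side for comparison.
with open("nova-top-cell.conf", "w") as f:
    top_cell.write(f)
with open("nova-child-cell.conf", "w") as f:
    child_cell.write(f)
```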

Page 33

Monitoring - Kibana

Page 34

Architecture Components

Top cell controller:
• Keystone
• Nova api, Nova consoleauth, Nova novncproxy, Nova cells
• Horizon
• Ceilometer api
• Cinder api, Cinder volume, Cinder scheduler
• Glance api, Glance registry
• rabbitmq
• MySQL, MongoDB
• Flume, HDFS, Elastic Search, Kibana
• Stacktach
• Ceph

Child cell controller:
• Keystone
• Nova api, Nova conductor, Nova scheduler, Nova network, Nova cells
• Glance api
• Ceilometer agent-central, Ceilometer collector
• rabbitmq
• Flume

Compute node:
• Nova compute
• Ceilometer agent-compute
• Flume

Page 35

(Diagram: the OpenStack services - Horizon, Keystone, Glance, Nova (compute, network, scheduler), Cinder and Ceilometer - integrated with CERN site services: Microsoft Active Directory, Database Services, the CERN Network Database, the account management system, CERN Accounting, and block storage on Ceph & NetApp)
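As an illustration of the Active Directory integration shown above (a minimal sketch assuming Keystone's standard LDAP identity backend is used; the URL, DNs and attribute choices are placeholders rather than CERN's real values, and exact option names vary by Keystone release):

```python
# Sketch of pointing Keystone's LDAP identity backend at an Active Directory
# domain. All values are placeholders, not CERN's actual configuration.
from configparser import ConfigParser

keystone_conf = ConfigParser()
keystone_conf["identity"] = {
    "driver": "ldap",  # short form; older releases used the full class path
}
keystone_conf["ldap"] = {
    "url": "ldap://ad.example.org",  # placeholder AD domain controller
    "user": "CN=svc-keystone,OU=Services,DC=example,DC=org",
    "password": "********",
    "suffix": "DC=example,DC=org",
    "user_tree_dn": "OU=Users,DC=example,DC=org",
    "user_objectclass": "person",
    "user_name_attribute": "sAMAccountName",  # typical AD login attribute
}

with open("keystone.conf", "w") as f:
    keystone_conf.write(f)
```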

Page 36

Public Procurement Cycle

Step                               | Time (days)                | Elapsed (days)
User expresses requirement         |                            | 0
Market survey prepared             | 15                         | 15
Market survey for possible vendors | 30                         | 45
Specifications prepared            | 15                         | 60
Vendor responses                   | 30                         | 90
Test systems evaluated             | 30                         | 120
Offers adjudicated                 | 10                         | 130
Finance committee                  | 30                         | 160
Hardware delivered                 | 90                         | 250
Burn in and acceptance             | 30 typical, 380 worst case | 280
Total                              |                            | 280+ days
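The elapsed column is simply a running total of the per-step durations; a quick sketch reproducing it from the figures above (using the 30-day typical burn-in time):

```python
# Reproduce the "Elapsed (days)" column as a running total of step durations.
# Values come from the table above; burn-in uses the 30-day typical figure.
steps = [
    ("User expresses requirement", 0),
    ("Market survey prepared", 15),
    ("Market survey for possible vendors", 30),
    ("Specifications prepared", 15),
    ("Vendor responses", 30),
    ("Test systems evaluated", 30),
    ("Offers adjudicated", 10),
    ("Finance committee", 30),
    ("Hardware delivered", 90),
    ("Burn in and acceptance", 30),
]

elapsed = 0
for name, days in steps:
    elapsed += days
    print(f"{name:<35} {elapsed:>4}")

print("Total: 280+ days")  # worst-case burn-in pushes this well past 280
```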

Page 37

Some history of scale…

Date        | Collaboration sizes | Data volume, archive technology
Late 1950's | 2-3                 | kilobits, notebooks
1960's      | 10-15               | kB, punchcards
1970's      | ~35                 | MB, tape
1980's      | ~100                | GB, tape, disk
1990's      | ~750                | TB, tape, disk
2010's      | ~3000               | PB, tape, disk

For comparison:
• 1990's: total LEP data set ~few TB; it would fit on one tape today
• Today: 1 year of LHC data ~27 PB
