
The Worldwide LHC Computing Grid, GridPP and you

Christopher J. Walker

C.J.Walker@qmul.ac.uk


Overview

• The Grid
– Motivation and history (slides thanks to David Britton)
• Live Demo


Introduction

The physics
The LHC
The Grid
The Experiments

[Photo: July 4th 2012 – Rolf Heuer, CERN Director General, announcing the Higgs boson discovery.]


Outline

Why we need the Grid.
What is a Grid?
How the Grid works.
Grid usage and impact.
Evolution.
Summary.


Challenge – The Data Volume

[Figure: of all events, Standard Model processes (W, Z, jets) dominate; Higgs events are roughly 10 orders of magnitude rarer. The detectors notionally produce ~40 TB/sec of data, of which only ~200 MB/sec is recorded.]


Challenge: Data Complexity

Multiple separate interactions occur during each “collision”.

• Collisions happen 40 million times a second, each a composite of many interactions.
• There are 150 million electronic channels on the ATLAS and CMS detectors.
• 15 million gigabytes (15 Petabytes) of data are recorded per year.
• We expect a few per million recorded collisions to contain a Higgs (but individually you can’t tell that they are Higgs).


Data Pyramid

[Figure: the data pyramid. Real data flows down from Raw Data through Reconstructed Data, Analysis Objects and Tag Data to Ntuples; a parallel pyramid exists for Monte Carlo data. An inset plot shows total ATLAS disk used rising to about 100 PB between 2008 and 2012.]

1 Petabyte = 1000 Terabytes; 1 Terabyte = 1000 Gigabytes; 1 Gigabyte = 1000 Megabytes.


What is the Grid?


Web vs Grid

Web: focused historically on sharing information (high-level data: text, pictures, music, video). It allows a limited set of predetermined actions (data processing) such as search, filter, sort, stream, etc.

Grid: the idea is to share storage and computing power more directly, enabling much larger data sets to be shared with user-determined data processing.


Evolution towards Grid


Why “Grid”?


How does it work?


[Figure: the physics problem (E = mc²) is connected to the Grid through Grid middleware.]


Middleware

[Figure: a single PC versus the Grid. On a single PC the operating system sits between the hardware (CPU, disks, etc.) and the application layer (Word/Excel, email/web, games, your program). On the Grid, middleware sits between your program and the grid-enabled resources (CPU clusters, disk servers), providing the user interface machine, resource broker, information service, replica catalogue and bookkeeping service.]

Middleware is the Operating System of a distributed computing system.


How does it work?

Getting Started

1. Get a digital certificate (UK Certificate Authority)

2. Join a Virtual Organisation (VO)

Authentication – who you are

Authorisation – what you are allowed to do
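As a minimal sketch (the VO name below is just an example, borrowed from the demo later): once you have a certificate and VO membership, you create a short-lived VOMS proxy before using the Grid.

# Create a proxy credential from your grid certificate; the VOMS
# extension records your VO membership (authorisation), while the
# certificate itself proves who you are (authentication).
voms-proxy-init --voms londongrid

# Inspect the proxy: owner DN, VO attributes, remaining lifetime.
voms-proxy-info --all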


How does it work?

The details

[Diagram: the job-submission workflow, steps 0–11. A submitter on a user-interface machine (gridui) first runs voms-proxy-init (step 0), then submits a JDL job description to the Workload Management System (WMS). The components involved include VOMS, the resource broker (RB), the file catalogue (LFC), the information system (BDII), the Logging & Bookkeeping service, and grid-enabled resources (CPU nodes and storage) at several sites; the user can ask "Job status?" at any point.]

glite-wms-job-submit myjob.jdl

myjob.jdl:
JobType = "Normal";
Executable = "/sum.exe";
InputData = "LF:testbed0-00019";
DataAccessProtocol = "gridftp";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" &&
  other.GlueHostOperatingSystemRelease == "Red Hat 6.2" &&
  other.GlueCEPolicyMaxWallClockTime > 10000;
Rank = other.GlueCEStateFreeCPUs;
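A hedged sketch of the corresponding command-line cycle on a gLite user interface (the job identifier below is illustrative, and option names can differ between middleware versions):

# Submit the job, delegating a proxy automatically; this prints a job ID.
glite-wms-job-submit -a myjob.jdl

# Poll the job as it moves through Submitted/Waiting/Ready/Running/Done.
glite-wms-job-status https://wms.example.ac.uk:9000/AbCdEfGh123

# Once Done, retrieve the OutputSandbox files.
glite-wms-job-output --dir ./job-output https://wms.example.ac.uk:9000/AbCdEfGh123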


Enabling Technology

CPU Development: the sustained exponential increase in the density of transistors, and the corresponding fall in the cost of computational power.

Storage Development: the similar exponential increase in storage density, and the corresponding fall in the cost of data storage.

Network Development: the similar exponential increase in the available bandwidth, and the corresponding fall in the cost of moving data.


When is a Grid useful?


The Grid suits problems that are highly parallelizable: the problem is split into pieces, each piece runs independently on the Grid, and the results are combined into the solution. It works well when the input data is independent (e.g. processing many images) or when running simulations with different parameters (A=2, B=3; A=3, B=3; A=2, B=4; ...). It is not so good for closely coupled problems: some pieces may be independent, but others will have to interact.
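A minimal sketch of such a parameter sweep, submitted as independent Grid jobs (the executable name, parameter values and output names are illustrative, not from the slides):

# Generate one JDL per parameter point and submit each as its own job.
for P in "A=2 B=3" "A=3 B=3" "A=2 B=4"; do
  TAG=$(echo $P | tr ' =' '__')
  cat > job_$TAG.jdl <<EOF
Executable    = "sim.sh";
Arguments     = "$P";
StdOutput     = "sim_$TAG.out";
StdError      = "sim_$TAG.err";
InputSandbox  = {"sim.sh"};
OutputSandbox = {"sim_$TAG.out", "sim_$TAG.err"};
EOF
  glite-wms-job-submit -a job_$TAG.jdl
done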


Structure of the Grid

Studies in the late 90’s led to a hierarchical structure:

Tier 0: the CERN computer centre, fed by the experiments’ online systems and offline farms.
Tier 1: national centres (RAL in the UK, plus centres in France, Italy, Germany, the USA, ...).
Tier 2: regional groups (in the UK: ScotGrid, NorthGrid, SouthGrid and London).
Below that sit the institutes and workstations (e.g. Glasgow, Edinburgh and Durham within ScotGrid).

The UK Tier-1 and Tier-2s together form “GridPP”; the whole worldwide hierarchy is the “wLCG”. This is a useful model for particle physics, but not necessarily for others.


Multiple Grids


The Worldwide LHC Computing Grid (WLCG) combines:
• EGI (European Grid Infrastructure)
• OSG (Open Science Grid) in the US
• NorduGrid in the Nordic countries

Combined resources (August 2012):
• 152 sites in 36 countries
• 325,000 logical CPUs
• 210 Petabytes of disk
• 180 Petabytes of tape

Comparing the number of cores (which is not a fair measure of computing power), “CERN’s (distributed) supercomputer” would rank 3rd in the current top 10 supercomputers worldwide.


UK Contribution


UK resources (August 2012):
• 19 sites
• 36,000 logical CPUs
• 21 Petabytes of disk
• 5+ Petabytes of tape

[Chart: CPU contributions in 2012 by country, with the USA and the UK labelled.]


Grid in Action


Data Transfer


The nominal design rate was 1.3 GB/s.


Moving Data – Quick quiz

• With a Gbit connection, how long does it take to move:
– 1 GB (gigabyte)?
– 1 TB?
– 10 TB?
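A back-of-the-envelope answer, assuming the full 1 Gbit/s is usable (real transfers are usually slower):

# 1 Gbit/s = 0.125 GB/s, so transfer time = size / 0.125 GB/s.
for SIZE_GB in 1 1000 10000; do     # 1 GB, 1 TB, 10 TB
  echo "$SIZE_GB GB: $(echo "$SIZE_GB / 0.125" | bc) seconds"
done
# Roughly 8 s for 1 GB, ~8,000 s (over 2 hours) for 1 TB,
# and ~80,000 s (about a day) for 10 TB.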


Worldwide Usage


1 million jobs/day


Impact

“Wealth Creation”: Immense (pictures); Cambridge Ontology (n-grams); Econophysica (financial); Total Oil (exploration); Constellation Tech (software).

“Quality of Life”: Avian Flu (biomed); Malaria (the Wisdom project); landslide prediction; nano-CMOS; photonics; etc.


Evolution-I


The tier structure for the wLCG, designed in the late 90’s, assumed 600 Mbps links. Today’s multi-gigabit links enable a more flexible and robust architecture, though this may increase complexity.

New CPU architectures require more application development in order to exploit the increase in computing capacity. This is a challenge for legacy code.

Although storage density continues to increase, it is getting more difficult to use, which puts demands on the architecture and applications, increasing the complexity.

Evolution of computing models: from hierarchy to mesh.


Evolution-II


[Figure: the Grid middleware diagram from earlier (user interface machine, CPU clusters, disk servers, resource broker, information service, replica catalogue, bookkeeping service), shown twice for comparison.]

Too much middleware actually resides in the application layer and is unique to an individual user group (virtual organisation). In addition, there are multiple middleware stacks (gLite, ARC, Unicore, etc.) used by different user groups.

Some degree of rationalisation and consolidation is required; this is a natural part of the process when working in a development environment.


Evolution-III


The boundary between web and Grid has become blurred as Grid ideas are taken up. The web is becoming much more machine-readable; data movement is becoming more automated and more extensive as bandwidth improvements enable new services.

The Grid is also about collaboration. This is somewhat different in the commercial world, where partners tend not to share internal infrastructure but out-source to a third party, so we have seen the growth of “Cloud Computing”. Again, the boundaries are becoming blurred, with Grids of Clouds and Clouds of Grids likely in the future.


Summary


A Large Hadron Collider delivering collisions up to 40 million times per second.
A global supercomputer.

Support for Non LHC VOs

VO usage (3 months)


Non LHC VO share

http://pprc.qmul.ac.uk/~walker/votable.html


Demo

• Submitting a job

– Helloworld

– Running a script

• Managing data
– Copying a file (LFC) – see the sketch below
– Mounting via WebDAV
• In future, redirect via LFC
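A hedged sketch of the file-copying step using the gLite data-management tools (the VO, storage element, LFC host and paths are illustrative): copy a local file to a storage element, register it in the LFC, then list and fetch it back.

export LFC_HOST=lfc.example.ac.uk        # assumed LFC endpoint
# Copy the file to a storage element and register a logical file name.
lcg-cr --vo londongrid -d se.example.ac.uk \
       -l lfn:/grid/londongrid/walker/test.dat file:$PWD/test.dat
# List the catalogue entry, then copy the file back locally.
lfc-ls -l /grid/londongrid/walker
lcg-cp --vo londongrid lfn:/grid/londongrid/walker/test.dat file:$PWD/test-copy.dat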


Hello world example

walker@heplt019:~/talks/2013-daresbury/londongrid-example$ cat helloworld.jdl

#############Hello World#################

Executable = "/bin/echo";

Arguments = "Hello welcome to londongrid ";

StdOutput = "hello.out";

StdError = "hello.err";

OutputSandbox = {"hello.out","hello.err"};

######################################


“Live Demo – mounting storage”

heplt019:~# mount -t davfs https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ /mnt/liverpool
Please enter the username to authenticate with server

https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ or hit enter for none.

Username:

Please enter the password to authenticate user with server

https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ or hit enter for none.

Password:

Please enter the password to decrypt client

certificate /etc/davfs2/certs/private/my_cert.p12.

Password:
/sbin/mount.davfs: the server certificate is not trusted

issuer: Authority, eScienceCA, UK

subject: CSD, Liverpool, eScience, UK

identity: hepgrid11.ph.liv.ac.uk

fingerprint: 34:c1:2d:63:57:2d:ff:07:10:21:cc:1d:a7:7a:ad:58:f9:bd:4d:b0

You only should accept this certificate, if you can

verify the fingerprint! The server might be faked

or there might be a man-in-the-middle-attack.

Accept certificate for this session? [y,N] y

/sbin/mount.davfs: warning: the server does not support locks


“Live Demo”

heplt019:~# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda3 90G 78G 7.2G 92% /

tmpfs 2.0G 0 2.0G 0% /lib/init/rw

udev 2.0G 316K 2.0G 1% /dev

tmpfs 2.0G 0 2.0G 0% /dev/shm

https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/

26G 13G 13G 50% /mnt/liverpool

heplt019:~# cd /mnt/liverpool/atlas/atlasscratchdisk

heplt019:/mnt/liverpool/atlas/atlasscratchdisk# echo "hello webdav" >cjwhello

heplt019:/mnt/liverpool/atlas/atlasscratchdisk# cat cjwhello

hello webdav


What does this mean for you?

• GridPP expertise: Big Data

– Compute

– Data transfer over wide-area networks

– Federated access

• If your problem fits our solution:

– Talk to us

• Share experience

• Some resources

– Scientific Linux (RHEL/CentOS compatible)


Lessons

• The Grid is good for
– embarrassingly parallel problems
• Need to deal with failure
– Bookkeeping is difficult
– Ganga and DIRAC offer solutions
• FTS for file transfers (see the sketch below)
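A hedged sketch of an FTS transfer (the FTS endpoint and source/destination URLs are illustrative; option names may differ between FTS versions):

# Submit a managed third-party transfer between two storage elements.
fts-transfer-submit -s https://fts3.example.ac.uk:8446 \
    gsiftp://se1.example.ac.uk/dpm/example.ac.uk/home/vo/file.dat \
    gsiftp://se2.example.ac.uk/dpm/example.ac.uk/home/vo/file.dat
# The submit command prints a job ID; poll it with:
fts-transfer-status -s https://fts3.example.ac.uk:8446 <job-id>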


What does this mean for you?

• GridPP expertise: Big Data

– Compute

– Storage

– Data transfer over wide-area networks

– Federated access

• Lots of people accessing the same data

• How do I learn more?

– Talk to me

– Talk to your local grid admin (high energy physics group)


GridPP sites in the UK

• If you want to know more about GridPP, talk to your local site admin


Conclusions

• Overview of Grid computing

– LHC and the Higgs

• GridPP

• Demo

– Submitted some jobs

– Transferred some data


Acknowledgements

• GridPP

• David Britton

– Many of the slides