Network Performance for ATLAS Real-Time Remote Computing Farm Study
Alberta, CERN, Cracow, Manchester, NBI

Posted on 04-Jan-2016

MOTIVATION
Several experiments, including ATLAS at the Large Hadron Collider (LHC) and D0 at Fermilab, have expressed interest in using remote computing farms to process and analyse, in real time, the information from particle collision events. Different architectures have been suggested, from pseudo-real-time file transfer with subsequent remote processing to the real-time requesting of individual events described here.

To test the feasibility of using remote farms for real-time processing, a collaboration was set up between members of the ATLAS Trigger/DAQ community, with support from several national research and education network operators (DARENET, CANARIE, Netera, PSNC, UKERNA and DANTE), to demonstrate a proof of concept and measure end-to-end network performance. The testbed was centred at CERN and used three different types of wide-area high-speed network infrastructure to link the remote sites:
• an end-to-end lightpath (SONET circuit) to the University of Alberta in Canada
• standard Internet connectivity to the University of Manchester in the UK and the Niels Bohr Institute in Denmark
• a Virtual Private Network (VPN) composed of an MPLS tunnel over the GÉANT network and an Ethernet VPN over the PIONIER network to IFJ PAN Krakow in Poland.

Remote Computing Concepts

[Diagram: ATLAS Detectors – Level 1 Trigger → ROBs → Data Collection Network → L2PUs (Level 2 Trigger) → SFIs (Event Builders) → Processing Farms (PF) at CERN B513. Local Event Processing Farms connect over the Back End Network, and SFOs deliver output to mass storage in the experimental area. Remote Event Processing Farms in Copenhagen, Edmonton, Krakow and Manchester are reached via a switch over GÉANT and lightpaths.]

CERN-Manchester TCP Activity

TCP/IP behaviour of the ATLAS Request-Response Application Protocol observed with Web100.

64-byte request in green; 1-Mbyte response in blue. TCP in slow start takes 19 round trips, or ~380 ms.

TCP congestion window in red. This is reset by TCP on each request due to the lack of data sent by the application over the network. TCP obeys RFC 2581 & RFC 2861.
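The effect of the congestion-window reset can be illustrated with a toy model (an idealised sketch of slow start, not a kernel implementation): because congestion window validation collapses cwnd back toward a small restart window whenever the application goes idle between requests, every 1-Mbyte response must climb through slow start again.

```python
# Toy model of slow start with a cwnd reset before every request,
# illustrating why each 1-Mbyte response pays the slow-start cost again.
# MSS and restart-window values below are illustrative assumptions.
MSS = 1460                    # bytes per segment (typical Ethernet MSS)
RESPONSE = 1_000_000          # ~1-Mbyte event response
INITIAL_CWND = 2 * MSS        # assumed restart window after an idle period

def round_trips_to_send(nbytes: int) -> int:
    """Count round trips when cwnd starts at the restart window and
    doubles each RTT (pure slow start, no loss, no delayed ACKs)."""
    cwnd, sent, rtts = INITIAL_CWND, 0, 0
    while sent < nbytes:
        sent += cwnd          # one window of data per round trip
        cwnd *= 2             # slow start: cwnd doubles every RTT
        rtts += 1
    return rtts

# Because cwnd is reset before each request, every response pays this cost;
# without the reset, steady state would keep the final large window.
print(round_trips_to_send(RESPONSE))  # → 9
```

This idealised model gives ~9 round trips; the ~19 observed on the Manchester link reflects real-world effects such as delayed ACKs, which make cwnd grow more slowly than a clean doubling per RTT.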

[Plot: DataBytesOut and DataBytesIn (deltas) vs time, 0–2000 ms; Data Bytes Out axis 0–250000, Data Bytes In axis 0–400.]

[Plot: DataBytesOut (delta), DataBytesIn (delta) and CurCwnd (value) vs time, 0–2000 ms; Data Bytes Out axis 0–250000, Cwnd axis 0–250000.]

[Plot: PktsOut (delta), PktsIn (delta) and CurCwnd (value) vs time, 0–3000 ms; packets axis 0–800, Cwnd axis 0–1200000.]

Observation of the status of standard TCP with Web100

Observation of TCP with no congestion window reduction

TCP congestion window in red grows nicely. Request-response takes 2 RTT after 1.5 s. Rate ~10 events/s with 50 ms processing time.

Transfer achievable throughput grows to 800 Mbit/s. Data is transferred when the application requires it.
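The quoted rate can be checked with simple arithmetic: in steady state each event costs two round trips plus the processing time. The RTT is not stated directly, but ~20 ms is consistent with 19 slow-start round trips taking ~380 ms:

```python
# Back-of-the-envelope steady-state event rate for the CERN-Manchester link.
# The ~20 ms RTT is an inference from 380 ms / 19 slow-start round trips.
rtt_s = 0.380 / 19                          # ~0.020 s per round trip
processing_s = 0.050                        # 50 ms per-event processing (from the slide)
time_per_event = 2 * rtt_s + processing_s   # 2-RTT request-response + processing
rate = 1.0 / time_per_event
print(round(rate, 1))                       # → 11.1
```

This gives ~11 events/s, in line with the quoted ~10 events/s once per-event overheads are included.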

[Plot: TCP achievable throughput (Mbit/s, 0–900) and Cwnd (0–1200000) vs time, 0–8000 ms, with annotations "3 Round Trips" and "2 Round Trips".]

The ATLAS Application Protocol

[Sequence diagram, time running downward: Request event → Send event data → Process event → Request buffer → Send OK → Send processed event, repeated (●●●). Request-response time shown as a histogram.]

Event Filter EFD, SFI and SFO

Event request: the EFD requests an event from the SFI; the SFI replies with the event data.

Processing of the event occurs.

Return of computation: the EF asks the SFO for buffer space; the SFO sends OK; the EF transfers the results of the computation.
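The event-request half of this exchange can be sketched as a minimal socket program (a hypothetical simplification for illustration, not the actual ATLAS DataFlow code): the requester sends a 64-byte request and the server answers with a 1-Mbyte event, mirroring the EFD→SFI pattern measured above.

```python
import socket
import threading

REQUEST_SIZE = 64        # 64-byte event request (as in the measurements)
EVENT_SIZE = 1_000_000   # ~1-Mbyte event response

def sfi_server(conn: socket.socket) -> None:
    """Toy stand-in for the SFI: wait for a request, reply with event data."""
    request = conn.recv(REQUEST_SIZE)
    if request:
        conn.sendall(b"\x00" * EVENT_SIZE)
    conn.close()

def efd_request_event(conn: socket.socket) -> bytes:
    """Toy stand-in for the EFD: send a small request, read the full event."""
    conn.sendall(b"R" * REQUEST_SIZE)
    chunks, received = [], 0
    while received < EVENT_SIZE:
        chunk = conn.recv(65536)
        if not chunk:
            break
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)

# Wire the two ends together in one process for illustration.
efd_end, sfi_end = socket.socketpair()
server = threading.Thread(target=sfi_server, args=(sfi_end,))
server.start()
event = efd_request_event(efd_end)
server.join()
efd_end.close()
print(len(event))  # → 1000000
```

Note the traffic asymmetry this produces: tiny requests one way, large bursts the other, with the connection idle while the event is processed, which is exactly what triggers the cwnd resets seen in the Web100 traces.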

CERN-Alberta TCP Activity

64-byte request in green; 1-Mbyte response in blue. TCP in slow start takes 12 round trips, or ~1.67 s.

Observation of TCP with no congestion window reduction, with Web100

TCP congestion window in red grows gradually after slow start. Request-response takes 2 RTT after ~2.5 s. Rate ~2.2 events/s with 50 ms processing time.

Transfer achievable throughput grows from 250 to 800 Mbit/s.

[Plot: DataBytesOut and DataBytesIn (deltas) vs time, 0–5000 ms; Data Bytes Out axis 0–1000000, Data Bytes In axis 0–400; annotation "2 Round Trips".]

[Plot: PktsOut (delta), PktsIn (delta) and CurCwnd (value) vs time, 0–20000 ms; packets axis 0–700, Cwnd axis 0–1000000.]

[Plot: TCP achievable throughput (Mbit/s, 0–800) and Cwnd (0–1000000) vs time, 0–20000 ms.]

Principal partners

Web100 parameters on the server located at CERN (data source)

Green – small requests; blue – big responses. TCP ACK packets are also counted (in each direction). One response = 1 MB ~ 380 packets.

64-byte request, 1-Mbyte response

CERN-Krakow TCP Activity

Steady-state request-response latency ~140 ms. Rate ~7.2 events/s. The first event takes 600 ms due to TCP slow start.
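The Krakow steady-state rate follows directly from the latency: once slow start is over, each event needs one ~140 ms request-response exchange, so the rate is roughly its reciprocal (a back-of-the-envelope check, not a measurement):

```python
# Back-of-the-envelope check of the CERN-Krakow steady-state event rate.
latency_s = 0.140          # steady-state request-response latency (from the slide)
rate = 1.0 / latency_s     # events/s if exchanges run back-to-back
print(round(rate, 1))      # → 7.1
```

This ~7.1 events/s agrees with the measured ~7.2 events/s, confirming that on this link the per-event cost is dominated by the request-response latency itself.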