Network Architecture for the LHCb DAQ Upgrade
Guoming Liu, CERN, Switzerland
Upgrade DAQ Miniworkshop
May 27, 2013
Introduction to the LHCb DAQ Upgrade: numbers
Potential network technologies for the DAQ upgrade
DAQ network architecture
DAQ schemes
Summary
Outline
Timeframe: installation during the second long shutdown of the LHC in 2018; ready for data taking in 2019
Trigger: a fully flexible software solution
  Low Level Trigger (LLT): tunes the input rate to the computing farm between 1 and 40 MHz while the system is not yet fully ready for 40 MHz
The DAQ system should be capable of reading out the whole detector at the LHC collision rate of 40 MHz
Numbers for the DAQ network:
  Event size: ~100 KB
  Max. event input rate: 40 MHz
  Unidirectional bandwidth: ~38.4 Tbit/s (may scale up)
LHCb DAQ Upgrade
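The bandwidth figure on this slide follows from multiplying the event size by the event rate. Below is a minimal sketch of that arithmetic, assuming decimal units (1 KB = 1000 bytes); the function names are illustrative and not taken from the talk. Exactly 100 KB at 40 MHz gives 32 Tbit/s, while the quoted ~38.4 Tbit/s corresponds to about 120 KB per event. The slides do not say where the difference comes from.

```python
def required_bandwidth_tbps(event_size_bytes: float, event_rate_hz: float) -> float:
    """One-way aggregate bandwidth (Tbit/s) needed to ship every event."""
    return event_size_bytes * 8 * event_rate_hz / 1e12


def implied_event_size_bytes(bandwidth_tbps: float, event_rate_hz: float) -> float:
    """Event size (bytes) that a given aggregate bandwidth can sustain."""
    return bandwidth_tbps * 1e12 / (8 * event_rate_hz)


RATE_HZ = 40e6  # LHC collision rate quoted on the slide

# Assumption: "~100 KB" is read as exactly 100 000 bytes.
nominal_tbps = required_bandwidth_tbps(100e3, RATE_HZ)

# Event size that matches the quoted ~38.4 Tbit/s at 40 MHz.
implied_bytes = implied_event_size_bytes(38.4, RATE_HZ)

print(f"100 KB at 40 MHz needs {nominal_tbps:.1f} Tbit/s")
print(f"38.4 Tbit/s at 40 MHz carries {implied_bytes / 1e3:.0f} KB per event")
```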
High-speed interconnection technologies:
  Ethernet (10G/40G/100G)
  InfiniBand (FDR, upcoming EDR)
  Some other similar technologies
Ethernet:
  Very popular for desktops, workstations and servers
  Familiar to users and developers
InfiniBand:
  Mainly used in high-performance computing and large enterprise data centers
  High speed: 56 Gb/s (FDR)
  Very good performance/price ratio
Network Technologies
Ethernet vs InfiniBand
                  Ethernet                                                   InfiniBand
Reliability       Best effort; relies on an upper-layer protocol (TCP/IP)    Hardware-based retransmission
Flow control      Pause frames, temporarily blocking the transmission        Credit-based
Switching method  Store-and-forward or cut-through                           Cut-through
Buffer size       Large (store-and-forward) or small (cut-through)           Small
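To illustrate why credit-based flow control gets by with small buffers, here is a toy simulation. It is a sketch under simplifying assumptions, not a model of either standard: the sender offers one packet per tick, the receiver drains one packet every `drain_every` ticks, and the "best effort" case has no flow control at all, so overflow packets are dropped and recovery is left to an upper-layer protocol. All names and parameters are hypothetical.

```python
from collections import deque


def simulate(ticks, buffer_slots, drain_every, credit_based):
    """Return (delivered, dropped, stalled) after `ticks` time steps."""
    buffer = deque()
    credits = buffer_slots  # sender's view of free slots (credit mode only)
    delivered = dropped = stalled = 0
    for tick in range(ticks):
        # Receiver: consume one packet every `drain_every` ticks.
        if tick % drain_every == 0 and buffer:
            buffer.popleft()
            delivered += 1
            credits += 1  # the freed slot is advertised back to the sender
        # Sender: offers exactly one packet per tick.
        if credit_based:
            if credits > 0:
                buffer.append(tick)
                credits -= 1
            else:
                stalled += 1  # sender waits; nothing is lost
        elif len(buffer) < buffer_slots:
            buffer.append(tick)
        else:
            dropped += 1  # buffer overflow; packet is lost
    return delivered, dropped, stalled


with_credits = simulate(20, buffer_slots=4, drain_every=2, credit_based=True)
best_effort = simulate(20, buffer_slots=4, drain_every=2, credit_based=False)
print("credit-based (delivered, dropped, stalled):", with_credits)
print("best effort  (delivered, dropped, stalled):", best_effort)
```

In this toy model both variants deliver the same number of packets, but the credit-based sender stalls instead of losing data, which is the behaviour the table summarises as "credit-based" flow control with small buffers.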
Readout board (TELL1): custom FPGA board
UDP-like transport protocol: MEP (Multi-Event Packet)
Push DAQ scheme
Deep buffers are required in the routers and the switches
Review: Current LHCb DAQ
[Figure: the current LHCb DAQ. The front-end electronics (FEE) of the detector (VELO, ST, OT, RICH, ECal, HCal, Muon) and the L0 Trigger feed the readout boards. Event data flow from the readout boards through two core routers and a layer of switches to the CPUs of the HLT farm, where event building takes place. The TFC / Readout Supervisor distributes the timing and fast control signals and receives MEP requests from the farm ("CPU n: DataReq"); the fragments of event m ("Evt m Frag.") are sent with destination n ("Evt m, Dest n").]
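The push scheme in the figure can be sketched in a few lines. This is a simplified illustration, not the actual TELL1 or MEP implementation: it ships one event per packet (a real MEP packs several events), and the classes `ReadoutSupervisor` and `FarmNode` as well as the round-robin request pattern are hypothetical.

```python
from collections import defaultdict, deque


class ReadoutSupervisor:
    """Hypothetical stand-in for the TFC / Readout Supervisor."""

    def __init__(self):
        self.requests = deque()  # farm nodes that have asked for data

    def mep_request(self, cpu_id):
        self.requests.append(cpu_id)

    def assign(self, event_id):
        """Destination for this event, or None if no node requested data."""
        return self.requests.popleft() if self.requests else None


class FarmNode:
    """Collects fragments and builds an event once all boards have sent."""

    def __init__(self, n_boards):
        self.n_boards = n_boards
        self.fragments = defaultdict(dict)  # event_id -> {board_id: payload}
        self.built = []

    def receive(self, event_id, board_id, payload):
        frags = self.fragments[event_id]
        frags[board_id] = payload
        if len(frags) == self.n_boards:  # event complete
            self.built.append((event_id, [frags[b] for b in sorted(frags)]))
            del self.fragments[event_id]


def run(n_boards=3, n_nodes=2, n_events=4):
    supervisor = ReadoutSupervisor()
    nodes = {n: FarmNode(n_boards) for n in range(n_nodes)}
    for event_id in range(n_events):
        supervisor.mep_request(event_id % n_nodes)  # "CPU n: DataReq"
        dest = supervisor.assign(event_id)          # "Evt m, Dest n"
        for board in range(n_boards):               # every board pushes its fragment
            nodes[dest].receive(event_id, board, f"evt{event_id}-frag{board}")
    return {n: [evt for evt, _ in node.built] for n, node in nodes.items()}


built_per_node = run()
print(built_per_node)
```

Because every board pushes towards the same destination at the same time, the fragments converge on one output port, which is why this scheme relies on deep buffers in the routers and switches.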
Unidirectional solution: dataflow in the core network is unidirectional
Bidirectional mixed solution: Readout Units (RU) and Builder Units (BU) are connected to the same Top-of-Rack (TOR) switch; dataflow in the core network is bidirectional
Bidirectional uniform solution: RU and BU are combined in the same server; dataflow in the core network is bidirectional
Network Architecture for DAQ upgrade
All the readout units are connected to the core network
The builder unit and the filter unit are implemented in the same server
The dataflow in the core network is unidirectional
Unidirectional solution
DAQ: Core Network
Monolithic core router vs fabric with fat-tree topology
Monolithic core router (current solution in LHCb)
  pros: “simple” architecture, good performance
  cons: expensive, not many choices
Fabric with fat-tree topology: many small Top-of-Rack (TOR) switches
  pros: cost efficiency, scalability, flexibility
  cons: complexity
Fabrics are quite popular in data centers: Cisco FabricPath, Juniper QFabric, and also other large chassis …
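To give a feel for how a fabric of small switches scales, the sketch below sizes a non-blocking two-level fat tree built from identical switches. The 36-port radix in the example is an assumption chosen for illustration and does not come from the talk; the function name is hypothetical.

```python
def two_level_fat_tree(radix: int) -> dict:
    """Size a non-blocking two-level fat tree built from `radix`-port switches.

    Each leaf (TOR) switch uses half of its ports for hosts and half for
    uplinks, one uplink to every spine switch.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be a positive even number")
    down_ports = radix // 2      # host-facing ports per leaf
    spine_switches = radix // 2  # one per leaf uplink
    leaf_switches = radix        # each spine port connects one leaf
    return {
        "leaf_switches": leaf_switches,
        "spine_switches": spine_switches,
        "total_switches": leaf_switches + spine_switches,
        "host_ports": leaf_switches * down_ports,
    }


example = two_level_fat_tree(36)  # assumed radix, for illustration only
print(example)
```

With this layout the number of host ports grows as radix²/2, which shows the scalability listed among the pros, while the number of switches and cables to manage reflects the complexity listed among the cons.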
The builder unit and the filter unit are implemented in the same server
All the readout units are connected to the TOR switches instead of the core network.
Bidirectional mixed solution (1)
The dataflow in the core network is bidirectional
Requires that the RUs and the BU/FUs are close enough to connect to the same TOR switch
This can save up to 50% of the bandwidth and ports in the core network
A port in the core network is usually 3 to 4 times more expensive than a port in a TOR switch
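A rough way to see where the "up to 50%" comes from is to count core ports. The model below is an assumption made for illustration, not taken from the talk: in the unidirectional case each readout link occupies one core port going in and the same bandwidth occupies another core port going out, so every full-duplex port is used in one direction only; in the bidirectional case one TOR uplink carries both directions, and the readout units instead occupy cheaper TOR ports. Prices are expressed in units of one TOR port.

```python
def core_ports(n_readout_links: int, bidirectional: bool) -> int:
    """Core ports needed under the simplified model described above."""
    return n_readout_links if bidirectional else 2 * n_readout_links


def relative_cost(n_readout_links: int, core_to_tor_price_ratio: float,
                  bidirectional: bool) -> float:
    """Cost in units of one TOR port (core ports plus extra TOR ports)."""
    extra_tor_ports = n_readout_links if bidirectional else 0
    core_cost = core_ports(n_readout_links, bidirectional) * core_to_tor_price_ratio
    return core_cost + extra_tor_ports


N_LINKS = 500  # hypothetical number of readout links
for ratio in (3, 4):  # price ratio range quoted on the slide
    uni = relative_cost(N_LINKS, ratio, bidirectional=False)
    bi = relative_cost(N_LINKS, ratio, bidirectional=True)
    print(f"ratio {ratio}: unidirectional {uni:.0f}, bidirectional {bi:.0f}, "
          f"saving {100 * (1 - bi / uni):.0f}%")
```

Under these assumptions the core port count halves, while the overall cost saving is smaller than 50% because the readout units still need TOR ports.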
Bidirectional mixed solution (2)
The readout unit and builder unit are implemented in the same server (RU/BU server)
The RU/BU server connects to both the core network (for event building) and the TOR switch (for event filtering)
Bidirectional uniform solution (1)
The dataflow in the core network is bidirectional
Saves up to 50% of the ports in the core network
Possible to choose different network technologies for the core layer (event builder network) and the edge layer (event filter network)
  e.g. cost-effective InfiniBand FDR for the core, low-cost 10GBase-T for the event filter network
Increases the flexibility: deep buffers, easy to implement different DAQ schemes in software
Not tied to any technology
Reduces the complexity of the FPGA receiver card
  No deep buffer is needed
  Simple protocol (e.g. PCIe) with the PC
Bidirectional uniform solution (2)
Key to success of the uniform solution: the RU/BU module
RU/BU modules serve five purposes:
Bidirectional uniform solution (3)
Receives data fragments from the front-end electronics
Sends data fragments to the other modules
Builds complete events
Performs event filtering on a fraction of the complete events
Distributes the remaining events to a sub-farm of filter units
IO bandwidth requirements of RU/BU modules:
  Full 24x GBT links: ~154 Gb/s input and output, or ~215 Gb/s in wide user mode
Preliminary tests on a Sandy Bridge server:
  CPU: Intel E5-2650, 2x16x2.0G
  2x Mellanox dual-port InfiniBand FDR cards (Connect-IB)
  OS: SLC 6.2
  Software: MLNX-OFED 2.0
  The Connect-IB cards send and receive data simultaneously
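One reading that reproduces the quoted IO figures is sketched below. It assumes a GBT user payload of 3.2 Gb/s per link in the standard frame and 4.48 Gb/s in the wide frame, and it assumes that "input and output" means the input from the front-end plus an equal output into the network. Neither assumption is stated on the slide, and the names are illustrative.

```python
GBT_LINKS = 24
GBT_STANDARD_GBPS = 3.2  # assumed user payload per link, standard frame
GBT_WIDE_GBPS = 4.48     # assumed user payload per link, wide frame


def io_requirement_gbps(links: int, per_link_gbps: float) -> float:
    """Input from the front-end plus an equal output towards the network."""
    one_way = links * per_link_gbps
    return 2 * one_way


standard = io_requirement_gbps(GBT_LINKS, GBT_STANDARD_GBPS)
wide = io_requirement_gbps(GBT_LINKS, GBT_WIDE_GBPS)
print(f"standard mode: {standard:.1f} Gb/s, wide mode: {wide:.1f} Gb/s")
```

Under these assumptions the two results round to the ~154 Gb/s and ~215 Gb/s quoted above.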
Bidirectional uniform solution (4)
Preliminary test results: input and output throughput
MLNX-OFED 2.0 is a beta version, but it is needed for the new dual-port cards
In MLNX-OFED 1.5.3, the throughput of the single-port card is close to the limit
More tuning of the OS and the software is needed to improve the performance
Bidirectional uniform solution (5)
[Figure: Input and Output Throughput. Throughput (Gb/s, axis from 100 to 220) of the input and output streams versus fragment size (1k to 64k bytes). Theoretical limit: 217.2 Gb/s.]
Several different DAQ schemes in terms of the data flow:
  Push data without traffic shaping
  Push data with barrel-shift traffic shaping
  Pull data from the destinations
Different schemes fit different network technologies and topologies
More details in Daniel’s talk later
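As an illustration of barrel-shift traffic shaping, the sketch below builds the usual rotating schedule: in time slot t, source i sends to destination (i + t) mod N, so no two sources target the same destination in the same slot. This is a generic textbook form of the technique with hypothetical function names, not necessarily the variant discussed in the later talk.

```python
def barrel_shift_schedule(n_nodes: int) -> list[list[int]]:
    """schedule[slot][source] = destination that `source` sends to in `slot`."""
    return [[(src + slot) % n_nodes for src in range(n_nodes)]
            for slot in range(n_nodes)]


def is_conflict_free(schedule: list[list[int]]) -> bool:
    """True if every destination receives from at most one source per slot."""
    return all(len(set(slot)) == len(slot) for slot in schedule)


schedule = barrel_shift_schedule(4)
for slot, destinations in enumerate(schedule):
    print(f"slot {slot}: " + ", ".join(
        f"{src}->{dst}" for src, dst in enumerate(destinations)))
print("conflict free:", is_conflict_free(schedule))
```

Compared with an unshaped push, where many sources may converge on one destination at once, this rotation keeps each output port at one sender per slot, which reduces the need for deep switch buffers.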
DAQ schemes
Both Ethernet and InfiniBand, or a mix of both, are candidates for the DAQ network upgrade
Several architectures have been discussed; the uniform solution is the most flexible and cost-effective
Preliminary tests show that the uniform solution can work
More studies for the LHCb DAQ network upgrade are needed; stay tuned for developments in industry
Summary