Date post: | 29-Jan-2016 |
Update on DAQ Upgrade R&D with RCE/CIM and ATCA platform
Rainer Bartoldus, Martin Kocian, Andy Haas, Mike Huffer,
Su Dong, Emanuel Strauss, Matthias Wittgen
Prelude
• Generic DAQ R&D at SLAC with the RCE (Reconfigurable Cluster Element) and CIM (Cluster Interconnect Module) on the ATCA platform is being adapted to ATLAS DAQ upgrade R&D.
• Many previous communications, e.g.:
– Mike Huffer at ACES Mar/09 (& sessions of last ATUW):
  • http://indico.cern.ch/materialDisplay.py?contribId=51&sessionId=25&materialId=slides&confId=47853
– Rainer Bartoldus at ROD workshop Jun/09:
  • http://indico.cern.ch/materialDisplay.py?contribId=16&sessionId=4&materialId=slides&confId=59209
• RCE training workshop at CERN June/09:
– http://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=57836 with introductions, instructions and discussions as the current source of documentation.
• A collaborative R&D open to all:
– Shared RCE test stand at CERN (e-mail Rainer to get an account): https://twiki.cern.ch/twiki/bin/view/Atlas/RCEDevelopmentLab
– E-group for communications: atlas-highlumi-RCE-development, open for signup.
– Everyone is welcome to explore!
Essential Features of RCE on ATCA
• Generic DAQ concept with RCE born out of analysis of previous HEP DAQ systems to establish basic building blocks serving common needs of broad range of applications.
• Explore modern System-On-Chip technology, e.g. Virtex-4 FPGAs with versatile integrated resources.
• High speed I/O capabilities for multi Gb/s transmissions to fully utilize FPGA processing power and reduce system footprint.
• Implementation over ATCA based crate infrastructure to benefit from modern telecommunication technology.
• A system consists of RCE processing boards and CIM interconnect modules to utilize ATCA point-point serial backplane connections for high bandwidth data movements and 10GE ethernet access.
• Rear Transition Modules (RTM) to facilitate custom user I/O.
• Extensive software infrastructure and utilities are an integral part of the design.
Reconfigurable Cluster Element (RCE)
[Block diagram: RCE, current implementation on Virtex-4 FPGA]
• Core: processor (450 MHz PPC-405) with cross-bar; memory subsystem (512 MByte RLDRAM-II); configuration (128 MByte flash); boot options (reset & bootstrap options); DX ports.
• Resources: MGTs; DSP tiles (192 MAC units); combinatoric logic.
• Next generation with Virtex 5 & 6: RCE memory 2 GBytes.
• + Extensive associated software infrastructure and utilities.
RCE Hardware Resources
• Multi-Gigabit Transceivers (MGTs)
– up to 12 channels of:
  • SER/DES
  • input/output buffering
  • clock recovery
  • 8b/10b encoder/decoder
  • 64b/66b encoder/decoder
– each channel can operate at up to 6.5 Gb/s
– channels may be bound together for greater aggregate speed
• Combinatoric logic
  • gates
  • flip-flops (block RAM)
  • I/O pins
• DSP support
– contains up to 192 Multiply-Accumulate-Add (MAC) units
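As a rough illustration of the transceiver figures above, the following sketch computes the nominal aggregate line rate of 12 channels at 6.5 Gb/s and the payload rate left after 8b/10b or 64b/66b line coding. The function name and the assumption that every channel runs at the maximum rate are illustrative only; they are not taken from the RCE firmware or software.

```python
# Nominal figures from the slide: up to 12 MGT channels, up to 6.5 Gb/s each.
CHANNELS = 12
LINE_RATE_GBPS = 6.5

# Line-coding efficiency: payload bits per transmitted bit.
CODING_EFFICIENCY = {"8b/10b": 8 / 10, "64b/66b": 64 / 66}


def payload_rate_gbps(channels, line_rate_gbps, coding):
    """Payload rate in Gb/s after line-coding overhead.

    Assumes every channel runs at the same line rate, which is an
    illustrative simplification rather than a measured configuration.
    """
    return channels * line_rate_gbps * CODING_EFFICIENCY[coding]


if __name__ == "__main__":
    print(f"raw aggregate: {CHANNELS * LINE_RATE_GBPS:.1f} Gb/s")
    for coding in CODING_EFFICIENCY:
        rate = payload_rate_gbps(CHANNELS, LINE_RATE_GBPS, coding)
        print(f"{coding}: {rate:.1f} Gb/s payload")
```

The choice of coding matters mostly for budgeting: 8b/10b gives up 20% of the line rate, 64b/66b about 3%.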
RCE Software & Development
• Cross-development…
– GNU cross-development environment (C & C++)
– remote (network) GDB debugger
– network console
• Operating system support…
– Bootstrap loader
– Open Source Real-Time kernel (RTEMS)
  • POSIX compliant interfaces
  • Standard IP network stack
– Exception handling support
• Object-Oriented emphasis:
– Class libraries (C++)
  • Plugin support
  • Configuration Interface
RCE board + RTM (Petacache project example)
[Photo labels: Media Carrier with flash, Media Slice controller, RCE, Zone 1 (power), Zone 2, Zone 3, transceivers, RTM]
Cluster Interconnect board + RTM
[Photo labels: CI, Zone 1, Zone 3, 10 GE switch, XFP, 1G Ethernet, RCE, RTM]
RCE Development Lab at CERN
Application of RCE to Pixel Calibration
Pixel digital calibration demo by Martin Kocian
[Figure panels: "After a few mask stages", "End of calibration"]
Demonstrated at the RCE training workshop, Jun/15-16/2009 at CERN. A similar setup was used to test IBL ½ stave electrical data transmission with 16 channels.
[Setup diagram labels: Existing Pixel Module, 3 Gb/s, /CIM, 10-GE Ethernet, HSIO]
RCE Development Status
• RCE R&D has already moved to production systems for Linac Coherent Light Source (LCLS) controls and experiments at SLAC. The same RCE board is used for many R&D projects: Petacache, LSST and ATLAS upgrade.
• A significantly upgraded generation-2 RCE with Xilinx Virtex-5 is envisioned for the coming year; improvements include larger memory and more user firmware space.
• Exploring RCE for ATLAS pixel upgrade/IBL: work is underway to port current pixel calibrations to RCEs, aiming at FE-I4 tests and IBL stave-0.
• A companion I/O board (HSIO) is widely used for ATLAS Si strip detector upgrade test stands. A compact RCE+HSIO test stand board is planned for the near future.
RCE+HSIO Test Stand Board
• RCE boards have a strong software base for flexible and fast development, but are rather bulky with the ATCA crate infrastructure and carry excess resources not needed for a test stand.
• HSIO has a large variety and multiplicity of I/O channels to serve a wide range of applications, but its vast FPGA resources are not easy to exploit when coding only in firmware.
• Dave Nelson is working on a combined test stand board merging RCE and HSIO:
– A slimmed down single-FPGA RCE and software support
– A separate Virtex-5 FPGA plays the original HSIO role
– Same variety of I/O channels as HSIO
– Same simple stand-alone bench operation as HSIO with just an external 48V supply, but can also just plug into an ATCA crate
Applications for ATLAS DAQ upgrade
• The original investigation was of a possible common ROD for most subsystems, and a new combined ROD+ROS architecture to drastically improve throughput for the phase-2 upgrade.
• The R&D is already mature enough to allow serious consideration of the RCE/CIM concept for Phase-1 upgrade needs:
– RODs for IBL
– RODs for forward muon upgrades
– RODs for AFP (detector very similar to IBL)
– Potential benefit of a high-throughput ROS?
Must be able to live within the current TTC/TDAQ architecture.
A possible 48-channel ROM (Readout Module)
sLHC Upgrade Read-Out-Crate (ROC)
[Block diagram labels: ROMs, CIM (two), Rear Transition Modules, 10-GE switches, P3, Backplane, Shelf Management, switch management, L1 fanout, from L1, To monitoring & control, To L2 & Event Building; link rates (x12) 10 Gb/s and (x4) 10 Gb/s]
The upgrade path for IBL ROD
• Changes to ROD/BOC and DAQ are needed in any case:
– Data links at 160 MHz need at least a new BOC (Back of Crate) and an associated ROD firmware change.
– IBL uses FE-I4 and 16 FEs per half stave, so some code changes are necessary anyway.
– The upgraded detector needs faster & more frequent calibration.
– Difficulty with obsolete parts for maintaining the current design.
• Is there a forward-looking upgrade path with modern technology for higher performance that still fits into the phase-1 timescale?
– Generic RCE R&D with ATCA is adoptable on the IBL time scale for its DAQ and test needs at earlier stages.
IBL ROD VME Baseline
Reproducing existing RODs and living with present bandwidth limitations by deploying a large number of boards.
IBL ROD Upgrade Scheme
Initial mode: pure ROD behavior to output via S-link to ROS
Upgrade Mode: combined ROD+ROS behavior directly output to Ethernet.
[Diagram label: Read Out Module]
IBL Upgrade Hardware Components (I)
• ROM
– Regular ROM assumes all functionalities of the present ROD, with room to host ROS functionalities.
– Each ROM has 6 FPGAs hosting 12 RCEs
  • process 40 x 160 Mb/s input FEs with 10 RCEs (each RCE's share of 640 Mb/s is 'trivial' compared to the expected capacity).
  • Event building for S-link/Ethernet output with 2 RCEs.
– RCE includes all resources for data formatting, DAQ data flow, calibration + memory in the present ROD.
• RTM (ROM)
– Similar front-end communication roles to the present BOC, while S-links are simpler Snap12 transceivers.
– 40-channel compact optical I/O with TX/RX, same as current BOC. TX/RX control with FPGA via I2C from ROM.
– No need to deal with 8b/10b encoding as the RCE has embedded native utilities to encode/decode.
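The 640 Mb/s per-RCE figure quoted for the ROM follows from simple arithmetic on the numbers given: 40 front-end links at 160 Mb/s shared by 10 input RCEs. A minimal sketch, assuming the load is split evenly (the function name is illustrative):

```python
FE_LINKS = 40          # front-end input links per ROM (from the slide)
LINK_RATE_MBPS = 160   # rate per front-end link in Mb/s (from the slide)
INPUT_RCES = 10        # RCEs assigned to input processing (from the slide)


def per_rce_share_mbps(links, link_rate_mbps, rces):
    """Average input bandwidth per RCE in Mb/s, assuming an even split."""
    return links * link_rate_mbps / rces


if __name__ == "__main__":
    total = FE_LINKS * LINK_RATE_MBPS
    share = per_rce_share_mbps(FE_LINKS, LINK_RATE_MBPS, INPUT_RCES)
    print(f"total ROM input: {total} Mb/s")
    print(f"links per RCE:   {FE_LINKS // INPUT_RCES}")
    print(f"share per RCE:   {share:.0f} Mb/s")
```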
IBL Upgrade Hardware Components (II)
• CIM
– Assumes the network interconnect management and external interface roles to cover present SBC and TIM functionalities.
– RCE master + 2 Fulcrum FM224 ASICs for 10-GE network switching.
• RTMc (CIM)
– Ethernet I/O connections.
– Some functionalities of present TIM and drivers for I/O with the pixel system TTC crate.
TTC Distribution in RCE/CIM crate
Distributed interface with TTCrx ASIC paired with each RCE
Upgrade ROM Benefits for IBL Case
• Allows more frequent/extensive/faster calibration
– Calibration histogram data output path via 10-GE Ethernet will completely remove data shipping timing concerns.
– 4x (12x) more memory per pixel than the baseline IBL ROD (current outer layer ROD), and the memories are internal to the RCE with much faster access.
– PowerPC programming environment is much easier than DSPs for complex algorithms, while the 192 DSP tiles per RCE offer large processing power for repetitive simple processing.
• Smaller-footprint modern hardware for easier production, installation and maintenance.
• A simpler variation of the ROM with present RCEs offers prototype and test stand boards to meet FE-I4 test and stave test needs, with the same software preserved into the full system.
• Has built-in flexibility for architecture evolution, to explore upgrade schemes such as integrated ROD+ROS and potential services to the trigger with the very high bandwidth.
Backward Compatibility & Commissioning
• Despite the different look of the hardware, the user interface will be no different from that of the existing pixel detector, and the interface to the rest of pixel DAQ and TDAQ will also look like just another pixel crate (until we try to become ROD+ROS).
• Most existing DAQ/calibration DSP code is adoptable, with much less development effort needed compared to the original calibration implementation.
• The new system can also be made to run on the present b-layer, so that fiber splitting can be done early on with the real system as parasitic DAQ commissioning (as extensively used in BaBar/Tevatron).
• Switching between S-link and ROD+ROS mode can potentially be done without touching hardware.
Summary (I)
• RCE/ATCA R&D is already well advanced, with prototypes being used for IBL/Pixel upgrade testing.
• Investigation of the full readout crate for IBL indicates that the RCE ROM can easily meet the IBL ROD requirements and offers extra margin for much improved performance.
• The project is very much realizable on the IBL time frame owing to the well advanced R&D already carried out at SLAC for other projects.
• The upgrade system has a small hardware footprint and lower hardware cost than VME systems.
• The application software effort will benefit from integrated core software utilities, making progress easier.
• There is a full suite of test prototypes, promising the same software for tests and final DAQ/calibration.
Summary (II)
• The application for other subsystems (e.g. forward muon) may be simpler if the inputs are also Glinks like S-link. A more flexible configuration is possible for the symmetric Glink I/O. We are interested in pure DAQ use cases where this cannot be easily adopted.
• There is sufficient flexibility to allow reconfiguring the architecture to very different modes, including the classical mode fully compatible with current architecture.
• Exploring other possibilities, e.g. L1.5 triggers with similar architecture?
We believe there is a viable path for ATLAS to evolve smoothly into a modern DAQ architecture even before phase-2.
Backup
Why ATCA as a packaging standard?
• An emerging telecom standard…
• Its attractive features:
– backplane & packaging available as a commercial solution
– generous form factor
  • 8U x 1.2” pitch
– hot swap capability
– well-defined environmental monitoring & control
– emphasis on High Availability
– external power input is low voltage DC
  • allows for rack aggregation of power
• Its very attractive features:
– the concept of a Rear Transition Module (RTM)
  • allows all cabling to be on rear (module removal without interruption of cable plant)
  • allows separation of data interface from the mechanism used to process that data
– high speed serial backplane
  • protocol agnostic
  • provision for different interconnect topologies
Three building block concepts
• Computational elements
– must be low-cost
  • $$$
  • footprint
  • power
– must support a variety of computational models
– must have both flexible and performant I/O
• Mechanism to connect together these elements
– must be low-cost
– must provide low-latency/high-bandwidth I/O
– must be based on a commodity (industry) protocol
– must support a variety of interconnect topologies
  • hierarchical
  • peer-to-peer
  • fan-in & fan-out
• Packaging solution for both element & interconnect
– must provide High Availability
– must allow scaling
– must support different physical I/O interfaces
– preferably based on a commercial standard
• The Reconfigurable Cluster Element (RCE)
– employs System-On-Chip technology (SOC)
• The Cluster Interconnect (CI)
– based on 10-GE Ethernet switching
• ATCA
– Advanced Telecommunication Computing Architecture
– crate based, serial backplane
The Cluster Interconnect (CI)
• Based on two Fulcrum FM224s
– 24 port 10-GE switch
– is an ASIC (packaging in 1433-ball BGA)
– XAUI interface (supports multiple speeds including 100-BaseT, 1-GE & 2.5 Gb/s)
– less than 24 watts at full capacity
– cut-through architecture (packet ingress/egress < 200 ns)
– full Layer-2 functionality (VLAN, multiple spanning tree etc.)
– configuration can be managed or unmanaged
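To put the switch figures in perspective, the sketch below sums the nominal port line rates of the two 24-port 10-GE switches and derives the implied upper bound on power per port. This is back-of-envelope arithmetic on the quoted numbers only: it is not a measured throughput, it ignores any ports used to link the two switches together, and the function names are illustrative.

```python
PORTS_PER_SWITCH = 24   # from the slide
PORT_RATE_GBPS = 10     # 10-GE ports, from the slide
SWITCHES_PER_CI = 2     # "based on two Fulcrum FM224s"
MAX_POWER_W = 24        # quoted upper bound per switch at full capacity


def nominal_port_capacity_gbps(switches, ports, rate_gbps):
    """Sum of port line rates in Gb/s (nominal, not measured throughput)."""
    return switches * ports * rate_gbps


def max_power_per_port_w(max_power_w, ports):
    """Upper bound on power per port implied by the per-switch bound."""
    return max_power_w / ports


if __name__ == "__main__":
    cap = nominal_port_capacity_gbps(SWITCHES_PER_CI, PORTS_PER_SWITCH, PORT_RATE_GBPS)
    print(f"nominal port capacity per CI: {cap} Gb/s")
    print(f"power per port: < {max_power_per_port_w(MAX_POWER_W, PORTS_PER_SWITCH):.1f} W")
```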
[Diagram labels: Management bus, RCE, 10-GE L2 switch, Q0, Q1, Q2, Q3]
Derived configuration - Cluster Element (CE)
[Block diagram labels: Core, Combinatoric logic, MGTs, PGP, Ethernet MAC, E0, E1; link rates 1.0/2/5/10.0 Gb/s and 3.125 Gb/s]
Cluster Interconnect board + RTM (Block diagram)
[Block diagram labels: MFD, CI, Q0, Q1, Q2, Q3, P2, P3, Payload RTM, XFP, 1-GE, 10-GE, base, fabric]
Typical (5 slot) ATCA crate
[Figure labels: fans, CI RTM, RCE RTM, CI board, RCE board, Power supplies, Shelf manager, Front, Back]
IBL Readout Production Cost Estimate
• Prototyping is expected to add ~100K$ (?)
• No longer needs SBC and TIM

Items  | Quantity      | Unit cost (K$) | Sum cost (K$)
ROM    | 12 + 6 spares | 8              | 144
RTM    | 12 + 6 spares | 3?             | 54
CIM    | 2 + 3 spares  | 5              | 25
RTMc   | 2 + 3 spares  | 3              | 15
Crates | 1 + 3 spares  | 5              | 20
Total  |               |                | 255
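The per-item sums in the estimate can be cross-checked as (quantity + spares) x unit cost. The sketch below does this with the numbers as listed; the data layout and function names are illustrative. The individual rows reproduce the listed sums, but they add up to 258 K$ rather than the quoted total of 255 K$, so one of the entries may have been rounded or updated separately.

```python
# (item, installed, spares, unit cost in K$) as listed in the estimate.
ITEMS = [
    ("ROM", 12, 6, 8),
    ("RTM", 12, 6, 3),    # unit cost is marked "3?" in the estimate
    ("CIM", 2, 3, 5),
    ("RTMc", 2, 3, 3),
    ("Crates", 1, 3, 5),
]


def row_cost(installed, spares, unit_cost):
    """Cost of one line item in K$, counting installed units plus spares."""
    return (installed + spares) * unit_cost


def total_cost(items):
    """Sum of all line items in K$."""
    return sum(row_cost(n, s, c) for _, n, s, c in items)


if __name__ == "__main__":
    for name, n, s, c in ITEMS:
        print(f"{name:7s} {n + s:3d} x {c} K$ = {row_cost(n, s, c)} K$")
    print(f"Sum of rows: {total_cost(ITEMS)} K$")
```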
TTC ROD busy