CC* Integration:
SANDIE: SDN-Assisted NDN for Data Intensive Experiments
Edmund Yeh
NSF Campus Cyberinfrastructure and Cybersecurity Innovation for Cyberinfrastructure PI Workshop
CHALLENGES
▪ The LHC program in HEP is the world's largest data-intensive application: handling one exabyte by ~2018 at hundreds of sites
▪ Global data distribution, processing, access, and analysis; large but limited computing, storage, and network resources

APPROACH
▪ Use Named Data Networking (NDN) to redesign the LHC HEP network and optimize workflow

SOLUTIONS + DELIVERABLES
▪ Deploy NDN edge caches with SSDs and 40G/100G network interfaces at 7 sites; combine with larger core caches
▪ Simultaneously optimize caching ("hot" datasets), forwarding, and congestion control in both the network core and the site edges
▪ Develop a naming scheme and attributes for fast access and efficient communication in HEP and other fields

SCIENTIFIC AND BROADER IMPACT
▪ Lay the groundwork for an NDN-based data distribution and access system for data-intensive science fields
▪ Benefit the user community through lowered costs, faster data access, and standardized naming structures
▪ Engage the next generation of scientists in emerging concepts of future Internet architectures for data-intensive applications
▪ Advance, extend, and test the NDN paradigm to encompass the most data-intensive research applications of global extent

TEAM
▪ Northeastern, Caltech, Colorado State
▪ In partnership with other LHC sites and the NDN project team

PI: Edmund Yeh; co-PIs: Harvey Newman, Christos Papadopoulos
Program Area: CC* Integration | Award Number: 1659403
September 25, 2018 | College Park, MD
Edmund Yeh, Professor, Northeastern University, [email protected]
Christos Papadopoulos, Professor, Colorado State University
Harvey Newman, Professor of Physics, [email protected]
Project Title: SANDIE: SDN-Assisted NDN for Data Intensive Experiments
NSF Campus Cyberinfrastructure and Cybersecurity Innovation for Cyberinfrastructure PI Workshop
September 25, 2018 | College Park, MD
Outline
• Named Data Networking (NDN)
• Optimized NDN algorithms
• Optimizing CMS workflows: simulations
• Global testbed
• Integration with current workflow: NDN-based filesystem plugin for XRootD
Named Data Networking (NDN)
• Leading Information-Centric Networking (ICN) architecture.
• Launched by a $7.9M NSF Future Internet Architecture grant (2010).
• Names data instead of endpoints: a paradigm shift.
• Pull-based data distribution architecture with stateful forwarding.
• Native support for caching and multicast.
• Natural fit for data-intensive science.
• Website: http://named-data.net
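For readers new to NDN, here is a toy sketch of the pull model and on-path caching (plain Python, not real NDN software; the two-router topology and the data name are invented for illustration): a consumer's Interest travels hop by hop toward the producer, the first node holding the named Data answers, and the Data is cached at every node it passes on the way back.

```python
# Toy model of NDN's pull-based retrieval with on-path caching.
# Not real NDN code: the Node class, topology, and names are invented.

class Node:
    def __init__(self, label):
        self.label = label
        self.content_store = {}   # name -> Data (the NDN cache)

def fetch(path, producer_content, name):
    """Forward an Interest along `path` toward the producer; the first
    node holding the Data answers, and the Data is then cached at
    every node on the reverse path."""
    for hops, node in enumerate(path, start=1):
        if name in node.content_store:        # cache hit: answer early
            data = node.content_store[name]
            answered = hops
            break
    else:                                      # no hit: ask the producer
        data = producer_content[name]
        answered = len(path) + 1
    for node in path[:answered]:               # on-path caching
        node.content_store[name] = data
    return data, answered

edge, core = Node("edge-router"), Node("core-router")
producer = {"/ndn/xrootd/store/file.root": b"payload"}

_, first = fetch([edge, core], producer, "/ndn/xrootd/store/file.root")
_, second = fetch([edge, core], producer, "/ndn/xrootd/store/file.root")
print(first, second)  # 3 1: the repeat fetch is answered at the first hop
```

The multicast benefit falls out of the same mechanism: concurrent Interests for one name are aggregated in the network, and a single returning Data packet satisfies all of them.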
Optimized NDN Algorithms
• State-of-the-art, theoretically grounded, high-performance optimization frameworks for NDN.
• Joint optimization of caching, forwarding, and congestion control.
• VIP (Virtual Interest Packet) framework (Yeh et al., 2014).
• Adaptive caching and routing framework (Ioannidis & Yeh, 2016, 2017).
• Algorithms have optimality guarantees.
• Algorithms are distributed, adaptive, and dynamic.
• Superior performance in delay, cache hits, and cache evictions.
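A minimal single-node sketch of the VIP idea (illustrative only: the real framework of Yeh et al. 2014 maintains virtual interest counts per object at every node and jointly drives caching and forwarding network-wide; the decay constant and request trace below are invented):

```python
from collections import Counter

def vip_cache(request_trace, cache_size, decay=0.9):
    """Toy single-node VIP-style caching: keep an exponentially decayed
    count of (virtual) interests per object, and cache the objects with
    the highest counts. Returns the resulting cache hit rate."""
    vip = Counter()
    cached, hits = set(), 0
    for name in request_trace:
        if name in cached:
            hits += 1
        for k in vip:
            vip[k] *= decay          # decay old demand
        vip[name] += 1.0             # count the new (virtual) interest
        cached = {k for k, _ in vip.most_common(cache_size)}
    return hits / len(request_trace)

# Invented trace: object "A" is persistently popular, "B" less so,
# "C" only occasionally requested.
trace = ["A", "A", "B", "A", "C", "A", "B", "A"] * 25
print(round(vip_cache(trace, cache_size=2), 2))
```

Objects whose decayed demand stays high remain cached, while one-off requests are evicted quickly; this demand-tracking behavior is one intuition behind the framework's favorable cache-hit and cache-eviction performance.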
Optimizing CMS Workflows: Simulations
• Long-term goal: optimize dynamic caching, distribution of datasets to repositories, and distribution of workflow jobs.
• First goal: given CMS workflow statistics, use simulation to find the performance improvement with the optimal VIP caching and forwarding algorithm (Yeh et al., 2014).
• Collected dataset and request statistics from PhEDEx and Elasticsearch over 2 months at datablock-level granularity (10s to 100s of GB).
• Measured CMS network topology and link capacities using perfSONAR.
• Simulation: apply the VIP algorithm to the 500 most popular datablocks.
• 6 TB caches at all US Tier 1 and Tier 2 sites.
• Result: average data retrieval delays reduced by ~50% for off-site workflow.
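The actual simulation replayed the two months of PhEDEx/Elasticsearch traces over the measured topology; the delay accounting behind a headline number like "~50% lower retrieval delay" can be sketched in a self-contained toy (the popularity law, cache split, and both delay values below are invented, not measured):

```python
import random

def mean_delay(trace, cached, d_cache=0.05, d_origin=0.10):
    """Average retrieval delay when cached blocks cost d_cache seconds
    and all other blocks are fetched from the origin at d_origin.
    Both delay values are illustrative placeholders."""
    return sum(d_cache if b in cached else d_origin for b in trace) / len(trace)

random.seed(0)
blocks = list(range(500))                       # 500 popular datablocks
weights = [1 / (rank + 1) for rank in blocks]   # Zipf-like popularity (invented)
trace = random.choices(blocks, weights=weights, k=20000)

no_cache = mean_delay(trace, cached=set())
with_cache = mean_delay(trace, cached=set(blocks[:50]))  # cache the top 50
print(f"delay reduced by {100 * (1 - with_cache / no_cache):.0f}%")
```

The real result depends on the measured request skew, topology, and the VIP algorithm's joint caching/forwarding decisions, none of which this toy captures; it only shows how skewed popularity plus caching translates into a mean-delay reduction.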
CMS Simulation Topology
• Topology includes all Tier 1 and Tier 2 sites in the US:
  – perfSONAR: average throughput between sites
  – 10 to 100 Gbps typically
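One simple way to hold such a measured topology is a map from site pairs to perfSONAR-derived link capacities. Everything below is an invented stand-in (the site names and Gbps figures are not the measured values):

```python
# Invented stand-in for the measured topology: sites as nodes,
# perfSONAR average throughput (Gbps) as link capacities.
capacity_gbps = {
    ("T1_US_FNAL", "T2_US_Caltech"): 100,
    ("T1_US_FNAL", "T2_US_Nebraska"): 40,
    ("T2_US_Caltech", "T2_US_UCSD"): 10,
}

def link_capacity(a, b):
    """Look up the capacity of an undirected link between two sites."""
    return capacity_gbps.get((a, b)) or capacity_gbps.get((b, a))

print(link_capacity("T2_US_Caltech", "T1_US_FNAL"))  # 100
```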
Global Testbed
• Expand the Caltech SDN testbed and the CSU climate testbed.
• Deploy a few more sites in the USA and abroad (Northeastern, UCSD, U. Nebraska, ..)
• Caches each with:
  ⚫ Several terabytes of SSDs
  ⚫ 40-60 terabytes of SAS disks
  ⚫ 10G to 100G network interfaces
  ⚫ NDN and XRootD software systems
Integration with Current Workflow
• Goals: increased efficiency and reduced complexity for both applications and the network.
• Integrate with XRootD, the de facto data management software.
• First step: implement an NDN-based filesystem plugin for XRootD and a suitable NDN Producer.
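A schematic of that first step, in Python for brevity (the real plugin is C++ code against the XRootD plugin interface; the express_interest stub and the /open, /fstat, /close name components are assumptions for illustration, not the project's actual name scheme):

```python
# Schematic sketch: how filesystem-style calls could map to NDN
# Interest names in an NDN-based XRootD plugin. NOT the real C++
# plugin API; the name layout below is invented for illustration.

PREFIX = "/ndn/xrootd"

def express_interest(name):
    """Stand-in for issuing an NDN Interest and awaiting the Data
    packet (the real consumer also handles validation, timeouts,
    and Nacks). Here we just return the composed name."""
    return name

class NdnXrdFile:
    """Maps the supported calls (open, fstat, close) to Interests."""
    def open(self, path):
        self.path = path
        # MustBeFresh mirrors the Interest parameter shown on the slides
        return express_interest(f"{PREFIX}/open{path}?ndn.MustBeFresh=true")

    def fstat(self):
        return express_interest(f"{PREFIX}/fstat{self.path}")

    def close(self):
        return express_interest(f"{PREFIX}/close{self.path}")

f = NdnXrdFile()
print(f.open("/store/file.root"))
```

The read call is omitted here because it additionally splits the request into per-segment Interests, as described on the consumer slide below.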
Architecture and Flow Diagram
NDN Consumer for XRootD Plugin
• Supports the open, fstat, read, and close system calls.
• Composes Interests for these system calls over NDN:

/ndn/xrootd/……./root/path/for/ndn/xrd/foo?ndn.MustBeFresh=true
(NDN prefix | path to the file of interest | Interest-specific info)

/ndn/xrootd/......./root/test/path/ndn/xrd/foo/%00%00?ndn.MustBeFresh=true
(NDN prefix | path to the file of interest | segment number | Interest-specific info)

• For the read system call, it breaks the request down into segments.
• Handles Interest validation, timeouts, and Nacks.

[Diagram: a read request (offset in file, buffer length) is split across Interests carrying 7 KB NDN packets, from the first Interest to the last.]
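The read path pictured above, mapping a read(offset, length) onto a range of segment Interests with 7 KB NDN packets, can be sketched as follows (the helper name is invented; only the 7 KB packet size comes from the slide):

```python
SEG_SIZE = 7 * 1024  # NDN packet payload size from the slide (7 KB)

def segments_for_read(offset, length, seg_size=SEG_SIZE):
    """Map a read(offset, length) onto the range of NDN segment
    numbers to request, from the first Interest to the last."""
    if length <= 0:
        return range(0)
    first = offset // seg_size
    last = (offset + length - 1) // seg_size
    return range(first, last + 1)

# A 10 KB read starting 3 KB into the file touches segments 0 and 1.
print(list(segments_for_read(offset=3 * 1024, length=10 * 1024)))  # [0, 1]
```

The consumer then issues one Interest per segment number (the %00%00 component in the example name above is an NDN segment-number encoding) and reassembles the returned Data packets into the caller's buffer.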
NDN Producer for XRootD Plugin
• Registers prefixes with NFD.
• Performs the open, fstat, read, and close system calls as required.
• Keeps track of opened files.
• Sends data as strings and non-negative integers.

/ndn/xrootd/……..../root/path/for/ndn/xrd/foo/00/?ndn.MustBeFresh=true&ndn.Nonce=825012545
(NDN prefix | path to the file of interest | Data-specific info)
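The producer's dispatch loop can be sketched as follows (plain Python with an in-memory stand-in filesystem; the /&lt;op&gt;/&lt;path&gt; name layout and the return conventions are assumptions for illustration, while the real producer registers its prefix with NFD and serves actual files):

```python
# Toy sketch of the producer side: it holds a registered prefix, keeps
# track of opened files, and answers Interests by performing the
# corresponding filesystem call. The name layout is invented.

class NdnXrdProducer:
    def __init__(self, prefix="/ndn/xrootd"):
        self.prefix = prefix          # registered with NFD in the real system
        self.open_files = {}          # keeps track of opened files
        self.fs = {"/store/file.root": b"x" * 20000}  # invented content

    def on_interest(self, name):
        """Dispatch an Interest under our prefix to the matching call;
        results are strings / non-negative integers, as on the slide."""
        rest = name[len(self.prefix) + 1:]   # drop "/ndn/xrootd/"
        op, _, tail = rest.partition("/")
        path = "/" + tail
        if op == "open":
            self.open_files[path] = True
            return "0"                        # 0 = success
        if op == "fstat":
            return str(len(self.fs[path]))    # file size in bytes
        if op == "close":
            self.open_files.pop(path, None)
            return "0"
        raise ValueError(f"unsupported op {op!r}")

producer = NdnXrdProducer()
print(producer.on_interest("/ndn/xrootd/fstat/store/file.root"))  # 20000
```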
Demo
• Consumer tracing: using the xrdcp tool to copy a file.
• Producer tracing.
Conclusions
• NDN is a natural fit for the LHC and other large-scale scientific data and computation networks.
• Fully exploits bandwidth, storage, and computation resources.
• Distributed data access by name, with high-performance caching, forwarding, and congestion control.
• Works well with overlay networks with SDN-assisted reserved bandwidth.
• Synergistic with the ongoing move to edge computing, SDN, network function virtualization, and network slicing.
• Possible model for the National Research Platform (NRP).