Software Defined Services (SDS) For High Performance
Large Scale Science Data Streams Across 100 Gbps
WANs
Joe Mambretti, Director, ([email protected])
International Center for Advanced Internet Research (www.icair.org)
Northwestern University
Director, Metropolitan Research and Education Network (www.mren.org)
Co-Director, StarLight, PI StarLight SDX, PI-iGENI, PI-OMNINet
(www.startap.net/starlight)
The International Conference for High Performance Computing,
Networking, Storage and Analysis SC16
Salt Lake City, Utah
November 13-17, 2016
Sloan Digital Sky Survey (www.sdss.org)
Globus Alliance (www.globus.org)
LIGO (www.ligo.org)
TeraGrid (www.teragrid.org)
ALMA: Atacama Large Millimeter Array (www.alma.nrao.edu)
CAMERA metagenomics (camera.calit2.net)
Comprehensive Large-Array Stewardship System (www.class.noaa.gov)
DØ (DZero) (www-d0.fnal.gov)
ISS: International Space Station (www.nasa.gov/station)
IVOA: International Virtual Observatory (www.ivoa.net)
BIRN: Biomedical Informatics Research Network (www.nbirn.net)
GEON: Geosciences Network (www.geongrid.org)
ANDRILL: Antarctic Geological Drilling (www.andrill.org)
GLEON: Global Lake Ecological Observatory Network (www.gleon.org)
Pacific Rim Applications and Grid Middleware Assembly (www.pragma-grid.net)
CineGrid (www.cinegrid.org)
Carbon Tracker (www.esrl.noaa.gov/gmd/ccgg/carbontracker)
XSEDE (www.xsede.org)
LHCONE (www.lhcone.net)
WLCG (lcg.web.cern.ch/LCG/public/)
OOI-CI (ci.oceanobservatories.org)
OSG (www.opensciencegrid.org)
SKA (www.skatelescope.org)
NG Digital Sky Survey
ATLAS
Compilation By Maxine Brown
Macro Network Science Themes
• Transition From Legacy Networks To Networks That
Take Full Advantage of IT Architecture and Technology
• Extremely Large Capacity (Multi-Tbps Streams)
• High Degrees of Communication Services
Customization
• Highly Programmable Networks
• Network Facilities As Enabling Platforms for Any Type
of Service
• Network Virtualization
• Highly Distributed Processes
National Science Foundation’s Global
Environment for Network Innovations (GENI)
• GENI Is Funded By The National Science Foundation’s Directorate
for Computer and Information Science and Engineering (CISE)
• GENI Is a Virtual Laboratory For Exploring Future Internets At
Scale.
• GENI Is Similar To Instruments Used By Other Science Disciplines,
e.g., Telescopes (Astronomy), Synchrotrons (HEP)
• GENI Creates Major Opportunities To Understand, Innovate and
Transform Global Networks and Their Interactions with Society.
• GENI Is Dynamic and Adaptive.
• GENI Opens Up New Areas of Research at the Frontiers of Network
Science and Engineering, and Increases the Opportunity for
Significant Socio-Economic Impact.
Global Environment for Network Innovations
(GENI)
• GENI
– Supports At-Scale Experimentation on Shared, Heterogeneous,
Highly Instrumented Infrastructure
– Enables Deep Programmability Throughout the Network
– Promotes Innovations in Network Science, Security,
Technologies, Services and Applications
– Provides Collaborative and Exploratory Environments for
Academia, Industry and the Public to Catalyze Groundbreaking
Discoveries and Innovation.
Future Cyberinfrastructure
• Large Scale Highly Distributed Infrastructure That Can
Support Multiple Empirical Research Testbeds At Scale
• Next Generation GENI, Edge Clouds, IOT, US Ignite,
Platform for Advanced Wireless Research (PAWR) and
Many Others
• Currently Being Planned – Will Be Designed,
Implemented and Operated By Researchers for
Researchers
National Science Foundation Global Environment for Network Innovations
International 40 Gbps and 100 Gbps ExoGENI Testbed
ORCA Semantics
Chapter:
Creating a Worldwide Network
For The Global Environment for Network
Innovations (GENI) and
Related Experimental Environments
iGENI: The International GENI
• The iGENI Initiative Will Design, Develop, Implement, and Operate a Major New National and International Distributed Infrastructure.
• iGENI Will Place the “G” in GENI Making GENI Truly Global.
• iGENI Will Be a Unique Distributed Infrastructure Supporting Research and Development for Next-Generation Network Communication Services and Technologies.
• This Infrastructure Will Be Integrated With Current and Planned GENI Resources, and Operated for Use by GENI Researchers Conducting Experiments that Involve Multiple Aggregates At Multiple Sites.
• iGENI Infrastructure Will Connect Its Resources With Current GENI National Backbone Transport Resources, With Current and Planned GENI Regional Transport Resources, and With International Research Networks and Projects.
The Global Lambda Integrated Facility: a Global
Programmable Resource
SDX StarLight <=> NetherLight
Ronald van der Pol, Joe Mambretti, Jim Chen, John Shillington
SCInet Tbps Topology
By Azher Mughal, CalTech
DTN Flows @ 100 Gbps => Compute Canada, CANARIE, StarLight <+> SC16
SCInet WAN Topology By Azher Mughal, CalTech
Precision Medicine
• Precisely match treatments to patients and their specific disease.
• Genomic data promises optimal matching.
• 1.7 million cancer cases are diagnosed in America each year.
• A single RNA-seq file is 10-20 GB; whole-genome raw data files are > 100 GB.
• Analysis has become the bottleneck, and data size is an issue.
  – 2,000,000 genomes ≈ 1 Exabyte (1,000,000,000,000 MB)
  – Cost to sequence 1 genome: less than $5,000 and falling fast.
  – Cost to analyze 1 genome: approx. $100,000 and rising.
• A key step toward algorithm-assisted personalized medicine is
building Data Commons/Cloud analytics and the *programmable*
networks and communication exchanges (SDXs) for high-performance,
flexible data transport.
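The storage arithmetic on this slide can be sanity-checked in a few lines. Note the per-genome size used below is an assumption: the slide only says raw whole-genome files exceed 100 GB, and ~500 GB per genome is the figure that reproduces the slide's exabyte estimate.

```python
# Back-of-the-envelope check of the slide's genomic storage estimate.
# Assumption: ~500 GB of raw data per whole genome (slide states only "> 100 GB").
GB = 10**9   # decimal gigabyte, as commonly used in storage sizing
MB = 10**6

genomes = 2_000_000
bytes_per_genome = 500 * GB

total_bytes = genomes * bytes_per_genome
print(total_bytes / 10**18, "EB")           # 1.0 EB
print(f"{total_bytes // MB:,} MB")          # 1,000,000,000,000 MB, as on the slide
```

At 100 GB per genome the same population would still occupy roughly 0.2 EB, so the exabyte-scale conclusion is robust to the assumed file size.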
Diagram: Patients, Hospitals and Doctors feed the Genomic Data Commons and Cloud Computation; the output is Data-Aware, Analytics-Informed Diagnosis, Prognosis, and Optimal Treatment.
Future: The Virtual Comprehensive Cancer Center
Genomic Data Commons Data Transfer: BI Data Flow Visualization (Inbound-Outbound), From SDSC To UoC
NCI Genomic Data Commons
• Harmonization and storage for the Nation's cancer genomic data:
the GDC holds 1.6 PB of cancer genomic data and associated clinical
data.
• Precision Medicine Enabled By Precision Networking
Bionimbus Protected Data
Cloud
• Petabyte-scale, secure, compliant biomedical
cloud that interoperates with dbGaP controlled-access
data at NIH.
Biomedical Data Commons:
Flow Orchestration: Control Plane + Data Plane
Diagram: Control Plane + Data Plane connecting Data Repositories A (West Coast), B (South), C (Asia), and D (Europe), Visualization Engines (North America), and Compute Engines (Midwest).
Parcel-Based Collaboration: at the "Local" Location (University of Chicago, US), clients and servers use TCP with Object Storage (Data) and Local Compute; the parcel-tcp2udt and parcel-udt2tcp proxies carry flows over UDT to Remote Compute at the "Remote" Location (ASGC, Taiwan), with remote stats and visualizations.
Workflow: the client identifies and retrieves data with ARK ids, performs compute remotely, pushes results back to remote storage (where an ARK id is assigned), and produces visualizations locally.
Source: Kevin P. Keegan, Bob Grossman
Software Defined Networking Exchanges
(SDXs): Motivations
• With the Increasing Deployment of SDN In Production Networks,
the Need for an SDN Exchange (SDX) Has Been Recognized.
• Current SDN Architecture/Protocols/Technologies Are Single
Domain Centralized Controller Oriented
• SDXs Are Required To Interconnect Increasing Numbers Of SDN
“Islands.”
• SDXs Provide Highly Granulated Views Into (and Control Over) All
Flows Within the Exchange – i.e., Much Enhanced Traffic
Engineering and Optimization.
• Options for Many New Types of Services and Capabilities, e.g.,
E2E Encryption, Ultra High Resolution Digital Media, Support for
Data Intensive Science
• WH Office of Science and Technology Policy – Large Scale Science
Instrumentation
• Democratization Of Exchange Facilities – Options for Edge Control
Over Exchange Flows
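The "highly granulated views into (and control over) all flows" that an SDX provides can be sketched as a priority-ordered match-action table, standing in for the OpenFlow-style rules an exchange controller would install. The class, prefixes, and action strings below are all illustrative, not any SDX's actual API.

```python
# Sketch of per-flow control at an SDX: a priority-ordered match-action
# table keyed on IP prefixes. Highest-priority matching rule wins.
from ipaddress import ip_address, ip_network

class SDXFlowTable:
    def __init__(self):
        self.rules = []  # list of (priority, src_net, dst_net, action)

    def add_rule(self, priority, src, dst, action):
        self.rules.append((priority, ip_network(src), ip_network(dst), action))
        self.rules.sort(key=lambda r: -r[0])  # keep highest priority first

    def classify(self, src_ip, dst_ip):
        # Return the action of the highest-priority rule matching this flow.
        s, d = ip_address(src_ip), ip_address(dst_ip)
        for _, src_net, dst_net, action in self.rules:
            if s in src_net and d in dst_net:
                return action
        return "drop"

# Steer a data-intensive science flow onto a dedicated 100 Gbps port while
# all other traffic takes the default exchange path.
table = SDXFlowTable()
table.add_rule(100, "10.0.1.0/24", "10.0.2.0/24", "forward:100g-port")
table.add_rule(1, "0.0.0.0/0", "0.0.0.0/0", "forward:default")
print(table.classify("10.0.1.5", "10.0.2.9"))       # forward:100g-port
print(table.classify("192.0.2.1", "198.51.100.1"))  # forward:default
```

Edge control over exchange flows, as in the last bullet, amounts to letting participating edge networks install such rules for their own prefixes.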
E2E Services Based On Open Architecture For
Petascale Sciences (OAPS)
• Integrating Science Workflows With Foundation Infrastructure
Workflows (Orchestration)
• Providing Built-In Preconfigured Examples/Templates To Establish
Infrastructure Foundation Workflows
• Providing Zero-Touch “Playbooks” For Different Segments of
Infrastructure Foundation Workflows After Running The 1st Suite
• Supporting Interactive Control Over Running Workflows
• Providing Portability for Different Infrastructure Foundation
Workflows
• Providing Capabilities for Experiment Reproducibility
• Providing Options For Real-Time Scientific Visualization With
Standard Tools
IRNC: RXP: StarLight SDX A Software Defined
Networking Exchange for Global Science Research and
Education
Joe Mambretti, Director, ([email protected])
International Center for Advanced Internet Research (www.icair.org)
Northwestern University
Director, Metropolitan Research and Education Network (www.mren.org)
Co-Director, StarLight (www.startap.net/starlight)
PI IRNC: RXP: StarLight SDX
Co-PI Tom DeFanti, Research Scientist, ([email protected])
California Institute for Telecommunications and Information Technology (Calit2),
University of California, San Diego
Co-Director, StarLight
Co-PI Maxine Brown, Director, ([email protected])
Electronic Visualization Laboratory, University of Illinois at Chicago
Co-Director, StarLight
Co-PI Jim Chen, Associate Director, International Center for Advanced Internet
Research, Northwestern University
National Science Foundation
International Research Network Connections Program
Workshop
Chicago, Illinois
May 15, 2015
Another SDX Opportunity! An
Experimental Testbed For
Computer Science Research
Diagram: Planned US SDX Interoperable Fabric, interconnecting multiple IRNC SDXs, the GENI SDX, and the UoM SDX; it will be contiguous to the StarLight SDX.
Global Research Platform
• An Emerging International Fabric
• A Specialized Globally Distributed Platform For Science
Discovery and Innovation
• Based On State-Of-the-Art-Clouds
• Interconnected With Computational Grids,
Supercomputing Centers, Specialized Instruments, et al
• Also, Based On World-Wide 100 Gbps Networks
• Leveraging Advanced Architectural Concepts, e.g.,
SDN/SDX/SDI – Science DMZs
• Ref: Demonstrations @ SC15, Austin Texas November
2015
• New=> Global Research Platform 100 Gbps Network
(GRPnet) On Private Optical Fiber Between PacificWave
and StarLight via the PNWGP
www.startap.net/starlight
Thanks to the NSF, DOE, DARPA
Universities, National Labs,
International Partners,
and Other Supporters