“The Pacific Research Platform”
Briefing to The Quilt Visit to Calit2’s Qualcomm Institute
University of California, San Diego
February 10, 2016
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
Vision: Creating a West Coast “Big Data Freeway” Connected by CENIC/Pacific Wave
Use Lightpaths to Connect All Data Generators and Consumers, Creating a “Big Data” Freeway Integrated With High-Performance Global Networks
“The Bisection Bandwidth of a Cluster Interconnect, but Deployed on a 20-Campus Scale.”
This Vision Has Been Building for Over a Decade
NSF’s OptIPuter Project: Demonstrating How SuperNetworks Can Meet the Needs of Data-Intensive Researchers
OptIPortal: Termination Device for the OptIPuter Global Backplane
Calit2 (UCSD, UCI), SDSC, and UIC Leads; Larry Smarr, PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
2003-2009, $13,500,000
In August 2003, Jason Leigh and his students used RBUDP to blast data from NCSA to SDSC over the TeraGrid DTFnet, achieving an 18 Gbps file transfer out of the available 20 Gbps.
LS Slide 2005
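RBUDP (Reliable Blast UDP) reaches such rates by streaming data as an unacknowledged UDP blast and using a separate TCP channel only to report which datagrams were lost. Below is a minimal Python sketch of that idea, not Leigh's actual implementation; hosts, ports, and sizes are illustrative assumptions.

```python
# Minimal sketch of the Reliable Blast UDP (RBUDP) idea: blast fixed-size,
# sequence-numbered UDP datagrams at full rate with no per-packet ACKs, then
# use a TCP side channel to learn which datagrams were lost and re-blast only
# those. Hosts, ports, and sizes below are illustrative placeholders.
import select
import socket
import struct

PAYLOAD = 1400        # UDP payload bytes per datagram
NUM_CHUNKS = 10_000   # ~14 MB of synthetic test data

def recv_exact(sock, n):
    """Read exactly n bytes from a TCP socket."""
    buf = b""
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed")
        buf += part
    return buf

def sender(host, udp_port=9000, tcp_port=9001):
    chunks = [bytes([i % 256]) * PAYLOAD for i in range(NUM_CHUNKS)]
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tcp = socket.create_connection((host, tcp_port))
    pending = list(range(NUM_CHUNKS))
    while pending:
        for seq in pending:  # blast phase: stream datagrams back to back
            udp.sendto(struct.pack("!I", seq) + chunks[seq], (host, udp_port))
        tcp.sendall(b"DONE")  # ask the receiver what is missing
        n, = struct.unpack("!I", recv_exact(tcp, 4))
        pending = list(struct.unpack(f"!{n}I", recv_exact(tcp, 4 * n)))

def receiver(udp_port=9000, tcp_port=9001):
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind(("", udp_port))
    srv = socket.socket()
    srv.bind(("", tcp_port))
    srv.listen(1)
    tcp, _ = srv.accept()
    got = {}
    while True:
        ready, _, _ = select.select([udp, tcp], [], [])
        if udp in ready:
            pkt, _ = udp.recvfrom(4 + PAYLOAD)
            got[struct.unpack("!I", pkt[:4])[0]] = pkt[4:]
        if tcp in ready:
            recv_exact(tcp, 4)  # the b"DONE" marker
            lost = [i for i in range(NUM_CHUNKS) if i not in got]
            tcp.sendall(struct.pack(f"!I{len(lost)}I", len(lost), *lost))
            if not lost:
                return b"".join(got[i] for i in range(NUM_CHUNKS))
```

The design point is that per-packet acknowledgments never throttle the data path: loss recovery happens in batches over the control channel, so the blast runs at whatever rate the link sustains.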
DOE ESnet’s Science DMZ: A Scalable Network Design Model for Optimizing Science Data Transfers
• A Science DMZ integrates 4 key concepts into a unified whole:
– A network architecture designed for high-performance applications, with the science network distinct from the general-purpose network
– The use of dedicated systems for data transfer
– Performance measurement and network testing systems that are regularly used to characterize and troubleshoot the network
– Security policies and enforcement mechanisms that are tailored for high-performance science environments
http://fasterdata.es.net/science-dmz/
“Science DMZ” Coined 2010
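One reason Science DMZ data transfer nodes need tuning: a TCP flow can only fill a path if its buffers hold at least one bandwidth-delay product of in-flight data. A small illustration (the RTT values are assumed, not measured):

```python
# Bandwidth-delay product (BDP): a TCP flow can only fill a path if its
# send/receive buffers hold at least bandwidth x round-trip time of data,
# which is why Science DMZ data transfer nodes are tuned carefully.
def bdp_mib(gbps, rtt_ms):
    """Buffer needed, in MiB, to keep a gbps link full at the given RTT."""
    return gbps * 1e9 / 8 * (rtt_ms / 1e3) / 2**20

for gbps in (10, 40, 100):
    for rtt_ms in (1, 25, 80):   # campus, regional, transcontinental (assumed)
        print(f"{gbps:>3} Gbps @ {rtt_ms:>2} ms RTT -> "
              f"{bdp_mib(gbps, rtt_ms):8.1f} MiB buffer")
# e.g. 10 Gbps at 80 ms needs ~95 MiB, far above common OS defaults.
```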
The DOE ESnet Science DMZ and the NSF “Campus Bridging” Taskforce Report Formed the Basis for the NSF Campus Cyberinfrastructure Network Infrastructure and Engineering (CC-NIE) Program
Based on Community Input and on ESnet’s Science DMZ Concept, NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways
2012-2015 CC-NIE / CC*IIE / CC*DNI PROGRAMS
Red: 2012 CC-NIE Awardees
Yellow: 2013 CC-NIE Awardees
Green: 2014 CC*IIE Awardees
Blue: 2015 CC*DNI Awardees
Purple: Multiple-Time Awardees
Source: NSF
The Pacific Research Platform: The Next Logical Step – Connect Multiple Campus Science DMZs with 10-100Gbps Lightpaths
NSF CC*DNI Grant: $5M, 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS
• Tom DeFanti, UC San Diego Calit2
• Philip Papadopoulos, UC San Diego SDSC
• Frank Wuerthwein, UC San Diego Physics and SDSC
FIONA (Flash I/O Network Appliance): Termination Device for 10-100 Gbps Flows
UCOP Rack-Mount Build:
FIONAs Are Science DMZ Data Transfer Nodes &Optical Network Termination Devices
UCSD CC-NIE Prism Award & UCOP
Phil Papadopoulos & Tom DeFanti
Joe Keefe & John Graham
Component                  $8,000 Build          $20,000 Build
CPU (Intel Xeon Haswell)   E5-1650 v3, 6-core    2x E5-2697 v3, 14-core
RAM                        128 GB                256 GB
SSD                        SATA 3.8 TB           SATA 3.8 TB
Network Interface          10/40GbE Mellanox     2x40GbE Chelsio+Mellanox
GPU                        (none)                NVIDIA Tesla K80
RAID Drives                0 to 112 TB (add ~$100/TB), either build
John Graham, Calit2’s QI
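For a sense of scale, a rough back-of-envelope (assuming, optimistically, that the full line rate is sustained end to end) of how long it takes to drain a FIONA's 3.8 TB SSD over its network interface:

```python
# Back-of-envelope: hours to drain a FIONA's 3.8 TB SSD over its network
# interface, optimistically assuming the full line rate end to end.
def hours_to_move(terabytes, gbps):
    return terabytes * 1e12 * 8 / (gbps * 1e9) / 3600

for gbps in (10, 40, 100):
    print(f"3.8 TB at {gbps:>3} Gbps: {hours_to_move(3.8, gbps):5.2f} hours")
# -> ~0.84 h at 10 Gbps, ~0.21 h at 40 Gbps, ~0.08 h at 100 Gbps
```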
FIONAs as Uniform DTN End Points
[Map: Existing DTNs and FIONA DTNs, as of October 2015]
UC FIONAs Funded by UCOP “Momentum” Grant
Ten Week Sprint to Demonstrate the West Coast Big Data Freeway System: PRPv0
Presented at CENIC 2015, March 9, 2015
FIONA DTNs Now Deployed to All UC Campuses and Most PRP Sites
Pacific Research Platform Multi-Campus Science Driver Teams
• Jupyter Hub
• Biomedical
– Cancer Genomics Hub/Browser
– Microbiome and Integrative ‘Omics
– Integrative Structural Biology
• Earth Sciences
– Data Analysis and Simulation for Earthquakes and Natural Disasters
– Climate Modeling: NCAR/UCAR
– California/Nevada Regional Climate Data Analysis
– CO2 Subsurface Modeling
• Particle Physics
• Astronomy and Astrophysics
– Telescope Surveys
– Galaxy Evolution
– Gravitational Wave Astronomy
• Scalable Visualization, Virtual Reality, and Ultra-Resolution Video
PRP First Application: Distributed IPython/Jupyter Notebooks: Cross-Platform, Browser-Based Application Interleaves Code, Text, & Images
Kernels: IJulia, IHaskell, IFSharp, IRuby, IGo, IScala, IMathics, Ialdor, LuaJIT/Torch, Lua Kernel, IRKernel (for the R language), IErlang, IOCaml, IForth, IPerl, IPerl6, Ioctave, Calico Project (kernels implemented in Mono, including Java, IronPython, Boo, Logo, BASIC, and many others), IScilab, IMatlab, ICSharp, Bash, Clojure Kernel, Hy Kernel, Redis Kernel, jove (a kernel for io.js), IJavascript, Calysto Scheme, Calysto Processing, idl_kernel, Mochi Kernel, Lua (used in Splash), Spark Kernel, Skulpt Python Kernel, MetaKernel Bash, MetaKernel Python, Brython Kernel, IVisual VPython Kernel
Source: John Graham, QI
GPU JupyterHub:
2x 14-core CPUs, 256 GB RAM, 1.2 TB FLASH, 3.8 TB SSD, NVIDIA K80 GPU, dual 40GbE NICs, and a Trusted Platform Module
40 Gbps
GPU JupyterHub:
1x 18-core CPU, 128 GB RAM, 3.8 TB SSD, NVIDIA K80 GPU, dual 40GbE NICs, and a Trusted Platform Module
PRP UC-JupyterHub Backbone. Next Step: Deploy Across PRP
Source: John Graham, Calit2. Hubs at UC Berkeley and UC San Diego.
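JupyterHub itself is configured through a Python file. A minimal sketch of a single-node hub like those above might look as follows; all values are illustrative assumptions, not the PRP team's actual settings:

```python
# jupyterhub_config.py -- minimal sketch of a single-node JupyterHub such as
# the GPU hubs described above. All values are illustrative assumptions, not
# the PRP team's actual configuration.
c = get_config()  # noqa: F821 -- injected by JupyterHub at startup

c.JupyterHub.ip = "0.0.0.0"                # listen on all interfaces
c.JupyterHub.port = 8000
c.Authenticator.admin_users = {"admin"}    # hypothetical admin account
c.Spawner.notebook_dir = "~/notebooks"     # where each user's server starts
c.Spawner.mem_limit = "16G"                # per-user cap (enforced only by
                                           # container/systemd spawners)
```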
OSG Federates Clusters in 40 of the 50 States: A Major XSEDE Resource
Source: Miron Livny, Frank Wuerthwein, OSG
Open Science Grid Has Seen Huge Growth Over the Last Decade, Currently Federating Over 130 Clusters
• Crossed 100 Million Core-Hours/Month in Dec 2015
• Over 1 Billion Data Transfers Moved 200 Petabytes in 2015
• Supported Over 200 Million Jobs in 2015
Source: Miron Livny, Frank Wuerthwein, OSG
[Growth chart labels: ATLAS, CMS]
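OSG schedules its jobs with HTCondor. As a hedged sketch, a single job could be submitted through the HTCondor Python bindings (the pre-v10, transaction-style API) roughly as below; the executable, input file, and resource requests are placeholders:

```python
# Sketch: submitting one job to an HTCondor pool such as OSG's via the
# htcondor Python bindings (the pre-v10, transaction-style API). The
# executable, input file, and resource requests are placeholders.
import htcondor

submit = htcondor.Submit({
    "executable": "analyze.sh",            # hypothetical payload script
    "arguments": "events.root",
    "transfer_input_files": "events.root",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "request_cpus": "1",
    "request_memory": "2GB",
})

schedd = htcondor.Schedd()                 # the local submit host's queue
with schedd.transaction() as txn:
    cluster_id = submit.queue(txn)         # queue one instance of the job
print("Submitted as cluster", cluster_id)
```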
PRP Prototype of LambdaGrid Aggregation of OSG Software & Services Across California Universities in a Regional DMZ
• Aggregate Petabytes of Disk Space & PetaFLOPs of Compute, Connected at 10-100 Gbps
• Transparently Compute on Data at Their Home Institutions & Systems at SLAC, NERSC, Caltech, UCSD, & SDSC
[Map: SLAC, UCSD & SDSC, UCSB, UCSC, UCD, UCR, CSU Fresno, UCI, Caltech]
Source: Frank Wuerthwein, UCSD Physics and SDSC; co-PI, PRP
PRP Builds on SDSC’s LHC-UC Project
[Pie chart, OSG Hours 2015 by Science Domain: ATLAS, CMS, other physics, life sciences, other sciences]
Two Automated Telescope Surveys Creating Huge Datasets Will Drive PRP
• 300 images per night at 100 MB per raw image: 30 GB per night, growing to 120 GB per night when processed
• 250 images per night at 530 MB per raw image: 150 GB per night, growing to 800 GB per night when processed
• When processed at NERSC, data volume is increased by 4x
Source: Peter Nugent, Division Deputy for Scientific Engagement, LBL; Professor of Astronomy, UC Berkeley
Precursors to LSST and NCSA
PRP Allows Researchers to Bring Datasets from NERSC to Their Local Clusters for In-Depth Science Analysis
Global Scientific Instruments Will Produce Ultra-Large Datasets Continuously, Requiring Dedicated Optical Fiber and Supercomputers
Square Kilometer Array: https://tnc15.terena.org/getfile/1939
Large Synoptic Survey Telescope: www.lsst.org/sites/default/files/documents/DM%20Introduction%20-%20Kantor.pdf
Tracks ~40B Objects, Creates 10M Alerts/Night Within 1 Minute of Observing
2x40Gb/s
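To put the LSST alert stream in perspective, a quick back-of-envelope (the ~10-hour night length is an assumption, not a figure from the slide):

```python
# Rough rate implied by "10M alerts per night", assuming a ~10-hour
# observing night (an assumption, not a figure from the slide).
alerts_per_night, night_hours = 10_000_000, 10
rate = alerts_per_night / (night_hours * 3600)
print(f"~{rate:,.0f} alerts/second sustained, each due within 60 s")
# -> roughly 278 alerts/second
```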
PRP Will Support the Computation and Data Analysis in the Search for Sources of Gravitational Radiation
Augment the aLIGO data and computing systems at Caltech by connecting at 10 Gb/s to the SDSC Comet supercomputer, enabling LIGO computations to enter via the same PRP “job cache” as the LHC’s.
HPWREN Users and Public Safety Clients Gain Redundancy and Resilience from PRP Upgrade
San Diego Countywide Sensors and Camera Resources: UCSD & SDSU
Data & Compute Resources: UCSD, UCR, SDSU, UCI
UCI & UCR: Data Replication and PRP FIONA Anchors as HPWREN Expands Northward
10X Increase During Wildfires
• PRP CENIC 10G Link, UCSD to SDSU:
– DTN FIONA Endpoints
– Data Redundancy
– Disaster Recovery
– High Availability
– Network Redundancy
Data From Hans-Werner Braun
Source: Frank Vernon, Greg Hidley, UCSD
PRP Backbone Sets Stage for 2016 Expansion of HPWREN, Connected to CENIC, into Orange and Riverside Counties
• Anchor to CENIC at UCI
– PRP FIONA Connects to CalREN-HPR Network
– Data Replication Site
• Potential Future UCR CENIC Anchor
• Camera and Relay Sites at:
– Santiago Peak
– Sierra Peak
– Lake View
– Bolero Peak
– Modjeska Peak
– Elsinore Peak
– Sitton Peak
– Via Marconi
Collaborations through COAST (County of Orange Safety Task Force)
[Map: UCR, UCI, UCSD, SDSU]
Source: Frank Vernon, Greg Hidley, UCSD
PRP Links FIONA Clusters, Creating Distributed Virtual Reality
WAVE@UC San Diego, with 20x40G PRP-connected 40G FIONAs, linked over the PRP to the CAVE@UC Merced
[Map: PRP sites and connections, including UW/PNWGP Seattle, Berkeley, UCD, UCSF, Stanford, NASA AMES/NREN, UCSC, UCM, UCSB, Caltech, USC, UCLA, UCI, UCSD, SDSU, UCR, Los Nettos, ESnet DoE Labs, Internet2, and Internet2 Seattle]
Note: This diagram represents a subset of sites and connections.
* Institutions with Active Archaeology Programs
“In an ideal world: extremely high bandwidth to move large cultural heritage datasets around the PRP cloud for processing & viewing in CAVEs around PRP, with unlimited storage for permanent archiving.” -Tom Levy, UCSD
PRP is NOT Just for Big Data Science and Engineering: Linking Cultural Heritage and Archaeology Datasets
Building on CENIC’s Expansion to Libraries, Museums, and Cultural Sites
Next Step: Global Research Platform, Building on CENIC/Pacific Wave and GLIF
Current International GRP Partners