Assume a Supercomputer (and a bunch of other stuff)
EarthCube & Other Origins
Achieving the EarthCube goal of an “integrated system to access, analyze and share information that is used by the entire geosciences community” will require transformative advances in approaches to data collection, integration, analysis, and curation. Success depends on identifying productive solutions from inside and outside the community while building on decades of effort in developing the existing community cyberinfrastructure.
Hyak & Other Origins
Cyberinfrastructure is the coordinated aggregate of software, hardware and other technologies, as well as human expertise, required to support current and future discoveries in science and engineering.
The challenge of Cyberinfrastructure is to integrate relevant and often disparate resources to provide a useful, usable, and enabling framework for research and discovery characterized by broad access and “end-to-end” coordination.
* Francine Berman (February 18, 2005). "SBE/CISE Workshop on Cyberinfrastructure for the Social Sciences". San Diego Supercomputer Center.
Assume a Can Opener
Good & Improving Networking
NETWORK PARTNERS (connector name: no. of connections x speed, first/second port city)

LEARN: 2-100G, Houston/Dallas
Pacific Northwest GigaPOP: 2-100G, Seattle/Chicago
Oregon GigaPOP: 2-10G, Seattle/LA
UEN: 100G, Salt Lake City
CENIC: 100G, LA
MCNC/C-Light: 2-100G, Raleigh/Charlotte
KyRON: 10G, Louisville
MAGPI: 100G, Philadelphia
NYSERNet: 2-100G + 10G, New York City/Buffalo
MissiON: 2-10G, Jackson/Jackson
GPN: 2-100G, Kansas City/Tulsa
Sun Corridor: 2-100G, Tucson/Phoenix
Indiana GigaPOP: 100G + 10G, Chicago/Indianapolis
OARnet: 100G + 10G, Cleveland/Chicago
3ROX: 100G, Pittsburgh
MARIA: 100G, Ashburn
CAAREN: 100G, McLean
CIC OmniPOP: 2-100G, Chicago/Starlight
U of Memphis: 10G, Memphis
FLR: 100G, Jacksonville
NOX: 2-100G, Boston/New York
MREN: 100G, Starlight
LONI: 100G, Baton Rouge
MAX: 100G + 10G, McLean/McLean
UIUC: 100G, Starlight
SOX: 100G, Atlanta
University of Montana: 100G, Missoula
CEN: 100G, Hartford
NSHE: 2-100G, Reno/Las Vegas
Drexel: 100G, Philadelphia

Internet2 Network Connections, www.internet2.edu/connectors, July 2015
Globally!

NSF IRNC-sponsored connections and other international connections. For further information regarding the international programs of Internet2, visit http://internet2.edu/international or contact Heather Boyles, International Relations Director, [email protected]. A listing of networks reachable via the Internet2 Network is found on the back of this page.

THE INTERNATIONAL REACH OF THE INTERNET2 NETWORK, www.internet2.edu
U.S. Exchange Points: PacificWave (North, Bay Area, South), StarLight, AtlanticWave, MANLAN, NGIX-East, AMPATH
And Locally!
[Diagram: UW campus network. The 100G High Speed Research Network (HSRN) connects Hyak, lolo, and co-located research equipment (Nx10G/40G/100G) to campus research labs at Nx10G/40G, and to the UW Campus 40G MPLS Network that serves the typical campus user. Through the Pacific Northwest Gigapop, 100G paths run via Internet2 AL2S, Pacific Wave, and Northern Wave to the Internet, US R&E networks, Pacific Rim R&E networks, and R&E networks at Starlight (10G); perfSONAR nodes monitor the paths. NSF CC-NIE funding supports 10G/40G/100G interconnect components in the green devices.]
The Networking “Business” Model
Or, how successful efforts are funded

> Partner with researchers with leading needs
> Design a solution that satisfies their needs
> Apply for funding
> Let everyone else come along FOR FREE
An Assumed Supercomputer
Or, what we learned from Hyak

> Transfers from Outside to Central HPC
> Transfers from Campus to Central HPC
> Transfers from Outside to Campus
SIZE isn’t the only important dimension
Not a Can Opener, a Supercomputer
But which supercomputer?

> “It’s not about the cycles, it’s the data that matters”
  Assumes magic?
> “Take the CPU to the data”
  Origins in MapReduce & Hadoop, irrelevant today
  Assumes a single-purpose system
  Assumes each researcher has one
> “CLOUD!”
  Assumes one “cloud”
  Assumes you can pay for it
  Assumes it does what you want
“CLOUD!” Is Always Cheaper!!!!
errrrr, just kidding
Hyak / AWS Effective Hourly Rates

% Usage | Hyak | AWS On-Demand | AWS 1-year Reservation | AWS 3-year Reservation
100%    | 1.00 | 12.14         | 7.17                   | 5.20
 90%    | 1.00 |  9.83         | 7.17                   | 5.20
 80%    | 1.00 |  7.77         | 7.17                   | 5.20
 70%    | 1.00 |  5.95         | 7.17                   | 5.20
 60%    | 1.00 |  4.37         | 7.17                   | 5.20
 50%    | 1.00 |  3.04         | 7.17                   | 5.20
 40%    | 1.00 |  1.94         | 7.17                   | 5.20
 30%    | 1.00 |  1.09         | 7.17                   | 5.20
 20%    | 1.00 |  0.49         | 7.17                   | 5.20
 10%    | 1.00 |  0.12         | 7.17                   | 5.20
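The shape of the comparison above is easy to sketch. A minimal illustration with hypothetical prices (not the actual Hyak or AWS rates): owned condo capacity is a sunk cost, so its effective rate per *used* node-hour rises as utilization falls, while on-demand pricing stays flat.

```python
# Illustrative only: both prices below are assumed, not real figures.
def effective_rate(fixed_cost_per_node_hr, utilization):
    """Fixed-cost capacity: every hour is paid for whether used or not,
    so the effective cost per used hour is fixed cost / utilization."""
    return fixed_cost_per_node_hr / utilization

def on_demand_rate(hourly_price, utilization):
    """On-demand: you pay only for the hours you use, so the rate is flat."""
    return hourly_price

for u in (1.0, 0.5, 0.1):
    condo = effective_rate(0.25, u)   # assumed $0.25/node-hr amortized
    cloud = on_demand_rate(1.50, u)   # assumed $1.50/node-hr on-demand
    print(f"{u:4.0%}  condo ${condo:5.2f}  on-demand ${cloud:5.2f}")
```

At high utilization the fixed-cost cluster wins; below some break-even utilization, on-demand wins, which is the crossover the table shows.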
“CLOUD!” Is Always Cheaper!!!!
errrrr, just kidding, in a plot
[Plot: Hyak and AWS Effective Hourly Rates Compared. Y axis: Effective $/Node/Hr ($0.00 to $12.00); X axis: % Usage Over Time (100% down to 10%); series: Hyak, AWS On-Demand, AWS 1-year Reservation, AWS 3-year Reservation.]
But what about XSEDE?
[Chart: Top 10 Users, Hyak vs XSEDE, 2014. Hyak: 18,298,792; XSEDE: 7,259,042.]
Distributed Supercomputing Doesn’t Work
> EFFECTIVE throughput still poor
> CLOUDY storage still slow
The Use Cases
> Prep for Petascale (P4P)
  Campus <-> Hyak <-> Outside
  Immediate, on-demand access
  Traditional queue useful for testing at larger scales
> Speed of Science (SoS)
  Campus <-> Hyak <-> Outside
  Immediate, on-demand access
  Traditional queue useful for parameter sweeps, etc.
> Big Data Pipelines (Catchy Abbreviation Pending)
  Campus <-> Hyak
  Immediate, on-demand access
  Traditional queue useful for exploration
The Campus Condo Cluster
Is a collection of Personal Supercomputers
With a traditional supercomputer FOR FREE
[Plot: Hyak Elastic Usage. Y axis: % of capacity (0% to 100%); X axis: months (1 to 49); series: Total, Std, Backfill, Interactive, GPU.]
Supporting DIVERSE Workloads
[Plot: Hyak Core-Hours Used in Standard Queues by job core-count & runtime. Y axis: % of Std. Queue Core-Hours (0% to 8%); X axis: runtime in hours (8 to 2048); series by core count (4 to 256).]
Partnering with Anchor Tenants Works for STORAGE Too!
Users vs Bytes: UW Libraries Data Repository Survey Results

Size class    | KB    | MB     | GB     | TB     | PB     | >PB
% Respondents | 5.10% | 28.23% | 45.92% | 18.37% | 1.36%  | 1.02%
% Bytes       | 0.00% | 0.00%  | 0.00%  | 0.16%  | 11.75% | 88.10%
Files vs Bytes
It’s the same story everywhere
[Plot: % Files and % Bytes vs log2(filesize). Y axis: Percentage (0 to 20); X axis: log(2) filesize (0 to 35).]
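The long-tail pattern behind plots like this one can be reproduced with a short sketch (hypothetical data, not the actual measurements): bucket files by floor(log2(size)) and compare each bucket's share of the file count to its share of the total bytes.

```python
import math
from collections import Counter

def size_histograms(file_sizes):
    """Bucket files by floor(log2(size)); return two dicts mapping
    bucket -> share of files and bucket -> share of bytes."""
    files, byte_counts = Counter(), Counter()
    for s in file_sizes:
        bucket = int(math.log2(s)) if s > 0 else 0
        files[bucket] += 1
        byte_counts[bucket] += s
    n, total = len(file_sizes), sum(file_sizes)
    return ({b: c / n for b, c in files.items()},
            {b: c / total for b, c in byte_counts.items()})

# Hypothetical corpus: 99 one-KB files plus one 1 GB file.
# The file count concentrates in the small bucket (log2 = 10),
# while nearly all the bytes sit in the large one (log2 = 30).
pct_files, pct_bytes = size_histograms([1024] * 99 + [1024 ** 3])
```

By file count the distribution peaks at small sizes; by bytes it peaks at large sizes, which is exactly the mismatch the survey and the plot both show.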
Conclusions
Which supercomputer?
> Leadership-class systems are important, of course
> Distributed computing (e.g. BOINC) also!
> But CAMPUS CONDOS fill an important and growing gap
> And the networks we’re building already ASSUME Campus Condos (or similar)!
Conclusions
Don’t assume your cyberinfrastructure!
> Fill the gap with a Campus Condo Cluster
> And Storage – DTN, archives, and working disk
> Partner with your biggest users – it works
> And provide assistance to the Long Tail – it won’t grow by itself