
“Toward the Global Research Platform”

Keynote Presentation SC Asia

Singapore

March 27, 2018

Dr. Tom DeFanti, Research Scientist, Co-PI, The Pacific Research Platform and CHASE-CI

California Institute for Telecommunications and Information Technology’s

Qualcomm Institute

University of California San Diego

Distinguished Professor Emeritus, University of Illinois at Chicago


Abstract

The US National Science Foundation awarded "The Pacific Research Platform (PRP)" (award #1541349) to the University of California San Diego for five years starting October 1, 2015. It emerged out of the unmet demand for high-performance bandwidth to connect data generators and data consumers. The PRP is in its third year of building a broad base of support from application scientists, campus CIOs, regional network leaders, and network engineers, and continues to bring in new, unanticipated science applications, as well as to test new means to dramatically improve throughput. The PRP is, in fact, a grand volunteer community in an ever-expanding region where 35 CIOs and 50 application scientists initially signed letters of support for the original NSF proposal, all as unfunded partners. The PRP was scoped as a regional program by design, mainly focusing on West Coast US institutions, although it now includes several long-distance US and transoceanic Global Lambda Integrated Facility (GLIF) partners to verify that the technology used is not limited to the size and homogeneity of CENIC, the regional network serving California. There is pent-up demand from the high-performance networking and scientific communities to extend the PRP nationally, and indeed worldwide. This motivated the PRP to host The First National Research Platform Workshop in Bozeman, MT, in August 2017. At that meeting, a strong US and international community emerged, well documented in the report published on the PRP website (pacificresearchplatform.org). This presentation will cover lessons learned from PRP applications, technology, and science engagement activities, as well as how best to align future PRP networking strategies with the GRP's emerging groundswell of enthusiasm. The goal is to prototype a future in which a fully funded multinational Global Research Platform emerges.

This presentation includes ideas, words, and visuals from many sources, most prominently the PI of the PRP and CHASE-CI, Larry Smarr, UCSD.

Thirty Years After US NSF Adopts US DOE Supercomputer Center Model, NSF Adopts DOE ESnet's Science DMZ for High Performance Applications

• A Science DMZ integrates 4 key concepts into a unified whole:

– A network architecture designed for high-performance applications,

with the science network distinct from the general-purpose network

– The use of dedicated systems as data transfer nodes (DTNs)

– Performance measurement and network testing systems that are

regularly used to characterize and troubleshoot the network

– Security policies and enforcement mechanisms that are tailored for

high performance science environments

http://fasterdata.es.net/science-dmz/

The term "Science DMZ" was coined in 2010.
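Regular measurement is what keeps a Science DMZ fast. Below is a minimal sketch of such a test, assuming iperf3 is installed and an iperf3 server is listening at a placeholder DTN hostname; the PRP itself relies on perfSONAR rather than a script like this.

```python
# Minimal sketch of the Science DMZ "regular measurement" idea, assuming
# iperf3 is installed and a remote iperf3 server is running. The hostname
# is a placeholder; the PRP uses perfSONAR/MaDDash, not this script.
import json
import subprocess

def throughput_gbps(server: str, streams: int = 4, seconds: int = 10) -> float:
    """Run a parallel iperf3 test and return end-to-end throughput in Gb/s."""
    out = subprocess.run(
        ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    ).stdout
    bits_per_second = json.loads(out)["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1e9

if __name__ == "__main__":
    print(f"{throughput_gbps('dtn.example.edu'):.1f} Gb/s")  # placeholder host
```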

Based on Community Input and on ESnet’s Science DMZ Concept,

NSF Has Funded Over 100 US Campuses to Build DMZs

[Map legend: Red = 2012 CC-NIE Awardees; Yellow = 2013 CC-NIE Awardees; Green = 2014 CC*IIE Awardees; Blue = 2015 CC*DNI Awardees; Purple = Multiple-Time Awardees. Source: NSF]


Logical Next Step: The Pacific Research Platform Networks Campus DMZs

to Create a Regional End-to-End Science-Driven “Big Data Superhighway” System

NSF CC*DNI DIBBs Cooperative Agreement

$6M 10/2015-10/2020

PI: Larry Smarr, UC San Diego Calit2

Co-PIs:

• Camille Crittenden, UC Berkeley CITRIS,

• Tom DeFanti, UC San Diego Calit2/QI,

• Philip Papadopoulos, UCSD SDSC,

• Frank Wuerthwein, UCSD Physics and SDSC

Letters of Commitment from:

• 50 Researchers from 15 Campuses

• 32 IT/Network Organization Leaders

Source: John Hess, CENIC

Big Data Science Data Transfer Nodes (DTNs): Flash I/O Network Appliances (FIONAs)

Phil Papadopoulos, SDSC, & Tom DeFanti, Joe Keefe & John Graham, Calit2

Key Innovation: UCSD Designed FIONAs To Solve the Disk-to-Disk Data Transfer Problem at Full Speed on 10/40/100G Networks

• FIONA PCs [ESnet DTNs]:

– ~$8,000 Big Data PC with:

– 10/40 Gbps Network Interface Cards

– 3 TB SSDs

– Higher performance at higher cost:

– +NVMe SSDs & 100Gbps NICs disk-to-disk

– +Up to 8 GPUs [4M GPU Core Hours/Week]

– +Up to 196 TB of disks used as Data Capacitors

– +Up to 38 Intel CPU cores or AMD Epyc cores

– US$1,100 10Gbps FIONA (if 10G is fast enough)

• FIONettes are US$300 EL-30-based FIONAs

– 1Gbps NIC with USB-3 for Flash Storage or SSD

– Perfect for Training and smaller campuses

FIONA: 10/40G, US$8,000; FIONette: 1G, US$300

FIONAs on the PRP and Partners

• ~40 FIONAs are on the PRP as GridFTP (MaDDash) + perfSONAR Systems

– PRP Partners: all 10 UCs, Caltech, Stanford, USC, SDSC, UW, UIC

– Plus U Utah, Montana State, U Chicago, Clemson U, U Hawaii, NCAR, Guam

– Plus Internationals: U Amsterdam, KISTI (Korea), Singapore

• Many States and Regionals Building FIONAs and Creating MaDDashes

– FIONA Build Specs on pacificresearchplatform.org Website

– Weekly Engineering Calls with Notes Going to 60+ Technical Participants

– Fasterdata.es.net has lots of DTN and DMZ wisdom and data

We Measure Disk-to-Disk Throughput with 10GB File Transfer

4 Times Per Day in Both Directions for All PRP Sites

From Start of Monitoring 12 DTNs (January 29, 2016) to 24 DTNs Connected at 10-40G in 1½ Years (July 21, 2017)

Source: John Graham, Calit2/QI
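Reading those dashboards is simple arithmetic; here is a minimal helper, using the 10GB-file measurement above and the CAVEcam transfer later in this deck as worked checks:

```python
# Back-of-envelope disk-to-disk throughput, as used to read the PRP
# MaDDash dashboards. Pure arithmetic, independent of the transfer tool
# (GridFTP on the PRP).
def disk_to_disk_gbps(file_bytes: float, elapsed_s: float) -> float:
    return file_bytes * 8 / elapsed_s / 1e9

assert round(disk_to_disk_gbps(10e9, 16.0), 1) == 5.0  # 10 GB in 16 s -> 5 Gb/s
assert round(disk_to_disk_gbps(2e9, 2.0), 1) == 8.0    # 2 GB in 2 s -> 8 Gb/s (CAVEcam example)
```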

We Use Kubernetes

to Manage FIONAs Across the PRP

"Kubernetes is a way of stitching together a collection of machines into, basically, a big computer."
--Craig McLuckie, Google, now CEO and Founder of Heptio

"Everything at Google runs in a container." --Joe Beda, Google

Rook is Ceph Cloud-Native Object Storage

‘Inside’ Kubernetes

https://rook.io/

Source: John Graham, Calit2/QI
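As a small illustration of treating the FIONAs as one big computer, the sketch below uses the official Kubernetes Python client to count GPUs across a cluster's nodes. It assumes a valid kubeconfig and the standard nvidia.com/gpu resource name, and is a generic example rather than PRP operational code.

```python
# Generic sketch: inventory GPUs across a Kubernetes cluster such as
# Nautilus. Assumes the official `kubernetes` Python client and a valid
# kubeconfig; "nvidia.com/gpu" is the standard NVIDIA device-plugin resource.
from kubernetes import client, config

config.load_kube_config()            # or load_incluster_config() inside a pod
v1 = client.CoreV1Api()

total = 0
for node in v1.list_node().items:
    gpus = int(node.status.capacity.get("nvidia.com/gpu", "0"))
    total += gpus
    print(f"{node.metadata.name:30s} {gpus} GPUs")
print(f"cluster total: {total} GPUs")
```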

We Built Nautilus: A Multi-Tenant Containerized PRP HyperCluster for Big Data Applications, Running Kubernetes with Rook/Ceph Cloud-Native Storage and GPUs for Machine Learning

[Diagram, March 2018, John Graham, Calit2/QI: PRP sites Calit2 (100G Gold FIONA8), SDSC, SDSU (100G Gold NVMe), Caltech, UCAR, UCI, UCR, USC, UCLA, Stanford, UCSB, UCSC, and Hawaii, equipped with 40G SSD (3T) and 100G NVMe (6.4T) FIONAs, FIONA8 GPU nodes (including a 100G Epyc NVMe node), and an sdx-controller (controller-0). Rook/Ceph provides block/object/FS storage with a Swift API compatible with SDSC, AWS, and Rackspace, all on Kubernetes over CentOS 7.]

New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure:

Adding a Machine Learning Layer Built on Top of the Pacific Research Platform

[Map: Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU]

NSF Grant for High Speed “Cloud” of 256 GPUs

For 30 ML Faculty & Their Students at 10 Campuses

for Training AI Algorithms on Big Data

Machine Learning Researchers

Need a New Cyberinfrastructure

"Until cloud providers are willing to find a solution to place commodity (32-bit) game GPUs into their servers and price services accordingly, I think we will not be able to leverage the cloud effectively."

"There is an actual scientific infrastructure need here, surprisingly unmet by the commercial market, and perhaps CHASE-CI is the perfect catalyst to break this logjam."

--UC Berkeley Professor Trevor Darrell

FIONA8: a FIONA with 8 GPUs

Supports PRP Data Science Machine Learning--4M GPU Core Hours/Week

8 Nvidia GTX-1080 Ti GPUs (11 GB)

Testing AMD Radeon Vega (16 GB)

24 CPU Cores, 32,000 GPU cores, 96 GB RAM, 2TB SSD, Dual 10Gbps ports

3” High; ~$16,000

Single vs. Double Precision GPUs:

Gaming vs. Supercomputing

8 x 1080 Ti: 1 million GPU core hours every two days; 700 million GPU core hours for $16K in 4 yrs; $22/million GPU core hours, plus power and admin costs
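A quick check of that arithmetic, assuming 3,584 CUDA cores per GTX 1080 Ti and taking the slide's 700 million figure as an effective total (roughly 70% utilization of the raw maximum):

```python
# Sanity-check the cost claim: 8 GTX 1080 Ti cards, 3,584 CUDA cores each.
# The ~70% utilization is an assumption that reconciles the 4-year raw
# maximum (~1.0B core hours) with the slide's 700M figure.
cores = 8 * 3584                           # 28,672 GPU cores ("~32,000" on the slide)
per_two_days = cores * 48                  # ~1.38M core hours every two days (slide rounds to 1M)
four_years_max = cores * 24 * 365 * 4      # ~1.0B core hours at 100% utilization
effective = 700e6                          # slide's figure
print(per_two_days / 1e6)                  # ~1.4
print(16_000 / (effective / 1e6))          # ~$22.9 per million GPU core hours
# Scaling to CHASE-CI's 256 GPUs at the same rate:
# 256 * 3584 * 24 * 365 * 4 * 0.7 ~= 22.5B core hours, matching ">22B over 4 years".
```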

GPUs for

OSG Applications

UCSD Game GPUs for Data Sciences Cyberinfrastructure -

Devoted to Data Analytics and Machine Learning Research and Teaching

SunCAVE (70 GPUs), WAVE + VROOM (48 GPUs), and FIONAs with 8 Game GPUs (88 GPUs for Students)

CHASE-CI Grant Provides

256 GPUs to 32 Researchers on 10 Campuses:

>22B GPU Core Hours over 4 years

[Diagram, March 2018, John Graham, UCSD: the Nautilus map again, now with 40G FIONAs carrying 160TB each, 100G NVMe (6.4T) nodes, and FIONA8 GPU nodes (including a 100G Epyc NVMe node and a 100G Gold FIONA8) across Calit2, SDSC, SDSU, Caltech, UCAR, UCI, UCR, USC, UCLA, Stanford, UCSB, UCSC, and Hawaii, plus an sdx-controller (controller-0). Rook/Ceph block/object/FS storage with a Swift API compatible with SDSC, AWS, and Rackspace, on Kubernetes over CentOS 7.]

Running Kubernetes/Rook/Ceph On PRP

Allows Us to Deploy a Distributed PB+ of Storage for Posting Science Data
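Ceph's gateway speaks both Swift and S3 dialects, so posting a dataset can look like any object-store upload. Here is a hypothetical sketch with boto3; the endpoint, credentials, and bucket name are placeholders, not real PRP values.

```python
# Hypothetical sketch of "posting science data" to a Rook/Ceph object
# store through its S3-compatible gateway. Endpoint, credentials, and
# bucket name are placeholders, not real PRP values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ceph-gw.example.edu",  # placeholder Ceph RGW endpoint
    aws_access_key_id="ACCESS_KEY",              # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)
s3.create_bucket(Bucket="science-data")          # hypothetical bucket (idempotent on RGW)
s3.upload_file("night_2018_03_27.tar", "science-data", "raw/night_2018_03_27.tar")
```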

Expanding to the Global Research Platform

Via CENIC/Pacific Wave, Internet2, and International Links

[Map: PRP's Current International Partners: Netherlands, Guam, Australia, Korea, Japan, Singapore]

Asia to US Shows Distance is Not the Barrier to Above 5Gb/s Disk-to-Disk Performance

PRP Held The First National Research Platform Workshop on August 7-8, 2017

Co-Chairs: Larry Smarr, Calit2, & Jim Bottum, Internet2
Program Chair: Tom DeFanti
135 Attendees

See agenda, reports, and video on pacificresearchplatform.org

Coming: The Second National Research Platform Workshop (2NRP), Bozeman, MT, August 6-7, 2018. Register Soon at CENIC.ORG!

Steering Committee: Larry Smarr, Calit2; Inder Monga, ESnet; Ana Hunsinger, Internet2
Program Committee: Jim Bottum, Maxine Brown, Sherilyn Evans, Marla Meehl, Wendy Huntoon, Kate Mace
Local Hosts: Jerry Sheehan, MSU and CENIC

Thank You for Your Kind Attention!

Our Support Comes From:

• US National Science Foundation (NSF) awards

➢ CNS-0821155, CNS-1338192, CNS-1456638, CNS-1730158, ACI-1540112, & ACI-1541349

• University of California Office of the President CIO

• UCSD Chancellor’s Integrated Digital Infrastructure Program

• UCSD Next Generation Networking initiative

• Calit2 and Calit2’s Qualcomm Institute

• CENIC, PacificWave and StarLight

• DOE ESnet

PRP’s First 2 Years:

Connecting Multi-Campus Application Teams and Devices

Earth Sciences

Data Transfer Rates From 40 Gbps DTN in UCSD Physics Building,

Across Campus on PRISM DMZ, Then to Chicago’s Fermilab Over CENIC/ESnet

Source: Frank Wuerthwein, UCSD, SDSC

Based on This Success, Upgrading 40G DTN to 100G for Bandwidth Tests & Kubernetes to OSG, Caltech, and UCSC

LHC Data Analysis Running on PRP

Source: Frank Würthwein, OSG, UCSD/SDSC, PRP

Two Projects:

• OSG Cluster-in-a-Box for “T3”

• Distributed Xrootd Cache for “T2”

PRP Over CENIC

Couples UC Santa Cruz Astrophysics Cluster to LBNL NERSC Supercomputer

CENIC 2018

Innovations in

Networking

Award for

Research

Applications

100 Gbps FIONA at UCSC Allows for Downloads to the UCSC Hyades Cluster

from the LBNL NERSC Supercomputer for DESI Science Analysis

300 images per night, 100MB per raw image: 120GB per night

250 images per night, 530MB per raw image: 800GB per night

Source: Peter Nugent, LBNL

Professor of Astronomy, UC Berkeley

Precursors to LSST and NCSA

[Photo: NSF-Funded Cyberengineer Shaw Dong @UCSC Receiving a FIONA, Feb 7, 2017]

Distributed Computation on PRP Nautilus HyperCluster

Coupling SDSU Cluster and SDSC Comet Using Kubernetes Containers

Developed and executed MPI-based PRP Kubernetes Cluster execution

[Figure: [CO2,aq] 100-year simulation snapshots at 25, 75, and 100 years (4 days)]

• 0.5 km x 0.5 km x 17.5 m

• Three sandstone layers separated by two shale layers

Simulating the Injection of CO2 in Brine-Saturated Reservoirs: Poroelastic & Pressure-Velocity Fields Solved in Parallel with MPI Using Domain Decomposition Across Containers

Source: Chris Paolini and Jose Castillo, SDSU
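The communication pattern underneath that solver is classic MPI domain decomposition. Below is a generic mpi4py sketch of a 1-D decomposition with halo (ghost-cell) exchange, not the SDSU code itself:

```python
# Generic 1-D domain-decomposition halo exchange with mpi4py: the core
# communication pattern behind an MPI field solver, not the actual SDSU
# code. Run with e.g.: mpirun -n 4 python halo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 100                                  # interior cells owned by this rank
u = np.full(n + 2, float(rank))          # one ghost cell on each side

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(10):                      # a few Jacobi-style sweeps
    # Send own edge cells to neighbors; receive their edges into ghosts.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[:1], source=left)
    u[1:-1] = 0.5 * (u[:-2] + u[2:])     # update interior from neighbors
```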

PRP Enables Distributed Walk-in Virtual Reality CAVEs

[Diagram: 20x40G PRP-Connected WAVE @UC San Diego and WAVE @UC Merced, linked by 40G FIONAs over the PRP]

Transferring 5 CAVEcam Images from UCSD to UC Merced: 2 Gigabytes now takes 2 Seconds (8 Gb/sec)

The Prototype PRP Has Attracted

New Application Drivers

Scott Sellars, Marty Ralph: Center for Western Weather and Water Extremes

Frank Vernon, Graham Kent, & Ilkay Altintas: Wildfires

Jules Jaffe: Undersea Microscope

Tom Levy: At-Risk Cultural Heritage

PRP Links At-Risk Cultural Heritage and Archaeology Datasets

at UCB, UCLA, UCM and UCSD with CAVEkiosks

48-Megapixel CAVEkiosk, UCSD Library; 48-Megapixel CAVEkiosk, UCB Library; 24-Megapixel CAVEkiosk, UCM Library

UC President Napolitano's Research Catalyst Award to UC San Diego (Tom Levy),

UC Berkeley (Benjamin Porter), UC Merced (Nicola Lercari) and UCLA (Willeke Wendrich)

New PRP Application: Coupling Wireless Wildfire Sensors to Computing

[Photos: Church Fire, San Diego, CA, October 21, 2017 (Alert SD&E Cameras/HPWREN); Thomas Fire, Ventura, CA, December 10, 2017 (Firemap Tool, WIFIRE)]

CENIC 2018 Innovations in Networking Award for Experimental Applications

Mount Laguna Meteorological Sensor Instrumentation Provides Real-Time Data Flows Over HPWREN to PRP-Connected Servers

[Instrumentation: temperature, relative humidity, fuel moisture, fuel temperature, barometric pressure, solar radiation, 3D ultrasonic anemometer, anemometer, tipping rain bucket, pan-tilt-zoom camera, data logger, and support equipment]

Source: Hans-Werner Braun, SDSC

HPWREN-Connected SoCal Weather Stations Give High-Resolution Weather Data in San Diego County, All Linked by HPWREN Wireless Internet

PRP/CENIC Backbone Sets Stage for 2018 Expansion of HPWREN Wireless Connectivity Into Orange and Riverside Counties

• PRP CENIC 100G links UCSD, SDSU & UCI HPWREN servers

– FIONA endpoints

– Data redundancy

– Disaster recovery

– High availability

– Kubernetes handles software containers and data

• Potential future UCR CENIC anchor

[Map: UCR, UCI, UCSD, SDSU]

Source: Frank Vernon, Hans-Werner Braun

[Photo: HPWREN UCI antenna dedicated June 27, 2017]

Once a Wildfire is Spotted, PRP Brings High-Resolution Weather Data to Fire Modeling Workflows in WIFIRE

[Workflow diagram: real-time meteorological sensors, weather forecasts, and landscape data feed the WIFIRE Firemap and fire-perimeter workflow over the PRP]

Source: Ilkay Altintas, SDSC

Some Machine Learning Case Studies To Improve on WIFIRE

• Smoke and fire perimeter detection based on imagery

• Prediction of Santa Ana and fire conditions specific to location

• Prediction of fuel build up based on fire and weather history

• NLP for understanding local conditions based on radio communications

• Deep learning on multi-spectra imagery for high resolution fuel maps

• Classification project to generate more accurate fuel maps (using Planet Labs satellite data)

All Require Periodic, Dynamic, and Programmatic Access to Data!

Source: Ilkay Altintas, SDSC; Co-PI CHASE-CI

Collaboration on Atmospheric Water in the West Between UC San Diego and UC Irvine

Big Data Collaboration:
• CW3E, UC San Diego. Director: F. Martin Ralph. Website: cw3e.ucsd.edu
• CHRS, UC Irvine. Director: Soroosh Sorooshian. Website: http://chrs.web.uci.edu

Source: Scott Sellars, CW3E

Major Speedup in Scientific Work Flow Using the PRP

[Diagram: Calit2's FIONAs with GPUs at UC Irvine and UC San Diego, coupled to SDSC's Comet over the Pacific Research Platform (10-100 Gb/s)]

Complete workflow time: 20 days, then 20 hours, now 20 minutes!

Source: Scott Sellars, CW3E

Using Machine Learning to Determine

the Precipitation Object Starting Locations

*Sellars et al., 2017 (in prep)

UC San Diego Jaffe Lab (SIO) Scripps Plankton Camera

Off the SIO Pier with Fiber Optic Network

Over 300 Million Images So Far!

Requires Machine Learning for Automated Image Analysis and Classification

Phytoplankton: Diatoms

Zooplankton: Copepods

Zooplankton: Larvaceans

Source: Jules Jaffe, SIO

"We are using the FIONAs for image processing... this includes doing Particle Tracking Velocimetry that is very computationally intense." --Jules Jaffe
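Automated classification at this scale is typically a transfer-learning job. Here is a hedged PyTorch sketch of the idea; the folder layout, class set, and training loop are placeholders, not the Jaffe Lab's actual pipeline.

```python
# Hedged sketch of automated plankton classification via transfer learning.
# Folder layout and class count are placeholders; not the Jaffe Lab pipeline.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# e.g. plankton/diatoms/, plankton/copepods/, plankton/larvaceans/ ...
data = datasets.ImageFolder("plankton/", transform=tfm)
loader = torch.utils.data.DataLoader(data, batch_size=64, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))  # new classifier head
model.train()

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # train only the head
loss_fn = nn.CrossEntropyLoss()
for images, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```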

