
Welcome the Second Spring of Dataflow and Parallel Computing --

Toward a Path of Convergence for Ecosystems of Extreme-Scale HPC, Big Data and Beyond

Guang R. Gao ACM Fellow and IEEE Fellow

Endowed Distinguished Professor, University of Delaware

And Founder of ETI

A&M 05-16-2016 1

Outline
• Introduction
• Second Spring of HPC Parallel Computing
• New Challenges: HPC vs. Big Data – Divergence or Convergence?
• The Codelet Model and SWARM
• Challenges/Opportunities: HPC + Big Data
• Summary Remarks

Looking Back 20+ Years: The Pessimism over Our Field

• HPC is a small and relatively unimportant field?
• Is parallel computing dead? – Ken Kennedy
• Computer architecture is a dead field?
• Full Artificial Intelligence is a “fantasy”?
• The dataflow model of computation suffered a great setback ...

Looking Back 20+ Years ...

• “Parallel Computing is dead”
• “Death of computer architecture”
• “Death of dataflow model of computation”
• “Death of Artificial Intelligence!”
• ...

AIST-03-01-2016 talk 4

IPDPS2005-Keynote 5

State of Parallel Computer Architecture Innovations

– “…researchers basked in parallel-computing glory. They developed an amazing variety of parallel algorithms for every applicable sequential operation. They proposed every possible structure to interconnect thousands of processors…”

– But “... The market for massively parallel computers has collapsed, and many companies have gone out of business.”

[IEEE Computer, Nov. 1994, pp 74-75]

State of Parallel Computer Architecture Innovations

• “... The term 'proprietary architecture' has become pejorative. For computer designers, the revolution is over and only 'fine tuning' remains ...”

[“End of Architecture”, Burton Smith 1990s]

Corporations Vanishing (1985 – 2005)

[Figure: timeline from 1985 to 2005; entries as labeled]

• ETA (1989)
• ESCD (1990)
• Multiflow (1990)
• Myrias (1991)
• Meiko Scientific (1992)
• Thinking Machines (1994)
• Convex Computer (1994)
• Pyramid (1995)
• MasPar (1996)
• Kendall Square Research (1996)
• Cray Research (1996)
• BBN (1997)
• DEC (1998)
• Sequent (1999)
• nCube (2005)

Keynote at the 2005 IPDPS Conference Denver, CO

“Is Parallel Computing Dead ?” - Ken Kennedy, 1994


“The announcement that Thinking Machines would seek Chapter 11 bankruptcy protection, although not unexpected, sent shock waves through the high-performance computing community. Coupled with the well-publicized problems of Kendall Square Research and the rumored problems of Intel Supercomputer Systems Division, this event has led many people to question the long-term viability of the parallel computing industry and even parallel computing itself. Meanwhile, the dramatic strides in the performance of scientific workstations continues to squeeze the market for parallel supercomputing. On several recent occasions, I have been asked whether parallel computing will soon be relegated to the trash heap reserved for promising technologies that never quite make it. Washington certainly seems to be looking in the other direction--agency program managers, if they talk of high-performance computing at all, seem to view it as a small and relatively unimportant subcomponent of the National Information Infrastructure.”

Outline
• Introduction
• Second Spring of HPC Parallel Computing
• New Challenges: HPC vs. Big Data – Divergence or Convergence
• The Codelet Model and SWARM
• Challenges/Opportunities: HPC + Big Data
• Summary Remarks

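“Where mountains and rivers end and the road seems lost, beyond shady willows and bright flowers another village appears.”

From “A Visit to a Village West of the Mountains” (《游山西村》) by the Song-dynasty poet Lu You (陆游)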

2005-Present: A Second Spring of HPC Parallel Computing

• Sequential processing hits serious walls
  – Heat wall
  – Memory wall
  – Other walls
• Parallel processing appears to provide a powerful alternative to beat the walls
• Moore's Law appears to have still enjoyed good years in the past decade
• Two examples (see next 2 slides)

[Figure: Cyclops-64 Programming Models and System Software Supports. Layers: application programming APIs (UPC+/-, Co-array Fortran, OpenMP-XN, EARTH-C +/-, MPI, ...); Cyclops Thread Virtual Machine (thread management: thread creation and termination, scheduling, dynamic memory management; shared memory operations: put/get with sync, acquire/release; fibers; async function invocation); Kcc/gcc compiler tool chain; Cyclops-64 ISA.]

[Figure: system software research areas. Advanced execution/programming model (fine-grain multithreading, thread synchronization, load balancing, put/get, location consistency, percolation, others); base execution model (fine-grain multithreading, e.g. EARTH, CARE); infrastructure and tools (simulation/emulation, analytical modeling).]

[Figure: Cyclops-64 chip and system. Processor blocks labeled TU, SP and FPU connect through a crossbar network to memory banks; an A-switch and DMA provide the communication ports for the 3D mesh inter-chip network, with further links to off-chip memory, other chips via the 3D mesh, Ethernet and an IDE HDD. Labeled bandwidths: 4 GB/sec, 4 GB/sec * 6, 50 MB/sec, 1 Gbit/s Ethernet. System labels: 24x24, 24 PC cards in 1 shishkebab, 1 PetaFlops.]

Outline
• Introduction
• The Second Spring of HPC Parallel Computing
• HPC vs. Big Data – Divergence or Convergence
• The Codelet Model and SWARM
• Challenges/Opportunities: HPC + Big Data
• Summary Remarks

What is HPC

High-Performance Computing: The term "high-performance computing" refers to systems that, through a combination of processing capability and storage capacity, can solve computational problems that are beyond the capability of small- to medium-scale systems. [Obama’s Executive Order]

Gao-03-07-2016 MEXT talk 15

What is Big Data?

Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.

• Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy.
• The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.

Data analytics and computing ecosystem compared

Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015

Data Analytics
• Mahout: machine learning tool
• Hive: data warehouse software
• Pig: high-level language for big data
• Sqoop: exchanges data with traditional databases
• Flume: log management
• Zookeeper: maintains consistency
• Storm: real-time computation system
• Hbase: a distributed, scalable big data store
• AVRO: data serialization system

Computational Science
• FORTRAN, C, C++: languages
• PAPI: performance and debugging tool
• MPI/OpenMP: multi-core parallel model
• SLURM: batch scheduler
• Lustre: parallel file system

NOTE: The Divergence of Big Data and HPC Eco-Systems!

Waseda-01-26-2016 talk 17

Key Insights

• The tools and cultures of high-performance computing and big data analytics have diverged, to the detriment of both; unification is essential to address a spectrum of major research domains.
• The challenges of scale tax our ability to transmit data, compute complicated functions on that data, or store a substantial part of it; new approaches are required to meet these challenges.
• The international nature of science demands further development of advanced computer architectures and global standards for processing data, even as international competition complicates the openness of the scientific process.

Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015

Outline
• Introduction
• Second Spring of HPC Parallel Computing
• New Challenges: HPC vs. Big Data – Divergence or Convergence
• The Codelet Model and SWARM
• Challenges/Opportunities: HPC + Big Data
• Summary Remarks

A Quiz: Have you heard the following terms?

• Actors (dataflow)?
• Strand?
• Fiber?
• Codelet?

Coarse-Grain vs. Fine-Grain Multithreading

[Figure: two panels, each showing a CPU with thread units and memory. Coarse-grain thread (the “family home” model): a single thread with its executor locus occupies a thread unit. Fine-grain non-preemptive thread (the “hotel” model): a thread pool shares the thread units.]

What Is A Codelet?

• Intuitively: a unit of computation which interacts with the global state only at its entrance and exit points.
• Terminology: I do not like to use the term “functional” here, which usually means “stateless”!

Operational Semantics of Codelets Enabling/Firing Rules

Consider a codelet graph G, with an assignment of events on some of its edges:

• A codelet is enabled if
  – an event is present on each of its input edges;
  – none of its output edges holds an event.
• An enabled codelet can be scheduled for execution (i.e. fired). The firing of a codelet removes all input events (one from each input edge) and produces output events, one on each output edge.
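The enabling and firing rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Codelet` class, the `enabled`, `fire` and `run` helpers, and the two-codelet graph are hypothetical names made up for this sketch, not part of SWARM or any codelet runtime. Events are modeled as per-edge counters.

```python
from dataclasses import dataclass


@dataclass
class Codelet:
    """Hypothetical codelet record: a name plus its input and output edges."""
    name: str
    inputs: list   # names of input edges
    outputs: list  # names of output edges


def enabled(codelet, events):
    """Enabled: an event on every input edge and no event on any output edge."""
    return (all(events[e] > 0 for e in codelet.inputs)
            and all(events[e] == 0 for e in codelet.outputs))


def fire(codelet, events):
    """Firing removes one event from each input and puts one on each output."""
    assert enabled(codelet, events)
    for e in codelet.inputs:
        events[e] -= 1
    for e in codelet.outputs:
        events[e] += 1


def run(codelets, events):
    """Fire enabled codelets until none is enabled; return the firing order."""
    order = []
    while True:
        ready = [c for c in codelets if enabled(c, events)]
        if not ready:
            return order
        fire(ready[0], events)
        order.append(ready[0].name)


# Assumed example graph: in -> A -> ab -> B -> out.
# Edge "ab" already holds an event, so A is blocked until B consumes it.
a = Codelet("A", ["in"], ["ab"])
b = Codelet("B", ["ab"], ["out"])
demo_events = {"in": 1, "ab": 1, "out": 0}
print(run([a, b], demo_events), demo_events)
```

In this assumed graph, B has to fire before A because A's output edge is occupied, which is the second enabling condition at work. The run stops once no codelet is enabled.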


The Codelet: A Fine-Grain Piece of Computing

[Figure: a codelet takes data objects as input and produces a result object.]

• A codelet is fired when all its inputs are available.
• Inputs can be data or resource conditions.
• Fundamental properties of dataflow: determinacy, repeatability, composability, among others.
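The determinacy property can be illustrated with a small sketch: when a node may fire only once all of its inputs exist, the final result does not depend on the order in which ready nodes are picked. The graph for (a + b) * (c - d), the `GRAPH` table and the `run_dataflow` helper are assumptions made up for this illustration, and resource conditions are not modeled.

```python
import operator
import random

# Hypothetical graph for (a + b) * (c - d): node -> (operation, input names)
GRAPH = {
    "sum": (operator.add, ("a", "b")),
    "diff": (operator.sub, ("c", "d")),
    "prod": (operator.mul, ("sum", "diff")),
}


def run_dataflow(graph, inputs, seed):
    """Fire nodes in a seed-dependent order; a node fires only when all inputs exist."""
    rng = random.Random(seed)
    values = dict(inputs)
    pending = set(graph)
    trace = []
    while pending:
        ready = sorted(n for n in pending
                       if all(i in values for i in graph[n][1]))
        if not ready:
            raise ValueError("graph has unsatisfied inputs")
        node = rng.choice(ready)          # the scheduler's choice is arbitrary
        op, args = graph[node]
        values[node] = op(*(values[i] for i in args))
        pending.remove(node)
        trace.append(node)
    return values, trace


for seed in range(3):
    values, trace = run_dataflow(GRAPH, {"a": 2, "b": 3, "c": 7, "d": 4}, seed)
    print(trace, values["prod"])
```

Whatever order the scheduler picks for `sum` and `diff`, `prod` fires last and yields the same value. That is also what makes a run repeatable.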

This Looks Like Data Flow ! - Jack Dennis

Evolution of Multithreaded Execution and Architecture Models

[Figure: genealogy chart with two branches, “Non-dataflow based” and “Dataflow model inspired”. Entries as labeled, in extraction order:]

• CDC 6600 (1964)
• MASA, Halstead (1986)
• HEP, B. Smith (1978)
• Cosmic Cube, Seitz (1985)
• J-Machine, Dally (1988-93)
• M-Machine, Dally (1994-98)
• MIT TTDA, Arvind (1980)
• Manchester, Gurd & Watson (1982)
• *T/StarT-NG, MIT/Motorola (1991-)
• SIGMA-I, Shimada (1988)
• Monsoon, Papadopoulos & Culler (1988)
• P-RISC, Nikhil & Arvind (1989)
• EM-5/4/X, RWC-1 (1992-97)
• Iannucci’s (1988-92)
• Others: Multiscalar (1994), SMT (1995), etc.
• Flynn’s Processor (1969)
• CHoPP’77, CHoPP’87
• TAM, Culler (1990)
• Tera, B. Smith (1990-)
• Alewife, Agarwal (1989-96)
• Cilk, Leiserson
• LAU, Syre (1976)
• Eldorado
• CASCADE
• Static Dataflow, Dennis (1972, MIT)
• Arg-Fetching Dataflow, Dennis & Gao (1987-88)
• MDFA, Gao (1989-93)
• EARTH, Hum et al. (1993-2006)
• HTVM/TNT-X, del Cuvillo and Gao (2000-2010)
• Codelet Model, Gao et al. (2009-)

An early version of this slide was presented in my invited talk at Turing Award winner Fran Allen’s retirement party, 2002.

DataFlow = Data + Flow

What Is the Dataflow Model?

DataFlow = Data + Flow

A Misunderstanding of the Dataflow Model

Dataflow Model of Computation (pioneered by J.B. Dennis, early 1970s)

• A data-driven program execution model (PXM) where a program unit is enabled for execution upon the availability of its input data at runtime.
• It has long been viewed as a radical (disruptive) departure from the classical von Neumann computation model (often referred to as a control-driven or control-flow PXM).

Inspiration: Jack Dennis

General purpose parallel machines based on a dataflow graph model of computation

In 2013, he received the IEEE John von Neumann Medal for his major contributions to operating systems and dataflow.

Dataflow Model Superiority -- Breaking the “Two Walls and One Lock” (两墙一锁)

• Breaking the serialization barrier to parallelism exploitation imposed by the von Neumann computation model [Arvind 1982]
• Breaking the von Neumann (CPU-memory) bottleneck: provide a tight/smooth coupling of processing and data [Backus 1977 ACM Turing Award]
• Unlocking the shackle of traditional OS and VM [Spark latest news: “Deep Dive Into Databricks’ Big Speedup Plans for Apache Spark”, May 2015]


What is SWARM?


What Is SWARM (cont’d) ?


[Figure: the SWARM (SWift Adaptive Runtime Machine) software stack, top to bottom:]
• Users
• Programming models: high-level programming APIs (MPI, OpenMP, CnC, X10, Chapel, etc.)
• Programming environment: software packages, program libraries, utility applications, compilers, tools/SDK
• SWARM execution model: execution model API, language runtime, SWARM runtime
• Platforms: exascale hardware architecture

What is SWARM?

• SWARM = SWift Adaptive Runtime Machine
• SWARM is a commercialization of an Abstract Codelet Machine (ACM)
• SWARM is developed and marketed by ETI, a small business based in Delaware
• SWARM is available to academia under a special license and agreement

US DOE X-Stack Program --- 9 Awardees

• D-TEC: http://www.dtec-xstack.org
• XPRESS: http://xstack.sandia.gov/xpress
• XTUNE: http://ctop.cs.utah.edu/x-tune/
• GVR: http://gvr.cs.uchicago.edu
• Traleika Glacier: https://sites.google.com/site/traleikaglacierxstack/
• DynAX: http://www.etinternational.com/xstack
• SLEEC: https://engineering.purdue.edu/~milind/sleec/
• DEGAS: http://crd.lbl.gov/groups-depts/future-technologies-group/projects/DEGAS/
• CORVETTE: http://crd.lbl.gov/groups-depts/future-technologies-group/projects/corvette/

[Figure label: Codelet based]

Programming & Execution Model

• Event-driven tasks (EDTs), inspired by the Delaware codelet model
  – Dataflow-inspired codelets (self-contained, “atomic”)
  – Non-blocking, no preemption
• Programming model
  – Separation of concerns: domain specification and HW mapping
  – Express data locality with hierarchical tiling
  – Global, shared, non-coherent address space
  – Optimization and auto-generation of EDTs (HW specific)
• Execution model
  – Dynamic, event-driven scheduling, non-blocking
  – Dynamic decision to move computation to data
  – Observation-based adaptation (self-awareness)
  – Implemented in the runtime environment
• Separation of concerns: user application, control, and resource management

Outline
• Introduction
• Second Spring of HPC Parallel Computing
• New Challenges: HPC vs. Big Data – Divergence or Convergence?
• The Codelet Model and SWARM
• Challenges/Opportunities: HPC + Big Data
• Summary Remarks

Data analytics and computing ecosystem compared

Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015

[Figure: the two software stacks side by side, layer by layer]

Application level
• Data analytics ecosystem: Mahout, R, and applications; Hive, Pig, Sqoop, Flume
• Computational science ecosystem: applications and community codes; FORTRAN, C, C++, and IDEs; domain-specific libraries

Middleware and management
• Data analytics ecosystem: Storm, Map-Reduce; AVRO; Hbase Big Table (key-value store); HDFS (Hadoop File System); Zookeeper (coordination); cloud services (e.g. AWS)
• Computational science ecosystem: MPI/OpenMP + accelerator tools; numerical libraries; performance and debugging; Lustre (parallel file system); batch scheduler (such as SLURM); system monitoring tools

System software
• Data analytics ecosystem: virtual machines and cloud services (optional); Linux OS variant
• Computational science ecosystem: Linux OS variant

Cluster hardware
• Data analytics ecosystem: Ethernet switches; local node storage; commodity X86 racks
• Computational science ecosystem: Infiniband + Ethernet switches; SAN + local node storage; X86 racks + GPUs or accelerators

NOTE: The Divergence of Big Data and HPC Eco-Systems!

Issues and Challenges - The Three Gaps?

• How to handle the divergence (gap) between the ecosystems of extreme-scale computation and big data?
• How to bridge the gaps between
  – data vs. knowledge
  – knowledge vs. “$$”s
• How best to encourage and guide innovation and entrepreneurship to bridge the gaps?

Innovation and Entrepreneurship – Jointly Funded by Public and Private Sector Partnerships

• Most recently: Obama and the DoD went to Silicon Valley and announced a "cooperation" with private-sector VCs to promote/fund entrepreneurship for US mission-driven innovations
• Small business participation is critical for ground-breaking R&D
• Very pleased to witness new momentum from the Japan side
• A broad ground opens up for Japan-US collaboration
• An example

From My Own Experience : Three Case Studies


Case I:Cyclops 64 Project

Cyclops-64 1.1 Petaflops 13 TB memory

Processor Rack 11.52 Teraflops 144 GB memory

Mid-plane 3.84 Teraflops 48 GB memory

Node Card 80 Gigaflops 1 GB memory

Cyclops-64 ASIC 80 Procs + 16 iCaches + 96 Port xbar switch

C64 Processor 2 Thread Units + 60 KB SRAM + 1FP Unit

• Cyclops-64 System (Blue Gene/C)
  – employs the first generation of large-scale many-core chip technology (160 cores/chip, up to 10,000 chips/system)
• In the Cyclops-64 system, the dataflow model is leveraged to break the OS barrier
• Invention of the TiNy-Threads (TNT) PXM
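The figures in the Cyclops-64 hierarchy can be cross-checked with a little arithmetic. In this sketch the per-level counts (node cards per mid-plane, mid-planes per rack, racks per system) are not stated on the slide but inferred from the ratios of the quoted numbers. The 24 x 24 x 24 node count is an assumption extrapolated from the "24x24" mesh label in the earlier chip figure.

```python
from fractions import Fraction

# Figures as quoted on the slide.
NODE_GFLOPS = 80                      # node card: 80 Gigaflops, 1 GB memory
NODE_MEMORY_GB = 1
MIDPLANE_TFLOPS = Fraction("3.84")    # mid-plane: 3.84 Teraflops, 48 GB
RACK_TFLOPS = Fraction("11.52")       # processor rack: 11.52 Teraflops, 144 GB
PROCESSORS_PER_CHIP = 80              # "80 Procs" per ASIC
THREAD_UNITS_PER_PROCESSOR = 2        # "2 Thread Units" per C64 processor

# Derived counts (inferred from the ratios, not stated on the slide).
thread_units_per_chip = PROCESSORS_PER_CHIP * THREAD_UNITS_PER_PROCESSOR
nodes_per_midplane = MIDPLANE_TFLOPS * 1000 / NODE_GFLOPS
midplanes_per_rack = RACK_TFLOPS / MIDPLANE_TFLOPS
nodes_per_rack = nodes_per_midplane * midplanes_per_rack

# Assumption: a full system is a 24 x 24 x 24 mesh of nodes.
nodes_in_system = 24 ** 3
racks = Fraction(nodes_in_system) / nodes_per_rack
system_pflops = nodes_in_system * NODE_GFLOPS / Fraction(10 ** 6)
system_memory_gb = nodes_in_system * NODE_MEMORY_GB

print("thread units per chip:", thread_units_per_chip)
print("node cards per mid-plane:", nodes_per_midplane)
print("node cards per rack:", nodes_per_rack)
print("racks:", racks)
print("system Petaflops:", float(system_pflops))
print("system memory (GB):", system_memory_gb)
```

Under that assumption the totals come out close to the slide's headline figures of 1.1 Petaflops and 13 TB of memory, and the 160 thread units per chip match the "160 cores/chip" quoted above.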

[Figure: the Big Data and HPC ecosystems compared layer by layer]

Domain-specific application
• Big Data ecosystem: data analytics, machine learning, real-time processing, financial applications
• HPC ecosystem: computational chemistry, bioinformatics, computational physics

Programming model
• Big Data ecosystem: Map-Reduce based API, Spark RDD API, Flowlet-based API, Storm API
• HPC ecosystem: MPI API, OpenMP API, Codelet API, TiNy-Threads API

System software
• Big Data ecosystem: Hadoop, Spark, Storm, CrAMER/HAMR
• HPC ecosystem: MPI, OpenMP, SWARM, OpenCL, EARTH, Cilk

Hardware
• Big Data ecosystem: Ethernet switches, local node storage, commodity X86 racks
• HPC ecosystem: Infiniband and Ethernet switches, SAN and local node storage, X86 racks, GPUs or accelerators

[Figure: the same layer-by-layer comparison of the Big Data and HPC ecosystems as on the previous slide, with HAMR in place of CrAMER/HAMR, now carrying the label:]

DataFlow Model Inspired Runtime System

Outline
• Introduction
• Obama’s Executive Order
• The Codelet Model and SWARM
• Challenges/Opportunities: HPC + Big Data
• Summary Remarks

My Remarks on the “Second Spring”

• Entrepreneurship and Innovation are critical for the success in the “second spring”!

• But, we must never forget the past lessons that we have learned in the 1st Spring!


Wang Xing – Forbes Profile

• Wang Xing on Forbes Lists: #83, China Rich List (2015)

Wang Xing cracks the top 100 richest in China for the first time. Wang created the largest "online-to-offline" commerce company in China. Wang, a student of Prof. Guang R. Gao at the University of Delaware, went back to China in 2005 and started his career in innovation and entrepreneurship.

http://www.forbes.com/profile/wang-xing/

Latest news: Chinese startup raises largest private funding round ever! Published: Jan 19, 2016, 9:53 a.m. ET

The Three-Way JV Model

• Three-way investment: public support + private industry giant + small entrepreneur
  – My own experience

Acknowledgements

• Sponsors: DOE, DOD, NSF, etc.
• Frederica Darema, AFOSR (DDDAS)
• Colleagues and collaborators
• Intel, UIUC, Indiana U/LSU, Rice, and many others in the DOE X-Stack Program
• Waseda University (Prof. Kasahara and others)
• Japan SGU Program and Dean Sugano
• University of Delaware, ETI and CAPSL

