Welcome the Second Spring of Dataflow and Parallel Computing --
Toward a Path of Convergence for Ecosystems of Extreme-Scale HPC, Big Data and Beyond
Guang R. Gao ACM Fellow and IEEE Fellow
Endowed Distinguished Professor, University of Delaware
And Founder of ETI
A&M 05-16-2016 1
Outline
• Introduction
• Second Spring of HPC Parallel Computing
• New Challenges: HPC vs. Big Data – Divergence or Convergence?
• The Codelet Model and SWARM
• Challenges/Opportunities: HPC + Big Data
• Summary Remarks
Looking Back 20+ Years: The Pessimism over Our Field
• HPC is a small and relatively unimportant field?
• Is parallel computing dead? (Ken Kennedy)
• Computer architecture is a dead field?
• Full artificial intelligence is a “fantasy”?
• The dataflow model of computation suffered a great setback ...
Looking Back 20+ Years ...
• “Parallel Computing is dead”
• “Death of computer architecture”
• “Death of dataflow model of computation”
• “Death of Artificial Intelligence!”
• ...
AIST-03-01-2016 talk 4
IPDPS2005-Keynote 5
State of Parallel Computer Architecture Innovations
– “... researchers basked in parallel-computing glory. They developed an amazing variety of parallel algorithms for every applicable sequential operation. They proposed every possible structure to interconnect thousands of processors ...”
– But “... The market for massively parallel computers has collapsed, and many companies have gone out of business.”
[IEEE Computer, Nov. 1994, pp. 74-75]
State of Parallel Computer Architecture Innovations
• “... The term 'proprietary architecture' has become pejorative. For computer designers, the revolution is over and only 'fine tuning' remains ...”
[“End of Architecture”, Burton Smith, 1990s]
Corporations Vanishing (1985 – 2005)
[Figure: timeline from 1985 to 2005 marking each company and its year]
• ETA: 1989
• ESCD: 1990
• Multiflow: 1990
• Myrias: 1991
• Meiko Scientific: 1992
• Thinking Machines: 1994
• Convex Computer: 1994
• Pyramid: 1995
• MasPar: 1996
• Kendall Square Research: 1996
• Cray Research: 1996
• BBN: 1997
• DEC: 1998
• Sequent: 1999
• nCube: 2005
Keynote at the 2005 IPDPS Conference Denver, CO
“Is Parallel Computing Dead ?” - Ken Kennedy, 1994
“The announcement that Thinking Machines would seek Chapter 11 bankruptcy protection, although not unexpected, sent shock waves through the high-performance computing community. Coupled with the well-publicized problems of Kendall Square Research and the rumored problems of Intel Supercomputer Systems Division, this event has led many people to question the long-term viability of the parallel computing industry and even parallel computing itself. Meanwhile, the dramatic strides in the performance of scientific workstations continues to squeeze the market for parallel supercomputing. On several recent occasions, I have been asked whether parallel computing will soon be relegated to the trash heap reserved for promising technologies that never quite make it. Washington certainly seems to be looking in the other direction--agency program managers, if they talk of high-performance computing at all, seem to view it as a small and relatively unimportant subcomponent of the National Information Infrastructure.”
2005-Present: A Second Spring of HPC Parallel Computing
• Sequential processing hits serious walls
– Heat wall
– Memory wall
– Other walls
• Parallel processing appears to provide a powerful alternative to beat the walls
• Moore's Law appears to have still enjoyed good years in the past decade
• Two examples (see next 2 slides)
[Figure: Cyclops-64 Programming Models and System Software Supports. Labels shown: application programming APIs (UPC+/-, Co-array Fortran, OpenMP-XN, EARTH-C +/-, MPI, ...); Cyclops Thread Virtual Machine with thread management (thread creation and termination, scheduling, dynamic memory management), shared memory operations (put/get, put/get with sync, acquire/release), thread synchronization, fibers, async function invocation, load balancing, and others; Kcc/gcc compiler tool chain; system software; infrastructure and tools (simulation/emulation, analytical modeling); base execution model: fine-grain multithreading (e.g. EARTH, CARE); advanced execution/programming model (location consistency, percolation); Cyclops-64 ISA.]
[Figure: Cyclops-64 chip and system diagram. Labels shown: thread units (TU), SP, FPU, memory banks, crossbar network, A-switch, DMA, off-chip memory, IDE HDD, 1 Gbit/s Ethernet, communication ports for the 3D mesh inter-chip network (other chips via 3D mesh); link rates 4 GB/sec (x 6), 4 GB/sec, 50 MB/sec; 24x24; 24 PC cards in 1 shishkebab; 1 PetaFlops.]
What is HPC
High-Performance Computing: The term "high-performance computing" refers to systems that, through a combination of processing capability and storage capacity, can solve computational problems that are beyond the capability of small- to medium-scale systems. [Obama’s Executive Order]
Gao-03-07-2016 MEXT talk 15
What is Big Data?
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.
• Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy.
• The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Data analytics and computing ecosystem compared
Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015
Data Analytics
• Mahout: machine learning tool
• Hive: data warehouse software
• Pig: high-level language for big data
• Sqoop: data exchange with traditional databases
• Flume: log management
• Zookeeper: maintaining consistency
• Storm: real-time computation system
• Hbase: a distributed, scalable big data store
• AVRO: data serialization system
Computational Science
• FORTRAN, C, C++: languages
• PAPI: performance and debugging tool
• MPI/OpenMP: parallel programming models
• SLURM: batch scheduler
• Lustre: parallel file system
NOTE: The Divergence of Big Data and HPC Eco-Systems!
Waseda-01-26-2016 talk 17
Key Insights
• The tools and cultures of high-performance computing and big data analytics have diverged, to the detriment of both; unification is essential to address a spectrum of major research domains.
• The challenges of scale tax our ability to transmit data, compute complicated functions on that data, or store a substantial part of it; new approaches are required to meet these challenges.
• The international nature of science demands further development of advanced computer architectures and global standards for processing data, even as international competition complicates the openness of the scientific process.
Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015
A Quiz: Have you heard the following terms?
• Actors (dataflow)?
• Strand?
• Fiber?
• Codelet?
Coarse-Grain vs. Fine-Grain Multithreading
[Figure: two panels, each showing a CPU and memory]
• Fine-grain non-preemptive thread, the “hotel” model (labels: Thread Unit, A Thread Pool)
• Coarse-grain thread, the “family home” model (labels: Executor Locus, A Single Thread, Thread Unit)
What Is a Codelet?
• Intuitively: a unit of computation that interacts with the global state only at its entrance and exit points
• Terminology: I prefer not to use the term “functional” here, since it usually means “stateless”!
Operational Semantics of Codelets: Enabling/Firing Rules
Consider a codelet graph G with an assignment of events on some of its edges:
• A codelet is enabled if
– an event is present on each of its input edges;
– none of its output edges holds an event.
• An enabled codelet can be scheduled for execution (i.e. fired). Firing a codelet removes one event from each of its input edges and produces one event on each of its output edges.
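The enabling and firing rules can be tried out on a toy graph. The following is only a minimal sketch, not the SWARM API: the class `CodeletGraph`, its methods, and the edge names are hypothetical, and each edge is assumed to hold at most one event.

```python
class CodeletGraph:
    """Toy codelet graph (hypothetical, for illustration only).

    Each edge holds at most one event, stored as edge name -> value.
    """

    def __init__(self):
        self.codelets = {}  # name -> (function, input edges, output edges)
        self.events = {}    # edge name -> value of the event on that edge

    def add_codelet(self, name, fn, inputs, outputs):
        self.codelets[name] = (fn, list(inputs), list(outputs))

    def put_event(self, edge, value):
        self.events[edge] = value

    def is_enabled(self, name):
        _, inputs, outputs = self.codelets[name]
        has_all_inputs = all(e in self.events for e in inputs)
        outputs_are_empty = all(e not in self.events for e in outputs)
        return has_all_inputs and outputs_are_empty

    def fire(self, name):
        fn, inputs, outputs = self.codelets[name]
        args = [self.events.pop(e) for e in inputs]  # remove one event per input
        results = fn(*args)                          # tuple, one value per output
        for edge, value in zip(outputs, results):    # produce one event per output
            self.events[edge] = value

    def run(self):
        """Fire enabled codelets until none is enabled; return the firing order."""
        fired = []
        progress = True
        while progress:
            progress = False
            for name in self.codelets:
                if self.is_enabled(name):
                    self.fire(name)
                    fired.append(name)
                    progress = True
        return fired


# (a + b) * (a - b) with a = 5, b = 3; each consumer gets its own input edge.
g = CodeletGraph()
g.add_codelet("add", lambda x, y: (x + y,), ["a1", "b1"], ["s"])
g.add_codelet("sub", lambda x, y: (x - y,), ["a2", "b2"], ["d"])
g.add_codelet("mul", lambda x, y: (x * y,), ["s", "d"], ["out"])
for edge, value in [("a1", 5), ("b1", 3), ("a2", 5), ("b2", 3)]:
    g.put_event(edge, value)

fired = g.run()
result = g.events["out"]
print(fired, result)
```

The `mul` codelet cannot fire until both `add` and `sub` have placed events on its input edges, and a codelet whose output edge is still occupied stays disabled until that event is consumed.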
The Codelet: A Fine-Grain Piece of Computing
[Figure: data objects flow into a codelet, which produces a result object]
• A codelet is fired when all its inputs are available.
• Inputs can be data or resource conditions.
• Fundamental properties of dataflow: determinacy, repeatability, composability, among others.
This Looks Like Data Flow! - Jack Dennis
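Determinacy can be illustrated with a small experiment: if codelets are side-effect free and fire only when all inputs are available, the result should not depend on which ready codelet the scheduler happens to pick. The sketch below assumes a hypothetical `run_dataflow` helper with per-codelet dependence counters; it is not taken from any codelet runtime.

```python
import random


def run_dataflow(graph, inputs, seed):
    """Run a codelet graph, choosing among ready codelets at random.

    graph:  codelet name -> (function, list of names it depends on)
    inputs: source name -> initial value
    (Both structures are illustrative assumptions, not a real runtime API.)
    """
    rng = random.Random(seed)
    values = dict(inputs)
    # Dependence counter: how many inputs each codelet is still waiting for.
    pending = {name: sum(dep not in values for dep in deps)
               for name, (_, deps) in graph.items()}
    ready = [name for name, count in pending.items() if count == 0]
    order = []
    while ready:
        name = ready.pop(rng.randrange(len(ready)))  # arbitrary scheduling choice
        fn, deps = graph[name]
        values[name] = fn(*(values[d] for d in deps))
        order.append(name)
        # Signal consumers; a codelet becomes ready when its counter reaches zero.
        for other, (_, other_deps) in graph.items():
            if name in other_deps and other not in values:
                pending[other] -= other_deps.count(name)
                if pending[other] == 0:
                    ready.append(other)
    return values, order


graph = {
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "diff": (lambda a, b: a - b, ["a", "b"]),
    "prod": (lambda s, d: s * d, ["sum", "diff"]),
    "neg": (lambda a: -a, ["a"]),
    "out": (lambda p, n: p + n, ["prod", "neg"]),
}
inputs = {"a": 5, "b": 3}

runs = [run_dataflow(graph, inputs, seed) for seed in range(20)]
results = {values["out"] for values, _ in runs}
print(results)
```

Every run yields the same value for `out` even though the firing order may differ from seed to seed, which is the property the slide calls determinacy.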
Evolution of Multithreaded Execution and Architecture Models
[Figure: family tree with two branches, “Non-dataflow based” and “Dataflow model inspired”. Entries shown:]
• CDC 6600, 1964
• MASA, Halstead, 1986
• HEP, B. Smith, 1978
• Cosmic Cube, Seitz, 1985
• J-Machine, Dally, 1988-93
• M-Machine, Dally, 1994-98
• MIT TTDA, Arvind, 1980
• Manchester, Gurd & Watson, 1982
• *T/Start-NG, MIT/Motorola, 1991-
• SIGMA-1, Shimada, 1988
• Monsoon, Papadopoulos & Culler, 1988
• P-RISC, Nikhil & Arvind, 1989
• EM-5/4/X, RWC-1, 1992-97
• Iannucci's, 1988-92
• Others: Multiscalar (1994), SMT (1995), etc.
• Flynn's Processor, 1969
• CHoPP'77, CHoPP'87
• TAM, Culler, 1990
• Tera, B. Smith, 1990-
• Alewife, Agarwal, 1989-96
• Cilk, Leiserson
• LAU, Syre, 1976
• Eldorado
• CASCADE
• Static Dataflow, Dennis, 1972, MIT
• Arg-Fetching Dataflow, Dennis & Gao, 1987-88
• MDFA, Gao, 1989-93
• EARTH, Hum et al., 1993-2006
• HTVM/TNT-X, del Cuvillo and Gao, 2000-2010
• Codelet Model, Gao et al., 2009-
An early version of this slide was presented in my invited talk at Turing Award winner Fran Allen's retirement party, 2002.
Dataflow Model of Computation (pioneered by J.B. Dennis, early 1970s)
• A data-driven program execution model (PXM) where a program unit is enabled for execution upon the availability of its input data at runtime.
• It has long been viewed as a radical (disruptive) departure from the classical von Neumann computation model (often referred to as a control-driven or control-flow PXM).
Inspiration: Jack Dennis
General-purpose parallel machines based on a dataflow graph model of computation
In 2013 he received the IEEE John von Neumann Medal for his major contributions to operating systems and dataflow.
Dataflow Model Superiority: Breaking the “Two Walls and One Lock”
• Breaking the serialization barrier to parallelism exploitation imposed by the von Neumann computation model [Arvind 1982]
• Breaking the von Neumann (CPU-memory) bottleneck: provide a tight/smooth coupling of processing and data [Backus 1977 ACM Turing Award]
• Unlocking the shackle of traditional OS and VM [Spark latest news: “Deep Dive Into Databricks’ Big Speedup Plans for Apache Spark”, May 2015]
[Figure: the SWARM execution model in the software stack; SWARM is labeled “SWift Abstract Runtime Machine” in the figure. Labels shown: users; programming environment platforms (software packages, program libraries, utility applications, compilers, tools/SDK); programming models with high-level programming APIs (MPI, OpenMP, CnC, X10, Chapel, etc.); language runtime; execution model API; SWARM runtime; SWARM execution model; exascale hardware architecture.]
What is SWARM?
• SWARM = SWift Adaptive Runtime Machine
• SWARM is a commercialization of an Abstract Codelet Machine (ACM)
• SWARM is developed and marketed by ETI, a small business based in Delaware
• SWARM is available to academia under a special license and agreement
US DOE X-Stack Program: 9 Awardees
• D-TEC: http://www.dtec-xstack.org
• XPRESS: http://xstack.sandia.gov/xpress
• XTUNE: http://ctop.cs.utah.edu/x-tune/
• GVR: http://gvr.cs.uchicago.edu
• Traleika Glacier: https://sites.google.com/site/traleikaglacierxstack/
• DynAX: http://www.etinternational.com/xstack
• SLEEC: https://engineering.purdue.edu/~milind/sleec/
• DEGAS: http://crd.lbl.gov/groups-depts/future-technologies-group/projects/DEGAS/
• CORVETTE: http://crd.lbl.gov/groups-depts/future-technologies-group/projects/corvette/
Codelet based
Programming & Execution Model
Event-driven tasks (EDTs, inspired by the Delaware codelet model):
• Dataflow-inspired codelets (self-contained, “atomic”)
• Non-blocking, no preemption
Programming model:
• Separation of concerns: domain specification and HW mapping
• Express data locality with hierarchical tiling
• Global, shared, non-coherent address space
• Optimization and auto-generation of EDTs (HW specific)
Execution model:
• Dynamic, event-driven scheduling, non-blocking
• Dynamic decision to move computation to data
• Observation-based adaptation (self-awareness)
• Implemented in the runtime environment
• Separation of concerns: user application, control, and resource management
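As an illustration of "express data locality with hierarchical tiling", the sketch below splits a 2D index space into outer tiles and then into inner tiles. The function names and tile sizes are hypothetical and do not come from any X-Stack runtime; the idea is only that each inner tile touches a compact block of data and could be handed to one event-driven task.

```python
def tiles(start, stop, size):
    """Split the half-open range [start, stop) into chunks of at most `size`."""
    for lo in range(start, stop, size):
        yield lo, min(lo + size, stop)


def hierarchical_tiles(n_rows, n_cols, outer, inner):
    """Yield (outer_tile, inner_tile) for a two-level tiling of a 2D grid.

    Each tile is ((row_lo, row_hi), (col_lo, col_hi)) with half-open bounds.
    `outer` and `inner` are illustrative tile edge lengths.
    """
    for r_lo, r_hi in tiles(0, n_rows, outer):
        for c_lo, c_hi in tiles(0, n_cols, outer):
            outer_tile = ((r_lo, r_hi), (c_lo, c_hi))
            for ir_lo, ir_hi in tiles(r_lo, r_hi, inner):
                for ic_lo, ic_hi in tiles(c_lo, c_hi, inner):
                    yield outer_tile, ((ir_lo, ir_hi), (ic_lo, ic_hi))


def tiled_sum(matrix, outer=4, inner=2):
    """Sum a 2D list tile by tile; each inner tile stands in for one task."""
    total = 0
    n_rows, n_cols = len(matrix), len(matrix[0])
    for _, ((r_lo, r_hi), (c_lo, c_hi)) in hierarchical_tiles(
            n_rows, n_cols, outer, inner):
        for r in range(r_lo, r_hi):
            for c in range(c_lo, c_hi):
                total += matrix[r][c]
    return total


# A 5 x 6 matrix holding 0..29; sizes chosen so that edge tiles are partial.
matrix = [[r * 6 + c for c in range(6)] for r in range(5)]
total = tiled_sum(matrix)
print(total)
```

In a real system the outer level would typically map to a node or memory domain and the inner level to a core-local buffer; here both levels are plain loops so the traversal order is easy to inspect.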
Data analytics and computing ecosystem compared
[Figure: side-by-side software stacks. Layer labels shown: Application Level, Middleware and Management, System Software, Cluster Software.]
Data Analytics Ecosystem
• Mahout, R, and applications
• Hive, Pig, Sqoop, Flume
• Storm, Map-Reduce
• AVRO
• Hbase, Big Table (key-value store)
• HDFS (Hadoop File System)
• Zookeeper (coordination)
• Cloud services (e.g. AWS)
• Virtual machines and cloud services (optional)
• Linux OS variant
• Ethernet switches
• Local node storage
• Commodity X86 racks
Computational Science Ecosystem
• Applications and community codes
• FORTRAN, C, C++, and IDEs
• Domain-specific libraries
• MPI/OpenMP + accelerator tools
• Numerical libraries
• Performance and debugging
• Lustre (parallel file system)
• Batch scheduler (such as SLURM)
• System monitoring tools
• Linux OS variant
• Infiniband + Ethernet switches
• SAN + local node storage
• X86 racks + GPUs or accelerators
Issues and Challenges: The Three Gaps?
• How to handle the divergence (gap) between the ecosystems of extreme-scale computation and big data?
• How to bridge the gaps between
– data vs. knowledge
– knowledge vs. “$$”s
• How to best encourage and guide innovation and entrepreneurship to bridge the gaps?
Innovation and Entrepreneurship – Jointly Funded by Public and Private Sector Partnerships
• Most recently: Obama and the DoD went to Silicon Valley and announced a "cooperation" with private-sector VCs to promote/fund entrepreneurship for US mission-driven innovations
• Small business participation is critical for ground-breaking R&D
• Very pleased to witness new momentum from the Japan side
• Broad ground is opening up for Japan-US collaboration
• An example
Case I: Cyclops-64 Project
System hierarchy:
• C64 processor: 2 thread units + 60 KB SRAM + 1 FP unit
• Cyclops-64 ASIC: 80 processors + 16 iCaches + 96-port crossbar switch
• Node card: 80 Gigaflops, 1 GB memory
• Mid-plane: 3.84 Teraflops, 48 GB memory
• Processor rack: 11.52 Teraflops, 144 GB memory
• Cyclops-64 system: 1.1 Petaflops, 13 TB memory
Highlights:
• The Cyclops64 system (Blue Gene/C) employs the first generation of large-scale many-core chip technology (160 cores/chip, up to 10,000 chips/system)
• The Cyclops64 system leverages the dataflow model to break the OS barrier
• Invention of the TiNy-Threads (TNT) PXM
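The figures in the Cyclops-64 hierarchy can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the packaging counts it derives (node cards per mid-plane, mid-planes per rack, racks per system) are inferred from ratios and are not stated in the source, and decimal units are assumed.

```python
GIGA, TERA, PETA = 10**9, 10**12, 10**15

# Figures quoted on the slide: (peak flops, memory in bytes), decimal units assumed.
levels = {
    "node card": (80 * GIGA, 1 * GIGA),
    "mid-plane": (3.84 * TERA, 48 * GIGA),
    "rack": (11.52 * TERA, 144 * GIGA),
    "system": (1.1 * PETA, 13 * TERA),
}


def ratio(upper, lower):
    """Return (flops ratio, memory ratio) between two levels of the hierarchy."""
    (f_hi, m_hi), (f_lo, m_lo) = levels[upper], levels[lower]
    return f_hi / f_lo, m_hi / m_lo


nodes_per_midplane = ratio("mid-plane", "node card")
midplanes_per_rack = ratio("rack", "mid-plane")
racks_per_system = ratio("system", "rack")

# Chip level: 80 processors with 2 thread units each (from the slide).
processors_per_chip = 80
thread_units_per_chip = processors_per_chip * 2
flops_per_processor = levels["node card"][0] / processors_per_chip

print("node cards per mid-plane:", nodes_per_midplane)
print("mid-planes per rack:", midplanes_per_rack)
print("racks per system:", racks_per_system)
print("thread units per chip:", thread_units_per_chip)
print("peak flops per processor:", flops_per_processor)
```

The flops and memory ratios agree at the lower levels (about 48 node cards per mid-plane and 3 mid-planes per rack). At the system level the two ratios differ slightly, landing between roughly 90 and 96 racks, which suggests the headline 1.1 Petaflops and 13 TB figures are rounded. The 160 thread units per chip match the "160 cores/chip" quoted on the slide.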
[Figure: Big Data ecosystem and HPC ecosystem compared layer by layer. The figure appears twice; the second version relabels CrAMER/HAMR as HAMR and adds the label “DataFlow Model Inspired Runtime System”.]
Hardware
• Big Data ecosystem: Ethernet switches, local node storage, commodity X86 racks
• HPC ecosystem: Infiniband and Ethernet switches, SAN and local node storage, X86 racks, GPUs or accelerators
System Software
• Big Data ecosystem: Hadoop, Spark, Storm, CrAMER/HAMR
• HPC ecosystem: MPI, OpenMP, SWARM, OpenCL, EARTH, Cilk
Programming Model
• Big Data ecosystem: Map-Reduce based API, Spark RDD API, Flowlet-based API, Storm API
• HPC ecosystem: MPI API, OpenMP API, Codelet API, TiNy-Threads API
Domain-specific Application
• Big Data ecosystem: data analytics, machine learning, real-time processing, financial applications
• HPC ecosystem: computational chemistry, bioinformatics, computational physics
Outline • Introduction • Obama’s Executive Order • The Codelet Model and SWARM • Challenges/Opportunities: HPC + Big Data • Summary Remarks
My Remarks on the “Second Spring”
• Entrepreneurship and Innovation are critical for the success in the “second spring”!
• But, we must never forget the past lessons that we have learned in the 1st Spring!
Wang Xing – Forbes Profile
• Wang Xing on Forbes Lists: #83 China Rich List (2015)
Wang Xing cracks the top 100 richest in China for the first time. Wang created the largest “online-to-offline” commerce company in China. Wang, a student of Prof. Guang R. Gao at the University of Delaware, went back to China in 2005 and started his career in innovation and entrepreneurship.
http://www.forbes.com/profile/wang-xing/
Latest news: Chinese startup raises largest private funding round ever! Published: Jan 19, 2016 9:53 a.m. ET
The Three-Way JV Model
• Three-way investment: public support + private industry giant + small entrepreneur
- My Own Experience
Acknowledgements
• Sponsors: DOE, DOD, NSF, etc.
• Frederica Darema, AFOSR (DDDAS)
• Colleagues and collaborators
• Intel, UIUC, Indiana U/LSU, Rice, and many others in the DOE X-Stack Program
• Waseda University (Prof. Kasahara and others)
• Japan SGU Program and Dean Sugano
• University of Delaware, ETI and CAPSL