
Ganesh Gopalakrishnan, Wei-Fan Chiang, and Alexey Solovyev · Relevant Personal History • PhD...

Date post: 15-Aug-2020
Correctness Checking Concepts and Tools for High Performance Computing

Ganesh Gopalakrishnan, Wei-Fan Chiang, and Alexey Solovyev
School of Computing, University of Utah, Salt Lake City, UT 84112

URL: http://www.cs.utah.edu/fv

Supported by NSF awards SI2 (ACI-1148127), EAGER (CCF-1241849), Failure Resistant Systems (CCF 1255776), SRC Task 2426.001, NSF Medium (CCF 7298529), EAGER (CCF 1346756), and the SUPER Institute (for resilience research), with special thanks to Microsoft for funding (2006-2010) on getting established in this area!
Transcript
Page 2

Correctness Checking Concepts and Tools for High Performance Computing
..or
Bugs: Black Ice on the Road to Exascale

Page 4

Relevant Personal History
• PhD from Stony Brook : 1981 (when Mead/Conway : VLSI, Hennessy : MIPS, Patterson : Sparc)
• Joined Utah 1986
• Taught OS as my second class
• Wrote to Tanenbaum
• Got Minix on 5.25 inch floppy
• Class did kernel hacking on dual 5.25 inch IBM PC
• ……..
• Worked on various aspects of concurrency
• Self-timed Circuit Design
• Pipelined Processor Verification
• Cache Coherence Protocols
• Shared Memory Consistency Models
• Feel privileged to work on Formal Methods for Concurrency in Service of HPC!

Page 5

We have been fortunate to have built some tools in support of HPC FV
• Let us do some demos … so that you have some context for what I’ll be saying later

Page 6

DEMO: Dynamic Execution based Debugging of MPI Programs
Tool Name : ISP

Page 7

DEMO: Symbolic Execution based debugging of Sequential programs and GPU CUDA programs!
Tool name : GKLEE

Page 8

Brief History of Why We are Where We Are
• CISC machines (70s)
• Pipelining -> Clock Frequency growth + Compilers
• Hennessy and Patterson outdid the industry using “Mead and Conway” VLSI design
• Pipelining -> Better ILP use
• Moore’s law : afforded Pipelining tricks
• Dennard’s law : allowed voltage scaling
• POWER DENSITY stayed the same
• Ridiculous Frequencies, Diminishing ILP Returns, Moore Alive, Dennard Dying already…
• Tejas Project Write-off (NY Times)
• Dick Lyon, Charles Leiserson, Guy Blelloch, … were right ALL ALONG!

Page 13

Smart Phones (describe the shape of things to come in HPC)!
(from Adve, http://www.cs.berkeley.edu/~bodik/ASPLOS13/Symposium/sarita-adve-12-asplos-pc-symposium.pdf)

Page 14

Today’s main HPC Mantra
• “Maximize the volume of computational results obtained per Watt”

Page 15

But what about correctness…?

[Image labels and credits: Industrial Flares; Nvidia; NASA; Uintah (SCI Group, Utah); Marsden Lab, UCSD; Wikipedia]

Page 16

Today’s main HPC Mantra
• “Maximize the volume of computational results obtained per Watt”
• Subject to Moore’s and Dennard’s laws
(Courtesy Bob Colwell)

Page 18

So, how prepared are we to debug Heterogeneous Concurrent Systems?

Page 19

What is the young many-core world already facing?
• Multiple heterogeneous cores
• Multiple concurrency models
• Data Races
• Dead Dennard -> Dark Silicon
• Bit Flips
• Floating-Point Uncertainties
• OFTEN clueless (about concurrency) programming community (will provide examples)
• WE JUST DON’T KNOW HOW TO CALIBRATE THE RISKS

Page 28

Power-6 Studies

Page 29

Getting Resilience Ground Truths (Power-6)

Page 30

Power-7 Studies

Page 31

A “feel” of HPC Correctness
• Constant pressure : The “most science per dollar”
• Many dimensions of correctness
• HPC explores unknown aspects of Sciences
• Algorithmic Approximations are often made
• Growing heterogeneity in HPC platforms
• Floating-point representation is inexact
• “Bit flips”
• Correctness training is lacking
• Busy-enough doing Science
• Finding and keeping “Pi men” is difficult
• Always makes sense to switch to latest HW
• Often the most poorly documented

[Figure labels: RIKEN K machine; HPC, Sciences (Lazowka)]

Page 32

A “feel” of HPC Correctness

[Figure labels: FM (our twist), HPC, Sciences; RIKEN K machine (Lazowka)]

Page 33

A Heterogeneity-induced bug (Berzins, Meng, Humphrey, XSEDE’12)

P = 0.421874999999999944488848768742172978818416595458984375
C = 0.0026041666666666665221063770019327421323396265506744384765625

Compute: floor(P / C)

Xeon: P / C = 161.9999…, floor(P / C) = 161
Xeon Phi: P / C = 162, floor(P / C) = 162

Expecting 161 msgs
Sent 162 msgs

Page 35

A Heterogeneity-induced bug (Berzins, Meng, Humphrey, XSEDE’12)

Authors’ fix : used double-precision for P/C
Question: Is there a more deft solution?
More important question : What exactly went wrong? (the XSEDE’12 authors moved along…)
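
One way to see how sensitive floor(P / C) is to rounding is a minimal Python sketch that compares the exact rational quotient of the two stored values with the quotient produced by a single double-precision division. It is an illustration only and does not claim to reproduce what either processor actually did; the names `exact` and `rounded` are ours.

```python
from fractions import Fraction
import math

# The two values from the slide (each is the exact decimal expansion of a double).
P = 0.421874999999999944488848768742172978818416595458984375
C = 0.0026041666666666665221063770019327421323396265506744384765625

# Exact quotient of the stored values, with no rounding at all.
exact = Fraction(P) / Fraction(C)

# Quotient as computed by one correctly rounded double-precision division.
rounded = P / C

print("exact quotient < 162:", exact < 162)
print("floor(exact)   =", math.floor(exact))
print("floor(rounded) =", math.floor(rounded))
```

In this sketch the exact quotient lies just below 162, while the double-precision division rounds up to 162, so the floor depends on the precision at which the quotient is held before it is truncated.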

Page 36

Resilience
• ~7 B transistors per GPU (and many B for CPUs) and a ton of memory
• 10^18 Transistors Throbbing at GHz for Weeks
• Some bit changes MUST be unplanned ones
• In HPC, results combine more (than, say, in “cloud”)
• “Bit flip” is a catch-all term for
• High speed-variability of devices coupled with DVFS jitter
• Local hot spots develop, aging chip electronics
• Particle strikes
• Energy is the main currency
• Some of the energy-saving “games” that must be played (this invites bit-flips)
• Dynamic Slack Detection, followed by lowering voltage + frequency
• One PNNL study (Kevin Baker) : 36 kW -> 18 kW

Page 37

Our Position (1)
• Despite “bit flips” and such, it is amply clear that sequential and concurrency bugs still ought to be our principal focus
• They occur quite predictably (unlike bit flips)
• They are something we can control (and eliminate in many cases)

Page 38

Our Position (2)
• Unless we can debug in the small, there is NO WAY we can debug in the large

Page 39

Our Observations (3)
• There are SO MANY instances where experts are getting it wrong, and spreading the wrong

Page 40

Example-1
• IBM Documentation: “If you debug your MPI program under zero Eager Limit (buffering for MPI sends), then adding additional buffering does not cause new deadlocks”
• It can
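
To make the "It can" concrete, here is a toy Python model (not MPI code) that exhaustively explores message matchings for a small three-process program, once with rendezvous sends (zero buffering) and once with buffered sends. The program, the state encoding, and every name below are our own assumptions for illustration; they are not taken from the IBM documentation or from the slide.

```python
ANY = "*"  # wildcard source, standing in for MPI_ANY_SOURCE

# Hypothetical three-process program (our construction, not from the slide).
PROGRAM = [
    [("send", 1), ("send", 2)],      # P0
    [("recv", ANY), ("recv", 2)],    # P1
    [("recv", 0), ("send", 1)],      # P2
]

def bump(pcs, p):
    """Advance process p by one operation."""
    return pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]

def add(chans, key, delta):
    """Return the in-flight message counts with chans[key] changed by delta."""
    counts = dict(chans)
    counts[key] = counts.get(key, 0) + delta
    return frozenset((k, v) for k, v in counts.items() if v > 0)

def successors(prog, pcs, chans, buffered):
    """Yield every state reachable in one step."""
    for p, ops in enumerate(prog):
        if pcs[p] == len(ops):
            continue  # p has finished
        kind, peer = ops[pcs[p]]
        if kind == "send" and buffered:
            # Buffered send: deposit the message and return at once.
            yield bump(pcs, p), add(chans, (p, peer), +1)
        elif kind == "send":
            # Zero buffering: complete only together with a matching receive.
            if pcs[peer] < len(prog[peer]):
                kind2, src = prog[peer][pcs[peer]]
                if kind2 == "recv" and src in (ANY, p):
                    yield bump(bump(pcs, p), peer), chans
        elif buffered:
            # Buffered receive: take any deposited message that matches.
            for (src, dst), _count in chans:
                if dst == p and peer in (ANY, src):
                    yield bump(pcs, p), add(chans, (src, dst), -1)

def can_deadlock(prog, buffered):
    """Explore all matchings; report whether some run gets stuck."""
    start = (tuple(0 for _ in prog), frozenset())
    seen, stack = {start}, [start]
    while stack:
        pcs, chans = stack.pop()
        nxt = list(successors(prog, pcs, chans, buffered))
        finished = all(pc == len(ops) for pc, ops in zip(pcs, prog))
        if not nxt and not finished:
            return True
        for state in nxt:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return False

print("zero buffering deadlocks:", can_deadlock(PROGRAM, buffered=False))
print("with buffering deadlocks:", can_deadlock(PROGRAM, buffered=True))
```

Under zero buffering, P1's wildcard receive can only match P0's first send, so the run completes. With buffering, P2's send can reach P1 first, and P1's second receive (from P2) then waits forever.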

Page 42

Example-2
• A reduction kernel given as an early-chapter example in a recent CUDA book is broken
• Reason: it assumes that CUDA atomic-add has “fence” semantics
• An erratum has been issued on the book's website

Page 43

Example-3
• A work-stealing queue in “GPU gems” is incorrect
• Reason: it assumes “store-store” ordering between two sequentially issued stores (a fence should have been used in between)
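
The store-store issue can be illustrated with a small Python model. This is a sketch under our own assumptions, not the queue from the book: a producer writes a payload (`data`) and then publishes it by updating an index (`tail`); a consumer loads `tail` first and `data` second, and its loads are assumed to stay in order. Without a fence, the model lets the two stores become visible in either order.

```python
from itertools import permutations

# Producer, in program order: write the payload, then publish the index.
STORES = (("data", 42), ("tail", 1))

def consumer_views(fence):
    """All (tail, data) pairs a consumer could load, reading tail first."""
    # With a fence the stores become visible in program order; without one,
    # this model lets them become visible in either order.
    orders = [STORES] if fence else list(permutations(STORES))
    views = set()
    for order in orders:
        # k stores are visible when tail is loaded, m >= k when data is loaded.
        for k in range(len(order) + 1):
            for m in range(k, len(order) + 1):
                at_tail = {"data": 0, "tail": 0, **dict(order[:k])}
                at_data = {"data": 0, "tail": 0, **dict(order[:m])}
                views.add((at_tail["tail"], at_data["data"]))
    return views

STALE = (1, 0)  # tail says "item published" but the payload is still old
print("stale read possible with fence   :", STALE in consumer_views(fence=True))
print("stale read possible without fence:", STALE in consumer_views(fence=False))
```

The stale pair (tail updated, payload old) appears only when the stores may be reordered, which is the situation a fence between them rules out in this model.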

Page 44

Feature of GPU programming
• Programmers face concurrency corner-cases quite frequently
• As opposed to (e.g.) OS where low-level concurrency is usually hidden within the kernel

Page 45

Example-4
• If your code ran correctly in FORTRAN, it will also run correctly in C

Page 46

Example-4 invalidated

Page 47

Example-5: Simple questions can’t be answered by today’s tools
Does this program deadlock? (Yes.)

Page 48

[Figure label: Match]

Page 49

Example-6 : Does Warp-Synchronous Programming Help Avoid a __syncthreads?

__global__ void kernel(int* x, int* y) {
  int index = threadIdx.x;
  y[index] = x[index] + y[index];
  if (index != 63 && index != 31)
    y[index+1] = 1111;
}

Initially : x[i] == y[i] == i
Warp-size = 32
Expected Answer: 0, 1111, 1111, …, 1111, 64, 1111, …

Page 50

Example-6 : Does Warp-Synchronous Programming Help Avoid a __syncthreads?

The hardware schedules these instructions in “warps” (SIMD groups).

Page 51

However, this “warp view” often appears to be lost

Page 52

E.g. when compiling with optimizations
Expected Answer: 0, 1111, 1111, …, 1111, 64, 1111, …
New Answer: 0, 2, 4, 6, 8, …

Page 53

But if you read the CUDA documentation carefully, you notice you had to use a C volatile (volatile x[], y[] ..) that restored “correct” answers!

Page 54

But the ability to “rescue the correct answer” is no longer a guarantee (since CUDA 5.0)
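
The two answers can be reproduced with a small lockstep model in Python. This is a sketch, not CUDA: it assumes one block of 64 threads, that each warp executes one step at a time across all of its lanes, and that the optimizer moved the store to y[index+1] ahead of the store to y[index] (legal for a single thread, since the two addresses differ). That reordering is our assumption, chosen because it yields the "new answer" on the slide; the function names are ours.

```python
N = 64        # threads in the block
WARP = 32     # warp size, as on the slide

def load_sum(i, x, y, tmp):
    tmp[i] = x[i] + y[i]          # read both operands of the first statement

def store_own(i, x, y, tmp):
    y[i] = tmp[i]                 # y[index] = x[index] + y[index]

def store_next(i, x, y, tmp):
    if i != 63 and i != 31:
        y[i + 1] = 1111           # y[index+1] = 1111

def run(steps):
    """Execute `steps` in lockstep across each warp and return y."""
    x, y, tmp = list(range(N)), list(range(N)), [0] * N
    for first in range(0, N, WARP):
        for step in steps:
            for i in range(first, first + WARP):
                step(i, x, y, tmp)
    return y

source_order = run([load_sum, store_own, store_next])
reordered = run([load_sum, store_next, store_own])   # assumed compiler reordering

print(source_order[:3], source_order[31:34])
print(reordered[:5])
```

In source order every lane's 1111 lands after its neighbour's sum, giving 0, 1111, …, 64, 1111, …; with the stores swapped, each lane's own sum overwrites the 1111 written by its neighbour, giving 0, 2, 4, 6, 8, ….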

Page 55

So you really trust your compilers?
• Talk to Prof. John Regehr of Utah
• Csmith : Differential testing of compilers
• The single most impressive compiler testing work (IMHO) in recent times
• Has found goof-ups in -O0 for short programs
• Many bugs around C volatiles
• Learned that NOTHING is known about how compilers (ought to) treat floating-point

Page 56

Without swift action, the “din” of the blind leading the blind will sow more confusion

Some threads offer advice ranging from “use volatiles” (was in early CUDA documentation; gone since 5.0) to using __syncthreads (barriers) or querying device registers to learn the warp size:
https://devtalk.nvidia.com/default/topic/512376/
https://devtalk.nvidia.com/default/topic/499715/
https://devtalk.nvidia.com/default/topic/382928/

And several threads simply discuss this issue:
https://devtalk.nvidia.com/default/topic/632471
https://devtalk.nvidia.com/default/topic/377816/

There isn’t a comprehensive picture of the dos and don’ts, and WHY!

Page 57

Discussions on “warp-synchronous” code
https://devtalk.nvidia.com/default/topic/499715/are-threads-of-a-warp-really-sync-/?offset=2

Page 58

Example-8: Do GPUs obey coherence?
(Coherence = per-location Seq Consistency)
• Ask me after the talk……. :)
• We are stress testing real GPUs
• and finding things out!
• (work is inspired by Bill Collier who called it “X-raying real machines” in his famous RAPA book)

Page 59

Our (humble) suggestion
• There is NO WAY the complexity of anything can be conquered without mathematics
• The complexity of debugging needs the “mathematics of debugging”, the true mathematics of Software Engineering
• i.e. formal methods
• Must develop the “right kind” of formal methods
• Coexist with the grubby
• Take on problems in context
• Win practitioner friends early, and KEEP THEM

Page 60

What is hard about HPC Concurrency?
• The Scale of Concurrency and the Number of interacting APIs
• MPI-2, MPI-3, OpenMP, CUDA, OpenCL, OpenACC, PThreads, use of NonBlocking Data Structures, dynamic Scheduling
• Each API thinks it “owns” the machine
• Exposure of Everyday Programmer to Low Level Concurrency is a worrisome reality!
• Memory Consistency Models Matter
• Governs visibility across threads / fences
• Yet, very poorly specified / understood
• Compiler Optimizations: not even basic studies exist

Page 61: Ganesh Gopalakrishnan, Wei-Fan Chiang, and Alexey Solovyev · Relevant Personal History • PhD from Stony Brook : 1981 (when Mead/Conway : VLSI, Hennessy : MIPS, Patterson : Sparc)!

Is there a role for Formal Methods?

• Yes indeed!

• For instance, why is it that microprocessors don’t do “Pentium FDIV” any more?

• Processor ALUs have only become even more complex

• Answer : Formal gets serious use in the industry

• Intel : Symbolic Trajectory Evaluation

• Others : similar methods

• Processors get FV to varying degrees for other subsystems

• E.g. Cache coherence (at a protocol level)


Is there a role for Formal Methods?

• Yes indeed!

• There is a fascinating array of correctness challenges

• Very little involvement from the mainstream CS side

• Lack of exposure, limited interactions across departments

• Need “cool show-pieces” to draw students to HPC research…


An example “cool project”

Utah Pi “cluster” built by PhD students at Utah

“Mo” Mohammed Saeed Al Mahfoudh and Simone Atzeni

(Under $500 ; Runs MPI, Habanero Java, …)


Anyone wanting to do software testing for concurrency must slay two exponentials


An FM Grab-bag for anyone wanting to debug concurrent programs

• Slay the input-space exponential using

• Symbolic Execution

• Slay the schedule-space exponential by

• Not jiggling schedules that are Happens-Before equivalent
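The schedule-space saving can be made concrete with a minimal sketch. It assumes a hypothetical two-thread program (the operations in `T0` and `T1` are invented for illustration) and treats two schedules as Happens-Before equivalent when they order every pair of conflicting operations the same way. It is a toy enumeration, not the method of any tool named in this talk.

```python
from itertools import combinations

# Each operation is (thread, index_in_thread, kind, variable).
# Hypothetical program: only the two writes to "x" conflict.
T0 = [(0, 0, "w", "x"), (0, 1, "r", "a"), (0, 2, "r", "b")]
T1 = [(1, 0, "r", "c"), (1, 1, "r", "d"), (1, 2, "w", "x")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def conflicts(p, q):
    """Different threads, same variable, at least one write."""
    return p[0] != q[0] and p[3] == q[3] and "w" in (p[2], q[2])

def hb_signature(schedule):
    """Ordered conflicting pairs; equal signatures mean HB-equivalent schedules."""
    return frozenset(
        (p, q) for p, q in combinations(schedule, 2) if conflicts(p, q)
    )

schedules = list(interleavings(T0, T1))
classes = {hb_signature(s) for s in schedules}
print(len(schedules), "schedules,", len(classes), "HB-equivalence classes")
```

Under these assumptions only one representative per class needs to be explored, so the test effort follows the number of classes rather than the number of raw interleavings.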


Not Exploring HB-Equivalent Schedules


An FM Grab-bag for anyone wanting to debug concurrent programs

• Concepts in the fuel-tank must include

• Lamport’s “happens before”

• Define concurrency coverage using it

• Design active-testing methods that systematically explore schedule-space

• Memory consistency models

• Data races and how to detect them

• Symbolic execution

• Helps achieve input-space Coverage


Overview of our (active) projects

• HPC Concurrency

• Dynamic Verification Methods for MPI : CACM, Dec 2011

• GPU data-race checking : PPoPP’12, SC’12, SC’14

• Floating-point

• Finding inputs that cause highest relative error (“sour spot search”) : PPoPP’14

• Detecting and Root-Causing Non-determinism

• Pruner project at LLNL - combined static / dynamic analysis for OpenMP race checking

• System Resilience

• We have developed an LLVM-level Fault Injector called KULFI

• Using Coalesced Stack Trace Graphs to Highlight Behavioral Differences

• Our main focus continues to be correctness tools for HPC Concurrency


Biggest Gain due to Formal Methods: Conceptual Cohesion

• Example : Helps understand that Concurrency and Sequential Abstractions Tessellate

• Helps Understand that Sequential == Deterministic

• Helps Understand Data Races as Breaking the Sequential Contract


Concurrency and Sequential Abstractions Tessellate

Fine-grained concurrency of transistor-level circuits

Sequential view of Boolean Functions (gates)

Concurrent State Machines Using Gates and Flops

Sequential Program Abstractions (e.g. ISA)

Shared memory or Msg Passing based Parallelism

Solving A x = B


Why Fixate on Data Races?

• Key assumption that enables sequential thinking

• Sequential almost always means Deterministic

• In an Out of Order CPU, nothing is sequential

• Yet we think of assembly programs as “sequential”

• Only because they yield deterministic results

• Create Hazards (say in a time-sensitive way)

• Then we lose this sequential / deterministic abstraction

• Parallel Programming Almost Always Strives to produce Sequential i.e. Deterministic Outcomes


Races and Race-Free Generalized

• Critical races give gates that spike (broken Boolean abstraction)

• Races between clocks and data break the sequential abstraction

• Data races break sequential consistency (unsynchronized interleavings matter)

Results on UT Lonestar Benchmarks


Results on UIUC Parboil Benchmarks


Uintah: A Scalable Computational Framework for Multi-physics problems

• Under continuous development over the past decade

• Scalability to 700K CPU cores possible now

• ~1M LOC or more

• Modular extensibility to accommodate GPUs and Xeon Phis

• Partitions concerns

• App developer writes sequential apps

• Infrastructure developer tunes / improves perf


Uintah Organization

[Figure: Uintah organization. Labels: Application Packages (ICE, MPM, ARCHES); Abstract Directed Acyclic Task Graph (tasks t1 through t13); Runtime System (SimulationController, Load Balancer, Scheduler).]


Case Study: Data Warehouse Error

Collect Coalesced call-paths leading to DW::put().

Diffed across two scheduler versions to isolate bug
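A minimal sketch of that collect-coalesce-diff idea follows. It assumes call paths are already available as lists of function names; the traces in `good_run` and `bad_run` are invented for illustration, and coalescing by caller/callee name is one simplified reading of a Coalesced Stack Trace Graph rather than the tool's actual representation.

```python
from collections import Counter

def build_cstg(traces):
    """Coalesce call paths into a graph: (caller, callee) -> number of traversals."""
    edges = Counter()
    for path in traces:
        for caller, callee in zip(path, path[1:]):
            edges[(caller, callee)] += 1
    return edges

def diff_cstg(good, bad):
    """Edge-count deltas (bad minus good); edges with no change are dropped."""
    delta = {}
    for edge in set(good) | set(bad):
        d = bad[edge] - good[edge]  # a Counter returns 0 for a missing edge
        if d != 0:
            delta[edge] = d
    return delta

# Hypothetical traces: each list is one call path ending in DW::put.
good_run = [
    ["main", "MPIScheduler::execute", "MPIScheduler::runTask", "DW::put"],
    ["main", "MPIScheduler::execute", "MPIScheduler::runTask", "DW::put"],
    ["main", "MPIScheduler::execute", "MPIScheduler::runReductionTask", "DW::put"],
]
bad_run = [
    ["main", "UnifiedScheduler::execute", "UnifiedScheduler::runTask", "DW::put"],
    ["main", "MPIScheduler::execute", "MPIScheduler::runReductionTask", "DW::put"],
]

delta = diff_cstg(build_cstg(good_run), build_cstg(bad_run))
for (caller, callee), d in sorted(delta.items()):
    print(f"{caller} -> {callee}: {d:+d}")
```

Edges with a nonzero delta point at the call paths whose frequency changed between the two runs, which is where a developer would look first.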


Conceptual view of Uintah equipped with a monitoring network (future work)

[Figure: Uintah scheduler pipeline (Task Graph; Internal, External, and GPU Ready Queues; Post MPI Receive, Check MPI Receive, Post MPI Sends; Host to Device Copy, Device to Host Copy; Completed Task) shown next to an example CSTG with numeric edge labels. CSTG nodes include ./sus, AMRSim::run, AMRSim::executeTimestep, AMRSim::doInitialTimestep, MPIScheduler::execute, MPIScheduler::initiateReduction, MPIScheduler::runTask, MPIScheduler::runReductionTask, UnifiedScheduler::execute, UnifiedScheduler::runTask, DetailedTask::doit, Task::doit, DW::reduceMPI, and DW::override.]

• Static Analysis of DWH and Scheduler

• Automaton Learning from Traces

• Tailor Learning for Hybrid Concurrency Events

• Build Cross-Layer Monitoring Hierarchies

• Derive System Control Invariants to Document + Debug via CSTG

• Hierarchical Active Testing and Monitoring using Standardized Interfaces

• Automata Trigger CSTG Collection

• Static Analysis Helps Refine CSTGs

• Task Graph Compilation to Generate Salient High-Level Events to Cross-Check

Concluding Remarks

• Slaying bugs in HPC is essential for Exascale

• Need a mix of methods, from empirical to formal

• Formal helps with concurrency coverage

• Formal helps write clear, unambiguous, and validated specs

• and educate sure-footedly


thanks!

• www.cs.utah.edu/fv

• Thanks to my former students who have taught me everything I know about FV and its relevance in the industry


The rest of the talk

• Some results in GPU Data Race Checking

• Demo of Symbolic Execution and GKLEE

• Data Race Detection in GPU Programs

• Computational Frameworks

• Uintah

• How Coalesced Stack Trace Graphs help debug

• Other projects : Floating-Point Correctness and System Resilience

• Concluding Remarks


The key to data race checking

• For the most part, CUDA code is synchronized via barriers (__syncthreads())

• Thus, explore a “canonical” interleaving, hoping to detect the “first race” if there is any race


Interleaving exploration

For example: if the green dots are local thread actions, then all schedules that arrive at the “cut line” are equivalent!

Finding Representative Interleavings

GKLEE Examines Canonical Schedule

Instead of considering all Schedules and All Potential Races…

Consider JUST THIS SINGLE CANONICAL SCHEDULE!

Folk Theorem (proved in our paper): “We will find A RACE if there is ANY race”!
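The canonical-schedule idea can be sketched with concrete addresses. The sketch assumes one barrier interval, runs each thread to the barrier in thread-id order, logs accesses, and then looks for conflicting pairs. The kernel body `shift_left` is hypothetical, and the real tool reasons symbolically rather than by enumerating threads as done here.

```python
def run_interval(kernel, num_threads):
    """Canonical schedule for one barrier interval: thread 0 runs to the barrier,
    then thread 1, and so on. Every shared-memory access is logged."""
    log = []  # entries are (thread, kind, address)
    for tid in range(num_threads):
        def read(addr, t=tid):
            log.append((t, "r", addr))

        def write(addr, t=tid):
            log.append((t, "w", addr))

        kernel(tid, read, write)
    return log

def find_races(log):
    """Pairs of accesses by different threads to one address, at least one a write."""
    races = []
    for i, (t1, k1, a1) in enumerate(log):
        for t2, k2, a2 in log[i + 1:]:
            if t1 != t2 and a1 == a2 and "w" in (k1, k2):
                races.append(((t1, k1), (t2, k2), a1))
    return races

# Hypothetical kernel body: a[tid] = a[tid + 1] with no barrier in between.
def shift_left(tid, read, write):
    read(tid + 1)
    write(tid)

races = find_races(run_interval(shift_left, 4))
for r in races:
    print(r)
```

Because no synchronization occurs inside a barrier interval, the access sets logged under this single schedule are enough to expose a conflict if one exists, which is the intuition behind the folk theorem quoted above.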


An Example with Two Data Races

• The “classic race”: threads i and i+1 race

• The “porting race”: not explained in any CUDA book as a race (evaluation order between divergent warps is unspecified)

GKLEE’s steps

• Symbolic Execution

• Compute Conflicts and solve for races

GKLEE of PPoPP 2012

[Figure: GKLEE tool flow. C++ CUDA Programs with Symbolic Variable Declarations pass through LLVM-GCC to LLVM byte-code instructions, then to the Symbolic Analyzer and Scheduler with its Error Monitors.]

• Deadlocks

• Data races

• Concrete test inputs

• Bank conflicts

• Warp divergences

• Non-coalesced

• Test Cases

• Provide high coverage

• Can be run on HW

The advantages of a symbolic-execution based GPU Race Checker: Produces concrete witnesses

    __global__ void histogram64Kernel(unsigned *d_Result, unsigned *d_Data, int dataN) {
      const int threadPos = ((threadIdx.x & (~63)) >> 0)
                          | ((threadIdx.x & 15) << 2)
                          | ((threadIdx.x & 48) >> 4);
      ...
      __syncthreads();
      for (int pos = IMUL(blockIdx.x, blockDim.x) + threadIdx.x; pos < dataN;
           pos += IMUL(blockDim.x, gridDim.x)) {
        unsigned data4 = d_Data[pos];
        ...
        addData64(s_Hist, threadPos, (data4 >> 26) & 0x3FU);
      }
      __syncthreads();
      ...
    }

    inline void addData64(unsigned char *s_Hist, int threadPos, unsigned int data) {
      s_Hist[threadPos + IMUL(data, THREAD_N)]++;
    }

“GKLEE: Is there a Race?”

GKLEE: Threads 5 and 13 have a WW race when d_Data[5] = 0x04040404 and d_Data[13] = 0.
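The reported witness can be checked by hand with a small host-side sketch of the index arithmetic. One value is an assumption: the slide does not state THREAD_N, and 32 is used here because it makes the two indices coincide. The sketch only recomputes the `s_Hist` index for the one `addData64` call shown; it does not model the elided parts of the kernel.

```python
THREAD_N = 32  # ASSUMPTION: not given on the slide

def thread_pos(tid):
    """threadPos as computed at the top of histogram64Kernel."""
    return ((tid & ~63) >> 0) | ((tid & 15) << 2) | ((tid & 48) >> 4)

def s_hist_index(tid, data4):
    """Index touched by addData64(s_Hist, threadPos, (data4 >> 26) & 0x3F)."""
    data = (data4 >> 26) & 0x3F
    return thread_pos(tid) + data * THREAD_N

idx5 = s_hist_index(5, 0x04040404)
idx13 = s_hist_index(13, 0)
print(idx5, idx13)

# Within one block of 64 threads, threadPos alone never collides,
# so any collision must come from the data-dependent offset.
distinct_positions = len({thread_pos(t) for t in range(64)})
print(distinct_positions)
```

Under that assumption both threads increment the same byte of `s_Hist`, which matches the write-write race in the witness.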


GKLEE of SC’12 introduced the idea of Parametric Flows

GKLEEp tool introduced (for race-checking mostly)


Idea behind Parametric flows: Capitalize on Thread Symmetry

Divide behavior into Flow Equivalence Classes!

Keep two symbolic threads per flow-group and race-check per flow.
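A rough sketch of grouping threads into flow equivalence classes is shown here. It assumes the kernel's control flow depends only on the two hypothetical conditions in `PREDICATES`, and it enumerates concrete thread ids, whereas the tool described in this talk keeps symbolic threads constrained by the flow's path condition.

```python
from collections import defaultdict

# Hypothetical branch conditions on the thread id, standing in for a kernel's control flow.
PREDICATES = [lambda tid: tid % 2 == 0, lambda tid: tid < 8]

def flows(num_threads):
    """Group thread ids by the outcome of every branch condition."""
    groups = defaultdict(list)
    for tid in range(num_threads):
        groups[tuple(p(tid) for p in PREDICATES)].append(tid)
    return dict(groups)

groups = flows(1024)

# Two representatives per flow are kept for pairwise race checking.
representatives = {path: tids[:2] for path, tids in groups.items()}
for path, reps in sorted(representatives.items()):
    print(path, reps)
```

In this toy setting 1024 threads collapse into four flows, so race checks run over a handful of representative pairs instead of every pair of threads.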


Where Race-Checking Happens under Parameterized Flows

• Intra-Flow

• Inter-Flow

Favorable Results


Yet, Unfavorable Results often…

When parametric flow division happens inside a loop, we can get an exponential number of flows.
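The blow-up is easy to reproduce in a toy setting. The sketch assumes a hypothetical loop whose iteration i branches on bit i of the thread id, so each iteration can split every existing flow in two; it counts distinct branch histories and is not a model of any particular kernel.

```python
def count_flows(num_threads, iterations):
    """Count distinct branch-outcome histories when iteration i branches on bit i of tid."""
    histories = {
        tuple((tid >> i) & 1 for i in range(iterations))
        for tid in range(num_threads)
    }
    return len(histories)

counts = [count_flows(1024, n) for n in range(6)]
print(counts)
```

The count doubles with every iteration until it is capped by the number of threads.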


Symbolic Execution with Static Analysis (SC’14 accepted paper)


How SESA works

• Static Analysis Pass marks how vars affected by each flow in a barrier interval may affect the generation of addresses in the next barrier interval

[Figure: flows crossing three barriers.]

There are two classes of flows:

(1) Flows that modify Global or Shared Vars that flow into Control Predicates or Array Indexing Positions

(2) Flows that don’t do so

Within the next Barrier Interval, “OR” the Green Flows into one flow
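One simplified reading of this merging step is sketched here. It assumes each flow is a path condition (kept as a string) plus the set of variables it modifies, and that the static analysis has already produced the set of variables that reach control predicates or array indices in the next barrier interval. The names `merge_flows` and `sensitive_vars` and the example flows are hypothetical, not SESA's API.

```python
def merge_flows(flows, sensitive_vars):
    """flows: list of (path_condition, modified_vars).
    Flows that touch a sensitive variable stay separate; the rest are OR-ed into one."""
    kept = []
    merged_conditions = []
    merged_vars = set()
    for cond, modified in flows:
        if modified & sensitive_vars:
            kept.append((cond, modified))
        else:
            merged_conditions.append(cond)
            merged_vars |= modified
    if merged_conditions:
        disjunction = " || ".join(f"({c})" for c in merged_conditions)
        kept.append((disjunction, merged_vars))
    return kept

# Hypothetical flows at the end of one barrier interval.
flows_in = [
    ("tid < 8", {"idx"}),
    ("tid >= 8 && tid % 2 == 0", {"tmp"}),
    ("tid >= 8 && tid % 2 == 1", {"acc"}),
]
flows_out = merge_flows(flows_in, sensitive_vars={"idx"})
for cond, modified in flows_out:
    print(cond, sorted(modified))
```

Merging keeps the number of flows from growing across barrier intervals when the extra distinctions cannot change which addresses are accessed.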


SESA Results

• We have been able to run SESA on

• Lonestar Benchmarks (UT)

• Parboil Benchmarks (UIUC)

• It scales well and finds issues

• Races

• Out of bounds accesses

• Tool being integrated into Eclipse
