Post on 11-Feb-2020
Preparing for a Post Moore’s Law World
Todd Austin
University of Michigan
Introductions…
• Background
• Computer Architect == H/W + S/W
• PhD in CS from UW-Madison in ’96
• An architect at Intel until ’99, then Michigan
• Teaching
• Architecture, compilers, programming, security
• Special focus on CSE development in Ethiopia
• Co-author of an undergraduate computer architecture textbook with Andy Tanenbaum
• Research
• Computer Architecture: EVA, SimpleScalar, Cyclone
• Computer Security: SafeC, Testudo, Schnauzer, CDI, A2 attack
• Reliable Systems: DIVA, Razor, BulletProof
• Director of C-FAR: Center for Future Architectures Research
Perspectives on Scaling
• C-FAR: Center for Future Architectures Research
• Focused on scaling in 2020-2030 silicon
• Performance, power and cost
• 27 faculty at 14 universities, 92 students
• Why is C-FAR’s mission important?
• The promise… tomorrow’s applications need powerful systems
• Why is C-FAR’s mission challenging?
• The threats… slowing innovation and degrading silicon
Computer Vision, Machine Learning, Big Data Analytics
End of Dennard Scaling, Many Idle Cores, Silicon Defects
All of the work presented in this talk is that of C-FAR faculty.
Moore’s Law Performance Gap
Today, gap is cresting 10x
Lack of perceived value
Dark silicon
Diminished ILP
Figure: Technology Node (nm) on a log axis (1 to 1000) for the 180, 130, 90, 65, 45, 32, 22, 14, 10 and 7 nm nodes
10nm slips by 5-6 quarters
14nm slips by 2 quarters
7nm by end 2020?
Is Density Still Scaling?
Street Dates for Intel’s Lead Generation Products
Compiled with David Brooks @ Harvard
But, the technology scaling component has left us.
What Does This All Mean to Architects?
Today, value = scalability (performance, power, cost).
Remedy #1: Chip Multiprocessors
CMP Performance Scaling for the Highly Parallel PARSEC Benchmarks
From “Dark Silicon and the End of Multicore Scaling,” by Esmaeilzadeh et al.
What Does the Press Think?
We Investigate: Who’s to Blame?
? Programmers
Largest NA Bitcoin Miner
• GPGPU-based system
• Fills 2000 sq.ft. warehouse
• Computes 1 petahash/s
• Reportedly generates $8M in Bitcoins per month
• Unfortunately soon to be obsolete as Bitcoin difficulty continues to scale
We Investigate: Who’s to Blame?
? Programmers
Educators
CS Education is Booming
• CS enrollment on a fast-rising trajectory for a decade
• Parallel programming at UM
• EECS 381, Object-Oriented and Advanced Programming
• EECS 482, Operating Systems
• EECS 570, Parallel Computer Architecture
• EECS 587, Parallel Computing
• EECS 591, Distributed Systems
• EECS 598, Ubiquitous Parallelism
• I have been teaching and developing CS in Ethiopia
• Nearly 600 students in the CS program
• 2nd most popular major in the university
Figure: UM EECS Enrollment (CS, EE, CE)
We Investigate: Who’s to Blame?
? Programmers
Educators
The Transistor
The Dark Silicon Dilemma
Courtesy Michael Taylor @ UCSD
We Investigate: Who’s to Blame?
? Programmers
Educators
Architects
The Transistor
The Tyranny of Amdahl’s Law
Figure labels: parallel fraction (P), number of processors (N), speedup (S)
Where we need to be today! (10x)
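As a rough illustration of this slide's point, Amdahl's Law gives the speedup S = 1 / ((1 - P) + P / N) for a program whose parallel fraction is P when run on N processors. The sketch below assumes the slide's (P), (N) and (S) labels carry those standard meanings; the function names are illustrative and not from the talk. It computes S and the parallel fraction needed to reach a target such as the 10x gap.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup for parallel fraction p (0..1) on n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction must be in [0, 1]")
    if n < 1:
        raise ValueError("need at least one processor")
    return 1.0 / ((1.0 - p) + p / n)


def min_parallel_fraction(target: float, n: int) -> float:
    """Smallest parallel fraction that reaches `target` speedup on n processors.

    Derived by solving target = 1 / ((1 - p) + p / n) for p.
    """
    if n < 2 or not 1.0 <= target <= n:
        raise ValueError("target must satisfy 1 <= target <= n, with n >= 2")
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    for cores in (16, 64, 256):
        p = min_parallel_fraction(10.0, cores)
        print(f"{cores:4d} cores need P >= {p:.3f} for a 10x speedup")
```

Even with a very large core count, a program that is 90% parallel stays below 10x, which is why the talk pairs parallelism with customization rather than relying on more cores alone.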
We Investigate: Who’s to Blame?
? Programmers
Educators
Architects
The Transistor
What is the solution?
A Story about Jason and His Two Advisors
EVA: Embedded Vision Architecture
Application-specific Functional Units
Heterogeneous Multicore
EVA Functional Units: Monopoly Compare, Dot Product Unit, Vector Max, Decision Tree Compare
Initial EVA design: 90x greater efficiency for computer vision algorithms
Customized Memory System
Where We Need to Focus
Parallelism Customization
Heterogeneous parallel systems overcome dark silicon and the tyranny of Amdahl’s Law.
Why These Ideas Will Likely Fail, Unless We Make a Change…
• The Good: Hetero-parallel systems can close the Moore’s Law gap
• The Bad: Dennard scaling has stopped, Moore’s Law is slowing, leaving a growing gap
• The Ugly: Hetero-parallel designs needed to close the gap will be too expensive to afford
• We must make design much cheaper!
What I Want You to Remember
• Successfully bridging the Moore’s Law performance gap is less about “How” to do it and more about “How Much” does it cost!
• My claim: if we can effect a 100x reduction in the cost to bring a design to market, innovation will flourish and scaling challenges will be overcome.
Design Costs Are Skyrocketing
Figure: Cost to Market ($ million, axis 0 to 140) versus Silicon Technology Node (0.5u, 0.35u, 0.25u, 0.18u, 0.13u, 90nm, 65nm, 45nm, 28nm, 20nm)
Legend: Mask Costs; S/W Development and Testing; H/W Design and Verification
Callouts: $500K, $88M, $120M
Source: International Business Strategies
Outcome: “Nanodiversity” is Dwindling
Source: Gartner Group
Figure: Total ASIC Starts per Year, 1995 to 2014 (axis 0 to 12,000)
Inexpensive “Design” Promotes Innovation and Adaptation
• Don’t Believe Me? Ask Mother Nature!
• r/K selection theory is a biological mechanism that organisms use to better adapt to their environment
• In unstable environments, r-selection predominates as the ability to reproduce quickly is crucial
• In stable environments, K-selection predominates as the ability to compete successfully for limited resources is crucial
The Remedy: Scale Innovation
• Ultimate goal: accelerate system architecture innovation and make it sufficiently inexpensive that anyone can do it anywhere
• Approach #1: Expect more from architectural innovation
• Approach #2: Reduce the cost to design custom hardware
• Approach #3: Embrace open-source concepts to reduce costs
• Approach #4: Widen the applicability of custom hardware
• Approach #5: Reduce the cost of manufacturing custom H/W
1) Expect more from architectural innovation
“Give me 15% speedup and I’ll accept your paper”
“I need 1% speedup for 1% area”
“Your idea needs to deliver 2x or more, or someone else should fund it”
HELIX-UP Unleashed Parallelization
• Traditional parallelizing compilers must honor possible dependencies
• HELIX-UP manufactures parallelism by profiling which dependencies do not exist and which are not needed
• Based on user-supplied output distortion function
• Big step for parallelization
• 2x speedup over parallelizing compilers, 6x over serial, < 7% distortion
Figure labels: Thread 0, Thread 1, Thread 2, Thread 3; Data; Iteration 0, Iteration 1
David Brooks @ Harvard
Nehalem 6 cores, 2 threads per core
Association Rule Mining with the Automata Processor
• Micron’s Automata processor
• Implements FSMs at memory
• Massively parallel with accelerators
• Mapped data-mining ARM rules to memory-based FSMs
• ARM algorithms identify relationships between data elements
• Implementations are often memory bottlenecked
• Big-data sets had big speedups
• 90x+ over single CPU performance
• 2-9x+ speedups over CMPs and GPUs
• Joint effort with UVA and Micron
Kevin Skadron @ UVA
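To make "ARM algorithms identify relationships between data elements" concrete, here is a minimal software sketch of association rule mining over item pairs, using the standard support and confidence measures. It is a plain CPU illustration and does not model the Automata Processor's FSM mapping; the function names and the toy shopping-basket data are assumptions for illustration only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) over single-item pairs."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pairs cannot yield a qualifying rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for lhs, rhs, sup, conf in mine_pair_rules(baskets, 0.5, 0.9):
        print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

Every candidate itemset triggers a scan over all transactions, which gives a feel for why such implementations tend to be memory bottlenecked on large data sets.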
2) Reduce the cost to design custom hardware
• Better tools and infrastructure
• Scalable accelerator synthesis and compilation, generating code and H/W for highly reusable accelerators
• Composable design space exploration, enables efficient exploration of highly complex design spaces
• Well put-together benchmark suites to drive development efforts
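The design space exploration bullet can be pictured with a small sketch: sweep accelerator design parameters such as the number of functional units and memory bandwidth, evaluate each point, and keep the Pareto-optimal trade-offs between runtime and area. The cost model below is entirely hypothetical (its constants are invented for illustration and do not come from the talk or any real tool); a real flow would replace it with a simulator or synthesis estimate.

```python
from itertools import product


def toy_model(num_fu, mem_bw):
    """Hypothetical analytical model returning (runtime, area).

    Runtime is bounded by the slower of compute and memory; area grows
    linearly with both resources. All constants are made up.
    """
    compute_time = 1000.0 / num_fu
    memory_time = 800.0 / mem_bw
    runtime = max(compute_time, memory_time)
    area = 2.0 * num_fu + 1.5 * mem_bw
    return runtime, area


def pareto_front(points):
    """Configurations not dominated in both runtime and area."""
    front = []
    for cfg, (rt, area) in points.items():
        dominated = any(
            o_rt <= rt and o_area <= area and (o_rt < rt or o_area < area)
            for o_rt, o_area in points.values()
        )
        if not dominated:
            front.append(cfg)
    return sorted(front)


def explore(fu_options, bw_options, model=toy_model):
    """Exhaustively evaluate every (num_fu, mem_bw) combination."""
    points = {
        (fu, bw): model(fu, bw) for fu, bw in product(fu_options, bw_options)
    }
    return points, pareto_front(points)


if __name__ == "__main__":
    points, front = explore([1, 2, 4, 8], [1, 2, 4, 8])
    for cfg in front:
        runtime, area = points[cfg]
        print(f"FUs={cfg[0]} BW={cfg[1]} runtime={runtime:.0f} area={area:.1f}")
```

Exhaustive sweeps like this scale poorly as parameters are added, which is the motivation for composable exploration of complex design spaces.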
Figure labels (accelerator design flow): Unmodified C-Code; Accelerator Design Parameters (e.g., # FU, mem. BW); Accelerator-Specific Datapath; Private L1/Scratchpad; Shared Memory/Interconnect Models
David Brooks @ Harvard
CortexSuite: A Synthetic Brain Benchmark Suite
Benchmarks shown: Feature Tracking, Disparity Map, Image Stitch, Image Segmentation, Robot Localization, Texture Synthesis, SIFT, Support Vector Machines
Michael Taylor @ UCSD
3) Embrace Open-Source Concepts to Reduce Costs
• Thought experiment: let’s design the next great smartphone
Red = non-free IP, Green = free IP
As a community, we need to consider: How much of our basic technology should be collectively maintained?
Does Open Source Mean an End to Profits?
• No, not in the way I am suggesting we utilize open source…
• Should all hardware designs be open sourced? NO!
• Should all hardware designs be closed sourced? NO!
• We need to decide as a community which IP no longer merits investment in closed-source development
• Because it’s not worth the $$$ to maintain older stable technologies
• Examples: USB controllers, standard bus controllers, DDRx memory controllers, simple FPGAs and CPUs
• Instead, develop and maintain these as a community and invest in developing closed-source IP beyond these components
Open-Source H/W is Growing
4) Widen the Applicability of Customized H/W
• ESP: Ensembles of Specialized Processors
• Ensembles are algorithm-specific processors optimized for code “patterns”
• Approach uses composable customization to deliver speed and efficiency that is widely applicable to general purpose programs
• Grand challenges remain: what are the components and how are they connected?
Figure labels: Applications (Multimedia Analysis, Computer Vision, Machine Learning); Computational Patterns (Dense, Sparse, Graph, …); Specializers with custom implementations and autotuning; ESP Code (Glue Code, Dense Code, Sparse Code, Graph Code); ESP Core (ILP Engine, Dense Engine, Sparse Engine, Graph Engine)
Krste Asanovic @ UC-Berkeley
5) Reduce the cost of manufacturing customized H/W
• Brick-and-mortar silicon explores assembly-time customization, i.e., MCMs + 3D + FPGA interconnect
• Diversity via brick ecosystem & interconnect flexibility
• Brick design costs amortized across all designs
• Robust interconnect and custom bricks rival ASIC speeds
• Another thought experiment: what if building a house were like fabricating a chip?
Figure label: H/W brick
Brick-and-mortar silicon design flow:
1) Assemble brick layer
2) Connect with mortar layer
3) Package assembly
4) Deploy software
Martha Kim @ Columbia
Conclusions
• Heterogeneous design could continue Moore’s law perf. scaling via innovation alone
• But, it requires a diverse hardware ecosystem with affordable customization
• Effective and affordable customization won’t happen without our help
1. Expect more from architectural innovation
2. Reduce the cost to design customized hardware
3. Embrace open-source concepts
4. Widen the applicability of customization
5. Reduce the cost of custom manufacturing
• Increasing “nanodiversity” is a good thing
• More jobs, companies, and students
• More competition and scalable innovation
Questions?