Prof. Saman Amarasinghe, MIT. 1 6.189 IAP 2007 MIT
6.189 IAP 2007
Lecture 1
Multicore Programming Primer and Programming Competition
Introduction

The “Software Crisis”
“To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.”
-- E. Dijkstra, 1972 Turing Award Lecture

The First Software Crisis
● Time Frame: ’60s and ’70s
● Problem: Assembly Language Programming
  – Computers could handle larger, more complex programs
● Needed to get Abstraction and Portability without losing Performance

How Did We Solve the First Software Crisis?
● High-level languages for von-Neumann machines
  – FORTRAN and C
● Provided “common machine language” for uniprocessors
  Common Properties:
  – Single memory image
  – Single flow of control
  Differences:
  – Register File
  – ISA
  – Functional Units

The Second Software Crisis
● Time Frame: ’80s and ’90s
● Problem: Inability to build and maintain complex and robust applications requiring multi-million lines of code developed by hundreds of programmers
  – Computers could handle larger, more complex programs
● Needed to get Composability, Malleability and Maintainability
  – High performance was not an issue; it was left to Moore’s Law

How Did We Solve the Second Software Crisis?
● Object Oriented Programming
  – C++, C# and Java
● Also…
  – Better tools
    – Component libraries, Purify
  – Better software engineering methodology
    – Design patterns, specification, testing, code reviews

Today: Programmers are Oblivious to Processors
● Solid boundary between Hardware and Software
● Programmers don’t have to know anything about the processor
  – High level languages abstract away the processors
    – Ex: Java bytecode is machine independent
  – Moore’s law does not require the programmers to know anything about the processors to get good speedups
● Programs are oblivious of the processor, so they work on all processors
  – A program written in ’70 using C still works and is much faster today
● This abstraction provides a lot of freedom for the programmers

The Origins of a Third Crisis
● Time Frame: 2005 to 20??
● Problem: Sequential performance is left behind by Moore’s law
● Needed continuous and reasonable performance improvements
  – to support new features
  – to support larger datasets
● While sustaining portability, malleability and maintainability without unduly increasing complexity faced by the programmer
  – critical to keep up with the current rate of evolution in software

The March to Multicore: Moore’s Law
[Figure: number of transistors (10,000 to 1,000,000,000) and performance relative to the VAX-11/780 (1 to 100,000, log scale) by year, 1978 to 2016. Processors plotted: 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2. Performance trend annotations: 25%/year, 52%/year, ??%/year.]
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006
From David Patterson

The March to Multicore: Uniprocessor Performance (SPECint)
[Figure: SPECint2000 score (1 to 10,000, log scale) by year, 1985 to 2007. Series: Intel 386, Intel 486, Intel Pentium, Intel Pentium 2, Intel Pentium 3, Intel Pentium 4, Intel Itanium, Alpha 21064, Alpha 21164, Alpha 21264, Sparc, SuperSparc, Sparc64, MIPS, HP PA, PowerPC, AMD K6, AMD K7, AMD x86-64.]

The March to Multicore: Uniprocessor Performance (SPECint)
● General-purpose unicores have stopped historic performance scaling
  – Power consumption
  – Wire delays
  – DRAM access latency
  – Diminishing returns of more instruction-level parallelism
From David Patterson

Power Consumption (watts)
[Figure: power in watts (1 to 1000, log scale) by year, 1985 to 2007. Series: Intel 386, Intel 486, Intel Pentium, Intel Pentium 2, Intel Pentium 3, Intel Pentium 4, Intel Itanium, Alpha 21064, Alpha 21164, Alpha 21264, Sparc, SuperSparc, Sparc64, MIPS, HP PA, PowerPC, AMD K6, AMD K7, AMD x86-64.]

Power Efficiency (watts/spec)
[Figure: watts/spec (0 to 0.7) by year, 1982 to 2006. Series: Intel 386, Intel 486, Intel Pentium, Intel Pentium 2, Intel Pentium 3, Intel Pentium 4, Intel Itanium, Alpha 21064, Alpha 21164, Alpha 21264, Sparc, SuperSparc, Sparc64, MIPS, HP PA, PowerPC, AMD K6, AMD K7, AMD x86-64.]

Range of a Wire in One Clock Cycle
[Figure: process size in microns (0 to 0.28) by year, 1996 to 2014, annotated with clock frequencies 700 MHz, 1.25 GHz, 2.1 GHz, 6 GHz, 10 GHz and 13.5 GHz.]
● 400 mm² die
● From the SIA Roadmap

DRAM Access Latency
● Access times are a speed of light issue
● Memory technology is also changing
  – SRAMs are getting harder to scale
  – DRAM is no longer cheapest cost/bit
● Power efficiency is an issue here as well
[Figure: relative performance (1 to 1,000,000, log scale) by year, 1980 to 2004. µProc: 60%/yr. (2X/1.5 yr). DRAM: 9%/yr. (2X/10 yrs).]
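The growth rates quoted in this figure can be turned into doubling times with the usual compound-growth relation, doubling time = ln 2 / ln(1 + r). The following is a minimal sketch, not part of the original slides; the function names doubling_time_years and gap_after are made up for illustration, and the model assumes a constant annual rate.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def gap_after(years: float, fast: float, slow: float) -> float:
    """Ratio between two quantities that start equal and compound at different rates."""
    return ((1 + fast) / (1 + slow)) ** years


if __name__ == "__main__":
    # Rates taken from the figure: processors 60%/yr, DRAM 9%/yr.
    print(f"processor doubling time: {doubling_time_years(0.60):.2f} years")
    print(f"DRAM doubling time:      {doubling_time_years(0.09):.2f} years")
    print(f"gap after 10 years:      {gap_after(10, 0.60, 0.09):.0f}x")
```

Under this model, 60%/yr doubles in about 1.5 years, which matches the figure. 9%/yr doubles in about 8 years, so the figure's "2X/10 yrs" is best read as a rounded value (a strict 10-year doubling corresponds to roughly 7%/yr). Either way, the two curves diverge by well over an order of magnitude per decade.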

Diminishing Returns
● The ’80s: Superscalar expansion
  – 50% per year improvement in performance
  – Transistors applied to implicit parallelism
    – pipeline processor (10 CPI --> 1 CPI)
● The ’90s: The Era of Diminishing Returns
  – Squeaking out the last implicit parallelism
    – 2-way to 6-way issue, out-of-order issue, branch prediction
    – 1 CPI --> 0.5 CPI
  – performance below expectations
  – projects delayed & canceled
● The ’00s: The Beginning of the Multicore Era
  – The need for Explicit Parallelism
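The CPI figures on this slide can be related to speedup through the standard relation execution time = instruction count × CPI / clock rate. The sketch below is an illustration rather than material from the lecture; the function names are hypothetical, and it assumes instruction count and clock rate stay fixed, so that only CPI changes.

```python
def execution_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Seconds to run a program: instruction count x cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


def speedup(cpi_before: float, cpi_after: float) -> float:
    """Speedup from a CPI change, holding instruction count and clock rate fixed."""
    return cpi_before / cpi_after


if __name__ == "__main__":
    print(speedup(10, 1))    # pipelining in the '80s: 10 CPI --> 1 CPI
    print(speedup(1, 0.5))   # wide out-of-order issue in the '90s: 1 CPI --> 0.5 CPI
```

Under these assumptions the first step is a tenfold gain and the second only a twofold gain, despite the much larger hardware effort of multi-way issue, out-of-order execution and branch prediction. That is the sense in which the returns diminish.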

Unicores are on the verge of extinction; multicores are here
[Timeline figure: 2H 2004, 1H 2005, 2H 2005, 1H 2006, 2H 2006, …]
● IBM Power 4 and 5: dual cores since 2001
● MIT Raw: 16 cores since 2002
● AMD Opteron: dual core
● Intel Montecito: 1.7 billion transistors, dual core IA/64
● Intel Tanglewood: dual core IA/64
● Intel Pentium Extreme: 3.2 GHz dual core
● Intel Tejas & Jayhawk: unicore (4 GHz P4), cancelled
● Intel Dempsey: dual core Xeon
● Intel Pentium D (Smithfield)
● Intel Yonah: dual core mobile
● IBM Power 6: dual core
● IBM Cell: scalable multicore
● Sun Olympus and Niagara: 8 processor cores

Multicores are Here
[Figure: number of cores (1 to 512, log scale) by year, 1970 to 20??. Processors plotted: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon, Raw, Power4, Opteron, Power6, Niagara, Yonah, PExtreme, Tanglewood, Cell, Intel Tflops, Xbox360, Cavium Octeon, Raza XLR, PA-8800, Cisco CSR-1, Picochip PC102, Broadcom 1480, Opteron 4P, Xeon MP, Ambric AM2045.]

Requirements and Outcomes
● Requirements
  – A good programmer with experience
  – Fluent in C
● Outcomes
  – Know fundamental concepts of parallel programming (both hardware and software)
  – Understand issues of parallel performance
  – Able to synthesize a fairly complex parallel program
  – Hands-on experience with the IBM Cell processor

The Project
● You proposed the projects
● We selected 7 teams
  – Mainly by the strength of the project proposals
● Seven Great Projects
  – Distributed Real-time Ray Tracer
  – Global Illumination
  – Linear Algebra Pack
  – Molecular Dynamics Simulator
  – Speech Synthesizer
  – Soft Radio
  – Backgammon Tutor
● Project Characteristics
  – Ambitious but accomplishable
  – Important and Relevant
  – Opportunity to sizzle
● Get them started ASAP!

A Note of Caution
● Cell processor is very new
● It is not an easy architecture to work with
● The tool chain is thin and brittle
● Most of the staff have limited experience
● Projects you are doing are of your own making.
  – They aren’t canned exercises that are tried and proven.
● You will face unexpected problems.
● WE ARE ALL IN THIS TOGETHER!!

Grading
● Mini Quizzes 16%
  – At the beginning of each class day
  – 5 minutes each
● Lab Projects 24%
● Final Group Project 60%

Final Competition
● The competition will be decided on
  – Performance
  – Completeness
  – Algorithmic complexity
  – Demo and Presentation
● The winning team will
  – Get gift certificates ($150 each)
  – Be invited to IBM TJ Watson Research Center for a day
    – Tour of the facilities
    – Present your project

Staff
● Prof. Saman Amarasinghe (saman@mit.edu)
  – Interested in languages, compilers and computer architecture
  – Raw Processor (with Prof. Anant Agarwal)
  – StreamIt language
  – SUIF parallelizing compiler
● Dr. Rodric Rabbah (rabbah@mit.edu)
  – Currently a researcher at IBM Watson Research Center
  – Was a research scientist at CSAIL before that
  – Interested in compilers, computer architecture and FPGAs
● TAs
  – David Zhang (dxzhang@mit.edu)
    – Course 6 M.Eng.
  – Phil Sung (psung@mit.edu)
    – Course 6 M.Eng.

Guest Lectures
● Dr. Michael Perrone
  – IBM Watson Research Center
  – Expert in Cell Architecture and Application Development
● Prof. Alan Edelman
  – Math and CS. Interested in parallel algorithms
● Prof. Arvind
  – Parallel architectures, compilers and languages
● Dr. Bradley Kuszmaul
  – Research scientist at CSAIL working on Cilk
● Mike Acton
  – Professional game developer
● Bill Thies
  – CSAIL PhD candidate
  – Architect of StreamIt

Lecture Organization
Extracting Parallelism
● Implicit
  – Hardware: Superscalar Processors (start of Lecture 3)
  – Compiler: Parallelizing Compilers (Lectures 11 & 12)
● Explicit
  – Library: Concurrency (Lecture 4), Design Patterns (Lectures 5, 6, 7)
  – Languages: StreamIt (Lecture 8), Star-P (Lecture 13), BlueSpec (Lecture 14), Cilk (Lecture 15)

Schedule (Monday through Friday)
● Week of Jan 8
  – 10:00 – 10:55: Lecture 1: Course Introduction; Recitation 1: Getting to Know Cell; Lecture 3: Introduction to Parallel Architectures; Lecture 5: Parallel Programming Concepts
  – 11:05 – 12:00: Lecture 2: Introduction to Cell Processor; Lecture 4: Introduction to Concurrent Programming; Project Reviews; Lecture 6: Design Patterns for Parallel Programming I
● Week of Jan 15
  – 10:00 – 10:55: Lecture 7: Design Patterns for Parallel Programming II; Recitation 4: Cell Debugging Tools; Lecture 9: Debugging and Performance Monitoring
  – 11:05 – 12:00: Holiday; Recitation 2-3: Cell Programming Hands-On; Lecture 8: StreamIt Language; Lecture 10: Performance Optimizations
● Week of Jan 22
  – 10:00 – 10:55: Lecture 11: Classic Parallelizing Compilers; Lecture 13: Star-P; Lecture 15: Cilk
  – 11:05 – 12:00: Lecture 12: StreamIt Parallelizing Compiler; Recitation 5, 6: Cell Performance Monitoring Tools; Lecture 14: Synthesizing Parallel Programs; Lecture 16: Anatomy of a Game
● Week of Jan 29
  – 10:00 – 10:55: Lecture 17: The Raw Experience
  – 11:05 – 12:00: Lecture 18: The Future; Group Presentations; Awards & Reception