Parallel Computing
Benson Muite
benson.muite@ut.ee
http://math.ut.ee/~benson
https://courses.cs.ut.ee/2014/paralleel/fall/Main/HomePage
22 September 2014
Document Preparation: LaTeX and LyX
• https://en.wikibooks.org/wiki/LaTeX
• http://texblog.org/about/
• http://www.latex-project.org/
• http://www.lyx.org/
Computer Architecture
Parallel Computer Architecture
• Chip Architecture Review
• Accelerators
• Graphics Cards
• Intel Xeon Phi
• Parallel Computer Networking
• CM-5
• The Earth Simulator
• IBM Blue Gene
• K computer
• Titan
• Tianhe II
Chip Architecture Review
• Typical chip today has multiple cores
• Data may need to be obtained from a hard disk, RAM or cache before being processed
• For many applications getting the data can be more of a constraint than computing with the data
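The data-movement constraint can be made concrete with a roofline-style estimate: a kernel's attainable flop rate is limited either by peak arithmetic throughput or by memory bandwidth times its arithmetic intensity (flops per byte moved). A minimal sketch; the hardware numbers are illustrative assumptions, not measurements of any particular chip:

```python
# Roofline-style estimate: is a kernel limited by arithmetic or by data movement?
# The peak flop rate and bandwidth below are assumed for illustration only.

peak_flops = 500e9       # assumed peak arithmetic rate: 500 Gflop/s
peak_bandwidth = 50e9    # assumed memory bandwidth: 50 GB/s

def attainable_flops(intensity):
    """Attainable flop rate given arithmetic intensity in flops per byte."""
    return min(peak_flops, peak_bandwidth * intensity)

# Vector addition z = x + y does 1 flop per 24 bytes moved (read x and y,
# write z, 8-byte doubles), so its intensity is 1/24: badly memory bound.
print(attainable_flops(1 / 24) / 1e9, "Gflop/s for vector addition")

# A cache-friendly dense matrix-matrix multiply reuses data many times,
# giving a much higher intensity, so it can approach the arithmetic peak.
print(attainable_flops(100) / 1e9, "Gflop/s for a high-intensity kernel")
```

Under these assumed numbers, the memory-bound kernel reaches only a few Gflop/s even though the chip could compute far faster, which is why fetching data often dominates.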
Example HPC Chip Architectures
• Intel Haswell
• AMD Opteron
• SPARC64 XIfx
• NEC SX-ACE
• IBM Power 8
• IBM PowerPC A2
• Hotchips (http://www.hotchips.org/), Coolchips (http://www.coolchips.org/2015/)
Accelerators
• External specialized device for floating point operations
• Typically good at doing many simplified instructions in parallel
• High latency is compensated by high bandwidth
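The latency-versus-bandwidth trade-off can be quantified with Little's law: to sustain a given bandwidth, the number of memory requests in flight must equal latency times bandwidth divided by the request size. A minimal sketch with assumed, illustrative numbers (not vendor specifications):

```python
# Little's law applied to an accelerator's memory system: the concurrency
# needed to hide latency is latency * bandwidth. All numbers are assumptions
# chosen for illustration, not specifications of any real device.

latency = 400e-9     # assumed memory latency: 400 ns
bandwidth = 200e9    # assumed memory bandwidth: 200 GB/s
request_size = 128   # assumed bytes per memory request

# Bytes that must be in flight at any instant to keep the memory system busy:
bytes_in_flight = latency * bandwidth

# Number of independent requests that must be outstanding simultaneously:
requests_in_flight = bytes_in_flight / request_size
print(int(requests_in_flight), "concurrent requests needed to hide latency")
```

This is why accelerators run many simple operations in parallel: only with hundreds of independent requests outstanding can the high bandwidth compensate for the high latency.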
Graphics Cards and General Purpose Computing on Graphics Cards
• Nvidia – many simple cores; CUDA, CUDA Fortran, OpenACC, OpenCL and OpenGL application programming interfaces; strong support of the academic community
• AMD – many simple cores; OpenCL and OpenGL. Has launched the APU (Accelerated Processing Unit), which combines CPU and GPU
• Embedded graphics cards in the AMD APU and in cell phone chips, such as the Qualcomm Snapdragon
Intel Xeon Phi
• 1 Tflop of performance
• Mini-supercomputer in a compute card
• Simplified x86 cores
• Typically easy to get code to run, more difficult to get code to run efficiently
Parallel Computer Networks
• Bus – simple, cheap, poor communication performance
• Ring – simple, cheap, poor communication performance
• Mesh – simple, more expensive than a ring, better communication performance than a ring
• Hypercube – good communication performance, expensive at a large scale
• Torus (2D, 3D, 4D, 6D) – good communication performance
• Fat tree – commonly used; not quite as good performance as a torus, but cheaper
• Which topology is cost effective for a Monte Carlo simulation?
• What is the topology of Rocket?
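The trade-offs above can be compared quantitatively: for p nodes, each topology has a characteristic link count (a rough proxy for cost) and diameter (worst-case number of hops between two nodes). A minimal sketch for a few of the topologies listed; the formulas assume p is a power of two and that the 2D torus is square:

```python
# Compare network topologies by link count (cost) and diameter (worst-case
# hops) for p nodes. Assumes p is a power of two and a square 2D torus.
import math

def topology_stats(p):
    """Return {name: (links, diameter)} for a ring, square 2D torus and
    hypercube on p nodes."""
    d = int(math.log2(p))        # hypercube dimension
    side = int(round(p ** 0.5))  # side length of the square 2D torus
    return {
        "ring":      (p,          p // 2),      # p links, halfway around
        "2D torus":  (2 * p,      2 * (side // 2)),  # wrap links halve distance per axis
        "hypercube": (p * d // 2, d),           # each node has d links
    }

for name, (links, diameter) in topology_stats(64).items():
    print(f"{name:10s} links={links:4d} diameter={diameter}")
```

For an embarrassingly parallel Monte Carlo simulation with little inter-node communication, almost any topology sustains the workload, so a cheaper network such as a fat tree (or even a bus at small scale) is usually the cost-effective choice; the torus and hypercube pay off for communication-heavy codes.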
Parallel Computer Networks
• http://htor.inf.ethz.ch/research/topologies/
CM-5
• http://people.csail.mit.edu/bradley/cm5/, https://en.wikipedia.org/wiki/Connection_Machine
Figure: NAS Thinking Machines CM-5, photographer: Tom Trower, 1993 (This is probably a 256 processor machine.)
• 131 Gflops on 1024 processors
• World's most powerful known computer in June 1993
• Fat tree topology network
• Thinking Machines grew out of Danny Hillis's doctoral research, but is no longer producing supercomputers
The Earth Simulator
• https://en.wikipedia.org/wiki/Earth_Simulator, http://www.jamstec.go.jp/ceist/avcrg/index.en.html
Figure: Old Earth Simulator
Figure: Earth Simulator 2
• 35.86 Tflops on 5120 processors
• World's most powerful known computer between March 2002 and November 2004
• Vector processors
• Five times faster than the previous number one computer on the Top500 list
IBM Blue Gene L
• https://en.wikipedia.org/wiki/Blue_Gene#Blue_Gene.2FL, https://asc.llnl.gov/computing_resources/bluegenel/photogallery.html
Figure: Adam Bertsch next to a Blue Gene L system at Lawrence Livermore National Laboratory
• 596 Tflops on 106,496 dual core processors
• World's most powerful known computer between November 2004 and November 2007
• 3D torus and many not so fast cores
• More at https://asc.llnl.gov/computing_resources/bluegenel/configuration.html
K computer
• https://en.wikipedia.org/wiki/K_computer, http://www.aics.riken.jp/en/outreach/photo-gallery/
Figure: K computer at RIKEN, picture courtesy of RIKEN.
• Currently 10.5 Pflops on 88,128 SPARC64 VIIIfx processors with 8 cores per processor
• World's most powerful known computer between June 2011 and June 2012
• 6D mesh/torus network and many fast and smart cores
• More at http://www.aics.riken.jp/en/k-computer/system
Titan
• https://en.wikipedia.org/wiki/Titan_%28supercomputer%29, https://www.olcf.ornl.gov/
Figure: Titan Supercomputer at Oak Ridge National Laboratory
• 27 Pflops on 18,688 AMD Opteron 6274 16-core CPUs and 18,688 Nvidia Tesla K20X GPUs
• World's most powerful known computer between November 2012 and June 2013
• More at https://www.olcf.ornl.gov/computing-resources/titan-cray-xk7/
Tianhe II
• https://en.wikipedia.org/wiki/Tianhe-2
• https://duckduckgo.com/?q=tianhe+II+pictures
• 33.86 Pflops on 32,000 Intel Xeon E5-2692 chips with 48,000 Xeon Phi 31S1P coprocessors
• Fat tree topology; American chips, but the interconnect is made in China
• World's most powerful known computer
• More at www.netlib.org/utk/people/JackDongarra/PAPERS/tianhe-2-dongarra-report.pdf
Summary
• Supercomputer architectures are still evolving
• Depending on the problem you are solving, the best choice of computer architecture and algorithm should be made if possible
• In many cases, you have no choice in the computer architecture of a supercomputer, but you do have some choice in the algorithm
• Sometimes you are lucky and can choose both, but may need to write a lot of code
New Key Concepts and References
• Parallel Computer Architecture; RR 2.1–2.3
• Rahman, R. Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers, Apress Open (2013), $0.35 on Amazon
• T. Hoefler, "Networking and Computer Architecture" http://htor.inf.ethz.ch/teaching/CS498/
• A. Grama, A. Gupta, G. Karypis, V. Kumar, Introduction to Parallel Computing, 2nd Ed., Addison Wesley (2003)
• Wang, E., Zhang, Q., Shen, B., Zhang, G., Lu, X., Wu, Q., Wang, Y. High-Performance Computing on the Intel® Xeon Phi™, Springer (2014) http://www.springer.com/computer/communication+networks/book/978-3-319-06485-7?otherVersion=978-3-319-06486-4