Introduction to Parallel Computing:
Architectures, Systems, and Programming
Prof. Rajkumar Buyya
Cloud Computing and Distributed Systems (CLOUDS) Lab, The University of Melbourne, Australia
www.buyya.com
Serial vs. Parallel Services
[Figure: queue analogy. Serial: one queue served by a single COUNTER. Parallel: customers served simultaneously by COUNTER 1 and COUNTER 2]
Overview of the Talk
Introduction
Why Parallel Processing?
Parallel System H/W Architecture
Parallel Operating Systems
Parallel Programming Models
Summary
Computing Elements
[Figure: a multi-processor computing system as layers: hardware (processors), microkernel operating system, threads interface (processes, processors, threads), programming paradigms, and applications]
Two Eras of Computing
[Figure: timeline from 1940 to 2030 showing the sequential era followed by the parallel era, each progressing from R&D through commercialization to commodity across architectures, system software/compilers, applications, and problem-solving environments (PSEs)]
History of Parallel Processing
The notion of parallel processing can be traced back to a tablet dated around 100 BC. The tablet has three calculating positions capable of operating simultaneously. From this we can infer that they were aimed at either "speed" or "reliability".
Motivating Factor: Human Brain
The human brain consists of a large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and only the massive interaction between all cells and their parallel processing makes the brain's abilities possible.
Individual neuron response speed is slow (on the order of milliseconds).
The aggregate speed with which complex calculations are carried out by (billions of) neurons demonstrates the feasibility of parallel processing.
Why Parallel Processing?
Computation requirements are ever increasing: simulations, scientific prediction (earthquakes), distributed databases, weather forecasting (will it rain tomorrow?), search engines, e-commerce, Internet service applications, data-center applications, finance (investment risk analysis), oil exploration, mining, etc.
Silicon-based (sequential) architectures are reaching their limits in processing capability (clock speed), as they are constrained by the speed of light and thermodynamics.
Human Architecture! Growth Performance
[Figure: growth vs. age from 5 to 45 years; vertical growth early in life gives way to horizontal growth]
Computational Power Improvement
[Figure: computational power improvement (C.P.I.) vs. number of processors; the multiprocessor curve keeps climbing while the uniprocessor curve flattens]
Why Parallel Processing?
Hardware improvements such as pipelining and superscalar execution are not scaling well and require sophisticated compiler technology to extract performance from them.
Techniques such as vector processing work well only for certain kinds of problems.
Why Parallel Processing?
Significant developments in networking technology are paving the way for network-based, cost-effective parallel computing.
Parallel processing technology is now mature and is being exploited commercially. All computers (including desktops and laptops) are now based on parallel processing (e.g., multicore) architectures.
Processing Elements Architecture
Processing Elements
Flynn proposed a classification of computer systems based on the number of instruction and data streams that can be processed simultaneously. They are:
SISD (Single Instruction, Single Data): conventional computers
SIMD (Single Instruction, Multiple Data): data-parallel, vector computing machines
MISD (Multiple Instruction, Single Data): systolic arrays
MIMD (Multiple Instruction, Multiple Data): general-purpose machines
SISD: A Conventional Computer
Speed is limited by the rate at which the computer can transfer information internally.
[Figure: a single processor taking one instruction stream and one data input stream and producing one data output stream]
Ex: PCs, workstations
The MISD Architecture
More of an intellectual exercise than a practical configuration; a few have been built, but none are commercially available.
[Figure: processors A, B, and C, each driven by its own instruction stream (A, B, C), all operating on a single data input stream and producing a single data output stream]
SIMD Architecture
Ex: Cray vector processing machines, Thinking Machines CM*, Intel MMX (multimedia support)
Ci <= Ai * Bi
[Figure: a single instruction stream driving processors A, B, and C, each with its own data input stream (A, B, C) and its own data output stream (A, B, C)]
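The Ci <= Ai * Bi operation above is the canonical SIMD pattern: one instruction (multiply) applied in lock-step across many data elements. A minimal sketch in Python (a pure-Python emulation only; real SIMD hardware or vector units would execute all the lanes simultaneously):

```python
# SIMD-style data parallelism: one operation (multiply) applied
# element-wise across whole data streams. On real SIMD hardware each
# processing element handles one index in lock-step; here it is emulated.
A = [1.0, 2.0, 3.0, 4.0]
B = [10.0, 20.0, 30.0, 40.0]

# A single "instruction" (multiplication) over multiple data elements:
C = [a * b for a, b in zip(A, B)]

print(C)  # [10.0, 40.0, 90.0, 160.0]
```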
Unlike SISD and MISD machines, an MIMD computer works asynchronously.
Shared memory (tightly coupled) MIMD, e.g., multicore
Distributed memory (loosely coupled) MIMD
MIMD Architecture
[Figure: processors A, B, and C, each with its own instruction stream and its own data input and output streams, connected over a memory bus]
Shared Memory MIMD machine
Communication: the source PE writes data to global memory and the destination PE retrieves it.
Easy to build; conventional OSes for SISD machines can easily be ported.
Limitation: reliability and expandability. A memory component or processor failure affects the whole system, and increasing the number of processors leads to memory contention.
Ex: Silicon Graphics supercomputers, and now multicore systems
[Figure: processors A, B, and C connected through memory buses to a single global memory system]
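The global-memory communication and contention described above can be sketched with ordinary threads, which share one address space just as the PEs share global memory (a hedged illustration: the lock stands in for the arbitration a shared memory bus must perform):

```python
import threading

# Shared-memory MIMD sketch: several threads (asynchronous instruction
# streams) update one shared memory location. Without the lock the
# read-modify-write races, the software analogue of memory contention.
counter = 0
lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:              # serialise access to the shared location
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```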
Distributed Memory MIMD
Communication: IPC (inter-process communication) via a high-speed network.
The network can be configured as a tree, mesh, cube, etc.
Unlike shared-memory MIMD, it is easily/readily expandable and highly reliable (a CPU failure does not affect the whole system).
[Figure: processors A, B, and C, each attached over its own memory bus to its own local memory system (A, B, C), communicating through IPC channels]
Types of Parallel Systems
Tightly Coupled Systems:
Shared Memory Parallel: smallest extension to existing systems; program conversion is incremental
Distributed Memory Parallel: completely new systems; programs must be reconstructed
Loosely Coupled Systems:
Clusters (now Clouds): built using commodity systems; centralised management
Grids: aggregation of distributed systems; decentralised management
Laws of caution.....
Speed of computation is proportional to the square root of system cost, i.e., Speed = √(Cost).
Speedup by a parallel computer increases as the logarithm of the number of processors: Speedup = log2(no. of processors).
[Figure: speed S growing as √C with cost C; speedup S growing as log2(P) with the number of processors P]
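Taken literally, both laws are sobering. A small numeric sketch (the cost units and processor counts are illustrative only):

```python
import math

# "Laws of caution", evaluated literally:
#   speed ~ sqrt(cost): quadrupling the cost only doubles the speed
#   speedup = log2(P): 1024 processors yield only a 10x speedup
for cost in (1, 4, 16):
    print(f"cost {cost:2d} -> relative speed {math.sqrt(cost):.1f}")

for p in (2, 64, 1024):
    print(f"{p:4d} processors -> speedup {math.log2(p):.1f}")
```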
Caution....
Very fast development in network computing and related areas has blurred concept boundaries, causing a lot of terminological confusion: concurrent computing, parallel computing, multiprocessing, supercomputing, massively parallel processing, cluster computing, distributed computing, Internet computing, grid computing, Cloud computing, etc.
At the user level, even well-defined distinctions such as shared memory and distributed memory are disappearing due to new advances in technologies.
Good tools for parallel application development and debugging are yet to emerge.
Caution....
There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, algorithms, databases, computer networks, ...
All have a role to play.
Operating Systems for High-Performance Computing
Operating Systems for PP
MPP systems with thousands of processors require an OS radically different from current ones.
Every CPU needs an OS to manage its resources and to hide its details.
Traditional OSes are heavy and complex, and are not suitable for MPP.
Operating System Models
A framework that unifies the features, services, and tasks performed.
Three approaches to building an OS:
Monolithic OS
Layered OS
Microkernel-based (client-server) OS: suitable for MPP systems
Simplicity, flexibility, and high performance are crucial for such an OS.
Monolithic Operating System
[Figure: application programs in user mode invoking system services that run together with hardware access in kernel mode]
Better application performance; difficult to extend. Ex: MS-DOS
Layered OS
Easier to enhance; each layer of code accesses the lower-level interface; low application performance.
[Figure: application programs in user mode; in kernel mode, layered services (system services, memory & I/O device management, process scheduling) above the hardware]
Ex: UNIX
Traditional OS
[Figure: the OS designer provides one monolithic OS layer between application programs (user mode) and the hardware (kernel mode)]
New Trend in OS Design
[Figure: a small microkernel in kernel mode above the hardware, with servers and application programs in user mode]
Microkernel/Client Server OS
(for MPP Systems)
A tiny OS kernel providing basic primitives (processes, memory, IPC).
Traditional services become user-level subsystems, with application performance competitive with a monolithic OS.
OS = Microkernel + User Subsystems
[Figure: client applications with thread libraries, plus file, network, and display servers, all in user mode, exchanging send/reply messages through the microkernel that runs in kernel mode on the hardware]
Few Popular Microkernel Systems
MACH, CMU
PARAS, C-DAC
Chorus
QNX
Windows
Parallel Programs
Consist of multiple active "processes" simultaneously solving a given problem.
The communication and synchronization between these parallel processes form the core of the parallel programming effort.
Parallel Programming Models
Shared Memory Model: DSM; Threads/OpenMP (enabled for clusters); Java threads (HKU JESSICA, IBM cJVM)
Message Passing Model: PVM; MPI
Hybrid Model: mixing the shared and distributed memory models, e.g., using OpenMP and MPI together
Object and Service Oriented Models: wide-area distributed computing technologies
OO: CORBA, DCOM, etc.
Services: Web Services-based service composition
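The shared-memory model's "parallel for" idea (as in OpenMP) can be approximated in Python with a thread pool. This is only an analogy, since OpenMP itself is a compiler-directive system for C, C++, and Fortran:

```python
from concurrent.futures import ThreadPoolExecutor

# Shared-memory model sketch: a pool of threads divides the iterations
# of a loop among themselves, much like "#pragma omp parallel for".
def f(x):
    return 3 * x + 1

data = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(f, data))   # iterations run across threads

print(results)  # [1, 4, 7, 10, 13, 16, 19, 22]
```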
Summary/Conclusions
Parallel processing has become a reality: SMPs are used extensively as (Web) servers, the threads concept is utilized everywhere, and clusters have emerged as popular data centers and processing engines (e.g., the Google search engine).
The emergence of commodity high-performance CPUs, networks, and OSes has made parallel computing applicable to enterprise and consumer applications, e.g., Oracle {9i,10g} databases on Clusters/Grids, and Facebook and Twitter running on Clouds.