
Parallel Processing (CS 676) Overview, Jeremy R. Johnson

Date post: 17-Dec-2015
Upload: cuthbert-patterson
Transcript
Page 1

Parallel Processing (CS 676)

Overview

Jeremy R. Johnson

Page 2

Goals

• Parallelism: To run large and difficult programs fast.

• Course: To become effective parallel programmers
– “How to Write Parallel Programs”
– “Parallelism will become, in the not too distant future, an essential part of every programmer’s repertoire”
– “Coordination – a general phenomenon of which parallelism is one example – will become a basic and widespread phenomenon in CS”

• Why?
– Some problems require extensive computing power to solve
– The most powerful computer, by definition, is a parallel machine
– Parallel computing is becoming ubiquitous
– Distributed & networked computers with simultaneous users require coordination

Page 3

Top 500

Page 4

LINPACK Benchmark

• Solve a dense N × N system of linear equations, y = Ax, using Gaussian elimination with partial pivoting

– $\tfrac{2}{3}N^3 + 2N^2$ floating-point operations

• High Performance LINPACK (HPL) is used to measure performance for the TOP500 list (introduced by Jack Dongarra)

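$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
=
\begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix}
\begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}
$$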
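The operation count above is what turns a timed run into a reported rate: the nominal number of floating-point operations for order N is divided by the wall-clock time. The Python sketch below does only that arithmetic. The helper names and the matrix order and run time in the usage lines are hypothetical, not part of any LINPACK or HPL interface, and some LINPACK variants use a slightly different lower-order term, so treat the formula as the one on this slide rather than a definitive count.

```python
# Sketch: deriving a LINPACK-style rate from the nominal operation count
# (2/3)N^3 + 2N^2 quoted on the slide. Names and inputs are hypothetical.

def linpack_flops(n: int) -> float:
    """Nominal floating-point operation count for a dense n x n solve."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops_rate(n: int, seconds: float) -> float:
    """Rate in GFLOPS: nominal operations divided by wall-clock seconds."""
    if seconds <= 0:
        raise ValueError("run time must be positive")
    return linpack_flops(n) / seconds / 1e9


if __name__ == "__main__":
    n = 10_000   # hypothetical matrix order
    t = 50.0     # hypothetical wall-clock time in seconds
    print(f"operations: {linpack_flops(n):.4e}")
    print(f"rate: {gflops_rate(n, t):.2f} GFLOPS")
```

Because the count depends only on N, the rate credits the nominal work of Gaussian elimination regardless of how the solver actually organizes its operations; two runs of the same N differ only in their times.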

Page 5

Example LU Decomposition

• Solve the following linear system

• Find LU decomposition A = PLU

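$$
\begin{aligned} x + y &= 1 \\ x + z &= 1 \\ y + z &= 1 \end{aligned}
\qquad
A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}
$$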
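A minimal pure-Python sketch of the factorization for this example, assuming the slide's convention A = PLU with L taken as unit lower triangular (an assumption of this sketch). At each column it pivots on the first row whose entry has the largest magnitude; with that tie-breaking rule P comes out as the identity for this A, although the zero in position (2,2) of A shows why pivoting is needed in general. The function names plu and solve are hypothetical.

```python
# Sketch: LU factorization with partial pivoting, A = P L U, plus a solve.

def plu(a):
    """Return (P, L, U) with A = P L U; L has a unit diagonal."""
    n = len(a)
    u = [row[:] for row in a]                 # working copy, becomes U
    l = [[0.0] * n for _ in range(n)]
    perm = list(range(n))                     # perm[i]: original row now at i
    for k in range(n):
        # Partial pivoting: first row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]           # carry multipliers found so far
            perm[k], perm[p] = perm[p], perm[k]
        l[k][k] = 1.0
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]             # multiplier l_ik
            l[i][k] = m
            for j in range(k, n):
                u[i][j] -= m * u[k][j]        # eliminate below the pivot
    # (L U)[i] equals A[perm[i]], so P has a 1 at (perm[i], i).
    pmat = [[0.0] * n for _ in range(n)]
    for i, r in enumerate(perm):
        pmat[r][i] = 1.0
    return pmat, l, u


def solve(a, b):
    """Solve A x = b via A = P L U: L y = P^T b, then U x = y."""
    p, l, u = plu(a)
    n = len(a)
    pb = [sum(p[r][i] * b[r] for r in range(n)) for i in range(n)]
    y = [0.0] * n
    for i in range(n):                        # forward substitution
        y[i] = pb[i] - sum(l[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):              # back substitution
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x


if __name__ == "__main__":
    A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
    P, L, U = plu(A)
    print("P =", P)
    print("L =", L)
    print("U =", U)
    print("x, y, z =", solve(A, [1, 1, 1]))
```

For this system the sketch gives L with rows (1, 0, 0), (1, 1, 0), (0, −1, 1), U with rows (1, 1, 0), (0, −1, 1), (0, 0, 2), and x = y = z = 1/2.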

Page 6

Big Machines

Cray-2, DoE Lawrence Livermore National Laboratory (1985)
3.9 gigaflops
8-processor vector machine

Cray X-MP/4, DoE, LANL,… (1983)
941 megaflops
4-processor vector machine

Page 7

Big Machines

Cray Jaguar, ORNL (2009)
1.75 petaflops
224,256 AMD Opteron cores

Tianhe-1A, NSC Tianjin, China (2010)
2.507 petaflops
14,336 Xeon X5670 processors, 7,168 Nvidia Tesla M2050 GPUs

Page 8

Need for Parallelism

Page 9

Multicore

Intel Core i7

Page 10

Multicore

IBM Blue Gene/L (2004-2007)
478.2 teraflops
65,536 “compute nodes”

Cyclops64
80 gigaflops
80 cores @ 500 megahertz, multiply-accumulate
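Peak figures like these follow from cores × clock rate × floating-point operations per cycle per core. A small sketch, assuming (consistent with the slide's "multiply-accumulate") that each Cyclops64 core completes one multiply-accumulate, counted as 2 flops, per cycle; the function name peak_gflops is hypothetical.

```python
# Sketch: theoretical peak = cores x clock (Hz) x flops per cycle per core.

def peak_gflops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS, ignoring memory and communication limits."""
    return cores * clock_hz * flops_per_cycle / 1e9


# Assumption: 1 multiply-accumulate (2 flops) per core per cycle.
print(peak_gflops(80, 500e6, 2))  # 80 x 0.5 GHz x 2 flops = 80.0
```

This is an upper bound; sustained rates such as a LINPACK result are normally lower.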

Page 11

Multicore

Page 12

Multicore

Page 13

GPU

Nvidia GTX 480: 1.34 teraflops

480 SPs (700 MHz)
Fermi chip, 3 billion transistors

Page 14

Google Server

• 2003: 15,000 servers ranging from 533 MHz Intel Celeron to dual 1.4 GHz Intel Pentium III

• 2005: 200,000 servers

• 2006: upwards of servers

Page 15

Drexel Machines

• Tux: 5 nodes
– 4 Quad-Core AMD Opteron 8378 processors (2.4 GHz)
– 32 GB RAM

• Draco: 20 nodes
– Dual Xeon Processor X5650 (2.66 GHz)
– 6 GTX 480
– 72 GB RAM

• 4 nodes
– 6 C2070 GPUs

Page 16

Programming Challenge

• “But the primary challenge for an 80-core chip will be figuring out how to write software that can take advantage of all that horsepower.”

• Read more: http://news.cnet.com/Intel-shows-off-80-core-processor/21001006_36158181.html?tag=mncol#ixzz1AHCK1LEc

Page 17

Basic Idea

• One way to solve a problem fast is to break the problem into pieces, and arrange for all of the pieces to be solved simultaneously.

• The more pieces, the faster the job goes, up to a point where the pieces become too small to make the effort of breaking up and distributing them worth the bother.

• A “parallel program” is a program that uses the breaking up and handing-out approach to solve large or difficult problems.
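The break-up-and-hand-out idea can be sketched in a few lines of Python. This is a minimal illustration, not a recommended design: the problem (summing i*i over a range) and the function names are hypothetical. Threads are used so the sketch runs anywhere, but in CPython the global interpreter lock keeps pure-Python CPU-bound threads from executing truly simultaneously, so real speedup would need concurrent.futures.ProcessPoolExecutor (same map call, run under the `__main__` guard) or a compiled kernel.

```python
# Sketch: break a problem into pieces, hand the pieces out, combine results.
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(lo: int, hi: int) -> int:
    """Solve one piece: the sum of i*i for lo <= i < hi."""
    return sum(i * i for i in range(lo, hi))


def split(n: int, pieces: int) -> list[tuple[int, int]]:
    """Break 0..n-1 into `pieces` contiguous, nearly equal ranges."""
    step, extra = divmod(n, pieces)
    bounds, lo = [], 0
    for k in range(pieces):
        hi = lo + step + (1 if k < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def parallel_sum_of_squares(n: int, pieces: int = 4) -> int:
    chunks = split(n, pieces)
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        # Hand out every piece at once, then combine the partial sums.
        partials = pool.map(sum_of_squares,
                            [lo for lo, _ in chunks],
                            [hi for _, hi in chunks])
        return sum(partials)


if __name__ == "__main__":
    n = 1_000_000
    print(parallel_sum_of_squares(n, pieces=8))
    print(sum_of_squares(0, n))  # sequential result for comparison
```

The `pieces` parameter is where the "up to a point" trade-off shows up: each extra piece adds scheduling and combining overhead, so once pieces are small, splitting further costs more than it saves.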

