[Harvard CS264] 01 - Introduction

Post on 17-Nov-2014


http://cs264.org


Lecture #1: Introduction | January 25th, 2011

Nicolas Pinto (MIT, Harvard) pinto@mit.edu

Massively Parallel Computing, CS 264 / CSCI E-292

...


Distant Students


Take a picture with...

• a friend

• his dog

• cool hardware

• your mom


Send it to: pinto@mit.edu

Today

Outline


Massively Parallel Computing (MPC)

• Supercomputing

• Cloud Computing / High-Throughput Computing

• Many-core Computing

• Human “Computing”?


Modeling & Simulation

• Physics, astronomy, molecular dynamics, finance, etc.

• Data and processing intensive

• Requires high-performance computing (HPC)

• Driving HPC architecture development

Top Dog (2008)

• Roadrunner, LANL

• #1 on top500.org in 2008 (now #7)

• 1.105 petaflop/s

• 3000 nodes with dual-core AMD Opteron processors

• Each node connected via PCIe to two IBM Cell processors

• Nodes are connected via Infiniband 4x DDR

CS264 (2009)

Tianhe-1A at NSC Tianjin

2.507 Petaflop/s, 7168 Tesla M2050 GPUs

Slide courtesy of Bill Dally (NVIDIA)

1 Petaflop/s = ~1M high-end laptops = ~world population with hand calculators 24/7/365 for ~16 years

What $100+ million can buy you...

Roadrunner (#7) Jaguar (#2)


Who uses HPC?



Cloud Computing?

Buzzword?

Careless Computing?

...

Response from the legend:

Cloud Utility Computing? (for CS264)

http://aws.amazon.com/ec2/

Web Data Explosion

How much Data?

• Google processes 24 PB / day, 8 EB / year (’10)

• Wayback Machine has 3 PB, 100 TB/month (’09)

• Facebook user data: 2.5 PB, 15 TB/day (’09)

• Facebook photos: 15 B, 3 TB/day (’09) - 90 B (now)

• eBay user data: 6.5 PB, 50 TB/day (’09)

• “All words ever spoken by human beings”: ~42 ZB

Adapted from http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/

“640K ought to be enough for anybody.” - attributed to Bill Gates (1981), but just a rumor

Disk Throughput

• Average Google job size: 180 GB

• 1 SATA HDD = 75 MB / sec

• Time to read 180 GB off disk: ~40-45 mins (180,000 MB ÷ 75 MB/s = 2,400 s)

• Solution: parallel reads

• 1000 HDDs = 75 GB / sec

• Google’s solutions: BigTable, MapReduce, etc.

• Clear trend: centralization of computing resources in large data centers

• Q: What do Oregon, Iceland, and abandoned mines have in common?

• A: Fiber, juice, and space

• Utility computing!
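The disk-throughput arithmetic above can be checked in a few lines of Python. This is a sketch that assumes decimal units (1 GB = 1000 MB) and data striped perfectly evenly across the disks, with no seek, network, or coordination overhead, so real parallel reads will be somewhat slower.

```python
# Back-of-the-envelope disk read times using the slide's numbers.
# Assumption: decimal units (1 GB = 1000 MB) and perfect striping.
JOB_SIZE_MB = 180 * 1000      # average Google job size: 180 GB
DISK_MB_PER_S = 75            # one SATA HDD streams ~75 MB/s


def read_time_seconds(size_mb: float, n_disks: int = 1) -> float:
    """Time to read size_mb when it is spread evenly over n_disks disks,
    each streaming at DISK_MB_PER_S (ignores seeks and network)."""
    return size_mb / (DISK_MB_PER_S * n_disks)


single = read_time_seconds(JOB_SIZE_MB)
parallel = read_time_seconds(JOB_SIZE_MB, n_disks=1000)
print(f"1 disk:     {single:.0f} s = {single / 60:.0f} min")
print(f"1000 disks: {parallel:.1f} s")
```

With 1000 disks the aggregate bandwidth is 75 GB/s, so the same 180 GB job takes about 2.4 seconds instead of about 40 minutes, which is the whole argument for parallel reads.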

Cloud Computing


Instrument Data Explosion

Sloan Digital Sky Survey

ATLUM / Connectome Project

Another example? Hint: Switzerland

CERN in 2005....

CERN Summer School 2005


bad taste party...

pitchers...


LHC

Maximilien Brice, © CERN


~5000 nodes (’05)

CERN’s Cluster


presentations...

Diesel Powered HPC

Life Support...

Slide courtesy of Hanspeter Pfister

Murchison Widefield Array

How much Data?

• NOAA has ~1 PB climate data (’07)

• MWA radio telescope: 8 GB/sec of data

• Connectome: 1 PB / mm³ of brain tissue (1 EB for 1 cm³)

• CERN’s LHC will generate 15 PB a year (’08)

High Flops / Watt


Computer Games

• PC gaming business:

• $15B / year market (2010)

• $22B / year in 2015 ?

• WoW (World of Warcraft): $1B / year

• NVIDIA has shipped 1B GPUs since 1993:

• 10 years to ship 200M GPUs (1993-2003)

• 1/3 of all PCs have more than one GPU

• High-end GPUs sell for around $300

• Now used for science applications

Intel Core i7-980X Extreme: 6 cores, 1.17B transistors

NVIDIA GTX 580 SC: 512 cores, 3B transistors

Many-Core Processors

http://en.wikipedia.org/wiki/Transistor_count

Data Throughput

[Figure: CPU = instruction-level parallelism, data fits in cache; GPU = massive data parallelism, huge data. Credit: David Kirk, NVIDIA]

3 of the Top 5 Supercomputers


Bill Dally, NVIDIA

Personal Supercomputers

~4 Teraflops @ 1500 Watts

Disruptive Technologies

• Utility computing

• Commodity off-the-shelf (COTS) hardware

• Compute servers with 100s-1000s of processors

• High-throughput computing

• Mass-market hardware

• Many-core processors with 100s-1000s of cores

• High compute density / high flops/W

Green HPC

NVIDIA/NCSA Green 500 Entry


128 nodes, each with:

• 1x Core i3 530 (2 cores, 2.93 GHz => 23.4 GFLOP/s peak)

• 1x Tesla C2050 (14 multiprocessors, 1.15 GHz => 515.2 GFLOP/s peak)

• 4x QDR InfiniBand

• 4 GB DRAM

Theoretical peak performance: 68.95 TF

Footprint: ~20 ft² => 3.45 TF/ft²

Cost: $500K (street price) => 137.9 MF/$

Linpack: 33.62 TF at 36.0 kW => 934 MF/W
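The Green 500 figures follow directly from the per-node specs, and a short Python sketch reproduces them. It takes the Tesla C2050 peak as quoted on the slide and assumes the Core i3 530 peak comes from 4 double-precision flops per cycle per core, which is what yields the slide's 23.4 GFLOP/s. The Linpack-efficiency line is a derived figure, not one the slide states.

```python
# Reproduce the Green 500 entry's figures from the per-node specs.
NODES = 128
# Assumption: 2 cores x 2.93 GHz x 4 DP flops/cycle = 23.44 GFLOP/s.
cpu_peak_gf = 2 * 2.93 * 4
gpu_peak_gf = 515.2          # Tesla C2050 peak as quoted on the slide

peak_tf = NODES * (cpu_peak_gf + gpu_peak_gf) / 1000
footprint_ft2 = 20
cost_usd = 500_000
linpack_tf = 33.62
power_kw = 36.0

tf_per_ft2 = peak_tf / footprint_ft2
mf_per_dollar = peak_tf * 1e6 / cost_usd            # 1 TF = 1e6 MF
mf_per_watt = linpack_tf * 1e6 / (power_kw * 1e3)   # Linpack MF per watt
linpack_efficiency = linpack_tf / peak_tf           # sustained / peak

print(f"peak {peak_tf:.2f} TF, {tf_per_ft2:.2f} TF/ft^2, "
      f"{mf_per_dollar:.1f} MF/$, {mf_per_watt:.0f} MF/W, "
      f"Linpack efficiency {linpack_efficiency:.0%}")
```

Note that MF/$ uses the theoretical peak while MF/W uses the measured Linpack result, so the two ratios are not directly comparable; Linpack sustains roughly half of peak on this machine.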

One more thing...


Massively Parallel Human Computing ???

• “Crowdsourcing”

• Amazon Mechanical Turk (artificial artificial intelligence)

• Wikipedia

• Stack Overflow

• etc.

What is this course about?

Massively parallel processors

• GPU computing with CUDA

Cloud computing

• Amazon’s EC2 as an example of utility computing

• MapReduce, the “back-end” of cloud computing

Less like Rodin...

More like Bob...

Outline

wikipedia.org

Anant Agarwal, MIT

Power Cost

• Power ∝ Voltage² × Frequency

• Frequency ∝ Voltage

• Power ∝ Frequency³

Jack Dongarra

Power Cost

(all values relative to the single-core CPU)

Design      Cores  Freq  Perf  Power  Perf/W
CPU         1      1     1     1      1
“New” CPU   1      1.5   1.5   3.3    0.45x
Multicore   2      0.75  1.5   0.8    1.88x

Jack Dongarra
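The table follows from the cube law above, and a small Python sketch makes the arithmetic explicit. It assumes, as the table implicitly does, that performance scales linearly with cores × frequency (perfectly parallel work) and that each core's power scales with frequency cubed. Computed exactly, the faster single core costs 1.5³ = 3.375x the power (0.44x perf/W), and two cores at 0.75x frequency draw 2 × 0.75³ ≈ 0.84x the power (≈1.78x perf/W); the table's 3.3, 0.45x, 0.8 and 1.88x appear to be rounded versions of these (1.88 = 1.5 / 0.8).

```python
# Relative performance and power under the model
#   P ∝ V^2 * f  and  f ∝ V   =>   P ∝ f^3 per core.
# Assumption: performance = cores * frequency (perfectly parallel work).


def relative(cores: int, freq: float) -> tuple[float, float, float]:
    """Return (performance, power, performance per watt), all relative
    to a single core running at frequency 1."""
    perf = cores * freq          # idealized linear scaling
    power = cores * freq ** 3    # each core draws power proportional to f^3
    return perf, power, perf / power


designs = [("CPU", 1, 1.0), ('"New" CPU', 1, 1.5), ("Multicore", 2, 0.75)]
for name, cores, freq in designs:
    perf, power, ppw = relative(cores, freq)
    print(f"{name:10s} perf={perf:.2f} power={power:.2f} perf/W={ppw:.2f}x")
```

The design choice the table illustrates: because power grows with the cube of frequency, trading clock speed for more cores buys the same performance for far less power, provided the software can actually use the extra cores.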

Anant Agarwal, MIT

Problem with Buses

Problem with Memory

http://www.OpenSparc.net/

Problem with Disks

Tom’s Hardware

64 MB / sec

Good News

• Moore’s Law marches on

• Chip real estate is essentially free

• Many-core architectures are commodities

• Space for new innovations

Bad News

• Power limits improvements in clock speed

• Parallelism is the only route to improve performance

• Computation / communication ratio will get worse

• More frequent hardware failures?

Bad News

A “Simple” Matter of Software

• We have to use all the cores efficiently

• Careful data and memory management

• Must rethink software design

• Must rethink algorithms

• Must learn new skills!

• Must learn new strategies!

• Must learn new tools...

Our mantra: always use the right tool!

Outline

Instructor: Nicolas Pinto

• Business card (joke about the PhD now)

• I’m like you guys

• Not an expert

• We are all here to learn from each other

• Recent graduate

• This class is a collaborative event

The Rowland Institute at Harvard, Harvard University

~50% of is for vision!

Everyone knows that...

The Approach: Reverse and Forward Engineering the Brain

FORWARD: Build Artificial System

REVERSE: Study Natural System

brain = 20 petaflops!

http://vimeo.com/7945275

“If you want to have good ideas you must have many ideas. Most of them will be wrong, and what you have to learn is which ones to throw away.” - Linus Pauling (double Nobel Prize winner)

High-throughput Screening

thousands of big models

The curse of speed... and the blessing of massively parallel computing

large amounts of unsupervised learning experience

No off-the-shelf solution? DIY!

Engineering (Hardware/SysAdmin/Software) Science

Leverage non-scientific high-tech markets and their $billions of R&D...

Gaming: Graphics Cards (GPUs), PlayStation 3

Web 2.0: Cloud Computing (Amazon, Google)

Build your own!

DIY GPU pr0n (since 2006); Sony PlayStation 3s (since 2007)

The blessing of GPUs

Speed (in billion floating point operations per second, GFLOP/s):

• Q9450 (Matlab/C) [2008]: 0.3

• Q9450 (C/SSE) [2008]: 9.0

• 7900GTX (OpenGL/Cg) [2006]: 68.2

• PS3/Cell (C/ASM) [2007]: 111.4

• 8800GTX (CUDA1.x) [2007]: 192.7

• GTX280 (CUDA2.x) [2008]: 339.3

• GTX480 (CUDA3.x, Fermi) [2010]: 974.3

>1000X speedup is game changing...

Pinto, Doukhan, DiCarlo, Cox PLoS 2009

Pinto, Cox GPU Comp. Gems 2011
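To see where “>1000X” comes from, divide each throughput by the slowest entry. This Python sketch simply reuses the GFLOP/s values from the chart; the speedups are only as comparable as the underlying benchmark runs, which differ in hardware generation and programming model.

```python
# Throughput (GFLOP/s) per implementation, as listed in the chart.
results = {
    "Q9450 (Matlab/C) [2008]": 0.3,
    "Q9450 (C/SSE) [2008]": 9.0,
    "7900GTX (OpenGL/Cg) [2006]": 68.2,
    "PS3/Cell (C/ASM) [2007]": 111.4,
    "8800GTX (CUDA1.x) [2007]": 192.7,
    "GTX280 (CUDA2.x) [2008]": 339.3,
    "GTX480 (CUDA3.x) [2010]": 974.3,
}

# Use the slowest entry (Matlab/C on the CPU) as the 1x reference.
baseline = results["Q9450 (Matlab/C) [2008]"]
for name, gflops in results.items():
    speedup = gflops / baseline
    print(f"{name:28s} {gflops:7.1f} GFLOP/s {speedup:8.1f}x")
```

The GTX480 entry comes out at roughly 3,250x the Matlab/C baseline, and even the hand-tuned C/SSE CPU code is about 30x faster than Matlab/C.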

Tired Of Waiting For Your Computations?

6.963 (IAP09)

Supercomputing on your desktop: Programming the next generation of cheap and massively parallel hardware using CUDA

This IAP has been designed to give students extensive hands-on experience with a new, potentially disruptive technology. This technology gives the masses access to supercomputing capabilities.

We will introduce students to the CUDA programming language developed by NVIDIA Corp., which has been an essential step towards simplifying and unifying the programming of massively parallel chips.

This IAP is supported by generous contributions from NVIDIA Corp., The Rowland Institute at Harvard, and MIT (OEIT, BCS, EECS) and will feature talks given by experts from various fields.

Co-Instructor: Hanspeter Pfister

Visual Computing

• Large image & video collections

• Physically-based modeling

• Face modeling and recognition

• Visualization

VolumePro 500

Released 1999

GPGPU

Connectome

NSF CDI Grant ’08-’11

NVIDIA CUDA Center of Excellence

TFs

• Claudio Andreoni (MIT Course 18)

• Dwight Bell (Harvard DCE)

• Krunal Patel (Accelereyes)

• Jud Porter (Harvard SEAS)

• Justin Riley (MIT OEIT)

• Mike Roberts (Harvard SEAS)


About You

About you...

• Undergraduate ? Graduate ?

• Programming ? >5 years ? <2 years ?

• CUDA ? MPI ? MapReduce ?

• CS ? Life Sc ? Applied Sc ? Engineering ? Math ? Physics ?

• Humanities ? Social Sc ? Economy ?

Outline

CS 264 Goals

• Have fun!

• Learn basic principles of parallel computing

• Learn programming with CUDA

• Learn to program a cluster of GPUs (e.g. MPI)

• Learn basics of EC2 and MapReduce

• Learn new learning strategies, tools, etc.

• Implement a final project

Experimental Learning Strategy

Memory “recall”

Repeat, repeat, repeat

Lectures

• Theory, Architecture, Patterns?

• Act I: GPU Computing

• Act II: Cloud Computing

• Act III: Guest Lectures

Lectures “Format”

• 2x ~ 45min regular “lectures”

• ~15 min “Clinic”: we’ll be here to fix your problems

• ~5 min: Life and Code “Hacking”:

• GTD Zen

• Presentation Zen

• Ninja Programming Tricks & Tools, etc.

• Interested? email staff+spotlight@cs264.org

Act I: GPU Computing

• Introduction to GPU Computing

• CUDA Basics

• CUDA Advanced

• CUDA Ninja Tricks !

Performance / Effort

Performance (GFLOP/s) and development time (hours):

• Matlab: 0.3 GFLOP/s, 0.5 hours

• C/SSE: 9.0 GFLOP/s, 10.0 hours

• PS3: 111.4 GFLOP/s, 30.0 hours

• GT200: 339.3 GFLOP/s, 10.0 hours

3D Filterbank Convolution
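One way to read the performance/effort chart is throughput gained per hour of development. This Python sketch uses the chart's values; note that the development times are paired with implementations on the assumption that the flattened chart lists them in the same reversed order as the performance values, so treat the pairing as a reconstruction.

```python
# (implementation, performance in GFLOP/s, development time in hours),
# as read off the 3D filterbank convolution chart.
implementations = [
    ("Matlab", 0.3, 0.5),
    ("C/SSE", 9.0, 10.0),
    ("PS3", 111.4, 30.0),
    ("GT200", 339.3, 10.0),
]

# Throughput obtained per hour of developer effort.
gflops_per_hour = {name: gflops / hours for name, gflops, hours in implementations}

for name, gflops, hours in implementations:
    print(f"{name:7s} {gflops:6.1f} GFLOP/s in {hours:4.1f} h "
          f"-> {gflops_per_hour[name]:6.2f} GFLOP/s per dev-hour")

best = max(gflops_per_hour, key=gflops_per_hour.get)
print("best performance/effort:", best)
```

Under this pairing the GPU version is both the fastest and the best return on effort: the same 10 hours that produce 9 GFLOP/s of hand-tuned SSE code produce about 339 GFLOP/s on the GT200.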

Empirical results...

Performance (GFLOP/s): same chart as in “The blessing of GPUs”, from 0.3 for the Q9450 (Matlab/C) up to 974.3 for the GTX480 (CUDA3.x); here the 7900GTX entry is labeled “Cg”.

>1000X speedup is game changing...

Act II: Cloud Computing

• Introduction to utility computing

• EC2 & StarCluster (Justin Riley, MIT OEIT)

• Hadoop (Zak Stone, SEAS)

• MapReduce with GPU Jobs on EC2

Amazon Web Services

• Elastic Compute Cloud (EC2)

• Rent computing resources by the hour

• Basic unit of accounting = instance-hour

• Additional costs for bandwidth

• You’ll be getting free AWS credits for course assignments
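The instance-hour accounting model can be sketched as a tiny cost function. This is a hypothetical illustration, not Amazon's pricing: the function name and all rates are made-up placeholders, and it assumes each started hour is billed as a full instance-hour and that bandwidth is charged per GB transferred out. Check the current EC2 pricing pages for the real model.

```python
import math


def ec2_cost(instances: int, hours: float, rate_per_instance_hour: float,
             gb_out: float = 0.0, rate_per_gb: float = 0.0) -> float:
    """Hypothetical bill for a cluster of identical instances.

    Assumptions: every started hour counts as a full instance-hour, and
    outbound bandwidth is billed separately per GB. Rates are placeholders.
    """
    instance_hours = instances * math.ceil(hours)   # unit of accounting
    compute = instance_hours * rate_per_instance_hour
    bandwidth = gb_out * rate_per_gb                # additional bandwidth cost
    return compute + bandwidth


# 20 instances for 2.5 hours (billed as 3 hours each) at a made-up
# $0.10/instance-hour, plus 50 GB out at a made-up $0.15/GB.
print(f"${ec2_cost(20, 2.5, 0.10, gb_out=50, rate_per_gb=0.15):.2f}")
```

Because the unit is the instance-hour, 20 instances for 3 hours cost the same as 60 instances for 1 hour, which is what makes short, wide bursts of parallel work attractive on EC2.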

MapReduce

• Functional programming meets distributed processing

• Processing of lists with <key, value> pairs

• Batch data processing infrastructure

• Move the computation where the data is
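The MapReduce programming model can be illustrated with a single-process Python sketch of the classic word-count example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce`) are illustrative and are not Hadoop's API; a real framework runs many map and reduce tasks on different machines and, to move the computation where the data is, schedules map tasks on the nodes that already store their input.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit a <word, 1> pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single <key, value>."""
    return key, sum(values)


def mapreduce(docs: dict[str, str]) -> dict[str, int]:
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = mapreduce({"d1": "the cat sat", "d2": "the dog sat down"})
print(counts)
```

The functional style is what makes distribution possible: map calls are independent of each other and reduce calls only see one key's values, so both phases can be spread across a cluster without shared state.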

Act III: Guest Lectures

• Andreas Klöckner (NYU): OpenCL & PyOpenCL

• John Owens (UC Davis): fundamental algorithms/data structures and irregular parallelism

• Nathan Bell (NVIDIA): Thrust

• Duane Merrill* (Virginia Tech): Ninja Tricks

• Mike Bauer* (Stanford): Sequoia

• Greg Diamos (Georgia Tech): Ocelot

• Other lecturers* from Google, Yahoo, Sun, Intel, NCSA, AMD, Cloudera, etc.

Labs

• Led by TF(s)

• Work on an interesting small problem

• From skeleton code to solution

• Hands-on

53 Church St.

53 Church St., Room 104

Thu, Fri 7:35-9:35 pm

53 Church St., Room 105

• Mac Pro

• NVIDIA Quadro FX 4800, 1.5 GB

What do you need to know?

• Programming (ideally in C / C++)

• See HW 0

• Basics of computer systems

• CS 61 or similar

Homeworks

• Programming assignments

• “Issue Spotter” (code debug & review, Q&A)

• Contribution to the community (OSS, Wikipedia, Stack Overflow, etc.)

• Due: Fridays at 11 pm EST

• Hard deadline - 2 “bonus” days

Office Hours

• Led by a TF

• 104 @ 53 Church St (check website and news feed)

Participation

• HW0 (this week)

• Mandatory attendance for guest lectures

• forum.cs264.org

• Answer questions, help others

• Post relevant links and discussions (!)

Final Project

• Implement a substantial project

• Pick from a list of suggested projects or design your own

• Milestones along the way (idea, proposal, etc.)

• In-class final presentations

• $500+ prize for the best project

Grading

• On a 0-100 scale

• Participation: 10%

• Homework: 50%

• Final project: 40%
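The grading breakdown amounts to a weighted average. This is a minimal sketch that assumes each component is itself scored on the 0-100 scale and the components are combined linearly (the slide gives only the weights); `final_grade` and the sample scores are hypothetical.

```python
# Weights from the grading breakdown; they sum to 1.0 (100%).
WEIGHTS = {"participation": 0.10, "homework": 0.50, "final_project": 0.40}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def final_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on 0-100."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Hypothetical student: 90 participation, 80 homework, 95 final project.
print(final_grade({"participation": 90, "homework": 80, "final_project": 95}))
```

With these weights homework carries the most influence: a 10-point change in the homework average moves the final grade by 5 points, versus 4 for the project and 1 for participation.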

www.cs264.org

• Detailed schedule (soon)

• News blog w/ RSS feed

• Video feeds

• Forum (forum.cs264.org)

• Academic honesty policy

• HW0 (due Fri 2/4)

iPhD

Thank you!

iPhD: one more thing... from WikiLeaks?

Is this course for me ???

This course is not for you...

• If you’re not genuinely interested in the topic

• If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software

• If you’re not ready to do a lot of programming

• If you’re not open to thinking about computing in new ways

• If you can’t put in the time

Slide after Jimmy Lin, iSchool, Maryland

Otherwise... it will be a richly rewarding experience!

Guaranteed?!

http://davidzinger.wordpress.com/2007/05/page/2/

Be Patient

Be Flexible

Be Constructive

It would be a win-win-win situation!

(The Office Season 2, Episode 27: Conflict Resolution)

Hypergrowth ?

Acknowledgements

• Hanspeter Pfister & Henry Leitner, DCE

• TFs

• Rob Parrott & IT Team, SEAS

• Gabe Russell & Video Team, DCE

• NVIDIA, esp. David Luebke

• Amazon

COME

Next?

• Fill out the survey: http://bit.ly/enrb1r

• Get ready for HW0 (Lab 1 & 2)

• Subscribe to http://forum.cs264.org

• Subscribe to RSS feed: http://bit.ly/eFIsqR