Date post: 17-Jan-2016
Upload: georgia-nelson
Stream Programming: Luring Programmers into the Multicore Era Bill Thies Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Spring 2008
Transcript
Page 1: Stream Programming: Luring Programmers into the Multicore Era Bill Thies Computer Science and Artificial Intelligence Laboratory Massachusetts Institute.

Stream Programming: Luring Programmers into the Multicore Era

Bill Thies

Computer Science and Artificial Intelligence Laboratory

Massachusetts Institute of Technology

Spring 2008

Page 2

Multicores are Here

[Figure: number of cores per chip (1 to 512, log scale) versus year (1970 to 20??). Processors shown: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon, Power4, Opteron, Opteron 4P, Xeon MP, Power6, PA-8800, Yonah, PExtreme, Niagara, Cell, Xbox 360, Tanglewood, Intel Tflops, Raw, Cavium Octeon, Raza XLR, Cisco CSR-1, Picochip PC102, Broadcom 1480, Ambric AM2045, Tilera.]

Page 3

Multicores are Here

[Figure: the same processor timeline as on page 2.]

Hardware was responsible for improving performance

Page 4

Multicores are Here

[Figure: the same processor timeline as on page 2.]

Now, the performance burden falls on programmers

Page 5

Is Parallel Programming a New Problem?

• No! Decades of research have targeted multiprocessors
  – Languages, compilers, architectures, tools…

• What is different today?
  1. Multicores vs. multiprocessors. Multicores have:
     - New interconnects with non-uniform communication costs
     - Faster on-chip communication than off-chip I/O, memory ops
     - Limited per-core memory availability
  2. Non-expert programmers
     - Supercomputers with >2048 processors today: 100 [top500.org]
     - Machines with >2048 cores in 2020: >100 million [ITU, Moore]
  3. Application trends
     - Embedded: 2.7 billion cell phones vs. 850 million PCs [ITU 2006]
     - Data-centric: YouTube streams 200 TB of video daily

Page 6

Streaming Application Domain

• For programs based on streams of data
  – Audio, video, DSP, networking, and cryptographic processing kernels
  – Examples: HDTV editing, radar tracking, microphone arrays, cell phone base stations, graphics

[Figure: FM radio stream graph. AtoD → FMDemod → Duplicate splitter → three parallel branches (LPF1 → HPF1, LPF2 → HPF2, LPF3 → HPF3) → RoundRobin joiner → Adder → Speaker.]

Page 7

Streaming Application Domain

• For programs based on streams of data
  – Audio, video, DSP, networking, and cryptographic processing kernels
  – Examples: HDTV editing, radar tracking, microphone arrays, cell phone base stations, graphics

• Properties of stream programs
  – Regular and repeating computation
  – Independent filters with explicit communication
  – Data items have short lifetimes

[Figure: the FM radio stream graph shown on page 6.]

Page 8

Brief History of Streaming

[Figure: timeline from 1960 to 2000 in three tracks: Models of Computation, Languages / Compilers, and Modeling Environments. Entries include Petri Nets, Computation Graphs, Kahn Process Networks, Communicating Sequential Processes, Synchronous Dataflow, Sisal, Occam, Lucid, Id, VAL, lazy, LUSTRE, Esterel, C, Erlang, pH, Gabriel, Ptolemy, Grape-II, Matlab/Simulink, etc.]

Page 9

Brief History of Streaming

Weaknesses
• Unsuitable for static analysis
• Cannot leverage deep results from DSP / modeling community

Strengths
• Elegance
• Generality

[Figure: the same streaming timeline as on page 8.]

Page 10

Brief History of Streaming

Weaknesses
• Unsuitable for static analysis
• Cannot leverage deep results from DSP / modeling community

Strengths
• Elegance
• Generality

[Figure: the same streaming timeline as on page 8, extended with a new group labeled “Stream Programming”: StreamIt, Cg, StreamC, and Brook.]

Page 11

StreamIt: A Language and Compiler for Stream Programs

• Key idea: design a language that enables static analysis

• Goals:
  1. Expose and exploit the parallelism in stream programs
  2. Improve programmer productivity in the streaming domain

• Project contributions:
  – Language design for streaming [CC'02, CAN'02, PPoPP'05, IJPP'05]
  – Automatic parallelization [ASPLOS'02, G.Hardware'05, ASPLOS'06]
  – Domain-specific optimizations [PLDI'03, CASES'05, TechRep'07]
  – Cache-aware scheduling [LCTES'03, LCTES'05]
  – Extracting streams from legacy code [MICRO'07]
  – User + application studies [PLDI'05, P-PHEC'05, IPDPS'06]
  – 7 years, 25 people, 300 KLOC
  – 700 external downloads, 5 external publications

Page 12

StreamIt: A Language and Compiler for Stream Programs

• Key idea: design a language that enables static analysis

• Goals:
  1. Expose and exploit the parallelism in stream programs
  2. Improve programmer productivity in the streaming domain

• I contributed to:
  – Language design for streaming [CC'02, CAN'02, PPoPP'05, IJPP'05]
  – Automatic parallelization [ASPLOS'02, G.Hardware'05, ASPLOS'06]
  – Domain-specific optimizations [PLDI'03, CASES'05, TechRep'07]
  – Cache-aware scheduling [LCTES'03, LCTES'05]
  – Extracting streams from legacy code [MICRO'07]
  – User + application studies [PLDI'05, P-PHEC'05, IPDPS'06]
  – 7 years, 25 people, 300 KLOC
  – 700 external downloads, 5 external publications

Page 13

StreamIt: A Language and Compiler for Stream Programs

• Key idea: design a language that enables static analysis

• Goals:
  1. Expose and exploit the parallelism in stream programs
  2. Improve programmer productivity in the streaming domain

• This talk:
  – Language design for streaming [CC'02, CAN'02, PPoPP'05, IJPP'05]
  – Automatic parallelization [ASPLOS'02, G.Hardware'05, ASPLOS'06]
  – Domain-specific optimizations [PLDI'03, CASES'05, TechRep'07]
  – Cache-aware scheduling [LCTES'03, LCTES'05]
  – Extracting streams from legacy code [MICRO'07]
  – User + application studies [PLDI'05, P-PHEC'05, IPDPS'06]
  – 7 years, 25 people, 300 KLOC
  – 700 external downloads, 5 external publications

Page 14

Part 1: Language Design

Joint work with Michael Gordon

William Thies, Michal Karczmarek, Saman Amarasinghe (CC’02)

William Thies, Michal Karczmarek, Janis Sermulins, Rodric Rabbah, Saman Amarasinghe (PPoPP’05)

Page 15

StreamIt Language Basics

• High-level, architecture-independent language
  – Backend support for uniprocessors, multicores (Raw, SMP), clusters of workstations

• Model of computation: synchronous dataflow [Lee & Messerschmitt, 1987]
  – Program is a graph of independent filters
  – Filters have an atomic execution step with known input / output rates
  – Compiler is responsible for scheduling and buffer management

• Extensions to synchronous dataflow
  – Dynamic I/O rates
  – Support for sliding window operations
  – Teleport messaging [PPoPP’05]

[Figure: synchronous dataflow example Input → Decimate → Output. Input pushes 1 item per firing, Decimate pops 10 and pushes 1, and Output pops 1. In the steady-state schedule, Input fires ×10, Decimate ×1, and Output ×1.]
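The steady-state schedule in the Decimate example comes from the synchronous dataflow balance equations. Under a steady-state schedule, every channel must end with the same number of buffered items it started with. The sketch below is a minimal Python illustration for a straight pipeline without peeking. The function name `steady_state_pipeline` and the `(pop, push)` rate encoding are hypothetical, chosen for this example, and are not StreamIt's API.

```python
from fractions import Fraction
from math import gcd, lcm
from typing import List, Tuple


def steady_state_pipeline(rates: List[Tuple[int, int]]) -> List[int]:
    """Smallest repetition counts for a pipeline under synchronous dataflow.

    rates[i] = (pop, push) for stage i (hypothetical encoding). For each
    channel between stage i and stage i+1 the balance equation
    reps[i] * push_i == reps[i+1] * pop_(i+1) must hold, so one steady-state
    schedule leaves every buffer unchanged.
    """
    reps = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        if pop == 0:
            raise ValueError("only the first stage may be a source (pop 0)")
        reps.append(reps[-1] * push / pop)
    # Scale the rational solution to the smallest all-integer one.
    scale = lcm(*(r.denominator for r in reps))
    counts = [int(r * scale) for r in reps]
    g = gcd(*counts)
    return [c // g for c in counts]


# Input (push 1) -> Decimate (pop 10, push 1) -> Output (pop 1)
print(steady_state_pipeline([(0, 1), (10, 1), (1, 0)]))  # [10, 1, 1]
```

Splitjoins and feedback loops need the same balance equation on every channel, plus a consistency check that a solution exists. This sketch omits both.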

Page 16

Representing Streams

• Conventional wisdom: stream programs are graphs
  – Graphs have no simple textual representation
  – Graphs are difficult to analyze and optimize

• Insight: stream programs have structure

[Figure: structured vs. unstructured stream graphs.]

Page 17

Structured Streams

[Figure: the four structured stream constructs: filter; pipeline; splitjoin (splitter, parallel streams, joiner); and feedback loop (joiner, body, splitter, with a loop-back path). Each component stream may be any StreamIt language construct.]

• Each structure is single-input, single-output
• Hierarchical and composable
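Single-input, single-output structures compose hierarchically. The minimal Python sketch below illustrates this under simplifying assumptions:

- every construct exposes one `run` method that maps an input list to an output list;
- filters never peek;
- the splitjoin always duplicates its input and joins round-robin, one item per branch;
- feedback loops are omitted.

The class names are hypothetical, and this batch-evaluation model is not how the StreamIt compiler schedules a program.

```python
from typing import Callable, List

Items = List[float]


class Filter:
    """Leaf stream: each firing pops `pop` items and pushes work(popped)."""

    def __init__(self, pop: int, work: Callable[[Items], Items]):
        self.pop, self.work = pop, work

    def run(self, items: Items) -> Items:
        out: Items = []
        # Fire while a full window is available; leftover items are dropped.
        for i in range(0, len(items) - self.pop + 1, self.pop):
            out.extend(self.work(items[i:i + self.pop]))
        return out


class Pipeline:
    """Children run in sequence: one input, one output."""

    def __init__(self, *children):
        self.children = children

    def run(self, items: Items) -> Items:
        for child in self.children:
            items = child.run(items)
        return items


class SplitJoin:
    """Duplicate splitter, parallel children, round-robin (1 item each) joiner."""

    def __init__(self, *children):
        self.children = children

    def run(self, items: Items) -> Items:
        outs = [child.run(list(items)) for child in self.children]
        n = min(len(o) for o in outs)
        return [o[i] for i in range(n) for o in outs]


# A splitjoin nested in a pipeline, shaped like the FM radio's
# Duplicate -> branches -> RoundRobin -> Adder section (stand-in filters).
graph = Pipeline(
    Filter(1, lambda w: [2 * w[0]]),
    SplitJoin(
        Filter(1, lambda w: [w[0] + 1]),
        Pipeline(Filter(1, lambda w: [w[0]]), Filter(1, lambda w: [-w[0]])),
    ),
    Filter(2, lambda w: [w[0] + w[1]]),  # adder: pops 2, pushes 1
)
print(graph.run([1.0, 2.0, 3.0]))  # [1.0, 1.0, 1.0]
```

A `SplitJoin` can contain a `Pipeline` and vice versa only because each construct has exactly one input and one output.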

Page 18

Radar-Array Front End

Page 19

Filterbank

Page 20

FFT

Page 21

Block Matrix Multiply

Page 22

MP3 Decoder

Page 23

Bitonic Sort

Page 24

FM Radio with Equalizer

Page 25

Ground Moving Target Indicator (GMTI)

99 filters

3566 filter instances

Page 26

Example Syntax: FMRadio

void->void pipeline FMRadio(int N, float lo, float hi) {
    add AtoD();
    add FMDemod();
    add splitjoin {
        split duplicate;
        for (int i=0; i<N; i++) {
            add pipeline {
                add LowPassFilter(lo + i*(hi - lo)/N);
                add HighPassFilter(lo + i*(hi - lo)/N);
            }
        }
        join roundrobin();
    }
    add Adder();
    add Speaker();
}

[Figure: the resulting stream graph for N = 3: AtoD → FMDemod → Duplicate → three branches (LPF1 → HPF1, LPF2 → HPF2, LPF3 → HPF3) → RoundRobin → Adder → Speaker.]

Page 27

StreamIt Application Suite

• Software radio
• Frequency hopping radio
• Acoustic beamformer
• Vocoder
• FFTs and DCTs
• JPEG Encoder/Decoder
• MPEG-2 Encoder/Decoder
• MPEG-4 (fragments)
• Sorting algorithms
• GMTI (Ground Moving Target Indicator)
• DES and Serpent crypto algorithms
• SSCA#3 (HPCS scalable benchmark for synthetic aperture radar)
• Mosaic imaging using RANSAC algorithm

Total size: 60,000 lines of code

Page 28

Control Messages

• Occasionally, low-bandwidth control messages are sent between actors
• These messages often demand precise timing
  – Communications: adjust protocol, amplification, compression
  – Network router: cancel invalid packet
  – Adaptive beamformer: track a target
  – Respond to user input, runtime errors
  – Frequency hopping radio
• Traditional techniques:
  – Direct method call (no timing guarantees)
  – Embed message in stream (opaque, slow)

[Figure: stream graph containing AtoD, Decode, a duplicate splitter, filters LPF1 to LPF3 and HPF1 to HPF3, a roundrobin joiner, Encode, and Transmit.]

Page 29

Idea 2: Teleport Messaging

• Looks like a method call, but is timed relative to data in the stream
  – Exposes dependences to the compiler
  – Simple and precise for the user
  – Adjustable latency
  – Can send upstream or downstream

void setProtocol(int p) {
    reconfig(p);
}

TargetFilter x;
if (newProtocol(p)) {
    x.setProtocol(p) @ 2;
}
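The key property of teleport messaging is that a message is timed by the data, not by wall-clock time. The minimal Python sketch below models a Sender → Target pipeline with 1:1 rates. It assumes that `@ 2` means "deliver just before the target processes the item produced two sender firings later". StreamIt's actual latency semantics are defined more generally and also cover non-unit rates. The function `teleport_pipeline` and the `(data, p)` input encoding are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def teleport_pipeline(
    inputs: List[Tuple[int, Optional[int]]], latency: int
) -> List[Tuple[int, int]]:
    """Sender -> Target pipeline with 1:1 rates (hypothetical model).

    inputs[n] = (data, p): on its n-th firing the sender forwards `data` and,
    if p is not None, sends setProtocol(p) with the given latency. The message
    is delivered just before the target processes the item from sender
    firing n + latency, so delivery is keyed to data, not to wall-clock time.
    """
    mailbox: Dict[int, List[int]] = defaultdict(list)
    stream: List[int] = []
    for n, (data, p) in enumerate(inputs):  # sender firings
        if p is not None:
            mailbox[n + latency].append(p)
        stream.append(data)

    protocol = 0  # target's initial configuration
    outputs: List[Tuple[int, int]] = []
    for m, data in enumerate(stream):  # target firings
        for p in mailbox.pop(m, []):
            protocol = p  # setProtocol(p) { reconfig(p); }
        outputs.append((data, protocol))
    return outputs


msgs = [(10, None), (11, 5), (12, None), (13, None), (14, None)]
print(teleport_pipeline(msgs, latency=2))
# [(10, 0), (11, 0), (12, 0), (13, 5), (14, 5)]
```

Delivery is keyed to item indices. As a result, any interleaving of sender and target firings that respects the data dependence produces the same output, which leaves the compiler free to parallelize.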

Page 30

Part 2: Automatic Parallelization

Joint work with Michael Gordon

Michael I. Gordon, William Thies, Saman Amarasinghe (ASPLOS’06)

Michael I. Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali S. Meli, Andrew A. Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman Amarasinghe (ASPLOS’02)

Page 31

Streaming is an Implicitly Parallel Model

• Programmer thinks about functionality, not parallelism

• More explicit models may…
  – Require knowledge of target [MPI] [cG]
  – Require parallelism annotations [OpenMP] [HPF] [Cilk] [Intel TBB]

• Novelty over other implicit models?
  [Erlang] [MapReduce] [Sequoia] [pH] [Occam] [Sisal] [Id] [VAL] [LUSTRE] [HAL] [THAL] [SALSA] [Rosette] [ABCL] [APL] [ZPL] [NESL] […]

Exploiting streaming structure for robust performance

Page 32

Parallelism in Stream Programs

Task parallelism
– Analogous to thread (fork/join) parallelism

Data parallelism
– Peel iterations of filter, place within scatter/gather pair (fission)
– Can't parallelize filters with state

Pipeline parallelism
– Between producers and consumers
– Stateful filters can be parallelized

[Figure: task parallelism across the branches between a Splitter and a Joiner.]

Page 33

Parallelism in Stream Programs

Task parallelism
– Analogous to thread (fork/join) parallelism

Data parallelism
– Analogous to DOALL loops

Pipeline parallelism
– Analogous to ILP that is exploited in hardware

[Figure: a stream graph annotated with the three forms of parallelism: task parallelism across the branches of a Splitter/Joiner, pipeline parallelism along a chain of filters, and data parallelism across copies of a stateless filter.]
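Fission is safe only for stateless filters. The minimal Python sketch below shows why. A stateless filter is split into `k` copies between a round-robin splitter and joiner, and its output is unchanged. Fissioning a stateful filter (a running sum) the same way gives the wrong answer, because each copy sees only its own lane. The function names are hypothetical. The lanes run sequentially here, though each could run on its own core.

```python
from typing import Callable, List


def fission(work: Callable[[float], float], items: List[float], k: int) -> List[float]:
    """Data-parallelize a 1-in/1-out filter into k copies between a
    round-robin splitter and a round-robin joiner."""
    lanes = [items[i::k] for i in range(k)]                     # splitter
    results = [[work(x) for x in lane] for lane in lanes]       # k copies
    return [results[j % k][j // k] for j in range(len(items))]  # joiner


def running_sum(items: List[float]) -> List[float]:
    """A stateful filter: each output depends on every earlier input."""
    total, out = 0.0, []
    for x in items:
        total += x
        out.append(total)
    return out


def naive_stateful_fission(items: List[float], k: int) -> List[float]:
    """Fissioning the stateful filter anyway: each copy sees only its lane."""
    results = [running_sum(items[i::k]) for i in range(k)]
    return [results[j % k][j // k] for j in range(len(items))]


data = [1.0, 2.0, 3.0, 4.0]
print(fission(lambda x: 2 * x, data, k=2))  # [2.0, 4.0, 6.0, 8.0]
print(running_sum(data))                    # [1.0, 3.0, 6.0, 10.0]
print(naive_stateful_fission(data, k=2))    # [1.0, 2.0, 4.0, 6.0] (wrong)
```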

Page 34

Baseline: Fine-Grained Data Parallelism

[Figure: a stream graph of Expand, Process, BandPass, Compress, BandStop, and Adder filters inside splitjoins. Every filter is fissioned into three data-parallel copies, and each group of copies has its own Splitter and Joiner.]

Page 35

Evaluation: Fine-Grained Data Parallelism

[Figure: bar chart of throughput normalized to single-core StreamIt (y-axis 0 to 18) for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2-subset, Vocoder, Radar, and the geometric mean. Series: Fine-Grained Data; Coarse-Grained Task + Data.]

Raw Microprocessor
• 16 in-order, single-issue cores with D$ and I$
• 16 memory banks, each bank with DMA
• Cycle-accurate simulator

Page 36

Evaluation: Fine-Grained Data Parallelism

[Figure: the same throughput chart as on page 35.]

Good Parallelism! Too Much Synchronization!

Page 37

Coarsening the Granularity

[Figure: a splitjoin of two branches, each containing Expand, BandStop, Process, BandPass, and Compress filters, followed by an Adder.]

Page 38

Coarsening the Granularity

[Figure: in each branch, Expand, Process, BandPass, and Compress are fused into a single filter; the BandStop filters remain separate, followed by the Adder.]

Page 39

Coarsening the Granularity

[Figure: each fused BandPass/Compress/Process/Expand filter is fissioned into two data-parallel copies between its own Splitter and Joiner; the BandStop filters and the Adder are unchanged.]

Page 40

Coarsening the Granularity

[Figure: the BandStop filters and the Adder are also data-parallelized: each BandStop sits between a Splitter and a Joiner, and the Adder is replicated into five copies between a Splitter and a Joiner.]
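Coarsening the granularity means fusing a chain of filters first and then fissioning the fused filter once. This needs one splitter/joiner pair instead of one per filter, with identical output. The minimal Python sketch below assumes every stand-in stage is a stateless filter with 1:1 rates. The lambdas are hypothetical placeholders for Expand, Process, BandPass, and Compress. The sketch does not model why BandStop is kept out of the fused chain in the figures.

```python
from typing import Callable, List, Tuple

Work = Callable[[float], float]


def fuse(*stages: Work) -> Work:
    """Fuse a pipeline of stateless 1-in/1-out filters into a single filter."""
    def fused(x: float) -> float:
        for stage in stages:
            x = stage(x)
        return x
    return fused


def fission(work: Work, items: List[float], k: int) -> List[float]:
    """k copies of a stateless filter between round-robin split and join."""
    lanes = [[work(x) for x in items[i::k]] for i in range(k)]
    return [lanes[j % k][j // k] for j in range(len(items))]


def fine_grained(stages: List[Work], items: List[float], k: int) -> Tuple[List[float], int]:
    """Fission every filter separately: one split/join pair per filter."""
    for stage in stages:
        items = fission(stage, items, k)
    return items, len(stages)


def coarse_grained(stages: List[Work], items: List[float], k: int) -> Tuple[List[float], int]:
    """Fuse first, then fission once: a single split/join pair."""
    return fission(fuse(*stages), items, k), 1


# Hypothetical stand-ins for the fused Expand/Process/BandPass/Compress chain.
stages = [lambda x: x + 1, lambda x: 3 * x, lambda x: x - 2, lambda x: x / 2]
data = [float(i) for i in range(8)]
fine, fine_pairs = fine_grained(stages, data, k=2)
coarse, coarse_pairs = coarse_grained(stages, data, k=2)
print(fine == coarse, fine_pairs, coarse_pairs)  # True 4 1
```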

Page 41

Evaluation: Coarse-Grained Data Parallelism

[Figure: the same throughput chart as on page 35 (Fine-Grained Data vs. Coarse-Grained Task + Data, normalized to single-core StreamIt).]

Good Parallelism! Low Synchronization!

Page 42

Simplified Vocoder

[Figure: Vocoder stream graph containing RectPolar, two AdaptDFT filters in a splitjoin, two Amplify → Diff → UnWrap → Accum pipelines in a splitjoin, and PolarRect. The graph is annotated with per-filter work estimates: 66, 20, and 2, 1, 1, 1 along each Amplify → Diff → UnWrap → Accum branch. RectPolar and PolarRect are marked data parallel.]

Target a 4-core machine

Page 43

Data Parallelize

[Figure: the Vocoder graph after data parallelization. RectPolar and PolarRect are each fissioned into four copies between a Splitter and a Joiner, which reduces the per-copy work from 20 to 5.]

Target a 4-core machine

Page 44

Data + Task Parallel Execution

[Figure: execution schedule of the data- and task-parallelized Vocoder on four cores (time vs. cores), annotated 21.]

Target a 4-core machine

Page 45

We Can Do Better

[Figure: the same four-core schedule (time vs. cores), now annotated 16.]

Target a 4-core machine

Page 46

Coarse-Grained Software Pipelining

[Figure: four RectPolar copies scheduled in a prologue, followed by a new steady state.]
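Software pipelining lets stages that depend on each other run at the same time, because they work on different iterations. The minimal Python sketch below uses the textbook model of a linear chain with one stage per core. A prologue fills the pipeline, and after that every stage is busy each step. The compiler's coarse-grained version applies this idea across the whole stream graph after fusion and fission and also packs filters onto cores. That is not modeled here, and the function names are hypothetical.

```python
from typing import Dict, List


def software_pipeline(num_stages: int, num_iters: int) -> List[Dict[int, int]]:
    """Schedule a chain of stages (one per core) with software pipelining.

    In step t, stage s works on iteration t - s. The first num_stages - 1
    steps are the prologue; afterwards every stage is busy each step until
    the pipeline drains. Stages in the same step touch different iterations,
    so they do not wait on each other within that step.
    """
    steps = []
    for t in range(num_iters + num_stages - 1):
        steps.append({s: t - s for s in range(num_stages) if 0 <= t - s < num_iters})
    return steps


def respects_dependences(steps: List[Dict[int, int]]) -> bool:
    """Stage s may process iteration i only after stage s-1 did, in an earlier step."""
    done = {}
    for t, step in enumerate(steps):
        for s, i in step.items():
            if s > 0 and done.get((s - 1, i), t) >= t:
                return False
        for s, i in step.items():
            done[(s, i)] = t
    return True


sched = software_pipeline(num_stages=3, num_iters=4)
for t, step in enumerate(sched):
    print(t, step)
print(len(sched), "steps vs", 3 * 4, "sequential;", respects_dependences(sched))
# 6 steps vs 12 sequential; True
```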

Page 47

Evaluation: Coarse-Grained Task + Data + Software Pipelining

[Figure: bar chart of throughput normalized to single-core StreamIt (y-axis 0 to 18) for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2-subset, Vocoder, Radar, and the geometric mean. Series: Fine-Grained Data; Coarse-Grained Task + Data; Coarse-Grained Task + Data + Software Pipeline.]

Page 48

Evaluation: Coarse-Grained Task + Data + Software Pipelining

[Figure: the same three-series throughput chart as on page 47.]

Best Parallelism! Lowest Synchronization!

Page 49

Parallelism: Take Away

• Stream programs have abundant parallelism
  – However, parallelism is obfuscated in languages like C

• Stream languages enable new and effective mappings
  – In C, analogous transformations are impossibly complex
  – In StreamC or Brook, similar transformations are possible [Khailany et al., IEEE Micro’01] [Buck et al., SIGGRAPH’04] [Das et al., PACT’06] […]

• Results should extend to other multicores
  – Parameters: local memory, communication-to-computation cost
  – Preliminary results on Cell are promising [Zhang, dasCMP’07]

[Figure: the three mapping steps: Coarsen Granularity, Data Parallelize, Software Pipeline.]

Page 50

Part 3: Domain-Specific Optimizations

Joint work with Andrew Lamb, Sitij Agrawal

Andrew Lamb, William Thies, Saman Amarasinghe (PLDI’03)

Sitij Agrawal, William Thies, Saman Amarasinghe (CASES’05)

Page 51

DSP Optimization Process

• Given a specification of an algorithm, minimize the computation cost

[Figure: the FM radio stream graph (AtoD, FMDemod, Duplicate, LPF1 to LPF3, HPF1 to HPF3, RoundRobin, Adder, Speaker) with a region marked Linear.]

Page 52

DSP Optimization Process

• Given a specification of an algorithm, minimize the computation cost

[Figure: the same FM radio graph, with the linear filter bank labeled Equalizer.]

Page 53

DSP Optimization Process

• Given a specification of an algorithm, minimize the computation cost

[Figure: the equalizer is now computed between an FFT and an IFFT: AtoD → FMDemod → FFT → Equalizer → IFFT → Speaker.]

Page 54

DSP Optimization Process

• Given a specification of an algorithm, minimize the computation cost
  – Currently done by hand (MATLAB)

• Can a compiler replace the DSP expert?
  – Library generators are limited [Spiral] [FFTW] [ATLAS]
  – Enable a unified development environment

[Figure: AtoD → FMDemod → FFT → Equalizer → IFFT → Speaker.]
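The FFT → Equalizer → IFFT form replaces time-domain convolution with pointwise multiplication in the frequency domain. The minimal Python sketch below checks that both routes give the same FIR output. It makes two simplifying assumptions:

- it convolves the whole signal at once, zero-padded to a power of two, rather than processing a stream in blocks (for example with overlap-add);
- it uses a textbook recursive FFT.

It does not show how the StreamIt compiler performs its frequency translation. The function names are hypothetical.

```python
import cmath
from typing import List


def fft(a: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With invert=True it computes the unscaled inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fir_direct(x: List[float], h: List[float]) -> List[float]:
    """Time domain: y[n] = sum_k h[k] * x[n-k] (full linear convolution)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):
                y[n] += hk * x[n - k]
    return y


def fir_frequency(x: List[float], h: List[float]) -> List[float]:
    """Frequency domain: zero-pad, FFT both, multiply pointwise, inverse FFT."""
    m = len(x) + len(h) - 1
    n = 1
    while n < m:
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [v.real / n for v in y[:m]]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [0.5, 0.25, 0.25]
print(fir_direct(x, h))  # [0.5, 1.25, 2.25, 3.25, 4.25, 2.25, 1.25]
print([round(v, 6) for v in fir_frequency(x, h)])
```

The direct form costs one multiplication per tap per output. The frequency route costs roughly a logarithmic number of operations per output, so it pays off for long filters and can add work for short ones.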

Page 55

Focus: Linear State Space Filters

• Properties:
  – Outputs are a linear function of inputs and states
  – New states are a linear function of inputs and states

• Most common target of DSP optimizations
  – FIR / IIR filters
  – Linear difference equations
  – Upsamplers / downsamplers
  – DCTs

[Figure: a filter with inputs u, states x, and outputs y:]

$$x' = Ax + Bu$$
$$y = Cx + Du$$
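The state-space equations can be executed directly, one firing per input vector. The minimal Python sketch below emits y = Cx + Du from the current state and then updates the state to x' = Ax + Bu. That ordering is an assumption consistent with the two equations. The sketch uses plain lists to stay dependency-free, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(a: Vector, b: Vector) -> Vector:
    return [p + q for p, q in zip(a, b)]


def run_state_space(A: Matrix, B: Matrix, C: Matrix, D: Matrix,
                    x0: Vector, inputs: List[Vector]) -> List[Vector]:
    """Fire a linear state-space filter once per input vector u:
    emit y = Cx + Du from the current state, then update x to x' = Ax + Bu."""
    x, outputs = list(x0), []
    for u in inputs:
        outputs.append(vadd(matvec(C, x), matvec(D, u)))
        x = vadd(matvec(A, x), matvec(B, u))
    return outputs


# IIR filter y[n] = u[n] + 0.5 * y[n-1], with state x = y[n-1]:
#   y  = 0.5x + u   (C = [[0.5]], D = [[1]])
#   x' = 0.5x + u   (A = [[0.5]], B = [[1]])
impulse = [[1.0], [0.0], [0.0], [0.0]]
print(run_state_space([[0.5]], [[1.0]], [[0.5]], [[1.0]], [0.0], impulse))
# [[1.0], [0.5], [0.25], [0.125]]
```

An FIR filter fits the same form. Its states hold past inputs, A shifts them, and C, D hold the tap weights.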

Page 56

Focus: Linear State Space Filters

[Figure: a filter with inputs u, states x, and outputs y:]

$$x' = Ax + Bu$$
$$y = Cx + Du$$

Page 57

Focus: Linear Filters

float->float filter Scale {
    work push 2 pop 1 {
        float u = pop();
        push(u);
        push(2*u);
    }
}

[Figure: linear dataflow analysis maps the filter's input u to its outputs y:]

$$y = Du$$

Page 58

Focus: Linear Filters

float->float filter Scale {
    work push 2 pop 1 {
        float u = pop();
        push(u);
        push(2*u);
    }
}

[Figure: linear dataflow analysis maps the input u to the outputs y1, y2:]

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} u$$
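For a filter already known to be linear with no constant term, the matrix D can be checked by probing the work function with unit inputs. Column j of D is the output produced by the j-th unit vector. The minimal Python sketch below applies this to a Python stand-in for `Scale`. This probing is not the compiler's linear dataflow analysis, which works symbolically over the filter code. It is only a way to confirm the resulting matrix. The names `scale_work` and `extract_matrix` are hypothetical.

```python
from typing import Callable, List


def scale_work(popped: List[float]) -> List[float]:
    """Python stand-in for the Scale filter: pop 1, push u and 2*u."""
    u = popped[0]
    return [u, 2 * u]


def extract_matrix(work: Callable[[List[float]], List[float]], pop: int) -> List[List[float]]:
    """Recover D (shape push x pop) for y = D u by probing with unit vectors.
    Only valid if `work` is known to be linear with no constant term."""
    columns = []
    for j in range(pop):
        e = [0.0] * pop
        e[j] = 1.0
        columns.append(work(e))
    push = len(columns[0])
    return [[columns[j][i] for j in range(pop)] for i in range(push)]


D = extract_matrix(scale_work, pop=1)
print(D)  # [[1.0], [2.0]]
```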

Page 59

Combining Adjacent Filters

[Figure: Filter 1 maps inputs u to y; Filter 2 maps y to z. The combined filter maps u directly to z.]

$$y = Du, \qquad z = Ey$$
$$z = EDu = Gu, \qquad G = ED$$

Page 60

Combination Example

[Figure: Filter 1 (u → y) and Filter 2 (y → z) have coefficient vectors (1, 2, 3) and (4, 5, 6). The combined filter (u → z) has the single coefficient G = [32], since 1·4 + 2·5 + 3·6 = 32. The two separate filters need 6 multiplications per output; the combined filter needs 1 multiplication per output.]
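The combination example can be checked numerically by multiplying the two coefficient matrices. The sketch assumes Filter 1 holds (1, 2, 3) and Filter 2 holds (4, 5, 6); swapping them gives the same product. It counts one multiplication per coefficient, which matches the 6 vs. 1 figures on the slide. The helper `matmul` is written out here rather than taken from a library.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


# Assumed assignment of the slide's coefficients (the product is the same
# either way): Filter 1 pops u and pushes (1u, 2u, 3u); Filter 2 pops those
# three values and pushes 4*y1 + 5*y2 + 6*y3.
D = [[1.0], [2.0], [3.0]]  # Filter 1: y = D u   (3 multiplications)
E = [[4.0, 5.0, 6.0]]      # Filter 2: z = E y   (3 multiplications)
G = matmul(E, D)           # Combined: z = G u   (1 multiplication)
print(G)  # [[32.0]]

# The combined filter produces the same output as the two-filter pipeline.
for u in [1.0, -2.0, 0.5]:
    assert matmul(E, matmul(D, [[u]])) == matmul(G, [[u]])
```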

Page 61

The General Case

• What if the matrix dimensions mismatch?

Matrix expansion: [Figure: the original matrix [D] is replicated along the diagonal ([D][D]…) into an expanded matrix, so that the upstream push rate equals the downstream pop rate before the matrices are multiplied.]
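Matrix expansion can be sketched for the simple case of filters that do not peek beyond what they pop. In that case, k consecutive firings of y = M u form a block-diagonal matrix. Both filters are expanded until the number of items crossing the channel matches (the lcm of the upstream push and downstream pop), and then the matrices are multiplied. The general technique must also handle peeking filters, whose expanded matrices contain overlapping, shifted copies; this sketch does not. The downstream "sum of three" filter and all function names are hypothetical.

```python
from math import lcm
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def expand(M: Matrix, k: int) -> Matrix:
    """Represent k consecutive firings of y = M u (shape push x pop) as one
    block-diagonal matrix. Valid only for filters that do not peek."""
    rows, cols = len(M), len(M[0])
    out = [[0.0] * (k * cols) for _ in range(k * rows)]
    for f in range(k):
        for i in range(rows):
            for j in range(cols):
                out[f * rows + i][f * cols + j] = M[i][j]
    return out


def combine_pipeline(D: Matrix, E: Matrix) -> Matrix:
    """Combine y = D u followed by z = E y when push(D) != pop(E)."""
    push1, pop2 = len(D), len(E[0])
    n = lcm(push1, pop2)  # items crossing the channel per combined firing
    return matmul(expand(E, n // pop2), expand(D, n // push1))


D = [[1.0], [2.0]]     # Scale: pop 1, push u and 2u
E = [[1.0, 1.0, 1.0]]  # hypothetical: pop 3, push their sum
G = combine_pipeline(D, E)
print(G)  # [[3.0, 1.0, 0.0], [0.0, 2.0, 3.0]]
```

Directly: inputs (u1, u2, u3) become (u1, 2u1, u2, 2u2, u3, 2u3), and summing groups of three gives (3u1 + u2, 2u2 + 3u3), which matches G.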


Page 63

The General Case

[Figure: combination rules for Pipelines and Feedback Loops.]

Page 64

The General Case

[Figure: combination rule for Splitjoins.]

Page 65

Floating-Point Operations Reduction

[Figure: bar chart of floating-point operations removed (%) per benchmark (y-axis -40% to 100%). Series: linear. One bar is annotated 0.3%.]

Page 66

Floating-Point Operations Reduction

[Figure: bar chart of floating-point operations removed (%) per benchmark (y-axis -40% to 100%). Series: linear, freq. Bars are annotated -140% and 0.3%.]

Page 67

Radar (Transformation Selection)

[Figure: Radar stream graph. A null splitter feeds 12 Input channels, each followed by Dec, Dec, FIR1, and FIR2. A round-robin joiner (RR) and a Duplicate splitter then feed four branches of BeamForm, Filter, Mag, and Detect, which are joined (RR) into a Sink.]

Page 68

Radar (Transformation Selection)

[Figure: the same Radar graph after a transformation step. The per-channel Dec, Dec, FIR1, and FIR2 stages no longer appear. The 12 Input channels, round-robin joiners (RR), the Duplicate splitter, four BeamForm, Filter, Mag, Detect branches, and the Sink remain.]

Page 69

Radar (Transformation Selection)

[Figure: a further transformation step on the Radar graph: 12 Input channels, round-robin joiners (RR), the Duplicate splitter, four BeamForm, Filter, Mag, Detect branches, and the Sink.]

Page 70: Stream Programming: Luring Programmers into the Multicore Era Bill Thies Computer Science and Artificial Intelligence Laboratory Massachusetts Institute.

Radar (Transformation Selection)

[Figure: two versions of the Radar stream graph.
"Maximal Combination and Shifting to Frequency Domain", annotated "2.4 times as many FLOPS": Splitter(null), 12 Input, RR, four branches of Filter, Mag, and Detect, Sink.
"Using Transformation Selection", annotated "half as many FLOPS": Splitter(null), 12 Input, Duplicate, four Mag, RR, Sink.]
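Transformation selection does not apply every transformation maximally. Instead, it chooses, per part of the graph, whether to combine filters and whether to run them in the time or frequency domain. The sketch below is a hypothetical, simplified version restricted to a pipeline of FIR filters that each pop 1 and push 1. Dynamic programming over segment boundaries picks which adjacent filters to collapse and which domain each collapsed segment uses. The cost functions are invented stand-ins, not the compiler's model. Because the toy omits rate changes and splitjoins, it shows only the mechanism of a cost-driven choice, not the Radar result itself.

```python
import math


def collapsed_taps(taps, start, end):
    """Taps of the single FIR obtained by collapsing filters start..end
    (inclusive): each merge of N and M taps yields N + M - 1 taps."""
    return sum(taps[start:end + 1]) - (end - start)


def time_cost(n_taps):
    """Hypothetical cost model: one multiply per tap per output item."""
    return float(n_taps)


def freq_cost(n_taps):
    """Hypothetical overlap-save cost per output item: FFT size is the next
    power of two >= 2 * n_taps; each block does a forward and an inverse FFT
    (counted as size * log2(size) each) plus a pointwise multiply."""
    size = 1 << math.ceil(math.log2(2 * n_taps))
    outputs_per_block = size - n_taps + 1
    per_block = 2 * size * math.log2(size) + size
    return per_block / outputs_per_block


def select(taps):
    """Choose the segmentation and domain that minimize total cost per output.
    Assumes every filter pops 1 and pushes 1, so per-item costs simply add."""
    n = len(taps)
    best = [0.0] + [math.inf] * n   # best[k]: cheapest plan for filters 0..k-1
    choice = [None] * (n + 1)       # choice[k]: (segment start, domain)
    for end in range(1, n + 1):
        for start in range(end):
            t = collapsed_taps(taps, start, end - 1)
            for domain, cost in (("time", time_cost(t)), ("freq", freq_cost(t))):
                if best[start] + cost < best[end]:
                    best[end] = best[start] + cost
                    choice[end] = (start, domain)
    # Walk the choices backwards to recover the plan.
    plan, end = [], n
    while end > 0:
        start, domain = choice[end]
        plan.append((start, end - 1, domain))
        end = start
    return best[n], plan[::-1]


print(select([3, 3]))        # (5.0, [(0, 1, 'time')])
print(select([3, 256, 3]))   # collapses all three filters into one 'freq' segment
```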

Page 71

Floating-Point Operations Reduction

[Figure: bar chart of Flops Removed (%) for FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, and DToA (axis -40% to 100%); series: linear, freq, autosel; value labels: -140%, 0.3%.]
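The "freq" series corresponds to translating linear filters from the time domain to the frequency domain. An FIR filter is a convolution of its input with its coefficients, up to reversing the coefficient order. Convolution becomes pointwise multiplication after a Fourier transform. The following is a minimal sketch of that identity. It assumes a textbook recursive radix-2 FFT rather than whatever FFT code the compiler actually emits, and all names are hypothetical.

```python
import cmath


def fft(xs, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(xs) must be a power of two.
    With invert=True it computes the unscaled inverse transform."""
    n = len(xs)
    if n == 1:
        return [complex(xs[0])]
    even = fft(xs[0::2], invert)
    odd = fft(xs[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_convolve(h, x):
    """Time domain: one multiply per (tap, input) pair."""
    out = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            out[i + j] += hi * xj
    return out


def fft_convolve(h, x):
    """Frequency domain: zero-pad to a power of two, multiply the spectra
    pointwise, transform back, and keep the len(h) + len(x) - 1 outputs."""
    n_out = len(h) + len(x) - 1
    size = 1
    while size < n_out:
        size *= 2
    H = fft(list(h) + [0.0] * (size - len(h)))
    X = fft(list(x) + [0.0] * (size - len(x)))
    y = fft([a * b for a, b in zip(H, X)], invert=True)
    return [v.real / size for v in y[:n_out]]


h = [0.5, 0.25, 0.25]            # hypothetical FIR coefficients
x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical input block
for a, b in zip(direct_convolve(h, x), fft_convolve(h, x)):
    assert abs(a - b) < 1e-9
```

For long filters, the FFT route costs on the order of log(block size) operations per output instead of one per tap, which is where the savings come from. For short filters, the transform overhead dominates.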

Page 72

Execution Speedup

[Figure: bar chart of Speedup (%) on a Pentium IV for FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, and DToA (axis -200% to 900%); series: linear, freq, autosel; value label: 5%.]

Page 73

Execution Speedup

[Figure: same Speedup (%) chart as on Page 72 (Pentium IV; series: linear, freq, autosel; value label: 5%).]

Additional transformations:
1. Eliminating redundant states
2. Eliminating parameters (non-zero, non-unary coefficients)
3. Translation to the compressed domain

Page 74

StreamIt: Lessons Learned

• In practice, I/O rates of filters are often matched [LCTES’03]
  – Over 30 publications study an uncommon case (CD-DAT)

• Multi-phase filters complicate programs and compilers
  – Should maintain the simplicity of only one atomic step per filter

• Programmers accidentally introduce mutable filter state

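[Figure: CD-DAT sample-rate converter, a pipeline of four filters with rates 1:2, 3:2, 7:8, and 7:5, fired ×147, ×98, ×28, and ×32 per steady state.]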
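The multiplicities in the CD-DAT figure follow from the balance equations of synchronous dataflow. In a steady state, every filter must push exactly as many items onto a channel as its successor pops from it. The following is a minimal sketch for a straight pipeline, reading each pair in the figure as (items popped, items pushed) per firing. `steady_state` is a hypothetical helper, and it requires Python 3.9+ for multi-argument `math.gcd` and `math.lcm`.

```python
from fractions import Fraction
from math import gcd, lcm


def steady_state(rates):
    """rates: (pop, push) per filter, in pipeline order.
    Returns the smallest positive firing counts that balance every channel:
    reps[i] * push_i == reps[i + 1] * pop_(i + 1)."""
    reps = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        reps.append(reps[-1] * push / pop)
    scale = lcm(*(r.denominator for r in reps))   # clear all denominators
    counts = [int(r * scale) for r in reps]
    g = gcd(*counts)                              # reduce to the smallest solution
    return [c // g for c in counts]


CD_DAT = [(1, 2), (3, 2), (7, 8), (7, 5)]   # (pop, push) read from the figure
reps = steady_state(CD_DAT)
print(reps)                                  # [147, 98, 28, 32]

consumed = reps[0] * CD_DAT[0][0]            # 147 input samples per steady state
produced = reps[-1] * CD_DAT[-1][1]          # 160 output samples per steady state
assert Fraction(produced, consumed) == Fraction(48_000, 44_100)   # 44.1 kHz -> 48 kHz
```

One steady-state iteration thus turns 147 input samples into 160 output samples, which is the 44.1 kHz (CD) to 48 kHz (DAT) conversion ratio.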

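Example of mutable filter state: both SquareWave filters emit 0, 1, 0, 1, ..., but the first is stateful and the second is stateless.

Stateful:

void->int filter SquareWave() {
    int x = 0;
    work push 1 {
        push(x);
        x = 1 - x;
    }
}

Stateless:

void->int filter SquareWave() {
    work push 2 {
        push(0);
        push(1);
    }
}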
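The stateless version matters because its firings are independent. The compiler can give copies of the filter to several cores (fission) and interleave their outputs. The following is a minimal Python model of that argument, with workers simulated sequentially and all names hypothetical. It is not StreamIt compiler output.

```python
from itertools import chain


class StatefulSquareWave:
    """Mirrors the stateful filter: field x persists between firings."""

    def __init__(self):
        self.x = 0

    def work(self):
        out = [self.x]
        self.x = 1 - self.x
        return out


def stateless_work():
    """Mirrors the stateless filter: every firing pushes 0, then 1."""
    return [0, 1]


def run_sequential(work, firings):
    """Fire one filter instance repeatedly and concatenate its output."""
    return list(chain.from_iterable(work() for _ in range(firings)))


def run_fissed(make_work, firings, n_workers):
    """Hypothetical fission: each simulated worker gets its own filter copy,
    firings are dealt round-robin, and a round-robin join restores order."""
    per_worker = []
    for w in range(n_workers):
        work = make_work()                      # private copy per worker
        per_worker.append([work() for _ in range(w, firings, n_workers)])
    joined = [per_worker[f % n_workers][f // n_workers] for f in range(firings)]
    return list(chain.from_iterable(joined))


expected = [0, 1, 0, 1, 0, 1, 0, 1]
assert run_sequential(StatefulSquareWave().work, 8) == expected
assert run_sequential(stateless_work, 4) == expected
assert run_fissed(lambda: stateless_work, 4, n_workers=2) == expected
# Naively fissing the stateful version breaks it: each copy restarts at x = 0.
print(run_fissed(lambda: StatefulSquareWave().work, 8, n_workers=2))
# [0, 0, 1, 1, 0, 0, 1, 1]
```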

Page 75

Future of StreamIt

• Goal: influence the next big language

[Figure: Origins of C++, a 1960 to 1990 timeline distinguishing structural influence from feature influence. Languages: Fortran, Algol 60, CPL, BCPL, C, ANSI C, Simula 67, C with Classes, C++, C++arm, C++std, ML, Clu, Algol 68, Ada. Annotation: "Academic origin".]

Source: B. Stroustrup, The Design and Evolution of C++

Page 76

Research Trajectory

• Vision: Make emerging computational substrates universally accessible and useful

1. Languages, compilers, & tools for multicores
   – I believe new language / compiler technology can enable scalable and robust performance
   – Next inroads: expose & exploit flexibility in programs

2. Programmable microfluidics
   – We have developed programming languages, tools, and flexible new devices for microfluidics
   – Potential to revolutionize biology experimentation

3. Technologies for the developing world
   – TEK: enable an Internet experience over an email account
   – Audio Wiki: publish content from a low-cost phone
   – uBox / uPhone: monitor & improve rural healthcare

Page 77

Conclusions

• A parallel programming model will succeed only by luring programmers, making them do less, not more

• Stream programming lures programmers with:
  – Elegant programming primitives
  – Domain-specific optimizations

• Meanwhile, streaming is implicitly parallel
  – Robust performance via task, data, & pipeline parallelism

• We believe stream programming will play a key role in enabling a transition to multicore processors

Contributions
– Structured streams
– Teleport messaging
– Unified algorithm for task, data, pipeline parallelism
– Software pipelining of whole procedures
– Algebraic simplification of whole procedures
– Translation from time to frequency
– Selection of best DSP transforms

Page 78

Acknowledgments

• Project supervisors
  – Prof. Saman Amarasinghe
  – Dr. Rodric Rabbah

• Contributors to this talk
  – Michael I. Gordon (Ph.D. Candidate): leads StreamIt backend efforts
  – Andrew A. Lamb (M.Eng): led linear optimizations
  – Sitij Agrawal (M.Eng): led statespace optimizations

• Compiler developers
  – Kunal Agrawal, Allyn Dimock, Qiuyuan Jimmy Li, Jasper Lin, Michal Karczmarek, David Maze, Janis Sermulins, Phil Sung, David Zhang

• Application developers
  – Basier Aziz, Matthew Brown, Matthew Drake, Shirley Fung, Hank Hoffmann, Chris Leger, Ali Meli, Satish Ramaswamy, Jeremy Wong

• User interface developers
  – Kimberly Kuo, Juan Reyes

