08 - (1) Parallel algorithm structure design space

Organization by Tasks
  (1.3) Task Parallelism
  (1.4) Recursive Splitting
Organization by Data
  (1.1) Geometric Decomposition
  (1.2) Recursive Data
Organization by Data Flow
  (1.5) Pipeline
(1.3) Task parallelism

Context:
• The problem can be naturally decomposed into a collection of tasks.
• The tasks are (mostly) independent.
• A task is typically computation-bound; data access may be irregular (unlike, e.g., data parallelism).
Example: Ray tracing

• 3D computer graphics
• Rendering of shadows and reflections in modeled images
• Simple geometric shapes (e.g. spheres, see Juggler)
• Vertex figures
• Triangulated shapes

“Juggler” (Amiga 1986): 4096 colors, 320x200 pixels, sequential rendering, ~1h per image.
Source: [4]
Image model
Source: [8, 9]
Ray tracing principles

[Figure: the camera casts one view ray through each image pixel into the scene; at a hit point on a scene object, a shadow ray is traced towards the light source.]

• Each pixel corresponds to one view ray.
• Simulate rays of light reflecting within a mathematically defined scene.
• The computations of pixels/ray traces are independent.
Parallel Programming, Summer 2010. Source: [5]
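Because pixel computations are independent, ray tracing parallelizes trivially over pixels. A minimal sketch in Python; trace_pixel is a hypothetical stand-in for the real view-ray computation:

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 4, 3

def trace_pixel(xy):
    # Hypothetical stand-in for casting a view ray and shading the hit point.
    x, y = xy
    return (x * 31 + y * 17) % 256

# One independent task per pixel.
pixels = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]

# A pool of workers processes the pixel tasks; map preserves pixel order.
with ThreadPoolExecutor(max_workers=4) as pool:
    image = list(pool.map(trace_pixel, pixels))
```

Any pool-style scheduler can replace the executor; the point is that no pixel task reads another pixel's result.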
Ray tracing for movies (e.g. Pixar)
• 100s of lights
• 1,000s of textures
• 10,000s of scene objects
• 100,000,000s of polygons
• Shaders with 10,000 lines of code (computation of shadow rays)
• Resolution: 2048x1536 pixels

Challenge: the scene and auxiliary structures (textures) should fit in memory.
Source: [3]

An ever-growing appetite for computing resources:

                      Toy Story (1995)   Cars (2005)
Image resolution:     1536 x 922         2048 x 1536
Rendering algorithm:  simple, scanline   complex [3]
Time per frame:       2h                 15h (!)

Next: ray tracing animations. The Juggler could be rendered in real time at 30 fps today (in 1986: 1 frame/h).
Source: [3, 5]

Ray tracing also has important applications in medical imaging, e.g., the analysis and correction of PET and CT images:
• Simulation of how radiation propagates through the body and hits a camera model
• Distinction of different types of tissue based on their propagation of radiation at different energy levels

Source: [1, http://www.siemens.com/press/de/pressebilder/?press=/de/pp_cc/2007/10_oct/sosep200729_32_1465320.htm]
Scalability

Almost perfect (strong and weak) scaling on today’s architectures.

Example: animation of 256x256 pixels (Quake 4 game) on an Intel quad core:
• 1 core: 4.4 fps
• 2 cores: 8.6 fps (1.96x speedup)
• 4 cores: 16.9 fps (3.84x speedup)
Source: [5]
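The speedups follow directly from the frame rates; a quick check using the numbers above:

```python
def speedup(fps_parallel, fps_serial):
    # Speedup = parallel throughput relative to the 1-core baseline.
    return fps_parallel / fps_serial

def efficiency(s, n_cores):
    # Efficiency = speedup per core; 1.0 would be perfect scaling.
    return s / n_cores

s2 = speedup(8.6, 4.4)    # ~1.95x on 2 cores
s4 = speedup(16.9, 4.4)   # ~3.84x on 4 cores
e2, e4 = efficiency(s2, 2), efficiency(s4, 4)
```

Efficiencies above 0.95 on both 2 and 4 cores are what the slide means by almost perfect strong scaling.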
Other Examples

Molecular dynamics: simulation of the interaction between atoms in a chemical reaction.
Source: [1, http://www.almaden.ibm.com/st/computational_science/MSA/]

The computation proceeds in phases; each phase consists of a number of parallel tasks:
1) For each atom: find and compute the forces that affect its movement (access to neighbor atoms).
2) For each atom: update position and velocity.
3) For each atom: update the neighbor list.
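The phase structure can be sketched with worker threads separated by barriers. This is a simplified illustration: the partitioning, the constant "force", and the update rule are placeholders, not a real MD force field:

```python
import threading

N_ATOMS, N_WORKERS = 8, 4
positions = [float(i) for i in range(N_ATOMS)]
velocities = [0.0] * N_ATOMS
forces = [0.0] * N_ATOMS
barrier = threading.Barrier(N_WORKERS)

def worker(wid):
    my_atoms = range(wid, N_ATOMS, N_WORKERS)   # static partition of the atoms
    # Phase 1: compute forces (placeholder: constant unit force per atom).
    for i in my_atoms:
        forces[i] = 1.0
    barrier.wait()      # all forces must be ready before any position changes
    # Phase 2: update velocity and position.
    for i in my_atoms:
        velocities[i] += forces[i]
        positions[i] += velocities[i]
    barrier.wait()
    # Phase 3: update neighbor lists (omitted in this sketch).

threads = [threading.Thread(target=worker, args=(w,)) for w in range(N_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()
```

The barrier between phases is exactly the explicit synchronization the pattern calls for: within a phase the per-atom tasks are independent.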
Forces
• Task scheduling is the assignment of tasks to units of execution (UEs)
  – typically: #tasks >> #UEs
  – goal: load balancing

Example: tasks a–f with different computational requirements should be mapped to 4 UEs.
Source: [1]

async { a(); }
async { b(); }
async { c(); }
async { d(); }
async { e(); }
async { f(); }
Challenges:
• The schedule is typically determined at run time.
• The execution times of the tasks are not known in advance.

[Figure: the same tasks a–f mapped to 4 UEs, once unbalanced, once balanced.]
Forces (cont.)
• Tasks may have dependences:
  – limited choice for the scheduler: e.g. a dependence a → e rules out any schedule that runs e before a, and so changes which schedule is best
  – management of task order (task queue)

async { a(); async { e(); } }   // e is spawned only after a has run
async { b(); }
async { c(); }
async { d(); }
async { f(); }
Consequences
• The task decomposition should make sure data is efficiently accessible.
  – Ray tracing: the part of the scene and the textures required to compute one ray trace should fit into the main memory of one UE.
• Several generic scheduling algorithms are known and commonly used in practice:
  – Thread-pool scheduler
  – Work-stealing scheduler
Schedulers by example:
• Thread pool (current implementation of X10)
• Work stealing
Thread-pool scheduler

A global task queue (typically a doubly-linked list) is shared by a thread pool (here: 2 UEs).

Each thread ...
• ... repeatedly takes an unscheduled task from the head
• ... inserts new tasks at the tail
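A minimal thread-pool scheduler along these lines can be sketched in Python. This is an illustrative sketch, not the X10 implementation: a shared FIFO queue, workers take from the head, and a running task may append newly spawned tasks at the tail (async_, a, b, c are illustrative names):

```python
import threading
from collections import deque

task_queue = deque()     # global task queue: take at head, insert at tail
cv = threading.Condition()
outstanding = 0          # tasks that are queued or still running
executed = []

def async_(task):
    """Insert a new task at the tail of the global queue."""
    global outstanding
    with cv:
        task_queue.append(task)
        outstanding += 1
        cv.notify()

def worker():
    global outstanding
    while True:
        with cv:
            while not task_queue:
                if outstanding == 0:
                    return              # nothing queued, nothing running: done
                cv.wait(0.01)           # a running task may still spawn work
            task = task_queue.popleft() # take from the head
        task()                          # may call async_ to spawn more tasks
        with cv:
            outstanding -= 1
            cv.notify_all()

def a(): executed.append("a"); async_(b); async_(c)
def b(): executed.append("b")
def c(): executed.append("c")

async_(a)
pool = [threading.Thread(target=worker) for _ in range(2)]
for t in pool: t.start()
for t in pool: t.join()
```

The `outstanding` counter distinguishes "queue momentarily empty" from "all work finished", which is why the workers do not exit while a task that might spawn children is still running.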
The example program:

async { //task-0
  a();
  async { //task-1
    b();
    async { //task-2
      c();
    }
    d();
  }
  async { //task-3
    e();
  }
  async { //task-4
    f();
  }
  g();
}

[Animation: task-0 enters the global queue and is taken by one of the two threads. While it runs (a(), ..., g()), it inserts the tasks it spawns (1–4) at the tail; task-1 in turn spawns task-2. Both threads repeatedly take the task at the head of the shared queue until the queue is empty and execution is done.]
Work-stealing scheduler

Each thread (here: 2 UEs) has its own task deque (double-ended queue).

Each thread ...
• ... repeatedly takes a task from the head of its own deque
  – if the local deque is empty, the thread steals from the tail of another deque
• ... inserts new tasks at the head of its own deque
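The owner/thief protocol can be sketched as follows. This is an illustrative sketch (spawn, a, b, c are made-up names): per-worker deques, the owner pushes and pops at the head, an idle thread steals from the tail. A single coarse lock keeps the sketch simple; real work-stealing deques are lock-free:

```python
import threading
from collections import deque

N_WORKERS = 2
deques = [deque() for _ in range(N_WORKERS)]   # one deque per worker
lock = threading.Lock()
outstanding = 0                                # tasks queued or running
executed = []

def spawn(wid, task):
    """The owner inserts a new task at the head of its own deque."""
    global outstanding
    with lock:
        deques[wid].appendleft(task)
        outstanding += 1

def worker(wid):
    global outstanding
    while True:
        task = None
        with lock:
            if deques[wid]:
                task = deques[wid].popleft()       # owner takes from the head
            else:
                for victim in range(N_WORKERS):    # local deque empty:
                    if victim != wid and deques[victim]:
                        task = deques[victim].pop()  # steal from a tail
                        break
            if task is None and outstanding == 0:
                return          # all deques empty and nothing running: done
        if task is not None:
            task(wid)           # the task may spawn further tasks locally
            with lock:
                outstanding -= 1

def a(wid): executed.append("a"); spawn(wid, b); spawn(wid, c)
def b(wid): executed.append("b")
def c(wid): executed.append("c")

spawn(0, a)
pool = [threading.Thread(target=worker, args=(w,)) for w in range(N_WORKERS)]
for t in pool: t.start()
for t in pool: t.join()
```

The head/tail asymmetry is the point: the owner works on the newest (cache-warm) task, while thieves take the oldest task, which tends to represent the largest remaining chunk of work.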
The same example program under work stealing:

[Animation: the thread that runs task-0 pushes the tasks it spawns onto the head of its own deque (deque rule: the owner accesses the head, other threads access the tail). The second thread, whose deque is empty, steals the oldest task from the tail of the first thread’s deque and runs it, pushing any tasks it spawns onto its own deque. Owners keep taking from the head and thieves from the tail until both deques are empty and execution is done.]
08 - 45
Benefits of the work-stealing scheduler: Data locality:
- of accesses to the (local) task-queue- of data accesses by the tasks (task usually
accesses some data initialized by parent task)
Work-stealing was developed for the programming langage CILK
- continuation-passing style - provable efficient in space and communication
[2]
Another Example
Source: [4]

• Google computes an index that maps search terms to URLs:
  – continuation → {http://en.wikipedia.org/wiki/Continuation, ..., http://en.wikipedia.org/wiki/Continuation-passing_style, ...}
  – passing → {http://books.google.de/books?isbn=0486437132..., http://en.wikipedia.org/wiki/Continuation-passing_style, ...}
  – style → {http://www.style.com/, ..., http://en.wikipedia.org/wiki/Continuation-passing_style, ...}
• A search performs lookups and combines the results.
How is the index computed?

• Input: large amounts of raw data (petabytes)
  – web pages
  – logs
  – web page metadata
  – ...
• The input data cannot fit on a single machine.
Map-Reduce

An architecture for analyzing large amounts of raw data:
• By J. Dean and S. Ghemawat, OSDI, 2004
• Used by Google for daily data analysis and index creation
• Today: various frameworks and implementations (C++, Java, ...)
08 -
1) Partition and distribute input data(100s or 1000s of parts)
2) Map tasks operate independently on each partition and compute partial results
3) Reduce tasks combine and aggregate the output of the map operations to a final result.
50
Map-Reduce principles
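The three steps can be sketched in a few lines of Python on the word-count example used on the following slides (map_fn and reduce_fn are illustrative names, and the shuffle here is a plain in-memory grouping, not a distributed one):

```python
from collections import defaultdict

texts = {
    "everett.txt": "Education is a better safeguard of liberty than a standing army.",
    "aristotle.txt": "The roots of education are bitter, but the fruit is sweet.",
    "hugo.txt": "He who opens a school door, closes a prison.",
}

def map_fn(name, contents):
    """Emit an (word, 1) pair for every word of one document."""
    out, word = [], ""
    for ch in contents:
        if ch.isalnum():
            word += ch.lower()
        elif word:
            out.append((word, 1))
            word = ""
    if word:                      # flush a trailing word
        out.append((word, 1))
    return out

def reduce_fn(key, values):
    """Combine all intermediate values for one key."""
    return sum(values)

# (2) map phase: one independent task per partition (here: per document)
intermediate = [p for name, text in texts.items() for p in map_fn(name, text)]

# shuffle: group intermediate pairs by key
groups = defaultdict(list)
for k, v in intermediate:
    groups[k].append(v)

# (3) reduce phase: one task per key (or per group of keys)
counts = {k: reduce_fn(k, vs) for k, vs in groups.items()}
```

Every map call and every reduce call is independent, which is what lets a framework distribute them across thousands of machines.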
Word counting, map phase (3 tasks):
Source: [4]

everett.txt: “Education is a better safeguard of liberty than a standing army.”
  → (education,1) (is,1) (a,1) (better,1) (safeguard,1) (of,1) (liberty,1) (than,1) (a,1) (standing,1) (army,1)

aristotle.txt: “The roots of education are bitter, but the fruit is sweet.”
  → (the,1) (roots,1) (of,1) (education,1) (are,1) (bitter,1) (but,1) (the,1) (fruit,1) (is,1) (sweet,1)

hugo.txt: “He who opens a school door, closes a prison.”
  → (he,1) (who,1) (opens,1) (a,1) (school,1) (door,1) (closes,1) (a,1) (prison,1)
Partition the intermediate pairs by key across the reduce tasks:
Source: [4]

  (a,1) (are,1) (army,1) (a,1) (better,1) (bitter,1) (a,1) (but,1) (a,1)
  (he,1) (closes,1) (door,1) (education,1) (fruit,1)
  (of,1) (is,1) (education,1) (liberty,1) (opens,1) (of,1) (prison,1)
  (safeguard,1) (sweet,1) (than,1) (the,1) (roots,1) (who,1) (the,1) (school,1) (standing,1)
Sort and group by key:
Source: [4]

  a,[1,1,1,1]  are,[1]  army,[1]  better,[1]  bitter,[1]  but,[1]
  closes,[1]  door,[1]  education,[1,1]  fruit,[1]  he,[1]
  is,[1,1]  liberty,[1]  opens,[1]  of,[1,1]  prison,[1]
  safeguard,[1]  school,[1]  standing,[1]  sweet,[1]  than,[1]  the,[1,1]  roots,[1]  who,[1]
Reduce (4 tasks):
Source: [4]

  a,4  are,1  army,1  better,1  bitter,1  but,1
  closes,1  door,1  education,2  fruit,1  he,1
  is,2  liberty,1  opens,1  of,2  prison,1
  safeguard,1  school,1  standing,1  sweet,1  than,1  the,2  roots,1  who,1
Programming model
Source: [3]

Input & output: each a set of key/value pairs.
The programmer specifies two functions:

map(InKey, InValue): List[Pair[OutKey, IntermediateValue]]
• Processes an input key/value pair
• Produces a set of intermediate pairs

reduce(OutKey, List[IntermediateValue]): List[OutValue]
• Combines all intermediate values for a particular key
• Produces a set of merged output values (usually just one)
Notes:
• Inspired by functional programming languages (e.g. LISP).
• From the programmer’s perspective, the model is sequential.
Word Counting: map function

map(InKey, InValue): List[Pair[OutKey, IntermediateValue]]

/* @param inkey   document name
 * @param invalue document contents
 * @param result  collects the intermediate result */
def map (inkey: String, invalue: String,
         result: List[Pair[String, Int]]) {
  val chars = invalue.chars();
  var sb: StringBuilder = new StringBuilder();
  for (c in chars.region) {
    val tmp = chars(c);
    if (tmp.isLetterOrDigit())
      sb.add(tmp.toLowerCase());
    else {
      // emit result
      result.add(Pair[String, Int](sb.result(), 1));
      sb = new StringBuilder();
    }
  }
}
Word Counting: reduce function

reduce(OutKey, List[IntermediateValue]): List[OutValue]

/* @param outkey word
 * @param intermediate_values occurrences of the word in all texts
 * @return sum of occurrences */
def reduce (outkey: String,
            intermediate_values: List[Int]): List[Int] {
  var acc: Int = 0;
  val ret = new List[Int]();
  for (i in intermediate_values)
    acc += i;
  ret.add(acc);
  return ret;
}
Word Counting: main function

public static def main(Array[String]) {
  // (1) initialize
  val mr = new MapReduceImpl[String, String, Int, String, Int](3, 3);
  val m = (ik: String, iv: String, result: List[Pair[String, Int]]) => {
    mr.map(ik, iv, result);
  };
  val r = (ok: String, iv: List[Int]): List[Int] => {
    return mr.reduce(ok, iv);
  };
  mr.registerReduceFun(r);
  mr.registerMapFun(m);
  mr.setInput(input);

  // (2) run it
  mr.run();

  // (3) print result
  val result <: List[Pair[String, Int]] = mr.getResult();
  for (p in result)
    Console.OUT.println(p.first + ", " + p.second);
}
Generic MapReduce

public interface MapReduce[InKey, InVal, IntermediateVal, OutKey, OutVal] {

  property def num_map_tasks(): Int;
  property def num_reduce_tasks(): Int;

  def setInput(i: Array[Pair[InKey, InVal]]);
  def registerMapFun(m: (InKey, InVal, List[Pair[OutKey, IntermediateVal]]) => void);
  def registerReduceFun(r: (OutKey, List[IntermediateVal]) => List[OutVal]);
  def run();
  def getResult(): List[Pair[OutKey, List[OutVal]]];
}
Programming model
Source: [3]

The programmer specifies two functions:
map(InKey, InValue): List[Pair[OutKey, IntermediateValue]]
reduce(OutKey, List[IntermediateValue]): List[OutValue]

Parallelization, load balancing and fault tolerance are handled by the programming framework.
More examples of MapReduce programs:
Source: [3]
• distributed grep
• distributed sort
• web link-graph reversal
• term vector per host
• web access log stats
• inverted index construction
• document clustering
• machine learning
• ...
Task granularity and pipelining (Google, 2004)
Source: [3]

• Often 200,000 map and 5,000 reduce tasks on 2,000 machines
• Fine-grained tasks: many more map tasks than machines
  – minimizes the time for fault recovery (a single task is small)
  – better dynamic load balancing
  – shuffling (partition, sort/group) can be pipelined with the execution of map tasks
“Shuffling”: the reduce tasks read, sort and group the intermediate data.
Source: [3]
(1) Parallel algorithm structure design space

Organization by Tasks
  (1.3) Task Parallelism
  (1.4) Recursive Splitting
Organization by Data
  (1.1) Geometric Decomposition
  (1.2) Recursive Data
Organization by Data Flow
  (1.5) Pipeline
(1.4) Recursive splitting

Context:
• The solution to a problem is naturally described through a recursive algorithm
  – also called “divide and conquer”
  – the solution of a large problem can be synthesized from the solutions of smaller problems

Context (cont.):
• Each recursion step can be regarded as a task
  – tasks at different levels of the invocation hierarchy are dependent
  – tasks at the same level of the hierarchy are (mostly) independent
Example: Merge sort
Source: [4]

Divide until 4 sublists remain, sort each sequentially, then merge:

  [13 7 5 8 2 1 13 9]
  divide → [13 7 5 8]  [2 1 13 9]
  divide → [13 7]  [5 8]  [2 1]  [13 9]
  4 × sequential sort → [7 13]  [5 8]  [1 2]  [9 13]
  merge → [5 7 8 13]  [1 2 9 13]
  merge → [1 2 5 7 8 9 13 13]
Example: Merge sort (cont.)

The same tree, annotated: the four leaf-level sorts ([13 7]→[7 13], [5 8]→[5 8], [2 1]→[1 2], [13 9]→[9 13]) are the sort tasks; the combining steps above them are the merge tasks.
Task graph

[Figure: the sort tasks [13 7]→[7 13], [5 8]→[5 8], [2 1]→[1 2], [13 9]→[9 13] feed two merge tasks producing [5 7 8 13] and [1 2 9 13], which feed a final merge producing [1 2 5 7 8 9 13 13].]

The tasks form a partial order due to data dependences (input/output).
Forces
• Tasks can have different sizes
  – load balancing
  – simple merge sort: the last “large” merge task is sequential
• Tasks can have dependences
  – due to input/output relations
  – manage dependences with explicit synchronization (typically barriers)

Forces (cont.)
• Task granularity
  – merge sort: a threshold determines the input size below which sorting is done sequentially (e.g. with quicksort)
• Scheduling (the mapping of tasks to UEs) should support data locality
  – work stealing is typically good
  – for merge sort, better schemes exist
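Recursive splitting with a sequential-sort threshold can be sketched as follows. This is an illustrative sketch using Python futures; THRESHOLD and the input are examples, and for simplicity only one half of each split is submitted to the pool (submitting both halves of deep recursions to a bounded pool can deadlock):

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 4   # below this size, sort sequentially (the granularity knob)

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(xs, pool):
    if len(xs) <= THRESHOLD:
        return sorted(xs)           # sequential leaf task
    mid = len(xs) // 2
    left = pool.submit(merge_sort, xs[:mid], pool)   # independent subtask
    right = merge_sort(xs[mid:], pool)               # computed by the caller
    return merge(left.result(), right)               # dependence: wait, then merge

data = [13, 7, 5, 8, 2, 1, 13, 9]   # the example from the slides
with ThreadPoolExecutor(max_workers=4) as pool:
    result = merge_sort(data, pool)
```

The two halves of each split are the independent same-level tasks; `left.result()` is the explicit synchronization on the input/output dependence before the merge task runs.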
Scheduling for data locality

Number the tasks of the merge-sort task graph 1–7: sort tasks 1–4 at the leaves, merge tasks 5 and 6 above them, final merge task 7.

Examples of possible task schedules on a single UE (constraint: the partial order):
1) 1, 2, 3, 4, 5, 6, 7
2) 1, 2, 5, 3, 4, 6, 7

Assumptions:
• Sort tasks perform sequential in-place sorting
• A cache line is two values wide
• The cache has a capacity of two lines
• The cache is fully associative
Schedule 1): 1, 2, 3, 4, 5, 6, 7

[Animation: the cache holds only the two most recently touched lines, so by the time the merge tasks 5, 6 and 7 run, the lines written by the earlier sort tasks have been evicted and must be fetched again.]

Result: 2 hits, 10 misses (simplified).
Schedule 2): 1, 2, 5, 3, 4, 6, 7

[Animation: running merge task 5 immediately after sort tasks 1 and 2 finds their output still in the cache and hits; likewise merge task 6 after sort tasks 3 and 4.]

Result: 6 hits, 6 misses (simplified).

Schedule 2) has better data locality than 1)!
Task scheduling on multiple UEs:

• Parallel breadth first (PBF): {1, 2, 3, 4, 5, 6, 7}
• Parallel depth first (PDF): {1, 2, 5, 3, 4, 6, 7}
• The choice of schedule is especially relevant for performance on CMPs (chip multiprocessors)
• For merge sort, PDF is preferable. Details in [10]
Example: Traveling salesman problem (TSP)

Input: a weighted, fully connected graph.
Output: the shortest cycle through all nodes of the graph. Without loss of generality, the tour starts and ends in A.

[Figure: the six possible tours of the 4-node graph {A, B, C, D}: A,B,C,D,A; A,C,B,D,A; A,D,B,C,A; A,B,D,C,A; A,C,D,B,A; A,D,C,B,A]
• TSP is an optimization problem.
• Naive strategy: exhaustive search
  – one task per tour
  – for N nodes, there are (N-1)! different tours
  – the number of tours grows exponentially
• Theory says: the problem is NP-complete.
A better approach: phrase the problem as a recursion.

Problem formalization:
• N = {0, ..., n-1} nodes
• dist(i, j) = <direct distance from i to j>
• L(i, A) = <length of the shortest path from i to 0 through the nodes A ⊆ N>

Recursive procedure
• Recursion:
  – L(i, ∅) = dist(i, 0)
  – L(i, A) = min_{j∈A} ( dist(i, j) + L(j, A \ {j}) )   ← recursive call
• Result: L(0, {1, ..., n-1})
Naive computation of the recursion

[Task tree: L(0, {1,...,n-1}) at the root, with children L(1, {2,...,n-1}), L(2, {1,3,...,n-1}), ..., L(n-1, {1,...,n-2}), down to leaves of the form L(i, ∅). Each node is a task.]

• n-1 hierarchy levels
• Tasks at the same level can be computed independently.
• But subproblems such as L(3, {4,...,n-1}) appear in several subtrees: the same work is done several times (in parallel)!
Dynamic programming

• Key idea: store intermediate results and reuse them in other parts of the computation.
• For TSP: maintain a table for L(i, A), i ∈ {0,...,n-1}, A ∈ P({1,...,n-1}) (P: powerset)
  – Problem: the size of that table is n · 2^n

• Recursion:
  – L(i, ∅) = dist(i, 0)
  – L(i, A) = min_{j∈A} ( dist(i, j) + L(j, A \ {j}) )   ← recursive call or table lookup
• Result: L(0, {1, ..., n-1})
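The recursion with memoized table lookups fits in a few lines of Python; the asymmetric distance matrix is an illustrative example, and lru_cache plays the role of the L(i, A) table:

```python
from functools import lru_cache

# Illustrative asymmetric distance matrix for the nodes {0, 1, 2, 3}.
dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
n = len(dist)

@lru_cache(maxsize=None)        # the L(i, A) table: at most n * 2^n entries
def L(i, A):
    if not A:                   # L(i, emptyset) = dist(i, 0)
        return dist[i][0]
    # L(i, A) = min over j in A of  dist(i, j) + L(j, A \ {j})
    return min(dist[i][j] + L(j, A - {j}) for j in A)

# Result: L(0, {1, ..., n-1}), the length of the shortest tour.
shortest = L(0, frozenset(range(1, n)))
```

Representing the node set A as a frozenset makes it hashable, so each subproblem is computed once and afterwards answered by a table lookup, exactly as the slide describes.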
Dynamic programming (cont.)

[Task tree as before, but each repeated subproblem such as L(3, {4,...,n-1}) is computed once; the other occurrences become table lookups.]

• Compute the task tree bottom up.
• Enumerate the possible tasks at each hierarchy level; tasks at the same level can be done concurrently.
• Barrier when ascending the hierarchy to the next level.
Sources

[1] Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill: Patterns for Parallel Programming, Addison-Wesley, 2005.
[2] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou: Cilk: An Efficient Multithreaded Runtime System, PPoPP, 1995.
[3] Jeffrey Dean and Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters, OSDI, 2004. http://labs.google.com/papers/mapreduce.html
[4] Per H. Christensen, Julian Fong, David M. Laur, Dana Batali (Pixar Animation Studios): Ray Tracing for the Movie ‘Cars’, Ray Tracing Symposium, 2006. http://www.sci.utah.edu/~wald/RT06/papers/raytracing06per.pdf, http://graphics.pixar.com/library/RayTracingCars/paper.pdf
[5] Ernie Wright: Amiga Juggler animation. http://home.comcast.net/~erniew/juggler.html
[6] Jeff Atwood: Real-Time Raytracing. Blog entry, March 10, 2008.
[7] Nate Nystrom: X10, work stealing. Lecture CSE 3302, April 2010.
[8] William Welch and Andrew Witkin: Free-Form Shape Design Using Triangulated Surfaces. Computer Graphics (SIGGRAPH), 1994.
[9] Wolfram MathWorld: http://mathworld.wolfram.com/VertexFigure.html
[10] S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Hardavellas, T. C. Mowry, C. Wilkerson: Scheduling Threads for Constructive Cache Sharing on CMPs. Symp. on Parallel Algorithms and Architectures (SPAA), 2007.
08 - 98
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
• You are free:– to Share — to copy, distribute and transmit the work – to Remix — to adapt the work
• Under the following conditions:– Attribution. You must attribute the work to “The Art of
Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to– http://creativecommons.org/licenses/by-sa/3.0/.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.