Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Date post: 22-Feb-2016
Page 1: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang
Columbia University

Page 2: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Motivation

• Analyzing parallel programs is difficult.

[Figure: precision plotted against soundness (# of analyzed schedules / # of total schedules). Labels: Static Analysis, Dynamic Analysis, Total Schedules, Analyzed Schedules, "?".]

Page 3: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Schedule Specialization

• Precision: Analyze the program over a small set of schedules.
• Soundness: Enforce these schedules at runtime.

[Figure: precision plotted against soundness (# of analyzed schedules / # of total schedules). Labels: Static Analysis, Dynamic Analysis, Schedule Specialization, Total Schedules, Analyzed Schedules, Enforced Schedules.]

Page 4: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Enforcing Schedules Using Peregrine

• Deterministic multithreading
  – e.g. DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)
  – Performance overhead
    • e.g. Kendo: 16%, Tern & Peregrine: 39.1%
• Peregrine
  – Record schedules, and reuse them on a wide range of inputs.
  – Represent schedules explicitly.

Page 5: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Schedule Specialization

• Precision: Analyze the program over a small set of schedules.
• Soundness: Enforce these schedules at runtime.

[Figure: precision plotted against soundness (# of analyzed schedules / # of total schedules). Labels: Static Analysis, Dynamic Analysis, Schedule Specialization, Analyzed Schedules, Enforced Schedules.]

Page 6: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Framework

• Extract control flow and data flow enforced by a set of schedules

[Diagram: inputs are the Program (C/C++ program with Pthread) and the Schedule (total order of synchronizations); Schedule Specialization produces the Specialized Program and extra def-use chains.]
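To make "total order of synchronizations" concrete, here is a minimal sketch of one way a schedule could be written down explicitly. The type and function names (`sync_event`, `count_events`, `locks_balanced`) and the particular ordering of events are assumptions made for illustration; the slides do not show the framework's actual representation.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of synchronization operations used by the running example. */
typedef enum { SYNC_CREATE, SYNC_JOIN, SYNC_LOCK, SYNC_UNLOCK } sync_kind;

/* One schedule entry: which thread performs which operation. */
typedef struct {
    int thread;      /* 0 = main thread, 1 and 2 = workers */
    sync_kind kind;
} sync_event;

/* Hypothetical schedule for two workers.
   The position in the array is the position in the total order. */
static const sync_event example_schedule[] = {
    {0, SYNC_CREATE}, {0, SYNC_CREATE},
    {1, SYNC_LOCK},   {1, SYNC_UNLOCK},
    {2, SYNC_LOCK},   {2, SYNC_UNLOCK},
    {0, SYNC_JOIN},   {0, SYNC_JOIN},
};

enum { EXAMPLE_SCHEDULE_LEN = sizeof example_schedule / sizeof example_schedule[0] };

/* Count the operations of one kind that a given thread performs. */
static int count_events(const sync_event *s, size_t n, int thread, sync_kind kind) {
    int count = 0;
    for (size_t i = 0; i < n; ++i)
        if (s[i].thread == thread && s[i].kind == kind)
            ++count;
    return count;
}

/* Returns 1 if the thread never unlocks more often than it has locked
   and holds no lock at the end of the schedule. */
static int locks_balanced(const sync_event *s, size_t n, int thread) {
    int held = 0;
    for (size_t i = 0; i < n; ++i) {
        if (s[i].thread != thread)
            continue;
        if (s[i].kind == SYNC_LOCK)
            ++held;
        if (s[i].kind == SYNC_UNLOCK && --held < 0)
            return 0;
    }
    return held == 0;
}
```

Because the schedule is plain data, a static analysis can walk it entry by entry, which is what the specialization steps on the following slides rely on.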

Page 7: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Outline

• Example
• Control-Flow Specialization
• Data-Flow Specialization
• Results
• Conclusion

Page 8: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Running Example

int results[p_max];
int global_id = 0;

int main(int argc, char *argv[]) {
  int i;
  int p = atoi(argv[1]);
  for (i = 0; i < p; ++i)
    pthread_create(&child[i], 0, worker, 0);
  for (i = 0; i < p; ++i)
    pthread_join(child[i], 0);
  return 0;
}

void *worker(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  int my_id = global_id++;
  pthread_mutex_unlock(&global_id_lock);
  results[my_id] = compute(my_id);
  return 0;
}

[Figure: schedule across three threads. Thread 0: create, create, join, join. Thread 1: lock, unlock. Thread 2: lock, unlock.]

Race-free?

Page 9: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specialization

int main(int argc, char *argv[]) {
  int i;
  int p = atoi(argv[1]);
  for (i = 0; i < p; ++i)
    pthread_create(&child[i], 0, worker, 0);
  for (i = 0; i < p; ++i)
    pthread_join(child[i], 0);
  return 0;
}

[Figure: the schedule (create, create, join, join) shown beside the control-flow graph of main, with nodes atoi, i = 0, i < p, create, ++i, i = 0, i < p, join, ++i, return.]

Page 10: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specialization

(Code of main as on page 9.)

[Figure, build step: beside the schedule and the original control-flow graph of main, the specialized graph so far: atoi, i = 0, i < p, create.]

Page 11: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specialization

(Code of main as on page 9.)

[Figure, build step: beside the schedule and the original control-flow graph of main, the specialized graph so far: atoi, i = 0, i < p, create, ++i, i < p, create.]

Page 12: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specialization

(Code of main as on page 9.)

[Figure, final build step: beside the schedule and the original control-flow graph of main, the complete specialized graph: atoi, i = 0, i < p, create, ++i, i < p, create, ++i, i < p, i = 0, i < p, join, ++i, i < p, join, ++i, i < p, return.]

Page 13: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specialized Program

int main(int argc, char *argv[]) {
  int i;
  int p = atoi(argv[1]);
  i = 0; // i < p == true
  pthread_create(&child[i], 0, worker.clone1, 0);
  ++i; // i < p == true
  pthread_create(&child[i], 0, worker.clone2, 0);
  ++i; // i < p == false
  i = 0; // i < p == true
  pthread_join(child[i], 0);
  ++i; // i < p == true
  pthread_join(child[i], 0);
  ++i; // i < p == false
  return 0;
}

[Figure: the specialized control-flow graph: atoi, i = 0, i < p, create, ++i, i < p, create, ++i, i < p, i = 0, i < p, join, ++i, i < p, join, ++i, i < p, return.]
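The following sketch illustrates what the specialized main encodes: the original loops produce the synchronization sequence create, create, join, join only when p is 2, so the unrolled program is valid exactly for inputs that follow that schedule. It records operation names instead of calling Pthreads, and all names (`trace`, `record`, `run_original`, `run_specialized`, `matches_specialized`) are hypothetical helpers, not part of the authors' framework.

```c
#include <assert.h>
#include <string.h>

#define MAX_TRACE 32

/* A trace records synchronization operations in the order they occur. */
typedef struct {
    const char *ops[MAX_TRACE];
    int len;
} trace;

static void record(trace *t, const char *op) {
    if (t->len < MAX_TRACE)
        t->ops[t->len++] = op;
}

/* Original control flow of main: two loops whose trip count depends on p. */
static void run_original(int p, trace *t) {
    int i;
    for (i = 0; i < p; ++i)
        record(t, "create");
    for (i = 0; i < p; ++i)
        record(t, "join");
}

/* Control flow specialized for the schedule create, create, join, join. */
static void run_specialized(trace *t) {
    record(t, "create");
    record(t, "create");
    record(t, "join");
    record(t, "join");
}

/* Two traces are equal if they hold the same operations in the same order. */
static int same_trace(const trace *a, const trace *b) {
    if (a->len != b->len)
        return 0;
    for (int i = 0; i < a->len; ++i)
        if (strcmp(a->ops[i], b->ops[i]) != 0)
            return 0;
    return 1;
}

/* Returns 1 if the original program with input p follows the specialized schedule. */
static int matches_specialized(int p) {
    trace original = {{0}, 0};
    trace specialized = {{0}, 0};
    run_original(p, &original);
    run_specialized(&specialized);
    return same_trace(&original, &specialized);
}
```

In this sketch `matches_specialized(2)` holds while `matches_specialized(1)` and `matches_specialized(3)` do not, which is why the schedule has to be enforced at runtime for the analysis results to stay sound.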

Page 14: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

More Challenges on Control-Flow Specialization

• Ambiguity
  [Diagram: Caller and Callee, with nodes call, call, S1, ret, S2.]
• A schedule has too many synchronizations

Page 15: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  int my_id = global_id++;
  pthread_mutex_unlock(&global_id_lock);
  results[my_id] = compute(my_id);
  return 0;
}

void *worker.clone2(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  int my_id = global_id++;
  pthread_mutex_unlock(&global_id_lock);
  results[my_id] = compute(my_id);
  return 0;
}

[Figure: schedule across three threads, starting from global_id = 0. Thread 0: create, create, join, join. Thread 1: lock, my_id = global_id, global_id++, unlock. Thread 2: lock, my_id = global_id, global_id++, unlock.]

Page 17: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

(Code of worker.clone1 and worker.clone2 as on page 15.)

[Figure, build step: schedule across three threads, starting from global_id = 0. Thread 1's critical section is resolved to my_id = 0, global_id = 1. Thread 2's critical section still reads my_id = global_id, global_id++.]

Page 18: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

(Code of worker.clone1 and worker.clone2 as on page 15.)

[Figure, build step: schedule across three threads, starting from global_id = 0. Thread 1's critical section is resolved to my_id = 0, global_id = 1. Thread 2's critical section is resolved to my_id = 1, global_id = 2.]

Page 19: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  global_id = 1;
  pthread_mutex_unlock(&global_id_lock);
  results[0] = compute(0);
  return 0;
}

void *worker.clone2(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  global_id = 2;
  pthread_mutex_unlock(&global_id_lock);
  results[1] = compute(1);
  return 0;
}

[Figure: schedule across three threads, starting from global_id = 0. Thread 0: create, create, join, join. Thread 1: lock, my_id = 0, global_id = 1, unlock. Thread 2: lock, my_id = 1, global_id = 2, unlock.]
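The constants in the specialized clones can be checked with a small sequential replay. This is a sketch under stated assumptions: it runs the two critical sections one after the other in the enforced order instead of using real threads, `compute` is a stand-in because the slides do not show its body, and the function names are made up for illustration (`worker.clone1` is not a valid C identifier, so the clones are renamed here).

```c
#include <assert.h>

#define P_MAX 2

static int results[P_MAX];
static int global_id;
static int seen_id[P_MAX];   /* my_id observed by each worker during replay */

/* Stand-in for the slide's compute(); the real function is not shown. */
static int compute(int id) { return 100 + id; }

/* Body of worker from the running example without the lock calls.
   The replay runs critical sections one at a time, in schedule order. */
static void worker_body(int worker_slot) {
    int my_id = global_id++;
    seen_id[worker_slot] = my_id;
    results[my_id] = compute(my_id);
}

/* Replay the enforced schedule: first worker, then second worker. */
static void replay_enforced_schedule(void) {
    global_id = 0;
    worker_body(0);   /* corresponds to worker.clone1 */
    worker_body(1);   /* corresponds to worker.clone2 */
}

/* The specialized clones from the slide, with my_id folded to a constant. */
static void specialized_clone1(void) { global_id = 1; results[0] = compute(0); }
static void specialized_clone2(void) { global_id = 2; results[1] = compute(1); }

/* Returns 1 if the specialized clones leave the same state as the replay. */
static int specialized_matches_replay(void) {
    int expected[P_MAX];
    int expected_global;

    replay_enforced_schedule();
    for (int i = 0; i < P_MAX; ++i) {
        expected[i] = results[i];
        results[i] = -1;          /* clear so stale values cannot pass */
    }
    expected_global = global_id;

    global_id = 0;
    specialized_clone1();
    specialized_clone2();
    return results[0] == expected[0] && results[1] == expected[1]
        && global_id == expected_global;
}
```

Once the two writes go to `results[0]` and `results[1]`, a race detector no longer has to assume that both workers may write the same array element.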

Page 20: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

More Challenges on Data-Flow Specialization

• Must/May alias analysis
  – global_id
• Reasoning about integers
  – results[0] = compute(0)
  – results[1] = compute(1)
• Many def-use chains

Page 21: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Evaluation

• Applications
  – Static race detector
  – Alias analyzer
  – Path slicer
• Programs
  – PBZip2 1.1.5
  – aget 0.4.1
  – 8 programs in SPLASH2
  – 7 programs in PARSEC

Page 22: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Static Race Detector: # of False Positives

Program           Original   Specialized
aget                    72             0
PBZip2                 125             0
fft                     96             0
blackscholes             3             0
swaptions              165             0
streamcluster            4             0
canneal                 21             0
bodytrack                4             0
ferret                   6             0
raytrace               215             0
cholesky                31             7
radix                   53            14
water-spatial         2447          1799
lu-contig               18            18
barnes                 370           369
water-nsquared         354           333
ocean                  331           292
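As a reading aid, the table can be summarized with a few lines of arithmetic. The sketch below copies the rows from the slide; the struct and function names are illustrative, and the totals are derived here from the table rather than stated by the authors.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *program;
    int original;      /* false positives reported on the original program */
    int specialized;   /* false positives reported on the specialized program */
} fp_row;

/* Rows copied from the false-positive table on the slide. */
static const fp_row fp_table[] = {
    {"aget", 72, 0},            {"PBZip2", 125, 0},
    {"fft", 96, 0},             {"blackscholes", 3, 0},
    {"swaptions", 165, 0},      {"streamcluster", 4, 0},
    {"canneal", 21, 0},         {"bodytrack", 4, 0},
    {"ferret", 6, 0},           {"raytrace", 215, 0},
    {"cholesky", 31, 7},        {"radix", 53, 14},
    {"water-spatial", 2447, 1799}, {"lu-contig", 18, 18},
    {"barnes", 370, 369},       {"water-nsquared", 354, 333},
    {"ocean", 331, 292},
};

enum { FP_ROWS = sizeof fp_table / sizeof fp_table[0] };

/* Sum of false positives over all programs before specialization. */
static int total_original(void) {
    int sum = 0;
    for (size_t i = 0; i < FP_ROWS; ++i)
        sum += fp_table[i].original;
    return sum;
}

/* Sum of false positives over all programs after specialization. */
static int total_specialized(void) {
    int sum = 0;
    for (size_t i = 0; i < FP_ROWS; ++i)
        sum += fp_table[i].specialized;
    return sum;
}

/* Number of programs whose false positives drop to zero. */
static int programs_with_zero(void) {
    int count = 0;
    for (size_t i = 0; i < FP_ROWS; ++i)
        if (fp_table[i].specialized == 0)
            ++count;
    return count;
}
```

For the 17 programs listed, the totals are 4315 false positives before specialization and 2832 after, and 10 programs drop to zero.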

Page 26: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Static Race Detector: Harmful Races Detected

• 4 in aget
• 2 in radix
• 1 in fft

Page 27: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Precision of Schedule-Aware Alias Analysis

Page 30: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Conclusion and Future Work

• Designed and implemented schedule specialization framework
  – Analyzes the program over a small set of schedules
  – Enforces these schedules at runtime
• Built and evaluated three applications
  – Easy to use
  – Precise
• Future work
  – More applications
  – Similar specialization ideas on sequential programs

Page 31: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Related Work

• Program analysis for parallel programs
  – Chord (PLDI ’06), RADAR (PLDI ’08), FastTrack (PLDI ’09)
• Slicing
  – Horgon (PLDI ’90), Bouncer (SOSP ’07), Jhala (PLDI ’05), Weiser (PhD thesis), Zhang (PLDI ’04)
• Deterministic multithreading
  – DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)
• Program specialization
  – Consel (POPL ’93), Gluck (ISPL ’95), Jørgensen (POPL ’92), Nirkhe (POPL ’92), Reps (PDSPE ’96)

Page 32: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Backup Slides

Page 33: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Specialization Time

Page 34: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Handling Races

• We do not assume data-race freedom.
• We could if our only goal is optimization.

Page 35: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Input Coverage

• Use runtime verification for the inputs not covered
• A small set of schedules can cover a wide range of inputs
