Synchronization Transformations for Parallel Computing

Pedro Diniz and Martin Rinard
Department of Computer Science, University of California, Santa Barbara
http://www.cs.ucsb.edu/~{pedro,martin}
Transcript
Page 1: Synchronization Transformations for Parallel Computing

Synchronization Transformations for Parallel Computing

Pedro Diniz and Martin Rinard

Department of Computer Science, University of California, Santa Barbara

http://www.cs.ucsb.edu/~{pedro,martin}

Page 2: Synchronization Transformations for Parallel Computing

Motivation

Parallel Computing Becomes Dominant Form of Computation

Parallel Machines Require Parallel Software

Parallel Constructs Require New Analysis and Optimization Techniques

Our Goal: Eliminate Synchronization Overhead

Page 3: Synchronization Transformations for Parallel Computing

Talk Outline

• Motivation

• Model of Computation

• Synchronization Optimization Algorithm

• Applications Experience

• Dynamic Feedback

• Related Work

• Conclusions

Page 4: Synchronization Transformations for Parallel Computing

Model of Computation

• Parallel Programs
  • Serial Phases
  • Parallel Phases

• Single Address Space

• Atomic Operations on Shared Data
  • Mutual Exclusion Locks
  • Acquire Constructs
  • Release Constructs

[Diagram: an acquire (Acq), a statement S1, and a release (Rel) form a mutual exclusion region]
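The acquire/release model above can be sketched in a few lines of Python (the names `atomic_add` and `shared_total` are hypothetical; the paper's compiler targets a C++ subset):

```python
import threading

lock = threading.Lock()  # mutual exclusion lock
shared_total = 0

def atomic_add(value):
    """Atomic operation on shared data: acquire, update, release."""
    global shared_total
    lock.acquire()          # Acq: enter the mutual exclusion region
    shared_total += value   # S1: the protected computation
    lock.release()          # Rel: leave the mutual exclusion region

threads = [threading.Thread(target=atomic_add, args=(1,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_total)  # 8
```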

Page 5: Synchronization Transformations for Parallel Computing

Reducing Synchronization Overhead

[Diagram: a mutual exclusion region Acq; S1; S2; Rel, followed by a statement S3]

Page 6: Synchronization Transformations for Parallel Computing

[Diagram: Rel and Acq constructs after lock movement]

Page 7: Synchronization Transformations for Parallel Computing

Synchronization Optimization

Idea: Replace Computations that Repeatedly Acquire and Release the Same Lock with a Computation that Acquires and Releases the Lock Only Once

Result: Reduction in the Number of Executed Acquire and Release Constructs

Mechanism: Lock Movement Transformations and Lock Cancellation Transformations
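A minimal sketch of the idea (hypothetical Python, not the paper's compiler output): a loop that acquires and releases the same lock on every iteration is rewritten to acquire and release it once.

```python
import threading

lock = threading.Lock()
results = []

# Before: one acquire/release pair per iteration.
def update_each(items):
    for x in items:
        lock.acquire()
        results.append(x)
        lock.release()

# After: the lock is acquired once and released once,
# eliminating len(items) - 1 acquire/release pairs.
def update_once(items):
    lock.acquire()
    for x in items:
        results.append(x)
    lock.release()

update_each([1, 2])
update_once([3, 4])
print(results)  # [1, 2, 3, 4]
```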

Page 8: Synchronization Transformations for Parallel Computing

Lock Cancellation

Page 9: Synchronization Transformations for Parallel Computing

Acquire Lock Movement

Page 10: Synchronization Transformations for Parallel Computing

Release Lock Movement

Page 11: Synchronization Transformations for Parallel Computing

Synchronization Optimization Algorithm

Overview:

• Find Two Mutual Exclusion Regions With the Same Lock

• Expand Mutual Exclusion Regions Using Lock Movement Transformations Until They are Adjacent

• Coalesce Using Lock Cancellation Transformation to Form a Single Larger Mutual Exclusion Region
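The coalescing step can be illustrated with a one-dimensional sketch over a straight-line sequence of events (the real algorithm works on an interprocedural control flow graph; `coalesce` and the event encoding here are hypothetical):

```python
# Hedged sketch: once lock movement has made two regions adjacent,
# a release immediately followed by an acquire of the same lock
# cancels out, merging the regions into one.

def coalesce(events):
    """events: list of ('acq', lock), ('rel', lock), or ('stmt', s)."""
    out = []
    for e in events:
        if e[0] == 'acq' and out and out[-1] == ('rel', e[1]):
            out.pop()        # lock cancellation: Rel; Acq -> (nothing)
        else:
            out.append(e)
    return out

before = [('acq', 'L'), ('stmt', 'S1'), ('rel', 'L'),
          ('acq', 'L'), ('stmt', 'S2'), ('rel', 'L')]
after = coalesce(before)
print(after)  # [('acq', 'L'), ('stmt', 'S1'), ('stmt', 'S2'), ('rel', 'L')]
```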

Page 12: Synchronization Transformations for Parallel Computing

Interprocedural Control Flow Graph

Page 13: Synchronization Transformations for Parallel Computing

Acquire Movement Paths

Page 14: Synchronization Transformations for Parallel Computing

Release Movement Paths

Page 15: Synchronization Transformations for Parallel Computing

Migration Paths and Meeting Edge

Page 16: Synchronization Transformations for Parallel Computing

Intersection of Paths

Page 17: Synchronization Transformations for Parallel Computing

Compensation Nodes

Page 18: Synchronization Transformations for Parallel Computing

Final Result

Page 19: Synchronization Transformations for Parallel Computing

Synchronization Optimization Trade-Off

• Advantage:
  • Reduces Number of Executed Acquires and Releases
  • Reduces Acquire and Release Overhead

• Disadvantage: May Introduce False Exclusion
  • Multiple Processors Attempt to Acquire Same Lock
  • Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion Region
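A hedged Python sketch of the disadvantage (hypothetical worker names): after coalescing, the lock is held across code that was originally outside any mutual exclusion region, so another processor wanting the same lock must wait for that unrelated work too.

```python
import threading

lock = threading.Lock()
log = []

def worker_coalesced():
    # After coalescing, the lock is held across S2, which was
    # originally outside any mutual exclusion region.
    with lock:
        log.append('S1')               # originally protected
        unrelated = sum(range(1000))   # S2: originally unprotected work
        log.append('S3')               # originally protected

def worker_other():
    # This thread must wait for the whole coalesced region,
    # including the unrelated work: false exclusion.
    with lock:
        log.append('other')

t1 = threading.Thread(target=worker_coalesced)
t2 = threading.Thread(target=worker_other)
t1.start(); t1.join()
t2.start(); t2.join()
print(log)  # ['S1', 'S3', 'other']
```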

Page 20: Synchronization Transformations for Parallel Computing

False Exclusion Policy

Goal: Limit Potential Severity of False Exclusion

Mechanism: Constrain the Application of Basic Transformations

• Original: Never Apply Transformations
• Bounded: Apply Transformations Only on Cycle-Free Subgraphs of ICFG
• Aggressive: Always Apply Transformations

Page 21: Synchronization Transformations for Parallel Computing

Experimental Results

• Automatic Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]

• Set of Complete Scientific Applications (C++ Subset)
  • Barnes-Hut N-Body Solver (1500 Lines of Code)
  • Liquid Water Simulation Code (1850 Lines of Code)
  • Seismic Modeling String Code (2050 Lines of Code)

• Different False Exclusion Policies

• Performance of Generated Parallel Code on Stanford DASH Shared-Memory Multiprocessor

Page 22: Synchronization Transformations for Parallel Computing

Lock Overhead

Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks

[Bar charts of Percentage Lock Overhead (0-60%):
  Barnes-Hut (16K Particles): Original, Bounded, Aggressive
  Water (512 Molecules): Original, Bounded, Aggressive
  String (Big Well Model): Original, Aggressive]

Page 23: Synchronization Transformations for Parallel Computing

Contention Overhead

Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors

[Line graphs of Contention Percentage (0-100%) vs. Processors (0-16) for Barnes-Hut (16K Bodies), Water (512 Molecules), and String (Big Well Model), with Original, Bounded, and Aggressive curves]

Page 24: Synchronization Transformations for Parallel Computing

Performance Results: Barnes-Hut

[Graph: Speedup (0-16) vs. Number of Processors (0-16) for Barnes-Hut (16384 bodies), with Ideal, Aggressive, Bounded, and Original curves]

Page 25: Synchronization Transformations for Parallel Computing

Performance Results: Water

[Graph: Speedup (0-16) vs. Number of Processors (0-16) for Water (512 Molecules), with Ideal, Aggressive, Bounded, and Original curves]

Page 26: Synchronization Transformations for Parallel Computing

Performance Results: String

[Graph: Speedup (0-16) vs. Number of Processors (0-16) for String (Big Well Model), with Ideal, Original, and Aggressive curves]

Page 27: Synchronization Transformations for Parallel Computing

Choosing Best Policy

• Best False Exclusion Policy May Depend On
  • Topology of Data Structures
  • Dynamic Schedule of Computation

• Information Required to Choose Best Policy Unavailable at Compile Time

• Complications
  • Different Phases May Have Different Best Policy
  • In Same Phase, Best Policy May Change Over Time

Page 28: Synchronization Transformations for Parallel Computing

Solution: Dynamic Feedback

• Generated Code Consists of
  • Sampling Phases: Measure Performance of Different Policies
  • Production Phases: Use Best Policy From Sampling Phase

• Periodically Resample to Discover Changes in Best Policy

• Guaranteed Performance Bounds
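The sampling/production structure can be sketched as follows (the cost model and function names are hypothetical stand-ins, not the paper's generated-code mechanism):

```python
# Hedged sketch of dynamic feedback: sample each false exclusion
# policy on a small slice of work, then run a production phase
# with the best one.

POLICIES = ['original', 'bounded', 'aggressive']

def run_with_policy(policy, work):
    """Stand-in for executing a parallel phase under a policy;
    returns a simulated cost (lower is better)."""
    cost_per_unit = {'original': 3.0, 'bounded': 2.0, 'aggressive': 1.0}
    return cost_per_unit[policy] * work

def dynamic_feedback(total_work, sample_work=1):
    # Sampling phase: measure each policy on a small slice of work.
    costs = {p: run_with_policy(p, sample_work) for p in POLICIES}
    done = sample_work * len(POLICIES)
    best = min(costs, key=costs.get)
    # Production phase: use the best policy for the remaining work.
    run_with_policy(best, total_work - done)
    return best

print(dynamic_feedback(100))  # 'aggressive' under this cost model
```

In the real system the code would periodically re-enter a sampling phase, since the best policy can change as the computation evolves.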

Page 29: Synchronization Transformations for Parallel Computing

Dynamic Feedback

[Graph: Overhead vs. Time, showing the Aggressive, Original, and Bounded code versions measured during a Sampling Phase, the Aggressive code version running through the Production Phase, then another Sampling Phase]

Page 30: Synchronization Transformations for Parallel Computing

Dynamic Feedback : Barnes-Hut

[Graph: Speedup (0-16) vs. Number of Processors (0-16) for Barnes-Hut (16384 bodies), with Ideal, Aggressive, Dynamic Feedback, Bounded, and Original curves]

Page 31: Synchronization Transformations for Parallel Computing

Dynamic Feedback : Water

[Graph: Speedup (0-16) vs. Number of Processors (0-16) for Water (512 Molecules), with Ideal, Aggressive, Dynamic Feedback, Bounded, and Original curves]

Page 32: Synchronization Transformations for Parallel Computing

Dynamic Feedback : String

[Graph: Speedup (0-16) vs. Number of Processors (0-16) for String (Big Well Model), with Ideal, Original, Aggressive, and Dynamic Feedback curves]

Page 33: Synchronization Transformations for Parallel Computing

Related Work

• Parallel Loop Optimizations (e.g. [Tseng:PPoPP95])
  • Array-Based Scientific Computations
  • Barriers vs. Cheaper Mechanisms

• Concurrent Object-Oriented Programs (e.g. [PZC:POPL95])
  • Merge Access Regions for Invocations of Exclusive Methods

• Concurrent Constraint Programming
  • Bring Together Ask and Tell Constructs

• Efficient Synchronization Algorithms
  • Efficient Implementations of Synchronization Primitives

Page 34: Synchronization Transformations for Parallel Computing

Conclusions

• Synchronization Optimizations
  • Basic Synchronization Transformations for Locks
  • Synchronization Optimization Algorithm

• Integrated into Prototype Parallelizing Compiler
  • Object-Based Programs with Dynamic Data Structures
  • Commutativity Analysis

• Experimental Results
  • Optimizations Have a Significant Performance Impact
  • With Optimizations, Applications Perform Well

• Dynamic Feedback

