VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Date post: 31-Jan-2016
Page 1: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

March 17, 2006 Zhiyi’s RSL 1

VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Dr Zhiyi Huang
Dept of Computer Science
University of Otago
New Zealand

Page 2: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Motivation

DSM applications are not as efficient as MPI on cluster computers.

[Figure: chart comparing TMK and MPI; horizontal axis 2-p, 4-p, 8-p, 16-p, 32-p; vertical axis 0 to 25]

Page 3: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

VOPP

- VODCA is a system supporting View-Oriented Parallel Programming (VOPP)
- Why a new programming style?
  - Improve the performance of DSM applications on cluster computers
  - Provide a programming style better than MPI
  - Message passing is notorious as a difficult programming style

Page 4: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

What is a view?

- Suppose $M$ is the set of data objects in shared memory
- A view is a group of data objects from the shared memory: $V$, $V \subseteq M$
- Views must not overlap each other: $\forall V_i, V_j,\ i \neq j:\ V_i \cap V_j = \emptyset$
- Suppose there are $n$ views in shared memory: $\bigcup_{i=1}^{n} V_i = M$
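
The two conditions above (no overlap, and the views together cover the shared memory) say that the views form a partition of M. A minimal sketch of that check, assuming data objects are modeled as strings and views as Python sets; the function name is hypothetical and not part of VODCA:

```python
from itertools import combinations


def is_valid_view_partition(shared_memory, views):
    """Return True if the views are pairwise disjoint and together cover shared_memory."""
    # Condition 1: no two distinct views share a data object.
    for a, b in combinations(views, 2):
        if a & b:
            return False
    # Condition 2: the union of all views equals the shared memory M.
    covered = set().union(*views)
    return covered == set(shared_memory)


M = {"a", "b", "c", "d"}
good_views = [{"a", "b"}, {"c"}, {"d"}]
overlapping_views = [{"a", "b"}, {"b", "c"}, {"d"}]
incomplete_views = [{"a"}, {"b"}]
```

Here `good_views` satisfies both conditions, `overlapping_views` violates the first (object "b" is in two views), and `incomplete_views` violates the second.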

Page 5: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

VOPP Requirements

- The programmer should divide the shared data into a number of views according to the data flow of the parallel algorithm.
- A view should consist of data objects that are always processed as an atomic set in a program.
- Views can be created and destroyed at any time.
- Each view has a unique view identifier.

Page 6: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

VOPP Requirements (cont.)

- View primitives such as acquire_view and release_view must be used when a view is accessed.

    acquire_view(View_A);
    A = A + 1;
    release_view(View_A);

- acquire_Rview and release_Rview can be used when a view is only read by a processor.
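
To make the semantics of the four primitives concrete, here is a minimal single-machine sketch in Python. It assumes that acquire_view grants exclusive access and that acquire_Rview allows several concurrent readers. The `View` class, the `views` table, and the internals are hypothetical stand-ins using threads, not the VODCA library, which runs across cluster nodes.

```python
import threading


class View:
    """Hypothetical stand-in for one view: its data plus a readers-writer lock."""

    def __init__(self, data=None):
        self.data = data
        self.cond = threading.Condition()
        self.readers = 0      # number of threads holding the view read-only
        self.writer = False   # True while one thread holds the view exclusively


views = {}  # view identifier -> View


def acquire_view(vid):
    v = views[vid]
    with v.cond:
        while v.writer or v.readers > 0:
            v.cond.wait()
        v.writer = True


def release_view(vid):
    v = views[vid]
    with v.cond:
        v.writer = False
        v.cond.notify_all()


def acquire_Rview(vid):
    v = views[vid]
    with v.cond:
        while v.writer:
            v.cond.wait()
        v.readers += 1


def release_Rview(vid):
    v = views[vid]
    with v.cond:
        v.readers -= 1
        if v.readers == 0:
            v.cond.notify_all()


def increment_many(vid, times):
    for _ in range(times):
        acquire_view(vid)
        views[vid].data += 1      # the "A = A + 1" step from the slide
        release_view(vid)


def run_counter_demo(workers=4, times=500):
    views["View_A"] = View(0)
    threads = [threading.Thread(target=increment_many, args=("View_A", times))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return views["View_A"].data
```

Because every update is bracketed by acquire_view and release_view, no increments are lost: `run_counter_demo(4, 500)` returns 2000.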

Page 7: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Example

A VOPP program for a producer/consumer problem:

    if (prod_id == 0) {
        acquire_view(1);      /* producer gets exclusive access to view 1 */
        produce(x);
        release_view(1);
    }
    barrier(0);               /* all processes wait until x has been produced */
    acquire_Rview(1);         /* every process reads view 1 */
    consume(x);
    release_Rview(1);
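
The program above can be imitated on a single machine to see the control flow. This is a sketch under stated assumptions: threads stand in for cluster processes, one ordinary lock per view stands in for the view primitives (so read-only acquisition is simply treated like exclusive acquisition), and the produced value 42 is arbitrary. None of the names below come from the VODCA library.

```python
import threading

NPROCS = 3
view_locks = {1: threading.Lock()}    # one lock per view identifier
shared = {"x": None}                  # the data object that belongs to view 1
barrier = threading.Barrier(NPROCS)   # plays the role of barrier(0)
consumed = [None] * NPROCS            # what each process read


def acquire_view(vid):
    view_locks[vid].acquire()


def release_view(vid):
    view_locks[vid].release()


# Simplification: read-only acquisition reuses the exclusive lock.
acquire_Rview = acquire_view
release_Rview = release_view


def worker(prod_id):
    if prod_id == 0:
        acquire_view(1)
        shared["x"] = 42              # produce(x)
        release_view(1)
    barrier.wait()                    # nobody consumes before x is produced
    acquire_Rview(1)
    consumed[prod_id] = shared["x"]   # consume(x)
    release_Rview(1)


def run():
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(NPROCS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(consumed)
```

Calling `run()` returns `[42, 42, 42]`: the barrier orders production before consumption, and the view primitives guard the accesses to `x`.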

Page 8: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Advantages of VOPP

- Keep the convenience of shared memory programming
- Focus on data partitioning and data access instead of data races and mutual exclusion
  - View primitives automatically achieve mutual exclusion
  - View primitives are not an extra burden
- The programmer can fine-tune the parallel algorithm by careful view partitioning

Page 9: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Philosophy of VOPP

- Shared memory is a critical resource that needs to be used with care
  - If there is no need to use shared memory, don't use it
  - Justification is wanted before a view is created

Page 10: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

VOPP vs. MPI

- Easier for programmers than MPI
  - For problems like a task queue, programming with MPI is horrific.
- Can mimic any finely tuned MPI program
  - Shared message -> view
  - Send/recv -> acquire_view
- Essential differences
  - View is location transparent
  - More barriers in VOPP

Page 11: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Implementation

- VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing
- VODCA version 1.0
  - Released as open source software
  - A library that runs in user space
  - Based on View-based Consistency
  - Uses an efficient consistency protocol, VOUPID

Page 12: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

View-based Consistency

- Condition for View-based Consistency
  - Before a processor Pi is allowed to access a view by calling acquire_view or acquire_Rview, all previous write accesses to data objects of the view must be performed with respect to Pi according to their causal order.
- In VOPP, barriers are only used for synchronization and have nothing to do with consistency maintenance for DSM.

Page 13: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Consistency protocols

- They are page based
- Update protocol
  - Modify immediately
- Invalidation protocol
  - Use a write notice to invalidate a page
  - When the page is accessed, a page fault causes the fetch of diffs, which are applied to the page
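
The slide mentions diffs without defining them. A minimal sketch, assuming the twin-and-diff technique commonly used in page-based DSM systems: before a page is written, a pristine copy (the twin) is kept, and the diff records only the bytes that changed. The page size, the helper names, and the list-of-pairs diff format are illustrative assumptions, not VODCA's actual data structures.

```python
def make_diff(twin, page):
    """Record (offset, new_byte) for every byte that differs from the twin."""
    return [(i, new) for i, (old, new) in enumerate(zip(twin, page)) if old != new]


def apply_diff(page, diff):
    """Bring a stale copy of the page up to date by replaying the diff."""
    for offset, value in diff:
        page[offset] = value


PAGE_SIZE = 16                        # tiny page, for illustration only
remote_copy = bytearray(PAGE_SIZE)    # stale copy held by another process
local_page = bytearray(PAGE_SIZE)

twin = bytearray(local_page)          # pristine copy taken before writing
local_page[3] = 0x7F                  # local writes to the page
local_page[10] = 0x01

diff = make_diff(twin, local_page)    # only two entries, not the whole page
apply_diff(remote_copy, diff)         # remote copy now matches local_page
```

Sending the diff rather than the whole page is what keeps the traffic proportional to the amount of data actually modified.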

Page 14: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Consistency protocols (cont.)

- Home-based protocol
  - Based on the invalidation protocol, but
  - For each page, use a copy as its home
  - When a diff is created, it is applied to the home copy immediately
  - When the page is accessed, a page fault causes the fetch of the home copy (advantage: resolves the diff accumulation problem)

Page 15: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

The VOUPID protocol

- View-Oriented Update Protocol with Integrated Diff
  - Based on the update protocol
  - Diffs of a page of a view are merged into a single diff
  - The single diff is used to update the page when the view is acquired
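
The merging step can be illustrated with a small sketch. It assumes each diff is a mapping from byte offset to new value and that diffs are merged oldest first, so a later write to the same offset wins. This only illustrates the idea of an integrated diff; the representation and function names are hypothetical and are not taken from the VODCA source.

```python
def merge_diffs(diffs):
    """Merge diffs of one page (oldest first) into a single integrated diff."""
    merged = {}
    for diff in diffs:
        merged.update(diff)   # a later write to the same offset overwrites
    return merged


def apply_diff(page, diff):
    for offset, value in diff.items():
        page[offset] = value


def demo():
    d1 = {0: 1, 5: 9}         # first update to the page
    d2 = {5: 7, 8: 3}         # later update; offset 5 is written again

    # Applying the diffs one by one ...
    page_sequential = bytearray(16)
    for d in (d1, d2):
        apply_diff(page_sequential, d)

    # ... gives the same page as applying the single integrated diff.
    integrated = merge_diffs([d1, d2])
    page_integrated = bytearray(16)
    apply_diff(page_integrated, integrated)

    return integrated, page_sequential == page_integrated
```

The integrated diff holds three entries instead of four, since the overwritten value at offset 5 is never transmitted or applied.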

Page 16: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Experiment

- Use a cluster computer
  - The cluster computer, in Tsinghua Univ., consists of 128 Itanium 2 running Linux 2.4, connected by InfiniBand. Each node has two 1.3 GHz processors and 4 Gbytes RAM. We run two processes on each node.
- We used four applications: Integer Sort (IS), Gauss, Successive Over-Relaxation (SOR), and Neural Network (NN).

Page 17: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Related systems

- TreadMarks (TMK) is a state-of-the-art Distributed Shared Memory system based on traditional parallel programming.
- Message Passing Interface (MPI) is a standard for message passing-based parallel programming. We used LAM/MPI.

Page 18: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Performance of NN

[Figure: chart comparing VODCA, TMK, and MPI; horizontal axis 2-p, 4-p, 8-p, 16-p, 32-p; vertical axis 0 to 35]

Page 19: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Performance of IS

[Figure: chart comparing VODCA, TMK, and MPI; horizontal axis 2-p, 4-p, 8-p, 16-p, 32-p; vertical axis 0 to 25]

Page 20: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Performance of SOR

[Figure: chart comparing VODCA, TMK, and MPI; horizontal axis 2-p, 4-p, 8-p, 16-p, 32-p; vertical axis 0 to 16]

Page 21: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Performance of Gauss

[Figure: chart comparing VODCA, TMK, and MPI; horizontal axis 2-p, 4-p, 8-p, 16-p, 32-p; vertical axis 0 to 25]

Page 22: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Future work on VOPP

- More benchmarks/applications
- Performance evaluation on larger clusters
- Optimized implementation of barriers for VOPP
- More auxiliary utilities for VOPP programmers
- A view-based debugger for VOPP
- A fault-tolerant system for VODCA

Page 23: VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Questions?

