Branch-and-Bound with Peer-to-Peer for Large-Scale Grid

Branch-and-Bound with Peer-to-Peer for Large-Scale Grids

Alexandre di Costanzo

INRIA - CNRS - I3S - Université de Sophia Antipolis

Ph.D. defense, Friday 12th October 2007

Advisor: Prof. Denis Caromel

The Big Picture

Objective

Solving combinatorial optimization problems with Grids

Approach

Parallel Branch-and-Bound and Peer-to-Peer

Contributions

1. Branch-and-Bound framework for Grids

2. Peer-to-Peer Infrastructure for Grids

3. Large-scale experiments

Agenda

Context, Problem, and Related Work

Contributions

Branch-and-Bound Framework for Grids

Desktop Grid with Peer-to-Peer

Mixing Desktops & Clusters

Perspectives & Conclusion

Context

Combinatorial Optimization Problems (COPs)

costly to solve (finding the best solution)

Branch-and-Bound (BnB)

well adapted for solving COPs

relatively easy to provide a parallel version

Grid Computing

large pool of resources

large-scale environment

Branch-and-Bound

Consists of a partial enumeration of all feasible solutions and returns the guaranteed optimal solution

Feasible solutions are organized as a tree: the search-tree

3 operations:

Branching: split into sub-problems

Bounding: compute lower/upper bounds (objective function)

Pruning: eliminate bad branches

Well adapted for solving COPs [Papadimitriou 98]

returns the best combination out of all

[Figure: search-tree annotated with the three operations. Branching: split into sub-problems. Bounding: compute local lower/upper bounds. Pruning: local lower bound higher than the global one. Pruned parts of the tree are neither generated nor explored.]
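The three operations can be illustrated with a minimal sequential sketch. It is not taken from the talk: it assumes a small minimization problem (assigning each row of a cost matrix to a distinct column), and the names `lower_bound` and `branch_and_bound` are hypothetical.

```python
from typing import List, Optional, Tuple


def lower_bound(cost: List[List[int]], used_cols: Tuple[int, ...],
                next_row: int, cost_so_far: int) -> int:
    """Bounding: optimistic estimate for a partial assignment.

    Every remaining row takes its cheapest still-free column, even if two
    rows pick the same one, so the value never exceeds the true best cost.
    """
    n = len(cost)
    bound = cost_so_far
    for row in range(next_row, n):
        bound += min(cost[row][c] for c in range(n) if c not in used_cols)
    return bound


def branch_and_bound(cost: List[List[int]]) -> Tuple[float, Optional[Tuple[int, ...]], int]:
    """Return (best cost, best assignment, number of explored nodes)."""
    n = len(cost)
    best_cost: float = float("inf")          # global bound (best solution so far)
    best_assignment: Optional[Tuple[int, ...]] = None
    explored = 0
    stack = [(0, (), 0)]                     # (next row, chosen columns, cost so far)
    while stack:
        row, cols, so_far = stack.pop()
        explored += 1
        if row == n:                         # leaf: a complete feasible solution
            if so_far < best_cost:
                best_cost, best_assignment = so_far, cols
            continue
        for c in range(n):                   # Branching: one sub-problem per free column
            if c in cols:
                continue
            child_cols = cols + (c,)
            child_cost = so_far + cost[row][c]
            local_bound = lower_bound(cost, child_cols, row + 1, child_cost)
            if local_bound >= best_cost:     # Pruning: cannot beat the global bound
                continue
            stack.append((row + 1, child_cols, child_cost))
    return best_cost, best_assignment, explored


if __name__ == "__main__":
    example = [[9, 2, 7, 8], [6, 4, 3, 7], [5, 8, 1, 8], [7, 6, 9, 4]]
    print(branch_and_bound(example))
```

For this 4x4 matrix the optimal cost is 13, and fewer nodes are explored than the 65 nodes of the full enumeration tree because pruned sub-trees are never generated.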

Parallel Branch-and-Bound

COPs are difficult to solve:

enumeration size & NP-hard class

Many studies on parallel approaches [Gendron 94, Crainic 06]

node-based: parallel bounding on sub-problems

tree-based: building the tree in parallel

multi-search: several trees are generated in parallel

Tree-based is the most studied [Authié 95, Crainic 06]

the solution tree is not known beforehand

no part of the tree may be estimated at compilation

tasks are dynamically generated

task allocations to processors must be done dynamically

distribution issues, such as load-balancing & information sharing

Sharing a global bound for optimizing the prune operation

Proposition: Parallel BnB + Grid

Tree-based is the best known by users & the related difficulties are known

BnB Related Work

Frameworks | Algorithms | Parallelization | Machines
PUBB | Low-level, Basic B&B | SPMD | Cluster/PVM
BOB++ | Low-level, Basic B&B | SPMD | Cluster/MPI
PPBB-Lib | Basic B&B | SPMD | Cluster/PVM
PICO | Basic B&B, Mixed-integer LP | hier. master-worker | Cluster/MPI
MallBa | Low-level, Basic B&B | SPMD | Cluster/MPI
ZRAM | Low-level, Basic B&B | SPMD | Cluster/PVM
ALPS/BiCePS | Low-level, Basic B&B, Mixed-integer LP, Branch&Price&Cut | hier. master-worker | Cluster/MPI
Metacomputing MW | Basic B&B | master-worker | Grids/Condor
Symphony | Mixed-integer LP, Branch&Price&Cut | master-worker | Cluster/PVM

Most previous work is based on SPMD and targets clusters: better for sharing bounds

Grid Computing

Distributed shared computing infrastructure

multi-institutional virtual organization

Provides a large pool of resources

New challenges

geographically distributed

deployment

scalability

communication

fault-tolerance

multiple administrative domains, heterogeneity, high performance, programming model, etc.

Parallel BnB related work: SPMD

Grids are not adapted for SPMD (heterogeneity, latency, etc.)

Grid Computing

Grids involve new concepts for development & execution of applications

Grid Applications & Portals

Grid Programming: Models, Tools, High-Level Access to Middleware

Grid Middleware Infrastructure: Super-Schedulers, Resource Trading, Information, Security, etc.

Grid Fabric: Schedulers, Networking, OS, Federated Hardware Resources

Branch-and-Bound API

Challenges

Combinatorial Optimization Problems: costly to solve

Branch-and-Bound: adapted for solving COPs, easy to parallelize

Grid Computing: huge number of resources

Use Branch-and-Bound on Grids for solving COPs

Efficient communication on Grids is difficult: problem with sharing bounds

BnB for Grids Related Work

Aida et al. focus on the design: hierarchical master-worker scales on Grids

Foster et al. fully decentralized approach: communication overhead

ParadisEO focuses on meta-heuristics (not exact solutions): master-worker

Skeletons with farm or divide-and-conquer

Satin for divide-and-conquer

BnB on Grids: Problems and Solutions

Latency ➟ Asynchronous communications

Scalability ➟ Hierarchical master-worker

Solution tree size ➟ Dynamically generated by splitting tasks

Share the best bounds ➟ Efficient parallelism & communication

Faults ➟ Fault-tolerance

Objective: hide Grid difficulties from users, especially communication problems

BnB Framework - Entities and Roles

Root Task: implemented by users; objective function and splitting/branching operation

Master: entry point; splits the problem into tasks; collects partial results ➟ the best solution

Sub-Master: intermediary between master and workers

Worker: computes tasks

BnB Framework - Architecture

[Figure: hierarchical master-worker architecture. The Master holds the Problem, the Pending tasks, and the Partial results; Sub-masters sit between the Master and the Workers, which are grouped by Cluster. Animation steps: First splitting, Ask task, Got task, Computing, Sending result.]
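The task flow of the architecture (first splitting, ask task, computing, sending result) can be mimicked with a small single-process simulation. This is only a sketch under assumptions: the classes `Master` and `Worker`, the toy `objective`, and the range-splitting rule are hypothetical and do not reproduce the Grid'BnB API; sub-masters and real distribution are left out.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Task = Tuple[int, int]  # half-open integer range [lo, hi) still to be searched


def objective(x: int) -> int:
    """Toy objective to minimize (assumption, for illustration only)."""
    return (x - 37) ** 2 + 5


class Master:
    """Entry point: keeps the pending tasks and the partial results."""

    def __init__(self, root_task: Task, split_threshold: int) -> None:
        self.pending: Deque[Task] = deque([root_task])
        self.partial_results: List[Tuple[int, int]] = []
        self.split_threshold = split_threshold

    def ask_task(self) -> Optional[Task]:
        return self.pending.popleft() if self.pending else None

    def add_tasks(self, tasks: List[Task]) -> None:
        self.pending.extend(tasks)

    def send_result(self, result: Tuple[int, int]) -> None:
        self.partial_results.append(result)

    def best(self) -> Tuple[int, int]:
        return min(self.partial_results)


class Worker:
    def __init__(self, master: Master) -> None:
        self.master = master
        self.done = 0

    def step(self) -> bool:
        """Ask for one task; split it if it is large, otherwise compute it."""
        task = self.master.ask_task()
        if task is None:
            return False
        lo, hi = task
        if hi - lo > self.master.split_threshold:
            mid = (lo + hi) // 2          # dynamic splitting into two sub-tasks
            self.master.add_tasks([(lo, mid), (mid, hi)])
        else:
            best = min((objective(x), x) for x in range(lo, hi))
            self.master.send_result(best)  # partial result sent to the master
        self.done += 1
        return True


def run(num_workers: int = 4) -> Tuple[Tuple[int, int], List[int]]:
    master = Master(root_task=(0, 100), split_threshold=10)
    workers = [Worker(master) for _ in range(num_workers)]
    busy = True
    while busy:                            # round-robin until no task is left
        busy = False
        for worker in workers:
            if worker.step():
                busy = True
    return master.best(), [w.done for w in workers]


if __name__ == "__main__":
    print(run())
```

Because workers pull tasks when they are idle, the work spreads across them without an explicit scheduling policy.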

BnB Framework - Solutions

Context: ProActive Java Grid middleware

latency ➟ asynchronous communication

underlying Grid infrastructure ➟ deployment framework (abstraction)

Implement the tree-based parallel approach

Master-worker architecture

Problem: workers need to share bounds

difficult to adapt SPMD for Grids (heterogeneity, distribution, etc.)

Solution 1: Master keeps the bound

- previous work shows that it does not scale [Aida 2003]

Solution 2: Message framework (Enterprise Service Bus)

- Grid middleware dependent / good for SOA

Solution 3: Broadcasting

- 1-to-n communication cannot scale

✓ hierarchical broadcasting scales [Baduel 05]

Organizing Communications for Broadcasting

Idea: Grids are composed of clusters ➟ organizing Workers in groups

clusters are high-performance communication environments

Solution:

add a new entity for organizing communications: Leader

Leader is a Worker chosen by the Master for each group

Process to update Bounds:

1. the Worker broadcasts the new Bound inside its group

2. the group Leader broadcasts the new Bound to all Leaders

3. each Leader broadcasts the new value inside its group
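The three-step update can be sketched as a small simulation that counts how many messages stay inside a cluster and how many cross cluster boundaries. The function name `hierarchical_broadcast`, the dictionaries, and the worker names are assumptions made for illustration; the real framework relies on the Grid middleware for group communication.

```python
from typing import Dict, List, Tuple


def hierarchical_broadcast(groups: Dict[str, List[str]],
                           leaders: Dict[str, str],
                           sender: str,
                           new_bound: int,
                           bounds: Dict[str, int]) -> Tuple[int, int]:
    """Propagate a better (lower) bound; return (intra-group, inter-group) message counts."""
    sender_group = next(g for g, members in groups.items() if sender in members)
    intra = inter = 0

    # Step 1: the Worker broadcasts the new bound inside its own group.
    bounds[sender] = min(bounds[sender], new_bound)
    for worker in groups[sender_group]:
        if worker != sender:
            bounds[worker] = min(bounds[worker], new_bound)
            intra += 1

    # Step 2: the group's Leader forwards the bound to the other Leaders.
    for group, leader in leaders.items():
        if group != sender_group:
            bounds[leader] = min(bounds[leader], new_bound)
            inter += 1

    # Step 3: each other Leader broadcasts the bound inside its group.
    for group, leader in leaders.items():
        if group != sender_group:
            for worker in groups[group]:
                if worker != leader:
                    bounds[worker] = min(bounds[worker], new_bound)
                    intra += 1

    return intra, inter


if __name__ == "__main__":
    groups = {g: [f"{g}{i}" for i in range(4)] for g in ("a", "b", "c")}
    leaders = {g: members[0] for g, members in groups.items()}
    bounds = {w: 100 for members in groups.values() for w in members}
    print(hierarchical_broadcast(groups, leaders, "a2", 42, bounds))
```

With three groups of four workers, only 2 messages cross cluster boundaries, whereas a flat 1-to-n broadcast from the same worker would send 8 messages to other clusters.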

BnB framework - Communications for Sharing Bound

[Figure: Workers grouped by Cluster, one Leader per group, with Sub-masters above. Animation: a Worker finds a new best bound, which is propagated inside its group, between Leaders, and then inside the other groups.]

Grid middleware provides communications

BnB Search Strategies

The improvement of the Bound

1. depends on how it is shared (communication)

2. depends on how the search-tree is generated:

Classical

Depth-First Search

Breadth-First Search

Contribution

First-In-First-Out (FIFO)

Priority

open API ...
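How the order of pending tasks shapes the search can be shown with a pluggable task container. This is a hedged sketch, not the framework's open API: the class `PendingTasks`, the strategy names, and the use of a lower bound as priority are assumptions.

```python
import heapq
from collections import deque
from itertools import count
from typing import Any, List, Tuple


class PendingTasks:
    """Container whose pop order defines the search strategy.

    "lifo" behaves like depth-first search, "fifo" serves tasks in arrival
    order (breadth-first like), "priority" serves the smallest priority
    first (for example the most promising lower bound).
    """

    def __init__(self, strategy: str) -> None:
        if strategy not in ("lifo", "fifo", "priority"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.strategy = strategy
        self._heap: List[Tuple[int, int, Any]] = []
        self._queue: deque = deque()
        self._tie = count()  # keeps heap ordering stable for equal priorities

    def push(self, task: Any, priority: int = 0) -> None:
        if self.strategy == "priority":
            heapq.heappush(self._heap, (priority, next(self._tie), task))
        else:
            self._queue.append(task)

    def pop(self) -> Any:
        if self.strategy == "priority":
            return heapq.heappop(self._heap)[2]
        if self.strategy == "fifo":
            return self._queue.popleft()
        return self._queue.pop()

    def __len__(self) -> int:
        return len(self._heap) if self.strategy == "priority" else len(self._queue)


def drain(strategy: str, tasks: List[Tuple[str, int]]) -> List[str]:
    """Push (name, lower bound) pairs, then return the order they are served in."""
    pending = PendingTasks(strategy)
    for name, bound in tasks:
        pending.push(name, priority=bound)
    return [pending.pop() for _ in range(len(pending))]


if __name__ == "__main__":
    sample = [("a", 7), ("b", 3), ("c", 5)]
    for strategy in ("lifo", "fifo", "priority"):
        print(strategy, drain(strategy, sample))
```

Keeping the ordering behind one small interface is what makes it cheap to implement and compare strategies on the same problem.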

BnB Framework: Fault-Tolerance

Managing user exceptions ➟ computation stopped

For us, a fault is a fail-stop failure

Worker fault ➟ handled by (sub-)master

Leader fault ➟ master chooses a new one

Sub-master fault ➟ manager changes a worker into a sub-master

Master fault ➟ back-up file
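As an illustration of the worker-fault case, here is a minimal sketch of a master that puts the task of a failed worker back into its pool. The slide only states that worker faults are handled by the (sub-)master; the re-queuing policy and the names `Master`, `assign`, `complete`, and `worker_failed` are assumptions made for this example.

```python
from collections import deque


class Master:
    """Toy master that re-queues the task of a worker reported as failed."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.running = {}              # worker id -> task currently assigned
        self.results = []

    def assign(self, worker_id):
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.running[worker_id] = task
        return task

    def complete(self, worker_id, result):
        self.running.pop(worker_id)
        self.results.append(result)

    def worker_failed(self, worker_id):
        # Fail-stop: the worker is gone for good, so its task returns to the pool.
        task = self.running.pop(worker_id, None)
        if task is not None:
            self.pending.appendleft(task)


master = Master(["t1", "t2"])
first = master.assign("w1")            # w1 receives t1
master.worker_failed("w1")             # w1 stops; t1 is pending again
retry = master.assign("w2")            # w2 receives t1
```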

Load-Balancing is natural with master-worker

the framework provides a function to get the number of free Workers ➟ users use it to decide branching
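The following sketch shows how user code might use the number of free workers to decide whether to branch. The function names, the `max_depth` cut-off, and the binary branching are hypothetical; the slide only says that such a function exists, not what it is called or how its value should be used.

```python
def should_split(free_workers, depth, max_depth=8):
    """Split the current sub-problem only if idle workers could take the pieces."""
    return free_workers > 0 and depth < max_depth


def branch(node, free_workers, depth=0):
    """Return sub-tasks to hand to the master, or [node] to keep solving locally."""
    if not should_split(free_workers, depth):
        return [node]
    # Hypothetical binary branching: fix the next decision variable to 0 or 1.
    return [node + (0,), node + (1,)]


kept_local = branch((), free_workers=0)
split = branch((), free_workers=3)
```

Splitting only when workers are idle avoids creating many small tasks whose scheduling cost would exceed their computation time.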

Grid’BnB [HiPC07] Features

Design

Asynchronous communications

Hierarchical master-worker with communications

Dynamic task splitting

Efficient communications with groups

Fault-tolerance

Users

Hidden parallelism and Grid difficulties

API for COPs

Ease of deployment

Principally tree-based

Implementing and testing search strategies

Focus on objective function

Validate and Test Grid'BnB by experiments

Flow-Shop Experiments

NP-hard permutation optimization problem

[Figure: schedule of operations o1,1 ... o4,4 on machines m1 ... m4 over execution time.]

$J = \{j_1, j_2, \ldots, j_n\}$

$j_i = \{o_{i1}, o_{i2}, \ldots, o_{im}\}$

$M = \{m_1, m_2, \ldots, m_m\}$
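To make the objective concrete, the sketch below computes the makespan of a permutation flow-shop schedule, where every job visits the machines in the same order and every machine processes the jobs in the same order. The function names and the 2-job, 2-machine processing times are made up for illustration; the brute-force search merely stands in for the branch-and-bound exploration and is only usable on tiny instances.

```python
from itertools import permutations


def makespan(order, proc):
    """Completion time of the last job on the last machine.

    order: job indices in processing order (same order on every machine).
    proc[j][k]: processing time of job j on machine k.
    """
    machines = len(proc[0])
    finish = [0] * machines            # finish[k]: time at which machine k is free
    for j in order:
        for k in range(machines):
            # Job j starts on machine k once the machine is free and the job
            # has left machine k-1 (finish[k-1] was just updated for job j).
            start = max(finish[k], finish[k - 1] if k > 0 else 0)
            finish[k] = start + proc[j][k]
    return finish[-1]


def best_order(proc):
    """Exhaustive search over all permutations; only for very small instances."""
    return min(permutations(range(len(proc))), key=lambda o: makespan(o, proc))


proc = [[3, 2],    # job 0: 3 time units on m1, then 2 on m2
        [1, 4]]    # job 1: 1 time unit on m1, then 4 on m2
```

For this instance, scheduling job 1 first gives a makespan of 7, against 9 when job 0 goes first.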

Single Cluster: Search Strategies

[Figure: Flow-Shop, 16 jobs / 20 machines; time in minutes (0 to 250) versus number of CPUs (20 to 60) for four search strategies: FIFO, depth-first search, breadth-first search, priority.]

Single Cluster: Communications

[Figure: Flow-Shop, 16 jobs / 20 machines; time in minutes (0 to 80) versus number of CPUs (20 to 60), with and without group communications; second axis: ratio factor (1 to 3), ratio (With Com / No Com).]

Grid'5000: the French Grid

Sites: Sophia, Grenoble, Lyon, Nancy, Orsay, Lille, Rennes, Bordeaux, Toulouse

9 sites - 14 clusters - 3586 CPUs

Grid Experimentations

[Figure: Flow-Shop, 17 jobs / 17 machines; execution time in minutes (0 to 120) and total work (% of explored search tree, 0 to 0.3) versus number of CPUs (0 to 700).]

up to 621 CPUs on 5 sites

Speedup anomalies [Lai 84, Li 86]

Speedup Anomalies & Efficiency

Parallel tree-based speedup may sometimes be quite spectacular (> or < linear) [Mans 95]

Speedup Anomalies in BnB [Roucairol 87, Lai 84, Li 86]

speedup depends on how the tree is dynamically built

Efficiency: a related measure, computed as the speedup divided by the number of processors.
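The two measures can be written down directly. In the sketch below the timings are invented purely to show what a super-linear anomaly looks like (efficiency above 1); they are not results from the experiments, and the function names are hypothetical.

```python
def speedup(t_reference, t_parallel):
    """Ratio of the reference run time to the parallel run time."""
    return t_reference / t_parallel


def efficiency(t_reference, t_parallel, processors):
    """Speedup divided by the number of processors."""
    return speedup(t_reference, t_parallel) / processors


# Invented timings: 100 minutes on 1 CPU, 8 minutes on 10 CPUs.
s = speedup(100.0, 8.0)
e = efficiency(100.0, 8.0, 10)
```

Here the speedup is 12.5 on 10 CPUs, so the efficiency is 1.25. In branch-and-bound this can happen when a parallel run finds a good bound early and prunes parts of the tree that the reference run had to explore.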

[Figure: Flow-Shop, 17 jobs / 17 machines; efficiency (0 to 1.2) versus number of CPUs (90 to 640).]

Grid'BnB: Results

Experimentally validate our BnB framework for Grids

validity of organizing communications

scalability on Grid (up to 621 CPUs on 5 sites)

Problems:

deployment on Grids is difficult

dynamically acquiring new resources is difficult

popularity of Grid'5000

cannot mix Grid'5000 and under-utilized lab desktops

Grid middleware needs better support for dynamic infrastructure

Peer-to-Peer as Grid Middleware

Grid Computing and Peer-to-Peer share a common goal: sharing resources [Foster 03, Goux 00]

Grid related work: [Globus]

✓ providing computational resources

- installing/deploying Grid middleware is difficult

P2P related work: [Gnutella, Freenet, DHT]

- focusing on sharing data & mono-application

✓ dynamic & easy to deploy

Objective: provide a P2P infrastructure for Grids and sharing computational resources

Which P2P Architecture?

Master-Worker (SETI@home)

• centralized

- targets only desktops

• good for embarrassingly parallel

Pure/Unstructured Peer-to-Peer (Gnutella)

- flooding problems

✓ supports high churn (good for Desktop Grids)

✓ supports many kinds of applications (data, computational)

Hybrid Peer-to-Peer (JXTA)

• uses central servers

✓ limits the flooding

- has to manage churn

Structured Peer-to-Peer (Chord)

- high cost for managing churn

• efficient for data sharing (Distributed Hash Table)

[Figure: master-worker; the Master holds pending tasks (T', T'', T''') and partial results (R', R'', R'''); Workers ask for a task, are sent a task, compute it, and send back a partial result.]

[Figure: unstructured P2P; peers linked by acquaintance; a message query is flooded from peer A; peer A then accesses resource Y directly on peer B.]

[Figure: hybrid P2P; a server records where resources are (B -> Y); peer A then accesses resource Y directly on peer B.]

[Figure: structured P2P; Chord ring with m=6 (identifiers 0 to 2^m - 1) and nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56; lookup(K54); finger table of N8: N8+1 -> N14, N8+2 -> N14, N8+4 -> N14, N8+8 -> N21, N8+16 -> N32, N8+32 -> N42.]
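The finger table shown for the structured (Chord) case can be recomputed from the node identifiers in the figure. The sketch below uses the standard Chord rule that entry i of node n points to the successor of n + 2^i on the identifier ring; the function names are my own, and the code only models the identifier arithmetic, not the network protocol.

```python
from bisect import bisect_left

M = 6                                   # identifier bits, ring size 2**M = 64
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])


def successor(ident, nodes=NODES, m=M):
    """First node whose identifier is >= ident, wrapping around the ring."""
    ident %= 2 ** m
    i = bisect_left(nodes, ident)
    return nodes[i % len(nodes)]        # past the last node: wrap to the first


def finger_table(n, nodes=NODES, m=M):
    """Entry i points to successor(n + 2**i), for i = 0 .. m-1."""
    return [successor(n + 2 ** i, nodes, m) for i in range(m)]


fingers_n8 = finger_table(8)
owner_k54 = successor(54)
```

For node N8 this reproduces the table in the figure (N14, N14, N14, N21, N32, N42), and key K54 is stored on N56. Each join or departure forces such tables to be repaired, which is the churn cost listed against structured overlays.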

Pure Peer-to-Peer is the best suited

Needs to avoid the flooding problem

Contribution & Positioning

Grid Applications & Portals

Grid Programming

Grid Middleware Infrastructure

Grid Fabric

Branch-and-Bound API

P2P Infrastructure

The Peer-to-Peer Infrastructure [CMST06]

Pure Peer-to-Peer overlay network

Using it as Grid middleware infrastructure

The proposed solution to avoid the flooding problem: 3 node-request protocols:

1 node: Random walk algorithm

n nodes: Breadth-First-Search (BFS) algorithm with acknowledgement

max nodes: BFS without acknowledgement

Best-effort
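The first two request protocols can be sketched on a small acquaintance graph. This is a simplified model under assumptions: the graph is a plain dictionary, `is_free` is a hypothetical predicate telling whether a peer can give a node, the hop limit and TTL values are arbitrary, and the acknowledgement handshake of the real protocol is not modelled.

```python
import random
from collections import deque


def random_walk(graph, start, is_free, max_hops=10, rng=random):
    """Ask for 1 node: hop between acquaintances until a free peer is met."""
    peer = start
    for _ in range(max_hops):
        if not graph[peer]:
            return None                  # dead end: no acquaintance to try
        peer = rng.choice(graph[peer])
        if peer != start and is_free(peer):
            return peer
    return None


def bfs_request(graph, start, is_free, wanted, ttl=3):
    """Ask for n nodes: breadth-first over acquaintances, bounded by a TTL."""
    found, seen = [], {start}
    queue = deque([(start, 0)])
    while queue and len(found) < wanted:
        peer, hops = queue.popleft()
        if peer != start and is_free(peer):
            found.append(peer)
        if hops < ttl:                   # the TTL stops the request from flooding
            for nxt in graph[peer]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return found


# Tiny ring of acquaintances: a -> b -> c -> a.
ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
one = random_walk(ring, "a", lambda p: p == "c")
two = bfs_request(ring, "a", lambda p: p in {"b", "c"}, wanted=2)
near = bfs_request(ring, "a", lambda p: p in {"b", "c"}, wanted=2, ttl=1)
```

A random walk sends a single message at a time, while the bounded BFS trades more messages for more answers; both avoid flooding the whole overlay.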

INRIA Sophia P2P Desktop Grid

Need to validate the infrastructure with desktops

260 desktops at INRIA Sophia lab

Not disturbing normal users:

running in low priority

working schedules:

24/24 ≈ 50 machines (INRIA-2424)

night/weekend ≈ 260 machines (INRIA-ALL)

Deployed our P2P infrastructure as a permanent Desktop Grid at INRIA Sophia

[Figure: INRIA Sophia Antipolis desktop machines network; INRIA-2424 and INRIA-ALL desktop machines linked by acquaintances across INRIA sub-networks.]

Long Running Experiments with the P2P Desktop Grid

Context: ETSI Grid Plugtests contest ➟ n-queens

n-queens:

embarrassingly parallel / CPU intensive / master-worker

How many solutions for 25 queens?
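The sketch below illustrates why n-queens counting is embarrassingly parallel: fixing the first queen yields independent sub-problems whose counts simply add up. The task granularity (one task per first-row column) and the function names are assumptions for illustration; the actual experiment used far finer tasks, and here the tasks run sequentially instead of being sent to workers.

```python
def count_from(n, cols):
    """Count completions of a partial placement; cols[r] is the queen's column in row r."""
    row = len(cols)
    if row == n:
        return 1
    total = 0
    for c in range(n):
        # A new queen is safe if it shares no column and no diagonal.
        if all(c != pc and abs(c - pc) != row - pr for pr, pc in enumerate(cols)):
            total += count_from(n, cols + (c,))
    return total


def make_tasks(n):
    """One independent task per column of the first-row queen."""
    return [(c,) for c in range(n)]


def solve(n):
    # A master would hand each task to a worker; here they run one after another.
    return sum(count_from(n, task) for task in make_tasks(n))


solutions_6 = solve(6)
solutions_8 = solve(8)
```

No task needs information from another one, so workers never communicate and only partial counts travel back to the master.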

n-queens Experiment Results

World Record [Sloane Integers Sequence A000170]

Total # of Tasks: 12,125,199

Task Computation: ≈ 138 s

Computation Time: ≈ 185 days

Cumulated Time: ≈ 53 years

# of Desktop Machines: 260

Total # of Solutions Found: 2,207,893,435,808,352 (≈ 2 quadrillion)

What we learned from these experiments:

validate the workability of the infrastructure

validate the robustness of the infrastructure

hard to forecast machine performance
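The cumulated time can be cross-checked from the other figures, assuming the per-task computation figure is an average expressed in seconds and taking a year as 365 days. The variable names below are mine, and the average number of busy machines is a derived estimate, not a number reported in the slides.

```python
tasks = 12_125_199
seconds_per_task = 138                  # assumed to be an average, in seconds

cumulated_seconds = tasks * seconds_per_task
cumulated_years = cumulated_seconds / (365 * 24 * 3600)

# Spread over the 185 days of wall-clock time, this corresponds to roughly
# this many machines computing at any moment (derived estimate).
average_busy_machines = cumulated_years * 365 / 185
```

The product comes to about 53 years, which agrees with the reported cumulated time, and to roughly 105 machines busy on average out of the 260 desktops.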

Mixing Desktops & Clusters [PARCO07]

Grid'BnB: Parallel BnB Framework for Grid

+

P2P Infrastructure as Grid Middleware

API for solving COPs & Dynamic Grid Infrastructure

Validate by experiments on a Grid of Desktops & Clusters

New Problems:

firewalls ➟ forwarder

sharing clusters with the P2P infrastructure

Sharing Cluster's Nodes with P2P

[Figure: a peer sharing its local node (one host with one shared node), and a peer on one host sharing the remote nodes of a cluster (several hosts, each with a shared node).]

Testbed

[Figure: INRIA Sophia desktops and the G5K platform (Orsay GDX, Sophia, Lyon, Toulouse); peers sharing nodes at each site; links labeled RMI/SSH.]

Large-Scale Experiments

Goal: validate the infrastructure by experiments

With n-queens:

no communication between workers

test the workability of the infrastructure

With flow-shop:

communications between workers

test solving COPs

N-Queens Results

[Figure: NQueens with n=22; time in minutes versus total number of CPUs per experiment (250, 380, 572, 859, 1007); measured times range from 24.49 to 238.68 minutes; per-site CPU breakdown for INRIA-ALL, G5K Lyon, G5K Sophia, G5K Bordeaux, G5K Orsay, G5K Toulouse.]

Flow-Shop Results

[Figure: Flow-Shop 17 jobs / 17 machines; time in minutes versus total number of CPUs per experiment (80, 201, 220, 313, 321, 346, 628); measured times range from 56.03 to 125.55 minutes; per-site CPU breakdown for INRIA-2424, G5K Lyon, G5K Sophia, G5K Bordeaux, G5K Orsay, G5K Nancy, G5K Rennes, G5K Toulouse.]

With desktops & clusters, speedup anomalies occur more often

Mixing - Analysis

N-Queens problem scales well up to 1007 CPUs

349 Desktops + 5 Clusters

Flow-Shop up to 628 CPUs

42 Desktops + 5 Clusters

worse performance than using only clusters (anomalies are more frequent)

Experimented in closed environments -- security

Grid'5000 platform's success ➟ hard to run long experiments

P2P as a Meta-Grid infrastructure

Observation from Grid'5000:

300 nodes available for 2 minutes

Grid'5000 provides a best-effort queue

Idea: take advantage of these nodes [Condor]

by hand: not easy

with the P2P infrastructure:

deploying peers in best-effort queue

Result: a permanent P2P infrastructure over Grid'5000

P2P infrastructure as a dynamic Grid middleware

P2P with N-Queens on Grid'5000

[Figure: experimentation n22.log.40; number of workers per minute (# of CPUs, 200 to 1400) and tasks computed per minute (# of tasks, 200 to 1800) versus experimentation time in minutes (0 to 40); 1384 CPUs - 9 sites.]

Embarrassingly parallel applications can benefit from the Meta-Grid Infrastructure

Summary

Grid'BnB: a BnB framework for Grids

communication between workers

organizing workers in groups

Grid infrastructure

based on P2P architecture

mixing desktops and clusters

deployed at INRIA Sophia lab

Perspectives

Peer-to-Peer Infrastructure:

Job Scheduler

Resource Localization (PhD)

Large-Scale Experiments:

International Grid: France, Japan, and Netherlands

Grid Plugtests

Deployment: Contracts in Grids (GCM Standard)

Industrialization:

P2P ➟ Desktop Resource Virtualization

CPER - P2P: 1M€ / 4 years to professionalize the INRIA Grid

Conclusion

Branch-and-Bound for Grids

solve COPs

hide Grid difficulties

communication between workers

Peer-to-Peer as Grid infrastructure

mixing desktops and clusters

Tested and experimented

Available in open source

➟ Provides framework and infrastructure to hide Grid difficulties

[SCCC05] Balancing Active Objects on a Peer to Peer Infrastructure. Javier Bustos-Jimenez, Denis Caromel, Alexandre di Costanzo, Mario Leyton, and Jose M. Piquer. Proceedings of the XXV International Conference of the Chilean Computer Science Society (SCCC 2005), Valdivia, Chile, November 2005.

[HIC06] Executing Hydrodynamic Simulation on Desktop Grid with ObjectWeb ProActive. Denis Caromel, Vincent Cavé, Alexandre di Costanzo, Céline Brignolles, Bruno Grawitz, and Yann Viala. HIC2006: Proceedings of the 7th International Conference on HydroInformatics, Nice, France, September 2006.

[HiPC07] Grid’BnB: A Parallel Branch & Bound Framework for Grids. Denis Caromel, Alexandre di Costanzo, Laurent Baduel, and Satoshi Matsuoka. HiPC’07, Goa, India, December 2007.

[CMST06] ProActive: an Integrated platform for programming and running applications on grids and P2P systems. Denis Caromel, Christian Delbé, Alexandre di Costanzo, and Mario Leyton. Journal on Computational Methods in Science and Technology, volume 12, 2006.

[PARCO07] Peer-to-Peer for Computational Grids: Mixing Clusters and Desktop Machines. Denis Caromel, Alexandre di Costanzo, and Clément Mathieu. Parallel Computing Journal on Large Scale Grid, 2007.

[FGCS07] Peer-to-Peer and Fault-tolerance: Towards Deployment-based Technical Services. Denis Caromel, Alexandre di Costanzo, and Christian Delbé. Journal Future Generation Computer Systems, to appear, 2007.

and Workshops, Master report, ...