Going Parallel
Tammy Dahlgren, Tom Epperly, Scott Kohn, & Gary Kumfert


Goals

- Describe our vision to the CCA
- Solicit contributions (code) for:
  - RMI (SOAP | SOAP w/ MIME types)
  - Parallel network algorithms (general arrays)
- Encourage collaboration

Outline

- Background on Components @llnl.gov
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy

Components @llnl.gov

- Quorum: web voting
- Alexandria: component repository
- Babel: language interoperability
  - maturing to platform interoperability
  - implies some RMI mechanism: SOAP | SOAP w/ MIME types
  - open to suggestions & contributed source code

Babel & MxN problem

Unique opportunities:
- SIDL communication directives
- Babel generates code anyway
- Users already link against the Babel runtime library
- Can hook directly into the Intermediate Object Representation (IOR)

Impls and Stubs and Skels

[Diagram: Application → Stubs → IORs → Skels → Impls]

- Application: uses components in the user's language of choice
- Client-side stubs: translate from the application language to C
- Internal Object Representation (IOR): always in C
- Server-side skeletons: translate the IOR (in C) to the component implementation language
- Implementation: component developer's choice of language (can be wrappers to legacy code)
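To make the layering concrete, here is a minimal C++ sketch (hypothetical names, not Babel's actual generated code) of a stub calling through a C-style IOR function table into a skeleton that dispatches to the implementation:

#include <iostream>

// IOR: always plain C -- a struct of function pointers plus an
// opaque pointer to the implementation object.
struct Greeter_IOR {
  void* self;
  double (*f)(void* self, double x);
};

// Implementation: the developer's code, in their language of choice.
class GreeterImpl {
public:
  double f(double x) { return 2.0 * x; }
};

// Server-side skeleton: translates the C calling convention into a
// call on the implementation object.
double Greeter_skel_f(void* self, double x) {
  return static_cast<GreeterImpl*>(self)->f(x);
}

// Client-side stub: what the application links against; it forwards
// every call through the IOR's function table.
class GreeterStub {
  Greeter_IOR* ior_;
public:
  explicit GreeterStub(Greeter_IOR* ior) : ior_(ior) {}
  double f(double x) { return ior_->f(ior_->self, x); }
};

int main() {
  GreeterImpl impl;
  Greeter_IOR ior = { &impl, &Greeter_skel_f };
  GreeterStub stub(&ior);              // the application sees only the stub
  std::cout << stub.f(21.0) << "\n";   // prints 42
}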


Remote Components

[Diagram: Application → Stubs → IORs → Marshaler → Line Protocol → Internet → Line Protocol → Unmarshaler → IORs → Skels → Impls]
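A minimal sketch of the extra marshal/unmarshal hop that remote components insert between stub and skeleton (hypothetical wire format, host byte order only):

#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

// Client side: the marshaler packs the IOR-level arguments into a
// byte buffer for the line protocol.
std::vector<std::uint8_t> marshal(double x) {
  std::vector<std::uint8_t> buf(sizeof x);
  std::memcpy(buf.data(), &x, sizeof x);
  return buf;
}

// Server side: the unmarshaler unpacks the buffer before handing the
// call to the skeleton.
double unmarshal(const std::vector<std::uint8_t>& buf) {
  double x;
  std::memcpy(&x, buf.data(), sizeof x);
  return x;
}

int main() {
  std::vector<std::uint8_t> wire = marshal(3.14);  // stub -> line protocol
  std::cout << unmarshal(wire) << "\n";            // line protocol -> skel
}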


Outline

- Background on Components @llnl.gov
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy

Initial Assumptions

- Working point-to-point RMI
- Object persistence

Example #1: 1-D Vectors

p0: int globalSize = 6; int localSize = 2;
    int[] local2global = { 0, 1 };
    double[] localData = { 0.0, 1.1 };
    Vector y;

p1: int globalSize = 6; int localSize = 2;
    int[] local2global = { 2, 3 };
    double[] localData = { 2.2, 3.3 };
    Vector y;

p2: int globalSize = 6; int localSize = 2;
    int[] local2global = { 4, 5 };
    double[] localData = { 4.4, 5.5 };
    Vector y;

p3: int globalSize = 6; int localSize = 3;
    int[] local2global = { 0, 1, 2 };
    double[] localData = { .9, .8, .7 };
    Vector x; double result;

p4: int globalSize = 6; int localSize = 3;
    int[] local2global = { 3, 4, 5 };
    double[] localData = { .6, .5, .4 };
    Vector x; double result;

Both process groups execute: double d = x.dot( y );

[Figure: one vector distributed over p0-p2, the other over p3-p4]


Rule #1: Owner Computes

double vector::dot( vector& y ) {
  // initialize
  double * yData = new double[localSize];
  y.requestData( localSize, local2global, yData );

  // sum all x[i] * y[i]
  double localSum = 0.0;
  for( int i = 0; i < localSize; ++i ) {
    localSum += localData[i] * yData[i];
  }

  // cleanup
  delete[] yData;
  return localMPIComm.globalSum( localSum );
}

Design Concerns

- vector y is not guaranteed to have data mapped appropriately for the dot product.
- vector y is expected to handle MxN data redistribution internally:
  y.requestData( localSize, local2global, yData );
- Should each component implement MxN redistribution?

Outline

- Background on Components @llnl.gov
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy

Vector Dot Product: Take #2

double vector::dot( vector& y ) {
  // initialize
  MUX mux( *this, y );
  double * yData = mux.requestData( localSize, local2global );

  // sum all x[i] * y[i]
  double localSum = 0.0;
  for( int i = 0; i < localSize; ++i ) {
    localSum += localData[i] * yData[i];
  }

  // cleanup
  mux.releaseData( yData );
  return localMPIComm.globalSum( localSum );
}

Generalized Vector Ops

vector<T>::parallelOp( vector<T>& y ) {
  // initialize
  MUX mux( *this, y );
  vector<T> newY = mux.requestData( localSize, local2global );

  // problem reduced to a local operation
  T localResult = this->localOp( newY );

  // cleanup
  mux.releaseData( newY );
  return localMPIComm.reduce( localResult );
}

Rule #2: MUX distributes data

- Users invoke parallel operations without concern for data distribution
- Developers implement the local operation assuming data is already distributed
- Babel generates code that reduces a parallel operation to a local operation
- MUX handles all communication

How general is a MUX?

Example #2: Undirected Graph

[Figure: a 12-vertex undirected graph (vertices 0-11), shown partitioned into two overlapping pieces]

Key Observations

- Every Parallel Component is a container and is divisible into subsets.
- There is a minimal (atomic) addressable unit in each Parallel Component.
- This minimal unit is addressable in global indices.

Atomicity

- Vector (Example #1): atom = scalar; addressable by integer offset
- Undirected Graph (Example #2): atom = vertex with ghost nodes; addressable by integer vertex id
- Undirected Graph (alternate): atom = edge; addressable by ordered pair of integers
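A small C++ sketch (hypothetical types, for illustration only) of the three atom/address pairings:

#include <utility>
#include <vector>

struct VectorAtom {              // addressed by: int offset
  double value;
};
struct VertexAtom {              // addressed by: int vertex id
  int id;
  std::vector<int> ghostNeighbors;  // ghost nodes carried along
};
struct EdgeAtom {                // addressed by: ordered (int, int) pair
  std::pair<int, int> endpoints;
};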


Outline

- Background on Components @llnl.gov
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy

MxNRedistributable Interface

interface Serializable {
  store( in Stream s );
  load( in Stream s );
};

interface MxNRedistributable extends Serializable {
  int getGlobalSize();
  local int getLocalSize();
  local array<int,1> getLocal2Global();

  split( in array<int,1> maskVector,
         out array<MxNRedistributable,1> pieces );
  merge( in array<MxNRedistributable,1> pieces );
};
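As a concreteness check, here is a minimal C++ sketch (a hypothetical class, not generated Babel code) of what split() and merge() might look like for the 1-D vector of Example #1: split() buckets atoms by the destination named in maskVector, and merge() concatenates received pieces back into one local container.

#include <cstddef>
#include <vector>

struct VectorPiece {
  std::vector<int>    local2global;  // global index of each local atom
  std::vector<double> localData;     // one scalar atom per index
};

// split: maskVector[i] names the destination piece for local atom i.
std::vector<VectorPiece> split(const VectorPiece& v,
                               const std::vector<int>& maskVector,
                               int nPieces) {
  std::vector<VectorPiece> pieces(nPieces);
  for (std::size_t i = 0; i < v.localData.size(); ++i) {
    VectorPiece& p = pieces[maskVector[i]];
    p.local2global.push_back(v.local2global[i]);
    p.localData.push_back(v.localData[i]);
  }
  return pieces;
}

// merge: concatenate received pieces back into one local container.
VectorPiece merge(const std::vector<VectorPiece>& pieces) {
  VectorPiece out;
  for (const VectorPiece& p : pieces) {
    out.local2global.insert(out.local2global.end(),
                            p.local2global.begin(), p.local2global.end());
    out.localData.insert(out.localData.end(),
                         p.localData.begin(), p.localData.end());
  }
  return out;
}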


Rule #3: All Parallel Components implement “MxNRedistributable”

- Provides a standard interface for the MUX to manipulate the component
- Minimal coding requirements for the developer
- Key to the abstraction: split() and merge() manipulate “atoms” by global address

Now for the hard part...

... 13 slides illustrating how it all fits together for an Undirected Graph


%> mpirun -np 2 graphtest

[Figure: two client processes, pid=0 and pid=1]

BabelOrb * orb = BabelOrb.connect( “http://...” );

[Figure: each client process obtains an orb handle to the remote server (steps 1-4)]

Graph * graph = orb->create( “graph”, 3 );

[Figure: the orb instantiates a parallel graph component on three server processes; every graph handle, client and server, is paired with a MUX (steps 1-4)]

graph->load( “file://...” );

[Figure: the 12-vertex graph is loaded and distributed across the three server processes as overlapping pieces (roughly vertices 0-6, 2-10, and 5-11); FancyMUX routing connects the two client MUXes to the three server MUXes (steps 1-2)]

graph->doExpensiveWork();

[Figure: the server processes operate on their local graph pieces; the distribution is unchanged]

PostProcessor * pp = new PostProcessor;

[Figure: a PostProcessor component is created on each of the two client processes, alongside the existing orb, graph handle, and MUX]

pp->render( graph );

- MUX queries the graph for its global size (12)
- Graph determines a particular data layout (blocked: vertices 0-5 and 6-11)
- MUX is invoked to guarantee that layout before the render implementation is called

[Figure: each client-side pp declares the blocked layout it requires]

MUX solves general parallel network flow problem (client & server)

[Figure: the three overlapping server pieces must satisfy the clients' required blocked layout (0-5 and 6-11); the resulting transfer pieces are 0,1,2,3 / 4,5 / 6,7 / 8,9,10,11]
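A minimal sketch of the flow computation under simplified assumptions (block ownership, ghost nodes ignored; not the MUX's actual algorithm): intersect each server's owned global-index set with each client's required set, and every non-empty intersection becomes one transfer piece.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

using IndexSet = std::set<int>;

// pieces[s][c] = global indices server s must ship to client c.
std::vector<std::vector<IndexSet>> schedule(
    const std::vector<IndexSet>& owned,       // per server process
    const std::vector<IndexSet>& required) {  // per client process
  std::vector<std::vector<IndexSet>> pieces(
      owned.size(), std::vector<IndexSet>(required.size()));
  for (std::size_t s = 0; s < owned.size(); ++s)
    for (std::size_t c = 0; c < required.size(); ++c)
      std::set_intersection(
          owned[s].begin(), owned[s].end(),
          required[c].begin(), required[c].end(),
          std::inserter(pieces[s][c], pieces[s][c].begin()));
  return pieces;
}

int main() {
  std::vector<IndexSet> owned    = {{0,1,2,3}, {4,5,6,7}, {8,9,10,11}};
  std::vector<IndexSet> required = {{0,1,2,3,4,5}, {6,7,8,9,10,11}};
  auto pieces = schedule(owned, required);
  // Yields the four pieces above: 0,1,2,3 / 4,5 / 6,7 / 8,9,10,11.
  for (std::size_t s = 0; s < pieces.size(); ++s)
    for (std::size_t c = 0; c < pieces[s].size(); ++c)
      if (!pieces[s][c].empty())
        std::cout << "server " << s << " -> client " << c << ": "
                  << pieces[s][c].size() << " atoms\n";
}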


MUX opens communication pipes

[Figure: one communication pipe is opened per transfer piece, connecting the server MUXes to the client MUXes]

MUX splits graphs with multiple destinations (server-side)

[Figure: each server MUX calls split() to divide its local graph piece into one sub-piece per destination client]

MUX sends pieces through communication pipes (persistence)

[Figure: the split sub-pieces are serialized (store) and sent through the pipes to the client processes]

MUX receives graphs through pipes & assembles them (client side)

[Figure: each client MUX deserializes the incoming sub-pieces (load) and merges them into a single local graph in the required blocked layout, ghost nodes included]

pp->render_impl( graph );  (user's implementation runs)

[Figure: the render implementation runs on each client process against its locally assembled graph piece]

Outline

- Background on Components @llnl.gov
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy

Summary

- All distributed components are containers and are subdivisible
- The smallest globally addressable unit is an atom
- The MxNRedistributable interface reduces the general component MxN problem to a 1-D array of ints
- The MxN problem is a special case of the general problem: N handles to M instances
- Babel is uniquely positioned to contribute a solution to this problem

Outline

- Background on Components @llnl.gov
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy

Tentative Research Strategy

Fast Track:
- Java only, no Babel (serialization & RMI built in)
- Build MUX
- Experiment
- Write paper

Sure Track:
- Finish 0.5.x line
- Add serialization
- Add RMI
- Add in technology from Fast Track

Open Questions

- Non-general, optimized solutions
- Client-side caching issues
- Fault tolerance
- Subcomponent migration
- Inter- vs. intra-component communication
- MxN, MxP, or MxPxQxN

MxPxQxN Problem

[Figure: parallel components on two machines communicating across a long-haul network]

The End


Work performed under the auspices of the U. S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48

UCRL-VG-142096