High Performance Event Service for CCA Framework: Design and Experiences

Khushbu Agarwal, Manoj Krishnan, Daniel Chavarria, Ian Gorton

Outline

• Event Service Overview
• Design and Implementation
• Design Challenges
• Preliminary Results
• Conclusion


CCA Event Service 101

• Publish-subscribe: 1-n, n-m, n-1
• Specification is similar to many distributed event/messaging services

[Figure: UML class diagram of the Event Service SIDL interfaces]
• EventService: processEvents(), CreateTopic(): Topic, CreateWildcardTopic(): WildcardTopic, getTopic(): Topic, getWildcardTopic(): WildcardTopic, ReleaseTopic(), ReleaseWildcardTopic()
• Topic: sendEvent(), getTopicName(): String, RegisterEventListener(), UnRegisterEventListener()
• WildcardTopic
• EventListener: processEvent()
• Event: setHeader(), getHeader(): TypeMap, getBody(): TypeMap, setBody()
• TypeMap
Associations shown: "manages" (1 to *), "processes", and a multiplicity of 2 between Event and TypeMap (header and body).
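To make the relationships in the diagram concrete, here is a minimal in-process sketch in plain C++ (no SIDL, no Babel). Class and method names loosely mirror the interfaces above, but the `std::map` stand-in for TypeMap, the string keys used to register listeners, and the assumption that processEvents() delivers the events queued by sendEvent() are illustrative choices, not the CCA specification.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for gov.cca.TypeMap (hypothetical simplification: string values only).
using TypeMap = std::map<std::string, std::string>;

// An event carries two TypeMaps: a header and a body.
struct Event {
    TypeMap header;
    TypeMap body;
};

// Subscribers implement processEvent() to receive events.
class EventListener {
public:
    virtual ~EventListener() = default;
    virtual void processEvent(const std::string& topicName, const Event& ev) = 0;
};

class Topic {
public:
    explicit Topic(std::string name) : name_(std::move(name)) {}
    const std::string& getTopicName() const { return name_; }
    void registerEventListener(const std::string& key, EventListener* l) { listeners_[key] = l; }
    void unregisterEventListener(const std::string& key) { listeners_.erase(key); }
    // Publishing only queues the event; delivery happens in EventService::processEvents().
    void sendEvent(const Event& ev) { pending_.push_back(ev); }
    // Hands every queued event to every registered listener.
    void deliver() {
        std::vector<Event> batch;
        batch.swap(pending_);   // events sent during delivery wait for the next round
        for (const Event& ev : batch)
            for (const auto& entry : listeners_) entry.second->processEvent(name_, ev);
    }
private:
    std::string name_;
    std::map<std::string, EventListener*> listeners_;
    std::vector<Event> pending_;
};

class EventService {
public:
    // Returns the existing topic if one with this name was already created.
    Topic& createTopic(const std::string& name) {
        auto it = topics_.find(name);
        if (it == topics_.end())
            it = topics_.emplace(name, std::make_unique<Topic>(name)).first;
        return *it->second;
    }
    Topic* getTopic(const std::string& name) {
        auto it = topics_.find(name);
        return it == topics_.end() ? nullptr : it->second.get();
    }
    void releaseTopic(const std::string& name) { topics_.erase(name); }
    void processEvents() {
        for (auto& entry : topics_) entry.second->deliver();
    }
private:
    std::map<std::string, std::unique_ptr<Topic>> topics_;
};

// Example subscriber that counts deliveries and remembers the last "value" field.
struct CountingListener : EventListener {
    int count = 0;
    std::string lastValue;
    void processEvent(const std::string&, const Event& ev) override {
        ++count;
        auto it = ev.body.find("value");
        if (it != ev.body.end()) lastValue = it->second;
    }
};
```

A subscriber registers a listener on a topic, a publisher calls sendEvent(), and nothing reaches the listener until processEvents() runs; queuing keeps publishing cheap and lets the service decide when delivery happens.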

Possible use cases

• Event/message distribution between components in the same framework
   - Initial SciRun implementation
• Event/message distribution across processes in an HPC application
   - Across address spaces
   - Needs to be fast
   - Must handle a range of potential payload sizes
• Event/messaging service schizophrenia!!
• Other work exists, e.g. the ECho Grid event service


CCA Event Service

• Started with the Utah CCA/SciRun event service implementation
• Created two standalone prototypes (no SIDL, no framework) in ’08:
   - Reliable: events transferred via files
   - Fast: events transferred over ARMCI on a Cray XD1, using single-sided memory transfers
• Now: Event Service in a CCA framework, with a design based on the specification provided by the CCA Forum


SIDL Specification

• Interfaces: Event, EventListener, EventServiceException, Subscriber, Topic
• Ports:
   interface PublisherEventService extends cca.Port { }
   interface SubscriberEventService extends cca.Port { }


Modifications to the SIDL Specification

• Modifications to interfaces:
   - Topic /* The wildcard topic management is not implemented yet */
   - Publisher: needed by PublisherEventService
   - ESTypeMap --extends=gov.cca.TypeMap: added Relocate()
• Added components:
   - EventService --provides=PublisherEventService,SubscriberEventService
   - Driver --go=run --uses=SubscriberEventService,PublisherEventService


Implementation Experience

• Used the SIDL file provided on the CCA Forum website:
   bocca create EventService --import-sidl=EventService.sidl
   - Doesn't work
• LL1: The SIDL file needs to be separated into multiple files (one per interface):
   bocca create interface Event --import-sidl=Event.sidl
   bocca create class CCAEvent --implements=Event
   ....
   bocca create interface Subscriber --import-sidl=Subscriber.sidl
   bocca create class CCASubscriber --implements=Subscriber
   ....


Implementation Experiences (contd.)

Specifying dependencies during creation. Example:

bocca create interface A1 { func(); }

bocca create interface A2 --requires=A1
{ A1 obj_1; }

bocca create interface A3 --requires=A2
{
    A2.obj_1.func(); // Error: undefined A1
}


Implementation Experience (contd.)

Solution: re-do create interface with explicit dependencies

bocca create interface A3 --requires=A2,A1

and explicitly include <A1> in A3's implementation file.

Example:

bocca create class CCASubscriber --implements=Subscriber --requires=CCAESTypeMap --requires=CCAEvent

LL2: Specify dependencies for all levels explicitly

LL3: Dependencies cannot be modified on the fly


Implementation Experiences (contd.)

Accessing data not defined in the SIDL file. E.g.:

SIDL file:
interface A { }

A_Impl.cxx:
CCA class A_impl {
    ComplexType x;             // Class A, member variable x
    A_func1(ComplexType y);    // member function A_func1
}

C_Impl.cxx:
CCA class C_impl {
    ComplexType p = A_obj.x;   // Error: Class A does not define x
    A_obj.A_func1();           // Error: Class A does not define A_func1
}


Implementation Experiences (contd.)

Solution:
• Access data through member methods only
• Define member method parameters using opaque

SIDL file for A:
interface A {
    opaque getComplexType();
    void setComplexType(in opaque value);
}

A_Impl.cxx:
CCA class A_impl {
    ComplexType x;
};


Implementation Experiences (contd.)

C_Impl.cxx:
CCA class C_impl {
    func() {
        ComplexType* y = (ComplexType*) A_obj.getComplexType();
    }
};

LL4: Changes made to _Impl.xxx files are not visible to other classes unless they are declared in the SIDL file
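The workaround above can be sketched in plain C++: the abstract class `A` stands for the interface generated from A's SIDL file, and `opaque` is modelled as `void*` (our understanding of how Babel's C and C++ bindings carry `opaque`, but treat it as an assumption). `ComplexType`'s fields and the copy semantics of `setComplexType` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical library type that cannot be described in SIDL.
struct ComplexType {
    std::string label;
    double re = 0.0;
    double im = 0.0;
};

// What other classes can see: only the methods declared in A's SIDL file,
// with SIDL 'opaque' modelled as void*.
class A {
public:
    virtual ~A() = default;
    virtual void* getComplexType() = 0;
    virtual void setComplexType(void* value) = 0;
};

// The implementation keeps the real member; C never names A_impl directly.
class A_impl : public A {
public:
    void* getComplexType() override { return &x_; }
    // Copies the pointed-to value so A_impl owns its own data.
    void setComplexType(void* value) override { x_ = *static_cast<ComplexType*>(value); }
private:
    ComplexType x_;
};

// C reaches A's data only through A's accessor and casts the opaque pointer back.
class C_impl {
public:
    double func(A& a_obj) {
        ComplexType* y = static_cast<ComplexType*>(a_obj.getComplexType());
        return y->re;
    }
};
```

The cast in `C_impl` is unchecked: A and C must agree on what the opaque pointer refers to, which is the price of keeping `ComplexType` out of the SIDL file. The returned pointer is also only valid while the A object is alive.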


Implementation Experiences (contd.): Returning CCA objects as pointers

C++:
class_A* func1() {
    class_A *p = new class_A;
    ……
    return p;
}

CCA:
Opaque func1() {
    CCA_class_A *p = &(CCA_class_A::_create());
    ………
    return p;
}

• CCA implementation: func1() returns an invalid pointer. Why?
• The CCA object is destroyed at the end of the function.
• Solution: return objects instead of pointers.
• Edit the class SIDL file: bocca edit CCAESTypeMap

CCA_class_A func1() {
    CCA_class_A p = CCA_class_A::_create();
    ………
    return p;
}
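A plain C++ analogy for the pitfall above, assuming (as we understand Babel's C++ bindings) that a CCA object is a reference-counted object reached through a small handle. Here `std::shared_ptr` stands in for that reference count, and `CCA_class_A` is a hypothetical hand-written class, not Babel-generated code. Returning the handle by value keeps the object alive; keeping only a pointer to a handle that goes out of scope does not.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a Babel-style class: users hold a small handle and the
// underlying object lives as long as at least one handle refers to it.
class CCA_class_A {
public:
    static inline int destroyed = 0;   // number of underlying objects destroyed so far

    static CCA_class_A _create() { return CCA_class_A(std::make_shared<State>()); }
    int value() const { return state_->value; }
    void setValue(int v) { state_->value = v; }
    long useCount() const { return state_.use_count(); }

private:
    struct State {
        int value = 0;
        ~State() { ++CCA_class_A::destroyed; }
    };
    explicit CCA_class_A(std::shared_ptr<State> s) : state_(std::move(s)) {}
    std::shared_ptr<State> state_;
};

// Broken pattern (shown as a comment only): returns the address of a local handle
// that is destroyed on return, so the caller receives a dangling pointer.
// CCA_class_A* func1_broken() {
//     CCA_class_A h = CCA_class_A::_create();
//     return &h;
// }

// Fixed pattern from the slide: return the object (handle) itself.
CCA_class_A func1() {
    CCA_class_A p = CCA_class_A::_create();
    p.setValue(7);
    return p;   // moving the handle out transfers the reference to the caller
}
```

If `_create()` returns by value, as assumed here, the slide's literal `&(CCA_class_A::_create())` takes the address of a temporary, which a conforming C++ compiler rejects; the commented variant uses a named local to show the same lifetime problem.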

Design Challenges

• SIDL files do not allow defining variables of non-primitive data types (e.g., types from libraries)
• New constructors cannot be defined in the SIDL file, and are not accessible if defined in _Impl.xxx files
   - Introduce init() function calls for all classes
• Object creation: use _create(); the new operator does not work
• Explicitly freeing memory for a Babel object introduces segfaults; objects are de-allocated automatically when they go out of scope
   - Used a dummy namespace to trigger the destructor of the EventService object
   - The directory process needs to receive a Quit event service message
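Two of these workarounds can be sketched together in plain C++ (C++17): configuration goes through init() because only a no-argument factory, `_create()`, is available, and the service's destructor, run automatically at the end of an enclosing block, is what notifies the directory process. The `directoryInbox()` vector stands in for the MPI message to the directory process; the message text, the rank parameters, and the use of an explicit `{ ... }` block to bound the object's lifetime are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for messages received by the directory process (hypothetical).
inline std::vector<std::string>& directoryInbox() {
    static std::vector<std::string> inbox;
    return inbox;
}

class EventService {
public:
    // Objects come only from the factory; there is no constructor taking arguments.
    static EventService _create() { return EventService(); }
    EventService(const EventService&) = delete;              // one owner sends one Quit
    EventService& operator=(const EventService&) = delete;

    // Post-construction initialization replaces a custom constructor.
    void init(int rank, int directoryRank) {
        rank_ = rank;
        directoryRank_ = directoryRank;
        initialized_ = true;
    }
    bool initialized() const { return initialized_; }

    // Runs automatically when the object goes out of scope; never delete it explicitly.
    ~EventService() {
        if (initialized_ && rank_ != directoryRank_)
            directoryInbox().push_back("Quit from rank " + std::to_string(rank_));
    }

private:
    EventService() = default;
    int rank_ = -1;
    int directoryRank_ = 0;
    bool initialized_ = false;
};
```

Usage: declare `EventService es = EventService::_create(); es.init(rank, 0);` inside a block that closes before the program shuts down its communication layer, so the Quit message is sent while the directory process is still listening.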


Mapping CCA objects to different address spaces

• Challenges:
   - CCA objects contain implicit pointers
   - CCA objects have dynamic allocation of some data (their size can change)?
• Solution 1: the sender unpacks and transfers elementary data; the receiver reconstructs the required CCA object
   - Possible, but tricky in the case of nested CCA objects
   - Example: transferring CCA Events between address spaces requires transferring ESTypeMaps, which are represented using two FixedMaps, which in turn store primitive data
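Solution 1 can be sketched for a nested object as follows. The two `std::map`s inside the hypothetical `ESTypeMap` stand in for the FixedMaps mentioned above; the sender walks the nesting and emits only primitive values, and the receiver rebuilds fresh objects in its own address space. The byte layout is an assumption that only works when sender and receiver share type sizes and endianness (as on a homogeneous cluster), and the actual transfer over MPI or ARMCI is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical ESTypeMap: two maps of primitive values (stand-ins for the FixedMaps).
struct ESTypeMap {
    std::map<std::string, std::int64_t> ints;
    std::map<std::string, double> doubles;
};

// An Event nests two ESTypeMaps.
struct Event {
    ESTypeMap header;
    ESTypeMap body;
};

// Append the raw bytes of a primitive value.
template <typename T>
void putRaw(std::vector<char>& buf, T v) {
    const char* p = reinterpret_cast<const char*>(&v);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a primitive value and advance the cursor.
template <typename T>
T takeRaw(const std::vector<char>& buf, std::size_t& pos) {
    T v;
    std::memcpy(&v, buf.data() + pos, sizeof(T));
    pos += sizeof(T);
    return v;
}

void putString(std::vector<char>& buf, const std::string& s) {
    putRaw<std::uint64_t>(buf, s.size());
    buf.insert(buf.end(), s.begin(), s.end());
}

std::string takeString(const std::vector<char>& buf, std::size_t& pos) {
    std::size_t n = static_cast<std::size_t>(takeRaw<std::uint64_t>(buf, pos));
    std::string s(buf.data() + pos, n);
    pos += n;
    return s;
}

// Sender side: flatten one ESTypeMap into elementary data.
void pack(const ESTypeMap& m, std::vector<char>& buf) {
    putRaw<std::uint64_t>(buf, m.ints.size());
    for (const auto& [key, value] : m.ints) { putString(buf, key); putRaw(buf, value); }
    putRaw<std::uint64_t>(buf, m.doubles.size());
    for (const auto& [key, value] : m.doubles) { putString(buf, key); putRaw(buf, value); }
}

// Receiver side: rebuild an equivalent ESTypeMap from the buffer.
ESTypeMap unpack(const std::vector<char>& buf, std::size_t& pos) {
    ESTypeMap m;
    std::uint64_t n = takeRaw<std::uint64_t>(buf, pos);
    for (std::uint64_t i = 0; i < n; ++i) {
        std::string key = takeString(buf, pos);
        m.ints[key] = takeRaw<std::int64_t>(buf, pos);
    }
    n = takeRaw<std::uint64_t>(buf, pos);
    for (std::uint64_t i = 0; i < n; ++i) {
        std::string key = takeString(buf, pos);
        m.doubles[key] = takeRaw<double>(buf, pos);
    }
    return m;
}

// Nested objects are handled by packing each child in a fixed order.
std::vector<char> packEvent(const Event& e) {
    std::vector<char> buf;
    pack(e.header, buf);
    pack(e.body, buf);
    return buf;
}

Event unpackEvent(const std::vector<char>& buf) {
    std::size_t pos = 0;
    Event e;
    e.header = unpack(buf, pos);
    e.body = unpack(buf, pos);
    return e;
}
```

The "tricky" part noted above shows up as the fixed packing order: every level of nesting needs matching pack/unpack code on both sides.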


Mapping CCA objects to different address spaces (contd.): Potential Solutions

• Solution 2: Babel RMI. Clean, but potentially a lower-performance solution for HPC needs
• Solution 3: Babel support could be extended to provide RMI on MPI/ARMCI
• Solution 4: Babel could provide an abstraction that returns a pointer to a flat C++ object given a CCA object


Preliminary Results

Total EventService time per process (ms), 4 processes:

Process    CCA           C++
0          129.041195    1.814842
1          129.112005    1.585007
2          129.059076    2.148151
3          129.111052    1.943827
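The gap between the two columns can be quantified with a little arithmetic on the numbers above. The slide does not say what workload was timed, so the ratio only describes this particular run: averaged over the four processes, CCA takes about 129.08 ms and plain C++ about 1.87 ms, roughly a 69× difference.

```cpp
#include <array>
#include <cassert>
#include <numeric>

// Per-process "Total EventService Time" in ms, copied from the table above.
constexpr std::array<double, 4> kCcaMs = {129.041195, 129.112005, 129.059076, 129.111052};
constexpr std::array<double, 4> kCppMs = {1.814842, 1.585007, 2.148151, 1.943827};

double mean(const std::array<double, 4>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// How many times longer the CCA version took, on average.
double slowdown() { return mean(kCcaMs) / mean(kCppMs); }
```

The CCA times are also far more uniform across processes (about 0.07 ms spread versus about 0.56 ms for C++), which suggests a fixed per-run overhead rather than work that scales with each process's events; that reading is an inference, not something the slide states.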


Conclusion

• Efficiently transferring CCA objects into multiple address spaces remains to be addressed.
• The efficiency of the event service design is somewhat limited by CCA constraints.
• The event service implementation is similar to the specification provided by the CCA Forum.


Questions?


More:


ARMCI Prototype

• Goals:
   - Maintain the interface/semantics of the event service model
   - Achieve high performance on a distributed-memory HPC system
• Used a combination of MPI & ARMCI
   - MPI: process 0 operates as a Topic Directory process
      - Maintains a Topic List with the locations of the publishers
      - Uses an MPI messaging protocol to serve topic creation requests and queries
   - ARMCI: publishers create events locally in their own address space
      - Subscribers read remote events from the publishers using one-sided ARMCI_Get() operations
      - No need for coordination with the publisher
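The Topic Directory's bookkeeping can be sketched as a map from topic name to the publisher's location. In the prototype the create and query requests arrive at process 0 as MPI messages; here they are plain method calls. The `TopicLocation` fields (publisher rank plus an offset into its ARMCI memory) and the first-creator-wins rule for duplicate names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Where a topic's events live: the publisher's rank and an offset into its ARMCI memory.
struct TopicLocation {
    int publisherRank;
    std::size_t offset;
};

// Hypothetical bookkeeping kept by process 0 (the Topic Directory process).
class TopicDirectory {
public:
    // Registers a new topic; returns false if the name is taken (first creator wins).
    bool createTopic(const std::string& name, TopicLocation loc) {
        return topics_.emplace(name, loc).second;
    }
    // Answers a subscriber's query with the publisher's location, if the topic exists.
    std::optional<TopicLocation> query(const std::string& name) const {
        auto it = topics_.find(name);
        if (it == topics_.end()) return std::nullopt;
        return it->second;
    }
    bool releaseTopic(const std::string& name) { return topics_.erase(name) == 1; }
private:
    std::map<std::string, TopicLocation> topics_;
};
```

A subscriber would query once, then use the returned rank and offset as the source of its one-sided ARMCI_Get() calls, so the directory is off the critical path for every later event.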


ARMCI Prototype (cont.)

• Used a combination of MPI & ARMCI to create the event service
   - Transfer C++ class instances directly over ARMCI without the need for type serialization
   - Events comprise two TypeMaps: header and body
• Created a special heap manager for the ARMCI address space
   - Objects can be allocated directly through the standard new and delete operators
   - Synchronous garbage collection by the publisher
• For high performance, all objects in the ARMCI heap are flattened:
   - No pointers or references to external objects
   - Member variables embedded
   - Fixed size
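A sketch of the flattening idea: a fixed-size, pointer-free event whose class-specific `operator new` draws from a dedicated heap, so that a raw byte copy (standing in for ARMCI_Get()) yields a usable object on the subscriber. The bump-allocator heap, the capacities, and all names are hypothetical, and the publisher's synchronous garbage collection is reduced to a bulk `reset()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical stand-in for the publisher's ARMCI-registered memory: a bump allocator.
class ArmciHeap {
public:
    static void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        std::size_t start = (used_ + align - 1) / align * align;   // keep objects aligned
        if (start + bytes > sizeof(storage_)) throw std::bad_alloc();
        used_ = start + bytes;
        return reinterpret_cast<unsigned char*>(storage_) + start;
    }
    static void release(void*) noexcept {}   // memory is reclaimed in bulk by reset()
    static void reset() noexcept { used_ = 0; }
    static bool owns(const void* p) {
        const unsigned char* base = reinterpret_cast<const unsigned char*>(storage_);
        const unsigned char* q = static_cast<const unsigned char*>(p);
        return q >= base && q < base + sizeof(storage_);
    }
private:
    static inline std::max_align_t storage_[4096];
    static inline std::size_t used_ = 0;
};

// Flattened TypeMap: fixed capacity, keys and values embedded, no pointers.
struct FlatTypeMap {
    static constexpr int kMaxEntries = 8;
    static constexpr std::size_t kKeyLen = 16;
    int count = 0;
    char keys[kMaxEntries][kKeyLen] = {};
    double values[kMaxEntries] = {};

    bool put(const char* key, double v) {
        if (count == kMaxEntries || std::strlen(key) >= kKeyLen) return false;
        std::strcpy(keys[count], key);
        values[count++] = v;
        return true;
    }
    bool get(const char* key, double& out) const {
        for (int i = 0; i < count; ++i)
            if (std::strcmp(keys[i], key) == 0) { out = values[i]; return true; }
        return false;
    }
};

// An event is two flat TypeMaps (header and body) allocated in the ARMCI heap.
struct FlatEvent {
    FlatTypeMap header;
    FlatTypeMap body;
    static void* operator new(std::size_t n) { return ArmciHeap::allocate(n); }
    static void operator delete(void* p) noexcept { ArmciHeap::release(p); }
};

// Pointer-free and trivially copyable, so a byte copy produces a valid object.
static_assert(std::is_trivially_copyable_v<FlatEvent>);

// Stand-in for a one-sided ARMCI_Get(): on a real system the source is remote memory.
inline void simulatedGet(const void* remote, void* local, std::size_t bytes) {
    std::memcpy(local, remote, bytes);
}
```

The fixed capacities are the cost of this design: an event with more entries or longer keys than the flat layout allows is rejected at `put()` time rather than growing.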


Initial Performance Results

We measured event processing rates:
• 66K events/second with one publisher and one subscriber (small 4 KB events)
• 950 events/second with one publisher and 16 subscribers (large 50 KB events)
• Minimal overhead to reconstruct the object on the subscriber after the transfer

[Figure: Processing rate (events/second, 0 to 70,000) vs. number of subscribers (1, 2, 4, 8, 16), for 4 KB and 50 KB event sizes]

Analysis

• Performance drops as the number of subscribers increases
   - Contention for events in the publisher's ARMCI memory
• Alternative implementations are possible:
   - Maintain topics for subscribers only in local ARMCI memory
   - Publishers write to subscriber memory directly for each event published


Alternative Design

[Figure: alternative design. A master topic list maps Topic 1 to Sub1, Sub2 and Topic 2 to Sub1, Sub3; the publisher uses Send() to write into each subscriber's ARMCI subscriber buffer, where each topic struct holds a chain of messages linked by "next" pointers]

• Open question: maintain the topic list in process 0 (using MPI) or in ARMCI shared memory?
• Strengths?
   - Likely reduced contention
   - Simplifies ‘publish semantics’ and event retention issues
• Weaknesses?
   - Publish can fail if subscriber memory is full
   - Some subscribers are slower than others, so events are delivered unpredictably depending on consumption rate
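The alternative design can be sketched with one bounded buffer per subscriber and the master topic list from the figure (Topic 1 to Sub1, Sub2; Topic 2 to Sub1, Sub3). Events are reduced to integers, the buffer capacity of four is arbitrary, and returning the subscribers whose buffer was full is one possible way to surface the "publish can fail" weakness; none of this is taken from the prototype.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical fixed-capacity ring buffer standing in for one subscriber's ARMCI buffer.
template <typename T, std::size_t N>
class BoundedQueue {
public:
    bool push(const T& v) {
        if (size_ == N) return false;              // buffer full: the write fails
        items_[(head_ + size_) % N] = v;
        ++size_;
        return true;
    }
    bool pop(T& out) {
        if (size_ == 0) return false;
        out = items_[head_];
        head_ = (head_ + 1) % N;
        --size_;
        return true;
    }
    std::size_t size() const { return size_; }
private:
    std::array<T, N> items_{};
    std::size_t head_ = 0;
    std::size_t size_ = 0;
};

using Buffer = BoundedQueue<int, 4>;   // events reduced to ints for brevity

struct SubscriberSideService {
    // Master topic list: topic name -> subscriber ids.
    std::map<std::string, std::vector<int>> topicSubscribers;
    // One buffer per subscriber, living in that subscriber's memory.
    std::map<int, Buffer> buffers;

    // The publisher writes the event into every subscriber's buffer and returns the
    // ids of subscribers whose buffer was full (the event is lost for them).
    std::vector<int> publish(const std::string& topic, int event) {
        std::vector<int> failed;
        auto it = topicSubscribers.find(topic);
        if (it == topicSubscribers.end()) return failed;
        for (int sub : it->second)
            if (!buffers[sub].push(event)) failed.push_back(sub);
        return failed;
    }
};
```

Because each subscriber drains its own buffer, a slow subscriber no longer causes contention at the publisher, but once its buffer fills, every further publish to its topics fails for that subscriber only, which is the unpredictable delivery noted above.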

Polygraph Issues: Delivery Semantics

• Basic pub-sub is good for N-to-N event distribution
   - Need to keep events until all subscribers consume them
   - An optional ‘time-to-live’ in the header can help
• Workload distribution use cases require ‘load-balancing’ topics
   - Same programmatic interface
   - Each event is consumed by only one subscriber
   - No complex event retention issues
   - Could define load-balancing policies for publishers, perhaps declaratively?
• A ‘one-to-one’ queue-like mechanism may also be useful?
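For basic N-to-N pub-sub, the retention rule above (keep an event until every subscriber has consumed it, unless its optional time-to-live expires first) can be sketched as follows. Time is an abstract counter, and the names and the choice to reclaim memory only on an explicit `reclaim()` call are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <utility>

// An event retained by the topic until it is fully consumed or expires.
struct RetainedEvent {
    int payload;
    long expiresAt;          // absolute time; negative means no time-to-live
    std::set<int> pending;   // subscribers that have not consumed it yet
};

class PubSubTopic {
public:
    explicit PubSubTopic(std::set<int> subscribers) : subs_(std::move(subscribers)) {}

    // Every current subscriber must consume the event before it can be reclaimed.
    void publish(int payload, long now, long ttl = -1) {
        events_.push_back(RetainedEvent{payload, ttl < 0 ? -1L : now + ttl, subs_});
    }

    // A subscriber reads the oldest event it has not consumed yet.
    bool consume(int subscriber, int& out) {
        for (RetainedEvent& ev : events_)
            if (ev.pending.erase(subscriber) == 1) { out = ev.payload; return true; }
        return false;
    }

    // Drop events that all subscribers have read or whose time-to-live has passed.
    void reclaim(long now) {
        std::deque<RetainedEvent> kept;
        for (RetainedEvent& ev : events_) {
            bool expired = ev.expiresAt >= 0 && now >= ev.expiresAt;
            if (!ev.pending.empty() && !expired) kept.push_back(std::move(ev));
        }
        events_.swap(kept);
    }

    std::size_t retained() const { return events_.size(); }

private:
    std::set<int> subs_;
    std::deque<RetainedEvent> events_;
};
```

A load-balancing topic would replace the per-event set of pending subscribers with a single chosen consumer, which is why it sidesteps this retention bookkeeping.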


Issues: Topic Memory Management

• Managing memory for a topic is tricky:
   - Need to know how many subscribers there are for each specific event
   - Events are variable size, hence allocating/reclaiming memory for events is complex
• One possibility: typed topics
   - Associate an event type with a topic
   - Specify a maximum size for any event
   - Simplifies memory management for each topic
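With typed topics, every event fits in a slot of the declared maximum size, so topic memory can be a plain array of slots with a free list, and a per-slot counter of remaining readers answers "how many subscribers for this event". The class below is a hypothetical sketch; its names, and the choice to reject oversized events or a full topic by returning -1, are assumptions, since the slides leave the error behaviour open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical typed topic: every event fits in a fixed-size slot.
class TypedTopic {
public:
    TypedTopic(std::size_t maxEventBytes, std::size_t slots, int subscribers)
        : slotBytes_(maxEventBytes), memory_(maxEventBytes * slots),
          remaining_(slots, 0), subscribers_(subscribers) {
        for (std::size_t i = slots; i-- > 0;) free_.push_back(i);   // slot 0 handed out first
    }

    // Copies the event into a free slot; returns the slot index, or -1 if the
    // event exceeds the declared maximum size or the topic is full.
    long publish(const void* data, std::size_t bytes) {
        if (bytes > slotBytes_ || free_.empty()) return -1;
        std::size_t slot = free_.back();
        free_.pop_back();
        std::memcpy(&memory_[slot * slotBytes_], data, bytes);
        remaining_[slot] = subscribers_;   // each subscriber must read it once
        return static_cast<long>(slot);
    }

    // A subscriber is done with the event; the last reader frees the slot.
    void consumed(std::size_t slot) {
        if (remaining_[slot] > 0 && --remaining_[slot] == 0) free_.push_back(slot);
    }

    const unsigned char* slotData(std::size_t slot) const { return &memory_[slot * slotBytes_]; }
    std::size_t freeSlots() const { return free_.size(); }

private:
    std::size_t slotBytes_;
    std::vector<unsigned char> memory_;
    std::vector<int> remaining_;
    std::vector<std::size_t> free_;
    int subscribers_;
};
```

Allocation and reclamation are both constant time and never fragment, at the cost of wasting the unused tail of each slot when events are smaller than the declared maximum.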


Issues - Miscellaneous

• What are the semantics when a new subscriber subscribes to a topic?
   - What exactly do they see?
   - All messages in the topic queue at subscription time?
   - Only new ones?
• In the ARMCI implementation, memory for topic queues is finite
   - Should it be user-configurable?
   - What happens when topic memory is full?
   - A standard publish error defined by the Event Service?


Issues - Miscellaneous

• The Event Service SIDL doesn’t clearly demarcate whether there are:
   - Calls for publishers only?
   - Calls for subscribers only?
• So … what happens if:
   - A publisher calls ReleaseTopic()?
   - A publisher calls ProcessEvents()?
• How can CreateTopic() fail?
   - Two publishers call CreateTopic() in a non-deterministic sequence. What happens?
   - Can a subscriber call CreateTopic()?
• Why is the argument to ReleaseTopic() a string?
   - Would a valid Topic reference be less error-prone/simpler?
• Should events have a ‘standard’ header?
   - Used by all event service implementations
   - Not settable programmatically
   - E.g. time-to-live, timestamp, correlation-id, likely others …
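One way to realize a standard header that is not settable programmatically is to keep those fields in a separate struct that only the service can write. The field set follows the examples listed above; the class names, the service-wide default time-to-live, and the sequential correlation ids are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Standard fields every implementation would fill in (hypothetical layout).
struct StandardHeader {
    std::int64_t timestampMs = 0;
    std::int64_t timeToLiveMs = -1;   // -1: no expiry
    std::uint64_t correlationId = 0;
};

class Event {
public:
    std::map<std::string, std::string> header;   // user-settable fields
    std::map<std::string, std::string> body;
    // Read-only view of the standard fields.
    const StandardHeader& standardHeader() const { return std_; }
private:
    friend class StampingService;   // only the service may write the standard fields
    StandardHeader std_;
};

class StampingService {
public:
    explicit StampingService(std::int64_t defaultTtlMs) : defaultTtlMs_(defaultTtlMs) {}
    // Called by the service at publish time; user code has no way to set these fields.
    void stamp(Event& ev, std::int64_t nowMs) {
        ev.std_.timestampMs = nowMs;
        ev.std_.timeToLiveMs = defaultTtlMs_;
        ev.std_.correlationId = ++lastId_;
    }
private:
    std::int64_t defaultTtlMs_;
    std::uint64_t lastId_ = 0;
};
```

Keeping the standard fields out of the user-visible TypeMap also means every implementation can rely on them being present and well-typed.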


Next steps …

• Build the alternative ‘subscriber side’ ARMCI implementation
• Detailed performance analysis …
• Use the Event Service to implement the PolyGraph use case
