High Performance Event Service for CCA Framework: Design and Experiences
Khushbu Agarwal, Manoj Krishnan, Daniel Chavarria, Ian Gorton
Outline
Event Service Overview
Design and Implementation
Design Challenges
Preliminary Results
Conclusion
CCA Event Service 101
Publish-subscribe: 1-n, n-m, n-1
Specification is similar to many distributed event/messaging services
Figure: UML class diagram of the SIDL interfaces
interface EventService <<SIDL>>
Operations: processEvents(): void, CreateTopic(): Topic, CreateWildcardTopic(): WildcardTopic, getTopic(): Topic, getWildcardTopic(): WildcardTopic, ReleaseTopic(): void, ReleaseWildcardTopic(): void
interface Topic <<SIDL>>
Operations: sendEvent(): void, getTopicName(): String, RegisterEventListener(): void, UnRegisterEventListener(): void
interface WildcardTopic <<SIDL>>
interface EventListener <<SIDL>>
Operations: processEvent(): void
interface Event <<SIDL>>
Operations: setHeader(): void, getHeader(): TypeMap, getBody(): TypeMap, setBody(): void
interface TypeMap <<SIDL>>
Associations: manages, processes (multiplicities shown: 2, 1, *)
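The relationship between these interfaces can be pictured with a minimal single-process C++ sketch. It is not the Babel-generated binding or the implementation described in this talk: the class names mirror the SIDL interfaces, but the method signatures, the string-only TypeMap, and the use of std::shared_ptr are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the SIDL TypeMap (assumption: string keys and values only).
using TypeMap = std::map<std::string, std::string>;

// An event carries a header and a body, each a TypeMap.
struct Event {
    TypeMap header;
    TypeMap body;
};

// Mirrors EventListener.processEvent(); the parameters are assumed.
struct EventListener {
    virtual ~EventListener() = default;
    virtual void processEvent(const std::string& topic, const Event& e) = 0;
};

// Mirrors Topic: queues events until the service delivers them.
class Topic {
public:
    explicit Topic(std::string name) : name_(std::move(name)) {}
    const std::string& getTopicName() const { return name_; }
    void registerEventListener(std::shared_ptr<EventListener> l) {
        listeners_.push_back(std::move(l));
    }
    void sendEvent(const Event& e) { pending_.push_back(e); }
    // Hand every queued event to every listener, then empty the queue.
    void deliver() {
        for (const Event& e : pending_)
            for (auto& l : listeners_) l->processEvent(name_, e);
        pending_.clear();
    }
private:
    std::string name_;
    std::vector<std::shared_ptr<EventListener>> listeners_;
    std::vector<Event> pending_;
};

// Mirrors EventService: owns the topics and drives delivery.
class EventService {
public:
    std::shared_ptr<Topic> createTopic(const std::string& name) {
        std::shared_ptr<Topic>& slot = topics_[name];
        if (!slot) slot = std::make_shared<Topic>(name);
        return slot;
    }
    std::shared_ptr<Topic> getTopic(const std::string& name) const {
        auto it = topics_.find(name);
        if (it == topics_.end()) return nullptr;
        return it->second;
    }
    void releaseTopic(const std::string& name) { topics_.erase(name); }
    void processEvents() {
        for (auto& kv : topics_) kv.second->deliver();
    }
private:
    std::map<std::string, std::shared_ptr<Topic>> topics_;
};

// Listener used in the demo: counts what it receives.
struct CountingListener : EventListener {
    int count = 0;
    void processEvent(const std::string&, const Event&) override { ++count; }
};

// 1-n distribution: one publisher, two subscribers on the same topic.
void demo_pubsub() {
    EventService service;
    auto topic = service.createTopic("mesh.update");
    auto a = std::make_shared<CountingListener>();
    auto b = std::make_shared<CountingListener>();
    topic->registerEventListener(a);
    topic->registerEventListener(b);
    Event e;
    e.body["step"] = "1";
    topic->sendEvent(e);
    assert(a->count == 0);  // nothing is delivered until processEvents()
    service.processEvents();
    assert(a->count == 1 && b->count == 1);
}
```

Calling demo_pubsub() from main() exercises the sketch. Here createTopic() returns the existing topic when the name is already taken, which is only one possible answer to the CreateTopic() questions raised later in the deck.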
Possible use cases
Event/message distribution between components in the same framework
Initial SciRun implementation
Event/message distribution across processes in a HPC application
Across address spaces
Needs to be fast
Handle a range of potential payload sizes
Event/messaging service schizophrenia!!
Other work exists …
ECho
Grid event service
CCA Event Service
Started with Utah CCA/SciRun event service implementation
Created two standalone prototypes (no SIDL, no framework): (‘08)
Reliable: events transferred via files
Fast: events transferred over ARMCI on Cray XD1
Single-sided memory transfers
Now: Event Service in CCA Framework
Framework design based on specification provided by CCA forum
Sidl Specification
Interfaces:
Event
EventListener
EventServiceException
Subscriber
Topic
Ports:
interface PublisherEventService extends cca.Port { }
interface SubscriberEventService extends cca.Port { }
Modifications from Sidl Specification :
Modifications to Interfaces
Topic /* The wildcard topic management is not implemented yet */
Publisher: needed by PublisherEventService
ESTypeMap --extends=gov.cca.TypeMap: added Relocate()
Added Components
EventService --provides=PublisherEventService, SubscriberEventService
Driver --go=run --uses=SubscriberEventService, PublisherEventService
Implementation Experience
Used sidl file provided at the CCA forum website
bocca create EventService --import-sidl = EventService.sidl
Doesn't work
LL1: The sidl file needs to be separated into multiple files (one for each interface)
bocca create interface Event --import-sidl=Event.sidl
bocca create class CCAEvent --implements=Event
....
bocca create interface Subscriber --import-sidl=Subscriber.sidl
bocca create class CCASubscriber --implements=Subscriber
....
Implementation Experiences (contd.)
Specifying dependencies during creation
Example:
bocca create interface A1 {func ();}
bocca create interface A2 --requires=A1
{ A1 obj_1; }
bocca create interface A3 --requires=A2
{
A2.obj_1.func(); // Error: A1 is undefined
}
Implementation Experience (contd.)
Solution:
Redo create interface with explicit dependencies
bocca create interface A3 --requires=A2,A1
and include <A1> explicitly in A3’s implementation file
Example:
bocca create class CCASubscriber --implements=Subscriber
--requires=CCAESTypeMap --requires=CCAEvent
LL2: Specify dependencies for all levels explicitly
LL3: Cannot modify dependencies on the fly
Implementation Experiences (contd.)
Accessing data not defined in sidl file. E.g.:
Sidl File: interface A { }
A_Impl.cxx: CCA class A_impl{
ComplexType x; // Class A, member variable x
A_func1(ComplexType y); // member function A_func1
}
C_Impl.cxx: CCA class C_impl{
ComplexType p = A_obj.x; // Error: Class A does not define x
A_obj.A_func1(); // Error: Class A does not define A_func1
}
Implementation Experiences (contd.)
Solution:
Access data through member methods only
Define member method parameters using opaque
A Sidl File: interface A {
opaque getComplexType();
void setComplexType(in opaque);
}
A_Impl.cxx: CCA class A_impl{
ComplexType x;
};
Implementation Experiences (contd.)
C_Impl.cxx: CCA class C_impl{
func() {
ComplexType* y =
(ComplexType*)A::getComplexType();
}
};
LL4: Changes made to _Impl.xxx files are not visible to other classes unless specified in sidl file
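The opaque-accessor workaround can be sketched in plain C++, where a SIDL opaque value corresponds to an untyped pointer. The classes below are hand-written stand-ins, not Babel-generated code; ComplexType, its fields, and sampleCount() are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical library type that the sidl file cannot describe directly.
struct ComplexType {
    std::string label;
    std::vector<double> samples;
};

// Stand-in for A_Impl: the member is private to the implementation.
class A_impl {
public:
    // Corresponds to "opaque getComplexType()": hands out an untyped pointer.
    void* getComplexType() { return &x_; }
    // Corresponds to "void setComplexType(in opaque)": copies from the pointee.
    void setComplexType(void* p) { x_ = *static_cast<ComplexType*>(p); }
private:
    ComplexType x_;
};

// Stand-in for C_Impl: only the caller knows the real type behind the pointer.
class C_impl {
public:
    explicit C_impl(A_impl& a) : a_(a) {}
    std::size_t sampleCount() {
        ComplexType* y = static_cast<ComplexType*>(a_.getComplexType());
        return y->samples.size();
    }
private:
    A_impl& a_;
};

void demo_opaque() {
    A_impl a;
    ComplexType v{"pressure", {1.0, 2.0, 3.0}};
    a.setComplexType(&v);
    C_impl c(a);
    assert(c.sampleCount() == 3);
}
```

The cast is unchecked, so both sides must agree on the type by convention, and the pointer stays valid only while the A_impl object is alive.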
Implementation Experiences (contd.): Returning CCA objects as pointers
C++:
class_A* func1(){
class_A *p = new class_A;
……
return p;
}
CCA:
opaque func1(){
CCA_class_A *p = &(CCA_class_A::_create());
………
return p;
}
CCA implementation: func1() returns an invalid pointer. Why?
The CCA object is destroyed at the end of the function.
Solution: Return objects instead of pointers.
Edit class sidl file: bocca edit CCAESTypeMap
CCA_class_A func1(){
CCA_class_A p = CCA_class_A::_create();
………
return p;
}
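Why returning the object works can be illustrated with a small C++ sketch. It assumes that a CCA stub behaves like a reference-counted handle; std::shared_ptr stands in for that mechanism here, and Handle, Impl, and the live-object counter are hypothetical names, not Babel types.

```cpp
#include <cassert>
#include <memory>
#include <utility>

static int live_objects = 0;  // number of Impl objects currently alive

struct Impl {
    Impl() { ++live_objects; }
    ~Impl() { --live_objects; }
    int value = 42;
};

// Hypothetical stand-in for a CCA stub: a handle that shares ownership.
class Handle {
public:
    static Handle _create() { return Handle(std::make_shared<Impl>()); }
    int value() const { return impl_->value; }
private:
    explicit Handle(std::shared_ptr<Impl> p) : impl_(std::move(p)) {}
    std::shared_ptr<Impl> impl_;
};

// Returning the handle by value keeps the underlying object alive.
Handle make_by_value() {
    Handle p = Handle::_create();
    return p;
}

// Counts live objects right after a local handle has gone out of scope,
// which is the state a raw pointer to that handle would be left in.
int live_after_local_handle() {
    { Handle tmp = Handle::_create(); (void)tmp; }
    return live_objects;
}

void demo_return_by_value() {
    int after_scope = live_after_local_handle();
    assert(after_scope == 0);   // object already destroyed
    Handle h = make_by_value();
    assert(live_objects == 1);  // still alive through the returned handle
    assert(h.value() == 42);
    (void)after_scope;
}
```

The sketch never dereferences a stale pointer; it only counts live objects to show that the local handle's object is gone by the time a pointer to it would be used.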
Design Challenges
Sidl file does not allow defining variables of non-primitive data types (e.g. from libraries)
New constructors cannot be defined in the sidl file (and are not accessible if defined in _Impl.xxx files)
Introduce init() function calls for all classes
Object creation
Use _create()
New operator does not work
Explicitly freeing memory for a Babel object introduces segfaults
Auto de-allocation when object goes out of scope
Used dummy namespace to trigger destructor of EventService object
Directory process needs to receive Quit eventservice message
Mapping CCA objects to different address space
Challenges
CCA objects contain implicit pointers
CCA objects have dynamic allocation of some data (change size)?
Solution 1
Sender unpacks and transfers elementary data. Receiver reconstructs required CCA object
Possible, but tricky in case of nested CCA objects
Example: Trying to transfer CCA Events between address spaces requires transferring ESTypeMaps which are represented using two FixedMaps, which in turn store primitive data
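The unpack/transfer/reconstruct idea of Solution 1 can be sketched for a nested event as follows. This is an illustration only: TypeMap is simplified to integer values, the wire layout (count, key length, key bytes, value) is an assumption, and values are written in host byte order, so both address spaces are assumed to share an architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified TypeMap: integer entries only.
using TypeMap = std::map<std::string, std::int64_t>;
struct Event {
    TypeMap header;
    TypeMap body;
};

using Buffer = std::vector<unsigned char>;

// Append a 64-bit value in host byte order.
void put_u64(Buffer& b, std::uint64_t v) {
    unsigned char raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    b.insert(b.end(), raw, raw + sizeof v);
}

// Read a 64-bit value and advance the read position.
std::uint64_t get_u64(const Buffer& b, std::size_t& pos) {
    std::uint64_t v = 0;
    std::memcpy(&v, b.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Sender side: reduce one map to a count, key bytes, and values.
void pack_map(Buffer& b, const TypeMap& m) {
    put_u64(b, m.size());
    for (const auto& kv : m) {
        put_u64(b, kv.first.size());
        b.insert(b.end(), kv.first.begin(), kv.first.end());
        put_u64(b, static_cast<std::uint64_t>(kv.second));
    }
}

// Receiver side: rebuild the map from the primitive stream.
TypeMap unpack_map(const Buffer& b, std::size_t& pos) {
    TypeMap m;
    const std::uint64_t n = get_u64(b, pos);
    for (std::uint64_t i = 0; i < n; ++i) {
        const std::size_t len = static_cast<std::size_t>(get_u64(b, pos));
        std::string key(reinterpret_cast<const char*>(b.data() + pos), len);
        pos += len;
        m[key] = static_cast<std::int64_t>(get_u64(b, pos));
    }
    return m;
}

// The nested object is flattened one level at a time: header, then body.
Buffer pack_event(const Event& e) {
    Buffer b;
    pack_map(b, e.header);
    pack_map(b, e.body);
    return b;
}

Event unpack_event(const Buffer& b) {
    std::size_t pos = 0;
    Event e;
    e.header = unpack_map(b, pos);
    e.body = unpack_map(b, pos);
    return e;
}

void demo_flatten() {
    Event e;
    e.header["ttl"] = 5;
    e.body["step"] = 12;
    e.body["cells"] = 4096;
    Event r = unpack_event(pack_event(e));
    assert(r.header == e.header && r.body == e.body);
}
```

Every additional level of nesting needs its own pack/unpack pair, which is where the approach becomes tricky for real CCA objects.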
Mapping CCA objects to different address space (contd.) – Potential Solutions
Solution 2
Babel RMI: clean, but potentially lower performance solution for HPC needs
Solution 3
Babel support could be extended to provide RMI on MPI/ARMCI
Solution 4
Babel may provide an abstraction that returns a pointer to a flat C++ object given a CCA object
Preliminary Results
4 processes
CCA
Process 0: Total EventService Time 129.041195 (ms)
Process 2: Total EventService Time 129.059076 (ms)
Process 1: Total EventService Time 129.112005 (ms)
Process 3: Total EventService Time 129.111052 (ms)
C++
Process 0: Total EventService Time 1.814842 (ms)
Process 1: Total EventService Time 1.585007 (ms)
Process 2: Total EventService Time 2.148151 (ms)
Process 3: Total EventService Time 1.943827 (ms)
Conclusion
Efficient transfer of CCA objects across multiple address spaces remains to be addressed.
The efficiency of the event service design is somewhat limited by the CCA constraints.
The event service implementation is similar to the CCA-provided specification.
Questions?
More::
ARMCI Prototype
Goals:
Maintain interface/semantics of the event service model
Achieve high performance in a distributed memory HPC system
Used combination of MPI & ARMCI
MPI: Process 0 operates as a Topic Directory process
Maintains a Topic List with the locations of the publishers
Uses an MPI messaging protocol to serve topic creation requests and queries
ARMCI: Publishers create events locally in their own address space
Subscribers read remote events from the publishers using one-sided ARMCI_Get() operations
No need for coordination with the publisher
ARMCI Prototype (cont.)
Used a combination of MPI & ARMCI to create the event service
Transfer C++ class instances directly over ARMCI without the need for type serialization
Events comprise two TypeMaps: header and body
Created a special heap manager for the ARMCI address space
Objects can be allocated directly through standard new() and delete() operators
Synchronous garbage collection by the publisher
For high performance, all objects in the ARMCI heap are flattened
No pointers or references to external objects
Member variables embedded
Fixed size
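What "flattened" means can be sketched with a fixed-size, pointer-free event type that survives a raw byte copy. The names FlatMap, FlatEvent, and fetch() are hypothetical, the capacities are arbitrary, and std::memcpy stands in for the one-sided transfer; no ARMCI call is made in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical flattened map: fixed capacity, keys and values stored inline,
// so the object holds no pointers to memory outside itself.
template <std::size_t MaxEntries, std::size_t MaxKey>
struct FlatMap {
    struct Entry {
        char key[MaxKey];
        double value;
    };
    std::size_t count = 0;
    Entry entries[MaxEntries] = {};

    // Insert or overwrite; fails when the key is too long or the map is full.
    bool set(const char* key, double value) {
        if (std::strlen(key) >= MaxKey) return false;
        for (std::size_t i = 0; i < count; ++i) {
            if (std::strcmp(entries[i].key, key) == 0) {
                entries[i].value = value;
                return true;
            }
        }
        if (count == MaxEntries) return false;
        std::strcpy(entries[count].key, key);
        entries[count].value = value;
        ++count;
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const double* get(const char* key) const {
        for (std::size_t i = 0; i < count; ++i)
            if (std::strcmp(entries[i].key, key) == 0) return &entries[i].value;
        return nullptr;
    }
};

// An event is two flattened maps: header and body.
struct FlatEvent {
    FlatMap<8, 16> header;
    FlatMap<8, 16> body;
};

// A raw byte copy is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<FlatEvent>::value,
              "FlatEvent must be copyable byte for byte");

// Stand-in for a one-sided get: copy the bytes of a "remote" event.
void fetch(const void* remote, FlatEvent& local) {
    std::memcpy(&local, remote, sizeof(FlatEvent));
}

void demo_flat_event() {
    FlatEvent published;
    published.header.set("ttl", 5.0);
    published.body.set("residual", 0.25);

    FlatEvent received;
    fetch(&published, received);
    assert(*received.body.get("residual") == 0.25);
    assert(received.header.get("missing") == nullptr);
}
```

Because the size is fixed and nothing points outside the object, the receiver needs no reconstruction step beyond the copy itself. The trade-off is a hard limit on the number of entries and on key length.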
Initial Performance Results
We measured event processing rates:
66K events/second with one publisher/one subscriber (small event, 4 KB)
950 events/second with one publisher/16 subscribers (large event, 50 KB)
Minimal overhead to reconstruct the object on the subscriber after the transfer
Processing Rate
Figure: events/second (axis from 0 to 70,000) versus # of subscribers (1, 2, 4, 8, 16), with one series for 50 KB event size and one for 4 KB event size.
Analysis
Performance drops as number of subscribers increases
Contention for events at publisher ARMCI memory
Alternative implementations are possible:
Maintain topics for subscribers only in local ARMCI memory
Publishers write to subscriber memory directly for each event published
Alternative Design
Figure: alternative design diagram. Labels: Publisher, Subscriber, Send(), Master topic list (Topic 1, Topic 2; Sub1 Sub2, Sub1 Sub3), ARMCI Subscriber Buffer (Topic1, Topic2), Topic struct (mess, mess, next).
Maintain topic list in process 0 (using MPI) or ARMCI shared memory?
Strengths?
Likely reduced contention
Simplifies ‘publish semantics’ and event retention issues
Weaknesses?
Publish can fail if subscriber memory full
Some subscribers slower than others - events delivered unpredictably depending on consumption rate
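The subscriber-side design and its main weakness can be sketched with a bounded buffer per subscriber. This is a single-process illustration under assumed names (SubscriberBuffer, publish()); the real design would place each buffer in the subscriber's ARMCI memory and write to it with one-sided operations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical bounded event buffer owned by one subscriber.
class SubscriberBuffer {
public:
    explicit SubscriberBuffer(std::size_t capacity) : slots_(capacity) {}

    // Called by the publisher; fails when the subscriber has not kept up.
    bool push(const std::string& event) {
        if (size_ == slots_.size()) return false;
        slots_[(head_ + size_) % slots_.size()] = event;
        ++size_;
        return true;
    }

    // Called by the subscriber; fails when no event is waiting.
    bool pop(std::string& out) {
        if (size_ == 0) return false;
        out = slots_[head_];
        head_ = (head_ + 1) % slots_.size();
        --size_;
        return true;
    }

private:
    std::vector<std::string> slots_;
    std::size_t head_ = 0;
    std::size_t size_ = 0;
};

// Publisher-side send: write the event into every subscriber's buffer and
// report how many subscribers could not accept it.
std::size_t publish(const std::vector<SubscriberBuffer*>& subscribers,
                    const std::string& event) {
    std::size_t failed = 0;
    for (SubscriberBuffer* s : subscribers)
        if (!s->push(event)) ++failed;
    return failed;
}

void demo_subscriber_side() {
    SubscriberBuffer fast(2), slow(2);
    std::vector<SubscriberBuffer*> subs{&fast, &slow};

    std::size_t failed = publish(subs, "e1");
    failed += publish(subs, "e2");
    assert(failed == 0);

    std::string ev;
    while (fast.pop(ev)) {}        // only the fast subscriber consumes
    assert(ev == "e2");

    failed = publish(subs, "e3");  // the slow subscriber's buffer is full
    assert(failed == 1);

    bool got = slow.pop(ev);       // the slow subscriber is still at e1
    assert(got && ev == "e1");
    (void)failed;
    (void)got;
}
```

What the publisher should do with a non-zero failure count (retry, drop, or raise an error) is left open here, as it is in the slides.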
Polygraph Issues: Delivery Semantics
Basic pub-sub good for N-to-N event distribution
Need to keep events until all subscribers consume them
Optional ‘time-to-live’ in header can help
Workload distribution use cases require ‘load-balancing’ topics
Same programmatic interface
Each event consumed by only one subscriber
No complex event retention issues
Could define load-balancing policies for publishers
Declaratively?
A ‘one-to-one’ queue-like mechanism may also be useful?
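A load-balancing topic of the kind proposed above might look like the following sketch. The class name, the handler signature, and the round-robin policy are all assumptions; the slides leave the policy open.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical load-balancing topic: same publish/subscribe calls as a
// broadcast topic, but each event goes to exactly one subscriber.
class LoadBalancedTopic {
public:
    using Handler = std::function<void(const std::string&)>;

    void subscribe(Handler h) { handlers_.push_back(std::move(h)); }

    // Round-robin policy (one possible choice). Returns false when there is
    // no subscriber to take the event.
    bool publish(const std::string& event) {
        if (handlers_.empty()) return false;
        handlers_[next_](event);
        next_ = (next_ + 1) % handlers_.size();
        return true;
    }

private:
    std::vector<Handler> handlers_;
    std::size_t next_ = 0;
};

void demo_load_balancing() {
    LoadBalancedTopic topic;
    std::vector<int> handled(3, 0);
    for (std::size_t i = 0; i < handled.size(); ++i)
        topic.subscribe([&handled, i](const std::string&) { ++handled[i]; });

    for (int n = 0; n < 7; ++n) {
        bool ok = topic.publish("work item");
        assert(ok);
        (void)ok;
    }
    // 7 events over 3 subscribers: 3, 2, 2, and nothing is delivered twice.
    assert(handled[0] == 3 && handled[1] == 2 && handled[2] == 2);
}
```

Since each event has a single consumer, it can be discarded as soon as it is handed over, which is why the retention problem of broadcast topics does not arise.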
Issues: Topic Memory Management
Managing memory for a topic is tricky:
Need to know how many subscribers for each specific event
Events are variable size, hence allocating/reclaiming memory for events is complex
One possibility: typed topics
Associate an event type with a topic
Specify maximum size for any event
Simplifies memory management for each topic
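How a typed topic simplifies memory management can be sketched with a fixed pool of equal-sized slots and a per-slot reader count. TypedTopic, ResidualEvent, and the publish/consume signatures are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical typed topic: one event type, a fixed number of equal slots.
template <typename EventT, std::size_t Capacity>
class TypedTopic {
    static_assert(std::is_trivially_copyable<EventT>::value,
                  "events are stored inline and copied by value");
public:
    // Total event storage is known at compile time.
    static constexpr std::size_t storage_bytes() {
        return Capacity * sizeof(EventT);
    }

    // Store an event that `subscribers` readers must consume.
    // Returns the slot index, or -1 when every slot is still in use.
    int publish(const EventT& e, unsigned subscribers) {
        if (subscribers == 0) return -1;
        for (std::size_t i = 0; i < Capacity; ++i) {
            if (remaining_[i] == 0) {
                slots_[i] = e;
                remaining_[i] = subscribers;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // One subscriber reads the slot; the last reader frees it.
    EventT consume(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < Capacity);
        assert(remaining_[slot] > 0);
        --remaining_[slot];
        return slots_[slot];
    }

private:
    EventT slots_[Capacity] = {};
    unsigned remaining_[Capacity] = {};
};

// Hypothetical event type with a fixed size.
struct ResidualEvent {
    int step;
    double residual;
};

void demo_typed_topic() {
    TypedTopic<ResidualEvent, 2> topic;
    int s0 = topic.publish({1, 0.5}, 2);    // two subscribers must read it
    int s1 = topic.publish({2, 0.25}, 1);
    int full = topic.publish({3, 0.125}, 1);
    assert(s0 == 0 && s1 == 1 && full == -1);  // topic memory is full

    ResidualEvent r = topic.consume(s1);       // last reader frees slot 1
    int reused = topic.publish({3, 0.125}, 1);
    assert(r.step == 2 && reused == 1);

    r = topic.consume(s0);                     // slot 0 still has one reader
    int still_full = topic.publish({4, 0.0}, 1);
    assert(r.step == 1 && still_full == -1);
    (void)full; (void)reused; (void)still_full;
}
```

With a single event type per topic, reclaiming memory reduces to resetting a counter, at the cost of reserving the maximum size for every event.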
Issues - Miscellaneous
What are semantics when a new subscriber subscribes to a topic?
What exactly do they see?
All messages in topic queue at subscription time?
Only new ones?
In ARMCI implementation, memory for topic queues is finite
Should it be user-configurable?
What happens when topic memory full?
Standard publish error defined by Event Service?
Issues - Miscellaneous
Event Service SIDL doesn’t clearly demarcate if there are:
Calls for publishers only?
Calls for subscribers only?
So … what happens if:
A publisher calls ReleaseTopic()?
A publisher calls ProcessEvents()?
How can CreateTopic() fail?
Two publishers call CreateTopic() in a non-deterministic sequence. What happens?
Can a subscriber call CreateTopic()?
Why is the argument to ReleaseTopic() a string?
Would a valid Topic reference be less error-prone/simpler?
Should events have a ‘standard’ header?
Used by all event service implementations
Not settable programmatically
E.g. time-to-live, timestamp, correlation-id, likely others …
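One way to picture a ‘standard’ header that user code can read but not set is sketched below. The field set comes from the examples on the slide; the class names, the millisecond units, the convention that a time-to-live of 0 means "never expires", and the HeaderIssuer helper are assumptions.

```cpp
#include <cassert>
#include <cstdint>

class HeaderIssuer;

// Hypothetical standard header: readable by anyone, but only the issuing
// service can construct one, so user code cannot set the fields directly.
class StandardHeader {
public:
    std::uint64_t timestamp_ms() const { return timestamp_ms_; }
    std::uint64_t time_to_live_ms() const { return ttl_ms_; }
    std::uint64_t correlation_id() const { return correlation_id_; }

    // An event may be discarded once its time-to-live has elapsed.
    bool expired(std::uint64_t now_ms) const {
        if (ttl_ms_ == 0 || now_ms < timestamp_ms_) return false;
        return now_ms - timestamp_ms_ >= ttl_ms_;
    }

private:
    friend class HeaderIssuer;
    StandardHeader(std::uint64_t ts, std::uint64_t ttl, std::uint64_t id)
        : timestamp_ms_(ts), ttl_ms_(ttl), correlation_id_(id) {}

    std::uint64_t timestamp_ms_;
    std::uint64_t ttl_ms_;
    std::uint64_t correlation_id_;
};

// Stand-in for the part of an event service that stamps outgoing events.
class HeaderIssuer {
public:
    StandardHeader stamp(std::uint64_t now_ms, std::uint64_t ttl_ms) {
        return StandardHeader(now_ms, ttl_ms, next_id_++);
    }
private:
    std::uint64_t next_id_ = 1;
};

void demo_standard_header() {
    HeaderIssuer issuer;
    StandardHeader h = issuer.stamp(1000, 50);
    assert(!h.expired(1049));
    assert(h.expired(1050));
    StandardHeader forever = issuer.stamp(1000, 0);
    assert(!forever.expired(999999));
    assert(h.correlation_id() == 1 && forever.correlation_id() == 2);
}
```

The time-to-live field is the same optional header entry suggested earlier for limiting how long events must be retained.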
Next steps …
Implement alternative ‘subscriber side’ ARMCI implementation
Detailed performance analysis …
Use Event Service to implement PolyGraph use case