Composing Just Enough Middleware
Christopher D. [email protected]
Department of Computer Science and Engineering, Washington University
St. Louis, MO, USA
E71 CS 6785 Programming Languages Seminar
Friday, October 17, 2003
Talk Outline
• Part I: Introduction to middleware
(30 minutes + Q&A)
• Part II: Composing just enough middleware
(60 minutes + Q&A)
• Questions are welcome and encouraged
(During and after each part)
Part I: Introduction to Middleware
• Middleware is a “glue” layer between
  – “Fixed” infrastructure (hardware, operating system)
  – Variable (application-specific) parts of a system
• Middleware raises the level of abstraction
  – At which developers program the system
[Figure: a message passes between Client and Server; lots of details hidden by middleware]
Distributed Object Computing (DOC)
• Assumes objects, references, methods
• Also, an Object Request Broker (ORB)
• Historically, a user-space software layer
  – Between the operating system and the application
  – Middleware may be pushed into OS, hardware
    • Sensor nets, on-chip FPGAs, hardware threading
  – But for this talk, assume
    • Implemented on top of threads, sockets, and timers
[Figure: Client invokes method () on Object through a reference; lots of details again hidden by middleware]
Maintaining the DOC Illusion
• A “reference” to an object
  – Encodes IP address, ID
• A “wire format” is defined for invocation messages
  – Client stubs “marshal”
  – Server skeletons “un-marshal”
• A servant implements an object in a server
• All other details are ORB implementation features
  – Thread pools, sockets, etc.
  – Usually abstracted away
[Figure: Client, Stub, and ORB on one side; ORB, Skeleton, and Servant on the other; labels: IIOP message, object reference]
A Simple DOC Invocation Path
[Figure: invocation path through Client, Client Stub, CDR Marshaller, and ORB Core, then ORB Core, CDR Demarshaller, Simple Object Adapter, Server Skeleton, and Server]
Client side:
  1. Client invokes a method call on the remote object
  2. Stub creates a collection of parameters
  3. Parameters are marshalled
  4. IIOP request sent
Server side:
  5. IIOP request received
  6. Parameters are unmarshalled
  7. Servant object located
  8. Dispatch method call to servant
  9. Server method invoked
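The invocation path above can be sketched inside a single process. This is a minimal illustration, not CORBA code: `Wire`, `AdderServant`, `ObjectAdapter`, `skeleton_dispatch`, and `stub_add` are hypothetical names, the byte layout is invented for the example rather than taken from GIOP, and "sending" the request is a direct function call where a real ORB would use a socket.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical wire message: object ID followed by two 32-bit arguments.
// It stands in for an IIOP request; it is not the real GIOP layout.
using Wire = std::vector<std::uint8_t>;

// Marshal step: append a 32-bit value in host byte order.
inline void put32(Wire& w, std::int32_t v) {
    std::uint8_t raw[4];
    std::memcpy(raw, &v, 4);
    w.insert(w.end(), raw, raw + 4);
}

// Unmarshal step: read a 32-bit value at a byte offset.
inline std::int32_t get32(const Wire& w, std::size_t off) {
    assert(off + 4 <= w.size());
    std::int32_t v;
    std::memcpy(&v, w.data() + off, 4);
    return v;
}

// Servant: the object implementation living in the server.
struct AdderServant {
    std::int32_t add(std::int32_t a, std::int32_t b) { return a + b; }
};

// Object adapter: locates a servant from the object ID in the request.
struct ObjectAdapter {
    std::map<std::int32_t, AdderServant*> servants;
    AdderServant* locate(std::int32_t id) {
        auto it = servants.find(id);
        return it == servants.end() ? nullptr : it->second;
    }
};

// Server skeleton: unmarshal, locate the servant, dispatch the upcall.
inline std::int32_t skeleton_dispatch(ObjectAdapter& oa, const Wire& req) {
    std::int32_t id = get32(req, 0);
    std::int32_t a = get32(req, 4);
    std::int32_t b = get32(req, 8);
    AdderServant* s = oa.locate(id);
    assert(s != nullptr);  // a real ORB would raise an exception instead
    return s->add(a, b);
}

// Client stub: marshal the parameters and hand the request to the server.
inline std::int32_t stub_add(ObjectAdapter& server, std::int32_t id,
                             std::int32_t a, std::int32_t b) {
    Wire req;
    put32(req, id);
    put32(req, a);
    put32(req, b);
    return skeleton_dispatch(server, req);
}
```

The client only ever calls `stub_add`; the marshalling, servant lookup, and upcall stay hidden behind it, which is the abstraction the slide describes.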
But What’s Really Inside an ORB?
• ORB connects stubs to skeletons
• Uses threads, sockets, reactors, tables, etc.
• Ensures 0/1 delivery
  – Delivered to at most one recipient, at most once
  – Caller may get exception
    • User level: servant throws
    • ORB level: e.g., network
• Object adapters support multiple servants
[Figure: inside the ORB; labels: IIOP message, socket, send_n, recv_n, reactor, timers, thread pool, POA lookup, upcall to skeleton]
Marshalling (IIOP message)
• Standard CORBA GIOP message format
  – Mapped to TCP/IP protocols
• IDL parameter types
  – Primitive types
  – Arrays
  – Sequences
  – Structures
  – Interfaces
  – Anys
  – Typecodes
• Marshalling and un-marshalling are relatively expensive
  – Apply optimizations, e.g., co-location, where possible
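To make the cost of marshalling concrete, here is a small sketch of encoding a sequence of 32-bit integers as an element count followed by the elements, which is loosely how CDR lays out a sequence. The fixed big-endian byte order and the absence of a byte-order flag and alignment padding are simplifying assumptions, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Write a 32-bit unsigned value in big-endian order.
inline void write_u32(Buffer& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read a 32-bit unsigned value and advance the read position.
inline std::uint32_t read_u32(const Buffer& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = (std::uint32_t(in[pos]) << 24) |
                      (std::uint32_t(in[pos + 1]) << 16) |
                      (std::uint32_t(in[pos + 2]) << 8) |
                      std::uint32_t(in[pos + 3]);
    pos += 4;
    return v;
}

// Marshal: element count first, then each element.
inline Buffer marshal_sequence(const std::vector<std::int32_t>& seq) {
    Buffer out;
    write_u32(out, static_cast<std::uint32_t>(seq.size()));
    for (std::int32_t v : seq) write_u32(out, static_cast<std::uint32_t>(v));
    return out;
}

// Un-marshal: read the count, then that many elements.
inline std::vector<std::int32_t> unmarshal_sequence(const Buffer& in) {
    std::size_t pos = 0;
    std::uint32_t n = read_u32(in, pos);
    std::vector<std::int32_t> seq;
    seq.reserve(n);
    for (std::uint32_t i = 0; i < n; ++i)
        seq.push_back(static_cast<std::int32_t>(read_u32(in, pos)));
    return seq;
}
```

Every element is touched and copied once in each direction, which is why co-located calls that skip this step entirely are an attractive optimization.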
Wait-On-Reply Strategy (2-Way Calls)
• Wait on connection
  – Low overhead
  – Does not interfere with other calls
  – But may lead to deadlock
• Wait on reactor
  – Higher overhead
  – May block other calls
  – Does not lead to same deadlock
[Figure, two panels: Client and Server, each with a Reactor, plus a Servant and a Callback. Panel 1 (steps 1 to 6): deadlock avoided by waiting on reactor. Panel 2 (steps 1 to 5, wait at step 3): deadlock here]
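The difference between the two strategies can be sketched as a small deterministic simulation. It assumes a single-threaded client whose call triggers a callback from the server before the server replies; the message names and the function `nested_upcall_completes` are hypothetical, and real ORBs involve threads and sockets that this sketch leaves out.

```cpp
#include <cassert>
#include <deque>

enum class Msg { Request, Callback, CallbackReply, Reply };
enum class WaitStrategy { OnConnection, OnReactor };

// Returns true if the client's two-way call completes, false if the client
// is still stuck after max_steps (the simulated equivalent of deadlock).
inline bool nested_upcall_completes(WaitStrategy strategy, int max_steps = 16) {
    assert(max_steps > 0);
    std::deque<Msg> to_server, to_client;
    to_server.push_back(Msg::Request);  // client starts the two-way call

    for (int step = 0; step < max_steps; ++step) {
        // Server: the servant calls back into the client before replying.
        if (!to_server.empty()) {
            Msg m = to_server.front();
            to_server.pop_front();
            if (m == Msg::Request) to_client.push_back(Msg::Callback);
            else if (m == Msg::CallbackReply) to_client.push_back(Msg::Reply);
        }
        // Client: waiting for the reply to its original request.
        if (!to_client.empty()) {
            Msg m = to_client.front();
            if (m == Msg::Reply) return true;
            if (m == Msg::Callback && strategy == WaitStrategy::OnReactor) {
                // Waiting on the reactor: the incoming callback is dispatched
                // while the client is still waiting for its reply.
                to_client.pop_front();
                to_server.push_back(Msg::CallbackReply);
            }
            // Waiting on the connection: only the reply is read, so the
            // callback is never serviced and the server never replies.
        }
    }
    return false;
}
```

With `WaitStrategy::OnReactor` the call completes in two steps; with `WaitStrategy::OnConnection` neither side can make progress.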
Socket/Thread Multiplexing
• Connection cache
  – Serial socket re-use
• Thread pool
  – Serial thread re-use
• Reactors support concurrent multiplexing
  – Calls onto a thread
• Can multiplex calls onto a socket concurrently too
  – Leader/followers
  – Half-sync/Half-async
[Figure: Half-Sync/Half-Async Design Pattern (POSA2); labels: asynchronous requests, buffered requests, worker thread]
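Serial socket re-use through a connection cache can be sketched as follows. This is only an illustration of the idea: `Connection` is a hypothetical stand-in for a connected socket, no real network calls are made, and the class is not taken from ACE or TAO.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a connected socket.
struct Connection {
    std::string endpoint;
    bool busy;
};

class ConnectionCache {
public:
    // Hand out an idle connection to `endpoint`, or open a new one.
    // Returns a handle (index) to pass back to release().
    std::size_t acquire(const std::string& endpoint) {
        for (std::size_t i = 0; i < pool_.size(); ++i) {
            if (!pool_[i].busy && pool_[i].endpoint == endpoint) {
                pool_[i].busy = true;  // serial re-use of a cached socket
                return i;
            }
        }
        pool_.push_back(Connection{endpoint, true});  // "connect" to the peer
        return pool_.size() - 1;
    }

    // Return a connection to the cache once its reply has arrived.
    void release(std::size_t handle) {
        assert(handle < pool_.size() && pool_[handle].busy);
        pool_[handle].busy = false;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::vector<Connection> pool_;
};
```

A second call to the same endpoint re-uses the cached connection only after the first call has released it; overlapping calls force a second connection, which is the cost that concurrent multiplexing onto one socket avoids.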
Where does Real-Time Fit In?
• Real-time is about predictability
  – “Real-time != real fast” (predictable real fast is good)
• Different real-time enforcement mechanisms
  – Static → efficiency of mechanisms (e.g., overhead)
  – Dynamic → flexibility of policies (e.g., utilization)
  – Hybrid → combine both (e.g., utilization + isolation)
• Different notions of scheduling strategy possible
  – E.g., EDF, MLF, RMS, MAU
• Various kinds of schedulable entities
  – E.g., messages, events, OS or distributable threads
Static Real-Time Lanes
• RT CORBA 1.0 spec implemented in TAO
  – Irfan Pyarali’s dissertation work
• Each lane composed of thread(s) + socket(s)
  – Dedicated statically to each lane a priori
• Each lane has a priority value
  – Highest priority lane is most eligible at a resource
• Efficient enforcement by OS, network layers
  – E.g., fixed thread + RSVP priorities
• Dispatching decision function
  – Priority value → lane assignment (lane does the rest)
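A dispatching decision function of this kind can be sketched in a few lines. The matching rule used here (send a request to the lane with the highest priority that does not exceed the request's priority) is an assumption made for illustration; the rule that RT CORBA or TAO actually applies may differ. `Lane` and `select_lane` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A lane: threads statically dedicated to one priority level.
struct Lane {
    int priority;
    int threads;
};

// Map a request priority to a lane index, or -1 if no lane is eligible.
// Assumed rule: the highest lane priority not exceeding the request priority.
inline int select_lane(const std::vector<Lane>& lanes, int request_priority) {
    int best = -1;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        assert(lanes[i].threads > 0);  // every lane owns at least one thread
        if (lanes[i].priority <= request_priority &&
            (best < 0 || lanes[i].priority > lanes[best].priority)) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Once the lane is chosen, no further scheduling decision is needed in the ORB: the OS enforces the lane's thread priority.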
Real-Time Object Adapters
• Efficiency vs. flexibility trade-offs in an upcall
  – Irfan Pyarali, Aniruddha Gokhale, Doug Schmidt
• If space of object IDs is known a priori
  – Can use hashing for O(1) servant lookup
  – But must waste some space on empty table slots
  – And server must assign IDs (limits spec slightly)
• Offers another scheduling point in the ORB
  – Can adapt priorities at these points
  – Restrict priority inversion (say if no RSVP)
• Promotes efficiency (Gokhale et al.)
  – Reductions in locking and copying during upcall
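The O(1) lookup trade-off can be illustrated with a direct-indexed table, the simplest case of hashing when the ID space is known in advance. This sketch is not the TAO object adapter; `Servant` and `ServantTable` are hypothetical, and the table shows both costs named above: unused slots stay allocated, and the server (not the application) picks the IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Servant {
    int value;
};

class ServantTable {
public:
    // The whole ID space is allocated up front; empty slots are wasted space.
    explicit ServantTable(std::size_t capacity) : slots_(capacity, nullptr) {}

    // The server assigns the object ID: the index of the first free slot.
    // Returns -1 when the table is full.
    int activate(Servant* s) {
        assert(s != nullptr);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i]) {
                slots_[i] = s;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // O(1) lookup on the upcall path: one bounds check and one index.
    Servant* find(int id) const {
        if (id < 0 || static_cast<std::size_t>(id) >= slots_.size())
            return nullptr;
        return slots_[static_cast<std::size_t>(id)];
    }

private:
    std::vector<Servant*> slots_;
};
```

Activation is linear here, but it happens off the critical path; only `find` runs on every request.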
Kokyu Dispatching Framework
• “Kokyu” is a Japanese word
  – Means “breath”, but also timing
• Generalizes lanes
  – To dynamic, hybrid cases
• Prioritized threads
  – Isolate dispatch lanes
• Queues
  – Order requests within each lane
• Timers
  – Pace periodic requests
• Configuration (tuples)
  – <#, prio, Q type, timer>
• Implicit projection of scheduling onto OS
[Figure: Dispatcher and dispatching configuration; labels: RMS, LLF, mandatory, optional, laxity, static, timers]
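The configuration tuple <#, prio, Q type, timer> can be pictured with the sketch below. None of these types come from Kokyu itself: `LaneConfig`, `Request`, and `Dispatcher` are hypothetical, and a single `dispatch_next` loop replaces the prioritized threads that the real framework relies on the OS to schedule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class QueueType { Fifo, Laxity };

// One configuration tuple: <#, prio, Q type, timer>.
struct LaneConfig {
    int lane;             // lane number
    int thread_priority;  // priority of the lane's dispatching thread
    QueueType queue;      // how requests are ordered within the lane
    bool paced_by_timer;  // whether a timer paces periodic requests
};

struct Request {
    int id;
    long laxity;  // only consulted by laxity-ordered queues
};

struct LaneState {
    LaneConfig config;
    std::vector<Request> pending;
};

class Dispatcher {
public:
    explicit Dispatcher(const std::vector<LaneConfig>& configs) {
        for (std::size_t i = 0; i < configs.size(); ++i) {
            assert(configs[i].lane == static_cast<int>(i));  // listed in order
            lanes_.push_back(LaneState{configs[i], {}});
        }
    }

    void enqueue(int lane, const Request& r) {
        assert(lane >= 0 && static_cast<std::size_t>(lane) < lanes_.size());
        lanes_[static_cast<std::size_t>(lane)].pending.push_back(r);
    }

    // Pick the highest-priority non-empty lane, then apply its queue
    // discipline. Returns the request id, or -1 if nothing is pending.
    int dispatch_next() {
        LaneState* best = nullptr;
        for (LaneState& l : lanes_) {
            if (!l.pending.empty() &&
                (!best || l.config.thread_priority > best->config.thread_priority))
                best = &l;
        }
        if (!best) return -1;
        auto it = best->pending.begin();  // FIFO: oldest request first
        if (best->config.queue == QueueType::Laxity) {
            it = std::min_element(best->pending.begin(), best->pending.end(),
                                  [](const Request& a, const Request& b) {
                                      return a.laxity < b.laxity;
                                  });
        }
        int id = it->id;
        best->pending.erase(it);
        return id;
    }

private:
    std::vector<LaneState> lanes_;
};
```

Lane priority isolates the lanes from each other, while the queue type decides the order inside a lane, which is how a hybrid static/dynamic strategy can be expressed with one structure.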
Kokyu in a Real-Time Event Channel
• Well known environment
  – Harrison M.S. thesis, O’Ryan, Gill dissertations
  – Boeing, Stanford, KSU collaborations
• Schedule event pushes as in Kokyu experiments
  – Application: ~70 components “flown” in flight simulator
  – Static, hybrid static/dynamic scheduling strategies: RMS, MUF, RMS+LLF
  – Empirical studies of isolation and effectiveness
[Figure: Event Channel between Suppliers and Consumers; labels: Proxy, Proxy, Filtering, Correlation, Dispatcher (laxity, static, static)]
Kokyu with (Distributable) Threads
• E.g., schedule DSRT CORBA 2.0 thread eligibility
• Work in progress (2004)
  – Thrall, Torri, Mgeta (M.S.); Zhang, Subramonian (D.Sc.)
  – Boeing, BBN, URI, OU, KU collaborations
• Many open issues
  – Distributable thread identity, mechanism trade-offs
  – Template meta-programming, configuration logic/types
[Figure: Clients and Servants; labels: Stub, Stub, Skeleton, Skeleton, Dispatcher, Dispatcher, POA, ORB]
Part I: Middleware Summary
• Middleware is ultimately about abstraction
  – Which details to reveal and which to hide
• Optimization is possible, but eventually hits trade-offs
  – Affects abstraction, policy, mechanism choices
• We’ve surveyed a number of issues
  – Themes and examples will re-appear in Part II
• Set the stage for discussion of composition
  – Both what to compose and how to compose it
Part II: Just Enough Middleware
• Motivating example: active vibration damping
  – Illustrates networked embedded systems domain
• Constraints: optimizations and trade-offs
• ORB customization for example application
  – Footprint reduction approach in nORB
  – Real-time optimizations in ACE, nORB, TAO
• Beyond optimization to trade-offs
  – Footprint vs. timeliness (add/remove features)
  – Deadlock vs. feasibility (change a feature)
• Composition logics, types, models
An Aside: Why Not minimumCORBA?
Pros
• minimumCORBA designed for resource-constrained systems
• Maintains interoperability with full-featured CORBA
Cons
• Designed for a particular point in the larger design space
• Resource constraints may be too stringent (or just different)
• Thus, “one size fits all” may cost too much in key cases
Our approach
• Provide a flexible efficient substrate for feature composition
Consequences for “just enough” ORB design
• Follows the spirit if not the letter of minimumCORBA
• Provides fine grain tailored feature subsetting/removal
• Maintains appropriate interoperability
Networked Embedded Systems
[Figure: structure with piezoelectric transducers; labels: Sensor Measurements, Actuator Excitation, three compute nodes]
Active vibration damping
  – Sensors: 2kHz sampling
    • F: deflection → voltage (100Hz reporting rate)
  – Computing nodes
    • Coordination protocols (ping scheduling)
    • Closed sensor, control, actuator loop
  – Actuators: continuous
    • G: voltage → deflection (change value when told)
QoS Constraints on Time and Space
• Given > 1 dimension
  – E.g., real-time, footprint
  – Tight constraints are commonly the case
• Optimizations
  – Improve in a dimension
  – At no cost to the other (may even improve too)
• Trade-offs
  – Improve in a dimension
  – At cost to other
[Figure: design space with axes execution time and footprint; arrows labeled optimize, optimize, and trade off]
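The distinction between an optimization and a trade-off can be stated as a small classification over two cost dimensions. The sketch assumes lower is better in both dimensions; `DesignPoint`, `Change`, and `classify` are hypothetical names introduced for the example.

```cpp
#include <cassert>

// A design point in the two dimensions discussed above (lower is better).
struct DesignPoint {
    double time_cost;
    double footprint_cost;
};

enum class Change { Optimization, TradeOff, Regression, NoChange };

// Compare a design before and after a change.
inline Change classify(const DesignPoint& before, const DesignPoint& after) {
    assert(before.time_cost >= 0 && before.footprint_cost >= 0);
    bool better = after.time_cost < before.time_cost ||
                  after.footprint_cost < before.footprint_cost;
    bool worse = after.time_cost > before.time_cost ||
                 after.footprint_cost > before.footprint_cost;
    if (better && worse) return Change::TradeOff;  // gain in one, loss in other
    if (better) return Change::Optimization;       // gain at no cost
    if (worse) return Change::Regression;
    return Change::NoChange;
}
```

An optimization in this sense is a change that dominates the previous design; a trade-off moves along the frontier where neither design dominates the other.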
Footprint Reduction Approach in ACE
• Doxygen, other tools show ACE dependency graph
• Prune unused classes and interactions first
• Refactor for fine-grain composition of what remains
Refining the ACE Substrate
• Decouple concerns
  – Re-factoring needed
• Also shows problem
  – With inheritance-based composition
• Pattern-oriented role decomposition
  – Reactor
  – Acceptor
  – Connector
  – Event Handler
  – Svc Handler
[Figure: class diagram with ACE_Svc_Handler, ACE_Task, ACE_Task_Base, ACE_Service_Object, ACE_Event_Handler, PS_Event_Handler, and peer stream]
A Starting Point for Just Enough Middleware
[Figure: the simple DOC invocation path. Client side: the client invokes a method call on the remote object; the stub creates a collection of parameters; parameters are marshalled; the IIOP request is sent. Server side: the IIOP request is received; parameters are unmarshalled; the servant object is located; the method call is dispatched to the servant; the server method is invoked. Components: Client, Client Stub, CDR Marshaller, ORB Core, ORB Core, CDR Demarshaller, Simple Object Adapter, Server Skeleton, Server]
nORB Design Approach
• Implement simple DOC invocation path
• Customize TAO’s CORBA IDL compiler
• Benchmark nORB, TAO, ACE empirically
  – Using a representative coordination protocol
  – Pay careful attention to time/space trade-offs
• Cycle: design → implement → benchmark …
Customized CORBA IDL Compiler
• Subset of standard CORBA IDL, IIOP 1.0 only
  – Optimizes marshaling time, message sizes
  – All primitive CORBA types (boolean, long, float, …)
  – Arrays, Sequences, Structures, Interfaces
• Re-factored TAO IDL compiler
  – nORB specific back end
• Custom mapping using C++ STL
  – Easier programming model to use
  – I.e., fewer memory management pitfalls
nORB Performance Enhancements
• Critical path optimizations similar to TAO
  – Gather-write technique used to send requests
  – Memory allocators instead of new/delete
  – Direct upcall model
  – Single-read optimization for server side requests
• Capture key build options
  ACE_COMPONENTS=FOR_TAO
  exceptions=0
  rtti=0
  inline=0
  threads=0
  debug=0
  optimize=1
  ami=0
  corba_messaging=0
  rt_corba=0
  shared_libs=0
  static_libs_only=1
  DEFFLAGS=-DACE_USE_RCSID=0
  minimum_corba=1
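The gather-write technique listed among the critical path optimizations can be illustrated with the POSIX `writev` call, which sends several buffers in one system call without first copying them into a single contiguous buffer. This is a sketch of the general technique on a POSIX system, not nORB's or TAO's code; `send_request` and its header/body split are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <sys/uio.h>  // writev (POSIX)
#include <unistd.h>

// Send a request header and body with one system call.
// Returns the number of bytes written, or -1 on error (as writev does).
inline ssize_t send_request(int fd, const std::string& header,
                            const std::string& body) {
    assert(fd >= 0);
    iovec parts[2];
    // writev only reads from the buffers; the cast satisfies iov_base's type.
    parts[0].iov_base = const_cast<char*>(header.data());
    parts[0].iov_len = header.size();
    parts[1].iov_base = const_cast<char*>(body.data());
    parts[1].iov_len = body.size();
    return ::writev(fd, parts, 2);
}
```

Compared with two separate writes, this saves a system call; compared with concatenating first, it saves a copy and an allocation.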
Benchmarking Studies
• With Venkita Subramonian, Guoliang Xing, Ron Cytron
• Early results reported at WORDS ’03 (Guadalajara)
• Test application
  – Distributed graph coloring
  – Simple distributed constraint satisfaction problem
  – Represents, e.g., ping node scheduling in our example
  – We used it as a touchstone for footprint & performance
• Compare 3 implementations using ACE, TAO, nORB
[Figure: graph of nodes V, W, X, Y, Z exchanging color messages (color v, color w, color x, color y, color z) with their neighbors]
Comparing ACE, nORB and TAO
• ~500,000 repeated trials to generate large sample population
  – Better confidence in finer-grain distinctions between time bounds
  – Time for each asynchronous message passing round to complete
  – 100 nodes in 10x10 square mesh (interior nodes have 4 neighbors)
  – Four 2.53GHz P4 512MB RAM KURT-Linux boxes over 100Kb/s Ethernet
[Figure: one message passing round at node Y; steps: choose, compare, store; messages: colorX, improveX, colorY, improveY, colorZ, improveZ; the measured time spans one round]
Node Footprint Comparison
[Figure: bar chart of footprint in KB (0 to 2,000) for Node and NodeRegistry]

Footprint in KB   ACE   TAO    nORB   compile optimized TAO   compile optimized nORB
Node              376   1800   567    1738                    509
NodeRegistry      324   1778   549    1725                    492

Middleware layer with only ACE costs 212KB
Middleware layer with nORB+ACE costs 345KB (133KB over ACE)
Middleware layer using TAO+ACE costs ~1.7MB (~1.2MB over nORB+ACE)
Node application alone costs 164KB
Experimental Trials
• ~500,000 repeated trials to generate large sample population
  – Want confidence in fine-grain time bounds distinctions
  – Measure time of each message passing round
  – 100 nodes in 10x10 square mesh on 4 networked machines
  – 2.53GHz P4 512MB RAM KURT-Linux, 100Kb/s Ethernet
Optimizing TAO Cost per Round
[Figure: plots of TAO cost per round. Panel labels: POA and reactor locks; wait-on-TP-reactor; null locks; wait-on-select-reactor; single read; g++ -O3 compile time optimization. Annotation: notice tails]
Optimizing nORB Cost per Round
[Figure: plots of nORB cost per round. Panel labels: SOA and reactor locks; wait-on-connection; null locks; single read; g++ -O3 compile time optimization. Annotation: tighter distributions than with TAO]
Impact on Algorithm Convergence
[Figure: impact of ORB cost on algorithm convergence. Annotations: nORB distribution is tighter; TAO better in average case. Rate labels: ~6Hz, 4Hz, 3Hz, 2.5Hz]
Soft Real-Time Bounds on Round Times
[Figure: bar chart of round-time bounds in msec (0 to 70) against percentage of samples bounded (99%, 98%, 95%, 90%, 80%). Series: TAO, nORB, compile optimized TAO, compile optimized nORB, runtime optimized TAO, runtime optimized nORB, ACE. Marker: previous plots resolution]
Bounds for nORB tighter than for TAO at or above 90%
Hard Real-Time Bounds on Round Times
[Figure: bar chart of round-time bounds in msec (0 to 250) against percentage of samples bounded (100.00%, 99.9999%, 99.999%, 99.99%, 99.9%). Series: TAO, nORB, compile optimized TAO, compile optimized nORB, runtime optimized TAO, runtime optimized nORB, ACE]
Bounds for nORB much tighter than for TAO as we approach 100%
ACE values anomalous… (termination artifact)
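Bounds of this kind ("the round time that covers a given percentage of samples") can be computed from the measured round times as sketched below. The function name `bound_covering` and the choice of the smallest observed value that covers the requested fraction are assumptions; the slides do not say how the plotted bounds were derived.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Smallest observed value v such that at least `fraction` of the samples
// are <= v. With fraction == 1.0 this is the maximum observed value.
inline double bound_covering(std::vector<double> samples, double fraction) {
    assert(!samples.empty());
    assert(fraction > 0.0 && fraction <= 1.0);
    std::sort(samples.begin(), samples.end());
    std::size_t needed = static_cast<std::size_t>(
        std::ceil(fraction * static_cast<double>(samples.size())));
    if (needed == 0) needed = 1;
    return samples[needed - 1];
}
```

With roughly 500,000 trials, a 99.9999% bound still rests on only a handful of samples in the tail, which is why the sample population has to be so large before distinctions near 100% mean anything.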
Time and Space Design Map
[Figure: design map with axes time cost and footprint cost. TAO: compile optimization, runtime optimization. nORB: compile optimization, runtime optimization, then hash lookup and single marshal (in progress). ACE (hand crafted)]
Beyond Optimization to Trade-Offs
• For a given application, customization is possible
  – Usually more a matter of engineering than research
• However, beyond a certain point we hit trade-offs
  – Finding those trade-offs is interesting systems research
  – Lead to deep research questions in CS theory, logic
• We looked at combinations of features in nORB
  – Leading to time and space trade-offs
• We’ll consider the impact of a single feature
  – Interesting due to interactions with other features
• And the use of typesystems to generate configurations
Call Reply Configuration Use-Cases
• Strategy used to wait for replies
  – Wait on connection
    • No interference with other calls
  – Wait on reactive mechanism
    • Interleaved processing of incoming requests
    • Blocking factors affected
• Choose strategy based on system characteristics
  – Call graph
  – Deployment of servants
  – Thread pool sizes
[Figure: nested upcall scenario, two panels: Client and Server, each with a Reactor, plus a Servant and a Callback. Panel 1 (steps 1 to 6): deadlock avoided by waiting on reactor. Panel 2 (steps 1 to 5, wait at step 3): deadlock here]
Logics and Typesystems
• Can describe configuration problem informally
  – But computability and time complexity are serious issues
• Can describe problem formally in first-order logic
  – But may be computationally infeasible for complex systems
  – Horn clauses etc. may help, but only in some cases
• Another approach: apply re-factoring here as well
  – Push evaluation down into universe of discourse
  – Thus simplifying logic so it’s tractable (even real-time!)
• Typesystems approach may help
  – Compute static modes (state space explosion)
  – Compute dynamic modes (halting problems)
  – Behavioral types are interesting (Henzinger and Lee)
Dispatching Configuration Example
• QoS attributes based on scheduling policy
• Bundle together all QoS attributes in one descriptor
• Can we generate the appropriate QoS descriptor?
  – Use a configurator to generate the attributes
  – Scheduling policy as input to generator
[Figure: Scheduling policy → QoS Descriptor Generator → QoS Descriptor]
C++ Template Meta-Programming
• Mechanism to embed generators in C++
  – Completely within the purview of C++ language
• Metainformation represented using
  – Member traits, traits classes, traits templates
• Compile-time control-flow constructs
  – Template metafunctions, e.g., IF, THEN, ELSE, CASE
  – Conditional compilation based on evaluation of type-expressions
• Issues
  – Advanced usage of C++ templates
  – Compiler support an issue
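A compile-time IF combined with a traits template might look like the sketch below. The `IF<...>::RET` shape follows the common generative-programming idiom; the policy tags, queue types, and `QueueFor` generator are hypothetical names invented for the example and are not taken from Kokyu.

```cpp
#include <type_traits>

// Compile-time IF: selects Then when Cond is true, Else otherwise.
template <bool Cond, typename Then, typename Else>
struct IF {
    typedef Then RET;
};

template <typename Then, typename Else>
struct IF<false, Then, Else> {
    typedef Else RET;
};

// Hypothetical policy tags and queue types.
struct StaticPolicy {};
struct DynamicPolicy {};
struct FifoQueue {};
struct DeadlineQueue {};

// Traits template: metainformation attached to each policy.
template <typename Policy>
struct PolicyTraits;

template <>
struct PolicyTraits<StaticPolicy> {
    static const bool needs_ordering = false;
};

template <>
struct PolicyTraits<DynamicPolicy> {
    static const bool needs_ordering = true;
};

// Generator: chooses a queue type from the policy's traits at compile time.
template <typename Policy>
struct QueueFor {
    typedef typename IF<PolicyTraits<Policy>::needs_ordering,
                        DeadlineQueue, FifoQueue>::RET RET;
};
```

The selection costs nothing at run time: `QueueFor<StaticPolicy>::RET` simply is `FifoQueue` once the program compiles.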
Generator for QoS Descriptor

enum Disp_Rule_t { RMS, EDF, MLF, MUF, other };

template <Disp_Rule_t>
struct QoSDesc {};

// Template specializations
template <>
struct QoSDesc<RMS>
{
  long period;
  // fields specific to RMS
};

template <>
struct QoSDesc<EDF>
{
  long deadline;
  // fields specific to EDF
};

template <>
struct QoSDesc<MLF>
{
  // fields specific to MLF
};

// CASE and SWITCH are template metafunctions assumed to be defined elsewhere.
template <Disp_Rule_t disp_rule>
struct QoSDescriptorGenerator
{
  // No "typename" here: the case list is not a qualified dependent name.
  typedef CASE<EDF, QoSDesc<EDF>,
          CASE<RMS, QoSDesc<RMS>,
          CASE<MLF, QoSDesc<MLF> > > >
    disp_rule_case_list;

  typedef typename
    SWITCH<disp_rule, disp_rule_case_list>::RET
    QoSDescriptor_;

  typedef QoSDescriptor_ RET;
};

typedef QoSDescriptorGenerator<EDF>::RET QoSDescriptor;
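The generator relies on CASE and SWITCH metafunctions that the slide does not define. A minimal self-contained sketch of how they could be written is given below; the `NilCase` terminator, the `std::conditional`-based SWITCH, and the `laxity` field are assumptions made for illustration and are not necessarily how the talk's own implementation works.

```cpp
#include <type_traits>

enum Disp_Rule_t { RMS, EDF, MLF, MUF, other };

template <Disp_Rule_t>
struct QoSDesc {};

template <> struct QoSDesc<RMS> { long period; };
template <> struct QoSDesc<EDF> { long deadline; };
template <> struct QoSDesc<MLF> { long laxity; };  // hypothetical field

// Terminates a case list; also the result when no case matches.
struct NilCase {};

// One entry in a compile-time case list: a tag, a type, and the next entry.
template <int Tag, typename Type, typename Next = NilCase>
struct CASE {
    enum { tag = Tag };
    typedef Type type;
    typedef Next next;
};

// Walk the case list and return the type whose tag matches.
template <int Tag, typename CaseList>
struct SWITCH {
    typedef typename std::conditional<
        Tag == CaseList::tag,
        typename CaseList::type,
        typename SWITCH<Tag, typename CaseList::next>::RET>::type RET;
};

template <int Tag>
struct SWITCH<Tag, NilCase> {
    typedef NilCase RET;
};

template <Disp_Rule_t disp_rule>
struct QoSDescriptorGenerator {
    typedef CASE<EDF, QoSDesc<EDF>,
            CASE<RMS, QoSDesc<RMS>,
            CASE<MLF, QoSDesc<MLF> > > > disp_rule_case_list;
    typedef typename SWITCH<disp_rule, disp_rule_case_list>::RET RET;
};
```

With these definitions, `QoSDescriptorGenerator<EDF>::RET` names `QoSDesc<EDF>`, so a descriptor declared through the generator exposes `deadline` and nothing else; a policy with no case, such as MUF here, yields `NilCase`.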
Related Work
• Minimalist middleware frameworks
  – UBI-core at UIUC, other projects
    • Note that making lowest level substrate robust is an issue
• Composition logics
  – Task Scheduler Logic at U. Utah
  – WUGLE project at WUSTL
• Instrumentation and history analysis
  – DSKI/DSUI and event history work at KU
• Extending/exploiting type systems
  – Ptolemy at U.C. Berkeley
  – Kokyu template meta-programming at WUSTL
    • RMA, dispatcher composition in C++ typesystem
Concluding Remarks
• Use generative middleware programming to compose and configure fine-grained infrastructure
  – Leverage features of existing programming languages
• Design substrates for use in system generation
  – “system aspect frameworks”
• Drive infrastructure configuration and adaptation strategies from logics, types, algebras
  – While avoiding NP-hard cases and halting problems
• From “Just Enough Middleware” we want to generate “Just The Right Middleware” each time