
Composing Just Enough Middleware

Christopher D. Gill

Department of Computer Science and Engineering

Washington University

St. Louis, MO, USA

E71 CS 6785 Programming Languages Seminar

Friday, October 17, 2003

Talk Outline

• Part I: Introduction to middleware

(30 minutes + Q&A)

• Part II: Composing just enough middleware

(60 minutes + Q&A)

• Questions are welcome and encouraged

(During and after each part)

Part I: Introduction to Middleware

• Middleware is a “glue” layer between
  – “Fixed” infrastructure (hardware, operating system)
  – Variable (application-specific) parts of a system

• Middleware raises the level of abstraction
  – At which developers program the system

[Figure: a client sends a message to a server; lots of details are hidden by middleware]

Distributed Object Computing (DOC)

• Assumes objects, references, methods

• Also, an Object Request Broker (ORB)

• Historically, a user-space software layer
  – Between the operating system and the application
  – Middleware may be pushed into OS, hardware
    • Sensor nets, on-chip FPGAs, hardware threading
  – But for this talk, assume it is implemented on top of threads, sockets, and timers

[Figure: a client invokes method() on an object through a reference; lots of details are again hidden by middleware]

Maintaining the DOC Illusion

• A “reference” to an object
  – Encodes IP address, ID

• A “wire format” is defined for invocation messages
  – Client stubs “marshal”
  – Server skeletons “un-marshal”

• A servant implements an object in a server

• All other details are ORB implementation features
  – Thread pools, sockets, etc.
  – Usually abstracted away

[Figure: the client's stub and the servant's skeleton each sit on an ORB; the client holds an object reference and the two ORBs exchange an IIOP message]

A Simple DOC Invocation Path

[Figure: invocation path from the client to the server. Client side: Client, Client Stub, CDR Marshaller, ORB Core. Server side: ORB Core, Simple Object Adapter, CDR Demarshaller, Server Skeleton, servant.]

Annotations on the path:

• Client invokes a method call on the remote object
• Stub creates a collection of parameters
• Parameters are marshalled
• IIOP Request sent
• IIOP Request received
• Servant object located
• Parameters are unmarshalled
• Dispatch method call to servant
• Server method invoked
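The path above can be sketched in-process, without sockets. This is a minimal illustration, not nORB or TAO code: `Buffer`, `AdderServant`, `ObjectAdapter`, `skeleton_dispatch`, and `AdderStub` are hypothetical names, the "wire format" is just native-order 32-bit integers, and a direct function call stands in for sending and receiving the IIOP request.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wire buffer standing in for an IIOP request or reply body.
using Buffer = std::vector<std::uint8_t>;

// Marshal one 32-bit integer (native byte order, for brevity).
inline void marshal_long(Buffer& buf, std::int32_t v) {
    std::uint8_t raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    buf.insert(buf.end(), raw, raw + sizeof v);
}

// Un-marshal one 32-bit integer and advance the read position.
inline std::int32_t unmarshal_long(const Buffer& buf, std::size_t& pos) {
    if (pos + sizeof(std::int32_t) > buf.size())
        throw std::runtime_error("short message");
    std::int32_t v;
    std::memcpy(&v, buf.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Servant: implements the object in the server.
struct AdderServant {
    std::int32_t add(std::int32_t a, std::int32_t b) const { return a + b; }
};

// Object adapter: locates the servant for an object ID.
struct ObjectAdapter {
    std::map<std::string, AdderServant*> servants;
    AdderServant& locate(const std::string& id) {
        auto it = servants.find(id);
        if (it == servants.end()) throw std::runtime_error("object not found");
        return *it->second;
    }
};

// Skeleton: un-marshals parameters, makes the upcall, marshals the result.
inline Buffer skeleton_dispatch(ObjectAdapter& oa, const std::string& id,
                                const Buffer& request) {
    std::size_t pos = 0;
    const std::int32_t a = unmarshal_long(request, pos);
    const std::int32_t b = unmarshal_long(request, pos);
    Buffer reply;
    marshal_long(reply, oa.locate(id).add(a, b));
    return reply;
}

// Stub: marshals parameters, "sends" the request, un-marshals the reply.
struct AdderStub {
    ObjectAdapter& remote;   // stands in for the network and the server ORB core
    std::string object_id;   // stands in for the object reference
    std::int32_t add(std::int32_t a, std::int32_t b) {
        Buffer request;
        marshal_long(request, a);
        marshal_long(request, b);
        const Buffer reply = skeleton_dispatch(remote, object_id, request);
        std::size_t pos = 0;
        return unmarshal_long(reply, pos);
    }
};
```

From the caller's side, `stub.add(2, 40)` looks like a local call; everything between the stub and the servant is the part that middleware hides.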

But What’s Really Inside an ORB?

• ORB connects stubs to skeletons

• Uses threads, sockets, reactors, tables, etc.

• Ensures 0/1 delivery
  – Delivered to at most one recipient, at most once
  – Caller may get exception
    • User level: servant throws
    • ORB level: e.g., network

• Object adapters support multiple servants

[Figure: inside an ORB. An IIOP message arrives on a socket (send_n, recv_n); timers, a reactor, and a thread pool drive processing; a POA lookup leads to an upcall to the skeleton]

Marshalling (IIOP message)

• Standard CORBA GIOP message format
  – Mapped to TCP/IP protocols

• IDL parameter types
  – Primitive types
  – Arrays
  – Sequences
  – Structures
  – Interfaces
  – Anys
  – Typecodes

• Marshaling and un-marshaling are relatively expensive
  – Apply optimizations, e.g., co-location, where possible

Wait-On-Reply Strategy (2-Way Calls)

• Wait on connection
  – Low overhead
  – Does not interfere with other calls
  – But may lead to deadlock

• Wait on reactor
  – Higher overhead
  – May block other calls
  – Does not lead to same deadlock
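The difference between the two strategies can be shown with a small single-threaded simulation. This is a sketch under simplifying assumptions, not ORB code: `Msg`, `WaitStrategy`, and `two_way_call` are hypothetical names, the servant always calls back into the client before replying, and deadlock is detected by a bounded number of polling steps rather than by real blocking.

```cpp
#include <deque>

enum class Msg { Callback, Reply };
enum class WaitStrategy { OnConnection, OnReactor };

// Returns true if the two-way call completes, false if it stalls.
// Assumed scenario: the servant makes a nested callback into the client and
// cannot send its reply until that callback has been serviced.
inline bool two_way_call(WaitStrategy strategy, int max_steps = 10) {
    std::deque<Msg> inbox;        // messages that have arrived at the client
    bool callback_done = false;   // has the client serviced the callback?
    bool reply_sent = false;

    inbox.push_back(Msg::Callback);   // the nested upcall arrives first

    for (int step = 0; step < max_steps; ++step) {
        // Server side: the reply exists only after the callback returned.
        if (callback_done && !reply_sent) {
            inbox.push_back(Msg::Reply);
            reply_sent = true;
        }
        if (strategy == WaitStrategy::OnReactor) {
            // The reactor dispatches whatever arrives, including the callback.
            if (!inbox.empty()) {
                const Msg m = inbox.front();
                inbox.pop_front();
                if (m == Msg::Callback) callback_done = true;
                else return true;     // the reply we were waiting for
            }
        } else {
            // Waiting on the connection accepts only the reply; the callback
            // stays in the inbox and is never serviced.
            for (const Msg m : inbox)
                if (m == Msg::Reply) return true;
        }
    }
    return false;   // no progress within the step bound: treated as deadlock
}
```

With these assumptions, `two_way_call(WaitStrategy::OnReactor)` completes and `two_way_call(WaitStrategy::OnConnection)` stalls, which mirrors the nested upcall scenario in the figure.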

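[Figure: two nested upcall scenarios between a client and a server, each with a reactor, in which the servant makes a callback to the client. In one, the client waits on the reactor and deadlock is avoided; in the other, the client waits without servicing the callback and deadlock occurs]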

Socket/Thread Multiplexing

• Connection cache
  – Serial socket re-use

• Thread pool
  – Serial thread re-use

• Reactors support concurrent multiplexing
  – Calls onto a thread

• Can multiplex calls onto a socket concurrently too
  – Leader/followers
  – Half-sync/Half-async

[Figure: Half-Sync/Half-Async design pattern (POSA2), with asynchronous requests, buffered requests, and a worker thread]

Where does Real-Time Fit In?

• Real-time is about predictability
  – “Real-time != real fast” (predictable real fast is good)

• Different real-time enforcement mechanisms
  – Static → efficiency of mechanisms (e.g., overhead)
  – Dynamic → flexibility of policies (e.g., utilization)
  – Hybrid → combine both (e.g., utilization + isolation)

• Different notions of scheduling strategy possible
  – E.g., EDF, MLF, RMS, MAU

• Various kinds of schedulable entities
  – E.g., messages, events, OS or distributable threads

Static Real-Time Lanes

• RT CORBA 1.0 spec implemented in TAO
  – Irfan Pyarali’s dissertation work

• Each lane composed of thread(s)+socket(s)
  – Dedicated statically to each lane a priori

• Each lane has a priority value
  – Highest priority lane is most eligible at a resource

• Efficient enforcement by OS, network layers
  – E.g., fixed thread + RSVP priorities

• Dispatching decision function
  – Priority value → lane assignment (lane does the rest)
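A dispatching decision function of this kind can be very small. The sketch below is illustrative only: `lane_for_priority` is a hypothetical name, and the matching rule (pick the highest lane whose priority does not exceed the request's priority, falling back to the lowest lane) is an assumption made for the example, not the rule defined by RT CORBA or TAO.

```cpp
#include <cstddef>
#include <vector>

// Maps a request priority to a lane index.
// lane_priorities must be non-empty and sorted in ascending order.
// Assumed policy: choose the highest lane whose priority is <= the request
// priority; requests below every lane fall into lane 0.
inline std::size_t lane_for_priority(const std::vector<int>& lane_priorities,
                                     int request_priority) {
    std::size_t chosen = 0;
    for (std::size_t i = 0; i < lane_priorities.size(); ++i)
        if (lane_priorities[i] <= request_priority) chosen = i;
    return chosen;
}
```

Once the lane is chosen, nothing else has to be decided per request: the lane's statically assigned threads and sockets already carry the priority.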

Real-Time Object Adapters

• Efficiency vs. flexibility trade-offs in an upcall
  – Irfan Pyarali, Aniruddha Gokhale, Doug Schmidt

• If space of object IDs is known a priori
  – Can use hashing for O(1) servant lookup
  – But must waste some space of empty table slots
  – And server must assign IDs (limits spec slightly)

• Offers another scheduling point in the ORB
  – Can adapt priorities at these points
  – Restrict priority inversion (say if no RSVP)

• Promotes Efficiency (Gokhale et al.)
  – Reductions in locking and copying during upcall
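The O(1) lookup and its costs can be sketched as a fixed-size table indexed directly by a server-assigned ID. This is a simplified illustration with hypothetical names (`Servant`, `Doubler`, `ServantTable`); it uses the ID itself as the table index, which is the simplest case of hashing over an ID space known a priori, and is not the TAO object adapter.

```cpp
#include <array>
#include <cstddef>

// Hypothetical servant interface for the example.
struct Servant {
    virtual ~Servant() = default;
    virtual int upcall(int arg) = 0;
};

// Example servant used in the usage note below.
struct Doubler : Servant {
    int upcall(int arg) override { return 2 * arg; }
};

// Fixed-capacity table: unused slots are the "wasted" space, and the server
// (not the application) assigns the object IDs.
template <std::size_t MaxObjects>
class ServantTable {
public:
    // Assigns the first free slot as the object ID; returns MaxObjects if full.
    std::size_t activate(Servant* s) {
        for (std::size_t id = 0; id < MaxObjects; ++id)
            if (!slots_[id]) { slots_[id] = s; return id; }
        return MaxObjects;
    }
    void deactivate(std::size_t id) {
        if (id < MaxObjects) slots_[id] = nullptr;
    }
    // Constant-time lookup on the upcall path: one bounds check, one index.
    Servant* find(std::size_t id) const {
        return id < MaxObjects ? slots_[id] : nullptr;
    }
private:
    std::array<Servant*, MaxObjects> slots_{};   // all slots start empty
};
```

For example, after `ServantTable<4> table; Doubler d; auto id = table.activate(&d);`, the call `table.find(id)->upcall(21)` reaches the servant without searching or locking a dynamic structure.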

Kokyu Dispatching Framework

• “Kokyu” is a Japanese word
  – Means “breath”, but also timing

• Generalizes lanes
  – To dynamic, hybrid cases

• Prioritized Threads
  – Isolate dispatch lanes

• Queues
  – Order requests within each lane

• Timers
  – Pace periodic requests

• Configuration (tuples)
  – <#, prio, Q type, timer>

• Implicit projection of scheduling onto OS
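A configuration of tuples and the resulting dispatch order might look like the sketch below. It is not the Kokyu API: `LaneConfig`, `Request`, `Dispatcher`, and their fields are hypothetical, lanes are simulated in one thread instead of running on prioritized OS threads, and the timer field is carried in the configuration but not used.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

enum class QueueType { Fifo, Deadline, Laxity };

// One configuration tuple <#, prio, Q type, timer>.
struct LaneConfig {
    int lane_id;
    int thread_priority;
    QueueType queue_type;
    long timer_period_us;   // 0 means no timer (unused in this sketch)
};

struct Request {
    int lane_id;
    long deadline_us;
    long exec_time_us;
    int tag;                // identifies the request in the example
};

// Laxity: time to spare before the deadline once execution time is counted.
inline long laxity_us(const Request& r, long now_us) {
    return r.deadline_us - now_us - r.exec_time_us;
}

class Dispatcher {
public:
    explicit Dispatcher(std::vector<LaneConfig> config)
        : config_(std::move(config)), queues_(config_.size()) {}

    // Queues the request in its lane; returns false for an unknown lane.
    bool enqueue(const Request& r) {
        for (std::size_t i = 0; i < config_.size(); ++i)
            if (config_[i].lane_id == r.lane_id) {
                queues_[i].push_back(r);
                return true;
            }
        return false;
    }

    // Takes the next request from the highest-priority non-empty lane,
    // ordered within that lane according to its queue type.
    bool dispatch_next(long now_us, Request& out) {
        std::size_t best = config_.size();
        for (std::size_t i = 0; i < config_.size(); ++i)
            if (!queues_[i].empty() &&
                (best == config_.size() ||
                 config_[i].thread_priority > config_[best].thread_priority))
                best = i;
        if (best == config_.size()) return false;

        std::deque<Request>& q = queues_[best];
        auto pick = q.begin();   // Fifo: oldest request first
        if (config_[best].queue_type == QueueType::Deadline)
            pick = std::min_element(q.begin(), q.end(),
                [](const Request& a, const Request& b) {
                    return a.deadline_us < b.deadline_us;
                });
        else if (config_[best].queue_type == QueueType::Laxity)
            pick = std::min_element(q.begin(), q.end(),
                [now_us](const Request& a, const Request& b) {
                    return laxity_us(a, now_us) < laxity_us(b, now_us);
                });
        out = *pick;
        q.erase(pick);
        return true;
    }

private:
    std::vector<LaneConfig> config_;
    std::vector<std::deque<Request>> queues_;
};
```

The design point the sketch tries to capture is that a static strategy needs only the lane priorities, while a hybrid strategy adds a dynamically ordered queue inside one of the lanes.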

[Figure: Kokyu dispatcher and dispatching configuration; labels: RMS, LLF, mandatory, optional, laxity, static, static, timers]

Kokyu in a Real-Time Event Channel

• Well known environment
  – Harrison M.S. thesis, O’Ryan, Gill dissertations
  – Boeing, Stanford, KSU collaborations

• Schedule event pushes as in Kokyu experiments
  – Application: ~70 components “flown” in flight simulator
  – Static, hybrid static/dynamic scheduling strategies: RMS, MUF, RMS+LLF
  – Empirical studies of isolation and effectiveness

[Figure: real-time event channel between suppliers and consumers, with proxies, filtering, correlation, and a dispatcher containing laxity and static queues]

Kokyu with (Distributable) Threads

• E.g., schedule DSRT CORBA 2.0 thread eligibility

• Work in progress (2004)
  – Thrall, Torri, Mgeta (M.S.); Zhang, Subramonian (D.Sc.)
  – Boeing, BBN, URI, OU, KU collaborations

• Many open issues
  – Distributable thread identity, mechanism trade-offs
  – Template meta-programming, configuration logic/types

[Figure: clients and servants connected through stubs, the ORB, the POA, and skeletons, with a dispatcher on each side]

Part I: Middleware Summary

• Middleware is ultimately about abstraction
  – Which details to reveal and which to hide

• Optimization is possible, but then we hit trade-offs
  – Affects abstraction, policy, mechanism choices

• We’ve surveyed a number of issues
  – Themes and examples will re-appear in part II

• Set the stage for discussion of composition
  – Both what to compose and how to compose it

Part II: Just Enough Middleware

• Motivating example: active vibration damping
  – Illustrates networked embedded systems domain

• Constraints: optimizations and trade-offs

• ORB customization for example application
  – Footprint reduction approach in nORB
  – Real-time optimizations in ACE, nORB, TAO

• Beyond optimization to trade-offs
  – Footprint vs. timeliness (add/remove features)
  – Deadlock vs. feasibility (change a feature)

• Composition logics, types, models

An Aside: Why Not minimumCORBA?

Pros
• minimumCORBA designed for resource-constrained systems
• Maintains interoperability with full-featured CORBA

Cons
• Designed for a particular point in the larger design space
• Resource constraints may be too stringent (or just different)
• Thus, “one size fits all” may cost too much in key cases

Our approach
• Provide a flexible efficient substrate for feature composition

Consequences for “just enough” ORB design
• Follows the spirit if not the letter of minimumCORBA
• Provides fine grain tailored feature subsetting/removal
• Maintains appropriate interoperability

Networked Embedded Systems

[Figure: structure with piezoelectric transducers; sensor measurements flow to three compute nodes, which produce actuator excitation]

• Active vibration damping
  – Sensors: 2kHz sampling
    • F: deflection → voltage (100Hz reporting rate)
  – Computing nodes
    • Coordination protocols (ping scheduling)
    • Closed sensor, control, actuator loop
  – Actuators: continuous
    • G: voltage → deflection (change value when told)

QoS Constraints on Time and Space

• Given > 1 dimension
  – E.g., real-time, footprint
  – Tight constraints are commonly the case

• Optimizations
  – Improve in a dimension
  – At no cost to the other (may even improve too)

• Trade-offs
  – Improve in a dimension
  – At cost to other
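The distinction between an optimization and a trade-off can be stated as a comparison of two points in the time and space plane. The sketch below is only an illustration of the definitions on this slide; `Cost`, `Change`, and `classify` are hypothetical names, and lower values are assumed to be better in both dimensions.

```cpp
// A design point: execution time and footprint (lower is better for both).
struct Cost {
    double time_ms;
    double footprint_kb;
};

enum class Change { Optimization, TradeOff, Regression, None };

// Classifies the move from one design point to another.
inline Change classify(const Cost& before, const Cost& after) {
    const bool better = after.time_ms < before.time_ms ||
                        after.footprint_kb < before.footprint_kb;
    const bool worse  = after.time_ms > before.time_ms ||
                        after.footprint_kb > before.footprint_kb;
    if (better && worse) return Change::TradeOff;      // gain in one, cost in other
    if (better)          return Change::Optimization;  // gain at no cost
    if (worse)           return Change::Regression;    // cost with no gain
    return Change::None;
}
```

For instance, going from 10 ms and 500 KB to 8 ms and 500 KB is an optimization, while going to 8 ms and 600 KB is a trade-off.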

[Figure: footprint versus execution time, with arrows labeled “optimize” and “trade off”]

Footprint Reduction Approach in ACE

• Doxygen, other tools show ACE dependency graph

• Prune unused classes and interactions first

• Refactor for fine-grain composition of what remains

Refining the ACE Substrate

• Decouple concerns
  – Re-factoring needed

• Also shows problem
  – With inheritance based composition

• Pattern-oriented role decomposition
  – Reactor
  – Acceptor
  – Connector
  – Event Handler
  – Svc Handler

[Figure: ACE class hierarchy of ACE_Svc_Handler, ACE_Task, ACE_Task_Base, ACE_Service_Object, and ACE_Event_Handler, shown next to PS_Event_Handler, which combines a peer stream with ACE_Event_Handler]

A Starting Point for Just Enough Middleware

[Figure: the simple DOC invocation path from Part I, from the client stub and CDR marshaller through the two ORB cores to the Simple Object Adapter, CDR demarshaller, server skeleton, and servant]

nORB Design Approach

• Implement simple DOC invocation path

• Customize TAO’s CORBA IDL compiler

• Benchmark nORB, TAO, ACE empirically
  – Using a representative coordination protocol
  – Pay careful attention to time/space trade-offs

• Cycle: design → implement → benchmark …

Customized CORBA IDL Compiler

• Subset of standard CORBA IDL, IIOP 1.0 only
  – Optimizes marshaling time, message sizes
  – All primitive CORBA types (boolean, long, float, …)
  – Arrays, Sequences, Structures, Interfaces

• Re-factored TAO IDL compiler
  – nORB specific back end

• Custom mapping using C++ STL
  – Easier programming model to use
  – I.e., fewer memory management pitfalls

nORB Performance Enhancements

• Critical path optimizations similar to TAO
  – Gather-write technique used to send requests
  – Memory allocators instead of new/delete
  – Direct upcall model
  – Single-read optimization for server side requests

• Capture key build options
  ACE_COMPONENTS=FOR_TAO
  exceptions=0
  rtti=0
  inline=0
  threads=0
  debug=0
  optimize=1
  ami=0
  corba_messaging=0
  rt_corba=0
  shared_libs=0
  static_libs_only=1
  DEFFLAGS=-DACE_USE_RCSID=0
  minimum_corba=1
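The gather-write technique mentioned above sends several separate buffers, such as a message header and a marshalled body, in one system call without first copying them into a contiguous buffer. The sketch below uses the POSIX `writev` call directly; `gather_write` is a hypothetical helper, and how ACE, TAO, or nORB wrap this call is not shown here.

```cpp
#include <sys/uio.h>   // writev, struct iovec (POSIX)
#include <unistd.h>    // ssize_t, pipe, read, close
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes header and body with a single system call.
// Returns the number of bytes written, or -1 on error. As with write(),
// fewer bytes than requested may be written; a real sender would loop.
inline ssize_t gather_write(int fd,
                            const std::vector<std::uint8_t>& header,
                            const std::vector<std::uint8_t>& body) {
    struct iovec iov[2];
    // writev does not modify the buffers; the cast only satisfies iov_base.
    iov[0].iov_base = const_cast<std::uint8_t*>(header.data());
    iov[0].iov_len  = header.size();
    iov[1].iov_base = const_cast<std::uint8_t*>(body.data());
    iov[1].iov_len  = body.size();
    return ::writev(fd, iov, 2);
}
```

The benefit on the critical path is one fewer copy and one fewer system call per request, compared with either concatenating the buffers or writing them separately.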

Benchmarking Studies

• With Venkita Subramonian, Guoliang Xing, Ron Cytron

• Early results reported at WORDS ’03 (Guadalajara)

• Test application
  – Distributed graph coloring
  – Simple distributed constraint satisfaction problem
  – Represents, e.g., ping node scheduling in our example
  – We used it as a touchstone for footprint & performance

• Compare 3 implementations using ACE, TAO, nORB

[Figure: graph of nodes V, W, X, Y, and Z exchanging color messages (color v, color w, color x, color y, color z) with their neighbors]

Comparing ACE, nORB and TAO

• ~500,000 repeated trials to generate large sample population
  – Better confidence in finer-grain distinctions between time bounds
  – Time for each asynchronous message passing round to complete
  – 100 nodes in 10x10 square mesh (interior nodes have 4 neighbors)
  – Four 2.53GHz P4 512MB RAM KURT-Linux boxes over 100Kb/s Ethernet
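The mesh topology used in the trials is easy to reproduce. The sketch below builds the neighbor sets of a rows x cols mesh and colors it with a centralized greedy pass. That pass is a stand-in chosen for brevity, not the distributed, message-passing protocol that was benchmarked; `mesh_neighbors`, `greedy_coloring`, and `is_proper_coloring` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Neighbors of a node in a rows x cols mesh without wrap-around.
// Nodes are numbered row by row, starting at 0.
inline std::vector<std::size_t> mesh_neighbors(std::size_t rows,
                                               std::size_t cols,
                                               std::size_t node) {
    const std::size_t r = node / cols;
    const std::size_t c = node % cols;
    std::vector<std::size_t> out;
    if (r > 0)        out.push_back(node - cols);   // up
    if (r + 1 < rows) out.push_back(node + cols);   // down
    if (c > 0)        out.push_back(node - 1);      // left
    if (c + 1 < cols) out.push_back(node + 1);      // right
    return out;
}

// Centralized greedy pass: each node takes the smallest color not used by
// an already colored neighbor. Uncolored nodes hold -1.
inline std::vector<int> greedy_coloring(std::size_t rows, std::size_t cols) {
    std::vector<int> color(rows * cols, -1);
    for (std::size_t n = 0; n < color.size(); ++n) {
        int candidate = 0;
        bool clash = true;
        while (clash) {
            clash = false;
            for (std::size_t m : mesh_neighbors(rows, cols, n))
                if (color[m] == candidate) {
                    clash = true;
                    ++candidate;
                    break;
                }
        }
        color[n] = candidate;
    }
    return color;
}

// A coloring is proper when no two neighbors share a color.
inline bool is_proper_coloring(std::size_t rows, std::size_t cols,
                               const std::vector<int>& color) {
    for (std::size_t n = 0; n < color.size(); ++n)
        for (std::size_t m : mesh_neighbors(rows, cols, n))
            if (color[n] == color[m]) return false;
    return true;
}
```

In the distributed version each node knows only its own neighbor list and learns neighbor colors from messages, which is why the time per message passing round is the quantity being measured.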

[Figure: timeline of one message passing round at node Y, showing color and improve messages for neighbors X and Z and for Y itself, with choose, compare, and store steps; the time for one round is marked]

Node Footprint Comparison

[Figure: bar chart of footprint in KB (axis from 0 to 2,000) for the Node and NodeRegistry programs]

Footprint in KB   ACE    TAO    nORB   compile optimized TAO   compile optimized nORB
Node              376    1800   567    1738                    509
NodeRegistry      324    1778   549    1725                    492

• Node application alone costs 164KB

• Middleware layer with only ACE costs 212KB

• Middleware layer with nORB+ACE costs 345KB (133KB over ACE)

• Middleware layer using TAO+ACE costs ~1.7MB (~1.2MB over nORB+ACE)

Experimental Trials

• ~500,000 repeated trials to generate large sample population
  – Want confidence in fine-grain time bounds distinctions
  – Measure time of each message passing round
  – 100 nodes in 10x10 square mesh on 4 networked machines
  – 2.53GHz P4 512MB RAM KURT-Linux, 100Kb/s Ethernet

[Figure: timeline of one message passing round at node Y, as on the earlier comparison slide]

Optimizing TAO Cost per Round

[Figure: distributions of TAO cost per round under successive optimizations. Annotations: g++ -O3 compile time optimization; null locks; single read; POA and reactor locks; wait-on-select-reactor; wait-on-TP-reactor; notice tails]

Optimizing nORB Cost per Round

[Figure: distributions of nORB cost per round under successive optimizations. Annotations: g++ -O3 compile time optimization; single read; null locks; SOA and reactor locks; wait-on-connection; tighter distributions than with TAO]

Impact on Algorithm Convergence

[Figure: impact of ORB cost on algorithm convergence. Annotations: nORB distribution is tighter; TAO better in average case; rates marked at ~6Hz, 4Hz, 3Hz, and 2.5Hz]

Soft Real-Time Bounds on Round Times

[Figure: bounds on round times in msec (axis from 0 to 70) against the percentage of samples bounded (99%, 98%, 95%, 90%, 80%) for TAO, nORB, compile optimized TAO, compile optimized nORB, runtime optimized TAO, runtime optimized nORB, and ACE; an annotation marks the resolution of the previous plots]

• Bounds for nORB tighter than for TAO at or above 90%

Hard Real-Time Bounds on Round Times

[Figure: bounds on round times in msec (axis from 0 to 250) against the percentage of samples bounded (100.00%, 99.9999%, 99.999%, 99.99%, 99.9%) for TAO, nORB, compile optimized TAO, compile optimized nORB, runtime optimized TAO, runtime optimized nORB, and ACE]

• Bounds for nORB much tighter than for TAO as we approach 100%

• ACE values anomalous… (termination artifact)
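A bound for a given "percentage of samples bounded" can be computed from the recorded round times by sorting them. The sketch below shows one straightforward way to do that; `bound_for_fraction` is a hypothetical name, and this is not claimed to be the analysis code used for the plots.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns the smallest sample value b such that at least `fraction` of the
// samples are <= b. fraction must be in (0, 1]; 1.0 gives the maximum.
inline double bound_for_fraction(std::vector<double> samples, double fraction) {
    if (samples.empty() || fraction <= 0.0 || fraction > 1.0)
        throw std::invalid_argument("need samples and a fraction in (0, 1]");
    std::sort(samples.begin(), samples.end());
    // Number of samples that must fall at or below the bound.
    std::size_t k = static_cast<std::size_t>(
        std::ceil(fraction * static_cast<double>(samples.size())));
    if (k == 0) k = 1;
    return samples[k - 1];
}
```

With a large sample population, the bounds close to 100% are determined by very few samples, which is why many repeated trials are needed before distinctions in the tails become meaningful.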

Time and Space Design Map

[Figure: design map of footprint cost versus time cost, placing TAO and nORB with their compile and runtime optimizations, hash lookup and single marshal (in progress), and ACE (hand crafted)]

Beyond Optimization to Trade-Offs

• For a given application, customization is possible
  – Usually more a matter of engineering than research

• However, beyond a certain point we hit trade-offs
  – Finding those trade-offs is interesting systems research
  – Leads to deep research questions in CS theory, logic

• We looked at combinations of features in nORB
  – Leading to time and space trade-offs

• We’ll consider the impact of a single feature
  – Interesting due to interactions with other features

• And the use of typesystems to generate configurations

Call Reply Configuration Use-Cases

• Strategy used to wait for replies
  – Wait on connection
    • No interference with other calls
  – Wait on reactive mechanism
    • Interleaved processing of incoming requests
    • Blocking factors affected

• Choose strategy based on system characteristics
  – Call graph
  – Deployment of servants
  – Thread pool sizes

[Figure: nested upcall scenario, as in Part I. A servant makes a callback to the client while the client waits for its reply; waiting on the reactor avoids the deadlock]

Logics and Typesystems

• Can describe configuration problem informally
  – But computability and time complexity are serious issues

• Can describe problem formally in first-order logic
  – But may be computationally infeasible for complex systems
  – Horn clauses etc. may help, but only in some cases

• Another approach: apply re-factoring here as well
  – Push evaluation down into universe of discourse
  – Thus simplifying logic so it’s tractable (even real-time!)

• Typesystems approach may help
  – Compute static modes (state space explosion)
  – Compute dynamic modes (halting problems)
  – Behavioral types are interesting (Henzinger and Lee)
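To illustrate why Horn clauses can help, the sketch below runs forward chaining over propositional Horn rules, which always terminates in polynomial time. It is a toy under stated assumptions: `HornRule` and `forward_chain` are hypothetical names, and the facts and rules in the usage note are invented for the example rather than taken from any configuration logic named in this talk.

```cpp
#include <set>
#include <string>
#include <vector>

// A propositional Horn rule: if every atom in body holds, head holds.
struct HornRule {
    std::vector<std::string> body;
    std::string head;
};

// Derives everything that follows from the facts by repeatedly applying
// rules until nothing new is added. Each pass either adds a fact or stops,
// so the loop runs at most (number of rules + 1) times.
inline std::set<std::string> forward_chain(std::set<std::string> facts,
                                           const std::vector<HornRule>& rules) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const HornRule& rule : rules) {
            if (facts.count(rule.head)) continue;   // already derived
            bool all_hold = true;
            for (const std::string& atom : rule.body)
                if (!facts.count(atom)) { all_hold = false; break; }
            if (all_hold) {
                facts.insert(rule.head);
                changed = true;
            }
        }
    }
    return facts;
}
```

As a hypothetical use, the rules "nested_upcalls and single_threaded imply wait_on_connection_may_deadlock" and "wait_on_connection_may_deadlock implies use_wait_on_reactor" would let a configurator derive the wait strategy from two facts about the call graph and the thread pool.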

Dispatching Configuration Example

• QoS attributes based on scheduling policy

• Bundle together all QoS attributes in one descriptor

• Can we generate the appropriate QoS descriptor?
  – Use a configurator to generate the attributes
  – Scheduling policy as input to generator

[Figure: scheduling policy → QoS descriptor generator → QoS descriptor]

C++ Template Meta-Programming

• Mechanism to embed generators in C++
  – Completely within the purview of C++ language

• Metainformation represented using
  – Member traits, Traits classes, Traits templates

• Compile-time control-flow constructs
  – Template metafunctions, e.g., IF, THEN, ELSE, CASE
  – Conditional compilation based on evaluation of type-expressions

• Issues
  – Advanced usage of C++ templates
  – Compiler support an issue
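A compile-time IF driven by a traits class can be written in a few lines. The sketch below is a generic illustration of the technique, not code from ACE, TAO, or Kokyu: `IF`, `NullLock`, `ThreadLock`, the two configuration traits classes, and `LockFor` are hypothetical, and `ThreadLock` is only a placeholder for a real mutex wrapper.

```cpp
#include <type_traits>

// IF<Condition, Then, Else>::RET names Then when Condition is true,
// otherwise Else. The choice is made entirely at compile time.
template <bool Condition, typename Then, typename Else>
struct IF {
    typedef Then RET;
};

template <typename Then, typename Else>
struct IF<false, Then, Else> {
    typedef Else RET;
};

// Hypothetical lock types: a no-op lock and a placeholder for a real one.
struct NullLock {
    void acquire() {}
    void release() {}
};

struct ThreadLock {
    bool held = false;             // placeholder state, not a real mutex
    void acquire() { held = true; }
    void release() { held = false; }
};

// Traits classes carrying metainformation about a configuration.
struct SingleThreadedConfig { static const bool multi_threaded = false; };
struct ThreadPoolConfig     { static const bool multi_threaded = true; };

// Generator: picks the lock type from the configuration's traits.
template <typename Config>
struct LockFor {
    typedef typename IF<Config::multi_threaded, ThreadLock, NullLock>::RET RET;
};

static_assert(std::is_same<LockFor<SingleThreadedConfig>::RET, NullLock>::value,
              "single-threaded builds get the no-op lock");
static_assert(std::is_same<LockFor<ThreadPoolConfig>::RET, ThreadLock>::value,
              "multi-threaded builds get the real lock");
```

Because the unselected lock type is never instantiated in the chosen configuration, a single-threaded build pays nothing for locking, which is the effect the "null locks" optimization aims for.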

Generator for QoS Descriptor

enum Disp_Rule_t { RMS, EDF, MLF, MUF, other };

// Primary template: no QoS fields by default
template<Disp_Rule_t>
struct QoSDesc {};

// Template specializations
template<>
struct QoSDesc<RMS> {
  long period;    // fields specific to RMS
};

template<>
struct QoSDesc<EDF> {
  long deadline;  // fields specific to EDF
};

template<>
struct QoSDesc<MLF> {
  // fields specific to MLF
};

// CASE and SWITCH are compile-time metafunctions not shown on this slide
template <Disp_Rule_t disp_rule>
struct QoSDescriptorGenerator {
  // No typename here: the case list does not depend on disp_rule
  typedef CASE<EDF, QoSDesc<EDF>,
          CASE<RMS, QoSDesc<RMS>,
          CASE<MLF, QoSDesc<MLF> > > >
    disp_rule_case_list;

  // Select the descriptor type that matches disp_rule
  typedef typename SWITCH<disp_rule, disp_rule_case_list>::RET
    QoSDescriptor_;

  typedef QoSDescriptor_ RET;
};

typedef QoSDescriptorGenerator<EDF>::RET QoSDescriptor;
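The generator above relies on CASE and SWITCH metafunctions that the slide does not define. The sketch below supplies one possible pair of definitions so that the generator compiles on its own. These definitions are assumptions made for illustration: `NilCase`, the members `tag`, `type`, and `next`, the use of `std::conditional`, and the choice to return `NilCase` when no case matches are all hypothetical, and `example_deadline` exists only to show usage.

```cpp
#include <type_traits>

enum Disp_Rule_t { RMS, EDF, MLF, MUF, other };

template <Disp_Rule_t> struct QoSDesc {};
template <> struct QoSDesc<RMS> { long period; };
template <> struct QoSDesc<EDF> { long deadline; };
template <> struct QoSDesc<MLF> {};

// Assumed helper: marks the end of a case list.
struct NilCase {};

// Assumed helper: one entry of a compile-time case list.
template <Disp_Rule_t Tag, typename Type, typename Next = NilCase>
struct CASE {
    static const Disp_Rule_t tag = Tag;
    typedef Type type;
    typedef Next next;
};

// Assumed helper: walks the case list and returns the type of the first
// entry whose tag equals Rule.
template <Disp_Rule_t Rule, typename CaseList>
struct SWITCH {
    typedef typename std::conditional<
        Rule == CaseList::tag,
        typename CaseList::type,
        typename SWITCH<Rule, typename CaseList::next>::RET>::type RET;
};

// End of list: no case matched, so report NilCase.
template <Disp_Rule_t Rule>
struct SWITCH<Rule, NilCase> {
    typedef NilCase RET;
};

template <Disp_Rule_t disp_rule>
struct QoSDescriptorGenerator {
    typedef CASE<EDF, QoSDesc<EDF>,
            CASE<RMS, QoSDesc<RMS>,
            CASE<MLF, QoSDesc<MLF> > > > disp_rule_case_list;
    typedef typename SWITCH<disp_rule, disp_rule_case_list>::RET RET;
};

static_assert(std::is_same<QoSDescriptorGenerator<EDF>::RET, QoSDesc<EDF> >::value,
              "EDF selects the descriptor with a deadline field");

// Usage: the generated type has exactly the fields the policy needs.
inline long example_deadline() {
    QoSDescriptorGenerator<EDF>::RET qos;
    qos.deadline = 5000;
    return qos.deadline;
}
```

A design consequence worth noting is that a mismatch, such as reading `period` from the EDF descriptor, is rejected by the compiler rather than discovered at run time.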

Related Work

• Minimalist middleware frameworks
  – UBI-core at UIUC, other projects
    • Note that making lowest level substrate robust is an issue

• Composition logics
  – Task Scheduler Logic at U. Utah
  – WUGLE project at WUSTL

• Instrumentation and history analysis
  – DSKI/DSUI and event history work at KU

• Extending/exploiting type systems
  – Ptolemy at U.C. Berkeley
  – Kokyu template meta-programming at WUSTL
    • RMA, dispatcher composition in C++ typesystem

Concluding Remarks

• Use generative middleware programming to compose and configure fine-grained infrastructure
  – Leverage features of existing programming languages

• Design substrates for use in system generation
  – “system aspect frameworks”

• Drive infrastructure configuration and adaptation strategies from logics, types, algebras
  – While avoiding NP-hard cases and halting problems

• From “Just Enough Middleware” we want to generate “Just The Right Middleware” each time
