
Synchronous/Reactive Programming of Concurrent System Software

Bruce R. Montague and Charles E. McDowell
Computer and Information Sciences
University of California, Santa Cruz

UCSC-CRL-95-51
November 28, 1995

Baskin Center for
Computer Engineering & Information Sciences
University of California, Santa Cruz
Santa Cruz, CA 95064 USA

ABSTRACT

Synchronous languages are intended for programming reactive systems. Reactive systems, which include real-time systems and key operating system components, interact continually with their environment. This paper considers the applicability of imperative synchronous/reactive languages to the development of general system software, that is, to the implementation of operating system kernels, file systems, databases, networks, server architectures, device drivers, etc. The languages Esterel and Reactive C (RC) receive special attention, as Esterel is the oldest and most developed such language and RC is specifically designed for compatibility with C systems programming. An alternative soft-instruction software architecture is described which is well suited to real-world system programming.

keywords: reactive systems, synchronous language, concurrent programming, system software, operating systems, threading, real-time systems, soft-instructions.


0.1 Introduction

The imperative synchronous languages Esterel and Reactive C (RC) were developed to address concurrent programming difficulties associated with the reactive systems commonly encountered in real-time and embedded programming [BdS91] [Bou91]. These languages have been called synchronous/reactive languages. This paper considers the applicability of such languages to the closely related problem of general system software development, and then describes an alternative software architecture intended specifically for system software development.

The remainder of this paper is organized as follows: The concurrent system programming problem is briefly described and the importance of establishing software architectures suited to this problem is noted. Reactive systems and synchronous languages are then described and existing work briefly surveyed. Source code is examined from the viewpoint of the system programmer, and conclusions drawn regarding applicability to implementation of generic system software. The soft-instruction software architecture is described, an example considered, and relevant historical and current work noted.

0.1.1 Concurrent System Software

System software consists of components such as operating system kernels; file, database, and network systems; device drivers; and server architectures for I/O, transaction processing, and multimedia. These components react to external service requests generated by applications, other components, and hardware. Each component is generally capable of servicing multiple concurrent requests, some of which may be of long duration, and some of which may have real-time constraints. Thus, operating system software constitutes a significant concurrent programming problem, and it is widely agreed that development of such software remains difficult [Atw76] [Sch86] [Wat90].

There are many ways to view concurrency. The remainder of this section describes concurrent programming considerations germane to the design and programming of concurrent system software, that is, concurrency from the viewpoint of the system programmer.

We can identify three orthogonal attributes that affect the systems programmer:

Competitive vs Cooperative: Are concurrent requests competing for the same resource, or are they cooperating to satisfy a single, higher level request? The answer to this may depend on the component that is being used as a frame of reference.

Heavyweight, Lightweight, or Featherweight: How expensive is a context switch?

Internal vs External: Are concurrent requests being managed by a single component?

These are explained further below.

Cooperative concurrency either increases a single request's performance or simplifies the coordination of multiple events pertaining to a single request. Performance is increased by techniques such as issuing multiple overlapped I/O operations on behalf of a single service request. The coordination provided by cooperative concurrency simplifies activities such as request cancelation and error handling. Cooperative concurrency typically reduces the latency of a single request.

Competitive concurrency maintains the timesharing illusion. This permits a given service request to be considered ‘invisible’ to other unrelated service requests currently active within the same system component. Competitive concurrency occurs when a system component can execute two or more concurrent and unrelated service requests generated external to the component. Competitive concurrency mechanisms typically increase the overall throughput of the component.

The amount of context copied during a context switch dominates performance. Modern hardware register sets are divided into non-privileged and privileged registers. Non-privileged registers are accessible to applications while privileged registers are protected from normal application access. A heavyweight context switch completely switches both privileged and non-privileged registers. This may require updating memory structures associated with both the privileged registers and with the software process model implemented by the system. A heavyweight context switch typically crosses protection boundaries.

Switching only the non-privileged registers provides lightweight context switching. Typically, one of these registers is the application stack pointer, so there is usually a stack associated with each thread defined by a set of non-privileged registers. A lightweight context switch is often an order of magnitude faster than a heavyweight context switch [Laz91].

High-performance systems are occasionally designed which switch only a subset of the non-privileged registers. This approach is used in some transaction systems, such as IBM's TPF, and is often used within the interrupt handling component of operating systems [Mar90]. Context switching a subset of the non-privileged registers will be called featherweight in the remainder of this paper. In the extreme, featherweight context switching involves switching only a single register, which is usually the base register of a data context describing a transaction or service request.
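To make the contrast concrete, the following C sketch (all names invented for illustration) shows why a featherweight switch is so cheap: given a per-request context record, ‘switching’ amounts to re-pointing a single base pointer, with no register set or stack saved or restored.

    /* Hypothetical per-request context record. */
    typedef struct request {
        int   state;            /* progress of this service request */
        void *buffer;           /* per-request data                 */
    } request_t;

    static request_t *current_request;  /* the one 'base register'  */

    /* The entire featherweight context switch. */
    void featherweight_switch(request_t *next)
    {
        current_request = next;
    }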

In this paper we are concerned with concurrency explicitly controlled by a single component to at least some extent. We call this internal concurrency. All other concurrency is external. More specifically, from the viewpoint of a systems programmer, all concurrent activities occur in response to some request. Given two concurrent activities, the activities represent internal concurrency if at least one activity is being handled by the component under consideration and the following two conditions are true. All other cases represent external concurrency.

1. Both activities are being handled by the same component.

2. The data structures representing the concurrent activities are explicitly represented within the component.

An example that satisfies the first condition above but not the second would be conventional re-entrant code with no shared data structures. One or more external components could initiate distinct activities within the component that were completely unrelated.

Due to differing internal concurrency requirements, different components often implement different internal concurrency mechanisms. Both competitive and cooperative concurrency may be supported internal to the component.

A system component itself typically executes in a concurrent environment in which it competes with other components for external resources. A common example of such external concurrency is competition between components for the CPU. Another example of important external concurrency is the mechanism by which the component can initiate and manage multiple asynchronous external requests, for instance, multiple asynchronous I/O operations.

The degree to which concurrent contexts internal to a system component are simultaneously exposed to the system programmer differs, often reflecting the required degree of cooperation between requests. At one end of the spectrum are classical threading models, which typically do not expose to a given thread the internal state of other threads, that is, a thread's stack contents are private. At the other extreme, as exemplified by some windowing systems, multimedia servers, and simulation-action games, all concurrent context may be equally visible and expressly managed by the programmer.

In thread-based approaches context tends to be encapsulated in a single data structure, such as a stack, which is explicitly scheduled by a lower-level scheduler. The implementation of this scheduler is often not visible to the programmer. In these systems, thread scheduling provides for concurrent behavior of processes, threads, transactions, requests, etc. The programmer gets concurrency ‘for free’ because implementation of the underlying threading system is not the programmer's responsibility.

At the other extreme, where expressly managed concurrent behavior is inherent in the program implementation, the programmer has complete responsibility for locating and scheduling activities. The programmer is in control and effectively defines custom context and concurrency mechanisms. Context may be distributed throughout program data structures. Locating relevant context upon the occurrence of each significant event may require considerable run-time logic. Such expressly managed approaches tend to be used when concurrent activities are highly interrelated and it is advantageous for the programmer to have a ‘god's eye’ view of the entire concurrent state. Sometimes, as with the X Window System, the code that expressly manages activities is placed into a standard run-time system. In general, modern windowing systems have used approaches based on explicit application event dispatch work-loops and callbacks rather than thread-based concurrency. This concurrency style, convenient in highly interactive cooperative environments, is sometimes called faux concurrency [Rep95].

Concurrent service requests for file, network, database, and transaction systems often have a high degree of independence. Unlike message and window servers, two arbitrary service requests to this type of server are usually competitive. Optimally, the degree of internal concurrency within such a component is determined dynamically by the rate at which the external environment generates service requests. Lightweight or featherweight competitive internal concurrency mechanisms are thus very important for these servers.

The system programmer is responsible for considering all of these issues. The internal concurrency mechanisms in many system components are only used by the system programmer, and never directly by external code. The system programmer is therefore free to implement custom concurrency mechanisms as well as program using those mechanisms.

Thus, when implementing a system component, one often programs with at least 3 different viewpoints in mind: the perspective of the individual service request, the perspective of the custom internal concurrent programming mechanism through which all individual service requests are controlled within the component, and the external concurrent environment that must be used but over which one may have no direct control. Optimally, the operating system kernel implementation itself supports these viewpoints, thus facilitating component implementation. The kernel is often considered simply another component that has primary responsibility for CPU allocation and interrupt dispatching.

0.1.2 Software Architecture

Software architecture has been defined as ‘The structure of the components of a program/system, their interrelationships, and principles and guidelines governing their design and evolution over time’ [GP95]. A software architecture is distinguished by ‘a shared repertoire of methods, techniques, patterns and idioms for structuring complex software systems’.

Software architectures provide an abstract conceptual approach to complex systems, more specific than the models often inherent in a general purpose language but more general than a single design. A software architecture provides an identifiable approach or framework, often implicitly framed by the adopted tools, languages, and development environments [GTP95] [Gar95].

Good system software invariably adopts a well defined software architecture. The architectures used for system software are usually based on some form of explicitly coded critical sections [Atw76] [FP88] [Her91]. However, explicit critical sections introduce nondeterminism and are a root cause of many concurrent programming difficulties [Her90]. Since synchronous languages do not contain explicit critical sections, software architectures based on such languages may have advantages over traditional architectures.

0.2 Reactive Systems

A reactive system is event-driven, maintains a permanent interaction with its environment, and executes at a rate determined by the environment [HP85]. Reactive systems are assumed to execute with performance sufficient to ensure they are never overdriven by their environment. Since under normal circumstances a reactive system never terminates, reactive systems cannot be characterized as a simple function from a single initial input state to a single final output state. Real-time systems are reactive systems with the addition of timing constraints. Operating systems are inherently reactive and provide the archetypical example of large reactive systems [JS89] [MK93].

The term reactive is more specific than the informal term event-driven, which is widely used and overloaded. For instance, an event-driven program may calculate a simple transformation and terminate. The term reactive is more general than soft real-time and near real-time, because a reactive system does not address any real-time constraints but only correct causality ordering [BB91].

    /* State 1 -- Light is Off */
    state light_off {
        /* When in this state and voltage */
        /* exceeds 5 volts, turn on the   */
        /* light, and then                */
        /* Enter State 2, Light On.       */
        when (volt > 5.0) {
            light = TRUE;
            pvPut(light);
        } state light_on
    }

    /* State 2 -- Light is ON */
    state light_on {
        /* When in this state and voltage */
        /* falls below 3 volts, turn the  */
        /* light off, and then            */
        /* Enter State 1, Light Off.      */
        when (volt < 3.0) {
            light = FALSE;
            pvPut(light);
        } state light_off
    }

(In the original figure, balloons labeled GUARDS point at the two when() clauses.)

Figure 0.1: An SNL Code Fragment

Simple reactive systems are often programmed as explicit finite state machines, with external events driving the machine through state transitions. Explicitly coded state machines work well for problems with fewer than around 10 states. Above this size, explicitly programming a single state machine becomes difficult. Nonetheless, state machines are often used for device drivers, as this size suffices for many driver architectures. In this case a component such as an I/O supervisor usually has responsibility both for executing the state machines and for instantiating the state machines needed to deal with physical concurrency due to multiple devices.
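For instance, a two-state light controller like the one in Figure 0.1 might be hand-coded in C roughly as follows. This is only a sketch; set_light() is a hypothetical output routine, and each call to light_step() is one reaction to a new voltage sample.

    #include <stdbool.h>

    enum light_state { LIGHT_OFF, LIGHT_ON };

    void set_light(bool on);   /* assumed to drive the device */

    void light_step(enum light_state *s, double volt)
    {
        switch (*s) {
        case LIGHT_OFF:                       /* State 1 -- Light is Off */
            if (volt > 5.0) { set_light(true);  *s = LIGHT_ON;  }
            break;
        case LIGHT_ON:                        /* State 2 -- Light is ON  */
            if (volt < 3.0) { set_light(false); *s = LIGHT_OFF; }
            break;
        }
    }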

Programmers are often provided with special languages for integrating state machines into their program source. State Notation Language (SNL) is typical of these languages [Koz93] [KKW94]. SNL is used for I/O intensive control systems and is compatible with C system programming.

An example SNL program fragment is shown in Figure 0.1. This code turns a light on when a voltage exceeds 5 volts and off when the voltage falls below 3 volts. Examination of this code fragment illustrates how large programs developed in this manner suffer from hidden ‘spaghetti gotos’. The statement <state label_1 {...} state label_2> effectively terminates in a <goto label_2>. The when() clauses guard the following code-block whenever the appropriate state has been entered, that is, the code-block will not execute until the corresponding guard is true.

SNL uses a ‘run-time sequencer’ responsible for evaluating all the guards and serializing execution of ready code-blocks. SNL provides very good integration of the state machine, I/O, and conventional C code. Kozubal et al. note that ‘extremely complex’ SNL applications have between 10 and 20 states. However, an SNL sequencer may control execution of multiple independent state machines, on occasion controlling concurrent execution of as many as 10 state machines. SNL is considered easy to understand and use.
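The sequencer itself can be pictured as a small work-loop over a table of guards. The following C sketch is illustrative only; the guard_t layout and wait_for_change() are invented names, not SNL's actual implementation.

    #include <stdbool.h>

    typedef struct {
        int  state;               /* state in which this guard is armed     */
        bool (*predicate)(void);  /* the when() condition                   */
        int  (*block)(void);      /* guarded code-block; returns next state */
    } guard_t;

    void wait_for_change(void);   /* assumed: blocks until global state changes */

    void sequencer(const guard_t *g, int n, int state)
    {
        for (;;) {
            /* Evaluate all guards and serialize ready code-blocks. */
            for (int i = 0; i < n; i++)
                if (g[i].state == state && g[i].predicate())
                    state = g[i].block();
            wait_for_change();
        }
    }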

For slightly larger applications, such as stand-alone industrial controllers, some means of providing hierarchical structuring of state machines is required. A typical industrial system, intended to support up to around 150 states, is the Action-State diagram [KVK91]. This approach resembles coupling decision tables and state transition diagrams. The decision tables are compiled into hierarchical tables of action-routine addresses. A work-loop then sequentially executes action routines in response to events and the current state. This approach works well for controllers or drivers that do not require internal competitive concurrency.
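A minimal C sketch of such a table-driven work-loop, with invented names (N_STATES, N_EVENTS, next_event()), might look like this:

    enum { N_STATES = 150, N_EVENTS = 32 };   /* illustrative sizes      */

    typedef int (*action_t)(void);            /* returns the next state  */

    extern const action_t action_table[N_STATES][N_EVENTS];
    extern int next_event(void);              /* assumed event source    */

    void work_loop(int state)
    {
        for (;;) {
            int event = next_event();
            state = action_table[state][event]();  /* run action routine */
        }
    }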

Larger reactive systems that must support both competitive and cooperative concurrency are usually based on conventional operating system kernels providing multithreading and critical sections. This approach introduces nondeterminism and its associated concurrent programming problems. For large hard real-time systems that must guarantee predictable performance, cyclic executives are arguably still the preferable architecture. A cyclic system uses precomputed deterministic schedules designed for the worst case. While effective, the overhead of such pessimistic systems can be high, as code executes on a rigid table-driven timeline even if not needed. Under sustained near worst-case conditions, however, nondeterministic systems that must expend run-time overhead scheduling their activities are less efficient than cyclic systems [Kop91].

0.3 Synchronous Languages

Recently, synchronous software architectures have been proposed specifically for reactive systems [BB91] [Hal93]. The resulting synchronous/reactive systems include data-flow and declarative approaches as well as more traditional imperative languages. None of these approaches has proven intrinsically more powerful than the others, and an effort is currently underway to provide a common ‘back-end’ for a number of these systems. The typical reactive kernel for which the synchronous languages are intended has many properties of general system software. Many such kernels resemble stand-alone device drivers, that is, drivers running directly on the machine hardware without additional systems support.

The synchrony hypothesis assumes that all computation occurs in discrete atomic steps during which time is ignored. This is often stated as the assumption that all program code executes in zero time. Time only advances when no code is eligible for execution. During a single step, all output is considered to occur at the same time as all step input, that is, output is synchronous with input. The notion of continuous time is thus replaced by an ordered series of discrete steps between which discrete changes occur in global state. The code executed at each step is called a reaction.
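In C-like terms, one such step can be pictured as follows; the signal record and names are invented for illustration. The reaction computes its outputs from a snapshot of the signal state taken at the start of the step, and the results are committed all at once, as if the reaction took zero time.

    typedef struct { int set; int reset; int light; } signals_t;

    signals_t reaction(signals_t in)      /* one atomic step           */
    {
        signals_t out = in;
        if (in.set)   out.light = 1;      /* all outputs are computed  */
        if (in.reset) out.light = 0;      /* against the same input    */
        return out;                       /* snapshot, then committed  */
    }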

The fundamental advantage of this approach is that internal cooperative concurrency can be handled deterministically. Concurrent asynchronous events are manifested only within global state ‘snapshots’. No explicit critical sections occur in the source code, not even in the guise of monitors or concurrent objects, because all code can be considered inside implicit critical sections executed as required via guarded command style programming.

The synchronous/reactive languages focus on the internal cooperative concurrency commonly found in drivers and controllers. Synchronous languages essentially ‘compile away’ all internal cooperative concurrency by producing a single deterministic state machine that manages all required activities. Nondeterministic external events are handled by the reaction dispatch or guard evaluation mechanism, and do not directly propagate into the body of the program [BGJ91]. Since the single state machine into which concurrent programs are compiled cannot deadlock, the need for multiple threads, critical sections, and nondeterminacy is eliminated. However, synchronous languages that compile to a single state machine must sacrifice recursion and dynamic data allocation to obtain determinism.
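A toy C sketch of this ‘compiling away’, with invented states and events, merges two logically parallel two-state tasks, A and B, into one deterministic machine whose single state encodes the joint state of both, so no threads or critical sections remain:

    enum joint_state { A0B0, A0B1, A1B0, A1B1 };  /* product of two tasks */

    enum joint_state compiled_step(enum joint_state s, int ev_a, int ev_b)
    {
        int a = (s == A1B0) || (s == A1B1);  /* unpack task A's state */
        int b = (s == A0B1) || (s == A1B1);  /* unpack task B's state */
        if (ev_a) a = !a;                    /* task A's transition   */
        if (ev_b) b = !b;                    /* task B's transition   */
        return a ? (b ? A1B1 : A1B0) : (b ? A0B1 : A0B0);
    }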

Synchronous architectures somewhat resemble cyclic systems executing code repetitively using precomputed schedules. Synchronous approaches, however, see time as merely another discrete global state variable and only execute required reactions. Synchronous languages provide non-blocking and wait-free internal concurrency. A concurrent system is non-blocking if some member is guaranteed to complete an operation in a finite number of system steps. A system is wait-free if it can be guaranteed that each member completes an operation in a finite number of system steps [Her91] [Her90]. Investigating alternatives to process-based concurrency is a current research area. For instance, Lamport notes that processes are an artifact and need not be adopted as a fundamental primitive in theories of concurrency [Lam94].

0.4 Current Studies

Selected efforts relating to synchronous/reactive programming are examined in this section, primarily with respect to their source code and potential applicability to system programming.

0.4.1 Meta/NPL

The ISIS system is a reliable distributed system developed at Cornell [BJ87]. While attempting a large distributed ISIS application, the need arose for a distributed reactive toolkit because ISIS lacked tools for distributed control [MW91]. This motivated the development of Meta, a toolkit for building non-real-time reactive systems. Meta provides a state machine language called NPL. NPL is a guarded command language providing a globally consistent view of distributed state. Guarded commands are interpreted and have atomic action semantics. Performance is considered adequate for systems in which timing is not crucial.

Figure 0.2, based on an example by Marzullo and Wood, illustrates NPL code. NPL uses simple stack-based expressions, with arguments preceding a postfix operator. The general format of a statement is <predicate GUARD actions [ALTERNATE predicate GUARD actions]*>. The actions are executed atomically whenever the corresponding guard predicates become true. Figure 0.2 consists of one such statement. The action consists of a single statement with operator NPL which takes the 2 preceding strings as arguments. The second string argument is itself an NPL statement which contains an alternate guard.

Three guards exist in Figure 0.2. The first guard executes the NPL statement whenever load exceeds 5. The NPL statement takes two arguments, a ‘context’ (in this case, server), and a program fragment to execute in that context. The second guard exits the action if the load falls below 5. In this case, the first guard remains enabled and the action will be re-executed if load again exceeds 5. The third ‘alternate’ guard starts a timer whenever the first guard fires. If 20 seconds passes, TIMER returns true and the third guard executes idle-server. The LEAVE statement then causes the entire statement in Figure 0.2 to be removed from further possibility of execution. Thus, Figure 0.2 has the effect of executing idle-server if the load exceeds 5 for 20 seconds.

NPL is implemented as an interpreter driven by guard evaluation. NPL uses a non-deterministic Least Recently Used (LRU) policy to select which ready action to execute first. Although typical of guarded command languages and designed explicitly for reactive environments, NPL does not assume the synchrony hypothesis.

0.4.2 Esterel

Esterel is the oldest synchronous/reactive language and the best documented [BC84] [BdS91] [BG92] [Hal93] [Edw94]. The design of Esterel was motivated by an effort to develop a semantics of parallel and real-time programming following Robin Milner's theories of synchronous process algebras [Mil93].

A complete Esterel module is shown in Figure 0.4 and code fragments in Figures 0.3 and 0.5. Figure 0.3 is from an example by Murakami and Sethi, and Figures 0.4 and 0.5 follow an example by Halbwachs [MS90] [Hal93]. Esterel is not a complete programming language. It is a program generator used to describe the reactive kernel of reactive programs, that is, it provides a deterministic reactive control harness which calls routines written in a conventional programming language.

Program execution forms a discrete sequence of instants. The only global state consists of instantaneously broadcast signals, with broadcast synchronous with the instant in which a signal occurs. Signals last the entire current instant, that is, from the start of the instant in which they are emitted to the end of that instant. The state of all signals emitted during a given instant is altered synchronously. Time is treated as a signal identical to any other signal. A pure signal is emitted using the syntax <emit s;>, where s is a signal name. The statement <emit s(N);> associates the integer value N with signal s. The statement <present s then s1 else s2 end> executes statement s1 if signal s is present and statement s2 otherwise. The value of signal s is obtained by ?s.

The statement <s1 || s2> indicates that statements s1 and s2 react in the same instant. Potentially all routines in a program can react during the same instant. All reactions are atomic and execute to completion within the current instant. While one reaction is executing, no other reactive routine can execute. A reaction can block ‘in place’ until the instant in which signal s is present via <await s;>. To wait for the third instant in which signal s has occurred, for example, await takes the form <await 3 s;>.

The fundamental control construct is the watchdog, with syntax <do s1 watching s timeout s2>.


    load 5 > GUARD
        "server"
        " load 5 < GUARD
              EXIT
          ALTERNATE
          20000 TIMER GUARD
              "Idle-Server"
              LEAVE
        "
        NPL

(In the original figure, balloons labeled GUARDS point at the three GUARD operators; the two strings are the arguments of the final NPL operator.)

Figure 0.2: An NPL Routine

ESTEREL SOURCE:

    trap GET_STRING in
      signal RESTART in
        every BREAK do
          call cycleBaud()();
          emit RESTART
        end
        ||
        loop
          aString := "";
          emit TIMER(N);
          do
            loop
              await DATAIN;
              call build(aString,done)(?DATAIN);
              if done then
                exit GET_STRING
              end
            end
          watching ALARM
        each RESTART
      end % signal
    end % trap

COMPILED AUTOMATON:

    State 0
      aString := ""; TIMER(N);
      nextstate 1
    State 1
      if BREAK then
        cycleBaud()();
        aString := ""; TIMER(N);
        nextstate 1
      end;
      if ALARM then nextstate 2 end;
      if DATAIN then
        build(...)(...);
        if done then nextstate 3 end;
        TIMER(N);
        nextstate 1
      end;
      nextstate 1
    State 2
      if BREAK then
        cycleBaud()();
        aString := ""; TIMER(N);
        nextstate 1
      end
      nextstate 2

(In the original figure, balloons labeled GUARDS point at the every BREAK clause, the do ... watching ALARM clause (GUARD (do)), and the loop ... each RESTART clause (GUARD (loop)).)

Figure 0.3: An ESTEREL Routine

    module BUTTON_INTERPRETER :
    input START_STOP_BUTTON, FREEZE_BUTTON;
    output RESET, DISPLAY_TOGGLE;
    signal STOPWATCH_RUNNING, FROZEN_DISPLAY in
      every FREEZE_BUTTON do
        present STOPWATCH_RUNNING then emit DISPLAY_TOGGLE
        else % The stopwatch is stopped
          present FROZEN_DISPLAY then emit DISPLAY_TOGGLE
          else emit RESET
          end
        end
      end
      || % flip-flop: stopwatch runs between presses.
      loop % Toggle running state
        await START_STOP_BUTTON;
        do sustain STOPWATCH_RUNNING
        upto START_STOP_BUTTON
      end
      || % flip-flop: Freeze display between FREEZE_BUTTON presses.
      loop % Toggle frozen_display state.
        await DISPLAY_TOGGLE;
        do sustain FROZEN_DISPLAY
        upto DISPLAY_TOGGLE
      end
    end.

(In the original figure, balloons labeled GUARD point at the every and await statements, and a balloon labeled CAUSALITY ERROR, i.e., ‘DEADLOCK’, points at the final flip-flop.)

Figure 0.4: A Causality Error in ESTEREL

    % flip-flop: Freeze display between FREEZE_BUTTON presses.
    loop % Toggle frozen_display state.
      await DISPLAY_TOGGLE;
      trap T in
        sustain FROZEN_DISPLAY
        ||
        await DISPLAY_TOGGLE; exit T
      end
    end

(In the original figure, balloons labeled GUARDS point at the await statements.)

Figure 0.5: Interval Sequencing Solves a Causality Error

In the watchdog statement, s is a signal and s1 and s2 are compound statements, that is, sequences of statements separated by semicolons. Body s1 is executed unless signal s occurs, in which case the body is terminated and the timeout executed. The timeout is optional. Statement s1 is not executed in the instant signal s occurs. Since statement s1 may contain await statements, a do statement may span many instants.

The general form of a trap statement is <trap T in s1 end>, where T specifies the name of the trap block defined by compound statement s1. Traps can be nested. The trap can be exited from within the trap block by a statement of the form <exit T;>, where T indicates which enclosing trap to exit. The statement <trap T in await s; s1; exit T end> executes s1 until and including the instant in which signal s occurs. Note that the previous do statement does not execute statement s1 within the first instant signal s occurs, while the given trap statement does execute statement s1 within the first instant signal s occurs. Many Esterel statements are effectively constructed as macros using the do and trap statements as primitives.

[Figure: a trace of the signals FREEZE_BUTTON, STOPWATCH_RUNNING, FROZEN_DISPLAY, and DISPLAY_TOGGLE from the end of instant n-1 through the ‘microinstants’ of instant n. An annotation notes that changing the state of a signal must logically change its state at the beginning of the instant.]

Microinstant description:

1) At the end of instant n-1, FROZEN_DISPLAY is True and DISPLAY_TOGGLE is False due to the ‘sustain’ shown in the microinstant at step 6.

2) Between instant n-1 and instant n, the state of FREEZE_BUTTON changes (it has been pressed).

3) STOPWATCH_RUNNING is False.

4) FROZEN_DISPLAY is True.

5) The ‘emit DISPLAY_TOGGLE’ sets DISPLAY_TOGGLE to True.

6) The ‘upto DISPLAY_TOGGLE’ becomes True, which terminates the ‘sustain FROZEN_DISPLAY’, making FROZEN_DISPLAY False throughout the entire current instant. This makes FROZEN_DISPLAY False at the start of the instant, but to arrive at step 6, FROZEN_DISPLAY must be True.

7) FROZEN_DISPLAY False invalidates step 4, which keeps step 5 from setting the state of DISPLAY_TOGGLE to True, so both DISPLAY_TOGGLE and FROZEN_DISPLAY are False. It is a contradiction for DISPLAY_TOGGLE to be both True and False at step 2.

8) Since FROZEN_DISPLAY must have been True at the start of the instant, the state of the signals must be as shown in step 8, which is identical to the state at step 2, so the cycle has no stable solution.

Figure 0.6: A Causality Error Trace

Figure 0.3, from an example by Murakami and Sethi, contains a code-block that assembles a string of input characters into a buffer, subject to a timeout [MS90]. If a BREAK occurs, the baud rate of the input line is re-determined and buffer assembly restarted. This is a typical small systems programming problem.

The <trap GET_STRING> in Figure 0.3 enables <exit GET_STRING> to exit the code-block. Signal RESTART is declared with local name scope. In any instant that external signal BREAK occurs, routine cycleBaud()() executes and RESTART will reset the loop executing in parallel. Note that Esterel routines take 2 argument lists. The first specifies parameters passed by reference and the second parameters passed by value.

The loop collects input into buffer aString. The loop starts a timer by emitting signal TIMER(n). At the instant corresponding to an elapsed time of n, the timer module will raise signal ALARM. The timer counts signal TICK, which implicitly occurs every instant. The await command blocks execution until the instant that DATAIN contains data, at which time its value is obtained by ?DATAIN. Since all code executes atomically and global state is updated synchronously with the instant, there is never a possibility of DATAIN changing value due to a race condition.

Figure 0.3 also illustrates Esterel compilation. Output is a directly compiled finite state machine which does not require an interpreter for execution. The compiled code is extremely fast since there is no need for concurrent threads, messages, or interpreters. However, there is considerable redundant code. For example, the first code-blocks in State 1 and State 2 are the same.

Figure 0.4, based on an example by Halbwachs, illustrates a module in a stopwatch controller [Hal93]. The stopwatch has 2 buttons. One button starts and stops the timer. The other button freezes and unfreezes the stopwatch display when the stopwatch is running. The start/stop button can stop the running stopwatch when the display is frozen, after which the freeze button can be used once to display the final time. When the stopwatch is stopped and the final time is displayed, pressing the freeze button again resets the stopwatch.

The example contains 3 concurrent code-blocks. The signal START_STOP_BUTTON simply toggles the signal STOPWATCH_RUNNING as specified by the loop in the second code-block. The signal DISPLAY_TOGGLE likewise toggles the signal FROZEN_DISPLAY in the loop contained in the third code-block. Signal DISPLAY_TOGGLE does not simply reflect the state of FREEZE_BUTTON because of the modal operation of this button, which is interpreted by the first code-block.

This example contains a causality error, which is the synchronous equivalent of deadlock. Unlike run-time deadlock, causality errors can in principle be detected at compile time due to the synchrony hypothesis. These errors arise because it is possible to write reactions of the form ‘emit signal s if and only if it is absent’ or the undetermined ‘emit signal s if and only if it is present’. In Figure 0.4 it arises because if signal FROZEN_DISPLAY is present, signal DISPLAY_TOGGLE is emitted, but if signal DISPLAY_TOGGLE is emitted then signal FROZEN_DISPLAY is not present due to the upto at the bottom of the module. The <do s1 upto s> is defined as <do s1; halt watching s>, that is, statement s1 is not executed the instant that signal s occurs and s1 will not be executed in future instants.

Figure 0.6 illustrates the causality error. The synchronous nature of Esterel is also illustrated in this figure because the emit of signal DISPLAY_TOGGLE, which occurs if signal FROZEN_DISPLAY is true, must cause the signal FROZEN_DISPLAY to be false at the beginning of the instant, leading to a contradiction.

The solution to the causality problem, following Halbwachs, is shown in Figure 0.5. The code-block at the bottom of the module in Figure 0.4 is replaced with a trap-based sequencer. Since the trap continues to execute its body the first instant that DISPLAY_TOGGLE is true, signal FROZEN_DISPLAY remains true for the entire instant, becoming false in the next following instant.

0.4.3 Reactive C

Reactive C (RC) is designed to provide extensions to C supporting synchronous/reactive programming based on the Esterel model [Bou91] [Bou92]. RC is implemented as a preprocessor generating C source code. RC does not compile to a finite state machine, but rather provides the C programmer additional statements for expressing Esterel-style reactive control flow. RC is more general than Esterel. Guard conditions can be boolean expressions, and reactions executing within the same instant can coordinate and synchronize via micro instants and the suspend statement.

Example RC statements described by Boussinot are shown in Figure 0.7. The stop statement in reactive procedure Hello() is considered the basic RC reactive statement. Stop halts execution for the remainder of the current instant. In the next instant, control resumes at the immediately following statement.

The condition argument of the select statement in Figure 0.7 is evaluated every instant. If the condition is true, routine P1() is executed, otherwise P2() is executed. In the statement shown, x alternates between true and false, so execution of P1() and P2() will alternate each instant.

The par statement in Figure 0.7 specifies multiple reactive statements to be executed in the same instant. It takes only 2 other statements as arguments. Unlike the Esterel || syntax, the order of execution of these 2 statements is fixed, with the first argument executing first within the instant, and the second executing next within the instant. Typically, execution would block within each branch of the par statement during the first instant. At the second instant, control proceeds from the 2 blocked control points in the same order, that is, the first argument executes before the second.

RC abandons Esterel's purity with respect to logical instants providing the only event ordering. A single instant can be divided up into micro instants. This is illustrated by the 2 statements on the right side of Figure 0.7.

    rproc void Hello(){
        printf( "Hello, world\n" );
        stop;
        printf( "I repeat: hello, world\n" );
    }

    select( x = !x )
        exec P1();
        exec P2();

    par
        exec Hello();
        exec Bye();

    par
        { suspend; printf("1"); }
        { printf("2"); }

    close par
        { suspend; printf("1"); }
        { printf("2"); }

Figure 0.7: Reactive C Statements

In the first par statement, the suspend in the first argument suspends execution till the next instant. Thus, a ‘2’ is output the first instant and a ‘1’ the second instant. The close statement forces execution to be restarted during the current instant, that is, after the second argument of the par statement completes execution, the suspended first argument is resumed. The output of the close statement is ‘21’ at the end of the first instant. The suspend and close statements are useful when multiple routines have to monitor each other's state, wait for initialization to be complete, etc.

0.4.4 Evaluations

A group at Bell Labs evaluated 6 different reactive specification and development systems with respect to a reactive coding problem originally implemented in C in the AT&T 5ESS system [ACJ+95]. Esterel was included but not RC. The evaluation considered real-world applicability, compatibility, and software engineering concerns such as language learning curves. This study concluded that Esterel required more expertise than the other approaches. For this group of experienced evaluators, however, overall learning curves were not a significant problem for any of the systems. Esterel was found to be the most expressive language and scaled best to large application domains. The most notable result of this study was that maintainability was not a strength of any of the evaluated approaches. The group recommended that maintainability be addressed before any of the evaluated methods would be suitable for large-scale industrial use.

A German research effort evaluated 18 different reactive programming systems amenable to formal analysis by funding implementations of the same ‘toy’ manufacturing cell controller in each system [LL95]. As with the AT&T evaluation, Esterel was included but not RC. The resulting 400 line Esterel program provided a reactive control harness for conventional C routines. The Esterel program had to be split into 5 separate reactive kernel modules with independent state machines in order to be tractable for the tools and the code generator. No proofs were attempted because the complete system was too large, and the modularized system introduced asynchronous communication between modules. It was noted that a large number of signals overwhelm the designer, and that small changes in an Esterel program can result in significant changes in state machine size, making requirements unpredictable. Although believing the approach simplified the design of the controller, the group implementing the Esterel controller concluded that ‘the advantages are limited if either many kernels are loosely coupled or if the data structures used are complex’ [Bud95]. The resulting executable program was 46 Kilobytes, with 26 Kilobytes resulting directly from Esterel and the remainder primarily resulting from the low-level C routines.

0.5 System Programming Considerations

High-performance concurrent reactive systems play a central role in the future of operating systems. This is especially true for servers requiring a very high degree of competitive internal concurrency, for instance, servers for transaction processing, databases, networks, and multimedia. Context switch overhead bounds the performance of conventional multithreading approaches based on heavyweight and lightweight concurrency [Ous89] [MB91] [ALBL91]. Context switch code paths through general purpose systems are lengthy, and context switches destroy the locality of reference assumptions upon which high performance systems rely. Featherweight context switch techniques that can provide a high degree of both competitive and cooperative internal concurrency are thus of interest. Historically, such approaches have long been used in areas such as real-time avionics and high performance transaction processing. Example systems that have been described include Boeing's Rex and IBM's TPF [BS86] [Mar90]. This section examines whether the synchronous/reactive techniques considered in the previous section are appropriate for high performance generic system software.

0.5.1 General Observations

As indicated by the ‘guard’ balloons in the figures, all the languages described enable programming large state machines using parallel variations of Dijkstra's guarded commands [Dij75] [Hal77]. To understand the source code in these systems, one identifies the guard locations and determines the conditions under which a guard will activate. For instance, in reading Esterel or RC, a productive first step is to identify all emits and all corresponding guards.

Guarded command programming resembles dataflow programming in that the programmer has no direct control over the selection process, that is, the order in which multiple ready reactions are activated. This is most clearly visible in NPL, which uses an LRU policy to select what executes next. Although synchronous languages are deterministic, the specific order in which code executes may not be immediately obvious from the source code.

The synchronous languages resemble a dataflow approach to programming cyclic systems with all computation based on a traditional timeline divided into major and minor cycles. The similarity is most obvious in the case of RC, where instants may be considered major cycles and micro instants correspond to minor cycles. In both Esterel and RC instants can be considered major cycles and the sequence of all reactions that execute within the instant corresponds to a minor cycle sequence. In RC these minor cycles are guaranteed to occur in a fixed order. Unlike cyclic systems, however, the synchronous languages only execute needed reactions.

It is not clear whether the synchronous/reactive languages are truly intended for programming general operating systems software. Operating systems drivers, protocol stacks, user interfaces, and real-time process controllers are mentioned as example applications. Halbwachs specifically notes that most system software is based on reactive kernels embedded within more traditional architectures. This split corresponds roughly to the distinction between cooperative and competitive internal concurrency. Although the synchronous languages make internal cooperative concurrency deterministic, traditional means of managing competitive concurrency must be used.

Previous efforts to eliminate operating system nondeterminism, most notably associated with declarative language research, have not been particularly successful. For instance, in describing the design of a functional operating system based on streams, Jones and Sinclair note that ‘operating systems are inherently reactive’, but that ‘it is not clear that operating systems can be written without modeling non-deterministic behavior’ [JS89]. They discuss such efforts in the functional language community, including an effort to provide implicit time determinism, timestamps, and clock oracles, and note that ‘The effect of this proposal is to place non-determinism entirely outside the software..’. They remark:


‘In recent years, however, research activity on the functional operating systems front has been rather quiet, possibly because the experiments reported above showed that... it does not always lead to greater elegance and clarity in the detailed coding of programs’ [JS89].

Logic programming approaches have fared little better. Concurrent Prolog was expressly designed for systems programming and provided dataflow style synchronization via nondeterministic guarded-commands as its basic control mechanism [Sha86a] [Sha87]. Concurrent Prolog served as the basis for KL1, the Kernel Language used to write the operating system for the Japanese Fifth Generation project [Fur92]. Regarding the efficiency of Concurrent Prolog implementation efforts, Shapiro reports:

‘... It was a deathblow to the implementability of Concurrent Prolog, at least for the time being, since it showed that implementing Concurrent Prolog efficiently is as hard, and probably harder than, implementing OR-parallel Prolog. As we all know, no one knows how to implement OR-parallel Prolog efficiently, as yet’ [Sha86a].

0.5.2 Implementation Concerns

It is not possible to ignore software engineering issues such as performance, size, and maintainability when evaluating system software. There are a number of means by which one can implement synchronous/reactive languages. Esterel, for example, has been implemented both as a compiler producing a finite state machine and as a compiler producing a set of routines called by a reaction dispatch work-loop interacting with the environment and then calling required reactions. A conventional thread-based implementation appears quite possible for languages such as RC, as RC's fundamental stop statement can be considered a coroutine call to a coroutine coordinator which evaluates conditions and issues a coroutine resume to the next ready coroutine.
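One way such a coroutine-based implementation could look in plain C is the classic switch-based ‘local continuation’ trick sketched below. This is only an illustration of the idea, not RC's actual implementation, and all names are invented. A coordinator calls hello() once per instant: the first call prints the first line and ‘stops’, and the next call resumes just past the STOP.

    #include <stdio.h>

    typedef struct { int resume; } rctx_t;

    #define BEGIN(c)  switch ((c)->resume) { case 0:
    #define STOP(c)   do { (c)->resume = __LINE__; return; \
                           case __LINE__:; } while (0)
    #define END(c)    } (c)->resume = 0

    void hello(rctx_t *c)                 /* compare Hello() in Figure 0.7 */
    {
        BEGIN(c);
        printf("Hello, world\n");
        STOP(c);                          /* wait for the next instant */
        printf("I repeat: hello, world\n");
        END(c);
    }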

When compiling a language such as Esterel into a single finite state machine, the number of resulting states is potentially exponential in the size of the Esterel program source, as the state machine must correctly support all legal orderings in which reactions can occur. These orderings correspond to all possible input sequences. Observations such as these motivated the Esterel compiler of Edwards, which resembles a traditional compiler [Edw94]. Source code is translated to assembler, and the resulting routines dispatched by a run-time reaction dispatch work-loop. This compiler is reported to have roughly linear compile time, output size, and output execution time.

Edwards attempted to compile a 1000 line Esterel program using both compiler approaches. The generated finite state machine contained over 200 states and was output as a C source file of over 230 Megabytes, which could not be compiled. This same 1000 line Esterel program was compiled by the work-loop compiler and produced 21 thousand lines of assembler and a 128 Kilobyte executable.

A 600 line Esterel program compiled to a state machine of 32 states. This was output as a 19 Megabyte C source file which, somewhat amazingly, compiled into a 12 Megabyte executable. The 600 line program, when compiled with the work-loop compiler, produced 13 thousand lines of assembler code resulting in a 96 Kilobyte executable.

These executables are exceptionally large by conventional system programming standards, especially considering that the 1000 line program consisted of the controller for a 4 button digital stopwatch. While not a trivial application, some device drivers are certainly more complex. Edwards notes that this is a ‘large’ Esterel application, and also notes that causality errors can be arbitrarily subtle but that it is impractical to have the compiler perform exact causality checking due to excessive compile times.

0.5.3 Programming Implications

None of the synchronous/reactive languages is general purpose. For instance, Esterel lacks data structures and its signals consist only of integer values. The languages presented here all require some form of guard evaluation, which requires either evaluation of global state between each instant or redundant code. Overhead, either in time or space, can become excessive.

These languages have been designed to provide primarily cooperative concurrency. None except NPL addresses the needs of systems which must deal with both cooperative and competitive concurrency. Esterel and RC explicitly preclude internal competitive concurrency because dynamic thread creation introduces a dynamic degree of concurrency that cannot be eliminated by the compiler and thus is not allowed. These languages have no mechanism whereby a single routine can be executed concurrently with itself within the same instant.

Debugging large state machines is a significant problem, especially when the state machine has been automatically generated. Halbwachs notes that ‘the correspondence between the source code and the generated code is far from being obvious. The slightest change in the Esterel program can involve a complete modification of the automaton’. This also illustrates why handcoding state machines for large problems is unreasonable.

Although the compiler can in principle detect indirect causality errors, in large systems it may not be obvious how to fix a causality error. Causality errors complicate program development and impact modularity, as the compiler needs to analyze all program source.

Programming RC's micro instants, which require explicit source code coordination, resembles conventional concurrent programming. Micro instants introduce ‘invisible gotos’ and their attendant programming difficulties. Additionally, Huizing and Gerth note that the semantics of such micro instants ‘turned out to be too subtle and non-deterministic to be of practical use’ [HG91].

The problem of distributed mental state is not solved by any of these languages. Understanding the interaction of cooperative code scattered throughout a program presents a serious cognitive challenge to program comprehension [LS86]. Programs written in a guarded command dataflow style can require considerable study before overall program behavior is deduced. Since every routine potentially executes every instant, and since control can be located in the midst of each routine, the programmer potentially must keep the entire program state in mind. For instance, consider the following quote from Halbwachs in describing a 36 line Esterel program:

‘Initially, the control is stopped at..., lines 3, 12, and 24... line 12 is interrupted and ... stopped by ... line 16... The new global state is... lines 3, 16, and 24... line 3 is interrupted. ... comes back to ... line 3. ... control is stopped at lines 3, 16, 28, and 32’ [Hal93].

It is thus difficult to program Esterel or RC without drawing timelines, mapping source statements to the timeline, and mentally executing code fragments. Without significant study, the source is insufficient to understand the program.

0.5.4 Summary

In summary, the advantages of the synchronous/reactive approach with respect to programming system software are:

- Programs become deterministic and deadlocks at run-time impossible. Internal cooperative concurrency and communication overheads are compiled away and inherently handled by one state machine.

- Atomic reactions with essentially basic block granularity, coupled with discrete time steps that change global state in ‘snapshot’ fashion, simplify concurrent programming.

- Because explicit register-set based context switch is not required, implemented systems can be extremely fast, indeed, potentially optimum with respect to time.

- Since explicit critical sections are eliminated, so are the associated maintenance, debugging, and design problems.

Disadvantages of synchronous/reactive programming with respect to system software are:

- Guarded-command dataflow-style programming is difficult.

- Distributed mental state makes programming-in-the-large difficult. Debugging large state machines is hard.

- There is little provision for competitive internal concurrency. In general, all the problems relating to competitive concurrency remain.

- The compilers produce code that is considerably larger than desirable for real systems, and thus the approach does not scale to large systems.

- If state machines are not produced, the overhead of the guard interpreter can become excessive.

The first 2 disadvantages relate directly to mental programming difficulty. The sample source code illustrates that basic psychological complexity and program comprehension effort appear similar to traditional concurrent programming approaches.

0.6 An Alternative Proposal

Is there a software architecture for conventional systems programming which retains the synchronous/reactive advantages but not the disadvantages identified in the previous section? The developers of Esterel have noted the similarity between the synchrony hypothesis and clocked digital circuits in which all ‘reactions’ take one clock cycle [BB91]. Indeed, it is natural to consider, as an alternative to dataflow-flavored approaches, a software architecture based on a sequential instruction model similar to that found in conventional hardware.

0.6.1 The Soft-Instruction Architecture

A conventional computer architecture advances a program counter each ‘instant’, thereby executing instruction sequences. The current instruction may alter the value of the program counter, thus transferring control to another location in the instruction stream. The processor keeps a large amount of local state which participates in instruction stream execution.

The software architecture in Figure 0.8 is called a soft-instruction architecture by analogy with the discrete atomic instructions implemented in conventional hardware. Programs are developed at 2 levels. A reactive level is concerned primarily with concurrency, high-level control flow, and the reactive logic of the program. This level consists of control constructs and soft-instructions. It is programmed as if a custom instruction set, one instruction per required type of reaction, is available. The control constructs can be more elaborate than those in hardware instruction sets and can resemble those of any high-level programming language. The reactive level provides for programming-in-the-large and exposes one view of the deep structure of the program [DK76].

A synchronous lower program level consists of soft-instruction implementations. Soft-instruction implementations perform the programming-in-the-small tasks required of the program. Soft-instructions are written in a traditional system programming language such as C. These routines are written in a stylized manner defined by each specific implementation of the soft-instruction architecture. With respect to the soft-instruction program, soft-instructions execute atomically, similar to the manner in which most normal hardware instructions execute atomically with respect to the CPU. As with hardware instruction implementation, soft-instruction duration must be bounded by the implementation.

In conjunction with the design of the two program levels, a data structure defining instruction stream state must be developed. This data structure is called a context structure and is an implicit argument processed by all soft-instructions. The actual context structure can be produced automatically by soft-instruction programming tools.
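A minimal sketch of such a context structure appears below. The cs_flink and cs_next fields follow the usage in Figures 0.9 and 0.10 later in this section; the long-term stack size and layout are assumptions for illustration (the text later notes that around 1 Kilobyte often suffices).

    /* Sketch only: one per concurrent soft-instruction stream. */
    typedef struct context {
        struct context *cs_flink;     /* forward link on the dispatcher   */
                                      /* ready queue                      */
        void          **cs_next;      /* high-level 'PC': next control    */
                                      /* table entry for this stream      */
        char            cs_lts[1024]; /* long-term stack: reactive-level  */
                                      /* control state and variables that */
                                      /* span soft-instruction executions */
    } CONTEXT;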

0.6.2 The Soft-Instruction

Figure 0.8 shows a single system component implemented using a soft-instruction architecture. The single system component in Figure 0.8 could be a driver, I/O subsystem, file system, network server, etc. This component could be either included in a larger system or running stand-alone. If implementing a conventional server within a conventional operating system, the entire system component shown in Figure 0.8 could be implemented internal to a single server process.

The soft-instruction program in Figure 0.8 is partitioned completely into soft-instructions, S1(), S2(), etc. Soft-instructions often correspond to basic blocks. All soft-instructions must be implemented so that they never block. Thus, as with hardware I/O instructions, all I/O requests internal to a soft-instruction implementation are asynchronous, with the I/O activation typically followed by a return from the soft-instruction. Again, the size of a given soft-instruction may be bounded by the desired latency of the system, as well as by I/O requirements.

[Figure 0.8 shows the dispatcher selecting the current context structure, via the current context structure pointer, from a set of context structures; each context structure holds a control-table 'PC' and a long-term stack. The reactive control table (the high-level program) sequences soft-instruction entries such as S1, S2, and S5, whose implementations S1(), S2(), ..., Sn() share the globals and a single short-term stack.]

Figure 0.8: Soft-Instruction Software Architecture

0.6.3 The Scheduling and Dispatch Loop

A reactive dispatch loop drives the reactive level. It selects a ready context structure using any operating system scheduling technique. The dispatcher in Figure 0.8 then obtains a reactive control table program counter (PC) from within the selected context structure. This high-level logical PC is used in conjunction with the control table to locate the soft-instruction implementation to execute. The dispatcher typically bumps the control table PC within the context structure to point to the next entry in the control table, thus pointing indirectly to the next soft-instruction to be executed on the context structure's behalf after completion of the current soft-instruction. The dispatcher invokes the soft-instruction with the context structure specified, in some fashion, as an argument to the soft-instruction.

The dispatcher can transfer control to a soft-instruction in many ways, including a direct or indirect jump or call. Whatever the mechanism, the dispatch loop invokes the routine implementing the next soft-instruction, and upon completion of the routine control returns to the dispatcher. Successive soft-instruction dispatches may advance the state of different context structures, thus providing featherweight internal concurrency.

The dispatcher is a small threaded-code dispatch loop that can easily be modified by the programmer to implement custom reactive policies. The overhead of this dispatcher can usually be reduced to a few instructions. In threaded-code terminology, such a loop is an inner interpreter or address interpreter. On some architectures such interpreters can be reduced to a single instruction, and as such they do not have the negative performance connotations of high-level outer interpreters [Kog82].

0.6.4 Short-term and Long-term Stacks

Each valid context structure provides an explicit context for an internally competitive concurrent soft-instruction stream. A single program run-time stack provides space for short-term variables with lifetimes lasting the duration of the current soft-instruction execution. This short-term stack usage is identical to that found in any modern language; for instance, it can be the normal C run-time stack.

The soft-instruction model does not have conventional per-thread run-time stacks. Rather, the 'per-thread' context structures contain fixed-size stacks used for both reactive-level control flow and for long-term variables with lifetimes spanning soft-instruction executions. Typical system components, such as file servers, often require only around 1 Kilobyte for such long-term variables and reactive-level control. The bounded 'per-thread' stack space in each context structure is thus usually quite small. This can be important when serving hundreds of concurrent requests.

Variables within the scope of a single soft-instruction implementation can thus be found on 2 stacks, one the standard run-time stack containing the soft-instruction's short-term variables, and the other containing long-term variables specified as arguments, perhaps implicitly, to the soft-instruction. Dynamically allocated global resources, such as I/O buffers, are accessed by long-term pointer variables.
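The split is easy to see in a small sketch. Here c is an ordinary C automatic living on the one shared short-term stack, while p and done are long-term variables resolved through the context structure; the struct layout inside cs_lts (the field assumed in the earlier context structure sketch) is illustrative only.

    typedef struct { char *p; int done; } example_lt;  /* long-term vars */

    void example_instruction( CONTEXT *cs )
    {
        example_lt *lt = (example_lt *)cs->cs_lts; /* assumed layout     */
        char c;                      /* short-term: ordinary C automatic */
                                     /* on the single run-time stack     */
        c = *lt->p;                  /* read through a long-term pointer */
        if( c == '\0' )
            lt->done = TRUE;         /* update state that outlives this  */
                                     /* soft-instruction execution       */
    }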

0.6.5 The Control Table

The control table represents the reactive program. This data structure can take many forms depending on the dispatcher design. However implemented, the control table determines the execution path of each context structure. A typical implementation consists of a table of addresses pointing to soft-instruction implementations. This is similar to a traditional sequence of machine instructions.

Any number of context structures can be actively traversing the control table, that is, in general the design of the control table is completely independent of the system's required degree of internal concurrency. Soft-instructions that define control constructs within the reactive table evaluate context state and may alter the value of the context structure's program counter. The implementation of such soft-instructions can be included directly within the dispatcher or implemented in the same manner as any other soft-instruction.

Control tables can be generated and analyzed by tools and utilities ranging from simple macros to complete compilers.
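As a concrete sketch of one such form, the reactive loop of the get_string example (Section 0.7) might compile to the address table below. Only the soft-instruction names come from the example; the control entries, and the convention that a branch target is encoded in an adjacent table slot, are assumptions about what such tools could emit.

    /* Sketch: the reactive program as a flat array of addresses,
       traversed via each context structure's cs_next 'PC'.
       branch_if_not_done and reactive_return are hypothetical control
       soft-instructions that rewrite cs->cs_next. */
    void *get_string_table[] = {
        (void *)init_buffer,
        (void *)post_read,            /* <-- loop top                     */
        (void *)await_io_completion,
        (void *)check_read,
        (void *)branch_if_not_done,   /* control: if !done, reset this    */
        (void *)&get_string_table[1], /* stream's cs_next to loop top     */
        (void *)reactive_return,      /* control: pop reactive-level call */
    };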

0.6.6 Concurrent Programming, I/O, and Context Switch

Soft-instructions are serially reusable. The dispatcher executes a single soft-instruction at a time, and each soft-instruction runs to completion, thus implicitly placing all the 'synchronous' code in the program within critical sections.

Each context structure concurrently traversing the table represents a unique soft-instruction stream executing the common reactive control table logic. When an activity of unknown duration, such as an asynchronous I/O request, is activated on behalf of a context structure, the context structure is blocked. This blocks the corresponding soft-instruction stream.

Context structures are blocked by eliminating the structure's eligibility to be selected by the dispatcher for execution. In Figure 0.8, the current context structure could be blocked by pointing the current context structure pointer at another ready context structure after removing the current context structure from the dispatcher's ready queue. Although such blocking and scheduling logic may be implemented in many ways, it is often useful to place an explicit concurrency instruction in the control table, for instance, a SUSPEND or AWAIT_IO_COMPLETION soft-instruction. Conversely, if a certain class of soft-instructions exists that all initiate I/O requests, each of these instructions can suspend instruction stream execution, as in the usual manner of hardware I/O instructions.
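A minimal sketch of such a control soft-instruction, assuming the block() support routine of Figure 0.10 and a hypothetical per-context completion flag cs_io_done, might be:

    void await_io_completion( CONTEXT *cs )
    {
        if( !cs->cs_io_done )  /* request still pending?                 */
            block();           /* leave the ready queue; the interrupt   */
                               /* routine re-readies this stream via the */
                               /* fork ring when the request completes   */
    }

A production version must also close the race with the interrupt routine that sets the flag; the fork-ring design described below is one way to do so.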

As with any soft-instruction, the soft-instruction that activates an I/O request or other asynchronous activity runs to completion and then returns to the dispatcher. Note that it is the soft-instruction stream described by the context structure, not the soft-instruction itself, that becomes blocked.

A blocked context structure is unblocked by some mechanism associated with completion of the concurrent asynchronous activity. This reactive event logic and its associated unblocking mechanism vary widely. For instance, logic to react to event completion can be located directly in the dispatch loop, handled by interrupt routines that interact with the scheduler, or performed by special soft-instruction streams that examine status locations associated with event completion. The specific mechanism will usually depend on the lower layer mechanisms which support external concurrency with respect to the system component. For example, a hardware or software interrupt routine could change the status of the context structure and put the context structure on a dispatcher ready queue.

typedef void instruction();             /* A soft-instruction is a void func. */
typedef instruction *ins_adr;           /* A pointer to a soft-instruction.   */

CONTEXT *cs;                            /* Current Context Structure pointer. */
ins_adr  si_adr;                        /* Current soft-instruction address.  */
cs_adr  *fork_input;                    /* 'producer' location in ring-buffer. */
cs_adr  *fork_output;                   /* 'consumer' location in ring-buffer. */
...
fork_output = fork_input = fork_ring;   /* Initialize fork ring. */
...
for(;;) {                                  /* Execute soft-instructions forever. */
    while( fork_input != fork_output ) {   /* If anything is in the     */
        unblock( *fork_output++ );         /* fork ring, put it on      */
        if( fork_output >= END_RING )      /* the ready list and        */
            fork_output = fork_ring;       /* wrap the fork ring, if    */
    }                                      /* needed.                   */

    cs = null_cs.cs_flink;                 /* Get the head of the       */
    si_adr = (ins_adr)(*cs->cs_next++);    /* ready list and fetch      */
    (*si_adr)( cs );                       /* the next soft-instruction */
}                                          /* address and execute it,   */
                                           /* bumping the table pointer.*/

Figure 0.9: A Simple Dispatcher

Conversely, if completion status can be checked with minimal overhead, it may be desirable for the dispatcher loop to directly check all required completion status at the end of every soft-instruction. Whatever the mechanism, when it is determined that the activity invoked by the blocked soft-instruction stream has completed, the corresponding context structure is placed in a state that will again result in the dispatcher executing soft-instructions on its behalf.

The range of possible reactive dispatch mechanisms, and the small amount of code required to implement alternatives, is one of the advantages of the soft-instruction architecture. The dispatcher in many ways resembles a single work-loop merging both the work-loop of a hardware microcode instruction interpreter and the inner work-loop of an operating system kernel. The dispatcher implementation is under complete control of the system programmer and provides a mechanism by which soft-instruction systems can readily be made compatible with particular architectures and environments into which a soft-instruction system is inserted.

Context switching between concurrent soft-instruction streams uses the same mechanism as that used to sequence through the soft-instructions in a single soft-instruction stream. Discounting the overhead of the scheduler, which typically only runs as the result of a significant I/O event, context switching is accomplished by low-overhead operations such as changing a single base register pointing to the current context structure and then performing a table-directed jump or call.

0.6.7 A Simple Dispatcher

Figure 0.9 shows a simple dispatcher implementation. This dispatcher code fragment is designed assuming hardware or software interrupt routines execute upon external asynchronous request completion. The simplest assumption, given this implementation, is that the interrupt routines are real hardware device interrupts. In this case, the 'driver' routines that manage the device are simply special soft-instructions.

A ring buffer called the 'fork ring' provides a producer-consumer style data structure used to communicate between the interrupt level and the soft-instruction level. The fork ring is simply an array of addresses. The fork ring contains addresses of blocked context structures that need to be unblocked because the external request on which they were waiting has completed. In this implementation, it is assumed that pointers are incremented atomically by hardware without any need for interrupt masking, which is true on many, but not all, hardware architectures.

/*-- Called by the dispatcher's work-loop. */
unblock( CONTEXT *cs ) {            /* Puts a ready        */
    CONTEXT *old_tail;              /* Context Structure   */
                                    /* on the dispatcher's */
    old_tail = null_cs.cs_tail;     /* ready queue.        */
    old_tail->cs_flink = cs;
    null_cs.cs_tail = cs;
    cs->cs_flink = NULL_CS;
}

/*-- Called from within a Control soft-instruction. */
block() {                           /* Block the current Context Structure */
    CONTEXT *cs;                    /* by removing it from the head        */
                                    /* of the dispatch queue.              */
    cs = null_cs.cs_flink;
    null_cs.cs_flink = cs->cs_flink;
    if( cs == null_cs.cs_tail )
        null_cs.cs_tail = NULL_CS;
}

/*-- Called by an interrupt routine to put the   */
/*   address of a Context Structure now ready to */
/*   proceed into the 'fork' ring.               */
add_context_to_ring( CONTEXT *cs ) {
    *fork_input++ = cs;
    if( fork_input == END_RING )
        fork_input = fork_ring;
}

Figure 0.10: Dispatcher Support Routines

The size of the fork ring in this implementation limits the degree of concurrency, although it can clearly be very large.

If the fork ring contains anything, the dispatch loop 'moves' each context structure from the fork ring to the dispatcher's ready queue by simply chasing the interrupt level's fork_input pointer with the fork_output pointer and calling unblock() for every address it encounters. The unblock() routine simply links the context structure onto the end of the ready list. This implementation assumes that a null context structure, null_cs, anchors the dispatcher ready list. Context structure element cs_flink is the forward link and pointer cs_tail in the null context block points to the end of the ready list. The cs_flink pointer of the last context structure in the list points back to the null context block.

When all newly ready context structures have been unblocked, this implementation simply selects the first context structure on the ready list as the soft-instruction stream to execute. Element cs_next contains this stream's high-level program counter which points to the stream's current location in the control table. The address of the soft-instruction implementation corresponding to the current control table location is obtained and stored in si_adr. The stream's program counter, cs_next, is incremented to point to the next table location. The soft-instruction implementation is then called with the address of the context structure itself passed as the single argument to the soft-instruction implementation.

Given this dispatch loop, and a typical interrupt routine design, the maximum penalty between activation of any 2 successive soft-instructions is determined by the number of external devices, each of which may correspond to a single interrupt routine execution, plus the time for the dispatch loop to call unblock() on each corresponding context structure. This is a fixed worst case overhead that will only occur if all devices have outstanding requests which complete in the course of the first soft-instruction execution or the dispatcher's unblock() loop. In most implementations, external interrupts will not be 'rearmed' on individual interrupting devices until specific soft-instructions in the stream handling that device perform I/O completion operations.
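Stated as a bound, in notation of our own rather than the report's: with D external devices, per-interrupt cost t_int, per-call unblock() cost t_unb, and maximum soft-instruction duration t_si, the worst-case gap between two successive soft-instruction activations is

    t_gap <= t_si + D * (t_int + t_unb)

since at most D interrupt routines can execute, and at most D context structures can need unblocking, before the next soft-instruction is fetched.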

Figure 0.10 shows this implementation's unblock() routine and 2 other support routines of interest. The block() routine is called internal to the implementation of a control soft-instruction, such as AWAIT_IO_COMPLETION. It simply removes the current context structure from the head of the ready list. When the soft-instruction containing this call returns to the dispatcher, the next iteration of the dispatcher's loop will no longer be able to advance the soft-instruction stream corresponding to the blocked context structure. When using hardware interrupts, the address of the blocked context structure is often stored within a control block corresponding to the device performing the pending operation. When using software interrupts or signals, arguments can usually be passed to the external service request. These values are then passed back by the external environment to the software interrupt or completion routine associated with the asynchronous request. In either case, the interrupt routine can trivially locate the context structure corresponding to the event completion. The interrupt routine then puts the address of this context structure into the fork ring using routine add_context_to_ring().
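For a hardware-interrupt environment, the completion path might look like the sketch below. Only add_context_to_ring() comes from Figure 0.10; the DEVICE control block, its field names, and the cs_io_done flag are assumptions for illustration.

    typedef struct {
        Status_Block *dev_sb;        /* status block of pending request  */
        int           dev_hw_status; /* completion status from hardware  */
        CONTEXT      *dev_waiter;    /* blocked stream's context struct  */
    } DEVICE;

    void device_interrupt( DEVICE *dev )
    {
        dev->dev_sb->ret_stat = dev->dev_hw_status; /* record completion */
        dev->dev_waiter->cs_io_done = TRUE;         /* assumed flag read */
                                                    /* by AWAIT_IO_...   */
        add_context_to_ring( dev->dev_waiter );     /* hand the context  */
                                                    /* structure to the  */
                                                    /* dispatcher via    */
                                                    /* the fork ring     */
    }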

0.7 An Example

The soft-instruction program get_string shown in Figures 0.11 and 0.12 is similar to the Esterel GET_STRING program in Figure 0.3. Figure 0.11 contains the high-level 'reactive' code and Figure 0.12 contains the low-level 'synchronous' C code. The equivalent of the Esterel program's hidden build call is included in the soft-instruction source. The soft-instruction get_string()() takes an additional argument specifying maximum buffer size and is also passed an I/O handle so it can be used concurrently by any number of instruction streams, that is, unlike the Esterel program the soft-instruction version supports an arbitrary degree of internal competitive concurrency within the component in which it is located.

In this example, keywords defined by the software architecture implementation are in upper case, while lower-case names denote items in the particular example program written using this architecture. Names pertaining to external items defined by the external system are in lower case with upper-case leading characters.

Routine get_string()() is declared REACTIVE. Its arguments and automatic variables will be located in context structure long-term stacks, not the conventional C run-time stack. The soft-instructions shown here, following Esterel, use 2 argument lists, the first for arguments that can be modified by the soft-instruction, often called in-out arguments, and the second for arguments that cannot be modified by the soft-instruction, often called in arguments. Both parameter lists are call-by-reference, providing a window of selective exposure between the reactive and synchronous levels that can be treated as a merged single level.
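The report does not specify how the two-list notation is compiled. One plausible expansion, consistent with the dispatcher passing the context structure as the single argument, is sketched here; the argument struct layout and its placement in the long-term stack are assumptions.

    /* Sketch: INSTRUCTION check_read( char *p, int done )( Status_Block sb )
       could be preprocessed into a plain C function taking only the
       context structure, with both argument lists resolved by reference
       into the stream's long-term stack. */
    typedef struct {
        char        *p;     /* in-out */
        int          done;  /* in-out */
        Status_Block sb;    /* in     */
    } check_read_args;

    void check_read( CONTEXT *cs )
    {
        check_read_args *a = (check_read_args *)cs->cs_lts; /* assumed */
        /* ...body uses a->p, a->done, and a->sb... */
    }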

A REACTIVE routine contains only declarations, soft-instructions, control-flow statements, and concurrency control statements. Limited expressions resulting in boolean values can be used in control-flow conditionals. Except for iteration initialization, assignment is not allowed within a REACTIVE routine.

In this example, soft-instruction init_buffer readies the buffer for I/O, after which a loop is entered in which soft-instruction post_read issues a single-character I/O request to read the next character into the buffer. After each asynchronous read request is issued, the AWAIT_IO_COMPLETION soft-instruction blocks the instruction stream if the I/O request has not yet completed. I/O request completion triggers a software interrupt routine which calls add_context_to_ring(), thus resuming the execution of the soft-instruction stream. Soft-instruction check_read then checks for I/O errors, maintains the buffer, and sets the done flag if needed. Routine get_string()() completes normally when either max_buf characters have been received, a newline is encountered, or a timeout occurs.

REACTIVE get_string( int done )
                   ( char *buf,
                     int io_handle,
                     int max_buf )
{
    char *p;
    int i;
    Status_Block sb;

    init_buffer( p, done )( buf );
    for(i=0;i<max_buf;i++) {
        post_read( p )( io_handle, sb );
        AWAIT_IO_COMPLETION;
        check_read( p, done )( sb );
        if( TRUE == done ) break;
    }
} ON_ERROR {
    get_string_io_err()( sb );
}

//-----------------------------------------
INSTRUCTION( get_string_io_err )()( Status_Block sb )
{
    if( sb.ret_stat == Timeout ) RETURN;
    if( sb.ret_stat == Ctrl_Break ) RESTART( get_string );
    ERROR_THROW( sb.ret_stat );
}

Figure 0.11: get_string – Reactive Level

The status variable in post_read is the only variable declared in the entire get_string()() program that uses any space on the C run-time stack. Long-term data structure Status_Block receives I/O completion status when primitive Read_Io completes. The Status_Block structure and Read_Io function are not part of the soft-instruction architecture but are primitives of the implementation in which get_string()() is included. These primitives are typical of an environment that supports asynchronous requests. I/O request completion updates the status block and readies the corresponding context structure. The Read_Io call specifies a timeout after which the I/O request completes with a timeout status.
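From the usage in Figure 0.12, these assumed environment primitives would have roughly the following shape; any field or parameter name beyond those appearing in the figures is a guess.

    /* Sketch of the assumed asynchronous I/O primitives; these belong to
       the surrounding environment, not to the soft-instruction
       architecture itself. */
    typedef struct {
        int ret_stat;   /* completion status: Timeout, Ctrl_Break, ...  */
    } Status_Block;

    /* Start an asynchronous read of len bytes into buf; on completion
       (or timeout) the environment fills in *sb and re-readies the
       blocked context structure. Returns an initiation status. */
    int Read_Io( int io_handle, char *buf, int len,
                 int timeout, Status_Block *sb );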

The ON_ERROR statement specifies a reactive error handler for get_string()(). The ON_ERROR block is effectively a reactive-level interrupt routine, that is, a trap handler. In Figure 0.11, this interrupt routine executes only 1 soft-instruction, get_string_io_err()(). Although in principle the logic included within the implementation of this soft-instruction could be included directly within the reactive-level interrupt handler, it is placed in the soft-instruction implementation to illustrate a custom soft-instruction that alters context structure location within the soft-instruction stream. Using the machine instruction analogy, this soft-instruction corresponds to the implementation of a privileged machine instruction, for instance, a return-from-interrupt or trap instruction.

The get_string_io_err()() soft-instruction source is included with the source for the REACTIVE get_string()() routine because an error handler can potentially be invoked at any point in get_string()() execution.

//--------------------------------------
INSTRUCTION init_buffer( char *p,
                         int done )
                       ( char *buf )
{
    done = FALSE;
    p = buf;
    *p = '\0';
}

//--------------------------------------
INSTRUCTION post_read( char *p )
                     ( int io_handle,
                       Status_Block sb )
{
    int status;

    status = Read_Io( io_handle, p, 1, Timeout, &sb );
    if( !status ) ERROR_THROW( status );
}

//--------------------------------------
INSTRUCTION check_read( char *p,
                        int done )
                      ( Status_Block sb )
{
    if( !sb.ret_stat ) ERROR_THROW( sb.ret_stat );
    done = FALSE;
    if( '\n' == *p ) {
        done = TRUE;
        *p = '\0';
    } else {
        p++;
    }
}

Figure 0.12: get_string – Synchronous Soft-Instruction Level

Placing the source in this location emphasizes the unique relationship of the trap handler and its custom soft-instructions to the code in which it is enabled. The ERROR_THROW statement propagates an error to the next highest ON_ERROR handler. The RESTART statement in Figure 0.11 causes the instruction stream to restart execution of the get_string()() function. The RETURN results in an immediate return from get_string()(). In this case control transfers back to the reactive routine which called get_string()(). Reactive routines can call other reactive routines, thus providing modularity within the control table, that is, at the reactive level. No such call is shown here.

Finally, soft-instructions from many instruction streams may be executing in interleaved fashion, thus providing an arbitrary degree of internal competitive concurrency. By design, this concurrency is not visible in the get_string()() source.

0.8 Soft-Instruction Advantages

The soft-instruction architecture illustrated in the example implementation has the following advantages:

- Separation of Concerns and Programming-in-the-Large. The soft-instruction architecture is based on a separation of concerns between the reactive high-level and the synchronous instruction low-level. Concurrent programming, overall organization, and gross control flow are treated as high-level programming-in-the-large that reflects the deep structure of the overall program. Low-level issues such as data structure maintenance are programming-in-the-small issues cleanly encapsulated within synchronous soft-instructions. Logical concurrency results from interleaving at the soft-instruction level of granularity, rather than explicit P and V semaphore coding.

  A reactive subset of C is used for programming-in-the-large. The reactive level is programmed in familiar sequential procedure-oriented fashion, rather than using a nonprocedural or non-sequential approach, such as results from using guarded commands or path expressions [And79] [AS83]. Programming-in-the-small uses a minor variant of C, an ordinary systems programming language.

- Single Locus Concurrency. Concurrency is explicitly visible when reading the source 'from the top down'. Concurrency is visible at one program level, the reactive level, and can be understood in a reading confined to the source of all the reactive routines. There is no 'hidden' special-case concurrency lurking at the bottom of long call chains initiated by arbitrary C routines. Rather, it is clear that interleaving can occur between every soft-instruction.

  Soft-instruction program source is 'structured' with respect to concurrency. Control flow obscured by gotos is considered problematic. If so, hidden concurrency in many conventional multithreading implementations is worse, because nothing about the potential concurrency event is necessarily visible at the point of invocation in the program source and any subroutine call can result in numerous concurrency events.

  There is minimal low-level 'bookkeeping' code in the high-level reactive code. This single locus of concurrency information makes concurrency explicit in the entire architecture of the program, not just an implicit side effect of some system call. Thus, it may be fair to say there is no invisible 'spaghetti concurrency'. To the systems programmer, this means that all code does not need to be read 'bottom-up', searching for 'buried' concurrency constructs.

  Critical sections protecting main memory data structures have been subsumed by soft-instructions. As with monitors, external device or request serialization is typically performed by blocking context structures on a wait queue from which they are later unblocked. For example, AWAIT_IO_COMPLETION could simply link the context structure onto the correct device waiting list.

  However, it is not the case that all critical section and semaphore-based coding has been irrevocably banished. The programmer can easily construct any desired concurrent programming soft-instructions, and can thus program at the reactive level with concurrent program constructs tailored to the internal concurrency requirements of the system component. For instance, custom soft-instructions P()(resource) and V()(resource) can readily be implemented (a sketch follows this list). The effect of a critical section in the control table defined by these soft-instructions would be to cause a sequence of soft-instructions to be treated as atomic with respect to some logical resource. Even in this case, all concurrency remains visible at the reactive control table level. The reactive level in a real system is usually considerably smaller than the synchronous level. In practice, the use of 2 layers seems to significantly reduce both the demand for explicit critical sections and the amount of code that must be studied to understand the concurrency aspects of the program. One finds oneself studying a dozen pages of code instead of hundreds. Quantifying the reduction in explicit concurrent programming and the relative sizes of the reactive and synchronous levels requires additional study.

- No Concurrent C Programming. No explicit concurrent programming is possible in the compiled C code, that is, in the synchronous soft-instruction implementations. Potential concurrent programming bugs in the low-level C code are thus eliminated. Deadlock requires mutual exclusion, wait-and-hold, no preemption, and circular wait. Soft-instructions prevent deadlock from occurring since there is no wait-and-hold (strong) and there is preemption (weak) in that any correctly written soft-instruction is required to terminate.

  The synchronous C code is more understandable without embedded concurrency primitives. When writing C code for soft-instructions one can concentrate on data structure bookkeeping. Because each soft-instruction executes to completion, globals (such as hardware registers) cannot be updated nondeterministically during the same 'instant', that is, one can think of the soft-instructions as executing atomically at discrete points in time during which 'invisible' events cannot happen.

- Context-switch. Context switch is featherweight, with no copying of register sets required. Creation of a concurrent service thread is also very low overhead. Context switch overhead is less than for a system using lightweight concurrency, and significantly less than the heavyweight overhead found in a general purpose operating system. Since soft-instructions run to completion, context does not need to be saved upon featherweight context switch because context is saved 'on-the-fly' in the current context structure.

- Stack Space. An entire soft-instruction program, no matter what the degree of internal concurrency, requires only 1 conventional run-time stack. Short-term variables all reuse this same short-term stack, while long-term variables use the bounded-size stacks embedded in context structures. If soft-instruction implementations never nest, that is, never call another soft-instruction implementation or recursively call themselves, maximum short-term run-time stack depth is simply the deepest conventional stack usage of any single soft-instruction implementation.

- Reusability. Because soft-instructions have arguments, soft-instructions are general routines that can be reused. They are not statically bound to specific transaction or object block elements. Thus, as with a number of threaded environments, application development results in the evolution of a special application-oriented 'vocabulary' at the reactive level.
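The P()() and V()() soft-instructions mentioned above might be sketched as follows. The RESOURCE structure, its fields, the resource_arg() operand fetch, and the enqueue()/dequeue() helpers are all assumptions; block() and unblock() are the Figure 0.10 support routines.

    void P_instruction( CONTEXT *cs )
    {
        RESOURCE *r = resource_arg( cs );  /* hypothetical: fetch the    */
                                           /* resource operand from the  */
                                           /* stream's long-term args    */
        if( --r->r_count < 0 ) {           /* resource busy: queue this  */
            enqueue( &r->r_waiters, cs );  /* stream on the resource's   */
            block();                       /* wait queue and leave the   */
        }                                  /* dispatcher ready list      */
    }

    void V_instruction( CONTEXT *cs )
    {
        RESOURCE *r = resource_arg( cs );
        if( ++r->r_count <= 0 )            /* a stream is waiting:       */
            unblock( dequeue( &r->r_waiters ) ); /* re-ready the oldest  */
    }

Because soft-instructions run to completion and only dispatcher-level code touches r_count, no interrupt masking or atomic read-modify-write is needed; this is the sense in which critical sections are subsumed by the architecture.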

This soft-instruction architecture provides practical advantages to the practicing systems programmer. The soft-instruction architecture does not preclude being used in conjunction with other conventional techniques and can be used within components of existing systems. Because the reactive dispatcher is small enough to be routinely customized, the architecture leaves the system programmer in control at all levels, unlike approaches which preempt the programmer's design prerogatives [SW92].

The synchronicity model adopted by soft-instructions is weaker than the strong synchronicity model assumed by synchronous/reactive languages such as Esterel, but it is intuitive to systems programmers familiar with hardware instruction sets. Since systems programmers are responsible for 'disguising' the low-level machine hardware, they must be intimately familiar with low-level hardware instruction set details. Pragmatic advantages result from using familiar cognitive models at multiple programming levels.

0.9 Related Work

Some historical and current work of interest is briefly described in this section. Variations of the soft-instruction architecture are perhaps among the oldest architectures used for concurrent real-time programming. The generic features of the abstract architecture, however, do not appear widely appreciated. Soft-instruction techniques have not been organized, analyzed, and presented so that the features and advantages of the generic software architecture are clear. General purpose toolsets are not available supporting concurrent systems programming using soft-instructions.

0.9.1 Historical Work

Systems implemented using techniques similar to those described here have usually suffered from low-level ad-hoc connotations often confounded with low-level assembler implementation concerns. Real-time cyclic executives, for example, have been implemented essentially using soft-instruction techniques. Each such system has developed its own custom reactive language, table compiler/assembler, and means for the synchronous instructions to interact with the reactive layer.

The advantages of 'traditional' soft-instruction flavored cyclic executives, as used in real-time avionics, have previously been enumerated and contrasted with conventional multithreading techniques using preemptive scheduling of lightweight threads [Mac80] [Gla83] [Sha86b] [BS86]. MacLaren, in discussing one such system, notes that 'the efficiency of a cyclic executive derives from its minimal scheduling property, and from the very small implementation cost' [Mac80].

Shaw classifies real-time software as either based on concurrent interacting processes, or based on table-driven soft-instruction style approaches, which he terms slice-based following that usage in BBN's Pluribus IMP Arpanet communication processor [OCK+75] [Sha86b]. Shaw notes that soft-instruction equivalents have been called slices, chunks, and strips. He contrasts the two real-time software architectures, and calls for research in including time as a first class programming object. Current real-time and concurrent programming research tends to emphasize replacing, rather than modernizing, soft-instruction related approaches.

Baker and Scallon describe a family of systems implemented at Boeing using a soft-instruction architecture [BS86]. They note that its lineage includes the executives of the 1965 Apollo Range Instrumentation Ship and the 1970 Safeguard ballistic missile defense effort. They point out that a soft-instruction architecture provides a high-level virtual machine language, reduces the need for explicit synchronization, and provides the benefits of splitting the system into 2 levels, one intended for high-level programming, and one for low-level programming. They describe a member of this family as follows:

'Rex's software architecture is characterized by its view of the executive as an independently programmable machine that executes application procedures written in conventional programming languages as if they were individual instructions of a higher level program.

... Programming in the large is concerned with producing a program in the machine language of the executive, while programming in the small is concerned with extending its instruction set.

... Application procedures are coded in a conventional programming language and compiled into machine code of a physical machine. A plan for the management of these procedures' executions to form a system is expressed in a separate system-specification language, and is separately translated into tables (agendas) used by the Rex virtual machine. These tables are in effect a high-level machine code interpreted by the executive' [BS86].

They claim the resulting system provides a virtual machine for the system specification, rather than simply a virtual machine on which the application program executes. Baker and Scallon trace the origins of these systems at least back to the AgPrep system developed by DBA Systems for the Apollo range ship [Joh70]. This system apparently functioned as a special linker that processed tables of entry points and scheduling requirements called agendas.

The soft-instruction approach has many similarities to the threaded-code architectures used by some Microsoft applications and languages such as Forth and UCSD Pascal [Kog82]. These systems are usually single threaded, that is, have a single context block, and thus concurrent programming is not integral to the basic threading system. Threaded-code is often used as an implementation technique in memory constrained environments. The size of the compiled control table, and thus the resulting executable, depends on the control table data structure design. Well designed control tables can take considerably less space than the equivalent assembler code. Thus, compilers generating threaded-code were originally introduced to implement high-level languages, such as Fortran, on small address-space minicomputers where program size was critical [Bel73] [Bre78]. Besides not directly supporting concurrent programming, most existing threaded-code architectures implement a rather fixed set of low-level primitives, with the bulk of the program occurring at the control table level.

Allworth provides a short description of real-time threaded-code soft-instruction architectures with only a single high-level program counter [All81]. He notes the flexibility and space savings provided by what is now called token-threaded code and simply calls the dispatcher an 'interpreter'. Token-threaded code compresses control tables by using indices smaller than the machine address size to access an intermediate table. Allworth notes the analogy with machine hardware instructions:

'A compiler translates... a high-level-language program into an equivalent machine code program. ... hardware reads an instruction, interprets the instruction to mean that it must carry out a certain action, then executes that action. A piece of software that acts in this way, reading, interpreting and executing a sequence of coded instructions, is called an interpreter.

... The action code address table contains the value of the start location of the program code that implements each possible action... Each action is given an instruction code... The instruction pointer indicates which code is to be interpreted next. On each cycle of the interpreter it is incremented to point to the next code in sequence. Non-sequential jumps within the interpreted code can be implemented by allowing action programs to manipulate the instruction pointer' [All81].
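In code, the scheme Allworth describes amounts to something like the sketch below; the one-byte token width and the table size are our assumptions.

    /* Sketch: token-threaded dispatch. Small instruction tokens index an
       action-code address table, compressing the control table relative
       to a table of full machine addresses. */
    typedef void action( void );
    action *action_table[256];   /* action code address table            */
    unsigned char *ip;           /* instruction pointer over the tokens  */

    void interpreter( void )
    {
        for(;;)
            (*action_table[ *ip++ ])();  /* fetch token, run its action; */
                                         /* actions may move ip to jump  */
    }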

The view of non-concurrent programs as abstract layers of virtual instructions is an old one. For instance, Dijkstra writes:

'I want to view the main program as executed by its own, dedicated machine, equipped with the adequate instruction repertoire operating on the adequate variables and sequenced under control of its own instruction counter, in order that my main program would solve my problem if I had such a machine. I want to view it that way, because it stresses the fact that the correctness of the main program can be discussed and established regardless of the availability of this (probably still virtual) machine....

... this ideal machine will turn out not to exist, so our next task – structurally similar to the original one – is to program the simulation of the 'upper' machine. ... we have to decide upon data structures to provide for the state space of the upper machine; furthermore we have to make a bunch of algorithms, each of them providing an implementation of an instruction assumed for the order code of the upper machine. Finally, the 'lower' machine may have a set of private variables, introduced for its own benefit and completely outside the scope of the upper machine... until finally we have a program that can be executed by our hardware' [DDH72].

Perhaps because Dijkstra is explaining layered architecture and step-wise design in a book on structured programming, he does not propose literally implementing such a multiple-level instruction interpreter, but rather is motivating an abstract model of structured programming languages.

Early operating systems often treated software implemented instructions similar to hardware instructions with respect to sequencing and concurrency. Operating system services were considered simply special assembler instructions implemented in software instead of hardware, that is, instructions in the program instruction stream to be executed interpretively by the operating system. Purser and Jennings summarize this viewpoint:

'The basic instruction code of the computer is frequently supplemented with virtual instructions (VIs). These are subroutines which are available for performing certain critical operations: they are coded as such for the usual reason of writing a subroutine (saving repetition of code) but also, more importantly, to incorporate them into the executive. VIs perform operations on executive and other data on behalf of processes, with the result that processes do not have to operate on such data (including their own PCBs) directly. Many VIs can be constructed...

... In general, therefore, a VI is not entered in parallel and hence is non-reentrant' [PJ75].

Mixed hardware/software instruction stream interpretation, with custom user-written routines extending the machine instruction set, is an old idea [Bra82]. Such approaches are often seen today on RISC architectures; for instance, PALcode on the DEC Alphas provides a mechanism for operating system programmers to develop privileged 'hardware' instructions specific to the support of their system. Similarly, in many CISC processor families, software instruction execution has been used to provide compatibility between low-end and high-end members of a processor family.

0.9.2 Current Work

Huizing and Gerth have proposed a 2 level semantics with separate modularity and causality levels that is suggestive of the 2 level soft-instruction architecture. In their semantics 'global' time is more abstract than 'local' time. This proposal is of special interest as it is intended to overcome problems in the semantics of Esterel [HG91].

Discrete-event simulations often use architectures similar to soft-instructions but without direct real-time application. These environments are usually not intended for development of large system programs. An example of such a programming environment is Reactive-C, not to be confused with the RC described earlier in this paper [Su90].

There have been efforts in the debugging community to analyze sequential programs and determine information similar to that needed to automatically generate soft-instructions [Wei82]. These program slicing algorithms are currently impractical for real programs [GL91] [Hu93].

In parallel and distributed programming research, coordination frameworks support subroutine-level parallelism [Pan93]. Examples of such systems are Parallel Virtual Machine (PVM), STRAND88, Program Composition Notation (PCN), and Express. These systems have many soft-instruction characteristics:

'The programmer codes subroutines in some standard programming language... the tool automatically generates a source code 'wrapper' for each subroutine, as well as a driver... The coordination language is not really 'compiled'; rather, it is transformed into calls to runtime libraries in a preprocessing step before compilation' [Pan93].

These systems are intended for writing distributed applications on top of existing operating systems. The level of granularity of these systems is often not appropriate for systems programming.

A large amount of work has been performed on multithreaded runtimes and parallel programming environments for massively parallel machines. A system of particular interest is Cilk, a multithreading parallel programming C run-time influenced by dataflow research [BJK+95]. It is used to program parallel MIMD machines for a particular class of computationally intensive distributed computations. Cilk provides high-performance cooperative concurrency in a distributed environment.

As with soft-instruction architectures, individual Cilk routines are atomic units of computation which always run to completion. Cilk calls such routines threads. All Cilk threads participating in a given computation on a single machine share the same stack locations and return to a common scheduling and dispatching loop. No lightweight context switching is required as context is explicitly saved 'on-the-fly' in data structures called closures. A unique closure corresponds to each thread. Closures resemble custom context structures created dynamically before each thread's activation and deleted upon thread completion.

A Cilk thread's argument list effectively can specify guards corresponding to locations within its closure. When all arguments become valid, the guard fires, closure arguments are copied to automatic variables, and the thread executes, processing the 'arguments'. Cilk threads cannot return values to their parents. Rather, the programmer uses explicit calls to place data into other closures using descriptors indicating specific closure locations. Cilk calls such descriptors continuations. The parent thread passes its children all needed continuations.

This programming model leads to an explicit continuation style of programming in which a parent thread never waits for a child, but rather explicitly spawns a successor thread which blocks until all its arguments are provided by the children of the first thread. Programming in this style raises programming-in-the-large issues and can result in a 'chained' style of programming with a sense of high-level control that can be characterized by 'next do' and 'after goto' control logic.

The applicability of this model is described by its implementors as follows:

'Although Cilk offers performance guarantees, its current capabilities are limited, and programmers find its explicit continuation passing style to be onerous. Cilk is good at expressing and executing dynamic, synchronous, tree-like, MIMD computations, but it is not yet ideal for more traditional parallel applications that can be programmed effectively in, for example, a message-passing, data-parallel, or single-threaded shared-memory style' [BJK+95].

Cilk is of interest as its developers note and study the performance advantages resulting from various soft-instruction related techniques, for instance, the use of atomic routines, a single scheduler/dispatch loop, and featherweight context switch provided by a thread model based on a linguistic abstraction. Though studied in a different context, many of the motivations for such techniques are equally applicable to concurrent systems programming.

Also of direct relevance are the observations noting that, although well suited for the class of computations for which it is intended, the programming style is questionable for general purpose programming-in-the-large. Such concerns are similar to those motivating the explicit reactive level of the soft-instruction architecture.

0.10 Conclusions

The soft-instruction software architecture has been described. This architecture supports implementation of general purpose concurrent system software. Soft-instructions provide many of the benefits of the synchronous/reactive languages that assume the strong synchrony hypothesis, while providing program source that is easier to read, comprehend, and maintain. In addition, the soft-instruction architecture can coexist with existing systems, can flexibly adapt to many environments, and has many similarities to the hardware instruction set architectures with which systems programmers are very familiar.

References

[ACJ+95] Mark A. Ardis, John A. Chaves, Lalita Jategaonkar Jagadeesan, Peter Mataga, Carlos Puchol, Mark G. Staskauskas, and James Von Olnhausen. A framework for evaluating specification methods for reactive systems. In Proceedings of the 17th International Conference on Software Engineering, pages 159–168, April 1995.

[ALBL91] Thomas E. Anderson, Henry M. Levy, Brian N. Bershad, and Edward D. Lazowska. The interaction of architecture and operating system design. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 108–120, April 1991.

[All81] Steve T. Allworth. Introduction to Real-Time Software Design. Springer-Verlag, 1981.

[And79] S. Andler. Predicate path expressions. In Sixth Annual ACM Symposium on Principles of Programming Languages, pages 226–236, 1979.

[AS83] G. R. Andrews and F. B. Schneider. Concepts and notations for concurrent programming. ACM Computing Surveys, 15(1):3–43, March 1983.

[Atw76] J. William Atwood. Concurrency in operating systems. IEEE Computer, 9(10):18–26, October 1976.

[BB91] Albert Benveniste and Gerard Berry. The synchronous approach to reactive and real-time systems. Proceedings of the IEEE, 79(9):1270–1282, September 1991.

[BC84] Gerard Berry and Laurent Cosserat. The Esterel synchronous programming language and its mathematical semantics. In G. Goos and J. Hartmanis, editors, Lecture Notes in Computer Science, 197, Seminar on Concurrency, pages 389–448. Springer-Verlag, July 1984.

[BdS91] Frederic Boussinot and Robert de Simone. The Esterel language. Proceedings of the IEEE, 79(9):1293–1304, September 1991.

[Bel73] James R. Bell. Threaded code. Communications of the ACM, 16(6):370–372, June 1973.

[BG92] Gerard Berry and Georges Gonthier. The Esterel synchronous programming language: Design, semantics, implementation. Science of Computer Programming, 19(2):87–152, November 1992.

[BGJ91] Albert Benveniste, Paul Le Guernic, and Christian Jacquemot. Synchronous programming with events and relations: the Signal language and its semantics. Science of Computer Programming, 16:103–149, 1991.

[BJ87] K. Birman and T. Joseph. Exploiting virtual synchrony in distributed systems. In Proceedings of the Eleventh ACM Symposium on Operating System Principles, pages 123–138, November 1987.

[BJK+95] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 207–216, July 1995.

[Bou91] Frederic Boussinot. Reactive C: An extension of C to program reactive systems. Software – Practice and Experience, 21(4):401–428, April 1991.

[Bou92] Frederic Boussinot. RC reference manual. Technical Report CMA 92–16, Ecole Nationale Superieure des Mines, 1992.

[Bra82] James Brakefield. Just what is an opcode? or a universal computer design. Computer Architecture News, 10(4):31–34, June 1982.

[Bre78] Ronald F. Brender. Turning cousins into sisters: An example of software smoothing of hardware differences. In C. Gordon Bell, J. Craig Mudge, and John E. McNamara, editors, Computer Engineering: A DEC View of Hardware Systems Design, pages 365–378. Digital Press, 1978.

[BS86] Theodore P. Baker and Gregory M. Scallon. An architecture for real-time software systems. IEEE Software, 2(5):50–58, May 1986.

[Bud95] Reinhard Budde. Esterel. In Formal Development of Reactive Systems: Case Study Production Cell, pages 75–100, 1995.

[DDH72] O. J. Dahl, Edsger W. Dijkstra, and C. A. R. Hoare. Structured Programming. Academic Press, 1972.

[Dij75] Edsger W. Dijkstra. Guarded commands, nondeterminacy and formal derivation of programs. Communications of the ACM, 18(8):453–457, August 1975.

[DK76] Frank DeRemer and Hans H. Kron. Programming in the large versus programming in the small. IEEE Transactions on Software Engineering, 2(2):80–86, June 1976.

[Edw94] Stephen Edwards. An Esterel compiler for a synchronous/reactive development system. Technical Report ERL 94–43, University of California, Berkeley, June 1994.

[FP88] Stuart R. Faulk and David L. Parnas. On synchronization in hard-real-time systems. Communications of the ACM, 31(3):274–287, March 1988.

[Fur92] Koichi Furukawa. Logic programming as the integrator of the fifth generation computer systems project. Communications of the ACM, 35(3):82–92, March 1992.

[Gar95] David Garlan. First international workshop on architectures for software systems: Workshop summary. ACM Software Engineering Notes, 20(3):84–89, July 1995.

[GL91] Keith Brian Gallagher and James R. Lyle. Using program slicing in software maintenance. IEEE Transactions on Software Engineering, 17(8):751–760, August 1991.

[Gla83] Robert L. Glass. Real-Time Software. Prentice-Hall, 1983.

[GP95] David Garlan and Dewayne E. Perry. Introduction to the special issue on software architecture. IEEE Transactions on Software Engineering, 21(4):269–274, April 1995.

[GTP95] David Garlan, Walter Tichy, and Frances Paulisch. Summary of the Dagstuhl workshop on software architecture. ACM Software Engineering Notes, 20(3):63–83, July 1995.

[Hal77] Horst Halling. Steps towards the implementation of a parallel code executor. In Proceedings of the IFAC/IFIP Workshop, pages 55–63, June 1977.

[Hal93] Nicolas Halbwachs. Synchronous Programming of Reactive Systems. Kluwer Academic Publishers, 1993.

[Her90] Maurice Herlihy. A methodology for implementing highly concurrent data structures. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP), March 1990.

[Her91] Maurice Herlihy. Wait-free synchronization. Communications of the ACM, 11(1):124–149, January 1991.

[HG91] C. Huizing and R. Gerth. Semantics of reactive systems in abstract time. In Real-Time: Theory in Practice, pages 291–314, June 1991.

[HP85] D. Harel and A. Pnueli. On the development of reactive systems. Logics and Models of Concurrent Systems, NATO ASI Series, 13:477–498, 1985.

[Hu93] Shibin Hu. Automatic Generation of Language-Based Program Slicer. PhD thesis, Wayne State University, 1993.

[Joh70] Frederick C. Johnson. Real-time data processing and orbit determination on the Apollo tracking ships, NASA-CR-111576. In AGARD Conference Proceedings No. 68 on the Application of Digital Computers to Guidance and Control, AGARD-CP68-70, pages 22–32. Harford House, June 1970.

[JS89] S. B. Jones and A. F. Sinclair. Functional programming and operating systems. The Computer Journal, 32(2):162–174, 1989.

[KKW94] Andrew J. Kozubal, Debora M. Kerstiens, and Rozelle M. Wright. Experience with the State Notation Language and run-time sequencer. Nuclear Instruments and Methods in Physics Research A, 352(1,2):411–414, 1994.

[Kog82] Peter M. Kogge. An architectural trail to threaded-code systems. IEEE Computer, 15(3):22–32, March 1982.

[Kop91] H. Kopetz. Event-triggered versus time-triggered real-time systems. In A. Karshmer and J. Nehmer, editors, Operating Systems of the 90s and Beyond, pages 87–101. Springer-Verlag, July 1991.

[Koz93] Andy Kozubal. State Notation Language and Run-time Sequencer Users Guide. Los Alamos National Laboratory, September 1993.

[KVK91] Ilkka Kuuluvainen, Mika Vanttinen, and Perttu Koskinen. The action-state diagram: A compact finite state machine representation for user interfaces and small embedded reactive systems. IEEE Transactions on Consumer Electronics, 37(3):651–658, August 1991.

[Lam94] Leslie Lamport. Processes are in the eye of the beholder. Technical Report 132, Digital Systems Research Center, December 1994.

[Laz91] Edward D. Lazowska. Operating system support for high-performance architectures. In A. Karshmer and J. Nehmer, editors, Operating Systems of the 90s and Beyond, pages 40–43. Springer-Verlag, July 1991.

[LL95] Claus Lewerentz and Thomas Linder. Formal Development of Reactive Systems: Case Study Production Cell. Springer-Verlag, 1995.

[LS86] Stanley Letovsky and Elliot Soloway. Delocalized plans and program comprehension. IEEE Software, 3(3):41–49, May 1986.

[Mac80] Lee MacLaren. Evolving toward ADA in real-time systems. In Proceedings of the ACM SIGPLAN Symposium on the ADA Language, Boston, December 1980.

[Mar90] R. Jordan Martin. Transaction Processing Facility: A Guide for Application Programmers. Yourdon Press, 1990.

[MB91] J.C. Mogul and A. Borg. The effect of context switches on cache performance. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 75–84, 1991.

[Mil93] Robin Milner. Turing award lecture: Elements of interaction. Communications of the ACM, 36(1):78–89, January 1993.

[MK93] K. R. Mayes and J. A. Keane. Levels of atomic action in the Flagship parallel system. Concurrency: Practice and Experience, 5(3):193–212, 1993.

[MS90] Gary J. Murakami and Ravi Sethi. Terminal call processing in Esterel. ATT, January 1990.

[MW91] Keith Marzullo and Mark D. Wood. Tools for constructing distributed reactive systems. Technical Report TR 91-1193, Cornell, February 1991.

[Neh91] Jurgen Nehmer. The immortality of operating systems, or is research in operating systems still justified? In A. Karshmer and J. Nehmer, editors, Operating Systems of the 90s and Beyond, pages 77–83. Springer-Verlag, July 1991.

[OCK+75] S.M. Ornstein, W.R. Crowther, M.F. Kraley, R.D. Bressler, A. Michel, and F.E. Hart. Pluribus – a reliable multiprocessor. In Proceedings of the AFIPS 1975 Conference, pages 551–559, 1975.

[Ous89] John Ousterhout. Why aren't operating systems getting faster as fast as hardware? Technical Report TN-11, Digital Western Research Laboratory, October 1989.

[Pan93] Cherri M. Pancake. Multithreaded languages for scientific and technical computing. Proceedings of the IEEE, 81(2):288–304, February 1993.

[PJ75] W. F. C. Purser and D. M. Jennings. The design of a real-time operating system for a minicomputer. Software – Practice and Experience, 5:147–167, 1975.

[Rep95] John H. Reppy. First-class synchronous operations. In Proceedings of the First International Workshop on Theory and Practice of Parallel Programming, pages 235–252, February 1995.

[Sch86] Fred B. Schneider. Concepts for concurrent programming. In Current Trends in Concurrency, pages 669–236. Springer-Verlag, 1986.

[Sha86a] Ehud Shapiro. Concurrent Prolog: A progress report. IEEE Computer, 19(8):44–58, August 1986.

[Sha86b] Alan C. Shaw. Software clocks, concurrent programming, and slice-based scheduling. In Proceedings of the 1986 Real-Time Systems Symposium, pages 14–18, December 1986.

[Sha87] Ehud Shapiro. Systems programming in Concurrent Prolog. In Concurrent Prolog, volume 2, pages 6–27, 1987.

[Su90] Wen-King Su. Reactive-Process Programming and Distributed Discrete-Event Simulation. PhD thesis, California Institute of Technology, October 1990.

[SW92] Mary Shaw and William A. Wulf. Tyrannical languages still preempt system design. In Proceedings 1992 International Conference on Computer Languages, pages 200–211. IEEE Computer Society Press, April 1992.

[Wat90] David A. Watt. Programming Language Concepts and Paradigms. Prentice Hall, 1990.

[Wei82] Mark Weiser. Programmers use slices when debugging. Communications of the ACM, pages 446–452, July 1982.

