
DIGITAL SYSTEMS LABORATORY

STANFORD ELECTRONICS LABORATORIES

DEPARTMENT OF ELECTRICAL ENGINEERING

STANFORD UNIVERSITY - STANFORD, CA 94305

SU-SEL 77-030

INTERPRETIVE MACHINES

by

John K. Iliffe

June 1977

Technical Report No. 149

The work described herein was supported in part by the Joint Services Electronics Program under Contract No. N00014-75-0601. The lectures also form part of a course on "The Microprocessor and its Application" held at the University College Swansea under the auspices of the Informatics Training Group of the E.E.C. in September 1977.


ABSTRACT

These lectures survey attempts to apply computers directly to high level languages using microprogrammed interpreters. The motivation for such work is to achieve language implementations that are more effective in some measure of translation, execution or response to the user than would otherwise be obtained. The implied comparison is with the established technique of compiling into a fixed general-purpose machine code prior to execution. It is argued that while substantial benefits can be expected from microprogramming it does not represent the best approach to design when the contributing factors are analysed in a general system context, that is to say when wide performance range, multiple source languages, and stringent security requirements have to be satisfied. An alternative is suggested, using a combination of interpretation and a primitive instruction set and providing security at the microprogram level.



J. K. Iliffe

International Computers Limited


The early lectures review the history and terminology of microprogrammable machines. Knowledge of conventional practice is assumed. Readers already experienced in microprogramming should skip rapidly to Lecture 3.

1. MICROINSTRUCTION DESIGN

If we abandon the conventional machine code (at least temporarily) as a means of defining the computer's function set it is necessary to fall back on the next level of description, i.e. the microcode. A very extensive literature has grown up around that subject in recent years, but I think it is true to say that no commonly accepted theory or principles have emerged: that is the consequence of rapid changes in the process of manufacturing logical devices which force a continual revision of the economics of design. In the introductory lectures we shall study the evolution of microprogrammed machines, but one can do little more than present a collection of techniques. For detailed study of application to machine language interpretation the student is referred to Husson (1970), where an extensive bibliography to 1968 will be found, and to Boulaye (1971), for a shorter survey of techniques. In the following notes I can do no more than provide an outline of design principles and introduce terminology.

The branch of technology that enables a raw microprocessor to interpret a given order code is termed 'microsystem design'. If one machine is to interpret one order code it is a very localised affair. If several machines must imitate two or three order codes the need for standard procedures and documentation arises: in the major application areas this is treated very much as an extension of the logic design. Tucker (1967) and Husson have written informatively on that aspect of microsystems. However, high level languages are not nearly as well defined as machine codes: they are generally more complex, subject to greater variation, and outside the control of any one laboratory. A survey by Rosin (1969) highlights some of the difficulties involved. We shall return to that subject in the last lecture, showing how it affects machine design. For the time being, let us recall how a microprogrammed machine handles the interpretation of a single 'target instruction set' or 'machine code'.

The first application of microprogramming as a formal technique is generally attributed to the designers of EDSAC 2 at Cambridge University, Wilkes (1958). It is a systematic way of controlling the flow of signals through the data paths of a processing unit, each path, or in some cases each function of the processor, being determined by a bit in a microinstruction. If we regard the state of the processor as defined by the assembly of registers and control flip-flops, then a microinstruction determines a simple transition from one state to another. The attraction of the technique is that transformations of any complexity can be composed by applying a sequence of microinstructions: the limitations imposed by ad hoc control logic, which are apparent in the areas of machine definition and construction, are greatly reduced. At a time when relatively complex target instructions are thought to be the key to greater machine efficiency, the introduction of microinstructions obviously has great attraction.

The source of microinstructions is a store, which will be called the control memory in the present context. A single bit in the microinstruction can control the transmission of an entire field from one register along several parallel paths in one processor 'cycle'; another bit, or group of bits, will select a destination register and field. It is fairly easy to evolve a requirement for fifty or more bits in the microinstruction to control the possible data paths in the processor.

The second requirement of the microinstruction is to determine its successor. Application of a sequencing rule determines the string of actions carried out by the processor which, when properly defined, will interpret a target instruction. One of the simplest ways of sequencing is to place the next microinstruction address in the one currently being obeyed. To achieve conditional branching effects it is necessary to use the state of the processing logic in the calculation of at least part of the next address. The elements of the machine can be visualised as in Figure 1. The machine operates in three steps:

1. Access control memory using the microinstruction address.

2. Use the microinstruction to control the state transition of the processor logic.

3. Use microinstruction digits and the result of step 2 to determine the next microinstruction address.
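The three steps can be sketched as a toy model in Python. The microinstruction layout used here (a state-transition function, a next-address field, and an optional status flag that is OR-ed into the low address bit) is an illustrative assumption, as are all the names; it does not describe any particular machine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # registers and control flip-flops, by (hypothetical) name


@dataclass
class MicroInstruction:
    action: Callable[[State], None]  # step 2: state transition of the processor logic
    next_addr: int                   # successor address held in the microinstruction
    test: str = ""                   # optional status flag OR-ed into the low address bit


def run(control_memory: List[MicroInstruction], state: State, addr: int, halt: int) -> State:
    while addr != halt:
        mi = control_memory[addr]                          # step 1: access control memory
        mi.action(state)                                   # step 2: control the processor logic
        branch = (state[mi.test] & 1) if mi.test else 0    # status produced by step 2
        addr = mi.next_addr | branch                       # step 3: form the next address
    return state


def _sum_step(state: State) -> None:
    # One state transition: accumulate n, count down, and set the 'zero' status flag.
    state["acc"] += state["n"]
    state["n"] -= 1
    state["zero"] = int(state["n"] == 0)


def demo() -> State:
    # A one-word microprogram that loops at address 0 until 'zero' redirects it to 1.
    program = [MicroInstruction(_sum_step, next_addr=0, test="zero")]
    return run(program, {"acc": 0, "n": 4, "zero": 0}, addr=0, halt=1)
```

In this sketch the conditional branch costs no extra microinstruction: the status bit simply modifies the address that the microinstruction already carries.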

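[Block diagram: the CONTROL MEMORY delivers a MICROINSTRUCTION (step 1), which controls the PROCESSOR LOGIC (step 2); the NEXT ADDRESS field and the processor STATUS feed the MICROSEQUENCER (step 3), which addresses the control memory.]

Figure 1: Microprogram Control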

The development of microprogrammable machines from the above principle of design leads to great elaboration of detail, the main considerations being (a) optimising the use of control memory, (b) achieving balanced timing of control memory and processor logic, and (c) organising the registers and data paths of the processor to suit the class of target machines of interest. I shall discuss each aspect of design, giving examples from some of the earlier microprogrammed machines.

1.1 Minimising the Cost of Control Memory

Exploitation of microprogramming was not widespread until suitable techniques for loading and manufacturing control memory had been developed. Such techniques are discussed by Husson (Chapter 5), where it can be seen that the predominant forms of construction allowed microinstructions to be read but not written under program control. That is clearly sufficient for a well defined and fixed instruction set. The later development of semiconductor control memories with write capability has been the main stimulus to further research in microprogram application. With all memories, however, the main design requirement is to deliver the information required at the right time and in as few bits as possible.

Considerations of space lead to various forms of microinstruction coding. The form in which a single microinstruction bit controls a unique processor gate (or data path) is termed direct control. If we can find sets of mutually exclusive control signals, such that not more than one is activated in a given cycle, it is possible to encode them: a field of K bits will activate one of 2^K - 1 control lines, or none at all. That is obviously the case when one of, say, 8 registers can be gated to one input of an adder. The same technique is used in machine code design. It is illustrated below by the structure of the IBM 360/30 microinstruction and by most of the 'first generation' microcodes, all of which may be said to use encoded control, the individual fields controlling microorders.
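As a minimal sketch of encoded control, the following Python function expands a K-bit field into its control lines. The convention that code 0 means 'no line active' is an assumption made for the illustration; a real design may reserve a different code or none at all.

```python
from typing import List


def decode_field(value: int, k: int) -> List[int]:
    """Expand a k-bit encoded field into 2**k - 1 mutually exclusive control lines.

    Assumed convention: code 0 activates nothing; code n (n >= 1) activates line n - 1.
    """
    if not 0 <= value < (1 << k):
        raise ValueError("value does not fit in the field")
    lines = [0] * ((1 << k) - 1)
    if value:
        lines[value - 1] = 1
    return lines
```

With this convention a 3-bit field drives seven lines plus the idle case, so it replaces seven bits of direct control; at most one line is ever raised, which is exactly the mutual-exclusion condition stated above.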

Three other common forms of coding deserve mention. In bit-steering the particular control lines activated by a microorder (or bit) are determined by another field of the microinstruction. The second field directs the first to one or another set of control lines; it is appropriate when the processor logic can be partitioned into sections that do not require activation on every cycle (and can to some degree proceed in parallel). It has been used in combination with other techniques, for example in the RCA Spectra 70/45, Honeywell 4200 and IBM 360/25. Carried to the extreme, the microinstruction ends up as a function group and a number of operand fields, which would be difficult to distinguish at first sight from a conventional machine code.


The second technique derives from the observation that over many sequences of microinstructions the values of certain control lines will remain constant, therefore they can be set in advance and taken as an implicit extension of the microinstruction. That technique will be referred to as preset control. It applies, for example, if particular carry or shift paths are fixed in advance, or if one of several possible register sets is being used.

Finally, it is easy to see that all 2^100 versions of a 100-bit direct control microinstruction will not be used, and instead of attempting to encode individual fields it would be possible to list all the distinct microinstructions in a particular application and select those required by indexing a store containing the list. For example, in a particular application there may be fewer than 1024 distinct microinstructions. In that case a 2000 word microprogram can be compressed from 200 000 bits into 20 000 bits, a saving of 90%. All that is required is that the fully encoded microinstruction index another store 100 bits wide containing the 1024 fully decoded instructions (the second store is called the nanostore). Once the 102 400 bits of the nanostore are counted, the net saving in storage space is thus about 40%.
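The arithmetic behind these figures can be checked with a short Python sketch. The function name and its return layout are illustrative choices; the numbers are those quoted in the text.

```python
from typing import Tuple


def nanostore_saving(words: int, width: int, distinct: int) -> Tuple[int, int, int, float]:
    """Return (direct bits, indexed microprogram bits, nanostore bits, net saving)."""
    index_bits = (distinct - 1).bit_length()   # bits needed to index the nanostore
    direct = words * width                     # microprogram held as direct control words
    micro = words * index_bits                 # microprogram held as nanostore indices
    nano = distinct * width                    # the list of distinct decoded words
    net_saving = 1 - (micro + nano) / direct
    return direct, micro, nano, net_saving
```

For 2000 words of 100 bits and 1024 distinct microinstructions this gives 200 000, 20 000 and 102 400 bits respectively, and a net saving of 38.8%, which the text rounds to 40%. The saving shrinks as the microprogram gets shorter relative to the number of distinct words, since the nanostore is a fixed overhead.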

It is more likely that some of the fields of the microinstruction will be fully used, leaving a residual field to be handled in the above way. The Nanodata QM-1 machine, Rosin et al (1972), provides an illustration. The 16 bit microinstruction is loaded into one of the microregisters, a six bit field is then used to select a 342-bit nanoinstruction. The latter can use the remaining ten microinstruction bits as operand selectors, so it is appropriate to regard them as a form of preset nanocontrol (Figure 2). At this point the designer faces the same set of choices at nanomachine level as we have already discussed in connection with micromachines. He could use direct control: in fact, QM-1 does not, but obeys a far more elaborate sequence of nanoorders. The reader is referred to the literature for details.

[Block diagram: the MICROINSTRUCTION ADDRESS selects a microinstruction, which in turn selects a NANOINSTRUCTION; the nanoinstruction drives the CONTROL LINES of the PROCESSOR LOGIC.]

Figure 2: Nanoprogram Control


1.2 Timing and Control Considerations

It will be shown later that interpreting one of the common target instructions takes approximately 20 microorders and two main memory cycles. If a premium is placed on memory utilisation it follows that the effective microorder rate must be ten times that of main memory: to achieve that the early machines use a horizontal or multi-order microinstruction that activates several data paths in parallel. The microinstruction cycle would be 1/2 or 1/3 the memory cycle time, so that a 1.5 µsec core memory would be associated with a 750nsec or 500nsec microinstruction rate. Horizontal coding achieves speed at the expense of generality and ease of programming: in the next lecture we shall introduce a more 'relaxed' form of code in which each microinstruction contains only one or two microorders, which is naturally called vertical control.
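The timing figures can be worked through in a few lines of Python. The microorder count, memory cycle count and cycle times are those quoted above; the last function, giving the degree of parallelism each microinstruction must supply, is an inference from those figures rather than a number stated in the text.

```python
from fractions import Fraction


def microinstruction_period_ns(memory_cycle_ns: int, per_memory_cycle: int) -> Fraction:
    """Microinstruction period when several microinstructions fit in one memory cycle."""
    return Fraction(memory_cycle_ns, per_memory_cycle)


def microorders_per_microinstruction(microorders: int, memory_cycles: int,
                                     per_memory_cycle: int) -> Fraction:
    """Microorders each microinstruction must carry to keep main memory fully used."""
    return Fraction(microorders, memory_cycles * per_memory_cycle)
```

With a 1500 nsec memory cycle the periods come out at 750 and 500 nsec. Twenty microorders in two memory cycles then imply five microorders per microinstruction at the slower rate and between three and four at the faster one, which is the pressure that leads to horizontal coding.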

The elementary steps of the machine execution cycle have already been indicated. If no overlap is attempted then the major components--control memory and processor--are alternately idle while the other completes its task (remember that read-only memories, and even writable semiconductor memories, may require very little time to recover for the next cycle). In order to achieve higher performance it is necessary to use faster and therefore more expensive components, or to overlap the elementary steps. The options are superficially the same as in machine code design. The main differences derive from the fact that microprograms have been for the most part fixed, comparatively small, and have made extensive use of multiway branch or switch instructions: the alternative of using a sequence of tests to decode a target instruction would simply be too slow.

A control memory address is frequently composed from several fields whose values are determined at different points in the machine cycle. The high order fields are normally known first, so the construction of an address reflects a gradual narrowing down of the alternatives until the exact microinstruction can be fetched.

In the IBM 360/Model 30, for example, a block address is found as part of the preset control, not normally affected by the current microinstruction; a functional branch is a field inserted directly from the microinstruction, and a switch is the low-order two-bit field of the control memory address, computed from the processor state. Thus, the successor to any instruction is within the current block of 256 (see diagram) and may be dependent on the outcome of one or two conditions or register values.

IBM 360/30 microinstruction address:

    Field:    BLOCK ADDRESS    FUNCTIONAL BRANCH        SWITCH
    Source:   preset           from microinstruction    from processor logic
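The composition of the address can be sketched in Python as follows. The two-bit switch and the block of 256 come from the description above; the six-bit width of the functional branch is inferred from them, and the width of the block address is left open, so the function should be read as an illustration rather than as the exact 360/30 layout.

```python
def control_address(block: int, functional_branch: int, switch: int) -> int:
    """Compose a control memory address from its three sources.

    block             : preset, changes rarely (width not specified here)
    functional_branch : taken from the current microinstruction (assumed 6 bits)
    switch            : low-order 2 bits, computed from the processor state
    """
    if block < 0:
        raise ValueError("block address must be non-negative")
    if not 0 <= functional_branch < 64:
        raise ValueError("functional branch must fit in 6 bits")
    if not 0 <= switch < 4:
        raise ValueError("switch must fit in 2 bits")
    return (block << 8) | (functional_branch << 2) | switch
```

Whatever the processor state turns out to be, the four candidate successors differ only in the two low-order bits, so they sit in adjacent control memory words and the high-order part of the access can begin before the switch value is known.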

We can now see more clearly when the overlap of processor and control memory cycles can be achieved. If the control address is determined by the processor state at the end of the current microinstruction then although access might be initiated on the basis of block/functional branch fields the final decision has to be delayed until the state of the processor logic is known (the example given above falls into that category).

If the control address is determined by the processor state at the end of the previous instruction, then the control memory can be accessed while obeying the current instruction, e.g.

                   TIME ---->

    Previous µinst:  OBEY   / STATUS
    Current µinst:   ACCESS / OBEY   / STATUS
    Next µinst:               ACCESS / OBEY ...

The timing considerations just described are shared with very much more sophisticated processors: they result from any attempt to overlap one instruction with others and it is easy to see that the more 'changes in direction' in the flow of control the less effective are the overlap arrangements. It is true to say that microprogram is more afflicted by conditional and computed branches than machine language program, for which reason designers are reluctant to throw away the contents of the micropipeline and may ask the coder to deal with various 'run-on' conditions. What this means in practice is that one or two instructions in written sequence after a branch may be obeyed, e.g. in decoding a hypothetical target instruction the microsequence is written:

m1 : Extract function field

m2 : Branch to address + function

m3 : Increment target instruction counter

Here, although the branch m2 is taken, the following microinstruction is still obeyed. It is in avoiding or dealing with such coding peculiarities and in taking account of critical memory or I-O timing constraints that microprogramming differs from conventional coding, or has done so in the past. Luckily, increasing hardware power has removed many of the characteristics of microprogram from modern machines; perhaps the only positive way in which a microprocessor can be distinguished from a 'mini' is in its dedication to the task of modelling processors rather than users' problems.
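The 'run-on' behaviour of the sequence m1, m2, m3 can be imitated with a small Python model. The program representation (a dictionary from address to a label and an optional branch target) and the single run-on slot are assumptions chosen for the illustration; real machines differ in how many following microinstructions are obeyed.

```python
from typing import Dict, List, Optional, Tuple

Program = Dict[int, Tuple[str, Optional[int]]]  # address -> (label, branch target or None)


def run_with_run_on(program: Program, start: int, max_steps: int = 20) -> List[str]:
    """Trace execution when the microinstruction written after a branch is still obeyed."""
    trace: List[str] = []
    pc = start
    pending: Optional[int] = None       # branch target waiting for the run-on slot to finish
    for _ in range(max_steps):
        if pc not in program:
            break
        label, target = program[pc]
        trace.append(label)
        next_pc = pc + 1
        if pending is not None:         # the run-on slot has now been obeyed: take the branch
            next_pc, pending = pending, None
        if target is not None:          # a branch takes effect one microinstruction late
            pending = target
        pc = next_pc
    return trace


def decode_example() -> List[str]:
    # Hypothetical addresses: m1..m4 at 0..3, the 'semantic' routine f1, f2 at 10, 11.
    program: Program = {
        0: ("m1", None), 1: ("m2", 10), 2: ("m3", None), 3: ("m4", None),
        10: ("f1", None), 11: ("f2", None),
    }
    return run_with_run_on(program, start=0)
```

In the trace m3 is obeyed before control reaches f1, while m4 is never reached. The coder can therefore put useful work, such as incrementing the target instruction counter, in the slot that would otherwise be wasted.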

1.3 Highway and Register Organization

The basic requirements for imitating a given target instruction set are:

(a) arithmetic primitives for composing the arithmetic, logical and addressing functions of the target machine;

(b) memory mapping and resolution compatible with the store structure of the target machine;

(c) imitation of the internal control states, registers and register access requirements of the target machine;

and (d) peripheral interfaces that reflect the formats, status and timing expected by the target machine.

Within this field the degree of dedication varies with the performance/cost objective. Different design teams have gone about the same task in quite different ways: Husson (p. 414) makes the point that although the IBM 360 and RCA Spectra 70 achieve the same architecture the latter is a much more 'specific' design than the IBM models.

In this subsection I shall illustrate features of microprocessor design referring to the IBM 360/Model 30 which was one of the earliest models of the IBM 360 range and, as it happens, the subject of an early experiment in language oriented design that I shall refer to later. Further details will be found in Boulaye (1971) and Weber (1967).

Figure 3 shows the data paths in the central processor of the IBM 360/Model 30. There are twelve registers, each of one byte. Apart from the main memory address and data buffers (MN and R) no specific allocation of content is made by hardware. The data paths are uniformly 8 bits. The microinstruction is 60 bits long, encoded into the following microorder groups:

(i) Store access: Fields CM, CN, CU

(ii) Data flow: 4-bit literal field CK

(iii) ALU control: CA, CF, CB, CG, CV, CD, CC, CZ

(iv) Sequencing: CH, CL

(v) Status: CS

[Data-flow diagram: MAIN STORE and LOCAL MEMORY on the DATA BUS; the A and B inputs to the ALU gated under microorders including CB, CG and CV; the result routed to a destination register under CZ; STATUS outputs.]

Figure 3: Simplified Data Flow of the IBM 360/Model 30 CPU

For example, under group (i):

CM (3 bits) indicates one of: no action; read from address IJ, UV, or LT to R; regenerate; write from R.

CU (2 bits) selects main or local (register) storage.

Under group (iii):

CA (4 bits) selects one of 10 inputs to the ALU through the A register

CB (2 bits) selects one of R, L, D or the literal CKCK

CC (3 bits) selects the actual ALU function

CF (3 bits) modulates the A-input to ALU, i.e. high digit, low digit, none, low or cross-over

CG (2 bits) modulates the B-input to the ALU

CV (2 bits) selects true, complement or six-correct form of B

CZ (4 bits) gives the destination, one of ten registers.
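To make the idea of microorder fields concrete, here is a Python sketch that packs and unpacks the fields whose widths are quoted above. The field widths are taken from the text; the order of the fields within the word is an arbitrary assumption, and the remaining fields of the 60-bit word (CN, CD, CH, CL, CS), whose widths are not given here, are omitted.

```python
from typing import Dict, List, Tuple

# (name, width in bits); widths as quoted in the text, ordering assumed.
FIELDS: List[Tuple[str, int]] = [
    ("CM", 3), ("CU", 2), ("CK", 4), ("CA", 4), ("CF", 3),
    ("CB", 2), ("CG", 2), ("CV", 2), ("CC", 3), ("CZ", 4),
]


def pack(values: Dict[str, int]) -> int:
    """Pack named microorders into one integer, first field in the high-order bits."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)          # an absent field defaults to code 0
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> Dict[str, int]:
    """Recover the named microorders from a packed word."""
    values: Dict[str, int] = {}
    for name, width in reversed(FIELDS):     # peel fields off the low-order end
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

The ten fields shown occupy 29 bits, roughly half of the microinstruction; each is an encoded group in the sense described in section 1.1, so that for instance the 4-bit CZ field names one of ten destination registers rather than driving ten separate lines.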

Thus in one microinstruction, which takes 750nsec, an 8-bit arithmetic or logical operation is carried out, half a main store cycle is controlled, and the next microinstruction is selected. In the next cycle the main store operation must be completed while other operations are carried out.

If we consider the loop of instructions which interprets the target machine code it clearly consists of first fetching the instruction, then looking at the function/format digits and preparing each operand by computing an address and accessing the store when necessary, and then branching to the 'semantic' microsequence that interprets the target function. The instruction will normally terminate by servicing interrupts before proceeding to the next in sequence. Elementary IBM 360 instructions take between 15 and 30 µsec in execution, i.e. 20-40 microinstructions: the large number reflects the fact that any address or arithmetic calculation involving operands of more than 8 bits has to be carried out serially by byte.

In order to achieve higher performance the microregisters and internal data paths must be more closely matched to those of the target machine, and supplementary functional units introduced to minimise the 'mismatch' between the microprocessor and the target system architecture.


2. GENERALIZED HOST MACHINES

We have seen some of the ways in which specific features are built into microprogrammable machines to help in modelling particular order codes. However, our main objective is to consider systems at a level removed from machine code, where the target instruction sets can to some extent be chosen to suit the available hardware: in the last lecture we can attempt to answer the question of whether the need for specific adaptation will still arise.

I shall now discuss design generalisations that have been favored in recent years as the result of rapid reduction in the cost of storage and logical devices. In the latter context 'regularity' of hardware is at least as important as circuit or gate count, which is greatly to the benefit of the microprogrammer. I shall refer to the class of processors under discussion as host machines in order to suggest their role and to avoid undue emphasis on 'microprogram' or 'microprocessor' technology. In practice, the principal use of host machines has been in the form of instruction-set emulators (e.g. IBM 360 imitating the IBM 1401). The design objective of producing a 'universal emulator' became feasible with the introduction of writable control memories. It is clear from the outset that machines capable of imitating any instruction set at competitive speed could not be produced at competitive cost; nevertheless such a machine is invaluable as a vehicle for research into computer architectures. The ICL Research Emulator E1, Iliffe and May (1972), the Standard Computer Corporation MLP-900, Rakocsi (1972), the Stanford University EMMY, Neuhauser (1975), and the Nanodata Corporation QM-1, Rosin et al (1972), provide examples of generalised facilities, while in the commercial field the Burroughs Corporation B-1700 is particularly interesting from the point of view of memory allocation.

All the machines in this category use vertical instruction coding which allows much greater flexibility in function sequencing than the older horizontal designs, and at the same time a simpler and more familiar form of program input. The reader may compare the example of microprogramming given in Weber (1967) with the program style of any of the machines mentioned above, which bears comparison with a conventional assembly program listing except for the primitive nature of the arithmetic, the absence of address modification, and the elaborate field selection and branching functions.

In moving to vertical coding it is normally the case that the main memory system has a much higher data rate than the host needs, even with the fastest control store. The extra capacity is used in direct memory access by I-O devices, in dual processor configurations, and in many instances by using the main memory as a source of microinstruction. The last option is particularly attractive because it affords an escape from the rigid limitation on microprogram that is imposed by a separate control store. On the other hand it does impose a control structure which is difficult to rationalise: perhaps the simplest view is to look upon the interpreter as providing system standards, operating system interfaces, protection, etc., which are not normally present at the microcontrol level.

The following subsections correspond to the main design areas noted in the last lecture, with illustrations drawn from the machines mentioned above. Further examples can be found in less readily accessible specifications for many machines currently on the market.

2.1 Generalised Arithmetic and Data Paths

One of the obvious ways in which MSI or LSI components affect the arithmetic system is in allowing register lengths to be standardised at a reasonably high value, rather than making use of specialised lengths seen in earlier machines. The effects are to speed up the machine and to save control memory, because operations previously performed by a loop of microinstructions can now be carried out in one.

The host is still specialised with regard to arithmetic width and shift paths. Two methods have been employed for variable precision arithmetic up to a prescribed field size:

(i) using a third input to the ALU, which is in fact a mask allowing carries to propagate. The SCC MLP-900 allows the microinstruction to select one of 32 possible masks which can be used to propagate carry to the 'normal' sign position. A mask may also be used to permit operations on unpacked fields such as 6-bit characters stored in byte positions. One of the difficulties of working with unpacked data, however, is that it may eventually have to be aligned to an external interface such as the store address bus.

(ii) allowing the effective ALU width to be variable, i.e. taking sign, carry and zero-test signals from any position of the ALU. This method is used in the E1 emulator and the B-1700, where the sign is part of preset control. If more than one arithmetic width is in use concurrently it is desirable to have more than one preset sign position, selected by microinstruction.
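Method (ii) can be illustrated with a short Python sketch in which the width is a parameter standing in for the preset sign position. The function name and the order of the returned signals are illustrative assumptions; the sketch models the behaviour only and says nothing about how the E1 or B-1700 hardware derives the signals.

```python
from typing import Tuple


def add_variable_width(a: int, b: int, width: int) -> Tuple[int, int, int, int]:
    """Add two operands in an ALU whose effective width is preset.

    Returns (result, carry, sign, zero), with carry, sign and zero taken at the
    preset width rather than at the full width of the hardware.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)
    result = total & mask
    carry = total >> width               # carry out of the preset top position
    sign = (result >> (width - 1)) & 1   # sign taken at the preset position
    zero = int(result == 0)
    return result, carry, sign, zero
```

Adding 1 to 63 at a width of 6 bits gives a zero result with carry set, whereas the same operands at 24 bits give 64 with no carry: the host reproduces the overflow behaviour of a 6-bit target machine without any masking microinstructions.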

Variation in ALU width has an obvious counterpart in shift functions. To reproduce exactly the shift patterns of a word of arbitrary length it is necessary to preset the point at which end connections are made, which is more difficult to engineer than sign adjustment because a stream of bits is being handled. The E1 emulator does allow shift lengths from one to 64 bits, but the logic is expensive and most designers have settled for single or double length shifts and rotations. For high level language interpretation that is probably sufficient.

A final area where both the ALU and shifter are affected is in the type of arithmetic carried out. The predominant types are binary integer, decimal, and floating point. Generalised facilities for the last are usually complex and of limited value in either the commercial or research context. Decimal facilities can be built into the ALU in varying degrees, from fully signed operations down to facilities for detecting carries at the decimal digit positions. The choice rests entirely on the final cost/performance required. Although an important area of design it can be 'factored out' in comparative studies of language-oriented and fixed instruction set machines, for which reason I shall not extend the discussion at this point. It is important to remember that if a host has good arithmetic facilities then any lapse in handling the control or data access side of a language will be conspicuous, and conversely.

If the path from memory is not selective enough (and it usually is not) facilities are required for extracting fields from microregisters for input to the ALU. Such facilities are expensive and may be confined to limited field selection or to particular registers (e.g. in the shift unit). Thus, the B-1700 provides full extraction on one 24-bit register and 6-bit subfield addressing on most others. The E1 emulator can extract any byte from the 15 microregisters for comparison or control purposes. The MLP-900 can conveniently use the third ALU input to select fields within registers. Apart from the obvious hardware cost of selecting any field in any register, space will be taken to identify the field in microinstructions. It does not appear that high level languages demand complete generality, and limitations could be accepted simply on the grounds of coding efficiency.

2.2 Memory Mapping and Address Translation

The unstructured nature of machine codes, allowing instructions to be used as data, and vice-versa, requires a strict correspondence to be maintained between the target machine and its representation in the host. (There are exceptions: in mapping the IBM 1401 onto the IBM 360 it is more convenient for the latter to use EBCDIC character codes, converting to and from BCD in those instructions sensitive to BCD formats.) In most instances the target machine word is 'rounded up' when necessary to fit the host, not attempting to make use of every bit in store. However, the B-1700 goes to the length of resolving memory addresses to the bit level and allowing any string of up to 24 bits to be read or written, starting (or finishing) at a given position. In that case 100% memory utilisation can always be achieved.
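Bit-level addressing of this kind can be modelled in Python as below. The 24-bit limit is the one quoted for the B-1700; the function names, the use of a byte array for the store, and the convention that bit 0 is the most significant bit of the first byte are assumptions made for the sketch.

```python
MAX_FIELD = 24  # longest string transferred in one access, as quoted for the B-1700


def read_bits(memory: bytes, bit_addr: int, length: int) -> int:
    """Read `length` bits starting at bit address `bit_addr` (bit 0 = MSB of byte 0)."""
    total = 8 * len(memory)
    if not 0 <= length <= MAX_FIELD:
        raise ValueError("field longer than one access allows")
    if bit_addr < 0 or bit_addr + length > total:
        raise IndexError("field lies outside the store")
    whole = int.from_bytes(memory, "big")
    shift = total - (bit_addr + length)
    return (whole >> shift) & ((1 << length) - 1)


def write_bits(memory: bytearray, bit_addr: int, length: int, value: int) -> None:
    """Write the low `length` bits of `value` at bit address `bit_addr`, in place."""
    total = 8 * len(memory)
    if not 0 <= length <= MAX_FIELD:
        raise ValueError("field longer than one access allows")
    if bit_addr < 0 or bit_addr + length > total:
        raise IndexError("field lies outside the store")
    shift = total - (bit_addr + length)
    mask = ((1 << length) - 1) << shift
    whole = int.from_bytes(memory, "big")
    whole = (whole & ~mask) | ((value << shift) & mask)
    memory[:] = whole.to_bytes(len(memory), "big")
```

Because a field may start at any bit, four 6-bit characters occupy exactly 24 bits instead of four bytes, which is the sense in which full utilisation is achieved. The cost is that every access implies the shifting and masking that this sketch performs explicitly.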

The memory word or part-word is made available for analysis in the microregisters. It is an advantage to be able to select from two or three potential data registers in order to avoid extra 'move' microinstructions. At this point there is also the opportunity to map the data into a more easily managed form. The 'crosspoints' of the E1 emulator and 'language boards' of the MLP-900 both allow the choice by program of alternative hardwired data paths to and from memory. They may be used, for example, to prepare an instruction for decoding, to align 6-bit characters to 8-bit byte boundaries, or to handle parity conventions on a 'foreign' data bus. The diagram shows the crosspoint paths used by E1 to read ICL 1900 instructions, which enable function, register and modifier fields to be accessed without shifting the target instruction microregister. The effect of the crosspoint is to save 5 or 6 steps in the typical interpretive loop of 25-30 microinstructions. It can be seen as complementing the internal data selection functions: in a machine with powerful field selection orders crosspoints would be less important.

[Diagram: crosspoint paths from the STORE DATA REGISTER to a MICROREGISTER.]
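The effect of a crosspoint can be imitated in Python as a fixed routing of bit fields. The 24-bit instruction format used below (register in bits 21-23, function in bits 14-20, modifier in bits 12-13, address in bits 0-11, with bit 0 least significant) and the byte positions chosen in the microregister are hypothetical and serve only to show the principle; they are not offered as the E1 or ICL 1900 layout.

```python
from typing import Callable, List, Tuple

FieldRoute = Tuple[int, int, int]  # (source LSB position, width, destination LSB position)


def make_crosspoint(routes: List[FieldRoute]) -> Callable[[int], int]:
    """Build a fixed data path that copies each source field to its destination position."""
    def route(word: int) -> int:
        out = 0
        for src_lsb, width, dst_lsb in routes:
            field = (word >> src_lsb) & ((1 << width) - 1)
            out |= field << dst_lsb
        return out
    return route


# Hypothetical decode path: register, function and modifier each land in a byte of their own.
DECODE_PATH = make_crosspoint([
    (21, 3, 16),   # register field -> byte 2
    (14, 7, 8),    # function field -> byte 1
    (12, 2, 0),    # modifier field -> byte 0
])
```

Once the fields sit on byte boundaries, a host that can extract any byte from a microregister reaches each of them in a single step. In hardware the routing is wiring selected by program, so it costs no microinstructions at all, which is where the saving of 5 or 6 steps per target instruction arises.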

Apart from data, addresses have to be matched to the conventions of the host. For example, if the target machine uses decimal addressing and the host uses binary then conversion must take place before accessing the store. Similarly, if the target machine operates in virtual program space then virtual to real translation is called for. If page and segment table accesses are implicit in each memory reference the address conversion could easily exceed the combined steps of instruction decode and instruction execution. The alternative of using hardware assistance--allowing the host to work in virtual space--is expensive and still leads to delay in memory access. Fortunately, in the environment of high level language execution it is possible to work in a virtual address space but avoid most of the overhead of address translation.

2.3 Representing the Target Machine State

The primary data of an interpretive program are the registers, the program counter, the instruction register, control flags, channel status and control words of the target machine. A generalised host would expect to have room for the largest target machine state of interest, but even so it is unlikely to require more than a few hundred bytes of storage for that purpose, which often justifies a file of fast registers, the scratchpad (or local memory in IBM), in addition to the microregisters themselves.

It is a common requirement to access the scratchpad using an index value. For example, a target machine 'register-register' instruction contains two indices. Microinstructions do not admit the type of address calculation found in machine instruction sets, therefore it is necessary to carry out some preliminary scratchpad address calculation. That happens often enough (at least once in most target instructions) to justify building in predictive indexing hardware, which works in the following way. Certain microregister fields are designated (by preset parameters) as scratchpad indices. When any of those field values changes a scratchpad access is initiated (relative to a preset base), so that the corresponding scratchpad element is available for reading or writing in the next microinstruction (compare the main store address registers of the CDC 6600). The crosspoints for the E1 emulator are designed to place the target instruction register and modifier digits in the position of predictive indices, allowing the register and modifier values to be used without delay.

[Diagram: a preset index descriptor designates a microregister field which, with a base address and range, selects a byte or word access in the scratchpad]
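
The predictive indexing mechanism can be pictured with a small software model. This is only a sketch under stated assumptions: the class name, the `shift`/`width` presets and the single index field are hypothetical, and real hardware would overlap the access with the following microinstruction rather than compute it on demand.

```python
class PredictiveScratchpad:
    """Toy model: a designated microregister field drives a scratchpad access."""

    def __init__(self, size, base, shift, width):
        self.cells = [0] * size           # scratchpad contents
        self.base = base                  # preset base address
        self.shift = shift                # preset position of the index field
        self.mask = (1 << width) - 1      # preset width of the index field
        self.index = None                 # last index value seen
        self.ready = None                 # scratchpad address made ready

    def load_microregister(self, value):
        """Loading the microregister starts an access if the index field changed."""
        field = (value >> self.shift) & self.mask
        if field != self.index:
            self.index = field
            self.ready = self.base + field

    def read(self):
        return self.cells[self.ready]

    def write(self, value):
        self.cells[self.ready] = value


pad = PredictiveScratchpad(size=32, base=8, shift=4, width=3)
pad.load_microregister(0b1010000)   # index field = 5, so cell 13 is made ready
pad.write(99)
pad.load_microregister(0b0100000)   # index field = 2, so cell 10 is made ready
pad.write(7)
pad.load_microregister(0b1011111)   # index field = 5 again; low bits are ignored
print(pad.read())
```

The point of the design is that no explicit address calculation appears between loading the field and using the scratchpad element.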

The primary data of a high level language machine are the intermediate results, control flags, and the control, stack and environmental pointers that allow access to contextually relevant data. For the most widely used languages the 'state' can be mapped into a register file quite easily; moreover, its access patterns correspond closely to those of conventional target machines, hence the scratchpad organisation of a 'universal emulator' is equally applicable to the major programming languages. Whether there are alternative organisations suited to a wider class of languages is a question we shall consider later: it might be argued that a language is 'major' because it happens to fit onto conventional hardware, and that when that constraint is removed more attention can be given to problem-oriented languages.


2.4 Generalised Control of Peripherals

At this point we must draw a broad distinction between emulation of the non-privileged users' instruction set and that of the operating system. The latter would include instructions for channel selection, requesting device status and sending commands as well as receiving and sending data. It may also include special addressing modes for channel control words, page and segment table control, interrupt register and timer access, handkeys, displays, fault indicators and so on. Full-scale emulation, to the extent of running the target machine's peripherals, engineering test programs, channel commands and operating systems, involves at least twice the design effort of the non-privileged instruction set alone and will almost certainly involve physical adaptation of the peripheral interfaces.

In the present context, recognising that most languages are non-specific with regard to the means of peripheral control, the preferred approach is to match the I-O statements to the host system using machine language and microcode procedures.

2.5 The Effect of Large Scale Integration

The level of complexity achievable in bipolar LSI devices has reached the point of presenting complete slices (2 or 4 bits) of control or arithmetic circuitry in a single package. However, such circuits are only realised in favourable commercial/technical situations, i.e. wide applicability and high functional content in relation to edge connection. Some of the machine features discussed above would fail on both counts. On the other hand, I have indicated that language execution makes less stringent demands than universal emulation, hence the 'generality' aimed at by device manufacturers may well provide effective support for the target instruction sets of interest in the context of high level languages.

How much does generality cost in terms of performance? That is impossible to say without detailed analysis of a range of target machines. An indication can be given by comparing the vertical encoding of the ICL register-store 'ORX' instruction on the E1 emulator with the horizontal form for the 1904E. In terms of microorders, the E1 obeys 30 compared with 14 for the specialised host. The difference is accounted for by sequence control (13:6), function decode (5:2) and operand access (10:5). However, the most startling figure in each case is the ratio of support activity to 'useful' function: about 15:1. Our main concern in designing language-oriented target machines must be to reduce that ratio.
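
The 'about 15:1' figure can be related to the itemised counts with a little arithmetic. The sketch below assumes, and the text does not say so explicitly, that the microorders not itemised under the three support headings are the 'useful' ones.

```python
# Microorder counts quoted in the text for the ICL 'ORX' instruction.
totals = {"E1": 30, "1904E": 14}
support = {
    "E1": {"sequence control": 13, "function decode": 5, "operand access": 10},
    "1904E": {"sequence control": 6, "function decode": 2, "operand access": 5},
}


def support_ratio(host):
    """Return (support orders, remaining orders) for one host.

    Assumption: whatever is not itemised as support is 'useful' function.
    """
    s = sum(support[host].values())
    return s, totals[host] - s


for host in totals:
    s, u = support_ratio(host)
    print(f"{host}: {s} support to {u} useful, i.e. {s / u:.0f}:1")
```

Under that assumption the two hosts come out at 14:1 and 13:1, which is of the same order as the ratio quoted.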

3. INTERPRETATION OF HIGH LEVEL LANGUAGES

The existence of readily microprogrammed host machines naturally gives rise to speculation about the likely return from bypassing the normal instruction set. To do so successfully involves the solution of a range of problems concerning definition, security, expansion, maintainability and so on, whose solution is taken for granted in conventional systems. Before looking at the broader problems it would be reassuring to have some measure of the potential advantage of microcoding, which is the subject of this lecture.

It is easy to find performance improvements in the region of 10:1 or more for a particular algorithm expressed in microcode compared with machine code. In evaluating such figures it must be remembered that they derive from three contributing sources: (i) the inherent speed of microcode which is the result of the simplicity of the instructions and the use of high speed control store; (ii) occasional advantages of the microfunctions over the target machine functions, especially in bit manipulation and control sequencing; and (iii) advantages gained from bypassing the architectural framework of the target machine, especially its protection mechanisms.

It would be meaningless to draw conclusions from isolated algorithms. The minimum basis of comparison is taken to be the combination of hardware and software supporting one of the major programming languages, which provides the syntax and semantics for a broad class of problems. The main parameters of performance are taken to be:

(i) compile and load time

(ii) execution time

(iii) size of the support system

(iv) object program size

(v) diagnostic aids in (i) and (ii)

The two techniques used for performance comparison are benchmark testing, in which space and time measures are obtained for a representative sample of source programs, and factoring, in which performance is inferred from independent measures on artificially chosen statements. From the design point of view the second is much more useful, though except in the case of Algol 60 there do not appear to be any widely published sets of reference statements. Needless to say, the object of design is to optimise performance at a given system cost over a prescribed set of languages.

The weights attached to the measured parameters will vary from one class of use to another and no attempt will be made to determine them here. The aim is to show how variations in processor function, specifically those brought about by microprogramming, affect the parameters (i) - (iv). At the same time the qualitative effect of diagnostic aids will be assessed. It will be seen that the time measures depend partly on performance of a second language which will be referred to as the system implementation language (SIL), so whether the machine is good at compiling Fortran, say, depends on what it has to do to produce executable code, and how well it does it: as far as possible the second factor will be isolated by measuring the overall performance of run time support modules. The same applies to execution of the functions of the language by stored microprogram or hardware, because that does not usually vary from one language implementation to another and it can be measured in basic arithmetic speeds. It would be relevant, however, if one implementation chose to use a decimal radix, while another implementation of the same language on the same machine used binary. Most of the language implementations reported in the literature have been rendered useless from the design point of view by not keeping the executive algorithms constant: in other words, if a performance gain P is generated it is impossible to tell how much of P derived from the interpretive technique and how much from improved arithmetic or run-time support.

The following subsections make a broad distinction between procedure coding, illustrated by some of the scientific languages, and data access, which is examined in the context provided by Cobol.

3.1 Algol, Euler and Expression Evaluation

Factored measurements of Algol performance are reported by Wichman (1973). In Table 1 I have abstracted some figures for machines with roughly comparable arithmetic times. It is well known that the Burroughs B-6700 uses a target instruction set tailored to the representation of Algol: its effect can be seen in the times for procedure entry. One would also expect it to be effective in array assignment, but in this particular case the compilers spot the indices [1,1] etc and generate optimised code for the conventional machines. The advantage of the language-oriented code is to simplify the compiler rather than speed up execution.

The importance of individual statement times depends on the weights attached to them in the final performance measure. In general, arithmetic and array access operations have the highest weights, procedure entry is an order of magnitude less important, and array declarations an order of magnitude less than that. It must be remembered that experimentally observed times reflect a complex combination of hardware, software and support system. Implicit in many decisions is the designers' assessment of different language features, and his budget reflects an assessment of the importance of the language as a whole.

TABLE 1: SOME ALGOL STATEMENT EXECUTION TIMES

Statement                    Execution time in microseconds
                             B-6700    IBM 370/165    Univac 1108

x := 1.0                       5.5         1.4            1.5
x := 1                         2.7         1.9            1.5
x := y                         3.9         1.4            1.5
x := y + z                     5.5         1.4            3.4
x := y * z                    11.3         1.4            4.0
e1[1] := 1                     5.3         2.7
e2[1,1] := 1                   7.7         1.7            5.8
e3[1,1,1] := 1                11.3         1.7            9.0
begin array a[1:500]; end    408.        242.           918.
P1(x)                         28.6        60.7          127.
P2(x,y)                       30.5        83.6          137.

[Note: The times for the IBM 370 probably err on the low side because of the effect of the cache]

In comparing object code size, Wichman gives the following figures normalised with respect to Atlas:

Burroughs B-5500    0.16
Univac 1108         0.31
CDC-6600            0.56

The advantage of the Algol-oriented intermediate form in comparison with some of the best conventional systems is evident. To understand how such results are obtained we must examine some target machine states and the functions applied to them.

The advantage of language-oriented intermediate code is that, provided an 'expression-evaluation' mechanism is built in to the interpreter, the details of register transfers that are usually found in machine code can be omitted. The compiler is simplified, the code is more compact. It is not inherently faster, because the data access is indirect, but in many instances that is more than compensated by savings in other parts of microprogram. The stack mechanism is the best known means of expression evaluation: the reader is no doubt familiar with the reverse Polish form of code used in Burroughs B6700 and other machines and the various stack and environmental (display) pointers associated with it.
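
As a reminder of how a stack removes register transfers from the code, here is a minimal sketch of reverse Polish evaluation. It is not the Burroughs instruction set: the syllable encoding (strings for names, literals and operators) and the function name are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_rpn(code, variables):
    """Evaluate a reverse Polish string; no registers are named in the code."""
    stack = []
    for syllable in code:
        if syllable in OPS:
            right = stack.pop()                 # operands come off the stack
            left = stack.pop()
            stack.append(OPS[syllable](left, right))
        elif syllable in variables:
            stack.append(variables[syllable])   # fetch a variable's value
        else:
            stack.append(float(syllable))       # numeric literal
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack.pop()


# y + z * 2 in reverse Polish form: y z 2 * +
print(eval_rpn(["y", "z", "2", "*", "+"], {"y": 1.0, "z": 3.0}))
```

The compiler's job reduces to reordering the source expression; all operand routing is implied by the stack discipline.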


However, the apparent simplicity of the Burroughs representation leads to some complexity in the machine functions themselves. The value call operator (VALC) has to be able to detect and interpret all the operand types that can legitimately be presented in the course of computation, including indirect references through the stack and procedural definitions arising in parameter lists. In most applications the questions answered by examining tags could be answered in advance by the compiler: as a general rule unnecessary tests at execution time should be avoided except as deliberate backup for the compiler, the support system or data security.

In contrast, dynamic tag testing is essential to languages such as Euler and APL because the type of a variable is not predictable at compile time. Let us examine the Euler representation in greater detail and see how one of the target machine syllables fits onto the architecture of the IBM 360/Model 30 described in the first lecture (for greater detail, see Weber (1967)).

The representation of a variable is a [tag,value] pair, the tags having the following significance:

0  Null                5  Reference (m, loc)
1  Integer             6  Procedure (m, link)
2  Real                7  List (length, loc)
3  Boolean             8  (Unassigned)
4  Label (mp, pa)      9  Block mark (in stack)

The run-time environment consists of three storage areas: Program, which is indexed by pa (program address) and link (return address); Variable, indexed by loc (location), where all defined data is to be found, and the Stack, which consists simply of block marks giving static and dynamic chain links, references to parameters in the Variable space, and intermediate results. Operators exist to test the tag of a variable, e.g.

    isn A        Is A an integer?

returns the boolean value true or false. Standard operators such as + - * / mod max abs can be applied to numeric values, yielding numeric results, and failing if illegal tags are encountered.
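
A [tag,value] pair with tag-checked operators might be modelled as follows. The sketch is illustrative only: the `Value` class and function names are hypothetical, and the rule that integer + integer stays integer while any real operand gives a real result is an assumption, not something stated in the text.

```python
from dataclasses import dataclass

NULL, INTEGER, REAL, BOOLEAN = 0, 1, 2, 3   # tag numbers as listed in the text


@dataclass(frozen=True)
class Value:
    tag: int
    value: object = None


def isn(a):
    """Tag test 'isn A' ('Is A an integer?'): returns a Boolean value."""
    return Value(BOOLEAN, a.tag == INTEGER)


def add(a, b):
    """Numeric '+': fails if either operand carries an illegal tag."""
    numeric = (INTEGER, REAL)
    if a.tag not in numeric or b.tag not in numeric:
        raise TypeError("illegal tag for +")
    # Assumed result-type rule: integer only when both operands are integer.
    tag = INTEGER if a.tag == b.tag == INTEGER else REAL
    return Value(tag, a.value + b.value)


print(add(Value(INTEGER, 2), Value(INTEGER, 3)))
print(isn(Value(REAL, 1.5)))
```

Every operator pays for a tag inspection at run time, which is the cost the preceding discussion of VALC warns about and which Euler cannot avoid.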

A list is an ordered set of values, each of which is either an elementary type or a list. Lists can be created dynamically, and operators exist for enquiring the length, detaching the tail, selecting an element and concatenating two lists. The existence of reference variables causes the variable space to be maintained by scanning pointers and recovering space which is no longer referenced, updating pointers when compacting the active store areas.


The Euler program area consists of sequences of operator syllables (bytes), each followed by the appropriate number of bytes giving literal values or indices. The program is represented in reverse Polish form, e.g. the statement:

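    if v<n or t = 0 then d else e

would be represented by the following string of 27 bytes:

[Diagram: the 27-byte reverse Polish string. The 'or' syllable tests the top of stack (true: go to d; false: load @t, load literal zero and continue); the 'then' syllable tests the result (true: d; false: goto e)]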

Note that the @ operator forms a reference on the stack, which vale converts to the corresponding value. The translation is thus a simple reordering of the input string, replacing variables by [block number, displacement] pairs. The latter are converted into [mark number, loc] pairs on loading to the stack. In the program the logical connectives give a destination to which control passes if the top of stack element has the required value. Figure 4 gives the microcode for the and, or and then operators. A Boolean variable has the binary form '0011000y', i.e. tag 3 and value y = 1 for true. The microregisters IJ are used as program counter, UV points to the top of stack. For simplicity, the address incrementing microorders, which are really byte-serial, have been written as 'IJ + 1' etc.
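
The behaviour of a logical connective can be sketched in a few lines. This is not the microcode of Figure 4: the function name, the decision to pop a false operand, and the 3-byte syllable length (operator byte plus a 2-byte destination) are assumptions made for illustration. Only the byte form '0011000y' is taken from the text.

```python
BOOLEAN_TAG = 0b0011   # tag 3 occupies the upper four bits of the byte
SYLLABLE_LENGTH = 3    # assumed: operator byte plus 2-byte destination


def or_connective(stack, pc, destination):
    """Return the next program counter after an 'or' syllable at address pc.

    stack holds one byte per Boolean operand, in the form 0011000y.
    """
    top = stack[-1]
    if top >> 4 != BOOLEAN_TAG:
        raise TypeError("operand of 'or' is not Boolean")
    if top & 1:
        return destination          # true: result decided, branch
    stack.pop()                     # false: discard, evaluate second operand
    return pc + SYLLABLE_LENGTH


stack = [0b00110001]                          # Boolean true
print(or_connective(stack, 10, 40))           # control passes to the destination
stack = [0b00110000]                          # Boolean false
print(or_connective(stack, 10, 40), stack)    # falls through, operand removed
```

The tag check and the branch decision are made on the same byte, which is why the operation fits in so few microinstructions.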

The sample microsequence checks the tag of the operand and interprets the logical connective in 8 microinstructions, 4 main memory cycles, or 6 µsec (7.5 if false). The corresponding IBM 360 target instructions would take the form:

    CLI   0(STACK),LOGT
    BE    ORTRUE
    CLI   0(STACK),LOGF
    BNE   TYPERROR
    SH    STACK,='4'

The interpretation of that sequence takes 32 µsec if 'true', 90 µsec if 'false'. It occupies 24 bytes of program as opposed to 3. That puts microprogram interpretation in its most favorable light: dynamic type assignment, minimal arithmetic content and naive compiling techniques. It is easy to see that even with dynamic type assignment it is often possible for the compiler to predict the result of an operation as far as type is concerned, and to omit further checks, as in:

    if x=y ...

which must give a Boolean on top of the stack.

The advantage in space which results from the syllabic form of target instruction is a combination of two effects: the localisation of the operator/operand space implied by the source language, and the use of working registers implied by the stack. It would be possible to compress an operand 'address' to 3 or 4 bits, for example, provided changes of 'context', in which the full meaning of the operand is expanded, can be effected without excessive overhead. Unfortunately, very little is known about the consequences of one choice or another; it is not even clear that procedure boundaries should play a part in defining context. The use of a stack mechanism may not be optimal: we can see that some run-time maintenance activity is involved which a compiler could avoid, and it is known that the majority of expressions found in practice are of very simple forms which do not require the full generality of stack evaluation. Hoevel and Flynn (1977) suggest an alternative primitive form of instruction which recognises many important special cases. Space gains of up to 5:1 for Fortran compared with IBM System 370 optimising compiler are reported.

3.2 Cobol Interpretation

The major parts of a Cobol program are the Data and Procedure Divisions. The program operates on files of records and uses internal records for workspace. Each possible record format is declared in the Data Division: the same physical record may be mapped according to many different declarations, so there is no question of concealing representations or placing descriptive tags as parts of the record. The elementary items of data have a wide variety of representations with a dozen or so basic data types. The elementary items are named, and may be collected into named groups, which in turn may be grouped, up to the level of the record name itself. With the aid of PICTURE descriptions editing characters can be inserted in a field for output (and conversely for input) with the result that the 'type' code associated with a data item can be of almost any length.

Within a record individual items or groups of items may be repeated. The number of actual occurrences may vary, depending on a field in a fixed position in the same record. Repeated items are selected by following the repeated group or field name in the Procedure Division by one or more subscripts, or by using an implied Index value. The coefficients of the associated storage mapping function can be determined by the compiler.
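
A storage mapping function of the kind referred to is linear in the subscripts. The following sketch assumes one stride (in bytes) per level of repetition and subscripts counted from 1; the function name and the example record layout are hypothetical.

```python
def item_offset(base_offset, strides, subscripts):
    """Byte offset of a subscripted item within its record.

    offset = base_offset + sum((s_k - 1) * stride_k), with subscripts from 1.
    The strides are the coefficients a compiler would compute in advance.
    """
    if len(strides) != len(subscripts):
        raise ValueError("one subscript is needed per level of repetition")
    return base_offset + sum((s - 1) * c for s, c in zip(subscripts, strides))


# Hypothetical record: a table starting at byte 20 of 12 groups,
# each holding four 6-byte fields, so the strides are 24 and 6.
print(item_offset(20, (24, 6), (3, 2)))
```

Only the multiplications and additions remain for run time, and with an Index value even those can be carried out once and reused.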


The Procedure Division is composed of a number of Segments, whose significance derives from the days of programmed overlays. A Segment comprises a number of labelled paragraphs, each containing one or more sentences. A sentence consists of one or more Cobol statements.

Individual statements have a fairly simple syntax, a verb followed by data names and Segment or paragraph names, e.g.

    ADD P TO Q GIVING DAY-TOTAL ROUNDED

where P, Q and DAY-TOTAL are data names. The definition of Cobol implies strict observation of decimal rounding and truncation and is subject to the types of operands and the size of intermediate results (18 digits). The compiler is required to indicate if operands are incompatible, or if intermediate results are out of range. Some indication of verb frequencies is given by the following measures from a benchmark test:

VERB        DYNAMIC USAGE    STATIC USAGE

MOVE             30%             33%
IF               30%             18%
GOTO             11%             19%
ADD              10%              6%
PERFORM           7%              8%
WRITE             4%              3%
READ              3%              2%
Others            5%             11%
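
The share of the workload covered by the seven named verbs follows directly from the table; a short check, with the figures transcribed from the rows above:

```python
usage = {  # verb: (dynamic %, static %)
    "MOVE": (30, 33), "IF": (30, 18), "GOTO": (11, 19), "ADD": (10, 6),
    "PERFORM": (7, 8), "WRITE": (4, 3), "READ": (3, 2), "Others": (5, 11),
}

named = [verb for verb in usage if verb != "Others"]
dynamic_share = sum(usage[verb][0] for verb in named)
static_share = sum(usage[verb][1] for verb in named)
print(dynamic_share, static_share)
```

Both columns also total 100% once 'Others' is included, which is a useful check on the transcription.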

Thus for execution purposes seven verbs account for 95% of executed statements, while the same seven account for almost 90% of stored statements. The target code can be chosen purely as a compromise between compiler and microcode, without concern for reconstructing the source string (which affects [illegible] coding, for example). The final form depends on what are regarded as reasonable limits for field sizes in one Cobol source module. In the target instructions listed in Table 2 the maxima are taken to be:

Variables: 4096 ; Indices: 256 ; Files: 256 ; Data areas: 64

Procedure variables: 256.

In the design used here, which is based on a Cobol interpreter written for the ICL E1 emulator, each Cobol statement is represented by a sequence of 16-bit target instructions.

TABLE 2: A COBOL TARGET INSTRUCTION LANGUAGE

Format #1

    | f | n |

    f=0:  Source operand at DQT[n]
    f=1:  Destination at DQT[n]
    f=2:  Operand at DQT[n]
    f=3:  Operand n
    f=6:  Branch within code area, offset n

Format #2

      4   4   8
    | f | v | n |

    f=7:  n-byte literal operand, type v
    f=8:  Scale operand, partial result, ..., by n
    f=9:  Arithmetic; scale first operand by n
          v[ADD, SUBTRACT, SUBTRACT-GIVING, MULTIPLY, DIVIDE, DIVIDE-REMAINDER, ..., etc]
    f=10: Branch DEPENDING, via Procedure variable
    f=11: Branch n, depending on condition v
    f=13: v[MOVE, COMPARE, SET INDEX, DEBUG, STOP, and call RUNTIME support]

    RUNTIME: ACCEPT TIME, DATE, DAY, DISPLAY, OPEN, CLOSE, READ, WRITE, REWRITE, START, DELETE, CANCEL, CALL, EXIT, etc.
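
Decoding such a 16-bit instruction amounts to a few shifts and masks. The sketch below makes several assumptions that the table does not confirm: that f occupies the top four bits, that Format #1 carries a 12-bit n (consistent with the limit of 4096 variables), that Format #2 is laid out as f, v, n in that order, and that function codes not listed are rejected.

```python
FORMAT_1 = {0, 1, 2, 3, 6}          # f followed by a 12-bit n (assumed width)
FORMAT_2 = {7, 8, 9, 10, 11, 13}    # f, 4-bit v, 8-bit n


def decode(word):
    """Split a 16-bit target instruction into its fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("not a 16-bit instruction")
    f = word >> 12
    if f in FORMAT_1:
        return {"f": f, "n": word & 0x0FFF}
    if f in FORMAT_2:
        return {"f": f, "v": (word >> 8) & 0xF, "n": word & 0xFF}
    raise ValueError(f"function code {f} is not listed in Table 2")


print(decode(0x0123))   # source operand at DQT[0x123]
print(decode(0x9A05))   # arithmetic function 10, scale first operand by 5
```

With a 12-bit n every variable in a source module is reachable from a single instruction, which is what allows one Cobol statement to compile into a handful of 16-bit words.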

Cobol control structure is the source of some complexity because of the use of procedure variables and debugging options. Apart from the normal branching determined by GOTO statements it is possible to specify that a particular paragraph or sequence of paragraphs should be PERFORMed one or more times, or until a condition is satisfied (possibly varying some elements on each repetition). A simple compiler cannot tell in advance which paragraphs will be the subject of PERFORM, so it will insert a possible branch to a 'procedure variable' at the end of each paragraph: if PERFORM does not apply, the branch 'drops through' to the next paragraph in sequence. Further complication derives from the ALTER verb, which can be used to change the destination of a GOTO. Rather than change the stored object code the branch is again directed through the procedure variable table.
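
The procedure-variable mechanism can be sketched as a small interpreter loop. The representation is hypothetical: paragraphs are lists of (verb, argument) pairs, only PERFORM of a single paragraph is modelled, and the procedure variable holds either nothing or the point to return to.

```python
def execute(paragraphs, start=0):
    """Run paragraphs in sequence, routing each paragraph end via a variable."""
    exit_var = [None] * len(paragraphs)   # one procedure variable per paragraph
    trace = []
    p, s = start, 0                       # paragraph index, statement index
    while p < len(paragraphs):
        if s == len(paragraphs[p]):       # end of paragraph: consult variable
            target = exit_var[p]
            if target is None:
                p, s = p + 1, 0           # PERFORM does not apply: drop through
            else:
                exit_var[p] = None
                p, s = target             # return to just after the PERFORM
            continue
        verb, arg = paragraphs[p][s]
        if verb == "PERFORM":
            exit_var[arg] = (p, s + 1)    # set the return point, then branch
            p, s = arg, 0
        elif verb == "STOP":
            break
        else:                             # any other verb: record and continue
            trace.append(arg)
            s += 1
    return trace


program = [
    [("DISPLAY", "a"), ("PERFORM", 2), ("DISPLAY", "b")],
    [("DISPLAY", "c"), ("STOP", None)],
    [("DISPLAY", "d")],
]
print(execute(program))
```

The stored code is never modified: the same end-of-paragraph branch serves both the PERFORMed and the drop-through case, and an ALTERed GOTO could be handled through the same table.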

The complication arising from debugging is that any attempt to access a named data item, paragraph, file or index may be required to enter a debug procedure. In most compilers that means that the code generated for handling debugged elements is different from (and slower than) normal code, even when executing with DEBUG OFF. In interpretive systems the same target code is generated in all cases and the branch is taken in the interpreter.


In the Data Division all names are mapped unambiguously into indices in the lists of data qualifiers (DQT), file and index table. Procedure variables are indexed in the Procedure Division. Information built up during the compilation phase can be carried over into execution without change in many cases. Figure 5 shows the modular structure of Cobol as far as it affects the interpreter. The DQT contains a 64-bit descriptor for each variable, giving:

• the index of the base pointer for the record currently containing the variable

• offset and limit of the variable within the record area

• whether the debug option applies

• operand type and scaling information

• if subscripted, the index of mapping parameters in the subscript information table

• if edited, the index of editing parameters in the edit information table.

At runtime the data qualifier element DQT[n] is interpreted to give the address pointer to a sequence of bytes (or bits) within the area defined by the base. About 20 microsteps are required to extract the data attributes and place them in microregisters, followed by whatever is needed to extract the data itself and present it for the next operation. Hence the management of the DQT represents a significant part of the interpretive overhead.
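
Interpretation of a data qualifier, including the debug branch taken inside the interpreter, might look like this in outline. The sketch is not the E1 microcode: the field names are hypothetical, the 'limit' is treated simply as a length in bytes, and type, scaling, subscript and edit information are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualifier:
    base_index: int       # which base pointer holds the current record
    offset: int           # start of the item within the record area
    length: int           # assumed meaning of 'limit': item length in bytes
    debug: bool = False   # whether the debug option applies


def fetch(n, dqt, bases, store, on_debug=None):
    """Interpret DQT[n] and return the bytes of the item it describes."""
    q = dqt[n]
    if q.debug and on_debug is not None:
        on_debug(n)                       # branch taken in the interpreter
    start = bases[q.base_index] + q.offset
    return bytes(store[start:start + q.length])


store = bytearray(b"....0042SMITH...")
bases = [4]                               # record currently starts at byte 4
dqt = [Qualifier(0, 0, 4), Qualifier(0, 4, 5, debug=True)]
seen = []
print(fetch(0, dqt, bases, store))
print(fetch(1, dqt, bases, store, on_debug=seen.append), seen)
```

The calling target code is identical whether or not the item is being debugged; only the flag in the qualifier differs.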

In measuring Cobol performance the time and space requirements of a set of test statements were measured, and final figures of merit obtained by weighting the results according to dynamic or static usage. For space, a gain of 1:3 resulted in comparison with the ICL 1900 program requirements. It appeared possible to improve on that by adding to the function set. For time, an overall improvement of 1:2.5 was observed in comparison with the conventional compiler on the ICL 1900. That figure is disappointing. It is accounted for in part by the arithmetic complexity of Cobol. Nevertheless the average Cobol statement appears to need about 200 microsteps (as opposed to 500), and in several instances the conventional compiler generates code that runs faster than the interpreter, for much the same reason as we saw earlier in looking at Algol implementations. However, another factor proves to be significant: the time spent in the interface between the language interpreter and the supporting SIL.


4. INTERPRETIVE SYSTEM DESIGN

Improving on the range-defined instruction sets of fifteen years ago without meeting comparable system objectives is not particularly difficult. To present a realistic alternative it must be shown how programming standards can be maintained through a very wide power range; it must be possible to develop and maintain new languages and subsystems taking full advantage of the architecture without endangering system security; storage and control structures must be created to suit modern applications rather than those of the early 1960's. As far as I know, no 'microsystem' has been developed with the required properties. Even so, it is not sufficient to show that variable microcode achieves better results than fixed instruction sets: we also need to be convinced that it is the best way of using modern technology. In this lecture I shall draw together some of the results observed in language-oriented machine design and suggest two alternative system frameworks in which the demonstrated advantages could be retained.

4.1. The Effect on Language Parameters

As I have already indicated, many of the measures of language performance are affected strongly by the choice of supporting system, which we suppose to be reflected in the semantics of the System Implementation Language (SIL). For example, suppose the SIL is in fact a copy of the Executive package of a conventional machine range, and that a Cobol application package is obeyed (a) using the fixed instruction set and (b) using a Cobol target code such as discussed in the last lecture. Then the observable effect on storage requirements would be as follows (using typical figures for the ICL 1900):

                                              (a)             (b)
                                          Fixed Instr.    Fixed+Cobol

Fixed instr. µcode                          16 Kbyte        16 Kbyte
Cobol target µcode                           0               9 Kbyte
Executive (kernel) functions                16 Kbyte        16 Kbyte
System functions (spooling,
  command language, etc)                    20 Kbyte        20 Kbyte
Cobol run-time support                      25 Kbyte        25 Kbyte
Cobol application - data (say)               9 Kbyte         9 Kbyte
                  - code (say)               9 Kbyte         3 Kbyte

Total                                       95 Kbyte        98 Kbyte
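
The column totals can be checked, and the source of the difference isolated, with a few lines of arithmetic (figures transcribed from the table; the row labels are abbreviated):

```python
rows = [  # (component, (a) fixed instruction set, (b) fixed + Cobol), in Kbytes
    ("Fixed instruction microcode", 16, 16),
    ("Cobol target microcode", 0, 9),
    ("Executive (kernel) functions", 16, 16),
    ("System functions", 20, 20),
    ("Cobol run-time support", 25, 25),
    ("Cobol application data", 9, 9),
    ("Cobol application code", 9, 3),
]

total_a = sum(a for _, a, _ in rows)
total_b = sum(b for _, _, b in rows)
print(total_a, total_b)

# Rows that differ between the two columns, with the change in Kbytes.
changes = {name: b - a for name, a, b in rows if a != b}
print(changes)
```

The 6 Kbyte saved in application code is outweighed by the 9 Kbyte of added control memory, and the remaining 83 Kbyte is untouched.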

In other words, the reward for a great deal of effort and investment in control memory is negligible as far as storage is concerned. Of course, one can present the picture in other ways and use the speed gain to advantage if there is sufficient I-O capacity, but the point remains that unless the support system gains similar advantages from the interpretive techniques the improvement in language performance will be seriously diluted. Let us assume, therefore, that the SIL itself benefits from the use of microprogram. The effect may be seen as space reduction and a gain in speed; more probably it will be seen as improvement in function and flexibility. In reviewing the parameters listed earlier some of the requirements of the SIL will be noted.

(i) Compile and Load Time.

Substantial (say a factor of 5) gains in speed can be made in the portions of a compiler concerned with lexical and syntax analysis, and to a lesser extent in code generation, by microcode interpretation of syntax tables. Where in-line coding has been used in the past the speed gain is smaller but significant saving in space is achieved by table-driven techniques. Compile time is indirectly affected by the choice of object code under (ii).

Load time is normally determined by the supporting system. If all programs have to be mapped into a (virtual or real) linear store the time and space overheads in starting a job step may be significant (comparable with the compiler itself in many conventional systems). Moreover, the operating inconvenience is significant and may result in such anomalies as separate 'batch' and 'load-and-go' language systems. There is no reason, however, why the SIL functions should not allow program execution with explicit structure. For example, the operating environment shown in Figure 5 can be maintained with no appreciable execution overhead on the part of the SIL. In that case, the load time is negligible.

(ii) Execution Time

Excluding arithmetic and I-O, execution time is governed by the time of access to variables and the change of control environments, i.e. the subsets of the program space immediately available from particular points in the program. It is the 'localisation' of the environment which allows short addresses to be used and produces the greatest contribution to code compaction. The diagram shows the components of a generalised access chain. Data elements are assumed to be created in blocks (activation records or file areas) which are not necessarily contiguous in store, but selectable by an index n. Data identifiers in the source text are mapped into indices m, which are used to refer to a table of attributes (cf the DQT in Cobol) which give record pointer, offset, size, type, and possibly other information derived by the compiler and required during execution. In general, several sets of attributes may refer to the same record, and one set of attributes can refer to several record areas (through dynamic adjustment of the control environment).

[Diagram: generalised access chain. OBJECT CODE refers to ATTRIBUTES (size, type, ...), which select an entry in the CONTROL ENVIRONMENT (entries b to b + n) pointing into DATA STORAGE]
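
The access chain can be traced in a short sketch showing how one set of attributes reaches different record areas when the control environment changes. The names and the flat list used for data storage are assumptions; a real implementation would hold these tables in scratchpad or control store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attributes:
    n: int        # selects a block through the control environment
    offset: int   # position of the item within the block
    size: int
    type: str


def access(m, attributes, environment, storage):
    """Follow object code index m through attributes and environment to data."""
    a = attributes[m]
    start = environment[a.n] + a.offset
    return a.type, storage[start:start + a.size]


storage = list(range(100, 140))
attributes = [Attributes(n=0, offset=2, size=3, type="integer")]
environment = [10]                  # block 0 currently starts at address 10
first = access(0, attributes, environment, storage)
environment[0] = 30                 # change of control environment
second = access(0, attributes, environment, storage)
print(first, second)
```

The object code carries only the short index m; everything that varies between activations is confined to the environment table.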

Languages differ in the amount of attribute information carried into the execution phase, the method of changing the control environment, the time at which attributes are assigned, and hence in the ways of distributing components of the access chain in storage. In Fortran, for example, attributes and record pointers can be absorbed into the object code; in APL the object code and attributes are dynamically assigned; in Algol the (g,n) pair and size can be absorbed into the object code while the type is sometimes attached to the data in the form of a tag. Where explicit maintenance of attribute and environment is demanded by the language there can be significant gains from using microcode. The ratio of addressing and control instructions to arithmetic in the output of a conventional compiler is in the region of 4:1, so assuming a 5:1 speed increase from microcoding the former an overall speed gain of 5:1.8 or 2.8:1 is indicated. One would expect more for the highly structured or 'dynamic' languages. Further speed gains can be expected where specialised arithmetic functions are called for, e.g. array, complex, controlled precision or character string manipulation. A minimum overall gain of 3:1 in speed of a 'production' compiler to range standards would be a realistic objective for the languages in common use.
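
The 5:1.8 figure follows from treating the workload as 4 parts addressing and control to 1 part arithmetic and speeding up only the former. A minimal sketch of that calculation (the function name is hypothetical):

```python
def overall_gain(ratio, speedup):
    """Overall speed gain when `ratio` parts of the work are speeded up by
    `speedup` and the remaining 1 part (arithmetic) is unchanged."""
    before = ratio + 1.0
    after = ratio / speedup + 1.0
    return before / after


print(round(overall_gain(4, 5), 1))     # 4:1 instruction mix, 5:1 microcode gain
print(round(overall_gain(4, 1e9), 1))   # limit as the speeded-up part vanishes
```

Even an unlimited speed increase on addressing and control could not do better than 5:1 overall with this instruction mix, which is why specialised arithmetic functions are mentioned as a further source of gain.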

A language allowing free assignment of pointers (reference variables) entails potentially serious support overheads in the assignment and recovery of space, not necessarily eliminated by the provision of a large virtual store. Even if the SIL recognises pointers it seems preferable for the language subsystem to undertake its own space management to take advantage of known local characteristics. The language 'pointer' is evaluated in terms of the underlying program structure at the time of use: that operation occurs frequently and benefits from processor adaptation to the extent that once an evaluation has been carried out the result can be used repeatedly on successive items of data. It is then required of the SIL to allow language interpreters to work with 'absolute' as well as virtual addresses. In the next subsection we shall see what that implies. (The alternative of having both the SIL and the language microcode work in a virtual space supported by hardware can be disregarded because of the delay in accessing memory and the poor store utilization that results.)

Space management functions are principally concerned with searching for and updating pointers and physically moving blocks of data. They are time consuming and in many languages their use is discouraged by artificial means, so the gain from making them more efficient would be seen in program flexibility (in the user language and the SIL) rather than in execution time.

(iii) Size of Support System

The SIL code benefits in two ways: in many situations, e.g. in compiling to language-oriented code, it has to do less; and it does it more efficiently than other high level system programming languages, or more elegantly than a macroassembler. Size reductions in the region of 5:1 have been achieved for compilers. Each language microcode represents a space overhead of at least 10 Kbytes, plus a similar amount for the resident SIL.

(iv) Object Program Size

Tailoring the object code to fit the source language shows the clearest gains over conventional systems because of the elimination of unnecessary function, register and address bits. An overall reduction in procedure size of 4:1 for large programs, including attribute tables, would be a realistic aim. No significant gains in data mapping over a conventional system with word and character addressing can be expected. Gains in space can be seen as gains in main memory and channel capacity and to a smaller extent in file space.

(v) Diagnostic Aids

As any APL user discovers, interpretive methods can give exceptionally good diagnostic information, sufficient to overcome eccentricities of the language itself. Unfortunately, diagnostic quality is one that cannot be measured and is often overlooked in favour of marginal improvements in the others.

4.2 Microsystem Problems

The use of microprogram brings its own problems, and raises the question of whether the implied comparison with machines of the mid-60's was the correct one to use. In the system context, the obstacles to using interpretive microprogram are as follows.

(A) Range Definition

The microprogram appropriate to a high performance machine is quite different from that of a slower microprocessor. There is also an absolute speed limitation: a machine executing target instructions at 10 MIPS is obeying microorders at least 10 times as fast, which is beyond the power of vertically encoded (i.e. easily programmed) host machines.

(B) Security

Microprogram derives part of its speed advantage by ignoring the security checks inherent in fixed instruction sets. For a small amount of microprogram under control of the manufacturer that is tolerable. The language performance figures obtained in practice give the interpreter responsibility for resources normally regarded as protected, i.e. absolute addresses, in which case the security of the system is in the hands of language implementors.

(C) Flexibility

Microprogram is a static form of code. It cannot easily be moved in store. Fast control memories and scratchpads are necessarily small, so the problems of sharing resources between interpreters and scheduling their use have to be solved.

Of the above, (B) alone is sufficient to prevent widespread use of microprogram in commercial systems. Four types of response can be recognised:

1. Embed the Microprogram in a Conventional System

We have already noted that the space and time advantages are diluted in the context of a conventional system; nevertheless, those that remain are obtained with minimum investment in redesign. The IBM APL Assist Feature running under DOS/VS, OS/VS1 and OS/VS2 has been made available on the System/370 Models 135, 138, 145 and 148 (Hassitt and Lyon (1976)). It consists of an additional 20 Kbytes of microprogram, resident in main store, which interprets APL statements. It carries out virtual-to-real address translation according to the rules of the host system, but returns control to the host to service interrupts and page faults. Hence, system integrity depends upon correct use of addresses in the APL microcode.

2. Extend Security Boundaries to the Microprogram Level

The in-line checks that can be used without impairing performance are restricted to key comparison, lockout on fixed sized blocks of store, etc. The El emulator provides write protection on 16-word frames of scratchpad, 64-word frames of control memory, 16 Kword frames of main memory and all I-O multiplex positions. The main drawback to such schemes is their inaccuracy and the difficulty encountered in handling dynamically changing or moving programs, which occur quite frequently in modern systems.

3. Control Address Formation in Microcode

An alternative, which can be seen as a generalisation of the first approach, is to validate addresses when they are formed, then to restrict their use so that further checks are unnecessary. The SIL is responsible for forming addresses (from segment capabilities); the language microcode can modify them within given limits and access the store directly. Addresses are distinguished by tags so that the SIL can find and update them when necessary, independent of the source language. This method is used in the Variable Computer System (Iliffe and May (1974)) on the El emulator, which makes provision for tag manipulation. For complete security, however, specialised hardware support is necessary.
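
The division of labour described here can be sketched in a few lines. This is an illustration under stated assumptions, not the VCS design: the class `TaggedAddress`, its fields and the four functions are hypothetical names, the 'tag' is modelled by the Python type of the value, and segments are measured in words.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TaggedAddress:
    """An address distinguished from ordinary data by its type (the 'tag')."""
    base: int        # absolute position of the segment in store
    limit: int       # number of words that may be addressed
    offset: int = 0  # current position within the segment


def form_address(base: int, limit: int) -> TaggedAddress:
    """SIL step: validate once, at the moment the address is formed."""
    if base < 0 or limit <= 0:
        raise ValueError("invalid segment")
    return TaggedAddress(base, limit)


def modify(addr: TaggedAddress, delta: int) -> TaggedAddress:
    """Language microcode may move an address, but only within its limits."""
    offset = addr.offset + delta
    if not 0 <= offset < addr.limit:
        raise IndexError("modification would leave the segment")
    return replace(addr, offset=offset)


def read(store: List[int], addr: TaggedAddress) -> int:
    """Direct store access; a well-formed address needs no further check."""
    return store[addr.base + addr.offset]


def relocate(values: list, old_base: int, new_base: int) -> list:
    """SIL step: find addresses by tag and update them, ignoring other data."""
    return [replace(v, base=new_base)
            if isinstance(v, TaggedAddress) and v.base == old_base else v
            for v in values]


store = list(range(100, 200))
a = modify(form_address(10, 4), 3)
print(read(store, a))  # 113
```

Nothing in this sketch stops a caller from constructing a `TaggedAddress` directly, which mirrors the point in the text: the scheme relies on the language microcode behaving correctly unless hardware enforces the tag.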

4. Separate the Language Processors Physically

This is a special case of the second approach, attractive because technology is available in the form of low-cost microprogrammable machines. The separation is conceptually physical, in the form of multiple processor-memory pairs, but it could be achieved by time-slicing.

From the general design viewpoint either of the last two approaches can be used to provide a viable system model. Each intends to cover a wide range of performance by using multiple computers. From 3 it can be seen that because access to program space is controlled the SIL and user programs can coexist in the main memory and control store (if it exists), and that programs can be distributed over the available memory space. This 'distributed program' model is well suited to the class of applications with dynamically changing program requirements, or which can be expressed in terms of cooperating parallel processes.

From 4 a more specialised 'dedicated language' model is derived. Each program, together with its interpreter, has unrestricted use of the local memory space of a processor-memory pair during execution, but it is rolled in and out by the scheduler which forms part of the SIL. The SIL microcode and system procedures can be protected by holding them in read-only memory. Access to shared data or to overlays must be through some form of secondary store manager, which checks the rights of the user against declared accessibility of the data, a relatively slow operation. The disadvantages of the dedicated-language model are the sensitivity of programs to physical store sizes, the amount of unproductive traffic between central (i.e. secondary) memory and language processors, and the poor utilization of processor and memory resources (if it is argued that processors and memory are give-away items, why bother with microprogram at all?). Nevertheless, such a system is in many ways the easiest to understand, it is least affected by failure of one of the processor-memory pairs, and it lends itself to the 'personal computer' mode of working in the same way that private cars lend themselves to private transport, however inefficient.

Each model presupposes the use of a system implementation language (SIL) whose aim is to provide a set of functions that can be used in all language applications to reduce development effort and code duplication at both micro- and target machine levels. In so doing it sets standards that can also be used in the variable part. There is no doubt that certain operations such as input-output and frequently used arithmetic procedures are properly part of the SIL. How far one can go depends on the type of system: if the integrity of system data cannot be guaranteed (which is the case for dedicated-language models) the amount of support the SIL can give is limited. On the other hand, commitment of the SIL to support facilities that are rarely used complicates the system and wastes resources. The interesting design area is thus the 'fringe' of functions just inside or just outside the SIL, which I can best illustrate by reference to the Variable Computer System developed on the El research emulator and later transferred to another host machine.

4.3 An Example of a SIL: The Variable Computer System

VCS is implemented at two levels of control: microprogram and the system target language (VCSL) in which all compilers and system utilities are written. The VCS procedures can be called either at microcode or at machine code level. It follows that if a microprogrammed procedure is called from machine level, or vice-versa, some code must be obeyed to adapt from one level to the other. It is undesirable to impose restrictions at this point because one cannot always predict whether a procedure will be committed to microprogram; the discrimination must be dynamic or immediately before task initiation, at worst. For that reason the list of procedure activations associated with any process contains both micro and machine level linkage information. Again, it is undesirable to impose limits on the depth of procedure call, therefore linkage information is stacked in main memory, the host machine link stack having very limited use.

Procedure activations form part of the process state vector (PSV), which also contains VCS registers, environment pointer, current program pointer and various flag bits that are mapped into the host registers. As calculation proceeds it is possible that other host registers will be used, but it is required that all state information will be contained in the PSV at points where a change of procedure or process may occur. In that way the VCS can effect process management without explicit knowledge of the language state, and with a fair degree of independence of the host machine. Similarly, by recognising tagged addresses the VCS can carry out store management without explicit declaration of the mapping used in current processes.

Procedure entry and exit is controlled through a dynamic chain of marked links. The purpose of the marks is to distinguish task initiation, system call and user procedure calls, allowing various levels of restart to be employed and providing excellent diagnostics at both control levels.
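
One way to picture a chain of marked links is as a list of (mark, procedure) pairs, innermost activation last. The sketch below is an interpretation, not the VCS mechanism: the three mark names, the pair representation and the reading of 'levels of restart' as cutting the chain back to the innermost link of a given mark are all assumptions made for illustration.

```python
TASK_INITIATION = "task"
SYSTEM_CALL = "system"
USER_CALL = "user"


def unwind_to(chain, mark):
    """Return the chain cut back to the innermost link carrying `mark`."""
    for i in range(len(chain) - 1, -1, -1):
        if chain[i][0] == mark:
            return chain[:i + 1]
    raise LookupError(f"no link marked {mark!r}")


def trace(chain):
    """Diagnostic listing, outermost activation first."""
    return [f"{mark}:{name}" for mark, name in chain]


chain = [
    (TASK_INITIATION, "payroll"),
    (USER_CALL, "read_record"),
    (SYSTEM_CALL, "io_wait"),
    (USER_CALL, "handler"),
]
print(trace(unwind_to(chain, SYSTEM_CALL)))
# ['task:payroll', 'user:read_record', 'system:io_wait']
```

Restarting at the task level would be `unwind_to(chain, TASK_INITIATION)`, which discards every activation except the first.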

The interpretation to be placed on a program segment is indicated by a control type assigned to a particular compiler. Control type zero is used for pure data: any attempt to obey it will fail. Control type 1 is for system use, type 2 for VCSL target code, and type values for language extensions, e.g. to Cobol, APL, etc, are assigned 3, 4, ... on a global basis. The control type is examined on procedure call and return (in the case of machine level code), branching to the appropriate interpreter.
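
The control type acts as a selector for the interpreter to be applied to a segment. A minimal sketch of that dispatch follows; the `Dispatcher` class, the (control type, body) representation of a segment and the particular values given to `COBOL` and `APL` are assumptions for illustration, whereas types 0, 1 and 2 follow the text.

```python
PURE_DATA, SYSTEM, VCSL_TARGET = 0, 1, 2  # values given in the text
COBOL, APL = 3, 4                         # illustrative global assignments


class ControlTypeError(Exception):
    """Raised when a segment cannot be obeyed."""


class Dispatcher:
    def __init__(self):
        self._interpreters = {}

    def register(self, control_type, interpreter):
        if control_type == PURE_DATA:
            raise ValueError("control type 0 is reserved for pure data")
        self._interpreters[control_type] = interpreter

    def call(self, segment):
        """Examine the control type on procedure call and branch accordingly."""
        control_type, body = segment
        if control_type == PURE_DATA:
            raise ControlTypeError("attempt to obey pure data")
        try:
            interpreter = self._interpreters[control_type]
        except KeyError:
            raise ControlTypeError(
                f"no interpreter for control type {control_type}") from None
        return interpreter(body)


d = Dispatcher()
d.register(VCSL_TARGET, lambda body: ("VCSL", body))
d.register(COBOL, lambda body: ("Cobol", body))
print(d.call((COBOL, "MOVE A TO B")))  # ('Cobol', 'MOVE A TO B')
```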

It can be seen that the PSV's are key control structures that must be protected if system security is to be ensured. The most efficient and flexible basis for protection is a capability scheme such as that of the Basic Language Machine. Many of the VCS functions are concerned with creating and manipulating abstract system objects in a consistent way, the PSV's being the representation of the abstract idea of a 'process'. In particular, we find functions for:

(i) setting up operating environments (bases) and defining the resources found in them;

(ii) creating, starting and stopping processes;

(iii) entering and leaving procedures;

and (iv) controlling access to resources.

Here a 'resource' is a storage segment, PSV, I-O device, or a set of resources. The recursive nature of this definition allows each base to be constructed as a tree. Clearly, the integrity of any object depends in the end on maintaining the integrity of its representation, i.e. the store, and of the procedures that are applied to it, i.e. the activation records contained in the PSV's.

Program structure is dynamic. A new base is able to share the information available to its 'parent' at the time of its creation, with the effect that a hierarchy of bases is set up with the 'system' at the apex. The base structure is important in building language subsystems and dependent application environments: Figure 5 shows a typical three-level base structure to which one or more Cobol modules might be attached.

[Diagram. Labels: SYSTEM BASE; MODULES; COMPILER; RUNTIME SUPPORT; SUBSYSTEM DEVICES & PROCESSES; TEST PROGRAMS; COBOL OBJECT CODE; DATA BUFFERS; RECORD AREA POINTERS; DATA QUALIFIERS; EDIT INFORMATION; STORE MAPPING; INDEX TABLE; PROCEDURE VARIABLES; FILE DESCRIPTORS; INITIALISING CODE; DEBUG CONTROL QUALIFIER; FROM PROCEDURE DIVISION; FROM DATA DIVISION.]

Figure 5: VCS Base Hierarchy

Resources are defined by various types of capability, found in capability segments at the branch points of the program tree. The most time-critical VCS functions are those concerned with forming addresses from segment capabilities (codewords), and with using them to access memory. For system reasons a codeword refers indirectly to store via a global segment table (GST). The corresponding address retains the GST index in order to check the accessibility and position of the segment, which happens each time an address is loaded into a register (from the PSV). The access code is used to control shared (read-only) access by several processes or unique (update) access by individuals. All such control and conversion together with the recycling of GST indices and memory is exercised by VCS microprogram, which provides a good example of the application of microcode to system problems.
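
The role of the GST can be sketched as follows. This is a simplified model under stated assumptions rather than the VCS microprogram: the names `Segment`, `Address`, `form_address`, `load_address` and `release` are hypothetical, a codeword is reduced to a bare GST index, and processes are identified by strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Segment:
    """One global segment table (GST) entry; field names are illustrative."""
    base: int                     # absolute position of the segment in store
    size: int                     # segment length
    readers: Set[str] = field(default_factory=set)  # shared (read-only) users
    writer: Optional[str] = None  # holder of unique (update) access


@dataclass(frozen=True)
class Address:
    gst_index: int  # retained so accessibility and position can be rechecked
    offset: int
    update: bool    # True for unique (update) access, False for shared read


def form_address(gst: Dict[int, Segment], index: int, process: str,
                 update: bool = False) -> Address:
    """Form an address from a codeword (here just a GST index)."""
    seg = gst[index]
    if update:
        if seg.writer not in (None, process) or seg.readers - {process}:
            raise PermissionError("segment is in use by another process")
        seg.writer = process
    else:
        if seg.writer not in (None, process):
            raise PermissionError("segment is being updated by another process")
        seg.readers.add(process)
    return Address(index, 0, update)


def load_address(gst: Dict[int, Segment], addr: Address) -> int:
    """Check applied each time an address is loaded into a register."""
    seg = gst.get(addr.gst_index)
    if seg is None:
        raise LookupError("segment no longer exists")
    if not 0 <= addr.offset < seg.size:
        raise IndexError("address outside segment")
    return seg.base + addr.offset  # absolute location, valid until next reload


def release(gst: Dict[int, Segment], index: int, process: str) -> None:
    """Give up whatever access `process` holds on the segment."""
    seg = gst[index]
    seg.readers.discard(process)
    if seg.writer == process:
        seg.writer = None


gst = {7: Segment(base=4096, size=16)}
a = form_address(gst, 7, "P1")  # shared read
form_address(gst, 7, "P2")      # a second reader is allowed
print(load_address(gst, a))     # 4096
```

Because the address keeps the GST index rather than an absolute location, moving the segment only requires changing `base` in the table; the next `load_address` picks up the new position.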

The 'read', 'write' and 'modify' instructions which should strictly speaking be found on the VCS function list are too critical to handle by microsubroutine call. Users are therefore allowed to issue them directly for binary data and trusted to observe the limit and protection codes.

[Diagram: formats of the CODEWORD (fields include access control and GST index) and of the ADDRESS (fields include type, GST index and byte location).]

In the course of design numerous candidates for positions in the VCS function list have to be considered. A fundamental problem in extending the system is to achieve valuable effect without degrading overall performance. Sometimes a microcode branch is obtained 'for free', while at other times a new facility entails extra tests in a critical path. The available control store in a range of host machines has also to be considered. Options considered in that light are:

(i) selection of set elements by key rather than index value;

(ii) provision of paging facilities;

(iii) static chaining in the procedure activation list;

(iv) introduction of a third segment type consisting of a set of tagged elements;

(v) use of semaphore variables for interprocess communication.

There are many possible variations of the addressing rule such as (i) and (ii) but each entails a loss of space or time that skilled programmers will try to circumvent. The best programming environment appears to be a set of dynamically constructed, variable sized segments: they make optimal use of store and their access overheads are well understood. It is left to subsystem designers to map programs efficiently onto the tree structure, so that the store management implicit in a language such as APL is carried out in part by the language subsystem (which is aware of the details of APL usage) and in part by VCS functions which provide the containers for the APL workspaces.

VCS procedures are not intended to represent high level control structures directly, though they happen to be adequate for VCSL and simple languages such as Fortran. Recognition of static levels involves extra work in procedure management and a variety of actions dealing with special cases that could not be built into a fixed system, so it is intended that such structures be mapped by the language microcode into simulated control stacks. It seemed probable that mapping a display structure such as those found in Algol-derived languages would benefit from the ability to manipulate sets of addresses, but the practical implementations studied so far have used indirect mapping techniques, i.e. a new form of 'pointer' peculiar to the language is invented and mapped dynamically onto the VCS structures (cf. the Data Qualifiers in Cobol). The advantage of such techniques is that they can take account of language parameters in the design of pointers, but we noted earlier that 20 or more microsteps may be taken to reconstruct the absolute VCS address.

Finally, various forms of semaphore signalling were considered, but only a minimal 'busy' flag was implemented in the PSV. The argument against greater elaboration is that the access mechanism of the Global Segment Table already provides direct control over shared resources, associating the control variable with the resource itself, so there is little point in providing more obscure functions to the same end. The release of a segment for rescheduling at the end of a critical section is not automatic: to force it at procedure exit, for example, would again imply intolerable overheads, so an explicit VCS Release function is required.

The Variable Computer System provides support for language-oriented microprograms in easily portable form: an investment of about 8 Kbytes of microcode transfers the VCS functions, VCSL support codes, compilers, utilities, etc to a new host machine. It provides the type of support which is needed if the advantages of microcode are to be fully realised for each language, and although the function list could be improved in the light of experience I think it is a sound method of exploiting the current generation of general purpose emulators, acknowledging that system security rests on the correct design of language interpreters.

4.4 Future Developments

Careful choice of words has left the most critical question unanswered: leaving aside short-term expedients, is a general purpose host machine with two levels of writable control the best starting point for processor design? I think not, for three reasons.

Firstly, the arguments that have been used are based on measures of high level language implementation, whereas a substantial part of information processing still lies outside that well-defined area. Several systems of mediocre performance and limited applicability have resulted from the assumption that a high level language or set of languages would cover the field. On the other hand without the formality of high level constructs it is difficult to see how to make use of writable control memory.

But even accepting the limitations of high level languages it can still be argued that the interpretive approach is not optimal in many instances and that the system problems outlined earlier have still not been solved. It has to be shown that there is a better approach to language implementation with the range and flexibility of conventional systems. We begin by drawing a distinction between the inherent coding advantages of microprogrammed interpretation and the benefits which result from using fast storage or ducking behind the range architecture.

Microprogrammed interpreters have improved on fixed, complex target instruction sets to the extent that much of the redundant information in the instruction stream has been eliminated. The figures given earlier show a reduction from 500 to 200 microsteps for the average Cobol statement, or a reduction from 15:1 to 6:1 in the ratio of support steps to useful arithmetic and logic. That suggests there is still room for improvement, which might be found in a hybrid form of control in which in-line and interpretive methods can be mixed. After all, an interpreter is simply a means of calling a subroutine from the target instruction stream: its weakness is that the interpretive overhead is paid on every syllable. In other words, if we think in terms of an 8-bit function syllable, 128 codes might be assigned to hard-wired functions, the other 128 to procedure entries in a variable 'control environment'.
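
The 8-bit syllable split can be illustrated with a toy stack machine. The sketch assumes, purely for illustration, that codes 0 to 127 select hard-wired functions and codes 128 to 255 index a replaceable control environment; the functions `push_one`, `add` and `double` and the table layout are hypothetical.

```python
HARDWIRED_CODES = 128  # assumption: syllables 0..127 are hard-wired


def run(program, hardwired, environment):
    """Obey a sequence of 8-bit function syllables on an initially empty stack."""
    stack = []
    for syllable in program:
        if not 0 <= syllable <= 0xFF:
            raise ValueError("syllable does not fit in 8 bits")
        if syllable < HARDWIRED_CODES:
            hardwired[syllable](stack)                      # in-line function
        else:
            environment[syllable - HARDWIRED_CODES](stack)  # procedure entry
    return stack


def push_one(stack):
    stack.append(1)


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def double(stack):
    stack.append(2 * stack.pop())


hardwired = {0: push_one, 1: add}  # fixed for the machine
environment = {0: double}          # variable per language or program
print(run([0, 0, 1, 128], hardwired, environment))  # [4]
```

Swapping `environment` for another table changes the meaning of syllables 128 and above without touching the hard-wired half, which is the sense in which in-line and interpretive control are mixed.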

The starting point I suggest is that each language should be analysed from the point of view of minimising the product of microsteps and space in the representation of programs, covering both instruction and descriptor decoding. I expect, though I do not know of a fully tested example, that the best code a compiler can produce will be a mixture of microsteps and monosyllabic procedure calls. In other words, the separation into 'interpreter' and 'target' code is no longer relevant.

The problem of presenting the control stream to the processor at high speed cannot be solved by committing the entire interpreter to control memory because it is now diffused through the program space. As it happens, it was not at all clear how to do that in a flexible manner for a general purpose multilanguage system. The conversion of 'microsteps' to 'nanoseconds' can best be treated in the broader context of speeding up memory access rates: look ahead, use cache buffers, or in the last resort pay more, but do not attempt to deal specifically with the restrictions of control memory or scratchpad. It will be noted in passing that for the multicomputer architectures envisaged the path from memory to processor is shorter than that of a centralised system with shared store highways, therefore the benefit of high speed control memory would be less marked.

Returning to system problems, we are left with (A) range cover, which it was (and still is) hoped to achieve using multiple computers, and (B) security. The dedicated-language system is not affected by the use of hybrid control: no assumptions are made about program security. The distributed-program system does depend on controlled address formation, which was achieved in the Variable Computer System by a policy of trusting the language subsystems. With hybrid control it becomes imperative to have hardware-enforced protection. It is also the case that many of the key VCS functions at present implemented by microsubroutine calls could be implemented by in-line code.

The above discussion has been based on vaguely defined 'microsteps' comparable with the vertical microinstructions of present-day machines. The reader may feel concerned at reverting to a processor style not far removed from that of twenty years ago. Is there a danger of inventing more and more complex microsteps and repeating the evolutionary cycle that led to the IBM System/360 and other 'range' architectures? The return in space that can be expected from more complex instructions depends on finding frequently repeated digrams or n-grams that can be suitably packaged. They are more likely to occur in arithmetic, where 'hardened' floating point and decimal operation can be expected, than in control sequences. It would not be surprising to see the host arithmetic functions develop in the direction of current machine codes (with type interpretation placed on descriptor or tag fields), but the many modes of data access appear to benefit very little from complex addressing rules.
