IMPLEMENTATION OF AN ARRAY PROCESSOR/ MINICOMPUTER SYSTEM
by
MARVIN J. SPINHIRNE, B.S. in E.E.
A THESIS
IN
ELECTRICAL ENGINEERING
Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for
the Degree of
MASTER OF SCIENCE
IN
ELECTRICAL ENGINEERING
Approved
Accepted
August, 1982
ACKNOWLEDGMENTS
I am deeply indebted to Dr. Donald L. Gustafson for his help in
the production of this thesis, and to Dr. Thomas Krile and Dr. Milton
Smith for serving on my committee.
I would also like to thank Mr. Steve Patterson for his help with
the construction.
CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
Chapter
I. INTRODUCTION
     Background
     Purpose
     Outline of Thesis
II. ARRAY PROCESSOR ARCHITECTURE
III. THE ANALOGIC AP400
     The Pipeline Arithmetic Unit
     The Control Processor
     The Data Memory
     The I/O Unit
     Host/AP400 System Operation
IV. THE UNIBUS OPERATION
     UNIBUS Operation
     UNIBUS Specifications
V. THE 990/12 MINICOMPUTER
     The 990/12 Central Processor Unit
     The TILINE Data Bus
     TILINE Specifications
VI. THE HARDWARE INTERFACE
     The Address Bus
     The Data Bus
     DMA Memory Mapping
     Data Transfer Control Logic
     TILINE Bus Acquisition Logic
     Reset and Timeout Logic
     AP400 Bus Terminations
VII. THE SOFTWARE INTERFACE
     The AP400 Host Resident Manager
     The AP400 Host Resident Driver
     The Device Service Routine
     AP400/990 Operation
VIII. CONCLUSION
LIST OF REFERENCES
APPENDIX
A. TIMING DIAGRAMS
B. SCHEMATIC DIAGRAMS
C. SOFTWARE LISTINGS
LIST OF TABLES
1. AP400 SPECIFICATIONS
2. UNIBUS SIGNALS
3. TILINE SIGNALS
4. BASELINE DRIVER ROUTINES
5. PROGRAM LOAD DRIVER ROUTINES
LIST OF FIGURES
1. AP400/990 System Block Diagram
2. The Basic von Neumann Machine
3. AP400 Block Diagram
4. TILINE Map File and Address Development
5. 990 Memory Map
6. TILINE Master Priority Logic
7. AP400 Address Development
8. Bus Acquisition Sequence
9. Supervisor Call Block Structure
A-1. UNIBUS Read Cycle
A-2. UNIBUS Write Cycle
A-3. UNIBUS Acquisition Cycle
A-4. TILINE Read Cycle
A-5. TILINE Write Cycle
A-6. TILINE Acquisition Cycle
B-1. Data Bus Logic
B-2. Address Bus Logic
B-3. DMA Mapping Logic
B-4. Data Transfer Control Logic
B-5. Bus Acquisition Logic
B-6. Reset and Timeout Logic
CHAPTER I
INTRODUCTION
Background
In recent years the demand for computer architectures capable of
processing large arrays of data quickly and efficiently has
increased dramatically. Applications such as real-time signal
processing, simultaneous solutions of partial differential equations,
and manipulation of large arrays of data all require fast as well as
accurate processing capabilities. Several large mainframes, such as
the CRAY 1 and the CDC CYBER series of computers, while having the
capability for this type of processing, are cost prohibitive in many
smaller applications. Minicomputers, while being cost effective for
general applications, do not possess the processing speed for many
scientific applications. This state of affairs has led to the
development of the attached peripheral processors known as Array
Processors. These processors are typically designed to work with
existing minicomputers, enhancing their processing speed for particular
applications by up to 1000 times. Some array processors, such as the
IBM 3838 and the CDC MAP-III, are designed to work with existing large
mainframes.
The term "array processor" has been used to describe many
different types of computing systems - systems designed for processing
vectors, arrays of numbers, and data originating from arrays of points,
as well as systems composed of arrays of processing elements. The term
array processor will be used here to designate a scientific processor
designed as an attached peripheral for an existing host computer; it
achieves high performance through specialized architecture such as
pipelining and/or parallelism, and can be programmed by the user for
different applications. In many cases the array processor possesses
its own external I/O ports which can be configured for data acquisition
or other purposes [1].
Present day array processors are used for two basic types of
computation: vector processing and digital signal processing. Many of
the original array processors were developed specifically as signal
processing units. In addition, many of the design techniques used in
array processors were originally developed for large mainframe vector
processors. The largest use of array processors today is for increasing
the computational power of a minicomputer system in such applications
as speech synthesis and recognition, geological data processing, and
general scientific computations. Many of the currently available array
processors are designed to be as general purpose as possible while
still maintaining their processing speed.
Purpose
The purpose of this thesis is to present the implementation of an
Array Processor/Minicomputer system. This required the development of
both a hardware interface between the two systems as well as a complete
software package for control of the array processor.
The array processor used in this implementation is an Analogic
Corporation model AP400. Analogic provides interface hardware and
software for the AP400 for the following host minicomputers: Digital
Equipment Corporation PDP-11 and VAX computers; Hewlett Packard 21MX;
Data General Nova and ECLIPSE; and Interdata 7/16 and 8/16 computers
[2]. While several PDP-11 computers and a VAX system are currently
being used in the Electrical Engineering department, none of these
machines were available for use with the AP400. In order to implement
an AP/Minicomputer system which would be available for general use or
potential new areas of research, it was decided that the system should
be implemented using a Texas Instruments 990/12 computer currently
available in the department. A block diagram of the system as
implemented is shown in figure 1.
The use of the 990/12 for this system required the
development of a complete hardware interface between the 990/12 TILINE
data bus and the DEC PDP-11 interface board supplied with the AP400
(this interface board is essentially a DEC UNIBUS interface). In
addition, all of the existing software packages written for the AP/PDP
system had to be re-written and optimized for the 990/12. This
included both FORTRAN and assembly language packages, as well as some
modifications to the existing AP400 assembly language libraries. This
thesis will present the development of the hardware and software
package.
[Block diagram omitted: the 990/12, its Communications Register Unit (CRU), and the AP400, connected through the TILINE asynchronous data bus.]
FIGURE 1: AP400/990 System Block Diagram
Outline of Thesis
Chapter II presents a brief introduction to some typical computer
architectures employed in current array processor designs,
specifically parallel and pipelined structures. This chapter is
intended only as an introduction.
Chapter III presents a description of the Analogic AP400 array
processor architecture; this includes a partial description of the
software operation as well as the basic components of the hardware.
Chapter IV is a description of the operation of the DEC UNIBUS
(the data bus used on all PDP-11 computers). This material is
necessary for an understanding of the AP/990 interface.
Chapter V is a description of the 990/12 CPU and memory systems as well
as the TILINE bus structure. A brief description of the current 990
configuration is included.
Chapter VI presents the actual hardware interface which was
designed and built; chapter VII is a description of the software
implementation. Complete software listings are not included in this
thesis - only selected examples to illustrate the basic structure.
Differences between the 990/12 and the PDP-11 implementations, and
problems which were encountered or still exist are discussed.
Additional information on the operation of the system is included in
the reference manuals provided with the AP400 and the 990 system.
Chapter VIII is a concluding chapter on the operation of the
AP400/990 system, as well as suggestions for use of the system.
Appendix A contains the timing diagrams which are referred to in
chapters IV, V, and VI. Appendix B contains the schematic diagrams of
the hardware interface which are referred to in chapter VI. Appendix C
contains example software listings referred to in chapter VII.
CHAPTER II
ARRAY PROCESSOR ARCHITECTURE
Since the introduction of the first computers in the decade
following World War II, processing requirements have increased rapidly.
The technology available to the computer architect, however, has rarely
kept up with these needs; the designer is always limited by the maximum
processing speed of the individual logic elements. For this reason,
computer architectures must make increasing use of new designs and
concepts. The earliest computers were based upon the von Neumann
architecture [3]. This architecture consists of five basic units: the
Arithmetic and Logic Unit (ALU), which performs all mathematical and
logical functions; the memory unit containing both program and data
memory; the input and output units; and the Control Unit which
maintains control over the other four units (see figure 2). This basic
pattern has inherent problems in that all I/O operations must pass
through the ALU; this prevents any processing from taking place while
I/O operations are in progress. Since then, many different
methods have been implemented for overcoming this problem,
e.g., altering the data flow paths to allow I/O operations to occur
concurrently with processing operations. Two of the most widely used
methods of increasing overall computation speed have been the use of
parallel and pipelined computer architectures.
[Block diagram omitted: input unit, memory unit, arithmetic and logic unit, control unit, and output unit.]
FIGURE 2: The Basic von Neumann Machine
The distinction between parallel and pipelined architectures is
often hard to make in practical applications. Both parallel and
pipelined structures can be classified as allowing concurrent
operations - several operations are being processed within the system
at any instant. Parallel architecture designs are usually accomplished
by providing several copies of a basic piece of the computing hardware;
each component is then programmed to operate on a specific portion of
the input data. Pipelined designs consist of breaking the computation
process into several smaller subfunctions; each of these can then be
processed by a separate piece of hardware, called a stage; the stages are
arranged in a "pipeline" - the outputs of one stage are passed to the
inputs of the following stage. One of the main limitations on how fast
the data may be processed is the speed at which new data may be fed to
the input of the pipeline [4].
The use of pipelining may be demonstrated by considering the
implementation of a floating point addition. The addition operation
can be divided into the following steps [4]:
1) Subtraction of the exponents.
2) Shifting right the fraction from the number with the smaller exponent by an amount corresponding to the difference in exponents.
3) Addition of the other fraction to the shifted one.
4) Counting the number of leading zeroes in the sum.
5) Shifting left the sum by the number of leading zeroes and adjusting the exponent accordingly.
To implement this operation as a pipeline, a separate piece of hardware
can be designed to handle each of the five subfunctions. These units
are then arranged in a pipeline; the output of one stage is passed to
the input of the next stage. When each stage has processed one data
set in the input data stream, it may then begin processing the next
data set. This type of configuration will achieve the largest overall
speed improvements for large input data streams.
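
As an illustration, the C program below models the five subfunctions as separate routines. It is a sketch of the idea, not AP400 firmware; the unpacked representation (a 24-bit fraction plus an integer exponent) and the helper names are assumptions made for clarity, and a carry out of the fraction (which would require a right shift) is ignored.

    #include <stdio.h>

    /* Unpacked floating point value: a 24 bit fraction and an exponent. */
    struct fp { int exp; long frac; };

    /* Stage 1: subtract the exponents. */
    static int exp_diff(struct fp a, struct fp b) { return a.exp - b.exp; }

    /* Stage 2: shift right the fraction of the number with the
       smaller exponent by the difference in exponents. */
    static void align(struct fp *a, struct fp *b, int d)
    {
        if (d > 0) { b->frac >>= d;  b->exp += d; }
        else       { a->frac >>= -d; a->exp -= d; }
    }

    /* Stage 3: add the other fraction to the shifted one. */
    static struct fp add_frac(struct fp a, struct fp b)
    {
        struct fp s = { a.exp, a.frac + b.frac };
        return s;
    }

    /* Stage 4: count the leading zeroes in the 24 bit sum
       (overflow out of bit 23 is ignored in this sketch). */
    static int lead_zeroes(long frac)
    {
        int n = 0;
        long m;
        for (m = 1L << 23; m != 0 && (frac & m) == 0; m >>= 1)
            n++;
        return n;
    }

    /* Stage 5: shift left by the zero count, adjust the exponent. */
    static struct fp normalize(struct fp s)
    {
        int n = lead_zeroes(s.frac);
        s.frac <<= n;
        s.exp  -= n;
        return s;
    }

    int main(void)
    {
        struct fp a = { 3, 0x400000L }, b = { 1, 0x600000L };
        align(&a, &b, exp_diff(a, b));
        struct fp sum = normalize(add_frac(a, b));
        printf("exp = %d, frac = %06lX\n", sum.exp, sum.frac);
        return 0;
    }

In the hardware pipeline each routine above is a separate stage; while stage 3 adds one pair of fractions, stage 1 can already be subtracting the exponents of the next pair in the input stream.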
The use of parallelism, or multicomputer architecture, is almost
as old as modern digital computers. This type of architecture consists
of several subunits, usually identical, which are capable of
simultaneous processing. The earliest implementations of this type
were for purposes of reliability - typically two or more identical
computer systems were attached in such a way that one could take over
the processing if the other failed. While this type of system did not
increase computation speeds, it did provide the experience for later
development of truly parallel processors. Another common method of
parallelism is the provision of separate processing elements for
handling computation, I/O, and memory accesses. In this type of system
multiple data paths are usually provided between the functional
subunits. In this manner, computations may be performed while I/O or
memory accesses are being performed.
One method of implementing multiple computer systems is through
the interconnection of I/O channels. In this type of architecture,
each unit in the system treats the other as an I/O channel or device.
Interaction between systems is at the data set level only.
Another method is through the use of shared memory. In this
configuration, several separate processors have access to a common
memory unit, as well as their own private memory. Intermediate results
may then be passed between the separate units through the common
memory. In this type of system a much higher level of interaction is
necessary to maintain validity of the data in the common memory.
The most sophisticated method, and the only one which may truly be
classified as a multiprocessor system, is the type which incorporates
several ALU or CPU modules in the common architecture, all of which are
capable of operating simultaneously [3]. Some good examples of this
type of parallel system are the ILLIAC IV, which employs 64 processing
elements, and the C.mmp system which employs 16 PDP-11 processors [5].
Often a control processor is provided to maintain synchronization
between the parallel processors.
The computer architectures discussed above achieve their greatest
speed improvements for vector processing applications. Vector
processing typically refers to a computation performed repeatedly on a
large array of input data; the same computation is performed for each
element of the array. This type of processing lends itself well to
pipelined architectures in that once the pipeline has been set up for a
particular operation, the data may be streamed through at a high rate
of speed. Often the only limitation is the bandwidth of the memory
unit supplying the input data stream. For scalar operations (an
operation performed on a single data value), however, the pipelined or
parallel architecture quickly loses its speed advantages. For this
reason, many of the high speed scientific mainframes employ both vector
and scalar hardware.
Parallel and pipelined architectures are often used
simultaneously. A processor, for example, may include multiple
independent function units such as multipliers, adders, etc., which can
be used in parallel - if one multiplier is in use another may be used
concurrently. At the same time, each of the separate function units
may well be pipelined as discussed earlier for the addition. Computer
architectures are commonly broken into four distinct categories [4]:
SISD - Single Instruction stream/Single Data stream
SIMD - Single Instruction stream/Multiple Data stream
MISD - Multiple Instruction stream/Single Data stream
MIMD - Multiple Instruction stream/Multiple Data stream
Most conventional computer architectures are described by the SISD
category. Vector processors usually are contained in the SIMD
category, and multiprocessor systems in the MIMD category.
There are many experimental methods of implementing parallel
computer architectures which are currently being explored. These
methods typically employ many identical processing units; different
methods of interconnection are used to improve performance for various
processing needs. In addition, totally new architectures, such as data
flow computers, are being researched [6], [7].
One of the problems associated with the use of the computer
architectures described is the increased difficulty of programming. In
order to make full use of the parallel or pipelined elements it is
often necessary for the programmer to have a complete understanding of
the hardware structure. Pipelined processors typically present fewer
problems in this area than do parallel architectures in that a pipeline
can usually be controlled through microcode developed by the hardware
designer. The most frequently used operations (such as multiplies,
adds, etc.) are preprogrammed and are transparent to the user. The
efficient use of parallel architectures, however, is often task
dependent. Speed increases are obtained only when individual tasks can
be allocated to the separate processors.
Initially the computer architectures discussed above were confined
to large, expensive mainframes. In recent years, however, the need for
more computational capabilities at a moderate price has led to the
development of the attached peripheral or array processor.
Several different architectures have been used in the development
of modern array processors; most of these include some form of both
parallel and pipelined structures. In addition, several methods of
programming for these systems have been used. At the highest level,
FORTRAN compatible subroutines are often provided for programming
convenience, while at the lowest level microcode is typically used for
direct control of the hardware.
The major reason for the development of current array processors
was the need for high speed vector processing at a reasonable cost.
The current AP designs accomplish this by restricting their design, in
most cases, to the necessary processing elements. Most of the I/O
operations and peripheral device handling is done by the host system;
the array processor is dependent upon its host for the input data
stream and control functions. In some cases the host is also used for
pre-processing of the data.
Current array processor designs range from a single printed
circuit board which may be installed in the host system backplane, to a
completely stand alone system designed for such applications as real
time signal processing. This area is covered in detail in references
[1], [8], and [9].
CHAPTER III
THE ANALOGIC AP400
The array processor used in the implementation to be presented is
an Analogic Corporation model AP400. This processor was designed as a
high speed computational unit to be used with an existing minicomputer
system for such applications as signal processing. It achieves
processing speed increases through a combination of parallel and
pipelined architectures. This chapter presents a functional
description of the AP400; for a more detailed description refer to
reference [10].
The Analogic AP400 array processor consists of four basic
subunits: the Pipeline Arithmetic Unit, the Control Processor unit,
the Input/Output controller, and the Data Memory unit (see figure 3).
Each of the four basic units operates as a separate processor;
communication between the units is performed asynchronously where
possible, thus allowing each to operate independently of the others.
Some of the specifications of the AP400 are contained in Table 1; all
specifications refer to floating point operations. Each of the
subunits will be described below. The following information has been
summarized from references [10] and [2].
[Block diagram omitted: the Control Processor connects to the host system bus; the command and control bus and the RALU bus link it to the I/O unit (host and auxiliary ports), the Data Memory, and the Pipeline Arithmetics; the data bus and the AP400 auxiliary bus carry the data.]
FIGURE 3: AP400 Block Diagram
TABLE 1: AP400 SPECIFICATIONS
MULTIPLICATION RATE             Up to 2.1 million/sec.
ADDITION AND SUBTRACTION RATE   Up to 6.3 million/sec.
512 POINT REAL FFT              1.5 milliseconds
1024 POINT REAL FFT             3.6 milliseconds
1024 POINT COMPLEX FFT          7.4 milliseconds
REAL CONVOLUTION                7.3 milliseconds
The Pipeline Arithmetic Unit
The Pipeline Arithmetic Unit is the main processing unit of the
AP400. A complete pass through the pipeline consists of fetching
operands, operating on them, and storing the results. One pipeline
pass requires 36 clock intervals of 160 nanoseconds each.
The AP400 pipeline unit is composed of three stages:
Stage A: Data Characterization
Stage B: Data Manipulation
Stage C: Data Accumulation and Logical Manipulation
The input to the pipeline consists of eight 24-bit values; the output
will consist of four 24-bit results. Each stage of the pipeline is
controlled through a Pipeline Arithmetic Command (PAC) and may be configured for several variations of its particular function. In this
manner the pipeline may be reconfigured for a variety of applications.
The input to the characterizer stage consists of eight 24-bit
values. These eight inputs may be configured as four complex pairs of
numbers or eight independent values. The Command and Address Buffer
(CAB), located on the data memory board, contains the addresses of the
four input pairs. The characterizer may use the data values of the
first and second data words to modify the initial addresses of the
third and fourth data words; this allows the use of data look-up tables
for certain applications such as logarithms and trigonometric values.
The four data word addresses may also be passed through unmodified,
thus generating four pairs of multiple inputs to the next stage.
The output of the characterizer stage is passed to the multiplier
stage, which accepts the eight 24-bit operands and generates four
24-bit outputs. The result is a 48-bit word which may be either
rounded or truncated to a 24-bit result; two 24-bit results may be used
as two parts of a 48-bit double precision result. The purpose of the
multiplier stage is to prepare the data inputs for the ALU section of
the pipeline, which is contained in the Accumulator/Logic stage. The
multiplier stage itself consists of four adjacent multipliers, which
may be configured in several different groups. The pipeline arithmetic
command for the multiplier stage will determine:
1) which inputs will be a multiplier, multiplicand or bypass operand (since the third and fourth data word inputs may be table look-up data, the multiplier stage must be capable of passing this data through unmodified);
2) if the product will be truncated or rounded;
3) if the MSB, LSB, or the bypass operand will be passed to the next stage;
4) if an adjacent accumulator result will be introduced into the accumulator;
5) if the result will be downshifted 0, 1, 2, or 3 binary places.
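
A C bit-field can make the shape of such a command concrete. The field names and widths below are assumptions chosen only to mirror the five choices just listed; the actual AP400 PAC encoding is not reproduced here.

    /* Hypothetical layout of the multiplier stage portion of a Pipeline
       Arithmetic Command.  Field names and widths are illustrative only. */
    struct pac_multiplier {
        unsigned operand_sel : 4;  /* which inputs are multiplier,
                                      multiplicand, or bypass operand     */
        unsigned round       : 1;  /* 1 = round the product, 0 = truncate */
        unsigned out_sel     : 2;  /* pass MSB, LSB, or bypass operand    */
        unsigned accum_in    : 1;  /* introduce adjacent accumulator
                                      result                              */
        unsigned downshift   : 2;  /* downshift 0, 1, 2, or 3 places      */
    };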
The next stage of the pipeline is the Accumulator/Logic stage.
This section consists of four processing units, one for each of the
four outputs from the multiplier stage. Each processing unit consists
of two Arithmetic Logic Units (ALU's) designated the X and the Y ALU,
and associated data selection and storage hardware. This section
performs the actual computation and processing of the data. The
Accumulator/Logic stage contains eight 24-bit accumulator registers
(two for each section of the unit, designated the T and S registers)
which may be loaded by the current Pipeline Arithmetic command, or may
have been loaded by the previous PAC command. Each ALU has two inputs,
called the P and Q inputs; the inputs to each ALU may be from either of
the accumulators or from the output of the multiplier stage. The
output of the ALU's may be directed to the pipeline output or to the
accumulators for use by the next command. The functions performed by
each of the ALU's is determined by an ALU function select code
generated by the control processor.
The Control Processor
The Control Processor unit is the executive controller of the
AP400. It consists of a 16 register File Arithmetic and Logic unit
(the RALU), a program counter, program memory address register, command
and instruction decoder blocks, status bit register, and an interrupt
vector encoder. The Control Processor is implemented with four 2901
4-bit slice ALU units. The Control Processor is responsible for
performing pipeline setup and control, as well as the data memory
allocation.
The control program for the Control Processor is contained in the
Program Memory, which consists of 2048 22-bit words. This memory is
loaded with the AP400 Executive and Function Library by the host
computer prior to operation. The program memory address register, 12
bits in length, is used to access the program memory. It may receive
inputs from the interrupt encoder, the command and instruction bus, the
RALU bus, or the program memory itself. Instruction prefetch is
performed using the Program Memory Data Register. The control
processor does not have the capability of modifying the program
memory; this can only be done by the host computer.
The control processor contains 16 16-bit registers (R0-R15) which
are used for address development. This allows addressing of up to
65536 words of data memory, which is separate from the program memory.
The control processor can access data memory on a cycle stealing basis
with the Pipeline Arithmetic section, thus allowing it to handle its
own data memory allocation and perform any necessary scalar processing
which cannot be efficiently performed by the pipeline. Stack
operations (for subroutine and interrupt handling) are performed by
using register RO as a stack pointer. The vector interrupt encoder
section can handle up to eight interrupt levels; the interrupts can be
individually masked.
The Data Memory
The Data Memory consists of up to 65536 24-bit words. The memory
addresses can be generated by the three other units - the control
processor, the pipeline unit, and the I/O unit. Access to data memory
is handled on a priority basis, with the pipeline having the lowest
priority. This allows the I/O unit and the control processor to
utilize the bus for data transfers; the pipeline clock is halted
during this transfer. The pipeline unit obtains data directly from the
data memory. The command and address data, however, are obtained
directly from the control processor via the RALU bus. These commands
are stored in the Command and Address Buffer (CAB) for use by the
pipeline. The CAB can hold up to 64 24-bit words, which constitutes 16
pipeline commands. The control processor is responsible for keeping
the buffer full during pipeline operation. If the buffer is emptied
the pipeline will halt itself until the buffer is filled; this allows
asynchronous communication between the two units.
The I/O Unit
The Input/Output (I/O) unit provides for communication between the
array processor and the host computer, as well as control of the
auxiliary I/O port. The I/O unit is responsible for directing the
operation of the array processor - Halt, Run, Single Step etc. It also
performs all data transfer between the host and the array processor,
controls DMA transfers, and performs diagnostic functions.
The host interface is performed through a particular host's data
bus. The interface is controlled through two addresses - one for the
Command Register and one for the Data Register. The communication may
be performed in three different modes: programmed I/O, DMA, or
interrupt.
In the programmed I/O mode, the host controls the array processor
through the command and data registers. The command register is used
to write to the command and message register of the AP400, thus passing
commands to the unit; it is also used to read from the AP message and
status register, thus determining the current state of the AP. The
data register is used to pass data to and from the AP. The host may
transfer an immediate command, which will be executed upon receipt by
the AP, or a non-immediate command which will be used to route data
transferred through the data register.
In DMA operations, the host loads the proper registers in the AP
with the value of the Host memory and AP memory addresses from and to
which the DMA transfer will occur. The host then directs the AP to
perform the DMA, at which time the AP will gain control of the bus to
perform the transfer. This process allows operation of the AP with a
minimum of overhead required of the host.
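
A minimal sketch of this sequence, as seen from host software, follows. Only the two-register structure (the Command Register and the Data Register) comes from the description above; the command codes are invented for illustration, and the two variables stand in for the interface registers, which actually sit at fixed host bus addresses.

    /* Hypothetical host-side DMA setup for the AP400. */
    static volatile unsigned short AP_CMD;   /* AP400 Command Register */
    static volatile unsigned short AP_DATA;  /* AP400 Data Register    */

    #define CMD_SET_HOST_ADDR 0x01  /* hypothetical non-immediate command */
    #define CMD_SET_AP_ADDR   0x02  /* hypothetical non-immediate command */
    #define CMD_START_DMA     0x03  /* hypothetical immediate command     */

    static void ap_start_dma(unsigned short host_addr,
                             unsigned short ap_addr)
    {
        AP_CMD  = CMD_SET_HOST_ADDR;  /* route the next data word...      */
        AP_DATA = host_addr;          /* ...to the host address register  */
        AP_CMD  = CMD_SET_AP_ADDR;
        AP_DATA = ap_addr;
        AP_CMD  = CMD_START_DMA;      /* the AP now acquires the bus and  */
    }                                 /* performs the transfer itself     */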
If the proper status bits in the AP have been set by the host, the
AP will be allowed to interrupt the host upon completion of some
specified action. The host resident interrupt routine for the AP is
then executed; the action taken by this routine will depend on the host
resident software (the AP Driver and Manager routines).
The AP400 Auxiliary I/O port is also controlled by the I/O unit.
This port consists of two 24-bit registers and associated handshake
signals (this provides one 24-bit input and one 24-bit output port).
This port may be used by the AP for input and output of data, thus
allowing such operations as real-time signal processing which is
completely independent of the host computer. The host must load the
appropriate control program into the AP before operation; the unit can
then be allowed to proceed on its own, interrupting the host in the
event of an error or upon completion. The auxiliary port could
possibly be used for access to external devices such as a disk drive or
memory.
Host/AP400 System Operation
The AP400 is controlled by the Executive Control program located
in AP program memory. This program must be developed on the host
computer and loaded into AP memory prior to operation of the processor.
It is written in AP400 machine language, which is specific to the AP
control processor, and assembled through a cross-assembler located on
the host machine. It must then be linked with the necessary Library
functions for the operations to be performed (FFT's, convolutions,
etc.). Both the Assembler and Linker are written in host FORTRAN.
In addition, an AP400 software driver routine must be resident in
the host. This driver is written in host assembly language and
performs all I/O operations with the AP400. Calls are made to the
driver from host FORTRAN or assembly language programs.
At a lower level, the control processor uses Pipeline Arithmetic
Commands (PAC's) to control the pipeline unit. These are stored in
programmable read only memory units which are factory programmed by
Analogic. Provisions have also been made to allow the end user to
change these PROM's to develop their own firmware. The PAC's consist of
two basic units: the PIPE which actually controls the pipeline
function, and the PAD which performs setup operations. Each PAC
consists of five PIPEs and four PADs. With careful programming,
PAC's may be interleaved to provide faster operation.
All AP400 arithmetic operations are performed on arrays of data
stored in data buffers in the AP400 data memory. These data buffers
are transferred from the host by special Library Functions which also
perform any necessary conversion between host and AP400 floating point
representations. When a block of data has been transferred to the
AP400, any number of operations may be performed on the data; the
functions may be "chained" together. When finished, the AP may be
directed to transfer the result (which is also contained in an AP data
buffer) back to host memory. All data buffer transfers are done
through DMA operations.
A complete software library has been developed for the AP400.
This includes many AP resident routines used to perform the library
functions, as well as the host resident manager subroutines. The
operation of the AP400 may be tailored to individual needs through the
AP Assembler and Linker. Utilization of the auxiliary I/O ports allows
use of the AP400 for real-time signal processing applications.
CHAPTER IV
THE UNIBUS OPERATION
The AP400 array processor was designed to interface to Digital
Equipment Corporation PDP-11 minicomputers. The interface controller
for the PDP machine performs all communication with the host through
the host system bus - in this case the DEC UNIBUS. This chapter
presents an overview of the UNIBUS operation.
The UNIBUS is the bidirectional asynchronous data bus used by all
PDP-11 minicomputers for data transfer. It is a 16 bit data path
implemented with 56 signal lines: 18 address lines, 16 data lines, 7
control lines, and 12 priority arbitration lines.
Communication between devices on the bus is done through a
master/slave relationship. One device on the bus gains control of the
bus (the bus master); the device to which it issues commands is the
slave device. A typical bus master is the CPU, while a typical slave
device is the memory. Some devices, such as a disk drive controller,
may be both a master and a slave, depending on the mode of operation.
Bus mastership is typically granted on a priority basis. Each
master is assigned a priority level, and access to the bus is granted
according to this priority.
The UNIBUS is completely asynchronous in operation. Maximum
transfer rates are determined by device speed and length of the bus.
With optimum device design the maximum transfer rate is 2.5
megawords/second. The following information has been summarized from
reference [11].
The UNIBUS address, data, and control lines are described in Table
2.
UNIBUS Operation
The following paragraphs describe the sequence of events and
timing requirements for UNIBUS data transfer operations. Timing
diagrams for each of the operations are included in Appendix A.
Master - Slave Read Cycle. After gaining access to the bus, a
master performs a UNIBUS read cycle by placing the address on the bus
and asserting the proper control lines (C0 and C1). After 150
nanoseconds (75 nanoseconds maximum UNIBUS skew and 75 nanoseconds for
address decode) the master asserts MASTER SYNC (MSYN) to indicate that
the data is valid. The device which has been addressed then performs
the requested read operation and places the data on the data lines.
When the data is valid, the slave asserts SLAVE SYNC (SSYN). The
master then waits 75 nanoseconds after receipt of SSYN for data deskew
and strobes the input data. The master then releases MSYN, and after
75 nanoseconds releases the address and control lines. The slave,
after receiving the release of MSYN, will release SSYN. The master may
then release the bus or perform another data transfer.
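
The sequence can be summarized as the following trace program, which simply prints the handshake steps in order. The helper routines stand in for bus hardware; this is a reading aid, not DEC software.

    #include <stdio.h>

    static void master(const char *s) { printf("master: %s\n", s); }
    static void slave(const char *s)  { printf("slave:  %s\n", s); }

    int main(void)
    {
        master("drive address lines, set C0/C1 for a read");
        master("wait 150 ns (75 ns bus skew + 75 ns address decode)");
        master("assert MSYN");
        slave("decode address, perform the read, drive data lines");
        slave("assert SSYN when the data is valid");
        master("wait 75 ns for data deskew, strobe the data");
        master("release MSYN");
        master("wait 75 ns, release address and control lines");
        slave("release SSYN after seeing MSYN drop");
        master("release the bus or begin the next transfer");
        return 0;
    }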
TABLE 2: UNIBUS SIGNALS
Address Lines - 18 lines (0 through 17)
Data Lines - 16 lines (0 through 15)
Control Lines C0 and C1 - 2 lines used by the bus master to designate a read or a write operation.
MASTER SYNC (MSYN) - 1 line; used by the master device to indicate valid data on the address and data lines.
SLAVE SYNC (SSYN) - 1 line; used by the slave to indicate that it has completed its part of the data transfer.
INTERRUPT REQUEST (INTR) - 1 line; asserted by the bus master to indicate an interrupt request; the interrupt vector must be on the data lines at this time.
BUS REQUEST (BR4-BR7) - 4 lines; used to request bus access for an interrupt.
BUS GRANT (BG4-BG7) - 4 lines; used by the bus arbiter to grant access to the bus in response to a Bus Request.
NON-PROCESSOR REQUEST (NPR) - 1 line; used to request access to the bus for a standard data transfer.
NON-PROCESSOR GRANT (NPG) - 1 line; used by the bus arbiter to grant the bus in response to a NPR.
SELECTION ACKNOWLEDGE (SACK) - 1 line; used by a master to acknowledge the bus grant.
BUS BUSY (BBSY) - 1 line; indicates that the bus is in use by a master.
PARITY ERROR (PA,PB) - 2 lines; used by a slave to indicate that a parity error has occurred.
INITIALIZE (INIT) - 1 line; used to reset all UNIBUS devices upon execution of a RESET instruction, activation of the console START switch, or upon power failure.
AC LINE LOW - 1 line; indicates impending failure of the AC power.
DC LINE LOW - 1 line; indicates impending failure or instability of the DC power.
Master - Slave Write Cycle. The master gains access to the bus,
then places the proper address and data on the lines and asserts the
proper control lines. The master may then assert MSYN, after a 150
nanosecond deskew delay, to indicate valid data. Upon receipt of MSYN
the addressed slave device performs the write operation, and asserts
SLAVE SYNC (SSYN). The master may release MSYN upon receipt of the
SSYN signal. The address, data, and control lines may be released
after a 75 nanosecond deskew time. The slave will release SSYN upon
receipt of the release of MSYN; the master may then release the bus or
perform another data transfer.
Bus Acquisition Sequence. The master device first asserts
NON-PROCESSOR REQUEST (NPR) to indicate a bus request. The bus arbiter
will assert NON-PROCESSOR GRANT (NPG) when the bus becomes available.
The requesting device, on receipt of NPG, will assert SELECTION
ACKNOWLEDGE (SACK) to indicate it has received NPG; it may also release
NPR at any time after asserting SACK but before SACK is released. The
arbiter, upon receipt of SACK, will release NPG. The requesting device
then begins monitoring the BUS BUSY (BBSY) line. When BBSY has been
released, the requesting device asserts this line and begins data
transfer operations. The master device may release SACK at any time
after the assertion of BBSY but before BBSY is released - this allows
the bus arbiter to resume the arbitration sequence for the next bus
master. The present bus master releases BBSY upon completion of the
data transfer sequence.
Interrupt Sequence. The bus master generates a system interrupt
by gaining access to the bus and asserting the INTERRUPT REQUEST (INTR)
line. The bus acquisition sequence is performed as described above
except that the BUS REQUEST (BR) and BUS GRANT (BG) lines are used
instead of the NPR and NPG lines. The master, after asserting INTR,
places an interrupt vector on the data lines. The processor strobes
this interrupt vector after a 75 nanosecond deskew delay, then asserts
SSYN. The master, upon receipt of SSYN, releases INTR and the bus.
The processor then uses the interrupt vector to determine the address
of the proper interrupt service routine for the device.
UNIBUS Specifications
The UNIBUS utilizes 120 ohm characteristic impedance doubly
terminated transmission lines. UNIBUS signal levels are:
Logic 1 = 0 volts (LOW)
Logic 0 = +3.4 volts (HIGH)
The rest state of the bus (except BG and NPG) is a logic 0 level of
approximately 3.4 volts. Typical receiver switching threshold is 1.5
volts.
Timeout protection is usually provided by the bus master on the
UNIBUS. A monostable multivibrator is triggered each time the MSYN
signal is asserted by the master; if the slave device does not respond
within the timeout period, the SSYN signal is generated by the
monostable multivibrator. Timeout is usually set at 10 to 25
microseconds on the UNIBUS. Maximum UNIBUS length is 50 feet (using a
ribbon cable). Maximum UNIBUS loading without a repeater is 20 bus
loads.
The interface between the AP400 and the TI 990/12 was designed to
translate the UNIBUS control, address, and data signals described to
the proper levels required by the 990/12. Subsequent chapters will
present the 990/12 and the interface logic.
CHAPTER V
THE 990/12 MINICOMPUTER
The AP400 array processor is designed to operate with an existing
host minicomputer system; in this case the host is a Texas Instruments
990/12 minicomputer. All communication with the AP400 is done through
the host system bus. For the 990/12 this is the TILINE asynchronous
data bus. This chapter presents a brief description of the 990/12
processor and a detailed description of the TILINE bus operation.
The 990/12 processor has a 16-bit word length and incorporates
floating point arithmetic, byte string operations, bit array
instructions, and multiprecision integer and decimal conversion. The
990/12 is implemented in three basic units: the Arithmetic Unit (AU),
the System Mapping Interface (SMI), and the memory and memory
controller unit. Input/Output operations are handled by two different
methods: through the TILINE asynchronous data bus or through the
Communications Register Unit (CRU). All high speed data transfers,
such as disk I/O, memory access, and DMA transfers are done through the
TILINE. Slower operations, such as communication with terminals and
printers, are done through the CRU [12].
The 990/12 Central Processor Unit
The central processor, as discussed earlier, is implemented in
three separate units. The following paragraphs describe the operation
of each of these units.
The Arithmetic Unit. The Arithmetic Unit is implemented on one
ten-layer printed circuit board with four SN74S481 4-bit slice
processor elements, which comprise the arithmetic and logic unit.
Control functions are performed by three cascaded SN74S482 4-bit slice
microsequencer elements using read-only memory (ROM) microsequencing.
The Arithmetic Unit is microprogrammed to implement a super-set of the
9900 microprocessor instruction set. Additional instructions, such as
floating-point multiplication and division, have been included. The AU
is also user microprogrammable through the Writable Control Store,
which consists of 1024 64 bit words. Special instructions are included
in the instruction set for loading the WCS.
The 990 series of minicomputers utilize multiple register files,
consisting of 16 registers each, which reside in main memory. The only
user accessible registers located on the AU board are the Program
Counter, Memory Counter, and the Status register. In addition, the
990/12 architecture includes a workspace cache which contains a copy of
the current workspace registers [13].
Memory. Memory on the 990/12 consists of up to two megabytes of
error checking and correcting memory. Each memory word consists of 22
bits, which includes the 16 bits of data and 6 bits used for the error
detection and correction. All memory is controlled by the memory
controller unit, which generates the error detection bits and handles
refresh for the 4116 dynamic memory elements. The 6-bit code generated
by the controller (a modified Hamming code) during store operations
allows detection and correction of single bit errors and detection of
two or more errors. Light emitting diodes mounted on the controller
board allow user location of faulty memory devices.
In addition to the main memory controller, a cache memory
controller with 2048 bytes of cache memory may be included. The cache
memory consists of two banks of fast memory devices. Each memory word
consists of 16 data bits, 2 parity bits, 1 data error bit, 11 address
bits, 2 address parity bits, and 1 validity bit. This memory is used
to store frequently accessed data from the relatively slow main memory.
Calls for data from the processor are honored faster than would be
possible from main memory. The cache controller stores the contents of
the last addressed 512 odd/even memory word pairs in cache memory;
when a word is accessed by the TILINE device the cache controller
searches the cache memory to see if the word is present. If it is not
present, the word is added to the cache memory for the next access
[14].
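
The lookup can be pictured as a direct-mapped search, sketched below. The tag and index arithmetic is an assumption consistent with the 512 odd/even pair organization described above; the real controller also maintains the parity, address parity, and data error bits listed earlier.

    #include <stdint.h>

    /* One cache line: an odd/even word pair plus tag and validity.
       Parity bits are omitted from this sketch. */
    struct cache_line {
        uint16_t even_word, odd_word;
        uint16_t tag;      /* upper bits of the cached pair address */
        int      valid;
    };

    static struct cache_line cache[512];   /* 512 odd/even word pairs */

    /* addr is a 20 bit TILINE word address.  Returns 1 on a hit. */
    static int cache_lookup(uint32_t addr, uint16_t *data)
    {
        uint32_t pair  = addr >> 1;        /* odd/even pair number   */
        uint32_t index = pair & 0x1FFu;    /* 9 bits select the line */
        uint16_t tag   = (uint16_t)(pair >> 9);
        struct cache_line *l = &cache[index];

        if (l->valid && l->tag == tag) {   /* hit: honor from cache  */
            *data = (addr & 1u) ? l->odd_word : l->even_word;
            return 1;
        }
        return 0;                          /* miss: fetch from main  */
    }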
The System Mapping Interface. The System Mapping Interface is
implemented on an additional ten layer printed circuit board and
contains the TILINE and CRU interface, interrupt logic, loader and
self-test ROM, system clock, a 12-millisecond test clock, front panel
interface, error and diagnostic registers, and memory mapping hardware.
All memory addresses generated by the AU board are only 16 bits wide
(the word size of the machine). The memory mapping hardware contained
on the SMI board uses this 16 bit address to generate the 20 bit
address actually used by the TILINE for peripheral and memory
addressing. This extends the physical address space of the machine
from 65536 bytes to 2 megabytes [13].
The memory mapping logic contained on the SMI board consists of
four sets of mapping registers called Map Files. Each set of mapping
registers consists of three limit registers and three bias registers.
When a specified map file is used for memory mapping the 16 bit address
generated by the AU board is compared with each of the limit registers
in the Map File. If the 11 most significant bits of the 16 bit address
are less than or equal to limit register 1, then bias register 1 is
used for the mapping. If the address is greater than limit 1 but less
than or equal to limit 2, then bias register 2 is used; if it is
greater than limit two but less than or equal to limit 3, then bias 3
is used.
The actual 20 bit address is computed by taking the sum of the 16
bit processor address and the 11 most significant bits of the bias
register extended to the right with 5 zeroes. The least significant
bit of the 16 bit processor address is dropped (thus only words - 16
bits - can be accessed on the TILINE). Bits 0 and 1 (the 2 least
significant bits) of the limit registers are used by the operating
system to designate the status and protection of the mapped memory
segment. Figure 4 depicts the map file registers and development of
the 20 bit TILINE address [15].
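
The selection and summation can be expressed compactly in C. This is a minimal sketch of the SMI logic as just described, not a transcription of TI hardware; it assumes the bias is applied as a multiplication by 32 (extension to the right with 5 zeroes), which is how the figure aligns the bias under the 20 bit result, and it uses the 11 most significant address bits for the limit comparison as the text states.

    #include <stdint.h>

    struct map_file {
        uint16_t limit[3];   /* L1, L2, L3; bits 0-1 hold status flags */
        uint16_t bias[3];    /* B1, B2, B3 */
    };

    /* Develop the 20 bit TILINE address from a 16 bit processor
       address. */
    static uint32_t tiline_address(const struct map_file *mf,
                                   uint16_t paddr)
    {
        uint16_t top11 = paddr >> 5;     /* 11 MSBs of the address   */
        int i;

        for (i = 0; i < 2; i++)          /* select bias 1, 2, or 3   */
            if (top11 <= (uint16_t)(mf->limit[i] >> 5))
                break;

        /* sum of the processor address and the bias extended to the
           right with 5 zeroes; the hardware does not pass the least
           significant bit, since only whole words cross the bus */
        return ((uint32_t)paddr + ((uint32_t)mf->bias[i] << 5)) & 0xFFFFFu;
    }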
[Diagram omitted: the map file holds three limit registers (L1, L2, L3) and three bias registers (B1, B2, B3); the bits of the 16 bit processor address (p) are summed with the bias register address bits (b) to form the 20 bit memory address (m).]
FIGURE 4: TILINE Map File and Address Development
The 990/12 memory map is shown in figure 5. The first 32 words
are used for the 16 interrupt trap vectors. The next 32 words are
used for XOP instruction trap vectors. Addresses in the range F800
through FBFE (hexadecimal) are mapped into the TILINE Peripheral
Control Space (TPCS), consisting of addresses FFC00 through FFDFE.
These addresses are reserved for TILINE peripheral devices, such as
disk drive controllers, etc. (the AP400 is a TILINE device).
Addresses FC00 through FFFE are used to access the loader and self-test
ROM [13].
Each slot in the chassis of the 990/12 has a wired interrupt
level. When the device generates an interrupt its level is compared
with the interrupt mask contained in the current status register. If
the interrupt is less than or equal to the enabled interrupt level, an
interrupt trap is made to the appropriate address in the 32 words of
interrupt trap vectors. Each interrupt trap contains two words - the
first is the workspace pointer for the routine and the second is
the entry point of the interrupt routine. Control is then passed to the
appropriate interrupt service routine.
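
The two-word trap layout and the level comparison can be modeled as below. This is a C rendering of the dispatch just described, not 990 microcode; the array stands in for the vectors that actually reside at memory address 0000.

    #include <stdint.h>

    struct trap_vector {
        uint16_t wp;   /* workspace pointer for the service routine */
        uint16_t pc;   /* entry point of the service routine        */
    };

    /* Model of the 16 trap vectors occupying addresses 0000-003C. */
    static struct trap_vector traps[16];

    static void take_interrupt(unsigned level, uint16_t mask,
                               uint16_t *wp_reg, uint16_t *pc_reg)
    {
        if (level <= mask) {             /* enabled at this priority? */
            *wp_reg = traps[level].wp;   /* switch workspaces         */
            *pc_reg = traps[level].pc;   /* enter the service routine */
        }
    }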
The TILINE Data Bus
The 990/12 uses the TILINE, which is a high speed bidirectional
data bus, for I/O transfers between all high speed system elements such
as memory and disk drive controllers. The TILINE is an asynchronous 16
bit data bus; the speed of operation is determined by the devices on
the bus.
Memory Address   Area Definition
0000 - 003C      Interrupt trap vectors (levels 0 through 15)
0040 - 007C      XOP transfer vectors (0 through 15)
0080             Front panel workspace
00A0             General memory area
F800             TILINE Peripheral Control Space
FC00             Programmer panel and loader (PROM)
FFFC - FFFE      Restart vector
FIGURE 5: 990 Memory Map
The TILINE signal lines fall into three groups: Address, Data, and
Control [16].
The TILINE, as does the UNIBUS, operates on the master/slave
concept. Each device may be either a master, a slave, or both. When
operating as a slave, the device may only respond to requests made by
the master which is currently in control of the bus. Each slave device
has a particular address or addresses which the master uses to pass
commands and data to and from the device. Main memory is a typical
slave device in that it can respond only when a read or write operation
is done to a particular memory address.
A master device has complete control of the TILINE bus and may
access any slave device on the bus. Each master device must first
gain access to the bus before performing any I/O transfers. This is
done through an arbitration scheme which is handled by special hardware
on the SMI board. TILINE operations are described in detail in the
following paragraphs. Timing diagrams for the TILINE operations are
contained in Appendix A. TILINE signals are described in Table 3.
Master to Slave Write Cycle. The TILINE master, after obtaining
the bus, places the data and address on the lines. At the same time it
asserts the TILINE GO (TLGO) signal and pulls the TILINE READ (TLREAD)
signal low. Each slave device on the bus is responsible for decoding
the address to determine if it is the device being accessed.
TABLE 3: TILINE SIGNALS
Address - 20 lines (AOO through A19)
Data - 16 lines (D00 through D15)
TILINE GO (TLGO) - 1 line; used by master to initiate data transfer.
TILINE TERMINATE (TLTERM) - 1 line; used by slave to indicate termination of a data transfer.
TILINE MEM ERROR (TLMER) - 1 line; indicates a memory error.
TILINE READ (TLREAD) - 1 line; a logic 0 indicates a read operation, a logic 1 indicates a write operation.
TILINE ACCESS GRANTED (TLAG) - 2 lines (TLAG IN and TLAG OUT); establishes master device priority.
TILINE ACKNOWLEDGE (TLAK) - 1 line; used in bus arbitration.
TILINE AVAILABLE (TLAV) - 1 line; used in bus arbitration.
TILINE POWER RESET (TLPRES) - 1 line; generated by the power supply on power up to reset all TILINE devices.
TILINE POWER FAILURE WARNING (TLPFWP) - 1 line; indicates an impending failure of the DC power.
TILINE I/O RESET (TLIORES) - 1 line; generated by the CPU to reset all TILINE devices.
TILINE WAIT (TLWAIT) - 1 line; inhibits all activity on the TILINE to resolve conflicts during computer to computer communication.
TILINE HOLD (TLHOLD) - 1 line; used by TILINE couplers to prevent memory modification during computer to computer communication.
All slaves are also responsible for delaying the TLGO signal for the
time required to perform the address decode. This delay must also take
into account the TILINE skew time which is defined to be 20
nanoseconds maximum. If the proper slave does not respond within 1.5
microseconds a TILINE timeout will occur, generating an error
interrupt. After decoding the address the selected slave device
performs the write cycle and then asserts the TILINE TERMINATE (TLTERM)
signal. When the master receives the assertion of TLTERM it must
release TLGO, TLREAD, and TLDAT lines within 120 nanoseconds. When the
slave receives the release of TLGO it must release TLTERM within 120
nanoseconds. The TILINE master may then perform another read/write
operation or relinquish the bus to another device.
Master to Slave Read Cycle. The TILINE master asserts TLGO and at
the same time generates the proper TILINE address and TILINE READ
signals. After address decoding, the addressed slave device places the
proper data on the lines and asserts TILINE TERMINATE (TLTERM). If an
error occurs during the read operation the slave asserts the TILINE
MEM ERROR (TLMER) signal. When the master receives the TLTERM signal it
must release the TLGO and TLADR signals. At this time it must be
finished with the TLDAT and TLMER lines. When the slave receives the
release of TLGO it must release the TLTERM and TLDAT lines within 120
nanoseconds. The master may then perform another read/write operation
or relinquish the bus to another device.
TILINE Bus Acquisition. A TILINE device must gain access to the
bus before it may perform any I/O operations. This is done through the
use of the three TILINE control signals TLAG, TLAK, and TLAV.
All TILINE master devices are connected to the bus in order of
priority (figure 6). When a master device is not utilizing the bus or
attempting to gain access to the bus, it is in the idle state. During
this time TLAG-IN from the next highest priority device is passed on
to the next lower priority device. When the master is attempting to
gain access to the bus it disables TLAG-OUT to the next device and
monitors TLAG-IN from the higher priority master. When TLAG-IN has
been high for at least 100 nanoseconds the access controller will pull
TILINE ACKNOWLEDGE (TLAK) low and monitor TILINE AVAILABLE (TLAV). When
TLAV is released, the requesting device will pull TLAV low and TLAG-OUT
is again enabled to the lower priority devices. At this time the
master has complete control of the TILINE bus and may begin I/O
transfers to a slave device. After the last data transfer sequence the
controller releases the TILINE and returns to the IDLE state.
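
As with the UNIBUS, the sequence reads most easily as an ordered trace; the program below simply prints the steps of the acquisition just described.

    #include <stdio.h>

    static void step(const char *s) { printf("%s\n", s); }

    int main(void)
    {
        step("idle: pass TLAG-IN through to TLAG-OUT");
        step("request: disable TLAG-OUT to lower priority devices");
        step("wait until TLAG-IN has been high for 100 ns");
        step("pull TLAK low and monitor TLAV");
        step("when TLAV is released: pull TLAV low, re-enable TLAG-OUT");
        step("bus master: perform I/O transfers to the slave");
        step("done: release the TILINE and return to the idle state");
        return 0;
    }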
TILINE Specifications
The TILINE Address, Data, TLMER, TLPRES, TLPFWP, and TLIORES lines
use the following signal levels:
Logic 0 > 2.0 volts (High)
Logic 1 < .8 volts (Low)
[Diagram omitted: TILINE master devices daisy-chained in order of priority.]
FIGURE 6: TILINE Master Priority Logic
The remaining TILINE lines use the following levels:
Logic 0 > 3.0 volts (High)
Logic 1 < 1.0 volts (Low)
The Address and Data lines on the TILINE are tri-stated (i.e., in a
high impedance state when not being used) and are not terminated. The
control lines are terminated in the computer backplane; the termination
value depends on the signal [16].
The 990/12 computer system is currently using a Texas Instruments
DX10 release 3.4 operating system [17-22]. This operating system
supports language compilers for FORTRAN, Pascal, and COBOL as well as
BASIC and FORTH interpreters. System hardware includes 512 kilobytes
of error correcting memory, two 50 megabyte disk drives, eight video
display terminals, and five dot matrix printers. A 2400 Bit Per Second
synchronous communications link to the University computer system (a
National Advanced Systems AS/6) using IBM 3780 protocol has also been
implemented.
The interface between the AP400 and the 990/12 required the
development of both a hardware interface between the two systems (the
TILINE and the UNIBUS) as well as a complete software driver for the
AP400. The remaining chapters will present the development of this
interface.
CHAPTER VI
THE HARDWARE INTERFACE
The hardware interface between the Analogic AP400 array processor
and the Texas Instruments 990/12 minicomputer consists of two half-size
wire-wrap boards located in the main chassis of the 990/12. These
boards are connected by a ribbon cable and together perform the
functions of a full size TILINE controller. The interface performs the
following basic functions:
1) Provides buffering between the TILINE 20 line tri-state address bus and the AP400 18 line open-collector address bus.
2) Provides buffering between the TILINE 16 line tri-state data bus and the AP400 16 line open collector data bus.
3) Provides memory mapping hardware for use by the AP400 in DMA operations - this allows the AP400 to drive all 20 lines of the TILINE address bus.
4) Provides data transfer control logic to allow both the 990 and the AP400 to perform data transfers through the interface.
5) Provides TILINE bus acquisition logic to allow the AP400 to become a TILINE bus master for DMA operations.
6) Provides system reset and bus timeout capabilities.
The following paragraphs will discuss each section of the hardware
in detail. Schematic diagrams for each of the sections are contained
in Appendix B.
The Address Bus
The address bus hardware must perform all buffering between the
AP400 open collector address bus and the TILINE tri-state address bus.
This is done through the use of two sets of bus buffers - SN75136 quad
tri-state bus buffers for the TILINE and MC3438 quad open-collector bus
buffers for the AP400 bus (the UNIBUS). The bus driver and receiver
outputs of the 75136 can be tri-stated; the drivers are enabled high
and the receivers enabled low. This allows both enables to be
controlled by a single line - ADDRESS ENABLE 1 (AEl) - which is
generated by the data transfer control logic.
The MC3438 bus transceivers are designed for bus oriented
structures with 120 ohm terminated lines. The outputs are
open-collector, thus allowing wired-or configurations. The driver
outputs are controlled by the control line ADDRESS ENABLE 2 (AE2) which
is also generated by the data transfer control logic. Termination of
the AP400 lines consists of 180 ohm resistors to +5 volts (pull-ups)
and 390 ohm resistors to ground (pull-downs) which provide a 120 ohm
bus.
Additional logic is required to map the 20-bit TILINE address into
the 18-bit UNIBUS address as well as mapping the TILINE Peripheral
Control Space (TPCS) into the AP400 select address range. The TPCS is
a 512 word address range from FFCOO to FFDFF hexadecimal which is used
to access all TILINE peripheral controllers. The AP400 interface
controller is designed to respond to addresses in the range 1F200 to
1F3FF hexadecimal. Address lines 9, 10, and 11 of the TILINE must
therefore be inverted in order to provide the correct address to the
AP400. Since the AP400 does not respond to or drive the least
significant bit of the address bus, and the TILINE mapping logic does
not map the least significant bit of the processor address, bit 0
of the TILINE address bus is used to drive bit 1 of the AP400 bus.
This results in only the first 17 lines of the TILINE bus being used by
the AP400 bus. The upper 3 lines (17, 18, and 19) of the TILINE are
used to enable inverters on lines 9, 10, and 11 of the TILINE bus
receivers - when the three most significant lines are all high the
inverters are enabled. This allows the TPCS address range to be mapped into the AP400
address range (see figure 7 for address development).
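The mapping can be summarized compactly. The fragment below is written
in C purely as an illustration (the actual logic is TTL on the
interface boards); the constants come directly from the address ranges
given above.

    #include <stdint.h>

    /* Word address (17 bits) presented to the AP400 for a given 20-bit
       TILINE address.  A TPCS access (FFC00-FFDFF hex) yields a result
       in the AP400 select range 1F200-1F3FF hex. */
    uint32_t ap400_address(uint32_t tiline)
    {
        /* The inverters on lines 9, 10, and 11 are enabled only when
           the three most significant TILINE lines (17-19) are all high. */
        if ((tiline & 0xE0000) == 0xE0000)
            tiline ^= 0x00E00;      /* invert address bits 9-11 */

        /* Only TILINE lines 0-16 reach the AP400; in the hardware,
           TILINE bit 0 drives UNIBUS bit 1, since the AP400 ignores
           bit 0 of its own bus. */
        return tiline & 0x1FFFF;
    }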
The Data Bus
The data bus logic utilizes the same type of buffers as used for the
address bus. These buffers are controlled by lines DATA ENABLE 1 (DE1)
and DATA ENABLE 2 (DE2) generated by the data transfer control logic.
[Figure 7 diagrams the address development: TILINE address bits 0-16
drive AP400 (UNIBUS) bits 1-17, and TILINE bits 17-19 drive the
inverter enable.]
t = TILINE address bit   u = UNIBUS address bit
FIGURE 7: AP400 Address Development
DMA Memory Mapping
The DMA (Direct Memory Access) mapping logic allows the AP400
to perform DMA operations in the entire two megabyte address range of
the 990. The AP400 only drives the lower 17 lines of the TILINE
address bus; therefore some method must be provided for driving the
upper 3 lines. This is done through a SN74LS175 quad bistable latch.
This latch will respond to any address in the TPCS, which is selected
through an 8 position DIP switch provided on the interface. Address
decoding for the latch is performed with a DM8130 10-bit comparator and
a 74LS133 13 input NAND gate. The lower 8 bits of the TILINE address
are compared (through the DM8130) to the DIP switch settings, thus
providing the proper address range. The latch, when selected, stores
the three most significant bits of the data bus. By writing the proper
data to the latch the upper three lines of the TILINE address bus may
be driven. The AP400, with the latch properly loaded, may perform DMA
operations on up to 128 kilowords of memory (through the 17 lines
driven by the AP400) without reloading the latch. The software,
however, must assure that the DMA operation does not cross the 128 K
boundary without reloading the mapping latch.
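The software's use of the latch can be sketched as follows. This is an
illustration in C, not driver code; both the name MAP_LATCH (standing
for whatever TPCS address the DIP switches select) and the placement of
the three stored bits in data bits 13-15 are assumptions.

    #include <stdint.h>

    extern volatile uint16_t *MAP_LATCH;  /* DIP-switch-selected TPCS
                                             address (assumed name) */

    void load_map_latch(uint32_t tiline_word_address)
    {
        /* The latch captures the three most significant bits of the
           data bus; they become TILINE address lines 17-19 during
           AP400 DMA transfers. */
        *MAP_LATCH = (uint16_t)(((tiline_word_address >> 17) & 0x7) << 13);
    }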
Data Transfer Control Logic
The data transfer control logic provides all buffering of the
read/write control lines between the TILINE and the AP400, as well as
providing control of the data and address buffers. Three of the TILINE
control signals correspond directly to UNIBUS control signals: TILINE
GO corresponds to UNIBUS MSYN; TILINE READ corresponds to UNIBUS Cl;
and TILINE TERMINATE corresponds to UNIBUS SSYN.
All peripheral device slaves on the TILINE are required to delay
the TLGO signal by the time required to decode the TILINE address as
well as for TILINE skew. The UNIBUS, however, provides the necessary
delay of the MSYN signal for the slave device (this is performed by the
master). For this reason a delay is necessary in the TLGO signal to
allow for address decoding, bus skew, and buffer delay to assure that
valid data is presented to the AP400. The maximum delay has been
calculated to be less than 100 nanoseconds.
The UNIBUS SSYN signal (TILINE TERMINATE) is used in additional
logic to provide the DATA ENABLE signals to the data buffers. The data
and address buffers on the interface must be enabled in different
directions (i.e., the drivers and receivers disabled or enabled for
each set of buffers) depending on the operation being performed. Four
different sequences may occur:
1) A TILINE master performs a READ operation from the AP400 as a slave.
2) A TILINE master performs a WRITE operation to the AP400 as a slave.
3) The AP400, acting as a bus master, performs a READ operation from a TILINE slave.
4) The AP400, acting as a bus master, performs a WRITE operation to a TILINE slave.
For operations in which the AP400 is a slave the address bus must
be receiving from the TILINE and driving the UNIBUS. For operations in
which the AP400 is a master the address bus must be receiving from the
UNIBUS and driving the TILINE. The ADDRESS ENABLE lines are developed
from the SLAVE signal generated by the bus acquisition logic; this
signal is high when the AP400 does not have control of the TILINE
(i.e., is a bus slave). When the AP400 gains control of the TILINE
(becomes a bus master), this signal is low and the address bus buffers
are enabled in the opposite direction. Two 74LS04 inverters are used
to provide extra drive capability for the AE control lines.
The DATA ENABLE control signals are generated from the TLREAD and
TLTM signals and from the MASTER signal generated by the bus acquisition
logic (the MASTER signal is the logical inversion of the SLAVE signal). The
Boolean expression for the data buffer control is:
DE = READ*SLAVE*TLTM + WRITE*MASTER
The WRITE and SLAVE signals are used here to represent the logical
inversions of the TLREAD and MASTER signals. DE is the data enable
signal used to produce DE1 and DE2. When this signal is high the
SN75136 bus buffer drivers are enabled while the receivers are
tri-stated; the MC3438 bus buffer drivers are disabled while the
receivers are enabled - thus the TILINE is being driven. When this
signal is low, the SN75136 bus buffer drivers are disabled while the
MC3438 bus buffer drivers are enabled, thus driving the UNIBUS.
According to the Boolean expression, then, the UNIBUS will be driven
while the AP400 is a slave and a write operation is being performed, or
while the AP400 is a master and a read operation is being performed.
The TILINE will be driven while the AP400 is a bus master and a write
operation is being performed, or while it is a slave and a read
operation is being performed. For the SLAVE and READ operation,
however, the TILINE buffers are not enabled until the AP400 SSYN signal
is asserted; this assures that the TILINE drivers are enabled only when
the AP400 is the device being accessed. A 74LS08 AND gate is used to
enable the TILINE buffers through the use of a monostable multivibrator
described below.
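Restated at the truth-value level, the buffer-direction equation reads
as below. This is a sketch in C for illustration only, with each
argument equal to 1 when the corresponding signal is asserted.

    /* Returns 1 when the TILINE should be driven (DE high) and 0 when
       the UNIBUS should be driven (DE low). */
    int data_enable(int tlread, int master, int tltm)
    {
        int read  = tlread;       /* READ  = TLREAD                */
        int write = !tlread;      /* WRITE = inversion of TLREAD   */
        int slave = !master;      /* SLAVE = inversion of MASTER   */

        /* Drive the TILINE when a master reads from the AP400 (gated
           by the AP's SSYN/TLTM response) or when the AP400 writes as
           a master. */
        return (read && slave && tltm) || (write && master);
    }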
The use of the TLTM signal in generating the DE control signals
necessitates the insertion of a delay into this control line. While
the AP400 is a slave and a read operation is being performed, the TILINE
data bus drivers are not enabled until after the SSYN signal is
asserted by the AP400. The TLTM signal must therefore be delayed in
order to allow the data to stabilize on the TILINE bus before allowing
the master device to latch it.
Another timing problem in the TLTM signal was encountered after
the interface had been implemented. On the TILINE, the slave device
must release the TLTM signal within 120 nanoseconds after receiving
the release of the TLGO signal from the master. On the UNIBUS,
however, there is no timing constraint on the slave device for the
release of the SSYN signal. It was discovered that the AP400 required
more than 120 nanoseconds to release the SSYN signal; an overlap
occurred between the release of TLTM by the AP400 and the assertion of
TLGO by the master for the next I/O transfer. For this reason a
monostable multivibrator was included in the design. This monostable
multivibrator (an SN74LS221) is triggered by the falling edge of the
TLGO signal from the master device (after the assertion of TLTM by the
AP400). The monostable then generates a disable pulse for the TLTM and
the DEI signals for approximately 400 nanoseconds (this pulse width is
determined by the external resistor and capacitor). This allows the
next TILINE operation to begin while the AP400 is still in the process
of releasing the TLTM signal.
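For the 'LS221 the output pulse width is set by the external RC pair,
approximately tw = 0.7 * R * C. The component values below are assumed
for illustration only (the actual values appear on the schematic in
Appendix B); one consistent choice for a roughly 400 nanosecond pulse
would be:

    #include <stdio.h>

    int main(void)
    {
        double r = 5600.0;     /* ohms (assumed value)   */
        double c = 100e-12;    /* farads (assumed value) */

        /* tw = 0.7 * R * C for the SN74LS221: about 392 ns here. */
        printf("tw = %.0f ns\n", 0.7 * r * c * 1e9);
        return 0;
    }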
All control lines on the TILINE are driven by open collector logic
(thus allowing wired-OR configurations). These control signals are
generated through SN75138 quad open-collector bus transceivers designed
for single ended transmission lines. The UNIBUS control lines are
driven by the MC3438 bus transceivers.
The delayed TLGO signal is also required by the mapping logic;
thus the MAP LATCH TLGO signal is generated from it. Similarly, the
mapping logic must provide a TLTM signal when the data has been
latched. This signal is ORed with the AP400 SSYN signal by a 74LS00
NAND gate to provide the TLTM signal to the TILINE.
TILINE Bus Acquisition Logic
This section of the interface performs all actions necessary to
allow the AP400 to become a bus master on the TILINE. It performs
conversion between the UNIBUS bus acquisition signals and TILINE
acquisition signals, as well as providing control signals for other
sections of the logic. For the following discussion refer to the flow
chart in figure 8.
To gain access to the bus, the AP400 first asserts the
NON-PROCESSOR REQUEST (NPR) control line. If SELECTION ACKNOWLEDGE
(SACK) is not asserted, and neither TILINE POWER RESET nor TILINE I/O
RESET is asserted, then NPR is passed to the preset input of the Device
Access Request (DAR) flip-flop. When the DAR flip-flop is set, the
TLAG-IN signal from the next higher priority device is blocked from
passing to the next lower priority device.
TLAG-IN is then continually tested; when it has been high for at
least 100 nanoseconds, and the interface has been in the DAR state for
at least 100 nanoseconds, TILINE ACKNOWLEDGE is tested. When this
signal is unasserted the Device Acknowledge (DAK) flip-flop is set; the
interface is now in the DAK state.
After entering the DAK state the TLAK line is asserted and TLAV is
tested. When TLAV is released by the previous device the Device Access
(DACC) flip-flop is set; the MASTER control line is also asserted. The
NPG flip-flop will generate a NON-PROCESSOR GRANT (NPG) signal to the
AP400 informing it that it is now the bus master.
The AP400, upon receipt of the NPG signal, will assert SACK. The
receipt of SACK by the interface will clear the DAR flip-flop and
enable TLAG-IN to the next lower priority device, and will also clear
[Figure 8 is a flow chart of the DAR, DAK, and DACC states of the
acquisition sequence described in the text.]
FIGURE 8: Bus Acquisition Sequence
the NPG flip-flop. The DAK flip-flop will also be cleared, allowing
the next TILINE device to begin bus arbitration procedures.
Upon receipt of the release of NPG the AP400 will assert BUS BUSY
(BBSY); it may then begin data transfer through the TILINE. The AP400
will release the BBSY signal after performing the last data transfer.
The rising edge of BBSY (after the inverter) will clock the D input to
the DACC flip-flop, thus clearing it (the input to the DACC flip-flop
is low in the DACC state). This will release TLAV and the TILINE to
the next device.
Reset and Timeout Logic
Two reset lines are used in the interface logic: TLPRES and
TLIORES. The assertion of either of these lines during the bus
arbitration process will reset the DAR, DAK, and NPG flip-flops, thus
ending the sequence. It will also assert the INIT line to the AP400,
causing a hardware reset. A reset of the AP400 will cause the release
of BBSY, thus indirectly resetting the DACC flip-flop. An assertion of
the TLPRES line will reset the DACC flip-flop, thus assuring that the
interface is in the SLAVE state on power-up. The DACC flip-flop may
also be reset by a TILINE timeout or TLWAIT signal.
The two signals TLTM and TLWAIT are used in the bus timeout logic.
These signals are passed through the inverters to open collector
drivers, and then through a delay circuit. While the interface is in
the SLAVE state the TLTM input to the delay is disabled, thus
maintaining the output of the delay high (no reset occurs). When the
interface enters the MASTER state the TLTM signal is enabled; a TILINE
timeout will begin as long as TLTM is not asserted. If a data transfer
occurs and the TLTM line is asserted (the slave device responds) then
the timeout will restart. If no device responds in the specified time,
or the AP400 fails to begin data transfers in the specified time, the
timeout will occur and the DACC flip-flop will be cleared. This
timeout has been set at five microseconds.
The TLWAIT signal, if asserted at any time during data transfer,
will cause all TILINE devices to wait. To prevent a TILINE timeout
from occurring during this wait period the timeout circuitry will be
disabled while the TLWAIT signal is asserted.
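The timeout rules reduce to a simple predicate. The sketch below is in
C for illustration only, with all names assumed.

    /* ns_idle is the time since TLTM was last asserted (or since the
       MASTER state was entered). */
    int tiline_timeout(int master, int tltm, int tlwait, double ns_idle)
    {
        if (!master) return 0;     /* armed only in the MASTER state   */
        if (tlwait)  return 0;     /* TLWAIT holds the timeout off     */
        if (tltm)    return 0;     /* a responding slave restarts it   */
        return ns_idle > 5000.0;   /* 5 microseconds: clear DACC       */
    }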
AP400 Bus Terminations
The UNIBUS, as described earlier, is a doubly terminated 120 ohm
data bus. The terminations are provided on both ends by 180 ohm
pull-up resistors and 390 ohm pull-down resistors. For the AP400 to
perform correctly, these terminations must be provided at both ends of
the AP connector cable. The terminations at the 990 end are provided
on the interface boards, as described in a previous paragraph. At the
AP end it was necessary to construct a printed circuit board, with the
proper terminations, which is plugged into the UNIBUS connector.
CHAPTER VII
THE SOFTWARE INTERFACE
The AP400 software package consists of several separate modules.
Some are resident in the host computer (the 990/12), while others are
resident in the AP400 itself. The AP400 resident software consists of
the AP Executive, which handles all communication with the host and
controls the AP execution, and the AP Service Subroutines, which
perform the specific functions of the AP (they are called by the
Executive). The host resident software consists of the AP Manager and
Driver, which control all access to and control of the AP400. The
AP400 software modules are described in detail in reference [2]. The
following discussion will be limited to an overview of the host
resident software as well as differences between the 990/12
implementation and the PDP-11 implementation described in the manuals
provided by Analogic.
The host computer communicates with the AP400 through the host
system bus. This communication may be performed either through
programmed I/O or through DMA operations, as described previously. The
host resident software consists of two separate modules: the AP
Manager and the AP Driver. The Manager handles all communication
between the user application program, written in either FORTRAN or host
assembly language, and the AP Driver. The Driver handles all of the
actual communication between the AP400 and the host. It passes data
and parameters to and from the AP400 and the AP Manager routines.
The AP Driver module maintains an AP400 status table containing
information on the current status of the AP, such as whether it is
running or halted, executing a Function Control Block, etc. As calls
are made to the AP by the AP Manager through the Driver, this status
table is updated.
The AP400 resident software (consisting of the AP Executive and
the Service subroutines) maintains a Configuration Table (CFT) in AP
data memory. This configuration table contains the following
information:
Word 1:  Flag to indicate the Executive has been loaded.
Word 2:  Contains the current AP Executive version.
Word 3:  Last physical address in program memory.
Word 4:  Last physical address in data memory.
Word 5:  First free location in program memory.
Word 6:  First free location in data memory.
Word 7:  Last free location in program memory.
Word 8:  Last free location in data memory.
Word 9:  Pipeline Arithmetic Command PROM set code.
Word 10: Limit address on function table.
Word 11: First address of function table.
Word 12: Last address of function table.
Word 13: Host DMA limit count.
Word 14: Register for 24-bit data transfers (AP to Host).
Word 15: Register for 24-bit data transfers (Host to AP).
Word 16: Unused by the 990/12.
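For reference, the same layout rendered as a C structure; the field
names are invented for readability (the AP400 software defines the
table only by word position), and 16-bit words are assumed.

    #include <stdint.h>

    struct ap400_cft {
        uint16_t exec_loaded;     /* word 1:  Executive-loaded flag          */
        uint16_t exec_version;    /* word 2:  AP Executive version           */
        uint16_t pmem_last;       /* word 3:  last physical program address  */
        uint16_t dmem_last;       /* word 4:  last physical data address     */
        uint16_t pmem_free_first; /* word 5:  first free program location    */
        uint16_t dmem_free_first; /* word 6:  first free data location       */
        uint16_t pmem_free_last;  /* word 7:  last free program location     */
        uint16_t dmem_free_last;  /* word 8:  last free data location        */
        uint16_t prom_set_code;   /* word 9:  pipeline command PROM set code */
        uint16_t ftab_limit;      /* word 10: limit address, function table  */
        uint16_t ftab_first;      /* word 11: first address, function table  */
        uint16_t ftab_last;       /* word 12: last address, function table   */
        uint16_t dma_limit;       /* word 13: host DMA limit count           */
        uint16_t xfer24_to_host;  /* word 14: 24-bit transfers, AP to host   */
        uint16_t xfer24_to_ap;    /* word 15: 24-bit transfers, host to AP   */
        uint16_t unused;          /* word 16: unused by the 990/12           */
    };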
It is the responsibility of the host resident AP driver to load
the configuration table and update the status. The driver uses such
routines as GETCFT and PUTCFT to maintain updated copies of the CFT in
both the host and the AP400. The data contained in the CFT is used by
both the host and the AP for proper AP execution. The following
paragraphs describe the 990/12 implementation of the Driver and Manager.
The AP400 Host Resident Manager
The Manager consists of several separate assembly language
routines which may be called by either FORTRAN programs or other
assembly language programs. Each of the routines (referred to as a K
function) performs a specific function, such as resetting or loading the
AP, by making appropriate calls to the AP Driver routines. Functions
such as Fast Fourier Transforms (FFT's), convolutions, etc., require that
a Function Control Block (FCB) be set up. This FCB is subsequently
loaded into the AP and executed. Each of these FCB's has a
corresponding AP function (referred to as a Q function) which must be
resident in the AP prior to execution. This is done through the AP
loader routine KLOAD. Each FCB sets up the required data for execution
of the corresponding AP service subroutine. An example of a typical
FCB is contained in Appendix C. Detailed descriptions of the AP
Manager K functions are contained in the AP400 software reference
manuals.
The host resident K functions are contained in a host library.
Calls are made to the Manager through a standard FORTRAN calling
sequence:
CALL subname (argl,arg2,...,argN)
Calls from assembly language are made using a BLWP (Branch and Load
Workspace Pointer) instruction. The arguments are passed as data words
following the BLWP instruction:
       BLWP @subname
       DATA n
       DATA argument 1
       DATA argument 2
        ...
       DATA argument n
The first word following the BLWP instruction must be the number of
parameters being passed. The arguments being passed are the addresses
of the parameters to be transferred. For example, if the routine TEST
is to be called and the two parameters 5 and 7 are to be passed to the
routine, the following code is necessary:
PARM1  DATA 5          First parameter value
PARM2  DATA 7          Second parameter value
*
ARG1   DATA PARM1      Address of the first parm
ARG2   DATA PARM2      Address of the sec. parm
*
       BLWP @TEST
       DATA 2
       DATA ARG1
       DATA ARG2
The purpose for passing the address of the parameter, rather than the
parameter value itself, is to maintain compatibility with the FORTRAN
calling sequence. For more information on FORTRAN callable subroutines
see the FORTRAN reference manual [23]. An example of a FORTRAN and
assembly language callable K function is contained in Appendix C.
The Manager routines are linked to the FORTRAN or Assembly
language program through the use of a LIBRARY statement in the DX10
Linker (see the DX10 Link Editor Reference Manual [24]). For assembly
language programs, a REF (external Reference) assembler directive
statement must be included for each subroutine called.
Errors encountered by the AP Manager during execution are reported
to the terminal associated with the calling program. These errors are
displayed to the Foreground Terminal Local file through a call to an
appropriate system routine [21]. Upon completion of the program these
messages are displayed to the terminal. The error messages returned by
the manager are described in the AP400 software reference manual [2].
The following errors differ from those described:
Error -85: Used only for an End of File read error.
Error -86: Will return a specific file I/O error (error -86 will not actually be returned - an SVC error will replace it with an indication that it is an SVC error); this will be the Supervisor Call (SVC) error returned by DX10 through the SVC block used for file I/O. These errors are described in the DX10 Error Reporting and Recovery Manual [22].
Error -87: Not used.
In all other respects the AP Manager should function as described
in the AP400 Software Reference Manual [2].
The AP400 Host Resident Driver
The AP Driver consists of three separate modules: the Baseline
driver (BASE), the Program Load Driver (PLDRV), and the Relocator
(REL). The Baseline driver consists of the minimal number of
subroutines necessary for use of the AP400 by an application program.
The Program Load driver and the Relocator are the routines used to
perform 'one-time' services such as AP hardware and software reset and
AP400 program loading (the Executive and service subroutines). In this
manner, the AP400 may be initialized and loaded only once (on power
up); each application program will then need to be linked only with the
Baseline driver. The Program Load driver and the Relocator are called
by only a few of the Manager routines.
The 990/12 implementation of the Baseline driver does not support
host interrupts by the AP400. This capability may be included later,
but will probably require a modification of the AP400 interface
controller. Each application task should use polling of the AP through
the KSTAT, KSETIW, KWTFCB, or KWAIT Manager routines - these routines
allow the user program to perform other actions while the AP is
executing, then resynchronize with the AP at the appropriate point in
the program. Alternately the application program may call available
system routines to suspend itself while waiting for the AP to finish
execution. These routines (specifically the Suspend Task Supervisor
call) are explained in reference [19]. The following paragraphs
describe each of the three modules contained in the Driver.
The Baseline Driver. The Baseline driver, as implemented on the
990/12, does not allow the user to change the default AP400 Command and
Data register addresses. If the AP400 register select addresses are
changed, the Baseline driver will need to be re-assembled with the new
addresses (this consists of changing the two DATA statements at the
beginning of the driver). The driver was implemented in this fashion
due to the hardware changes necessary to install a TILINE device on the
bus, as well as changing the register addresses on the AP400 interface
board. Only a system operator should be allowed to make these changes
when necessary.
Direct Memory Access (DMA) operations by the AP400 require that
the AP be loaded with the proper DMA memory address. Usually the DMA
operation will be done to a data buffer area set up in the calling
program. The AP400, upon initialization, is loaded with the base
address of the calling task. To perform a DMA operation it is then
passed an offset (from the base address) to the data buffer area. A
DMA operation therefore requires the determination of the base address
of the calling task, as well as assuring that the task is not
'rolled-out' of memory during DMA operations.
In a time-sharing system, several tasks may be executing
simultaneously by having the CPU share its time among all of the tasks.
Since there may be, at times, more tasks executing than memory space
will allow, some method must be provided for storing tasks on a mass
storage device (i.e., a disk drive) while they are not currently being
executed. This allows several tasks to utilize the same memory space
concurrently. When a task has been temporarily stored on disk, it is
referred to as being 'rolled-out'. The operating system typically
provides a special file for this purpose called a system roll file. If
DMA operations are being performed to a certain task memory area,
however, rolling the task out (due to time sharing) would be disastrous
- the DMA operation would be performed to a completely different task,
thus overwriting it. Some means must be provided for securing the task
in system memory while the DMA operations are being performed (referred
to as a memory resident task).
Under DX10 each currently executing task has an associated Task
Status Block. This TSB remains in memory (as part of the operating
system) while the task is active. The system uses the TSB to store
information about the current status of the task - whether it is
privileged, memory resident, suspended or active, the current contents
of the task map file registers, etc. [21]. A task may therefore be made memory
resident, privileged, etc. by modifying this Task Status Block. At the
same time, the base address of the task may be determined from the
contents of the map file registers associated with the task.
In order to assure that AP400 DMA operations are performed in an
orderly fashion, one of the tasks performed by the Baseline driver is to
make the calling task memory resident. At the same time it must read
the task map file registers to determine the absolute base address of
the task in system memory. This is done by calling the Device Service
Routine (DSR) which is installed as a part of the DX10 operating
system.
If the base address of the calling task is located above 1FFFF
hexadecimal the map latch located on the interface board must also be
loaded in order to drive the upper three lines of the address bus (the
AP400 will only drive the lower 17 lines). Before any DMA operations
are performed a check must be made to assure that the DMA absolute
memory address will not cross the 1FFFF boundary. If this occurs, the
map latch must first be reloaded to prevent 'wrap-around' to an address
outside of the task memory area by the DMA operation. This address
checking is done in each of the K functions which perform DMA
operations.
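The check itself is a one-line comparison. The C fragment below is a
sketch only (the driver performs the equivalent test in assembly
language), with all names assumed.

    #include <stdint.h>

    /* Nonzero if a transfer of nwords starting at offset words from
       the task base would cross a 128K-word (1FFFF) boundary, in which
       case the map latch must be reloaded before the DMA operation. */
    int dma_needs_latch_reload(uint32_t task_base, uint32_t offset,
                               uint32_t nwords)
    {
        uint32_t start = task_base + offset;   /* absolute TILINE address */
        uint32_t end   = start + nwords - 1;
        return (start >> 17) != (end >> 17);   /* different 128K pages?   */
    }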
Upon completion of use of the AP400 by the calling task (via a
call to the manager routine KDETCH) the DSR is again called to make the
task unprivileged and non-memory resident.
All input/output operations to the AP400 are done through the use
of the Long Distance instructions (LDD and LDS) [15]. These
instructions load the user map file (map file 2) with a six word block
of memory specified in the instruction. The instruction immediately
following the LD instruction will then use the map file to determine
either the source (LDS) or the destination (LDD). The proper values
required to map the TILINE Peripheral Control Space are contained in
the baseline driver. The use of map files has been described in
chapter V.
A summary of the subroutines included in the Baseline driver is
listed in Table 4. For a more detailed discussion see the AP400
Software reference manual [2].
TABLE 4: BASELINE DRIVER ROUTINES
TELLAP - Will send a command to the AP without overwriting the current message in the command register.
WTAP - Will wait for a specified Function Control Block (FCB) or all FCB's to complete.
WTAPPM - Will wait for the AP to process the last message sent to it; this is done by continually checking the Host to AP interrupt pending status bit. Error code -75 is returned if the AP takes too long to complete.
MSTRUN - Will check to see if the AP is currently running. If not, error code -72 is returned.
MNRUN - Will check to see if the AP is halted. If not, error code -73 is returned.
SETVEC - On the first call to this routine the task is attached to the AP400 by a call to the DSR. On all other calls the AP400 register addresses are loaded into the task workspace for use by Long Distance instructions.
NORMEX - This routine is called for a normal exit (no error) from the Baseline driver.
ERREX - This routine is called for an error exit from the Baseline driver. Register 12 is made nonzero to indicate the error condition.
START - This routine calls the DSR to attach the AP400 to the calling task.
STRTAP - This routine will start AP execution at a specified address.
EXCFCB - This routine is called to execute a Function Control Block.
TERFCB - This routine is called to terminate the execution of a Function Control Block.
SNDMSG - Sends a general message to the AP400.
UPDSTA - Will determine which Function Control Block the AP is currently executing.
REPSTS - Will report the current AP execution status.
TABLE 4 (Continued)
WTFFCB - Allows the caller to suspend operation until a specific Function Control Block in the chain has been executed.
REINIT - Re-initializes the AP400.
DETACH - Will make a call to the DSR to detach the AP from the calling task.
The Program Load Driver. The host resident Program Load Driver
consists of the subroutines which are generally used only for initial
loading of the AP400 (such as after power up). It consists of the
modules described in Table 5.
The Relocator. The host resident Relocator processes
Object/Load modules produced by the AP Linker and produces
Absolute/Load modules for loading by the absolute loader ABLOAD
contained in the Program Load driver. The Relocator handles the
relocatable code in the Object file and produces absolute memory
addresses for actual loading into the AP400.
The Device Service Routine
The Device Service Routine (DSR) is a series of assembly language
routines which are generated as a part of the DX10 operating system.
These routines perform the following functions:
1) Service any interrupts generated by the AP400.
2) Perform 'housekeeping' functions upon power up of the CPU.
3) Handle 'Abort I/O' calls made from the operating system.
4) Execute a set of Operation Codes which are passed to the DSR by the Baseline driver via a standard Supervisor call.
TABLE 5: PROGRAM LOAD DRIVER ROUTINES
FLSHP - This routine will "flush" the AP400 pipeline.
CFTSET - This is the setup routine for Configuration table read/write routines.
ABLOAD - This routine performs an absolute loading of AP program and data memory (after the relocatable addresses have been determined).
RESAP - Performs a hardware and software reset of the AP400.
STOPAP - Will halt AP execution.
RESTRT - Restarts the AP at the address at which it was stopped by the STOPAP routine.
GETCFT - Reads the AP Configuration table from the AP.
PUTCFT - Stores the Configuration table in the AP.
The Interrupt Service Routine. This routine performs all
functions necessary to handle an AP400 interrupt of the host system.
At the present time, the AP400 interrupt system is not enabled, so this
routine will simply return to the calling task on entry.
The Power-Up Routine. The power up routine is called by the
system after the initial operating system load sequence. It will clear
the AP BUSY flag, indicating that the AP may be used, and return.
The Abort I/O Routine. The Abort I/O routine is called by the
operating system any time an Abort I/O supervisor call is issued by a
task. It will terminate the current AP operation to assure that it is
detached from any tasks.
Operation Codes. The Operation Codes define specific operations
to be performed on the AP400 by the Device Service Routine. These Op
Codes are specified in the Supervisor Call Block used to call the DSR.
The OP Codes for the AP400 DSR are:
00 - OPEN: Will attach the calling task to the AP400 to prevent other tasks from accessing it.
01 - CLOSE: Will detach the AP from the calling task.
02 - MEMRES: Will make the calling task memory resident for DMA operations.
03 - UNMEM: Will make the calling task non-memory resident.
04 - WCMD: Will write a command passed in the SVC block to the AP400 command register.
05 - RCMD: Will read the AP400 command register and return the value in the SVC block.
06 - WDAT: Will write the data passed in the SVC block to the AP400 data register.
07 - RDAT: Will read the AP400 data register and return the value in the SVC block.
08 - ABORT: Detaches the AP400 from a task which may have terminated abnormally without releasing the AP400.
The SVC block structure is shown in figure 9. The byte structure
is:
Byte 0:      Must be zero.
Byte 1:      Error codes are returned here.
Byte 2:      Contains the DSR Op Code.
Byte 3:      Contains the LUNO assigned to the AP400.
Byte 4:      System Flags - not used.
Byte 5:      User Flags - not used.
Bytes 6-7:   For WCMD and WDAT the command or data is passed in this word.
Bytes 8-9:   Not used - must be set to zero.
Bytes 10-11: For RCMD and RDAT the data is returned here.
A detailed description of DX10 Device Service Routine structure is
contained in reference [21].
Byte
 00   SVC OP CODE (00)    |  STATUS CODE
 02   OP CODE             |  LUNO
 04   SYSTEM FLAGS        |  USER FLAGS
 06   WCMD and WDAT BUFFER
 08   NOT USED
 10   RCMD and RDAT BUFFER
 12   NOT USED
FIGURE 9: Supervisor Call Block Structure
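The block can also be pictured as a C structure; the field names are
invented for readability, and the packing of bytes into 16-bit words is
assumed.

    #include <stdint.h>

    struct ap_svc_block {
        uint8_t  svc_opcode;  /* byte 0:      must be zero                  */
        uint8_t  status;      /* byte 1:      error codes returned here     */
        uint8_t  dsr_opcode;  /* byte 2:      DSR Op Code (00-08)           */
        uint8_t  luno;        /* byte 3:      LUNO assigned to the AP400    */
        uint8_t  sys_flags;   /* byte 4:      system flags - not used       */
        uint8_t  user_flags;  /* byte 5:      user flags - not used         */
        uint16_t write_buf;   /* bytes 6-7:   WCMD/WDAT command or data     */
        uint16_t unused;      /* bytes 8-9:   not used - must be zero       */
        uint16_t read_buf;    /* bytes 10-11: RCMD/RDAT data returned here  */
    };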
AP400/990 Operation
The steps required for operation of the AP400 with the 990/12
system are outlined below:
1) The AP400 must be loaded with an operating system. This
includes the AP Executive, several housekeeping routines, and the
necessary service subroutines. This operating system is developed
through the use of the AP Linker - a link control file must be created
which includes the AP Executive and associated subroutines. These
routines are all contained in a host resident library or libraries; the
linker will combine the specified modules into a single module. This
module is then loaded into the AP400 by calling the KLOAD subroutine
(as described in the AP400 Software Reference manual), or alternately
by using the APLOAD SCI command.
2) The FORTRAN or assembly language source program, including the
appropriate subroutine calls, must be written, compiled, and linked.
This is done through standard program development commands under DX10.
During the linking phase the AP Driver and Manager host libraries must
be included in the link control file. This is described in the AP400
Software Reference Manuals.
3) The program may then be executed. The AP400 may only be
accessed by one program at a time; while the AP Driver assures that
only one task is attached to the AP at a time, the possibility exists
that a task may be terminated without being detached from the AP
(through an unrecoverable error). If this occurs, the user should
detach the AP by using the APKILL SCI command. This will assure that
the next user may have access to the AP.
The AP Linker mentioned above and an AP Assembler are both
implemented on the 990/12 in FORTRAN. The operation of these routines
will be described in detail in an AP400 user's manual which will be
developed along with the AP400/990 interface. Details of their
operation will not be included here due to the possibility of their
being modified as the operating system is updated.
The AP Driver routines, when fully implemented, should be
completely compatible with the DX10 operating system now being used on
the 990. Notes will be made in the appropriate reference manuals of
the differences in implementation between the 990 and the PDP-11
system. The completed software package for the AP400 should include
all necessary host and AP libraries, as well as some pre-tested
application programs.
CHAPTER VIII
CONCLUSION
The 990/12 has proven to be a successful host system for the AP400
array processor; the Baseline driver and Manager routines implemented
in host assembly language perform as required. The AP400/990
implementation should prove to be a valuable computing tool for
research into such areas as real-time signal processing. With the
addition of analog to digital and digital to analog converters for the
auxiliary I/O ports on the AP400, the system could be used as an almost
independent unit - the 990 would be used for initial program loading
and possible storage of intermediate or final results, while the AP400
would function as a separate processor.
More extensive testing of the AP400/990 system will be necessary
to determine the best method of using the system. Several options
exist for the software driver implementations. At the present time,
this driver is implemented as a linked-in part of the user application
program. This method allows faster execution, but reduces the amount
of memory space available to the user task. The inclusion of the WCMD,
RCMD, WDAT, and RDAT Operation Codes in the Device Service Routine
would allow the modification of the Baseline driver to restrict all
actual AP400 I/O operations to the DSR. This would be useful in
applications where the user task cannot be made privileged (the Long
Distance instructions used in the present implementation of the driver
require that the task be installed as a privileged task).
Another method of implementing the Driver would be to include the
Baseline Driver in the Device Service Routine; in this manner the
subroutines required by the application program would be kept to a
minimum. The drawback to this type of implementation would be the
increase in processing time required by the system overhead routines.
Options also exist in the method by which the AP400 DMA operations
are performed. An alternative to the present method would be to
include a specific buffer area in the Device Service Routine from which
all DMA transfers are done. The data buffer in the application program
would then have to be transferred to this buffer before each DMA
operation. The advantage of this method is that the application
program would not have to be made memory resident - the DMA operation
would always be performed from the same base address, which is
determined when the system is generated (the operating system
containing the DSR and the data buffer is memory resident). In
addition, the map latch located on the interface board would not have
to be loaded because the memory resident portion of the operating
system is always loaded into the lower portion of system memory.
Experimentation with the AP400 driver routines for various
applications should determine the best method for each situation.
Several different drivers may be included in the system software, with
the particular driver used being determined by the application.
Problems still exist in the present hardware implementation. Some
of these are related to the PDP-11 interface controller included in the
AP400, while others are due to differences between the 990 TILINE and
the PDP-11 UNIBUS structures.
As discussed earlier, interrupt capabilities have not been
included in the present design. The major reason for this is the
difference between the UNIBUS and TILINE methods of interrupt handling.
For the UNIBUS, the interrupting device must first gain access to the
bus; an interrupt vector is then placed on the data bus to allow the
processor to execute the proper interrupt service routine. On the
TILINE, however, interrupts are generated through a single interrupt
line, with the interrupt level determined by the slot in which the
device is installed. While hardware could have been included on the
interface to handle this situation, it was felt that this would make
the hardware more complex than necessary. One method of overcoming
this problem would be to modify the microsequencer control code located
in ROM on the AP400 interface controller to be compatible with the
TILINE interrupt structure. The only hardware changes necessary would
be the inclusion of a single signal line from the UNIBUS INTR line to
the TILINE interrupt line. The Device Service Routine could then be
modified to handle the interrupts generated by the AP400. For many
applications, however, the AP400 may be used without the need for
interrupt capability (through the use of the AP400 polling routines).
Another problem in the current implementation is the lack of
capability for handling memory errors which may be encountered during
AP400 DMA operations. While both the TILINE and the UNIBUS have the
capability of generating memory parity error signals, the PDP-11
interface controller in the AP400 does not utilize the UNIBUS memory
parity signal lines. This should cause a minimum of problems, however,
because any memory parity errors encountered on the 990 will probably
be detected and logged by the CPU before the AP400 experiences any
problems related to this. In addition, the error correcting capability
of the memory controller on the 990 should keep this problem to a
minimum.
One other problem exists in the TILINE timeout circuitry. While
the timeout logic will prevent the AP400 from locking up the TILINE in
the event of an access to non-existent memory, no method is available
for informing either the AP400 or the 990 that this timeout has
occurred. One possible solution would be to use the timeout reset
signal to generate the TLTM signal for the AP400. This would allow
the AP400 to continue execution, but the application program would
never be aware that the timeout had occurred and that the data is
invalid. Another method would be to reset the AP400 in the event of a
TILINE timeout - this may not be acceptable in many situations. In the
present implementation, a timeout will simply release the TILINE to the
next bus master; the AP400 will wait indefinitely for the device to
respond. This will inform the user that a problem has occurred, even
though it will require a hardware reset of the AP400 to acknowledge it.
The most promising method for solving this problem would be to use the
timeout error signal to generate an interrupt to the 990. The
interrupt handling routine could then be used to inform the calling
task of the error. If the interrupt capability of the AP400 is
implemented at a later date, the interrupt service routine would need
the capability of distinguishing a valid interrupt from an error
interrupt. This could be done through the use of an interrupt status
flag, indicating whether the AP400 interrupts have been enabled - if
not, then the interrupt was caused by an error condition.
Use of the AP400/990 system should suggest the solutions to these
and other problems. The development of complete software libraries and
a simplified user interface should not prove difficult.
LIST OF REFERENCES
[1] Walter J. Karplus and Danny Cohen, "Architectural and Software Issues in the Design and Application of Peripheral Array Processors", Computer, volume 14 no. 9, p. 11, September 1981.
[2] AP400 Software Reference Manual, volume II, Analogic Corporation, Wakefield, Massachusetts, 1980.
[3] Multiprocessors and Parallel Processing, Comtre Corporation, John Wiley and Sons, New York, 1974.
[4] Peter M. Kogge, The Architecture of Pipelined Computers, McGraw-Hill, New York, 1981.
[5] William A. Wulf, Roy Levin, and Samuel P. Harbison, HYDRA/C.mmp: An Experimental Computer System, McGraw-Hill, New York, 1981.
[6] "Highly Parallel Computing", Computer, volume 15 no. 1, January 1982.
[7] "Data Flow Systems", Computer, volume 15 no. 2, February 1982.
[8] Robert A. Caspe, "Array Processors", Mini-Micro Systems, p. 54, July 1978.
[9] Robert Bernhard, "Giants in Small Packages", IEEE Spectrum, volume 19 no. 2, p. 39, February 1982.
[10] An Introduction to the AP400 Array Processor, Analogic Corporation, Wakefield, Massachusetts, 1979.
[11] PDP-11 Peripherals Handbook, Digital Equipment Corporation, Maynard, Massachusetts, 1975.
[12] Model 990/12 Computer Hardware Users Guide, Texas Instruments, August 1979.
[13] Model 990/12 Central Processor Unit Depot Maintenance Manual, Texas Instruments, July 1980.
[14] Models 990/10 and 990/12 Computers Memories Depot Maintenance Manual, Texas Instruments, December 1980.
[15] Model 990/12 Assembly Language Reference Manual, Texas Instruments, January 1981.
[16] TILINE Three-State Asynchronous Data Bus Specification, Texas Instruments, August 1976.
[17] Model 990 Computer DX10 Operating System Concepts and Facilities Manual, Texas Instruments, April 1981.
[18] Model 990 Computer DX10 Operating System Production Operation Manual, Texas Instruments, April 1981.
[19] Model 990 Computer DX10 Operating System Application Programming Guide, Texas Instruments, April 1981.
[20] Model 990 Computer DX10 Operating System Developmental Operation Manual, Texas Instruments, April 1981.
[21] Model 990 Computer DX10 Operating System Programming Guide, Texas Instruments, April 1981.
[22] Model 990 Computer DX10 Operating System Error Reporting and Recovery Manual, Texas Instruments, April 1981.
[23] Model 990 Computer FORTRAN Programmer's Reference Manual, Texas Instruments, April 1981.
[24] Model 990 Computer Link Editor Reference Manual, Texas Instruments, December 1979.
APPENDIX
A. TIMING DIAGRAMS
B. SCHEMATIC DIAGRAMS
C. SOFTWARE LISTINGS
[Appendix A timing diagrams (Figures A-1 through A-6) and Appendix B
schematic diagrams (Figures B-1 through B-6) appear here; the drawings
are not recoverable from the scanned text.]
FIGURE B-6: Timeout and Reset Logic
APPENDIX C
SOFTWARE LISTINGS
FCBBLK EQU  $        FCB Entry point
FCBID  DATA FFTC1    AP function ID
FCBCTL DATA 0        Control Word
FCBDON DATA 1        FCB done flag
FCBLNK DATA 0        Link to next FCB
FCBPLT DATA 0        FCB Parameter list type
FCBNRG DATA 2        Number of arguments
FCBLEN DATA 3        Length of argument list
FCBARL DATA 6        Argument list
       DATA 0           "
       DATA 0           "
       DATA 0           "
       DATA 0           "
       DATA 0           "
       DATA 0           "
LISTING C-1: A Typical Function Control Block
* KRESET: Host K-function; performs hardware and software
*         reset of the AP400.  FORTRAN calling sequence:
*
*             CALL KRESET (IRESM)
*
*         where IRESM is the amount of memory to reserve
*         (if less than 32, Kbytes is assumed).
*
* RSX-11 source by John Hawkins, 9/12/80
* Optimized for the 990/12 by Marvin Spinhirne, 4/82
*
       IDT  'KRESET'
       DEF  KRESET,KRSET        Entry points
       REF  F$RGMY,MGRFER,RESAP
*
KRESET EQU  $                   Entry point
KRSET  DATA WS,ORIG             Entry data
WS     BSS  32                  Workspace area
       DATA NAME,ORIG,1         FORTRAN link data
IRESM  DATA 0                   Parameter buffer
ORIG   BL   @F$RGMY             Link to FORTRAN
       DATA 1                   Number of parameters
       DATA IRESM               Pointer to parameter
NAME   TEXT 'KRESET'            Subroutine name
       MOV  @IRESM,R10          Get the parameter address
       MOV  *R10,R2             Get parameter value
       BL   @RESAP              Call the driver
       MOV  R12,R12             Error?
       JEQ  END                 No
       BL   @MGRFER             Yes, call error rtn
END    RTWP                     Return to caller
       END
LISTING C-2: A FORTRAN Callable K-Function
* AP400 Device Service Routine
*
* This is the operating system portion of the AP400 array
* processor driver software.  It performs I/O operations to
* the AP400, handles interrupts and power up, and assures
* that only one task at a time may access the AP400.
*
* Author: Marvin Spinhirne
* Date:   6/82
*
       IDT  'APDSR'
*
* External Definitions
*
       DEF  APDSR               DSR Entry point
       DEF  APINT               Interrupt handler entry
*
* External References
*
       REF  SETWPS,BRCALL,ENDRCD
*
* Equates
*
APHLT  EQU  >80                 Halt AP command
INTDSA EQU  >98                 Disable interrupts
APRST  EQU  >F0                 Reset AP
*
APDSR  DATA APPWR               Power up routine
       DATA ABORT               Abort I/O routine
       LIMI >F                  Do not mask interrupts
       BL   @SETWPS             Restore return vector
       BL   @BRCALL             Call OPCODE decoder
       DATA 8                   Maximum no. of opcodes
       DATA ERROR               Illegal opcode handler
       DATA OPEN                00 - Attach to task
       DATA CLOSE               01 - Detach from task
       DATA MEMRES              02 - Setup for DMA
       DATA UNMEM               03 - Release memres
       DATA WCMD                04 - Write to command register
       DATA RCMD                05 - Read from command register
       DATA WDAT                06 - Write to data register
       DATA RDAT                07 - Read from data register
       DATA ABORT               08 - Abort AP operations
*
ERR0B  BYTE >0B                 AP in use error
ERR02  BYTE >02                 Illegal op code error
*
* OPEN: Attach the AP to the calling task.  The AP BUSY
*       flag will be set, and the runtime ID of the calling
*       task will be stored.  If the AP is in use, error
*       >0B will be returned.
*
OPEN   MOV  *R4,*R4             Is the AP in use?
       JEQ  OPEN2               No
       B    @SVCERR             Yes
OPEN2  INC  *R4                 Set the AP BUSY flag
       MOV  @-8(R1),R5          Get the address of the TSB
       MOVB @15(R5),@2(R4)      Save the task runtime ID
       BL   @ENDRCD             Tell the system we're through
       RTWP
*
* CLOSE: Detach the AP from the calling task.  The AP BUSY
*        flag and the runtime ID will be cleared.  The task
*        will also be made non memory resident.
*
CLOSE  BL   @CHECK              Is it the proper task?
*
CLOSE2 CLR  *R4                 Clear the AP BUSY flag
       CLR  @2(R4)              Clear the runtime ID
       SZCB @MEMR,@10(R5)       Make it non memory resident
       BL   @ENDRCD             Tell the system we're through
       RTWP
*
* MEMRES: Make the calling task memory resident and return
*         bias register 1 of the map file in word 6 of the
*         SVC block.
*
MEMRES BL   @CHECK              Is it the proper task?
*
       SOCB @MEMR,@10(R5)       Make it memory resident
       MOV  @52(R5),@8(R1)      Return bias 1
       BL   @ENDRCD
       RTWP
*
MEMR   BYTE >20                 Data for making task memres
*
* UNMEM: Will make the task non memory resident without
*        detaching it from the calling task.
*
UNMEM  BL   @CHECK              Is it the proper task?
       SZCB @MEMR,@10(R5)       Make it non memory resident
       BL   @ENDRCD
       RTWP
*
* WCMD: Write the command stored in word 4 of the SVC to
*       the AP400 command register.
*
WCMD   BL   @CHECK              Is it the proper task?
       MOV  @4(R1),*R12         Write the command
       BL   @ENDRCD
       RTWP
*
* RCMD: Read the AP400 command register and return the
*       value in word 6 of the SVC.
*
RCMD   BL   @CHECK              Is it the proper task?
       MOV  *R12,@8(R1)         Read the command
       BL   @ENDRCD
       RTWP
*
* WDAT: Write the command stored in word 4 of the SVC to
*       the AP400 data register.
*
WDAT   BL   @CHECK              Is it the proper task?
*
       MOV  R12,R10             Calculate data reg address
       INCT R10
       MOV  @4(R1),*R10         Write the data
       BL   @ENDRCD
       RTWP
*
* RDAT: Read the AP400 data register and store the value
*       in word 6 of the SVC.
*
RDAT   BL   @CHECK              Is it the proper task?
*
       MOV  R12,R10             Calculate the data reg address
       INCT R10
       MOV  *R10,@8(R1)         Read the data
       BL   @ENDRCD
       RTWP
*
* SVCERR: Return SVC error >0B.
*
SVCERR MOVB @ERR0B,@-1(R1)      Load the error code
       BL   @ENDRCD
       RTWP
*
* CHECK: See if the calling task is the same as the task
*        attached to the AP.
*
CHECK  MOV  @-8(R1),R5          Get the TSB address
       CB   @15(R5),@2(R4)      Is it the same task?
       JEQ  OK                  Yes
       B    @SVCERR             No
OK     B    *R11                Return
*
* ERROR: Illegal opcode handler.
*
ERROR  MOVB @ERR02,@-1(R1)      Load the error code
       BL   @ENDRCD
       RTWP
*
* APINT: AP interrupt handler.
*
APINT  RTWP
*
* ABORT: Abort I/O routine - the runtime ID of the task
*        attached to the AP (if any) is returned in byte 1
*        of the SVC block.
*
ABORT  LI   R7,INTDSA           Disable AP interrupts
       MOV  R7,*R12
       LI   R7,APHLT            Halt the AP
       MOV  R7,*R12
       MOVB @2(R4),@-1(R1)      Return the runtime ID
       B    @CLOSE2             Go detach the task
*
* APPWR: Power up routine.
*
APPWR  CLR  *R4                 Clear the AP BUSY flag
       CLR  @2(R4)              Clear the runtime ID
       RTWP
       END
LISTING C-3: The Device Service Routine