IMPLEMENTATION OF AN ARRAY PROCESSOR/ MINICOMPUTER SYSTEM
by
MARVIN J. SPINHIRNE, B.S. in E.E.
A THESIS
IN
ELECTRICAL ENGINEERING
Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for
the Degree of
MASTER OF SCIENCE
IN
ELECTRICAL ENGINEERING
Approved
Accepted
August, 1982
ACKNOWLEDGMENTS
I am deeply indebted to Dr. Donald L. Gustafson for his help in
the production of this thesis, and to Dr. Thomas Krile and Dr. Milton
Smith for serving on my committee.
I would also like to thank Mr. Steve Patterson for his help with
the construction.
CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
Chapter
I. INTRODUCTION
     Background
     Purpose
     Outline of Thesis
II. ARRAY PROCESSOR ARCHITECTURE
III. THE ANALOGIC AP400
     The Pipeline Arithmetic Unit
     The Control Processor
     The Data Memory
     The I/O Unit
     Host/AP400 System Operation
IV. THE UNIBUS OPERATION
     UNIBUS Operation
     UNIBUS Specifications
V. THE 990/12 MINICOMPUTER
     The 990/12 Central Processor Unit
     The TILINE Data Bus
     TILINE Specifications
VI. THE HARDWARE INTERFACE
     The Address Bus
     The Data Bus
     DMA Memory Mapping
     Data Transfer Control Logic
     TILINE Bus Acquisition Logic
     Reset and Timeout Logic
     AP400 Bus Terminations
VII. THE SOFTWARE INTERFACE
     The AP400 Host Resident Manager
     The AP400 Host Resident Driver
     The Device Service Routine
     AP400/990 Operation
VIII. CONCLUSION
LIST OF REFERENCES
APPENDIX
A. TIMING DIAGRAMS
B. SCHEMATIC DIAGRAMS
C. SOFTWARE LISTINGS
LIST OF TABLES
1. AP400 SPECIFICATIONS
2. UNIBUS SIGNALS
3. TILINE SIGNALS
4. BASELINE DRIVER ROUTINES
5. PROGRAM LOAD DRIVER ROUTINES
LIST OF FIGURES
1. AP400/990 System Block Diagram
2. The Basic von Neumann Machine
3. AP400 Block Diagram
4. TILINE Map File and Address Development
5. 990 Memory Map
6. TILINE Master Priority Logic
7. AP400 Address Development
8. Bus Acquisition Sequence
9. Supervisor Call Block Structure
A-1. UNIBUS Read Cycle
A-2. UNIBUS Write Cycle
A-3. UNIBUS Acquisition Cycle
A-4. TILINE Read Cycle
A-5. TILINE Write Cycle
A-6. TILINE Acquisition Cycle
B-1. Data Bus Logic
B-2. Address Bus Logic
B-3. DMA Mapping Logic
B-4. Data Transfer Control Logic
B-5. Bus Acquisition Logic
B-6. Reset and Timeout Logic
CHAPTER I
INTRODUCTION
Background
In recent years the demand for computer architectures capable of
processing large arrays of data quickly and efficiently has
increased dramatically. Applications such as real-time signal
processing, simultaneous solutions of partial differential equations,
and manipulation of large arrays of data all require fast as well as
accurate processing capabilities. Several large mainframes, such as
the CRAY 1 and the CDC CYBER series of computers, while having the
capability for this type of processing, are cost prohibitive in many
smaller applications. Minicomputers, while being cost effective for
general applications, do not possess the processing speed for many
scientific applications. This state of affairs has led to the
development of the attached peripheral processors known as Array
Processors. These processors are typically designed to work with
existing minicomputers, enhancing their processing speed for particular
applications by up to 1000 times. Some array processors, such as the
IBM 3838 and the CDC MAP-III, are designed to work with existing large
mainframes.
The term "array processor" has been used to describe many
different types of computing systems - systems designed for processing
vectors, arrays of numbers, and data originating from arrays of points,
as well as systems composed of arrays of processing elements. The term
array processor will be used here to designate a scientific processor
designed as an attached peripheral for an existing host computer; it
achieves high performance through specialized architecture such as
pipelining and/or parallelism, and can be programmed by the user for
different applications. In many cases the array processor possesses
its own external I/O ports which can be configured for data acquisition
or other purposes [1].
Present day array processors are used for two basic types of
computation: vector processing and digital signal processing. Many of
the original array processors were developed specifically as signal
processing units. In addition, many of the design techniques used in
array processors were originally developed for large mainframe vector
processors. The largest use of array processors today is for increasing
the computational power of a minicomputer system in such applications
as speech synthesis and recognition, geological data processing, and
general scientific computations. Many of the currently available array
processors are designed to be as general purpose as possible while
still maintaining their processing speed.
Purpose
The purpose of this thesis is to present the implementation of an
Array Processor/Minicomputer system. This required the development of
both a hardware interface between the two systems as well as a complete
software package for control of the array processor.
The array processor used in this implementation is an Analogic
Corporation model AP400. Analogic provides interface hardware and
software for the AP400 for the following host minicomputers: Digital
Equipment Corporation PDP-11 and VAX computers; Hewlett Packard 21MX;
Data General Nova and ECLIPSE; and Interdata 7/16 and 8/16 computers
[2]. While several PDP-11 computers and a VAX system are currently
being used in the Electrical Engineering department, none of these
machines were available for use with the AP400. In order to implement
an AP/Minicomputer system which would be available for general use or
potential new areas of research, it was decided that the system should
be implemented using a Texas Instruments 990/12 computer currently
available in the department. A block diagram of the system as
implemented is shown in figure 1.
The use of the 990/12 for this system required the
development of a complete hardware interface between the 990/12 TILINE
data bus and the DEC PDP-11 interface board supplied with the AP400
(this interface board is essentially a DEC UNIBUS interface). In
addition, all of the existing software packages written for the AP/PDP
system had to be re-written and optimized for the 990/12. This
included both FORTRAN and assembly language packages, as well as some
modifications to the existing AP400 assembly language libraries. This
thesis will present the development of the hardware and software
package.
[Block diagram omitted: the 990/12, its Communications Register Unit (CRU), and the AP400, connected through the TILINE asynchronous data bus.]
FIGURE 1: AP400/990 System Block Diagram
Outline of Thesis
Chapter II presents a brief introduction to some typical computer
architectures employed in current array processor designs,
specifically parallel and pipelined structures. This chapter is
intended only as an introduction.
Chapter III presents a description of the Analogic AP400 array
processor architecture; this includes a partial description of the
software operation as well as the basic components of the hardware.
Chapter IV is a description of the operation of the DEC UNIBUS
(the data bus used on all PDP-11 computers). This material is
necessary for an understanding of the AP/990 interface.
Chapter V is a description of the 990/12 CPU and memory systems as well
as the TILINE bus structure. A brief description of the current 990
configuration is included.
Chapter VI presents the actual hardware interface which was
designed and built; chapter VII is a description of the software
implementation. Complete software listings are not included in this
thesis - only selected examples to illustrate the basic structure.
Differences between the 990/12 and the PDP-11 implementations, and
problems which were encountered or still exist are discussed.
Additional information on the operation of the system is included in
the reference manuals provided with the AP400 and the 990 system.
Chapter VIII is a concluding chapter on the operation of the
AP400/990 system, as well as suggestions for use of the system.
Appendix A contains the timing diagrams which are referred to in
chapters IV, V, and VI. Appendix B contains the schematic diagrams of
the hardware interface which are referred to in chapter VI. Appendix C
contains example software listings referred to in chapter VII.
CHAPTER II
ARRAY PROCESSOR ARCHITECTURE
Since the introduction of the first computers in the decade
following World War II, processing requirements have increased rapidly.
The technology available to the computer architect, however, has rarely
kept up with these needs; the designer is always limited by the maximum
processing speed of the individual logic elements. For this reason,
computer architectures must make increasing use of new designs and
concepts. The earliest computers were based upon the von Neumann
architecture [3]. This architecture consists of five basic units: the
Arithmetic and Logic Unit (ALU), which performs all mathematical and
logical functions; the memory unit containing both program and data
memory; the input and output units; and the Control Unit which
maintains control over the other four units (see figure 2). This basic
pattern has inherent problems in that all I/O operations must pass
through the ALU; this prevents any processing from taking place while
I/O operations are in progress. Since then, many different
methods have been implemented for overcoming this problem,
e.g., altering the data flow paths to allow I/O operations to occur
concurrently with processing operations. Two of the most widely used
methods of increasing overall computation speed have been the use of
parallel and pipelined computer architectures.
[Block diagram omitted: input unit, memory unit, arithmetic and logic unit, control unit, and output unit.]
FIGURE 2: The Basic von Neumann Machine
The distinction between parallel and pipelined architectures is
often hard to make in practical applications. Both parallel and
pipelined structures can be classified as allowing concurrent
operations - several operations are being processed within the system
at any instant. Parallel architecture designs are usually accomplished
by providing several copies of a basic piece of the computing hardware;
each component is then programmed to operate on a specific portion of
the input data. Pipelined designs consist of breaking the computation
process into several smaller subfunctions; each of these can then be
processed by a separate piece of hardware, called a stage; the stages are
arranged in a "pipeline" - the outputs of one stage are passed to the
inputs of the following stage. One of the main limitations on how fast
the data may be processed is the speed at which new data may be fed to
the input of the pipeline [4].
The use of pipelining may be demonstrated by considering the
implementation of a floating point addition. The addition operation
can be divided into the following steps [4]:
1) Subtraction of the exponents.
2) Shifting right the fraction from the number with the smaller exponent by an amount corresponding to the difference in exponents.
3) Addition of the other fraction to the shifted one.
4) Counting the number of leading zeroes in the sum.
5) Shifting left the sum by the number of leading zeroes and adjusting the exponent accordingly.
To implement this operation as a pipeline, a separate piece of hardware
can be designed to handle each of the five subfunctions. These units
are then arranged in a pipeline; the output of one stage is passed to
the input of the next stage. When each stage has processed one data
set in the input data stream, it may then begin processing the next
data set. This type of configuration will achieve the largest overall
speed improvements for large input data streams.
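
As an illustration, the C program below models the five subfunctions as separate routines. It is a sketch of the idea, not AP400 firmware; the unpacked representation (a 24-bit fraction plus an integer exponent) and the helper names are assumptions made for clarity, and a carry out of the fraction (which would require a right shift) is ignored.

    #include <stdio.h>

    /* Unpacked floating point value: a 24 bit fraction and an exponent. */
    struct fp { int exp; long frac; };

    /* Stage 1: subtract the exponents. */
    static int exp_diff(struct fp a, struct fp b) { return a.exp - b.exp; }

    /* Stage 2: shift right the fraction of the number with the
       smaller exponent by the difference in exponents. */
    static void align(struct fp *a, struct fp *b, int d)
    {
        if (d > 0) { b->frac >>= d;  b->exp += d; }
        else       { a->frac >>= -d; a->exp -= d; }
    }

    /* Stage 3: add the other fraction to the shifted one. */
    static struct fp add_frac(struct fp a, struct fp b)
    {
        struct fp s = { a.exp, a.frac + b.frac };
        return s;
    }

    /* Stage 4: count the leading zeroes in the 24 bit sum
       (overflow out of bit 23 is ignored in this sketch). */
    static int lead_zeroes(long frac)
    {
        int n = 0;
        long m;
        for (m = 1L << 23; m != 0 && (frac & m) == 0; m >>= 1)
            n++;
        return n;
    }

    /* Stage 5: shift left by the zero count, adjust the exponent. */
    static struct fp normalize(struct fp s)
    {
        int n = lead_zeroes(s.frac);
        s.frac <<= n;
        s.exp  -= n;
        return s;
    }

    int main(void)
    {
        struct fp a = { 3, 0x400000L }, b = { 1, 0x600000L };
        align(&a, &b, exp_diff(a, b));
        struct fp sum = normalize(add_frac(a, b));
        printf("exp = %d, frac = %06lX\n", sum.exp, sum.frac);
        return 0;
    }

In the hardware pipeline each routine above is a separate stage; while stage 3 adds one pair of fractions, stage 1 can already be subtracting the exponents of the next pair in the input stream.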
The use of parallelism, or multicomputer architecture, is almost
as old as modern digital computers. This type of architecture consists
of several subunits, usually identical, which are capable of
simultaneous processing. The earliest implementations of this type
were for purposes of reliability - typically two or more identical
computer systems were attached in such a way that one could take over
the processing if the other failed. While this type of system did not
increase computation speeds, it did provide the experience for later
development of truly parallel processors. Another common method of
parallelism is the provision of separate processing elements for
handling computation, I/O, and memory accesses. In this type of system
multiple data paths are usually provided between the functional
subunits. In this manner, computations may be performed while I/O or
memory accesses are being performed.
One method of implementing multiple computer systems is through
the interconnection of I/O channels. In this type of architecture,
each unit in the system treats the other as an I/O channel or device.
Interaction between systems is at the data set level only.
Another method is through the use of shared memory. In this
configuration, several separate processors have access to a common
memory unit, as well as their own private memory. Intermediate results
may then be passed between the separate units through the common
memory. In this type of system a much higher level of interaction is
necessary to maintain validity of the data in the common memory.
The most sophisticated method, and the only one which may truly be
classified as a multiprocessor system, is the type which incorporates
several ALU or CPU modules in the common architecture, all of which are
capable of operating simultaneously [3]. Some good examples of this
type of parallel system are the ILLIAC IV, which employs 64 processing
elements, and the C.mmp system which employs 16 PDP-11 processors [5].
Often a control processor is provided to maintain synchronization
between the parallel processors.
The computer architectures discussed above achieve their greatest
speed improvements for vector processing applications. Vector
processing typically refers to a computation performed repeatedly on a
large array of input data; the same computation is performed for each
element of the array. This type of processing lends itself well to
pipelined architectures in that once the pipeline has been set up for a
particular operation, the data may be streamed through at a high rate
of speed. Often the only limitation is the bandwidth of the memory
unit supplying the input data stream. For scalar operations (an
operation performed on a single data value), however, the pipelined or
parallel architecture quickly loses its speed advantages. For this
reason, many of the high speed scientific mainframes employ both vector
and scalar hardware.
Parallel and pipelined architectures are often used
simultaneously. A processor, for example, may include multiple
independent function units such as multipliers, adders, etc., which can
be used in parallel - if one multiplier is in use another may be used
concurrently. At the same time, each of the separate function units
may well be pipelined as discussed earlier for the addition. Computer
architectures are commonly broken into four distinct categories [4]:
SISD - Single Instruction stream/Single Data stream
SIMD - Single Instruction stream/Multiple Data stream
MISD - Multiple Instruction stream/Single Data stream
MIMD - Multiple Instruction stream/Multiple Data stream
Most conventional computer architectures are described by the SISD
category. Vector processors usually are contained in the SIMD
category, and multiprocessor systems in the MIMD category.
There are many experimental methods of implementing parallel
computer architectures which are currently being explored. These
methods typically employ many identical processing units; different
methods of interconnection are used to improve performance for various
processing needs. In addition, totally new architectures, such as data
flow computers, are being researched [6], [7].
One of the problems associated with the use of the computer
architectures described is the increased difficulty of programming. In
order to make full use of the parallel or pipelined elements it is
often necessary for the programmer to have a complete understanding of
the hardware structure. Pipelined processors typically present fewer
problems in this area than do parallel architectures in that a pipeline
can usually be controlled through microcode developed by the hardware
designer. The most frequently used operations (such as multiplies,
adds, etc.) are preprogrammed and are transparent to the user. The
efficient use of parallel architectures, however, is often task
dependent. Speed increases are obtained only when individual tasks can
be allocated to the separate processors.
Initially the computer architectures discussed above were confined
to large, expensive mainframes. In recent years, however, the need for
more computational capabilities at a moderate price has led to the
development of the attached peripheral or array processor.
Several different architectures have been used in the development
of modern array processors; most of these include some form of both
parallel and pipelined structures. In addition, several methods of
programming for these systems have been used. At the highest level,
FORTRAN compatible subroutines are often provided for programming
convenience, while at the lowest level microcode is typically used for
direct control of the hardware.
The major reason for the development of current array processors
was the need for high speed vector processing at a reasonable cost.
The current AP designs accomplish this by restricting their design, in
most cases, to the necessary processing elements. Most of the I/O
operations and peripheral device handling is done by the host system;
the array processor is dependent upon its host for the input data
stream and control functions. In some cases the host is also used for
pre-processing of the data.
Current array processor designs range from a single printed
circuit board which may be installed in the host system backplane, to a
completely stand alone system designed for such applications as real
time signal processing. This area is covered in detail in references
[1], [8], and [9].
CHAPTER III
THE ANALOGIC AP400
The array processor used in the implementation to be presented is
an Analogic Corporation model AP400. This processor was designed as a
high speed computational unit to be used with an existing minicomputer
system for such applications as signal processing. It achieves
processing speed increases through a combination of parallel and
pipelined architectures. This chapter presents a functional
description of the AP400; for a more detailed description refer to
reference [10].
The Analogic AP400 array processor consists of four basic
subunits: the Pipeline Arithmetic Unit, the Control Processor unit,
the Input/Output controller, and the Data Memory unit (see figure 3).
Each of the four basic units operates as a separate processor;
communication between the units is performed asynchronously where
possible, thus allowing each to operate independently of the others.
Some of the specifications of the AP400 are contained in Table 1; all
specifications refer to floating point operations. Each of the
subunits will be described below. The following information has been
summarized from references [10] and [2].
[Block diagram omitted: the Control Processor connects to the host system bus; the command and control bus and the RALU bus link it to the I/O unit (host and auxiliary ports), the Data Memory, and the Pipeline Arithmetics; the data bus and the AP400 auxiliary bus carry the data.]
FIGURE 3: AP400 Block Diagram
TABLE 1: AP400 SPECIFICATIONS
MULTIPLICATION RATE             Up to 2.1 million/sec.
ADDITION AND SUBTRACTION RATE   Up to 6.3 million/sec.
512 POINT REAL FFT              1.5 milliseconds
1024 POINT REAL FFT             3.6 milliseconds
1024 POINT COMPLEX FFT          7.4 milliseconds
REAL CONVOLUTION                7.3 milliseconds
The Pipeline Arithmetic Unit
The Pipeline Arithmetic Unit is the main processing unit of the
AP400. A complete pass through the pipeline consists of fetching
operands, operating on them, and storing the results. One pipeline
pass requires 36 clock intervals of 160 nanoseconds each.
The AP400 pipeline unit is composed of three stages:
Stage A: Data Characterization
Stage B: Data Manipulation
Stage C: Data Accumulation and Logical Manipulation
The input to the pipeline consists of eight 24-bit values; the output
will consist of four 24-bit results. Each stage of the pipeline is
controlled through a Pipeline Arithmetic Command (PAC) and may be configured for several variations of its particular function. In this
manner the pipeline may be reconfigured for a variety of applications.
The input to the characterizer stage consists of eight 24-bit
values. These eight inputs may be configured as four complex pairs of
numbers or eight independent values. The Command and Address Buffer
(CAB), located on the data memory board, contains the addresses of the
four input pairs. The characterizer may use the data values of the
first and second data words to modify the initial addresses of the
third and fourth data words; this allows the use of data look-up tables
for certain applications such as logarithms and trigonometric values.
The four data word addresses may also be passed through unmodified,
thus generating four pairs of multiple inputs to the next stage.
The output of the characterizer stage is passed to the multiplier
stage, which accepts the eight 24-bit operands and generates four
24-bit outputs. The result is a 48-bit word which may be either
rounded or truncated to a 24-bit result; two 24-bit results may be used
as two parts of a 48-bit double precision result. The purpose of the
multiplier stage is to prepare the data inputs for the ALU section of
the pipeline, which is contained in the Accumulator/Logic stage. The
multiplier stage itself consists of four adjacent multipliers, which
may be configured in several different groups. The pipeline arithmetic
command for the multiplier stage will determine:
1) which inputs will be a multiplier, multiplicand or bypass operand (since the third and fourth data word inputs may be table look-up data, the multiplier stage must be capable of passing this data through unmodified);
2) if the product will be truncated or rounded;
3) if the MSB, LSB, or the bypass operand will be passed to the next stage;
4) if an adjacent accumulator result will be introduced into the accumulator;
5) if the result will be downshifted 0, 1, 2, or 3 binary places.
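
A C bit-field can make the shape of such a command concrete. The field names and widths below are assumptions chosen only to mirror the five choices just listed; the actual AP400 PAC encoding is not reproduced here.

    /* Hypothetical layout of the multiplier stage portion of a Pipeline
       Arithmetic Command.  Field names and widths are illustrative only. */
    struct pac_multiplier {
        unsigned operand_sel : 4;  /* which inputs are multiplier,
                                      multiplicand, or bypass operand     */
        unsigned round       : 1;  /* 1 = round the product, 0 = truncate */
        unsigned out_sel     : 2;  /* pass MSB, LSB, or bypass operand    */
        unsigned accum_in    : 1;  /* introduce adjacent accumulator
                                      result                              */
        unsigned downshift   : 2;  /* downshift 0, 1, 2, or 3 places      */
    };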
The next stage of the pipeline is the Accumulator/Logic stage.
This section consists of four processing units, one for each of the
four outputs from the multiplier stage. Each processing unit consists
of two Arithmetic Logic Units (ALU's) designated the X and the Y ALU,
and associated data selection and storage hardware. This section
performs the actual computation and processing of the data. The
Accumulator/Logic stage contains eight 24-bit accumulator registers
(two for each section of the unit, designated the T and S registers)
which may be loaded by the current Pipeline Arithmetic command, or may
have been loaded by the previous PAC command. Each ALU has two inputs,
called the P and Q inputs; the inputs to each ALU may be from either of
the accumulators or from the output of the multiplier stage. The
output of the ALU's may be directed to the pipeline output or to the
accumulators for use by the next command. The functions performed by
each of the ALU's is determined by an ALU function select code
generated by the control processor.
The Control Processor
The Control Processor unit is the executive controller of the
AP400. It consists of a 16 register File Arithmetic and Logic unit
(the RALU), a program counter, program memory address register, command
and instruction decoder blocks, status bit register, and an interrupt
vector encoder. The Control Processor is implemented with four 2901
4-bit slice ALU units. The Control Processor is responsible for
performing pipeline setup and control, as well as the data memory
allocation.
The control program for the Control Processor is contained in the
Program Memory, which consists of 2048 22-bit words. This memory is
loaded with the AP400 Executive and Function Library by the host
computer prior to operation. The program memory address register, 12
bits in length, is used to access the program memory. It may receive
inputs from the interrupt encoder, the command and instruction bus, the
RALU bus, or the program memory itself. Instruction prefetch is
performed using the Program Memory Data Register. The control
processor does not have the capability of modifying the program
memory; this can only be done by the host computer.
The control processor contains 16 16-bit registers (R0-R15) which
are used for address development. This allows addressing of up to
65536 words of data memory, which is separate from the program memory.
The control processor can access data memory on a cycle stealing basis
with the Pipeline Arithmetic section, thus allowing it to handle its
own data memory allocation and perform any necessary scalar processing
which cannot be efficiently performed by the pipeline. Stack
operations (for subroutine and interrupt handling) are performed by
using register RO as a stack pointer. The vector interrupt encoder
section can handle up to eight interrupt levels; the interrupts can be
individually masked.
The Data Memory
The Data Memory consists of up to 65536 24-bit words. The memory
addresses can be generated by the three other units - the control
processor, the pipeline unit, and the I/O unit. Access to data memory
is handled on a priority basis, with the pipeline having the lowest
priority. This allows the I/O unit and the control processor to
utilize the bus for data transfers; the pipeline clock is halted
during this transfer. The pipeline unit obtains data directly from the
data memory. The command and address data, however, are obtained
directly from the control processor via the RALU bus. These commands
are stored in the Command and Address Buffer (CAB) for use by the
pipeline. The CAB can hold up to 64 24-bit words, which constitutes 16
pipeline commands. The control processor is responsible for keeping
the buffer full during pipeline operation. If the buffer is emptied
the pipeline will halt itself until the buffer is filled; this allows
asynchronous communication between the two units.
The I/O Unit
The Input/Output (I/O) unit provides for communication between the
array processor and the host computer, as well as control of the
auxiliary I/O port. The I/O unit is responsible for directing the
operation of the array processor - Halt, Run, Single Step etc. It also
performs all data transfer between the host and the array processor,
controls DMA transfers, and performs diagnostic functions.
The host interface is performed through a particular host's data
bus. The interface is controlled through two addresses - one for the
Command Register and one for the Data Register. The communication may
be performed in three different modes: programmed I/O, DMA, or
interrupt.
In the programmed I/O mode, the host controls the array processor
through the command and data registers. The command register is used
to write to the command and message register of the AP400, thus passing
commands to the unit; it is also used to read from the AP message and
status register, thus determining the current state of the AP. The
data register is used to pass data to and from the AP. The host may
transfer an immediate command, which will be executed upon receipt by
the AP, or a non-immediate command which will be used to route data
transferred through the data register.
In DMA operations, the host loads the proper registers in the AP
with the value of the Host memory and AP memory addresses from and to
which the DMA transfer will occur. The host then directs the AP to
perform the DMA, at which time the AP will gain control of the bus to
perform the transfer. This process allows operation of the AP with a
minimum of overhead required of the host.
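
A minimal sketch of this sequence, as seen from host software, follows. Only the two-register structure (the Command Register and the Data Register) comes from the description above; the command codes are invented for illustration, and the two variables stand in for the interface registers, which actually sit at fixed host bus addresses.

    /* Hypothetical host-side DMA setup for the AP400. */
    static volatile unsigned short AP_CMD;   /* AP400 Command Register */
    static volatile unsigned short AP_DATA;  /* AP400 Data Register    */

    #define CMD_SET_HOST_ADDR 0x01  /* hypothetical non-immediate command */
    #define CMD_SET_AP_ADDR   0x02  /* hypothetical non-immediate command */
    #define CMD_START_DMA     0x03  /* hypothetical immediate command     */

    static void ap_start_dma(unsigned short host_addr,
                             unsigned short ap_addr)
    {
        AP_CMD  = CMD_SET_HOST_ADDR;  /* route the next data word...      */
        AP_DATA = host_addr;          /* ...to the host address register  */
        AP_CMD  = CMD_SET_AP_ADDR;
        AP_DATA = ap_addr;
        AP_CMD  = CMD_START_DMA;      /* the AP now acquires the bus and  */
    }                                 /* performs the transfer itself     */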
If the proper status bits in the AP have been set by the host, the
AP will be allowed to interrupt the host upon completion of some
specified action. The host resident interrupt routine for the AP is
then executed; the action taken by this routine will depend on the host
resident software (the AP Driver and Manager routines).
The AP400 Auxiliary I/O port is also controlled by the I/O unit.
This port consists of two 24-bit registers and associated handshake
signals (this provides one 24-bit input and one 24-bit output port).
This port may be used by the AP for input and output of data, thus
allowing such operations as real-time signal processing which is
completely independent of the host computer. The host must load the
appropriate control program into the AP before operation; the unit can
then be allowed to proceed on its own, interrupting the host in the
event of an error or upon completion. The auxiliary port could
possibly be used for access to external devices such as a disk drive or
memory.
Host/AP400 System Operation
The AP400 is controlled by the Executive Control program located
in AP program memory. This program must be developed on the host
computer and loaded into AP memory prior to operation of the processor.
It is written in AP400 machine language, which is specific to the AP
control processor, and assembled through a cross-assembler located on
the host machine. It must then be linked with the necessary Library
functions for the operations to be performed (FFT's, convolutions,
etc.). Both the Assembler and Linker are written in host FORTRAN.
In addition, an AP400 software driver routine must be resident in
the host. This driver is written in host assembly language and
performs all I/O operations with the AP400. Calls are made to the
driver from host FORTRAN or assembly language programs.
At a lower level, the control processor uses Pipeline Arithmetic
Commands (PAC's) to control the pipeline unit. These are stored in
programmable read only memory units which are factory programmed by
Analogic. Provisions have also been made to allow the end user to
change these PROM's to develop their own firmware. The PAC's consist of
two basic units: the PIPE which actually controls the pipeline
function, and the PAD which performs setup operations. Each PAC
consists of five PIPEs and four PADs. With careful programming,
PAC's may be interleaved to provide faster operation.
All AP400 arithmetic operations are performed on arrays of data
stored in data buffers in the AP400 data memory. These data buffers
are transferred from the host by special Library Functions which also
perform any necessary conversion between host and AP400 floating point
representations. When a block of data has been transferred to the
AP400, any number of operations may be performed on the data; the
functions may be "chained" together. When finished, the AP may be
directed to transfer the result (which is also contained in an AP data
buffer) back to host memory. All data buffer transfers are done
through DMA operations.
A complete software library has been developed for the AP400.
This includes many AP resident routines used to perform the library
functions, as well as the host resident manager subroutines. The
operation of the AP400 may be tailored to individual needs through the
AP Assembler and Linker. Utilization of the auxiliary I/O ports allows
use of the AP400 for real-time signal processing applications.
CHAPTER IV
THE UNIBUS OPERATION
The AP400 array processor was designed to interface to Digital
Equipment Corporation PDP-11 minicomputers. The interface controller
for the PDP machine performs all communication with the host through
the host system bus - in this case the DEC UNIBUS. This chapter
presents an overview of the UNIBUS operation.
The UNIBUS is the bidirectional asynchronous data bus used by all
PDP-11 minicomputers for data transfer. It is a 16 bit data path
implemented with 56 signal lines: 18 address lines, 16 data lines, 7
control lines, and 12 priority arbitration lines.
Communication between devices on the bus is done through a
master/slave relationship. One device on the bus gains control of the
bus (the bus master); the device to which it issues commands is the
slave device. A typical bus master is the CPU, while a typical slave
device is the memory. Some devices, such as a disk drive controller,
may be both a master and a slave, depending on the mode of operation.
Bus mastership is typically granted on a priority basis. Each
master is assigned a priority level, and access to the bus is granted
according to this priority.
The UNIBUS is completely asynchronous in operation. Maximum
transfer rates are determined by device speed and length of the bus.
With optimum device design the maximum transfer rate is 2.5
megawords/second. The following information has been summarized from
reference [11].
The UNIBUS address, data, and control lines are described in Table
2.
UNIBUS Operation
The following paragraphs describe the sequence of events and
timing requirements for UNIBUS data transfer operations. Timing
diagrams for each of the operations are included in Appendix A.
Master - Slave Read Cycle. After gaining access to the bus, a
master performs a UNIBUS read cycle by placing the address on the bus
and asserting the proper control lines (C0 and C1). After 150
nanoseconds (75 nanoseconds maximum UNIBUS skew and 75 nanoseconds for
address decode) the master asserts MASTER SYNC (MSYN) to indicate that
the data is valid. The device which has been addressed then performs
the requested read operation and places the data on the data lines.
When the data is valid, the slave asserts SLAVE SYNC (SSYN). The
master then waits 75 nanoseconds after receipt of SSYN for data deskew
and strobes the input data. The master then releases MSYN, and after
75 nanoseconds releases the address and control lines. The slave,
after receiving the release of MSYN, will release SSYN. The master may
then release the bus or perform another data transfer.
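
The sequence can be summarized as the following trace program, which simply prints the handshake steps in order. The helper routines stand in for bus hardware; this is a reading aid, not DEC software.

    #include <stdio.h>

    static void master(const char *s) { printf("master: %s\n", s); }
    static void slave(const char *s)  { printf("slave:  %s\n", s); }

    int main(void)
    {
        master("drive address lines, set C0/C1 for a read");
        master("wait 150 ns (75 ns bus skew + 75 ns address decode)");
        master("assert MSYN");
        slave("decode address, perform the read, drive data lines");
        slave("assert SSYN when the data is valid");
        master("wait 75 ns for data deskew, strobe the data");
        master("release MSYN");
        master("wait 75 ns, release address and control lines");
        slave("release SSYN after seeing MSYN drop");
        master("release the bus or begin the next transfer");
        return 0;
    }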
TABLE 2: UNIBUS SIGNALS
Address Lines - 18 lines (0 through 17)
Data Lines - 16 lines (0 through 15)
Control Lines C0 and C1 - 2 lines used by the bus master to designate a read or a write operation.
MASTER SYNC (MSYN) - 1 line; used by the master device to indicate valid data on the address and data lines.
SLAVE SYNC (SSYN) - 1 line; used by the slave to indicate that it has completed its part of the data transfer.
INTERRUPT REQUEST (INTR) - 1 line; asserted by the bus master to indicate an interrupt request; the interrupt vector must be on the data lines at this time.
BUS REQUEST (BR4-BR7) - 4 lines; used to request bus access for an interrupt.
BUS GRANT (BG4-BG7) - 4 lines; used by the bus arbiter to grant access to the bus in response to a Bus Request.
NON-PROCESSOR REQUEST (NPR) - 1 line; used to request access to the bus for a standard data transfer.
NON-PROCESSOR GRANT (NPG) - 1 line; used by the bus arbiter to grant the bus in response to a NPR.
SELECTION ACKNOWLEDGE (SACK) - 1 line; used by a master to acknowledge the bus grant.
BUS BUSY (BBSY) - 1 line; indicates that the bus is in use by a master.
PARITY ERROR (PA,PB) - 2 lines; used by a slave to indicate that a parity error has occurred.
INITIALIZE (INIT) - 1 line; used to reset all UNIBUS devices upon execution of a RESET instruction, activation of the console START switch, or upon power failure.
AC LINE LOW - 1 line; indicates impending failure of the AC power.
DC LINE LOW - 1 line; indicates impending failure or instability of the DC power.
Master - Slave Write Cycle. The master gains access to the bus,
then places the proper address and data on the lines and asserts the
proper control lines. The master may then assert MSYN, after a 150
nanosecond deskew delay, to indicate valid data. Upon receipt of MSYN
the addressed slave device performs the write operation, and asserts
SLAVE SYNC (SSYN). The master may release MSYN upon receipt of the
SSYN signal. The address, data, and control lines may be released
after a 75 nanosecond deskew time. The slave will release SSYN upon
receipt of the release of MSYN; the master may then release the bus or
perform another data transfer.
Bus Acquisition Sequence. The master device first asserts
NON-PROCESSOR REQUEST (NPR) to indicate a bus request. The bus arbiter
will assert NON-PROCESSOR GRANT (NPG) when the bus becomes available.
The requesting device, on receipt of NPG, will assert SELECTION
ACKNOWLEDGE (SACK) to indicate it has received NPG; it may also release
NPR at any time after asserting SACK but before SACK is released. The
arbiter, upon receipt of SACK, will release NPG. The requesting device
then begins monitoring the BUS BUSY (BBSY) line. When BBSY has been
released, the requesting device asserts this line and begins data
transfer operations. The master device may release SACK at any time
after the assertion of BBSY but before BBSY is released - this allows
the bus arbiter to resume the arbitration sequence for the next bus
master. The present bus master releases BBSY upon completion of the
data transfer sequence.
Interrupt Sequence. The bus master generates a system interrupt
by gaining access to the bus and asserting the INTERRUPT REQUEST (INTR)
line. The bus acquisition sequence is performed as described above
except that the BUS REQUEST (BR) and BUS GRANT (BG) lines are used
instead of the NPR and NPG lines. The master, after asserting INTR,
places an interrupt vector on the data lines. The processor strobes
this interrupt vector after a 75 nanosecond deskew delay, then asserts
SSYN. The master, upon receipt of SSYN, releases INTR and the bus.
The processor then uses the interrupt vector to determine the address
of the proper interrupt service routine for the device.
UNIBUS Specifications
The UNIBUS utilizes 120 ohm characteristic impedance doubly
terminated transmission lines. UNIBUS signal levels are:
Logic 1 = 0 volts (LOW)
Logic 0 = +3.4 volts (HIGH)
The rest state of the bus (except BG and NPG) is a logic 0 level of
approximately 3.4 volts. Typical receiver switching threshold is 1.5
volts.
Timeout protection is usually provided by the bus master on the
UNIBUS. A monostable multivibrator is triggered each time the MSYN
signal is asserted by the master; if the slave device does not respond
within the timeout period, the SSYN signal is generated by the
monostable multivibrator. Timeout is usually set at 10 to 25
microseconds on the UNIBUS. Maximum UNIBUS length is 50 feet (using a
ribbon cable). Maximum UNIBUS loading without a repeater is 20 bus
loads.
The interface between the AP400 and the TI 990/12 was designed to
translate the UNIBUS control, address, and data signals described to
the proper levels required by the 990/12. Subsequent chapters will
present the 990/12 and the interface logic.
CHAPTER V
THE 990/12 MINICOMPUTER
The AP400 array processor is designed to operate with an existing
host minicomputer system; in this case the host is a Texas Instruments
990/12 minicomputer. All communication with the AP400 is done through
the host system bus. For the 990/12 this is the TILINE asynchronous
data bus. This chapter presents a brief description of the 990/12
processor and a detailed description of the TILINE bus operation.
The 990/12 processor has a 16-bit word length and incorporates
floating point arithmetic, byte string operations, bit array
instructions, and multiprecision integer and decimal conversion. The
990/12 is implemented in three basic units: the Arithmetic Unit (AU),
the System Mapping Interface (SMI), and the memory and memory
controller unit. Input/Output operations are handled by two different
methods: through the TILINE asynchronous data bus or through the
Communications Register Unit (CRU). All high speed data transfers,
such as disk I/O, memory access, and DMA transfers are done through the
TILINE. Slower operations, such as communication with terminals and
printers, are done through the CRU [12].
The 990/12 Central Processor Unit
The central processor, as discussed earlier, is implemented in
three separate units. The following paragraphs describe the operation
of each of these units.
The Arithmetic Unit. The Arithmetic Unit is implemented on one
ten-layer printed circuit board with four SN74S481 4-bit slice
processor elements, which comprise the arithmetic and logic unit.
Control functions are performed by three cascaded SN74S482 4-bit slice
microsequencer elements using read-only memory (ROM) microsequencing.
The Arithmetic Unit is microprogrammed to implement a super-set of the
9900 microprocessor instruction set. Additional instructions, such as
floating-point multiplication and division, have been included. The AU
is also user microprogrammable through the Writable Control Store,
which consists of 1024 64 bit words. Special instructions are included
in the instruction set for loading the WCS.
The 990 series of minicomputers utilize multiple register files,
consisting of 16 registers each, which reside in main memory. The only
user accessible registers located on the AU board are the Program
Counter, Memory Counter, and the Status register. In addition, the
990/12 architecture includes a workspace cache which contains a copy of
the current workspace registers [13].
Memory. Memory on the 990/12 consists of up to two megabytes of
error checking and correcting memory. Each memory word consists of 22
bits, which includes the 16 bits of data and 6 bits used for the error
detection and correction. All memory is controlled by the memory
controller unit, which generates the error detection bits and handles
refresh for the 4116 dynamic memory elements. The 6-bit code generated
by the controller (a modified Hamming code) during store operations
allows detection and correction of single bit errors and detection of
two or more errors. Light emitting diodes mounted on the controller
board allow user location of faulty memory devices.
In addition to the main memory controller, a cache memory
controller with 2048 bytes of cache memory may be included. The cache
memory consists of two banks of fast memory devices. Each memory word
consists of 16 data bits, 2 parity bits, 1 data error bit, 11 address
bits, 2 address parity bits, and 1 validity bit. This memory is used
to store frequently accessed data from the relatively slow main memory.
Calls for data from the processor are honored faster than would be
possible from main memory. The cache controller stores the contents of
the last addressed 512 odd/even memory word pairs in cache memory;
when a word is accessed by the TILINE device the cache controller
searches the cache memory to see if the word is present. If it is not
present, the word is added to the cache memory for the next access
[14].
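
The lookup can be pictured as a direct-mapped search, sketched below. The tag and index arithmetic is an assumption consistent with the 512 odd/even pair organization described above; the real controller also maintains the parity, address parity, and data error bits listed earlier.

    #include <stdint.h>

    /* One cache line: an odd/even word pair plus tag and validity.
       Parity bits are omitted from this sketch. */
    struct cache_line {
        uint16_t even_word, odd_word;
        uint16_t tag;      /* upper bits of the cached pair address */
        int      valid;
    };

    static struct cache_line cache[512];   /* 512 odd/even word pairs */

    /* addr is a 20 bit TILINE word address.  Returns 1 on a hit. */
    static int cache_lookup(uint32_t addr, uint16_t *data)
    {
        uint32_t pair  = addr >> 1;        /* odd/even pair number   */
        uint32_t index = pair & 0x1FFu;    /* 9 bits select the line */
        uint16_t tag   = (uint16_t)(pair >> 9);
        struct cache_line *l = &cache[index];

        if (l->valid && l->tag == tag) {   /* hit: honor from cache  */
            *data = (addr & 1u) ? l->odd_word : l->even_word;
            return 1;
        }
        return 0;                          /* miss: fetch from main  */
    }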
The System Mapping Interface. The System Mapping Interface is
implemented on an additional ten layer printed circuit board and
contains the TILINE and CRU interface, interrupt logic, loader and
self-test ROM, system clock, a 12-millisecond test clock, front panel
interface, error and diagnostic registers, and memory mapping hardware.
All memory addresses generated by the AU board are only 16 bits wide
(the word size of the machine). The memory mapping hardware contained
on the SMI board uses this 16 bit address to generate the 20 bit
address actually used by the TILINE for peripheral and memory
addressing. This extends the physical address space of the machine
from 65536 bytes to 2 megabytes [13].
The memory mapping logic contained on the SMI board consists of
four sets of mapping registers called Map Files. Each set of mapping
registers consists of three limit registers and three bias registers.
When a specified map file is used for memory mapping the 16 bit address
generated by the AU board is compared with each of the limit registers
in the Map File. If the 11 most significant bits of the 16 bit address
are less than or equal to limit register 1, then bias register 1 is
used for the mapping. If the address is greater than limit 1 but less
than or equal to limit 2, then bias register 2 is used; if it is
greater than limit two but less than or equal to limit 3, then bias 3
is used.
The actual 20 bit address is computed by taking the sum of the 16
bit processor address and the 11 most significant bits of the bias
register extended to the right with 5 zeroes. The least significant
bit of the 16 bit processor address is dropped (thus only words - 16
bits - can be accessed on the TILINE). Bits 0 and 1 (the 2 least
significant bits) of the limit registers are used by the operating
system to designate the status and protection of the mapped memory
segment. Figure 4 depicts the map file registers and development of
the 20 bit TILINE address [15].
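
The selection and summation can be expressed compactly in C. This is a minimal sketch of the SMI logic as just described, not a transcription of TI hardware; it assumes the bias is applied as a multiplication by 32 (extension to the right with 5 zeroes), which is how the figure aligns the bias under the 20 bit result, and it uses the 11 most significant address bits for the limit comparison as the text states.

    #include <stdint.h>

    struct map_file {
        uint16_t limit[3];   /* L1, L2, L3; bits 0-1 hold status flags */
        uint16_t bias[3];    /* B1, B2, B3 */
    };

    /* Develop the 20 bit TILINE address from a 16 bit processor
       address. */
    static uint32_t tiline_address(const struct map_file *mf,
                                   uint16_t paddr)
    {
        uint16_t top11 = paddr >> 5;     /* 11 MSBs of the address   */
        int i;

        for (i = 0; i < 2; i++)          /* select bias 1, 2, or 3   */
            if (top11 <= (uint16_t)(mf->limit[i] >> 5))
                break;

        /* sum of the processor address and the bias extended to the
           right with 5 zeroes; the hardware does not pass the least
           significant bit, since only whole words cross the bus */
        return ((uint32_t)paddr + ((uint32_t)mf->bias[i] << 5)) & 0xFFFFFu;
    }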
[Diagram omitted: the map file holds three limit registers (L1, L2, L3) and three bias registers (B1, B2, B3); the bits of the 16 bit processor address (p) are summed with the bias register address bits (b) to form the 20 bit memory address (m).]
FIGURE 4: TILINE Map File and Address Development
The 990/12 memory map is shown in figure 5. The first 32 words
are used for the 16 interrupt trap vectors. The next 32 words are
used for XOP instruction trap vectors. Addresses in the range F800
through FBFE (hexadecimal) are mapped into the TILINE Peripheral
Control Space (TPCS), consisting of addresses FFC00 through FFDFE.
These addresses are reserved for TILINE peripheral devices, such as
disk drive controllers, etc. (the AP400 is a TILINE device).
Addresses FC00 through FFFE are used to access the loader and self-test
ROM [13].
Each slot in the chassis of the 990/12 has a wired interrupt
level. When the device generates an interrupt its level is compared
with the interrupt mask contained in the current status register. If
the interrupt is less than or equal to the enabled interrupt level, an
interrupt trap is made to the appropriate address in the 32 words of
interrupt trap vectors. Each interrupt trap contains two words - the
first is the workspace pointer for the routine and the second is
the entry point of the interrupt routine. Control is then passed to the
appropriate interrupt service routine.
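
The two-word trap layout and the level comparison can be modeled as below. This is a C rendering of the dispatch just described, not 990 microcode; the array stands in for the vectors that actually reside at memory address 0000.

    #include <stdint.h>

    struct trap_vector {
        uint16_t wp;   /* workspace pointer for the service routine */
        uint16_t pc;   /* entry point of the service routine        */
    };

    /* Model of the 16 trap vectors occupying addresses 0000-003C. */
    static struct trap_vector traps[16];

    static void take_interrupt(unsigned level, uint16_t mask,
                               uint16_t *wp_reg, uint16_t *pc_reg)
    {
        if (level <= mask) {             /* enabled at this priority? */
            *wp_reg = traps[level].wp;   /* switch workspaces         */
            *pc_reg = traps[level].pc;   /* enter the service routine */
        }
    }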
The TILINE Data Bus
The 990/12 uses the TILINE, which is a high speed bidirectional
data bus, for I/O transfers between all high speed system elements such
as memory and disk drive controllers. The TILINE is an asynchronous 16
bit data bus; the speed of operation is determined by the devices on
the bus.
Memory Address   Area Definition
0000 - 003C      Interrupt trap vectors (levels 0 through 15)
0040 - 007C      XOP transfer vectors (0 through 15)
0080             Front panel workspace
00A0             General memory area
F800             TILINE Peripheral Control Space
FC00             Programmer panel and loader (PROM)
FFFC - FFFE      Restart vector
FIGURE 5: 990 Memory Map
The TILINE signal lines fall into three groups: Address, Data, and
Control [16].
The TILINE, as does the UNIBUS, operates on the master/slave
concept. Each device may be either a master, a slave, or both. When
operating as a slave, the device may only respond to requests made by
the master which is currently in control of the bus. Each slave device
has a particular address or addresses which the master uses to pass
commands and data to and from the device. Main memory is a typical
slave device in that it can respond only when a read or write operation
is done to a particular memory address.
A master device has complete control of the TILINE bus and may
access any slave device on the bus. Each master device must first
gain access to the bus before performing any I/O transfers. This is
done through an arbitration scheme which is handled by special hardware
on the SMI board. TILINE operations are described in detail in the
following paragraphs. Timing diagrams for the TILINE operations are
contained in Appendix A. TILINE signals are described in Table 3.
Master to Slave Write Cycle. The TILINE master, after obtaining
the bus, places the data and address on the lines. At the same time it
asserts the TILINE GO (TLGO) signal and pulls the TILINE READ (TLREAD)
signal low. Each slave device on the bus is responsible for decoding
the address to determine if it is the device being accessed.
TABLE 3: TILINE SIGNALS
Address - 20 lines (AOO through A19)
Data - 16 lines (D00 through D15)
TILINE GO (TLGO) - 1 line; used by master to initiate data transfer.
TILINE TERMINATE (TLTERM) - 1 line; used by slave to indicate termination of a data transfer.
TILINE MEM ERROR (TLMER) - 1 line; indicates a memory error.
TILINE READ (TLREAD) - 1 line; a logic 0 indicates a read operation, a logic 1 indicates a write operation.
TILINE ACCESS GRANTED (TLAG) - 2 lines (TLAG IN and TLAG OUT); establishes master device priority.
TILINE ACKNOWLEDGE (TLAK) - 1 line; used in bus arbitration.
TILINE AVAILABLE (TLAV) - 1 line; used in bus arbitration.
TILINE POWER RESET (TLPRES) - 1 line; generated by the power supply on power up to reset all TILINE devices.
TILINE POWER FAILURE WARNING (TLPFWP) - 1 line; indicates an impending failure of the DC power.
TILINE I/O RESET (TLIORES) - 1 line; generated by the CPU to reset all TILINE devices.
TILINE WAIT (TLWAIT) - 1 line; inhibits all activity on the TILINE to resolve conflicts during computer to computer communication.
TILINE HOLD (TLHOLD) - 1 line; used by TILINE couplers to prevent memory modification during computer to computer communication.
All slaves are also responsible for delaying the TLGO signal for the
time required to perform the address decode. This delay must also take
into account the TILINE skew time which is defined to be 20
nanoseconds maximum. If the proper slave does not respond within 1.5
microseconds a TILINE timeout will occur, generating an error
interrupt. After decoding the address the selected slave device
performs the write cycle and then asserts the TILINE TERMINATE (TLTERM)
signal. When the master receives the assertion of TLTERM it must
release TLGO, TLREAD, and TLDAT lines within 120 nanoseconds. When the
slave receives the release of TLGO it must release TLTERM within 120
nanoseconds. The TILINE master may then perform another read/write
operation or relinquish the bus to another device.
Master to Slave Read Cycle. The TILINE master asserts TLGO and at
the same time generates the proper TILINE address and TILINE READ
signals. After address decoding, the addressed slave device places the
proper data on the lines and asserts TILINE TERMINATE (TLTERM). If an
error occurs during the read operation the slave asserts the TILINE
MEM ERROR (TLMER) signal. When the master receives the TLTERM signal it
must release the TLGO and TLADR signals. At this time it must be
finished with the TLDAT and TLMER lines. When the slave receives the
release of TLGO it must release the TLTERM and TLDAT lines within 120
nanoseconds. The master may then perform another read/write operation
or relinquish the bus to another device.
TILINE Bus Acquisition. A TILINE device must gain access to the
bus before it may perform any I/O operations. This is done through the
use of the three TILINE control signals TLAG, TLAK, and TLAV.
All TILINE master devices are connected to the bus in order of
priority (figure 6). When a master device is not utilizing the bus or
attempting to gain access to the bus, it is in the idle state. During
this time TLAG-IN from the next highest priority device is passed on
to the next lower priority device. When the master is attempting to
gain access to the bus it disables TLAG-OUT to the next device and
monitors TLAG-IN from the higher priority master. When TLAG-IN has
been high for at least 100 nanoseconds the access controller will pull
TILINE ACKNOWLEDGE (TLAK) low and monitor TILINE AVAILABLE (TLAV). When
TLAV is released, the requesting device will pull TLAV low and TLAG-OUT
is again enabled to the lower priority devices. At this time the
master has complete control of the TILINE bus and may begin I/O
transfers to a slave device. After the last data transfer sequence the
controller releases the TILINE and returns to the IDLE state.
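
As with the UNIBUS, the sequence reads most easily as an ordered trace; the program below simply prints the steps of the acquisition just described.

    #include <stdio.h>

    static void step(const char *s) { printf("%s\n", s); }

    int main(void)
    {
        step("idle: pass TLAG-IN through to TLAG-OUT");
        step("request: disable TLAG-OUT to lower priority devices");
        step("wait until TLAG-IN has been high for 100 ns");
        step("pull TLAK low and monitor TLAV");
        step("when TLAV is released: pull TLAV low, re-enable TLAG-OUT");
        step("bus master: perform I/O transfers to the slave");
        step("done: release the TILINE and return to the idle state");
        return 0;
    }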
TILINE Specifications
The TILINE Address, Data, TLMER, TLPRES, TLPFWP, and TLIORES lines
use the following signal levels:
Logic 0 > 2.0 volts (High)
Logic 1 < .8 volts (Low)
[Diagram omitted: TILINE master devices daisy-chained in order of priority.]
FIGURE 6: TILINE Master Priority Logic
The remaining TILINE lines use the following levels:
Logic 0 > 3.0 volts (High)
Logic 1 < 1.0 volts (Low)
The Address and Data lines on the TILINE are tri-stated (i.e., in a
high impedance state when not being used) and are not terminated. The
control lines are terminated in the computer backplane; the termination
value depends on the signal [16].
The 990/12 computer system is currently using a Texas Instruments
DX10 release 3.4 operating system [17-22]. This operating system
supports language compilers for FORTRAN, Pascal, and COBOL as well as
BASIC and FORTH interpreters. System hardware includes 512 kilobytes
of error correcting memory, two 50 megabyte disk drives, eight video
display terminals, and five dot matrix printers. A 2400 Bit Per Second
synchronous communications link to the University computer system (a
National Advanced Systems AS/6) using IBM 3780 protocol has also been
implemented.
The interface between the AP400 and the 990/12 required the
development of both a hardware interface between the two systems (the
TILINE and the UNIBUS) as well as a complete software driver for the
AP400. The remaining chapters will present the development of this
interface.
CHAPTER VI
THE HARDWARE INTERFACE
The hardware interface between the Analogic AP400 array processor
and the Texas Instruments 990/12 minicomputer consists of two half-size
wire-wrap boards located in the main chassis of the 990/12. These
boards are connected by a ribbon cable and together perform the
functions of a full size TILINE controller. The interface performs the
following basic functions:
1) Provides buffering between the TILINE 20 line tri-state address bus and the AP400 18 line open-collector address bus.
2) Provides buffering between the TILINE 16 line tri-state data bus and the AP400 16 line open collector data bus.
3) Provides memory mapping hardware for use by the AP400 in DMA operations - this allows the AP400 to drive all 20 lines of the TILINE address bus.
4) Provides data transfer control logic to allow both the 990 and the AP400 to perform data transfers through the interface.
5) Provides TILINE bus acquisition logic to allow the AP400 to become a TILINE bus master for DMA operations.
6) Provides system reset and bus timeout capabilities.
The following paragraphs will discuss each section of the hardware
in detail. Schematic diagrams for each of the sections are contained
in Appendix B.
The Address Bus
The address bus hardware must perform all buffering between the
AP400 open collector address bus and the TILINE tri-state address bus.
This is done through the use of two sets of bus buffers - SN75136 quad
tri-state bus buffers for the TILINE and MC3438 quad open-collector bus
buffers for the AP400 bus (the UNIBUS). The bus driver and receiver
outputs of the 75136 can be tri-stated; the drivers are enabled high
and the receivers enabled low. This allows both enables to be
controlled by a single line - ADDRESS ENABLE 1 (AEl) - which is
generated by the data transfer control logic.
The MC3438 bus transceivers are designed for bus oriented
structures with 120 ohm terminated lines. The outputs are
open-collector, thus allowing wired-or configurations. The driver
outputs are controlled by the control line ADDRESS ENABLE 2 (AE2) which
is also generated by the data transfer control logic. Termination of
the AP400 lines consists of 180 ohm resistors to +5 volts (pull-ups)
and 390 ohm resistors to ground (pull-downs) which provide a 120 ohm
bus.
Additional logic is required to map the 20-bit TILINE address into
the 18-bit UNIBUS address as well as mapping the TILINE Peripheral
Control Space (TPCS) into the AP400 select address range. The TPCS is
a 512 word address range from FFCOO to FFDFF hexadecimal which is used
to access all TILINE peripheral controllers. The AP400 interface
controller is designed to respond to addresses in the range 1F200 to
1F3FF hexadecimal. Address lines 9, 10, and 11 of the TILINE must
therefore be inverted in order to provide the correct address to the
AP400. Since the AP400 does not respond to or drive the least
significant bit of the address bus, and the TILINE mapping logic does
not map the least significant bit of the processor address, bit 0
of the TILINE address bus is used to drive bit 1 of the AP400 bus.
This results in only the first 17 lines of the TILINE bus being used by
the AP400 bus. The upper 3 lines (17, 18, and 19) of the TILINE are
used to enable inverters on lines 9, 10, and 11 of the TILINE bus
receivers - when the three most significant lines are all high the
inverters are enabled. This allows the TPCS address range to be mapped into the AP400
address range (see figure 7 for address development).
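The mapping can be summarized compactly. The fragment below is written
in C purely as an illustration (the actual logic is TTL on the
interface boards); the constants come directly from the address ranges
given above.

    #include <stdint.h>

    /* Word address (17 bits) presented to the AP400 for a given 20-bit
       TILINE address.  A TPCS access (FFC00-FFDFF hex) yields a result
       in the AP400 select range 1F200-1F3FF hex. */
    uint32_t ap400_address(uint32_t tiline)
    {
        /* The inverters on lines 9, 10, and 11 are enabled only when
           the three most significant TILINE lines (17-19) are all high. */
        if ((tiline & 0xE0000) == 0xE0000)
            tiline ^= 0x00E00;      /* invert address bits 9-11 */

        /* Only TILINE lines 0-16 reach the AP400; in the hardware,
           TILINE bit 0 drives UNIBUS bit 1, since the AP400 ignores
           bit 0 of its own bus. */
        return tiline & 0x1FFFF;
    }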
The Data Bus
The data bus logic utilizes the same type of buffers as used for the
address bus. These buffers are controlled by lines DATA ENABLE 1 (DE1)
and DATA ENABLE 2 (DE2) generated by the data transfer control logic.
[Figure 7 diagrams the address development: TILINE address bits 0-16
drive AP400 (UNIBUS) bits 1-17, and TILINE bits 17-19 drive the
inverter enable.]
t = TILINE address bit   u = UNIBUS address bit
FIGURE 7: AP400 Address Development
DMA Memory Mapping
The DMA (Direct Memory Access) mapping logic allows the AP400
to perform DMA operations in the entire two megabyte address range of
the 990. The AP400 only drives the lower 17 lines of the TILINE
address bus; therefore some method must be provided for driving the
upper 3 lines. This is done through a SN74LS175 quad bistable latch.
This latch will respond to any address in the TPCS, which is selected
through an 8 position DIP switch provided on the interface. Address
decoding for the latch is performed with a DM8130 10-bit comparator and
a 74LS133 13 input NAND gate. The lower 8 bits of the TILINE address
are compared (through the DM8130) to the DIP switch settings, thus
providing the proper address range. The latch, when selected, stores
the three most significant bits of the data bus. By writing the proper
data to the latch the upper three lines of the TILINE address bus may
be driven. The AP400, with the latch properly loaded, may perform DMA
operations on up to 128 kilowords of memory (through the 17 lines
driven by the AP400) without reloading the latch. The software,
however, must assure that the DMA operation does not cross the 128 K
boundary without reloading the mapping latch.
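The software's use of the latch can be sketched as follows. This is an
illustration in C, not driver code; both the name MAP_LATCH (standing
for whatever TPCS address the DIP switches select) and the placement of
the three stored bits in data bits 13-15 are assumptions.

    #include <stdint.h>

    extern volatile uint16_t *MAP_LATCH;  /* DIP-switch-selected TPCS
                                             address (assumed name) */

    void load_map_latch(uint32_t tiline_word_address)
    {
        /* The latch captures the three most significant bits of the
           data bus; they become TILINE address lines 17-19 during
           AP400 DMA transfers. */
        *MAP_LATCH = (uint16_t)(((tiline_word_address >> 17) & 0x7) << 13);
    }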
Data Transfer Control Logic
The data transfer control logic provides all buffering of the
read/write control lines between the TILINE and the AP400, as well as
providing control of the data and address buffers. Three of the TILINE
control signals correspond directly to UNIBUS control signals: TILINE
GO corresponds to UNIBUS MSYN; TILINE READ corresponds to UNIBUS Cl;
and TILINE TERMINATE corresponds to UNIBUS SSYN.
All peripheral device slaves on the TILINE are required to delay
the TLGO signal by the time required to decode the TILINE address as
well as for TILINE skew. The UNIBUS, however, provides the necessary
delay of the MSYN signal for the slave device (this is performed by the
master). For this reason a delay is necessary in the TLGO signal to
allow for address decoding, bus skew, and buffer delay to assure that
valid data is presented to the AP400. The maximum delay has been
calculated to be less than 100 nanoseconds.
The UNIBUS SSYN signal (TILINE TERMINATE) is used in additional
logic to provide the DATA ENABLE signals to the data buffers. The data
and address buffers on the interface must be enabled in different
directions (i.e., the drivers and receivers disabled or enabled for
each set of buffers) depending on the operation being performed. Four
different sequences may occur:
1) A TILINE master performs a READ operation from the AP400 as a slave.
2) A TILINE master performs a WRITE operation to the AP400 as a slave.
3) The AP400, acting as a bus master, performs a READ operation from a TILINE slave.
4) The AP400, acting as a bus master, performs a WRITE operation to a TILINE slave.
For operations in which the AP400 is a slave the address bus must
be receiving from the TILINE and driving the UNIBUS. For operations in
which the AP400 is a master the address bus must be receiving from the
UNIBUS and driving the TILINE. The ADDRESS ENABLE lines are developed
from the SLAVE signal generated by the bus acquisition logic; this
signal is high when the AP400 does not have control of the TILINE
(i.e., is a bus slave). When the AP400 gains control of the TILINE
(becomes a bus master), this signal is low and the address bus buffers
are enabled in the opposite direction. Two 74LS04 inverters are used
to provide extra drive capability for the AE control lines.
The DATA ENABLE control signals are generated from the TLREAD and
TLTM signals and from the MASTER signal generated by the bus acquisition
logic (the MASTER signal is the logical inversion of the SLAVE signal). The
Boolean expression for the data buffer control is:
DE = READ*SLAVE*TLTM + WRITE*MASTER
The WRITE and SLAVE signals are used here to represent the logical
inversions of the TLREAD and MASTER signals. DE is the data enable
signal used to produce DE1 and DE2. When this signal is high the
SN75136 bus buffer drivers are enabled while the receivers are
tri-stated; the MC3438 bus buffer drivers are disabled while the
receivers are enabled - thus the TILINE is being driven. When this
signal is low, the SN75136 bus buffer drivers are disabled while the
MC3438 bus buffer drivers are enabled, thus driving the UNIBUS.
According to the Boolean expression, then, the UNIBUS will be driven
while the AP400 is a slave and a write operation is being performed, or
while the AP400 is a master and a read operation is being performed.
The TILINE will be driven while the AP400 is a bus master and a write
operation is being performed, or while it is a slave and a read
operation is being performed. For the SLAVE and READ operation,
however, the TILINE buffers are not enabled until the AP400 SSYN signal
is asserted; this assures that the TILINE drivers are enabled only when
the AP400 is the device being accessed. A 74LS08 AND gate is used to
enable the TILINE buffers through the use of a monostable multivibrator
described below.
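Restated at the truth-value level, the buffer-direction equation reads
as below. This is a sketch in C for illustration only, with each
argument equal to 1 when the corresponding signal is asserted.

    /* Returns 1 when the TILINE should be driven (DE high) and 0 when
       the UNIBUS should be driven (DE low). */
    int data_enable(int tlread, int master, int tltm)
    {
        int read  = tlread;       /* READ  = TLREAD                */
        int write = !tlread;      /* WRITE = inversion of TLREAD   */
        int slave = !master;      /* SLAVE = inversion of MASTER   */

        /* Drive the TILINE when a master reads from the AP400 (gated
           by the AP's SSYN/TLTM response) or when the AP400 writes as
           a master. */
        return (read && slave && tltm) || (write && master);
    }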
The use of the TLTM signal in generating the DE control signals
necessitates the insertion of a delay into this control line. While
the AP400 is a slave and a read operation is being performed, the TILINE
data bus drivers are not enabled until after the SSYN signal is
asserted by the AP400. The TLTM signal must therefore be delayed in
order to allow the data to stabilize on the TILINE bus before allowing
the master device to latch it.
Another timing problem in the TLTM signal was encountered after
the interface had been implemented. On the TILINE, the slave device
must release the TLTM signal within 120 nanoseconds after receiving
the release of the TLGO signal from the master. On the UNIBUS,
however, there is no timing constraint on the slave device for the
release of the SSYN signal. It was discovered that the AP400 required
more than 120 nanoseconds to release the SSYN signal; an overlap
occurred between the release of TLTM by the AP400 and the assertion of
TLGO by the master for the next I/O transfer. For this reason a
monostable multivibrator was included in the design. This monostable
multivibrator (an SN74LS221) is triggered by the falling edge of the
TLGO signal from the master device (after the assertion of TLTM by the
AP400). The monostable then generates a disable pulse for the TLTM and
the DEI signals for approximately 400 nanoseconds (this pulse width is
determined by the external resistor and capacitor). This allows the
next TILINE operation to begin while the AP400 is still in the process
of releasing the TLTM signal.
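For the 'LS221 the output pulse width is set by the external RC pair,
approximately tw = 0.7 * R * C. The component values below are assumed
for illustration only (the actual values appear on the schematic in
Appendix B); one consistent choice for a roughly 400 nanosecond pulse
would be:

    #include <stdio.h>

    int main(void)
    {
        double r = 5600.0;     /* ohms (assumed value)   */
        double c = 100e-12;    /* farads (assumed value) */

        /* tw = 0.7 * R * C for the SN74LS221: about 392 ns here. */
        printf("tw = %.0f ns\n", 0.7 * r * c * 1e9);
        return 0;
    }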
All control lines on the TILINE are driven by open collector logic
(thus allowing wired-OR configurations). These control signals are
generated through SN75138 quad open-collector bus transceivers designed
for single ended transmission lines. The UNIBUS control lines are
driven by the MC3438 bus transceivers.
The delayed TLGO signal is also required by the mapping logic;
thus the MAP LATCH TLGO signal is generated from it. Similarly, the
mapping logic must provide a TLTM signal when the data has been
latched. This signal is ORed with the AP400 SSYN signal by a 74LS00
NAND gate to provide the TLTM signal to the TILINE.
TILINE Bus Acquisition Logic
This section of the interface performs all actions necessary to
allow the AP400 to become a bus master on the TILINE. It performs
conversion between the UNIBUS bus acquisition signals and TILINE
acquisition signals, as well as providing control signals for other
sections of the logic. For the following discussion refer to the flow
chart in figure 8.
To gain access to the bus, the AP400 first asserts the
NON-PROCESSOR REQUEST (NPR) control line. If SELECTION ACKNOWLEDGE
(SACK) is not asserted, and neither TILINE POWER RESET nor TILINE I/O
RESET is asserted, then NPR is passed to the preset input of the Device
Access Request (DAR) flip-flop. When the DAR flip-flop is set, the
TLAG-IN signal from the next higher priority device is blocked from
passing to the next lower priority device.
TLAG-IN is then continually tested; when it has been high for at
least 100 nanoseconds, and the interface has been in the DAR state for
at least 100 nanoseconds, TILINE ACKNOWLEDGE is tested. When this
signal is unasserted the Device Acknowledge (DAK) flip-flop is set; the
interface is now in the DAK state.
After entering the DAK state the TLAK line is asserted and TLAV is
tested. When TLAV is released by the previous device the Device Access
(DACC) flip-flop is set; the MASTER control line is also asserted. The
NPG flip-flop will generate a NON-PROCESSOR GRANT (NPG) signal to the
AP400 informing it that it is now the bus master.
The AP400, upon receipt of the NPG signal, will assert SACK. The
receipt of SACK by the interface will clear the DAR flip-flop and
enable TLAG-IN to the next lower priority device, and will also clear
[Figure 8 is a flow chart of the DAR, DAK, and DACC states of the
acquisition sequence described in the text.]
FIGURE 8: Bus Acquisition Sequence
the NPG flip-flop. The DAK flip-flop will also be cleared, allowing
the next TILINE device to begin bus arbitration procedures.
Upon receipt of the release of NPG the AP400 will assert BUS BUSY
(BBSY); it may then begin data transfer through the TILINE. The AP400
will release the BBSY signal after performing the last data transfer.
The rising edge of BBSY (after the inverter) will clock the D input to
the DACC flip-flop, thus clearing it (the input to the DACC flip-flop
is low in the DACC state). This will release TLAV and the TILINE to
the next device.
Reset and Timeout Logic
Two reset lines are used in the interface logic: TLPRES and
TLIORES. The assertion of either of these lines during the bus
arbitration process will reset the DAR, DAK, and NPG flip-flops, thus
ending the sequence. It will also assert the INIT line to the AP400,
causing a hardware reset. A reset of the AP400 will cause the release
of BBSY, thus indirectly resetting the DACC flip-flop. An assertion of
the TLPRES line will reset the DACC flip-flop, thus assuring that the
interface is in the SLAVE state on power-up. The DACC flip-flop may
also be reset by a TILINE timeout or TLWAIT signal.
The two signals TLTM and TLWAIT are used in the bus timeout logic.
These signals are passed through the inverters to open collector
drivers, and then through a delay circuit. While the interface is in
the SLAVE state the TLTM input to the delay is disabled, thus
maintaining the output of the delay high (no reset occurs). When the
interface enters the MASTER state the TLTM signal is enabled; a TILINE
timeout will begin as long as TLTM is not asserted. If a data transfer
occurs and the TLTM line is asserted (the slave device responds) then
the timeout will restart. If no device responds in the specified time,
or the AP400 fails to begin data transfers in the specified time, the
timeout will occur and the DACC flip-flop will be cleared. This
timeout has been set at five microseconds.
The TLWAIT signal, if asserted at any time during data transfer,
will cause all TILINE devices to wait. To prevent a TILINE timeout
from occurring during this wait period the timeout circuitry will be
disabled while the TLWAIT signal is asserted.
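The timeout rules reduce to a simple predicate. The sketch below is in
C for illustration only, with all names assumed.

    /* ns_idle is the time since TLTM was last asserted (or since the
       MASTER state was entered). */
    int tiline_timeout(int master, int tltm, int tlwait, double ns_idle)
    {
        if (!master) return 0;     /* armed only in the MASTER state   */
        if (tlwait)  return 0;     /* TLWAIT holds the timeout off     */
        if (tltm)    return 0;     /* a responding slave restarts it   */
        return ns_idle > 5000.0;   /* 5 microseconds: clear DACC       */
    }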
AP400 Bus Terminations
The UNIBUS, as described earlier, is a doubly terminated 120 ohm
data bus. The terminations are provided on both ends by 180 ohm
pull-up resistors and 390 ohm pull-down resistors. For the AP400 to
perform correctly, these terminations must be provided at both ends of
the AP connector cable. The terminations at the 990 end are provided
on the interface boards, as described in a previous paragraph. At the
AP end it was necessary to construct a printed circuit board, with the
proper terminations, which is plugged into the UNIBUS connector.
CHAPTER VII
THE SOFTWARE INTERFACE
The AP400 software package consists of several separate modules.
Some are resident in the host computer (the 990/12), while others are
resident in the AP400 itself. The AP400 resident software consists of
the AP Executive, which handles all communication with the host and
controls the AP execution, and the AP Service Subroutines, which
perform the specific functions of the AP (they are called by the
Executive). The host resident software consists of the AP Manager and
Driver, which control all access to and control of the AP400. The
AP400 software modules are described in detail in reference [2]. The
following discussion will be limited to an overview of the host
resident software as well as differences between the 990/12
implementation and the PDP-11 implementation described in the manuals
provided by Analogic.
The host computer communicates with the AP400 through the host
system bus. This communication may be performed either through
programmed I/O or through DMA operations, as described previously. The
host resident software consists of two separate modules: the AP
Manager and the AP Driver. The Manager handles all communication
between the user application program, written in either FORTRAN or host
assembly language, and the AP Driver. The Driver handles all of the
actual communication between the AP400 and the host. It passes data
and parameters to and from the AP400 and the AP Manager routines.
The AP Driver module maintains an AP400 status table containing
information on the current status of the AP, such as whether it is
running or halted, executing a Function Control Block, etc. As calls
are made to the AP by the AP Manager through the Driver, this status
table is updated.
The AP400 resident software (consisting of the AP Executive and
the Service subroutines) maintains a Configuration Table (CFT) in AP
data memory. This configuration table contains the following
information:
Word 1:  Flag to indicate the Executive has been loaded.
Word 2:  Contains the current AP Executive version.
Word 3:  Last physical address in program memory.
Word 4:  Last physical address in data memory.
Word 5:  First free location in program memory.
Word 6:  First free location in data memory.
Word 7:  Last free location in program memory.
Word 8:  Last free location in data memory.
Word 9:  Pipeline Arithmetic Command PROM set code.
Word 10: Limit address on function table.
Word 11: First address of function table.
Word 12: Last address of function table.
Word 13: Host DMA limit count.
Word 14: Register for 24-bit data transfers (AP to Host).
Word 15: Register for 24-bit data transfers (Host to AP).
Word 16: Unused by the 990/12.
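For reference, the same layout rendered as a C structure; the field
names are invented for readability (the AP400 software defines the
table only by word position), and 16-bit words are assumed.

    #include <stdint.h>

    struct ap400_cft {
        uint16_t exec_loaded;     /* word 1:  Executive-loaded flag          */
        uint16_t exec_version;    /* word 2:  AP Executive version           */
        uint16_t pmem_last;       /* word 3:  last physical program address  */
        uint16_t dmem_last;       /* word 4:  last physical data address     */
        uint16_t pmem_free_first; /* word 5:  first free program location    */
        uint16_t dmem_free_first; /* word 6:  first free data location       */
        uint16_t pmem_free_last;  /* word 7:  last free program location     */
        uint16_t dmem_free_last;  /* word 8:  last free data location        */
        uint16_t prom_set_code;   /* word 9:  pipeline command PROM set code */
        uint16_t ftab_limit;      /* word 10: limit address, function table  */
        uint16_t ftab_first;      /* word 11: first address, function table  */
        uint16_t ftab_last;       /* word 12: last address, function table   */
        uint16_t dma_limit;       /* word 13: host DMA limit count           */
        uint16_t xfer24_to_host;  /* word 14: 24-bit transfers, AP to host   */
        uint16_t xfer24_to_ap;    /* word 15: 24-bit transfers, host to AP   */
        uint16_t unused;          /* word 16: unused by the 990/12           */
    };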
It is the responsibility of the host resident AP driver to load
the configuration table and update the status. The driver uses such
routines as GETCFT and PUTCFT to maintain updated copies of the CFT in
both the host and the AP400. The data contained in the CFT is used by
both the host and the AP for proper AP execution. The following
paragraphs describe the 990/12 implementation of the Driver and Manager.
The AP400 Host Resident Manager
The Manager consists of several separate assembly language
routines which may be called by either FORTRAN programs or other
assembly language programs. Each of the routines (referred to as a K
function) performs a specific function, such as resetting or loading the
AP, by making appropriate calls to the AP Driver routines. Functions
such as Fast Fourier Transforms (FFT's), convolutions, etc., require that
a Function Control Block (FCB) be set up. This FCB is subsequently
loaded into the AP and executed. Each of these FCB's has a
corresponding AP function (referred to as a Q function) which must be
resident in the AP prior to execution. This is done through the AP
loader routine KLOAD. Each FCB sets up the required data for execution
of the corresponding AP service subroutine. An example of a typical
FCB is contained in Appendix C. Detailed descriptions of the AP
Manager K functions are contained in the AP400 software reference
manuals.
The host resident K functions are contained in a host library.
Calls are made to the Manager through a standard FORTRAN calling
sequence:
CALL subname (argl,arg2,...,argN)
Calls from assembly language are made using a BLWP (Branch and Load
Workspace Pointer) instruction. The arguments are passed as data words
following the BLWP instruction:
       BLWP @subname
       DATA n
       DATA argument 1
       DATA argument 2
        ...
       DATA argument n
The first word following the BLWP instruction must be the number of
parameters being passed. The arguments being passed are the addresses
of the parameters to be transferred. For example, if the routine TEST
is to be called and the two parameters 5 and 7 are to be passed to the
routine, the following code is necessary:
PARM1  DATA 5          First parameter value
PARM2  DATA 7          Second parameter value
*
ARG1   DATA PARM1      Address of the first parm
ARG2   DATA PARM2      Address of the sec. parm
*
       BLWP @TEST
       DATA 2
       DATA ARG1
       DATA ARG2
The purpose for passing the address of the parameter, rather than the
parameter value itself, is to maintain compatibility with the FORTRAN
calling sequence. For more information on FORTRAN callable subroutines
see the FORTRAN reference manual [23]. An example of a FORTRAN and
assembly language callable K function is contained in Appendix C.
The Manager routines are linked to the FORTRAN or Assembly
language program through the use of a LIBRARY statement in the DX10
Linker (see the DX10 Link Editor Reference Manual [24]). For assembly
language programs, a REF (external Reference) assembler directive
statement must be included for each subroutine called.
Errors encountered by the AP Manager during execution are reported
to the terminal associated with the calling program. These errors are
displayed to the Foreground Terminal Local file through a call to an
appropriate system routine [21]. Upon completion of the program these
messages are displayed to the terminal. The error messages returned by
the manager are described in the AP400 software reference manual [2].
The following errors differ from those described:
Error -85: Used only for an End of File read error.
Error -86: Will return a specific file I/O error (error -86 will not actually be returned - an SVC error will replace it with an indication that it is an SVC error); this will be the Supervisor Call (SVC) error returned by DX10 through the SVC block used for file I/O. These errors are described in the DX10 Error Reporting and Recovery Manual [22].
Error -87: Not used.
In all other respects the AP Manager should function as described
in the AP400 Software Reference Manual [2].
The AP400 Host Resident Driver
The AP Driver consists of three separate modules: the Baseline
driver (BASE), the Program Load Driver (PLDRV), and the Relocator
(REL). The Baseline driver consists of the minimal number of
subroutines necessary for use of the AP400 by an application program.
The Program Load driver and the Relocator are the routines used to
perform 'one-time' services such as AP hardware and software reset and
AP400 program loading (the Executive and service subroutines). In this
manner, the AP400 may be initialized and loaded only once (on power
up); each application program will then need to be linked only with the
Baseline driver. The Program Load driver and the Relocator are called
by only a few of the Manager routines.
The 990/12 implementation of the Baseline driver does not support
host interrupts by the AP400. This capability may be included later,
but will probably require a modification of the AP400 interface
controller. Each application task should use polling of the AP through
the KSTAT, KSETIW, KWTFCB, or KWAIT Manager routines - these routines
allow the user program to perform other actions while the AP is
executing, then resynchronize with the AP at the appropriate point in
the program. Alternately the application program may call available
system routines to suspend itself while waiting for the AP to finish
execution. These routines (specifically the Suspend Task Supervisor
call) are explained in reference [19]. The following paragraphs
describe each of the three modules contained in the Driver.
The Baseline Driver. The Baseline driver, as implemented on the
990/12, does not allow the user to change the default AP400 Command and
Data register addresses. If the AP400 register select addresses are
changed, the Baseline driver will need to be re-assembled with the new
addresses (this consists of changing the two DATA statements at the
beginning of the driver). The driver was implemented in this fashion
due to the hardware changes necessary to install a TILINE device on the
bus, as well as changing the register addresses on the AP400 interface
board. Only a system operator should be allowed to make these changes
when necessary.
Direct Memory Access (DMA) operations by the AP400 require that
the AP be loaded with the proper DMA memory address. Usually the DMA
operation will be done to a data buffer area set up in the calling
program. The AP400, upon initialization, is loaded with the base
address of the calling task. To perform a DMA operation it is then
passed an offset (from the base address) to the data buffer area. A
DMA operation therefore requires the determination of the base address
of the calling task, as well as assuring that the task is not
'rolled-out' of memory during DMA operations.
In a time-sharing system, several tasks may be executing
simultaneously by having the CPU share its time among all of the tasks.
Since there may be, at times, more tasks executing than memory space
will allow, some method must be provided for storing tasks on a mass
storage device (i.e., a disk drive) while they are not currently being
executed. This allows several tasks to utilize the same memory space
concurrently. When a task has been temporarily stored on disk, it is
referred to as being 'rolled-out'. The operating system typically
provides a special file for this purpose called a system roll file. If
DMA operations are being performed to a certain task memory area,
however, rolling the task out (due to time sharing) would be disastrous
- the DMA operation would be performed to a completely different task,
thus overwriting it. Some means must be provided for securing the task
in system memory while the DMA operations are being performed (referred
to as a memory resident task).
Under DX10 each currently executing task has an associated Task
Status Block. This TSB remains in memory (as part of the operating
system) while the task is active. The system uses the TSB to store
information about the current status of the task - whether it is
privileged, memory resident, suspended or active, the current contents
of the task map file registers, etc. [21]. A task may therefore be made memory
resident, privileged, etc. by modifying this Task Status Block. At the
same time, the base address of the task may be determined from the
contents of the map file registers associated with the task.
In order to assure that AP400 DMA operations are performed in an
orderly fashion, one of the tasks performed by the Baseline driver is to
make the calling task memory resident. At the same time it must read
the task map file registers to determine the absolute base address of
the task in system memory. This is done by calling the Device Service
Routine (DSR) which is installed as a part of the DX10 operating
system.
If the base address of the calling task is located above 1FFFF
hexadecimal the map latch located on the interface board must also be
loaded in order to drive the upper three lines of the address bus (the
AP400 will only drive the lower 17 lines). Before any DMA operations
are performed a check must be made to assure that the DMA absolute
memory address will not cross the 1FFFF boundary. If this occurs, the
map latch must first be reloaded to prevent 'wrap-around' to an address
outside of the task memory area by the DMA operation. This address
checking is done in each of the K functions which perform DMA
operations.
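The check itself is a one-line comparison. The C fragment below is a
sketch only (the driver performs the equivalent test in assembly
language), with all names assumed.

    #include <stdint.h>

    /* Nonzero if a transfer of nwords starting at offset words from
       the task base would cross a 128K-word (1FFFF) boundary, in which
       case the map latch must be reloaded before the DMA operation. */
    int dma_needs_latch_reload(uint32_t task_base, uint32_t offset,
                               uint32_t nwords)
    {
        uint32_t start = task_base + offset;   /* absolute TILINE address */
        uint32_t end   = start + nwords - 1;
        return (start >> 17) != (end >> 17);   /* different 128K pages?   */
    }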
Upon completion of use of the AP400 by the calling task (via a
call to the manager routine KDETCH) the DSR is again called to make the
task unprivileged and non-memory resident.
All input/output operations to the AP400 are done through the use
of the Long Distance instructions (LDD and LDS) [15]. These
instructions load the user map file (map file 2) with a six word block
of memory specified in the instruction. The instruction immediately
following the LD instruction will then use the map file to determine
either the source (LDS) or the destination (LDD). The proper values
required to map the TILINE Peripheral Control Space are contained in
the baseline driver. The use of map files has been described in
chapter V.
A summary of the subroutines included in the Baseline driver is
listed in Table 4. For a more detailed discussion see the AP400
Software reference manual [2].
TABLE 4: BASELINE DRIVER ROUTINES
TELLAP - Will send a command to the AP without overwriting the current message in the command register.
WTAP - Will wait for a specified Function Control Block (FCB) or all FCB's to complete.
WTAPPM - Will wait for the AP to process the last message sent to it; this is done by continually checking the Host to AP interrupt pending status bit. Error code -75 is returned if the AP takes too long to complete.
MSTRUN - Will check to see if the AP is currently running. If not, error code -72 is returned.
MNRUN - Will check to see if the AP is halted. If not, error code -73 is returned.
SETVEC - On the first call to this routine the task is attached to the AP400 by a call to the DSR. On all other calls the AP400 register addresses are loaded into the task workspace for use by Long Distance instructions.
NORMEX - This routine is called for a normal exit (no error) from the Baseline driver.
ERREX - This routine is called for an error exit from the Baseline driver. Register 12 is made nonzero to indicate the error condition.
START - This routine calls the DSR to attach the AP400 to the calling task.
STRTAP - This routine will start AP execution at a specified address.
EXCFCB - This routine is called to execute a Function Control Block.
TERFCB - This routine is called to terminate the execution of a Function Control Block.
SNDMSG - Sends a general message to the AP400.
UPDSTA - Will determine which Function Control Block the AP is currently executing.
REPSTS - Will report the current AP execution status.
TABLE 4 (Continued)
WTFFCB - Allows the caller to suspend operation until a specific Function Control Block in the chain has been executed.
REINIT - Re-initializes the AP400.
DETACH - Will make a call to the DSR to detach the AP from the calling task.
The Program Load Driver. The host resident Program Load Driver
consists of the subroutines which are generally used only for initial
loading of the AP400 (such as after power up). It consists of the
modules described in Table 5.
The Relocator. The host resident Relocator processes
Object/Load modules produced by the AP Linker and produces
Absolute/Load modules for loading by the absolute loader ABLOAD
contained in the Program Load driver. The Relocator handles the
relocatable code in the Object file and produces absolute memory
addresses for actual loading into the AP400.
The Device Service Routine
The Device Service Routine (DSR) is a series of assembly language
routines which are generated as a part of the DX10 operating system.
These routines perform the following functions:
1) Service any interrupts generated by the AP400.
2) Perform 'housekeeping' functions upon power up of the CPU.
3) Handle 'Abort I/O' calls made from the operating system.
4) Execute a set of Operation Codes which are passed to the DSR by the Baseline driver via a standard Supervisor call.
TABLE 5: PROGRAM LOAD DRIVER ROUTINES
FLSHP - This routine will "flush" the AP400 pipeline.
CFTSET - This is the setup routine for Configuration table read/write routines.
ABLOAD - This routine performs an absolute loading of AP program and data memory (after the relocatable addresses have been determined).
RESAP - Performs a hardware and software reset of the AP400.
STOPAP - Will halt AP execution.
RESTRT - Restarts the AP at the address at which it was stopped by the STOPAP routine.
GETCFT - Reads the AP Configuration table from the AP.
PUTCFT - Stores the Configuration table in the AP.
The Interrupt Service Routine. This routine performs all
functions necessary to handle an AP400 interrupt of the host system.
At the present time, the AP400 interrupt system is not enabled, so this
routine will simply return to the calling task on entry.
The Power-Up Routine. The power up routine is called by the
system after the initial operating system load sequence. It will clear
the AP BUSY flag, indicating that the AP may be used, and return.
The Abort I/O Routine. The Abort I/O routine is called by the
operating system any time an Abort I/O supervisor call is issued by a
task. It will terminate the current AP operation to assure that it is
detached from any tasks.
Operation Codes. The Operation Codes define specific operations
to be performed on the AP400 by the Device Service Routine. These Op
Codes are specified in the Supervisor Call Block used to call the DSR.
The OP Codes for the AP400 DSR are:
00 - OPEN: Will attach the calling task to the AP400 to prevent other tasks from accessing it.
01 - CLOSE: Will detach the AP from the calling task.
02 - MEMRES: Will make the calling task memory resident for DMA operations.
03 - UNMEM: Will make the calling task non-memory resident.
04 - WCMD: Will write a command passed in the SVC block to the AP400 command register.
05 - RCMD: Will read the AP400 command register and return the value in the SVC block.
06 - WDAT: Will write the data passed in the SVC block to the AP400 data register.
07 - RDAT: Will read the AP400 data register and return the value in the SVC block.
08 - ABORT: Detaches the AP400 from a task which may have terminated abnormally without releasing the AP400.
The SVC block structure is shown in figure 9. The byte structure
is:
Byte 0:      Must be zero.
Byte 1:      Error codes are returned here.
Byte 2:      Contains the DSR Op Code.
Byte 3:      Contains the LUNO assigned to the AP400.
Byte 4:      System Flags - not used.
Byte 5:      User Flags - not used.
Bytes 6-7:   For WCMD and WDAT the command or data is passed in this word.
Bytes 8-9:   Not used - must be set to zero.
Bytes 10-11: For RCMD and RDAT the data is returned here.
A detailed description of DX10 Device Service Routine structure is
contained in reference [21].
Byte
 00   SVC OP CODE (00)    |  STATUS CODE
 02   OP CODE             |  LUNO
 04   SYSTEM FLAGS        |  USER FLAGS
 06   WCMD and WDAT BUFFER
 08   NOT USED
 10   RCMD and RDAT BUFFER
 12   NOT USED
FIGURE 9: Supervisor Call Block Structure
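The block can also be pictured as a C structure; the field names are
invented for readability, and the packing of bytes into 16-bit words is
assumed.

    #include <stdint.h>

    struct ap_svc_block {
        uint8_t  svc_opcode;  /* byte 0:      must be zero                  */
        uint8_t  status;      /* byte 1:      error codes returned here     */
        uint8_t  dsr_opcode;  /* byte 2:      DSR Op Code (00-08)           */
        uint8_t  luno;        /* byte 3:      LUNO assigned to the AP400    */
        uint8_t  sys_flags;   /* byte 4:      system flags - not used       */
        uint8_t  user_flags;  /* byte 5:      user flags - not used         */
        uint16_t write_buf;   /* bytes 6-7:   WCMD/WDAT command or data     */
        uint16_t unused;      /* bytes 8-9:   not used - must be zero       */
        uint16_t read_buf;    /* bytes 10-11: RCMD/RDAT data returned here  */
    };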
AP400/990 Operation
The steps required for operation of the AP400 with the 990/12
system are outlined below:
1) The AP400 must be loaded with an operating system. This
includes the AP Executive, several housekeeping routines, and the
necessary service subroutines. This operating system is developed
through the use of the AP Linker - a link control file must be created
which includes the AP Executive and associated subroutines. These
routines are all contained in a host resident library or libraries; the
linker will combine the specified modules into a single module. This
module is then loaded into the AP400 by calling the KLOAD subroutine
(as described in the AP400 Software Reference manual), or alternately
by using the APLOAD SCI command.
2) The FORTRAN or assembly language source program, including the
appropriate subroutine calls, must be written, compiled, and linked.
This is done through standard program development commands under DX10.
During the linking phase the AP Driver and Manager host libraries must
be included in the link control file. This is described in the AP400
Software Reference Manuals.
3) The program may then be executed. The AP400 may only be
accessed by one program at a time; while the AP Driver assures that
only one task is attached to the AP at a time, the possibility exists
that a task may be terminated without being detached from the AP
(through an unrecoverable error). If this occurs, the user should
detach the AP by using the APKILL SCI command. This will assure that
the next user may have access to the AP.
The AP Linker mentioned above and an AP Assembler are both
implemented on the 990/12 in FORTRAN. The operation of these routines
will be described in detail in an AP400 user's manual which will be
developed along with the AP400/990 interface. Details of their
operation will not be included here due to the possibility of their
being modified as the operating system is updated.
The AP Driver routines, when fully implemented, should be
completely compatible with the DX10 operating system now being used on
the 990. Notes will be made in the appropriate reference manuals of
the differences in implementation between the 990 and the PDP-11
system. The completed software package for the AP400 should include
all necessary host and AP libraries, as well as some pre-tested
application programs.
CHAPTER VIII
CONCLUSION
The 990/12 has proven to be a successful host system for the AP400
array processor; the Baseline driver and Manager routines implemented
in host assembly language perform as required. The AP400/990
implementation should prove to be a valuable computing tool for
research into such areas as real-time signal processing. With the
addition of analog to digital and digital to analog converters for the
auxiliary I/O ports on the AP400, the system could be used as an almost
independent unit - the 990 would be used for initial program loading
and possible storage of intermediate or final results, while the AP400
would function as a separate processor.
More extensive testing of the AP400/990 system will be necessary
to determine the best method of using the system. Several options
exist for the software driver implementations. At the present time,
this driver is implemented as a linked-in part of the user application
program. This method allows faster execution, but reduces the amount
of memory space available to the user task. The inclusion of the WCMD,
RCMD, WDAT, and RDAT Operation Codes in the Device Service Routine
would allow the modification of the Baseline driver to restrict all
actual AP400 I/O operations to the DSR. This would be useful in
applications where the user task cannot be made privileged (the Long
Distance instructions used in the present implementation of the driver
require that the task be installed as a privileged task).
Another method of implementing the Driver would be to include the
Baseline Driver in the Device Service Routine; in this manner the
subroutines required by the application program would be kept to a
minimum. The drawback to this type of implementation would be the
increase in processing time required by the system overhead routines.
Options also exist in the method by which the AP400 DMA operations
are performed. An alternative to the present method would be to
include a specific buffer area in the Device Service Routine from which
all DMA transfers are done. The data buffer in the application program
would then have to be transferred to this buffer before each DMA
operation. The advantage of this method is that the application
program would not have to be made memory resident - the DMA operation
would always be performed from the same base address, which is
determined when the system is generated (the operating system
containing the DSR and the data buffer is memory resident). In
addition, the map latch located on the interface board would not have
to be loaded because the memory resident portion of the operating
system is always loaded into the lower portion of system memory.
Experimentation with the AP400 driver routines for various
applications should determine the best method for each situation.
Several different drivers may be included in the system software, with
the particular driver used being determined by the application.
Problems still exist in the present hardware implementation. Some
of these are related to the PDP-11 interface controller included in the
AP400, while others are due to differences between the 990 TILINE and
the PDP-11 UNIBUS structures.
As discussed earlier, interrupt capabilities have not been
included in the present design. The major reason for this is the
difference between the UNIBUS and TILINE methods of interrupt handling.
For the UNIBUS, the interrupting device must first gain access to the
bus; an interrupt vector is then placed on the data bus to allow the
processor to execute the proper interrupt service routine. On the
TILINE, however, interrupts are generated through a single interrupt
line, with the interrupt level determined by the slot in which the
device is installed. While hardware could have been included on the
interface to handle this situation, it was felt that this would make
the hardware more complex than necessary. One method of overcoming
this problem would be to modify the microsequencer control code located
in ROM on the AP400 interface controller to be compatible with the
TILINE interrupt structure. The only hardware changes necessary would
be the inclusion of a single signal line from the UNIBUS INTR line to
the TILINE interrupt line. The Device Service Routine could then be
modified to handle the interrupts generated by the AP400. For many
applications, however, the AP400 may be used without the need for
interrupt capability (through the use of the AP400 polling routines).
Another problem in the current implementation is the lack of
capability for handling memory errors which may be encountered during
AP400 DMA operations. While both the TILINE and the UNIBUS have the
capability of generating memory parity error signals, the PDP-11
interface controller in the AP400 does not utilize the UNIBUS memory
parity signal lines. This should cause a minimum of problems, however,
because any memory parity errors encountered on the 990 will probably
be detected and logged by the CPU before the AP400 experiences any
problems related to this. In addition, the error correcting capability
of the memory controller on the 990 should keep this problem to a
minimum.
One other problem exists in the TILINE timeout circuitry. While
the timeout logic will prevent the AP400 from locking up the TILINE in
the event of an access to non-existent memory, no method is available
for informing either the AP400 or the 990 that this timeout has
occurred. One possible solution would be to use the timeout reset
signal to generate the TLTM signal for the AP400. This would allow
the AP400 to continue execution, but the application program would
never be aware that the timeout had occurred and that the data is
invalid. Another method would be to reset the AP400 in the event of a
TILINE timeout - this may not be acceptable in many situations. In the
present implementation, a timeout will simply release the TILINE to the
next bus master; the AP400 will wait indefinitely for the device to
respond. This will inform the user that a problem has occurred, even
though it will require a hardware reset of the AP400 to acknowledge it.
The most promising method for solving this problem would be to use the
timeout error signal to generate an interrupt to the 990. The
interrupt handling routine could then be used to inform the calling
task of the error. If the interrupt capability of the AP400 is
implemented at a later date, the interrupt service routine would need
the capability of distinguishing a valid interrupt from an error
interrupt. This could be done through the use of an interrupt status
flag, indicating whether the AP400 interrupts have been enabled - if
not, then the interrupt was caused by an error condition.
Use of the AP400/990 system should suggest the solutions to these
and other problems. The development of complete software libraries and
a simplified user interface should not prove difficult.
LIST OF REFERENCES
[1] Walter J. Karplus and Danny Cohen, "Architectural and Software Issues in the Design and Application of Peripheral Array Processors", Computer, volume 14 no. 9, p. 11, September 1981.
[2] AP400 Software Reference Manual, volume II, Analogic Corporation, Wakefield, Massachusetts, 1980.
[3] Multiprocessors and Parallel Processing, Comtre Corporation, John Wiley and Sons, New York, 1974.
[4] Peter M. Kogge, The Architecture of Pipelined Computers, McGraw-Hill, New York, 1981.
[5] William A. Wulf, Roy Levin, and Samuel P. Harbison, HYDRA/C.mmp: An Experimental Computer System, McGraw-Hill, New York, 1981.
[6] "Highly Parallel Computing", Computer, volume 15 no. 1, January 1982.
[7] "Data Flow Systems", Computer, volume 15 no. 2, February 1982.
[8] Robert A. Caspe, "Array Processors", Mini-Micro Systems, p. 54, July 1978.
[9] Robert Bernhard, "Giants in Small Packages", IEEE Spectrum, volume 19 no. 2, p. 39, February 1982.
[10] An Introduction to the AP400 Array Processor, Analogic Corporation, Wakefield, Massachusetts, 1979.
[11] PDP-11 Peripherals Handbook, Digital Equipment Corporation, Maynard, Massachusetts, 1975.
[12] Model 990/12 Computer Hardware Users Guide, Texas Instruments, August 1979.
[13] Model 990/12 Central Processor Unit Depot Maintenance Manual, Texas Instruments, July 1980.
[14] Models 990/10 and 990/12 Computers Memories Depot Maintenance Manual, Texas Instruments, December 1980.
[15] Model 990/12 Assembly Language Reference Manual, Texas Instruments, January 1981.
[16] TILINE Three-State Asynchronous Data Bus Specification, Texas Instruments, August 1976.
[17] Model 990 Computer DX10 Operating System Concepts and Facilities Manual, Texas Instruments, April 1981.
[18] Model 990 Computer DX10 Operating System Production Operation Manual, Texas Instruments, April 1981.
[19] Model 990 Computer DX10 Operating System Application Programming Guide, Texas Instruments, April 1981.
[20] Model 990 Computer DX10 Operating System Developmental Operation Manual, Texas Instruments, April 1981.
[21] Model 990 Computer DX10 Operating System Programming Guide, Texas Instruments, April 1981.
[22] Model 990 Computer DX10 Operating System Error Reporting and Recovery Manual, Texas Instruments, April 1981.
[23] Model 990 Computer FORTRAN Programmer's Reference Manual, Texas Instruments, April 1981.
[24] Model 990 Computer Link Editor Reference Manual, Texas Instruments, December 1979.
APPENDIX
A. TIMING DIAGRAMS
B. SCHEMATIC DIAGRAMS
C. SOFTWARE LISTINGS
[Appendix A timing diagrams (Figures A-1 through A-6) and Appendix B
schematic diagrams (Figures B-1 through B-6) appear here; the drawings
are not recoverable from the scanned text.]
FIGURE B-6: Timeout and Reset Logic
APPENDIX C
SOFTWARE LISTINGS
FCBBLK EQU  $        FCB Entry point
FCBID  DATA FFTC1    AP function ID
FCBCTL DATA 0        Control Word
FCBDON DATA 1        FCB done flag
FCBLNK DATA 0        Link to next FCB
FCBPLT DATA 0        FCB Parameter list type
FCBNRG DATA 2        Number of arguments
FCBLEN DATA 3        Length of argument list
FCBARL DATA 6        Argument list
       DATA 0           "
       DATA 0           "
       DATA 0           "
       DATA 0           "
       DATA 0           "
       DATA 0           "
LISTING C-1: A Typical Function Control Block
* KRESET: Host K-function; performs hardware and software
*         reset of the AP400.  FORTRAN calling sequence:
*
*             CALL KRESET (IRESM)
*
*         where IRESM is the amount of memory to reserve
*         (if less than 32, Kbytes is assumed).
*
* RSX-11 source by John Hawkins, 9/12/80
* Optimized for the 990/12 by Marvin Spinhirne, 4/82
*
       IDT  'KRESET'
       DEF  KRESET,KRSET        Entry points
       REF  F$RGMY,MGRFER,RESAP
*
KRESET EQU  $                   Entry point
KRSET  DATA WS,ORIG             Entry data
WS     BSS  32                  Workspace area
       DATA NAME,ORIG,1         FORTRAN link data
IRESM  DATA 0                   Parameter buffer
ORIG   BL   @F$RGMY             Link to FORTRAN
       DATA 1                   Number of parameters
       DATA IRESM               Pointer to parameter
NAME   TEXT 'KRESET'            Subroutine name
       MOV  @IRESM,R10          Get the parameter address
       MOV  *R10,R2             Get parameter value
       BL   @RESAP              Call the driver
       MOV  R12,R12             Error?
       JEQ  END                 No
       BL   @MGRFER             Yes, call error rtn
END    RTWP                     Return to caller
       END
LISTING C-2: A FORTRAN Callable K-Function
* AP400 Device Service Routine
*
* This is the operating system portion of the AP400 array
* processor driver software.  It performs I/O operations to
* the AP400, handles interrupts and power up, and assures
* that only one task at a time may access the AP400.
*
* Author: Marvin Spinhirne
* Date:   6/82
*
       IDT  'APDSR'
*
* External Definitions
*
       DEF  APDSR               DSR Entry point
       DEF  APINT               Interrupt handler entry
*
* External References
*
       REF  SETWPS,BRCALL,ENDRCD
*
* Equates
*
APHLT  EQU  >80                 Halt AP command
INTDSA EQU  >98                 Disable interrupts
APRST  EQU  >F0                 Reset AP
*
APDSR  DATA APPWR               Power up routine
       DATA ABORT               Abort I/O routine
       LIMI >F                  Do not mask interrupts
       BL   @SETWPS             Restore return vector
       BL   @BRCALL             Call OPCODE decoder
       DATA 8                   Maximum no. of opcodes
       DATA ERROR               Illegal opcode handler
       DATA OPEN                00 - Attach to task
       DATA CLOSE               01 - Detach from task
       DATA MEMRES              02 - Setup for DMA
       DATA UNMEM               03 - Release memres
       DATA WCMD                04 - Write to command register
       DATA RCMD                05 - Read from command register
       DATA WDAT                06 - Write to data register
       DATA RDAT                07 - Read from data register
       DATA ABORT               08 - Abort AP operations
*
ERR0B  BYTE >0B                 AP in use error
ERR02  BYTE >02                 Illegal op code error
*
* OPEN: Attach the AP to the calling task.  The AP BUSY
*       flag will be set, and the runtime ID of the calling
*       task will be stored.  If the AP is in use, error
*       >0B will be returned.
*
OPEN   MOV  *R4,*R4             Is the AP in use?
       JEQ  OPEN2               No
       B    @SVCERR             Yes
OPEN2  INC  *R4                 Set the AP BUSY flag
       MOV  @-8(R1),R5          Get the address of the TSB
       MOVB @15(R5),@2(R4)      Save the task runtime ID
       BL   @ENDRCD             Tell the system we're through
       RTWP
*
* CLOSE: Detach the AP from the calling task.  The AP BUSY
*        flag and the runtime ID will be cleared.  The task
*        will also be made non memory resident.
*
CLOSE  BL   @CHECK              Is it the proper task?
*
CLOSE2 CLR  *R4                 Clear the AP BUSY flag
       CLR  @2(R4)              Clear the runtime ID
       SZCB @MEMR,@10(R5)       Make it non memory resident
       BL   @ENDRCD             Tell the system we're through
       RTWP
*
* MEMRES: Make the calling task memory resident and return
*         bias register 1 of the map file in word 6 of the
*         SVC block.
*
MEMRES BL   @CHECK              Is it the proper task?
*
       SOCB @MEMR,@10(R5)       Make it memory resident
       MOV  @52(R5),@8(R1)      Return bias 1
       BL   @ENDRCD
       RTWP
*
MEMR   BYTE >20                 Data for making task memres
*
* UNMEM: Will make the task non memory resident without
*        detaching it from the calling task.
*
UNMEM  BL   @CHECK              Is it the proper task?
       SZCB @MEMR,@10(R5)       Make it non memory resident
       BL   @ENDRCD
       RTWP
*
* WCMD: Write the command stored in word 4 of the SVC to
*       the AP400 command register.
*
WCMD   BL   @CHECK              Is it the proper task?
       MOV  @4(R1),*R12         Write the command
       BL   @ENDRCD
       RTWP
*
* RCMD: Read the AP400 command register and return the
*       value in word 6 of the SVC.
*
RCMD   BL   @CHECK              Is it the proper task?
       MOV  *R12,@8(R1)         Read the command
       BL   @ENDRCD
       RTWP
*
* WDAT: Write the command stored in word 4 of the SVC to
*       the AP400 data register.
*
WDAT   BL   @CHECK              Is it the proper task?
*
       MOV  R12,R10             Calculate data reg address
       INCT R10
       MOV  @4(R1),*R10         Write the data
       BL   @ENDRCD
       RTWP
*
* RDAT: Read the AP400 data register and store the value
*       in word 6 of the SVC.
*
RDAT   BL   @CHECK              Is it the proper task?
*
       MOV  R12,R10             Calculate the data reg address
       INCT R10
       MOV  *R10,@8(R1)         Read the data
       BL   @ENDRCD
       RTWP
*
* SVCERR: Return SVC error >0B.
*
SVCERR MOVB @ERR0B,@-1(R1)      Load the error code
       BL   @ENDRCD
       RTWP
*
* CHECK: See if the calling task is the same as the task
*        attached to the AP.
*
CHECK  MOV  @-8(R1),R5          Get the TSB address
       CB   @15(R5),@2(R4)      Is it the same task?
       JEQ  OK                  Yes
       B    @SVCERR             No
OK     B    *R11                Return
*
* ERROR: Illegal opcode handler.
*
ERROR  MOVB @ERR02,@-1(R1)      Load the error code
       BL   @ENDRCD
       RTWP
*
* APINT: AP interrupt handler.
*
APINT  RTWP
*
* ABORT: Abort I/O routine - the runtime ID of the task
*        attached to the AP (if any) is returned in byte 1
*        of the SVC block.
*
ABORT  LI   R7,INTDSA           Disable AP interrupts
       MOV  R7,*R12
       LI   R7,APHLT            Halt the AP
       MOV  R7,*R12
       MOVB @2(R4),@-1(R1)      Return the runtime ID
       B    @CLOSE2             Go detach the task
*
* APPWR: Power up routine.
*
APPWR  CLR  *R4                 Clear the AP BUSY flag
       CLR  @2(R4)              Clear the runtime ID
       RTWP
       END
LISTING C-3: The Device Service Routine