
Hardware accelerators for CAD

by A. P. Ambler, R. L. Manning and N. Muhammed, Brunel University

At Brunel University, a small research team has been developing special-purpose hardware and algorithms that will greatly reduce the run times of CAD tasks currently run on conventional von Neumann computers. This article gives an example to illustrate the benefits that can be obtained by the use of hardware accelerators, indicating also how these benefits are achieved, and discusses the work currently being undertaken by the Brunel University group.

Introduction

Hardware designed solely with the objective of performing a particular computer-aided design (CAD) task at a much higher speed than could be accomplished on a conventional computer is now well established as a concept and as a marketable product. One only has to attend any conference or exhibition in the field of CAD to verify the interest in this area (the conference program for the ACM-IEEE 22nd Design Automation Conference held in Las Vegas in June this year provides a particular case in point).

Certainly, when one appreciates the excessive and prolific use of computer resources by, say, logic simulation (quotes of days of CPU time are not unknown) and other tasks of today's problem sizes, CAD managers must be horrified by the expectation of the widespread use of VLSI devices and the forthcoming WSI. Unfortunately, leaps and bounds in processing technology also mean leaps and bounds in required design CPU time. Thus the stage is set for any means to reduce the design bottleneck in a von Neumann machine.

Fig. 1 VLSI chip

This article will not consider what might seem to be the obvious solutions — i.e. to replace your VAX 730 with a Cray supercomputer, or to add a floating-point co-processor or array processor — but will examine hardware that has been designed particularly for the solution of CAD algorithms.

Logic simulation is the area that has received the most attention from accelerator designers, resulting in well known systems, for example from ZYCAD and IBM.

In order to set the scene for the rest of this article, a brief description of special-purpose hardware with special reference to logic simulation will be presented, indicating the potential benefits to be gained.

Logic simulation

Most digital logic simulation accelerators attempt to exploit two main sources of concurrency, i.e. circuit concurrency and algorithm concurrency [1]. Circuit concurrency results from processing the signals which in a real digital circuit would be propagating simultaneously in different parts of the circuit. Fig. 1 shows a possible VLSI chip, where signals are likely to be changing at the same time in, say, the registers and the ALU. Algorithm concurrency is concurrency within the algorithm used for simulation.

Circuit concurrency can be simply exploited by a multiprocessor machine; in an n-processor machine, the circuit is divided such that each processor separately and concurrently simulates 1/nth of the circuit. The resultant accelerator architecture is shown in Fig. 2. The partitioning of the circuit in practice may be more complex than a mere geometric carve up, and divisions along functional lines may be more appropriate. Thus, referring to Fig. 1, the registers would be assigned to one processor, while the ALU would be assigned to another, etc.

Computer-Aided Engineering Journal August 1985
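The functional partitioning described above can be sketched as follows. This is an illustrative sketch only: the round-robin assignment policy and the block names are assumptions, not the scheme used by any particular accelerator.

```python
# Hypothetical sketch: dividing a circuit's functional blocks among n
# processors to exploit circuit concurrency, as in Fig. 2.
def partition(blocks, n):
    """Assign functional blocks round-robin to n processors."""
    assignment = {p: [] for p in range(n)}
    for i, block in enumerate(blocks):
        assignment[i % n].append(block)
    return assignment

# e.g. the chip of Fig. 1, carved along functional lines:
print(partition(["registers", "ALU", "control", "bus interface"], 2))
# {0: ['registers', 'control'], 1: ['ALU', 'bus interface']}
```

In practice the carve-up would balance the expected activity in each block, not just the block count.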

To exploit algorithm concurrency, the sequential steps involved in the simulation algorithm — for example determining where the activity in a circuit is located, updating logic values and evaluating new outputs of gates as a result of input value changes — can be broken down further into a sequence of simple processes and linked together to form a pipeline. Thus many activities within a circuit can be processed concurrently, but at different stages within the simulation algorithm, internal to the pipeline.
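The pipeline stages named above can be sketched as chained generators, one per stage. The data structures (a fan-out map and an event list) are invented for illustration and do not reflect ULTIMATE's actual implementation.

```python
# Sketch of algorithm concurrency: the simulation steps in the text
# (find activity, update logic values, evaluate fan-out gates) chained
# as pipeline stages passing items downstream.
def find_activity(events):
    for gate, value in events:
        yield gate, value

def update_values(stream, state):
    for gate, value in stream:
        state[gate] = value          # update the stored logic value
        yield gate

def evaluate_fanout(stream, fanout):
    for gate in stream:
        for successor in fanout.get(gate, []):
            yield successor          # each successor needs a gate evaluation

state, fanout = {}, {"A": ["B", "C", "D"]}
pending = list(evaluate_fanout(update_values(find_activity([("A", 1)]), state), fanout))
print(pending)  # ['B', 'C', 'D'] — three gate evaluations from one event
```

In a hardware pipeline each stage would be a separate processing element, so several events can be in flight at different stages simultaneously.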

One example of a pipelined accelerator is ULTIMATE (see Ref. 1, wherein a fuller description of a pipelined accelerator and the logic simulation algorithm can be found). In ULTIMATE, most of the processes within the pipeline are those which must be performed during the simulation of every, or nearly every, current event. Processes which rarely need to be performed, such as dealing with the incidence of detected oscillation etc., are software routines, handled by a master processor which must be invoked when necessary (Fig. 3).

It is, of course, possible to have an architecture that exploits both circuit and algorithm concurrency by replacing the independent logic simulation processors in Fig. 2 with pipeline processors. Thus it should be possible to take maximum advantage of the simulation process.

The performance figures of some logic simulation accelerators are given below [2]:

Gate evaluations per second:

IBM                   960 million
ZYCAD                  60 million
ULTIMATE†               5 million
Valid Logic Systems   0.5 million
Daisy Systems         0.1 million

† predicted speed only

These figures compare with a typical 3000-5000 gate evaluations per second for a simulator running on a conventional computer. Figures as high as 2 billion gate evaluations per second have been quoted for the IBM machine, along with claims that such a machine could simulate the behaviour of a circuit faster than the circuit would operate in real life!

It is perhaps worth pointing out here the difference between different units that are sometimes quoted in reference to the performance of these accelerators. Consider the gate arrangement shown in Fig. 4. When a logic value change occurs on the output of gate A, this is described as an 'event'. Subsequent to this it is possible that the outputs of gates B, C and D will be forced to change owing to this event. Thus the outputs of these gates must be evaluated — gate evaluation. Once all three gates have been processed, an event evaluation has been completed; i.e. a gate evaluation processes just one gate, whereas an event evaluation processes all gates on the fan-out of a logic gate change. Thus 100 event evaluations per second can be reckoned to be faster than 100 gate evaluations per second.
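The distinction drawn above can be made concrete with a few lines. The fan-out map mirrors the Fig. 4 arrangement; nothing else here comes from the article.

```python
# One event on gate A's output implies one gate evaluation per gate
# on A's fan-out, so event evaluations are the coarser unit.
fanout = {"A": ["B", "C", "D"]}  # the Fig. 4 arrangement

def gate_evaluations_per_event(gate):
    return len(fanout.get(gate, []))

print(gate_evaluations_per_event("A"))  # 3 gate evaluations per event on A
```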

Negative points

The potential benefits to be accrued from the use of such machines are obvious. However, all good things have drawbacks which must be highlighted.

Fig. 2 Basic architecture exploiting circuit concurrency

Fig. 3 ULTIMATE architecture — exploiting algorithm concurrency

Fig. 4 Event and gate evaluation

Fig. 5 Typical accelerator configuration

The performance gains may not be as great as a potential purchaser of such a system might have expected. The quoted speed-ups usually relate to the processing power of the accelerator hardware in isolation, leaving out associated tasks (Fig. 5), such as compilation and the downloading of simulation data to and from the accelerator and host computer, i.e. the input/output bandwidth.

These tasks form part of the overall simulation process and their times of execution must be included in any benchmark. Unfortunately they can often dominate the overall simulation time, and indeed can create such a bottleneck that performance gains can be negative. (Compilation may be of minor or major significance. Daisy's MegaLogician appears to use the same data structure for the accelerator as for the software simulator, so compilation overheads will be small. However, ZYCAD evaluators require that the entire circuit description be levelled to a description containing only three-input single-output elements.)

DEC, a user of logic simulation accelerators, recently reported on its experiences with the use of these machines [3]:

Consider two circuits, circuit A being of approximately 200-300 gates, and circuit B being of 3000-5000 gates. The raw acceleration factor of the simulation process offered by the machine (in this case a ZYCAD) was found to be 40-50 for circuit A, and of the order of 300 for circuit B (such results are to be expected from a pipelined machine). If the times taken for compilation and input/output are taken into account, then the acceleration factors drop dramatically — 1.2 for circuit A and 20-30 for circuit B.
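The collapse in the DEC figures is an Amdahl-style effect: only the raw simulation time is accelerated, while compilation and input/output are not. The sketch below illustrates the arithmetic; the specific times (10 s of simulation, 40 s of overhead) are invented for illustration, not DEC's measurements.

```python
# Effective speed-up when only the simulation fraction is accelerated
# and the compilation/I/O overhead runs at unchanged speed.
def effective_speedup(sim_time, overhead_time, raw_factor):
    accelerated = sim_time / raw_factor + overhead_time
    return (sim_time + overhead_time) / accelerated

# A small circuit: raw factor 45, but overhead dominates, so the
# overall gain nearly vanishes.
print(round(effective_speedup(10.0, 40.0, 45.0), 2))  # 1.24
```

As the raw simulation time grows relative to the fixed overhead, the effective speed-up approaches the raw factor, which is why circuit B fares so much better than circuit A.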

IBM, with its own in-house developed machine, reported the following figures [1]:

Consider the simulation of a 500 000-gate processor over a 100-instruction sequence. Using the standard software simulator required 4.5 minutes, while the hardware accelerator required 49 minutes! Repeating the simulation, but over a 1 million-instruction sequence, required 250 hours on the software simulator, but only 66 minutes on the accelerator.

These types of accelerator can therefore be seen to be useful only for problems above a given size. It is unfortunate that many potential users and purchasers of such hardware who currently have a significant simulation problem can find no benefit from the use of these machines as their individual problem sizes are too small (Fig. 6).

Other applications

The use of hardware accelerators is not restricted just to logic simulation. Other applications in the area of CAD for VLSI that can be mentioned include circuit simulation, test pattern generation, placement and routing, and design rule checking. Unfortunately, to a potential user of such hardware, acceleration of all these tasks might require the purchase of many pieces of expensive and, perhaps, totally incompatible hardware. Hardware and software support for these machines could become prohibitive.

Work at Brunel

The current work at Brunel University is in the development of a single hardware architecture amenable to the acceleration of a number of design automation tasks, and in the development of the algorithms that will be compatible with such a machine. The basis for the proposed Brunel accelerator may best be described by reference to, in the first instance, another particular example, namely circuit simulation.

The University of California, Berkeley, the home of SPICE, the well known circuit simulation software, has proposed a multiprocessor architecture in an attempt to reduce the processing times of circuit simulation (Fig. 7) [4].

In this machine, each processor-memory element can access its own local storage or, via the interconnection network, can access the local memory of another processor-memory element. The simulation task is then divided by partitioning the circuit to be simulated and assigning each partition to an individual processor. The segments are then processed concurrently. For an example run, the maximum predicted speed-up factor quoted, using a 128-processor system, was 35.

Similar work at California Institute of Technology, however, where a machine to accelerate switch level simulation using a similar architecture to Berkeley's is being developed, suggests that the intercommunication network will give rise to bus contention problems. This work predicts the optimum number of processors for simulating a circuit with an increasing number of nodes as follows [5]:

circuit size (thousands of nodes)   optimum number of processors
   4        1.0
   8        2.0
  16        4.0
  32        8.0
  64        6.83
 128        5.69
 256        4.93
 512        4.39
1000        4.0
2000        3.69
4000        3.44
   ∞        2

Clearly this problem is negating the purpose of such a machine, which is to apply more processors to larger problems in order to contain the solution time within acceptable limits.

These points help to highlight a severe limitation of multiprocessor configurations: that of interprocessor communications, which will restrict any potential expansibility of the system.

It is with this point in mind that the Brunel team is developing hardware and algorithms that will exploit nearest-neighbour communications with no upper limit to the number of these communications that can operate concurrently. Consider this applied to the circuit simulation problem [6]. The algorithm to be used dispenses with the large matrices, and hence with the cumbersome matrix operations, normally associated with circuit simulation. This is achieved by representing the circuit using numerous simultaneous equations. Each equation is then assigned a processor dedicated to its solution. Consequently an iterative method is used to solve the circuit equations.

These equations are obtained by applying Kirchhoff's current law to each node in the circuit. Hence, conceptually, the circuit topography is mapped on to a processor array with the same topography. Each equation has one unknown — the voltage at the node being represented — and all other voltages in the circuit are assumed to be constant. When a solution has been obtained, the result is distributed to those processors whose equations are dependent upon it, and if any of the voltages on which that processor's equation is dependent have altered then the solution is recalculated.
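The relaxation scheme just described can be sketched in a few lines. For equal resistances, Kirchhoff's current law at a node reduces to "the node voltage is the average of its neighbours' voltages". The three-node chain below, with fixed 0 V and 8 V ends, is invented for illustration and is not the circuit of Fig. 8.

```python
# Jacobi-style relaxation: every node's equation is re-solved with all
# neighbouring voltages held at their values from the previous sweep,
# mimicking one processor per nodal equation.
def relax(neighbours, v, sweeps):
    for _ in range(sweeps):
        v = {n: sum(v[m] for m in nbrs) / len(nbrs) if nbrs else v[n]
             for n, nbrs in neighbours.items()}   # fixed nodes keep their value
    return v

neighbours = {"left": [], "n1": ["left", "n2"], "n2": ["n1", "n3"],
              "n3": ["n2", "right"], "right": []}
v = {"left": 0.0, "n1": 0.0, "n2": 0.0, "n3": 0.0, "right": 8.0}
v = relax(neighbours, v, 50)
print(round(v["n2"], 2))  # 4.0 — the midpoint voltage along the chain
```

The staircase pattern in the Fig. 8 solution tables is exactly this kind of sweep-by-sweep convergence, with each processor waiting on its neighbours' latest values.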

Each processor simultaneously performs these procedures until all the processors have satisfied the convergence criterion; i.e. the current flowing into a node is calculated to be less than 1 μA.

[Fig. 6 compares a hardware accelerator with a software simulator, breaking total simulation time into simulation time, input/output time between accelerator and host computer, and compilation time.]

The advantages of using an accelerator can be dependent upon the size of the problem applied to it. The added compilation overheads and input/output time can mean that the use of an accelerator can be detrimental even though the raw simulation process time has been reduced. It is only if the raw simulation time is significant in comparison with these other factors that an accelerator becomes useful.

Fig. 6 Economic benefits of accelerators — part I

Fig. 8 shows a simple example of a circuit, the processor configuration that would be used for the simulation of this circuit and, in tabular form, the solutions that would be obtained by applying the algorithms previously described. Initial estimates for the nodal voltages are zero.

Before each processor can solve the nodal equation at the node it represents, it has to wait for the neighbouring processors to calculate the voltage at their respective nodes. This reduces the amount of activity, i.e. the time spent by a processor solving an equation. In the previous example, all the processors are only 50% active. An increase in processor activity can be obtained, at the cost of an increase in software overheads, by allocating a processor to the solution of more than one nodal equation. In the example above, processor 1 could solve the nodal equation at node 1 and the nodal equation at node 3. The activity of processor 1 would be increased to 100%, and the activity of processor 2 would remain at 50%. However, a disadvantage of this system is that two nodal processors calculate the current flowing in the branch connecting them.

A further reduction in the number of processors required to simulate a circuit can be achieved by only processing nodes that are independent of each other; i.e. at any instant, only nodes that are not connected to the same devices are processed. Thus blocks of nodes are processed sequentially and nodes within a block are processed in parallel. Another advantage of this updated method is that no two processors are calculating the current flowing in the same branch, as occurred previously. This algorithm leads to an accelerator architecture where only adjacent processors are connected, for example a two-dimensional array of processors.
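Grouping nodes into mutually independent blocks is a graph colouring problem. The greedy sketch below is illustrative only; the article does not say how the Brunel team performs the grouping.

```python
# Group nodes into blocks such that no two nodes in a block share a
# device: within a block all nodes can be processed in parallel, and
# the blocks are then processed sequentially.
def independent_blocks(shares_device):
    blocks = []
    for node in shares_device:
        for block in blocks:
            # place the node in the first block containing none of its
            # device-sharing neighbours
            if not any(nbr in block for nbr in shares_device[node]):
                block.add(node)
                break
        else:
            blocks.append({node})
    return blocks

# A chain of four nodes, each sharing a device with its neighbour:
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(independent_blocks(chain))  # [{1, 3}, {2, 4}] — two blocks suffice
```

For a chain this reproduces the red-black pattern implicit in the Fig. 8 tables: odd nodes update in one step, even nodes in the next.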


Fig. 7 Berkeley machine

Solutions (voltage to one decimal place) obtained by the processor configuration:

step   processor 1   processor 2   processor 3
  1        0.0           0.0           4.0
  2        0.0           2.0           4.0
  3        1.0           2.0           5.0
  4        1.0           3.0           5.0
  5        1.5           3.0           5.5
  6        1.5           3.5           5.5
  7        1.7           3.5           5.7
  8        1.7           3.5           5.7
  9        1.9           3.7           5.9
 10        1.9           3.9           5.9

Fig. 8 Simple resistor circuit and its processor representation

Additional refinements to the process can be accomplished by considering that in large circuits only 5-10% of the circuit is active at any one time. Thus it would not only be wasteful but also inefficient to use a processor dedicated to the solution of each nodal equation. The algorithms employed are conducive to the exploitation of this 'latency' in large circuits.

For simulation of large circuits, it is intended that an array of processors would simulate, serially, sectors of the circuit. For example, consider a circuit of 1 million nodes and a processor array of 400 processors. The most straightforward method of simulation would be to partition the circuit into modules, each containing 2500 nodes, and simulate each one of the modules in series. When the array comes to a block where none of the circuit parameters has changed, the block is bypassed and the next module is processed. However, for this size of array, correct partitioning of the circuit would be crucial for fast simulation. Consider, for example, if only one node was active in one of the blocks; then possibly only one processor would be active in the processor array.

A better implementation is to use a number of smaller arrays of processors, for example ten arrays of 40 processors, each array processing smaller blocks in parallel with one another in much the same way as each processor within the array is processing each node concurrently (Fig. 9). Thus, if only one node is active within a module, then a smaller percentage of processors are inactive. The partitioning of the circuit to minimise interaction between modules is no longer necessary.
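The latency-exploiting sweep described above can be sketched as a scheduler that skips quiescent blocks and deals out the remaining active blocks across the available arrays. The block numbering and activity set are invented for illustration.

```python
# Sketch: several small processor arrays sweep the circuit's blocks,
# bypassing any block in which no circuit parameter has changed.
def schedule(blocks, active, n_arrays):
    """Return, per sweep step, which blocks the arrays actually simulate."""
    work = [b for b in blocks if b in active]        # skip latent blocks
    return [work[i:i + n_arrays] for i in range(0, len(work), n_arrays)]

blocks = list(range(10))          # 10 blocks of the circuit
active = {0, 3, 4, 7, 8}          # only these have changing parameters
print(schedule(blocks, active, 2))  # [[0, 3], [4, 7], [8]] — 5 blocks in 3 steps
```

With a single large array, an inactive block still occupies a full sweep step; with several small arrays, only active blocks consume array time, which is the point made in the text.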

The proposed simulation engine has a hierarchical array structure. Because of this, processing elements can be easily attached to the processor array, and processor arrays can be connected to the main body of the engine with the same ease (see Fig. 10).

There is a large amount of similarity between the operations of the processing elements within the array and the array itself — the processor array simulates blocks of the circuit concurrently; processor elements simulate nodes within the block of the circuit concurrently.

To date, a speed-up factor of 50 times over SPICE has been measured in simulations, and this figure will increase with the size of the circuit to be simulated as the effects of latency in the circuit are exploited.

It is envisaged that this machine can be used for true mixed-mode simulation. In this case, each processor array would perform a different level of simulation on different sections of the circuitry, for example logic, functional and switch level simulation. This extension has already been considered while the authors were at the University of Manchester Institute of Science and Technology.

Another application currently being pursued is that of the acceleration of test pattern generation. This work is still in its relatively early stages, but an indication of the current thinking might be useful.

Automatic test pattern generation (ATPG) algorithms have been available for some time. PODEM and DALG are perhaps the two most well known. Both generate test vectors for any given single stuck-at fault in combinational circuits if the fault is testable. Attempts are made to sensitise one or more critical paths from fault to observable output using random or heuristic arguments on logic or switch level circuit representations. PODEM has proved to be particularly successful, but, owing to the NP-complete nature of the test generation problem at such a low level, only for small- to medium-sized purely combinational circuits. For circuits of more than a few thousand gates the run time necessary for complete execution of PODEM quickly becomes prohibitively expensive. The process of test generation for an I-input combinational circuit can be considered as a search of I-dimensional space with volume 2^I. PODEM uses heuristic arguments to bound the volume of space to be searched until all the remaining points in the bound volume are tests for the current fault.
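The search framing above can be made concrete by brute force on a tiny example: a test vector is any point of the 2^I input space on which the good and faulty circuits disagree. The two-gate circuit and the fault location below are invented; PODEM's contribution is precisely that it prunes this space with heuristics instead of enumerating it.

```python
# Test generation as a search of I-dimensional input space (volume 2^I)
# for a vector distinguishing the fault-free circuit from the faulty one.
from itertools import product

def good(a, b, c):
    return (a and b) or c            # an AND gate feeding an OR gate

def faulty(a, b, c):
    return (0 and b) or c            # input 'a' stuck-at-0

tests = [v for v in product([0, 1], repeat=3) if good(*v) != faulty(*v)]
print(tests)  # [(1, 1, 0)] — the only vector exposing the fault
```

Even here, one of eight points is a test; for a realistic I the space is astronomically larger, which is why the bounding heuristics matter.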

Execution time for test generation algorithms can be reduced by the use of several techniques:

• circuit segmentation — for an N-gate circuit, run time = kN³; for two N/2 segments, run time = 2k(N/2)³

• tighter space bounding.
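The segmentation arithmetic in the first bullet works out to a fixed factor of four, since 2k(N/2)³ = kN³/4. A quick check, with an arbitrary gate count chosen for illustration:

```python
# Cubic run-time model from the text: splitting an N-gate circuit into
# two independent N/2 segments quarters the total run time.
def run_time(n_gates, k=1.0):
    return k * n_gates ** 3

n = 4000
whole = run_time(n)
segmented = 2 * run_time(n // 2)
print(whole / segmented)  # 4.0
```

More generally, s equal segments give a factor of s², so segmentation pays off rapidly when the segments really can be treated independently.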

Given just a gate level description of a circuit, both the above methods require a large amount of additional computation. Considering current hardware development environments it is quite likely that large circuits will have been described at various levels of detail — for example algorithmic, functional, gate and finally circuit level. At functional level the circuit would have been partitioned into many smaller blocks, some of which could be treated separately. An experienced test engineer would not consider a large circuit as just a collection of logic gates, but would make use of the functional and algorithmic representations. For example, to test for a fault on one set of inputs to an ALU, one would make use of the nature and function of the circuit by selecting control lines to select a suitable function (for example logical OR) and setting the other input word to a known value (for example zero). By doing this the engineer is actually performing very strong space bounding and providing logically related alternative correct test assignments or test volumes.

The use of hierarchical test generation appears to provide a means of reducing the effect of the drastic scaling properties of conventional methods. Recent work in artificial intelligence making use of knowledge-rich functional models has proved to be very successful.

When conventional methods are applied to sequential circuits many new problems arise. PODEM cannot directly operate on any circuits containing feedback paths; hence any sequential circuit must be reduced to a combinational form. This formation of pseudo-inputs and outputs can be a non-trivial task for even the simplest of sequential devices.

Each pass of PODEM will generate a new input assignment. These define the required state of the circuit at the previous clock cycle. This rapidly leads to extremely long execution times, owing to the combinatorial search of sequential devices. Simple heuristics applied to the reasoning backward — the reverse simulation problem — will not be sufficient to reduce the search space to a manageable size. Test engineers are able to solve many of these problems fairly simply. For example, to test the output of an 8 bit shift register one would initially reset the register, then set the control lines to shift data in a known direction, set the data input lines to test values and then toggle the clock line for a further eight cycles. These are all sensible things to do but quite complex events when viewed at detailed gate level, especially in pseudo-combinational form.

A method of deducing the necessary control events for a specified objective from knowledge-rich models of stored-state devices has been proposed by Kramer [7]. Device operation is defined using a Lisp-like hardware description language capable of representing both Mealy and Moore models of finite-state machines. Working systems based on these ideas have yet to be developed and hence timing characteristics are not available. They are expected to be much more favourable than those of present systems for more complex devices, although as circuit complexity approaches VLSI densities the need for further speed enhancements will develop.

Fig. 9 Processor representation of an RTL inverter

Fig. 10 Brunel simulation engine

Fig. 11 ALU processor array

Conventional ATPG systems do not lend themselves readily to performance acceleration. They require large amounts of computation to be performed in strict gate sequence, and hence only minimal speed enhancements can be achieved with the use of parallel processes. For example, during the backward objective drive operation of PODEM a decision node is soon reached — for example a NOR gate whose output is required to be at logic 0. If all possible paths were to be traced to primary inputs, and multiple assignments were made, possible critical forward D-paths may be blocked owing to the possibly redundant assignments. PODEM is inherently sequential in its operation. Multiple forward and backward traces can be performed in parallel if strict priorities are maintained and revised after each input assignment.

ATPGs are often used to generate sets of test vectors that provide maximum test cover for a circuit, and therefore the basic algorithm may be executed several times using different primary fault targets. Hence another way to reduce the total run time of an ATPG operation would be to perform several isolated sequential ATPGs using different primary faults. Redundancy due to overlapping of fault coverage can possibly be reduced using heuristics to select sets of most distinct faults.

Acceleration of conventional ATPGs can result in possible speed improvements of several factors, comparable with the improvements achievable simply by using faster computers. New algorithms that make best use of the next-generation highly parallel processor networks currently under development are required.

The use of hierarchical circuit descriptions provides a natural means of circuit segmentation that can be mapped on to a topologically similar array of processors. An algorithm proposed by Kramer can be modified for execution on such an array of processors. A simple example to demonstrate its operation is described below.

Consider the ALU shown in Fig. 11, with some initial fault in multiplexer M1. Using a gate level ATPG, the necessary data and control inputs to M1 would be generated by processor B; these objectives would be passed on to neighbouring processors. Processor C would use function data to determine possible ways of transporting the fault to its outputs while satisfying any constraints placed on other input lines. Any resulting additional constraints would be transmitted to neighbouring processors. While processor C was performing the D-propagation, processor A would be attempting to satisfy data requirements for setting up the initial fault in M1.

The use of natural functional circuit segmentation should greatly reduce the number of conflicting objectives generated when performing forward and backward drives concurrently, although a substantial amount of transfer still has to be performed, albeit only between nearest-processor pairs. In the example, the processor efficiency is quite low. This is in part due to the simplicity of the circuit, but in general, by attempting to generate tests for as many faults as possible, in strict process priority, concurrency should result in high efficiency.

Other design automation algorithms — for example placement and routing and design rule checking — are known to fit on this type of architecture. Thus it is hoped that the Brunel simulation engine (for which we have yet to think of an interesting and catchy name!) will answer one of the criticisms that could be made about hardware accelerators, i.e. that they are currently too specific and expensive (see Fig. 12).

Future developments

Unfortunately, the above machine will still suffer from one of the bottlenecks mentioned earlier, i.e. the compilation and input/output between host and accelerator. The only way that this problem will be removed is by integrating the accelerator and host computer hardware as much as possible.

Daisy and Valid appear to be the closest to achieving this concept as they are both workstation and accelerator manufacturers: thus it can be reasonably assumed that the accelerator architecture takes full account of the data structures within the workstation. The compilation problem can be expected to be reduced; however, the input/output problem will probably remain.

At Brunel, we are now considering designs where the accelerator and workstation hardware are one and the same. Thus one piece of hardware will perform workstation functions, graphics and the acceleration of CAD tasks beyond speeds currently achievable.

[Fig. 12 plots design time against the CAD processes of logic simulation, circuit simulation, placement and routing, and design rule checking.]

The above shows a possible scenario with, admittedly, an arbitrary selection of CAD processes and durations representing the total design time for an integrated circuit. 'A' shows the design time without an accelerator; 'B' shows the design time but now using a logic simulation accelerator. Does the actual difference between time A and time B for the complete design justify the use of the accelerator? Obviously, if the proportion of time that logic simulation represents is significant, then an accelerator can be justified. Equally, is an accelerator that is capable of moderate acceleration of all the CAD tasks justifiable?

Fig. 12 Economic benefits of accelerators, part II

Many of the proposed solutions mentioned in this article will probably be directly realisable in silicon. Hence the use of VLSI and WSI, which these CAD tools are intended to help design, may provide the solution to their more economic production and more widespread use.

References

1 GLAZIER, M. E., and AMBLER, A. P.: 'ULTIMATE: a hardware logic simulation engine'. Proceedings of 21st ACM-IEEE Design Automation Conference, Albuquerque, NM, USA, June 1984

2 BLANK, T.: 'A survey of hardware accelerators used in computer-aided design', IEEE Design & Test of Computers, 1984, 1, (3)

3 SMITH, L. T., and REZAC, R. R.: 'Methodology for and results from the use of a hardware logic simulation engine for fault simulation'. Proceedings of IEEE International Test Conference, Philadelphia, PA, USA, Oct. 1984

4 DEUTSCH, J. T., and NEWTON, A. R.: 'A multiprocessor implementation of relaxation-based electrical circuit simulation'. Proceedings of 21st ACM-IEEE Design Automation Conference, Albuquerque, NM, USA, June 1984

5 DALLY, W. J.: 'The MOSSIM simulation engine architecture and design'. Report 5123:TR:84, California Institute of Technology, Pasadena, CA, USA, 1984

6 MANNING, R. L., and AMBLER, A. P.: 'High-speed simulation of 1 million transistor circuits'. Proceedings of Custom Integrated Circuits Conference, Portland, OR, USA, May 1985

7 KRAMER, G. A.: 'Brute force and complexity management: two approaches to digital test generation'. M.Sc. thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1984

A. P. Ambler, R. L. Manning and N. Muhammed are with the Department of Electrical Engineering & Electronics, Brunel University, Uxbridge, Middlesex UB8 3PH, England


