A review of fault models for LSI/VLSI devices

Silvano Gai, Marco Mezzalama and Paolo Prinetto

Indexing terms: Fault testing, Modelling

Abstract: This review paper deals with problems concerning fault modelling for LSI/VLSI devices. Both random and regular logic are considered, and different fault classes are discussed for each, including stuck-at, bridging, functional and time-dependent faults. Specific fault models are then considered for microprocessors, RAMs and PLAs.

1 Introduction

Problems involved in testing LSI and VLSI circuits have increased considerably in the last few years, both from the manufacturer and the user points of view. The reasons for the difficulties encountered are best understood by examining the driving force behind the changes in the LSI/VLSI test environment. As recently as 1976 the semiconductor industry was able to achieve only about 100 logic gates/mm². Today the figure is around 1500 gates/mm² and historically has increased linearly with time. With the additional growth in the physical size of a chip, the number of circuit functions per chip has grown from 10 000 in 1976 to approaching 100 000 today [1, 2]. Hewlett-Packard's NMOS 32-bit microprocessor has 450 000 transistors on one silicon chip of less than 38 mm², designed for a maximum speed of 18 MHz. The so-called 1 µm barrier for optical systems has been reached, but electron beam and X-ray techniques are enabling sub-micrometre lines to become a practical design consideration.

Improvements in device fabrication technology have given designers freedom to increase the performance, and hence the complexity, of the chip: four 32-bit microprocessors are now available [1], and the first megabit RAM is expected in 1985. The area of testability, rather than technology, may now become the limiting factor in device development.

These considerations justify and explain the growth of interest in a CAD-oriented approach to testing. The structural elements of CAD systems are good circuit simulators, fault simulators and (automatic) test pattern generators.

To cope with LSI/VLSI chips, the philosophy of new CAD systems must be, to a large extent, completely new. In particular, they should provide the possibility of dealing with LSI/VLSI at different hierarchical levels [3].

Two main problems arise in CAD systems, namely:

(i) how to find algorithms able to deal with the LSI/VLSI complexity

(ii) how to find proper device and fault models to represent LSI/VLSI circuits.

Paper SM31, received 12th August 1982
The authors are with the Dipartimento di Automatica ed Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

In this review paper we analyse the problems of fault models to be considered in CAD systems to adequately treat new LSI/VLSI devices.

The selection of fault models is of primary importance for both fault simulation and automatic test generation, since it directly influences both the algorithms to be used and their effectiveness [4].

Physical faults occur in most devices and are due to numerous factors, such as manufacturing processes, material imperfections and the user environment. The output signal U of a logic block with m inputs and n state variables may be expressed as

$$U = F(i_1, i_2, \ldots, i_m, s_1, s_2, \ldots, s_n) \tag{1}$$

where $i_k$ is the kth input, $s_k$ is the kth state variable (for sequential circuits) and F is a generic logic function (for example AND, SHIFT, ADD).

In faulty circuits the function $F_f$ (where the subscript f denotes variables in a faulty circuit) depends on several parameters (temperature, humidity, time, manufacturing process etc.) in such a way that $U_f$ differs from the output U of the good circuit; the way in which $U_f$ changes with respect to U is usually very complex and difficult to express in an analytical form.

As a first simplification let us consider only logic faults, i.e. the faults which modify the logic characteristics of a circuit:

$$U, U_f \in B \tag{2}$$

where B is the set of adopted logic values, typically {0, 1}.

In order to perform an automatic fault analysis it is necessary to define what kind of logic faults can be managed by a computer. Indeed, it is clear that it is impossible to consider all failures, owing both to the impossibility of defining the complete set of such faults in an algorithmic way and to the difficulty of enumerating and simulating them on a real computer. In particular, the set of fault models should have the following properties:

(i) They should be readily definable and hence enumerable by computer.

(ii) They should give a satisfactory coverage of real faults.

(iii) They should represent significant technology faults and logic functional faults.

44 Software & Microsystems, Vol.2, No.2, April 1983

In practice restriction (i) leads to a consideration of only 'static' faults, i.e. faults whose effects on U are permanent in time.

A further subdivision concerns the logic level at which the effect of faults is considered. This is directly related to the level of circuit description (behavioural, functional, gate etc.) at which algorithms for simulation or test generation are applied. At present, a gate level fault model is the most widely used, since the major part of CAD software tools (simulators and automatic test generators) available on the market uses a gate level description of circuits [5, 6] and adequate gate level modelling is available for SSI and MSI devices.

A general classification of various types of logical faults is given in Fig. 1.

It is worth noting that there are two general classes of faults which are undetectable by logic procedures. The first may be termed 'parametric faults': these degrade some analogue characteristic of the circuit, but not its truth table. This category may include defects which cause higher leakage current, longer propagation delay, higher power dissipation etc. Secondly, there are undetectable logic faults, mainly due to circuit redundancy, reconvergent fan-out, circuit operation etc. [4].

Fig. 1 Classification of logic faults
[Tree diagram. Labels: fault types; dynamic (time-dependent) faults: timing faults, intermittent faults; static (time-independent) faults: functional faults (functional level), short faults and stuck-at faults (gate level)]

Section 2 analyses fault models for a gate level circuit description, functional fault models for random logic and dynamic (i.e. time-dependent) fault models. In Section 3 functional fault models developed for well defined classes of devices, such as microprocessors, random access memories and regular logic, are discussed.

2 Fault models for random logic

2.1 Stuck-at fault model

The stuck-at fault model is by far the most widely used, mainly because of its simplicity and amenability to analysis [5, 6].

In this model, it is assumed that a gate input or output is permanently fixed at either logical 0, stuck-at-zero (SA-0), or logical 1, stuck-at-one (SA-1).
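As a minimal sketch of this definition, the fragment below forces one line of a small circuit to a fixed value and lists the input patterns whose output then differs from the fault-free output. The two-gate circuit, the line names (`a`, `b`, `c`, `n1`, `u`) and the helper functions are assumptions chosen for illustration; they are not taken from the paper.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def circuit(a, b, c, fault=None):
    """Hypothetical circuit U = NAND(NAND(a, b), c).

    `fault` is None or a pair (line_name, stuck_value).
    """
    def line(name, value):
        # Force the line to its stuck value when it is the faulty one.
        return fault[1] if fault and fault[0] == name else value

    a, b, c = line("a", a), line("b", b), line("c", c)
    n1 = line("n1", nand(a, b))
    return line("u", nand(n1, c))

# One SA-0 and one SA-1 fault per line.
faults = [(name, v) for name in ("a", "b", "c", "n1", "u") for v in (0, 1)]

# For each fault, the input patterns that expose it at the output.
detecting = {
    f: [p for p in product((0, 1), repeat=3)
        if circuit(*p) != circuit(*p, fault=f)]
    for f in faults
}
```

Here `detecting[("a", 0)]` holds the patterns that detect input `a` stuck-at-zero; a fault whose list is empty would be undetectable from the output.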

This model, although defined entirely theoretically, does in fact account for the function of many electrical faults in an actual circuit. Some examples of faults which can be accurately modelled for circuits made up of standard transistor-transistor logic (TTL) elements are:

(i) an open connection between the IC solder pad and the pins of an IC package, or between the pins and external circuits, modelled as SA-1

(ii) various collector-emitter or base-emitter short or open circuits in input or output transistors of a TTL gate, modelled as SA-0 or SA-1

(iii) a node shorted to VCC or ground, modelled as SA-1 or SA-0.

A typical example is given in Fig. 2a for TTL logic (a NAND gate), while in Fig. 2b the relationship between physical defects and stuck-at faults is shown for the case of a DTL NAND circuit. With the increasing complexity of ICs and PCBs the number of stuck-at faults to be considered in automatic fault analysis would be prohibitive owing to both the CPU time required† and storage requirements. Considering a VLSI chip having a gate equivalent number of 100 000, and assuming that each gate has three inputs and one output, the total number of SA-1 and SA-0 faults to be considered would be 8 × 100 000 = 800 000. Since fault simulation is the process of applying every test pattern to the fault-free machine and to each copy (800 000) of the machine containing a single stuck-at fault, and assuming that a simulation run requires several minutes, the complete simulation would last more than one year!
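The arithmetic behind this estimate can be checked with a few lines. The figure of one minute per simulation run is an assumption made here as a lower bound; the text only says "several minutes".

```python
GATES = 100_000
LINES_PER_GATE = 3 + 1                 # three inputs and one output per gate
faults = GATES * LINES_PER_GATE * 2    # SA-0 and SA-1 on every line

MINUTES_PER_RUN = 1.0                  # assumption: lower bound for "several minutes"
total_days = faults * MINUTES_PER_RUN / (60 * 24)
```

Even at one minute per faulty machine, `total_days` comes out well above 365, which is consistent with the "more than one year" quoted above.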

A technique which is frequently used to reduce the number of stuck-at faults for fault analysis is the so-called 'fault collapsing' technique. It makes use of the fact that two, or more, faults may be equivalent [8, 9] in their effect on the behaviour of the circuit, at least as viewed from the primary circuit outputs. For example, the input of a NAND gate SA-0 is equivalent to its output SA-1; also the output of a gate SA-0 or SA-1 is equivalent to the input of the gate it feeds SA-0 or SA-1, if no other fan-out from the driving gate exists. By identifying such equivalent faults, it is possible to partition the set of all faults into 'fault equivalent classes', and then to simulate only one representative fault for each class.
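Fault equivalence on a single gate can be illustrated by grouping faults that produce the same faulty truth table. The sketch below does this for an isolated three-input NAND gate; the fault encoding and the function names are assumptions introduced for the example.

```python
from itertools import product

def nand3(inputs, fault=None):
    """Three-input NAND; fault is ("in", index, value) or ("out", None, value)."""
    x = list(inputs)
    if fault and fault[0] == "in":
        x[fault[1]] = fault[2]          # stuck input line
    out = 1 - (x[0] & x[1] & x[2])
    if fault and fault[0] == "out":
        out = fault[2]                  # stuck output line
    return out

def signature(fault):
    # Faulty truth table, used as the key of an equivalence class.
    return tuple(nand3(p, fault) for p in product((0, 1), repeat=3))

all_faults = ([("in", i, v) for i in range(3) for v in (0, 1)]
              + [("out", None, v) for v in (0, 1)])

classes = {}
for f in all_faults:
    classes.setdefault(signature(f), []).append(f)
```

For this gate the eight single stuck-at faults fall into five classes: the three input SA-0 faults and the output SA-1 fault share one signature, so only one of them needs to be simulated.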

As an extension of the notion of fault equivalence, 'fault dominance' [8] has been used, although much less frequently, to reduce the number of faults to be simulated. In this technique, it is noted that, although the faults are not equivalent, the input of a NAND gate SA-1 can be detected only if the output of the same gate SA-0 is detectable (but not vice versa). Thus analysis of only the output faults may sometimes eliminate the need to simulate input faults. However, the fault dominance relation does not apply to sequential circuits [4].

For single stuck-at faults in a combinatorial network, we can say [10, 11] that only the fault set consisting of faults at the primary input lines and fan-out branches need be considered. A test pattern that covers all members of this set is a complete test for all detectable faults. It has been shown that essentially the same holds for multiple faults [11, 12].

A different approach to reducing the number of faults to be simulated is to consider only the faults at the input and output pins. This technique, which is widely used for PCBs, has a percentage detection of internal faults which varies widely, depending on the type of circuit [13]. For example, a test sequence which detects all (100%) of the pin stuck-at faults of a 4 to 16 decoder must exhaust its input test space (all 16 input combinations), and must therefore detect 100% of the detectable internal gate stuck-at faults.

† It has been observed that computer run time just for fault simulation is proportional to N?, where N is the number of gates [7].

Fig. 2 Examples of defect-fault relationship for TTL and DTL
a TTL
b DTL

defect                 fault model
diode D_A open         SA-1 (input A)
diode D_B open         SA-1 (input B)
diode D_C open         SA-1 (input C)
resistor R_3 open      SA-0 (output U)
transistor Q open      SA-1 (output U)
diode D_1 open         SA-1 (output U)
diode D_2 open         SA-1 (output U)
resistor R_1 open      SA-1 (output U)

At the other extreme, consider a 2^N × 1 RAM and apply the following 100% pin stuck-at fault test sequence:

(i) Write a 0 into location 0.

(ii) Write a 1 into location 2^i for i := 1 to N-1 (locations 2, 4, 8, 16 etc.).

(iii) Read all locations 2^i for i := 0 to N-1 (locations 1, 2, 4 etc.).

The above sequence exercises only N-1 bits, and the internal fault coverage cannot exceed (N-1)/2^N. In the case of a 1024 × 1 RAM, the internal fault coverage is less than 1%.
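The bound can be evaluated directly. This sketch assumes that the exercised bits are the N-1 cells at locations 2, 4, ..., 2^(N-1), which are both written and then read by the sequence; the function name is hypothetical.

```python
def pin_test_coverage_bound(n_address_bits):
    """Upper bound on internal coverage of the pin test for a 2**N x 1 RAM."""
    exercised = n_address_bits - 1      # assumption: cells written and then read
    total = 2 ** n_address_bits         # all cells in the array
    return exercised / total

bound_1k = pin_test_coverage_bound(10)  # 1024 x 1 RAM, N = 10
```

For N = 10 the bound is 9/1024, i.e. below 1%, in line with the figure quoted for the 1024 × 1 RAM.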

Of course, typical internal fault coverage will be between these two extreme cases.

The application of this approach to LSI/VLSI is expected to give very poor results owing to the reduction in the number-of-pins/number-of-internal-gates ratio, which is approximately 0.2 for MSI, 0.02 for present LSI and 0.002 for future VLSI.

Alternative and more promising ways to reduce the computer run time for fault analysis are the use of methods for partitioning complex (LSI/VLSI) chips into small modules, which can be dealt with more easily and possibly in parallel, and the introduction of fault models at a high description level (functional), as discussed in a later Section.

2.2 Nonclassical stuck-at fault models

SA-1 and SA-0 give a satisfactory representation of physical faults mainly for bipolar (TTL) logic. However, the increasing use of MOS technology, due to its higher integration capability, leads us to consider other types of stuck-at faults to represent logical failures peculiar to this technology.

Two nonclassical stuck-at fault models have been introduced for MOS circuits [14, 15]: stuck-at-open (SA-op) and stuck-at-x (SA-x), where x is an unknown. SA-op is important primarily in the high-impedance technologies of PMOS, NMOS and CMOS [14]. In PMOS and NMOS, SA-op faults are significant for nodes at which several transmission gates converge and for circuits performing logic by the precharge/discharge technique. It is, however, CMOS technology which most naturally exhibits the nature of SA-op faults.

SA-op faults can be produced either by a missing connection to the gate of a nonconducting FET or by an open or missing connection to the source or the drain of an FET.

As an example, consider a two-input CMOS NOR gate (see Fig. 3): the output is high if, and only if, A = B = 0 ($U = \overline{A + B} = \bar{A} \cdot \bar{B}$). Assuming an open, or missing, connection in the n-channel A-input pull-down FET, the output values for every input condition are as summarised in Table 1, in which Z denotes a high-impedance output. In fact, for A = 1 and B = 0, none of the four FETs is conducting; thus there is no change in the voltage of the output node U. The capacitance associated with U retains the previous logic state, resulting in a high-impedance state. This fault can be modelled by an SA-op on the A-input line.

Fig. 3 Two-input CMOS NOR gate


A way to model this fault for fault simulation of the NOR gate is shown in Fig. 4.

The introduction of the SA-op fault is defined by the SA-op control input C. The latch represents the (storing) memory property (capacitance) of the output node. For C = 0, the output latches and stores the previous state; for C = 1, the output equals the input.

Note that the global model of a combinatorial element, i.e. the NOR gate, becomes a sequential circuit, leading to a more complex simulation.
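A behavioural sketch of the gate-plus-latch model follows. How the control input C is derived from the fault and the inputs is an assumption made here (C = 0 only when the fault is present and A = 1, B = 0, the one input condition that relies on the open transistor); the class and attribute names are hypothetical.

```python
class CmosNorWithSaOp:
    """Two-input NOR whose A-input pull-down FET may be open (SA-op)."""

    def __init__(self, sa_op=False, initial_output=0):
        self.sa_op = sa_op
        self.u = initial_output      # value held by the output node capacitance

    def evaluate(self, a, b):
        nor = 1 - (a | b)
        # Assumed derivation of the SA-op control input C.
        c = 0 if (self.sa_op and a == 1 and b == 0) else 1
        if c == 1:
            self.u = nor             # latch transparent: output equals the NOR
        return self.u                # C = 0: previous state is retained

faulty = CmosNorWithSaOp(sa_op=True)
first = faulty.evaluate(0, 0)        # sets the output node to 1
second = faulty.evaluate(1, 0)       # output keeps the stored 1; a good gate gives 0
```

Because the output now depends on the previous pattern, detecting the fault takes an ordered pair of patterns rather than a single one, which is the extra simulation cost noted above.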

A model of general validity for any kind of CMOS device, including the transmission gate and the tristate inverter, is suggested in Reference 15.

Table 1: Truth table for a faulty CMOS NOR gate

A  B  U
0  0  1
0  1  0
1  0  Z
1  1  0

It should be pointed out that SA-op faults are time sensitive owing to the leakage current time constant: this may suggest the use of a time-dependent model for SA-op faults.

Fig. 4 Fault model for an SA-op fault
[Diagram: the NOR gate to be simulated feeds a latch controlled by the SA-op control input C]

The SA-x model is useful for representing situations in which the logic value of a lead is not well defined owing to some failure, for example the output of a CMOS gate taking on a value intermediate between $V_{DD}$ and $V_{SS}$. However, in practice, the effectiveness of an SA-x model is often only apparent. Two drawbacks come to mind. First, even if the simulator propagates the fault to an output, an x output does not define whether the fault has been detected or not. Secondly, several simulators do not deal with x and x̄ states; thus circuit defects are not properly modelled.

2.3 Bridging (short) fault model

Although stuck-at fault models give satisfactory coverage of real faults, additional models are often added to achieve a more accurate fault diagnosis.

Bridging fault models correspond closely with actual physical faults having a significant probability of occurrence and are therefore widely used, adding a substantial degree of accuracy to the diagnostic process [16].

This model considers the set of possible shorts between pairs of nodes in the circuit, each short being assigned a logic value which is a technology-dependent function of the values of the two shorted nodes. Thus the introduction of 'additional gates' is usually required to properly model short-circuit faults, as shown in Fig. 5, where some examples of input conditions are given to illustrate the different properties of good and faulty circuits. In DTL, TTL and other logic types which use a positive supply voltage, owing to the device structure, a gate-to-gate short would cause a 'zero-dominant' fault (for positive logic circuits); i.e. the resultant value is the AND of the values of the two nodes. Conversely, for PMOS or ECL, the same short would be 'one-dominant'; i.e. the resultant value is the OR of the values of the two nodes.

For CMOS circuits, in the presence of a short the symmetrical nature of the FET structure of CMOS gates causes an intermediate voltage state between supply levels $V_{DD}$ and $V_{SS}$; hence an 'unknown' value is generated: it can be modelled by an SA-x fault [14].

The number of arbitrary shorts between any two nodes in a circuit with n nodes is n(n-1)/2: it is clearly prohibitive to consider all of them. For example, in a 500-node circuit we would have to consider approximately 1000 stuck-at faults and 124 750 short faults.
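The technology-dependent value of a short and the fault counts quoted above can be sketched as follows. The function name and the string `"x"` for the unknown value are assumptions; so is the choice to return the common value when both shorted CMOS nodes already agree.

```python
from math import comb

def bridged_value(v1, v2, technology):
    """Logic value seen on two shorted nodes, per the bridging model."""
    if technology in ("TTL", "DTL"):
        return v1 & v2                    # zero-dominant: AND of the two nodes
    if technology in ("PMOS", "ECL"):
        return v1 | v2                    # one-dominant: OR of the two nodes
    if technology == "CMOS":
        # Assumption: no conflict when both nodes agree, unknown otherwise.
        return v1 if v1 == v2 else "x"
    raise ValueError(f"unknown technology: {technology}")

nodes = 500
stuck_at_faults = 2 * nodes               # SA-0 and SA-1 per node
short_faults = comb(nodes, 2)             # n(n-1)/2 node pairs
```

With 500 nodes this gives 1000 stuck-at faults against 124 750 possible shorts, which is why the set of shorts has to be restricted.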

It is therefore necessary to limit the number of short-circuit faults considered to those that are most likely to occur in an actual circuit, typically as a result of the manufacturing and assembly processes.

The most common shorts are as follows:

(i) shorts between adjacent pins of the same IC package

(ii) shorts between closely spaced printed-circuit tracks, caused by the wave soldering process or mechanical mishandling, or bare wire fragments

(iii) shorts between wires due to either worn out or burned isolation

(iv) shorts between lines due to diffusion or metalisation failures.

Fig. 5 Example of a short-circuit fault
a Assuming 1 as the dominant value
b Assuming 0 as the dominant value


2.4 Functional fault models

Traditionally, stuck-at and bridging fault models are applied to a gate level description of devices. However, this can be cumbersome and both time and storage consuming.

There are two main reasons for introducing functional faults in the simulation of complex digital circuits:

(i) the inability to model some physical faults with stuck-at faults, for example a NAND gate which does not invert, thus resulting in an AND-equivalent operation

(ii) the necessity to provide fault models compatible with the functional description and simulation of a circuit; for instance, in a purely functional simulator the ability to insert internal stuck-at faults is lost.

It is worth noting that several studies suggest the use of a functional simulator to overcome the problems of future circuit complexity [17, 18].

Moreover, because of the increasingly high level of integration, physical defects at the manufacturing phase usually concern a large area of the chip, thus affecting the correct operation of the circuit at a level of complexity directly related to hardware functional blocks. These faults require functional fault models.

Fig. 6 Example of a sequential circuit and its state diagram
a Circuit diagram
b State diagram

It is well known that a sequential circuit can be regarded as an n-input, m-output, s-state finite-state machine, i.e. an (n, m, s)-machine. Usually an (n, m, s)-machine is represented (and simulated) by means of its transition diagram, with no regard for its actual implementation.

The functional fault model allows an (n, m, s)-machine (M) to be changed by a fault to an ($n_f$, $m_f$, $s_f$)-machine ($M_f$) having the same number of inputs ($n = n_f$) and outputs ($m = m_f$) but with different functional properties (in general, s differs from $s_f$, with $s > s_f$). In the most general case M and $M_f$ have two different state diagrams.

Let us consider a machine M (n = 1, m = 1, s = 4) having the state diagram shown in Fig. 6b. A possible implementation of this state diagram is given in Fig. 6a. If we consider an SA-1 fault on the node X, which can be considered as an SA-1 fault in the state variable $y_1$, the new faulty machine $M_{f1}$ ($n_f = 1$, $m_f = 1$, $s_f = 2$) is represented by the state diagram shown in Fig. 7a. Thus, from a functional point of view, the SA-1 fault changes the machine M into $M_{f1}$: the state diagram shown in Fig. 7a can be viewed as the functional model of the faulty machine.

Consider now the circuit shown in Fig. 6 and suppose that the inverter I does not invert because of some failure. This cannot be modelled by a stuck-at fault: in this case M is changed to $M_{f2}$ ($\neq M_{f1}$), whose state diagram is shown in Fig. 7b.

In general, for a given circuit many possible faulty-circuit models may be assumed at the functional level. In practice, the following two fault models applicable to functional elements are used [17]:

(i) the 'functional fault model'

(ii) the 'internal state fault model'.

Fig. 7 Modifications of the state diagram given in Fig. 6
a Due to an SA-1 at node X
b Due to a defect on inverter I

In the functional fault model, specific faults which modify the function of the device are defined by the person who develops the functional simulation code itself. In particular, during the simulation phase, a specific code segment is activated only if the effects of the related faults are to be considered in the actual simulation step.

This technique, used in the LAMP system [19], provides complete flexibility in the definition of fault effects, but it requires faults to be predefined. Furthermore, any addition of new faults may require a significant effort at the programming level, which can be expensive and time consuming.

The internal state fault model [17] is based on the assumption that a functional model usually contains internal state variables which keep track of the state of the circuit at any point of the simulation process. The fault insertion is performed by associating each internal state fault with an SA-1 or SA-0 fault in the corresponding internal state variable(s); this may very easily be done automatically. The example given previously (the circuit given in Fig. 6, SA-1 in variable $y_1$) is an application of this approach.

Although less general and flexible than the functional fault model, this technique has the advantage of providing an automatic fault analysis by simply considering SA-1 and SA-0 for each bit of each internal state variable. For example, in a functional RAM model, internal state faulting will consider each bit of the RAM's memory as SA-0 and SA-1.
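The RAM example can be sketched as a functional model whose only internal state is the memory array, with faults generated automatically as SA-0/SA-1 on each stored bit. The class, its methods and the fault encoding are assumptions made for this illustration.

```python
class FunctionalRam:
    """Functional RAM model with optional stuck bits in its internal state."""

    def __init__(self, words, width, stuck=None):
        self.mem = [0] * words
        self.width = width
        self.stuck = stuck or {}            # {(address, bit): 0 or 1}

    def write(self, address, value):
        self.mem[address] = value & ((1 << self.width) - 1)

    def read(self, address):
        value = self.mem[address]
        for (addr, bit), level in self.stuck.items():
            if addr == address:
                # Force the faulty bit to its stuck level on every read.
                value = value | (1 << bit) if level else value & ~(1 << bit)
        return value

def internal_state_faults(words, width):
    """Enumerate SA-0 and SA-1 for every bit of every word."""
    return [((a, b), v)
            for a in range(words) for b in range(width) for v in (0, 1)]

fault_list = internal_state_faults(4, 8)    # 4 words x 8 bits x 2 faults
ram = FunctionalRam(4, 8, stuck={(2, 0): 1})
ram.write(2, 0)
observed = ram.read(2)                      # bit 0 reads back as 1
```

The fault list is produced without any knowledge of the gate level structure, which is the advantage claimed for this model; the price is that faults outside the state variables are not represented.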

The main problem in functional fault modelling is the estimation of the coverage of real faults in terms of 'covered' gate level stuck-at faults. At present we do not have an analytical method to determine the coverage, but empirical data have been published by Szygenda [20], who states that a 100% test program on the functional level (using functional fault models) drops to 70-90% on the internal (SA-1, SA-0) gate level. A reduction of 50% or more in fault simulation time is usually obtained using functional models; for higher integration the coverage decreases.

2.5 Time-dependent fault model

Many real faults depend on time, in the sense that either their occurrence is a function of time (ageing) or they affect circuit parameters which define the timing characteristics of the device, such as propagation delay, hold and set-up times etc.

Owing to its complexity, no systematic and deterministic approach has been suggested to this problem. Consideration is therefore restricted to probabilistic or statistical analysis, as in the case of intermittent faults, i.e. faults whose appearance is random in time [21, 22].

Some work has been done on those manufacturing defects which simply modify the switching speed, leaving the logic function unchanged. Examples are provided by a monostable multivibrator (one-shot), in which a significant change in the time constant may prevent proper operation of the circuit, and by the omission of Schottky diodes in TTL technology, which leads to performance degradation in terms of speed. The majority of these faults are usually modelled by the so-called 'delay or dynamic fault model', in which the delay, or time constant, of a particular device is altered by a well defined amount.

The possibility of introducing delay faults directly depends on the timing model assumed for a device. Modelling a real AND gate as shown in Fig. 8 (i.e. by the so-called 'unit delay' [4]), a delay fault simply modifies the delay value Δ by a predefined amount so that

$$\Delta_f = \Delta + K$$

where K is a generic constant.

Note that the accuracy of any delay fault model can be only as good as that provided by the simulation process [23].

The difficulty both in automatically managing these faults and in reaching a satisfactory degree of accuracy means that time-dependent faults play a marginal role in present CAD systems.
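The delay fault relation given above (faulty delay = nominal delay + K) can be illustrated with a sampled-waveform sketch of an AND gate. The discrete time step, the waveforms and the sampling instant are assumptions chosen for the example.

```python
def delayed_and(a_wave, b_wave, delay, initial=0):
    """AND gate whose output at step t reflects the inputs at step t - delay."""
    out = []
    for t in range(len(a_wave)):
        src = t - delay
        out.append(a_wave[src] & b_wave[src] if src >= 0 else initial)
    return out

a = [0, 1, 1, 1, 1, 1]            # input A rises at step 1
b = [1, 1, 1, 1, 1, 1]

DELTA = 1                         # nominal delay
K = 2                             # extra delay introduced by the fault
good = delayed_and(a, b, DELTA)
faulty = delayed_and(a, b, DELTA + K)

sample_time = 3                   # hypothetical strobe instant
detected = good[sample_time] != faulty[sample_time]
```

The fault is only observed if the output is sampled between the nominal and the faulty transition, so detection depends on the timing resolution of the simulator as much as on the applied pattern.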

Fig. 8 Delay model for an AND gate

3 Fault models for microprocessors and regular logic

The approaches examined in preceding Sections are well suited to the problem of testing random logic, but the recent trend in the LSI/VLSI industry is towards the use of particular classes of devices such as microprocessors, random access memories and regular logic. For these, classical fault models play a very small part in assuring satisfactory performance in test pattern generation. Hence, in past years, new functional fault models have been investigated for each class of device to take into account both their general hardware structure and typical failure modes. In the following Sections the main results for microprocessors, RAMs and PLAs are presented.

3.1 Functional fault models for microprocessors

An approach to functional testing of microprocessors has been proposed by Robach and Saucier [24, 25]. They observed that every digital system may be decomposed into a set of functional units and a control unit.

The diagnosis of the control unit should be performed through the 'controlled' functional units (assumed fault-free), which can be tested later using classical algorithms. The control unit is described as a Moore automaton, with states and commands, rather than gates and flip-flops, assumed as primitive elements, and faults are modelled as changes in the structure of the automaton. The test sequence for the control unit verifies the distinguishability of every state from all others, the correctness of commands generated in every state, and the transition function, which computes the next state on the basis of both a finite set of signals issued from the functional units and the present state.

The test sequence can be subdivided into three parts:

(i) a rapid test sequence or 'partial state identification procedure' that covers all the control states by a minimal set of paths

(ii) a complementary test for the complete state identification which, for each state, tests the complete set of commands provided by the control unit to each functional unit

(iii) a complementary sequencing test which achieves the complete sequencing verification by testing all possible paths in the state diagram.

A second approach has been proposed by Abraham and Thatte [26], who developed a general graph-theoretic model (the S-graph) for describing the microprocessor organisation and the instruction set at the register transfer level. Each register is represented by a node in the S-graph, and the main memory and the I/O devices are represented by two additional nodes named 'IN' and 'OUT'. Data flows among registers, memories and I/O devices are represented by paths connecting respective nodes. Each machine instruction, being a set of data flows, is consequently represented by a set of paths. Each node is then labelled with the minimum number of instructions needed to transfer its contents to the 'OUT' node, and each path set is labelled with the label of the destination register plus one. This operation provides, for each node, a figure of its observability and controllability, i.e. of the 'difficulty' encountered in the test pattern generation phase.
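The node labelling step can be sketched as a breadth-first search backwards from the 'OUT' node. The register names and data flows below describe a hypothetical machine, each flow is assumed to cost one instruction, and giving 'OUT' itself the label 0 is a convention adopted here; none of this is taken from Reference 26.

```python
from collections import deque

def label_nodes(flows):
    """Label each node with the minimum number of flows needed to reach OUT.

    `flows` is an iterable of (source, destination) pairs.
    """
    preds = {}
    for src, dst in flows:
        preds.setdefault(dst, set()).add(src)

    labels = {"OUT": 0}
    queue = deque(["OUT"])
    while queue:
        node = queue.popleft()
        for p in preds.get(node, ()):
            if p not in labels:            # first visit gives the shortest distance
                labels[p] = labels[node] + 1
                queue.append(p)
    return labels

# Hypothetical data flows of a small accumulator machine.
flows = [("IN", "ACC"), ("ACC", "OUT"), ("ACC", "R1"),
         ("R1", "ACC"), ("R1", "SP"), ("SP", "R1")]
labels = label_nodes(flows)
```

In this example `ACC` gets label 1, `R1` label 2 and `SP` label 3: the larger the label, the more instructions a test needs before the register's contents become observable.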

Functional level fault models, capable of describing faulty behaviour at a high level, independently of implementation details, have been identified for the following five main functions:

(i) Register decoding function: Whenever a register is to be accessed:

(a) no register is accessed
(b) a set of registers is accessed.

(ii) Instruction decoding and control function: Whenever an instruction is to be executed:

(a) a different instruction is executed
(b) in addition to the right instruction a wrong instruction is executed
(c) no instruction is executed.

(iii) Data storage function: Any cell of any register may be SA-0 or SA-1; this fault may occur for any number of cells and for any number of registers.

(iv) Data transfer function: Any line of a transfer path may be SA-0 or SA-1, and two lines of the same path may be shorted.

(v) Data manipulation function: This refers to the faults in the functional units, which are assumed to be testable by classical approaches.

The previously described S-graph and fault models directly feed the procedures for the automatic test pattern generation. The generated test sequences consist of valid machine instructions, which can be assembled to produce the test program.

Abraham and Thatte divided the test sequences into two groups, the first containing the sequences for testing register decoding, data storage, data transfer and data manipulation functions, and the second containing only the sequence for testing instruction decoding functions, which is the dominant part of the test sequences. For an 8-bit Hewlett-Packard microprocessor the lengths of the two sequences were about 1K and 8K instructions, respectively. The coverage of single stuck-at faults was of the order of 90% for the first sequence and about 96% when both sequences were applied.

Special-purpose functional fault models have also been developed for bit-slice microprocessors. Sridhar and Hayes [27] have shown that bit-slice microprocessors may be modelled as iterative logic arrays (called 'C-testable' [28]), which require a constant number of tests independently of the array size. A formal C-model has been defined for a 1-bit microprocessor, including most of the features of commercially available bit-slice microprocessors. For testing purposes, C is treated as a network of small modules, described at the behavioural level, with no regard for their internal structure. These modules are either combinatorial or sequential. The combinatorial modules are tested with an exhaustive test sequence of 2^n patterns (where n is the number of inputs to the combinatorial network), while the approach proposed by Friedman and Menon [29] is adopted for the sequential modules. This approach is found to yield minimal, or near-minimal, test sequences in the case of small Moore-type sequential machines, such as the ones considered here.

3.2 Fault models for random access memories

The problem of testing random access memories (RAMs) has been extensively studied in past years: several fault models and test pattern generation algorithms have been proposed, but this problem is far from completely solved. In fact, owing to developments in fabrication technology, memory chip dimensions are becoming larger and larger: 64 Kbit dynamic RAMs are available and 256 Kbit RAMs will be announced in 1983. Testing algorithms must be able to deal with these dimensions in an acceptable period of time. It is therefore very important to know the complexity of each algorithm, i.e. the number of operations needed as a function of the number n of memory cells.

Table 2: Test times for RAMs

Pattern   Test time    Actual time
                       1K      4K       16K     64K      256K
GALPAT    4n²c         2 s     33 s     536 s   2.4 h    38 h
GALCOL    3n^(3/2)c    49 ms   393 ms   3 s     25 s     201 s
MARCH     10nc         5 ms    20 ms    80 ms   320 ms   2.4 s

n = number of word locations
c = cycle time = 500 ns

The time needed to test the memory can be computed by multiplying the number of operations (clock cycles) by the clock period. Table 2 shows some results for O(n²), O(n^(3/2)) and O(n) algorithms: it is evident that for RAMs larger than 4K the only suitable algorithms are those characterised by a linear complexity O(n).

Old algorithms [30] are mostly characterised by a lack of fault modelling capabilities. For instance, GALPAT, an $O(n^2)$ algorithm, does not test a 1-0 transition in a cell caused by a 0-1 transition made in another cell [31]; it also has a prohibitive complexity. MARCH, an $O(n)$ algorithm, has a limited fault coverage. For this reason, the fault models proposed for these old algorithms (for example GALPAT, GALCOL, WALKPAT, MARCH, MASEST, DIAPAT etc.) will not be discussed here: we shall simply consider the algorithms developed in the last four to five years by three groups of authors [31-34]. To cover the faults in their models, these authors proposed quasi-optimal algorithms with a complexity in the range $4n$ to $200n$.

For the purpose of fault modelling, a memory may be subdivided into three functional blocks [31]:

(i) the memory cell array

(ii) the decoder logic

(iii) the READ/WRITE logic.

Nair, Thatte and Abraham proved [31] that failures in the decoder and READ/WRITE logic are equivalent to failures in the memory cell array; therefore, in order to test a RAM, only faults in the memory cell array need be considered. The basic fault hypotheses that they made are as follows:

(i) One or more cells are SA-0 or SA-1.

(ii) There exist one or more pairs of cells which are coupled; i.e. a transition from $x$ to $y$ (with $x, y \in \{0, 1\}$ and $y = \bar{x}$) in one cell of the pair, say cell i, changes the state of the other cell, say cell j, from $x$ to $y$ or from $y$ to $x$.
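The two hypotheses can be made concrete with a small behavioural model of a faulty memory. This is a sketch under stated assumptions: the class name, its interface and the decision to let a stuck cell ignore coupling are choices made here for illustration, not part of the model in Reference 31.

```python
class FaultyRAM:
    """Bit-oriented RAM with optional stuck-at and coupling faults."""

    def __init__(self, n, stuck=None, couplings=None):
        self.cells = [0] * n
        self.stuck = dict(stuck or {})          # cell -> stuck value (0 or 1)
        self.couplings = list(couplings or [])  # (i, (x, y), j): x->y in i flips j
        for cell, value in self.stuck.items():
            self.cells[cell] = value

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        if addr in self.stuck:
            return  # hypothesis (i): a stuck cell ignores every write
        old = self.cells[addr]
        self.cells[addr] = value
        if old == value:
            return  # no transition, so no coupling effect
        for i, transition, j in self.couplings:
            # hypothesis (ii): the transition in cell i changes cell j
            if i == addr and transition == (old, value) and j not in self.stuck:
                self.cells[j] ^= 1


# Cell 3 is SA-1; a 0-to-1 transition in cell 0 changes the state of cell 2.
ram = FaultyRAM(4, stuck={3: 1}, couplings=[(0, (0, 1), 2)])
ram.write(0, 1)
ram.write(3, 0)
```

After these two writes, reading cell 2 returns 1 although it was never written, and reading cell 3 still returns 1; a test sequence detects the faults only if it performs such reads after the sensitising writes.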

They proposed a test sequence which has a complexity of $30n$ and which can detect all the faults in the stated fault model. This has been refined by Marinescu [32], who substituted hypothesis (ii) of the previous model with the following definitions (where $\uparrow$ and $\downarrow$ denote $0 \rightarrow 1$ and $1 \rightarrow 0$ transitions, respectively):

(i) A transition $\uparrow$ in cell i has an idempotent influence on cell j if the transition $\uparrow$ in i has an influence on j only when j is in a fixed state $u$. If cell j is in state $\bar{u}$, the transition $\uparrow$ in cell i has no influence on j. An analogous definition exists for an idempotent influence caused by a transition $\downarrow$.

(ii) A transition $\uparrow$ in cell i has an inversion influence on cell j if the transition $\uparrow$ in i changes the state of j independently of its previous state. An analogous definition exists for an inversion influence caused by a transition $\downarrow$.

An influence is either idempotent or inversion.

Marinescu further stated [32] that the $30n$ algorithm proposed in Reference 31 detects the stuck-at and the idempotent influence faults. He also proposed three different algorithms of complexity $15n$, $17n$ and $11n$ with different fault coverages.
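The difference between the two kinds of influence can be illustrated by modelling the effect of one influencing transition on the state of the influenced cell. The function name and its arguments are assumptions introduced for this sketch.

```python
def influence(kind, state_j, u=0):
    """State of cell j after one influencing transition in cell i.

    kind == "idempotent": j changes only if it is in the fixed state u.
    kind == "inversion":  j is complemented whatever its previous state.
    """
    if kind == "idempotent":
        return 1 - state_j if state_j == u else state_j
    if kind == "inversion":
        return 1 - state_j
    raise ValueError(f"unknown influence kind: {kind}")


# Effect of two successive influencing transitions, for each initial state.
twice_idempotent = [influence("idempotent", influence("idempotent", s)) for s in (0, 1)]
twice_inversion = [influence("inversion", influence("inversion", s)) for s in (0, 1)]
```

Applying an idempotent influence twice has the same effect as applying it once, whereas two inversion influences cancel each other. A test that reads the influenced cell only after an even number of influencing transitions may therefore miss an inversion influence.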

Different fault hypotheses have been proposed by Suk and Reddy [33, 34], who developed a model based on the concept of the 'neighbourhood' of a memory cell, introduced by Hayes [35]. They observed that memory cells always have two-dimensional array organisations and defined two neighbourhood types:

(i) A type-1 neighbourhood is a set of five memory cells comprising the base cell and four adjacent neighbour cells located at its top, bottom, left and right, as shown in Fig. 9A.

(ii) A type-2 neighbourhood is a set of memory cells within $m_1$ columns to the left, $m_2$ rows to the top, $m_3$ columns to the right and $m_4$ rows to the bottom of the base cell, as shown in Fig. 9B.
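The two neighbourhood types can be enumerated for a cell at a given (row, column) position as sketched below. The function names are assumptions, as are the decisions to clip the neighbourhood at the array boundary and to include the base cell in the type-2 set.

```python
def type1_neighbourhood(row, col, rows, cols):
    """Base cell plus the cells directly above, below, left and right of it."""
    candidates = [
        (row, col),
        (row - 1, col),
        (row + 1, col),
        (row, col - 1),
        (row, col + 1),
    ]
    # Assumption: cells that would fall outside the array are dropped.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def type2_neighbourhood(row, col, rows, cols, m1, m2, m3, m4):
    """Cells within m1 columns left, m2 rows up, m3 columns right, m4 rows down."""
    return [
        (r, c)
        for r in range(max(0, row - m2), min(rows, row + m4 + 1))
        for c in range(max(0, col - m1), min(cols, col + m3 + 1))
    ]


inner = type1_neighbourhood(2, 2, rows=5, cols=5)
corner = type1_neighbourhood(0, 0, rows=5, cols=5)
block = type2_neighbourhood(2, 2, 5, 5, m1=1, m2=1, m3=1, m4=1)
```

For an interior cell the type-1 neighbourhood has five cells, and a type-2 neighbourhood with all four margins equal to 1 is the 3 x 3 block centred on the base cell.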

Two kinds of fault are considered: active neighbourhood pattern sensitive faults (ANPSFs) and passive neighbourhood pattern sensitive faults (PNPSFs).

In ANPSFs the base cell may be influenced by a transition in one of the cells of the neighbourhood, for a fixed value of the other cells of the neighbourhood. This influence may only occur for a fixed previous value of the base cell and only for a transition $\uparrow$ (or $\downarrow$) of the written cell.

Fig. 9A Illustration of a type-1 neighbourhood

Fig. 9B Illustration of a type-2 neighbourhood

In PNPSFs a transition ($\uparrow$ or $\downarrow$) in the base cell cannot occur when writing the base cell, for a fixed value of the other cells of the neighbourhood.

Suk and Reddy proposed several quasi-optimal algorithms in References 33 and 34. Courtois [36] proposed a unified model of fault hypotheses that includes the functional fault models discussed earlier. It must be mentioned that, although it is possible to develop new variants of these quasi-optimal algorithms, it is impossible to say how much more representative of real failures they would be. Moreover, in most cases, the complexity of the software procedures that generate the test pattern is not given, and so the quasi-optimality may be paid for by very complex software procedures.

3.3 Fault models for regular logic

In the preceding Sections we have examined fault models for random access memories that may be seen as sequential regular logic. In this Section we shall deal with combinatorial regular logic and, in particular, with programmable logic arrays (PLAs). The PLA logic structure is shown in Fig. 10.

Fig. 10 PLA logic structure (labels: inputs, AND array, product terms, OR array, outputs)

It consists of two arrays that, in series, perform any Boolean logic operation. The AND array and the OR array may both contain NAND or NOR gates implementing sum-of-products (SP) or product-of-sums (PS) functions; other combinations are also possible, if less common. Fig. 11 shows a possible PLA implementation using a NOR-NOR structure [37]. The Boolean logic function can easily be modified by inserting a MOS transistor into, or removing one from, a crosspoint without changing the general structure of the PLA. For this reason PLAs provide logic designers with an economical way of realising combinatorial logic functions: the economy is achieved either when PLAs are a part of an LSI/VLSI chip or when the PLA is itself a chip. In both cases, PLAs must be tested to ensure that they operate correctly.

Fig. 11 Possible PLA implementation

A logic fault model has been proposed by Smith [38]. It consists of testing for incorrect logic connections in the AND and OR array crosspoints. Since these are usually made using a device (typically a transistor or transistor plus fuse), faults may be modelled as:

(i) Missing device: An expected connection is not present.

(ii) Extra device: An unexpected connection is present.

Growth and disappearance faults belong to the former class; shrinkage and appearance faults to the latter. Let us examine these faults in detail:

(i) Growth faults: A gate input in the AND array is disconnected from an input variable. Because every gate in the AND array is an implicant of the Boolean function implemented by the PLA, the disconnection causes the implicant to 'grow' since it becomes independent of one input variable.

(ii) Disappearance faults: A gate input in the OR array is disconnected from a product line. The disconnection causes an implicant to disappear from an output variable.

(iii) Shrinkage faults: A gate input in the AND array becomes incorrectly connected to an input variable. The connection causes the implicant to 'shrink'.

(iv) Appearance faults: A gate input in the OR array becomes incorrectly connected to a product line. The connection causes an implicant to appear on an output variable.
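The four crosspoint fault classes can be illustrated on a simple behavioural PLA model. The data representation (one dictionary of literals per product term, one set of product-term indices per output), the function names and the example PLA are assumptions made for this sketch; they are not taken from Reference 38.

```python
def evaluate(and_array, or_array, inputs):
    """Return the PLA outputs for one input vector."""
    products = [
        all(inputs[var] == val for var, val in term.items())
        for term in and_array
    ]
    return [int(any(products[p] for p in terms)) for terms in or_array]


def inject(and_array, or_array, fault):
    """Return a faulty copy of the PLA; the original arrays are not modified."""
    a = [dict(term) for term in and_array]
    o = [set(terms) for terms in or_array]
    kind = fault[0]
    if kind == "growth":            # literal disconnected from a product term
        _, term, var = fault
        del a[term][var]
    elif kind == "shrinkage":       # extra literal connected to a product term
        _, term, var, val = fault
        a[term][var] = val
    elif kind == "disappearance":   # product term disconnected from an output
        _, out, term = fault
        o[out].discard(term)
    elif kind == "appearance":      # product term wrongly connected to an output
        _, out, term = fault
        o[out].add(term)
    else:
        raise ValueError(kind)
    return a, o


# Hypothetical PLA: f0 = x0.x1 + not(x2), f1 = not(x2)
AND_ARRAY = [{0: 1, 1: 1}, {2: 0}]
OR_ARRAY = [{0, 1}, {1}]

grown = inject(AND_ARRAY, OR_ARRAY, ("growth", 0, 1))  # term 0 becomes x0
vector = (1, 0, 1)
good_out = evaluate(AND_ARRAY, OR_ARRAY, vector)
bad_out = evaluate(*grown, vector)
```

The input vector (1, 0, 1) lies inside the grown implicant but outside the original one, so the two PLAs give different values on f0 and the vector is a test for this growth fault. The dictionary representation cannot express a product line connected to both a variable and its complement, which is acceptable for a sketch restricted to single faults on free crosspoints.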

The fault model assumes that only one failure is present at a time. This assumption is not very restrictive since it may be proved [38] that important classes of multiple faults are detected by any single-fault test set.

The regularity in the PLA structure allows the efficient use of fault collapsing techniques based on fault dominance rules. For instance, if a product line becomes incorrectly connected to both an input variable and its complement, there is a shrinkage fault that dominates the disappearance of the implicant on the output variables, and therefore it is not necessary to model shrinkage faults when there already exists a connection between a product line and a variable. Moreover, Smith [38] proved the following theorem.

Theorem

A test set that detects all detectable growth, disappearance, shrinkage and appearance faults also detects all the following detectable stuck-at faults:

(i) input inverters SA-1 and SA-0

(ii) AND gate inputs and outputs SA-1 and SA-0

(iii) OR gate inputs and outputs SA-1 and SA-0

(iv) output inverters SA-1 and SA-0

except in PLAs where either of the following conditions holds:

(i) Some AND gates are redundant, i.e. they can be deleted from the array without affecting the functions realised.

(ii) There exists an OR gate output which is normally 1 whenever any AND gate output in the array is 1.

The fault model formulated earlier is very useful in test pattern generation: it allows one to generate test sequences taking into account only the Boolean function and not the PLA internal structure. This implies the possibility of generating in an automatic way relatively small complete sets of tests.
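As an illustration of test generation that uses only Boolean functions, the sketch below selects a small test set by greedy covering. This is not the procedure of Reference 38, which is not reproduced here; greedy covering is a stand-in technique, exhaustive enumeration of input vectors limits it to small functions, and all names and the example functions are assumptions.

```python
from itertools import product


def greedy_test_set(good, faulty_functions, n_inputs):
    """Pick input vectors until every detectable fault is detected."""
    vectors = list(product((0, 1), repeat=n_inputs))
    # For each vector, the set of faults whose output differs from the good one.
    detects = {
        v: {k for k, f in enumerate(faulty_functions) if f(*v) != good(*v)}
        for v in vectors
    }
    remaining = set().union(*detects.values())  # detectable faults only
    tests = []
    while remaining:
        best = max(vectors, key=lambda v: len(detects[v] & remaining))
        tests.append(best)
        remaining -= detects[best]
    return tests


# Hypothetical function f = a.b + not(c) and three single-fault versions.
def good(a, b, c):
    return (a & b) | (1 - c)


faults = [
    lambda a, b, c: a | (1 - c),              # growth: a.b becomes a
    lambda a, b, c: 1 - c,                    # disappearance of a.b
    lambda a, b, c: (a & b) | (a & (1 - c)),  # shrinkage: not(c) becomes a.not(c)
]

tests = greedy_test_set(good, faults, 3)
```

Each fault is described purely by the Boolean function it produces, so no knowledge of the PLA layout is needed; the greedy choice gives a small test set but not necessarily a minimal one.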

4 Conclusion

In this paper the problem of defining efficient fault models for LSI/VLSI devices has been considered, and different fault models for random logic, microprocessors, RAMs and PLAs have been presented. We have also noted that the inadequacy of present models is one of the main bottlenecks preventing a wider use of CAD systems in automatic test pattern generation, and that it is essential to develop new functional fault models to cope with increasing device complexity.

5 Acknowledgments

The present work has been partially supported by Olivetti Tecnost SpA, Ivrea (TO), Italy, and the authors wish to thank Dr. Francesco Olla for his helpful suggestions and contributions.

6 References

1 SHACKIL, A.F.: 'Microprocessors', IEEE Spectrum, 1982, 19, (1), pp. 32-33

2 CHALKLEY, M.G.: 'Trends in VLSI testing'. Proceedings of IEEE test conference, Cherry Hill, NJ, USA, 1979, pp. 3-6

3 VAN CLEEMPUT, W.H.: 'A structured design automation environment for digital systems'. COMPCON 78, San Francisco, USA, 1978, pp. 139-142

4 BREUER, M.H., and FRIEDMAN, A.D.: 'Diagnosis and reliable design of digital systems' (Computer Science Press, 1976)

5 'CAD for electronics'. EEC report, 1978

6 'Simulation'. Italian Research Council report, 1980

7 WILLIAMS, T.W.: 'Testing logic networks and designing for testability', Comput. J., 1979, 10, pp. 9-21

8 SHERTZ, D.R.: 'On the representation of digital faults'. Coordinated Science Laboratory report, University of Illinois, Illinois, USA, 1969

9 GOUNDAN, A., and HAYES, J.P.: 'Identification of equivalent faults in logic networks', IEEE Trans., 1981, C-30, pp. 978-986

10 TO, K.: 'Fault folding for irredundant combinational circuits', ibid., 1973, C-22, pp. 1008-1015

11 MUEHLDORF, E.C., and SAVKAR, A.D.: 'LSI logic testing - An overview', ibid., 1981, C-30, pp. 1-16

12 BOSSEN, D.C., and HONG, S.J.: 'Cause and effect analysis for multiple fault detection in combinatorial networks', ibid., 1971, C-20, pp. 1252-1257

13 LO, C.: 'Probabilistic analysis of faults internal to combinatorial blocks as mapped onto pin faults'. Master's thesis, University of Texas, USA, 1978

14 WADSACK, R.L.: 'Technology dependent logic faults'. COMPCON 78, San Francisco, USA, 1978, pp. 124-129

15 WADSACK, R.L.: 'Fault modelling and logic simulation of CMOS and MOS integrated circuits', Bell Syst. Tech. J., 1978, 57, pp. 1449-1475

16 MEI, K.C.Y.: 'Bridging and stuck-at faults', IEEE Trans., 1974, C-23, pp. 720-727

17 ABRAMOVICI, M., BREUER, M., and KUMAR, P.: 'Concurrent fault simulation and functional level modelling'. 14th design automation conference, 1977

18 MENON, P.R., and CHAPPELL, S.G.: 'Deductive fault simulation with functional blocks', IEEE Trans., 1979, C-27, pp. 690

19 CHAPPELL, S., MENON, P., PELLEGRIN, J., and SCHOWE, A.: 'Functional simulation in the LAMP system'. 13th design automation conference, San Francisco, USA, 1976 (& J. Des. Autom. & Fault Tolerant Comput., 1977, 1, pp. 203-215)

20 SZYGENDA, S.A., and LEKKOS, A.A.: 'Integrated techniques for functional and gate-level digital logic simulation'. Proceedings 10th design automation conference, Portland, OR, USA, Jun. 1973, pp. 159-172

21 KOREN, I., and KOHAVI, Z.: 'Diagnosis of intermittent faults in combinatorial networks', IEEE Trans., 1977, C-26, pp. 1154-1158

22 MALLELA, S., and MASSON, G.M.: 'Diagnosable system for intermittent faults', ibid., 1978, C-27, pp. 560-566

23 SZYGENDA, S.A.: 'Element level simulation for design verification and diagnosis'. CRES course report, 1980

24 ROBACH, C., and SAUCIER, G.: 'Diversified test methods for local control units', IEEE Trans., 1975, C-24, pp. 562-567

25 ROBACH, C., and SAUCIER, G.: 'Dynamic testing of control units', ibid., 1978, C-27, pp. 617-623

26 THATTE, S.M., and ABRAHAM, J.A.: 'Test generation for microprocessors', ibid., 1980, C-29, pp. 429-441

27 SRIDHAR, T., and HAYES, J.P.: 'A functional approach to testing bit-sliced microprocessors', ibid., 1981, C-30, pp. 563-571

28 FRIEDMAN, A.D.: 'Easily testable iterative systems', ibid., 1973, C-22, pp. 1061-1064

29 FRIEDMAN, A.D., and MENON, P.R.: 'Fault detection in digital circuits' (Prentice-Hall, 1971)

30 BARRACLOUGH, W., CHIANG, A.C.L., and SOHL, W.: 'Techniques for testing the microcomputer family', Proc. IEEE, 1976, 64, pp. 943-950

31 NAIR, R., THATTE, S.M., and ABRAHAM, J.A.: 'Efficient algorithms for testing semiconductor random access memories', IEEE Trans., 1978, C-27, pp. 572-576

32 MARINESCU, M.: 'Simple and efficient algorithms for functional RAM testing'. IEEE test conference, Philadelphia, USA, Oct. 1982

33 SUK, D.S., and REDDY, S.M.: 'Test procedures for a class of pattern sensitive faults in semiconductor random access memories', IEEE Trans., 1980, C-29, pp. 419-429

34 SUK, D.S., and REDDY, S.M.: 'A march test for functional faults in semiconductor random access memories', ibid., 1981, C-30

35 HAYES, J.P.: 'Testing memories for single cell pattern sensitive faults', ibid., 1980, C-29, pp. 249-254

36 COURTOIS, B.: 'Functional RAM testing: A unified model'. Internal Report 235, IMAG, Grenoble, France, Feb. 1981

37 MEAD, C., and CONWAY, L.: 'Introduction to VLSI systems' (Addison Wesley, 1980)

38 SMITH, J.E.: 'Detection of faults in programmable logic arrays', IEEE Trans., 1979, C-28, pp. 845-853


