
Fault modeling and diagnosis for nanometric analog circuits

Ke Huang∗, Haralampos-G. Stratigopoulos†, and Salvador Mir†

∗Department of Electrical Engineering, The University of Texas at Dallas, Richardson, TX 75080, USA

†TIMA Laboratory (CNRS-INP Grenoble-UJF), 46 Av. Félix Viallet, 38031 Grenoble, France

Abstract—Fault diagnosis of Integrated Circuits (ICs) has grown into a special field of interest in the Semiconductor Industry. Fault diagnosis is very useful at the design stage for debugging purposes, at high-volume manufacturing for obtaining feedback about the underlying fault mechanisms and improving the design and layout in future IC generations, and in cases where the IC is part of a larger safety-critical system (e.g. automotive, aerospace) for identifying the root-cause of failure and for applying corrective actions that will prevent failure re-occurrence and, thereby, will expand the safety features. In this summary paper, we present a methodology for fault modeling and fault diagnosis of analog circuits based on machine learning. A defect filter is used to recognize the type of fault (parametric or catastrophic), inverse regression functions are used to locate and predict the values of parametric faults, and multi-class classifiers are used to list catastrophic faults according to their likelihood of occurrence. The methodology is demonstrated on both simulation and high-volume manufacturing data, showing an excellent overall diagnosis rate.

I. INTRODUCTION

An IC is tested several times during its lifetime. A first set of tests is performed at wafer-level before packaging, in order to identify gross instabilities in the manufacturing process. Final module tests are performed after packaging and aim to verify that the actual design specifications of the IC are met. Depending on the end-user application, ICs may also go through burn-in tests, where they are exercised sufficiently long under stress conditions, in order to avoid early in-use system failures. Finally, ICs that are deployed in safety-critical and mission-critical applications need to be tested during their normal operation in idle times or even concurrently. In many cases, whenever an IC fails a test, it is important to diagnose the source of failure.

At the design stage, diagnosing the sources of failures in the first prototypes helps to reduce design iterations and to meet the time-to-market goal. Failures at this stage are related to incomplete simulation models and to the aggressive design techniques adopted to squeeze the maximum performance out of the current technology. Especially for analog circuits, failures at this stage are very common due to the lack of reliable design automation tools.

This paper is an invited summary paper on the Ph.D. dissertation work of Ke Huang [1] that was carried out at TIMA Laboratory (CNRS - Grenoble INP - UJF), Grenoble, France, from October 2008 to November 2011, under the supervision of Haralampos-G. Stratigopoulos and Salvador Mir. Parts of this summary paper have been previously published in [2]–[4]. This summary paper was invited in the framework of the finals of the 2013 IEEE Computer Society Test Technology Technical Council (TTTC) doctoral thesis award competition that took place at the 2013 IEEE International Test Conference, Anaheim, CA, USA.

In a high-volume production environment, diagnosing the sources of failures can assist the designers in gathering valuable information regarding the underlying failure mechanisms. The objective here is to make use of the diagnosis results to enhance yield for future products through improvement of the manufacturing environment and development of design techniques that minimize the failure rate.

Diagnosis is also of vital importance in cases where the IC is part of a larger system that is safety-critical, for example, a system that is deployed in automotive, aerospace, or biomedical applications. During its lifetime, an IC might fail due to aging, wear-and-tear, harsh environments, overuse, or due to defects that are not detected by the production tests and manifest themselves later in the field of operation. Here, it is important to identify the root-cause of failure so as to repair the system if possible, gain insight about environmental conditions that can jeopardize the system’s health, and apply corrective actions that will prevent failure re-occurrence and, thereby, will expand the safety features.

Fault diagnosis is a severe challenge nowadays that calls for immediate solutions. According to anecdotal evidence [5], 35% of car failures are due to the embedded electronics, of which only 60% are diagnosed, the rest being classified as “trouble not found”. Failure analysis (FA) of defective ICs is traditionally performed using light-emission, laser probing, picosecond imaging, etc. All these methods consist of observing failures through their optical characteristics. However, with the increasing reduction in feature sizes and the high complexity of modern ICs, the time and cost required for applying these methods have become prohibitive. To this end, there is a pressing need for an alternative diagnosis approach. The aim is to develop a low-cost approach that is able to determine the root cause of failure, or to appropriately guide the aforementioned classical FA methods and reduce the required time-to-diagnose.

An alternative solution is to develop test vectors to de-embed the values of components in IC blocks, a form of reverse engineering. The challenges with this approach include the limited controllability and observability of internal IC blocks and the defect ambiguity (different defects having the same influence on the IC behavior), which does not permit case-based reasoning. Furthermore, it is difficult to deal with unanticipated defects and with the limited available diagnostic information (typically only one or a few IC samples showing the same erroneous behavior are available).

In this work, we propose a course of action for fault diagnosis in analog ICs. We first employ a defect filter that recognizes the type of fault that has occurred, e.g. catastrophic or parametric, based on the diagnostic measurements. The defect filter revokes any assumptions about the type of fault that has occurred and offers a unified catastrophic/parametric fault diagnosis approach. Thereafter, if the type of fault is parametric, parametric fault diagnosis is achieved down to the transistor level, i.e. conclusions can be made as to which circuit branch, transistor or passive component is affected, using inverse regression functions. The inverse regression functions allow us to infer the value of several circuit parameters of interest based on the diagnostic measurements. On the other hand, if the type of fault is catastrophic, catastrophic fault diagnosis is achieved by multi-class classifiers trained in the space of diagnostic measurements. The classifiers list the most probable defects according to their probability of occurrence and, thereby, guide and accelerate the classical FA methods.

Paper PTF3 · 978-1-4799-0859-2/13/$31.00 © 2013 IEEE · INTERNATIONAL TEST CONFERENCE

The rest of the paper is structured as follows. In Section II, we discuss the failure mechanisms in ICs and we provide a brief survey of the alternative diagnosis approaches proposed in the literature to date. Section III describes our approach to answering the principal fault diagnosis questions. In Section IV, we demonstrate the methodology on both simulation and high-volume manufacturing data. Finally, Section V concludes the paper.

II. FAILURE MECHANISMS IN ICS AND DIAGNOSIS APPROACHES

ICs can fail due to large global process variations which affect complete regions of a wafer. Typically, such variations occur in immature technologies and are due to instabilities in the process conditions and materials, mask misalignment, etc. Large global process variations are not considered in a diagnosis context, since they can be readily detected at wafer-level by supply current tests or by process control monitors in the scribe lines.

ICs can also fail due to normal global process variations that affect all identical components in the same die in the same way, e.g., making the transistor gate lengths on the same die all larger or all smaller. This type of failure mechanism is often referred to as inter-die variation.

IC failure is also ascribed to local process variations, which affect the components across the die independently. In general, such variations can lead to deviations of process-related parameters in individual components, but they do not alter the circuit’s topology. Examples of local process variations include local geometrical deformations, such as variations in the effective channel length Leff and width Weff of MOS devices, doping concentration variations due to the non-uniformity of the dopant ion density distribution, etc. In addition, local process variations can be induced in the field, depending on the operating conditions, due to aging phenomena such as Negative Bias Temperature Instability (NBTI), Hot Carrier Injection (HCI), electromigration, etc. This type of failure mechanism is often referred to as intra-die variation. Failures due to inter- and/or intra-die variation are often considered as parametric faults. Techniques for diagnosing parametric faults include explicit nonlinear equations [6], [7], sensitivity analysis [8], [9] and regression functions [2], [10].

Another failure mechanism is local spot defects. These are particles or contamination which can occur in any manufacturing step. They can take the form of missing or extra material and they are often modeled by open- and short-circuits. Since they lead to a modification of the circuit’s topology, they are often considered as catastrophic faults. Spot defects have long been recognized as the main root cause of IC failures [11], [12], but with the advent of short-channel technologies, parametric faults have become a significant source of yield loss. As shown in [13], [14], spot defects can have a finite resistance value for open-circuits and a non-negligible resistance value for short-circuits.

Numerous methods have been proposed to diagnose spot defects. The most well-known approach is the fault dictionary approach. It requires the a priori definition of a list of defects and their locations, which can be obtained from historical defect data and an Inductive Fault Analysis (IFA). Diagnosis consists of assigning a defect in the dictionary to the Device Under Test (DUT). This is in essence a pattern recognition approach, which can be solved in a deterministic way using, for example, k-nearest neighbors [10], supervised neural networks [15], unsupervised neural networks [16], support vector machines [2], etc. It can also be solved in a probabilistic way to address the fault ambiguities [3], [17], [18].

III. PROPOSED APPROACH

The proposed fault diagnosis method relies on an assembly of learning machines that are tuned in a pre-diagnosis learning phase. A high-level description is illustrated in Fig. 1. The diagnosis starts by obtaining the diagnostic measurements specified in the pre-diagnosis phase. At first, we can rely on a subset of the standard specification-based tests. If the diagnostic accuracy is not sufficient, the complete specification-based test suite can be used, or additional special tests can be crafted to target undiagnosed parameters or to resolve ambiguity groups.

The central learning machine is a defect filter that is trained in the pre-diagnosis phase to distinguish devices with catastrophic faults from devices with parametric faults [2]. Thus, the defect filter enables a unified catastrophic/parametric fault diagnosis approach without needing to specify the fault type in advance. The defect filter relies on a non-parametric kernel density estimate f̃(m) of the joint probability density function f(m), where m = [m1, ..., md] is the d-dimensional diagnostic measurement vector. Notice that f̃(m) is estimated using only devices with process variation, that is, no devices with catastrophic faults are required to estimate f̃(m). By construction, f̃(m) is parameterized with a single parameter α, namely f̃(m, α), which can be tuned in the pre-diagnosis learning phase to control the extent of the filter, that is, how lenient or strict it is in filtering out devices [2], [19].
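As an illustration, a defect filter of this kind can be sketched as a kernel density estimate fitted on devices with process variation only, where a small density threshold plays the role of the tuning parameter α. The isotropic Gaussian kernel, the bandwidth, and the threshold value below are illustrative stand-ins for the non-parametric KDE of [2], [19]; this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# diagnostic measurements (2-D) of 500 devices with process variation only
clean = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=500)

def kde_density(m, data, h=0.4):
    """Isotropic Gaussian kernel density estimate f~(m) at point m."""
    d = data.shape[1]
    sq = np.sum((data - m) ** 2, axis=1)
    return np.mean(np.exp(-sq / (2.0 * h ** 2))) / ((2.0 * np.pi * h ** 2) ** (d / 2.0))

def defect_filter(m, alpha=1e-4):
    """True -> device is filtered out as catastrophic (density effectively zero)."""
    return bool(kde_density(np.asarray(m), clean) < alpha)

# a device near the bulk stays on the parametric side; a far outlier is filtered out
print(defect_filter([0.1, 0.2]), defect_filter([8.0, -8.0]))
```

Lowering α pushes the zero-density contour outwards (a more lenient filter), while raising it makes the filter stricter, which is exactly the knob described above.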

Fig. 1. Proposed fault diagnosis flow.

Figure 2 shows an example of a fitted joint probability density function in a 2-dimensional measurement space. The density is fitted using the devices with process variations, shown with the blue dots. The isoline contour of zero probability density serves as the defect filter. By tuning the parameter α, we can set the location of the isoline contour of zero probability density so as to make the defect filter stricter or more lenient. As can be seen, the devices with catastrophic faults, shown with the red dots, lie in an area that has zero probability density, that is, f̃(m, α) = 0, since they are inconsistent with the statistical nature of the bulk of the data from devices with process variations that was used to estimate the density. The devices with catastrophic faults that are filtered out are forwarded to multi-class classifiers that are trained in the pre-diagnosis phase to map any diagnostic measurement pattern to the underlying catastrophic fault. Thus, in this step we follow a fault dictionary approach that employs multi-class classifiers, each with N outputs, where N is the number of modeled catastrophic faults in the pre-diagnosis phase. Details of the diagnosis of catastrophic faults will be discussed in Section III-A. On the other hand, if f̃(m, α) > 0, the device is considered to contain process variations, that is, a parametric fault has occurred. For parametric fault diagnosis, we use nonlinear inverse regression functions that are trained in the pre-diagnosis phase to map the diagnostic measurement pattern to the values of circuit parameters of interest. Details of the diagnosis of parametric faults will be discussed in Section III-B.

Fig. 2. Defect filter in a 2-dimensional diagnostic measurement space.

The defect filter is always tuned to filter out devices with catastrophic faults. However, this could inadvertently result in some devices with parametric faults also being screened out and forwarded to the classifier. To correct this leakage, each multi-class classifier is trained during the pre-diagnosis phase to include detection of devices with process variations as well, i.e. an additional output is added, raising the number of outputs to N + 1. Thus, in the unlikely case where a device with a parametric fault is presented to a classifier, the classifier kicks it back to the regression tier.

Fig. 3. Estimated probability density function of resistance (in Ω) for (a) open defects and (b) short defects, plotted in logarithmic scale.

A. Diagnosis of catastrophic faults

From an IFA and historical defect data, we create a list of the N most probable catastrophic short- or open-circuit fault locations. Suppose that we inject the j-th catastrophic fault in the device and we perform a Monte Carlo simulation, where in each pass a different short or open resistance is used. These values are sampled from the resistance distributions for short- and open-circuits [3], as shown in Figure 3. Let

m_i^j = [ m_{i,1}^j, m_{i,2}^j, ..., m_{i,d}^j ]    (1)

denote the d-dimensional diagnostic measurement vector of the i-th simulation for the j-th catastrophic fault. For n simulations, we obtain the j-th fault cluster for the j-th catastrophic fault

FC_j = { m_1^j, ..., m_n^j }.    (2)

In other words, the j-th fault cluster consists of n points allocated in the space of diagnostic measurements, where each point corresponds to the diagnostic measurement pattern of the j-th defect for a specific resistance value. It is also possible to enhance each fault cluster with more points that represent process spread. This is recommended if we can afford the extra simulation effort. In particular, for each resistance value, we can perform n′ Monte Carlo simulations by allowing the circuit parameters to vary according to their fault-free distributions in the process design kit. In this case, each fault cluster consists of nc = n · n′ points. The fault clusters FC_j, j = 1, ..., N, compose the fault dictionary.
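The cluster construction of Eqs. (1)–(2) can be sketched as follows. Here `simulate` and `sample_resistance` are hypothetical stand-ins for the circuit simulator and for the fitted short/open resistance distributions of Fig. 3; only the dictionary structure mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, d = 3, 5, 4   # modeled faults, Monte Carlo passes per fault, measurements

def sample_resistance(fault_type):
    # stand-in for sampling from the short/open resistance distributions of [3]
    return rng.lognormal(0.0, 1.0) if fault_type == "short" else rng.lognormal(16.0, 1.0)

def simulate(j, resistance):
    # hypothetical simulator: returns the d-dimensional measurement vector m_i^j
    return rng.normal(loc=float(j), scale=0.1, size=d) + 0.01 * np.log(resistance)

# FC_j = {m_1^j, ..., m_n^j}: one n x d cluster per modeled fault
fault_dictionary = {j: np.vstack([simulate(j, sample_resistance("short"))
                                  for _ in range(n)])
                    for j in range(N)}
```

Enhancing a cluster with process spread amounts to an inner loop of n′ extra Monte Carlo passes per resistance value, giving n · n′ points per cluster.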



The fault dictionary FC_j, j = 1, ..., N, is used in the pre-diagnosis phase to train a set of c multi-class classifiers C1, C2, ..., Cc, where each classifier allocates a boundary in the space of diagnostic measurements to separate one fault cluster from another. For a device that is diagnosed by the defect filter to contain a catastrophic fault, we obtain the same d-dimensional diagnostic measurement pattern and we present it to the c classifiers. Each classifier assigns a score to each of the N considered catastrophic faults, instead of just making a deterministic judgment about which catastrophic fault is present in the faulty device. Thereafter, the individual scores of the classifiers are combined to assign a single score dcom(j) to each catastrophic fault [4]. As suggested by practitioners in the field of pattern recognition [20], [21], the overall classification accuracy can be improved by combining the responses of different classifiers. Various combination methods have been proposed in the literature, including averaging, weighted averaging, majority vote, fuzzy integral, etc. [20], [21]. We have chosen the averaging method owing to its simplicity and its capacity to provide a score for all catastrophic faults [4]. In contrast, the weighted averaging and fuzzy integral methods require a validation set to assign weights to the classifiers, yet this validation set is typically not available at the time when the diagnosis tools are built. Furthermore, the majority vote method renders a deterministic diagnosis rather than a ranking of the catastrophic faults that are likely to have occurred. The averaging method consists of computing the average value of the scores obtained by the different classifiers. Formally, for a total number of c classifiers, the score for the j-th defect is given by

d_com(j) = (1/c) Σ_{i=1}^{c} d_i(j),    (3)

where d_i is the score assigned by the i-th classifier. The output of the diagnosis phase is the ranking of the catastrophic faults according to their likelihood of occurrence in the faulty device.
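The combination rule of Eq. (3) is a plain average of per-fault scores followed by a ranking. The toy "classifiers" below (softmax scores over distances to cluster centroids, at two different temperatures) are illustrative stand-ins for the trained classifiers; only the averaging and ranking steps mirror the text.

```python
import numpy as np

# three fault-cluster centroids in a 2-D diagnostic measurement space
centroids = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])

def classifier_scores(m, temperature):
    # toy classifier: softmax over negative squared distances to the clusters
    d2 = np.sum((centroids - m) ** 2, axis=1)
    z = np.exp(-d2 / temperature)
    return z / z.sum()

m = np.array([2.9, 0.1])                                 # faulty device measurement
scores = [classifier_scores(m, t) for t in (0.5, 2.0)]   # c = 2 classifiers
d_com = np.mean(scores, axis=0)                          # Eq. (3): average the scores
ranking = np.argsort(d_com)[::-1]                        # most likely fault first
print(ranking)  # faults ordered by combined score
```

Keeping the full ranking, rather than only the top class, is what lets the method guide classical FA towards the next candidate when the first one is disproved.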

B. Diagnosis of parametric faults

In the pre-diagnosis phase, we train a set of non-linear regression functions to map the diagnostic measurement pattern to the values of circuit parameters of interest. In particular, for n_p parameters p_j, j = 1, ..., n_p, we train n_p regression functions f_j : m → p_j, j = 1, ..., n_p [2]. The training phase employs a set of devices with typical and extreme process variations. Unlike prior work on parametric fault diagnosis, this approach allows an implicit modeling of the unknown dependencies between m and all p_j using statistical data and domain-specific knowledge. Thus, it avoids the complications related to an explicit formulation (i.e. diagnosability, convergence, problems with large deviations in parameters, etc.) [6]–[9]. For a device that is diagnosed by the defect filter to contain a parametric fault, we obtain the diagnostic measurement pattern and we use the inverse regression functions to predict the values of circuit parameters. The main goal is to construct regression models with generalization capabilities, i.e. models that can accurately predict the parameters of devices other than those in the training set.
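A minimal sketch of this inverse-regression tier: one regressor per circuit parameter, mapping the measurement vector m back to p_j. The linear toy "circuit" and the least-squares fit are assumptions standing in for the real simulator and the nonlinear regression functions of [2]; only the train-then-invert structure mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n_train, d, n_p = 200, 4, 2   # training devices, measurements, parameters

P = rng.normal(1.0, 0.3, size=(n_train, n_p))      # circuit parameter values
A = rng.normal(size=(n_p, d))                      # toy circuit: m = p A + noise
M = P @ A + 0.01 * rng.normal(size=(n_train, d))   # diagnostic measurements

# train the inverse functions f_j : m -> p_j (here, jointly by least squares)
W, *_ = np.linalg.lstsq(M, P, rcond=None)

p_true = np.array([1.2, 0.8])
m_new = p_true @ A                                 # measurement of a new device
p_pred = m_new @ W                                 # predicted parameter values
```

Training on devices with exaggerated (distorted) parameter spreads, as done in Section IV, is what keeps such regressors valid for devices far from nominal.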


Fig. 4. Schematic of LNA under test.

Fig. 5. Layout of LNA under test.

IV. CASE STUDIES

A. Low noise amplifier (LNA)

Our first case study is a 2.4 GHz LNA designed in the 0.25 µm BiCMOS7RF STMicroelectronics technology [2]. The schematic of the LNA is shown in Figure 4, the layout is shown in Figure 5, and the specification requirements are listed in Table I. We have chosen the four scattering parameters as our initial diagnostic measurements (a DC diagnostic test will be added later to resolve one ambiguity that we found). Each scattering parameter is sampled at 41 frequency points between 1 GHz and 5 GHz with a step of 100 MHz. Thus, in total, we have 4 × 41 = 164 diagnostic measurements.

1) Fault models: In this case study, we consider simple catastrophic fault models by setting n = 1. Short circuits are modeled with a 1 Ω resistor, while open circuits are modeled with a 10 MΩ resistor. As will be shown later, each fault cluster is enhanced with more points by running a Monte Carlo simulation to consider process spread. In total, there are 23 catastrophic faults, which are listed in Table II. In the abbreviation x XX yz, x denotes the fault type (x=s for a short circuit and x=o for an open circuit), XX denotes the affected component, and yz concerns only the transistors and denotes the terminal pair (g=gate, d=drain, and s=source).

TABLE I
PERFORMANCES AND SPECIFICATION LIMITS FOR THE LNA UNDER TEST.

NF (dB)   S11 (dB)   S12 (dB)   S21 (dB)   S22 (dB)   1-dB CP (dBm)   IIP3 (dBm)
≤ 0.7     ≤ −8       ≤ −35      ≥ 11.5     ≤ −8.1     ≥ −3            ≥ 2.8

TABLE II
LIST OF CATASTROPHIC FAULTS.

Fault   Faulty Component
F1      s M3 gs, s M3 ds
F2      s M1 ds
F3      s M1 gs
F4      s M1 gd
F5      s M2 ds
F6      s M2 gd, s L3, s R3, s C1
F7      s M2 gs
F8      o M3 d
F9      o M3 g
F10     o M3 s
F11     o M1 g, o L2
F12     o M1 s, o L1
F13     o M1 d, o M2 s
F14     o M2 g
F15     o M2 d
F16     s R1
F17     s R2
F18     s L2
F19     s L1
F20     o R1, o R2
F21     o L3
F22     o R3
F23     o C1
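Fault models of this kind can be injected into a netlist by simple text edits: a short becomes a 1 Ω resistor across two nodes, and an open becomes a 10 MΩ resistor spliced in series with a branch. The netlist lines, node names, and helper functions below are hypothetical illustrations; a real flow would operate on the design database rather than raw text.

```python
netlist = ("M1 drain1 gate1 source1 bulk nmos W=10u L=0.25u\n"
           "L1 source1 0 700p\n")

def inject_short(net, node_a, node_b):
    # short-circuit fault: 1-ohm resistor between the two nodes (e.g. s M1 gs)
    return net + f"Rshort {node_a} {node_b} 1\n"

def inject_open(net, element, new_node):
    # open-circuit fault: splice a 10-Mohm resistor in series with the
    # element's first terminal (e.g. o L1), via a simple text rewrite
    out = []
    for line in net.splitlines():
        if line.startswith(element + " "):
            parts = line.split()
            old_node, parts[1] = parts[1], new_node
            out.append(" ".join(parts))
            out.append(f"Ropen {old_node} {new_node} 10meg")
        else:
            out.append(line)
    return "\n".join(out) + "\n"

shorted = inject_short(netlist, "gate1", "source1")   # s M1 gs
opened = inject_open(netlist, "L1", "nx1")            # o L1
```

Each injected netlist is then simulated to produce the corresponding fault cluster of Section III-A.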

We model parametric faults as large deviations in the passive components and in the low-level transistor parameters (i.e. oxide thickness, substrate doping concentration, surface mobility, flatband voltage, etc.). Large parametric deviations in passive components are imposed by simply distorting their fault-free distribution to have a larger standard deviation. With respect to the low-level transistor parameters, we noticed that in the design kit they are parameterized with a single variable t with nominal value t = 0. Thus, denoting these parameters by q1, ..., qk, the transistor model consists of intricate functions of the form qi = fi(t, q1, ..., qi−1, qi+1, ..., qk). A Monte Carlo simulation is then enabled by simply varying t around t = 0 with standard deviation σt. This observation allowed us to generate realistic faulty transistor models by assigning a larger standard deviation βt · σt, βt > 1. Intuitively, deviations in low-level transistor parameters will be reflected in the small-signal parameters. To this end, we deemed it efficient to monitor deviations in gm and Cgs.

TABLE III
LIST OF CIRCUIT PARAMETERS UNDER DIAGNOSIS.

Parameter   Nominal value   Fault-free distribution   Distorted distribution   RMS prediction error
C1          500 fF          -5...5%                   -40...40%                3.9%
L1          700 pH          -5...5%                   -40...40%                3.2%
L2          8 nH            -5...5%                   -40...40%                2.1%
L3          6 nH            -5...5%                   -40...40%                2.1%
R1          2 KΩ            -5...5%                   -40...40%                25.9%
R2          3 KΩ            -5...5%                   -40...40%                22.9%
R3          100 Ω           -5...5%                   -40...40%                1%
Cgs1        347 fF          -20.3...23%               -44.4...27.7%            2.7%
gm1         84 m            -20.3...42.6%             -94.1...79.7%            3.5%
Cgs2        358 fF          -13.8...17.7%             -34.5...20.8%            2.6%
gm2         87 m            -18.8...34.5%             -94...70.6%              3.4%
Cgs3        52 fF           -19.2...22.4%             -22.1...24.4%            3%
gm3         10 m            -13.1...16.3%             -26.1...42.3%            11.8%

The first column of Table III summarizes the circuit parameters that we diagnose in our experiment (13 in total). The second column lists their nominal values. The third column shows the minimum and maximum parameter variations observed over 5000 Monte Carlo simulations using STMicroelectronics in-house values for the standard deviations. The fourth column shows the corresponding parameter variations after having increased the standard deviations. It should be noted that the distortions that we have imposed on the parameter distributions are illustrative and can be changed to accommodate any fault model of this type.

2) Classifier and regression functions: We have considered one classifier in this case study, that is, c = 1. The classifier is a support vector machine (SVM) [22]. We obtain the normalized scores for each fault cluster using (3). The diagnosed catastrophic fault is the one with the highest probability of occurrence in the faulty device. In contrast to other types of classifiers (i.e. neural networks, nearest neighbors, etc.), SVMs allocate the separation boundaries such that they traverse the middle of the distance between the fault clusters. Now, as will be shown later, our fault clusters are cleanly separated when they are projected in the diagnostic measurement space, i.e. there are large empty subspaces amidst the fault clusters. In case of fault ambiguities, more classifiers can be considered to improve the classification accuracy, as shown in Section III-A. SVMs can be adapted for regression as well [22].

3) Pre-diagnosis learning phase: We generate the following data sets to train and validate the learning machines of the diagnosis flow (i.e. defect filter, classifier, regression functions):

• The set S1 contains 10000 LNA instances generated by Monte Carlo simulation where all circuit parameters are sampled from their distorted distributions in Table III. The idea here is to model larger component variations in the pre-diagnosis phase than those expected in reality. This way, we minimize the probability that the defect filter will screen out devices with excessive parametric deviations, and we ensure that future devices will fall in regions where the regression functions are valid, i.e. in regions where there were enough samples during the pre-diagnosis phase to carry out the regression. In other words, S1 must be information-rich such that the regression functions can generalize for every possible parametric fault scenario.

• The set S2 contains 23 subsets S2j, j = 1, ..., 23, corresponding to the 23 fault classes in Table II. Each subset S2j contains 100 LNA instances generated by inserting the hard fault j in the netlist and subsequently running 100 Monte Carlo simulations where the rest of the circuit parameters are sampled from their fault-free distributions. Thus, the size of S2 is 23 × 100 = 2300.

Paper PTF3 INTERNATIONAL TEST CONFERENCE 5


Fig. 6. Projection of training devices in the top three principal components.

To gain some insight about the structure of the data, we perform a Principal Component Analysis (PCA) on the (10000+2300) × 164 matrix whose rows correspond to the diagnostic measurement patterns of the devices in S1 and S2. Fig. 6 shows the projection of these devices in the top three principal components. Fault clusters are represented with different colors, whereas the largely populated “process variation” class is represented with black dots. As can be observed, even in this primitive visualization, fault clusters are cleanly separated.
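The projection used for Fig. 6 can be reproduced on synthetic data with a plain SVD. In this sketch the matrix sizes are scaled-down stand-ins for the (10000+2300) × 164 matrix, and the displaced block plays the role of one fault cluster:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for the measurement matrix: 500 devices x 10 measurements
X = rng.normal(size=(500, 10))
X[:100] += 5.0                      # one displaced "fault cluster"

# PCA by SVD of the mean-centered matrix; keep the top 3 components
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:3].T                # coordinates used for the 3-D scatter plot

print(proj.shape)                   # (500, 3)
```

The displaced cluster separates cleanly along the first principal component, which is the visual effect seen in Fig. 6.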

The set S1 is split in two equal sets S_1^t and S_1^v uniformly at random. Similarly, S2 is split in S_2^t and S_2^v. S_1^t is used to build the defect filter, i.e. to generate the density estimate f(m, α). S_1^v and S2 are used to validate the defect filter. We tested a defect filter with α = 0 (this value of α implements a rather strict defect filter, see [19]) which gave optimal filtering: devices in S2 have a zero probability density, while devices in S_1^v have a nonzero probability density.

The regression models are trained using S_1^t and are validated using S_1^v. The result is shown in the fifth column of Table III in terms of the root mean squared (RMS) prediction error. As can be observed, the regression models can accurately predict multiple parameter variations, with the exception of the resistors R1, R2 and the transistor M3 in the bias circuit. In retrospect, this could have been anticipated because the bias circuit operates in DC, thus it is not excited by the high-frequency diagnostic measurements. As we will see later, this results in an ambiguity, which calls for additional diagnostic measurements.
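Inverse regression from measurements back to a parameter value can be sketched with ordinary least squares. The paper uses SVM-based regression; the linear stand-in below and all dimensions are illustrative, but the train/validate split and the RMS-error evaluation mirror the procedure above:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in: a circuit parameter p maps (noisily) to 5 measurements
n = 2000
p = rng.normal(1.0, 0.2, n)                          # hidden parameter values
W = rng.normal(size=(1, 5))
M = p[:, None] * W + 0.01 * rng.normal(size=(n, 5))  # measurement patterns

# Inverse regression: predict p from the measurements (least squares)
train, val = slice(0, 1000), slice(1000, None)
A = np.c_[M[train], np.ones(1000)]                   # affine model
coef, *_ = np.linalg.lstsq(A, p[train], rcond=None)

p_hat = np.c_[M[val], np.ones(1000)] @ coef
rms = np.sqrt(np.mean((p_hat - p[val]) ** 2))
print(rms)   # small relative to the 0.2 spread of p
```

As in Table III, parameters that barely influence the measurements (here, a row of W close to zero; in the LNA, the DC bias components) would show a large RMS error.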

The classifier is trained using S_1^t and S_2^t and is validated using S_1^v and S_2^v (S1 constitutes the “process variations” class). The only misclassification occurred between fault classes F8 and F9. Looking at the LNA schematic, it can be observed that faults F8 and F9 have the same effect: the transistor M3 is off. Thus, these two fault classes can be collapsed into one, resulting in an overall 100% classification rate. This example illustrates that the classifier can help us identify, in the pre-diagnosis phase, ambiguous catastrophic faults that we missed by just looking at the schematic with the naked eye.

TABLE IV: SINGLE SOFT FAULT SCENARIOS.

Single fault   Number of faulty   RMS error of
scenario       circuits /100      estimated values
C1+30%         69                 1.9%
C1-30%         0                  -
L1+30%         74                 1.5%
L1-30%         0                  -
L2+30%         17                 1.9%
L2-30%         81                 1.9%
L3+30%         88                 1.5%
L3-30%         0                  -
R1+30%         0                  -
R1-30%         0                  -
R2+30%         0                  -
R2-30%         0                  -
R3+30%         100                0.006%
R3-30%         42                 1.3%
M1+            19                 cgs1: 2.3%, gm1: 1.2%
M1-            4                  cgs1: 1%, gm1: 1%
M2+            0                  -
M2-            0                  -
M3+            16                 cgs3: 1.9%, gm3: 5.1%
M3-            94                 cgs3: 3.2%, gm3: 3.1%
Total          604/2000           -

4) Diagnosis phase: We use the following data sets to evaluate the generalization of the proposed diagnosis flow:

• The set S3 is generated independently in the same way as S2. This set corresponds to 23 single hard fault scenarios.

• The set S4 contains 20 subsets S4j, j = 1, ..., 20, corresponding to the 20 single parametric fault scenarios shown in the first column of Table IV. For the passive components, we consider ±30% deviations. For the transistors, we distort the mean value of t in two directions (Mi+ means positive direction and Mi- means negative direction) such that the inflicted (excessive) variations on gm and Cgs are still within the ranges of the fourth column of Table III. Each subset S4j contains 100 LNA instances generated by inserting the j-th single parametric fault and running 100 Monte Carlo simulations where the rest of the (unaffected) parameters are sampled from their fault-free distributions. Thus, the size of S4 is 20 × 100 = 2000.

The devices in S3 and S4 undergo specification-based testing, according to Figure 1. All devices in S3 violate at least one specification and as such are labeled as faulty. However, this is not the case for all devices in S4, as shown in the second column of Table IV. Faulty devices are next forwarded to the diagnosis phase, where they are first subjected to the defect filter. The defect filter fails to characterize correctly a single device with parametric fault L2+30%, which is erroneously screened out and forwarded to the classifier. However, the classifier maps it to the “process variation” class and kicks it back to the regression tier, as indicated by the dashed arrow in Figure 1. The devices with catastrophic faults are all correctly classified, thus we conclude that catastrophic fault diagnosis succeeds in 100% of the cases.
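The routing just described (defect filter first, classifier next, with the “process variation” class acting as a safety net back to regression) can be sketched as glue code. All names and the scalar toy models below are hypothetical stand-ins for the trained learning machines:

```python
# Hypothetical orchestration of the diagnosis flow of Figure 1
def diagnose(device, defect_filter, classifier, regress):
    if defect_filter(device):                 # parametric fault suspected
        return ("parametric", regress(device))
    label = classifier(device)                # catastrophic-fault classifier
    if label == "process variation":          # safety net (dashed arrow)
        return ("parametric", regress(device))
    return ("catastrophic", label)

# Toy scalar stand-ins for the three learning machines
flt = lambda d: abs(d) < 3
cls = lambda d: "process variation" if abs(d) < 5 else "F8"
reg = lambda d: d * 0.5

print(diagnose(1.0, flt, cls, reg))   # ('parametric', 0.5)
print(diagnose(10.0, flt, cls, reg))  # ('catastrophic', 'F8')
print(diagnose(4.0, flt, cls, reg))   # filter misses, classifier kicks back
```

The third call reproduces the L2+30% incident above: the filter misclassifies the device, but the classifier returns it to the regression tier.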



Fig. 7. Comparison between target and predicted values for (a) L2 and (b) R3.

All faulty devices in S4 are forwarded to the regression tier. The third column of Table IV shows the RMS prediction error of the parameters that deviate in each fault scenario, and Figure 7 plots the situation for L2 and R3. Note that the RMS prediction error of the “fault-free” parameters (not shown here due to lack of space) is similar to that of Table III (in general it is even smaller, since large errors typically correspond to excessive deviations).

The following observations are useful in building parametric fault diagnosis rules: (a) deviations of gm1 and gm2 are not necessarily due to a fault in M1 or M2. Indeed, gm1 and gm2 are also defined by the current flowing through M1 and M2, which, in turn, depends on all passive components in the same branch, as well as on the bias circuit. Thus, a fault in any passive component or in the bias circuit will also impact gm1 and gm2. (b) A soft fault in M2 does not render the circuit faulty (see the zero M2 entries in Table IV). (c) Recall from Section IV-A3 that the components of the bias circuit cannot be diagnosed by high-frequency measurements; hence, the predicted deviations of R1, R2, or M3 are not genuine and, thereby, are disregarded. (d) The probability of two fault scenarios occurring at the same time is negligible (single fault assumption).

Based on the predicted values of the parameters and the above observations, we define the following diagnosis rules: (a) if gm1 and gm2 deviate at the same time as a passive component, then the faulty component is the passive component. (b) If both gm1 and gm2 deviate while no passive component does, then the faulty component is M1 or is located in the bias circuit. The latter rule leads to the only ambiguity so far. Now, note that the LNA fails if a fault within the bias circuit results in a dramatic decrease of the DC bias point of M1 and/or the input impedance of the bias circuit. Thus, this ambiguity can be resolved in part by measuring the gate-source voltage Vgs3 of M3 (the gate of M3 is not an RF sensitive node). Two follow-up rules to rule (b) are: (c) if gm1 deviates and Vgs3 is outside its tolerance, then M3 is faulty; (d) if gm1 deviates and Vgs3 is within its tolerance, then the faulty component is M1 or is located in the bias circuit. Using rule (c), we were able to diagnose correctly 49 out of the 16+94=110 circuits with a faulty M3.
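One possible encoding of rules (a)–(d) as a decision procedure is sketched below. The boolean inputs are assumed to be derived from the regression predictions and the Vgs3 measurement; the exact predicate for “deviates” (tolerance bands on the predicted values) is left abstract:

```python
# Hedged encoding of the parametric diagnosis rules (a)-(d);
# inputs are illustrative booleans, not the paper's exact predicates.
def diagnose_parametric(gm1_dev, gm2_dev, passive_dev, vgs3_in_tol):
    if gm1_dev and gm2_dev and passive_dev:
        return "faulty passive component"        # rule (a)
    if gm1_dev and gm2_dev:
        # rule (b) ambiguity, resolved by measuring Vgs3
        if not vgs3_in_tol:
            return "M3 faulty"                   # rule (c)
        return "M1 or bias circuit faulty"       # rule (d)
    return "no single-fault rule fires"

print(diagnose_parametric(True, True, True, True))    # passive component
print(diagnose_parametric(True, True, False, False))  # M3
```

In this encoding rule (a) takes priority, matching the observation that a passive-component fault drags gm1 and gm2 along with it.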

B. Controller Area Network (CAN) transceiver

Our second case study is a Controller Area Network (CAN) transceiver designed by NXP Semiconductors in a BiCMOS-DMOS process. The netlist of this DUT has 1032 elements

Fig. 8. High-level block diagram of the CAN transceiver.

Fig. 9. FIB image of the short-circuit defect diagnosed in DUT 18.

of which 613 are transistors. A high-level block diagram of the device is shown in Figure 8. This device is produced in high volume and constitutes an essential part of the electronic system of automobiles. It is deployed in a safety-critical application, thus it has to meet stringent specifications and demands practically zero test escapes. Therefore, it is of vital importance to diagnose the sources of failure, in order to achieve better quality control and, when possible, improve the design such that similar failures do not emerge in the field during the lifetime of operation [4].

We have at hand a set of 29 devices from different lots that failed at least one of the specifications during production test. Classical FA was carried out for all these devices, and it was observed in all cases that the cause of failure is a short-circuit defect. For example, Figure 9 shows a Focused Ion Beam (FIB) image of the short-circuit defect observed in DUT 18, and Figure 10 shows a Scanning Electron Microscope (SEM) image of the short-circuit defect observed in DUT 26. For the purpose of the experiment, we assume that the actual defects that have occurred in each of these devices are unknown and we set out to diagnose them by applying the



Fig. 10. SEM image of the short-circuit defect diagnosed in DUT 26.

proposed flow. The standard production tests for this DUT include digital, analog, and IDDQ tests. We consider d = 97 non-digital tests (i.e. voltage, current, timing and hysteresis measurements), which dominate the test time. No additional measurements are performed for the purpose of diagnosis. Each measurement is scaled to the range [-1,1].

1) Fault model: For this particular device, produced in high volume in a mature technology where process variation is well understood and controlled, device failures due to parametric deviations of process and device parameters are very unlikely to occur. Furthermore, for this particular technology, open-circuit defects are less likely to occur than short-circuit defects. In fact, in analog designs there is typically enough space to do via doubling, which makes open-circuit defects even less likely. As a result, more than 90% of the observed defects in production are short-circuits. Thus, only catastrophic short-circuit faults are considered for fault modeling.

We have performed an IFA which resulted in a list of N = 923 probable short-circuit faults. Each short-circuit is modeled with 3 different bridge resistance values (5 Ω, 50 Ω, 200 Ω), that is, n = 3. These values are chosen according to defect data characterization analysis for this particular technology. Subsequently, a total of 3 × 923 = 2769 fault simulations were carried out to generate the fault clusters that we use to build the diagnosis tools. In this large-scale industrial case study, we cannot afford extra simulation effort to consider process variation in fault simulation. Thus, each simulation consists of inserting a short-circuit defect in the netlist with a specific bridge resistance value while the circuit parameters are fixed to their nominal design values, that is, n′ = 1. In each fault simulation we collect the same d = 97 diagnostic measurements. Fault simulation took approximately 12 hours. Notice that fault simulation is a one-time effort. Building the diagnosis tools and performing the diagnosis of a faulty DUT takes only a few minutes [4].

2) Missing value analysis: In this real-world case study, the injection of a defect in the device netlist might render the system of equations during circuit simulation unsolvable. Therefore, it is highly likely that there exist diagnostic measurements that are unattainable for specific defects and specific resistance values. The problem of missing values also concerns the real diagnostic measurement pattern m_l = [m_{l,1}, m_{l,2}, ..., m_{l,d}]. Indeed, a diagnostic measurement might hit the instrument limit, in which case its value is artificially “forced” to equal the instrument limit. In this case, we can only use the pass/fail information provided by the diagnostic measurement, and we should consider the absolute value as missing.

Let z_k denote a value of the k-th diagnostic measurement. According to our notation in Section III, z_k ∈ {m_{i,k}^j : j = 1, ..., N, i = 1, ..., n} ∪ {m_{l,k}}. In this work, we apply the Not Missing At Random (NMAR) mechanism [23], which states that z_k is considered to be missing if |z_k| > n_th, where n_th is a threshold value. Notice that the fact that each diagnostic measurement is scaled to the range [-1,1] allows us to use a single threshold n_th. The definition of the value of n_th is not a simple task due to the discrepancy between the simulation environment and the characterization test bench. One can choose to incorporate the load board configuration, the test hardware, the test instrument limits, etc., in the simulation environment, but this is time consuming given the complexity of the characterization measurements, if at all possible. For this purpose, we follow the suggestion in [23] and consider a variety of missing models, that is, many different values of n_th are tested.

The proposed approach to account for the missing data is as follows:

1) If m_{l,k} is missing, then the k-th diagnostic measurement is excluded from the analysis.

2) If m_{i,k}^j is missing but the same element is available for other resistance values of the j-th defect, then m_{i,k}^j is replaced by the mean value of the available elements. This approach is called mean imputation [23]. For example, if m_{h,k}^j is available for h = 1, ..., i-1, i+1, ..., n, then m_{i,k}^j is replaced by (1/(n-1)) Σ_{h≠i} m_{h,k}^j.

3) Let

A_j = [m_1^j; m_2^j; ...; m_n^j]    (4)

denote the matrix whose rows are the footprints of the j-th fault cluster FC_j, and let

A = [A_1; A_2; ...; A_N].    (5)

The matrix A is scanned, and each time an element m_{i,k}^j is found to be missing and cannot be replaced using mean imputation in step 2), either the j-th defect or the k-th diagnostic measurement is excluded from the analysis. This approach is called listwise deletion [23]. To decide whether to exclude the defect or the diagnostic measurement, we count the number of defects for which the k-th diagnostic measurement is missing, denoted by N_def^k, as well as the number of diagnostic measurements that are missing for the j-th defect, denoted by N_meas^j. If



Fig. 11. Euclidean distance method in a 2-dimensional diagnostic measurement space.

N_def^k / N > β · N_meas^j / d,    (6)

where β is a user-defined coefficient, then we exclude the k-th diagnostic measurement; otherwise, we exclude the j-th defect. Setting β small, more diagnostic measurements will be excluded, whereas setting β large, more defects will be excluded.

To conclude, missing values force us to exclude either diagnostic measurements or defects from the analysis. In the former case, we remove information that may be useful for performing diagnosis. In the latter case, we are bound to obtain misleading diagnosis results if the defect that is present in the faulty device has been inadvertently excluded from the analysis [4].
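Steps 1)–3) can be sketched with numpy, where NaN marks a missing element. The dimensions n, N, d and the data below are toy values; the imputation and the criterion (6) follow the definitions above:

```python
import numpy as np

# Fault-cluster matrix A: rows are the n resistance values of each of the
# N defects stacked vertically, columns are the d measurements.
n, N, d = 3, 4, 5
A = np.arange(n * N * d, dtype=float).reshape(n * N, d)
A[1, 2] = np.nan          # missing: defect 0, resistance value 1, measurement 2

# Step 2: mean imputation over the other resistance values of the same defect
j, i, k = 0, 1, 2
col = A[j * n:(j + 1) * n, k]
A[j * n + i, k] = np.nanmean(np.delete(col, i))

# Step 3 criterion (6): drop measurement k when Ndef_k/N > beta * Nmeas_j/d
def exclude_measurement(Ndef_k, Nmeas_j, beta=1.0):
    return Ndef_k / N > beta * Nmeas_j / d

print(A[1, 2])                       # mean of the two available elements
print(exclude_measurement(3, 1))     # 3/4 > 1/5 -> drop the measurement
```

With the toy values, the imputed element becomes the mean of rows 0 and 2 in column 2, and the criterion favors dropping a measurement that is missing for most defects.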

3) Classifiers: We consider three classifiers, that is, c = 3, based on Euclidean distance, non-parametric KDE, and pass/fail verification. In [4], an additional set of classifiers, namely an SVM and a classifier based on Mahalanobis distance, is applied to the same case study. The interested reader is referred to [4] for a more detailed discussion of the principle of operation of the classifiers, the training procedures, how scores are assigned to faults, the overall performance, etc.

3.1. Euclidean distance: As shown in Figure 11, this method assigns to the j-th fault a normalized score d_1(j) in the range [0,1] according to the distance between its footprints m_i^j, i = 1, ..., n, and the measurement pattern of the DUT m_l. We use the Euclidean distance to determine pattern proximity.

3.2. Non-parametric KDE: This method relies on the densities f_j(m|F_j), j = 1, ..., N, where F_j denotes the event that the j-th fault has occurred. The estimation of the densities is carried out using the available observations m_i^j, i = 1, ..., n, contained in the j-th fault cluster FC_j. To estimate f_j(m|F_j), we do not make any assumption regarding its parametric form (e.g. normal). Instead, we use non-parametric KDE, which allows the observations to speak for themselves [4]. Figure 12 shows the kernel density estimates f̃_{j,α}(m|F_j), α = 0, for three faults in a 2-dimensional diagnostic measurement space.

Given a faulty device with pattern m_l, we assign to the j-th fault a normalized score d_2(j) in the range [0,1]. The fault that achieves the highest density f̃_{j,α}(m_l|F_j) is mapped to 1. Furthermore, if d_2(j) is zero for every defect, then the pattern m_l is considered to be “foreign” to all fault clusters. In this case, we can conclude that the defect that has occurred

Fig. 12. KDE method in a 2-dimensional diagnostic measurement space.

had not been modeled in the fault dictionary. Thus, unlike the other methods, which always assign a score to each defect, the non-parametric KDE method is the only one that in theory can identify an “unexpected” defect. This is a very important attribute of the KDE method.
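The KDE score d_2(j) and the “foreign pattern” check can be sketched with scipy's `gaussian_kde`. The cluster sizes and positions are toy values; the real clusters hold only n = 3 observations each, too few for a stable KDE, which is why this sketch oversamples:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)

# Toy fault clusters in a 2-D measurement space (oversampled for stability)
clusters = [rng.normal(0.0, 0.2, size=(2, 200)),
            rng.normal(4.0, 0.2, size=(2, 200))]
kdes = [gaussian_kde(c) for c in clusters]

def kde_scores(ml):
    """Normalized score d2(j); None flags a pattern foreign to all clusters."""
    dens = np.array([k(ml.reshape(2, 1))[0] for k in kdes])
    if dens.max() == 0.0:
        return None                 # unmodeled ("unexpected") defect
    return dens / dens.max()        # highest-density fault maps to 1

print(kde_scores(np.array([3.9, 4.1])))     # second fault scores 1
print(kde_scores(np.array([100.0, 100.0]))) # None: unexpected defect
```

The second call shows the attribute discussed above: a pattern with zero density under every cluster is flagged rather than forced onto a known defect.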

3.3. Pass/fail verification method: This method simply examines the similarity of the patterns m_l and m_i^j by verifying the pass/fail information for each diagnostic measurement [4]. Formally, we consider the specification indicator I_{i,k}^j, such that (a) I_{i,k}^j = 1 if both m_l and m_i^j comply with the specification of the k-th diagnostic measurement or if both m_l and m_i^j fail this specification, and (b) I_{i,k}^j = 0 if only one of m_l and m_i^j complies with the specification of the k-th diagnostic measurement. The normalized score in the range [0,1] for the j-th fault is defined as

d_3(j) = (1/n) Σ_{i=1}^{n} (1/d) Σ_{k=1}^{d} I_{i,k}^j.    (7)

4) Diagnosis Results: Table V shows the 5 most highly ranked defects according to their scores for each of the 29 failed devices, computed using (3). The first column shows the DUT number, the second column shows the actual defect that is present, the third column shows the ranking of defects, and the fourth column shows the corresponding (rounded) final scores. As can be observed in Table V, the proposed method diagnoses correctly 17 out of the 29 failed devices, with the true defect matching the first choice, and for 4 failed devices the true defect appears in the first three choices. In some cases the ranking indicates with high confidence the location of the defect. For example, for DUT 2, the five defects that come first in the ranking (i.e. 320, 341, 126, 374, 111) are short-circuits across nodes of a transistor pair. The ranking of these defects can subsequently be used to speed up a classical FA method by placing the emphasis on the locations of the chip where the defect has probably occurred.

By comparing the diagnosis predictions to the true defect existing in each DUT, we identify the defects that we are unable to diagnose. We were unable to diagnose correctly defects 21, 28, 156, 300, 376, 380, and in one case defect 101. Furthermore, in some cases the true defects are not ranked



TABLE V: DIAGNOSIS RESULTS.

DUT  True defect  Defect ranking           Normalized scores
1    107          107 90 920 114 347      0.924 0.923 0.923 0.923 0.923
2    320          320 341 126 374 111     0.948 0.867 0.833 0.827 0.822
3    125          47 616 125 681 360      0.914 0.839 0.838 0.837 0.837
4    101          101 117 459 50 388      0.831 0.829 0.826 0.817 0.817
5    216          216 666 192 516 120     0.831 0.795 0.792 0.788 0.785
6    300          524 608 744 294 789     0.900 0.890 0.862 0.855 0.850
7    20           20 126 24 27 111        0.889 0.866 0.862 0.850 0.849
8    27           27 111 126 446 341      0.891 0.856 0.837 0.834 0.834
9    104          111 104 465 721 126     0.848 0.844 0.839 0.823 0.822
10   21           310 682 524 789 608     0.867 0.858 0.855 0.855 0.851
11   101          101 117 459 50 388      0.831 0.829 0.826 0.818 0.817
12   19           19 541 106 562 595      0.810 0.794 0.780 0.780 0.780
13   19           19 541 562 595 106      0.799 0.791 0.788 0.771 0.771
14   140          401 140 457 40 919      0.936 0.912 0.911 0.910 0.910
15   20           20 24 126 27 111        0.887 0.865 0.862 0.853 0.849
16   101          101 117 459 50 388      0.831 0.829 0.826 0.817 0.817
17   107          107 90 920 114 347      0.924 0.923 0.923 0.923 0.923
18   31           117 31 50 388 622       0.901 0.888 0.882 0.881 0.880
19   101          252 305 366 363 31      0.883 0.857 0.846 0.844 0.843
20   19           19 541 106 562 595      0.821 0.794 0.793 0.780 0.780
21   156          524 608 744 789 682     0.903 0.893 0.872 0.872 0.866
22   20           20 126 24 27 111        0.882 0.870 0.867 0.864 0.853
23   107          107 90 920 114 347      0.924 0.923 0.923 0.923 0.923
24   22           22 19 541 338 106       0.826 0.808 0.808 0.795 0.795
25   107          107 90 920 114 347      0.924 0.923 0.923 0.923 0.923
26   380          666 192 516 676 457     0.910 0.906 0.905 0.904 0.903
27   376          383 456 112 34 196      0.924 0.920 0.830 0.826 0.824
28   28           666 192 516 355 676     0.910 0.907 0.898 0.896 0.896
29   300          524 608 744 475 215     0.896 0.896 0.866 0.864 0.862

as the first priority, such as in the cases of DUTs 3, 9, 14, and 18. The reason for the above fault ambiguities is that there are different defects whose patterns tend to overlap in the diagnostic measurement space. In other words, the impact of these defects on the diagnostic measurements is very similar. Fault ambiguity can be observed as early as the fault simulation phase. To resolve fault ambiguity, we will need to consider additional diagnostic measurements.

V. CONCLUSION

In this work, we have presented a new methodology for fault modeling and fault diagnosis of analog circuits based on machine learning. The proposed approach is able to diagnose both catastrophic and parametric faults without making any prior assumption about the type of fault that has occurred. A defect filter recognizes the type of fault and forwards the faulty circuit to the appropriate tier. Circuits with catastrophic faults are forwarded to a combination of multi-class classifiers which list the catastrophic faults according to their likelihood of occurrence. Circuits with parametric faults are forwarded to inverse regression functions which predict the values of a set of predefined design and transistor-level parameters, in order to locate the faulty parameter and predict its value. The proposed approach was demonstrated using both simulation and high-volume manufacturing data, showing an excellent overall diagnosis rate. We also discussed the complexities often met in real case studies related to missing values in the data.

VI. ACKNOWLEDGEMENT

This research has been carried out within the framework of the European CATRENE Project CT302-TOETS. The authors would like to thank C. Hora, Y. Xing, and B. Kruseman, from NXP Semiconductors, The Netherlands, for their support and useful discussions.

REFERENCES

[1] K. Huang, “Modélisation de fautes et diagnostic pour les circuits mixtes/RF nanométriques,” PhD Thesis, University of Grenoble, France, Nov. 2011.

[2] K. Huang, H.-G. Stratigopoulos, and S. Mir, “Fault diagnosis of analog circuits based on machine learning,” in Proc. Design, Automation & Test in Europe Conference, 2010, pp. 1761–1766.

[3] K. Huang, H.-G. Stratigopoulos, and S. Mir, “Bayesian fault diagnosis of RF circuits using nonparametric density estimation,” in Proc. IEEE Asian Test Symposium, 2010, pp. 295–298.

[4] K. Huang, H.-G. Stratigopoulos, S. Mir, C. Hora, Y. Xing, and B. Kruseman, “Diagnosis of local spot defects in analog circuits,” IEEE Transactions on Instrumentation and Measurement, vol. 61, no. 10, pp. 2701–2712, 2012.

[5] Panel discussion, “Extended Diagnosis Requirements in Automotive Applications,” IEEE European Test Symposium, Seville, Spain, 2009.

[6] N. Sen and R. Saeks, “Fault diagnosis for linear systems via multifrequency measurements,” IEEE Transactions on Circuits and Systems, vol. 26, no. 7, pp. 457–465, 1979.

[7] L. Rapisarda and R. A. Decarlo, “Analog multifrequency fault diagnosis,” IEEE Transactions on Circuits and Systems, vol. CAS-30, no. 4, pp. 223–234, 1983.

[8] H. Dai and M. Souders, “Time-domain testing strategies and fault diagnosis for analog systems,” IEEE Transactions on Instrumentation and Measurement, vol. 39, no. 1, pp. 157–162, 1990.

[9] M. Slamani and B. Kaminska, “Analog circuit fault diagnosis based on sensitivity computation and functional testing,” IEEE Design & Test of Computers, vol. 9, no. 1, pp. 30–39, 1992.

[10] S. Chakrabarti, S. Cherubal, and A. Chatterjee, “Fault diagnosis for mixed-signal electronic systems,” in Proc. IEEE Aerospace Conference, 1999, pp. 169–179.

[11] W. Maly, “Modeling of lithography related yield losses for CAD of VLSI circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 4, no. 3, pp. 166–177, 1985.

[12] J. Pineda de Gyvez and C. Di, “IC defect sensitivity for footprint-type spot defects,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 11, no. 1, pp. 638–658, 1992.

[13] R. Rodriguez-Montanes, E. Bruis, and J. Figueras, “Bridging defects resistance measurements in a CMOS process,” in Proc. IEEE International Test Conference, 1992, pp. 892–899.

[14] R. Rodriguez-Montanes, J. P. de Gyvez, and P. Volf, “Resistance characterization for weak open defects,” IEEE Design & Test of Computers, vol. 19, no. 5, pp. 18–26, 2002.

[15] R. Spina and S. Upadhyaya, “Linear circuit fault diagnosis using neuromorphic analyzers,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 44, no. 3, pp. 188–196, 1997.

[16] S. S. Somayajula, E. Sanchez-Sinencio, and J. Pineda de Gyvez, “Analog fault diagnosis based on ramping power supply current signature clusters,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 43, no. 10, pp. 703–712, 1996.

[17] Z. Wang, G. Gielen, and W. Sansen, “Probabilistic fault detection and the selection of measurements for analog integrated circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, no. 9, pp. 862–872, 1998.

[18] S. Krishnan, K. D. Doornbos, R. Brand, and H. G. Kerkhoff, “Block-level Bayesian diagnosis of analogue electronic circuits,” in Proc. Design, Automation & Test in Europe Conference, 2010, pp. 1767–1772.

[19] H.-G. Stratigopoulos, S. Mir, E. Acar, and S. Ozev, “Defect filter for alternate RF test,” in Proc. IEEE European Test Symposium, 2009, pp. 101–106.

[20] A. Verikas, A. Lipnickas, K. Malmqvist, M. Bacauskiene, and A. Gelzinis, “Soft combination of neural classifiers: A comparative study,” Pattern Recognition Letters, vol. 20, no. 4, pp. 429–444, 1999.

[21] L. I. Kuncheva, ““Fuzzy” versus “nonfuzzy” in combining classifiers designed by Boosting,” IEEE Transactions on Fuzzy Systems, vol. 11, no. 6, pp. 729–741, 2003.

[22] N. Cristianini and J. Shawe-Taylor, Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

[23] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, 2nd Edition, John Wiley & Sons, 2002.


