IC Failure Analysis: The Importance of Test and Diagnostics

DAVID P. VALLETT
IBM Microelectronics Division

Continuous improvements in yield, reliability, time to market, and customer satisfaction all benefit from quick corrective action through root-cause failure analysis. The author reviews software- and hardware-based diagnostic methods for fault localization, the first and most critical step in this process.

IC DIAGNOSIS AND FAILURE ANALYSIS
IEEE DESIGN & TEST OF COMPUTERS, p. 76. 0740-7475/97/$10.00 © 1997 IEEE

DEVELOPMENT AND MANUFACTURING failures are unfortunately an inherent part of the microelectronics business, where complexity is growing rapidly. Failures can occur at several points of a product’s life cycle, such as technology or product development and qualification, yield learning, reliability improvement, system manufacture, and field application. The impact of such failures ranges from consequential to catastrophic.

While we expect failures during reliability stressing or yield learning on a new technology, mature programs and parts qualified for sale and field application demand competitive quality and reliability levels. Failures during these later phases of production need immediate analysis and corrective action.

Whether anticipated or sudden, failures can have a severe business impact. Because narrow market opportunities often drive shortened product cycles, companies need to understand failures and take corrective actions quickly. Electrical characterization, statistical analysis, signature analysis, and process experiments can provide important clues that allow us to infer the cause of failure. But only full root-cause physical failure analysis can provide the incriminating evidence necessary to correct problems with confidence: the picture worth a thousand words.

The crucial element of failure analysis is fault localization, a task for which both hardware and software techniques exist (see the adjacent box). Trends toward denser circuits and more sophisticated packaging, however, are limiting physical access to internal chip circuitry and thus diminishing the effectiveness of hardware-based diagnostics. This article reviews hardware and software options for fault localization and shows why software diagnostics must become a key focus within the design, test, and failure analysis communities.

Failure analysis

VLSI failure analysis is the process of determining the physical root cause of component failure, given the electrical failure mode and characteristics. To accomplish this, it is necessary to combine a series of electrical and physical steps aimed at localizing and identifying the ultimate cause of failure (Figure 1). The following sections describe each step in detail. For simplicity, Figure 1 shows the process in serial form. Due to the widely varying nature of components, failures, and defect mechanisms, a typical analysis could involve many loops between the steps shown. Kudva et al. [1] provide a more detailed description of the process and its future challenges.
Fault localization. The size and complexity of modern VLSI components and defects of nanometric proportions make it imperative to accurately localize faults prior to any destructive analysis. We can localize faults to the nearest logic block or circuit net, or in some cases directly to the responsible defect’s physical location. Fault localization is a primary focus of this article; the sections on hardware and software diagnostics discuss the topic in detail.

Deprocessing. Once we have localized a fault as accurately as possible, we must prepare the sample for further characterization and inspection. Usually, we first remove the chip from its package. Depending on the accuracy of fault localization and the nature of the failure, we may need to sequentially inspect and remove one or more levels of the interlevel insulating films and metal wiring. The process continues until we can isolate the defect electrically and physically to where it is best identified and characterized.

To a great extent, deprocessing is a reversal of the manufacturing process; films are removed in reverse order of application. The failure analysis laboratory employs many of the same chemicals and processes used in manufacturing to define shapes and structures, such as mechanical polishing, plasma or dry etching, and wet chemical etching.

Defect localization and characterization. Again depending on the accuracy of fault localization and the nature of the failure, a second localization step or characterization of the defect may be necessary. At this point, we localize the fault to a circuit block such as a NAND gate, latch, or memory cell. By characterizing the effects of the fault on the circuit’s performance, we may further pinpoint its location. Because the subsequent steps are irreversible, we need to gather as much information as possible about the defect and its location before proceeding with the failure analysis.


Figure 1. The basic failure analysis process. Steps and the techniques shown for each: fault localization (hardware diagnostics, software diagnostics); deprocessing (mechanical techniques, wet or dry chemical etches); defect localization and characterization (internal circuit probing, parameter analyzer, scanning probe microscope, hardware diagnostics); inspection and physical characterization (microscopes, focused ion beam, material analysis).

The importance of fault localization

The first and most critical step in the failure analysis process is fault localization. Without knowing where to look on a complex VLSI component, the likelihood of locating and identifying a defect mechanism is infinitesimal. The problem is like the familiar needle-in-the-haystack metaphor. The haystack could be a microprocessor containing over five million transistors and five levels of wiring in a 196-mm² area. The needle could be a 0.2-µm × 0.4-µm nickel particle causing a short between two nodes (Figure A). The failure analysis challenge would be to locate and identify this defect, knowing only its electrical effects on the chip and perhaps secondary characteristics such as temperature and voltage sensitivity.

Figure A. The needle in the haystack: a microprocessor shown at actual size (1) compared to a defect shown magnified 50,000 times (2).
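To get a feel for the scale of this search, the two sizes quoted in the box can be compared directly. The sketch below is only back-of-the-envelope arithmetic: it compares areas on a single plane and ignores the five wiring levels, and the variable names are illustrative rather than taken from the article.

```python
# Sizes quoted in the box above.
chip_area_mm2 = 196.0                  # microprocessor die area
defect_w_um, defect_h_um = 0.2, 0.4    # nickel particle dimensions

UM2_PER_MM2 = 1_000_000                # 1 mm = 1000 um, so 1 mm^2 = 1e6 um^2

chip_area_um2 = chip_area_mm2 * UM2_PER_MM2
defect_area_um2 = defect_w_um * defect_h_um

# Number of defect-sized sites that fit on one plane of the die.
area_ratio = chip_area_um2 / defect_area_um2
print(f"die area / defect area = {area_ratio:.3g}")
```

On these figures the die is a few billion times larger in area than the particle, which is why localization has to precede any physical inspection.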



A number of tools and techniques facilitate defect localization and characterization. We use micrometer-driven mechanical, electrical, or hydraulic positioners with ultrafine probes (with tips approximately 0.2 µm in diameter) to inject and measure signals on conductors of interest. Using high-resolution optical microscopes with long working-distance objectives, we can observe and position the probes. Signals can be AC or DC, often requiring measurement resolution of tens of millivolts or picoamperes. Because of shrinking line widths, it has lately become necessary to use a focused ion beam (FIB) tool to create localized probe pads on the nodes of interest. Additionally, a scanning probe microscope can measure the effects of the defect on electrostatic force, atomic force, or capacitance.

Several other techniques for defect and fault localization, discussed later, are based on the use of light, heat, or electron-beam radiation.

Inspection and defect characterization. After exhausting all appropriate means to localize and characterize a fault, we inspect the sample for a physical defect. Once the defect is identified, we must often characterize it so that its material properties give the manufacturing line enough information to determine its source.

Depending on the accuracy of localization, we inspect the failure site using one of three common techniques: optical, scanning electron, or scanning probe microscopy.

Optical microscopy scans for anomalies on relatively long wires or individual circuit blocks (latches, SRAM cells, and so on). While relatively inadequate for high-magnification imaging, optical microscopy is superior for its ability to simultaneously image numerous vertical levels through transparent insulating films.

We can attain nanometer-scale resolution with scanning electron microscopy (SEM). In addition to its high magnification capabilities, SEM can evaluate material properties such as atomic weight and chemical content. However, it is limited to surface imaging and so requires delayering of films between inspection steps.

For faults or defects localized to extremely small areas (individual transistors, dynamic memory cell capacitors, and so forth), we can use a scanning probe microscope (SPM). This technique offers atomic-scale resolution and can characterize electrostatic potential, capacitance, atomic force, and topography across small areas.

The importance of fault localization (continued)

Fault localization on microelectronics components has been difficult since the earliest days of circuit integration. Future trends show that the problem is worsening. Figure B1 illustrates the projected increase in device complexity (related to localization) versus the projected decrease in minimum defect size, both relative to 1995 levels [2]. For the purposes of this discussion, complexity is the sum of the number of transistors and the total wiring length on the chip. To illustrate the relationship between complexity and defect size as a single parameter, Figure B2 shows the complexity divided by defect size. An order of magnitude increase in localization difficulty is apparent about every six years.

There are two major methods of fault localization: hardware-based diagnostics, which use physical parameters like light, heat, and electron-beam radiation; and software-based diagnostics, which use tester data and simulation.

I’ve aimed the discussion in this article at digital logic components, for example application-specific products and microprocessors. While memory components have the same localization needs as logic, bit-fail maps generally offer a straightforward solution.

Figure B. Projected increase [2] in relative complexity versus minimum defect size (1); relative increase in localization difficulty (2). Panel 1 plots the relative increase in complexity and the relative decrease in minimum defect size against year, 1995 to 2010; panel 2 plots complexity divided by defect size over the same years.

When these techniques cannot determine the material composition of the defect or are unable to locate a defect altogether, we turn to a suite of more sophisticated chemical and material analysis tools, for example transmission electron microscopy (TEM), Auger electron spectroscopy (AES), and electron spectroscopy for chemical analysis (ESCA).

Hardware diagnostics

Hardware diagnostic techniques fall into two broad categories. The first is the direct, passive observation of a physical phenomenon associated with the defect and its effects on the chip’s operation. The second is the active measurement of the chip’s response to an outside physical stimulus, which correlates to the instantaneous location of that stimulus at the time of response.

To perform passive observation, the failure analyst places the chip in its failed state and makes an image of it based on light emission, thermal effects, or the interaction of the chip with a focused beam of electrons. Techniques in use include photon emission microscopy (PEM), liquid-crystal hot-spot analysis, fluorescent microthermal imaging (FMI), and electron-beam or voltage contrast testing (EBT or VC). Figure 2 shows an entire chip imaged with photon emission microscopy while in a failed state. A single emission point is highlighted at or very near the defect site.

In active techniques, a laser, an electron beam, or an ion beam scans the device while the analyst monitors the appropriate chip input/output (I/O) ports. The resulting image correlates a change in signal on the I/O to the position of the scanned beam at the time of the change. Figure 3 is an example of a scanned-laser technique known as light-induced voltage alteration (LIVA), where the power supply voltage is monitored for changes caused by interaction of the laser beam with the defect site. Several articles offer details on hardware techniques [3-5].
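The scanned-beam idea can be sketched in a few lines. The example below is a toy model and not a description of any real LIVA tool: `scan_image`, `brightest_pixel`, and `supply_voltage_shift` are hypothetical names, the die is reduced to a 32 × 16 grid, and the defect is assumed to shift the monitored supply voltage by 50 mV only when the beam sits directly on it.

```python
import random

def scan_image(width, height, measure):
    """Raster-scan the beam and record the monitored signal at each position."""
    return [[measure(x, y) for x in range(width)] for y in range(height)]

def brightest_pixel(image):
    """Return the (x, y) beam position with the largest recorded signal change."""
    best = max(
        (value, x, y)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )
    return best[1], best[2]

# Hypothetical chip model (an assumption for illustration only).
DEFECT = (12, 5)
rng = random.Random(0)

def supply_voltage_shift(x, y):
    """Supply-voltage change in volts while the beam dwells at (x, y)."""
    noise = rng.uniform(0.0, 0.002)              # small measurement noise
    return noise + (0.05 if (x, y) == DEFECT else 0.0)

image = scan_image(32, 16, supply_voltage_shift)
located = brightest_pixel(image)
print("largest supply-voltage change at beam position", located)
```

The image is simply the monitored signal indexed by beam position, so the defect site appears as the pixel where the signal changed most.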

There are many considerations for the use of hardware diagnostics. Most of these techniques require significant investment in tooling and in personnel experienced in chip architecture, testing, and the technique itself. In return, hardware diagnostics can sometimes isolate a fault directly to the defect site. These techniques, however, have two primary limitations.

The first is their defect dependence. Not all defects emit light or cause localized heating. Some are not light sensitive, nor will they cause a signal change that can be imaged with an electron beam. As such, failure analysis engineers must often apply a series of techniques, not knowing ahead of time what the defect mechanism is. Because of this, engineers often need considerable time to localize a defect.

The second and most serious limitation of hardware diagnostics is the necessity for access to the chip’s transistors and internal wiring. In every case, the appropriate detection equipment or stimulating beam must be able to view or irradiate the site of interest, respectively. Figure 4 shows a package containing a front-side-accessible die with wire-bonded I/O located around the periphery of the chip. The package can be inserted into a customized socket and used directly with any hardware technique.

However, as wiring density and I/O count increase rapidly, the die’s internal circuitry becomes obscured. Flip-chip and direct-chip-attach (DCA) packaging conceal the surface of the chip altogether. In Figure 5, the chip’s surface is buried between the silicon die substrate and the module substrate. Clearly, only hardware techniques that are operable from the back of the die are feasible for parts of this type.

Figure 2. Photon-emission-microscopy image of an entire chip under bias with the emitting area circled.

Figure 3. Light-induced voltage alteration (LIVA) image with defective circuit nets highlighted. (Courtesy Sandia National Laboratories, 1994)

Fortunately, silicon is transparent to infrared light under the right conditions. This makes the use of photon emission microscopy and scanned-laser techniques possible from the back side. Researchers are also developing drilling, milling, and other silicon-thinning techniques to improve back-side access and signal integrity. Thermal and electron beam techniques, however, are obsolete without significant innovation and development. Defects that can only be localized by these methods may become undetectable.

Because these techniques are limited to the back side of the die, hardware diagnostic capability may be restricted to defect mechanisms that emit infrared light or are sensitive to it. Certainly this severely reduces the overall utility of these methods across a broadening range of failure mechanisms.
Software diagnostics

Software techniques rely on the combination of fault simulation and chip design data to determine probable fault locations. While manual analysis of failing patterns can yield such a determination, this is impractical for parts of even moderate complexity. Software diagnostics generally fall into two groups that both involve simulation of faults and test results: precalculated fault dictionaries and post-test fault simulation.

Figure 4. Microprocessor package with front-side-accessible die surface.

Figure 5. Microprocessor package limited to back-side die access.

Figure 6. Software diagnostic results showing failing net list.

    Summary of results for: PN-0000000 EC=000000 SER#=00000000 Pass#=0 Part name = 0000
    #Expected defects=1  All fails explained=YES  All passes explained=NO
    Failing Interval Data
    Fault candidates for defect#=1

    Fault  BlockID  EDSNAME           LOC  Type
    1      80194    RS025AW.A              1-SA0
    2      85676    RS075FE10              0-SA0
           85676    RS075FE10.0110         0-SA1
           85676    RS075FE10              1-SA1
    3      128451   ZEM12FB10              0-SA0
           136897   RS013CE.100.L1IN  L1   1-SA0
           128450   ZEM12FB.0110           0-SA1
           128451   ZEM12FB10              1-SA0
           80198    RS025AW10              0-SA0

Precalculated fault dictionaries are typically built during test generation, before parts are manufactured. By simulating faults at all known circuit nodes on an IC, test engineers compile a list, or dictionary, of expected outputs for each fault [6]. Engineers then use this dictionary on actual failures to determine probable faults and their locations based on failing outputs. As complexity increases, however, dictionaries of this type become impractical and their computing requirements prohibitive without enhancements. Aitken [7] and Eichelberger et al. [8] describe two such improvements.
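A minimal sketch of the dictionary idea follows, assuming a single stuck-at fault model and a made-up three-input circuit. The netlist, the net names, and the functions `simulate`, `signature`, `build_dictionary`, and `diagnose` are all hypothetical; production dictionaries are built by the test-generation system and are far larger.

```python
from itertools import product

# Hypothetical circuit: (output net, gate type, input nets) in topological order.
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "NOT", ("c",)),
    ("y1", "OR", ("n1", "n2")),
    ("y2", "AND", ("n1", "c")),
]
INPUTS = ("a", "b", "c")
OUTPUTS = ("y1", "y2")
GATES = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}

def simulate(pattern, fault=None):
    """Evaluate the circuit; fault is (net, stuck_value) or None."""
    values = dict(zip(INPUTS, pattern))

    def force(net):
        # Override the net with its stuck value if it is the faulty net.
        if fault is not None and fault[0] == net:
            values[net] = fault[1]

    for net in INPUTS:
        force(net)
    for net, kind, ins in NETLIST:
        values[net] = GATES[kind]([values[i] for i in ins])
        force(net)
    return tuple(values[o] for o in OUTPUTS)

PATTERNS = list(product((0, 1), repeat=len(INPUTS)))
GOOD = [simulate(p) for p in PATTERNS]

def signature(fault):
    """(pattern index, output) pairs where the faulty circuit mismatches."""
    fails = set()
    for i, p in enumerate(PATTERNS):
        for name, got, want in zip(OUTPUTS, simulate(p, fault), GOOD[i]):
            if got != want:
                fails.add((i, name))
    return frozenset(fails)

def build_dictionary():
    """Precompute the failing signature of every single stuck-at fault."""
    nets = list(INPUTS) + [gate[0] for gate in NETLIST]
    dictionary = {}
    for net in nets:
        for stuck in (0, 1):
            dictionary.setdefault(signature((net, stuck)), []).append((net, stuck))
    return dictionary

DICTIONARY = build_dictionary()

def diagnose(observed_fails):
    """Look up the faults whose precomputed signature matches the tester data."""
    return DICTIONARY.get(frozenset(observed_fails), [])

print(diagnose({(1, "y1"), (3, "y1"), (5, "y1")}))
```

In this toy circuit the lookup returns two candidates, `n2` stuck-at-1 and `y1` stuck-at-1, because those faults produce identical failing outputs; a dictionary can only resolve faults down to such equivalent groups.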

Post-test fault simulation takes place after the product is built and tested, on a fail-by-fail basis. The engineer restricts analysis to the outputs and corresponding circuits involved in a given failure. This approach significantly reduces the size of the potential fault list and subsequently the computing requirements. By structuring the design of the chip for greater internal controllability and observability with methods generally known as scan, designers can make fault simulation for diagnosis particularly efficient [8]. Using scan-designed and tested parts at IBM, we perform accurate and efficient diagnosis for a variety of failures. Included are deterministic stuck-at fault, weighted random pattern, delay, boundary scan I/O, embedded static RAM, and built-in self-test failures.
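One way to picture the restriction step is as a back-trace from the failing outputs through the logic that drives them. The sketch below is an illustration under stated assumptions, not IBM's diagnostic flow: `DRIVERS` is a made-up netlist, and the `single_fault` option (intersecting the cones of all failing outputs) is an added assumption that only holds when one defect explains every fail.

```python
# Hypothetical netlist: each net maps to the nets that drive it.
# Primary inputs have no drivers.
DRIVERS = {
    "a": (), "b": (), "c": (), "d": (),
    "n1": ("a", "b"),
    "n2": ("c", "d"),
    "y1": ("n1",),
    "y2": ("n1", "n2"),
    "y3": ("n2",),
}

def fanin_cone(output):
    """All nets that can influence the given output, found by back-tracing."""
    cone, stack = set(), [output]
    while stack:
        net = stack.pop()
        if net not in cone:
            cone.add(net)
            stack.extend(DRIVERS[net])
    return cone

def suspect_nets(failing_outputs, single_fault=False):
    """Nets worth fault-simulating for this particular failure."""
    cones = [fanin_cone(o) for o in failing_outputs]
    if not cones:
        return set()
    if single_fault:
        # One defect must reach every failing output.
        return set.intersection(*cones)
    return set.union(*cones)

print(sorted(suspect_nets(["y1", "y2"], single_fault=True)))
```

Only the faults on the returned nets need to be simulated, which is where the reduction in fault-list size and computing time comes from.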

We are also investigating software diagnostics for parts that draw excessive power supply current, or IDDQ. The use of simple fault models (pseudo stuck-at and bridging), a fault dictionary, and a large number of IDDQ measurements has shown promising results at IBM and elsewhere [9]. Initial failure analysis shows good correlation between the location predicted by a simulated IDDQ pass/fail signature and the physical defect location. We can localize most fails down to one or two logical gates using IDDQ diagnostic techniques, given a reasonable number of test vectors (for example, 100). We also improve stuck-at fault and delay diagnostics by combining them with IDDQ techniques.
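The signature-matching step can be sketched as follows. This is a deliberately simplified reading of the pseudo stuck-at model, assumed here for illustration: a vector is predicted to fail IDDQ whenever it drives the suspect net to the value opposite its stuck value, with no need to propagate the fault to an output. The circuit and the functions `net_values`, `iddq_signature`, and `rank_candidates` are hypothetical.

```python
from itertools import product

def net_values(a, b, c):
    """Fault-free value of every net in a hypothetical 3-input circuit."""
    n1 = a & b
    n2 = 1 - c
    return {"a": a, "b": b, "c": c, "n1": n1, "n2": n2, "y": n1 | n2}

VECTORS = list(product((0, 1), repeat=3))

def iddq_signature(net, stuck):
    """Predicted pass/fail (0/1) per vector under the simplified model."""
    return tuple(int(net_values(*v)[net] != stuck) for v in VECTORS)

def rank_candidates(observed):
    """Rank faults by how many vectors disagree with the measured signature."""
    scores = []
    for net in net_values(0, 0, 0):
        for stuck in (0, 1):
            predicted = iddq_signature(net, stuck)
            mismatches = sum(p != o for p, o in zip(predicted, observed))
            scores.append((mismatches, net, stuck))
    return sorted(scores)

# Measured IDDQ pass/fail results, one per vector (fails on the last two).
observed = (0, 0, 0, 0, 0, 0, 1, 1)
print(rank_candidates(observed)[0])
```

Ranking by mismatch count rather than demanding an exact match leaves room for noisy measurements; with more vectors, fewer candidates share the best score.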

Figure 6 shows a typical fault list generated with scan-based software diagnostics. Figure 7 shows the associated graphical display of suspect circuits containing physical coordinates for failure analysis. This provides what is essentially a bit-fail map for logic to guide the inspection step of the failure analysis process.

While there are costs associated with structured scan design and test, its diagnosis capabilities for failure analysis have tremendous value. Resolution is usually good enough to allow deprocessing or inspection after only a few hours of diagnosis time. The analysis can be performed without in-depth knowledge of chip design, test, and architecture. Most important, the technique is largely independent of defect and technology type, and the internal chip circuitry need not be physically accessible.

Figure 7. Failing nets plotted on graphical display of chip layout.

Table 1. Diagnostic technique comparison.

Diagnostic  Physical      Fault                Relative time  Typical
technique   die access    types                to localize    resolution  Costs
Hardware    Required      Technique-dependent  Long           Defect      Tools, personnel
Software    Not required  Most                 Short          Net         Data processing, design for test



Figure 8. Relationship between hardware and software diagnostics regarding die accessibility. The flowchart asks whether suitable software diagnostics are available and whether the front side of the die is accessible during test; its other recoverable boxes are a diagnostics step marked “in development” and physical failure analysis.




Discussion

Table 1 summarizes key qualitative factors associated with hardware and software diagnostics. Fault type, relative time to localize, resolution, and cost are all important considerations. The most critical, however, is die accessibility.

Figure 8 shows the relationship between hardware and software diagnostics as it relates to accessibility. With suitable software diagnostics, inspection of the identified circuits can begin immediately once a fault list is produced. Hardware diagnostics require access to internal circuit nodes, however. Current hardware techniques depend on front-side access, which is not available in an increasing number of designs. Back-side hardware techniques are in development, but these limit the types of defects that can be localized. Consequently, as die access disappears, hardware diagnostics will function as a complement to software diagnostics, which are easier to use, can localize a broader spectrum of faults, and do not require die access.

THE VALUE OF ROOT-CAUSE DEFECT identification by electrical and physical failure analysis is immense. When we can accomplish rapid corrective actions confidently, yield and reliability learning, time-to-market, and end-customer satisfaction improvements are direct benefits of the process.

Fault localization is a critical step in the process of analyzing a failure to its root cause. With localization difficulty increasing about an order of magnitude every six years, the ability to find faults on complex ICs is in jeopardy.
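As a rough worked example of what that rate implies, the short calculation below treats "an order of magnitude every six years" as smooth exponential growth from the 1995 baseline. The smooth-growth assumption and the function name `relative_difficulty` are illustrative; the article gives only the approximate trend.

```python
YEARS_PER_TENFOLD = 6.0   # "about an order of magnitude every six years"

def relative_difficulty(year, base_year=1995):
    """Localization difficulty relative to the base year, assuming smooth growth."""
    return 10 ** ((year - base_year) / YEARS_PER_TENFOLD)

annual_growth = relative_difficulty(1996)   # growth factor for a single year
by_2010 = relative_difficulty(2010)         # end of the projection period

print(f"per year: x{annual_growth:.2f}, 1995 to 2010: x{by_2010:.0f}")
```

Under this assumption the difficulty grows by a little under 50 percent per year and by a factor of roughly 300 over the 15-year projection.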

Today, hardware and software diagnostics exist as independent solutions. But as front-side die access disappears, hardware techniques will become significantly limited in scope. In the future, back-side, infrared optical techniques will merely complement mostly superior software methods.

Efficient software diagnostics require an aggressive design-for-test and diagnostic strategy. Such a strategy is threatened, however, by the microelectronics industry’s strong focus on cost reduction in the design and test phases of product development. Designers, program managers, and quality, failure analysis, and test engineers must work as a team to recognize and optimize the return on investment that diagnosability provides. Externally, the design, test, and failure analysis communities must work with academia, the national laboratories, and equipment suppliers to develop new software and hardware solutions for increasingly complex ICs and more subtle failure mechanisms. Together, we can discover techniques that provide an optimal balance between cost and performance such that we can continue to conclusively find and fix chip failures.

References

1. S.M. Kudva et al., “The Sematech Failure Analysis Roadmap,” Proc. 21st Int’l Symp. Testing and Failure Analysis, ASM Int’l, Materials Park, Oh., 1995, pp. 1-5.

2. The National Technology Roadmap for Semiconductors, Semiconductor Industry Association, San Jose, Calif., 1994.

3. R.E. Anderson et al., “Future Technology Challenges for Failure Analysis,” Proc. 21st Int’l Symp. Testing and Failure Analysis, 1995, pp. 27-32.

4. T.S. Taylor et al., “Leakage Detection Techniques: A Comparative Study,” Proc. 15th Int’l Symp. Testing and Failure Analysis, 1989, pp. 5-13.

5. B. Bossman et al., “Failure Analysis Techniques with the Confocal Laser Scanning Microscope,” Proc. 18th Int’l Symp. Testing and Failure Analysis, 1992, pp. 351-361.

6. H.Y. Chang et al., Fault Diagnosis of Digital Systems, Wiley Interscience, New York, 1970, pp. 126-140.

7. R.C. Aitken, “Finding Defects with Fault Models,” Proc. Int’l Test Conf., IEEE Computer Society Press, Los Alamitos, Calif., 1995, pp. 498-505.

8. E.B. Eichelberger et al., Structured Logic Testing, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1991, pp. 112, 167.

9. R.C. Aitken, “Fault Location with Current Monitoring,” Proc. Int’l Test Conf., IEEE CS Press, 1991, pp. 623-632.

David P. Vallett is an advisory engineer at IBM Microelectronics Division in Essex Junction, Vermont, where he is responsible for the strategic development of failure analysis tools and techniques. His research interests include back-side fault localization and characterization methods for silicon CMOS ICs. He has authored several papers and conference presentations and is currently chair of the Sematech Product Analysis Forum.

Vallett holds a BS degree in electrical engineering from the State University of New York at Buffalo and is a member of IEEE and Tau Beta Pi, the National Engineering Honor Society.

Direct questions concerning this article to the author at IBM Microelectronics, Dept. 382, Bldg. 967-2, Essex Junction, VT 05452-4299; [email protected].


