TCAD BIST Launch on Capture Paper 4885 091209 R69


4885 – P. 1 To Appear in IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS

Abstract—This paper presents a new at-speed logic

Built-In Self-Test (BIST) architecture supporting two

launch-on-capture schemes, namely aligned

double-capture and staggered double-capture, for

testing multi-frequency synchronous and asynchronous

clock domains in a scan-based BIST design. The proposed architecture also includes BIST debug and

diagnosis circuitry to help locate BIST failures. The

aligned scheme detects and allows diagnosis of

structural and delay faults among all synchronous clock

domains, whereas the staggered scheme detects and

allows diagnosis of structural and delay faults among all

asynchronous clock domains. Both schemes solve the

long-standing problem of using the conventional one-hot

scheme which requires testing each clock domain one at

a time or the simultaneous scheme which requires

adding isolation logic to normal functional paths across

interacting clock domains. Physical implementation is

easily achieved by the proposed solution due to the use of a slow-speed, global scan enable signal and reduced

timing-critical design requirements. Application results

for industrial designs demonstrate the effectiveness of

the proposed architecture.

Manuscript received May 12, 2008; revised November 12, 2008.

Laung-Terng Wang is with Dept. of Electrical Engineering and

Graduate Institute of Electronics Engineering at National Taiwan

University, and SynTest Technologies, Inc., 505 S. Pastoria Ave., Suite

101, Sunnyvale, CA 94086, USA (1-408-720-9956x200; fax:

1-408-720-9960; e-mail: [email protected]).

Xiaoqing Wen is with Dept. of Creative Informatics, Kyushu Institute of

Technology, Iizuka, Fukuoka 820-8502, Japan (e-mail:

[email protected]).

Shianling Wu is with SynTest Technologies, Inc., Princeton Junction,

NJ 08550, USA (e-mail: [email protected]).

Hiroshi Furukawa is with Dept. of Creative Informatics, Kyushu

Institute of Technology, Iizuka, Fukuoka 820-8502, Japan (e-mail:

[email protected]).

Hao-Jan Chao is with SynTest Technologies, Inc., 2F, No. 27, Industry

E. Road 9, Science-Based Industrial Park Hsinchu, Taiwan (e-mail:

[email protected]).

Boryau Sheu was formerly with SynTest Technologies, 505 S. Pastoria

Ave., Suite 101, Sunnyvale, CA 94086, USA (e-mail:

[email protected]).

Jianghao Guo is with Dept. of Electrical and Computer Engineering,

University of Cincinnati, OH 45221, USA (e-mail: [email protected]).

Wen-Ben Jone is with Dept. of Electrical and Computer Engineering,

University of Cincinnati, OH 45221, USA (e-mail: [email protected]).

Digital Object Identifier 09.0909/TCAD.2009.826558

Index Terms—Aligned Double-Capture, At-Speed

Self-Test, Double-Capture, Launch-on-Capture, Logic

BIST, Staggered Double-Capture

I. INTRODUCTION

LOGIC Built-In Self-Test (BIST) [1]-[7] is a

Design-for-Testability (DFT) technique in which a

portion of a circuit on a chip, board, or system is used to test

the digital logic circuit itself. Logic BIST is crucial for many

applications, in particular, for life-critical and

mission-critical applications. These applications commonly

found in the aerospace/defense, automotive, banking,

computer, health care, networking, and telecommunications

industries require on-chip, on-board, in-system, and in-field

self-test to ensure the reliability of the entire system, as well

as its ability to perform remote test and diagnosis.

The logic BIST technique widely used in industry is based

on the STUMPS (Self-Test Using a MISR and Parallel Shift

register sequence generator) structure [8]. In the STUMPS

architecture, a Pseudorandom Pattern Generator (PRPG) is

used to generate pseudorandom patterns and shift each

pattern in parallel to the inputs of the scan chains embedded

in a scan-based design, and a Multiple-Input Signature

Register (MISR) is used to compact the test responses

shifted out of the scan chain outputs to create a signature.

After a pre-determined number of test cycles are executed,

the final signature is then compared against an embedded

golden (good circuit) signature to judge whether the circuit

under test (CUT) passes or fails. As no test patterns are supplied externally, logic BIST can reduce test cost and also

allow the circuit to perform in-field self-test.
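The STUMPS flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 4-bit register width, the feedback taps, the seed, and the toy `good_cut` and `faulty_cut` functions are all assumptions chosen for illustration, and scan shifting is abstracted away so that each PRPG state is applied as one parallel pattern.

```python
def lfsr_step(state: int, taps: int = 0b1001, width: int = 4) -> int:
    """Advance a Fibonacci LFSR one cycle; `taps` masks the fed-back stages."""
    feedback = bin(state & taps).count("1") & 1  # parity of the tapped bits
    return ((state << 1) | feedback) & ((1 << width) - 1)


def misr_step(state: int, response: int, taps: int = 0b1001, width: int = 4) -> int:
    """Advance a MISR: one LFSR step, then XOR in the parallel response bits."""
    return lfsr_step(state, taps, width) ^ (response & ((1 << width) - 1))


def run_bist(cut, n_patterns: int, seed: int = 0b1011) -> int:
    """Apply n_patterns pseudorandom patterns to `cut` and return the signature."""
    prpg, signature = seed, 0
    for _ in range(n_patterns):
        prpg = lfsr_step(prpg)                       # PRPG produces the next pattern
        signature = misr_step(signature, cut(prpg))  # MISR compacts the response
    return signature


def good_cut(pattern: int) -> int:
    """Hypothetical fault-free combinational logic (a Gray-code encoder)."""
    return pattern ^ (pattern >> 1)


def faulty_cut(pattern: int) -> int:
    """The same logic with output bit 0 stuck at 0."""
    return good_cut(pattern) & 0b1110


golden = run_bist(good_cut, n_patterns=8)  # embedded golden signature
passed_good = run_bist(good_cut, n_patterns=8) == golden
passed_faulty = run_bist(faulty_cut, n_patterns=8) == golden
print("good device passes:", passed_good)
print("faulty device passes:", passed_faulty)
```

Because only the final signature is compared, the pass/fail decision needs no externally supplied patterns, which is the property the text attributes to logic BIST.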

While logic BIST offers many benefits, its real value is in

providing at-speed testing for high-speed and

high-performance circuits. These circuits often contain

multiple clock domains, each running at a frequency that is

either synchronous or asynchronous to the other clock

domains. Two clock domains are said to be synchronous if

the active edges of both clocks controlling the two clock

domains can be aligned precisely or triggered simultaneously. Two clock domains are said to be

asynchronous if they are not synchronous.

Using Launch-on-Capture for Testing BIST Designs Containing Synchronous and Asynchronous Clock Domains

Laung-Terng Wang, Fellow, IEEE, Xiaoqing Wen, Senior Member, IEEE, Shianling Wu, Member, IEEE, Hiroshi Furukawa, Hao-Jan Chao, Member, IEEE, Boryau Sheu, Jianghao Guo, and Wen-Ben Jone, Senior Member, IEEE

Despite its conceptual simplicity, logic BIST faces many

practical hurdles, especially in at-speed testing for

multi-clock, multi-frequency circuits. Each clock in such a

circuit controls a clock domain, whose clock skew is

minimized and which runs at a frequency either synchronous

or asynchronous to other clock domains. The most critical

yet difficult part of logic BIST is how to detect intra-clock-domain faults and inter-clock-domain faults

thoroughly and efficiently with a proper capture-clocking

scheme. An intra-clock-domain fault originates at one

clock domain and terminates at the same clock domain. An

inter-clock-domain fault originates at one clock domain

but terminates at another clock domain.

Previous STUMPS-based logic BIST schemes [9]-[11]

have not been effectively applied in practice. This is

mainly due to the need to manipulate the test frequency when

the CUT contains asynchronous clock domains, and to the

difficulty of timing control and physical implementation.

Alternatively, a conventional at-speed BIST scheme using

one-hot clocking or simultaneous clocking would need to test one clock domain at a time, resulting in long test time, or

add isolation logic to normal functional paths across

interacting clock domains, resulting in fault coverage loss

across these clock domains.

These problems will be addressed in this paper, with a new

logic BIST architecture using launch-on-capture schemes -

aligned clocking and staggered clocking - that achieves

true at-speed test quality for any multi-clock,

multi-frequency design and that is easy for physical

implementation.

It should be noted that all above-mentioned

capture-clocking schemes are applicable to both BIST

designs and scan designs. The main difference is that for BIST designs no unknown (X) values are allowed to

propagate through the scan chains to reach the MISR.

Throughout this paper, we will assume that a

STUMPS-based architecture is used and that each clock

domain contains one test clock and one scan enable signal.

The faults we will consider for comparison include

structural faults, such as stuck-at faults and bridging faults,

as well as timing-related delay faults, such as path-delay

faults and transition faults.

The rest of the paper is organized as follows: Section 2

describes the background. Section 3 presents the logic BIST

architecture. Section 4 discusses at-speed timing control

issues, and Section 5 focuses on physical implementation issues. Section 6 shows results on several industrial designs,

and Section 7 concludes the paper.

II. BACKGROUND

There are two basic capture-clocking schemes for testing

multiple clock domains at-speed: (1) skewed-load (which is

now commonly called launch-on-shift [LOS]) [12] and (2)

double-capture (which was called broad-side in [13] but is

now commonly called launch-on-capture [LOC]). Both

schemes are helpful for detecting structural faults and delay

faults within each clock domain (called intra-clock-domain

faults) or across clock domains (called inter-clock-domain

faults). Skewed-load uses the last shift clock pulse followed

immediately by a capture clock pulse to launch a transition

and capture its output test response, respectively.

Double-capture uses two consecutive capture clock pulses to

launch the transition and capture the output test response,

respectively. In either scheme, both launch and capture clock

pulses must be running at the domain’s operating speed, or at-speed. The difference is that skewed-load requires the

domain’s scan enable signal SE to switch its value between

the launch and capture clock pulses, making SE function as a

clock signal. Figure 1 shows sample waveforms using the

basic skewed-load and double-capture at-speed test

schemes.
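The practical consequence of this difference can be sketched numerically. The following Python model is only an illustration under simplifying assumptions: edge times are in arbitrary units, setup and hold times are ignored, and the names `AtSpeedTest`, `skewed_load`, and `double_capture` are hypothetical. It computes how much time SE has to switch in each scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtSpeedTest:
    launch_edge: float       # time of the launch clock edge
    capture_edge: float      # time of the capture clock edge
    se_switch_budget: float  # time SE has to change from shift to capture mode


def skewed_load(last_shift_edge: float, period: float) -> AtSpeedTest:
    """Launch-on-shift: the last shift pulse launches, so SE must switch
    between the launch edge and the capture edge, i.e. within one period."""
    return AtSpeedTest(last_shift_edge, last_shift_edge + period, period)


def double_capture(last_shift_edge: float, dead_time: float, period: float) -> AtSpeedTest:
    """Launch-on-capture: SE switches during the dead cycles that precede
    the two at-speed capture pulses."""
    launch = last_shift_edge + dead_time
    return AtSpeedTest(launch, launch + period, dead_time)


los = skewed_load(last_shift_edge=0.0, period=2.0)
loc = double_capture(last_shift_edge=0.0, dead_time=100.0, period=2.0)
print("SE budget, skewed-load:   ", los.se_switch_budget)
print("SE budget, double-capture:", loc.se_switch_budget)
```

In both cases the launch-to-capture interval equals the at-speed period; only the time available to SE differs, which is why double-capture can use a slow-speed scan enable.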

Typically, testing a scan-based BIST design based on

skewed-load for at-speed delay fault testing can achieve

higher fault coverage with shorter test length [14]-[20].

Although some novel DFT techniques as proposed in [21]

and [22] have addressed the timing problem of operating the

scan enable signal SE at-speed for each clock domain,

skewed-load can cause unwanted over-testing because more false paths can be exercised, and incur higher

implementation cost associated with the at-speed scan

enable signal SE. This is in sharp contrast to double-capture

in which only a slow-speed, global scan enable signal GSE

for all clock domains is needed.

Therefore, this paper will focus on logic BIST architecture

based on double-capture. There are two known at-speed

capture-clocking schemes: (1) one-hot double-capture that

conducts capture for one clock domain at a time and (2)

simultaneous double-capture that allows testing to be

performed on all clock domains in parallel.

Fig. 1. Basic at-speed test schemes: (a) skewed-load (a.k.a. launch-on-shift); (b) double-capture (a.k.a. launch-on-capture).

A. One-Hot Double-Capture

The one-hot double-capture scheme tests clock domains

one by one. A sample timing diagram for two clocks is

shown in Fig. 2. The main advantages are that (1) two

consecutive capture pulses are applied (C1-followed-by-C2

or C3-followed-by-C4) at their respective clock domains’

frequencies (of period d1 or d2) to test intra-clock-domain

delay faults, and (2) a single, slow-speed global scan enable

signal GSE is used to drive both clock domains.

Fig. 2. One-hot double-capture.

Hence, this scheme can be used for true at-speed testing of

intra-clock-domain delay faults in both synchronous and

asynchronous clock domains. However, this scheme suffers

from two drawbacks in that (1) it cannot be used to detect

inter-clock-domain delay faults and (2) it has long test time.
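The test-time penalty can be estimated with a small Python sketch. It assumes, as a simplification not stated in the paper, that every capture window is preceded by one full shift window of fixed length; the function names and the numbers are hypothetical.

```python
def one_hot_rounds(domains, windows_per_domain):
    """One-hot: every shift/capture round pulses exactly one clock domain."""
    return [[d] for _ in range(windows_per_domain) for d in domains]


def simultaneous_rounds(domains, windows_per_domain):
    """Simultaneous: every shift/capture round pulses all clock domains."""
    return [list(domains) for _ in range(windows_per_domain)]


def session_time(rounds, shift_cycles, shift_period, capture_window):
    """Total time when each round is one shift window plus one capture window."""
    return len(rounds) * (shift_cycles * shift_period + capture_window)


domains = ["CK1", "CK2"]
one_hot = session_time(one_hot_rounds(domains, 1000), 200, 20, 100)
simultaneous = session_time(simultaneous_rounds(domains, 1000), 200, 20, 100)
print("one-hot / simultaneous test time:", one_hot / simultaneous)
```

Under these assumptions the one-hot test time grows linearly with the number of clock domains, since each domain needs its own set of shift windows.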

B. Simultaneous Double-Capture

The long test time problem of one-hot double-capture can

be resolved by using the simultaneous double-capture

scheme illustrated in Fig. 3. The simultaneous

double-capture scheme allows testing to be performed on

all clock domains in parallel. However, since data may

propagate from one clock domain to the other clock domain,

isolation logic (or capture-disabling circuitry), such as

AND/OR gates or multiplexers, must be added at the

sources, sinks, or along the normal functional paths to force

all inter-clock-domain paths to exhibit constant values of 0’s

or 1’s at the receiver side.

Fig. 3. Simultaneous double-capture.

The major advantages of this approach are that (1) all

intra-clock-domain (structural and delay) faults can be tested

simultaneously thus yielding much shorter test time and (2)

there is no need to worry about the clock skew issue between

any two clock domains, be they synchronous or

asynchronous. This approach, however, requires that

isolation logic be inserted across all interacting clock

domains so each clock domain can be tested independent of

all other clock domains. The insertion of isolation logic into the domain boundary exposes the design to one major

drawback which is not present in one-hot double-capture:

the added circuitry to isolate all interacting clock domains

may increase the propagation delay of the design in normal

mode and will prevent all inter-clock-domain faults -

structural faults and delay faults - from being detected.
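The coverage loss caused by isolation logic can be seen in a one-bit Python model. This is an illustrative sketch only: it assumes the isolation gate is an AND gate that forces the crossing signal to constant 0 in test mode, and the function names are hypothetical.

```python
from typing import Optional


def crossing_value(sender: int, isolate: bool, stuck_at: Optional[int] = None) -> int:
    """Value seen by the receiving domain on an inter-clock-domain path.

    `stuck_at` models a stuck-at fault on the path before the isolation gate;
    `isolate=True` models an AND gate forcing the path to constant 0.
    """
    value = sender if stuck_at is None else stuck_at
    return 0 if isolate else value


def fault_detectable(isolate: bool, stuck_at: int) -> bool:
    """True if some sender value makes the faulty and fault-free paths differ."""
    return any(
        crossing_value(s, isolate) != crossing_value(s, isolate, stuck_at)
        for s in (0, 1)
    )


print("with isolation:   ", fault_detectable(isolate=True, stuck_at=1))
print("without isolation:", fault_detectable(isolate=False, stuck_at=1))
```

With the gate active, the receiver sees the same constant whether or not the path is faulty, so no fault on the crossing path can affect the captured response.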

As neither the one-hot double-capture scheme nor the simultaneous

double-capture scheme can detect inter-clock-domain

delay faults, the main focus of this paper is how to preserve

the benefits of both schemes while removing their

drawbacks. In the following, we propose a novel

logic BIST architecture and launch-on-capture schemes for

detecting intra-clock-domain faults and inter-clock-domain

faults in a multi-frequency circuit containing both

synchronous and asynchronous clock domains. The

proposed launch-on-capture (or double-capture) schemes

are intended to increase the fault coverage of the circuit. The

new architecture implemented with the schemes is intended

to facilitate physical implementation as well as debug and diagnosis of ASIC devices from the system level down to the

chip level [23]-[26]. The architecture requires a BIST-ready

core that has complied with all scan and BIST design rules.

III. LOGIC BIST ARCHITECTURE USING

LAUNCH-ON-CAPTURE

A. General Architecture

The proposed logic BIST architecture is illustrated in Fig.

4. The BIST architecture for testing the BIST-ready core

consists of a test pattern generator (TPG) for generating test

stimuli, an input selector for providing pseudorandom or top-up ATPG patterns for the core-under-test, an output

response analyzer (ORA) for compacting test responses, a

clock gating block (discussed in Section IV-C) for

generating test clocks from original or functional clocks, and

a BIST controller for coordinating the whole BIST

operation. The top-up ATPG patterns can include

compressed ATPG patterns to improve the circuit’s fault

coverage during manufacturing test, when the

combinational-logic-based scan compression architecture as

proposed in [27]-[29] is embedded in the design. The test

clocks are placed in a predetermined ordered sequence (see

Section IV) so that single-capture or double-capture clock pulses can be supplied to the BIST-ready core. The self-test

operation is started by asserting the Start signal, its end is

indicated by the Finish signal, and its result is shown by the

Result signal. A standard IEEE 1149.1 Boundary-Scan

interface under the control of the test access port (TAP)

controller is used for loading initialization and configuration

data or for downloading internal states for fault diagnosis.

Fig. 4. Logic BIST architecture.

B. BIST-Ready Core

The BIST-ready core is a full-scan circuit (scan design)

that satisfies all scan design rules. Additional circuitry may

need to be added for preventing bus conflicts at tri-state

buses and for disabling asynchronous set/reset signals and

false paths. In addition to scan design rules, the BIST-ready

core must also satisfy all BIST-specific design rules, such as

X-blocking and test point insertion (TPI). X-blocking,

which blocks all unknowns (X’s) from reaching the scan outputs, is conducted in an intelligent way so that critical

paths are avoided and that the X-blocking circuitry is placed

as close to the X-source as possible. TPI is guided by fault

simulation results. In addition, all multi-cycle paths and false

paths may be selected or blocked depending on test needs.

C. TPG Circuitry and ORA Circuitry

In general, clock skews between two interacting clock

domains in a BIST-ready core, as shown in Fig. 4, are not

aggressively managed. In order to avoid additional design

efforts for clock skew management in logic BIST, two

PRPG-MISR pairs, one for each clock domain, can be used,

even though both clock domains may operate at the same frequency. However, if hardware overhead is a major

concern, one PRPG-MISR pair can be used. Also, linear

phase shifters (PS1 and PS2), a.k.a. space expanders (SpE1 and SpE2), can be used to reduce the length of PRPGs, and

space compactors (SpC1 and SpC2) can be used to reduce the

length of MISRs.
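Both the phase shifter and the space compactor are XOR networks, which the following Python sketch illustrates. The tap assignments, the register sizes, and the sample bit values are assumptions made for the example, not values from the paper.

```python
from functools import reduce
from operator import xor


def xor_network(bits, groups):
    """Each output is the XOR of the input bits selected by one group of indices."""
    return [reduce(xor, (bits[i] for i in group), 0) for group in groups]


# Phase shifter (space expander): 4 PRPG stages drive 6 scan-chain inputs.
PHASE_SHIFTER = [(0,), (1,), (2,), (3,), (0, 1), (2, 3)]
# Space compactor: 6 scan-chain outputs feed 3 MISR inputs.
SPACE_COMPACTOR = [(0, 1), (2, 3), (4, 5)]

prpg_state = [1, 0, 1, 1]
scan_in = xor_network(prpg_state, PHASE_SHIFTER)

scan_out = [0, 1, 1, 1, 0, 0]  # hypothetical bits shifted out of the 6 chains
misr_in = xor_network(scan_out, SPACE_COMPACTOR)

print("scan-chain inputs:", scan_in)
print("MISR inputs:      ", misr_in)
```

In this example a 4-stage PRPG feeds six chains and a 3-input MISR observes them, so both registers are shorter than the number of scan chains.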

D. Test Control Circuitry

The test control circuitry consists of a BIST controller and a

clock gating block. The inputs to the clock gating block are

system clocks CK1 and CK2, which become CCK1 and CCK2

after going through some buffers. The two clocks CCK1 and

CCK2 are in fact used by the PRPG and MISR pair, as will be discussed in Section V. In addition, the clock gating block is

controlled by signals from the BIST controller to generate

test clocks TCK1 and TCK2. The timings of TCK1 and TCK2,

especially in capture mode, play a critical role in determining

the test capability and ease of physical implementation of

the logic BIST scheme. The BIST controller works in tandem

with an embedded TAP controller, which complies with the

IEEE 1149.1 Boundary-Scan standard to coordinate the test,

debug, and diagnosis tasks.

E. Debug and Diagnosis Circuitry

In addition to Logic BIST, diagnosis of the BIST-ready

core at the core level and then down to the failed scan chain

and signature cycle levels, to locate faulty scan cells and

logic gates, is also important. Many innovative BIST debug

and diagnosis approaches have been proposed and surveyed

in [5], [6], [30], and [31]. This subsection details a few

unique design for debug and diagnosis (DFD) features.

E-1. Core-Level Diagnosis

To facilitate test, debug, and diagnosis, each clock domain

(core) is embedded with a unique core-identifier (CID) bit to

decide whether this clock domain will be targeted or not [32].

A design for debug and diagnosis (DFD) circuitry including

the CID bit register will be surrounding all BIST cores that

have been synthesized with their respective logic BIST

controllers under boundary-scan control. There will be an

isolation wrapper for each core. With the objective to enable

end-to-end debug and diagnosis from core to in-field

applications, these CID bits are stitched to form a shift

register, a CID register, so during test, debug, and diagnosis,

these bits can be programmed on-the-fly and these cores are

tested either in series or in parallel.

In order to debug or diagnose logic BIST cores in an

integrated circuit, one first sets the CID bits of the BIST

cores to be diagnosed to all 1’s. These cores can then be

processed in parallel. In addition to single error diagnosis,

the CID register can further include additional bits in each

logic BIST core, when desired, to increase diagnostic

resolution for multiple errors that may arise from different

faults, such as stuck-type faults, delay faults, or bridging

faults.

There are a number of benefits with this CID approach.

First, it allows designers to enable or disable diagnosis of

selected BIST cores at any time. Second, it allows designers

to skip the failed BIST cores and focus on diagnosis of other BIST cores. Third, it allows designers to perform multiple

error diagnosis. Finally, it allows test engineers to manage

power consumption during production testing by selectively

choosing BIST cores for serial/parallel testing.
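The CID register can be pictured as a simple shift register with one select bit per core. The Python class below is a hypothetical model written for illustration; the class name, the core names, and the shift direction are assumptions, and the real register is accessed through the boundary-scan interface.

```python
class CidRegister:
    """One core-identifier (CID) bit per BIST core, stitched into a shift register."""

    def __init__(self, core_names):
        self.core_names = list(core_names)
        self.bits = [0] * len(self.core_names)

    def shift_in(self, bit):
        """Shift one bit in at position 0 and return the bit shifted out."""
        shifted_out = self.bits[-1]
        self.bits = [bit] + self.bits[:-1]
        return shifted_out

    def load(self, bits):
        """Serially load a full selection; bits[i] ends up at position i."""
        for bit in reversed(bits):
            self.shift_in(bit)

    def selected(self):
        """Names of the cores whose CID bit is 1, i.e. the cores to be targeted."""
        return [name for name, bit in zip(self.core_names, self.bits) if bit]


cid = CidRegister(["core_A", "core_B", "core_C"])
cid.load([1, 0, 1])  # target core_A and core_C, skip core_B
print(cid.selected())
```

Reprogramming the register between sessions is what allows cores to be diagnosed in series or in parallel, or a failed core to be skipped.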

E-2. Signature Diagnosis

The logic BIST (LBIST) architecture includes special DFD

circuitry to help locate BIST failures down to all failed

signature cycles. When BIST does not pass a manufacturing

test or a system test, designers can utilize the DFD circuitry to

enter the signature diagnosis mode by loading initial seeds

into the PRPG and the MISR, as well as the required number

of LBIST cycles to perform signature diagnosis.

When the logic BIST operation in the logic BIST

controller completes the selected LBIST cycles, the

controller will issue a cycle-end signal, halt the BIST

operation, and begin to wait for a continue signal to resume

its BIST operation and to reset the cycle-end signal. During

this period, all LBIST output responses are captured

repeatedly into a test-and-diagnosis (TDR) register to form

an intermediate signature that will be shifted out for analysis.

If the intermediate signature does not agree with the expected

signature during analysis, designers can further instruct the

TAP controller to inform the controller to issue the continue

signal to resume the logic BIST operation. The BIST

operations are repeated until designers have located the first failed BIST pattern (signature cycle) or all failed BIST

patterns.

This cycle-based signature diagnosis approach implies

that the contents of the TDR register are only sampled after a

cycle-end signal is received. If there are many BIST

operations to be performed in parallel, the TAP controller

must wait until all cycle-end signals have been generated. In

addition to loading the initial seeds for the PRPG and the

MISR, the TDR register shown in Fig. 5 stores additional

LBIST data including the golden signature for comparison

with the MISR, pattern counter, scan-chain mask, cycle-mask




start index, cycle-mask stop index, and programmable shift

and capture modes. Additional seeds can also be

pre-computed ahead of time. Those seeds are then loaded

into the BIST core to run multiple, short, logic BIST test

sessions to further help with diagnosis.

Fig. 5. Contents of the test-and-diagnosis (TDR) register.

There are a number of advantages in using this

test-and-diagnosis (TDR) register for signature diagnosis.

First, it can locate failed patterns. Second, it can further

locate multiple capture errors in the scan chain. Finally,

linear search or binary search can be employed to reduce the

diagnosis time.

Since the TDR register contains a pattern counter in

support of the cycle-based approach, the LBIST controller is

capable of launching test sessions with varying pattern counts,

i.e., a test session with its required test patterns derived from

pre-computed logic/fault simulation, or a test session with a

predetermined pattern count. For diagnosis purposes, a test

session can be of a single pattern, thereby allowing designers

to apply test patterns one-by-one to sort and identify failed

patterns for the entire test set. Alternatively, a binary search

can be used to speed up finding the failed patterns through

interactive test sessions between the device-under-debug and

the tester (or system). As the TDR register supports reseeding

of the PRPG/MISR, designers can adjust the starting and

ending points of a test session according to test and diagnosis

needs. The identification of failed patterns must be done

before proceeding to masked-chain or one-chain diagnosis to

minimize the time-consuming diagnosis effort.
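The binary search mentioned above can be sketched as follows. This is an illustrative Python sketch, not the authors' procedure: `device_signature` and `expected_signature` are hypothetical callables that return the intermediate signature after a given pattern count, and the search assumes that once the two signatures diverge they remain different for every larger pattern count (no aliasing).

```python
def first_failing_pattern(device_signature, expected_signature, n_patterns):
    """Return the 1-based index of the first failing pattern, or None if BIST passes."""
    if device_signature(n_patterns) == expected_signature(n_patterns):
        return None
    low, high = 1, n_patterns  # invariant: the signatures differ at `high`
    while low < high:
        mid = (low + high) // 2
        if device_signature(mid) != expected_signature(mid):
            high = mid      # first failure is at or before mid
        else:
            low = mid + 1   # first failure is after mid
    return low


sessions = []  # pattern counts of the test sessions actually run


def expected(k):
    return (k * 7) % 251  # stand-in for simulated golden signatures


def device(k):
    sessions.append(k)
    # Hypothetical device whose signature is corrupted from pattern 37 onward.
    return expected(k) if k < 37 else expected(k) ^ 0x100


first_fail = first_failing_pattern(device, expected, n_patterns=100)
print("first failing pattern:", first_fail, "found in", len(sessions), "sessions")
```

Each probe corresponds to one test session with the pattern counter set to `mid`, so the number of sessions grows logarithmically with the pattern count instead of linearly.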

E-3. Masked-Chain Diagnosis

The DFD circuitry also includes the XYZ diagnosis

structure proposed in [33] and the improved masking

methods developed in [34] to locate the failed scan chain(s)

and the associated failed scan cells in each failed scan chain.

An LBIST TDR register for debug or diagnosis of LBIST

failures is embedded in each core. The XYZ operation using

the contents of the TDR register shown in Fig. 5 in support of

XYZ diagnosis is illustrated in Fig. 6, where the X field

indicates which scan chain(s) to mask; Y and Z fields indicate

which ranges of scan cells in the scan chain(s) to mask.

For n scan chains, there will be n bits in the X field and n

AND gates in the masked-chain diagnosis (MCD) logic

feeding the MISR. Setting the i-th bit of the X field to 0, X(i)=0,

where 0<i<n, causes the MISR to receive a constant 0 for the

i-th chain. The method of feeding constant values to the MISR

is called a masking mechanism. With this embedded

scan-chain masking logic, designers can run test sessions in a

linear or binary search fashion to locate faulty chain(s).
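A minimal Python sketch of the masking mechanism and of a linear search over chains is given below. It is illustrative only: chains are indexed from 0 here, and `session_passes` is a hypothetical callback that runs one BIST session with the given X field and compares the resulting signature with the one expected for that mask.

```python
def mask_chains(chain_outputs, x_field):
    """AND each scan-chain output with its X bit; X(i)=0 feeds the MISR a constant 0."""
    return [out & x for out, x in zip(chain_outputs, x_field)]


def find_faulty_chains(n_chains, session_passes):
    """Linear search: unmask one chain per test session and record the failures."""
    faulty = []
    for i in range(n_chains):
        x_field = [1 if j == i else 0 for j in range(n_chains)]
        if not session_passes(x_field):
            faulty.append(i)
    return faulty


# Toy device in which chains 2 and 5 capture erroneous values.
bad_chains = {2, 5}


def toy_session_passes(x_field):
    return not any(x_field[i] for i in bad_chains)


print(mask_chains([1, 1, 0, 1], [1, 0, 1, 1]))
print(find_faulty_chains(8, toy_session_passes))
```

A binary search would instead unmask half of the remaining chains per session, which needs fewer sessions when only one chain is faulty.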

Fig. 6. Masked-chain diagnosis (MCD) logic diagram.

Once a faulty chain is identified, the next task is to locate

the faulty scan flip-flops within the chain. The Y and Z fields

specify the start and stop indices of the scan flip-flops. The

cycle-mask start index and cycle-mask stop index will allow

the logic BIST controller to skip the region (cycles) where

failed scan flip-flops reside so BIST diagnosis can still be

performed on the remaining fault-free scan cells. The

masked-chain diagnosis feature is also helpful for unknown

(X) masking in case the X-sources fall within a small fraction

of the scan index or range. The programmable shift and

capture modes are used, respectively, to shift the scan chains at a selected,

reduced frequency to avoid overheating and to select

the number of capture clock pulses for stuck-at or delay fault testing during capture.
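The effect of the Y and Z fields on one scan chain can be sketched in a few lines of Python. The inclusive index convention and the choice of 0 as the masked value are assumptions made for this example.

```python
def apply_cycle_mask(scan_out_bits, y_start, z_stop):
    """Replace the bits shifted out in cycles y_start..z_stop (inclusive) with 0."""
    return [
        0 if y_start <= cycle <= z_stop else bit
        for cycle, bit in enumerate(scan_out_bits)
    ]


# Cycles 2 and 3 hold failing (or unknown) scan cells and are kept out of the MISR.
masked = apply_cycle_mask([1, 1, 1, 1, 1, 1], y_start=2, z_stop=3)
print(masked)
```

The same range mask serves both purposes named in the text: skipping failed scan cells during diagnosis and blocking a localized X-source.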

E-4. One-Chain Diagnosis

The DFD circuitry further includes an additional test

mechanism to dump all scan values from the reconfigured

scan chains. Assume a failed pattern (signature cycle) has

been located during signature diagnosis or XYZ diagnosis.

The TAP controller can then load an instruction to the TDR

register and instruct the logic BIST controller to (1) set the

TDR register to one-chain diagnosis (OCD) mode, and (2)

shift out the contents of all scan cells in the scan chains which

are reconfigured as an OCD register for analysis. The

reconfigured OCD register is treated as one of the several data register chains of the TAP controller after the diagnostic

instruction register has been decoded.

As the number of failed patterns could be large, all failed

patterns are fault graded such that the number of diagnostic

patterns being applied in OCD mode can be reduced without

affecting the resolution of finding the root cause from the

failed combinational logic. Additional diagnostic patterns

can then be shifted into the scan chains to pinpoint the failed

combinational logic gates. This allows the BIST core to be

debugged and diagnosed at the board or system level,

enabling in-field diagnosis [35].

[Fig. 5 field labels, accessed serially between TDI and TDO: SEED (initial seeds for PRPG and MISR), SIG (final signature of MISR), PC (pattern counter), PSM (programmable shift mode), PCM (programmable capture mode), X (scan-chain mask), Y (cycle-mask start index), Z (cycle-mask stop index), P/F (pass/fail indicator).]

[Fig. 6 labels: PRPG, scan chains, scan-chain masking logic (AND gates), MISR, and the TDR fields SEED, SIG, PC, PCM, X, Y, Z, and P/F between TDI and TDO.]
IV. TEST TIMING CONTROL

This section proposes test timing control methods for

capture clocking. Techniques to improve fault coverage and

ease physical implementation are then discussed.

A. Basic Approach

The basic idea is to use an ordered sequence of capture

clocks for all clock domains in each capture window

[23]-[26]. The order can be properly selected based on test requirements.

A-1. Staggered Single-Capture

Single-capture is a slow-speed test technique in which

only one capture pulse is applied to each clock domain. It is

the simplest approach for detecting all intra-clock-domain

and inter-clock-domain structural faults. However, no

intra-clock-domain delay faults can be detected within each

clock domain. An example of slow-speed test timing control

is shown in Fig. 7, where test clocks TCK1 and TCK2 are

staggered and generated by the clock gating block shown in

Fig. 4. In this approach, capture pulses C1 and C2 are applied

in a sequential or staggered order within the capture window to test all intra-clock-domain and inter-clock-domain

structural faults in the two clock domains. For synchronous

clock domains, adjusting d2 will allow us to detect

inter-clock-domain delay faults between the two clock

domains at-speed.

For asynchronous clock domains, we often can only detect

inter-clock-domain structural faults, not inter-clock-domain

delay faults. As synchronization circuits (or two-flop

synchronizers) are mostly used as a handshaking protocol to

transfer data between two asynchronous clock domains in

normal mode [36]-[37], it is still possible to detect

inter-clock-domain delay faults as long as the delay d2 can be

properly adjusted (or programmed) via the logic BIST controller. Theoretically, a synchronizer can tolerate any

delay, so testing of an inter-clock-domain delay fault across

asynchronous clock domains appears to be not required.

However, an excessive delay in the “request” circuit

(including the synchronizer) or the data path across the two

asynchronous clock domains might not be allowed in a

design which implements a real-time system. Thus, by

adjusting the d2 value, the excessive delay can also be

detected by this clocking scheme. The value of d2 depends

on the circuit delay across the two asynchronous clock

domains under consideration, and on the allowed delay between

the two clock domains, which mainly depends on the system

requirement.

Since d1 and d3 can be as long as desired, a single,

slow-speed global scan enable signal GSE can be used. This

significantly simplifies the logic BIST physical

implementation for designs with multiple clock domains.

There may be some structural fault coverage loss between

clock domains if the ordered sequence of capture clocks is

fixed for all capture cycles. This is because the ordered

sequence may create sequentially untestable faults which

could be detected when the sequence is reversed.

Fig. 7. Timing control using staggered single-capture.
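The role of the programmable delay d2 can be illustrated with a small Python model. It is a sketch under simplifying assumptions that are not taken from the paper: a single inter-clock-domain path, a transition launched by C1 and captured by C2 exactly d2 time units later, and no setup-time margin. The function names are hypothetical.

```python
def captured_at_c2(old_value, new_value, path_delay, d2):
    """Value captured by the receiving domain at C2 when C1 launched a transition
    from old_value to new_value at the sending domain d2 time units earlier."""
    return new_value if path_delay <= d2 else old_value


def delay_fault_detected(path_delay, d2):
    """A rising transition is expected at C2; capturing the old 0 reveals the fault."""
    return captured_at_c2(0, 1, path_delay, d2) != 1


allowed_delay = 3.0  # delay the system tolerates across the two domains
print(delay_fault_detected(path_delay=3.2, d2=allowed_delay))  # path is too slow
print(delay_fault_detected(path_delay=2.5, d2=allowed_delay))  # path is fast enough
print(delay_fault_detected(path_delay=3.2, d2=50.0))           # slow-speed capture
```

Setting d2 to the allowed delay exposes an excessive delay, whereas a long d2 only exercises the path structurally, which matches the distinction drawn in the text.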

A-2. Aligned Double-Capture

The inability of staggered single-capture to detect

intra-clock-domain delay faults as well as simultaneous

double-capture to detect inter-clock-domain delay faults

among synchronous clock domains can be resolved by using

the aligned double-capture scheme. One aligned

double-capture approach that aligns all capture edges

together is illustrated in Fig. 8. The approach is referred to as

capture aligned double-capture. The major advantage of

using this approach is that all intra-clock-domain and

inter-clock-domain faults for synchronous clock domains

can be tested. The arrows shown in Fig. 8 indicate the delay

faults that can be detected. For example, the three arrows from TCK1 to C are used to test all intra-clock-domain delay

faults in the clock domain controlled by TCK1, and all

inter-clock-domain delay faults from TCK1 to TCK2 and

TCK3. The remaining six arrows, shown from TCK2 to C and from

TCK3 to C, are used to test the remaining delay faults.

Fig. 8. Capture aligned double-capture.

Since the active edges (rising edges) of the three capture

pulses (see vertical dashed line C ) must be aligned precisely,

the circuit must contain one reference clock, and the

frequency of all remaining test clocks must be derived from

the reference clock. In the example given here, TCK 1 is the

reference clock operating at the highest frequency, and TCK 2

and TCK 3 are derived from TCK 1 and designed to operate at 1/2 and 1/4 the frequency of TCK 1, respectively. Therefore,

this approach is only applicable for at-speed testing of

intra-clock-domain and inter-clock-domain delay faults in

synchronous clock domains.
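The edge placement described above can be sketched as follows. This is a minimal illustration, not the authors' clock generator: the function name, the 4 ns reference period, and the capture time of 100 ns are hypothetical, and only the 1, 1/2, and 1/4 frequency ratios come from the text. Each launch edge is placed one period of its own clock before the common capture edge C.

```python
from fractions import Fraction

def capture_aligned_edges(ref_period_ns, dividers, capture_time_ns):
    """Return {clock: (launch_edge_ns, capture_edge_ns)}.

    All domains capture at the same instant (dashed line C in Fig. 8);
    each launch edge sits one period of its own clock earlier.
    """
    capture = Fraction(capture_time_ns)
    edges = {}
    for name, divider in dividers.items():
        period = Fraction(ref_period_ns) * divider  # divided clocks have longer periods
        edges[name] = (capture - period, capture)
    return edges

# Hypothetical 4 ns reference period; dividers follow the 1, 1/2, 1/4 frequency ratios.
edges = capture_aligned_edges(4, {"TCK1": 1, "TCK2": 2, "TCK3": 4}, capture_time_ns=100)
for name, (launch, capture) in edges.items():
    print(f"{name}: launch at {launch} ns, capture at {capture} ns")
```

Because every launch edge is derived from the same reference period, the sketch also shows why the scheme depends on a single reference clock.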

A similar aligned double-capture approach is shown in

Fig. 9 that aligns all first capture edges rather than second

capture edges. This approach is referred to as launch

aligned double-capture. Similar to capture aligned

double-capture, it is also only applicable for at-speed testing

of intra-clock-domain and inter-clock-domain delay faults in

synchronous domains.





Consider the three clock domains, driven by TCK 1, TCK 2,

and TCK 3, again. The eight arrows among the dashed line C

and the three capture pulses, C 1, C 2, and C 3, indicate the

intra-clock-domain and inter-clock-domain delay faults that

can be tested. Unlike Fig. 8, however, in order to detect the

inter-clock-domain delay faults from TCK 1 to TCK 3, a

special shift pulse C 4 is required. As this method requires a

much more complex timing-control scheme, a clock

suppression circuit similar to that proposed in [38] needs to be used to enable or disable the selected capture pulses. The

dotted clock pulses shown in the figure indicate the

suppressed capture pulses.

The main advantages of both aligned double-capture

approaches are that (1) all intra-clock-domain faults and

inter-clock-domain faults can be detected and (2) a single,

slow-speed global scan enable signal GSE is used. Hence,

both approaches can be used for true at-speed testing of

synchronous clock domains. However, one major drawback

is that precise alignment of the capture pulses is still

required.

Fig. 9. Launch aligned double-capture.

A-3. Staggered Double-Capture

The staggered double-capture scheme solves the capture alignment problem in the aligned double-capture approach.

An example at-speed timing control diagram is shown in Fig.

10. In the capture window, two capture pulses are generated

for each clock domain. The first two capture pulses (C 1 and

C 3) are used to create transitions at the outputs of scan cells,

and the output responses to the transitions are captured by

the second two capture pulses (C 2 and C 4), respectively.

Both delays d 2 and d 4 are set to their respective domains’

operating frequencies. Since d 1, d 3, and d 5 can be adjusted to

any length, we can simply use a single, slow-speed global

scan enable signal GSE for driving all clock domains. Hence,

this approach can provide at-speed testing of

intra-clock-domain delay faults within each clock domain. Similar to staggered single-capture, for both synchronous

and asynchronous clock domains, adjusting d 3 will enable

detection of inter-clock-domain delay faults and

inter-clock-domain structural faults. Since a single GSE

signal is used, this scheme significantly eases physical

implementation and allows designers to integrate logic BIST

with scan/ATPG easily in order to improve the circuit’s

manufacturing fault coverage.
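As a rough illustration of the timing just described, the sketch below turns the five delays into pulse times for two clock domains. It assumes that d 1 to d 5 are consecutive intervals (window start to C 1, C 1 to C 2, C 2 to C 3, C 3 to C 4, and C 4 to the end of the capture window); the function name and the numeric values are hypothetical.

```python
from itertools import accumulate

def staggered_double_capture(d1, d2, d3, d4, d5):
    """Return pulse times measured from the start of the capture window.

    Assumes d1..d5 are consecutive intervals: window start to C1, C1 to C2,
    C2 to C3, C3 to C4, and C4 to the end of the window.
    """
    c1, c2, c3, c4, end = accumulate([d1, d2, d3, d4, d5])
    return {"C1": c1, "C2": c2, "C3": c3, "C4": c4, "window_end": end}

# Hypothetical values in ns: d2 and d4 are one at-speed period of each domain
# (4 ns and 8 ns here); d1, d3, and d5 are slack that GSE can absorb.
schedule = staggered_double_capture(d1=50, d2=4, d3=20, d4=8, d5=50)
print(schedule)
```

Only d2 and d4 are timing-critical in this model; lengthening d1, d3, or d5 moves the pulses without changing any launch-to-capture interval.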

Fig. 10. Timing control using staggered double-capture.

B. Fault Detection Capability

Modern VLSI designs often contain synchronous and

asynchronous clock domains. To maximize BIST (structural

and delay) fault coverage, this section discusses the fault

detection capability associated with each timing control

diagram.

B-1. Intra-Clock-Domain Fault Detection

Intra-clock-domain fault detection is relatively easy when

using an ordered sequence of capture clocks for all clock

domains in each capture window. For each clock domain, a

single clock pulse is used to detect structural faults in low-speed testing, while two at-speed clock pulses are used

to detect timing-related faults in at-speed testing. It is

preferable to use the double-capture scheme as it detects not

only structural faults but also timing-related faults.

B-2. Inter-Clock-Domain Fault Detection

Inter-clock-domain fault detection is more complex,

especially for timing-related delay faults. Figure 11 shows

four timing waveforms for detecting inter-clock-domain

faults from the clock domain driven by TCK 1 to the clock

domain driven by TCK 2.

If structural faults are to be detected, then delay d can be

adjusted to be larger than the clock-skew between the two

clock domains. This adjustment is easy. On the other hand, if

timing-related delay faults also need to be detected, then

delay d should be further adjusted to satisfy the specified

timing relation between the two clock domains. Generally,

the waveform of Fig. 11d can achieve higher

inter-clock-domain delay fault coverage since a pattern of

higher randomness is applied and fault effects can be

captured immediately.

We conducted an experiment using the logic BIST

product TurboBIST-Logic [39] developed by SynTest

Technologies to compare the transition delay fault detection

capabilities of the waveforms shown in Fig. 11. The circuit

consisted of 11K primitives (instances) and had three clock

domains driven by cclk , mclk , and pclk , respectively. For the

sake of clarity, we disabled cclk and only explored the fault

detection capabilities for the transition delay faults across

the clock-domain logic block between the clock domain

driven by mclk and the one driven by pclk . As shown in

Table I, the waveform shown in Fig. 11d can achieve the

highest inter-clock-domain transition fault coverage. If cclk

is enabled, one would expect the resulting transition fault

coverage to be much higher than that reported in Table I.






(a) Two capture pulses followed by two capture pulses

(staggered double-capture)

(b) Two capture pulses followed by one capture pulse

(c) One capture pulse followed by two capture pulses

(d) One capture pulse followed by one capture pulse

(staggered single-capture)

Fig. 11. Inter-clock-domain delay fault test timing.

TABLE I

INTER-CLOCK-DOMAIN DELAY FAULT DETECTION

CAPABILITY

Test Timing Fault Coverage

Fig. 11a 61.11%

Fig. 11b 61.11%

Fig. 11c 84.92%

Fig. 11d 87.70%

B-3. Fault Detection Summary

Tables II and III show the type of intra-clock-domain and

inter-clock-domain faults in synchronous and asynchronous

clock domains that can be detected by the above-mentioned

timing control schemes described in Sections II and IV.

Each scheme has its advantages and disadvantages. For

example:

(1) One-hot double-capture can yield the highest fault

coverage for intra-clock-domain delay faults, but cannot

detect inter-clock-domain delay faults.

(2) Simultaneous double-capture can only detect

intra-clock-domain structural and delay faults, but

cannot detect any inter-clock-domain structural or delay

faults.

(3) Staggered single-capture can yield the highest fault

coverage for detecting intra-clock-domain structural faults in both synchronous and asynchronous clock

domains, but cannot detect any intra-clock-domain

delay faults.

(4) Aligned double-capture can yield the highest fault

coverage for detecting all intra-clock-domain and

inter-clock-domain delay faults in synchronous clock

domains, but is not applicable for testing asynchronous

clock domains; hence, it is best suited for testing

synchronous clock domains.

(5) Staggered double-capture can also detect

inter-clock-domain structural and delay faults that aligned

double-capture cannot detect; hence, it is best suited for testing

asynchronous clock domains.

The summary further indicates that it is preferred to use a

hybrid clocking scheme that includes aligned double-capture

for testing synchronous clock domains and staggered

double-capture for testing asynchronous clock domains.

Alternatively, one may consider testing all synchronous and

asynchronous clock domains in multiple test sessions using a

hybrid scheme that includes simultaneous double-capture

and staggered single-capture. The drawbacks are the

complexity of the BIST controller for providing multiple test

sessions and the need to add isolation logic across all

interacting clock domains.

TABLE II

INTRA-CLOCK-DOMAIN FAULT DETECTION CAPABILITY
(S: STRUCTURAL FAULTS DETECTED; D: DELAY FAULTS DETECTED)

Timing Control Scheme          Synchronous     Asynchronous
                               Intra-domain    Intra-domain
One-hot double-capture         (S, D)          (S, D)
Simultaneous double-capture    (S, D)          (S, D)
Staggered single-capture       (S)             (S)
Aligned double-capture         (S, D)          −
Staggered double-capture       (S, D)          (S, D)

TABLE III

INTER-CLOCK-DOMAIN FAULT DETECTION CAPABILITY
(S: STRUCTURAL FAULTS DETECTED; D: DELAY FAULTS DETECTED)

Timing Control Scheme          Synchronous     Asynchronous
                               Inter-domain    Inter-domain
One-hot double-capture         (S)             (S)
Simultaneous double-capture    −               −
Staggered single-capture       (S, D)          (S, D)
Aligned double-capture         (S, D)          −
Staggered double-capture       (S, D)          (S, D)
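The entries of Tables II and III can be transcribed into a small lookup to check which schemes detect both structural and delay faults within and across clock domains. The dictionary layout and the function name below are illustrative assumptions; the capability entries themselves are copied from the two tables.

```python
# Tables II and III transcribed as sets: "S" = structural, "D" = delay.
INTRA = {
    "one-hot double-capture":      {"sync": {"S", "D"}, "async": {"S", "D"}},
    "simultaneous double-capture": {"sync": {"S", "D"}, "async": {"S", "D"}},
    "staggered single-capture":    {"sync": {"S"},      "async": {"S"}},
    "aligned double-capture":      {"sync": {"S", "D"}, "async": set()},
    "staggered double-capture":    {"sync": {"S", "D"}, "async": {"S", "D"}},
}
INTER = {
    "one-hot double-capture":      {"sync": {"S"},      "async": {"S"}},
    "simultaneous double-capture": {"sync": set(),      "async": set()},
    "staggered single-capture":    {"sync": {"S", "D"}, "async": {"S", "D"}},
    "aligned double-capture":      {"sync": {"S", "D"}, "async": set()},
    "staggered double-capture":    {"sync": {"S", "D"}, "async": {"S", "D"}},
}

def full_coverage_schemes(domain_kind):
    """Schemes that detect S and D faults both within and across domains."""
    want = {"S", "D"}
    return sorted(
        name for name in INTRA
        if INTRA[name][domain_kind] == want and INTER[name][domain_kind] == want
    )

print(full_coverage_schemes("sync"))
print(full_coverage_schemes("async"))
```

Under this reading of the tables, only the two double-capture schemes recommended for the hybrid approach cover all four fault classes, and only staggered double-capture does so for asynchronous domains.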






C. Capture Clock Generation

In order to generate an ordered sequence of

double-capture clocks, one can use clock suppression,

daisy-chain clock-triggering, or token-ring clock-enabling.

Generally, the clock suppression technique as proposed in

[38] is more suitable for testing synchronous clock domains,

whereas the daisy-chain clock-triggering technique or the

token-ring clock-enabling technique as proposed in

[23]-[25] is more suitable for testing asynchronous clock domains. Other design flavors of on-chip clock controllers

can also be found in [40] and [41].

The clock suppression scheme typically requires using a

reference clock operating at the highest frequency. Figure 12

shows an example launch aligned double-capture for two

interacting clocks. Figure 13 shows a clock suppression

circuit for generating the launch aligned double-capture

waveform given in Fig. 12. This circuit uses a reference

clock (CK 1) to program the capture window. The contents of

the 8-bit shift register are preset to {0011, 1111} during each

shift window. Due to its programmability, the approach can

also be used to generate timing waveforms for testing

asynchronous designs. One major requirement is that, depending on needs, the delay measured by the number of

reference clock pulses be equal to or longer than delay d

between C 2 and C 3 as shown in Fig. 11a. A novel clock

gating circuit for generating staggered single-capture clocks

for inter-clock-domain at-speed testing of synchronous

clock domains can also be found in [42].
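The idea of programming a capture window from a preset shift register can be sketched at a behavioral level. The model below is an assumption, not the circuit of Fig. 13: it supposes that one register bit is consumed per reference-clock cycle, read left to right, and that a 1 passes the pulse while a 0 suppresses it. How the preset value {0011, 1111} is split between the individual test clocks is not modeled.

```python
def passed_pulses(enable_bits):
    """Return the reference-clock cycle indices whose pulse is let through.

    Assumption: one bit is consumed per reference-clock cycle, and a 1 passes
    the pulse while a 0 suppresses it.
    """
    return [cycle for cycle, bit in enumerate(enable_bits) if bit == "1"]

# Illustrative 4-bit patterns; the bit-to-clock mapping of Fig. 13 is not reproduced here.
print(passed_pulses("0011"))
print(passed_pulses("1111"))
```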

Fig. 12. An example launch aligned double-capture

clocking.

Fig. 13. A clock suppression circuit for generating the waveform given in Fig. 12.

The daisy-chain clock-triggering technique means that the

completion of the shift-in operation triggers the GSE signal

to become 0, switching operation mode from shift to capture.

This in turn triggers the generation of two at-speed clock

pulses for the first clock domain; the rising edge of the

second capture clock pulse triggers the generation of two

at-speed clock pulses for the second clock domain, and so

on. Finally, the rising edge of the second capture clock pulse for

the last clock domain triggers the GSE signal to become 1,

switching operation mode from capture to shift. A timing

waveform is shown in Fig. 14 where the delay d is properly

adjusted depending on whether inter-clock-domain

structural or delay faults are to be detected or not.
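The chain of triggers described above can be listed explicitly. The sketch below is a behavioral outline only: the function name and the event wording are hypothetical, and the delay d between consecutive domains is not modeled.

```python
def daisy_chain_events(domains):
    """Return (trigger, action) pairs for one capture window."""
    events = [("shift-in complete", "GSE falls to 0 (capture mode)")]
    trigger = "GSE at 0"
    for name in domains:
        events.append((trigger, f"two at-speed capture pulses on {name}"))
        # The second capture pulse of this domain triggers the next step.
        trigger = f"rising edge of second capture pulse on {name}"
    events.append((trigger, "GSE rises to 1 (shift mode)"))
    return events

for trigger, action in daisy_chain_events(["TCK1", "TCK2"]):
    print(f"{trigger} -> {action}")
```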

Fig. 14. Daisy-chain clock-triggering.

The token-ring clock-enabling technique is very similar to

the daisy-chain clock-triggering technique. The only

difference between them is that daisy-chain clock-triggering uses a clock edge

to trigger the next event, while token-ring clock-enabling uses a signal level to

enable the next event. Figure 15 shows a daisy-chain

clock-triggering circuit for generating the staggered

double-capture waveform given in Fig. 14. When the BIST

mode is activated, the SE1/SE2 generators and 2-pulse

controllers will generate the required scan enable and double-capture clock pulses per the arrows shown in Fig. 14.

Each SE1/SE2 can be treated as a GSE signal for CD1/CD2.

Fig. 15. A daisy-chain clock-triggering circuit for

generating the waveform given in Fig. 14.

V. PHYSICAL IMPLEMENTATION ISSUES

A major difference between ATE-based scan test and

logic BIST is that the latter requires more complex self-test

circuitry to be implemented together with the functional

circuitry. Successfully conducting the physical

implementation of the functional circuitry for a high-speed

and high-performance design is in itself a big challenge. If

the self-test circuitry adds a large number of critical signals

and requires strict clock-skew management, the physical

implementation of logic BIST can become prohibitively

difficult.

The proposed logic BIST scheme employs several techniques to ease physical implementation. Most

significantly, a slow-speed global scan enable signal GSE is

used to greatly reduce the clock-skew management

complexity and a PRPG-MISR pair is used for each clock

domain to avoid layout routing congestion. One more

technique using re-timing logic to control clock skew among

each group of PRPG, scan chain, and MISR is illustrated in

Fig. 16.





Fig. 16. Controlled clock skew management.

For the sake of clarity, Fig. 16 shows only two scan chains,

one each in two different clock domains. During shifting, the

PRPG-MISR pair and its associated scan chain are

reconfigured as one shift register. Since the PRPG and the

MISR are mostly placed far from the scan chain, timing

violations may occur between the PRPG and the scan chain

inputs as well as between the scan chain outputs and the

MISR.

To facilitate physical design, we propose a technique that

always makes the CCK clock that drives the PRPG and the

MISR arrive earlier than the TCK shift clock that drives the scan chain. With this approach, only hold-time violations

may occur from the PRPG to the scan chain inputs, while

only setup-time violations may occur from the scan chain

outputs to the MISR. In this case, the hold-time violations

can be corrected with re-timing D flip-flops, whereas the

setup-time violations can be avoided by reducing logic

levels from the scan chain outputs to the MISR or be

corrected with re-timing D flip-flops. The re-timing logic

should consist of at least one negative-edge pipelining

register (D flip-flop) and one positive-edge pipelining

register (D flip-flop).
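The direction of the resulting violations follows from the usual single-cycle setup and hold relations. The sketch below applies those textbook relations under the assumption that CCK leads TCK by a fixed skew and that both clocks share one shift period; the function names and all numeric values (in picoseconds) are hypothetical and are not taken from the paper.

```python
def prpg_to_scan_hold_slack(skew, clk_to_q, path_delay, hold_time):
    """Hold slack at a scan chain input when TCK lags CCK by `skew`.

    The PRPG launches on CCK at time 0; the scan cell samples on TCK at
    time `skew`, so new data must not arrive before skew + hold_time.
    """
    return clk_to_q + path_delay - (skew + hold_time)

def scan_to_misr_setup_slack(period, skew, clk_to_q, path_delay, setup_time):
    """Setup slack at a MISR input when CCK leads TCK by `skew`.

    The scan cell launches on TCK at time `skew`; the MISR samples on the
    next CCK edge at time `period`.
    """
    return period - setup_time - (skew + clk_to_q + path_delay)

# Hypothetical numbers in ps: a negative result marks a violation.
hold_slack = prpg_to_scan_hold_slack(skew=1000, clk_to_q=200, path_delay=300, hold_time=100)
setup_slack = scan_to_misr_setup_slack(period=20000, skew=1000, clk_to_q=200,
                                       path_delay=2000, setup_time=150)
print(hold_slack, setup_slack)
```

In this model a larger skew only erodes hold slack on the PRPG side and setup slack on the MISR side, which matches the two violation types named in the text.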

Figure 17 illustrates an example of re-timing logic among the

PRPG, a scan chain, and the MISR, using two pipelining registers on each end. Note that the two clocks, CCK and

TCK , could belong to one clock tree with a small phase shift;

the space expander and space compactor as shown in Fig. 16

can then be ignored. By keeping the clock-skew problem

under control in this way, we can significantly simplify

physical implementation.

Most importantly, the proposed logic BIST scheme does

not add any isolation logic along the paths from scan chain B

to scan chain A through the combinational logic block C .

This greatly reduces design complexity and avoids potential

performance degradation.

Fig. 17. Re-timing logic among the PRPG, a scan chain,

and the MISR.

VI. EXPERIMENTAL RESULTS

The logic BIST architecture proposed in this paper has

been successfully implemented in many industrial designs

[23], [26], [42]. The TurboBIST-Logic tool [39] developed

by SynTest Technologies was used for logic BIST

implementation. The tool allows for designing a logic BIST

system at the register-transfer level (RTL) or gate level

based on:

• The type of logic BIST architecture to adopt.

• The number of PRPG-MISR pairs to use.

• The length of each PRPG-MISR (or PEPG-MISR) pair.

• The stuck-at and transition faults to be detected and

BIST timing control diagrams to be used for detecting

these faults in synchronous clock domains and

asynchronous clock domains.

• The types of optional logic to be added in the BIST

system to ease physical implementation, facilitate debug

and diagnosis, as well as improve the circuit’s fault

coverage.

The tool consists of three major steps in designing the

logic BIST system once all decisions regarding the logic

BIST architecture are made. They are described below.

A. BIST Rule Checking and Violation Repair

The first step is to perform logic BIST design rule

checking on the RTL or gate-level design. All DFT rule

violations of the scan design rules and BIST-specific design

rules as given in [39] must be repaired. Once all DFT rule

violations are repaired, the design which is referred to as a

BIST-ready core is considered to have followed all scan and

logic BIST design rules.

B. RTL BIST Synthesis

Then, the tool automatically creates the RTL logic BIST

controller. The capture-clocking schemes supported in the

BIST controller include capture aligned double-capture for

synchronous clock domains and staggered double-capture

for asynchronous clock domains. The number of scan chains

for each clock domain is specified along with the names of

their associated scan inputs (SIs) and scan outputs (SOs)

without inserting the actual scan chains into the circuit. The

scan synthesis task can be handled as part of the general

synthesis task, implemented using any (commercially

available) synthesis tool for converting the RTL BIST-ready

core and the logic BIST system into a gate-level netlist.

C. Design Verification and Fault Coverage

Enhancement

Finally, Verilog testbenches are generated and the

synthesized netlist is verified with functional and/or timing

verification to ensure that the logic BIST system functions as

intended. Fault simulation is then performed on the

pseudorandom patterns generated by the PRPGs to

determine the circuit’s fault coverage. If the circuit does not

reach the target fault coverage goal, additional test points

(including control points and/or observation points) can be





selected and inserted in the BIST-ready core, or top-up

ATPG patterns (including compressed ATPG patterns when

available) can be used during manufacturing test, to increase

the circuit’s fault coverage. The tool allows adding extra test

points in advance at the RTL design in the hope of achieving

the target fault coverage goal. Otherwise, test points

can be inserted at the gate level and the fault

simulation process is repeated until the final fault coverage

goal is reached. An example fault simulation and test point insertion flow is illustrated in Fig. 18.
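The iterate-until-acceptable loop can be outlined as follows. This is a sketch under stated assumptions: `simulate_coverage` stands in for a real fault simulator, the batch size and point limit are hypothetical, and the toy coverage model in the usage lines is invented for illustration and is not measured data. Coverage is expressed in hundredths of a percent to keep the arithmetic in integers.

```python
def insert_test_points_until(target, simulate_coverage, batch=100, max_points=2000):
    """Add test points in batches until coverage reaches the target.

    `simulate_coverage(points)` is a caller-supplied stand-in for fault
    simulation; it returns coverage for a design with `points` test points.
    """
    points = 0
    coverage = simulate_coverage(points)
    while coverage < target and points < max_points:
        points += batch                       # gate-level test point insertion
        coverage = simulate_coverage(points)  # repeat fault simulation
    return points, coverage

# Toy model (not measured data): 85.00% base coverage, +0.01% per test point, capped at 99.00%.
toy_model = lambda n: min(9900, 8500 + n)
points, coverage = insert_test_points_until(target=9300, simulate_coverage=toy_model)
print(points, coverage)
```

The `max_points` bound reflects that test points cost area, so a practical loop needs a stopping condition other than reaching the coverage goal.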

Fig. 18. Fault simulation and test point insertion flow.

D. Industrial Design Examples

Based on the TurboBIST-Logic tool capabilities, we have

implemented the staggered double-capture architecture in

two commercially popular CPU cores [23] and added

observation points to improve the fault coverage of each

core. The circuit statistics and experimental results for the

two IP cores are shown in Table IV.

TABLE IV

EXPERIMENTAL RESULTS FOR IP CORES X AND Y

                          Core X            Core Y
# of Primitives           218,100           633,400
# of Flip-Flops           10,300            33,200
# of Clock Domains        2                 8
Operating Frequency       250MHz            330MHz
# of PRPG/MISR Pairs      2                 8
PRPG Length Range         19                19
MISR Length Range         1: 19 / 1: 99     7: 19 / 1: 80
# of Observation Points   1,000             1,000
# of BIST Patterns        20,000            20,000
BIST Fault Coverage
– Transition Faults       93.82%            93.22%
BIST Area Overhead        4.40%             3.20%

During the implementation, we also chose to: (1) construct

one PRPG-MISR pair for each clock domain because there

was crossing clock-domain logic between any two clock

domains, (2) insert scan cells into all PIs and POs so as to

increase intra-clock-domain transition fault coverage, and

(3) skip inserting space compactors between scan chain

outputs and the MISRs in order to avoid setup-time

violations. This is why there were two long MISRs, one with

99 bits in Core X and the other with 80 bits in Core Y . Such a

MISR is generally related to the main and large clock

domain that contains a larger number of scan chains. The

results indicate that it is possible to achieve more than 93%

BIST intra-clock-domain transition fault coverage (in all

synchronous and asynchronous clock domains) when 1000

observation points were inserted into the design. Because

each core has already implemented scan chains, we can only

obtain the BIST area overhead that includes the

PRPG/MISR pairs, the BIST controller, and additional

circuits to insert the space expanders and observation points,

block all unknown ( X ) signals potentially propagated to the MISRs, and mask off all multi-cycle paths and false paths.

In addition, we implemented logic BIST on two large

industrial designs, ranging from 15 to 20 million primitives.

Table V shows the experimental results. Again, we

implemented the staggered double-capture architecture in all

clock domains of each design to reduce physical

implementation efforts. We calculated both the BIST transition

fault coverage in all synchronous and asynchronous clock

domains using the intra-clock-domain transition fault model

and the BIST stuck-at fault coverage using the staggered

double-capture patterns in each design. Knowing these BIST

coverage numbers allows for top-up (transition and stuck-at)

ATPG patterns to be added at a later stage to increase the circuit’s fault coverage during manufacturing test. Both

designs have been successfully taped out and worked the

first time on manufactured chips.

TABLE V

EXPERIMENTAL RESULTS FOR INDUSTRIAL DESIGNS

                          CKT1        CKT2
# of Primitives           15M         20M
# of Flip-Flops           673K        1.61M
# of Clock Domains        6           9
Operating Frequency       266MHz      533MHz
# of PRPG/MISR Pairs      6           9
PRPG Length Range         19-27       19-24
MISR Length Range         19-253      21-220
# of BIST Patterns        64,000      64,000
BIST Fault Coverage
– Stuck-At Faults         87.79%      89.38%
– Transition Faults       86.63%      82.32%
BIST Area Overhead        2.39%       2.07%

The results show that each design can run at its intended

operating frequency. For a design containing millions of

gates, the BIST area overhead becomes a small fraction of the design.

Once again, because each circuit has already implemented

scan chains, the BIST area overhead is computed in the same way as

for the two CPU cores. However, obtaining better

BIST fault coverage, for example, 95% or higher, is a challenge without additional fault coverage

improvement features, such as test point insertion or other

techniques discussed in [5]-[6] and [43]-[45]. This is typical

for BIST designs when only pseudorandom patterns and

launch-on-capture schemes are used.

Table VI shows the circuit statistics and the combined

BIST and ATPG experimental results for another industrial

design that has been in production [42]. The results

demonstrated how the BIST design is coupled with top-up

ATPG to detect 96.9% intra-clock-domain transition faults





in all synchronous and asynchronous clock domains. In this

experiment, 64K random patterns using logic BIST and

8,183 deterministic patterns using top-up ATPG were

applied to the design.

To detect inter-clock-domain transition faults in all

synchronous clock domains, additional top-up ATPG

patterns were also generated and applied using the

inter-clock at-speed control scheme proposed in [42]. The

results are shown in Table VII. In this case, 6 inter-clock logic blocks, A ~ F , were targeted. For example, A is a logic

block from a 100MHz clock to a 300MHz clock, and

contains 36,858 transition faults. Table VII shows the

generated ATPG patterns, fault coverage, and CPU time.

Furthermore, the area overhead in this application was

mainly due to 6 inter-clock enable generators, each

containing 20 flip-flops and consuming an area of roughly

124 equivalent 2-input NAND gates.

TABLE VI

INTRA-DOMAIN FAULT DETECTION FOR BIST DESIGN CKT3

CKT3

# of Primitives 11.1M

# of Flip-Flops 404.9K

# of Clock Domains 11

Min. Frequency 66MHz

Max. Frequency 533MHz

# of Scan Chains 32

Maximum Chain Length 13,598

# of BIST Patterns 64,000

# of Top-Up ATPG Patterns 8,183

Intra-Domain Fault Coverage

– Transition Faults 96.9%

TABLE VII

INTER-DOMAIN FAULT DETECTION FOR BIST DESIGN CKT3

Inter-Clock    From     To       # of       ATPG
Logic Block    (MHz)    (MHz)    Faults     # of Vec.    Fault Cov. (%)    CPU (h:m)
A              100      300      36858      232          86.4              4:30
B              133      533      8350       32           100               0:15
C              133      266      4942       36           100               0:14
D              533      133      1940       9            100               0:10
E              266      533      732        9            100               0:10
F              266      133      64         3            100               0:09


In the case of detecting inter-clock-domain delay faults in

asynchronous clock domains, unfortunately, we have not

been able to conduct true experiments because no industrial chip taped out to date has been designed for this

purpose. However, as discussed earlier, our proposed

staggered single or double capture scheme can deal with

such cases by carefully adjusting the d value given in Fig.

11a or 11d. Consequently, delay fault testing of

inter-clock-domain faults in asynchronous clock domains is

exactly the same as that in synchronous clock domains.

Table I has demonstrated via an industrial design how

various clocking schemes affect the fault coverage of

inter-clock-domain transition faults between two

synchronous (or two asynchronous) clock domains.

VII. CONCLUSIONS

Delay fault testing based on launch-on-capture is

commonly practiced in industry due to the ease of using a

slow-speed global scan enable signal. When a BIST design

contains a mix of synchronous and asynchronous clock

domains, the conventional one-hot or simultaneous double-capture scheme cannot detect any

inter-clock-domain delay faults. This makes it

even more difficult for logic BIST to achieve high fault coverage

when pseudorandom patterns are mainly used for testing.

This paper presented a new at-speed logic BIST

architecture using double-capture (a.k.a. launch-on-capture)

for testing BIST designs containing multi-frequency

synchronous and asynchronous clock domains. To facilitate

debug and diagnosis, the BIST architecture also includes

BIST diagnosis logic to help locate BIST failures.

It was shown that the aligned double-capture scheme

employed in the architecture is most suitable for testing

synchronous clock domains to achieve true at-speed test

quality, whereas the staggered double-capture scheme

employed is most suitable for testing asynchronous clock

domains. Physical implementation becomes easier due to the

use of a slow-speed global scan enable signal and reduced

timing-critical design requirements. If only structural faults are

considered for detection and diagnosis, the BIST

architecture built upon the staggered single-capture scheme

can result in the highest fault coverage with the lowest BIST

overhead. Application results for several industrial designs

have demonstrated the effectiveness of the proposed

architecture. These results further indicated that the

proposed double-capture schemes can reach high BIST fault coverage. For designs containing both synchronous and

asynchronous clock domains, a hybrid clocking scheme

using aligned double-capture and staggered double-capture

is proposed; however, challenges still lie ahead with regard

to how to increase the BIST transition fault coverage of the

design to a much more acceptable level, say 95% or above,

and how to locate BIST failures more effectively during

debug and diagnosis from the system level down to the chip

level.

ACKNOWLEDGMENTS

The authors are grateful to the anonymous referees for pointing out unclear descriptions of the paper and giving

constructive suggestions. The authors also would like to

thank Dr. B. Cheon and E. Lee of Samsung Electronics in

Korea, Tomotaka Odajima of Marubeni Information

Systems in Japan, and many colleagues of SynTest

Technologies in the US, Korea, China, and Taiwan for

providing the experimental results listed in Tables IV to VII.

This work was supported in part by National Science

Foundation of USA under grant CCF-0541103.




REFERENCES

[1] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems

Testing and Testable Design. Piscataway, NJ: IEEE Press, 1990.

[2] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing

for Digital, Memory & Mixed-Signal VLSI Circuits. Boston, MA:

Springer, 2000.

[3] C. E. Stroud, A Designer’s Guide to Built-In Self-Test . Boston, MA:

Springer, 2002.

[4] N. K. Jha and S. K. Gupta, Testing of Digital Systems. London, UK:

Cambridge University Press, 2003.

[5] L.-T. Wang, C.-W. Wu, and X. Wen, Eds., VLSI Test Principles and

Architectures: Design for Testability. San Francisco, CA: Morgan

Kaufmann, 2006.

[6] L.-T. Wang, C. E. Stroud, and N. A. Touba, Eds., System-on-Chip

Test Architectures: Nanometer Design for Testability. San

Francisco, CA: Morgan Kaufmann, 2007.

[7] L.-T. Wang, Y.-W. Chang, and K.-T. Cheng, Eds., Electronic

Design Automation: Synthesis, Verification, and Test . San

Francisco, CA: Morgan Kaufmann, 2009.

[8] P. H. Bardell and W. H. McAnney, “Self-testing of multiple logic

modules,” in Proc. IEEE Int. Test Conf., 1982, pp. 200-204.

[9] B. Nadeau-Dostie, A. Hassan, D. Burek, and S. Sunter, “Multiple

Clock Rate Test Apparatus for Testing Digital Systems,” U.S.

Patent 5 349 587, Sept. 20, 1994.

[10] S. Bhawmik, “Method and Apparatus for Built-In Self-Test with

Multiple Clock Circuits,” U.S. Patent 5 680 543, Oct. 21, 1997.

[11] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan, and J. Rajski, “Logic BIST for large industrial designs: Real issues

and case studies,” in Proc. IEEE Int. Test Conf., 1999, pp. 358-367.

[12] J. Savir and S. Patil, “Scan-based transition test,” IEEE Trans. on

Computer-Aided Design, vol. 12, no. 8, pp. 1232-1241, Aug. 1993.

[13] J. Savir and S. Patil, “Broad-side delay test,” IEEE Trans. on

Computer-Aided Design, vol. 13, no. 8, pp. 1057-1064, Aug. 1994.

[14] S. Wang, X. Liu, and S. T. Chakradhar, “Hybrid delay scan: A low

hardware overhead scan-based delay test technique for high fault

coverage and compact test sets,” in Proc. IEEE/ACM Design,

Automation and Test in Europe Conf., 2004, pp. 1296-1301.

[15] J. Abraham, U. Goel, and A. Kumar, “Multi-cycle sensitizable

transition delay faults,” in Proc. IEEE VLSI Test Symp., 2006, pp.

306-311.

[16] Z. Zhang, S. M. Reddy, I. Pomeranz, X. Lin and J. Rajski, “Scan

tests with multiple fault activation cycles for delay faults,” in Proc.

IEEE VLSI Test Symp., 2006, pp. 343-348.

[17] N. Ahmed and M. Tehranipoor, “Improving transition delay test

using a hybrid method,” IEEE Design & Test of Computers , vol. 23,

no. 5, pp. 402-412, Sept.-Oct. 2006.

[18] G. Xu and A. D. Singh, “Achieving high transition delay fault

coverage with partial DTSFF scan chains,” in Proc. IEEE Int. Test

Conf., 2007, Paper 17.1.

[19] B. Nadeau-Dostie, K. Takeshita, and J.-F. Côté, “Power-aware

at-speed scan test methodology for circuits with synchronous

clocks,” in Proc. IEEE Int. Test Conf., 2008, Paper 9.3.

[20] I. Park and E. J. McCluskey, “Launch-on-shift-capture transition

tests,” in Proc. IEEE Int. Test Conf., 2008, Paper 35.3.

[21] G. Xu and A. D. Singh, “Low Cost LOS Delay Test with Slow Scan

Enable,” in Proc. IEEE European Test Symp., 2006, pp. 9-14.

[22] G. Xu and A. D. Singh, “Delay test scan flip-flop: DFT for high

coverage delay testing,” in Proc. Int. Conf. on VLSI Design, 2007,

pp. 763-768.

[23] B. Cheon, E. Lee, L.-T. Wang, X. Wen, P. Hsu, J. Cho, J. Park, H.

Chao, and S. Wu, “At-speed logic BIST for IP cores,” in Proc.

IEEE/ACM Design, Automation and Test in Europe Conf., 2005, pp.

860-861.

[24] L.-T. Wang, X. Wen, P.-C. Hsu, S. Wu, and J. Guo, “At-speed logic

BIST architecture for multi-clock designs,” in Proc. IEEE Int. Conf.

on Computer Design, 2005, pp. 475-478.

[25] L.-T. Wang, P.-C. Hsu, S.-C. Kao, M.-C. Lin, H.-P. Wang, H.-J.

Chao, and X. Wen, “Multiple-Capture DFT System for Detecting or

Locating Crossing Clock-Domain Faults During Self-Test or

Scan-Test,” U.S. Patent 7 007 213, Feb. 28, 2006; also in European

Patent 1 360 513, Apr. 2, 2008.

[26] J. Qian, X. Wang, Q. Yang, F. Zhuang, J. Jia, X. Li, Y. Zuo, J.

Mekkoth, J. Liu, H.-J. Chao, S. Wu, H. Yang, L. Yu, F. Zhao, and

L.-T. Wang, “Logic BIST architecture for system-level test and

diagnosis,” in Proc. IEEE Asian Test Symp., 2009.

[27] L.-T. Wang, X. Wen, H. Furukawa, F.-S. Hsu, S.-H. Lin, S.-W. Tsai,

K. S. Abdel-Hafez, and S. Wu, “VirtualScan: A new compressed

scan technology for test cost reduction,” in Proc. IEEE Int. Test

Conf., 2004, pp. 916-925.

[28] L.-T. Wang, X. Wen, S. Wu, Z. Wang, Z. Jiang, B. Sheu, and X. Gu,

“VirtualScan: Test compression technology using combinational

logic and one-pass ATPG,” IEEE Design & Test of Computers, vol. 25, no. 2, pp. 122-130, March-April 2008.

[29] L.-T. Wang, H.-P. Wang, X. Wen, M.-C. Lin, S.-H. Lin, D.-C. Yeh,

S.-W. Tsai, and K. S. Abdel-Hafez, “Method and Apparatus for

Broadcasting Scan Patterns in a Scan Based Integrated Circuit,”

U.S. Patents 7 412 637 and 7 412 672, Aug. 12, 2008.

[30] W.-T. Cheng, M. Sharma, T. Rinderknecht, L. Lai, and C. Hill,

“Signature-based diagnosis for logic BIST,” in Proc. IEEE Int. Test

Conf., 2006, pp. 265-273.

[31] Y. Huang, R. Guo, W.-T. Cheng, and J. C.-M. Li, “Survey of Scan

Chain Diagnosis,” IEEE Design & Test of Computers, vol. 25, no. 3,

pp. 240-248, May-June 2008.

[32] L.-T. Wang, M.-T. Chang, S.-H. Lin, H.-J. Chao, J. Lee, H.-P. Wang,

X. Wen, P.-C. Hsu, S.-C. Kao, M.-C. Lin, S.-W. Tsai, and C.-C. Hsu,

“Method and Apparatus for Diagnosing Failures in an Integrated

Circuit Using Design-for-Debug (DFD) Techniques,” European

Patent 1 364 436, May 24, 2006; also in U.S. Patent 7 284 175, Oct. 16, 2007.

[33] J. Ghosh-Dastidar and N. A. Touba, “A rapid and scalable diagnosis

scheme for BIST environments with a large number of scan chains,”

in Proc. IEEE VLSI Test Symp., 2000, pp. 79-85.

[34] K. S. Abdel-Hafez, X. Wen, L.-T. Wang, P.-C. Hsu, S.-C. Kao,

H.-J. Chao, and H.-P. Wang, “Method and Apparatus for Debug,

Diagnosis, and Yield Improvement for Scan-Based Integrated

Circuits,” U.S. Patent 7 058 869, June 6, 2006.

[35] L.-T. Wang, X. Wen, K. S. Abdel-Hafez, S.-H. Lin, H.-P. Wang,

M.-T. Chang, P.-C. Hsu, S.-C. Kao, M.-C. Lin, and C.-C. Hsu,

“Method and Apparatus for Unifying Self-Test with Scan-Test

During Prototype Debug and Production Test,” U.S. Patent 7 444

567, Oct. 28, 2008.

[36] C. Dike and E. Burton, “Miller and noise effects in a synchronizing

flip-flop,” IEEE J. of Solid-State Circuits, vol. 34, no. 6, pp.

849-855, June 1999.

[37] R. Ginosar, “Fourteen ways to fool your synchronizer,” in Proc.

IEEE Int. Symp. on Asynchronous Circuits and Systems, 2003, pp.

89-96.

[38] M. Beck, O. Barondeau, M. Kaibel, F. Poehl, X. Lin, and R. Press,

“Logic design for on-chip test clock generation - Implementation

details and impact on delay test quality,” in Proc. IEEE/ACM

Design, Automation and Test in Europe Conf., 2005, pp. 56-61.

[39] TurboBIST-Logic User’s Manual, SynTest Technologies,

Sunnyvale, CA, USA, 2009. (http://www.syntest.com)

[40] B. Keller, A. Uzzaman, B. Li, and T. Snethen, “Using

programmable on-product clock generation (OPCG) for delay test,”

in Proc. IEEE Asian Test Symp., 2007, pp. 69-72.

[41] X.-X. Fan, Y. Hu, and L.-T. Wang, “An on-chip test clock control

scheme for multi-clock at-speed testing,” in Proc. IEEE Asian Test

Symp., 2007, pp. 341-348.

[42] H. Furukawa, X. Wen, L.-T. Wang, B. Sheu, Z. Jiang, and S. Wu,

“A novel and practical control scheme for inter-clock at-speed

testing,” in Proc. IEEE Int. Test Conf., 2006, Paper 17.2.

[43] H.-C. Tsai, K.-T. Cheng, and S. Bhawmik, “Improving the test

quality for scan-based BIST using a general test application

scheme,” in Proc. ACM/IEEE Design Automation Conf., 1999, pp.

748-753.

[44] Y. Li, S. Makar, and S. Mitra, “CASP: Concurrent autonomous chip

self-test using stored test patterns,” in Proc. IEEE/ACM Design,

Automation and Test in Europe Conf., 2008, pp. 885-890.

[45] L.-T. Wang, H. S. Hsiao, H.-J. Chao, Z. Jiang, S. Wu, and J. Yan,

“Method and Apparatus for Delay Fault Coverage Enhancement,”

U.S. Patent Application 12/554,437, Sept. 4, 2009.




Laung-Terng (L.-T.) Wang (M’87–SM’04–F’08) received his BSEE and

MSEE degrees from National Taiwan University in 1975 and 1977,

respectively, and his MSEE and EE Ph.D. degrees under the Honors

Cooperative Program (HCP) from Stanford University in 1982 and 1987,

respectively.

He has been chairman and chief executive officer (CEO) of SynTest

Technologies, Inc. (Sunnyvale, CA) since January 1990, and a visiting

professor in the Department of Electrical Engineering and the Graduate

Institute of Electronics Engineering at National Taiwan University since

July 2009. Prior to founding SynTest in 1990, he worked in the industry, including Intel (Santa Clara, CA) from 1980 to 1983 and Daisy Systems

(Mountain View, CA) from 1983 to 1986, and was with the Department of

Electrical Engineering at Stanford University as Research Associate and

Lecturer from 1987 to 1991.

Dr. Wang currently holds 21 U.S. Patents, 15 European Patents, one

Japan Patent, and one China Patent, in the areas of scan synthesis, test

generation, at-speed scan testing, test compression, logic built-in self-test

(BIST), and design for debug-and-diagnosis (DFD). The

design-for-testability (DFT) technologies Dr. Wang has developed have

been successfully implemented in thousands of ASIC designs worldwide.

He has also co-authored and co-edited three internationally used DFT/EDA

textbooks – VLSI Test Principles and Architectures (2006),

System-on-Chip Test Architectures (2007), and Electronic Design

Automation (2009).

A member of Sigma Xi, Dr. Wang received a 2007 Meritorious Service

Award from the IEEE Computer Society and is a co-recipient of the 2008 IEICE Information and Systems Society Excellent Paper Award for an

excellent series of papers that appeared in IEICE Transactions on

Information and Systems during a period of five years. He is a Fellow of the

IEEE and a Golden Core Member of the IEEE Computer Society.

Xiaoqing Wen (S’89–M’93–SM’08) received his B.S. degree in

Computer Science and Technology from Tsinghua University, Beijing,

China, in 1986, his M.S. degree in Information Engineering from

Hiroshima University, Hiroshima, Japan, in 1990, and his Ph.D. degree in

Applied Physics from Osaka University, Osaka, Japan, in 1993.

From 1993 to 1997, he was a Lecturer at Akita University. He was a

Visiting Researcher at University of Wisconsin - Madison from October

1995 to March 1996. He joined SynTest Technologies, Inc. (Sunnyvale,

CA) in 1998 and served as its chief technology officer (CTO) until 2003. In

2004, he joined the Kyushu Institute of Technology, Iizuka, Japan, where

he is currently a Professor and Chair of the Department of Creative

Informatics. His research interests include design, test, and diagnosis of

integrated circuits.

Prof. Wen currently holds 15 U.S. Patents, 2 Japan Patents, and 22

pending U.S. and Japan Patents in logic built-in self-test (BIST), test

compression, and low-capture-power (LCP) test generation. He has also

co-authored and co-edited two textbooks – VLSI Test Principles and

Architectures (2006) and Power-Aware Testing and Test Strategies for

Low Power Devices (2009). He received the 2008 IEICE Information and

Systems Society Excellent Paper Award for LCP X-filling and test

generation. He is a senior member of the IEEE, a member of the IEICE, and

a member of the REAJ.

Shianling Wu (A’88–M’09) has an M.S. in Computer Science from

Columbia University. She joined SynTest Technologies, Inc. (Princeton

Junction, NJ) in 2003 and is presently Vice President of Engineering

focusing on advanced VLSI DFT research and development. Prior to

SynTest, she was with Bell Laboratories for over 23 years. She currently

holds 5 U.S. Patents and has 3 pending U.S. Patents. She has published over

15 DFT papers and contributed chapters to two DFT textbooks – VLSI Test

Principles and Architectures (2006) and Electronic Design Automation

(2009).

She has served as a program committee member for IEEE International

Test Conference, Asian Test Symposium, and North Atlantic Test

Workshop. She won numerous AT&T and Lucent Awards and received a

Best Panel Award in the 2005 IEEE International Test Conference. She was

a member of SEMATECH, SRC, GSRC, STARC-International, VSIA, and

the IEEE1500 Standard Committee. She is a member of the IEEE.

Hiroshi Furukawa received his B.S. degree in Electrical Engineering from

Kumamoto University, Japan, in 1992.

He joined NEC Micro Systems (Kumamoto, Japan) in 1992 and is

currently a design manager. He is also a Ph.D. student in the Department of

Creative Informatics at the Graduate School of Computer Science and

Systems Engineering, Kyushu Institute of Technology, Japan. His research

interests include VLSI testing and logic built-in self-test (BIST).

Hao-Jan Chao (M’09) graduated from the Department of Electronic

Engineering at National Taipei University of Technology (formerly,

National Taipei Institute of Technology), Taipei, Taiwan, in 1990. He

further received his B.S. degree in Nautical Technology at National Taiwan

Ocean University, Keelung, Taiwan, in 1995, and his M.S. degree in

Electrical Engineering from National Central University, Taoyuan, Taiwan,

in 1999.

He has been working at SynTest Technologies, Inc., Hsinchu, Taiwan,

since 1999, and is currently an R&D Manager responsible for the logic

built-in self-test (BIST) as well as silicon debug and fault diagnosis product

lines. His research interests include logic BIST, core-based design for

testability (DFT), and design for debug-and-diagnosis (DFD).

Boryau (Jack) Sheu received his BSEE and MSEE degrees from National

Taiwan University and Washington University in St. Louis in 1984 and

1991, respectively.

He was Director of Operation and R&D at SynTest Technologies from

January 2001 to April 2009. Prior to SynTest, he held various software

engineering positions in startup companies. He is currently with Sigma

Designs (Milpitas, CA) focusing on DFT flow and implementation. His

research interests include at-speed ATPG, logic BIST, as well as test

strategy and integration for VDSM SOC designs. He is a co-inventor of 6

U.S. Patents related to at-speed scan testing and test compression.

Jianghao Guo received his B.S. and M.S. degrees in Control Science and

Engineering from Shanghai Jiaotong University, Shanghai, China, in 2001

and 2004, respectively. He joined SynTest Technologies, Inc., Shanghai,

China, in 2004, where he served as an engineering manager until 2008.

He is currently a Ph.D. candidate in the Department of Electrical and

Computer Engineering at the University of Cincinnati. His research

interests include VLSI testing, computer architecture, and multi-core

system design.

Wen-Ben Jone (M’84–SM’02) was born in Taipei, Taiwan, Republic of

China. He received the B.S. degree in Computer Science in 1979, the M.S.

degree in Computer Engineering in 1981, both from National Chiao-Tung

University, Hsinchu, Taiwan, and the Ph.D. degree in Computer

Engineering and Science from Case Western Reserve University,

Cleveland, Ohio, in 1987.

In 1987, he joined the Department of Computer Science at New Mexico

Institute of Mining and Technology, Socorro, New Mexico, where he was

promoted to an associate professor in 1992. From 1993 to 2000, he was a

full professor in the Department of Computer Engineering and Information

Science at National Chung-Cheng University, Chiayi, Taiwan, R.O.C.

Since 2001, he has been an associate professor in the Department of

Electrical and Computer Engineering at the University of Cincinnati, Ohio.

His research interests include VLSI design for testability, built-in self-test,

memory testing, high-performance circuit testing, MEMS testing and

repair, and low-power circuit design and test.

Dr. Jone has been a reviewer in these research areas in various technical

journals and conferences. He also served on the program committee of

various technical conferences. He received the Best Thesis Award from The

Chinese Institute of Electrical Engineering (Republic of China) in 1981. He

is a co-recipient of the 2003 IEEE Donald G. Fink Prize Paper Award. He is

also a co-recipient of the Best Paper Award of the 2008 IEEE International

Symposium on Low-Power Electronics & Design. He is a senior member of

the IEEE and the IEEE Computer Society Test Technology Technical

Committee.
