
1128 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 7, JULY 2004

TABLE IV. RISE TIME t IN NANOSECONDS

responses of the approximations and frequency magnitude responses for the approximation error transfer functions show similar results as in Example 1 and thus are not shown again.

VI. CONCLUSION

We propose a method to obtain second-order approximations for transfer functions in RLC trees. Examples show that the two-pole one-zero approximations give improved accuracy over the existing second-order approximations in terms of step response, frequency response, estimated delay time, and rise time. The results can be used to quickly estimate signal delay and other parameters in RLC trees.

ACKNOWLEDGMENT

The authors thank the reviewers, whose comments improved the paper.

REFERENCES

[1] L. O. Chua, C. A. Desoer, and E. S. Kuh, Linear and Nonlinear Circuits. New York: McGraw-Hill, 1987.

[2] Y. I. Ismail, E. G. Friedman, and J. L. Neves, “Equivalent Elmore delay for RLC trees,” in Proc. IEEE/ACM Design Automation Conf., June 1999, pp. 715–720.

[3] Y. I. Ismail, E. G. Friedman, and J. L. Neves, “Equivalent Elmore delay for RLC trees,” IEEE Trans. Computer-Aided Design, vol. 19, pp. 83–97, Jan. 2000.

[4] L. T. Pillage and R. A. Rohrer, “Asymptotic waveform evaluation for timing analysis,” IEEE Trans. Computer-Aided Design, vol. 9, pp. 352–366, Apr. 1990.

[5] G. Lindfield and J. Penny, Numerical Methods Using MATLAB. Englewood Cliffs, NJ: Prentice Hall, 2000.

[6] C. H. Wu, “Model order reduction for RLC trees,” M.S. thesis, Dept. Elect. Contr. Eng., National Chiao-Tung Univ., June 2003.

Indirect Test Architecture for SoC Testing

Mohsen Nahvi and André Ivanov

Abstract—A generic model for test architectures in the core-based system-on-chip (SoC) designs consists of source/sink, wrapper, and test access mechanism (TAM). Current test architectures for digital cores assume a direct connection between the core and the tester. In these architectures, the tester establishes a physical link between itself and the core, such that it can directly control the core’s design-for-testability (DFT), such as the scan chains or primary inputs. This direct connection undermines the modularity in the generic test architecture by tightly coupling its elements. In this paper, we propose a network-oriented indirect and modular architecture (NIMA) for postfabrication test in an SoC design methodology. In NIMA, test stimuli and expected results for digital cores are first compiled into new formats and subsequently encapsulated into packets. These packets are augmented with control and address bits such that they can autonomously be transmitted to their destination through a switching fabric. Owing to the indirect nature of the connection, embedded autonomous blocks at each core are used to apply the test to the core and compare the test results with expected values. This indirect access to the core decouples test data processing at the core from its communication, providing the basis for flexible and modular test design and programming. Moreover, NIMA facilitates remote access of single or multiple testers to an SoC, and enables the sending of test data to an SoC in-field in order to test the chip in its target system. Finally, NIMA serves in contributing toward the development of new test architectures that benefit from network-centric SoCs. We present a first implementation of NIMA when applied to a number of SoC benchmarks.

Index Terms—Core-based testing, design-for-testability (DFT), networks-on-chip (NoC), system-on-chip (SoC).

I. INTRODUCTION

The productivity gap is one that exists between the productivity of chip designers and the available resources on today’s complex chips [1]. An effective way to overcome the productivity gap is to reuse previously designed/verified functional blocks, or semiconductor intellectual properties (IPs), as embedded cores in a system-on-chip (SoC) design [2]. The integration of IP blocks into a design results in increased complexity of the design-for-testability (DFT) aspects and manufacturing test [3]. Testing core-based SoCs presents major challenges especially in regards to the limited accessibility of the embedded cores and the generation of a system DFT [2]. To reduce test development time and, hence, keep up with shorter time-to-market and time-to-volume pressures, a similar productivity improvement technique, i.e., reuse of the DFT and test program for each core, needs to be applied [3]. In the modular model of SoC testing, the core-user treats individual core test programs as distinct components and integrates/schedules these components into a system test program with limited knowledge of the core’s internal detail [2].

To address the core-based SoC testing challenges, a more structured and systematic approach than the traditional DFT is required. Zorian et al. [2] proposed a generic test architecture consisting of source/sink, wrapper, and test access mechanism (TAM). In this model, the source

Manuscript received September 5, 2002; revised August 2, 2003 and October 28, 2003. This work was supported in part by the Canadian Microelectronics Corporation (CMC), in part by the Natural Sciences and Engineering Research Council of Canada (NSERC), in part by Micronet Research and Development, in part by PMC-Sierra, and in part by the Gennum Corporation. This paper was recommended by Associate Editor K. Chakrabarty.

The authors are with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TCAD.2004.829796

0278-0070/04$20.00 © 2004 IEEE


provides the test stimuli to the core under test and the sink collects and performs an analysis of the test results. The TAM is the physical mechanism that connects the source and the sink with the core. Its design determines how efficiently information is received and transmitted from and to the outside world, and thus controls, in part, the total test time, the complexity of test flow, and test cost in general. The wrapper is a shell around a core providing an interface between the core and its surroundings and can isolate the core for testing purposes. The wrapper effectively enables “plug-and-play” cores where cores can be acquired from different providers and integrated into an SoC without any modifications. The IEEE P1500 working group is working toward a standard for embedded core test (SECT) to allow the automatic identification and configuration of testability features in integrated circuits containing embedded cores [4]. Toward that end, the P1500 aims at standardizing a scalable architecture in the form of a wrapper around a core and defines a core test language (CTL). The scalable architecture will provide a mandatory serial port and an optional parallel interface for testing a core. In addition, the architecture will provide a standard set of instructions enabling different modes, mandatory and optional, for the wrapper. A core provider will use the standard language to communicate to the core user the internal, external, and pattern information for every test mode of a core.

Embedded testing architectures in the form of built-in-self-test (BIST) have been used to reduce test cost [5]. In these techniques, both the test source and the sink are local to the circuit under test (CUT) and, hence, the TAM amounts to local wires between BIST circuitry and the core. This local communication model simplifies the problem of test data transmission. However, on-chip storage of the entire deterministic test vectors set for every core is not cost-effective or even practical in many cases. To eliminate the need for on-chip storage of test data, typically, linear feedback shift registers (LFSR) are used to generate pseudo random patterns as the stimuli in the BIST arrangement. However, using LFSR results in potentially lower fault coverage compared to that resulting from deterministic patterns obtained via automatic test pattern generation (ATPG).

Although BIST can be used for testing the cores of an SoC design, this work assumes that functional and/or ATPG-based tests are an essential part of chip sign-off. We also assume that the test data cannot be effectively stored in the proximity of cores and that it needs to be communicated between the source and the sink and the core. Therefore, the focus of this work is the development of a novel test architecture that requires the communication of test data through global interconnects acting as a communication link. Moreover, the typical arrangement in functional and/or ATPG-based tests relies on an external automatic test equipment (ATE) to act as both a test source and a sink. In this arrangement, the tester establishes a physical link between itself and the core such that it can directly control the core’s DFT, such as the scan chains or primary inputs. In our work, we propose a network-oriented indirect and modular architecture (NIMA), where the tester is assumed to lack direct control over a core’s DFT and is decoupled from the TAM and the core wrapper. Test data in NIMA is formatted into a new protocol and augmented with control bits. Subsequently, the reformatted test data and the control bits are encapsulated into packets such that the test data can be forwarded to its core autonomously. At a core, embedded autonomous blocks decode these packets and retrieve the test data. Finally, these embedded blocks apply the test stimuli to the core and compare the responses with expected results, identifying any error.

In the remainder of this paper, we first review prior proposed test architectures and motivate our current work. In Section III, we describe the concept of NIMA in detail. In Section IV, we present one example of NIMA’s implementation on an SoC platform, and Section V reports the experimental results. Finally, Section VI concludes the paper and points to future work.

II. PRIOR WORK

In SoC test architectures, test stimuli and test results are transported through the TAM. Hence, the TAM can be viewed as the communication link in the architecture. In addition to the TAM, extra control lines are used to properly set up the TAM. Hence, there are two sets of lines in SoC test architectures: 1) data lines (referred to as TAM in this work) and 2) control lines. Dedicated wires or existing functional interconnects can be used for the data lines [6]–[9]. While in the first three works the cores are modified such that each core has a transparent mode for testing, [9] uses the processor bus for test data transport. There are many proposed test architectures in the literature. Based on the connection method between the chip pins and the core terminals, these can all be grouped into three main categories: multiplexer-, serial-, and bus-based connections. Most of these architectures suggest the use of a serial control mechanism to properly set up the TAM.

In the first category, multiplexers are used to allow test access to the cores. The simplest method in this category is to multiplex the test pins to the primary inputs/outputs (I/Os) such that a direct path is established during test [10]. A second method modifies the cores such that each core has a transparent mode for testing [6], [7]. A recent third method provides a transparent path based on modeling the TAM design as an integer linear programming (ILP) problem to minimize the overall test time and overhead area [8].

A number of test architectures in the serial-based category use the established IEEE 1149.1 standard [11]–[13]. Whetsel, in [14], uses a hierarchical structure by introducing a tap link module (TLM). An improvement on the TLM is presented in [15], where the test access port (TAP) of the 1149.1 standard is kept unchanged from its original form and, hence, simpler TLM controls are designed.

A number of different variations of the bus-based connection schemes have been reported. Varma et al. [16] suggest a structured architecture based on separate data and control buses. In their work, provision has also been made for using several such buses with different widths. To simplify the control mechanism in the TAM architecture and provide scalability of the architecture through hierarchy, a multilevel bus structure connected in a tree topology has been suggested [17]. Marinissen et al. suggested the TestRail architecture [18], where cores are connected in a daisy-chain configuration and buses (or rails) can have different widths, fanin, and fanout. In TestRail, each core can be bypassed if needed to access the next one in line, and control is achieved via a serial connection. In the bus-based category, different methods are suggested for accessing the cores from the buses. Core access switches select P signals out of N bits of a bus and use the TestRail topology for the buses [19]. Whetsel has suggested an addressable architecture in [20]. In this latter architecture, each core is given an addressable test port, which can serially be assigned with its appropriate address to provide an intelligent distributed control mechanism for connecting cores to buses. Finally, a time division multiplexing technique has been suggested in [21]. In the latter, through the use of configurable and dedicated arbiters, cores autonomously assume the control of the bus.

In the prior test architectures, there is no distinction made between the communication and the application of test data. In these works, it is assumed that the tester establishes a data path between itself and a core, such that the tester has direct control of features of the core’s DFT. Using this data path, the tester applies the test data to the core and collects and observes the results. Fig. 1 shows an example of a simplified test arrangement with six data lines and two control lines. In Fig. 1, the data lines are grouped into two TAMs, such that the tester can have direct control of the scan chains of each core while minimizing test time. However, one system-level design challenge identified in [3]


Fig. 1. Example of a simplified SoC test architecture.

is due to system complexity. It is predicted that this challenge will lead to forcing a focus on communication rather than computation in the next ten years. It is stated in [3] that:

At 65 nm and below, communication architectures and protocols for on-chip functional processing units will require significant change from today’s approaches. As it becomes impossible to move signals across a large die within one clock cycle or in a power-effective manner, or to run control and dataflow processes at the same clock rate, the likely result is a shift to asynchronous [or, globally asynchronous and locally synchronous (GALS)] design style. In such a regime, islands of self-timed functionality communicate via network-oriented protocols.

Therefore, an underlying assumption of our work is that future SoCs will include a switching fabric coupled with network-oriented protocols for connecting cores together and that new methodologies and test architectures will be needed to lower the test cost by reusing such a fabric for testing the cores.

Evidently, the concept of having a network-on-chip (NoC) that uses network-centric protocols and a switching fabric for data communication on a chip is very new. Most relevant works are geared toward the problem of core interconnection [22]–[25]. The motivations for many of these works are the nonscalability of global wire delays and their effect on global synchronization, the degradation of buses’ electrical performance with every attached unit, and the need for special error control mechanisms due to the unreliable transmission medium. An early work proposing the use of a private switching fabric with network-centric protocols for post fabrication test is reported in [26]. To the best of our knowledge, there is no other work on the subject of test architectures that use network-oriented approaches. The only exception is the work in [27] that studies the impact of reusing an NoC for testing core-based systems.

In current test practices, one implicit assumption is that a single tester, acting as both the source and the sink, is in direct control of the embedded cores and their DFT. While this assumption results in simple test protocols in many cases, it undermines

the modularity in the generic test architecture by tightly coupling the elements of the test architecture, i.e., the source, sink, wrapper, and the TAM. Such a tester uses the total test bandwidth. Also, based on the test requirements of every core, the system integrator divides and fixes the bandwidth between different cores of the SoC. For the above template, the CUT is physically connected to the tester through a test-head and, to maintain timing requirements, a physical proximity between the chip and the tester is required. For the example illustrated in Fig. 1, out of the total of eight test pins, six pins are used for the test data and two pins are used for the test control. This pin arrangement is based on: 1) the number of test patterns for each core, assuming identical frequency of operation and 2) the total given number of test pins for the chip. In addition, the data lines are divided into two TAM groups, as shown in Fig. 1, and the wrappers around the cores are designed to match the number of the cores’ internal scan chains to the width of the TAM. The arrangement of Fig. 1 requires a tester with eight test pins. If such a tester is unavailable, the TAMs need to be changed to accommodate the available tester’s channels. Moreover, if after the design or in a subsequent design, any of the cores needs to be connected to a different TAM width from the widths given in Fig. 1, the wrapper for that core would need to be changed. As an additional example of the close coupling between the test elements of the test arrangement of Fig. 1, the tester is forced to operate at a minimum frequency as dictated by different delays in the forward and return paths of the TAMs. Here, the forward path refers to the data lines from the tester to the cores and the return path refers to the data lines from the cores to the tester.

From the above arguments, the diminished modularity in tightly coupled test architectures leads to a reduced flexibility in being able to modify the test architecture elements, as the modification of any part of the architecture requires subsequent changes in the other parts. For example, the addition of one extra core on a TAM can change the timing characteristics of the TAM and, hence, the operating frequency of the tester. By the same argument, the restricted modularity also results in a reduced flexibility in regards to implementing or integrating new schemes in the methodology. As chips become more complex, a


Fig. 2. Conceptual representation of NIMA.

high flexibility for implementing new schemes in the test methodology is key for keeping the overall test cost low. Here, we refer to such flexibility as scalability of the test architecture and methodology. Moreover, with a tightly coupled test model, testing requires a close proximity of a chip and the tester. Hence, in-field testing, i.e., testing the chip when in its target system, or remote-access testing is virtually impossible or extremely costly. In addition, multiple testers cannot test a chip simultaneously. A multiple-tester arrangement is key if test resource pooling is required between multiple sites or companies to reduce test cost. Furthermore, using a multiple-tester arrangement can be cost-effective when the cores’ test speeds vary significantly. In such a scenario, it is more cost-effective to reserve a high-speed tester’s channels for fast cores and use low-speed channels for lower speed cores. A multiple-tester arrangement is in direct contrast to a test architecture where operating at the lowest frequency is the fundamental mechanism used to solve the disparity between the speed of the tester, I/O pins, the TAM, and the cores. Volkerink et al. in [28] have shown the cost benefit of bandwidth matching between a tester and cores.

As the first steps toward developing solutions addressing the issues discussed above, in this work we propose an indirect¹ and modular architecture (NIMA), where different testers can connect to a common switching fabric and send test data to the cores under test. We consider the NIMA as a special communication network that consists of hardware and software that allow the transportation of test stimuli and expected results from multiple sources to multiple cores. In addition, we also propose a switching fabric for the test architecture. However, the NIMA architectural design is such that it can be migrated into a template where the common switching fabric used for core interconnections is also used for testing the SoC.

The indirect methodology of NIMA breaks the coupling between the core, the TAM, and the tester by decoupling test-data processing and its communication. In our terminology, test-data processing refers to the functional behavior of source/sink and wrapper, whereas test-data communication refers to the interaction between source/sink and the wrapper. Hence, NIMA alleviates the previously discussed problems associated with tightly coupled test models. NIMA provides the basis for modular test design and programming. NIMA also enables single or multiple testers to send test data over a local area network (LAN) or a wide area network (WAN), such as the Internet, to an SoC in-field in

¹The indirect property of NIMA refers to the tester’s lack of direct control over a core’s DFT.

Fig. 3. Example showing function of the IM block.

order to test the chip in its target system. Finally, the control mechanism of the test architecture is incorporated in the packets of NIMA. Hence, fewer test pins are required when using NIMA, compared to typical test architectures that require separate data and control lines in their architectures.

III. NIMA

A. Concept

The key concept in the NIMA is establishing an indirect digital communication path from multiple sources to multiple destinations through a switching fabric [26]. Testing ICs using the NIMA scheme requires that test stimuli and expected results for cores are first compiled into new formats and then encapsulated into packets. These packets are subsequently augmented with control and address bits such that they can autonomously be transmitted to their destination through a switching fabric. Owing to the indirect nature of the connection, embedded autonomous blocks at each core are responsible for applying the test to the core and comparing the test results. To simplify the requirements of these embedded blocks, we assume here that: 1) the packets can arrive


Fig. 4. Conceptual three-layer model in NIMA.

Fig. 5. NIMA packet format.

at their destination cores with varying delays; 2) the packets arrive in their original order; and 3) no packet is lost in the communication link.

The proposed NIMA is illustrated in Fig. 2 as a block diagram, where IP cores, a switching fabric, on- and off-chip sources, embedded autonomous sequencers (EAS), embedded autonomous results analyzer (EARA), an interface matching (IM) block, and an optional transmission system are shown. In NIMA, dedicated blocks at each core, indicated as EAS blocks in Fig. 2, extract test stimuli from the incoming packets and apply them to their respective core. Moreover, in synchrony with the application of the test vectors by the EAS blocks, dedicated EARA blocks compare test results at the output of their respective core to the expected results within the incoming packets. Concepts and implementations of EAS and EARA for a dedicated autonomous scan-based testing (DAST) methodology were first presented in [29] and [30].

As illustrated in Fig. 2, the connection between off-chip sources and the IM block can be direct and/or through a transmission system such as a WAN or a LAN. The IM block in Fig. 2 receives test packets from different test sources with varying bandwidths and matches these bandwidths to those of the SoC, where we define bandwidth to be the product of the frequency and the number of the channels. As an example, consider the case in Fig. 3 where four different sources send test data to the SoC. These four sources have 5, 3, 2, and 1 channels, respectively. Moreover, these sources can send data at maximum frequencies of 50, 100, 100, and 150 MHz per channel, respectively (note that there are no control lines associated with these sources, as control signals are embedded in the incoming packets). Also, in Fig. 3 we can identify three different groups of test input pins to the SoC. These three groups have 2, 2, and 1 channels with maximum frequency of 200, 100, and 150 MHz per channel, respectively. According to the sequence of the incoming packets, the IM block matches the total incoming bandwidth of 900 MHz to the total SoC bandwidth of 750 MHz.
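The bandwidth totals quoted for Fig. 3 follow directly from the stated definition (frequency times number of channels, summed over groups). The short Python sketch below only reproduces that arithmetic; the function name `bandwidth` and the tuple layout are illustrative choices, not part of NIMA.

```python
def bandwidth(groups):
    """Total bandwidth in MHz for a list of (channels, max_freq_mhz) pairs."""
    return sum(channels * freq_mhz for channels, freq_mhz in groups)

# Four off-chip sources of Fig. 3: (number of channels, MHz per channel).
sources = [(5, 50), (3, 100), (2, 100), (1, 150)]
# Three groups of SoC test input pins of Fig. 3.
soc_pins = [(2, 200), (2, 100), (1, 150)]

incoming = bandwidth(sources)    # 250 + 300 + 200 + 150
available = bandwidth(soc_pins)  # 400 + 200 + 150
print(incoming, available)       # 900 750
```

Because the incoming total exceeds what the SoC pins can accept, the IM block cannot simply pass every source through at its maximum rate; how it sequences the packets is not specified in this passage.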

We regard the NIMA as a special communication network consisting of hardware and software that allow transportation of test stimuli and expected results from multiple sources to multiple cores. Regarding NIMA as a special communication network helps in using the accumulative knowledge in different communication networks developed for audio, video, and data. To promote the modularity and simplicity of the design tasks, a layered architecture with a formal interface between each layer is the most established method to divide the functions

Fig. 6. Black-box diagram of a switch in NIMA with four output channels.

implemented by communication networks. The International Organization for Standardization (ISO) proposes a seven-layer open systems interconnection reference model (OSI model) [31]. Since the communication tasks involved in SoC testing are not as complex as those in other communication networks, we simplify the model in NIMA to a three-layer model consisting of a physical layer, a network layer, and an application layer, as shown in Fig. 4. In this model, the tasks in one element of the architecture deal directly with tasks in another element within the same layer through a virtual link. The physical connection, however, is only in the vertical direction of the model except in the physical layer, where there is no virtual link, only a physical link.

B. Physical Layer

The physical layer encompasses the actual interconnection medium. For contemporary ICs, the physical medium is the metal wires routed between the SoC blocks. This layer specifies the voltage level of the signals on the wires, the timing of the signal events, the signaling techniques, and other physical properties of the link, such as protection measures against cross-talk. Finally, the physical layer presents the data as a stream of 1’s and 0’s to the network layer in our present design of NIMA. We assume the use of synchronous transmission in NIMA, where a separate clock is provided next to the data bit stream. However, it is possible to encode the clock in the data and later extract it


Fig. 7. Occupancy of test-stimuli subchannels in NIMA switches.

Fig. 8. Occupancy of test-results subchannel in NIMA switches.

in the switching fabric in a GALS environment. In the OSI seven-layer model, a data-link layer sits above the physical layer to handle error detection and access control. We omit this layer in our design because we assume that the reliability of the physical link can be maintained.

Although electrical signals over metal wires are the dominant physical link in today’s chips, this does not mean that NIMA cannot use a completely different physical connection, such as guided or unguided electromagnetic waves, in the future. One benefit of a layered approach in NIMA’s design is the possibility of utilizing new approaches with minimal design effort as new techniques become available and when their use is justified.

C. Network Layer

1) Packet Format: The network layer dictates the details regarding how data is transmitted across the network, the switching techniques used, and the network topology implemented. The most fundamental design issue is the packet format. The packet format in NIMA is illustrated in Fig. 5.

We next briefly describe each field.

a) Sync Word Field: The sync word signals the beginning of the packet to a switch. The starting point of a packet within the incoming bit stream needs to be identified for the switches, as the communication link in NIMA can be idle for any length of time. The predefined pattern in sync word identifies a new packet to the switches.

b) Data Length Field: Packets in NIMA can have varying data length to accommodate different requirements of cores in terms of test data length. The data length field identifies the length of the embedded data in the packets to the switches, and hence, enables the switches to switch entire packets to the proper output channel.

c) Address Length Field: In NIMA, to promote scalability, we have opted for a dynamic addressing mechanism as discussed later in Section III-C3. A variable length address field is used to achieve the dynamic addressing mechanism. The address length defines the length of the address field.

d) Address Field: The address field holds a variable length of address bits. The details about this field are provided later in Section III-C3.

e) Data Field: The data field holds a variable number of data bits (alternatively referred to as the payload in this document). The payload includes test data as well as other related patterns for the application layer.
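As a rough illustration of the five fields just described, the following Python sketch serializes and recovers a packet on a single line, in the order sync word, data length, address length, address, data. The paper does not give the sync pattern or the header field widths at this point, so `SYNC`, `LD_BITS`, `LA_BITS`, and `N_BITS` are hypothetical values chosen only for the example, as are the assumptions that the address length field counts address locations and that an idle link carries zeros.

```python
SYNC = "10101011"  # hypothetical sync pattern (S = 8 bits), not from the paper
LD_BITS = 8        # hypothetical width of the data length field
LA_BITS = 3        # hypothetical width of the address length field
N_BITS = 2         # n: bits per address location (assumed)

def build_packet(address, data):
    """address: list of location values (0 .. 2**n - 1); data: string of '0'/'1'."""
    if len(data) >= 2 ** LD_BITS or len(address) >= 2 ** LA_BITS:
        raise ValueError("field too long for the assumed header widths")
    if any(not 0 <= loc < 2 ** N_BITS for loc in address):
        raise ValueError("address location does not fit in n bits")
    header = (SYNC
              + format(len(data), f"0{LD_BITS}b")      # data length field
              + format(len(address), f"0{LA_BITS}b"))  # address length field
    addr_bits = "".join(format(loc, f"0{N_BITS}b") for loc in address)
    return header + addr_bits + data

def parse_packet(stream):
    """Locate the first sync word in the bit stream and split out the fields."""
    start = stream.find(SYNC)
    if start < 0:
        return None  # link idle: no packet found
    pos = start + len(SYNC)
    d_len = int(stream[pos:pos + LD_BITS], 2)
    pos += LD_BITS
    a_len = int(stream[pos:pos + LA_BITS], 2)
    pos += LA_BITS
    address = [int(stream[pos + i * N_BITS: pos + (i + 1) * N_BITS], 2)
               for i in range(a_len)]
    pos += a_len * N_BITS
    return address, stream[pos:pos + d_len]

# A packet preceded and followed by idle (all-zero) bits on the link.
stream = "0000" + build_packet([2, 1, 3], "110010") + "000"
print(parse_packet(stream))  # ([2, 1, 3], '110010')
```

Because the header carries both lengths, a receiver that has found the sync word knows exactly where the packet ends, which is what lets a switch forward an entire packet without inspecting the payload.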

The sizes of the fields in the NIMA packet format, in terms of number of bits, are denoted by $S$, $L_D$, $L_A$, $A$, and $D$, as shown in Fig. 5. For the address field, $A$ represents the number of locations in this field where each location contains $n$ bits. This results in switches having $2^n$ output channels. In addition, the first four fields, identified in Fig. 5 with a different color/shading from the last field, together constitute the header of a packet.

2) Switches: Switches in NIMA have one input and $2^n$ output channels. To promote the scalability of the NIMA design, each channel in the switch is of width $2k$ and comprises test-stimuli and test-results subchannels. The black-box diagram of a switch in NIMA with $n = 2$ is shown in Fig. 6.

For switches in NIMA, the least significant bit (LSB) of the test-stimuli subchannel is denoted as the primary line and only this primary line follows the format of NIMA packets. For this reason, the packet header is only defined in the primary line, denoted by Line 0 in Fig. 7. As illustrated in Figs. 7 and 8, with the header only defined in the primary line of the test-stimuli subchannel, part of the switch channel bandwidth is reserved and thus not used. However, this arrangement results in a scalable architecture, where the value of $k$ can range from one to its maximum available value without any change in the design of switches or NIMA’s network layer protocol. That is, the same generic code for the switches can be used to instantiate the NIMA network for a particular design.

Lines 1 to $(k-1)$ in both the test-stimuli and the test-results subchannels can either hold valid or invalid payload in a given packet. The V flag bits shown in Fig. 7 help the tasks in the application layer of NIMA to identify lines with valid payload, where a line with valid payload is identified with logic 1. Note that, in terms of having valid or invalid payload data, the status of corresponding lines in the test-stimuli and the test-results subchannels are identical. Hence, for example, if Lines 3 to $(k-1)$ do not carry valid payload data in the test-stimuli subchannels, the same lines do not carry a valid payload in the test-results subchannels. Also note that if a packet is present on a switch’s output channel, Line 0 in both subchannels carries valid payload data.

In addition to payload validity, the tasks in the application layer require further payload information. As shown in Fig. 9, the payload is divided into an array of $k \times D$ bits for some applications. However, certain payloads may not occupy the trailing bits of this array in all the $k$ bits of the output subchannels. Fig. 9 shows an example for a design with $k = 6$. In this case, a payload of length 81 bits is divided into an array of $14 \times 6$ bits where an X denotes invalid bits in the array. As shown in Fig. 7, C flag bits can be programmed in Lines 1 to $(k-1)$ of the test-stimuli subchannel to identify the invalid bits in the array.

However, the C flag requires only $\lceil \log_2 k \rceil$ bits and is used as a binary number, $|C|$, in the application layer. This number shows only the status of the trailing bits in Lines 1 to $(k-1)$ of the subchannels, as Line 0 in the subchannels will always have a valid trailing bit whenever a packet is present. Thus, the number of lines with an invalid trailing bit is equal to $k - 1 - |C|$. For the case shown in Fig. 9, the C flag consists of three bits and is equal to 010, indicating $6 - 1 - 2 = 3$ lines having invalid trailing bits.
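The relationship between the payload length, the array depth $D$, and the C flag can be checked numerically. The following Python sketch is illustrative only: the function name `c_flag` is hypothetical, and it assumes that only the last row of the array can be partially filled, so that the invalid trailing bits fall on the highest-numbered lines.

```python
def c_flag(payload_bits: int, k: int) -> tuple[int, int, str]:
    """Return (D, |C|, C as a bit string) for a payload spread over k lines.

    Assumption (not stated explicitly in the text): only the last row of the
    array may be partially filled.
    """
    if k < 1 or payload_bits < 1:
        raise ValueError("k and payload_bits must be positive")
    depth = -(-payload_bits // k)        # D = ceil(payload_bits / k)
    invalid = k * depth - payload_bits   # lines whose trailing bit is invalid
    c_value = k - 1 - invalid            # |C|, from invalid = k - 1 - |C|
    width = (k - 1).bit_length()         # ceil(log2(k)) bits for the C flag
    c_bits = format(c_value, f"0{width}b") if width else ""
    return depth, c_value, c_bits


# Example from the text: k = 6 and an 81-bit payload.
print(c_flag(81, 6))   # (14, 2, '010')
```

For $k = 6$ and 81 payload bits, the sketch reproduces the values quoted above: a depth of 14, $|C| = 2$, and a three-bit C flag of 010.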

In addition, NIMA is designed such that packets can succeed each other without any gaps, and switches in a network operate within the same clock domain if a clock signal is provided along the data line(s).

3) Dynamic Addressing Mechanism: Using the packet structure of Fig. 5, the number of bits in the address field is $A \le n \times 2^{L_A}$. Therefore, the logical address space (LAS) is $\mathrm{LAS} \le 2^{n \times 2^{L_A}}$. If the values of $n$ and $L_A$ are chosen appropriately, the LAS can be very large and, hence, enable the system designer to use partial addressing. Using variable-length addressing, the entire LAS is essentially divided into $2^{L_A}$ hierarchical levels, each comprising $2^n$ pages. The value of $A$, the length of the address field, need not be constant and can be chosen such that $N \le 2^{n \times A}$, where $N$ is the total number of cores to be accessed for testing. Hence, a configurable and modular architecture is provided where the size of the network scales and grows according to the number of the cores without any design change.

Fig. 9. Typical payload array in the subchannels with invalid last bits marked by “X.”

Fig. 10. Address spaces for NIMA’s network layer.

The physical address space (PAS) is defined as the part of the LAS that is actually being used. The PAS can vary in size and be chosen to be the minimum size required. When new cores are added to the SoC in later iterations or revisions and the PAS is fully assigned, new levels of addresses can be introduced to increase the size of the PAS. Thus, new cores can be accommodated with no design modification of the network layer. This yields the advantage of a scalable TAM architecture between SoC design versions, referred to here as design-version scalability. Moreover, if an existing SoC is used as an embedded core in a subsequent SoC, the PAS of the earlier SoC is assigned to the lowest part of the subsequent SoC’s LAS. The PAS of the subsequent SoC is then continued from the next available hierarchical level of the $2^n$ pages. This creates another level of scalability: multilevel scalability. Again, using the design-version scalability feature, new cores in the subsequent SoC are assigned addresses in the PAS. Design-version scalability and multilevel scalability are illustrated conceptually in Fig. 10.
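As a rough illustration of how the address length can grow with the number of cores, the sketch below computes the minimum number of hierarchical levels and the corresponding number of address bits. It assumes that each level selects one of $2^n$ pages and consumes $n$ address bits, and that at most $2^{L_A}$ levels exist; the function name `address_bits` is hypothetical.

```python
def address_bits(num_cores: int, n: int, l_a: int) -> tuple[int, int]:
    """Return (levels, A): minimum hierarchical levels and address bits.

    Assumes each level selects one of 2**n pages and consumes n address bits,
    and that at most 2**l_a levels are available.
    """
    if num_cores < 1 or n < 1 or l_a < 0:
        raise ValueError("invalid parameters")
    levels = 1
    while (2 ** n) ** levels < num_cores:   # smallest level count covering all cores
        levels += 1
    if levels > 2 ** l_a:
        raise ValueError("cores do not fit in the logical address space")
    return levels, n * levels


# With n = 2 and L_A = 6:
print(address_bits(4, 2, 6))    # (1, 2)
print(address_bits(5, 2, 6))    # (2, 4): a new level is introduced
print(address_bits(16, 2, 6))   # (2, 4)
```

Going from four to five cores introduces a second level without touching the per-level decoding, which is the essence of design-version scalability. An actual network may use more levels than this minimum, depending on its topology.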

It will be possible to connect each first-level switch to a primary I/O pin of the system chip. This effectively divides the entire chip space/address into subsections with uncorrelated networks that can be individually addressed and accessed. Hence, the effective logical addressing space, $\mathrm{LAS}_{\mathrm{Eff}}$, for an $n$-dimensional architecture can be sized to be $\mathrm{LAS}_{\mathrm{Eff}} = m \times 2^{n \times 2^{L_A}}$, where $m$ is the number of chip primary I/Os. Note that, in general, $m \ge 2k$, where $2k$ was defined above to be the channel width of the switches.

4) Routing in the Switches$^{2}$: To eliminate the need for maintaining routing information in the switches, the packets in NIMA are routed using source routing, i.e., the route is predefined and hardwired. The sync word in the primary line identifies the beginning of an incoming packet to the switches. The switches in NIMA save the $S + L_D + L_A$ bits and the first $n$ bits in the address field of a packet, as defined in Fig. 5. The switches then use the first $n$ bits in the address field to decode the destination of the packet and identify the output channel to which the packet will be forwarded. As soon as the output channel is determined, the packet is routed to that channel. This ensures a minimal number of buffers and a maximum delay of $S + L_D + L_A + n$ clock cycles in the switches. This technique is referred to as cut-through routing [31]. In cut-through routing, packets are forwarded to the router output as soon as the destination is known, without waiting for the tail of the packet to arrive. This technique reduces the latency of the network and the amount of required hardware in the form of buffers in the routers.

$^{2}$In our design, the network layer is based on virtual indirect connections provided by switches. That is, to simplify NIMA, packets are sent in a particular order and are required to reach their destination in that order. More elaborate designs will include out-of-order packet reception, which we do not consider in this work. Future studies can look into the effects of out-of-order packet reception in terms of design complexity and the cores’ fault coverage in cases where packets are not put back in their original order at the cores.


Fig. 11. Block diagrams of the cores used in the SoC3 platform.

Fig. 12. Interconnect architecture for the switch fabric in SoC3.

5) Application Layer: The application layer in our design is an aggregation of the fourth to the seventh layers in the OSI model and defines the protocol for accessing the network. At this level, the data comprises test vectors, test results, and DFT control signals that have to be converted to/from packets. The NIMA architecture allows arbitrary bit widths for the switches’ subchannels, $k$, such that packets in a subchannel can be blocks of $k$-dimensional bit arrays instead of a one-dimensional bit stream (note that $2k \le m$, where $m$ is the total number of primary test pins). As a result, the cores that require the use of the test network must have the capability of scaling the packet payload into the bit width suitable for their application. Moreover, for packets wider than 1 bit, handshaking ready signals are required to indicate which bits are valid, as the data may not completely fill the entire payload array, as described in Section III-C2.

Since cores on an SoC are assumed to possibly have different origins, a standardized test interface or wrapper at each core is critical. Moreover, a standard language for the description of core test programs and DFT is also essential. One such standard, referred to as P1500 [4], is currently under development by the IEEE. Specific P1500 instructions are required to support full scan test as well as any other DFT strategies. Furthermore, a mechanism is needed to generate the P1500 control signals in the correct sequence at the core site. This mechanism and the test programs based on CTL [4] fall in NIMA’s application layer.

For cores with a P1500 wrapper, a set of P1500 wrapper control flags is devised in the application layer of NIMA. These control flags enhance the flexibility and scalability of NIMA such that NIMA remains independent of future modifications or improvements to the P1500 standard as long as the interface between the application layer and the network layer is maintained. In our design, these control flags are based on a set of wrapper control bits embedded in the payload, which results in an additional hierarchy in the message that is to be packetized for each core. These bits serve as instructions for the P1500 control mechanism. Hence, any modification to the P1500 wrapper will result in the modification of these bits or their interpretation in the application layer.

Blocks of EAS and EARA, as shown in Fig. 2, are other application-layer tasks. EAS blocks are responsible for receiving test stimuli and applying them according to the instructions embedded in the payload. These instructions detail the data bits that must be applied to the pins of the core under test and the timing at which they are applied. EARA blocks, on the other hand, receive the expected results and compare them to the output of the core. Using internal signaling, EARA blocks operate in synchrony with the EAS blocks to ensure the integrity of the test process.

Another task in the application layer is decompression of the incoming test data if compression is used on the original test data. Compression and decompression of test data can help reduce test time, and hence test cost, by reducing the required memory in the external tester and the number of bits needed for communication. There are many different compression and decompression techniques [32], and if used, their tasks fall within the application layer in NIMA.

Fig. 13. Block diagram of a switch in our NIMA implementation.

Fig. 14. EAS hardware block diagram.

Scheduling is another application-layer task. There are many proposed schemes for SoC test scheduling in the literature [33]–[36], and the results of these works can be used for scheduling in NIMA. The problem definition in NIMA scheduling is similar to that of the above works: determine when to send test packets for each core such that, while keeping within the bounds of any given constraints, the overall test time and the hardware complexity are minimized. For instance, the scheduler must prevent conflicts in the network resources and at the same time minimize the test time. During the test, activating all core DFT simultaneously may result in power dissipation that exceeds the chip junction or package heat tolerance, so the scheduler needs to control the activity of the DFT based on a given power budget [37]. Moreover, the test data from the NIMA network may arrive at faster rates than can be consumed by a core, so a buffering scheme is required to ensure that the core gets the data only when it is ready to accept it. Otherwise, the integrity of the test will not be maintained. To prevent buffer overflow, constraints must be applied to the time intervals between consecutive packets destined for the cores.

IV. IMPLEMENTATION

Fig. 15. EAS algorithmic state machine (as a pseudocode).

To validate the concept, NIMA was implemented on a simple SoC platform, denoted here as SoC3. This platform uses the b10 and the b15 blocks from the Politecnico di Torino circuits (I99T) in the ITC’99 benchmark [38], for which the RTL code is readily available. Using the Synopsys Design Compiler, the circuits included in the platform, i.e., b10 and b15, were first synthesized in a 0.18-μm technology. Using the same tool, we then inserted one or more scan chain(s) in two copies of each circuit and generated their respective test files. In this way, SoC3 consists of four cores: 1) one instance of b10 with a single scan chain; 2) one instance of b10 with three scan chains; 3) one instance of b15 with one scan chain; and 4) one instance of b15 with two scan chains. The cores of the platform are illustrated in Fig. 11 as block diagrams.


Fig. 16. EARA hardware block diagram.

A. Physical Layer

For the physical layer, we used the traditional metal interconnect. Automatic tools, as part of a standard application-specific integrated circuit design flow, routed the wires in our implementation of NIMA. The voltage on the wire is expected to swing between 0 and 1.8 V, which is a typical range for the targeted 0.18-μm technology. The analysis suggested that the capacitive coupling between routed wires would not be severe enough to cause erroneous signal transitions, so no additional hardware for crosstalk protection was implemented.

B. Network Layer

To mirror a real SoC with embedded cores, we assumed that there is no direct access to the cores in SoC3 from its primary I/Os. Moreover, we assumed the availability of three test pins in the platform. Based on these assumptions, we designed the switching fabric such that the cores are accessible through a maximum of two levels of switches. Fig. 12 illustrates the interconnect architecture used for the switching fabric in SoC3.

Fig. 13 illustrates a block diagram of the switches in our NIMA implementation. In Fig. 13, incoming packets are first buffered. Subsequently, appropriate logic informs the finite state machine (FSM) of the detection of the sync word that, in turn, indicates the beginning of a packet. After the detection of a packet, the FSM provides the necessary signals for the packet end block such that it calculates the length of the packet. The FSM also provides the necessary signals for the last switch detection block. Based on the signal from the last switch detection block, if the switch is a last switch, i.e., a switch that is directly connected to cores, the switch outputs only the payload and not the entire packet. As detailed in Section III-C2, based on the valid or invalid status of each link in the output channels and the trailing bits in the payload array, the necessary information is extracted from the packets to generate the Ready signals. Finally, the FSM provides the necessary switching signals for the De-Mux block to route the packet, or the payload in the case of last switches, to the appropriate output channel.

Fig. 17. EARA algorithmic state machine (as a pseudocode).

TABLE I. AREA AND POWER REQUIREMENTS FOR THE NETWORK LAYER SWITCHES

The switches in the network layer that process the packet headers and eventually route packets were written in VHDL RTL and synthesized with the following parameters: $S = 6$, $L_D = 10$, $L_A = 6$, $n = 2$, and $k = 1$. The switches buffer the incoming data for a maximum of $S + L_D + L_A$ bits in shift registers. In our implementation, this amounts to 22 bits. After detecting an incoming packet from a sync word in the primary line, the switches use the first two bits in the address field to decode the destination of the packet and identify the output channel to which the packet will be forwarded. In our implementation, after the routing decision is made, the switch deletes these two bits from the address field, updates the address length field, and passes the packet to the appropriate output channel.
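The per-switch routing step described above can be sketched in a few lines of Python. This is an illustrative model rather than the VHDL used in the implementation: the `Packet` class and the `route` and `hops` functions are hypothetical, the address length field is modeled implicitly by the length of the remaining address string, and the sync word and data length field are omitted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    address: str   # remaining address bits, next hop first (assumed ordering)
    payload: str   # payload bits


def route(packet: Packet, n: int = 2) -> tuple[int, Packet]:
    """Consume the first n address bits; return (output channel, forwarded packet)."""
    if len(packet.address) < n:
        raise ValueError("address field shorter than n bits")
    port = int(packet.address[:n], 2)                         # decode the output channel
    forwarded = replace(packet, address=packet.address[n:])   # strip the used bits
    return port, forwarded


def hops(packet: Packet, n: int = 2) -> list[int]:
    """Follow a packet through successive switches until no address bits remain."""
    ports = []
    while packet.address:
        port, packet = route(packet, n)
        ports.append(port)
    return ports


print(hops(Packet(address="1001", payload="1101")))   # [2, 1]
```

A four-bit address with $n = 2$ therefore traverses two levels of switches, and each switch needs to examine only the leading two address bits.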

The ready signal is of width $\lceil \log_2 k \rceil + k$ bits and is the concatenation of three parts. The LSB is a general flag showing whether a valid packet is present on a switch’s output channel. The next $k - 1$ bits are a direct copy of the V flag bits in the test-stimuli subchannel, as shown in Fig. 7. The remaining bits are a copy of the C flag bits, as shown in Fig. 7 and explained in Section III-C2.
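The layout of the ready signal can be made concrete with a small packing routine. The sketch below is illustrative: the function name `ready_signal` is hypothetical, and the ordering of the V flags within their field (Line 1 in bit 1, Line 2 in bit 2, and so on) is an assumption.

```python
def ready_signal(packet_present: bool, v_flags: list[int], c_value: int, k: int) -> int:
    """Pack the ready signal: bit 0 = packet flag, bits 1..k-1 = V flags, rest = C flag."""
    if len(v_flags) != k - 1:
        raise ValueError("expected k - 1 V flag bits (Lines 1 to k - 1)")
    c_width = (k - 1).bit_length()            # ceil(log2(k)) bits for the C flag
    if not 0 <= c_value < (1 << c_width):
        raise ValueError("C flag value does not fit in ceil(log2(k)) bits")
    word = int(packet_present)                # LSB: general packet-present flag
    for line, flag in enumerate(v_flags, start=1):
        word |= (flag & 1) << line            # V flag of Line `line` (assumed order)
    word |= c_value << k                      # C flag occupies the remaining upper bits
    return word


# k = 6: all lines valid, |C| = 2, packet present -> 9-bit ready signal.
print(format(ready_signal(True, [1, 1, 1, 1, 1], 2, 6), "09b"))   # 010111111
```

For $k = 1$, as in the present implementation, the signal reduces to the single packet-present bit.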


TABLE II. EAS AREA AND POWER FOR ITC’02 SoC BENCHMARK MODULES

C. Application Layer

The EAS and EARA logic for each block is generated in RTL VHDL and placed on the periphery of the cores. The EAS and EARA blocks were designed to interface their respective core’s test pins to the two-bit-wide channels in the switches. Hence, a simple serial-to-parallel scaling mechanism was adopted in the EAS and EARA logic. We developed C programs that take as input the ATPG test stimuli and expected test results and insert three-bit op-codes at the beginning of each section of the test program to convert the test vectors and expected results into EAS and EARA protocols, respectively. In general, the time taken for this protocol synthesis is less than one second on Sun Blade 100 machines. The following op-codes are used as simple instructions in the EAS and EARA:

1) Shift-PI-BSR: shift into primary input boundary scan register;
2) Shift-SC: shift into scan chains;
3) Shift-SC-BSR: shift into scan inputs boundary scan register;
4) Assert-Clk: assert the test clock;
5) Shift-PO-BSR: shift into primary output boundary scan register;
6) Shift-SO-BSR: shift into scan outputs boundary scan register.

Other preprocessing C programs were used to construct the packets of NIMA’s network layer and to encapsulate the EAS and EARA data files into the packets’ payload. Again, as in the case of generating the EAS and EARA data files, the preprocessing can be executed off-line with minimal computational effort.
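The op-code insertion step can be illustrated as follows. The sketch is in Python rather than C and is not the preprocessing program used in this work: the numeric op-code values are placeholders (the actual three-bit encoding is not given here), and the names `OpCode` and `encode_section` are hypothetical.

```python
from enum import IntEnum


class OpCode(IntEnum):
    # Placeholder values: the actual three-bit encoding is an assumption.
    SHIFT_PI_BSR = 1
    SHIFT_SC = 2
    SHIFT_SC_BSR = 3
    ASSERT_CLK = 4
    SHIFT_PO_BSR = 5
    SHIFT_SO_BSR = 6


def encode_section(op: OpCode, data_bits: str = "") -> str:
    """Prefix one section of test data with its three-bit op-code."""
    if set(data_bits) - {"0", "1"}:
        raise ValueError("data_bits must contain only '0' and '1'")
    return format(op.value, "03b") + data_bits


# A hypothetical fragment: load primary inputs, load the scan chain, pulse the clock.
program = "".join([
    encode_section(OpCode.SHIFT_PI_BSR, "10"),
    encode_section(OpCode.SHIFT_SC, "1011"),
    encode_section(OpCode.ASSERT_CLK),
])
print(program)   # 00110010101110 0
```

Because each section is self-describing, the resulting bit string can be cut into payloads without the network layer needing to interpret it.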

Fig. 14 illustrates the hardware block diagram of an EAS block. As seen, the hardware implementation of the EAS is extremely simple and compact. The incoming data is captured in a shift register until the FSM block can decode the op-codes. Test stimuli destined for the primary inputs (PI) are sent to the PI-BSR, and the appropriate shift/capture signals are asserted by the FSM. The same is true for test data destined for the scan inputs (SI) of the core. Upon receipt of the Assert-Clk op-code, the EAS toggles the core’s test clock. Fig. 15 gives the algorithmic state machine (ASM) of the EAS as a pseudocode.

Fig. 16 illustrates the hardware block diagram of an EARA block. The hardware implementation of the EARA is extremely compact and requires minimal design effort for any given core. The incoming data is captured in a shift register until its FSM block can decode the op-codes (Fig. 17 gives the ASM of the EARA as a pseudocode). Expected test results for the primary outputs are then sent to the PO-BSR, and the appropriate shift/capture signals are asserted by the FSM. The expected test results of the scan outputs, however, are always captured in the SO-BSR. Using XOR gates, the outputs of both the PO- and SO-BSR are always compared to a core’s PO and SO values, respectively. However, using AND gates, the results of the comparison are gated by the $\mathrm{ST}_{\mathrm{PO}}$ and $\mathrm{ST}_{\mathrm{SO}}$ signals, respectively. The outputs of the AND gates then act as chip-enable signals to two flip-flops such that, if a mismatch is detected, the go/no-go signal is asserted high.
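The compare-and-gate behavior of the EARA can be modeled behaviorally. The sketch below is a simplification under stated assumptions: it treats one output path (PO or SO) at a time, and it assumes that the go/no-go flag is sticky, i.e., that it stays high once a strobed mismatch has been seen. The function names are hypothetical.

```python
def eara_compare(expected: list[int], observed: list[int], strobe: bool) -> bool:
    """XOR compare of expected vs. observed bits, gated (AND) by the strobe signal."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same width")
    mismatch = any(e ^ o for e, o in zip(expected, observed))
    return strobe and mismatch


def go_no_go(cycles) -> bool:
    """Return True (no-go) if any strobed cycle shows a mismatch (assumed sticky flag)."""
    fail = False
    for expected, observed, strobe in cycles:
        if eara_compare(expected, observed, strobe):
            fail = True
    return fail


# A mismatch while the strobe is low is ignored; a strobed mismatch fails the core.
print(go_no_go([([1, 0], [0, 0], False), ([1, 1], [1, 1], True)]))   # False
print(go_no_go([([1, 0], [1, 1], True)]))                            # True
```

Gating by the strobe is what allows the comparators to run continuously while only the cycles that carry meaningful responses affect the result.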

Scheduling of the packets was trivial, as three test pins were used in the present implementation of NIMA. Using three test pins and assuming the same frequency for different layers of the test architecture implies that the rate of incoming data is always smaller than the rate at which data can be consumed at the cores. NIMA implementations with more test pins can use more sophisticated scheduling techniques. Finding novel scheduling algorithms for NIMA is the subject of ongoing research.

V. EXPERIMENTAL RESULTS

In this section, we analyze and report the area and the power overhead for the current implementation of switches in the network layer. Also, to provide an understanding of the area and the power overhead requirements for the current implementation of the NIMA application layer, we report these for a selection of modules of the ITC’02 benchmark circuits$^{3}$. Finally, in Section V-B, we present the test times for the cores of SoC3. Using these empirical values, we then develop a test-time model for the current implementation of NIMA and use this model to predict the test times for a selection of ITC’02 benchmark modules.

$^{3}$ITC’02 SoC test benchmarks are a set of circuits intended to help the research community objectively compare methods and tools for modular testing of core-based SoCs [39].

Fig. 18. P1500 BSR and adjusted EAS and EARA areas for modules of ITC’02 benchmarks.

TABLE III. EARA AREA AND POWER FOR ITC’02 SoC BENCHMARK MODULES

A. Area and Power Overhead

The area and power requirements for the current implementation of the switches in the network layer, obtained using Synopsys Design Compiler, are reported in Table I.

Table II presents area and power for components of EAS blocks for cores (modules) of the ITC’02 SoC benchmarks. In Table II, PI BSR, SI BSR, Buffer in, and FSM refer to the primary inputs BSR, scan input BSR, buffer input, and FSM blocks of the EAS blocks, respectively. The data for the EAS block excluding the mandatory components of the IEEE P1500 wrapper appears in the Adj EAS column. Here, for the different benchmarks under consideration, the EAS blocks amount to an area equal to approximately 350 to 450 two-input NAND gates.

Similarly, Table III reports area and power for components of EARA blocks for cores (modules) of the ITC’02 SoC benchmarks. In Table III, PO BSR, SO BSR, Buffer in, and FSM refer to the primary outputs BSR, scan output BSR, buffer input, and FSM blocks of the EARA blocks, respectively. Again, as with the EAS blocks, the data for the EARA blocks excluding the mandatory components of the IEEE P1500 wrapper is given in the Adj EARA column. Here, for the different benchmarks, the EARA blocks amount to an area equivalent to that of 250 to 300 two-input NAND gates. Thus, the total additional area for our NIMA application layer is minimal, as it amounts to the equivalent of about 600 to 800 two-input NAND gates. Adj EAS, Adj EARA, and IEEE P1500 BSR areas for the ITC’02 benchmark modules are plotted in Fig. 18.

TABLE IV. NIMA ACTUAL TEST TIME AND ITS TEST-TIME MODEL PREDICTION VALUES FOR CORES OF SoC3 (IN CLOCK CYCLES)

TABLE V. PREDICTED NIMA TEST TIME FOR ITC’02 SoC BENCHMARKS MODULES (IN CLOCK CYCLES)

B. Test-Time Overhead

In [30], we developed test-time models for the EAS and EARA components of DAST, as given in (1)–(3) below. These models were developed using empirical data and the facts that the op-codes used in the EAS and EARA blocks consist of three bits and that, typically, for each test pattern, both EAS and EARA sequence through several states. In (1)–(3), TP, SE, PI, PO, and SI denote the number of test patterns, the maximum number of flip-flops in the scan chain(s), the number of core input pins, the number of core output pins, and the number of scan inputs, respectively. Also, $\mathrm{TM1}_{\mathrm{DAST}}$, $\mathrm{TM2}_{\mathrm{DAST}}$, and $\mathrm{TM3}_{\mathrm{DAST}}$ are the test-time models for DAST in terms of clock cycles. These models can also be used to predict the required total data bits for EAS and EARA in NIMA (denoted as DDM in this paper). This holds true because the models are equal to the number of clock cycles needed to apply all the data bits and, hence, are equal to the total data bits for EAS and EARA (note that the sizes of the EAS and EARA data files are equal):

$$\mathrm{TM1}_{\mathrm{DAST}} = (9 + 2\,\mathrm{PI} + \mathrm{SI} \times \mathrm{SE}) + \mathrm{TP}\,(23 + 2\,\mathrm{PI} + \mathrm{SI} + \mathrm{SI} \times \mathrm{SE}), \quad \text{if and only if } \mathrm{PO} < \mathrm{PI} + 3 \tag{1}$$

$$\mathrm{TM2}_{\mathrm{DAST}} = (9 + 2\,\mathrm{PI} + \mathrm{SI} \times \mathrm{SE}) + \mathrm{TP}\,(20 + \mathrm{PI} + \mathrm{PO} + \mathrm{SI} + \mathrm{SI} \times \mathrm{SE}), \quad \text{if and only if } \mathrm{PI} + \mathrm{SI} + 6 > \mathrm{PO} \ge \mathrm{PI} + 3 \tag{2}$$

$$\mathrm{TM3}_{\mathrm{DAST}} = (9 + 2\,\mathrm{PI} + \mathrm{SI} \times \mathrm{SE}) + \mathrm{TP}\,(14 + 2\,\mathrm{PO} + \mathrm{SI} \times \mathrm{SE}), \quad \text{if and only if } \mathrm{PO} \ge \mathrm{PI} + \mathrm{SI} + 6. \tag{3}$$
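The three cases of the DAST model can be evaluated with a single function. The sketch below is a direct transcription of (1)–(3), taking the scan term as the product SI × SE; the function name `tm_dast` and the numeric inputs in the example are illustrative and are not benchmark data.

```python
def tm_dast(tp: int, se: int, pi: int, po: int, si: int) -> int:
    """DAST test time in clock cycles, selecting case (1), (2), or (3) by PO."""
    setup = 9 + 2 * pi + si * se                     # term common to all three cases
    if po < pi + 3:
        per_pattern = 23 + 2 * pi + si + si * se     # case (1)
    elif po < pi + si + 6:
        per_pattern = 20 + pi + po + si + si * se    # case (2)
    else:
        per_pattern = 14 + 2 * po + si * se          # case (3)
    return setup + tp * per_pattern


# Illustrative values only: 10 patterns, one scan chain of 20 flip-flops, 5 inputs.
print(tm_dast(tp=10, se=20, pi=5, po=4, si=1))    # 579  (case 1)
print(tm_dast(tp=10, se=20, pi=5, po=8, si=1))    # 579  (case 2, at the boundary)
print(tm_dast(tp=10, se=20, pi=5, po=12, si=1))   # 619  (case 3, at the boundary)
```

Under this reading, the per-pattern terms agree at PO = PI + 3 and at PO = PI + SI + 6, so the model is continuous across its case boundaries.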

In NIMA, all the data bits for the EAS and EARA blocks are encapsulated in the payloads. Hence, the total number of packets needed to accommodate all EAS and EARA data bits can be modeled as Packets such that

$$\mathrm{Packets} = \alpha\,\frac{\mathrm{DDM}}{D}, \quad \alpha > 1 \tag{4}$$

where DDM, $D$, and $\alpha$ denote the EAS and EARA data bits, the size of the payload in bits, and an empirical calibrating coefficient$^{4}$, respectively.

Finally, the test-time model for any core in the present implementation of NIMA is denoted by TMN and is given by the following expression:

$$\mathrm{TMN} = \mathrm{Packets} \times (22 + A) + \mathrm{DDM} \tag{5}$$

where $A$ denotes the number of address bits used for the core (e.g., using Fig. 12, $A = 4$ for the b10 cores with three and one scan chains and $A = 2$ for the b15 blocks). In (5), the first term is equal to the total header bits in all the packets, and DDM represents the total payload data for the core.
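To show how (4) and (5) combine, the sketch below evaluates the NIMA test-time model for a given number of data bits. It assumes that the calibrating coefficient multiplies DDM/D in (4); the function name `nima_test_time` and the DDM value in the example are illustrative and are not measured data.

```python
def nima_test_time(ddm: int, d: int, a: int, coeff: float, header_bits: int = 22) -> float:
    """Test-time model (5): header overhead of all packets plus the payload bits.

    Assumes (4) has the form Packets = coeff * DDM / D, with coeff > 1.
    """
    if d <= 0 or coeff <= 1:
        raise ValueError("D must be positive and the coefficient greater than 1")
    packets = coeff * ddm / d                   # modeled number of packets, per (4)
    return packets * (header_bits + a) + ddm    # per (5)


# Illustrative only: 100000 data bits, D = 1000, A = 2, coefficient 2.
tmn = nima_test_time(ddm=100_000, d=1000, a=2, coeff=2)
print(tmn)                        # 104800.0
print(100 * (tmn / 100_000 - 1))  # overhead in percent relative to DDM
```

With these inputs, the header bits add 4800 cycles to the 100000 payload cycles, which shows why larger payloads and shorter addresses reduce the relative overhead.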

In Table IV and for each core, we report the measured NIMA test time, $T_N$, for our implementation of SoC3 with $D = 1000$. Also, in Table IV, we have calculated TMN as given by (5), where the calibrating coefficient of (4) is set to 2 and $D = 1000$. Moreover, to compare the NIMA test time to that of a serial connection in conventional test architectures, we have reported the lower bound $T_S$ from [30]. Finally, the last column in Table IV reports the percentage error between our experimental NIMA test time, $T_N$, and the time model, TMN, as given in (5).

Table V reports the estimated NIMA test time based on the model captured by (5) for the ITC’02 SoC benchmark modules and the percentage overhead compared to conventional external test architectures. TMN in Table V is reported using the maximum value of $D = 1023$, assuming $A = 2$ for every core, and using the same value for the calibrating coefficient as used in Table IV. Note that in Table V, we have not included the bidirectional pins of a module as a separate category. Instead, we counted these pins in both the primary inputs and the primary outputs. From Table V, and assuming no control lines are required for the conventional serial connection, the increase in test time with NIMA is on average 6.1% when compared to a lower-bound serial connection in a conventional external tester approach.

VI. CONCLUSION

We categorized the present test architectures into three groups based on the connection scheme between the chip pins and the core terminals. We observed that, in current common test practices, one implicit underlying assumption is the existence of a physical link that enables a single tester to directly control a core’s DFT elements. While this assumption generally results in simple test protocols, we identified a number of problems with such a direct-access template: 1) reduced flexibility; 2) the need for close proximity between the CUT and the tester; 3) impracticality or high cost of in-field or remote-access testing; and 4) impracticality of using multiple testers to test an SoC. These problems were noted to be in addition to reduced modularity in the generic test architecture. Furthermore, as motivation, we relied on the International Technology Roadmap prediction that the system complexity in chips using 65-nm technology and beyond will necessitate a focus on communication rather than computation. Based on this prediction, we proposed that future chips will include a switching fabric coupled with network-oriented protocols for connecting cores together. We also proposed that new methodologies and test architectures will be needed to lower the test cost. These new methodologies are likely to require the reuse of the fabric for test purposes.

$^{4}$Due to their structure, the EAS and EARA data files may not necessarily divide into complete $D$-bit segments. In many cases, these segments, which represent the payloads, will be shorter than the maximum $D$ bits allowed. The calibrating coefficient in (4) is, hence, used to compensate for this fact.

The issues outlined above motivated us to propose the concept of NIMA for testing core-based SoCs. In NIMA, different testers can connect to a common switching fabric and send test data to the core(s) under test. We showed that the indirect methodology of NIMA breaks the coupling between the core, the TAM, and the tester. That is, NIMA decouples test-data processing and communication. In doing so, NIMA alleviates the previously outlined problems associated with tightly coupled test architectures. We developed NIMA such that test stimuli and expected results for cores are first compiled into new formats and then encapsulated into packets. These packets are subsequently augmented with control and address bits such that they are autonomously transmitted to their destination through a switching fabric. Owing to the indirect nature of the connection, embedded autonomous blocks at each core are used for applying the test pattern to a core and comparing the test results with the expected results. We considered the NIMA test architecture as a special communication network consisting of hardware and software that allow the transportation of the test stimuli and expected results from multiple sources to multiple cores. We predicted that regarding NIMA as a special communication network allows us to draw on the accumulated knowledge of the different communication networks developed for audio, video, and data. As one example, we suggested and developed a three-layered model for NIMA consisting of a physical layer, a network layer, and an application layer.

We also presented an implementation of NIMA on a simple SoC to validate its underlying concept. For each layer of NIMA’s architecture, we provided the detailed design parameters used in the implementation. Finally, we reported our experimental results for SoC3 and the ITC’02 benchmark circuits. We observed that, in general, switches in NIMA require an area equal to about 1250 two-input NAND gates, and that the application layer in NIMA adds an area equivalent to about 600 two-input NAND gates. In addition, we predicted an average increase in test time of about 6.1% when compared to a lower bound in conventional test architectures. Hence, assuming the same number of test pins for NIMA and the current test architectures, and given that the latter use some of those pins as control lines, the test time in NIMA will be lower than the test time in other test architectures.

In summary, NIMA provides the basis for embedded, cost-effective, scalable, modular, and flexible test design and programming with small area overhead. NIMA, by integrating the control mechanism in the packets, eliminates the need for control lines and, hence, requires fewer test pins or smaller test times when compared to current test architectures. Moreover, NIMA facilitates the remote access of an SoC by single or multiple testers. NIMA also enables the transmission of the test data to an SoC deployed in the field when it is desired to test and monitor a chip in its target system. Finally, and equally important, NIMA contributes toward the development of new test architectures that benefit from the reuse of an NoC interconnect template.

Future work should investigate other switching techniques, redundancy in routing, and out-of-order packet delivery in the network layer. For the application layer, the use of wider channels, dynamic change in the width of the channel, the presence of multiple clock domains within the core, and scheduling techniques are areas of interest for future work. Finally, investigations into drastically different techniques, such as wireless connections, in the physical layer and their impact on the overall performance of SoC testing constitute other envisaged future work.

REFERENCES

[1] The International Technology Roadmap for Semiconductors, Design,1999 Edition

[2] Y. Zorian, E. J. Marinissen, and S. Dey, “Testing embedded-core basedsystem chips,” in Proc. IEEE ITC, 1998, pp. 130–143.

[3] The International Technology Roadmap for Semiconductors, Design,2001 Edition


[4] IEEE P1500 Standard for Embedded Core Test (SECT) [Online]. Available: http://grouper.ieee.org/groups/1500

[5] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing,for Digital, Memory & Mixed-signal VLSI Circuits. Norwell, MA:Kluwer, 2000.

[6] I. Ghosh, S. Dey, and N. K. Jha, “A fast & low cost testing technique forcore-based system-on-chip,” in Proc. Design Automation Conf., 1998,pp. 542–547.

[7] I. Ghosh, N. K. Jha, and S. Dey, “A low overhead design for testability& test generation technique for core-based systems,” in Proc. IEEE ITC,1999, pp. 50–59.

[8] M. Nourani and C. Papachristou, “An ILP formulation to optimize testaccess mechanism in system-on-chip testing,” in Proc. IEEE ITC, 2000,pp. 902–1000.

[9] B. Mathewson. Core provider’s test experience. Presented at IEEE P1500 Working Group Meeting. [Online]. Available: http://grouper.ieee.org/groups/1500/pastmeetings.html#dac98

[10] V. Immaneni and S. Raman, “Direct access test scheme-design of blockand core cells for embedded ASICS,” in Proc. IEEE ITC, 1990, pp.488–492.

[11] N. A. Touba and B. Pouya, “Testing embedded cores using partial iso-lation rings,” in Proc. IEEE VLSI Test Symp., 1997, pp. 10–16.

[12] B. Pouya and N. A. Touba, “Modifying user-defined logic for test accessto embedded cores,” in Proc. IEEE ITC, 1997, pp. 60–68.

[13] L. Whetsel, “Core test connectivity communication & control,” in Proc.IEEE ITC, 1998, pp. 303–312.

[14] L. Whetsel, “An IEEE 1149.1 based test access architecture for IC’s with embedded cores,” in Proc. IEEE ITC, 1997, pp. 69–78.

[15] D. Bhattacharya, “Hierarchical test access architecture for embeddedcores in an integrated circuit,” in Proc. IEEE VLSI Test Symp., 1998,pp. 8–14.

[16] P. Varma and S. Bhatia, “A structured test Re-use methodology for core-based system chips,” in Proc. IEEE ITC, 1998, pp. 294–302.

[17] A. Benso et al., “HD2BIST: Architectural framework for BIST scheduling, data patterns delivering & diagnosis in SoCs,” in Proc. IEEE ITC, 2000, pp. 892–901.

[18] E. J. Marinissen et al., “A structured & scalable mechanism for test access to embedded reusable cores,” in Proc. IEEE ITC, 1998, pp. 284–293.

[19] M. Benabdenbi and W. Maroufi, “CAS-bus: A scalable and reconfigurable test access mechanism for systems on a chip,” in Proc. IEEE Design, Automation, Test Eur., 2000, pp. 141–145.

[20] L. Whetsel, “Addressable test ports, an approach to testing embedded cores,” in Proc. IEEE ITC, 1999, pp. 1055–1064.

[21] Z. S. Ebadi and A. Ivanov, “Time domain multiplexed TAM: Implementation and comparison,” Proc. Design, Automation Test Eur., pp. 732–737, 2003.

[22] P. Guerrier and A. Greiner, “A generic architecture for on-chip packet-switched interconnections,” in Proc. Design, Automation Test Eur., 2000, pp. 250–256.

[23] W. J. Dally and B. Towles, “Route packets, not wires: On-chip interconnection networks,” in Proc. Design Automation Conf., 2001, pp. 684–689.

[24] L. Benini and G. De Micheli, “Networks on chips: A new SoC paradigm,” IEEE Comput., vol. 1, pp. 70–78, Jan. 2002.

[25] P. P. Pande, C. Grecu, A. Ivanov, and R. Saleh, “Design of a switch for network on chip applications,” in Proc. IEEE Int. Symp. Circuits Syst., 2003, pp. 217–220.

[26] M. Nahvi and A. Ivanov, “A packet switching communication-based test access mechanism for system chips,” Proc. IEEE Eur. Test Workshop, pp. 81–86, 2001.

[27] E. Cota et al., “The impact of NoC reuse on the testing of core-based systems,” in Proc. IEEE VLSI Test Symp., 2003, pp. 128–133.

[28] E. H. Volkerink, A. Khoche, J. Rivoir, and K. D. Hilliges, “Modern test techniques: Tradeoffs, synergies, and scalable benefits,” J. Electron. Test.: Theory Applicat., vol. 19, pp. 125–135, 2003.

[29] M. Nahvi, A. Ivanov, and R. Saleh, “Dedicated autonomous scan-based testing (DAST) for embedded cores,” in Proc. IEEE ITC, 2002, pp. 1176–1183.

[30] M. Nahvi and A. Ivanov, “An embedded autonomous scan-based results analyzer (EARA) for SoC cores,” Proc. IEEE VLSI Test Symp., pp. 293–298, 2003.

[31] W. Stallings, Data & Computer Communications, 6th ed. Englewood Cliffs, NJ: Prentice-Hall.

[32] A. Chandra and K. Chakrabarty, “System-on-a-chip test-data compression and decompression architectures based on Golomb codes,” IEEE Trans. Computer-Aided Design, vol. 20, pp. 355–368, Mar. 2001.

[33] I. Ghosh, N. K. Jha, and S. Dey, “A low overhead design for testability and test generation technique for core-based systems-on-a-chip,” IEEE Trans. Computer-Aided Design, vol. 18, pp. 1661–1676, Nov. 1999.

[34] V. Iyengar and K. Chakrabarty, “System-on-a-chip test scheduling with precedence relationships, pre-emption, and power constraints,” IEEE Trans. Computer-Aided Design, vol. 21, pp. 1088–1094, Sept. 2002.

[35] E. Larsson and Z. Peng, “An integrated system-on-chip test framework,”in Proc. Design, Automation, Test Eur., 2001, pp. 139–144.

[36] Y. Huang et al., “Optimal core wrapper width selection and SOC test scheduling based on 3-D bin packing algorithm,” in Proc. ITC, 2002, pp. 74–82.

[37] R. M. Chou, K. K. Saluja, and V. D. Agrawal, “Scheduling tests for VLSI systems under power constraints,” IEEE Trans. VLSI Syst., vol. 5, pp. 175–185, June 1997.

[38] ITC99 Benchmarks. [Online]

[39] ITC’02 SoC Test Benchmarks. [Online]. Available: http://www.extra.research.philips.com/itc02socbenchm/

Scan Architecture With Mutually Exclusive Scan Segment Activation for Shift- and Capture-Power Reduction

Paul Rosinger, Bashir M. Al-Hashimi, and Nicola Nicolici

Abstract—Power dissipation during scan testing is becoming an important concern as design sizes and gate densities increase. While several approaches have recently been proposed for reducing power dissipation during the shift cycle (minimum-transition don’t care fill, special scan cells, and scan chain partitioning), limited work has been carried out toward reducing the peak power during test response capture, and the few existing approaches for reducing capture power rely on complex automatic test pattern generation (ATPG) algorithms. This paper proposes a scan architecture with mutually exclusive scan segment activation which overcomes the shortcomings of previous approaches. The proposed architecture achieves both shift- and capture-power reduction with no impact on the performance of the design, and with minimal impact on area and testing time (typically 2%–3%). An algorithmic procedure for assigning flip-flops to scan segments enables reuse of test patterns generated by standard ATPG tools. An implementation of the proposed method has been integrated into an automated design flow using commercial synthesis and simulation tools, which was used on a wide range of benchmark designs. Reductions of up to 57% in average power, and up to 44% and 34% in peak-power dissipation during shift and capture cycles, respectively, were obtained when using two scan segments. Increasing the number of scan segments to six leads to reductions of 96% in average power and 80% in the maximum number of simultaneous transitions.

Index Terms—Design for testability, low power, scan testing.

I. INTRODUCTION

Scan architectures represent an attractive solution for both built-in and external testing of digital integrated circuits (ICs). This is because

Manuscript received November 24, 2002; revised February 21, 2003 and June 4, 2003. The work of P. Rosinger and B. M. Al-Hashimi was supported by the Engineering and Physical Sciences Research Council (EPSRC) under Grant GR/S05557. This paper was recommended by Associate Editor K. Chakrabarty.

P. Rosinger and B. M. Al-Hashimi are with the Electronic Systems Design Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, U.K. (e-mail: [email protected]; [email protected]).

N. Nicolici is with the Computer-Aided Design and Test Research Group, Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCAD.2004.829797

0278-0070/04$20.00 © 2004 IEEE
