
1608 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 57, NO. 7, JULY 2010

On-Chip Support for NoC-Based SoC Debugging

Hyunbean Yi, Sungju Park, and Sandip Kundu, Fellow, IEEE

Abstract—This paper presents a design-for-debug (DfD) technique for network-on-chip (NoC)-based system-on-chips (SoCs). We present a test wrapper and a test and debug interface unit. They enable data transfer between a tester/debugger and a core-under-test (CUT) or core-under-debug (CUD) through the available NoC to facilitate test and debug. We also present novel core debug support logic that enables transaction- and scan-based debug operations. The basic operations supported by our scheme include event processing, stop/run/single-step, and selective storage of debug information such as current state, time, and debug event indication. This provides internal visibility into, and control over, core operations. Experimental results show that single and multiple stepping through transactions are feasible with moderately low area overhead.

Index Terms—Design-for-debug (DfD), design-for-testability (DfT), digital system testing, network-on-chip (NoC), system-on-chip (SoC).

I. INTRODUCTION

As the complexity of a system-on-chip (SoC) grows, the on-chip bus becomes a bottleneck for overall throughput, scalability, and communication among different clock domains. To overcome these limitations, the network-on-chip (NoC) has emerged as a new communication architecture [1]–[3]. Testing and debugging large and complex SoCs is difficult because there are many different clock domains and engineers have to devise ways to access and probe the core internals [4], [5]. Accordingly, efficient design-for-test (DfT) and design-for-debug (DfD) techniques are required.

To test such complex SoCs, modular testing is an efficient method for reducing test time [6], [7]. Modular testing relies on a test access mechanism (TAM), which enables the exchange of test data between the external pins of a chip and its embedded cores, and on test wrappers, which provide an interface between a TAM and the embedded cores. By applying predetermined test patterns supplied by IP core providers, test generation time and test data volume can be reduced [6]. Schemes such as Daisy-chain [8], TestRail [9], and TestBus [10] have been proposed as TAM architectures, along with the TAM chain [11] (see Section II), which are based on the IEEE 1500 wrapper [12].

Manuscript received April 04, 2009; revised June 10, 2009 and August 13, 2009; accepted September 23, 2009. Date of publication December 31, 2009; date of current version July 16, 2010. This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) under Grant KRF-2007-357-D00229. An earlier version of this paper was presented at the 17th Asian Test Symposium (ATS) 2008. This paper was recommended by Associate Editor V. De.

H. Yi and S. Kundu are with the Electrical and Computer Engineering Department, University of Massachusetts, Amherst, MA 01003 USA.

S. Park is with the Department of Electrical Engineering and Computer Science, Hanyang University, Ansan, Kyunggi-do 426-791, South Korea.

Digital Object Identifier 10.1109/TCSI.2009.2034887

In general, SoC debugging methods implement on- and off-chip debug support logic and optimize the test infrastructure. Accordingly, the area overhead of on-chip test and debug infrastructure is an issue in cost-sensitive SoC designs. This paper focuses on reusing the NoC as a test and debug data path for NoC-based SoC testing and debugging. Fig. 1 shows a basic SoC architecture for test and debug, with test wrappers and debug support logic around each core. The test and debug interface unit (TDIU) converts the tester or debugger interface into the functional protocol used between the NoC and the cores. The TDIU communicates with each test wrapper and debug support logic block using a few control signals, and enables the tester or debugger to apply and observe data via the functional interconnect.

In this paper, we design a test wrapper that uses the NoC as a TAM and present on-chip debug components that utilize the test infrastructure and the NoC for embedded core debug. By adopting the proposed techniques, both time-to-market and area overhead can be reduced.

The remainder of the paper is organized as follows. Section II reviews related work. Section III states the problems addressed. In Section IV, we give an overview of the proposed architecture and describe the core test wrapper and on-chip debug support components in detail. Experimental results are given in Section V, and we conclude the paper in Section VI.

II. RELATED WORK

There have been several published works on reusing functional interconnect, such as on-chip buses and NoCs, as a TAM. Feige et al. [14] introduced the Scan-Test Harness (STH) as a test wrapper and the Test Interface Controller (TIC) to perform structural test via the Advanced Microcontroller Bus Architecture (AMBA) bus. Lin and Liang [15] added a separate observation-dedicated TAM to perform scan-in and scan-out operations simultaneously. To reduce test time, Song et al. [16] used the Test Bus (TBUS) of the TIC as a path dedicated to test stimulus application, and modified the Extended Bus Interface (EBI) so that its address bus can serve as a path dedicated to test response observation. Marinissen et al. [11] presented a test wrapper for embedded core test and introduced an efficient configuration of wrapper cells and scan chains, named the TAM chain, as shown in Fig. 2. Amory et al. [17] presented a test wrapper for reusing an NoC as a TAM, also using the TAM chain, as shown in Fig. 3. For the case in which the core test bandwidth is higher than the Automated Test Equipment (ATE) test bandwidth, they reduced test length by enabling some wrapper cells at each end of a wrapper chain to perform parallel-to-serial/serial-to-parallel conversion. Hussin et al. [18] also used Amory's method [17] and added bandwidth-matching registers to the test wrapper to increase test bandwidth utilization.

1549-8328/$26.00 © 2010 IEEE

YI et al.: ON-CHIP SUPPORT FOR NoC-BASED SoC DEBUGGING 1609

Fig. 1. SoC Architecture for Test & Debug.

Fig. 2. TAM chain.

Fig. 3. TAM chain with parallel-to-serial/serial-to-parallel conversion functions.

B. Vermeulen et al. [19]–[21] introduced a serial debug infrastructure based on the IEEE 1149.1 TAP and scan chains. They proposed debug control logic, such as a debug breakpoint detector and clock control logic, to stop and run one or more CUDs. For on-chip bus-based SoC debug, ARM provides the Embedded Trace Macrocell (ETM), which can trace an ARM processor core by monitoring the AMBA bus [22]. As multicore debug solutions for AMBA-based SoCs, ARM presented CoreSight [23] and First Silicon presented Multicore Embedded Debug (MED) [24]. CoreSight and MED use ETM and OCI (On-Chip Instrumentation), respectively, as core debug support modules and probe the AMBA bus directly. Debug control is basically conducted through IEEE 1149.1 (JTAG).

In an NoC-based SoC, multiple transactions are performed at the same time and cores operate with different clocks. Accordingly, different strategies are required to monitor, stop, and run cores for debug. C. Ciordas et al. [25] introduced a generic NoC monitoring service concept based on event-based, bit-level monitoring. However, monitoring at a higher level of abstraction should be considered, because the volume of data monitored at bit level is relatively large and IP cores in an NoC-based SoC communicate with packets [26], [27]. C. Ciordas et al. [26] proposed a transaction monitor for Æthereal NoC-based SoCs, and K. Goossens et al. [28] conceptually presented a general debug architecture for NoC-based SoCs and implemented message- and transaction-based debug. By attaching monitors and stop modules to the routers as well as to the IP cores, they defined various debug scopes, such as SoC, IP, and NoC [28]. To reduce the routing overhead of debug control and data signals, S. Tang and Q. Xu [27] reused the NoC as the trace data path. However, more routers have to be added to the NoC, because each core tracer needs its own NI (network interface) so that the NoC can accommodate the debug traffic.

III. PROBLEM STATEMENT

During a debug phase, when a monitor generates a certain event, a debug engineer or an on-chip debug component can stop all or part of the SoC, and the engineer reads out the contents of the scan chains. As the need arises, the engineer can apply specific functional patterns or state values and compare the results with a golden reference. To resume operation, the contents read out have to be restored into the scan chains. If special flip-flops for saving states, such as hold-scan cells [29], [30] and swap registers [31], are implemented in the cores, the time to resume operation can be reduced. For more efficient NoC-based SoC debugging, this paper addresses three problems:

Problem 1. Most existing debug infrastructures use the IEEE 1149.1 serial path for scan dump. Therefore, scan dump takes an excessively long time for large SoCs.

Problem 2. To reduce debugging time, internal data tracing buffers and debug-dedicated parallel paths can be added, but the area overhead of the additional buffers and wires becomes too great.

Problem 3. No matter how quickly a debug controller tries to stop the SoC after an event occurs, it cannot stop the SoC instantaneously, because there is a signal propagation delay between the time the debug controller detects the event and the time it generates a clock gating signal. In addition, even though it is predictable, the SoC stops only after several cycles due to latency in clock distribution.

To solve the above problems, we deploy a transaction-based [26], [28] and scan-based [19], [32] debug strategy. Transactions are the read and write actions used for communication among cores. A transaction is initiated by a master (initiator), executed by a slave (target), and completed by the master [26], [28]. Therefore, the initiation and completion of transactions can be analyzed by probing the incoming and outgoing signals or packets of the masters. Once an event occurs, the core debug components prevent all masters from initiating new transactions and do not stop the cores until all outstanding transactions are completed. Instead of stopping the cores as soon as the event occurs, the SoC records information such as transaction counts, timer values, and the core that generated the event. After all cores stop, a debug engineer debugs the embedded cores by reading out this information and the scan values. Note that in this strategy the scan dump can be done through the NoC rather than through IEEE 1149.1: since all outstanding transactions have been completed by the time all cores are stopped, the routers in the NoC are empty, and the scan dump can therefore be performed through the NoC.

In this way, we solve the first and second problems by applying scan-based debug without data tracing and by reusing the test wrapper and the NoC for debug. The third problem is addressed by stopping cores at the transaction level rather than at a specific clock cycle, and by recording the timing information.

IV. DESIGN-FOR-TEST AND -DEBUG

A. Architecture Overview

Fig. 4 shows an overview of the proposed debug architecture for an NoC-based SoC. The TDIU includes test/debug control logic, an IEEE 1149.1 TAP, and NoC protocol interface logic. Test/debug control, such as selecting CUTs/CUDs, stopping them, setting breakpoints, and checking core control information, is conducted through the data control bus and IEEE 1149.1. Test or debug data application and observation are performed through the high-speed parallel NoC data path. For core debugging, core debug supporters (CDSs) are attached to the cores. There are two types of CDS: the master debug supporter (MDS) for master cores and the slave debug supporter (SDS) for slave cores. Each CDS operates with the functional clock of its core. We follow the basic debug procedure shown in Fig. 5. Once an SoC enters debug mode, a debug engineer sets error conditions or breakpoints in each MDS and SDS according to the functions of the core. Run, stop, scan dump, and resume operations are then applied. After the scan dump, the breakpoints may be readjusted and internal states may be changed before the system resumes operation. Additionally, the MDS may allow repeated application of tests in a loop. The remainder of this section describes the proposed components in detail.
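The procedure of Fig. 5 can be read as a small loop of phases. The following Python fragment is a minimal sketch of that reading, not part of the proposed hardware: the phase names (`Phase`, `NEXT`, `is_valid_session`) are hypothetical, and the transition table simply encodes the order stated above (breakpoints are set, the SoC runs, stops, is scan-dumped, optionally adjusted, and resumes).

```python
from enum import Enum, auto


class Phase(Enum):
    """Phases of the scan-based debug loop (illustrative names)."""
    SET_BREAKPOINTS = auto()
    RUN = auto()
    STOP = auto()
    SCAN_DUMP = auto()
    ADJUST = auto()   # optional: readjust breakpoints or change internal states
    RESUME = auto()


# Allowed successors of each phase, as read from the procedure above.
NEXT = {
    Phase.SET_BREAKPOINTS: {Phase.RUN},
    Phase.RUN: {Phase.STOP},
    Phase.STOP: {Phase.SCAN_DUMP},
    Phase.SCAN_DUMP: {Phase.ADJUST, Phase.RESUME},
    Phase.ADJUST: {Phase.RESUME},
    Phase.RESUME: {Phase.RUN},
}


def is_valid_session(phases):
    """True if every consecutive pair of phases is an allowed transition."""
    return all(b in NEXT[a] for a, b in zip(phases, phases[1:]))


session = [Phase.SET_BREAKPOINTS, Phase.RUN, Phase.STOP, Phase.SCAN_DUMP,
           Phase.ADJUST, Phase.RESUME, Phase.RUN, Phase.STOP, Phase.SCAN_DUMP]
print(is_valid_session(session))                       # True
print(is_valid_session([Phase.RUN, Phase.SCAN_DUMP]))  # False: must stop first
```

Encoding the loop as a transition table makes explicit that a scan dump is only meaningful after a stop, which is the property the transaction-level stopping scheme is designed to guarantee.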

A CDS monitors the signals between an NI and its core, distributes an event signal, and generates debug control signals. Once an event occurs, the following core-stopping process is performed to empty all routers and stop all cores.

Step 1) The event is distributed to the TDIU and all CDSs. Each CDS stores its current transaction count and timer value in a debug information register as soon as the event is detected. The CDS that generated the event sets a special flag, called the “event flag,” in its debug information register to ‘1’.

Step 2) Each MDS prevents its master core from initiating a new transaction and waits for all outstanding transactions issued by its master core to be completed.

Step 3) When all its outstanding transactions are completed, the MDS stops its master core, records its current transaction count and timer value, and informs the TDIU that its master core is stopped.

Step 4) When all master cores are stopped, the TDIU sends a control signal that enables each SDS to stop its slave core.

Step 5) On receiving the control signal from the TDIU, each SDS stops its slave core and records its current transaction count and timer value.

Step 6) The TDIU informs the external debugger that all cores are stopped.
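The six steps can be illustrated with a small behavioral model. The sketch below is an assumption-laden simplification, not the paper's RTL: all CDSs share one cycle count (the real CDSs run on separate core clocks), each master's outstanding transactions are represented only by the number of cycles they still need, and the class and function names are hypothetical. Each record is a tuple (event flag F, TRCnt, Time).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int, int]  # (event flag F, TRCnt, Time)


@dataclass
class CDS:
    """Hypothetical model of one core debug supporter (MDS or SDS)."""
    name: str
    is_master: bool
    tr_count: int = 0
    timer: int = 0
    remaining: List[int] = field(default_factory=list)  # cycles left per outstanding transaction
    stopped: bool = False
    event_point: Optional[Point] = None
    stop_point: Optional[Point] = None


def core_stopping_process(cdss: List[CDS], event_source: str) -> List[str]:
    """Steps 1-6 on one shared cycle count (a simplification)."""
    masters = [c for c in cdss if c.is_master]
    slaves = [c for c in cdss if not c.is_master]
    # Step 1: every CDS records its event point; only the source sets F = 1.
    for c in cdss:
        c.event_point = (1 if c.name == event_source else 0, c.tr_count, c.timer)
    stop_order = []
    while True:
        # Steps 2-3: masters are blocked (tr_count no longer grows); a master
        # stops once all of its outstanding transactions have completed.
        for m in masters:
            if not m.stopped and not m.remaining:
                m.stopped = True
                m.stop_point = (0, m.tr_count, m.timer)
                stop_order.append(m.name)
        if all(m.stopped for m in masters):
            break
        # Advance one cycle: running cores tick, in-flight transactions progress.
        for c in cdss:
            if not c.stopped:
                c.timer += 1
        for m in masters:
            m.remaining = [r - 1 for r in m.remaining if r > 1]
    # Steps 4-5: the TDIU lets every SDS stop its slave core.
    for s in slaves:
        s.stopped = True
        s.stop_point = (0, s.tr_count, s.timer)
        stop_order.append(s.name)
    # Step 6: the TDIU would now report "all cores stopped" to the debugger.
    return stop_order


cores = [CDS("M1", True, tr_count=7, timer=100, remaining=[2]),
         CDS("M2", True, tr_count=4, timer=100, remaining=[5, 3]),
         CDS("S1", False, tr_count=6, timer=100),
         CDS("S2", False, tr_count=5, timer=100)]
print(core_stopping_process(cores, event_source="S2"))  # ['M1', 'M2', 'S1', 'S2']
```

In this toy run the event comes from slave S2, so only S2 records F = 1. M1 drains its single transaction first and stops at Time = 102, M2 stops at Time = 105, and only then are both slaves stopped, mirroring the master-before-slave order of Steps 3 to 5.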

After the core-stopping process, all packets have been consumed and all cores are stopped. A debug engineer then reads out the debug information, selects the cores to be debugged using the TAP controller, and dumps scan contents or applies and observes debug data via the NoC. For all cores, or for selected cores that include one or more associated master cores, single-transaction-step debugging can be performed by stopping and running the cores on a transaction basis. Single-transaction-step debugging is explained in more detail in Section IV-D.

B. Core Test Wrapper

To enable test data to be transferred via the NoC, the TDIU (see Fig. 4) acts as a master, and the test wrappers of the IP cores must operate as slaves. For a core with internal scan chains, a test wrapper based on the IEEE 1500 wrapper can be depicted as shown in Fig. 6. In test mode, the Test/Debug Control & Protocol Interface (TDCPI) is enabled, and a w-bit-wide TAM chain (see Fig. 2), where w is the functional protocol data width, is configured by setting the Config MUXs.

Fig. 4. Architecture for Test & Debug.

Fig. 5. Scan-based debug procedure.

The test wrapper shown in Fig. 3 can reduce test length. However, because of its parallel-to-serial and serial-to-parallel conversion functions, the wrapper cell control logic becomes complex, and the ATE test bandwidth is utilized effectively only when the core test bandwidth is higher than the ATE test bandwidth. Therefore, in this paper, we implement the TAM chain operation shown in Fig. 2. Test patterns are loaded and test responses are unloaded by shifting them in w-bit parallel words, without parallel-to-serial or serial-to-parallel conversion.

Fig. 7 shows the TAM chain configuration used in this paper. The wrapper instruction register (WIR) and the wrapper bypass register (WBY), which are control components, are included in the TDCPI. To utilize the NoC data path as a TAM during test or debug scan shift operations, the test wrapper is configured so that the data inputs and data outputs become the inputs and outputs of the scan chains, respectively. The bold solid lines and arrows form the test data shift path in test mode. During shift operations, test data come in through the functional data input terminals and go out through the functional data output terminals. The data input and output wrapper cells must be slightly modified for TAM chain operation. To perform shift and capture operations with this TAM chain configuration, scan test control logic such as shift counters, clock gating logic, and scan enable generation logic is added to the TDCPI.
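The w-bit parallel shift described above can be sketched in a few lines. The model below is an illustration under simplifying assumptions: the w TAM chains are balanced (equal length), bit j of each NoC data word feeds chain j, and index 0 of a chain is its head. Capture cycles and the wrapper-cell modifications are not modeled, and the function names are hypothetical.

```python
from typing import List


def pattern_to_words(chains: List[List[int]]) -> List[List[int]]:
    """Turn w balanced scan-chain images (index 0 = chain head) into the
    sequence of w-bit words to shift in, one word per test clock."""
    length = len(chains[0])
    if any(len(c) != length for c in chains):
        raise ValueError("TAM chains must be balanced (equal length)")
    # The bit destined for the tail (index length - 1) must enter first.
    return [[c[length - 1 - k] for c in chains] for k in range(length)]


def shift_in(words: List[List[int]], length: int) -> List[List[int]]:
    """Shift w-bit words into w parallel chains: each clock, bit j of the
    word enters chain j at its head and every bit moves one place toward the tail."""
    w = len(words[0])
    chains = [[0] * length for _ in range(w)]
    for word in words:
        for j, bit in enumerate(word):
            chains[j] = [bit] + chains[j][:-1]
    return chains


target = [[1, 0, 1], [0, 0, 1]]      # w = 2 chains of length 3
words = pattern_to_words(target)
print(words)                         # [[1, 1], [0, 0], [1, 0]]
print(shift_in(words, 3) == target)  # True: 3 clocks, no serial conversion
```

Loading a pattern takes as many test clocks as the length of the longest TAM chain, which is why balancing the chains across the w data bits matters for test time.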

C. Debug Components

Fig. 8 and Fig. 9 show the proposed CDS and TDIU, respectively. Each CDS operates with its core clock, and the TDIU operates with a tester clock or a debugger clock; in other words, they operate fully asynchronously. The debug components could communicate with each other through a handshaking protocol. However, to minimize the number of wires, we use a simpler control-signal interface in which a signal, once asserted, is not de-asserted until the debug process is over.
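One way to see why a level signal that stays asserted is sufficient across unrelated clocks is to compare it with a short pulse. The timing model below is a hypothetical sketch: it assumes each receiving domain samples on rising edges at multiples of its clock period through a two-flip-flop synchronizer (a common practice; the paper does not specify the synchronizer), and it ignores metastability resolution time.

```python
import math


def first_edge_after(t_ns: float, clk_period_ns: float) -> float:
    """Earliest rising edge (at multiples of the period) strictly after t_ns."""
    return (math.floor(t_ns / clk_period_ns) + 1) * clk_period_ns


def level_seen_at(assert_ns: float, clk_period_ns: float, stages: int = 2) -> float:
    """When a destination domain sees a sticky (never de-asserted) signal
    through a `stages`-flip-flop synchronizer: first capturing edge plus one
    period per extra stage. Metastability resolution is not modeled."""
    return first_edge_after(assert_ns, clk_period_ns) + (stages - 1) * clk_period_ns


def pulse_captured(start_ns: float, width_ns: float, clk_period_ns: float) -> bool:
    """True if a pulse [start, start + width) contains a capturing edge."""
    return first_edge_after(start_ns, clk_period_ns) < start_ns + width_ns


# Signal asserted at t = 3 ns, seen by a 500 MHz (2 ns) and a 125 MHz (8 ns) domain.
print(level_seen_at(3.0, 2.0), level_seen_at(3.0, 8.0))  # 6.0 16.0
# A 2 ns pulse would be missed entirely by the 125 MHz domain.
print(pulse_captured(3.0, 2.0, 8.0))                      # False
```

Under these assumptions, a sticky level is always observed, only with a domain-dependent latency of one to two receiver cycles. That is why the interface can avoid a request/acknowledge handshake at the cost of requiring an explicit reset at the end of the debug process.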

Fig. 6. Test wrapper architecture with protocol interface.

The main components and their functions are described as follows:

• Clock gating cell (CGC) and Clock Multiplexer: The CGC [33], [34] and the Clock Multiplexer [35] are used for glitch-free clock gating and switching.

• Event & Trans. Detector: Generates events based on breakpoints programmed through the TAP controller and IEEE 1500. The Event & Trans. Detector in each MDS monitors the initiation and completion of transactions and sets TR_sig to ‘1’ when all of its outstanding transactions issued before an event (from itself or from another CDS) have been completed. The Event & Trans. Detector in each SDS simply counts the number of transactions.

• Transaction Counter: Increments by one whenever a new transaction starts. The counter in each MDS increments every time its master core initiates a new transaction, and the counter in each SDS increments every time its slave core receives a new transaction request packet. Accordingly, different CDSs have different counter values.

• Timer: Counts core clock cycles and stops when the core stops. Because each core operates with a different clock, the timer values also differ among CDSs.

• Transaction & Core Stopper: When an event occurs, requests all MDSs and the TDIU to enter the core-stopping process through TR_stop_req_inout. When TR_stop_req_inout goes high, the Transaction & Core Stopper in each MDS prevents its master core from initiating a new transaction and waits until all outstanding transactions initiated by the master core are completed. The Transaction & Core Stopper then has the TDCPI stop the core and configure the TAM chain, informs the TDIU of the completion of all its transactions, and stores debug information. When all masters have completed their outstanding transactions and all TR_completed signals are asserted, the TDIU enables each SDS to stop its slave core and informs the debugger that all cores are stopped.

• Debug Information Register & Setter: The Debug Information Register consists of the Event Point Register and the Stop Point Register, which are the key idea of this paper. The current timer value (Time) and the current transaction counter value (TRCnt) are stored in the Event Point Register immediately when an event occurs. If the event is generated by the CDS itself, the event flag (F) is set to ‘1’; if the event is generated by another CDS, F is set to ‘0’. When the core stops, the current timer value and the transaction counter value for the latest issued transaction are stored in the Stop Point Register. The event flag (F) in the Stop Point Register is always set to ‘0’.
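The MDS-side bookkeeping in the list above (transaction counting, blocking on a stop request, and raising TR_sig once earlier transactions have drained) can be modeled as follows. This is a minimal sketch under assumptions: transactions are identified by hypothetical integer IDs, timing is ignored, and the class and method names are illustrative rather than taken from the design.

```python
class MasterEventTransDetector:
    """Hypothetical model of an MDS's Event & Trans. Detector and Transaction Counter."""

    def __init__(self) -> None:
        self.tr_cnt = 0           # Transaction Counter: transactions initiated so far
        self.outstanding = set()  # IDs of initiated, not yet completed transactions
        self.tr_block = False     # set once a stop request has been seen

    def initiate(self, tr_id: int) -> bool:
        """Master core tries to start a transaction; refused while blocked."""
        if self.tr_block:
            return False
        self.tr_cnt += 1
        self.outstanding.add(tr_id)
        return True

    def complete(self, tr_id: int) -> None:
        """Response for a transaction has come back to the master."""
        self.outstanding.discard(tr_id)

    def on_stop_request(self) -> None:
        """TR_stop_req_inout asserted: no new initiations from now on."""
        self.tr_block = True

    @property
    def tr_sig(self) -> bool:
        """True once every transaction issued before the event has completed.

        Because initiations are blocked after the event, the remaining
        outstanding set is exactly what was pending at the event.
        """
        return self.tr_block and not self.outstanding


d = MasterEventTransDetector()
d.initiate(1)
d.initiate(2)
d.complete(1)
d.on_stop_request()
print(d.tr_sig, d.initiate(3), d.tr_cnt)  # False False 2
d.complete(2)
print(d.tr_sig)                           # True
```

Because new initiations are refused after the stop request, TR_sig only has to check that the outstanding set is empty. The rejected third transaction also leaves TRCnt unchanged, which keeps the Stop Point value tied to the latest transaction actually issued.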

The TDIU consists of the Master Under Debug (MUD) Selector, the Tester and Debugger (T/D) Interface, the Master Protocol Interface, and the TAP Controller. When an event occurs, the MUD Selector checks the transaction completion status of some or all selected MDSs. When all the transactions are completed, it stops all slave cores through T/D_mode. In test or debug mode, the T/D Interface block and the Master Protocol Interface block are required for data transmission between an external tester or debugger and the on-chip NoC. If sufficient pins for testing and debugging are not provided, some serial-to-parallel/parallel-to-serial conversion is required for programming the MUD Selector and for sending and receiving signals and data. The TAP controller is used to read the register values in the CDSs and to select the cores to be debugged. To control IEEE 1500, some simple TAP-to-IEEE 1500 control signal mapping logic is needed [12], [34].

Fig. 7. TAM chain configuration.

D. Single Transaction Step Debugging

Our single-transaction-step debug (STSD) is performed by running and stopping master cores on a transaction basis, as shown in Fig. 10. The single_TR_step and TR_stop_req_inout signals of the TDIU, which are directly connected to a debugger, are used for the STSD. Once the debugger sets single_TR_step high, the Event & Trans. Detectors and the Transaction & Core Stoppers in the MDSs block transactions and enter STSD mode. The STSD is then controlled by the TR_stop_req signal. When TR_stop_req goes high, TR_block in each MDS goes low to allow a new transaction to be initiated, and each master is stopped after one transaction. After all masters are stopped, the slaves are stopped and the masters' transactions are unblocked. The TDIU then sets the debug ready signal, debug_rdy, high so that the debug engineer can perform the next steps, such as scan (or I/O) dump, debug data application and observation, and resuming normal operation. To repeat the STSD, the debug components must be reset (debug_reset). An STSD ends when the debug operations are finished. By de-asserting TR_stop_req after an STSD completes, the debugger makes the cores ready to start another STSD. When two or more masters and one or more slaves are selected for the STSD, each master executes one transaction, but a slave can execute two or more transactions. Therefore, to perform the STSD of a slave, an appropriate master-slave pair has to be selected.
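The last point, that a slave may advance by several transactions in one step unless a suitable master-slave pair is selected, can be made concrete with a counting model. The sketch below is hypothetical: each master's upcoming transactions are given as a list of target slave names, and each step lets every selected master issue exactly one of them. Handshake signals and timing are abstracted away.

```python
from collections import Counter
from typing import Dict, List


def run_stsd(queues: Dict[str, List[str]], selected: List[str], steps: int) -> Counter:
    """Apply `steps` single-transaction steps (hypothetical model).

    queues maps each master to the slaves addressed by its upcoming
    transactions. In each step every selected master is unblocked for exactly
    one transaction and then stopped, so a slave can receive several
    transactions per step if several selected masters address it.
    Returns how many transactions each slave executed.
    """
    pending = {m: list(q) for m, q in queues.items()}  # do not mutate caller's lists
    per_slave = Counter()
    for _ in range(steps):
        for m in selected:
            if pending[m]:
                per_slave[pending[m].pop(0)] += 1
    return per_slave


queues = {"M1": ["S1", "S2"], "M2": ["S1", "S1"]}
print(run_stsd(queues, ["M1", "M2"], steps=1))  # Counter({'S1': 2})
print(run_stsd(queues, ["M1"], steps=1))        # Counter({'S1': 1})
```

With both masters selected, S1 executes two transactions in a single step. Selecting only the M1-S1 pair restores one transaction per step for that slave.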

E. Interrupt for Debugging

Long pattern sequences cannot be simulated, so their golden reference values are not known. As an alternative debug method, intermediate values can be considered: the debugger need not wait for an event before stopping the cores and tracing through internal states. Instead, all cores are stopped at periodic intervals and their internal states are gathered. This approach is based on the Periodic System Management Interrupt (PSMI) originally described by I. Silas et al. [36], who uncovered 100% of logic issues by using PSMI failure reproduction. These periodic stops allow state dump, synchronization between the chip's behavior and RTL simulation, and state restore to resume normal operation. Through the synchronization process, the waveform of the golden reference is matched with the current state. With our debug components, an interrupt by the debugger and a debug cycle can easily be performed by asserting the debug_enable and TR_stop_req signals, as shown in Fig. 11. By repeating this debug cycle periodically, PSMI failure reproduction is performed.
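The benefit of periodic synchronization is that any divergence is localized to a single interval. The sketch below illustrates this with a deliberately toy setup that is not part of the paper: the "state" is a single integer, `chip_step` and `ref_step` are hypothetical one-cycle next-state functions standing in for silicon and RTL simulation, and the reference model is re-seeded from the dumped chip state at every stop.

```python
from typing import Callable, Optional


def psmi_replay(chip_step: Callable[[int], int], ref_step: Callable[[int], int],
                state: int, interval: int, n_intervals: int) -> Optional[int]:
    """Hypothetical PSMI-style loop: run `interval` cycles, stop, dump the chip
    state, compare against the reference model started from the same state,
    then restore and resume. Returns the first divergent interval, or None."""
    for k in range(n_intervals):
        chip = ref = state                      # synchronize RTL model to dumped state
        for _ in range(interval):
            chip, ref = chip_step(chip), ref_step(ref)
        if chip != ref:                         # divergence inside interval k
            return k
        state = chip                            # restore state and resume
    return None


def ref_model(s: int) -> int:
    return s + 1


def buggy_chip(s: int) -> int:
    # Hypothetical silicon bug that only shows up at state 25.
    return s + 2 if s == 25 else s + 1


print(psmi_replay(buggy_chip, ref_model, state=0, interval=10, n_intervals=5))  # 2
print(psmi_replay(ref_model, ref_model, state=0, interval=10, n_intervals=5))   # None
```

Only the ten cycles of interval 2 need to be re-simulated to study the failure, rather than the whole pattern sequence from reset.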

Each debug support logic block operates with its core's functional clock, detects the initiation and completion of transactions, and records the times and transaction counts at which an event occurred and at which its core stopped. This provides debug support for multiple clock domains.

V. EXPERIMENTAL RESULTS

A. Scan Dump Time

Goel and Marinissen [37] presented a lower bound on the core test time when a multiple-width TAM is used. Assume that there are no bidirectional terminals. Then, for a core test, i + s stimulus bits must be loaded, where i is the number of primary inputs and s is the number of scan flip-flops, and o + s test response bits must be unloaded, where o is the number of primary outputs. The shift-out of the responses of one test pattern and the shift-in of the stimuli of the next test pattern are overlapped in time. Therefore, if w balanced TAM chains are configured for the TAM width w, a lower bound on the test time of a core, in tester clock cycles, is

$$T_{LB} = \left(1 + \max\left(\left\lceil \frac{i+s}{w} \right\rceil, \left\lceil \frac{o+s}{w} \right\rceil\right)\right) p + \min\left(\left\lceil \frac{i+s}{w} \right\rceil, \left\lceil \frac{o+s}{w} \right\rceil\right) \tag{1}$$

where p is the number of test patterns, which equals the number of captures. As described in Section IV, this paper uses a w-bit-wide TAM chain, where w is the functional protocol data width. Therefore, if we assume that 2w-bit test data channels (w for data inputs and w for data outputs) between a tester and the TDIU are available and the NoC provides guaranteed bandwidth and latency, a lower bound on the core test time in an NoC-based SoC is

(2)

where the delay term accounts for the packet transmission delay between the TDIU and the core-under-test (CUT).

Fig. 8. Core Debug Supporter (CDS).
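Equation (1) is easy to evaluate numerically. The helper below is a direct transcription of (1) into integer arithmetic; the core parameters in the usage lines (100 inputs, 60 outputs, 2000 scan flip-flops, 500 patterns) are hypothetical values chosen only to show how the bound scales with the TAM width w.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def lower_bound_cycles(i: int, o: int, s: int, p: int, w: int) -> int:
    """Lower bound (1) on core test time, in tester clock cycles.

    i, o: primary inputs / outputs; s: scan flip-flops; p: test patterns;
    w: number of balanced TAM chains (the TAM width).
    """
    si = ceil_div(i + s, w)   # scan-in length of the longest chain
    so = ceil_div(o + s, w)   # scan-out length of the longest chain
    return (1 + max(si, so)) * p + min(si, so)


# Hypothetical core: 100 inputs, 60 outputs, 2000 scan flip-flops, 500 patterns.
print(lower_bound_cycles(100, 60, 2000, 500, w=1))   # 1052560 (single serial chain)
print(lower_bound_cycles(100, 60, 2000, 500, w=32))  # 33565 (32-bit TAM chain)
```

For this example, widening the TAM chain from 1 to 32 bits reduces the bound by roughly a factor of w, because the shift length per pattern dominates the single capture cycle.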

To compare the debug speed with that of the IEEE 1149.1 TAP, assume that a 1-bit input data channel and a 1-bit output data channel between a debugger and the TDIU are available. The scan debug data are transferred through the w-bit data path to the TDIU, where a parallel-to-serial converter converts the w-bit parallel debug data into 1-bit streams. The scan dump time for a core then becomes

(3)

When the IEEE 1149.1 TAP is used, the scan dump time for a core, counted in clock cycles, is smaller than (3). However, in absolute time the debug data shift via the IEEE 1149.1 TAP is still much slower than via the NoC, because the proposed TDIU and test wrapper can operate with a high-frequency test/debug clock provided by a tester or debugger, whereas the IEEE 1149.1 TAP generally operates at around 10 MHz [28], [38].
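The argument above turns on clock frequency rather than cycle count. The following arithmetic sketch makes that explicit. All numbers are hypothetical: a 100,000-bit scan dump, a TAP clock of 10 MHz (the figure quoted above), a 200 MHz debug clock for the NoC path, and 100 extra cycles standing in for transfer overhead. The helper names are illustrative.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def serial_dump_us(bits: int, clock_mhz: float, extra_cycles: int = 0) -> float:
    """1-bit serial dump: one bit per clock plus fixed overhead cycles, in microseconds."""
    return (bits + extra_cycles) / clock_mhz


def parallel_dump_us(bits: int, w: int, clock_mhz: float, extra_cycles: int = 0) -> float:
    """w-bit parallel dump: one w-bit word per clock plus overhead, in microseconds."""
    return (ceil_div(bits, w) + extra_cycles) / clock_mhz


SCAN_BITS = 100_000  # hypothetical core
print(serial_dump_us(SCAN_BITS, 10))                     # 10000.0 (TAP at ~10 MHz)
print(serial_dump_us(SCAN_BITS, 200, extra_cycles=100))  # 500.5 (NoC + 1-bit channel)
print(parallel_dump_us(SCAN_BITS, 32, 200))              # 15.625 (32-bit output channel)
```

Even though the NoC path spends slightly more cycles than the TAP, the higher clock makes it about 20 times faster in this example, and a wider debugger channel would shorten the dump further.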

B. Simulation

Fig. 12 shows a simulation result of the core-stopping process performed by the debug components. Masters 1 and 2 run at 500 MHz, and Slaves 1 and 2 run at 250 MHz and 125 MHz, respectively. As soon as an event is generated in Slave 2, its SDS informs the other CDSs and the TDIU of the event through TR_stop_req_bus, and each MDS blocks new transactions. In this simulation, all outstanding transactions of Master 1 are completed first, so Master 1 stops first. When all outstanding transactions of Master 2 are completed, the TDIU stops all slaves using T/D_mode, because all the outstanding transactions are completed. Even though each core is stopped by a gated core clock (Core_CLK_Gated), its CDS and TDCPI keep running with Core_CLK.
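Because each Timer counts cycles of its own core clock, the raw values recorded in different CDSs are not directly comparable. The sketch below converts them into a common time base using the frequencies from this simulation. It assumes, hypothetically, that all timers were reset at the same instant, and the timer values themselves are invented for illustration.

```python
# Clock frequencies from the simulation setup above.
CLOCK_MHZ = {"Master 1": 500, "Master 2": 500, "Slave 1": 250, "Slave 2": 125}


def timer_to_ns(core: str, timer: int) -> float:
    """Convert a per-core Timer value (core clock cycles) into nanoseconds."""
    return timer * 1000.0 / CLOCK_MHZ[core]


def stop_latency_ns(core: str, event_time: int, stop_time: int) -> float:
    """Time between a core's Event Point and Stop Point records."""
    return timer_to_ns(core, stop_time) - timer_to_ns(core, event_time)


# Hypothetical Event Point timer values: different counts, same instant.
event_timers = {"Master 1": 1000, "Master 2": 1000, "Slave 1": 500, "Slave 2": 250}
print({c: timer_to_ns(c, t) for c, t in event_timers.items()})
# {'Master 1': 2000.0, 'Master 2': 2000.0, 'Slave 1': 2000.0, 'Slave 2': 2000.0}
print(stop_latency_ns("Master 1", 1000, 1040))  # 80.0
```

The same conversion applied to the Stop Point Registers gives each core's stop latency in absolute time. This makes it possible to reason about the order of events across clock domains even though the cores were not stopped on the same clock edge.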

C. Area Overhead

Table I shows the area overhead of our debug support components. For eight IP cores [39] (Leon3 processor, SDRAM controller, Ethernet MAC, VGA, GPIO, PS/2, Timer, and UART), we designed the IEEE 1500 wrappers and the TDCPIs with AXI [40]. We assume that each original IP core is already wrapped with IEEE 1500 and do not consider the area of the IEEE 1500 wrappers. The RTL code is synthesized by Synopsys DFT Compiler with a TSMC 0.25 μm library.

The gate counts (number of 2-input NAND equivalents) of the TDCPIs range from 611 to 1519, depending on the number of primary inputs and outputs of the core, and the average gate count of the TDCPIs is 889. The size of the registers in a CDS may vary with the complexity of its core. Further, to evaluate the routing overhead, we performed placement and routing using Cadence SOC Encounter with a 45 nm library. The average area overhead for adding a TDCPI and a CDS is about 30.9%, which increases to 35.7% when routing is considered. This area overhead appears large only because the target SoC is small. The gate count of the TDIU, including the TAP controller, is only 1015, and the TDIU interfaces with all CDSs through only 5 additional wires beyond the Masters bus wires. Thus, as the core size grows, this overhead becomes insignificant.

Fig. 9. TDIU architecture.

Fig. 10. Single-transaction-step debugging.

Fig. 11. Interrupt by TR_stop_req signal and one debug cycle.

Fig. 12. Simulation result of core-stopping process.

TABLE I
AREA OVERHEAD

TDCPI = Test/Debug Control & Protocol Interface; CDS = Core Debug Supporter (MDS and SDS).

VI. CONCLUSION

In this paper, we proposed a DfD technique that reuses the test infrastructure and the available NoC for debugging an NoC-based SoC. The main benefits of using the available on-chip network are (i) data throughput and (ii) hybrid operation that allows a core to be taken down by a tester for examination/debug while all other cores remain in functional mode. We designed a core test wrapper to connect core internal scan chains with an NoC protocol and added a test interface unit to convert an external tester interface into the functional interface protocol. To reuse the core test wrapper and an NoC for debug, we used a transaction- and scan-based debugging strategy and presented the core debug support logic. It was designed to support single-transaction-step debugging and to interface, through a small number of signals, with other debugging components operating with different clocks. With our DfT and DfD techniques, testing and debugging of a large NoC-based SoC can be performed efficiently without adding new parallel paths or using the slow IEEE 1149.1 as a test and debug data path.

REFERENCES

[1] L. Benini and G. De Micheli, “Networks on chips: A new SoC para-digm,” IEEE Computer, pp. 70–80, 2002.

[2] P. P. Pande, C. Grecu, A. Ivanov, R. Saleh, and G. De Micheli, “De-sign, synthesis, and test of networks on chips,” IEEE Design & Test ofComputers, pp. 404–412, 2005.

[3] J. Lee and H. Lee, “Wire optimization for multimedia SoC and SiPdesigns,” IEEE Trans. Circuits Syst. I, vol. 55, no. 8, pp. 2202–2215,Sep. 2008.

[4] H. Hong and S. Liang, “A decorrelating design-for-digital-testabilityscheme for � �� modulators,” IEEE Trans. Circuits Syst. I, vol. 56,no. 1, pp. 60–73, Jan. 2009.

[5] J. Song, H. Yi, J. Han, and S. Park, “An efficient SoC testing techniqueby reusing on/off-chip bus bridge,” IEEE Trans. Circuits Syst. I, vol.56, no. 3, pp. 554–565, Mar. 2009.

[6] E. J. Marinissen and T. Waayers, “Infrastructure for modular SOCtesting,” in Proc. IEEE Custom Integrated Circuits Conf., 2004, pp.671–678.

[7] T. Waayers, R. Morren, and R. Grandi, “Definition of a robust mod-ular SOC test architecture; resurrection of the single TAM daisy-chain,”presented at the IEEE Int. Test Conf., 2005, Paper 25.3, PP. 1–10.

[8] J. Aerts and E. J. Marinissen, “Scan chain design for test time reductionin core-based ICs,” in Proc. IEEE Int. Test Conf., 1998, pp. 448–457.

[9] E. J. Marinissen, R. Arendsen, G. Bos, H. Dingemanse, M. Lousberg,and C. Wouters, “A structured and scalable mechanism for test accessto embedded reusable cores,” in Proc. IEEE Int. Test Conf., 1998, pp.284–293.

[10] P. Varma and S. Bhatia, “A structured test re-use methodology for core-based system chips,” in Proc. IEEE Int. Test Conf., 1998, pp. 294–302.

[11] E. J. Marinissen, S. K. Goel, and M. Lousberg, “Wrapper design for embedded core test,” in Proc. IEEE Int. Test Conf., 2000, pp. 911–920.

[12] IEEE Standard Testability Method for Embedded Core-Based Integrated Circuits, IEEE Computer Society, Aug. 2005.

[13] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std. 1149.1-2001, IEEE Computer Society, 2001, IEEE Press.

[14] C. Feige, J. T. Pierick, C. Wouters, R. Tangelder, and H. G. Kerkhoff, “Integration of the scan-test method into an architecture specific core-test approach,” J. Electronic Testing: Theory and Applications, pp. 125–131, 1998.

[15] C. Lin and H. Liang, “Bus-oriented DFT design for embedded cores,”in IEEE Asia-Pacific Conf., 2004, pp. 561–563.

[16] J. Song, P. Min, H. Yi, and S. Park, “Design of test access mechanism for AMBA based system-on-a-chip,” in IEEE VLSI Test Symp., 2007, pp. 375–380.

[17] A. M. Amory, K. Goossens, E. J. Marinissen, M. Lubaszewski, and F. Moraes, “Wrapper design for the reuse of a bus, network-on-chip, or other functional interconnect as test access mechanism,” IET Comput. Digit. Tech., pp. 197–206, 2007.

[18] F. A. Hussin, T. Yoneda, and H. Fujiwara, “Optimization of NoC wrapper design under bandwidth and test time constraints,” in Proc. IEEE Eur. Test Symp., 2007, pp. 35–42.

[19] B. Vermeulen and S. K. Goel, “Design for debug: Catching design errors in digital chips,” IEEE Design & Test of Computers, pp. 35–43, May–Jun. 2002.

[20] B. Vermeulen, T. Waayers, and S. Bakker, “IEEE 1149.1-compliant access architecture for multiple core debug on digital system chips,” in Proc. IEEE Int. Test Conf., 2002, pp. 55–63.

[21] B. Vermeulen, T. Waayers, and S. K. Goel, “Core-based scan architecture for silicon debug,” in Proc. IEEE Int. Test Conf., 2002, pp. 638–647.

[22] “ETM10 Technical Reference Manual,” ARM, Nov. 2003.

[23] “CoreSight Technology System Design Guide,” Revision r1p0, ARM, Jul. 2007.

[24] N. Stollon, R. Leatherman, B. Ableidinger, and E. Edgar, “Multi-core embedded debug for structured ASIC systems,” in Proc. DesignCon, 2004.

[25] C. Ciordas, T. Basten, A. Radulescu, K. Goossens, and J. V. Meerbergen, “An event-based monitoring service for networks on chip,” ACM TODAES, vol. 10, no. 4, pp. 702–723, 2005.

[26] C. Ciordas, K. Goossens, T. Basten, A. Radulescu, and A. Boon, “Transaction monitoring in networks on chip: The on-chip run-time perspective,” Proc. IEEE IES, pp. 1–10, 2006.

[27] S. Tang and Q. Xu, “A multi-core debug platform for NoC-based systems,” in Proc. Design, Automation and Test in Europe, 2007, pp. 870–875.

[28] K. Goossens, B. Vermeulen, R. van Steeden, and M. Bennerbroek, “Transaction-based communication-centric debug,” in Proc. Int. Symp. on Networks-on-Chip, 2007, pp. 95–106.

[29] R. Kuppuswamy, P. DesRosier, D. Feltham, R. Sheikh, and P. Thadikaran, “Full hold-scan systems in microprocessors: Cost/benefit analysis,” Intel Technol. J., vol. 18, no. 1, Feb. 2004.

[30] H. Yi and S. Kundu, “On design of hold scan cell for hybrid operationof a circuit,” in Proc. IEEE Europ. Test Symp., 2008.

[31] H. Al-Asaad and P. Moore, “Non-concurrent on-line testing via scan chains,” in Proc. IEEE Systems Readiness Technology Conf., 2006, pp. 683–689.

[32] B. Vermeulen, “Functional debug techniques for embedded systems,” IEEE Design & Test of Computers, pp. 208–215, May/Jun. 2008.

[33] M. Beck, O. Barondeau, M. Kaibel, F. Poehl, X. Lin, and R. Press, “Logic design for on-chip test clock generation—Implementation details and impact on delay test quality,” in Proc. Design, Automation and Test in Europe, 2005, pp. 56–61.

[34] H. Yi, J. Song, and S. Park, “Low cost scan test for IEEE 1500-based SoC,” IEEE Trans. Instrum. Meas., pp. 1071–1078, May 2008.

[35] Glitch Free Safe Clock Switching. VLSI-WORLD. [Online]. Available: http://www.vlsi-world.com/content/view/64/47/1/0/

[36] I. Silas, I. Frumkin, E. Hazan, E. Mor, and G. Zobin, “System-level validation of the Intel Pentium M processor,” Intel Technol. J., pp. 37–43, 2003.

[37] S. K. Goel and E. J. Marinissen, “Effective and efficient test architecture design for SOCs,” in Proc. IEEE Int. Test Conf., 2002, pp. 529–538.

[38] R. Siripokarpirom, “A run-time reconfigurable hardware infrastructure for IP-core evaluation and test,” in Proc. Int. Conf. Field Programmable Logic and Applications, 2005, pp. 505–508.

[39] J. Gaisler, S. Habinc, and E. Catovic, “GRLIB IP Library User’s Manual,” ver. 1.0.16, 2007.

[40] AMBA AXI Protocol Specification, V.1.0, ARM, 2003.

Hyunbean Yi received the B.S., M.S., and Ph.D. degrees in Computer Science and Engineering from Hanyang University, Korea, in 2001, 2003, and 2007, respectively.

He is currently a Research Scholar in Electrical and Computer Engineering at the University of Massachusetts, Amherst. From 2002 to 2007, he worked at the Korea Electronics Technology Institute (KETI) as an associate researcher. His research interests include high-speed communication system design, Design-for-Testability (DfT), SoC/NoC testing/debugging, and NoC architecture optimization. He is a member of the Institute of Electronics Engineers of Korea (IEEK), the Korea Information Science Society (KISS), the Korea Information and Communications Society (KICS), and the IEEE.

Sungju Park received the B.S. degree in electronics engineering from Hanyang University, Korea, in 1983 and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Massachusetts at Amherst in 1988 and 1992, respectively.

He was with the Gold Star Company in Seoul, Korea, from 1983 to 1986, where he was in charge of developing microcomputer and network interface systems. From 1992 to 1994, he worked for IBM Microelectronics in Endicott, NY, in charge of Test Design Automation. He has been a Professor in the Department of Electrical and Computer Engineering, Hanyang University, Ansan, Korea, since 1995. His research interests lie in VLSI testing, including SoC testing, scan design, memory BIST, and interconnect testing; his additional interests include network chip design, embedded systems, and graph theory.

Prof. Park is a lifetime member of the Institute of Electronics of Korea, the Korea Information Science Society, and the Institute of Semiconductor Test of Korea.

Sandip Kundu (M’86–SM’94–F’07) is a Professor of Electrical and Computer Engineering at the University of Massachusetts, Amherst. Previously, he was a Principal Engineer at Intel Corporation and a Research Staff Member at IBM Corporation. He has published more than 130 papers in VLSI design and CAD, holds 12 patents, and has given more than a dozen tutorials at conferences.

Prof. Kundu was the Technical Program Chair of ICCD in 2000 and General Chair in 2001. He also served as a co-General Chair of the VLSI 2005 conference. He is a Fellow of the IEEE and a Distinguished Visitor of the IEEE Computer Society.
