Layout-driven chaining of scan flip-flops


K.-H. Lin, C.-S. Chen and T.-T. Hwang

Indexing terms: Partial scan, Chaining of scan flip-flops, Matching, Placement, Global routing

Abstract: In an era of submicron technology, routing is becoming a dominant factor in area, timing and power consumption. The problem of scan flip-flops chaining with the objective of achieving minimum routing area overhead is studied. The first attempt is to chain the flip-flops in logic level. To make more accurate decisions on chaining flip-flops, the second attempt is to perform the chaining of scan flip-flops taking layout information into consideration. Specifically, the authors show that the chaining problem is a travelling salesman problem (TSP). Then, two heuristics, greedy and matching-based algorithms, are proposed to solve the TSP problem. Various cost functions will be defined which take layout information into account. Benchmarking results show that the cost function achieves the best results when it considers placement and routing information and is dynamically updated.

1 Introduction

Scan design, a design-for-testability (DFT) methodology, is becoming a popular technique for sequential circuits [1]. Scan design adds test mode control signals to the circuit, connects flip-flops to form a shift register in test mode, and makes the primary input/output of the test shift register controllable and observable. It provides controllability and observability of the internal state variables for testing.

Three types of scan method have been proposed: full scan, random access and partial scan. Of these, partial scan, which leads to low overhead, is the most popular. It can be applied to a modular/bus-oriented design [12] employing parallel scan paths to access partial scan chains in the modules.

To maintain both low complexity of test generation and low overhead, scan flip-flops must be selected carefully in partial scan design. Cheng and Agrawal [2] have proposed a cycle-breaking technique to select scan flip-flops by which test generation complexity of the resultant circuits grows linearly with the sequential depth and the overhead is about 25% of full scan.

© IEE, 1996
IEE Proceedings online no. 19960638
Paper first received 14th July 1995 and in revised form 12th April 1996
The authors are with the Department of Computer Science, Tsing Hua University, Hsin-Chu, Taiwan 30043, Republic of China

Later, the cycle-breaking algorithm was improved in [3, 4]. Taking timing into consideration, Jou and Cheng [5] have proposed a timing-driven partial scan system which aims at reducing area overhead and performance degradation caused by added test logic.

Most previous work [1-7] on partial scan has put emphasis on selecting scan flip-flops. The chaining of scan flip-flops was not considered. However, in an era of submicron technology, routing is becoming a dominant factor in area, timing and power consumption. A good chaining of flip-flops may result in a 57% reduction in routing area, according to the experimental results presented in Section 5. Therefore, in this paper, we address the problem of scan flip-flop chaining with the objective of achieving the minimum routing area. The first attempt is to chain the flip-flops at the logic level. To make a more accurate decision on chaining the flip-flops, the second attempt is to perform chaining of scan flip-flops taking the layout information into consideration. Specifically, we show that chaining of scan flip-flops is a travelling salesman problem (TSP). Then, we propose heuristics to solve the TSP. Various cost functions will be defined which take layout information into account.

Fig. 1 Conventional partial scan design flow

2 Flip-flops chaining without layout information

A conventional approach would decide the chaining sequence of the scan flip-flops before layout synthesis, as Fig. 1 shows. First, it selects the scan flip-flops of the given circuit. Then, it replaces each selected flip-flop by a scan flip-flop. Next, it chains the scan flip-flops. Finally, it performs placement and routing.

IEE Proc.-Comput. Digit. Tech., Vol. 143, No. 6, November 1996

The information which could be used in deciding the order of chaining scan flip-flops is the topology of the circuit. For example, let Fig. 2a be the S-graph of a given circuit, where a vertex represents a flip-flop of the circuit and an edge $e_{ij}$ exists if there is a combinational path from flip-flop i to flip-flop j. Assume that flip-flops A, B and C are selected as scan flip-flops. We can construct a reduced graph for the selected scan flip-flops, as shown in Fig. 2b, where vertices correspond to the scan flip-flops, and where an edge $e_{ij}$ exists if there is a path from flip-flop i to flip-flop j in the S-graph. To consider the topology of the scan flip-flops, we chain the flip-flops according to the depth-first search (DFS) order of the reduced graph. Therefore, the chaining sequence will be A-B-C.

This ordering may result in unsatisfactory chaining since the layout information is not taken into consideration. If the layout information is taken into account, we have the design flow shown in Fig. 3. Having selected the scan flip-flops, placement and global routing are performed first. Then, the chaining sequence is determined using the layout information obtained so far.
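As a rough illustration of the logic-level ordering described in this section, the following Python sketch chains the scan flip-flops in DFS (preorder) visiting order of a reduced graph. The function name `dfs_chain_order` and the edge set of `reduced_graph` are assumptions made for the example; the actual edges of Fig. 2b are not available here.

```python
def dfs_chain_order(reduced_graph, start_order):
    """Return the scan flip-flops in depth-first (preorder) visiting order."""
    order, visited = [], set()

    def visit(v):
        visited.add(v)
        order.append(v)
        for w in reduced_graph.get(v, []):
            if w not in visited:
                visit(w)

    # Restart from every unvisited vertex so that flip-flops which are
    # unreachable from the first one still end up in the chain.
    for v in start_order:
        if v not in visited:
            visit(v)
    return order


# Hypothetical reduced graph (assumed edges, not taken from Fig. 2b).
reduced_graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
chain = dfs_chain_order(reduced_graph, ["A", "B", "C"])
print("-".join(chain))  # prints A-B-C
```

With these assumed edges the sketch reproduces the A-B-C sequence mentioned in the text; a different edge set or start vertex would in general give a different order.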

Fig. 2 S-graph of a given circuit and reduced graph for selected scan flip-flops
a S-graph; b reduced graph

Fig. 3 Layout-driven partial scan design flow

3 Problem modelling for the layout-driven flip-flop chaining

Scan flip-flops can be selected by any of the algorithms proposed previously [2-4]. After selecting flip-flops, we formulate the chaining of scan flip-flops as follows. Let G = (V, E) be a complete undirected weighted graph, where vertex $v_i \in V$ corresponds to scan flip-flop $SFF_i$ and the weight on edge $e_{ij} \in E$ denotes the cost of chaining $SFF_i$ to $SFF_j$. Since every scan flip-flop has to appear in the chaining sequence exactly once, and the flip-flops have to be chained with the minimum sum of weights on edges, the problem of chaining scan flip-flops is equivalent to finding a travelling salesman path in the complete undirected weighted graph G = (V, E). Furthermore, since the travelling salesman problem is NP-complete [8], the chaining of scan flip-flops is NP-complete. Fig. 4 shows the correspondence between the chaining problem and the travelling salesman problem, where the path 1-3-5-2-4-6 in the graph G corresponds to the chaining of flip-flops FF1-FF3-FF5-FF2-FF4-FF6.

Fig. 4 Travelling salesman modelling
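To make the modelling concrete, the sketch below evaluates the cost of a chaining sequence as the sum of edge weights along the path and finds the optimum by exhaustive search. The names `chain_cost` and `optimal_chain` and all edge costs are hypothetical; exhaustive search takes factorial time and is shown only to clarify the objective, not as a practical method.

```python
from itertools import permutations


def chain_cost(order, weight):
    """Sum of edge weights along a chaining sequence (a Hamiltonian path)."""
    return sum(weight[frozenset(pair)] for pair in zip(order, order[1:]))


def optimal_chain(vertices, weight):
    """Exhaustive search over all orderings; only usable for tiny examples."""
    return min(permutations(vertices), key=lambda order: chain_cost(order, weight))


# Hypothetical edge costs for four scan flip-flops (assumed values).
weight = {
    frozenset(("FF1", "FF2")): 1,
    frozenset(("FF1", "FF3")): 5,
    frozenset(("FF1", "FF4")): 6,
    frozenset(("FF2", "FF3")): 4,
    frozenset(("FF2", "FF4")): 5,
    frozenset(("FF3", "FF4")): 1,
}
best = optimal_chain(["FF1", "FF2", "FF3", "FF4"], weight)
print(best, chain_cost(best, weight))
```

Because the graph is undirected, a chain and its reverse have the same cost, so the optimum is never unique.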

Algorithm Greedy(G = (V, E, W))
Input: G is a complete graph with weight on each edge;
Begin
    while (the number of vertices > 1) do
        if ($e_{ij} \in E$ and $W(e_{ij})$ is the smallest) then
            Chain vertex i and vertex j;
            Form a supervertex $v_{\{i,j\}}$;
            Delete the edges incident to vertex i and vertex j;
            Compute and assign weights on edges connecting to other vertices;
        endif
    endwhile
End

Fig. 5 Greedy algorithm
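The greedy procedure of Fig. 5 can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: edge weights are static and stored in a dictionary keyed by unordered vertex pairs, a supervertex is represented as a list of vertices, and the helper names `join_cost` and `greedy_chain` as well as the cell positions are hypothetical.

```python
from itertools import combinations, product


def join_cost(chain_a, chain_b, w):
    """Cheapest connection between an end of chain_a and an end of chain_b.

    Only end vertices may be connected, so the four end-to-end
    combinations are examined. Returns (cost, joined_chain).
    """
    best = None
    for a, b in product((chain_a, chain_a[::-1]), (chain_b, chain_b[::-1])):
        cost = w[frozenset((a[-1], b[0]))]  # connect tail of a to head of b
        if best is None or cost < best[0]:
            best = (cost, a + b)
    return best


def greedy_chain(vertices, w):
    """Repeatedly merge the two supervertices joined by the cheapest edge."""
    chains = [[v] for v in vertices]
    total = 0
    while len(chains) > 1:
        best = None
        for i, j in combinations(range(len(chains)), 2):
            cost, joined = join_cost(chains[i], chains[j], w)
            if best is None or cost < best[0]:
                best = (cost, i, j, joined)
        cost, i, j, joined = best
        total += cost
        # Replace the two merged chains by the new supervertex.
        chains = [c for k, c in enumerate(chains) if k not in (i, j)]
        chains.append(joined)
    return chains[0], total


# Hypothetical cell positions; weights are Manhattan distances.
pos = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (6, 0)}
w = {
    frozenset((u, v)): abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1])
    for u, v in combinations(pos, 2)
}
chain, total = greedy_chain(list(pos), w)
print(chain, total)
```

Recomputing every pairwise cost in each iteration keeps the sketch short; a priority queue of edges would be the natural refinement for larger instances.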

Algorithm Matching(G = (V, E, W))
Input: G is a complete graph with weight on each edge;
Begin
    while (the number of vertices > 1) do
        Perform minimum weighted matching on G;
        for each pair of matched vertices i and j do
            Chain vertex i and vertex j;
            Form a supervertex $v_{\{i,j\}}$;
            Delete the edges incident to vertex i and vertex j;
        endfor
        Compute and assign weights on edges;
    endwhile
End

Fig. 6 Matching-based algorithm
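A minimal Python sketch of the matching-based procedure of Fig. 6 is given below. Several points are assumptions made for illustration: the minimum weighted matching is found by brute force (exponential time, suitable only for very small examples), an odd vertex is simply left unmatched for that round, weights are static, and the names `join_cost`, `min_weight_matching` and `matching_chain` and the cell positions are hypothetical.

```python
from itertools import combinations, product


def join_cost(chain_a, chain_b, w):
    """Cheapest end-to-end connection of two chains: (cost, joined_chain)."""
    best = None
    for a, b in product((chain_a, chain_a[::-1]), (chain_b, chain_b[::-1])):
        cost = w[frozenset((a[-1], b[0]))]
        if best is None or cost < best[0]:
            best = (cost, a + b)
    return best


def min_weight_matching(n, cost):
    """Brute-force minimum-weight maximum matching on vertices 0..n-1."""
    def solve(free):
        if len(free) < 2:
            return 0, []
        first, rest = free[0], free[1:]
        best = None
        for k, partner in enumerate(rest):
            sub_cost, sub_pairs = solve(rest[:k] + rest[k + 1:])
            cand = (cost(first, partner) + sub_cost, [(first, partner)] + sub_pairs)
            if best is None or cand[0] < best[0]:
                best = cand
        if len(free) % 2 == 1:
            # Odd number of vertices: also try leaving `first` unmatched.
            sub_cost, sub_pairs = solve(rest)
            if sub_cost < best[0]:
                best = (sub_cost, sub_pairs)
        return best

    return solve(tuple(range(n)))


def matching_chain(vertices, w):
    """Merge all matched pairs of supervertices in every iteration."""
    chains = [[v] for v in vertices]
    total = 0
    while len(chains) > 1:
        def cost(i, j):
            return join_cost(chains[i], chains[j], w)[0]

        _, pairs = min_weight_matching(len(chains), cost)
        merged, matched = [], set()
        for i, j in pairs:
            c, joined = join_cost(chains[i], chains[j], w)
            total += c
            merged.append(joined)
            matched.update((i, j))
        chains = merged + [c for k, c in enumerate(chains) if k not in matched]
    return chains[0], total


# Hypothetical cell positions; weights are Manhattan distances.
pos = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (6, 0)}
w = {
    frozenset((u, v)): abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1])
    for u, v in combinations(pos, 2)
}
chain, total = matching_chain(list(pos), w)
print(chain, total)
```

Each round roughly halves the number of supervertices, so the loop runs about log2 of the number of flip-flops times. A polynomial-time matching algorithm would replace the brute-force search in any practical setting.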

Many sophisticated heuristics have been proposed to solve the travelling salesman problem. It is not the purpose here to offer another sophisticated heuristic. Instead, we pick two relatively simple heuristics for solving the travelling salesman problem. The first one is a greedy method. Specifically, in each iteration, we chain the two flip-flops that have the least cost on the edge between them, and combine them into one supervertex. The iteration continues until a single vertex is left. The second heuristic is a matching-based algorithm. That is, in each iteration, a minimum weighted matching is found. Two matched nodes are chained and combined into a supervertex. The matching continues until only a single vertex is left. Figs. 5 and 6 show the greedy and the matching-based algorithms. Since the matching algorithm considers more global information than the greedy one in each iteration, it seems that the former will achieve a better solution than the latter. However, we found that this is not always the case. The results also depend on whether the cost function of two scan flip-flops is dynamically updated. Section 5.3 contains a discussion on this.

After vertices are merged, the weights on the complete graph have to be reassigned. A supervertex represents a chain of vertices. Only the end vertices of the chain sequences can be connected to each other. For two supervertices, there are four possible connections between these two nodes. The minimum cost of all possible connections between two supervertices is assigned as the weight on the edge between these two nodes. Fig. 7 shows an example of the reassignment of the weight on the edge between two supervertices. In Fig. 7a, vertices 1, 2, 3 and vertices 4, 5 are combined to form supervertex $v_{\{1,2,3\}}$ and supervertex $v_{\{4,5\}}$, respectively. Only the end vertices $v_1$ and $v_3$ of supervertex $v_{\{1,2,3\}}$ can be connected to the end vertices $v_4$ and $v_5$ of supervertex $v_{\{4,5\}}$. Four costs $c_1$, $c_2$, $c_3$ and $c_4$ result from the four connections $v_1$-$v_4$, $v_1$-$v_5$, $v_3$-$v_4$ and $v_3$-$v_5$. Fig. 7b shows that the weight on the edge between the two supervertices is the minimum value of $c_1$, $c_2$, $c_3$ and $c_4$.
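The reassignment rule can be restated compactly. The symbols $P$, $Q$, $p_1$, $p_2$, $q_1$, $q_2$ and $c(\cdot,\cdot)$ are notation assumed here for illustration and do not appear in the original: let supervertex $P$ have end vertices $p_1$, $p_2$, let supervertex $Q$ have end vertices $q_1$, $q_2$, and let $c(a, b)$ be the cost of connecting vertices $a$ and $b$. Then

$$w(P, Q) = \min_{a \in \{p_1, p_2\},\; b \in \{q_1, q_2\}} c(a, b)$$

For the example of Fig. 7 this reads $w(v_{\{1,2,3\}}, v_{\{4,5\}}) = \min\{c_1, c_2, c_3, c_4\}$. When a supervertex consists of a single vertex, its two end vertices coincide and fewer than four distinct connections remain.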

The unsolved problem now is how to define the weight on the edge in the complete graph.

Fig. 7 Possible connections between two supervertices and minimum cost assigned as the weight on the edge of the two supervertices
a Possible connections; b minimum cost

4 Defining weight on the edge

The weight on the edge can reflect the area overhead, as well as the timing cost and power consumption when the circuit operates in test mode. Since the circuit usually operates at a slow clock rate in test mode, timing is not so critical. However, the area taken by the test logic affects both the overhead and the power consumption, which is proportional to the capacitive load. Therefore, we consider only the minimisation of routing area in this paper.

4.1 Weight function taking into account placement information

The first attempt to define a layout-driven cost function is to consider the placement information. After placement is performed, the positions of all cells are determined. Then, the weight on an edge $e_{ij}$ is defined as the Manhattan distance between the two scan flip-flops $v_i$ and $v_j$. Therefore, we have the first weight function as follows:

$$W_1(e_{ij}) = \text{the Manhattan distance between } v_i \text{ and } v_j$$

For example, let the selected scan flip-flops A, B and C of the previous example (Fig. 2) be placed as shown in Fig. 8. If the Manhattan distance is used as the weight on the edge, the chaining sequence will be B-A-C.

Fig. 8 Flip-flops chained using placement information

4.2 Weight function taking into account placement and routing information

The Manhattan distance cost function results in the shortest distance between vertices, but it does not necessarily lead to the minimum total routing area. If the route with the shortest Manhattan distance goes through a congested area, it will increase the number of routing tracks and result in more area.

Fig. 9 Flip-flops chained using placement and routing information

Fig. 9 illustrates this situation. In it, there are two channels between three rows of cells, where the channel densities of channels 1 and 2 are 5 and 4, respectively. If Manhattan distance is used in determining the chaining sequence, the flip-flops will be chained as B-A-C and thus the channel density of channel 1 will increase to 6. However, if we chain the flip-flops in the sequence A-C-B, the channel density of both channels will remain the same. The reason why the sequence B-A-C increases the channel density is that the maximum column density between cell A and cell B is equal to the channel density. That is, this part of the channel is congested. If the connection between cell A and cell B has to be made, an extra track is required. Therefore, to save area overhead, cells should be chained according to the congestion of the layout plane. Since placement and global routing have been performed, the column density can be used in estimating the congestion of the layout plane.

We will assume that feedthroughs and only one channel will be used in connecting two cells. Since the cells may not be located in adjacent rows, more than one channel can be considered for connecting the two cells. For channel selection, the following heuristics are used. If two cells are in adjacent rows, the channel between these two cells is considered. If two cells are in the same row, the upper and lower channels are considered. If the cells are separated by more than one channel, the channels between the two cells are considered. Therefore, for each pair of scan flip-flops $v_i$, $v_j$ and a channel c under consideration for routing, we define the following weight:

$$W'(v_i, v_j, c) = W_1(e_{ij}) - \alpha \times \min_{k \in D}\{\text{channel\_density}_c - \text{column\_density}_k\}$$

where D is the set of columns between $v_i$ and $v_j$, $\text{column\_density}_k$ is the column density of column k in channel c, and $\alpha$ is a very large number. The second term denotes the number of tracks which can be used without increasing the channel density. The first term is used to break ties in the second term. Finally, the weight defined on the edge for vertices i, j selects the least congested channel from the channels under consideration for routing. It is defined as

$$W_2(e_{ij}) = \min_{c \in C} W'(v_i, v_j, c)$$

where C is the set of channels under consideration, which are selected using the channel selection heuristic described above.
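The density-based weight can be sketched as follows. This is an illustration under assumptions: each channel is given as a list of column densities, the channel density is taken to be the maximum column density in that channel, the two flip-flops span columns `lo` to `hi` inclusive, and the value of `ALPHA` as well as the function names `w_prime` and `w2` are hypothetical.

```python
ALPHA = 10**6  # stands in for the "very large number"; the value is assumed


def w_prime(manhattan, column_densities, lo, hi, alpha=ALPHA):
    """Weight of routing one connection through a single channel.

    column_densities: column density of every column in the channel.
    lo, hi: first and last column between the two flip-flops (inclusive).
    """
    channel_density = max(column_densities)
    # Tracks still usable in the spanned columns without raising the
    # channel density.
    free_tracks = min(channel_density - d for d in column_densities[lo:hi + 1])
    return manhattan - alpha * free_tracks


def w2(manhattan, channels, lo, hi, alpha=ALPHA):
    """Smallest per-channel weight over the candidate channels."""
    return min(w_prime(manhattan, cols, lo, hi, alpha) for cols in channels)


# Two hypothetical candidate channels with densities 5 and 4.
channel_1 = [3, 5, 5, 2]  # spanned columns are full: no free track
channel_2 = [4, 2, 2, 3]  # spanned columns leave two free tracks
weight = w2(3, [channel_1, channel_2], lo=1, hi=2)
print(weight)
```

Because the free-track term is multiplied by the large constant, the Manhattan distance only decides between channels, or between edges, that offer the same number of free tracks.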

We say that the cost function between two vertices is static if it is computed at the beginning of the chaining process and used without change during the entire process. During the whole chaining process, the Manhattan distance between two vertices remains the same. Hence, it is a static cost function. However, the maximum column density between two vertices may change in each iteration of chaining. Therefore, a cost function based on column density can be used in either a static or a dynamic form. A static cost function of column density saves computation time but results in solutions inferior to those of a dynamic cost function. To obtain better results when column density is used, the cost function should be updated dynamically.
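The difference between a static and a dynamically updated density cost can be illustrated with a small sketch. It assumes that chaining two flip-flops through a channel adds one net to every column the connection spans; this update rule, the column values and the names `free_tracks` and `add_connection` are assumptions for the example, not details given in the paper.

```python
def free_tracks(cols, lo, hi):
    """Tracks usable between columns lo..hi without raising channel density."""
    return max(cols) - max(cols[lo:hi + 1])


def add_connection(cols, lo, hi):
    """Record a newly chained connection spanning columns lo..hi (in place)."""
    for k in range(lo, hi + 1):
        cols[k] += 1


cols = [2, 4, 3, 3, 1]                 # hypothetical column densities
static_free = free_tracks(cols, 2, 4)  # computed once, before any chaining

add_connection(cols, 2, 3)             # an earlier iteration chains two cells
dynamic_free = free_tracks(cols, 2, 4) # recomputed after the update

print(static_free, dynamic_free)  # prints 1 0
```

The static value still reports one free track after the first connection has consumed it, which is how a static cost function can steer later connections into a region that has since become congested.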

5 Experimental results

The algorithms and cost functions described in the previous sections have been implemented in C on a Sun workstation. Several experiments have been conducted to investigate their effectiveness. A subset of the ISCAS benchmark circuits is selected for the experiments. Circuits with fewer than 10 selected scan flip-flops are excluded because the interconnection area required to route these flip-flops is insignificant compared with the area of the whole circuit. For all selected circuits, the experimental process begins with selecting the scan flip-flops using the method proposed in [4]. Then, the selected flip-flops are replaced by scan flip-flops. Timberwolf [9] is used for placement and routing before and after the chaining for the different cases. The output of Timberwolf includes the number of rows in the layout, the length of each row, and the total number of tracks. We use the library [10], in which the cell height is 58 µm and the routing pitch is 8 µm. Hence, the area needed is estimated by the following formula:

$$\text{Area} = [58 \times (\text{no. of rows}) + 8 \times (\text{total no. of tracks})] \times \text{length of longest row}$$

In Section 5.1, we present the comparisons of routing area used by the various cost functions in selecting the chaining sequence. In Section 5.2, we compare the results of using static and dynamically updated cost functions. In Section 5.3, we compare the results of using the greedy and matching-based algorithms in solving the TSP.
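The area estimate used in the experiments can be sketched as below. The constants 58 and 8 come from the text; the function names and the row, track and length values in the example are hypothetical and are not results from the paper.

```python
CELL_HEIGHT = 58    # cell height of the library, from the text
ROUTING_PITCH = 8   # routing pitch, from the text


def layout_area(rows, tracks, longest_row):
    """Estimated layout area: total height times the length of the longest row."""
    return (CELL_HEIGHT * rows + ROUTING_PITCH * tracks) * longest_row


def chaining_area(area_with_chain, area_without_chain):
    """Routing area attributed to chaining the scan flip-flops."""
    return area_with_chain - area_without_chain


# Hypothetical layout: chaining adds four tracks to a 10-row layout.
before = layout_area(rows=10, tracks=100, longest_row=2000)
after = layout_area(rows=10, tracks=104, longest_row=2000)
print(before, after, chaining_area(after, before))
```

In this model every extra routing track costs the routing pitch multiplied by the length of the longest row, which is why avoiding even a single additional track in a congested channel is worthwhile.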

5.1 Comparisons of cost functions

The first experiment compares the results of using various cost functions in selecting the chaining sequence. Table 1 shows the comparisons. The first column gives the names of the circuits, and the second the number of selected scan flip-flops. The results under columns Random and Logic are obtained without layout information. For Random, the scan flip-flops are chained randomly. For Logic, the scan flip-flops are chained using the DFS ordering of the reduced graph as presented in Section 2. After the flip-flops have been selected, the circuits are fed through Timberwolf for placement and global routing.

The results under columns Manhattan and Density are obtained with layout information. Before the flip-flops are chained, the circuits are fed through Timberwolf first. After placement and global routing, the layout information obtained so far is used for chaining the scan flip-flops. For Manhattan, the flip-flops are chained using the Manhattan distance as defined by $W_1$. For Density, the flip-flops are chained using channel density as defined by $W_2$. The TSP is solved using the matching-based algorithm. After the chaining sequence is determined, the circuit is fed to Timberwolf again to complete the routing of the scan flip-flops.

Since the area excluding the interconnection for scan flip-flop chaining is the same for all cases, we compare only the routing area required for chaining the flip-flops. Column 3 is obtained as the difference between the area without chaining the scan flip-flops and the area with chaining the scan flip-flops. It is used as a reference value. The columns under area give the routing area required to chain the scan flip-flops and the columns under ratio give the ratio to the area for the random case. From the Table, it is clear that the weight using channel density as the cost function obtains the best results. On average, it requires about 46% less routing area than random chaining.

Table 1: Comparisons of various cost functions

Circuit  SFF  Random          Logic           Manhattan       Density
              area  ratio, %  area  ratio, %  area  ratio, %  area  ratio, %
s5378    30   2042  100       2009  98.37     1974  96.68     1526  74.74
s9234    54   3941  100       3833  97.27     3689  93.60     1715  43.52
s13207   59   5145  100       4948  96.16     4264  82.88     2881  56.00
s15850   91   6814  100       5633  82.68     3917  57.50     2729  40.06
Average             100             93.62           82.67           53.58

5.2 Comparisons of static and dynamic updated costs

The column density between two vertices may change in each iteration of chaining. To compare the static and dynamically updated cost functions, we conducted the following experiment. For Static, the column density is computed at the beginning of the algorithm and used without change during the entire chaining process. For Dynamic, the column density is updated in each iteration of chaining. The matching-based algorithm is used in solving the TSP. Table 2 shows the results. From the Table, it can be seen that the dynamically updated cost function obtains better results.

Table 2: Comparisons of static and dynamic cost functions

Circuit  SFF  Static          Dynamic
              area  ratio, %  area  ratio, %
s5378    30   1526  74.74     1341  65.69
s9234    54   1715  43.52     1667  42.30
s13207   59   2881  56.00     2369  46.05
s15850   91   2729  40.06     2501  36.71
Average             53.58           47.69

5.3 Comparisons of greedy and matching-based algorithms

When used in solving the TSP, the matching-based algorithm considers more global information than the greedy one in each iteration. Therefore, the result of the matching-based algorithm is likely to be better than that of the greedy algorithm. To check this presumption, we have conducted experiments using the greedy and matching-based algorithms in solving the TSP. The experiment uses column density as the cost function since it is the cost function obtaining the best results in Table 1. With the static cost function, Table 3 confirms our presumption. It shows that the matching-based algorithm obtains a better result than the greedy algorithm.

Table 3: Comparisons of matching-based and greedy algorithms with the static cost function

Circuit  SFF  Matching        Greedy
              area  ratio, %  area  ratio, %
s5378    30   1526  74.74     1550  75.93
s9234    54   1715  43.52     1973  50.08
s13207   59   2881  56.00     3184  61.88
s15850   91   2729  40.06     2837  41.64
Average             53.58           57.38

Table 4: Comparisons of matching-based and greedy algorithms with the cost function dynamically updated

Circuit  SFF  Matching        Greedy
              area  ratio, %  area  ratio, %
s5378    30   1341  65.69     1260  61.70
s9234    54   1667  42.30     1440  36.55
s13207   59   2369  46.05     2193  42.63
s15850   91   2501  36.71     1919  28.17
Average             47.69           42.26

We continue the experiment for the case where the cost function is dynamically updated. Unexpectedly, we found that the greedy algorithm performs better than the matching-based one. Table 4 shows the results. The reason is that the matching-based algorithm determines the chaining of half of the scan flip-flops in the first iteration, so it cannot fully utilise the incremental information about the usage of the routing tracks.
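The averages reported in Tables 3 and 4 can be rechecked with a few lines of arithmetic. The ratio values below are copied from those tables; the variable names and the dictionary layout are choices made for this example.

```python
# Ratio columns (% of the random-chaining area) copied from Tables 3 and 4,
# in the order s5378, s9234, s13207, s15850.
ratios = {
    ("static", "matching"): [74.74, 43.52, 56.00, 40.06],
    ("static", "greedy"): [75.93, 50.08, 61.88, 41.64],
    ("dynamic", "matching"): [65.69, 42.30, 46.05, 36.71],
    ("dynamic", "greedy"): [61.70, 36.55, 42.63, 28.17],
}

averages = {key: sum(vals) / len(vals) for key, vals in ratios.items()}
best = min(averages, key=averages.get)

for key, avg in averages.items():
    print(key, f"average ratio {avg:.2f}%, reduction vs random {100 - avg:.2f}%")
print("best combination:", best)
```

The combination with the lowest average ratio is the greedy algorithm with the dynamically updated cost function, at about 42.26% of the random-chaining area. This corresponds to a reduction of roughly 57%, in line with the figure quoted in the Introduction.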

6 Conclusions

In this paper we have studied the problem of scan flip-flop chaining with the objective of achieving minimum routing area overhead. The first attempt is to chain the flip-flops at the logic level. To make a more accurate decision on chaining the flip-flops, the second attempt is to perform the chaining of scan flip-flops taking layout information into consideration. Specifically, we have shown that the chaining problem is a travelling salesman problem (TSP). Then, two heuristics, greedy and matching-based algorithms, have been proposed to solve the TSP. Various cost functions have been defined which take layout information into account. Benchmarking results show that the cost function achieves the best results when it takes into account placement and routing information and is dynamically updated.

7 References

1 AGRAWAL, V.D., CHENG, K.-T., JOHNSON, D.D., and LIN, T.: Designing circuits with partial scan, IEEE Design Test Comput., 1988, 5, (2), pp. 8-15
2 CHENG, K.-T., and AGRAWAL, V.D.: A partial scan method for sequential circuits with feedback, IEEE Trans. Comput., 1990, pp. 544-548
3 LEE, D.H., and REDDY, S.M.: On determining scan flip-flops in partial-scan. Proc. ICCAD-90, May 1991, pp. 322-325
4 BHAWMIK, A., LIN, C.J., CHENG, K.-T., and AGRAWAL, V.D.: Pascant: a partial scan and test generation system. Proceedings of the conference on Custom integrated circuits, May 1991
5 JOU, J.-Y., and CHENG, K.-T.: Timing-driven partial scan. Proceedings of the international conference on Computer-aided design, Nov. 1991, pp. 404-407
6 KAGARIS, D., and TRAGOUDAS, S.: Partial scan with retiming. Proceedings of the conference on Design automation, June 1993, pp. 249-254
7 PARIKH, P.S., and ABRAMOVICI, M.: A cost-based approach to partial scan. Proceedings of the conference on Design automation, June 1993, pp. 255-259
8 GAREY, M., and JOHNSON, D.: Computers and intractability: a guide to the theory of NP-completeness (W.H. Freeman & Co., 1979)
9 TimberWolf: Mixed macro/standard cell floorplanning, placement and routing package (Yale University, Sept. 1991)
10 MCNC 2.0 standard cell data book, March 1993
11 PAPADIMITRIOU, C.H., and STEIGLITZ, K.: Combinatorial optimization: algorithms and complexity (Prentice-Hall, NJ, 1982)
12 NOZUYAMA, Y., NISHIMURA, A., and IWAMURA, J.: Design for testability of a 32-bit microprocessor, the TX1. International test conference, 1988, pp. 267-277
