Scaling theory in modern VLSI

Factors Affecting Interconnects, Wire Length, and Clock Speed

As the dimensions of critical parts of the individual transistors in an integrated circuit are continually downsized from one generation to the next, the integrated circuits continue to become more complex and therefore contain more transistors and/or gates. The manner in which the dimensional downsizing is accomplished is through a process known as scaling theory [1, 2]. On the other hand, the growth of integration density on the chip has followed a relatively well-known exponential increase, with a critical doubling of the number of transistors (or gates) every 18 months, a rule known as Moore's Law [3], in which downsizing is accompanied by increased circuit complexity and increased die size. Indeed, the scaling arguments and Moore's Law have been used recently by the Semiconductor Industry Association (SIA) to generate a roadmap describing the expectations for the next several generations of integrated circuits [4]. However, there is some controversy surrounding the scaling extensions applied within this latter document. For example, in a recent article, we pointed out that the number of pins in modern VLSI circuits is not increasing as rapidly as predicted by the roadmap [5]. Moreover, a growing fraction of the actual increase in pin number that is occurring is connected with power and ground requirements and not with the basic signal I/O that is really associated with the scaling theory.

This slower (than expected) increase in pin count with each successive generation is important, as it is also intimately connected to the increase in wire length on the chip [6, 7]. The increase in the number of pins for an increase in the number of gates is determined by Rent's Rule:

P = AG^s,    (1)

where P is the number of pins, G is the number of gates (or transistors in modern CMOS), and A and s are constants. In our recent article, it was found that s ≈ 0.3, and similar results have been found by others [8]. It is important that these studies have shown that s < 0.5, for the latter value signals a transition in the behavior of the wire needed to interconnect the chip. Indeed, for s < 0.5, the average interconnect length on the chip is independent of the number of gates, and is determined by s itself [6, 7]. Here, the average interconnect length is the average value of the sum of the wire interconnect lengths that run from the output of one given cell to other cells.
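
As a rough illustration of how Rent's Rule behaves around the s = 0.5 transition, the short Python sketch below evaluates P = AG^s for a few gate counts; the prefactor A = 2.5 is an arbitrary placeholder, not a value taken from this article.

```python
# Illustrative sketch of Rent's Rule, P = A * G**s.
# The prefactor A = 2.5 is a placeholder, not a value from the article.

def rent_pins(gates: int, A: float = 2.5, s: float = 0.3) -> float:
    """Estimate the pin count P for a block containing `gates` gates."""
    return A * gates ** s

if __name__ == "__main__":
    for gates in (10_000, 1_000_000, 64_000_000, 256_000_000):
        p_low = rent_pins(gates, s=0.3)   # exponent reported in the article
        p_half = rent_pins(gates, s=0.5)  # transition value discussed in the text
        print(f"G = {gates:>11,d}:  P(s=0.3) ~ {p_low:8.0f}   P(s=0.5) ~ {p_half:8.0f}")
```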

In this article, we will discuss interconnects and wire lengths on the chip, and, consequently, we will discuss the manner in which the total wire length on the chip increases under the scaling theory described above. We will also describe how the clock speed scales under this theory, and what kind of speedups can be expected with each successive generation. In the next section, we first discuss how the average interconnect length is achieved, and how scaling affects the overall wire length. In the succeeding section, we will describe how the actual wire length increases with each generation as a result of various factors other than the reduction of the critical dimension. Finally, we turn to the clock speed and its scaling from one generation to the next.

Average Interconnect Lengths

Modern VLSI chips are designed according to a functional integration approach first described by McGroddy and Solomon [9]. In this approach, a procedure is followed that is based upon the Mead-Conway assertion that transistors are free and interconnect delay is costly [10]. This approach is distinctly different from earlier design philosophies in which the number of transistors (or gates) was minimized, as cost was directly related to these quantities. Today, the basic cost of an integrated circuit is relatively constant from one generation to another and is described in terms of a few dollars per square centimeter of processed and packaged wafer. In contrast, the design, verification, and testing of the chip can be quite expensive and contribute a major fraction of the overall cost.

As a consequence of the functional design of the chip, most interconnects run to other transistors (or gates) that are physically located quite close to the originating transistor (or gate). Long interconnects tend to be associated with a few connections between different functional blocks or with clock and power distribution (as well as long data buses). As a result, most of the interconnects are relatively short, with only a few running long distances across the chip. It is this recognition that leads to the properties of the interconnect topology being described through a connection to Rent's rule [6, 7]. A simple argument can illustrate this connection [11]. Let us put a set of pins around the periphery of a chip (for this discussion, we ignore the obvious point that pad sizes often dictate a double, or even triple, row of pins around the periphery). The pins have a fixed pad size, so the number of pins is related to the area of the chip. However, the pins are on the periphery and, if the entire periphery is used, the number of pins increases as the square root of the area of the chip (the number of pins is set by the circumference of the chip, not by the area directly).

On the other hand, the number of gates increases directly with the area of the chip, if we assume that the cell area remains constant. Hence, the number of pins increases as the square root of the number of gates on the chip. This is one of the simplest statements of Rent's rule. If, in practice, the exponent in Eq. (1) is less than 0.5, the number of pins is increasing less rapidly than in our example, and we have a functional architecture whose information flow is essentially less than the two dimensions of the chip. If, on the other hand, the exponent in Eq. (1) is greater than 0.5 (as it was in early discrete-component computers), then we are faced with a gate architecture whose information flow is essentially greater than the two dimensions of the chip (a situation termed highly partitioned by McGroddy and Solomon [9]). This latter case requires trying to cover the entire chip surface with pins, or adopting a less useful multi-chip implementation, either of which constitutes a troublesome situation.
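
The periphery argument can be made concrete with a small sketch. The pad pitch and cell edge below are illustrative placeholders rather than numbers from the article; the point is only that pins grow with the chip edge while gates grow with the chip area, so P scales roughly as G^0.5.

```python
# Sketch of the periphery argument behind the s = 0.5 form of Rent's rule.
# Pad pitch and cell edge are assumed, illustrative values.
import math

PAD_PITCH_UM = 100.0   # assumed spacing of peripheral bond pads (µm)
CELL_EDGE_UM = 1.3     # assumed logic-cell edge (µm)

def peripheral_pins(chip_edge_um: float) -> int:
    """Pins that fit in a single row around the periphery of a square chip."""
    return int(4 * chip_edge_um / PAD_PITCH_UM)

def gate_count(chip_edge_um: float) -> int:
    """Gates that fit in the chip area, at one gate per cell."""
    return int((chip_edge_um / CELL_EDGE_UM) ** 2)

if __name__ == "__main__":
    for edge_mm in (5, 10, 20):
        edge_um = edge_mm * 1000.0
        g, p = gate_count(edge_um), peripheral_pins(edge_um)
        # Pins track the chip edge (square root of area), gates track the area,
        # so the ratio p / sqrt(g) stays constant as the chip grows.
        print(f"edge = {edge_mm:2d} mm: gates ~ {g:12,d}, pins ~ {p:4d}, "
              f"pins / sqrt(gates) = {p / math.sqrt(g):.3f}")
```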

For the case in which s < 0.5, the information flow does not require the full two dimensions of the chip, so interconnects can easily be run to successive gates of the functional block. Consequently, as one scales the circuit size, the average interconnect length will scale down directly with the scaling of the transistors of the chip. This result has been shown mathematically [6], and one can write the average interconnect length, relative to the cell size, in one theory as a function of s alone (Eq. (2)), which gives roughly four circuit cell pitches for s ≈ 0.3. This is an increase over that found in earlier work [8] and is a direct result of the increase in the value of s. If we now take an "average" type of chip in a particular generation in which the critical dimension is 0.35 µm, with 64 × 10⁶ transistors on a 1 cm² chip (this is about one-half the SIA estimate, as we have not included the pad area, preferring instead to deal only with the active area), the average cell size is 1.3 × 1.3 µm², and the average interconnect length is estimated to be 5.3 µm from Eq. (2). This is not the length of a single interconnect, but the sum of the important signal interconnects emanating from the cell. This leads to a total wire length of 340 m on the chip.
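
The arithmetic behind these numbers can be reproduced in a few lines. In the sketch below, the value of roughly 4.1 cell pitches per average interconnect is inferred from the quoted 5.3 µm and 1.3 µm figures rather than taken from Eq. (2) itself.

```python
# Reproducing the article's baseline estimate for the 0.35 µm generation.
# PITCHES_PER_NET is inferred from the quoted 5.3 µm / 1.3 µm ratio (~4.1).

TRANSISTORS = 64e6           # transistors on the chip
ACTIVE_AREA_CM2 = 1.0        # active area only; pad area excluded, as in the text
PITCHES_PER_NET = 5.3 / 1.3  # average interconnect length in cell pitches

area_um2 = ACTIVE_AREA_CM2 * 1e8                 # 1 cm² = 1e8 µm²
cell_edge_um = (area_um2 / TRANSISTORS) ** 0.5   # ~1.25 µm (quoted as 1.3 µm)
avg_wire_um = PITCHES_PER_NET * cell_edge_um     # ~5.1 µm (quoted as 5.3 µm)
total_wire_m = TRANSISTORS * avg_wire_um * 1e-6  # µm -> m; ~330 m (quoted as 340 m)

print(f"cell edge         ~ {cell_edge_um:.2f} µm")
print(f"avg. interconnect ~ {avg_wire_um:.1f} µm")
print(f"total wire length ~ {total_wire_m:.0f} m")
```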

It is therefore evident that simply scaling the functional block size downward by a factor λ, according to the scaling rules, will actually shorten the total wire length by this same scaling factor. However, as one downsizes the transistors, the scaling rules usually only decrease the critical dimension by a factor λ = √2 each generation (we return to a more expansive discussion of this below). This only increases the density by a factor of 2 every three years, so other factors are at work, as indicated by Moore's Law. We now turn to the other factors.

From one generation to the next, which usually occurs on a three-year cycle, the density of the chip increases by a factor of 4. Part of this increase is due to the downsizing of the critical dimensions by a factor of λ. As mentioned above, this factor typically has a value of √2. In going from our 64 × 10⁶ transistor chip, with a critical dimension of 0.35 µm, to the next generation of 256 × 10⁶ transistors with a critical dimension of 0.25 µm, one expects the cell size to be reduced to 0.9 × 0.9 µm². However, this does not provide the needed density increase. There are two other factors that contribute to the density increase, according to Moore's Law [3]. One of these is an increase in the chip area, which we will deal with shortly. The other factor, however, is an over-scaling of the cell area. That is, the cell area is reduced more than expected from the dimensional scaling, due to circuit cleverness [3]. The single-transistor RAM cell, the CMOS gate, trench isolation, trench capacitors, etc., have contributed to this. These factors have led to an enhanced reduction in the cell area. In one study, it was found that the actual cell size was reduced by an additional factor of 1.5, so that the cell area was over-scaled by a factor of 3 rather than 2 [12]. This means that our cell size is reduced to 0.73 × 0.73 µm². As a consequence, our average interconnect length is also over-scaled and reduces to 3 µm.
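
A short sketch of this cell-size arithmetic, using only the numbers quoted above with λ = √2 and a 1.5× over-scaling of the cell area, shows how the 0.9 µm, 0.73 µm, and 3 µm figures arise.

```python
# Sketch of the generation-to-generation cell and interconnect scaling.
import math

LAMBDA = math.sqrt(2)    # dimensional scaling factor per generation
OVER_SCALE_AREA = 1.5    # extra cell-area reduction from "circuit cleverness"

cell_edge_um = 1.3       # 0.35 µm generation cell edge
avg_wire_um = 5.3        # 0.35 µm generation average interconnect length

# Dimensional scaling alone shrinks the cell edge by lambda.
scaled_edge = cell_edge_um / LAMBDA                            # ~0.92 µm (quoted 0.9)
# Over-scaling shrinks the cell area by lambda**2 * 1.5 = 3x overall,
# so the edge shrinks by a further factor of sqrt(1.5).
over_scaled_edge = scaled_edge / math.sqrt(OVER_SCALE_AREA)    # ~0.75 µm (quoted 0.73)
over_scaled_wire = avg_wire_um / (LAMBDA * math.sqrt(OVER_SCALE_AREA))  # ~3.1 µm (quoted 3)

print(f"scaled cell edge       ~ {scaled_edge:.2f} µm")
print(f"over-scaled cell edge  ~ {over_scaled_edge:.2f} µm")
print(f"over-scaled avg. wire  ~ {over_scaled_wire:.1f} µm")
```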

This over-scaling of the cell area, which arises from the circuit cleverness, creates a couple of long-term problems for integrated circuits. One of these is the well-known problem with capacitor area. It is normally desirable to keep a fixed amount of charge in the capacitor of a DRAM cell, and this means that the capacitor is consuming a larger and larger portion of the overall cell area [12]. The second problem is more subtle, but just as important: the cell area is decreasing faster than the square of the critical dimension. In the not too distant future, the cell area will be smaller than the gate area of the single transistor, which is a rather uncomfortable problem that common wisdom would rule impossible. This may actually be a limiting process in future VLSI integration.
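
The crossover described here can be illustrated with a hedged sketch: assume the cell area keeps shrinking by 3× per generation while the transistor's gate area shrinks only by λ² = 2×. The 0.53 µm² starting cell area comes from the text; the 0.125 µm² starting gate area is an assumed placeholder.

```python
# Hedged sketch of the cell-area vs. gate-area crossover suggested in the text.
# Starting gate area (0.125 µm²) is an assumed placeholder, not an article value.

cell_area_um2 = 0.53     # over-scaled cell area from the text (0.73 µm edge)
gate_area_um2 = 0.125    # assumed transistor gate area at the same generation

generation = 0
while cell_area_um2 > gate_area_um2 and generation < 10:
    generation += 1
    cell_area_um2 /= 3.0   # cell area over-scaled by 3x per generation
    gate_area_um2 /= 2.0   # gate area follows lambda**2 = 2x dimensional scaling
    print(f"generation +{generation}: cell ~ {cell_area_um2:.4f} µm², "
          f"gate ~ {gate_area_um2:.4f} µm²")

print(f"cell area falls below the single-transistor gate area "
      f"after ~{generation} generations (under these assumptions)")
```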

Fig. 1. The critical dimension, cell edge (the square root of the cell area), and the total wire length expected from scaling considerations, plotted as a function of the year.

We can now estimate the increase in the chip area. Since the cell area is now 0.53 µm², the total chip area has increased to 1.36 cm², or 1.17 cm on a side for a square chip. This is a factor of 1.33× increase in chip area from one generation to the next. The total wire length is now 770 m, or an increase of a factor of 2.3 (which is λ³/√1.5). It is important to note that the over-scaling of the cell area results in holding down the total wire length as well as making it possible to maintain the properly scaled speedup of the clock frequency.
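
The chip-area and total-wire-length figures follow from the same quoted numbers; a minimal sketch of that arithmetic, assuming the 0.73 µm cell edge and 3 µm average interconnect from above:

```python
# Sketch reproducing the next-generation chip area and total wire length.
import math

TRANSISTORS = 256e6
CELL_EDGE_UM = 0.73     # over-scaled cell edge
AVG_WIRE_UM = 3.0       # over-scaled average interconnect length

chip_area_cm2 = TRANSISTORS * CELL_EDGE_UM ** 2 / 1e8   # ~1.36 cm²
chip_edge_cm = math.sqrt(chip_area_cm2)                 # ~1.17 cm on a side
total_wire_m = TRANSISTORS * AVG_WIRE_UM * 1e-6         # ~770 m

area_growth = chip_area_cm2 / 1.0              # relative to the 1 cm² predecessor
wire_growth = total_wire_m / 340.0             # relative to the 340 m predecessor
expected = math.sqrt(2) ** 3 / math.sqrt(1.5)  # lambda**3 / sqrt(1.5) ~ 2.3

print(f"chip area  ~ {chip_area_cm2:.2f} cm² ({area_growth:.2f}x larger)")
print(f"chip edge  ~ {chip_edge_cm:.2f} cm")
print(f"total wire ~ {total_wire_m:.0f} m ({wire_growth:.1f}x; expected ~{expected:.1f}x)")
```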

In some cases, there may be an increase in the total wire length beyond that indicated above. For example, scaling the chip down means that everything scales downward in size, and the interconnect patterning of the chip can remain precisely the same. However, in increasing the number of transistors, one driving factor is increased functionality, which means that new functions are usually added to the chip. As a result, the problem of providing new routing to incorporate these new functions can lead to the need to add levels of metallization. Simply scaling the chip does not lead to a need for additional levels of metal, but adding functionality to the chip can often do so. This points out the need for more advanced architectural design. In most cases, however, this additional level of metal accommodates the need for a few longer interconnects, which, when averaged over the total number of transistors (or gates), do not significantly increase the average interconnect length. So, while the total length of wire may increase slightly faster than indicated above, it is not expected to dramatically alter the scaling rules that have evolved for VLSI.

In Fig. 1, we plot the critical dimension, the square root of the cell area, and the total wire length on the chip as a function of the chip density. Here, we use the SIA information for the 64 Mb chip, and extrapolate using the scaling rules discussed above. From this figure, one can see how these factors are expected to change over the next few generations of chip-density increases.

Clock Speeds

As with wire length, there have been some misconceptions surrounding the concept of clock speed increase over the past few generations of VLSI. One aspect of this is the role played by the board size, and the need to send signals out over the board. In fact, in highly partitioned architectures (such as gate arrays), the board size really dominates the degree to which the clock speed can be increased. Moreover, in the highly partitioned architectures, the average interconnection length does increase with the number of transistors (or gates) on the chip [6-8]. While proper scaling would suggest that the clock frequency should be increased by λ³, this does not occur in gate arrays due to these two factors. The limitation of having to send computational signals across the board limits the speedup to a factor of λ², and the increase of average interconnect length means that the frequency increase must be further reduced by a factor of 4/3 (which arises from the increase in average interconnect length in gate arrays [8]). As a consequence, the frequency increase is limited to λ²/1.33, which corresponds almost exactly to that observed in large mainframes.

On the other hand, modern VLSI is mainly functionally partitioned. The consequence of this is that the average interconnect length actually decreases, due to the over-scaling of the cell area. Moreover, since most computational communications are kept on the single chip, the frequency increase can follow the correct λ³ scaling rule. There has been some concern that perhaps the voltages were not being scaled down as rapidly as expected, but the decrease in average interconnect length due to over-scaling of the cell area compensates for this somewhat. This also makes the modern VLSI chip more attuned to low-power designs. Using the λ³ scaling, and recognizing how the factor of λ connects to time, one can then project clock frequencies as a function of generation. In Fig. 2, we plot the λ³ increase in frequency and compare it with modern microprocessors. The latter include the data in our previous work [5] involving microprocessors from Motorola, Intel, Sun, DEC, SCI, and so on. It is clear that the scaling rules are very good predictors for the clock frequency of the microprocessor. We note, however, that it is the clock speed that is plotted in this figure and not the instruction execution rate, as the latter will vary from one architecture to another.
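
As an illustration of the two regimes, the sketch below compounds the λ³ functional-chip scaling and the λ²/1.33 gate-array limit over several three-year generations; the 200 MHz starting frequency is an arbitrary placeholder, not a figure from the article.

```python
# Sketch comparing the two clock-frequency scaling regimes described in the text.
# The 200 MHz starting clock is an assumed, illustrative value.
import math

LAMBDA = math.sqrt(2)
f_functional = 200.0   # MHz: functionally partitioned chip (scales as lambda**3)
f_gate_array = 200.0   # MHz: gate array limited by board traffic and wire growth

for generation in range(1, 5):            # four three-year generations
    f_functional *= LAMBDA ** 3           # ~2.83x per generation
    f_gate_array *= LAMBDA ** 2 / 1.33    # ~1.5x per generation
    print(f"generation +{generation}: functional ~ {f_functional:7.0f} MHz, "
          f"gate array ~ {f_gate_array:6.0f} MHz")
```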

Summary

We have discussed the scaling rules for VLSI that pertain to the total wire length and the clock speed. Our analysis indicates that the total wire length is not increasing as rapidly as standard scaling theory would indicate. This results from the over-scaling of the cell size reduction from one generation to the next (as predicted by Moore [3]). However, the total wire length is still increasing at a rate that will cause significant power dissipation in the interconnects, and it indicates the need for new locally interconnected architectures. Moreover, the over-scaling of the cell size reduction also raises possible limitations as the cell size is reduced faster than the gate length.


Fig. 2. The progression in chip clock frequency, plotted as a function of the year. The solid curve is the scaling relationship expected for functionally designed modern VLSI chips.

We also discussed the effects of scaling on on-die clock speed. While gate-array clock speeds are scaling more slowly than the scaling rules would predict (a problem for large multi-chip architectures), clock speeds in modern VLSI chips track the scaling rule quite accurately.

Acknowledgments

The authors have benefited from discussions with J.R. Barker and P. Hasler. This work was supported in part by the Defense Advanced Research Projects Agency.

D.K. Ferry is a professor at Arizona State University's Department of Electrical Engineering in Tempe, Arizona. L.A. Akers is director of engineering at the University of Texas-San Antonio in San Antonio, Texas.

References

1. R.H. Dennard, F.H. Gaensslen, L. Kuhn, and H.N. Yu, "Design of Micron MOS Switching Devices," IEDM Tech. Dig., Dec. 1972.

2. G. Baccarani, M. R. Wordeman, and R. H. Dennard, “Generalized Scaling Theory and Its Application to a 1/4 Micrometer MOSFET Design,” IEEE Trans. Electron Dev., Vol. ED-31, pp. 452-462, April 1984.

3. G. Moore, "Progress in Digital Integrated Electronics," IEDM Tech. Dig., pp. 11-13, Dec. 1975.

4. The National Technology Roadmap for Semiconductors, San Jose, CA: Semiconductor Industry Association, 1994.

5. M. Yazdani, D.K. Ferry, and L.A. Akers, "Microprocessor Pin Predicting," IEEE Circuits and Devices Magazine, Vol. 13, No. 2, pp. 28-31, March 1997.

6. W.E. Donath, “Placement and Average Interconnection Lengths of Computer Logic,” IEEE Trans. Circuits and Syst., Vol. CAS-26, pp. 272-277, Apr. 1979.

7. D.K. Ferry, L.A. Akers, and E.W. Greeneich, Ultra Large Scale Integrated Microelectronics, Englewood Cliffs, NJ: Prentice-Hall, 1988.

8. D.K. Ferry, “Interconnection Lengths and VLSI,” IEEE Circuits and Dev. Mag., pp. 39-42, July 1985.

9. J.C. McGroddy and P.M. Solomon, "Device Technology Comparison in the Context of Large Scale Digital Applications," IEDM Tech. Dig., pp. 2-5, Dec. 1982.

10. C.A. Mead and L. Conway, Introduction to VLSI Systems, Reading, MA: Addison-Wesley, 1980.

11. L.A. Akers, D.K. Ferry, and R.O. Grondin, "Synthetic Neural Systems in the 1990s," in An Introduction to Neural and Electronic Networks, 2nd Ed., Ed. by S.F. Zornetzer, J.L. Davis, C. Lau, and T. McKenna, San Diego, CA: Academic Press, 1995, pp. 359-387.

12. Nicky C.C. Lu, Proceedings of the MicroProcess Conference, Kanazawa, Japan, 15-18 July 1991, unpublished.

