Power Delivery for Multicore Systems

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 12, DECEMBER 2011 2243

Aida Todri, Member, IEEE, and Malgorzata Marek-Sadowska, Fellow, IEEE

Abstract—As the industry moves from single- to multicore processors, the challenges of how to reliably design and analyze power delivery for such systems arise. We study various workload assignments to cores and their effect on the global power supply noise and ground bounce. We provide a detailed analysis of single and multiple cores and develop analytical formulas to capture the power supply noise and ground bounce of the system. We introduce metrics to estimate the amount of noise propagated from core to core and propose a supply noise aware workload assignment method. In our experiments, we show that timing constraints can be significantly affected if workload assignments are not properly made.

Index Terms—Ground bounce, multicore system, power supply noise.

I. INTRODUCTION

TODAY more and more processor chips use multiple cores in an attempt to deliver additional system performance within their power budget. In 2001, IBM introduced POWER4, the first multicore processor chip [1]. A multicore design consists of several cores integrated on a single chip to maximize throughput and accelerate application performance by dividing the workloads among cores and executing them in parallel. Several challenges are associated with implementing multicore designs, including connectivity and communication between the cores, data/cache coherency, and partitioning tasks among the cores. There are also challenges related to the physical design of multicore devices, such as signal integrity, power consumption, heat dissipation, and noise immunity. In this paper, we study multicore systems supplied by common power and ground networks. We consider cores executing independent tasks that require minimal data transfers. The workloads may have different frequencies and may evoke different instruction sequences. In the context of power (ground) analysis, such working cores act as current sources drawing various amounts of current at various frequencies. These fast and large transient currents introduce voltage drops on the power network or increase voltage on the ground network. Such deviations of the power levels from their nominal values are referred to as power supply noise and ground bounce. Both conditions are undesirable because they significantly affect the on-chip signal propagation. The transistor's performance depends on the voltage difference between the power and ground levels. Decreasing this difference reduces the speed of the transistor switching.

Manuscript received July 20, 2009; revised January 29, 2010, June 10, 2010; accepted August 30, 2010. Date of publication February 28, 2011; date of current version October 28, 2011. This work was supported by SRC under Grant 1421, by Apache Design Systems, and by the California MICRO Program.

A. Todri is with the Fermi National Accelerator Laboratory, Batavia, IL 60510 USA (e-mail: [email protected]).

M. Marek-Sadowska is with the Electrical and Computer Engineering Department, University of California at Santa Barbara, CA 93106 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2010.2080694

In this work, we investigate the problem of workload assignment to minimize the system's supply noise. Supply noise could also be decreased by increasing the voltage levels; however, this would lead to a significant increase of the system's power consumption. Workload assignment does not introduce any additional overhead, and it can be used to control the effects of power delivery noise.

We consider a case of identical cores and tasks that can be processed by them. We assume that each task can be processed by a single core and it may require many clock cycles. The cores may be running at various frequencies, which are attributes of the tasks. The power delivery system is shared among the cores. It is also possible that each core or a group of cores have their own individual power networks. Such architectural solutions are not studied in this paper. Those cores with a shared power delivery structure share decoupling capacitance and have lower inductance paths to the package [2].

Given that workloads can last for many cycles, there are cases in which neighboring cores simultaneously perform tasks. In a shared power/ground grid system, a working core can induce power supply noise and/or ground bounce on the neighboring working cores. The global grid can act as a medium for noise propagation between the cores. In our motivational experiments, we highlight the significance of task assignment on the level of supply noise. Additionally, for accurate noise calculation, both power and ground networks should be considered simultaneously. An assumption of an ideal 0 V ground network could lead to inaccurate noise and timing requirement calculations, as shown in Section IV.

In the open literature, there are no studies addressing the problem of power/ground delivery for multicores. An intuitive method for designing such grids could be to optimize them for all cores operational (i.e., switching) with a typical clock frequency and current demands. Despite being optimized for such a scenario, the global grid might experience intolerable power grid noise and ground bounce for some configurations of working cores. The challenge of how to assign the workloads to cores for minimum performance loss arises. Various workload assignments may create different noise maps. Without an in-depth study of workload assignments and their effects on the global power and ground networks, we might not be able to determine the optimal timing constraints of a multicore system.

Throughout this paper, we refer to the power supply noise and ground bounce as the supply noises generated by the working cores. We investigate the problem of workload assignment to minimize the supply noises, which are one of the indicators of the system's performance loss. The chip/package power distribution network is modeled with an RLC network and the cores are modeled with current sources that represent their current demands. We are considering flip-chip designs with controlled collapse chip connections (C4s) distributed throughout the grid.

1063-8210/$26.00 © 2011 IEEE

Fig. 1. (a) Core model. (b) 3 × 3 multicore system.

When assigning workloads in multicore systems, we must consider and model the core-to-core interactions. The authors of [2] show chip measurements and observations for POWER6, a dual-core microprocessor. They mention that noise from one core could propagate to the other core and, in the worst case, the noise might arrive when the other core is experiencing a locally produced drop, causing a perfect storm.

The existing literature describes techniques for assigning tasks without taking into account the core-to-core interactions. The authors of [3] consider task scheduling while improving system performance by applying different voltage levels to the cores. The authors of [4] propose a task assignment scheme that takes cache behavior into consideration. In [5], the authors examine a multicore system's architectural description and discuss its susceptibility to power variability caused by process variations. In [6], the authors present an architecture for a 36-multiprocessor array in which each processor operates as a single core. The authors utilize a multicore model similar to ours. They use it to examine the architectural details of workload assignment without including any physical design implementation aspects. The global power and ground delivery networks are not analyzed and the core-to-core interactions are not examined. In contrast to the existing works, we consider the global power and ground grid integrity and core-to-core interactions when various workload assignments are applied.

In this paper, we analyze the global grid in the transient domain. We estimate the voltage drop on the grid before and after a workload assignment and show that random assignments can affect the system's timing requirement. We develop metrics to measure noise propagation from core to core. These metrics account for the amount of decap available during circuit operation. We propose an assignment technique that controls the supply noise of a core and the noise induced by it on the neighboring cores. We show that utilizing supply noise aware workload assignments ensures meeting the system's timing constraints. The remainder of the paper is organized as follows. In Section II, we describe the models. In Section III, we describe the effect of supply noise on timing. In Section IV, we show the motivational experiments. In Section V, we analyze the base, core, and global grids. We describe our workload assignment method in Section VI, followed by the experiments in Section VII, and conclusions in Section VIII.

II. MODELS

A complete power/ground supply distribution model includes the package and the chip grid equivalent circuits. The package-level supply distribution model is dominated by inductance. The on-chip power and ground grids are primarily dominated by resistive and capacitive parasitics. We adapt the power grid model described in [7]. It uses a passive linear time invariant (LTI) network consisting of resistors, inductors, and capacitors, which are extracted from the parasitics of the power grid tracks. This model captures the grid's behavior well and describes the power grid as a system of linear equations. A core is represented by a set of distributed C4s, current sources, and non-switching decaps, as shown in Fig. 1(a).

A non-switching core is represented by its decoupling capacitance and its leakage current. The decoupling capacitance includes the capacitance from non-switching circuitry and the intentionally placed decaps.

In this work, we investigate workloads of various frequencies that represent diverse applications [8]. An equivalent circuit is built to model the switching activities for each functional block on a core. If the simulation results of the circuit are available, then nonlinear devices can be replaced with linear time-varying current sources which mimic the waveforms of the actual circuits. Each switching CMOS transistor draws a pulse of current from the power grid. The current drawn from the grid into the circuit blocks is modeled by a linear current source as in [7] and [9] to represent the current peak, switching time, and frequency of the switching circuit. Derivation and validity of such models are described in [10] and [11]. These models accurately capture the current changes that occur on a core due to a workload assignment [12].

Switching circuits are typically represented by triangular waveforms [10], [11], such as those shown in Fig. 2(a). A triangular or trapezoidal current waveform is used to represent a circuit's average current consumption, peak current, cycle time, and rise and fall times. For simplicity, we refer to various workloads using parameters of the triangular waveform, but in our calculations, we utilize continuous, Weibull distribution-based current models described by a period, a peak switching time, a peak current, and a leakage current in idle mode. Fig. 2(b) illustrates the continuous current waveform model of a workload.

Fig. 2. (a) Discrete triangular based waveform. (b) Continuous Weibull distribution function waveform representations.

The Weibull current model is continuous, which facilitates the mathematical computations. It is expressed as

(1)

where, is the shape parameter that corresponds to , andis the scaling parameter that corresponds to . A workload

represents a working task. Variables (peak current, frequency, rise/fall time) of the Weibull model capture the actual circuit current waveform sufficiently closely. Based on our observations, deviations within 10% from the actual current waveform do not affect the noise estimations, as we aim to measure the average supply noise. However, large deviations between the actual and estimated average supply noise will result in inaccurate timing requirement estimates.
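To make the shape of such a continuous workload current concrete, the following is a minimal sketch and not the exact form of (1). It assumes the standard two-parameter Weibull density as the pulse shape, rescaled so that the pulse peaks at the peak switching time with the peak current, offset by the leakage current, and repeated every period. The function names, parameter names, the default shape value, and this normalization are all assumptions made for illustration.

```python
import math


def weibull_pdf(t, shape, scale):
    """Standard two-parameter Weibull density; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    x = t / scale
    return (shape / scale) * x ** (shape - 1.0) * math.exp(-(x ** shape))


def workload_current(t, period, t_peak, i_peak, i_leak, shape=2.0):
    """Periodic Weibull-shaped current pulse (illustrative assumption).

    The scale is chosen so that the density has its mode at t_peak, and the
    pulse is normalized so that the current equals i_peak at t_peak and
    i_leak at the start of each period.
    """
    if shape <= 1.0:
        raise ValueError("shape must exceed 1 for the pulse to have a peak")
    # Mode of a Weibull density: scale * ((shape - 1) / shape) ** (1 / shape)
    mode_factor = ((shape - 1.0) / shape) ** (1.0 / shape)
    scale = t_peak / mode_factor
    t_local = t % period  # fold time into a single period
    pulse = weibull_pdf(t_local, shape, scale) / weibull_pdf(t_peak, shape, scale)
    return i_leak + (i_peak - i_leak) * pulse


# Hypothetical workload in arbitrary units: period 1.0, peak at 0.2.
samples = [workload_current(k / 100.0, 1.0, 0.2, 1.0, 0.05) for k in range(100)]
```

With these hypothetical numbers the tail of the pulse is small but not exactly zero at the end of the period, so the waveform has a tiny step at each period boundary. A larger shape value gives a narrower pulse.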

Utilizing these models, we aim to capture the effect of supply noise on a core due to core-to-core interactions. The current model is flexible enough to represent various switching activities and frequencies within a core by assigning appropriate peak current, rise and fall times, and frequencies to each current source on the core. In the case of spatial intra-core variations, the core can be partitioned into smaller blocks whose spatial variations can be omitted.

III. EFFECT OF SUPPLY NOISE ON TIMING

Transistors' behavior and performance depend on the voltage difference between power and ground levels; thus, supply noise (the deviation of the supply voltages from their nominal levels) can affect the circuit delay. The purpose of suppressing the cores' supply noise is to maintain their timing constraints. The authors of [13] showed that a voltage drop of 4.2% for a 2.53-GHz Pentium 4 microprocessor can reduce the clock frequency by 6.7%. In [14], it was shown that a 12.5% voltage variation may cause up to a 2.4× increase of gate delay in 0.13-µm CMOS.

In [15] and [13], the authors developed analytical formulas and performed on-chip measurements to verify that the average supply noise, rather than the peak noise, is a good indicator of timing variations. They observed that when the switching timing window is short with respect to the noise variation's duration, minimizing the average voltage drop and minimizing its peak are equivalent operations. However, when the switching window is comparable to the duration of the noise variation, the peak is not a good predictor of the circuit's delay.

In this work, we utilize noise metrics proposed in [16], which capture the average noise

(2)

(3)

(4)

where the integrands involve the power and ground node voltages and the integration limits are the starting and ending switching times. The power supply noise and ground bounce are measured in (2) and (3) as the integrals of the difference between the actual waveforms and ideal voltage levels. Equation (4) states that the supply noise of a system is a sum of its power and ground network noises over a timing window.
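The description of (2)-(4) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the node voltages are available as sampled waveforms over the switching window, the ideal levels are a nominal supply value (here called `vdd`, a hypothetical name) and 0 V, and the integrals are evaluated with the trapezoidal rule. Whether the metrics in [16] also normalize by the window length is not stated here, so no normalization is applied.

```python
def trapezoid(times, values):
    """Integrate sampled values over times with the trapezoidal rule."""
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (values[k] + values[k - 1]) * (times[k] - times[k - 1])
    return total


def supply_noise(times, v_power, v_ground, vdd):
    """Return (power noise, ground bounce, total) over the sampled window.

    Power noise integrates the drop below the nominal supply level vdd;
    ground bounce integrates the rise above the ideal 0 V ground level.
    """
    drop = [vdd - v for v in v_power]
    bounce = [v - 0.0 for v in v_ground]
    noise_power = trapezoid(times, drop)
    noise_ground = trapezoid(times, bounce)
    return noise_power, noise_ground, noise_power + noise_ground


# Hypothetical constant waveforms over a window from t = 0 to t = 2.
n_p, n_g, n_total = supply_noise([0.0, 1.0, 2.0], [0.95] * 3, [0.02] * 3, 1.0)
```

Dropping the ground term reproduces the ideal 0 V ground assumption, which underestimates the total whenever the ground node rises above 0 V.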

Supply noise can affect the transistors' switching; thus, by measuring its average value, we can accurately estimate the core's timing requirement. In [13], the authors provide a detailed discussion of the circuit delay's dependency on the supply noise.

The data exchange protocol for core-to-core communication typically involves one core writing data to particular communication buffers (via CPU, cache write-backs, or direct memory access (DMA)) and notifying the other core that the data is available. In this work, we consider independent static tasks with minimal data transfer [17], [18].

In [19], buffer delay variations are studied for various noise conditions. In the case of a two-core system, there are four core-to-core communication scenarios to evaluate for timing constraints. A core can be either noisy or quiet; therefore, we have to consider 1) noisy-noisy, 2) noisy-quiet, 3) quiet-noisy, and 4) quiet-quiet communications. In the case of n cores, the number of scenarios to analyze for communication timing degradation grows with n, which can be computationally expensive. In this work, we perform the task-to-core assignment guided by the fast core timing constraint evaluation and we do not consider any possible degradation of the core-to-core communication.
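The four two-core scenarios listed above can be enumerated mechanically. The sketch below assumes that each core is independently labeled noisy or quiet, so that n cores have 2^n joint labelings; this counting is an assumption for illustration, and the function name is hypothetical.

```python
from itertools import product


def noise_scenarios(num_cores):
    """Enumerate every joint noisy/quiet labeling of num_cores cores."""
    return list(product(("noisy", "quiet"), repeat=num_cores))


# Two cores give the four scenarios listed in the text.
two_core = noise_scenarios(2)
```

Under this assumption the nine-core system of Fig. 1(b) already has 512 joint labelings, which illustrates why exhaustive evaluation becomes expensive.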





IV. MOTIVATIONAL EXPERIMENT

Various workload distributions in a multicore system create different supply noises that result in different timing delays. We demonstrate this by analyzing a 3 × 3 multicore system, shown in Fig. 1(b). Each core is represented by the model shown in Fig. 1(a).

Each workload is characterized by its current demand, switching frequency, and leakage current. In a multicore system, the cores may be running similar or different tasks. Some may be high frequency; others may be mid-frequency switching applications. In this experiment, we investigate workloads operating at various frequencies that represent diverse applications. We experiment with the workloads available in the SPEC CPU2006 suites [8] that exhibit a spectrum of power and performance requirements and correspond to tasks such as video compression, combinatorial optimization, and path-finding algorithms.

We note that these characterized workloads are only used to highlight the effect that workloads have on power supply noise and timing constraints. By no means should they be utilized for validation of a real system.

We assume that the global power and ground grids of the multicore system used in this experiment are designed to work properly for all cores operational with typical current demand workloads. Parameters of a typical workload are:

We also experiment with a high frequency workload whose parameters are:

and with a low frequency workload described by the following parameters:

We estimate the leakage currents to be consistent with the leakage power of each workload. The total leakage power of the workload is derived according to the voltage drop on the core. There is an exponential dependency between and leakage currents as discussed in [20].

We measure the distance between cores by the Manhattan metric normalized to the length of the core's side. The decoupling capacitance available to a core is provided by its neighbors, the core's non-switching circuits, and intentionally inserted decaps. We normalize the capacitance to the capacitance of an idle core. The core labels correspond to the multicore system shown in Fig. 1(b).
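The normalized Manhattan distance can be computed directly from core labels. The sketch below assumes that the cores of the 3 × 3 system are labeled 1 through 9 row by row and that all cores are squares of equal side, so that one grid step equals one core side. The function names are hypothetical.

```python
GRID_SIDE = 3  # assumed 3 x 3 layout with cores labeled 1..9 row by row


def core_position(label, side=GRID_SIDE):
    """Map a 1-based core label to its (row, column) on the grid."""
    row, col = divmod(label - 1, side)
    return row, col


def core_distance(a, b, side=GRID_SIDE):
    """Manhattan distance between two cores, in units of one core side."""
    row_a, col_a = core_position(a, side)
    row_b, col_b = core_position(b, side)
    return abs(row_a - row_b) + abs(col_a - col_b)


# Distances from core 1 to every other core on the assumed layout.
distances_from_core_1 = {b: core_distance(1, b) for b in range(2, 10)}
```

On this layout the largest possible distance is 4, between opposite corners such as cores 1 and 9.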

We investigate the core-to-core interactions as a function of the distance between them. We assign a workload to core 1 and vary the task assigned to the other core. We consider the core-to-core interactions for high–high, mid–mid, and low–low frequency cores. Figs. 3 and 4 show the supply noise for the core-to-core interaction as a function of their proximity. The mid–mid frequency workloads have the greatest supply noise, which decreases as the distance between the cores increases.

Fig. 3. Core–core interaction for high–high, mid–mid, and low–low frequency workloads. Proximity between cores is measured by Manhattan metric normalized to the length of the core's side.

Fig. 4. Supply noise dependency on proximity for high–low, high–mid, and mid–low frequency workloads. Proximity between cores is measured by Manhattan metric normalized to the length of the core's side.

Similarly, the supply noise decreases with proximity for the low–low frequency cores. For high–high frequency cores, the supply noise changes only slightly with proximity. This is due to the parasitic effects in the high frequency domain where inductance plays an important role. In low frequencies, decaps dominate, but their effects are mostly local and rapidly decay when the distance between the cores increases.

Next, we investigate the effect of the available decap on the supply noise. This experiment is performed using a single operational core at various locations. For example, core 1 has two immediately neighboring cores that can act as decap. Similarly, cores 3, 7, and 9 have two neighboring cores that can act as decap. Cores 2, 4, 6, and 8 have three immediately neighboring cores and core 5 has four neighboring cores to act as decap.
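The neighbor counts quoted above follow from the grid geometry. The sketch below assumes the 3 × 3 system is labeled 1 through 9 row by row and that "immediately neighboring" means sharing an edge. The function name is hypothetical.

```python
def neighbor_cores(label, side=3):
    """Return the labels of cores sharing an edge with the given core."""
    row, col = divmod(label - 1, side)
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < side and 0 <= c < side:
            result.append(r * side + c + 1)
    return sorted(result)


# Number of idle neighbors that could act as decap for each operating core.
neighbor_counts = {core: len(neighbor_cores(core)) for core in range(1, 10)}
```

Corner cores get two neighbors, edge cores three, and the center core four, matching the counts in the text.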

Fig. 5 shows the effect of decap on the supply noise for different frequency workloads. For the same experimental setup, we vary the workload frequency and the amount of decap available. The worst case supply noise always occurs at mid frequencies (closer to the resonant frequency), regardless of the amount of decap. This was also observed in [21]. The amount of available decap moves the supply noise curves up or down, as illustrated in Fig. 5.

Fig. 5. Supply noise dependency on workload frequency and decap availability. Decap is normalized to the decap value of an idle core.

Fig. 6. Difference in supply noise due to ground grid parasitics.

We perform experiments on circuits with an ideal 0 V level and the actual ground grid networks. We observe that there is a significant difference in the total supply noise between these two cases. Fig. 6 shows the supply noise for different workload frequencies. We include the actual ground grid network in the remaining experiments in this paper.

We draw the following observations:

Observation 1: Supply Noises and Frequency: Workload frequency has an effect on supply noises. We observe that the supply noises are larger when the workload frequency is closer to the resonant frequency of the system. This is also illustrated in Fig. 3. The workloads assigned to the cores could differ in their application and switching frequencies, thus some cores will experience greater supply noise than others. The correlation between the supply noise and workload frequencies of the cores is shown in Fig. 4. For smaller core-to-core distances (in our experiments, distances of less than 2 units of a normalized Manhattan distance), high-low workload cores have less power supply noise than low-mid workload cores. For distances greater than 2, low-mid workload cores have the least power supply noise. The noises generated by the high-mid and high-low workloads change slowly with distance due to inductive effects, whereas the noises for low-mid cores decrease faster due to the local decap effects.

Observation 2: Supply Noises and Proximity: Supply noises from one core to the other are inversely proportional to the distance between them. Additionally, the frequency of the workloads affects the core-core interactions. The supply noises from two cores, both with high frequency workloads, tend to be the same regardless of their proximity because of the inductive effects present. At low frequencies, supply noises decrease rapidly with increasing proximity due to decap effects. A workload assignment strategy should take into consideration the proximity between the cores and their operational frequencies.

Observation 3: Supply Noises and Decaps: Supply noise is inversely proportional to the amount of decap. As the amount of decap increases, the supply noises decrease, thus increasing the noise resilience of the core. This effect is well-known and widely used to suppress the amount of supply noises. There are various works that investigate the decap placement and sizing to control supply noises on a single core [22]. Fig. 5 illustrates this effect. In a multicore system, the operational core will experience different amounts of decap depending on its location and its neighboring cores' activities. The neighboring cores can act as decaps to suppress the noises. An assignment strategy should consider these factors.

Observation 4: Effect of Ground Bounce: Assuming there is an ideal 0 V level ground network ignores the ground bounce and leads to inaccurate calculations of the system's delay. The voltage difference between the power and ground network indicates the amount of voltage supply applied to a circuit. Considering an ideal 0 V level ground network results in an inaccurate workload assignment. These behaviors are also observed when several workload assignments are applied to the circuit in Fig. 1(b).

In this experiment, the global power and ground grids are designed to withstand the supply noise when cores operate with the typical workload. In the first assignment, cores 1, 2, and 4 are operational. In the second assignment, cores 1, 5, and 9 are operational. We apply various frequency workloads to the cores and measure the supply noises on the operational cores. Table I shows the comparisons of supply noise and timing delay increase for each assignment. Assignment 1 has larger supply noises than assignment 2 for the same set of workloads. For example, workload - - for assignment 1 has the largest power supply noise in comparison to assignment 2.
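One geometric difference between the two assignments is how far apart the operational cores sit. The sketch below computes the total pairwise normalized Manhattan distance for each assignment as a rough proxy for spread. It assumes cores are labeled 1 through 9 row by row on the 3 × 3 grid; the proxy and the function names are illustrative assumptions and are not the noise metrics developed in this paper.

```python
from itertools import combinations


def core_distance(a, b, side=3):
    """Manhattan distance between two cores, in units of one core side."""
    row_a, col_a = divmod(a - 1, side)
    row_b, col_b = divmod(b - 1, side)
    return abs(row_a - row_b) + abs(col_a - col_b)


def total_pairwise_distance(cores, side=3):
    """Sum of Manhattan distances over all pairs of operational cores."""
    return sum(core_distance(a, b, side) for a, b in combinations(cores, 2))


assignment_1 = (1, 2, 4)  # clustered in one corner
assignment_2 = (1, 5, 9)  # spread along the diagonal

spread_1 = total_pairwise_distance(assignment_1)
spread_2 = total_pairwise_distance(assignment_2)
```

Here assignment 2 has twice the total pairwise distance of assignment 1 (8 versus 4), which is consistent with Observation 2 and with the lower supply noise reported for assignment 2.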

These observations illustrate the effect that workload frequency, decap, and location have on the system's supply noise. Supply noise can degrade a core's timing constraints, and we utilize workload assignment as a knob to control the supply noise without additional cost. These observations further motivate us to study the effect that workload assignments have on the supply noise and to propose a supply noise aware assignment strategy.

In the next section, we analyze the global grid to capture the effects of proximity between cores, available decap, and workload frequency.



Fig. 7. (a) Base grid. (b) Core grid.

TABLE I
SUPPLY NOISE COMPARISONS BETWEEN VARIOUS ASSIGNMENTS AND DIFFERENT FREQUENCIES

V. GRID ANALYSIS

In this work, we assume that power/ground delivery for a multicore system consists of a mesh with C4s, current sources, and decaps distributed on the grid. Each core is replicated many times to model the whole multicore system structure, as shown in Fig. 1(a) and (b).

Each core consists of several blocks. Each block corresponds to a circuit bounded by a rectangle whose corners are C4s. The current source waveforms are shown in Fig. 2. The base grid is a 2 × 2 portion of the grid between the neighboring C4s along with the current sources representing the circuit and decoupling capacitances. The capacitors and current sources on the boundaries of the base are represented by their half values in order to account for those capacitors and current sources that are partitioned between two bases.

The core grid can be viewed as a collection of connected base grids. For example, a core can have many blocks such as cache, CPU, ALU, and decoders/drivers. We decompose the core grid into smaller grids while maintaining the system's behavior. Representing the core with several base models allows us to analyze each base grid separately and reuse the results to analyze the core and the whole global grid. Fig. 7(a) and (b) illustrate the base and core grids. The black dots on the grid represent C4s. In this section, we perform detailed analyses of the base, core, and global grids.

TABLE II
IMPEDANCE APPROXIMATION COMPARISONS

A. Base Grid Analysis

In this study, we assume the current sources and decoupling capacitances in a core grid are uniformly distributed over the core nodes. The decoupling capacitances of the base grid are extracted from the corresponding circuits. Our analysis is also valid for non-uniform distribution of decap and current sources. We assume their uniformity to simplify the explanation. For this same reason, we also assume that a core grid includes four connected base grids. The analysis is valid for those cases with more than four base grids per core. We reduce the base grid to a single node. As shown in Fig. 8(a), the center node, node 5 in the power grid, will have the greatest amount of voltage drop and, similarly, nodes 4 or 6 will have the greatest voltage drop on the ground grid because they are the nodes furthest away from the supply pins. Since our objective is to estimate the average supply noise, which directly affects timing, we investigate these nodes as they exhibit the largest amount of voltage drop. Additionally, considering node 5 in the power grid and node 6 in the ground grid would also provide the maximum noise peaks, which can change the logic state of a node and cause the circuit to fail.

The simplified circuit is shown in Fig. 8(b). Its two nodes represent reduced circuit node 5 of the power grid and reduced circuit node 6 of the ground grid. The circuit resistances are determined using the impedances of the first and second shortest paths. These paths provide a reasonable approximation of the impedance between two points and introduce only a small amount of inaccuracy. Including more paths will improve the approximation. Given that our base circuit is a simple resistive network, we can obtain a very accurate approximation of the impedances with less than 2% error. We compare the approximations of impedances and their effect on the voltage response with the results obtained from HSPICE simulations. The results are shown in Table II. Similar estimations were performed in [22] on large granularity power grids with multiple power pins; the error estimate provided there was less than 10%.

Fig. 8. (a) 2 × 2 base grid. (b) Simplified circuit for analyzing node 5.

The effective capacitance is the capacitance from the neighboring nodes that is seen at node 5. A second capacitance represents the decap available at node 5. To obtain the effective capacitance, we solve the base grid by applying modified nodal analysis (MNA) [23]. The node voltages for the actual base grid can be expressed as $G V = I$, where $G$ is the conductance matrix for the base grid structure for both power and ground bases, $V$ is the vector of node voltages for the power and ground bases, and $I$ is the vector of current sources on the base grid.

Node 5 has the largest voltage drop, so it gives the minimum voltage on the power base. Similarly, node 6 has the greatest bounce, so it gives the maximum voltage on the ground base. Because we are focusing on capturing the average case voltage drop, investigating the voltages at nodes 5 and 6 is sufficient. To maintain the behavior of the actual core in the reduced model, each node voltage of the reduced circuit must equal the corresponding node voltage of the actual circuit, i.e., node 5 for the power grid and node 6 for the ground grid.
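As an illustration of the nodal solve described above, the following Python sketch assembles the conductance matrix of a 3 × 3-node mesh (one 2 × 2 base grid), ties the four corner nodes to the supply through a pin resistance, and solves for the node voltages. It is a static, purely resistive sketch: the branch resistance, pin resistance, supply voltage, and load currents are illustrative assumptions rather than values from this paper, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_power_grid(n, r_branch, r_pin, vdd, i_load, pin_nodes):
    """Node voltages of an n x n resistive mesh fed through supply pins."""
    g, g_pin = 1.0 / r_branch, 1.0 / r_pin
    size = n * n
    G = [[0.0] * size for _ in range(size)]
    rhs = [-i for i in i_load]  # load currents leave each node
    for r in range(n):
        for c in range(n):
            k = r * n + c
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and lower neighbours
                if rr < n and cc < n:
                    j = rr * n + cc
                    G[k][k] += g
                    G[j][j] += g
                    G[k][j] -= g
                    G[j][k] -= g
    for k in pin_nodes:  # pin modelled as a resistor from the node to VDD
        G[k][k] += g_pin
        rhs[k] += g_pin * vdd
    return solve_linear(G, rhs)


# Assumed values: 1 ohm branches, 0.1 ohm pins, 1 V supply, 10 mA per node.
v = solve_power_grid(3, 1.0, 0.1, 1.0, [0.01] * 9, pin_nodes=[0, 2, 6, 8])
worst = min(range(9), key=lambda k: v[k])  # index of the lowest node voltage
```

With uniform loads, the lowest voltage appears at index 4, the center of the mesh (node 5 in Fig. 8(a)), which is consistent with the choice of node 5 for the power grid.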

The reduced circuit shown in Fig. 8(b) has the effective capacitance as its only unknown parameter; the package inductance, the extracted current source, and the decap available at node 5 are known. The effective capacitance can be derived by solving the node voltage equations for the reduced power and ground nodes in the s-domain

(5)

where inverse Laplace transforms are taken to return to the time domain. The effective capacitance varies in value depending on the workload frequency and switching activity, as shown in Fig. 9(a) and (b). It decreases exponentially with current frequency and linearly with switching activity.

The reduced base circuit is obtained analytically, and our method is similar to that described in [24]. Approximations are applied only when deriving the parasitic impedances of the power and ground paths, which introduces some inaccuracy. Most of this inaccuracy comes from the ground-side impedance, because node 6 lies on a boundary between two base grids. The ground pins of the neighboring base grid introduce inaccuracy to the current flow and voltage response at node 6. Our experiments suggest that we are able to capture the voltage response on the power and ground grid with an error of less than 2%.

Fig. 9. (a) Effective capacitance as a function of current frequency. (b) Effective capacitance as a function of switching activity.


Fig. 10. Core model with simplified base models.

B. Core Grid Analysis

The core grid consists of several connected base grids. Fig. 7(b) shows a circuit representation of the core grid. We use the models and analyses developed for the base grid to derive the node voltages of the core grid.

Each base is represented by its simplified model, as shown in Fig. 8(b). Bases are connected through the impedances of the shortest paths between them. One set of impedances represents the local power grid branches between bases 1, 2, 3, and 4; similarly, a second set represents the local ground grid branches between bases 1, 2, 3, and 4. Such a representation simplifies the core grid structure and its analysis. Fig. 10 shows the core grid with the base models. We assume that when a core is operational, all its bases are operational with the same frequency as that of the workload assigned to it. We utilize the core model from Fig. 10 to derive analytical formulas for the node voltages on the core grid. We express them in terms of the node voltages derived from the base grid analysis.

The current for the base grid in the s-domain is expressed as

(6)

For a core grid with simplified base grid models, the node voltage at any node can be expressed as

(7)

Combining (6) and (7), we express the core node voltage in terms of the base node voltages as follows:

(8)

In matrix form, the node voltages on the core grid can be expressed in terms of the base voltages as:

(9)

Fig. 11. Simplified core model.

where, for a 4-base core, the two matrices are 8 × 8 and the two voltage vectors are 8 × 1, covering both the power and ground nodes. The matrices represent the conductance parameters for the core and the base grids derived from (9). We have already derived the solution for the base node voltages; thus, the node voltages of the core are expressed as:

(10)

This equation is valid assuming that all bases of the same core are operational with the same switching frequency. In the case of various frequencies, we need to apply the superposition of the frequency responses. We further simplify the core model to include only a single node voltage. To do this, we need to know the amount of decap available for a core. We use the same reduction technique as for the base grid.

The single node represents the minimum node voltage of the core. Thus, for a power grid node we have

(11)

and for a ground grid we have

(12)

The simplified core model is shown in Fig. 11. It has only one current source, representing the current demand and frequency of the workload assigned to the core. The simplified model also has one decoupling capacitor, which represents the amount of decap available in the core. The package is modeled by an inductance. The impedances are derived from the delta-wye conversion of the impedances in the circuit illustrated in Fig. 10. The only unknown parameter of the model is the amount of decap of the core, because the remaining quantities are known.
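The delta-wye conversion mentioned above follows the standard textbook formulas. The sketch below is a generic helper under that assumption; the node labels a, b, c and the sample values are illustrative and are not taken from Fig. 10.

```python
def delta_to_wye(z_ab, z_bc, z_ca):
    """Convert delta impedances (between node pairs) to wye impedances (node to centre)."""
    total = z_ab + z_bc + z_ca
    z_a = z_ab * z_ca / total  # product of the two delta branches touching node a
    z_b = z_ab * z_bc / total
    z_c = z_bc * z_ca / total
    return z_a, z_b, z_c


z_a, z_b, z_c = delta_to_wye(1.0, 2.0, 3.0)

# Sanity check: the impedance seen between a and b is the same in both networks.
delta_ab = 1.0 * (2.0 + 3.0) / (1.0 + 2.0 + 3.0)  # z_ab in parallel with (z_bc + z_ca)
wye_ab = z_a + z_b
```

The same function works for complex impedances, since it only uses addition, multiplication, and division.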

The core decap represents the combined decap available to the bases of the core and is derived from analytical equations for the circuit shown in Fig. 11. In order to maintain the same system behavior between the simplified core model and the actual core circuit, we require the node voltages of the simplified model to equal the corresponding voltages of the actual core on both the power and ground grids.

Based on KCL, for a simplified core model we have

(13)


Fig. 12. (a) Global grid for a 2 × 2 multicore system. (b) Simplified core models for a 2 × 2 multicore system.

From (13), we derive an analytical formula for decap

(14)

We compute the inverse Laplace transforms of the s-domain terms in (14) to derive the core decap. We further utilize the simplified core model for the global power grid analysis.

C. Global Grid Analysis

We perform global grid analysis using the base and core models discussed in the previous sections. This significantly reduces the complexity of the global grid analysis with only a small loss of accuracy. The global grid for a 2 × 2 multicore system is shown in Fig. 12(a). Fig. 12(b) illustrates the global grid structure with simplified core models for a 2 × 2 multicore system.

The cores are connected through the global grid at the package level. The power and ground global grid branches that connect the cores are represented by their impedances. In the global grid, the connected cores can have different frequencies. Thus, in our analysis we apply superposition to consider the frequency response of each workload. Superposition applies to linear circuits [23].

We divide the ranges of workload frequencies into three groups. The first group consists of high frequency workloads (wh) and is represented by a single average frequency for the range. Similarly, the second and third groups consist of the mid (wm) and low (wl) frequency workloads.
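The grouping step can be sketched as follows. The two boundary frequencies are hypothetical parameters, since the text does not state where the ranges are split, and the function name is illustrative.

```python
def group_by_frequency(freqs_hz, low_mid_hz, mid_high_hz):
    """Split workload frequencies into wl / wm / wh and average each group.

    low_mid_hz and mid_high_hz are assumed group boundaries.
    Returns a dict mapping group name to its average frequency (None if empty).
    """
    groups = {"wl": [], "wm": [], "wh": []}
    for f in freqs_hz:
        if f < low_mid_hz:
            groups["wl"].append(f)
        elif f < mid_high_hz:
            groups["wm"].append(f)
        else:
            groups["wh"].append(f)
    return {name: (sum(fs) / len(fs) if fs else None)
            for name, fs in groups.items()}


# Four workloads with assumed boundaries at 0.8 GHz and 2.5 GHz.
avg = group_by_frequency([1e9, 3e9, 2e9, 0.5e9], low_mid_hz=0.8e9, mid_high_hz=2.5e9)
```

Each group is then analyzed at its single representative frequency, which keeps the number of frequency responses to three regardless of how many distinct workload frequencies exist.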

We derive the frequency response for a set of cores that are in the same frequency group. In matrix form, the frequency response for the high frequency group is expressed as

(15)

where the matrix is the conductance matrix of the grid structure, and the two vectors contain the global node voltages and the current sources, with one entry per core. For the sample grid structure in Fig. 12(b), each current source belongs to the wh, wm, or wl group, depending on its frequency. Similarly, we derive the frequency responses for the other frequencies as

(16)

The global node voltages are expressed by superposition as

(17)

where the initial condition voltage is included as a separate term. Superposition is applied as a property of linear networks. Equation (17) is rewritten as
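A minimal numerical sketch of this superposition is given below. It assumes sinusoidal steady state and describes each frequency group by hypothetical transfer impedances from every core to the observed node; the paper itself works with conductance matrices in the s-domain, so this is an illustration of the linearity argument only, and all names and values are assumptions.

```python
import cmath
import math


def node_voltage(t, v_init, groups):
    """Voltage at one node at time t as a sum of per-group sinusoidal responses.

    groups: list of (omega, z_row, i_phasors), where z_row[j] is an assumed
    transfer impedance from core j to this node at angular frequency omega and
    i_phasors[j] is the current phasor drawn by core j in that group.
    """
    v = v_init
    for omega, z_row, i_phasors in groups:
        drop = sum(z * i for z, i in zip(z_row, i_phasors))
        v -= (drop * cmath.exp(1j * omega * t)).real  # this group's contribution
    return v


# Two cores: core 0 runs a high frequency workload, core 1 a low frequency one.
high = (2 * math.pi * 2e9, [0.10, 0.02], [0.5, 0.0])
low = (2 * math.pi * 0.5e9, [0.12, 0.03], [0.0, 0.2])
v_both = node_voltage(0.0, 1.0, [high, low])
```

Because the network is linear, the deviation from the initial voltage with both groups active equals the sum of the deviations obtained with each group alone.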

(18)

For each individual core, the global voltage is expressed as

(19)

Using (19), we are able to express the core voltages in terms of the frequency response for each frequency group. The coefficients capture the effect that one core has on another in any of the frequency groups, wh, wm, or wl. We use these coefficients to decide how to assign workloads to minimize the supply noise caused by the core-to-core interactions. We note that frequency grouping introduces some inaccuracy in the frequency response of the multicore system; however, the analytical formulas provide sufficient accuracy and capture the trends of the frequency response and the generated supply noise. Equation (19) can be extended to include all frequencies by obtaining the corresponding response for each frequency domain.

The node voltage equations for the global grid can also be expressed in terms of the base and core node voltages as derived in the previous subsections. We choose to represent the global node voltages in terms of the current sources so that we derive coefficients that effectively capture the core-to-core interactions. The simplified core model for the global grid analysis significantly reduces the complexity of the problem. The size of the matrix that needs to be solved for the global grid analysis is directly proportional to the number of cores. We measure the amount of supply noise on the global grid using (4).


VI. ASSIGNMENT STRATEGIES

We formulate two workload assignment problems.

Problem 1: Given a set of workloads and a global grid of cores that are initially idle, decide how to assign the workloads such that the minimum amount of supply noise is generated.

Problem 2: Given a set of workloads and a global grid of cores in which some cores are initially working, decide how to assign the new workloads, without reassigning the previously assigned cores, such that the minimum amount of supply noise is generated.

Below, we propose four assignment strategies.

A. Simulated Annealing-Based Assignment

Both problems can be solved using a simulated-annealing-based algorithm. Simulated annealing is a well-known optimization technique widely used for various applications. We apply simulated annealing to explore the effects of workload assignments on the supply noise and the system’s delay. The assignment vector for Problem 1 has all its elements as variables; in Problem 2, the vector has some fixed elements due to the initial, existing assignment. The evaluation function is the supply noise obtained using the node voltages expressed by (19) and the noise metric given by (4); the temperature is reduced at each cooling step in the loop. For each temperature step, equilibrium is reached if there is no more change in the supply noise for a perturbed assignment configuration.
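The annealing loop can be sketched as follows. This is a generic simulated annealing skeleton, not the implementation used in this paper: the geometric cooling factor `alpha`, the fixed number of moves per temperature (used here in place of the equilibrium test), and the stand-in cost function `toy_noise` are all assumptions. In the paper the cost would be the supply noise from (19) and (4).

```python
import math
import random


def anneal(n_cores, workloads, cost, fixed=None, t0=1.0, alpha=0.9,
           moves_per_t=50, t_min=1e-3, seed=0):
    """Assign workloads to cores by simulated annealing.

    fixed maps core index -> workload for cores that must not be reassigned
    (Problem 2); leave it empty for Problem 1.
    """
    rng = random.Random(seed)
    fixed = dict(fixed or {})
    free = [c for c in range(n_cores) if c not in fixed]
    slots = list(workloads) + [None] * (len(free) - len(workloads))
    rng.shuffle(slots)
    assign = dict(fixed)
    assign.update(zip(free, slots))
    cur = cost(assign)
    best, best_cost = dict(assign), cur
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(free, 2)  # perturb: swap the contents of two free cores
            assign[a], assign[b] = assign[b], assign[a]
            new = cost(assign)
            if new <= cur or rng.random() < math.exp((cur - new) / t):
                cur = new
                if cur < best_cost:
                    best, best_cost = dict(assign), cur
            else:
                assign[a], assign[b] = assign[b], assign[a]  # reject: undo the swap
        t *= alpha  # assumed geometric cooling schedule
    return best, best_cost


def toy_noise(assign, cols=3):
    """Stand-in cost: current products of active core pairs divided by distance."""
    active = [(divmod(c, cols), w) for c, w in assign.items() if w is not None]
    return sum(wa * wb / math.dist(pa, pb)
               for i, (pa, wa) in enumerate(active)
               for pb, wb in active[i + 1:])


# Problem 1 on a 3 x 3 grid with two identical workloads.
best, best_cost = anneal(9, [1.0, 1.0], toy_noise)
# Problem 2: the centre core (index 4) is already busy and stays fixed.
best2, best2_cost = anneal(9, [1.0], toy_noise, fixed={4: 2.0})
```

With this stand-in cost, the search separates the two workloads as far as the grid allows, and in the second call the pre-assigned core is never moved.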

B. Assignment Heuristics

We utilize our observations and analyses to formulate the workload assignment algorithms based on quantitative reasoning. Based on our observations of several examples, we have the following classifications and rules.

1) There are three kinds of workloads: H for high frequency, M for mid frequency, and L for low frequency.

2) Based on their switching activities, we further refine the high frequency workloads as {Hi}, where the elements represent switching activities of 0.2, 0.3, and 0.4, respectively. A switching activity of 0.3 means that, on average, in every clock cycle, 30% of the core’s transistors switch. Similarly, we define {Mi} and {Li}.

3) We first assign the mid, then high, and finally low frequency workloads due to the amount of supply noise they generate.

4) The core-core interactions are ordered from the strongestto the weakest as - - - - - , and

. Thus, high and mid frequency workloads should be placed farther apart to reduce their interactions, whereas the low-low frequency workloads can be placed close to each other without a large supply noise penalty.

The quantitative assignment (QA) strategy can be summarized as follows:

1. Place {Mi} workloads
   • Place them far apart from each other to weaken the core interactions.
2. Construct the core voltage matrix using (10) and obtain power supply noise and ground bounce using (11) and (12).
3. Place {Hi} workloads
   • Place them far apart from each other to weaken H-M core interactions.
4. Construct the core voltage matrix with multiple frequencies using (18) and obtain the new power supply noise and ground bounce on the system.
5. Place {Li} workloads
   • Place them far apart from each other to weaken M-L core interactions.
6. Recompute the power supply noise and ground bounce on the system.

We also introduce two other assignment algorithms based on geometric distances and the amount of current consumed by the cores. In geometric assignment (GA), the summation of inter-core distances is maximized. Current dependency assignment (CDA) is based on the amount of current drawn by the cores. Cores with large current workloads are assigned far away from each other to minimize the core-core interaction.
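The GA objective can be illustrated with a brute-force sketch that picks the set of cores with the largest sum of pairwise distances. Euclidean distance between core grid positions and exhaustive enumeration are assumptions made for illustration; the text does not specify the distance measure or the search procedure, and the function name is hypothetical.

```python
from itertools import combinations
import math


def geometric_assignment(rows, cols, k):
    """Choose k core positions on a rows x cols grid maximizing summed pairwise distance."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]

    def spread(chosen):
        return sum(math.dist(a, b) for a, b in combinations(chosen, 2))

    # Exhaustive search: fine for small grids, grows combinatorially with size.
    return max(combinations(cells, k), key=spread)


two = geometric_assignment(3, 3, 2)   # two workloads on a 3 x 3 multicore
four = geometric_assignment(3, 3, 4)  # four workloads on the same grid
```

For a 3 × 3 grid, two workloads land on opposite corners and four workloads occupy the four corners. A CDA-style variant could weight each pair by the product of the currents drawn, but that weighting is not specified in the text.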

VII. RESULTS

We have implemented the assignment algorithms and tested them on a set of circuits. The circuit parameter values were taken from [21]. We consider the power and ground delivery in 90 nm technology with V. We compare our simulated-annealing-based assignment (SABA) with the QA, GA, and CDA algorithms in terms of supply noise. We note that there are no prior works that developed tools for supply noise aware workload assignment in multicore systems. However, we have exhaustively checked all possible assignments and demonstrated the quality of our algorithms, as shown in Fig. 13.

The initial global power and ground grids are designed to satisfy the voltage drop and current density constraints when all cores are operational with their current demand. In this work, we assume that all the cores are identical and any workload can be assigned to any core. We study the assignment problem for different workload distributions and grid configurations. In our experiments, we assumed that up to 50% of the cores can be operational at any time. The 50% assumption serves solely to highlight the effectiveness of the proposed SABA method. There is no restriction on the number of working cores, and SABA can also be applied when all cores are operational. The applied workloads vary in current magnitude and frequency in order to represent various working tasks.

1) Effect of Core Granularity: Several 3 × 3, 4 × 4, 5 × 5, and 10 × 10 multicores were tested for the same workloads to study the effects of core granularity on the assignment. The assignment was tested on Problems 1 and 2, described in Section VI. Table III(a) shows the results for Problem 1 and Table III(b) shows the results for Problem 2. In all tables, the percentage of supply noise (%SN) describes the noise increase for algorithms GA, CDA, and QA versus the SABA algorithm.

2) Effect of Core Size: We study three different core sizes: small, medium, and large. In our experiment, a small core consists of a single base grid, a medium size core consists of four base grids, and a large size core consists of nine base grids. We apply the same workload to each core. We assume that the execution of a workload consumes the same amount of charge regardless of the core size. The base grids are the same for all core sizes. Large core sizes lead to less timing delay increase due to the amount of available decap. Results are shown in Table III(c).

Fig. 13. (a) Workload placement for all assignment strategies. (b) Supply noise for all configurations.

TABLE III
SUPPLY NOISES FOR VARIOUS CORE GRANULARITIES (a) WHEN INITIALLY ALL CORES ARE IDLE, (b) WITH AN INITIAL EXISTING ASSIGNMENT, (c) FOR VARIOUS CORE SIZES AND (d) FOR VARIOUS FREQUENCY WORKLOAD ASSIGNMENTS

3) Effect of Workloads: Several workloads were tested to study their effect on PSN. The workloads have different current demands and switching frequencies. We experimented with the workloads described in Section II. The results are shown in Table III(d).

4) Effect of Ground Bounce: As we discussed in the previous sections, proper modeling of ground bounce has a significant effect on the system’s supply noise. Ignoring it can lead to inaccurate noise and timing delay calculations.

We provide supply noise comparisons between the ideal 0 V level and the actual ground network. We apply our algorithm on various grid granularities. We observe a significant difference in the amount of supply noise, which increases with the granularity of the grids. Table IV shows the comparisons.

We observe that, overall, the CDA algorithm results in the largest amount of supply noise. We also observe that QA gives better results than GA and CDA. This is because QA takes into account the frequency and proximity between cores. QA does not capture all the nuances of core-core interactions and is not as good as SABA, which takes into account decap availability and noise propagation between cores. However, QA can be a good starting point for further optimization. We also observe that the initial assignment of working cores plays a significant role in the system supply noise. The supply noise of a multicore system with no initial assignment is less than when an initial assignment exists. This is because an initial assignment restricts the possible assignment of workloads. These differences can be observed in Tables III(a) and III(b).

TABLE IV
SUPPLY NOISE COMPARISON WITH IDEAL AND ACTUAL GROUND GRID

TABLE V
RUN-TIMES FOR DIFFERENT CORE GRANULARITIES

The GA algorithm considers only the geometric distance to reduce the noise propagation, but it cannot capture any possible trade-offs. The amount of charge consumed by the simultaneously operating workloads has an effect on timing delay. Greater, more frequent current demands create larger noise.

In Fig. 13(a), we show the best possible workload assignments for a 3 × 3 multicore system determined by each algorithm. Table V shows run-time comparisons for different core granularities.

To calibrate the quality of solutions determined by each of the algorithms, in Fig. 13(b) we show the supply noise for all configurations of working cores. On the x-axis are the assignment configurations sorted by their grid noise in increasing order. We observe that in terms of supply noise, the SABA algorithm indeed produces a high quality solution, whereas the other algorithms are quite far from the optimum. The obtained results can be explained based on the properties of each algorithm. The SABA assignment strategy is based on simulated annealing, which is an iterative method that attempts to find a global minimum. The other strategies are rule-based and do not capture all the dependencies and their mutual effects. Fig. 13 illustrates these findings. SABA has greater run times than the other algorithms while providing a better quality solution.

VIII. CONCLUSION

In this paper, we demonstrated that workload assignment affects the system’s overall timing constraints. Workload frequency, core activity, and amount of decap play an important role in the amount of supply noise. We developed metrics to capture the supply noise, effective capacitance, and core-to-core noise propagation. We developed a supply noise aware assignment strategy and showed that our algorithm is efficient and achieves better results than those obtained by geometric, current dependency, and quantitative assignments.

ACKNOWLEDGMENT

The authors gratefully acknowledge the equipment grant from Intel. We also thank the anonymous reviewers for their feedback and suggestions to strengthen this paper.

REFERENCES

[1] J. M. Tendler, J. S. Dodson, J. S. Fields Jr., H. Le, and B. Sinharoy, “POWER4 system microarchitecture,” IBM J. Res. Develop., vol. 46, no. 1, pp. 5–25, Jan. 2002.

[2] N. James, Ph. Restle, J. Friedrich, B. Huott, and B. McCredle, “Comparison of split versus connected-core supplies in the POWER6 microprocessor,” in Proc. Int. Solid-State Circuits Conf., San Francisco, Feb. 2007, pp. 298–300.

[3] G. Qu, “Power management of multicore multiple voltage embedded systems by task scheduling,” in Proc. Int. Conf. Parallel Process. Workshops, Sep. 2007, pp. 78–83.

[4] J. Anderson, J. Calandrino, and U. Devi, “Real-time scheduling on multicore platforms,” in Proc. Real-Time Embedded Technol. Appl. Symp., Apr. 2006, pp. 179–190.

[5] K. Meng, F. Huebbers, R. Joseph, and Y. Ismail, “Modeling and characterization power variability in multicore architectures,” in Proc. Int. Symp. Perform. Anal. Syst. Software, Apr. 2007, pp. 146–157.

[6] Zh. Yu, M. J. Meeuwsen, R. W. Apperson, O. Sattari, M. Lai, J. W. Webb, E. W. Work, D. Truong, T. Mohsenin, and B. Baas, “AsAP: An asynchronous array of simple processors,” IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 695–705, Mar. 2008.

[7] H. Chen and J. S. Neely, “Interconnect and circuit modeling techniques for full-chip power supply noise analysis,” IEEE Trans. Compon., Packag. Manuf. Technol., vol. 21, no. 3, pp. 209–215, Aug. 1998.

[8] Standard Performance Evaluation Corporation, “SPEC CPU2006,” [Online]. Available: www.spec.org/cpu2006.

[9] H. Kriplani, F. Najm, and I. Hajj, “Maximum current estimation in CMOS circuit,” in Proc. Design Autom. Conf., Jun. 1992, pp. 2–7.

[10] S. R. Nassif and O. Fakhouri, “Technology trends in power-grid-induced noise,” Proc. Syst. Level Interconnect Prediction, pp. 55–59, 2002.

[11] H. Chen and D. Ling, “Power supply noise analysis methodology for deep-submicron VLSI chip design,” in Proc. Design Autom. Conf., Jun. 1997, pp. 638–643.

[12] S. Bodapati and F. Najm, “High-level current macro-model for power-grid analysis,” in Proc. Design Autom. Conf., 2002, pp. 385–390.

[13] M. Saint-Laurent and M. Swaminathan, “Impact of power-supply noise on timing in high-frequency microprocessors,” IEEE Trans. Adv. Packag., vol. 27, no. 1, pp. 135–144, Feb. 2004.

[14] R. Ahmadi and F. N. Najm, “Timing analysis in presence of power supply and ground voltage variations,” in Proc. Int. Conf. Comput.-Aided Design, Nov. 2003, pp. 176–183.

[15] Y. Ogasahara, T. Enami, M. Hashimoto, T. Sato, and T. Onoye, “Validation of a full-chip simulation model for supply noise and delay dependence on average voltage drop with on-chip delay measurement,” IEEE Trans. Circuits Syst.-II: Exp. Briefs, vol. 54, no. 10, pp. 868–872, Oct. 2007.

[16] A. R. Conn, R. A. Haring, and C. Visweswariah, “Noise considerations in circuit optimization,” in Proc. Int. Conf. Comput.-Aided Design, Nov. 1998, pp. 220–227.

[17] L. Chai, A. Hartono, and D. K. Panda, “Designing high performance and scalable MPI intra-node communication support for clusters,” in IEEE Int. Conf. Cluster Comput., Sep. 2006.

[18] J. Anderson, J. Calandrino, and U. Devi, “Real-time scheduling on multicore platforms,” in Proc. Real Time Embedded Technol. Appl. Symp., Apr. 2006, pp. 179–190.

[19] L. C. Chen, M. Marek-Sadowska, and F. Brewer, “Buffer delay change in the presence of power and ground noise,” IEEE Trans. Very Large Scale Integr. Syst., vol. 11, no. 3, pp. 461–473, Jun. 2003.

[20] W. Jiang, V. Tiwari, E. D. Iglesia, and A. Sinha, “Topological analysis for leakage prediction of digital circuits,” in Proc. Int. Conf. VLSI Design, 2002, pp. 39–44.

[21] S. Pant and E. Chiprout, “Power grid physics and implications for CAD,” in Proc. Design Autom. Conf., 2006, pp. 199–204.

[22] Sh. Zhao, K. Roy, and Ch. K. Koh, “Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 1, pp. 81–92, Jan. 2002.

[23] D. Johnson, J. Johnson, and J. Hilburn, Electric Circuit Analysis, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1992.

[24] Ch. Kashyap, Ch. Alpert, and A. Devgan, “An effective capacitance based delay metric for RC interconnects,” in Proc. Int. Conf. Comput.-Aided Design, 2000, pp. 229–234.

[25] N. Madan and R. Balasubramonian, “Power-efficient approaches to redundant multithreading,” IEEE Trans. Parallel Distrib. Syst., vol. 18, no. 8, pp. 1066–1079, Aug. 2007.

Aida Todri (M’03) received the B.S. degree in electrical engineering from Bradley University, IL, in 2001, the M.S. degree in electrical engineering from Long Beach State University, CA, in 2003, and the Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 2009.

She has been with the Computing Division at Fermi National Accelerator Laboratory, Batavia, IL, since 2009. She has worked as a graduate intern for Mentor Graphics Corp, Cadence Design Systems, STMicroelectronics, and IBM TJ Watson Research Center. Her research interests include low power design, clock and power network design, and signal and power integrity analysis.

Dr. Todri is the recipient of the John Bardeen Fellow in Engineering Award in 2009.

Malgorzata Marek-Sadowska (M’87–SM’95–F’97) received the M.S. degree in applied mathematics and the Ph.D. degree in electrical engineering from Technical University of Warsaw, Poland.

From 1976 to 1982, she was an Assistant Professor with the Institute of Electron Technology, Technical University of Warsaw. She was a Research Engineer with the University of California (UC) Berkeley Electronics Research Laboratory from 1982 until 1990. Since then, she has been with the Department of Electrical and Computer Engineering, UC Santa Barbara, as a Professor.

Dr. Marek-Sadowska was an Associate Editor from 1991 to 1993, and from 1993 to 1995 she was the Editor-in-Chief of IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS.
