Variability and Statistical Dynamic Analysis Flow for ...€¦ · Variability and Statistical...

Variability and Statistical Dynamic Analysis Flow forLinear Interconnect Networks

Antonio Lucas Robalo Martins

Thesis to obtain the Master of Science Degree in

Electrical and Computer Engineering

Supervisor: Prof. Luis Miguel Teixeira d’Avila Pinto da Silveira

Examination CommitteeChairperson: Prof. Horacio Claudio de Campos Neto

Supervisor: Prof. Luis Miguel Teixeira d’Avila Pinto da SilveiraMembers of the Committee: Prof. Nuno Cavaco Gomes Horta

October 2015

Acknowledgments

I’d like to start by thanking my supervisor, Professor Luıs Miguel Silveira, for giving me the chance towork with him in this project and guiding me through all the challenges I faced during its development.

None of this would be possible if it weren’t for my family and friends, whose support was without ashadow of a doubt what made me persevere through the academia and believe all of this was possible.For that, I am endearingly grateful.

For ideas can change the world, a thank you to my friends at Utopia Society.

Abstract

The analysis of interconnect networks under the influence of limitations in the lithographic process isof extreme importance in the design cycle of integrated circuits, as those variations can cause undesiredbehaviour in the final product. Power grids are extremely large interconnects, covering the circuit areaand used to provide the required bias to all other circuit elements. Variations can impact the power gridlocal or globally, giving rise to the need to simulate the effects of process variation.

Extensive work has been done trying to solve this problem efficiently, whose main difficulty lies onthe size of the power grids, making the problem computationally intensive. To analyze the effects ofprocess variation, statistical data of significance is obtained from analysis of the circuit behaviour, whichunfortunately implies solving the circuit under different parameter settings.

Several approaches to the variational analysis problem have been suggested, in particular a classof methods for static variational analysis that separates the problem in two stages: first, a parametrizedmodel for the network is created only once, and then the model is solved for any number of parametersettings in a highly efficient fashion.

In this work we propose a novel scheme for the dynamic analysis of power grid behaviour under theinfluence of variations, where a compressed parametrized model is used to represent the network. Theproposed scheme is highly parallelizable, shows average errors under 5%, speed-ups up to 26x versuscommon techniques and low memory requirements due to high model compressibility.

Keywords: Dynamic Analysis, Variational Analysis, Model Order Reduction, Parametrization

Resumo

A analise de malhas de alimentacao sob a influencia de variacoes causadas pelo processo litograficoe de extrema importancia no ciclo de design de circuitos integrados, visto que essas variacoes podemcausar comportamentos nao desejados no produto final. Malhas de alimentacao sao interconexoes degrandes dimensoes, cobrindo toda a area do circuito e polarizando todos seus elementos. As variacoespodem afectar a malha tanto local como globalmente, dando azo a necessidade de simular de formaeficiente os efeitos das variacoes causadas pelo processo de fabrico.

Imenso esforco e dedicado a resolucao eficiente deste problema, cuja dificuldade e causada pelotamanho das malhas que tornam o problema computacionalmente intensivo. Para analisar os efeitosdas variacoes, dados estatısticos sao obtidos atraves da analise do comportamento do circuito, o queinfelizmente exige a sua resolucao para um grande numero de configuracoes.

Ja foram sugeridas varias abordagens ao problema da analise variacional, em particular um metodopara analise estatica que separa o problema em duas etapas: primeiro, um modelo parametrizado damalha e criado uma unica vez, e depois esse modelo e resolvido de forma extremamente eficiente paraqualquer numero de configuracoes parametricas.

Neste trabalho propomos um esquema inovador em duas etapas para a analise variacional dinamicade malhas de alimentacao, em que um modelo parametrizado comprimido representa a malha. Oesquema proposto e altamente paralelizavel, demonstra erros medios inferiores a 5%, uma aceleracaoate 26x comparado com tecnicas standard, e nao exige elevados recursos de memoria devido a altataxa de compressao atingida.

Keywords: Analise Dinamica, Analise Variacional, Model Order Reduction, Parametrizacao

Contents

1 Introduction 11.1 The Semiconductor Industry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Manufacturing Process of IC’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.4 Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.5 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Background 72.1 Power Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.2 Sparse Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.3 The Static Analysis Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.4 Solving the Static Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.4.1 Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.4.2 Model Order Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.4.3 Hierarchical representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.4.4 Multigrid method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.4.5 Random Walks Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.5 Introducing the Time Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.5.1 Temporal Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.5.2 Simulation of the Dynamic System . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.6 Operation Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.6.1 Big O Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.6.2 Complexity of Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Network Parametrization and Variational Analysis 193.1 Network Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193.2 Static Variational Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2.1 Taylor Series Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.2.2 Two-Step Methods: OLA and SPARE approaches . . . . . . . . . . . . . . . . . . 21

3.3 Dynamic Variational Analysis - Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Dynamic Variational Analysis Scheme 254.1 Analysis Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.2 Compression Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.2.1 The Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.2.2 QR Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.2.3 Introducing the RRQR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.2.4 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.3 Set Up Stage - Model Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.3.1 SPARE Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334.3.2 OLA Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.4 Evaluation Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.5 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.6 Memory Requisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.7 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.7.1 Set Up Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.7.2 Evaluation Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5 Experiments and Results 375.1 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375.2 Chosen Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385.3 Results Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.4 Model Behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5.4.1 On the Maximum relative Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.4.2 Maximum parameter Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.5 Compressibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.5.1 Compression Rate in the Parameter Space . . . . . . . . . . . . . . . . . . . . . . 425.5.2 Compression Rate in the Time Domain . . . . . . . . . . . . . . . . . . . . . . . . 435.5.3 Compression Rate versus RRQR thresholds . . . . . . . . . . . . . . . . . . . . . 43

5.6 Large Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455.7 Achieving Faster Evaluation Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

6 Conclusion 49

List of Tables

5.1 Benchmark networks and their default region configuration . . . . . . . . . . . . . . . . . 385.2 Time domain variability analysis for network ibmpg1t . . . . . . . . . . . . . . . . . . . . . 405.3 Evaluating Compressibility in the Parameter Space . . . . . . . . . . . . . . . . . . . . . . 425.4 Ratios between evaluation of the compressed and non-compressed model . . . . . . . . 435.5 Evaluating Compressibility in the Time Domain . . . . . . . . . . . . . . . . . . . . . . . . 435.6 Evaluating compressibility versus absolute threshold . . . . . . . . . . . . . . . . . . . . . 455.7 Evaluating compressibility versus relative threshold . . . . . . . . . . . . . . . . . . . . . . 455.8 Time domain variability analysis for network ibmpg2t . . . . . . . . . . . . . . . . . . . . . 465.9 Time domain variability analysis for network ibmpg3t . . . . . . . . . . . . . . . . . . . . . 465.10 Time domain variability analysis for network ibmpg6t . . . . . . . . . . . . . . . . . . . . . 46

List of Figures

1.1 Transistor count of microprocessors against their dates of introduction . . . . . . . . . . . 11.2 Section of a photomask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.3 Brief explanation of the photolitography process . . . . . . . . . . . . . . . . . . . . . . . . 31.4 Schematic of a chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.1 Typical power grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.2 A RLC model of an on-chip power grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.3 A simple RC circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.4 A sparse matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 SPARE representation of a parametric system . . . . . . . . . . . . . . . . . . . . . . . . 23

4.1 Traditional analysis flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.2 Proposed analysis flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.3 Linear regression as an example of data compression . . . . . . . . . . . . . . . . . . . . 294.4 RRQR orthogonalization process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5.1 Time-domain variational analysis for a given node . . . . . . . . . . . . . . . . . . . . . . 405.2 Causes of the maximum relative errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.3 Histogram of maximum absolute error per parameter setting - 3 σ variations . . . . . . . . 425.4 Histogram of maximum absolute error per parameter setting - 4 σ variations . . . . . . . . 425.5 Compressibility analysis in the parameter space . . . . . . . . . . . . . . . . . . . . . . . 425.6 Compressibility analysis in the time domain . . . . . . . . . . . . . . . . . . . . . . . . . . 445.7 Eigenvalues of the model matrix X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Acronyms

BEOL Back End of Line.

EDA Electronic Design Automation.

EM Electromigration.

FEOL Front End of Line.

IC Integrated Circuit.

MOR Model Order Reduction.

NNZ Non-Zero Elements.

OLA Output Linear Approximation.

PCA Principal Component Analysis.

PDE Partial Differential Equation.

PG Power Grid.

RRQR Rank Revealing QR.

SSI Small-Scale Integration.

TS Taylor Series.

ULSI Ultra Large-Scale Integration.

UV Ultraviolet.

VLSI Very Large-Scale Integration.

Chapter 1

Introduction

The invention of the integrated circuit revolutionized the world. The first of those circuits was presentedin 1959, by Jack Kilby, who worked at Texas Instruments: a body of semiconductor material wherein allthe components of the electronic circuit are completely integrated. This started a movement that allowedthe miniaturization of complex circuits, paving the way for modern technologies like personal computers,calculators, and most electronic devices.

In 1965 Gordon Moore predicted that the number of transistors in an Integrated Circuit (IC) woulddouble nearly every 18 months, following an exponential growth that couldn’t be found in any otherindustry [1], a trend that became known as Moore’s Law. In the last decades, the number of elementsin a single integrated circuit rose from several thousands to thousands of millions (Figure 1.1). As thetechnology progressed, new design methods evolved, coining new generations of processes, from theso-called Small-Scale Integration (SSI) to the Very Large-Scale Integration (VLSI) and the Ultra Large-Scale Integration (ULSI).

Figure 1.1: Transistor count of microprocessors against their dates of introduction.y-axis is in logarithmic scale, highlighting an exponential growth predicted by Moore’s Law [2].

1

1.1 The Semiconductor Industry

The semiconductor industry market amounted to over 335 000 US$M in 2014 and was forecast to grow2.3% this year. More than 82% of its composition is related to integrated circuits [3], showing that thereis a huge amount of resources allocated to this topic.

Research and development is a prevalent part of this industry, as of most industries in the technologysector. Market demands create the need to efficiently design integrated circuits, spanning a parallelindustry dedicated to the computer aided design of integrated circuits and Electronic Design Automation(EDA) software.

EDA Software

Historically, integrated circuits were designed and laid out by hand. A shift in the design paradigm duringthe 1980’s led to a big increase in circuit complexity, due to tools that allowed the designer to specifythe desired behaviour via a programming language and compile it to silicon, that is, convert it into logic.EDA tools became a crucial component of the IC industry, allowing the design and analysis of integratedcircuits.

Nowadays, the EDA industry generates an annual revenue of over 6 000 US$M and its size keepsincreasing [4], showing a huge demand that requires the development of solutions for the ever-risingchallenges.

EDA tools allow both the design of complex systems, as well as their verification. This is of the utmostimportance in part because circuit fabrication has become extremely expensive and any error that mayrequire a fabrication respin will greatly increase product costs. Additionally the semiconductor industry iskeenly aware of the need to be on time in the market, as delayed products may either miss their windowof opportunity or provide competition with an unsurmountable advantage.

While sophisticated verification tools may provide designers with the ability to debug possible de-sign problems, a new paradigm has emerged that complicates things considerably in the semiconductorindustry, This paradigm is associated with the issue of variability, which is related to an seemingly un-avoidable lack of control over multiple aspects of the manufacturing process of circuits using the latestfabrication technologies.

The need to account for deviations stemming from such limitations has led to the development of anew set of tools and to the establishment of methodologies and flows whereby the effects of variabilitycan be estimated and minimized. To understand the implications of such a paradigm shift requires amore in-depth look at the sources and effects of such variability.

1.2 Manufacturing Process of IC’s

A commonly used process to manufacture an integrated circuit is photolithography. In this process,Ultraviolet (UV) light is used to transfer a pattern from a photomask to a light-sensitive chemical on asubstrate, followed by a series of chemical treatments that will engrave the pattern on that substrate.This process may be repeated several times, etching several layers of patterns until the end-product isfinished. Figure 1.3 briefly explains this process.

The end-product contains features like transistors, resistors, and interconnect lines that carry powerand data, as specified by the pattern in the photomasks created as a result of the design step of the ICcycle.

2

Figure 1.2: Section of a photomask, where the pattern to be printed can be seen [5].

The substrate, or wafer, is the basis of the integrated circuit, and it is a thin slice of what usually isa semiconductor (silicon). The first step in the process is wafer cleaning and preparation: it is rid ofany contaminants on its surface, and then a chemical reaction is promoted with the purpose of forming alayer of silicon dioxide on the wafer surface (a). Then, the photoresist is applied, a light-sensitive materialthat will be affected by the UV light. A very thin (sub-micrometric) layer of photoresist now covers thesurface of the wafer (b).

The photomask (Fig. 1.2) is placed over the substrate, and intense UV light exposes the photoresist,that will chemically change. The photomask contains the pattern of interest that will be printed on thephotoresist (c,d).

Now, the photoresist that was affected by the UV light is removed by a chemical agent called thedeveloper, that dissolves it (e). The substrate must now be etched in the same pattern as delivered bythe photomask. The top layer of the substrate is removed by a chemical agent, only affecting the areasnot protected by the photoresist (f). The remaining photoresist is then chemically removed (g). A readerwith interest in the photolitography process may consult the reference [6].

Figure 1.3: Brief explanation of the photolitography process.a) prepare wafer, b) apply photoresist, c) align photomask, d) expose to UV light, e) develop and

remove photoresist exposed to UV light, f) etch exposed oxide, g) remove remaining photoresist [7].

After the process is completed different materials, e.g. semiconductor crystals or metal alloys canbe be deposited or grown on top of the wafer, filling the etched interstices and building circuit elements.This process is repeated for multiple layers, as many as the process is designed to sustain, forminga tri-dimensional circuit. Usually the design can be separated in two main portions: the Front End ofLine (FEOL), containing the substrate and the first layers where individual components are patterned(transistors, resistors, capacitors ...), and the Back End of Line (BEOL), the top layers containing themetal interconnects that wire the circuit. A schematic of the layering can be seen in Figure 1.4.

An important step in IC design is the design of the interconnect Power Grid (PG) networks, a type ofinterconnect that spans across the entire chip area, providing circuit bias, carrying large currents, andbringing the IC together by connecting all circuit elements and devices. Power grids are typically built bythe process described above, on the top layers of the integrated circuit, and can be seen in Figure 1.4,where they are made of copper and depicted in orange.

3

Figure 1.4: Schematic of a chip, with the BEOL layers on the bottom, and the FEOL interconnectlayers on the top [8].

1.3 Motivation

Technological evolution has led to the design of IC’s with smaller and smaller components, in an effortto cram more sophisticated designs in a single chip. As the size of each component in an IC decreases,it is natural that the power grid wires that bring the circuit together also get smaller. Nowadays, thewidth of the interconnect lines is on the order of the tens of nanometers, with 14nm technologies beingpresented [9].

The nanometer regime introduces new challenges, as the lithographic process described abovefailed to keep pace with Moore’s law and the printed features are now under the wavelength of the lightused on the process. That creates an environment where printed features become highly susceptibleto variations on the lithographic process, leading to increased variability on the final designs, affectingtheir behaviour and performance. This variations affect the power grids, e.g. variations on the powergrid wiring width that may cause non-desired voltage and current fluctuations on the chip, that can leadto several problems. Therefore the need to analyze and simulate the network under the effect of thosevariations arises.

Nowadays, the analysis of design variability is very challenging [10], as it is a time-consuming taskdue to the large number of network variations that must be studied, leading to repetitive simulation ofthe network taking increased computational effort. Several cases must be studied: DC perturbations,where the circuit voltage bias is affected, and AC perturbations, changes in the transient response whenthe circuit is turned on. An example of a relevant analysis that is affected by the process variation is theelectromigration analysis.

Electromigration (EM) refers to the unwanted movement of materials in a semiconductor, and thiseffect becomes especially relevant when working with high current densities, as it ends up happeningin an IC, where currents circulate in very thin power grid wires. This phenomena can cause failures inthe integrated circuit, making it crucial to simulate and analyze new devices before their manufactureso that EM problems can be found in the early stages of development, as the cost of prototyping andmanufacture is elevated. This analysis can be done by determining the voltage of the power grid nodes(from which the branch currents can be obtained and the EM analysis done).

The importance of analyzing the power grid is real. This task, however, is not computationally simple.PG descriptions are very complete and the number of voltage nodes is extremely high, in the order of

4

hundred million nodes. Solving the power grid becomes a resource-hungry task.If the AC problem is considered, the power grid must be solved for several time steps, and when

process variation is introduced all the analysis effort must be repeated for a large number of parametersettings that describe the effect of the variations, so that statistical information about how the power gridbehaviour changes with process variation can be obtained. In the end, a time-variant variational analysisseems to require a considerable amount of solves of the original analysis problem.

However, one can exploit both the structure of the power grid representation and the physical pro-prieties of the grid, as the spatial distribution of voltages, to find a better solution for the PG analysisproblem.

The present dissertation will focus on the analysis of power grids under the effect of perturbationscaused by process variation, discussing state of the art methodologies and presenting a novel schemeto the efficient and memory-aware analysis of the dynamic behaviour of interconnect networks. Theperformance of the proposed scheme will be evaluated using a set of benchmark power grids, usuallyused by the community.

1.4 Outcomes

The end result of the present work is a novel scheme that efficiently estimates the effects of variabilityon dynamic interconnects. We start by defining a set of possible process variations and present theirimpact on the power grid behaviour. Then, we present the analysis scheme, accomplished in a two-stageapproach:

1. A set up stage where a parametrized model for the network is generated only once, and com-pressed on-the-fly;

2. A highly efficient evaluation stage that can be repeated for any number of parameter settings.

The compression will allow the analysis of large networks that would otherwise be impossible toevaluate.

This scheme can be embedded in enhanced design cycles and EDA software, allowing the studyof the impact on circuit performance of the variations, for example to estimate the distribution of themaximum resistor current (relevant for electromigration analysis), the maximum voltage drop (relevantfor grid integrity analysis), or to obtain statistical information about voltage variations.

A paper describing this work entitled “Variability and Statistical Analysis Flow for Dynamic LinearSystems with Large Number of Inputs” was submitted to the DATE 2016 Conference [11] and is underreview at this time.

1.5 Structure

This thesis is structured as follows.In Chapter 2 some background to the analysis problem is presented, including the description of the

power grid network, the static and dynamic analysis problems under nominal conditions and operationcomplexity, setting the stage for additional discussion regarding the complexity of such procedures.

In Chapter 3 process variation is introduced, and state of the art solutions for the static analysisproblem are presented.

Then, in Chapter 4 the proposed dynamic analysis scheme for networks under the effect of pertur-bations is presented and explained.

5

Afterwards, in Chapter 5 we present and discuss the results of a series of experiments realized usingthe analysis scheme proposed in Chapter 4, including a discussion on the compression technique usedand the scalability of the approach.

Finally, in Chapter 6 we summarize the developed work and finish by discussing challenges left open,under which future work could be developed.

6

Chapter 2

Background

In this chapter we present some background to the dynamic analysis problem. We start by describingthe power grid network and its state-space representation in Section 2.1, and then we introduce theconcept of sparse matrices in Section 2.2, as a sparse representation will be used for the network.

Afterwards, in Section 2.3 we present the static analysis problem and discuss its difficulty, and followby presenting a series of solutions for the problem in Section 2.4, starting by the direct solution via aCholesky decomposition and a set of example methods and techniques used to accelerate the solveprocess for the static analysis: Model Order Reduction (MOR), hierarchical representation, a multigridmethod, and a random walks method.

In Section 2.5 we introduce the time domain and upgrade the static analysis problem to a dynamicone. We present temporal discretization techniques used to fabricate a time-domain representation forthe network, and the standard solution to the dynamic analysis problem.

Finally, in Section 2.6 we discuss the complexity of matrix operations, paving way to the study of themodel efficiency.

2.1 Power Grids

A power grid (PG) is the system by which power is delivered to the circuit, to provide bias and operatethe device. It is usually designed as an orthogonal mesh of metal strips, as shown in Figure 2.1, thatwill carry power to feed the elements of the circuit. A real power grid may not be complete as depictedin the figure, and the wires may not all have the same width or length. To model the power grid, a sim-plified representation is used, albeit enough to be representative of what is used in commercial tools,and tries to model the impact the PG design has on the behaviour of the circuit. Roughly 10% of thewiring of an integrated circuit is part of the power grid [12], which means that the electrical resistancecaused by the wires is non-negligible. Coupling capacitance caused by the overlap of the wires is alsoconsidered. Inductance between wires or between wires and bonding structures ([13],[14],[15]) can alsobe considered, but inductance effects are usually found when analyzing models including IC packaging,where those effects are characterized. Therefore, inductive characteristics will not be considered in thepresented model, that considers an environment containing only the PG. The PG model proposed inthis thesis can be combined with packaging models in a hierarchical fashion for an analysis that includesthe inductance effect as well as any other effects that one might want to study. In the following, we willconcentrate our attention to the modeling and analysis of on-chip RC effects. Wires are translated to re-sistors, and overlaps translated to capacitors. Each overlap becomes a node of the power grid network.Figure 2.2 shows the simplified network as an RLC model, with ideal current sources representing the

7

Figure 2.1: Small portion of a typical PG.( c©2008 IEEE, used with permission)

Figure 2.2: A RLC model of an on-chip PG.( c©2010 IEEE, used with permission)

elements the power grid is feeding, and a voltage source as a bias for the network [16]. As mentionedbefore, on this work inductance was not considered, reducing the network model to a RC circuit.

A simple RC circuit is presented in 2.3, and is composed by a capacitor, a resistor, and a currentsource connected in parallel. The following linear differential equation is satisfied:

CdV

dt+V

R= i⇔ C

dV

dt+GV = i , (2.1)

where C is the capacitance of the capacitor, V is the voltage across the points a and b, i is the current

intensity of the source, and G =1

Ris the conductance of the resistor. The power grid network is

Figure 2.3: A simple RC circuit.

built as a set of RC circuits, and its equations can be extrapolated from the simple example described.Let’s consider a matrix C ∈ Rn×n, the capacitance matrix of the network, and a matrix G ∈ Rn×n, theadmittance matrix of the network, where n is the number of nodes in the network. G is built as

G = PR ×D~g ×PTR , (2.2)

where PR ∈ Rn×r is the incidence matrix that describes how the network nodes are connected and bywhich of the r resistors, and D~g = diag (~g), where ~g ∈ Rr×1 is a vector where each element is gi = 1

ri,

the conductance of each of the r resistors of the network. We should note that the resulting admittancematrix G is symmetric, as it is the result of a quadratic form in the symmetric matrix D~g. The capacitancematrix is built as

diag(~c) = C , (2.3)

where ~c ∈ Rn×1 contains the numerical values of the capacitors. From Eq. (2.1), the network can nowbe represented in matrix form as:

Cd~v(t)

dt+ G~v(t) =~i(t) , (2.4)

where ~v ∈ Rn×1 is the vector of node voltages (the output of the system), and~i ∈ Rn×1 a vector with the

8

input currents. This will be our simplified state-space representation for the network. Both the admittanceand capacitance matrices have a very small fill-in, with O(n) non-zero elements each.

2.2 Sparse Matrices

The matrices G,C in Eq. (2.4) are mostly empty, containing few non-zero elements. This calls for theintroduction of the concept of sparsity and sparse matrices.

A sparse matrix is a matrix where most of its elements are zero. A diagonal matrix D ∈ Rn×n is anexample of a sparse matrix, as the total number of elements of D is n2 and there are only n non-zeroelements on that matrix, corresponding to the elements dii, ∀i ⊂ 1, . . . , n. The sparsity of a matrix is itsfill-in ratio, i.e. the number of non-zero elements over the total number of matrix elements.

Figure 2.4: A sparse matrix graphic visualization. The black dots represent non-zero elements.Example created with matrix G from example ibmpg1t (more about the examples in Section 5.1).

Let’s imagine we have a large sparse matrix A ∈ Rm×n, that we need to store in memory. A naiveapproach to the storage of that matrix would be to store the mn elements in memory, explicitly repre-senting all elements. But, as most of the elements are zero, this is very inefficient. Several techniquesexist to create a more compact representation, and the basis of most is to implicitly represent the zeros:only non-zeros are represented and all other elements are assumed to be zero.

An example of a commonly used sparse matrix representation scheme called the Yale sparse matrixformat (used by MATLAB [17]) is presented. The matrix A is represented by three arrays: ~a, whichcontains the non-zero elements of A, organized top-to-bottom and left-to-right; ~b, of the same size of ~aand containing the row index of each non-zero element; ~c of size n+1 and contains, for each column, theindex of an element of ~a corresponding to the first element of that column, followed by the total numberof non-zero elements.

Although it introduces considerable overhead, the representation above reduces the representationsize if the number of Non-Zero Elements (NNZ) is less than (n (m− 1) − 1)/2, which corresponds to asparsity of about 50%.

Using an implicit representation for the sparse matrices is also useful in increasing the efficiency ofseveral matrix operations, e.g. multiplications. The operations involving zero elements are skipped thusskimming computation time.

2.3 The Static Analysis Problem

In this section we will present the power grid analysis problem and discuss its difficulty. The problem isdefined by the system equation (2.4). Starting by considering the static case, d~v

dt becomes zero and the

9

system equation becomesG~v =~i , (2.5)

which solution ~v = G−1~i is trivial.

This means that the main concern is the computational cost of calculating the solution, and not itsmathematical formulation. This is due to the fact that G ∈ Rn×n, where n is the number of nodes inthe network, and the number of nodes is very large (upwards to several millions). Therefore, severalconcerns related to the complexity of the operation arise:

1. Computing the inverse of the system matrix G−1 is an expensive operation, even considering thefact that G is sparse;

2. After obtaining the inverse the equation still needs to be solved, and the resulting inverse G−1 isnot sparse, which means that the effort of completing the matrix-vector product is considerable;

3. Large power grids mean high memory requisites, as space to store and process the system matri-ces is needed.

The presented problems will become immense when process variations are introduced, as multiplesolves of the Eq. (2.5) will be required, for different system matrices.

A lot of work has been done in solving power grid networks efficiently, by exploiting the physical basisof the problem and/or the structure of the matrix G. The sparsity of G can be observed in Table 5.1 whichcontains benchmark examples, where the number of resistors r ≈ nnz(G) is O(n). We present somemethods used for the simulation and analysis of power grids.

2.4 Solving the Static Problem

There are several methods to tackle the power grid analysis problem. Direct methods solve the systemequation (2.5) directly, while iterative methods try to find a method to iteratively approximate the solution.There are other classes of methods, like Monte Carlo methods, that rely upon random sampling toapproximate the solution.

We start by presenting the triangular solve, a standard approach using the Cholesky decomposition,followed by some techniques techniques and methods that accelerate the solve process. Afterwards, wepresent examples of iterative and Monte Carlo techniques. In the next section we’ll reintroduce the timedomain and explain how the time variant system equation can be solved and what are the differencesfrom the static case.

2.4.1 Cholesky Decomposition

The standard approach to solve Eq. (2.5) would be to apply the Gauss-Jordan elimination to the systemmatrix, thus obtaining the inverse. We can decompose G in such a way that the matrices that resultfrom the decomposition have a structure that makes the solve easier, as the Cholesky decomposition.

The Cholesky decomposition separates a hermitian (square and equal to its conjugate transpose)and positive-definite matrix A ∈ Rn×n into a lower triangular matrix L ∈ Rn×n and its conjugate trans-pose:

A = LL∗ (2.6)

An example on how to compute the decomposition will be described, by the use of a recursive

10

algorithm. We’ll start by defining the first iteration A(1) = A, and will calculate A(i) and Li as:

A(i) =

Ii−1 0 0

0 ai ~b Ti

0 ~bi B(i)

, Li =

Ii−1 0 0

0√ai,i 0

0 1√ai,i~bi In−i

, (2.7)

where B(i) ∈ R(n−i)×(n−i), ~bi ∈ R(n−i)×1, and Li is the output of the i-th iteration.

For each iteration i, Li is generated as described above, and A is updated as

A(i+1) =

Ii−1 0 0

0 1 0

0 0 B(i) − 1ai,i~bi~b∗i

(2.8)

After n steps, we can calculate L as

L = L1L2 . . .Ln (2.9)

Calculating the Cholesky decomposition of a sparse matrix greatly speeds up this decompositionprocess, and the following solve operation. The decomposition cost will also depend on how the valuesof the sparse matrix are distributed. Therefore two identical matrices but with different ordered vectorscan have Cholesky factors L with a very different fill-in, i.e. number of non-zero elements.

As we will be working with very large matrices, it is of interest to reduce the fill-in of the resulting L

matrix, not only for space consideration but also to reduce the computational costs, as less elements di-rectly means less multiplication operations involving this matrices, as multiplications with zero-elementscan be skipped.

This brings us to an alternative to the presented decomposition in Eq. (2.6)

A = PLLTPT , (2.10)

where P is a permutation matrix that permutes the rows and columns of A. In this alternative process,we calculate the Cholesky decomposition of the matrix A′

A′ = PTAP = LLT , (2.11)

instead of the original A matrix. The choice of this permutation matrix P greatly affects the sparsity ofthe resulting matrix L. Heuristic methods exist to discover the best choice of permutation matrix P [18].

Back to the power grid problem, equation (2.5) can now be solved by first computing the Choleskyfactorization of G, obtaining G = LLT (or the equivalent G = PLLTPT ), with L being a lower triangularmatrix. Then, the system can be solved as

LLT~v =~i⇒ L~x =~i

LT~v = ~x , (2.12)

solving L~x = ~i by forward substitution and LT~x = ~i by back substitution. The cost of the Choleskyfactorization is O

(n3), with n the number of nodes and the size of G, and the computation of ~x and ~v

have a cost of O(n2)

each (operation complexity is discussed at the end of this chapter). As G is asparse matrix, the factorization cost will be smaller, approaching O (nα), with α < 2 [19].

The complete algorithm is as follows:

11

Algorithm 1: Solving the Static Analysis Problem using the Cholesky Decomposition

Data: G;~iResult: ~v

1 [L,P]← chol(G)

2 i← PT ~i

3 x← L \ i4 v ← LT \ x5 ~v ← Pv

2.4.2 Model Order Reduction

The Model Order Reduction (MOR) technique tries to reduce the complexity of the solve by creating asmaller sized representation (compressed representation) for the system matrix, reducing the computa-tional effort of the factorization. The goal is to find a transformation T ∈ Rn×q , with q < n, that allowsthe reduction of the size of G, allowing a representation equivalent to Eq. (2.5)

G = TTGT

Gv = TT ~i , (2.13)

where v ∈ Rq×1 is the reduced set of voltages. The full voltage set can then be approximated as~v = Tv. As G is smaller than G, memory and time gains are obtained, in exchange for precision. AMOR example can be found in [20], and several examples of reduction techniques in [21].

2.4.3 Hierarchical representation

This method tries to enhance the efficiency of the model by representing the system as a hierarchicalH-matrix [22], and then solving the system directly. This is an efficient data-sparse representation, wherea given matrix is split into a hierarchy of sub-blocks and each sub-block is approximated by a low-rankmatrix. This representation is used to store the matrix L as defined in Eq. (2.12). A setup step createsthe H-matrix, and a solve step resolves the problem in Eq. (2.5) in a highly efficient fashion, leading topotential savings if multiple solves for the same network are needed.

2.4.4 Multigrid method

This method is part of the class of iterative methods. The multigrid method models the power grid as aPartial Differential Equation (PDE) [23]. As node voltage distribution is spatially smooth, the assumptionis that the solution of the power grid resembles the finite element discretization of a 2D parabolic PDE,and its solution is the discretiation of that PDE in a set of points that correspond to the grid nodes. Thisis done iterativelly, first solving the PDE on a coarse grid and then mapping the solution to a fine grid.This gives an approximate solution for the problem in Equation (2.5).

2.4.5 Random Walks Method

The random walk method is a type of Monte Carlo method [24]. A set of experiments are done where anode is selected and an approximate voltage for the node is calculated based on a random path startingfrom the node that tries to find the ground voltage. This can become quite useful if only a subset of thenetwork nodes needs to be solved, as those nodes can be selected and solved individually.

12

2.5 Introducing the Time Domain

Having presented the static analysis problem, we will now allow the node voltages to vary in the timedomain. Dynamic characteristics on the network appear due to the existence of capacitive elements inthe network, represented by the matrix C. The system equation is again

Cd~v(t)

dt+ G~v(t) =~i(t) , (2.14)

which will need to be evaluated in a given time interval T .

2.5.1 Temporal Discretization

The equation (2.14) is continuous in the temporal dimension, and to make it suitable for numericalevaluation a discrete description must be adopted.

The derivative in Eq. (2.14) can be written as

d~v(t)

dt= F (~v(t)) (2.15)

To evaluate the derivative, the equation must be discretized, i.e. evaluated over a time step of sizeh = ∆t, for which the above derivative must be integrated. This time step can be as small as desired,with smaller time steps usually yielding smaller discretization errors. The number of time steps M isgiven by T/h, which also means that decreasing the size of the time step will increase M , increasingthe number evaluations needed to solve the problem.

Different orders of temporal discretization and integration intervals produce slightly different solutions.Various discretization techniques exist, with varying degrees of complexity and results.

Two main types of integration are the implicit and explicit, depending on the form of the finite differ-ence obtained for the integration interval. Implicit integration evaluates F (~v(t)) at a future time:

~v (m+1) − ~v (m)

h= F

(~v (m+1)

), (2.16)

while explicit integration evaluates F (~v(t)) at a current time:

~v (m+1) − ~v (m)

h= F

(~v (m)

)(2.17)

The shown examples use a first order discretization, where (m) represents the m-th time step. Morecomplex discretization methods exist, like the Runge-Kutta method, that present a lower truncation errorthan the first order discretizations shown above [25].

For this purpose the first order implicit method is used, known as the backward Euler method. Al-though simple, it is numerically stable and sufficient for the problem. The solution will be obtainediteratively.

The following substitution is made

d~v(t)

dt=~v(t+ h)− ~v(t)

h, (2.18)

for a given time step of size h. Substituting into Eq. (2.14), we obtain(G +

C

h

)~v(t+ h) =~i(t+ h) +

C

h~v(t) , (2.19)

13

completing the discretization process. We now have the discretized system equation for the dynamicanalysis.

2.5.2 Simulation of the Dynamic System

Obtaining the voltage response of the system for a given time point mh involves solving Eq. (2.19)several times. Let’s represent the system matrix as Y =

(G + C

h

). The equation to solve is:

~v (mh) = Y−1(~i (mh) +

C

h~v ((m− 1)h)

)(2.20)

To solve the equation above the voltage vector from the previous time point ~v((m− 1)h) is required,which means that the equation must be solved recursively for all time points kh, k = 1, . . . ,m. To startthe process, the voltage vector at the initial time ~v (0) = ~v(t = 0) must be known, as well as the inputcurrents at all times~i (m) =~i(mh), m = 0, · · · ,M :

Algorithm 2: Dynamic Simulation

Data: Number of time step M ; Size of time step h; Voltage vector ~v (0); System input~i (m), m = 0, · · · ,M ; System matrices C,G

Result: Voltage vectors ~v (m), m = 1, · · · ,M1 Y ←

(G + C

h

)2 for m = 1, . . . ,M do3 ~v (m) ← Y−1

(~i (m) + C

h ~v(m−1)

)4 end

The inverse of the system matrix Y−1 only needs to be computed once, as it can be stored in memory,decreasing the complexity of the dynamic simulation. The equation 3 in Algorithm 2 is equivalent tosolving Eq. (2.5), therefore methods used to solve the static problem can be applied to solve this one aswell.

2.6 Operation Complexity

In this section the time complexity of several matrix operations used in the present work is discussed.This is how much time a given operation will take to complete on a computer. It is important to knowthe complexity of the operations, as it allows the estimation of run times and how they escalate with anincrease in the size of the problem to be solved. We will start by introducing the big O notation, usedto represent how the complexity of an operation scales with the size of its inputs, and follow with theanalysis of the complexity of some operations.

2.6.1 Big O Notation

The Big O notation is part of the family of the asymptotic notations, and describes the behaviour of afunction when its arguments tend towards infinity. In practice, it identifies the growth rate of a givenfunction, discarding information related to its particular behaviour. Mathematically, this can be describedas follows. Let f and g be two functions in R. Then:

f(x) = O(g(x)) as x→∞ ⇔ ∃M,x0 : |f(x)| ≤M |g(x)|, ∀x > x0 (2.21)

14

This can be read as: f(x) grows asymptotically no faster than g(x), meaning that the behaviour ofg(x) when x goes towards infinity majorates f(x). In a simpler fashion, one can find the growth rategiven by the big O notation by applying the following rules:

1. Choose the highest growth rate term from the function f to analyze;

2. Drop any constants associated with that term. This is the final big O notation of f .

Thus, by applying the concepts above we can say that, for the sake of example, f1(x) = 5x2 isO(x2), and that f2(x) = x2 − 4x+ 10 is also O(x2). This means that these functions grow no faster thang(x) = x2.

Using this notation, we can classify the complexity of computational operations, both in time andmemory. For example, measuring how much time an operation takes to complete using inputs of differentsizes, one can construct a function T (n), the time taken to complete the operation for an input of sizen, and apply the big O notation to find its time complexity. This allows, for instance, the comparisonbetween different algorithms that complete the same operation, or the estimation of run time for a givenproblem.

2.6.2 Complexity of Matrix Operations

Here, the time complexity of some matrix operations is presented. When calculating a complexity, it isimportant to know what is the unit of the time function T (n) that serves as a basis for the figures. In thiscase, the basic operation is a floating point operation (flop), i.e. an arithmetic operation between twonumbers represented in floating point notation. This arithmetic operation is realized in constant time, i.e.its complexity is O(1), and can be a sum, a multiplication, or a division. Thus, time complexity can bediscovered by counting the number of floating point operations needed to complete the more complexoperation. Larger matrices will have more elements, requiring more flops to complete an operation,leading to a final complexity that depends on the size of the matrix inputs. The presented algorithmsand figures are based on the book [26].

Sums and scalar products

The simplest operations are matrix sums and multiplications by scalars. Computing the sum of twomatrices Y = A + B, where A,B ∈ R m× n needs mn additions, one for each element of the matricesto be added.

The scalar multiplication Y = Ab, where b ∈ R, needs mn multiplications, one for each element ofthe matrix, as in the sum. Therefore, both matrix sums and multiplications by a scalar are operations ofcomplexity O(mn).

Dot product

We’ll start the multiplications by the dot product, a multiplication operation between two vectors. Theoperation to complete is y = ~a T~b, with ~a,~b ∈ Rn×1. The following pseudocode represents the stepsneeded to complete this operation:

This corresponds to n multiplications and n additions between scalar elements, totaling 2n flops,where bi and ai are the i-th elements of the corresponding input vectors. This means that the complexityof the dot product is O(n), i.e. the computation time is directly proportional to the size of the vectorinputs.

15

Algorithm 3: Dot Product

Data: ~a; ~bResult: y

1 y ← 02 for i = 1, . . . , n do3 y ← y + aibi4 end

Matrix-vector multiplication

The matrix-vector multiplication can be divided in a series of dot products. For A ∈ Rm×n and~b ∈ Rn×1,the operation ~y = A~b can be computed an element of ~y at a time:

yj = ~Aj~b , (2.22)

with ~Aj the j-th line of A, and yj the j-th element of ~y. This corresponds to the following:

Algorithm 4: Matrix-Vector Product

Data: A; ~bResult: ~y

1 ~y ← 02 for i = 1, . . . ,m do3 yi ← dot( ~Ai ,~b )4 end

The dot operation is the dot product described in Algorithm 3. As m dot products are needed, onefor each line of the input matrix A, the final flop count is m× 2n = 2mn flops. Therefore, the complexityof this operation is O(mn).

Matrix-matrix multiplication

A matrix-matrix multiplication B = AX, with A ∈ Rm×n and X ∈ Rn×p is equivalent to computing p

matrix-vector products, totaling 2mnp flops for the operation. The complexity of the operation is thereforeO(mnp).

There are more efficient algorithms for computing matrix products, being the Winograd algorithm oneof those [27], that computes a matrix multiplication with complexity O(n2.375). We will, however, considerthe schoolbook formulation (the one exemplified above), for further complexity calculations.

Cholesky decomposition

To calculate the cost of the Cholesky decomposition of a matrix A, we must first know what operationsare done on each of the n iterations of the algorithm.

In each step, Li is calculated, and L and A are updated. The update L is free, as the structure ofthe matrices means that the data on the matrices Li is copied to L without the need of extra operations.To calculate Li, a matrix-scalar product costing (n− i) flops is needed. Finally, we need to update A.

To update A the operation B(i) − 1ai,i~bi~b∗i needs to be computed. Usually, this would require (n− i)

flops for the scalar-vector product, (n − i)2 flops for the matrix product (the external product doesn’trequire a sum), and another (n − i)2 flops for the matricial sum. However, as B(i) is a submatrix of

16

the hermitian matrix A(i) that includes its diagonal, B(i) is also hermitian. Therefore, the result of thisoperation will also be hermitian. Therefore, only the terms from the lower triangular need to be computed,in a total of (n− i)2 + 2(n− i) operations.

The total number of flops per iteration is approximately (n− i)2 + 2(n− i). The total complexity of theCholesky decomposition is:

n∑i=1

((n− i)2 + 2(n− i)

)=

1

3n3 +

2

3n = O(n3) (2.23)

If a very sparse matrix is being decomposed using this technique, a majority of the multiplicationoperations 1

ai,i~bi~b∗i will be skipped, greatly reducing the complexity of the Cholesky decomposition.

17

18

Chapter 3

Network Parametrization andVariational Analysis

In this chapter we present the variational analysis problem and state of the art solutions for the efficientstatic analysis of power grids under the effect of process variation, building up from the nominal analysisproblem described in the previous chapter.

We start by introducing a parametrization scheme for the perturbations that reflects their effect on thenetwork behaviour in Section 3.1. The static variational analysis problem is then presented in Section3.2, together with three approaches to its resolution: a standard Taylor Series approach, a SPAREapproach, and an OLA approach. As the static analysis precedes the dynamic analysis, this will be astarting point for the work presented in the next chapter.

Finally, in Section 3.3 we introduce the dynamic variational analysis problem, together with a standardTaylor Series approach to the solution. This section introduces Chapter 4, where a novel approach tothis problem will be presented.

3.1 Network Parametrization

Process variation during the lithography process causes changes in the metal stripes that make upthe interconnect network, resulting in wires with different measures than expected. This irregularity indimensions causes changes in the resistance and capacitor values of the network’s RC model.

The resistance of a wire depends on the resistivity of its material ρ, on the wire section S, and thelength L. The wire section can be represented as S = WT , with W the width of the wire and T itsthickness:

R =ρL

S=

ρL

WT(3.1)

The coupling capacitance can be modelled as a planar capacitor between wires in different layers.The capacitance depends on the area of the cross-section A, on the medium permittivity ε, and on theinter-plate distance d:

C = εA

d(3.2)

A vector with the parameters can now be built

~λ = [∆ρ,∆L,∆W,∆T,∆ε,∆A,∆d] , (3.3)

19

making the parametrized representation of the resistors and capacitors become

R(~λ) =(ρ+ ∆ρ) (L+ ∆L)

(W + ∆W ) (T + ∆T ), C(~λ) = (ε+ ∆ε)

A+ ∆A

d+ ∆d(3.4)

Making the assumption that the variations are independent, a low order approximation around thenominal values, i.e. the stamp of the resistors and capacitors with no variation, is made

R(~λ) ≈ R0

[1 + ∆ρ+ ∆L+

O∑k=1

(−∆W )k

+

O∑k=1

(−∆T )k

]

C(~λ) ≈ C0

[1 + ∆ε+ ∆A+

O∑k=1

(−∆d)k

], (3.5)

where C0 and R0 are the nominal values, and O is the truncated order of the approximation. For smallvariations of the parameters, the order O can be quite small while still guaranteeing a near perfectapproximation. The variations can be different depending on what physical region of the integratedcircuit the element is located, and therefore it is possible to expand ~λ to contain as many independentsets of the parameters as needed, each set representing a region.

The effects of ~λ cause Eq. (2.4) to be represented as

C(~λ)d~v

dt+ G(~λ) ~v =~i , (3.6)

where C(~λ) and G(~λ) are now dependant on the parameter setting. This will be our simplified state-space representation for the perturbed network.

3.2 Static Variational Analysis

We’ll now focus on solving the problem described in Eq. (3.6), considering C = 0. As the number ofnodes can be as high as several millions, re-factoring the matrix G for all parameter settings makesthe problem intractable. The goal will be to compute perturbed node voltages for a given number ofparameter settings, after which the problem is considered solved, and resulting data can be used toanalyze the behaviour of the network under the influence of variations.

Different solutions have been proposed, e.g. an extension of the random walks method for incremen-tal analysis [28], incremental analysis via sparse approximation [29], and the analysis of the parametersas a spatial stochastic process [30]. These solutions are, however, quite expensive, and if very largenetworks must be analyzed their computational cost becomes huge.

In this section we’ll describe three ways to approximate the perturbed voltage vector, solving thestatic variational analysis problem in a very efficient fashion. The first will be a traditional Taylor Seriesapproximation approach to the variational problem, that will be used as a comparison base for the othertwo techniques. The second will is an Output Linear Approximation (OLA) approach, which matches theperturbed response of the voltages at the maximum variation for each parameter. The third will be aSPARE approach [31]. Both SPARE and OLA approaches separate the problem in two stages: a firstone where a parametrized model X is computed, and a second one where the system is solved. Thesetechniques are described in [32], and are the basis of the present work.

20

3.2.1 Taylor Series Approach

The matrix G(~λ) can be approximated as a Taylor Series truncated to an arbitrary order, that if desiredmay include cross-terms. Assuming a first order Taylor Series (TS) approximation, we obtain

G(~λ) = G0 +

P∑k=1

Gkλk , (3.7)

for a set of P parameters. The matrix G0 is the nominal admittance matrix, and corresponds to the casewhere no variation occurs. The matrix Gk is the first order sensitivity for the parameter k, and can beobtained by the differentiation of G with respect to the parameter λk. This can be done directly in matrixform, or using the vector ~r as exemplified:

r kj =dRj(~λ)

dk

∣∣∣∣~λ=0

, ∀j = 1, . . . , nR

Gk = PGDGPTG , (3.8)

where j is the j-th element of the vector ~r, nR is the number of resistances in the network and DG =

diag (1./~rk). For the static analysis case, only G is used and the solution for each parameter setting isnow obtained by factorizing G and solving:

G(~λ)~v(~λ) =~i ⇒ ~v(~λ) = G−1(~λ)~i , (3.9)

bringing us to the complexity problem presented around Eq. (2.5), as a factorization operation must becompleted for each parameter setting ~λ, increasing the computational effort required.

3.2.2 Two-Step Methods: OLA and SPARE approaches

This methods will be based upon the following Taylor Series approximation for the voltage vector, hererepresented for a 1st order approximation:

~v(~λ) = ~v0 +

P∑k=1

~vkpk , (3.10)

that can be extended to any desired order. The assumption that the voltage response with respect to theparameters is close to linear is made, making a small order Taylor Series enough to represent it. Thisis not an assumption in the dark, as voltages are expected to be proportional to the resistance path, asv = Ri and we are only varying the value of the resistances in the static analysis.

Using this representation, if the terms ~vk are known, calculating the node voltage for any parametersetting is not only trivial but also highly efficient, as it can be done by a simple matrix-vector product

~v(~λ) = X[1 ~λ

]T, (3.11)

where X is a matrix whose columns are[~v0 ~v1 ~v2 . . . ~vP

], P being the number of parameters and ~v0 the

nominal voltage vector. Solving the equation (3.11) represents the solve step of the methods describedbelow, whose set up step computes the terms ~vk.

21

Perturbed Vectors Generation: OLA

The OLA approach approximates the voltage vector ~v by matching the perturbed response at the max-imum variation using an affine function, and therefore only works for first order approximations. This isusually enough, though.

For a parameter pk, with maximum value pk, a nominal voltage vector ~v0 and a perturbed voltagevector ~vk that matches the system response at pk,

~v0 + ~vkpk = ~v(pk) ⇒ G−10~i+ ~vkpk = G(pk)−1 ~i (3.12)

We now substitute G(pk) by its first order approximation, as described in Eq. (3.7), obtaining

G−10~i+ ~vkpk = (G0 + Gkpk)

−1 ~i ⇔ ~vkpk =(

(G0 + Gkpk)−1 −G−10

)~i

~vkpk =((

I + G−10 Gkpk)−1 − I

)G−10

~i (3.13)

We now define ∆ = G−10 Gkpk, obtaining

~vk =1

pk

(−∆~v0 + ∆2~v0 −∆3~v0 + . . .

), (3.14)

and truncating Eq. (3.14) at the desired order (when the absolute value of ∆j~v0 is smaller than athreshold value), the terms ~vk are obtained. The terms of the approximation can be computed recursivelyas

∆j~v0 = pkG−10 Gk∆j−1~v0 , (3.15)

which means that the factorization of the nominal matrix G0 only needs to be computed once, andits factors stored. We now have the vectors ~vk and equation (3.11) can be solved for any number ofparameter settings.

Perturbed Vectors Generation: SPARE

The SPARE approach replaces G and ~v in Eq. (3.9) by their Taylor Series approximation, and representsa parametric system as a cascade of non-parametric systems while guaranteeing that the parametric de-pendence is explicit (Figure 3.1). this allows its use with parameter settings as required by the variationalanalysis. If G, ~v are written as

~v = ~v0 +

P∑k=1

~vkpk +

P∑k=1

P∑j=1

pkpj~vkj + . . .

G = G0 +

P∑k=1

Gkpk +

P∑k=1

P∑j=1

pkpjGkj + . . . , (3.16)

where ~v0 = ~v(~λ = 0), ~vk =d~v(~λ = 0)

dpk, ~vkj =

d2~v(~λ = 0)

dpkdpj, ...

As ~v(~λ) = G(~λ)−1 ~i =(G0 +

∑Pk=1 Gkpk + . . .

)~i, we can calculate the derivatives and build a

representation using SPARE.

The nominal voltage is obtained as ~v0 = G−10~i. The following example calculates the 1st order

22

Figure 3.1: SPARE representation of a 2nd order parametric system.

( c©2008 IEEE, used with permission)

perturbed voltage vector for a parameter pk:

~vk =d~v(~λ)

dpk

∣∣∣∣~λ=0

=d

dpk(G0 + G1p1 + . . .+ Gkpk + . . . )~i

∣∣∣∣~λ=0

⇒ ~vk = −GkG−20~i = −G1G

−10 ~v0 (3.17)

We illustrate the use of the SPARE structure in this second example, using two parameters p1 andp2, with a second order expansion, and a cross term. The first sub-index is the order with relation to the1st parameter and the second the order with relation to the second parameter:

~v00 = G−100~i

~v01 = −G−100 G01~v00

~v10 = −G−100 G10~v00

~v20 = −G−100 (G10~v10 + G20~v00)

~v02 = −G−100 (G01~v01 + G02~v00)

~v11 = −G−100 (G10~v01 + G01~v10 + G11~v00)

Here we can clearly see the structure of the voltage vectors. Only the nominal admittance G0 needsto be factorized, and only once, as higher orders can be recursively built upon already computed voltagevectors. As in the OLA approach, we now have the vectors ~vk and are ready to solve the system (3.11)for any number of parameter settings.

3.3 Dynamic Variational Analysis - Problem

As for the analysis in Section 2.5, the appearance of dynamic characteristics increases the difficulty ofthe problem, as now Eq. (3.6) must be discretized and evaluated at a given time mh. The resultingequation is be similar to Eq. (2.19), now with parametric dependence:(

G(~λ) +C(~λ)

h

)~v(~λ, t+ h) =~i(t+ h) +

C(~λ)

h~v(~λ, t) (3.18)

This is the dynamic variational analysis problem. It combines the difficulty of factorizing the systemmatrix found in the static case with no variations (Section 2.3), the need to solve the system for all pre-vious time steps found when introducing dynamic characteristics (Section 2.5.2), and the parametrizedsystem matrix from process variations that forces the whole dynamic simulation to be reevaluated for allparameter settings (Section 3.1).

A standard solution for this huge problem would be a Taylor Series approximation for the systemmatrices, similar to Section 3.2.1. The capacitance matrix C(~λ) is represented as a Taylor Series as

23

Algorithm 5: Dynamic Variational Analysis - Taylor Series, 1st order

Data: Number of time steps M ; Size of time step h; Voltage vector ~v (0); System input~i (m), m = 0, · · · ,M ; System matrices C,G; Parameter settings ~λ(p), p = 1, · · · , P

Result: Voltage vectors ~v (m)p , m = 1, · · · ,M

1 for p = 1, · · · , P do2 C← C0

3 G← G0

4 for k = 1, · · · , P do5 C← Ckλ

(p)k

6 G← Gkλ(p)k

7 end8 Y ←

(G + C

h

)9 for m = 1, . . . ,M do

10 ~v(m)p ← Y−1

(~i (m) + C

h ~v(m−1)p

)11 end12 end

done for the admittance matrix:

C(~λ) = C0 +

P∑k=1

Ck~λk , (3.19)

where C0 is the nominal capacitance matrix, and Ck is the first order sensitivity for the parameter k.Now, the system equation is solved recursively:

~v(~λ, mh) =

(G(~λ) +

C(~λ)

h

)−1 (~i (mh) +

C(~λ)

h~v(~λ, (m− 1)h)

)(3.20)

The big difference from the nominal case, where there are no variations, is that this process mustnow be fully repeated for all parameter settings, making the variational analysis algorithm extremelyheavy. As seen in Algorithm 5, the equivalent of P ×M nominal static systems need to be solved, whereP is the number of independent parameters and M is the number of time steps. Usually the number ofparameter settings is much larger than the number of time points, therefore a great deal of factorizationsmust be computed. Also, if one wants to study the perturbed response at a given time point, all previoustime points need to be reevaluated for each parameter setting, creating an enormous overhead.

This is the problem that will carry to the next chapter, where a novel solution to the dynamic variationalanalysis problem is presented, the Dynamic Variational Analysis Scheme. Its results will be comparedwith the standard TS approach presented here.

24

Chapter 4

Dynamic Variational Analysis Scheme

In this chapter a dynamic analysis scheme for power grid networks under the effect of parametric vari-ation is described. This reflects the original contributions developed during the course of the currentwork.

We’ll start by presenting the proposed analysis flow, and comparing it to the traditional one in Section4.1. Afterwards, in Section 4.2, the compression strategy used in the analysis scheme is presented.Then, in Sections 4.3 and 4.4 the proposed two-stage approach is described:

1. SET UP: Generation of a compressed parametrized model

2. EVALUATION: Usage of the parametrized model to analyze the perturbed network

After presenting the proposed solution, considerations relative to the proposed approach are dis-cussed in Section 4.5, and we finish by estimating the memory and time performance of the scheme inSections 4.6 and 4.7. In the next chapter the proposed scheme will be tested with a series of experi-ments using industrial benchmarks.

4.1 Analysis Flow

Analyzing the network involves the repetitive evaluation of Eq. (3.20). This requires the factorization ofthe matrix

Y(~λ) =

(G(~λ) +

C(~λ)

h

), (4.1)

a costly operation, as this matrix is very large. To study the power grid, it is of interest to experimentwith a lot of different parameter configurations, especially if statistical data about the PG is needed. Thetraditional analysis flow, depicted in Figure 4.1, needs a new time domain simulation of the power grid foreach new parameter setting, which in turn requires a new factorization of the system matrix, increasingcomputation time.

There are methods to accelerate and optimize the solution of the system, as referenced in Section2.3. However, for problems that require the usage of a large number of parameter settings, for instance,to estimate the distribution of peak resistor current (relevant for electromigration analysis) or peak voltagedrop (relevant for grid integrity analysis), more efficient methods are required.

To increase the efficiency of the analysis, a new analysis flow is proposed (Figure 4.2), where vari-ability information for the same input vector on the network (the same driving excitation) can be collectedwithout re-simulating the PG. This means that parametric analysis can be done, for any number of pa-rameter settings, with a single factorization of the system matrix and a single time domain simulation.

25

Figure 4.1: Traditional analysis flow. Figure 4.2: Proposed analysis flow.

The proposed variational analysis is based on two assumptions:

• Node voltages vary smoothly in time, and can be recovered by a low order approximation, simpli-fying the representation;

• Node voltages are highly correlated, both in the time and parameter domain, allowing high com-pressibility of the model.

4.2 Compression Scheme

The proposed dynamic analysis scheme will generate a parametrized model similar to to the modelproduced by the static analysis approaches in Section 3.2.2, containing sets of perturbed voltage vectorsthat will be compressed. This is done by the mean of a RRQR decomposition, applied on-the-fly to thegenerated vectors during model generation. Before presenting how the compressed model is generated,we describe the eigenvalue problem and the QR decomposition. The PCA procedure is also presented,due to its similarity to to the applied technique.

4.2.1 The Eigenvalue Problem

Let’s consider a square matrix A ∈ Rn×n, and a vector ~x ∈ Rn×1. For most ~x, the product ~y = A~x yieldsa vector ~y ∈ Rn×1 that has as a direction different from the direction of ~x. However, some vectors ~x willhave the same direction of A~x, turning the equation

A~x = λ~x (4.2)

true. This is called the eigenvalue problem, where λ is called the eigenvalue, or the characteristic valueassociated with the the eigenvector ~x. Its solution is obtained by solving

(A− λI)~x = 0 (4.3)

If all eigenvalues of ~x are non-zero, a basis built with the eigenvectors will span the column space ofA. That also means that A is diagonalizable.

The eigendecomposition of a diagonalizable matrix separates it into an orthonormal matrix that con-tains its eigenvectors and a diagonal matrix with the corresponding eigenvalues:

A = QLQ−1 (4.4)

26

This clearly shows that the matrix Q spans a n-dimensional space, as it contains n vectors that are allorthonormal to each other. As those vectors are eigenvectors of A, it means that n-dimensional spaceis also the column space of A. Thus, we conclude that A spans a n-dimensional space.

4.2.2 QR Decomposition

If A is not a square matrix, obtaining an orthonormal basis by A can’t be done from solving Eq. (4.4).However, mathematical tools to obtain a basis that span the column space of A exist. One of these toolsis the QR decomposition, that decomposes A into

A = QR , (4.5)

where Q is an orthonormal matrix and R is an upper triangular matrix. If rank(A) = n, then the first ncolumns of Q form an orthonormal basis for the column space of A. Moreover, the k-th column of Aonly depends on the first k columns of Q. The terms in R rebuild the matrix A from a linear combinationof the vectors in the basis Q.

There are several methods that compute this decomposition, e.g. using the Gram-Schmidt process,or Householder reflections [33].

Gram-Schmidt process

The Gram-Schmidt process is an orthogonalization method that takes a linearly independent set S =

{~v1, . . . , ~vk} ⊂ Rn , with k < n, and generates an orthogonal set S′ that spans the same k-dimensionalsubspace as S. We start by defining the projection operator:

proj~u(~v) =< ~v, ~u >

< ~u, ~u >~u , (4.6)

where < ~v, ~u > is the inner product between the vectors ~v and ~u. This operator will project ~v onto the1-dimensional space spanned by ~u.

The following algorithm represents the Gram-Schmidt process and iteratively replaces the vectors~v1, . . . , ~vk for orthonormal vectors:

Algorithm 6: Gram-Schmidt processData: ~v1, . . . , ~vkResult: ~v1, . . . , ~vk

1 for i=1:k do2 ~vi ← ~vi

‖~vi‖ (normalize)3 for j=i+1:k do4 ~vj ← ~vj − proj~vi(~vj) (remove component in direction ~vi)5 end6 end

To obtain a QR decomposition from the Gram-Schmidt process, the process is applied to the columnsof A, obtaining Q. As Q is an orthonormal matrix, Q−1 = QT , and therefore we can obtain R asR = QTA.

However, executing the algorithm on a computer will lead to rounding errors in the computation of theorthonormal vectors, which causes them to not be perfectly orthonormal. This causes misbehaviours inthe Gram-Schmidt process and causes it to be numerically unstable.

27

Algorithm 7: QR decomposition using Householder transformations

Data: A ∈ Rm×nResult: Q, R

1 R← A2 for k = 1, . . . ,m− 1 do3 ~u← ~Rk:m,k4 u1 ← u1 − ‖~u‖5 ~u← 1

‖~u‖~u

6 Qk ← I− 2~u~uT

7 expand(Qk)8 R← QkR

9 end10 Q← QT

1 QT2 . . .Q

Tm−1

Householder Reflection

A Householder reflection, also known as a Householder transformation, is a linear transformation thatreflects a vector on a hyperplane.

The reflection hyperplane can be defined by an orthonormal vector ~v perpendicular to the originalhyperplane. The linear transformation is given by the Householder matrix P

P = In − 2~v~vT (4.7)

To obtain the QR decomposition, we start by a column ~a1 of A ∈ Rm×n, with m < n. with α1 = ‖~a1‖,we obtain ~v:

~u1 = ~a1 − α1

[1 0 0 . . .

]T~v1 =

~u1‖~u1‖

(4.8)

We now calculate the Householder matrix by applying Eq. (4.7), obtaining Q1 = P = I − 2~v1~vT1 .

Applying Q1 to A, we obtain

Q1A =

[α1 ?

0 A′

], (4.9)

and we can repeat the process in a recursive fashion for the submatrix A′. As A′ is smaller than A, Q′2will be smaller than Q1, so we expand it forming (the general case is presented here):

Qk =

[Ik−1 0

0 Q′k

](4.10)

After computing all Q matrices, repeating the process m− 1 times, we can obtain the QR decompo-sition as

R = Qm−1 . . .Q1A

Q = QT1 Q

T2 . . .Q

Tm−1 (4.11)

This method has greater numerical stability, when compared with the Gram-Schmidt method. It hasthe downside that the orthonormal basis Q can only be obtained at the end of the process. Counting

28

the number of operations, we obtain:

2

3n3 + n2 +

1

3n− 2 = O(n3) flops (4.12)

4.2.3 Introducing the RRQR

To compress the model, a Rank Revealing QR (RRQR) decomposition is applied to the generatedvectors, and the resulting Q and R matrices are stored. The idea of this variant of the QR decompositionis the following: if the vectors of X ∈ Rm×n, with m > n are highly correlated, then X is rank deficient,i.e. there is a set of q < n vectors that generates the same subspace spanned by the columns of X.Therefore, a QR decomposition of X would reveal a Q matrix with empty columns, as vectors in Q

should be orthonormal to each other.

Q =[Q′ ∈ Rm×q 0

]R =

[R′ ∈ Rq×n

0

](4.13)

This brings advantages in the form of memory saving if the matrices Q′ and R′ are stored instead ofX, if q is small enough.

There is a problem with this approach: in practice, Q will be full ranked, but with most dimensionsshowing almost no variance. An example of this concept is shown below in Figure 4.3, displaying 2Ddata that can be represented with almost no loss of information in a 1D space (this is the basis of theleast squares regression, for example). We see that a line can fit the represented data and almost noinformation is lost on the process. This would mean, in the example of Fig. 4.3, dropping the componentof the data associated with the direction vector ~y ′.

Figure 4.3: Linear regression as an example of data compression.

The same idea can be applied to the model matrix X. Finding a representation that, although lossy,contains the information that represents the variations caused by the parameter, will be equivalent tofinding the principal directions of the data in X, making this a Principal Component Analysis (PCA)-likeproblem, as described in Section 4.2.4.

This brings us to the RRQR decomposition. This variant of the QR decomposition will allow thediscovery of the principal components of the data by checking, each time a new vector would be addedto the basis Q, how much information is the new component carrying. The process is iterative, and

29

Figure 4.4: Orthogonalizing a new candidate basis vector.

starts with a Q basis with a single vector. Representing X as X =[~X1~X2 . . . ~Xn

], then Q(1) =

~X1

‖ ~X1‖and

R(1) =[‖ ~X1‖

].

Each time a new vector ~Xk is generated, it is orthogonalized using the existing basis Q ∈ Rm×j ,Q =

[~Q1~Q2 . . . ~Qj

]basis (Q may change size whenever a new vector is generated, and j ≤ k − 1),

obtaining the candidate basis vector ~Q′j+1. This is a rank one update to the QR decomposition. Thenorm ‖ ~Q′j+1‖ is computed, and the decision to add a new vector to the basis or not is made based onthe value of its norm. ‖ ~Q′j+1‖ is the energy of ~Xk in the new dimension, i.e. it represents the amount ofinformation in ~Xk outside the subspace spanned by ~Qk−1. An example is shown in Figure 4.4, wherethe space spanned by ~Qj is the red dotted line. The vector ~Xk is orthogonalized as explained in Section6, and a new candidate basis vector ~Q′j+1 is shown.

Two thresholds are used to make a decision on ~Q′j+1. They will drop basis vectors that are repre-senting very small amounts of information, and the decision process is as follows:

• Absolute Threshold tabs: If ‖ ~Q′j+1‖ < tabs, then ~Q′j+1 is dropped. Otherwise, Q =[Q ~Qj+1

],

with ~Qj+1 =~Q′

j+1

‖~Q′j+1‖

• Relative Threshold trel: If ‖ ~Q′j+1‖ < trel × maxi=1,...,j(‖ ~Q′i)‖, then ~Q′j+1 is dropped. Otherwise,

Q =[Q ~Qj+1

], with ~Qj+1 =

~Q′j+1

‖~Q′j+1‖

The R matrix is upper triangular, and contains the projections of the original vectors of X in the newbasis. Each time a new basis vector ~Qj+1 is added to Q, a row of zeros is added to the R matrix aswell,as the previous matrix vectors ~X1, . . . , ~Xk−1 represented in R do not depend on the new basis vector.The new vector ~Rk+1 is

~Rk+1 =[proj~Q1

( ~Xk) . . . proj~Qj( ~Xk)

]T, (4.14)

and R is updated as R =[R ~Rk+1

], ending the RRQR iteration. The full algorithm is based on Section

6 and [34] and is, for a single iteration k:The end result is similar to the PCA technique presented in section 4.2.4. Although eigenvalues

are not explicitly calculated, the subspace spanned by Q reflects the first principal components of thedata contained in X, minimizing information loss, although they are not sorted as in the PCA. Changingthe thresholds will have an effect on the resulting basis Q, by allowing more or less vectors to berepresented, changing the number of principal components selected and, as expected, the error rate inthe reconstitution.

The reconstitution of the original model matrix X is simply done by a matrix product, as in the originalQR decomposition:

X = QR (4.15)

30

Algorithm 8: RRQR decomposition rank 1 update

Data: Q; R; ~Xk; tabs; trel; maxNormResult: Q; R; trel; maxNorm

1 if k = 1 then2 maxNorm← ‖ ~Xk‖3 Q← ~Xk

‖ ~Xk‖

4 R← ‖ ~Xk‖5 end6 for i = 1, . . . , j do7 ai ← proj~Qi

( ~Xk)

8 ~Xk ← ~Xk − ~Qiai9 end

10 if ‖ ~Xk‖ ≥ maxNorm× trel and ‖ ~Xk‖ ≥ tabs then11 if ‖ ~Xk‖ > maxNorm then12 maxNorm← ‖ ~Xk‖13 end

14 Q←[Q

~Xk

‖ ~Xk‖

]15 R←

[R ~a

0 ‖ ~Xk‖

]16 else17 R←

[R ~a

]18 end

4.2.4 Principal Component Analysis

The PCA is a statistical procedure that can be used to find patterns in data, expressing the data in away that its variance structure is reflected. Let’s suppose we have a set of m n-dimensional vectors, withm,n� 1. It’s difficult to graphically visualize this dataset, and therefore hard to highlight similar vectors,their distribution, and what are the main directions where the information is located. PCA allows theeasy visualization of the data, and also allows an easy reduction of the dimensionality of the dataset,while keeping the reconstitution error low.

The result of the procedure is a set of k principal components, 1 < k < n, a set of vectors that forman orthonormal basis where the data can be projected. These components represent the directionswhere the data variance is the highest, being the maximum variance on the direction of the first principalcomponent.

A preprocessing is done, by centering the data matrix X ∈ Rm×n and calculating its covariance matrixS = XTX

m−1 . Afterwards, the eigenvectors and eigenvalues of S are computed, obtaining an orthonormalmatrix U ∈ Rn×n with the eigenvectors of S and a vector Λ ∈ Rn×1 with the respective eigenvalues,where the eigenvalues are ordered from the greatest to the smallest.

The eigenvalues represent the energy in the direction of the corresponding eigenvector. This can beused to reduce the dimensionality of X, while keeping most of the information contained by the data.The percentage of the information contained in a given principal component can be calculated as

infok% =λk∑ni=1 λi

× 100 (4.16)

Choosing to project X on a subset of S containing the k eigenvectors corresponding to the highereigenvalues will reduce the data to a k-dimensional space. The total information lost can be estimatedby applying Eq. (4.16).

31

4.3 Set Up Stage - Model Generation

During the Set Up stage a parametric model of the system is created. Similarly to the static analysiscase, two approaches for the model generation are presented: an approach based on SPARE and anapproach based on linear approximation (OLA).

The model is stored by representing the voltage vectors generated for each time step in a reducedform, exploiting the correlation between them. For each time step and for P parameters that are as-sumed independent, a first order approximation is sufficient to represent the system and a set of P + 1

voltage vectors are generated (equivalent to the model generated by the static analysis scheme de-scribed in Section 3.2.2): a nominal voltage vector representing the network under the effect of novariations and P perturbed voltage vectors. This means that, if the simulation interval is Mh (M timesteps of size h), then the model will need to store information corresponding to (P+1)M voltage vectors.If the network has n nodes, then this is equivalent to a (P + 1)M × n dense matrix (lets call that thematrix X, the full model matrix), which for large networks is huge. This quickly becomes impossible tomanage, making the reduced representation fundamental.

The parametrized model generation can be summarized as described in Algorithm 9.

Algorithm 9: Dynamic Variational Analysis Scheme - Set Up Stage

Data: h, ~v0(0), ~vp(0) ∀p ⊂ {1, . . . , P},~i(mh) ∀m ⊂ {1, . . . ,M}, G0, C0, Gp ∀p ⊂ {1, . . . , P},Cp ∀p ⊂ {1, . . . , P}

Result: Q, R

1 Factorize Y0 =(G0 + C0

h

)2 for m = 1, . . . ,M do3 Generate the nominal response of the system ~v0(mh)4 Apply the RRQR and update Q, R5 for p = 1, . . . , P do6 Generate the perturbed response of the system ~vp(mh)7 Apply the RRQR and update Q, R8 end9 end

The nominal response is generated using Eq. (2.19), where ~λ = 0, obtaining Eq. (4.17). For practicalpurposes, a copy of v0 ((m− 1)h) is saved in memory. The factors from Y0 is also stored in memoryand reused in each time step while calculating ~v0(mh), as it does not depend on time.

~v0(mh) =

(G0 +

C0

h

)−1 [~i(mh) +

C0

h~v0 ((m− 1)h)

](4.17)

For the generation of the perturbed response, the two followed approaches are presented. Althoughthey are very similar, they yield different results and have different computational costs.

32

4.3.1 SPARE Approach

We’ll expand the SPARE approach presented in Section 3.2.2 to the dynamic system equation (2.19).We’ll start by approximating ~v, G, and C with a Taylor Series:

~v(~λ, t) = ~v0(t) +

P∑k=1

~vk(t)pk +

P∑k=1

P∑k=1

pkpj~vkj(t) + . . .

G(~λ) = G0 +

P∑k=1

Gkpk +

P∑k=1

P∑j=1

pkpjGkj + . . .

C(~λ) = C0 +

P∑k=1

Ckpk +

P∑k=1

P∑j=1

pkpjCkj + . . . , (4.18)

where ~v0 = ~v(0, t), ~vk(t) = dv(~λ,t)dpk

∣∣∣∣(0,t)

, ~vkj(t) = d2v(~λ,t)dpkdpj

∣∣∣∣(0,t)

, . . .

Differentiating and substituting the expressions above into equation (2.19), we obtain ~vk.

~vk(mh) = −Y−10

[YkY

−10

(~i(mh) +

C0

h~v0((m− 1)h)

)− Ck

h~v0((m− 1)h)− C0

h~vk((m− 1)h)

](4.19)

We can now construct the above expressions recursively, using Eq. (4.17) and Eq. (4.19). Weshow the expression for the 2nd order perturbation ~vkk, and others can be calculated using the samemethodology. We’ll represent the voltage vector at a given time step mh as ~v (m).

~v(m)k = −Y−10

[Yk~v

(m)0 − Ck

h~v

(m−1)0 − C0

h~v

(m−1)k

](4.20)

~v(m)kk = −Y−10

[Ykk~v

(m)0 + Yk~v

(m)k − Ckk

h~v

(m−1)0 − Ck

h~v

(m−1)k − C0

h~v

(m−1)kk

](4.21)

It’s possible to observe how the structure of the expressions is similar to the static analysis case, butthese include new terms due to the capacitive effect. Also, as in the static case, only one factorization isdone during the generation of the parametrized model as the factors obtained from the decompositionof Y0 can be stored in memory and reused. The sensitivity matrices Gk and Ck are generated asexplained in Section 3.2.1.

4.3.2 OLA Approach

In a similar fashion to what was done in Section 3.2.2, the Output Linear Approximation (OLA) approachis based on matching the perturbation for the maximum expected variation for each parameter. Startingfrom equations (3.10), (3.12) and (2.19), we can compute ~v(pk):

Y(pk)~vk(pk) (m) =~i (m) +1

hC(pk)~p

(m−1)k (4.22)

Now, Y(pk) and C(pk) are linearly approximated as Y(pk) = Y0 + pkYk and C(pk) = C0 + pkCk,and by substituting in Eq. (4.22) and applying some linear algebra we obtain

~v(pk) (m) = (Y0 + pkYk)−1(~i (m) +

1

hC0~v

(m−1)0 + pk~b

), (4.23)

where ~b = 1h

(Ck~v

(m−1)0 + C0

~(m− 1) + pkCk~v(m−1)k

).

33

As in Eq. (3.15), we can approximate the inverse term in Eq. (3.14) with a truncated series

(Y0 + pkYk)−1

=(I−∆ + ∆2 −∆3 + . . .

)Y −1

0 , (4.24)

where ∆ = pkY−10 Y1. This can be done recursively as in the static case. In practice, the previous

computed vectors are stored in memory, aswell as the factors from the decomposition of Y0.

4.4 Evaluation Stage

The Evaluation stage of the scheme is where the computational effort that takes place during the SetUp is rewarded. After generating the parametrized model, it’s very computationally efficient to evaluatethe system response for any number of parameter settings. The parametrized model corresponds to thematrices Q and R provided by the Set Up. This can be used to reconstruct the voltage on any of the ncircuit nodes, for any of the M time steps, and any parameter setting.

The matrix R is sorted by time steps. The sub-block of P +1 columns of R that contains the columnsthat go from m(P + 1) to (m + 1)(P + 1) contains the nominal voltage vector for the time point m plusthe P perturbed voltages for each of the P independent parameters. Let’s call each sub-block of R withthe vectors associated with the m-th time point Rm.

To evaluate the voltage nodes at a time step m for a given parameter setting ~λ, we apply the followingmatrix-vector products:

~v (m)(~λ) = Q

(Rm

[1 ~λ

]T), (4.25)

and the problem is solved for that parameter setting.

4.5 Considerations

Several considerations relative to this approach are presented:

• Only the set of nodes to be studied needs to be computed. If only a subset of the network nodesneed to be analyzed, the parametrized model can be built for these nodes only.

• Time and space-variant parameters can be included straight-away. As each time point is evaluatedindependently, if the parameter setting ~λ changes with time the changes are reflected immediatelyin the voltage vector without the need to recompute the model.

• Acceleration techniques as described in Section 2.3 can be applied during the Set Up stage of thescheme, combining the very efficient variational analysis on the Evaluation stage with a speededup Set Up stage.

• Both the Set Up and the Evaluation stage are highly parallelizable, as all P parameters are inde-pendent. during the Set Up stage the computation of the perturbed vectors ~vk for each time pointcan be straight away computed concurrently in up to P threads. The Evaluation stages ups themark by allowing full parallelization of the computation effort. All parameter settings, for all timepoints, can be computed in parallel allowing the use of an arbitrary number of machines. As theEvaluation stage is solely composed of matrix-vector products, the operations can be parallelizedusing GPU’s, increasing the efficiency of the operation. In full honesty parallelism can be alsodirectly exploited in any technique that is based on sampling the parameters and computing thesubsequent response (in essence a Monte-Carlo-type method).

34

4.6 Memory Requisites

As the networks under the scope of analysis are usually very large, it is important to know how muchmemory space is needed to build the parametrized model and to evaluate it.

The first thing to consider is the need to store the system nominal matrix Y0 = (G0 + C0/h), aswellas the sensitivity matrices Gk and Ck, with k = 1, . . . , P . As the number of nodes n is very high, thismay seem like a huge overhead. The sparsity of those matrices is, however, very high, with the numberof non-zero elements in these matrices is O(n). A larger overhead exists due to the need to store thefactors resulting from the Cholesky decomposition of Y0, as the resulting matrices P and L are denserthan the original nominal matrix Y0. We can set an upper bound for the number of elements as O(n2).The real number of elements should much closer to O(n), however.

As a working space, we’ll need to store the current voltage vector being computed and the voltagevectors for the previous time point, totaling (P + 2) n×1 dense vectors. Finally, space is needed to storethe matrices Q and R. We can estimate the total memory requisites to be between O(n) and O(n2).The larger the number of nodes, the closer to O(n) memory usage should be. This considers the basememory unit as the space needed to store one matrix element.

The final parametrized model will require nq+ q(P + 1)M memory units, which can be approximatedas O(nq), because n � (P + 1)M . The full reconstructed model occupies n(P + 1)M . Therefore, thecompression rate is approximately

Compression = 1− q

(P + 1)M(4.26)

4.7 Time Complexity

As with the memory, it is also needed to have an estimate for the time complexity of both the Set Upstage and the Evaluation stage for the proposed scheme. Thus, calculations are presented that allowthat estimation, considering the use of the SPARE approach.

4.7.1 Set Up Complexity

Counting the number of the operations, we need a Cholesky decomposition, and for each time step: 1nominal voltage vector calculation and P perturbed voltage calculations.

To compute one nominal voltage vector, 6 matrix-vector multiplications and 1 vector sum are needed.As all the matrices are n × n and the vectors n × 1, this can be approximated as 12n2 + n ≈ 12n2

operations.

To compute one perturbed voltage vector, 7 matrix-vector multiplications, 1 matrix-matrix sum and 2vector sums are needed. A total of 15n2 + 2n ≈ 15n2 operations are needed.

Thus, the total operations needed for a single time point calculation is (15P + 12)n2, making it a totalof M(15P+12)n2 for the full model, plus the Cholesky decomposition and the RRQR factorization, whichis O(n3). The total computational cost of the Set Up stage is therefore O(n3 +M(15P + 12)n2).

This number is, however, a crude estimation, as most operations are with sparse matrices,whosesimplified representation greatly reduces the number of operations (in particular the Cholesky decom-position becomes very efficient in this situation), bringing down the presented time complexity.

35

4.7.2 Evaluation Complexity

The complexity of the Evaluation stage is easier to calculate and estimate, as it only depends on acouple full matrix-vector operations. For each time step, the time complexity is the corresponding to:

• The product of a R ∈ Rq×(P+1) matrix with a (P + 1) × 1 vector, which corresponds to 2q(P + 1)

operations;

• The product of a Q ∈ Rn×q matrix with a q × 1 vector, which corresponds to 2nq operations.

The total number of operations is 2q(n + (P + 1)) ≈ 2qn operations, as n � P + 1, for each timepoint and each parameter setting. If instead a non-compressed model were being used, i.e. a fullX ∈ Rn×(P+1), then the time complexity would be corresponding to 2n(P + 1) operations, for eachparameter setting.

A ratio that relates the time complexity of the compressed model versus the full model can be calcu-lated, obtaining the following formula for the speed-up ratio due to compression:

rspeedup =P + 1

q(4.27)

Thus, if the size q of the basis is smaller than P + 1, there will be an increase in the evaluation speedpurely due to the compression scheme. We must once again note, however, that the memory requisitesneeded to store and evaluate the full model would be huge, rendering the non-compressed model verydifficult to use in practice.

36

Chapter 5

Experiments and Results

The following chapter will describe the experiments done during the development of the present work,and the analysis of their results. First, in Section 5.1 we’ll present the source of the data and its initialpreparation.

A first set of experiments is done on a small network, with the purpose of studying the proposedscheme under several configurations. We start in Section 5.2 by setting the parameters presentedin Section 3.1 and computing a set of perturbed responses. Afterwards, we analyze how the modelbehaviour changes when the range of the variations allowed by the parameters changes. We also studyhow the compression rate and computational cost changes with the change of the length of the timedomain simulation, the number of parameters, and the compression thresholds.

In Section 5.6 a second set of experiments intends to test the scheme on large networks, demon-strating its scalability. Finally, considerations on the use of more efficient algebra routines to enhancefaster evaluation times are presented in Section 5.7.

All the Set Up and Evaluation routines were coded and executed in MATLAB, a non-compiled language.Therefore, the execution times are merely indicative.

5.1 Data Preparation

To test the proposed methodology, the IBM Power Grid Benchmarks [12] were used, as those are drawnfrom real designs and are widely used in research in the power grid analysis area. Those are providedas a SPICE netlist, that we prepared and converted to a binary file that can be read by MATLAB. A smallsection of a netlist is presented below:

(...)

* layer: M5,VDD net: 1

R554 n1_333_383 n1_521_383 1.342857e-01

R555 n1_521_383 n1_2400_383 1.342143e+00

R556 n1_2400_383 n1_2583_383 1.307143e-01

R557 n1_2583_383 n1_2771_383 1.342857e-01

(...)

The benchmark netlists for transient analysis contain information about the elements on the network(resistors, capacitors, inductors, voltage sources and current sources) and how they are connected. Ele-ments can be located in different layers of the grid, and this fact will later be used to increase the numberof independent regions under analysis, which in turn increases the number of parameters and therefore

37

PG #n #r #c #Reg #Pibmpg1t 25095 40801 25095 8 48ibmpg2t 163577 245163 163577 32 192ibmpg3t 1039624 1602626 1039624 16 96ibmpg6t 1530562 2410486 1530562 24 144

Table 5.1: Benchmark networks and their default region configuration.

the variability of the network response. The behaviour studied is the response to a step function, equiv-alent to turning on the circuit. The time step t = 0 represents the moment when the current sources areturned on. Their value is as described on the netlist. Voltage sources are converted to current sourcesin the 3rd step of the preparation while calculating the Norton equivalent.

The flow of the preparation is the following:

1. Parse the netlist and create its description (component list and connections);

2. Replace the inductors with zero-ohm links;

3. Simplify the circuit by generating its Norton equivalent and removing shorts;

4. Extract the behaviour of the AC sources;

5. Create an index with the circuit nodes, and build the resulting matrices;

6. Save to a binary .mat file.

The simplification step ensures that the resulting representation of the system is identical to the onedescribed in [32]. The output of this preparation step is a set of matrices that describe the system,readable by MATLAB.

• Vector r , containing the values of the resistances;

• Vector c, contaning the values of the capacitors;

• Matrix i, describing the behaviour of the AC sources;

• Vector B, containing the value of the DC sources;

• Matrix PG, incidence matrix of the resistors;

• Matrix PC, incidence matrix of the capacitors;

• Vector R, that indexes the nodes to their corresponding region in the network.

The network regions can be divided into smaller sub-regions, each one with an independent set ofparameters, emulating the locality of the perturbations. This causes an increase on the number of theparameters, and therefore, a complexity increase.

The Table 5.1 presents the networks used and their default configuration. #n indicates the size of thenetwork, that is, the number of columns in the matrices G and C. This also corresponds to the numberof capacitors on the network. #r is the number of resistors, #Reg represents the number of independentregions, and #P the number of independent parameters (6 × #Reg).

5.2 Chosen Parametrization

The parameters will affect the capacitors and resistors in the corresponding region. For each Evaluationiteration, a set of parameters is randomly generated, following a normal distribution independent for eachparameter:

λp ∼ N

(0,

(∆λmax

p

4

)2)

(5.1)

38

This means that each set of parameters corresponds to a randomly generated vector ~λ in a P-dimensional space. The 4σ interval means that almost every generated value (99.994% of the valuesgenerated) will be contained inside the allowed interval for variation. If, by chance, there is a generatedvalue outside the interval, its value will be truncated to the maximum value allowed, effectively cuttingthe tails of the normal distribution. More about the random generation of parameters will be discussedin the present chapter.

The ∆λmaxp is the the maximum percentage of variation allowed for the parameter, being the default

values [∆ρmax, ∆Lmax, ∆Wmax, ∆Tmax, ∆εmax, ∆dmax] = [10%, 1%, 30%, 30%, 10%, 10%]. Thesevalues allow a high level of variability on the values of the resistors and capacitors (up to 94.4% on theresistors and 90.1% on the capacitors).

The model used to generate the results found in this chapter considers that the parameters areindependent and that a first order approximation is sufficient to represent their effect. However, asexemplified in Section 4.3, the proposed model does not force the independence of the parameters,neither limits the order of approximation.

5.3 Results Quantification

To quantify the results, they will be compared to a 3rd order Taylor Series approximation of the origi-nal perturbed system, i.e. the stamp of the perturbed resistors and capacitors under the effect of theparameters, as described in Eq. (3.5). This approximation will be referred to as the golden solution.

We will calculate the maximum absolute error (Eabs), the maximum relative error (Erel), the averageabsolute error (Aabs), and the average relative error (Arel). For each parameter setting ~λ, those arecalculated as:

Eabs = max(∣∣∣~v (m)(~λ)− ~v (m)

r

∣∣∣) , ∀m = 1, · · · ,M

Erel = max(∣∣∣~v (m)(~λ)− ~v (m)

r

∣∣∣ ./ |~vr|) , ∀m = 1, · · · ,M

Aabs = mean(∣∣∣~v (m)(~λ)− ~v (m)

r

∣∣∣) , ∀m = 1, · · · ,M

Arel1 = mean(∣∣∣~v (m)(~λ)− ~v (m)

r

∣∣∣ ./ ∣∣∣~v (m)r

∣∣∣) , ∀m = 1, · · · ,M (5.2)

The reference solution ~v (m)r is the nominal response when computing the error of the golden solution,

highlighting the variations caused in the network by the parametric set, and is the golden solution whencomputing the error of the remaining methods, showing how the methods behave in approximating theperturbations.

5.4 Model Behaviour

We start by analyzing the smallest dataset, ibmpg1t. Using the Monte Carlo method, 1000 randomsettings for ~λ were generated as indicated in Section 5.2, and the network solved using the differentapproaches. The following table shows the results, with errors presented according to Eqs. (5.2).Computation time and memory usage are also presented. The Figure 5.1 shows the result of the nodalanalysis for a given node of the network. The mean voltage and standard deviation caused by thevariations can be observed.

The first line of the Table 5.2 shows the effect the variations have on the network, and correspondsto the golden solution. Data shows how the variations can grow quite large, with a maximum absolute

39

Basis (Mem.) Set Up (s) Eval. (s) Eabs Erel Aabs Arel

Perturbation - - - 0.600 612% 0.023 4.03%TS – (19MB) 0.7 2108 0.158 128% 0.002 0.44%

SPARE Compressed 83 (38MB) 144 81 0.194 1082% 0.003 3.33%OLA Compressed 80 (37MB) 244 80 0.263 1353% 0.004 3.66%

SPARE Non-Comp. 4949 (967MB) 62 76 0.186 426% 0.003 0.71%OLA Non-Comp. 4949 (967MB) 169 78 0.259 502% 0.003 0.90%

Table 5.2: Results of time domain variability analysis (ibmpg1t: 48 parameters, 1000 settings)

error of 0.6V, and an average relative error of ≈4%, proving that the perturbed network has a behaviourdistinct from the nominal, giving rise to the need to efficiently simulate the effect of the perturbations.

Figure 5.1: Black points indicate the time-domain nominal voltage for a given node in the ibmpg1t

example, whereas red points indicate perturbed voltage values obtained for 1000 random parametersamples.

Then, we compare how the SPARE, OLA, and 1st order Taylor Series approximation behave. Themost relevant features are the reduction in the memory usage of the compressed models versus the un-compressed, with memory savings of over 96%, and how the proposed analysis is much faster than theTaylor Series approach. This is expected, as the Taylor Series approach requires a matrix factorizationoperation for each solve, greatly increasing its computational cost.

The Set Up cost of the proposed approach comes with the disadvantage that its high (68 Taylor Seriessolves could be completed in that time for the SPARE Compressed approach) and must be completedbefore any simulation begins, but the amortization factor is quite high, as the speedup per evaluation isaround 26x, making it more cost efficient if a large number of parameter settings must be evaluated.

Both approaches (non-compressed) manage to achieve an average error under 1%. The compres-sion greatly reduces the memory usage, while maintaining the error manageable (under 4%). Thiscomes at the cost of a slower set up time, with a cost increase of around 100%. The increase on thecomputational cost comes from the RRQR decomposition that must be applied to reduce the basis size.

The SPARE approach seems to produce better results than the OLA approach. This goes againstprevious work on static networks, where the OLA approach outperforms the SPARE approach [32]. Thisis caused by the fact that the OLA approach considers that the voltage vector varies linearly with thechange in the parameter settings, as changes in the resistance values affect linearly the voltage. In thedynamic problem, however, linear changes in the value of the capacitors don’t affect the voltage in alinear fashion, leading to errors in the approximation.

40

Figure 5.2: In this time-domain response for the node that carries the highest relative error, we identifythe region where this maximum error takes place.

5.4.1 On the Maximum relative Errors

The maximum relative errors can be seen to grow quite large. Nevertheless we find it difficult to attributesignificance to that value, as small oscillations caused by the transient regime in nodes whose voltage isnear-zero can easily create those high maximum relative errors. An example can be seen in the Figure5.2. For that reason, this measurement is not used for the quantification of the results, and it will not bepresented for the analysis of the larger networks.

5.4.2 Maximum parameter Variation

The definition of the limits of maximum variation for the parameters is important, as it is of interest toallow a wide range of variations to increase variability in the network, while making sure the valuesrepresented have physical significance. For instance, each individual capacitor and resistor needs tohave a positive value. To assure that this condition is verified, no set of parameters can represent avariation on the value of an element that is greater than 100% . The figures 5.3 and 5.4 represent themaximum absolute error for 1000 Monte Carlo samples on the example ibmpg1t, on the left with a 3 σ

effect on the parameter variation and on the right with a 4 σ effect and truncation of the distribution tails.

We observe that some samples present a very large maximum absolute variation. Those outliersare caused by variations larger than 100% on some elements value (for a 3 σ effect we can expect thataround 3 in 1000 samples are outside the allowed range). By shortening the range of variation andtruncating the tails, the maximum absolute variations can be controlled and set back to realistic values.

5.5 Compressibility

We will now study the compression, namely how the compression rate changes with the number ofparameters and time points, with the threshold settings and how those changes alter the memory usageand computation time. Memory savings will be presented as a comparison against the memory neededfor the full model, where no compression scheme is used, for the SPARE approach.

41

Figure 5.3: Histogram of the maximumabsolute error per parameter setting with a 3 σ

effect.

Figure 5.4: Histogram of the maximumabsolute error per parameter setting with a 4 σ

effect.

5.5.1 Compression Rate in the Parameter Space

In this section a study of the model compressibility in parameter-space is presented. A set of exper-iments were done using example ibmpg1t and ibmpg2t, changing the number of sub-regions of thenetwork. The size of the resulting Q basis is presented, alongside the memory usage and time neededfor each solve iteration.

PG P 24 48 96 192

ibmpg1tq 62 83 111 147

Mem Savings 97.54% 98.32% 98.87% 99.246%

ibmpg2tq 47 64 89 112

Mem Savings 98.14% 98.71% 99.09% 99.425%

Table 5.3: Evaluating Compressibility in the Parameter Space.

Figure 5.5: Compressibility analysis in the parameter space. Black dots indicate data points, and thered dotted line represents a log-fit to the data.

42

The data shows some compressibility in parameter-space, as an increase in the number of param-eters produces a sub-linear increase in the basis size, and therefore in the memory usage (Figure 5.5).This goes as expected, proving the correlation of the voltages at the nodes of different regions. Suchredundant vectors are easily represented, as they are linear dependent on previous calculated perturba-tion vectors. A logarithmic function y = a∗ log (b+ x)+c is adjusted to the data to highlight the sub-linearincrease and the effectiveness of the compression.

The relation presented in Section 4.7.2 for the cost of the Evaluation stage of the algorithm can beshown by the data. Computing the ratio q/(P + 1) and the ratio tComp/tNComp, we observe that theyare similar (R2 = 0.88). This shows that for a large enough number of parameters and small enoughbasis, the compression not only greatly diminishes the memory resources needed to solve the networkbut also makes the computation effort smaller.

P 24 48 96 192q/(P+1) 2.48 1.69 1.14 0.762

comp/Ncomp 2.15 1.55 1.07 0.754

Table 5.4: Ratios between evaluation times of the compressed and non-compressed model andcomparation to the predicted speed-up ratio in Eq. (4.27).

5.5.2 Compression Rate in the Time Domain

We now study the compressibility in time-domain by executing a set of experiments using exampleibmpg1t, changing the number of time steps of the simulation. The relative threshold was kept at trel =

10−3. The size of the resulting Q basis is presented, alongside the memory usage and time needed foreach solve iteration.

PG M 81 101 121 151 201

ibmpg1tq 234 238 241 242 245

Mem Savings 94.10% 95.19% 95.94% 96.73% 97.51%

Table 5.5: Evaluating Compressibility in the Time Domain.

The data shows, as expected, high compressibility in time-domain, confirming the existence of acorrelation between time steps. Perturbation vectors for each time step are linearly dependent on thevectors of the previous ones, resulting in new time steps being extremely compressed.

5.5.3 Compression Rate versus RRQR thresholds

The compression rate is controlled by a set of threshold settings as explained in Section 4.2.3. In thissection the effect those settings have on the compression rate is studied. Experiments were conductedusing the example network ibmpg1t. To quantify the results the error of the compressed model in relationto the uncompressed full model is presented, together with computation time and memory usage, for theSPARE approach.

Analysis of the uncompressed model

In the Figure 5.7 the first 100 eigenvalues of the uncompressed parametrized model X for ibmpg1t

are shown. The eigenvalues are ordered from highest to lowest. We can see that the majority of theinformation is contained in a small set of vectors at the left side of the graph, with over 98% of the total

43

Figure 5.6: Compressibility analysis in the time domain. Black dots indicate data points.

sum contained in the first 10 vectors, showing that most of the perturbations can be represented as alinear combination of other basis vectors.

Figure 5.7: Eigenvalues of the model matrix X.

Absolute threshold

This threshold tabs drops candidate basis vectors whose orthogonalized norm is less than its value.Relative threshold is kept at trel = 10−3.

Data shows that increasing the absolute threshold will reduce the basis size up to a plateau value.After that value the basis size would only be reduced with a high increase in the absolute threshold, andthat would come along with a significant increase in the reconstitution error.

44

tolaabs q Eabs Aabs Arel(%) Mem Savings10−2 364 0.184 0.003 1.04% 92.64%10−1 238 0.185 0.003 1.22% 95.19%103 238 0.185 0.003 1.22% 95.19%

Table 5.6: Evaluating compressibility versus absolute threshold.

Relative Threshold

This threshold drops candidate basis vectors whose orthogonalized norm is less than trel×a, where a isthe greatest orthogonalized norm found during the RRQR process in a vector contained in Q. Absolutethreshold is kept at tabs = 1.

tolrel q Eabs Aabs Arel(%) Mem Savings10−6 363 0.186 0.003 0.71% 92.66%10−4 262 0.189 0.003 0.71% 94.71%10−2 83 0.194 0.003 3.33% 98.32%100 50 2.220 0.022 18.45% 98.99%

Table 5.7: Evaluating compressibility versus relative threshold.

This setting allows a more fine-tuned approach to controlling the basis size, compared to changingthe absolute threshold. The basis size steadily drops with the increase of the relative threshold setting,resulting in a bigger error, as less and less principal components of the data are considered. A goodcompromise between an acceptable error rate and the memory available for computation efforts can bemanaged by adjusting this setting.

We can see that in general the compression rates are quite high, and using a relative threshold oftrel = 10−2 we achieve a compression rate of around 98%. This allows us to solve big networks thatwould otherwise be computationally intractable.

5.6 Large Networks

In this section results for larger networks are presented and discussed. The presented data follows thescheme of Section 5.4, for examples ibmpg2t, ibmpg3t, and ibmpg6t.

The results profile is very similar to the obtained for the example ibmpg1t presented in Section 5.4.Uncompressed models for the larger examples were not computed as those simulations would take anunreasonable amount of time. All analysis were done using 1000 parameter settings. The averagerelative error for the compressed models is under 5%. We remind that the error of the compressedmodel can be adjusted in accordance to what was shown in Section 5.5.3. The bigger examples showhow scalable this approach is, with a speedup versus the Taylor Series approach of over 4x. Althoughthe Taylor Series approach is more accurate, it is clear that its use is impractical for the simulation ofthe dynamic behaviour of large networks, as the computational cost of the simulation quickly becomesunattainable.

It is noticeable that for the larger network, ibmpg6t, the sum of the costs of the set-up and evaluationstages for the proposed model seems to bring no advantage versus the standard approach. We shouldremind the reader that only 1000 parameter settings are computed in this example, and if a largernumber of settings is considered the high cost of the set-up stage will rapidly become manageable(e.g. for a reasonable amount of 10 000 settings, the total costs become approximately 2890364 for theTS approach and 870376 for the proposed approach, a speedup of 3,32x). If additional samples are

45

Basis (Mem.) Set Up (s) Eval. (s) Eabs Aabs Arel

Perturbation - - - 0.505 0.023 3.70%TS – (415MB) 632 18058 0.208 0.004 0.70%

SPARE Compressed 112 (572MB) 9922 996 0.208 0.005 3.50%OLA Compressed 108 (566MB) 23518 976 0.276 0.004 3.92%

SPARE Non-Comp. 19493 (24.74GB) 5567 1565 0.190 0.004 0.83%OLA Non-Comp. 19493 (24.74GB) 19410 1559 0.251 0.004 0.87%

Table 5.8: Results of time domain variability analysis (ibmpg2t: 192 parameters)




Table 5.9: Results of time domain variability analysis. Full model memory usage is 79GB. (ibmpg3t: 96parameters)




Table 5.10: Results of time domain variability analysis. Full model memory usage is 223GB. (ibmpg6t:144 parameters)

46

required for analysis, then the speedup will further increase.The example ibmpg2t also shows a behaviour described in Section 4.7.2, where the compressed

model can require less computational effort to solve, if the number of parameters is large enough. Forthis example, using 192 parameters, the ratio in Eq.(4.27) is smaller than 1, leading to a faster solvestage of the algorithm, while keeping memory usage low (savings of over 97%) and an acceptable errorof the same order of magnitude as the uncompressed model.

5.7 Achieving Faster Evaluation Times

The proposed algorithm efficiency can be dramatically increased by the use of more efficient algorithmsfor matrix manipulations [35]. The use of the BLAS package[36], namely the level 2 BLAS routines thatdeal with matrix-vector operations, will speed up the solve stage of the algorithm.

The BLAS routines are a considered a standard for matrix operations e.g. dot products, matrix-vectormultiplications, and matrix-matrix products[37]. BLAS implementations are highly optimized for speed,and often take into account the specificity of the hardware they run on. Those implementations areusually written in FORTRAN (as the original specification), in C, or directly in machine code, makingthem extremely efficient.

An experiment was prepared using example ibmpg2t, intending to compare the execution time ofthe SPARE approach while using the standard MATLAB matrix-vector operation algorithm and the BLASroutine, for the Evaluation stage. This, however, remains as a work in progress at this date, but a highspeedup factor is expected.

47

48

Chapter 6

Conclusion

During the course of the present work we developed a novel scheme that efficiently estimates the effectsof variability on large dynamic interconnects, a goal successfully completed as demonstrated by thepresented results. The presented scheme allows the analysis of large numbers of perturbation settingsin a very efficient fashion. A paper was submitted to the DATE 2016 Conference[11] and is under reviewat this time.

The proposed approaches are based on separating the problem in two stages: a first Set Upstage where the perturbations are approximated using a low order approximation and a compressedparametrized model of the network is generated, and a second Evaluation stage where the previouslygenerated model is used any number of times to simulate the network’s response to several parametersettings. The model generation is very efficient, requiring only one factorization of the nominal sys-tem matrix, that is reused. The presented scheme is also highly parallelizable, with full parallelizationachieved on the Evaluation stage.

The compressibility analysis revealed that the perturbed voltage vectors are in fact highly compress-ible, allowing the evaluation of larger networks in limited memory environments. The compression wasstudied only in the time and parameter space, but increased compression rates and faster set up timescould be achieved by applying MOR techniques on the original system matrix, as we are working underthe assumption that the voltages are spatially correlated thus allowing the compression on the spa-tial(nodal) dimension.

We have shown how the use of efficient algorithms for matrix manipulation can greatly change thespeed of the matrix-vector operations during the Evaluation stage, concluding that the use of a compiledlanguage with more efficient algebraic routines could increase the speedup rates that were presented inthe results.

The performance has been proved using a set of benchmark power grids, both in the achievedaccuracy versus the Taylor Series approach (with approximation errors under 5%, and manageablevia a trade-off with the compression ratio), and in the speedup rates verified during evaluation thatrapidly discount the cost of the set up. Improved results are expected if larger networks are used,especially if these networks are divided in a large amount of regions causing an increase in the numberof independent parameters, thus increasing the compressibility.

Future Work

There is still work to do in the development of efficient analysis methods for large power grids under theeffect of variations, as this problem continues to be a great challenge due to its size. At the moment, wesuggest a continuation of the present work by using two-stage schemes, as evaluating a parametrized

49

model as suggested is a task that albeit being computationally heavy can be done very efficiently. Thisleaves the generation of the parametrized model as a working topic.

There are several directions one can follow to enhance the model generation stage. Three possibili-ties would be:

1. Generating a smaller model, therefore reducing the evaluation time;

2. Generating a model in a faster fashion, while maintaining the approximation error low.

3. Generating a more complex model, increasing specification.

All seem as a probable goal for future work that is developed based around the presented anal-ysis flow. As new specifications for the model, adding packaging models leading to RLC analysis orintroducing nonlinear devices seem interesting options.

With that assumption in mind, we suggest an approach to generate a model applicable to nonlineardevices, based on support vector machines. Support vector machines (SVM) are machine learningmodels used to analyze data and recognize patterns, and have seen application in numerous areas, i.e.object detection, cancer diagnosis and signal processing [38].

A SVM-based approach could potentially provide enhancements in the generation of a compactrepresentation of the nonlinear system, and also function as a model order reduction tool. This processwould entail the projection of the system response using a kernel function, that would allow the nonlinearcharacteristics to be approximated, leading to the generation of a model similar to the presented inChapter 4.

50

Bibliography

[1] G. Moore, “Cramming more components onto integrated circuits,” Proceedings of the IEEE, vol. 86,pp. 82–85, Jan 1998.

[2] Work by Wgsimon used with permission under license (CC BY-SA 3.0)http://creativecommons.org/licenses/by/3.0/.

[3] WSTS August 2015 Press Release - https://www.wsts.org/content/download/3775/25690 , ac-cessed on 09/10/2015.

[4] The Electronic Design Automation Consortium - http://www.edac.org/initiatives/committees/mss ,accessed on 09/10/2015.

[5] Based on work by Peellden with permission under license (CC BY-SA 3.0)http://creativecommons.org/licenses/by/3.0/.

[6] C. Mack, Fundamental Principles of Optical Lithography: The Science of Microfabrication. Wiley,2008.

[7] Based on work by Cmglee with permission under license (CC BY-SA 3.0)http://creativecommons.org/licenses/by/3.0/.

[8] Based on work by Cepheiden with permission under license (CC BY-SA 2.5)http://creativecommons.org/licenses/by/2.5/.

[9] Intel - http://www.intel.com/content/dam/www/public/us/en/documents/pdf/foundry/mark-bohr-2014-idf-presentation.pdf , accessed on 09/10/2015.

[10] J. Owens, W. Dally, R. Ho, D. Jayasimha, S. Keckler, and L.-S. Peh, “Research challenges foron-chip interconnection networks,” Micro, IEEE, vol. 27, pp. 96–108, Sept 2007.

[11] DATE Conference - http://www.date-conference.com/ , accessed on 09/10/2015.

[12] S. Nassif, “Power grid analysis benchmarks,” in Design Automation Conference, 2008. ASPDAC2008. Asia and South Pacific, pp. 376–381, March 2008.

[13] N. Srivastava, X. Qi, and K. Banerjee, “Impact of on-chip inductance on power distribution networkdesign for nanometer scale integrated circuits,” in Quality of Electronic Design, 2005. ISQED 2005.Sixth International Symposium on, pp. 346–351, March 2005.

[14] A. V. Mezhiba and E. G. Friedman, “Inductive properties of high-performance power distributiongrids,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 10, no. 6, pp. 762–776, 2002.

51

[15] F. Alimenti, P. Mezzanotte, L. Roselli, and R. Sorrentino, “Modeling and characterization of thebonding-wire interconnection,” Microwave Theory and Techniques, IEEE Transactions on, vol. 49,no. 1, pp. 142–150, 2001.

[16] S. Kvatinsky, E. Friedman, A. Kolodny, and L. Schachter, “Power grid analysis based on a macrocircuit model,” in Electrical and Electronics Engineers in Israel (IEEEI), 2010 IEEE 26th Conventionof, pp. 708–712, Nov 2010.

[17] J. R. Gilbert, C. Moler, and R. Schreiber, “Sparse matrices in matlab: design and implementation,”SIAM Journal on Matrix Analysis and Applications, vol. 13, no. 1, pp. 333–356, 1992.

[18] E. G. Ng and P. Raghavan, “Performance of greedy ordering heuristics for sparse cholesky factor-ization,” SIAM Journal on Matrix Analysis and Applications, vol. 20, no. 4, pp. 902–914, 1999.

[19] T. Davis, Direct Methods for Sparse Linear Systems. Society for Industrial and Applied Mathemat-ics, 2006.

[20] N. Mi, S.-D. Tan, Y. Cai, and X. Hong, “Fast variational analysis of on-chip power grids by stochasticextended krylov subspace method,” Computer-Aided Design of Integrated Circuits and Systems,IEEE Transactions on, vol. 27, pp. 1996–2006, Nov 2008.

[21] W. Schilders, H. van der Vorst, and J. Rommes, Model Order Reduction: Theory, Research Aspectsand Applications. Mathematics in Industry, Springer Berlin Heidelberg, 2008.

[22] J. a. M. S. Silva, J. R. Phillips, and L. M. Silveira, “Efficient simulation of power grids,” Trans. Comp.-Aided Des. Integ. Cir. Sys., vol. 29, pp. 1523–1532, Oct. 2010.

[23] S. Nassif and J. Kozhaya, “Fast power grid simulation,” in Design Automation Conference, 2000.Proceedings 2000, pp. 156–161, 2000.

[24] H. Qian, S. Nassif, and S. Sapatnekar, “Power grid analysis using random walks,” Computer-AidedDesign of Integrated Circuits and Systems, IEEE Transactions on, vol. 24, pp. 1204–1224, Aug2005.

[25] P. Olver and C. Shakiban, Applied Linear Algebra. Pearson, 2005.

[26] D. S. Watkins, Fundamentals of Matrix Computations. New York, NY, USA: John Wiley & Sons,Inc., 1991.

[27] D. Coppersmith and S. Winograd, “Matrix multiplication via arithmetic progressions,” in Proceedingsof the nineteenth annual ACM symposium on Theory of computing, ACM, 1987.

[28] B. Boghrati and S. Sapatnekar, “Incremental solution of power grids using random walks,” in DesignAutomation Conference (ASP-DAC), 2010 15th Asia and South Pacific, pp. 757–762, Jan 2010.

[29] P. Sun, X. Li, and M.-Y. Ting, “Efficient incremental analysis of on-chip power grid via sparse ap-proximation,” in Proceedings of the 48th Design Automation Conference, DAC ’11, (New York, NY,USA), pp. 676–681, ACM, 2011.

[30] P. Ghanta, S. Vrudhula, R. Panda, and J. Wang, “Stochastic power grid analysis considering pro-cess variations,” in Design, Automation and Test in Europe, 2005. Proceedings, pp. 964–969 Vol.2, March 2005.

52

[31] J. Villena and L. Miguel Silveira, “Spare - a scalable algorithm for passive, structure preserving,parameter-aware model order reduction,” in Design, Automation and Test in Europe, 2008. DATE’08, pp. 586–591, March 2008.

[32] J. Fernandez Villena and L. Silveira, “Efficient analysis of variability impact on interconnect lines andresistor networks,” in Design, Automation and Test in Europe Conference and Exhibition (DATE),2014, pp. 1–6, March 2014.

[33] W. Gander, “Algorithms for the qr-decomposition,” Technical Report 80-02, Angewandte Mathe-matik, ETH, 1980.

[34] Y. P. Hong and C.-T. Pan, “Rank-revealing qr factorizations and the singular value decomposition,”Mathematics of Computation, vol. 58, no. 197, pp. pp. 213–232, 1992.

[35] “An updated set of basic linear algebra subprograms (blas),” ACM Trans. Math. Softw., vol. 28,pp. 135–151, June 2002.

[36] BLAS - http://www.netlib.org/blas/ , accessed on 09/10/2015.

[37] Robert A. van de Geijn - http://www.cs.utexas.edu/users/flame/books/ACMTOMS.pdf , accessedon 09/10/2015.

[38] L. Wang, Support Vector Machines: theory and applications, vol. 177. Springer Science & BusinessMedia, 2005.

53

Date post:	11-Jun-2020
Category:	Documents
Upload:	others
View:	5 times
Download:	0 times

Variability and Statistical Dynamic Analysis Flow for ...€¦ · Variability and Statistical...

Documents