Date posted: 18-Dec-2015
Simulated Evolution Algorithm for Multi-Objective VLSI Netlist Bi-Partitioning
Sadiq M. Sait, Aiman El-Maleh, Raslan Al-Abaji
King Fahd University of Petroleum & Minerals
Dhahran, Saudi Arabia
27th May, ISCAS-2003, Bangkok, Thailand
Outline
Introduction
Problem Formulation
Cost Functions
Proposed Approach
Experimental Results
Conclusion
VLSI Technology Trends
Design Characteristics | Key CAD Capabilities
0.06M, 2 MHz, 6 um | SPICE Simulation
0.13M, 12 MHz, 1.5 um | CAE Systems, Silicon Compilation
1.2M, 50 MHz, 0.8 um | HDLs, Synthesis
3.3M, 200 MHz, 0.6 um | Top-Down Design, Emulation
7.5M, 333 MHz, 0.25 um | Cycle-Based Simulation, Formal Verification
The challenges of sustaining such fast growth toward giga-scale integration have shifted, to a large degree, from manufacturing process technology to design technology. New issues have also emerged.
VLSI Design Cycle
The VLSI design process comprises a number of levels:
1. System Specification
2. Functional Design
3. Logic Design
4. Circuit Design
5. Physical Design
6. Design Verification
7. Fabrication
8. Packaging, Testing and Debugging
Physical Design
What is Physical Design? A process that translates a structural (netlist) description into a geometric description that is used to manufacture a chip.
The physical design cycle consists of:
1. Partitioning
2. Floorplanning and Placement
3. Routing
4. Compaction
Why do we need partitioning?
Levels of Partitioning
System Level Partitioning: System → PCBs
Board Level Partitioning: PCBs → Chips
Chip Level Partitioning: Chips → Subcircuits/Blocks
Classification of Partitioning Algorithms
Group Migration:
1. Kernighan-Lin
2. Fiduccia-Mattheyses (FM)
3. Multilevel K-way Partitioning
Iterative Heuristics:
1. Simulated Annealing
2. Simulated Evolution
3. Tabu Search
4. Genetic Algorithm
Performance Driven:
1. Lawler et al.
2. Vaishnav
3. Choi et al.
4. Jun’ichiro et al.
Others:
1. Spectral
2. Multilevel Spectral
Related Previous Work
1969: A bottom-up approach for delay optimization (clustering) was proposed by Lawler et al.
1998: A circuit partitioning algorithm under path delay constraints was proposed by Jun’ichiro et al. The algorithm consists of clustering and iterative improvement phases.
1999: Two low-power-oriented techniques based on the simulated annealing (SA) algorithm were proposed by Choi et al.
1999: An enumerative partitioning algorithm targeting low power was proposed by Vaishnav et al. It enumerates alternate partitionings and selects one that has the same delay but less power dissipation.
Motivation & Objective
Need for power optimization:
• Portable devices
• Power consumption is a hindrance to further integration
• Increasing clock frequency
Need for delay optimization:
• In current submicron designs, wire delays tend to dominate gate delay
• Larger die sizes imply long on-chip wires, which affect performance
• Delay due to off-chip capacitance
Objectives: Power, delay, and cutset are optimized
Constraint: Balanced partitions (with some tolerance)
Problem Formulation
The circuit is modeled as a hypergraph $H(V,E)$, where $V = \{v_1, v_2, v_3, \ldots, v_n\}$ is a set of modules (cells) and $E = \{e_1, e_2, e_3, \ldots, e_k\}$ is a set of hyperedges representing the signal nets. Each net is a subset of $V$ containing the modules that the net connects.
A 2-way partitioning of a set of nodes $V$ determines subsets $V_A$ and $V_B$ such that $V_A \cup V_B = V$ and $V_A \cap V_B = \emptyset$.
Cutset
Based on the hypergraph model $H = (V, E)$
Cost: $c(e) = 1$ if $e$ spans more than 1 block
Cutset = sum of hyperedge costs
Efficient gain computation and update
(Figure: example partition with cutset = 3)
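The cutset definition above can be sketched in a few lines of Python. This is only an illustration: the netlist, the cell and net names, and the `cutset_size` helper are hypothetical and do not come from the slides.

```python
def cutset_size(nets, block_of):
    """Count the nets (hyperedges) whose cells lie in more than one block."""
    cut = 0
    for cells in nets.values():
        blocks = {block_of[c] for c in cells}
        if len(blocks) > 1:  # c(e) = 1 when e spans more than one block
            cut += 1
    return cut

# Hypothetical 6-cell netlist: each net maps to the cells it connects.
nets = {
    "n1": ["a", "b"],
    "n2": ["b", "c", "d"],
    "n3": ["c", "e"],
    "n4": ["d", "e", "f"],
    "n5": ["a", "f"],
}
# Hypothetical balanced bi-partition: block 0 and block 1.
block_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(cutset_size(nets, block_of))
```

Here nets n2, n3, and n5 each span both blocks, so they are the ones counted.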
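Delay
(Figure: circuit with sequential elements SE1 and SE2, cells C1 to C7, a cut line, the off-chip capacitance CoffChip, and Metal 1 / Metal 2 labels)
Path: SE1 → C1 → C4 → C5 → SE2
$$\mathrm{Delay} = CD_{SE1} + CD_{C1} + CD_{C4} + CD_{C5} + CD_{SE2}$$
$$CD_{C1} = BD_{C1} + LF_{C1} \times (C_{offchip} + CINP_{C2} + CINP_{C3} + CINP_{C4})$$
$$\mathrm{Delay}(P_i) = \sum_{cell \in P_i} \mathrm{Delay}(cell) + \sum_{net \in P_i} \mathrm{Delay}(net)$$
Objective: $\max_{P_i \in P} \mathrm{Delay}(P_i)$, the delay of the most critical path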
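As a rough illustration of the delay expressions above, the sketch below computes one cell delay from a base delay, a load factor, and the capacitances the cell drives, then sums cell delays along a path. All numeric values and the helper names `cell_delay` and `path_delay` are hypothetical; reading BD as a base delay and LF as a load factor is an assumption based on the symbol names.

```python
def cell_delay(base_delay, load_factor, load_caps):
    """CD = BD + LF * (sum of the capacitances driven by the cell)."""
    return base_delay + load_factor * sum(load_caps)

def path_delay(cell_delays):
    """Delay of a path taken as the sum of the delays of its cells."""
    return sum(cell_delays)

# Hypothetical values: C1 drives an off-chip capacitance of 4.0 plus three
# input capacitances of 1.0 each (standing in for C2, C3, C4).
cd_c1 = cell_delay(base_delay=1.0, load_factor=0.5, load_caps=[4.0, 1.0, 1.0, 1.0])

# Path SE1 -> C1 -> C4 -> C5 -> SE2 with made-up delays for the other cells.
critical_candidate = path_delay([1.0, cd_c1, 2.0, 2.0, 1.0])

# The delay objective looks at the slowest path among all paths considered.
other_path = path_delay([1.0, 2.0, 3.0, 1.0])
worst = max(critical_candidate, other_path)
print(cd_c1, critical_candidate, worst)
```

The example shows why a cut matters for delay: the off-chip capacitance enters the load of the driving cell and lengthens every path through it.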
Power
The average dynamic power consumed by a CMOS logic gate in a synchronous circuit is given by:
$$P_i^{average} = 0.5\,\frac{V_{dd}^2}{T_{cycle}}\,C_i^{Load}\,N_i$$
$N_i$ is the number of output gate transitions per cycle (switching probability).
Load capacitance = load capacitance before partitioning + load due to off-chip capacitance:
$$C_i^{Load} = C_i^{basic} + C_i^{extra}$$
Total power dissipation of a circuit:
$$P = \sum_{i} 0.5\,\frac{V_{dd}^2}{T_{cycle}}\,(C_i^{basic} + C_i^{extra})\,N_i$$
Objective: $\sum_{i \in v} N_i$
Weighted Sum Approach
$$\mathrm{Cost} = W_c\,\frac{\text{Cutset Cost}}{\text{MaxCutset}} + W_p\,\frac{\text{Power}}{\text{MaxPower}} + W_d\,\frac{\text{Delay\_Cost\_of\_Circuit}}{\text{MaxDelay}}$$
1. Problems in choosing weights
2. Need to tune for every circuit
Unifying Objectives by Fuzzy Logic
• Imprecise values of the objectives
• Best represented by linguistic terms that are the basis of fuzzy algebra
• Conflicting objectives
• Operators for aggregating function
Fuzzy Logic for Multi-Objective Function
1. The cost to membership mapping
2. Linguistic fuzzy rule for combining the membership values in an aggregating function
3. Translation of the linguistic rule into appropriate fuzzy operators
4. Fuzzy operators
• And-like operators: Min operator $\mu = \min(\mu_1, \mu_2)$
• And-like OWA: $\mu = \beta \cdot \min(\mu_1, \mu_2) + \frac{1}{2}(1-\beta)(\mu_1 + \mu_2)$
• Or-like operators: Max operator $\mu = \max(\mu_1, \mu_2)$
• Or-like OWA: $\mu = \beta \cdot \max(\mu_1, \mu_2) + \frac{1}{2}(1-\beta)(\mu_1 + \mu_2)$
where $\beta$ is a constant in the range [0,1]
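The ordered weighted averaging (OWA) operators listed above can be written directly as small functions. The sketch below generalizes the two-input form to any number of memberships by using their mean, which reduces to ½(μ1 + μ2) for two inputs. The function names and the sample values are illustrative assumptions.

```python
def and_like_owa(memberships, beta):
    """beta * min(...) + (1 - beta) * average of the memberships."""
    mean = sum(memberships) / len(memberships)
    return beta * min(memberships) + (1 - beta) * mean

def or_like_owa(memberships, beta):
    """beta * max(...) + (1 - beta) * average of the memberships."""
    mean = sum(memberships) / len(memberships)
    return beta * max(memberships) + (1 - beta) * mean

# Hypothetical memberships of one solution in two fuzzy sets.
mu = [0.5, 1.0]
print(and_like_owa(mu, beta=0.5))  # pulled toward the weaker objective
print(or_like_owa(mu, beta=0.5))   # pulled toward the stronger objective
print(and_like_owa(mu, beta=1.0))  # beta = 1 gives the pure min operator
```

With β = 1 the operators reduce to pure min and max; smaller β lets the other objectives soften a single poor or single good value.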
Membership Functions
$O_i$ and $C_i$ are the lower bound and actual cost of objective “i”.
$\mu_i(x)$ is the membership of solution x in the set “good ‘i’”.
$g_i$ is the relative acceptance limit for each objective.
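The slide text does not give the shape of the membership function, so the following is only one plausible reading, stated as an assumption: membership is 1 when the cost equals the lower bound, falls linearly as the ratio C_i / O_i grows, and reaches 0 once the ratio hits the acceptance limit g_i. The function name and the numbers are hypothetical.

```python
def membership(cost, lower_bound, accept_limit):
    """Assumed piecewise-linear membership in the fuzzy set "good i".

    cost         -- actual cost C_i of the objective
    lower_bound  -- lower bound O_i of the objective
    accept_limit -- relative acceptance limit g_i (assumed > 1)
    """
    ratio = cost / lower_bound
    if ratio <= 1.0:
        return 1.0  # at or below the lower bound: fully "good"
    if ratio >= accept_limit:
        return 0.0  # beyond the acceptance limit: not "good" at all
    return (accept_limit - ratio) / (accept_limit - 1.0)

# Hypothetical objective with lower bound 10 and acceptance limit 2.0.
for cost in (10, 15, 20):
    print(cost, membership(cost, lower_bound=10, accept_limit=2.0))
```

Under this assumed shape, g_i controls how far above the lower bound a cost may drift before the solution is considered unacceptable for that objective.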
Fuzzy Linguistic Rule & Cost Function
A good partitioning can be described by the following fuzzy rule:
IF solution has small cutset AND low power AND short delay AND good balance THEN it is a good solution
The above rule is translated to an AND-like OWA:
$$\mu(x) = \beta \cdot \min(\mu_C, \mu_P, \mu_D, \mu_B) + (1-\beta)\,\frac{1}{4}\,(\mu_C + \mu_P + \mu_D + \mu_B)$$
$\mu(x)$ represents the total fuzzy fitness of the solution; the aim is to maximize this fitness.
$\mu_C, \mu_P, \mu_D, \mu_B$ are, respectively, the cutset, power, delay, and balance fitness.
Simulated Evolution
Algorithm Simulated_Evolution
Begin
  Start with an initial feasible partition S
  Repeat
    Evaluation: Evaluate Gi (goodness) for all modules
    Selection: For each Vi (cell) DO
      begin
        if Random Rm > Gi then select the cell
      End For
    Allocation: For each selected Vi (cell) DO
      begin
        Move the cell to destination block.
      End For
  Until stopping criteria are satisfied.
  Return best solution.
End
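The evaluation, selection, and allocation loop above can be sketched as a small runnable program. This is a simplified illustration, not the authors' implementation: it uses only a cut-based goodness (fraction of a cell's nets that are not cut), moves every selected cell to the other block without trying alternatives, enforces balance with a hypothetical `min_block_size` parameter, and stops after a fixed number of iterations. The netlist and all names are made up.

```python
import random

def is_cut(cells, block_of):
    """A net is cut when its cells lie in more than one block."""
    return len({block_of[c] for c in cells}) > 1

def cutset_size(nets, block_of):
    return sum(1 for cells in nets.values() if is_cut(cells, block_of))

def cut_goodness(cell, nets, nets_of, block_of):
    """Fraction of the cell's nets that are not cut (1.0 if it has no nets)."""
    connected = nets_of.get(cell, [])
    if not connected:
        return 1.0
    cut = sum(1 for n in connected if is_cut(nets[n], block_of))
    return (len(connected) - cut) / len(connected)

def simulated_evolution(nets, initial, iterations=100, min_block_size=2, seed=0):
    rng = random.Random(seed)
    block_of = dict(initial)
    nets_of = {}
    for name, cells in nets.items():
        for c in cells:
            nets_of.setdefault(c, []).append(name)
    best, best_cost = dict(block_of), cutset_size(nets, block_of)
    for _ in range(iterations):  # stopping criterion: fixed iteration count
        # Evaluation: goodness of every cell in the current partition.
        goodness = {c: cut_goodness(c, nets, nets_of, block_of) for c in block_of}
        # Selection: a cell is selected when a random number exceeds its goodness.
        selected = [c for c in block_of if rng.random() > goodness[c]]
        # Allocation: move each selected cell if the balance limit allows it.
        for c in selected:
            source = block_of[c]
            source_size = sum(1 for b in block_of.values() if b == source)
            if source_size - 1 >= min_block_size:
                block_of[c] = 1 - source
        cost = cutset_size(nets, block_of)
        if cost < best_cost:
            best, best_cost = dict(block_of), cost
    return best, best_cost

# Hypothetical netlist and a deliberately poor initial bi-partition.
nets = {"n1": ["a", "b"], "n2": ["b", "c", "d"], "n3": ["c", "e"],
        "n4": ["d", "e", "f"], "n5": ["a", "f"]}
initial = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
best, best_cost = simulated_evolution(nets, initial)
print(cutset_size(nets, initial), best_cost, best)
```

Cells with goodness 1.0 are never selected, so well-placed cells stay put while poorly placed cells are perturbed more often. A fuller version would replace `cut_goodness` with a combined fuzzy goodness and would evaluate candidate destinations during allocation.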
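Cut Goodness
(Figure: cells 1 to 7 divided between Partition 1 and Partition 2)
$$gc_i = \frac{d_i - w_i}{d_i}$$
Example: $gc_5 = \frac{3 - 2}{3} = 0.33$
$d_i$: set of all nets connected to cell i, cut and not cut.
$w_i$: set of all nets connected to cell i and cut.
The formula uses the sizes of these sets.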
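Power Goodness
(Figure: cells 1 to 7 divided between Partition 1 and Partition 2, with labels 0.2, 0.1, 0.2, 0.3, 0.4, 0.1)
$V_i$ is the set of all nets connected to cell i, and $U_i$ is the set of all nets connected to cell i and cut.
$$gp_i = \frac{\sum_{j=1}^{k} S_j(V_i) - \sum_{j=1}^{k} S_j(U_i)}{\sum_{j=1}^{k} S_j(V_i)}$$
Example: $gp_5 = \frac{0.7 - 0.4}{0.7} = 0.428$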
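The power goodness can be sketched as follows, assuming that S_j is the switching probability of net j and that the sums run over the nets in V_i and U_i. The net names and the choice of which nets belong to cell 5 are hypothetical; only the totals 0.7 and 0.4 are taken from the worked example.

```python
def power_goodness(switching, cut_nets):
    """(sum of S over all connected nets - sum of S over cut nets) / sum over all.

    switching -- switching probability of each net connected to the cell (V_i)
    cut_nets  -- names of the connected nets that are cut (U_i)
    """
    total = sum(switching.values())
    cut_total = sum(switching[n] for n in cut_nets)
    return (total - cut_total) / total

# Hypothetical nets of cell 5: probabilities sum to 0.7, the cut net carries 0.4.
switching_5 = {"net_x": 0.3, "net_y": 0.4}
gp_5 = power_goodness(switching_5, cut_nets=["net_y"])
print(round(gp_5, 3))
```

A cell whose high-activity nets are cut gets a low power goodness, which makes it more likely to be selected for a move.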
Delay Goodness
(Figure: cells 1 to 7 and two flip-flops, divided between Partition 1 and Partition 2)
$K_i$: the set of cells in all paths passing through cell i.
$L_i$: the set of cells in all paths passing through cell i that are not in the same block as i.
$$gd_i = \frac{|K_i| - |L_i|}{|K_i|}$$
Examples: $gd_5 = \frac{5 - 2}{5} = 0.6$ and $gd_4 = \frac{5 - 3}{5} = 0.4$
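A minimal sketch of the delay goodness, assuming the caller already knows K_i, the cells on all paths through cell i. The cell labels and block assignment below are hypothetical and were chosen only so that the counts match the 5 and 2 of the first worked example.

```python
def delay_goodness(cell, path_cells, block_of):
    """(|K_i| - |L_i|) / |K_i| for one cell.

    path_cells -- K_i: cells on all paths passing through `cell`
    block_of   -- block assignment of every cell
    """
    k = set(path_cells)
    # L_i: members of K_i that sit in a different block than the cell itself.
    l = {c for c in k if block_of[c] != block_of[cell]}
    return (len(k) - len(l)) / len(k)

# Hypothetical assignment: cells 1, 2, 3 in block 0 and cells 4 to 7 in block 1.
block_of = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
# Hypothetical K_5 with five cells, two of which lie in the other block.
k_5 = [1, 2, 4, 6, 7]
gd_5 = delay_goodness(5, k_5, block_of)
print(gd_5)
```

The fewer path neighbours a cell has across the cut, the closer its delay goodness is to 1.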
Final Selection Fuzzy Rule
IF cell ‘i’ is
near its optimal cut-set goodness as compared to other cells AND
near its optimal power goodness as compared to other cells AND
(near its optimal net delay goodness as compared to other cells OR T(max)(i) is much smaller than Tmax)
THEN it has a high goodness.
Experimental Results
ISCAS 85-89 Benchmark Circuits
SimE versus Tabu Search & GA against time
Circuit: s13207
Experimental Results: SimE versus TS and GA
SimE results were better than TS and GA, with faster execution time.
Conclusion
The present work addressed the issue of partitioning VLSI circuits with the objective of reducing power and delay (in addition to nets cut).
Fuzzy logic was used to combine the multiple objectives.
Iterative algorithms (GA, TS, and SimE) were investigated and compared in terms of solution quality and run time.
SimE outperformed TS and GA in terms of solution quality and execution time.
Thank you
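Fuzzy Goodness
Tmax: delay of the most critical path in the current iteration.
T(max)(i): delay of the longest path traversing cell i.
Xpath = Tmax / T(max)(i)
$$g_i = \mu_i(x) = \beta \cdot \min(\mu_i^C, \mu_i^P, \mu_i^D) + (1-\beta)\,\frac{1}{3}\,(\mu_i^C + \mu_i^P + \mu_i^D)$$
$\mu_i^C, \mu_i^P, \mu_i^D$ are, respectively, the cutset, power, and delay goodness.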