1
Simulated Evolution Algorithm for Multiobjective VLSI
Netlist Bi-Partitioning
By
Dr Sadiq M. Sait
Dr Aiman El-Maleh
Raslan Al Abaji
King Fahd University
Computer Engineering Department
MS Thesis Presentation
2
Outline
• Introduction
• Problem Formulation
• Cost Functions
• Proposed Approaches
• Experimental Results
• Conclusion
3
VLSI Technology Trend

Design Characteristics      Key CAD Capabilities
0.06M, 2 MHz, 6 um          SPICE simulation
0.13M, 12 MHz, 1.5 um       CAE systems, silicon compilation
1.2M, 50 MHz, 0.8 um        HDLs, synthesis
3.3M, 200 MHz, 0.6 um       Top-down design, emulation
7.5M, 333 MHz, 0.25 um      Cycle-based simulation, formal verification

The challenges of sustaining such exponential growth toward gigascale integration have shifted, to a large degree, from process and manufacturing technology to design technology.
4
The VLSI Chip in 2006
• Technology: 0.1 um
• Transistors: 200 M
• Logic gates: 40 M
• Size: 520 mm2
• Clock: 2 - 3.5 GHz
• Chip I/Os: 4,000
• Wiring levels: 7 - 8
• Voltage: 0.9 - 1.2 V
• Power: 160 Watts
• Supply current: ~160 Amps
Tradeoffs: performance, power consumption, noise immunity, area, cost, time-to-market.
5
VLSI Design Cycle
The VLSI design process is carried out at a number of levels:
1. System Specification
2. Functional Design
3. Logic Design
4. Circuit Design
5. Physical Design
6. Design Verification
7. Fabrication
8. Packaging, Testing, and Debugging
6
Physical Design
Physical design converts a circuit description into a geometric description. This description is used to manufacture a chip.
The physical design cycle consists of:
1. Partitioning
2. Floorplanning and Placement
3. Routing
4. Compaction
7
Why do we need partitioning?
• Decomposition of a complex system into smaller subsystems.
• Each subsystem can be designed independently, speeding up the design process (divide-and-conquer approach).
• A complex IC is decomposed into a number of functional blocks, each designed by one engineer or a team of engineers.
• The decomposition scheme has to minimize the interconnections between subsystems.
8
Levels of Partitioning
• System level partitioning: System → PCBs
• Board level partitioning: PCBs → Chips
• Chip level partitioning: Chips → Subcircuits/Blocks
9
Classification of Partitioning Algorithms
• Group migration
  1. Kernighan-Lin
  2. Fiduccia-Mattheyses (FM)
  3. Multilevel K-way Partitioning
• Simulation based iterative
  1. Simulated annealing
  2. Simulated evolution
  3. Tabu Search
  4. Genetic
• Performance driven
  1. Lawler et al.
  2. Vaishnav
  3. Choi et al.
  4. Jun'ichiro et al.
• Others
  1. Spectral
  2. Multilevel Spectral
10
Related Previous Work
• 1969: Lawler et al. proposed a bottom-up approach (clustering) for delay optimization.
• 1998: Jun'ichiro et al. proposed a circuit partitioning algorithm under a path delay constraint. The algorithm consists of a clustering phase and an iterative improvement phase.
• 1999: Vaishnav et al. proposed an enumerative partitioning algorithm targeting low power. It enumerates alternate partitionings and selects one that has the same delay but less power dissipation (not feasible for very large circuits).
• 1999: Choi et al. proposed two low-power-oriented techniques based on the simulated annealing (SA) algorithm.
11
Motivation
Need for power optimization
• Portable devices.
• Power consumption is a hindrance to further integration.
• Increasing clock frequency.
Need for delay optimization
• In current submicron designs, wire delay tends to dominate gate delay. Larger die sizes imply longer on-chip global routes, which affect performance.
• Optimizing delay due to off-chip capacitance.
12
Objective
• Design a class of iterative algorithms for multiobjective VLSI partitioning.
• Explore partitioning from a wider angle and consider circuit delay, power dissipation, and interconnect at the same time, under a balance constraint.
13
Problem formulation
Objectives:
• Power cost is optimized AND
• Delay cost is optimized AND
• Cutset cost is optimized
Constraint:
• Partitions are balanced to within a given tolerance (10%).
14
Problem formulation
• The circuit is modeled as a hypergraph $H(V, E)$.
• $V = \{v_1, v_2, v_3, \ldots, v_n\}$ is a set of modules (cells).
• $E = \{e_1, e_2, e_3, \ldots, e_k\}$ is a set of hyperedges representing the signal nets; each net is a subset of $V$ containing the modules that the net connects.
• A two-way partitioning of a set of nodes $V$ determines two subsets $V_A$ and $V_B$ such that $V_A \cup V_B = V$ and $V_A \cap V_B = \emptyset$.
15
Cutset
• Based on the hypergraph model $H = (V, E)$
• Cost 1: $c(e) = 1$ if $e$ spans more than one block
• Cutset = sum of hyperedge costs
• Efficient gain computation and update
[Figure: example bipartition with cutset = 3]
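The cutset definition can be sketched in a few lines of Python. This is a minimal illustration, assuming nets are stored as sets of cell names and a dictionary `block_of` maps each cell to block 0 or 1. The function name `cutset_size` and the six-cell netlist are hypothetical and are not taken from the slides.

```python
from typing import Dict, List, Set


def cutset_size(nets: List[Set[str]], block_of: Dict[str, int]) -> int:
    """Count the hyperedges (nets) whose cells lie in more than one block."""
    cut = 0
    for net in nets:
        blocks = {block_of[cell] for cell in net}
        if len(blocks) > 1:  # c(e) = 1 when e spans more than one block
            cut += 1
    return cut


# Hypothetical netlist: five nets over six cells (illustrative only).
nets = [{"a", "b"}, {"b", "c", "d"}, {"d", "e"}, {"e", "f"}, {"a", "f"}]
block_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(cutset_size(nets, block_of))
```

Here the nets {b, c, d} and {a, f} span both blocks, so they are the only ones counted.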
16
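Delay Model
[Figure: circuit with SE1, SE2 and cells C1-C7 on Metal 1 and Metal 2, showing the cut line and the off-chip capacitance CoffChip]
Path: SE1 → C1 → C4 → C5 → SE2.
$$\mathrm{Delay} = CD_{SE1} + CD_{C1} + CD_{C4} + CD_{C5} + CD_{SE2}$$
$$CD_{C1} = BD_{C1} + LF_{C1} \times (C_{offchip} + CINP_{C2} + CINP_{C3} + CINP_{C4})$$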
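A small worked sketch of the delay model follows. It rests on assumptions: BD is read as a cell's base delay, LF as its load factor, and CINP as the input capacitance of a driven cell (the slide does not expand these abbreviations). All numeric values are made up for illustration.

```python
def cell_delay(base_delay, load_factor, driven_input_caps, offchip_cap=0.0):
    """CD = BD + LF * (C_offchip + sum of input capacitances of driven cells).

    offchip_cap is nonzero only when the cell's output net is cut.
    """
    return base_delay + load_factor * (offchip_cap + sum(driven_input_caps))


def path_delay(cell_delays):
    """Delay of a path is the sum of the delays of the cells on it."""
    return sum(cell_delays)


# Hypothetical values for cell C1 driving C2, C3 and C4.
cd_c1_uncut = cell_delay(1.0, 0.5, [0.2, 0.2, 0.2])
cd_c1_cut = cell_delay(1.0, 0.5, [0.2, 0.2, 0.2], offchip_cap=2.0)

# Path SE1 -> C1 -> C4 -> C5 -> SE2 with made-up delays for the other cells.
total = path_delay([0.4, cd_c1_cut, 1.1, 0.9, 0.4])
print(cd_c1_uncut, cd_c1_cut, total)
```

With these numbers, cutting the net driven by C1 raises its delay from about 1.3 to about 2.3, which is why the cut line matters for the critical path.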
17
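Delay
$$\mathrm{Delay}(P_i) = \sum_{cell \in P_i} \mathrm{Delay}(cell) + \sum_{net \in P_i} \mathrm{Delay}(net)$$
or, counting cell delays only,
$$\mathrm{Delay}(P_i) = \sum_{cell \in P_i} \mathrm{Delay}(cell)$$
$P_i$: any path between two cells or nodes.
$P$: set of all paths of the circuit.
Objective: $\max_{P_i \in P} \mathrm{Delay}(P_i)$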
18
Power
The average dynamic power consumed by a CMOS logic gate in a synchronous circuit is given by:
$$P_i^{average} = 0.5 \times \frac{V_{dd}^2}{T_{cycle}} \times C_i^{Load} \times N_i$$
$N_i$: the number of output gate transitions per cycle (switching probability).
$C_i^{Load}$: the load capacitance.
19
Power
$$C_i^{Load} = C_i^{basic} + C_i^{extra}$$
$C_i^{basic}$: load capacitances driven by a cell before partitioning.
$C_i^{extra}$: additional load due to off-chip capacitance (cut net).
Total power dissipation of a circuit:
$$P = 0.5 \times \frac{V_{dd}^2}{T_{cycle}} \sum_i \left(C_i^{basic} + C_i^{extra}\right) N_i$$
20
Power
Objective: $\sum_{i \in v} N_i$
$C_i^{extra} \gg C_i^{basic}$
$C_i^{extra}$: can be assumed identical for all nets.
$v$: set of visible gates driving a load outside the partition.
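The power model from the three Power slides can be sketched as follows. This is an illustration under assumptions: the total power is taken as the sum of the per-gate average power, and the reduced objective simply adds the switching probabilities of the visible gates. Function names and the example values are hypothetical.

```python
def average_power(vdd, t_cycle, c_load, n_switch):
    """Per-gate average dynamic power: 0.5 * Vdd^2 / Tcycle * C_load * N."""
    return 0.5 * vdd ** 2 / t_cycle * c_load * n_switch


def total_power(vdd, t_cycle, gates):
    """Sum of per-gate power; gates holds (c_basic, c_extra, n_switch) tuples."""
    return sum(average_power(vdd, t_cycle, c_basic + c_extra, n)
               for c_basic, c_extra, n in gates)


def power_objective(switching, visible_gates):
    """Reduced objective: sum of N_i over gates driving a load outside their block."""
    return sum(switching[g] for g in visible_gates)


# Made-up example: two gates, the second one drives a cut net.
gates = [(1.0, 0.0, 0.5), (1.0, 2.0, 0.5)]
print(total_power(1.0, 1.0, gates))

switching = {"g1": 0.2, "g2": 0.1, "g3": 0.4}
print(power_objective(switching, {"g1", "g3"}))
```

Because the extra capacitance is assumed identical for all cut nets and much larger than the basic load, only the switching activity of the visible gates distinguishes one partition from another.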
21
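Balance
Balance as a constraint is expressed as follows:
$$\frac{\mathrm{cells}}{\mathrm{blocks}} - \mathrm{Tol} \le \mathrm{Block}_i \le \frac{\mathrm{cells}}{\mathrm{blocks}} + \mathrm{Tol}$$
$$\mathrm{Tol} = \mathrm{PercentTol} \times \frac{\mathrm{cells}}{\mathrm{blocks}}$$
However, balance as a constraint is not appealing because it may prohibit many good moves.
Objective: $|\mathrm{Cells}(\mathrm{block}_1) - \mathrm{Cells}(\mathrm{block}_2)|$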
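Both views of balance, as a hard constraint and as an objective, can be sketched briefly. This assumes the tolerance is a fraction of the ideal block size (cells divided by blocks). The function names are hypothetical.

```python
def balance_tolerance(n_cells, n_blocks, percent_tol):
    """Tol = PercentTol * cells / blocks."""
    return percent_tol * n_cells / n_blocks


def is_balanced(block_sizes, percent_tol):
    """Constraint form: every block size lies within Tol of cells / blocks."""
    n_cells = sum(block_sizes)
    n_blocks = len(block_sizes)
    target = n_cells / n_blocks
    tol = balance_tolerance(n_cells, n_blocks, percent_tol)
    return all(target - tol <= size <= target + tol for size in block_sizes)


def balance_objective(size_block1, size_block2):
    """Objective form: absolute difference between the two block sizes."""
    return abs(size_block1 - size_block2)


# 100 cells, two blocks, 10% tolerance: sizes between 45 and 55 are allowed.
print(is_balanced([54, 46], 0.10), is_balanced([56, 44], 0.10))
print(balance_objective(54, 46))
```

The constraint form rejects the 56/44 split outright, while the objective form merely assigns it a worse value, so a search can pass through it on the way to a better solution.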
22
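Unifying Objectives: How?
• Weighted sum approach
$$\mathrm{Cost} = W_c \times \frac{\mathrm{CutsetCost}}{\mathrm{MaxCutset}} + W_d \times \frac{\mathrm{CircuitDelay}}{\mathrm{MaxDelay}} + W_p \times \frac{\mathrm{Power}}{\mathrm{MaxPower}}$$
1. Problems in choosing weights.
2. Need to tune for every circuit.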
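The sensitivity of the weighted sum to the choice of weights can be seen in a short sketch. The two candidate solutions and both weight vectors are made up for illustration. The function name is hypothetical.

```python
def weighted_sum_cost(cutset, delay, power, max_cutset, max_delay, max_power,
                      w_c, w_d, w_p):
    """Normalize each objective by its maximum, then combine with weights."""
    return (w_c * cutset / max_cutset
            + w_d * delay / max_delay
            + w_p * power / max_power)


# Two hypothetical solutions: A has a small cutset, B has a short delay.
maxima = dict(max_cutset=100, max_delay=10.0, max_power=50.0)
sol_a = dict(cutset=10, delay=8.0, power=25.0)
sol_b = dict(cutset=50, delay=3.0, power=25.0)

for weights in (dict(w_c=0.6, w_d=0.2, w_p=0.2),
                dict(w_c=0.2, w_d=0.6, w_p=0.2)):
    cost_a = weighted_sum_cost(**sol_a, **maxima, **weights)
    cost_b = weighted_sum_cost(**sol_b, **maxima, **weights)
    print(weights, "A" if cost_a < cost_b else "B", "is preferred")
```

The first weight vector prefers solution A and the second prefers B, which illustrates why the weights have to be tuned per circuit.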
23
Fuzzy logic for cost function
• Imprecise values of the objectives
  – best represented by linguistic terms, which are the basis of fuzzy algebra
• Conflicting objectives
• Operators for aggregating function
24
Use of fuzzy logic for multiobjective cost function
1. The cost to membership mapping.
2. Linguistic fuzzy rule for combining the membership values in an aggregating function.
3. Translation of the linguistic rule into appropriate fuzzy operators.
25
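Some fuzzy operators
• And-like operators
  – Min operator: $\mu = \min(\mu_1, \mu_2)$
  – And-like OWA: $\mu = \beta \times \min(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$
• Or-like operators
  – Max operator: $\mu = \max(\mu_1, \mu_2)$
  – Or-like OWA: $\mu = \beta \times \max(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$
where $\beta$ is a constant in the range [0, 1].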
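The two ordered weighted averaging (OWA) operators for a pair of memberships can be written directly from their definitions. In this sketch the blending constant is named `beta` (the symbol is not legible in the slide text, so the name is an assumption), and the membership values are arbitrary.

```python
def and_like_owa(mu1, mu2, beta):
    """beta * min + (1 - beta) * average of the two memberships."""
    return beta * min(mu1, mu2) + 0.5 * (1.0 - beta) * (mu1 + mu2)


def or_like_owa(mu1, mu2, beta):
    """beta * max + (1 - beta) * average of the two memberships."""
    return beta * max(mu1, mu2) + 0.5 * (1.0 - beta) * (mu1 + mu2)


# With beta = 1 the operators reduce to pure min and max;
# with beta = 0 both reduce to the plain average.
for beta in (0.0, 0.5, 1.0):
    print(beta, and_like_owa(0.2, 0.8, beta), or_like_owa(0.2, 0.8, beta))
```

Lower values of `beta` let a strong membership partly compensate for a weak one, which the strict min operator never allows.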
26
Membership functions
$O_i$ and $C_i$ are the lower bound and the actual cost of objective $i$.
$\mu_i(x)$ is the membership of solution $x$ in the set "good $i$".
$g_i$ is the relative acceptance limit for each objective.
27
Fuzzy linguistic rule
• A good partitioning can be described by the following fuzzy rule:
IF solution has
  small cutset AND
  low power AND
  short delay AND
  good balance
THEN it is a good solution.
28
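Fuzzy cost function
The above rule is translated to an AND-like OWA:
$$\mu(x) = \beta \times \min(\mu_C, \mu_P, \mu_D, \mu_B) + (1 - \beta) \times \frac{1}{4}(\mu_C + \mu_P + \mu_D + \mu_B)$$
$\mu(x)$ represents the total fuzzy fitness of the solution; our aim is to maximize this fitness.
$\mu_C, \mu_P, \mu_D, \mu_B$: cutset, power, delay, and balance fitness, respectively.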
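The chain from raw costs to a single fuzzy fitness can be sketched as below. The aggregation follows the AND-like OWA rule over the four memberships. The membership shape, however, is an assumption: the slides define the lower bound O_i, the actual cost C_i and the acceptance limit g_i but the curve itself is not recoverable, so a piecewise-linear function that falls from 1 at O_i to 0 at g_i * O_i is used here purely for illustration.

```python
def membership(cost, lower_bound, g):
    """Assumed piecewise-linear membership in the fuzzy set 'good' (requires g > 1).

    1.0 at or below the lower bound, 0.0 at or beyond g * lower_bound,
    linear in between. This shape is an illustrative assumption.
    """
    upper = g * lower_bound
    if cost <= lower_bound:
        return 1.0
    if cost >= upper:
        return 0.0
    return (upper - cost) / (upper - lower_bound)


def fuzzy_fitness(mu_c, mu_p, mu_d, mu_b, beta):
    """AND-like OWA over cutset, power, delay and balance memberships."""
    values = (mu_c, mu_p, mu_d, mu_b)
    return beta * min(values) + (1.0 - beta) * sum(values) / 4.0


# Made-up example: a cutset cost of 120 against a lower bound of 100, g = 2.
mu_c = membership(120, 100, 2.0)
print(mu_c, fuzzy_fitness(mu_c, 0.6, 0.9, 1.0, beta=0.7))
```

Because the result always lies in [0, 1], no per-circuit weight tuning of the kind needed by the weighted sum is required, only a choice of `beta` and the acceptance limits.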
29
Simulated Evolution Algorithm
Algorithm Simulated_Evolution
Begin
  Start with an initial feasible partition S
  Repeat
    Evaluation: evaluate the goodness Gi of all modules
    Selection:
      For each cell Vi Do
        If Random Rm > Gi Then select the cell
      End For
    Allocation:
      For each selected cell Vi Do
        Move the cell to the destination block
      End For
  Until stopping criteria are satisfied
  Return best solution
End
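The evaluation, selection and allocation loop can be sketched in runnable form. This is a simplified illustration, not the implementation behind the reported results: goodness is reduced to the cut goodness alone, the random threshold is uniform rather than Gaussian, allocation simply moves a selected cell to the other block, and `max_block_size` is a hypothetical stand-in for the balance constraint.

```python
import random


def is_cut(net, block_of):
    """A net is cut when its cells occupy more than one block."""
    return len({block_of[c] for c in net}) > 1


def simulated_evolution(nets, initial, max_block_size, iterations=100, seed=0):
    rng = random.Random(seed)
    block_of = dict(initial)
    nets_of = {c: [n for n in nets if c in n] for c in block_of}
    size = [0, 0]
    for b in block_of.values():
        size[b] += 1

    def cost():
        return sum(1 for n in nets if is_cut(n, block_of))

    best, best_cost = dict(block_of), cost()
    for _ in range(iterations):
        # Evaluation: fraction of each cell's nets that are not cut.
        goodness = {}
        for cell, incident in nets_of.items():
            cut = sum(1 for n in incident if is_cut(n, block_of))
            goodness[cell] = (len(incident) - cut) / len(incident) if incident else 1.0
        # Selection: cells with low goodness are more likely to be picked.
        selected = [c for c in block_of if rng.random() > goodness[c]]
        # Allocation: move each selected cell to the other block if it has room.
        for cell in selected:
            src, dst = block_of[cell], 1 - block_of[cell]
            if size[dst] < max_block_size:
                block_of[cell] = dst
                size[src] -= 1
                size[dst] += 1
        current = cost()
        if current < best_cost:
            best, best_cost = dict(block_of), current
    return best, best_cost


# Hypothetical netlist: two triangles joined by one net.
nets = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
initial = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
best, best_cost = simulated_evolution(nets, initial, max_block_size=4)
print(best, best_cost)
```

The best solution seen so far is kept separately because a SimE iteration may move several cells at once and temporarily worsen the cost.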
Simulated evolution implementation
• Cut goodness
• Power goodness
• Delay goodness
• The overall goodness is a fuzzy goodness.
31
Cut goodness
[Figure: cells 1-7 divided between Partition 1 and Partition 2]
$$gc_i = \frac{|d_i| - |w_i|}{|d_i|}$$
Example: $gc_5 = \frac{3 - 2}{3} = 0.33$
$d_i$: set of all nets, connected and not cut.
$w_i$: set of all nets, connected and cut.
32
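Power Goodness
[Figure: cells 1-7 divided between Partition 1 and Partition 2, annotated with the values 0.2, 0.1, 0.2, 0.3, 0.4, 0.1]
$V_i$ is the set of all nets connected to cell $i$, and $U_i$ is the set of all nets connected to cell $i$ and cut.
$$gp_i = \frac{\sum_{j \in V_i} S_j - \sum_{j \in U_i} S_j}{\sum_{j \in V_i} S_j}$$
Example: $gp_5 = \frac{0.7 - 0.4}{0.7} = 0.428$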
33
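Delay Goodness
[Figure: cells 1-7 divided between Partition 1 and Partition 2, placed between two flip-flops]
$K_i$: the set of cells in all paths passing through cell $i$.
$L_i$: the set of cells in all paths passing through cell $i$ that are not in the same block as $i$.
$$gd_i = \frac{|K_i| - |L_i|}{|K_i|}$$
Examples: $gd_5 = \frac{5 - 2}{5} = 0.6$ and $gd_4 = \frac{5 - 3}{5} = 0.4$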
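The three per-cell goodness measures share the same "total minus bad, over total" shape, which a short sketch makes explicit. The functions take the counts or sums directly, so they reproduce the worked examples on the slides without needing the figures. Reading S_j as the switching probability associated with net j is an assumption, and the function names are hypothetical.

```python
def cut_goodness(d, w):
    """gc = (d - w) / d, with d and w the counts used in the slide example."""
    return (d - w) / d


def power_goodness(switching_connected, switching_cut):
    """gp = (sum over V_i - sum over U_i) / sum over V_i."""
    return (switching_connected - switching_cut) / switching_connected


def delay_goodness(k, l):
    """gd = (|K_i| - |L_i|) / |K_i|."""
    return (k - l) / k


print(cut_goodness(3, 2))        # slide example gc_5
print(power_goodness(0.7, 0.4))  # slide example gp_5
print(delay_goodness(5, 2))      # slide example gd_5
print(delay_goodness(5, 3))      # slide example gd_4
```

Each value lies in [0, 1], and a cell whose nets or paths are entirely inside its own block scores 1, so it is unlikely to be selected for relocation.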
34
Final selection fuzzy rule
IF cell i is
  near its optimal cut-set goodness as compared to other cells AND
  near its optimal power goodness as compared to other cells AND
  near its optimal net delay goodness as compared to other cells
  OR T(max)(i) is much smaller than Tmax
THEN it has a high goodness.
35
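Fuzzy Goodness
$T_{max}$: delay of the most critical path in the current iteration.
$T_{max}(i)$: delay of the longest path traversing cell $i$.
$X_{path} = T_{max} / T_{max}(i)$
$$g_i = \mu_i(x) = \beta \times \min(\mu_i^C, \mu_i^P, \mu_i^D) + (1 - \beta) \times \frac{1}{3}(\mu_i^C + \mu_i^P + \mu_i^D)$$
$\mu_i^C, \mu_i^P, \mu_i^D$: cutset, power, and delay goodness, respectively.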
36
Selection implementation
• Biasless selection scheme.
• The goodness distribution among the cells is Gaussian, with mean $G_m$ and standard deviation $\sigma_G$.
• A random Gaussian number $R_m$ is generated with standard deviation $\sigma_R$.
• This eliminates cells with zero selection probability.
37
Selection implementation
• $R_m = G_m - \sigma_G$
• $\sigma_R = \sigma_G$
Selection rule:
if $R_m$ > Goodness(i) then select the cell.
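One way to read this selection scheme is sketched below. It assumes that a fresh Gaussian random number is drawn for every cell, with mean Gm minus the goodness standard deviation and with the same standard deviation as the goodness values, and that Gm and the deviation are estimated from the current goodness values. The function name `select_cells` is hypothetical.

```python
import random
import statistics


def select_cells(goodness, rng):
    """Select cells whose goodness falls below a Gaussian random threshold."""
    values = list(goodness.values())
    g_mean = statistics.mean(values)
    g_std = statistics.pstdev(values)
    r_mean = g_mean - g_std  # Rm = Gm - sigma_G
    r_std = g_std            # sigma_R = sigma_G
    selected = []
    for cell, g in goodness.items():
        threshold = rng.gauss(r_mean, r_std)
        if threshold > g:
            selected.append(cell)
    return selected


rng = random.Random(0)
counts = {"low": 0, "mid": 0, "high": 0}
for _ in range(2000):
    for cell in select_cells({"low": 0.1, "mid": 0.5, "high": 0.9}, rng):
        counts[cell] += 1
print(counts)
```

Because the threshold is unbounded, even a cell with high goodness keeps a small nonzero chance of being selected, which matches the stated aim of avoiding zero selection probability.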
38
Experimental Results
ISCAS 85-89 Benchmark Circuits
39
[Figure: SimE vs. TS vs. GA against time, circuit S13207]
40
Experimental Results: SimE vs. TS vs. GA
SimE results were better than those of TS and GA, with faster execution time.
41
Thank you.
Questions?
42
Evolutionary Heuristics for Multiobjective
VLSI Netlist Bi-Partitioning
by
Dr. Sadiq M. Sait
Dr. Aiman El-Maleh
Mr. Raslan Al Abaji
Computer Engineering Department
GA for multiobjective Partitioning
Algorithm Genetic_Algorithm
  Construct_Population(Np)
  For j = 1 to Np
    Evaluate_Fitness(Population[j])
  End For
  For i = 1 to Ng
    For j = 1 to No
      (x, y) ← Choose_Parents
      offspring[j] ← Crossover(x, y)
    End For
    Population ← Select(Population, offspring, Np)
    For k = 1 to Np
      Apply Mutation(Population[k])
    End For
  End For
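The GA loop can be sketched as a small runnable program. Several choices here are assumptions made for illustration: a chromosome is a list of 0/1 block labels indexed by cell, Np, Ng and No are read as population size, number of generations and number of offspring, the fitness is a simple stand-in (cutset plus size imbalance) rather than the fuzzy fitness, and `Select` keeps the fittest Np individuals.

```python
import random


def fitness(nets, chrom):
    """Stand-in fitness in (0, 1]: penalizes cut nets and size imbalance."""
    cut = sum(1 for net in nets if len({chrom[c] for c in net}) > 1)
    imbalance = abs(len(chrom) - 2 * sum(chrom))
    return 1.0 / (1.0 + cut + imbalance)


def genetic_algorithm(nets, n_cells, n_pop=8, n_gen=30, n_off=4,
                      p_mut=0.05, seed=0):
    rng = random.Random(seed)
    # Construct_Population
    population = [[rng.randint(0, 1) for _ in range(n_cells)]
                  for _ in range(n_pop)]
    for _ in range(n_gen):
        weights = [fitness(nets, ch) for ch in population]
        offspring = []
        for _ in range(n_off):
            # Choose_Parents: fitness-proportional (roulette wheel) choice.
            x, y = rng.choices(population, weights=weights, k=2)
            # Crossover: single cut point.
            point = rng.randrange(1, n_cells)
            offspring.append(x[:point] + y[point:])
        # Select: keep the n_pop fittest of parents and offspring.
        population = sorted(population + offspring,
                            key=lambda ch: fitness(nets, ch),
                            reverse=True)[:n_pop]
        # Mutation: flip each gene with a small probability.
        for chrom in population:
            for i in range(n_cells):
                if rng.random() < p_mut:
                    chrom[i] = 1 - chrom[i]
    return max(population, key=lambda ch: fitness(nets, ch))


# Hypothetical netlist over cells 0..5: two triangles joined by one net.
nets = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
best = genetic_algorithm(nets, n_cells=6)
print(best, fitness(nets, best))
```

Mutation is applied to the whole population as in the pseudocode, so the best individual of a generation can be lost; an implementation that wants to protect it would add explicit elitism.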
67
Solution representation
68
GA implementation
• Different population sizes
• Parent selection: roulette wheel
  – The probability of selecting a chromosome for crossover is
$$P_{choice}(x) = \frac{\mu(x)}{\sum_{i=1}^{N_p} \mu(i)}$$
where $N_p$ is the population size.
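Roulette wheel selection follows directly from the selection probability: each chromosome owns a slice of the wheel proportional to its fitness. A minimal sketch, with hypothetical function names and made-up fitness values:

```python
import random


def selection_probabilities(fitness_values):
    """P_choice(x) = mu(x) / sum of mu(i) over the population."""
    total = sum(fitness_values)
    return [mu / total for mu in fitness_values]


def roulette_pick(fitness_values, rng):
    """Spin the wheel once and return the index of the chosen chromosome."""
    spin = rng.random() * sum(fitness_values)
    running = 0.0
    for index, mu in enumerate(fitness_values):
        running += mu
        if spin < running:
            return index
    return len(fitness_values) - 1  # guard against floating-point round-off


fitness_values = [0.2, 0.3, 0.5]
print(selection_probabilities(fitness_values))
print(roulette_pick(fitness_values, random.Random(0)))
```

Fitter chromosomes are chosen more often, but every chromosome with nonzero fitness keeps some chance of becoming a parent.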
69
GA implementation
• Simple single-point crossover
• Selection before mutation:
  – Roulette wheel (rlt)
  – Elitism random (ernd)
70
Tabu Search
Algorithm Tabu_Search
  Start with an initial feasible solution S
  Initialize tabu list T and aspiration level AL
  For fixed number of iterations Do
    Generate neighbor solutions V* ⊂ N(S)
    Find best S* ∈ V*
    If move S to S* is not in T Then
      Accept move and update best solution
      Update T and AL
    Else If Cost(S*) < AL Then
      Accept move and update best solution
      Update T and AL
    End If
  End For
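A compact runnable sketch of this tabu search loop is given below. It makes several simplifying assumptions: the cost is the cutset only, a neighbor is produced by swapping one randomly chosen cell from each block (which keeps block sizes fixed), the tabu list stores one cell of each accepted swap, and the aspiration level is the best cost found so far. All names and parameter values are hypothetical.

```python
import random
from collections import deque


def cutset(nets, block_of):
    return sum(1 for net in nets if len({block_of[c] for c in net}) > 1)


def tabu_search(nets, initial, iterations=50, n_neighbors=4,
                tabu_size=2, seed=0):
    rng = random.Random(seed)
    current = dict(initial)
    best, best_cost = dict(current), cutset(nets, current)
    tabu = deque(maxlen=tabu_size)  # oldest entries drop out automatically
    for _ in range(iterations):
        block0 = sorted(c for c in current if current[c] == 0)
        block1 = sorted(c for c in current if current[c] == 1)
        # Generate neighbor solutions by trial swaps, then undo each swap.
        neighbors = []
        for _ in range(n_neighbors):
            a, b = rng.choice(block0), rng.choice(block1)
            current[a], current[b] = 1, 0
            neighbors.append((cutset(nets, current), a, b))
            current[a], current[b] = 0, 1
        cost, a, b = min(neighbors)  # best neighbor S*
        # Accept if the move is not tabu, or if it beats the aspiration level.
        if a not in tabu or cost < best_cost:
            current[a], current[b] = 1, 0
            tabu.append(a)
            if cost < best_cost:
                best, best_cost = dict(current), cost
    return best, best_cost


# Hypothetical netlist: two triangles joined by one net.
nets = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
initial = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
best, best_cost = tabu_search(nets, initial)
print(best, best_cost)
```

Unlike a greedy search, the best neighbor is accepted even when it is worse than the current solution, and the tabu list keeps the search from immediately undoing that move.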
71
TS implementation
• Neighbor solution: change the block of randomly selected cells.
• The tabu list size depends on the circuit size.
72
TS implementation
Tabu list
• Stores the index of one of the swapped cells.
• Various sizes for the tabu list.
Aspiration level
• The best neighbor is better than the global best.
73
Experimental Results
ISCAS 85-89 Benchmark Circuits
74
GA Vs Tabu Multi-objective
75
Circuit S13207 GA
76
Circuit S13207 TS
77
Circuit S13207 GA Vs TS time
78
Conclusion
• The present work successfully addressed the important issue of reducing power consumption and delay in VLSI circuits.
• The present work successfully formulated and provided solutions to the problem of multiobjective VLSI partitioning.
• The TS partitioning algorithm outperformed GA in terms of solution quality and execution time.
79
Thank you.