3D IC Design Tools and Applications to Microarchitecture Exploration
Jason Cong
UCLA Computer Science Department
[email protected]
http://cadlab.cs.ucla.edu/~cong
Outline
• Thermal-Aware 3D IC Physical Design Flow
  • Thermal Models and Assumptions
  • 3D Routing with Thermal Via Planning
  • 3D Placement
  • 3D Floorplanning
• 3D Architecture Exploration
• 3D Component Modeling and Testing
• Concluding Remarks and Future Work
Thermal Challenges in 3-D ICs
Key Challenge of 3-D IC Design:
• Higher power density
• Inter-layer dielectric layers
High Temperature Effects:
• Longer interconnect delays
• Functional failure
Temperature increases dramatically along the z direction
[Figure: temperature distribution along the z direction across silicon layers Si 1 to Si 4; labeled temperatures are 30°C, 100°C, 135°C, and 150°C]
3-D IC Cooling Schemes
Heat Sink Optimization
• Air cooling fans
• Heat radiating fins
• Thermal grease, AC, etc.
Chip-Level Temperature Optimization
• Microchannel cooling
• Floorplanning
• Routing
• Thermal via insertion
Thermal-Aware 3D Physical Design Flow at UCLA (2002 – 2005)
[Flow diagram with these labeled elements: Netlist (LEF/DEF); Design constraints; Technology; OpenAccess; Thermal-Driven 3D Floorplanner; Thermal-Driven 3D Placement; Thermal-Aware 3D Router w/ Thermal Via Planning; Compact Thermal Model; Thermal Simulation; Parasitic Extraction; Timing Analysis; Layout Verification; CIF/GDSII]
10/8/2007 UCLA VLSICAD LAB 6
3D Physical Design Flow (IBM, UCLA, and PSU) (2006 – present)
[Flow diagram with these labeled elements: Layer & Design Rules (LEF); Cell & Via* definitions (LEF); Netlist (HDL or DEF); 3D OA holding Tech. Lib, Ref. Lib, and Design; Thermal-Driven 3D Floorplanner; Thermal-Driven 3D Placer; 3D Global Router; Thermal-Via Planner; Tier Export; Tier Import; 2D OA; Detailed Routing by Cadence Router; 3D RC extraction; Timing Interface; EinsTimer; 3D DRC & 3D LVS; Layout (GDSII); group labels PSU and UCLA]
Thermal Resistive Network [Wilkerson04]
• Circuit stack partitioned into tiles
• Tiles connected through thermal resistances
  • Lateral resistances: fixed
  • Vertical resistances ∝ 1/#via
• Heat sources modeled as current sources
  • Current value = power
• Heat sinks modeled as ground nodes
• Accurate and slow
[Figure: (a) tile stack array with lateral resistances R_lateral; (b) single tile stack with nodes 1 to 5, power sources P1 to P5, and resistances R1 to R5]
Thermal Resistive Chain Model
• One-dimensional heat flow analysis
• Elmore delay-like formula [Chiang01]

$$T_4 = \sum_{i=1}^{4} \Big( R_i \sum_{j=i}^{4} P_j \Big) = \sum_{i=1}^{4} \Big( P_i \sum_{j=1}^{i} R_j \Big)$$

• Fast and rough
• Reduce R: thermal via insertion (routing)
• Permute P: floorplanning
[Figure: single tile stack as a resistive chain with nodes 1 to 4, power sources P1 to P4, and resistances R1 to R4]
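The chain formula can be checked numerically with a minimal sketch. The function names and the sample values are illustrative assumptions, not part of the slides; index 0 holds R1 and P1, with R1 taken as the resistance next to the heat sink.

```python
def chain_top_temperature(resistances, powers):
    """Temperature rise of the top node of a thermal resistive chain.

    Grouped by resistor: each R_i carries the power of every node at or
    above it, so T = sum_i R_i * sum_{j >= i} P_j.
    """
    if len(resistances) != len(powers):
        raise ValueError("need one resistance and one power value per node")
    total = 0.0
    flow = sum(powers)  # heat crossing R_1 is the sum of all sources
    for r, p in zip(resistances, powers):
        total += r * flow
        flow -= p  # node i's own power does not cross R_{i+1}
    return total


def chain_top_temperature_by_source(resistances, powers):
    """Same quantity grouped by source: T = sum_i P_i * sum_{j <= i} R_j."""
    total = 0.0
    path_resistance = 0.0
    for r, p in zip(resistances, powers):
        path_resistance += r  # resistance between node i and the sink
        total += p * path_resistance
    return total


R = [1.0, 2.0, 3.0, 4.0]           # hypothetical values, K/W
hot_near_sink = [4.0, 3.0, 2.0, 1.0]  # hypothetical values, W
hot_far_from_sink = [1.0, 2.0, 3.0, 4.0]

print(chain_top_temperature(R, hot_near_sink))      # 35.0
print(chain_top_temperature(R, hot_far_from_sink))  # 65.0
```

Both groupings give the same value, and moving the high-power nodes closer to the sink lowers the result, which is the "permute P" lever; scaling any R_i down is the "reduce R" lever.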
Through-the-Silicon Vias (TS-Vias) in 3D ICs
• Effective in dissipating heat
  • Regular wires have almost no effect (size/direction)
• Two types of TS-vias
  • Signal TS-vias, part of the netlist
  • Thermal TS-vias, with no connections, introduced to reduce temperature
[Figure: cross-section of a 3D IC showing a pad, dielectric layers, metal routing layers, silicon device layers with Blocks 1 to 5, a thermal TS via, and a signal TS via]
Thermal-Aware 3D Routing Problem
Input
• 3-D floorplanning (placement) result
• Technology
• Netlist
• Required temperature, such as 80°C
Output
• Routed nets
• Thermal TS-via number and locations
Objectives
• Minimum wirelength
• Minimum TS-via number
Multilevel TS-Via Planning and 3D Routing (TMARS)
[Figure: multilevel V-cycle over G0, …, Gi, …, Gk, with a downward pass from level 0 through level i to level k and an upward pass back to level 0; labeled with the thermal resistive network model]
Step groups shown along the V-cycle:
• (1) Power density calculation, (2) heat flow estimation, (3) routing resource estimation
• (1) Power density coarsening, (2) heat flow estimation, (3) routing resource coarsening
• (1) Initial routing tree generation, (2) TTS via planning, (3) TTS via number adjustment
• (1) Routing refinement, (2) TTS via planning, (3) TTS via number adjustment
Thermal TS-Via Planning Problem
• Determines the thermal TS-via density for all tiles
  • Minimizing the total number of thermal TS-vias
  • Meeting capacity and temperature constraints
• Solved through
  • Via planning proportional to Δt (VPPT)
    • Δt: vertical temperature difference
  • Alternating direction via planning (ADVP)
[Figure: tile grid annotated with example via counts; Δt_a = t_a − t_b for vertically adjacent tiles a and b]
Thermal TS Via Planning [Cong & Zhang, ICCAD'05]
Non-Linear Programming Formulation
• Variable definitions, for tile L_{i,j,k}
  • a_{i,j,k}: TS-via number
  • R_{i,j,k}: vertical thermal resistance, R_{i,j,k} = γ / a_{i,j,k}
  • γ: constant
  • P_{i,j,k}: current source
  • t_{i,j,k}: temperature
  • I_{i,j,k}: heat flow
• Objective

$$\min\ \#\mathrm{total\_via} = \sum_{k=2}^{N} \sum_{i,j} a_{i,j,k}, \qquad a_{i,j,k} = \frac{\gamma\, I_{i,j,k}}{t_{i,j,k} - t_{i,j,k-1}}$$

• Constraints
  • Capacity constraint
  • Temperature constraint
  • Kirchhoff's current law
• Constrained NLP
  • Can be solved by a general NLP solver
  • But very time consuming
[Figure: tile stack with vertical resistances R_{i,j,k} = γ / a_{i,j,k}, a fixed resistance R, heat flows I_{i,j,k}, and temperatures t_{i,j,k}]
Alternating Direction TS-Via Planning (ADVP)
• Decompose the NLP into simplified sub-problems
  • Optimizing the via distribution in one direction at a time
  • Alternating between vertical via planning and horizontal via planning at each level
  • Updating the heat flow after every step
Vertical TS-Via Planning
• Resistive network → resistive chain
• NLP → convex programming
  • Solvable by any convex programming tool
• Theorem (no capacity constraint): the TS-via number is proportional to the square root of Δt

$$a_4 : a_3 : a_2 = \sqrt{\Delta t_4} : \sqrt{\Delta t_3} : \sqrt{\Delta t_2}$$

• VPPT

$$a_4 : a_3 : a_2 = \Delta t_4 : \Delta t_3 : \Delta t_2$$

[Figure: resistive chain with nodes 1 to 4, heat flows I1 to I4, resistance R1, and via-dependent resistances R2 = γ/a2, R3 = γ/a3, R4 = γ/a4]
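A minimal sketch of why a square-root rule arises on a chain, under assumptions that are not stated on the slide: heat flows I_i are held fixed, via counts are treated as continuous, there is no capacity constraint, and only the via-dependent resistances γ/a_i contribute to the temperature budget. Minimizing Σ a_i subject to Σ γ·I_i/a_i ≤ budget with a Lagrange multiplier gives a_i ∝ √I_i. If the initial vertical resistances are equal, the initial Δt of each tile is proportional to I_i, which is one reading consistent with the theorem above. Function names and numbers are hypothetical.

```python
import math


def optimal_chain_vias(heat_flows, gamma, temp_budget):
    """Continuous via counts minimizing sum(a) s.t. sum(gamma*I/a) <= budget.

    Stationarity of the Lagrangian gives a_i proportional to sqrt(I_i);
    the scale factor makes the temperature constraint tight.
    """
    root_sum = sum(math.sqrt(i) for i in heat_flows)
    scale = gamma * root_sum / temp_budget
    return [scale * math.sqrt(i) for i in heat_flows]


def proportional_chain_vias(heat_flows, gamma, temp_budget):
    """Baseline: via counts directly proportional to heat flow, same budget."""
    scale = gamma * len(heat_flows) / temp_budget
    return [scale * i for i in heat_flows]


def chain_temperature_drop(via_counts, heat_flows, gamma):
    """Total temperature drop across the via-dependent resistances."""
    return sum(gamma * i / a for a, i in zip(via_counts, heat_flows))


flows = [1.0, 4.0, 9.0]  # hypothetical heat flows through R2, R3, R4
best = optimal_chain_vias(flows, gamma=2.0, temp_budget=10.0)
base = proportional_chain_vias(flows, gamma=2.0, temp_budget=10.0)
print(sum(best), sum(base))  # the square-root rule needs fewer vias
```

Both allocations meet the same temperature budget; the square-root allocation uses fewer vias in total, which follows from the Cauchy-Schwarz inequality.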
Horizontal TS-Via Planning
• Still an NLP
• Further simplification
  • TTS via number given
  • Even out Δt in one layer
  • TS-via number proportional to the vertical heat flow I_{i,j,k}
• Fast heat flow estimation
  • Through path counting
  • Error can be corrected by the accurate model
[Figure: tile in layer k with power P_{i,j,k} and heat flows I_{i,j,k+1} and I_{i,j,k}; example paths numbered 1 to 5]
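The proportional rule for one layer can be sketched as follows. This is an illustration only: the function name, the integer rounding scheme (largest remainder), and the sample numbers are assumptions, and the slides do not say how rounding is handled.

```python
def distribute_vias(total_vias, heat_flows):
    """Split an integer via budget across tiles in proportion to heat flow."""
    total_flow = sum(heat_flows)
    if total_flow <= 0:
        raise ValueError("heat flows must have a positive sum")
    shares = [total_vias * f / total_flow for f in heat_flows]
    counts = [int(s) for s in shares]  # floor, since shares are non-negative
    # Hand out the vias lost to rounding, largest fractional part first.
    leftovers = total_vias - sum(counts)
    by_fraction = sorted(range(len(shares)),
                         key=lambda idx: shares[idx] - counts[idx],
                         reverse=True)
    for idx in by_fraction[:leftovers]:
        counts[idx] += 1
    return counts


print(distribute_vias(10, [1.0, 2.0, 3.0, 4.0]))  # [1, 2, 3, 4]
```

The total always equals the given budget, which matches the "TTS via number given" assumption of the horizontal step.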
Experiment Setup
• Four-layer 3D floorplanning results from 3DFP [ICCAD04]
  • MCNC and GSRC floorplanning benchmarks
  • Power density: random values (10^5 to 10^7 W/m^2)
  • Required temperature: 77°C
Benchmark characteristics
circuit  block #  net #  Init Temp (°C)
ami33    33       123    298.8
ami49    49       408    210.7
n100     100      885    275.3
n200     200      1585   311.2
n300     300      1893   290.2
Experimental Results: Temperature Reduction
• With thermal via insertion, temperature can be reduced to the required temperature (77°C)
• Thermal via insertion can reduce the maximum on-chip temperature by over 40%
[Bar chart: T (°C), 0 to 350, for ami33, ami49, n100, n200, n300; series: input, after routing, with thermal via insertion]
Temperature Maps of ami33 Top Layer
[Figure: temperature maps before and after thermal via insertion; legend bands span 152 to 158 before insertion and 63 to 77 after insertion]
Experimental Results: Different TS-Via Planners
• All can reach the required temperature
• m-ADVP
  • 11% reduction over flat ADVP
  • 68% reduction over TS-via insertion by temperature (m-VPPT)
  • 3.5x reduction over even TS-via distribution
[Bar chart: normalized TS-via number, 0 to 8, for ami33, ami49, n100, n200, n300; series: m-ADVP, f-ADVP, m-VPPT, even]
Experimental Results: Final Routing Results
• Completion rates: m-ADVP: 96.9%, m-VPPT: 93.7%, even: 73.44%
• Normalized runtime: m-ADVP: 1.0, m-VPPT: 1.49, even: 3.8
[Bar charts for ami33, n100, n300: completion rates (0.5 to 1) and runtime in seconds (0 to 10); series: m-ADVP, m-VPPT, even]
Outline
• Thermal-Aware 3D IC Physical Design Flow
  • Thermal Models and Assumptions
  • 3D Routing with Thermal Via Planning
  • 3D Placement
  • 3D Floorplanning
• 3D Architecture Exploration
• 3D Component Modeling and Testing
• Concluding Remarks and Future Work
2D to 3D Transformation by Local Stacking
For a 3D chip with K device layers, each with area A:
1. 2D placement on area K*A
2. Shrink: $(x_i, y_i) \rightarrow (x_i/\sqrt{K},\ y_i/\sqrt{K})$
3. Tetris-style 3D legalization
   • Cost R = αd + βv + γt
   • Minimize displacement, #via, and thermal cost
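The shrink step can be sketched in a few lines. An area of K*A maps onto a footprint of area A when each coordinate is divided by √K; the function name and the dictionary layout are illustrative assumptions.

```python
import math


def shrink_placement(cells, num_layers):
    """Scale 2D coordinates by 1/sqrt(K) so area K*A maps onto footprint A.

    cells maps a cell name to its (x, y) position in the 2D placement.
    """
    factor = 1.0 / math.sqrt(num_layers)
    return {name: (x * factor, y * factor) for name, (x, y) in cells.items()}


# Hypothetical cell on a 4-layer design: coordinates are halved.
print(shrink_placement({"u1": (8.0, 2.0)}, num_layers=4))  # {'u1': (4.0, 1.0)}
```

After shrinking, cells overlap roughly K deep, which is what the legalization step then resolves by assigning layers.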
2D to 3D Transformation by Folding
• Layer assignment and location mapping according to the folded order
[Figures: Folding-2 and Folding-4]
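One possible reading of a two-way fold, as a sketch only. The slides do not give the mapping, so the fold axis (the vertical midline), the layer numbering, and the function name are all assumptions made for illustration.

```python
def fold_in_two(cells, width):
    """Fold a 2D placement of the given width about its vertical midline.

    Returns {name: (x, y, layer)}. Cells left of the fold stay on layer 1;
    cells right of it are mirrored onto layer 2 (assumed convention).
    """
    half = width / 2.0
    folded = {}
    for name, (x, y) in cells.items():
        if x <= half:
            folded[name] = (x, y, 1)
        else:
            folded[name] = (width - x, y, 2)
    return folded


# Hypothetical cells on a placement of width 10.
print(fold_in_two({"a": (2.0, 3.0), "b": (9.0, 3.0)}, width=10.0))
# {'a': (2.0, 3.0, 1), 'b': (1.0, 3.0, 2)}
```

Under this mapping, cells that were neighbors within one half keep their relative positions, and only connections that cross the fold line span two layers.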
Window-based Stacking / Folding
1. Divide the 2D placement into N×N windows
2. Apply stacking or folding within each window
The effect of stacking or folding is spread out, and trade-offs are achieved by varying N
3D Placement via Transformation
Features
• Existing well-performing 2D placers can be reused
• Simple but effective transformation heuristics
• Trade-off between wirelength and #via to adapt to different manufacturing capabilities
• Refinement through RCN graph
[Flow diagram: 2D wirelength- and/or thermal-driven placement; 2D to 3D transformation; layer reassignment through RCN graph; 2D detailed placement for each layer; annotated with a fast thermal model and an accurate thermal model]
3D Placement Results (1/2)
• Wirelength (stacking) compared to 2D mPL5
• Wirelength vs. # TS via trade-offs
circuit  2D mPL5   T3Place
ibm01    5.19E+06  2.51E+06
ibm02    1.44E+07  6.95E+06
ibm03    1.37E+07  6.67E+06
ibm04    1.67E+07  8.21E+06
ibm05    4.23E+07  1.94E+07
ibm06    2.20E+07  1.09E+07
ibm07    3.73E+07  1.90E+07
ibm08    3.94E+07  1.98E+07
ibm09    3.46E+07  1.78E+07
ibm10    6.82E+07  3.61E+07
ibm11    5.02E+07  2.51E+07
ibm12    7.58E+07  3.78E+07
ibm13    6.58E+07  3.30E+07
ibm14    1.42E+08  7.40E+07
ibm15    1.65E+08  8.42E+07
ibm16    2.04E+08  1.06E+08
ibm17    3.05E+08  1.60E+08
ibm18    2.43E+08  1.28E+08
avg.     1         0.5
[Plot: number of TS vias (0 to 8.00E+04) vs. wirelength (2.00E+07 to 4.50E+07); series: folding + sequential, stacking + sequential, folding + symmetric, stacking + symmetric]
3D Placement Results (2/2)
Effect of temperature optimization
Column groups: LST, r = 10% | LST, r = 10%, w/ temp optimization
circuit  Temp. (°C)  WL        via #   Temp. (°C)
ibm01    276.5       2.81E+06  19020   159.8
ibm03    196.7       7.13E+06  31780   121.6
ibm04    159.6       9.11E+06  40219   96.0
ibm06    160.4       1.23E+07  50576   103.5
ibm07    107.5       2.01E+07  69111   66.4
ibm08    97.7        2.05E+07  75397   63.2
ibm09    96.1        1.94E+07  78102   60.6
ibm13    249.3       3.47E+07  127520  156.2
ibm15    136.5       8.58E+07  260681  90.1
ibm18    89.4        1.31E+08  332012  58.7
Avg.     1.0         1.08      1.06    0.63
Analytical Engine for 3D Placement
• Variables (x_i, y_i, z_i), i = 1, 2, …, n: cell i is placed at (x_i, y_i) on the tier z_i
• Discrete tier assignment (legalized solution)
• Relaxed tier assignment (intermediate solution)
Analytical Engine
• Formulate the 3D placement problem as continuous optimization

$$\text{minimize} \sum_e WL_e(x, y, z) \quad \text{subject to (no overlap between cells)}$$

• Discrete tier assignment (legalized solution)
• Relaxed tier assignment (intermediate solution)
Non-overlap Constraints
• Relaxed by area density constraints
  • Divide the placement region into bins
  • Measure the overflow of bin area to capture cell overlaps
    • Cell overlaps in overflow bins violate density constraints
    • Cell overlaps not in overflow bins do not violate density constraints
Non-overlap Constraint
• Replaced by area density constraint
  • Divide the placement region into bins
  • Measure the overflow of bin area to capture cell overlaps

$$\text{minimize} \sum_e WL_e(x, y, z) \quad \text{subject to (no overlap between cells)}$$

becomes

$$\text{minimize} \sum_e WL_e(x, y, z) \quad \text{subject to } A_{i,j,k}(x, y, z) \le C_{i,j,k} \text{ for all } i, j, k$$
Non-overlap Constraint
• Replaced by area density constraint
  • Divide the placement region into bins
  • Measure the overflow of bin area to capture cell overlaps

$$\text{minimize} \sum_e WL_e(x, y, z) \quad \text{subject to (no overlap between cells)}$$

becomes, after adding filler cells [Chan et al., ISPD'06],

$$\text{minimize} \sum_e WL_e(x, y, z) \quad \text{subject to } A_{i,j,k}(x, y, z) = C_{i,j,k} \text{ for all } i, j, k$$
Non-overlap Constraint
• Replaced by area density constraint
  • Divide the placement region into bins
  • Measure the overflow of bin area to capture cell overlaps

$$\text{minimize} \sum_e WL_e(x, y, z) \quad \text{subject to } A_{i,j,k}(x, y, z) = C_{i,j,k} \text{ for all } i, j, k$$

is relaxed to the penalized form [Nam & Cong, Springer'07] [Cong & Luo, ISPD'08]

$$\text{minimize} \sum_e WL_e(x, y, z) + \frac{\mu}{2} \sum_k \sum_{i,j} \big( A_{i,j,k}(x, y, z) - C_{i,j,k} \big)^2$$

• Increase μ until overlaps are removed
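A minimal sketch of evaluating a quadratic-penalty objective of this kind: total wirelength plus (μ/2) times the squared difference between each bin's area and its capacity. The function name, the flattened bin lists, and the numbers are illustrative assumptions; a real placer would also need gradients and a solver, which are omitted here.

```python
def penalized_cost(wirelength, bin_areas, bin_capacities, mu):
    """Wirelength plus (mu / 2) * sum of squared bin density violations.

    bin_areas and bin_capacities are flat lists over all (i, j, k) bins.
    """
    violation = sum((a - c) ** 2 for a, c in zip(bin_areas, bin_capacities))
    return wirelength + 0.5 * mu * violation


# Hypothetical two-bin example: one bin is over capacity, one is under.
areas = [3.0, 1.0]
capacities = [2.0, 2.0]
for mu in (4.0, 8.0):
    print(mu, penalized_cost(100.0, areas, capacities, mu))
# 4.0 104.0
# 8.0 108.0
```

As μ grows, the same density violation costs more, so the minimizer is pushed toward placements where every bin's area matches its capacity.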
Non-overlap Constraint
• Replaced by area density constraint
  • Divide the placement region into bins
  • Measure the overflow of bin area to capture cell overlaps
  • Area projection to obtain bin densities from the intermediate solution
Area Projection
• Bell-shaped function to project area

$$\eta(k, z) = \begin{cases} 1 - 2(z - k)^2 & |z - k| \le 1/2 \\ 2(|z - k| - 1)^2 & 1/2 < |z - k| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

η(k, z): the projection ratio from “tier z” to tier k
Area Projection
• Bell-shaped function to project area

$$\eta(k, z) = \begin{cases} 1 - 2(z - k)^2 & |z - k| \le 1/2 \\ 2(|z - k| - 1)^2 & 1/2 < |z - k| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

• An example: intermediate placement of a cell at “tier 2.316”
  • Projects 0% area to tier 1
  • Projects 80% area to tier 2
  • Projects 20% area to tier 3
  • Projects 0% area to tier 4
[Figure: curves η(1, z), η(2, z), η(3, z), η(4, z) with the projected ratios 0%, 80%, 20%, 0% marked]
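The bell-shaped projection and the “tier 2.316” example can be reproduced with a short sketch; the function name is an illustrative choice.

```python
def projection_ratio(k, z):
    """Fraction of a cell's area at continuous tier z projected onto tier k."""
    d = abs(z - k)
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (d - 1.0) ** 2
    return 0.0


ratios = [projection_ratio(k, 2.316) for k in (1, 2, 3, 4)]
print([round(r, 2) for r in ratios])  # [0.0, 0.8, 0.2, 0.0]
```

For any z between two adjacent integer tiers, the two nonzero ratios sum to 1, so the projection conserves the cell's area while staying smooth in z.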
Equivalence to Non-overlap Constraint
• Area projection to tiers is not enough
  • Counterexample: projected area fails to capture illegality
  • Solution: area projection on pseudo-tiers
[Figure: counterexample placement; label: overflow]
Equivalence to Non-overlap Constraint
• Theorem: (x, y, z) satisfy the constraints

$$\begin{cases} A_{i,j,k}(x, y, z) = C_{i,j,k} \\ A'_{i,j,k}(x, y, z) = C'_{i,j,k} \end{cases} \quad \text{for all } i, j, k$$

  iff (x, y, z) is a legal placement (no overlaps)*
* after adding
Multilevel Framework
[Figure: multilevel V-cycle; legend: level at which the analytical engine is applied, coarsening (C), interpolation (I)]
Experimental Results (1/2)
• Comparison of trade-off curves (ibm13)
  • 19% shorter WL and 9% fewer TSV than [comparison point shown only in the figure]
  • 15% shorter WL and 43% fewer TSV than [comparison point shown only in the figure]
• (consistent behavior on other circuits)
[Plot: trade-off curves for ibm13; one curve is labeled “Trans.”]
Outline
• Thermal-Aware 3D IC Physical Design Flow
  • Thermal Models and Assumptions
  • 3D Routing with Thermal Via Planning
  • 3D Placement
  • 3D Floorplanning
• 3D Architecture Exploration
• 3D Component Modeling and Testing
• Concluding Remarks and Future Work
Thermal-Aware 3D Floorplanning [ICCAD04]
• First work in this field
• Simulated Annealing (SA) engine
  • New local z-neighbor operations
  • Cost function

$$cost = \alpha \cdot nwl + \beta \cdot narea + \gamma \cdot nvc + \eta \cdot c_T$$

    • nwl: normalized wirelength
    • narea: normalized chip area
    • nvc: normalized interlayer via number
    • c_T: temperature cost
• Hybrid thermal evaluation
  • At each move: uses the simplified chain model
  • At each SA temperature drop: uses the resistive network model
[Figure: example 3D floorplan with blocks a to k on layers L1, L2, and L3]
Temperature/Runtime Tradeoff
• 3DFP-T can reduce the temperature by 56% with 9.7x runtime
• 3DFP-T-Fast can reduce the temperature by 40% with 1.8x runtime
• 3DFP-T-Hybrid can reduce the temperature by 50% with 3.2x runtime
• Wirelength increase is less than 6%
[Plot: normalized temperature (0 to 1.2) vs. normalized runtime (0 to 15) for 3DFP, 3DFP-T, 3DFP-T-Fast, and 3DFP-T-Hybrid]
Detailed Simulation Result
[Figure: thermal maps without thermal optimization and with thermal optimization]
• ami33 benchmark with 33 blocks and 4 layers
• Generated by an FEM-based thermal simulation tool (CFD-ACE+)
3D Floorplanning with Folded Blocks
Exploring vertical integration in microprocessor design requires considering both physical design and architecture.
• True 3D packing
• Architectural alternative selection
  • The number of layers in folded blocks
  • The partitioning method: block folding or port partitioning
3D Architectural Blocks – Issue Queue
• Block folding
  • Fold the entries and place them on different layers
  • Effectively shortens the tag lines
• Port partitioning
  • Place tag lines and ports on multiple layers, thus reducing both the height and width of the ISQ
  • The reduction in tag and matchline wires can help reduce both power and delay
• Benefits from block folding
  • Maximum delay reduction of 50%, maximum area reduction of 90%, and maximum reduction in power consumption of 40%
[Figure: (a) 2D issue queue with 4 taglines; (b) block folding; (c) port partitioning]
3D Architectural Blocks – Caches
[Figure: single-layer design, wordline folding, and port partitioning]
• 3D-CACTI: a tool to model 3D caches for area, delay, and power
  • We add the port partitioning method
  • The area impact of vias
• Improvements
  • Port folding performs better than wordline folding for area (72% vs. 51%)
  • Wordline folding is more effective in reducing the block delay (13% vs. 5%)
  • Port folding also performs better in reducing power (13% vs. 5%)
Corner Block List (CBL) Representation for 3D Floorplan (ICCD'07)
A 3D CBL is a 3-tuple (S, L, T):
• S: a list of block names
• L: corner cubic block orientations (X-, Y-, or Z-oriented)
• T: the sequence {T_n, T_{n-1}, …, T_2} recording the number of blocks (represented by that number of 1's, separated by a 0) covered by the corner cubic block in the uncovered block list
Example: S = {1 2 3 4 5}, L = (Y, Z, Y, X), T = (10, 110, 10, 1110)
[Figure: 3D packing of blocks 1 to 5]
Outline
• Thermal-Aware 3D IC Physical Design Flow
  • Thermal Models and Assumptions
  • 3D Routing with Thermal Via Planning
  • 3D Placement
  • 3D Floorplanning
• 3D Architecture Exploration
• 3D Component Modeling and Testing
• Concluding Remarks and Future Work
3D Architecture Evaluation with Physical Planning: MEVA-3D [DAC'03 & ASPDAC'06]
• Optimize BIPS (not IPC or Freq)
  • Consider interconnect pipelining based on early floorplanning for critical paths
  • Use IPC sensitivity model [Jagannathan05]
• Area/wirelength
• Temperature
[Flow diagram with ESTIMATION and VALIDATION phases and these labeled elements: microarchitecture configuration; target frequency; critical architectural paths and sensitivity; power density estimates; 2D/3D floorplanning for performance and thermal with interconnect pipelining; estimated performance, temperature, and interconnect data; performance simulation with interconnect latencies; power density with interconnect consideration; 2D/3D thermal simulation; performance, power, and temperature]
IPC Sensitivity Models
• Study sensitivity by varying the latency of P with all other parameters fixed
  • Build mathematical models (linear, piecewise linear, etc., or table lookup)
    • P_BL: minimum latency along P (only from blocks)
    • P_PL: post-layout latency along P (blocks + wires)
    • Delta latency δ = (P_PL – P_BL)
    • f(P, δ): relative degraded IPC with extra δ cycles of latency on P
      • f(P, δ) = (1 – x)^δ, where x is the per-cycle IPC degradation for P
      • e.g., for 2 extra cycles, f = (1 – 0.024) * (1 – 0.024)
    • IPC_PL = IPC_BL × f(P, δ)
• We ignore path interactions and use a simple additive model to combine multiple paths

$$IPC_{PL}(P_1, P_2, \ldots, P_N, \delta_1, \delta_2, \ldots, \delta_N) = IPC_{BL}(P_1, P_2, \ldots, P_N, 0, 0, \ldots, 0) \cdot f(P_1, \delta_1) \cdot f(P_2, \delta_2) \cdots f(P_N, \delta_N)$$
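The combined model can be evaluated with a small sketch: the baseline IPC is multiplied by (1 – x)^δ for each critical path, where x is that path's per-cycle IPC degradation and δ its extra latency in cycles. The function name and the input layout are illustrative assumptions; the 0.024 value is the one used in the example above.

```python
def post_layout_ipc(baseline_ipc, paths):
    """Scale the baseline IPC by f(P, delta) = (1 - x) ** delta for every path.

    paths is an iterable of (per_cycle_degradation, extra_cycles) pairs.
    """
    ipc = baseline_ipc
    for degradation, extra_cycles in paths:
        ipc *= (1.0 - degradation) ** extra_cycles
    return ipc


# One path with 2.4% degradation per cycle and 2 extra cycles.
print(round(post_layout_ipc(1.0, [(0.024, 2)]), 6))  # 0.952576
```

A path with zero extra cycles contributes a factor of 1, so only paths lengthened by layout affect the estimate.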
Design Example
• An out-of-order superscalar processor microarchitecture with 4 banks of L2 cache in 70 nm technology
• Critical paths
Baseline Processor Parameters
Wirelength Improvement from 3D Layout
[Bar chart: values from 0 to 120000 at 3G, 4G, 5G, and 6G for 2D and 3D layouts]
Assume two device layers
Performance Improvement of 3D Layout
Assume two device layers
2D vs. 3D Layout
• 2D EV6-like core: BIPS = 2.75
• 3D EV6-like core (2 layers): BIPS = 2.94
• Wakeup loop: the extra cycle is eliminated.
• Branch misprediction resolution loop and the L2 cache access latency: some of the extra cycles are eliminated.
Assume two device layers
Maximum On-Chip Temperatures
HS denotes a heat sink; 3D integration allows thermal vias to be inserted to reduce the temperature.
[Plot: maximum on-chip temperature vs. frequency]
Assume two device layers
Thermal Profiles for 2D Chip (4 GHz)
Temperature distribution in 2D integration.
Thermal Profiles for 3D Chip (4 GHz)
Temperature distribution in 3D integration with one heat sink.
Temperature distribution in 3D integration with two heat sinks and flipped upper layer.
Limitation of Component Stacking Alone
• Extra latency seen by some critical loops:
• Stacking can only attack wire latency between blocks
• Further benefit can only come from attacking block latency
  • Component folding
Solution: 3D Design w/ Component Folding and Stacking
• Explore 3D design of architectural structures that are
  • Timing/throughput critical
  • Expensive in terms of power consumption and/or thermal output
• Possible candidates for 3D component folding
  • Instruction scheduling window
    • Issue queue can be partitioned into multiple levels via matchlines or taglines.
  • On-chip caches
    • Regular structure lends itself to a wide range of partitionings
  • Register file
    • Thermally critical resource – also has a regular structure
Results from 3D Folding and Stacking
[Bar chart: values from 0 to 4 at 3G, 4G, 5G, and 6G for 1 layer, 2 layers, 3 layers, and 4 layers]
Over 35% performance improvement
5GHz 3 Device Layer Layout
Exploration of 3D MultiCore Systems -- MC-Sim
[Block diagram with these labeled elements: several SESC instances, each with MINT and cores (C); L2 banks; cache controller; central page handler; functional network switch; SystemC NoC model; messages and message latencies exchanged between the switch and the NoC model]
MC-Sim Components
• A number of SESC instances
  • Each instance is a number of cores cooperating on a single (potentially multithreaded) application
• A number of cache banks
  • Shared cache state that can be accessed by any SESC instance
• A central page handler
  • To dole out physical pages to SESC instances
  • Allows support for multitasking
• A functional network switch
  • To functionally route messages between components
• A SystemC NoC model
  • To accurately model latency and power
  • Entries in the functional switch wait for an amount of time specified by the NoC
Summary
• Very little 3D CAD support from major EDA vendors
• A complete set of thermal-aware 3D IC physical design tools is available from the UCLA/PennState/IBM collaboration
  • 3D thermal modeling
  • 3D routing with thermal via planning
  • 3D placement
  • 3D floorplanning
• 3D physical design tools provide the capability for early physical prototyping for microarchitecture exploration
  • Coupled with 3D physical planning
  • Consider both 3D component stacking and folding
  • Over 35% performance improvement
Further Reading
• Y. Liu, Y. Ma, E. Kursun, J. Cong, and G. Reinman, “Fine Grain 3D Integration for Microarchitecture Design Through Cube Packing Exploration,” Proceedings of 25th IEEE International Conference on Computer Design, Lake Tahoe, CA, pp. 259-266, October 2007.
• J. Cong, Y. Ma, Y. Liu, E. Kursun, and G. Reinman, “3D Architecture Modeling and Exploration,” Proceedings of 24th International VLSI/ULSI Multilevel Interconnection Conference (VMIC), Fremont, CA, pp. 231-238, September 2007.
• G. Loh, Y. Xie, and B. Black, “3D processor Design,” IEEE Micro, 2007.
• J. Cong, G. Luo, J. Wei, and Y. Zhang, “Thermal-Aware 3D IC Placement via Transformation,” Proceedings of the 12th Asian and South Pacific Design Automation Conference (ASP-DAC 2007), Yokohama, Japan, pp. 780-785, January 2007.
• Y. Xie, G. Loh, B. Black, and K. Bernstein, “Design Space Exploration for 3D Architecture,” ACM Journal of Emerging Technologies for Computer Systems 2(2):65-103.
• J. Cong and Y. Zhang, “Thermal Via Planning for 3-D ICs,” Proceedings of the 2005 IEEE/ACM Int'l Conference on Computer Aided Design, pp. 745-752, November 2005.
• Y.-F. Tsai, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, “Three-Dimensional Cache Design Exploration Using 3DCacti,” Proceedings of the IEEE International Conference on Computer Design (ICCD 2005), pp. 519-524.
Acknowledgements
We would like to thank DARPA for its support.
Support from the primary contractors -- Collaboration with CFDRC and IBM and
Publications are available from http://cadlab.cs.ucla.edu/~cong