UNDERSTANDING THE ROLE OF THE POWER DELIVERY NETWORK IN 3-D STACKED MEMORY DEVICES
Manjunath Shevgoor, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis (University of Utah)
Aniruddha N. Udipi (ARM R&D)
Jung-Sik Kim (DRAM Design Team, Samsung Electronics)
1
Background
2
[Figure: voltage along wire A-B. A 1.5V supply and GND feed a circuit element through a wire with resistance; the voltage falls from 1.5V to 1.2V along the wire.]
• Only part of the supply voltage reaches the circuit elements
• This loss of voltage over the Power Delivery Network (PDN) is called IR Drop
• 3D stacking increases current density
• Current to the top layer in a 9-high stack must pass through 8 TSV layers
• IR Drop violations can lead to correctness issues
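As a worked illustration of the definition above, the drop is the product of the current drawn and the resistance of the delivery path. The symbols below are introduced here for illustration only; the 1.5V and 1.2V values are the ones labelled on this slide.

```latex
V_{\text{element}} = V_{\text{supply}} - I \cdot R_{\text{wire}},
\qquad
V_{\text{drop}} = I \cdot R_{\text{wire}} = 1.5\,\text{V} - 1.2\,\text{V} = 0.3\,\text{V}
```

Either a larger current I (more simultaneous activity) or a larger resistance R (a longer path through the stack) increases the drop.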
[Figure: 3D stack showing DIE 1, DIE 2, and DIE 3]
3
Addressing IR Drop
• Reduce resistance
  - Make wires wider
  - Add more VDD/VSS bumps
  - Increases cost
• Reduce current
  - Control activity on chip
  - Decreases performance
• This paper tries to reduce the performance impact without increasing cost
[Figure: relationship between pin count and package cost. Source: Dong et al., Fabrication Cost Analysis and Cost-Aware Design Space Exploration for 3-D ICs]
4
A big shift going forward
• Current-limiting constraints already exist
  - DDR3 uses tFAW and tRRD
  - Recent work on PCM (Hay et al.) uses Power Tokens to limit PCM current draw
• These solutions use temporal constraints, which are not optimal for addressing IR Drop in 3D DRAM
• Quality of power delivery also depends on location on the die
• We propose spatial constraints to leverage this disparity
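To make the contrast concrete, a temporal constraint such as tFAW can be sketched as a rolling-window limiter: at most four ACT commands may issue within any tFAW window. The class name and the 32-cycle window are illustrative assumptions, not values from the slides. Note that the check is blind to where on the die the activity occurs.

```python
from collections import deque


class FawWindow:
    """Rolling-window limiter: at most `max_acts` ACTs in any `t_faw` cycles.

    Hypothetical helper for illustration; t_faw is supplied by the caller.
    """

    def __init__(self, t_faw: int, max_acts: int = 4):
        self.t_faw = t_faw
        self.max_acts = max_acts
        self._issued = deque()  # cycle numbers of ACTs still inside the window

    def can_activate(self, now: int) -> bool:
        # Forget ACTs that are at least t_faw cycles old.
        while self._issued and now - self._issued[0] >= self.t_faw:
            self._issued.popleft()
        return len(self._issued) < self.max_acts

    def activate(self, now: int) -> bool:
        """Record an ACT at cycle `now` if allowed; return whether it issued."""
        if not self.can_activate(now):
            return False
        self._issued.append(now)
        return True
```

A spatial constraint would additionally take the target region as an input, which this purely time-based check does not.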
5
DRAM Layout – Spatial Dependence
• We use an HMC-like architecture for our evaluation
• IR Drop worsens as distance from the TSVs increases
[Figure: VDD on M1 on Layer 9. Axes: X Coordinate, Y Coordinate, VDD.]
6
IR Drop Profile
• Figures illustrate IR Drop when all banks in the 3D stack are executing ACT
• IR Drop worsens as the distance from the source increases
[Figure: IR Drop maps for Layer 2 through Layer 9, one panel per layer]
7
Iso-IR Drop Regions
• IR Drop worsens as distance from the TSVs increases
• IR Drop also worsens as height above the C4 bumps increases
• We define activity constraints on a region-by-region basis
8
DRAM Currents
Symbol | Value (mA) | Description | Consumed By
IDD0 | 66 | One bank Activate to Precharge | Local Sense Amps, Row Decoders, and I/O Sense Amps
IDD4R | 235 | Burst Read Current | Peripherals, Local Sense Amps, IO Sense Amps, Column Decoders
IDD4W | 171 | Burst Write Current | Peripherals, IO Sense Amps, Column Decoders
Source: Micron Data Sheet for 4Gb x16 part
• Read consumes the highest current of any DRAM command
• To keep design complexity down, we define all other currents in terms of Reads
9
Region based Read constraints
• Different regions have very different IR Drop characteristics
• To avoid being constrained by the worst region, we determine the maximum number of Reads supported by each region
• IR Drop in any region is not determined by the activity in that region alone
• Memory controller complexity increases with the number of regions
10
Region based Read constraints
• To keep the memory controller simple, four kinds of constraints are created for Reads
  - Single-region constraints: assume all Reads happen in only one region
  - Two-region constraints: assume all Reads happen in only two adjacent regions
  - Four-region constraints: assume all Reads happen in either the top four or the bottom four dies
  - Die-stack-wide constraint: Reads can be happening anywhere in the die stack
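One possible reading of the four constraint kinds is a layered admission check that the controller runs before issuing a Read. The sketch below is an interpretation, not the controller from the paper: the `*_BOT` region names, the adjacency ordering A-B-C-D, and all single, pair, and four-region limits are made-up placeholders. Only the stack-wide limits (8 Reads when any Top region is reading, 16 otherwise) are taken from the starvation slide later in the deck.

```python
TOP = ("A_TOP", "B_TOP", "C_TOP", "D_TOP")
BOTTOM = ("A_BOT", "B_BOT", "C_BOT", "D_BOT")  # hypothetical names

# Placeholder limits, chosen only so the example has something to check.
SINGLE_LIMIT = {**{r: 4 for r in TOP}, **{r: 8 for r in BOTTOM}}
ADJACENT_PAIRS = [(g[i], g[i + 1]) for g in (TOP, BOTTOM) for i in range(3)]
PAIR_LIMIT = {"top": 6, "bottom": 12}
FOUR_LIMIT = {"top": 8, "bottom": 16}


def stack_limit(reads):
    """Stack-wide limit: 8 if any Top region is reading, else 16."""
    return 8 if any(reads.get(r, 0) for r in TOP) else 16


def can_issue_read(active, region):
    """Return True if one more Read in `region` satisfies every constraint.

    `active` maps region name -> number of Reads currently in flight.
    """
    reads = dict(active)
    reads[region] = reads.get(region, 0) + 1

    def n(r):
        return reads.get(r, 0)

    # Single-region constraints.
    if any(n(r) > limit for r, limit in SINGLE_LIMIT.items()):
        return False
    # Two-region constraints over adjacent pairs.
    for a, b in ADJACENT_PAIRS:
        half = "top" if a in TOP else "bottom"
        if n(a) + n(b) > PAIR_LIMIT[half]:
            return False
    # Four-region constraints (top four dies, bottom four dies).
    if sum(n(r) for r in TOP) > FOUR_LIMIT["top"]:
        return False
    if sum(n(r) for r in BOTTOM) > FOUR_LIMIT["bottom"]:
        return False
    # Die-stack-wide constraint.
    return sum(reads.values()) <= stack_limit(reads)
```

With 12 Reads already in flight in Bottom regions, `can_issue_read` rejects any Top-region Read under these placeholder limits, because admitting it would lower the stack-wide limit to 8.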
11
Read Based constraints
• To limit controller complexity, we define ACT, PRE, and Write constraints in terms of Reads
• The Read-Equivalent is the minimum number of ACT/PRE/WR commands that cause the same IR Drop as the Read with the least IR Drop in that region
Command | Read Equivalent
ACT | 2
PRE | 6
WR | 1
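The table above can be read as an exchange rate: 2 ACTs, 6 PREs, or 1 WR are budgeted as one Read. A minimal sketch of that bookkeeping follows, assuming the controller simply sums fractional Read costs; the function name and the trivial `RD` entry are additions for illustration.

```python
from fractions import Fraction

# Commands per one Read-equivalent, from the table; "RD" added for completeness.
READ_EQUIVALENT = {"RD": 1, "ACT": 2, "PRE": 6, "WR": 1}


def read_load(commands):
    """Express a mix of in-flight commands as an equivalent number of Reads.

    `commands` maps a command name to how many of that command are in flight.
    """
    return sum(
        Fraction(count, READ_EQUIVALENT[cmd]) for cmd, count in commands.items()
    )
```

For example, `read_load({"ACT": 2, "PRE": 6, "WR": 1})` evaluates to 3 Reads, which can then be compared against a region's Read limit.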
12
Proposal 1 - Controlling Starvation
• As long as Bottom regions are serving more than 8 Reads, Top regions can never service a Read
• Requests mapped to Top regions suffer
• Prioritize requests that are older than N * average Read latency (N is empirically determined to be 1.2 in our simulations)
Die-stack-wide constraint:
• At least one Read in Top regions: 8 Reads allowed
• No Top region Reads: 16 Reads allowed
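The aging rule can be sketched as a two-tier selection: any request older than N times the average Read latency is served first, and only otherwise does the controller fall back to its normal choice. The `Request` type, the `can_issue` callback, and the oldest-first fallback are assumptions for illustration; the slides specify only the threshold and N = 1.2.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int  # cycle at which the request entered the queue
    region: str   # e.g. "TOP" or "BOTTOM" (illustrative labels)


def pick_next(queue, now, avg_read_latency, can_issue, n=1.2):
    """Choose the next request to service.

    Requests waiting longer than n * avg_read_latency are treated as starved
    and take priority, oldest first, even if they cannot issue yet (the
    controller is assumed to hold back other requests until they can).
    Otherwise the oldest request that `can_issue` accepts is chosen.
    """
    threshold = n * avg_read_latency
    starved = [r for r in queue if now - r.arrival > threshold]
    if starved:
        return min(starved, key=lambda r: r.arrival)
    ready = [r for r in queue if can_issue(r)]
    return min(ready, key=lambda r: r.arrival) if ready else None
```

Holding back Bottom-region Reads while a starved Top-region request waits lets the in-flight count drain below the 8-Read limit so the Top-region Read can issue.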
13
Case Study - Page Placement
• Profile applications to find the most accessed pages
• Map the most accessed pages to the most IR Drop resistant regions (Bottom regions)
• The profile is divided into 8 sections. The 4 most accessed sections are mapped to Bottom regions
• The rest are mapped to C_TOP, B_TOP, D_TOP, A_TOP, in that order
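The placement rule above can be sketched as a rank-and-bucket step. The sketch assumes the 8 sections hold equal numbers of pages and uses a single `BOTTOM` label, since the slides do not say which Bottom region each hot section goes to; the function name is hypothetical.

```python
def place_pages(access_counts):
    """Map each page to a region class based on profiled access counts.

    `access_counts` maps page id -> number of accesses seen while profiling.
    Pages are ranked hottest first and split into 8 equal sections; the four
    hottest sections go to Bottom regions, the rest to the Top regions in
    the order given on the slide.
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    n_sections = 8
    section_size = -(-len(ranked) // n_sections)  # ceiling division
    targets = ["BOTTOM"] * 4 + ["C_TOP", "B_TOP", "D_TOP", "A_TOP"]
    return {page: targets[i // section_size] for i, page in enumerate(ranked)}
```

With 16 profiled pages, the 8 hottest land in Bottom regions and the coldest two land in A_TOP.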
14
Modeling IR Drop
• Power assigned to each block is assumed to be distributed evenly over the block
• Current sources are used to model the power consumption
• More details in the paper
Source: Sani R. Nassif, Power Grid Analysis Benchmarks
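The two modeling assumptions (power spread evenly over a block, consumption represented by current sources) can be illustrated with a much simpler stand-in than a full power grid: a one-dimensional supply rail with a current source at each node. This is not the paper's model, and the function names and numeric values are illustrative only.

```python
def even_currents(block_power, vdd, n_nodes):
    """Spread a block's power evenly: each node sinks an equal current."""
    total_current = block_power / vdd
    return [total_current / n_nodes] * n_nodes


def rail_voltages(vdd, segment_resistance, node_currents):
    """Voltages along a 1-D rail fed from one end.

    Node k sits k+1 resistive segments from the supply and sinks
    node_currents[k]. Each segment carries the current of every node
    downstream of it, so the drop accumulates with distance.
    """
    voltages = []
    v = vdd
    downstream = sum(node_currents)
    for current in node_currents:
        v -= downstream * segment_resistance
        voltages.append(v)
        downstream -= current
    return voltages
```

Calling `rail_voltages(1.5, 0.01, [1.0, 1.0, 1.0])` gives voltages that fall monotonically away from the supply, the same qualitative trend as IR Drop worsening with distance from the source.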
15
Methodology
• HMC based memory system
• Simics coupled with augmented USIMM
• SPEC CPU 2006 mp
CPU Configuration
CPU | 8-core Out-of-Order CMP, 3.2 GHz
L2 Unified Last Level Cache | 8MB/8-way, 10-cycle access
Memory Configuration
Total DRAM Capacity | 8 GB in 1 3D stack
DRAM Configuration | 2 16-bit uplinks, 1 16-bit downlink @ 6.4 Gbps; 32 banks/DRAM die, 16 vaults; 8 DRAM dies/3D-stack; tFAW honored on each die
16
Results
• With all constraints (Real PDN), performance falls by 4.6X
• With starvation management, the gap is reduced to 1.47X
• Profiled page placement with starvation control is within 15.35% of the unrealistic Ideal PDN
17
Future Work
• We present a case study that explores the performance improvement of IR Drop aware page placement
• A realistic page placement/migration scheme is required to leverage the disparity in IR Drop tolerance of different regions
• Metrics other than the number of page accesses might be more appropriate when identifying critical pages
• Prioritizing requests to Bottom regions could help overcome the detrimental effects of the Top regions
• Exploring the complexity-performance tradeoff for memory controllers
18
Conclusions
• This paper presents the problem of IR Drop in 3D DRAM
• We reduce the impact of worst-case IR Drop by introducing region-based constraints
• Memory controller complexity is limited by simplifying IR Drop constraints
• By addressing both the spatial and the temporal aspects of the problem, we achieve performance that is very close to that of the Ideal PDN
19
Thank You