Tezzaron Semiconductor
Device Summary
• Fully functional devices demonstrating a wide variety of applications
• Good yield – 90% process, 96% device
• Demonstrated alignment – repeatable ~0.3 micron
• High interconnect density – 10,000 to 170,000 per sqmm
• Positive thermal cycle testing – >100,000 device cycles, -65 to 150C, 15 minute soak
• Good correlation with models and simulations
• Demonstration of tools
• Demonstrated faster, lower power, higher density
Considerations
Wafer to Wafer - Best Fit
• Memory
– DRAM
– PCRAM, FERAM, MRAM
• FPGA
• Sensors
• Processors
– Short wires
– Heat, heat, heat
3D Interconnect Characteristics
                          SuperVia™     Gen II       Face to Face
Size                      4.0 X 4.0     1.2 X 1.2    1.7 X 1.7 (0.75 X 0.75
Minimum Pitch             6.08          <4           2.4 (1.46
Feedthrough Capacitance   7fF           2-3fF        <<
Series Resistance         <0.25         <0.35        <
Parameters
• 10um Z dimension increments
– 5-15um thickness
• Low R
• Moderate C
• Repair & Redundancy
– It’s still per sqmm!
• Pitch
– 0.5um limit
• How many layers?
– 2 to 5, current horizon
HEAT!!!
• Modeling
– What modeling? More data, more testing required
• What we know…
– 32W/sqmm, structurally sound
• <5W easy rules
– ~15W/100sqmm cliff
– >150W possible
– >500W liquid cooling
Even with innovations like DDR II and QDR, inadequate memory speed (the so-called “Memory Wall”) is still the primary obstacle to system performance;[i] it undermines most of the speed improvements of today’s processors.[ii] Despite gains in bus speed, high memory latency forces processors to wait for data: 2003 statistics show that individual processors in high-performance systems and servers spend 65-95% of their time[iii] idly waiting for memory or I/O.
[i] N.R. Mahapatra and B. Venkatrao, “The Processor-Memory Bottleneck: Problems and Solutions,” ACM Crossroads 5, no. 3 (1999) [e-journal].
[ii] Anthony Cataldo, “MPU designers target memory to battle bottlenecks,” EETimes, 19 October 2001.
[iii] Sally McKee, “Perspectives on The Memory Wall Problem,” accessed online Sept. 2003 at http://www.lanl.gov/orgs/ccn/salishan2003/pdf/mckee.pdf.
Jack Dongarra, “Getting the Performance out of High Performance Computing,” accessed online Sept. 2003 at http://www.nersc.gov/conferences/SciDAC2003/Presentations/Dongarra.pdf.
Graph: J. Dongarra, U. of Tennessee
Commodity Memory …
FLAT!
A Poster Child for the Productivity Crisis
Sparse Matrix Operations: Particle Physics, Weapons Dev.
5.9% efficiency
Finite Element Analysis: Weather & Ocean Forecasting
7.1% efficiency
Large Matrix Manipulation: Engineering Design of Complex Structures
8.4% efficiency
Memory Intensive Calculations: Cryptanalysis
< 3.0% efficiency
I/O BW to Processing Ratio: Radar, Sonar, Imaging Sensors
12% efficiency
ASCI Q
PCs are 15 to 25% efficient
Linear!
Comparable
When Processor Limited
Log! 3
0.006
500 X
When Memory Limited
Where Does the Bandwidth Go?
• When costs are grouped by bandwidth, memory bandwidth is 80% of the cost of a Cray X1 class machine
OK, access to main memory is glacial
Solution: On Chip Cache
Cache THE Driver of Processor Die Area
$4227 $1980 $TBD
130 nm 90 nm
Deciding What to Bring “On Chip”
The Good
The Bad
The Ugly
On Chip / Off Chip Power
Operation                        Energy
32-bit ALU operation             5 pJ
32-bit register read             10 pJ
Read 32 bits from 8K RAM         50 pJ
Move 32 bits across 10mm chip    100 pJ
Move 32 bits off chip            1300 to 1900 pJ
Calculations using a 130nm process operating at a core voltage of 1.2V (Source: Bill Dally, Stanford)
Prefetch/Cache Overhead and Off Chip Memory Access are key Power Issues
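To put the energy figures above in proportion, here is a minimal Python sketch that expresses each operation as a multiple of one 32-bit ALU operation. The numbers are the ones quoted on the slide; the dictionary and function names are illustrative assumptions, not part of the original deck.

```python
# Energy per operation in picojoules, as quoted on the slide
# (130nm process, 1.2V core). Each entry is a (low, high) range;
# only the off-chip figure is an actual range.
ENERGY_PJ = {
    "32-bit ALU operation": (5, 5),
    "32-bit register read": (10, 10),
    "Read 32 bits from 8K RAM": (50, 50),
    "Move 32 bits across 10mm chip": (100, 100),
    "Move 32 bits off chip": (1300, 1900),
}


def alu_equivalents(operation):
    """Return the (low, high) energy of an operation in units of ALU operations."""
    alu_pj = ENERGY_PJ["32-bit ALU operation"][0]
    low, high = ENERGY_PJ[operation]
    return low / alu_pj, high / alu_pj


for name in ENERGY_PJ:
    low, high = alu_equivalents(name)
    if low == high:
        print(f"{name}: {low:g}x an ALU operation")
    else:
        print(f"{name}: {low:g}x to {high:g}x an ALU operation")
```

On these figures, moving 32 bits off chip costs as much energy as 260 to 380 ALU operations, which is the point of the slide.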
On Chip / Off Chip Latency
                   Madison 6M             POWER4+               POWER5
Frequency (GHz)    1.5                    1.7                   1.9
L2 Latency         5 cycles, 3.3 ns       12 cycles, 7.1 ns     13 cycles, 6.8 ns
L3 Latency         14 cycles, 9.3 ns      123 cycles, 72.3 ns   87 cycles, 45.8 ns
Memory Latency     ~224 cycles, ~149 ns   351 cycles, 206 ns    220 cycles, 116 ns
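The cycle counts and nanosecond figures above are linked by the clock frequency: latency in ns is cycles divided by frequency in GHz. A minimal Python sketch of that conversion follows, using the slide's numbers. The function and variable names are illustrative assumptions, and the results match the slide only to within rounding.

```python
def cycles_to_ns(cycles, freq_ghz):
    """Convert a latency in clock cycles to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / freq_ghz


# (frequency in GHz, {level: latency in cycles}), taken from the slide.
PROCESSORS = {
    "Madison 6M": (1.5, {"L2": 5, "L3": 14, "Memory": 224}),
    "POWER4+": (1.7, {"L2": 12, "L3": 123, "Memory": 351}),
    "POWER5": (1.9, {"L2": 13, "L3": 87, "Memory": 220}),
}

for name, (freq_ghz, levels) in PROCESSORS.items():
    for level, cycles in levels.items():
        ns = cycles_to_ns(cycles, freq_ghz)
        print(f"{name} {level}: {cycles} cycles = {ns:.1f} ns")
```

Read this way, a main-memory access on any of the three processors costs well over a hundred nanoseconds, against a few nanoseconds for L2.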
3D Heterogeneous Integration
BEFORE: 2D IC, “All or Nothing” (Intel photo used as proxy)
• Single die ~430 mm2
• Wafer cost ~$6,000
• Low yield ~15%, ~10 parts per wafer
• Memory costs ~$44/MB
• Only memory directly compatible with logic (virtually no choice!)
AFTER: 3D IC (rendering of 3D IC)
• Maps to memory die array
• Maps to logic only die
• 128MB not 9MB – 14x increase in memory density
• 4X logic cost reduction
• Memory costs ~$1.50/MB or $0.44/MB – 29x to 100x memory cost reduction (choice!)
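The multipliers quoted on this slide can be checked directly from its before and after figures. A minimal Python sketch follows, assuming the 14x figure compares 128MB with 9MB and the 29x and 100x figures compare $44/MB with $1.50/MB and $0.44/MB respectively. The function name is illustrative.

```python
def improvement_factor(before, after):
    """Return how many times larger `before` is than `after`."""
    return before / after


# Memory capacity: 128MB in the 3D IC versus 9MB on the single die.
density_gain = improvement_factor(128, 9)

# Memory cost per MB: ~$44 on the single die versus ~$1.50 or $0.44 in the 3D IC.
cost_gain_low = improvement_factor(44, 1.50)
cost_gain_high = improvement_factor(44, 0.44)

print(f"Memory density increase: {density_gain:.1f}x")
print(f"Memory cost reduction: {cost_gain_low:.1f}x to {cost_gain_high:.0f}x")
```

The results, about 14.2x, 29.3x and 100x, agree with the rounded numbers on the slide.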
Octopus L3 Cache DRAM
• 1Gb-4Gb
• Latency down to 5ns
• 1GHz max clock rate
• Minimum timing – tRCD=1, tCYC=4, tPRE=0, tCL=2
• Programmable 8 port by 256 bit architecture
• Programmable burst length 4 to 256
• Programmable port width 32 to 256 bits
• Exposed or hidden refresh options
• DDR 2000MT max
• >200GB/s sustained bandwidth (closed page mode, BL=4)
• 512GB/s peak bandwidth
• >25TB/s peak on-board transfer rate
• 1.0V to 1.7V I/O
• 1.4V to 1.6V core
• Internally ECC protected, dynamic self-repair
• 115C die full function operating temperature
• 65 sqmm die footprint
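The 512GB/s peak figure is consistent with the port configuration listed above, if one assumes all 8 ports run at the full 256-bit width and the maximum DDR rate of 2000 MT/s. The slide does not state that derivation, so the Python sketch below is a plausibility check under that assumption. The function name is illustrative.

```python
def peak_bandwidth_gb_per_s(ports, port_width_bits, transfers_per_s):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) with every port transferring at once."""
    bytes_per_transfer = port_width_bits // 8
    return ports * bytes_per_transfer * transfers_per_s / 1e9


# Assumed configuration: 8 ports x 256 bits at 2000 MT/s (DDR on a 1GHz clock).
peak = peak_bandwidth_gb_per_s(ports=8, port_width_bits=256,
                               transfers_per_s=2_000_000_000)
print(f"Peak bandwidth: {peak:.0f} GB/s")

# The quoted sustained figure (>200GB/s, closed page, BL=4) as a share of peak.
print(f"Sustained share of peak: {200 / peak:.0%}")
```

Under these assumptions the quoted sustained rate is roughly 39% of peak.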
The Demo!