Date post: | 12-Jan-2016 |
Concurrent Autonomous Self-Test for Uncore Components in SoCs
Yanjing Li, Stanford University
Onur Mutlu, Carnegie Mellon University
Donald S. Gardner, Intel Corporation
Subhasish Mitra, Stanford University
Overcoming CMOS Reliability Challenges
Figure: failure rate vs. time over the lifetime (early-life failures, then circuit aging)
Early-life failures: burn-in difficult
Circuit aging: guardbands expensive
On-line self-test and diagnostics
Soft errors: Built-In Soft Error Resilience (BISER)
Uncore Components Significant in SoCs
Examples: Cisco Network Processing Engine, NVIDIA Tegra, IBM Power 7 (uncore components labeled in each figure)
Image credits: © techvishal.wordpress.com, © news.cnet.com, © ciscosistemas.org
Uncore examples
Controllers for cache & DRAM
Crossbar
I/O interfaces
Robust Uncore Essential
Uncore: 12%
Processor cores: 12%
Memories: 76%
New on-line self-test for uncore
CASP for processor cores [Li DATE 08, ICCAD 09]
ECC, Memory BIST & repair for memories
Figure: OpenSPARC T2 SoC (8 cores, 64 threads) with uncore labeled (© opensparc.net)
Challenge 1: High Test Coverage
                CASP       Logic BIST   Roving Emulation
Coverage        High       ?            Depends
Cost            Low        High         High
Design effort   Moderate   High         High
CASP: Concurrent, Autonomous, Stored Patterns
High-coverage patterns off-chip FLASH
System-level on-line test access
FLASH cheap, test compression pervasive
© intel.com
Challenge 2: Power, Performance, Area Costs
Stall-and-test inadequate
4-core Intel® Core™ i7 system results
Figure: requests from multiple cores pass through caches and interconnects to the DRAM controller, which is under on-line self-test
Multiple cores stall
Unresponsiveness or system hang
Naïve Approaches Inadequate for Uncore
Stall-and-test: small area cost, but unresponsiveness or complete hang
Spare unit for each uncore type: small performance impact, but 12% area overhead*
Uncore CASP: new techniques required
* OpenSPARC T2 design
New Uncore On-line Self-Test Principles
I. Resource reallocation and sharing (RRS)
II. No-performance-impact testing
III. Smart backup
< 1% area impact, < 3% performance impact
OpenSPARC T2 SoC
I. Resource Reallocation and Sharing (RRS)
Components with “similar” functionality in SoCs
Temporary reallocation and sharing
Small performance hit without replication
Figure (OpenSPARC T2): two groups of 4 cores, crossbar blocks, L2 banks, and the CASP controller; one L2 bank under on-line self-test
1. Stall and drain requests
2. Transfer dirty lines
3. Invalidate
4. Reroute
II. No-Performance-Impact Testing
Implication relations among SoC components
Component(s) tested when idle
During test of another component
Figure (OpenSPARC T2): two groups of 4 cores, crossbar blocks, L2 banks, and the CASP controller; an L2 bank is under on-line self-test with RRS while a crossbar block is IDLE
III. Smart Backup
Figure (I/O interface): DMA for network, DMA for disks, programmed I/O; operations are either supported in the smart backup, or stalled or handled slowly via programmed I/O
Operations with different requirements
Backup unit for performance-critical operations
Absolute minimal additional hardware
OpenSPARC T2
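Splitting operations by requirement can be sketched as a dispatch decision: a minimal Python sketch, assuming the set of performance-critical operations is known at design time. The names `Handling`, `dispatch`, `critical_op`, and `bulk_op` are hypothetical, and which real operations belong in each class is design-specific and not stated here.

```python
from enum import Enum


class Handling(Enum):
    BACKUP = "served by smart backup"
    DEFERRED = "stalled or handled on a slow path"


def dispatch(op, critical_ops):
    """Decide how an operation is handled while the original unit is under test.

    `critical_ops` is the assumed, design-specific set of operations that the
    smart backup implements; everything else waits or takes a slow path.
    """
    return Handling.BACKUP if op in critical_ops else Handling.DEFERRED


# Hypothetical classification, for illustration only.
critical = {"critical_op"}
plan = {op: dispatch(op, critical) for op in ["critical_op", "bulk_op"]}
```

Keeping `critical_ops` small is what keeps the backup small: only the operations in that set need hardware in the backup unit.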
Application Performance Impact
Memory-centric
PARSEC benchmarks: execution time impact
1.5% performance impact
I/O-centric on 4-core Intel system
Uncore CASP emulated on 4-core Intel® Core™ i7
Disk access: 3% impact
No visible unresponsiveness
Area and Power Impact
CASP controller (< 0.01% area)
Off-chip FLASH (200 MB)
On-chip buffer (8 KB)
Uncore on-line self-test principles applied
Minimal area impact: < 1%
Minimal power impact: < 1%
Test Results for Uncore Components
200 MB off-chip FLASH
10X test compression
7 ms – 300 ms test time per component
             Total pattern count   Test coverage
Stuck-at     5,577                 99.2% - 99.9%
Transition   11,049                92.8% - 97.8%
Inexpensive FLASH
Thorough on-line self-test
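Uncore CASP vs. Existing Techniques
Compared: Logic BIST, Concurrent BIST [Saluja IEEE TCAD 88], Uncore CASP [This work]
Coverage: Logic BIST high with high costs; Concurrent BIST depends; Uncore CASP high
Area cost: Logic BIST high; Concurrent BIST high costs possible; Uncore CASP low
Design complexity: Moderate
Performance impact: Logic BIST low; Concurrent BIST low; Uncore CASP low with our uncore principles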
CASP Applicable for Other SoCs
Cisco Network Processing Engine, NVIDIA Tegra, IBM Power 7
I. RRS
II. No-performance-impact testing
III. Smart backup
IV. Core CASP
Conclusions
CASP: adaptive on-line self-test & diagnostics
3 new principles for uncore CASP
I. Resource reallocation and sharing (RRS)
II. No-performance-impact testing
III. Smart backup
Effective and practical
High test coverage
< 1% power, < 3% performance, < 1% area impact
Backup Slides
CASP on Actual Intel® Core™ i7 System
Intel Research collaboration
Quad-core Intel® Core™ i7 (3.2 GHz)
Thermoelectric temperature controller
Debug tool
Unique real-life experiment
Development of adaptive self-diagnostics
Figure labels: debug tool adapter, temperature controller
CASP Flow
1. Select uncore or core component
2. Isolate
3. Apply / analyze high-quality test patterns
(test compression, at-speed test…)
4. Resume operation
Scan chain
SoC with CASP controller (multi-core SoC proliferation)
Inexpensive off-chip FLASH (non-volatile storage technology)
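The four steps of the flow can be sketched as a small driver routine. This is a minimal Python sketch under stated assumptions: `patterns` stands in for test patterns fetched from off-chip storage, `apply_pattern` is a hypothetical hook that returns the observed response, and keeping a failing component isolated is an assumption of this sketch rather than something the slides specify.

```python
def run_casp_test(component, patterns, apply_pattern, log=None):
    """Run the four CASP steps for one component.

    `patterns` is a list of (stimulus, expected_response) pairs.
    `apply_pattern(component, stimulus)` is a hypothetical hook returning
    the observed response. Returns True if every response matches.
    """
    if log is None:
        log = []
    log.append(f"select {component}")       # 1. Select uncore or core component
    log.append(f"isolate {component}")      # 2. Isolate
    # 3. Apply / analyze test patterns: compare observed vs. expected.
    passed = all(apply_pattern(component, s) == e for s, e in patterns)
    if passed:
        log.append(f"resume {component}")   # 4. Resume operation
    else:
        # Assumption: a failing component stays isolated for diagnosis.
        log.append(f"keep isolated {component}")
    return passed
```

Passing `log` as a list lets a caller inspect the order of steps; the hook keeps the sketch independent of any particular scan or compression mechanism.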
RRS Example: L2 Cache Banks
1. Stall cache controller
2. Drain outstanding requests
3a. Invalidate clean blocks; invalidate directory; invalidate L1
3b. Transfer necessary states (dirty blocks); write back to main memory if necessary
4. Route packets with destination {bank 0, bank 1} to bank 1
Figure: crossbar, DRAM controller 0, bank 0 (under test) and bank 1 (helper), each with controller, data, tag, etc.
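The hand-over from the bank under test to the helper bank can be sketched in software. This is a minimal Python sketch with a deliberately simplified model: the `Bank` class, the `memory` dictionary, and the `route` table are hypothetical, every dirty block is written back to main memory (the slide allows other ways to transfer state), and directory and L1 invalidation are omitted.

```python
class Bank:
    def __init__(self, name):
        self.name = name
        self.lines = {}        # address -> (data, dirty flag)
        self.pending = []      # outstanding (address, data) write requests
        self.stalled = False


def start_rrs(under_test, helper, memory, route):
    """Hand the traffic of `under_test` to `helper` before self-test."""
    under_test.stalled = True                       # 1. stall cache controller
    for addr, data in under_test.pending:           # 2. drain outstanding requests
        under_test.lines[addr] = (data, True)
    under_test.pending.clear()
    for addr, (data, dirty) in under_test.lines.items():
        if dirty:                                   # 3b. transfer dirty blocks
            memory[addr] = data                     #     (here: write back)
    under_test.lines.clear()                        # 3a. invalidate blocks
    route[under_test.name] = helper.name            # 4. both destinations now
    route[helper.name] = helper.name                #    go to the helper bank
```

After the call, the bank under test holds no state that the system still needs, so it can be tested while the helper serves packets for both destinations.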
No-Performance-Impact Testing Example: CCX (Crossbar)
Figure: 8 cores, 64 threads; CCX multiplexers and arbitration logic 0 … 7, one per L2 bank (L2 bank 0 … L2 bank 7), with separate scan chains
Packets reallocated to helper
Test at the same time
Smart Backup Example: Non-Cachable Unit
1. Stall
2. Drain outstanding requests
3. Turn on, reset
4. Transfer states
5. Select outputs from backup (MUX)
Original (under test): PIO, boot ROM interface, interrupt status table, interrupt processing, config. status register interface
Backup: PIO, interrupt processing
Minimize area costs at acceptable performance impact
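The switch-over sequence can be sketched as follows. This is a minimal Python sketch with a hypothetical `Unit` model: state is a plain dictionary, outstanding requests are modeled as pending state updates, and the backup is assumed to hold only a subset of the original unit's state keys.

```python
class Unit:
    def __init__(self, state_keys):
        self.state = dict.fromkeys(state_keys)   # modeled unit state
        self.outstanding = []                    # pending (key, value) updates
        self.powered = False
        self.stalled = False


def switch_to_backup(original, backup, mux):
    """Move service from `original` to `backup` before testing `original`."""
    original.stalled = True                      # 1. stall
    for key, value in original.outstanding:      # 2. drain outstanding requests
        original.state[key] = value
    original.outstanding.clear()
    backup.powered = True                        # 3. turn on and reset backup
    backup.state = dict.fromkeys(backup.state)
    for key in backup.state:                     # 4. transfer the states the
        backup.state[key] = original.state[key]  #    backup actually implements
    mux["select"] = "backup"                     # 5. select outputs from backup
```

Only the keys present in the backup are copied, which mirrors the idea that the backup carries a reduced set of functions to keep area low.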
Naïve Approaches Inadequate for Uncore
Simple stall-and-test technique
Demonstration on actual 4-core Intel® Core™ i7 system
Figure: OS timer interrupt handlers on core 1 … core i issue requests to DRAM; the DRAM controller is under test, so the handlers stall
Infrequent test: noticeable unresponsiveness
Frequent test: system hang
Identical backup units: 12% area overhead
Performance Impact
Simulated Latency Overhead (PARSEC Benchmark Suite)
Tool: GEMS simulator (modified for RRS)
Workload: PARSEC benchmark suite
4 threads on 4 cores, CASP runs 1 sec. every 10 sec.
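The test schedule quoted above (1 second of CASP every 10 seconds) can be expressed as a simple periodic window. This is a minimal Python sketch; placing the window at the start of each period is an assumption, and the function names are hypothetical.

```python
def in_test_window(t, period=10.0, window=1.0):
    """True if time `t` (seconds) falls inside the test window.

    Assumption: the window occupies the first `window` seconds of
    every `period`.
    """
    return (t % period) < window


def test_fraction(period=10.0, window=1.0):
    """Fraction of wall-clock time during which CASP is running."""
    return window / period
```

With the defaults, CASP is active for one tenth of the time, so any slowdown incurred while a component is under test is diluted over the remaining nine tenths of each period.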
III. Smart Backup
OpenSPARC T2 Ethernet port interface
Figure (network interface): Layer 2 packet processing, Layers 3 and 4 acceleration; operations are either supported in the smart backup or handled via OS orchestration