
EE241 - Spring 2006

Date post: 25-Oct-2021

EE241 - Spring 2006: Advanced Digital Integrated Circuits

Lecture 26: Future Perspectives

Embedded Software-Based Self-Testing for Programmable System Chips

[Figure: a low-cost tester loads a test program into on-chip memory and reads back response signatures. CPU, DSP, IP core and system memory attach to the on-chip bus (with bus arbiter) through VCI bus-interface master and target wrappers.]

Loading test program at low speed
Self-test at operational speed
Unloading response signature at low speed

Low-cost tester
High-quality at-speed test
Low test overhead
Non-intrusive
Test in normal operational mode
No violation of power consumption
More accurate speed-binning
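
The load / self-test / unload flow of the self-testing slide can be illustrated with a minimal sketch. It assumes the test program compacts its responses into a single signature that the tester compares against a known-good value; the 16-bit polynomial, the adder under test and all function names are hypothetical and not taken from the lecture.

```python
# Hypothetical sketch: compact at-speed self-test responses into one signature.
# The polynomial (0x1021, the one used by CRC-16/CCITT) and all names are
# illustrative assumptions.

POLY = 0x1021
MASK = 0xFFFF


def update_signature(signature: int, response: int) -> int:
    """Fold one 16-bit response word into the running signature."""
    signature ^= response & MASK
    for _ in range(16):
        msb = signature & 0x8000
        signature = (signature << 1) & MASK
        if msb:
            signature ^= POLY
    return signature


def run_self_test(unit_under_test, operands):
    """Exercise the unit and return only the compacted signature."""
    signature = 0
    for a, b in operands:
        signature = update_signature(signature, unit_under_test(a, b))
    return signature


def good_adder(a, b):
    return (a + b) & MASK


def faulty_adder(a, b):
    # Models a stuck-at-0 fault on result bit 3.
    return (a + b) & MASK & ~0x0008


OPERANDS = [(0x0001, 0x0007), (0x00FF, 0x0001), (0x1234, 0x4321), (0xFFFF, 0x0001)]
golden = run_self_test(good_adder, OPERANDS)      # computed once, off-line
observed = run_self_test(faulty_adder, OPERANDS)  # unloaded at low speed
print("pass" if observed == golden else "fail")
```

Only the short signature crosses the tester interface, which is why a slow, low-cost tester can suffice even though the test itself runs at operational speed.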

Project Presentations Tomorrow

Tu May 9 2-5pm in 127 Dwinelle

Length of presentation: 8 + 2(N-1) minutes, where N is the number of people in the group

3 minutes per group allocated for Q&A

Presentation should contain the following

1 slide outlining problem

Couple of slides discussing proposed solution and how it differs from what other people have done

Couple of results slides

Number of slides should NOT be larger than the number of minutes

BE CONCISE and DO NOT EXCEED THE ALLOTTED TIME

Project Reports

Due by Fr May 12th 5pm

Should be in paper format, max of 6 pages (font 10 minimum)

Title of the project / your names and e-mail addresses
Abstract (100 words)
Motivation
Problem statement
Possible solutions from literature (from midterm report)
Proposed comparison/solution. Discuss why you selected this particular one.
Conditions/assumptions of your design
Analysis: Does it work? Analytical analysis, simulation results.
Conclusion. What is this approach good for? What else could be done?
References

Take-Home Final

Two options:
Early Bird: Fr May 12, 5pm to Monday May 15, 10am
Regular Offering: Monday May 15, 5pm to Th May 18, 10am

Send e-mail to [email protected] and [email protected] if you want to subscribe to the early bird option

Submissions of answers preferred in electronic form (scans are ok). If impossible, paper version to Jessica in 558 Cory before 10am on respective days.

THIS IS A PRIVATE AND PERSONAL EXERCISE. Honesty is appreciated.

Semester Look Back

What we did not cover …

Interconnect

Key challenge

Interesting areas: high-speed serial interconnect, alternative interconnect, networks on a chip

Arithmetic

Trading off performance and power

Please send feedback on topics you would like to see covered in less or more detail

EE241, S06

The Silicon Age: A Closer Look

[Figure: design complexity versus time, 1950 to 2000. Design styles: Discrete, Custom, ASIC, IP/SoC. Degree of structure: Unstructured, Some structure, Structured, "???". Milestone: 1 Billion Transistors.]

The Silicon Age Still on a Roll, But …

High Volume Manufacturing: 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018
Technology Node (nm): 90, 65, 45, 32, 22, 16, 11, 8
Integration Capacity (BT): 2, 4, 8, 16, 32, 64, 128, 256
Delay = CV/I scaling: 0.7, ~0.7, >0.7, then delay scaling will slow down
Energy/Logic Op scaling: >0.35, >0.5, >0.5, then energy scaling will slow down
Bulk Planar CMOS: from High Probability to Low Probability
Alternate, 3G etc: from Low Probability to High Probability
Variability: Medium, High, Very High
ILD (K): ~3, <3, then reduce slowly towards 2-2.5
RC Delay: 1, 1, 1, 1, 1, 1, 1, 1
Metal Layers: 6-7, 7-8, 8-9, then 0.5 to 1 layer per generation

Some Major Hurdles on The Way!

2003 ITRS Roadmap
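
The technology-node and integration-capacity rows of the roadmap table can be checked with a few lines of arithmetic. The sketch below only reuses the numbers from the table; the variable names are illustrative. It shows that each generation shrinks linear dimensions by roughly 0.7x (about half the area) while the integration capacity doubles.

```python
# Values transcribed from the roadmap table; names are illustrative.
years = [2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018]
node_nm = [90, 65, 45, 32, 22, 16, 11, 8]
capacity_bt = [2, 4, 8, 16, 32, 64, 128, 256]

# Linear shrink and capacity growth from one generation to the next.
shrink = [node_nm[i + 1] / node_nm[i] for i in range(len(node_nm) - 1)]
growth = [capacity_bt[i + 1] / capacity_bt[i] for i in range(len(capacity_bt) - 1)]

for year, s, g in zip(years[1:], shrink, growth):
    # s * s approximates the area of a scaled feature relative to the previous node.
    print(f"{year}: linear shrink {s:.2f}, area factor {s * s:.2f}, capacity x{g:.0f}")
```

The linear shrink stays close to 0.7 in every step, which is the classical scaling factor the delay row of the table also refers to.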

The Challenges of the Next Decade(s)

• The Physics and Manufacturing Challenges

– A whole slew of static and dynamic variations and error mechanisms

• The Design Introduction Challenge

– Complexity, risk, time, cost

• The n-furcation of the Market

Design at the End of the Roadmap

An era of abundance, self-healing, resiliency, and beating the odds.

A Roadmap for Late-Silicon Age Design

[Figure: design complexity versus time (2000, 2005, 2010, Beyond, The far beyond). Successive stages: Fully structured and regular fabrics; Concurrency and Flexibility; Self-Healing; Error-resiliency; Embracing Randomness.]

A Roadmap for Late-Silicon Age Design

• Regularity and Structure

• Concurrency and Flexibility

• Self-Healing

• Error-Resiliency

• Embracing Randomness

Increasing necessity

Regular implementation fabrics
Absolutely required for manufacturability
Driven by photo-lithography and eventually self-assembly constraints
Also for variability, reliability, and time-to-market

Regular Fabrics – A Plethora of Choices

FPGA
VPGA (CMU)
River PLA (Berkeley)
Structured ASIC (e.g. LSI RapidChip)

Trade-off between area, performance, power and time-to-market (factors 5 to 10)

Regular Fabrics - Example

CMU Regular Logic Bricks

Standard-cell library with fewer (~10), coarser, configurable (w/ vias), micro-regular brick layouts…

…that exhibit macro-regularity when assembled at chip-level

[Figure: 2-D FFT plots of poly-Si patterns, comparing ASIC “spatial” regularity with brick “spatial” regularity]

[Courtesy: Larry Pileggi, Andrzej Strojwas, CMU – C2S2]

A Roadmap for Late-Silicon Age Design

• Regularity and Structure

• Concurrency and Flexibility

• Self-Healing

• Error-Resiliency

• Embracing Randomness

Immediate future

Concurrency and heterogeneity:
• Driven by power density concerns
• Alternative to higher clock frequencies

Flexibility:
• Higher re-use, shorter time-to-market, in-field adaptation and upgrade

The Age of Concurrency and Flexibility

Heterogeneous concurrency now prevalent in wireless, automotive, consumer, media processing, graphics and gaming

[Figure: example chips: AMD Dual Core Microprocessor; Intel Dual Core; IBM/Sony Cell Processor; NTT Video codec with 4 Tensilica cores; Xilinx Virtex-4; Berkeley Pleiades (ARM with heterogeneous reconfigurable fabric)]

Are We Ready for 1000 CPUs per Chip?

• Compilers, operating systems, architectures are definitely not!
• How to do research on 1000 CPU systems in compilers, OS, architecture “in parallel”?

One Solution: RAMP
A Framework for Multi-Core System Development
• Create 1000-CPU system from ~ 40 FPGAs
• Distribute out-of-the-box Massively Parallel Processor that runs standard binaries of OS and application to all major research institutes in the US.
• Provides uniform framework for architecture development and exploration

Berkeley BEE-II: 2 TOPS system prototyping environment
30-40 TOPS (2 TFlops) Rack

Core Team: D. Patterson, J. Wawrzynek, J. Rabaey (UCB), James Hoe (CMU), Christos Kozyrakis (Stanford), Krste Asanović (MIT), M. Oskin (Washington), D. Chiou (Texas), W. Hwu (Illinois), S. Lu (Intel)

A Roadmap for Late-Silicon Age Design

• Regularity and Structure

• Concurrency and Flexibility

• Self-Healing

• Error-Resiliency

• Embracing Randomness

Later this decade

Self-Healing Architectures
• On-chip test and diagnostics used to correct for variations and stress
• Static and dynamic

Variations Becoming Pronounced

[Figure: lithography wavelength (365nm, 248nm, 193nm, 13nm EUV) versus technology generation (180nm, 130nm, 90nm, 65nm, 45nm, 32nm), 1980 to 2020, log scale from 0.01 to 1 micron (10 to 1000 nm), with the gap between the two curves marked]

[Figure: temperature (C) map over X and Y, scale 40 to 110]

[Figure: normalized frequency (0.9 to 1.4) versus normalized leakage Isb (1 to 5) at 130nm; annotations: 30%, 5X]

Design becoming “statistical”
• makes verification substantially harder
• challenging synchronization strategies
• “error-free” design untenable

Courtesy: Shekhar Borkar, Intel

Self-Healing

• Introduce sensors that monitor key aspects of system

– Manufacturing and environmental conditions

Process variations, temperature, voltage, activity, etc

– Key properties that accelerate failure mechanisms

• Employ system-level intelligent control to reduce stress

– Temperature control via resource assignment

– Active management of voltage-reliability trade-offs

• Utilize tuning and healing to alleviate reliability threats

– NBTI reversal

– In-field clock tuning

Courtesy: T. Austin
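
The bullet "temperature control via resource assignment" on the Self-Healing slide can be sketched as a small control loop. This is a toy model under stated assumptions: the per-core temperature sensors, the 85 C limit and the heating and cooling rates are all hypothetical values chosen for illustration.

```python
# Hypothetical sketch: thermal-aware task assignment across cores.
# Temperatures, limits and heating/cooling rates are illustrative assumptions.

def assign_task(core_temps_c, limit_c=85.0):
    """Return the index of the coolest core, or None if every core is at or above the limit."""
    coolest = min(range(len(core_temps_c)), key=lambda i: core_temps_c[i])
    return coolest if core_temps_c[coolest] < limit_c else None


def simulate(core_temps_c, steps, heat_c=4.0, cool_c=1.0, ambient_c=45.0, limit_c=85.0):
    """Each step the chosen core heats up; idle cores cool towards ambient."""
    temps = list(core_temps_c)
    for _ in range(steps):
        chosen = assign_task(temps, limit_c)
        for i in range(len(temps)):
            if i == chosen:
                temps[i] += heat_c
            else:
                temps[i] = max(ambient_c, temps[i] - cool_c)
    return temps


final = simulate([80.0, 60.0, 60.0, 60.0], steps=200)
print([round(t, 1) for t in final])
```

Because work is always steered to the coolest core and withheld when all cores are hot, no core in this model can exceed the limit by more than one heating step.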

Test Moving On-Line

• On-chip resources used to minimize test cost
• Also available for dynamic re-evaluation and adaptation

[Figure: CPU with VCI bus-interface master wrapper on the on-chip bus; a low-cost tester loads a diagnostic test program into on-chip memory and reads back a response map; on-chip noise samplers and an on-chip leakage sensor]

Logic failure map:
00001100000000000000000000000000000000100000000000100110000000001100010000000000111111111111111111111111111111110000000000000000

Adaptive Biasing Using On-Line Test

[Figure: energy-performance trade-off. Switching energy Eswitching (fJ, 5 to 50) versus path delay (ps, 1.0E+03 to 1.0E+07). Curves: adaptive tuning; worst case, w/o Vth tuning; nominal, w/ Vth tuning. Annotation: 10x]

[Figure: a module with an attached test module; test inputs and responses are used to set Vdd, Vbb and Tclock]

Dynamically adjust supply and threshold design parameters to center the design in the presence of process variations!

Easier Again in Regular Fabrics

Courtesy: K. Cao, Berkeley
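
A minimal sketch of the adaptive-biasing idea: an on-line test measures path delay, and the body bias is stepped until the module just meets its clock period. Everything numeric here is an assumption made for illustration: the alpha-power-law delay model, its constants, the linear body-effect coefficient, and the sign convention that a positive Vbb is a forward bias that lowers Vth.

```python
# Hypothetical sketch of test-driven body-bias tuning. Model and constants are
# illustrative assumptions, not values from the lecture.

ALPHA = 1.3  # assumed velocity-saturation index of the alpha-power law


def path_delay_ps(vdd, vth, k_ps=500.0):
    """Assumed critical-path delay model: delay ~ Vdd / (Vdd - Vth)^alpha."""
    return k_ps * vdd / (vdd - vth) ** ALPHA


def tune_body_bias(vth0, vdd, t_clock_ps, body_coeff=0.1,
                   step=0.05, vbb_min=-0.5, vbb_max=0.5):
    """Return the lowest body bias (V) that meets timing, or None if none does.

    Sweeping from reverse towards forward bias picks the highest Vth
    (lowest leakage) setting that still passes the delay test.
    """
    n_steps = int(round((vbb_max - vbb_min) / step))
    for n in range(n_steps + 1):
        vbb = vbb_min + n * step
        vth = vth0 - body_coeff * vbb  # forward bias lowers the threshold
        if path_delay_ps(vdd, vth) <= t_clock_ps:
            return round(vbb, 2)
    return None


for label, vth0 in [("fast die", 0.27), ("nominal die", 0.30), ("slow die", 0.33)]:
    print(label, tune_body_bias(vth0, vdd=1.0, t_clock_ps=800.0))
```

In this model a fast die ends up reverse-biased (saving leakage) and a slow die forward-biased (recovering speed), so all three dies are centered on the same effective threshold.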

Adaptive (Body) Biasing Impact

Courtesy: P. Gelsinger and S. Borkar, Intel (DAC04)

[Figure: die, 4.5 mm by 5.3 mm, with multiple subsites]

Dynamic Resource Allocation

In the Multi-Processor Space
Compiler combines load assignment with DVS
mdl group at PSU

[Figure: normalized energy (40 to 100) versus number of processors (2, 4, 8, 16, 32) for 3D, DFE, LU, SPLAT, MGRID, WAVE5]

More savings with more processors!

In the Interconnect Space
Use routing throttling to perform thermal management
ThermalHerd (L.S. Peh, Princeton)

Rejuvenation

Negative Bias Temperature Instability

Source: D. Blaauw, UMich

A Roadmap for Late-Silicon Age Design

• Regularity and Structure

• Concurrency, Heterogeneity and Flexibility

• Self-Healing

• Error-Resiliency

• Embracing Randomness

Beyond 2010

Redundancy Galore
The only way to provide true error-resiliency!

With billions of transistors, overhead factors of 2 to 3 are reasonable if leading to 100% yield, supreme performance, or new applications.

Error-Resilient Systems

Incorporate facilities to push through system faults

• Error detection technologies

– Systems checkers, online testing, continuous functional verification

• Fault diagnosis

– Fine-grained testing, online testing

• System state recovery

– Microarchitectural checkpointing, algorithmic tolerance

• Physical repair

– Sparing, TMR

Courtesy: T. Austin
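
Of the physical-repair options listed on the Error-Resilient Systems slide, TMR (triple modular redundancy) is the easiest to show in code. The sketch below is a generic bitwise majority voter; the function names are illustrative, and reporting the disagreeing replica is an added assumption about how voting could feed a sparing decision.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant results."""
    return (a & b) | (a & c) | (b & c)


def tmr_run(module_outputs):
    """Vote, and report which replicas disagree with the majority."""
    a, b, c = module_outputs
    voted = tmr_vote(a, b, c)
    suspects = [i for i, out in enumerate(module_outputs) if out != voted]
    return voted, suspects


# Replica 2 returns a corrupted word; the vote masks it and flags the replica.
print(tmr_run((0b1010, 0b1010, 0b0110)))
```

A replica that is flagged repeatedly is a natural candidate for replacement by a spare, which is how voting and sparing can complement each other.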

A Gradual Introduction Process

Example: Aggressive Deployment using “Razor”

A “pseudo-synchronous” approach to address process variations and power minimization with minimal overhead by combining circuit and architectural techniques

[Figure: “razored pipeline”: stages IF, ID, EX, MEM (read-only), WB (reg/mem) separated by Razor FFs, with a stabilizer FF ahead of WB; per-stage error, bubble and recover signals; flush control with flushID; PC]

[Figure: Razor flip-flop: main FF (D, Q, clk) with a shadow latch clocked by clk_del and an error comparator producing Error_L]

[Figure: energy versus supply voltage, with curves for processor energy, recovery energy and total energy, and the optimal voltage marked]

Courtesy: T. Austin, D. Blaauw, Michigan
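
The energy-versus-supply-voltage plot of the Razor slide can be reproduced qualitatively with a toy model. All of it is assumed for illustration: processor energy scaling as Vdd squared, a timing-error rate that grows exponentially below a critical voltage, and a fixed recovery penalty per error. None of the constants come from the Razor work itself.

```python
import math

# Hypothetical model of the Razor trade-off; every constant is an assumption.


def error_rate(vdd, v_crit=1.0, slope=40.0, floor=1e-9):
    """Assumed timing-error probability per cycle; grows exponentially below v_crit."""
    return min(1.0, floor * math.exp(slope * max(0.0, v_crit - vdd)))


def processor_energy(vdd):
    """Normalized switching energy per cycle (proportional to Vdd^2)."""
    return vdd ** 2


def recovery_energy(vdd, penalty_cycles=10.0):
    """Energy spent re-executing after detected errors."""
    return processor_energy(vdd) * penalty_cycles * error_rate(vdd)


def total_energy(vdd):
    return processor_energy(vdd) + recovery_energy(vdd)


# Sweep the supply from 0.50 to 1.00 (normalized) and pick the minimum.
GRID = [round(0.50 + 0.01 * i, 2) for i in range(51)]
v_opt = min(GRID, key=total_energy)
print(v_opt, round(total_energy(v_opt), 3), round(total_energy(1.0), 3))
```

Lowering the supply first saves energy quadratically; below some point the recovery cost of the rising error rate dominates, so the total has a minimum at a voltage where a small number of errors is tolerated rather than avoided.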

“Aggressive” Deployment At the Algorithm Level

[Figure: input x[n] drives a Main Block producing y_a[n] and an Estimator producing y_e[n]; a comparison | | > Th selects the output ŷ[n]]

[Figure: power versus voltage (both normalized to 1.0), with curves Pmain, PEC and PTOT and the energy savings marked]

Voltage overscale Main Block.
Correct errors using Estimator.
Power savings ≥ 3X!

Courtesy: N. Shanbhag, Illinois
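
The main-block / estimator arrangement can be sketched as follows. This is an illustrative toy, assuming a 4-tap moving sum as the main block, a cheaper 2-tap estimator, and voltage-overscaling errors modeled as an occasional flipped high-order bit; the filters, the error pattern and the threshold are hypothetical choices, not the circuits from the slide.

```python
# Hypothetical sketch of algorithm-level error correction with an estimator.

def main_block(x, n):
    """Full-precision 4-tap moving sum (the voltage-overscaled block)."""
    return x[n] + x[n - 1] + x[n - 2] + x[n - 3]


def estimator(x, n):
    """Cheap 2-tap estimate of the same sum (assumed to run error-free)."""
    return 2 * (x[n] + x[n - 1])


def inject_timing_error(y, n):
    """Model an overscaling error as a flipped high-order bit on some samples."""
    return y ^ (1 << 9) if n % 7 == 0 else y


def select_output(y_a, y_e, th):
    """Use the estimate only when the main block disagrees with it by more than th."""
    return y_e if abs(y_a - y_e) > th else y_a


x = [100 + (n % 5) for n in range(20)]
TH = 32
exact, faulty, corrected = [], [], []
for n in range(3, len(x)):
    y_true = main_block(x, n)
    y_a = inject_timing_error(y_true, n)
    exact.append(y_true)
    faulty.append(y_a)
    corrected.append(select_output(y_a, estimator(x, n), TH))

worst_faulty = max(abs(f - t) for f, t in zip(faulty, exact))
worst_corrected = max(abs(c - t) for c, t in zip(corrected, exact))
print(worst_faulty, worst_corrected)
```

Timing errors tend to hit the most significant bits, so they are large and easy to detect with a coarse estimate; the residual error after correction is bounded by the estimator's own inaccuracy.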

Moving the Verification on the Chip

Self-checking processor

• Core function validated by checker

• Checker relaxes burden of correctness on core processor

• Core does the heavy lifting, removes hazards that could slow the simple checker

[Figure: Core (performance): IF, ID, REN, REG, SCHEDULER, EX/MEM, executing speculative instructions; results are passed in-order with PC, inst, inputs, addr to the Checker (correctness): CHK, CT]

[Figure: area comparison: Alpha 21264, 205 mm2; REMORA checker, 12 mm2]

Courtesy: Todd Austin, Univ. of Michigan
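
The core/checker split can be sketched in a few lines. This is a simplified illustration under stated assumptions: the instruction tuple format, the three-operation ISA and the register-file dictionary are hypothetical, and a real checker would also trigger a pipeline flush on a mismatch, which is only noted in a comment here.

```python
import operator

# Hypothetical three-operation ISA for the sketch.
OPS = {"add": operator.add, "sub": operator.sub, "and": operator.and_}


def check_and_commit(retired, regfile):
    """Recompute each in-order result and commit the checker's value.

    `retired` holds (op, dst, input_a, input_b, core_result) tuples as handed
    over by the core. Returns the number of core results that were corrected.
    """
    corrections = 0
    for op, dst, a, b, core_result in retired:
        expected = OPS[op](a, b)  # simple, hazard-free recomputation
        if core_result != expected:
            corrections += 1      # a real design would also flush the core here
        regfile[dst] = expected
    return corrections


regs = {}
stream = [
    ("add", "r1", 2, 3, 5),
    ("sub", "r2", 9, 4, 6),  # the core produced a wrong result
    ("and", "r3", 6, 3, 2),
]
print(check_and_commit(stream, regs), regs)
```

Since the checker receives the operands along with the result, it never has to resolve dependencies or speculate, which is what lets it stay small compared with the core.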

“On-Line X”
(X = Verification, Test, Tuning, Reliability, Resource, Power and Leakage Management)

From Design Time to Run Time Yield Improvement!

“Turning lemons into lemonade”
T. Austin

Coordinated Forward Error Recovery

Runtime Validation of Multithreaded Processors

[Figure: SMT processor with register file and memory; per-thread retired instructions are dispatched to runtime monitoring hardware: context status register, hardware synchronization unit and DIVA checker processors]

Correctness Properties of Multithreaded Execution
• Inter-thread Communication
• Inter-thread Synchronization
• Intra-thread Data Flow
• Intra-thread Control Flow

[Figure: bar chart, scale 0.99 to 1.06, for FFT, LU, CHOLESKY, BARNES, FMM, WATER-NSQUARED, WATER-SPATIAL; legend: Runtime Validation Configuration, Fault Rate = 1/1K, Fault Rate = 1/1M]

Towards malleable, resilient architectures

The Quest: Scaleable (hard and soft) architectures that provide flexible redundancy to accommodate systematic and random, static and dynamic errors while avoiding brittleness!


A Roadmap for Late-Silicon Age Design

• Regularity and Structure

• Concurrency and Flexibility

• Self-Healing

• Error-Resiliency

• Embracing Randomness

The Far Beyond

Maintaining a purely deterministic Boolean abstraction ultimately becomes untenable!
Maintaining our abstractions == Slowly abandon them!!

The Search for (New) Scaleable and Stackable Abstractions

An Interesting Case Study: The “Neural Network” MOC

Properties:
• Works well on noisy signals
• Uses “soft” decisions
• Operates in the presence of failures of components and interconnections

Challenge: Limited scope
Works mostly for classification problems

[Figure: artificial neuron]

Allow devices to make errors and use models-of-computation that tolerate them (signal processing, communication, coding, information theory)

Example: Collaborative Networks

• Large number of states/nodes

• Bi-directional, non-linear, non-deterministic links

• Local coupling with globally emergent behavior

• Inherently redundant and resilient to failure

Sensor Network-on-a-chip

Source: N. Shanbhag

Distributed Collaborative Systems on a Chip

Example: A configurable radio architecture based on collaborative autonomous entities

Source: J. Roychowdhury, J. Rabaey

Array of locally-coupled cheap low-power oscillator-based units
• Known to exhibit complex, spontaneous pattern formation
• Operation mode selected through choice of coupling factors and operational nodes

[Figure: emerging pattern as a function of coupling factor]
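
How the coupling factor changes the collective behavior of locally coupled oscillators can be illustrated with a small simulation. This sketch swaps in a generic Kuramoto-style phase model on a ring, which is an assumption made for illustration and not the radio architecture from the slide; the frequencies, step size and coupling values are arbitrary.

```python
import cmath
import math

# Hypothetical sketch: ring of nearest-neighbor-coupled phase oscillators
# (Kuramoto-style model), integrated with a simple Euler step.


def coherence(coupling, n=8, t_end=50.0, dt=0.01):
    """Simulate the ring and return the phase coherence (0 = spread, 1 = locked)."""
    freqs = [0.9 + 0.2 * i / (n - 1) for i in range(n)]  # slightly mismatched units
    phases = [0.0] * n
    for _ in range(int(round(t_end / dt))):
        updated = []
        for i in range(n):
            left, right = phases[i - 1], phases[(i + 1) % n]
            pull = math.sin(left - phases[i]) + math.sin(right - phases[i])
            updated.append(phases[i] + dt * (freqs[i] + coupling * pull))
        phases = updated
    return abs(sum(cmath.exp(1j * p) for p in phases)) / n


for k in (0.0, 2.0):
    print(f"coupling {k}: coherence {coherence(k):.2f}")
```

With no coupling the mismatched units drift apart; with sufficiently strong local coupling they lock to a common frequency, so a single parameter selects between qualitatively different global patterns.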

The Mechanical Radio

The Ultimate ULP Tunable Wireless Transceiver?

[Figure: wine-glass disk resonator, R = 32 μm, with labels: support beams, anchor, input electrode, output electrode, coupling beam]

9 wine-glass disk oscillator-based GSM-compliant oscillator

Source: C. Nguyen, U. Michigan

Transitioning to the Post-Silicon Age

Implementation platforms that work under very low SNR, are non-deterministic, unpredictable and unreliable…

[Figure panels: Molecular, Organic, NanoOptics, Nanotube]

Some Concluding Remarks

Formidable challenges over the next decades to dramatically alter design paradigms

Variability and reliability to lead to novel micro-architectures and computational models

Regularity and redundancy central tenets

The opportunities:

Use the abundance of transistors to move the burden from pre- or post-manufacturing evaluation to on-line activities

Gradual incorporation of error-resilient computational models


That’s all Folks!

Thanks for the fun semester. See you tomorrow, Tuesday!