Process Variation in Near-threshold Wide SIMD Architectures

Sangwon Seo¹, Ronald G. Dreslinski¹, Mark Woh¹, Yongjun Park¹, Chaitali Chakrabarti², Scott Mahlke¹, David Blaauw¹, Trevor Mudge¹

¹University of Michigan, ²Arizona State University

Near Threshold Computing

Super Threshold: high performance, high energy consumption

Near Threshold: 10x energy reduction, 10x performance degradation

Sub Threshold: exponentially decreasing performance; increasing leakage becomes dominant

Near-threshold Computing

Advantage: High energy efficiency

Disadvantage: Low performance throughput

Compensated with very wide SIMD architecture

Sensitive to variations in threshold voltage

More critical issues in wide SIMD architectures:

Increased probability of timing errors

Expensive error recovery mechanisms


How bad is the delay variation in wide SIMD architectures running at near-threshold voltages?

How to mitigate the variation-induced timing errors?

Delay Variations in 90nm

[Figure: delay variation plot, annotated ~2.3x and ~1.6x]

Uncorrelated variations are averaged out over the chain.

Delay Variations – f(Vdd=0.55V, N)

A long chain helps, but the effect diminishes as N increases.

Variations are exacerbated with technology scaling.

Delay Variations – f(Vdd, N=50)

LER (line edge roughness) causes high variations in advanced technology nodes

Strict Design Rules

Metal-Gates w/ high-k material or SOI

Advanced lithography

Delay Distribution – 90nm GP

1 critical path delay = delay of a chain of 50 FO4 inverters

1-wide system delay = max(delays of 100 critical paths)

128-wide system delay = max(delays of 128 1-wide systems)

Performance Drop
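
The delay hierarchy defined on this slide can be explored with a small Monte Carlo sketch. Everything numeric beyond the 50-stage chain, the 100 paths, and the 128 lanes is an assumption made for illustration: inverter delays are taken as independent normal variables with mean 1.0 and a hypothetical sigma, only uncorrelated variation is modeled, and the function names are invented here. It is not the circuit-level methodology behind the plotted distributions.

```python
import math
import random

N_STAGES = 50   # FO4 inverters per critical path (from the slide)
N_PATHS = 100   # critical paths per 1-wide system (from the slide)
SIGMA = 0.15    # hypothetical sigma/mean of a single inverter delay

def path_delay(rng):
    # The sum of N_STAGES independent normal delays (mean 1.0 each) is itself
    # normal, so it is drawn directly. Its relative spread shrinks by
    # 1/sqrt(N_STAGES): uncorrelated variation averages out over the chain.
    return rng.gauss(N_STAGES, SIGMA * math.sqrt(N_STAGES))

def lane_delay(rng):
    # 1-wide system delay = slowest of its critical paths.
    return max(path_delay(rng) for _ in range(N_PATHS))

def system_delay(rng, width):
    # width-wide system delay = slowest of its 1-wide systems.
    return max(lane_delay(rng) for _ in range(width))

def mean_system_delay(width, trials=50, seed=0):
    # Average the system delay over several simulated chips.
    rng = random.Random(seed)
    return sum(system_delay(rng, width) for _ in range(trials)) / trials

if __name__ == "__main__":
    d1 = mean_system_delay(1)
    d128 = mean_system_delay(128)
    print(f"mean 1-wide delay:   {d1:.2f}")
    print(f"mean 128-wide delay: {d128:.2f}")
    print(f"performance drop:    {100 * (d128 / d1 - 1):.1f}%")
```

Taking the maximum over more lanes pushes the system delay further into the tail of the distribution, which is the source of the performance drop; the size of the drop printed here depends entirely on the assumed sigma.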

Variation Effects on 128-wide SIMD Architecture

- Structural Duplication
- Voltage margining
- Frequency margining

Near-threshold Wide SIMD Architecture: Diet SODA

[Seo et al. ISLPED 2010]

Structural Duplication

[Figure: 8-wide+2-spare system. Labels: SIMD Function Unit #7, Crossbar, Datapath#7 through Datapath#0]

Increase number of processing resources

Structural Duplication

[Figure: 8-wide+2-spare system. Labels: Crossbar, Datapath#6 (twice), Datapath#5 through Datapath#0]

Use the spares if required.
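
One way to picture "use the spares if required" is as a remapping from logical SIMD lanes to the physical datapaths that meet timing. The sketch below is a minimal illustration under assumptions: `remap_lanes`, the delay values, and the clock period are hypothetical, and the slides do not specify how the crossbar in Diet SODA is actually configured.

```python
def remap_lanes(datapath_delays, clock_period, width):
    """Map each logical lane to a physical datapath that meets timing.

    datapath_delays holds one measured delay per physical datapath
    (width regular datapaths followed by the spares).
    Returns a list of physical indices, one per logical lane, or None
    when fewer than `width` datapaths meet the clock period.
    """
    good = [i for i, d in enumerate(datapath_delays) if d <= clock_period]
    if len(good) < width:
        return None  # not enough working datapaths, even with spares
    return good[:width]

# Hypothetical 8-wide + 2-spare system in which datapath 7 is too slow.
delays = [0.9, 0.8, 0.95, 0.85, 0.9, 0.7, 0.99, 1.2, 0.9, 0.8]
mapping = remap_lanes(delays, clock_period=1.0, width=8)
print(mapping)
```

In this example the slow datapath 7 is skipped and logical lane 7 is served by the first spare (physical index 8), while the second spare stays unused.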

Structural Duplication – 90nm GP

6 spares are required to match the chip delay of baseline architecture.
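
The required number of spares can be estimated with an order-statistics view: with k spares, the k slowest datapaths can be left unused, so the chip delay becomes the delay of the width-th fastest datapath. The sketch below assumes this model; the lane-delay distribution, the target delay, and the function names are hypothetical placeholders, and it does not reproduce the figure of 6 spares reported on the slide.

```python
import random

def chip_delay(lane_delays, width):
    """Delay of the width-th fastest lane, i.e. the slowest lane still in
    use once the slowest (len(lane_delays) - width) lanes are set aside."""
    return sorted(lane_delays)[width - 1]

def spares_needed(draw_lane_delay, width, target,
                  max_spares=64, trials=200, seed=0):
    """Smallest spare count whose mean chip delay meets `target`, or None."""
    rng = random.Random(seed)
    for spares in range(max_spares + 1):
        total = 0.0
        for _ in range(trials):
            lanes = [draw_lane_delay(rng) for _ in range(width + spares)]
            total += chip_delay(lanes, width)
        if total / trials <= target:
            return spares
    return None

if __name__ == "__main__":
    # Hypothetical lane delays: normal with mean 10 and sigma 1 (arbitrary units).
    k = spares_needed(lambda rng: rng.gauss(10.0, 1.0), width=8, target=10.8)
    print("spares needed:", k)
```

Each added spare removes one more sample from the slow tail, so the first few spares buy the most delay reduction.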

Voltage Margining

Delay distributions: 45nm PTM model is used

Increase supply voltage

Frequency Margining

Increase clock period

Suitable for applications with relaxed timing constraints

For advanced technology nodes, this is impractical

Caveat: consider its impact on the whole system

SIMD subsystem clock period (Tclk@NTV)

Memory subsystem clock period (Tclk@FV)

Structural Duplication vs. Voltage Margining

Combination of two schemes – 45nm GP

128-wide system @ 0.6V

26 spares

17mV boost

5mV + 8 spares

10mV + 2 spares
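
The four options listed above can be compared with a back-of-the-envelope power model. The sketch below is a toy model under loud assumptions: power is taken to scale with the number of datapaths times the square of the supply voltage, spares are assumed to draw as much power as regular datapaths, and leakage and clock frequency are ignored. The function names are hypothetical and this is not the power analysis behind the slide.

```python
BASE_VDD = 0.6     # volts (from the slide)
BASE_WIDTH = 128   # SIMD lanes (from the slide)

# (voltage boost in mV, number of spares) for the options on the slide
OPTIONS = [(0, 26), (17, 0), (5, 8), (10, 2)]

def relative_power(boost_mv, spares):
    """Toy model (assumption): power ~ datapath count * Vdd^2."""
    vdd = BASE_VDD + boost_mv / 1000.0
    return ((BASE_WIDTH + spares) / BASE_WIDTH) * (vdd / BASE_VDD) ** 2

def cheapest(options=OPTIONS):
    """Option with the lowest relative power under the toy model."""
    return min(options, key=lambda o: relative_power(*o))

if __name__ == "__main__":
    for boost_mv, spares in OPTIONS:
        overhead = 100 * (relative_power(boost_mv, spares) - 1)
        print(f"{boost_mv:2d} mV boost + {spares:2d} spares: {overhead:5.2f}% overhead")
    print("cheapest under this model:", cheapest())
```

Under these assumptions the 10mV + 2 spares option comes out cheapest, but the ranking depends entirely on the assumed power model, for instance on whether unused spares can be power-gated.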

Variation-Aware Diet SODA

Conclusions

Near-threshold operation of a wide SIMD system can cause timing problems due to process variations.

Variation effects on a 128-wide SIMD architecture are marginal for the 90nm technology node, but could be non-negligible for current and future technology nodes.

A combination of structural duplication and voltage margining mitigates variation-induced timing problems in wide SIMD architectures with minimal power overhead.

Questions?

Thank you!

Backup Slides

Local Spares vs. Global Spares

Local Sparing, 1 out of 4 (2 spares)

+ small overhead

- cannot handle burst errors

Global Sparing (2 spares)

+ handles burst errors

- large overhead
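
The burst-error difference between the two schemes can be shown with a short check. The sketch below assumes that a local spare can only replace a faulty lane within its own group of four, that a global spare can replace any lane, and that the spares themselves are fault-free; the function names are hypothetical.

```python
def repairable_local(faulty, width, group=4):
    """Local sparing: one spare per `group` lanes, usable only in that group."""
    faulty = set(faulty)
    for start in range(0, width, group):
        in_group = sum(1 for f in faulty if start <= f < start + group)
        if in_group > 1:
            return False  # one spare cannot cover two faults in a group
    return True

def repairable_global(faulty, n_spares):
    """Global sparing: any spare can replace any faulty lane."""
    return len(set(faulty)) <= n_spares

# 8-wide system with 2 spares in both schemes.
burst = [2, 3]      # adjacent faulty lanes, both in group 0
scattered = [1, 6]  # one faulty lane per group
print(repairable_local(burst, width=8), repairable_global(burst, n_spares=2))
print(repairable_local(scattered, width=8), repairable_global(scattered, n_spares=2))
```

With the same two spares, the burst pattern is repairable only under global sparing, while the scattered pattern is repairable under both.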

Local Spares vs. Global Spares

Global sparing is better than local sparing.

XRAM crossbar supports global sparing.

128 + 8 global spares

128 + 32 local spares (1 out of 4)

Variation-Aware Diet SODA

With little area and power overhead, delay variations can be mitigated.
