NoC Frequency Scaling with Flexible-Pipeline Routers

Date post: 11-Jan-2016
NoC Frequency Scaling with Flexible-Pipeline Routers. Pingqiang Zhou, Jieming Yin, Antonia Zhai, and Sachin S. Sapatnekar, University of Minnesota – Twin Cities. International Symposium on Low Power Electronics and Design.
Transcript
Page 1: NoC  Frequency Scaling with Flexible-Pipeline Routers

International Symposium on Low Power Electronics and Design

NoC Frequency Scaling with Flexible-Pipeline Routers

Pingqiang Zhou, Jieming Yin, Antonia Zhai, and Sachin S. Sapatnekar

University of Minnesota – Twin Cities

Page 2: NoC  Frequency Scaling with Flexible-Pipeline Routers

Tile-Based Multicore System

[Figure: tile-based multicore system; labels: core (C), L1, L2, router (R), memory (MEM)]

NoC dissipates substantial system energy

RAW – 36%; Intel 80-tile – 28% [Vangal et al. 2008]

Page 3: NoC  Frequency Scaling with Flexible-Pipeline Routers

VFS and Its Limitations

• NoC is
– Potential performance bottleneck
– Source of energy consumption
Designed for diverse traffic patterns

• VFS to reduce energy
• Limitations of Aggressive VFS
– Reduce throughput
– Increase latency
Work for limited traffic pattern

Can we make VFS work for other important traffic patterns?

[Figure: system diagram with memory (MEM) blocks, labeled "Superscalar Machine"]

[Figure: traffic-pattern quadrant; axes: Latency (Sensitive / Insensitive) and Throughput (High / Low)]

Page 4: NoC  Frequency Scaling with Flexible-Pipeline Routers

Frequency Scaling

[Animation: router pipeline stages 1 2 3 4 at Frequency = F and at Frequency = 0.5F, with timing label T]

[Figure: Network Energy Breakdown (Static, Clock, Dynamic) for ammp, art, blackscholes, equake, fkmeans, kmeans, Avg; y-axis 0 to 1]

Frequency scaling harms performance

Page 5: NoC  Frequency Scaling with Flexible-Pipeline Routers

Reconfigure Pipeline

[Figure: router pipeline stages 1 2 3 4 at Frequency = 0.5F, shown before and after pipeline reconfiguration, with timing label T]

Flexible pipeline can reduce router pipeline delay

Page 6: NoC  Frequency Scaling with Flexible-Pipeline Routers

Flexible Pipeline Routers

Reduce frequency without increasing router latency

+ Reduce NoC energy
+ Negligible performance degradation

Target Application
• Low throughput
• Latency sensitive

[Figure: traffic-pattern quadrant; axes: Latency (Sensitive / Insensitive) and Throughput (High / Low)]

Page 7: NoC  Frequency Scaling with Flexible-Pipeline Routers

Outline

• Background/Motivation
• Router Design
• Experimental Results
• Related work
• Conclusion

Page 8: NoC  Frequency Scaling with Flexible-Pipeline Routers

Baseline Router Architecture

[Figure: baseline router; blocks: Input ports, Input Controller (BW/RC) with buffers (MC 1, VC 1 ... MC n, VC 1), Route Computation, VC Allocator (VA), Switch Allocator (SA), Crossbar Switch (ST), Output ports]

Pipeline stages
• Head flit: BW+RC VA SA ST
• Body/tail flit: BW SA ST

How to reconfigure pipeline?

Page 9: NoC  Frequency Scaling with Flexible-Pipeline Routers

Pipeline Stage Delay

Stage   BW+RC   VA       SA       ST
Delay   100 τ   65.5 τ   77.7 τ   45 τ

τ: inverter delay

Time-borrowing
• Boost pipeline frequency
• Average out stage delays

Delay of 4-stage pipeline: Tclk = 72.1τ

The router delay model is presented in [Peh et al., HPCA 2001].
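
The effect of averaging out the stage delays can be checked with a few lines of Python. This is a minimal sketch, assuming ideal time borrowing (the clock period equals the mean stage delay) and, for comparison, the usual rule that without borrowing the slowest stage sets the period. The function names are illustrative and are not taken from the slides.

```python
# Stage delays from the slide, in units of the inverter delay tau.
STAGE_DELAYS = {"BW+RC": 100.0, "VA": 65.5, "SA": 77.7, "ST": 45.0}


def clock_period_no_borrowing(delays):
    """Without time borrowing, the slowest stage sets the clock period."""
    return max(delays.values())


def clock_period_time_borrowing(delays):
    """Assumed ideal time borrowing: slack is shared, so the period is the mean stage delay."""
    return sum(delays.values()) / len(delays)


t_max = clock_period_no_borrowing(STAGE_DELAYS)
t_avg = clock_period_time_borrowing(STAGE_DELAYS)
print(f"no borrowing: {t_max:.1f} tau, time borrowing: {t_avg:.2f} tau")
```

Under these assumptions the averaged period comes out at about 72.05τ, which matches the slide's Tclk = 72.1τ after rounding, against 100τ if the BW+RC stage alone set the clock.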

Page 10: NoC  Frequency Scaling with Flexible-Pipeline Routers

Pipeline Reconfiguration

• Flex Router: pipeline reconfiguration

Pipeline   Vdd     Stages (delay)                                            Tclk
4-stage    1.2 V   BW+RC (100 τ4), VA (65.5 τ4), SA (77.7 τ4), ST (45 τ4)    72.1 τ4
3-stage    1.0 V   BW+RC (100 τ3), VA (65.5 τ3), SA+ST (113.7 τ3)            93.1 τ3 = 102.1 τ4
2-stage    1.0 V   BW+RC (100 τ2), VA+SA+ST (170.2 τ2)                       135.1 τ2 = 148.7 τ4
1-stage    0.8 V   BW+RC+VA+SA+ST (270.2 τ1)                                 270.2 τ1 = 337.7 τ4

How much hardware overhead?
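
The clock periods on this slide can be reproduced from the merged-stage delays. The sketch below assumes that Tclk for each configuration is the mean of its stage delays (the time-borrowing rule from the previous slide) and that the subscript on τ denotes the inverter delay at that configuration's Vdd. Both are readings of the slide rather than statements made on it. The conversions to τ4 are copied from the slide, not derived.

```python
# Merged-stage delays per configuration, copied from the slide
# (each in units of tau at that configuration's supply voltage).
CONFIGS = {
    4: [100.0, 65.5, 77.7, 45.0],
    3: [100.0, 65.5, 113.7],
    2: [100.0, 170.2],
    1: [270.2],
}

# Clock periods expressed in tau_4, as printed on the slide.
TCLK_IN_TAU4 = {4: 72.1, 3: 102.1, 2: 148.7, 1: 337.7}


def tclk(delays):
    """Assumed rule: with time borrowing, Tclk is the mean stage delay."""
    return sum(delays) / len(delays)


def slowdown(n_stages):
    """Clock-period ratio of an n-stage configuration to the 4-stage baseline."""
    return TCLK_IN_TAU4[n_stages] / TCLK_IN_TAU4[4]


for n in sorted(CONFIGS, reverse=True):
    print(f"{n}-stage: Tclk = {tclk(CONFIGS[n]):.2f} tau_{n}, "
          f"slowdown vs 4-stage = {slowdown(n):.2f}x")
```

With the slide's numbers, the 2-stage configuration runs at roughly half the baseline frequency and the 1-stage configuration at somewhat less than a quarter of it. This appears to line up with the slowdown factors of 2 and 4 used in the experiments, although the slides do not state that mapping explicitly.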

Page 11: NoC  Frequency Scaling with Flexible-Pipeline Routers

Architecture Support

[Figure: router datapath; blocks: Flits in, Input Controller (with buffers), Route Computation, VC Allocator, Switch Allocator, Flits out; stages BW/RC, VA, SA, ST separated by pipeline registers (R)]

BW+RC VA SA ST
4-stage pipeline: R R R

Page 12: NoC  Frequency Scaling with Flexible-Pipeline Routers

Architecture Support

[Figure: router datapath; blocks: Flits in, Input Controller (with buffers), Route Computation, VA, SA, Flits out; each pipeline register (R) between the BW/RC, VA, SA and ST stages is paired with a MUX]

BW+RC VA SA ST
4-stage pipeline: R R R
3-stage pipeline: R R MUX
2-stage pipeline: R MUX MUX
1-stage pipeline: MUX MUX MUX

+ Control Logics

Less than 2% overhead in router area
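
The R / MUX patterns above can be related to the merged stages on the Pipeline Reconfiguration slide. The sketch below assumes that the i-th entry of each pattern controls the boundary between stage i and stage i+1, that "R" means the pipeline register is kept, and that "MUX" means it is bypassed. The slides list the patterns but do not spell out this mapping, so treat the names and the interpretation as hypothetical.

```python
STAGES = ["BW+RC", "VA", "SA", "ST"]

# Boundary settings per configuration, in the order listed on the slide.
# Assumed meaning: "R" keeps the register, "MUX" bypasses it.
BOUNDARIES = {
    4: ["R", "R", "R"],
    3: ["R", "R", "MUX"],
    2: ["R", "MUX", "MUX"],
    1: ["MUX", "MUX", "MUX"],
}


def merged_stages(boundaries):
    """Group the four logical stages into physical pipeline stages."""
    groups = [[STAGES[0]]]
    for setting, stage in zip(boundaries, STAGES[1:]):
        if setting == "R":
            groups.append([stage])      # register kept: start a new stage
        else:
            groups[-1].append(stage)    # register bypassed: merge into previous stage
    return ["+".join(group) for group in groups]


for n in sorted(BOUNDARIES, reverse=True):
    print(f"{n}-stage: {merged_stages(BOUNDARIES[n])}")
```

Under this reading, the four patterns yield the same stage groupings as the Pipeline Reconfiguration slide (for example, R R MUX gives BW+RC, VA, SA+ST).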

Page 13: NoC  Frequency Scaling with Flexible-Pipeline Routers

Outline

• Background/Motivation
• Router Design
• Experimental Results
• Related work
• Conclusion

Page 14: NoC  Frequency Scaling with Flexible-Pipeline Routers

Experimental Platform

• Simulator
– Full system simulator: GEMS
– Power module: Wattch & Orion2.0
– Infrastructure: 8 Core, 1 issue in-order

• Benchmarks
– From SPEC OMP2001, NU-Mine and PARSEC

[Figure: tile with core (C), L1, L2, router (R) and memory (MEM); label: 1.5 GHz]

Page 15: NoC  Frequency Scaling with Flexible-Pipeline Routers

Efficacy in Network Energy Saving

Base: Baseline Router
Base-2: VFS, Slowdown Factor of 2
Flex-2: VFS + Flexible-Pipeline Router

[Figure: Network Energy (Dynamic, Clock, Static) for Base, Base-2 and Flex-2 on ammp, art, blackscholes, equake, fkmeans, kmeans, Avg; y-axis 0 to 1.2; annotations: 41%, 2%]

• Dynamic energy decreases quadratically as voltage goes down
• Clock energy reduction is significant (65%)
• Changes in static energy are minimal
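
The quadratic dependence of dynamic energy on voltage can be illustrated with the supply voltages from the Pipeline Reconfiguration slide. This is a minimal sketch, assuming the standard model in which switching energy is proportional to C·Vdd² with the switched capacitance unchanged. It covers only the per-switching-event term and is not meant to reproduce the measured savings in the figure.

```python
def dynamic_energy_ratio(vdd_scaled, vdd_nominal):
    """Switching-energy ratio, assuming energy proportional to C * Vdd^2 with C fixed."""
    return (vdd_scaled / vdd_nominal) ** 2


NOMINAL_VDD = 1.2  # Vdd of the 4-stage configuration on the slides

for vdd in (1.2, 1.0, 0.8):
    ratio = dynamic_energy_ratio(vdd, NOMINAL_VDD)
    print(f"Vdd = {vdd:.1f} V: dynamic energy per switching event = {ratio:.2f}x nominal")
```

Under this model, dropping from 1.2 V to 1.0 V cuts switching energy to about 69% of nominal, and dropping to 0.8 V cuts it to about 44%.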

Page 16: NoC  Frequency Scaling with Flexible-Pipeline Routers

Efficacy in Execution Time

Base: Baseline Router
Base-2: VFS
Flex-2: VFS + Flexible-Pipeline Router

[Figure: Normalized Execution Time for Base, Base-2 and Flex-2 on ammp, art, blackscholes, equake, fkmeans, kmeans, G.M.; y-axis 0.8 to 1.2; annotation: 1.5%]

Workload       L1 data cache (misses/K instructions)   L2 cache (misses/K instructions)
ammp           13.7                                     4.4
art            40.8                                     18.1
blackscholes   8.1                                      0.9
equake         2.8                                      2.6
fkmeans        1.9                                      1.7
kmeans         2.4                                      1.9

Average system performance degradation is reduced

[Figure: traffic-pattern quadrant; axes: Latency (Sensitive / Insensitive) and Throughput (High / Low)]

Page 17: NoC  Frequency Scaling with Flexible-Pipeline Routers

System-Level Evaluation

• System-level ED2 Product (E × D²)
– Cores, caches and the interconnection networks
– E: System Energy
– D: System Delay

[Figure: Network Energy vs. Network Delay tradeoff, feeding into System Energy and System Delay]
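
Because delay enters the ED2 product squared, a small slowdown can cancel an energy saving. The sketch below shows the arithmetic. The energy and delay values are hypothetical normalized numbers chosen for illustration and are not taken from the slides.

```python
def ed2(energy, delay):
    """Energy-delay-squared product: E * D^2."""
    return energy * delay ** 2


def normalized_ed2(energy, delay, base_energy=1.0, base_delay=1.0):
    """ED2 relative to a baseline; values below 1.0 are an improvement."""
    return ed2(energy, delay) / ed2(base_energy, base_delay)


# Hypothetical cases: the same 5% system energy saving with two different slowdowns.
small_slowdown = normalized_ed2(energy=0.95, delay=1.02)
large_slowdown = normalized_ed2(energy=0.95, delay=1.05)
print(f"2% slower: ED2 = {small_slowdown:.3f}")
print(f"5% slower: ED2 = {large_slowdown:.3f}")
```

In this made-up example, a 5% energy saving still improves ED2 at a 2% slowdown but makes it worse at a 5% slowdown.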

Page 18: NoC  Frequency Scaling with Flexible-Pipeline Routers

Efficacy in System ED2 Product

Base: Baseline Router
Base-2: VFS
Flex-2: VFS + Flexible-Pipeline Router

[Figure: System ED2 for Base, Base-2 and Flex-2 on ammp, art, blackscholes, equake, fkmeans, kmeans, G.M.; y-axis 0.8 to 1.5; annotation: ED2 increase]

Frequency tuning should be based on workloads

Page 19: NoC  Frequency Scaling with Flexible-Pipeline Routers

More Aggressive VFS: Network Energy Saving

Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4

[Figure: Network Energy (Dynamic, Clock, Static) for Base, Flex-2 and Flex-4 on ammp, art, blackscholes, equake, fkmeans, kmeans, Avg; y-axis 0 to 1.2; annotations: 43%, 39%]

Flexible-Pipeline Router is scalable in reducing network energy

Page 20: NoC  Frequency Scaling with Flexible-Pipeline Routers

More Aggressive VFS: Execution Time

Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4

[Figure: Normalized Execution Time for Base, Flex-2 and Flex-4 on ammp, art, blackscholes, equake, fkmeans, kmeans, G.M.; y-axis 0.8 to 1.1]

Performance degradation is increasing

Page 21: NoC  Frequency Scaling with Flexible-Pipeline Routers

Limits of VFS: System ED2 Product

Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4

[Figure: System ED2 for Base, Flex-2 and Flex-4 on ammp, art, blackscholes, equake, fkmeans, kmeans, G.M.; y-axis 0.8 to 1.2]

• Diminishing returns when pushing the frequency scaling limit
• Workload-dependent

Page 22: NoC  Frequency Scaling with Flexible-Pipeline Routers

Related Works

• “A case for dynamic frequency tuning in on-chip networks” [Mishra `09]
Dynamic router VFS for reducing network power consumption
– Flexible-pipeline routers enable more drastic scaling

• “A variable-pipeline on-chip router optimized to traffic pattern” [Hirata `10]
Dynamic router VFS + variable-pipeline routers
– Flexible-pipeline routers have lower hardware overhead
– Our work presents system-level evaluation

Page 23: NoC  Frequency Scaling with Flexible-Pipeline Routers

Conclusions

Flexible-Pipeline Router
• Minimal hardware overhead
• Enables aggressive VFS

System Level Implications
• Considerable energy saving
• Negligible performance degradation

[Figure labels: Network, Energy, Performance]

Page 24: NoC  Frequency Scaling with Flexible-Pipeline Routers

Thank you!


Q & A

Page 25: NoC  Frequency Scaling with Flexible-Pipeline Routers

Router Delay Model*

• Router stage delay: $T_{stage} = t_i + h$

ti: sequential logic latency
h: setup delay
τ: inverter delay

p: # of input/output ports
c: # of message classes
v: # of VCs/message class
ω: flit size in bits

Stage   ti           h
BW/RC   constant     0
VA      f(p, v)      9 τ
SA      f(p, c, v)   9 τ
ST      f(p, ω)      0

[Figure: baseline router; blocks: Input ports, Input Controller (BW/RC) with buffers (MC 1, VC 1 ... MC n, VC 1), Route Computation, VC Allocator (VA), Switch Allocator (SA), Crossbar Switch (ST), Output ports]

*This model is presented in [Peh et al., HPCA 2001].
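
The setup delays in this table offer one way to account for the merged-stage delays on the Pipeline Reconfiguration slide. The sketch below assumes that the per-stage delays on the Pipeline Stage Delay slide already include h, and that merging stages drops the setup delay of every stage except the last in the group. This is an interpretation that happens to reproduce the slide's numbers. It is not stated in the slides, and the function name is hypothetical.

```python
# Per-stage total delay (assumed to be t_i + h) from the Pipeline Stage Delay slide, in tau.
STAGE_TOTAL = {"BW+RC": 100.0, "VA": 65.5, "SA": 77.7, "ST": 45.0}

# Setup delay h per stage from the table above, in tau.
STAGE_SETUP = {"BW+RC": 0.0, "VA": 9.0, "SA": 9.0, "ST": 0.0}


def merged_delay(stages):
    """Delay of consecutive stages merged into one.

    Assumption: the setup delay of every stage except the last is dropped,
    because no register sits between the merged stages.
    """
    total = sum(STAGE_TOTAL[s] for s in stages)
    dropped_setup = sum(STAGE_SETUP[s] for s in stages[:-1])
    return total - dropped_setup


print(f"SA+ST:          {merged_delay(['SA', 'ST']):.1f} tau")
print(f"VA+SA+ST:       {merged_delay(['VA', 'SA', 'ST']):.1f} tau")
print(f"BW+RC+VA+SA+ST: {merged_delay(['BW+RC', 'VA', 'SA', 'ST']):.1f} tau")
```

Under these assumptions the three merged delays come out at 113.7τ, 170.2τ and 270.2τ, the same values shown for the 3-, 2- and 1-stage pipelines.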

Page 26: NoC  Frequency Scaling with Flexible-Pipeline Routers

System Energy Breakdown

[Figure: system energy split into Network and Core+Cache for Base, Base-2, Flex-2 and Flex-4 on ammp, art, blackscholes, equake, fkmeans, kmeans; y-axis 0 to 1.2]

