
Microarchitectural Wire Management for Performance and Power in Partitioned Architectures

Rajeev Balasubramonian, Naveen Muralimanohar, Karthik Ramani, Venkatanand Venkatachalapathy

University of Utah, February 14th 2005


Overview/Motivation

Wire delays are costly for performance and power

Latencies of 30 cycles to reach the ends of a chip

50% of dynamic power is in interconnect switching (Magen et al., SLIP 04)

Abundant number of metal layers


Wire Characteristics

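Wire resistance and capacitance per unit length

$$C_{wire} = \epsilon_0 \left( 2K\,\epsilon_{horiz}\,\frac{thickness}{spacing} + 2\,\epsilon_{vert}\,\frac{width}{layerspacing} \right) + fringe(\epsilon_{horiz}, \epsilon_{vert})$$

$$R_{wire} = \frac{\rho}{(thickness - barrier)(width - 2\,barrier)}$$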

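Increasing width and spacing lowers delay (as delay ∝ RC) but also lowers bandwidth

                      Resistance   Capacitance   Bandwidth
Increasing width      decreases    increases     decreases
Increasing spacing    unchanged    decreases     decreases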
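The width and spacing trade-off on this slide can be checked numerically. The sketch below is only an illustration: the geometry, the resistivity, the dielectric constants, the coupling factor K, and the constant fringe term are all assumed placeholder values, not figures from the talk, and the fringe term is simplified to a constant rather than a function of the two dielectric constants.

```python
from dataclasses import dataclass, replace

EPS0 = 8.854e-12  # vacuum permittivity in F/m


@dataclass(frozen=True)
class WireGeometry:
    """Wire cross-section in metres. All values used below are assumed."""
    width: float
    spacing: float
    thickness: float
    layer_spacing: float
    barrier: float


def resistance_per_length(g, rho=2.2e-8):
    """R_wire = rho / ((thickness - barrier) * (width - 2*barrier)), in ohm/m."""
    return rho / ((g.thickness - g.barrier) * (g.width - 2 * g.barrier))


def capacitance_per_length(g, k=1.5, eps_horiz=2.7, eps_vert=2.7, fringe=4e-11):
    """C_wire from the slide, with the fringe term held constant (assumption)."""
    coupling = 2 * k * eps_horiz * g.thickness / g.spacing
    layers = 2 * eps_vert * g.width / g.layer_spacing
    return EPS0 * (coupling + layers) + fringe


def wires_per_metre(g):
    """Bandwidth proxy: how many wires fit in one metre of track width."""
    return 1.0 / (g.width + g.spacing)


# Hypothetical baseline wire and a variant with doubled width and spacing.
base = WireGeometry(width=0.2e-6, spacing=0.2e-6, thickness=0.4e-6,
                    layer_spacing=0.4e-6, barrier=0.01e-6)
wide = replace(base, width=2 * base.width, spacing=2 * base.spacing)

for name, g in (("base", base), ("wide", wide)):
    r = resistance_per_length(g)
    c = capacitance_per_length(g)
    print(f"{name}: R={r:.3e} ohm/m  C={c:.3e} F/m  RC={r * c:.3e}  "
          f"wires/m={wires_per_metre(g):.3e}")
```

With these assumed numbers the wider, more widely spaced wire has a lower RC product and half the wire density, which is the qualitative trade-off the slide describes.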


Design Space Exploration

Tuning wire width and spacing

[Figure: B wires (d) compared with L wires (2d), with labels for resistance, capacitance, and bandwidth]


Transmission Lines

Allow extremely low delay

High implementation complexity and overhead!

Large width

Large spacing between wires

Design of sensing circuit

Shielding power and ground lines adjacent to each line

Implemented in test CMOS chips

Not employed in this study


Design Space Exploration

Tuning repeater size and spacing

Traditional wires: large repeaters, optimum spacing

Power-optimal wires: smaller repeaters, increased spacing

[Figure: delay and power for the two repeater configurations]


Design Space Exploration

Base case: B wires

Bandwidth optimized: W wires

Power optimized: P wires

Power and bandwidth optimized: PW wires

Fast, low bandwidth: L wires


Outline

Overview

Wire Design Space Exploration

Employing L wires for Performance

PW wires: The Power Optimizers

Results

Conclusions


Evaluation Platform

L1 DCache Cluster

Centralized front-end

I-Cache & D-Cache

LSQ

Branch Predictor

Clustered back-end


Cache Pipeline

[Figure: a functional unit sends the effective address to the LSQ and L1 DCache]

Baseline pipeline:

Effective address transfer: 10 cycles

Memory dependence resolution: 5 cycles

Cache access: 5 cycles

Data return at cycle 20

Pipeline with an early 8-bit transfer:

8-bit transfer: 5 cycles

Effective address transfer: 10 cycles

Partial memory dependence resolution: 3 cycles

Cache access: 5 cycles

Data return at cycle 14


L wires: Accelerating cache access

Transmit LSB bits of effective address through L wires

Faster memory disambiguation: partial comparison of loads and stores in the LSQ; introduces false dependences (< 9%)

Indexing data and tag RAM arrays: LSB bits can prefetch data out of L1$; reduces access latency of loads
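The partial comparison described above can be sketched as follows. This is an illustration only: the 8-bit width is taken from the "8-bit transfer" on the cache pipeline slide, while the function names and the simplification of comparing whole addresses for equality (ignoring access size and alignment) are assumptions made here.

```python
PARTIAL_BITS = 8  # low-order address bits sent early (from the 8-bit transfer)


def partial_match(load_addr, store_addr, bits=PARTIAL_BITS):
    """Compare only the least significant `bits` bits of two addresses."""
    mask = (1 << bits) - 1
    return (load_addr & mask) == (store_addr & mask)


def classify(load_addr, store_addr, bits=PARTIAL_BITS):
    """Classify a load/store pair the way a partial LSQ check would see it.

    'independent'      low bits differ, so the load can proceed early
    'true dependence'  low bits match and the full addresses match
    'false dependence' low bits match but the full addresses differ
    """
    if not partial_match(load_addr, store_addr, bits):
        return "independent"
    if load_addr == store_addr:
        return "true dependence"
    return "false dependence"


print(classify(0x1234, 0x5678))  # low bytes 0x34 vs 0x78
print(classify(0x1234, 0x1234))  # identical addresses
print(classify(0x1234, 0x5634))  # same low byte, different upper bits
```

A mismatch on the low bits is conclusive, so only pairs that match partially need to wait for the full address; the false dependence case is the cost the slide quantifies as under 9%.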


L wires: Narrow Bit Width Operands

PowerPC: data bit-width determines FU latency

Transfer of 10 bit integers on L wires

Can introduce scheduling difficulties

A predictor table of saturating counters

Accuracy of 98%

Reduction in branch mispredict penalty
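A predictor table of saturating counters for narrow operands might look like the sketch below. The slide gives only the 10-bit operand width and the use of saturating counters; the table size, the 2-bit counter width, the indexing by program counter, and the treatment of values as non-negative integers are assumptions for illustration.

```python
NARROW_BITS = 10   # operand width carried on L wires (from the slide)
TABLE_SIZE = 1024  # assumed number of predictor entries
COUNTER_MAX = 3    # assumed 2-bit saturating counters


def is_narrow(value, bits=NARROW_BITS):
    """True if a non-negative integer fits in `bits` bits."""
    return 0 <= value < (1 << bits)


class NarrowWidthPredictor:
    """Predicts whether an instruction's result will be a narrow value."""

    def __init__(self, size=TABLE_SIZE):
        self.counters = [0] * size

    def _index(self, pc):
        return pc % len(self.counters)

    def predict(self, pc):
        """Predict narrow when the counter is in its upper half."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, value):
        """Train on the actual result: count up if narrow, down otherwise."""
        i = self._index(pc)
        if is_narrow(value):
            self.counters[i] = min(COUNTER_MAX, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = NarrowWidthPredictor()
for result in (5, 17, 900):
    predictor.update(0x40, result)
print(predictor.predict(0x40))
```

A prediction lets the scheduler plan for the shorter L-wire latency before the value is known, which is where the scheduling difficulty mentioned on the slide comes from.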


Power Efficient Wires

Base case: B wires

Power and bandwidth optimized: PW wires

Idea: steer non-critical data through the energy-efficient PW interconnect


PW wires: Power/Bandwidth Efficient

Ready register operands: transfer of data at instruction dispatch; transfer of input operands to remote register file; covered by long dispatch-to-issue latency

Store data: could stall commit process; delay dependent loads

[Figure: rename and dispatch stage feeding four clusters, each with an issue queue (IQ), register file, and functional unit (FU). The operand is ready at cycle 90; the consumer instruction is dispatched at cycle 100.]
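The cycle 90 and cycle 100 example suggests a simple slack test. The sketch below is a hypothetical steering rule, not the mechanism from the talk (which selects transfers by category, such as ready register operands and store data); the latencies are the crossbar values listed on the Evaluation Methodology slide, and the function name is invented here.

```python
B_LATENCY = 2   # cycles on B wires (crossbar, Evaluation Methodology slide)
PW_LATENCY = 3  # cycles on PW wires (crossbar, Evaluation Methodology slide)


def choose_wire(ready_cycle, needed_cycle):
    """Pick PW wires when the slower transfer still arrives by needed_cycle."""
    if ready_cycle + PW_LATENCY <= needed_cycle:
        return "PW"
    return "B"


# Operand ready at cycle 90, consumer dispatched at cycle 100: ten cycles of
# slack hide the extra PW latency, so the transfer is non-critical.
print(choose_wire(90, 100))
# Operand ready one cycle before it is needed: the slack is too small.
print(choose_wire(99, 100))
```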


Outline

Overview

Wire Design Space Exploration

Employing L wires for Performance

PW wires: The Power Optimizers

Results

Conclusions


Evaluation Methodology

L1 DCache

B wires (2 cycles)

L wires (1 cycle)

PW wires (3 cycles)

Cluster

SimpleScalar-3.0 augmented to simulate a dynamically scheduled 4-cluster model

Crossbar interconnects (L, B and PW wires)


Heterogeneous Interconnects

Intercluster global interconnect

72 B wires (64 data bits and 8 control bits): repeaters sized and spaced for optimum delay

18 L wires: wide wires and large spacing; occupy more area; low latencies

144 PW wires: poor delay; high bandwidth; low power
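The wire counts on this slide are consistent with each wire class occupying the same metal area. The sketch below assumes per-wire area ratios of B = 1, L = 4, and PW = 0.5; these ratios are not stated in the talk and are inferred here from the relative metal area column of the four-cluster results table.

```python
# Per-wire metal area relative to one B wire. Inferred, not stated in the talk.
REL_AREA = {"B": 1.0, "L": 4.0, "PW": 0.5}


def metal_area(link, baseline_b_wires=144):
    """Metal area of a link relative to a baseline of `baseline_b_wires` B wires.

    `link` maps a wire class to its wire count, e.g. {"PW": 144, "L": 36}.
    """
    total = sum(REL_AREA[kind] * count for kind, count in link.items())
    return total / (REL_AREA["B"] * baseline_b_wires)


# The three wire sets on this slide occupy equal area under these ratios.
print(72 * REL_AREA["B"], 18 * REL_AREA["L"], 144 * REL_AREA["PW"])

# Link configurations from the four-cluster results table.
for link in ({"B": 144}, {"PW": 288}, {"PW": 144, "L": 36},
             {"B": 144, "L": 36}, {"B": 288}, {"PW": 288, "L": 36}):
    print(link, metal_area(link))
```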


Analytical Model

$$C = C_a + W_s C_b + \frac{C_c}{S}$$

The three terms are: (1) $C_a$, fringing capacitance; (2) $W_s C_b$, capacitance between adjacent metal layers; (3) $C_c / S$, capacitance between adjacent wires

RC model of the wire

Total power = short-circuit power + switching power + leakage power


Evaluation methodology

[Figure: I-Cache, D-cache, LSQ, clusters, crossbar, and ring interconnect]

SimpleScalar-3.0 augmented to simulate a dynamically scheduled 16-cluster model

Ring latencies: B wires (4 cycles), PW wires (6 cycles), L wires (2 cycles)


IPC improvements: L wires

L wires improve performance by 4.2% on a four-cluster system and 7.1% on a sixteen-cluster system

[Figure: IPC per benchmark (ammp, applu, apsi, art, bzip2, crafty, eon, equake, fma3d, galgel, gap, gcc, gzip, lucas, mcf, mesa, mgrid, parser, swim, twolf, vortex, vpr, wupwise, and AM), y-axis from 0 to 2.5. Series: "Baseline: 144 B-Wires" and "Low-latency optimizations: 144 B-Wires and 36 L-Wires".]


Four Cluster System: ED2 Improvements

Link            Relative     IPC    Relative processor   Relative ED2   Relative ED2
                metal area          energy (10%)         (10%)          (20%)
144 B           1.0          0.95   100                  100            100
288 PW          1.0          0.92   97                   103.4          100.2
288 PW, 36 L    2.0          0.97   99                   94.4           93.2
144 B, 36 L     2.0          0.99   101                  93.3           94.5
288 B           2.0          0.98   103                  96.6           99.2
144 PW, 36 L    1.5          0.96   97                   95.0           92.1
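The ED2 columns can be approximately reproduced from the energy and IPC columns. The sketch below assumes that execution time scales as 1/IPC (same instruction count and clock), so that relative ED2 equals relative energy times the squared ratio of baseline IPC to IPC; the talk does not spell this relation out, and the function name is hypothetical.

```python
def relative_ed2(rel_energy, ipc, baseline_ipc):
    """Relative energy-delay-squared, assuming delay is proportional to 1/IPC.

    rel_energy is processor energy relative to the baseline (baseline = 100).
    """
    return rel_energy * (baseline_ipc / ipc) ** 2


BASELINE_IPC = 0.95  # 144 B wires, four-cluster system

# (link, relative processor energy at 10%, IPC) from the table above.
rows = [
    ("288 PW", 97, 0.92),
    ("144 PW, 36 L", 97, 0.96),
    ("288 B", 103, 0.98),
]
for link, energy, ipc in rows:
    print(f"{link}: {relative_ed2(energy, ipc, BASELINE_IPC):.1f}")
```

Small differences from the tabulated values are expected, because the IPC and energy entries in the table are themselves rounded.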


Sixteen Cluster system: ED2 gains

Link            IPC    Relative processor   Relative ED2
                       energy (20%)         (20%)
144 B           1.11   100                  100
144 PW, 36 L    1.05   94                   105.3
144 B, 36 L     1.19   102                  88.7
288 B, 36 L     1.22   107                  88.7
288 B           1.18   105                  93.1


Conclusions

Exposing the wire design space to the architecture

A case for micro-architectural wire management!

A low latency low bandwidth network alone helps improve performance by up to 7%

ED2 improvements of about 11% compared to a baseline processor with homogeneous interconnect

Entails hardware complexity


Future work

3-D wire model for the interconnects

Design of heterogeneous clusters

Interconnects for cache coherence and L2$


Questions and Comments?

Thank you!

