Dynamic Zero Compression for Cache Energy Reduction

Date post: 08-Jan-2016
Dynamic Zero Compression for Cache Energy Reduction Luis Villa Michael Zhang Krste Asanovic {luisv|rzhang|krste}@lcs.mit.edu
Transcript
Page 1: Dynamic Zero Compression for Cache Energy Reduction

Dynamic Zero Compression for Cache Energy Reduction

Luis Villa

Michael Zhang

Krste Asanovic

{luisv|rzhang|krste}@lcs.mit.edu

Page 2: Dynamic Zero Compression for Cache Energy Reduction

Conventional Cache Structure

Energy dissipation:
- Bitlines (~75%)
- Decoders
- I/O drivers
- Wordlines

[Figure: conventional cache structure with address decoder (addr), wordline (wl), differential bitlines (bit, bit_b), I/O, and BUS.]

Page 3: Dynamic Zero Compression for Cache Energy Reduction

Existing Energy Reduction Techniques

- Sub-banking
- Hierarchical bitlines
- Low-swing bitlines (only for reads; writes are performed at full swing)
- Wordline gating

[Figure: sub-banked cache with address decoder (addr), gwl, two sub-banks each with lwl, offset decoder (offset), SRAM cells, and sense amps, plus I/O and BUS; widths labeled 32 and 128.]

Page 4: Dynamic Zero Compression for Cache Energy Reduction

Asymmetry of Bits in Cache

More than 70% of the bits in D-cache accesses are "0"s
- Measured from SPECint95 and MediaBench
- Examples: small values, data types

Differential bitlines are preferred in large SRAM designs
- Better noise immunity
- Faster sensing

Related work with single-ended bitlines
- [Tseng and Asanovic '00]: used in register file design with single-ended bitlines.
- [Chang et al. '99]: used in ROM and small RAM with single-ended bitlines.
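
As a rough illustration of how such a zero-bit asymmetry could be measured, here is a minimal Python sketch. The function name `zero_stats` and the sample buffer are hypothetical; the slide's figure comes from SPECint95 and MediaBench traces, not from this toy data.

```python
def zero_stats(data):
    """Return (fraction of zero bits, fraction of zero bytes) in a bytes object."""
    if not data:
        raise ValueError("data must be non-empty")
    total_bits = 8 * len(data)
    one_bits = sum(bin(b).count("1") for b in data)
    zero_bytes = sum(1 for b in data if b == 0)
    return (total_bits - one_bits) / total_bits, zero_bytes / len(data)


# Hypothetical sample: four small integers stored as 32-bit little-endian words.
sample = b"".join(v.to_bytes(4, "little") for v in (1, 2, 3, 255))
zero_bit_frac, zero_byte_frac = zero_stats(sample)
```

Small values stored in full-width words leave most upper bytes at zero, which is the kind of data the slide lists as an example.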

Page 5: Dynamic Zero Compression for Cache Energy Reduction

Dynamic Zero Compression

Zero Indicator Bit (ZIB)
- One bit per grouping of bits
- Set if bits are zeros
- Controls wordline gating

[Figure: address-controlled gating (offset decoder, "off dec", drives lwl) next to data-controlled gating (ZIB drives lwl); both show address decoder (addr), SRAM cells, sense amps, I/O, and BUS.]
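
The zero indicator bit can be sketched in a few lines of Python. This is a behavioral model only, assuming a 32-bit word and byte-sized groups by default; the function names are hypothetical and nothing here describes the actual circuit.

```python
def zero_indicator_bits(word, width=32, group=8):
    """Return one ZIB per group, least-significant group first (1 = all zeros)."""
    if width % group != 0:
        raise ValueError("width must be a multiple of group")
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    mask = (1 << group) - 1
    return [int(((word >> shift) & mask) == 0) for shift in range(0, width, group)]


def local_wordline_enables(zibs):
    """Data-controlled gating: a group's local wordline fires only if its ZIB is clear."""
    return [1 - z for z in zibs]


zibs = zero_indicator_bits(0x00001200)
enables = local_wordline_enables(zibs)
```

For the word `0x00001200`, only the second byte is non-zero, so only that byte's local wordline is enabled.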

Page 6: Dynamic Zero Compression for Cache Energy Reduction

Data Cache Bitline Swing Reduction

[Chart: % bitline swing reduction (y-axis, -10 to 50) per benchmark: comp, li, ijpeg, go, vortex, m88k, gcc, perl, adpcm_en, adpcm_de, epic, unepic, g721_en, g721_de, mpeg_en, mpeg_de, pegwit_en, pegwit_de, Avg. Series: word, half-word, byte, half-byte.]

Calculation includes the bitline swings introduced by ZIB
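
The trade-off between grouping sizes can be explored with a simple counting model. The sketch below is an assumption, not the slide's exact accounting: it charges one bitline swing per bit read, always reads every ZIB, and reads a group's data bits only when the group is non-zero. The function name and the sample words are hypothetical.

```python
def swing_reduction(words, width=32, group=8):
    """Percent reduction in bits read versus reading every data bit."""
    groups_per_word = width // group
    mask = (1 << group) - 1
    baseline = 0
    with_dzc = 0
    for w in words:
        baseline += width
        with_dzc += groups_per_word          # every ZIB is always read
        for shift in range(0, width, group):
            if (w >> shift) & mask:
                with_dzc += group            # non-zero group: its data bits are read
    return 100.0 * (baseline - with_dzc) / baseline


words = [0x00000001, 0x00000000, 0x12345678, 0x0000FFFF]
results = {g: swing_reduction(words, group=g) for g in (32, 16, 8, 4)}
worst_case = swing_reduction([0xFFFFFFFF], group=8)
```

Finer groups catch more zeros but add more ZIB reads, and data with no zero groups comes out worse than the baseline, which is consistent with the chart's axis extending below zero.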

Page 7: Dynamic Zero Compression for Cache Energy Reduction

Hardware Modifications

Zero Indicator Bit

Wordline Gating Circuitry

Sense Amplifier

CPU Store Driver

Cache Output Driver

Page 8: Dynamic Zero Compression for Cache Energy Reduction

ZIB and Wordline Gating Circuitry

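[Figure: ZIB cell and wordline gating circuitry with signals LWL, BWL, Bit, Bit_b, ZIB_b, and W_EN; cache block diagram with address decoder (addr), bwl, SRAM cells, sense amplifiers, ZIB, I/O, and BUS; wordline capacitances labeled Cwl and Cwl/4.]

Small delay overhead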

Page 9: Dynamic Zero Compression for Cache Energy Reduction

Sense Amplifier Modification

[Figure: data-bit sense amp and modified ZIB sense amp with signals Bit, Bit_b, ZIB, ZIB_b, sense, and zero; cache block diagram with address decoder (addr), bwl, SRAM cells, sense amplifiers, ZIB, I/O, and BUS.]

- Zero-valued data: not driven onto bus
- Not in critical path
- ZIB read w/o delay

Page 10: Dynamic Zero Compression for Cache Energy Reduction

CPU Store and Cache Output Drivers

[Figure: CPU store driver (write data, ZIB, W_EN, LWL) and cache output driver (data bits, ZIB, "To WLG") connected by a low-swing bus; data paths labeled 8 bits wide.]

Reduce data bus energy dissipation
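
A functional model may help show how the store and output drivers interact with the ZIB. The Python sketch below assumes per-byte ZIBs on a 32-bit word and assumes that the write enable suppresses data-cell writes for zero bytes; the class and its counters are hypothetical and stand in for cell accesses, not measured energy.

```python
class DzcWordStore:
    """Behavioral model of one 32-bit cache word with one ZIB per byte."""

    def __init__(self):
        self.data = [0, 0, 0, 0]   # byte cells, least-significant byte first
        self.zib = [1, 1, 1, 1]    # 1 means "this byte is zero"
        self.cell_writes = 0       # data-byte writes actually performed
        self.cell_reads = 0        # data-byte reads actually performed

    def store(self, word):
        for i in range(4):
            byte = (word >> (8 * i)) & 0xFF
            self.zib[i] = int(byte == 0)   # zero check at the store driver
            if byte:                       # assumed W_EN behavior: skip zero bytes
                self.data[i] = byte
                self.cell_writes += 1

    def load(self):
        word = 0
        for i in range(4):
            if not self.zib[i]:            # gated bytes are neither read nor driven
                word |= self.data[i] << (8 * i)
                self.cell_reads += 1
        return word


cell = DzcWordStore()
cell.store(0xAABBCCDD)
cell.store(0x0000CC00)
loaded = cell.load()
```

Stale data left in a gated byte is harmless in this model because the ZIB alone decides whether the byte reads as zero.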

Page 11: Dynamic Zero Compression for Cache Energy Reduction

Area Overhead

Area overhead: 9%
- Zero indicator bits
- Sense amplifiers
- WLG circuitry
- I/O circuitry

[Figure: byte slice of the sub-bank (Data, ZIB, WLG).]

Page 12: Dynamic Zero Compression for Cache Energy Reduction

Delay Overhead

No delay overhead for writes
- Zero check performed in parallel with tag check

2 FO4 gate-delays for reads
- A pessimistic 7% worst-case delay

[Figure: read path through ZIB, LWL, data bits, and low-swing bus.]
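
The two read-delay figures can be related with one line of arithmetic. The sketch below assumes the 7% is measured against the baseline read access time, which the slide does not state; the variable names are hypothetical.

```python
overhead_gate_delays = 2.0   # added read delay, from the slide
overhead_fraction = 0.07     # worst-case relative delay, from the slide

# Baseline read time implied by the two figures, in the same gate-delay units.
implied_baseline = overhead_gate_delays / overhead_fraction
```

Under that assumption the baseline read path works out to roughly 29 gate delays.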

Page 13: Dynamic Zero Compression for Cache Energy Reduction

Data Cache Energy Savings

[Chart: % of energy savings (y-axis, 0 to 45) per benchmark: comp, li, ijpeg, go, vortex, m88k, gcc, perl, adpcm_en, adpcm_de, epic, unepic, g721_en, g721_de, mpeg_en, mpeg_de, pegwit_en, pegwit_de, Avg.]

Savings obtained for a low-power cache with sub-banking, wordline gating, and low-swing bitlines

Page 14: Dynamic Zero Compression for Cache Energy Reduction

Bits Distribution for Instruction Cache

Zeros are not as prevalent in I-cache.
- Use a recoding scheme to increase the number of zero bytes in I-cache.

[Panich '99]: IWLG technique that compacts the instructions.
- Use two-address form when src reg = dest reg
- Shorter immediates
- Three different instruction lengths: short, medium, long
- Gate the unused portion of the instruction to avoid bitline swing
- Faster read-out for top two bytes (opcode, reg. acc., interlocks)

[Figure: optimal instruction format with fields of 16, 7, and 9 bits; lwl gated at the short/medium (s/m) and medium/long (m/l) boundaries.]

Page 15: Dynamic Zero Compression for Cache Energy Reduction

IWLG to Dynamic Zero Compression

Adopting IWLG technique for Dynamic Zero Compression
- Small modification on instruction format
- Use 8-8-8-8 instead of 16-7-9
- Upper two bytes are zero-detected
- Lower two bytes are usage-detected
- Able to eliminate bitline swings of zero-valued bytes in the 2 upper bytes. Example: Opcode 000000

Slower than IWLG due to wordline gating in the critical path

[Figure: 8-8-8-8 instruction format with s/m, m/l, and two zero-indicator (0?) bits controlling lwl.]
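
The 8-8-8-8 scheme can be sketched as a per-byte read decision. The Python model below assumes that short, medium, and long instructions occupy 2, 3, and 4 bytes, and that the byte order is most-significant first; both are illustrative guesses, and `bytes_read` and `LENGTH_BYTES` are hypothetical names.

```python
# Assumed mapping from instruction length class to bytes used.
LENGTH_BYTES = {"short": 2, "medium": 3, "long": 4}


def bytes_read(instr, length):
    """Return per-byte read flags for a 32-bit instruction, top byte first."""
    used = LENGTH_BYTES[length]
    fields = [(instr >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    flags = []
    for i, byte in enumerate(fields):
        if i < 2:
            flags.append(byte != 0)   # upper two bytes: zero-detected
        else:
            flags.append(i < used)    # lower two bytes: usage-detected
    return flags


short_flags = bytes_read(0x00A30000, "short")
medium_flags = bytes_read(0x00115500, "medium")
long_flags = bytes_read(0x12345678, "long")
```

In this model a zero top byte is skipped even in a long instruction, which is the extra saving zero detection adds on top of length-based gating.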

Page 16: Dynamic Zero Compression for Cache Energy Reduction

Instruction Cache Bit Swing Reduction

[Chart: % reduction of bitline swings (y-axis, 0 to 35). Series: byte w/o recoding, byte w/ recoding, IWLG.]

Page 17: Dynamic Zero Compression for Cache Energy Reduction

Instruction Cache Energy Savings

[Chart: % energy savings (y-axis, 0 to 25). Series: byte w/o recoding, byte w/ recoding, IWLG.]

Page 18: Dynamic Zero Compression for Cache Energy Reduction

Conclusion

A novel hardware technique to reduce cache energy by eliminating the access of zero bytes.

Small area and delay overhead
- Area: 9%, Delay: 2 FO4 gate-delays

Average energy saving
- D-cache: 26%, I-cache: 18%
- Processor-wide: ~10% for typical embedded processors

Completely orthogonal to existing energy reduction techniques

Dynamic Zero Compression is applicable to
- Second-level caches
- DRAM
- Datapath [Canal et al. Micro-33]

Page 19: Dynamic Zero Compression for Cache Energy Reduction

Thank You!

http://www.cag.lcs.mit.edu/scale/

