Page 1: Adaptive Cache Compression for High-Performance Processors Alaa Alameldeen and David Wood University of Wisconsin-Madison Wisconsin Multifacet Project.

Adaptive Cache Compression for High-Performance Processors

Alaa Alameldeen and David Wood

University of Wisconsin-Madison

Wisconsin Multifacet Project

http://www.cs.wisc.edu/multifacet

Page 2: Overview

ISCA 2004, Alaa Alameldeen – Adaptive Cache Compression

Design of high-performance processors
  Processor speed improves faster than memory
  Memory latency dominates performance
  Need more effective cache designs

On-chip cache compression
  + Increases effective cache size
  - Increases cache hit latency

Does cache compression help or hurt?

Page 3: Does Cache Compression Help or Hurt?

[Figure: empty bar chart; y-axis: Normalized Runtime, 0 to 1.2]

Page 4: Does Cache Compression Help or Hurt?

[Figure: bar chart; y-axis: Normalized Runtime, 0 to 1.2; benchmark: apache; bars: No Compression, Compression]

Page 5: Does Cache Compression Help or Hurt?

[Figure: bar chart; y-axis: Normalized Runtime, 0 to 1.2; benchmarks: apache, ammp; bars: No Compression, Compression]

Page 6: Does Cache Compression Help or Hurt?

[Figure: bar chart; y-axis: Normalized Runtime, 0 to 1.2; benchmarks: apache, ammp; bars: No Compression, Compression, Adaptive]

Adaptive Compression determines when compression is beneficial

Page 7: Outline

Motivation

Cache Compression Framework
  Compressed Cache Hierarchy
  Decoupled Variable-Segment Cache

Adaptive Compression

Evaluation

Conclusions

Page 8: Compressed Cache Hierarchy

[Figure: block diagram of the cache hierarchy. Labels: Instruction Fetcher, Load-Store Queue, L1 I-Cache (Uncompressed), L1 D-Cache (Uncompressed), L1 Victim Cache, L2 Cache (Compressed), Compression Pipeline, Decompression Pipeline, Uncompressed Line Bypass, From Memory, To Memory]

Page 9: Decoupled Variable-Segment Cache

Objective: pack more lines into the same space

[Figure: one cache set. Tag Area: Address A, Address B. Data Area.]

2-way set-associative with 64-byte lines

Tag contains address tag, permissions, LRU (replacement) bits

Page 10: Decoupled Variable-Segment Cache

Objective: pack more lines into the same space

[Figure: one cache set. Tag Area: Address A, Address B, Address C, Address D. Data Area.]

Add two more tags

Page 11: Decoupled Variable-Segment Cache

Objective: pack more lines into the same space

[Figure: one cache set. Tag Area: Address A, Address B, Address C, Address D. Data Area.]

Add compression size, status, more LRU bits

Page 12: Decoupled Variable-Segment Cache

Objective: pack more lines into the same space

[Figure: one cache set. Tag Area: Address A, Address B, Address C, Address D. Data Area.]

Divide data area into 8-byte segments

Page 13: Decoupled Variable-Segment Cache

Objective: pack more lines into the same space

[Figure: one cache set. Tag Area: Address A, Address B, Address C, Address D. Data Area.]

Data lines composed of 1-8 segments

Page 14: Decoupled Variable-Segment Cache

Objective: pack more lines into the same space

Tag Area (address, compression status, compressed size in segments):
  Addr A   uncompressed   3
  Addr B   compressed     2
  Addr C   compressed     6
  Addr D   compressed     4   (tag is present but line isn't)

[Figure: Data Area holding the resident lines]
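The bookkeeping on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' hardware design. The names `TagEntry`, `footprint`, and `resident_lines` are hypothetical, and the rule that lines are kept in tag order until the 16 segments run out is a simplifying assumption chosen to reproduce the picture (A, B, and C resident; D's tag present but its line absent).

```python
from dataclasses import dataclass

SEGMENT_BYTES = 8                                 # from the slides: 8-byte segments
LINE_BYTES = 64                                   # from the slides: 64-byte lines
SEGMENTS_PER_LINE = LINE_BYTES // SEGMENT_BYTES   # 8 segments per uncompressed line
SET_SEGMENTS = 2 * SEGMENTS_PER_LINE              # room for two uncompressed lines = 16


@dataclass
class TagEntry:
    """One tag of a set (hypothetical model of the slide's tag fields)."""
    address: str
    compressed: bool   # compression status
    csize: int         # compressed size, in segments

    def footprint(self) -> int:
        """Segments the line occupies in the data area as currently stored."""
        return self.csize if self.compressed else SEGMENTS_PER_LINE


def resident_lines(tags):
    """Assumed policy: walk the tags in order and keep lines while they fit."""
    used = 0
    resident = []
    for tag in tags:
        if used + tag.footprint() > SET_SEGMENTS:
            break
        used += tag.footprint()
        resident.append(tag.address)
    return resident, used


example_set = [
    TagEntry("A", compressed=False, csize=3),
    TagEntry("B", compressed=True, csize=2),
    TagEntry("C", compressed=True, csize=6),
    TagEntry("D", compressed=True, csize=4),
]
resident, used = resident_lines(example_set)
print(resident, used)
```

Line A is stored uncompressed, so it occupies all 8 segments even though its recorded compressed size is 3. Together with B (2) and C (6) that fills the 16 segments, which is why D has a tag but no data.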

Page 15: Outline

Motivation

Cache Compression Framework

Adaptive Compression
  Key Insight
  Classification of L2 accesses
  Global compression predictor

Evaluation

Conclusions

Page 16: Adaptive Compression

Use past to predict future

Key insight: LRU stack [Mattson, et al., 1970] indicates for each reference whether compression helps or hurts

[Flowchart: Benefit(Compression) > Cost(Compression)?
  Yes: compress future lines
  No: do not compress future lines]

Page 17: Cost/Benefit Classification

Classify each cache reference

Four-way SA cache with space for two 64-byte lines
  Total of 16 available segments

LRU Stack (address, compression status, compressed size in segments):
  Addr A   uncompressed   3
  Addr B   compressed     2
  Addr C   compressed     6
  Addr D   compressed     4

[Figure: Data Area]

Page 18: An Unpenalized Hit

Read/Write Address A
  LRU stack order = 1 ≤ 2: hit regardless of compression
  Uncompressed line: no decompression penalty
  Neither cost nor benefit

[Figure: LRU stack and data area, same contents as on Page 17]

Page 19: A Penalized Hit

Read/Write Address B
  LRU stack order = 2 ≤ 2: hit regardless of compression
  Compressed line: decompression penalty incurred
  Compression cost

[Figure: LRU stack and data area, same contents as on Page 17]

Page 20: An Avoided Miss

Read/Write Address C
  LRU stack order = 3 > 2: hit only because of compression
  Compression benefit: eliminated off-chip miss

[Figure: LRU stack and data area, same contents as on Page 17]

Page 21: An Avoidable Miss

Read/Write Address D
  Line is not in the cache but tag exists at LRU stack order = 4
  Missed only because some lines are not compressed
  Potential compression benefit

[Figure: LRU stack and data area, same contents as on Page 17]

Sum(CSize) = 15 ≤ 16

Page 22: An Unavoidable Miss

Read/Write Address E
  LRU stack order > 4: compression wouldn't have helped
  Line is not in the cache and tag does not exist
  Neither cost nor benefit

[Figure: LRU stack and data area, same contents as on Page 17]
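The five cases walked through on the preceding slides (addresses A through E) can be collected into one small classifier. The sketch below is only an illustration under stated assumptions: the tuple layout, the `classify` function, and the `resident` flag are hypothetical names, and the avoidable-miss test sums the compressed sizes of all tags up to and including the missed tag's stack depth, which matches the slide's Sum(CSize) = 15 ≤ 16 for address D but is otherwise an assumption.

```python
UNCOMPRESSED_WAYS = 2   # the set has data space for two uncompressed lines
SET_SEGMENTS = 16       # total 8-byte segments in the set

# (address, compressed?, CSize in segments, line resident?), most recently used first.
# Contents follow the running example on the slides.
LRU_STACK = [
    ("A", False, 3, True),
    ("B", True, 2, True),
    ("C", True, 6, True),
    ("D", True, 4, False),   # tag present, line not in the data area
]


def classify(address, lru_stack=LRU_STACK):
    """Classify one L2 reference by its LRU stack depth (1-based)."""
    for depth, (addr, compressed, _csize, resident) in enumerate(lru_stack, start=1):
        if addr != address:
            continue
        if resident:
            if depth > UNCOMPRESSED_WAYS:
                # Would have been evicted from an uncompressed cache.
                return "avoided miss"
            return "penalized hit" if compressed else "unpenalized hit"
        # Tag matched but the line is absent: would it fit if all were compressed?
        total = sum(csize for _a, _c, csize, _r in lru_stack[:depth])
        return "avoidable miss" if total <= SET_SEGMENTS else "unavoidable miss"
    # No matching tag at all.
    return "unavoidable miss"


for name in "ABCDE":
    print(name, classify(name))
```

Only penalized hits count as a cost of compression, and only avoided and avoidable misses count as a benefit; the other two classes leave the cost/benefit estimate unchanged.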

Page 23: Compression Predictor

Estimate: Benefit(Compression) - Cost(Compression)

Single counter: Global Compression Predictor (GCP)
  Saturating up/down 19-bit counter

GCP updated on each cache access
  Benefit: increment by memory latency
  Cost: decrement by decompression latency
  Optimization: normalize to decompression latency = 1

Cache allocation
  Allocate compressed line if GCP ≥ 0
  Allocate uncompressed lines if GCP < 0
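A software model of the predictor might look like the sketch below. It is a hedged illustration, not the authors' implementation. The class and method names are hypothetical. The 400-cycle memory latency and 5-cycle decompression latency are taken from elsewhere in this deck. The signed, symmetric saturation range for the 19-bit counter is an assumption, and so is treating GCP = 0 as "compress", since the comparison symbol on the slide did not survive extraction.

```python
class GlobalCompressionPredictor:
    """Single saturating up/down counter estimating benefit minus cost."""

    def __init__(self, bits=19, memory_latency=400, decompression_latency=5):
        # Assumption: signed range of a `bits`-wide two's-complement counter.
        self.upper = (1 << (bits - 1)) - 1
        self.lower = -(1 << (bits - 1))
        # Normalize so that one decompression costs 1 unit.
        self.benefit = memory_latency // decompression_latency
        self.cost = 1
        self.value = 0

    def update(self, access_kind):
        """Apply one classified L2 access to the counter."""
        if access_kind in ("avoided miss", "avoidable miss"):
            self.value += self.benefit      # compression helped, or would have
        elif access_kind == "penalized hit":
            self.value -= self.cost         # compression only added latency
        # Other access kinds carry neither cost nor benefit.
        self.value = max(self.lower, min(self.upper, self.value))

    def compress_next_allocation(self):
        # Assumption: zero counts as "compress".
        return self.value >= 0


gcp = GlobalCompressionPredictor()
for _ in range(100):
    gcp.update("penalized hit")
print(gcp.value, gcp.compress_next_allocation())
gcp.update("avoided miss")
gcp.update("avoidable miss")
print(gcp.value, gcp.compress_next_allocation())
```

With these latencies one avoided or avoidable miss outweighs 80 penalized hits, so the counter only goes negative when decompression penalties are far more frequent than the misses compression removes.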

Page 24: Outline

Motivation

Cache Compression Framework

Adaptive Compression

Evaluation
  Simulation Setup
  Performance

Conclusions

Page 25: Simulation Setup

Simics full system simulator augmented with:
  Detailed OoO processor simulator [TFSim, Mauer, et al., 2002]
  Detailed memory timing simulator [Martin, et al., 2002]

Workloads:
  Commercial workloads:
    Database servers: OLTP and SPECJBB
    Static Web serving: Apache and Zeus
  SPEC2000 benchmarks:
    SPECint: bzip, gcc, mcf, twolf
    SPECfp: ammp, applu, equake, swim

Page 26: System Configuration

A dynamically scheduled SPARC V9 uniprocessor

Configuration parameters:
  L1 Cache            Split I&D, 64KB each, 2-way SA, 64B line, 2 cycles/access
  L2 Cache            Unified 4MB, 8-way SA, 64B line, 20 cycles + decompression latency per access
  Memory              4GB DRAM, 400-cycle access time, 128 outstanding requests
  Processor pipeline  4-wide superscalar, 11-stage pipeline: fetch (3), decode (3), schedule (1), execute (1+), retire (3)
  Reorder buffer      64 entries

Page 27: Simulated Cache Configurations

Always: All compressible lines are stored in compressed format
  Decompression penalty for all compressed lines

Never: All cache lines are stored in uncompressed format
  Cache is 8-way set associative with half the number of sets
  Does not incur decompression penalty

Adaptive: Our adaptive compression scheme

Page 28: Performance

[Figure: bar chart; y-axis: Normalized Runtime, 0 to 1.2; bars: Never, Always, Adaptive; benchmarks: bzip, gcc, mcf, twolf (SpecINT); ammp, applu, equake, swim (SpecFP); apache, zeus, oltp, jbb (Commercial)]

Page 29: Performance

[Figure: bar chart; y-axis: Normalized Runtime, 0 to 1.2; bars: Never, Always, Adaptive; benchmarks: bzip, gcc, mcf, twolf, ammp, applu, equake, swim, apache, zeus, oltp, jbb]

Page 30: Performance

[Figure: bar chart; y-axis: Normalized Runtime, 0 to 1.2; bars: Never, Always, Adaptive; benchmarks: bzip, gcc, mcf, twolf, ammp, applu, equake, swim, apache, zeus, oltp, jbb]

Chart annotations: 35% Speedup; 18% Slowdown

Page 31: Performance

[Figure: bar chart; y-axis: Normalized Runtime, 0 to 1.2; bars: Never, Always, Adaptive; benchmarks: bzip, gcc, mcf, twolf, ammp, applu, equake, swim, apache, zeus, oltp, jbb]

Adaptive performs similarly to the better of Always and Never

Chart annotation: Bug in GCP update

Page 32: Effective Cache Capacity

Page 33: Cache Miss Rates

[Figure: bar chart; y-axis: Normalized Miss Rate, 0 to 1; bars: Never, Always; benchmarks: ammp, gcc, mcf, apache]

                                   ammp    gcc     mcf      apache
Penalized Hits Per Avoided Miss    6709    489     12.3     4.7
Misses Per 1000 Instructions       0.09    2.52    12.28    14.38
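One way to read the "penalized hits per avoided miss" numbers is against a break-even ratio. The sketch below is a first-order estimate, not a result from the talk. It assumes each avoided miss saves the 400-cycle memory access and each penalized hit costs the 5-cycle decompression, both figures from this deck, and it ignores avoidable misses, overlap between memory requests, and any other second-order effect. The mapping of values to benchmarks follows the chart's x-axis order and is also an assumption. The function name `first_order_verdict` is hypothetical.

```python
MEMORY_LATENCY = 400          # cycles, from the system configuration slide
DECOMPRESSION_LATENCY = 5     # cycles, from the Frequent Pattern Compression slide

# One avoided miss pays for this many penalized hits.
BREAK_EVEN = MEMORY_LATENCY / DECOMPRESSION_LATENCY

# Values from this slide; column-to-benchmark mapping assumed from the x-axis order.
PENALIZED_HITS_PER_AVOIDED_MISS = {
    "ammp": 6709,
    "gcc": 489,
    "mcf": 12.3,
    "apache": 4.7,
}


def first_order_verdict(ratio, break_even=BREAK_EVEN):
    """Compression 'helps' if penalties per avoided miss stay below break-even."""
    return "helps" if ratio < break_even else "hurts"


verdicts = {
    name: first_order_verdict(ratio)
    for name, ratio in PENALIZED_HITS_PER_AVOIDED_MISS.items()
}
print(BREAK_EVEN, verdicts)
```

Under these assumptions the estimate agrees with the deck's conclusions slide, which lists apache and mcf as helped by compression and gcc and ammp as hurt.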

Page 34: Adapting to L2 Sizes

[Figure: ammp; bar chart; y-axis: Normalized Runtime, 0 to 1.2; x-axis: L2 size (256K, 1M, 4M, 16M); bars: Never, Always, Adaptive]

                                   256K     1M      4M      16M
Penalized Hits Per Avoided Miss    0.93     5.7     6503    326000
Misses Per 1000 Instructions       104.8    36.9    0.09    0.05

Page 35: Conclusions

Cache compression increases cache capacity but slows down cache hit time
  Helps some benchmarks (e.g., apache, mcf)
  Hurts other benchmarks (e.g., gcc, ammp)

Our proposal: adaptive compression
  Uses (LRU) replacement stack to determine whether compression helps or hurts
  Updates a single global saturating counter on cache accesses

Adaptive compression performs similarly to the better of Always Compress and Never Compress

Page 36: Backup Slides

Frequent Pattern Compression (FPC)
Decoupled Variable-Segment Cache
Classification of L2 Accesses
(LRU) Stack Replacement
Cache Miss Rates
Adapting to L2 Sizes – mcf
Adapting to L1 Size
Adapting to Decompression Latency – mcf
Adapting to Decompression Latency – ammp
Phase Behavior – gcc
Phase Behavior – mcf
Can We Do Better Than Adaptive?

Page 37: Decoupled Variable-Segment Cache

Each set contains four tags and space for two uncompressed lines

Data area divided into 8-byte segments

Each tag is composed of:
  Address tag
  Permissions
  CStatus: 1 if the line is compressed, 0 otherwise
  CSize: size of compressed line in segments
  LRU/replacement bits

Slide annotation: Same as uncompressed cache

Page 38: Frequent Pattern Compression

A significance-based compression algorithm

Related work:
  X-Match and X-RL Algorithms [Kjelso, et al., 1996]
  Address and data significance-based compression [Farrens and Park, 1991, Citron and Rudolph, 1995, Canal, et al., 2000]

A 64-byte line is decompressed in five cycles

More details in technical report:
  "Frequent Pattern Compression: A Significance-Based Compression Algorithm for L2 Caches," Alaa R. Alameldeen and David A. Wood, Dept. of Computer Sciences Technical Report CS-TR-2004-1500, April 2004 (available online).

Page 39: Frequent Pattern Compression (FPC)

A significance-based compression algorithm combined with zero run-length encoding
  Compresses each 32-bit word separately
  Suitable for short (32-256 byte) cache lines
  Compressible patterns: zero runs, sign-extended 4, 8, 16 bits, zero-padded half-word, two sign-extended half-words, repeated byte
  A 64-byte line is decompressed in a five-stage pipeline

More details in technical report:
  "Frequent Pattern Compression: A Significance-Based Compression Algorithm for L2 Caches," Alaa R. Alameldeen and David A. Wood, Dept. of Computer Sciences Technical Report CS-TR-2004-1500, April 2004 (available online).
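To make the pattern list concrete, the sketch below classifies a single 32-bit word into the pattern names given on this slide. It is an illustration only and does not reproduce the encoding from the technical report. The function names are hypothetical, the order in which patterns are tested is an assumption, a lone zero word stands in for a zero run, and "zero-padded half-word" is assumed to mean that the low 16 bits are zero.

```python
def sign_extends(value, bits, width=32):
    """True if the `width`-bit value equals its low `bits` bits sign-extended."""
    mask = (1 << width) - 1
    value &= mask
    low = value & ((1 << bits) - 1)
    if low & (1 << (bits - 1)):              # sign bit of the narrow field is set
        low |= mask & ~((1 << bits) - 1)     # fill the upper bits with ones
    return low == value


def classify_word(word):
    """Return the first slide pattern that a 32-bit word matches."""
    word &= 0xFFFFFFFF
    if word == 0:
        return "zero"                        # would be folded into a zero run
    for bits in (4, 8, 16):
        if sign_extends(word, bits):
            return f"sign-extended {bits}-bit"
    if word & 0xFFFF == 0:
        return "zero-padded half-word"       # assumption: low half is the padding
    high, low = word >> 16, word & 0xFFFF
    if sign_extends(high, 8, 16) and sign_extends(low, 8, 16):
        return "two sign-extended half-words"
    if word == (word & 0xFF) * 0x01010101:
        return "repeated byte"
    return "uncompressed"


for sample in (0x00000000, 0xFFFFFFFF, 0x0000007F, 0x12340000,
               0x00120034, 0xABABABAB, 0x12345678):
    print(f"{sample:#010x}", classify_word(sample))
```

Words that match none of the patterns are kept as-is, so the achievable compression depends on how often small integers, zeros, and repeated bytes appear in the cached data.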

Page 40: Classification of L2 Accesses

Cache hits:
    Unpenalized hit: hit to an uncompressed line that would have hit without compression
  - Penalized hit: hit to a compressed line that would have hit without compression
  + Avoided miss: hit to a line that would NOT have hit without compression

Cache misses:
  + Avoidable miss: miss to a line that would have hit with compression
    Unavoidable miss: miss to a line that would have missed even with compression

Page 41: (LRU) Stack Replacement

Differentiate penalized hits and avoided misses?
  Only hits to top half of the tags in the LRU stack are penalized hits

Differentiate avoidable and unavoidable misses?

Is not dependent on LRU replacement
  Any replacement algorithm for top half of tags
  Any stack algorithm for the remaining tags

Page 42: Cache Miss Rates

[Figure: bar chart; y-axis: Normalized Miss Rate, 0 to 1.2; bars: Never, Always, Adaptive; benchmarks: ammp, gcc, mcf, apache]

Page 43: Adapting to L2 Sizes

[Figure: mcf; bar chart; y-axis: Normalized Runtime, 0 to 1.2; x-axis: L2 size (256K, 1M, 4M, 16M); bars: Never, Always, Adaptive]

                                   256K    1M      4M      16M
Penalized Hits Per Avoided Miss    11.6    4.4     12.6    2x10^6
Misses Per 1000 Instructions       98.9    88.1    12.4    0.02

Page 44: Adapting to L1 Size

Page 45: Adapting to Decompression Latency

[Figure: mcf; y-axis: Normalized Runtime, 0 to 1.2; x-axis: Decompression Latency (Cycles), 0 to 25; series: Never, Always, Adaptive]

Page 46: Adapting to Decompression Latency

[Figure: ammp; y-axis: Normalized Runtime, 0 to 2; x-axis: Decompression Latency (Cycles), 0 to 25; series: Never, Always, Adaptive]

Page 47: Phase Behavior

[Figure: two plots over time; y-axes: Predictor Value (K) and Cache Size (MB)]

Page 48: Phase Behavior

[Figure: two plots over time; y-axes: Predictor Value (K) and Cache Size (MB)]

Page 49: Can We Do Better Than Adaptive?

Optimal is an unrealistic configuration: Always with no decompression penalty

[Figure: bar chart; y-axis: Normalized Runtime, 0 to 1.2; bars: Never, Always, Adaptive, Optimal; benchmarks: bzip, gcc, mcf, twolf, ammp, applu, equake, swim, apache, zeus, oltp, jbb]

