Adaptive Cache Compression for High-Performance Processors
Alaa Alameldeen and David Wood
University of Wisconsin-Madison
Wisconsin Multifacet Project
http://www.cs.wisc.edu/multifacet
ISCA 2004 Alaa Alameldeen – Adaptive Cache Compression

Overview
Design of high performance processors
  Processor speed improves faster than memory
  Memory latency dominates performance
Need more effective cache designs
On-chip cache compression
  + Increases effective cache size
  - Increases cache hit latency
Does cache compression help or hurt?

Does Cache Compression Help or Hurt?
[Figure: normalized runtime (0 to 1.2) for apache and ammp under No Compression, Compression, and Adaptive]
Adaptive Compression determines when compression is beneficial

Outline
Motivation
Cache Compression Framework
  Compressed Cache Hierarchy
  Decoupled Variable-Segment Cache
Adaptive Compression
Evaluation
Conclusions

Compressed Cache Hierarchy
[Diagram labels: Instruction Fetcher; Load-Store Queue; L1 I-Cache (Uncompressed); L1 D-Cache (Uncompressed); L1 Victim Cache; L2 Cache (Compressed); Compression Pipeline; Decompression Pipeline; Uncompressed Line Bypass; From Memory; To Memory]

Decoupled Variable-Segment Cache
Objective: pack more lines into the same space
  2-way set-associative with 64-byte lines
  Tag contains address tag, permissions, LRU (replacement) bits
  Add two more tags
  Add compression size, status, more LRU bits
  Divide data area into 8-byte segments
  Data lines composed of 1-8 segments
[Diagram: Tag Area and Data Area. Tag entries, each with compression status and compressed size:
  Addr A uncompressed 3
  Addr B compressed 2
  Addr C compressed 6
  Addr D compressed 4
 Callout: tag is present but line isn't]
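The packing idea on the Decoupled Variable-Segment Cache slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the hardware design: the names `TagEntry` and `resident_lines` are hypothetical, and the rule "walk the tags from most to least recently used and stop at the first line that does not fit" is a simplification chosen for the sketch. The segment counts (8-byte segments, 16 segments per set) and the four example tags come from the slides.

```python
from dataclasses import dataclass

SEGMENTS_PER_LINE = 8   # a 64-byte line spans eight 8-byte segments when uncompressed
SET_SEGMENTS = 16       # each set has data space for two uncompressed lines


@dataclass
class TagEntry:
    addr: str
    compressed: bool    # compression status
    csize: int          # compressed size in segments (1-8)

    def segments_used(self) -> int:
        # A line stored uncompressed occupies all 8 segments,
        # even if it would compress to fewer.
        return self.csize if self.compressed else SEGMENTS_PER_LINE


def resident_lines(lru_stack):
    """Return the addresses whose data fits, walking tags from MRU to LRU.

    Assumption of this sketch: once one line does not fit, it and every
    less recently used tag keep only their tag, with no data.
    """
    used = 0
    resident = []
    for tag in lru_stack:
        if used + tag.segments_used() > SET_SEGMENTS:
            break
        used += tag.segments_used()
        resident.append(tag.addr)
    return resident


# The example set from the slides, most recently used first.
example_set = [
    TagEntry("A", compressed=False, csize=3),
    TagEntry("B", compressed=True, csize=2),
    TagEntry("C", compressed=True, csize=6),
    TagEntry("D", compressed=True, csize=4),
]
print(resident_lines(example_set))
```

With these numbers A takes 8 segments, B takes 2, and C takes 6, which fills all 16 segments, so D keeps a tag but no data.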

Outline
Motivation
Cache Compression Framework
Adaptive Compression
  Key Insight
  Classification of L2 accesses
  Global compression predictor
Evaluation
Conclusions

Adaptive Compression
Use past to predict future
Key Insight: LRU Stack [Mattson, et al., 1970] indicates for each reference whether compression helps or hurts
Benefit(Compression) > Cost(Compression)?
  Yes: Compress future lines
  No: Do not compress future lines

Cost/Benefit Classification
Classify each cache reference
Four-way SA cache with space for two 64-byte lines
  Total of 16 available segments
LRU Stack (with Data Area):
  Addr A uncompressed 3
  Addr B compressed 2
  Addr C compressed 6
  Addr D compressed 4

An Unpenalized Hit
Read/Write Address A
  LRU Stack order = 1 ≤ 2: Hit regardless of compression
  Uncompressed Line: No decompression penalty
  Neither cost nor benefit

A Penalized Hit
Read/Write Address B
  LRU Stack order = 2 ≤ 2: Hit regardless of compression
  Compressed Line: Decompression penalty incurred
  Compression cost

An Avoided Miss
Read/Write Address C
  LRU Stack order = 3 > 2: Hit only because of compression
  Compression benefit: Eliminated off-chip miss

An Avoidable Miss
Read/Write Address D
  Line is not in the cache but tag exists at LRU stack order = 4
  Missed only because some lines are not compressed
  Sum(CSize) = 15 ≤ 16
  Potential compression benefit

An Unavoidable Miss
Read/Write Address E
  LRU stack order > 4: Compression wouldn't have helped
  Line is not in the cache and tag does not exist
  Neither cost nor benefit
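The five cases walked through above (accesses to A, B, C, D, and E) can be reproduced with a short Python sketch. It is a hedged illustration rather than the authors' hardware logic: `Tag` and `classify` are hypothetical names, and the `resident` flag simply records what the slides state, namely that D has a tag but no data. The thresholds (2 uncompressed ways, 16 segments) are taken from the slides.

```python
from typing import NamedTuple

SET_SEGMENTS = 16       # total 8-byte segments in the set
UNCOMPRESSED_WAYS = 2   # lines the set could hold with no compression


class Tag(NamedTuple):
    addr: str
    compressed: bool
    csize: int       # compressed size in segments
    resident: bool   # False when the tag is present but the line isn't


def classify(stack, addr):
    """Classify one access; `stack` is ordered from MRU to LRU."""
    for depth, tag in enumerate(stack, start=1):
        if tag.addr != addr:
            continue
        if tag.resident:
            if depth > UNCOMPRESSED_WAYS:
                return "avoided miss"        # hit only because of compression
            return "penalized hit" if tag.compressed else "unpenalized hit"
        # Tag only: would the line have fit if every line down to
        # this depth had been stored compressed?
        needed = sum(t.csize for t in stack[:depth])
        return "avoidable miss" if needed <= SET_SEGMENTS else "unavoidable miss"
    return "unavoidable miss"                # no tag at all


stack = [
    Tag("A", compressed=False, csize=3, resident=True),
    Tag("B", compressed=True, csize=2, resident=True),
    Tag("C", compressed=True, csize=6, resident=True),
    Tag("D", compressed=True, csize=4, resident=False),
]
for addr in "ABCDE":
    print(addr, classify(stack, addr))
```

For D the sum of compressed sizes is 3 + 2 + 6 + 4 = 15, which is within the 16 available segments, so the miss counts as avoidable.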

Compression Predictor
Estimate: Benefit(Compression) - Cost(Compression)
Single counter: Global Compression Predictor (GCP)
  Saturating up/down 19-bit counter
GCP updated on each cache access
  Benefit: Increment by memory latency
  Cost: Decrement by decompression latency
  Optimization: Normalize to decompression latency = 1
Cache Allocation
  Allocate compressed line if GCP ≥ 0
  Allocate uncompressed lines if GCP < 0
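A minimal Python sketch of the predictor described above follows. Several details are assumptions made for illustration and are labeled in the comments: the class name `GlobalCompressionPredictor`, the signed range chosen for the 19-bit counter, and counting avoidable misses as a benefit alongside avoided misses (the talk calls them a "potential compression benefit"). The 400-cycle memory latency and five-cycle decompression latency are the values quoted elsewhere in the talk.

```python
class GlobalCompressionPredictor:
    """Single saturating up/down counter estimating benefit minus cost."""

    def __init__(self, bits=19, memory_latency=400, decompression_latency=5):
        # Assumption: the counter is signed, so 19 bits span
        # [-2**18, 2**18 - 1].
        self.max_value = (1 << (bits - 1)) - 1
        self.min_value = -(1 << (bits - 1))
        # Normalize so that the decompression latency counts as 1.
        self.benefit = memory_latency // decompression_latency
        self.cost = 1
        self.value = 0

    def update(self, access_class):
        # Assumption: avoidable misses are credited like avoided misses.
        if access_class in ("avoided miss", "avoidable miss"):
            self.value = min(self.max_value, self.value + self.benefit)
        elif access_class == "penalized hit":
            self.value = max(self.min_value, self.value - self.cost)
        # Unpenalized hits and unavoidable misses are neither cost nor benefit.

    def should_compress(self):
        # Allocate compressed lines when the counter is non-negative.
        return self.value >= 0


gcp = GlobalCompressionPredictor()
for _ in range(100):
    gcp.update("penalized hit")
print(gcp.value, gcp.should_compress())
gcp.update("avoided miss")
gcp.update("avoided miss")
print(gcp.value, gcp.should_compress())
```

With the normalized weights, one avoided miss offsets 80 penalized hits, so a workload has to hit compressed lines very often before the predictor turns compression off.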

Outline
Motivation
Cache Compression Framework
Adaptive Compression
Evaluation
  Simulation Setup
  Performance
Conclusions

Simulation Setup
Simics full system simulator augmented with:
  Detailed OoO processor simulator [TFSim, Mauer, et al., 2002]
  Detailed memory timing simulator [Martin, et al., 2002]
Workloads:
  Commercial workloads:
    Database servers: OLTP and SPECJBB
    Static Web serving: Apache and Zeus
  SPEC2000 benchmarks:
    SPECint: bzip, gcc, mcf, twolf
    SPECfp: ammp, applu, equake, swim

System configuration
A dynamically scheduled SPARC V9 uniprocessor
Configuration parameters:
  L1 Cache: Split I&D, 64KB each, 2-way SA, 64B line, 2 cycles/access
  L2 Cache: Unified 4MB, 8-way SA, 64B line, 20 cycles + decompression latency per access
  Memory: 4GB DRAM, 400-cycle access time, 128 outstanding requests
  Processor pipeline: 4-wide superscalar, 11-stage pipeline: fetch (3), decode (3), schedule (1), execute (1+), retire (3)
  Reorder buffer: 64 entries

Simulated Cache Configurations
Always: All compressible lines are stored in compressed format
  Decompression penalty for all compressed lines
Never: All cache lines are stored in uncompressed format
  Cache is 8-way set associative with half the number of sets
  Does not incur decompression penalty
Adaptive: Our adaptive compression scheme

Performance
[Figure: normalized runtime (0 to 1.2) under Never, Always, and Adaptive for SpecINT (bzip, gcc, mcf, twolf), SpecFP (ammp, applu, equake, swim), and Commercial (apache, zeus, oltp, jbb). Callouts on the chart: 35% Speedup; 18% Slowdown; Bug in GCP update]
Adaptive performs similar to the best of Always and Never

Effective Cache Capacity

Cache Miss Rates
[Figure: normalized miss rate (0 to 1) under Never and Always for ammp, gcc, mcf, apache]
Benchmark                          ammp    gcc     mcf     apache
Misses per 1000 instructions       0.09    2.52    12.28   14.38
Penalized hits per avoided miss    6709    489     12.3    4.7
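The "Penalized Hits Per Avoided Miss" figures can be read against a first-order break-even point. The sketch below assumes a simple cost model that is not stated in the slides in this form: each penalized hit costs one decompression latency and each avoided miss saves one memory access, with no overlap between them. It uses the 400-cycle memory latency from the system configuration and the five-cycle decompression latency quoted in the backup slides, and it assumes the four ratios appear in the same order as the benchmark labels (ammp, gcc, mcf, apache). The names `BREAK_EVEN`, `RATIOS`, and `compression_pays_off` are hypothetical.

```python
MEMORY_LATENCY = 400        # cycles
DECOMPRESSION_LATENCY = 5   # cycles per 64-byte line

# Under the assumed model, compression wins while
#   penalized_hits * DECOMPRESSION_LATENCY < avoided_misses * MEMORY_LATENCY,
# that is, while penalized hits per avoided miss stay below this ratio.
BREAK_EVEN = MEMORY_LATENCY / DECOMPRESSION_LATENCY

# Penalized hits per avoided miss, in the assumed benchmark order.
RATIOS = {"ammp": 6709, "gcc": 489, "mcf": 12.3, "apache": 4.7}


def compression_pays_off(penalized_hits_per_avoided_miss):
    return penalized_hits_per_avoided_miss < BREAK_EVEN


verdicts = {name: compression_pays_off(ratio) for name, ratio in RATIOS.items()}
for name, helps in verdicts.items():
    print(f"{name}: {'helps' if helps else 'hurts'}")
```

Under this model the break-even ratio is 80, which puts mcf and apache on the side where compression helps and ammp and gcc on the side where it hurts.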

Adapting to L2 Sizes
[Figure (ammp): normalized runtime (0 to 1.2) versus L2 size under Never, Always, and Adaptive]
L2 size                            256K    1M      4M      16M
Penalized hits per avoided miss    0.93    5.7     6503    326000
Misses per 1000 instructions       104.8   36.9    0.09    0.05

Conclusions
Cache compression increases cache capacity but slows down cache hit time
  Helps some benchmarks (e.g., apache, mcf)
  Hurts other benchmarks (e.g., gcc, ammp)
Our Proposal: Adaptive compression
  Uses (LRU) replacement stack to determine whether compression helps or hurts
  Updates a single global saturating counter on cache accesses
Adaptive compression performs similar to the better of Always Compress and Never Compress

Backup Slides
  Frequent Pattern Compression (FPC)
  Decoupled Variable-Segment Cache
  Classification of L2 Accesses
  (LRU) Stack Replacement
  Cache Miss Rates
  Adapting to L2 Sizes – mcf
  Adapting to L1 Size
  Adapting to Decompression Latency – mcf
  Adapting to Decompression Latency – ammp
  Phase Behavior – gcc
  Phase Behavior – mcf
  Can We Do Better Than Adaptive?

Decoupled Variable-Segment Cache
Each set contains four tags and space for two uncompressed lines
Data area divided into 8-byte segments
Each tag is composed of:
  Address tag
  Permissions
  CStatus: 1 if the line is compressed, 0 otherwise
  CSize: Size of compressed line in segments
  LRU/replacement bits
(Callout on the tag fields: same as uncompressed cache)

Frequent Pattern Compression
A significance-based compression algorithm
Related Work:
  X-Match and X-RL Algorithms [Kjelso, et al., 1996]
  Address and data significance-based compression [Farrens and Park, 1991, Citron and Rudolph, 1995, Canal, et al., 2000]
A 64-byte line is decompressed in five cycles
More details in technical report:
  "Frequent Pattern Compression: A Significance-Based Compression Algorithm for L2 Caches," Alaa R. Alameldeen and David A. Wood, Dept. of Computer Sciences Technical Report CS-TR-2004-1500, April 2004 (available online).

Frequent Pattern Compression (FPC)
A significance-based compression algorithm combined with zero run-length encoding
  Compresses each 32-bit word separately
  Suitable for short (32-256 byte) cache lines
  Compressible Patterns: zero runs, sign-ext. 4, 8, 16 bits, zero-padded half-word, two SE half-words, repeated byte
  A 64-byte line is decompressed in a five-stage pipeline
More details in technical report:
  "Frequent Pattern Compression: A Significance-Based Compression Algorithm for L2 Caches," Alaa R. Alameldeen and David A. Wood, Dept. of Computer Sciences Technical Report CS-TR-2004-1500, April 2004 (available online).
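To make the list of compressible patterns concrete, here is a hedged Python sketch that labels a single 32-bit word with the first pattern it matches. It is not the encoder from the technical report: the function names are hypothetical, the order in which patterns are tried is an assumption, "zero-padded half-word" is assumed to mean that the low 16 bits are zero, and merging consecutive zero words into a run is left out. No prefix codes or compressed sizes are modeled.

```python
def sign_extends_from(word, bits, width=32):
    """True if the width-bit word equals the sign extension of its low `bits` bits."""
    low = word & ((1 << bits) - 1)
    if low & (1 << (bits - 1)):
        # Sign bit set: fill every higher bit up to `width` with ones.
        low |= ((1 << width) - 1) & ~((1 << bits) - 1)
    return low == word


def classify_word(word):
    """Return the name of the first matching pattern for a 32-bit word."""
    word &= 0xFFFFFFFF
    if word == 0:
        return "zero"  # consecutive zero words would form a zero run
    for bits in (4, 8, 16):
        if sign_extends_from(word, bits):
            return f"sign-extended {bits}-bit"
    if word & 0xFFFF == 0:
        return "zero-padded half-word"  # assumption: low half is the zero half
    high, low = word >> 16, word & 0xFFFF
    if sign_extends_from(high, 8, 16) and sign_extends_from(low, 8, 16):
        return "two sign-extended half-words"
    if word == (word & 0xFF) * 0x01010101:
        return "repeated byte"
    return "uncompressed"


for sample in (0x00000000, 0x00000005, 0xFFFF8000, 0x12340000,
               0x0012FFF0, 0xABABABAB, 0x12345678):
    print(f"{sample:#010x}: {classify_word(sample)}")
```

Small integers, pointers with zeroed low bits, and fill values such as repeated bytes all land in compressible classes, which is the kind of data significance-based schemes target.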

Classification of L2 Accesses
Cache hits:
    Unpenalized hit: Hit to an uncompressed line that would have hit without compression
  - Penalized hit: Hit to a compressed line that would have hit without compression
  + Avoided miss: Hit to a line that would NOT have hit without compression
Cache misses:
  + Avoidable miss: Miss to a line that would have hit with compression
    Unavoidable miss: Miss to a line that would have missed even with compression

(LRU) Stack Replacement
Differentiate penalized hits and avoided misses?
  Only hits to top half of the tags in the LRU stack are penalized hits
Differentiate avoidable and unavoidable misses?
Is not dependent on LRU replacement
  Any replacement algorithm for top half of tags
  Any stack algorithm for the remaining tags

Cache Miss Rates
[Figure: normalized miss rate (0 to 1.2) under Never, Always, and Adaptive for ammp, gcc, mcf, apache]

Adapting to L2 Sizes
[Figure (mcf): normalized runtime (0 to 1.2) versus L2 size under Never, Always, and Adaptive]
L2 size                            256K    1M      4M      16M
Penalized hits per avoided miss    11.6    4.4     12.6    2x10^6
Misses per 1000 instructions       98.9    88.1    12.4    0.02

Adapting to L1 Size

Adapting to Decompression Latency
[Figure (mcf): normalized runtime (0 to 1.2) versus decompression latency (0 to 25 cycles) under Never, Always, and Adaptive]

Adapting to Decompression Latency
[Figure (ammp): normalized runtime (0 to 2) versus decompression latency (0 to 25 cycles) under Never, Always, and Adaptive]

Phase Behavior
[Two figure slides with the same labels: Predictor Value (K) and Cache Size (MB)]

Can We Do Better Than Adaptive?
Optimal is an unrealistic configuration: Always with no decompression penalty
[Figure: normalized runtime (0 to 1.2) under Never, Always, Adaptive, and Optimal for bzip, gcc, mcf, twolf, ammp, applu, equake, swim, apache, zeus, oltp, jbb]