Date posted: 11-Jan-2016 | Uploaded by: rudolph-mcdaniel
Seok-Won Seong and Prabhat Mishra, University of Florida
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 27, No. 4, April 2008
Rahul Sridharan
1 of 25
Motivation
Background
◦ Code compression using bitmasks
Challenges in the bitmask-based approach
Application-aware code compression
◦ Mask selection
◦ Bitmask-aware dictionary selection
◦ Code compression algorithm
Results
Conclusion
2 of 25
Bitmask-based code compression
◦ Addresses memory constraints (code size) in embedded systems while also improving power and performance
Application-aware code compression algorithm
◦ Improves compression efficiency without introducing a decompression penalty
3 of 25
Static encoding (offline): Application program (binary) → Compression algorithm → Compressed code (memory)
Dynamic decoding (online): Compressed code (memory) → Decompression engine → Processor (fetch and execute)
$$\text{Compression ratio} = \frac{\text{Compressed program size}}{\text{Original program size}}$$
4 of 25
Dictionary based
◦ Frequency-based dictionary selection
Format for uncompressed code (32-bit code): Decision (1 bit) | Uncompressed data (32 bits)
Format for compressed code: Decision (1 bit) | Dictionary index
Hamming distance based
◦ Remembering mismatches
Format for uncompressed code: Decision (1 bit) | Uncompressed data (32 bits)
Format for compressed code: Decision (1 bit) | # of bit changes | Location (5 bits) | … | Location (5 bits) | Dictionary index
Bitmask based
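Here is a minimal Python sketch of the bookkeeping behind the Hamming-distance format. It assumes 32-bit words with bit locations numbered 0-31 from the most significant bit, and the function names are hypothetical. Each mismatching bit costs one 5-bit Location field, since 2^5 = 32 positions. The widths of the other fields are not modelled.

```python
WORD_BITS = 32
LOCATION_BITS = 5  # log2(32): enough to address any bit of a 32-bit word


def mismatch_locations(word: int, entry: int) -> list[int]:
    """Bit positions (0 = most significant bit) where word and entry differ."""
    diff = word ^ entry
    return [pos for pos in range(WORD_BITS)
            if diff & (1 << (WORD_BITS - 1 - pos))]


def location_field_bits(word: int, entry: int) -> int:
    """Bits spent on Location fields: one 5-bit field per mismatch."""
    return LOCATION_BITS * len(mismatch_locations(word, entry))


if __name__ == "__main__":
    entry = 0x12345678
    word = entry ^ (1 << 31) ^ 1  # flip the most and least significant bits
    print(mismatch_locations(word, entry))   # [0, 31]
    print(location_field_bits(word, entry))  # 10
```

The cost grows linearly with the number of mismatches. This is the weakness the bitmask format addresses.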
5 of 25
32-bit instructions
Format for uncompressed code: Decision (1 bit) | Uncompressed data (32 bits)
Format for compressed code: Decision (1 bit) | Number of masks | Mask type | Location | Mask pattern | … | Mask type | Location | Mask pattern | Dictionary index
◦ Mask type: type of the mask, e.g., 2-bit, 4-bit, etc.
◦ Location: location at which to apply the bitmask
◦ Mask pattern: the actual mask pattern
6 of 25
Original program | Compressed program
0000 0000 | 0 1 0
1000 0010 | 0 0 00 11 1
0000 0010 | 0 0 11 10 0
0100 0010 | 0 1 1
0100 1110 | 0 0 10 11 1
0101 0010 | 0 0 01 01 1
0000 1100 | 0 0 10 11 0
0100 0010 | 0 1 1
1100 0000 | 0 0 00 11 0
0000 0000 | 0 1 0
Dictionary: index 0 = 0000 0000; index 1 = 0100 0010
Compressed-word fields: first bit: 0 = compressed, 1 = not compressed; second bit: 0 = bitmask used, 1 = no bitmask used; then bitmask position (2 bits) and bitmask value (2 bits); then dictionary index (1 bit)
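The example above can be reproduced with a short Python sketch. The sketch assumes what the bit strings imply: 8-bit words and one 2-bit mask applied at one of four fixed positions (2-bit position field, numbered from the left). It also assumes a 1-bit dictionary index. Exact dictionary matches are tried before bitmask matches, with entries tried in index order. All helper names are hypothetical.

```python
WORD_BITS = 8
MASK_BITS = 2
POSITION_BITS = 2  # 8 / 2 = 4 fixed mask positions -> 2 bits
INDEX_BITS = 1
DICTIONARY = [0b0000_0000, 0b0100_0010]  # index 0, index 1
PROGRAM = [0b0000_0000, 0b1000_0010, 0b0000_0010, 0b0100_0010, 0b0100_1110,
           0b0101_0010, 0b0000_1100, 0b0100_0010, 0b1100_0000, 0b0000_0000]


def bits(value: int, width: int) -> str:
    return format(value, f"0{width}b")


def encode_word(word: int) -> str:
    # Exact match: 0 (compressed), 1 (no bitmask), dictionary index.
    for idx, entry in enumerate(DICTIONARY):
        if word == entry:
            return "01" + bits(idx, INDEX_BITS)
    # Bitmask match: 0 (compressed), 0 (bitmask used), position, value, index.
    for idx, entry in enumerate(DICTIONARY):
        diff = word ^ entry
        for pos in range(WORD_BITS // MASK_BITS):
            shift = WORD_BITS - MASK_BITS * (pos + 1)  # position 0 = leftmost
            field = ((1 << MASK_BITS) - 1) << shift
            if (diff & ~field) == 0:  # every differing bit lies in this field
                value = (diff & field) >> shift
                return ("00" + bits(pos, POSITION_BITS)
                        + bits(value, MASK_BITS) + bits(idx, INDEX_BITS))
    # No match: 1 (not compressed) followed by the raw word.
    return "1" + bits(word, WORD_BITS)


def compress(program: list[int]) -> list[str]:
    return [encode_word(word) for word in program]


def compression_ratio(program: list[int]) -> float:
    # Assumption: the dictionary is counted as part of the compressed program.
    compressed = sum(len(code) for code in compress(program))
    compressed += len(DICTIONARY) * WORD_BITS
    return compressed / (len(program) * WORD_BITS)


if __name__ == "__main__":
    for word, code in zip(PROGRAM, compress(PROGRAM)):
        print(bits(word, WORD_BITS), "->", code)
    print(compression_ratio(PROGRAM))  # 0.875
```

Under these assumptions the sketch emits the same ten codewords as the slide. That gives 54 code bits plus a 16-bit dictionary, against 80 original bits, or 70/80 = 87.5%.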
7 of 25
Selection of an appropriate mask pattern
◦ A larger bitmask generates more matches: a 4-bit mask can represent 16 (2^4) bit patterns, an 8-bit mask 256 (2^8)
◦ A larger bitmask incurs a higher cost: a 4-bit mask costs 7 bits, an 8-bit mask costs 10 bits
Efficient dictionary selection
◦ Frequency-based selection is not always optimal
Efficient mask and dictionary selection schemes are needed to improve compression efficiency
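The 7-bit and 10-bit figures follow from two assumptions. First, a mask may start only at mask-aligned (fixed) positions in a 32-bit word. Second, its cost is location bits plus pattern bits, ignoring any mask-type bits. Here is a minimal Python sketch under those assumptions; the function names are hypothetical.

```python
WORD_BITS = 32


def fixed_mask_cost(mask_bits: int) -> int:
    """Location bits for the fixed positions plus the mask pattern bits."""
    positions = WORD_BITS // mask_bits            # mask-aligned start positions
    location_bits = (positions - 1).bit_length()  # log2(positions)
    return location_bits + mask_bits


def distinct_patterns(mask_bits: int) -> int:
    """Number of different bit patterns an s-bit mask can express."""
    return 2 ** mask_bits


if __name__ == "__main__":
    for size in (4, 8):
        print(f"{size}-bit mask: {distinct_patterns(size)} patterns, "
              f"{fixed_mask_cost(size)} bits")
    # 4-bit mask: 16 patterns, 7 bits
    # 8-bit mask: 256 patterns, 10 bits
```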
8 of 25
Example: frequency-based dictionary selection (DS) gives CR = 97.5%; spanning-based DS gives CR = 87.5%
9 of 25
Bitmask selection
Bitmask-aware dictionary selection
◦ NP-hard (nondeterministic polynomial-time hard) problem
Code compression algorithm
◦ Based on the combination of the two approaches
10 of 25
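How many bitmask patterns are needed? Which of them are profitable?
Fixed and sliding bitmask patterns:
Mask | Fixed | Sliding
1 bit | | X
2 bits | X | X
3 bits | | X
4 bits | X | X
5 bits | | X
6 bits | | X
7 bits | | X
8 bits | X | X
Cost in bits by number of bit changes (rows) and size of mask pattern (columns):
Bit changes | 1-bit | 2-bit | 4-bit | 8-bit | 16-bit | 32-bit
32 bits | 165 | 100 | 59 | 42 | 35 | 32
16 bits | 84 | 51 | 30 | 21 | 17 | -
8 bits | 43 | 26 | 15 | 10 | - | -
4 bits | 22 | 13 | 7 | - | - | -
2 bits | 11 | 6 | - | - | - | -
1 bit | 5 | - | - | - | - | -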
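Every entry in the cost table above is consistent with one simple model, reproduced in the Python sketch below. The model is an inference from the numbers, not a formula given on the slide. Covering a run of N changed bits with N/s fixed s-bit masks costs log2(32/s) location bits per mask. Each mask also needs s pattern bits, except a 1-bit mask, whose pattern can only be 1. On top of that come log2(N/s) extra bits, read here as a mask count. All names are hypothetical.

```python
WORD_BITS = 32
SIZES = [1, 2, 4, 8, 16, 32]


def log2_exact(n: int) -> int:
    """log2 for powers of two."""
    return n.bit_length() - 1


def single_mask_cost(size: int) -> int:
    location = log2_exact(WORD_BITS // size)  # fixed, mask-aligned positions
    pattern = 0 if size == 1 else size        # a 1-bit mask can only be '1'
    return location + pattern


def run_cost(changed_bits: int, size: int) -> int:
    """Cost of covering a run of changed bits with fixed masks of one size."""
    masks = changed_bits // size
    return masks * single_mask_cost(size) + log2_exact(masks)


if __name__ == "__main__":
    for changed in (32, 16, 8, 4, 2, 1):
        row = [run_cost(changed, s) for s in SIZES if s <= changed]
        print(f"{changed:>2} bits:", row)
```

The printed rows match the slide's table row for row, which supports, but does not prove, this reading of the entries.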
11 of 25
Bits needed to indicate a particular location depend on
◦ Size of the mask
◦ Type of the mask (fixed or sliding)
Number of bitmask patterns needed
◦ Up to two mask patterns: the minimum cost to store three bitmasks is 27-31 bits for a 32-bit vector, which is not very profitable
Which combinations are profitable?
◦ Eleven possibilities: 1s, 2s, 2f, 3s, 4s, 4f, 5s, 6s, 7s, 8s, 8f (s = sliding, f = fixed)
◦ Select one or two of these eleven; the number of combinations can be further reduced
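Here is a quick count of the search space. It assumes a combination is either one mask type or an unordered pair of two different types; the slide does not say whether the same type may be used twice.

```python
from itertools import combinations

# s = sliding, f = fixed; the eleven candidates listed on the slide
CANDIDATES = ["1s", "2s", "2f", "3s", "4s", "4f", "5s", "6s", "7s", "8s", "8f"]

singles = [(mask,) for mask in CANDIDATES]
pairs = list(combinations(CANDIDATES, 2))  # unordered, two distinct types

if __name__ == "__main__":
    print(len(singles), len(pairs), len(singles) + len(pairs))  # 11 55 66
```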
12 of 25
Benchmarks are compiled for TI TMS320C6x
(1s, 4f) and (2f, 2s) provide the best compression
[Figure: compression results for the mask combinations, with (1s, 4f) and (2s, 2f) marked]
13 of 25
Factors of 32 (1, 2, 4 and 8) produce better results
◦ They can be applied cost-effectively at fixed locations
8-bit fixed/sliding masks are not helpful
◦ The probability of more than 4 consecutive changes is low
◦ Two smaller masks perform better than one larger mask
◦ 4-bit sliding does not perform better than 4-bit fixed
Two bitmasks provide better results than a single one
Choose two from four bitmasks: (1s, 2f, 2s, 4f)
Mask | Fixed | Sliding
1 bit | | X
2 bits | X | X
4 bits | X |
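The difference between fixed and sliding masks can be checked directly. This Python sketch assumes a fixed s-bit mask may start only at multiples of s, while a sliding mask may start at any bit; the names are hypothetical. It tests one difference vector (word XOR dictionary entry) against 1s, 2f, 2s and 4f.

```python
WORD_BITS = 32


def covering_positions(diff: int, size: int, sliding: bool) -> list[int]:
    """Start positions (0 = MSB) where one size-bit mask covers every set bit of diff."""
    if diff == 0:
        return []
    step = 1 if sliding else size  # fixed masks start only at multiples of size
    hits = []
    for start in range(0, WORD_BITS - size + 1, step):
        shift = WORD_BITS - size - start
        window = ((1 << size) - 1) << shift
        if (diff & ~window) == 0:
            hits.append(start)
    return hits


def can_cover(diff: int, mask: str) -> bool:
    """mask is written like the slides: '2f' = 2-bit fixed, '1s' = 1-bit sliding."""
    size, kind = int(mask[:-1]), mask[-1]
    return bool(covering_positions(diff, size, sliding=(kind == "s")))


if __name__ == "__main__":
    diff = (1 << 30) | (1 << 29)  # bits 1 and 2 (counted from the MSB) differ
    print({m: can_cover(diff, m) for m in ("1s", "2f", "2s", "4f")})
    # {'1s': False, '2f': False, '2s': True, '4f': True}
```

In this example, two adjacent changes straddle a 2-bit boundary. They defeat 2f but not 2s or 4f. This illustrates why a fixed and a sliding variant of the same size can behave differently.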
14 of 25
Dictionary selection: dynamic or static
◦ Frequency: select the most frequently occurring binary patterns
◦ Spanning: select patterns that ensure uniform coverage of all patterns, based on Hamming distance
◦ Bit savings: select patterns based on the bits saved by self-repetitions and mask-matched repetitions
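Frequency-based selection is the simplest of the three. Below is a minimal Python sketch (function name hypothetical), applied to the 8-bit program from the earlier bitmask example. For that program it happens to pick the same two entries as the example dictionary.

```python
from collections import Counter


def frequency_dictionary(program: list[int], size: int) -> list[int]:
    """Frequency-based selection: the `size` most frequent words.

    Counter.most_common orders equal counts by first occurrence.
    """
    return [word for word, _ in Counter(program).most_common(size)]


if __name__ == "__main__":
    program = [0b0000_0000, 0b1000_0010, 0b0000_0010, 0b0100_0010, 0b0100_1110,
               0b0101_0010, 0b0000_1100, 0b0100_0010, 0b1100_0000, 0b0000_0000]
    print([format(w, "08b") for w in frequency_dictionary(program, 2)])
    # ['00000000', '01000010']
```

Frequency alone ignores words that could be matched through a bitmask. The bit-savings approach adds those savings as edge weights.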
15 of 25
16 of 25
[Figure: example graph with node weights A(0), B(7), C(7), D(0), E(0), F(7), G(14) and weighted edges]
Total weights: A = 0+10 = 10, B = 7+15 = 22, C = 7+15 = 22, D = 0+5 = 5, E = 0+15 = 15, F = 7+20 = 27, G = 14+10 = 24
Node weight: number of bits saved due to the frequency of the pattern
Edge weight: number of bits saved by using the bitmask-based match
Total weight: node weight + the weights of all edges connected to the node
17 of 25
After the node with the highest total weight, F (27), is selected, F, C and E no longer appear in the graph. The remaining total weights are:
A = 0+10 = 10, B = 7+15 = 22, D = 0+5 = 5, G = 14+10 = 24
[Figure: remaining graph with nodes A(0), B(7), D(0), G(14)]
The process continues until the dictionary is full or the graph is empty.
18 of 25
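The iterative selection on these two slides can be sketched in Python. The node weights come from the slide. The edge layout is hypothetical, chosen only so that every total weight matches the example, since the slides do not give the topology. The sketch also assumes the selected node and all its neighbours leave the graph. That is consistent with F, C and E being absent after the first step, but the paper's exact removal rule is not modelled.

```python
NODE_WEIGHTS = {"A": 0, "B": 7, "C": 7, "D": 0, "E": 0, "F": 7, "G": 14}
# Hypothetical edge layout: chosen only so that the total weights match the slide.
EDGE_WEIGHTS = {("C", "F"): 10, ("E", "F"): 10, ("C", "E"): 5,
                ("A", "B"): 10, ("B", "G"): 5, ("D", "G"): 5}


def total_weight(node: str, alive: set[str]) -> int:
    """Node weight plus the weights of all edges to nodes still in the graph."""
    return NODE_WEIGHTS[node] + sum(
        w for (u, v), w in EDGE_WEIGHTS.items()
        if node in (u, v) and u in alive and v in alive)


def select_dictionary(dict_size: int) -> list[str]:
    alive = set(NODE_WEIGHTS)
    dictionary = []
    while alive and len(dictionary) < dict_size:
        # Highest total weight wins; sorted() makes ties break alphabetically.
        best = max(sorted(alive), key=lambda n: total_weight(n, alive))
        dictionary.append(best)
        neighbours = {v if u == best else u
                      for (u, v) in EDGE_WEIGHTS if best in (u, v)}
        alive -= {best} | neighbours  # assumption: neighbours are removed too
    return dictionary


if __name__ == "__main__":
    everything = set(NODE_WEIGHTS)
    print({n: total_weight(n, everything) for n in sorted(everything)})
    # {'A': 10, 'B': 22, 'C': 22, 'D': 5, 'E': 15, 'F': 27, 'G': 24}
    print(select_dictionary(2))  # ['F', 'G']
```

With this layout, the second step picks G (24), matching the highest weight on the reduced graph.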
19 of 25
Experimental setup
◦ Benchmarks: TI and MediaBench
◦ Architectures: SPARC, TI TMS320C6x, MIPS
Results
◦ BCC: bitmask-based code compression
  Customized encodings for different architectures
  Effects of dictionary size selection
  Comparison with existing techniques
◦ ACC: application-aware code compression
  Bitmask selection
  Dictionary selection
20 of 25
• Encoding 1: one 8-bit mask
• Encoding 2: two 4-bit masks
• Encoding 3: 4-bit and 8-bit masks
Encoding 2 outperforms the others
21 of 25
Outperforms other dictionary-based techniques by 15%
Higher decompression bandwidth than existing compression techniques
(A smaller compression ratio is better)
Compression method | Target architecture | Compression ratio | Decompression bandwidth | Parallel decompression
Wolfe and Chanin (Huffman coding) | MIPS | 73% | 8 bits | No
IBM CodePack | PowerPC | 60% | 8 bits | No
SAMC | MIPS | 57% | 6-8 bits | No
V2F | TMS320C6x | 70-82% | 14.5-64 bits | No
MCSSC | TMS320C6x | 75% | 8 bits | Yes
Prakash et al. (Hamming distance) | TMS320C6x | 76-80% | N/A | Yes
Ros and Sutton (Hamming distance) | TMS320C6x, Itanium | 72-80% | N/A | Yes
Our approach (bitmask) | MIPS, SPARC, TMS320C6x | 55-65% | 32-64 bits | Yes
22 of 25
The bit-savings approach outperforms both frequency- and spanning-based techniques
23 of 25
BCC yields a 15-20% improvement over other techniques; ACC outperforms BCC by another 5-10%
BCC: bitmask-based code compression; ACC: application-aware code compression
24 of 25
???
25 of 25