Huffman Encoder Project
Howd - Zur Hung
Eric Lai
Wei Jie Lee
Yu-Chiang Lee
Design Manager: Jonathan P. Lee
Final Presentation
April 30th, 2007
Overall Project Objective: Design a Low Power Huffman Encoder
Agenda of Presentation
• About Huffman Compression (Wei Jie)
• Marketing (Wei Jie)
• Project Description (Wei Jie)
• Design Methodology (Randal)
• Original Huffman Recipe (Randal)
• Our Huffman Encoder (Randal)
• Design Decisions (Randal)
• Behavioral/Algorithmic Description (Eric)
• Floorplan Evolution (Eric)
• Layout (John)
• Verification (Eric)
• Issues Encountered (John)
• Specifications (John)
• Conclusions (John)
About Huffman
• Huffman is a compression algorithm
• Often used as the back end of other compression schemes
• Greedy algorithm
The Need for Compression
• It is becoming a wireless world
• Wireless bandwidth is limited
• Power is limited
• COMPRESSION!
• Reduce data size = Save power + time + bandwidth
Why Huffman?
• Lossless
• Statistical
• David Huffman is the man!
• Outdid Shannon-Fano coding
Project Description
Our Huffman Encoder is a fast, power-efficient solution to data compression with on-chip cache
• Hardware compression outperforms software-based solutions
• A small, affordable, power-efficient chip is perfect for portable devices
Why Hardware?
Hardware Huffman Solution
• Low power, compact, full-custom ASIC
• Saves power, time, and system resources
• Compress data packets on network cards
• Cell phones, PDAs, laptops
Design Methodology
1. Understand the algorithm
2. Design functional blocks
3. Behavioral Verilog
4. Structural Verilog
5. Schematic
6. Layout
7. Simulations
Specifics of Huffman
Procedure
• pre-scan the data and count the frequency of each word
• iteratively find the two least frequent words (or subtrees) and merge them to build a tree
• encode each word according to the final tree structure
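Example
Word:       a    b    c     d     e
Frequency:  4    7    2     1     15
Tree (merge order): d + c = 3, 3 + a = 7, 7 + b = 14, 14 + e = 29
Branch labels: 0 toward the merged subtree, 1 toward the newly added word (d = 0, c = 1 at the first merge)
Code:       001  01   0001  0000  1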
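The textbook procedure above can be sketched in software as follows. This is a minimal illustration, not the project's Verilog: the function name `huffman_codes` and the tie-breaking rule (a merged subtree is taken before a single word of equal frequency) are assumptions, chosen so that the result matches the example frequencies a = 4, b = 7, c = 2, d = 1, e = 15.

```python
import heapq


def huffman_codes(freqs):
    """Build Huffman codes from a {word: frequency} mapping.

    Hypothetical helper for illustration only. Ties are broken so that
    an already merged subtree is popped before a single word with the
    same frequency (an assumption, not stated on the slides).
    """
    # Heap entries are (frequency, tie_breaker, node). Leaves get positive
    # tie breakers, merged nodes get negative ones, so all entries are unique.
    heap = [(f, i + 1, word) for i, (word, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    merges = 0
    while len(heap) > 1:
        f0, _, low = heapq.heappop(heap)    # least frequent: branch 0
        f1, _, high = heapq.heappop(heap)   # second least frequent: branch 1
        merges += 1
        heapq.heappush(heap, (f0 + f1, -merges, (low, high)))

    codes = {}

    def walk(node, prefix):
        # Internal nodes are 2-tuples, leaves are the words themselves.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


example = huffman_codes({"a": 4, "b": 7, "c": 2, "d": 1, "e": 15})
print(example)
```

Other tie-breaking rules give different bit patterns with the same code lengths, so the compressed size is unaffected.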
Our Huffman
Procedure
• pre-scan the data, count the frequency of each word, and assign each word a unique group number
• iteratively find the two least frequent groups, then update their group number, frequency, and encoding
• finish the encoding look-up table
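Example (frequency, group number, and code so far for each word)
Initial
  Word:   a    b    c     d     e
  Freq:   4    7    2     1     15
  Group:  0    1    2     3     4
After merge 1 (d + c)
  Freq:   4    7    3     3     15
  Group:  0    1    2     2     4
  Code:   -    -    1     0     -
After merge 2 (group 2 + a)
  Freq:   7    7    7     7     15
  Group:  0    1    0     0     4
  Code:   1    -    01    00    -
After merge 3 (group 0 + b)
  Freq:   14   14   14    14    15
  Group:  0    0    0     0     4
  Code:   01   1    001   000   -
After merge 4 (group 0 + e)
  Freq:   29   29   29    29    29
  Group:  0    0    0     0     0
  Code:   001  01   0001  0000  1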
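The table-based procedure can be sketched in software as below. This is an illustration under stated assumptions, not the chip's logic: the function name `group_table_huffman` is hypothetical, the merged group is assumed to keep the lower of the two group numbers, and frequency ties are assumed to go to the lower group number. Under those assumptions the sketch ends with the same codes as the tree-based example (a = 001, b = 01, c = 0001, d = 0000, e = 1).

```python
def group_table_huffman(freqs):
    """Huffman coding with per-word frequency/group/code tables, no tree.

    Hypothetical sketch. Every word carries the total frequency of its
    group, so merging two groups only rewrites table entries.
    """
    words = list(freqs)
    freq = [freqs[w] for w in words]
    group = list(range(len(words)))   # unique group number per word
    code = ["" for _ in words]

    while len(set(group)) > 1:
        # One (frequency, group number) pair per live group, smallest first.
        live = sorted({(freq[i], group[i]) for i in range(len(words))})
        (f0, g0), (f1, g1) = live[0], live[1]
        merged = min(g0, g1)          # assumption: keep the lower number
        for i in range(len(words)):
            if group[i] == g0:
                code[i] = "0" + code[i]   # least frequent group gets a 0
            elif group[i] == g1:
                code[i] = "1" + code[i]   # second least frequent gets a 1
            else:
                continue
            freq[i] = f0 + f1
            group[i] = merged

    return dict(zip(words, code))


table_codes = group_table_huffman({"a": 4, "b": 7, "c": 2, "d": 1, "e": 15})
print(table_codes)
```

Because each merge only prepends one bit and rewrites two small fields per word, the whole state fits in a frequency/group memory and a code/length memory, with no pointers to follow.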
Design Decisions
1. 5-bit input word size
2. 16-bit frequency
3. Two SRAMs
4. Adders: 16-bit Carry Select Adders
5. Serial output
6. Control logic to shut down modules
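To make the word size, counter width, and serial output concrete, here is a small software sketch. The 5-bit word and 16-bit frequency widths come from the list above; everything else is an assumption for illustration, including the function names and the choice to saturate a counter at its 16-bit limit (the slides do not say what the hardware does on overflow).

```python
WORD_BITS = 5                      # from the slides: 5-bit input words
FREQ_BITS = 16                     # from the slides: 16-bit frequency
FREQ_MAX = (1 << FREQ_BITS) - 1    # largest value a 16-bit counter holds


def count_frequencies(words):
    """Return a 32-entry frequency table for a stream of 5-bit words."""
    table = [0] * (1 << WORD_BITS)
    for w in words:
        if not 0 <= w < (1 << WORD_BITS):
            raise ValueError("word does not fit in 5 bits")
        # Assumption: saturate instead of wrapping around on overflow.
        table[w] = min(table[w] + 1, FREQ_MAX)
    return table


def serial_output(words, codes):
    """Yield the compressed stream one bit at a time.

    `codes` maps each word to its code as a string of '0'/'1'.
    """
    for w in words:
        for bit in codes[w]:
            yield int(bit)


table = count_frequencies([3, 3, 31])
bits = list(serial_output([0, 1, 0], {0: "1", 1: "01"}))
print(table[3], table[31], bits)
```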
Behavioral / Algorithm Description
[Block diagram. Blocks: Count Frequency, SRAM (Frequency / Group), Find2Freq, Combine Groups, Output Tree, SRAM (Code / Length), Serial Output, Control Logic. Signals: DataIn, TreeOutput, Compressed Output]
• Control logic turns off unused blocks to reduce power
Schematic Diagram
Floorplan Prelayout
Floorplan Midlayout
Final Chip Layout
[Floorplan labels: Top, Find2Freq, Combine, SRAM freqGroup, SRAMcodeLength, countFreq, serial output, control]
[Layout labels: CountFrequency, Find2Freq, Combine, SerialOutput, SRAM (FreqGroup)]
[Layer legend: Poly, Metal 1, Metal 2, Metal 3, Metal 4]
Verification: Verilog
• Matched Verilog results with MATLAB results
• Verified the successful compression of several test cases, including parts of an image file
Verification: Schematic
• Vigorously tested each block
• Combined them and encoded several words
Verification: Layout
• Verified strong signal integrity
• Buffered high fan-outs and long wires
• Critical Path: 4.88 ns
• All outputs of modules go through registers
Component Specifications
                            countFreq  find2freq  combine  Serial Output  SRAM (combined)
Transistor Count            718        2810       2702     1404           13764
Area (μm²)                  2656       9844       9750     4380           39041
Density (transistors/μm²)   0.270      0.285      0.277    0.321          0.353
Power (mW)                  0.405      0.615      0.665    0.241          9.19
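The density row follows from the two rows above it (transistor count divided by area). A short check, with the dictionary layout and variable names chosen only for this sketch:

```python
# (transistor count, area in square micrometres) per block, from the table
components = {
    "countFreq": (718, 2656),
    "find2freq": (2810, 9844),
    "combine": (2702, 9750),
    "Serial Output": (1404, 4380),
    "SRAM (combined)": (13764, 39041),
}

# Density in transistors per square micrometre, rounded to three decimals
density = {
    name: round(count / area, 3)
    for name, (count, area) in components.items()
}

for name, value in density.items():
    print(f"{name}: {value:.3f}")
```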
Final Specifications
Number of Transistors : 23,322
Area : 288.18 μm x 273.645 μm = 78859 μm²
Density : 0.296 (transistors/μm²)
Aspect Ratio = 1:1.05
Pin Count = 52 pins
• Input : 5-bit data input, start, done, finish
• Output : 36-bit treeOutput, treeReady, out, request, error
• vdd!, gnd!, clk, reset
Final Clock Speed = 200 MHz
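The chip-level figures are consistent with one another, which a few lines of arithmetic confirm. Variable names are illustrative, and the pin grouping simply follows the three pin lists above:

```python
width_um, height_um = 288.18, 273.645   # chip dimensions from the slide
transistors = 23322

area_um2 = width_um * height_um         # about 78859 square micrometres
density = transistors / area_um2        # about 0.296 transistors per um^2
aspect = width_um / height_um           # about 1.05

# Pins: 5 data inputs + start, done, finish;
#       36 treeOutput bits + treeReady, out, request, error;
#       vdd!, gnd!, clk, reset
pins = (5 + 3) + (36 + 4) + 4

print(round(area_um2), round(density, 3), round(aspect, 2), pins)
```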
Final Specifications
• Final result is up to 1800 times faster than Java! (probably because it's Java)
• Compressed 640 bits of an image
• Java results: 10 ms (Centrino 1.5 GHz, 512 MB RAM)
• Our hardware Huffman: 5.4 µs (1071 cycles)
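The timing figures can be cross-checked from the cycle count and the 200 MHz clock. The throughput value at the end is derived here from the slide's numbers and is not a figure stated in the presentation:

```python
clock_hz = 200e6          # final clock speed from the specifications
cycles = 1071             # cycles reported for the 640-bit test
java_seconds = 10e-3      # reported Java run time

hw_seconds = cycles / clock_hz          # about 5.4 microseconds
speedup = java_seconds / hw_seconds     # a little over 1800x
throughput_bps = 640 / hw_seconds       # derived: input bits per second

print(f"{hw_seconds * 1e6:.3f} us, {speedup:.0f}x, "
      f"{throughput_bps / 1e6:.1f} Mbit/s")
```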
Issues Encountered
1. Bad estimate for original floorplan
2. Long SRAM simulation time
3. SRAM sense amp issue
4. Too much poly!
5. Cannot route through SRAM
Conclusions
• Next Steps:
• Scale up design
• Better compression ratio
• Higher throughput
• Meeting of the Minds
• HUFFMAN DECODER!!
Questions?