Compression & Huffman Codes

Page 1: Compression & Huffman Codes

Compression & Huffman Codes

Page 2: Compression & Huffman Codes

Compression

Definition: Reduce size of data

(number of bits needed to represent data)

Benefits: Reduce storage needed

Reduce transmission cost / latency / bandwidth

Page 3: Compression & Huffman Codes

Sources of Compressibility

Redundancy: Recognize repeating patterns

Exploit using

Dictionary

Variable length encoding

Human perception: Less sensitive to some information

Can discard less important data

Page 4: Compression & Huffman Codes

Types of Compression

Lossless: Preserves all information

Exploits redundancy in data

Applied to general data

Lossy: May lose some information

Exploits redundancy & human perception

Applied to audio, image, video

Page 5: Compression & Huffman Codes

Effectiveness of Compression

Metrics: Bits per byte (8 bits)

2 bits / byte → ¼ original size

8 bits / byte → no compression

Percentage

75% compression → ¼ original size

Page 6: Compression & Huffman Codes

Effectiveness of Compression

Depends on data: Random data → hard

Example: 1001110100 → ?

Organized data → easy

Example: 1111111111 → 1×10

Corollary: No universally best compression algorithm

Page 7: Compression & Huffman Codes

Effectiveness of Compression

Lossless compression is not always possible. If compression were always possible (alternative view):

Compress file (reduce size by 1 bit)

Recompress output

Repeat (until the data could be stored in 0 bits, which is impossible, so some inputs cannot be compressed)

Page 8: Compression & Huffman Codes

Lossless Compression Techniques

LZW (Lempel-Ziv-Welch) compression: Build pattern dictionary

Replace patterns with index into dictionary

Run length encoding: Find & compress repetitive sequences (see the sketch after this list)

Huffman codes: Use variable length codes based on frequency
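The run-length idea can be sketched in a few lines of Python (the function names are illustrative, not from the slides):

def rle_encode(data):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs):
    """Expand the (symbol, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)

print(rle_encode("1111111111"))              # [('1', 10)] -- organized data compresses well
print(rle_decode(rle_encode("1001110100")))  # round-trips, but random data gains little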

Page 9: Compression & Huffman Codes

Huffman Code

Approach: Variable length encoding of symbols

Exploit statistical frequency of symbols

Efficient when symbol probabilities vary widely

Principle: Use fewer bits to represent frequent symbols

Use more bits to represent infrequent symbols

[Figure: the sequence A A B A, illustrating that the frequent symbol A should get a shorter code than B]

Page 10: Compression & Huffman Codes

Huffman Code Example

Expected size

Original: 1/8 × 2 + 1/4 × 2 + 1/2 × 2 + 1/8 × 2 = 2 bits / symbol

Huffman: 1/8 × 3 + 1/4 × 2 + 1/2 × 1 + 1/8 × 3 = 1.75 bits / symbol

Symbol             A       B       C       D
Frequency          13%     25%     50%     12%
Original Encoding  00      01      10      11
                   2 bits  2 bits  2 bits  2 bits
Huffman Encoding   110     10      0       111
                   3 bits  2 bits  1 bit   3 bits
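A quick sketch of the expected-size arithmetic above, using the idealized probabilities 1/8, 1/4, 1/2, 1/8 that the percentages approximate:

probs        = {"A": 1/8, "B": 1/4, "C": 1/2, "D": 1/8}
original_len = {"A": 2,   "B": 2,   "C": 2,   "D": 2}
huffman_len  = {"A": 3,   "B": 2,   "C": 1,   "D": 3}

def expected_bits(lengths):
    # Expected code length = sum of P(symbol) * length(code for symbol)
    return sum(probs[s] * lengths[s] for s in probs)

print(expected_bits(original_len))  # 2.0 bits / symbol
print(expected_bits(huffman_len))   # 1.75 bits / symbol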

Page 11: Compression & Huffman Codes

Huffman Code Data Structures

Binary (Huffman) tree: Represents Huffman code

Edge → code bit (0 or 1)

Leaf → symbol

Path to leaf → encoding

Example

A = “110”, B = “10”, C = “0”

Priority queue: To efficiently build binary tree

[Tree diagram: leaves A, B, C, D on 0/1-labeled edges, giving the example codes above]
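A minimal Python rendering of this data structure (the Node class is illustrative; D's code is inferred as “111” to complete the example tree):

class Node:
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol = symbol   # set only on leaves
        self.left = left       # edge labeled 0
        self.right = right     # edge labeled 1

def codes(node, prefix=""):
    """The path from the root to each leaf (0 = left, 1 = right) is its code."""
    if node.symbol is not None:
        return {node.symbol: prefix}
    table = {}
    table.update(codes(node.left, prefix + "0"))
    table.update(codes(node.right, prefix + "1"))
    return table

# Tree matching the example: C = "0", B = "10", A = "110" (and D = "111")
root = Node(left=Node("C"),
            right=Node(left=Node("B"),
                       right=Node(left=Node("A"), right=Node("D"))))
print(codes(root))  # {'C': '0', 'B': '10', 'A': '110', 'D': '111'}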

Page 12: Compression & Huffman Codes

Huffman Code Algorithm Overview

Encoding: Calculate frequency of symbols in file (see the sketch at the end of this slide)

Create binary tree representing “best” encoding

Use binary tree to encode compressed file

For each symbol, output path from root to leaf

Size of encoding = length of path

Save binary tree
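The first step above is just a frequency count; a one-liner with the standard library (the sample string is made up):

from collections import Counter

# Count how often each symbol occurs in the file to be compressed.
freq = Counter("HEECIACIEE")
print(freq)  # Counter({'E': 4, 'C': 2, 'I': 2, 'H': 1, 'A': 1})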

Page 13: Compression & Huffman Codes

Huffman Code – Creating Tree

Algorithm (see the sketch after this slide): Place each symbol in a leaf

Weight of leaf = symbol frequency

Select two trees L and R (initially leaves)

Such that L, R have the lowest frequencies among all trees

Create new (internal) node

Left child → L

Right child → R

New frequency → frequency( L ) + frequency( R )

Repeat until all nodes merged into one tree
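A compact sketch of this construction in Python, using heapq as the priority queue (the tie-breaking counter is my addition so the heap never has to compare tree payloads):

import heapq

def build_tree(freq):
    """Repeatedly merge the two lowest-weight trees until one remains."""
    heap, counter = [], 0
    for symbol, weight in freq.items():
        heapq.heappush(heap, (weight, counter, (symbol, None, None)))  # leaf
        counter += 1
    while len(heap) > 1:
        w_left, _, left = heapq.heappop(heap)    # lowest weight -> left child
        w_right, _, right = heapq.heappop(heap)  # next lowest  -> right child
        heapq.heappush(heap, (w_left + w_right, counter, (None, left, right)))
        counter += 1
    return heap[0][2]   # root of the single remaining tree

def assign_codes(node, prefix="", table=None):
    """Record the root-to-leaf path (0 = left, 1 = right) for every symbol."""
    if table is None:
        table = {}
    symbol, left, right = node
    if symbol is not None:
        table[symbol] = prefix or "0"   # lone-symbol edge case
    else:
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

# Frequencies from the construction example below: A=3, C=5, E=8, H=2, I=7
print(assign_codes(build_tree({"A": 3, "C": 5, "E": 8, "H": 2, "I": 7})))
# One valid code assignment; exact codes depend on tie-breaking (the slides show another).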

Page 14: Compression & Huffman Codes

Huffman Tree Construction 1

Start with five leaves: A = 3, C = 5, E = 8, H = 2, I = 7

Page 15: Compression & Huffman Codes

Huffman Tree Construction 2

[Tree diagram: H (2) and A (3), the two lowest weights, merged into a node of weight 5; C (5), I (7), E (8) remain]

Page 16: Compression & Huffman Codes

Huffman Tree Construction 3

[Tree diagram: the weight-5 node and C (5) merged into a node of weight 10; I (7) and E (8) remain]

Page 17: Compression & Huffman Codes

Huffman Tree Construction 4

[Tree diagram: I (7) and E (8) merged into a node of weight 15; trees of weight 10 and 15 remain]

Page 18: Compression & Huffman Codes

Huffman Tree Construction 5

[Tree diagram: the weight-10 and weight-15 trees merged into the root of weight 25, completing the Huffman tree]

Resulting codes: E = 01, I = 00, C = 10, A = 111, H = 110

Page 19: Compression & Huffman Codes

Huffman Coding Example

Huffman code

Input: ACE

Output: (111)(10)(01) = 1111001

Code table: E = 01, I = 00, C = 10, A = 111, H = 110
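Applying the code table directly (a minimal sketch; the function name is mine):

code = {"E": "01", "I": "00", "C": "10", "A": "111", "H": "110"}

def encode(text):
    # Concatenate each symbol's code; no separator is needed (prefix code).
    return "".join(code[ch] for ch in text)

print(encode("ACE"))  # 1111001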

Page 20: Compression & Huffman Codes

Huffman Code Algorithm Overview

Decoding: Read compressed file & binary tree (see the sketch after this list)

Use binary tree to decode file

Follow path from root to leaf
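A sketch of this root-to-leaf walk, with the tree written as nested tuples that reproduce the slides' code table (internal node = (subtree for 0, subtree for 1), leaf = symbol):

# Matches E=01, I=00, C=10, A=111, H=110
tree = (("I", "E"), ("C", ("H", "A")))

def decode(bits, tree):
    out, node = [], tree
    for b in bits:
        node = node[int(b)]          # follow edge 0 or 1
        if isinstance(node, str):    # reached a leaf: emit symbol, restart at root
            out.append(node)
            node = tree
    return "".join(out)

print(decode("1111001", tree))  # ACE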

Page 21: Compression & Huffman Codes

Huffman Decoding 1

[Tree diagram; encoded input: 1111001, nothing decoded yet]

Page 22: Compression & Huffman Codes

Huffman Decoding 2

[Tree diagram; input: 1111001, partway through the first code]

Page 23: Compression & Huffman Codes

Huffman Decoding 3

[Tree diagram; input: 1111001, decoded so far: A]

Page 24: Compression & Huffman Codes

Huffman Decoding 4

[Tree diagram; input: 1111001, decoded so far: A, partway through the second code]

Page 25: Compression & Huffman Codes

Huffman Decoding 5

[Tree diagram; input: 1111001, decoded so far: AC]

Page 26: Compression & Huffman Codes

Huffman Decoding 6

[Tree diagram; input: 1111001, decoded so far: AC, partway through the third code]

Page 27: Compression & Huffman Codes

Huffman Decoding 7

[Tree diagram; input: 1111001, decoded so far: ACE; decoding complete]

Page 28: Compression & Huffman Codes

Huffman Code Properties

Prefix code: No code is a prefix of another code

Example

Huffman(“I”) = 00

Huffman(“X”) = 001 // not a legal prefix code, since 00 is a prefix of 001

Can stop as soon as complete code found

No need for end-of-code marker

Nondeterministic: Multiple Huffman codings are possible for the same input

If more than two trees have the same minimal weight
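A quick sanity check of the prefix property for the example code table (the helper is illustrative):

code = {"E": "01", "I": "00", "C": "10", "A": "111", "H": "110"}

def is_prefix_free(codes):
    # No codeword may be a prefix of another, so a decoder can stop
    # the moment it has read a complete codeword.
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

print(is_prefix_free(code))                      # True
print(is_prefix_free({"I": "00", "X": "001"}))   # False: 00 is a prefix of 001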

Page 29: Compression & Huffman Codes

Huffman Code Properties

Greedy algorithm: Chooses best local solution at each step

Combines 2 trees with lowest frequency

Still yields overall best solution: Optimal prefix code

Based on statistical frequency

Better compression possible (depends on data): Using other approaches (e.g., pattern dictionary)

