Huffman Coding and Decoding TAIABUL HAQUE NAEEMUL HASSAN.

Post on 02-Jan-2016


Huffman Coding and Decoding

• TAIABUL HAQUE

• NAEEMUL HASSAN

Huffman Encoding

An encoding algorithm used for lossless data compression:

• Variable-length code

• Prefix code

Basic intuition: symbols that occur more frequently should get shorter codes.

A special kind of binary tree, called a Huffman tree, is built by exploiting this property.
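The prefix-code property listed above can be checked mechanically. The sketch below is only an illustration: the function name `is_prefix_free` and the two sample code tables are assumptions made here, not part of the slides.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # In sorted order, a word that prefixes any other word also
    # prefixes its immediate successor, so adjacent pairs suffice.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

# Hypothetical code tables, for illustration only.
good = {"A": "0", "B": "10", "C": "11"}
bad = {"A": "0", "B": "01", "C": "11"}   # "0" is a prefix of "01"

print(is_prefix_free(good), is_prefix_free(bad))
```

Because no code word starts another, a decoder can read the bit stream left to right without separators between symbols.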

Huffman Tree Creation

Character   Frequency
A           29
E           23
I           25
O           14
U           7

Frequencies sorted in ascending order:

7 14 23 25 29

Merge the two smallest nodes, U(7) and O(14), into a parent node of weight 21.

Remaining weights: 21 23 25 29

Merge the node of weight 21 (U, O) with E(23) into a parent node of weight 44.

Remaining weights: 25 29 44

Merge I(25) and A(29) into a parent node of weight 54.

Remaining weights: 44 54

Merge the nodes of weight 44 and 54 into the root, of weight 98. The tree is complete.
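The tree construction for the A, E, I, O, U table can be sketched with a min-heap. This is a minimal sketch, not the presenters' program: the function name `build_huffman_codes`, the tie-breaking counter, and the choice of 0 for the left branch and 1 for the right are all assumptions, so the exact bit patterns may differ from any particular drawing of the tree.

```python
import heapq
from itertools import count

def build_huffman_codes(freqs):
    """Return (codes, merges) for a dict of symbol -> frequency."""
    tiebreak = count()  # unique counter so tuples never compare nodes
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    merges = []
    while len(heap) > 1:
        # Repeatedly merge the two lightest nodes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merges.append(w1 + w2)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")  # assumed convention: left = 0
            walk(node[1], prefix + "1")  # assumed convention: right = 1
        else:                            # leaf: a symbol
            codes[node] = prefix or "0"
    walk(root, "")
    return codes, merges

freqs = {"A": 29, "E": 23, "I": 25, "O": 14, "U": 7}
codes, merges = build_huffman_codes(freqs)
print(merges)  # [21, 44, 54, 98], the same weights as the merge steps above
print(codes)
```

The rarest symbols, U and O, end up deepest in the tree with 3-bit codes, while A, E and I get 2-bit codes.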

Program flow:

1. Start
2. Accept training data
3. Scan data, keep tally
4. Make prioritized list
5. Create and draw tree
6. Traverse tree
7. Determine code words
8. Save code words
9. Accept test sentence
10. Encode with lookup
11. Display encoded string
12. Decode with traversal
13. Display decoded string
14. Calculate compression ratio
15. End

Training data: "analysis of algorithm"

Tally, for example: a(3), since the letter a occurs three times.
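The "scan data, keep tally" and "make prioritized list" steps can be sketched with `collections.Counter`. Whether the original program counted the space character is not stated on the slides; this sketch assumes it did, and the variable names are illustrative.

```python
from collections import Counter

training = "analysis of algorithm"

# Scan the data and keep a tally of every character
# (assumption: spaces are counted like any other symbol).
tally = Counter(training)

# Prioritized list: lowest count first, ties broken alphabetically.
prioritized = sorted(tally.items(), key=lambda kv: (kv[1], kv[0]))

print(tally["a"], tally["l"], tally["g"], tally["o"])  # 3 2 1 2
print(prioritized)
```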

Original: "algo" = 4 characters × 8 bits = 32 bits

Encoded: 10101100011110 = 14 bits

Compression ratio = 14 / 32 × 100 = 43.75%
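The same arithmetic as a small function. The name `compression_ratio` and the default of 8 bits per character are assumptions chosen to match the slide's calculation.

```python
def compression_ratio(text, encoded_bits, bits_per_char=8):
    """Encoded size as a percentage of the uncompressed size."""
    original_bits = len(text) * bits_per_char
    return len(encoded_bits) / original_bits * 100

ratio = compression_ratio("algo", "10101100011110")
print(ratio)  # 43.75
```

With this definition a smaller number means better compression: the encoded string takes 43.75% of the original 32 bits.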

Encoded string: 10101100011110

Decoding by tree traversal recovers one symbol at a time (training-data count in parentheses):

a(3), l(2), g(1), o(2)

Decoded string: algo
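Encoding with a lookup table and decoding by tree traversal can be sketched end to end as follows. This is an illustrative sketch under stated assumptions: the helper names (`build_tree`, `code_table`, `encode`, `decode`), the tie-breaking order, and the 0/1 branch labels are all choices made here. The slides do not show their code table, so the bit string this produces need not equal 10101100011110; only the round trip is guaranteed.

```python
import heapq
from collections import Counter
from itertools import count

def build_tree(text):
    """Build a Huffman tree; leaves are characters, internal nodes are (left, right)."""
    tiebreak = count()
    heap = [(w, next(tiebreak), sym) for sym, w in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]

def code_table(node, prefix="", table=None):
    """Traverse the tree and record the code word of every leaf."""
    table = {} if table is None else table
    if isinstance(node, tuple):
        code_table(node[0], prefix + "0", table)
        code_table(node[1], prefix + "1", table)
    else:
        table[node] = prefix
    return table

def encode(text, table):
    """Encode by table lookup, one character at a time."""
    return "".join(table[ch] for ch in text)

def decode(bits, root):
    """Decode by walking from the root; restart at the root after each leaf."""
    out, node = [], root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):
            out.append(node)
            node = root
    return "".join(out)

root = build_tree("analysis of algorithm")
table = code_table(root)
bits = encode("algo", table)
print(bits, decode(bits, root))
```

The decoder needs no separators between code words because the code is a prefix code: reaching a leaf is the only signal that a symbol has ended.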

Frequency Analysis

• E T A O I N S H R D L U is the approximate order of frequency of the twelve most commonly used letters in the English language.

• Our Observation:

File Size   Order of letters
338         E T I A O N S R C H L D
65          E T N O I A R S C L H D
70          E A O T R S I N L H D C
8           E O T A N R I S L H U D
677         E T A O N I R S H L D C
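An ordering like the rows above can be produced by counting letters in a text and listing the most frequent first. The function name `letter_order` and the sample sentence are assumptions for illustration. The slides do not say how the files were processed, so this is only one plausible way to obtain such a list.

```python
from collections import Counter

def letter_order(text, top=12):
    """Return up to `top` letters A-Z, most frequent first."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    return [letter for letter, _ in counts.most_common(top)]

# Hypothetical sample text, not one of the files from the table.
sample = "eee tt a"
print(" ".join(letter_order(sample)))  # E T A
```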

THANK YOU