Data Compression (Huffman)


Data Compression

Muhammad Raza Master (B12101085)

Muhammad Ali Mehmood (B12101065)

Syed Faraz Naqvi (B12101123)

Department of Computer Science, University of Karachi

What is data compression?

Reduction in size of data

Why?

Save storage when saving information

Save time when communicating information

Types of data compression

• Lossless

• Lossy

Practical Examples

• Image Compression

• Audio Compression

• Video compression

• All sorts of data compression

Tree

[Figure: a binary tree, showing the root, a subtree/parent node, and its left and right children]

Internal node:
• Sum of the children's frequencies
• References to the 0/1 subtrees

Leaf node:
• Char variable
• Frequency
• References to the 0/1 subtrees
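A minimal Python sketch of this node structure (the class and field names are illustrative assumptions, not from the slides):

    class Node:
        """One node of a Huffman tree, as described above: a leaf stores a
        character and its frequency; an internal node stores the sum of its
        children's frequencies and references to its 0- and 1-subtrees."""
        def __init__(self, freq, char=None, left=None, right=None):
            self.freq = freq    # frequency; for internal nodes, sum of children's
            self.char = char    # character (None for internal nodes)
            self.left = left    # reference to the 0-subtree
            self.right = right  # reference to the 1-subtree

        def __lt__(self, other):
            # order nodes by frequency so they can sit in a min-priority queue
            return self.freq < other.freq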

APPLICATION

• Finding an object with a certain property in a collection of objects of a certain type

• Storing items in a list so that an item can be easily located

• Efficiently encoding a set of characters as bit strings

TRAVERSING A TREE

• IN-ORDER TRAVERSAL

• PRE-ORDER TRAVERSAL

• POST-ORDER TRAVERSAL

[Figure: a binary search tree with root 25; children 15 and 50; grandchildren 10, 22, 35, 70; and leaves 4, 12, 18, 24, 31, 44, 66, 90]

Pre-Order: 1. Visit the root. 2. Traverse the left subtree. 3. Traverse the right subtree.
In-Order: 1. Traverse the left subtree. 2. Visit the root. 3. Traverse the right subtree.
Post-Order: 1. Traverse the left subtree. 2. Traverse the right subtree. 3. Visit the root.

Pre-Order: 25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90
In-Order: 4, 10, 12, 15, 18, 22, 24, 25, 31, 35, 44, 50, 66, 70, 90
Post-Order: 4, 12, 10, 18, 24, 22, 15, 31, 44, 35, 66, 90, 70, 50, 25
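A minimal Python sketch of the three traversals, rebuilding the tree from the figure above (the BST class here is illustrative):

    class BST:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def pre_order(node):
        # visit the root, then the left subtree, then the right subtree
        if node is None:
            return []
        return [node.key] + pre_order(node.left) + pre_order(node.right)

    def in_order(node):
        # left subtree first, root in the middle, right subtree last
        if node is None:
            return []
        return in_order(node.left) + [node.key] + in_order(node.right)

    def post_order(node):
        # left subtree, then right subtree, root last
        if node is None:
            return []
        return post_order(node.left) + post_order(node.right) + [node.key]

    # the tree from the figure above
    root = BST(25,
               BST(15, BST(10, BST(4), BST(12)), BST(22, BST(18), BST(24))),
               BST(50, BST(35, BST(31), BST(44)), BST(70, BST(66), BST(90))))

    print(pre_order(root))   # [25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90]
    print(in_order(root))    # [4, 10, 12, 15, 18, 22, 24, 25, 31, 35, 44, 50, 66, 70, 90]
    print(post_order(root))  # [4, 12, 10, 18, 24, 22, 15, 31, 44, 35, 66, 90, 70, 50, 25]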

Huffman Encoding

• By Dr. David Huffman (1952)

• One of the earliest data compression algorithms

• An example of 'LOSSLESS DATA COMPRESSION'

• A binary tree is used to construct the Huffman code


Basic Idea

The most frequently occurring character gets the shortest encoded bit string.

Save bits by encoding frequently used characters with fewer bits than rarely used characters. (In the example below, 't' with 8 occurrences gets a 3-bit code, while 'b' with 1 occurrence gets 6 bits.)

Algorithm

HUFFMAN(X)
• Compute frequency f(c) for each character c in X
• Let Q be an empty priority queue
• Insert every character c into Q as a singleton tree with key f(c)
• while Q.SIZE() > 1 do
  – f1 ← Q.MIN-KEY()
  – T1 ← Q.REMOVE-MIN()
  – f2 ← Q.MIN-KEY()
  – T2 ← Q.REMOVE-MIN()
  – Let T be a new tree with left subtree T1 and right subtree T2
  – Q.INSERT(T, f1 + f2)
• Return Q.REMOVE-MIN()
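A runnable Python sketch of HUFFMAN(X), using heapq as the priority queue; the nested-tuple tree representation is an assumption chosen for compactness, not prescribed by the slides:

    import heapq
    from collections import Counter

    def huffman(X):
        # f(c) for each character c in X
        freq = Counter(X)
        # Q holds (key, tiebreak, tree); each character starts as a singleton
        # tree (c,) with key f(c). The tiebreak counter keeps tuple comparison
        # from ever reaching the trees themselves.
        Q = [(f, i, (c,)) for i, (c, f) in enumerate(freq.items())]
        heapq.heapify(Q)
        counter = len(Q)
        while len(Q) > 1:
            f1, _, T1 = heapq.heappop(Q)   # the two trees with the smallest keys
            f2, _, T2 = heapq.heappop(Q)
            heapq.heappush(Q, (f1 + f2, counter, (T1, T2)))  # merged tree, key f1 + f2
            counter += 1
        return heapq.heappop(Q)[2]         # the Huffman tree

    def code_table(tree, prefix=""):
        # read a bit string off each root-to-leaf path: 0 = left, 1 = right
        if len(tree) == 1:                 # leaf: (char,)
            return {tree[0]: prefix or "0"}
        left, right = tree
        table = code_table(left, prefix + "0")
        table.update(code_table(right, prefix + "1"))
        return table

For example, code_table(huffman("HumeraTariq")) yields a prefix-free code for the nine symbols; the exact bit strings depend on how the queue breaks ties between equal frequencies.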

Input: "it was the best of times it was the worst of times."

Symbol Count

LF 1

b 1

r 1

f 2

h 2

m 2

a 2

w 3

o 3

i 4

e 5

s 6

t 8

space 11

(The full stop is counted as LF.)

Example:

Symbol Bits
LF 101010
b 101011
r 10100
f 11000
h 11001
m 11010
a 11011
w 0010
o 0011
i 1011
e 000
s 100
t 111
space 01
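A small check of the table above: no code word is a prefix of another, and the encoded length follows from count × code length ("LF" and "SP" here are stand-ins for the newline and the space):

    code = {"LF": "101010", "b": "101011", "r": "10100", "f": "11000",
            "h": "11001", "m": "11010", "a": "11011", "w": "0010",
            "o": "0011", "i": "1011", "e": "000", "s": "100",
            "t": "111", "SP": "01"}
    count = {"LF": 1, "b": 1, "r": 1, "f": 2, "h": 2, "m": 2, "a": 2,
             "w": 3, "o": 3, "i": 4, "e": 5, "s": 6, "t": 8, "SP": 11}

    # prefix-free: no code word is a proper prefix of a different code word
    bits = list(code.values())
    assert not any(x != y and y.startswith(x) for x in bits for y in bits)

    # encoded length = sum over symbols of count × code length = 176 bits,
    # versus 51 characters × 8 bits = 408 bits uncompressed
    print(sum(count[s] * len(code[s]) for s in code))   # 176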

Demonstration of Huffman Encoding

Example #1: HumeraTariq

Symbol Count

H 1

u 1

m 1

e 1

r 2

a 2

T 1

i 1

q 1

[Figure: building the Huffman tree for "HumeraTariq". The unit-frequency leaves H, u, m, e, T, i merge in pairs into nodes of weight 2; (T, i) merges with q into a node of weight 3; (H, u) and (m, e) merge into a node of weight 4, which merges with the 3 into a node of weight 7; r and a merge into a node of weight 4; the root has weight 11. Left edges are labeled 0 and right edges 1.]

m = HumeraTariq

Symbol Bits

H 0000

u 0001

m 0010

e 0011

r 10

a 11

T 0100

i 0101

q 011

Compressed Bit-stream

C(m) = 00000001001000111011010011100101011 (35 bits)
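A sketch of encoding and decoding, reusing the nested-tuple tree representation assumed in the HUFFMAN(X) sketch earlier; the exact bits heapq produces can differ from the table above by tie-breaking, but every Huffman tree for this input gives the same 35-bit total:

    def encode(m, table):
        # C(m): concatenate the code of every character of m
        return "".join(table[c] for c in m)

    def decode(bits, tree):
        # walk from the root: 0 goes left, 1 goes right;
        # emit the character at each leaf reached, then restart at the root
        out, node = [], tree
        for b in bits:
            node = node[0] if b == "0" else node[1]
            if len(node) == 1:             # reached a leaf
                out.append(node[0])
                node = tree
        return "".join(out)

    tree = huffman("HumeraTariq")          # from the sketch above
    table = code_table(tree)
    assert decode(encode("HumeraTariq", table), tree) == "HumeraTariq"
    assert len(encode("HumeraTariq", table)) == 35   # any Huffman code for this input totals 35 bits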

Proposition

The length of the encoded bit-stream is the sum, over all letters, of the number of occurrences times the number of bits per occurrence:

Length of compressed bit-stream = Σ frequency(c) × distance(c)

where distance(c) is the depth of c's leaf in the tree, i.e. the number of bits in its code.

Proof

E.g. m = HumeraTariq

• At distance:
  – 4: six leaves ('H', 'u', 'm', 'e', 'T', 'i', with total frequency 6)
  – 3: one leaf ('q', with frequency 1)
  – 2: two leaves ('r' and 'a', with total frequency 4)

• Compressed bit-stream length = Σ frequency × distance

• Total = 4·6 + 3·1 + 2·4 = 35, the length of the compressed bit-stream, as expected.

Proved!

Complexity

Let d be the number of distinct symbols and n the length of the input.

Huffman's algorithm runs in O(n + d log d) time: counting frequencies takes O(n), and each of the d − 1 merges costs O(log d) in the priority queue.

Success of Huffman Coding

We can apply it to any byte stream.

A milestone on the way to LZW compression.

REFERENCES

• Robert Sedgewick and Kevin Wayne, Algorithms (4th edition)

• https://blog.itu.dk/BADS-F2009/files/2009/04/46-huffman.pdf

• Kenneth H. Rosen, Discrete Mathematics and Its Applications (7th edition)