CSC 2300 Data Structures & Algorithms

April 27, 2007

Chap. 10. Algorithm Design Techniques

Today

File Compression
Huffman Code

ASCII

What does ASCII stand for? (American Standard Code for Information Interchange.) The ASCII character set consists of about 100 “printable” characters. How many bits are needed to represent these characters? Seven bits give 2^7 = 128 codes, which is enough; the set also includes some “nonprintable” characters, and an 8th bit is added as a parity bit.

Example

A file with only the characters a, e, i, s, t, blank space, and newline. There are seven characters, and so three bits per character are sufficient.

i see a seat
010101011001001101000101011001000100110 (39 bits)
How can we do better?
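As a rough sketch of the fixed-length encoding (the particular 3-bit code assignments below are an assumption; the slide only fixes the length at three bits per character), 13 characters including the newline, at 3 bits each, give 39 bits:

```python
# Hedged sketch: one possible fixed-length 3-bit code for the seven characters.
# The specific assignments are assumptions; only the 3-bit length matters here.
fixed_code = {
    'a': '000', 'e': '001', 'i': '010', 's': '011',
    't': '100', ' ': '101', '\n': '110',
}

message = "i see a seat\n"
encoded = ''.join(fixed_code[ch] for ch in message)
print(len(encoded))   # 13 characters x 3 bits = 39 bits
```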

Binary Tree

Binary tree:

The data reside only at the leaves. Can you improve this representation?

Example

If newline becomes 11:
i see a seat
01010101100100110100010101100100010011 (38 bits)
A reduction of 1 bit. We want a more significant improvement. How?

The Two Trees

What can you say about the structure of the better tree? It is a full tree: all nodes either are leaves or have two children. An optimal code will always have this property. Why? The child of a node with only one child can always move up one level, shortening its code.

Prefix Code

If the characters are placed only at the leaves, any given sequence of bits can be decoded unambiguously.

Prefix code: no character code is a prefix of another character code.

Example: 01001111000010110001000111
What is it?
is a tie
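To see why a prefix code decodes unambiguously, here is a minimal sketch: walk the code tree from the root, branch on each bit, and output a character whenever a leaf is reached. The small code table below is hypothetical (the slide's tree is not reproduced here); the mechanism is the same for any prefix code.

```python
# Hedged sketch: decoding a bit string against a (hypothetical) prefix code.
# Because no code word is a prefix of another, every walk from the root
# ends at exactly one leaf, so the decoding is unambiguous.
code = {'a': '00', 'e': '01', 'i': '10', 't': '110', 's': '111'}

# Build the decoding tree: internal nodes are dicts keyed by '0'/'1';
# leaves store the decoded character.
root = {}
for ch, bits in code.items():
    node = root
    for b in bits[:-1]:
        node = node.setdefault(b, {})
    node[bits[-1]] = ch

def decode(bits):
    out, node = [], root
    for b in bits:
        node = node[b]
        if isinstance(node, str):   # reached a leaf: emit and restart at the root
            out.append(node)
            node = root
    return ''.join(out)

print(decode('1111011001'))   # 111 = s, 10 = i, 110 = t, 01 = e -> "site"
```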

Optimal Prefix Code

Binary tree:

How do we find an optimal code?

Our Example

i see a seat
1011000000101110011100000010010001 (34 bits)
The code in the table is not optimal for our example. Why not? Exercise: find the optimal code for our example.

Huffman’s Algorithm

Assume that there are C characters. Maintain a forest of trees, where the weight of a tree is equal to the sum of the frequencies of its leaves.

C – 1 times, select the two trees T1 and T2 of smallest weight, breaking ties arbitrarily, and form a new tree with subtrees T1 and T2.

At the beginning, there are C single-node trees, one per character. At the end, there is one single tree, which is the optimal Huffman coding tree.
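A minimal sketch of the merging loop, assuming a binary heap as the priority queue of trees (the representation and names below are illustrative, not taken from the slides):

```python
import heapq
from itertools import count

def huffman_merges(freqs):
    """Trace Huffman's algorithm: C - 1 merges of the two lightest trees."""
    tie = count()                           # breaks ties between equal weights
    # The forest starts as C single-node trees, one per character.
    forest = [(w, next(tie), ch) for ch, w in freqs.items()]
    heapq.heapify(forest)
    for step in range(1, len(freqs)):
        w1, _, t1 = heapq.heappop(forest)   # lightest tree
        w2, _, t2 = heapq.heappop(forest)   # second-lightest tree
        print(f"merge {step}: {t1!r} ({w1}) + {t2!r} ({w2}) -> weight {w1 + w2}")
        heapq.heappush(forest, (w1 + w2, next(tie), (t1, t2)))
    return forest[0][2]                     # the single remaining tree

# Frequencies of "i see a seat" plus a newline (an assumption for illustration).
tree = huffman_merges({'a': 2, 'e': 3, 'i': 1, 's': 2, 't': 1, ' ': 3, '\n': 1})
print(tree)
```

Each iteration of this loop corresponds to one merge stage in the example below; the exact merges in the slides' figures may differ, since their frequencies are not reproduced here.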

Example

Initial stage: C single-node trees.
After the first, second, third, fourth, and fifth merges: at each step the two lightest trees are combined into one.
After the final merge: a single tree remains, the Huffman coding tree.
(The forest at each stage appeared as a figure on the original slides.)

Implementation

If we maintain the trees in a priority queue, ordered by weight, what is the running time?

O(C log C): each of the C – 1 merges performs a constant number of priority-queue operations, each costing O(log C). We say that Huffman's method is a two-pass algorithm. What are the two passes? The first pass collects the frequency data, and the second pass performs the encoding.
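A rough end-to-end sketch of the two passes (again assuming a heap-based priority queue; the function name and structure are illustrative):

```python
import heapq
from collections import Counter
from itertools import count

def compress(text):
    # Pass 1: scan the input and collect the frequency of each character.
    freqs = Counter(text)

    # Build the Huffman code: C - 1 merges on a heap, O(C log C) in total.
    tie = count()
    heap = [(w, next(tie), (ch,)) for ch, w in freqs.items()]
    heapq.heapify(heap)
    code = {ch: '' for ch in freqs}
    for _ in range(len(freqs) - 1):
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        for ch in t1:                    # left subtree: prefix its codes with 0
            code[ch] = '0' + code[ch]
        for ch in t2:                    # right subtree: prefix its codes with 1
            code[ch] = '1' + code[ch]
        heapq.heappush(heap, (w1 + w2, next(tie), t1 + t2))

    # Pass 2: re-scan the input and emit the code word for each character.
    return ''.join(code[ch] for ch in text), code

bits, code = compress("i see a seat\n")
print(code)
print(len(bits), "bits")
```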