
CSC 2300 Data Structures & Algorithms

Date post: 07-Jan-2016
Upload: locke
Page 1: CSC 2300 Data Structures & Algorithms

CSC 2300 Data Structures & Algorithms

April 27, 2007

Chap. 10. Algorithm Design Techniques


Today

File Compression
Huffman Code


ASCII

What does ASCII stand for?
The ASCII character set consists of about 100 “printable” characters.
How many bits to represent these characters?
The set includes some “nonprintable” characters.
An 8th bit is added as a parity bit.
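
As a small sketch of the bit-count question above: n distinct characters need the smallest b with 2^b >= n. The helper name `bits_needed` is hypothetical, introduced only for illustration.

```python
def bits_needed(num_symbols: int) -> int:
    """Smallest number of bits b such that 2**b >= num_symbols."""
    # (n - 1).bit_length() is an exact integer form of ceil(log2(n)).
    return max(1, (num_symbols - 1).bit_length())


print(bits_needed(100))  # about 100 printable characters -> 7 bits
print(bits_needed(128))  # 7 bits cover 128 codes, leaving room for nonprintable ones
```

With 7 bits per character, an 8th bit is free to serve as the parity bit mentioned on the slide.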


Example

A file with only the characters a, e, i, s, t, blankspace, newline.
There are seven characters, and so three bits are sufficient.

i see a seat
010101011001001101000101011001000100110 (39 bits)
How to do better?
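
The 39-bit string can be reproduced with a fixed 3-bit code. The slide's code table is not in this transcript, so the assignments below are read off the bit string itself (every one of the seven characters occurs in the text followed by a newline); the names `FIXED_CODE` and `encode` are hypothetical.

```python
# 3-bit codewords recovered from the slide's bit string (an assumption,
# since the original table is not part of the transcript).
FIXED_CODE = {
    "a": "000", "e": "001", "i": "010", "s": "011",
    "t": "100", " ": "101", "\n": "110",
}


def encode(text: str, code: dict) -> str:
    """Concatenate the codeword of every character in text."""
    return "".join(code[ch] for ch in text)


bits = encode("i see a seat\n", FIXED_CODE)
print(bits)       # 010101011001001101000101011001000100110
print(len(bits))  # 39 = 13 characters * 3 bits
```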


Binary Tree

Binary tree:

The data reside only at the leaves. Can you improve this representation?


Example

newline becomes 11
i see a seat
01010101100100110100010101100100010011 (38 bits)
A reduction of 1 bit.
Want more significant improvement. How?


The Two Trees

What can you say about the structure of the better tree?
It is a full tree: all nodes either are leaves or have two children.
An optimal code will always have this property. Why?
A node with only one child can be replaced by that child, which moves the subtree up one level and shortens its codes.


Prefix Code

If the characters are placed only at the leaves, the given sequence of bits can be decoded unambiguously.

Prefix code: no character code is a prefix of another character code.

Example: 0100111100010110001000111
What is it?
is
a tie
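
A minimal sketch of why a prefix code decodes unambiguously: reading left to right, the first codeword that matches is the only one that can match. The table below assumes the earlier 3-bit assignments with newline shortened to 11; `PREFIX_CODE`, `is_prefix_free`, and `decode` are hypothetical names. The example round-trips the slide's answer rather than relying on a particular bit string.

```python
PREFIX_CODE = {
    "a": "000", "e": "001", "i": "010", "s": "011",
    "t": "100", " ": "101", "\n": "11",
}


def is_prefix_free(code: dict) -> bool:
    """True if no codeword is a prefix of another codeword."""
    words = sorted(code.values())
    # In sorted order, a word that prefixes any other word also
    # prefixes its immediate successor, so adjacent pairs suffice.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits: str, code: dict) -> str:
    """Greedy left-to-right decoding; valid only for prefix codes."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


message = "is\na tie\n"
encoded = "".join(PREFIX_CODE[ch] for ch in message)
print(is_prefix_free(PREFIX_CODE))             # True
print(decode(encoded, PREFIX_CODE) == message)  # True
```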


Optimal Prefix Code

Binary tree:

How to find optimal code?


Our Example

i see a seat
1011000000101110011100000010010001 (34 bits)
The code in the table is not optimal for our example. Why not?
Exercise. Find the optimal code for our example.


Huffman’s Algorithm

Assume that there are C characters.
Maintain a forest of trees.
The weight of a tree is equal to the sum of the frequencies of its leaves.
C – 1 times, select the two trees T1 and T2 of smallest weights, breaking ties arbitrarily, and form a new tree with subtrees T1 and T2.
At the beginning, there are C single-node trees.
At the end, there is one single tree, which is the optimal Huffman coding tree.
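
The steps above can be sketched in Python with the standard `heapq` module as the priority queue. This is one possible rendering, not the course's implementation: the function name `huffman_code`, the tuple representation of trees, and the use of the character counts of "i see a seat" plus a newline as frequencies are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freq: dict) -> dict:
    """Return {symbol: bitstring} by repeatedly merging the two lightest trees."""
    tiebreak = count()  # unique counter so equal weights never compare trees
    # A tree is either a symbol (leaf) or a (left, right) tuple.
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one bit
    while len(heap) > 1:  # runs C - 1 times
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


freq = Counter("i see a seat\n")
codes = huffman_code(freq)
total_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)
print(total_bits)  # 35 for this input, versus 39 with the fixed 3-bit code
```

The individual codewords depend on how ties are broken, but the total encoded length does not.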


Example

[Figures: the forest at the initial stage, then after the first, second, third, fourth, fifth, and final merges.]
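
The frequencies used in the slides' figures are not part of this transcript. As a stand-in, the sketch below traces only the tree weights after each merge, using the character counts of "i see a seat" plus a newline as assumed frequencies; the function name `merge_trace` is hypothetical, and the numbers will differ from those in the original figures.

```python
import heapq
from collections import Counter


def merge_trace(weights) -> list:
    """Sorted list of tree weights at the start and after every merge."""
    heap = list(weights)
    heapq.heapify(heap)
    stages = [sorted(heap)]
    while len(heap) > 1:
        # Merge the two lightest trees into one whose weight is their sum.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        heapq.heappush(heap, merged)
        stages.append(sorted(heap))
    return stages


for step, forest in enumerate(merge_trace(Counter("i see a seat\n").values())):
    print(step, forest)
# 0 [1, 1, 1, 2, 2, 3, 3]
# 1 [1, 2, 2, 2, 3, 3]
# 2 [2, 2, 3, 3, 3]
# 3 [3, 3, 3, 4]
# 4 [3, 4, 6]
# 5 [6, 7]
# 6 [13]
```

Seven characters give six merges, matching the C – 1 merges of the algorithm.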


Implementation

If we maintain the trees in a priority queue, ordered by weight, what is the running time?
O(C log C): each of the C – 1 merges removes the two smallest trees and inserts one, and each of these operations costs O(log C).
We say that Huffman’s method is a two-pass algorithm. What are the two passes?
The first pass collects the frequency data and the second pass performs the encoding.
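
A rough sketch of the two passes over an input stream, assuming the input can be rewound with `seek(0)`. The names `build_code` and `compress` are hypothetical, and `build_code` is a compact variant that carries partial codewords in the heap instead of building an explicit tree.

```python
import heapq
import io
from collections import Counter


def build_code(freq: dict) -> dict:
    """Compact Huffman construction: each heap entry holds {symbol: partial code}."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)  # unique ids keep tuple comparison away from the dicts
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2] if heap else {}


def compress(stream):
    # Pass 1: read the input once to collect character frequencies.
    freq = Counter()
    for line in stream:
        freq.update(line)
    code = build_code(freq)
    # Pass 2: rewind and read again, emitting each character's codeword.
    stream.seek(0)
    bits = "".join(code[ch] for line in stream for ch in line)
    return code, bits


code, bits = compress(io.StringIO("i see a seat\n"))
print(len(bits))  # 35
```

A real compressor would also have to store the code table (or the frequencies) alongside the bits so that the decoder can rebuild the tree; that bookkeeping is left out here.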

