
CMSC 100

Storing Data: Huffman Codes and Image Representation

Professor Marie desJardins

Tuesday, September 18, 2012

CMSC 100 -- Data Compression, Tue 9/18/12

Data Compression: Motivation

- Memory is a finite resource: the more data we have, the more space it takes to store.
- The same holds for bandwidth: the more data we need to send, the more time it takes.
- Data compression can reduce space and bandwidth:
  - Lossless compression: store the exact same data in less space.
  - Lossy compression: store an approximation of the data in less space.

Time and Space Tradeoffs

- Data compression trades (computational) time for space and bandwidth:
  - It takes time to convert the original data D to the compressed format D_C.
  - It takes time to convert the compressed data D_C back to a viewable format D’.
- Compression ratio: $CR = \frac{\mathrm{Length}(D_C)}{\mathrm{Length}(D)}$
- Space savings: $SS = 1 - CR$
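As a minimal sketch of the two formulas above, the following Python computes the compression ratio and the space savings from two lengths. The function names and the sample sizes are illustrative assumptions, not part of the original slides.

```python
def compression_ratio(original_length, compressed_length):
    """CR = Length(D_C) / Length(D); smaller means better compression."""
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    return compressed_length / original_length


def space_savings(original_length, compressed_length):
    """SS = 1 - CR; the fraction of the original space that was saved."""
    return 1 - compression_ratio(original_length, compressed_length)


# Hypothetical sizes: 1000 units of data compressed down to 250 units.
cr = compression_ratio(1000, 250)
ss = space_savings(1000, 250)
print(f"CR = {cr:.2f}, SS = {ss:.2f}")
```

Both lengths only need to be in the same unit (bits, bytes, kilobytes), since the ratio is unitless.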

Lossless vs. Lossy Compression

- Lossless: save space without losing any information.
  - Take advantage of repetition and self-similarity (e.g., solid-color regions in an image).
- Lossy: save space but lose some information.
  - Lose resolution or detail (e.g., “pixelate” an image or remove very high or very low frequencies in a sound file).

Encoding Strategies

- Run-length encoding: replace n instances of object x with the pair of numbers (n, x).
- Frequency-dependent encoding: use shorter representations (fewer bits) for objects that appear more frequently in a document.
- Relative or differential encoding: when x is followed by y, represent y by the difference y-x (which is often small in images, etc., and can therefore be represented by a short code).
- Dictionary encoding: create an index of all of the objects (e.g., words) in a document, then replace each object with its index location (can save space if there is a lot of repetition).
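Two of the strategies above, run-length encoding and differential encoding, are short enough to sketch in Python. This is only an illustration under simple assumptions (in-memory lists, integer samples); the function names and sample inputs are made up for the example.

```python
from itertools import groupby


def run_length_encode(items):
    """Replace each run of n equal items x with the pair (n, x)."""
    return [(len(list(run)), value) for value, run in groupby(items)]


def run_length_decode(pairs):
    """Expand each (n, x) pair back into n copies of x."""
    return [value for n, value in pairs for _ in range(n)]


def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [y - x for x, y in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    values, running = [], 0
    for d in deltas:
        running += d
        values.append(running)
    return values


rle = run_length_encode("AAAABBBCCD")
deltas = delta_encode([100, 102, 101, 105])
print(rle)
print(deltas)
```

Both encoders are lossless: decoding returns exactly the original sequence. They only save space when the input has long runs (for run-length encoding) or small differences between neighbors (for differential encoding).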

Image and Sound Formats

- Images
  - Row-by-row bitmaps in different color spaces:
    - RGB (one byte per color channel = 24 bits per pixel = about 17M different colors), a.k.a. “True Color” (used in JPEG formats). (How much storage for one True Color 2Kx3K digital camera image?)
    - Color palette: use only one byte to index 256 of the 17M 24-bit colors (used in GIF formats). (How much storage for one 24-bit color 200x300 image on a website?)
  - Variable resolution provides different image sizes and levels of fidelity to an original (continuous or very high-resolution digital) image.
- Sound
  - Convert continuous sound to digital by sampling (variable-rate).
  - Each sample can be represented with varying levels of resolution (“bit depth”). (MP3: 44K samples/second, 16 bits/sample; how much storage for one minute of uncompressed sound?)
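The storage questions on this slide come down to multiplication. The sketch below works them out under stated assumptions: K is taken as 1000, the sound is a single (mono) channel, and no compression or file-header overhead is counted. The helper names are hypothetical.

```python
def bitmap_bytes(width, height, bytes_per_pixel):
    """Uncompressed size of a row-by-row bitmap."""
    return width * height * bytes_per_pixel


def audio_bytes(samples_per_second, bits_per_sample, seconds, channels=1):
    """Uncompressed size of sampled sound."""
    return samples_per_second * bits_per_sample * seconds * channels // 8


# True Color (3 bytes per pixel), 2Kx3K camera image, assuming K = 1000.
camera_image = bitmap_bytes(2000, 3000, 3)

# 200x300 web image: True Color versus a one-byte palette index per pixel.
web_true_color = bitmap_bytes(200, 300, 3)
palette_table = 256 * 3  # 256 palette entries, each a 24-bit color
web_palette = bitmap_bytes(200, 300, 1) + palette_table

# One minute of sound at 44K samples/second, 16 bits/sample, one channel.
one_minute_sound = audio_bytes(44_000, 16, 60)

print(camera_image, web_true_color, web_palette, one_minute_sound)
```

Under these assumptions the camera image needs about 18 MB, the web image 180 KB in True Color or about 61 KB with a palette, and the minute of sound about 5.3 MB.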

Compression Ratio: Example

Suppose I have a 2M .PNG (bitmap) image and I store it in a compressed .JPG file that is 187K. What is the compression ratio? What is the space savings?
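A worked version of this exercise, assuming that 2M means 2000K (if it means 2048K instead, the ratio comes out near 0.091):

$$CR = \frac{187\text{K}}{2000\text{K}} = 0.0935, \qquad SS = 1 - CR = 0.9065$$

So the .JPG takes roughly 9% of the original space, a savings of roughly 91%.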

Huffman Coding

- Lossless frequency-based encoding
- Huffman coding is (space-)optimal in the sense that if we know the exact distribution (frequency) of every object, we will be able to represent the document in the shortest possible number of bits (among codes that give each object its own codeword).
  - Downside: it takes a while to compute.
- Goal #1: The length of each object’s code should be related to its frequency.
  - Specifically: length is proportional to the negative log of the frequency.
- Goal #2: The code should be unambiguous.
  - Since objects will be encoded at different lengths, as we read the bits, we need to know when we’ve reached the end of one object and should begin processing the next one.
  - This type of code is called a prefix code.
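Both goals can be checked with a few lines of Python. This is a sketch rather than part of the lecture: `ideal_code_length` applies the negative-log rule using base-2 logs (so the result is in bits), and `is_prefix_code` tests whether any codeword is the beginning of another. Both names and the sample codewords are assumptions made for the example.

```python
import math


def ideal_code_length(relative_frequency):
    """Goal #1: an object with relative frequency p should get about -log2(p) bits."""
    return -math.log2(relative_frequency)


def is_prefix_code(codewords):
    """Goal #2: no codeword may be the start of another codeword.

    After sorting, a codeword that is a prefix of any other codeword is
    always a prefix of its immediate successor, so checking neighbors is enough.
    """
    words = sorted(codewords)
    return all(not later.startswith(earlier)
               for earlier, later in zip(words, words[1:]))


print(ideal_code_length(0.25))           # an object seen 1/4 of the time
print(is_prefix_code(["0", "10", "11"]))  # unambiguous
print(is_prefix_code(["0", "01", "11"]))  # "0" is a prefix of "01"
```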

Using a Prefix Code

[Figure: prefix-code tree with leaves A, E, L, H, O, S, and C. Note: by convention, the left branch is 0; the right branch is 1.]

How would you represent “HELLO” using this code?
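Encoding with a prefix code is a table lookup per symbol. The tree figure did not survive in this transcript, so the table below is a reconstruction and should be treated as an assumption: the codewords for C, H, O, S, and E follow from the decoding example on the later slides, while A = 01 and L = 001 are inferred from the two remaining leaves of the tree and may be swapped on the original slide.

```python
# Reconstructed code table (ASSUMPTION: the A and L entries are inferred,
# not read directly from the slide).
CODE = {
    "A": "01",
    "E": "10",
    "H": "000",
    "L": "001",
    "O": "110",
    "C": "1110",
    "S": "1111",
}


def encode(message, code=CODE):
    """Concatenate the codeword of every symbol in the message."""
    return "".join(code[symbol] for symbol in message)


hello_bits = encode("HELLO")
print(hello_bits)
```

With this assumed table, HELLO takes 14 bits, compared with 15 bits for a fixed 3-bit code over the same 7 letters.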

Interpreting a Prefix Code

[Figure: the same prefix-code tree, with leaves A, E, L, H, O, S, and C.]

What does “1110000110110111110” mean in this code?

1110 | 000110110111110 -> C
1110 | 000 | 110110111110 -> C H
1110 | 000 | 110 | 110 | 1111 | 10 -> C H O O S E
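The step-by-step decoding above amounts to walking the tree: start at the root, go left on 0 and right on 1, and emit a letter whenever a leaf is reached. A minimal sketch follows, with the tree written as nested pairs `(left, right)`. The tree shape matches the codewords used in the example; the positions of A and L are an assumption, since the figure is not recoverable from this transcript.

```python
# Nested (left, right) pairs; a string is a leaf.
# ASSUMPTION: A and L are placed by inference; C, H, O, S, E match the slides.
TREE = ((("H", "L"), "A"), ("E", ("O", ("C", "S"))))


def decode(bits, tree=TREE):
    """Walk the tree bit by bit, emitting a symbol at every leaf."""
    symbols = []
    node = tree
    for bit in bits:
        node = node[int(bit)]        # 0 -> left child, 1 -> right child
        if isinstance(node, str):    # reached a leaf
            symbols.append(node)
            node = tree              # restart at the root for the next symbol
    if node is not tree:
        raise ValueError("bit string ended in the middle of a codeword")
    return "".join(symbols)


message = decode("1110000110110111110")
print(message)
```

No separators are needed between codewords: because no codeword is a prefix of another, reaching a leaf always marks the end of exactly one symbol.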

Decode the Message:

[Figure: prefix-code tree with leaves A, C, E, L, M, O, P, R, S, T, U, W, Y, !, and <SPC>; branches are labeled 0 and 1.]

0111110010100101011011100011110111110110 010 00111111110 010

0110001110 010 0110001110 010 0110001110 010

0001100000100100000000110 010 011111001000000 01110

Encoding Algorithm

- Frequency distribution:
  - Set of k objects, o1...ok
  - Number of times each object appears in the document, n1...nk
- Construct a Huffman code as follows:
  1. Pick the two least frequent objects, oi and oj
  2. Replace them with a single combined object, oij, with frequency ni+nj
  3. If there are at least two objects left, go to step 1
- Visually:
  1. Each of the original objects is a leaf (bottom node) in the prefix tree
  2. Each combined object represents a 0/1 split where the “children” are the two objects that were combined
  3. In the last step, we combine two subtrees into a single final prefix tree
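The three construction steps map directly onto a priority queue. The sketch below is one possible implementation, not the lecture's own code: it uses Python's `heapq` to pick the two least frequent objects, represents a combined object as a `(left, right)` pair, and assumes that symbols are strings. The function name and the sample word are illustrative.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(frequencies):
    """Build a {symbol: bit string} table from a {symbol: count} mapping."""
    if not frequencies:
        return {}
    order = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(n, next(order), symbol) for symbol, n in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs one bit
        return {heap[0][2]: "0"}

    while len(heap) > 1:
        n_i, _, o_i = heapq.heappop(heap)   # step 1: two least frequent objects
        n_j, _, o_j = heapq.heappop(heap)
        heapq.heappush(heap, (n_i + n_j, next(order), (o_i, o_j)))  # step 2
        # step 3: loop while at least two objects are left

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):          # combined object: 0/1 split
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                # original object: a leaf
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes


freq = Counter("ABRACADABRA")  # hypothetical input
codes = huffman_code(freq)
total_bits = sum(len(codes[s]) * n for s, n in freq.items())
print(codes, total_bits)
```

When several objects have the same frequency, different tie-breaking rules give different trees, but the total number of bits is the same for all of them.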

Encoding Example

SHE SELLS SEASHELLS BY THE SEASHORE

Frequency distribution:
- A: 2
- B: 1
- E: 7
- H: 4
- L: 4
- O: 1
- R: 1
- S: 8
- T: 1
- Y: 1
- <SPC>: 5
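The frequency distribution above can be reproduced with `collections.Counter` from the Python standard library. This is a small sketch; the variable names are illustrative.

```python
from collections import Counter

sentence = "SHE SELLS SEASHELLS BY THE SEASHORE"
frequency = Counter(sentence)  # counts every character, including spaces

for symbol, n in sorted(frequency.items()):
    label = "<SPC>" if symbol == " " else symbol
    print(f"{label}: {n}")

total_symbols = sum(frequency.values())
```

The counts add up to 35, the length of the sentence, which is also the number at the root of the finished Huffman tree.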

Encoding Example

[Figure sequence: the prefix tree is built bottom-up, one merge at a time. The number on each combined node is the sum of its two children.]

1. O (1) + B (1) -> 2
2. T (1) + R (1) -> 2
3. Y (1) + A (2) -> 3
4. 2 (O, B) + 2 (T, R) -> 4
5. 3 (Y, A) + 4 -> 7
6. L (4) + H (4) -> 8
7. E (7) + <SPC> (5) -> 12
8. S (8) + 7 -> 15
9. 8 (L, H) + 12 (E, <SPC>) -> 20
10. 15 + 20 -> 35
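To see what the finished tree buys, the sketch below reads the codewords off the tree and counts the bits needed for the whole sentence. The tree is written as nested `(left, right)` pairs following the merges in this example; which child sits on the left or right is an assumption (the figure layout is not recoverable here), but that choice changes only the bit patterns, not the code lengths.

```python
import math

FREQ = {"A": 2, "B": 1, "E": 7, "H": 4, "L": 4, "O": 1,
        "R": 1, "S": 8, "T": 1, "Y": 1, " ": 5}

# 35 = 15 + 20; 15 = S + 7; 7 = 3 + 4; 3 = Y + A; 4 = (O + B) + (T + R);
# 20 = 8 + 12; 8 = L + H; 12 = E + <SPC>.  Left/right order is assumed.
TREE = (("S", (("Y", "A"), (("O", "B"), ("T", "R")))),
        (("L", "H"), ("E", " ")))


def code_table(node, prefix=""):
    """Collect {symbol: bit string} by walking the tree (left = 0, right = 1)."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table


codes = code_table(TREE)
huffman_bits = sum(len(codes[s]) * n for s, n in FREQ.items())

# Fixed-length alternative: enough bits to tell 11 symbols apart.
bits_per_symbol = math.ceil(math.log2(len(FREQ)))
fixed_bits = bits_per_symbol * sum(FREQ.values())

print(huffman_bits, fixed_bits, round(huffman_bits / fixed_bits, 3))
```

The most frequent symbol, S, gets a 2-bit code, while the rarest ones (O, B, T, R) get 5 bits each. The sentence takes 108 bits with this code versus 140 bits with a fixed 4-bit code, a compression ratio of about 0.77 before counting the space needed to store the code table itself.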

Green Eggs and Ham

I am Sam

I am Sam

Sam I am

That Sam-I-am!

That Sam-I-am!

I do not like

that Sam-I-am!

Do you like

green eggs and ham?

I do not like them,

Sam-I-am.

I do not like

green eggs and ham.

Symbols (not letters!) are words. Ignore spaces and punctuation.
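For this exercise the first step is to turn the text into word symbols and count them. The sketch below makes two assumptions that the slide does not state: upper and lower case are treated as the same word ("That" and "that"), and hyphens count as punctuation, so "Sam-I-am" becomes the three words "sam", "i", "am". The helper name is hypothetical.

```python
import re
from collections import Counter

TEXT = """I am Sam
I am Sam
Sam I am
That Sam-I-am!
That Sam-I-am!
I do not like
that Sam-I-am!
Do you like
green eggs and ham?
I do not like them,
Sam-I-am.
I do not like
green eggs and ham."""


def word_frequencies(text):
    """Lowercase the text, keep only runs of letters, and count each word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


word_freq = word_frequencies(TEXT)
for word, n in word_freq.most_common():
    print(f"{word}: {n}")
```

The resulting counts can then be used as the frequency distribution for building a Huffman tree in which each leaf is a word rather than a letter.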

