Algorithmsesslab.hanyang.ac.kr/uploads/algorithm_2018_2/lecture... · 2018-11-05 · Greedy...

Post on 26-Apr-2020


Algorithms

Greedy Algorithms

Dong Kyue Kim

Hanyang University

dqkim@hanyang.ac.kr

Content

• 16.1 An activity selection problem

• 16.3 Huffman Codes


Greedy Algorithms

• Greedy Algorithms

– A greedy algorithm always makes the choice that looks best at the moment.

– It makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution.

An activity selection problem

• An activity selection problem

– to select a maximum-size subset of mutually compatible activities.

• For example

– n classes, 1 lecture room

– to select a maximum number of classes


– activities set S = {a1, a2, ..., an}

– start time si, finish time fi, where 0 ≤ si < fi < ∞

– ai takes place during [si, fi)

– ai and aj are compatible if the intervals [si, fi) and [sj, fj) do not overlap.

– Example set of activities, sorted by finish time:

i    1   2   3   4   5   6   7   8   9   10  11
si   1   3   0   5   3   5   6   8   8   2   12
fi   4   5   6   7   8   9   10  11  12  13  14

– {a3, a9, a11} : consists of mutually compatible activities (not a maximum-size subset)

– {a1, a4, a8, a11} : largest subset of mutually compatible activities

– {a2, a4, a9, a11} : another largest subset

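• Subproblem (optimal substructure)

• Sij is the set of activities that start after ai finishes and finish before aj starts:

$$S_{ij} = \{\, a_k \in S : f_i \le s_k < f_k \le s_j \,\}$$

• Add fictitious activities a0 and an+1 and adopt the conventions that f0 = 0 and sn+1 = ∞.

• Then S = S0,n+1, and the ranges for i and j are 0 ≤ i, j ≤ n + 1.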

• Let c[i, j] be the number of activities in a maximum-size subset of mutually compatible activities in Sij.

– We have c[i, j] = 0 whenever Sij = Ø; in particular, with the activities sorted by finish time, c[i, j] = 0 for i ≥ j.

– If ak is used in a maximum-size subset of mutually compatible activities of Sij, we also use maximum-size subsets of mutually compatible activities for the subproblems Sik and Skj, so that c[i, j] = c[i, k] + c[k, j] + 1.

$$
c[i, j] =
\begin{cases}
0 & \text{if } S_{ij} = \emptyset, \\
\max\limits_{i < k < j,\ a_k \in S_{ij}} \{\, c[i, k] + c[k, j] + 1 \,\} & \text{if } S_{ij} \neq \emptyset.
\end{cases}
$$
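The recurrence for c[i, j] can be evaluated directly with memoization. The following is a minimal sketch, not the lecture's own code: it assumes the activities are already sorted by finish time, and the function name `max_compatible` and the sentinel handling are illustrative choices.

```python
from functools import lru_cache
import math


def max_compatible(starts, finishes):
    """Size of a maximum set of mutually compatible activities, via c[i, j].

    Assumes the activities are sorted by finish time.
    """
    n = len(starts)
    # Sentinel activities a_0 (f_0 = 0) and a_{n+1} (s_{n+1} = infinity).
    s = [0] + list(starts) + [math.inf]
    f = [0] + list(finishes) + [math.inf]

    @lru_cache(maxsize=None)
    def c(i, j):
        best = 0  # stays 0 when S_ij is empty
        for k in range(i + 1, j):
            # a_k is in S_ij if it starts after a_i finishes
            # and finishes before a_j starts.
            if f[i] <= s[k] and f[k] <= s[j]:
                best = max(best, c(i, k) + c(k, j) + 1)
        return best

    return c(0, n + 1)


starts = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
finishes = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(max_compatible(starts, finishes))  # 4
```

This evaluates O(n^2) subproblems with O(n) work each, which is the cost the greedy choice is meant to avoid.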


Theorem 16.1

Consider any nonempty subproblem Sij, and let am be the activity in Sij with the earliest finish time:

fm = min {fk : ak ∈ Sij}.

Then

1. Activity am is used in some maximum-size subset of mutually compatible activities of Sij.

2. The subproblem Sim is empty, so that choosing am leaves the subproblem Smj as the only one that may be nonempty.

Proof.

2. Suppose that Sim is nonempty, so that there is some activity ak such that fi ≤ sk < fk ≤ sm < fm.

Then ak is also in Sij and it has an earlier finish time than am, which contradicts our choice of am.

We conclude that Sim is empty.

1. Let Aij be a maximum-size subset of mutually compatible activities of Sij, and order the activities in Aij in monotonically increasing order of finish time.

Let ak be the first activity in Aij.

– If ak = am

• We are done: am is used in some maximum-size subset of mutually compatible activities of Sij.

– If ak ≠ am

• Construct the subset A′ij = (Aij − {ak}) ∪ {am}.

• The activities in A′ij are disjoint, since the activities in Aij are, ak is the first activity in Aij to finish, and fm ≤ fk.

• A′ij has the same number of activities as Aij.

• A′ij is a maximum-size subset of mutually compatible activities of Sij that includes am.


• Optimal solution

– Take the activity with the earliest finish time, then repeat on the activities that start after it finishes.
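The earliest-finish-time rule can be sketched as a single pass over the activities. This is an illustrative sketch rather than the lecture's own procedure: it assumes the input is already sorted by finish time, and the name `select_activities` is hypothetical.

```python
def select_activities(starts, finishes):
    """Return 1-based indices of a maximum-size set of compatible activities.

    Assumes the activities are sorted by finish time.
    """
    chosen = []
    last_finish = 0  # f_0 = 0 by convention
    for i, (s, f) in enumerate(zip(starts, finishes), start=1):
        # a_i is compatible with everything chosen so far
        # if it starts no earlier than the last chosen finish time.
        if s >= last_finish:
            chosen.append(i)
            last_finish = f
    return chosen


starts = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
finishes = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(select_activities(starts, finishes))  # [1, 4, 8, 11]
```

On the example data this returns {a1, a4, a8, a11}. After sorting, the scan takes time linear in the number of activities.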


• Huffman Codes

– A widely used and very effective technique for compressing data

– Savings of 20% to 90% are typical, depending on the characteristics of the data being compressed.

– Uses character frequencies


– For example

• To represent 100,000 characters drawn from 6 characters (a, b, c, d, e, f)

• Uses fixed-length : 300,000 bits

• Uses variable-length : (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4)·1000 = 224,000 bits

• A savings of approximately 25%

                           a     b     c     d     e     f
Frequency (in thousands)   45    13    12    16    9     5
Fixed-length codeword      000   001   010   011   100   101
Variable-length codeword   0     101   100   111   1101  1100
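The bit counts in this example can be checked with a few lines of Python. This is only a sketch that transcribes the table above; the names `freq`, `fixed`, `variable`, and `total_bits` are illustrative.

```python
# Frequencies in thousands and the two codes from the table.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
fixed = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100", "f": "101"}
variable = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def total_bits(freq, code):
    """Sum of frequency times codeword length over all characters."""
    return sum(freq[ch] * len(code[ch]) for ch in freq)


fixed_bits = total_bits(freq, fixed) * 1000        # 300,000
variable_bits = total_bits(freq, variable) * 1000  # 224,000
savings = 1 - variable_bits / fixed_bits           # about 0.25
print(fixed_bits, variable_bits, round(savings, 3))
```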

• Prefix codes

– Prefix code: no codeword is also a prefix of some other codeword.

• Encoding abc : 0·101·100

• Decoding 0·0·101·1101 : aabe

• Easy decoding : tree for codes
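Encoding and decoding with the variable-length code can be sketched as follows. This is a minimal illustration that uses a lookup table instead of the code tree; the names `CODE`, `encode`, and `decode` are assumptions made for the example.

```python
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def encode(text, code=CODE):
    """Concatenate the codewords of the characters in text."""
    return "".join(code[ch] for ch in text)


def decode(bits, code=CODE):
    """Read bits until the buffer matches a codeword, then emit its character.

    This is unambiguous only because the code is prefix-free.
    """
    inverse = {word: ch for ch, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            out.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(encode("abc"))        # 0101100
print(decode("001011101"))  # aabe
```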

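[Figure: binary trees for the two codes. Left: the fixed-length code, with root 100, internal nodes 86, 14, 58, 28, 14 and leaves a:45, b:13, c:12, d:16, e:9, f:5. Right: the variable-length code, with root 100, internal nodes 55, 25, 30, 14 and the same leaves. Edges are labeled 0 (left child) and 1 (right child).]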

– Decoding problem

• a: 0 b: 01 c: 1

• 001: aac or ab

– Prefix code is required.

• 0 : left child

• 1 : right child

• For example :

– 0 = left = a

– 101 = right-left-right = b
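The ambiguity of the non-prefix code can be made concrete by enumerating every way to split a bit string into codewords. The helpers below, `parses` and `is_prefix_free`, are hypothetical names for a small sketch, not part of the lecture.

```python
def parses(bits, code):
    """Return every way to split bits into codewords of code."""
    if not bits:
        return [""]
    results = []
    for ch, word in code.items():
        if bits.startswith(word):
            rest = bits[len(word):]
            results.extend(ch + tail for tail in parses(rest, code))
    return results


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(u != v and v.startswith(u) for u in words for v in words)


ambiguous = {"a": "0", "b": "01", "c": "1"}
prefix = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

print(parses("001", ambiguous))       # ['aac', 'ab']
print(parses("001011101", prefix))    # ['aabe']
print(is_prefix_free(ambiguous), is_prefix_free(prefix))  # False True
```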

[Figure: the optimal prefix code tree, with root 100, internal nodes 55, 25, 30, 14 and leaves a:45, c:12, b:13, d:16, f:5, e:9.]

– The cost of a tree T

• each character c in the alphabet C

• frequency of c : f(c)

• length of the codeword for character c : dT(c), which is also the depth of c's leaf in T

$$B(T) = \sum_{c \in C} f(c)\, d_T(c) \tag{16.4}$$


• Huffman code : An optimal prefix code

– An optimal prefix code is always represented by a full binary tree (every node is either a leaf or has two children).

– A full binary tree for alphabet C has |C| leaves and |C| − 1 internal nodes.

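• Building Huffman tree

– Running time : O(n lg n)

– Algorithm : repeatedly extract the two lowest-frequency nodes from a min-priority queue and merge them under a new node whose frequency is their sum, until one tree remains.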

[Figure: the steps of Huffman's algorithm on the example frequencies. Each line shows the queue contents in increasing order of frequency; a bracketed number is the root of a merged subtree, whose left edge is labeled 0 and right edge 1.]

Step 1: f:5  e:9  c:12  b:13  d:16  a:45

Step 2: c:12  b:13  [14 = f:5 + e:9]  d:16  a:45

Step 3: [14]  d:16  [25 = c:12 + b:13]  a:45

Step 4: [25]  [30 = 14 + d:16]  a:45

Step 5: a:45  [55 = 25 + 30]

Step 6: [100 = a:45 + 55]
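The merging steps above can be sketched with Python's `heapq` module as the min-priority queue. This is an illustrative sketch, not the HUFFMAN procedure from the lecture: the function name `huffman`, the tuple representation of trees, and the tie-breaking counter are assumptions, and the input is assumed to be a nonempty dictionary of single-character keys.

```python
import heapq
from itertools import count


def huffman(freq):
    """Build a Huffman code; returns a dict from character to codeword."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Heap entry: (frequency, tiebreak, tree).
    # A tree is either a character or a (left, right) pair.
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)  # lowest frequency
        fy, _, y = heapq.heappop(heap)  # second lowest frequency
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # 0 = left child
            walk(node[1], prefix + "1")  # 1 = right child
        else:
            codes[node] = prefix or "0"  # single-character alphabet

    walk(root, "")
    return codes


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
print(huffman(freq))
```

With n characters there are n − 1 merges, each costing O(lg n) heap operations, which matches the O(n lg n) running time. On the example frequencies no ties occur, so the result reproduces the variable-length codewords in the table.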

• Correctness

Lemma 16.2

Let C be an alphabet in which each character c in C has frequency f [c].

Let x and y be two characters in C having the lowest frequencies.

Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.


Proof.

Idea : take an arbitrary optimal prefix code tree T.

Modify it to make a tree representing another optimal prefix code such that the characters x and y appear as sibling leaves of maximum depth in the new tree.

Their codewords will then have the same length and differ only in the last bit.

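[Figure: trees T, T′, and T′′. T′ is obtained from T by exchanging the leaves a and x; T′′ is obtained from T′ by exchanging the leaves b and y, so that x and y become sibling leaves of maximum depth.]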

• Let a and b be two characters that are sibling leaves of maximum depth in T.

• Assume without loss of generality that f[a] ≤ f[b] and f[x] ≤ f[y].

• f[x] and f[y] are the two lowest leaf frequencies, in order.

• f[a] and f[b] are two arbitrary frequencies, in order.

• Hence f[x] ≤ f[a] and f[y] ≤ f[b].

• Exchange the positions of a and x in T to produce T′, then exchange the positions of b and y in T′ to produce T′′.

• By equation (16.4), the difference in cost between T and T′ is

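$$
\begin{aligned}
B(T) - B(T') &= \sum_{c \in C} f(c)\, d_T(c) - \sum_{c \in C} f(c)\, d_{T'}(c) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_{T'}(x) - f[a]\, d_{T'}(a) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_T(a) - f[a]\, d_T(x) \\
&= (f[a] - f[x])\,(d_T(a) - d_T(x)) \\
&\ge 0
\end{aligned}
$$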

– Both f[a] − f[x] and dT(a) − dT(x) are nonnegative, because x is a minimum-frequency leaf and a is a leaf of maximum depth in T. Hence B(T′) ≤ B(T).

– Similarly, exchanging y and b does not increase the cost, so B(T′) − B(T′′) is nonnegative. Therefore, B(T′′) ≤ B(T), and since T is optimal, B(T) ≤ B(T′′), which implies B(T′′) = B(T).

– Thus, T′′ is an optimal tree in which x and y appear as sibling leaves of maximum depth, from which the lemma follows.


Lemma 16.3

Let C be a given alphabet with frequency f[c] defined for each character c∈C.

Let x and y be two characters in C with minimum frequency.

Let C′ be the alphabet C with characters x, y removed and character z added, so that C′ = C − {x, y} ∪ {z}; define f for C′ as for C, except that f[z] = f[x] + f[y].

Let T′ be any tree representing an optimal prefix code for the alphabet C′.

Then the tree T, obtained from T′ by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for the alphabet C.

Proof.

For each c ∈ C − {x, y}, we have dT(c) = dT′(c), and hence f[c]dT(c) = f[c]dT′(c). Since dT(x) = dT(y) = dT′(z) + 1, we have

$$
\begin{aligned}
f[x]\, d_T(x) + f[y]\, d_T(y) &= (f[x] + f[y])\,(d_{T'}(z) + 1) \\
&= f[z]\, d_{T'}(z) + (f[x] + f[y]),
\end{aligned}
$$

from which we conclude that B(T) = B(T′) + f[x] + f[y]

or, equivalently, B(T′) = B(T) − f[x] − f[y].
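The identity B(T) = B(T′) + f[x] + f[y] can be checked numerically on the running example, taking x = f and y = e. This is only a sanity check under that assumption; the dictionaries below are transcribed from the example tree, with z as the merged leaf at codeword 110, and the name `cost` is illustrative.

```python
def cost(freq, code):
    """B(T): sum of f[c] * d_T(c), where d_T(c) is the codeword length."""
    return sum(freq[c] * len(code[c]) for c in freq)


# Alphabet C and tree T from the running example (x = f, y = e).
freq_C = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code_T = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

# Alphabet C' replaces x and y by z, with f[z] = f[x] + f[y] = 14.
# In T', z is the leaf that was the parent of x and y in T.
freq_Cp = {"a": 45, "b": 13, "c": 12, "d": 16, "z": 14}
code_Tp = {"a": "0", "b": "101", "c": "100", "d": "111", "z": "110"}

b_T = cost(freq_C, code_T)     # 224
b_Tp = cost(freq_Cp, code_Tp)  # 210
print(b_T, b_Tp, b_T == b_Tp + freq_C["f"] + freq_C["e"])
```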


Suppose that T does not represent an optimal prefix code for C.

• Then there exists a tree T′′ such that B(T′′) < B(T).

• Without loss of generality (by Lemma 16.2), T′′ has x and y as siblings.

• Let T′′′ be the tree T′′ with the common parent of x and y replaced by a leaf z with frequency f[z] = f[x] + f[y].

B(T′′′) = B(T′′) − f[x] − f[y] < B(T) − f[x] − f[y] = B(T′)

– This contradicts the assumption that T′ represents an optimal prefix code for C′.

– Therefore, T must represent an optimal prefix code for the alphabet C.

Theorem 16.4

Procedure HUFFMAN produces an optimal prefix code.

Proof.

– Immediate from Lemmas 16.2 and 16.3.