1 Chapter 7 Skip Lists and Hashing Part 2: Hashing.

Date post: 17-Jan-2016
Upload: kerry-ray

Chapter 7

Skip Lists and Hashing

Part 2: Hashing

Sorted Linear Lists

• For formula-based implementation:
– Insert: O(n) comparisons and data moves
– Delete: O(n) comparisons and data moves
– Search: O(log n) comparisons

• For chained implementation:
– Insert: O(n) comparisons
– Delete: O(n) comparisons
– Search: O(n) comparisons

Sorted Chain

Dictionary

• A dictionary is a collection of elements; each element has a field called the key.

• The key is unique for each element.

• Operations:
– Insert an element with a specified key value
– Search the dictionary for an element with a specified key value
– Delete an element with a specified key value

• The access mode for elements in a dictionary is random access (or direct access) mode: i.e. any element may be retrieved by performing a search on its key.

Ideal hashing

• Hash table: table used to store elements

• Hash function: function to map keys to positions: k => f(k)

• Search for an element with key k: if position f(k) is not empty, the element is found; otherwise the search fails

• Insert: position f(k) must be empty

• Delete: position f(k) must not be empty

Example: Student record dictionary

• Use student ID (6 digit number) as the key

• ID range: 951000 to 952000

• f(k) = k - 951000

• Table size: 1001 i.e. ht[0..1000]

• ht[i].key = 0 indicates an empty entry
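
The student-record example can be sketched in C++ as follows. This is only an illustration under stated assumptions: the `Student` record, the class name `IdealHashTable`, and the member names are hypothetical, since the slides specify only the key range, f(k) = k - 951000, the table ht[0..1000], and the convention that key 0 marks an empty entry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type; the slides only fix the 6-digit ID as the key.
struct Student {
    int id = 0;          // key; 0 marks an empty entry, as on the slide
    std::string name;    // illustrative payload
};

class IdealHashTable {
public:
    static constexpr int kBase = 951000;  // smallest ID in the range
    static constexpr int kSize = 1001;    // ht[0..1000]

    IdealHashTable() : ht_(kSize) {}      // Theta(b) initialization

    static bool inRange(int k) { return k >= kBase && k < kBase + kSize; }

    // Hash function f(k) = k - 951000; one key <=> one position.
    static int f(int k) {
        assert(inRange(k));
        return k - kBase;
    }

    // Fails if the key is out of range or position f(k) is already occupied.
    bool insert(const Student& s) {
        if (!inRange(s.id) || ht_[f(s.id)].id != 0) return false;
        ht_[f(s.id)] = s;
        return true;
    }

    // Returns a pointer to the stored element, or nullptr if f(k) is empty.
    const Student* search(int k) const {
        if (!inRange(k) || ht_[f(k)].id == 0) return nullptr;
        return &ht_[f(k)];
    }

    // Fails if position f(k) is empty.
    bool erase(int k) {
        if (!inRange(k) || ht_[f(k)].id == 0) return false;
        ht_[f(k)] = Student{};
        return true;
    }

private:
    std::vector<Student> ht_;
};
```

Each operation touches exactly one position, which is why search, insert, and delete all take constant time in this scheme.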

Evaluation: Ideal Hashing

• Initialize an empty dictionary: Θ(b), where b is the size of the table

• Search, insert, and delete: Θ(1)

• Property: 1 key <=> 1 position

• Problem: the range of the keys may be very large, resulting in a large hash table; e.g., if the key is a 9-digit integer (such as an SSN), the size of the table will be 10^9

Hashing with linear open addressing

• Used when the size of the hash table (D) is smaller than the key range

• f(k) = k % D

• Positions in the hash table are indexed 0..D-1

• Bucket: a position in a hash table

• If key values are not of an integral type, they need to be converted first.

• Collision: two keys k1 and k2 map into the same bucket, i.e. f(k1) = f(k2)

• Home bucket: the position numbered f(k) is the home bucket for k

• In general, a bucket may contain space for more than one element.

• An overflow occurs if there is no room in the home bucket for the new element.

• If a bucket has space for only one element, collision and overflow are the same.

Collision, overflow and linear open addressing

80, 58, and 35 map into home bucket ht(3).

In case of collision, insert in next available bucket in sequence.

Search

• To search for an element with key k, begin at bucket f(k) and continue through successive buckets, regarding the table as circular, until:
– a bucket containing an element with key k is found (successful)
– an empty bucket is reached (unsuccessful)
– the search returns to the home bucket (unsuccessful)
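
The probe sequence described above can be sketched as follows. This is a minimal illustration for integer keys, not the class from the original slides (whose listings did not survive in this transcript); the class name `LinearProbingTable` and its members are hypothetical. Insertion reuses the same probe, placing a new key in the first empty bucket reached.

```cpp
#include <cassert>
#include <vector>

// Sketch of hashing with linear open addressing, one element per bucket.
class LinearProbingTable {
public:
    explicit LinearProbingTable(int divisor)
        : D_(divisor), key_(divisor, 0), empty_(divisor, true) {
        assert(divisor > 0);
    }

    // Returns the bucket holding k, or -1 if k is not in the table.
    int search(int k) const {
        int b = probe(k);
        return (!empty_[b] && key_[b] == k) ? b : -1;
    }

    // Returns false if k is already present or the table is full.
    bool insert(int k) {
        int b = probe(k);
        if (!empty_[b]) return false;  // duplicate key, or no empty bucket left
        key_[b] = k;
        empty_[b] = false;
        return true;
    }

private:
    // Start at the home bucket f(k) = k % D and walk the table circularly.
    // Stops at a bucket holding k, at an empty bucket, or back at home.
    int probe(int k) const {
        int home = ((k % D_) + D_) % D_;  // keeps the index non-negative
        int j = home;
        do {
            if (empty_[j] || key_[j] == k) return j;
            j = (j + 1) % D_;
        } while (j != home);
        return j;  // full cycle: table is full and k is absent
    }

    int D_;
    std::vector<int> key_;
    std::vector<bool> empty_;
};
```

For instance, assuming D = 11 (an assumption; the slides' figure is not available here), inserting 80 and then 58 places 80 in its home bucket 3 and 58 in the next available bucket 4.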

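Deletion

• After a deletion, successive elements must be moved until:
– an empty bucket is reached
– the scan returns to the bucket from which the deletion took place

• To improve performance, use a NeverUsed field (true until an element is first placed in the bucket), so that a deletion simply empties the bucket and an unsuccessful search stops only at a never-used bucket. The table may need reorganization when many buckets have their NeverUsed field set to false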

Code listings (not reproduced in this transcript): Class definition, Constructor, hSearch, Search, Insert

Performance analysis

• b: the number of buckets in the hash table, b = D

• initialization - Θ(b)

• worst-case insert and search - Θ(n), where n is the number of elements in the table

• worst-case happens when all n keys have the same home bucket

Performance analysis (continued)

Average performance

• Let α = n/b denote the loading factor

• Let Un and Sn be the average number of buckets examined during an unsuccessful and a successful search, respectively; then

$$U_n \approx \frac{1}{2}\left[1 + \frac{1}{(1-\alpha)^2}\right]$$

$$S_n \approx \frac{1}{2}\left[1 + \frac{1}{1-\alpha}\right]$$

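Performance analysis (continued)

• The average performance of hashing with linear open addressing is far better than the Θ(n) worst case while the table is lightly loaded, but it degrades as the table fills:

– when α = 0.5 (table is half full): Un = 2.5 and Sn = 1.5

– when α = 0.9 (table is 90% full): Un = 50.5 and Sn = 5.5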
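
The figures above can be checked with a short calculation. The sketch below assumes the usual approximations for linear open addressing, Un ≈ ½[1 + 1/(1-α)²] and Sn ≈ ½[1 + 1/(1-α)], which reproduce the quoted values; the function names are illustrative.

```cpp
#include <cassert>

// Approximate average number of buckets examined in an unsuccessful search:
// U_n ~ (1/2) * [1 + 1/(1 - alpha)^2], valid for 0 <= alpha < 1.
double unsuccessfulProbes(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);
    double r = 1.0 / (1.0 - alpha);
    return 0.5 * (1.0 + r * r);
}

// Approximate average number of buckets examined in a successful search:
// S_n ~ (1/2) * [1 + 1/(1 - alpha)], valid for 0 <= alpha < 1.
double successfulProbes(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);
    double r = 1.0 / (1.0 - alpha);
    return 0.5 * (1.0 + r);
}
```

Because of the squared term, the cost of an unsuccessful search grows much faster than that of a successful one as α approaches 1.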

Determining D

• D should be either a prime number or have no prime factors less than 20

• Two methods. First method:
– begin with the largest possible value for b
– then find the largest D (<= b) that is either a prime or has no factors smaller than 20
– e.g., when b = 530, D = 23*23 = 529

Determining D

Second method:
– determine the acceptable Un and Sn
– estimate n
– determine α
– determine the smallest b for the above α
– determine the smallest integer D >= b that is either prime or has no factor smaller than 20
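
Both methods end with a scan for a suitable divisor, which can be sketched as below. The function names are hypothetical, and the code applies the stated criterion literally (D is prime, or D has no prime factor below 20).

```cpp
#include <cassert>

// True if d is prime or has no prime factor smaller than 20.
bool isSuitableDivisor(int d) {
    if (d < 2) return false;
    const int smallPrimes[] = {2, 3, 5, 7, 11, 13, 17, 19};
    for (int p : smallPrimes) {
        if (d == p) return true;       // d is itself a small prime
        if (d % p == 0) return false;  // composite with a prime factor below 20
    }
    return true;  // no prime factor below 20 (d may or may not be prime)
}

// First method: largest suitable D with D <= b, or -1 if none exists.
int largestDivisorAtMost(int b) {
    for (int d = b; d >= 2; --d) {
        if (isSuitableDivisor(d)) return d;
    }
    return -1;
}

// Second method: smallest suitable D with D >= b.
int smallestDivisorAtLeast(int b) {
    assert(b >= 0);
    int d = (b < 2) ? 2 : b;
    while (!isSuitableDivisor(d)) ++d;
    return d;
}
```

With b = 530, `largestDivisorAtMost(530)` returns 529 = 23*23, matching the first method's example.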

Determining D

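• n = 1000

• Require S <= 4 and U <= 50.5
– S = 4 ==> α = 6/7
– U = 50.5 ==> α = 0.9
– α = min(6/7, 0.9) = 6/7
– b = n/α = 7000/6, rounded up to 1167
– note: 1173 = 3*17*23 has prime factors below 20, so it does not meet the criterion; the smallest integer >= 1167 that does is 1171, which is prime

• ==> select D = b = 1171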

Hashing with Chains

Figures (not reproduced in this transcript): Implementations; An improved implementation

Comparison with Linear Open Addressing

• Space complexity
– Let s be the space required by an element
– Let b and n denote the number of buckets and the number of elements, respectively
– Linear open addressing: b(s+2) bytes (2 for an element of the empty array)
– Chaining: 2b + 2n + ns bytes
– When n < bs/(s+2), chaining takes less space
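
The threshold on n follows from comparing the two space totals directly; the numeric values of s and b in the second line are illustrative only:

```latex
b(s+2) > 2b + 2n + ns
\iff bs > n(s+2)
\iff n < \frac{bs}{s+2}

% Illustrative values: s = 10 bytes, b = 1000 buckets
n < \frac{1000 \cdot 10}{12} \approx 833.3
\quad\Rightarrow\quad \text{chaining uses less space for } n \le 833
```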

Search time complexity

• Worst-case time complexity = n, which occurs when all elements map to the same bucket (equal to that of linear open addressing)

• Average:
– the average length of a chain is α = n/b
– average number of nodes examined in an unsuccessful search: if a chain has i nodes, the search may take 1, 2, 3, …, i examinations. Assuming equal probability, the average is

$$\text{search time} = \frac{1}{i}\sum_{j=1}^{i} j = \frac{i(i+1)}{2i} = \frac{i+1}{2}$$

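Search time complexity (continued)

• If α = 0, Un = 0

• If α < 1, Un <= α

• If α >= 1,

$$U_n \approx \frac{1}{\alpha}\sum_{j=1}^{\alpha} j = \frac{\alpha(\alpha+1)}{2\alpha} = \frac{\alpha+1}{2}$$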

Average time complexity for successful search

• We need to know the expected distance of each of the n elements from the head of its chain

• Without loss of generality, assume the elements are inserted into the chains in increasing order

• When the ith element is inserted, the expected length of its chain is (i-1)/b, and the ith element is added at the end of the chain

• A search for this element will require examination of 1 + (i-1)/b nodes

• Assuming the n elements are searched for with equal probability, then

$$S_n = \frac{1}{n}\sum_{i=1}^{n}\left\{1 + \frac{i-1}{b}\right\} = 1 + \frac{n-1}{2b} \approx 1 + \frac{\alpha}{2}$$

Comparison with linear open addressing

• The expected performance of chaining is superior, e.g., when α = 0.9:
– Chaining: Un = 0.9, Sn = 1.45
– Linear open addressing: Un = 50.5, Sn = 5.5

Skip Lists

Figure: a sorted chain with head and tail nodes (keys 20, 24, 30, 40, 60, 75, 80)

Figure: the same chain with pointers to the middle added

Figure: the same chain with pointers to every second node added

Skip List Implementation (figures and code listings not reproduced in this transcript)

An application

• Text compression
– compressor: file coding
  • run-length coding: 1000 x's + 2000 y's => 1000x2000y
  • space needed: 3002 bytes (2 bytes for string ends) => 12 bytes
– decompressor: decoding

• LZW Compression (Lempel, Ziv, and Welch)
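
The run-length coding example above can be sketched as follows. The function name is illustrative, and the sketch assumes the input contains no digit characters, so that each run can be written unambiguously as a count followed by the repeated character.

```cpp
#include <cassert>
#include <string>

// Run-length encode: every run of a repeated character becomes "<count><char>".
// Assumption: the text contains no digits, so counts cannot be confused
// with data characters.
std::string runLengthEncode(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        assert(text[i] < '0' || text[i] > '9');
        std::size_t j = i;
        while (j < text.size() && text[j] == text[i]) ++j;  // find end of run
        out += std::to_string(j - i);                       // run length
        out += text[i];                                     // repeated character
        i = j;
    }
    return out;
}
```

Encoding 1000 x's followed by 2000 y's yields the 10-character string `1000x2000y`, which is the 12 bytes quoted above once the two string-end bytes are counted.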

LZW Compression

• Try aaabbbbbbaabaaba

• encoded as: 0214537
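
The exercise can be reproduced with a compact sketch of LZW compression. Several things here are assumptions rather than the slides' implementation: the initial dictionary assigns code 0 to a and code 1 to b (which is consistent with the encoded output shown), a `std::map` keyed by whole strings stands in for the chained hash table, and no limit is placed on the number of codes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// LZW compression; alphabet[i] is pre-assigned code i (an assumption).
std::vector<int> lzwCompress(const std::string& text, const std::string& alphabet) {
    std::map<std::string, int> dict;
    for (std::size_t i = 0; i < alphabet.size(); ++i) {
        dict[std::string(1, alphabet[i])] = static_cast<int>(i);
    }
    int nextCode = static_cast<int>(alphabet.size());

    std::vector<int> codes;
    std::string p;  // longest dictionary entry matched so far
    for (char c : text) {
        assert(alphabet.find(c) != std::string::npos);
        std::string pc = p + c;
        if (dict.count(pc)) {
            p = pc;                     // keep extending the match
        } else {
            codes.push_back(dict[p]);   // emit the code of the matched prefix
            dict[pc] = nextCode++;      // new entry: matched prefix + next char
            p = std::string(1, c);      // restart the match at c
        }
    }
    if (!p.empty()) codes.push_back(dict[p]);  // flush the final match
    return codes;
}
```

Calling `lzwCompress("aaabbbbbbaabaaba", "ab")` produces the codes 0, 2, 1, 4, 5, 3, 7, which agrees with the encoding given above.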

Input/Output (code listings not reproduced in this transcript)

Dictionary organization

• Use code to represent the prefix of key

Dictionary organization (continued)

• Assume each code is 12 bits long; hence there are at most 2^12 = 4096 codes

• Use a hash table with divisor D = 4099

ChainHashTable<element, unsigned long> h(D);

Code listings (not reproduced in this transcript): Output of codes, Compression, Headers and Function main

LZW Decompression

• The dictionary is searched for an entry with a given code

• The first code in the compressed file corresponds to a single character

• For all other codes p, let q be the code that precedes p in the compressed file:
– Case 1: p is in the dictionary. text(p) is output, and the pair (next code, text(q)fc(p)) is entered into the dictionary, where fc(p) is the first character of text(p)
– Case 2: p is not in the dictionary. This can only happen when text(p) = text(q)fc(q) and the current text segment is text(q)text(q)fc(q); text(q)fc(q) is output and the pair (next code, text(q)fc(q)) is entered into the dictionary

Try

• Decode 0214537

• the result should be aaabbbbbbaabaaba
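
The decoding exercise can be checked with the sketch below. It is an illustration under assumptions, not the slides' code: the initial dictionary maps code 0 to a and code 1 to b, and the dictionary is a plain vector indexed by code rather than the chained hash table with prefix codes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// LZW decompression; alphabet[i] is pre-assigned code i (an assumption).
std::string lzwDecompress(const std::vector<int>& codes, const std::string& alphabet) {
    std::vector<std::string> text;  // text[code] = string represented by code
    for (char c : alphabet) text.push_back(std::string(1, c));

    std::string out;
    if (codes.empty()) return out;

    int q = codes[0];  // the first code corresponds to a single character
    assert(q >= 0 && q < static_cast<int>(text.size()));
    out += text[q];

    for (std::size_t i = 1; i < codes.size(); ++i) {
        int p = codes[i];
        int next = static_cast<int>(text.size());  // next unused code
        assert(p >= 0 && p <= next);
        std::string entry;
        if (p < next) {
            entry = text[p];                // Case 1: p is in the dictionary
        } else {
            entry = text[q] + text[q][0];   // Case 2: text(p) = text(q)fc(q)
        }
        text.push_back(text[q] + entry[0]); // enter (next code, text(q)fc(p))
        out += entry;
        q = p;
    }
    return out;
}
```

Decoding the codes 0, 2, 1, 4, 5, 3, 7 with the alphabet "ab" returns aaabbbbbbaabaaba; codes 2, 4, 5, and 7 each arrive before they are in the dictionary and so exercise Case 2.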

Code listings (not reproduced in this transcript): Input/Output, Dictionary organization, Input of Code, Decompression, Headers and Function main

End of Chapter 7

