1 HashTable. 2 Dictionary A collection of data that is accessed by “key” values –The keys may...

Date post: 03-Jan-2016
Upload: norah-lawrence

HashTable

Dictionary

• A collection of data that is accessed by “key” values
  – The keys may be ordered or unordered
  – Multiple key values may or may not be allowed
• Supports the following fundamental methods
  – void put(Object key, Object data)
    • Inserts data into the dictionary using the specified key
  – Object get(Object key)
    • Returns the data associated with the specified key
    • An error occurs if the specified key is not in the dictionary
  – Object remove(Object key)
    • Removes the data associated with the specified key and returns the data.
    • An error occurs if the specified key is not in the dictionary

Abstract Dictionary Example

Operation    Output    Dictionary
put(5, A)    None      ((5,A))
put(7, B)    None      ((5,A), (7,B))
put(2, C)    None      ((5,A), (7,B), (2,C))
get(A)       Error     ((5,A), (7,B), (2,C))
get(7)       B         ((5,A), (7,B), (2,C))
put(2, Q)    None      ((5,A), (7,B), (2,C), (2,Q))
get(2)       C or Q    ((5,A), (7,B), (2,C), (2,Q))
remove(Q)    Error     ((5,A), (7,B), (2,C), (2,Q))
remove(2)    C or Q    ((5,A), (7,B), (2,C)) or ((5,A), (7,B), (2,Q))

What is a Hashtable?

• A hashtable is an unordered dictionary that uses an array to store data
  – Each data element is associated with a key
  – Each key is mapped into an array index using a hash function
  – The key AND the data are then stored in the array
• Hashtables are commonly used in the construction of compiler symbol tables.

Dictionaries: AVL Trees vs. Hashtables

            AVL (Not Bad)             Hashtable
Method      Worst       Average       Worst    Average (Astounding!)
put         O(log N)    O(log N)      O(N)     O(1)
get         O(log N)    O(log N)      O(N)     O(1)
remove      O(log N)    O(log N)      O(N)     O(1)

Simple Example

Insert data into the hashtable using characters as keys

The hashtable is an array of “items”

The hashtable’s capacity is 7

The hash function must take a character as input and convert it into a number between 0 and 6.

Use the following hash function: Let P be the position of the character in the English alphabet (starting with 1). The hash function h(K) = P

The function must be normalized in order to map into the appropriate range (0-6). The normalized hash function is h(K) % 7.
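The hash function described above can be sketched in a few lines of Java. This is only an illustration: the class name `LetterHash` and the restriction to uppercase letters 'A' to 'Z' are assumptions, not part of the slides.

```java
// Sketch of the slide's character hash: alphabet position, then modulo capacity.
// Assumes keys are uppercase letters 'A'..'Z' (an assumption for this example).
class LetterHash {
    static final int CAPACITY = 7;

    // h(K) = P, the 1-based position of the letter in the English alphabet.
    static int hash(char key) {
        return key - 'A' + 1;
    }

    // Normalized hash: h(K) % 7, always an index in the range 0..6.
    static int index(char key) {
        return hash(key) % CAPACITY;
    }

    public static void main(String[] args) {
        for (char key : new char[] {'B', 'S', 'J', 'N', 'X', 'W'}) {
            System.out.println(key + " -> h=" + hash(key) + ", index=" + index(key));
        }
    }
}
```

With these keys, J (position 10) and X (position 24) both normalize to index 3, and B (2) and W (23) both normalize to index 2.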

Example

Operations (the number after each letter is its alphabet position P):
put(B2, Data1), put(S19, Data2), put(J10, Data3), put(N14, Data4), put(X24, Data5), put(W23, Data6), put(B2, Data7), get(X24), get(W23)

Array after the first four puts:
0: (N14, Data4)
1:
2: (B2, Data1)
3: (J10, Data3)
4:
5: (S19, Data2)
6:

(X24, Data5) ??? X24 maps to index 3 (24 % 7 = 3), which already holds J10.

This is called a collision

Collisions are handled via a “collision resolution policy”

From Keys to Indices

• The mapping of keys to indices of a hash table is called a hash function
• A hash function is usually the composition of two maps, a hash code map and a compression map.
  – An essential requirement of the hash function is to map equal keys to equal indices
  – A “good” hash function minimizes the probability of collisions

Popular Hash-Code Maps

• Integer cast: for numeric types with 32 bits or less, we can reinterpret the bits of the number as an int
• Component sum: for numeric types with more than 32 bits (e.g., long and double), we can add the 32-bit components.
• Polynomial accumulation: for strings of a natural language, combine the character values (ASCII or Unicode) a_0 a_1 ... a_(n-1) by viewing them as the coefficients of a polynomial:
    a_0 + a_1 x + ... + a_(n-1) x^(n-1)
  – The polynomial is computed with Horner’s rule, ignoring overflows, at a fixed value x:
    a_0 + x (a_1 + x (a_2 + ... x (a_(n-2) + x a_(n-1)) ... ))
  – The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words
• Why is the component-sum hash code bad for strings?
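One way to explore the closing question is to compare the two hash codes on anagrams. The sketch below is illustrative only; the class and method names are hypothetical, and summing individual characters is used here as the string analogue of the component sum.

```java
// Polynomial accumulation versus a plain sum of character values.
class PolynomialHash {
    // Evaluates a_0 + a_1*x + ... + a_(n-1)*x^(n-1) with Horner's rule.
    // int overflow wraps silently, which matches "ignoring overflows".
    static int polynomial(String s, int x) {
        int h = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            h = h * x + s.charAt(i);
        }
        return h;
    }

    // Sum of character values: the order of the characters is lost.
    static int componentSum(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"stop", "pots", "tops", "spot"}) {
            System.out.println(w + ": sum=" + componentSum(w)
                    + ", polynomial=" + polynomial(w, 33));
        }
    }
}
```

All four anagrams share the same component sum, so they are guaranteed to collide, whereas the polynomial code depends on character order and tells them apart.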

Popular Compression Maps

• Division: h(k) = |k| mod N
  – the choice N = 2^k is bad because not all the bits are taken into account
  – the table size N is usually chosen as a prime number
  – certain patterns in the hash codes are propagated
• Multiply, Add, and Divide (MAD): h(k) = |ak + b| mod N
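Both compression maps are short enough to sketch directly. The names below are hypothetical, and the constants a and b are arbitrary values chosen for illustration; the slides do not say how to pick them.

```java
// Division and MAD compression maps, written as plain static helpers.
class Compression {
    // h(k) = |k| mod N. The widening to long avoids the Math.abs(Integer.MIN_VALUE) trap.
    static int division(int k, int n) {
        return (int) (Math.abs((long) k) % n);
    }

    // h(k) = |a*k + b| mod N, computed in long arithmetic.
    static int mad(int k, int a, int b, int n) {
        return (int) (Math.abs((long) a * k + b) % n);
    }

    public static void main(String[] args) {
        // With N = 16 (a power of two) only the low four bits of the hash code matter,
        // so hash codes that differ only in higher bits all land in slot 0.
        for (int code : new int[] {0x10, 0x20, 0x30}) {
            System.out.println(code + ": mod 16 = " + division(code, 16)
                    + ", mod 17 = " + division(code, 17)
                    + ", MAD(a=3, b=4) mod 17 = " + mad(code, 3, 4, 17));
        }
    }
}
```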

Details and Definitions

• Load factor is the number of entries stored in the table divided by the capacity of the table
• Various means of “collision resolution” can be used. The collision resolution policy determines what is done when two keys map to the same array index.
  – Open Addressing: look for an open slot
  – Separate Chaining: keep a list of key/value pairs in a slot

Example

Operations: put(B2, Data1), put(S19, Data2), put(J10, Data3), put(N14, Data4), put(X24, Data5), put(W23, Data6), get(X24), get(W23)

Array after all six puts, using linear probing:
0: (N14, Data4)
1:
2: (B2, Data1)
3: (J10, Data3)
4: (X24, Data5)
5: (S19, Data2)
6: (W23, Data6)

Open Addressing: When a collision occurs, probe for an empty slot. In this case, use linear probing (looking “down”) until an empty slot is found. X24 hashes to index 3, which is taken, so it goes to index 4. W23 hashes to index 2 and probes past 3, 4, and 5 before landing at index 6.

Open Addressing

• Uses a “probe sequence” to look for an empty slot to use

• The first location examined is the “hash” address

• The sequence of locations examined when locating data is called the “probe sequence”

• The probe sequence {s(0), s(1), s(2), … } can be described as follows:

s(i) = norm(h(K) + p(i))

– where h(K) is the “hash function” mapping K to an integer

– p(i) is a “probing function” returning an offset for the ith probe

– norm is the “normalizing function” (usually division modulo capacity)

Open Addressing

• Linear probing
  – use p(i) = i
  – The probe sequence becomes {norm(h(k)), norm(h(k)+1), norm(h(k)+2), …}
• Quadratic probing
  – use p(i) = i^2
  – The probe sequence becomes {norm(h(k)), norm(h(k)+1), norm(h(k)+4), …}
  – Must be careful to allow full coverage of “empty” array slots
  – A theorem states that this method will find an empty slot if the table is not more than ½ full.
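The general form s(i) = norm(h(K) + p(i)) can be sketched with the probing function passed in as a parameter. The class name and the choice of "mod capacity" as norm are assumptions for this example.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Generates the first few entries of a probe sequence s(i) = norm(h(K) + p(i)).
class ProbeSequence {
    static int[] probes(int hash, IntUnaryOperator p, int capacity, int count) {
        int[] s = new int[count];
        for (int i = 0; i < count; i++) {
            // norm is taken to be "mod capacity"; floorMod keeps the result non-negative.
            s[i] = Math.floorMod(hash + p.applyAsInt(i), capacity);
        }
        return s;
    }

    public static void main(String[] args) {
        int hash = 5, capacity = 13;
        System.out.println("linear:    "
                + Arrays.toString(probes(hash, i -> i, capacity, 5)));      // [5, 6, 7, 8, 9]
        System.out.println("quadratic: "
                + Arrays.toString(probes(hash, i -> i * i, capacity, 5)));  // [5, 6, 9, 1, 8]
    }
}
```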

Linear Probing

• If the current location is used, try the next table location

    linear_probing_insert(K)
        if (table is full) error
        probe = h(K)
        while (table[probe] occupied)
            probe = (probe + 1) mod M
        table[probe] = K

• Lookups walk along table until the key or an empty slot is found
• Uses less memory than chaining. (Don’t have to store all those links)
• Slower than chaining. (May have to walk along table for a long way.)
• Deletion is more complex. (Either mark the deleted slot or fill in the slot by shifting some elements down.)

Linear Probing Example

• h(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31, 73

[Figure: the 13-slot table as the keys are inserted; 44, 32, 31, and 73 each collide and move to a later slot]

Linear Probing Example (cont.)
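The example can be replayed with a small Java version of linear_probing_insert. This is a sketch under stated assumptions: it stores bare integer keys with no associated data, and the class name and the `find` helper are hypothetical additions.

```java
// Linear probing over integer keys, using h(k) = k mod capacity.
class LinearProbingTable {
    private final Integer[] table;
    private int size;

    LinearProbingTable(int capacity) {
        table = new Integer[capacity];
    }

    // Inserts the key and returns the slot it ended up in.
    int insert(int key) {
        if (size == table.length) {
            throw new IllegalStateException("table is full");
        }
        int probe = Math.floorMod(key, table.length);
        while (table[probe] != null) {           // occupied: try the next slot
            probe = (probe + 1) % table.length;
        }
        table[probe] = key;
        size++;
        return probe;
    }

    // Walks from the hash address until the key or an empty slot is found.
    int find(int key) {
        int probe = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            if (table[probe] == null) return -1;
            if (table[probe] == key) return probe;
            probe = (probe + 1) % table.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(13);
        for (int key : new int[] {18, 41, 22, 44, 59, 32, 31, 73}) {
            System.out.println(key + " -> slot " + t.insert(key));
        }
    }
}
```

With the keys from the slide, the final layout is 41 in slot 2, 18 in slot 5, 44 in 6, 59 in 7, 32 in 8, 22 in 9, 31 in 10, and 73 in 11.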

[Figure: linear probing. Keys are mapped by h into a table with slots 0 to N-1; successive frames probe h(key), (h(key) + 1) mod N, (h(key) + 2) mod N, (h(key) + 3) mod N, and (h(key) + 4) mod N.]

[Figure: quadratic probing with N = 17 (prime). Keys are mapped by h into a table with slots 0 to N-1; successive frames probe h(key), (h(key) + 1) mod N, (h(key) + 4) mod N, (h(key) + 9) mod N, and so on up to (h(key) + 121) mod N and (h(key) + 144) mod N.]

Quadratic probing, N = 17 (prime)

Theorem:

If quadratic probing is used, and the table size is prime, then a new element can always be inserted if the table is at least half empty.

Application:

Probing visited only 9 of the 17 bins, but if the table is half empty, not all those 9 bins can be occupied, so we must be able to insert a new element in one of them.
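The count of 9 bins can be checked by enumerating the probe sequence. The helper below is a hypothetical sketch; it collects the distinct bins reached from a given start.

```java
import java.util.TreeSet;

// Collects the distinct bins that quadratic probing can reach from one hash address.
class QuadraticCoverage {
    static TreeSet<Integer> reachable(int start, int capacity) {
        TreeSet<Integer> bins = new TreeSet<>();
        // i^2 mod capacity repeats with period capacity, so capacity probes are enough.
        for (int i = 0; i < capacity; i++) {
            bins.add((start + i * i) % capacity);
        }
        return bins;
    }

    public static void main(String[] args) {
        TreeSet<Integer> bins = reachable(0, 17);
        System.out.println(bins.size() + " bins: " + bins);
        // prints "9 bins: [0, 1, 2, 4, 8, 9, 13, 15, 16]"
    }
}
```

Since 9 is more than half of 17, a table that is at least half empty cannot have all 9 reachable bins occupied.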

Collisions

Given N people in a room, what are the odds that at least two of them will have the same birthday?

• Table capacity of 365
• After N insertions what are the odds of at least one collision?

Who wants to be a Millionaire?

Assume N = 23 (load factor is therefore 23/365 = 6.3%). What are the approximate odds that two of these people have the same birthday?

10%    25%    50%    75%    90%    99%

Collisions

Let Q(n) be the probability that when n people are in a room, no two of them have the same birthday.

Let P(n) be the probability that when n people are in a room, at least two of them have the same birthday.

P(n) = 1 – Q(n)

Consider that:

Q(1) = 1

Q(2) = the odds that the first person does not collide, Q(1), times the odds that one more person does not collide

Q(2) = Q(1) * 364/365

Q(3) = Q(2) * 363/365

Q(4) = Q(3) * 362/365

Q(n) = (365/365) * (364/365) * (363/365) * … * ((365-n+1)/365)

Q(n) = 365! / (365^n * (365-n)!)
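The product form of Q(n) is easier to evaluate than the factorial form, which overflows quickly. A minimal sketch, with a hypothetical class name and the capacity passed as a parameter:

```java
// Evaluates P(n) = 1 - Q(n) using the running product for Q(n).
class Birthday {
    static double collisionOdds(int n, int capacity) {
        double q = 1.0;
        for (int i = 0; i < n; i++) {
            q *= (double) (capacity - i) / capacity;   // one more person avoids all earlier ones
        }
        return 1.0 - q;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 15, 23, 30, 40, 45, 100}) {
            System.out.printf("N = %3d: %.4f%%%n", n, 100 * collisionOdds(n, 365));
        }
    }
}
```

For N = 23 this gives roughly 50.7%, so the answer to the quiz question is 50%.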

Collisions

[Figure: odds of a collision plotted against the number of people]

N      Odds of Collision
5      2.7%
10     11.7%
15     25.3%
23     50.7%
30     70.6%
40     89.1%
45     94.1%
100    99.9999%

Collisions are more frequent than you might expect, even for low load factors!

Hashcodes and table size

• Hashcodes should be fast/easy to compute
• Keys should evenly distribute across the table
• Hashtable capacities are usually kept at prime values to avoid problems with probe sequences
  – Consider inserting into the table below using quadratic probing and a key object that hashes to index 2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
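The problem with a capacity of 16 can be demonstrated in code. The slide's picture of which slots are already filled was not preserved, so the sketch below assumes slots 2, 3, 6, and 11 are occupied; the class and method names are also hypothetical.

```java
// Quadratic probing that gives up after `capacity` probes.
class QuadraticFailure {
    // Returns the first free slot on the probe sequence, or -1 if none is reached.
    static int findSlot(boolean[] occupied, int hash) {
        int n = occupied.length;
        for (int i = 0; i < n; i++) {
            int slot = (hash + i * i) % n;
            if (!occupied[slot]) return slot;
        }
        return -1;
    }

    static boolean[] tableWith(int capacity, int... usedSlots) {
        boolean[] occupied = new boolean[capacity];
        for (int s : usedSlots) occupied[s] = true;
        return occupied;
    }

    public static void main(String[] args) {
        // Capacity 16: i*i mod 16 only takes the values 0, 1, 4, 9, so a key hashing
        // to index 2 can only ever probe slots 2, 3, 6, and 11.
        System.out.println(findSlot(tableWith(16, 2, 3, 6, 11), 2));  // -1, with 12 slots still free
        // Capacity 17 (prime): the same four occupied slots do not block the insert.
        System.out.println(findSlot(tableWith(17, 2, 3, 6, 11), 2));  // 1
    }
}
```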

We need to have a little talk

• How to remove an item from a hashtable that uses open addressing?
• Consider a table of size 11 with the following sequence of operations using h(K) = K % 11 and linear probing (p(i) = i)
  – put(36, D1)
  – put(23, D2)
  – put(4, D3)
  – put(46, D4)
  – put(1, D5)
  – remove(23)
  – remove(36)
  – get(1)

Removal

• If an item is removed from the table, it could mess up gets on other items in the table.

• Fix the problem by using a “tombstone” marker to indicate that while the item has been removed from the array slot, the slot should be considered “occupied” for purposes of later gets.
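The sequence of operations from the previous slide can be replayed against a small tombstone-based table. This is a sketch under stated assumptions: keys are non-negative ints, put is never called for a key that is already present, and a missing key yields null rather than the error the dictionary slides describe. All names are hypothetical.

```java
// Open addressing with linear probing and tombstones, h(K) = K % capacity.
class TombstoneTable {
    private final Integer[] keys;
    private final String[] values;
    private final boolean[] tombstone;   // true where an item was removed

    TombstoneTable(int capacity) {
        keys = new Integer[capacity];
        values = new String[capacity];
        tombstone = new boolean[capacity];
    }

    void put(int key, String value) {
        int n = keys.length;
        int probe = key % n;
        for (int i = 0; i < n; i++) {
            if (keys[probe] == null) {           // empty or tombstoned slot: reuse it
                keys[probe] = key;
                values[probe] = value;
                tombstone[probe] = false;
                return;
            }
            probe = (probe + 1) % n;
        }
        throw new IllegalStateException("table is full");
    }

    private int find(int key) {
        int n = keys.length;
        int probe = key % n;
        for (int i = 0; i < n; i++) {
            if (keys[probe] == null && !tombstone[probe]) return -1;  // truly empty: stop
            if (keys[probe] != null && keys[probe] == key) return probe;
            probe = (probe + 1) % n;             // occupied by another key, or a tombstone
        }
        return -1;
    }

    String get(int key) {
        int slot = find(key);
        return slot < 0 ? null : values[slot];
    }

    String remove(int key) {
        int slot = find(key);
        if (slot < 0) return null;
        String old = values[slot];
        keys[slot] = null;
        values[slot] = null;
        tombstone[slot] = true;                  // later gets keep probing past this slot
        return old;
    }
}
```

Here key 1 hashes to index 1 but is pushed to index 5 by 23, 46, 36, and 4. After remove(23) and remove(36), a get(1) that stopped at the first empty slot would wrongly report a miss at index 1; with tombstones at indices 1 and 3 the search continues and finds D5.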

Double Hashing

• Another probing strategy is to use “double hashing”
• The probe sequence becomes
    s(k, i) = norm(h(k) + i * h2(k))
• The probe sequence is determined by two hash functions, which typically gives better results than linear or quadratic probing.

Double Hashing Example

• h1(K) = K mod 13
• h2(K) = 8 - (K mod 8)
• we want h2 to be an offset to add

Double Hashing Example (cont.)
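The keys used in the slide's worked example were not preserved, so the sketch below reuses the keys from the linear probing example (18, 41, 22, 44, 59, 32, 31, 73) as an assumption. It uses the two hash functions given above; the class and method names are hypothetical.

```java
// Double hashing with h1(K) = K mod 13 and h2(K) = 8 - (K mod 8), for non-negative keys.
class DoubleHashing {
    static final int CAPACITY = 13;

    static int h1(int k) { return k % CAPACITY; }

    // Always in the range 1..8, so the offset is never zero.
    static int h2(int k) { return 8 - (k % 8); }

    // i-th probe: s(k, i) = (h1(k) + i * h2(k)) mod 13
    static int probe(int k, int i) {
        return (h1(k) + i * h2(k)) % CAPACITY;
    }

    // Inserts the key and returns its slot, or -1 if no free slot was found.
    static int insert(Integer[] table, int k) {
        for (int i = 0; i < table.length; i++) {
            int slot = probe(k, i);
            if (table[slot] == null) {
                table[slot] = k;
                return slot;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[CAPACITY];
        for (int k : new int[] {18, 41, 22, 44, 59, 32, 31, 73}) {
            System.out.println(k + " -> slot " + insert(table, k));
        }
    }
}
```

For instance, 44 hashes to slot 5, which holds 18; its offset is h2(44) = 4, so it tries slot 9 (taken by 22) and then slot 0. Because the capacity 13 is prime, any offset from 1 to 8 eventually visits every slot.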

Separate Chaining

• A way to “avoid” collisions
• Each array slot contains a list of data elements
• The fundamental methods then become:
  – PUT: hash into array and add to list
  – GET: hash into array and search the list
  – REMOVE: hash into array and remove from list
• Java’s built-in HashMap and Hashtable classes use separate chaining
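A minimal chained table might look like the sketch below. It is not how HashMap is implemented; it assumes uppercase letter keys hashed by alphabet position, as in the earlier simple example, and it replaces the data when a key is put twice (one of the two policies the dictionary slides allow). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Separate chaining: each array slot holds a list of (key, data) entries.
class ChainedTable {
    private static final class Entry {
        final char key;
        String data;
        Entry(char key, String data) { this.key = key; this.data = data; }
    }

    private final List<List<Entry>> slots = new ArrayList<>();

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) slots.add(new LinkedList<>());
    }

    // Hash into the array: alphabet position modulo capacity.
    private List<Entry> chain(char key) {
        return slots.get((key - 'A' + 1) % slots.size());
    }

    // PUT: hash into array and add to list (or overwrite an existing key).
    void put(char key, String data) {
        for (Entry e : chain(key)) {
            if (e.key == key) { e.data = data; return; }
        }
        chain(key).add(new Entry(key, data));
    }

    // GET: hash into array and search the list; null if absent.
    String get(char key) {
        for (Entry e : chain(key)) {
            if (e.key == key) return e.data;
        }
        return null;
    }

    // REMOVE: hash into array and remove from list; null if absent.
    String remove(char key) {
        Iterator<Entry> it = chain(key).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.key == key) { it.remove(); return e.data; }
        }
        return null;
    }

    int chainLength(int slot) { return slots.get(slot).size(); }
}
```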

Chaining Example

Operations: put(B2, Data1), put(S19, Data2), put(J10, Data3), put(N14, Data4), put(X24, Data5), put(W23, Data6), put(B2, Data7), get(X24), get(W23)

[Figure: array with slots 0 to 6 holding (N14, Data4) at index 0, (B2, Data1) at index 2, (J10, Data3) at index 3, and (S19, Data2) at index 5. (X24, Data5) ??? also hashes to index 3. With separate chaining it is simply added to the list in that slot, after (J10, Data3): “I’m so relieved!”]

Theoretical Results

• Let N/M be the load factor: the average number of keys per array index
• Analysis is probabilistic, rather than worst-case

[Table: Expected Number of Probes, with columns “found” and “not found”]

[Figure: Expected Number of Probes vs. Load Factor]

Summary

• Dictionaries may be ordered or unordered
  – Unordered can be implemented with
    • lists (array-based or linked)
    • hashtables (best solution)
  – Ordered can be implemented with
    • lists (array-based or linked)
    • trees (AVL (best solution), splay, BST)

