Date post: | 03-Jan-2016 |
Upload: | adrienne-roderick |
COSC 2007 Data Structures II
Chapter 13: Advanced Implementation of Tables IV

Topics
How to choose a hash function
Closed hashing
  Linear probing
  Quadratic probing
  Double hashing

Hash Functions
A good hash function:
  Is easy and fast to compute
  Produces a minimal number of collisions
  Spreads data items uniformly throughout the array
Hashing problems reduce to the following points:
  Finding a hashing method that minimizes collisions
  Resolving collisions when they do happen

Hashing Methods
Integer type
  It is sufficient for a hash function to operate on integers
  Any arbitrary integer can be converted into an integer within a certain range
  The index of the hash table lies within a specific range
Solutions
  Digit selection
  Folding
  Modulo arithmetic

Hashing Methods
Digit Selection
  Choose a group of digits from the number
  Use a combination of mod/div operations on the search key
  One of the simplest hashing methods; it is effective only when the selected digits vary well across the keys

Hashing Methods
Digit Selection Example
Assume table size = 1000
  Key = 01234567
  Choose the 2nd, 4th, and last digits
  H(key) = 137
For a 9-digit key = d1 d2 d3 d4 d5 d6 d7 d8 d9
  Choose the leftmost 3 digits: H(key) = key div 1000000 = d1 d2 d3
  Choose the rightmost 3 digits: H(key) = key mod 1000 = d7 d8 d9
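
The digit-selection examples above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the key is passed as a string in the first function so that the leading zero of 01234567 is preserved. Picking the 2nd, 4th, and last digits of 01234567 yields the digits 1, 3, and 7.

```python
def select_digits(key, positions):
    """Build a hash value from the digits at the given 1-based positions.

    key is a string of digits (hypothetical helper, not from the slides).
    """
    return int("".join(key[p - 1] for p in positions))


def leftmost3(key):
    """Leftmost 3 digits of a 9-digit integer key: key div 1000000."""
    return key // 1_000_000


def rightmost3(key):
    """Rightmost 3 digits of an integer key: key mod 1000."""
    return key % 1000


print(select_digits("01234567", [2, 4, 8]))  # digits 1, 3, 7
print(leftmost3(123456789))
print(rightmost3(123456789))
```

With a table of size 1000, any three selected digits give an index in the range 0 to 999, so no further reduction is needed.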

Hashing Methods
Digit Selection: Mid-square Method (Multiplication)
First variant
  The key is squared, then some digits of the square are selected to give the index.
Example
  k = 54321
  k^2 = 2950771041
  Pick the 3 middle digits: H(k) = index = 077
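
A minimal sketch of the mid-square method follows. The slide does not say how "middle" is defined when the square has an even number of digits, so the centering rule below (round the start position down) is an assumption chosen to reproduce the 077 of the example; the function name is also hypothetical.

```python
def mid_square(key, num_digits):
    """Square the key and return num_digits digits taken from the middle."""
    squared = str(key * key)
    # Assumed centering rule: start so that the selected digits sit in the
    # middle, rounding down when the leftover digits cannot be split evenly.
    start = (len(squared) - num_digits) // 2
    return int(squared[start:start + num_digits])


print(54321 * 54321)         # 2950771041
print(mid_square(54321, 3))  # the digits "077", printed as the integer 77
```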

Hashing Methods
Folding Method
  Digits are added together instead of just being selected
  Digits can first be grouped, and the groups are then added
  Folding can be done more than once on the search key

Hashing Methods
Folding Method Example
  Key = 1234567
  H(Key) = 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28
Disadvantage
  All values fall in a small range (0 to 63 for a 7-digit key)
Solution
  Divide into groups, then fold
  Key = 1234567
  Groups: 12 345 67
  Fold: 12 + 345 + 67 = 424
  Hash again to fit into the table size
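
Both folding variants can be sketched as follows. The function names and the table size of 101 used for the final "hash again" step are assumptions made for this illustration only.

```python
def fold_digits(key):
    """Add the individual decimal digits of the key."""
    return sum(int(d) for d in str(key))


def fold_groups(key, group_sizes):
    """Split the digits into groups of the given sizes and add the groups."""
    digits = str(key)
    total, pos = 0, 0
    for size in group_sizes:
        total += int(digits[pos:pos + size])
        pos += size
    return total


TABLE_SIZE = 101  # assumed table size, only for the final reduction

print(fold_digits(1234567))                          # 1+2+3+4+5+6+7
print(fold_groups(1234567, [2, 3, 2]))               # 12 + 345 + 67
print(fold_groups(1234567, [2, 3, 2]) % TABLE_SIZE)  # hash again to fit
```

Grouping widens the range of possible sums, which is why the grouped result usually needs one more reduction step to fit the table.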

Hashing Methods
Modulo Arithmetic
  Choose a prime table size
  Take the search key modulo the size of the table
    h(x) = x mod TableSize
  Items will be distributed over the table
Advantages
  Simple
  Reduces collisions: items tend to be evenly distributed if the table size is a prime number
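
To see why a prime table size tends to help, the small sketch below hashes the same keys with a non-prime size (10) and a prime size (11). The keys are an assumed example chosen to share a common factor with 10; real data will behave less dramatically.

```python
def h(x, table_size):
    """Modulo hash function: h(x) = x mod TableSize."""
    return x % table_size


keys = [10, 20, 30, 40, 50, 60, 70]  # assumed sample keys, all multiples of 10

non_prime = [h(k, 10) for k in keys]
prime = [h(k, 11) for k in keys]

print(non_prime)  # every key lands in the same slot
print(prime)      # every key lands in a different slot
```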

Hashing Methods
What should be done if the search key is a character string?
  Convert the character string into an integer before applying the hash function
How should we do that?
  Add up the ASCII codes of the characters
    Can lead to collisions (e.g. NOTE and TONE produce the same hash value)
  Write a numeric value for each character in binary and concatenate the results

Hashing Methods
Example: Key = NOTE
Numeric value for each character (its position in the alphabet), written in 5 bits
  N = 14 = (01110)
  O = 15 = (01111)
  T = 20 = (10100)
  E = 5 = (00101)
Concatenation
  Binary result: x = (01110 01111 10100 00101)
  Equivalent decimal: x = 474,757
Apply the hash function
  h(x) = x mod TableSize
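
The NOTE example can be checked with a short sketch. It assumes keys made only of the letters A to Z, so that each letter's position (1 to 26) fits in 5 bits; the function names are hypothetical. The sketch also shows why simply adding ASCII codes cannot tell NOTE from TONE.

```python
def letter_value(ch):
    """Position of a letter in the alphabet: A = 1, ..., Z = 26."""
    return ord(ch.upper()) - ord("A") + 1


def concat_key(word):
    """Concatenate the 5-bit values of the letters into one integer."""
    x = 0
    for ch in word:
        x = (x << 5) | letter_value(ch)  # shift left 5 bits, append next letter
    return x


def ascii_sum(word):
    """Order-insensitive alternative: add the ASCII codes."""
    return sum(ord(ch) for ch in word)


print(concat_key("NOTE"))                        # 474757
print(ascii_sum("NOTE") == ascii_sum("TONE"))    # True: the sums collide
print(concat_key("NOTE") == concat_key("TONE"))  # False: order is preserved
```

The resulting integer is then reduced with h(x) = x mod TableSize like any other integer key.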

Closed Hashing (Open Addressing)
No secondary data structure
  All the data goes inside the table.
On collision, try alternate cells until an empty cell is found.
  How?
A bigger table is needed.

Linear Probing
Linear search from the position where the collision occurred.

Linear Probing
[Figure: a hash table with slots [0] to [100] holding Number 506643548, Number 233667136, Number 281942902, Number 155778322, and Number 580625685. A new record arrives with hash value [2].]
The new record's hash value is [2], but [2] is occupied. This is called a collision, because there is already another valid record at [2].
When a collision occurs, move forward until you find an empty spot.
[5] is empty, so the new record (Number 701466868) goes in that empty spot.

Linear Probing
Try the next index in the array; when the maximum subscript is reached, return to the first index (wrap around)
Try alternate cells
  Cells h_0(x), h_1(x), h_2(x), ... are tried until a free cell is found
  h_i(x) = (hash(x) + f(i)) mod TSIZE, with f(0) = 0
Linear probing: f(i) = i
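
The probe formula above can be turned into a small insertion routine. This is a sketch under stated assumptions: the table size of 11, the choice hash(x) = x mod TSIZE, and the sample keys are not from the slides.

```python
TSIZE = 11    # assumed small table size for the demo
EMPTY = None  # marker for a slot that has never been used


def hash_fn(x):
    return x % TSIZE


def probe_insert(table, key):
    """Insert key using h_i(x) = (hash(x) + i) mod TSIZE; return the slot used."""
    for i in range(TSIZE):
        slot = (hash_fn(key) + i) % TSIZE  # f(i) = i, with wrap around
        if table[slot] is EMPTY:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


table = [EMPTY] * TSIZE
slots = [probe_insert(table, k) for k in (13, 24, 35, 10, 21)]
print(slots)  # 13, 24, 35 collide at slot 2; 21 wraps from slot 10 to slot 0
```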

Searching for a Key
[Figure: the same table with slots [0] to [100], now also holding Number 701466868. The key being searched for (labeled Number 265-7917 on the slides) has hash value [2].]
The data that's attached to a key can be found fairly quickly.
Calculate the hash value. Check that location of the array for the key.
Keep moving forward until you find the key, or you reach an empty spot.
  Three records in a row answer "Not me."; the next one answers "Yes!"
When the item is found, the information can be copied to the necessary location.
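
The search procedure described above (start at the hash value, move forward until the key or an empty spot is found) might look like this. The table size, hash function, and sample keys are assumptions chosen so that the search passes three non-matching records before finding its key, as in the slides.

```python
TSIZE = 11    # assumed table size
EMPTY = None  # marker for a slot that has never been used


def hash_fn(key):
    return key % TSIZE


def probe_search(table, key):
    """Return the slot that holds key, or -1 if the key is not in the table."""
    for i in range(TSIZE):
        slot = (hash_fn(key) + i) % TSIZE
        if table[slot] is EMPTY:
            return -1    # reached an empty spot: the key cannot be further on
        if table[slot] == key:
            return slot  # "Yes!"
    return -1            # every slot was examined without success


table = [EMPTY] * TSIZE
table[2], table[3], table[4], table[5] = 13, 24, 35, 46  # all hash to slot 2

print(probe_search(table, 46))  # checks slots 2, 3, 4, then finds it at 5
print(probe_search(table, 57))  # hashes to 2, stops at the empty slot 6
```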

Deleting a Record
[Figure: the same table. The record Number 506643548 says "Please delete me." and is removed; the other five records stay in place.]
Records may also be deleted from a hash table.
But the location must not be left as an ordinary "empty spot", since that could interfere with searches.
The location must be marked in some special way so that a search can tell that the spot used to have something in it.
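
One common way to mark a deleted location "in some special way" is a sentinel value, often called a tombstone. The sketch below is an illustration under assumptions (table size 11, hash(x) = x mod TSIZE, hypothetical names); it contrasts a tombstone with naively resetting the slot to empty.

```python
TSIZE = 11          # assumed table size
EMPTY = None        # slot has never held a record
DELETED = object()  # special marker: slot used to hold a record


def hash_fn(key):
    return key % TSIZE


def search(table, key):
    """Linear-probing search that skips over DELETED markers."""
    for i in range(TSIZE):
        slot = (hash_fn(key) + i) % TSIZE
        if table[slot] is EMPTY:
            return -1
        if table[slot] is not DELETED and table[slot] == key:
            return slot
    return -1


def delete(table, key):
    """Replace the record with the DELETED marker; return True on success."""
    slot = search(table, key)
    if slot == -1:
        return False
    table[slot] = DELETED
    return True


table = [EMPTY] * TSIZE
table[2], table[3], table[4] = 13, 24, 35  # all three keys hash to slot 2
delete(table, 24)
print(search(table, 35))  # still found at slot 4: the marker keeps the chain

naive = [EMPTY] * TSIZE
naive[2], naive[3], naive[4] = 13, 24, 35
naive[3] = EMPTY          # wrong: leaves an ordinary empty spot
print(search(naive, 35))  # -1: the search stops too early
```

An insertion routine may reuse slots marked DELETED; that part is left out here to keep the sketch short.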

Linear Probing
Advantage
  Uses less memory than chaining
    Don't have to store all the links
Disadvantages
  Can be slower than chaining
    May have to walk along the table for a long way
  Difficult to delete a key and its associated record
    Deletion has an impact on the search process
  Clustering
    Primary clustering: the table contains groups of consecutively occupied locations

Quadratic Probing
Linear probing: f(i) = i
Quadratic probing: f(i) = i^2
Insert 10, 40, 60, 20, 30, 70, 80 into a table with slots 0 to 9

Key   Probe            Slot
10    0^2 = 0          0
40    1^2 = 1          1
60    2^2 = 4          4
20    3^2 = 9          9
30    4^2 mod 10 = 6   6
70    5^2 mod 10 = 5   5
80    6^2 mod 10 = 6   6 (already occupied by 30)

Quadratic Probing
Advantages
  Easy to compute
  Avoids primary clustering
Disadvantages
  Not all entries are searched
    Might not encounter a free storage location even when there are locations that are still free
  Elements that have the same hash value will probe the same set of alternate cells
    Secondary clustering
    Not a big problem in practice
    Use a good hash function
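
The first disadvantage can be reproduced with the keys 10, 40, 60, 20, 30, 70, 80 and a table of size 10. The sketch assumes hash(x) = x mod 10 (so every key has home position 0) and uses hypothetical function names; it shows that the last key finds no free slot even though four slots are still free.

```python
TSIZE = 10  # table with slots 0 to 9


def quad_insert(table, key):
    """Try slots (hash(key) + i^2) mod TSIZE; return the slot, or None on failure."""
    home = key % TSIZE  # assumed hash function
    for i in range(TSIZE):  # i^2 mod TSIZE repeats after TSIZE probes
        slot = (home + i * i) % TSIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    return None


table = [None] * TSIZE
result = {k: quad_insert(table, k) for k in (10, 40, 60, 20, 30, 70, 80)}
reachable = sorted({(i * i) % TSIZE for i in range(TSIZE)})

print(result)             # 80 maps to None: no free slot was encountered
print(reachable)          # the only offsets quadratic probing can reach here
print(table.count(None))  # number of slots that are still free
```

Choosing a prime table size and keeping the table at most half full is the usual remedy, because the first half of the probe sequence is then guaranteed to visit distinct slots.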

Double Hashing
Use two hash functions
  The first, as before, generates the 'home' position.
  The second generates a sequence of offsets from the home position that define the probe sequence.
    probe = (probe + offset) mod N
If the size of the table is prime and the offset is not a multiple of N, this method will eventually examine every position in the table.
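
A minimal sketch of the probe sequence follows. The slides do not give the two hash functions, so h1 and h2 below are assumptions; h2 is written so that the offset is never 0. With a prime N, the sequence visits every position exactly once.

```python
N = 11  # assumed prime table size


def h1(key):
    """First hash function: the home position (assumed form)."""
    return key % N


def h2(key):
    """Second hash function: an offset in 1..N-1, never 0 (assumed form)."""
    return 1 + key % (N - 1)


def probe_sequence(key):
    """Positions examined for key, using probe = (probe + offset) mod N."""
    probe, offset = h1(key), h2(key)
    sequence = []
    for _ in range(N):
        sequence.append(probe)
        probe = (probe + offset) % N
    return sequence


print(probe_sequence(24))  # home position 2, offset 5
print(probe_sequence(13))  # same home position 2, but offset 4
```

Keys 24 and 13 share a home position yet follow different probe sequences, which is how double hashing avoids the secondary clustering seen with quadratic probing.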

Problems with Closed Hashing
Table too full
  Running time too long
  Inserts could fail
Table size must be chosen in advance
  Don't know the number of elements
Rehashing
  Build a new table that is about twice as big
  Hash the elements into the new table
    Need to apply the new hash function to every item in the old hash table
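
Rehashing can be sketched as below. The slides only say "about twice as big"; rounding up to the next prime, the use of linear probing, and the hash function key mod table size are assumptions of this illustration, as are all function names.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def insert(table, key):
    """Linear-probing insert; the hash function depends on the table size."""
    size = len(table)
    for i in range(size):
        slot = (key % size + i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


def rehash(old_table):
    """Build a table about twice as big and re-insert every stored key."""
    new_table = [None] * next_prime(2 * len(old_table))
    for key in old_table:
        if key is not None:
            insert(new_table, key)  # new size, so a new hash value
    return new_table


old = [None] * 5
for k in (3, 8, 13, 18):
    insert(old, k)  # all four keys collide at slot 3 in the small table
new = rehash(old)

print(old)       # crowded: one long cluster
print(len(new))  # 11, the first prime at or above 10
print(new)       # the same keys, now spread out without collisions
```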

Summary
Hash tables are specialized for dictionary operations: Insert, Delete, Search
Principle: Turn the key field of the record into a number, which we use as an index for locating the item in an array.
  O(1) in the ideal case
Problems: finding a good hash function, collisions, wasted space, no support for ordering queries
Implementations: open hashing, closed hashing, dynamic hashing

Review
What is a perfect hash function?
What is a collision?
What is meant by clustering? How does clustering affect the overall efficiency of hashing?
What is a bucket?
What is the time complexity for insertion, deletion, and search in hashing?