Date post: | 21-Dec-2015 |
Sets and Maps (and Hashing)
Chapter 9
Chapter 9: Sets and Maps 2
Chapter Objectives
• To understand the Java Map and Set interfaces and how to use them
• To learn about hash codes and how they are used to facilitate efficient search and retrieval
• To study two forms of hash tables—open addressing and chaining—and to understand their relative benefits and performance tradeoffs
• To learn how to implement both hash table forms
• To be introduced to the implementation of Maps and Sets
• To see how two earlier applications can be more easily implemented using Map objects for data storage
Review of Sets
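• A set is unordered and has no duplicate elements
• Suppose $A = \{1,3,5,7,9,11\}$ and $B = \{2,3,5,7,11,13\}$. Then
  • $A \cup B = \{1,2,3,5,7,9,11,13\}$ (union)
  • $A \cap B = \{3,5,7,11\}$ (intersection)
  • $A - B = \{1,9\}$ (difference)
  • $B - A = \{2,13\}$
  • If $C = \{3,5,9\}$, then $C \subseteq A$ (C is a subset of A)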
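The set operations above correspond to the bulk methods of java.util.Set: addAll (union), retainAll (intersection), removeAll (difference), and containsAll (subset test). The sketch below is a minimal illustration, not the chapter's own code; the class name SetAlgebra is hypothetical, and TreeSet is used only so that the printed order is sorted and predictable.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SetAlgebra {
    // Union: copy A, then add every element of B
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Intersection: copy A, then keep only the elements that are also in B
    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Difference A - B: copy A, then remove every element of B
    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<>(Arrays.asList(1, 3, 5, 7, 9, 11));
        Set<Integer> b = new TreeSet<>(Arrays.asList(2, 3, 5, 7, 11, 13));
        Set<Integer> c = new TreeSet<>(Arrays.asList(3, 5, 9));

        System.out.println(union(a, b));        // [1, 2, 3, 5, 7, 9, 11, 13]
        System.out.println(intersection(a, b)); // [3, 5, 7, 11]
        System.out.println(difference(a, b));   // [1, 9]
        System.out.println(difference(b, a));   // [2, 13]
        System.out.println(a.containsAll(c));   // true: C is a subset of A
    }
}
```

Each helper copies its first argument before modifying it, because addAll, retainAll, and removeAll change the set they are called on.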
Sets and the Set Interface
• The part of the Collection hierarchy that relates to sets includes three interfaces, two abstract classes, and two concrete classes
The Set Abstraction
• A set is a collection that contains no duplicate elements
  • and at most one null element
• In a set, the index of an element is meaningless. If s is a set:
  • s.contains("apple") returns true or false
  • s.indexOf("apple") makes no sense
  • s.get(i) is also nonsensical
• Operations on sets include:
  • Testing for membership
  • Adding (inserting) elements
  • Removing elements
  • Union
  • Intersection
  • Difference
  • Subset
The Set Interface and Methods
• Required methods for:
  • Testing set membership
  • Testing for an empty set
  • Determining set size
  • Creating an iterator over the set
• Two optional methods:
  • To add an element
  • To remove an element
• Constructors enforce the no-duplicates rule, and the add method does not insert a duplicate item
Comparison of Lists and Sets
• Duplicate elements
  • OK in a list
  • Not allowed in a set: Set.add returns false if you try to insert a duplicate element
• The get method
  • A list has a get method
  • A set has no get method (an index is meaningless)
• Iterators
  • Lists have iterators
  • You can also iterate through the elements of a set
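The differences listed above can be observed directly through the java.util collections. This is a minimal sketch; the class ListVsSet and its helper addTwice are hypothetical names used only for the demonstration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ListVsSet {
    // Adds the same item twice and returns the result of each add call
    static boolean[] addTwice(Collection<String> c, String item) {
        return new boolean[] { c.add(item), c.add(item) };
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();

        boolean[] listResults = addTwice(list, "apple");
        boolean[] setResults = addTwice(set, "apple");
        System.out.println(listResults[1]); // true: the list accepts the duplicate
        System.out.println(setResults[1]);  // false: the set rejects it
        System.out.println(list.size() + " " + set.size()); // 2 1

        System.out.println(list.get(0));           // apple (positional access)
        System.out.println(set.contains("apple")); // true (membership test only)

        // Both collections can be traversed with an iterator (for-each loop)
        for (String s : set) {
            System.out.println(s);
        }
    }
}
```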
Maps
• A map relates one set to another set
• A map is a set of ordered pairs (x,y), where x is the key and y is the value (element)
• For example: {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)}
• Keys must be unique, but values need not be (the mapping need not be one-to-one)
• Each key "maps" to, or corresponds to, a particular value (element)
• Maps are used for efficient storage and retrieval of information in tables
  • The key is used like an index into a list, but it does not need to be an integer
• Suppose the map {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)} is stored in aMap. Then:
  • aMap.get("B2") returns "Bill"
  • aMap.get("Bill") returns null, because no entry in aMap has the key "Bill"
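The aMap example can be reproduced with java.util.HashMap. This is a minimal sketch; the class name MapLookup and the method buildMap are hypothetical. It also shows that put with an existing key replaces the old value and returns it, which is how a map keeps its keys unique.

```java
import java.util.HashMap;
import java.util.Map;

class MapLookup {
    // Builds the example map {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)}
    static Map<String, String> buildMap() {
        Map<String, String> aMap = new HashMap<>();
        aMap.put("J", "Jane");
        aMap.put("B", "Bill");
        aMap.put("B2", "Bill"); // values may repeat; keys may not
        aMap.put("S", "Sam");
        aMap.put("B1", "Bob");
        return aMap;
    }

    public static void main(String[] args) {
        Map<String, String> aMap = buildMap();
        System.out.println(aMap.get("B2"));             // Bill
        System.out.println(aMap.get("Bill"));           // null: "Bill" is a value, not a key
        System.out.println(aMap.containsValue("Bill")); // true
        // put with an existing key replaces the old value and returns it
        System.out.println(aMap.put("B", "Bert"));      // Bill
        System.out.println(aMap.size());                // 5
    }
}
```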
Map Interface
Hash Tables
• For maps, we want to access an entry by its key, not its value
• A hash table supports such access: for efficiency, it accesses an element directly by its key, as opposed to searching for the key in an array
• With a hash table we can retrieve an item in constant time on average and in linear time in the worst case; that is, O(1) is expected, but O(n) is the worst case
Hash Codes and Index Calculation
• Hashing idea
  • Transform an item's key value into an integer
  • Then use this integer as a numeric index
Hash Code Index Example
• Suppose we want to store the number of occurrences of each Unicode character in a file
  • A Java char has 65,536 possible values (16-bit Unicode code units)
• What to do?
  • We could create an array of size 65,536 and store the count of character i in array element i
  • This works, but it is very inefficient for a small file
  • Suppose the file has only 100 characters! Is there a better way?
Hash Code Index Calculation
• Suppose the file has only 100 characters; use a hash code for each character
• But how do we compute the hash code? We could:
  • Create an array of size 200 and compute the index as index = uniChar % 200
  • Good, since it uses less space
  • Bad if there are collisions: two or more characters in the file "hash" to the same value
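A small sketch of the index = uniChar % 200 idea makes the collision problem concrete: 'A' (code 65) and '\u0109' (code 265) land in the same element, so their counts are merged. The class CharIndex and its methods are hypothetical names chosen for this illustration.

```java
class CharIndex {
    static final int TABLE_SIZE = 200;

    // Index formula from the slide: index = uniChar % 200
    static int index(char uniChar) {
        return uniChar % TABLE_SIZE;
    }

    // Counts occurrences; colliding characters share (and corrupt) one element
    static int[] countChars(String text) {
        int[] counts = new int[TABLE_SIZE];
        for (int i = 0; i < text.length(); i++) {
            counts[index(text.charAt(i))]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(index('A'));      // 65
        System.out.println(index('\u0109')); // 65: 265 % 200 collides with 'A'
        int[] counts = countChars("A\u0109");
        System.out.println(counts[65]);      // 2, although each character occurs once
    }
}
```

Resolving such collisions without losing information is what open addressing and chaining, discussed below, are for.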
Methods for Generating Hash Codes
• Usually, keys consist of strings of letters and/or digits
• The number of possible key values is much larger than the table size
• Generating a good hash code is something of an art; some experimentation and trial and error may be required
• Desirable properties of a "hash function":
  • A "random" (uniform) distribution of values
  • A relatively simple function
  • Efficient to compute
• Collisions can always occur. What do we do about them?
Java HashCode Method
• For strings, we could simply sum the int values of all the characters
  • But this returns the same hash code for "sign" and "sing"
• The Java API algorithm accounts for the position of the characters: String.hashCode() returns the integer calculated by the formula
  $s_0 \times 31^{n-1} + s_1 \times 31^{n-2} + \cdots + s_{n-1}$
  where $s_i$ is the ith character of the string and $n$ is the length of the string (the arithmetic is done in int, so large values wrap around)
• "Cat" has a hash code of $67 \times 31^2 + 97 \times 31 + 116 = 67510$, since 'C' = 67, 'a' = 97, and 't' = 116
• 31 is an odd prime, and in practice such a multiplier tends to produce fewer collisions
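The formula above can be evaluated with Horner's rule (h = 31 * h + next character), which avoids computing powers of 31. The sketch below compares it with the naive character sum; the class StringHash and its method names are hypothetical, while String.hashCode() is the real JDK method it should agree with.

```java
class StringHash {
    // s0*31^(n-1) + s1*31^(n-2) + ... + s(n-1), evaluated with Horner's rule.
    // int arithmetic wraps on overflow, as String.hashCode() does.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // The naive alternative: add up the character codes
    static int sumHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("Cat"));  // 67510
        System.out.println("Cat".hashCode()); // 67510
        System.out.println(sumHash("sign") == sumHash("sing"));   // true: collision
        System.out.println(polyHash("sign") == polyHash("sing")); // false
    }
}
```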
Open Addressing
• We consider two ways to organize hash tables: open addressing and chaining
• For open addressing, linear probing can be used to deal with collisions
  • Compute the table index from the key's hash code
  • If that element contains an item with a different key, increment the index by one
  • Keep incrementing until you find the key or a null entry
  • A null entry indicates that the item is not in the table
Open Addressing Algorithm
Table Wraparound and Search Termination
• As the index increases, it must wrap around to the start of the array (a circular array)
• This creates the potential for an infinite loop: how do you know when to stop searching if the table is full and you have not found the correct value?
  • Stop when the index for the next probe is the same as the starting index computed from the object's hash code, or
  • Ensure that the table is never full by increasing its size after an insertion if its occupancy rate exceeds a specified threshold (a sparser table has fewer collisions)
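The probing loop and the first termination rule above can be sketched as follows. This is a minimal illustration, assuming a fixed-size table of String keys; the class LinearProbingTable and its methods are hypothetical, and Math.floorMod is used so that negative hash codes still give a valid index.

```java
class LinearProbingTable {
    private final String[] table;

    LinearProbingTable(int capacity) {
        table = new String[capacity];
    }

    // Starting index: hashCode modulo the table size, kept non-negative
    private int startIndex(String key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Returns the index holding key, or the first null element where key would go,
    // or -1 if the search wraps back to the starting index (table full, key absent)
    int find(String key) {
        int start = startIndex(key);
        int index = start;
        do {
            if (table[index] == null || table[index].equals(key)) {
                return index;
            }
            index = (index + 1) % table.length; // wrap around: circular array
        } while (index != start);
        return -1;
    }

    // Inserts key; returns false if it is already present or the table is full
    boolean insert(String key) {
        int index = find(key);
        if (index < 0 || table[index] != null) {
            return false;
        }
        table[index] = key;
        return true;
    }

    boolean contains(String key) {
        int index = find(key);
        return index >= 0 && table[index] != null;
    }

    String elementAt(int index) {
        return table[index];
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(5);
        for (String name : new String[] {"Tom", "Dick", "Harry", "Sam", "Pete"}) {
            t.insert(name);
        }
        System.out.println(t.contains("Sam")); // true
        // The table is now full, so a search for an absent key stops after
        // probing every element once instead of looping forever
        System.out.println(t.find("Bob"));     // -1
    }
}
```

The do/while loop implements the "stop when you return to the starting index" rule; the second rule (never letting the table fill) is the resizing approach taken by real hash tables.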
Open Addressing Example
• Suppose we have the following values and hash codes:
  Name      hashCode    hashCode % 5    hashCode % 11
  "Tom"        84274          4               3
  "Dick"     2129869          4               5
  "Harry"   69496448          3              10
  "Sam"        82879          4               5
  "Pete"     2484038          3               7
• Suppose we use hashCode % 5 to create a hash table with open addressing (linear probing); all 5 elements start out null. Inserting the names in order:
  • "Tom" (4): element 4 is empty, so "Tom" goes at index 4
  • "Dick" (4): element 4 is taken; the index wraps around to 0, which is empty
  • "Harry" (3): element 3 is empty
  • "Sam" (4): elements 4 and 0 are taken, so "Sam" goes at index 1
  • "Pete" (3): elements 3, 4, 0, and 1 are taken, so "Pete" goes at index 2
• Resulting table:
  index  data
  0      "Dick"
  1      "Sam"
  2      "Pete"
  3      "Harry"
  4      "Tom"
• Now suppose we use hashCode % 11 to create the hash table, again with open addressing; all 11 elements start out null. Inserting the names in order:
  • "Tom" (3), "Dick" (5), and "Harry" (10) go directly into their own elements
  • "Sam" (5): element 5 holds "Dick", so "Sam" goes at index 6
  • "Pete" (7): element 7 is empty
• Resulting table (one collision, compared with three in the table of size 5):
  index  data
  0      null
  1      null
  2      null
  3      "Tom"
  4      null
  5      "Dick"
  6      "Sam"
  7      "Pete"
  8      null
  9      null
  10     "Harry"
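Both resulting tables can be checked mechanically by inserting the five names in order with linear probing. This is a minimal sketch, assuming distinct keys and no more keys than table elements (otherwise the probe loop would never find a free element); the class ProbingDemo and the method build are hypothetical.

```java
import java.util.Arrays;

class ProbingDemo {
    // Inserts keys in order using linear probing.
    // Assumes distinct keys and keys.length <= size.
    static String[] build(String[] keys, int size) {
        String[] table = new String[size];
        for (String key : keys) {
            int index = Math.floorMod(key.hashCode(), size);
            while (table[index] != null) {
                index = (index + 1) % size; // linear probe with wraparound
            }
            table[index] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        String[] names = {"Tom", "Dick", "Harry", "Sam", "Pete"};
        System.out.println(Arrays.toString(build(names, 5)));
        // [Dick, Sam, Pete, Harry, Tom]
        System.out.println(Arrays.toString(build(names, 11)));
        // [null, null, null, Tom, null, Dick, Sam, Pete, null, null, Harry]
    }
}
```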
Hash Table Operations
• Iterating through a hash table gives the entries in "arbitrary" order
• Deleting from a hash table
  • You cannot just store null in the deleted element. Why not? Null is the stopping (not found) condition, so items further along the same probe sequence would become unreachable
  • Instead, store a "dummy value" that marks the element as deleted
  • So removing an item does not improve search time
• Reducing collisions
  • Expand the size of the hash table and rehash the elements
  • There is a tradeoff between table size and search efficiency
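The "dummy value" approach to deletion can be sketched with a sentinel object that searches skip over but insertions may reuse. This is a minimal illustration under stated assumptions: the class TombstoneTable and the DELETED marker are hypothetical names, the table has a fixed size, and insert assumes at least one free element remains.

```java
class TombstoneTable {
    // Marker for a deleted element; a distinct object compared by identity (==)
    private static final String DELETED = new String("<deleted>");
    private final String[] table;

    TombstoneTable(int capacity) {
        table = new String[capacity];
    }

    private int start(String key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Probes until key or a null element is found; DELETED markers are skipped,
    // not treated as the end of the search
    private int findKey(String key) {
        int i = start(key);
        for (int probes = 0; probes < table.length && table[i] != null; probes++) {
            if (table[i] != DELETED && table[i].equals(key)) {
                return i;
            }
            i = (i + 1) % table.length;
        }
        return -1;
    }

    boolean contains(String key) {
        return findKey(key) >= 0;
    }

    // Reuses the first null or DELETED element; assumes a free element exists
    void insert(String key) {
        if (contains(key)) {
            return;
        }
        int i = start(key);
        while (table[i] != null && table[i] != DELETED) {
            i = (i + 1) % table.length;
        }
        table[i] = key;
    }

    // Marks the element as deleted; storing null here would cut the probe chain
    void remove(String key) {
        int i = findKey(key);
        if (i >= 0) {
            table[i] = DELETED;
        }
    }

    public static void main(String[] args) {
        TombstoneTable t = new TombstoneTable(5);
        t.insert("Tom");  // index 4
        t.insert("Dick"); // 4 is taken, wraps to 0
        t.insert("Sam");  // 4 and 0 are taken, goes to 1
        t.remove("Dick");
        // The search for "Sam" probes 4, skips the DELETED marker at 0, and finds index 1
        System.out.println(t.contains("Sam"));  // true
        System.out.println(t.contains("Dick")); // false
    }
}
```

If remove stored null at index 0 instead, the search for "Sam" would stop at index 0 and wrongly report that "Sam" is absent.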
Reducing Collisions by Quadratic Probing
• Linear probing tends to form clusters of keys in the table, causing longer search chains
• Quadratic probing can reduce the effect of clustering
  • The increments form a quadratic series (1, 4, 9, 16, and so on)
• Disadvantages?
  • More work to calculate the next index (multiplication, addition, and modular division)
  • Not all table elements are examined when looking for an insertion index
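The last disadvantage is easy to see by listing the probe sequence. The sketch below assumes the k-th probe is (start + k*k) % size; the class QuadraticProbe is a hypothetical name. For a table of size 5 and a key whose hash index is 4 (like "Sam" in the hashCode % 5 example), only elements 4, 0, and 3 are ever examined.

```java
import java.util.ArrayList;
import java.util.List;

class QuadraticProbe {
    // k-th probe index: (start + k*k) % size, for k = 0 .. size-1
    static List<Integer> probeSequence(int start, int size) {
        List<Integer> seq = new ArrayList<>();
        for (int k = 0; k < size; k++) {
            seq.add((start + k * k) % size);
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(probeSequence(4, 5)); // [4, 0, 3, 3, 0]
    }
}
```

Elements 1 and 2 never appear, so an insertion could fail even though the table has free space; linear probing from the same start would visit 4, 0, 1, 2, 3.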
Chaining
• Chaining is an alternative to open addressing
• Each table element references a linked list that contains all of the items that hash to the same table index
  • The linked list is often called a bucket
  • The approach is sometimes called bucket hashing
• Only items that hash to the same table index are examined when looking for an object
• Recall the hashCode % 5 values below; chaining keeps a linked list at each index where collisions occur
• In this example:
  • One linked list holds Tom, Dick, and Sam (index 4)
  • Another linked list holds Harry and Pete (index 3)
  Name      hashCode % 5
  "Tom"          4
  "Dick"         4
  "Harry"        3
  "Sam"          4
  "Pete"         3
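A chained table can be sketched with an array of java.util.LinkedList buckets. This is a minimal illustration, not the textbook's implementation; the class ChainedTable and its methods are hypothetical, and buckets are created lazily the first time an item hashes to them.

```java
import java.util.LinkedList;

class ChainedTable {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        buckets = new LinkedList[capacity]; // generic array creation needs a raw type
    }

    private int index(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Appends key to the bucket for its index; no probing is needed
    boolean add(String key) {
        int i = index(key);
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();
        }
        if (buckets[i].contains(key)) {
            return false;
        }
        buckets[i].add(key);
        return true;
    }

    // Only the one bucket for key's index is searched
    boolean contains(String key) {
        LinkedList<String> bucket = buckets[index(key)];
        return bucket != null && bucket.contains(key);
    }

    LinkedList<String> bucket(int i) {
        return buckets[i];
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(5);
        for (String name : new String[] {"Tom", "Dick", "Harry", "Sam", "Pete"}) {
            t.add(name);
        }
        System.out.println(t.bucket(4)); // [Tom, Dick, Sam]
        System.out.println(t.bucket(3)); // [Harry, Pete]
    }
}
```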
Chaining
• Pluses?
  • Conceptually simple
  • Minimizes table size
  • Good search efficiency
• Minuses?
  • Overhead of linked lists (more storage)
  • More complex (perhaps)
Performance of Hash Tables
• The load factor is the number of filled cells divided by the table size
• The load factor has the greatest effect on performance
• The lower the load factor, the better the performance. Why?
  • There is less chance of a collision in a sparsely populated table
  • But the smaller the load factor, the more space is wasted
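To put numbers on this tradeoff, the sketch below evaluates two commonly cited approximations for the expected number of comparisons in a successful search: 1/2 (1 + 1/(1 - L)) for linear probing and 1 + L/2 for chaining, where L is the load factor. These formulas are standard estimates brought in for illustration, not results derived in these slides; the class LoadFactor and its methods are hypothetical.

```java
class LoadFactor {
    // Load factor = number of filled cells / table size
    static double loadFactor(int filled, int tableSize) {
        return (double) filled / tableSize;
    }

    // Commonly cited approximation of the expected number of comparisons for a
    // successful search with linear probing (requires load < 1)
    static double linearProbing(double load) {
        return 0.5 * (1 + 1 / (1 - load));
    }

    // Corresponding approximation for chaining (load may exceed 1)
    static double chaining(double load) {
        return 1 + load / 2;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(5, 5)); // 1.0: the size-5 example table is full
        for (double load : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("load %.2f: linear probing %.3f, chaining %.3f%n",
                    load, linearProbing(load), chaining(load));
        }
    }
}
```

Under these estimates, a load factor of 0.75 costs about 2.5 comparisons with linear probing versus 1.375 with chaining, and at 0.9 linear probing rises to 5.5 while chaining needs only 1.45.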
Maps and Hashing
• Hash-based maps (such as HashMap) use hash tables
• Hashing converts the key into an index, the place where the corresponding value is stored
• This makes it possible to search efficiently, in O(1) time on average, without having an explicit index
  • Of course, there is some additional overhead
Implementing a Hash Table
Implementation of Maps and Sets
• Class Object implements the methods hashCode and equals, so every class inherits these methods unless it overrides them
• Object.equals compares two objects by identity (their addresses), not their contents
• Object.hashCode calculates an object's hash code from its identity (typically derived from its address), not its contents
• If you override the equals method, you should also override the hashCode method: the hashCode contract requires equal objects to have equal hash codes
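The sketch below shows a value class that overrides both methods consistently, so a HashSet can find an element through a different but equal object. The classes HashContractDemo and Point are hypothetical examples; Objects.hash is the real java.util.Objects helper that combines field hash codes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashContractDemo {
    // Hypothetical value class whose equality depends on its contents
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Point)) return false;
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        // Equal points must have equal hash codes, or hash-based lookups can miss them
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        // A different object with the same contents is found
        System.out.println(points.contains(new Point(1, 2))); // true
    }
}
```

If hashCode were left as the identity-based Object version, the two equal points would usually land in different buckets and contains would usually return false.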
Implementing HashSetOpen
Implementing Java Map and Set Interfaces
• The Java API classes HashMap and HashSet use a hash table to implement the Map and Set interfaces
• The task of implementing the two interfaces is simplified by the inclusion of the abstract classes AbstractMap and AbstractSet in the Collection hierarchy
Nested Interface Map.Entry
• One requirement on the key-value pairs of a Map object is that they implement the interface Map.Entry<K, V>, which is a nested interface of interface Map
• An implementer of the Map interface must contain an inner class that provides code for the Map.Entry methods (such as getKey, getValue, and setValue)
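Client code usually meets Map.Entry by iterating over a map's entrySet view. This is a minimal sketch; the class EntryDemo, its describe method, and the sample ages are hypothetical. AbstractMap.SimpleEntry is the JDK's ready-made Map.Entry implementation, and TreeMap is used only so the output order is predictable.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.TreeMap;

class EntryDemo {
    // Concatenates "key=value" pairs by walking the entrySet() view
    static String describe(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new TreeMap<>(); // sorted keys: predictable order
        ages.put("Tom", 30);
        ages.put("Dick", 25);
        System.out.println(describe(ages)); // Dick=25 Tom=30

        // setValue on an entry from the iterator writes through to the map
        for (Map.Entry<String, Integer> e : ages.entrySet()) {
            e.setValue(e.getValue() + 1);
        }
        System.out.println(ages); // {Dick=26, Tom=31}

        Map.Entry<String, Integer> pair = new AbstractMap.SimpleEntry<>("Sam", 40);
        System.out.println(pair.getKey() + " " + pair.getValue()); // Sam 40
    }
}
```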
Additional Applications of Maps
• Can implement the phone directory using a map
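A phone directory maps naturally onto a Map<String, String> from names to numbers. The sketch below is hypothetical: the earlier application's actual interface is not shown here, so the class PhoneDirectory and the method names addOrChangeEntry, lookupEntry, and removeEntry are illustrative assumptions. Each operation becomes a single HashMap call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical phone directory backed by a HashMap (name -> number)
class PhoneDirectory {
    private final Map<String, String> entries = new HashMap<>();

    // Adds or changes an entry; returns the old number, or null if the name is new
    String addOrChangeEntry(String name, String number) {
        return entries.put(name, number);
    }

    // Returns the number for name, or null if absent
    String lookupEntry(String name) {
        return entries.get(name);
    }

    // Removes name; returns its number, or null if absent
    String removeEntry(String name) {
        return entries.remove(name);
    }

    public static void main(String[] args) {
        PhoneDirectory dir = new PhoneDirectory();
        dir.addOrChangeEntry("Jane", "555-1234");
        System.out.println(dir.addOrChangeEntry("Jane", "555-9999")); // 555-1234
        System.out.println(dir.lookupEntry("Jane"));                  // 555-9999
        System.out.println(dir.removeEntry("Jane"));                  // 555-9999
        System.out.println(dir.lookupEntry("Jane"));                  // null
    }
}
```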
Additional Applications of Maps
• Huffman Coding Problem
  • Use a map, rather than an array of elements, to replace each input character by its bit-string code in the output file
  • In this table, the key is the input character and the value is its character code string
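Once the code table is a Map<Character, String>, encoding is one lookup per input character. This is a minimal sketch; the class HuffmanEncode is hypothetical, and the three-character code table in main is an invented prefix-free example rather than one built from real character frequencies.

```java
import java.util.HashMap;
import java.util.Map;

class HuffmanEncode {
    // Replaces each input character by its bit-string code from the code table
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String code = codes.get(c); // char is autoboxed to Character
            if (code == null) {
                throw new IllegalArgumentException("No code for character: " + c);
            }
            out.append(code);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical prefix-free code table, not built from real frequencies
        Map<Character, String> codes = new HashMap<>();
        codes.put('a', "0");
        codes.put('b', "10");
        codes.put('c', "11");
        System.out.println(encode("abac", codes)); // 010011
    }
}
```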
Chapter Review
• The Set interface describes an abstract data type that supports the same operations as a mathematical set
• The Map interface describes an abstract data type that enables a user to access information corresponding to a specified key
• A hash table uses hashing to transform an item’s key into a table index so that insertions, retrievals, and deletions can be performed in expected O(1) time
• A collision occurs when two keys map to the same table index
• In open addressing, linear probing is often used to resolve collisions
• Collisions cannot be avoided entirely; the best way to reduce them is to keep the table load factor relatively low by rehashing when the load factor reaches a value such as 0.75
• In open addressing, you cannot simply set an element to null when you delete it; you must mark it as deleted
• A set view of a map's entries can be obtained through the method entrySet
• Two Java API implementations of the Map (Set) interface are HashMap (HashSet) and TreeMap (TreeSet)
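Rehashing when the load factor passes 0.75 can be sketched for an open-addressing table as follows. This is a minimal illustration, assuming a growth rule of 2 * capacity + 1 and linear probing; the class RehashingTable is hypothetical. Starting from 5 elements, adding "Sam" would push the load factor to 4/5 = 0.8, so the table grows to 11 and every key is re-probed, which yields the same layout as the hashCode % 11 example.

```java
// Hypothetical open-addressing set of Strings that rehashes above a 0.75 load factor
class RehashingTable {
    private static final double MAX_LOAD = 0.75;
    private String[] table = new String[5];
    private int size = 0;

    // Linear probing; terminates because the load factor never exceeds 0.75
    private static int probe(String[] t, String key) {
        int i = Math.floorMod(key.hashCode(), t.length);
        while (t[i] != null && !t[i].equals(key)) {
            i = (i + 1) % t.length;
        }
        return i;
    }

    boolean add(String key) {
        if ((double) (size + 1) / table.length > MAX_LOAD) {
            rehash();
        }
        int i = probe(table, key);
        if (table[i] != null) {
            return false; // already present
        }
        table[i] = key;
        size++;
        return true;
    }

    // Indices depend on the table length, so every key must be re-probed
    private void rehash() {
        String[] old = table;
        table = new String[2 * old.length + 1];
        for (String key : old) {
            if (key != null) {
                table[probe(table, key)] = key;
            }
        }
    }

    int capacity() {
        return table.length;
    }

    double loadFactor() {
        return (double) size / table.length;
    }

    public static void main(String[] args) {
        RehashingTable t = new RehashingTable();
        for (String name : new String[] {"Tom", "Dick", "Harry", "Sam", "Pete"}) {
            t.add(name);
        }
        System.out.println(t.capacity()); // 11
    }
}
```

java.util.HashMap performs this kind of resizing internally (with its own chained layout); its default load factor is 0.75, and a different threshold can be passed to the HashMap(int initialCapacity, float loadFactor) constructor.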