Date post: | 01-Jan-2016 |
Category: |
Documents |
Upload: | quynn-petersen |
View: | 14 times |
Download: | 0 times |
averageTimeS(n), THE AVERAGE TIME
FOR A SUCCESSFUL SEARCH
averageTimeU(n), … UNSUCCESSFUL …
worstTimeS(n)
worstTimeU(n)
SEQUENTIAL SEARCH
// Postcondition: if there is an item in the range of iterators// from first (inclusive) through last// (exclusive) that is equal to value, the// iterator returned is the first iterator i in that// range such that *i = value. Otherwise,// last is returned. The worstTime(n) is O(n).template <class InputIterator, class T>InputIterator find(InputIterator first, InputIterator last, const T& value){
while (first != last && *first != value)++first;
return first;}
BINARY SEARCH OF A SORTEDCONTAINER:
template <class ForwardIterator, class T>inline bool binary_search (ForwardIterator first, ForwardIterator last, const T& value){ ForwardIterator i = lower_bound(first, last, value); return i != last && !(value < *i);}
lower_bound RETURNS THE FIRSTPOSITION WHERE value COULD BEINSERTED WITHOUT DISORDERINGTHE CONTAINER.
template <class ForwardIterator, class T>inline ForwardIterator lower_bound(ForwardIterator first, ForwardIterator last,
const T& value) { return __lower_bound(first, last, value, distance_type(first),
iterator_category(first));}
IF THE ITERATOR CATEGORY IS
RANDOM ACCESS, __lower_boundUSES THE RANDOM-ACCESSPROPERTY TO SUBDIVIDE THEREGION SEARCHED:
template <class RandomAccessIterator, class T, class Distance>RandomAccessIterator __lower_bound( RandomAccessIterator first,
RandomAccessIterator last, const T& value, Distance*, random_access_iterator_tag) {
Distance len = last - first; Distance half; RandomAccessIterator middle; while (len > 0) {
half = len / 2;middle = first + half;if (*middle < value) {
first = middle + 1; len = len - half - 1;
} else len = half;
} return first;}
template <class Key, class Value, class KeyOfValue, class Compare>rb_tree<Key, Value, KeyOfValue, Compare>::iteratorrb_tree<Key, Value, KeyOfValue, Compare>::find(const Key& k){ link_type y = header; /* Last node which is not less than k. */ link_type x = root(); /* Current node. */
while (x != NIL) if (!key_compare(key(x), k)) y = x, x = left(x); else x = right(x);
iterator j = iterator(y); return (j == end() || key_compare(k, key(j.node))) ? end() : j;}
THE CLASS IN WHICH HASHING IS
IMPLEMENTED IS THE hash_map
CLASS. THIS IS NOT YET IN THE
STANDARD TEMPLATE LIBRARY.
TO A USER, THE hash_map CLASS
IS SIMILAR TO THE map CLASS,
EXCEPT hash_map HAS ONLY A FEW
METHODS, SUCH AS insert, erase, AND
find. AND THE TIMING ESTIMATES
FOR THOSE METHODS ARE LOWERTHAN IN THE map CLASS.
RECALL THAT EACH VALUE (THATIS, ITEM) IN A MAP IS A PAIR WHOSE
FIRST COMPONENT IS OF TYPE Key
AND WHOSE SECOND COMPONENT IS
OF TYPE T. THE KEYS ARE UNIQUE,THAT IS, NO TWO DISTINCT VALUESHAVE THE SAME KEY.
1. // Postcondition: this hash_map is empty. hash_map( );
2. // Postcondition: the number of items in this hash_map// has been returned.
int size( );
3. // Postcondition: If an item with x's key had already been// inserted into this hash_map, the pair// returned consists of an iterator positioned// at the previously inserted item, and false. // Otherwise, the pair returned consists of
// an iterator positioned at the newly inserted// item, and true. Timing estimates are// discussed later.
pair<iterator, bool> insert ( const value_type<const key_type, T>& x);
4. // Postcondition: if this hash_map already contains a value// whose key part is key, a reference to that// value's second component has been// returned. Otherwise, a new value, <key,// T( )>, is inserted into this hash_map. Timing// estimates are discussed later.
T& operator[ ] (const key_type& key);
5. // Postcondition: If this hash_map contains a value whose// first component equals key, an iterator// positioned at that value has been returned.// Otherwise, an iterator at the same
// position as end() has been returned. // Timing estimates are discussed later. iterator find (const key_type& key);
6. // Precondition: itr is positioned at value in this hash_map. // Postcondition: the value that itr is positioned at has been // deleted from this hash_map. Timing // estimates are discussed later in this chapter. void erase (iterator itr);
7. // Postcondition: an iterator positioned at the beginning // of this hash_map has been returned. // Timing estimates are discussed later. iterator begin( );
8. // Postcondition: an iterator has been returned that can be// used in comparisons to terminate iterating// through this hash_map.
iterator end( );
9. // Postcondition: the space for this hash_map object has// been deallocated.~hash_map( );
WE’LL STUDY THE TIME ESTIMATES
AFTER WE DEFINE THE METHODS.
BUT BASICALLY, FOR find, insert, AND
erase,
averageTime(n) IS CONSTANT!
CONTIGUOUSarray? vector? deque? heap?
LINKED
Linked? list? map?
BUT NONE OF THESE WILL GIVE
CONSTANT AVERAGE TIME FOR
SEARCHES, INSERTIONS AND
REMOVALS.
LET’S SEE WHERE THAT LEADS.
SUPPOSE persons IS A HASH MAPTHAT WILL HOLD UP TO 1000VALUES. EACH VALUE CONSISTSOF A UNIQUE 3-DIGIT INTEGER (THEKEY), AND A NAME.
Persons [351] = “Prashant”;
persons [108] = “Barrett”;
persons[435] = “Lin”;
WHERE SHOULD WE STORE THEVALUE WHOSE KEY IS 351?
NOW FOR SOMETHING SLIGHTLY
DIFFERENT: SUPPOSE persons IS A
HASH MAP THAT HOLDS UP TO 1000
VALUES. EACH VALUE CONSISTS OF
A 10-DIGIT TELEPHONE NUMBER
(THE KEY), AND A NAME.
persons [9876543210] = “Prashant”;
persons [6103301256] = “Barrett”;
persons [6103309816] = “Lin”;
persons [4153576256] = “Sutey”;
WHERE SHOULD THESE VALUES
BE STORED?
WHEN TWO DIFFERENT KEYS MAP TOTHE SAME INDEX, THAT IS CALLED ACOLLISION.
KEYS THAT MAP TO THE SAME INDEXARE CALLED SYNONYMS.
THE ALGORITHM HAS TWO PARTS:
1. A HASH FUNCTION: AN EASILYCOMPUTABLE OPERATION ON THE
KEY THAT RETURNS AN unsigned
long, WHICH IS THEN CONVERTED
INTO AN INDEX IN THE ARRAY
buckets;
2. A COLLISION HANDLER.
HERE IS THE START OF THE
hash_map CLASS:
template<class Key, class T, class HashFunc>class hash_map{
THE THIRD TEMPLATE PARAMETERIS A FUNCTION CLASS: A CLASS INWHICH THE FUNCTION-CALL
OPERATOR, operator( ), ISOVERLOADED.
THE HEADING FOR operator( ) ISunsigned long operator( ) (const key_type& key)
FOR EXAMPLE, WE CAN DEFINE ASIMPLE FUNCTION CLASS IF EACH
KEY IS AN int:class hash_func{ public:
unsigned long operator( ) (const int& key){
return (unsigned long)key;} // overloaded operator( )
} // class hash_func
HERE IS A PROGRAM WITH A
hash_map CLASS IN WHICH EACH
VALUE CONSISTS OF A TELEPHONE
EXTENSION AND THE PERSON AT
THAT EXTENSION. THE ABOVE
hash_func IS USED.
int main(){ const string CLOSE_WINDOW_PROMPT = "Please press the Enter key to close this output window.";
typedef hash_map<int, string, hash_func> hash_class;
hash_class extensions; hash_class::iterator itr;
extensions [5520] = "Yvonne"; extensions [5415] = "Jim"; extensions [5416] = "Penny"; extensions [5537] = "Chun Wai"; extensions [5273] = "Jim";
for (itr = extensions.begin(); itr != extensions.end(); itr++) cout << (*itr).first << " " << (*itr).second << endl;
cout << "The number of items is " << extensions.size() << endl; if (extensions.find (5537) != extensions.end()) { cout << endl << "At extension " << 5537 << " is " << extensions [5537] << endl; extensions.erase (extensions.find (5537)); } // if for (itr = extensions.begin( ); itr != extensions.end( ); itr++) cout << (*itr).first << " " << (*itr).second << endl;
cout << endl << endl << CLOSE_WINDOW_PROMPT; cin.get( );
return 0;} // main
HERE IS THE OUTPUT:
5520 Yvonne5537 Chun Wai5415 Jim5416 Penny5273 JimThe number of items is 5
At extension 5537 is Chun Wai5520 Yvonne5415 Jim5416 Penny5273 Jim
Please press the Enter key to close this output window.
THERE IS NO OBVIOUS ORDER OFTHE KEYS. IF THE CONTAINER MUST
ALWAYS BE IN ORDER, USE A map
INSTEAD OF A hash_map.
HERE IS ANOTHER hash_func CLASS,ONE IN WHICH THE KEY IS A STRINGOF UP TO 20 CHARACTERS.BASICALLY, WE ADD UP THE ASCIIVALUES OF THE KEY’S CHARACTERS.TO FURTHER SPREAD OUT THERESULT, PARTIAL TOTALS ARE MUL-TIPLIED BY 13, AND THE FINAL TOTALIS MULTIPLIED BY A BIG PRIME.
class hash_func{ public:
unsigned long operator( ) (const string& key) { const unsigned long BIG_PRIME = 4294967291; unsigned long total = 0;
for (unsigned i = 0; i < key.length(); i++) total += 13 * key [i]; return total * BIG_PRIME; } // operator( )}; // class hash_func
THE hash_func CLASS IS SUPPLIED BY
THE USER. THE hash_map CLASS
CONVERTS THE unsigned long
RETURNED BY operator( ) INTO AN
ARRAY INDEX BY TAKING THE
REMAINDER % CAPACITY OF buckets.
EXERCISE:
SUPPOSE THE CAPACITY OF bucketsIS 203, AND FOR key1, key2, AND key3,
THE unsigned long NUMBERS
RETURNED BY hash_func (const string&
key) ARE 202, 203, AND 204RESPECTIVELY. AT WHATLOCATIONS WOULD THE VALUES
WITH KEYS key1, key2, AND key3 BESTORED?
USERS OF THE hash_map CLASS“HOPE” THAT THE KEYS ARE
SCATTERED RANDOMLYTHROUGHOUT THE TABLE. THIS
HOPE IS FORMALLY STATED AS
FOLLOWS:
THE UNIFORM HASHING ASSUMPTION
EACH KEY IS EQUALLY LIKELY TOHASH TO ANY ONE OF THE TABLEADDRESSES, INDEPENDENTLY OFWHERE THE OTHER KEYS HAVEHASHED.
CHAINING (ALSO CALLED CHAINED
HASHING): AT INDEX i IN buckets,STORE THE LIST OF ALL VALUES
WHOSE KEYS HASH TO i.
HERE ARE THE FIELDS FOR CHAINEDHASHING:
list <value_type< const key_type, T> >* buckets; // at each index in the array buckets,
// buckets, we will store the list of all // items whose keys hashed to that index
int count, // number of items in this hash_map length; // number of buckets in this hash_maphash_func hash; // hash is a function object
HERE IS HOW THE unsigned long
RETURNED BY hash_func IS
CONVERTED INTO AN INDEX:
int index = hash_func (key) % length;
INSERT VALUES WITH THESE KEYS:
21555516127178626358610330935861033090007178621359717862745121555543586103300451
ASSUME length = 1000. IGNORE 2ND COMPONENT
IN VALUE, IGNORE prev FIELD, USE ‘X’ AT END.
buckets count 0 1
... 358
359... 451 ... 612
6103309000 X 8
X
7178626358 6103309358
7178627451
2155551612 X
6103300451 X
7178621359 2155554358 X X
WHAT ABOUT THE SIZE OF buckets,
AND SHOULD THAT ARRAY EVER BE
RE-SIZED?
THE INITIAL CAPACITY WILL BE 203,
AND WE WILL RE-SIZE WHENEVER
THE LOAD FACTOR, THE RATIO OF
count TO length, EXCEEDS 0.75.
TO RE-SIZE, WE WILL DOUBLE THE
OLD CAPACITY, PLUS 1.
NOTE THAT WE RE-SIZE WHENEVER
THE LOAD FACTOR, THAT IS, THE
AVERAGE LIST SIZE, EXCEEDS 0.75.
BY THE WAY, IF WE NEVER RESIZED,
THE AVERAGE LIST SIZE WOULD BE
PROPORTIONAL TO count, SO WE
WOULD HAVE SEQUENTIAL
SEARCHES, AND averageTime(n)
WOULD BE LINEAR IN n FOR find,
insert, AND erase!
hash_map( ){
count = 0; length = DEFAULT_SIZE; // = 203 buckets = new list<value_type<const key_type, T> > [DEFAULT_SIZE];} // default constructor
THE DEFINITION OF THE insert
METHOD IS COMPLICATED BY
OF THE POSSIBILITY OF RESIZING.
THE RESIZING ISSUE IS HANDLED IN
A SEPARATE METHOD, SO THE insert
METHOD ITSELF BECOMES FAIRLY
SIMPLE:
TO INSERT THE VALUE x IF x IS NOT
ALREADY IN THE hash_map, GET THE
address THAT x.first HASHES TO, AND
APPEND x TO THE LINKED LIST AT
buckets [address].
pair<iterator, bool> insert ( const value_type<const key_type, T>& x){
iterator old_itr; key_type key = x.first;
old_itr = find (key); if (old_itr != end()) return pair <iterator, bool> (old_itr, false); int address = hash (key) % length; buckets [address].push_back (x); count++; check_for_expansion( ); return pair<iterator, bool> (find (key), true);} // insert
IN check_for_expansion, IF count >=
length * 0.75, CREATE A NEW ARRAY
OF DOUBLE THE OLD LENGTH (PLUS
1). FOR EACH LIST IN THE OLD
ARRAY, ITERATE THROUGH THE
LIST, AND HASH EACH VALUE TO
THE NEW ARRAY. FINALLY, ERASE
THE OLD ARRAY.
void check_for_expansion(){
list<value_type<const key_type, T> >::iterator list_itr; int index;
if (count > int (MAX_RATIO * length)) {
list< value_type<const key_type, T> >* temp_buckets = buckets;
length = 2 * length + 1; buckets = new list< value_type<const key_type, T> > [length]; Key new_key;
for (int i = 0; i < length / 2; i++) if (!temp_buckets [i].empty()) for (list_itr = temp_buckets [i].begin();
list_itr != temp_buckets [i].end(); list_itr++) { new_key = (*list_itr).first; index = hash (new_key) % length; buckets [index].push_back (*list_itr); } // storing temp_buckets [i] back into buckets
// destroy the old linked lists for (int i = 0; i < length / 2; i++) if (!temp_buckets [i].empty()) temp_buckets [i].erase (temp_buckets [i].begin( ), temp_buckets [i].end( )); } // doubling buckets size} // method check_for_expansion
THE begin, end, find, AND erase
METHODS INVOLVE THE EMBEDDED
iterator CLASS, SO LET’S DEVELOP
THAT iterator CLASS NOW.
WE HAVE ALREADY SEEN THE
METHOD INTERFACES FOR
OPERATORS ==, !=, ++, AND *.
WHAT FIELDS DO WE NEED?
TO ITERATE THROUGH A LIST, WE
WILL NEED A LIST ITERATOR FIELD.
TO BE ABLE TO GET TO THE
NEXT LIST, WE NEED A COPY OF
buckets (REMEMBER, THIS IS JUST A
POINTER TO AN ARRAY, NOT AN
ARRAY ITSELF), AND index. AND TO
KNOW WHEN THERE ARE NO MORE
LISTS, WE NEED A COPY OF length.
class iterator{
friend class hash_map<key_type, T, hash_func>;
protected:
unsigned index, length; list<value_type<const key_type, T> >::iterator list_itr; list<value_type<const key_type, T> >* buckets;
THE DEFINITIONS OF ==, !=, AND * ARE
STRAIGHTFORWARD:
bool operator== (const iterator& other) const{
return (index == other.index) && (list_itr == other.list_itr) && (buckets == other.buckets) && (length = other.length)} // operator==
value_type<const key_type, T> & operator* (){ return *list_itr;} // operator*
bool operator!= (const iterator& other) const{ return !(*this == other);} // operator!=
operator++(int) FIRST SAVES THE
CALLING OBJECT AND THEN ADVAN-
CES list_itr. IF THAT LIST ITERATOR
IS AT THE END OF ITS LIST, THE
REMAINING LISTS IN buckets ARE
SEARCHED UNTIL (UNLESS) A NON-
EMPTY BUCKET IS FOUND; list_itr IS
POSITIONED AT THE BEGINNING OF
THAT LIST. ULTIMATELY, THE SAVEDCALLING OBJECT IS RETURNED.
iterator operator++ (int){ iterator old_itr = *this; if (++list_itr != buckets [index].end()) return old_itr; while (index < length - 1) { index++; if (!buckets [index].empty()) { list_itr = buckets [index].begin(); return old_itr; } // bucket entry not empty }// while list_itr = buckets [index].end(); return old_itr;} // postincrement operator++
NOW WE CAN GET BACK TO THE
hash_map METHOD DEFINITIONS.
THE begin METHOD RETURNS ANITERATOR POSITIONED AT THEFIRST NON-EMPTY LIST IN THE
ARRAY buckets:
iterator begin(){ int i = 0; iterator itr; itr.buckets = buckets; while (i < length - 1) { if (!buckets [i].empty()) { itr.list_itr = buckets [i].begin(); itr.index = i; itr.length = length; return itr; } // buckets [i] not empty i++; } // while itr.list_itr = buckets [i].end(); itr.index = i; itr.length = length; return itr;} // begin
THE DEFINITION OF THE end METHOD
IS EVEN SIMPLER: AN ITERATOR
POSITIONED AT THE END OF THE LIST
IN buckets [length – 1] IS RETURNED:
iterator end(){ int i = length – 1; iterator itr; itr.buckets = buckets; itr.list_itr = buckets [i].end(); itr.index = i; itr.length = length; return itr;} // end
FINALLY, THE find METHODSEARCHES THE LIST AT THE ENTRY
IN buckets WHERE key HASHED. ANITERATOR REFLECTING THE RESULTOF THAT SEARCH IS RETURNED:
iterator find (const key_type& key){ int index = hash (key) % length; iterator itr;
itr.list_itr = std::find (buckets [index].begin(), buckets [index].end(), value_type<const key_type, T> (key, T())); if (itr.list_itr == buckets [index].end()) return end(); itr.buckets = buckets; itr.index = index;
itr.length = length; return itr;} // find
THIS DEFINITION HAS A COUPLE OFINTERESTING FEATURES:
1. THE SCOPE-RESOLUTIONOPERATOR, operator::, IS USED TOINDICATE THAT THE CALL TO findIN LINE 5 REFERS TO THEGENERIC ALGORITHM find IN THESTANDARD TEMPLATE LIBRARY,NOT A RECURSIVE CALL TO THEhash_map CLASS’S find METHOD.
2. WHEN THE STANDARD TEMPLATECLASS’S find METHOD IS CALLED,A LIST OF VALUES IS SEARCHED.BUT EQUALITY SHOULD BE BASEDON KEYS. SO A VALUE CANNOTSIMPLY BE AN INSTANCE OF THE,pair CLASS BECAUSE THE pair CLASS DOES NOT EXPLICITLYDEFINE operator==. INSTEAD:
template<class Key, class T>struct value_type
{ value_type (const value_type& p): first (p.first) { second = p.second; }
value_type (const Key& key, const T& t): first (key) { second = t; }
bool operator== (const value_type& x) { return first == x.first; }
Key first; T second; }; // struct value_type
THE CONSTRUCTOR-INITIALIZER SECTION:
:first (p.first)
IS NEEDED BECAUSE THE CALL TO find HAS
itr.list_itr = std::find (buckets [index].begin(), buckets [index].end(), value_type<const key_type, T> (key, T()));
SO first CANNOT BE MODIFIED WITHIN THE
CONSTRUCTOR.
TIME ESTIMATES:
LET n = count, LET m = length.
MAKE THE UNIFORM HASHING
ASSUMPTION AND ASSUME THATcount <= length * 0.75.
FOR THE find METHOD,
averageTimeS(n, m) n / 2m iterations.
<= 0.75 / 2
SO averageTimeS(n, m) <= A CONSTANT.
averageTimeS(n, m) IS CONSTANT.
EVEN IF THE UNIFORM HASHING
ASSUMPTION HOLDS, IT IS POSSIBLE
FOR EACH KEY TO HASH TO THE
SAME INDEX. TO SEARCH THE LIST
AT THAT INDEX TAKES LINEAR-IN-n
TIME.
SO worstTimeS(n, m) IS LINEAR IN n.
EXERCISE: FOR operator++ IN THE
iterator CLASS, FIND THE SMALLEST
UPPER BOUND OF worstTime(n, m) AND
THE SMALLEST UPPER BOUND OF
averageTime(n, m).
NOTE: IT DOES NOT MATTER
WHETHER THE UNIFORM HASHING
ASSUMPTION HOLDS!
ANOTHER COLLISION HANDLER IS
PROVIDED WITH OPEN-ADDRESS
HASHING.
NO LINKED LISTS
AT MOST ONE VALUE IS STORED AT
EACH INDEX IN buckets.
key TO index: SAME AS BEFORE
COLLISION HANDLER?
SEARCH THE TABLE UNTIL AN
“OPEN” SLOT IN buckets IS FOUND.
OFFSET-OF-1 COLLISION HANDLER:
IF buckets [index] ALREADY HAS
ANOTHER ELEMENT, TRY
buckets [index + 1], buckets [index + 2], …,
buckets [length – 1], buckets [0],
buckets [1], …, buckets [index – 1].
TO INDICATE WHETHER A LOCATION
IS OCCUPIED, THE value_type CLASS
WILL HAVE
bool occupied;
IN THE FOLLOWING SNAPSHOT, THE
SECOND COMPONENT IN EACH OF
THE 5 VALUES SHOWN IS IGNORED:
key occupied
0
54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55
109 312 % 203 = 109
201 607 % 203 = 201 202
? false
… false
1069 true 460 true 1070 true
312 true
607 true false
key occupied
0
54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55
109 312 % 203 = 109
201 607 % 203 = 201 202
? false
… false
1069 false 460 true 1070 true
312 true
607 true false
NOW A SEARCH 460 FOR WOULD BE
UNSUCCESSFUL BECAUSE 460
INITIALLY HASHES TO 54, AN
UNOCCUPIED LOCATION.
SOLUTION:
bool marked_for_removal;
THE CONSTRUCTOR SETS EACH bucket’s marked_for_removal FIELD TO false.
insert SETS marked_for_removal TO false;
erase SETS marked_for_removal TO true.
SO AFTER THE INSERTIONS:
marked_for_ key occupied removal
0
54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55
109 312 % 203 = 109
201 607 % 203 = 201 202
? false
… false
1069 true 460 true 1070 true
312 true
607 true false
false
false
falsefalsefalse
false
falsefalse
marked_for_ key occupied removal
0
54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55
109 312 % 203 = 109
201 607 % 203 = 201 202
? false
… false
1069 true 460 true 1070 true
312 true
607 true false
false
false
truefalsefalse
false
falsefalse
THIS SOLUTION LEADS TO ANOTHER
PROBLEM: SUPPOSE length = 203.
inserterase // what was just inserted (insert AND erase A TOTAL OF 202 TIMES)insert
count = 1, SO THERE IS NO NEED TO
EXPAND AND RE=HASH. BUT THERE
ARE 202 MARKED-FOR-REMOVALS!
FOR find, AN UNSUCCESSFUL
SEARCH CANNOT STOP UNTIL buckets
[index].marked_for_removal = false.
APPROXIMATELY HOW MANY
ITERATIONS WILL BE MADE IN AN
UNSUCCESSFUL SEARCH, ON
AVERAGE, OF THIS TABLE?
SOLUTION: KEEP TRACK OF REMOVALS:
int count_plus // count + number of removals since last rehashing
void check_for_expansion(){
if (count_plus >= int (MAX_RATIO * length)) { // Do not copy the marked_for_removals.
delete[ ] temp_buckets;
count_plus = count; } // doubling buckets size} // method check_for_expansion
pair<iterator, bool> insert ( const value_type<const key_type, T>& x){ key_type key = x.first; unsigned long hash_int = hash (key); int index = hash_int % length; while ((buckets [index].marked_for_removal) || buckets [index].occupied && key != buckets [index].first) index = (index + 1) % length; if (buckets [index].occupied && key == buckets [index].first) return pair <iterator, bool> (find (key), false); buckets [index].first = x.first; buckets [index].second = x.second; buckets [index].occupied = true; buckets [index].marked_for_removal = false; count++; count_plus++; check_for_expansion(); return pair<iterator, bool> (find (key), true);} // insert
marked_for_ key occupied removal
0
54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55
109 312 % 203 = 109
201 607 % 203 = 201 202
? false
… false
1069 true 460 true 1070 true
312 true
607 true false
false
false
falsefalsefalse
false
falsefalse
PRIMARY CLUSTERING: THE
PHENOMENON THAT OCCURS WHEN
THE COLLISION HANDLER ALLOWS
THE GROWTH OF CLUSTERS TO
ACCUMULATE.
THIS WILL OCCUR WITH OFFSET-OF-
1 OR ANY CONSTANT OFFSET.
SOLUTION: DOUBLE HASHING, THAT
IS, OBTAIN BOTH INDICES AND
OFFSETS BY HASHING:
unsigned long hash_int = hash (key);int index = hash_int % length,offset = hash_int / length;
NOW THE OFFSET DEPENDS ON THE
KEY, SO DIFFERENT KEYS WILL USU-
ALLY HAVE DIFFERENT OFFSETS, SO
NO MORE PRIMARY CLUSTERING!
EXAMPLE: length = 11
key index offset15
4
119
8
116
5
158
3
527
5
235
2
330
8
247
3
4
WHERE WOULD THESE KEYS GO IN buckets?
PROBLEM: WHAT IF OFFSET IS MULTIPLE OF length?
EXAMPLE: length = 11
key index offset15
4
119
8
116
5
158
3
527
5
235
2
347
3
4246
4
22 // BUT 15 IS AT INDEX 4
FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS!
SOLUTION:
if (offset % length == 0)
offset = 1;
ON AVERAGE, offset % length WILL
EQUAL 0 ONLY ONCE IN EVERY
length TIMES.
FINAL PROBLEM: WHAT IF length HAS SEVERAL FACTORS?
EXAMPLE: length = 20
key index offset20 0 125 5 130 10 135 15 1110 10 5 // BUT 30 IS AT INDEX 10
FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15, WHICH IS OCCUPIED, SO NEW INDEX = (15 + 5) % 20, WHICH IS OCCUPIED, SO NEW INDEX = ...
THIS VERSION OF OPEN-ADDRESS
HASHING IS FAST. IF THE UNIFORM
HASHING ASSUMPTION HOLDS,
averageTime(n, m) FOR SEARCHING,
INSERTING AND REMOVING IS
CONSTANT.
HOW DOES DOUBLE-HASHING
COMPARE WITH CHAINED HASHING?
n / m 0.25 0.50 0.75 0.90 0.99
Chained Hashing:successful 0.13 0.25 0.38 0.45 0.50unsuccessful 0.25 0.50 0.75 0.90 0.99
Double Hashing:successful 1.14 1.39 1.85 2.56 4.65unsuccessful 1.33 2.00 4.00 10.00 100.00
Figure 13.14. ESTIMATED AVERAGE NUMBER OFLOOP ITERATIONS FOR SUCCESSFUL ANDUNSUCCESSFUL CALLS TO find, UNDERBOTH CHAINED HASHING AND DOUBLEHASHING.