RED-BLACK TREE SEARCH THE FOLLOWING METHOD IS IN tree.h OF THE HEWLETT-PACKARD IMPLEMENTATION:

CHAPTER 13

SEARCHING

AND THE

HASH CLASSES

averageTimeS(n), THE AVERAGE TIME

FOR A SUCCESSFUL SEARCH

averageTimeU(n), … UNSUCCESSFUL …

worstTimeS(n)

worstTimeU(n)

LET’S START WITH A REVIEW OFEARLIER SEARCH TECHNIQUES:

SEQUENTIAL SEARCH

// Postcondition: if there is an item in the range of iterators// from first (inclusive) through last// (exclusive) that is equal to value, the// iterator returned is the first iterator i in that// range such that *i = value. Otherwise,// last is returned. The worstTime(n) is O(n).template <class InputIterator, class T>InputIterator find(InputIterator first, InputIterator last, const T& value){

while (first != last && *first != value)++first;

return first;}

THE worstTimeU(n) IS LINEAR IN n.

DITTO FOR worstTimeS(n),averageTimeU(n), AND averageTimeS(n).

BINARY SEARCH OF A SORTEDCONTAINER:

template <class ForwardIterator, class T>inline bool binary_search (ForwardIterator first, ForwardIterator last, const T& value){ ForwardIterator i = lower_bound(first, last, value); return i != last && !(value < *i);}

lower_bound RETURNS THE FIRSTPOSITION WHERE value COULD BEINSERTED WITHOUT DISORDERINGTHE CONTAINER.

template <class ForwardIterator, class T>inline ForwardIterator lower_bound(ForwardIterator first, ForwardIterator last,

const T& value) { return __lower_bound(first, last, value, distance_type(first),

iterator_category(first));}

IF THE ITERATOR CATEGORY IS

RANDOM ACCESS, __lower_boundUSES THE RANDOM-ACCESSPROPERTY TO SUBDIVIDE THEREGION SEARCHED:

template <class RandomAccessIterator, class T, class Distance>RandomAccessIterator __lower_bound( RandomAccessIterator first,

RandomAccessIterator last, const T& value, Distance*, random_access_iterator_tag) {

Distance len = last - first; Distance half; RandomAccessIterator middle; while (len > 0) {

half = len / 2;middle = first + half;if (*middle < value) {

first = middle + 1; len = len - half - 1;

} else len = half;

} return first;}

THE worstTimeU(n) IS LOGARITHMIC INn.


RED-BLACK TREE SEARCH

THE FOLLOWING METHOD IS IN

tree.h OF THE HEWLETT-PACKARD

IMPLEMENTATION:

template <class Key, class Value, class KeyOfValue, class Compare>rb_tree<Key, Value, KeyOfValue, Compare>::iteratorrb_tree<Key, Value, KeyOfValue, Compare>::find(const Key& k){ link_type y = header; /* Last node which is not less than k. */ link_type x = root(); /* Current node. */

while (x != NIL) if (!key_compare(key(x), k)) y = x, x = left(x); else x = right(x);

iterator j = iterator(y); return (j == end() || key_compare(k, key(j.node))) ? end() : j;}

THE worstTimeU(n) IS LOGARITHMIC INn.


NOW LET’S FOCUS ON AN UNUSUALBUT VERY EFFICIENT SEARCHTECHNIQUE:

HASHING

THE CLASS IN WHICH HASHING IS

IMPLEMENTED IS THE hash_map

CLASS. THIS IS NOT YET IN THE

STANDARD TEMPLATE LIBRARY.

TO A USER, THE hash_map CLASS

IS SIMILAR TO THE map CLASS,

EXCEPT hash_map HAS ONLY A FEW

METHODS, SUCH AS insert, erase, AND

find. AND THE TIMING ESTIMATES

FOR THOSE METHODS ARE LOWERTHAN IN THE map CLASS.

RECALL THAT EACH VALUE (THATIS, ITEM) IN A MAP IS A PAIR WHOSE

FIRST COMPONENT IS OF TYPE Key

AND WHOSE SECOND COMPONENT IS

OF TYPE T. THE KEYS ARE UNIQUE,THAT IS, NO TWO DISTINCT VALUESHAVE THE SAME KEY.

HERE ARE THE METHOD

INTERFACES FOR THE hash_map

CLASS:

1. // Postcondition: this hash_map is empty. hash_map( );

2. // Postcondition: the number of items in this hash_map// has been returned.

int size( );

3. // Postcondition: If an item with x's key had already been// inserted into this hash_map, the pair// returned consists of an iterator positioned// at the previously inserted item, and false. // Otherwise, the pair returned consists of

// an iterator positioned at the newly inserted// item, and true. Timing estimates are// discussed later.

pair<iterator, bool> insert ( const value_type<const key_type, T>& x);

4. // Postcondition: if this hash_map already contains a value// whose key part is key, a reference to that// value's second component has been// returned. Otherwise, a new value, <key,// T( )>, is inserted into this hash_map. Timing// estimates are discussed later.

T& operator[ ] (const key_type& key);

5. // Postcondition: If this hash_map contains a value whose// first component equals key, an iterator// positioned at that value has been returned.// Otherwise, an iterator at the same

// position as end() has been returned. // Timing estimates are discussed later. iterator find (const key_type& key);

6. // Precondition: itr is positioned at value in this hash_map. // Postcondition: the value that itr is positioned at has been // deleted from this hash_map. Timing // estimates are discussed later in this chapter. void erase (iterator itr);

7. // Postcondition: an iterator positioned at the beginning // of this hash_map has been returned. // Timing estimates are discussed later. iterator begin( );

8. // Postcondition: an iterator has been returned that can be// used in comparisons to terminate iterating// through this hash_map.

iterator end( );

9. // Postcondition: the space for this hash_map object has// been deallocated.~hash_map( );

WE’LL STUDY THE TIME ESTIMATES

AFTER WE DEFINE THE METHODS.

BUT BASICALLY, FOR find, insert, AND

erase,

averageTime(n) IS CONSTANT!

THE ASSOCIATED ITERATORS AREFORWARD ITERATORS, WITH

OPERATORS ++(int), *, ==, AND !=.

FIELDS IN THE hash_map CLASS

CONTIGUOUSarray? vector? deque? heap?

LINKED

Linked? list? map?

BUT NONE OF THESE WILL GIVE

CONSTANT AVERAGE TIME FOR

SEARCHES, INSERTIONS AND

REMOVALS.

HERE IS THE BASIC IDEA:

buckets // an array of values

count // the number of values in the hash_map

LET’S SEE WHERE THAT LEADS.

SUPPOSE persons IS A HASH MAPTHAT WILL HOLD UP TO 1000VALUES. EACH VALUE CONSISTSOF A UNIQUE 3-DIGIT INTEGER (THEKEY), AND A NAME.

buckets count 0 1 2 . . . 999

Persons [351] = “Prashant”;

persons [108] = “Barrett”;

persons[435] = “Lin”;

WHERE SHOULD WE STORE THEVALUE WHOSE KEY IS 351?

buckets count

0

108 351 435

999

3

108 Barrett

351 Prashant

435 Lin

? ?…

…

…

…

NOW FOR SOMETHING SLIGHTLY

DIFFERENT: SUPPOSE persons IS A

HASH MAP THAT HOLDS UP TO 1000

VALUES. EACH VALUE CONSISTS OF

A 10-DIGIT TELEPHONE NUMBER

(THE KEY), AND A NAME.

persons [9876543210] = “Prashant”;

persons [6103301256] = “Barrett”;

persons [6103309816] = “Lin”;

persons [4153576256] = “Sutey”;

WHERE SHOULD THESE VALUES

BE STORED?

9876543210 210

6103301256 256

6103309816 816

4153576256 OOPS!

WHEN TWO DIFFERENT KEYS MAP TOTHE SAME INDEX, THAT IS CALLED ACOLLISION.

KEYS THAT MAP TO THE SAME INDEXARE CALLED SYNONYMS.

HASHING:

AN ALGORITHM THAT TRANSFORMSA KEY INTO AN ARRAY INDEX.

THE ALGORITHM HAS TWO PARTS:

1. A HASH FUNCTION: AN EASILYCOMPUTABLE OPERATION ON THE

KEY THAT RETURNS AN unsigned

long, WHICH IS THEN CONVERTED

INTO AN INDEX IN THE ARRAY

buckets;

2. A COLLISION HANDLER.

HERE IS THE START OF THE

hash_map CLASS:

template<class Key, class T, class HashFunc>class hash_map{

THE THIRD TEMPLATE PARAMETERIS A FUNCTION CLASS: A CLASS INWHICH THE FUNCTION-CALL

OPERATOR, operator( ), ISOVERLOADED.

THE HEADING FOR operator( ) ISunsigned long operator( ) (const key_type& key)

FOR EXAMPLE, WE CAN DEFINE ASIMPLE FUNCTION CLASS IF EACH

KEY IS AN int:class hash_func{ public:

unsigned long operator( ) (const int& key){

return (unsigned long)key;} // overloaded operator( )

} // class hash_func

HERE IS A PROGRAM WITH A

hash_map CLASS IN WHICH EACH

VALUE CONSISTS OF A TELEPHONE

EXTENSION AND THE PERSON AT

THAT EXTENSION. THE ABOVE

hash_func IS USED.

int main(){ const string CLOSE_WINDOW_PROMPT = "Please press the Enter key to close this output window.";

typedef hash_map<int, string, hash_func> hash_class;

hash_class extensions; hash_class::iterator itr;

extensions [5520] = "Yvonne"; extensions [5415] = "Jim"; extensions [5416] = "Penny"; extensions [5537] = "Chun Wai"; extensions [5273] = "Jim";

for (itr = extensions.begin(); itr != extensions.end(); itr++) cout << (*itr).first << " " << (*itr).second << endl;

cout << "The number of items is " << extensions.size() << endl; if (extensions.find (5537) != extensions.end()) { cout << endl << "At extension " << 5537 << " is " << extensions [5537] << endl; extensions.erase (extensions.find (5537)); } // if for (itr = extensions.begin( ); itr != extensions.end( ); itr++) cout << (*itr).first << " " << (*itr).second << endl;

cout << endl << endl << CLOSE_WINDOW_PROMPT; cin.get( );

return 0;} // main

HERE IS THE OUTPUT:

5520 Yvonne5537 Chun Wai5415 Jim5416 Penny5273 JimThe number of items is 5

At extension 5537 is Chun Wai5520 Yvonne5415 Jim5416 Penny5273 Jim

Please press the Enter key to close this output window.

THERE IS NO OBVIOUS ORDER OFTHE KEYS. IF THE CONTAINER MUST

ALWAYS BE IN ORDER, USE A map

INSTEAD OF A hash_map.

HERE IS ANOTHER hash_func CLASS,ONE IN WHICH THE KEY IS A STRINGOF UP TO 20 CHARACTERS.BASICALLY, WE ADD UP THE ASCIIVALUES OF THE KEY’S CHARACTERS.TO FURTHER SPREAD OUT THERESULT, PARTIAL TOTALS ARE MUL-TIPLIED BY 13, AND THE FINAL TOTALIS MULTIPLIED BY A BIG PRIME.

class hash_func{ public:

unsigned long operator( ) (const string& key) { const unsigned long BIG_PRIME = 4294967291; unsigned long total = 0;

for (unsigned i = 0; i < key.length(); i++) total += 13 * key [i]; return total * BIG_PRIME; } // operator( )}; // class hash_func

THE hash_func CLASS IS SUPPLIED BY

THE USER. THE hash_map CLASS

CONVERTS THE unsigned long

RETURNED BY operator( ) INTO AN

ARRAY INDEX BY TAKING THE

REMAINDER % CAPACITY OF buckets.

EXERCISE:

SUPPOSE THE CAPACITY OF bucketsIS 203, AND FOR key1, key2, AND key3,

THE unsigned long NUMBERS

RETURNED BY hash_func (const string&

key) ARE 202, 203, AND 204RESPECTIVELY. AT WHATLOCATIONS WOULD THE VALUES

WITH KEYS key1, key2, AND key3 BESTORED?

AS YOU MIGHT HAVE GUESSED,

HASHING IS INEFFICIENT WHEN

THERE ARE A LOT OF COLLISIONS.

USERS OF THE hash_map CLASS“HOPE” THAT THE KEYS ARE

SCATTERED RANDOMLYTHROUGHOUT THE TABLE. THIS

HOPE IS FORMALLY STATED AS

FOLLOWS:

THE UNIFORM HASHING ASSUMPTION

EACH KEY IS EQUALLY LIKELY TOHASH TO ANY ONE OF THE TABLEADDRESSES, INDEPENDENTLY OFWHERE THE OTHER KEYS HAVEHASHED.

EVEN IF THE UNIFORM HASHINGASSUMPTION HOLDS, THERE MAYSTILL BE COLLISIONS.

NOW WE’LL LOOK AT SPECIFICCOLLISION HANDLERS.

CHAINING (ALSO CALLED CHAINED

HASHING): AT INDEX i IN buckets,STORE THE LIST OF ALL VALUES

WHOSE KEYS HASH TO i.

HERE ARE THE FIELDS FOR CHAINEDHASHING:

list <value_type< const key_type, T> >* buckets; // at each index in the array buckets,

// buckets, we will store the list of all // items whose keys hashed to that index

int count, // number of items in this hash_map length; // number of buckets in this hash_maphash_func hash; // hash is a function object

HERE IS HOW THE unsigned long

RETURNED BY hash_func IS

CONVERTED INTO AN INDEX:

int index = hash_func (key) % length;

INSERT VALUES WITH THESE KEYS:

21555516127178626358610330935861033090007178621359717862745121555543586103300451

ASSUME length = 1000. IGNORE 2ND COMPONENT

IN VALUE, IGNORE prev FIELD, USE ‘X’ AT END.

buckets count 0 1

... 358

359... 451 ... 612

6103309000 X 8

X

7178626358 6103309358

7178627451

2155551612 X

6103300451 X

7178621359 2155554358 X X

IMPLEMENTATION OF THE hash_map

CLASS

WHAT ABOUT THE SIZE OF buckets,

AND SHOULD THAT ARRAY EVER BE

RE-SIZED?

THE INITIAL CAPACITY WILL BE 203,

AND WE WILL RE-SIZE WHENEVER

THE LOAD FACTOR, THE RATIO OF

count TO length, EXCEEDS 0.75.

TO RE-SIZE, WE WILL DOUBLE THE

OLD CAPACITY, PLUS 1.

NOTE THAT WE RE-SIZE WHENEVER

THE LOAD FACTOR, THAT IS, THE

AVERAGE LIST SIZE, EXCEEDS 0.75.

BY THE WAY, IF WE NEVER RESIZED,

THE AVERAGE LIST SIZE WOULD BE

PROPORTIONAL TO count, SO WE

WOULD HAVE SEQUENTIAL

SEARCHES, AND averageTime(n)

WOULD BE LINEAR IN n FOR find,

insert, AND erase!

hash_map( ){

count = 0; length = DEFAULT_SIZE; // = 203 buckets = new list<value_type<const key_type, T> > [DEFAULT_SIZE];} // default constructor

THE DEFINITION OF THE insert

METHOD IS COMPLICATED BY

OF THE POSSIBILITY OF RESIZING.

THE RESIZING ISSUE IS HANDLED IN

A SEPARATE METHOD, SO THE insert

METHOD ITSELF BECOMES FAIRLY

SIMPLE:

TO INSERT THE VALUE x IF x IS NOT

ALREADY IN THE hash_map, GET THE

address THAT x.first HASHES TO, AND

APPEND x TO THE LINKED LIST AT

buckets [address].

pair<iterator, bool> insert ( const value_type<const key_type, T>& x){

iterator old_itr; key_type key = x.first;

old_itr = find (key); if (old_itr != end()) return pair <iterator, bool> (old_itr, false); int address = hash (key) % length; buckets [address].push_back (x); count++; check_for_expansion( ); return pair<iterator, bool> (find (key), true);} // insert

IN check_for_expansion, IF count >=

length * 0.75, CREATE A NEW ARRAY

OF DOUBLE THE OLD LENGTH (PLUS

1). FOR EACH LIST IN THE OLD

ARRAY, ITERATE THROUGH THE

LIST, AND HASH EACH VALUE TO

THE NEW ARRAY. FINALLY, ERASE

THE OLD ARRAY.

void check_for_expansion(){

list<value_type<const key_type, T> >::iterator list_itr; int index;

if (count > int (MAX_RATIO * length)) {

list< value_type<const key_type, T> >* temp_buckets = buckets;

length = 2 * length + 1; buckets = new list< value_type<const key_type, T> > [length]; Key new_key;

for (int i = 0; i < length / 2; i++) if (!temp_buckets [i].empty()) for (list_itr = temp_buckets [i].begin();

list_itr != temp_buckets [i].end(); list_itr++) { new_key = (*list_itr).first; index = hash (new_key) % length; buckets [index].push_back (*list_itr); } // storing temp_buckets [i] back into buckets

// destroy the old linked lists for (int i = 0; i < length / 2; i++) if (!temp_buckets [i].empty()) temp_buckets [i].erase (temp_buckets [i].begin( ), temp_buckets [i].end( )); } // doubling buckets size} // method check_for_expansion

THE begin, end, find, AND erase

METHODS INVOLVE THE EMBEDDED

iterator CLASS, SO LET’S DEVELOP

THAT iterator CLASS NOW.

WE HAVE ALREADY SEEN THE

METHOD INTERFACES FOR

OPERATORS ==, !=, ++, AND *.

WHAT FIELDS DO WE NEED?

TO ITERATE THROUGH A LIST, WE

WILL NEED A LIST ITERATOR FIELD.

TO BE ABLE TO GET TO THE

NEXT LIST, WE NEED A COPY OF

buckets (REMEMBER, THIS IS JUST A

POINTER TO AN ARRAY, NOT AN

ARRAY ITSELF), AND index. AND TO

KNOW WHEN THERE ARE NO MORE

LISTS, WE NEED A COPY OF length.

class iterator{

friend class hash_map<key_type, T, hash_func>;

protected:

unsigned index, length; list<value_type<const key_type, T> >::iterator list_itr; list<value_type<const key_type, T> >* buckets;

THE DEFINITIONS OF ==, !=, AND * ARE

STRAIGHTFORWARD:

bool operator== (const iterator& other) const{

return (index == other.index) && (list_itr == other.list_itr) && (buckets == other.buckets) && (length = other.length)} // operator==

value_type<const key_type, T> & operator* (){ return *list_itr;} // operator*

bool operator!= (const iterator& other) const{ return !(*this == other);} // operator!=

operator++(int) FIRST SAVES THE

CALLING OBJECT AND THEN ADVAN-

CES list_itr. IF THAT LIST ITERATOR

IS AT THE END OF ITS LIST, THE

REMAINING LISTS IN buckets ARE

SEARCHED UNTIL (UNLESS) A NON-

EMPTY BUCKET IS FOUND; list_itr IS

POSITIONED AT THE BEGINNING OF

THAT LIST. ULTIMATELY, THE SAVEDCALLING OBJECT IS RETURNED.

iterator operator++ (int){ iterator old_itr = *this; if (++list_itr != buckets [index].end()) return old_itr; while (index < length - 1) { index++; if (!buckets [index].empty()) { list_itr = buckets [index].begin(); return old_itr; } // bucket entry not empty }// while list_itr = buckets [index].end(); return old_itr;} // postincrement operator++

NOW WE CAN GET BACK TO THE

hash_map METHOD DEFINITIONS.

THE begin METHOD RETURNS ANITERATOR POSITIONED AT THEFIRST NON-EMPTY LIST IN THE

ARRAY buckets:

iterator begin(){ int i = 0; iterator itr; itr.buckets = buckets; while (i < length - 1) { if (!buckets [i].empty()) { itr.list_itr = buckets [i].begin(); itr.index = i; itr.length = length; return itr; } // buckets [i] not empty i++; } // while itr.list_itr = buckets [i].end(); itr.index = i; itr.length = length; return itr;} // begin

THE DEFINITION OF THE end METHOD

IS EVEN SIMPLER: AN ITERATOR

POSITIONED AT THE END OF THE LIST

IN buckets [length – 1] IS RETURNED:

iterator end(){ int i = length – 1; iterator itr; itr.buckets = buckets; itr.list_itr = buckets [i].end(); itr.index = i; itr.length = length; return itr;} // end

FINALLY, THE find METHODSEARCHES THE LIST AT THE ENTRY

IN buckets WHERE key HASHED. ANITERATOR REFLECTING THE RESULTOF THAT SEARCH IS RETURNED:

iterator find (const key_type& key){ int index = hash (key) % length; iterator itr;

itr.list_itr = std::find (buckets [index].begin(), buckets [index].end(), value_type<const key_type, T> (key, T())); if (itr.list_itr == buckets [index].end()) return end(); itr.buckets = buckets; itr.index = index;

itr.length = length; return itr;} // find

THIS DEFINITION HAS A COUPLE OFINTERESTING FEATURES:

1. THE SCOPE-RESOLUTIONOPERATOR, operator::, IS USED TOINDICATE THAT THE CALL TO findIN LINE 5 REFERS TO THEGENERIC ALGORITHM find IN THESTANDARD TEMPLATE LIBRARY,NOT A RECURSIVE CALL TO THEhash_map CLASS’S find METHOD.

2. WHEN THE STANDARD TEMPLATECLASS’S find METHOD IS CALLED,A LIST OF VALUES IS SEARCHED.BUT EQUALITY SHOULD BE BASEDON KEYS. SO A VALUE CANNOTSIMPLY BE AN INSTANCE OF THE,pair CLASS BECAUSE THE pair CLASS DOES NOT EXPLICITLYDEFINE operator==. INSTEAD:

template<class Key, class T>struct value_type

{ value_type (const value_type& p): first (p.first) { second = p.second; }

value_type (const Key& key, const T& t): first (key) { second = t; }

bool operator== (const value_type& x) { return first == x.first; }

Key first; T second; }; // struct value_type

THE CONSTRUCTOR-INITIALIZER SECTION:

:first (p.first)

IS NEEDED BECAUSE THE CALL TO find HAS

itr.list_itr = std::find (buckets [index].begin(), buckets [index].end(), value_type<const key_type, T> (key, T()));

SO first CANNOT BE MODIFIED WITHIN THE

CONSTRUCTOR.

TIME ESTIMATES:

LET n = count, LET m = length.

MAKE THE UNIFORM HASHING

ASSUMPTION AND ASSUME THATcount <= length * 0.75.

THE AVERAGE SIZE OF EACH LIST IS

n / m

FOR THE find METHOD,

averageTimeS(n, m) n / 2m iterations.

<= 0.75 / 2

SO averageTimeS(n, m) <= A CONSTANT.

averageTimeS(n, m) IS CONSTANT.

EVEN IF THE UNIFORM HASHING

ASSUMPTION HOLDS, IT IS POSSIBLE

FOR EACH KEY TO HASH TO THE

SAME INDEX. TO SEARCH THE LIST

AT THAT INDEX TAKES LINEAR-IN-n

TIME.

SO worstTimeS(n, m) IS LINEAR IN n.

THE SAME RESULTS, CONSTANT

AVERAGE TIME AND LINEAR WORST

TIME, HOLD FOR insert AND erase.

EXERCISE: FOR operator++ IN THE

iterator CLASS, FIND THE SMALLEST

UPPER BOUND OF worstTime(n, m) AND

THE SMALLEST UPPER BOUND OF

averageTime(n, m).

NOTE: IT DOES NOT MATTER

WHETHER THE UNIFORM HASHING

ASSUMPTION HOLDS!

ANOTHER COLLISION HANDLER IS

PROVIDED WITH OPEN-ADDRESS

HASHING.

NO LINKED LISTS

AT MOST ONE VALUE IS STORED AT

EACH INDEX IN buckets.

key TO index: SAME AS BEFORE

COLLISION HANDLER?

SEARCH THE TABLE UNTIL AN

“OPEN” SLOT IN buckets IS FOUND.

OFFSET-OF-1 COLLISION HANDLER:

IF buckets [index] ALREADY HAS

ANOTHER ELEMENT, TRY

buckets [index + 1], buckets [index + 2], …,

buckets [length – 1], buckets [0],

buckets [1], …, buckets [index – 1].

TO INDICATE WHETHER A LOCATION

IS OCCUPIED, THE value_type CLASS

WILL HAVE

bool occupied;

IN THE FOLLOWING SNAPSHOT, THE

SECOND COMPONENT IN EACH OF

THE 5 VALUES SHOWN IS IGNORED:

key occupied

0

54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55

109 312 % 203 = 109

201 607 % 203 = 201 202

? false

… false

1069 true 460 true 1070 true

312 true

607 true false

SUPPOSE itr IS POSITIONED AT INDEX

54 AND THE MESSAGE IS

my_map.erase (itr);

key occupied

0

54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55

109 312 % 203 = 109

201 607 % 203 = 201 202

? false

… false

1069 false 460 true 1070 true

312 true

607 true false

NOW A SEARCH 460 FOR WOULD BE

UNSUCCESSFUL BECAUSE 460

INITIALLY HASHES TO 54, AN

UNOCCUPIED LOCATION.

SOLUTION:

bool marked_for_removal;

THE CONSTRUCTOR SETS EACH bucket’s marked_for_removal FIELD TO false.

insert SETS marked_for_removal TO false;

erase SETS marked_for_removal TO true.

SO AFTER THE INSERTIONS:

marked_for_ key occupied removal

0

54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55

109 312 % 203 = 109

201 607 % 203 = 201 202

? false

… false


312 true

607 true false

false

false

falsefalsefalse

false

falsefalse

AFTER DELETING THE VALUE WITH

KEY 1069:


0

54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55

109 312 % 203 = 109

201 607 % 203 = 201 202

? false

… false


312 true

607 true false

false

false

truefalsefalse

false

falsefalse

THIS SOLUTION LEADS TO ANOTHER

PROBLEM: SUPPOSE length = 203.

inserterase // what was just inserted (insert AND erase A TOTAL OF 202 TIMES)insert

count = 1, SO THERE IS NO NEED TO

EXPAND AND RE=HASH. BUT THERE

ARE 202 MARKED-FOR-REMOVALS!

FOR find, AN UNSUCCESSFUL

SEARCH CANNOT STOP UNTIL buckets

[index].marked_for_removal = false.

APPROXIMATELY HOW MANY

ITERATIONS WILL BE MADE IN AN

UNSUCCESSFUL SEARCH, ON

AVERAGE, OF THIS TABLE?

max = 202,

min = 0,

average = 101

SOLUTION: KEEP TRACK OF REMOVALS:

int count_plus // count + number of removals since last rehashing

void check_for_expansion(){

if (count_plus >= int (MAX_RATIO * length)) { // Do not copy the marked_for_removals.

delete[ ] temp_buckets;

count_plus = count; } // doubling buckets size} // method check_for_expansion

pair<iterator, bool> insert ( const value_type<const key_type, T>& x){ key_type key = x.first; unsigned long hash_int = hash (key); int index = hash_int % length; while ((buckets [index].marked_for_removal) || buckets [index].occupied && key != buckets [index].first) index = (index + 1) % length; if (buckets [index].occupied && key == buckets [index].first) return pair <iterator, bool> (find (key), false); buckets [index].first = x.first; buckets [index].second = x.second; buckets [index].occupied = true; buckets [index].marked_for_removal = false; count++; count_plus++; check_for_expansion(); return pair<iterator, bool> (find (key), true);} // insert

CLUSTER: A SEQUENCE OF NON-EMPTY LOCATIONS


0

54 1069 % 203 = 54 55 460 % 203 = 54 56 1070 % 203 = 55

109 312 % 203 = 109

201 607 % 203 = 201 202

? false

… false


312 true

607 true false

false

false

falsefalsefalse

false

falsefalse

KEYS THAT HASH TO 54 FOLLOW

THE SAME COLLISION-PATH AS

KEYS THAT HASH TO 55, …

PRIMARY CLUSTERING: THE

PHENOMENON THAT OCCURS WHEN

THE COLLISION HANDLER ALLOWS

THE GROWTH OF CLUSTERS TO

ACCUMULATE.

THIS WILL OCCUR WITH OFFSET-OF-

1 OR ANY CONSTANT OFFSET.

SOLUTION: DOUBLE HASHING, THAT

IS, OBTAIN BOTH INDICES AND

OFFSETS BY HASHING:

unsigned long hash_int = hash (key);int index = hash_int % length,offset = hash_int / length;

NOW THE OFFSET DEPENDS ON THE

KEY, SO DIFFERENT KEYS WILL USU-

ALLY HAVE DIFFERENT OFFSETS, SO

NO MORE PRIMARY CLUSTERING!

TO GET A NEW INDEX:

index = (index + offset) % length;

EXAMPLE: length = 11

key index offset15

4

119

8

116

5

158

3

527

5

235

2

330

8

247

3

4

WHERE WOULD THESE KEYS GO IN buckets?

index

key 0

47 1 2

35 3

58 4

15 5

16 6

7

27 8

19 910

30

PROBLEM: WHAT IF OFFSET IS MULTIPLE OF length?


key index offset15

4

119

8

116

5

158

3

527

5

235

2

347

3

4246

4

22 // BUT 15 IS AT INDEX 4

FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS!

SOLUTION:

if (offset % length == 0)

offset = 1;

ON AVERAGE, offset % length WILL

EQUAL 0 ONLY ONCE IN EVERY

length TIMES.

FINAL PROBLEM: WHAT IF length HAS SEVERAL FACTORS?


key index offset20 0 125 5 130 10 135 15 1110 10 5 // BUT 30 IS AT INDEX 10

FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15, WHICH IS OCCUPIED, SO NEW INDEX = (15 + 5) % 20, WHICH IS OCCUPIED, SO NEW INDEX = ...

SOLUTION: MAKE length A PRIME.

THIS VERSION OF OPEN-ADDRESS

HASHING IS FAST. IF THE UNIFORM

HASHING ASSUMPTION HOLDS,

averageTime(n, m) FOR SEARCHING,

INSERTING AND REMOVING IS

CONSTANT.

HOW DOES DOUBLE-HASHING

COMPARE WITH CHAINED HASHING?

n / m 0.25 0.50 0.75 0.90 0.99

Chained Hashing:successful 0.13 0.25 0.38 0.45 0.50unsuccessful 0.25 0.50 0.75 0.90 0.99

Double Hashing:successful 1.14 1.39 1.85 2.56 4.65unsuccessful 1.33 2.00 4.00 10.00 100.00

Figure 13.14. ESTIMATED AVERAGE NUMBER OFLOOP ITERATIONS FOR SUCCESSFUL ANDUNSUCCESSFUL CALLS TO find, UNDERBOTH CHAINED HASHING AND DOUBLEHASHING.

GROUP EXERCISE: DETERMINE WHERE THE

FOLLOWING KEYS WILL GO IN buckets. ASSUME

THAT length = 13. NO RESIZING NEEDED.

key index offset 20 7 1 33 7 2 49 10 3 22 9 1 26 0 2140 10 10 38 12 2 9 9 0

Date post:	01-Jan-2016
Category:	Documents
Upload:	quynn-petersen
View:	14 times
Download:	0 times

RED-BLACK TREE SEARCH THE FOLLOWING METHOD IS IN tree.h OF THE HEWLETT-PACKARD IMPLEMENTATION:

Documents