David Luebke 1 04/10/23
ITCS 6114
Skip Lists
Hashing
Red-Black Trees
● Red-black trees do what they do very well
● What do you think is the worst thing about red-black trees?
● A: coding them up
Skip Lists
● A relatively recent data structure
■ “A probabilistic alternative to balanced trees”
■ A randomized algorithm with benefits of r-b trees
○ O(lg n) expected time for Search, Insert
○ O(1) time for Min, Max, Succ, Pred
■ Much easier to code than r-b trees
■ Fast!
Linked Lists
● Think about a linked list as a structure for dynamic sets. What is the running time of:
■ Min() and Max()?
■ Successor()?
■ Delete()?
○ How can we make this O(1)?
■ Predecessor()?
■ Search()?
■ Insert()?
● Min, Max, Successor, Delete, and Predecessor all take O(1) time in a linked list. Can you think of a way to do these in O(1) time in a red-black tree?
● Goal: make Search and Insert O(lg n) time in a linked-list setting
Skip Lists
● The basic idea:
● Keep a doubly-linked list of elements
■ Min, max, successor, predecessor: O(1) time
■ Delete is O(1) time, Insert is O(1)+Search time
● During insert, add each level-i element to level i+1 with probability p (e.g., p = 1/2 or p = 1/4)
[Figure: skip list with three levels; level 1 contains 3, 9, 12, 18, 29, 35, 37]
Skip List Search
● To search for an element with a given key:
■ Find location in top list
○ Top list has O(1) elements with high probability
○ Location in this list defines a range of items in next list
■ Drop down a level and recurse
● O(1) time per level on average
● O(lg n) levels with high probability
● Total time: O(lg n) expected
Skip List Insert
● Skip list insert: analysis
■ Do a search for that key
■ Insert element in bottom-level list
■ With probability p, recurse to insert in next level
■ Expected number of lists = 1 + p + p^2 + … = 1/(1-p) = O(1) if p is constant
■ Total time = Search + O(1) = O(lg n) expected
● Skip list delete: O(1)
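The search and insert steps above can be sketched in Python. This is a minimal illustration, not the slides' exact structure: it assumes the common layout in which each node stores an array of forward pointers (one per level) instead of separate doubly-linked lists, and the names `SkipList`, `Node`, `max_height`, and `seed` are hypothetical.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[i] = next node on level i


class SkipList:
    def __init__(self, p=0.5, max_height=32, seed=None):
        self.p = p                          # promotion probability from the slides
        self.max_height = max_height        # assumed cap on the number of levels
        self.rng = random.Random(seed)
        self.head = Node(None, max_height)  # sentinel present on every level
        self.height = 1                     # number of levels currently in use

    def _random_height(self):
        # Promote to the next level with probability p, as on the insert slide.
        h = 1
        while h < self.max_height and self.rng.random() < self.p:
            h += 1
        return h

    def _predecessors(self, key):
        # For each level, find the last node whose key is smaller than `key`,
        # starting in the top list and dropping down one level at a time.
        update = [self.head] * self.max_height
        node = self.head
        for level in range(self.height - 1, -1, -1):
            while node.forward[level] is not None and node.forward[level].key < key:
                node = node.forward[level]
            update[level] = node
        return update

    def search(self, key):
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        update = self._predecessors(key)
        h = self._random_height()
        self.height = max(self.height, h)
        node = Node(key, h)
        for level in range(h):  # splice into each list the node was promoted to
            node.forward[level] = update[level].forward[level]
            update[level].forward[level] = node

    def keys(self):
        # Walk the bottom-level list, which contains every element in order.
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


sl = SkipList(seed=1)
for k in [29, 3, 37, 12, 9, 35, 18]:
    sl.insert(k)
print(list(sl.keys()))
```

Whatever heights the random choices produce, the bottom list stays sorted, so the randomness affects running time but not correctness.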
Skip Lists
● O(1) expected time for most operations
● O(lg n) expected time for insert
● O(n^2) time worst case (Why?)
■ But random, so no particular order of insertion evokes worst-case behavior
● O(n) expected storage requirements (Why?)
● Easy to code
Review: Hash Tables
● Motivation: symbol tables
■ A compiler uses a symbol table to relate symbols to associated data
○ Symbols: variable names, procedure names, etc.
○ Associated data: memory location, call graph, etc.
■ For a symbol table (also called a dictionary), we care about search, insertion, and deletion
■ We typically don’t care about sorted order
Review: Hash Tables
● More formally:
■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
○ Insert (T, x)
○ Delete (T, x)
○ Search(T, x)
■ We want these to be fast, but don’t care about sorting the records
● The structure we will use is a hash table
■ Supports all the above in O(1) expected time!
Hashing: Keys
● In the following discussions we will consider all keys to be (possibly large) natural numbers
● How can we convert floats to natural numbers for hashing purposes?
● How can we convert ASCII strings to natural numbers for hashing purposes?
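One possible answer to both questions, sketched in Python. The slides leave these as open questions, so the two helpers below (`string_to_natural` and `float_to_natural`, both hypothetical names) are just one reasonable choice: read an ASCII string as a radix-128 number, and reinterpret the 64 bits of an IEEE 754 double as an unsigned integer.

```python
import struct


def string_to_natural(s: str) -> int:
    # Treat each ASCII character as one digit of a radix-128 number.
    n = 0
    for byte in s.encode("ascii"):
        n = n * 128 + byte
    return n


def float_to_natural(x: float) -> int:
    # Reinterpret the 8 bytes of a double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits


print(string_to_natural("pt"))     # 112 * 128 + 116 = 14452
print(hex(float_to_natural(1.0)))  # 0x3ff0000000000000
```

Note that the bit-pattern approach gives 0.0 and -0.0 different keys even though they compare equal, so a real implementation would need to decide how to treat such cases.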
Review: Direct Addressing
● Suppose:
■ The range of keys is 0..m-1
■ Keys are distinct
● The idea:
■ Set up an array T[0..m-1] in which
○ T[i] = x if x ∈ T and key[x] = i
○ T[i] = NULL otherwise
■ This is called a direct-address table
○ Operations take O(1) time!
○ So what’s the problem?
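A direct-address table is short enough to sketch in full. The class name `DirectAddressTable` and the choice to store a value per key are assumptions made for illustration; `None` plays the role of NULL.

```python
class DirectAddressTable:
    def __init__(self, m):
        # One slot per possible key 0..m-1; None stands in for NULL.
        self.slots = [None] * m

    def insert(self, key, value):
        self.slots[key] = value

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]


table = DirectAddressTable(10)
table.insert(3, "x")
print(table.search(3))
```

Every operation is a single array access, but the constructor already hints at the problem: it allocates and initializes all m slots no matter how few keys are stored.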
The Problem With Direct Addressing
● Direct addressing works well when the range m of keys is relatively small
● But what if the keys are 32-bit integers?
■ Problem 1: direct-address table will have 2^32 entries, more than 4 billion
■ Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be
● Solution: map keys to smaller range 0..m-1
● This mapping is called a hash function
Hash Functions
● Next problem: collision
[Figure: a hash function h maps keys k1..k5 from K (actual keys), inside U (universe of keys), to slots 0..m-1 of table T; h(k2) = h(k5) is a collision]
Resolving Collisions
● How can we solve the problem of collisions?
● Solution 1: chaining
● Solution 2: open addressing
Open Addressing
● Basic idea (details in Section 12.4):
■ To insert: if slot is full, try another slot, …, until an open slot is found (probing)
■ To search, follow same sequence of probes as would be used when inserting the element
○ If reach element with correct key, return it
○ If reach a NULL pointer, element is not in table
● Good for fixed sets (adding but no deletion)
■ Example: spell checking
● Table needn’t be much bigger than n
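The probing idea can be sketched as follows. The slide does not say which probe sequence to use, so this example assumes linear probing (try the next slot, wrapping around) with h(k) = k mod m; the class name `OpenAddressTable` is hypothetical.

```python
class OpenAddressTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # None marks an open slot

    def _probe(self, key):
        # Assumed probe sequence: linear probing starting at key mod m.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        for j in self._probe(key):
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j
        raise OverflowError("table is full")

    def search(self, key):
        # Follow the same probe sequence that insert would have used.
        for j in self._probe(key):
            if self.slots[j] is None:
                return None  # reached an open slot: key is not in the table
            if self.slots[j] == key:
                return j
        return None


table = OpenAddressTable(7)
for k in (10, 17, 24):  # all three hash to slot 3
    table.insert(k)
print(table.slots)
```

There is no delete here on purpose: simply resetting a slot to `None` would cut the probe sequence for keys stored past it, which is why the slide recommends this scheme for sets that only grow.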
Chaining
● Chaining puts elements that hash to the same slot in a linked list:
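[Figure: table T in which each slot points to a linked list of the keys that hash to it, for example k1 → k4 and k8 → k6; unused slots hold NULL. Keys k1..k8 come from K (actual keys), inside U (universe of keys).]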
Chaining
● How do we insert an element?
Chaining
● How do we delete an element?
■ Do we need a doubly-linked list for efficient delete?
David Luebke 20 04/10/23David Luebke 20 04/10/23
Chaining
● How do we search for an element with a given key?
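The three chaining questions (insert, delete, search) can be answered with a short sketch. It assumes natural-number keys, h(k) = k mod m, and singly-linked chains; `ChainNode` and `ChainedHashTable` are hypothetical names.

```python
class ChainNode:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m  # one chain head per slot

    def _slot(self, key):
        return key % self.m  # assumed hash function

    def insert(self, key, value):
        # O(1): push onto the front of the chain (assumes key is not yet present).
        j = self._slot(key)
        self.heads[j] = ChainNode(key, value, self.heads[j])

    def search(self, key):
        # Walk the chain for the key's slot; cost is proportional to its length.
        node = self.heads[self._slot(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, key):
        # With a singly-linked chain we must track the predecessor while searching.
        j = self._slot(key)
        prev, node = None, self.heads[j]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.heads[j] = node.next
        else:
            prev.next = node.next
        return True


table = ChainedHashTable(5)
for key, value in [(3, "a"), (8, "b"), (13, "c")]:  # all hash to slot 3
    table.insert(key, value)
print(table.search(8).value)
```

Deleting by key has to search the chain anyway, so a singly-linked list costs nothing extra here. A doubly-linked chain only pays off when delete is handed a pointer to the node and should run in O(1) without a search.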
Analysis of Chaining
● Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot
● Given n keys and m slots in the table, the load factor α = n/m = average # keys per slot
● What will be the average cost of an unsuccessful search for a key? A: O(1+α)
● What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α)
Analysis of Chaining Continued
● So the cost of searching = O(1 + α)
● If the number of keys n is proportional to the number of slots in the table, what is α?
● A: α = O(1)
■ In other words, we can make the expected cost of searching constant if we make α constant
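A small simulation can make the 1 + α and 1 + α/2 figures concrete. This is only an illustrative experiment under the simple uniform hashing assumption: the function name `simulate_chaining` is hypothetical, and slots are drawn uniformly at random instead of hashing real keys.

```python
import random


def simulate_chaining(n, m, seed=0):
    # Throw n keys into m slots uniformly at random and record chain lengths.
    rng = random.Random(seed)
    chain_lengths = [0] * m
    for _ in range(n):
        chain_lengths[rng.randrange(m)] += 1

    # Unsuccessful search: scan the whole chain of a uniformly chosen slot.
    unsuccessful = sum(chain_lengths) / m

    # Successful search: the i-th element of a chain costs i comparisons,
    # so a chain of length c contributes 1 + 2 + ... + c; average over all n keys.
    successful = sum(c * (c + 1) // 2 for c in chain_lengths) / n
    return unsuccessful, successful


unsuccessful, successful = simulate_chaining(n=10_000, m=1_000)
print(unsuccessful, successful)
```

With n = 10,000 and m = 1,000 the load factor is α = 10, so the first number is exactly 10 elements scanned (the "+1" in O(1 + α) accounts for computing the hash) and the second should come out close to 1 + α/2 = 6.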
Choosing A Hash Function
● Clearly choosing the hash function well is crucial
■ What will a worst-case hash function do?
■ What will be the time to search in this case?
● What are desirable features of the hash function?
■ Should distribute keys uniformly into slots
■ Should not depend on patterns in the data
Hash Functions: The Division Method
● h(k) = k mod m
■ In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
● What happens to elements with adjacent values of k?
● What happens if m is a power of 2 (say 2^p)?
● What if m is a power of 10?
● Upshot: pick table size m = prime number not too close to a power of 2 (or 10)
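The power-of-2 question is easy to check experimentally. In this sketch the keys and the table sizes 256 and 193 are made-up examples chosen for illustration: the keys differ only above their low 8 bits, and 193 is a prime that sits well away from 128 and 256.

```python
def division_hash(k, m):
    return k % m


# Four keys that share the same low 8 bits (0xA0).
keys = [0x1A0, 0x2A0, 0x3A0, 0x4A0]

# m = 2^8 keeps only the low 8 bits, so every key lands in the same slot.
power_of_two_slots = [division_hash(k, 256) for k in keys]

# A prime m mixes in the high bits and spreads the keys out.
prime_slots = [division_hash(k, 193) for k in keys]

print(power_of_two_slots)  # [160, 160, 160, 160]
print(prime_slots)         # [30, 93, 156, 26]
```

The same effect appears with decimal keys and m a power of 10: only the trailing digits would decide the slot.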
Hash Functions: The Multiplication Method
● For a constant A, 0 < A < 1:
● h(k) = ⌊m (kA - ⌊kA⌋)⌋
■ What does the term kA - ⌊kA⌋ represent? The fractional part of kA
● Choose m = 2^p
● Choose A not too close to 0 or 1
● Knuth: Good choice for A = (√5 - 1)/2
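The multiplication method translates almost directly into code. This sketch uses floating-point arithmetic for readability, which is an assumption made here and not something the slides prescribe; the function name `multiplication_hash` and the sample key are illustrative.

```python
import math

A = (math.sqrt(5) - 1) / 2  # Knuth's suggested constant, about 0.618


def multiplication_hash(k, m, a=A):
    fractional_part = (k * a) % 1.0   # kA - floor(kA)
    return int(m * fractional_part)   # floor(m * fractional part)


# Sample key with m = 2^14 slots.
print(multiplication_hash(123456, 2 ** 14))  # 67
```

For very large keys the product k * A loses precision in a double, so implementations typically do the same computation with fixed-width integer multiplication and a bit shift, which is also why m = 2^p is convenient.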
Hash Functions: Worst Case Scenario
● Scenario:
■ You are given an assignment to implement hashing
■ You will self-grade in pairs, testing and grading your partner’s implementation
■ In a blatant violation of the honor code, your partner:
○ Analyzes your hash function
○ Picks a sequence of “worst-case” keys, causing your implementation to take O(n) time to search
● What’s an honest CS student to do?
Hash Functions: Universal Hashing
● As before, when attempting to foil a malicious adversary: randomize the algorithm
● Universal hashing: pick a hash function randomly in a way that is independent of the keys that are actually going to be stored
■ Guarantees good performance on average, no matter what keys adversary chooses
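The slide does not name a specific family of hash functions, so the sketch below assumes the standard textbook one, h(k) = ((a·k + b) mod p) mod m, with p a prime larger than any key and a, b chosen at random when the table is created. The function name `make_universal_hash` and the default prime 2^31 - 1 are illustrative choices.

```python
import random


def make_universal_hash(m, p=2_147_483_647, rng=random):
    # p = 2^31 - 1 is prime; keys are assumed to lie in 0..p-1.
    a = rng.randrange(1, p)  # a in 1..p-1
    b = rng.randrange(0, p)  # b in 0..p-1

    def h(k):
        return ((a * k + b) % p) % m

    return h


h = make_universal_hash(m=100)
print([h(k) for k in (1, 2, 3, 1000)])
```

Because a and b are drawn after the adversary has committed to a key sequence (or at least without the adversary knowing them), no fixed set of keys is bad for every function in the family.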