Date post: | 16-Jan-2016 |
Upload: | cornelia-lloyd |
LECTURE 11: WHY I LIKE HASH
CSC 213 – Large Scale Programming
Today’s Goal
Consider what will be important when searching
  Why search in the first place? What is its purpose?
  What should we expect & handle when searching?
  What factors matter to our users (and ourselves)?
(Besides source of bad jokes) What is hashing?
  Why is it important for searching? How can it help?
  What are critical factors of a good hash function?
  Commonly-used hash function example examined
Keys To Map & Dictionary
1. Used to convert the key into value
2. Values cannot share a key and be in same Map
3. In searching, failure is normal, not exceptional
Entry ADT
Needs 2 pieces: what we have & what we want
  First part is the key: data used in search
  Item we want is value; the second part of an Entry
Implementations must define 2 methods
  key() & value() return appropriate item
  Usually includes setValue() but NOT setKey()
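A minimal Java sketch of the Entry ADT described above. The names Entry, key(), value() and setValue() come from the slide; the class SimpleEntry and the choice to have setValue() return the old value are assumptions made only for illustration.

```java
// Entry ADT: pairs the key we search with and the value we want back.
interface Entry<K, V> {
    K key();                 // data used in the search
    V value();               // item we actually want
    V setValue(V newValue);  // assumption: returns the value being replaced
    // Deliberately no setKey(): the key identifies the Entry and stays fixed.
}

// Hypothetical implementation, not taken from the lecture.
final class SimpleEntry<K, V> implements Entry<K, V> {
    private final K key;     // final, so the key can never change
    private V value;

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override public K key() { return key; }

    @Override public V value() { return value; }

    @Override public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```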
SEQUENCE-Based Map
SEQUENCE’s perspective of MAP that it holds
POSITIONs
elements
SEQUENCE-Based Map
Outside view of MAP and how it is stored
POSITIONs
ENTRYs
SEQUENCE-Based Map
MAP implementation’s view of data and storage
POSITIONs
Elements/ENTRYs
Emergency
Please hold while the machine
searches 1,000,000 records for your location
Map Performance
In all seriousness, can be matter of life-or-death
  911 Operators immediately need addresses
  Google’s search performance in TB/s
  O(log n) time too slow for these uses
Would love to use arrays
  Get O(1) time to add, remove, or lookup data
  This HUGE array needs massive RAM purchase
Monster Amounts of RAM
Java requires using int as array index
  Limit to int and RAM available in a machine
  Integer.MAX_VALUE = 2,147,483,647
  8,200,000,000 pages in Google’s index (2005)
  In US, possible phone numbers = 10,000,000,000
Must do more for O(1) array usage time
As with all life’s problems we turn to hash
Hashing To The Rescue
Hash function turns key into int from 0 to N-1
  Result is usable as index for an array
  Specific for key’s type; cannot be reused
Store the Entrys in array (“HASH TABLE”)
  (Great name for shop in Amsterdam, too)
  Begin by computing key’s hash value
  Result is array index for that Entry
Now it is possible to use array for O(1) time!
Hash Table Example
Example shows table of Entry<Long,String>
Simple hash function is h(x) = x mod 10,000
  x is/from Entry’s key
  h(x) computes index to use
  Always is mod array length
Not all locations used
  Holes will appear in array
  Empties: set to null -or- use sentinel value
Hash Table
Index   Entrys
0       •
1       0256120001  “Jay Doe”
2       9811010002  “Bob Doe”
3       •
4       4512290004  “Jill Roe”
⁞       ⁞
9997    •
9998    2007519998  “Rhi Smith”
9999    •
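As a quick check of the table above, this sketch applies h(x) = x mod 10,000 to the four keys as read from the figure. The names PhoneHash and h are hypothetical, and the keys are held as long because ten-digit numbers can exceed Integer.MAX_VALUE.

```java
final class PhoneHash {
    static final int TABLE_LENGTH = 10_000;

    // h(x) = x mod 10,000; this sketch assumes non-negative keys.
    static int h(long key) {
        return (int) (key % TABLE_LENGTH);
    }

    public static void main(String[] args) {
        // 0256120001 is written without its leading zero,
        // because a leading 0 would make a Java literal octal.
        long[] keys = {256120001L, 9811010002L, 4512290004L, 2007519998L};
        for (long key : keys) {
            System.out.println(key + " -> index " + h(key));
        }
    }
}
```

Each key lands on the index shown beside it in the table, and every other location stays empty.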
When We Use Hash
Hash key to find index
  First step for most calls
  get() - need index to check
  put() - add at that index
  remove() - index to set null
Then check key at index
  At index many keys possible
  Still a Map, so results known
  If you find keys not same, cannot treat as the same!
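The two steps above (hash the key to get an index, then compare the key stored there) might look like the sketch below. TinyHashTable and Slot are hypothetical names, and the sketch assumes collisions are out of scope: put() refuses to store a second key at an occupied index instead of resolving the clash.

```java
final class TinyHashTable {
    // Hypothetical Entry holding a key and its value.
    private static final class Slot {
        final long key;
        String value;
        Slot(long key, String value) { this.key = key; this.value = value; }
    }

    private final Slot[] table;   // null marks an empty location

    TinyHashTable(int length) { table = new Slot[length]; }

    // First step of every call: hash the key to find its index.
    private int index(long key) {
        return (int) Math.floorMod(key, (long) table.length);
    }

    String get(long key) {
        Slot s = table[index(key)];
        // A different key may share this index, so compare keys before answering.
        return (s != null && s.key == key) ? s.value : null;  // a failed search is normal
    }

    String put(long key, String value) {
        int i = index(key);
        Slot s = table[i];
        if (s == null) {
            table[i] = new Slot(key, value);
            return null;
        }
        if (s.key != key) {
            // Assumption: collision handling is left out of this sketch.
            throw new IllegalStateException("index " + i + " already holds key " + s.key);
        }
        String old = s.value;
        s.value = value;
        return old;
    }

    String remove(long key) {
        int i = index(key);
        Slot s = table[i];
        if (s == null || s.key != key) return null;
        table[i] = null;          // removing sets the location back to null
        return s.value;
    }
}
```

With a table of length 10,000, the keys 9811010002 and 20002 both map to index 2, so get(20002) returns null even while the first key is stored there.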
Properties of Good Hash
To really be useful, hash must have properties
Reliable
FAST
Use entire table
Make good brownies
Reliability of Hash Function
Implement Map with a hash table
  To use Entry, get key to easily look up its index
  Always computes same index for that key
Speed of Hash Function
Hash must be computed on each access
  Goal: O(1) efficiency by using an array
  Efficiency of array wasted if hash is slow
If O(1) computation performed by hash function
  It is possible to perform get in O(1) time
  O(1) time for put & remove could also occur
  None of this is guaranteed; many problems can occur
Use Entire Table Important
Hashing takes lots of space because array is used
  When creating, make array big enough to hold all data
  Can copy to larger array, but this is not an O(1) operation
  Use prime number lengths, but these quickly get large
Spreads out Entrys equally across entire table
  Further apart they are spread, easier to find opening
Hash Function Analogy
[Figure: analogy image labeled “Hash function” and “Hash table”]
Examples of Bad Hash
h(x) = 0
  Reliable, fast, little use of table
h(x) = random.nextInt()
  Unreliable, fast, uses entire table
h(x) = current index -or- free index
  Reliable, slow, uses entire table
h(x) = x^{34} + 2x^{33} + 24x^{32} + 10x^{31} + …
  Reliable, moderate, too large
Incredibly Bad Hash
Using only part of key & not whole thing
  No matter what, inevitably, you will guess wrong
Part used for hash
Part that matters
Good Hash
Hash must first turn key into int
  Easy for numbers, but rarely that simple in real life
  For a String, could add value of each character
  Would hash to same index “spot”, “pots”, “stop”
Instead we usually use polynomial code:
  (x_0 * a^{k-1}) + (x_1 * a^{k-2}) + … + (x_{k-2} * a^1) + x_{k-1}
“spot” = (‘s’ * a^3) + (‘p’ * a^2) + (‘o’ * a^1) + (‘t’ * a^0)
“stop” = (‘s’ * a^3) + (‘t’ * a^2) + (‘o’ * a^1) + (‘p’ * a^0)
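To see why adding character values fails where a polynomial code works, the sketch below computes both for “spot”, “pots” and “stop”. StringHashDemo, sumHash and polyHash are hypothetical names; a = 33 is borrowed from a later slide, and long arithmetic is assumed, so only short strings avoid overflow.

```java
final class StringHashDemo {
    // Adds the character values, so every anagram gets the same result.
    static long sumHash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Polynomial code, term by term: x_0*a^(k-1) + x_1*a^(k-2) + ... + x_(k-1).
    static long polyHash(String s, long a) {
        int k = s.length();
        long h = 0;
        for (int i = 0; i < k; i++) {
            long power = 1;
            for (int j = 0; j < k - 1 - i; j++) {
                power *= a;        // rebuilds a^(k-1-i) from scratch each time
            }
            h += s.charAt(i) * power;
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"spot", "pots", "stop"}) {
            System.out.println(s + ": sum=" + sumHash(s) + " poly=" + polyHash(s, 33));
        }
    }
}
```

The three sums are identical because addition ignores character order, while the polynomial codes differ because each position is weighted by a different power of a.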
Good, Fast Hash
Polynomial codes good, but very slow
  Major bummer since we use hash for its speed
  Cause of slowdown: computing a^n takes n operations
  Horner’s method better by piggybacking work
Slow Approach:
  “spot” = (‘s’ * a^3) + (‘p’ * a^2) + (‘o’ * a^1) + (‘t’ * a^0)
Horner’s Method:
  “spot” = ‘t’ + (a * (‘o’ + (a * (‘p’ + (a * ‘s’)))))
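Horner’s method can be written as a single loop that does one multiplication and one addition per character. This is a sketch under assumptions: HornerHash and hash are hypothetical names, and long arithmetic is used, so long strings would still overflow.

```java
final class HornerHash {
    // Horner's method: reuse the running total instead of recomputing powers of a.
    static long hash(String s, long a) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * a + s.charAt(i);   // multiply what we have so far, then add the next character
        }
        return h;
    }

    public static void main(String[] args) {
        // Evaluates 't' + a*('o' + a*('p' + a*'s')) with a = 33.
        System.out.println(hash("spot", 33));
    }
}
```

For a string of k characters this takes k multiplications in total, whereas building each power of a separately takes on the order of k^2.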
Compression
Hash’s only use is computing array indices
  Useless if larger than table’s length: no index exists!
  When a=33, “spot” hashes to 4,258,502
  Some hash incalculable (like “triskaidekaphobia”)
To compress result, work like array-based queue
  hash = (result + length) % length
  % returns the modulus (the remainder from division)
  Serves exact same purpose: keeps index within limits
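A sketch of the compression step in Java. Compress, polyHash and compress are hypothetical names. One deliberate swap: the slide’s formula is (result + length) % length, but this sketch uses Math.floorMod, because an int that has overflowed can be far below -length, and adding length once would still leave it negative.

```java
final class Compress {
    // Polynomial code in int arithmetic; long strings silently overflow and may go negative.
    static int polyHash(String s, int a) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * a + s.charAt(i);
        }
        return h;
    }

    // Math.floorMod always lands in 0..length-1, even when hash is negative.
    static int compress(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    public static void main(String[] args) {
        int length = 10_000;   // table length from the earlier example
        for (String s : new String[] {"spot", "triskaidekaphobia"}) {
            int raw = polyHash(s, 33);
            System.out.println(s + ": raw=" + raw + " index=" + compress(raw, length));
        }
    }
}
```

Whatever the raw value turns out to be, the compressed result is always a legal index into an array of the given length.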
Before Next Lecture…
Continue working on week #4 assignment
  Due at usual time Tues. so may want to get cracking
Start thinking of designs & CRC cards for project
  Due in 10 days as projects completed in stages
Read sections 9.2.1 & 9.2.5 – 9.2.7 of the book
  Consider better ways of handling this situation: