130 universal hashing pub - cs.jhu.edu · Universal hashing A set of items have been hashed to an...

Date post: 07-Oct-2020

Universal hashing

Ben Langmead

Department of Computer Science

Please sign guestbook (www.langmead-lab.org/teaching-materials) to tell me briefly how you are using the slides. For original Keynote files, email me ([email protected]).

Randomness & independence

[Figure with labels $h_1$, $h_2$ and $h_1 h_2$.]

Universal hashing

A family $H$ of hash functions from universe $U$ with $|U| \ge n$ to range $\{0, 1, \ldots, n-1\}$ is 2-universal if, for distinct elements $x_1, x_2$ and for a function $h$ drawn uniformly from $H$:

$$\Pr(h(x_1) = h(x_2)) \le \frac{1}{n}$$

[Figure: $H$ maps universe $U$ to range $V$ of size $n$.]

Let's prove a useful expectation for hash tables…

Universal hashing

A set $S$ of $m$ items has been hashed to an $n$-bucket hash table using $h$ from a 2-universal family.

For a given element $x$, let r.v. $X$ be the number of items in bucket $h(x)$. We want to show:

$$E[X] \le \begin{cases} m/n & \text{if } x \notin S \\ 1 + (m-1)/n & \text{if } x \in S \end{cases}$$

Not-in-table case: $\le 1$ if $m = n$

In-table case: $< 2$ if $m = n$

Universal hashing

$$E[X] \le \begin{cases} m/n & \text{if } x \notin S \\ 1 + (m-1)/n & \text{if } x \in S \end{cases}$$

Let $X_i$ be a r.v. with $X_i = 1$ when the $i$th element of $S$ is in the same bucket as $x$, and $X_i = 0$ otherwise.

$$\Pr(X_i = 1) \le \frac{1}{n}$$

By 2-universality!

Universal hashing

Case $x \notin S$:

$$E[X] = E\left[\sum_{i=1}^{m} X_i\right] = \sum_{i=1}^{m} E[X_i] \le \frac{m}{n}$$

The second equality is linearity of expectation. The inequality uses 2-universality + expectation of indicator:

$$E[X_i] = \Pr(X_i = 1) \le \frac{1}{n}$$

Universal hashing

$$E[X] \le \begin{cases} m/n & \text{if } x \notin S \\ 1 + (m-1)/n & \text{if } x \in S \end{cases}$$

Without loss of generality, use $i = 1$ for item $x$.

Let $X_i$ be a r.v. with $X_i = 1$ when the $i$th element of $S$ is in the same bucket as $x$, and $X_i = 0$ otherwise.

$$\Pr(X_i = 1) \le \frac{1}{n} \quad \text{for } i > 1$$

Universal hashing

Case $x \in S$:

$$E[X] = E\left[\sum_{i=1}^{m} X_i\right] = 1 + \sum_{i=2}^{m} E[X_i] \le 1 + \frac{m-1}{n}$$

The second equality is linearity of expectation, with $X_1 = 1$ because item 1 is $x$ itself. The inequality uses 2-universality + expectation of indicator.

Universal hashing

Proving a key property: with 2-universal hashing, expected query time is $O(1)$ when $m \le n$

$$E[X] \le \begin{cases} m/n & \text{if } x \notin S \\ 1 + (m-1)/n & \text{if } x \in S \end{cases}$$

Not-in-table case: $\le 1$ if $m = n$

In-table case: ~2 if $m = n$
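As a sanity check on the bound, the expectation can be computed exactly for a small case by averaging over every function in a family. The sketch below assumes the family $((ax + b) \bmod p) \bmod n$ that appears later in these slides, with the slides' example values $p = 11$ and $n = 4$. The stored set `S`, the query elements and the helper name `expected_bucket_size` are made up for illustration.

```python
from itertools import product


def expected_bucket_size(S, x, p, n):
    """Average, over every (a, b) in the family, of the number of items
    of S that land in the same bucket as x."""
    total = 0
    count = 0
    for a, b in product(range(1, p), range(p)):
        bucket_of_x = ((a * x + b) % p) % n
        total += sum(1 for s in S if ((a * s + b) % p) % n == bucket_of_x)
        count += 1
    return total / count


p, n = 11, 4        # example values used later in the slides
S = [0, 3, 5, 8]    # hypothetical stored items, so m = 4 = n
m = len(S)

not_in_table = expected_bucket_size(S, x=7, p=p, n=n)  # 7 is not in S
in_table = expected_bucket_size(S, x=3, p=p, n=n)      # 3 is in S

print(not_in_table, "<=", m / n)
print(in_table, "<=", 1 + (m - 1) / n)
```

Both averages should sit at or below the corresponding bound, since the bound only relies on each pair colliding with probability at most $1/n$.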


Universal hashing

What kind of family has this property?

Are functions easy to draw from the family?

Are functions easy to store and compute with?

Universal hashing

Universe $U$: $\{0, 1, 2, \ldots, m-1\}$

Range $V$: $\{0, 1, 2, \ldots, n-1\}$ with $n \le m$

Prime $p \ge m$

Note that $m$ here denotes the size of the universe, not the number of stored items as in the expectation proof.

[Figure: $U$ (size $m$), the prime field (size $p$) and $V$ (size $n$).]

Universal hashing

Example of a 2-universal family from $U$ to $V$:

$$h_{a,b}(x) = ((ax + b) \bmod p) \bmod n$$

$$H = \{h_{a,b} \mid 1 \le a \le p-1,\ 0 \le b \le p-1\}$$

[Figure: $U$ (size $m$), the prime field (size $p$) and $V$ (size $n$).]
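A minimal sketch of drawing a function from this family, which also speaks to the earlier questions about drawing, storing and computing. The helper name `draw_hash` and the values $p = 11$, $n = 4$ are illustrative choices; the caller is assumed to supply a prime $p$.

```python
import random


def draw_hash(p, n, rng=random):
    """Draw h_{a,b} uniformly from H: a in [1, p-1], b in [0, p-1].

    Assumes p is prime and p >= size of the universe.
    """
    a = rng.randrange(1, p)  # 1 <= a <= p - 1
    b = rng.randrange(0, p)  # 0 <= b <= p - 1

    def h(x):
        return ((a * x + b) % p) % n

    return h, (a, b)


p, n = 11, 4
h, (a, b) = draw_hash(p, n, random.Random(0))  # seeded for repeatability
buckets = [h(x) for x in range(p)]
print((a, b), buckets)
```

Storing the function takes only the pair `(a, b)`, and evaluating it takes one multiplication, one addition and two mod operations.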

Prime field

A prime field $F_p$ is a number system consisting of integers modulo a prime $p$, and rules for plus & times

Plus & times have many of our favorite properties

Addition in $F_5$:

+ | 0 1 2 3 4
0 | 0 1 2 3 4
1 | 1 2 3 4 0
2 | 2 3 4 0 1
3 | 3 4 0 1 2
4 | 4 0 1 2 3

Multiplication in $F_5$:

× | 0 1 2 3 4
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 1 3
3 | 0 3 1 4 2
4 | 0 4 3 2 1

Prime field

Fields are special for having multiplicative inverses

Each number (except 0) has another it multiplies with to get 1

$2 \cdot 3 = 3 \cdot 2 = 1 \bmod 5$

$4 \cdot 4 = 1 \bmod 5$

$1 \cdot 1 = 1 \bmod 5$

Multiplication in $F_5$:

× | 0 1 2 3 4
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 1 3
3 | 0 3 1 4 2
4 | 0 4 3 2 1
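The inverses listed for $F_5$ can be reproduced with Python's built-in three-argument `pow`, which accepts an exponent of -1 from Python 3.8 onward. The helper name `inverses` is an illustrative choice.

```python
def inverses(p):
    """Map each non-zero a in F_p to its multiplicative inverse.

    pow(a, -1, p) returns the modular inverse (Python 3.8+); it raises
    ValueError when no inverse exists, which cannot happen for prime p.
    """
    return {a: pow(a, -1, p) for a in range(1, p)}


inv5 = inverses(5)
print(inv5)  # {1: 1, 2: 3, 3: 2, 4: 4}
```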

Prime field

Does modulo a non-prime work?

Signs of trouble. 1) We sometimes get 0s when multiplying non-0s

Multiplication modulo 6 ($F_6$?):

× | 0 1 2 3 4 5
0 | 0 0 0 0 0 0
1 | 0 1 2 3 4 5
2 | 0 2 4 0 2 4
3 | 0 3 0 3 0 3
4 | 0 4 2 0 4 2
5 | 0 5 4 3 2 1

Prime field

Does modulo a non-prime work?

Signs of trouble. 1) We sometimes get 0s when multiplying non-0s

2) Some rows don't have 1; no multiplicative inverse

Multiplication modulo 6 ($F_6$?):

× | 0 1 2 3 4 5
0 | 0 0 0 0 0 0
1 | 0 1 2 3 4 5
2 | 0 2 4 0 2 4   (no 1)
3 | 0 3 0 3 0 3   (no 1)
4 | 0 4 2 0 4 2   (no 1)
5 | 0 5 4 3 2 1
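Both signs of trouble can be listed mechanically. The following sketch (helper names are illustrative) finds the non-zero pairs whose product is 0 and the non-zero rows that never reach 1, for any modulus `k`.

```python
def zero_divisor_pairs(k):
    """Non-zero pairs (a, c) with a * c = 0 mod k."""
    return [(a, c) for a in range(1, k) for c in range(1, k) if (a * c) % k == 0]


def rows_without_one(k):
    """Non-zero a whose row of the times table mod k never contains 1."""
    return [a for a in range(1, k) if all((a * c) % k != 1 for c in range(k))]


print(zero_divisor_pairs(6))  # [(2, 3), (3, 2), (3, 4), (4, 3)]
print(rows_without_one(6))    # [2, 3, 4]
print(zero_divisor_pairs(7))  # []
print(rows_without_one(7))    # []
```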

Prime field

Multiplication in $F_7$:

× | 0 1 2 3 4 5 6
0 | 0 0 0 0 0 0 0
1 | 0 1 2 3 4 5 6
2 | 0 2 4 6 1 3 5
3 | 0 3 6 2 5 1 4
4 | 0 4 1 5 2 6 3
5 | 0 5 3 1 6 4 2
6 | 0 6 5 4 3 2 1

Universal hashing

Choose distinct $x_1, x_2 \in U$. Can they collide modulo $p$, that is, can $a x_1 + b = a x_2 + b \pmod{p}$?

$$a x_1 + b = a x_2 + b \pmod{p}$$

$$a x_1 = a x_2 \pmod{p}$$

$$a(x_1 - x_2) = 0 \pmod{p}$$

We said $a \ge 1$ and $x_1 \ne x_2$.

Left side is a product of two numbers and neither is 0 mod $p$, because $1 \le a \le p-1$ and $0 < |x_1 - x_2| < m \le p$.

[Figure: $x_1$ and $x_2$ in $U$ (size $m$) mapped into the prime field (size $p$).]

Prime field

Multiplication in $F_7$:

× | 0 1 2 3 4 5 6
0 | 0 0 0 0 0 0 0
1 | 0 1 2 3 4 5 6
2 | 0 2 4 6 1 3 5
3 | 0 3 6 2 5 1 4
4 | 0 4 1 5 2 6 3
5 | 0 5 3 1 6 4 2
6 | 0 6 5 4 3 2 1

No 0s among the products of non-zero elements

Universal hashing

$ac \ne zp$

Can $ac = zp$, where $p$ is a prime, $z$ is some integer multiple, and $a$ & $c$ are not 0 mod $p$?

Consider prime factorizations of $a$ and $c$

For equality to hold, $p$ must be a prime factor of $a$ or $c$, contradicting "$a$ & $c$ are not 0 mod $p$"

Universal hashing

Fact 1: Distinct items from $U$ won't collide in prime field

[Figure: $(ax + b) \bmod p$ maps $x_1, x_2$ in $U$ (size $m$) to distinct values in the prime field (size $p$); $V$ has size $n$.]

Universal hashing

$(ax + b) \bmod 5$, one column per $(a, b)$ pair, one row per $x$. Block for $b = 0$ (copied from the $F_5$ × table):

b     | 0 0 0 0
a     | 1 2 3 4
x = 0 | 0 0 0 0
x = 1 | 1 2 3 4
x = 2 | 2 4 1 3
x = 3 | 3 1 4 2
x = 4 | 4 3 2 1

Each column is a permutation of integers mod 5

Universal hashing

$(ax + b) \bmod 5$, block for $b = 1$ added:

b     | 1 1 1 1
a     | 1 2 3 4
x = 0 | 1 1 1 1
x = 1 | 2 3 4 0
x = 2 | 3 0 2 4
x = 3 | 4 2 0 3
x = 4 | 0 4 3 2

Same as block to left but + 1 mod 5

Universal hashing

$(ax + b) \bmod 5$, block for $b = 2$ added:

b     | 2 2 2 2
a     | 1 2 3 4
x = 0 | 2 2 2 2
x = 1 | 3 4 0 1
x = 2 | 4 1 3 0
x = 3 | 0 3 1 4
x = 4 | 1 0 4 3

Same as block to left but + 1 mod 5

Universal hashing

$(ax + b) \bmod 5$, all 20 columns:

b     | 0 0 0 0 | 1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4
a     | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4
x = 0 | 0 0 0 0 | 1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4
x = 1 | 1 2 3 4 | 2 3 4 0 | 3 4 0 1 | 4 0 1 2 | 0 1 2 3
x = 2 | 2 4 1 3 | 3 0 2 4 | 4 1 3 0 | 0 2 4 1 | 1 3 0 2
x = 3 | 3 1 4 2 | 4 2 0 3 | 0 3 1 4 | 1 4 2 0 | 2 0 3 1
x = 4 | 4 3 2 1 | 0 4 3 2 | 1 0 4 3 | 2 1 0 4 | 3 2 1 0

Every column is a permutation of integers mod 5. Therefore: no collisions. Distinct $x$s get distinct answers
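The permutation property can be checked for every column at once. This is a small sketch with an illustrative helper name; it tests each $(a, b)$ with $1 \le a \le p-1$ and $0 \le b \le p-1$ for $p = 5$.

```python
def is_permutation_mod_p(a, b, p):
    """True if x -> (a*x + b) mod p hits every residue 0..p-1 exactly once."""
    return sorted((a * x + b) % p for x in range(p)) == list(range(p))


p = 5
all_permutations = all(
    is_permutation_mod_p(a, b, p) for a in range(1, p) for b in range(p)
)
print(all_permutations)  # True
```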

Universal hashing

$(ax + b) \bmod 4$, one column per $(a, b)$ pair, one row per $x$:

b     | 0 0 0 | 1 1 1 | 2 2 2 | 3 3 3
a     | 1 2 3 | 1 2 3 | 1 2 3 | 1 2 3
x = 0 | 0 0 0 | 1 1 1 | 2 2 2 | 3 3 3
x = 1 | 1 2 3 | 2 3 0 | 3 0 1 | 0 1 2
x = 2 | 2 0 2 | 3 1 3 | 0 2 0 | 1 3 1
x = 3 | 3 2 1 | 0 3 2 | 1 0 3 | 2 1 0

Is every column necessarily a permutation of another column?
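With a non-prime modulus the answer is no: the columns with $a = 2$ repeat values. A short sketch (illustrative helper name) that lists the offending $(a, b)$ pairs for any modulus `k`:

```python
def non_permutation_columns(k):
    """(a, b) pairs for which x -> (a*x + b) mod k repeats a value,
    so the column is not a permutation of 0..k-1."""
    return [
        (a, b)
        for a in range(1, k)
        for b in range(k)
        if len({(a * x + b) % k for x in range(k)}) < k
    ]


print(non_permutation_columns(4))  # [(2, 0), (2, 1), (2, 2), (2, 3)]
print(non_permutation_columns(5))  # []
```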

Universal hashing

Given $x_1, x_2, u, v$, what is the chance that $h_{a,b}(x_1) = u$ and $h_{a,b}(x_2) = v$?

[Figure: $(ax + b) \bmod p$ maps $x_1, x_2$ in $U$ (size $m$) to $u, v$ in the prime field (size $p$); $V$ has size $n$.]

Universal hashing

$$a x_1 + b = u \pmod{p}$$

$$a x_2 + b = v \pmod{p}$$

Solving for $a$ and $b$, where division means multiplying by the inverse in $F_p$:

$$a = \frac{v - u}{x_2 - x_1} \pmod{p}$$

$$b = u - a x_1 \pmod{p}$$

Fact 2: A single choice of $a, b$ satisfies the equations. $0 \le b \le p-1$ and $1 \le a \le p-1$, so the chance is $\frac{1}{p(p-1)}$. All pairs $u, v$ with $u \ne v$ are equally likely.

[Figure: $x_1, x_2$ in $U$ (size $m$) mapped to $u, v$ in the prime field (size $p$).]
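The two equations can be solved directly in code. In this sketch the helper name `solve_ab` and the sample values are illustrative; division is carried out with the modular inverse from `pow(..., -1, p)` (Python 3.8+).

```python
def solve_ab(x1, x2, u, v, p):
    """Return the unique (a, b) with a*x1 + b = u and a*x2 + b = v (mod p).

    Assumes p is prime and x1 != x2 (mod p), so x2 - x1 has an inverse.
    """
    inv = pow((x2 - x1) % p, -1, p)
    a = ((v - u) * inv) % p
    b = (u - a * x1) % p
    return a, b


a, b = solve_ab(x1=2, x2=5, u=3, v=7, p=11)
print(a, b)  # 5 4

# Each of the p*(p-1) pairs u != v yields a different (a, b), with a >= 1.
solutions = {solve_ab(2, 5, u, v, 11) for u in range(11) for v in range(11) if u != v}
print(len(solutions))  # 110
```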

Universal hashing

Last concern: collisions from final mod $n$

[Figure: $(ax + b) \bmod p$ maps $U$ (size $m$) to values $u, v$ in the prime field (size $p$); the final $\bmod\ n$ maps them to a value $w$ in $V$ (size $n$).]

Universal hashing

Taking a number $u$ in the prime field, the others $u \pm zn$ are its colliders w/r/t $V$: $\ldots,\ -n + u,\ u,\ n + u,\ 2n + u,\ \ldots$

For $p = 11$ & $n = 4$, 20 out of 110 pairs $u, v$ collide (red squares)

[Figure: 11 × 11 grid of pairs $(u, v)$, with axes $u$ and $v$ each running 0 to 10; colliding pairs are marked as red squares.]
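The count of 20 colliding pairs can be reproduced by brute force. The helper name `colliding_pairs` is an illustrative choice; it counts ordered pairs, matching the 110 = 11 × 10 total on the slide.

```python
def colliding_pairs(p, n):
    """Ordered pairs (u, v), u != v, in 0..p-1 that agree after the final mod n."""
    return [
        (u, v)
        for u in range(p)
        for v in range(p)
        if u != v and u % n == v % n
    ]


pairs = colliding_pairs(11, 4)
print(len(pairs), "of", 11 * 10)  # 20 of 110
```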

Universal hashing

For given $u$, the number of possible $v$'s ($u \ne v$) is $p - 1$, all equally likely

At most $\lceil p/n \rceil - 1$ choices are collisions

$$\Pr(h_{a,b}(x_1) = h_{a,b}(x_2)) \le \frac{\lceil p/n \rceil - 1}{p - 1} \le \frac{(p-1)/n}{p-1} = \frac{1}{n}$$

The second inequality holds because $\lceil p/n \rceil \le (p + n - 1)/n$.

2-universality ✅

[Figure: 11 × 11 grid of pairs $(u, v)$ for $p = 11$, with axes $u$ and $v$ each running 0 to 10.]
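For a small instance the whole chain of bounds can be confirmed by exhaustive enumeration. The sketch below assumes $p = 11$, $n = 4$ and a universe $\{0, \ldots, 10\}$ (so the universe size equals $p$); the helper name `worst_collision_probability` is illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def worst_collision_probability(p, n, universe_size):
    """Largest exact collision probability over all distinct pairs x1 < x2,
    where (a, b) is uniform over 1 <= a <= p-1, 0 <= b <= p-1."""
    worst = Fraction(0)
    for x1 in range(universe_size):
        for x2 in range(x1 + 1, universe_size):
            hits = sum(
                1
                for a in range(1, p)
                for b in range(p)
                if ((a * x1 + b) % p) % n == ((a * x2 + b) % p) % n
            )
            worst = max(worst, Fraction(hits, p * (p - 1)))
    return worst


p, n = 11, 4
worst = worst_collision_probability(p, n, universe_size=p)
ceil_bound = Fraction(-(-p // n) - 1, p - 1)  # (ceil(p/n) - 1) / (p - 1)
print(worst, "<=", ceil_bound, "<=", Fraction(1, n))  # 2/11 <= 1/5 <= 1/4
```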

