
Chapter 08 Part i

Date post: 15-Feb-2016
Upload: anonymous-f29iaz
Page 1: Chapter 08 Part i

Part II

Chapter 8 Hashing

Page 2: Chapter 08 Part i

Introduction

Consider the insertion, searching and deletion operations on a dictionary (symbol table).

            Array                   Linked list   Tree           Tree
            (sorted / not sorted)                 (unbalanced)   (balanced)
Insertion   O(n) / O(1)             O(h)          O(h)           O(log_k n)
Searching   O(log n) / O(n)         O(h)          O(h)           O(log_k n)
Deletion    O(n) / O(1)             O(h)          O(h)           O(log_k n)

Is it possible to perform these operations in O(1)?

Page 3: Chapter 08 Part i

Introduction

If we find a mapping from a key to an index, then we can locate a record quickly according to its key and perform random access.

[Figure: keys S1, S2, S3, … mapped to indices 0, 1, 2, …]

Page 4: Chapter 08 Part i

Introduction

This mapping can be illustrated as follows:

[Figure: Key → h → i]

Hashing: define a function h so that h(Key) = i, where h is called a hash function.

Two kinds:
  Static hashing
  Dynamic hashing

Page 5: Chapter 08 Part i

8.2 Static Hashing

Page 6: Chapter 08 Part i

Definition

In static hashing, identifiers/keys are stored in a table with a fixed size, called a hash table.

Bucket: each bucket has its own address and is capable of holding a key.

[Figure: an identifier x is passed through the hash function h to give the bucket address h(x). The hash table consists of Bucket 0, Bucket 1, Bucket 2, …, Bucket n, each with slot 1 and slot 2.]

Page 7: Chapter 08 Part i

Definition

Slot: each bucket may consist of s slots to hold synonyms.
i1 and i2 are synonyms if h(i1) = h(i2).

Distinct synonyms enter the same bucket as long as the bucket has slots available.

Page 8: Chapter 08 Part i

Example

Number of buckets: 26 (Bucket 0 to Bucket 25)
Number of slots for each bucket: 2
Define the hash function f(x):
  f(x) = {i | i is the order of the initial of x}.

A and A2 are synonyms.
GA and GB are synonyms.
If “Doll” enters, it will be put at bucket _______ (according to the hash function).

             slot 1   slot 2
Bucket 0     A        A2
Bucket 1
Bucket 2
Bucket 3     D
…
Bucket 6     GA       GB
…
Bucket 25

Page 9: Chapter 08 Part i

Overflow and Collision

Overflow occurs when a new identifier is mapped into a full bucket.
Collision occurs when two non-identical identifiers are hashed into the same bucket.
If the number of slots is 1, then overflow and collision occur simultaneously.

             slot 1   slot 2
Bucket 0     A        A2
Bucket 1
Bucket 2

If A3 enters bucket 0, A3 collides with A and A2. The bucket overflows as well.
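The difference between a collision and an overflow can be sketched in Python. This is only an illustration under stated assumptions: identifiers start with a letter, the table has buckets 0 to 25 with two slots each (as the example figure suggests), and the names `initial_hash` and `insert` are hypothetical.

```python
NUM_BUCKETS = 26  # assumed: one bucket per initial letter, buckets 0..25
SLOTS = 2         # assumed: two slots per bucket, as in the figure


def initial_hash(identifier):
    """Bucket address = order of the initial letter (A -> 0, ..., Z -> 25)."""
    return ord(identifier[0].upper()) - ord("A")


def insert(table, identifier, slots=SLOTS):
    """Try to store identifier; return (collided, overflowed)."""
    bucket = table[initial_hash(identifier)]
    # Collision: a different identifier already hashed to this bucket.
    collided = any(other != identifier for other in bucket)
    # Overflow: the home bucket has no free slot left.
    overflowed = len(bucket) >= slots
    if not overflowed:
        bucket.append(identifier)
    return collided, overflowed


table = [[] for _ in range(NUM_BUCKETS)]
for name in ["A", "A2", "GA", "GB", "D"]:
    insert(table, name)

print(insert(table, "A3"))  # (True, True): A3 collides and bucket 0 overflows
```

With `slots=1` the second identifier that reaches a bucket would both collide and overflow, which matches the remark that the two events coincide for single-slot buckets.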

Page 10: Chapter 08 Part i

8.2.2 Hash Functions

Ideally, we expect to find a hash function that is one-to-one and easy to compute.

The hash function f(x), where
  f(x) = {i | i is the order of the initial of x},
can result in a lot of collisions because it only considers the initial character.

Key point: use as many characters of the identifier as possible.

Page 11: Chapter 08 Part i

Common Approaches

Division
Mid-square
Folding
Digit Analysis

Page 12: Chapter 08 Part i

Division

The most widely used hash function.
The key k is divided by some number D, and the remainder is used as the bucket address.
  h(k) = k % D
Since the bucket address ranges from 0 to b-1 if there are b buckets, D is usually selected as the number of buckets.
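A minimal Python sketch of the division method follows. The helper `key_to_int`, which folds every character of an identifier into one integer, is an assumption added for illustration (the slides do not say how an identifier becomes a number), and the divisor 13 is an arbitrary choice.

```python
def key_to_int(identifier):
    """Assumed conversion: treat the ASCII bytes as base-256 digits."""
    value = 0
    for byte in identifier.encode("ascii"):
        value = value * 256 + byte
    return value


def division_hash(key, divisor):
    """h(k) = k % D; the result is always in 0 .. divisor-1."""
    return key % divisor


b = 13  # hypothetical number of buckets
for name in ["A", "A2", "GA", "GB"]:
    print(name, division_hash(key_to_int(name), b))
```

Because every character contributes to the integer, GA and GB no longer fall into the same bucket here, unlike with a hash that looks only at the initial.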

Page 13: Chapter 08 Part i

Selecting The Divisor

When the divisor is an even number, odd integers hash into odd home buckets and even integers into even home buckets.
  20%14 = 6, 30%14 = 2, 8%14 = 8
  15%14 = 1, 3%14 = 3, 23%14 = 9

When the divisor is an odd number, odd (even) integers may hash into any home bucket.
  20%15 = 5, 30%15 = 0, 8%15 = 8
  15%15 = 0, 3%15 = 3, 23%15 = 8

With an odd divisor, a bias in the keys does not result in a bias toward either the odd or even home buckets.
There is a better chance of uniformly distributed home buckets.
So do not use an even divisor.
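The parity effect can be checked with a short Python sketch. The function names and the set of even keys are illustrative assumptions; the six sample keys and the divisors 14 and 15 are the ones used above.

```python
def parity_preserved(keys, divisor):
    """True if every key lands in a home bucket of its own parity."""
    return all((k % divisor) % 2 == k % 2 for k in keys)


def buckets_used(keys, divisor):
    """Sorted list of the distinct home buckets hit by the keys."""
    return sorted({k % divisor for k in keys})


sample = [20, 30, 8, 15, 3, 23]
print(parity_preserved(sample, 14))  # True: even divisor keeps parity
print(parity_preserved(sample, 15))  # False: odd divisor mixes parity

even_keys = range(0, 200, 2)         # assumed biased key set: even keys only
print(len(buckets_used(even_keys, 14)))  # 7: only the even buckets are used
print(len(buckets_used(even_keys, 15)))  # 15: every bucket is used
```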

Page 14: Chapter 08 Part i

Selecting The Divisor

A similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of a small prime number such as 3, 5, 7, …
The effect of each prime divisor p of b decreases as p gets larger.

Ideally, choose b so that it is a prime number.
Alternatively, choose b so that it has no prime factor smaller than 20.
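One way to test a candidate table size against this rule is sketched below in Python. The function names are hypothetical, the check uses plain trial division, and b is assumed to be at least 2.

```python
def smallest_prime_factor(n):
    """Smallest prime factor of n (n >= 2), found by trial division."""
    factor = 2
    while factor * factor <= n:
        if n % factor == 0:
            return factor
        factor += 1
    return n  # no smaller factor found, so n itself is prime


def acceptable_table_size(b, limit=20):
    """True if b is prime, or if b has no prime factor smaller than limit."""
    spf = smallest_prime_factor(b)
    return spf == b or spf >= limit


for b in [14, 15, 17, 667]:
    print(b, acceptable_table_size(b))
# 14 and 15 are rejected (factors 2 and 3); 17 is prime; 667 = 23 * 29
```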

Page 15: Chapter 08 Part i

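Mid-square

Square the key, then use an appropriate number of bits from the middle of the square as the bucket address.

Example: suppose a character is represented in 6 bits and the number of buckets is 2^r.

  Identifier A1: A = 01 (octal), 1 = 34 (octal)
  A1 = 0134 (octal) = 000001 011100 (binary) = 92
  92 x 92 = 8464 = 10000100010000 (binary)
  The middle r bits of the square are used as the bucket address.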

Page 16: Chapter 08 Part i

Mid-square

Example: Key = 113586, m = 10000, where 9999 is the largest bucket address.

Squaring the key gives
  113586^2 = 12901779396

Taking the middle four digits,
  h(x) = 1779
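The decimal example can be reproduced with a small Python sketch. How to centre the window when the number of leftover digits is odd is an assumption here: the extra digit is left on the high-order side, which is the choice that yields 1779 for this key.

```python
def mid_square(key, num_digits):
    """Return num_digits decimal digits taken from the middle of key squared."""
    square = str(key * key)
    extra = len(square) - num_digits
    if extra <= 0:
        return int(square)      # square is already short enough
    start = (extra + 1) // 2    # assumed rounding: extra digit stays on the left
    return int(square[start:start + num_digits])


print(113586 * 113586)        # 12901779396
print(mid_square(113586, 4))  # 1779
```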

Page 17: Chapter 08 Part i

Folding

The key k is partitioned into several parts, all of the same length except possibly the last. These parts are then added together to obtain the hash address of k.

Two schemes:
  Shift folding
  Folding at the boundaries

Example: k = 12320324111220
  P1 = 123, P2 = 203, P3 = 241, P4 = 112, P5 = 20

Page 18: Chapter 08 Part i

Folding

Shift folding: the parts are added as they are.
  P1   123
  P2   203
  P3   241
  P4   112
  P5    20
       ---
       699

Folding at the boundaries: P2 and P4 are reversed before adding.
  P1   123
  P2   302
  P3   241
  P4   211
  P5    20
       ---
       897
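Both folding schemes can be sketched in Python for the key 12320324111220 split into three-digit parts. The function names are hypothetical, and the key is handled as a string of decimal digits for simplicity.

```python
def split_parts(digits, width):
    """Cut the digit string into parts of the given width (last may be shorter)."""
    return [digits[i:i + width] for i in range(0, len(digits), width)]


def shift_fold(digits, width):
    """Shift folding: add the parts as they are."""
    return sum(int(part) for part in split_parts(digits, width))


def boundary_fold(digits, width):
    """Folding at the boundaries: reverse every second part (P2, P4, ...) first."""
    total = 0
    for index, part in enumerate(split_parts(digits, width)):
        total += int(part[::-1]) if index % 2 == 1 else int(part)
    return total


key = "12320324111220"
print(split_parts(key, 3))    # ['123', '203', '241', '112', '20']
print(shift_fold(key, 3))     # 699
print(boundary_fold(key, 3))  # 897
```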

Page 19: Chapter 08 Part i

Overflow Handling

An overflow occurs when the home bucket for a new pair (key, element) is full.
We may handle overflows in two ways:

Search the hash table in some systematic fashion for a bucket that is not full.
  Linear probing (linear open addressing).
  Quadratic probing.
  Rehashing.

Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket.
  Array linear list.
  Chain.

Page 20: Chapter 08 Part i

Linear Probing

Also called linear open addressing.
Search the buckets one by one until an empty slot is found.
Procedure: suppose b denotes the number of buckets.
1. Compute h(k).
2. Examine the hash table buckets in the order ht[h(k)], ht[(h(k)+1)%b], …, ht[(h(k)+j)%b] until one of the following happens:
   ht[(h(k)+j)%b] has a pair whose key is k; k is found.
   ht[(h(k)+j)%b] is empty; k is not in the table.
   The search returns to ht[h(k)]; the table is full.
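The search procedure can be sketched in Python as follows. The sketch assumes one key per bucket, `None` as the empty marker, and h(k) = k % b; the name `search` and the small seven-bucket table are illustrative only.

```python
def search(ht, key):
    """Return the bucket index holding key, or None if key is absent."""
    b = len(ht)
    home = key % b                 # step 1: compute h(k)
    for j in range(b):             # step 2: probe home, home+1, ... (mod b)
        index = (home + j) % b
        if ht[index] is None:
            return None            # empty bucket: k is not in the table
        if ht[index] == key:
            return index           # k is found
    return None                    # back at the home bucket: table is full


ht = [None] * 7
ht[3], ht[4], ht[5] = 3, 10, 17    # 3, 10 and 17 all have home bucket 3
print(search(ht, 17))  # 5
print(search(ht, 24))  # None (probing stops at the empty bucket 6)
```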

Page 21: Chapter 08 Part i

Linear Probing

divisor = b (number of buckets) = 17.
Bucket address = key % 17.

Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.

Bucket:  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Key:     34  0   45              6   23  7           28  12  29  11  30  33
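The insertions for this example can be replayed with a short Python sketch. It assumes one key per bucket, `None` for an empty bucket, and no duplicate keys; the function name `insert` is hypothetical.

```python
def insert(ht, key):
    """Place key in the first free bucket at or after its home bucket."""
    b = len(ht)
    home = key % b
    for j in range(b):
        index = (home + j) % b
        if ht[index] is None:
            ht[index] = key
            return index
    raise OverflowError("hash table is full")


ht = [None] * 17
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]:
    insert(ht, key)

for bucket, key in enumerate(ht):
    if key is not None:
        print(bucket, key)
```

Key 45, for instance, has home bucket 11 but ends up in bucket 2 after wrapping around the end of the table.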

Page 22: Chapter 08 Part i

Linear Probing

Bucket:  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Key:     34  0   45              6   23  7           28  12  29  11  30  33

Consider: when 51 enters, how many comparisons are required?

Linear open addressing tends to create clusters. These clusters become larger as more synonyms enter.

Page 23: Chapter 08 Part i

Quadratic Probing

Suppose i is used as the increment.
When overflow occurs, the search is carried out by examining h(x), (h(x)+i^2)%b, and (h(x)-i^2)%b, for 1 ≤ i ≤ (b-1)/2, where b is a prime number of the form 4j+3.
For example, b = 3, 7, 11, …, 43, 59, …
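The probe order can be generated with a small Python sketch. The generator name is hypothetical; the code relies on Python's `%` returning a non-negative result for a positive divisor, so h(x) - i^2 wraps around correctly.

```python
def quadratic_probe_sequence(home, b):
    """Yield home, then (home + i*i) % b and (home - i*i) % b for i = 1 .. (b-1)/2."""
    yield home
    for i in range(1, (b - 1) // 2 + 1):
        yield (home + i * i) % b
        yield (home - i * i) % b


print(list(quadratic_probe_sequence(3, 7)))  # [3, 4, 2, 0, 6, 5, 1]
```

For b = 7, a prime of the form 4j+3, the sequence visits each of the seven buckets exactly once, so a free bucket is found whenever one exists.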

Page 24: Chapter 08 Part i

Rehashing

If overflow occurs at h_i(x), then try h_{i+1}(x).
Use a series of hash functions h_1, h_2, …, h_m to find an empty bucket.

[Figure: x is passed through h_1, h_2, …, h_m in turn, giving h_m(x)]
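A rough Python sketch of rehashing is given below. The three hash functions are invented purely for illustration (the slides do not specify any), the table has 11 single-key buckets, and `None` marks an empty bucket.

```python
B = 11  # assumed number of buckets

# Hypothetical series h1, h2, h3; any family of hash functions could be used.
HASH_FUNCTIONS = [
    lambda k: k % B,
    lambda k: (k // B) % B,
    lambda k: (7 * k + 3) % B,
]


def rehash_insert(ht, key, hash_functions=HASH_FUNCTIONS):
    """Try h1(key), h2(key), ... until an empty bucket is found."""
    for h in hash_functions:
        index = h(key)
        if ht[index] is None:
            ht[index] = key
            return index
    raise OverflowError("every hash function led to a full bucket")


ht = [None] * B
for key in [15, 26, 37, 48]:   # all four keys have h1(key) = 4
    print(key, rehash_insert(ht, key))
```

Only the first key stays in bucket 4; each later key moves on to the next function in the series until it reaches an empty bucket.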

Page 25: Chapter 08 Part i

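Chaining

Chains for b = 17 and bucket address = key % 17, using the keys of the linear probing example (all other buckets in [0] to [16] are empty):

[0]    34, 0
[6]    6, 23
[7]    7
[11]   28, 11, 45
[12]   12, 29
[13]   30
[16]   33

Disadvantage of linear probing: identifiers with different hash values are compared with each other.

Use a linked list to connect the identifiers with the same hash value and to increase the capacity of a bucket.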
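Chaining can be sketched in Python with one list per bucket. Python lists stand in for the linked lists here, new keys are appended at the end of a chain, and the function names are hypothetical; the slides do not fix the order of keys inside a chain.

```python
def chain_insert(ht, key):
    """Append key to the chain of its home bucket (duplicates are ignored)."""
    chain = ht[key % len(ht)]
    if key not in chain:
        chain.append(key)


def chain_search(ht, key):
    """Only keys with the same hash value as key are compared."""
    return key in ht[key % len(ht)]


ht = [[] for _ in range(17)]
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]:
    chain_insert(ht, key)

print(ht[11])                # [28, 11, 45]
print(chain_search(ht, 45))  # True
print(chain_search(ht, 62))  # False (62 % 17 == 11, but 62 is not in that chain)
```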

