Date post: 15-Dec-2015
Page 1: CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.

CS 245 Notes 5 1

CS 245: Database System Principles

Hector Garcia-Molina

Notes 5: Hashing and More

Page 2:

Hashing

key → h(key) → bucket holding <key>

Buckets (typically 1 disk block)

Page 3:

Two alternatives

(1) key → h(key) → bucket of records

Page 4:

Two alternatives

(2) key → h(key) → Index bucket (entry for key 1) → record

• Alt (2) for “secondary” search key

Page 5:

Example hash function

• Key = ‘x1 x2 … xn’, an n-byte character string
• Have b buckets
• h: add x1 + x2 + … + xn
  – compute sum modulo b
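
A minimal Python sketch of this example hash function. The function name `h` and the parameter `b` follow the slide; encoding the key as UTF-8 bytes is an assumption made here so that "n byte character string" has a concrete meaning.

```python
def h(key: str, b: int) -> int:
    """Add up the byte values x1 + x2 + ... + xn, then take the sum modulo b."""
    return sum(key.encode("utf-8")) % b


# Keys with the same bytes in a different order always collide,
# which is one reason this may not be the best function.
same_bucket = h("abc", 4) == h("cba", 4)
```

Because addition ignores byte order, any two anagrams land in the same bucket regardless of b.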

Page 6:

This may not be the best function … Read Knuth Vol. 3 if you really need to select a good function.

Good hash function: expected number of keys/bucket is the same for all buckets

Page 7:

Within a bucket:

• Do we keep keys sorted?
• Yes, if CPU time is critical & inserts/deletes are not too frequent

Page 8:

Next: example to illustrate inserts, overflows, deletes

Pages 9-11:

EXAMPLE: 2 records/bucket

INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0

Bucket 0: d
Bucket 1: a, c
Bucket 2: b
Bucket 3: (empty)

Then h(e) = 1: bucket 1 is already full, so e goes into an overflow block chained to bucket 1.
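
The insert example can be replayed with a small sketch. The class `StaticHashFile` and its methods are hypothetical names introduced here; the hash values are the ones given on the slide. Repacking a bucket chain on delete is one possible policy (an assumption), chosen so that surviving keys move up out of overflow blocks.

```python
BUCKET_CAPACITY = 2  # "2 records/bucket", as in the example


class StaticHashFile:
    """Hypothetical static hash file: each bucket is a chain of blocks."""

    def __init__(self, num_buckets, h):
        self.h = h
        # chain[0] is the primary block; later blocks are overflow blocks.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        if len(chain[-1]) == BUCKET_CAPACITY:
            chain.append([])  # last block is full: add an overflow block
        chain[-1].append(key)

    def delete(self, key):
        chain = self.buckets[self.h(key)]
        keys = [k for block in chain for k in block]
        if key not in keys:
            return False
        keys.remove(key)
        # Assumed policy: repack survivors so later keys move up and
        # empty overflow blocks disappear.
        chain[:] = [keys[n:n + BUCKET_CAPACITY]
                    for n in range(0, len(keys), BUCKET_CAPACITY)] or [[]]
        return True


slide_h = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}  # hash values from the slide
f = StaticHashFile(4, slide_h.__getitem__)
for key in "abcde":
    f.insert(key)
```

After the five inserts, bucket 1 holds a and c in its primary block and e in one overflow block, matching the slide.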

Pages 12-14:

EXAMPLE: deletion

[Figure: buckets 0 to 3, holding keys a, b, c, d, e, f, g; the exact layout is not recoverable from this transcript.]

Delete: e, f

• After f is deleted: maybe move “g” up

[Later builds of the slide add the labels c and d.]

Pages 15-16:

Rule of thumb:

• Try to keep space utilization between 50% and 80%

  Utilization = (# keys used) / (total # keys that fit)

• If < 50%, wasting space
• If > 80%, overflows significant; depends on how good the hash function is & on # keys/bucket
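
As a quick arithmetic check of the rule of thumb, here is a small sketch. The function names and the choice to express "total # keys that fit" as buckets times keys per bucket are assumptions for illustration.

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Utilization = (# keys used) / (total # keys that fit)."""
    return keys_used / (num_buckets * keys_per_bucket)


def rule_of_thumb(u: float) -> str:
    """Classify a utilization value using the 50% and 80% bounds from the slide."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "within 50% to 80%"


# Five keys in four buckets of two records each.
u = utilization(5, 4, 2)
```

With five keys in four buckets of two records each, utilization is 5/8 = 62.5%, which falls inside the recommended range.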

Pages 17-18:

How do we cope with growth?

• Overflows and reorganizations
• Dynamic hashing
  • Extensible
  • Linear

Page 19:

Extensible hashing: two ideas

(a) Use i of b bits output by hash function

    h(K) → 00110101 (b bits); use the first i bits; i grows over time…

Page 20:

(b) Use directory

    h(K)[i] → directory entry → pointer to bucket

Pages 21-23:

Example: h(k) is 4 bits; 2 keys/bucket

Before (i = 1):
  0 → bucket [0001] (uses 1 bit)
  1 → bucket [1001, 1100] (uses 1 bit)

Insert 1010: the bucket for entry 1 is full, so it splits on the second bit:
  [1001, 1010] (uses 2 bits)
  [1100] (uses 2 bits)

New directory (i = 2):
  00 → [0001] (uses 1 bit)
  01 → [0001] (same bucket as 00)
  10 → [1001, 1010] (uses 2 bits)
  11 → [1100] (uses 2 bits)

Pages 24-26:

Example continued

Directory (i = 2) before the inserts:
  00, 01 → [0001] (uses 1 bit)
  10 → [1001, 1010] (uses 2 bits)
  11 → [1100] (uses 2 bits)

Insert: 0111, 0000

  0111 fits: the shared bucket becomes [0001, 0111].
  0000 finds that bucket full, so it splits on the second bit; the directory does not need to grow:

  00 → [0000, 0001] (uses 2 bits)
  01 → [0111] (uses 2 bits)

Pages 27-29:

Example continued

Directory (i = 2) before the insert:
  00 → [0000, 0001] (uses 2 bits)
  01 → [0111] (uses 2 bits)
  10 → [1001, 1010] (uses 2 bits)
  11 → [1100] (uses 2 bits)

Insert: 1001

  The bucket for 10 is full and already uses 2 bits, so the directory doubles and the bucket splits on the third bit.

New directory (i = 3):
  000, 001 → [0000, 0001] (uses 2 bits)
  010, 011 → [0111] (uses 2 bits)
  100 → [1001, 1001] (uses 3 bits)
  101 → [1010] (uses 3 bits)
  110, 111 → [1100] (uses 2 bits)
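
The insert walkthrough can be reproduced with a short sketch of extensible hashing. The class and method names (`ExtensibleHash`, `Bucket`, `_split`) are hypothetical, keys are stored directly as their 4-bit hash values, and the directory is indexed by the first i bits, as in the example. This is one way to write the split logic under those assumptions, not the course's reference implementation.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # how many leading hash bits this bucket uses
        self.keys = []


class ExtensibleHash:
    def __init__(self, b=4, capacity=2):
        self.b = b                # bits produced by the hash function
        self.capacity = capacity  # keys per bucket
        self.i = 1                # bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, hv):
        return hv >> (self.b - self.i)  # first i bits of the hash value

    def insert(self, hv):
        while True:
            bucket = self.directory[self._slot(hv)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(hv)
                return
            if bucket.depth == self.b:
                # No bits left to split on (e.g. duplicates).
                raise OverflowError("an overflow chain would be needed here")
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.i:
            # Double the directory: old entry j becomes entries 2j and 2j+1.
            self.directory = [bk for bk in self.directory for _ in (0, 1)]
            self.i += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        shift = self.b - bucket.depth  # position of the newly used bit
        old = bucket.keys
        bucket.keys = [k for k in old if not (k >> shift) & 1]
        sibling.keys = [k for k in old if (k >> shift) & 1]
        dshift = self.i - bucket.depth
        for j, bk in enumerate(self.directory):
            if bk is bucket and (j >> dshift) & 1:
                self.directory[j] = sibling


eh = ExtensibleHash()
for hv in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    eh.insert(hv)
layout = [sorted(bk.keys) for bk in eh.directory]
```

Replaying the seven inserts from the example ends with i = 3 and an eight-entry directory in which entries 100 and 101 point to separate buckets while the other pairs still share one.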

Page 30:

Extensible hashing: deletion

• No merging of blocks
• Merge blocks and cut directory if possible (reverse insert procedure)

Page 31:

Deletion example:

• Run through insert example in reverse!

Pages 32-33:

Note: Still need overflow chains

• Example: many records with duplicate keys

  Bucket (uses 1 bit): 1101, 1100
  Insert 1100

  If we split: the new buckets use 2 bits, but 1101, 1100 and 1100 all start with 11, so they still land in the same bucket. [The rest of the figure is not recoverable from this transcript.]

Solution: overflow chains

  Bucket (uses 1 bit): 1101, 1100
  Insert 1100: add overflow block holding 1100

Page 34:

Summary: Extensible hashing

(+) Can handle growing files
    – with less wasted space
    – with no full reorganizations
(-) Indirection (not bad if directory in memory)
(-) Directory doubles in size (now it fits, now it does not)

Page 35:

Linear hashing

• Another dynamic hashing scheme

Two ideas:

(a) Use i low-order bits of hash

    01110101 (b bits); the last i bits are used; i grows

(b) File grows linearly

Pages 36-39:

Example: b = 4 bits, i = 2, 2 keys/bucket

Bucket 00: 0000, 1010
Bucket 01: 0101, 1111
Buckets 10, 11: future growth buckets

m = 01 (max used block)

Rule:
If h(k)[i] ≤ m, then look at bucket h(k)[i]
else, look at bucket h(k)[i] − 2^(i−1)

• insert 0101: bucket 01 is full, so 0101 goes into an overflow block
• can have overflow chains!
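
The lookup rule can be written as a few lines of Python. The function name `bucket_for` is hypothetical, h(k)[i] is taken to mean the i low-order bits of the hash value, and the comparison in the rule is read as "less than or equal to m".

```python
def bucket_for(hv: int, i: int, m: int) -> int:
    """Apply the linear hashing rule to hash value hv with i bits and max bucket m."""
    low = hv & ((1 << i) - 1)    # h(k)[i]: the i low-order bits
    if low <= m:
        return low               # that bucket already exists
    return low - (1 << (i - 1))  # otherwise drop the leading bit: subtract 2^(i-1)


# The four keys of the example, with i = 2 and m = 01.
placement = {format(hv, "04b"): bucket_for(hv, 2, 0b01)
             for hv in (0b0000, 0b1010, 0b0101, 0b1111)}
```

With i = 2 and m = 01, keys 0000 and 1010 map to bucket 00 and keys 0101 and 1111 map to bucket 01, as in the example.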

Page 40:

Note

• In textbook, n is used instead of m
• n = m + 1

In the running example (buckets 00 and 01 in use, 10 and 11 future growth buckets), m = 01 (max used block), so n = 10.

Pages 41-43:

Example: b = 4 bits, i = 2, 2 keys/bucket

• insert 0101 (continued): the file grows one bucket at a time

m = 10: bucket 10 is added; 1010 moves from bucket 00 to bucket 10
m = 11: bucket 11 is added; 1111 moves from bucket 01 to bucket 11, and the overflow 0101 now fits in bucket 01

Bucket 00: 0000
Bucket 01: 0101, 0101
Bucket 10: 1010
Bucket 11: 1111

Pages 44-47:

Example continued: How to grow beyond this?

i = 2, m = 11 (max used block)

Bucket 00: 0000
Bucket 01: 0101, 0101
Bucket 10: 1010
Bucket 11: 1111

Increase i to 3: buckets 00, 01, 10, 11 become 000, 001, 010, 011, and 100, 101, 110, 111 are the new future growth buckets.

m = 100: bucket 100 is added
m = 101: bucket 101 is added; 0101 and 0101 move from bucket 001 to bucket 101
. . .

Page 48:

When do we expand file?

• Keep track of: U = (# used slots) / (total # of slots)
• If U > threshold, then increase m (and maybe i)
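
A minimal sketch of linear hashing growth driven by U. The class `LinearHash`, the default threshold of 0.8, and counting only primary slots in "total # of slots" are assumptions made for illustration; a bucket list longer than the capacity stands in for an overflow chain.

```python
class LinearHash:
    def __init__(self, capacity=2, threshold=0.8):
        self.capacity = capacity    # keys per bucket
        self.threshold = threshold  # assumed value; the slide leaves it open
        self.i = 1                  # low-order bits in use
        self.m = 1                  # max used bucket number
        self.buckets = [[], []]     # extra keys in a list model an overflow chain
        self.count = 0

    def _addr(self, hv):
        low = hv & ((1 << self.i) - 1)
        return low if low <= self.m else low - (1 << (self.i - 1))

    def utilization(self):
        # Assumption: only primary slots count toward the total.
        return self.count / (len(self.buckets) * self.capacity)

    def insert(self, hv):
        self.buckets[self._addr(hv)].append(hv)
        self.count += 1
        while self.utilization() > self.threshold:
            self._grow()

    def _grow(self):
        self.m += 1
        if self.m == (1 << self.i):  # m no longer fits in i bits
            self.i += 1
        self.buckets.append([])
        # Only the bucket that differs from m in the leading bit is split.
        source = self.m - (1 << (self.i - 1))
        old = self.buckets[source]
        self.buckets[source] = [k for k in old if self._addr(k) == source]
        self.buckets[self.m] = [k for k in old if self._addr(k) == self.m]


lh = LinearHash()
for hv in [0b0000, 0b1010, 0b0101, 0b1111, 0b0101]:
    lh.insert(hv)
```

With these assumed settings the five inserts end in the same state as the example: four buckets holding 0000; 0101, 0101; 1010; and 1111, with i = 2 and m = 11. The intermediate steps differ from the slides, which let bucket 01 overflow before growing.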

Page 49:

Summary: Linear Hashing

(+) Can handle growing files
    – with less wasted space
    – with no full reorganizations
(+) No indirection like extensible hashing
(-) Can still have overflow chains

Page 50:

Example: BAD CASE

[Figure: a file in which some buckets are very full and others are very empty.]

Need to move m here… Would waste space...

Page 51:

Summary

Hashing
- How it works
- Dynamic hashing
  - Extensible
  - Linear

Page 52:

Next:

• Indexing vs Hashing
• Index definition in SQL
• Multiple key access

Page 53:

Indexing vs Hashing

• Hashing good for probes given key
  e.g., SELECT …
        FROM R
        WHERE R.A = 5

Page 54:

Indexing vs Hashing

• INDEXING (including B-trees) good for range searches:
  e.g., SELECT
        FROM R
        WHERE R.A > 5
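
The contrast can be illustrated in memory: a Python dict plays the role of the hash structure and a sorted list searched with `bisect` stands in for an ordered index such as a B-tree. The rows and record ids below are hypothetical.

```python
import bisect

# Hypothetical relation R: (value of A, record id).
rows = [(7, "r1"), (5, "r2"), (9, "r3"), (3, "r4"), (5, "r5")]

# Hash structure: direct probe by key, but no useful key order.
hash_index = {}
for a, rid in rows:
    hash_index.setdefault(a, []).append(rid)

# Ordered index: entries sorted by A, so a range is one contiguous slice.
sorted_index = sorted(rows)
sorted_keys = [a for a, _ in sorted_index]


def probe(a):
    """WHERE R.A = a, answered by the hash structure."""
    return hash_index.get(a, [])


def range_greater(a):
    """WHERE R.A > a, answered by the ordered index."""
    start = bisect.bisect_right(sorted_keys, a)
    return [rid for _, rid in sorted_index[start:]]
```

Answering the range query from `hash_index` alone would require examining every key, since hashing scatters neighbouring values across buckets.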

Page 55:

Index definition in SQL

• CREATE INDEX name ON rel (attr)
• CREATE UNIQUE INDEX name ON rel (attr): defines candidate key
• DROP INDEX name

Page 56:

Note

CANNOT SPECIFY TYPE OF INDEX (e.g., B-tree, Hashing, …) OR PARAMETERS (e.g., Load Factor, Size of Hash, ...)

... at least in SQL...

Page 57:

Note

ATTRIBUTE LIST → MULTIKEY INDEX (next)

e.g., CREATE INDEX foo ON R(A,B,C)
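
These statements can be tried against SQLite through Python's standard `sqlite3` module. SQLite is a stand-in chosen here, not a system named in the notes; the table R with integer columns A, B, C and the index names `idx_a` and `idx_b` are hypothetical, while `foo` on R(A,B,C) comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER, C INTEGER)")

conn.execute("CREATE INDEX idx_a ON R (A)")         # plain index
conn.execute("CREATE UNIQUE INDEX idx_b ON R (B)")  # B becomes a candidate key
conn.execute("CREATE INDEX foo ON R (A, B, C)")     # attribute list: multikey index

conn.execute("INSERT INTO R VALUES (1, 10, 100)")
try:
    conn.execute("INSERT INTO R VALUES (2, 10, 200)")  # repeats B = 10
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX idx_a")
index_names = sorted(
    row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"))
```

The second insert is rejected because the unique index enforces that no two rows share a value of B. Note that none of the statements says which index structure to build.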

Page 58:

Multi-key Index

Motivation: Find records where DEPT = “Toy” AND SAL > 50k

Page 59:

Strategy I:

• Use one index, say Dept.
• Get all Dept = “Toy” records and check their salary

Page 60:

Strategy II:

• Use 2 indexes; manipulate pointers (combine the pointers for Dept = “Toy” with the pointers for Sal > 50k)
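
Strategies I and II can be compared on a toy in-memory table. Everything here is illustrative: record ids play the role of pointers, the records other than Joe are made up, salaries are in thousands, and reading "manipulate pointers" as a set intersection is an assumption.

```python
import bisect

# Hypothetical records keyed by record id (the "pointer").
records = {
    1: {"name": "Joe", "dept": "Sales", "sal": 15},
    2: {"name": "Ann", "dept": "Toy", "sal": 60},
    3: {"name": "Bob", "dept": "Toy", "sal": 40},
    4: {"name": "Eve", "dept": "Art", "sal": 70},
}

dept_index = {}  # dept -> set of pointers
for rid, rec in records.items():
    dept_index.setdefault(rec["dept"], set()).add(rid)

sal_index = sorted((rec["sal"], rid) for rid, rec in records.items())


def strategy_1(dept, min_sal):
    """One index: fetch every record of the department, then check its salary."""
    return sorted(rid for rid in dept_index.get(dept, ())
                  if records[rid]["sal"] > min_sal)


def strategy_2(dept, min_sal):
    """Two indexes: intersect the pointer sets before fetching any record."""
    start = bisect.bisect_right(sal_index, (min_sal, float("inf")))
    sal_pointers = {rid for _, rid in sal_index[start:]}
    return sorted(dept_index.get(dept, set()) & sal_pointers)


result_1 = strategy_1("Toy", 50)
result_2 = strategy_2("Toy", 50)
```

Both strategies return the same pointers; they differ in how many records must be fetched before the answer is known.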

Page 61:

Strategy III:

• Multiple Key Index

One idea: [Figure: an index I1 whose entries lead to further indexes I2 and I3.]

Page 62:

Example

Example record: Name = Joe, DEPT = Sales, SAL = 15k

Dept index: Art, Sales, Toy
Salary index: 10k, 15k, 17k, 21k
Salary index: 12k, 15k, 15k, 19k

[Figure: the transcript does not show which Dept entry points to which Salary index.]

Page 63:

For which queries is this index good?

• Find RECs Dept = “Sales” AND SAL = 20k
• Find RECs Dept = “Sales” AND SAL > 20k
• Find RECs Dept = “Sales”
• Find RECs SAL = 20k
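
One way to explore the question is a sketch of a two-level index: a Dept index whose entries each lead to a salary list sorted within that department. The structure, function names, and all rows except Joe's are assumptions for illustration; salaries are in thousands.

```python
import bisect

# (dept, salary, record id); only the Joe row comes from the example record.
rows = [("Sales", 15, "joe"), ("Sales", 20, "r2"), ("Sales", 21, "r3"),
        ("Art", 10, "r4"), ("Toy", 20, "r5"), ("Toy", 12, "r6")]

index = {}  # dept -> (sorted salaries, matching record ids)
for dept, sal, rid in sorted(rows):
    sals, rids = index.setdefault(dept, ([], []))
    sals.append(sal)
    rids.append(rid)


def dept_and_sal_eq(dept, sal):
    sals, rids = index.get(dept, ([], []))
    return rids[bisect.bisect_left(sals, sal):bisect.bisect_right(sals, sal)]


def dept_and_sal_gt(dept, sal):
    sals, rids = index.get(dept, ([], []))
    return rids[bisect.bisect_right(sals, sal):]


def dept_only(dept):
    return list(index.get(dept, ([], []))[1])


def sal_only(sal):
    # No way to start from the salary: every department must be visited.
    return [rid for dept in index for rid in dept_and_sal_eq(dept, sal)]
```

In this sketch the first three queries start from a single Dept entry, while the salary-only query has to search the salary list of every department, which suggests the index serves it poorly.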

