Date post: 21-Dec-2015
CS4432: Database Systems II
Lecture #10: Indexing & Hashing
Professor Elke A. Rundensteiner

Chapter 4 – INDEXING Wrap-up

1. B+-tree Odds and Ends
2. Hashing (briefly)

[Figure: an index maps a value to a record]


B+Tree Example n=3

Root: [100]
Second level: [30] and [120, 150, 180]
Leaves: [3, 5, 11] [30, 35] [100, 101, 110] [120, 130] [150, 156, 179] [180, 200]
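A lookup in this example tree can be sketched as follows. This is a minimal illustration, not the course's implementation: the tuple layout and the names `ROOT` and `lookup` are assumptions made here, and the convention that keys greater than or equal to a separator go to the right child is read off the example values.

```python
from bisect import bisect_right

# Example tree from the slide (n = 3), written as nested tuples.
# An internal node is ("node", keys, children); a leaf is ("leaf", keys).
LEAVES = [
    ("leaf", [3, 5, 11]),
    ("leaf", [30, 35]),
    ("leaf", [100, 101, 110]),
    ("leaf", [120, 130]),
    ("leaf", [150, 156, 179]),
    ("leaf", [180, 200]),
]
LEFT = ("node", [30], LEAVES[0:2])
RIGHT = ("node", [120, 150, 180], LEAVES[2:6])
ROOT = ("node", [100], [LEFT, RIGHT])


def lookup(node, key):
    """Return (found, nodes_visited) for key, descending from node."""
    visited = 1
    while node[0] == "node":
        _, keys, children = node
        # Keys >= keys[i] live in child i + 1, so count the separators <= key.
        node = children[bisect_right(keys, key)]
        visited += 1
    return key in node[1], visited


print(lookup(ROOT, 156))
print(lookup(ROOT, 99))
```

Every lookup touches one node per level, which is why the height of the tree, and therefore n, matters for the cost.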


Comparison: B-tree vs. indexed seq. file

Indexed sequential file:
• Less space, so lookup faster
• Inserts managed by overflow area
• Requires temporary restructuring
• Unpredictable performance

B-tree:
• Consumes more space, so lookup slower
• Each insert/delete potentially restructures
• Built-in restructuring
• Predictable performance


• DBA does not know when to reorganize

• DBA does not know how full to load pages of new index

B-trees better …


• A la buffering… Is LRU a good policy for B+tree buffers?

Of course not!

Should try to keep root in memory at all times

(and perhaps some nodes from second level)
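To make the point concrete, here is a small sketch of an LRU buffer that can pin chosen pages. The class `PinnedLRU`, the page ids, and the tiny capacity are all hypothetical choices for illustration; a real buffer manager is more involved.

```python
from collections import OrderedDict


class PinnedLRU:
    """LRU page cache that never evicts pinned page ids (e.g. the root)."""

    def __init__(self, capacity, pinned=()):
        self.capacity = capacity
        self.pinned = set(pinned)
        self.pages = OrderedDict()  # page id -> page, least recently used first
        self.misses = 0

    def get(self, page_id, read_page):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        page = read_page(page_id)  # stands in for a disk read
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            # Evict the least recently used page that is neither pinned nor new.
            victim = next((p for p in self.pages
                           if p not in self.pinned and p != page_id), None)
            if victim is not None:
                del self.pages[victim]
        return page


def run(cache):
    """Replay six root-to-leaf lookups over two hypothetical paths."""
    for _ in range(3):
        for path in (["root", "A", "leaf1"], ["root", "B", "leaf2"]):
            for page_id in path:
                cache.get(page_id, lambda p: p)
    return cache.misses


print(run(PinnedLRU(2)), run(PinnedLRU(2, pinned=["root"])))
```

With a buffer smaller than one root-to-leaf path, plain LRU throws the root out during every lookup because the root is always the oldest page on the path; pinning it removes those repeated reads.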


Interesting problem:

For B+tree, how large should n be?

n is number of keys / node


Assumptions: n children per node and N records in database

(1) Time to read B-tree node from disk is (t_seek + t_read * n) msec.
(2) Once in main memory, use binary search to locate key: (a + b * log_2(n)) msec.
(3) Need to search (read) log_n(N) tree nodes.
(4) t_search = (t_seek + t_read * n + a + b * log_2(n)) * log_n(N)
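Formula (4) is easy to evaluate numerically. The sketch below does that and scans integer values of n for the cheapest one; the parameter values in `base` are made up for illustration and are not measurements from the lecture.

```python
import math


def t_search(n, N, t_seek, t_read, a, b):
    """Formula (4): cost per node times the number of nodes on the search path."""
    per_node = t_seek + t_read * n + a + b * math.log2(n)
    return per_node * math.log(N, n)


def best_n(N, t_seek, t_read, a, b, n_max=10_000):
    """Brute force: the integer n in [2, n_max] with the smallest t_search."""
    return min(range(2, n_max + 1),
               key=lambda n: t_search(n, N, t_seek, t_read, a, b))


# Hypothetical parameters, all times in msec.
base = dict(N=1_000_000, t_seek=10.0, t_read=0.01, a=0.001, b=0.001)
for n in (4, 32, 256, 2048):
    print(n, round(t_search(n, **base), 2))
print("best n:", best_n(**base))
print("best n with a 10x faster seek:", best_n(**{**base, "t_seek": 1.0}))
```

Small n gives a tall tree with many seeks; very large n makes each node expensive to transfer, so the cost falls and then rises again.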


Can get: f(n) = time to find a record

[Plot: f(n) versus n, with its minimum at n_opt]

FIND n_opt by f'(n) = 0

What happens to n_opt as:
• Disk gets faster?
• CPU gets faster? …
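One way to work the f'(n) = 0 step, as a sketch under the cost model t_search = (t_seek + t_read * n + a + b * log_2(n)) * log_n(N), treating n as continuous (the switch to natural logarithms is a convenience introduced here):

```latex
\begin{align*}
f(n) &= \bigl(t_{seek} + t_{read}\,n + a + b\log_2 n\bigr)\,\frac{\ln N}{\ln n} \\
     &= \ln N\left[\frac{t_{seek} + a + t_{read}\,n}{\ln n} + \frac{b}{\ln 2}\right] \\
f'(n) = 0 \;&\Longleftrightarrow\; t_{read}\,\ln n = \frac{t_{seek} + a + t_{read}\,n}{n} \\
            &\Longleftrightarrow\; n_{opt}\,(\ln n_{opt} - 1) = \frac{t_{seek} + a}{t_{read}}
\end{align*}
```

Under this simplified model, n_opt grows with the ratio (t_seek + a) / t_read: a faster seek pushes n_opt down, a faster transfer pushes it up, and b drops out because b * log_2(n) * log_n(N) does not depend on n.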


Bulk Loading of B+ Tree

• For a large collection of records, create a B+ tree.
• Method 1: Repeatedly insert records: slow.
• Method 2: Bulk loading: more efficient.


Bulk Loading of B+ Tree

• Initialization:
  – Sort all data entries
  – Insert pointer to first (leaf) page in new (root) page.

[Figure: an empty Root page pointing to the first of the sorted pages of data entries; the pages are not yet in the B+ tree]

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*


Bulk Loading (Contd.)

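• Index entries for leaf pages always entered into right-most index page
• When this fills up, it splits. (Split may go up right-most path to root.)

Faster than repeated inserts, especially when one considers locking!

[Figure: right-most index page full]
Root: [10, 20]
Index pages: [6] [12] [23, 35]
Data entry pages: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44* (right-most data entry pages not yet in B+ tree)

[Figure: after the split]
Root: [20]
Second level: [10] [35]
Third level: [6] [12] [23] [38]
Data entry pages: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44* (right-most data entry pages not yet in B+ tree)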


Summary of Bulk Loading

• Method 1: multiple inserts.
  – Slow.
  – Does not give sequential storage of leaves.
• Method 2: Bulk Loading
  – Has advantages for concurrency control.
  – Fewer I/Os during build.
  – Leaves will be stored sequentially (and linked)
  – Can control “fill factor” on pages.
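A bulk load can be sketched in a few lines. Note the swap: instead of inserting into the right-most index page and splitting it, this sketch builds the tree level by level from the sorted leaves, which is simpler to show and gives the same sequential, linked leaves. The function names, the dict-based nodes, and the parameters `leaf_capacity`, `fanout`, and `fill` are assumptions made here for illustration.

```python
from bisect import bisect_right


def bulk_load(sorted_keys, leaf_capacity=2, fanout=3, fill=1.0):
    """Build a B+ tree bottom-up from sorted keys and return its root."""
    if not sorted_keys:
        raise ValueError("need at least one key")
    per_leaf = max(1, int(leaf_capacity * fill))  # the fill factor is applied here
    leaves = [{"keys": sorted_keys[i:i + per_leaf], "next": None}
              for i in range(0, len(sorted_keys), per_leaf)]
    for left, right in zip(leaves, leaves[1:]):
        left["next"] = right  # sibling link for range scans
    # Each level is a list of (smallest key in subtree, node) pairs.
    level = [(leaf["keys"][0], leaf) for leaf in leaves]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            node = {"keys": [low for low, _ in group[1:]],  # separator keys
                    "children": [child for _, child in group]}
            parents.append((group[0][0], node))
        level = parents  # note: the last node of a level may end up underfull
    return level[0][1]


def search(node, key):
    """Descend to the leaf that would hold key and report whether it is there."""
    while "children" in node:
        node = node["children"][bisect_right(node["keys"], key)]
    return key in node["keys"]


KEYS = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root = bulk_load(KEYS)
print(root["keys"], search(root, 31), search(root, 5))
```

Passing `fill=0.5` leaves every leaf half empty, which is the kind of control over the fill factor that repeated inserts do not offer.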


Hashing

[Figure: key → h(key) selects the bucket that stores <key>]

Buckets (typically 1 disk block)


Example hash function

• Key = ‘x1 x2 … xn’, an n-byte character string
• Have b buckets
• h: add x1 + x2 + … + xn
  – compute sum modulo b
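The example function is a one-liner in Python. The name `h` follows the slide; encoding the string as UTF-8 to get its bytes is an assumption made here.

```python
def h(key: str, b: int) -> int:
    """Add the byte values of key and reduce the sum modulo b buckets."""
    return sum(key.encode("utf-8")) % b


print(h("abc", 4), h("cab", 4))
```

Because addition ignores order, any two keys made of the same bytes land in the same bucket, which is one reason this may not be the best function.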


This may not be the best function … Read Knuth Vol. 3 if you really need to select a good function.

Good hash function: Expected number of keys/bucket is the same for all buckets


Within a bucket:

• Do we keep keys sorted?

• Yes, if CPU time critical & Inserts/Deletes not too frequent


Next: example to illustrate inserts, overflows, deletes


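EXAMPLE: 2 records/bucket

INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0

Bucket 0: d
Bucket 1: a, c
Bucket 2: b
Bucket 3: (empty)

h(e) = 1: bucket 1 is already full, so e goes into an overflow block chained to bucket 1.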
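The insert example (2 records per bucket, 4 buckets, h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, h(e) = 1) can be replayed with a short sketch. The hash values are taken from the slide as a lookup table rather than computed, and the representation of a bucket as a list of blocks is an assumption made here.

```python
BUCKET_CAPACITY = 2  # records per block, as in the example

# Hash values given on the slide; a real table would compute them.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}


def insert(table, key):
    """Put key in the first block of its chain with room, else add an overflow block."""
    chain = table[H[key]]  # list of blocks; chain[0] is the primary bucket
    for block in chain:
        if len(block) < BUCKET_CAPACITY:
            block.append(key)
            return
    chain.append([key])  # every block is full: chain a new overflow block


table = [[[]] for _ in range(4)]  # 4 buckets, each starting as one empty block
for key in "abcde":
    insert(table, key)
print(table)
```

A probe for e now reads two blocks instead of one, which is the cost that overflow chains add.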


EXAMPLE: deletion

[Figure: buckets 0 to 3 holding keys a, b, c, d, e, f, g]

Delete: e, f
• maybe move “g” up
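Deletion with the optional "move up" step can be sketched like this. The function `delete` and the sample chains are hypothetical; the slide's exact bucket layout is not assumed. Repacking the whole chain is only one possible policy, since the slide itself says "maybe".

```python
def delete(chain, key, capacity=2):
    """Remove key from a bucket chain, then repack so later records move up."""
    records = [k for block in chain for k in block if k != key]
    # Refill blocks front to back; overflow blocks left empty disappear.
    blocks = [records[i:i + capacity] for i in range(0, len(records), capacity)]
    return blocks or [[]]  # always keep the primary block, even if empty


# Hypothetical chains: a primary block followed by optional overflow blocks.
print(delete([["c", "e"], ["d"]], "e"))
print(delete([["f", "g"]], "f"))
```

Moving records up frees overflow blocks and shortens later probes, at the price of extra writes during the delete.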


Rule of thumb:
• Try to keep space utilization between 50% and 80%

  Utilization = (# keys used) / (total # keys that fit)

• If < 50%, wasting space
• If > 80%, overflows significant (depends on how good hash function is & on # keys/bucket)


How do we cope with growth?

• Overflows and reorganizations
• Dynamic hashing
  – Extensible hashing
  – Others …


Extensible hashing: idea 1

(a) Use i of b bits output by hash function

[Figure: h(K) = 00110101 is b bits wide; use i of them; i grows over time…]


Extensible hashing: idea 2

(b) Use directory

[Figure: h(K)[i] selects a directory entry, which points to a bucket]


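Example: h(k) is 4 bits; 2 keys/bucket

i = 1
  0 → 0001 (bucket uses 1 bit)
  1 → 1001, 1100 (bucket uses 1 bit)

Insert 1010: bucket 1 is full and already uses all i bits, so the directory doubles and the bucket splits.

New directory, i = 2
  00, 01 → 0001 (bucket uses 1 bit)
  10 → 1001, 1010 (bucket uses 2 bits)
  11 → 1100 (bucket uses 2 bits)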


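Example continued

i = 2
  00, 01 → 0001 (bucket uses 1 bit)
  10 → 1001, 1010 (bucket uses 2 bits)
  11 → 1100 (bucket uses 2 bits)

Insert: 0111, 0000
0111 fits next to 0001. 0000 then overflows that bucket, which uses only 1 bit, so it splits without doubling the directory.

i = 2
  00 → 0000, 0001 (bucket uses 2 bits)
  01 → 0111 (bucket uses 2 bits)
  10 → 1001, 1010 (bucket uses 2 bits)
  11 → 1100 (bucket uses 2 bits)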


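Example continued

i = 2
  00 → 0000, 0001 (bucket uses 2 bits)
  01 → 0111 (bucket uses 2 bits)
  10 → 1001, 1010 (bucket uses 2 bits)
  11 → 1100 (bucket uses 2 bits)

Insert: 1001
Bucket 10 is full and already uses all i bits, so the directory doubles again and the bucket splits.

i = 3
  000, 001 → 0000, 0001 (bucket uses 2 bits)
  010, 011 → 0111 (bucket uses 2 bits)
  100 → 1001, 1001 (bucket uses 3 bits)
  101 → 1010 (bucket uses 3 bits)
  110, 111 → 1100 (bucket uses 2 bits)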
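The whole example (4-bit hash values, 2 keys per bucket, starting with 0001, 1001, 1100 and then inserting 1010, 0111, 0000, 1001) can be replayed with a small sketch. It assumes the directory is indexed by the leading i bits, as in the example; the class name, the dict-based buckets, and the `depth` field (the number of bits a bucket uses) are illustrative choices, not the lecture's code.

```python
class ExtensibleHash:
    """Extensible hash table over bit-string keys; leading bits index the directory."""

    def __init__(self, bits=4, capacity=2):
        self.bits = bits          # b: width of h(K)
        self.capacity = capacity  # keys per bucket
        self.i = 1                # number of leading bits currently in use
        self.directory = [{"depth": 1, "keys": []}, {"depth": 1, "keys": []}]

    def _slot(self, key):
        """Directory index: the first i bits of the key."""
        return int(key, 2) >> (self.bits - self.i)

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket["keys"]) < self.capacity:
            bucket["keys"].append(key)
            return
        if bucket["depth"] == self.bits:
            raise OverflowError("no bits left to split on; would need an overflow block")
        if bucket["depth"] == self.i:
            # Double the directory: each old entry becomes two adjacent entries.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.i += 1
        # Split the full bucket on its next bit.
        depth = bucket["depth"] + 1
        zero = {"depth": depth, "keys": []}
        one = {"depth": depth, "keys": []}
        for k in bucket["keys"]:
            bit = (int(k, 2) >> (self.bits - depth)) & 1
            (one if bit else zero)["keys"].append(k)
        for slot, b in enumerate(self.directory):
            if b is bucket:
                bit = (slot >> (self.i - depth)) & 1
                self.directory[slot] = one if bit else zero
        self.insert(key)  # retry; the target bucket may need to split again


table = ExtensibleHash(bits=4, capacity=2)
for key in ["0001", "1001", "1100", "1010", "0111", "0000", "1001"]:
    table.insert(key)
for slot, bucket in enumerate(table.directory):
    print(format(slot, "03b"), bucket["depth"], bucket["keys"])
```

Directory entries that print the same bucket are sharing it: only a bucket whose depth equals i forces the directory to double when it splits.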


Extensible hashing: deletion

• Merge blocks and cut directory if possible

(Reverse insert procedure)


Summary: Extensible hashing

(+) Can handle growing files
    – with less wasted space
    – with no full reorganizations

(-) Indirection (not bad if directory in memory)

(-) Directory doubles in size (now it fits, now it does not)


Indexing vs Hashing

• Hashing good for probes given key
  e.g., SELECT …
        FROM R
        WHERE R.A = 5


Indexing vs Hashing

• INDEXING (Including B Trees) good for Range Searches:
  e.g., SELECT …
        FROM R
        WHERE R.A > 5
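The contrast between the two query shapes (R.A = 5 versus R.A > 5) shows up even in a toy in-memory sketch. The relation `R`, the two index structures, and the function names are hypothetical; a Python dict stands in for a hash index and a sorted list for the leaf level of a B+ tree.

```python
from bisect import bisect_right

# Hypothetical relation R with a single attribute A; a row is its position.
R = [7, 2, 5, 9, 5, 1]

# Hash index on R.A: value -> positions of matching rows.
hash_index = {}
for pos, a in enumerate(R):
    hash_index.setdefault(a, []).append(pos)

# Ordered index on R.A: (value, position) pairs kept sorted, like B+ tree leaves.
ordered_index = sorted((a, pos) for pos, a in enumerate(R))


def probe_equal(value):
    """WHERE R.A = value: a single hash lookup."""
    return hash_index.get(value, [])


def probe_greater(value):
    """WHERE R.A > value: find the first larger key, then scan to the end."""
    start = bisect_right(ordered_index, (value, float("inf")))
    return [pos for _, pos in ordered_index[start:]]


print(probe_equal(5), probe_greater(5))
```

The hash index has no notion of order, so answering the range query with it would mean checking every bucket; the ordered index answers it with one search plus a sequential scan.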


The BIG picture….

• Chapters 2 & 3: Storage, records, blocks...
• Chapters 4 & 5: Access Mechanisms
  – Indexes
  – B trees
  – Hashing
  – Multi key
• Chapters 6 & 7: Query Processing

