
Database Systems ( 資料庫系統 )

Date post: 04-Jan-2016
Upload: marshall-green
Page 1: Database Systems ( 資料庫系統 )

Database Systems (資料庫系統)

November 28, 2005
Lecture #9

Announcement

• Next week's reading: Chapter 12.
• Pick up your midterm exams at the end of class.
• Pick up your assignments #1~3 outside the TA office (336/338).
• Assignment #4 and Practicum #2 are due in one week.
  – Significant amount of coding, so start now.


Interesting Talk

• Rachel Kern, “From Cell Phones To Monkeys: Research Projects in the Speech Interface Group at the M.I.T. Media Lab”, CSIE 102, Friday 2:20 ~ 3:30

Midterm Exam Score Distribution

[Chart: midterm exam score distribution; one data series, axis ticks from 0 to 40 in steps of 5.]

Ubicomp project of the week

• From Pervasive to Persuasive Computing
• Pervasive Computing (smart objects)
  – Designed to be aware of people's behaviors
    • Examples: smart dining table, smart chair, smart wardrobe, smart mirror, smart shoes, smart spoon, …
• Persuasive Computing
  – Designed to change people's behaviors


Baby Think It Over

Smart Device: Credit Card Barbie Doll (from Accenture)

• Barbie gets a wireless implant of a chip and sensors and becomes a decision-making object.
• When one Barbie meets another Barbie …
  – She detects the clothing the other Barbie is wearing.
  – If she does not have it … she can automatically send an online order through the wireless connection!
  – You can give her a credit card limit.
• Good that this is just a concept toy.
• It illustrates the concept of an autonomous purchasing object: car, home, refrigerator, …


Hash-Based Indexing

Chapter 11

Introduction

• Hash-based indexes are best for equality selections. They cannot support range searches.
  – Equality selections are useful for join operations.
• Static and dynamic hashing techniques; the trade-offs are similar to ISAM vs. B+ trees.
  – Static hashing technique
  – Two dynamic hashing techniques
    • Extendible Hashing
    • Linear Hashing

Static Hashing

• The # of primary pages is fixed; they are allocated sequentially and never de-allocated; overflow pages are used if needed.
• h(k) mod N = bucket to which the data entry with key k belongs (N = # of buckets).

[Figure: key → h → h(key) mod N selects one of the primary bucket pages 0 … N-1; each primary bucket page can have a chain of overflow pages.]

Static Hashing (Contd.)

• Buckets contain data entries.
• The hash function works on the search key field of record r. It must distribute values over the range 0 ... N-1.
  – h(key) = (a * key + b) usually works well.
  – a and b are constants; a lot is known about how to tune h.
• Cost for insertion/delete/search: 2/2/1 disk page I/Os when there are no overflow chains (insert and delete each read and then write the bucket page; search only reads it).
• Long overflow chains can develop and degrade performance.
  – Why poor performance? Overflow chains are scanned linearly.
  – Extendible and Linear Hashing: dynamic techniques to fix this problem.
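The following is a minimal in-memory sketch of static hashing with overflow chains, not an implementation from the lecture. The class name `StaticHashFile`, the page capacity, and the default constants `a` and `b` are assumptions chosen for illustration; pages are modeled as Python lists, and the number of pages touched stands in for disk I/Os.

```python
class StaticHashFile:
    """Static hashing: N fixed primary pages, each with a chain of overflow pages."""

    def __init__(self, n_buckets, page_capacity, a=31, b=7):
        self.n = n_buckets
        self.capacity = page_capacity
        self.a, self.b = a, b  # assumed constants for h(key) = a * key + b
        # Each bucket is a list of pages; page 0 is the primary page.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def bucket_of(self, key):
        """h(key) mod N."""
        return (self.a * key + self.b) % self.n

    def insert(self, key):
        pages = self.buckets[self.bucket_of(key)]
        if len(pages[-1]) == self.capacity:
            pages.append([])  # last page is full: chain a new overflow page
        pages[-1].append(key)

    def search(self, key):
        """Return (found, pages_read); the chain is scanned linearly."""
        pages = self.buckets[self.bucket_of(key)]
        for pages_read, page in enumerate(pages, start=1):
            if key in page:
                return True, pages_read
        return False, len(pages)


# With a = 1 and b = 0, every multiple of 4 lands in bucket 0 of a 4-bucket file.
hash_file = StaticHashFile(n_buckets=4, page_capacity=2, a=1, b=0)
for key in [0, 4, 8, 12, 16]:
    hash_file.insert(key)
print(hash_file.search(0))
print(hash_file.search(16))
```

Searching for a key at the end of the chain reads every page of the bucket, which is the degradation the last bullet describes.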

Extendible Hashing

• Simple solution (no overflow chain):
  – When a bucket (primary page) becomes full, re-organize the file by doubling the # of buckets. Cost concern?
  – High cost: rehashing all entries means reading and writing all pages, which is expensive!
• How to reduce the high cost?
  – Use a directory of pointers to buckets; double the # of buckets by doubling the directory, splitting just the bucket that overflowed!
  – The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split.
  – How to adjust the hash function? Before doubling the directory, h(r) → 0 .. N-1 buckets. After doubling the directory, h(r) → 0 .. 2N-1.

Example

• The directory is an array of size 4.
• To find the bucket for r, take the last `global depth` # of bits of h(r).
  – Example: if h(r) = 5, 5's binary is 101, so it is in the bucket pointed to by 01.
• Global depth: # of bits used for hashing directory entries.
• Local depth of a bucket: # of bits used for hashing a bucket.
• When can global depth be different from local depth?

[Figure: DIRECTORY with GLOBAL DEPTH 2 pointing to the DATA PAGES]
  00 → Bucket A (LOCAL DEPTH 2): 4* 12* 32* 16*
  01 → Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*
  10 → Bucket C (LOCAL DEPTH 2): 10*
  11 → Bucket D (LOCAL DEPTH 2): 15* 7* 19*
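The directory lookup described on this slide can be sketched in a few lines. The function name `directory_slot` is hypothetical, and the sketch assumes the hash value is already available as a non-negative integer.

```python
def directory_slot(hash_value, global_depth):
    """Return the directory index: the last `global_depth` bits of the hash value."""
    return hash_value & ((1 << global_depth) - 1)


# h(r) = 5 is 101 in binary; with global depth 2 the last two bits are 01.
slot = directory_slot(5, 2)
print(format(slot, "02b"))
```

Masking with `(1 << global_depth) - 1` keeps only the low-order bits, so the same hash value yields a longer index as the global depth grows.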

Insert 20 = 10100 (Causes Doubling)

[Figure, before the insert: GLOBAL DEPTH 2]
  00 → Bucket A (LOCAL DEPTH 2): 4* 12* 32* 16*
  01 → Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*
  10 → Bucket C (LOCAL DEPTH 2): 10*
  11 → Bucket D (LOCAL DEPTH 2): 15* 7* 19*

[Figure, after the insert: GLOBAL DEPTH 3]
  000 → Bucket A (LOCAL DEPTH 3): 32* 16*
  001, 101 → Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*
  010, 110 → Bucket C (LOCAL DEPTH 2): 10*
  011, 111 → Bucket D (LOCAL DEPTH 2): 15* 7* 19*
  100 → Bucket A2, `split image' of Bucket A (LOCAL DEPTH 3): 4* 12* 20*

Double the directory:
- Increment global depth
- Rehash bucket A
- Increment local depth (why track local depth?)

Insert 9 = 1001 (No Doubling)

[Figure, before the insert: GLOBAL DEPTH 3]
  000 → Bucket A (LOCAL DEPTH 3): 32* 16*
  001, 101 → Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*
  010, 110 → Bucket C (LOCAL DEPTH 2): 10*
  011, 111 → Bucket D (LOCAL DEPTH 2): 15* 7* 19*
  100 → Bucket A2 (LOCAL DEPTH 3): 4* 12* 20*

[Figure, after the insert: GLOBAL DEPTH 3]
  000 → Bucket A (LOCAL DEPTH 3): 32* 16*
  001 → Bucket B (LOCAL DEPTH 3): 1* 9*
  010, 110 → Bucket C (LOCAL DEPTH 2): 10*
  011, 111 → Bucket D (LOCAL DEPTH 2): 15* 7* 19*
  100 → Bucket A2 (LOCAL DEPTH 3): 4* 12* 20*
  101 → Bucket B2, split image of Bucket B (LOCAL DEPTH 3): 5* 21* 13*

Only split the bucket:
- Rehash bucket B
- Increment local depth

Points to Note

• Global depth of directory: max # of bits needed to tell which bucket an entry belongs to.
• Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
• When does a bucket split cause directory doubling?
  – Before the insert, the bucket is full and local depth = global depth.
• The directory is doubled by copying it over and `fixing' the pointer to the split image page.
  – You can do this only by using the least significant bits in the directory.
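The split and doubling rules above can be put together in a small in-memory sketch. This is an illustration under stated assumptions, not the lecture's code: the class names `Bucket` and `ExtendibleHash` are hypothetical, the hash function is assumed to be the identity (h(key) = key), buckets hold four entries as in the example slides, and duplicate keys beyond one bucket's capacity are not handled.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []


class ExtendibleHash:
    def __init__(self, global_depth=2, capacity=4):
        self.global_depth = global_depth
        self.capacity = capacity  # entries per bucket page
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _slot(self, key):
        # Last `global_depth` bits of the hash value (assumes h(key) = key).
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Full bucket with local depth = global depth: double by copying.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the overflowing bucket, using one more bit.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        old_entries = bucket.entries
        bucket.entries = [k for k in old_entries if not k & new_bit]
        image.entries = [k for k in old_entries if k & new_bit]
        # Fix the directory pointers that must now reach the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        self.insert(key)  # retry; skewed data may force another split


index = ExtendibleHash(global_depth=2, capacity=4)
for key in [4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19]:
    index.insert(key)
index.insert(20)  # bucket A is full and local depth = global depth: doubling
index.insert(9)   # bucket B is full but local depth < global depth: split only
print(index.global_depth, len(index.directory))
```

Tracking the local depth is what lets `insert` decide between the two cases: a split alone suffices whenever more than one directory entry still points to the full bucket.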

Directory Doubling

Why use the least significant bits in the directory? It allows for doubling via copying!

[Figure: a directory of depth 2 doubled to depth 3, shown twice. Least significant bits: entries 00, 01, 10, 11 grow to 000 … 111. Most significant bits: entries 00, 10, 01, 11 grow to 000 … 111. The split buckets are marked in both versions.]
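The difference between the two indexing choices can be made concrete with a short sketch. Both function names are hypothetical; a directory is modeled as a plain list of bucket labels, and the most-significant-bit variant assumes the index is the leading bits of a fixed-width hash value.

```python
def double_lsb(directory):
    """Indexed by least significant bits: new entry i + len(directory) mirrors entry i."""
    return directory + directory


def double_msb(directory):
    """Indexed by most significant bits: old entry i becomes entries 2i and 2i + 1."""
    return [bucket for bucket in directory for _ in range(2)]


old = ["A", "B", "C", "D"]
print(double_lsb(old))
print(double_msb(old))
```

With least significant bits the old directory is left in place and a copy is appended; with most significant bits every existing entry moves, so the directory has to be rewritten rather than copied.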

Comments on Extendible Hashing

• If the directory fits in memory, an equality search is answered with one disk access; otherwise two.
• Problem with extendible hashing:
  – If the distribution of hash values is skewed (concentrated on a few buckets), the directory can grow large.
  – Can you come up with one insertion that leads to multiple splits?
• Delete: if removal of a data entry makes a bucket empty, the bucket can be merged with its `split image'. If each directory element points to the same bucket as its split image, the directory can be halved.

Skewed data distribution (multiple splits)

• Assume each bucket holds one data entry.
• Insert 2 (binary 10): how many splits?
• Insert 16 (binary 10000): how many splits?

[Figure: directory with GLOBAL DEPTH 1 and entries 0 and 1; both buckets have LOCAL DEPTH 1; data entries shown: 0* and 8*.]
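One way to reason about questions like these is to ask how many low-order bits are needed before two colliding hash values fall into different buckets. The helper below is a sketch under the assumptions that h(key) = key and that the directory uses least significant bits; the name `depth_to_separate` is hypothetical.

```python
def depth_to_separate(key_a, key_b):
    """Smallest number of low-order bits in which two distinct hash values differ."""
    if key_a == key_b:
        raise ValueError("identical hash values can never be separated")
    differing = key_a ^ key_b
    lowest_differing_bit = differing & -differing
    return lowest_differing_bit.bit_length()


print(depth_to_separate(0, 2))
print(depth_to_separate(0, 16))
```

If two such entries meet in a full bucket whose local depth is d, that bucket keeps splitting until its local depth reaches this value, which takes `depth_to_separate(a, b) - d` splits and doubles the directory whenever the local depth would exceed the global depth.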

Delete 10*

[Figure, before the delete: GLOBAL DEPTH 2]
  00 → Bucket A (LOCAL DEPTH 2): 4* 12* 32* 16*
  01 → Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*
  10 → Bucket C (LOCAL DEPTH 2): 10*
  11 → Bucket D (LOCAL DEPTH 2): 15* 7* 19*

[Figure, after the delete: GLOBAL DEPTH 2]
  00, 10 → Bucket A (LOCAL DEPTH 1): 4* 12* 32* 16*
  01 → Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*
  11 → Bucket B2 (LOCAL DEPTH 2): 15* 7* 19*

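Delete 15*, 7*, 19*

[Figure, before the deletes: GLOBAL DEPTH 2]
  00, 10 → Bucket A (LOCAL DEPTH 1): 4* 12* 32* 16*
  01 → Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*
  11 → Bucket B2 (LOCAL DEPTH 2): 15* 7* 19*

[Figure, after the deletes, Bucket B2 merged into Bucket B: GLOBAL DEPTH 2]
  00, 10 → Bucket A (LOCAL DEPTH 1): 4* 12* 32* 16*
  01, 11 → Bucket B (LOCAL DEPTH 1): 1* 5* 21* 13*

[Figure, after halving the directory: GLOBAL DEPTH 1, two directory entries]
  Bucket A (LOCAL DEPTH 1): 4* 12* 32* 16*
  Bucket B (LOCAL DEPTH 1): 1* 5* 21* 13*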

Linear Hashing (LH)

• This is another dynamic hashing scheme, an alternative to Extendible Hashing.
  – LH fixes the problem of long overflow chains (in static hashing) without using a directory (as in extendible hashing).
• Basic idea: use a family of hash functions h0, h1, h2, ...
  – Each function's range is twice that of its predecessor.
  – Pages are split when overflows occur, but not necessarily the page with the overflow.
  – Splitting occurs in turn, in a round-robin fashion.
  – When all the pages at one level (the current hash function) have been split, a new level is applied.
  – Splitting occurs gradually.
  – Primary pages are allocated consecutively.

Levels of Linear Hashing

• Initial stage.
  – The initial level distributes entries into N0 buckets.
  – Call the hash function that performs this h0.
• Splitting buckets.
  – If a bucket overflows, its primary page is chained to an overflow page (same as in static hashing).
  – Also, when a bucket overflows, some bucket is split.
    • The first bucket to be split is the first bucket in the file (not necessarily the bucket that overflows).
    • The next bucket to be split is the second bucket in the file … and so on until the Nth has been split.
    • When buckets are split, their entries (including those in overflow pages) are distributed using h1.
  – To access split buckets, the next-level hash function (h1) is applied.
  – h1 maps entries to 2N0 (or N1) buckets.

Levels of Linear Hashing (Cont.)

• Level progression:
  – Once all Ni buckets of the current level (i) are split, the hash function hi is replaced by hi+1.
  – The splitting process starts again at the first bucket, and hi+2 is applied to find entries in split buckets.
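The family of hash functions and the rule for choosing between two adjacent levels can be sketched as follows. The slides do not give a formula for hi, so the common choice hi(key) = h(key) mod (2^i · N0) with h(key) = key is an assumption here, and the names `h_level` and `bucket_address` are hypothetical.

```python
def h_level(i, key, n0):
    """h_i: maps a key onto 2**i * n0 buckets (assumes the base hash is the key itself)."""
    return key % ((2 ** i) * n0)


def bucket_address(key, level, next_to_split, n0):
    """Bucket that holds `key` when buckets 0 .. next_to_split - 1 are already split."""
    address = h_level(level, key, n0)
    if address < next_to_split:
        # That bucket has been split in this round: use the next-level function.
        address = h_level(level + 1, key, n0)
    return address


# Level 0 with N0 = 4 and the first bucket already split (next = 1).
print(bucket_address(64, level=0, next_to_split=1, n0=4))
print(bucket_address(36, level=0, next_to_split=1, n0=4))
print(bucket_address(9, level=0, next_to_split=1, n0=4))
```

Because each function's range is twice that of its predecessor, the next-level function sends an entry of a split bucket either to the original bucket or to its split image and nowhere else.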

Linear Hashing Example

• Initially, the index level equals 0 and N0 equals 4 (three entries fit on a page).
• h0 maps index entries to one of four buckets.
• h0 is used and no buckets have been split.
• Now consider what happens when 9 (1001) is inserted (it will not fit in the second bucket).
• Note that next indicates which bucket is to be split next (round robin).

[Figure: four buckets addressed by h0]
  00 (← next): 64 36
  01: 1 17 5
  10: 6
  11: 31 15

Linear Hashing Example 2

• An overflow page is chained to the primary page to contain the inserted value.
• Note that the split page is not necessarily the page that overflowed (round robin).
• If h0 maps a value to a bucket from zero to next – 1 (just the first page in this case), h1 must be used to insert the new entry.
• Note how the new page falls naturally into the sequence as the fifth page.

[Figure: five buckets after inserting 9, with the hash function that applies to each]
  1st page (h1): 64
  2nd page (h0, ← next): 1 17 5, overflow page: 9
  3rd page (h0): 6
  4th page (h0): 31 15
  5th page (h1): 36

• The page indicated by next is split (the first one).
• Next is incremented.
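The example on these two slides can be replayed with a small in-memory sketch of linear hashing. It is an illustration under stated assumptions rather than the lecture's code: the class name `LinearHash` is hypothetical, hi(key) = key mod (2^i · N0) is assumed, a bucket is one Python list whose entries beyond `capacity` stand for its overflow pages, and a split is triggered by every insert that lands on an overflow page.

```python
class LinearHash:
    def __init__(self, n0=4, capacity=3):
        self.n0 = n0
        self.capacity = capacity  # entries per page
        self.level = 0
        self.next = 0             # next bucket to split (round robin)
        self.buckets = [[] for _ in range(n0)]

    def _h(self, i, key):
        return key % ((2 ** i) * self.n0)  # assumes the base hash is the key itself

    def _address(self, key):
        address = self._h(self.level, key)
        if address < self.next:            # bucket already split in this round
            address = self._h(self.level + 1, key)
        return address

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)
        if len(bucket) > self.capacity:    # the entry went to an overflow page
            self._split()

    def _split(self):
        old_entries = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])            # split image, allocated consecutively
        for key in old_entries:
            self.buckets[self._h(self.level + 1, key)].append(key)
        self.next += 1
        if self.next == (2 ** self.level) * self.n0:
            self.level += 1                # every bucket of this level has split
            self.next = 0


index = LinearHash(n0=4, capacity=3)
for key in [64, 36, 1, 17, 5, 6, 31, 15]:
    index.insert(key)
index.insert(9)  # overflows the second bucket; the first bucket is split
print(index.buckets, index.next)
```

After the insert of 9, the first bucket holds 64, the new fifth bucket holds 36, and the second bucket still carries its overflow entry because its turn to split has not come yet.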

Linear Hashing

• Assume inserts of 8, 7, 18, 14, 11¹, 32, 16², 10, 13, 23³ (the superscripts 1, 2, 3 correspond to the labels next1, next2, next3 in the figure).
• After the 2nd split the base level is 1 (N1 = 8); use h1.
• Subsequent splits will use h2 for inserts between the first bucket and next-1.

[Figure: bucket contents as these inserts are applied, showing for each bucket whether h0 or h1 applies and the positions next1, next2, next3. Entries shown at the end: 64 8 32 16; 1 17 9; 10 18; 11; 36; 5 13; 6 14; 31 15 7 23.]

LH Described as a Variant of EH

• The two schemes are similar:
  – Begin with an EH index whose directory has N elements.
  – Use overflow pages, split buckets round-robin.
  – The first split is at bucket 0. (Imagine the directory being doubled at this point.) But elements <1,N+1>, <2,N+2>, ... are the same. So we need only create directory element N, which now differs from element 0.
    • When bucket 1 splits, create directory element N+1, etc.
• So, the directory can double gradually. Also, primary bucket pages are created in order. If they are allocated in sequence too (so that finding the i'th is easy), we actually don't need a directory! Voila, LH.

