1
Database Systems (資料庫系統)
November 28, 2005, Lecture #9
2
Announcement
• Next week reading: Chapter 12
• Pick up your midterm exams at the end of the class.
• Pick up your assignments #1–3 outside the TA office (336/338).
• Assignment #4 & Practicum #2 are due in one week.
  – Significant amount of coding, so start now.
3
Interesting Talk
• Rachel Kern, “From Cell Phones To Monkeys: Research Projects in the Speech Interface Group at the M.I.T. Media Lab”, CSIE 102, Friday 2:20 ~ 3:30
4
Midterm Exam Score Distribution
5
Ubicomp project of the week
• From Pervasive to Persuasive Computing
• Pervasive computing (smart objects)
  – Designed to be aware of people’s behaviors
  – Examples: smart dining table, smart chair, smart wardrobe, smart mirror, smart shoes, smart spoon, …
• Persuasive computing
  – Designed to change people’s behaviors
6
Baby Think It Over
7
Smart Device: Credit Card Barbie Doll (from Accenture)
• Barbie gets a wireless implant of chips and sensors and becomes a decision-making object.
• When one Barbie meets another Barbie …
  – She detects the clothing the other Barbie is wearing.
  – If she does not have it, she can automatically send an online order through the wireless connection!
  – You can give her a credit card limit.
• Good that this is just a concept toy.
• It illustrates the concept of an autonomous purchasing object: car, home, refrigerator, …
8
Hash-Based Indexing
Chapter 11
9
Introduction
• Hash-based indexes are best for equality selections; they cannot support range searches.
  – Equality selections are useful for join operations.
• Static and dynamic hashing techniques; the trade-offs are similar to ISAM vs. B+ trees.
  – Static hashing technique
  – Two dynamic hashing techniques:
    • Extendible Hashing
    • Linear Hashing
10
Static Hashing
• # primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed.
• h(k) mod N = bucket to which data entry with key k belongs. (N = # of buckets)
[Figure: key → h → h(key) mod N selects one of the N primary bucket pages (0 … N−1); each primary page may be followed by a chain of overflow pages.]
11
Static Hashing (Contd.)
• Buckets contain data entries.
• The hash function works on the search key field of record r. It must distribute values over the range 0 … N−1.
  – h(key) = (a * key + b) usually works well (the bucket is h(key) mod N).
  – a and b are constants; a lot is known about how to tune h.
• Cost for insertion/deletion/search: 2/2/1 disk page I/Os (with no overflow chains).
• Long overflow chains can develop and degrade performance.
  – Why poor performance? Searches must scan through overflow chains linearly.
  – Extendible and Linear Hashing are dynamic techniques that fix this problem.
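As a concrete illustration of the scheme above, here is a minimal Python sketch of a static hash index (the class name `StaticHashIndex`, the bucket capacity, and the constants a and b are illustrative choices, not from the slides):

```python
class StaticHashIndex:
    """Static hashing: N fixed primary buckets, overflow lists when full."""

    def __init__(self, n_buckets, capacity=4, a=31, b=7):
        self.n = n_buckets
        self.capacity = capacity          # entries per primary page
        self.a, self.b = a, b             # tunable constants for h
        self.primary = [[] for _ in range(n_buckets)]
        self.overflow = [[] for _ in range(n_buckets)]  # one chain per bucket

    def _h(self, key):
        return (self.a * key + self.b) % self.n

    def insert(self, key):
        bucket = self._h(key)
        page = self.primary[bucket]
        if len(page) < self.capacity:
            page.append(key)              # fits on the primary page
        else:
            self.overflow[bucket].append(key)  # spills into the overflow chain

    def search(self, key):
        bucket = self._h(key)
        # Equality search: check the primary page, then scan the chain.
        return key in self.primary[bucket] or key in self.overflow[bucket]
```

With a skewed key set the `overflow` lists grow without bound, which is exactly the long-chain problem the dynamic schemes below avoid.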
12
Extendible Hashing
• Simple solution (no overflow chains):
  – When a bucket (primary page) becomes full, re-organize the file by doubling the # of buckets. Cost concern?
  – High cost: rehashing all entries means reading and writing all pages - expensive!
• How to reduce this high cost?
  – Use a directory of pointers to buckets; double the # of buckets by doubling the directory, splitting just the bucket that overflowed!
  – The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split.
  – How to adjust the hash function? Before doubling the directory, h(r) maps to buckets 0 … N−1. After doubling, h(r) maps to 0 … 2N−1.
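The directory lookup can be sketched in a few lines (a toy sketch: `eh_bucket` is a hypothetical helper, integer keys are used directly as their own hash values, and the bucket contents mirror the four-bucket example that follows):

```python
def eh_bucket(h_value, directory, global_depth):
    # The last `global_depth` bits of the hashed key index the directory.
    return directory[h_value & ((1 << global_depth) - 1)]

# Directory of size 4 (global depth 2):
bucket_a = [4, 12, 32, 16]   # entry 00
bucket_b = [1, 5, 21, 13]    # entry 01
bucket_c = [10]              # entry 10
bucket_d = [15, 7, 19]       # entry 11
directory = [bucket_a, bucket_b, bucket_c, bucket_d]

# h(r) = 5, binary 101: the last 2 bits are 01, so r lives in bucket B.
assert eh_bucket(5, directory, 2) is bucket_b
```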
13
Example
• Directory is an array of size 4.
• To find the bucket for r, take the last (global depth) bits of h(r).
  – Example: if h(r) = 5, 5’s binary is 101, so it is in the bucket pointed to by directory entry 01.
• Global depth: # of bits used for hashing directory entries.
• Local depth of a bucket: # of bits used for hashing the entries in that bucket.
• When can global depth be different from local depth?
GLOBAL DEPTH = 2
DIRECTORY → DATA PAGES (local depth 2 for every bucket):
  00 → Bucket A: 4* 12* 32* 16*
  01 → Bucket B: 1* 5* 21* 13*
  10 → Bucket C: 10*
  11 → Bucket D: 15* 7* 19*
14
Insert 20 = 10100 (Causes Doubling)
Before (global depth 2, all local depths 2):
  00 → Bucket A: 32* 16* 4* 12*   (full - 20* hashes here)
  01 → Bucket B: 1* 5* 21* 13*
  10 → Bucket C: 10*
  11 → Bucket D: 15* 7* 19*

After (directory doubled, global depth 3):
  000 → Bucket A:  32* 16*          (local depth 3)
  001 → Bucket B:  1* 5* 21* 13*    (local depth 2)
  010 → Bucket C:  10*              (local depth 2)
  011 → Bucket D:  15* 7* 19*       (local depth 2)
  100 → Bucket A2: 4* 12* 20*       (local depth 3, `split image' of Bucket A)
  (entries 101, 110, 111 point to Buckets B, C, D, respectively)

To double the directory:
- Increment the global depth.
- Rehash bucket A.
- Increment the local depth. Why track local depth?
15
Insert 9 = 1001 (No Doubling)
Before (global depth 3, continuing from the previous slide):
  000 → Bucket A:  32* 16*
  001 → Bucket B:  1* 5* 21* 13*   (full - 9* hashes here)
  010 → Bucket C:  10*
  011 → Bucket D:  15* 7* 19*
  100 → Bucket A2: 4* 12* 20*

After (no directory doubling; local depths of B and B2 become 3):
  001 → Bucket B:  1* 9*
  101 → Bucket B2: 5* 21* 13*      (split image of Bucket B)
  (all other buckets are unchanged)

Only split the bucket:
- Rehash bucket B.
- Increment the local depth.
16
Points to Note
• Global depth of directory: max # of bits needed to tell which bucket an entry belongs to.
• Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
• When does a bucket split cause directory doubling?
  – Before the insert, the bucket is full and its local depth = global depth.
• The directory is doubled by copying it over and `fixing’ the pointer to the split image page.
  – This copying trick works only if you use the least significant bits to index the directory.
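The copy-based doubling can be sketched as follows (a hypothetical helper; strings stand in for bucket-page pointers):

```python
def double_directory(directory):
    # With least-significant-bit indexing, entries i and i + old_size
    # must point to the same bucket immediately after doubling, so the
    # doubled directory is just two copies of the old one. The caller
    # then repoints the single entry for the split image page.
    return directory + directory

old = ["A", "B", "C", "D"]       # global depth 2
new = double_directory(old)      # global depth 3
assert new == ["A", "B", "C", "D", "A", "B", "C", "D"]
```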
17
Directory Doubling
Why use the least significant bits to index the directory?
It allows doubling via copying!

• Least significant bits: doubling from 2 to 3 bits turns directory entries 00, 01, 10, 11 into 000, 001, 010, 011, 100, 101, 110, 111. The new directory is simply two copies of the old one (entries i and i+4 point to the same bucket), so only the entry for the newly split bucket must be repointed.
• Most significant bits: old entry 0 would become entries 000 and 001, old entry 1 would become 010 and 011, and so on - the old and new entries interleave, so the doubled directory cannot be produced by a simple copy.
18
Comments on Extendible Hashing
• If the directory fits in memory, an equality search is answered with one disk access; else two.
• Problems with extendible hashing:
  – If the distribution of hash values is skewed (concentrated on a few buckets), the directory can grow large.
  – Can you come up with one insertion that leads to multiple splits?
• Delete: if removing a data entry makes a bucket empty, the bucket can be merged with its `split image’. If each directory element points to the same bucket as its split image, the directory can be halved.
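The halving condition in the last bullet can be checked mechanically (a sketch; `can_halve` is a hypothetical helper, and Python lists stand in for bucket pages, compared by identity):

```python
def can_halve(directory):
    # With least-significant-bit indexing, the split image of directory
    # entry i is the entry differing in the top directory bit: i + half.
    half = len(directory) // 2
    return half > 0 and all(
        directory[i] is directory[i + half] for i in range(half)
    )

a, b = ["A"], ["B"]
assert can_halve([a, b, a, b])        # every entry matches its split image
assert not can_halve([a, b, a, a])    # entry 01 differs from entry 11
```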
19
Skewed data distribution (multiple splits)
• Assume each bucket holds two data entries.
• Insert 2 (binary 10) - how many splits?
• Insert 16 (binary 10000) - how many splits?
GLOBAL DEPTH = 1:
  0 → Bucket A: 0* 8*   (local depth 1)
  1 → Bucket B: (empty) (local depth 1)
20
Delete 10*
Before (global depth 2, all local depths 2):
  00 → Bucket A: 32* 16* 4* 12*
  01 → Bucket B: 1* 5* 21* 13*
  10 → Bucket C: 10*
  11 → Bucket D: 15* 7* 19*

After deleting 10*, Bucket C becomes empty and is merged with its split image, Bucket A:
  00 → Bucket A: 32* 16* 4* 12*   (local depth 1)
  01 → Bucket B: 1* 5* 21* 13*    (local depth 2)
  10 → Bucket A (same page as entry 00)
  11 → Bucket D: 15* 7* 19*       (local depth 2)
21
Delete 15*, 7*, 19*
Before (global depth 2):
  00 → Bucket A: 32* 16* 4* 12*   (local depth 1)
  01 → Bucket B: 1* 5* 21* 13*    (local depth 2)
  10 → Bucket A
  11 → Bucket D: 15* 7* 19*       (local depth 2)

After deleting 15*, 7*, 19*, Bucket D becomes empty and is merged with its split image, Bucket B (local depth drops to 1):
  00 → Bucket A    01 → Bucket B    10 → Bucket A    11 → Bucket B

Now every directory element points to the same bucket as its split image, so the directory is halved (global depth 1):
  0 → Bucket A: 32* 16* 4* 12*
  1 → Bucket B: 1* 5* 21* 13*
22
Linear Hashing (LH)
• This is another dynamic hashing scheme, an alternative to Extendible Hashing.
  – LH fixes the problem of long overflow chains (in static hashing) without using a directory (as in extendible hashing).
• Basic idea: use a family of hash functions h0, h1, h2, …
  – Each function’s range is twice that of its predecessor.
  – Pages are split when overflows occur - but not necessarily the page that overflowed.
  – Splitting occurs in turn, in a round-robin fashion.
  – When all the pages at one level (the current hash function) have been split, a new level is applied.
  – Splitting occurs gradually.
  – Primary pages are allocated consecutively.
23
Levels of Linear Hashing
• Initial stage.
  – The initial level distributes entries into N0 buckets.
  – Call the hash function that performs this h0.
• Splitting buckets.
  – If a bucket overflows, its primary page is chained to an overflow page (same as in static hashing).
  – Also, when a bucket overflows, some bucket is split.
    • The first bucket to be split is the first bucket in the file (not necessarily the bucket that overflowed).
    • The next bucket to be split is the second bucket in the file … and so on, until the N0th has been split.
    • When buckets are split, their entries (including those in overflow pages) are distributed using h1.
  – To access split buckets, the next-level hash function (h1) is applied.
  – h1 maps entries to 2N0 (or N1) buckets.
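The bucket-selection rule above can be sketched as a small helper (an assumption-laden sketch: `lh_bucket` is hypothetical, and the family of hash functions is taken to be h_i(key) = key mod (N0 * 2^i)):

```python
def lh_bucket(key, level, next_split, n0):
    # h_level: key mod (n0 * 2**level).
    b = key % (n0 * 2 ** level)
    if b < next_split:
        # Buckets before `next` have already been split this round,
        # so their entries are located with the next-level function.
        b = key % (n0 * 2 ** (level + 1))
    return b

# After the first split (level 0, N0 = 4, next = 1):
assert lh_bucket(64, 0, 1, 4) == 0   # 64 mod 4 = 0 < next, so 64 mod 8 = 0
assert lh_bucket(36, 0, 1, 4) == 4   # 36 mod 4 = 0 < next, so 36 mod 8 = 4
assert lh_bucket(31, 0, 1, 4) == 3   # 31 mod 4 = 3, bucket not yet split
```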
24
Levels of Linear Hashing (Cnt)
• Level progression:
  – Once all Ni buckets of the current level (i) are split, the hash function hi is replaced by hi+1.
  – The splitting process starts again at the first bucket, and hi+2 is applied to find entries in split buckets.
25
Linear Hashing Example
• Initially, the level equals 0 and N0 equals 4 (three entries fit on a page).
• h0 maps index entries to one of four buckets.
• h0 is used and no buckets have been split.
• Now consider what happens when 9 (binary 1001) is inserted (it will not fit in the second bucket).
• Note that next indicates which bucket is to split next (round robin).

        h0
next →  00: 64 36
        01: 1 17 5
        10: 6
        11: 31 15
26
Linear Hashing Example 2
• An overflow page is chained to the primary page to contain the inserted value 9.
• Note that the split page is not necessarily the overflowing page - round robin.
• If h0 maps a value to a bucket from zero to next − 1 (just the first page in this case), h1 must be used to insert the new entry.
• Note how the new page falls naturally into the sequence as the fifth page.
• The page indicated by next is split (the first one).
• next is incremented.

        h1  000: 64
next →  h0   01: 1 17 5  (overflow page: 9)
        h0   10: 6
        h0   11: 31 15
        h1  100: 36
27
Linear Hashing
• Assume inserts of 8, 7, 18, 14, 11, 32, 16, 10, 13, 23 (the inserts of 11, 16, and 23 trigger splits 1, 2, and 3 of this round).
• After the last split of the round (next reaches N0), the level becomes 1 (N1 = 8); use h1.
• Subsequent splits will use h2 to locate entries in buckets from the first bucket up to next − 1.

State after all ten inserts (level 1, next back at the first bucket):

h1  000: 64 8 32  (overflow page: 16)
h1  001: 1 17 9
h1  010: 18 10
h1  011: 11
h1  100: 36
h1  101: 5 13
h1  110: 6 14
h1  111: 31 15 7  (overflow page: 23)
28
LH Described as a Variant of EH
• The two schemes are similar:
  – Begin with an EH index where the directory has N elements.
  – Use overflow pages; split buckets round-robin.
  – The first split is at bucket 0. (Imagine the directory being doubled at this point.) But elements <1, N+1>, <2, N+2>, … are the same, so we need only create directory element N, which now differs from element 0.
• When bucket 1 splits, create directory element N+1, etc.
• So the directory can double gradually. Also, primary bucket pages are created in order. If they are allocated in sequence too (so that finding the i’th is easy), we actually don’t need a directory! Voilà, LH.