
Temple University – CIS Dept. CIS331– Principles of Database Systems

Date post: 16-Feb-2016
Page 1: Temple University – CIS Dept. CIS331– Principles of Database Systems

Temple University – CIS Dept. CIS331 – Principles of Database Systems

V. Megalooikonomou

Indexing and Hashing II

(based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)

Page 2: Temple University – CIS Dept. CIS331– Principles of Database Systems

General Overview - rel. model
  Relational model - SQL
    Formal & commercial query languages
  Functional Dependencies
  Normalization
  Physical Design
  Indexing

Page 3: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing - overview
  ISAM and B-trees
  Hashing
  Hashing vs B-trees
  Indices in SQL
  Advanced topics:
    dynamic hashing
    multi-attribute indexing

Page 4: Temple University – CIS Dept. CIS331– Principles of Database Systems

(Static) Hashing

Problem: “find EMP record with ssn=123”

Q: What if disk space were free, and time were at a premium?

Page 5: Temple University – CIS Dept. CIS331– Principles of Database Systems

Hashing

A: Brilliant idea: key-to-address transformation:

[Figure: disk pages numbered #0 through #999,999,999; the record "123; Smith; Main str" is stored on page #123]

Page 6: Temple University – CIS Dept. CIS331– Principles of Database Systems

Hashing

Since space is NOT free:
  use M, instead of 999,999,999 slots
  hash function: h(key) = slot-id

[Figure: pages #0 ... #123 ... #999,999,999, with the record "123; Smith; Main str"]

Page 7: Temple University – CIS Dept. CIS331– Principles of Database Systems

Hashing

Typically: each hash bucket is a page, holding many records:

[Figure: buckets #0 ... #h(123) ... M; the record "123; Smith; Main str" sits in bucket #h(123)]

Page 8: Temple University – CIS Dept. CIS331– Principles of Database Systems

Hashing

Notice: could have clustering, or non-clustering versions:

[Figure: buckets #0 ... #h(123) ... M; the record "123; Smith; Main str." is stored inside bucket #h(123)]

Page 9: Temple University – CIS Dept. CIS331– Principles of Database Systems

Hashing

Notice: could have clustering, or non-clustering versions:

[Figure: buckets #0 ... #h(123) ... M; bucket #h(123) holds only the key 123, which refers to a record in the EMP file]

EMP file:
  ...
  123; Smith; Main str.
  ...
  234; Johnson; Forbes ave
  345; Tompson; Fifth ave
  ...

Page 10: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing - overview
  ISAM and B-trees
  hashing
    hashing functions
    size of hash table
    collision resolution
  Hashing vs B-trees
  Indices in SQL
  Advanced topics:

Page 11: Temple University – CIS Dept. CIS331– Principles of Database Systems

Design decisions
  1) formula h() for hashing function
  2) size of hash table M
  3) collision resolution method

Page 12: Temple University – CIS Dept. CIS331– Principles of Database Systems

Design decisions - functions

Goal: uniform spread of keys over hash buckets

Popular choices:
  Division hashing
  Multiplication hashing

Page 13: Temple University – CIS Dept. CIS331– Principles of Database Systems

Division hashing

h(x) = (a*x+b) mod M

e.g., h(ssn) = (ssn) mod 1,000 gives the last three digits of ssn

M: size of hash table - choose a prime number, defensively (why?)
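A minimal Python sketch of the division-hashing formula follows. The function name `division_hash`, the sample ssn, and the prime 997 are illustrative assumptions, not part of the slides.

```python
def division_hash(x: int, m: int, a: int = 1, b: int = 0) -> int:
    """Division hashing: h(x) = (a*x + b) mod M."""
    return (a * x + b) % m


ssn = 123456789  # made-up key for illustration

# With M = 1,000 the slot is simply the last three digits of the key.
last_three = division_hash(ssn, 1000)

# With a prime M (997 is an arbitrary prime near 1,000) every digit of the
# key influences the slot, not only the trailing ones.
prime_slot = division_hash(ssn, 997)

print(last_three, prime_slot)
```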

Page 14: Temple University – CIS Dept. CIS331– Principles of Database Systems

Division hashing

e.g., M=2; hash on driver-license number (dln), where the last digit is ‘gender’ (0/1 = M/F), in an army unit with predominantly male soldiers

Result: almost every record lands in bucket 0, while bucket 1 stays nearly empty.

Thus: avoid cases where M and keys have common divisors -- prime M guards against that!
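The driver-license example can be checked numerically. The sketch below uses made-up data (950 keys ending in 0 and 50 keys ending in 1) and a hypothetical helper `bucket_counts`; the prime 7 is an arbitrary choice for comparison.

```python
from collections import Counter

# Hypothetical data: the last digit encodes gender (0 = male, 1 = female),
# and 950 of the 1,000 soldiers are male.
dlns = [10 * i + 0 for i in range(950)] + [10 * i + 1 for i in range(50)]


def bucket_counts(keys, m):
    """Count how many keys fall into each bucket under h(x) = x mod m."""
    return Counter(k % m for k in keys)


even_table = bucket_counts(dlns, 2)   # M = 2: only the last digit's parity matters
prime_table = bucket_counts(dlns, 7)  # M = 7: a small prime, for contrast

print(sorted(even_table.items()))
print(sorted(prime_table.items()))
```

With M = 2 the bucket sizes mirror the gender imbalance exactly; with the prime M the same keys spread almost evenly.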

Page 15: Temple University – CIS Dept. CIS331– Principles of Database Systems

Multiplication hashing

h(x) = floor( M * fractional-part-of( x * φ ) )

φ: golden ratio conjugate ( 0.618... = ( sqrt(5)-1)/2 ), i.e., the fractional part of the golden ratio 1.618...
In general, we need an irrational number
Advantage: M need not be a prime number
But φ must be irrational
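A short Python sketch of multiplication hashing is given below. The names `PHI` and `multiplication_hash` are assumptions, and floating-point arithmetic only approximates the irrational constant, so this is an illustration rather than a production hash.

```python
import math

# 0.618... = (sqrt(5) - 1) / 2, approximated in double precision
PHI = (math.sqrt(5) - 1) / 2


def multiplication_hash(x: int, m: int) -> int:
    """h(x) = floor(M * frac(x * PHI)); M does not have to be prime."""
    frac = (x * PHI) % 1.0          # fractional part of x * PHI
    return min(m - 1, int(frac * m))  # clamp guards against float rounding


# M = 1,000 is deliberately not prime here.
slots = [multiplication_hash(k, 1000) for k in range(1, 11)]
print(slots)
```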

Page 16: Temple University – CIS Dept. CIS331– Principles of Database Systems

Other hashing functions

quadratic hashing (bad)
...
conclusion: use division hashing

Page 17: Temple University – CIS Dept. CIS331– Principles of Database Systems

Design decisions
  1) formula h() for hashing function
  2) size of hash table M
  3) collision resolution method

Page 18: Temple University – CIS Dept. CIS331– Principles of Database Systems

Size of hash table

e.g., 50,000 employees, 10 employee-records / page
Q: M=?? pages/buckets/slots

Page 19: Temple University – CIS Dept. CIS331– Principles of Database Systems

Size of hash table

e.g., 50,000 employees, 10 employees/page
Q: M=?? pages/buckets/slots
A: utilization ~ 90% and M: prime number

E.g., in our case: M = closest prime to 50,000/10 / 0.9 = 5,555.6, i.e., M = 5,557
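The sizing rule can be worked through in a few lines of Python. The helper names (`is_prime`, `closest_prime`, `table_size`) are hypothetical, and the tie-breaking choice (prefer the smaller prime when two are equally close) is an assumption the slides do not specify.

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for table sizes of this magnitude."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def closest_prime(target: float) -> int:
    """Search outward from the rounded target; ties go to the smaller prime."""
    center = round(target)
    offset = 0
    while True:
        for candidate in (center - offset, center + offset):
            if is_prime(candidate):
                return candidate
        offset += 1


def table_size(num_records: int, records_per_page: int, utilization: float = 0.9) -> int:
    """M = closest prime to (records / records-per-page) / utilization."""
    return closest_prime(num_records / records_per_page / utilization)


m = table_size(50_000, 10)
print(m)  # 5557
```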

Page 20: Temple University – CIS Dept. CIS331– Principles of Database Systems

Design decisions
  1) formula h() for hashing function
  2) size of hash table M
  3) collision resolution method

Page 21: Temple University – CIS Dept. CIS331– Principles of Database Systems

Collision resolution

Q: what is a ‘collision’?
A: ??

Page 22: Temple University – CIS Dept. CIS331– Principles of Database Systems

Collision resolution

[Figure: buckets #0 ... #h(123) ... M; the record "123; Smith; Main str." hashes to bucket #h(123)]

Page 23: Temple University – CIS Dept. CIS331– Principles of Database Systems

Collision resolution

Q: what is a ‘collision’?
A: ??
Q: why worry about collisions/overflows? (recall that buckets are ~90% full)
A: e.g. bank account balances between $0 and $10,000 and between $90,000 and $100,000 (skewed keys can overload some buckets even when average utilization is ~90%)

Page 24: Temple University – CIS Dept. CIS331– Principles of Database Systems

Collision resolution

open addressing
  linear probing (i.e., put to next slot/bucket)
  re-hashing
separate chaining (i.e., put links to overflow pages)
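As an illustration of open addressing with linear probing, here is a small Python sketch. It simplifies the slides' setting by storing one record per slot instead of a page of records, and the class name `LinearProbingTable` is an assumption. Deletion is left out on purpose: in open addressing it needs extra bookkeeping such as tombstones.

```python
class LinearProbingTable:
    """Open addressing: on a collision, try the next slot (wrapping around)."""

    def __init__(self, m: int):
        self.m = m
        self.slots = [None] * m  # each slot holds one (key, record) pair

    def insert(self, key: int, record: str) -> int:
        home = key % self.m
        for step in range(self.m):
            i = (home + step) % self.m
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, record)
                return i  # slot actually used
        raise RuntimeError("hash table is full")

    def find(self, key: int):
        home = key % self.m
        for step in range(self.m):
            i = (home + step) % self.m
            if self.slots[i] is None:
                return None  # an empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


table = LinearProbingTable(7)
print(table.insert(123, "Smith"))    # home slot 123 mod 7 = 4
print(table.insert(130, "Johnson"))  # also hashes to 4, so it moves to slot 5
```

Unlike separate chaining, this table can become completely full, which is why `insert` has to raise once every slot is taken.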

Page 25: Temple University – CIS Dept. CIS331– Principles of Database Systems

Collision resolution

linear probing:

[Figure: buckets #0 ... #h(123) ... M; the record "123; Smith; Main str." arrives at bucket #h(123)]

Page 26: Temple University – CIS Dept. CIS331– Principles of Database Systems

Collision resolution

re-hashing

[Figure: buckets #0 ... #h(123) ... M; the record "123; Smith; Main str." is placed using hash functions h1() and h2()]

Page 27: Temple University – CIS Dept. CIS331– Principles of Database Systems

Collision resolution

separate chaining

[Figure: the record "123; Smith; Main str."]

Page 28: Temple University – CIS Dept. CIS331– Principles of Database Systems

Design decisions - conclusions

function: division hashing h(x) = ( a*x+b ) mod M
size M: ~90% util.; prime number.
collision resolution: separate chaining
  easier to implement (deletions!);
  no danger of becoming full
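The sketch below illustrates separate chaining with overflow pages, including the simple deletion the conclusion alludes to. The class name `ChainedHashTable`, the tiny page capacity, and the in-memory lists standing in for disk pages are all assumptions; duplicate keys are not handled.

```python
class ChainedHashTable:
    """Each bucket is a chain of pages; a full page links to an overflow page."""

    def __init__(self, m: int, page_capacity: int = 2):
        self.m = m
        self.page_capacity = page_capacity
        # buckets[i] is a list of pages; each page is a list of (key, record).
        self.buckets = [[[]] for _ in range(m)]

    def insert(self, key: int, record: str) -> None:
        chain = self.buckets[key % self.m]
        for page in chain:
            if len(page) < self.page_capacity:
                page.append((key, record))
                return
        chain.append([(key, record)])  # link a new overflow page

    def find(self, key: int):
        for page in self.buckets[key % self.m]:
            for k, record in page:
                if k == key:
                    return record
        return None

    def delete(self, key: int) -> bool:
        # Deletion just removes the entry; no probe sequence can be broken.
        for page in self.buckets[key % self.m]:
            for i, (k, _) in enumerate(page):
                if k == key:
                    del page[i]
                    return True
        return False


table = ChainedHashTable(m=5)
for key, name in [(3, "Smith"), (8, "Johnson"), (13, "Tompson")]:
    table.insert(key, name)       # all three keys hash to bucket 3
print(len(table.buckets[3]))      # number of pages in that chain
```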

Page 29: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing - overview
  ISAM and B-trees
  hashing
  Hashing vs B-trees
  Indices in SQL
  Advanced topics:
    dynamic hashing
    multi-attribute indexing

Page 30: Temple University – CIS Dept. CIS331– Principles of Database Systems

Hashing vs B-trees:

Hashing offers speed! (O(1) avg. search time)

... but B-trees offer:

Page 31: Temple University – CIS Dept. CIS331– Principles of Database Systems

Hashing vs B-trees:

... but B-trees offer:
  key ordering:
    range queries
    proximity queries
    sequential scan
  O(log(N)) guarantees for search, ins./del.
  graceful growing/shrinking
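The difference shows up as soon as a query asks for a range of keys. In the sketch below a Python dict stands in for a hash index and a sorted key list with `bisect` stands in for the ordered leaf level of a B-tree; neither is a real disk-based structure, and the sample records are illustrative.

```python
import bisect

employees = {345: "Tompson", 123: "Smith", 234: "Johnson"}

# Hash-style access: one probe answers an exact-match query.
exact = employees.get(123)

# Ordered access: keys kept in sorted order, as in B-tree leaves.
sorted_keys = sorted(employees)


def range_query(lo: int, hi: int):
    """Return all keys with lo <= key <= hi using two binary searches."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


print(exact)
print(range_query(100, 250))
```

A hash table offers no such shortcut: answering the range query would require inspecting every bucket, because neighboring keys hash to unrelated slots.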

Page 32: Temple University – CIS Dept. CIS331– Principles of Database Systems

Hashing vs B-trees:

thus: B-trees are implemented in most systems

footnotes: hashing is not (why not?)

Page 33: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing - overview
  ISAM and B-trees
  hashing
  Hashing vs B-trees
  Indices in SQL
  Advanced topics:
    dynamic hashing
    multi-attribute indexing

Page 34: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing in SQL

create index <index-name> on <relation-name> (<attribute-list>)
create unique index <index-name> on <relation-name> (<attribute-list>)
  (in the case that the search key is a candidate key)
drop index <index-name>

Page 35: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing in SQL

e.g.,
  create index ssn-index on STUDENT (ssn)

or (e.g., on TAKES(ssn, c-id, grade)):
  create index sc-index on TAKES (ssn, c-id)
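These statements can be tried out with Python's built-in `sqlite3` module, as sketched below. The table definitions and column types are assumptions, and the hyphens in the slide's names are replaced by underscores because unquoted SQL identifiers cannot contain hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ssn INTEGER, name TEXT)")
conn.execute("CREATE TABLE takes (ssn INTEGER, c_id TEXT, grade INTEGER)")

# ssn is a candidate key of student, so a unique index is appropriate.
conn.execute("CREATE UNIQUE INDEX ssn_index ON student (ssn)")
# Composite index on two attributes of takes.
conn.execute("CREATE INDEX sc_index ON takes (ssn, c_id)")

index_names = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")}

# The unique index rejects a second row with the same ssn.
conn.execute("INSERT INTO student VALUES (123, 'Smith')")
try:
    conn.execute("INSERT INTO student VALUES (123, 'Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX sc_index")
remaining = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")}

print(index_names, duplicate_rejected, remaining)
```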

Page 36: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing - overview
  ISAM and B-trees
  hashing
  Hashing vs B-trees
  Indices in SQL
  Advanced topics: (theoretical interest)
    dynamic hashing
    multi-attribute indexing

Page 37: Temple University – CIS Dept. CIS331– Principles of Database Systems

Problem with static hashing

problem: overflow?
problem: underflow? (under-utilization)

Page 38: Temple University – CIS Dept. CIS331– Principles of Database Systems

Solution: Dynamic/extendible hashing

Idea: shrink / expand hash table on demand...
  ... dynamic hashing
Details: how to grow gracefully, on overflow?
Many solutions - One of them: ‘extendible hashing’

Page 39: Temple University – CIS Dept. CIS331– Principles of Database Systems

Extendible hashing

[Figure: buckets #0 ... #h(123) ... M; the record "123; Smith; Main str." hashes to bucket #h(123)]

Page 40: Temple University – CIS Dept. CIS331– Principles of Database Systems

Extendible hashing

[Figure: buckets #0 ... #h(123) ... M; the record "123; Smith; Main str." hashes to bucket #h(123)]

solution: split the bucket in two

Page 41: Temple University – CIS Dept. CIS331– Principles of Database Systems

Extendible hashing

in detail:
  keep a directory, with ptrs to hash-buckets
  Q: how to divide contents of bucket in two?
  A: hash each key into a very long bit string; keep only as many bits as needed

Eventually:

Page 42: Temple University – CIS Dept. CIS331– Principles of Database Systems

Extendible hashing

[Figure: directory with entries 00..., 01..., 10..., 11... pointing to buckets; hashed keys shown in the buckets: 0001..., 0111..., 10011..., 10101..., 10110..., 101001..., 1101...]

Page 43: Temple University – CIS Dept. CIS331– Principles of Database Systems

Extendible hashing

[Figure: directory with entries 00..., 01..., 10..., 11... pointing to buckets; hashed keys shown in the buckets: 0001..., 0111..., 10011..., 10101..., 10110..., 101001..., 1101...]

Page 44: Temple University – CIS Dept. CIS331– Principles of Database Systems

Extendible hashing

[Figure: directory with entries 00..., 01..., 10..., 11... pointing to buckets; hashed keys shown in the buckets: 0001..., 0111..., 10011..., 10101..., 10110..., 101001..., 1101...]

split on 3rd bit

Page 45: Temple University – CIS Dept. CIS331– Principles of Database Systems

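Extendible hashing

[Figure: directory with entries 00..., 01..., 10..., 11...; the keys 101001..., 10101..., 10110... (third bit 1) now sit in a new page / bucket, while 10011... (third bit 0) stays behind; the other keys shown are 1101..., 0111..., 0001...]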

Page 46: Temple University – CIS Dept. CIS331– Principles of Database Systems

Extendible hashing

[Figure: directory (doubled), with entries 000..., 001..., 010..., 011..., 100..., 101..., 110..., 111...; buckets hold 1101..., 10011..., 0111..., 0001..., and the new page / bucket holds 101001..., 10101..., 10110...]

Page 47: Temple University – CIS Dept. CIS331– Principles of Database Systems

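Extendible hashing

[Figure: BEFORE: directory with entries 00..., 01..., 10..., 11...; keys 10101..., 10110..., 10011..., 101001... together with 1101..., 0111..., 0001... in the existing buckets. AFTER: directory with entries 000... through 111...; keys 101001..., 10101..., 10110... in their own bucket, with 1101..., 10011..., 0111..., 0001... in the others]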

Page 48: Temple University – CIS Dept. CIS331– Principles of Database Systems

Extendible hashing

Summary:
  directory doubles on demand
  or halves, on shrinking files
  needs ‘local’ and ‘global’ depth (see book)
  Mainly, of theoretical interest - same for
    ‘linear hashing’ of Litwin
    ‘order preserving’
    ‘perfect hashing’ (no collisions!)
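To make the roles of the directory and of ‘local’ and ‘global’ depth concrete, here is a compact in-memory Python sketch of extendible hashing that uses the leading bits of the hash, as in the figures. It is a sketch under stated assumptions, not the book's algorithm verbatim: the 32-bit multiplicative `bit_hash`, the class names, and the dict-based buckets are illustrative; keys are assumed to be distinct non-negative integers below 2^32; shrinking (directory halving) is omitted.

```python
HASH_BITS = 32


def bit_hash(key: int) -> int:
    """Assumed hash: maps a key to a 32-bit string (multiplicative scrambling)."""
    return (key * 2654435761) % (1 << HASH_BITS)


class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth  # number of leading bits this bucket shares
        self.items = {}                 # key -> record


class ExtendibleHash:
    def __init__(self, capacity: int = 2):
        self.capacity = capacity
        self.global_depth = 0           # leading bits used by the directory
        self.directory = [Bucket(0)]

    def _index(self, key: int) -> int:
        # Keep only as many leading bits as the directory currently needs.
        return bit_hash(key) >> (HASH_BITS - self.global_depth)

    def find(self, key: int):
        return self.directory[self._index(key)].items.get(key)

    def insert(self, key: int, record: str) -> None:
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = record
                return
            self._split(bucket)         # overflow: split, then retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == HASH_BITS:
            raise RuntimeError("ran out of hash bits")
        if bucket.local_depth == self.global_depth:
            # Double the directory: entry i becomes entries 2i and 2i+1.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        shift = HASH_BITS - bucket.local_depth  # position of the deciding bit
        for k in list(bucket.items):
            if (bit_hash(k) >> shift) & 1:
                sibling.items[k] = bucket.items.pop(k)
        # Re-point the directory entries whose deciding bit is 1.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (self.global_depth - bucket.local_depth)) & 1:
                self.directory[i] = sibling


table = ExtendibleHash(capacity=2)
for ssn in (123, 234, 345, 456, 567):
    table.insert(ssn, f"record-{ssn}")
print(table.global_depth, len(table.directory))
```

Only the overflowing bucket is split; the directory doubles only when that bucket's local depth already equals the global depth.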

Page 49: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing - overview
  ISAM and B-trees
  Hashing
  Hashing vs B-trees
  Indices in SQL
  Advanced topics:
    dynamic hashing
    multi-attribute indexing

Page 50: Temple University – CIS Dept. CIS331– Principles of Database Systems

multiple-key access

How to support queries on multiple attributes, like grade>=3 and course=‘415’

Major motivation: Geographic Information systems (GIS)

Page 51: Temple University – CIS Dept. CIS331– Principles of Database Systems

multiple-key access

[Figure: plot with axes x and y]

Page 52: Temple University – CIS Dept. CIS331– Principles of Database Systems

multiple-key access

Typical query: Find cities within x miles from Philadelphia

thus, we want to store nearby cities on the same disk page:

Page 53: Temple University – CIS Dept. CIS331– Principles of Database Systems

multiple-key access

[Figure: plot with axes x and y]

Page 54: Temple University – CIS Dept. CIS331– Principles of Database Systems

multiple-key access

[Figure: plot with axes x and y]

Page 55: Temple University – CIS Dept. CIS331– Principles of Database Systems

multiple-key access - R-trees

[Figure: plot with axes x and y]

Page 56: Temple University – CIS Dept. CIS331– Principles of Database Systems

multiple-key access - R-trees

R-trees: very successful for GIS (along with ‘z-ordering’)
more details: at ‘advanced topics’, later
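The slides defer the details, but the idea behind ‘z-ordering’ fits in a few lines: interleave the bits of the two coordinates so that points close in (x, y) tend to get close one-dimensional keys, which an ordinary ordered index can then store. The function name `z_order`, the 16-bit coordinate width, and the sample grid points are assumptions; an R-tree is a different, more involved structure and is not shown.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single z-order (Morton) key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


# Hypothetical grid coordinates of a few cities.
points = [(3, 3), (0, 1), (2, 2), (1, 0), (0, 0), (1, 1)]

# Sorting by z-key visits the 2x2 block around the origin before moving on,
# so nearby points tend to end up on the same disk page.
by_z = sorted(points, key=lambda p: z_order(*p))
print(by_z)
```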

Page 57: Temple University – CIS Dept. CIS331– Principles of Database Systems

Indexing - overview
  ISAM and B-trees (industry workhorse)
  hashing
  Hashing vs B-trees
  Indices in SQL
  Advanced topics:
    dynamic hashing
    multi-attribute indexing

