Chapter 4 Index Structures. Yeungnam University, Database Lab.Chapter 4 - 1 Table of Contents...

Date post: 17-Jan-2016
Transcript
Page 1: Chapter 4 Index Structures. Yeungnam University, Database Lab.Chapter 4 - 1 Table of Contents 1.Indexes on Sequential Files 2.Secondary Indexes 3.B-Trees.

Chapter 4: Index Structures

Yeungnam University, Database Lab. Chapter 4 - 2

Table of Contents

1. Indexes on Sequential Files

2. Secondary Indexes

3. B-Trees

4. Hash Tables

Basic Concepts

Types of index:
Ordered indexing: Indexed Sequential Access Method
Hashing: Relative Access Method

Performance factors:
Access type (point query, range query)
Access, insertion, and deletion time
Space overhead

Access Method

Definition:
Storage structure and search mechanism

Types:
Primary access method
– Indexing on the primary key
– Sequential, indexed sequential, hashing
Secondary access method
– Indexing on a secondary key
– Multi-list file, inverted file
Access methods for multi-key searching

1. Indexes on Sequential Files

Sequential Files
Dense Indexes
Sparse Indexes
Multiple Levels of Index
Indexes with Duplicate Search Keys
Managing Indexes During Data Modifications

1.1 Sequential Files

Definition:
Records are ordered by search key.
Blocks containing the records are therefore ordered as well.

On insert:
Put the record in the appropriate block if there is room.
Good idea:
– Initialize blocks to be less than full (page fill factor).
– Reorganize periodically if the file grows.
If there is no room in the proper block:
– Create a new block and insert it in the proper order if possible.
– If that is not possible, create a (linked) overflow block.

Ordered Indexing

Used when building an indexed sequential file:
Sequential access
Random access

Notation:
Primary index (clustering index)
– Index on the field that determines the sequential order of the file
Secondary index
Dense index
Sparse index
Multilevel index

1.2-3 Dense/Sparse Index

Dense index:
A pointer to every record of the file, ordered by search key.
Can make sense because records may be much bigger than key-pointer pairs.
– Fit index in memory, even if data file does not?
– Faster search through index than data file?
– Test existence of record without going to data file.

Sparse index:
Key-pointer pairs for only a subset of records, typically the first in each block.
Saves index space.
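The difference in size between the two index types can be sketched in a few lines of Python. This is only an illustration: the block contents follow the figures on the next slides, while the function names and the `(block_no, slot)` pointer format are assumptions made for the example.

```python
# Sequential file: records sorted by key, two records per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

def build_dense_index(blocks):
    """One (key, (block_no, slot)) entry for every record."""
    return [(key, (b, s))
            for b, block in enumerate(blocks)
            for s, key in enumerate(block)]

def build_sparse_index(blocks):
    """One (first_key, block_no) entry for every non-empty block."""
    return [(block[0], b) for b, block in enumerate(blocks) if block]

dense = build_dense_index(blocks)
sparse = build_sparse_index(blocks)

# A dense index can answer "does key 35 exist?" without reading the data file.
exists_35 = any(key == 35 for key, _ in dense)
print(len(dense), len(sparse), exists_35)
```

With two records per block the sparse index needs half as many entries; with larger blocks the saving grows accordingly.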

Dense Index

[Figure: a dense index over a sequential file. Index blocks hold keys (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120), one entry per record; data blocks hold (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]

Sparse Index

[Figure: a sparse index over a sequential file. Index blocks hold keys (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230), one entry per data block; data blocks hold (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]

1.4 Multiple Levels of Index

A sparse index on a (sparse or dense) index is an option.
There is a good chance that second- or higher-level indexes can be housed in main memory, so they cost no additional disk I/Os.
Dense higher-level indexes make no sense:
– dense(dense) = same dense index.
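A two-level sparse index can be sketched as follows. The block sizes (two records per data block, four keys per index block) and all names are assumptions chosen for the example, not values from the slides.

```python
from bisect import bisect_right

def chunk(items, size):
    """Split a list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def sparse_over(blocks):
    """Sparse index: the first key of each block."""
    return [block[0] for block in blocks]

data_blocks = chunk(list(range(10, 170, 10)), 2)   # keys 10..160, 8 data blocks
level1 = chunk(sparse_over(data_blocks), 4)        # first-level index blocks
level2 = sparse_over(level1)                       # second level, small enough for memory

def find_block(key):
    """Return the number of the data block that may hold key, or None."""
    i = bisect_right(level2, key) - 1              # search the in-memory top level
    if i < 0:
        return None                                # key precedes every indexed key
    j = bisect_right(level1[i], key) - 1           # read one first-level index block
    return i * 4 + j

print(find_block(100), data_blocks[find_block(100)])
```

Only the first-level index block and the data block need to come from disk if the second level stays in memory.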

Multiple Levels of Index

[Figure: a multilevel index. Index blocks of the inner index (Index Block 0, Index Block 1, ...) point to data blocks (Data Block 0, Data Block 1, ...).]

1.5 Indexes with Duplicate Search Keys

Dense index, two options:
Duplicate key-pointer pairs.
Pointers to only the first record with a given search key.

Sparse index, two options:
Pointer to the first record of each block.
Pointer to the first new key on each block.
– The sole key if all keys on the block are the same.

First Key Occurrences Only

[Figure: a dense index that stores each key once and points to its first occurrence. Index entries: 10, 20, 30, 40, 50, etc.; data blocks hold (10, 10), (10, 20), (20, 30), (30, 30), (40, 50).]

Sparse, First Key on Block

[Figure: a sparse index that stores the first key of each block. Index entries: 10, 10, 20, 30, 40, etc.; data blocks hold (10, 10), (10, 20), (20, 30), (30, 30), (40, 50).]

Sparse, First New Key

[Figure: a sparse index that stores the first new key of each block. Index entries: 10, 20, 30, 30, 40, etc.; data blocks hold (10, 10), (10, 20), (20, 30), (30, 30), (40, 50).]

Lookup

Find the key in a dense index; in a sparse index, find the greatest key that does not exceed the search key. Then follow the pointer:
Dense, no duplicates: just follow.
Dense, duplicates:
– Follow each pointer (one pointer per record).
– Follow and look at successive records (pointer to the first record with the given key).
Sparse, no duplicates: follow to the block and examine the block.
Sparse, duplicates, key = lowest in block:
– Follow to the block, look at that block and successive blocks until a higher key is met, and (if key = desired key) the previous block.
Sparse, duplicates, key = lowest new key in block:
– Follow to the record, then search the following records of the block and successive blocks until a higher key is met.
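For the simplest sparse case (no duplicates), the lookup rule can be sketched with a binary search. The block contents and the function name are assumptions for the example.

```python
from bisect import bisect_right

blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
sparse_keys = [block[0] for block in blocks]    # first key of each block

def sparse_lookup(key):
    """Return (block_no, slot) of key, or None if the key is absent."""
    pos = bisect_right(sparse_keys, key) - 1    # greatest index key <= key
    if pos < 0:
        return None                             # smaller than every indexed key
    block = blocks[pos]                         # the one block that must be read
    return (pos, block.index(key)) if key in block else None

print(sparse_lookup(40), sparse_lookup(35))
```

Unlike a dense index, a sparse index cannot rule out a missing key such as 35 without reading the data block.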

1.6 Managing Indexes During Data Modifications

DB modifications:
When we insert or delete on the data file, here are the primitive actions we might take:
1. Create or destroy an empty block in the sequence of blocks belonging to the sequential file.
2. Create or destroy an overflow block.
3. Insert a record into a block that has room.
4. Delete a record.
5. Slide a record to an adjacent block.

Effect of Primitive Actions on Index File

Action                                 Dense     Sparse
Create/destroy empty overflow block    none      none
Create empty seq. block                none      insert
Destroy empty seq. block               none      delete
Insert record                          insert    update(?)
Delete record                          delete    update(?)
Slide record                           update    update(?)

Example: Delete 30 with Dense Index

[Figure: after deleting 30, the data blocks hold (10, 20), (40), (50, 60), (70, 80); the entry for 30 is removed from the dense index, whose blocks now hold (10, 20, 40) and (50, 60, 70, 80).]

Example: Delete 30 with Sparse Index

[Figure: after deleting 30, the data blocks hold (10, 20), (40), (50, 60), (70, 80); the sparse index entry 30 is updated to 40, so the index blocks hold (10, 40, 50, 70) and (90, 110, 130, 150).]

Example: Insert 15 with Sparse Index - Redistribute

[Figure: inserting 15 slides 20 into the next block, so the data blocks hold (10, 15), (20, 40), (50, 60), (70, 80); the sparse index entry for the second block changes from 40 to 20, giving index blocks (10, 20, 50, 70) and (90, 110, 130, 150).]

Use Overflow Block Instead

[Figure: inserting 15 with an overflow block. The first data block holds (10, 15) and links to an overflow block holding 20; the other data blocks hold (40), (50, 60), (70, 80). The sparse index is unchanged: (10, 40, 50, 70) and (90, 110, 130, 150).]

2. Secondary Indexes

Definition:
Primary index: the search key determines the location of the record.
Secondary index: there is no relationship between the search key and the record address.
– Sparse, secondary index makes no sense?
– Usually, search key is not a "key".

Table of contents:
Design of Secondary Indexes
Application of Secondary Indexes
Indirection in Secondary Indexes
Document Retrieval and Inverted Indexes

2.1 Design of Secondary Index

[Figure: a dense secondary index. The index file is sorted, with index blocks (10, 10, 20, 20), (20, 30, 40, 50), (50, 60, ...); the data file is not sorted, with data blocks (20, 40), (10, 20), (50, 30), (10, 50), (60, 20).]

2.2 Application of Secondary Index

Clustered file structure:
Records of different types (e.g., EMPLOYEE, DEPT) are allowed in the same block.
[Figure: a block holding records EMP e1, DEPT d1, DEPT d2.]
A secondary index on DEPT.dept_id is needed.

Advantages:
– Records that are frequently accessed together should be in the same block.
– Reduces join overhead.

Disadvantages:
– Queries over a single record type, e.g. SELECT * FROM EMP;

2.3 Indirect Buckets

Motivation:
To avoid repeating keys in the index, use a level of indirection, called buckets.

Additional advantage:
– Sets of records can be intersected without retrieving the actual records.

Example:
Movies(title, year, length, studioName)
Secondary indexes on studioName and year.

SELECT title FROM Movies WHERE studioName = 'Disney' AND year = 1995;
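The query above can be answered by intersecting two pointer buckets before any Movies record is read. The sketch below uses made-up records with placeholder titles; the dictionary-based "file", the record addresses, and the helper name are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical data file: record address -> (title, year, length, studioName)
movies = {
    0: ("Movie A", 1995, 80, "Disney"),
    1: ("Movie B", 1995, 95, "Disney"),
    2: ("Movie C", 1995, 120, "Other Studio"),
    3: ("Movie D", 1994, 90, "Disney"),
}

def build_buckets(column):
    """Secondary index with indirection: key -> bucket of record addresses."""
    buckets = defaultdict(set)
    for addr, record in movies.items():
        buckets[record[column]].add(addr)
    return buckets

studio_index = build_buckets(3)
year_index = build_buckets(1)

# Intersect the two buckets in memory; only matching records are then fetched.
matches = studio_index["Disney"] & year_index[1995]
titles = sorted(movies[addr][0] for addr in matches)
print(titles)
```

Records 2 and 3 each satisfy only one condition, so they are never retrieved from the data file.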

Saving Space by Indirect Buckets

[Figure: the index stores each key once (10, 20, 30, 40, 50, 60, etc.); each key points to a bucket of pointers to the records with that key in the unsorted data file, whose blocks hold (20, 40), (10, 20), (50, 30), (10, 50), (60, 20).]

Intersecting Buckets in Main Memory

[Figure: the studio index entry for Disney and the year index entry for 1995 each point to a bucket of pointers (buckets for studio, buckets for year); the pointers lead to Movie tuples.]

2.4 Document Retrieval and Inverted Indexes

Relational view of documents:
A document is a tuple in a relation Doc.
– An attribute for each possible word in a document
– Each attribute is boolean (e.g., hasCat, hasDog, ...)
There is a secondary index on each of the attributes of Doc.
– No index entries are needed for FALSE attribute values.
Instead of building a separate index for each attribute, combine them into a single index in the form of an inverted index.
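A minimal inverted index can be sketched as a map from each word to the set of documents that contain it. The three document texts are taken from the figure on the following slide; the document ids, the toy stemming table, and the function name are assumptions for the example.

```python
from collections import defaultdict

docs = {
    "d1": "the cat is fat",
    "d2": "was raining cats and dogs",
    "d3": "Fido the dog",
}
STEMS = {"cats": "cat", "dogs": "dog"}   # toy stemming table (assumption)

# One index for all words: word -> bucket of document ids.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        inverted[STEMS.get(word, word)].add(doc_id)

def docs_with_all(*words):
    """Documents containing every given word, found by intersecting buckets."""
    buckets = [inverted.get(word, set()) for word in words]
    return sorted(set.intersection(*buckets)) if buckets else []

print(docs_with_all("cat"), docs_with_all("cat", "dog"))
```

Only words that actually occur get an entry, which corresponds to skipping the FALSE attribute values in the relational view.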

Inverted Index on Documents

[Figure: an inverted index with entries such as cat and dog points to buckets; the bucket pointers lead to the documents "...the cat is fat...", "...was raining cats and dogs...", and "...Fido the dog...".]

Additional Info. in Buckets

[Figure: bucket entries for cat and dog carry extra fields (type, position) in addition to the document location. Example entries: (Title, 5), (Title, 100), (Author, 10), (Abstract, 57), (Title, 12); the entries point to documents d1, d2, d3.]

3. B+-Trees

Properties:
Balanced tree (i.e., a dynamic multilevel index).
Each node, except the root node, is at least half-full.
Leaves form a dense, sequentially ordered index.
The root has at least two children.
The records in a node are ordered.

Node Formats

Non-leaf nodes:
$\langle P_0, (K_1, P_1), \ldots, (K_{n-1}, P_{n-1}) \rangle$
$P_i$: pointer to the child node
$K_i$: search key

Leaf nodes:
$\langle (K_0, P_0), (K_1, P_1), \ldots, (K_{n-2}, P_{n-2}), P_{n-1} \rangle$
$P_i$ ($0 \le i \le n-2$): pointer to the data
$P_{n-1}$: pointer to the sibling leaf node

B+-Tree Example (n = 3)

[Figure: the root holds key 100. Its left child holds key 30 and its right child holds keys 120, 150, 180. Leaves, left to right: (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]

Sample Non-Leaf Node

[Figure: a non-leaf node with keys 57, 81, 95. Its four pointers lead to keys $k < 57$, $57 \le k < 81$, $81 \le k < 95$, and $k \ge 95$.]

Sample Leaf Node

[Figure: a leaf node with keys 57, 81, 95, reached from a non-leaf node. Each key is paired with a pointer to the record with that key; the last pointer leads to the next leaf in sequence.]

Operations of B+-Tree

Record location (basic rule):
If $K_i \le L < K_{i+1}$, then L is in a descendant node in the subtree pointed to by $P_i$.
If $L < K_1$, then L is in a descendant node in the subtree pointed to by $P_0$.

LOOKUP:
Apply the record location rule at each level.
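The record location rule maps directly onto a binary search within each non-leaf node. The sketch below rebuilds the n = 3 example tree from the earlier slide; the `Node` class and its field names are assumptions for illustration.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys              # K1..Kn-1 in a non-leaf, record keys in a leaf
        self.children = children      # P0..Pn-1 in a non-leaf, None in a leaf

def lookup(node, key):
    """Return True if key is stored in a leaf below node."""
    while node.children is not None:
        # bisect_right gives 0 when key < K1, and i when Ki <= key < Ki+1.
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys

leaves = [Node(keys) for keys in ([3, 5, 11], [30, 35], [100, 101, 110],
                                  [120, 130], [150, 156, 179], [180, 200])]
root = Node([100], [Node([30], leaves[:2]),
                    Node([120, 150, 180], leaves[2:])])

print(lookup(root, 156), lookup(root, 31))
```

Each iteration of the loop corresponds to reading one node, so the cost of a lookup is the height of the tree.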

Insertion into B+-Tree

INSERT (search key = K):
Find the appropriate leaf node, say L1.
Case 1: L1 is not full. Insert and STOP.
Case 2: L1 is full (node splitting).
– Create a new node, say L2.
– Move the last half of $L1 \cup \{K\}$ to L2.
– Insert the first search key of L2 into the parent, recursively, until no more node splitting is needed.
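The leaf-level part of the insertion procedure can be sketched as below. It handles one leaf only and returns the key that would go into the parent; propagating the split upward is left out. The function name and return convention are assumptions for the example.

```python
from bisect import insort

def insert_into_leaf(keys, key, capacity):
    """Insert key into a sorted leaf, splitting it if it overflows.

    Returns (left, right, separator). When no split is needed,
    right and separator are None.
    """
    keys = list(keys)
    insort(keys, key)
    if len(keys) <= capacity:
        return keys, None, None            # Case 1: the leaf had room
    mid = (len(keys) + 1) // 2             # the old leaf keeps the first half
    left, right = keys[:mid], keys[mid:]   # Case 2: the last half moves to a new leaf
    return left, right, right[0]           # first key of the new leaf goes to the parent

print(insert_into_leaf([30, 31], 32, 3))
print(insert_into_leaf([3, 5, 11], 7, 3))
```

With a capacity of three keys per leaf, these two calls correspond to the "Insert key = 32" and "Insert key = 7" examples on the following slides.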

Example: Insert key = 32

[Figure: the leaf (30, 31), which lies between separator keys 30 and 100, has room, so 32 is added to it, giving (30, 31, 32). The neighboring leaf (3, 5, 11) is unchanged.]

Example: Insert key = 7

[Figure: the leaf (3, 5, 11) is full, so inserting 7 splits it into (3, 5) and (7, 11), and key 7 is inserted into the parent next to 30. The leaf (30, 31) and key 100 are unchanged.]

Example: Insert key = 160

[Figure: inserting 160 splits the full leaf (150, 156, 179) into (150, 156) and (160, 179). The parent (120, 150, 180) is also full, so it splits as well, leaving 180 in a new non-leaf node, and key 160 moves up next to 100. The leaf (180, 200) is unchanged.]

Example: Insert key = 45

[Figure: the old root holds (10, 20, 30) over leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). Inserting 45 splits the last leaf into (30, 32) and (40, 45); the full root then splits, key 40 goes to a new non-leaf node, and a new root with key 30 is created.]

Deletion from B+-Tree

DELETE (search key = K):
Find the appropriate leaf node.
Remove K from the node.
Case 1: The node is still at least half-full.
– Handle the case where K was the first key of the node, and STOP.
Case 2: The node is less than half-full.
– Redistribute or coalesce.
– Ripple the effect to the ancestors, if necessary.

Coalesce with Sibling (n = 4)

[Figure: a parent with keys (10, 40, 100) over leaves (10, 20, 30) and (40, 50). Deleting 50 leaves the second leaf less than half-full, so the remaining key 40 is merged into the sibling, giving (10, 20, 30, 40).]

Redistribute Keys (n = 4)

[Figure: a parent with keys (10, 40, 100) over leaves (10, 20, 30, 35) and (40, 50). Deleting 50 leaves the second leaf less than half-full, so 35 moves over from the sibling, giving (10, 20, 30) and (35, 40), and the parent key 40 becomes 35.]

Non-leaf Coalesce

[Figure: a root with key 25 over non-leaf nodes (10, 20) and (30, 40), with leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 leaves its leaf less than half-full; after the leaf is coalesced with a sibling, the non-leaf node above it is also less than half-full, so the non-leaf nodes are coalesced into a new root.]

B-Tree Index Files

[Figure: a B-tree on branch names. The root holds Downtown and Redwood; the leaves hold (Brighton, Clearview), (Mianus, Perryridge), and (Round Hill). Every key, in nonleaf and leaf nodes alike, has a pointer to its own bucket (Brighton bucket, Clearview bucket, Downtown bucket, Mianus bucket, Perryridge bucket, Redwood bucket, Round Hill bucket).]

B-Tree Index Files (Cont'd)

Advantages:
Search-key values appear only once.
Pointers to the desired data can be found in nonleaf nodes.

Disadvantages:
Nonleaf nodes are larger than leaf nodes.
Deletion is more complicated.
Sequential searching is difficult.

4. Hashing

Definition of hashing:
– Key-to-address transformation
– For each search key K, H(K) gives the bucket number.
– H: hash function
– Uniformity
– Problem in sequential processing

Types of hashing:
Static hashing
Dynamic hashing

Concept of Hashing

[Figure: a key is mapped by h(key) to one of the buckets; a bucket is typically one disk block.]

Two Alternatives

[Figure: two ways to use the hash function. (1) key → h(key) → buckets that hold the records themselves. (2) key → h(key) → an index whose entries point to the records.]

4.1 Static Hashing

Characteristics of static hashing:
Fixed directory size
Overflow may occur
Depending on the directory size:
– Sparse directory
– Increased search time

Overflow handling:
Linear probing
Chaining

Hash Index

[Figure: a hash index over an account file. Each record holds a branch name (Brighton, Downtown, Mianus, Perryridge, Redwood, Round Hill), an account number (A-101 to A-305), and a balance. The buckets (labeled Bucket 0, Bucket 1, ..., Bucket 6 on the slide) hold account numbers such as A-215, A-305, and A-222 that point to the records.]

Example: 2 records/bucket

INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0

[Figure: buckets 0 to 3. Bucket 0 holds d, bucket 1 holds a and c, bucket 2 holds b, and bucket 3 is empty.]

h(e) = 1

[Figure: bucket 1 is full, so e is placed in an overflow block linked from bucket 1.]
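The insertion example above can be reproduced with a small chaining sketch. The hash values are the ones listed on the slide; the `Bucket` class, its method names, and the use of a lookup table in place of a real hash function are assumptions for illustration.

```python
BUCKET_CAPACITY = 2
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}   # hash values from the slide

class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None                   # next block in the overflow chain

    def insert(self, record):
        if len(self.records) < BUCKET_CAPACITY:
            self.records.append(record)
        else:
            if self.overflow is None:
                self.overflow = Bucket()       # chain a new overflow block
            self.overflow.insert(record)

    def contains(self, record):
        if record in self.records:
            return True
        return self.overflow is not None and self.overflow.contains(record)

table = [Bucket() for _ in range(4)]           # fixed directory of 4 buckets
for record in "abcde":
    table[H[record]].insert(record)

print(table[1].records, table[1].overflow.records)
```

A lookup for e has to read bucket 1 and then its overflow block, which is the increased search time that a too-small fixed directory causes.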

Example: Deletion

[Figure: a hash table with buckets 0 to 3 holding records a to g, some of them in overflow blocks. Records e and f are deleted; the slide notes "maybe move g up" after f is removed.]

4.2 Dynamic Hashing

Static hashing when the file size grows:
Design the hash function based on the current file size.
Design the hash function based on the anticipated file size.
Periodically recompute the hash function.

Dynamic hashing:
Dynamic directory size
Controlled overflow/underflow
Extensible hashing
Linear hashing

Extensible Hashing

Basic idea:
Use i of the b bits output by the hash function.
i grows over time.

[Figure: h(k) produces b bits, e.g. 00110101, of which the first i are used as the hash prefix. The bucket address table (entries 00, 01, 10, 11) points to the buckets, and each bucket carries its own prefix length (i1, i2, i3).]
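Two small pieces of extensible hashing can be sketched directly: selecting a bucket address table entry from the first i bits of the hash value, and doubling the table when i grows. The function names and the default of b = 8 bits are assumptions; bucket splitting itself is not shown.

```python
def directory_slot(hash_value, i, b=8):
    """Table entry selected by the i high-order bits of a b-bit hash value."""
    return hash_value >> (b - i)

def double_directory(directory):
    """Grow the table from 2**i to 2**(i+1) entries.

    Each old entry is replaced by two adjacent entries that share
    the old pointer, so no bucket has to be rewritten.
    """
    return [bucket for bucket in directory for _ in range(2)]

h = 0b00110101                                  # the 8-bit value shown on the slide
print([directory_slot(h, i) for i in range(5)])  # prefixes of length 0..4
print(double_directory(["bucket A", "bucket B"]))
```

Because the prefix comes from the high-order end, the two entries that replace an old entry are always neighbors in the doubled table.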

Example

branch-name    h(branch-name)
Brighton       0010 1101 1111 1011 0010 1100 0011 0000
Clearview      1101 0101 1101 1110 0100 0110 1001 0011
Downtown       1010 0011 1010 0000 1100 0110 1001 1111
Mianus         1000 0111 1110 1101 1011 1111 0011 1010
Perryridge     1111 0001 0010 0100 1001 0011 0110 1101
Redwood        1011 0101 1010 0110 1100 1001 1110 1011
Round Hill     0101 1000 0011 1111 1001 1100 0000 0001

[Figure: initial extensible hash structure (i = 0). The hash prefix is 0, and the bucket address table has a single entry pointing to one empty bucket with prefix length 0.]

Example (Cont'd)

[Figure: hash structure after 3 insertions (i = 1). Table entry 0 points to a bucket holding Round Hill (prefix length 1); entry 1 points to a bucket holding Perryridge and Downtown (prefix length 1).]

Example (Cont'd)

[Figure: hash structure after 4 insertions (i = 2). Table entries 00 and 01 point to the bucket holding Round Hill (prefix length 1); entry 10 points to a bucket holding Downtown and Redwood (prefix length 2); entry 11 points to a bucket holding Perryridge (prefix length 2).]

Example (Cont'd)

[Figure: hash structure after 7 insertions (i = 3). Table entries 000 to 011 point to the bucket holding Round Hill and Brighton (prefix length 1); entry 100 points to a bucket holding Mianus (prefix length 3); entry 101 points to a bucket holding Downtown and Redwood (prefix length 3); entries 110 and 111 point to the bucket holding Perryridge and Clearview (prefix length 2).]

Linear Hashing

Disadvantages of extensible hashing:
Processing time increases when the bucket array doubles in size.
If the doubled bucket array no longer fits in memory, disk I/O increases.
When records pile up in one particular bucket, the bucket array grows.

Concept of linear hashing:
When the number of stored records (r) is too large relative to the total number of buckets (n), the number of buckets is increased by one at a time (to keep r / n < threshold).
Overflow may occur.

Linear Hashing (Cont'd)

How linear hashing works:
Use i bits from the right (low-order) end of h(K).
Buckets are numbered [0..n-1], where $2^{i-1} < n \le 2^i$.
Let the last i bits of h(K) be $m = (a_1 a_2 \ldots a_i)$.
– If $m < n$, then the record belongs in bucket m.
– If $n \le m < 2^i$, then the record belongs in bucket $(0\, a_2 \ldots a_i)$, i.e. bucket $m - 2^{i-1}$.

[Figure: i = 1, n = 2, r = 3. Bucket 0 holds 0000 and 1010; bucket 1 holds 1111.]
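The bucket-selection rule of linear hashing can be written as a short function. The name `bucket_for` and the use of integers for hash values are assumptions for the example.

```python
def bucket_for(hash_value, i, n):
    """Bucket number for a hash value, given n buckets with 2**(i-1) < n <= 2**i."""
    m = hash_value & ((1 << i) - 1)     # the last i bits of h(K)
    if m < n:
        return m                        # bucket m already exists
    return m - (1 << (i - 1))           # clear the leading bit: (0 a2 ... ai)

# With i = 2 and n = 3, bucket 11 does not exist yet, so 1111 goes to bucket 01.
print(bucket_for(0b1111, 2, 3), bucket_for(0b1010, 2, 3))
```

Clearing the leading bit sends the record to the bucket that will later be split to create the missing one.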

Insertion into Linear Hash Table

Algorithm:
If the bucket B found by lookup has room, store the record in B.
If B is full, create an overflow bucket and store the record there.
If (++r / n > threshold), set n = n + 1.
– If n now exceeds $2^i$, set i = i + 1.
– Split the bucket whose number is the new bucket's number minus $2^{i-1}$, and move the matching records to the new bucket.

Example (threshold = 1.7):

[Figure: i = 1, n = 2, r = 3. Bucket 0 holds 0000 and 1010; bucket 1 holds 1111. The next key to insert is 0101.]

Example - Insertion

[Figure: after inserting 0101: i = 2, n = 3, r = 4. Bucket 00 holds 0000; bucket 01 holds 0101 and 1111; bucket 10 holds 1010. The next key to insert is 0001.]

[Figure: after inserting 0001: i = 2, n = 3, r = 5. Bucket 00 holds 0000; bucket 01 holds 0001 and 0101, with 1111 in an overflow block; bucket 10 holds 1010. The next key to insert is 0111.]

Example - Insertion

[Figure: after inserting 0111: i = 2, n = 4, r = 6. Bucket 00 holds 0000; bucket 01 holds 0001 and 0101; bucket 10 holds 1010; bucket 11 holds 0111 and 1111.]
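The insertion example can be replayed with a compact simulation. This is a sketch under simplifying assumptions: each Python list stands for a bucket together with its overflow chain, so block capacity is not modeled, and the class and method names are invented for the example. The threshold of 1.7 is the one used on the slides.

```python
class LinearHashTable:
    def __init__(self, threshold=1.7):
        self.i, self.n, self.r = 1, 2, 0
        self.threshold = threshold
        self.buckets = [[], []]          # a list = one bucket plus its overflow chain

    def _address(self, h):
        m = h & ((1 << self.i) - 1)      # last i bits
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, h):
        self.buckets[self._address(h)].append(h)
        self.r += 1
        if self.r / self.n > self.threshold:
            self._add_bucket()

    def _add_bucket(self):
        new = self.n                     # number of the bucket being added
        self.n += 1
        if self.n > (1 << self.i):
            self.i += 1
        old = new - (1 << (self.i - 1))  # the bucket that gets split
        mask = (1 << self.i) - 1
        records, self.buckets[old] = self.buckets[old], []
        self.buckets.append([])
        for h in records:                # redistribute on the last i bits
            self.buckets[new if h & mask == new else old].append(h)

table = LinearHashTable()
for h in (0b0000, 0b1010, 0b1111, 0b0101, 0b0001, 0b0111):
    table.insert(h)
print(table.i, table.n, table.r)
print([[format(h, "04b") for h in bucket] for bucket in table.buckets])
```

Inserting the six keys in this order ends with i = 2, n = 4, r = 6 and the same bucket contents as the last figure, although the order of keys within a bucket may differ from the slide.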

Comparison of B-Tree and Hashing

Performance depends on query type:
One-record lookup (point query)
– SELECT A1, A2 FROM R WHERE A3 = 'c'
– Hashing is good.
Range searching (range query)
– SELECT A1, A2 FROM R WHERE A3 >= 'c1' AND A3 <= 'c2'
– Indexing (i.e., B+-tree) is good.

Order-preserving hash function:
If K1 < K2, then H(K1) < H(K2).
Difficult to achieve.
