Chapter 4: Index Structures

Yeungnam University, Database Lab.

Table of Contents

1. Indexes on Sequential Files

2. Secondary Indexes

3. B-Trees

4. Hash Tables


Basic Concepts

Types of indexes
– Ordered indexing: Indexed Sequential Access Method
– Hashing: relative access method

Performance factors
– Access type (point query, range query)
– Access, insertion, and deletion time
– Space overhead


Access Method
Definition: a storage structure together with its search mechanism.

Types
– Primary access method
  – Indexing on the primary key
  – Sequential, indexed sequential, hashing
– Secondary access method
  – Indexing on a secondary key
  – Multi-list file, inverted file
– Access methods for multi-key searching


1. Indexes on Sequential Files
– Sequential Files
– Dense Indexes
– Sparse Indexes
– Multiple Levels of Index
– Indexes with Duplicate Search Keys
– Managing Indexes During Data Modifications


1.1 Sequential Files
Definition
– Records are ordered by search key, so the blocks containing them are ordered as well.

On insert
– Put the record in the appropriate block if there is room. Good ideas:
  – Initialize blocks to be less than full (page fill factor).
  – Reorganize periodically if the file grows.
– If there is no room in the proper block:
  – Create a new block and insert it in the proper order if possible.
  – If that is not possible, create a (linked) overflow block.


Ordered Indexing
Used to build an indexed sequential file, which supports
– Sequential access
– Random access

Terminology
– Primary index (clustering index)
  – An index on the field that determines the sequential order of the file
– Secondary index
– Dense index
– Sparse index
– Multilevel index


1.2-3 Dense/Sparse Index
Dense index
– A pointer to every record of the file, ordered by search key.
– This can make sense because records may be much bigger than key-pointer pairs:
  – The index may fit in memory even if the data file does not.
  – Searching the index may be faster than searching the data file.
  – The existence of a record can be tested without going to the data file.

Sparse index
– Key-pointer pairs for only a subset of the records, typically the first record in each block.
– Saves index space.


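Dense Index
[Figure: Dense index. Each index key (10, 20, …, 120; four entries per index block) points to its record in the sorted data file, which holds two records per block (10, 20 | 30, 40 | …).]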


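Sparse Index
[Figure: Sparse index. The index holds one entry per data block, the first key of that block (10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230); data blocks hold two records each (10, 20 | 30, 40 | …).]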
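To make the dense/sparse trade-off concrete, here is a minimal Python sketch. It assumes two records per data block, shows records by key only, and represents a pointer as a block number; `data_blocks`, `dense`, `sparse`, `dense_lookup`, and `sparse_lookup` are hypothetical names, not part of any library.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted data file: two records (shown by key) per block.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number) pair per record.
dense = [(k, b) for b, block in enumerate(data_blocks) for k in block]
dense_keys = [k for k, _ in dense]

# Sparse index: one (first key of block, block number) pair per block.
sparse = [(block[0], b) for b, block in enumerate(data_blocks)]
sparse_keys = [k for k, _ in sparse]


def dense_lookup(key):
    """Return the block holding key, or None, using only the index."""
    j = bisect_left(dense_keys, key)
    if j < len(dense_keys) and dense_keys[j] == key:
        return dense[j][1]
    return None


def sparse_lookup(key):
    """Follow the greatest index key <= key, then search that data block."""
    j = bisect_right(sparse_keys, key) - 1
    if j < 0:
        return None  # key is smaller than every key in the file
    b = sparse[j][1]
    return b if key in data_blocks[b] else None  # requires reading the block


print(dense_lookup(60), sparse_lookup(60))  # both find block 2
print(dense_lookup(65), sparse_lookup(65))  # neither finds a record
```

The dense lookup can decide whether a key exists from the index alone, while the sparse lookup must read the data block; the sparse index, in exchange, has one entry per block instead of one per record.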


1.4 Multiple Levels of Index
– A sparse index on an index (sparse or dense) is an option.
– There is a good chance that the 2nd- and higher-level indexes can be kept in main memory, so they cost no additional disk I/Os.
– Dense higher-level indexes make no sense:
  – dense(dense) = the same dense index (it has as many entries as the index below it).


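Multiple Levels of Index
[Figure: A two-level index. An outer sparse index points to the blocks of the inner index (IndexBlock 0, IndexBlock 1, …), whose entries point to the data blocks (DataBlock 0, DataBlock 1, …).]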
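A minimal sketch of a two-level index follows, assuming two records per data block, four entries per index block, and an outer index small enough to stay in main memory (all names and sizes are hypothetical). It counts how many blocks a lookup would read from disk.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 2   # assumed data-block capacity
ENTRIES_PER_BLOCK = 4   # assumed index-block capacity

keys = list(range(10, 170, 10))  # hypothetical sorted keys 10, 20, ..., 160
data_blocks = [keys[j:j + RECORDS_PER_BLOCK]
               for j in range(0, len(keys), RECORDS_PER_BLOCK)]

# Inner index: sparse, one (first key, data block number) entry per data block.
inner_entries = [(blk[0], b) for b, blk in enumerate(data_blocks)]
inner_blocks = [inner_entries[j:j + ENTRIES_PER_BLOCK]
                for j in range(0, len(inner_entries), ENTRIES_PER_BLOCK)]

# Outer index: sparse index on the inner index, one entry per inner index block.
outer = [(blk[0][0], b) for b, blk in enumerate(inner_blocks)]


def floor_entry(entries, key):
    """Return the entry with the greatest key <= key, or None."""
    j = bisect_right([k for k, _ in entries], key) - 1
    return entries[j] if j >= 0 else None


def lookup(key):
    """Return (found, disk_reads), assuming the outer index stays in memory."""
    entry = floor_entry(outer, key)
    if entry is None:
        return False, 0
    inner_block = inner_blocks[entry[1]]   # disk read 1: one inner index block
    entry = floor_entry(inner_block, key)  # never None: block starts at a key <= key
    data_block = data_blocks[entry[1]]     # disk read 2: one data block
    return key in data_block, 2


print(lookup(120))  # (True, 2)
print(lookup(125))  # (False, 2)
```

With the outer level held in memory, every successful descent reads exactly one inner index block and one data block.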


1.5 Indexes with Duplicate Search Keys
Dense index
– Option 1: one key-pointer pair per record, so keys are repeated in the index.
– Option 2: a pointer to only the first record with a given search key.

Sparse index
– Option 1: a pointer to the first record of each block.
– Option 2: a pointer to the first new key on each block.
  – If all keys in the block are the same, use that sole key.


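First Key Occurrences Only
[Figure: Dense index storing only the first occurrence of each key (10, 20, 30, 40, 50, …) over a sorted data file with duplicate keys (10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 50 | …).]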


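Sparse, First Key on Block
[Figure: Sparse index holding the first key of every block (10, 10, 20, 30, 40, …) over the same data file (10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 50 | …); a key can appear in several index entries.]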


Sparse, First New Key
[Figure: Sparse index holding the first new key of each block (10, 20, 30, 30, 40, …); block (30, 30) has no new key, so it is indexed by its sole key 30.]


Lookup
Find the key in a dense index, or the greatest key ≤ the search key in a sparse index, and follow the pointer.
– Dense, no duplicates: just follow the pointer.
– Dense, duplicates:
  – With one pointer per record: follow each pointer.
  – With a pointer to the first record with the given key: follow it and examine successive records.
– Sparse, no duplicates: follow the pointer to the block and examine the block.
– Sparse, duplicates, index key = lowest key in block:
  – Follow the pointer to the block, examine it and successive blocks until a higher key is met, and (if the index key equals the desired key) also examine the previous block.
– Sparse, duplicates, index key = lowest new key in block:
  – Follow the pointer to the record, then search the following records of the block and successive blocks until a higher key is met.
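The case "sparse, duplicates, index key = lowest key in block" can be sketched as follows, using the duplicate-key data file from the figures (two records per block); the names are hypothetical. Starting one block before the first index entry ≥ the search key is a slight generalization of the rule above: it also covers several consecutive blocks that all begin with the search key.

```python
from bisect import bisect_left

# Sorted data file with duplicate keys, two records per block (as in the figures).
data_blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 50]]
# Sparse index: first key of each block (keys may repeat).
index_keys = [blk[0] for blk in data_blocks]


def lookup_all(key):
    """Return (block, slot) of every record with the given key."""
    # Start one block before the first index entry >= key: copies of the key
    # may sit at the end of that previous block.
    b = max(bisect_left(index_keys, key) - 1, 0)
    found = []
    while b < len(data_blocks):
        for slot, k in enumerate(data_blocks[b]):
            if k > key:
                return found  # a higher key was met: stop
            if k == key:
                found.append((b, slot))
        b += 1  # continue with the successive block
    return found


print(lookup_all(30))  # [(2, 1), (3, 0), (3, 1)]
```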


1.6 Managing Indexes During Data Modifications
DB modifications
When we insert into or delete from the data file, these are the primitive actions we might take:
1. Create or destroy an empty block in the sequence of blocks belonging to the sequential file.
2. Create or destroy an overflow block.
3. Insert a record into a block that has room.
4. Delete a record.
5. Slide a record to an adjacent block.


Effect of Primitive Actions on Index File

Action                                Dense index   Sparse index
Create/destroy empty overflow block   none          none
Create empty sequential block         none          insert
Destroy empty sequential block        none          delete
Insert record                         insert        update (?)
Delete record                         delete        update (?)
Slide record                          update        update (?)

(?): the sparse index is updated only if the action changes the first key of a block.


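Example: Delete 30 with Dense Index
[Figure: Record 30 is removed from its data block, leaving 40 alone in that block (10, 20 | 40 | 50, 60 | 70, 80), and entry 30 is removed from the dense index, whose first block becomes 10, 20, 40.]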


Example: Delete 30 with Sparse Index
[Figure: After record 30 is deleted, the second data block holds only 40, so its index entry changes from 30 to 40; the index becomes 10, 40, 50, 70 | 90, 110, 130, 150.]


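Example: Insert 15 with Sparse Index - Redistribute
[Figure: 15 is inserted into the first block (10, 15), record 20 slides into the next block (20, 40), and that block's index entry changes from 40 to 20; the index becomes 10, 20, 50, 70 | 90, 110, 130, 150.]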


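Use Overflow Block Instead
[Figure: 15 is inserted into the first block (10, 15) and the displaced record 20 is placed in an overflow block chained to it; the sparse index (10, 40, 50, 70 | 90, …) is unchanged.]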
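Sparse-index maintenance can be modeled in a few lines. This is an illustrative sketch under stated assumptions: two records per block, one overflow chain per block, and the overflow-block strategy for inserts (not redistribution). `SequentialFile` and its methods are hypothetical names.

```python
from bisect import bisect_right

CAPACITY = 2  # records per block (assumed, as in the figures)


class SequentialFile:
    """Sorted blocks with per-block overflow chains and a sparse index."""

    def __init__(self, blocks):
        self.blocks = [list(b) for b in blocks]    # primary blocks in key order
        self.overflow = [[] for _ in self.blocks]  # overflow records per block
        self.index = [b[0] for b in self.blocks]   # first key of each block

    def _block_for(self, key):
        # Greatest index key <= key (block 0 for keys below the first entry).
        return max(bisect_right(self.index, key) - 1, 0)

    def delete(self, key):
        b = self._block_for(key)
        chain = self.blocks[b] + self.overflow[b]  # sorted: overflow holds larger keys
        chain.remove(key)                          # ValueError if key is absent
        if not chain:
            # The block became empty: destroy it and delete its index entry.
            del self.blocks[b], self.overflow[b], self.index[b]
            return
        # Refill the primary block from overflow, then fix the index entry
        # in case the first key of the block changed.
        self.blocks[b], self.overflow[b] = chain[:CAPACITY], chain[CAPACITY:]
        self.index[b] = self.blocks[b][0]

    def insert(self, key):
        b = self._block_for(key)
        chain = sorted(self.blocks[b] + self.overflow[b] + [key])
        # Keep the smallest keys in the primary block; spill the rest to overflow.
        self.blocks[b], self.overflow[b] = chain[:CAPACITY], chain[CAPACITY:]
        self.index[b] = self.blocks[b][0]  # changes only if key is a new minimum


f = SequentialFile([[10, 20], [30, 40], [50, 60], [70, 80]])
f.delete(30)
print(f.index)                     # [10, 40, 50, 70]: entry 30 became 40
f.insert(15)
print(f.blocks[0], f.overflow[0])  # [10, 15] [20]; index unchanged
```

Replaying "delete 30" and then "insert 15" reproduces the sparse-index and overflow-block examples: only the entry whose block lost its first key changes.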

20


2. Secondary Indexes
Definition
– Primary index: the search key determines the location of the record.
– Secondary index: there is no relationship between the search key and the record's address.
  – A sparse secondary index makes no sense, because the data file is not sorted on the search key.
  – Usually, the search key is not a "key" (values may repeat).

Table of Contents
– Design of Secondary Indexes
– Application of Secondary Indexes
– Indirection in Secondary Indexes
– Document Retrieval and Inverted Indexes


2.1 Design of Secondary Index
[Figure: Data file: not sorted on the search key. Index file: sorted, with one key-pointer pair per record, so duplicate keys (e.g., 10, 10, 20, 20) appear repeatedly.]
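A dense secondary index over an unsorted file can be sketched as a sorted list of (key, pointer) pairs. The data values and the (block, slot) pointer format below are hypothetical; all records with a given key are found with two binary searches.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file, NOT sorted on the search key (two records per block).
data_blocks = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

# Dense secondary index: one (key, (block, slot)) pair per record, sorted by key.
index = sorted((k, (b, s))
               for b, blk in enumerate(data_blocks)
               for s, k in enumerate(blk))
index_keys = [k for k, _ in index]


def lookup(key):
    """Return pointers to all records whose search key equals key."""
    lo, hi = bisect_left(index_keys, key), bisect_right(index_keys, key)
    return [ptr for _, ptr in index[lo:hi]]


print(lookup(10))  # [(0, 1), (2, 0), (3, 0)]
```

Because matching records are scattered over the file, each pointer may cost a separate block read; that is why the index has to be dense.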


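2.2 Application of Secondary Index
Clustered file structure
– Records of different types (e.g., EMPLOYEE, DEPT) are allowed in the same block.
– A secondary index on DEPT.dept_id is needed.
Advantages
– Records that are frequently accessed together are stored in the same block.
– Reduces join overhead.
Disadvantages
– Queries on a single record type become more expensive, e.g. SELECT * FROM EMP; must read blocks that also hold DEPT records.

[Figure: A clustered file in which EMP records (e1, …) are stored in the same blocks as DEPT records (d1, d2, …).]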


2.3 Indirect Buckets
Motivation
– To avoid repeating keys in the index, use a level of indirection, called buckets.
Additional advantage
– Intersections between sets of records can be computed without retrieving the actual records.
Example
– Movies(title, year, length, studioName), with secondary indexes on studioName and year.

SELECT title FROM Movies WHERE studioName = 'Disney' AND year = 1995;


Saving Space by Indirect Buckets
[Figure: Each distinct key appears once in the index and points to a bucket of record pointers, instead of being repeated once per record.]


Intersecting Buckets in Main Memory
[Figure: The studio index entry Disney and the year index entry 1995 each point to a bucket of pointers to movie tuples; intersecting the two buckets in main memory yields the tuples that satisfy both conditions.]
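Intersecting buckets can be sketched in Python with sets of tuple ids. The Movies tuples below are made-up placeholders (titles M1 to M4), and representing a bucket as a list of tuple ids is an assumption, not a prescribed layout.

```python
from collections import defaultdict

# Hypothetical Movies tuples: (title, year, length, studioName).
movies = [
    ("M1", 1995, 90, "Disney"),
    ("M2", 1995, 120, "Fox"),
    ("M3", 1998, 100, "Disney"),
    ("M4", 1995, 85, "Disney"),
]
YEAR, STUDIO = 1, 3  # attribute positions in a tuple


def build_index(rows, attr):
    """Secondary index with indirect buckets: value -> bucket of tuple ids."""
    index = defaultdict(list)
    for rid, row in enumerate(rows):
        index[row[attr]].append(rid)
    return index


studio_index = build_index(movies, STUDIO)
year_index = build_index(movies, YEAR)


def titles(studio, year):
    # Intersect the two buckets in memory; only matching tuples are fetched.
    rids = set(studio_index.get(studio, [])) & set(year_index.get(year, []))
    return [movies[rid][0] for rid in sorted(rids)]


print(titles("Disney", 1995))  # ['M1', 'M4']
```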


2.4 Document Retrieval and Inverted Indexes
Relational view of documents
– A document is a tuple in a relation Doc.
  – There is an attribute for each possible word in a document.
  – Each attribute is boolean (e.g., hasCat, hasDog, …).
– There is a secondary index on each attribute of Doc.
  – No index entries are needed for FALSE values.
– Instead of building a separate index on each attribute, they are combined into a single index: an inverted index.


Inverted Index on Documents
[Figure: Inverted index entries cat and dog point to buckets, whose entries point to the documents containing the word, e.g. "…the cat is fat…", "…was raining cats and dogs…", "…Fido the dog…".]
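A minimal inverted index maps each word to a bucket of document ids, and an AND query intersects the buckets. This sketch assumes simple lowercase word tokenization with no stemming (so "cats" and "cat" are different words); the texts echo the figure and the ids d1 to d3 are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical documents echoing the figure.
docs = {
    "d1": "...the cat is fat...",
    "d2": "...was raining cats and dogs...",
    "d3": "...Fido the dog...",
}


def build_inverted_index(docs):
    """Map each word to a bucket (set) of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):  # no stemming: cats != cat
            index[word].add(doc_id)
    return index


def search_all(index, *words):
    """Documents containing every given word (intersection of buckets)."""
    buckets = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*buckets)) if buckets else []


index = build_inverted_index(docs)
print(search_all(index, "dog"))  # ['d3']
print(search_all(index, "the"))  # ['d1', 'd3']
```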


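Additional Info. in Buckets
[Figure: Bucket entries for the words cat and dog point to documents (d1, d2, d3) and also record the type of location where the word occurs (Title, Author, Abstract) and its position, e.g. Title 5, Title 100, Author 10, Abstract 57, Title 12.]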


3. B+-Trees
Properties
– Balanced tree (i.e., a dynamic multilevel index).
– Each node, except the root, is at least half full.
– The leaves form a dense, sequentially ordered index.
– The root has at least two children (unless it is itself a leaf).
– The keys in a node are ordered.


Node Formats
Non-leaf nodes
– $\langle P_0, (K_1, P_1), \ldots, (K_{n-1}, P_{n-1}) \rangle$, where $P_i$ is a pointer to a child node and $K_i$ is a search key.
Leaf nodes
– $\langle (K_0, P_0), (K_1, P_1), \ldots, (K_{n-2}, P_{n-2}), P_{n-1} \rangle$, where $P_i$ ($0 \le i \le n-2$) points to a data record and $P_{n-1}$ points to the next (sibling) leaf node.


B+-Tree Example (n = 3)
[Figure: Root: 100. Interior nodes: 30 (left) and 120, 150, 180 (right). Leaves: 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200.]


Sample Non-Leaf Node
[Figure: A non-leaf node with keys 57, 81, 95 and four child pointers: to keys $k < 57$, to keys $57 \le k < 81$, to keys $81 \le k < 95$, and to keys $k \ge 95$.]


Sample Leaf Node
[Figure: A leaf node, reached from a non-leaf node, with keys 57, 81, 95: each key has a pointer to the record with that key, and a last pointer leads to the next leaf in sequence.]


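Operations of B+-Tree
Record location (basic rule), for a search key L in a non-leaf node:
– If $K_i \le L < K_{i+1}$, then L is in the subtree pointed to by $P_i$.
– If $L < K_1$, then L is in the subtree pointed to by $P_0$.
– If L is greater than or equal to the last key $K_m$ in the node, then L is in the subtree pointed to by $P_m$.

LOOKUP: apply the record location rule at each node from the root down to a leaf, then search the leaf.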
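The record location rule and LOOKUP can be sketched as follows. The sketch hard-codes the example tree (n = 3) from the B+-Tree Example slide; record pointers are omitted, so leaves store only keys, and `Node`, `lookup`, and `range_query` are hypothetical names. `bisect_right(keys, L)` returns the number of keys ≤ L, which is exactly the index i of the pointer $P_i$ the rule selects.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted search keys
        self.children = children  # non-leaf: P_0..P_m; None for a leaf
        self.next_leaf = None     # leaf: pointer to the sibling leaf


def child_for(node, L):
    # K_i <= L < K_{i+1} -> P_i;  L < K_1 -> P_0;  L >= last key -> last pointer.
    return node.children[bisect_right(node.keys, L)]


def find_leaf(root, L):
    node = root
    while node.children is not None:
        node = child_for(node, L)
    return node


def lookup(root, L):
    leaf = find_leaf(root, L)
    j = bisect_left(leaf.keys, L)
    return j < len(leaf.keys) and leaf.keys[j] == L


def range_query(root, lo, hi):
    """Keys in [lo, hi]: locate lo, then follow the leaf chain."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next_leaf
    return out


# The example tree (n = 3); keys stand in for record pointers.
leaves = [Node(ks) for ks in ([3, 5, 11], [30, 35], [100, 101, 110],
                              [120, 130], [150, 156, 179], [180, 200])]
for a, b in zip(leaves, leaves[1:]):
    a.next_leaf = b
root = Node([100], [Node([30], leaves[:2]), Node([120, 150, 180], leaves[2:])])

print(lookup(root, 156), lookup(root, 40))  # True False
print(range_query(root, 30, 101))           # [30, 35, 100, 101]
```

The range query shows why leaves carry sibling pointers: after one root-to-leaf descent, the scan never goes back up the tree.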


Insertion into B+-Tree
INSERT (search key = K)
– Find the appropriate leaf node, say L1.
– Case 1: L1 is not full: insert K and STOP.
– Case 2: L1 is full (node splitting):
  – Create a new node, say L2.
  – Move the upper half of the keys of L1 to L2.
  – Insert the first search key of L2 into the parent, splitting recursively up the tree until no further split is needed (splitting the root creates a new root).
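A compact sketch of INSERT with node splitting follows. It assumes each node holds at most `max_keys` keys (3 here, matching the example figures rather than the pointer count in the node-format slide) and stores keys only. A leaf split copies the first key of the new node up to the parent; a non-leaf split moves its middle key up. Class and method names are hypothetical.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # non-leaf only: len(children) == len(keys) + 1
        self.next = None    # leaf only: sibling pointer


class BPlusTree:
    def __init__(self, max_keys=3):
        self.max_keys = max_keys  # assumed: at most max_keys keys per node
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:  # the root split: grow a new root
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.children = [sep], [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        if node.leaf:
            insort(node.keys, key)
        else:
            i = bisect_right(node.keys, key)  # record location rule
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            sep, right = split
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
        if len(node.keys) <= self.max_keys:
            return None               # Case 1: room in the node
        return self._split(node)      # Case 2: node splitting

    def _split(self, node):
        mid = (len(node.keys) + 1) // 2
        right = Node(leaf=node.leaf)
        if node.leaf:
            # Move the upper half to the new leaf; copy its first key upward.
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        # Non-leaf: the middle key moves up and stays in neither half.
        sep = node.keys[mid]
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return sep, right

    def scan(self):
        """All keys in order, by following the leaf chain."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.extend(node.keys)
            node = node.next
        return out


t = BPlusTree(max_keys=3)
for k in [3, 5, 11, 7]:
    t.insert(k)
print(t.root.keys, [c.keys for c in t.root.children])  # [7] [[3, 5], [7, 11]]
```

Inserting 7 into the full leaf 3, 5, 11 reproduces the split in the example: 3, 5 and 7, 11, with 7 pushed into the parent.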


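Example: Insert key = 32
[Figure: Insertions into the example B+-tree (n = 3). Inserting 32 puts it into leaf 30, 31 without a split. Inserting 7 splits leaf 3, 5, 11 into 3, 5 and 7, 11 and adds 7 to the parent. Inserting 160 splits leaf 150, 156, 179 into 150, 156 and 160, 179, and adding 160 to the full parent 120, 150, 180 splits it too. Inserting 45 into leaf 30, 32, 40 under root 10, 20, 30 splits the leaf and then the root, creating a new root 30.]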


Deletion from B+-Tree
DELETE (search key = K)
– Find the appropriate leaf node and remove K from it.
– Case 1: the node is still at least half full:
  – If K was the first key in the node, update the corresponding key in the ancestor, and STOP.
– Case 2: the node is less than half full:
  – Redistribute keys with a sibling, or coalesce with it.
  – Ripple the effect to the ancestors, if necessary.
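The leaf-level part of DELETE can be sketched for sibling leaves under one parent. The policy used here is an assumption that reproduces the two n = 4 figures: coalesce when the two siblings fit in one node, otherwise redistribute one key. Propagation to higher levels is left to the caller, and `delete_from_leaf` is a hypothetical helper.

```python
from bisect import bisect_right


def delete_from_leaf(parent_keys, leaves, key, max_keys=4):
    """Delete key from one of the sibling leaves under a single parent.

    parent_keys[i] separates leaves[i] and leaves[i + 1]; both lists are
    modified in place. Returns True if the parent lost a key (the caller
    would then check the parent for underflow and ripple upward).
    """
    min_keys = (max_keys + 1) // 2        # assumed half-full threshold
    i = bisect_right(parent_keys, key)    # leaf that must contain key
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= min_keys or len(leaves) == 1:
        if i > 0 and leaf:
            parent_keys[i - 1] = leaf[0]  # the first key may have changed
        return False
    j = i - 1 if i > 0 else i + 1         # prefer the left sibling
    lo, hi = min(i, j), max(i, j)
    left, right = leaves[lo], leaves[hi]
    if len(left) + len(right) <= max_keys:
        left.extend(right)                # coalesce into the left node
        del leaves[hi], parent_keys[lo]   # drop the right node and its separator
        return True
    if i == hi:
        right.insert(0, left.pop())       # borrow the largest key of the left sibling
    else:
        left.append(right.pop(0))         # borrow the smallest key of the right sibling
    parent_keys[lo] = right[0]            # new separator
    return False


pk, lv = [40], [[10, 20, 30, 35], [40, 50]]
delete_from_leaf(pk, lv, 50)
print(lv, pk)  # [[10, 20, 30], [35, 40]] [35]
```

Only the immediate parent's separator is maintained here; in a deeper tree a changed first key may also appear in a higher ancestor.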


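Coalesce with Sibling (n = 4)
[Figure: Delete 50. The leaf 40, 50 drops to a single key (40) and coalesces with its left sibling 10, 20, 30 into 10, 20, 30, 40; the separator 40 is removed from the parent.]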


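Redistribute Keys (n = 4)
[Figure: Delete 50. The leaf 40, 50 drops to a single key (40), so it borrows 35 from its left sibling 10, 20, 30, 35; the leaves become 10, 20, 30 and 35, 40, and the parent key 40 changes to 35.]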


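Non-leaf Coalesce
[Figure: Delete 37 from a tree with root 25 and leaves 1, 3 | 10, 14 | 20, 22 | 25, 26 | 30, 37 | 40, 45. The leaf left underfull coalesces with a sibling, the non-leaf node above it then becomes underfull and coalesces with its sibling (bringing down the root key 25), and the merged node becomes the new root.]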


B-Tree Index Files
[Figure: B-tree index on branch names. Each search key (Brighton, Clearview, Downtown, Mianus, Perryridge, Redwood, Round Hill) appears once, in a leaf or a non-leaf node, together with a pointer to its bucket of records.]


B-Tree Index Files (Cont'd)
Advantages
– Each search-key value appears only once.
– The pointer to the desired data may be found in a non-leaf node, before a leaf is reached.
Disadvantages
– Non-leaf nodes are larger than leaf nodes (they also hold record pointers), so their fan-out is lower.
– Deletion is more complicated.
– Sequential scanning is difficult, since keys are spread over all levels of the tree.


4. Hashing
Definition of hashing
– Key-to-address transformation: for each search key K, H(K) gives the bucket number.
– H: hash function; it should distribute keys uniformly over the buckets.
– Drawback: poor support for sequential processing (keys are not stored in order).

Types of hashing
– Static hashing
– Dynamic hashing


Concept of Hashing
[Figure: A key is mapped by h(key) to one of the buckets (typically one disk block each).]


Two Alternatives
[Figure: (1) h(key) leads to a bucket that holds the records themselves; (2) h(key) leads to an index bucket holding (key, pointer) entries that point to the records.]


4.1 Static Hashing
Characteristics of static hashing
– Fixed directory size.
– Overflow can occur.
– Depending on the directory size:
  – Too large: a sparse directory (wasted space).
  – Too small: increased search time (long overflow chains).

Overflow handling
– Linear probing
– Chaining


Hash Index
[Figure: Hash index on account number over an account file (account number, branch, balance; accounts A-101 … A-305 at the Brighton, Downtown, Mianus, Perryridge, Redwood, and Round Hill branches). Index buckets (Bucket 0, Bucket 1, …, Bucket 6) hold account numbers such as A-215, A-305, and A-222 with pointers to the records.]


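Example: 2 records/bucket
[Figure: Four buckets (0–3), two records each. INSERT a, b, c, d with h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0: bucket 0 holds d, bucket 1 holds a and c, bucket 2 holds b. Then h(e) = 1: bucket 1 is full, so e goes into an overflow block chained to bucket 1.]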


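Example: Deletion
[Figure: Deleting e and f from a hash table with overflow blocks (buckets 0–3); after a deletion frees space in a block, a record in an overflow block (g) may be moved up into it.]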
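Static hashing with chaining can be sketched as below, reproducing the 2-records/bucket example with h given as a lookup table. The consolidation step in `delete` (repacking a bucket's records toward its primary block) is one possible reading of "maybe move g up", not the only option; `StaticHashFile` is a hypothetical name.

```python
class StaticHashFile:
    """Static hashing with chaining: each bucket is a primary block plus
    linked overflow blocks, each block holding up to `capacity` records."""

    def __init__(self, num_buckets, capacity, h):
        self.capacity = capacity
        self.h = h  # hash function: key -> bucket number in [0, num_buckets)
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        chain.append([key])  # every block is full: add an overflow block

    def lookup(self, key):
        return any(key in block for block in self.buckets[self.h(key)])

    def delete(self, key):
        b = self.h(key)
        records = [k for block in self.buckets[b] for k in block if k != key]
        # Optional consolidation ("maybe move g up"): repack the remaining
        # records toward the primary block and drop emptied overflow blocks.
        self.buckets[b] = [records[j:j + self.capacity]
                           for j in range(0, len(records), self.capacity)] or [[]]


# The 2-records/bucket example, with h given as a table.
h_values = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
f = StaticHashFile(num_buckets=4, capacity=2, h=h_values.__getitem__)
for key in "abcde":
    f.insert(key)
print(f.buckets[1])  # [['a', 'c'], ['e']]: e went to an overflow block
f.delete("a")
print(f.buckets[1])  # [['c', 'e']]: e moved up, overflow block gone
```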


4.2 Dynamic Hashing
When the file grows, the options with static hashing are:
– Design the hash function for the current file size.
– Design the hash function for the expected file size.
– Recompute the hash function periodically.

Dynamic hashing
– Dynamic directory size.
– Controlled overflow/underflow.
– Techniques: extendable (extensible) hashing, linear hashing.


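Extensible Hashing
Basic concept
– The hash function outputs b bits; only the first i of them (the hash prefix) are used to index the bucket address table, and i grows over time.
[Figure: Bucket address table with entries 00, 01, 10, 11 indexed by the hash prefix of h(k) (e.g., 00110101); each bucket records its own local depth (i1, i2, i3), and several table entries may share one bucket.]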


Example

branch-name   h(branch-name)
Brighton      0010 1101 1111 1011 0010 1100 0011 0000
Clearview     1101 0101 1101 1110 0100 0110 1001 0011
Downtown      1010 0011 1010 0000 1100 0110 1001 1111
Mianus        1000 0111 1110 1101 1011 1111 0011 1010
Perryridge    1111 0001 0010 0100 1001 0011 0110 1101
Redwood       1011 0101 1010 0110 1100 1001 1110 1011
Round Hill    0101 1000 0011 1111 1001 1100 0000 0001


[Figure: Initial extensible hash structure (i = 0): one bucket address table entry pointing to a single empty bucket with local depth 0.]


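Example (Cont'd)
[Figure: Hash structure after 3 insertions (i = 1): prefix 0 → bucket {Round Hill} (local depth 1); prefix 1 → bucket {Perryridge, Downtown} (local depth 1).]

[Figure: Hash structure with i = 2: prefixes 00 and 01 → bucket {Round Hill} (local depth 1); 10 → bucket {Downtown, Redwood} (local depth 2); 11 → bucket {Perryridge} (local depth 2).]

[Figure: Hash structure with i = 3: prefixes 000–011 → bucket {Round Hill, Brighton} (local depth 1); 100 → bucket {Mianus} (local depth 3); 101 → bucket {Downtown, Redwood} (local depth 3); 110 and 111 → bucket {Perryridge, Clearview} (local depth 2).]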
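Extensible hashing on hash prefixes can be sketched as below, using the h(branch-name) values from the example table and two records per bucket. The insertion order Round Hill, Perryridge, Downtown, Redwood, Brighton, Mianus, Clearview is an assumption, chosen because it reproduces the structures in this example; class names are hypothetical.

```python
# h(branch-name) from the example table, as 32-bit strings.
H = {
    "Brighton":   "0010 1101 1111 1011 0010 1100 0011 0000",
    "Clearview":  "1101 0101 1101 1110 0100 0110 1001 0011",
    "Downtown":   "1010 0011 1010 0000 1100 0110 1001 1111",
    "Mianus":     "1000 0111 1110 1101 1011 1111 0011 1010",
    "Perryridge": "1111 0001 0010 0100 1001 0011 0110 1101",
    "Redwood":    "1011 0101 1010 0110 1100 1001 1110 1011",
    "Round Hill": "0101 1000 0011 1111 1001 1100 0000 0001",
}
H = {name: bits.replace(" ", "") for name, bits in H.items()}


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of prefix bits this bucket uses
        self.keys = []


class ExtensibleHash:
    def __init__(self, h, capacity=2):
        self.h = h                # key -> bit string
        self.capacity = capacity  # records per bucket (assumed 2)
        self.i = 0                # global depth: prefix length used by the table
        self.table = [Bucket(0)]  # bucket address table, 2**i entries

    def bucket_for(self, key):
        prefix = self.h(key)[:self.i]
        return self.table[int(prefix, 2) if prefix else 0]

    def insert(self, key):
        while True:
            bucket = self.bucket_for(key)
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)  # then retry (may split again)

    def _split(self, bucket):
        if bucket.local_depth == self.i:
            # Double the table: new entry j has old entry j >> 1 as its prefix.
            self.table = [self.table[j >> 1] for j in range(2 * len(self.table))]
            self.i += 1
        d = bucket.local_depth
        zero, one = Bucket(d + 1), Bucket(d + 1)
        for k in bucket.keys:  # bit d (0-based, from the left) decides
            (one if self.h(k)[d] == "1" else zero).keys.append(k)
        for j, b in enumerate(self.table):
            if b is bucket:  # re-point every entry that shared the old bucket
                bit = (j >> (self.i - d - 1)) & 1
                self.table[j] = one if bit else zero


eh = ExtensibleHash(H.__getitem__)
for name in ["Round Hill", "Perryridge", "Downtown", "Redwood",
             "Brighton", "Mianus", "Clearview"]:
    eh.insert(name)
print(eh.i)  # 3
```

Only the bucket that overflowed is split; doubling merely makes each old table entry appear twice. The sketch does not handle more than `capacity` keys with identical hash values.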


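Linear Hashing
Drawbacks of extensible hashing
– Doubling the bucket array takes considerable processing time.
– If the doubled bucket array no longer fits in memory, disk I/O increases.
– If records pile up in one particular bucket, the bucket array may have to grow even though other buckets have room.

Concept of linear hashing
– If the number of stored records (r) is too large relative to the number of buckets (n), increase the number of buckets by one.
– Overflow blocks are allowed; the ratio r / n is kept below a threshold.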


Linear Hashing (Cont'd)
How linear hashing works
– Use i bits from the right (low-order) end of h(K).
– Buckets are numbered $0, \ldots, n-1$, where $2^{i-1} < n \le 2^i$.
– Let the last i bits of h(K) be $m = (a_1 a_2 \ldots a_i)$.
  – If $m < n$, the record belongs in bucket m.
  – If $n \le m < 2^i$, the record belongs in bucket $(0 a_2 \ldots a_i)$, i.e. $m - 2^{i-1}$.
[Figure: Linear hash table with i = 1, n = 2, r = 3: bucket 0 holds 0000 and 1010; bucket 1 holds 1111.]


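Insertion into Linear Hash Table
Algorithm
– Look up the bucket B for the record. If B has room, store the record in B.
– If B is full, create an overflow block and store the record there.
– Increment r. If $r / n >$ threshold, set $n = n + 1$:
  – If $n > 2^i$, set $i = i + 1$.
  – The new bucket is number $n - 1$ (binary $1 a_2 \ldots a_i$); split bucket $(n - 1) - 2^{i-1}$ (binary $0 a_2 \ldots a_i$), moving into the new bucket the records whose last i bits equal $n - 1$.

Example (threshold = 1.7)
[Figure: Starting from i = 1, n = 2, r = 3 (bucket 0: 0000, 1010; bucket 1: 1111), insert 0101.]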


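Example - Insertion
[Figure: After inserting 0101: r / n = 4 / 2 = 2 > 1.7, so n becomes 3 and i becomes 2, and bucket 0 is split, giving i = 2, n = 3, r = 4 with buckets 00: 0000; 01: 0101, 1111; 10: 1010. Then insert 0001: r / n = 5 / 3 ≈ 1.67, so no split; bucket 01 now needs an overflow block (0001, 0101, 1111), giving i = 2, n = 3, r = 5. Next, insert 0111.]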


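Example - Insertion
[Figure: 0111 has last two bits 11 ≥ n = 3, so it goes to bucket 01; then r / n = 6 / 3 = 2 > 1.7, so n becomes 4 and bucket 01 is split. Result: i = 2, n = 4, r = 6 with buckets 00: 0000; 01: 0001, 0101; 10: 1010; 11: 0111, 1111.]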
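The lookup rule and insertion algorithm of linear hashing can be sketched as follows, assuming h(K) is the bit pattern itself, two records per block, and threshold 1.7 as in the example. A bucket is a plain list; records beyond `capacity` model its overflow block. Names are hypothetical.

```python
class LinearHashTable:
    """Linear hashing on the low-order bits of h(K); each bucket is a list
    whose records beyond `capacity` stand for overflow blocks."""

    def __init__(self, capacity=2, threshold=1.7):
        self.capacity = capacity    # records per block (assumed 2)
        self.threshold = threshold  # split when r / n exceeds this
        self.i, self.n, self.r = 1, 2, 0
        self.buckets = [[], []]

    def bucket_of(self, h):
        m = h & ((1 << self.i) - 1)  # last i bits of h(K)
        return m if m < self.n else m - (1 << (self.i - 1))  # (0 a2 ... ai)

    def insert(self, h):
        self.buckets[self.bucket_of(h)].append(h)  # may spill into overflow
        self.r += 1
        if self.r / self.n > self.threshold:
            self.n += 1
            if self.n > (1 << self.i):
                self.i += 1
            new = self.n - 1                 # binary 1 a2 ... ai
            old = new - (1 << (self.i - 1))  # binary 0 a2 ... ai
            mask = (1 << self.i) - 1
            moved = [x for x in self.buckets[old] if (x & mask) == new]
            self.buckets[old] = [x for x in self.buckets[old] if (x & mask) != new]
            self.buckets.append(moved)       # becomes bucket number `new`

    def has_overflow(self, b):
        return len(self.buckets[b]) > self.capacity


t = LinearHashTable()
for h in [0b0000, 0b1010, 0b1111]:  # reach the starting state i = 1, n = 2, r = 3
    t.insert(h)
for h in [0b0101, 0b0001, 0b0111]:
    t.insert(h)
print(t.i, t.n, t.r)  # 2 4 6
print([[format(x, "04b") for x in b] for b in t.buckets])
# [['0000'], ['0101', '0001'], ['1010'], ['1111', '0111']]
```

Replaying the example ends with the same four buckets as the last figure; only one bucket is split per increment of n, which is what keeps growth gradual.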


Comparison of B-Tree and Hashing
Performance depends on the query type.
– One-record lookup (point query)
  – SELECT A1, A2 FROM R WHERE A3 = 'c'
  – Hashing is good.
– Range searching (range query)
  – SELECT A1, A2 FROM R WHERE A3 >= 'c1' AND A3 <= 'c2'
  – Indexing (i.e., B+-tree) is good.
– Order-preserving hash function
  – If $K_1 < K_2$, then $H(K_1) < H(K_2)$.
  – Difficult to achieve.
