Date post: | 17-Jan-2016 |
Upload: | jemimah-little |
Chapter 4
Index Structures
Yeungnam University, Database Lab. Chapter 4 - 2
Table of Contents
1. Indexes on Sequential Files
2. Secondary Indexes
3. B-Trees
4. Hash Tables
Basic Concepts
Types of index
– Ordered indexing: Indexed Sequential Access Method
– Hashing: Relative Access Method
Performance factors
– Access type (point query, range query)
– Access, insertion, and deletion time
– Space overhead
Access Method
Definition
– Storage structure and search mechanism
Types
– Primary access method
  – Indexing on the primary key
  – Sequential, indexed sequential, hashing
– Secondary access method
  – Indexing on a secondary key
  – Multi-list file, inverted file
– Access methods for multi-key searching
1. Indexes on Sequential Files
– Sequential Files
– Dense Indexes
– Sparse Indexes
– Multiple Levels of Index
– Indexes with Duplicate Search Keys
– Managing Indexes During Data Modifications
1.1 Sequential Files
Definition
– Records are ordered by search key, so the blocks containing the records are ordered as well.
On insert
– Put the record in the appropriate block if there is room.
– Good idea:
  – Initialize blocks to be less than full (page fill factor).
  – Reorganize periodically if the file grows.
– If there is no room in the proper block:
  – Create a new block and insert it in the proper order if possible.
  – If that is not possible, create a (linked) overflow block.
Ordered Indexing
Used to build an indexed sequential file
– Sequential access
– Random access
Notation
– Primary index (clustering index): an index on the field that determines the sequential order of the file
– Secondary index
– Dense index
– Sparse index
– Multilevel index
1.2-3 Dense/Sparse Index
Dense index
– Pointer to every record of the file, ordered by search key.
– Can make sense because records may be much bigger than key-pointer pairs.
  – Fit the index in memory, even if the data file does not?
  – Faster search through the index than through the data file?
  – Test the existence of a record without going to the data file.
Sparse index
– Key-pointer pairs for only a subset of records, typically the first in each block.
– Saves index space.
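The difference between the two index types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file is modeled as in-memory lists with two records per block, and the names `blocks`, `dense_lookup`, and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Data file: blocks of records sorted by search key (assumed: two records per block).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number, slot) entry per record.
dense = [(key, b, s) for b, blk in enumerate(blocks) for s, key in enumerate(blk)]
dense_keys = [key for key, _, _ in dense]

# Sparse index: one entry per block, keyed by the first record in the block.
sparse_keys = [blk[0] for blk in blocks]

def dense_lookup(key):
    """Return (block, slot) or None; a miss is decided without reading a data block."""
    pos = bisect_right(dense_keys, key) - 1
    if pos >= 0 and dense_keys[pos] == key:
        return dense[pos][1], dense[pos][2]
    return None

def sparse_lookup(key):
    """Find the greatest index key <= key, then examine that one block."""
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None
    blk = blocks[pos]
    return (pos, blk.index(key)) if key in blk else None
```

Here the dense index has ten entries and the sparse index five, which is the space saving the slide refers to; the price is that the sparse lookup must always read a data block to answer.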
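Dense Index
[Figure: a dense index over a sequential file. The data blocks hold records (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); the index holds one key-pointer entry per record, in index blocks (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120).]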
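Sparse Index
[Figure: a sparse index over a sequential file. The data blocks hold records (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); the index holds one entry per data block, keyed by the first record in the block, in index blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230).]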
1.4 Multiple Levels of Index
– A sparse index on a (sparse or dense) index is an option.
– There is a good chance that 2nd or higher level indexes can be housed in main memory, so they cost no additional disk I/Os.
– Dense higher-level indexes make no sense:
  – dense(dense) = same dense index.
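A minimal sketch of a two-level index, assuming two records per data block and four entries per index block (both sizes, and the names `build_sparse`, `chunk`, and `lookup`, are illustrative choices rather than anything fixed by the slides):

```python
from bisect import bisect_right

def build_sparse(blocks):
    """One entry per block: the first key stored in that block."""
    return [blk[0] for blk in blocks]

def chunk(seq, size):
    """Cut a sequence into consecutive blocks of the given size."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

data_blocks = chunk(list(range(10, 250, 10)), 2)   # keys 10, 20, ..., 240
inner = chunk(build_sparse(data_blocks), 4)        # first-level sparse index, in index blocks
outer = build_sparse(inner)                        # second-level sparse index on the index blocks

def lookup(key):
    """Return True if key is stored; reads one inner index block and one data block."""
    o = bisect_right(outer, key) - 1
    if o < 0:
        return False
    i = bisect_right(inner[o], key) - 1            # i >= 0 because inner[o][0] == outer[o] <= key
    return key in data_blocks[o * 4 + i]
```

With these sizes the outer index has only three entries, so it is the level most likely to stay in main memory.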
Multiple Levels of Index
[Figure: a multilevel index. Labels: Index Block 0 and Index Block 1 (inner index), Data Block 0 and Data Block 1.]
1.5 Indexes with Duplicate Search Keys
Dense index
– Duplicate key-pointer pairs, or
– Pointers to only the first record with a given search key.
Sparse index
– Pointer to the first record of each block, or
– Pointer to the first new key on each block.
  – The sole key if all keys on the block are the same.
First Key Occurrences Only
[Figure: a dense index with one entry per distinct key (10, 20, 30, 40, 50, etc.), each pointing to the first record with that key. Data blocks: (10, 10), (10, 20), (20, 30), (30, 30), (40, 50).]
Sparse, First Key on Block
[Figure: a sparse index holding the first key of each block (10, 10, 20, 30, 40, etc.). Data blocks: (10, 10), (10, 20), (20, 30), (30, 30), (40, 50).]
Sparse, First New Key
[Figure: a sparse index holding the first new key of each block (10, 20, 30, 30, 40, etc.). Data blocks: (10, 10), (10, 20), (20, 30), (30, 30), (40, 50).]
Lookup
– Find the key in a dense index; in a sparse index, find the greatest key that does not exceed the search key.
– Follow the pointer.
  – Dense, no duplicates: just follow.
  – Dense, duplicates:
    – Follow each pointer (one pointer per record), or
    – Follow and look at successive records (pointer to the first record with the given key).
  – Sparse, no duplicates: follow to the block and examine the block.
  – Sparse, duplicates, key = lowest in block:
    – Follow to the block, look at that block and successive blocks until a higher key is met, and (if key = desired key) at the previous block.
  – Sparse, duplicates, key = lowest new in block:
    – Follow to the record, search the following records of the block, and successive blocks until a higher key is met.
1.6 Managing Indexes During Data Modifications
DB modifications
When we insert or delete on the data file, here are the primitive actions we might take:
1. Create or destroy an empty block in the sequence of blocks belonging to the sequential file.
2. Create or destroy an overflow block.
3. Insert a record into a block that has room.
4. Delete a record.
5. Slide a record to an adjacent block.
Effect of Primitive Actions on Index File
Action                                 Dense     Sparse
Create/destroy empty overflow block    none      none
Create empty seq. block                none      insert
Destroy empty seq. block               none      delete
Insert record                          insert    update(?)
Delete record                          delete    update(?)
Slide record                           update    update(?)
Example: Delete 30 with Dense Index
[Figure: after deleting 30, the data blocks are (10, 20), (40), (50, 60), (70, 80) and the dense index entries are 10, 20, 40, 50, 60, 70, 80.]
Example: Delete 30 with Sparse Index
[Figure: after deleting 30, the data blocks are (10, 20), (40), (50, 60), (70, 80). The second block now starts with 40, so the sparse index entries are 10, 40, 50, 70, followed by 90, 110, 130, 150.]
Example: Insert 15 with Sparse Index - Redistribute
[Figure: 15 goes into the first block and 20 slides to the second, giving data blocks (10, 15), (20, 40), (50, 60), (70, 80). The sparse index entries become 10, 20, 50, 70, followed by 90, 110, 130, 150.]
Use Overflow Block Instead
[Figure: the first block holds (10, 15) and 20 moves to an overflow block linked to it; the other data blocks stay (40), (50, 60), (70, 80). The sparse index entries remain 10, 40, 50, 70, followed by 90, 110, 130, 150.]
2. Secondary Indexes
Definition
– Primary index: the search key determines where a record is placed.
– Secondary index: there is no relationship between the search key and the record's address.
  – A sparse secondary index makes no sense?
  – Usually, the search key is not a "key".
Table of Contents
– Design of Secondary Indexes
– Application of Secondary Indexes
– Indirection in Secondary Indexes
– Document Retrieval and Inverted Indexes
2.1 Design of Secondary Index
[Figure: the data file is not sorted; its blocks hold (20, 40), (10, 20), (50, 30), (10, 50), (60, 20). The index file is sorted and dense: (10, 10, 20, 20), (20, 30, 40, 50), (50, 60).]
2.2 Application of Secondary Index
Clustered file structure
– Records of different types (e.g. EMPLOYEE, DEPT) are allowed in the same block.
– A secondary index on DEPT.dept_id is needed.
Advantages
– Records that are frequently accessed together should be in the same block.
– Reduces join overhead.
Disadvantages
– Queries over a single record type, e.g. Select * from EMP;
[Figure: a block holding records of mixed types: EMP e1, DEPT d1, DEPT d2.]
2.3 Indirect Buckets
Motivation
– To avoid repeating keys in the index, use a level of indirection, called buckets.
Additional advantage
– Sets of records can be intersected without retrieving the actual records.
Example
– Movies(title, year, length, studioName)
– Secondary indexes on studioName and year.
SELECT title FROM Movies
WHERE studioName = 'Disney' AND year = 1995;
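The query can be answered by intersecting bucket contents before any record is read. The sketch below assumes a tiny in-memory Movies relation; the rows, the placeholder titles, and the helper `build_buckets` are hypothetical and serve only to show the intersection step.

```python
from collections import defaultdict

# Hypothetical Movies(title, year, length, studioName) tuples; list position = record address.
movies = [
    ("Movie A", 1995, 80, "Disney"),
    ("Movie B", 1995, 120, "Studio X"),
    ("Movie C", 1994, 90, "Disney"),
    ("Movie D", 1995, 95, "Disney"),
]

def build_buckets(column):
    """Secondary index with indirection: key -> bucket (set) of record addresses."""
    buckets = defaultdict(set)
    for address, row in enumerate(movies):
        buckets[row[column]].add(address)
    return buckets

year_index = build_buckets(1)
studio_index = build_buckets(3)

# Intersect the pointers in main memory first; only matching records are fetched.
matches = studio_index["Disney"] & year_index[1995]
titles = [movies[address][0] for address in sorted(matches)]
```

Only the two records whose addresses survive the intersection are touched, which is the benefit the slide describes.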
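Saving Space by Indirect Buckets
[Figure: the index file holds each key once (10, 20, 30, 40, 50, 60, etc.). Each index entry points to a bucket of pointers, and the bucket entries point to the records with that key in the data file, whose blocks hold (20, 40), (10, 20), (50, 30), (10, 50), (60, 20).]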
Intersecting Buckets in Main Memory
[Figure: the studio index entry for Disney leads to the buckets for studio and the year index entry for 1995 leads to the buckets for year; both sets of buckets point into the Movie tuples.]
2.4 Document Retrieval and Inverted Indexes
Relational view of documents
– A document corresponds to a tuple in a relation Doc.
  – There is an attribute for each possible word in a document.
  – Each attribute is boolean (e.g. hasCat, hasDog, …).
– There is a secondary index on each of the attributes of Doc.
  – No index entries are needed for FALSE values of an attribute.
– Instead of building a separate index for each attribute, combine them into a single index in the form of an inverted index.
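A minimal sketch of such an inverted index, assuming three short documents with hypothetical ids `d1` to `d3` and whitespace-level tokenization with no stemming (so "cats" and "cat" are treated as different words here):

```python
import re
from collections import defaultdict

documents = {
    "d1": "the cat is fat",
    "d2": "was raining cats and dogs",
    "d3": "Fido the dog",
}

# One combined index: word -> bucket of (document id, word position) entries.
inverted = defaultdict(list)
for doc_id, text in documents.items():
    for position, word in enumerate(re.findall(r"[a-z]+", text.lower())):
        inverted[word].append((doc_id, position))

def docs_with(word):
    """Set of documents whose hasWord attribute would be TRUE."""
    return {doc_id for doc_id, _ in inverted.get(word, [])}
```

Words that never occur have no entry at all, which mirrors the point that FALSE values need no index entries. Storing the position alongside the document id is one example of extra information a bucket entry can carry.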
Inverted Index on Documents
[Figure: an inverted index with entries for "cat" and "dog". Each entry points to a bucket of pointers to the documents containing the word, e.g. "…the cat is fat…", "…was raining cats and dogs…", "…Fido the dog…".]
Additional Info. in Buckets
[Figure: the bucket entries for "cat" and "dog" carry a type and a position in addition to the document location, e.g. (Title, 5), (Title, 100), (Author, 10), (Abstract, 57), (Title, 12), with pointers to documents d1, d2, d3.]
3. B+-Trees
Properties
– Balanced tree (i.e., dynamic multilevel index).
– Each node, except the root node, is at least half full.
– The leaves form a dense, sequentially ordered index.
– The root has at least two children.
– The records in a node are ordered.
Node Formats
Non-leaf nodes
– < (P_0), (K_1, P_1), …, (K_{n-1}, P_{n-1}) >
– P_i: pointer to a child node
– K_i: search key
Leaf nodes
– < (K_0, P_0), (K_1, P_1), …, (K_{n-2}, P_{n-2}), (P_{n-1}) >
– P_i (0 ≤ i ≤ n-2): pointer to the data
– P_{n-1}: pointer to the sibling leaf node
B+-Tree Example (n = 3)
[Figure: root (100). Second level: (30) and (120, 150, 180). Leaves: (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]
Sample Non-Leaf Node
[Figure: a non-leaf node with keys 57, 81, 95 and four pointers, leading to keys k < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, and to keys 95 ≤ k.]
Sample Leaf Node
[Figure: a leaf node, reached from a non-leaf node, with keys 57, 81, 95. Each key's pointer leads to the record with that key, and the last pointer leads to the next leaf in sequence.]
Operations of B+-Tree
Record location (basic rule)
– If K_i ≤ L < K_{i+1}, then L is in a descendant node in the subtree pointed to by P_i.
– If L < K_1, then L is in a descendant node in the subtree pointed to by P_0.
LOOKUP
– Apply the record location rule at each level, from the root down to a leaf.
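The record location rule maps directly onto a binary search inside each node. The sketch below assumes an in-memory tree built from the n = 3 example (root 100, second level 30 and 120/150/180); the `Node` class and the function names are hypothetical.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys              # K_1..K_m in a non-leaf node, the stored keys in a leaf
        self.children = children      # P_0..P_m in a non-leaf node, None in a leaf

leaves = [Node(k) for k in ([3, 5, 11], [30, 35], [100, 101, 110],
                            [120, 130], [150, 156, 179], [180, 200])]
root = Node([100], [Node([30], leaves[:2]), Node([120, 150, 180], leaves[2:])])

def find_leaf(node, key):
    """Descend from the root to the leaf that may hold key."""
    while node.children is not None:
        # bisect_right counts the keys <= key, which is the index i of the
        # pointer P_i with K_i <= key < K_{i+1} (P_0 when key < K_1).
        node = node.children[bisect_right(node.keys, key)]
    return node

def contains(key):
    return key in find_leaf(root, key).keys
```

For example, searching for 156 goes right at the root (156 ≥ 100), takes P_2 in the node (120, 150, 180), and ends in the leaf (150, 156, 179).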
Insertion into B+-Tree
INSERT (search key = K)
– Find the appropriate leaf node, say L1.
– Case 1: L1 is not full: insert and STOP.
– Case 2: L1 is full (node splitting):
  – Create a new node, say L2.
  – Move the last half of L1 ∪ {K} to L2.
  – Insert the first search key of L2 into the parent, recursively, until no more node splitting is needed.
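The leaf-level step of the insertion can be sketched as follows. It is a simplified illustration: a leaf is modeled as a sorted Python list, the capacity `MAX_KEYS = 3` is an assumption chosen to match the n = 3 examples, and the recursive insertion into the parent is left to the caller.

```python
from bisect import insort

MAX_KEYS = 3   # assumed leaf capacity, matching the n = 3 examples

def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf (a list), splitting it if it overflows.

    Returns None if the leaf had room, or (separator, new_leaf) after a split.
    The caller would then insert the separator into the parent, splitting the
    parent in the same way if it is full as well.
    """
    insort(leaf, key)
    if len(leaf) <= MAX_KEYS:
        return None                 # Case 1: the leaf was not full
    mid = len(leaf) // 2            # Case 2: move the last half to a new node
    new_leaf = leaf[mid:]
    del leaf[mid:]
    return new_leaf[0], new_leaf
```

Inserting 32 into the leaf (30, 31) returns None, while inserting 7 into the full leaf (3, 5, 11) leaves (3, 5) behind and returns the separator 7 with the new leaf (7, 11).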
Example: Insert key = 32
[Figure: the leaves (3, 5, 11) and (30, 31) sit under a parent with key 30, next to the subtree for 100. The leaf (30, 31) has room, so inserting 32 gives (30, 31, 32) and no split is needed.]
Example: Insert key = 7
[Figure: the leaf (3, 5, 11) is full, so inserting 7 splits it into (3, 5) and (7, 11); key 7 is added to the parent, next to 30.]
Example: Insert key = 160
[Figure: the leaf (150, 156, 179) is full, so inserting 160 splits it into (150, 156) and (160, 179). The parent (120, 150, 180) is full as well, so the split propagates to it and key 160 moves up toward the root (100).]
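Example: Insert key = 45
[Figure: the root (10, 20, 30) has leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). Inserting 45 splits the last leaf into (30, 32) and (40, 45); key 40 does not fit in the full root, so the root splits too and a new root holding 30 is created.]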
Deletion from B+-Tree
DELETE (search key = K)
– Find the appropriate leaf node.
– Remove K from the node.
– Case 1: the node is still at least half full.
  – If K was the first key of the node, handle the key that refers to it in the ancestors, and STOP.
– Case 2: the node is now less than half full.
  – Redistribute keys with a sibling, or coalesce with it.
  – Ripple the effect to the ancestors, if necessary.
Coalesce with Sibling (n = 4)
[Figure: parent (10, 40, 100) with leaves (10, 20, 30) and (40, 50). Deleting 50 leaves (40) less than half full, so 40 is moved into the sibling (10, 20, 30) and key 40 is removed from the parent.]
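Redistribute Keys (n = 4)
[Figure: parent (10, 40, 100) with leaves (10, 20, 30, 35) and (40, 50). Deleting 50 leaves (40) less than half full, so 35 is moved over from the sibling, giving leaves (10, 20, 30) and (35, 40); key 40 in the parent is replaced by 35.]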
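Non-leaf Coalesce
[Figure: root (25) over non-leaf nodes (10, 20) and (30, 40), with leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 leaves its leaf less than half full, so leaves are coalesced; the parent then becomes less than half full and is coalesced with its sibling, and the merged node becomes the new root.]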
B-Tree Index Files
[Figure: a B-tree on branch names. The root holds Downtown and Redwood; the leaves hold (Brighton, Clearview), (Mianus, Perryridge), and (Round Hill). Every key, in non-leaf and leaf nodes alike, has a pointer to its own bucket (Brighton bucket, Clearview bucket, Downtown bucket, Mianus bucket, Perryridge bucket, Redwood bucket, Round Hill bucket).]
B-Tree Index Files (Cont'd)
Advantages
– Search-key values appear only once.
– The pointers to the desired data can be found in non-leaf nodes.
Disadvantages
– Non-leaf nodes are larger than leaf nodes.
– Deletion is more complicated.
– Sequential searching is difficult.
4. Hashing
Definition of hashing
– Key-to-address transformation
– For each search key K, H(K) gives the bucket number.
– H: hash function
– Uniformity
– Problem in sequential processing
Types of hashing
– Static hashing
– Dynamic hashing
Concept of Hashing
[Figure: a key is mapped by h(key) to one of the buckets (typically 1 disk block each), where <key> is stored.]
Two Alternatives
[Figure: (1) key → h(key) leads directly to the bucket that holds the records. (2) key → h(key) leads to an index entry, which points to the record (e.g. the record with key 1).]
4.1 Static Hashing
Characteristics of static hashing
– Fixed directory size
– Overflow can occur
– Depending on the directory size:
  – Sparse directory
  – Increased search time
Overflow handling
– Linear probing
– Chaining
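Overflow handling by chaining can be sketched as below, assuming a fixed directory of four buckets with two records per block. The bucket number is passed in explicitly instead of being computed by a hash function, so the trace is deterministic; `Block`, `insert`, and `lookup` are hypothetical names.

```python
BUCKET_CAPACITY = 2          # assumed: two records per block
NUM_BUCKETS = 4              # fixed directory size

class Block:
    def __init__(self):
        self.records = []
        self.overflow = None     # next block in the overflow chain, if any

directory = [Block() for _ in range(NUM_BUCKETS)]

def insert(key, h):
    """Store key in bucket h = H(key), chaining an overflow block when full."""
    block = directory[h]
    while len(block.records) >= BUCKET_CAPACITY:
        if block.overflow is None:
            block.overflow = Block()
        block = block.overflow
    block.records.append(key)

def lookup(key, h):
    """Scan the primary block of bucket h and then its overflow chain."""
    block = directory[h]
    while block is not None:
        if key in block.records:
            return True
        block = block.overflow
    return False

for key, h in [("a", 1), ("b", 2), ("c", 1), ("d", 0), ("e", 1)]:
    insert(key, h)
```

After these five inserts bucket 1 holds a and c in its primary block and e in an overflow block, so a lookup of e costs two block reads instead of one. That extra cost is the increased search time a too-small directory causes.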
Hash Index
[Figure: a hash index on account number over the account file below. Bucket 0 is empty, bucket 1 holds A-215 and A-305, and bucket 6 holds A-222; each index entry points to its record.]
A-217    Brighton      750
A-101    Downtown      500
A-110    Downtown      600
A-215    Mianus        700
A-102    Perryridge    400
A-201    Perryridge    900
A-218    Perryridge    700
A-222    Redwood       700
A-305    Round Hill    350
Example: 2 records/bucket
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
[Figure: buckets 0 to 3 hold (d), (a, c), (b), and nothing. Inserting e with h(e) = 1 finds bucket 1 full, so e goes into an overflow block linked to bucket 1.]
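Example: Deletion
[Figure: buckets 0 to 3, with an overflow block, hold records a through g. Records such as e and f are deleted; when a deletion frees a slot in a primary block, a record from its overflow block (here "g") may be moved up.]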
4.2 Dynamic Hashing
Static hashing when the file grows
– Design the hash function for the current file size.
– Design the hash function for the expected file size.
– Recompute the hash function periodically.
Dynamic hashing
– Dynamic directory size
– Controlled overflow/underflow
– Extensible hashing
– Linear hashing
Extensible Hashing
Basic idea
– Use i of the b bits output by the hash function; i grows over time.
[Figure: h(k) yields a b-bit value such as 00110101, of which the leading i bits (the hash prefix) are used. The bucket address table has entries 00, 01, 10, 11 pointing to buckets, and each bucket records its own prefix length (i1, i2, i3).]
Example
branch-name    h(branch-name)
Brighton       0010 1101 1111 1011 0010 1100 0011 0000
Clearview      1101 0101 1101 1110 0100 0110 1001 0011
Downtown       1010 0011 1010 0000 1100 0110 1001 1111
Mianus         1000 0111 1110 1101 1011 1111 0011 1010
Perryridge     1111 0001 0010 0100 1001 0011 0110 1101
Redwood        1011 0101 1010 0110 1100 1001 1110 1011
Round Hill     0101 1000 0011 1111 1001 1100 0000 0001
Initial Extensible Hash Structure (i = 0)
[Figure: hash prefix 0; the bucket address table has a single entry, pointing to a single bucket whose own prefix length is 0.]
Example (Cont'd)
Hash Structure after 3 Insertions (i = 1)
[Figure: hash prefix 1. Table entry 0 points to a bucket holding Round Hill; table entry 1 points to a bucket holding Perryridge and Downtown. Both buckets have prefix length 1.]
Example (Cont'd)
Hash Structure after 4 Insertions (i = 2)
[Figure: hash prefix 2, table entries 00, 01, 10, 11. Entries 00 and 01 share the bucket holding Round Hill (prefix length 1); entry 10 points to the bucket holding Downtown and Redwood (prefix length 2); entry 11 points to the bucket holding Perryridge (prefix length 2).]
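Example (Cont'd)
Hash Structure after 7 Insertions (i = 3)
[Figure: hash prefix 3, table entries 000 to 111. Entries 000 to 011 share the bucket holding Round Hill and Brighton (prefix length 1); entry 100 points to the bucket holding Mianus (prefix length 3); entry 101 points to the bucket holding Downtown and Redwood (prefix length 3); entries 110 and 111 share the bucket holding Perryridge and Clearview (prefix length 2).]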
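The growth of the bucket address table can be traced with a small sketch. It assumes two records per bucket, distinct 32-bit hash values taken from the branch-name table of the example, and one particular insertion order; the classes `Bucket` and `ExtensibleHash` are hypothetical, and the sketch does not handle more than `capacity` keys sharing all 32 bits.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # prefix length this bucket is responsible for
        self.items = {}      # hash value -> key (assumes distinct hash values)

class ExtensibleHash:
    def __init__(self, capacity=2, bits=32):
        self.capacity = capacity
        self.bits = bits                 # b: number of bits output by the hash function
        self.i = 0                       # hash prefix length in use
        self.table = [Bucket(0)]         # bucket address table with 2**i entries

    def _slot(self, h):
        return h >> (self.bits - self.i)     # leading i bits of h

    def insert(self, h, key):
        while True:
            bucket = self.table[self._slot(h)]
            if h in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[h] = key
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.i:
            # Double the table: every old entry is duplicated, and i grows by one.
            self.table = [b for b in self.table for _ in (0, 1)]
            self.i += 1
        bucket.depth += 1
        new = Bucket(bucket.depth)
        shift = self.i - bucket.depth
        for slot, b in enumerate(self.table):
            # Entries whose next prefix bit is 1 are redirected to the new bucket.
            if b is bucket and (slot >> shift) & 1:
                self.table[slot] = new
        old_items, bucket.items = bucket.items, {}
        for h, key in old_items.items():
            self.table[self._slot(h)].items[h] = key

hashes = {
    "Brighton":   "0010 1101 1111 1011 0010 1100 0011 0000",
    "Clearview":  "1101 0101 1101 1110 0100 0110 1001 0011",
    "Downtown":   "1010 0011 1010 0000 1100 0110 1001 1111",
    "Mianus":     "1000 0111 1110 1101 1011 1111 0011 1010",
    "Perryridge": "1111 0001 0010 0100 1001 0011 0110 1101",
    "Redwood":    "1011 0101 1010 0110 1100 1001 1110 1011",
    "Round Hill": "0101 1000 0011 1111 1001 1100 0000 0001",
}

eh = ExtensibleHash(capacity=2, bits=32)
# Assumed insertion order; the slides do not state it explicitly.
for name in ["Perryridge", "Downtown", "Round Hill", "Redwood",
             "Brighton", "Mianus", "Clearview"]:
    eh.insert(int(hashes[name].replace(" ", ""), 2), name)
```

With this order the table doubles on the third, fourth, and sixth insertion, and the final structure has i = 3 with Mianus alone under prefix 100 and Downtown and Redwood under 101.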
Linear Hashing
Drawbacks of extensible hashing
– Processing time increases whenever the bucket array doubles in size.
– If the doubled bucket array no longer fits in memory, disk I/O increases.
– If records pile up in one particular bucket, the bucket array grows.
Concept of linear hashing
– When the number of stored records (r) is too large relative to the total number of buckets (n), add buckets one at a time.
– Overflow can occur (r / n is kept below a threshold).
Linear Hashing (Cont'd)
How linear hashing works
– Use i bits from the right (low-order) end of h(K).
– Buckets are numbered [0..n-1], where 2^(i-1) < n ≤ 2^i.
– Let the last i bits of h(K) be m = (a_1 a_2 … a_i).
  – If m < n, then the record belongs in bucket m.
  – If n ≤ m < 2^i, then the record belongs in bucket (0 a_2 … a_i).
[Figure: i = 1, n = 2, r = 3. Bucket 0 holds 0000 and 1010; bucket 1 holds 1111.]
Insertion into Linear Hash Table
Algorithm
– If the bucket B found by lookup has room, store the record in B.
– If B is full, create an overflow bucket and store the record there.
– If (++r / n > threshold), set n = n + 1.
  – If n > 2^i, set i = i + 1.
  – Split bucket (n - 1) - 2^(i-1) and move the records that now belong in the new bucket n - 1 into it.
Example (Threshold = 1.7)
[Figure: i = 1, n = 2, r = 3. Bucket 0 holds 0000 and 1010; bucket 1 holds 1111. Insert 0101.]
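Example - Insertion
[Figure: after inserting 0101: i = 2, n = 3, r = 4. Bucket 00 holds 0000; bucket 01 holds 0101 and 1111; bucket 10 holds 1010. Insert 0001.]
[Figure: after inserting 0001: i = 2, n = 3, r = 5. Bucket 00 holds 0000; bucket 01 holds 0001 and 0101, with 1111 in an overflow block; bucket 10 holds 1010. Insert 0111.]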
Example - Insertion
[Figure: after inserting 0111: i = 2, n = 4, r = 6. Bucket 00 holds 0000; bucket 01 holds 0001 and 0101; bucket 10 holds 1010; bucket 11 holds 0111 and 1111.]
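The insertion example can be replayed with a short sketch. It makes two simplifying assumptions: hash values are given directly as small integers, and each bucket together with its overflow chain is modeled as one Python list, so overflow blocks are not represented separately. The class name `LinearHash` and its method names are hypothetical.

```python
class LinearHash:
    def __init__(self, threshold=1.7):
        self.i = 1                   # number of low-order hash bits in use
        self.n = 2                   # current number of buckets
        self.r = 0                   # number of stored records
        self.threshold = threshold
        self.buckets = [[], []]      # bucket plus overflow chain, as one list each

    def bucket_of(self, h):
        m = h & ((1 << self.i) - 1)          # last i bits of h(K)
        if m >= self.n:                      # bucket m does not exist yet
            m -= 1 << (self.i - 1)           # clear the leading bit: (0 a2 ... ai)
        return m

    def insert(self, h):
        self.buckets[self.bucket_of(h)].append(h)
        self.r += 1
        if self.r / self.n > self.threshold:
            self._grow()

    def _grow(self):
        self.n += 1
        if self.n > (1 << self.i):
            self.i += 1
        new = self.n - 1                     # number of the bucket being added
        old = new - (1 << (self.i - 1))      # the bucket that is split
        mask = (1 << self.i) - 1
        self.buckets.append([h for h in self.buckets[old] if h & mask == new])
        self.buckets[old] = [h for h in self.buckets[old] if h & mask != new]

lh = LinearHash(threshold=1.7)
for h in (0b0000, 0b1010, 0b1111):   # starting state: i = 1, n = 2, r = 3
    lh.insert(h)
for h in (0b0101, 0b0001, 0b0111):   # the three insertions of the example
    lh.insert(h)
```

Inserting 0101 raises r / n to 2 and triggers the first split (bucket 00 into 00 and 10); inserting 0001 keeps r / n at about 1.67, so nothing is split; inserting 0111 triggers the split of bucket 01 into 01 and 11.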
Comparison of B-Tree and Hashing
Performance depends on query type
– One-record lookup (point query)
  – Select A1, A2 from R where A3 = 'c'
  – Hashing is good.
– Range searching (range query)
  – Select A1, A2 from R where A3 >= 'c1' and A3 <= 'c2'
  – Indexing (i.e., B+-tree) is good.
Order-preserving hash function
– If K1 < K2, then H(K1) < H(K2).
– Difficult to achieve.
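The contrast can be illustrated with Python's built-in structures, using a `dict` as a stand-in for a hash index and a sorted list searched with `bisect` as a stand-in for the ordered leaves of a B+-tree. The rows and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (A3, payload) pairs, in arbitrary storage order.
rows = [(k, f"row-{k}") for k in (40, 10, 30, 20, 50)]

hash_index = {k: payload for k, payload in rows}   # unordered: suits equality tests
ordered_index = sorted(rows)                       # ordered by key, like B+-tree leaves
ordered_keys = [k for k, _ in ordered_index]

def point_query(c):
    """A3 = c: a single probe of the hash index."""
    return hash_index.get(c)

def range_query(c1, c2):
    """c1 <= A3 <= c2: locate the first qualifying key, then read sequentially."""
    lo = bisect_left(ordered_keys, c1)
    hi = bisect_right(ordered_keys, c2)
    return [payload for _, payload in ordered_index[lo:hi]]
```

The hash index cannot serve `range_query` without examining every key, because neighbouring keys land in unrelated buckets unless the hash function preserves order.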