CSCE 520 Test 2 Info Indexing


Modified from slides of Hector Garcia-Molina and Jeff Ullman

2

Physical Storage Media

• Speed of data access

• Cost per unit of data

• Reliability
– Data loss (power failure or system crash)
– Physical failure (storage device)

• Storage types
– Volatile storage
– Non-volatile storage

3

Memory Hierarchy

[Figure: memory hierarchy. Levels shown: cache, main memory, disk (with virtual memory and the file system), tertiary storage. Users shown: DBMS; programs and main-memory DBMS.]

4

Disk Access Characteristics

• Move data to main memory:
– Position head on cylinder
– Find and access sector

• Steps of reading a block:
– Processor and disk controller process the request
– Seek time: position the head
– Rotational latency: rotate the sector under the head
– Transfer time: sector/block read by the head
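The steps of reading a block add up to a simple cost estimate. The sketch below is only an illustration: the function name and the drive parameters (seek time, rpm, transfer rate, block size) are hypothetical values, not figures from the slides, and it assumes the target sector is on average half a revolution away.

```python
def block_read_time_ms(seek_ms, rpm, transfer_mb_per_s, block_kb):
    """Estimate one random block read: seek + rotational latency + transfer."""
    # One full revolution, in milliseconds.
    rotation_ms = 60_000 / rpm
    # Assumption: on average the sector is half a revolution away.
    avg_rotational_latency_ms = rotation_ms / 2
    # Time for the block itself to pass under the head.
    transfer_ms = (block_kb / 1024) / transfer_mb_per_s * 1000
    return seek_ms + avg_rotational_latency_ms + transfer_ms


# Hypothetical drive: 9 ms average seek, 7200 rpm, 100 MB/s, 4 KB blocks.
t = block_read_time_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=100.0, block_kb=4.0)
print(f"{t:.2f} ms per random block read")
```

With numbers like these, seek time and rotational latency dominate and the transfer itself is a small fraction, which is why reducing the number of block accesses matters more than the amount of data per block.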

5

Disk Access Characteristics

• Steps of writing a block:
– Read the block into main memory
– Change the main memory copy of the block
– Write the new content back to disk
– Verify correctness of the write

6

How to find records efficiently?

• Primary key – sequential organization

• Search key?

• High I/O cost

INDEXING

Cost of Indexing

• Where is the time spent when answering a query?

• Fast: processing in memory

• Slow: fetching from secondary storage

• Cost of indexing:
– Index on several attributes: fast retrieval but slow writes (the index structure must be maintained)

7

8

Topics

• Conventional indexes

• B-trees

• Hashing schemes (read only)

9

Sequential File

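[Figure: sequential file, two records per block, keys in sorted order. Blocks: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]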

10

Sequential File

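Dense Index

[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100). The dense index has one entry per record; index blocks: (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120).]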
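A dense index lookup can be sketched as a binary search over the index keys. This is a minimal illustration, not the slides' implementation: the in-memory lists stand in for disk blocks, and `dense_lookup` and the `(block, slot)` pointer format are assumed names.

```python
from bisect import bisect_left

# Data file as drawn on the slide: two records per block, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number, slot) entry for every record.
dense_index = [(key, b, s) for b, blk in enumerate(blocks) for s, key in enumerate(blk)]
index_keys = [entry[0] for entry in dense_index]


def dense_lookup(key):
    """Return (block, slot) for key, or None; the data file is never read."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        _, b, s = dense_index[i]
        return (b, s)
    return None


print(dense_lookup(40))
print(dense_lookup(45))
```

Because every key appears in the index, a miss is detected from the index alone.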

11

Sequential File

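Sparse Index

[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100). The sparse index has one entry per data block (its first key); index blocks: (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230).]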
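With a sparse index, a lookup finds the last index key that is not greater than the search key and then has to read that data block. The sketch below assumes in-memory lists in place of disk blocks; `sparse_lookup` is a hypothetical name.

```python
from bisect import bisect_right

# Data file as drawn on the slide: two records per block, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: only the first key of each block.
sparse_keys = [blk[0] for blk in blocks]


def sparse_lookup(key):
    """Return (block, slot) for key, or None; requires reading one data block."""
    # Last index entry whose key is <= the search key.
    i = bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None  # smaller than every key in the file
    # The index cannot tell whether the record exists; the block must be scanned.
    if key in blocks[i]:
        return (i, blocks[i].index(key))
    return None


print(sparse_lookup(40))
print(sparse_lookup(45))
```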

12

Sequential File

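Sparse 2nd level

[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100). First-level sparse index blocks: (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230). Second-level sparse index blocks: (10, 90, 170, 250), (330, 410, 490, 570).]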
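A two-level lookup applies the same "last key not greater than the search key" rule twice. The sketch uses the first-level index blocks from the figure; `find_data_block` and the assumption that data blocks are numbered in index order are illustrative choices, not taken from the slides.

```python
from bisect import bisect_right

# First-level sparse index blocks from the figure (four keys per index block).
level1 = [[10, 30, 50, 70], [90, 110, 130, 150], [170, 190, 210, 230]]

# Second level: the first key of each first-level index block.
level2 = [blk[0] for blk in level1]  # [10, 90, 170]


def find_data_block(key):
    """Return the number of the data block that may hold key, or None."""
    # Step 1: pick the first-level index block via the second level.
    i = bisect_right(level2, key) - 1
    if i < 0:
        return None
    # Step 2: pick the entry inside that index block.
    j = bisect_right(level1[i], key) - 1
    # Assumption: data blocks are numbered in the order of their index entries.
    return i * len(level1[0]) + j


print(find_data_block(60))
print(find_data_block(100))
```

Only one second-level block and one first-level block are touched per lookup, regardless of how many first-level blocks exist.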

13

Sparse vs. Dense Tradeoff

• Sparse: less index space per record, so more of the index can be kept in memory

• Dense: can tell whether a record exists without accessing the file
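The space side of this tradeoff is easy to quantify. The numbers below (one million records, 10 records per data block, 100 index entries per index block) are hypothetical and chosen only to show the ratio; `index_blocks` is an assumed helper name.

```python
import math


def index_blocks(num_records, records_per_block, entries_per_index_block, dense):
    """Number of index blocks needed for a dense or a sparse index."""
    data_blocks = math.ceil(num_records / records_per_block)
    # Dense: one entry per record. Sparse: one entry per data block.
    entries = num_records if dense else data_blocks
    return math.ceil(entries / entries_per_index_block)


dense_size = index_blocks(1_000_000, 10, 100, dense=True)
sparse_size = index_blocks(1_000_000, 10, 100, dense=False)
print(dense_size, sparse_size)
```

Under these assumptions the sparse index is smaller by a factor equal to the number of records per data block.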

14

Terms

• Index sequential file

• Search key (not necessarily the primary key)

• Primary index (on sequencing field)

• Secondary index

• Dense index (all search key values in the index)

• Sparse index

• Multi-level index

15

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

16

Duplicate keys

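[Figure: sequential file with duplicate keys, two records per block. Blocks: (10, 10), (10, 20), (20, 30), (30, 30), (40, 45).]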

17

Duplicate keys

Dense index, one way to implement?

[Figure: file blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45). The dense index repeats every key, one entry per record; index blocks: (10, 10, 10, 20), (20, 30, 30, 30).]

18

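Duplicate keys

Dense index, better way?

[Figure: file blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45). The dense index holds each distinct key once; index block: (10, 20, 30, 40).]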

19

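Duplicate keys

Sparse index, one way?

[Figure: file blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45). The sparse index holds the first key of each block; index block: (10, 10, 20, 30).]

Careful if looking for 20 or 30!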

20

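Duplicate keys

Sparse index, another way?

– place first new key from block

[Figure: file blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45). Index block: (10, 20, 30, 30). Annotation on the last index entry: should this be 40?]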

21

Summary

Duplicate values, primary index

• Index may point to first instance of each value only

[Figure: a file holding values a, a, a, b and an index whose entries point to the first a and to b.]
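An index that points only to the first instance of each value still finds all duplicates, because the file is sorted and the scan can continue forward from the first match. The sketch below uses the duplicate-key file from the earlier slides; `find_all` and the list-based record layout are assumptions made for illustration.

```python
from bisect import bisect_left

# File with duplicate keys, two records per block, sorted by key.
blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]

# Flatten to (block, slot, key) in file order.
records = [(b, s, key) for b, blk in enumerate(blocks) for s, key in enumerate(blk)]

# One index entry per distinct key, pointing to its first occurrence.
first_pos = {}
for pos, (_, _, key) in enumerate(records):
    first_pos.setdefault(key, pos)
index = sorted(first_pos.items())
index_keys = [k for k, _ in index]


def find_all(key):
    """Return the (block, slot) of every record with this key."""
    i = bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return []
    pos = index[i][1]
    found = []
    # Scan forward from the first instance while the key still matches.
    while pos < len(records) and records[pos][2] == key:
        found.append(records[pos][:2])
        pos += 1
    return found


print(find_all(20))
print(find_all(30))
```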

22

Deletion from sparse index

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80). Sparse index blocks: (10, 30, 50, 70), (90, 110, 130, 150).]

23


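Deletion from sparse index

– delete record 40

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80). Sparse index blocks: (10, 30, 50, 70), (90, 110, 130, 150).]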

24


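Deletion from sparse index

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80). Sparse index blocks: (10, 30, 50, 70), (90, 110, 130, 150). Annotations on the figure: 40, 40.]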

25


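Deletion from sparse index

– delete records 30 & 40

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80). Sparse index blocks: (10, 30, 50, 70), (90, 110, 130, 150). Annotations on the index: 50, 70.]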
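The deletion cases for a sparse index can be sketched as follows. This is one possible policy, stated as an assumption: an emptied block is dropped together with its index entry, and an index entry is refreshed whenever the first key of its block changes. `delete` is a hypothetical name and the lists stand in for disk blocks.

```python
from bisect import bisect_right

# File and sparse index as drawn on the slides.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
sparse_index = [blk[0] for blk in blocks]  # [10, 30, 50, 70]


def delete(key):
    """Delete the record with this key; return True if it existed."""
    i = bisect_right(sparse_index, key) - 1
    if i < 0 or key not in blocks[i]:
        return False
    blocks[i].remove(key)
    if blocks[i]:
        # The first key of the block may have changed; keep the index in step.
        sparse_index[i] = blocks[i][0]
    else:
        # Block is empty: drop it and its index entry (later entries shift up).
        del blocks[i]
        del sparse_index[i]
    return True


delete(40)
print(sparse_index)  # index untouched: 40 was not the first key of its block
delete(30)
print(sparse_index)  # block emptied: its entry is removed
```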

26

Deletion from dense index

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80). Dense index blocks: (10, 20, 30, 40), (50, 60, 70, 80).]

27

Deletion from dense index

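[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80). Dense index blocks: (10, 20, 30, 40), (50, 60, 70, 80). Annotations on the figure: 40, 40.]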

28

Insertion, sparse index case

[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free). Sparse index block: (10, 30, 40, 60).]

29


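Insertion, sparse index case

– insert record 34

[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free). Sparse index block: (10, 30, 40, 60). Record 34 goes into the free slot after 30.]

• Our lucky day! We have free space where we need it!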

30


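Insertion, sparse index case

[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free). Sparse index block: (10, 30, 40, 60). Annotations: 15 in the first block, (20, 30) as the second block, and 20 as the second index entry.]

• Illustrated: immediate reorganization

• Variation:
– insert new block (chained file)
– update index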

31


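Insertion, sparse index case

[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free). Sparse index block: (10, 30, 40, 60). Annotation: 25 placed in an overflow block.]

• Overflow blocks (reorganize later...)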
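The two insertion outcomes shown on these slides, free space in the target block versus an overflow block, can be sketched as below. This is an illustration under assumptions: blocks hold two records as drawn, overflow chains are kept in a dictionary, and `insert` is a hypothetical name. It does not implement the immediate-reorganization variant.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per block, as drawn on the slides

blocks = [[10, 20], [30], [40, 50], [60]]
sparse_index = [10, 30, 40, 60]  # first key of each block
overflow = {}                    # block number -> keys in its overflow chain


def insert(key):
    """Insert key; use the target block if it has room, else an overflow block."""
    # Target block: last index entry <= key (block 0 if key is smaller than all).
    i = max(bisect_right(sparse_index, key) - 1, 0)
    if len(blocks[i]) < CAPACITY:
        insort(blocks[i], key)
        sparse_index[i] = blocks[i][0]  # first key may have changed
        return "in block"
    # No room: chain the record off the full block and reorganize later.
    overflow.setdefault(i, []).append(key)
    return "overflow"


print(insert(34))  # free slot next to 30
print(insert(25))  # block (10, 20) is full
```

A lookup in such a file also has to follow the overflow chain of the target block, which is part of why the file is reorganized later.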

32

Insertion, dense index case

• Similar

• Often more expensive . . .

33

Summary so far

• Conventional index
– Basic ideas: sparse, dense, multi-level…
– Duplicate keys
– Deletion/insertion
– Secondary indexes

34

Conventional indexes

Advantage:
- Simple
- Index is a sequential file, good for scans

Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

35

• NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”

36

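B+Tree Example n=3

[Figure: B+tree with three levels.
Root: (100)
Non-leaf nodes: (30) and (120, 150, 180)
Leaves, left to right: (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]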
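A search in the example tree can be sketched by descending from the root, choosing one child per level. The dictionary-based node layout and the names `leaf`, `node` and `search` are assumptions made for this illustration; the keys are those of the n=3 example.

```python
from bisect import bisect_right


def leaf(*keys):
    return {"leaf": True, "keys": list(keys)}


def node(keys, children):
    return {"leaf": False, "keys": keys, "children": children}


# The n=3 example tree.
tree = node([100], [
    node([30], [leaf(3, 5, 11), leaf(30, 35)]),
    node([120, 150, 180], [
        leaf(100, 101, 110), leaf(120, 130), leaf(150, 156, 179), leaf(180, 200),
    ]),
])


def search(root, key):
    """Return True if key is stored in some leaf of the tree."""
    current = root
    while not current["leaf"]:
        # Child i covers keys >= keys[i-1] and < keys[i].
        current = current["children"][bisect_right(current["keys"], key)]
    return key in current["keys"]


print(search(tree, 156))
print(search(tree, 31))
```

Every search visits exactly one node per level, so all lookups cost the same number of node reads.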

37

Sample non-leaf

Keys in the node: 57, 81, 95

Pointers, left to right:
– to keys k < 57
– to keys 57 ≤ k < 81
– to keys 81 ≤ k < 95
– to keys k ≥ 95
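The key ranges of the sample non-leaf node translate into a one-line rule for picking a child pointer. `child_index` is a hypothetical helper name; pointers are numbered from 0, left to right.

```python
from bisect import bisect_right

node_keys = [57, 81, 95]


def child_index(keys, k):
    """Pointer to follow for search key k: the count of node keys <= k."""
    return bisect_right(keys, k)


for k in (40, 57, 80, 81, 95, 120):
    print(k, "-> pointer", child_index(node_keys, k))
```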

38

Sample leaf node:

[Figure: leaf node with keys 57, 81, 95. A pointer arrives from a non-leaf node. Each key has a pointer to the record with that key (57, 81, 95), and the last pointer leads to the next leaf in sequence.]

39

Size of nodes (fixed): n+1 pointers, n keys

40

Don’t want nodes to be too empty

• Use at least

Non-leaf: ⌈(n+1)/2⌉ pointers

Leaf: ⌊(n+1)/2⌋ pointers to data

41

n=3

            Full node      Min. node
Non-leaf    120 150 180    30
Leaf        3 5 11         30 35

Figure note: counts even if null.

42

B+tree rules (tree of order n)

(1) All leaves at same lowest level (balanced tree)

(2) Pointers in leaves point to records, except for the “sequence pointer”

43

(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
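The occupancy limits can be computed for any n. The sketch assumes the usual rounding, ceiling for non-leaf nodes and floor for leaves; `bplus_limits` and the dictionary keys are hypothetical names.

```python
import math


def bplus_limits(n):
    """Max/min pointer and key counts per node type for a B+tree of order n."""
    non_leaf_min_ptrs = math.ceil((n + 1) / 2)
    leaf_min = (n + 1) // 2  # floor
    return {
        "non_leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": non_leaf_min_ptrs, "min_keys": non_leaf_min_ptrs - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_data_ptrs": leaf_min, "min_keys": leaf_min},
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 1, "min_keys": 1},
    }


for n in (3, 4):
    print(n, bplus_limits(n))
```

For n=3 this gives a minimum of 2 pointers (1 key) for a non-leaf node and 2 keys for a leaf, which agrees with the full-node and min-node examples for n=3.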

44

Insert into B+tree (read only)

(a) simple case
– space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

45

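(a) Insert key = 32, n=3

[Figure: leaves (3, 5, 11) and (30, 31), with node keys 30 and 100 above them. Key 32 goes into the free slot of leaf (30, 31).]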

46

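(b) Insert key = 7, n=3 (leaf overflow)

[Figure: leaves (3, 5, 11) and (30, 31), with node keys 30 and 100 above them. Leaf (3, 5, 11) is full, so inserting 7 overflows it. Annotations: a leaf (3, 5), and key 7 shown twice.]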
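Inserting into a leaf, with a split when it overflows, can be sketched as below. The split point (left half gets the extra key when the count is odd) and the name `insert_into_leaf` are assumptions; the function returns the key that would be copied up to the parent but does not update the parent itself.

```python
from bisect import insort


def insert_into_leaf(leaf_keys, key, n):
    """Insert key into a leaf of order n.

    Returns (left, right, separator). If there was room, right and
    separator are None. Otherwise the leaf is split in two and separator
    is the first key of the new right leaf, to be inserted in the parent.
    """
    keys = list(leaf_keys)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None
    # Overflow: n+1 keys are divided between two leaves.
    split = (len(keys) + 1) // 2
    left, right = keys[:split], keys[split:]
    return left, right, right[0]


print(insert_into_leaf([30, 31], 32, n=3))   # simple case
print(insert_into_leaf([3, 5, 11], 7, n=3))  # leaf overflow
```

In the overflow case both halves hold 2 keys, which meets the leaf minimum of ⌊(n+1)/2⌋ for n=3.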

47

Deletion from B+tree

(a) Simple case - no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

48

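(b) Coalesce with sibling
– Delete 50

n=4

[Figure: parent node with keys (10, 40, 100); leaves (10, 20, 30) and (40, 50). Annotation: 40 shown in the left leaf after the deletion.]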

49

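(c) Redistribute keys
– Delete 50

n=4

[Figure: parent node with keys (10, 40, 100); leaves (10, 20, 30, 35) and (40, 50). Annotations: 35 shown in the right leaf and 35 shown in the parent after the deletion.]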
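The choice between coalescing and redistributing after a leaf deletion can be sketched for the two-leaf situation of the n=4 examples. This is a simplified illustration under assumptions: only the left sibling is considered, the parent is not modified, and `delete_from_right_leaf` is a hypothetical name.

```python
def delete_from_right_leaf(left, right, key, n):
    """Delete key from the right leaf and repair underflow via the left sibling.

    Returns (action, left, right, separator), where separator is the key
    the parent should hold for the right leaf (None if the leaf is gone).
    """
    min_keys = (n + 1) // 2  # leaf minimum, floor((n+1)/2)
    left = list(left)
    right = [k for k in right if k != key]
    if len(right) >= min_keys:
        return "ok", left, right, right[0]
    if len(left) + len(right) <= n:
        # Everything fits in one leaf: coalesce, and the parent drops an entry.
        return "coalesce", left + right, [], None
    # Sibling has keys to spare: move its largest key over.
    right.insert(0, left.pop())
    return "redistribute", left, right, right[0]


print(delete_from_right_leaf([10, 20, 30], [40, 50], 50, n=4))
print(delete_from_right_leaf([10, 20, 30, 35], [40, 50], 50, n=4))
```

In the first call the four remaining keys fit in one leaf, so the leaves are merged; in the second the left leaf is full, so 35 moves over and becomes the new separator.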

50

B+tree deletions in practice

– Often, coalescing is not implemented
– Too hard and not worth it!

Date post:	11-Jan-2016
Upload:	conlan