I/O-Algorithms

Lars Arge

Spring 2012

February 27, 2012


Random Access Machine Model

• Standard theoretical model of computation:
– Infinite memory
– Uniform access cost

[Figure: RAM]


Hierarchical Memory

• Modern machines have a complicated memory hierarchy
– Levels get larger and slower further away from the CPU
– Large access time amortized using block transfer between levels

• Bottleneck is often the transfers between the largest memory levels in use

[Figure: memory hierarchy with levels L1, L2, and RAM]


I/O-Bottleneck

• I/O is often the bottleneck when handling massive datasets
– Disk access is $10^6$ times slower than main memory access
– Large transfer block size (typically 8-16 Kbytes)

• Important to obtain "locality of reference"
– Need to store and access data to take advantage of blocks

[Figure: disk drive showing track, magnetic surface, read/write arm, and read/write head]


I/O-Model

• Parameters
N = # elements in problem instance
B = # elements that fit in a disk block
M = # elements that fit in main memory
T = output size in searching problem

• We often assume that $M > B^2$

• I/O: Movement of a block between memory and disk

[Figure: disk D, main memory M, and P; block I/O moves data between D and M]


Fundamental Bounds

               Internal       External
• Scanning:    $N$            $\frac{N}{B}$
• Sorting:     $N \log N$     $\frac{N}{B} \log_{M/B} \frac{N}{B}$
• Permuting:   $N$            $\min\{N, \frac{N}{B} \log_{M/B} \frac{N}{B}\}$
• Searching:   $\log_2 N$     $\log_B N$

• Note:
– Linear I/O: $O(N/B)$
– Permuting not linear
– Permuting and sorting bounds are equal in all practical cases
– B factor VERY important: $\frac{N}{B} < \frac{N}{B} \log_{M/B} \frac{N}{B} \ll N$
– Cannot sort optimally with search tree
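To get a feel for the size of the B factor, the external bounds can be evaluated for concrete parameters. The following is a minimal sketch with constants dropped; the function name `io_bounds` and the sample values of N, M, and B are assumptions chosen for illustration, not values from the slides.

```python
import math

def io_bounds(N, M, B):
    """Return the external-memory bounds in I/Os (constant factors dropped)."""
    n = N / B                      # number of blocks holding the input
    m = M / B                      # number of blocks that fit in main memory
    sort = n * math.log(n, m)      # (N/B) * log_{M/B}(N/B)
    return {
        "scan": n,
        "sort": sort,
        "permute": min(N, sort),
        "search": math.log(N, B),  # log_B N
    }

# Hypothetical instance: 2^30 elements, 2^20 in memory, 2^10 per block.
bounds = io_bounds(N=2**30, M=2**20, B=2**10)
```

For these sample values the sorting bound is about twice the scanning bound and roughly 500 times smaller than N, which is the gap the note on the B factor refers to.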


Merge Sort

• Merge sort:
– Create N/M memory-sized sorted runs
– Merge runs together M/B at a time
⇒ $O(\log_{M/B} \frac{N}{M})$ phases using $O(\frac{N}{B})$ I/Os each

• Distribution sort similar (but harder – partition elements)
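The two steps of external merge sort can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the "disk" is a Python list, each phase is charged one read and one write per block, and the function name `external_merge_sort` is hypothetical. A real implementation would also reserve one block of memory as an output buffer.

```python
import heapq

def external_merge_sort(data, M, B):
    """Sort `data`; return (sorted_list, merge_phases, io_count).

    Simulation only: every pass over the data is charged
    2 * ceil(N / B) I/Os (read each block once, write it once).
    """
    N = len(data)
    blocks = -(-N // B)              # ceil(N / B)
    fan_in = max(2, M // B)          # number of runs merged at a time
    # Run formation: sort memory-sized chunks, giving about N/M runs.
    runs = [sorted(data[i:i + M]) for i in range(0, N, M)]
    ios = 2 * blocks
    phases = 0
    # Merge phases: each phase reduces the number of runs by a factor M/B.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        ios += 2 * blocks
        phases += 1
    return (runs[0] if runs else []), phases, ios

# Hypothetical instance: N = 1000, M = 40, B = 10, so 25 runs and fan-in 4.
data = [(i * 7919) % 1000 for i in range(1000)]
result, phases, ios = external_merge_sort(data, M=40, B=10)
```

With these values the 25 initial runs shrink to 7, then 2, then 1, so three merge phases are needed, matching $\lceil \log_4 25 \rceil = 3$.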


Permuting Lower Bound

Permuting N elements according to a given permutation takes $\Omega(\min\{N, \frac{N}{B} \log_{M/B} \frac{N}{B}\})$ I/Os in the "indivisibility" model

• Indivisibility model: Move of elements is the only allowed operation
• Note:
– We can allow copies (and destruction of elements)
– Bound also a lower bound on sorting

• Proof:
– View memory and disk as array of N tracks of B elements
– Assume all I/Os track aligned (assumption can be removed)


Permuting Lower Bound

– Array contains permutation of N elements at all times
– We will count how many permutations can be reached (produced) with t I/Os
– Input:
* Choose track: N possibilities
* Rearrange ≤ B elements in track and place among ≤ M-B elements in memory:
  – $B! \binom{M}{B}$ possibilities if "fresh" track
  – $\binom{M}{B}$ otherwise
⇒ at most $\left(N \binom{M}{B}\right)^{t} (B!)^{N/B}$ permutations after t inputs
– Output:
* Choose track: N possibilities


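Permuting Lower Bound

– Permutation algorithm needs to be able to produce N! permutations:

$\left(N \binom{M}{B}\right)^{t} (B!)^{N/B} \ge N!$

$t \left(\log N + \log \binom{M}{B}\right) + \frac{N}{B} \log(B!) \ge \log(N!)$

$t \left(\log N + B \log \frac{M}{B}\right) + N \log B \ge N \log N$

(using Stirling's formula $\log x! \approx x \log x$ and $\log \binom{M}{B} \approx B \log \frac{M}{B}$)

$t \ge \frac{N \log \frac{N}{B}}{\log N + B \log \frac{M}{B}}$

– If $\log N \le B \log \frac{M}{B}$ we have

$t \ge \frac{N \log \frac{N}{B}}{2 B \log \frac{M}{B}} = \Omega\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$

– If $\log N > B \log \frac{M}{B}$ we have $B < \sqrt{N}$ and thus

$t \ge \frac{N \log \frac{N}{B}}{2 \log N} \ge \frac{N \log \sqrt{N}}{2 \log N} = \frac{N \cdot \frac{1}{2} \log N}{2 \log N} = \Omega(N)$

⇒ $t = \Omega\left(\min\{N, \frac{N}{B} \log_{M/B} \frac{N}{B}\}\right)$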
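The counting argument can be checked numerically without Stirling's approximation by evaluating the factorials exactly with the log-gamma function. The sketch below computes the smallest t satisfying $\left(N \binom{M}{B}\right)^{t} (B!)^{N/B} \ge N!$; the helper names and the sample values of N, M, and B are assumptions for illustration.

```python
import math

def log2_factorial(x):
    """log2(x!) computed via the log-gamma function."""
    return math.lgamma(x + 1) / math.log(2)

def log2_binomial(m, b):
    """log2 of the binomial coefficient C(m, b)."""
    return log2_factorial(m) - log2_factorial(b) - log2_factorial(m - b)

def min_ios_to_permute(N, M, B):
    """Smallest t with (N * C(M,B))^t * (B!)^(N/B) >= N!."""
    need = log2_factorial(N) - (N / B) * log2_factorial(B)
    per_io = math.log2(N) + log2_binomial(M, B)
    return math.ceil(need / per_io)

# Hypothetical instance: N = 2^20, M = 2^12, B = 2^6.
N, M, B = 2**20, 2**12, 2**6
t = min_ios_to_permute(N, M, B)
sort_bound = (N / B) * math.log(N / B, M / B)
```

For these values t comes out within a small constant factor of $\frac{N}{B} \log_{M/B} \frac{N}{B}$ and far below N, in line with the first case of the derivation.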


Sorting lower bound

Sorting N elements takes $\Omega(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os in the comparison model

• Proof:
– Initially N elements stored in N/B first blocks on disk
– Initially all N! possible orderings consistent with our knowledge
– After t I/Os?


Sorting lower bound

• Consider one input assuming:
– S consistent orderings before input
– Compute total order of elements in memory
– Adversary chooses "worst" outcome of comparisons done

• $\binom{M}{B} B!$ possible orderings of M-B "old" and B new elements in memory

• Adversary can choose outcome such that still $S / \left(\binom{M}{B} B!\right)$ consistent orderings

• Only get B! term N/B times
⇒ $N! / \left(\binom{M}{B}^{t} (B!)^{N/B}\right)$ consistent orderings after t I/Os

$N! / \left(\binom{M}{B}^{t} (B!)^{N/B}\right) = 1 \Rightarrow t = \Omega\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$


Summary/Conclusion: Sorting

• External merge or distribution sort takes $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os
– Merge-sort based on M/B-way merging
– Distribution sort based on $\sqrt{M/B}$-way distribution and partition elements finding

• Optimal in comparison model

• Can prove $\Omega(\min\{N, \frac{N}{B} \log_{M/B} \frac{N}{B}\})$ lower bound in stronger model
– Holds even for permuting


External Search Trees

• Binary search tree:
– Standard method for search among N elements
– We assume elements in leaves
– Search traces at least one root-leaf path
– If nodes stored arbitrarily on disk
⇒ Search in $O(\log_2 N)$ I/Os
⇒ Rangesearch in $O(\log_2 N + T)$ I/Os

[Figure: binary search tree of height $\Theta(\log_2 N)$]


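External Search Trees

• BFS blocking:
– Block height $O(\log_2 N) / O(\log_2 B) = O(\log_B N)$
– Output elements blocked
⇒ Rangesearch in $O(\log_B N + \frac{T}{B})$ I/Os

• Optimal: $O(N/B)$ space and $O(\log_B N + \frac{T}{B})$ query

[Figure: BFS-blocked tree; each block holds a subtree of height $\Theta(\log_2 B)$ with $\Theta(B)$ nodes]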
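The effect of BFS blocking on a root-leaf search can be illustrated with a short calculation. This is a rough sketch that assumes a perfectly balanced binary tree and that one block holds a subtree of $\lfloor \log_2 B \rfloor$ levels; the function name `search_ios` and the sample values are hypothetical.

```python
import math

def search_ios(N, B):
    """Return (I/Os with arbitrary node placement, I/Os with BFS blocking)."""
    height = math.ceil(math.log2(N))            # nodes on a root-leaf path
    levels_per_block = math.floor(math.log2(B)) # tree levels stored per block
    return height, math.ceil(height / levels_per_block)

# Hypothetical instance: 2^30 elements and blocks of 2^10 elements.
unblocked, blocked = search_ios(N=2**30, B=2**10)
```

For these values a search costs about 30 I/Os when nodes are placed arbitrarily and about 3 I/Os with BFS blocking, which is the $\log_2 N$ versus $\log_B N$ difference.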


External Search Trees

• Maintaining BFS blocking during updates?
– Balance normally maintained in search trees using rotations

• Seems very difficult to maintain BFS blocking during rotation
– Also need to make sure output (leaves) is blocked!

[Figure: rotation between nodes x and y]


B-trees

• BFS-blocking naturally corresponds to tree with fan-out $\Theta(B)$

• B-trees balanced by allowing node degree to vary
– Rebalancing performed by splitting and merging nodes


(a,b)-tree

• T is an (a,b)-tree (a≥2 and b≥2a-1)
– All leaves on the same level and contain between a and b elements
– Except for the root, all nodes have degree between a and b
– Root has degree between 2 and b

• (a,b)-tree uses linear space and has height $O(\log_a N)$
⇒ Choosing a,b = $\Theta(B)$ each node/leaf stored in one disk block
⇒ $O(N/B)$ space and $O(\log_B N + \frac{T}{B})$ query

[Figure: (2,4)-tree]


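(a,b)-Tree Insert

• Insert:
Search and insert element in leaf v
WHILE v has b+1 elements/children DO
  Split v:
    make nodes v' and v'' with $\lceil \frac{b+1}{2} \rceil \le b$ and $\lfloor \frac{b+1}{2} \rfloor \ge a$ elements
    insert element (ref) in parent(v)
    (make new root if necessary)
  v=parent(v)

• Insert touches $O(\log_a N)$ nodes

[Figure: node v with b+1 elements split into v' and v'' with $\lceil \frac{b+1}{2} \rceil$ and $\lfloor \frac{b+1}{2} \rfloor$ elements]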
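The split step of the insert procedure can be sketched on a node represented as a plain Python list. This is a minimal illustration, not a full (a,b)-tree: the function name `split_node` and the list representation are assumptions, and updating the parent is left out.

```python
def split_node(items, a, b):
    """Split an overfull node (b + 1 items) into v' and v''.

    v' receives ceil((b + 1) / 2) items and v'' receives floor((b + 1) / 2).
    """
    assert a >= 2 and b >= 2 * a - 1, "not a valid (a,b) pair"
    assert len(items) == b + 1, "only overfull nodes are split"
    mid = (b + 2) // 2                   # equals ceil((b + 1) / 2)
    left, right = items[:mid], items[mid:]
    # Both halves satisfy the (a,b) size constraints because b >= 2a - 1.
    assert len(left) <= b and len(right) >= a
    return left, right

# A (2,4)-tree node that has grown to 5 elements.
v1, v2 = split_node([10, 20, 30, 40, 50], a=2, b=4)
```

The internal assertions make the size argument explicit: since b+1 ≥ 2a, the smaller half always has at least a elements.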


(2,4)-Tree Insert


(a,b)-Tree Delete

• Delete:
Search and delete element from leaf v
WHILE v has a-1 elements/children DO
  Fuse v with sibling v':
    move children of v' to v
    delete element (ref) from parent(v)
    (delete root if necessary)
  If v has >b (and ≤ a+b-1<2b) children split v
  v=parent(v)

• Delete touches $O(\log_a N)$ nodes

[Figure: node v with a-1 children fused with a sibling, giving at least 2a-1 children]
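The fuse step, followed by a split when the fused node is too large, can be sketched in the same list representation. The function name `fuse_then_maybe_split` is hypothetical, the sibling is assumed to be the right sibling, and the parent update is again omitted.

```python
def fuse_then_maybe_split(v, sibling, a, b):
    """Rebalance an underfull node v (a - 1 items) using its right sibling.

    Returns a list with one or two nodes that replace v and its sibling.
    """
    assert len(v) == a - 1 and a <= len(sibling) <= b
    fused = v + sibling                # between 2a - 1 and a + b - 1 items
    if len(fused) <= b:
        return [fused]                 # fused node is a valid node
    mid = (len(fused) + 1) // 2        # too large: split as evenly as possible
    return [fused[:mid], fused[mid:]]

# (2,4)-tree examples: a small sibling fuses, a full sibling forces a split.
fused_only = fuse_then_maybe_split([1], [2, 3], a=2, b=4)
fused_and_split = fuse_then_maybe_split([1], [2, 3, 4, 5], a=2, b=4)
```

When a split follows the fuse, the fused node has more than b ≥ 2a-1 items, so both halves contain at least a items.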


(2,4)-Tree Delete


(a,b)-Tree

• (a,b)-tree properties:
– If b=2a-1 every update can cause many rebalancing operations
– If b≥2a an update only causes O(1) rebalancing operations amortized
– If b>2a only $O(\frac{1}{b/2 - a}) = O(\frac{1}{a})$ rebalancing operations amortized
* Both somewhat hard to show
– If b=4a easy to show that an update causes $O(\frac{1}{a} \log_a N)$ rebalance operations amortized
* After split during insert a leaf contains 4a/2=2a elements
* After fuse during delete a leaf contains between 2a and 5a elements (split if more than 3a ⇒ between 3/2a and 5/2a)

[Figure: (2,3)-tree under alternating insert and delete]


Summary/Conclusion: B-tree

• B-trees: (a,b)-trees with a,b = $\Theta(B)$
– $O(N/B)$ space
– $O(\log_B N + T/B)$ query
– $O(\log_B N)$ update

• B-trees with elements in the leaves sometimes called B+-tree

• Construction in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os
– Sort elements and construct leaves
– Build tree level-by-level bottom-up


Summary/Conclusion: B-tree

• B-tree with branching parameter b and leaf parameter k (b,k≥8)
– All leaves on same level and contain between 1/4k and k elements
– Except for the root, all nodes have degree between 1/4b and b
– Root has degree between 2 and b

• B-tree with leaf parameter $k = \Omega(B)$
– $O(N/B)$ space
– Height $O(\log_b \frac{N}{B})$
– $O(\frac{1}{k})$ amortized leaf rebalance operations
– $O(\frac{1}{bk} \log_b \frac{N}{B})$ amortized internal node rebalance operations

• B-tree with branching parameter $B^c$, 0<c≤1, and leaf parameter B
– Space $O(N/B)$, updates $O(\log_B N)$, queries $O(\log_B N + \frac{T}{B})$


Secondary Structures

• When secondary structures are used, a rebalance on v often requires O(w(v)) I/Os (w(v) is weight of v)
– If $\Omega(w(v))$ inserts have to be made below v between operations
⇒ O(1) amortized split bound
⇒ $O(\log_B N)$ amortized insert bound

• Nodes in standard B-tree do not have this property

[Figure: (2,4)-tree]


BB[α]-tree

• In internal memory BB[α]-trees have the desired property
• Defined using weight-constraint
– Ratio between weight of left child and weight of right child of a node v is between α and 1-α (α<1)
⇒ Height O(log N)

• If $\frac{2}{11} < \alpha < 1 - \frac{1}{2}\sqrt{2}$ rebalancing can be performed using rotations

• Seems hard to implement BB[α]-trees I/O-efficiently

[Figure: rotation between nodes x and y]


Weight-balanced B-tree

• Idea: Combination of B-tree and BB[α]-tree
– Weight constraint on nodes instead of degree constraint
– Rebalancing performed using split/fuse as in B-tree

• Weight-balanced B-tree with parameters b and k (b>8, k≥8)
– All leaves on same level and contain between k/4 and k elements
– Internal node v at level l has $w(v) < b^l k$
– Except for the root, internal node v at level l has $w(v) > \frac{1}{4} b^l k$
– The root has more than one child

[Figure: nodes at level l have weight $\frac{1}{4} b^l k \ldots b^l k$; nodes at level l-1 have weight $\frac{1}{4} b^{l-1} k \ldots b^{l-1} k$]


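Weight-balanced B-tree

• Every internal node has degree between $\frac{1}{4} b^l k / b^{l-1} k = \frac{1}{4} b$ and $b^l k / \frac{1}{4} b^{l-1} k = 4b$
⇒ Height $O(\log_b \frac{N}{k})$

• External memory:
– Choose 4b=B (or even $B^c$ for 0 < c ≤ 1)
– k=B
⇒ $O(N/B)$ space, $O(\log_B N + \frac{T}{B})$ query

[Figure: nodes at level l have weight $\frac{1}{4} b^l k \ldots b^l k$; nodes at level l-1 have weight $\frac{1}{4} b^{l-1} k \ldots b^{l-1} k$]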


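Weight-balanced B-tree Insert

• Search for relevant leaf u and insert new element
• Traverse path from u to root:
– If level l node v now has $w(v) = b^l k + 1$ then split into nodes v' and v'' with
$w(v') \ge \lfloor \frac{1}{2}(b^l k + 1) \rfloor - b^{l-1} k$ and $w(v'') \le \lceil \frac{1}{2}(b^l k + 1) \rceil + b^{l-1} k$

• Algorithm correct since $b^{l-1} k \le \frac{1}{8} b^l k$ such that $w(v') \ge \frac{3}{8} b^l k$ and $w(v'') \le \frac{5}{8} b^l k$
– touch $O(\log_b \frac{N}{k})$ nodes

• Weight-balance property:
– $\Omega(b^l k)$ updates below v' and v'' before next rebalance operation

[Figure: level l node of weight $b^l k + 1$ split into two nodes; level l-1 nodes have weight $\frac{1}{4} b^{l-1} k \ldots b^{l-1} k$]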
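The weight-based split can be sketched by cutting the list of child weights at the last position where the left part stays within half the total weight. Because every child weighs at most $b^{l-1} k$, this greedy cut gives exactly the bounds stated above. The function name `split_by_weight` and the example node are assumptions for illustration.

```python
def split_by_weight(child_weights):
    """Split the children of an overweight node into v' (left) and v'' (right).

    v' is the longest prefix of children whose total weight is at most
    floor(total / 2), so it misses half the weight by less than one child.
    """
    total = sum(child_weights)
    left_weight, cut = 0, 0
    while (cut < len(child_weights)
           and left_weight + child_weights[cut] <= total // 2):
        left_weight += child_weights[cut]
        cut += 1
    return child_weights[:cut], child_weights[cut:]

# Hypothetical level-1 node with b = 8, k = 8: weight b*k + 1 = 65,
# children are leaves holding between k/4 = 2 and k = 8 elements.
left, right = split_by_weight([8, 8, 8, 8, 8, 8, 8, 4, 5])
```

In this example w(v') = 32 and w(v'') = 33, so both new nodes are far from the weight limits 16 and 64 for level 1, and many updates are needed below either of them before it is rebalanced again.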


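Weight-balanced B-tree Delete

• Search for relevant leaf u and delete element
• Traverse path from u to root:
– If level l node v now has $w(v) = \frac{1}{4} b^l k - 1$ then fuse with sibling into node v' with
$\frac{2}{4} b^l k - 1 \le w(v') \le \frac{5}{4} b^l k - 1$
– If now $w(v') \ge \frac{7}{8} b^l k$ then split into nodes with weight
$\ge \frac{7}{16} b^l k - 1 - b^{l-1} k \ge \frac{5}{16} b^l k - 1$ and $\le \frac{5}{8} b^l k + b^{l-1} k \le \frac{6}{8} b^l k$

• Algorithm correct and touch $O(\log_b \frac{N}{k})$ nodes

• Weight-balance property:
– $\Omega(b^l k)$ updates below v' and v'' before next rebalance operation

[Figure: level l node of weight $\frac{1}{4} b^l k - 1$ fused with a sibling; level l-1 nodes have weight $\frac{1}{4} b^{l-1} k \ldots b^{l-1} k$]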


Summary/Conclusion: Weight-balanced B-tree

• Weight-balanced B-tree with branching parameter b and leaf parameter k=Ω(B)
– $O(N/B)$ space
– Height $O(\log_b \frac{N}{k})$
– $O(\log_b N)$ rebalancing operations after update
– Ω(w(v)) updates below v between consecutive operations on v

• Weight-balanced B-tree with branching parameter $B^c$ and leaf parameter B
– Updates in $O(\log_B N)$ and queries in $O(\log_B N + \frac{T}{B})$ I/Os

• Construction bottom-up in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os
