I/O-Algorithms
Lars Arge
Spring 2012
April 17, 2012
I/O-Model
• Parameters
– N = # elements in problem instance
– B = # elements that fit in disk block
– M = # elements that fit in main memory
– T = output size in searching problem
• We often assume that $M > B^2$
• I/O: Movement of block between memory and disk
[Figure: processor P and main memory M, connected to disk D by block I/O]
Fundamental Bounds
               Internal      External
• Scanning:    $N$           $\frac{N}{B}$
• Sorting:     $N \log N$    $\frac{N}{B}\log_{M/B}\frac{N}{B}$
• Permuting:   $N$           $\min\{N, \frac{N}{B}\log_{M/B}\frac{N}{B}\}$
• Searching:   $\log_2 N$    $\log_B N$
• Note:
– Linear I/O: $O(N/B)$
– Permuting not linear
– Permuting and sorting bounds are equal in all practical cases
– B factor VERY important: $\frac{N}{B} < \frac{N}{B}\log_{M/B}\frac{N}{B} \ll N$
– Cannot sort optimally with search tree
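To get a feel for how far apart these bounds are, the sketch below evaluates them numerically with constants ignored. The function name `io_bounds` and the memory size used in the usage note are illustrative assumptions, not part of the slides.

```python
import math

def io_bounds(N, B, M):
    """Evaluate the external-memory bounds (in I/Os, constants ignored)."""
    scan = N / B                                # linear I/O
    sort = (N / B) * math.log(N / B, M / B)     # (N/B) log_{M/B} (N/B)
    permute = min(N, sort)                      # min{N, sorting bound}
    search = math.log(N, B)                     # log_B N
    return {"scan": scan, "sort": sort, "permute": permute, "search": search}

# Hypothetical instance: N = 256 * 10^6, B = 8000, M = 128 * 10^6 (so M > B^2)
bounds = io_bounds(256 * 10**6, 8000, 128 * 10**6)
```

For this instance the scanning bound is 32000 I/Os, the sorting bound is only slightly larger, and both are far below N, which is the point of the "B factor VERY important" note.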
Scalability Problems: Block Access Matters
• Example: Traversing linked list (List ranking)
– Array size N = 10 elements
– Disk block size B = 2 elements
– Main memory size M = 4 elements (2 blocks)
• Difference between N and N/B is large since block size is large
– Example: $N = 256 \times 10^6$, $B = 8000$, 1 ms disk access time
⇒ N I/Os take $256 \times 10^3$ sec = 4266 min = 71 hr
⇒ N/B I/Os take 256/8 sec = 32 sec
• Algorithm 1: N = 10 I/Os
• Algorithm 2: N/B = 5 I/Os
[Figure: two example orderings of the 10 list elements in the array]
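The gap between N and N/B I/Os can be reproduced with a small simulation. The sketch below counts block transfers for a sequence of array accesses, assuming main memory holds M/B blocks and evicts the least recently used one. The LRU policy, the function name `count_ios` and the two access orders are assumptions chosen for illustration; they are not the layouts shown in the figure.

```python
from collections import OrderedDict

def count_ios(positions, B, M):
    """Count block reads for accesses to the given array positions,
    with M // B blocks of memory and LRU replacement (an assumption)."""
    capacity = M // B
    cache = OrderedDict()          # block number -> True, ordered by recency
    ios = 0
    for p in positions:
        block = p // B
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            ios += 1                       # miss: one I/O to fetch the block
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used block
    return ios

# List stored in traversal order: every block is read exactly once.
sequential = list(range(10))
# List scattered so that consecutive elements lie in different blocks.
scattered = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]

print(count_ios(sequential, B=2, M=4))  # 5 I/Os  (N/B)
print(count_ios(scattered, B=2, M=4))   # 10 I/Os (N)
```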
List Ranking
• Problem:
– Given N-vertex linked list stored in array
– Compute rank (number in list) of each vertex
• One of the simplest graph problems one can think of
• Straightforward O(N) internal algorithm
– Also uses O(N) I/Os in external memory
• Much harder to get $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ external algorithm
[Figure: example linked list stored in array, with the rank of each vertex]
List Ranking
• We will solve a more general problem:
– Given N-vertex linked list with edge-weights stored in array
– Compute sum of weights (rank) from start for each vertex
• List ranking: All edge weights one
• Note: Weight stored in array entry together with edge (next vertex)
[Figure: example linked list stored in array, all edge weights 1]
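The straightforward internal-memory algorithm for the weighted problem simply follows the successor pointers. The sketch below assumes a representation in which `succ[v]` is the array index of the next vertex (`None` at the tail) and `weight[v]` is the weight of the edge leaving v; under this convention the first vertex gets rank 0, so with unit weights the "number in list" is the rank plus one. These names and conventions are assumptions for illustration.

```python
def weighted_ranks(succ, weight):
    """Rank every vertex by following pointers from the head.

    succ[v]   : index of the successor of v, or None at the tail
    weight[v] : weight of the edge leaving v (ignored at the tail)
    """
    n = len(succ)
    has_pred = [False] * n
    for s in succ:
        if s is not None:
            has_pred[s] = True
    head = has_pred.index(False)   # the only vertex without a predecessor

    rank = [0] * n
    v, total = head, 0
    while v is not None:
        rank[v] = total            # sum of edge weights from the head to v
        total += weight[v]
        v = succ[v]
    return rank

# List 3 -> 0 -> 2 -> 1 with unit weights
print(weighted_ranks([2, None, 1, 0], [1, 1, 1, 1]))  # [1, 3, 2, 0]
```

Each step of the loop may touch a different disk block, which is why this O(N) algorithm also costs O(N) I/Os in external memory.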
List Ranking
• Algorithm:
1. Find and mark independent set of vertices
2. “Bridge-out” independent set: Add new edges
3. Recursively rank resulting list
4. “Bridge-in” independent set: Compute rank of independent set
• Steps 1, 2 and 4 in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
• Independent set of size $\alpha N$ for $0 < \alpha \le 1$
⇒ $T(N) = T((1-\alpha)N) + O(\frac{N}{B}\log_{M/B}\frac{N}{B}) = O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
[Figure: example list with independent set bridged out and edge weights updated]
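The four steps can be sketched in memory as follows. This is only an illustration of the recursion, not an I/O-efficient implementation: dictionaries stand in for the sort-and-scan steps, the independent set is found by coin flipping, and the head is kept out of the independent set for simplicity. The name `rank_list`, the dictionary representation and the convention that the head has rank 0 are all assumptions. The recurrence solves to the sorting bound because the list shrinks by a constant factor per level, so the per-level costs form a geometrically decreasing sum.

```python
import random

def rank_list(succ, w, rng=None):
    """succ: dict vertex -> successor (None at tail); w: dict vertex -> weight
    of the edge leaving it. Returns dict vertex -> sum of weights from head."""
    rng = rng or random.Random(0)
    pred = {s: v for v, s in succ.items() if s is not None}
    head = next(v for v in succ if v not in pred)

    if len(succ) <= 2:                       # base case: follow pointers
        rank, v, total = {}, head, 0
        while v is not None:
            rank[v] = total
            total += w.get(v, 0)
            v = succ[v]
        return rank

    # Step 1: independent set = heads whose successor flipped tails
    ind = set()
    while not ind:                           # retry in the unlikely empty case
        coin = {v: rng.random() < 0.5 for v in succ}
        ind = {v for v in succ
               if v != head and succ[v] is not None
               and coin[v] and not coin[succ[v]]}

    # Step 2: bridge out, predecessor u of v now points past v
    succ2 = {v: s for v, s in succ.items() if v not in ind}
    w2 = {v: w.get(v, 0) for v in succ2}
    for v in ind:
        u = pred[v]
        succ2[u] = succ[v]
        w2[u] = w[u] + w[v]                  # new edge carries both weights

    # Step 3: recursively rank the shorter list
    rank = rank_list(succ2, w2, rng)

    # Step 4: bridge in, rank of v follows from its predecessor
    for v in ind:
        rank[v] = rank[pred[v]] + w[pred[v]]
    return rank
```

With unit weights on a list visiting the vertices in the order 4, 2, 7, 0, 5, 1, 6, 3, the result maps each vertex to its position in that order, counted from 0.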
List Ranking: Bridge-out/in
• Obtain information (edge or rank) of successor
– Make copy of original list
– Sort original list by successor id
– Scan original and copy together to obtain successor information
– Sort modified original list by id
⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
[Figure: original list and copy, sorted by successor id and by id]
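The sort-and-scan pattern above can be sketched with in-memory lists standing in for the files on disk. The record layout `(vid, succ, info)` and the function name `attach_successor_info` are assumptions for illustration; in an external implementation each `sorted` call would be an external sort and the loop a simultaneous scan of two files.

```python
def attach_successor_info(records):
    """records: list of (vid, succ, info), with succ None at the tail.
    Returns (vid, succ, info, succ_info) tuples sorted by vid."""
    # Copy of the list, ordered by vertex id
    copy = sorted(records, key=lambda r: r[0])
    # Original list, ordered by successor id (the tail has no successor)
    by_succ = sorted((r for r in records if r[1] is not None),
                     key=lambda r: r[1])

    out = []
    j = 0
    # Simultaneous scan: both sequences are ordered by the same key,
    # so the position j in the copy only ever moves forward.
    for vid, s, info in by_succ:
        while copy[j][0] < s:
            j += 1
        out.append((vid, s, info, copy[j][2]))
    out.extend((vid, s, info, None) for vid, s, info in records if s is None)

    # Restore the original order by id
    out.sort(key=lambda r: r[0])
    return out

# List 1 -> 3 -> 2, where info could be an edge weight or a rank
print(attach_successor_info([(1, 3, "a"), (2, None, "b"), (3, 2, "c")]))
# [(1, 3, 'a', 'c'), (2, None, 'b', None), (3, 2, 'c', 'b')]
```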
List Ranking: Independent Set
• Easy to design randomized algorithm:
– Scan list and flip a coin for each vertex
– Independent set is vertices with heads whose successor has tails
⇒ Independent set of expected size N/4
• Deterministic algorithm:
– 3-color vertices (no vertex same color as predecessor/successor)
– Independent set is vertices with most popular color
⇒ Independent set of size at least N/3
• 3-coloring ⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/O algorithm
[Figure: example linked list stored in array]
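The randomized rule is easy to check empirically. A vertex is selected with probability 1/2 · 1/2 = 1/4 (it flips heads and its successor flips tails), and two adjacent vertices cannot both be selected because the second would need both heads and tails. The sketch below assumes the list is given as an array `succ` of successor indices with `None` at the tail; the function name and the seed are illustrative.

```python
import random

def random_independent_set(succ, rng):
    """Select vertices that flip heads and whose successor flips tails."""
    heads = [rng.random() < 0.5 for _ in succ]
    return {v for v, s in enumerate(succ)
            if s is not None and heads[v] and not heads[s]}

# A 10000-vertex list 0 -> 1 -> ... -> 9999
n = 10_000
succ = [i + 1 for i in range(n - 1)] + [None]
ind = random_independent_set(succ, random.Random(1))

# No selected vertex has a selected successor, and the size is close to n/4.
assert all(succ[v] not in ind for v in ind)
print(len(ind) / n)
```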
List Ranking: 3-coloring
• Algorithm:
– Consider forward and backward lists (heads/tails in two lists)
– Color forward lists (except tail) alternately red and blue
– Color backward lists (except tail) alternately green and blue
⇒ 3-coloring
[Figure: example linked list stored in array]
List Ranking: Forward List Coloring
• Identify heads and tails
• For each head, insert red element in priority-queue (priority=position)
• Repeatedly:
– Extract minimal element from queue
– Access and color corresponding element in list
– Insert opposite color element corresponding to successor in queue
• Scan of list
• O(N) priority-queue operations
⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
[Figure: example linked list stored in array]
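The coloring procedure can be sketched with Python's `heapq` standing in for the external priority queue. The sketch makes several assumptions that the slides leave open: an edge is "forward" if it points to a higher array position, a forward (backward) list is a maximal run of forward (backward) edges, backward lists are processed symmetrically in decreasing position order, and the tail of the whole list receives whichever of green or red its predecessor cannot have. The name `three_color` is hypothetical.

```python
import heapq

RED, BLUE, GREEN = "red", "blue", "green"

def three_color(succ):
    """succ[v]: array position of the successor of v, or None at the tail."""
    n = len(succ)
    pred = [None] * n
    for v, s in enumerate(succ):
        if s is not None:
            pred[s] = v

    def forward(v):    # edge leaving v points to a higher position
        return succ[v] is not None and succ[v] > v

    def backward(v):   # edge leaving v points to a lower position
        return succ[v] is not None and succ[v] < v

    color = [None] * n

    def color_runs(in_run, first, second, sign):
        # Heads: vertices that start a run of edges in this direction
        heap = [(sign * v, first) for v in range(n)
                if in_run(v) and (pred[v] is None or not in_run(pred[v]))]
        heapq.heapify(heap)
        while heap:
            key, c = heapq.heappop(heap)     # next position in scan order
            v = sign * key
            color[v] = c
            s = succ[v]
            if in_run(s):                    # s is not the tail of this run
                other = second if c == first else first
                heapq.heappush(heap, (sign * s, other))

    color_runs(forward, RED, BLUE, +1)       # scan by increasing position
    color_runs(backward, GREEN, BLUE, -1)    # scan by decreasing position

    tail = succ.index(None)                  # tail of the whole list
    p = pred[tail]
    color[tail] = GREEN if p is None or forward(p) else RED
    return color
```

Elements leave the queue in position order, so the accesses to the list amount to a scan, and each queue entry carries a color "forward in time" to the successor that will be processed later.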
Summary: List Ranking
• Simplest graph problem: Traverse linked list
• Very easy O(N) algorithm in internal memory
• Much more difficult $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ external memory
– Finding independent set via 3-coloring
– Bridging vertices in/out
• Permuting bound $O(\min\{N, \frac{N}{B}\log_{M/B}\frac{N}{B}\})$ best possible
– Also true for other graph problems
Summary: List Ranking
• External list ranking algorithm similar to PRAM algorithm
– External algorithms can sometimes be obtained by “PRAM algorithm simulation”
• Forward list coloring algorithm example of “time forward processing”
– Use external priority-queue to send information “forward in time” to vertices to be processed later
Algorithms on Trees
TBD
References
• External-Memory Graph Algorithms
Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. Proc. SODA'95
– Section 3-6
• I/O-Efficient Graph Algorithms
Norbert Zeh. Lecture notes
– Section 2-4
• Cache-Oblivious Priority Queue and Graph Algorithm Applications
L. Arge, M. Bender, E. Demaine, B. Holland-Minkley and I. Munro. SICOMP, 36(6), 2007
– Section 3.1-3.2