Problem

Date post: 13-Jan-2016
Upload: gerik
Page 1: Problem

Problem

● Parallelize (serial) applications that use files.
– Examples: compression tools, logging utilities, databases.

● In general
– applications that use files depend on sequential output,
– serial append is the usual file I/O operation.

● Goal:
– perform file I/O operations in parallel,
– keep the sequential, serial append of the file.

Page 2: Problem

Results

● Cilk runtime-support for serial append with good scalability.

● Three serial append schemes and implementations for Cilk:

1. ported Cheerio, a previous parallel file I/O API (M. DeBergalis),

2. simple prototype (with concurrent Linked Lists),

3. extension, more efficient data structure (concurrent double-linked Skip Lists).

● Parallel bz2 using PLIO.

Page 3: Problem

Single Processor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): (nothing written yet)

Page 4: Problem

Single Processor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): 1 2 3

Page 5: Problem

Single Processor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): 1 2 3 4 5 6 7

Page 6: Problem

Single Processor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): 1 2 3 4 5 6 7 8 9 10 11 12

Page 7: Problem

Single Processor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): 1 2 3 4 5 6 7 8 9 10 11 12

Why not in parallel?!

Page 8: Problem

Fast Serial Append

ParalleL file I/O (PLIO) support for Serial Append in Cilk

Alexandru Caracaş

Page 9: Problem

Outline

● Example
– single processor & multiprocessor

● Semantics
– view of Cilk Programmer

● Algorithm
– modification of Cilk runtime system

● Implementation
– Previous work

● Performance
– Comparison

Page 10: Problem

Multiprocessor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): (nothing written yet)

Page 11: Problem

Multiprocessor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): 1 2 7

Page 12: Problem

Multiprocessor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): 1 2 3 5 7 8 9

Page 13: Problem

Multiprocessor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): 1 2 3 4 5 6 7 8 9 10

Page 14: Problem

Multiprocessor Serial Append

[Figure: computation DAG with nodes numbered 1 to 12.]

FILE (serial append): 1 2 3 4 5 6 7 8 9 10 11 12

Page 15: Problem

File Operations

● open (FILE, mode) / close (FILE).

● write (FILE, DATA, size)
– processor writes to its PION.

● read (FILE, BUFFER, size)
– processor reads from PION.

● Note: a seek operation may be required

● seek (FILE, offset, whence)
– processor searches for the right PION in the ordered data structure
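The write, read and seek operations above can be sketched in a few lines of Python. This is only a toy, single-threaded model under stated assumptions: the names `Pion` and `SerialAppendFile` are hypothetical, the PIONs are held in a plain list already in serial order, and the seek uses a binary search over prefix sums of the PION sizes rather than the data structures PLIO uses.

```python
from bisect import bisect_right
from itertools import accumulate


class Pion:
    """Hypothetical PION: the bytes one processor wrote between two steals."""

    def __init__(self, owner):
        self.owner = owner          # ID of the processor writing here
        self.data = bytearray()     # stands in for "pointer to written data"

    def write(self, payload):
        """A processor's write simply appends to its own PION."""
        self.data += payload


class SerialAppendFile:
    """Toy view of a closed, reopened file: PIONs listed in serial order."""

    def __init__(self, pions):
        self.pions = pions
        # ends[i] = file offset just past the last byte of PION i
        self.ends = list(accumulate(len(p.data) for p in pions))
        self.pos = 0

    def size(self):
        return self.ends[-1] if self.ends else 0

    def seek(self, offset):
        """Return (index of the PION holding offset, offset inside it)."""
        if not 0 <= offset < self.size():
            raise ValueError("offset outside file")
        i = bisect_right(self.ends, offset)   # skips empty PIONs as well
        start = self.ends[i - 1] if i else 0
        self.pos = offset
        return i, offset - start

    def read(self, size):
        """Read up to size bytes from the current position, across PIONs."""
        out = bytearray()
        while size > 0 and self.pos < self.size():
            i, inner = self.seek(self.pos)
            chunk = self.pions[i].data[inner:inner + size]
            out += chunk
            self.pos += len(chunk)
            size -= len(chunk)
        return bytes(out)
```

With three PIONs holding `abc`, nothing, and `defg`, `seek(4)` lands on the third PION at inner offset 1, and a following `read(2)` returns `b"ef"`.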

Page 16: Problem

Semantics

● View of Cilk programmer:
– Write operations
  ● preserve the sequential, serial append.
– Read and Seek operations
  ● can occur only after the file has been closed,
  ● or on a newly opened file.

Page 17: Problem

Approach (for Cilk)

● Bookkeeping (to reconstruct serial append)
– Divide execution of the computation,
– Meta-Data (PIONs) about the execution of the computation.

● Observation
– In Cilk, steals need to be accounted for during execution.

● Theorem
– expected # of steals = O(P·T∞), where P is the number of processors and T∞ is the critical-path length of the computation.

● Corollary (see algorithm)
– expected # of PIONs = O(P·T∞).
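The step from the theorem to the corollary can be written out as follows. This is a sketch under one assumption that the slides do not state: exactly one PION exists before any steal (for the processor that starts the computation), and every later PION is created by a steal. Here S is the number of steals and N the number of PIONs; both symbols are introduced only for this derivation.

```latex
\begin{align*}
N &= 1 + S \\
\mathbb{E}[N] &= 1 + \mathbb{E}[S] = 1 + O(P\,T_\infty) = O(P\,T_\infty)
\end{align*}
```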

Page 18: Problem

PION (Parallel I/O Node)

● Definition: a PION represents all the write operations to a file performed by a processor in between 2 steals.

● A PION contains:
– # data bytes written,
– victim processor ID,
– pointer to written data.

[Figure: the FILE (blocks 1 to 12) partitioned among the PIONs π1, π2, π3, π4.]
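As a minimal sketch of the record described above, a PION can be modelled as a small Python dataclass. The field names and the helper `serial_file` are illustrative assumptions, not names from PLIO; the byte count is derived from the data instead of being stored separately.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pion:
    """Sketch of a PION; field names are illustrative only."""
    victim_id: Optional[int]            # processor stolen from (None if no steal)
    data: bytearray = field(default_factory=bytearray)  # the written data

    @property
    def nbytes(self) -> int:            # "# data bytes written"
        return len(self.data)


def serial_file(pions_in_order):
    """Concatenate PION contents in list order to get the serial append."""
    return b"".join(bytes(p.data) for p in pions_in_order)


# Three PIONs already placed in serial order.
example = [
    Pion(None, bytearray(b"123")),
    Pion(0, bytearray(b"456")),
    Pion(0, bytearray(b"789")),
]
```

Calling `serial_file(example)` yields `b"123456789"`: once the PIONs are ordered correctly, the file is just their contents back to back.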

Page 19: Problem

Algorithm

● All PIONs are kept in an ordered data structure.
– very simple example: Linked List.

● On each steal operation performed by processor Pi from processor Pj:
– create a new PION πi,
– attach πi immediately after πj, the PION of Pj, in the ordered data structure.

[Figure: ordered list of PIONs: π1, πj, πk.]

Page 20: Problem

Algorithm

[Figure: ordered list of PIONs π1, πj, πk, with the newly created πi not yet attached.]

Page 21: Problem

Algorithm

[Figure: ordered list of PIONs after the steal: π1, πj, πi, πk.]
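The steal rule can be tried out with a small sequential simulation. The sketch below assumes a plain singly linked list and no concurrency, so it omits the locking a real runtime hook would need; `PionList`, `on_steal` and `demo` are hypothetical names chosen for this example.

```python
class Pion:
    """Hypothetical list node: one processor's writes between two steals."""

    def __init__(self, owner):
        self.owner = owner
        self.data = []       # items written, in order
        self.next = None     # next PION in serial-append order


class PionList:
    """Linked list of PIONs plus each processor's current PION."""

    def __init__(self, first_proc):
        self.head = Pion(first_proc)
        self.current = {first_proc: self.head}

    def on_steal(self, thief, victim):
        """Steal hook: new PION for the thief, right after the victim's."""
        victim_pion = self.current[victim]
        new = Pion(thief)
        new.next = victim_pion.next
        victim_pion.next = new
        self.current[thief] = new

    def write(self, proc, item):
        self.current[proc].data.append(item)

    def contents(self):
        """Walk the list front to back, i.e. read the serial append."""
        out, node = [], self.head
        while node is not None:
            out.extend(node.data)
            node = node.next
        return out


def demo():
    pions = PionList(first_proc=0)
    pions.write(0, 1)
    pions.write(0, 2)
    pions.on_steal(thief=1, victim=0)   # first steal from P0
    pions.write(1, 7)
    pions.write(1, 8)
    pions.on_steal(thief=2, victim=0)   # second steal from P0
    pions.write(2, 5)
    pions.write(0, 3)
    pions.write(2, 6)
    pions.write(0, 4)
    return pions.contents()
```

In `demo()` the writes happen in the interleaved order 1, 2, 7, 8, 5, 3, 6, 4, yet walking the list returns 1 to 8 in serial order. Inserting immediately after the victim's PION is what places the second thief's data (5, 6) ahead of the first thief's (7, 8).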

Page 22: Problem

Implementation

● Modified the Cilk runtime system to support desired operations.
– implemented hooks on the steal operations.

● Initial implementation:
– concurrent Linked List (easier algorithms).

● Final implementation:
– concurrent double-linked Skip List.

● Ported Cheerio to Cilk 5.4.

Page 23: Problem

Details of Implementation

● Each processor has a buffer for the data in its own PIONs
– implemented as a file.

● Data structure to maintain the order of PIONs:
– Linked List, Skip List.

● Meta-Data (order maintenance structure of PIONs)
– kept in memory,
– saved to a file when serial append file is closed.
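One way to picture the split between per-processor data buffers and the ordering metadata is the sketch below. Everything in it is an assumption made for illustration: `io.BytesIO` objects stand in for the per-processor buffer files, a PION is a small dictionary recording (processor, offset, byte count), and JSON is used as the saved metadata format. The slides do not specify the real on-disk layout.

```python
import io
import json

# Per-processor data buffers; the slides use files, BytesIO stands in here.
buffers = {0: io.BytesIO(), 1: io.BytesIO()}


def new_pion(proc):
    """Metadata only: which buffer, where the data starts, how many bytes."""
    return {"proc": proc, "offset": 0, "nbytes": 0}


def append(pion, payload):
    """Append payload to the owning processor's buffer and update the PION."""
    buf = buffers[pion["proc"]]
    buf.seek(0, io.SEEK_END)
    if pion["nbytes"] == 0:
        pion["offset"] = buf.tell()    # first write fixes the start offset
    buf.write(payload)
    pion["nbytes"] += len(payload)


def save_metadata(ordered_pions):
    """Serialize the in-memory PION order, as would happen on close."""
    return json.dumps(ordered_pions)


def read_serial(metadata_json):
    """Rebuild the serial-append contents from metadata plus buffers."""
    out = bytearray()
    for p in json.loads(metadata_json):
        buf = buffers[p["proc"]]
        buf.seek(p["offset"])
        out += buf.read(p["nbytes"])
    return bytes(out)


def demo():
    for buf in buffers.values():       # start from empty buffers
        buf.seek(0)
        buf.truncate()
    p_a, p_c = new_pion(0), new_pion(1)
    append(p_a, b"AB")
    append(p_c, b"EF")                 # written before p_b, but later in order
    p_b = new_pion(0)
    append(p_b, b"CD")
    return read_serial(save_metadata([p_a, p_b, p_c]))
```

`demo()` returns `b"ABCDEF"`: the data stays wherever each processor wrote it, and only the small metadata list decides the order in which it is read back.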

Page 24: Problem

Skip List

[Figure: skip list with four levels, each level terminated by NIL.]

● Similar performance to search trees:
– O ( log (SIZE) ).
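For readers unfamiliar with the structure, here is a compact sequential skip list in Python. It is a textbook-style sketch, not the concurrent double-linked variant used in PLIO, and it orders nodes by an integer key purely for illustration. The class names, `MAX_LEVEL = 16` and the promotion probability of 0.5 are choices made for this example.

```python
import random


class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level    # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 16

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = Node(None, self.MAX_LEVEL)   # sentinel; None plays NIL
        self.level = 1                           # levels currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """Last node with a smaller key on every level, searched top down."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key):
        update = self._predecessors(key)
        level = self._random_level()
        self.level = max(self.level, level)
        node = Node(key, level)
        for i in range(level):           # splice into each level it reaches
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def contains(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def keys(self):
        out, node = [], self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out
```

The upper levels act as express lanes: a search drops down a level only when the next node would overshoot, which is where the expected logarithmic search cost comes from.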

Page 25: Problem

Double-Linked Skip List

[Figure: double-linked skip list with four levels, each level terminated by NIL at both ends.]

● Based on Skip Lists (logarithmic performance).

● Cilk runtime-support in advanced implementation of PLIO as rank order statistics.

Page 26: Problem

PLIO Performance

● no I/O vs writing 100MB with PLIO (w/ linked list),

● Tests were run on yggdrasil, a 32 proc Origin machine.

● Parallelism=32,

● Legend:
– black: no I/O,
– red: PLIO.

[Figure: execution time (seconds, axis from 0 to 16) versus number of processors (1 to 8).]

Page 27: Problem

Improvements & Conclusion

● Possible Improvements:
– Optimization of algorithm:
  ● delete PIONs with no data,
  ● cache oblivious Skip List,
– File system support,
– Experiment with other order maintenance data structures:
  ● B-Trees.

● Conclusion:
– Cilk runtime-support for parallel I/O
  ● allows serial applications dependent on sequential output to be parallelized.

Page 28: Problem

References

– Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pages 356-368, Santa Fe, New Mexico, November 1994.

– Matthew S. DeBergalis. A parallel file I/O API for Cilk. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, June 2000.

– William Pugh. Concurrent Maintenance of Skip Lists. Department of Computer Science, University of Maryland, CS-TR-2222.1, June 1990.

Page 29: Problem

References

– Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms (2nd Edition). MIT Press, Cambridge, Massachusetts, 2001.

– Supercomputing Technology Group MIT Laboratory for Computer Science. Cilk 5.3.2 Reference Manual, November 2001. Available at http://supertech.lcs.mit.edu/cilk/manual-5.3.2.pdf.

– bz2 source code. Available at http://sources.redhat.com/bzip2.

