2006-09-29 Emin Gabrielyan, Three Topics in Parallel Communications 1 Three Topics in Parallel...

Date post: 26-Mar-2015
Page 1: 2006-09-29 Emin Gabrielyan, Three Topics in Parallel Communications 1 Three Topics in Parallel Communications Thesis presentation by Emin Gabrielyan.

Three Topics in Parallel Communications

Thesis presentation by Emin Gabrielyan

Page 2:

Parallel communications: bandwidth enhancement or fault-tolerance?

We do not know whether parallel communications were first used for fault-tolerance or for bandwidth enhancement

In 1964 Paul Baran proposed parallel communications for fault-tolerance (inspiring the design of ARPANET and the Internet)

In 1981 IBM introduced the 8-bit parallel port for faster communication

Page 3:

Bandwidth enhancement by parallelizing the sources and sinks

Bandwidth enhancement can be achieved by adding parallel paths

But a greater capacity enhancement is achieved if we can replace the senders and destinations with parallel sources and sinks

This is possible in parallel I/O (first topic of the thesis)

Page 4:

Parallel transmissions in coarse-grained networks cause congestion

In coarse-grained circuit-switched HPC networks, uncoordinated parallel transmissions cause congestion

The overall throughput degrades due to access conflicts on shared resources

Coordination of parallel transmissions is covered by the second topic of my thesis (liquid scheduling)

Page 5:

Classical backup parallel circuits for fault-tolerance

Typically the redundant resource remains idle

As soon as there is a failure of the primary resource, the backup resource replaces the primary one

Page 6:

Parallelism in living organisms

Parallelism is observed in almost every living organism

Duplication of organs primarily serves for fault-tolerance

And as a secondary purpose, for capacity enhancement

Page 7:

Simultaneous parallelism for fault-tolerance in fine-grained networks

A challenging bio-inspired solution is to use simultaneously all available paths for achieving fault-tolerance

This topic is addressed in the last part of my presentation (capillary routing)

Page 8:

Fine Granularity Parallel I/O for Cluster Computers

SFIO, a Striped File parallel I/O

Page 9:

Why is parallel I/O required

Single I/O gateway for cluster computer saturates

Does not scale with the size of the cluster

Page 10:

What is Parallel I/O for Cluster Computers

Some or all of the cluster computers can be used for parallel I/O

Page 11:

Objectives of parallel I/O

Resistance to concurrent access

Scalability as the number of I/O nodes increases

High level of parallelism and load balance for all application patterns and all types of I/O requests

Page 12:

Concurrent Access by Multiple Compute Nodes

No concurrent access overheads

No performance degradation when the number of compute nodes increases

(Figure label: Parallel I/O Subsystem)

Page 13:

Scalable throughput of the parallel I/O subsystem

The overall parallel I/O throughput should increase linearly as the number of I/O nodes increases

(Figure: Parallel I/O Subsystem; plot of Throughput versus Number of I/O Nodes)

Page 14:

Concurrency and Scalability = Scalable All-to-All Communication

Concurrency and Scalability (as the number of I/O nodes increases) can be represented by scalable overall throughput when the number of compute and I/O nodes increases

(Figure: I/O Nodes and Compute Nodes; plot of All-to-All Throughput versus Number of I/O and Compute Nodes)

Page 15:

High level of parallelism and load balance

Balanced distribution across parallel disks must be ensured:

For all types of application patterns

Using small or large I/O requests

Continuous or fragmented I/O request patterns

Page 16:

How is parallelism achieved?

Split the logical file into stripes

Distribute the stripes cyclically across the subfiles

(Figure: a Logical file striped across six Subfiles, file1 to file6)
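
The cyclic distribution described above can be sketched in C as a mapping from a logical file offset to a subfile index and an offset inside that subfile. This is only a minimal sketch: `stripe_pos` and `stripe_locate` are hypothetical names that are not part of the SFIO interface, and a fixed stripe unit size is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Where a byte of the logical file lives after cyclic striping. */
typedef struct {
    size_t subfile;      /* index of the subfile, 0-based */
    size_t local_offset; /* byte offset inside that subfile */
} stripe_pos;

/* Hypothetical helper (not part of SFIO): maps a logical offset to a
 * subfile and a local offset, for a stripe unit of `unit` bytes and
 * `nsub` subfiles. */
static stripe_pos stripe_locate(size_t offset, size_t unit, size_t nsub)
{
    assert(unit > 0 && nsub > 0);
    size_t stripe = offset / unit;           /* global stripe number */
    stripe_pos p;
    p.subfile = stripe % nsub;               /* stripes go round-robin */
    p.local_offset = (stripe / nsub) * unit  /* stripes already stored there */
                   + offset % unit;          /* position inside the stripe */
    return p;
}
```

With a stripe unit of 5 bytes and 2 subfiles, for example, logical offset 13 falls in stripe 2, which is the second stripe stored in subfile 0, so it lands at local offset 8.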

Page 17:

The POSIX-like Interface of Striped File I/O

Using SFIO from MPI

Simple POSIX-like interface

#include <mpi.h>
#include "/usr/local/sfio/mio.h"

int _main(int argc, char *argv[])
{
  MFILE *f;
  int r = rank();

  // Collective open operation (two subfiles; 5 matches the
  // 5-character stripes shown on the next page)
  f = mopen("p1/tmp/a.dat;p2/tmp/a.dat;", 5);

  // Each process writes 8 to 14 characters at its own position
  if (r == 0) mwritec(f, 0, "Good*morning!", 13);
  if (r == 1) mwritec(f, 13, "Bonjour!", 8);
  if (r == 2) mwritec(f, 21, "Buona*mattina!", 14);

  mclose(f); // Collective close operation
  return 0;
}

Page 18:

Distribution of the global file data across the subfiles

Example with three compute nodes and two I/O nodes

Global file (the three writes start at offsets 0, 13 and 21): Good* | morni | ng!Bo | njour | !Buon | a*mat | tina!

First subfile: Good* | ng!Bo | !Buon | tina!

Second subfile: morni | njour | a*mat
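
The layout of this example can be reproduced with a short C sketch that splits the 35-character global file into 5-byte stripes and deals them out to two in-memory buffers. `stripe_scatter` is a hypothetical helper written for illustration; it only copies into local buffers and does not model the transfer to remote I/O nodes.

```c
#include <assert.h>
#include <string.h>

enum { MAX_SUBFILES = 16 };

/* Hypothetical helper: copies `len` bytes of the global file into `nsub`
 * in-memory subfiles, stripe by stripe, in cyclic order. Every subfile
 * buffer must be large enough; each result is NUL-terminated. */
static void stripe_scatter(const char *global, size_t len, size_t unit,
                           size_t nsub, char **subfiles)
{
    size_t used[MAX_SUBFILES] = {0};     /* bytes written to each subfile */
    assert(unit > 0 && nsub > 0 && nsub <= MAX_SUBFILES);
    for (size_t off = 0; off < len; off += unit) {
        size_t n = (len - off < unit) ? len - off : unit;
        size_t s = (off / unit) % nsub;  /* cyclic choice of the subfile */
        memcpy(subfiles[s] + used[s], global + off, n);
        used[s] += n;
    }
    for (size_t s = 0; s < nsub; s++)
        subfiles[s][used[s]] = '\0';
}
```

Applied to the string "Good*morning!Bonjour!Buona*mattina!" with a stripe unit of 5 and two subfiles, the first buffer receives the stripes "Good*", "ng!Bo", "!Buon", "tina!" and the second receives "morni", "njour", "a*mat".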

Page 19:

Impact of the stripe unit size on the load balance

When the stripe unit size is large there is no guarantee that an I/O request will be well parallelized

(Figure labels: subfiles; Logical file; I/O Request)

Page 20:

Fine granularity striping with good load balance

Fine granularity (a small stripe unit) ensures good load balance and a high level of parallelism

But results in high network communication and disk access cost

(Figure labels: subfiles; Logical file; I/O Request)

Page 21:

Fine granularity striping is to be maintained

Most of the HPC parallel I/O solutions are optimized only for large I/O blocks (order of Megabytes)

But we focus on maintaining fine granularity

The problems of network communication and disk access are addressed by dedicated optimizations

Page 22:

Overview of the implemented optimizations

Disk access requests aggregation (sorting, cleaning-overlaps and merging)

Network communication aggregation

Zero-copy streaming between network and fragmented memory patterns (MPI derived datatypes)

Support of the multi-block interface efficiently optimizes application related file and memory fragmentations (MPI-I/O)

Overlapping of network communication with disk access in time (at the moment write operation only)

Page 23:

Disk access optimizations

Sorting

Cleaning the overlaps

Merging

Input: striped user I/O requests

Output: optimized set of I/O requests

No data copy

(Figure labels: Multi-block I/O request; block 1, bk. 2, block 3; access1, access2; Local subfile)

6 I/O access requests are merged into 2
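
The three steps (sorting, cleaning the overlaps, merging) can be sketched in C on a plain list of byte ranges. This is an illustrative sketch, not the SFIO implementation: `io_req` and `merge_requests` are hypothetical names, and only offsets and lengths are manipulated, so no data is copied.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { long off; long len; } io_req; /* byte range [off, off+len) */

static int cmp_off(const void *a, const void *b)
{
    const io_req *x = a, *y = b;
    return (x->off > y->off) - (x->off < y->off);
}

/* Sorts the requests by offset, then merges requests that overlap or touch.
 * Works in place and returns the number of requests that remain. */
static size_t merge_requests(io_req *r, size_t n)
{
    if (n == 0) return 0;
    assert(r != NULL);
    qsort(r, n, sizeof r[0], cmp_off);
    size_t out = 0;                        /* index of the last merged request */
    for (size_t i = 1; i < n; i++) {
        long end = r[out].off + r[out].len;
        if (r[i].off <= end) {             /* overlapping or contiguous */
            long new_end = r[i].off + r[i].len;
            if (new_end > end) r[out].len = new_end - r[out].off;
        } else {
            r[++out] = r[i];               /* gap: start a new request */
        }
    }
    return out + 1;
}
```

For instance, the six ranges (10,5), (0,5), (5,5), (30,5), (33,4), (37,3), given as (offset, length), collapse into the two accesses (0,15) and (30,10).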

Page 24:

Network Communication Aggregation without Copying

Striping across 2 subfiles

Derived datatypes on the fly

Contiguous streaming

Logical file

From: application memory

Remote I/O node 1

Remote I/O node 2

To: remote I/O nodes

Page 25:

Functional Architecture

SFIO library on compute node

Blue: Interface functions

Green: Striping functionality

Red: I/O request optimizations

Orange: Network communication and relevant optimizations

bkmerge: overlapping and aggregation

mkbset: creates on the fly MPI derived datatypes

(Diagram labels, compute node side: mread, mwrite; mreadc, mreadb, mwritec, mwriteb; mrw (cyclic distribution); sfp_rflush, sfp_wflush; sfp_readc, sfp_writec; sfp_rdwrc (request caching); flushcache; sortcache; sfp_read, sfp_write; sfp_readb, sfp_writeb; bkmerge; mkbset; sfp_waitall; SFP_CMD_WRITE, SFP_CMD_READ, SFP_CMD_BREAD, SFP_CMD_BWRITE; MPI. I/O Node side: MPI; I/O Listener)

Page 26:

Optimized throughput as a function of the stripe unit size

3 I/O nodes

1 compute node

Global file size: 660 Mbytes

TNET

About 10 MB/s per disk

(Chart: Write throughput (MB/s), axis from 0 to 30, versus Stripe unit size (bytes), axis from 50 to 50000; series: non-optimized, optimized)

Page 27:

All-to-all stress test on Swiss-Tx cluster supercomputer

Stress test is carried out on Swiss-Tx machine

8 full crossbar 12-port TNet switches

64 processors

Link throughput is about 86 MB/s

Page 28:

SFIO on the Swiss-Tx cluster supercomputer

MPI-FCI

Global file size: up to 32 GB

Mean of 53 measurements for each number of nodes

Nearly linear scaling with 200 bytes stripe unit!

Network is a bottleneck above 12 nodes

(Chart: Overall all-to-all throughput (MB/s), axis from 0 to 400, versus Number of compute and I/O nodes, axis from 1 to 31; series: write maximum, write average, read maximum, read average)

Page 29:

Liquid scheduling for low-latency circuit-switched networks

Reaching liquid throughput in HPC wormhole switching and in Optical lightpath routing networks

Page 30:

Upper limit of the network capacity

Given is a set of parallel transmissions and a routing scheme

The upper limit of the network’s aggregate capacity is its liquid throughput

Page 31:

Distinction: Packet Switching versus Circuit Switching

Packet switching has been replacing circuit switching since 1970 (more flexible, manageable, scalable)

New circuit switching networks are emerging (HPC clusters, Optical switching)

In HPC, wormhole routing targets extremely low latency requirements

In optical networks, packet switching is not possible due to lack of technology

Page 32:

Coarse-Grained Networks

In circuit switching the large messages are transmitted entirely (coarse-grained switching)

Low latency

The sink starts receiving the message as soon as the sender starts transmission

(Figure labels: Message Source; Message Sink; Fine-Grained Packet switching; Coarse-grained Circuit switching)

Page 33:

Parallel transmissions in coarse-grained networks

When the nodes transmit in parallel across a coarse-grained network in uncoordinated fashion congestion may occur

The resulting throughput can be far below the expected liquid throughput

Page 34:

Congestions and blocked paths in wormhole routing

When the message encounters a busy outgoing port it waits

The previous portion of the path remains occupied

Source1

Sink2

Sink1

Source2

Sink3

Source3

Page 35:

Hardware solution in Virtual Cut-Through routing

In VCT when the port is busy

The switch buffers the entire message

Much more expensive hardware than in wormhole switching

Source1

Sink2

Sink1

Source2

Sink3

Source3

buffering

Page 36:

Other hardware solutions

In optical networks OEO conversion can be used

Significant impact on the cost (vs. memory-less wormhole switch and MEMS optical switches)

Affecting the properties of the network (e.g. latency)

Page 37:

Application level coordinated liquid scheduling

Liquid scheduling is a software solution

Implemented at the application level

No investments in network hardware

Coordination between the edge nodes is required

Network topology knowledge is assumed

Page 38:

Example of a simple traffic pattern

5 sending nodes (above)

5 receiving nodes (below)

2 switches

12 links of equal capacity

Traffic consists of 25 transfers

Page 39:

Round robin schedule of all-to-all traffic pattern

First, all nodes simultaneously send the message to the node in front

Then, simultaneously, to the next node

etc

Page 40:

Throughput of round-robin schedule

The 3rd and 4th phases each require two timeframes

7 timeframes are needed in total

Link throughput = 1 Gbps

Overall throughput = 25/7 x 1 Gbps = 3.57 Gbps
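
The count of 7 timeframes can be checked with a small C sketch. The exact wiring of the example is not given in this transcript, so the topology below is an assumption chosen to be consistent with the stated figures (5 senders, 5 receivers, 2 switches, 12 links): senders 0 to 2 and receivers 0 to 2 on one switch, senders 3 to 4 and receivers 3 to 4 on the other, and one inter-switch link per direction. `links_of` and `round_robin_timeframes` are hypothetical names.

```c
#include <assert.h>

enum { NODES = 5, LINKS = 12 };

/* ASSUMED topology (not stated on the slides): senders 0-2 and receivers
 * 0-2 hang off switch A, senders 3-4 and receivers 3-4 off switch B.
 * Links 0-4: sender uplinks, 5-9: receiver downlinks, 10: A->B, 11: B->A.
 * A transfer is the set of links on its path, stored as a bit mask. */
static unsigned links_of(int s, int r)
{
    assert(s >= 0 && s < NODES && r >= 0 && r < NODES);
    unsigned m = (1u << s) | (1u << (NODES + r));
    if (s < 3 && r >= 3) m |= 1u << 10;
    if (s >= 3 && r < 3) m |= 1u << 11;
    return m;
}

/* Timeframes used by the round-robin schedule: in phase k every sender s
 * sends to receiver (s + k) % NODES. In this pattern a phase needs as many
 * timeframes as the load of its most loaded link. */
static int round_robin_timeframes(void)
{
    int total = 0;
    for (int k = 0; k < NODES; k++) {
        int load[LINKS] = {0}, worst = 0;
        for (int s = 0; s < NODES; s++) {
            unsigned m = links_of(s, (s + k) % NODES);
            for (int l = 0; l < LINKS; l++)
                if ((m & (1u << l)) && ++load[l] > worst)
                    worst = load[l];
        }
        total += worst;
    }
    return total;
}
```

Under this assumed topology the third and fourth phases (k = 2 and k = 3) each put two transfers on an inter-switch link, which gives 1 + 1 + 2 + 2 + 1 = 7 timeframes.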

Page 41:

A liquid schedule and its throughput

6 timeframes of non-congesting transfers

Overall throughput = 25/6 x 1 Gbps = 4.17 Gbps
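
As a worked comparison of the two schedules, assuming every transfer occupies a 1 Gbps link for exactly one timeframe, the overall throughput is the number of transfers divided by the number of timeframes (values rounded to two decimals; the symbols $T_{\text{round-robin}}$ and $T_{\text{liquid}}$ are introduced here only for illustration):

```latex
\[
T_{\text{round-robin}} = \frac{25}{7}\times 1\,\text{Gbps} \approx 3.57\,\text{Gbps},
\qquad
T_{\text{liquid}} = \frac{25}{6}\times 1\,\text{Gbps} \approx 4.17\,\text{Gbps},
\qquad
\frac{T_{\text{liquid}}}{T_{\text{round-robin}}} = \frac{7}{6} \approx 1.17
\]
```

The liquid schedule therefore carries the same traffic about 17% faster than the round-robin schedule in this example.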

Page 42:

Problem of liquid scheduling

Building liquid schedule for arbitrary traffic of transfers

Problem of partitioning of the traffic into minimal number of subsets consisting of non-congesting transfers

Timeframe = a subset of non-congesting transfers

Page 43:

Definitions of our mathematical model

Transfer is a set of links lying on the path of the transmission

Load of a link is the number of transfers in the traffic using that link

Most loaded links are called bottlenecks

Duration of the traffic is the load of its bottlenecks
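
These definitions translate directly into C if a transfer is stored as a bit mask of the links on its path. The sketch below is illustrative only; the function names are hypothetical, and the 5-by-5 example traffic uses an assumed topology (senders and receivers 0 to 2 on one switch, 3 to 4 on the other, one inter-switch link per direction) that is consistent with, but not stated by, the earlier example of 12 links and 25 transfers.

```c
#include <assert.h>

enum { NODES = 5, LINKS = 12 };

/* ASSUMED topology: links 0-4 sender uplinks, 5-9 receiver downlinks,
 * 10: switch A -> switch B, 11: switch B -> switch A. */
static unsigned links_of(int s, int r)
{
    unsigned m = (1u << s) | (1u << (NODES + r));
    if (s < 3 && r >= 3) m |= 1u << 10;
    if (s >= 3 && r < 3) m |= 1u << 11;
    return m;
}

/* Fills `traffic` with the 25 all-to-all transfers; returns their count. */
static int all_to_all(unsigned *traffic)
{
    int n = 0;
    for (int s = 0; s < NODES; s++)
        for (int r = 0; r < NODES; r++)
            traffic[n++] = links_of(s, r);
    return n;
}

/* Load of every link: the number of transfers whose path uses it. */
static void link_loads(const unsigned *traffic, int n, int *load)
{
    for (int l = 0; l < LINKS; l++) load[l] = 0;
    for (int t = 0; t < n; t++)
        for (int l = 0; l < LINKS; l++)
            if (traffic[t] & (1u << l)) load[l]++;
}

/* Duration of the traffic: the load of its most loaded links. */
static int duration(const unsigned *traffic, int n)
{
    int load[LINKS], d = 0;
    assert(n >= 0);
    link_loads(traffic, n, load);
    for (int l = 0; l < LINKS; l++)
        if (load[l] > d) d = load[l];
    return d;
}

/* Bit mask of the bottlenecks, i.e. the links whose load equals the duration. */
static unsigned bottlenecks(const unsigned *traffic, int n)
{
    int load[LINKS], d = duration(traffic, n);
    unsigned mask = 0;
    link_loads(traffic, n, load);
    for (int l = 0; l < LINKS; l++)
        if (d > 0 && load[l] == d) mask |= 1u << l;
    return mask;
}
```

In this assumed example every sender and receiver link carries 5 transfers while each inter-switch link carries 6, so the two inter-switch links are the bottlenecks and the duration of the traffic is 6 timeframes.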

Page 44:

Teams = non-congesting transfers using all bottleneck links

The shortest possible time to carry out the traffic is the active time of the bottleneck links

Then the schedule must keep the bottleneck links busy all the time

Therefore the timeframes of a liquid schedule must consist of transfers using all bottlenecks

(Figure labels: bottlenecks; team; not a team)

Page 45:

Retrieval of teams without repetitions by subdivisions

Teams can be retrieved without repetitions by recursive partitioning

By a choice of a transfer all teams are divided into teams using that transfer and teams not using it

Each half can be similarly subdivided until individual teams are retrieved
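
The recursive partitioning can be sketched in C as follows, with each transfer stored as a bit mask of the links on its path. At every step one transfer is chosen and the teams are split into those using it and those not using it, so each team is reached exactly once. `count_teams` is a hypothetical name, and this plain version applies none of the skeleton or saturation optimizations discussed elsewhere in the talk.

```c
#include <assert.h>

/* Counts every team exactly once. A team is a set of transfers in which no
 * two transfers share a link (non-congesting) and which together use all
 * links in `bottlenecks`.
 *   traffic[t]  : bit mask of the links used by transfer t
 *   next        : index of the transfer chosen for the current subdivision
 *   used_links  : links already occupied by the transfers taken so far */
static long count_teams(const unsigned *traffic, int n, int next,
                        unsigned used_links, unsigned bottlenecks)
{
    assert(next >= 0 && next <= n);
    if (next == n)  /* every transfer is decided: is the result a team? */
        return (used_links & bottlenecks) == bottlenecks;
    long teams = 0;
    /* First half: teams using transfer `next` (only if it does not congest). */
    if ((traffic[next] & used_links) == 0)
        teams += count_teams(traffic, n, next + 1,
                             used_links | traffic[next], bottlenecks);
    /* Second half: teams not using it. */
    teams += count_teams(traffic, n, next + 1, used_links, bottlenecks);
    return teams;
}
```

A call such as `count_teams(traffic, n, 0, 0u, bottleneck_mask)` starts the subdivision from the first transfer. Replacing the counter at the leaf with a callback would retrieve the teams themselves instead of counting them.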

Page 46:

Teams use all bottlenecks: retrieving teams of traffic skeleton

Since teams must use transfers using the bottleneck links

We can first create teams using only such transfers (traffic skeleton)

(Chart: fraction of the traffic skeleton. Fraction of transfers using bottlenecks, axis from 0% to 100%, versus Number of transfers (and number of contributing nodes) for 362 different traffic patterns across Swiss-Tx cluster, axis from 0 transfers (0 nodes) to 900 transfers (30 nodes))

Page 47:

Optimization by first retrieving the teams of the skeleton

Speedup: by skeleton optimization

Reducing the search space 9.5 times

(Chart: Search space reduction (%), axis from 0% to 35%, versus Number of possible full teams (and number of transfers) for 23 different traffic patterns across the Swiss-Tx cluster; legend: full, idle+skeleton+blank, idle+blank, blank)

(Data labels, in chart order: 4.7, 5.5, 7.4, 7.9, 8.1, 8.3, 9.2, 9.3, 9.6, 9.9, 10.0, 10.1, 10.7, 10.8, 10.9, 11.3, 12.0, 12.2, 12.6, 12.7, 13.4, 14.0, 20.0)

(Horizontal axis labels, full teams (transfers), in chart order: 466.6K (100), 926.2K (121), 4.2M (121), 4.2M (121), 212K (100), 4.9M (121), 4.1M (121), 9.2M (121), 693.2K (100), 14.1M (121), 15.2M (121), 753.7K (100), 682K (100), 936K (100), 1.2M (100), 88.1K (81), 95K (81), 115.9K (81), 1.8M (100), 57.6K (81), 9.2K (64), 136.7K (81), 14.2M (121))

Page 48:

Liquid schedule assembling from retrieved teams

By relying on efficient retrieval of full teams (subsets of non-congesting transfers using all bottlenecks)

We assemble a liquid schedule by trying different combinations of teams together

Until all transfers of the traffic are used
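
A minimal backtracking sketch of this assembling step is given below in C. It is not the optimized algorithm of the thesis: it picks a team for the traffic that is still unscheduled, recurses on the rest, and backtracks when no team can be found. All names are hypothetical; transfers are bit masks of links, at most 32 transfers and 16 links are supported, and each team is required to cover the bottlenecks of the remaining traffic.

```c
#include <assert.h>

enum { MAX_LINKS = 16, MAX_FRAMES = 32 };

static int assemble(const unsigned *traffic, int n, unsigned remaining,
                    unsigned *frames, int depth);

/* Bit mask of the bottleneck links of the transfers still in `remaining`. */
static unsigned bottleneck_links(const unsigned *traffic, int n,
                                 unsigned remaining)
{
    int load[MAX_LINKS] = {0}, max = 0;
    unsigned mask = 0;
    for (int t = 0; t < n; t++)
        if ((remaining >> t) & 1u)
            for (int l = 0; l < MAX_LINKS; l++)
                if (((traffic[t] >> l) & 1u) && ++load[l] > max)
                    max = load[l];
    for (int l = 0; l < MAX_LINKS; l++)
        if (max > 0 && load[l] == max) mask |= 1u << l;
    return mask;
}

/* Decides transfers next..n-1 one by one (taken first, then skipped). Once
 * a team covering all links in `need` is complete, schedules the rest. */
static int grow_team(const unsigned *traffic, int n, unsigned remaining,
                     int next, unsigned team, unsigned used, unsigned need,
                     unsigned *frames, int depth)
{
    if (next == n) {
        if ((used & need) != need) return 0;       /* not a team */
        assert(depth < MAX_FRAMES);
        frames[depth] = team;
        return assemble(traffic, n, remaining & ~team, frames, depth + 1);
    }
    if (((remaining >> next) & 1u) && (traffic[next] & used) == 0) {
        int r = grow_team(traffic, n, remaining, next + 1,
                          team | (1u << next), used | traffic[next],
                          need, frames, depth);
        if (r) return r;
    }
    return grow_team(traffic, n, remaining, next + 1, team, used, need,
                     frames, depth);
}

/* Returns the number of timeframes written to `frames` (each one a bit set
 * of transfer indices), or 0 if no liquid schedule was found. */
static int assemble(const unsigned *traffic, int n, unsigned remaining,
                    unsigned *frames, int depth)
{
    if (remaining == 0) return depth;
    unsigned need = bottleneck_links(traffic, n, remaining);
    return grow_team(traffic, n, remaining, 0, 0u, 0u, need, frames, depth);
}
```

A call such as `assemble(traffic, n, all_transfers_mask, frames, 0)` starts with the whole traffic. Because every team uses all current bottlenecks, the duration of the remaining traffic drops by one per timeframe, so a schedule found this way has as many timeframes as the duration of the original traffic.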

Page 49:

Liquid schedule assembling optimizations (reduced traffic)

Proved. If we remove a team from a traffic, new bottlenecks can emerge

New bottlenecks add additional constraints on the teams of the reduced traffic

Proved. A liquid schedule can be assembled if we use teams of the reduced traffic (instead of constructing teams of the initial traffic from the remaining transfers)

Proved. A liquid schedule can be assembled by considering only saturated full teams

Page 50:

Liquid schedule construction speed with our algorithm

360 traffic patterns across Swiss-Tx network

Up to 32 nodes

Up to 1024 transfers

Comparison of our optimized construction algorithm with MILP method (optimized for discrete optimization problems)

(Chart: CPU time in seconds, logarithmic axis from 0.001 to 100000, for 362 sample topologies; series: MILP Cplex method, Liquid schedule construction algorithm)

Page 51:

Carrying real traffic patterns according to liquid schedules

Swiss-Tx supercomputer cluster network is used for testing aggregate throughputs

Traffic patterns are carried out according to liquid schedules

Compare with topology-unaware round robin or random schedules

Page 52:

Theoretical liquid and round-robin throughputs of 362 traffic samples

362 traffic samples across Swiss-Tx network

Up to 32 nodes

Traffic carried out according to round robin schedule reaches only 1/2 of the potential network capacity

(Chart: Overall throughput (MB/s), axis from 0 to 1800, versus number of transfers (and nodes), axis from 0 transfers (0 nodes) to 900 transfers (30 nodes); series: liquid throughput, round-robin schedule)

Page 53:

Throughput of traffic carried out according to liquid schedules

Traffic carried out according to liquid schedule practically reaches the theoretical throughput

(Chart: Overall throughput (MB/s), axis from 200 to 1800, versus number of transfers (and nodes), axis from 1 transfer (1 node) to 961 transfers (31 nodes); series: theoretical liquid throughput, measured throughput of a topology-unaware schedule, measured throughput of a liquid schedule)

Page 54:

Liquid scheduling conclusions: application, optimization, speedup

In HPC networks, large messages are “copied” across the network causing congestions

Arbitrarily transmitted transfers yield throughput below the theoretical capacity

Liquid scheduling: relies on network topology and reaches the theoretical liquid throughput of the network

Liquid schedules can be constructed in less than 0.1 sec for traffic patterns with 1000 transmissions (about 100 nodes)

Future work: dynamic traffic patterns and application in OBS

Page 55:

Fault-tolerant streaming with Capillary-routing

Path diversity and Forward Error Correction codes at the packet level

Page 56

Structure of my talk

The advantages of packet-level FEC in off-line streaming

Solving the difficulties of real-time streaming by multi-path routing

Generating multi-path routing patterns of various path diversity

Level of the path diversity and the efficiency of the routing pattern for real-time streaming

Page 57

Decoding a file with Digital Fountain Codes

A file is divided into packets

A digital fountain code generates numerous checksum packets

Any sufficiently large set of checksum packets recovers the file

As when filling a cup, all that matters is collecting a sufficient amount of drops
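The "any sufficient set of packets" property can be sketched with a toy random linear code over GF(2). This is not the code used in the thesis and not a production fountain code (practical ones such as LT or Raptor codes use sparse combinations and faster decoders); it is a minimal illustration in which each checksum packet is the XOR of a random subset of source packets and the receiver decodes by Gaussian elimination. All function names are hypothetical, and source packets are modelled as integers.

```python
import random

def encode_packet(source, rng):
    """Return (mask, payload): payload is the XOR of the source packets picked by mask."""
    k = len(source)
    mask = rng.getrandbits(k) or 1  # skip the useless all-zero combination
    payload = 0
    for i in range(k):
        if (mask >> i) & 1:
            payload ^= source[i]
    return mask, payload

def decode(packets, k):
    """Gaussian elimination over GF(2); returns the source or None if more packets are needed."""
    pivots = {}  # pivot bit -> (mask, payload), kept fully reduced
    for mask, payload in packets:
        # Remove every already-known pivot bit from the incoming packet.
        for bit, (pmask, ppayload) in pivots.items():
            if (mask >> bit) & 1:
                mask ^= pmask
                payload ^= ppayload
        if mask == 0:
            continue  # packet carries no new information
        bit = mask.bit_length() - 1
        # Remove the new pivot bit from the rows stored so far.
        for other in list(pivots):
            omask, opayload = pivots[other]
            if (omask >> bit) & 1:
                pivots[other] = (omask ^ mask, opayload ^ payload)
        pivots[bit] = (mask, payload)
    if len(pivots) < k:
        return None
    return [pivots[i][1] for i in range(k)]

def transmit(source, loss_rate, seed=0, max_packets=10_000):
    """Send checksum packets over a lossy channel until the receiver can decode."""
    rng = random.Random(seed)
    received = []
    for sent in range(1, max_packets + 1):
        packet = encode_packet(source, rng)
        if rng.random() < loss_rate:
            continue  # packet lost; no feedback, the sender simply keeps going
        received.append(packet)
        decoded = decode(received, len(source))
        if decoded is not None:
            return decoded, sent
    raise RuntimeError("not enough packets were delivered")
```

Which packets are lost does not matter here: the receiver only needs enough linearly independent packets, which is the property the cup analogy describes.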

Page 58

Transmitting large files without feedback across lossy networks using digital fountain codes

Sender transmits the checksum packets instead of the source packets

Interruptions cause no problems

The file is recovered once a sufficient number of packets is delivered

FEC in off-line streaming relies on time stretching

Page 59

In Real-time streaming the receiver play-back buffering time is limited

While in off-line streaming the data can be held in the receiver buffer …

In real-time streaming the receiver is not permitted to keep data too long in the playback buffer

Page 60

Long failures on a single path route

If the failures are short, then by transmitting a large number of FEC packets the sender can ensure that the receiver always has a sufficient number of checksum packets in time

If the failure lasts longer than the playback buffering limit, no FEC can protect the real-time communication

Page 61

Applicability of FEC in real-time streaming by using path diversity

[Figure labels: Reliable off-line streaming; Reliable real-time streaming; Time stretching; Playback buffer limit; Real-time streaming; Path diversity]

Losses can be recovered by extra packets:

received later (in off-line streaming)

received via another path (in real-time streaming)

Path diversity replaces time-stretching

Page 62

Creating an axis of multi-path patterns

Intuitively we imagine the path diversity axis as shown

High diversity decreases the impact of individual link failures, but uses many more links, increasing the overall failure probability

We must study many multi-path routing patterns of different diversity to determine which of these two effects dominates

[Figure: path diversity axis, from single-path routing through multi-path routing patterns of increasing path diversity]

Page 63

Capillary routing creates solutions with different levels of path diversity

As a method for obtaining multi-path routing patterns of various path diversity we rely on the capillary routing algorithm

For any given network and pair of nodes, capillary routing produces, layer by layer, routing patterns of increasing path diversity

Path diversity = Layer of Capillary Routing

Page 64

Capillary routing - introduction

Capillary routing first offers a simple multi-path routing pattern

At each successive layer it recursively spreads out individual sub-flows of previous layers

The path diversity develops as the layer number increases

The construction relies on linear programming (LP)

Page 65

Capillary routing – first layer

First take the shortest path flow and minimize the maximal load of all links

This will split the flow over a few parallel routes

[Figure label: Reduce the maximal load of all links]
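The first-layer objective can be sketched without an LP solver. The talk states that the construction relies on LP; the block below swaps in a different, simpler technique that gives the same optimum for this one step under stated assumptions: if links are directed, distinct, of equal capacity and never appear in both directions, then the smallest achievable maximal load for a unit flow is 1 divided by the maximum number of link-disjoint paths, which an Edmonds-Karp max-flow finds. The function names and the example network are hypothetical.

```python
from collections import defaultdict, deque
from fractions import Fraction

def max_flow_unit(links, source, sink):
    """Edmonds-Karp on directed links of capacity 1; returns (value, residual capacities)."""
    residual = defaultdict(int)
    neighbours = defaultdict(list)
    for u, v in links:
        residual[(u, v)] = 1
        neighbours[u].append(v)
        neighbours[v].append(u)  # reverse direction, used only to undo flow
    value = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return value, residual
        # Push one unit of flow along the path that was found.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= 1
            residual[(v, u)] += 1
            v = u
        value += 1

def first_layer_loads(links, source, sink):
    """Loads of a unit flow whose maximal link load is as small as possible."""
    value, residual = max_flow_unit(links, source, sink)
    if value == 0:
        raise ValueError("sink is not reachable from source")
    # A saturated link carries 1/value of the unit flow, an unused link carries 0.
    return {link: Fraction(1 - residual[link], value) for link in links}

# Hypothetical network: two parallel routes s-a-t and s-b-t plus a cross link a-b.
network = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]
loads = first_layer_loads(network, "s", "t")
```

In this small network the unit flow is split evenly over the two parallel routes, so no link carries more than half of the traffic. Later layers, which minimize the load of the remaining links, are not covered by this sketch.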

Page 66

Capillary routing – second layer

Then identify the bottleneck links of the first layer

And minimize the flow of the remaining links

Continue similarly, until the full routing pattern is discovered layer by layer

[Figure label: Reduce the load of the remaining links]

Page 67

Capillary Routing Layers

Single network

4 routing patterns

Increasing path diversity

Page 68

Application model: evaluating the efficiency of path diversity

To evaluate the efficiencies of patterns with different path diversities we rely on an application model where:

The sender uses a constant amount of FEC checksum packets to combat weak losses and

The sender dynamically increases the number of FEC packets in case of serious failures

[Figure: an FEC block made up of source packets and redundant packets]

Page 69

Strong FEC codes are used in case of serious failures

[Figure labels: Packet Loss Rate = 3%; Packet Loss Rate = 30%]

When the packet loss rate observed at the receiver is below the tolerable limit, the sender transmits at its usual rate

But when the packet loss rate exceeds the tolerable limit, the sender adaptively increases the FEC block size by adding more redundant packets
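The adaptive behaviour described here can be sketched as follows. The sizing rule is an assumption, not taken from the slides: it supposes an ideal erasure code in which any `k` received packets of a block recover the `k` source packets, so a block of `n` packets survives a loss rate `p` when `n * (1 - p) >= k`. The function names and the choice of 20 source packets per block are hypothetical.

```python
import math

def fec_block_size(source_packets, loss_rate):
    """Smallest block size n such that n * (1 - loss_rate) >= source_packets.

    Assumption: ideal erasure code, any source_packets received packets suffice.
    """
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    return math.ceil(source_packets / (1.0 - loss_rate))

def adapt_block_size(source_packets, tolerable_loss, observed_loss):
    """Keep the default block while losses are tolerable, otherwise enlarge it."""
    return fec_block_size(source_packets, max(tolerable_loss, observed_loss))

# A stream sized to tolerate 5% loss, observed at 3% and then at 30% loss.
usual_block = adapt_block_size(20, tolerable_loss=0.05, observed_loss=0.03)
failure_block = adapt_block_size(20, tolerable_loss=0.05, observed_loss=0.30)
```

With these numbers the block stays at its default size while the observed loss is 3% and grows once the loss reaches 30%; the extra packets in the enlarged block are the dynamically added redundancy.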

Page 70

Redundancy Overall Requirement

The overall amount of dynamically transmitted redundant packets during the whole communication time is proportional:

to the duration of communication and the usual transmission rate

to a single link failure frequency and its average duration

and to a coefficient characterizing the given multi-path routing pattern

Page 71

Equation for ROR: it depends only on the routing pattern r(l)

$$
ROR = \sum_{l \in L \,\mid\, t \le r(l) < 1} \left( \frac{FEC_{r(l)}}{FEC_t} - 1 \right)
$$

Where:

$FEC_{r(l)}$ is the FEC transmission block size in case of the complete failure of link $l$

$r(l)$ is the load of link $l$ for a given routing pattern

$FEC_t$ is the FEC block size at default streaming (tolerating loss rate $t$)
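The ROR sum on this slide can be evaluated numerically with a short sketch. Two assumptions are made that the slide does not state: the complete failure of link l is taken to cause a packet loss rate equal to its load r(l), and block sizes follow an ideal erasure code, so the block size for loss rate p with k source packets is the smallest n with n(1 - p) >= k. The function names, k = 20 and the two example routing patterns are hypothetical.

```python
import math

def fec_block_size(source_packets, loss_rate):
    """Assumed ideal-code sizing: smallest n with n * (1 - loss_rate) >= source_packets."""
    return math.ceil(source_packets / (1.0 - loss_rate))

def ror(link_loads, tolerance, source_packets=20):
    """Sum of (FEC_r(l) / FEC_t - 1) over links with tolerance <= r(l) < 1."""
    fec_t = fec_block_size(source_packets, tolerance)
    total = 0.0
    for load in link_loads:
        # Links loaded below the tolerance need no extra redundancy; links
        # carrying the whole flow (load 1) cannot be compensated by any FEC.
        if tolerance <= load < 1:
            total += fec_block_size(source_packets, load) / fec_t - 1
    return total

# Two disjoint two-hop routes: four links, each carrying half of the flow.
two_routes = [0.5] * 4
# Four disjoint two-hop routes: eight links, each carrying a quarter of the flow.
four_routes = [0.25] * 8

ror_two = ror(two_routes, tolerance=0.05)
ror_four = ror(four_routes, tolerance=0.05)
```

In this toy comparison the more diverse pattern uses twice as many links yet obtains the smaller ROR, because each individual failure removes a smaller share of the packets.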

Page 72

ROR coefficient

The smaller the ROR coefficient of a multi-path routing pattern, the better that pattern is for real-time streaming

By measuring the ROR coefficients of multi-path routing patterns of different path diversity, we can evaluate the advantages (or disadvantages) of diversification

Multi-path routing patterns of different diversity are created by the capillary routing algorithm

Page 73

ROR as a function of diversity

Here is ROR as a function of the capillarization level

It is an average function over 25 different network samples (obtained from MANET)

The constant tolerance of the streaming is 5.1%

Here is the ROR function for a stream with a static tolerance of 4.5%

Here are the ROR functions for static tolerances from 3.3% to 7.5%

[Figure: average ROR rating (axis from 0 to 60) versus capillarization level (layer 1 to layer 10), with one curve per static tolerance: 3.3%, 3.9%, 4.5%, 5.1%, 6.3%, 7.5%]

Page 74

ROR rating over 200 network samples

[Figure: average ROR rating (axis from 0 to 60) for eight different sets of 25 network samples (Set1 to Set8), each plotted over layers 1 to 10, with curves for static tolerances 3.3%, 3.9%, 4.5%, 5.1%, … 7.5%]

ROR coefficients for 200 network samples

Each section is the average for 25 network samples

Network samples are obtained from random walk MANET

Path diversity obtained by capillary routing reduces the overall amount of FEC packets

Page 75

Conclusions

Although strong path diversity increases the overall failure rate, it is beneficial for real-time streaming (except in a few pathological cases)

Capillary routing patterns reduce the overall number of redundant packets required from the sender

In single-path real-time streaming, applying FEC at the packet level is almost useless

With multi-path routing patterns, real-time applications can benefit greatly from FEC

Future work: using an overlay network to achieve a multi-path communication flow

Considering coding also inside network, not only at the edges; aiming also at energy saving in MANET

Page 76

Thank you!

Presented topics:

Fine-grained parallel I/O for cluster computers

Liquid scheduling of parallel transmissions in coarse-grained networks

Capillary routing: fault-tolerance in fine-grained networks

