Towards an Application-Aware Multicast Communication Framework for Computational Grids
M. Maimour, C. Pham
RESO/LIP, UCB Lyon
ASIAN'02, Hanoi
Dec 5th, 2002
Computational grids
[Figure labels: application user]
from Dorian Arnold: Netsolve Happenings
The current usage of grids
Mostly
– Database accesses, sharing, replications (DataGrid, Encyclopedia of Life Project…)
– Distributed data mining (seti@home…)
– Data and code transfer, massively parallel job submissions (task-farm computing)
Few
– Distributed applications (MPI…)
– Interactive applications (DIS, HLA…), remote visualization
WHY?
End-to-end performance is not here yet
Not scalable!
Unable to adapt to new technologies and uses
WHY??
People forgot the networking side of grids
Gbits/s links do not mean E2E performance!
Computing resources and network resources are logically separated
Visions for a grid
FROM DUMB LINKS CONNECTING COMPUTING RESOURCES
TO COLLABORATIVE RESOURCES
The network can work together with the applications to provide in-network processing functions
Application-Aware Infrastructure on Grids
[Figure labels: core network (Gbits/s rate), 100 Base TX, source, active router, Internet Data Center, application-aware component, computing center, campus/corporate, lab cluster]
Application-Aware Components (AAC)
Based on programmable active nodes/routers
Customized computations on packets
Standardized execution environment and programming interface
[Figure labels: Data, active code A1, active code A2]
Interoperability with legacy routers
[Figure labels: protocol stacks (APPLI, AL, TCP/UDP, IP), traditional IP routing, similar to tunnelling]
Deploying new services
Collective/gather operations
Interest management, filtering (DIS, HLA)
On-the-fly flow adaptation (compression, layering…) for remote displays
Intelligent directory services
Distributed, hierarchical security system
Distributed Logistical Storage
Custom QoS policy
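As an illustration of the interest management and filtering service listed above, here is a minimal Python sketch of a filter that an application-aware component could apply to a stream of simulation events. The slides do not specify how such a filter is written: the `Event` type, the `close_events_filter` name and the use of a 2D Euclidean distance are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical simulation event with a 2D position (assumed format)."""
    x: float
    y: float
    payload: str


def close_events_filter(events: Iterable[Event],
                        center: Tuple[float, float],
                        radius: float) -> List[Event]:
    """Keep only the events located within `radius` of `center`.

    Events outside the radius are dropped in the network instead of
    being forwarded to a receiver that has no interest in them.
    """
    cx, cy = center
    return [e for e in events
            if math.hypot(e.x - cx, e.y - cy) <= radius]
```

A receiver interested only in nearby events would register its position and radius with the component, which then forwards the filtered list rather than the whole stream.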
Ex: Collective operations
max computation
if x<a then x=a
[Figure labels: MAX, AAC]
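The max computation on this slide can be sketched in Python as follows. Each application-aware component applies the rule "if x<a then x=a" to the values passing through it, so only one partial maximum travels upstream. The `MaxAggregator` class and the `aggregate` helper are hypothetical names for this illustration, not part of the framework presented here.

```python
from typing import Iterable, Optional


class MaxAggregator:
    """Hypothetical AAC service that keeps only the running maximum."""

    def __init__(self) -> None:
        self.x: Optional[float] = None  # largest value seen so far

    def on_value(self, a: float) -> None:
        # Rule from the slide: if x < a then x = a
        if self.x is None or self.x < a:
            self.x = a

    def result(self) -> Optional[float]:
        return self.x


def aggregate(values: Iterable[float]) -> Optional[float]:
    """Reduce the values seen by one component to a single maximum."""
    agg = MaxAggregator()
    for a in values:
        agg.on_value(a)
    return agg.result()


# Two edge components reduce their local contributions, then a third
# component reduces the two partial maxima.
left = aggregate([3, 9, 4])
right = aggregate([7, 2])
root = aggregate([left, right])
```

Because max is associative, the result at the root does not depend on how the contributions are grouped across components.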
Ex: Wide-area interactive simulations
[Figure labels: human in the loop, flight simulator, remote display, flight traffic generator, INTERNET, GRID, airport simulator, flow adaptation, specific filter, "only very close events" filter]
Deploying reliable multipoint data distribution services
For
– Database accesses, sharing, replications
– Data and code transfer, massively parallel job submissions (task-farm computing)
– Distributed applications (MPI…)
– Interactive applications (DIS, HLA…)
Desired features
– scalable
– low latencies
[Figure labels: Sender, data, Receiver; without multicast]
[Figure labels: Sender, data, Receiver; IP multicast]
DyRAM
Protocol with modular services for achieving reliability, scalability and low latencies
– Global NACK suppression
– Early packet loss detection
– Local recoveries
– Dynamic replier election
– Accurate congestion control
– Subcast of repair packets
Ex: Global NACK suppression
[Figure labels: NACK4, data4]
only one NACK is forwarded to the source
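The suppression behaviour described on this slide can be sketched as follows: a router forwards the first NACK it sees for a given sequence number and absorbs the later ones. The `NackSuppressor` class, its method names and the idea of remembering the requesting links for a later subcast of the repair are assumptions for illustration, not the DyRAM implementation.

```python
from typing import Dict, Hashable, Set


class NackSuppressor:
    """Hypothetical per-router state for global NACK suppression."""

    def __init__(self) -> None:
        # sequence number -> downstream links that reported the loss
        self.pending: Dict[int, Set[Hashable]] = {}

    def on_nack(self, seq: int, link: Hashable) -> bool:
        """Record a NACK; return True only if it must be forwarded."""
        first = seq not in self.pending
        self.pending.setdefault(seq, set()).add(link)
        return first

    def on_repair(self, seq: int) -> Set[Hashable]:
        """Repair arrived: return the links to subcast it to and clear state."""
        return self.pending.pop(seq, set())
```

With three receivers reporting the loss of packet 4, only the first call to `on_nack(4, ...)` returns `True`, so a single NACK4 goes toward the source.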
Ex: Early lost packet detection
[Figure labels: data3, data4, data5, NACK4]
A NACK is sent by the router
The repair latency can be reduced if the lost packet is requested as soon as possible
These NACKs are ignored!
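A minimal Python sketch of the early detection idea above: the router watches data sequence numbers, sends a NACK itself as soon as it sees a gap, and ignores the NACKs that later arrive from downstream for a packet it has already requested. The `EarlyLossDetector` class and its method names are hypothetical, and timers and retransmission of the router's own NACK are left out.

```python
from typing import List, Set


class EarlyLossDetector:
    """Hypothetical router-side gap detector (no timers, no retries)."""

    def __init__(self, first_seq: int = 0) -> None:
        self.expected = first_seq        # next sequence number expected
        self.requested: Set[int] = set() # losses already NACKed upstream

    def on_data(self, seq: int) -> List[int]:
        """Process a data packet; return the seqs to NACK upstream now."""
        missing: List[int] = []
        if seq > self.expected:
            # Gap detected: every skipped packet is requested immediately.
            missing = [s for s in range(self.expected, seq)
                       if s not in self.requested]
            self.requested.update(missing)
        # A late packet or a repair clears the outstanding request.
        self.requested.discard(seq)
        self.expected = max(self.expected, seq + 1)
        return missing

    def on_downstream_nack(self, seq: int) -> bool:
        """Return True to forward the NACK, False to ignore it."""
        return seq not in self.requested
```

In the slide's scenario, receiving data5 right after data3 makes `on_data(5)` return `[4]`, and the NACK4 messages coming from the receivers are then ignored until the repair passes through.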
Deploying reliable multipoint data distribution services
[Figure labels: core network (Gbits/s rate), active router, source, Internet Data Center, application-aware component, computing center, campus/corporate]
The AAC associated with the source can perform early processing on packets. For instance, the DyRAM protocol uses subcast and loss detection services in order to reduce the end-to-end latency.
In DyRAM, any receiver can be designated as a replier for a lost packet. The election service is performed by the upstream AAC on a per-packet basis. Having dynamic repliers allows for more scalability as caching within routers is not required.
An AAC associated with a tail link performs NACK aggregation, subcasting and the per-packet election of the replier.
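The per-packet replier election described above can be sketched in Python under one stated assumption: a downstream link that did not report the loss is taken to hold the packet and is therefore a candidate replier. The slides do not give the election rule, so `elect_replier` and its selection policy (first candidate in link order) are illustrative only.

```python
from typing import Hashable, Iterable, Optional, Sequence


def elect_replier(downstream_links: Sequence[Hashable],
                  nacked_links: Iterable[Hashable]) -> Optional[Hashable]:
    """Pick a replier for one lost packet, or None if no link can help.

    Assumption: links absent from `nacked_links` received the packet.
    Returning None means the NACK has to travel further upstream.
    """
    nacked = set(nacked_links)
    for link in downstream_links:  # one pass over the downstream links
        if link not in nacked:
            return link
    return None
```

For example, `elect_replier(["l1", "l2", "l3"], ["l1"])` selects `"l2"`, which would then retransmit the packet locally. Since the choice is made again for every lost packet, the router keeps no cached data.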
Local recovery & replier election
Local recoveries reduce the end-to-end delay (especially for high loss rates and a large number of receivers).
[Plot parameters: #grp: 6…24, 4 receivers/group, p=0.25]
Local recovery & replier election
As the group size increases, doing the recoveries from the receivers greatly reduces the bandwidth consumption
[Plot parameters: 48 receivers distributed in g groups, #grp: 2…24]
Early Packet Loss Service
[Plot parameters: p=0.25, #grp: 6…24, 4 receivers/group]
EPLD is very beneficial to DyRAM
DyRAM implementation
TAMANOIR active execution environment
Java 1.3.1 and a Linux kernel 2.4
A set of PC receivers and 2 PC-based routers (Pentium II 400 MHz, 512 KB cache, 128 MB RAM)
Data packets are 4 KBytes
[Figure: testbed configuration]
The data path
[Figure labels: ANEP packet (IP, UDP, S,@IP, data), Tamanoir port, FTP port, TAMANOIR, FTP, S, S1]
Cost of Data Packet Services
[Figure labels: ike, resama, resamo, resamdstan]
NACK: 135μs
DP: 20μs if no seq gap, 12ms-17ms otherwise. Only 256μs without timer setting
Repair: 123μs
Cost of Replier Election
[Figure labels: ike, resamo, NACK]
The election is performed on-the-fly.
It depends on the number of downstream links.
Costs range from 0.1 to 1 ms for 5 to 25 links per router.
Conclusions (1)
Grids can be more than end-host computing resources interconnected with network links
High-bandwidth links are not enough to provide E2E performance for distributed, interactive applications
Application-aware components can be deployed to host high-value services
In-network processing functions can make grids more responsive to applications' needs
Conclusions (2)
The paper shows how an efficient multipoint service can be deployed on an application-aware infrastructure
Simulations and experiments show that low latencies can be obtained with the combination and collaboration of light and simple services