An Exposed Approach to Reliable Multicast in Heterogeneous Logistical Networks
Micah Beck, Assoc. Prof. & Director
Logistical Computing & Internetworking (LoCI) Lab
Grids and Advanced Networking Tokyo, 14 May 2003
Credits
• Authors
  • Micah Beck
  • Ying Ding
  • Erika Fuentes
  • Sharmila Kancherla
• LoCI Lab
  • James S. Plank
  • Terry Moore
  • Alex Bassi
  • Yong Zheng
  • Hunter Hagewood
• PlanetLab
Funding
• Dept. of Energy SciDAC
• National Science Foundation ANIR
• UT Center for Info Technology Research
Logistical Networking Research at UTK
University of Tennessee
• Micah Beck
• James S. Plank
• Jack Dongarra
University of California, Santa Barbara
• Rich Wolski
What is Logistical Networking?
• A scalable mechanism for deploying shared storage resources throughout the network
• A general store-and-forward overlay networking infrastructure
• A way to break transfers into segments and employ heterogeneous network technologies on the pieces
Why “Logistical Networking”
• Analogy to logistics in distribution of industrial and military personnel & materiel
• Fast highways alone are not enough: goods are also stored in warehouses for transfer or local distribution
• Fast networks alone are not enough: data must be stored in buffers/files for transfer or local distribution
The Network Storage Stack
[Figure: the stack layers, top to bottom: Applications; Logistical File System; Logistical Tools; L-Bone and exNode; IBP; Local Access; Physical]
• Our adaptation of the network stack architecture for storage
• Like the IP Stack
• Each level encapsulates details from the lower levels, while still exposing details to higher levels
IBP: The Internet Backplane Protocol
• Storage provisioned on community “depots”
• Very primitive service (similar to block service, but more sharable)
• Goal is to be a common platform (exposed)
• Also part of end-to-end design
• Best effort service: no heroic measures
  • Availability, reliability, security, performance
• Allocations are time-limited!
  • Leases are respected, can be renewed
  • Permanent storage is too strong to share!
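The lease semantics above can be sketched as follows. This is a minimal illustration and not the IBP API: the `Allocation` class, its `renew` and `store` methods, and the `duration` parameter are hypothetical names, and the rule that an expired lease cannot be renewed is an assumption of this sketch.

```python
import time


class Allocation:
    """Hypothetical time-limited storage allocation (NOT the real IBP API)."""

    def __init__(self, size, duration, now=None):
        self.size = size                      # capacity in bytes
        self.data = bytearray()
        start = time.time() if now is None else now
        self.expires_at = start + duration    # the lease end time

    def is_live(self, now=None):
        now = time.time() if now is None else now
        return now < self.expires_at

    def renew(self, duration, now=None):
        """Extend the lease. Assumption: an expired lease cannot be renewed."""
        now = time.time() if now is None else now
        if not self.is_live(now):
            raise TimeoutError("lease expired")
        self.expires_at = now + duration

    def store(self, payload, now=None):
        """Append bytes while the lease is live and capacity remains."""
        if not self.is_live(now):
            raise TimeoutError("lease expired")
        if len(self.data) + len(payload) > self.size:
            raise ValueError("allocation full")
        self.data += payload
```

A caller that wants data to persist would call `renew` before `expires_at` passes; once the lease lapses, the depot is free to reclaim the space.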
Data Movers
• Module implementing standard point-to-multipoint transfer between IBP allocations
• Uniform API allows independence from the underlying data transfer protocol
• Not every DM can apply to every transfer
  • Caller responsible for determining validity
• Current options: Multi-TCP, Multi-UDP (reliable), UDP Multicast (unreliable)
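One way to picture a uniform Data Mover interface with caller-side validity checks is sketched below. All names here (`DataMover`, `applies_to`, `pick_mover`, the `mcast_hosts` set) are hypothetical and do not come from the LoCI software; the transfer itself is replaced by an in-memory copy.

```python
from abc import ABC, abstractmethod


class DataMover(ABC):
    """Hypothetical uniform interface; illustrative names only."""

    reliable = False

    @abstractmethod
    def applies_to(self, source, targets):
        """Return True if this mover can serve the given transfer."""

    def move(self, data, source, targets, depots):
        # In-memory stand-in for a real point-to-multipoint transfer.
        for target in targets:
            depots[target] = bytes(data)


class MultiTCP(DataMover):
    reliable = True

    def applies_to(self, source, targets):
        return len(targets) > 0


class UDPMulticast(DataMover):
    reliable = False

    def __init__(self, mcast_hosts):
        # Assumption: the caller knows which hosts share a multicast domain.
        self.mcast_hosts = set(mcast_hosts)

    def applies_to(self, source, targets):
        return len(targets) > 0 and {source, *targets} <= self.mcast_hosts


def pick_mover(movers, source, targets, need_reliable):
    """Caller-side validity check: first mover that fits the transfer."""
    for mover in movers:
        if mover.applies_to(source, targets) and (mover.reliable or not need_reliable):
            return mover
    raise LookupError("no data mover applies to this transfer")
```

Because every mover exposes the same `move` call, the caller can swap protocols per transfer without changing the surrounding code.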
mcopy operation
• Encapsulates shared buffering, management of multiple low level transfers
[Figure: mcopy at the sending depot. 1. Buffering (file system, memory); 2. Parallel transfers from the sending depot to the receiving depots]
Heterogeneity in mcopy
• TCP connections
• Unreliable UDP multicast
• Reliable UDP with flow control, retransmit
• Reliable UDP with TCP control channel
  • SABUL (R. Grossman, University of Illinois at Chicago)
• Reliability must be end-to-end!
[Figure: Comparison of Sending Rates in the LAN. Sending rate (Mb/s, axis 0 to 60) versus number of destinations (3 to 6) for UDP multicast, point-to-point TCP, and point-to-point UDP]
Heterogeneous Multicast
End-to-End Reliability through Retransmission
[Figure: source, IBP depots, and destination. Steps: 1. IBP upload; 2. IBP mcasts; 3. IBP download; 4. TCP control channel; 5. TCP retransmission]
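The idea of layering end-to-end reliability over an unreliable multicast can be sketched as a small simulation. This is an assumption-laden toy and not the authors' implementation: `unreliable_mcast` and `repair` are hypothetical names, loss is modeled as independent random drops, and the control channel and retransmission are modeled as plain function calls.

```python
import random


def unreliable_mcast(blocks, receivers, loss_rate, rng):
    """Deliver each block to each receiver, dropping some at random."""
    received = {r: {} for r in receivers}
    for index, block in enumerate(blocks):
        for r in receivers:
            if rng.random() >= loss_rate:
                received[r][index] = block
    return received


def repair(blocks, received):
    """Each receiver reports missing indices; the source resends them reliably."""
    resent = 0
    for got in received.values():
        # Stand-in for the report sent over the control channel.
        missing = [i for i in range(len(blocks)) if i not in got]
        for i in missing:
            got[i] = blocks[i]   # stand-in for a reliable retransmission
            resent += 1
    return resent


if __name__ == "__main__":
    rng = random.Random(0)
    blocks = [bytes([i]) for i in range(20)]
    received = unreliable_mcast(blocks, ["d1", "d2", "d3"], 0.3, rng)
    print("blocks resent:", repair(blocks, received))
```

After `repair` returns, every receiver holds every block, so the cost of reliability is only the retransmitted blocks rather than a fully reliable transfer of the whole payload.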
Other Approaches to Reliable Multicast
• Retransmission in original group
• Multiple groups for retransmission assigned dynamically to sets of missed blocks
• Retransmission from intermediate nodes
• Application-dependent approaches
  • Video doesn’t need perfect reliability
  • Time deadlines alter retransmission priorities
Exposed Approach to Multicast
• Many important elements are under the control of an endpoint (the source)
  • Topology of multicast tree
  • Choice of mcast operation types
  • Handling of intermediate errors
  • Performance optimization
• Global & app-specific strategies possible
Limitations of Exposed Approach
• Scalability problems
  • Control from one end-point is limiting
  • Not sufficient for public media distribution
  • A distributed control infrastructure is required
  • Active routers provide a natural platform
• Tamanoir project of ENS-Lyon may provide a testbed for this architecture
  • Laurent Lefevre, Jean-Patrick Gelas
Topology and Performance
• Choosing tree nodes (can we detect underlying Layer 2 topology?)
  • Where is UDP multicast enabled?
  • Where are UDP flooding protocols legal?
• Evaluating reliability, performance of component mcasts
• Trading off scalability for reliability and performance
Experiment: Three Approaches
• 10 receivers
• Direct Unicast TCP to all nodes
• Pure TCP overlay multicast
  • TCP Data Mover used at every tree node
• Mixed TCP/UDP multicast
  • TCP Data Mover used in backbone
  • UDP multicast in edge networks
• Caveat: Measurements are not end-to-end!
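The structural difference between direct unicast and an overlay tree can be sketched by counting how many transfers each node must originate. The tree shapes and node names below are hypothetical (the experiment's actual topology is not reproduced here), and `deliver` is an illustrative helper in which an assignment stands in for one Data Mover transfer.

```python
def deliver(tree, root, data):
    """Forward data down the tree; return (data held per node, sends per node)."""
    held = {root: data}
    sends = {}
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        sends[node] = len(children)
        for child in children:
            held[child] = data      # stand-in for one Data Mover transfer
            stack.append(child)
    return held, sends


# Hypothetical topologies: six receivers, with or without relay depots A and B.
receivers = ["r1", "r2", "r3", "r4", "r5", "r6"]
direct = {"S": receivers}
overlay = {"S": ["A", "B"], "A": receivers[:3], "B": receivers[3:]}

_, direct_sends = deliver(direct, "S", b"payload")
_, overlay_sends = deliver(overlay, "S", b"payload")
print("source fan-out, direct:", direct_sends["S"])
print("source fan-out, overlay:", overlay_sends["S"])
```

In the overlay case the source originates fewer transfers and the intermediate depots share the forwarding work, which is the effect the overlay experiments set out to measure.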
Direct Unicast TCP
[Figure: network diagram with nodes S, A, B, C, D and 1 through 6]
Pure TCP Overlay Multicast
[Figure: network diagram with nodes S, A, B, C, D and 1 through 6]
Mixed TCP in Backbone/UDP Mcast at Edge
Experimental Results Direct TCP vs Overlay
• 10 simultaneous TCP streams/connection
• 50 MB transfers
• Sending rate (not scaled by receivers)
  • Direct TCP Unicast: 3.4 Mb/s
  • Pure TCP Overlay Multicast: 5.1 Mb/s
• Speedup obtained: 50%
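The 50% figure follows directly from the two sending rates. The short check below also estimates how long a 50 MB transfer would take at each rate, assuming that "Mb/s" means megabits per second, that 1 MB is 8 Mb, and that the quoted rate applies to the whole transfer; the function names are illustrative.

```python
def speedup_percent(baseline_rate, new_rate):
    """Relative improvement of new_rate over baseline_rate, in percent."""
    return (new_rate / baseline_rate - 1.0) * 100.0


def transfer_seconds(size_megabytes, rate_megabits_per_s):
    """Time to send size_megabytes at the given rate (assumes 1 MB = 8 Mb)."""
    return size_megabytes * 8.0 / rate_megabits_per_s


if __name__ == "__main__":
    print(round(speedup_percent(3.4, 5.1), 1))   # about 50 percent
    print(round(transfer_seconds(50, 3.4), 1))   # direct unicast, seconds
    print(round(transfer_seconds(50, 5.1), 1))   # overlay multicast, seconds
```

Under these assumptions the transfer takes roughly 118 s with direct unicast and roughly 78 s with the overlay.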
Experimental Results Overlay TCP vs Mixed
• 10 receivers
• No rate control on UDP Multicast, can’t run multiple streams
• Comparing Overlay TCP with single TCP stream/connection to Mixed, there is a 15% speedup
• UDP at edge offers some speedup over TCP
Conclusions
• Logistical Networking implements a scalable overlay networking infrastructure
• Data Movers support heterogeneity even within a single transfer
• Exposed & heterogeneous multicast can achieve speedups in the WAN
• Defining the tree and managing it for reliability and performance is a challenge
L-Bone: January 2003
Current Storage Capacity: 13 TB