04/20/23 Nicholis Bufmack - CS 622 1
Constructing Internet Coordinate System Based on
Delay Measurement
Hiyuk Lim, Jennifer C. Hou, and Chong-Ho Coi
Paper Presented by Nicholis Bufmack
Introduction
The Problem: Estimating network distances between arbitrary Internet hosts.
The Solution: Represent the locations of Internet hosts in a Cartesian coordinate system.
Topology information proves useful for nearby server selection, overlay network construction, routing path construction, and peer-to-peer computing.
Estimating Distances Between Hosts
Construct network topology without direct measurement between hosts.
Several network properties may be used: bandwidth, round-trip time (RTT), packet loss
The framework is a common architecture built around beacon nodes.
Hosts only determine distance to beacon nodes.
Other Approaches
IDMaps: use distance to beacon nodes to represent location of host.
GNP: transform the original distance data space into a coordinate system and use it to represent the location of a host.
GNP is superior, but flawed: no guarantee that host has unique coordinate.
Related Work – IDMaps: Internet Distance Map Service
Developed by Francis, et al. – each beacon measures the distance to IP address prefixes close to itself, and a spanning tree algorithm is then used to find the shortest distance between measured hosts.
Does not analyze delay measurements or infer network topology.
Performance depends heavily on the placement and number of beacons: a small number of widely dispersed beacons gives poor performance.
Related Work – GNP: Global Network Positioning
Developed by Ng – represents the location of each host in geometric space.
Distance between hosts is defined by a geometric function.
Has advantage of being able to extract network topology information from measured network distances.
Has disadvantage of not ensuring that each host will have a unique coordinate.
ICS – Internet Coordinate System
Infers the network topology based on delay measurement.
Estimates the distance between hosts without direct measurement.
Less susceptible to the distance metrics used in representing topological information.
Uses a smaller set of uncorrelated bases to represent the Cartesian space.
Base Correlation
Principal component analysis (PCA) is used to reduce a large number of (possibly) correlated variables to a smaller set of uncorrelated ones.
The principal components form a collection of orthogonal directions ordered by the variance they capture; the first is the direction of maximum variance.
Singular value decomposition (SVD) is used to determine the principal components.
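As a rough illustration of the idea (not the paper's code), the sketch below runs PCA through NumPy's SVD on a toy data set in which the second variable is exactly twice the first. The data and all variable names are invented for the example, and centering the data first is an assumption of this sketch.

```python
import numpy as np

# Toy data (hypothetical): 5 samples of 2 perfectly correlated variables, y = 2x.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
X = np.column_stack([x, 2.0 * x])

# Center each variable, then factor Xc = U * diag(s) * Vt.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions, ordered by decreasing singular value.
first_direction = Vt[0]

# Variance captured along each direction (sample-variance convention).
variances = s ** 2 / (len(Xc) - 1)

print(first_direction)  # proportional to [1, 2], up to sign
print(variances)        # the second entry is numerically zero
```

Because the two variables are fully correlated, one principal component carries all of the variance, so a single coordinate describes the data without loss.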
Effect of Distance Metrics on ICS
Independence of ICS from the distance metric used is a consequence of PCA.
Representing the topological information as orthogonal vectors maximizing variance removes reliance on the details of the underlying data set.
How Small Should the Number of Dimensions Be?
Usually defined in terms of the cumulative percentage of variance that the selected principal components contribute.
A threshold is established (80%) and the number of dimensions is chosen to reach it:
$T_k = 100 \times \frac{\sum_{j=1}^{k} \mathrm{var}_j}{\sum_{j=1}^{m} \mathrm{var}_j}$
k is the number of principal components needed for the cumulative variance to reach the established threshold. It is the number of dimensions in the new coordinate system.
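The selection rule can be sketched in a few lines of Python. The function name `choose_dimension` and the sample variances are hypothetical; how the per-component variances are obtained (for instance from squared singular values) is not stated on the slide and is left to the caller here.

```python
def choose_dimension(variances, threshold=80.0):
    """Return the smallest k whose cumulative variance percentage T_k >= threshold.

    `variances` must be ordered from largest to smallest.
    """
    total = sum(variances)
    cumulative = 0.0
    for k, var in enumerate(variances, start=1):
        cumulative += var
        t_k = 100.0 * cumulative / total  # T_k from the formula above
        if t_k >= threshold:
            return k
    return len(variances)


# Hypothetical variances for m = 4 components: T_1..T_4 = 50, 80, 90, 100.
print(choose_dimension([5.0, 3.0, 1.0, 1.0]))  # → 2
```

With the 80% threshold the first two components suffice; raising the threshold to 95% would require all four.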
Definitions
Raw Distance Space: m hosts measure RTT to other hosts using ping or traceroute.
The coordinate of a host $H_i$ in an m-dimensional system:
$d_i = [d_{i1}, \dots, d_{im}]^T$
$d_{ij}$ does not equal $d_{ji}$ because forward and reverse paths may be different.
The overall m x m system of distances, where column i is the distance vector of host $H_i$:
$D = [d_1, \dots, d_m]$
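To make the layout of D concrete, here is a minimal sketch that assembles the matrix from pairwise RTT measurements. The helper `build_distance_matrix`, the 0-based host indices, the zero diagonal, and the RTT values are all assumptions made for illustration.

```python
import numpy as np

def build_distance_matrix(rtt, m):
    """Assemble D so that column i is d_i = [d_i1, ..., d_im]^T.

    `rtt[(i, j)]` is the RTT in ms that host i measured to host j (0-based).
    Unmeasured entries, including the diagonal, are left at 0 (an assumption).
    """
    D = np.zeros((m, m))
    for (i, j), value in rtt.items():
        D[j, i] = value  # row j of column i holds d_ij
    return D

# Hypothetical measurements among m = 3 hosts; note d_01 != d_10.
rtt = {(0, 1): 10.0, (1, 0): 12.0, (0, 2): 30.0,
       (2, 0): 31.0, (1, 2): 25.0, (2, 1): 24.0}
D = build_distance_matrix(rtt, 3)
print(D[:, 0])  # d_0, the distance vector of host 0
```

Since forward and reverse paths may differ, the resulting D is generally not symmetric.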
Overview
Beacon nodes periodically measure RTTs to other beacon nodes and construct a coordinate system.
Coordinates of beacon nodes are calculated from raw distance space.
Ordinary hosts determine location by measuring delay to entire or partial set of beacon nodes to obtain a distance vector.
Location is determined by multiplying distance vector with a transformation matrix.
Calculating the Coordinates of the Beacon Nodes
Each beacon node independently determines its distance vector $d_i$.
Aggregate the information and determine D.
Apply principal component analysis (PCA) to obtain the transformation matrix U, the orthogonal bases of the new subspace.
Determine the dimension of the coordinate system using the cumulative percentage of variance.
Calculate the transformation matrix $U_n$.
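The steps above can be sketched as follows. This is only an illustration under stated assumptions: SVD is applied directly to D without centering or rescaling (the slides do not say whether either is done), the dimension n is passed in rather than derived, and the function name and beacon positions are hypothetical.

```python
import numpy as np

def beacon_coordinates(D, n):
    """Return (U_n, C): the first n bases and the n x m beacon coordinates."""
    # SVD of the raw distance matrix: D = U * diag(s) * Vt.
    U, s, Vt = np.linalg.svd(D)
    U_n = U[:, :n]   # keep the n leading orthogonal bases
    C = U_n.T @ D    # column i is the coordinate of beacon i
    return U_n, C

# Hypothetical beacons: 5 points in a plane, Euclidean distances standing in for RTTs.
points = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [4.0, 3.0], [2.0, 6.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

U_n, C = beacon_coordinates(D, n=2)
print(C.shape)  # → (2, 5)
```

Each beacon ends up with an n-dimensional coordinate in place of its original m-dimensional distance vector.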
Determining the Coordinates of a Host
Host obtains the list of beacon nodes and the transformation matrix $U_n$.
Measure network distances to all beacon nodes using RTT from ping or traceroute.
$I_a = [I_{a1}, \dots, I_{am}]^T$
Calculate the coordinate $x_a$ by multiplying the measured distance vector $I_a$ with the transformation matrix:
$x_a = U_n^T \cdot I_a$
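A host-side sketch of this projection is shown below. The beacon distance matrix, the measured RTTs, and the name `host_coordinate` are made up for illustration, and the bases are taken from a plain SVD of D, which is an assumption of this sketch rather than something the slides specify.

```python
import numpy as np

def host_coordinate(U_n, I_a):
    """Project the measured distance vector I_a onto the n bases: x_a = U_n^T I_a."""
    return U_n.T @ I_a

# Hypothetical setup: a 4-beacon distance matrix and its two leading bases.
D = np.array([[0.0, 10.0, 20.0, 30.0],
              [10.0, 0.0, 15.0, 25.0],
              [20.0, 15.0, 0.0, 12.0],
              [30.0, 25.0, 12.0, 0.0]])
U, s, Vt = np.linalg.svd(D)
U_n = U[:, :2]

# Made-up RTTs (ms) from host a to the four beacons.
I_a = np.array([5.0, 8.0, 18.0, 28.0])
x_a = host_coordinate(U_n, I_a)
print(x_a.shape)  # → (2,)
```

The host needs only its m measurements and one n x m matrix-vector product, which is why locating an ordinary host is cheap compared with computing the beacon coordinate system.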
Empirical Study
Compared against IDMaps and GNP.
Used 2 data sets:
National Laboratory for Applied Network Research (NLANR) – contains RTT, packet loss, topology, and on-demand throughput measurements from 113 monitors.
GT-ITM topology generator – synthetic generator using 3 <= m <= 30 beacon nodes.
Used proximity as the comparison metric, defined as the distance from the closest calculated host to the actual closest host.
Comparison in Terms of Estimation Error
IDMaps has large estimation error that decreases with the number of beacon nodes.
GNP has an increase in estimation error as the number of beacon nodes increases.
ICS has lower estimation error than IDMaps and outperforms GNP when the number of hosts is 15 or more.
Effect of the Coordinate System Dimension on the Performance
Estimation error of ICS is largest when n = 2 and improves as n increases, leveling off when n >= 6.
Estimation error of GNP is smallest when n = 4 and much larger when n = 6.
GNP is slightly better when 5 <= m <= 16.
ICS does not show a significant increase in estimation error when the number of measurements is small. GNP does.
Comparison Between ICS and GNP in Terms of Computational Costs
As the number of beacon nodes increases, the computation time of GNP for calculating the coordinates of beacon nodes increases exponentially.
ICS has a maximal computation time of approximately 17.1 ms compared to GNP’s maximum of 884.06 s (~ 15 minutes).
N-hierarchical Network Topology
Network topology is a tree.
Each level represents a distance from the root node.
Higher levels also represent much more complexity in the network topology, even if the number of nodes happens to be the same (as is the case for the empirical study).
Effect of Topology Complexity on Performance
In two-level hierarchical topology, ICS performed better than GNP and IDMaps.
In three-level hierarchical topology, ICS gave almost the same performance.
In both instances, estimation error for ICS remained relatively constant.
Enhancements
ICS can be enhanced by clustering.
Well-distributed, carefully selected placement of beacon nodes increases performance across the board.
The performance enhancement comes from the fact that the basis of the coordinate system is measurements between beacon nodes.
Node distance is measured in-cluster and between clusters.
Clustering increases the performance of partial measurements, where only a limited number of beacon nodes are used by a host to calculate Ia.
Conclusion
ICS can effectively extract topological information from delay measurements between beacon hosts.
A coordinate system of much smaller dimensions can be extracted from raw data space using ICS enabling end hosts to obtain a unique location with a small number of measurements.
ICS makes accurate estimates, is much less computationally expensive, and is much less dependent on the number of beacon nodes, coordinate system dimension, or complexity of topology.
Assessment of ICS
ICS provides a promising solution for determining network distances, such as the nearest-server problem (Napster, online gaming).
Geographical distance does not equal network distance, due to routing policies and network connectivity (PlanetLabs).
Could be applicable to network construction of peer-to-peer systems and routing in mobile ad-hoc networks where number of hosts can change. (Promise P2P)
Future Work
Visual tools need to be developed.
More research needs to be done on the ideal number of dimensions necessary to represent the Internet (6-8?).