
A Fine-Grained Distance Metric for Analyzing Internet Topology

Date post: 18-Oct-2020
Page 1

A Fine-Grained Distance Metric for Analyzing Internet Topology

Mark Crovella, joint with Gonca Gursun, Natali Ruchansky, and Evimaria Terzi

Page 2

Distance in Graphs with Low Diameter

•  Do ‘neighborhoods’ still exist?
•  Under what distance metric?

Page 3

A Motivating Problem

•  Internet routing between domains
   –  Autonomous Systems (ASes)
•  Simple question: what paths pass through my network?
•  Important for network planning, traffic management, security
   –  If someone at BU were to send an email to UM, would it go through my network?

Page 4

Surprisingly hard to answer!

•  Routing consists of an AS choosing a next hop for each destination
   –  Destinations are ‘prefixes’
•  Decisions are only partially communicated to neighbors
   –  In general, decisions made by a remote AS are not known

Page 5

Observing Traffic

•  An AS can observe the traffic passing through it
   –  If BU sends traffic to UM through Sprint, Sprint knows it
•  Traffic only provides positive information
   –  Absence of traffic is ambiguous
•  If the observer does not see traffic from i to j, it is either
   –  A true zero: the path from i to j does not go through the observer; or
   –  A false zero: the path goes through, but i is not sending anything to j

Page 6

The Visibility-Inference Problem

•  For each observer there is a ground truth matrix T
   –  T(i,j) = 1 ⇒ path from i to j passes through observer
•  Traffic summarized in observable matrix M
   –  M(i,j) = 1 ⇒ traffic was seen flowing from i to j
   –  M(i,j) = 1 ⇒ T(i,j) = 1
•  Problem: label the zeros in M as either true or false
•  Success metric: Detection Rate, False Alarm Rate
   –  Of true zeros

\[
T = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\qquad
M = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\]
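To make these definitions concrete, here is a minimal Python sketch using the example matrices T and M from this slide (with 0-based indices). The function names, and the reading of Detection Rate as the fraction of true zeros labeled as true zeros and False Alarm Rate as the fraction of false zeros labeled as true zeros, are assumptions for illustration; the slides do not spell them out.

```python
# Example matrices from the slide: T is the ground truth, M is what traffic reveals.
T = [[0, 1, 0],
     [1, 1, 0],
     [1, 0, 1]]
M = [[0, 0, 0],
     [1, 0, 0],
     [1, 0, 1]]

def classify_zeros(T, M):
    """Split the zero entries of M into true zeros (T is 0) and false zeros (T is 1)."""
    true_zeros, false_zeros = [], []
    for i, row in enumerate(M):
        for j, m in enumerate(row):
            if m == 1:
                # Observed traffic implies the path really crosses the observer.
                assert T[i][j] == 1
            elif T[i][j] == 0:
                true_zeros.append((i, j))
            else:
                false_zeros.append((i, j))
    return true_zeros, false_zeros

def rates(T, M, predicted_true_zeros):
    """Detection rate and false alarm rate of a labeling (assumed definitions)."""
    true_zeros, false_zeros = classify_zeros(T, M)
    predicted = set(predicted_true_zeros)
    detection = sum(z in predicted for z in true_zeros) / len(true_zeros)
    false_alarm = sum(z in predicted for z in false_zeros) / len(false_zeros)
    return detection, false_alarm

true_zeros, false_zeros = classify_zeros(T, M)
print(true_zeros)   # zeros of M where no path crosses the observer
print(false_zeros)  # zeros of M that hide a real path
```

In this example M has six zeros, of which two, (0,1) and (1,1), are false zeros: the path crosses the observer but no traffic was seen.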

Page 7

Intuition

•  Amplify knowledge obtained from traffic observation
•  Empirically we observe that there are groups of sources, destinations exhibiting ‘similar routing’
•  Observed traffic provides positive knowledge for entire group
•  Requires a metric that captures the notion that d1 is ‘close’ to d2 while s1 is ‘far’ from s2

[Figure: the example matrix M annotated with sources s1, s2 and destinations d1, d2]

Page 8

Seeking a metric for ‘neighborhoods’

•  Typical distance used in graphs is hop count
•  Not suitable in small worlds
•  90% of prefix pairs have hop distance < 5
   –  Clearly, typical distance metric is inappropriate
•  Need a metric that expresses ‘routed similarly in the Internet’
   –  or other graph

[Figure: hop distance (1 to 7) for 1000 prefix pairs]

Page 9

Capturing global routing state

•  Conceptually, imagine capturing the entire routing state of the Internet in a matrix N
•  N(i,j) = next hop on path from i to j
•  Each row is actually the routing table of a single AS
•  Now consider the columns

[Figure: schematic of the matrix N]

Page 10

Routing State Distance

•  rsd(a,b) = # of entries that differ in columns a and b of N
•  If rsd is small, most ASes think the two prefixes are ‘in the same direction’
•  A metric (obeys triangle inequality)

[Figure: two schematic copies of the matrix N, labeled rsd = 3 and rsd = 5]
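Since rsd(a,b) counts the rows in which two columns of N disagree, it is the Hamming distance between those columns, which is why the triangle inequality holds. A minimal Python sketch follows; the matrix below is made up for illustration (5 ASes by 4 prefixes, with arbitrary next-hop labels) and is not real routing data.

```python
def rsd(N, a, b):
    """Routing state distance: number of rows of N whose entries in columns a and b differ."""
    return sum(1 for row in N if row[a] != row[b])

# Hypothetical next-hop matrix: one row per AS, one column per prefix.
# Each entry is the (made-up) next hop that the row's AS uses toward the column's prefix.
N = [
    ["A", "A", "B", "C"],
    ["B", "B", "B", "A"],
    ["C", "C", "A", "A"],
    ["A", "B", "B", "B"],
    ["D", "D", "D", "A"],
]

print(rsd(N, 0, 1))  # 1: only the fourth AS routes prefixes 0 and 1 differently
print(rsd(N, 0, 2))  # 3
print(rsd(N, 0, 3))  # 5: every AS uses a different next hop for prefixes 0 and 3
```

Prefixes 0 and 1 are 'routed similarly' (rsd = 1), while prefixes 0 and 3 are as far apart as this matrix allows.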

Page 11

RSD in Practice

•  Key observation: we don’t need all of N to obtain a useful metric
   –  That’s good – the problem would be solved anyway if we had that
•  Many (most?) nodes contribute little information to RSD
   –  Nodes at edges of network have nearly-constant rows in N
•  Sufficient to work with a small set of well-chosen rows of N
   –  Empirically we observe this to be the case
•  Such a set is obtainable from publicly available BGP measurements
   –  Note that public BGP measurements require some careful handling to use properly for computing RSD

[Figure: two panels over the same 1000 prefix pairs; left: hop distance (1 to 7); right: RSD (0 to 250)]

Page 12

Using RSD to amplify observed traffic

•  For each zero (i,j) in M
   –  Find $S_i = \{ i' \mid \mathrm{rsd}(i, i') < \tau \}$, $D_j = \{ j' \mid \mathrm{rsd}(j, j') < \tau \}$
   –  Compute $\pi(i,j) = \sum M(S_i, D_j)$
   –  If $\pi(i,j) > \beta$ then (i,j) is a false zero; otherwise it is a true zero
•  Thresholds τ and β are easy to set in an automatic way
•  Applied to three sets of ASes:
   –  Core-100 and Core-1000: centrally located ASes
   –  Edge-1000: randomly chosen ASes, mostly near edge of net
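The labeling rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `src_dist` and `dst_dist` are hypothetical precomputed distance matrices for sources and destinations, the sum is taken over all entries of the submatrix of M with rows S_i and columns D_j, and τ and β are set by hand here (the slides say they can be set automatically but do not say how).

```python
def neighborhood(dist, x, tau):
    """Indices within distance tau of x; x itself is included because dist[x][x] == 0."""
    return [y for y in range(len(dist)) if dist[x][y] < tau]

def label_zeros(M, src_dist, dst_dist, tau, beta):
    """Label every zero of M as a 'false zero' or a 'true zero'."""
    labels = {}
    for i, row in enumerate(M):
        S_i = neighborhood(src_dist, i, tau)
        for j, m in enumerate(row):
            if m != 0:
                continue  # only zeros are ambiguous
            D_j = neighborhood(dst_dist, j, tau)
            # pi(i, j): traffic observed between sources near i and destinations near j.
            pi = sum(M[s][d] for s in S_i for d in D_j)
            labels[(i, j)] = "false zero" if pi > beta else "true zero"
    return labels

# Toy data (made up): sources 0 and 1 are close, destinations 0 and 1 are close.
M = [[1, 0, 0],
     [1, 1, 0],
     [0, 0, 0]]
src_dist = [[0, 1, 9],
            [1, 0, 9],
            [9, 9, 0]]
dst_dist = [[0, 1, 9],
            [1, 0, 9],
            [9, 9, 0]]

labels = label_zeros(M, src_dist, dst_dist, tau=2, beta=0)
print(labels[(0, 1)])  # false zero: neighbors of (0, 1) carry observed traffic
print(labels[(2, 2)])  # true zero: nothing observed anywhere near (2, 2)
```

The zero at (0,1) is flagged as false because traffic was seen for the nearby pairs (0,0), (1,0), and (1,1); every other zero has no observed traffic in its neighborhood.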

Page 13

Experimental setup

•  Ground-truth matrices from BGP data
   –  Collected all active paths from 38 sources to 135,000 destinations
   –  For every AS, construct 38 x 135,000 ground truth matrix T
   –  Data hygiene: discussed at end
•  Simulate traffic absence by setting some 1s to zeros
   –  Flipped at random from 1 to 0
   –  10%, 30%, 50%, 95%
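The traffic-absence simulation can be sketched as below. One assumption to flag: this version flips exactly a fraction `flip_rate` of the 1s, chosen uniformly at random; the slides do not say whether the original experiments did this or flipped each 1 independently. The function name and seed parameter are illustrative.

```python
import random

def flip_ones(T, flip_rate, seed=0):
    """Return a copy of T in which a fraction flip_rate of the 1s has been set to 0."""
    rng = random.Random(seed)
    M = [row[:] for row in T]  # copy so the ground truth is left untouched
    ones = [(i, j) for i, row in enumerate(T) for j, v in enumerate(row) if v == 1]
    for i, j in rng.sample(ones, round(flip_rate * len(ones))):
        M[i][j] = 0  # a false zero: the path exists but no traffic is "seen"
    return M

# Toy ground truth with 20 ones (made up for illustration).
T = [[1] * 5 for _ in range(4)]
for rate in (0.10, 0.30, 0.50, 0.95):
    M = flip_ones(T, rate)
    print(rate, sum(map(sum, M)))  # number of 1s that remain observable
```

By construction every 1 in M is also a 1 in T, so the simulated M satisfies M(i,j) = 1 ⇒ T(i,j) = 1.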

Page 14

Quick Look

Page 15

Performance

•  Accuracy on Edge-1000 is poorer
   –  At edge of network, problem is actually easier
   –  RSD not the right idea to use there

[Figure: two ROC panels, Core-1000 and Core-100, plotting True Positive Rate against False Positive Rate for flip rates 10%, 30%, 50%, 95%; each panel has an inset covering False Positive Rate 0 to 0.1 and True Positive Rate 0.8 to 1]

Page 16

Mean TPR and FPR: Core-1000

Flip Rate   TPR           FPR
10%         0.98 (0.03)   0.03 (0.04)
30%         0.98 (0.03)   0.03 (0.04)
50%         0.98 (0.03)   0.03 (0.05)
95%         0.98 (0.03)   0.21 (0.18)

Page 17

Properties of RSD

On graphs of size n with shortest-path routing

•  Tree: rsd = 1 + length of path in tree
   –  So rsd in a tree is low, generally O(log n)
•  Clique: rsd = n
•  Star: rsd = 3
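These values can be checked numerically. The sketch below builds the next-hop matrix by breadth-first search on small undirected graphs and counts differing rows. It assumes the convention N(i,i) = i (a node is its own next hop toward itself), which is what makes the counts match the values on this slide; the star value of 3 is for two leaves. Helper names are illustrative, and the graphs are assumed connected.

```python
from collections import deque

def next_hop_matrix(adj):
    """N[i][j] = neighbor of i on a shortest path to j, with N[i][i] = i (assumed convention)."""
    nodes = sorted(adj)
    N = {i: {} for i in nodes}
    for j in nodes:
        # BFS outward from destination j: the node from which v is first reached
        # is one hop closer to j, so it is v's next hop toward j.
        N[j][j] = j
        queue = deque([j])
        while queue:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if j not in N[v]:
                    N[v][j] = u
                    queue.append(v)
    return N

def rsd(N, a, b):
    """Number of nodes whose next hops toward a and toward b differ."""
    return sum(1 for i in N if N[i][a] != N[i][b])

def graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

n = 6
star = graph([(0, leaf) for leaf in range(1, n)])                    # node 0 is the center
clique = graph([(u, v) for u in range(n) for v in range(u + 1, n)])
tree = graph([(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)])       # binary tree, 7 nodes

print(rsd(next_hop_matrix(star), 1, 2))    # 3: two leaves of a star
print(rsd(next_hop_matrix(clique), 0, 1))  # 6: equals n in a clique
print(rsd(next_hop_matrix(tree), 3, 6))    # 5: 1 + path length 4 (3-1-0-2-6)
```

In the tree, only the nodes on the path between the two destinations choose different next hops, which gives 1 + path length.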

Page 18

Comparison to Classical Distance

                rsd      d
Star            3        2
Balanced tree   log n    log n
Clique          n        1

Page 19

RSD on small worlds

[Figure: two panels plotted against p (log scale, 0.0001 to 1; vertical axis 0 to 1.6); first panel: Clustering Coefficient and Hop Distance (W&S, Nature, ‘98); second panel: the same two curves plus RSD]

Page 20

RSD remains useful as hop distance breaks down

[Figure: grid of panels with columns p = 0, p = 0.01, p = 0.1 and rows d and rsd]

Page 21

RSD as a tool for data analysis and discovery

•  RSD applied to Internet prefixes shows low effective rank
•  Implication: RSD is good for visualization of prefixes

[Figure: 20 largest singular values of 1000 x 1000 RSD matrix for Internet prefixes; vertical axis 0 to 15 x 10^4]

Page 22

RSD for clustering – fine scale

[Figure: zoomed-in 2-D scatter plot of prefixes (horizontal axis −0.04 to −0.03, vertical axis 0.02 to 0.07), each point labeled with its AS name, e.g., ESNET, SINET−AS, APAN−JP, KREONET−AS−KR, YAHOO]

Page 23

Visualizing all prefixes

[Figure: 2-D visualization of all prefixes, both axes from −100 to 100]

Smaller cluster is about 25% of all prefixes

Consistent clustering in any sample of prefixes

Page 24

Page 25

Understanding Clustering under RSD

In terms of the next-hop matrix N:
•  A cluster C corresponds to a set of columns of N, i.e., N(:,C)
•  The columns are close in RSD
•  So they must be similar in some positions S

In terms of BGP routing:
•  For any row in N(S,C) the entries must be nearly the same
•  So S is a set of ASes making similar routing decisions w.r.t. C

We call such a pair (S,C) a local atom

[Figure: 2-D visualization of prefixes, both axes from −100 to 100]

Page 26

Why are Local Atoms Interesting?

•  Routing decisions by ASes are made independently
•  There will often be many next hops for any given prefix
•  Different ASes will choose different next hops for a prefix

Page 27

Exposing Unexpected Coordination

[Figure: two 2-D visualizations of prefixes, both axes from −100 to 100]

A significant fraction of ASes are all following the same rule.

There is a special AS X in the system.

The rule is: “Always use AS X to get to any customer in its cone.”

The ASes making this coordinated decision are not related in any obvious way.

The smaller cluster is a local atom.

Page 28

Attractivity of AS6939

Page 29

What is so special about Hurricane Electric?

•  A peering policy based only on colocation
   –  Settlement-free peering
   –  No commercial consideration or traffic ratios
•  Immensely attractive to small/medium NSPs
   –  “free” access to any customer network of HE
   –  HE backbone has global scope

“Hurricane Electric is willing to peer with networks which are connected to one or more exchange points which we have in common.”

he.net/adm/peering.html

Page 30

Systematic Discovery of Local Atoms

•  Manual inspection uncovers a large-scale local atom
   –  The Hurricane Electric cluster
   –  Includes about 25% of prefixes in the Internet
•  Do smaller local atoms exist? How can we detect them?
•  Need an effective clustering strategy for RSD
•  A natural approach: seek clusters such that
   –  Within-cluster RSD is minimized
   –  Between-cluster RSD is maximized

Page 31

Pivot Clustering

•  Minimizing P-Cost is NP-hard, but there is an expected 3-approximation algorithm for it: Pivot clustering
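The slide names the algorithm without showing it. Below is a minimal sketch of the generic randomized pivot scheme: pick a random unclustered item as pivot, group it with every unclustered item close to it, and repeat. Treating two prefixes as 'close' when their RSD is below a threshold `tau` is an assumption made here for illustration; the slides define neither P-Cost nor the exact variant used, and the distance matrix below is made up.

```python
import random

def pivot_clustering(dist, tau, seed=0):
    """Randomized pivot clustering over a symmetric distance matrix (illustrative sketch)."""
    rng = random.Random(seed)
    unclustered = list(range(len(dist)))
    clusters = []
    while unclustered:
        pivot = rng.choice(unclustered)
        # The pivot claims every still-unclustered item within distance tau of it.
        cluster = [pivot] + [x for x in unclustered
                             if x != pivot and dist[pivot][x] < tau]
        clusters.append(sorted(cluster))
        members = set(cluster)
        unclustered = [x for x in unclustered if x not in members]
    return sorted(clusters)

# Made-up RSD values: items {0, 1, 2} are mutually close, as are {3, 4}.
dist = [
    [0, 1, 1, 10, 10],
    [1, 0, 1, 10, 10],
    [1, 1, 0, 10, 10],
    [10, 10, 10, 0, 1],
    [10, 10, 10, 1, 0],
]

print(pivot_clustering(dist, tau=5))  # [[0, 1, 2], [3, 4]]
```

With groups this well separated the result does not depend on which pivots are drawn; on real data different seeds can give different clusterings, which is why the approximation guarantee is stated in expectation.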

Page 32

Largest 5 clusters found using Pivot Clustering

•  Clusters show clear separation

•  Each cluster corresponds to a local atom

Page 33

Interpreting Clusters

      Number of Prefixes (C)   Number of Source ASes (S)   Destinations
C1    150                      16                          Ukraine 83%, Czech Rep. 10%
C2    170                      9                           Romania 33%, Poland 33%
C3    126                      7                           India 93%, US 2%
C4    484                      8                           Russia 73%, Czech Rep. 10%
C5    375                      15                          US 74%, Australia 16%

Page 34

Conclusions

•  RSD is a metric that captures when two prefixes are “routed similarly in the Internet”
•  It has many useful properties
   –  a fine-grained distance metric even in small-world graphs
   –  captures a different notion of distance than shortest-path
   –  summarizes the routing state of the entire network
   –  effective for visualizing Internet prefixes
   –  allows in-depth analysis of BGP
   –  uncovers surprising and interesting patterns in routing
      •  Local atoms

Page 35

Thank you!

Gonca Gursun, Natali Ruchansky, Evimaria Terzi

Page 36

Related Work

•  Reported that BGP tables provide an incomplete view of the AS graph. [Roughan et al. ‘11]
•  Visualization based on AS degree and geo-location. [Huffaker and k. claffy ‘10]
•  Small scale visualization through BGPlay and bgpviz
•  Clustering on the inferred AS graph. [Gkantsidis et al. ‘03]
•  Clustering prefixes that share the same BGP paths into policy atoms. [Broido and k. claffy ‘01]
•  Methods for calculating policy atoms and characteristics. [Afek et al. ‘02]

Page 37

Data Hygiene Implications

•  BGP data is known to favor customer-provider links and miss peer-peer links
•  Our restriction to 38 x 135,000 known paths means that we are not missing any links in the scope of our experiments
•  Hence accuracy for the chosen subsets of M is not affected by missing links
•  However, the accuracy of our methods may be different on the full M
   –  Whether better or worse, it’s not clear
   –  There is some reason to believe it would be better…

