
Hybrid network traffic engineering system (HNTES)

Transcript
Page 1: Hybrid network traffic engineering system (HNTES)

Hybrid network traffic engineering system (HNTES)

Zhenzhen Yan, M. Veeraraghavan, Chris Tracy
University of Virginia and ESnet

June 23, 2011

Please send feedback/comments to:[email protected], [email protected], [email protected]

This work was carried out as part of a sponsored research project from the US DOE ASCR program office on grant DE-SC002350

Page 2: Hybrid network traffic engineering system (HNTES)

Outline

• Problem statement
• Solution approach
  – HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
• Future work: HNTES 3.0 and integrated network

Project web site: http://www.ece.virginia.edu/mv/research/DOE09/index.html

Page 3: Hybrid network traffic engineering system (HNTES)

Problem statement

• A hybrid network is one that supports both IP-routed and circuit services on:
  – separate networks, as in ESnet4, or
  – an integrated network
• A hybrid network traffic engineering system (HNTES) is one that moves data flows between these two services as needed
  – it engineers the traffic to use the service type appropriate to the traffic type

Page 4: Hybrid network traffic engineering system (HNTES)

Two reasons for using circuits

1. Offer scientists rate-guaranteed connectivity
   – necessary for low-latency/low-jitter applications such as remote instrument control
   – provides low-variance throughput for file transfers
2. Isolate science flows from general-purpose flows

Reason                        Circuit scope
Rate-guaranteed connections   End-to-end (inter-domain)
Science flow isolation        Per provider (intra-domain)

Page 5: Hybrid network traffic engineering system (HNTES)

Role of HNTES

• HNTES is a network management system; if proven, it would be deployed in networks that offer both IP-routed and circuit services

Page 6: Hybrid network traffic engineering system (HNTES)

Outline

• Problem statement
• Solution approach
  – Tasks executed by HNTES
  – HNTES architecture
  – HNTES 1.0 vs. HNTES 2.0
  – HNTES 2.0 details
• ESnet-UVA collaborative work
• Future work: HNTES 3.0 and integrated network

Page 7: Hybrid network traffic engineering system (HNTES)

Three tasks executed by HNTES

[Diagram: the three tasks, numbered 1-3; annotation: "online: upon flow arrival"]

Page 8: Hybrid network traffic engineering system (HNTES)

HNTES architecture

1. Offline flow analysis populates the MFDB
2. RCIM reads the MFDB and programs routers to port-mirror packets from MFDB flows
3. Router mirrors packets to the FMM
4. FMM asks the IDCIM to initiate circuit setup as soon as it receives packets from the router corresponding to one of the MFDB flows
5. IDCIM communicates with the IDC, which sets up the circuit and the PBR for flow redirection to the newly established circuit

HNTES 1.0
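The numbered steps above amount to a simple event-driven control flow. The sketch below models it with stub classes; the module names (MFDB, RCIM, FMM, IDCIM) come from the slides, but every class interface and method name is a hypothetical illustration, not the actual HNTES code.

```python
# Hypothetical sketch of the HNTES 1.0 control flow described above.
# Module names come from the slides; all interfaces here are invented.

class IDCIM:
    """IDC Interface Module: asks the IDC to set up a circuit (step 5)."""
    def __init__(self):
        self.circuits = []

    def initiate_circuit_setup(self, flow):
        # The IDC would set up the circuit and install PBR for redirection.
        self.circuits.append(flow)

class FMM:
    """Flow Monitoring Module: receives mirrored packets (step 3)."""
    def __init__(self, mfdb_flows, idcim):
        self.mfdb = set(mfdb_flows)   # step 1: offline analysis fills the MFDB
        self.idcim = idcim            # step 2: RCIM programs routers to mirror
        self.seen = set()

    def on_mirrored_packet(self, flow):
        # Step 4: trigger circuit setup on the first packet of an MFDB flow.
        if flow in self.mfdb and flow not in self.seen:
            self.seen.add(flow)
            self.idcim.initiate_circuit_setup(flow)

idcim = IDCIM()
fmm = FMM([("1.2.3.4", "5.6.7.8")], idcim)
fmm.on_mirrored_packet(("1.2.3.4", "5.6.7.8"))   # MFDB flow: circuit set up
fmm.on_mirrored_packet(("9.9.9.9", "5.6.7.8"))   # not in MFDB: ignored
```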

Page 9: Hybrid network traffic engineering system (HNTES)

Heavy-hitter flows

• Dimensions
  – size (bytes): elephant and mice
  – rate: cheetah and snail
  – duration: tortoise and dragonfly
  – burstiness: porcupine and stingray

Kun-chan Lan and John Heidemann, A measurement study of correlations of Internet flow characteristics. ACM Comput. Netw. 50, 1 (January 2006), 46-62.

Page 10: Hybrid network traffic engineering system (HNTES)

HNTES 1.0 vs. HNTES 2.0

                                     HNTES 1.0 (tested on ANI testbed)   HNTES 2.0
Dimension of heavy-hitter flow       Duration                            Size
Circuit granularity                  Circuit for each flow               Circuit carries multiple flows
Heavy-hitter flow identification     Online                              Offline
Circuit provisioning                 Online                              Offline
Flow redirection (PBR configuration) Online                              Offline

HNTES 1.0 logic: the IDC circuit setup delay is about 1 minute, so circuits could be used only for long-DURATION flows; the focus was DYNAMIC (or online) circuit setup.

Page 11: Hybrid network traffic engineering system (HNTES)

Rationale for HNTES 2.0

• Why the change in focus?
  – Size is the dominant dimension of heavy-hitter flows in ESnet
  – Large (elephant) flows have a negative impact on mice flows and on jitter-sensitive real-time audio/video flows
  – Individual circuits need not be assigned per elephant flow
  – A flow monitoring module is impractical if all data packets from heavy-hitter flows are mirrored to HNTES

Page 12: Hybrid network traffic engineering system (HNTES)

HNTES 2.0 solution

• Task 1: offline algorithm for elephant flow identification – add/delete flows from the MFDB
• Nightly analysis of the MFDB for new flows (also offline)
  – Task 2: IDCIM initiates provisioning of rate-unlimited static MPLS LSPs for new flows, if needed
  – Task 3: RCIM configures PBR in the routers for new flows
• HNTES 2.0 does not use the FMM

MFDB: Monitored Flow Database
IDCIM: IDC Interface Module
RCIM: Router Control Interface Module
FMM: Flow Monitoring Module

Page 13: Hybrid network traffic engineering system (HNTES)

HNTES 2.0: use rate-unlimited static MPLS LSPs

• With rate-limited LSPs, if the PNNL router needs to send elephant flows to 50 other ESnet routers, the 10 GigE interface has to be shared among 50 LSPs
• A low per-LSP rate will decrease elephant-flow file transfer throughput
• With rate-unlimited LSPs, science flows enjoy the full interface bandwidth
• Given the low arrival rate of science flows, the probability of two elephant flows simultaneously sharing link resources, though non-zero, is small. Even when this happens, theoretically, each should receive a fair share
• No micromanagement of circuits per elephant flow
• Rate-unlimited virtual circuits are feasible with MPLS technology
• Removes the need to estimate circuit rate and duration

[Diagram: PNNL-located ESnet PE router PNWG-cr1, ESnet core router, 10 GigE interface; LSP 1 through LSP 50 to site PE routers]

Page 14: Hybrid network traffic engineering system (HNTES)

HNTES 2.0 Monitored flow database (MFDBv2)

Flow analysis table:
Row number | Source IP address | Destination IP address | Is the source a data door? (0 or 1) | Is the destination a data door? (0 or 1) | Day 1 | Day 2 | ... | Day 30
(The day columns hold the total transfer size; if on some day the total transfer size between this node pair is < 1 GB, list 0.)

Existing circuits table:
Row number | Ingress Router ID | Egress Router ID

Identified elephant flows table:
Row number | Source IP address | Destination IP address | Ingress Router ID | Egress Router ID | Circuit number
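The three tables might be modeled as follows; the field names follow the slide, while the types and the fixed 30-day window are assumptions for illustration.

```python
# Sketch of the MFDBv2 tables as Python dataclasses (types are assumed).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FlowAnalysisRow:
    row_number: int
    src_ip: str
    dst_ip: str
    src_is_data_door: bool   # "Is the source a data door?" (0 or 1)
    dst_is_data_door: bool   # "Is the destination a data door?" (0 or 1)
    # Day 1 .. Day 30: total transfer size per day; 0 if the day's total
    # between this node pair is < 1 GB, None (NA) before the flow appears.
    daily_sizes: List[Optional[int]] = field(default_factory=lambda: [None] * 30)

@dataclass
class ExistingCircuitRow:
    row_number: int
    ingress_router_id: str
    egress_router_id: str

@dataclass
class ElephantFlowRow:
    row_number: int
    src_ip: str
    dst_ip: str
    ingress_router_id: str
    egress_router_id: str
    circuit_number: int
```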

Page 15: Hybrid network traffic engineering system (HNTES)

HNTES 2.0 Task 1: Flow analysis table

• Definition of "flow": source/destination IP address pair (ports are not used)
• Add the sizes for a flow from all flow records in, say, one day
• Add flows with total size > threshold (e.g., 1 GB) to the flow analysis table
• Enter 0 if a flow's size on any day after it first appears is < threshold
• Enter NA for the days before it first appears as a > threshold-sized flow
• Sliding window: a number of days
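The daily aggregation step might look like the following minimal sketch; the function name and record format are assumptions, and the 1 GB threshold is the slide's example value.

```python
# Hypothetical sketch of the Task 1 daily aggregation step: sum flow-record
# bytes per (src, dst) pair for one day and keep pairs above the threshold.
from collections import defaultdict

GB = 10**9
THRESHOLD = 1 * GB  # example threshold from the slide

def daily_elephant_candidates(flow_records, threshold=THRESHOLD):
    """flow_records: iterable of (src_ip, dst_ip, bytes) for one day.
    Returns {(src, dst): total_bytes} for pairs whose daily total exceeds
    the threshold (ports are ignored, per the slide's flow definition)."""
    totals = defaultdict(int)
    for src, dst, nbytes in flow_records:
        totals[(src, dst)] += nbytes
    return {pair: s for pair, s in totals.items() if s > threshold}

records = [("a", "b", 6 * 10**8), ("a", "b", 5 * 10**8), ("c", "d", 10**7)]
print(daily_elephant_candidates(records))  # {('a', 'b'): 1100000000}
```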

Page 16: Hybrid network traffic engineering system (HNTES)

HNTES 2.0 Task 1: Identified elephant flows table

• Sort the flows in the flow analysis table by a metric
• Metric: weighted sum of
  – a persistency measure
  – a size measure
• Persistency measure: percentage of days in which the size is non-zero, out of the days for which data is available
• Size measure: average per-day size (over days for which data is available) divided by the maximum value among all flows
• Set a threshold for the weighted-sum metric and drop flows whose metric is smaller than the threshold
• This limits the number of rows in the identified elephant flows table
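A minimal sketch of the metric, assuming equal weights (the slide does not give the weight values):

```python
# Sketch of the ranking metric: weighted sum of a persistency measure and a
# normalized size measure. The weight w is an assumption.

def flow_metric(daily_sizes, max_avg_size, w=0.5):
    """daily_sizes: per-day sizes from first appearance onward (None = no data).
    Persistency = fraction of days-with-data that are non-zero.
    Size measure = average per-day size (days with data) / max over all flows."""
    days = [s for s in daily_sizes if s is not None]
    persistency = sum(1 for s in days if s > 0) / len(days)
    size_measure = (sum(days) / len(days)) / max_avg_size
    return w * persistency + (1 - w) * size_measure

# Flow seen on 4 days, active on 3; average size 2.5 GB against a 5 GB max:
m = flow_metric([4e9, 0, 2e9, 4e9], max_avg_size=5e9)
print(round(m, 3))  # 0.625
```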


Page 17: Hybrid network traffic engineering system (HNTES)

Sensitivity analysis

• Size threshold, e.g., 1 GB
• Period for summation of sizes, e.g., 1 day
• Sliding window, e.g., 30 days
• Value for the weighted-sum metric threshold

Page 18: Hybrid network traffic engineering system (HNTES)

Is HNTES 2.0 sufficient?

• It will depend on the persistency measure
  – if many new elephant flows appear each day, a complementary online solution is needed
• Online Flow Monitoring Module (FMM)

Page 19: Hybrid network traffic engineering system (HNTES)

Outline

• Problem statement
• Solution approach
  – HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
  – Netflow data analysis
  – Validation of Netflow-based size estimation
  – Effect of elephant flows
    • SNMP measurements
    • OWAMP data analysis
  – GridFTP transfer log data analysis
• Future work: HNTES 3.0 and integrated network

Page 20: Hybrid network traffic engineering system (HNTES)

Netflow data analysis

• Zhenzhen Yan coded OFAT (offline flow analysis tool) and an R program for IP address anonymization
• Chris Tracy is executing OFAT on ESnet Netflow data and running the anonymization R program
• Chris will provide UVA the flow analysis table with anonymized IP addresses
• UVA will analyze the flow analysis table with R programs and create the identified elephant flows table
• If the persistency measure is high, the offline solution is suitable; if not, HNTES 3.0 and the FMM are needed!

Page 21: Hybrid network traffic engineering system (HNTES)

Findings: NERSC-mr2, April 2011 (one month data)

Persistency measure = ratio of (number of days in which flow size > 1 GB) to (number of days from when the flow first appears)
Total number of flows = 2281
Number of flows that had > 1 GB transfers every day = 83

Page 22: Hybrid network traffic engineering system (HNTES)

Data doors

• Number of flows from NERSC data doors = 84 (3.7% of flows)
• Mean persistency ratio of data-door flows = 0.237
• Mean persistency ratio of non-data-door flows = 0.197
• New-flows graph is right-skewed → offline may be good enough? (just one month – more months' data analysis is needed)
• Persistency measure is also right-skewed → online may be needed

Page 23: Hybrid network traffic engineering system (HNTES)

Validation of size estimation from Netflow data

• Hypothesis
  – The flow size from concatenated Netflow records for one flow can be multiplied by 1000 (since the ESnet Netflow sampling rate is 1 in 1000 packets) to estimate the actual flow size

Page 24: Hybrid network traffic engineering system (HNTES)

Experimental setup

• GridFTP transfers of 100 MB, 1 GB, and 10 GB files
• sunn-cr1 and chic-cr1 Netflow data used

Chris Tracy set up this experiment.

Page 25: Hybrid network traffic engineering system (HNTES)

Flow size estimation experiments

• Workflow inner loop (executed 30 times):
  – obtain the initial value of the firewall counters at the sunn-cr1 and chic-cr1 routers
  – start a GridFTP transfer of a file of known size
  – from the GridFTP logs, determine the data connection's TCP port numbers
  – read the firewall counters at the end of the transfer
  – wait 300 seconds for the Netflow data to be exported
• Repeat the experiment 400 times for the 100 MB, 1 GB, and 10 GB file sizes

Chris Tracy ran the experiments.

Page 26: Hybrid network traffic engineering system (HNTES)

Create log files

• Filter out the GridFTP flows from the Netflow data
• For each transfer, find the packet counts and byte counts from all the flow records and add them up
• Multiply by 1000 (1-in-1000 sampling rate)
• Output the byte and packet counts from the firewall counters
• Size-accuracy ratio = size computed from Netflow data divided by size computed from firewall counters

Chris Tracy wrote scripts to create these log files and gave UVA the files for analysis.
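The per-transfer computation reduces to scaling the sampled byte counts by the sampling rate and dividing by the firewall count; the numbers in this sketch are invented for illustration.

```python
# Sketch of the size-accuracy computation described above: scale sampled
# Netflow byte counts by the 1-in-1000 sampling rate and compare against
# the router firewall counters.

SAMPLING_RATE = 1000  # ESnet samples 1 in 1000 packets

def size_accuracy_ratio(netflow_sampled_bytes, firewall_bytes):
    """netflow_sampled_bytes: byte counts from all flow records for one
    transfer. firewall_bytes: ground truth from the firewall counters."""
    estimated = sum(netflow_sampled_bytes) * SAMPLING_RATE
    return estimated / firewall_bytes

# A 1 GB transfer whose sampled flow records summed to 990,000 bytes:
ratio = size_accuracy_ratio([600_000, 390_000], firewall_bytes=10**9)
print(ratio)  # 0.99
```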

Page 27: Hybrid network traffic engineering system (HNTES)

Size-accuracy ratio

         Netflow records from Chicago ESnet router   Netflow records from Sunnyvale ESnet router
         Mean     Standard deviation                 Mean     Standard deviation
100 MB   0.949    0.2780                             1.0812   0.3073
1 GB     0.996    0.1708                             1.032    0.1653
10 GB    0.990    0.0368                             0.999    0.0252

• The sample mean shows a size-accuracy ratio close to 1
• The standard deviation is smaller for larger files
• There is a dependence on traffic load
• Sample size = 50

Zhenzhen Yan analyzed the log files.

Page 28: Hybrid network traffic engineering system (HNTES)

Outline

• Problem statement
• Solution approach
  – HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
  – Netflow data analysis
  – Validation of Netflow-based size estimation
  – Effect of elephant flows
    • SNMP measurements
    • OWAMP data analysis
  – GridFTP log analysis
• Future work: HNTES 3.0 and integrated network

Page 29: Hybrid network traffic engineering system (HNTES)

Effect of elephant flows on link loads

• SNMP link load averaged over 30 sec
• Five 10 GB GridFTP transfers
• Dashed lines: rest of the traffic load

[Plots: SUNN-cr1 and CHIC-cr1 interface SNMP load; axis markers at 2.5 Gb/s, 10 Gb/s, and 1-minute intervals]

Chris Tracy

Page 30: Hybrid network traffic engineering system (HNTES)

OWAMP (one-way ping)

• One-Way Active Measurement Protocol (OWAMP)
  – 9 OWAMP servers across Internet2 (72 pairs)
  – The system clocks are synchronized
  – The "latency hosts" (nms-rlat) are dedicated only to OWAMP
  – 20 packets per second on average (10 for IPv4, 10 for IPv6) for each OWAMP server pair
  – Raw data for 2 weeks obtained for all pairs

Page 31: Hybrid network traffic engineering system (HNTES)

Study of “surges” (consecutive higher OWAMP delays on 1-minute basis)

• Steps:
  1. Find the 10th percentile delay b across the 2-week data set
  2. Find the 10th percentile delay i for each minute
  3. If i > n × b, i is considered a surge point (n = 1.1, 1.2, 1.5)
  4. Consecutive surge points are combined into a single surge
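The steps above can be sketched as follows; the per-minute delay values are invented, and b would come from the full two-week data set.

```python
# Sketch of the surge-detection procedure: flag minutes whose 10th-percentile
# delay exceeds n*b, then merge consecutive surge points into one surge.

def find_surges(per_minute, b, n=1.2):
    """per_minute: 10th-percentile delay i for each minute.
    b: 10th-percentile delay over the whole data set.
    Returns surges as (start_minute, length_in_minutes) tuples."""
    surges = []
    start = None
    for minute, i in enumerate(per_minute):
        if i > n * b:
            if start is None:
                start = minute          # a new surge begins
        elif start is not None:
            surges.append((start, minute - start))
            start = None
    if start is not None:               # surge still open at end of data
        surges.append((start, len(per_minute) - start))
    return surges

delays = [5.0, 5.1, 6.5, 6.8, 5.0, 7.2, 5.1]  # ms, per minute
print(find_surges(delays, b=5.0, n=1.2))  # [(2, 2), (5, 1)]
```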

Page 32: Hybrid network traffic engineering system (HNTES)

Study of surges cont.

                          CHIC-LOSA   CHIC-KANS   KANS-HOUS   HOUS-LOSA   LOSA-SALT
10th percentile           29 ms       5 ms        6.7 ms      16.1 ms     7.3 ms
>1.1×(10th percentile)    31 ms       5.9 ms      7.3 ms      17.5 ms     8.5 ms
>1.2×(10th percentile)    34 ms       6.3 ms      8 ms        19 ms       9.5 ms
>1.5×(10th percentile)    NA          NA          NA          23.9 ms     11.6 ms

• Sample absolute values of 10th percentile delays

Page 33: Hybrid network traffic engineering system (HNTES)

PDF of surge duration

• One surge lasted 200 minutes
• The median value is 34 minutes

Page 34: Hybrid network traffic engineering system (HNTES)

95th percentile per minute

                           CHIC-LOSA   CHIC-KANS   KANS-HOUS   HOUS-LOSA   LOSA-SALT
10th percentile of 2 weeks 29 ms       5 ms        6.7 ms      16.1 ms     7.3 ms
>1.2×(10th percentile)     33 ms       6.4 ms      8 ms        18.7 ms     9.3 ms
>1.5×(10th percentile)     50 ms       8.1 ms      18.8 ms     23.9 ms     11.5 ms
>2×(10th percentile)       58 ms       11 ms       18.8 ms     40.7 ms     NA
>3×(10th percentile)       84 ms       17 ms       NA          53.8 ms     NA
Max of 95th percentile     119.8 ms    50.5 ms     NA          86.7 ms     NA

• The 95th percentile delay per minute reached 4.13 (CHIC-LOSA), 10.1 (CHIC-KANS), and 5.4 (HOUS-LOSA) times the one-way propagation delay

Page 35: Hybrid network traffic engineering system (HNTES)

Future work: determine cause(s) of surges

• Host (OWAMP server) issues?
  – In addition to OWAMP pings, the OWAMP server pushes measurements to the Measurement Archive at IU
• Interference from BWCTL at the HP LAN switch within the PoP?
  – Correlate BWCTL logs with OWAMP delay surges
• Router buffer buildups due to elephant flows?
  – Correlate Netflow data with OWAMP delay surges
• If none of the above, then the surges are due to router buffer buildups resulting from multiple simultaneous mice flows

Page 36: Hybrid network traffic engineering system (HNTES)

GridFTP data analysis findings

          Size (bytes)            Duration (sec)   Throughput
Minimum   100003680               0.25             1.2 Mbps
Median    104857600               2.5              348 Mbps
Maximum   96790814720 (= 90 GB)   9952             4.3 Gbps

• All GridFTP transfers from NERSC GridFTP servers that were > 100 MB: one month (Sept. 2010)
• Total number of transfers: 124236
• Data from GridFTP logs

Page 37: Hybrid network traffic engineering system (HNTES)

Throughput of GridFTP transfers

• Total number of transfers: 124236
• Most transfers get about 50 MB/s, i.e., 400 Mb/s

Page 38: Hybrid network traffic engineering system (HNTES)

Variability in throughput for files of the same size

               Throughput (bits/s)
Minimum        7.579e+08
1st quartile   1.251e+09
Median         1.499e+09
Mean           1.625e+09
3rd quartile   1.947e+09
Maximum        3.644e+09

• There were 145 file transfers of size 34359738368 bytes (approx. 34 GB)
• The IQR (interquartile range) measure of variance is 695 Mbps
• Need to determine the other end and consider time
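As a sanity check, the reported IQR follows directly from the quartiles in the table; the slide's 695 Mbps agrees to within rounding of the displayed quartile values.

```python
# IQR of the throughput distribution, from the table's quartiles (bits/s).
q1 = 1.251e9   # 1st quartile
q3 = 1.947e9   # 3rd quartile
iqr_mbps = (q3 - q1) / 1e6
print(round(iqr_mbps))  # 696 (slide reports 695, from unrounded quartiles)
```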

Page 39: Hybrid network traffic engineering system (HNTES)

Outline

• Problem statement
• Solution approach
  – HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
• Future work: HNTES 3.0 and integrated network

Page 40: Hybrid network traffic engineering system (HNTES)

HNTES 3.0

• Online flow detection
  – Packet-header based schemes
  – Payload-based schemes
  – Machine-learning schemes
• For ESnet
  – Data-door IP address based 0-length (SYN) segment mirroring to trigger PBR entries (if there is a full mesh of LSPs), and LSP setup (if not a full mesh)
  – PBR can be configured only after finding out the other end's IP address (the data door is one end)
  – "Real-time" analysis of Netflow data
    • needs validation by examining patterns within each day

Page 41: Hybrid network traffic engineering system (HNTES)

HNTES in an integrated network

• Set up two queues on each ESnet physical link, each rate-limited
• Two approaches
  1. Use different DSCP taggings
     – General purpose: rate-limited at 20% of capacity
     – Science network: rate-limited at 80% of capacity
  2. IP network + MPLS network
     – General purpose: same as approach 1
     – Science network: full mesh of MPLS LSPs mapped to the 80% queue

Ack: Inder Monga

Page 42: Hybrid network traffic engineering system (HNTES)

Comparison

• In the first solution, there is no easy way to achieve load balancing of science flows
• Second solution:
  – MPLS LSPs are rate-unlimited
  – Use SNMP measurements to measure the load on each of these LSPs
  – Obtain the traffic matrix
  – Run an optimization to load-balance science flows by rerouting LSPs to use the whole topology
  – Science flows will enjoy higher throughput than in the first solution because the TE system can periodically re-adjust the routing of LSPs

Page 43: Hybrid network traffic engineering system (HNTES)

Discussion of integration with the IDC

• IDC-established LSPs have rate policing at the ingress router
• This is not suitable for HNTES-redirected science flows
• Add a third queue for this category

Discussion with Chin Guok

Page 44: Hybrid network traffic engineering system (HNTES)

Summary

• HNTES 2.0 focus
  – Elephant (large-sized) flows
  – Offline detection
  – Rate-unlimited static MPLS LSPs
  – Offline setting of policy-based routes for flow redirection
• HNTES 3.0
  – Online PBR configuration
  – Requires a flow monitoring module to receive port-mirrored packets from routers and execute online flow redirection after identifying the other end
• HNTES operation in an integrated network

