Date post: 22-Dec-2015
Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices

Qi Zhao*, Abhishek Kumar*, Jia Wang+ and Jun (Jim) Xu*

*College of Computing, Georgia Tech
+AT&T Labs - Research

Traffic and flow matrices

Traffic matrix TM
  TM[i, j] = traffic volume from node i to node j
  Useful in:
    Capacity planning and forecasting
    Routing configuration
    Network fault/reliability diagnoses
    Provisioning for SLA

Flow matrix FM
  FM[i, j, f] = the size of the flow f flowing from node i to node j
  Useful in:
    Computing usage patterns of ISPs
    Detecting flapping routes
    Detecting DDoS attacks

$\sum_f FM[i, j, f] = TM[i, j]$

Existing approaches

Traffic matrix
  Indirect inference (holistic)
    Link counts from SNMP
    Routing matrix
    Network model
  Direct measurement
    Sampling
    Our approach

Flow matrix
  Not well studied yet
  Straightforward approach: sampling

Data streaming algorithms

Data streaming: processing a long stream of data items in one pass using a small working memory in order to answer a class of queries regarding the stream.

Our context
  Packet arrival rate is high (e.g., 10-40 Gbps).
  Small but fast memory, SRAM (10ns per access), will be used.
  Challenge: how to fully use SRAM to remember as much information pertinent to the traffic/flow matrix as possible?

Two data streaming schemes

The bitmap-based scheme
  Traffic matrix

The counter array-based scheme
  Flow matrix
  Traffic matrix

System model

[Figure: an online streaming module at node i and another at node j; a data analysis module at a server.]

The bitmap-based scheme

Online streaming module Data analysis module

Online streaming module

The data digest data-structure is a bit array (bitmap) initially set to all 0’s.

It is updated upon each packet arrival. Measurement proceeds in epochs.

Example

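[Figure: a packet is hashed by H(.) to a position i in a bitmap with positions 0, 1, 2, ..., i, ..., b-1; the bit at position i is set from 0 to 1.]

Hash input: the invariant packet header + the first 8 bytes of the payload. [Snoeren et al. SIGCOMM’01] shows that these 28 bytes are sufficient to differentiate almost all non-identical packets.

When a bit is set from 0 to 1: U := U-1, where U is the number of “0”s in the bitmap.

If U/b < Threshold:
  save the bitmap
  start a new epoch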
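The per-packet update in this example can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `BitmapDigest`, the use of `hashlib.blake2b` as H(.), and the default threshold value are all hypothetical. The slide itself only specifies hashing the invariant packet bytes to one of b bit positions and ending the epoch once U/b falls below a threshold.

```python
import hashlib


class BitmapDigest:
    """One node's bitmap digest (hypothetical helper, not from the slides)."""

    def __init__(self, b, threshold=0.5):
        self.b = b                  # number of bits in the bitmap
        self.threshold = threshold  # assumed value; the slides do not give one
        self.saved = []             # bitmaps of finished epochs
        self._reset()

    def _reset(self):
        self.bits = bytearray(self.b)  # one byte per bit, for readability
        self.zeros = self.b            # U: number of 0 bits

    def _index(self, invariant_bytes):
        # H(.): blake2b is an arbitrary stand-in for a uniform hash function.
        digest = hashlib.blake2b(invariant_bytes, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.b

    def update(self, invariant_bytes):
        """Process one packet: one hash computation, one memory write."""
        i = self._index(invariant_bytes)
        if self.bits[i] == 0:
            self.bits[i] = 1
            self.zeros -= 1                      # U := U - 1
        if self.zeros / self.b < self.threshold:
            self.saved.append(bytes(self.bits))  # save the bitmap
            self._reset()                        # start a new epoch
```

A repeated packet hashes to the same position, so it does not change U; only the first copy does.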

Complexities

Computational complexity
  One hash function computation
  One write to the memory

Storage complexity
  Each packet only produces a little more than one bit as its digest.
  This can be further reduced using sampling.

The bitmap-based scheme

Online streaming module Data analysis module

Data analysis module

What we have so far (for TM[i, j]):
  BM_i, generated by the traffic at node i (T_i)
  BM_j, generated by the traffic at node j (T_j)

What we want to estimate:

$TM[i, j] = |T_i \cap T_j| = |T_i| + |T_j| - |T_i \cup T_j|$

Estimation based on BMi and BMj

[Whang et al. 1990] proposed a method to infer |T| from BM, i.e.,

$\widehat{|T|} = b \ln(b / U)$

where U is the number of “0”s in BM.

$|T_i \cup T_j|$ can be inferred from the bitwise-OR of BM_i and BM_j.

An estimator of TM[i, j] is given by

$\widehat{TM}[i, j] = \widehat{|T_i|} + \widehat{|T_j|} - \widehat{|T_i \cup T_j|}$

We derive the variance of the estimator:

$Var(\widehat{TM}[i, j]) = b\,(2e^{|T_i \cap T_j|/b} + e^{|T_i \cup T_j|/b} - e^{|T_i|/b} - e^{|T_j|/b} - |T_i \cap T_j|/b - 1)$
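The estimation step can be illustrated with a short Python sketch. It assumes that both bitmaps have the same size b and were filled with the same hash function, and that a bitmap is given as a sequence of 0/1 values; the function names `estimate_size` and `estimate_tm` are hypothetical.

```python
import math


def estimate_size(bitmap):
    """Estimate |T| from a bitmap as b * ln(b / U), U = number of 0 bits."""
    b = len(bitmap)
    u = bitmap.count(0)
    if u == 0:
        raise ValueError("bitmap is full; the estimate is undefined")
    return b * math.log(b / u)


def estimate_tm(bm_i, bm_j):
    """Estimate TM[i, j] as |Ti| + |Tj| - |Ti U Tj| from two bitmaps."""
    if len(bm_i) != len(bm_j):
        raise ValueError("bitmaps must have the same size")
    # The bitwise-OR of the two bitmaps is the bitmap of the union Ti U Tj.
    bm_or = [x | y for x, y in zip(bm_i, bm_j)]
    return estimate_size(bm_i) + estimate_size(bm_j) - estimate_size(bm_or)
```

For two identical bitmaps the OR equals either input, so the estimate reduces to the size estimate of a single bitmap, as expected when all traffic at node i also passes node j.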

Multipaging

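$\widehat{TM}[i, j] = \sum_{q=1}^{k_1} \sum_{r=1}^{k_2} N_{q,r} \times overlap(q, r)$

where $k_1$ and $k_2$ are the numbers of epochs at node i and node j, $N_{q,r}$ is the estimated traffic common to epoch q at node i and epoch r at node j, and overlap(q, r) is 1 if the two epochs overlap in time and 0 otherwise.

[Figure: timelines of epochs 1-4 at node i and epochs 1-3 at node j, with times t1 and t2 marked.]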

Eliminating the effects of clock offset and packets in transit

[Figure: timelines of epochs 1-4 at node i and epochs 1-3 at node j, with a time interval t marked.]

T1: a tight upper bound of clock offset (e.g., 50ms in an NTP-enabled network).
If t < T1, then overlap(1,2) = 1.

Combining with packets in transit:
T2: a tight upper bound of packet traversal time.
If t < T1+T2, then overlap(1,2) = 1.
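A small Python sketch of how the overlap test and the multipaging sum might fit together. Several things here are assumptions rather than statements from the slides: epochs are represented as (start, end) timestamps in seconds, t is interpreted as the gap between the end of one epoch and the start of the other, and the names `overlap`, `multipage_tm`, `common`, and `slack` are hypothetical.

```python
def overlap(epoch_q, epoch_r, slack=0.0):
    """Return 1 if two epochs overlap, tolerating a gap smaller than `slack`.

    `slack` plays the role of T1 (clock offset bound) or T1 + T2
    (clock offset bound plus packet traversal bound).
    Assumption: t is the gap between the end of one epoch and the
    start of the other.
    """
    start_q, end_q = epoch_q
    start_r, end_r = epoch_r
    gap = max(start_q, start_r) - min(end_q, end_r)  # <= 0 if they overlap
    return 1 if gap <= 0 or gap < slack else 0


def multipage_tm(epochs_i, epochs_j, common, slack=0.0):
    """Sum common[q][r] * overlap(q, r) over all pairs of epochs.

    common[q][r] is the estimated traffic shared by epoch q at node i and
    epoch r at node j, for example from the bitmap estimator.
    """
    total = 0.0
    for q, epoch_q in enumerate(epochs_i):
        for r, epoch_r in enumerate(epochs_j):
            total += common[q][r] * overlap(epoch_q, epoch_r, slack)
    return total
```

With `slack=0.0` only epochs that truly intersect contribute; passing `slack=0.05` would also count a pair separated by 30 ms, in the spirit of the 50 ms NTP bound mentioned above.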

Counter array based scheme

Online streaming module Data analysis module

Online streaming module

The data digest data-structure is a counter array.

It is updated upon each packet arrival. Measurement proceeds in epochs.

Example

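[Figure: the flow label of a packet is hashed by H(.) to a position i in a counter array with positions 0, 1, 2, ..., i, ..., b-1; the counter at position i is incremented from n to n+1.]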
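The counter update in this example is small enough to sketch directly. As before, this is a hedged illustration: `hashlib.blake2b` stands in for H(.), the flow label is assumed to be available as bytes, and the function names are hypothetical.

```python
import hashlib


def counter_index(flow_label, b):
    """H(.): map a flow label (bytes) to one of b counters.

    blake2b is an arbitrary choice; the slides do not name a hash function.
    """
    digest = hashlib.blake2b(flow_label, digest_size=8).digest()
    return int.from_bytes(digest, "big") % b


def update_counters(counters, flow_label):
    """Process one packet: increment the counter its flow hashes to."""
    counters[counter_index(flow_label, len(counters))] += 1
```

All packets of one flow land in the same counter, so a counter holds the combined packet count of every flow that hashes to its position.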

Counter array based scheme

Online streaming module Data analysis module

Data analysis module

Principle: find a good counter-value matching between ingress nodes and egress nodes.

Challenge: hash collisions make the one-to-one matching fail.

Method: iterative elephant-first matching.

Accuracy: works well for the medium-to-large flow matrix elements due to the Zipfian nature of Internet traffic.

Elephant-first matching

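[Figure: for counter index K, node i holds counter value a1 and node j holds counter value a2.]

If a1 > a2:
  FM[i, j, f] = a2
  counter at node i becomes a1-a2, counter at node j becomes 0

If a1 <= a2:
  FM[i, j, f] = a1
  counter at node i becomes 0, counter at node j becomes a2-a1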
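The matching step can be turned into a loop for a single counter index K. The sketch below assumes that each iteration pairs the largest remaining ingress counter with the largest remaining egress counter, which is one reading of "elephant-first"; the slides show only the pairwise step. The function name and the dictionary layout are hypothetical.

```python
def elephant_first_matching(ingress, egress):
    """Match counter values at one counter index K.

    ingress / egress: dict mapping a node name to its counter value at K.
    Returns a dict mapping (ingress node, egress node) to the matched size.
    Assumption: always pair the largest remaining counters first.
    """
    ingress = dict(ingress)  # work on copies; callers keep their data
    egress = dict(egress)
    matches = {}
    while True:
        i = max(ingress, key=ingress.get, default=None)
        j = max(egress, key=egress.get, default=None)
        if i is None or j is None or ingress[i] == 0 or egress[j] == 0:
            break
        a1, a2 = ingress[i], egress[j]
        matched = min(a1, a2)          # FM[i, j, f] = a2 if a1 > a2 else a1
        matches[(i, j)] = matches.get((i, j), 0) + matched
        ingress[i] = a1 - matched      # a1 - a2, or 0
        egress[j] = a2 - matched       # 0, or a2 - a1
    return matches
```

Each iteration zeroes at least one counter, so the loop ends after at most as many steps as there are non-zero counters.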

Evaluation

Ideally, the evaluation would require packet-level traces collected simultaneously at hundreds of ingress and egress routers in an ISP during a certain period of time.

Instead, we construct synthetic experiments based on 16 publicly available packet-level traces from NLANR.

Evaluation: traffic matrix

[Figure: two scatter plots of estimated traffic matrix elements (y-axis) versus original traffic matrix elements (x-axis), both axes with ticks at 100000 and 1e+06; one panel for the bitmap scheme and one for the counter array scheme.]

Metric

$RMSRE(T) = \sqrt{\frac{1}{N_T} \sum_{i=1,\, x_i > T}^{N_T} \left( \frac{x_i - \hat{x}_i}{x_i} \right)^2}$

$N_T$ refers to the number of matrix elements greater than T.
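As a worked illustration of the metric, the following Python sketch computes RMSRE over the elements whose original value exceeds a threshold T. It takes x_i to be an original matrix element and the hatted value to be its estimate, which is how the evaluation plots use the terms; the function name `rmsre` and the flat-list inputs are assumptions.

```python
import math


def rmsre(original, estimated, threshold):
    """Root mean square relative error over elements with original > threshold."""
    pairs = [(x, xh) for x, xh in zip(original, estimated) if x > threshold]
    if not pairs:
        raise ValueError("no matrix element exceeds the threshold")
    # len(pairs) is N_T, the number of elements greater than the threshold.
    return math.sqrt(sum(((x - xh) / x) ** 2 for x, xh in pairs) / len(pairs))
```

For example, with originals [100, 200, 5], estimates [110, 180, 50], and T = 10, the small element is excluded and the two remaining relative errors are both 10%, giving an RMSRE of about 0.1.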

RMSRE: traffic matrix

[Figure: RMSRE for the top % of traffic (y-axis, 0.02 to 0.16) versus percentage of traffic above T (x-axis, 0 to 100), for the bitmap scheme and the counter array scheme.]

RMSRE: flow matrix

[Figure: RMSRE (y-axis, 0 to 0.5) versus threshold in packets (x-axis, 10 to 100).]

Conclusion

A novel data streaming algorithm that produces traffic matrix estimates much more accurate than those of existing approaches.

Another data streaming algorithm that very accurately estimates the flow matrix, a finer-grained characterization than the traffic matrix.

Both algorithms are designed to operate in very high-speed networks.

Thank You!

Questions?

