
Predicting and Bypassing End-to-End Internet Service Degradation

Anat Bremler-Barr Edith Cohen Haim Kaplan Yishay Mansour

Tel-Aviv University AT&T Labs Tel-Aviv University

Talk: Omer Ben-Shalom, Tel-Aviv University

Outline:

• Degradation: deviation from “normal” (minimum) RTT

• Predicting degradation: different predictors

• Performance evaluation: precision/recall methodology

• Suggested Application: Gateway selection

Motivating Application

• Gateway selection (Intelligent Routing device)

• Choosing peering links

[Figure: an Intelligent Routing device choosing among peering links; ASes shown: AS 12, AS 41, AS 56, AS 123]

Data and Measurements: Sources

• Base measurements from 4 different locations (ASes), simulating 4 gateways:

– California (CA): AT&T (CA1) + ACIRI (CA2)

– New Jersey (NJ): AT&T (NJ1) + Princeton (NJ2)

Data and Measurements: Destinations

• Obtaining a representative set of web servers + weights (derived from proxy logs)

Data and Measurements: RTT

• Data: weekly RTT (SYN), end to end (path + server)

– Hourly measurements: 35,124 servers

– Once-a-minute measurements: weighted sample of 100 servers

Degradation: Definition

• Deviation from minimum recorded RTT (propagation delay)

• Discrete degradation levels 1-6.

Level   Time (ms)
1       +50
2       +100
3       +200
4       +400
5       +800
6       +1600
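As a rough illustration of the level definition, the sketch below maps a measured RTT to a degradation level using the thresholds from the slide (+50 ms for level 1, doubling up to +1600 ms for level 6). The function names and the choice of an inclusive boundary (a deviation of exactly +50 ms counts as level 1) are assumptions made for the example; the slides do not specify them.

```python
# Added delay (ms) over the minimum recorded RTT at which each level starts.
# Values are taken from the slide; index 0 corresponds to level 1.
LEVEL_THRESHOLDS_MS = [50, 100, 200, 400, 800, 1600]


def degradation_level(rtt_ms, min_rtt_ms):
    """Return the highest level whose threshold is reached (0 = no degradation).

    Assumption: boundaries are inclusive.
    """
    deviation = rtt_ms - min_rtt_ms
    level = 0
    for i, threshold in enumerate(LEVEL_THRESHOLDS_MS, start=1):
        if deviation >= threshold:
            level = i
    return level


def is_failure(rtt_ms, min_rtt_ms, level):
    """A measurement is a 'failure' at a given level if it reaches that level."""
    return degradation_level(rtt_ms, min_rtt_ms) >= level
```

For example, with a minimum RTT of 20 ms, a measurement of 250 ms deviates by 230 ms and therefore counts as a failure at level 3 but not at level 4.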

Objective: Avoiding degradation?

• Attempt to reroute through a different gateway

• Two conditions have to hold:

– Need to be able to predict the failure from a gateway

– Need to have a substitute gateway (low correlation between gateways)

• Blackout: consecutive degradation through one gateway

Blackout durations

• The longer the duration, the easier the blackout is to predict.

• The majority of blackouts are short: 1-3 consecutive points.

• However, a considerable fraction occurs in longer durations.

[Figure: long-duration blackout]

Gateways Correlation

• Gateways are correlated but often the correlation is not too strong

• Longer blackouts are more likely to be shared

– failure closer to the server

• The majority of 2-gateway blackouts involved same-coast pairs

Building predictors

• For a given degradation level l.

• Prediction per IP.

• Input: Previous RTT Measurements for the IP-address.

• Output: probability of a failure

• Predict “failure” if probability > Ф

Precision / Recall Methodology

\[ \mathrm{Precision} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Predicted degraded}|} \]

\[ \mathrm{Recall} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Actual degraded}|} \]

Precision-recall curve

• Sweep the threshold Ф in [0,1] to obtain a precision-recall curve.

• In other words, let P(t) be the predicted failure probability at time t:

\[ \mathrm{precision}(\Phi) = \Pr[\,\text{failure at time } t \mid P(t) > \Phi\,] \]

\[ \mathrm{recall}(\Phi) = \Pr[\,P(t) > \Phi \mid \text{failure at time } t\,] \]
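The threshold sweep can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and returning `None` when a ratio has an empty denominator is a choice made for the example.

```python
def precision_recall(probabilities, failures, threshold):
    """Precision and recall when predicting 'failure' iff probability > threshold.

    probabilities: predicted failure probability per time point.
    failures: actual outcome per time point (truthy = degraded).
    Returns (precision, recall); a value is None if its denominator is zero.
    """
    predicted = [p > threshold for p in probabilities]
    both = sum(1 for pred, f in zip(predicted, failures) if pred and f)
    n_predicted = sum(predicted)
    n_actual = sum(1 for f in failures if f)
    precision = both / n_predicted if n_predicted else None
    recall = both / n_actual if n_actual else None
    return precision, recall


def precision_recall_curve(probabilities, failures, thresholds):
    """Sweep the threshold and collect (threshold, precision, recall) points."""
    return [(th,) + precision_recall(probabilities, failures, th)
            for th in thresholds]
```

Lowering the threshold flags more time points as failures, which typically raises recall and lowers precision; plotting the collected points gives the precision-recall curve.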

What is important for prediction?

• Recency principle

– The more recent RTTs are more important.

• Quantity principle

– The more measurements, the higher the accuracy.

Recency Principle: Importance

• Test case: single-measurement predictor

– predict according to the measurement from x minutes ago

– observe the change in the quality of the prediction

• A 15% difference between using the measurement from the last minute and the one from 15 minutes ago

Minutes ago   NJ-2, failure level 6    NJ-1, failure level 3
              recall (= precision)     recall (= precision)
1             0.33                     0.52
2             0.31                     0.49
4             0.29                     0.48
7             0.28                     0.46
10            0.27                     0.45
15            0.26                     0.44

Quantity Principle: Importance

• Test case: Fixed-Window-Count (FWC)

– the prediction is the fraction of failures in the W most recent measurements

• With more measurements (larger W), we can achieve better precision at high recall

[Figure legend: FWC 1, FWC 5, FWC 10, FWC 50]
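A Fixed-Window-Count predictor is simple enough to sketch in a few lines. The class below is an illustrative assumption about how one might implement it (the class and method names are not from the talk); it keeps only the W most recent failure indicators and predicts their mean.

```python
from collections import deque


class FixedWindowCount:
    """FWC predictor: fraction of failures among the W most recent measurements."""

    def __init__(self, window):
        # deque with maxlen drops the oldest measurement automatically.
        self.recent = deque(maxlen=window)

    def update(self, failed):
        """Record one measurement (truthy = failure)."""
        self.recent.append(1 if failed else 0)

    def predict(self):
        """Predicted failure probability; 0.0 before any measurement (assumption)."""
        if not self.recent:
            return 0.0
        return sum(self.recent) / len(self.recent)
```

With W = 1 this reduces to predicting from the last measurement only, which connects the quantity principle back to the single-measurement test case.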

Our predictors

– Exponential Decay

– Polynomial Decay

– Model-based predictors:

• VW-Cover: Variable Window Cover algorithm

• HMM: Hidden Markov Model

Exponential-decay predictors

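• The weight of each measurement decreases exponentially with its age, by a factor λ.

• Binary variable f_t represents a failure at time t.

• For consecutive measurements:

\[ \mathrm{ExpDecay}_\lambda(t) = \lambda \, \mathrm{ExpDecay}_\lambda(t-1) + (1-\lambda) \, f_t \]

• In general, with H_t the set of measurement times in the history at time t:

\[ \mathrm{ExpDecay}_\lambda(t) = \frac{\sum_{t' \in H_t} \lambda^{\,t-t'} f_{t'}}{\sum_{t' \in H_t} \lambda^{\,t-t'}} \]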
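Both forms of the exponential-decay predictor can be sketched briefly. The code below is an illustration under stated assumptions: the names are hypothetical, the initial estimate of 0.0 is a choice made for the example, and the general form takes the history as explicit (time, failure) pairs.

```python
class ExpDecayPredictor:
    """Recursive form for consecutive measurements:
    estimate(t) = lam * estimate(t-1) + (1 - lam) * f_t
    """

    def __init__(self, lam, initial=0.0):
        if not 0.0 < lam < 1.0:
            raise ValueError("lam must be strictly between 0 and 1")
        self.lam = lam
        self.estimate = initial  # assumption: start from 'no failure'

    def update(self, failed):
        """Fold in the newest measurement and return the new estimate."""
        f = 1.0 if failed else 0.0
        self.estimate = self.lam * self.estimate + (1.0 - self.lam) * f
        return self.estimate


def exp_decay_general(history, t, lam):
    """General form: weighted failure fraction with weights lam ** (t - t').

    history: iterable of (time, failed) pairs with time <= t.
    """
    history = list(history)
    num = sum(lam ** (t - tp) * (1.0 if f else 0.0) for tp, f in history)
    den = sum(lam ** (t - tp) for tp, _ in history)
    return num / den if den else 0.0
```

The recursive form needs only one number of state per destination, which is one reason a predictor such as ExpDecay 0.95 is cheap to run.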

Polynomial-decay predictors

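• The weight of a measurement taken at time t' decreases polynomially with its age t - t':

\[ \mathrm{PolyDecay}_\alpha(t) = \frac{\sum_{t' \in H_t} (t-t')^{-\alpha} f_{t'}}{\sum_{t' \in H_t} (t-t')^{-\alpha}} \]

• Exact computation requires maintaining the complete history.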

• We approximated it.
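To see why the exact computation needs the complete history, here is a direct sketch of a polynomial-decay estimate. The exponent name `alpha`, the function name, and the restriction to measurements strictly older than t (to avoid a zero age) are assumptions for the example. The approximation mentioned on the slide is not described in the talk and is not reproduced here.

```python
def poly_decay(history, t, alpha):
    """Exact polynomial-decay estimate at time t.

    history: iterable of (time, failed) pairs.
    Each measurement of age (t - time) gets weight age ** (-alpha).
    Unlike exponential decay, the weights do not factor into a one-step
    recursion, so every past measurement is revisited on each call.
    """
    num = 0.0
    den = 0.0
    for tp, failed in history:
        age = t - tp
        if age <= 0:
            continue  # assumption: only measurements strictly before t count
        weight = age ** (-alpha)
        num += weight * (1.0 if failed else 0.0)
        den += weight
    return num / den if den else 0.0
```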

The VW-Cover predictor

• Consists of a list of pairs (a1, b1), (a2, b2), …, (an, bn)

• Predict a failure if there exists an i such that there are at least bi failures among the previous ai measurements
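The prediction rule translates almost directly into code. In this sketch the function name and the representation of the history as a list of 0/1 values (most recent last) are assumptions for illustration.

```python
def vw_cover_predict(pairs, history):
    """Predict a failure if, for some pair (a, b), at least b of the
    previous a measurements were failures.

    pairs: list of (a, b) with a >= 1.
    history: list of 0/1 failure indicators, most recent last.
    """
    for a, b in pairs:
        if sum(history[-a:]) >= b:
            return True
    return False
```

For instance, the pair list `[(1, 1), (5, 2)]` predicts a failure if the last measurement failed, or if at least 2 of the last 5 did.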

VW-Cover predictor: Building

• Build the predictor greedily to cover the failures.

• Use a learning set of measurements

– Pick (a1, b1) to be the pair which maximizes precision

– Pick (ai, bi) to be the pair which maximizes precision among uncovered failures
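One possible reading of the greedy construction is sketched below. Several details are assumptions, since the slides do not spell them out: candidate pairs are supplied by the caller, precision in later rounds is measured only on time points not already flagged by earlier pairs, ties are broken by the number of newly covered failures, and the loop stops after `max_pairs` pairs or when no candidate covers a remaining failure.

```python
def fires(series, pair, t):
    """True if pair (a, b) predicts a failure at time t from the a prior points."""
    a, b = pair
    return t >= a and sum(series[t - a:t]) >= b


def build_vw_cover(series, candidates, max_pairs=5):
    """Greedily choose (a, b) pairs on a learning series of 0/1 failures."""
    n = len(series)
    uncovered = {t for t in range(n) if series[t]}  # failures not yet predicted
    flagged = set()   # time points already predicted by chosen pairs
    chosen = []
    while uncovered and len(chosen) < max_pairs:
        best = None
        for pair in candidates:
            if pair in chosen:
                continue
            fired = {t for t in range(n)
                     if t not in flagged and fires(series, pair, t)}
            hits = fired & uncovered
            if not hits:
                continue  # this pair covers no remaining failure
            score = (len(hits) / len(fired), len(hits))  # (precision, coverage)
            if best is None or score > best[0]:
                best = (score, pair, fired, hits)
        if best is None:
            break
        _, pair, fired, hits = best
        chosen.append(pair)
        flagged |= fired
        uncovered -= hits
    return chosen
```

A real implementation would also need a stopping rule tied to the target precision or recall, which this sketch leaves to the `max_pairs` cap.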

Hidden Markov Model

• Finite set of states S (we use 3 states)

• Output probabilities a_s(0), a_s(1)

• Transition function: determines the probability distribution of the next state.

• The probability of a failure:

\[ \mathrm{HMM}(t) = \sum_{s \in S} a_s(1) \, p_s(t) \]

where p_s(t) is the probability of being in state s at time t; p_s(t) is updated according to the output at time t-1.
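The prediction and state update can be sketched with a standard forward (filtering) step. The data layout here is an assumption for illustration: `emit[s][o]` holds the output probability of observing o (0 = no failure, 1 = failure) in state s, and `trans[s][s2]` holds the probability of moving from s to s2. How the model parameters were trained is not covered by the slides and is not shown.

```python
def hmm_predict(p, emit):
    """Failure probability: sum over states of p_s(t) * a_s(1)."""
    return sum(p[s] * emit[s][1] for s in range(len(p)))


def hmm_update(p, emit, trans, observed):
    """Advance the state distribution after seeing the output at the current step.

    1. Condition on the observation (Bayes rule with the output probabilities).
    2. Apply the transition function to get the distribution for the next step.
    """
    n = len(p)
    posterior = [p[s] * emit[s][observed] for s in range(n)]
    total = sum(posterior)
    if total == 0:
        posterior = list(p)  # observation impossible under the model; keep prior
    else:
        posterior = [x / total for x in posterior]
    return [sum(posterior[s] * trans[s][s2] for s in range(n))
            for s2 in range(n)]
```

A two-state toy model is enough to exercise the code; the talk uses three states.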

Experimental Evaluation

Predictor Performance – Level 3

• At recall 0.5, precision is close to 0.9.

[Figure legend: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]

Predictor Performance – Level 6

• Level-6 degradations are harder to predict: at recall 0.5, precision is 0.4.

[Figure legend: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]

Predictor Performance: Conclusion

• The best predictors at levels 3 and 6 are VW-Cover and HMM

• But they only slightly outperform ExpDecay 0.95, which is considerably simpler to implement

Gateway Selection

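Degradation rate, Level 6

Selection method        Degradation rate
Best Gateway            1.15%
Worst Gateway           3.29%
Optimal                 0.08%
ExpDecay 0.95           0.52%
VW-Cover                0.49%
Static: IP Gateway      0.86%

Degradation rate, Level 3

Selection method        Degradation rate
Best Gateway            3.45%
Worst Gateway           5.77%
Optimal                 0.45%
ExpDecay 0.95           1.56%
VW-Cover                1.50%
Static: IP Gateway      2.41%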

Gateway Selection: Conclusion

• Active gateway selection resulted in a 50% reduction in the degradation rate with respect to the best single gateway.

• Static gateway selection can avoid at most 25% of degradations.

• Again, ExpDecay 0.95 only slightly underperforms the best predictor (VW-Cover).
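A per-destination selection policy driven by predictor output might look like the sketch below. The policy itself is an assumption made for illustration, since the slides do not state the exact switching rule: stay on the default gateway unless its predicted failure probability exceeds the threshold, and otherwise switch to the gateway with the lowest predicted probability.

```python
def select_gateway(failure_prob, default, threshold):
    """Choose a gateway for one destination.

    failure_prob: dict mapping gateway name -> predicted failure probability
                  (for example, from one predictor per gateway).
    default: gateway used while no failure is predicted.
    threshold: predict 'failure' on the default if its probability exceeds this.
    """
    if failure_prob[default] <= threshold:
        return default
    # Failure predicted on the default: pick the least risky gateway.
    return min(failure_prob, key=failure_prob.get)
```

Because the gain depends on the substitute gateway not failing at the same time, such a policy benefits most when the candidate gateways are weakly correlated.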

Performance of gateway selection as a function of recency

Correlation between coasts

• Gateway selection on a same-coast pair resulted in only a 10% reduction. Choose independent gateways.

Level   NJ-2 + NJ-1                      CA-2 + NJ-2
        Best gateway   Best predictor    Best gateway   Best predictor
6       1.15%          1.05%             1.15%          0.54%
3       3.45%          3.05%             3.45%          1.78%

Controlling prediction overhead

• Type of measurements:

– Active measurements:

• initiate probes (SYN, ping, HTTP request)

• scalability problem

– Passive measurements:

• collected on regular traffic

• Controlling the prediction overhead:

– Using less-recent measurements

– Active measurements only to a small set of destinations, which cover the majority of traffic

– Cluster destinations: the measurements of one destination can be used to predict another

