The Impact and Implications of the Growth in Residential User-to-User Traffic

Post on 31-Jan-2016


Kenjiro Cho, Kensuke Fukuda, Hiroshi Esaki, Akira Kato (SIGCOMM'06). Presented by Stanley Wong and Tony Wat, Spring 2007.


1. Introduction

• A worldwide increase in user-to-user traffic has been observed, putting pressure on commercial backbones

• Strong concern that Internet backbone technologies cannot keep up with rapidly growing residential traffic

• To ensure the evolution of the Internet, we need to understand the effects of growing residential traffic

1. Introduction

• Japan has a high penetration rate of fibre-based broadband access (increasing exponentially), while the increase in DSL is slowing down

• A good candidate for studying the different behaviours of fibre and DSL users

1. Introduction

• Technically and politically difficult to obtain traffic data from commercial ISPs, as it contains sensitive information about the ISPs

• Measurement methods and policies vary among ISPs, making the data difficult to compare

• Seven major Japanese commercial ISPs were involved in collecting traffic data

• The goal is to know the ratio of residential broadband traffic to other traffic, changes in traffic patterns, and regional differences among ISPs

2. Data Collection

• Two data sets

• Aggregated interface counters of edge routers from 7 ISPs

– analysis at the macroscopic level

• Sampled NetFlow data of one of the ISPs

– detailed per-customer analysis

2.1 Data collection of aggregated traffic

• Most ISPs collect interface counter values on their routers, and usually have the data at 2-hour resolution

• The authors developed a Perl script and provided it to the ISPs to read the log files and aggregate the data by groups of routers

• This allows the ISPs to avoid disclosing their internal network structure or unrelated details of their traffic
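
The aggregation step can be sketched as follows. This is a minimal illustration in Python, not the authors' Perl script: the record layout (timestamp, router, inbound rate, outbound rate) and the router-to-group mapping are hypothetical, since the slides do not describe the log format.

```python
from collections import defaultdict

# Hypothetical mapping from router name to traffic group. An ISP would
# keep this mapping private and share only the aggregated totals.
ROUTER_GROUP = {
    "edge-1": "A1",  # residential broadband customers
    "edge-2": "A1",
    "ix-1": "B1",    # major IXes
}

def aggregate_by_group(records, router_group=ROUTER_GROUP):
    """Sum inbound/outbound rates per (timestamp, group).

    `records` is an iterable of (timestamp, router, in_bps, out_bps)
    tuples. Routers that are not in the mapping are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for ts, router, in_bps, out_bps in records:
        group = router_group.get(router)
        if group is None:
            continue
        totals[(ts, group)][0] += in_bps
        totals[(ts, group)][1] += out_bps
    return {key: tuple(val) for key, val in totals.items()}

sample = [
    ("2005-11-01T00:00", "edge-1", 100.0, 200.0),
    ("2005-11-01T00:00", "edge-2", 50.0, 70.0),
    ("2005-11-01T00:00", "ix-1", 30.0, 40.0),
    ("2005-11-01T00:00", "core-9", 999.0, 999.0),  # not in the mapping
]
result = aggregate_by_group(sample)
```

Only the per-group totals leave the ISP, so individual router names and counters stay hidden.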

2.1 Data collection of aggregated traffic

• Collected month-long traffic logs from 7 ISPs six times between 2004 and 2006

• Focus on traffic crossing ISP boundaries

• Grouped into customer, domestic and international traffic

2.2 Data collection of per-customer traffic

• Sampled NetFlow data from one ISP

• Sampling rate of 1/2048 on all edge routers to residential broadband customers

• Collected week-long data sets five times from 2004 to 2005

• Data include inbound/outbound traffic volume of each customer at 1-hour resolution, with customer attributes such as line type (fibre or DSL) and customer ID

• Combined with 2 geo-IP databases to analyze geographic communication patterns
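
With 1-in-2048 sampling, per-customer volumes have to be scaled back up from the sampled packets. The sketch below shows the usual scaling by the inverse sampling rate; the flow record layout (customer ID, direction, sampled bytes) is a hypothetical simplification, and the slides do not say how the authors performed this step.

```python
from collections import defaultdict

SAMPLING_RATE = 1 / 2048  # from the slide: one packet in 2048 is sampled

def estimate_volume(sampled_bytes, sampling_rate=SAMPLING_RATE):
    """Scale the bytes seen in sampled packets up to an estimated total."""
    return sampled_bytes / sampling_rate

def per_customer_estimates(flows, sampling_rate=SAMPLING_RATE):
    """Estimate inbound/outbound bytes per customer.

    `flows` is an iterable of (customer_id, direction, sampled_bytes)
    tuples, where direction is "in" or "out" (hypothetical layout).
    """
    totals = defaultdict(lambda: {"in": 0.0, "out": 0.0})
    for customer_id, direction, sampled_bytes in flows:
        totals[customer_id][direction] += estimate_volume(sampled_bytes, sampling_rate)
    return dict(totals)

flows = [
    ("c1", "in", 1500),
    ("c1", "out", 500),
    ("c2", "in", 40),
    ("c1", "in", 1500),
]
estimates = per_customer_estimates(flows)
```

The estimate is statistical: it is reasonable for high-volume customers but noisy for customers with only a few sampled packets.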

3. Analysis of Aggregated Traffic

• Between Nov 2004 and Nov 2005

– RBB customer traffic (A1) grew by 26% for inbound, 46% for outbound and 37% for combined volume

– The difference between inbound and outbound widened slightly in the first 6 months

– Estimated ratio (A1)/(A1+A2) = 59%

3.1 Growth of Traffic

• The average rates of aggregated external traffic

– The total volume of external domestic traffic (B2) exceeds the volume for the 6 major IXes (B1)

– International traffic as a share of total external traffic: 30% for inbound and 26% for outbound

3.1 Growth of Traffic

• Relationship between total customer traffic (A) and total external traffic (B)

– Assume all inbound traffic from other ISPs is destined to customers:

• Inbound traffic volume for (B) should be close to outbound traffic for (A)

• Outbound traffic volume for (B) should be close to inbound traffic for (A)

3.1 Growth of Traffic

• Relationship between IX traffic (B1) and the total input rate of the 6 major IXes

– Traffic from the 7 ISPs accounts for about 42% of the total incoming traffic of these IXes

– Extrapolating from this share, the total amount of residential broadband traffic in Japan in Nov 2005 is estimated at 353Gbps for inbound and 468Gbps for outbound
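
The extrapolation can be sketched as a simple division by the traffic share. This reads the 42% figure as the measured ISPs' share of national traffic, which is an interpretation of the slide. The "implied measured" values below are back-computed from the slide's national estimates and are not quoted from the paper.

```python
ISP_SHARE = 0.42  # share taken from the slide

def national_estimate(measured_gbps, share=ISP_SHARE):
    """Extrapolate a measured rate to a national total, assuming the
    measured ISPs carry `share` of all traffic."""
    return measured_gbps / share

def implied_measured(national_gbps, share=ISP_SHARE):
    """Invert the extrapolation: the measured rate implied by a national total."""
    return national_gbps * share

# National estimates quoted on the slide (Gbps, Nov 2005).
implied_in = implied_measured(353)   # roughly 148 Gbps
implied_out = implied_measured(468)  # roughly 197 Gbps
```

The approach assumes the ISPs' share of IX traffic is representative of their share of residential traffic, which is a strong simplification.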

3.2 Customer Traffic

• Traffic is averaged over the same day of the week within a month

• Holidays are excluded from the weekly analysis, since holiday traffic patterns are closer to those of weekends
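
A minimal sketch of this averaging, assuming one value per calendar day and a caller-supplied list of holidays. The function name and data layout are illustrative only; the sample values are made up.

```python
from collections import defaultdict
from datetime import date

def weekday_averages(daily_values, holidays=()):
    """Average a per-day metric by day of week, skipping holidays.

    `daily_values` maps datetime.date -> value. Returns a dict mapping
    weekday index (0 = Monday .. 6 = Sunday) -> mean value.
    """
    skip = set(holidays)
    buckets = defaultdict(list)
    for day, value in daily_values.items():
        if day in skip:
            continue
        buckets[day.weekday()].append(value)
    return {wd: sum(vals) / len(vals) for wd, vals in buckets.items()}

# Three Thursdays in Nov 2005; the first is treated as a holiday here.
daily = {
    date(2005, 11, 3): 10.0,
    date(2005, 11, 10): 20.0,
    date(2005, 11, 17): 40.0,
}
averages = weekday_averages(daily, holidays=[date(2005, 11, 3)])
```

The same function can be applied per hour of day to obtain an averaged weekly profile.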

3.2 Customer Traffic

• RBB customer traffic (A1) exceeds 260Gbps in the evening hours

• The peak hours are from 21:00 to 23:00

• Downstream traffic is much larger than upstream

• P2P applications are believed to contribute significantly to the upstream traffic

• Non-RBB customer traffic (A2) is dominated by residential traffic

• Office-hour traffic is observed in the daytime, but its volume is smaller than that of residential customer traffic

3.3 External Traffic

• The external traffic groups are used to understand the total traffic volume in the backbone network

• Top graph shows traffic to and from 6 major IXes (B1)

• Middle graph shows external domestic traffic (B2)

• Bottom graph shows international traffic (B3)

3.3 External Traffic

• In the bottom graph (international traffic), inbound traffic is much larger than outbound

• The traffic pattern is clearly different from that of domestic traffic

• Peak hours are still in the evening, but outbound traffic volume is virtually flat compared to inbound volume

3.4 Prefectural Traffic

• Investigates regional differences (between metropolitan and rural areas)

• Similar temporal patterns

• 70% of average traffic is constant

• A prefecture’s traffic is roughly proportional to the population of the prefecture

4. Analysis of Per-customer Traffic

• Analyzes Sampled NetFlow data from one of the ISPs

• Unique active users are identified by customer IDs

• Users are classified into 2 groups: more than 2.5GB/day and less than 2.5GB/day

• The total number of active DSL users is slightly higher than that of fiber users

4.1 Distribution of Heavy-hitters

• Cumulative distribution of total traffic volume of heavy-hitters in decreasing order of volume
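
Such a curve can be computed by sorting users by volume and accumulating their share of the total. The sketch below is illustrative; the sample volumes are made up and do not come from the paper.

```python
def cumulative_share(volumes):
    """Cumulative traffic share, with users sorted by decreasing volume."""
    ordered = sorted(volumes, reverse=True)
    total = sum(ordered)
    shares, running = [], 0.0
    for v in ordered:
        running += v
        shares.append(running / total)
    return shares

def share_of_top(volumes, fraction):
    """Share of total traffic generated by the top `fraction` of users
    (always at least one user)."""
    ordered = sorted(volumes, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

# Made-up daily volumes for ten users, in arbitrary units.
volumes = [70, 10, 5, 5, 4, 3, 1, 1, 0.5, 0.5]
top_10_percent = share_of_top(volumes, 0.10)
curve = cumulative_share(volumes)
```

A steep initial rise in `curve` is what a heavy skew toward a few users looks like.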

4.1 Distribution of Heavy-hitters

• Cumulative distribution of daily traffic per user on a log-log scale

– Total users

– Fiber users

– DSL users
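
Heavy-tailed distributions are commonly plotted in complementary form, P[X >= x], on log-log axes. The sketch below computes those points. Treating the slide's "cumulative distribution" as the complementary form is an assumption, and the function name is illustrative.

```python
def ccdf(values):
    """Return (x, P[X >= x]) pairs for each distinct value, ascending in x."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, v in enumerate(ordered):
        # Emit one point per distinct value, at its first occurrence.
        if i == 0 or v != ordered[i - 1]:
            points.append((v, (n - i) / n))
    return points

points = ccdf([1, 2, 2, 4])
```

Taking the base-10 logarithm of both coordinates gives the log-log view, in which a power-law tail appears as a straight line and a knee as a change in slope.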

4.1 Distribution of Heavy-hitters

• The distribution is heavy-tailed but there is a knee in the slope

• For the total users, the top 4% are heavy-hitters using more than 2.5GB/day (about 230kbits/sec)

• For the fiber users, the top 10% use more than 2.5GB/day

• For DSL users the knee is less clear, but one can be seen at around the top 2% using more than 2.5GB/day

• Outbound traffic is larger for the majority of the users, on the left side of the knee

• But this does not hold for heavy-hitters, on the right side of the knee
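
The equivalence between 2.5GB/day and roughly 230kbits/sec can be checked with a short conversion. The sketch assumes decimal gigabytes (10^9 bytes); the helper names are illustrative.

```python
SECONDS_PER_DAY = 86_400
HEAVY_HITTER_BYTES_PER_DAY = 2.5e9  # 2.5 GB/day, decimal gigabytes assumed

def bytes_per_day_to_kbps(bytes_per_day):
    """Convert a daily byte volume to an average rate in kbits/sec."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e3

def is_heavy_hitter(bytes_per_day, threshold=HEAVY_HITTER_BYTES_PER_DAY):
    """Classify a user by the 2.5 GB/day threshold used on the slides."""
    return bytes_per_day > threshold

threshold_kbps = bytes_per_day_to_kbps(HEAVY_HITTER_BYTES_PER_DAY)
```

The result is about 231kbits/sec, which the slide rounds to 230.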

4.1 Distribution of Heavy-hitters

• Distribution of the metropolitan prefecture is closer to that of the total users

• Distribution of the rural prefecture is closer to that of DSL users

4.2 Correlation of Inbound and Outbound Volume

• Correlation between inbound and outbound volumes for each user shown as log-log scatter plots

• 4300 points for fiber and 5400 for DSL

• The highest-density cluster is below and parallel to the unity line, where outbound volume is about 10 times larger than inbound

• Slope of cluster seems to be slightly larger than 1

• High-volume cluster is larger in the fiber plots

• Many more low-volume users in the DSL plot
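
A slope on a log-log scatter plot can be estimated by ordinary least squares on the logarithms of the volumes. The sketch below uses synthetic data with an exact 10x ratio; the slides do not state how the authors judged the slope, so this is only one plausible way to quantify it.

```python
import math

def loglog_slope(inbound, outbound):
    """Least-squares fit of log10(outbound) against log10(inbound).

    Returns (slope, intercept). A slope of 1 means the cluster is
    parallel to the unity line; an intercept of 1 means a 10x ratio.
    """
    xs = [math.log10(v) for v in inbound]
    ys = [math.log10(v) for v in outbound]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic users whose outbound volume is exactly 10x their inbound volume.
inbound = [1e6, 1e7, 1e8]
outbound = [1e7, 1e8, 1e9]
slope, intercept = loglog_slope(inbound, outbound)
```

Users with zero volume in either direction must be filtered out first, since their logarithm is undefined.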

4.3 Temporal Behavior

• For fiber traffic, inbound and outbound volumes are almost equal for the total users

• But inbound is 61% larger for heavy-hitters and outbound is 166% larger for normal users

• For DSL traffic, outbound volume is 83% larger for the total users, only 11% larger for heavy-hitters and 179% larger for normal users

• Inbound traffic of fiber heavy-hitters is much larger than outbound traffic

• Fiber traffic accounts for 86% of the total inbound volume and 80% of total residential volume

4.3 Temporal Behavior

• The number of active users increases in the morning; traffic volume also increases, but by a smaller amount

4.4 Protocol and Port Usage

• Port 80 (http) accounts for only 9% of total traffic

• TCP dynamic ports account for 83% of total traffic, but the usage of each individual port is small

• The most popular P2P file-sharing software in Japan is WINNY

• It is no longer possible to use port numbers to identify applications
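
A port-based breakdown of this kind can be sketched as below. The sketch treats any port at or above 1024 as "dynamic", which is an assumption about how the slide uses the term (IANA reserves the name for 49152-65535). The flow tuples and byte counts are made up to mirror the slide's percentages.

```python
from collections import defaultdict

DYNAMIC_PORT_MIN = 1024  # assumption: ports >= 1024 count as "dynamic"

def classify_flow(src_port, dst_port):
    """Label a TCP flow by its ports."""
    if 80 in (src_port, dst_port):
        return "http"
    if min(src_port, dst_port) >= DYNAMIC_PORT_MIN:
        return "dynamic"
    return "other well-known"

def share_by_class(flows):
    """`flows` is an iterable of (src_port, dst_port, nbytes).
    Returns a dict mapping class label -> fraction of total bytes."""
    totals = defaultdict(float)
    for src_port, dst_port, nbytes in flows:
        totals[classify_flow(src_port, dst_port)] += nbytes
    grand_total = sum(totals.values())
    return {label: v / grand_total for label, v in totals.items()}

# Made-up flows chosen to reproduce the 9% / 83% split on the slide.
flows = [
    (51000, 80, 9),
    (40000, 50000, 83),
    (1025, 25, 8),
]
shares = share_by_class(flows)
```

When both ends use arbitrary high ports, as in the second flow, the port numbers say nothing about the application.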

4.5 Geographic Traffic Matrices

• Shows the traffic matrix among residential users (RBB), domestic data centers and leased lines (DOM), and international addresses (INTL)

• 90% is domestic communication

• Both ends are either domestic residential users or other domestic addresses

– Language and cultural barriers

– Domestic fiber users are so well connected

4.5 Geographic Traffic Matrices

• Divided into heavy-hitters and normal users

• Ratio of user-to-user traffic is 69% for heavy-hitters and 49% for normal users

4.5 Geographic Traffic Matrices

• Users access similar destinations regardless of the user location

• No increase in traffic to neighboring prefectures can be identified

• A small number of peers for video

4.5 Geographic Traffic Matrices

• The user-to-user group has a much larger number of peers than the user-to-domestic group

• 80% (at the horizontal line) have fewer than 18 dominant peers

• 80% have fewer than 4.7 dominant peers

4.5 Geographic Traffic Matrices

• Wider range of peer numbers regardless of the traffic volume

• High-volume traffic is generated not only by P2P file-sharing but also by other applications

5. Related Work

• Previous studies measured the growth rate of Internet traffic, but this became harder after the privatization of the Internet in the mid 90s

• One study shows a 100% growth rate per year for the U.S. in 2003

• Data observed in Japan show the growth rate slowing down after 2002 and stabilizing at 50% per year

• A similar rate is observed in Australia and Hong Kong

• Probably because broadband deployment has already reached most technically conscious users
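
To put these annual growth rates in perspective, they can be converted to doubling times. The sketch assumes constant compound growth, which is a simplification.

```python
import math

def doubling_time_years(annual_growth):
    """Years needed for traffic to double at a constant annual growth rate.

    `annual_growth` is a fraction, for example 0.5 for 50% per year.
    """
    return math.log(2) / math.log(1 + annual_growth)

at_100_percent = doubling_time_years(1.0)  # 100% per year
at_50_percent = doubling_time_years(0.5)   # 50% per year
```

At 100% per year traffic doubles every year, while at 50% per year it takes about 1.7 years.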

5. Related Work

• Findings are consistent with earlier measurements of peer-to-peer traffic, which found it dominant in commercial backbones and different in behaviour from traditional web traffic

• However, known port numbers can no longer be relied on to identify applications, as peer-to-peer traffic is shifting from known to arbitrary ports

5. Related Work

• Previous studies reported asymmetric nature of peer-to-peer traffic

• A comparison between fiber and DSL users in this paper shows that bandwidth demands are not asymmetric

• The deployment of symmetric access will therefore change traffic patterns

6. Implications

• Initially observed a large skew in traffic usage

• The top 4% heavy-hitters account for 70% of traffic; fiber users account for 80% of traffic

• Per-customer measurement found that the distribution of traffic is heavy-tailed; heavy-hitters are widespread and appear to be casual users rather than dedicated users

• Traffic patterns suggest a diverse mixture of peer-to-peer file sharing and content downloading

6. Implications

• Heavy-hitters can no longer be viewed as exceptional extremes: there are too many of them, and they are statistically distributed over a wide range of traffic volumes

• It is natural to think that casual users start playing with new applications such as video downloading and peer-to-peer file-sharing, become heavy-hitters, and shift from DSL to fiber

• Or users start with fiber, and look for applications that use the abundant bandwidth

• Their behaviour is easily affected by social, economic or political factors

6. Implications

• Total traffic volume is heavily affected by heavy-hitters, so a slight change in application algorithms or charging policies can have a significant impact on backbone traffic

• ISPs are tempted to avoid congestion by suppressing traffic from extreme heavy-hitters; however, users as a whole are shifting towards high-volume usage

6. Implications

• Japan can be regarded as a model of widespread symmetric residential broadband access; Korea has the highest broadband penetration ratio, but the majority of its lines are not fiber

• Japan has fairly closed domestic traffic, partly due to language and cultural barriers and partly due to rich connectivity within the country

7. Conclusion

• Widespread residential broadband access

• Essential for researchers and industry to prepare to accommodate end users' ever-changing behaviour

• Established protected data-sharing mechanisms with commercial Japanese ISPs for data collection

• Residential broadband traffic accounts for 2/3 of ISP backbone traffic and is increasing at 37% per year

7. Conclusion

• Investigated differences between DSL and fiber users, heavy-hitters and normal users, and geographic traffic matrices

• A small segment of users dictates the overall behaviour

• The distribution of heavy-hitters is heavy-tailed, without a clear boundary between heavy-hitters and the rest of the users

Our view

Our findings

• Data from Hong Kong OFTA on monthly broadband customers' Internet traffic

• Nov 2005: 505739 Terabits / month = 195Gbps

• Vs. the paper's estimate of 353Gbps-468Gbps residential broadband traffic in Japan in Nov 2005

http://www.ofta.gov.hk/en/tele-lic/operator-licensees/opr-isp/s2.html
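
The conversion from a monthly volume to an average rate can be checked as follows, assuming decimal units (1 Terabit = 10^12 bits) and the 30 days of November.

```python
def terabits_per_month_to_gbps(terabits, days_in_month):
    """Convert a monthly traffic volume in Terabits to an average rate in Gbps."""
    bits = terabits * 1e12
    seconds = days_in_month * 86_400
    return bits / seconds / 1e9

# OFTA figure quoted on the slide for Nov 2005 (30 days).
hk_nov_2005_gbps = terabits_per_month_to_gbps(505_739, 30)
```

This gives about 195Gbps, matching the slide. Note that it is a monthly average, not a peak rate.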

Our findings

• HKIX switching statistics

• Slightly different patterns are also observed on weekdays and weekends, with larger daytime traffic at weekends

• Peak hours in HK are 22:00-00:00, vs. 21:00-23:00 in Japan as reported in the paper

http://www.cuhk.edu.hk/hkix/stat/aggt/hkix-aggregate.html

Significance

• Japan is a good place to study the behaviour of symmetric broadband access

– The proportion of users with symmetric broadband access in Japan is larger than in other countries

• Can no longer rely on port numbers to identify the application used

– May need to identify applications by other kinds of signatures

• The bandwidth demands of P2P applications and users are not asymmetric in nature

– Previous studies reported that P2P traffic is asymmetric in nature

– The actual demand of users is revealed when they are given symmetric bandwidth

Limitations and potential improvements

• Language and cultural barriers of Japan

– The majority of content is in the Japanese language

– 90% of communication is domestic at both ends

– Other countries may exhibit different behaviour

• Potential improvements

– Find out the differences in traffic behaviour among countries where language and cultural differences are not that significant

Limitations and potential improvements

• Bandwidth usage is application-specific

– P2P applications dominate usage patterns

– Social factors: different P2P applications are popular in different countries

– A slight change in P2P application behaviour can affect bandwidth usage

• Potential improvements

– Find out in what ways a difference in application behaviour can affect bandwidth usage

– And how application behaviour can be optimized so that network resources are better utilized

Thank you