
Moisés Bezerra Estrela Rodrigues

TOWARDS IMPROVEMENTS IN RESOURCE MANAGEMENT FOR CONTENT DELIVERY NETWORKS

Ph.D. Thesis

Federal University of Pernambuco

www.cin.ufpe.br/~posgraduacao

RECIFE 2016



TOWARDS IMPROVEMENTS IN RESOURCE MANAGEMENT FOR CONTENT DELIVERY NETWORKS

A Ph.D. Thesis presented to the Center for Informatics of the Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.

Advisor: Djamel Fawzi Hadj Sadok

RECIFE 2016

Cataloging at source

Librarian Monick Raquel Silvestre da S. Portes, CRB4-1217

R696t Rodrigues, Moisés Bezerra Estrela

Towards improvements in resource management for content delivery networks / Moisés Bezerra Estrela Rodrigues. – 2016.

116 leaves: ill., fig., tab. Advisor: Djamel Fawzi Hadj Sadok. Thesis (Doctorate) – Universidade Federal de Pernambuco. CIn, Computer Science, Recife, 2016.

Includes references and appendices.

1. Computer networks. 2. Distributed systems. 3. Cellular telephony. 4. Video - distribution. I. Sadok, Djamel Fawzi Hadj (advisor). II. Title. 004.6 CDD (23. ed.) UFPE- MEI 2016-092


Towards Improvements in Resource Management for Content Delivery Networks

Doctoral thesis presented to the Graduate Program in Computer Science of the Federal University of Pernambuco, as a partial requirement for obtaining the degree of Doctor in Computer Science.

Approved on: 03/03/2016.

Advisor: Prof. Djamel Fawzi Hadj Sadok

EXAMINING COMMITTEE

Prof. Eduardo James Pereira Souto
Instituto de Computação/UFAM

Prof. Arthur de Castro Callado
Departamento de Ciência da Computação/UFC

Dr. Artur Ziviani
Laboratório Nacional de Computação Científica/LNCC

Prof. Paulo Romero Martins Maciel
Centro de Informática (CIn)/UFPE

Prof. Ricardo Massa Ferreira Lima
Centro de Informática (CIn)/UFPE

To my parents Raimundo Nonato and Maria das Neves, my brothers Matheus and Naum, my sister Raquel and my beloved wife, Romanan.

Acknowledgements

First and foremost, thanks to my best friend, my wife, and my love, Romanan, for her patience and support during this work.

I would like to thank Professor Djamel and Professor Judith for all the work, ideas, and support in supervising this thesis. Also, special thanks to them for believing in me and allowing me to work with them in the GPRT (Networks and Telecommunications Research Group). Thanks also to André, Glauco, Ernani, Marcos, Daniel, and Demis for all the suggestions and talks that cleared so many doubts. Special thanks to Patricia and Wesley; your comments and recommendations were essential in the process of finalizing this work. Many thanks to all the other people from GPRT, who also offered support for this work. Thanks to the support staff, Rodrigo, Andreas, Bruno, Manu, Ana, and Roxana; without your help and hard work this work surely would not have been possible.

I am grateful to György Dán for the time he spent advising my research and for receiving me as a visiting researcher at the Lab of Communication Networks (LCN). Thanks also to my LCN colleagues, especially Valentino, Sladjana, and Emil, for all the support and fantastic foosball matches.

Thanks to all my great friends from the VaiDiBolo group: 20 years of friendship and counting. To my friends Thiago Aguiar, Rafael and Gabriel Malta, Felipe, Juju, and Pedro, for all the gigs and beach volleyball matches. Also, to Thiago Araujo for all the discussions about nothing and everything.

Finally, I would like to thank my parents Nonato and Nevinha, my brothers Naum and Matheus, my sister Raquel, my grandparents João, Antonieta (in memoriam), Bezerra (in memoriam) and Francisquinha (in memoriam), my uncles, my aunts and my cousins for all their faith in me and support throughout my life. They made me who I am, and I will always be grateful for that.

The seed is planted,

let the roots reach far and wide,

and let it grow tall

let the rings remain intact on the inside

and though the autumn brings a fall of leaves

let it grow tall

—PROTEST THE HERO

Resumo

During the last decade, the worldwide computer network evolved from a means of connecting a small group of nodes into the medium through which people obtain knowledge, social interaction, and entertainment. Furthermore, our homes and workstations are no longer our only points of access to the network. According to Cisco, global network traffic in 2018 will be three times higher than it was in 2013. Real-time entertainment has been and will remain an important part of this growth. However, the network was not designed to handle this demand; therefore, new technologies are needed to overcome these challenges.

Content Delivery Networks (CDN) are a good alternative for overcoming these challenges. Their basic concept is to distribute replica servers geographically, thus keeping content close to users. Following their popularity, a growing number of CDNs, most of them local, began to be deployed. In addition, cloud computing emerged, making software and hardware resources accessible through well-defined interfaces. Cloud services, such as distributed Infrastructure as a Service (IaaS), make it possible to deploy complex CDNs. Although CDNs are the best technology for content delivery in terms of scalability, some scenarios still challenge them, such as flash crowd events. Therefore, we need to study content delivery strategies to keep up efficiently with the constant growth in demand for content, while also taking advantage of new possibilities such as the growth of localized CDNs and the popularization of cloud computing.

Examining the issues raised, this thesis presents strategies towards improving Content Delivery Networks (CDN). We do so by proposing and evaluating algorithms, models, and a prototype demonstrating possible uses of such technologies to improve CDN resource management. We present P2PCDNSim, a CDN simulator designed to assist researchers in planning and evaluating new strategies. In addition, we propose a new dynamic replica placement strategy, based on counting the data flows passing through nodes, that maintains a similar Quality of Experience (QoE) while reducing traffic between Autonomous Systems (AS). Furthermore, we propose a solution based on Software Defined Networks (SDN) that increases the flexibility of replica server placement within the mobile backhaul. Our experimental results show that the delay introduced by our module is less than 5 ms for 99% of transmitted packets, a minimal delay in current Long-Term Evolution (LTE) networks.

Keywords: Content Delivery Networks. Software Defined Networks. Cloud Computing. P2P. Discrete Event Simulation. GPRS Tunneling Protocol

Abstract

During the last decades, the World Wide Web went from a way to connect a handful of nodes to the means with which people cooperate in search of knowledge, social interaction, and entertainment. Furthermore, our homes and workstations are not the only places where we are connected; the mobile broadband market is present and changing the way we interact with the web. According to Cisco, global network traffic will be three times higher in 2018 than it was in 2013. Real-time entertainment has been and will remain an important part of this growth. However, the internet was not designed to handle such demand and, therefore, there is a need for new technologies to overcome those challenges.

Content Delivery Networks (CDN) prove to be an alternative to overcome those challenges. The basic concept is to distribute replica servers geographically, keeping content close to end users. Following the popularity of CDNs, an increasing number of them, most extremely localized, began to be deployed. Furthermore, Cloud Computing emerged, making software and hardware accessible as resources through well-defined interfaces. Using Cloud services, such as distributed IaaS, one could deploy complex CDNs. Despite being the best technology to scale content distribution, there are some scenarios where CDNs may perform poorly, such as flash crowd events. Therefore, we need to study content delivery techniques to efficiently keep up with the ever-increasing need for content, taking into account new possibilities such as the growing number of smaller localized CDNs and Cloud Computing.

Examining these issues, this work presents strategies towards improvements in Content Delivery Networks (CDN). We do so by proposing and evaluating algorithms, models, and a prototype demonstrating possible uses of such new technologies to improve CDN resource management. We present P2PCDNSim, a comprehensive CDN simulator designed to assist researchers in the process of planning and evaluating new strategies. Furthermore, we propose a new dynamic Replica Placement Algorithm (RPA), based on the count of data flows through network nodes, that maintains similar Quality of Experience (QoE) while decreasing cross traffic during flash crowd events. Also, we propose an SDN-based solution to improve the mobile backhaul’s replica placement flexibility. Our experimental results show that the delay introduced by the developed module is less than 5 ms for 99% of the packets, which is negligible in today’s LTE networks, and the slight negative impact on streaming rate selection is easily outweighed by the increased flexibility.

Keywords: Content Delivery Networks. Software Defined Networks. Cloud Computing. Simulation. Discrete Event. GPRS Tunneling Protocol

List of Figures

1.1 Peak period aggregate traffic composition.
1.2 The growth of the mobile market.

2.1 500 nodes topology used in the evaluation of the FlowCount strategy.
2.2 Bandwidth timeline during the YouTube scenario simulation.
2.3 Cross traffic timeline during YouTube scenario simulation.
2.4 Startup delay box plot for YouTube scenario considering all clients.
2.5 Startup delay box plot for YouTube scenario excluding outliers.
2.6 Bandwidth timeline during the YouTube scenario simulation.
2.7 Cross traffic timeline during YouTube scenario simulation.

3.1 Basic model components.
3.2 Comparison of results obtained by our model with results from Molina, Palau, and Esteve (2004), considering the same scenario (Table 3.2).
3.3 Comparison of response time considering scenarios with and without multiple CDN collaboration.
3.4 Relation between k_l, k_f, which represent local and foreign CDN capacities respectively, and response time.
3.5 Relation between k_f, mcp and response time.
3.6 Relation between k_l, mcp and response time.
3.7 Comparison of response time considering two redirection strategies, namely Recursive and Interactive Request Routing.
3.8 Comparison of response time for cache hit values [0,1] considering Recursive Request Routing, Interactive Request Routing and no collaboration.

4.1 LTE Mobile backhaul architecture.
4.2 Software Defined Networks (SDN) control plane illustration.
4.3 Transparent caching for LTE networks architecture.
4.4 Experimental testbed topology.
4.5 Flow diagram illustrating TC’s operation to transparently redirect and splice content between DC and CSVR.
4.6 Box plot of bitrates for 15s length video segments.
4.7 CDF of bitrate for 1s-15s segments for GTP+TC scenario.
4.8 Segment bitrate selection frequency for three scenarios and n = 4, 32, 64, 128 simultaneous DASH clients. Bitrates in the legend in kbps.
4.9 Autocorrelation function (ACF) of segment download rates for three scenarios and n = 4, 32, 64, 128 simultaneous DASH clients.

A.1 Illustration of the basic CDN concept, store content close to end users.
A.2 CDN components and how they interact based on an illustration found in (BUYYA; PATHAN; VAKALI, 2008).
A.3 Request rate variation over time during a flash crowd event.
A.4 Screenshot of P2PCDNSim’s GUI. This realistic globe shaped interface enables real-time on-the-fly metric monitoring.
A.5 The modularized architecture of the P2PCDNSim simulator.
A.6 Example of a possible hybrid CDN-P2P topology where a set of nodes compose an AS.
A.7 Example of a simple topology where each node is considered a single AS.
A.8 Screenshot from the P2PCDNSim simulation scenario wizard.
A.9 TCP congestion window behavior comparing ns-3 and P2PCDNSim.
A.10 Memory usage comparison between ns-3 and P2PCDNSim considering the three nodes scenario.
A.11 Memory usage comparison between ns-3 and P2PCDNSim considering the 20 nodes scenario.

C.1 Cache hit count collected from all replica servers during the simulation.
C.2 Cache miss count collected from all replica servers during the simulation.
C.3 Cross AS traffic rate, in bits/second, during the simulation.
C.4 Inner AS traffic rate, in bits/second, during the simulation.
C.5 Total Network traffic rate, in bits/second, during the simulation.
C.6 Startup Delay mean collected from all clients that requested content during the experiment.
C.7 Startup Delay through simulation time collected from all clients that requested content during the experiment.

List of Tables

2.1 Description of the 24 hours scenario generated using ProwGen (BUSARI; WILLIAMSON, 2002)
2.2 95th percentile in MB/s for YouTube scenario considering Total Network Traffic and Cross AS Traffic metrics.
2.3 95th percentile for ProwGen scenario considering Total Network Traffic and Cross AS Traffic metrics.

3.1 Description of variables used in our proposed model.
3.2 Second scenario describing a small network with 8 client clusters and 4 Replica Servers (RSs).
3.3 Description of the scenario used to evaluate whether it is worth collaborating.

5.1 Scientific papers produced related to this Thesis.
5.2 Other publications

A.1 Results collected from both simulators for the three nodes topology.
A.2 Results collected from both simulators for the 20 nodes topology.
A.3 Scenario configuration for the comparison between P2PCDNSim and CDNSim.
A.4 Cache hit results collected for scenarios 1 and 2 using P2PCDNSim and CDNSim.
A.5 List of all contributions related to the P2PCDNSim simulator tool.

List of Acronyms

3GPP   3rd Generation Partnership Project
AS     Autonomous System
AIS    Accounting Internetworking System
ATM    Asynchronous Transfer Mode
CAPEX  Capital Expenditure
CDN    Content Delivery Networks
CP     Content Providers
CPM    Constraint P-Median
CDNI   Content Delivery Networks Interconnection
CSN    Content Service Networks
CDI    Content Distribution Internetworking
CSDN   Content and Service Delivery Networks
CSD    Content Server Directory
CSP    Content Service Provider
CLD    Content Location Directory
CCN    Content Centric Networks
DNS    Domain Name System
DIS    Distribution Internetworking System
DASH   Dynamic Adaptive Streaming over HTTP
DPI    Deep Packet Inspection
EPS    Evolved Packet System
GTP    GPRS Tunnelling Protocol
GPRT   Network and Telecommunications Research Group
ISP    Internet Service Provider
IaaS   Infrastructure as a Service
IETF   Internet Engineering Task Force
IDNS   Intelligent Domain Name Server
ICN    Information Centric Network
LFU    Least Frequently Used
LTE    Long-Term Evolution
LRU    Least Recently Used
MME    Mobility Management Entity
MPD    Media Presentation Description
ns-3   Network Simulator 3
NAT    Network Address Translation
NRS    Name Routing System
P2P    Peer to Peer
PGW    Packet Data Network Gateway
PDH    Plesiochronous Digital Hierarchy
OS     Origin Server
OPEX   Operating Expense
QoE    Quality of Experience
RAN    Radio Access Network
RPA    Replica Placement Algorithm
RS     Replica Servers
RR     Request Redirector
RRIS   Request-routing Internetworking System
RIEP   Request-Routing Information Exchange Protocol
RTT    Round Trip Time
SLA    Service Level Agreements
SDN    Software Defined Networks
SGW    Serving Gateways
TCP    Transmission Control Protocol
TEID   Tunnel Endpoint Identifier
UGC    User Generated Content
UE     User Equipment
VO     Virtual Organizations

Contents

1 Introduction
  1.1 Motivations
  1.2 Objectives
  1.3 Thesis Outline

2 FlowCount: Dynamic Replica Placement Algorithm
  2.1 Introduction
  2.2 RPA State of the Art
  2.3 FlowCount Algorithm Description
    2.3.1 FlowCount Strategy
    2.3.2 Complexity Analysis
  2.4 Experiments and Results
    2.4.1 Metrics
    2.4.2 Simulation Results
  2.5 Conclusion

3 Multiple CDNs Collaboration
  3.1 Introduction
  3.2 State of the Art
    3.2.1 CDN Collaboration
    3.2.2 CDN Models
  3.3 CDN Collaboration Model
    3.3.1 Model Discussion
  3.4 Methodology and Initial Experimental Results
  3.5 Collaboration Overhead
    3.5.1 Collaboration Overhead Results
  3.6 Lessons Learned
  3.7 Conclusion

4 Enabling Transparent Caching in LTE Mobile Backhaul
  4.1 Introduction
  4.2 Background
    4.2.1 Mobile Backhaul Architecture
    4.2.2 Stateful L4-L7 Processing in SDN
    4.2.3 MPEG DASH Streaming
  4.3 Transparent Caching in the Mobile Backhaul
    4.3.1 Design and Function Placement
    4.3.2 Switch-based transparent caching for LTE
  4.4 Prototyping and Experimental evaluation
    4.4.1 Prototype Implementation
    4.4.2 Experiment Methodology
    4.4.3 Throughput Performance
    4.4.4 DASH Streaming Performance
  4.5 Related work
  4.6 Conclusion

5 Conclusion
  5.1 Contributions of this Thesis
  5.2 Future Works

References

Appendix

A P2PCDNSim Simulation Tool
  A.1 Content Delivery Networks
    A.1.1 CDN Architecture
    A.1.2 The Flash Crowd Challenge
  A.2 CDN Simulation State of the Art
  A.3 P2PCDNSim Simulation Tool
    A.3.1 Architecture
    A.3.2 Network Layer Comparison and CDN Layer Validation
  A.4 Lessons Learned
  A.5 Observational Study
  A.6 Concluding Remarks

B P2PCDNSim I/O

C Report System Graph Examples


1 Introduction

If you would cause your view . . . to be acknowledged by scientific men; you would do a great service to science. If you would even get them to say yes or no to your conclusions it would help to clear the future progress. I believe some hesitate because they do not like their thoughts disturbed.

—MICHAEL FARADAY

The World Wide Web started as a way to connect a handful of nodes, and it is now the means through which people cooperate in search of knowledge, social interaction and entertainment. Not only is broadband access ever more present, but “Next-Generation Access”, meaning connections above 30 Mbps, has reached almost complete urban coverage in the US. According to the North American Federal Communications Commission, more than 80% of households have high-speed broadband connections (CHIEF, 2015). Furthermore, our homes and workstations are not the only places where we are connected; the mobile broadband market is changing the way we interact with the World Wide Web. Shipments of mobile devices are increasing worldwide each year; according to the International Data Corporation (IDC), they are expected to grow at a compound annual growth rate (CAGR) of 12.7% from 2013 to 2018 (IDC, 2014). Mobile network traffic is therefore expected to follow suit. According to Cisco, the expected CAGR for the mobile market is 54% from 2014 to 2019 (CISCO, 2015). These numbers only support something that is clear in our everyday activities: we have never been as connected as we are today.
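To make the compound annual growth rates quoted above concrete, the cumulative growth implied by a constant CAGR $r$ over $n$ years can be worked out as below. This is only an illustrative calculation that assumes the rate holds constant over the whole period; the symbols $V_0$, $V_n$, $r$ and $n$ are introduced here for the example and do not come from the cited reports.

```latex
\begin{align*}
V_n &= V_0\,(1 + r)^n \\
\text{device shipments, 2013--2018:}\quad \frac{V_5}{V_0} &= (1 + 0.127)^5 \approx 1.82 \\
\text{mobile market, 2014--2019:}\quad \frac{V_5}{V_0} &= (1 + 0.54)^5 \approx 8.66
\end{align*}
```

Under this assumption, a 12.7% CAGR corresponds to growth of roughly 82% over five years, while a 54% CAGR corresponds to a more than eightfold increase over the same span.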

The presence of more devices leads to more people connected, which, in turn, increases network traffic. According to Cisco’s network forecast, global network traffic will continue to increase (FORECAST, 2014), resulting in total network traffic almost three times higher in 2018 than in 2013. Mobile data traffic is also increasing; according to the same source, it will grow at a CAGR of 57% from 2014 to 2019. One of the main contributors to this growth is multimedia streaming. According to the Global Internet Phenomena report (ULC, 2014), in North America real-time entertainment traffic is responsible for 59.09% of the aggregated traffic for fixed access and 36.07% for mobile access, as we can see in Figure 1.1(a) and Figure 1.1(b), respectively. Both industry and academia agree that real-time entertainment has been and will remain an important subject, as confirmed by Kurt Michel, the director of product marketing at Akamai: “live video streaming has become an increasingly important part of the web content universe, as a variety of businesses and organizations attempt to capture a ‘share of eyeball’ and deliver richer, more HDTV-like experiences” (MICHEL, 2013). Several studies argue that live streaming is increasingly popular (ZHUANG; GUO, 2011; FORECAST, 2014) and also that HD streaming is an established standard for all viewing experiences, with Super HD technology becoming the next big thing in video content delivery. However, the internet was not designed to handle such demand and, therefore, new technologies are proposed to overcome those challenges. We need to study content delivery techniques to plan for and efficiently keep up with the ever-increasing need for content.

Figure 1.1: Peak period aggregate traffic composition.

(a) Traffic composition for fixed access. (b) Traffic composition for mobile access.

Source: Sandvine’s global Internet phenomena report (ULC, 2014).

1.1 Motivations

Examining content delivery scalability, the first technology that comes to mind is Content Delivery Networks (CDN). CDNs were responsible for around 37% of global traffic in 2013 according to Cisco’s forecast. Furthermore, according to the same forecast, they will be responsible for as much as 55% of all internet traffic in 2018. Considering only video traffic, the numbers are even higher: from 53% in 2013 to 67% in 2018. It is expected that every major content provider is either using a commercial CDN to deliver content or deploying its own. Netflix reportedly started deploying its own CDN, OpenConnect, to handle the impressive amount of video content it delivers. Considering North America alone, Netflix is responsible for more than a third (34.2%) of all network traffic (ULC, 2014). CDN-related numbers are always impressive, as the technology has become a critical part of the current internet infrastructure; for instance, Akamai claims to handle between 15% and 30% of all internet traffic (AKAMAI. FACTS & FIGURES., 2013).

Content Delivery Networks (CDN) prove to be the alternative to overcome challenges imposed by the increasing traffic demands and the best effort nature of the network. The basic concept behind a CDN design is to keep content close to end users. This is done by strategically placing several servers near end users. Those servers are called Surrogates, Caches or Replica Servers (RS). They interact with a special server, called the Origin server, to obtain the most popular content according to users’ location. Finally, we need a coordinator to complete the basic infrastructure of a CDN, which assures that clients are redirected to the most suitable, frequently the closest, replica server. The name of this entity is Request Redirector (RR).

Despite being based on a simple concept, bringing content closer to end users, CDNs are complex systems that require several decisions to enable content delivery, such as replica server management. The RS management problem includes RS placement and scaling. The main question of RS placement is where to place replicas. The fundamental principle is to place them as close to end users as possible. However, there is a finite number of replica servers. Therefore, one should find a way to evaluate the best placement according to one’s scenario. Replica Placement Algorithms (RPAs) are designed to evaluate scenarios and propose the placement of available replicas. On the other hand, the RS scaling problem relates to the necessity and outcomes of scaling the overall caching capacity of the system. Scaling the caching capacity could mean increasing the number of replica servers, increasing their capacity, or establishing a cooperation agreement between peering CDNs.

CDNs are de facto accepted as the primary content delivery strategy. The result of CDNs’ popularity is an increasing number of CDNs deployed with several purposes (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012), varying regarding content, coverage, and capacity. Among them, a very limited number aims to distribute content on a worldwide scale, such as Akamai and Limelight. Most of them have a restricted coverage, being extremely localized, for instance, within an Internet Service Provider (ISP) (BERTRAND et al., 2012; FRANK et al., 2013; SHARMA; VENKATARAMANI; SITARAMAN, 2013). Scaling each CDN, in terms of coverage and capability, would be very expensive. A possible way to deal with this RS scaling problem would be through collaboration between restricted CDNs. This way, CDNs’ coverage could expand temporarily to handle a set of requests greater than the local CDN capacity (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012; JESUS; AGUIAR, 2012). CDNs could also negotiate collaborations to fulfill service level agreements established between the CDN and content providers. However, effective collaboration raises several challenges (PATHAN et al., 2007a; NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012). Furthermore, considering the diversity of existing CDNs, determining their essential features to decide which third-party CDN is most suitable to tackle a problem poses a significant challenge.

Such a scenario creates demand for a way to evaluate collaboration scenarios among CDNs to quantify the possible gain obtained through collaboration, as well as to gather more information about the several variables involved in the process.

One important thing to note about CDNs is that this technology has been part of the Internet since at least the beginning of this century. The concept is not new, but new possibilities are available, driven by an ever-growing number of smaller localized CDNs, along with new technologies, such as Cloud Computing and Software Defined Networks (SDN).

Cloud Computing emerged as a buzzword at the end of the last decade and has been a trending topic ever since. The cloud can be seen as a conceptual layer on the Internet, which makes all available software and hardware resources transparent and accessible through a well-defined interface. Notions like on-demand self-service, broad network access, resource pooling (DILLON; WU; CHANG, 2010) and other trademarks of Cloud Computing services were key to its current popularity. Also, many Cloud Storage services have emerged in recent years, providing data storage in several continents and backed by rigorous Service Level Agreements (SLA) (CATHERINE; EDWIN, 2013). Resources are made available through provider-specific Web Service APIs. This way, through Cloud services, one could deploy complex CDNs using the IaaS distributed infrastructure, or alternatively a simpler CDN using Cloud Storage services.

We believe that both technologies open new possibilities for CDNs, resulting in more flexible resource management. For instance, a basic step to deploy a CDN is to place replica servers, using an RPA. Usually, replica servers are statically placed, meaning that they will not be moved during the operation of the CDN. However, the flexibility provided by Cloud Computing services enables on-demand deployment and reallocation of replica servers to different geographic regions. Such flexibility opens the space for new strategies, for instance, new RPAs, that benefit from those new technologies.

As the number of mobile users grows, the role of mobile carriers in content delivery increases. Figure 1.2 illustrates the growth in mobile devices and mobile access to multimedia content. The trend indicates that consumers are expanding their number of preferred platforms, imposing on mobile carriers the challenge of coping with this growth while meeting consumers’ expectations.

The current mobile network architecture is an IP packet switched network built around 3rd Generation Partnership Project (3GPP) specifications that define the Evolved Packet System (EPS) architecture, the foundation of fourth-generation mobile networks. In the EPS, traffic is encapsulated in GPRS Tunnelling Protocol (GTP) tunnels transporting packets from edge nodes, eNodeB, to mobile network gateways, such as the Packet Data Network Gateway (PGW), where packets are forwarded towards the global Internet. Although this architecture has been able to drive the ongoing mobile revolution, it lacks the flexibility needed to dynamically control the constantly increasing amount of mobile traffic. Indeed, GTP tunnels impose that packets traverse the whole mobile infrastructure, and transform the mobile backhaul into a passive network segment in which traffic cannot be dynamically managed. Forced by the limited


Figure 1.2: The growth of the mobile market.

Source: Conviva’s viewer experience report from (CONVIVA, 2015).

flexibility of the current infrastructure, to meet bandwidth and latency requirements despite the increasing amount of traffic in their networks, mobile network operators have been constantly increasing their network capacity. Despite being effective, increasing the network capacity significantly increases mobile network operators’ costs, as it requires the deployment of additional infrastructure. As a result, mobile edge solutions have more recently become available, e.g., LTE caches by ARA Networks 1, and DatE by I-Direct 2. However, because they are limited to the network edge and do not allow GTP tunnels to be bypassed, edge solutions miss the potential benefits of in-network caching and dynamic traffic management.

Summarizing, this Thesis investigates the following research questions:

• How to build on new technologies, such as Cloud Computing, to propose new and improved RPAs? We believe that new technologies open new possibilities for CDNs, possibly resulting in more flexible resource management. For instance, a primary step to deploy a CDN is to place replica servers, using a Replica Placement Algorithm (RPA). Usually, replica servers are statically placed, meaning that they will not be moved during the operation of the CDN. However, the flexibility provided by Cloud Computing services enables on-demand deployment and reallocation of replica servers to different geographic regions. A new RPA that benefits from those technologies is proposed and investigated in this thesis, comparing its performance with well-known RPAs.

1 http://www.aranetworks.com/solutions/mobile_edgeCDN
2 http://www.idirect.net/Altobridge.aspx


• How to leverage CDNs’ popularity to solve momentary RS scaling demands? The result of CDNs’ popularity is an increasing number of CDNs deployed with several purposes (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012), varying regarding content, coverage, and capacity. Through collaboration between restricted CDNs, coverage could expand temporarily to handle an unusual set of requests. There is, therefore, the need for a way to analyze collaboration scenarios among CDNs to quantify the possible gain obtained through collaboration. This thesis presents an analytical model for collaboration among CDNs, considering, among other variables, client dispersion through the network, different CDN capacities, and cache misses.

• How to increase mobile backhaul flexibility to enhance replica placement possibilities? The current mobile network architecture lacks the flexibility needed to control the constantly increasing amount of mobile traffic, because traffic is encapsulated in GPRS Tunnelling Protocol (GTP) tunnels transporting packets from edge nodes to mobile network gateways that forward packets towards the global Internet. Although effective, increasing the network capacity significantly increases mobile network operators’ costs. Edge caching solutions have become available; however, since they do not allow GTP tunnels to be bypassed, they miss the potential benefits of in-network caching and dynamic traffic management. This thesis presents a solution to enhance the mobile backhaul’s flexibility based on SDN technology, compliant with the current architecture. In particular, we propose a user-space extension to OpenFlow switches inside the mobile backhaul and show the benefits of network devices’ programmability by designing and prototyping a transparent cache service.

1.2 Objectives

Considering these research challenges, the main objective of this Research Project is to propose and evaluate replica server management strategies to improve multimedia content delivery efficiency. The specific goals of this Doctoral Thesis are:

• To propose a new RPA, considering new technologies available, that reduces OPEX costs with little to no impact on Quality of Experience (QoE).

• To construct and evaluate models that represent the collaboration between CDNs, considering, among other variables, client dispersion through the network and different bandwidth capacities.

• To propose a solution to enhance the mobile backhaul’s flexibility compliant with the current architecture.


1.3 Thesis Outline

This Doctoral Thesis identifies the challenges involved in replica server management for multimedia content delivery and presents solutions (algorithm, tool, model and prototype) to improve replica server management considering different application scenarios. The remainder of this document is organized as follows:

Chapter 2 presents a new RPA called FlowCount. It is a greedy algorithm based on the number of flows passing through the nodes that compose the network. We also present a comparison between the proposed strategy and other well-known RPAs.

Chapter 3 shows an analytical model that represents the collaboration between CDNs.

Chapter 4 presents a user-space extension to OpenFlow switches inside the mobile backhaul and shows the benefits of network devices’ programmability by designing and prototyping a transparent cache service.

Chapter 5 presents contributions and future works of this Thesis.


2 FlowCount: Dynamic Replica Placement Algorithm

I never am really satisfied that I understand anything; because, understand it well as I may, my comprehension can only be an infinitesimal fraction of all I want to understand about the many connections and relations.

—ADA LOVELACE

Taking into consideration the tool and concepts presented in Appendix A, this chapter presents a new dynamic RPA strategy called FlowCount. The strategy builds on Distributed Cloud computing services to efficiently manage CDN resources while maintaining QoE. The primary goal of this Chapter is to describe the proposed strategy and present the evaluation made using the simulator P2PCDNSim.

This Chapter is organized as follows: Section 2.2 presents the state of the art regarding RPAs. Section 2.3 presents a description of the algorithm along with its analysis. Section 2.4 presents an evaluation of the algorithm, comparing its performance with other RPAs found in the literature. Finally, Section 2.5 presents concluding remarks.

The results obtained from this Chapter were published in Rodrigues et al. (2013a) and Rodrigues et al. (2013b).

2.1 Introduction

The Internet plays a crucial role in our modern society, and its usage is increasing every day, bringing new challenges. CDNs helped improve accessibility through content replication in replica servers near clients (BUYYA; PATHAN; VAKALI, 2008). The success of CDNs is illustrated by the impressive numbers of Akamai, one of the key players in the market. Akamai claims to handle between 15 and 20% of the world’s Web traffic, corresponding to over a trillion requests per day (QUARTER, 2012). The idea behind CDNs is to go beyond the classical client-server architecture and spread content towards the network edges. Therefore, one of the primary concerns is to decide where to place content. Techniques designed to solve this problem fall into two basic categories: caching algorithms and Replica Placement Algorithms (RPAs). The former category consists of distributed algorithms that perform content management within replica servers’ storage areas. They are also known as caching replacement techniques or, simply, caching techniques (JAMIN et al., 2001). The latter category relates to choosing the best location for replica servers, thus reducing client-perceived latency and bandwidth consumption. Furthermore, RPAs can be divided into two categories, namely static and dynamic. Static strategies consider only one supposedly perfect placement; in other words, replicas are fixed. Dynamic strategies, on the other hand, adapt the placement of replicas according to scenario changes.

CDN providers understood the importance of Cloud Computing in the very early stages and started leveraging this new paradigm. Cloud Computing emerged proposing a conceptual layer on the Internet, which makes all available software and hardware resources transparent and accessible through a well-defined interface. Notions such as on-demand self-service, broad network access, resource pooling (DILLON; WU; CHANG, 2010) and other trademarks of Cloud Computing services were essential to its current popularity. Most Cloud Computing providers, like major CDN players, rely on large and consolidated data centers. Such data centers are expensive and hard-to-manage infrastructures. Thus, small and geographically distributed data centers could also be an alternative for Cloud providers, since they can offer cheaper and low-power consumption alternatives that reduce the significant costs of centralized data centers. These small and distributed data centers can be built and connected in different geographical regions to form a Distributed Cloud (DCloud) (CHANDRA; WEISSMAN, 2009; GONÇALVES et al., 2012). Furthermore, many Cloud Storage services have emerged in recent years, providing data storage in several continents backed by rigorous Service Level Agreements (SLAs) (CATHERINE; EDWIN, 2013). Cloud-oriented CDNs use DCloud infrastructure to map CDN infrastructure components into virtual components in the Cloud. For instance, a surrogate server can be mapped into an Infrastructure as a Service (IaaS) service or even a Cloud storage service. These capabilities can bring new opportunities, such as enabling a small business to become a CDN provider, offering content delivery service for third parties without the cost of owning or operating geographically distributed data centers. An example of this approach is MetaCDN (BROBERG; BUYYA; TARI, 2009).

We believe that those technologies open new possibilities for CDNs, resulting in more flexible resource management. For instance, through Cloud services, one could dynamically manage CDN resources to keep up with specific demands, such as flash crowd events. In the next Section, we present the state of the art regarding RPAs.


2.2 RPA State of the Art

Some theoretical approaches model the RPA problem as the “center placement problem”: for the placement of a given number of centers, they minimize the maximum distance between a node and the nearest center. Some variants of this problem are the facility location problem, k-hierarchically well-separated trees and the minimum K-center problem (BARTAL, 1996). The minimum K-center problem is too complex and computationally intensive to be used in practice.
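For reference, the minimum K-center objective described above can be written compactly. The notation is introduced here only for illustration: V is the set of nodes, d(v, s) is the distance between nodes v and s, and S is the set of K chosen centers.

```latex
\min_{S \subseteq V,\; |S| = K} \;\; \max_{v \in V} \;\; \min_{s \in S} \; d(v, s)
```

The inner minimum is the distance from node v to its nearest center, and the outer maximum selects the worst-served node, which the placement tries to keep as close to a center as possible.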

Due to the computational cost of these algorithms, some heuristics have been proposed. They take into account existing information from a CDN, such as workload patterns and the network topology, and devise reasonable solutions with a lower computation cost. The work in Chen, Katz, and Kubiatowicz (2002) evaluated a set of heuristic-based strategies characterized along three axes: metric scope (the technique used, centralized or decentralized computation), approximation method (e.g. ranking, relaxation, fixed threshold, and dynamic programming) and cost function simplification.

During the last decade, there has been a considerable number of research papers on replica placement. The first algorithms fit into the static placement group (KRISHNAN; RAZ; SHAVITT, 2000; QIU; PADMANABHAN; VOELKER, 2001; JAMIN et al., 2001; KANGASHARJU; ROBERTS; ROSS, 2002). Overall, the best representatives of static placement strategies are Greedy and HotSpot, previously discussed in Section A.1.1.

The next RPA group comprises dynamic strategies, dynamic in the sense that RSs are added or moved according to the dynamically changing user request traffic (PRESTI; BARTOLINI; PETRIOLI, 2005). They represent an improvement on static approaches, which adapt poorly to changes in user requests. The work in (KHAN et al., 2009) discussed a robust replica placement for improved performance under the uncertainty of random server failures, while (XU; BHUYAN, 2005) also introduced QoE awareness. In (SUN et al., 2011), a model to reduce the computational cost of the heuristics is presented to address problems of limited storage capacity. It also performed a comparison through simulation of the main heuristics found in the literature.

The authors in Khalaji and Analoui (2013) present an evaluation of several RPAs considering the hybrid CDN-Peer to Peer (P2P) scenario. This scenario proposes the active participation of CDN clients in a P2P distribution network to assist the CDN’s content delivery system. The study in question presents a comparison between well-known strategies, namely Greedy, Hot Spot and Constraint P-Median (CPM). CPM is an approach based on minimizing the sum of weighted distances from all vertices to the selected points. Their simulation results show that, considering the Hybrid CDN scenario, the strategy that results in the lowest cost of content replication is Greedy.
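The weighted-distance objective behind the P-Median family can be sketched as follows. The symbols are illustrative assumptions: V is the set of vertices, w_v is the weight (for instance, demand) of vertex v, d is the distance, and S is the set of p selected points. The additional constraints that make the problem "constrained" are not specified in this section, so they are left out.

```latex
\min_{S \subseteq V,\; |S| = p} \;\; \sum_{v \in V} w_v \cdot \min_{s \in S} d(v, s)
```

Unlike the K-center objective, which protects the worst-served node, this sum favors placements that are good on average across all weighted vertices.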

In the next Section, we present FlowCount, our proposal for a dynamic RPA based on a Greedy heuristic that uses the number of flows and distance between servers and clients to manage replica placement.


2.3 FlowCount Algorithm Description

This section presents a detailed description of the FlowCount strategy, divided into two subsections. The first subsection presents a description of our approach, describing the functioning of the reallocation strategy and showing an example of how we count flows. The second subsection shows a complexity analysis of the proposed algorithm.

2.3.1 FlowCount Strategy

The CDN can be modeled as a number of nodes, replica servers, clients and a routing matrix. The expected output of an RPA is a placement matrix linking the replica servers to the nodes, representing the optimal location(s) (KARLSSON et al., 2002). The replica placement problem consists basically of a cost function that has to be optimized under certain constraints (number of replicas, server capacity or client quality of experience). The replica placement problem belongs to the NP-complete complexity class (SUN et al., 2011).

Our FlowCount placement strategy is based on a greedy algorithm with a particular selection function. This selection function uses the number of flows as the main metric to decide upon placement. We consider that nodes with a high flow count could represent central nodes regarding content distribution. The FlowCount placement strategy follows this idea and counts all flows passing through all routing nodes in the topology, then later uses this information to place replica servers.

We divide our strategy into two basic parts. The first one is counting flows and the second one is analyzing the flow counts to decide where to place replicas. The first part is made by an analyzer running on every routing node that counts and updates tables with information about all flows passing through that node. We identify SDN technologies as a possible tool to enable flow monitoring. In our experiments, a flow is represented by the object identifier and the destination address. This information is stored to prevent counting a recognized flow again. Recent works involving both Content-Aware Networks (CAN) (NICULESCU et al., 2011) and new network management tools (MCKEOWN et al., 2008) present new horizons that make collecting this information possible.
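The flow-counting part can be illustrated with a minimal sketch. It assumes, as stated above, that a flow is identified by the pair (object identifier, destination address) and that a recognized flow is not counted twice. The class and method names (`FlowAnalyzer`, `observe`, `counts`, `reset`) are hypothetical and are not taken from P2PCDNSim.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Flow = Tuple[str, str]  # (object identifier, destination address)


class FlowAnalyzer:
    """Hypothetical per-node flow table; names are illustrative only."""

    def __init__(self) -> None:
        # For every routing node, the set of flows already recognized there.
        self._seen: Dict[Hashable, Set[Flow]] = defaultdict(set)

    def observe(self, node: Hashable, object_id: str, destination: str) -> None:
        # A set ignores duplicates, so a recognized flow is never counted again.
        self._seen[node].add((object_id, destination))

    def counts(self) -> Dict[Hashable, int]:
        # Flow count per node: the number of distinct flows seen at that node.
        return {node: len(flows) for node, flows in self._seen.items()}

    def reset(self) -> None:
        # Start a fresh observation window with empty tables.
        self._seen.clear()
```

Storing the flows in a set keeps the duplicate check implicit; a real analyzer on a switch would likely need bounded tables, which this sketch does not model.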

The second part of the strategy relates to the decision regarding where to place servers based on two different pieces of information: topology and flow counts. The topology represents the network and candidate locations to place servers. Nodes with higher flow counts are likely central nodes regarding traffic; in other words, they should be critical nodes for the overall content distribution. Since the number of flows passing by each node is directly related to the current replica placement, it is important to notice that we reset the flow counts of all nodes to zero after every replica relocation. The pseudo-code in Algorithm 1 illustrates the placement selection process for the FlowCount placement strategy. The first for loop selects a candidate node, representing a possible node to place a replica. Line 4 updates the flow count for the selected candidate node.


The second for loop calculates the cost of serving content according to the current configuration, considering the current candidate node and previous replicas placed. After examining all nodes, the result of selecting the current candidate node is compared with the best cost so far, and the best choice is updated accordingly. To dynamically adapt to workload changes, we repeat this process every T seconds. We consider the cost as being flowCount · distanceTo(node), as illustrated in lines 6 and 9 of Algorithm 1. By selecting the minimum between the cost to the candidate and the cost to the already placed replicas, the algorithm tries to minimize the total flow count. The overall configuration costs are ordered and replicas are placed according to the placement that resulted in the lowest cost.

Algorithm 1 Algorithm that selects the best candidate node to place a replica server.
 1: procedure BESTNODE
 2:   for each candidateNode ∈ nodeList do
 3:     for each node ∈ nodeList do
 4:       flowCount ← flowCountList(node).
 5:       for each replica ∈ placedReplicaList do
 6:         replicaCost ← flowCount × distanceTo(replica).
 7:         if replicaCost < costToClosestReplica then
 8:           costToClosestReplica ← replicaCost.
 9:       costToCandidate ← flowCount × distanceTo(candidateNode).
10:       cost ← min(costToClosestReplica, costToCandidate).
11:       if cost < lowerCost then
12:         bestNode ← node.
13:         lowerCost ← cost.
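The selection process can be sketched in Python under one reading of the accompanying prose: for each candidate, the cost flowCount × distance is accumulated over all nodes, each node being served by the nearest of the already placed replicas and the candidate, and the candidate with the lowest total is kept. This accumulation, the function names, and the dictionary-based distance table are assumptions made for illustration; the sketch is not the P2PCDNSim implementation.

```python
from typing import Dict, Hashable, List, Optional

Node = Hashable


def best_node(
    nodes: List[Node],
    flow_count: Dict[Node, int],
    distance: Dict[Node, Dict[Node, float]],
    placed: List[Node],
) -> Optional[Node]:
    """Return the candidate whose selection yields the lowest total cost."""
    best, lowest = None, float("inf")
    for candidate in nodes:
        if candidate in placed:
            continue  # assumption: a node hosts at most one replica
        total = 0.0
        for node in nodes:
            # Each node is served by the nearest of the placed replicas
            # and the candidate under evaluation.
            nearest = min(distance[node][s] for s in placed + [candidate])
            total += flow_count[node] * nearest
        if total < lowest:
            best, lowest = candidate, total
    return best


def place_replicas(
    nodes: List[Node],
    flow_count: Dict[Node, int],
    distance: Dict[Node, Dict[Node, float]],
    k: int,
) -> List[Node]:
    """Greedy placement: add the best candidate k times."""
    placed: List[Node] = []
    for _ in range(k):
        choice = best_node(nodes, flow_count, distance, placed)
        if choice is None:
            break  # fewer candidate nodes than requested replicas
        placed.append(choice)
    return placed
```

In a dynamic setting, `place_replicas` would be called again every T seconds with the flow counts gathered during the last window.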

2.3.2 Complexity Analysis

This Section presents a complexity analysis of the FlowCount RPA. Within this Section we will consider K to be the total number of replica servers and N to be the number of possible nodes where one can allocate replica servers.

Our algorithm is based on the classic Greedy replica placement (QIU; PADMANABHAN; VOELKER, 2001), a known solution to the replica placement problem. Considering N and K as described earlier, the greedy algorithm’s complexity is O(K · N²). The FlowCount strategy uses the same basis as the Greedy algorithm, but with a different metric as the cost, and thus has the same complexity, as seen in Algorithm 1. There are three main loops: two of them iterate over each of the possible nodes, resulting in N² iterations, and a third one iterates over the replicas already placed, in other words, at most K iterations. Thus, the FlowCount replica placement strategy is also O(K · N²).


2.4 Experiments and Results

To evaluate the proposed strategy, we used two scenarios. The first and smaller one, called the YouTube scenario, uses a YouTube trace 1 made by the Laboratory for Advanced Software Systems from the University of Massachusetts Amherst. The second, called the ProwGen scenario, was generated using ProwGen (BUSARI; WILLIAMSON, 2002), a popular and widely cited tool described as "a synthetic workload generation tool for simulation evaluation of web proxy caches". Both scenarios used the same 500-node topology, divided across 10 Autonomous Systems (ASes), generated using the topology generator BRITE (MEDINA et al., 2001) and illustrated in Figure 2.1, with 10 replica servers initially placed one per AS.

In the YouTube scenario, links between nodes and CDN entities had 2GB/s capacity, except those between the ASes, which had 10GB/s instead, and the clients had 6MB/s access links. In the ProwGen scenario, clients had 1MB/s links while routers and other CDN entities had 1GB/s links. Client links reflected the average connection speed reported by Akamai for consumers in North and South America (QUARTER, 2012).

The YouTube scenario has 5771 requests placed in the topology in round-robin fashion. Requested object size followed an exponential distribution (BUSARI; WILLIAMSON, 2002) with a 2MB mean. The other scenario was generated using ProwGen and is much larger. It had almost 1 million requests and two flash crowds during the timeline. Table A.1 illustrates this scenario. This scenario has a basic workload lasting 24 hours and two flash crowds, namely Flash Crowd 1 and Flash Crowd 2, lasting 2 and 1.5 hours, respectively. During both flash crowds one can notice modifications regarding peer arrival and object popularity, reflecting expected characteristics of such events as discussed in Section A.1.2. Moreover, different from the basic workload, which has traffic scattered through the topology, Flash Crowds 1 and 2 are located in AS0 and AS5, respectively.
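As an illustration of how such a request set could be assembled, the sketch below assigns requests to client nodes in round-robin order and draws object sizes from an exponential distribution with a 2MB mean. The function name, the tuple layout, and the fixed seed are assumptions for this example; the actual scenario was built from the YouTube trace, not from this code.

```python
import itertools
import random
from typing import List, Tuple


def make_requests(
    n_requests: int,
    client_nodes: List[str],
    mean_size_mb: float = 2.0,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Return (client node, object size in MB) pairs.

    Nodes are assigned in round-robin order; sizes are exponentially
    distributed with the given mean (rate = 1 / mean).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = itertools.cycle(client_nodes)
    return [
        (next(nodes), rng.expovariate(1.0 / mean_size_mb))
        for _ in range(n_requests)
    ]
```

Calling `make_requests(5771, client_nodes)` would mirror the size of the YouTube scenario, with `client_nodes` standing for whatever list of client attachment points the topology provides.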

For both scenarios we used P2PCDNSim’s CDN Overlay, relocating RSs every T seconds. Each relocation process used a different RPA considering only information received during the last T seconds to perform replica placement. We used two different values for T, according to the total duration of each scenario: the first and smaller YouTube scenario used T = 1000s, whilst the ProwGen scenario used T = 5000s.

2.4.1 Metrics

During our simulations we collected a series of metrics to evaluate and compare the strategies used. This section describes all the collected metrics and what they represent.

• Startup Delay: represents the time between requesting the content and receiving the first useful information.

1 http://traces.cs.umass.edu/index.php/Network/Network


Figure 2.1: 500 nodes topology used in the evaluation of the FlowCount strategy.

Source: Made by author.

Table 2.1: Description of the 24 hours scenario generated using ProwGen (BUSARI; WILLIAMSON, 2002)

                     Basic Workload   Flash Crowd 1     Flash Crowd 2
Duration             24 hours         2 hours           1.5 hours
# of objects         432              1440              810
Peer arrival         4 peers/second   40 peers/second   60 peers/second
# of requests        345512           287999            323937
Objects Popularity   Zipf (0.6)       Zipf (1)          Zipf (1)
Cacheable            40%              10%               15%
AS Id                All of them      AS0               AS5

• Total Network Traffic: as presented in Section A.3, represents the total traffic passing through all nodes that compose the network.

• Routers Inner and Cross Traffic: as presented in Section A.3, those metrics follow the same basic idea as Total Network Traffic in the sense that they represent the total traffic passing through nodes. However, they limit the set of nodes considered. In the case of Routers Inner Traffic, we only account for traffic within an AS; in the case of Routers Cross Traffic, we only account for traffic between ASes.


• Bandwidth 95th Percentile: this is a typical metric used by ISPs to charge customers (GOLDENBERG et al., 2004; VANDERHOOF, 2011; SLATTERY, 2011). Using this metric, ISPs record the traffic volume every user generates during each 5-minute interval. At the end of the agreed period, for instance, one month, the 95th percentile of all records is used as the charging volume for each client. This metric is a way to measure bandwidth usage, allowing customers to burst beyond their committed base rate while providing the carrier with the ability to scale billing accordingly.
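Two of the metrics above lend themselves to a short sketch: splitting traffic into inner and cross AS traffic, and computing the 95th percentile of a series of interval records. The per-link accounting, the function names, and the nearest-rank percentile rule are assumptions chosen for illustration; ISPs and simulators may use slightly different conventions.

```python
from typing import Dict, Iterable, List, Tuple


def split_traffic(
    transfers: Iterable[Tuple[str, str, int]],
    as_of: Dict[str, int],
) -> Tuple[int, int]:
    """Return (inner, cross) traffic for (source, destination, bytes) records.

    A transfer is inner traffic when both endpoints belong to the same AS,
    and cross traffic otherwise.
    """
    inner = cross = 0
    for src, dst, nbytes in transfers:
        if as_of[src] == as_of[dst]:
            inner += nbytes
        else:
            cross += nbytes
    return inner, cross


def percentile_95(samples: List[float]) -> float:
    """95th percentile of interval records, using the nearest-rank rule."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With the nearest-rank rule, the highest 5% of the records are effectively discarded, which is what lets a customer burst for short periods without affecting the charged volume.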

2.4.2 Simulation Results

First, we ran simulations using the YouTube scenario. This was our first test using the new strategy and our intention was to compare the new strategy’s performance to other approaches found in the literature, namely Greedy and Hotspot. Greedy is known to outperform Hotspot in terms of network usage.

Figure 2.2 shows the Total Network Traffic bandwidth timeline. We can see that Hotspot has a clearly higher total traffic during peak traffic situations, between 40000s and 60000s. We can also notice a smaller difference between the FlowCount and Greedy metrics, which seems to indicate that FlowCount needs more time to adjust to sudden traffic changes. Analyzing the Cross AS Traffic bandwidth timeline in Figure 2.3, we can see that, throughout the simulation, FlowCount had equal or lower cross traffic than all the other strategies. Furthermore, once again Hot Spot is outperformed by the Greedy and FlowCount strategies. Considering that several studies argue that cross traffic is more expensive than traffic within a domain (AGGARWAL; AKONJANG; FELDMANN, 2008; CHOFFNES; BUSTAMANTE, 2008; LE BLOND; LEGOUT; DABBOUS, 2011; RUAN et al., 2009; SEEDORF; KIESEL; STIEMERLING, 2009), this result demonstrates that FlowCount could be a valuable player in reducing network costs.

Table 2.2 presents 95th percentile values for the Total Network Traffic and Cross AS Traffic metrics. Although the difference in total network traffic bandwidth is almost insignificant, cross-network traffic is more than 20% lower, demonstrating that the proposed strategy renders potential cost savings.
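The relative differences can be checked directly from the 95th percentile values in Table 2.2, comparing FlowCount against Greedy as the baseline:

```latex
\frac{243.8 - 191.8}{243.8} \approx 0.213
\quad \text{(cross AS traffic: about } 21.3\% \text{ lower)}
\qquad
\frac{4793.05 - 4791.75}{4793.05} \approx 0.0003
\quad \text{(total traffic: about } 0.03\% \text{ lower)}
```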

With regard to QoE, Figures 2.4 and 2.5 show the startup delay mean and standard deviation for all strategies. Figure 2.4 shows the box plot for startup delays collected from all 5771 clients for all strategies. Because of some outliers, it is not possible to draw a clear conclusion about the QoE performance of the strategies. For a clearer comparison we decided to exclude all outliers: Figure 2.5 shows the box plot for all startup delays collected during the experiment, excluding outliers. It is clear that, considering this scenario, all three strategies have nearly the same startup delay, around 9ms, with a slightly better perceived experience when using the FlowCount strategy. It is important to notice that, for all scenarios presented here, the number of outliers represented less than 1% of the total number of clients (5771).
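The text does not state which rule was used to identify outliers. A common convention for box plots is Tukey's rule, which discards values farther than 1.5 times the interquartile range from the quartiles. The sketch below applies that rule purely as an assumption; the function name is hypothetical.

```python
import statistics
from typing import List


def drop_outliers(delays: List[float], whisker: float = 1.5) -> List[float]:
    """Keep values inside [Q1 - whisker * IQR, Q3 + whisker * IQR].

    Assumes Tukey's box plot convention; requires at least two samples.
    """
    q1, _median, q3 = statistics.quantiles(delays, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [d for d in delays if low <= d <= high]
```

Comparing `len(delays)` with the length of the filtered list gives the share of clients treated as outliers, which can be checked against the "less than 1%" figure reported above.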

Figure 2.2: Bandwidth timeline during the YouTube scenario simulation.

Figure 2.3: Cross traffic timeline during the YouTube scenario simulation.

Table 2.2: 95th percentile in MB/s for the YouTube scenario considering the Total Network Traffic and Cross AS Traffic metrics.

Strategy     Total Traffic Bandwidth   Cross Traffic Bandwidth
FlowCount    4791.75                   191.8
Greedy       4793.05                   243.8
Hot Spot     6153.2                    938.25

Figure 2.4: Startup delay box plot for YouTube scenario considering all clients.

For a better understanding of FlowCount's performance, we decided to execute more experiments using a bigger and more representative scenario, namely the ProwGen scenario. This larger scenario simulates two different flash crowds with various content object sets in a 24-hour time window. The first flash crowd event goes from approximately 18000s to 25000s and the second from approximately 43000s to 48000s. Considering Hot Spot's poor performance during the YouTube scenario experiments, which agrees with other results found in the literature (QIU; PADMANABHAN; VOELKER, 2001), we now focus on the Greedy and FlowCount strategies. Figure 2.6 illustrates the Total Network Traffic bandwidth timeline during the ProwGen scenario simulation. We can see that FlowCount has a slightly higher Total Network Traffic bandwidth only during the two flash crowd events. At first glance, this could signal that Greedy outperforms FlowCount. However, if we look at Figure 2.7, illustrating Cross AS traffic, we notice that during both flash crowd events using the FlowCount RPA results in a significant cross AS traffic reduction. As commented earlier, several studies signal cross traffic as being more expensive than inner AS traffic. In other words, these results demonstrate once again that FlowCount could be a valuable player in reducing network costs.

Table 2.3 shows 95th percentile values for the Total Network Traffic and Cross AS Traffic metrics considering the ProwGen scenario. Once again, we do not see a significant difference in terms of total network bandwidth. Nonetheless, when considering cross traffic, the difference is even bigger than in the previous scenario.


Figure 2.5: Startup delay box plot for YouTube scenario excluding outliers.


Therefore, we expect considerable operational cost reductions when using the FlowCount strategy instead of the Greedy placement strategy.
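The size of the cross-traffic reduction can be checked directly from the 95th percentile values reported in Tables 2.2 and 2.3. The snippet below is only an arithmetic illustration; the function name is hypothetical.

```python
def relative_reduction(baseline, candidate):
    """Fraction by which `candidate` is lower than `baseline`."""
    return (baseline - candidate) / baseline


# 95th percentile Cross AS Traffic in MB/s, Greedy (baseline) vs FlowCount.
youtube = relative_reduction(baseline=243.8, candidate=191.8)     # Table 2.2
prowgen = relative_reduction(baseline=5943.7, candidate=3516.6)   # Table 2.3

print(f"YouTube: {youtube:.1%}, ProwGen: {prowgen:.1%}")
# prints "YouTube: 21.3%, ProwGen: 40.8%"
```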

Figure 2.6: Bandwidth timeline during the ProwGen scenario simulation.



Figure 2.7: Cross traffic timeline during the ProwGen scenario simulation.


Table 2.3: 95th percentile for the ProwGen scenario considering the Total Network Traffic and Cross AS Traffic metrics.

Strategy     Total Traffic Bandwidth Timeline (GB/s)   Cross Traffic Bandwidth Timeline (MB/s)
FlowCount    224.8                                     3516.6
Greedy       217.8                                     5943.7

2.5 Conclusion

This Chapter presented a novel Replica Placement Algorithm (RPA), namely FlowCount, that leverages Distributed Cloud computing services to enable dynamic replica relocation and efficiently manage CDN resources while maintaining QoE. Simulation results show that the proposed strategy provides similar QoE and slightly higher Total Network Traffic. However, considering Cross AS Traffic, using the proposed strategy resulted in considerable cross traffic reductions, especially during flash crowd events, reaching 40% less inter AS traffic. These findings provide a good incentive for the actual deployment of the FlowCount placement algorithm in present CDNs.

One of the main lessons learned regards the need for a dynamic adaptation to working conditions. Finding the best location for each replica server is important for the optimal functioning of a CDN, while also assuring the best QoE and network utilization metrics. This ideal location, however, varies with time, and while an adaptation is necessary, an instantaneous re-adaptation might lead to instability in the location of servers (for instance, the same server being relocated constantly). Therefore, a certain relocation stability is desired, providing a theoretically sub-optimal solution while still providing lower cross-traffic.

The next chapter presents an analytical model to evaluate possible CDN collaboration benefits. The proposed model extends a previously published analytical model to consider the collaboration between CDNs. Our results show that offloading requests to neighbor content networks could help increase Quality of Experience (QoE).


3 Multiple CDNs Collaboration

I have yet to see any problem, however complicated, which, when looked at

in the right way, did not become still more complicated.

—POUL ANDERSON

Considering the popularity of CDNs and the ever-growing number of small localized CDNs deployed, this chapter presents an analytical model for collaboration among CDNs, considering client dispersion through the network, different overall capacities, and cache misses. Results demonstrate that CDN collaboration could lead to better QoE. The primary goal of this Chapter is to present the proposed model and an evaluation of collaboration benefits in terms of user QoE.

This Chapter is organized as follows: We start by presenting an introduction to the problem in Section 3.1. Section 3.2 presents the state of the art regarding CDN and multiple CDN models. Section 3.3 presents the proposed model, followed by a discussion of the model in Section 3.3.1. We then show our first results obtained with the model in Section 3.4. Afterwards, we discuss redirection strategies and their overhead in Section 3.5 and present results considering the redirection overhead in Section 3.5.1. Section 3.6 discusses lessons learned and Section 3.7 presents concluding remarks.

The results obtained from this Chapter were published in Rodrigues et al. (2013) and Rodrigues et al. (2014).

3.1 Introduction

CDNs are accepted as the de facto primary content delivery strategy, and this popularity has resulted in an increasing number of CDNs deployed for several purposes (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012), varying in content, coverage, and capacity. Among them, a very limited number aim to distribute content on a worldwide scale, such as Akamai¹ and Limelight². Most of them have a restricted coverage, being extremely localized, meaning that their coverage is restricted to a single Internet Service Provider (ISP) (BERTRAND et al., 2012; FRANK et al., 2013; SHARMA; VENKATARAMANI; SITARAMAN, 2013). Increasing each CDN's capacity, in terms of coverage and capability, would be very expensive. Through collaboration between restricted CDNs, coverage could expand temporarily to handle a set of requests greater than the local CDN capacity (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012; JESUS; AGUIAR, 2012). CDNs could also negotiate collaborations to fulfill service level agreements established between the CDN and content providers. However, effective collaboration raises several challenges (PATHAN et al., 2007a; NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012). Currently proposed techniques handle client redirection to servers controlled by the same CDN, a much simpler scenario than redirecting to servers belonging to third-party CDNs. Furthermore, considering the diversity of existing CDNs, determining their essential features in order to decide which third-party CDN is most suitable to tackle a problem is a significant challenge. There is, therefore, a demand for a way to evaluate collaboration scenarios among CDNs, both to quantify the possible gain obtained through collaboration and to gather more information about the several variables involved in the process.

¹ https://www.akamai.com/
² https://www.limelight.com

This Chapter presents an analytical model for collaboration among CDNs. The model exposes how response time depends on replica server capacities and on the proportion of requests redirected to the foreign CDN. It considers, among other variables, client dispersion through the network, different bandwidth capacities, and cache misses. Confirming our hypotheses and Pathan's (PATHAN; BUYYA, 2008), our analyses show that CDN collaboration can effectively decrease response time. The main contributions presented in this Chapter are three-fold:

• An analytical model for collaboration among CDNs, where clients and servers are geographically distributed with different latencies and bandwidth capacities;

• Feasibility analysis of collaboration among CDNs;

• A comparison between two Collaboration Request Routing Strategies, Recursive and Interactive.

3.2 State of the Art

3.2.1 CDN Collaboration

In Vakalli and Pallis (2003), the authors describe basic CDN entities, their relations, and behavior. They also comment on several aspects of CDNs, including peering, following the early steps of the Internet Engineering Task Force (IETF)'s Content Distribution Internetworking (CDI) workgroup (DAY et al., 2003). CDN brokering (BILIRIS et al., 2002) proposes client redirection through Domain Name Service (DNS) redirection techniques. It uses a brokering CDN server to handle DNS redirections using an Intelligent Domain Name Server (IDNS), which considers metrics like load status rather than returning a static response. The main issue is that IDNS is proprietary and might not be suitable for CDN interconnection/collaboration. In a work related to the CDI (BERTRAND et al., 2012), CDN collaboration is proposed through three systems: the Request-routing Internetworking System (RRIS), the Accounting Internetworking System (AIS), and the Distribution Internetworking System (DIS). The RRIS redirects a client's request to the CDN that best satisfies it according to performance data exchanged between CDNs through the Request-Routing Information Exchange Protocol (RIEP). The main responsibility of the AIS is to exchange accounting data, in other words, data related to resource consumption. Finally, the DIS moves content between CDNs.

IETF's Content Delivery Networks Interconnection (CDNI) workgroup restarted the discussion about patterns and protocols to enable CDN collaboration. Their main goal is to allow the interconnection of multiple CDNs under different administrations. This interconnection should consider all actors involved, from CDNs to content providers and end users. Discussions also raise concerns about some distinct subjects, such as the complex accounting mechanism and the management of agreements needed for a good collaboration. They propose a taxonomy for the actors in a collaboration scenario: upstream and downstream CDNs, respectively the primary CDN and the CDN hired to team up with the primary CDN. In Buyya et al. (2006), the authors present concepts for CDN collaboration where the peering model is based on a different type of CDN, the Content Service Network (CSN), which acts as another infrastructure layer on top of the CDN, forming the Content and Service Delivery Network (CSDN). Using their approach, a CDN can share and request resources according to specific needs. Resources are published and found through a Service Registry. In Pathan et al. (2007b), an extended discussion about a peering model is presented. The idea is to serve clients using local resources as long as they are enough. The formation of a Virtual Organization (VO) is initiated by a CDN called the primary CDN; all other CDNs in the same VO are called peering CDNs. User actions could result in collaborations that are transparent from the user's point of view. In Chang et al. (2012), the authors propose a strategy to deploy CDNI using OpenFlow³. The idea is that the controller would receive information from the CDN and, if needed, manage the interconnection between CDNs. SDN is also used to enable CDN collaboration in Wichtlhuber, Reinecke, and Hausheer (2015), where the authors propose a system for ISP/CDN interaction based on a minimal deployment of SDN-capable switches inside the CDN provider's network. A proof-of-concept deployment is presented and used to evaluate and discuss performance.

3.2.2 CDN Models

In Molina, Palau, and Esteve (2004), the authors present a model based on M/M/1 queues to model response time and CDN components. Through the proposed model, the authors show basic concepts such as the advantage of using replica servers due to their proximity to clients. They did not consider multiple CDN collaboration and used static entity placement; both aspects are covered by the model presented in this Chapter.

³ https://www.opennetworking.org/sdn-resources/openflow

The only model found that addresses CDN collaboration is presented in Pathan and Buyya (2008). In their model, collaboration is represented by a set of M/G/1 queues, showing how it can help a CDN meet service level agreements, even at a high rate of incoming requests. Although useful as a first step in modeling multiple CDNs, the paper has some limitations, such as considering neither cache misses nor collaboration overhead. The work in (JESUS; AGUIAR, 2012) proposes a model to estimate the cost to deliver content to users and evaluate the efficiency of using CDN collaboration. Their results show that interconnecting CDNs might not be advantageous for both CDN providers. Furthermore, in Jeong et al. (2013), the authors model CDN traffic to evaluate traffic reduction considering CDN collaboration. They present three different optimization models: no CDN, CDN only, and CDN-Interconnection. Their study shows that CDN caching can reach up to 24% traffic reduction. Likewise, Telco-CDN and Telco-CDNI can reduce traffic by up to 6% and 27%, respectively.

None of those studies focuses on QoE, as our proposed model does. Also, to the best of our knowledge, our model is the first to deal with scenarios of multiple CDN collaboration considering collaboration overhead and cache misses.

3.3 CDN Collaboration Model

Our model represents the basic components of a CDN (as detailed in Section A.1) along with other entities related to the CDN collaboration scenario, illustrated in Figure 3.1. It extends previous work by introducing CDN collaboration into the model presented in Molina, Palau, and Esteve (2004). We describe each entity as follows:

• Origin Server (OS): represented by the circle in the middle of Figure 3.1. It is the entity responsible for storing the original content. In case of a cache miss, an RS will request content copies from the OS.

• Clients: represented by squares distributed on the outer circle, they represent client clusters. We consider a group of clients located within the same domain, for instance the same ISP, as a client cluster.

• Replica Servers (RS): there are two types of replica servers:

    • Local RS: represented by triangles, local replica servers are RSs that belong to the upstream CDN. They are placed between the OS and clients. $P_l$ represents the total number of local RSs.

    • Foreign RS: represented by trapeziums, foreign RSs are replica servers from downstream CDNs. They are the result of eventual collaborations made by the primary CDN, and are also placed between the OS and clients. $P_f$ represents the total number of foreign RSs.

• Request Redirector (RR): represented by the center of the inner circles in Figure 3.1. The RR is the entity responsible for receiving client requests and redirecting them to the most suitable RS. There are two types of RR, the Foreign RR and the Local RR. The latter is the RR that belongs to the upstream CDN and is placed in the center of the Local RSs circle, whereas the Foreign RR is located in the center of the Foreign RSs circle.

Consider $M$ the number of client clusters positioned in a circle around the Origin Server (OS). Inside the client cluster circle, we have two other circles of Replica Servers (RS), each with its Request Redirector (RR) in the center.

$\tau_o$ is the distance, expressed as Round Trip Time (RTT), between client clusters and the OS. RSs have different RTTs: local and foreign replica servers are $\tau_l$ and $\tau_f$ away from the origin server, respectively, and $\tau_{pl}$ and $\tau_{pf}$ away from clients.

Figure 3.1: Basic model components.


When the CDN receives a new request, it can be handled either by a local RS, by a foreign RS from an active collaboration, or by the origin server. Therefore, we consider $p_l$ the probability that a client will be redirected to a local server and $p_f$ the probability that the client will be redirected to a foreign replica server. Thus, the probability that the client is redirected to the origin server is $(1 - p_l - p_f)$. Considering that response time is a traditional metric to evaluate CDNs, we define the following expression, involving replica and origin servers, for the estimated response time:

\[ \hat{R} = p_l \hat{R}_{local} + p_f \hat{R}_{foreign} + (1 - p_l - p_f)\,\hat{R}_{origin} \tag{3.1} \]

where $\hat{R}_{local}$ is the estimated response time for clients requesting content from local replica servers and $\hat{R}_{foreign}$ is the estimated response time for clients requesting content from foreign replica servers. Furthermore, $\hat{R}_{origin}$ is the estimated response time when requesting content from the origin server. Redirecting the client to the origin server is not a common practice in CDNs; however, we use it to represent a cache miss event, when the cache has to request content from the origin server. Our model can easily be modified to deal differently with this question. The authors in Sayal et al. (1998) found a weak correlation between RTT and response time and a strong correlation between HTTP request latency, i.e., time to receive the first byte of a response, and response time. Therefore, response time can be represented using a linear model (MOLINA; PALAU; ESTEVE, 2004):

\[ R = N\tau + S \tag{3.2} \]

where $\tau$ represents RTT and $N$ represents a scaling factor that combines the effects of network loss rates, re-transmission and, in general, the volume of exchanged data required for the request (MOLINA; PALAU; ESTEVE, 2004). $S$ represents the processing time required for each server, i.e., OS and RSs, to handle requests. It is modeled as an M/M/1 queue, where requests arrive according to a Poisson process and service times are assumed to be independent and exponentially distributed (SZTRIK, 2012), so that $S = 1/(\mu - \lambda)$ for a mean service rate $\mu$ and a mean arrival rate $\lambda$. Such characteristics correspond to known critical aspects related to streaming media services and web proxy workloads (TANG et al., 2003; BUSARI; WILLIAMSON, 2002). Thus, we can extend Equation 3.2 as:

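\[ R = p_l \left[ N\tau_l + \frac{1}{\mu_l - \lambda_l} \right] + p_f \left[ N\tau_f + \frac{1}{\mu_f - \lambda_f} \right] + (1 - p_l - p_f) \left[ N\tau_o + \frac{1}{\mu_o - \lambda_o} \right] \tag{3.3} \]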

where $\mu_l$, $\mu_f$, and $\mu_o$ are the mean service rates of local replica servers, foreign replica servers, and the origin server, respectively. The same idea is valid for $\lambda_l$, $\lambda_f$, and $\lambda_o$, representing mean arrival rates. Both rates should be different between each replica server and clients as well as between replica servers and the origin server. Therefore, we should consider a different latency between each client and the origin server, $\tau_o^{i}$, as well as a mean arrival rate for each client, $\lambda_i$, where $0 < i \leq M$. Likewise, we have $\tau_{pl}^{ij}$ for each client $i$ and replica server $j$, $\mu_l^{j}$ for each replica server $j$, and $\lambda_{li}^{j}$ for each client $i$ and replica server $j$, where $0 < i \leq M$ and $0 < j \leq P_l$. Considering foreign replica servers, there will be a minor change. The concept is the same: each replica server will have a different rate. However, because it is possible to collaborate with more than one CDN at the same time, we have to consider different rates for each existing collaboration. Therefore, considering $C$ the number of existing collaborations among CDNs, we have $\tau_{pf}^{ijc}$, $\mu_f^{jc}$, and $\lambda_{fi}^{jc}$, where $0 < i \leq M$, $0 < j \leq P_f$, and $0 < c \leq C$.
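Equation 3.3 can be evaluated directly, as in the minimal sketch below. The function and parameter names are hypothetical, and the numeric inputs are illustrative values rather than settings from the experiments.

```python
def mm1_time(mu, lam):
    """Mean time spent in an M/M/1 system (waiting plus service): 1 / (mu - lam)."""
    if mu <= lam:
        raise ValueError("unstable queue: service rate must exceed arrival rate")
    return 1.0 / (mu - lam)


def response_time(p_l, p_f, n, tau_l, tau_f, tau_o,
                  mu_l, lam_l, mu_f, lam_f, mu_o, lam_o):
    """Equation 3.3: probability-weighted response time over the three server types."""
    local = n * tau_l + mm1_time(mu_l, lam_l)
    foreign = n * tau_f + mm1_time(mu_f, lam_f)
    origin = n * tau_o + mm1_time(mu_o, lam_o)
    return p_l * local + p_f * foreign + (1 - p_l - p_f) * origin


# Illustrative values only (not taken from the experiments).
r = response_time(p_l=0.7, p_f=0.2, n=5,
                  tau_l=0.2, tau_f=0.3, tau_o=0.5,
                  mu_l=120, lam_l=100, mu_f=120, lam_f=100, mu_o=120, lam_o=100)
```

The explicit stability check reflects the fact that the M/M/1 expression is only meaningful when the service rate exceeds the arrival rate.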

In a single or multiple CDN scenario, a content request is redirected to replica servers according to network and load conditions. A client is most likely redirected to the nearest replica server, but there could be exceptions due to extreme conditions, for instance, high CPU load or network failure. Thus, if we consider that a client $i$ can potentially be redirected to any of the replica servers, $p_f$ and $p_l$ should have different values for each client and each replica server. Also, considering $p_f$, there should be a different probability for each replica server and existing collaboration.

We also consider a factor $k$ representing the capacity of each CDN, meaning $\mu_p = k\lambda_p$, $k > 1$. Furthermore, for a more realistic approach, we consider one $k$ for the primary CDN and a different $k$ for each existing collaboration. This factor represents the CDN's capacity to handle requests; in other words, a higher $k$ value means a more robust CDN.
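Substituting $\mu_p = k\lambda_p$ into the M/M/1 processing term of Equation 3.3 makes the role of $k$ explicit. The symbol $S_p$ is introduced here only for illustration, and the numeric case ($k = 1.2$, $\lambda_p = 100$) is an example whose time unit is that of the rates.

```latex
\[
S_p = \frac{1}{\mu_p - \lambda_p}
    = \frac{1}{k\lambda_p - \lambda_p}
    = \frac{1}{(k-1)\,\lambda_p},
\qquad
k = 1.2,\ \lambda_p = 100 \;\Rightarrow\; S_p = \frac{1}{0.2 \times 100} = 0.05 .
\]
```

As $k$ approaches 1 the processing time grows without bound, which is why the model requires $k > 1$.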

Following the previous formula, we developed a response time expression for the overall system, considering the mean estimated response time obtained for all $M$ clusters.

\[ \bar{R} = \frac{1}{M} \sum_{i=1}^{M} \hat{R}_i \tag{3.4} \]

We can then derive the mean estimated response time:

\[ \bar{R} = \bar{R}_l + \bar{R}_f + \bar{R}_o \tag{3.5} \]

In what follows we describe $\bar{R}_l$, $\bar{R}_f$, and $\bar{R}_o$. Also, variables used in all equations are summarized in Table 3.1.

$\bar{R}_l$ represents the mean estimated response time for all clients requesting content from a Local RS. The estimated value is calculated following Equation 3.2, considering the estimated RTT between each client cluster and each local RS along with the estimated time to serve the request.

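\[ \bar{R}_l = \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{P_l} \left[ p_{li}^{j} \left( N\tau_{pl}^{ij} + \frac{1}{\mu_l^{j} - \sum_{k=1}^{M} \lambda_{lk}^{j}} \right) \right] \tag{3.6} \]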

$\bar{R}_f$ represents the mean estimated response time for all clients requesting content from foreign RSs. The estimated value is calculated following Equation 3.2, considering the estimated RTT between each client cluster and each foreign RS for all existing collaborations along with the estimated time to serve the request.


Table 3.1: Description of variables used in our proposed model.

Variable               Description
$M$                    Number of client clusters.
$P_l$                  Number of local RS.
$C$                    Number of existing collaborations.
$P_f$                  Number of foreign RS.
$P_f^{c}$              Number of foreign RS related to collaboration $c$.
$p_l$ and $p_f$        The probability to redirect a client to a local or a foreign RS.
$p_{li}^{j}$           The probability to redirect client $i$ to local server $j$.
$p_{fi}^{jc}$          The probability to redirect client $i$ to foreign server $j$ related to collaboration $c$.
$\tau_{pl}^{ij}$       RTT between local RS $j$ and client $i$.
$\tau_{pf}^{ijc}$      RTT between foreign RS $j$, related to collaboration $c$, and client $i$.
$\tau_o^{i}$           RTT between origin server and client $i$.
$\mu_l^{j}$            Mean service rate for local RS $j$.
$\mu_f^{jc}$           Mean service rate for foreign RS $j$ related to collaboration $c$.
$\mu_s$                Mean service rate for origin server.
$\lambda_{lk}^{j}$     Mean arrival rate that cluster $k$ sends to local replica server $j$.
$\lambda_{fk}^{jc}$    Mean arrival rate that cluster $k$ sends to foreign replica server $j$ related to collaboration $c$.

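\[ \bar{R}_f = \frac{1}{M} \sum_{i=1}^{M} \sum_{c=1}^{C} \sum_{j=1}^{P_f^{c}} \left[ p_{fi}^{jc} \left( N\tau_{pf}^{ijc} + \frac{1}{\mu_f^{jc} - \sum_{k=1}^{M} \lambda_{fk}^{jc}} \right) \right] \tag{3.7} \]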

$\bar{R}_o$ represents the mean estimated response time for all clients requesting content directly from the origin server. The value also follows Equation 3.2, considering the estimated RTT between each client cluster and the origin server along with the estimated time to serve the request. We also define the variable $H_i$, representing the cache miss event.

\[ \bar{R}_o = \frac{1}{M} \sum_{i=1}^{M} H_i \left[ N\tau_o^{i} + \frac{1}{\mu_s - \sum_{k=1}^{M} H_i \lambda_k} \right] \tag{3.8} \]

where Hi is:

\[ H_i = 1 - \left( \sum_{j=1}^{P_l} p_{li}^{j} + \sum_{c=1}^{C} \sum_{j=1}^{P_f^{c}} p_{fi}^{jc} \right) \tag{3.9} \]
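Equations 3.6 to 3.9 translate into a few nested loops, sketched below. The function names and the nested-list index layout are assumptions of this sketch, and the sample numbers are illustrative rather than taken from the experiments. $H_i$ is computed as one minus the sum of all redirection probabilities of client $i$, mirroring the $(1 - p_l - p_f)$ term of Equation 3.1.

```python
def miss_probability(p_local_i, p_foreign_i):
    """Equation 3.9: H_i = 1 - (local plus foreign redirection probabilities)."""
    return 1.0 - (sum(p_local_i) + sum(sum(row) for row in p_foreign_i))


def mean_response_time(n, p_l, tau_pl, mu_l, lam_l,
                       p_f, tau_pf, mu_f, lam_f, tau_o, mu_s, lam):
    """Return (R_l, R_f, R_o) following Equations 3.6, 3.7 and 3.8.

    Index layout (an assumption of this sketch):
      p_l[i][j], tau_pl[i][j], lam_l[k][j]            local RS j, cluster i or k
      p_f[i][c][j], tau_pf[i][c][j], lam_f[k][c][j]   foreign RS j of collaboration c
      mu_l[j], mu_f[c][j], tau_o[i], lam[k]
    """
    m = len(p_l)
    r_l = r_f = r_o = 0.0
    for i in range(m):
        for j, p in enumerate(p_l[i]):
            load = sum(lam_l[k][j] for k in range(m))
            r_l += p * (n * tau_pl[i][j] + 1.0 / (mu_l[j] - load))
        for c, servers in enumerate(p_f[i]):
            for j, p in enumerate(servers):
                load = sum(lam_f[k][c][j] for k in range(m))
                r_f += p * (n * tau_pf[i][c][j] + 1.0 / (mu_f[c][j] - load))
        h_i = miss_probability(p_l[i], p_f[i])
        origin_load = sum(h_i * lam[k] for k in range(m))
        r_o += h_i * (n * tau_o[i] + 1.0 / (mu_s - origin_load))
    return r_l / m, r_f / m, r_o / m


# Two client clusters, one local RS, one collaboration with one foreign RS.
r_l, r_f, r_o = mean_response_time(
    n=5,
    p_l=[[0.6], [0.6]], tau_pl=[[0.2], [0.2]], mu_l=[140], lam_l=[[60], [60]],
    p_f=[[[0.3]], [[0.3]]], tau_pf=[[[0.3]], [[0.3]]], mu_f=[[80]],
    lam_f=[[[30]], [[30]]],
    tau_o=[0.5, 0.5], mu_s=40, lam=[100, 100])
total = r_l + r_f + r_o  # Equation 3.5
```

Returning the three components separately keeps the sketch aligned with Equation 3.5 and makes it easy to plot local, foreign, and origin response times individually.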


3.3.1 Model Discussion

It is important to emphasize that in a CDN scenario the number of clients is expected to be higher than the number of replica servers. In fact, Akamai, a leading company in the CDN market at the time of writing, reported handling more than a trillion requests per day (QUARTER, 2012). Thus, it is acceptable to say that $M \gg P_l + \sum_{c=1}^{C} P_f^{c}$.

For every client, we assign a different redirection probability to each replica server. We later normalize probabilities considering the client redirection proportion for each server group. In other words, consider a client requesting content from a CDN. Assuming an illustrative scenario with three local replica servers and two foreign replica servers, this will result in three values, $p_{l1}^{1}$, $p_{l1}^{2}$, $p_{l1}^{3}$, for local replica servers and two values, $p_{f1}^{11}$ and $p_{f1}^{21}$, for foreign replica servers. A common strategy to select which replica server will serve content to a client is to select the closest (PATHAN; BUYYA, 2007). Thus, we assign all probabilities according to the RTT between the client and replica servers. Higher RTT means lower redirection probability. Furthermore, considering that CDNs distribute replica servers trying to reach as much network coverage as possible, there should be replica servers so far away from the client that redirecting it to them could result in poor performance. Therefore, we consider a threshold $\alpha$, meaning that any replica server with an RTT bigger than $\alpha\tau_o^{i}$ should not be seen as a possible candidate for client $i$.

Consequently, we have three redirection possibilities for a client: local replica server, foreign replica server, and origin server. The cache hit rate ($chr$) is the probability that a new client request will be served by a local replica server. In the case of a cache miss, there are two possible outcomes: either redirect the client to a foreign replica server or redirect it to the origin server. To represent the possibility of redirecting clients to foreign servers, we propose a new variable called multi CDN proportion ($mcp$). This variable represents the proportion of requests that will be redirected to peer CDNs, being then served by foreign replica servers. The chance of a new request being redirected to a foreign replica server is $\frac{(1 - chr)\,mcp}{C}$ for $C > 0$. All probabilities mentioned earlier will be normalized according to their proportion, generating, therefore, final redirection probabilities for each entity.
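The probability assignment described above can be sketched as follows. The text only states that a higher RTT means a lower redirection probability, so the inverse-RTT weighting used here is an assumption of this sketch, as are the function name and the sample values.

```python
def group_probabilities(rtts, tau_origin, alpha, group_share):
    """Spread `group_share` over the servers of one group.

    Servers with RTT above alpha * tau_origin are not candidates (probability 0).
    Assumption of this sketch: weights are proportional to 1 / RTT.
    """
    weights = [1.0 / t if t <= alpha * tau_origin else 0.0 for t in rtts]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(rtts)  # no candidate: the share falls back to the origin
    return [group_share * w / total for w in weights]


chr_, mcp, alpha, tau_o = 0.8, 0.6, 1.0, 0.5   # illustrative values
local_rtts = [0.2, 0.4, 0.7]                   # third server exceeds alpha * tau_o
foreign_rtts = [[0.3, 0.5]]                    # one collaboration (C = 1)

p_local = group_probabilities(local_rtts, tau_o, alpha, chr_)
share_per_collab = (1 - chr_) * mcp / len(foreign_rtts)
p_foreign = [group_probabilities(r, tau_o, alpha, share_per_collab)
             for r in foreign_rtts]
p_origin = 1 - sum(p_local) - sum(map(sum, p_foreign))
```

With these inputs the local group receives a total probability of $chr$, each collaboration receives $(1 - chr)\,mcp / C$, and the remainder goes to the origin server.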

Another extension to Molina's model relates to entity placement. Previously, as illustrated in Figure 3.1 and described in Molina, Palau, and Esteve (2004), entities used a factor called $\alpha_{c0}$, randomly chosen within the range $\left[0, \frac{2\pi}{L}\right]$. All entities were placed at $\alpha_{ci} = \frac{2\pi}{L} i + \alpha_{c0}$, where $0 < i < L$ and $L$ is the total number of entities to be placed. This way, clients were always distributed uniformly. Our model proposes equal slices for each entity, but a different placement for each entity inside its slice. Therefore, we have for entity $i$:

\[ \alpha_{ci} = \frac{2\pi}{L}(i - 1) + \mathrm{random}\left(0, \frac{2\pi}{L}\right) \tag{3.10} \]
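Equation 3.10 can be sketched in a few lines. The function name is hypothetical, and the seed is fixed only to make the example repeatable.

```python
import math
import random


def place_entities(count, rng=random):
    """Equation 3.10: one random angle inside each of `count` equal slices."""
    slice_width = 2 * math.pi / count
    return [slice_width * (i - 1) + rng.uniform(0, slice_width)
            for i in range(1, count + 1)]


angles = place_entities(8, random.Random(42))  # seed chosen only for repeatability
```

Each entity stays inside its own slice, so the ordering around the circle is preserved while the spacing between neighbors is no longer uniform.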

As shown in Equation 3.2, response time can be seen as the sum of two factors, one representing transmission time and the other processing time. To better understand those factors, we can isolate them in two separate equations, where $R_p$ represents processing time and $R_t$ represents transmission time. To calculate the overall estimated processing time, we have to sum up the estimated processing times of all entities involved. In other words, $\hat{R}_p = \hat{R}_p^{o} + \hat{R}_p^{l} + \hat{R}_p^{f}$, where $\hat{R}_p^{o}$ is the estimated processing time of the origin server, and $\hat{R}_p^{l}$ and $\hat{R}_p^{f}$ are the estimated processing times of local and foreign replica servers, respectively. The same idea is used for $\hat{R}_t$; therefore, we have:

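\[ \bar{R}_t = \frac{1}{M} \sum_{i=1}^{M} \left( \sum_{j=1}^{P_l} p_{li}^{j} N\tau_{pl}^{ij} + \sum_{c=1}^{C} \sum_{j=1}^{P_f^{c}} p_{fi}^{jc} N\tau_{pf}^{ijc} + H_i N\tau_o^{i} \right) \tag{3.11} \]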

where $\bar{R}_t$ represents the mean response time considering only the transmission component of the response time for all possible entities.

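\[ \bar{R}_p = \frac{1}{M} \sum_{i=1}^{M} \left( \sum_{j=1}^{P_l} \frac{p_{li}^{j}}{\mu_l^{j} - \sum_{k=1}^{M} \lambda_{lk}^{j}} + \sum_{c=1}^{C} \sum_{j=1}^{P_f^{c}} \frac{p_{fi}^{jc}}{\mu_f^{jc} - \sum_{k=1}^{M} \lambda_{fk}^{jc}} + \frac{H_i}{\mu_s - \sum_{k=1}^{M} H_i \lambda_k} \right) \tag{3.12} \]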

where $\bar{R}_p$ represents the mean response time considering only the processing time component of the response time for all possible entities.
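As a consistency check, adding Equations 3.11 and 3.12 term by term recovers the three components of Equation 3.5, since each bracket in Equations 3.6 to 3.8 is the sum of one transmission term and one processing term:

```latex
\[
\bar{R}_t + \bar{R}_p = \bar{R}_l + \bar{R}_f + \bar{R}_o = \bar{R}
\]
```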

In what follows we present results obtained from our model. In Section 3.4 we evaluate the feasibility of multiple CDN collaboration, focusing on cache hit, multi CDN proportion ($mcp$), and CDN capacities (represented by the factor $k$). Then we extend our model to consider redirection overhead, according to the redirection strategies proposed by Brandenburg, Peterson, and Davie (2015). Section 3.5.1 presents results obtained with the extended model, evaluating multiple CDN collaboration.

3.4 Methodology and Initial Experimental Results

This section presents the evaluation methodology and the results obtained from our analytical model. We evaluate collaboration benefits and the factors that are relevant to our main metric, response time, when multiple CDNs are deployed to deliver content. In all Figures presented, each point represents the mean response time obtained from 1000 repetitions. For all response time means, we also calculated confidence intervals at a 99% confidence level using the Apache Commons Math library⁴. Confidence intervals are plotted in all Figures; however, they are so small that they are covered by the line markers.
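A 99% confidence interval for a mean over 1000 repetitions can be sketched as below. This is not the Apache Commons Math routine used for the results: it relies on a normal approximation from Python's standard library, which is close to a Student's t interval at this sample size. The function name and the synthetic samples are illustrative only.

```python
import math
import random
import statistics


def confidence_interval(samples, level=0.99):
    """Normal-approximation confidence interval for the mean of `samples`."""
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for 99%
    return mean - z * std_err, mean + z * std_err


# Hypothetical response times from 1000 repetitions of one configuration.
rng = random.Random(1)
runs = [rng.gauss(1.3, 0.05) for _ in range(1000)]
low, high = confidence_interval(runs)
```

With a spread this small relative to the mean, the interval is only a few milliseconds wide, which illustrates how such intervals can end up hidden behind line markers.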

It is important to note that all entities were placed according to Equation 3.10. Some factors, for instance RTT, are presented through minimum and maximum values ($\tau^{min}$, $\tau^{max}$).

⁴ https://commons.apache.org/proper/commons-math/


Table 3.2: Second scenario describing a small network with 8 client clusters and 4 RSs.

Parameter   Value      Parameter   Min    Max
M           8          hit_ratio   0      1
P           4          τo          2      2
N           5          τd          1      1
α           1.01       λ           100    100
k           1.01

Source: Modeling content delivery networks and their performance (MOLINA; PALAU; ESTEVE, 2004).

Table 3.3: Description of the scenario used to evaluate whether it is worth collaborating.

Parameter   Value      Parameter   Min    Max
M           2500       hit_ratio   0      1
N           5          τo          0.5    1
α           1.0        τd          0.2    0.7
kl          1.2        τf          0.3    0.8
kf          1.2        λ           100    100

In this case, each path will have a random latency $\tau$ within the range $[\tau^{min}, \tau^{max}]$. For instance, if $\tau_o^{min} = 1$s and $\tau_o^{max} = 2$s, then all RTTs between clients and the origin server will be within the $[1, 2]$s range. If not mentioned otherwise, consider $mcp = 0.6$. From our practical experience, this proportion represents a peering arrangement well. It is high enough to result in considerable redirections, but not so high that it excludes cache misses.

Our first experiment used one of the scenarios previously presented in Molina, Palau, and Esteve (2004). The scenario chosen was the second presented in the paper, described in Table 3.2. For our model we used exactly the same data; the only difference was that some additional variables were included, all related to CDN collaboration, and they were all set specifically to avoid any intervention from foreign replica servers. Figure 3.2(a) shows the results obtained by (MOLINA; PALAU; ESTEVE, 2004), whilst Figure 3.2(b) illustrates the results obtained using our model. Comparing both figures shows that the results were faithfully reproduced. One may notice an extra curve in our results; this curve corresponds to the response time of the foreign replica servers and is always zero, since, for a fair comparison with the results found in Molina, Palau, and Esteve (2004), no influence of foreign replica servers was allowed.

We then continue our analyses by trying to answer a fundamental question regarding collaboration among CDNs: “Is it worth collaborating?” To answer this question, we compare response time between two scenarios with similar setups, with and without collaboration. The scenario used is described in Table 3.3.

Since we are evaluating scenarios with different types of RSs, foreign and local, we need to set an equivalent number of replica servers for both. We decided that the best way to do that would be to keep $P_l + P_f$ constant, meaning that the total number of RSs is the same in both scenarios. For the collaboration comparison scenario described in Table 3.3, the total number of replica servers is 75: $P_l = 75$ for the scenario without collaboration, and $P_l = 50$ and $P_f = 25$ for the scenario with CDN collaboration.

Figure 3.2: Comparison of results obtained by our model with results from Molina, Palau, and Esteve (2004), considering the same scenario (Table 3.2).

(a) Response time results from Molina, Palau, and Esteve (2004).

(b) Response time results from our model.

Source: (a) from Molina, Palau, and Esteve (2004); (b) made by the author.

Figure 3.3: Comparison of response time considering scenarios with and without multiple CDN collaboration.

(a) Response time for different cache hit values without considering CDN collaboration.

(b) Response time for different cache hit values with CDN collaboration.


Figures 3.3(a) and 3.3(b) illustrate the behavior of the response time for the scenarios without and with collaboration, respectively. Both figures show four lines, each representing one type of response time. The types are Overall, Local, Foreign, and Origin, representing respectively the response times experienced by all clients, by clients receiving content from local replica servers, by those served from foreign replica servers, and by those served from the origin server. Note that the line representing the response time experienced by clients receiving content from foreign replica servers is always zero in Figure 3.3(a), because this scenario does not consider CDN collaboration. Comparing Figures 3.3(a) and 3.3(b), one may notice a lower overall response time, sometimes around 35% lower, when there is collaboration. We also observe a lower mean origin response time, with a decrease reaching 60%. The response time of foreign replica servers behaves similarly to the origin response time. This happens because a higher cache hit ratio results in fewer requests being redirected to foreign servers or fetching content from the origin. Another interesting point is that the curves representing the overall and local response times overlap when the cache hit ratio is 1; this happens because, as the hit ratio reaches 1, all requests are served by local replica servers in both scenarios.

Given the variables added by the proposed extension, we thought it worthwhile to evaluate them in order to better understand their influence on the model. Two primary collaboration metrics were selected for closer analysis, $k_f$ and mcp. Keep in mind that the factor $k$ represents the capacity of each CDN, meaning $\mu_f = k_f \lambda_f$, $k_f > 1$. We compared $k_f$, which represents the capacity of foreign CDNs, to $k_l$, the local CDN's capacity, in order to understand the influence of the foreign CDN's capacity on the response time. Using mcp, which represents the proportion of cache misses that are redirected to foreign CDNs, we made two comparisons, one against $k_l$ and one against $k_f$; the idea is to better understand the relation between mcp, CDN capacity, and the response time.

Figure 3.4 shows the relation between $k_l$, $k_f$, and the response time. The scenario is the same collaboration scenario used previously, but varying both capacities ($k_l$ and $k_f$) and with hit_ratio = 0.5. As expected, high values for both factors mean a lower response time. When both factors are close to 1, the response time increases, although the impact of $k_f$ alone is considerably lower than that of similar values of $k_l$. This happens because the mcp factor lowers the influence of foreign servers on the response time. Another consequence of the same fact is that even high $k_f$ values have far less impact than a local CDN under heavy load. Considering that the low influence of $k_f$ could be due to low values of mcp, we decided to evaluate how the response time behaves when mcp changes.
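The sensitivity to capacity factors close to 1 can be illustrated with the queueing term that appears in the response time equations of this model, $1/(\mu - \lambda)$. Substituting $\mu = k\lambda$ gives $1/((k-1)\lambda)$, which grows without bound as $k$ approaches 1. The sketch below is only an illustration under that single-server simplification; the function name and the arrival rate are hypothetical and are not taken from Table 3.3.

```python
def queueing_delay(k: float, arrival_rate: float) -> float:
    """Delay term 1 / (mu - lambda) with mu = k * lambda (requires k > 1)."""
    if k <= 1.0:
        raise ValueError("capacity factor k must be greater than 1")
    if arrival_rate <= 0.0:
        raise ValueError("arrival rate must be positive")
    service_rate = k * arrival_rate
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    arrival_rate = 100.0  # hypothetical requests per second
    for k in (1.05, 1.5, 3.0):
        delay = queueing_delay(k, arrival_rate)
        print(f"k = {k:4.2f} -> delay term = {delay:.4f} s")
```

The closer $k$ is to 1, the larger the term becomes, which is consistent with the increase in response time observed when the capacity factors are low.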

Figure 3.5 and Figure 3.6 show the behavior of the overall response time when varying $k_f$ and $k_l$, respectively, relative to mcp. Both figures show that an mcp near zero, in other words without collaboration, considerably increases the response time, due to the high number of requests that depend on the origin server, which is usually more distant than the replica servers. The overall response time also increases with lower $k_f$ and $k_l$ values. Nonetheless, both figures show interesting values for the overall response time. Unlike in Figure 3.4, where low $k_f$ and $k_l$ result in a higher overall response time, increasing the proportion of requests redirected to the foreign CDN lets CDN collaboration lighten the impact of the low capacities, resulting in a lower overall response time.


Figure 3.4: Relation between $k_l$ and $k_f$, which represent local and foreign CDN capacities respectively, and response time.


Figure 3.5: Relation between $k_f$, mcp and response time.



Figure 3.6: Relation between $k_l$, mcp and response time.


3.5 Collaboration Overhead

Collaboration seems to be beneficial, as seen in the previous Section; unfortunately, it also has some drawbacks. New interactions take place and new messages are exchanged. Although its size may vary depending on the strategy used, the need to consider an additional overhead is clear. This Section presents an extension to the model in order to obtain a more accurate view of the collaboration overhead. We discuss what might cause the overhead and analyze its expected impact on response time.

There are two main overheads to consider, namely negotiation and redirection. Negotiation overhead concerns the negotiation phase that precedes the collaboration agreement. It represents the time between a CDN determining that peering is needed and the moment the peering agreement is established. Although considerably higher than the redirection overhead, it can be almost imperceptible in the long run: once the agreement is established, the CDN uses the information acquired to redirect clients, so the negotiation overhead is spread across the mean response time.

On the other hand, redirection overhead is present in every request from each client. According to (BRANDENBURG; PETERSON; DAVIE, 2015), there can be two different Request Routing strategies, namely interactive and recursive. To better understand them we have to focus on a basic CDN entity, the Request Redirector (RR). The RR receives clients' requests and redirects them to the most suitable replica server. In a scenario with CDN collaboration, however, the redirection could be to a local or a foreign replica server. Thus, once the RR realizes that a received request needs to be redirected to a foreign CDN, there are two basic options: either the RR knows which foreign replica server the client should be redirected to, or it redirects the client to the downstream CDN's RR, which then redirects the client to the proper replica server. The former alternative is called Recursive Request Routing and the latter is called Interactive Request Routing.

To represent the collaboration overhead, we first extended the original model to consider the request redirection overhead. This overhead has two basic parts: the network transfer time from the client to the RR and the time to process the request. Considering that requests handled by the RR are simple receive and response messages, the most important part is the network transfer. Therefore, according to Equation 3.2, the response time regarding the RR ($R_{rr}$) should be:

\[
R_{rr} = N \tau^{i}_{rr} \tag{3.13}
\]

Where $\tau^{i}_{rr}$ represents the round trip time between client $i$ and the RR. The distribution of RRs in the network depends on the RR strategy being used. This work considers one of the most popular RR strategies, the end user request mapping (ECONOMOU, 2010) or DNS-based request-routing (PATHAN; BUYYA, 2007). Using this approach, the RRs work with DNS servers, resolving domain names for clients according to their geographic position and other aspects related to load balancing techniques. Using this strategy, RRs and replica servers are distributed in similar ways, as seen in (DILLON; WU; CHANG, 2010). Thus, we decided to estimate $\tau^{i}_{rr}$ according to $\tau^{ij}_{pl}$ and $\tau^{ijc}_{pf}$ for the RTT from the client to local and foreign RRs. Therefore:

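\[
\tau^{i}_{rr_l} = \frac{\sum_{j=1}^{P_l} \tau^{ij}_{pl}}{P_l},
\qquad
\tau^{i}_{rr_f} = \frac{\sum_{c=1}^{C} \sum_{j=1}^{P^{c}_{f}} \tau^{ijc}_{pf}}{\sum_{c=1}^{C} P^{c}_{f}} \tag{3.14}
\]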

In other words, we estimate $\tau^{i}_{rr_l}$ and $\tau^{i}_{rr_f}$ for each client according to the average response time from that client to the local and foreign RSs. Nevertheless, this mean would be unfair if we considered all replica servers for each client, because distant replica servers most certainly will not serve that client; the basic idea of a CDN is to bring content closer to the user. Therefore, we use $\alpha$, the same threshold presented and discussed in Section 3.3.1. Thus, the calculation of the mean $\tau^{i}_{rr}$ complies with the following condition:

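\[
\tau^{ij}_{pl} =
\begin{cases}
0 & \text{if } \tau^{ij}_{pl} > \alpha \tau^{i}_{o}, \\
\tau^{ij}_{pl} & \text{if } \tau^{ij}_{pl} \le \alpha \tau^{i}_{o},
\end{cases}
\qquad
\tau^{ijc}_{pf} =
\begin{cases}
0 & \text{if } \tau^{ijc}_{pf} > \alpha \tau^{i}_{o}, \\
\tau^{ijc}_{pf} & \text{if } \tau^{ijc}_{pf} \le \alpha \tau^{i}_{o}.
\end{cases}
\tag{3.15}
\]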

The idea is that replica servers located at a distance greater than $\alpha \tau^{i}_{o}$ most certainly will not serve the client and are therefore not considered when estimating the location of the RR. Another point to consider is CDN capacity: bigger CDNs should have a wider presence of RSs and RRs. Thus, we modified Equation 3.13 to include the CDNs' capacities, resulting in:

\[
O_{rr} = \frac{N \tau^{i}_{rr}}{k} \tag{3.16}
\]

Where $k$ represents the CDNs' capacities and $O_{rr}$ represents the redirection overhead for local requests. For requests redirected to foreign CDNs, there are two alternatives, namely Interactive and Recursive. When using the Recursive strategy, the RR already knows where to redirect the client; therefore, in terms of redirection overhead, the result is in general similar to a request served by the local CDN. Alternatively, using Interactive Request Routing between CDNs results in two consecutive requests (local and foreign), one to each RR. Therefore, the redirection overhead for foreign requests when using Interactive Request Routing can be modeled as follows:

\[
O_{rr_f} = \frac{N \tau^{i}_{rr_l}}{k_l} + \frac{N \tau^{i}_{rr_f}}{k_f} \tag{3.17}
\]

Where $\tau^{i}_{rr_l}$ and $\tau^{i}_{rr_f}$ represent, respectively, the RTT between the client and the local RR and between the client and the foreign RR. When using Recursive Request Routing, $\tau^{i}_{rr_f} = 0$. It is worth emphasizing that the authors are aware that, even when using Recursive Request Routing, a given client can be redirected from the local RR to the foreign RR. However, the idea is to incorporate both strategies, with their differences, into the proposed model, making comparisons between them possible.
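The redirection overhead of Equations 3.14 to 3.17 can be sketched in a few lines of code. This is a minimal illustration, not the implementation used to produce the results: the function names are hypothetical, the foreign RTTs of all collaborating CDNs are assumed to be passed as one flat list, and `n` simply stands for the factor $N$ of Equation 3.13.

```python
from typing import Sequence


def thresholded_mean_rtt(rtts: Sequence[float], origin_rtt: float,
                         alpha: float) -> float:
    """Equations 3.14 and 3.15: RTTs above alpha * origin_rtt count as 0,
    while the sum is still divided by the total number of replica servers."""
    if not rtts:
        raise ValueError("at least one replica server RTT is required")
    limit = alpha * origin_rtt
    kept = sum(rtt for rtt in rtts if rtt <= limit)
    return kept / len(rtts)


def redirection_overhead(n: float, rtt_rr: float, k: float) -> float:
    """Equation 3.16: O_rr = N * tau_rr / k."""
    return n * rtt_rr / k


def foreign_redirection_overhead(n: float, rtt_local_rr: float,
                                 rtt_foreign_rr: float, k_l: float,
                                 k_f: float, interactive: bool) -> float:
    """Equation 3.17. With Recursive routing the foreign RTT is taken as 0,
    so only the local term remains."""
    local_term = redirection_overhead(n, rtt_local_rr, k_l)
    if not interactive:
        return local_term
    return local_term + redirection_overhead(n, rtt_foreign_rr, k_f)


if __name__ == "__main__":
    # Hypothetical RTTs in milliseconds; the 100 ms server exceeds the limit.
    tau_rr_l = thresholded_mean_rtt([10.0, 20.0, 100.0], origin_rtt=50.0, alpha=1.0)
    print(tau_rr_l)
    print(foreign_redirection_overhead(2, tau_rr_l, 20.0, 2.0, 4.0, interactive=True))
    print(foreign_redirection_overhead(2, tau_rr_l, 20.0, 2.0, 4.0, interactive=False))
```

With the same inputs, the Interactive result is never lower than the Recursive one, because it adds the term for the round trip to the foreign RR.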

Having discussed the redirection overhead, let us return to the negotiation overhead. To the best of our knowledge, several aspects of the negotiation overhead remain unclear, since the protocols and technologies to be used are not yet standardized. Therefore, we decided to use a variable, $P_o$ (the peering overhead), to represent the negotiation overhead time. Thus, the mean negotiation overhead is:

\[
N_o = \frac{C P_o}{M} \tag{3.18}
\]

where $C$ is the number of existing collaborations, $P_o$ represents the negotiation overhead, and $M$ is the number of client clusters. For a better representation of the response time, we now expand Equations 3.6, 3.7, 3.8, and 3.11 to consider the redirection overheads.

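\[
\bar{R}_l = \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{P_l} \left[ p^{j}_{li} \left( N \tau^{ij}_{pl} + \frac{1}{\mu^{j}_{l} - \sum_{k=1}^{M} \lambda^{j}_{lk}} + O_{rr} \right) \right] \tag{3.19}
\]

\[
\bar{R}_f = \frac{1}{M} \sum_{i=1}^{M} \sum_{c=1}^{C} \sum_{j=1}^{P^{c}_{f}} \left[ p^{jc}_{fi} \left( N \tau^{ijc}_{pf} + \frac{1}{\mu^{jc}_{f} - \sum_{k=1}^{M} \lambda^{jc}_{fk}} + O_{rr_f} \right) \right] \tag{3.20}
\]

\[
\bar{R}_o = \frac{1}{M} \sum_{i=1}^{M} H_i \left[ N \tau^{i}_{o} + \frac{1}{\mu_s - \sum_{k=1}^{M} (H_i) \lambda_k} + O_{rr} \right] \tag{3.21}
\]

\[
\bar{R}_t = \frac{1}{M} \sum_{i=1}^{M} \left[ \sum_{j=1}^{P_l} p^{j}_{li} \left( N \tau^{ij}_{pl} + O_{rr} \right) + \sum_{c=1}^{C} \sum_{j=1}^{P^{c}_{f}} p^{jc}_{fi} \left( N \tau^{ijc}_{pf} + O_{rr_f} \right) + H_i \left( N \tau^{i}_{o} + O_{rr} \right) \right] \tag{3.22}
\]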
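As a hedged illustration of how Equation 3.19 can be evaluated, the sketch below computes the mean local response time from nested lists. The function name and the data layout are assumptions made for this example only: `p[i][j]` holds $p^{j}_{li}$, `tau[i][j]` holds $\tau^{ij}_{pl}$, `mu[j]` holds $\mu^{j}_{l}$, `lam[j][k]` holds $\lambda^{j}_{lk}$, and `o_rr[i]` holds the redirection overhead seen by client cluster $i$.

```python
from typing import Sequence


def mean_local_response_time(p: Sequence[Sequence[float]],
                             tau: Sequence[Sequence[float]],
                             mu: Sequence[float],
                             lam: Sequence[Sequence[float]],
                             n: float,
                             o_rr: Sequence[float]) -> float:
    """Evaluate Equation 3.19 for M client clusters and P_l local servers."""
    m = len(p)
    if m == 0:
        raise ValueError("at least one client cluster is required")
    total = 0.0
    for i in range(m):
        for j in range(len(mu)):
            load = sum(lam[j])  # sum over k of lambda^j_{lk}
            if mu[j] <= load:
                raise ValueError(f"replica server {j} is saturated")
            service = 1.0 / (mu[j] - load)
            total += p[i][j] * (n * tau[i][j] + service + o_rr[i])
    return total / m


if __name__ == "__main__":
    # One cluster and one server, with hypothetical values in seconds.
    value = mean_local_response_time(
        p=[[1.0]], tau=[[0.01]], mu=[200.0], lam=[[100.0]], n=2, o_rr=[0.005])
    print(f"{value:.3f}")
```

The saturation check makes explicit an assumption of the queueing term: it is only meaningful while the service rate of a server exceeds the load offered to it.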

3.5.1 Collaboration Overhead Results

We used the same scenario presented in the previous section, described in Table 3.3; the only difference is the new variable used for the negotiation overhead. For all overhead results we used $P_o = 180$ seconds. Our purpose was to compare Recursive and Interactive Request Routing and to analyze whether collaboration is still beneficial regarding response time once redirection and negotiation overheads are considered.
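To give a feeling for how the negotiation overhead is spread, the following sketch evaluates Equation 3.18 with $P_o = 180$ seconds. The numbers of collaborations and client clusters used in the example are hypothetical and are not the values of Table 3.3; the function name is also an assumption.

```python
def mean_negotiation_overhead(collaborations: int,
                              peering_overhead_s: float,
                              client_clusters: int) -> float:
    """Equation 3.18: N_o = C * P_o / M."""
    if client_clusters <= 0:
        raise ValueError("the number of client clusters must be positive")
    return collaborations * peering_overhead_s / client_clusters


if __name__ == "__main__":
    # Hypothetical setting: 1 collaboration shared by 60 client clusters.
    print(mean_negotiation_overhead(1, 180.0, 60))
```

Under these assumed values, a single 180 second negotiation contributes 3 seconds per client cluster, and the share shrinks as more clusters benefit from the same agreement.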

Figures 3.7(a) and 3.7(b) show the response time for different cache hit ratios when using Recursive and Interactive Request Routing, respectively. As expected, Interactive routing results in additional overhead. The increase in the overall mean response time, which reaches 27%, is a result of the client's redirection to both the local and foreign RRs. Although considerable, the increase is not enough to reach the mean response time of the scenario without collaboration.

Figure 3.8 compares the overall response time without collaboration and with collaboration using Recursive and Interactive Request Routing. Collaboration has a definite impact on response time, with decreases reaching 10% and 30% for the Interactive and Recursive strategies, respectively.

Figure 3.7: Comparison of response time considering two redirection strategies, namely Recursive and Interactive Request Routing.

(a) Response time for cache hit values from 0 to 1, considering Recursive Request Routing.

(b) Response time for cache hit values from 0 to 1, considering Interactive Request Routing.



Figure 3.8: Comparison of response time for cache hit values [0,1] considering Recursive Request Routing, Interactive Request Routing and no collaboration.


3.6 Lessons Learned

After analyzing the data collected from our CDN model, we conclude that collaboration among multiple CDNs can potentially decrease the perceived response time, therefore increasing clients' QoE. We demonstrated a reduction in total response time using collaboration between CDNs, because replica servers, even foreign ones, are usually closer to clients than origin servers are.

We also observed that CDN collaboration can benefit an overloaded primary CDN by redirecting some of its requests to a foreign CDN, lowering the total response time. The overload of the primary CDN could be the result of a flash crowd event or even of a sudden capacity loss due to hardware problems. In both cases, collaboration could help lower the total response time experienced by clients, resulting in better network performance.

Considering the collaboration overheads, we observed that the negotiation overhead seems to have a less significant impact on the overall mean response time. This happens because the negotiation is done only once, so this overhead is usually divided amongst several clients. The redirection overhead, on the other hand, clearly impacts the client's response time. This effect, although notable, is not enough to make CDN collaboration not worthwhile from the perspective of the client's response time.


3.7 Conclusion

The popularity of Content Delivery Networks (CDNs) has encouraged a considerable number of new localized CDNs with different characteristics, for instance, different capacities and types of content. In this scenario, collaboration becomes a good alternative for CDNs to expand their capacity. Through collaboration, a particular CDN can expand its coverage and enhance users' quality of experience.

This Chapter presented an analytical model used to better understand performance aspects of collaboration among multiple CDNs. We also studied the impact of the redirection overhead of two different Request Routing strategies proposed for CDN collaboration, Recursive and Interactive. Through our analytical model, we can see that collaboration can decrease the total response time of users. The response time decrease is a direct result of redirecting clients to foreign replica servers, which are usually closer to customers than the origin server. Another important finding was the possibility of using CDN collaboration to relieve primary CDNs of considerable loads, for example, during a flash crowd event.

In the next Chapter we present another way of extending a CDN's capacity, focusing on the mobile network scenario. We present a solution to enhance the mobile backhaul's flexibility, based on SDN technology and compliant with the current architecture.


4 Enabling Transparent Caching in LTE Mobile Backhaul

Without continual growth and progress such words as improvement,

achievement and success have no meaning.

—BENJAMIN FRANKLIN

This Chapter presents a solution to enhance the mobile backhaul's flexibility. The solution builds on Software Defined Networks (SDN) to propose a user-space extension to OpenFlow switches inside the mobile backhaul. Such an extension enables better placement of Replica Servers (RS) inside the mobile backhaul. We also present the benefits of such a solution by designing and prototyping a transparent cache service. The primary goal of this Chapter is to describe the proposed solution and present the evaluation made using the proof-of-concept prototype.

This Chapter is organized as follows: Section 4.2 introduces basic concepts of the mobile backhaul architecture and MPEG DASH streaming. Section 4.3 analyzes the design space and describes the proposed design, while Section 4.4 presents our prototype implementation and its experimental evaluation. Section 4.5 discusses related work and Section 4.6 concludes the chapter.

The results presented in this Chapter were published in Rodrigues, Dán, and Gallo (2016).

4.1 Introduction

In the last decade, Internet traffic increased significantly due to the penetration of broadband access. In particular, mobile traffic grew by 70% during the last year, from 1.5 to 2.5 exabytes per month. The increase is mainly driven by the rapid innovation and diffusion of mobile terminals and video related services, and it is predicted that by 2020 more than 60% of Internet traffic will originate from mobile terminals (CISCO, 2015). The increasing mobile traffic indicates the need for mobile network operators to design unique solutions to scale content delivery. An efficient approach to scale content delivery is the use of CDNs. As described in Section A.1, the primary concept behind CDNs is to distribute RSs through the network, bringing content closer to end users. Therefore, we need to be able to place RSs dynamically, according to local demands and resource availability.

The current mobile network architecture is an IP packet switched network built around 3rd Generation Partnership Project (3GPP) specifications that define the Evolved Packet System (EPS) architecture, the foundation of fourth generation mobile networks. In the EPS, traffic is encapsulated in GPRS Tunnelling Protocol (GTP) tunnels that transport packets from the edge nodes, the eNodeBs, to mobile network gateways, such as the Packet Data Network Gateway (PGW), where packets are forwarded towards the global Internet. Although this architecture has been able to drive the ongoing mobile revolution, it lacks the flexibility needed to dynamically control the constantly increasing amount of mobile traffic. One of the consequences of this lack of flexibility is the impossibility of dynamically placing and relocating Replica Servers (RS) inside the mobile backhaul. Indeed, GTP tunnels force packets to traverse the whole mobile infrastructure and turn the mobile backhaul into a passive network segment in which traffic cannot be dynamically managed.

Forced by the limited flexibility of the current infrastructure, mobile network operators have been constantly increasing their network capacity to meet bandwidth and latency requirements despite the increasing amount of traffic in their networks. Although effective, increasing the network capacity significantly increases mobile network operators' costs, as it requires the deployment of additional infrastructure. To alleviate this problem, motivated by cacheability studies (RAMANAN et al., 2013; CAROFIGLIO et al., 2015; IMBRENDA; MUSCARIELLO; ROSSI, 2014), edge caching solutions have been explored recently (BASTUG; BENNIS; DEBBAH, 2014; WANG et al., 2014), and commercial edge caching products have become available, e.g., LTE caches by ARA Networks¹ and DatE by I-Direct². These solutions are, however, limited to the network edge, and since they do not allow GTP tunnels to be bypassed, they miss the potential benefits of in-network caching and dynamic traffic management.

While Software Defined Networks (SDN) are considered an enabler for the emerging 5G mobile backhaul architecture, co-existence with the EPS requires that an SDN-based backhaul support dynamic traffic management for the existing architecture, without additional middleboxes. SDN offers significantly simplified network management, mainly due to the centralized control plane, and opens the way to novel functionalities owing to increased network programmability.

¹ http://www.aranetworks.com/solutions/mobile_edgeCDN
² http://www.idirect.net/Altobridge.aspx

In this Chapter we present a solution to enhance the mobile backhaul's flexibility based on SDN technology, compliant with the current architecture. In particular, we propose a user-space extension to OpenFlow switches inside the mobile backhaul and show the benefits of network devices' programmability by designing and prototyping a transparent cache service. The proposed in-network caching system can be deployed on top of OpenFlow switches in order to reduce the excessive load observed in the mobile backhaul network, and can be offered by mobile network operators to Content Providers.

The main contributions of the Chapter are as follows:

• Design space analysis for the introduction of SDN inside the current mobile backhaul architecture through the in-network caching use case;

• Design of a solution for transparent caching of MPEG DASH multimedia content;

• Prototype of an SDN-enabled in-network caching solution compatible with current standards and protocols;

• Feasibility analysis of the proposed solution.

4.2 Background

4.2.1 Mobile Backhaul Architecture

The backhaul is the part of the mobile network that ties the core network, which connects to the Internet, to the antennas that provide wireless access to mobile devices. Over the years, the mobile backhaul evolved from Plesiochronous Digital Hierarchy (PDH) and Asynchronous Transfer Mode (ATM) transport to an Ethernet-based transport network, in which data are transmitted on top of IP.

The key elements of the Long-Term Evolution (LTE) backhaul architecture are illustrated in Figure 4.1. Mobile terminals, or User Equipment (UE), are connected to the network through antennas associated with at least one eNodeB. The eNodeBs provide access to the UE belonging to one or more cells by performing radio admission control, dynamic resource allocation and transforming the radio signal into digital information. At the eNodeB, the UE's traffic is encapsulated in a GTP Tunnel that is directed over the Serving Gateways (SGW) and terminated at the PGW, where the mobile backhaul is connected to the global Internet. The SGW's function is to route incoming UEs' traffic in the appropriate GTP Tunnel, as well as to support users' mobility driven by the Mobility Management Entity (MME), which is the essential element of the mobile backhaul's control plane and is responsible for multiple control operations such as UEs' mapping, authentication, and authorization. Finally, the PGW is the termination point of the mobile backhaul, and connects the network to the Internet. Its primary functions are packet filtering and marking, accounting, and IP address assignment under control of the MME.


Figure 4.1: LTE Mobile backhaul architecture.


4.2.2 Stateful L4-L7 Processing in SDN

To keep the data plane simple and efficient, the original SDN architecture made the switches stateless and made them operate only on L2-L4 packet header information. This is the case, for example, in the OpenFlow protocol, the first proposal and de facto standard for a programmable SDN switch. Figure 4.2 shows the main elements of an OpenFlow switch. The datapath of the OpenFlow switch is implemented through the Flow Table, which indicates the action to be performed when a packet matching a flow-entry (i.e., an entry of the FlowTable) is received. Although the OpenFlow protocol is constantly evolving and the type of supported actions is extensible, the following basic actions, which constitute the minimum requirements for OpenFlow switches, can be identified:

• Forward the packet to a given (set of) port(s), allowing packets to be dynamically routed.

• Encapsulate the packet and send it to the controller for further processing. This action is typically executed on the first packet belonging to a flow. The controller can then decide whether to install a rule in the OpenFlow Switch's datapath, and which one.

• Modify a portion of the packet header, provided that the protocol is supported by OpenFlow. As an example, this kind of action could be used to implement a NAT inside the OpenFlow switch.

• Inspect the packet up to the application layer in order to perform Deep Packet Inspection (DPI) and take informed decisions to dynamically manage the traffic.

• Drop the packet, allowing security checks to be performed by adding specific rules for misbehaving flows.


Figure 4.2: Software Defined Networks (SDN) control plane illustration.


Motivated by use cases such as Network Address Translation (NAT) and L7 load balancing, recent work has proposed an application-aware SDN architecture based on the concept of AppTables, which enable stateful processing based on L4-L7 information. An AppTable stores application-specific state and can dynamically add and remove rules from the FlowTable, just like a controller, but is located on the switch (MEKKY et al., 2014). Driven by a similar motivation, but arguably less versatile, a DPI service and stateful tracking of flows (based on conntrack) are being added to Open vSwitch³, a widely used open source SDN switch implementation.

Since in the EPS mobile backhaul all traffic of a UE is carried in a GTP tunnel, stateful L4-L7 inspection is essential for dynamic traffic management.

4.2.3 MPEG DASH Streaming

³ http://www.openvswitch.org

MPEG Dynamic Adaptive Streaming over HTTP (DASH) (MPEG, 2014) is a popular video-on-demand protocol adopted in 3GPP Release 10 for video streaming over mobile networks. DASH video content is partitioned into one or more segments, with typical segment durations between 1 s and some tens of seconds. The DASH client retrieves metadata on the streams it is interested in by downloading the Media Presentation Description (MPD) file from a web server. The MPD file specifies segment information, including timing, duration, available bitrates and resolutions, and the URL. Given the MPD file, the DASH client requests the segments sequentially using HTTP and chooses the bitrate of the next segment based on the estimated download rate. A DASH client would thus generate an HTTP request up to once per second on average, depending on the segment duration. While DASH naturally lends itself to caching because it relies on HTTP, GTP tunneling in the mobile backhaul makes it challenging to implement efficient dynamic caching.
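The rate selection step described above, in which the client picks the bitrate of the next segment from the estimated download rate, can be sketched as follows. This is a simple throughput-based heuristic written for illustration; the function name and the `safety` margin are assumptions, and real DASH players may use different adaptation logic.

```python
from typing import Sequence


def choose_bitrate(available_bps: Sequence[int], estimated_rate_bps: float,
                   safety: float = 1.0) -> int:
    """Pick the highest advertised bitrate that fits the estimated rate.

    Falls back to the lowest bitrate when none fits.
    """
    if not available_bps:
        raise ValueError("the MPD must advertise at least one bitrate")
    budget = estimated_rate_bps * safety
    fitting = [rate for rate in available_bps if rate <= budget]
    return max(fitting) if fitting else min(available_bps)


if __name__ == "__main__":
    ladder = [500_000, 1_000_000, 3_000_000]  # hypothetical MPD bitrates
    print(choose_bitrate(ladder, 1_200_000))
    print(choose_bitrate(ladder, 200_000))
```

Because each segment is requested with its own HTTP GET, every such decision produces a request that a cache inside the backhaul could serve, if the request can be reached inside the GTP tunnel.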

In what follows we propose a solution to address this challenge leveraging SDN.

4.3 Transparent Caching in the Mobile Backhaul

The introduction of remotely controlled SDN switches could provide mobile network operators with a more flexible architecture, allowing them to sustain the increasing amount of traffic while reducing the infrastructure's maintenance costs, as well as giving them a way to increase their revenue by introducing novel in-network services. To investigate the feasibility of introducing SDN in the current mobile backhaul architecture, in this Section we explore the use case of transparent in-network caching for MPEG DASH video content.

In order to enable transparent caching of DASH content, the proposed architecture needs to (i) intercept an HTTP GET request in a Transmission Control Protocol (TCP) segment transmitted in a GTP tunnel, (ii) decide if the request is for cacheable content, (iii) decide if the requested segment is cached in a nearby cache and, if yes, (iv) remove the request from the GTP tunnel and the TCP connection, redirect it to the appropriate cache, and then (v) insert the data sent by the cache into the GTP tunnel and TCP connection so that caching remains transparent to the UE. In what follows we propose an architecture for supporting these five network functions and discuss the main design choices. The components of the architecture are shown in Figure 4.3.

Function (i) requires a stateless DPI service, and since all UE traffic traverses the backhaul in a GTP tunnel, it has to be performed by the SDN switches, as otherwise all packets would have to be transmitted to the controller. Since various SDN switches are expected to include a DPI engine in the near future (e.g., Open vSwitch (BAUDIN, 2014)), we assume DPI is available.
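To make function (i) more concrete, the sketch below shows one way a stateless inspection step could look inside a GTP-U payload for an HTTP GET. It is an illustration only and not the prototype's code: the function names are hypothetical, and it assumes a GTPv1-U G-PDU carrying an unfragmented IPv4 packet whose TCP segment contains the whole request line.

```python
import socket
import struct
from typing import Optional, Tuple

GTPU_GPDU = 0xFF  # GTP-U message type of a G-PDU, which carries a user packet


def gtpu_inner_packet(udp_payload: bytes) -> Optional[bytes]:
    """Return the user packet carried in a GTPv1-U G-PDU, or None."""
    if len(udp_payload) < 8:
        return None
    flags, msg_type, _length, _teid = struct.unpack("!BBHI", udp_payload[:8])
    if flags >> 5 != 1 or msg_type != GTPU_GPDU:
        return None
    offset = 8
    if flags & 0x07:  # E, S or PN flag set: 4 optional header bytes follow
        if len(udp_payload) < 12:
            return None
        next_ext = udp_payload[11] if flags & 0x04 else 0
        offset = 12
        while next_ext != 0:  # walk the chain of extension headers
            if offset >= len(udp_payload):
                return None
            ext_len = udp_payload[offset] * 4  # length is in 4-byte units
            if ext_len == 0 or offset + ext_len > len(udp_payload):
                return None
            next_ext = udp_payload[offset + ext_len - 1]
            offset += ext_len
    return udp_payload[offset:]


def find_http_get(ip_packet: bytes) -> Optional[Tuple[str, int, str]]:
    """Return (server IP, server port, request line) for an IPv4/TCP HTTP GET."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None
    ihl = (ip_packet[0] & 0x0F) * 4
    if ihl < 20 or ip_packet[9] != 6 or len(ip_packet) < ihl + 20:
        return None  # not TCP, or too short to hold a TCP header
    tcp = ip_packet[ihl:]
    dst_port = struct.unpack("!H", tcp[2:4])[0]
    data_offset = (tcp[12] >> 4) * 4
    payload = tcp[data_offset:]
    if data_offset < 20 or not payload.startswith(b"GET "):
        return None
    request_line = payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    return socket.inet_ntoa(ip_packet[16:20]), dst_port, request_line
```

The returned destination socket and request line are the pieces of information that the subsequent functions need in order to decide whether and where the request can be served.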

To decide whether the request is for cacheable content, function (ii) matches the destination socket of the request (extracted from the GTP tunnel) against a list of known content server sockets, maintained in the Content Server Directory (CSD). Function (ii) can be implemented using stateless DPI, although the entries in the CSD may change over time.

To support function (iii) we maintain a Content Location Directory (CLD), which holds the content placement information: for each cached content segment it contains the address(es) of the cache(s) serving the segment, and a reference to the DASH MPD file. Notice that obtaining the MPD file of a content is straightforward, and allows optimizations as we discuss later. The CLD can be either centralized (at or near the PGW) or distributed, and the information it contains does not change due to flow or packet arrivals, hence function (iii) can be considered stateless.
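Functions (ii) and (iii) amount to two lookups, which the following sketch illustrates with in-memory structures. The class and field names are hypothetical and chosen for this example; the text above does not prescribe how the CSD and the CLD are stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Socket = Tuple[str, int]  # (IP address, TCP port) of a content server


@dataclass
class CldEntry:
    caches: List[str]  # address(es) of the cache(s) serving the segment
    mpd_url: str       # reference to the DASH MPD file


@dataclass
class Directories:
    csd: Set[Socket] = field(default_factory=set)
    cld: Dict[Tuple[Socket, str], CldEntry] = field(default_factory=dict)

    def is_cacheable(self, server: Socket) -> bool:
        """Function (ii): is the destination a known content server?"""
        return server in self.csd

    def locate(self, server: Socket, segment_path: str) -> Optional[CldEntry]:
        """Function (iii): where is the requested segment cached, if anywhere?"""
        if not self.is_cacheable(server):
            return None
        return self.cld.get((server, segment_path))


if __name__ == "__main__":
    server = ("192.0.2.10", 80)  # hypothetical content server
    dirs = Directories()
    dirs.csd.add(server)
    dirs.cld[(server, "/video/seg1.m4s")] = CldEntry(
        caches=["10.0.0.5"], mpd_url="/video/manifest.mpd")
    print(dirs.locate(server, "/video/seg1.m4s"))
    print(dirs.locate(server, "/video/seg9.m4s"))
```

Neither lookup depends on earlier packets of the same flow, which is what allows both functions to be treated as stateless.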

To support functions (iv) and (v) we designed a splicing network function that transparently extracts and reinserts segments into a TCP connection, and packets into the GTP tunnel. Unlike the DPI, the CSD and the CLD, the splicing function is stateful.


Figure 4.3: Transparent caching for LTE networks architecture.


4.3.1 Design and Function Placement

While function (i), based on stateless DPI, must be performed in the switch, there are several alternatives for the placement of functions (ii)-(v), which we discuss next.

Controller-based: The SDN switches use a stateless DPI engine to perform function (i) and redirect the identified requests to the central SDN controller. Functions (ii)-(v) are then performed in the controller. Since splicing (functions (iv) and (v)) is located in the controller, the video segments are delivered from the caches via the controller to the switch. While this solution provides a centralized view of the network, is easy to manage, and requires only simple stateless DPI in the switches, it does not scale well and has a single point of failure. The single point of failure problem can be partly solved by using a distributed controller, such as the one proposed in (TOOTOONCHIAN; GANJALI, 2010). However, a distributed controller increases complexity, and a significant number of packets still has to be forwarded via the controllers; hence the solution is not scalable.

Switch-Controller Hybrid: The SDN switches use a stateless DPI engine to perform functions (i) and (ii). If the requested content is cacheable, the packet is redirected to a controller, where functions (iii)-(v) are performed. With this solution, only requests for cacheable video content are forwarded to the controller, and the DPI on the switch only needs to hold the CSD, but the amount of traffic traversing the controller is still very high, leading to limited scalability.

Switch-based: The SDN switches perform functions (i)-(v) by using stateful L5-L7 packet inspection, similar to the AppTables proposed in (MEKKY et al., 2014). With this solution, the SDN switch still exchanges messages with the Controller (in order to use the CLD to find the best cache to serve the request, in case the CLD is centralized), but since the request rate per DASH client is relatively low (up to 1/sec on average), this solution scales well, particularly if the CLD is distributed. Note that since (iv) and (v) are implemented in the switch, video segments are delivered from the caches via the local SDN switch and not via the controller.

While currently available SDN switches do not have the required features for the implementation of the switch-based design, various switches will include DPI and stateful connection tracking (e.g., Open vSwitch (BAUDIN, 2014)) in the near future. In the rest of the section we describe the components of the Switch-based design, assuming that DPI and connection tracking are feasible.

4.3.2 Switch-based transparent caching for LTE

The proposed solution for SDN-enabled transparent caching in the mobile backhaul relies on a local AppTable located in the user space of the SDN switch, similar to what has been proposed in (MEKKY et al., 2014). In this solution, the SDN switch has rules that contain 'goto Application Table' actions instead of 'output to controller'. If functions (i) and (ii) identify a request for cacheable content, the request is redirected to the AppTable, where it is matched against existing entries. AppTable entries provide the location of the best cache if the same content segment has been previously requested. Otherwise, the request is forwarded to the CLD (located in the switch or in the controller). If the segment is cached, the CLD returns the location of the best cache (and other information, as described later), and request serving is performed in the switch's AppTable scope using functions (iv) and (v).

AppTable Design

The AppTable is very similar to an OpenFlow table; the only difference is that matching can be performed on L5-L7. The AppTable contains entries of the form

<IPaddress> <port> <segmentpath> → <action>

where <IPaddress> and <port> are the IP address and the port number of the web server. They are matched against the destination IP address of the datagram and the destination TCP port number of the segment carrying the HTTP GET method. The <segmentpath> field is a string of the format “GET path/to/cachedvideosegment”, and <action> is the address of the cache where the video segment is stored.
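The entry format above can be illustrated with a small exact-match table. This is only a sketch under stated assumptions: the class name `AppTable`, its methods and the addresses used in the usage note are hypothetical, and a real switch would perform this match in its datapath rather than in Python.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Key mirrors the entry format: <IPaddress> <port> <segmentpath>
AppKey = Tuple[str, int, str]


@dataclass
class AppTable:
    """Toy exact-match table on L3/L4 fields plus the HTTP request line (hypothetical)."""
    entries: Dict[AppKey, str] = field(default_factory=dict)

    def add(self, server_ip: str, server_port: int,
            segment_path: str, cache_addr: str) -> None:
        # segment_path uses the "GET path/to/segment" format from the text.
        self.entries[(server_ip, server_port, segment_path)] = cache_addr

    def match(self, dst_ip: str, dst_port: int, payload: bytes) -> Optional[str]:
        """Return the cache address (the action) for a GET request, or None on a miss."""
        request_line = payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
        parts = request_line.split(" ")
        if len(parts) < 2 or parts[0] != "GET":
            return None  # not an HTTP GET, so nothing to match on L7
        return self.entries.get((dst_ip, dst_port, f"GET {parts[1]}"))
```

For example, after `table.add("203.0.113.7", 80, "GET /video/seg1.m4s", "10.1.0.5")`, a GET for `/video/seg1.m4s` sent to `203.0.113.7:80` returns `"10.1.0.5"`, while any other request returns `None` and would be handed to the CLD.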

The AppTable is initially empty, and is proactively populated in response to requests. Every time a request is forwarded to the CLD, and the requested DASH segment is identified as cached, the CLD returns the names of all cached segments listed in the same MPD file as the requested segment, together with the corresponding cache locations. The returned segment names are then added to the AppTable. As an effect, upon receiving the request for the first DASH segment of a media file, the AppTable is populated with the cache locations for the subsequent segments of the file, which helps keep the load of the CLD low. Although the AppTable is populated dynamically, its use does not need per-flow state to be maintained.
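The population step described above can be sketched as follows. The class `ToyCLD`, the function `resolve` and the query counter are hypothetical names introduced for illustration; the sketch only assumes that the CLD knows which segments belong to the same MPD and where each cached segment is stored.

```python
from typing import Dict, List, Optional, Tuple

AppKey = Tuple[str, int, str]  # (server IP, server port, "GET <path>")


class ToyCLD:
    """Hypothetical content location database that knows MPD membership."""

    def __init__(self, mpd_segments: Dict[str, List[str]],
                 cached_at: Dict[str, str]) -> None:
        self.mpd_segments = mpd_segments  # MPD name -> segment paths it lists
        self.cached_at = cached_at        # segment path -> cache address
        self.queries = 0                  # number of lookups, i.e. the load on the CLD

    def lookup(self, path: str) -> Dict[str, str]:
        """Return locations of all cached segments sharing an MPD with `path`.

        An empty dict means the requested segment itself is not cached.
        """
        self.queries += 1
        if path not in self.cached_at:
            return {}
        for segments in self.mpd_segments.values():
            if path in segments:
                return {s: self.cached_at[s] for s in segments if s in self.cached_at}
        return {path: self.cached_at[path]}


def resolve(app_table: Dict[AppKey, str], cld: ToyCLD,
            server_ip: str, server_port: int, path: str) -> Optional[str]:
    """Return the cache address for a request, consulting the CLD only on a miss."""
    key = (server_ip, server_port, f"GET {path}")
    if key in app_table:
        return app_table[key]
    # Miss: ask the CLD once and install entries for every sibling segment.
    for seg, cache in cld.lookup(path).items():
        app_table[(server_ip, server_port, f"GET {seg}")] = cache
    return app_table.get(key)
```

In this sketch the first request for a cached segment costs one CLD query, and requests for the other cached segments of the same MPD are then answered from the table without contacting the CLD.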

GTP and TCP splicing

The solution we chose for functions (iv) and (v), i.e., the removal and insertion of packets into the GTP tunnel and the TCP connection, is based on splicing.

TCP splicing is a technique for enhancing L7 proxy performance, which allows the proxy to forward TCP segments received from one endpoint to the other, avoiding application layer segment processing (COHEN; RANGARAJAN; SLYE, 1999). In order for splicing to work, connection-specific information contained in the TCP headers needs to be translated, e.g., the acknowledgement and sequence numbers, and consequently the TCP checksum. It is important to note that our use of splicing is different from that of regular proxies. Proxies intercept the SYN segment sent by the client and send a new SYN segment to the server to establish a connection (COHEN; RANGARAJAN; SLYE, 1999), as implemented in TCPSP (footnote 4).
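The header translation mentioned above reduces to adding fixed offsets modulo 2^32 once the two connections are joined. The sketch below is an illustration under assumptions: the class `SpliceOffsets` and its method names are hypothetical, the offsets are taken from a snapshot of both connections at the moment splicing starts, and the TCP checksum update that must follow each rewrite is not shown.

```python
from dataclasses import dataclass
from typing import Tuple

MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class SpliceOffsets:
    seq_delta: int  # cache-side sequence space -> client-side sequence space
    ack_delta: int  # cache-side ack space -> client-side ack space

    @classmethod
    def from_snapshot(cls, client_expects_seq: int, cache_next_seq: int,
                      client_next_seq: int, module_next_seq: int) -> "SpliceOffsets":
        """Compute both offsets from the state of the two connections at splice time."""
        return cls((client_expects_seq - cache_next_seq) % MOD,
                   (client_next_seq - module_next_seq) % MOD)

    def cache_to_client(self, seq: int, ack: int) -> Tuple[int, int]:
        """Rewrite SEQ/ACK of a segment travelling from the cache to the client."""
        return (seq + self.seq_delta) % MOD, (ack + self.ack_delta) % MOD

    def client_to_cache(self, seq: int, ack: int) -> Tuple[int, int]:
        """Rewrite SEQ/ACK of a segment travelling from the client to the cache."""
        return (seq - self.ack_delta) % MOD, (ack - self.seq_delta) % MOD
```

Keeping only two integers per spliced connection is what makes the translation cheap; the modulo arithmetic handles the case where one sequence space wraps around while the other does not.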

Unlike a regular proxy, we do not splice all TCP connections traversing the switch, but only connections that are requesting content cached by local caches. Therefore, the SYN segment sent by the client remains unchanged. If a request for cached content is identified in an already established TCP connection, the segment containing the request is redirected to the local cache through splicing. The response sent by the cache is then spliced into the connection between the client and the server. A similar solution is used for the GTP tunnel. Notice that functions (iv) and (v) require per-flow state to be kept.

4.4 Prototyping and Experimental evaluation

In this section we present a prototype-based evaluation of the transparent SDN-based in-network caching solution. In order to evaluate the feasibility of the proposed architecture we implemented a proof-of-concept prototype. Due to the lack of DPI support in OpenFlow, we chose to implement the solution based on a centralized controller. This solution allows us to assess the worst-case impact of the proposed architecture on the download performance of DASH clients, not considering scalability.

4.4.1 Prototype Implementation

Our prototype implementation is based on Open vSwitch v2.0.2 (PFAFF et al., 2015) and Floodlight v1.1 (footnote 5) running on Ubuntu 14.04, and is a module in the Floodlight controller. We refer to it as the Transparent Caching (TC) module. The TC module performs GTP dissection and payload inspection to detect cacheability (search for an HTTP GET method for locally cached content), acts as a local CLD, and does GTP and TCP splicing, i.e., it implements functions (i)-(v).

4 http://www.linux-vs.org/software/tcpsp/index.html
5 http://www.projectfloodlight.org/floodlight/

Figure 4.4: Experimental testbed topology.

TCP Splicing: To implement TCP splicing, upon identifying a request for cached content, we buffer the segment corresponding to the GET method, and we initiate an active open from the module to the local cache by sending a SYN segment. We add the newly initiated connection to the list of spliced connections, and we set its state to Sync, which represents the beginning and the end of a spliced connection. While in this state, if a SYN-ACK is received from the local cache, the module acknowledges its receipt and then sends the buffered GET method to the local cache. Once the local cache acknowledges the segment of the GET method with a TCP ACK, the state of the connection is set to Connected and data starts to be spliced. Since after this point everything is spliced, flow control, congestion control and error control are taken care of by the two sessions’ endpoints (the DASH client and the cache server). To reduce processing time and memory consumption, we keep track of the difference between the SEQ and ACK numbers of the two TCP connections. We also store some static information, such as network and hardware addresses for the two end points, and TCP source and destination ports. While the connection is in the Connected state, the only TCP segment that needs to be evaluated is the FIN segment. If either the DASH client or the local cache sends a FIN segment, the state of the connection is changed to Sync, and then to Disconnecting and Disconnected after the FIN-ACK and ACK messages are sent. When a connection is considered Disconnected, the module removes it from the list of spliced connections.
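The connection life cycle described above can be read as a small state machine. The sketch below is one possible interpretation, not the module's actual code: the event names and the `SplicedConnections` class are hypothetical, and the teardown order (Connected, Sync, Disconnecting, Disconnected) is an assumption drawn from the description of Sync as both the beginning and the end of a spliced connection.

```python
from enum import Enum, auto
from typing import Dict, Hashable, Tuple


class State(Enum):
    SYNC = auto()
    CONNECTED = auto()
    DISCONNECTING = auto()
    DISCONNECTED = auto()


class Event(Enum):
    SYN_ACK_FROM_CACHE = auto()   # module ACKs it and sends the buffered GET
    GET_ACKED_BY_CACHE = auto()   # cache acknowledged the GET segment
    FIN = auto()                  # sent by either the DASH client or the cache
    FIN_ACK = auto()
    ACK = auto()


# Assumed transition table; any (state, event) pair not listed leaves the state unchanged.
TRANSITIONS: Dict[Tuple[State, Event], State] = {
    (State.SYNC, Event.SYN_ACK_FROM_CACHE): State.SYNC,
    (State.SYNC, Event.GET_ACKED_BY_CACHE): State.CONNECTED,
    (State.CONNECTED, Event.FIN): State.SYNC,
    (State.SYNC, Event.FIN_ACK): State.DISCONNECTING,
    (State.DISCONNECTING, Event.ACK): State.DISCONNECTED,
}


class SplicedConnections:
    """List of spliced connections, keyed by any hashable connection identifier."""

    def __init__(self) -> None:
        self._states: Dict[Hashable, State] = {}

    def open(self, conn_id: Hashable) -> None:
        # Called when the module sends its SYN to the local cache.
        self._states[conn_id] = State.SYNC

    def on_event(self, conn_id: Hashable, event: Event) -> State:
        current = self._states[conn_id]
        new_state = TRANSITIONS.get((current, event), current)
        if new_state is State.DISCONNECTED:
            del self._states[conn_id]  # removed from the list once disconnected
        else:
            self._states[conn_id] = new_state
        return new_state

    def __contains__(self, conn_id: Hashable) -> bool:
        return conn_id in self._states
```

Data segments seen in the Connected state do not appear as events here, which matches the observation that only the FIN segment needs to be evaluated once splicing is active.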


GTP support: In order to enable GTP in Floodlight we developed two subclasses of the BasePacket class found in net.floodlightcontroller.packet, one for GTP-C (GTP Control plane, used for signaling) and the other for GTP-U (GTP User plane, used to transport data packets) packets. Also, we developed header representations for GTPv1 and for GTPv2 packets. In the testbed, we use GTPv1. The implemented subclasses allow us to handle GTP packets in the developed Floodlight module, i.e., remove and insert packets from/into the GTP tunnel. Removal and insertion of packets require storing the context of each new client that opens a GTP tunnel, which consists of the MAC addresses, IP addresses and GTP information, such as flags and the Tunnel Endpoint Identifier (TEID) value. Therefore, to recreate the GTP stack one must recover the GTP information regarding the specific session being spliced and update the Sequence number.
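To make the removal and insertion of packets concrete, the following sketch encodes and decodes a GTPv1-U G-PDU header with the standard `struct` module. It is not the Floodlight (Java) implementation; the function names are hypothetical, and the sketch assumes the common case of no extension headers.

```python
import struct
from typing import NamedTuple, Optional

G_PDU = 0xFF          # GTPv1 message type that carries a user packet
FLAGS_V1_GTP = 0x30   # version = 1, protocol type = GTP
FLAG_E, FLAG_S, FLAG_PN = 0x04, 0x02, 0x01


class GtpU(NamedTuple):
    teid: int
    seq: Optional[int]
    payload: bytes


def encapsulate(teid: int, payload: bytes, seq: Optional[int] = None) -> bytes:
    """Insert a user packet into the tunnel identified by `teid`."""
    # When any of E/S/PN is set, a 4-byte optional part follows the TEID.
    optional = b"" if seq is None else struct.pack("!HBB", seq & 0xFFFF, 0, 0)
    flags = FLAGS_V1_GTP | (FLAG_S if seq is not None else 0)
    # The length field counts everything after the mandatory 8-byte header.
    header = struct.pack("!BBHI", flags, G_PDU, len(optional) + len(payload), teid)
    return header + optional + payload


def decapsulate(packet: bytes) -> GtpU:
    """Remove a user packet from the tunnel, returning TEID, sequence number and payload."""
    flags, msg_type, length, teid = struct.unpack_from("!BBHI", packet, 0)
    if flags >> 5 != 1 or msg_type != G_PDU:
        raise ValueError("not a GTPv1 G-PDU")
    offset, seq = 8, None
    if flags & (FLAG_E | FLAG_S | FLAG_PN):
        seq_field, _npdu, next_ext = struct.unpack_from("!HBB", packet, 8)
        if flags & FLAG_E and next_ext != 0:
            raise ValueError("extension headers are not handled in this sketch")
        seq = seq_field if flags & FLAG_S else None
        offset = 12
    return GtpU(teid, seq, packet[offset:8 + length])
```

A splicing module would store the TEID and flags per client context, call `decapsulate` on the request travelling towards the origin, and call `encapsulate` with the stored TEID and an updated sequence number on the response coming from the cache.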

4.4.2 Experiment Methodology

The topology of our experimental platform is shown in Figure 4.4. In the considered topology a GTPv1 tunnel is established between a DASH Client (DC) and the Origin server (OSVR), traversing an OVS switch. The GTP tunnel is created using the ggsn and the sgsnemu tools in OpenGGSN (footnote 6). The DC establishes TCP connections over the GTP tunnel to the OSVR in order to download DASH segments. When the DC requests a segment over the TCP connection, the Transparent Caching (TC) module detects the request, and if the content is stored by the cache server (CSVR) then the module establishes a TCP connection between itself and the CSVR. Once the TCP connection is established, the TC module starts to splice the TCP connection and the GTP tunnel between the DC and the OSVR, i.e., it removes and reinserts traffic from/into the GTP tunnel between the DC and the OSVR. Figure 4.5 illustrates TC’s operation.

In order to evaluate the impact of the TC module and of GTP, we considered four different scenarios. The first one is a baseline scenario, in which GTP tunnels are not used, and Floodlight is equipped with a dummy module that takes packets as input and puts them unmodified as its output (’dummy’). In the second scenario GTP tunneling is used and Floodlight uses the dummy module (’GTP’). In the third scenario GTP tunneling is not used and Floodlight is equipped with the TC module that performs all functions but GTP splicing (’TC’). In the fourth scenario GTP tunneling is used and Floodlight is equipped with the TC module that performs TCP and GTP splicing (’GTP+TC’).

4.4.3 Throughput Performance

In the first set of experiments we consider a single DC that downloads DASH segments to evaluate the impact of the TC module on the achievable download rate. We used DASH video segments (footnote 7) of lengths between 1s and 15s worth of video content and bitrates between 150 and 8000kbps. For each scenario, we used Apache ab (footnote 8) as the DC, and downloaded 7293 video segments twice, consecutively.

6 http://openbsc.osmocom.org/trac/
7 http://www-itec.uni-klu.ac.at/dash/

Figure 4.5: Flow diagram illustrating TC’s operation to transparently redirect and splice content between DC and CSVR.

Figure 4.6: Box plot of bitrates for 15s length video segments.


Figure 4.6 shows the box plot of the achieved bitrates for the four scenarios in which the DC downloads segments of length 15s. We observe a decrease of about 30% in download rate when comparing ’GTP+TC’ to the ’dummy’ scenario, which is due to TCP and GTP splicing, mainly the delay introduced by the initial TCP connection establishment from the module to the CSVR. However, the performance decrease is modest considering the complexity of GTP and TCP splicing.

8 https://httpd.apache.org/docs/2.2/programs/ab.html

Figure 4.7: CDF of bitrate for 1s-15s segments for GTP+TC scenario.

Figure 4.7 shows the CDF of the achieved bitrates for the ’GTP+TC’ scenario for DASH segment lengths of 1s to 15s. The figure shows that the download rate achieved for shorter segments is lower, which is partly due to TCP slow start and partly due to the delay introduced by the connection establishment needed for TCP splicing.

To gain more insight into the overhead introduced by the TC module we measured the latency the module adds to packet processing. For both the GTP+TC and the TC scenarios we found that the delay was less than 5ms for 99% of the packets. This delay is negligible compared to the typical round trip time of around 50ms experienced in LTE networks.

4.4.4 DASH Streaming Performance

In the second set of experiments we consider multiple DCs simultaneously streaming DASH content to evaluate the impact of the TC module on the achieved streaming rate, and the impact of the module on the rate selection algorithm of the DCs, two factors that determine the user perceived QoE. We use 6s length DASH segments and bitrates between 150 and 8000kbps. To perform this set of experiments, we implemented an instrumented DASH client that uses the bitrate calculation and selection algorithm of DASH-js (footnote 9). In order to compensate for Floodlight’s multi-threaded packet processing, which can result in out-of-order packet deliveries, we increased the TCP receive buffers to allow for segment reordering.

Figure 4.8: Segment bitrate selection frequency for three scenarios and n = 4,32,64,128 simultaneous DASH clients. Bitrates in the legend in kbps. [x-axis: one group per combination of n (4, 32, 64, 128) and scenario (dummy, GTP+TC, TC); y-axis: frequency of rate selected, from 0 to 1; legend: 8000, 5000, 2500, 1200, 700, 500, 150.]

Figure 4.8 shows the bitrate selection of the DASH clients for the ’dummy’, ’TC’ and ’GTP+TC’ scenarios for n = 4,32,64,128 simultaneous DCs. For n = 4 the highest bitrate (8000kbps) is chosen almost exclusively in all three scenarios, hence the TC module has no significant impact on the rate selection algorithm. For 32 or more simultaneous clients we observe that ’TC’ and ’GTP+TC’ typically result in a bitrate one level lower being chosen most frequently (e.g., 700kbps instead of 1200kbps for n = 64).

To gain insight into the rate changes of the DASH clients, Figure 4.9 shows the autocorrelation function (ACF) of the download rates achieved by the DCs for the same scenarios and for n = 2,16,128 simultaneous DCs, as a function of the lag in terms of segments. The ACF shows an exponential decay for all scenarios and number of clients, which shows that the bitrates are stationary both with and without the TC module.
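For reference, the sample ACF of a series of per-segment download rates can be computed as below. This is a generic textbook estimator written as a sketch, not necessarily the exact procedure used to produce Figure 4.9; the function name `acf` and the input values in the usage note are illustrative.

```python
from typing import List, Sequence


def acf(rates: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation of `rates` for lags 0..max_lag (lag measured in segments)."""
    n = len(rates)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(rates)")
    mean = sum(rates) / n
    # Lag-0 sum of squared deviations, used to normalise so that acf[0] == 1.
    c0 = sum((r - mean) ** 2 for r in rates)
    if c0 == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [
        sum((rates[t] - mean) * (rates[t + k] - mean) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]
```

Calling `acf(download_rates, 20)` on the rates of consecutive segments of one client gives the values plotted against the lag; a curve that falls off quickly towards zero indicates that a rate change has little influence on the rates of segments far ahead.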

The experimental results show that the proposed solution slightly decreases the streaming performance. Note, however, that due to the lack of support in the available switch implementations we implemented all functionality in the controller, and the performance of the proposed switch-based solution would be better. Furthermore, in our evaluation scenario the OSVR is located as close to the DCs as the CSVR. In practice, the round trip time to the OSVR may be significantly higher than to the CSVR, which again would favor the proposed solution.

9 https://github.com/dazedsheep/DASH-JS

Figure 4.9: Autocorrelation function (ACF) of segment download rates for three scenarios and n = 4,32,64,128 simultaneous DASH clients.

4.5 Related work

Previous work has considered SDN-assisted caching and caching inside the mobile backhaul.

Several works propose the use of SDN in the mobile backhaul, motivated by the ability for traffic offloading, multi-vendor environments, support for a higher rate of innovation and legacy support, resulting in Operating Expense (OPEX) and Capital Expenditure (CAPEX) improvements (TOMOVIC; PEJANOVIC-DJURISIC; RADUSINOVIC, 2014; BRIEF, 2013). Authors in (TOMOVIC; PEJANOVIC-DJURISIC; RADUSINOVIC, 2014; HAMPEL; STEINER; BU, 2013) propose to replace LTE network entities by SDN controllers. A more evolutionary approach is taken in (COSTA-REQUENA et al., 2014; HEINONEN et al., 2014), where the current mobile network architecture is assisted by SDN controllers.

Several recent works have shown the benefits of in-network caching for mobile networks (RAMANAN et al., 2013; CAROFIGLIO et al., 2015; WANG et al., 2014). While these works show the potential benefits of Radio Access Network (RAN) level caching, they do not consider the implementation of dynamic caching and the impact of GTP tunneling between the SGSN/SGW and the GGSN/PGW, which is the focus of our work.

An SDN-based solution to provide cache-as-a-service was proposed in (GEORGOPOULOS et al., 2014). The proposed solution consists of two entities, a coordinator and a cache node. The coordinator receives and redirects requests according to information previously received from Content Providers about the cache contents. The proposed architecture does not consider a mobile network scenario and does not implement real-time cacheable content detection. SDN-based transparent caching at the RAN level was proposed in (KIMMERLIN; COSTA-REQUENA; MANNER, 2014). The proposal relies on DPI to identify requests for cacheable content, and if a request for cacheable content is identified, a content-dependent IPv6 address is assigned to the request. The prefix is later used to forward the request to the appropriate cache, without the need for further inspection. Authors in (HAW; HONG; LEE, 2014) propose the joint deployment of an MME within an SDN controller to support content delivery through Content Centric Networks (CCN). The solution proposes to use an IPv6 header extension that lets the CCN Gateway, placed after the PGW, identify CCN packets and redirect requests to the appropriate cache. Also in the context of CCN, (SALSANO et al., 2013) proposes an SDN-based extension of the CONET Information Centric Network (ICN) architecture. The CONET ICN architecture is composed of ICN Clients, ICN Servers and a Name Routing System (NRS). Packet forwarding is done based on content name in a "Lookup-and-Cache" fashion where each forwarding node contacts the NRS for routing information. Content is identified using tags within an ICN/OpenFlow domain, which are later removed by edge nodes whenever traffic is leaving the ICN domain. The proposed solution was implemented on the OFELIA testbed using OpenFlow v1.0.

Unlike all previous works, we propose a scalable solution for SDN-enabled dynamic caching that is compatible with current LTE backhaul technology, and supports dynamic request redirection to caches based on locally managed lookup tables that can proactively be populated based on meta information such as the MPEG DASH MPD for minimizing the control traffic.

4.6 Conclusion

In this Chapter we investigated the feasibility of introducing SDN in the current mobile backhaul architecture through the use-case of transparent in-network caching. We analyzed the design space for the introduction of SDN in the mobile backhaul for in-network caching, and proposed a scalable solution compatible with the existing backhaul architecture. We made a controller-based prototype implementation of the proposed solution that demonstrates that the overhead of the network functions needed in order to enable in-network caching in the mobile backhaul through SDN switches (and controllers) is negligible. Furthermore, our results show that the proposed solution has a minor impact on the streaming clients’ behaviour.

5 Conclusion

In the end? Nothing ends, Adrian. Nothing ever ends.

—DR. MANHATTAN (ALAN MOORE)

In this thesis, we presented strategies, tools and models towards improving the efficiency of multimedia content delivery in CDNs. This last chapter presents the main contributions of this thesis (Section 5.1) and outlines some future work (Section 5.2).

5.1 Contributions of this Thesis

As stated in Chapter 1, the primary goal of this Thesis was to design and evaluate solutions to improve CDNs’ multimedia content delivery efficiency. The solutions presented cover recent changes in the content delivery scenario, such as the ever growing number of small and localized CDNs and new technologies, e.g., Software Defined Networks (SDN).

In Chapter 2, we presented a new Replica Placement Algorithm (RPA) called FlowCount, a greedy RPA based on the number of flows passing through the nodes that compose the network. We used the tool presented in Appendix A to evaluate the new strategy and compare it to other well-known RPAs, showing that it maintains Quality of Experience (QoE) while decreasing network resource usage.

Chapter 3 presented an analytical model that represents the collaboration between CDNs. The model represented all fundamental entities involved in a multiple CDN collaboration scenario and considered two different redirection strategies, namely Recursive and Interactive Request Routing. Using the proposed model we show that collaboration between CDNs presents potential benefits in terms of QoE.

Chapter 4 offered a solution to enhance the mobile backhaul’s flexibility, enabling better positioning of Replica Servers (RS) inside the backhaul. The solution is based on a user-space extension to OpenFlow switches. We show the benefits of network devices’ programmability by designing and prototyping a transparent cache service. Our experimental results show that the delay introduced by the developed module is less than 5ms for 99% of the packets, which is negligible in today’s LTE networks.

It is also worth mentioning that during the course of this Thesis we worked towards the development of the P2PCDNSim simulator. We present the simulator in Appendix A along with a description of Content Delivery Networks (CDN) and their challenges. The goal of developing the simulator was to assist researchers in the process of proposing, developing and evaluating strategies to enhance CDNs.

Table 5.1: Scientific papers produced related to this Thesis.

# | Reference | Chapter | Status
1 | Rodrigues, M., Moreira, A., Azevedo, E., Neves, M., Sadok, D., Callado, A., Moreira, J. and Souza, V., 2013, April. On learning how to plan content delivery networks. In Proceedings of the 46th Annual Simulation Symposium (p. 13). Society for Computer Simulation International. | Appendix A | Published
2 | Rodrigues, M., Moreira, A., Neves, M., Azevêdo, E., Sadok, D., Callado, A. and Souza, V., 2013, April. Optimizing cross traffic with an adaptive CDN replica placement strategy. In Proceedings of the 46th Annual Simulation Symposium (p. 14). Society for Computer Simulation International. | Chapter 2 | Published
3 | Rodrigues, M., Moreira, A., Neves, M., Azevêdo, E., Sadok, D., Callado, A. and Souza, V., 2013, May. Flow count: A CDN dynamic Replica Placement Algorithm for cross traffic optimization. In Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on (pp. 684-687). | Chapter 2 | Published
4 | Rodrigues, M., Fernandes, S., Kelner, J. and Sadok, D., 2014, May. An Analytical View of Multiple CDNs Collaboration. In Advanced Information Networking and Applications (AINA), 2014 IEEE 28th International Conference on (pp. 25-32). IEEE. | Chapter 3 | Published
5 | Rodrigues, M., Fernandes, S., Kelner, J. and Sadok, D., Uma Visão Analítica da Colaboração entre Múltiplas CDNs. WPERFORMANCE – XII Workshop em Desempenho de Sistemas Computacionais e de Comunicação, 2013. | Chapter 3 | Published
6 | Rodrigues, M., Dán, G., Gallo, M., Enabling Transparent Caching in LTE Mobile Backhaul Networks with SDN. IEEE INFOCOM’s International Workshop on Software-Driven Flexible and Agile Networking (Swfan), 2016. | Chapter 4 | Accepted

Some results presented in this Doctoral Thesis were developed in cooperation with the MCP - A Multi-CDN Peering Platform and the IPTV - Optimizing Video Content Distribution research projects, carried out by the GPRT (Grupo de Pesquisa em Redes e Telecomunicações) and funded by Ericsson Telecomunicações S.A., Brazil.

Table 5.1 shows the papers produced in the main area and scope of this Doctoral Thesis, including the papers already accepted and the submitted papers that are under revision, and Table 5.2 shows other publications accepted during the production of the Doctoral Thesis.

Table 5.2: Other publications

# | Reference
1 | Neves, M., Rodrigues, M., Azevêdo, E., Sadok, D., Callado, A., Moreira, J. and Souza, V., 2013, May. Selecting the most suited cache strategy for specific streaming media workloads. In Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on (pp. 792-795). IEEE.
2 | Takako Endo, P., Batista, M.S., Goncalves, G.E., Rodrigues, M., Sadok, D., Kelner, J., Sefidcon, A. and Wuhib, F., 2013, December. Self-organizing strategies for resource management in Cloud Computing: State-of-the-art and challenges. In Cloud Computing and Communications (LatinCloud), 2nd IEEE Latin American Conference on (pp. 13-18).
3 | Endo, P., Santos, M., Vitalino, J., Gonçalves, G., Rodrigues, M., Sadok, D.F., Kelner, J. and Sefidcon, A., 2014. Self-management of Live Streaming Application in Distributed Cloud Infrastructure. In Adaptive Resource Management and Scheduling for Cloud Computing (pp. 165-179). Springer International Publishing.
4 | Palhares, A., Santos, M., Endo, P., Vitalino, J., Rodrigues, M., Goncalves, G., Sadok, D., Sefidcon, A. and Wuhib, F., 2014, May. Joint Allocation of Nodes and Links with Load Balancing in Network Virtualization. In Advanced Information Networking and Applications (AINA), 2014 IEEE 28th International Conference on (pp. 148-155).
5 | Santos, M., Endo, P., Bezerra, M., Gonçalves, G., Sadok, D. and Fernandes, S., Revisitando uma Infraestrutura Autonômica: Uma Perspectiva Baseada em uma Rede Definida por Software. In: 4o Workshop de Sistemas Distribuídos Autonômicos - WoSiDA 2014, 2014, Florianópolis.
6 | Moreira, A., Rodrigues, M., Azevedo, E., Sadok, D., Callado, A. and Souza, V., 2015, May. Analyzing strategies to effectively detect changes in content delivery networks. In Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on (pp. 693-699).

5.2 Future Works

Content Delivery Networks (CDN) prove to be the alternative to overcome the challenges imposed by increasing traffic demands and the best-effort nature of the network. Despite being based on a simple concept, bringing content closer to end users, CDNs are complex systems that involve several decisions to enable content delivery. New technologies bring new possibilities to enhance CDNs’ efficiency, opening more and more opportunities for new strategies.

As possible future work we suggest extending P2PCDNSim’s network capabilities to cover more transport protocols and, therefore, be able to simulate more complex scenarios. Future work on the FlowCount RPA could not only place replicas but also decide how many replicas should be placed according to the current network status (i.e., the number of requests and the network topology). Furthermore, it is currently known that clients are never redirected to origin servers; in Chapter 3 we considered this redirection as an approximation of a cache miss event. Therefore, as further future work we propose an extension of the Multiple CDN collaboration model for a more detailed treatment of the cache miss. Finally, our research group received an enormous amount of real trace data and logs from a significant content provider in Brazil. We are collecting our first results regarding client behavior and workload characteristics. Our plan is to use information from those traces to conduct additional experiments and also reevaluate the strategies proposed.


References

ADLER, S. The Slashdot effect: an analysis of three internet publications. Linux Gazette, [S.l.], v.38, p.2, 1999.

AGGARWAL, V.; AKONJANG, O.; FELDMANN, A. Improving user and ISP experience through ISP-aided P2P locality. In: INFOCOM WORKSHOPS 2008, IEEE. Proceedings. . . [S.l.: s.n.], 2008. p.1–6.

AKAMAI. Facts & Figures. Accessed: 2016-01-15, 2013. Available at: <http://www.akamai.com/html/about/facts_figures.html>. Visited on: 01 jan. 2016.

ALLMAN, M.; PAXSON, V.; BLANTON, E. TCP congestion control. [S.l.: s.n.], 2009.

BARTAL, Y. Probabilistic approximation of metric spaces and its algorithmic applications. In: ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 1996., 37. Proceedings. . . [S.l.: s.n.], 1996. p.184–193.

BASTUG, E.; BENNIS, M.; DEBBAH, M. Living on the edge: the role of proactive caching in 5G wireless networks. IEEE Comm. Mag., [S.l.], 2014.

BAUDIN, F. OpenvSwitch L7 matchers & conntrack metadatas. 2014.

BERTRAND, G. et al. Use cases for CDNi. IETF Draft, Jan, [S.l.], 2012.

BILIRIS, A. et al. CDN brokering. Computer Communications, [S.l.], v.25, n.4, p.393–402, 2002.

BRANDENBURG, R. van; PETERSON, L.; DAVIE, B. Framework for Content Distribution Network Interconnection (CDNI). [S.l.]: RFC Editor, 2015. n.7336. (Request for Comments).

BRIEF, O. S. OpenFlow™-Enabled Mobile and Wireless Networks. 2013.

BROBERG, J.; BUYYA, R.; TARI, Z. MetaCDN: harnessing ‘storage clouds’ for high performance content delivery. Journal of Network and Computer Applications, [S.l.], v.32, n.5, p.1012–1022, 2009.

BUSARI, M.; WILLIAMSON, C. ProWGen: a synthetic workload generation tool for simulation evaluation of web proxy caches. Computer Networks, [S.l.], v.38, n.6, p.779–794, 2002.

BUYYA, R. et al. A case for peering of content delivery networks. Distributed Systems Online, IEEE, [S.l.], v.7, n.10, p.3–3, 2006.

BUYYA, R.; PATHAN, M.; VAKALI, A. Content delivery networks. [S.l.]: Springer Science & Business Media, 2008. v.9.

CAROFIGLIO, G. et al. Scalable mobile backhauling via information-centric networking. In: IEEE LANMAN. Proceedings. . . [S.l.: s.n.], 2015.

CATHERINE, M. R.; EDWIN, E. B. A survey on recent trends in cloud computing and its application for multimedia. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), [S.l.], v.2, n.1, p.pp–304, 2013.

CDNSIM tutorial. Accessed: 2016-01-21, 2009. Available at: <http://oswinds.csd.auth.gr/cdnsim/docs/tut1/index.html>. Visited on: 01 jan. 2016.

CHANDRA, A.; WEISSMAN, J. Nebulas: using distributed voluntary resources to build clouds. In: HOT TOPICS IN CLOUD COMPUTING, 2009. Proceedings. . . [S.l.: s.n.], 2009. p.2–2.

CHANG, D. et al. How to realize CDN Interconnection (CDNI) over OpenFlow? In: INTERNATIONAL CONFERENCE ON FUTURE INTERNET TECHNOLOGIES, 7. Proceedings. . . [S.l.: s.n.], 2012. p.29–30.

CHEN, Y.; KATZ, R. H.; KUBIATOWICZ, J. D. Dynamic replica placement for scalable content delivery. In: Peer-to-peer systems. [S.l.]: Springer, 2002. p.306–318.

CHENYU, P. et al. FCAN: flash crowds alleviation network using adaptive p2p overlay of cache proxies. IEICE transactions on communications, [S.l.], v.89, n.4, p.1119–1126, 2006.

CHIEF, I. B. Report from the Federal Communications Commission. [S.l.: s.n.], 2015. Accessed: 2016-01-15.

CHOFFNES, D. R.; BUSTAMANTE, F. E. Taming the torrent: a practical approach to reducing cross-isp traffic in peer-to-peer systems. In: ACM SIGCOMM COMPUTER COMMUNICATION REVIEW. Proceedings. . . [S.l.: s.n.], 2008. v.38, n.4, p.363–374.

CISCO, C. V. N. I. Global Mobile Data Traffic Forecast Update. 2014–2019 (white paper). [S.l.: s.n.], 2015.

COHEN, A.; RANGARAJAN, S.; SLYE, H. On the Performance of TCP Splicing for URL-aware Redirection. In: USENIX USITS. Proceedings. . . [S.l.: s.n.], 1999.

CONVIVA. Conviva: viewer experience report. Accessed: 2016-01-15, 2015. Available at: <http://www.conviva.com/conviva-viewer-experience-report/vxr-2015/>. Visited on: 01 jan. 2016.

COSTA-REQUENA, J. et al. SDN optimized caching in LTE mobile networks. In: IEEE ICTC. Proceedings. . . [S.l.: s.n.], 2014.

DAS, S.; KANGASHARJU, J. Evaluation of network impact of content distribution mechanisms. In: SCALABLE INFORMATION SYSTEMS, 1. Proceedings. . . [S.l.: s.n.], 2006. p.35.

DAY, M. et al. A model for content internetworking (CDI). Work in Progress, [S.l.], 2003.

DHINGRA, A.; SACHDEVA, M. Recent Flash Events: a study. Accessed: 2016-01-19, 2014. Available at: <http://sbsstc.ac.in/icccs2014/Papers/Paper20.pdf>. Visited on: 01 jan. 2016.

DILLON, T.; WU, C.; CHANG, E. Cloud computing: issues and challenges. In: ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA), 2010 24TH IEEE INTERNATIONAL CONFERENCE ON. Proceedings. . . [S.l.: s.n.], 2010. p.27–33.

ECONOMOU, G. How Akamai maps the net: an industry perspective, 2010. 2010.

FORECAST, C. Cisco Visual Networking Index: forecast and methodology 2013-2018. Cisco Public Information, [S.l.], 2014.

FORTINO, G.; RUSSO, W. Using P2P, GRID and Agent technologies for the development of content distribution networks. Future Generation Computer Systems, [S.l.], v.24, n.3, p.180–190, 2008.

FRANK, B. et al. Collaboration Opportunities for Content Delivery and Network Infrastructures. Recent Advances in Networking, [S.l.], p.305–377, 2013.

FREEDMAN, M. J.; FREUDENTHAL, E.; MAZIERES, D. Democratizing Content Publication with Coral. In: NSDI. Proceedings. . . [S.l.: s.n.], 2004. v.4, p.18–18.

GADDE, S.; CHASE, J.; RABINOVICH, M. Web caching and content distribution: a view from the interior. Computer Communications, [S.l.], v.24, n.2, p.222–231, 2001.

GEORGOPOULOS, P. et al. Cache as a service: leveraging sdn to efficiently and transparently support video-on-demand on the last mile. In: ICCCN. Proceedings. . . [S.l.: s.n.], 2014.

GOLDENBERG, D. K. et al. Optimizing cost and performance for multihoming. In: ACM SIGCOMM COMPUTER COMMUNICATION REVIEW. Proceedings. . . [S.l.: s.n.], 2004. v.34, n.4, p.79–92.

GONÇALVES, G. et al. D-CRAS: distributed cloud resource allocation system. In: IEEE NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, 2012. Proceedings. . . [S.l.: s.n.], 2012.

HAMPEL, G.; STEINER, M.; BU, T. Applying Software-Defined Networking to the Telecom Domain. In: GLOBAL INTERNET SYMPOSIUM. Proceedings. . . [S.l.: s.n.], 2013.

HAW, R.; HONG, C. S.; LEE, S. An Efficient Content Delivery Framework for SDN Based LTE Network. In: ACM ICUIMC. Proceedings. . . [S.l.: s.n.], 2014.

HEINONEN, J. et al. Dynamic Tunnel Switching for SDN-based Cellular Core Networks. In: ACM ALLTHINGSCELLULAR. Proceedings. . . [S.l.: s.n.], 2014.

IDC. Smartphone Outlook Remains Strong for 2014, Up 23.8According to IDC. Accessed: 2016-01-15, 2014. Available at: <http://www.idc.com/getdoc.jsp?containerId=prUS25058714>. Visited on: 2016-01-15.

IMBRENDA, C.; MUSCARIELLO, L.; ROSSI, D. Analyzing cacheability in the access network with HACkSAw. In: ACM ICN. Proceedings. . . [S.l.: s.n.], 2014.

JAMIN, S. et al. Constrained mirror placement on the Internet. In: INFOCOM 2001. TWENTIETH ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES. PROCEEDINGS. IEEE. Proceedings. . . [S.l.: s.n.], 2001. v.1, p.31–40.

JEONG, J. et al. Network traffic reduction through smart network. In: INFORMATION NETWORKING (ICOIN), 2013 INTERNATIONAL CONFERENCE ON. Proceedings. . . [S.l.: s.n.], 2013. p.686–689.

JESUS, V.; AGUIAR, R. L. Figures of merit for the placement (in) efficiency of interconnected CDNs. In: COMPUTERS AND COMMUNICATIONS (ISCC), 2012 IEEE SYMPOSIUM ON. Proceedings. . . [S.l.: s.n.], 2012. p.000277–000282.

JOHANSSON, N.; LÖFGREN, A. Designing for Extensibility: an action research study of maximizing extensibility by means of design principles. [S.l.]: University of Gothenburg, Department of Applied Information Technology, 2009.

JOHNSON, K. L. et al. The measured performance of content distribution networks. Computer Communications, [S.l.], v.24, n.2, p.202–206, 2001.

KANGASHARJU, J.; ROBERTS, J.; ROSS, K. W. Object replication strategies in content distribution networks. Computer Communications, [S.l.], v.25, n.4, p.376–383, 2002.

KANGASHARJU, J.; ROSS, K. W.; ROBERTS, J. W. Performance evaluation of redirection schemes in content distribution networks. Computer Communications, [S.l.], v.24, n.2, p.207–214, 2001.

KARLSSON, M. et al. Do we need replica placement algorithms in content delivery networks. In: WCW), 7. Proceedings. . . [S.l.: s.n.], 2002.

KATSAROS, D. et al. CDNs Content Outsourcing via Generalized Communities. Knowledge and Data Engineering, IEEE Transactions on, [S.l.], v.21, n.1, p.137–151, Jan 2009.

KHALAJI, F. K.; ANALOUI, M. Hybrid CDN-P2P architecture: replica content placement algorithms. In: INFORMATION AND KNOWLEDGE TECHNOLOGY (IKT), 2013 5TH CONFERENCE ON. Proceedings. . . [S.l.: s.n.], 2013. p.7–12.

KHAN, S. U. et al. Robust CDN replica placement techniques. In: IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, 2009. IPDPS 2009. Proceedings. . . [S.l.: s.n.], 2009. p.1–8.

KIMMERLIN, M.; COSTA-REQUENA, J.; MANNER, J. Caching using software-defined networking in LTE networks. In: IEEE ANTS. Proceedings. . . [S.l.: s.n.], 2014.

KRISHNAMURTHY, B.; WILLS, C.; ZHANG, Y. On the use and performance of content distribution networks. In: ACM SIGCOMM WORKSHOP ON INTERNET MEASUREMENT, 1. Proceedings. . . [S.l.: s.n.], 2001. p.169–182.

KRISHNAN, P.; RAZ, D.; SHAVITT, Y. The cache location problem. IEEE/ACM Transactions on Networking (TON), [S.l.], v.8, n.5, p.568–582, 2000.

LAM, S. S. Sliding Window Protocol and TCP Congestion Control. http://www.cs.utexas.edu/users/lam/386p/slides/TCP2014. Available at: <http://www.cs.utexas.edu/users/lam/386p/slides/TCPVisited on: 01 jan. 2016.

LE BLOND, S.; LEGOUT, A.; DABBOUS, W. Pushing bittorrent locality to the limit. Computer Networks, [S.l.], v.55, n.3, p.541–557, 2011.

LI, Y.; LIU, M. T. Optimization of performance gain in content distribution networks with server replicas. In: SYMPOSIUM ON APPLICATIONS AND THE INTERNET, 2003. Proceedings. . . [S.l.: s.n.], 2003. p.182–189.

MCKEOWN, N. et al. OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, [S.l.], v.38, n.2, p.69–74, 2008.

MEDINA, A. et al. BRITE: an approach to universal topology generation. In: MODELING, ANALYSIS AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS, 2001. PROCEEDINGS. NINTH INTERNATIONAL SYMPOSIUM ON. Proceedings. . . [S.l.: s.n.], 2001. p.346–353.

MEKKY, H. et al. Application-aware Data Plane Processing in SDN. In: ACM HOTSDN. Proceedings. . . [S.l.: s.n.], 2014.

MICHEL, K. Live Video Streaming that Can Handle Traffic Spikes - The Challenge. Accessed: 2016-01-10, 2013. Available at: <https://blogs.akamai.com/2013/01/live-video-streaming-that-can-handle-traffic-spikes-the-challenge.html>. Visited on: 01 jan. 2016.

MOLINA, B.; PALAU, C. E.; ESTEVE, M. Modeling content delivery networks and their performance. Computer Communications, [S.l.], v.27, n.15, p.1401–1411, 2004.

MOREIRA, A. AN ADAPTABLE STORAGE SLICING ALGORITHM FOR CONTENT DELIVERY NETWORKS. 2015. Thesis (Ph.D. in Computer Science), Universidade Federal de Pernambuco.

MOREIRA, A. et al. A Case for Virtualization of Content Delivery Networks. In: WINTER SIMULATION CONFERENCE. Proceedings. . . Winter Simulation Conference, 2011. p.3183–3194. (WSC ’11).

MOREIRA, A. et al. Analyzing strategies to effectively detect changes in content delivery networks. In: INTEGRATED NETWORK MANAGEMENT (IM), 2015 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON. Proceedings. . . [S.l.: s.n.], 2015a. p.693–699.

MOREIRA, A. et al. An adaptable storage slicing algorithm for content delivery networks. In: INTEGRATED NETWORK MANAGEMENT (IM), 2015 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON. Proceedings. . . [S.l.: s.n.], 2015b. p.679–685.

MPEG, I. Information technology-dynamic adaptive streaming over http (dash)-part 1: media presentation description and segment formats. ISO/IEC MPEG, Tech. Rep, [S.l.], 2014.

NEVES, M. et al. Selecting the most suited cache strategy for specific streaming media workloads. In: INTEGRATED NETWORK MANAGEMENT (IM 2013), 2013 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON. Proceedings. . . [S.l.: s.n.], 2013. p.792–795.

NICULESCU, D. et al. Implementation of a Media Aware Network Element for Content Aware Networks. [S.l.]: CTRQ, 2011.

NIVEN-JENKINS, B.; LE FAUCHEUR, F.; BITAR, N. Content distribution network interconnection (CDNI) problem statement. [S.l.: s.n.], 2012.

PALLIS, G.; VAKALI, A. Insight and Perspectives for Content Delivery Networks. Commun. ACM, New York, NY, USA, v.49, n.1, p.101–106, Jan. 2006.

PATHAN, A.-M. K.; BUYYA, R. A taxonomy and survey of content delivery networks. Grid Computing and Distributed Systems Laboratory, University of Melbourne, Technical Report, [S.l.], p.4, 2007.

PATHAN, A.-M. K. et al. An Architecture for Virtual Organization (VO)-based Effective Peering of Content Delivery Networks. In: SECOND WORKSHOP ON USE OF P2P, GRID AND AGENTS FOR THE DEVELOPMENT OF CONTENT NETWORKS, New York, NY, USA. Proceedings. . . ACM, 2007. p.29–38. (UPGRADE ’07).

PATHAN, A.-M. K. et al. An architecture for virtual organization (VO)-based effective peering of content delivery networks. In: USE OF P2P, GRID AND AGENTS FOR THE DEVELOPMENT OF CONTENT NETWORKS. Proceedings. . . [S.l.: s.n.], 2007. p.29–38.

PATHAN, M. Internetworking of Content Delivery Networks. [S.l.]: LAP Lambert Academic Publishing, 2011.

PATHAN, M.; BUYYA, R. Performance models for peering content delivery networks. In: NETWORKS, 2008. ICON 2008. 16TH IEEE INTERNATIONAL CONFERENCE ON. Proceedings. . . [S.l.: s.n.], 2008. p.1–7.

PAXSON, V.; ALLMAN, M.; TIMER, C. T. R. RFC 2988. Computing TCP’s Retransmission Timer, [S.l.], 2000.

PFAFF, B. et al. The Design and Implementation of Open vSwitch. In: USENIX NSDI. Proceedings. . . [S.l.: s.n.], 2015.

PRESTI, F. L.; BARTOLINI, N.; PETRIOLI, C. Dynamic replica placement and user request redirection in content delivery networks. In: IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2005. ICC 2005., 2005. Proceedings. . . [S.l.: s.n.], 2005. v.3, p.1495–1501.

QIU, L.; PADMANABHAN, V. N.; VOELKER, G. M. On the placement of web server replicas. In: INFOCOM 2001. TWENTIETH ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES. PROCEEDINGS. IEEE. Proceedings. . . [S.l.: s.n.], 2001. v.3, p.1587–1596.

QUARTER, A. R. T. ’State of the Internet’ Report. 2012.

RAMANAN, B. et al. Cacheability analysis of HTTP traffic in an operational LTE network. In: IN PROC. OF WTS. Proceedings. . . [S.l.: s.n.], 2013.

RODRIGUES, M.; DÁN, G.; GALLO, M. Enabling Transparent Caching in LTE Mobile Backhaul Networks with SDN. In: THE 1ST INTERNATIONAL WORKSHOP ON SOFTWARE-DRIVEN FLEXIBLE AND AGILE NETWORKING. Proceedings. . . [S.l.: s.n.], 2016.

RODRIGUES, M. et al. On Traffic Locality and QoE in Hybrid CDN-P2P Networks. In: ANNUAL SIMULATION SYMPOSIUM, 44., San Diego, CA, USA. Proceedings. . . Society for Computer Simulation International, 2011. p.175–182. (ANSS ’11).

RODRIGUES, M. et al. Uma Visão Analítica da Colaboração entre Múltiplas CDNs. In: WPERFORMANCE – XII WORKSHOP EM DESEMPENHO DE SISTEMAS COMPUTACIONAIS E DE COMUNICAÇÃO. Proceedings. . . [S.l.: s.n.], 2013.

RODRIGUES, M. et al. On Learning How to Plan Content Delivery Networks. In: ANNUAL SIMULATION SYMPOSIUM, 46., San Diego, CA, USA. Proceedings. . . Society for Computer Simulation International, 2013. p.13:1–13:8. (ANSS 13).

RODRIGUES, M. et al. Optimizing Cross Traffic with an Adaptive CDN Replica Placement Strategy. In: ANNUAL SIMULATION SYMPOSIUM, 46., San Diego, CA, USA. Proceedings. . . Society for Computer Simulation International, 2013a. p.14:1–14:8. (ANSS 13).

RODRIGUES, M. et al. Flow count: a cdn dynamic replica placement algorithm for cross traffic optimization. In: INTEGRATED NETWORK MANAGEMENT (IM 2013), 2013 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON. Proceedings. . . [S.l.: s.n.], 2013b. p.684–687.

RODRIGUES, M. et al. An Analytical View of Multiple CDNs Collaboration. In: ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA), 2014 IEEE 28TH INTERNATIONAL CONFERENCE ON. Proceedings. . . [S.l.: s.n.], 2014. p.25–32.

RUAN, B. et al. Improving locality of BitTorrent with ISP cooperation. In: INTERNATIONAL CONFERENCE ON ELECTRONIC COMPUTER TECHNOLOGY, 2009. Proceedings. . . [S.l.: s.n.], 2009. p.443–447.

SALSANO, S. et al. Information centric networking over SDN and OpenFlow: Architectural aspects and experiments on the OFELIA testbed. Computer Networks, [S.l.], 2013.

SARGENT, R. G. Verification and validation of simulation models. In: WINTER SIMULATION, 37. Proceedings. . . [S.l.: s.n.], 2005. p.130–143.

SATSIOU, A.; PATERAKIS, M. Efficient caching of video content to an architecture of proxies according to a frequency-based cache management policy. In: ADVANCED ARCHITECTURES AND ALGORITHMS FOR INTERNET DELIVERY AND APPLICATIONS, 2. Proceedings. . . [S.l.: s.n.], 2006. p.9.

SAYAL, M. et al. Selection algorithms for replicated web servers. ACM SIGMETRICS Performance Evaluation Review, [S.l.], v.26, n.3, p.44–50, 1998.

SEEDORF, J.; KIESEL, S.; STIEMERLING, M. Traffic localization for P2P-applications: the alto approach. In: PEER-TO-PEER COMPUTING, 2009. P2P’09. IEEE NINTH INTERNATIONAL CONFERENCE ON. Proceedings. . . [S.l.: s.n.], 2009. p.171–177.

SHARMA, A.; VENKATARAMANI, A.; SITARAMAN, R. K. Distributing content simplifies isp traffic engineering. In: ACM SIGMETRICS PERFORMANCE EVALUATION REVIEW. Proceedings. . . [S.l.: s.n.], 2013. v.41, n.1, p.229–242.

SLATTERY, T. 95TH Percentile Calculation. Accessed: 2016-01-28, 2011. Available at: <http://www.netcraftsmen.com/95th-percentile-calculation/>. Visited on: 01 jan. 2016.

STAMOS, K. et al. CDNsim: a simulation tool for content distribution networks. ACM Transactions on Modeling and Computer Simulation (TOMACS), [S.l.], v.20, n.2, p.10, 2010.

SUN, J. et al. Heuristic Replica Placement Algorithms in Content Distribution Networks. Journal of Networks, [S.l.], v.6, n.3, 2011.

SZTRIK, J. Basic queueing theory. University of Debrecen, Faculty of Informatics, [S.l.], v.193, 2012.

TANG, W. et al. Medisyn: a synthetic streaming media service workload generator. In: NETWORK AND OPERATING SYSTEMS SUPPORT FOR DIGITAL AUDIO AND VIDEO, 13. Proceedings. . . [S.l.: s.n.], 2003. p.12–21.

TOMOVIC, S.; PEJANOVIC-DJURISIC, M.; RADUSINOVIC, I. SDN Based Mobile Networks: concepts and benefits. Wireless Personal Communications, [S.l.], 2014.

TOOTOONCHIAN, A.; GANJALI, Y. HyperFlow: a distributed control plane for openflow. In: USENIX INM/WREN. Proceedings. . . [S.l.: s.n.], 2010.

TRIUKOSE, S.; AL-QUDAH, Z.; RABINOVICH, M. Content delivery networks: protection or threat? In: Computer Security–ESORICS 2009. [S.l.]: Springer, 2009. p.371–389.

ULC, S. Sandvine global Internet phenomena Report-1h2014. [S.l.]: Technical report, 2014.

VAKALI, A.; PALLIS, G. Content delivery networks: status and trends. Internet Computing, IEEE, [S.l.], v.7, n.6, p.68–74, 2003.

VANDERHOOF, D. 95th percentile bandwidth metering explained and analyzed. [S.l.: s.n.], 2011. Accessed: 2016-01-28.

WANG, L. et al. Reliability and Security in the CoDeeN Content Distribution Network. In: USENIX ANNUAL TECHNICAL CONFERENCE, GENERAL TRACK. Proceedings. . . [S.l.: s.n.], 2004. p.171–184.

WANG, L.; PAI, V.; PETERSON, L. The effectiveness of request redirection on CDN robustness. ACM SIGOPS Operating Systems Review, [S.l.], v.36, n.SI, p.345–360, 2002.

WANG, R. et al. TCP startup performance in large bandwidth networks. In: INFOCOM 2004. TWENTY-THIRD ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES. Proceedings. . . [S.l.: s.n.], 2004. v.2, p.796–805.

WANG, W.; ZENG, G. A generic trust overlay simulator for P2P networks. In: DEPENDABLE COMPUTING, 2006. PRDC’06. 12TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON. Proceedings. . . [S.l.: s.n.], 2006. p.401–402.

WANG, X. et al. Cache in the air: exploiting content caching and delivery techniques for 5G systems. IEEE Comm. Mag., [S.l.], 2014.

WENDELL, P.; FREEDMAN, M. J. Going viral: flash crowds in an open cdn. In: ACM SIGCOMM CONFERENCE ON INTERNET MEASUREMENT CONFERENCE, 2011. Proceedings. . . [S.l.: s.n.], 2011. p.549–558.

WICHTLHUBER, M.; REINECKE, R.; HAUSHEER, D. An SDN-Based CDN/ISP Collaboration Architecture for Managing High-Volume Flows. IEEE Transactions on Network and Service Management, [S.l.], v.12, n.1, p.48–60, March 2015.

XU, Z.; BHUYAN, L. QoS-aware object replica placement in CDNs. In: IEEE GLOBAL TELECOMMUNICATIONS CONFERENCE, 2005. GLOBECOM’05. Proceedings. . . [S.l.: s.n.], 2005. v.2, p.5–pp.

ZHUANG, Z.; GUO, C. Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chaining. In: PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA), 2011 IEEE 9TH INTERNATIONAL SYMPOSIUM ON. Proceedings. . . [S.l.: s.n.], 2011. p.183–188.

Appendix

A P2PCDNSim Simulation Tool

We live in a society exquisitely dependent on science and technology, in

which hardly anyone knows anything about science and technology.

—CARL SAGAN

Quick development and evaluation of new strategies can be a key differentiator in today’s fast-changing market. Researchers therefore need a tool that assists them in proposing and evaluating new strategies. Such a tool should model both CDN and network aspects and support evaluation tasks, such as configuring different scenarios and comparing strategy decisions that range from replica server placement to the selection of the RR’s redirection strategy.

This Appendix therefore presents P2PCDNSim, a simulation tool designed to assist researchers in proposing and evaluating strategies to improve CDNs. The Appendix is divided into two main parts: first we describe CDNs and afterwards we present the simulator. It is organized as follows: Section A.1 presents a description of CDNs; Section A.2 presents the state of the art regarding CDN simulation tools; Section A.3 describes our simulation tool, the P2PCDNSim simulator; Section A.5 discusses lessons learned with our simulator; and, finally, Section A.6 raises some considerations about this Appendix.

The results obtained from this Appendix were published in Rodrigues et al. (2013).

A.1 Content Delivery Networks

CDNs are overlay networks designed to distribute content efficiently. The basic idea behind a CDN is to keep content close to users by distributing it to RSs strategically located near them, thus improving users’ perceived QoE (PALLIS; VAKALI, 2006). A CDN is a collaborative set of network elements in which content is replicated in order to deliver it transparently and effectively (BUYYA; PATHAN; VAKALI, 2008). Figure A.1 illustrates the CDN model, where content is scattered around the globe by the content distribution system. A CDN system is composed of three basic actors (PATHAN, 2011): the Content Provider (CP), which provides the content to be distributed; the CDN provider, which provides the content distribution infrastructure to the content provider; and the end user, who consumes the content generated by the content provider.

Figure A.1: Illustration of the basic CDN concept: store content close to end users.

Typical CDN functionalities include (BUYYA; PATHAN; VAKALI, 2008):

• Request redirection and content delivery services: to redirect requests to the closest server and thereby overcome flash crowd effects;

• Content outsourcing and distribution services: to replicate content from the origin server to RSs;

• Management services: to manage network resources, handle accounting, and monitor and report content usage.

The distributed content comprises objects provided by content providers and requested by end users. Objects may be replicated on demand or pushed beforehand to RSs. In practice, CDNs typically host static content (for instance, static HTML pages, images, documents, and software patches) and multimedia streaming content, such as video and audio, which can be User Generated Content (UGC). CPs include, but are not limited to, large enterprises, Web service providers, media companies, and news broadcasters (BUYYA; PATHAN; VAKALI, 2008).


A.1.1 CDN Architecture

CDNs are composed of three main entities: an Origin Server (OS), the entity that holds the content that will be distributed by the CDN; a set of Replica Servers (RSs), geographically distributed entities that store popular content; and a Request Redirector (RR), responsible for redirecting clients to the most suitable RS. Assisting those entities are the distribution system, responsible for collecting content from the OS and distributing it to RSs, and the accounting/monitoring system. Figure A.2 illustrates the CDN components and their relations. The OS contacts the distribution system to provide the content that will be distributed by the CDN. The distribution system is the main entity involved in the content outsourcing and distribution service. It reports the content to geographically scattered RSs, which in turn report the newly handled content to the RR. Regarding the request redirection and content delivery service, when end users request content from the content provider, they are redirected to the RR, which chooses the most suitable RS to handle the request and redirects the client to it. It is important to notice that when the distribution system reports new content to RSs, it does not necessarily send the content itself. In other words, RSs operate reactively: an RS fetches the content only once it receives a request from an end user. Regarding management services, the monitoring system keeps continuous contact with the OS and RSs to track content injection and distribution.
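As an illustration, the reactive behaviour described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not P2PCDNSim code: the class names, the `announce`/`serve`/`redirect` methods, and the static distance table are all hypothetical, and the RR is reduced to a "lowest assumed distance" rule.

```python
class OriginServer:
    """Holds the authoritative copy of every object (hypothetical model)."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.fetches = 0  # how many times an RS had to pull from the origin

    def fetch(self, name):
        self.fetches += 1
        return self.objects[name]


class ReplicaServer:
    """An RS that knows which objects exist but pulls them only on demand."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.announced = set()  # content reported by the distribution system
        self.cache = {}         # content actually stored on this RS

    def announce(self, object_name):
        # The distribution system reports new content; no bytes move yet.
        self.announced.add(object_name)

    def serve(self, object_name):
        if object_name not in self.announced:
            raise KeyError(object_name)
        if object_name not in self.cache:  # first request: reactive fetch
            self.cache[object_name] = self.origin.fetch(object_name)
        return self.cache[object_name]


class RequestRedirector:
    """Redirects each client to the RS with the lowest assumed distance."""

    def __init__(self, replicas, distance):
        self.replicas = replicas
        self.distance = distance  # {(client, rs_name): cost}, assumed known

    def redirect(self, client):
        return min(self.replicas,
                   key=lambda rs: self.distance[(client, rs.name)])


origin = OriginServer({"video.mp4": b"payload"})
replicas = [ReplicaServer("rs-eu", origin), ReplicaServer("rs-us", origin)]
for rs in replicas:
    rs.announce("video.mp4")
rr = RequestRedirector(replicas,
                       {("alice", "rs-eu"): 10, ("alice", "rs-us"): 90})

chosen = rr.redirect("alice")
first = chosen.serve("video.mp4")   # triggers the pull from the OS
second = chosen.serve("video.mp4")  # served from the RS cache
```

Only the first request reaches the OS; the replica that never receives a request never stores the object, which is the point of the reactive mode.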

To better understand a CDN, we need to take a closer look at its main entities: OS, RR, RS, and Monitor.

The OS represents the content owner in a CDN system. It works alongside the distribution system to inject content into RSs. Content injection occurs in full or partial mode (BUYYA; PATHAN; VAKALI, 2008). Full content injection means that RSs receive a complete copy of all objects from the OS, which may be feasible for small objects, such as web pages. However, for HD and UltraHD multimedia content, full content injection becomes impractical. In partial content injection mode, RSs receive a subset of the total content provided by the OS. Katsaros et al. (2009) categorize the following subset selection strategies:

• Empirical-based outsourcing: CDN engineers decide which content subset will later be replicated to RSs, guided by heuristics. The main problems with this solution are its lack of flexibility and the uncertainty of choosing the right heuristics;

• Popularity-based outsourcing: the content subset is composed of the most popular objects. This is a popular and widely studied approach. It depends on per-content statistics, which are often available through the monitoring system. However, the monitor might lack statistics for freshly injected content;

• Object-based outsourcing: content is replicated to RSs in units of objects. This strategy relies on a greedy approach that outsources objects in order of highest performance gain. It achieves the best performance but is also the most complex strategy;

• Cluster-based outsourcing: a set of content clusters composes the subset of content that will be replicated to RSs. A content cluster is defined as a group of content objects that share common characteristics, such as content type, reference time, and number of references.

Figure A.2: CDN components and how they interact. Source: based on a figure found in (BUYYA; PATHAN; VAKALI, 2008).
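To make the popularity-based strategy concrete, the following Python sketch selects the subset of objects to outsource to one RS. It is only an illustration under assumptions that are not in the text: the function name, the per-RS storage budget, and the rule of skipping objects that do not fit are all hypothetical.

```python
def popularity_based_subset(request_counts, sizes, capacity):
    """Pick the most requested objects that fit in an RS storage budget.

    request_counts: {object: number of requests seen by the monitor}
    sizes:          {object: size in bytes}
    capacity:       storage available on the RS, in bytes
    """
    chosen, used = [], 0
    # Most popular first; ties broken by name so the result is deterministic.
    for obj in sorted(request_counts, key=lambda o: (-request_counts[o], o)):
        if used + sizes[obj] <= capacity:
            chosen.append(obj)
            used += sizes[obj]
    return chosen


counts = {"a": 50, "b": 30, "c": 30, "d": 5}
sizes = {"a": 40, "b": 70, "c": 20, "d": 10}
subset = popularity_based_subset(counts, sizes, capacity=100)
```

An object that the monitor has never seen is absent from `request_counts` and can never be selected, which mirrors the limitation noted for freshly injected content.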

RSs, also called edge servers or surrogate servers, are servers geographically distributed so as to be as close to end users as possible and thus enhance their QoE. However, placing those RSs is not a trivial task (SUN et al., 2011). The problem consists in achieving the best possible content delivery, meaning high QoE and low infrastructure resource usage, by placing a set of RSs in a larger set of possible locations. Considering the computational complexity of the problem, researchers have proposed RPAs based on different heuristics that use information collected from the CDN, such as network topology and content request patterns, to place RSs efficiently. Below, we present a list of popular heuristic-based RPA strategies:

• Greedy algorithm (KRISHNAN; RAZ; SHAVITT, 2000; QIU; PADMANABHAN; VOELKER, 2001): this RPA, as its name suggests, is based on a greedy strategy. The idea is to choose M among N potential sites iteratively, placing one replica at a time. The selection starts by evaluating, for each potential site, the cost of serving all clients from a replica server placed there. The site with the lowest cost is selected and the first RS is placed. Subsequent placement calculations take the previously placed RSs into account. The iteration continues until all RSs are placed. This approach is known to perform well even with imperfect input data. However, it requires prior knowledge of client locations and inter-node distances.

• Topology-informed placement (JAMIN et al., 2001): this approach is also based on a greedy algorithm. The difference is that the metric used is the outdegree (number of outgoing links) of topology nodes. The basic assumption is that nodes with a high outdegree are better connected and can reach more nodes while experiencing lower latency. Nonetheless, the outdegree metric may not reflect aspects that are important for RPAs, such as end user density and location.

• Hot Spot (QIU; PADMANABHAN; VOELKER, 2001): this approach is based on the idea that replicas should be placed near the spots where clients generate the greatest amount of requests. Therefore, it sorts all potential sites according to the amount of traffic generated in their vicinity, as defined by a radius value. The radius represents the highest degree of separation between the potential site and the other sites in its vicinity. Thus, radius = 1 means that the amount of traffic associated with each potential site is the sum of the traffic generated by the site itself and by its direct neighbors. This approach is very simple to understand and implement; however, it is outperformed by Greedy.
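The greedy strategy in the first item can be sketched as follows. This is a simplified illustration rather than the algorithm as published: it assumes that the cost of a placement is the request volume of each client multiplied by the distance to its closest replica, and the function names and the input data are hypothetical.

```python
def placement_cost(replicas, clients, dist):
    """Total cost when every client is served by its closest replica."""
    return sum(demand * min(dist[c][r] for r in replicas)
               for c, demand in clients.items())


def greedy_placement(sites, clients, dist, m):
    """Choose m of the candidate sites, one replica per iteration."""
    placed = []
    for _ in range(m):
        remaining = [s for s in sites if s not in placed]
        # Evaluate each remaining site together with the replicas placed so far.
        best = min(remaining,
                   key=lambda s: placement_cost(placed + [s], clients, dist))
        placed.append(best)
    return placed


# Hypothetical input: request volume per client and client-to-site distances.
clients = {"c1": 12, "c2": 10, "c3": 1}
dist = {
    "c1": {"A": 1, "B": 5, "C": 9},
    "c2": {"A": 5, "B": 1, "C": 9},
    "c3": {"A": 9, "B": 9, "C": 1},
}
replicas = greedy_placement(["A", "B", "C"], clients, dist, m=2)
```

The sketch needs the full client demand and distance tables up front, which corresponds to the prior knowledge the greedy RPA is said to require.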

The expected result of an RPA is the scattered placement of replicas near end users. The next step towards efficient content delivery is to redirect such users to the most suitable replica. The RR is the entity responsible for user redirection. It is important to notice that the most appropriate replica may not always be the "closest" one. Typically, the RR uses a set of metrics to select the most suitable options, such as network proximity, client-perceived latency, distance, available bandwidth, and replica server load.
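A small Python sketch shows why the closest replica is not always the most suitable one. The weighted score, the metric names, the weights, and the load cut-off below are illustrative assumptions; the text does not prescribe how an RR combines its metrics.

```python
def select_replica(candidates, weights, max_load=0.9):
    """Return the name of the RS with the lowest weighted score.

    candidates: {rs_name: {"latency_ms": ..., "hops": ..., "load": ...}}
    weights:    {metric_name: weight}; lower metric values are better
    max_load:   replicas above this load fraction are skipped (assumed policy)
    """
    usable = {name: m for name, m in candidates.items()
              if m["load"] <= max_load}
    if not usable:
        usable = candidates  # every RS is overloaded; fall back to all of them

    def score(name):
        return sum(weights[k] * usable[name][k] for k in weights)

    return min(sorted(usable), key=score)


candidates = {
    "rs-near": {"latency_ms": 10, "hops": 2, "load": 0.97},
    "rs-mid": {"latency_ms": 25, "hops": 4, "load": 0.40},
    "rs-far": {"latency_ms": 80, "hops": 9, "load": 0.10},
}
weights = {"latency_ms": 1.0, "hops": 2.0, "load": 50.0}
best = select_replica(candidates, weights)
```

Here the nearest replica is skipped because it is almost saturated, so the request goes to the next best candidate.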

The set of metrics used by the RR enables a high-level understanding of the content distribution service. The entity responsible for probing the network to collect such information is the Monitor. It gathers information related to almost all entities in a CDN system. From the content provider’s perspective, there are three metric sets to consider: availability, cost, and performance. Our study focuses on enhancing the performance of CDNs; therefore, we concentrate on performance metrics, such as (VAKALI; PALLIS, 2003; JOHNSON et al., 2001; LI; LIU, 2003):

• Cache hit ratio: the ratio between the number of requests for objects stored in replica servers and the total number of requests. Robust caching policies have a high cache hit ratio (GADDE; CHASE; RABINOVICH, 2001).

• Saved bandwidth: the difference between serving content from RSs and serving the same content from the OS.

• Latency: the latency perceived by clients requesting content from RSs.

• RS utilization: general information regarding replica servers, such as CPU load and I/O.
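As a worked example, the first three metrics can be computed from a request log as sketched below. The log format and the function name are hypothetical, and saved bandwidth is approximated, as an assumption, by the bytes served from RS caches that therefore did not have to be transferred from the OS.

```python
def cdn_metrics(requests):
    """Summarize a request log.

    requests: list of (object_size_bytes, served_from_cache, latency_ms)
    """
    total = len(requests)
    if total == 0:
        raise ValueError("empty request log")
    hits = [r for r in requests if r[1]]
    return {
        "cache_hit_ratio": len(hits) / total,
        # Bytes that did not have to be transferred from the OS.
        "saved_bytes": sum(size for size, _, _ in hits),
        "mean_latency_ms": sum(lat for _, _, lat in requests) / total,
    }


log = [(100, True, 20), (300, False, 120), (100, True, 20), (500, True, 40)]
metrics = cdn_metrics(log)
```

With this log, three of the four requests are hits, so the hit ratio is 0.75 and 700 bytes are counted as saved.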

A.1.2 The Flash Crowd Challenge

Along with the resource management challenges discussed earlier, CDN systems continue to face a scalability problem. Flash crowd events, also called the slashdot effect (ADLER, 1999) or flash events (DHINGRA; SACHDEVA, 2014), are sudden increases in request rate. Although CDNs are the best technology to scale content distribution, a flash crowd event may still cause them to perform poorly. According to (BUYYA; PATHAN; VAKALI, 2008), the term "flash crowd" comes from Flash Crowd, a 1973 short novel by science fiction writer Larry Niven, which describes a situation where "cheap and easy teleportation enables tens of thousands of people worldwide to flock to the scene of anything interesting almost instantly". One might notice the similarity between what he described and what happens with extremely popular content distributed online.

The authors of (WENDELL; FREEDMAN, 2011) define flash crowd events as periods in which request rates for a particular fully-qualified domain name increase exponentially. Letting $r_{t_i}$ denote the average per-minute request rate over period $t_i$, they consider that a particular site is experiencing a flash crowd if $r_{t_i} > 2^i \cdot r_{t_0}, \forall i \in [1, k]$, where $k$ represents a modest number of periods. Figure A.3 illustrates the number of requests received over time during a flash crowd event.
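The condition can be checked directly on a series of per-period request rates. The Python sketch below is an illustration with a hypothetical function name and made-up rates; it tests the inequality for i = 1, ..., k, because for i = 0 the initial rate would be compared with itself.

```python
def is_flash_crowd(rates, k):
    """Check r_{t_i} > 2**i * r_{t_0} for every i in 1..k.

    rates: average per-minute request rates for periods t_0, t_1, ...
    k:     number of periods over which the growth must hold
    """
    if k < 1 or len(rates) <= k:
        return False  # not enough periods to decide
    base = rates[0]
    return all(rates[i] > (2 ** i) * base for i in range(1, k + 1))


growing = [100, 250, 520, 1100]  # stays above 2**i times the initial rate
steady = [100, 150, 180, 210]    # grows, but far slower than doubling
```

A series is flagged only if every one of the k periods exceeds its threshold, so a single burst followed by a drop is not reported.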

Chenyu et al. (2006) list some significant characteristics of flash crowd events:

1. They are often related to events of great interest, for instance, popular sports events (FIFA’s World Cup) and breaking news stories (the September 11th terrorist attacks);

2. The increase in request rate is dramatic but short-lived. Therefore, the traditional over-provisioning strategy results in underused resources;

3. The request distribution is Zipf-like and the number of clients is commensurate with the request rate. These characteristics help distinguish a flash crowd from a DDoS attack;

4. A small number of objects, less than 10%, is responsible for most requests, more than 90%.

Figure A.3: Request rate variation over time during a flash crowd event.
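The concentration described in items 3 and 4 can be measured on any list of per-object request counts. The sketch below is illustrative only: the helper names are hypothetical, and the Zipf generator returns expected counts for an assumed exponent `alpha` rather than a trace from a real flash crowd.

```python
def top_share(request_counts, fraction=0.10):
    """Share of all requests that go to the most requested objects."""
    counts = sorted(request_counts, reverse=True)
    top_n = max(1, round(len(counts) * fraction))
    return sum(counts[:top_n]) / sum(counts)


def zipf_counts(n_objects, total_requests, alpha):
    """Expected request counts under a Zipf-like law with exponent alpha."""
    weights = [1 / rank ** alpha for rank in range(1, n_objects + 1)]
    norm = sum(weights)
    return [total_requests * w / norm for w in weights]


observed = [90, 2, 2, 2, 1, 1, 1, 1, 0, 0]   # hypothetical per-object counts
share_observed = top_share(observed)
share_zipf = top_share(zipf_counts(1000, 1_000_000, alpha=1.0))
```

In the hypothetical `observed` list a single object out of ten receives 90 of the 100 requests; for the synthetic Zipf counts, the share of the top 10% grows as `alpha` increases.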

One might notice that in the case of popular events, such as FIFA’s World Cup, their popularity means that a flash crowd is expected and, therefore, one can prepare resources to handle the event. However, there may not be enough resources to deal with the extra traffic and, considering the amount of resources involved, even simple management decisions could result in considerable OPEX savings. Therefore, both predictable and unpredictable flash crowd events pose a serious challenge to CDN systems.

A.2 CDN Simulation State of the Art

While trying to identify and learn about existing CDN simulators, we observed that most existing research tackles specific CDN design issues, often in an isolated manner. Whereas there is abundant literature on many CDN challenges and their solutions, there is a lack of studies that consider the full picture. Available CDN simulators are mostly products of academic work, such as CoDeeN (WANG et al., 2004) and CoralCDN (FREEDMAN; FREUDENTHAL; MAZIERES, 2004). They offer specific choices and are mostly P2P based. Moreover, we also found tools designed to evaluate specific problems, such as CDN protocols (STAMOS et al., 2010; WANG; PAI; PETERSON, 2002), cache replacement (SATSIOU; PATERAKIS, 2006), link and node allocation (QIU; PADMANABHAN; VOELKER, 2001), and security problems (TRIUKOSE; AL-QUDAH; RABINOVICH, 2009).

CDNsim (STAMOS et al., 2010) seems to be the only CDN-specific simulator recognized by the community. It is based on the well-known OMNeT++ open source software. Similarly, the INET open source simulation framework is also based on OMNeT++ and includes some CDN support. Nonetheless, both options lack support for additional overlays, such as the hybrid CDN-P2P overlay, a popular scenario for multimedia content distribution.

Other simulators, such as (DAS; KANGASHARJU, 2006; KRISHNAMURTHY; WILLS; ZHANG, 2001), were used to evaluate different CDN strategies but often neglected important aspects of a CDN infrastructure, such as cache capacities and request routing (KANGASHARJU; ROSS; ROBERTS, 2001). They usually fail to identify cross-issue interference, known to exist even among seemingly unrelated modules within an analyzed system. For example, it is misleading to assume that there are shortcuts through dynamic aspects as diverse as traffic routing, cache location, CDN topology information, caching strategies, transport protocols, and user access technology. There is an urgent need for a holistic cross-layer tool capable of combining diverse structured, P2P, and complex CDN strategies (FORTINO; RUSSO, 2008; WANG; ZENG, 2006; KANGASHARJU; ROBERTS; ROSS, 2002; MOREIRA et al., 2011). This need motivated us to design the P2PCDNSim tool.

Our first idea was to extend CDNsim, but this turned out not to be an easy task. According to Johansson and Löfgren (2009), the three fundamental principles regarding extensibility are modifiability, maintainability, and scalability, each supported by a set of design principles, such as modularity, dependency inversion, and interface segregation. With those principles in mind, the extensibility of CDNsim’s code was not acceptable. Therefore, we decided to design our tool from scratch and make it as extensible as possible. A substantial part of the simulator’s development happened during two collaboration projects between the Network and Telecommunications Research Group (GPRT) and Ericsson Research, from late 2008 until the beginning of 2013. Since then, the author of this thesis and other members of GPRT’s team have been responsible for maintaining the simulator. The next section presents P2PCDNSim’s architecture and illustrates the extra functionalities added and its extensibility.

A.3 P2PCDNSim Simulation Tool

We present P2PCDNSim, a flexible and highly configurable discrete-event simulation system designed to assist researchers in describing and evaluating strategies to enhance CDNs. It is a Java-based tool that simulates complex scenarios, including hybrid CDN and P2P overlays and multi-CDN collaboration. It also provides a clear separation between layers, facilitating the implementation and testing of new strategies; an easy-to-use GUI to create new scenarios; a flexible collector for a set of built-in metrics, with real-time display of graphs during simulations; and a realistic representation of Autonomous Systems (ASes), with AS cross-traffic monitoring.

The interested reader may watch a demonstration scenario video¹ presenting our GUI and illustrating the implemented functionalities.

A.3.1 Architecture

The simulator adopts a number of abstraction levels. There are many ways in which a CDN project may be built. Some are based on fixed infrastructure, others use P2P, and others are hybrid or Cloud based. As a result, our first decision was to establish a clear separation between these different overlays (operating modes) and the common underlying support strategies and functionalities, which include caching strategies, placement algorithms, the simulator’s event scheduler, metric collection functions, and graph plotting services. The proposed architecture enables the quick development of new overlays, which should be placed on top of the base P2PCDNSim module (underlay), extending the pre-existing CDN overlay if possible.

A second design decision was to support geographical location through an advanced, realistic globe-shaped interface. A user may examine relevant real-time metrics by monitoring their color-coded evolution on the graphs. Figure A.4 illustrates this GUI. Animated output conveys relevant knowledge and allows observing the system’s evolution at a controlled pace. Animators have been used successfully for this purpose in simulators such as ns-2².

We started the design process by searching for libraries that could provide a basic discrete-event simulation core. Considering our familiarity with the Java programming language, we decided to use DESMO-J³, a free library for developing object-oriented simulation models in Java. We also looked for a library to support the simulator’s graph plotting services and decided to use JFreeChart⁴, a free, well-documented Java chart library.

From a structural perspective, the simulator is divided into three main parts: simulator core, simulator base and overlay, as illustrated in Figure A.5. At its core, we find the clock, an event scheduler, and data collectors. This job is done by the DESMO-J library, which is the only module in the figure that we did not develop. Above it, we find the Simulator Base, composed of the network infrastructure and base overlay modules. The former faithfully mimics a communication network, including topology information, protocols and link properties such as symmetry, capacity, delay, packet loss, congestion, jitter and throughput, whilst the latter provides the fundamental entities needed to develop any desired overlay, along with the base metric report system and topology builder. It is important to notice that JFreeChart is used within the report system to assist graph plotting. The third abstraction level is that of the overlay, i.e. CDN

¹ https://youtu.be/2ZovxyHyJrg
² http://www.isi.edu/nsnam/ns/
³ http://desmoj.sourceforge.net/home.html
⁴ http://www.jfree.org/jfreechart/


Figure A.4: Screenshot of P2PCDNSim’s GUI. This realistic globe-shaped interface enables real-time, on-the-fly metric monitoring.

Source: Screen shot from simulator’s GUI, made by author.

or P2P Overlays, the actual application target. To describe a new overlay, one must extend the Application Process entity provided by the Base Overlay module, defining its behavior in a way similar to Java’s Runnable interface⁵; this behavior is then mapped onto discrete events.

The author of this thesis has been the leading developer of the simulator since a few months after the beginning of the development process, having started to work with it during his Master’s, at the end of 2008. In addition to taking part in fruitful discussions on all aspects of this work, André Moreira and Marcio Neves were also core developers of the simulator. Arthur Callado, Josilene Moreira, and Ernani Azevedo gave input regarding different application scenarios for all implemented overlay networks. Furthermore, Djamel Sadok and Victor Souza helped to refactor our proposed solutions, contributing extensive views from academia and industry.

At the simulator’s base module, we have components designed to faithfully simulate networks, along with the necessary tools to build an overlay. The Link Layer is represented by the Link class and is responsible for exchanging frames and updating time-related calculations according to the intrinsic link delay, the data to be sent, the link capacity and the size of the buffer. Since a discrete-event simulation models a system’s operation as a discrete sequence of events, we consider frame transmission between nodes as our basic event. When a buffer is full, the data is discarded, raising a lost packet event. This event can be collected and logged for future analysis. Each Node has a set of links, and each link connects two nodes. There are

⁵ https://docs.oracle.com/javase/7/docs/api/java/lang/Runnable.html


Figure A.5: The modularized architecture of the P2PCDNSim simulator.


two main Node types: Router and Client nodes. Router nodes are responsible for packet switching decisions. The set of links of each node depends on the type of node and its connections. For a Router node, all links are full duplex and symmetric in terms of bandwidth capacity. Client nodes differ due to user churn and the possible presence of asymmetric links. Whether node contributions are symmetric or asymmetric is important, especially in P2P-based scenarios. Client link characteristics are set through the client’s XML configuration files; samples can be found in Appendix B.
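The link behavior described above (transfer time derived from delay, data size and capacity, and a lost packet event when the buffer is full) can be sketched as follows. This is only an illustration under stated assumptions: the class name `LinkSketch`, its methods, and the choice of bytes and seconds as units are hypothetical and are not taken from P2PCDNSim’s actual Link class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical drop-tail link model; names and units are assumptions, not P2PCDNSim code. */
final class LinkSketch {
    private final double capacityBitsPerSecond;
    private final double propagationDelaySeconds;
    private final long bufferLimitBytes;
    private final Deque<Integer> queue = new ArrayDeque<>();
    private long queuedBytes = 0;
    private long lostFrames = 0;

    LinkSketch(double capacityBitsPerSecond, double propagationDelaySeconds, long bufferLimitBytes) {
        this.capacityBitsPerSecond = capacityBitsPerSecond;
        this.propagationDelaySeconds = propagationDelaySeconds;
        this.bufferLimitBytes = bufferLimitBytes;
    }

    /** Time for one frame to cross an idle link: serialization time plus intrinsic delay. */
    double transferTimeSeconds(int frameBytes) {
        return (frameBytes * 8.0) / capacityBitsPerSecond + propagationDelaySeconds;
    }

    /** Queues a frame, or discards it and counts a loss when the buffer would overflow. */
    boolean enqueue(int frameBytes) {
        if (queuedBytes + frameBytes > bufferLimitBytes) {
            lostFrames++;
            return false;
        }
        queue.addLast(frameBytes);
        queuedBytes += frameBytes;
        return true;
    }

    /** Removes the frame at the head of the queue once it has been transmitted. */
    void dequeue() {
        Integer frame = queue.pollFirst();
        if (frame != null) {
            queuedBytes -= frame;
        }
    }

    long lostFrames() {
        return lostFrames;
    }
}
```

In a discrete-event setting, the value returned by `transferTimeSeconds` would be the offset at which the frame arrival event is scheduled, and each rejected `enqueue` would correspond to one logged loss event.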

Regarding the Network Layer, the routing table is generated once per simulation. The network layer module calculates the shortest path between all nodes in the topology. However, paths between nodes A and B that belong to the same AS and pass through a node C outside that AS are not considered. This behavior reflects the common assumption that intra-AS paths are preferred to inter-AS ones even when longer. Therefore, routing depends on the AS representation described in the properties file. There are three possible AS representations: Regional, Unique and Multi. The Regional representation treats each AS as a subset of nodes defined in the topology XML file, as illustrated in Figure A.6. The Unique representation treats all nodes as part of a single AS, as illustrated in Figure A.7. Finally, the Multi representation treats every node as a separate AS.
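The intra-AS preference can be illustrated with a small path search. The sketch below assumes hop count as the path metric and uses breadth-first search; the text does not state which metric or algorithm the simulator uses, and `AsAwareRouting`, `hops` and the map-based topology are hypothetical names introduced only for this example.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical AS-aware hop-count search; not the simulator's routing code. */
final class AsAwareRouting {
    /**
     * Breadth-first search from src to dst. When both endpoints belong to the
     * same AS, nodes outside that AS are never traversed.
     * Returns the hop count, or -1 if no admissible path exists.
     */
    static int hops(Map<Integer, List<Integer>> adjacency, Map<Integer, String> asOf,
                    int src, int dst) {
        String srcAs = asOf.get(src);
        boolean sameAs = srcAs != null && srcAs.equals(asOf.get(dst));
        Map<Integer, Integer> dist = new HashMap<>();
        Deque<Integer> frontier = new ArrayDeque<>();
        dist.put(src, 0);
        frontier.add(src);
        while (!frontier.isEmpty()) {
            int node = frontier.poll();
            if (node == dst) {
                return dist.get(node);
            }
            List<Integer> neighbours = adjacency.getOrDefault(node, Collections.<Integer>emptyList());
            for (int next : neighbours) {
                if (sameAs && !srcAs.equals(asOf.get(next))) {
                    continue; // skip nodes outside the shared AS
                }
                if (dist.containsKey(next)) {
                    continue; // already reached by a path that is no longer
                }
                dist.put(next, dist.get(node) + 1);
                frontier.add(next);
            }
        }
        return -1;
    }
}
```

With a chain 0-1-2-4 inside one AS and an outside node 3 linked to both 0 and 4, the search returns three hops between 0 and 4, even though a two-hop path through node 3 exists.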


It is important to notice that Figures A.6 and A.7 are examples to illustrate possible topologies.

Figure A.6: Example of a possible hybrid CDN-P2P topology where a set of nodes compose an AS.


Figure A.7: Example of a simple topology where each node is considered a single AS.


At the Transport Layer, our simulator supports the TCP and UDP transport protocols. Their implementations are completely transparent to the layers above and below, to facilitate the development of new transport strategies. Our UDP simulation code is straightforward, since UDP is stateless. The TCP implementation covers RFC5681 (ALLMAN; PAXSON; BLANTON, 2009), the connection handshake negotiation, and packet timeout calculations according to RFC2988 (PAXSON; ALLMAN; TIMER, 2000). The implemented TCP flow control uses the sliding window mechanism. The behavior of the congestion window includes the Fast Retransmit and Fast Recovery algorithms. RTT is calculated according to RFC2988 (PAXSON; ALLMAN; TIMER, 2000). Overlays use both transport protocols through Socket-like and ServerSocket-like classes, following a concept similar to that of Java’s net package⁶.
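To make the timeout calculation concrete, the sketch below follows the retransmission timer rules of RFC2988: a 3 s initial timeout, smoothing gains of 1/8 and 1/4, a variance multiplier of 4, and a 1 s lower bound. It is a minimal illustration, not the simulator’s TCP code; the class name `RtoEstimator` and its methods are hypothetical.

```java
/** Hypothetical RFC2988-style retransmission timeout estimator (times in seconds). */
final class RtoEstimator {
    private static final double ALPHA = 1.0 / 8.0;    // gain for the smoothed RTT
    private static final double BETA = 1.0 / 4.0;     // gain for the RTT variation
    private static final double K = 4.0;              // variance multiplier
    private static final double MIN_RTO_SECONDS = 1.0;

    private final double clockGranularitySeconds;
    private double srtt;
    private double rttvar;
    private double rto = 3.0; // value used before the first RTT measurement
    private boolean hasSample = false;

    RtoEstimator(double clockGranularitySeconds) {
        this.clockGranularitySeconds = clockGranularitySeconds;
    }

    /** Feeds one RTT measurement and returns the updated timeout. */
    double onRttSample(double rttSeconds) {
        if (!hasSample) {
            srtt = rttSeconds;
            rttvar = rttSeconds / 2.0;
            hasSample = true;
        } else {
            // The variation is updated first, using the previous smoothed RTT.
            rttvar = (1 - BETA) * rttvar + BETA * Math.abs(srtt - rttSeconds);
            srtt = (1 - ALPHA) * srtt + ALPHA * rttSeconds;
        }
        double candidate = srtt + Math.max(clockGranularitySeconds, K * rttvar);
        rto = Math.max(MIN_RTO_SECONDS, candidate);
        return rto;
    }

    double rto() {
        return rto;
    }
}
```

For example, two consecutive 1 s samples give timeouts of 3 s and then 2.5 s, while a single 0.1 s sample is clamped to the 1 s minimum.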

Also at the simulator’s base layer, a number of XML-based configuration files are used to select among the supported strategies and to define the scenario, the expected output metrics, the reports, and the logs with detailed event tracing. Examples of XML configuration files can be found in Appendix B.

Considering the Application Layer, we present an overlay developed on top of the base module, the CDN overlay, designed for video distribution.

First, we extended the Application Process to implement the behavior of all main CDN entities, such as the Origin Server, Surrogate Server and Request Redirector (see Figure A.5). The Origin Server is responsible for storing content from the CP. It acts as a content source that always remains available when caches do not have the requested content. Note that clients never contact the origin server directly. Surrogate servers, also known as cache or replica servers, are servers located near clients, with considerable storage capacity to cache content requested by clients. The Request Redirector is the entry point of the CDN for any client and is responsible for selecting the most suitable surrogate server for a client’s request. Four types of strategies, or CDN sub-modules, have been included in the overlay. They offer decisions for the placement of replica servers, caching optimization techniques, request routing algorithms, and content outsourcing. Replica placement refers to the replica/cache placement problem. It addresses the choice of the best location for each replica, according to previously selected metrics, and the number of surrogates needed for optimal operation within a network infrastructure. The available strategies are Greedy, HotSpot and Random. With regard to caching strategies, P2PCDNSim offers Least Recently Used (LRU) and Least Frequently Used (LFU).
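As a reminder of how the LRU policy named above behaves, the following sketch evicts the least recently requested object once a fixed number of objects is exceeded, and counts hits and misses. It is only an assumption-laden illustration: capacity is counted in objects rather than bytes, and `LruCacheSketch` and its methods are hypothetical names, not the simulator’s caching module.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache counted in objects; not the simulator's caching code. */
final class LruCacheSketch {
    private final Map<String, Boolean> entries;
    private long hits = 0;
    private long misses = 0;

    LruCacheSketch(final int capacity) {
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true on a cache hit; on a miss the object is stored, evicting the LRU entry if needed. */
    boolean request(String objectId) {
        if (entries.get(objectId) != null) {
            hits++;
            return true;
        }
        misses++;
        entries.put(objectId, Boolean.TRUE);
        return false;
    }

    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

An LFU variant would keep a request counter per object and evict the entry with the smallest count instead of the oldest access.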

Content outsourcing defines the relationship between surrogates and the origin server within a CDN. There are two main policies for content outsourcing available for use: cooperative pull-based and non-cooperative pull-based (STAMOS et al., 2010). The request-routing module is responsible for directing client requests to a suitable replica server, based on traffic forwarding policies or metrics. This is an important part of the simulator that must be well engineered for resource optimization. A set of metrics (such as network proximity, bandwidth availability, client perceived latency, distance, surrogate load, network load, and content availability) are possible criteria for choosing the most suitable replica server for a given client. We use a DNS-based request routing mechanism that replies with candidate IP addresses according to one of the following strategies: Network Proximity, Number of Requests, Shortest Distance and Round Robin. Network Proximity calculates the latency between replicas and clients, selecting the replica with the lowest latency. The Number of Requests strategy identifies which replicas are overloaded in terms of requests and ignores their distance to clients in an attempt to restore load balancing.

⁶ https://docs.oracle.com/javase/7/docs/api/java/net/package-summary.html


Shortest Distance estimates the distance between a client and the replicas through the number of router hops. Finally, the Round Robin request routing strategy distributes requests among the surrogate servers in turn, balancing load among them.
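Two of the request routing strategies described above are simple enough to sketch directly. The code below is an illustration only: `RequestRoutingSketch`, the surrogate names and the precomputed hop-count map are hypothetical, and the DNS-based reply mechanism itself is not modelled.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical selectors for two request routing strategies; not the simulator's code. */
final class RequestRoutingSketch {
    private int nextIndex = 0;

    /** Round Robin: hands out surrogates in a fixed cyclic order. */
    String roundRobin(List<String> surrogates) {
        String chosen = surrogates.get(nextIndex % surrogates.size());
        nextIndex = (nextIndex + 1) % surrogates.size();
        return chosen;
    }

    /** Shortest Distance: picks the surrogate with the fewest router hops to the client. */
    static String shortestDistance(Map<String, Integer> hopsToClient) {
        String best = null;
        int bestHops = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : hopsToClient.entrySet()) {
            if (entry.getValue() < bestHops) {
                best = entry.getKey();
                bestHops = entry.getValue();
            }
        }
        return best;
    }
}
```

Network Proximity would have the same shape as `shortestDistance`, with measured latency in place of hop counts.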

To assist users in evaluating results and comparing strategies, we designed and implemented the Report System. It collects a set of pre-configured events, along with the scenario description, and stores them in a file once the simulation has finished. By assembling files from different scenarios, for instance scenarios adopting different RPAs, one can plot graphs comparing the collected metrics. There are several metric events pre-configured in the simulator, meaning that all simulations will automatically generate and store such events, enabling later comparison. We divided those default metric events into three groups:

• Network traffic: there is a set of events related specifically to different types of network traffic, such as total network traffic, and cross and inner AS traffic. Total Network traffic measures the total amount of bits that passed through all links. Cross and Inner AS traffic account for the traffic between and within ASes, respectively. Notice that Cross AS traffic plus Inner AS traffic equals Total Network Traffic.

• QoE metrics: considering the importance of QoE (CONVIVA, 2015), we provide metrics that reflect the client’s quality of experience, such as Startup Delay, which represents the period between the video content request and the actual playback start. We also have a metric called Playback Continuity, which measures whether the client was able to watch the whole video without glitches.

• Overlay related metrics: comprises metrics related to specific overlays; for our CDN overlay, for example, we have Cache hit and Cache miss.
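The relationship between the three network traffic metrics can be sketched with a small counter class. This is an illustration under the assumption that each transmission is tagged with the AS of both link endpoints; `TrafficCounters` and its methods are hypothetical and do not reproduce the Report System’s event model.

```java
/** Hypothetical traffic accounting that keeps Cross AS + Inner AS = Total by construction. */
final class TrafficCounters {
    private long crossAsBits = 0;
    private long innerAsBits = 0;

    /** Records bits sent over one link whose endpoints belong to the given ASes. */
    void record(String fromAs, String toAs, long bits) {
        if (fromAs.equals(toAs)) {
            innerAsBits += bits;
        } else {
            crossAsBits += bits;
        }
    }

    long crossAsBits() {
        return crossAsBits;
    }

    long innerAsBits() {
        return innerAsBits;
    }

    /** Total network traffic is derived from the two partial counters. */
    long totalBits() {
        return crossAsBits + innerAsBits;
    }
}
```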

Appendix C shows examples of graphs illustrating some of the previously described metrics plotted by P2PCDNSim’s Report System.

We also developed the P2PCDNSim Simulator wizard, a tool to facilitate the first contact with the simulator. Simulating a complex overlay requires the selection of several parameters. The wizard compiles a scenario in four steps. The first step is to choose which overlay the user wants to examine: CDN, Hybrid or CDNI. The second step concerns topology details and the distribution of surrogates. One may choose from three surrogate placement techniques: “Fixed” for the length of the simulation, “One surrogate per AS” for each CDN, or one of the other surrogate placement strategies previously described. In the third step, the user places clients and CDN entities over a graphical world map containing the selected topology, as seen in Figure A.8. The fourth and final step is where the user describes the scenario’s characteristics, such as the number of objects, their sizes, and the number or rate of requests. These characteristics are used to configure the workload for the simulation, which is generated using Medisyn (TANG et al., 2003), a synthetic streaming media service workload generator. The distribution of user requests is set by defining proportions for every client community. For instance, one could associate 50% of client requests to one community of clients and 50% to another. Also during this step, the user defines the clients’ link bandwidth distribution. For instance, 20% of CDN1 clients could have 10Mbps access links, 30% 5Mbps and the rest only 2Mbps.
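The bandwidth distribution example above (20% at 10Mbps, 30% at 5Mbps, the rest at 2Mbps) can be turned into a per-client assignment as sketched below. The deterministic assignment by client index and the rounding of group sizes are assumptions made for this illustration; `BandwidthPlan` is a hypothetical name, and the wizard may well assign capacities differently.

```java
/** Hypothetical per-client capacity assignment for the 20% / 30% / rest example. */
final class BandwidthPlan {
    /** Returns the access-link capacity in Mbps for each of the given number of clients. */
    static int[] assign(int clientCount) {
        int[] mbps = new int[clientCount];
        int fast = (int) Math.round(clientCount * 0.20);   // clients with 10Mbps links
        int medium = (int) Math.round(clientCount * 0.30); // clients with 5Mbps links
        for (int i = 0; i < clientCount; i++) {
            if (i < fast) {
                mbps[i] = 10;
            } else if (i < fast + medium) {
                mbps[i] = 5;
            } else {
                mbps[i] = 2; // everyone else
            }
        }
        return mbps;
    }
}
```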

Figure A.8: Screenshot from the P2PCDNSim simulation scenario wizard.

Source: Screenshot from simulator’s wizard, made by author.

A.3.2 Network Layer Comparison and CDN Layer Validation

Sargent (2005) describes a technique to validate models called Comparison to Other Models. The idea is to use a valid model to evaluate a new model, comparing various results (e.g., outputs) of the new model with results obtained with the valid model. Therefore, we decided to validate our simulator by comparing it to two reliable simulation tools. First, we examined the Simulator Base module and compared the results of a network transmission using our simulator to the same scenario using Network Simulator 3 (ns-3)⁷, a discrete-event network simulator. Next, we evaluated the CDN Overlay, using our simulator to assess an example scenario presented by CDNSim (STAMOS et al., 2010) and comparing the results.

⁷ https://www.nsnam.org/overview/what-is-ns-3/


To evaluate the Simulator Base module, we ran experiments based on the TCP transmission example scenario provided with ns-3 (usually example/tcp/tcp-large-transfer.cc). This example scenario describes a TCP transmission of a large file (200MB) between two nodes, one acting as the server and the other as the client. To connect them, we considered two topologies: a three-node topology and a 20-node topology. The three-node topology has two nodes connected through a center node, the only one with two network interfaces (the original topology in the example). The other topology has 20 nodes, consisting of 4 ASes with five nodes each, and was generated using BRITE (MEDINA et al., 2001) and its integration⁸ with ns-3. All BRITE topology configuration values were set to default. All links were set to 10Mbps with 1ms delay. The routers’ buffer was 128KB, which is the default send buffer limit used in ns-3.

We considered the following metrics for all experiments comparing the Simulator Base module to ns-3:

• Execution time: the time to run the simulation for that particular scenario.

• Download time: the simulation time elapsed between the beginning and the end of the file transmission.

• Download rate: the rate, in MB/s, at which data was transmitted between the client and the server. It is important to notice that there is a slight difference between the rate presented by ns-3 and the one from P2PCDNSim. The Download rate collected from ns-3 considers all data transferred, including TCP overhead information. On the other hand, the download rate collected from our simulator is calculated on the client side, considering only useful information received by the client.
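As a worked example of the client-side Download rate, dividing the 200MB payload by the 185 s Download time that Table A.1 reports for P2PCDNSim gives roughly 1.08 MB/s, which matches the rate listed there. The helper below only encodes that division; `DownloadRate` is a hypothetical name, and treating the file size as exactly 200 MB of useful data is an assumption.

```java
/** Hypothetical helper: payload-only download rate, as computed on the client side. */
final class DownloadRate {
    /** Useful megabytes received divided by the elapsed download time in seconds. */
    static double megabytesPerSecond(double payloadMegabytes, double downloadSeconds) {
        return payloadMegabytes / downloadSeconds;
    }
}
```

A rate that also counts TCP overhead, as the ns-3 figure does, would use the total bytes on the wire in the numerator and is therefore not directly comparable.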

Figure A.9 shows how the congestion window changes during the simulations for both tools (P2PCDNSim and ns-3). The TCP version used by ns-3 during these experiments was NewReno, which differs from the RFC5681 standard used in our TCP implementation. Looking at other TCP congestion window evaluations (WANG et al., 2004; LAM, 2014), one may notice that the behavior of the congestion window is similar to that of other TCP implementations found in the literature.

Tables A.1 and A.2 show similar download rates and download times for both tools, considering 10Mbps links with 1ms delay. The slightly lower download rate of P2PCDNSim relates to the different TCP versions and to the semantic difference between the metric collection systems of both simulators, as described earlier. Considering Execution time, we can see that for the three-node topology scenario P2PCDNSim’s simulation took about three times longer than ns-3’s. We believe that this relates to the different programming languages used: ns-3 is written in C++, which is generally faster than Java, the language used in our simulator. However, Table A.2 shows that the Execution time for ns-3 increased and is now higher

⁸ https://www.nsnam.org/wiki/BRITE_integration_with_ns-3


than P2PCDNSim’s Execution time. By conducting further experiments, we realized that the reason for this larger simulation time was the amount of data written into traces. By disabling the pcap trace file generation⁹, Execution time dropped to 21 minutes, and it dropped even further, to 6 minutes, when trace collection was completely off. Considering that a simulation without any trace or record is useless, it is fair to state that the minimal practical ns-3 simulation time was almost twice P2PCDNSim’s simulation time for the exact same topology and application.

We also looked into the memory usage of both simulators. Figure A.10 shows memory usage for both simulators during the three-node topology experiment. Figure A.11 shows memory usage for both simulators during the 20-node topology experiment, considering virtual and resident memory. Comparing both figures, one can see that increasing the topology from three to 20 nodes did not considerably affect memory consumption. From our previous experience with P2PCDNSim, we believe that the primary factors responsible for memory consumption are the number of clients in the network and the size and quantity of objects requested.

It is important to notice that, for both ns-3 and P2PCDNSim, there are no random variables in any of the scenarios presented here. Both tools receive the same configuration and perform the TCP transmission according to their respective versions. Therefore, there is no need to replicate experiments, since they always produce the same output.

Figure A.9: TCP congestion window behavior comparing ns-3 and P2PCDNSim.


To further validate our CDN Overlay implementation, we replicated the experiments from the publicly available CDNsim tutorial (CDNSIM TUTORIAL, 2009) on our own simulator. We followed the instructions provided in CDNsim’s tutorial to prepare the scenarios.

⁹ http://www.tcpdump.org/


Figure A.10: Memory usage comparison between ns-3 and P2PCDNSim considering the three nodes scenario.


Figure A.11: Memory usage comparison between ns-3 and P2PCDNSim considering the 20 nodes scenario.


The first step is to set the redirection policy. We set it to non-cooperative with closest origin, since it is the simplest policy option (with the exception of random, which would not be useful for a comparison). Table A.3 summarizes the configuration of both scenarios. The only metric collected by both tools is the cache hit, as shown in Table A.4. One can see that the cache hit ratios are very


Table A.1: Results collected from both simulators for the three nodes topology.

                  ns-3         P2PCDNSim
Execution Time    3m 6s        9m 39s
Download Time     188s         185s
Download Rate     1.24 MB/s    1.08 MB/s

Table A.2: Results collected from both simulators for the 20 nodes topology.

                  ns-3         P2PCDNSim
Execution Time    51m 33s      11m 12s
Download Time     188s         185s
Download Rate     1.24 MB/s    1.08 MB/s

similar for both tools. The small difference seen in the one RS scenario can be attributed to differences in the simulation of the network (for instance, the TCP implementation).

A.4 Lessons Learned

Though we have presented an initial validation of our simulator through a comparison to two others, we understand that this task is by no means exhausted and that more tests need to be carried out. Most users have also been pleased with the globe interface, as it facilitates topology management. Previous works by the authors during the construction of the simulator studied the effects of cache relocation under an infrastructure that allowed for network virtualization. We learned that periodic cache relocation (MOREIRA et al., 2011, 2015a; NEVES et al., 2013) can

Table A.3: Scenario configuration for the comparison between P2PCDNSim and CDNSim.

                     Scenario 1                            Scenario 2
# nodes              50                                    3037
# replica servers    1, 2, 3, 10                           10
Trace                Provided by CDNsim tutorial:          Provided by CDNsim tutorial:
                     trace_file50000 (first 5000 lines)    trace_file50000
Link Capacity        100 Mbit/s                            100 Mbit/s

Table A.4: Cache hit results collected for scenarios 1 and 2 using P2PCDNSim and CDNSim.

# replica servers    CDNSim        P2PCDNSim     CDNSim        P2PCDNSim
                     Scenario 1    Scenario 1    Scenario 2    Scenario 2
1                    24%           20.44%        -             -
2                    14.4%         12.78%        -             -
3                    10.5%         10.26%        -             -
10                   3.72%         3.78%         30.71%        27.52%


present a considerable gain in terms of inter-AS traffic, total network traffic and QoE metrics, such as startup delay and playback continuity. We also observed that such a strategy can greatly improve performance on the same metrics compared to no relocation of caches at all, by adapting to the characteristics of the requests and thereby saving valuable network resources and providing an enhanced experience for the user. Another previous work by the authors (RODRIGUES et al., 2011) studied the effects of different traffic locality strategies when redirecting user requests to RSs. We learned that an AS-based strategy can significantly reduce inter-AS traffic, startup delay and total download times when compared to the random attribution of caches, and can also reduce the observed values for the same metrics when compared to the per-hop (shortest paths independent of AS) strategy.

The simulator’s current primary drawback, memory consumption, is caused by its faithful portrayal of functionalities such as the underlying communication protocols and real-time animation. For example, each packet is associated with a Java object, inheriting all its overhead. Therefore, we are always trying to improve the code to mitigate the high memory consumption as much as possible; for instance, we recently refactored the Report System to reduce the amount of data collected.

A.5 Observational Study

Aside from the validation and comparison with other well-known tools, our simulator has been used in several academic studies. Its flexibility enabled the fast development of several different overlays to evaluate different scenarios. The studies assisted by P2PCDNSim can be grouped into three distinct topics:

• Network management: this group comprises studies that evaluate strategies to improve the management of network resources. An example of such a study is Moreira et al. (2015a), where the authors introduce a technique to detect significant changes in a monitored metric of a CDN to allow adaptation of resource provisioning.

• Replica Placement: this group encompasses studies that propose techniques to manage the placement of replicas. An example of such studies is Rodrigues et al. (2013a), where the authors offer a new dynamic RPA strategy, based on the Greedy approach, which uses the count of data flows through network nodes as its primary metric.

• CDN Resource management: this group includes studies that propose techniques to manage CDN resources, such as replica server storage administration and content eviction. An example of such studies is Moreira et al. (2015b), where the authors propose an efficient cache slicing mechanism that frees the CDN provider from dedicating a different virtual surrogate to each Content Service Provider (CSP) in each location.


In general, we have over 7 conference papers, one master’s thesis and 3 Ph.D. theses related to the P2PCDNSim simulator. Table A.5 summarizes the works related to our simulation tool.

A.6 Concluding Remarks

In this Chapter, we presented P2PCDNSim, a simulation tool designed to assist researchers in proposing and evaluating strategies to improve CDNs. It is flexible enough to simulate many overlay designs, including dynamic and virtualized infrastructures, and its modularized design allows fast development of new overlays. Such flexibility allowed it to be used to evaluate several research problems. In general, our tool is related to more than 7 conference papers, one master’s thesis and 3 Ph.D. theses. Table A.5 summarizes the works related to our simulation tool.

To validate P2PCDNSim, we performed a comparison with two reference tools, ns-3 and CDNSim. The latter is recognized throughout the community, but has some limitations, such as a lack of modularity and support for CDN overlays only. It also has a limited number of inputs for the CDN parameters and strategies. Our tool allows the creation of more complex scenarios and is not limited to the basic CDN components.

The next chapter presents a new RPA strategy called FlowCount, a greedy algorithm based on the number of flows passing through the nodes that compose the network. We used P2PCDNSim to evaluate the new RPA, showing that it maintains similar Quality of Experience (QoE) while decreasing cross traffic during flash crowd events.


Table A.5: List of all contributions related to the P2PCDNSim simulator tool.

Reference | Type

Rodrigues, M., Neves, M., Moreira, J., Sadok, D., Callado, A., Karlsson, P. and Souza, V., 2011, April. On traffic locality and QoE in hybrid CDN-P2P networks. In Proceedings of the 44th Annual Simulation Symposium. | Conference Paper

Moreira, A., Moreira, J., Sadok, D., Callado, A., Rodrigues, M., Neves, M., Souza, V. and Karlsson, P.P., 2011, December. A case for virtualization of content delivery networks. In Proceedings of the Winter Simulation Conference. | Conference Paper

J. A. Moreira, “Cache Strategies for Internet-based Video on-Demand Distribution,” Ph.D. dissertation, Centro de Informática, Universidade Federal de Pernambuco, Recife, Brasil, 2011. | Ph.D. Thesis

M. B. E. Rodrigues, “Traffic Locality and QoE in Hybrid CDN-P2P Networks,” Master dissertation, Centro de Informática, Universidade Federal de Pernambuco, Recife, Brasil, 2011. | Master Thesis

Neves, M., Rodrigues, M., Azevêdo, E., Sadok, D., Callado, A., Moreira, J. and Souza, V., 2013, May. Selecting the most suited cache strategy for specific streaming media workloads. In IFIP/IEEE Integrated Network Management (IM 2013). | Conference Paper

Rodrigues, M., Moreira, A., Neves, M., Azevêdo, E., Sadok, D., Callado, A. and Souza, V., 2013, April. Optimizing cross traffic with an adaptive CDN replica placement strategy. In Proceedings of the 46th Annual Simulation Symposium. | Conference Paper

Rodrigues, M., Moreira, A., Neves, M., Azevêdo, E., Sadok, D., Callado, A. and Souza, V., 2013, May. Flow count: A CDN dynamic Replica Placement Algorithm for cross traffic optimization. In IFIP/IEEE Integrated Network Management (IM 2013). | Conference Paper

Moreira, A., Rodrigues, M., Azevedo, E., Sadok, D., Callado, A. and Souza, V., 2015, May. Analyzing strategies to effectively detect changes in content delivery networks. In IFIP/IEEE Integrated Network Management (IM 2015). | Conference Paper

Moreira, A., Azevedo, E., Kelner, J., Sadok, D., Callado, A. and Souza, V., 2015, May. An adaptable storage slicing algorithm for content delivery networks. In IFIP/IEEE Integrated Network Management (IM 2015). | Conference Paper

A. L. C. Moreira, “An Adaptable Storage Slicing Algorithm for Content Delivery Networks,” Ph.D. dissertation, Centro de Informática, Universidade Federal de Pernambuco, Recife, Brasil, 2015. | Ph.D. Thesis

P. T. Endo, “Role-based Self-Appointment for Resource Management in Distributed Environments,” Ph.D. dissertation, Centro de Informática, Universidade Federal de Pernambuco, Recife, Brasil, 2015. | Ph.D. Thesis


B P2PCDNSim I/O

Example topology XML used to create a simple two-node topology for the P2PCDNSim simulator. In the topology below there are two nodes and one link between them. Both nodes belong to the same AS, namely AS1.

<?xml version="1.0"?>
<!DOCTYPE Links SYSTEM "Routers.dtd">
<Topology>
  <RoutersToAS>
    <RouterToAS>
      <NodeId>0</NodeId>
      <ASId>AS1</ASId>
      <Border>false</Border>
    </RouterToAS>
    <RouterToAS>
      <NodeId>1</NodeId>
      <ASId>AS1</ASId>
      <Border>false</Border>
    </RouterToAS>
  </RoutersToAS>
  <Links>
    <Link>
      <NodeTo>1</NodeTo>
      <Delay>null</Delay>
      <LinkSpeed>null</LinkSpeed>
      <NodeFrom>0</NodeFrom>
      <Queue>null</Queue>
    </Link>
  </Links>
</Topology>


Example topology.properties file describing the basic topology configuration. This file is used to configure the characteristics of all links and nodes present in the topology.

# P2P-CDN simulator properties file
######### Topology properties ########

# Link capacity of the link between every node and his client(s)
LANBusLinkCapacity = 1

# Multiplier for cross links capacity
CrossLinkCapacityMultiplier = 2

# Multiplier for cross links latency
CrossLinkLatencyMultiplier = 2

# Link capacity between nodes
# Inner links capacity, links between AS have
# LinkCapacity * CrossLinkMultiplier
LinkCapacity = 1073741824
# LinkCapacity = 750000000

# Toggle independence links between the clients and their node
Independence = true

# Latency of network links
Latency = 0.005

# Border Node Identifier
Border = Border

#Does the network handles with diferent ASs. It can be Regional,
#Uniq, Multi. Regional is when an AS is a group of peers.
# Uniq is when everybody belongs to the same AS and Multi is
#when each node represents one AS.
ASAware = Regional

#The default AS Id to be used if none is given
DefaultAS = AS0
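A file in this format can be read with the standard `java.util.Properties` class. The sketch below parses such text and derives the capacity of links between ASes as LinkCapacity times the cross-link capacity multiplier, as the file’s own comments describe. The class `TopologyPropertiesSketch` and its methods are hypothetical; how the simulator actually loads and interprets this file is not shown in the text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical reader for a topology.properties-style file; not the simulator's loader. */
final class TopologyPropertiesSketch {
    /** Parses key = value lines; lines starting with '#' are treated as comments. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Capacity of links between ASes: LinkCapacity times CrossLinkCapacityMultiplier. */
    static long crossLinkCapacity(Properties properties) {
        long inner = Long.parseLong(properties.getProperty("LinkCapacity").trim());
        long multiplier = Long.parseLong(
                properties.getProperty("CrossLinkCapacityMultiplier").trim());
        return inner * multiplier;
    }
}
```

With the values shown above, the derived cross-link capacity is 2 × 1073741824 = 2147483648, in whatever unit LinkCapacity uses.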


Example XML describing a set of requests for the P2P overlay. In the example below there are three requests, each one for a different video object and with a different, increasing timestamp.

<?xml version="1.0"?>
<!DOCTYPE RequestsP2P [
<!ELEMENT RequestsP2P (Request+)>
<!ELEMENT Request (Client, Timestamp, VideoObject)>
<!ELEMENT Client (#PCDATA)>
<!ELEMENT Timestamp (#PCDATA)>
<!ELEMENT VideoObject (#PCDATA)>
]>

<RequestsP2P>
  <Request>
    <Client>0</Client>
    <Timestamp>0.33</Timestamp>
    <VideoObject>113</VideoObject>
  </Request>
  <Request>
    ...
  </Request>
</RequestsP2P>

A fragment of a log file from a simulation using the Hybrid CDN-P2P Overlay. We can see peers leaving the network and others interacting with Origin and Surrogate/Replica servers.

1859800 [Peer #25#5] WARN p2pcdnsim.log.P2PCDNSimLoggerCreator - Peer left the network downloaded size = 5701632.0 delay=201.90306212928346 ratio = 0.0271875 for VodClient:AddressNode35.4 with a download rate of [27.701] KBps or [0.027] MB/s
1859804 [Peer #25] WARN p2pcdnsim.log.P2PCDNSimLoggerCreator - [261.904] Peer [VodClient: AddressNode35.4] left the network
1859804 [Peer #25] WARN p2pcdnsim.log.P2PCDNSimLoggerCreator - [261.904] Disposing client node AddressNode35.4
1859810 [Surrogate2 #1] WARN p2pcdnsim.log.P2PCDNSimLoggerCreator - [261.906] Surrogate [AddressNode32] Received Connection message from AddressNode34.1
1859813 [Surrogate2 #1#29452#1] WARN p2pcdnsim.log.P2PCDNSimLoggerCreator - [261.907] SurrogateRoutineByPiece 0 [AddressNode32] Downloaded Piece 45 from chunk 64 from video 7 from @Origin Server@ at [2499.248] KB/s TOTAL DOWNLOAD Rate = 305947.33844057616KB/s
1859818 [Peer #82#3] WARN p2pcdnsim.log.P2PCDNSimLoggerCreator - [261.909] Peer [AddressNode7.14] (Sequential 1) Downloaded Piece 94 from chunk 0 from #Peer AddressNode47.4# at [79.772] KB/s Uploading to 0 Connected to 9 Peers CONTINUITY = 82.6086956521739% - Downlink = 1.430511474609375Mbs - Uplink = 0.3662109375 Mbs TOTAL DOWNLOAD Rate = 26.90790748303225KB/s
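Log entries of this shape can be split into fields for post-processing. The following is a minimal sketch and not part of the simulator: the regular expression, the field names, and the reading of the leading number as elapsed milliseconds and of the bracketed value as simulation time are assumptions inferred from the fragment above.

```python
import re

# Assumed layout: <elapsed> [<thread>] <LEVEL> <logger> - [<sim time>] <message>
# The bracketed simulation time is optional, since the first entry of the
# fragment does not carry one.
LINE_RE = re.compile(
    r"^(?P<elapsed_ms>\d+) \[(?P<thread>[^\]]+)\] (?P<level>\w+) "
    r"(?P<logger>\S+) - (?:\[(?P<sim_time>\d+(?:\.\d+)?)\] )?(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["elapsed_ms"] = int(record["elapsed_ms"])
    if record["sim_time"] is not None:
        record["sim_time"] = float(record["sim_time"])
    return record


sample = (
    "1859804 [Peer #25] WARN p2pcdnsim.log.P2PCDNSimLoggerCreator - "
    "[261.904] Disposing client node AddressNode35.4"
)
record = parse_line(sample)
```

Here `record["thread"]` holds the text between the first pair of brackets and `record["message"]` holds everything after the optional simulation time.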

The sample XML below is a client placement file. It describes the placement and bandwidth capacity for two clients. Each client has a different Id and is connected to a Router in the topology, identified by its Id; in this sample both clients attach to Router 22. The Uplink and Downlink tags represent each client's upload and download capacities, respectively.

<?xml version="1.0"?>
<!DOCTYPE Placements SYSTEM "RequestsP2P.dtd">

<Client>2</Client>
<Router>22</Router>
<Downlink>2097152</Downlink>
<Uplink>2097152</Uplink>

<Client>3</Client>
<Router>22</Router>
<Downlink>2097152</Downlink>
<Uplink>2097152</Uplink>



Below is a sample of the file that represents clients requesting objects. This XML has a list of requests, each describing the client, the timestamp, and the ID of the requested object. The simulator sorts all requests according to the timestamp and executes them sequentially.

<?xml version="1.0"?>
<!DOCTYPE Requests SYSTEM "Requests.dtd">
<Requests>
<Request>
<Client>1</Client>
<Timestamp>0</Timestamp>
<VideoObject>10250</VideoObject>

<Client>2</Client>
<Timestamp>4</Timestamp>
<VideoObject>10250</VideoObject>

</Request>
</Requests>
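The sort-then-execute behaviour described for the request file can be sketched as follows. This is an illustrative reading, not the simulator's implementation: `load_requests`, `run`, the dictionary keys, and the inline sample (written out of order on purpose, with one Request element per client) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical request file content, deliberately out of timestamp order.
SAMPLE = """<Requests>
  <Request><Client>2</Client><Timestamp>4</Timestamp><VideoObject>10250</VideoObject></Request>
  <Request><Client>1</Client><Timestamp>0</Timestamp><VideoObject>10250</VideoObject></Request>
</Requests>"""


def load_requests(xml_text):
    """Parse Request elements and return them sorted by timestamp."""
    root = ET.fromstring(xml_text)
    requests = []
    for req in root.findall("Request"):
        requests.append({
            "client": int(req.findtext("Client")),
            "timestamp": float(req.findtext("Timestamp")),
            "video": int(req.findtext("VideoObject")),
        })
    # list.sort is stable, so requests with equal timestamps keep file order.
    requests.sort(key=lambda r: r["timestamp"])
    return requests


def run(requests, handler):
    """Execute the requests one after another, in the order given."""
    for request in requests:
        handler(request)


executed = []
run(load_requests(SAMPLE), executed.append)
```

After the run, `executed` lists the request of client 1 before that of client 2, although the file stored them the other way round. Whether the simulator itself preserves file order for equal timestamps is not stated in the text.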


C Report System Graph Examples

Here we show example graphs generated by the Report System (package p2pcdnsim.report) provided by the simulator. All figures shown here are real examples of graphs plotted directly by the system during one of our experiments. When the Report System is on, it gathers all reported events into a serialized file. This file holds a complete description of the simulation: the parameters used to configure the simulated scenario and all events collected. With this file, the user can compare this scenario with any other simulated scenario; the Analyzer performs this comparison. It compares events and plots graphs for important collected events, such as Total Network traffic, Cross and Inner AS traffic, Cache hit, Startup Delay, and so on.

We extracted all examples below from a preliminary test experiment run by André Moreira while he was testing the Cache Slicing overlay, which would later be part of a paper (MOREIRA et al., 2015b) and his Ph.D. thesis (MOREIRA, 2015). We start with Overlay Metrics. Figures C.1 and C.2 present, respectively, the cache hits and cache misses reported by all replica servers during the simulation. We can see that the proposed strategies increase cache hits, and therefore decrease cache misses, in comparison with the baseline (No Slicing).

Turning to Network metrics, Figures C.3, C.4, and C.5 illustrate the traffic rate, in bits per second, over time for Cross AS traffic, Inner AS traffic, and Total Network traffic, respectively. One can notice lower Cross and Inner AS traffic when using the proposed techniques in comparison with the baseline.
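To make the distinction between these three series concrete, the sketch below splits transfers into cross AS and inner AS traffic and accumulates a rate per time bucket. It is not the Report System's code: the `(time, source AS, destination AS, bits)` event layout, the function `traffic_rates`, the one-second bucket, and the definition of total traffic as the sum of the other two are assumptions made for illustration.

```python
from collections import defaultdict


def traffic_rates(transfers, bucket_seconds=1.0):
    """Accumulate bits/second per time bucket for cross, inner and total traffic.

    transfers: iterable of (time_s, src_as, dst_as, bits) tuples (hypothetical layout).
    """
    series = {
        "cross": defaultdict(float),
        "inner": defaultdict(float),
        "total": defaultdict(float),
    }
    for time_s, src_as, dst_as, bits in transfers:
        bucket = int(time_s // bucket_seconds)
        # Traffic is inner when both endpoints share an AS, cross otherwise.
        kind = "inner" if src_as == dst_as else "cross"
        rate = bits / bucket_seconds
        series[kind][bucket] += rate
        series["total"][bucket] += rate
    return {name: dict(values) for name, values in series.items()}


events = [
    (0.2, "AS0", "AS0", 800.0),
    (0.7, "AS0", "AS1", 200.0),
    (1.1, "AS1", "AS1", 400.0),
]
rates = traffic_rates(events)
```

With these three made-up events, the first bucket holds both an inner and a cross transfer, while the second bucket holds inner traffic only.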

Finally, Figures C.6 and C.7 illustrate QoE metrics collected by the simulator. Figure C.6 shows the mean Startup Delay for all clients that requested content during the experiment, whereas Figure C.7 illustrates all reported Startup Delay values over simulation time.


Figure C.1: Cache hit count collected from all replica servers during the simulation.


Figure C.2: Cache miss count collected from all replica servers during the simulation.



Figure C.3: Cross AS traffic rate, in bits/second, during the simulation.


Figure C.4: Inner AS traffic rate, in bits/second, during the simulation.



Figure C.5: Total Network traffic rate, in bits/second, during the simulation.


Figure C.6: Startup Delay mean collected from all clients that requested content during the experiment.



Figure C.7: Startup Delay through simulation time collected from all clients that requested content during the experiment.
