
Overlay (and P2P) Networks, Part II

Samu Varjonen, Ashwin Rao
University of Helsinki, Department of Computer Science
09.02.2017

● Structured Overlay Networks
  – Content Delivery Networks
    ● Akamai
    ● Coral
  – Amazon Dynamo


Content Delivery Networks


Limitations of Web Proxies (Caching)

● Inability to cache all objects/content

– Dynamic Data

– Encrypted data

● Server Side Analytics

– Hit Metering, User Demographics, etc.

● Scalability

– Inability to support flash crowds

● …


Content Delivery Networks

● Role

– Redirect content requests to an 'optimal site'

– Cache and Serve content from 'optimal site'

– Export logs and other information to origin servers

● Redirection mechanism

– DNS redirection

– URL rewriting


Critical Issues in Deploying CDNs

● Server Placement

– Where to place the servers?

– How many in each location?

● Content Selection

– Which content to distribute in CDNs?

● Content Replication

– Proactive push from origin server

– Cooperative vs Uncooperative Pulls

● Pricing

George Pallis et al. "Insight and Perspectives for Content Delivery Networks." Communications of the ACM 49, 1 (January 2006).


Server Placement Problem

Given N possible locations at the edge of the Internet and the ability to place K (K < N) surrogate servers, how should the servers be placed to minimize the total cost?

● Minimum K-median problem

– Given N points we need to select K centers

– Assign each input point j to a center 'closest' to it

– Minimize the sum of distances between each j and its center

● NP-Hard
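The K-median objective above is easy to state in code even though optimizing it exactly is NP-hard. Below is a minimal Python sketch of a common greedy heuristic: repeatedly add the candidate site that most reduces the total client-to-nearest-server distance. The distances, client IDs, and site names are made-up example values, not data from the lecture.

# Greedy heuristic for the minimum K-median server placement problem.
# Illustrative sketch only; the distance matrix (e.g. RTTs in ms) is assumed.

def total_cost(clients, centers, dist):
    """Sum over clients of the distance to the closest chosen center."""
    return sum(min(dist[j][c] for c in centers) for j in clients)

def greedy_k_median(clients, candidates, dist, k):
    """Pick k candidate sites, each time adding the one that lowers the
    total assignment cost the most (no optimality guarantee)."""
    centers = []
    for _ in range(k):
        best = min((c for c in candidates if c not in centers),
                   key=lambda c: total_cost(clients, centers + [c], dist))
        centers.append(best)
    return centers

# Toy example: 4 clients, 3 candidate sites.
dist = {
    0: {"A": 10, "B": 50, "C": 40},
    1: {"A": 15, "B": 45, "C": 35},
    2: {"A": 60, "B": 5,  "C": 30},
    3: {"A": 55, "B": 10, "C": 25},
}
print(greedy_k_median(clients=[0, 1, 2, 3], candidates=["A", "B", "C"], dist=dist, k=2))
# -> ['B', 'A']: each client is then assigned to its nearest chosen surrogate.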


Redirection Techniques

● Routing Strategy

– Anycast

– Load Balancing

● Application specific selection

– HTTP redirection

● Naming based redirection

– DNS


DNS Based Redirection

[Figure: DNS-based redirection involving a client, the ISP's DNS server, the content provider's DNS server, and the CDN's DNS server; the lookup proceeds in six numbered steps (1-6).]


Akamai CDN (overview)

● Client requests content from the origin server
  – URLs for content served by the CDN are rewritten in the origin's response (see the sketch below)
● Client resolves the <content>.<akamai host> name
● A server from the client's region (the best server) is chosen
● Client fetches the content from the Akamai server
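As a rough illustration of the URL-rewriting step, the sketch below rewrites an origin URL so that its hostname falls under a CDN-controlled DNS zone; the zone name and helper function are hypothetical, not Akamai's actual naming scheme.

# Hypothetical sketch of CDN URL rewriting in an origin server's response.
# The CDN zone below is invented; real CDNs use their own naming conventions.

from urllib.parse import urlparse

CDN_SUFFIX = "cdn.example.net"   # assumed CDN-controlled DNS zone

def cdnify(url: str) -> str:
    """Rewrite an origin URL so its hostname resolves under the CDN's zone;
    the CDN's DNS can then answer with the address of a nearby edge server."""
    host = urlparse(url).netloc
    return url.replace(host, host + "." + CDN_SUFFIX, 1)

print(cdnify("http://images.abc.com/logo.png"))
# -> http://images.abc.com.cdn.example.net/logo.png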


Akamai (initial request)

[Figure: initial request flow through the content source, the DNS root server, Akamai's high-level DNS server, two Akamai low-level DNS servers, and an Akamai content server, in twelve numbered steps (1-12).]

Srinivasan Seshan, "Computer Networking: Caching, CDN, Consistent Hashing, P2P"


Akamai (subsequent request)

[Figure: subsequent request flow through the same components, now in only six numbered steps (1-6).]

Srinivasan Seshan, "Computer Networking: Caching, CDN, Consistent Hashing, P2P"


Democratizing Content Publication with Coral

(Coral CDN)


Coral Objectives

● Pool resources to dissipate flash crowds
● Work with unmodified clients
● Fetch content only once from the origin
● No centralized management

[Figure: browsers fetch content through CoralCDN instead of directly from the origin server; each Coral node runs an HTTP proxy, a DNS server, and the Coral indexing layer.]


Using Coral

● Origin server rewrites URLs
  – abc.com → abc.com.coralhost:coralport
  – Redirects clients to a Coral server
● Coral CDN components
  – DNS server
    ● Given the address of the resolver used by the client, returns the address of a proxy near the client (see the sketch below)
  – HTTP proxy
    ● Given the URL, finds the nearest proxy that has the content
    ● Caches the content (DHT)
  – Distributed Sloppy Hash Table (DSHT)
    ● Basic DHTs (e.g., Chord) provide no load balancing or content locality
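The DNS-server component can be illustrated with a small, hypothetical Python sketch: given the address of the resolver that issued the query, answer with a proxy assumed to be near the client. The proxy names and network prefixes are invented for the example; Coral's real mechanism measures proximity within its clusters.

# Hypothetical sketch of Coral's DNS-server role: answer with a proxy near the
# client, approximated here by the network of the client's resolver.

import ipaddress

PROXIES = {                                   # assumed prefix -> nearby proxy table
    ipaddress.ip_network("128.214.0.0/16"): "proxy-helsinki.example.org",
    ipaddress.ip_network("18.0.0.0/8"):     "proxy-boston.example.org",
}
DEFAULT_PROXY = "proxy-anywhere.example.org"

def proxy_for_resolver(resolver_ip: str) -> str:
    """Return the address of a proxy near the client that uses this resolver."""
    addr = ipaddress.ip_address(resolver_ip)
    for net, proxy in PROXIES.items():
        if addr in net:
            return proxy
    return DEFAULT_PROXY

print(proxy_for_resolver("128.214.11.7"))   # -> proxy-helsinki.example.org
print(proxy_for_resolver("203.0.113.9"))    # -> proxy-anywhere.example.org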


Coral System Overview

Freedman, Michael, et al. "Democratizing Content Publication with Coral." In NSDI 2004.


Hierarchical Indexing

● Diameter, clusters, levels
  – Each Coral node is part of several DSHTs, called clusters
  – Each cluster is characterized by a maximum RTT, called its diameter
  – A fixed hierarchy of diameters, called levels
  – A group of nodes can form a level-i cluster if their pairwise RTTs are below the threshold for level i
    ● The paper uses 3 levels: 20 ms (level 2), 60 ms (level 1), and unbounded (level 0)
● SHA-1 for Coral keys and node IDs
● Bitwise XOR as the distance metric (Kademlia)
  – A longer matching prefix means numerically closer
  – A key is stored at nodes whose IDs are "close" to the key (see the sketch below)
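A minimal sketch of the XOR metric, assuming identifiers truncated to 32 bits for readability (Coral uses full 160-bit SHA-1 digests):

# Kademlia-style XOR distance over SHA-1 identifiers (truncated here).

import hashlib

def coral_id(data: bytes, bits: int = 32) -> int:
    """SHA-1 based identifier, truncated to `bits` bits for readability."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - bits)

def xor_distance(a: int, b: int) -> int:
    """Bitwise XOR: a longer shared prefix gives a numerically smaller distance."""
    return a ^ b

key = coral_id(b"http://abc.com/logo.png")
nodes = {name: coral_id(name.encode()) for name in ("node-a", "node-b", "node-c")}

# The key is stored at (and looked up from) nodes whose IDs are closest to it.
closest = min(nodes, key=lambda name: xor_distance(nodes[name], key))
print(closest, format(xor_distance(nodes[closest], key), "032b"))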


Routing and Sloppy Storage

● Routing
  – Routing table size is logarithmic in the total number of nodes
  – Lookups are initiated in the highest-level cluster and continue at lower levels on a miss (see the sketch below)
● Sloppy Storage
  – Cache key/value pairs at nodes whose IDs are close to the key being referenced
  – Reduces hot-spot congestion for popular content

Freedman, Michael, et al. "Democratizing Content Publication with Coral." In NSDI 2004.
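A toy sketch of the lookup order, assuming each cluster is represented by a plain dictionary standing in for that level's DSHT (the cluster contents are invented):

# Coral-style hierarchical lookup: query the highest level (smallest-diameter,
# nearby) cluster first and fall back to lower levels on a miss.

def hierarchical_get(key, clusters):
    """`clusters` maps level -> dict acting as that level's DSHT.
    Levels are tried from the highest (e.g. 2, ~20 ms diameter) down to 0 (global)."""
    for level in sorted(clusters, reverse=True):
        value = clusters[level].get(key)
        if value is not None:
            return value, level          # HIT in this cluster
    return None, None                    # MISS everywhere

clusters = {
    2: {},                                        # local cluster: nothing cached yet
    1: {"coral:url123": "proxy-17.example.org"},  # regional cluster has a pointer
    0: {"coral:url123": "proxy-42.example.org"},  # global cluster
}
print(hierarchical_get("coral:url123", clusters))  # -> ('proxy-17.example.org', 1)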


Coral Implemented on PlanetLab

PlanetLab is a global research network; as of February 2014, it had 1,181 nodes at 567 sites.


Reduction in Server Load

Freedman, Michael, et al. "Democratizing Content Publication with Coral." In NSDI 2004.


Dynamics of Flash Crowds

Freedman, Michael. "Experiences with CoralCDN: A Five-Year Operational View." In NSDI 2010.

28% of 30-second epochs have no domains with a rate increase of one order of magnitude or more.


Insights from a Five-Year Deployment

● A large majority of CoralCDN's traffic does not require any cooperative caching
  – Handling of flash crowds relies on cooperative caching
● Flash crowds
  – A small fraction of CoralCDN's domains experience large rate increases within short time periods
  – Flash-crowd domains' traffic accounts for a small fraction of the total requests
  – Request rate increases very rarely occur on the order of seconds
● Content delivery via untrusted nodes requires the HTTP protocol to support end-to-end signatures for content integrity

Freedman, Michael. "Experiences with CoralCDN: A Five-Year Operational View." In NSDI 2010.


Other CDNs


P4P (Provider Portal for Applications)

● P2P applications may be oblivious to the underlying network
  – A lot of inter-domain traffic (Karagiannis et al. 2005)
● Approaches to address this problem
  – ISP approaches
    ● Block P2P, rate-limit P2P, cache content, etc.
  – P2P approaches
    ● Locality (Ono Project)


iTracker of P4P

● The network provider runs an iTracker
● The iTracker is used by ISPs to provide additional information about the network topology
  – P2P networks may choose to use this information to optimize data delivery

Haiyong Xie et al. "P4P: Provider Portal for Applications." In SIGCOMM 2008.


Maygh P2P CDN

● A P2P CDN running in the browser
  – Leverages the WebSockets, WebRTC, and WebStorage APIs

Liang Zhang et al. "Maygh: Building a CDN from Client Web Browsers." In EuroSys 2013.


CDNI (CDN Interconnection)

● Leverage collective CDN footprint

– One CDN to reuse resources of another CDN provider

– ISPs can deploy their own CDNs

Niven-Jenkins et al. "Content Distribution Network Interconnection (CDNI) Problem Statement." RFC 6707 (2012).


Amazon Dynamo


ACID (Recap)

● Atomicity
  – All or nothing
● Consistency
  – A successful transaction commits only legal results
● Isolation
  – Events within a transaction must be hidden from other transactions running concurrently
● Durability
  – Once a transaction has committed its results, the system must guarantee that the results survive subsequent malfunctions

Theo Haerder et al. "Principles of Transaction-Oriented Database Recovery." ACM Computing Surveys. 1983.


CAP Theorem

● C: Strong Consistency (single-copy ACID consistency)
● A: High Availability (available at all times)
● P: Partition Resilience (survive partitions between replicas)

Pick any two:
● C and A without P
● C and P without A
● A and P without C

Popular work-around: reduced consistency (eventual consistency) or reduced availability

A. Fox et al. "Harvest, Yield, and Scalable Tolerant Systems." HotOS. 1999.


Two-Phase Perspective of CAP

● Two-phase commit
  – P1: The coordinator asks the databases to perform a pre-commit and asks them if a commit is possible. If all DBs agree, proceed to P2.
  – P2: The coordinator asks the DBs to commit.
● Two-phase commit supports consistency and partitioning. How is availability violated?
  – The availability of any system is the product of the availabilities of the components required for the operation (see the sketch below)
● ACID provides consistency. Partition tolerance is essential. How do you achieve availability?
  – BASE

Dan Pritchett. "BASE: An ACID Alternative." ACM Queue. 2008.
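To make the availability-product point concrete, here is a small arithmetic sketch; the 99.9% per-component availability is an assumed example value, not a figure from the article.

# Availability of a 2PC operation as the product of component availabilities.

from math import prod

def combined_availability(availabilities):
    """All listed components must be up for the coordinated commit to succeed."""
    return prod(availabilities)

coordinator = 0.999
databases = [0.999, 0.999]        # two partitioned databases in the transaction

print(combined_availability([coordinator] + databases))
# -> ~0.997: already lower than any single component, and it keeps dropping as
#    more participants are added.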


BASE

● Basically Available, Soft state, Eventually consistent
● Strong vs eventual consistency (informal comparison)
  – Strong: every replica sees every update in the same order (atomic updates)
  – Eventual: every replica will eventually see updates and eventually agree on all values (non-atomic updates)
● Eventual consistency
  – The database will be in a state of flux, but it will eventually become consistent
  – Reads might not return the results of the latest update

Dan Pritchett. "BASE: An ACID Alternative." ACM Queue. 2008.


Requirements from Dynamo

● Key-value store
  – Shopping carts, seller lists, preferences, product catalog
● System built using off-the-shelf hardware
● Platform must scale to support continuous growth
● Address the tradeoff between high availability, guaranteed performance, and cost-effectiveness
  – "The system needs to have scalable and robust solutions for load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request marshalling, request routing, system monitoring and alarming, and configuration management"

G. DeCandia et al. "Dynamo: Amazon's Highly Available Key-Value Store." In SOSP 2007.


Partitioning and Replication in Dynamo

● Consistent Hashing DHT

– Virtual nodes in DHT

– Each physical node added as multiple virtual nodes

● Each data-item replicated in N nodes

– Each virtual node responsible for the region between it and its Nth predecessor

– Preference list: the list of nodes (in multiple datacenters) storing a key (see the sketch below)

G. DeCandia et al. "Dynamo: Amazon's Highly Available Key-Value Store." In SOSP 2007.
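A minimal sketch of this partitioning scheme, with invented node names and parameter values (the hash function, the number of virtual nodes, and N = 3 are assumptions for the example):

# Dynamo-style partitioning: consistent hashing with virtual nodes, replicating
# each key on the N distinct physical nodes that follow it on the ring.

import hashlib
from bisect import bisect_right

def h(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")

class Ring:
    def __init__(self, nodes, vnodes_per_node=8, n_replicas=3):
        self.n = n_replicas
        self.ring = sorted((h(f"{node}#{i}"), node)
                           for node in nodes for i in range(vnodes_per_node))

    def preference_list(self, key: str):
        """Walk clockwise from the key's position, collecting N distinct physical nodes."""
        start = bisect_right(self.ring, (h(key), ""))
        nodes, i = [], start
        while len(nodes) < self.n:
            node = self.ring[i % len(self.ring)][1]
            if node not in nodes:
                nodes.append(node)
            i += 1
        return nodes

ring = Ring(["A", "B", "C", "D"])
print(ring.preference_list("cart:alice"))   # e.g. ['C', 'A', 'D']; the first entry acts as coordinator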


API

● get(key)
  – May return many versions of the same object
● put(key, context, object)
  – Context: encodes system metadata, such as the version of the object
  – May return to its caller before the update has been applied at all the replicas
  – An object may have different version sub-histories
● Vector-clock-based versioning
  – One vector clock is associated with every version of every object (see the sketch below)
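A small sketch of vector-clock versioning; the server names and the D1-D4 update sequence mirror the style of example used in the Dynamo paper, but the helper functions are illustrative only.

# Vector clocks for object versions: compare and advance.

def descends(a: dict, b: dict) -> bool:
    """True if clock `a` dominates `b`, i.e. version a supersedes version b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def bump(clock: dict, node: str) -> dict:
    """Coordinator `node` records a new write on top of `clock`."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

d1 = bump({}, "Sx")           # D1: [Sx,1]
d2 = bump(d1, "Sx")           # D2: [Sx,2]
d3 = bump(d2, "Sy")           # D3: [Sx,2],[Sy,1]
d4 = bump(d2, "Sz")           # D4: [Sx,2],[Sz,1], written concurrently with D3

print(descends(d3, d2))                       # True: D3 supersedes D2
print(descends(d3, d4) or descends(d4, d3))   # False: conflicting versions, the client reconciles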


Data Versioning

● Object versions: D1, D2, D3, ...
● Assume the object is a shopping cart. Requirement: additions to the cart don't get lost, but deletions can be lost
● All versions of the object committed to the system are returned when read

G. DeCandia et al. "Dynamo: Amazon's Highly Available Key-Value Store." In SOSP 2007.


Sloppy Quorum

● Reads and writes involve N nodes from the preference list
  – R: minimum number of nodes that must participate in a read
  – W: minimum number of nodes that must participate in a write
● R + W > N
  – R = W = 5 → high consistency, but the system is vulnerable to network partitions
  – R = W = 1 → weak consistency under failure
  – Typical values (N, R, W) = (3, 2, 2) → a balance between performance and consistency (see the sketch below)
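A tiny sketch of the intersection condition: with R + W > N, every read quorum overlaps every write quorum in at least one node, so a read sees at least one copy of the latest successful write (the parameter triples below are examples).

# Quorum-intersection check for sloppy-quorum parameters.

def quorums_overlap(n: int, r: int, w: int) -> bool:
    return r + w > n

for n, r, w in [(3, 2, 2), (5, 5, 5), (3, 1, 1)]:
    print((n, r, w), "overlap guaranteed" if quorums_overlap(n, r, w) else "stale reads possible")
# (3, 2, 2) overlap guaranteed
# (5, 5, 5) overlap guaranteed
# (3, 1, 1) stale reads possible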


Read and Write Operations

● Coordinator
  – Node responsible for reads/writes
  – The first node in the preference list
● Write operation (see the sketch below)
  – The coordinator generates a new vector clock for the version
  – Writes locally and forwards to N-1 nodes; if W-1 nodes respond, the write is successful
● Read operation
  – The coordinator forwards the request to N-1 nodes; if R-1 nodes respond, the result is forwarded to the user
  – The user resolves conflicts and writes back the result
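A rough sketch of the coordinator's two paths under a sloppy quorum; replica transport and failure handling are simulated with plain in-memory objects, and the N/R/W values and key names are assumptions for the example.

# Coordinator-side write and read paths (simplified, in-memory simulation).

N, R, W = 3, 2, 2

class Replica:
    def __init__(self):
        self.store = {}                   # key -> (vector clock, value)
    def put(self, key, clock, value):
        self.store[key] = (clock, value)
        return True                       # ack
    def get(self, key):
        return self.store.get(key)

def coordinator_put(replicas, key, value, clock):
    """Write locally, forward to the other N-1 replicas, succeed on >= W acks."""
    acks = sum(1 for r in replicas[:N] if r.put(key, clock, value))
    return acks >= W

def coordinator_get(replicas, key):
    """Query N replicas; once >= R have answered, return all versions found."""
    answers = [v for v in (r.get(key) for r in replicas[:N]) if v is not None]
    return answers if len(answers) >= R else None

replicas = [Replica() for _ in range(N)]
print(coordinator_put(replicas, "cart:alice", ["book"], clock={"Sx": 1}))   # True
print(coordinator_get(replicas, "cart:alice"))   # versions from >= R replicas; client reconciles conflicts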


Membership Changes

● Gossip-based protocol to propagate membership changes
  – Every second, each node contacts a peer chosen at random, and the two nodes efficiently reconcile their persisted membership-change histories
● Each node is aware of the key ranges handled by its peers (a gossip sketch follows below)
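A toy sketch of the gossip behaviour; a real anti-entropy protocol exchanges digests of the histories, but simply merging two dictionaries per contact is enough to show how changes spread. The node count and random seed are arbitrary.

# Gossip-based propagation of membership-change histories (toy simulation).

import random

class Node:
    def __init__(self, name):
        self.name = name
        self.history = {name: "joined"}   # persisted membership changes seen so far

    def gossip_with(self, peer):
        merged = {**self.history, **peer.history}   # reconcile the two histories
        self.history, peer.history = dict(merged), dict(merged)

random.seed(1)
nodes = [Node(f"n{i}") for i in range(8)]
rounds = 0
while any(len(n.history) < len(nodes) for n in nodes):
    for n in nodes:                       # each node contacts one random peer per round
        n.gossip_with(random.choice([p for p in nodes if p is not n]))
    rounds += 1
print(f"all {len(nodes)} membership changes known everywhere after {rounds} rounds")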


Handling Failures: Hinted Handoff

● Imagine A goes down and N = 3
● Keys stored by A will now be stored by D
● D is hinted in the metadata that it is storing keys meant for A
● When A recovers, the keys at D are copied back to A (see the sketch below)
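A minimal sketch of the handoff, following the slide's A/D example; the data structures and method names are invented for illustration.

# Hinted handoff: D accepts writes meant for the failed node A, records a hint,
# and hands the data back once A recovers.

class Node:
    def __init__(self, name):
        self.name, self.up = name, True
        self.data, self.hints = {}, {}    # hints: intended_owner -> {key: value}

    def store(self, key, value, intended_owner=None):
        if intended_owner and intended_owner != self.name:
            self.hints.setdefault(intended_owner, {})[key] = value
        self.data[key] = value

    def handoff(self, recovered):
        for key, value in self.hints.pop(recovered.name, {}).items():
            recovered.store(key, value)

A, D = Node("A"), Node("D")
A.up = False                                           # A goes down
D.store("cart:alice", ["book"], intended_owner="A")    # D keeps the hint
A.up = True                                            # A recovers
D.handoff(A)                                           # keys at D are copied back to A
print(A.data)                                          # {'cart:alice': ['book']}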


Handling Failures: Merkle Trees

● Minimize the amount of transferred data
● Merkle tree:
  – Leaves are hashes of keys
  – Parents are hashes of their children
● Each node maintains a separate Merkle tree for each key range (see the sketch below)

[Figure: a small Merkle tree over blocks B1-B5; the leaves are H(B1)...H(B5), and internal nodes are hashes of their children, e.g. H(H(B1), H(B2)), up to the root.]
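A small sketch of building per-key-range Merkle trees and comparing their roots; the key/value data is invented, and a real implementation would descend the trees to locate the differing keys rather than only compare roots.

# Merkle tree over a key range: replicas can detect divergence by exchanging
# only a few hashes instead of the whole range.

import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Leaves are hashes; repeatedly hash pairs of children up to the root."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:                # odd count: carry the last hash up
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def range_root(kv: dict) -> bytes:
    return merkle_root([H(f"{k}={v}".encode()) for k, v in sorted(kv.items())])

replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "STALE", "k3": "v3"}

print(range_root(replica_a) == range_root(dict(replica_a)))   # True: key range in sync
print(range_root(replica_a) == range_root(replica_b))         # False: replicas must synchronize k2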


Summary

G. DeCandia et al. "Dynamo: Amazon's Highly Available Key-Value Store." In SOSP 2007.


References


Important References

● A. Fox et al. "Harvest, Yield, and Scalable Tolerant Systems." HotOS. 1999.
● Theo Haerder et al. "Principles of Transaction-Oriented Database Recovery." ACM Computing Surveys. 1983.
● Dan Pritchett. "BASE: An ACID Alternative." ACM Queue. 2008.
● G. DeCandia et al. "Dynamo: Amazon's Highly Available Key-Value Store." In SOSP 2007.