A Scalable Content-Addressable Network

Date post: 13-Jan-2016
Upload: linus
A Scalable Content-Addressable Network

Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker

Pirammanayagam Manickavasagam

Overview

Introduction
Design
Design Improvements
Design Review
Related Work
Discussion

Introduction

Hash table functionality: maps a key to a value.

A Content-Addressable Network (CAN) is a distributed infrastructure that provides hash-table-like functionality on Internet-like scales.

Characteristics: scalable, fault-tolerant and completely self-organizing.

Introduction (cont.)

Napster: locating a file is centralized.

Gnutella: floods the request for a file, so it is not scalable.

CAN provides a solution:
  Scalable: nodes maintain only a small amount of control state.
  Distributed: the hash table is stored across all peers.

Design

Each node stores a chunk (a zone) of the hash table, plus details of the adjacent zones.

Requests are forwarded towards the CAN node whose zone contains the key.

Indexing uses a virtual d-dimensional Cartesian coordinate space. The coordinates are purely logical.

Coordinate Space

[Figure: 2-d coordinate space with corners (0,0), (1,0) and (0,1), partitioned into zones owned by nodes A, B, C and D]

Each node randomly picks a coordinate. The coordinate space is dynamically partitioned among the nodes.

Each node owns its individual zone

Design (cont.)

Inserting a pair (key K1, value V1):
  Use a hash function to map K1 to a point P1 in the coordinate space.
  The pair is then stored at the node that owns the zone containing P1.

Retrieving a value:
  The requester must know the key; hashing it with the same function identifies the point, and hence the node, that holds the value.
  Each node learns and maintains a table with the details of its adjacent nodes.
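
The insert and retrieve steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name a hash function, so SHA-256 is an arbitrary choice, and `key_to_point`, `owns` and `ToyCAN` are hypothetical names. All zones live in one process here, whereas a real CAN spreads them over many nodes.

```python
import hashlib

def key_to_point(key: str, d: int = 2) -> tuple:
    """Hash a key to a point in the unit d-dimensional coordinate space."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Assumption: 4 digest bytes per coordinate, scaled into [0, 1).
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32
        for i in range(d)
    )

def owns(zone, point) -> bool:
    """A zone is a (lo, hi) pair of corner tuples; it owns lo <= p < hi on every axis."""
    lo, hi = zone
    return all(l <= p < h for l, p, h in zip(lo, point, hi))

class ToyCAN:
    """Single-process stand-in for a CAN: one dictionary per zone owner."""

    def __init__(self, zones, d=2):
        self.zones = zones
        self.d = d
        self.store = {name: {} for name in zones}

    def owner(self, key):
        point = key_to_point(key, self.d)
        return next(name for name, zone in self.zones.items() if owns(zone, point))

    def insert(self, key, value):
        self.store[self.owner(key)][key] = value

    def retrieve(self, key):
        return self.store[self.owner(key)].get(key)

# Hypothetical layout: four nodes, each owning one quadrant of the unit square.
zones = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "D": ((0.5, 0.5), (1.0, 1.0)),
}
can = ToyCAN(zones)
can.insert("song.mp3", "10.0.0.7")
```

Because insert and retrieve hash the key the same way, both arrive at the same zone owner without any central index.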

Routing

Information needed for routing:
  Each CAN node holds a routing table containing the IP address and virtual coordinate zone of each of its neighbors.
  Two nodes are neighbors if their zones overlap along d-1 dimensions and abut along one dimension.
  In a d-dimensional coordinate space partitioned into equal zones, each node maintains 2d neighbors.

In the figure, nodes 5 and 1 are neighbors: 5's zone spans the same Y coordinates as 1's, and its X span abuts 1's.
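
The neighbor rule can be written as a small predicate. The sketch below assumes zones are axis-aligned boxes given as `(lo, hi)` corner tuples and, for brevity, ignores any wraparound at the edges of the space; `are_neighbors` and the example zones are hypothetical.

```python
def are_neighbors(zone_a, zone_b) -> bool:
    """True if the zones abut along exactly one axis and overlap along the other d-1."""
    (lo_a, hi_a), (lo_b, hi_b) = zone_a, zone_b
    abutting = 0
    overlapping = 0
    for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b):
        if ha == lb or hb == la:
            abutting += 1          # the two spans touch end to end on this axis
        elif la < hb and lb < ha:
            overlapping += 1       # the two spans share an interval on this axis
    return abutting == 1 and overlapping == len(lo_a) - 1

# Hypothetical zones: node 5 sits directly to the left of node 1.
zone_5 = ((0.0, 0.0), (0.5, 0.5))
zone_1 = ((0.5, 0.0), (1.0, 0.5))
diagonal = ((0.5, 0.5), (1.0, 1.0))   # touches zone_5 only at a corner
```

Zones that meet only at a corner abut along two axes and overlap along none, so the predicate rejects them.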

Routing (cont.)

A CAN message carries its destination coordinates.
Routing proceeds by simple greedy forwarding to the neighbor closest to the destination.
Average path length = (d/4)(n^(1/d)) hops, where n is the number of zones.
Because many paths exist between two points, the network keeps routing even if some nodes fail.
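
A minimal simulation can check the path-length formula against greedy forwarding. Two assumptions go beyond the slides: the space is cut into k equal zones per axis (so n = k^d), and each axis wraps around as a torus, as in the original CAN paper. The function names are hypothetical.

```python
from itertools import product

def torus_dist(a, b, k):
    """Hop distance between grid cells a and b when every axis wraps around."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))

def greedy_route(src, dst, k):
    """Forward to whichever of the 2d neighbors is closest to dst; return the hop count."""
    hops, cur = 0, src
    while cur != dst:
        neighbors = [
            cur[:i] + ((cur[i] + step) % k,) + cur[i + 1:]
            for i in range(len(cur))
            for step in (-1, 1)
        ]
        cur = min(neighbors, key=lambda nb: torus_dist(nb, dst, k))
        hops += 1
    return hops

def average_path_length(d, k):
    """Average greedy hop count from one zone to every zone (itself included)."""
    src = (0,) * d
    cells = list(product(range(k), repeat=d))
    return sum(greedy_route(src, dst, k) for dst in cells) / len(cells)

d, k = 2, 8                           # n = k**d = 64 equal zones
n = k ** d
measured = average_path_length(d, k)
predicted = (d / 4) * n ** (1 / d)    # the formula from the slide
```

For this even grid the measured average and the formula agree; with unequal zones the formula is only an approximation.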

Construction

1. First the new node must find a node already in the CAN.

2. Next, using the CAN routing mechanisms, it must find a node whose zone will be split.

3. Finally, the neighbors of the split zone must be notified so that routing can include the new node.

Bootstrap

One or more bootstrap nodes are found through the CAN's DNS domain name.

A bootstrap node maintains a partial list of CAN nodes it believes are currently in the system.

To join a CAN, a new node looks up the CAN domain name in DNS to retrieve a bootstrap node's IP address.

This bootstrap node then supplies the IP addresses of several randomly chosen nodes currently in the system.

Finding a zone

The new node randomly chooses a point P in the space.
It sends a JOIN request destined for P; the request enters the CAN via any existing CAN node.
The node currently occupying the zone that contains P splits its zone in half and assigns one half to the new node.
Splits follow a fixed ordering of the dimensions, e.g. in 2-d a zone is split first along X and then along Y.
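
The split step can be sketched as follows. It assumes a zone is a `(lo, hi)` pair of corner tuples and that a hypothetical `depth` counter (the number of splits that produced the zone) selects the axis in round-robin order; which half goes to the newcomer is also an assumption.

```python
def split_zone(zone, depth):
    """Halve a (lo, hi) zone along axis depth % d; return (kept_half, new_node_half)."""
    lo, hi = zone
    axis = depth % len(lo)
    mid = (lo[axis] + hi[axis]) / 2
    kept_hi = hi[:axis] + (mid,) + hi[axis + 1:]
    new_lo = lo[:axis] + (mid,) + lo[axis + 1:]
    return (lo, kept_hi), (new_lo, hi)

whole = ((0.0, 0.0), (1.0, 1.0))
left, right = split_zone(whole, depth=0)    # first split: along X
bottom, top = split_zone(left, depth=1)     # second split: along Y
```

Cycling through the axes in a fixed order keeps zones close to square, which in turn keeps zone merging simple when nodes leave.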

Maintenance

Departure of a Node
Single Node Failure
Multiple Failure

Departure of a Node

A departing node hands over its zone and the associated (key, value) entries to one of its neighbors.

If the zone of one of the neighbors can be merged with the departing node’s zone to produce a valid single zone, then this is done.

If not, then the zone is handed to the neighbor whose current zone is smallest, and that node will then temporarily handle both zones.
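
The hand-over rule above can be sketched as a merge test followed by a fallback. This is a simplification under stated assumptions: a "valid single zone" is taken to mean that the union of two boxes is itself a box, zones are `(lo, hi)` corner tuples, and all function names are hypothetical.

```python
def volume(zone):
    """Volume of an axis-aligned (lo, hi) zone."""
    lo, hi = zone
    v = 1.0
    for l, h in zip(lo, hi):
        v *= h - l
    return v

def try_merge(zone_a, zone_b):
    """Return the merged box if the two zones form one box together, else None."""
    (lo_a, hi_a), (lo_b, hi_b) = zone_a, zone_b
    differing = [i for i in range(len(lo_a))
                 if (lo_a[i], hi_a[i]) != (lo_b[i], hi_b[i])]
    if len(differing) != 1:
        return None                      # spans must match on all axes but one
    i = differing[0]
    if hi_a[i] != lo_b[i] and hi_b[i] != lo_a[i]:
        return None                      # and abut on the remaining axis
    lo = tuple(min(a, b) for a, b in zip(lo_a, lo_b))
    hi = tuple(max(a, b) for a, b in zip(hi_a, hi_b))
    return lo, hi

def hand_over(departing, neighbors):
    """neighbors maps name -> zone. Return (taker, zones the taker now handles)."""
    for name, zone in neighbors.items():
        merged = try_merge(departing, zone)
        if merged is not None:
            return name, [merged]
    smallest = min(neighbors, key=lambda name: volume(neighbors[name]))
    return smallest, [neighbors[smallest], departing]

# Hypothetical example: F leaves and E can absorb its zone cleanly.
zone_f = ((0.5, 0.5), (0.75, 1.0))
nbrs = {"B": ((0.5, 0.0), (1.0, 0.5)), "E": ((0.75, 0.5), (1.0, 1.0))}
taker, zones_held = hand_over(zone_f, nbrs)
```

When no neighbor can merge, the smallest neighbor ends up holding two separate zones, matching the "temporarily handle both zones" case.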

Departure of a Node

[Figure: 2-d coordinate space with corners (0,0), (1,0) and (0,1), showing the zones of nodes A, B, C, D, E and F]

When node F fails, F's zone is merged with E's.

Failures

A prolonged absence of update messages from a neighbor indicates that the neighbor has failed.
Each neighbor of the failed node then starts a takeover timer.
When its timer expires, a node sends a TAKEOVER message conveying its own zone volume to all of the failed node's neighbors.
A node that receives a TAKEOVER message accepts it only if the zone volume in the message is smaller than its own zone volume.
Otherwise it replies with its own TAKEOVER message.
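
The accept-or-reply rule settles on the neighbor with the smallest zone no matter whose timer fires first. The sketch below replays the exchange in one process; message delays are ignored, ties are broken by node name (an assumption, since the slides do not cover ties), and `resolve_takeover` is a hypothetical name.

```python
def resolve_takeover(volumes, first_to_fire):
    """volumes maps neighbor name -> zone volume. Return (winner, senders in order)."""
    rank = lambda name: (volumes[name], name)   # smaller volume wins; name breaks ties
    contenders = set(volumes)                   # neighbors that have not accepted a TAKEOVER
    to_send = [first_to_fire]
    sent = []
    while to_send:
        sender = to_send.pop(0)
        if sender in sent or sender not in contenders:
            continue                            # already sent, or timer was cancelled
        sent.append(sender)
        for receiver in volumes:
            if receiver == sender or receiver not in contenders:
                continue
            if rank(sender) < rank(receiver):
                contenders.discard(receiver)    # receiver accepts and stands down
            else:
                to_send.append(receiver)        # receiver replies with its own TAKEOVER
    (winner,) = contenders
    return winner, sent

volumes = {"A": 0.25, "B": 0.125, "C": 0.5}
winner_worst_case, senders_worst_case = resolve_takeover(volumes, first_to_fire="C")
winner_best_case, senders_best_case = resolve_takeover(volumes, first_to_fire="B")
```

If the smallest neighbor fires first, a single TAKEOVER message is enough; otherwise the replies push the claim down to the smallest zone.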

Multiple Failure

A node first performs an expanding ring search for live nodes beyond the failed region.

It then rebuilds its neighbor state table so that it can carry out a takeover safely.

Design Improvements

Multi-dimensional coordinate spaces:
  Increasing the number of dimensions of the CAN coordinate space reduces the routing path length, and hence the path latency.
  More dimensions => more neighbors => more possible next hops => better routing fault tolerance.
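
The trade-off can be made concrete with the average path length formula (d/4)(n^(1/d)) and the 2d neighbor count from the routing slides. The sketch assumes equal zones and uses n = 2^18, the size of the simulated system; `avg_path_length` is a hypothetical helper.

```python
def avg_path_length(d, n):
    """Average hops for n equal zones in d dimensions: (d/4) * n^(1/d)."""
    return (d / 4) * n ** (1 / d)

n = 2 ** 18
# One row per dimensionality: (d, neighbors per node, average hops).
rows = [(d, 2 * d, avg_path_length(d, n)) for d in (2, 3, 6, 9)]
```

With n = 2^18 the formula gives 256 hops for d = 2 and 48 hops for d = 3, at the cost of two extra neighbors per node.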

Design Improvements

Realities: multiple coordinate spaces
  The system maintains multiple independent coordinate spaces, with each node assigned a different zone in each of them.
  Each such coordinate space is a "reality".
  A given coordinate can be searched for in all realities, which reduces the average path length.

Multiple dimensions vs. multiple realities:
  Multiple realities give better fault tolerance and data availability than multiple dimensions.

Design Improvements

Overloading coordinate zones:
  Multiple nodes are allowed to share the same zone. Nodes that share a zone are termed peers.
  MAXPEERS is the maximum number of allowable peers per zone.
  Benefits: reduced path length (number of hops), and hence reduced path latency; improved fault tolerance.

Multiple hash functions:
  Nearly equivalent in effect to multiple realities.
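
One way to picture multiple hash functions is to derive k different points from the same key, so that the (key, value) pair is stored at k places. The sketch below is an assumption-heavy illustration: salting SHA-256 with the function index is a hypothetical scheme, and picking the replica point nearest to a given coordinate is just one possible way to use the copies.

```python
import hashlib

def hash_points(key: str, k: int, d: int = 2):
    """Map one key to k points in the unit d-dimensional space, one per hash function."""
    points = []
    for i in range(k):
        # Assumption: the i-th hash function is SHA-256 over "i:key".
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        points.append(tuple(
            int.from_bytes(digest[4 * j: 4 * j + 4], "big") / 2**32
            for j in range(d)
        ))
    return points

def closest_replica(points, here):
    """Pick the replica point nearest to the coordinate `here` (wraparound ignored)."""
    return min(points, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, here)))

replicas = hash_points("song.mp3", k=3)
target = closest_replica(replicas, here=(0.5, 0.5))
```

Storing each entry at k points means a lookup can still succeed when some of the nodes holding copies are unreachable.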

Design Improvements

Topologically-sensitive construction of the CAN overlay network:
  Each CAN node measures its round-trip time (RTT) to each of a set of landmarks and orders the landmarks by increasing RTT.
  With m landmarks, m! such orderings are possible.
  The coordinate space is divided into portions, and each portion is assigned one landmark ordering.
  A new node joins the CAN at a random point in the portion of the coordinate space associated with its landmark ordering.
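
The mapping from RTT measurements to a portion of the space can be sketched as follows. The landmark names and RTT values are made up, and slicing the first axis into m! equal intervals is a hypothetical layout chosen for brevity; the slides do not say how the portions are laid out.

```python
from itertools import permutations
from math import factorial

def landmark_ordering(rtts):
    """rtts maps landmark name -> measured RTT in ms; return names by increasing RTT."""
    return tuple(sorted(rtts, key=rtts.get))

def portion_index(ordering):
    """Position of this ordering among all m! orderings of the same landmarks."""
    return list(permutations(sorted(ordering))).index(ordering)

def portion_interval(ordering):
    """Assumed layout: the first axis is cut into m! equal slices, one per ordering."""
    count = factorial(len(ordering))
    i = portion_index(ordering)
    return i / count, (i + 1) / count

rtts = {"L1": 40.0, "L2": 12.5, "L3": 90.0}   # hypothetical measurements
ordering = landmark_ordering(rtts)
low, high = portion_interval(ordering)
```

Nodes that see the landmarks in the same order are likely to be close to each other in the underlying network, so placing them in the same portion makes CAN neighbors more likely to be network neighbors.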

Design Improvements

More uniform partitioning:
  Before a zone is split, the node compares the volume of its own zone with those of its immediate neighbors in the coordinate space.
  The zone with the largest volume is the one that is split.
  Without the uniform partitioning feature, a little over 40% of the nodes are assigned to zones with volume V, compared with almost 90% with this feature, and the largest zone volume drops from 8V to 2V.
  Not surprisingly, the partitioning of the space improves further with increasing dimensions.

Caching and replication techniques

Design Review

The following metrics were used to evaluate system performance:
  Path length: the number of (application-level) hops required to route between two points in the coordinate space.
  Neighbor-state: the number of CAN nodes for which an individual node must retain state.
  Latency: we consider both the end-to-end latency of the total routing path between two points in the coordinate space and the per-hop latency, i.e., the latency of individual application-level hops, obtained by dividing the end-to-end latency by the path length.
  Volume: the volume of the zone to which a node is assigned, which is indicative of the request and storage load a node must handle.
  Routing fault tolerance: the availability of multiple paths between two points in the CAN.
  Hash table availability: adequate replication of a (key, value) entry to withstand the loss of one or more replicas.

Design Review

The key design parameters affecting system performance are:
  dimensionality of the virtual coordinate space: d
  number of realities: r
  number of peer nodes per zone: p
  number of hash functions (i.e. number of points per reality at which a (key, value) pair is stored): k
  use of the RTT-weighted routing metric
  use of the uniform partitioning feature

Test system specification:
  A system size of n = 2^18 nodes.
  Transit-Stub topology with delays of 100 ms on intra-transit links, 10 ms on stub-transit links and 1 ms on intra-stub links (i.e. 100 ms on links that connect two transit nodes, 10 ms on links that connect a transit node to a stub node, and so forth).
  Transit-stub models explicitly group vertices into domains, and reflect that grouping in the connectivity between vertices.

[Figure: 100-node transit-stub topology]

Bare bones: a CAN that does not utilize most of our additional design features.
Knobs-on-full: a CAN making full use of our added features (without the landmark ordering feature).

Related Work

Related algorithms

Distance vector and link state algorithms:
  These need widespread topological information. A CAN node, on the other hand, stores only a small amount of state.

Plaxton algorithm:
  Each node has an n-bit label divided into l levels, each of width w = n/l.
  Each node forwards a packet to a neighbor whose label matches the destination label in more digits.

Related Work

Algorithms with geographic routing:
  "Space" in these algorithms refers to physical space.
  There is no neighbor search problem.
  Correctly mimicking the space is a trivial problem.
  The approach is not extensible to multiple dimensions.

Related Systems

Domain Name System:
  Stores (domain name, IP address) pairs.

OceanStore:
  Aims to provide continuous access to persistent information.
  Uses the Plaxton algorithm.

Peer-to-peer file sharing systems:
  Freenet: stores keys (analogous to URLs), the addresses of other nodes, and the data corresponding to each key.

Discussion

The work addresses two key problems in the design of Content-Addressable Networks: scalable routing and indexing.

Simulation results validate the scalability of our overall design – for a CAN with over 260,000 nodes, we can route with a latency that is less than twice the IP path latency.

Future work:
  Secure CAN
  Keyword searching

