Page 1: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Data Storage Solutions for Decentralized Online Social Networks

Anwitaman Datta

S* Aspects of Networked & Distributed Systems (SANDS)

School of Computer Engineering, NTU Singapore

iSocial Summer School, KTH Stockholm

Page 2:

Research @ SANDS

[Diagram axis: Foundational, (Distributed) Systems, Applications]

codes for storage

trust models

social network analysis

secure/privacy preserved computation primitives

networked distributed storage & data management systems

distributed key-value stores

P2P/F2F storage systems

data-center design

privacy aware/preserved data aggregation, storage, sharing & analytics/data-mining

data/computation at 3rd party/outsourced

decentralized online social networking and collaboration

recommendation and decision support systems

Page 8:

DOSNish research at SANDS

Selective information dissemination using social links

GoDisco

Security issues

Access control, Private Information Retrieval, …

DOSN architectures

PeerSoN, SuperNova, PriSM, …

P2P storage

http://sands.sce.ntu.edu.sg/

Page 9:

P2P Storage

Not the same as a file-sharing system

Peer-to-Peer (P2P) storage systems leverage the combined storage capacity of a network of storage devices (peers) contributed typically by autonomous end-users as a common pool of storage space to store content reliably.

Page 14:

P2P Storage

Design space

Reliability: Availability & Durability (focus of this talk)

Security & Privacy: Access control, integrity, free-riding, anonymity, privacy, …

Sophisticated functionalities: Concurrency, Version Control, …

Page 15:

P2P storage design space

Realizing Reliability

Maintenance strategies: Proactive; Reactive; Eager: Repair all; Lazy: Deterministic (Threshold based); Lazy: Randomized

Redundancy type: Replication; Erasure codes; New codes, e.g. self-repairing codes

Placement: Key based (e.g., DHTs); Selective (e.g., at friends or trusted nodes, history or proximity based, etc.); Random

Garbage collection: Diversity of online fragments; Duplicates of same fragment

Page 18:

Redundancy Type

Replication

Erasure codes

[Figure: Data = Object, split into k blocks (O1, O2, …, Ok). Encoding produces n encoded blocks (B1, B2, …, Bn), stored in storage devices in a network. Some blocks are lost. Retrieve any k’ (≥ k) blocks; decoding yields the original k blocks (O1, O2, …, Ok), from which the data is reconstructed.]
Page 22:

Redundancy placement

A rather complicated problem

All peers are fully cooperative and altruistic, but autonomous

System capacity and resource allocation …

• Heterogeneity, …

Coverage: history/prediction/…

Selfish/Byzantine peers: Incentives, trust, enforcement, …

Security & privacy implications of data placement …

Page 29:

Classical P2P storage systems

Distributed Hash Table (DHT) determines storage placement, e.g., CFS/OpenDHT

Pros: Simple design, ease of locating data

Cons: mixes indexing with storage

high correlation of failures

cannot leverage other characteristics

• e.g., locality, history, etc.

may lead to poor performance

• access latency, repair cost, …

[Figure: DHT ID space, successor list, replicas]
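A rough sketch of key-based placement may help show why the DHT fully determines where replicas go. It assumes a ring-shaped identifier space and replicas stored on the peers that succeed the key's ID, as the figure suggests. The hash function, the 32-bit ID space, and the names `ring_id` and `replica_holders` are illustrative assumptions, not the API of CFS or OpenDHT.

```python
import hashlib
from bisect import bisect_left

def ring_id(name: str, bits: int = 32) -> int:
    """Hash a name onto the DHT identifier space [0, 2**bits)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def replica_holders(key: str, peers: list, r: int = 3) -> list:
    """Return the r peers that succeed the key's ID on the ring (the successor list)."""
    ring = sorted((ring_id(p), p) for p in peers)
    ids = [pid for pid, _ in ring]
    # First peer whose ID is >= the key's ID, wrapping around the ring.
    start = bisect_left(ids, ring_id(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(r, len(ring)))]
```

Because the holders depend only on hash values, the placement cannot take locality or availability history into account, which is the limitation listed among the cons above.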

Page 33:

Classical P2P storage systems

Distributed Hash Table (DHT) as a directory, e.g., TotalRecall

Pros: Flexible placement policy

Cons of TotalRecall, which placed at random:

???

[Figure: DHT ID space, successor list, pointers to replicas]

Page 50:

Cloud assisted storage system

Hybrid architecture (used previously in Wuala)

Source: Google tech talk on Wuala: http://www.youtube.com/watch?v=3xKZ4KGkQY8

[Figure labels: DHT, Superpeers, Storage peers, Users, GET, Routing, Wuala’s dedicated storage data center as fallback]

Index independent of storage

Many fragments per object

Suitable for sharing very large but static files

Parallel download

Piggy-backed, large DHT routing states

So very few hops are needed, which gives high throughput
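The retrieval path in this hybrid architecture (index lookup, parallel fragment download, data center as fallback) could look roughly like the sketch below. It is not Wuala's actual protocol. The callables `locate`, `fetch_from_peer` and `fetch_from_datacenter` are hypothetical stand-ins for the index, the storage peers and the fallback store.

```python
from concurrent.futures import ThreadPoolExecutor

def get_object(fragment_ids, locate, fetch_from_peer, fetch_from_datacenter):
    """Fetch all fragments in parallel; use the data center when no peer can serve one."""
    def fetch(fid):
        for peer in locate(fid):               # index lookup, independent of storage
            data = fetch_from_peer(peer, fid)  # hypothetical: returns None if the peer is offline
            if data is not None:
                return data
        return fetch_from_datacenter(fid)      # fallback path
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch, fragment_ids))  # results keep the order of fragment_ids
```

Keeping the index separate from the storage peers is what lets the lookup return arbitrary peers, so the fallback only matters for fragments whose holders are all offline.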

Page 54:

More sophisticated heuristics

Incentives

reciprocity, trust/reputation, …

QoS: 24/7 coverage, locality, …

online/offline behavior (history/prediction), …

Control

De/centralized, local/global knowledge

Page 55:

Replication model: A clique of replicas storing each other’s data (reciprocity)

Explores both centralized and decentralized settings for clique formation

Challenge

Centralized matching - right set of peers to optimize storage capacity utilization (proven NP-hard)

Decentralized matching - uses an underlying gossip algorithm (T-man) to explore partners

Replica Placement in P2P Storage: Complexity and Game Theoretic Analyses

TECHNICAL REPORT, 15TH JUNE 2010

Krzysztof Rzadca, School of Computer Engineering, Nanyang Technological University, Singapore

Anwitaman Datta, School of Computer Engineering, Nanyang Technological University, Singapore

Sonja Buchegger, School of Computer Science, KTH, Sweden

Abstract: In peer-to-peer storage systems, peers replicate each other's data in order to increase availability. If the matching is done centrally, the algorithm can optimize data availability in an equitable manner for all participants. However, if matching is decentralized, the peers' selfishness can greatly alter the results, leading to performance inequities that can render the system unreliable and thus ultimately unusable.

We analyze the problem using both theoretical approaches (complexity analysis for the centralized system, game theory for the decentralized one) and simulation. We prove that the problem of optimizing availability in a centralized system is NP-hard. In decentralized settings, we show that the rational behavior of selfish peers will be to replicate only with similarly-available peers. Compared to the socially-optimal solution, highly available peers have their data availability increased at the expense of decreased data availability for less available peers. The price of anarchy is high: unbounded in one model, and linear with the number of time slots in the second model.

We also propose centralized and decentralized heuristics that, according to our experiments, converge fast in the average case.

The high price of anarchy means that a completely decentralized system could be too hostile for peers with low availability, who could never achieve satisfying replication parameters. Moreover, we experimentally show that even explicit consideration and exploitation of diurnal patterns of peer availability has a small effect on the data availability, except when the system has truly global scope. Yet a fully centralized system is infeasible, not only because of problems in information gathering, but also the complexity of optimizing availability. The solution to this dilemma is to create system-wide cooperation rules that allow a decentralized algorithm, but also limit the selfishness of the participants.

Index Terms: price of anarchy, equitable optimization, distributed storage

I. INTRODUCTION

A decentralized system for data storage and replication is an important building block of many peer-to-peer (p2p) applications, such as backup (e.g., wuala.com), or social networks [1] (in which, when a user is off-line, the system ensures that her data is available for her friends). In such systems, individual users (peers) store other users' data. Data storage uses not only storage space but, more importantly, consumes bandwidth [2]. In return, a user expects that her data will also be stored remotely, increasing availability and resilience. As users in p2p systems are assumed to be independent [3], [4], they seek to maximize their perceived profits (e.g., availability of their data) and to minimize their contribution (e.g., the amount of other users' data they store). Thus, the crucial decision a user must take is to choose other users that will replicate her data (and whose data she will replicate, assuming a reciprocity-based scheme). Depending on the organization of the system, this decision is either done through the agency of a centralized matching system (like in wuala.com), or using a fully decentralized algorithm in which users form replication agreements [5], [6].

The work in this paper has been funded in part by NTU/MoE's AcRF Tier-1 RG 29/09 and A*Star SERC 072 134 0055 grants.

In this paper, we study the problem of maximization of data availability in a decentralized data replication system. In order to obtain worst-case bounds in these complex systems, we model what we consider the crucial characteristics of the problem along two axes: (1) peer availability (deterministic time slots or probabilistic); (2) matching (centralized and enforced or decentralized and autonomous).

In the probabilistic model, a peer's availability is the probability of the peer being available (correlated with the peer's expected lifetime, like in [7], [6]). The goal is to maximize data availability given the constraints on the storage size. In contrast, in the time slot (deterministic) model peer availability is a function of time, either in a periodic way [8] (also observed for the whole system in [9]), or according to a detailed prediction for the next time period. In this model, availability is a set of time slots in which the peer is available with certainty. The goal is to minimize the number of replicas such that the sum of their availability periods covers the whole prediction time.

We analyze both availability models when matching is done either centrally or in a decentralized manner. A centralized system collects information about the peers' availabilities and then derives replication groups so that the expected availability (or resource usage) is optimized in a manner equitable to all the participants. In a decentralized system, each peer seeks to

Rzadca et al, ICDCS 2010

Page 56:

Representative result (simulations with artificial data)

covered time slots (possibly replacing one of the existing members of $G_k$), $\mathrm{score}_T(i, j) = |A_j - A_{G_k}|/T$. If $G_k$ is complete, the score is inversely proportional to the difference in the number of members in cliques, $\mathrm{score}_T(i, j) = 1 - \bigl||G_k| - |G_l|\bigr| / \max(G_k, G_l)$.

When two cliques $G_k$ and $G_l$ are merged, and $|G_k| + |G_l| > s + 1$, we use a greedy algorithm to construct the “better” clique. The algorithm starts with choosing the peer who covers the maximum number of time slots. Then, from the remaining peers, the algorithm adds the peer that covers the maximum number of currently uncovered time slots. This step is repeated until there are peers able to cover uncovered time slots (with a limit of maximum clique size $s + 1$) or the number of remaining peers is greater than $s + 1$ (as the remaining peers will form one clique).

Finally, as peers also minimize the clique size, each clique periodically removes redundant members. A peer $j$ is redundant for clique $G_k$ if and only if all the time slots covered by $j$ are covered by other members of the clique, thus $A_{G_k - \{j\}} = A_{G_k}$.

The algorithm is evaluated in Section VII-B.
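The greedy clique construction and the redundancy check described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: availabilities are assumed to be Python sets of time-slot indices, the names `greedy_clique` and `remove_redundant` are made up here, and the extra stopping rule about the number of remaining peers is omitted.

```python
def greedy_clique(avail: dict, max_size: int) -> list:
    """Greedily pick peers that cover the most still-uncovered time slots.

    avail maps peer -> set of time slots in which that peer is online.
    """
    chosen, covered = [], set()
    candidates = dict(avail)
    while candidates and len(chosen) < max_size:
        best = max(candidates, key=lambda p: len(candidates[p] - covered))
        if not candidates[best] - covered:
            break                       # no remaining peer adds coverage
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen

def remove_redundant(clique: list, avail: dict) -> list:
    """Drop peers whose time slots are all covered by the rest of the clique."""
    kept = list(clique)
    for j in list(kept):
        others = set().union(*(avail[p] for p in kept if p != j))
        if avail[j] <= others:          # A_{G - {j}} equals A_G, so j is redundant
            kept.remove(j)
    return kept
```

With `max_size` set to s + 1, the first function mirrors the merge step and the second mirrors the periodic clean-up of a clique.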

VII. SIMULATION OF THE ALGORITHMS

A. Probabilistic Model

1) Simulation Settings: Peers' availabilities were generated in three steps. Firstly, according to [6], 10% of the peers have availability 0.95, 25% have 0.87, 30% have 0.75 and 30% have 0.33. Then, we added a Gaussian noise with $\sigma = 0.1$ to each availability. Finally, we capped the resulting value, so that $0.03 \le av(i) \le 0.97$. Histogram on Figure 2 shows the resulting distribution of peers. We repeated each experiment on 50 instances with peers' availabilities generated as described above; error bars on plots denote standard deviations.

We set the storage size $s = 5$ and the sizes of random and metric pools in T-Man gossiping to 50.

We implemented decentralized algorithms in a custom discrete event simulator. In each round of the simulated matching, all the peers are processed sequentially in random order. Each peer performs one iteration of T-Man gossiping, and then one iteration of the decentralized matching algorithm (in the first four rounds we perform only gossiping in order to “warm up” the metric pools).
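The three-step generation of peer availabilities can be reproduced approximately with the sketch below. The function name and the seed are assumptions. Since the class shares as listed sum to 95%, the sketch treats them as relative weights rather than exact fractions.

```python
import random

def generate_availabilities(n: int, sigma: float = 0.1, seed: int = 0) -> list:
    """Draw n peer availabilities: class value plus Gaussian noise, capped to [0.03, 0.97]."""
    rng = random.Random(seed)
    classes = [0.95, 0.87, 0.75, 0.33]
    weights = [10, 25, 30, 30]          # shares as listed in the text, used as relative weights
    base = rng.choices(classes, weights=weights, k=n)
    return [min(0.97, max(0.03, b + rng.gauss(0.0, sigma))) for b in base]
```

Calling `generate_availabilities(10000)` yields one instance of the size used in the centralized experiments.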

2) Centralized Algorithms: Subgame Perfect vs Equitable Solutions: We started with comparing random, subgame perfect and equitable allocation algorithms according to the resulting data unavailability. We ran these algorithms on 50 randomly-generated instances of 10000 peers each; then we computed averages over all the random instances and all peers having similar availabilities (with resolution equal to two decimal places, e.g., the score for 0.95 is an average for all peers with $0.95 \le av(i) < 0.96$). Figure 2 summarizes the obtained results.

The equitable algorithm produces cliques that result in similar data availability regardless of the peer's availability. In contrast, the subgame perfect equilibrium results in wide range of data availabilities: while the highly available peers have their data available with expected failure probability of approximately $10^{-9}$, the weakest available peers almost do not gain from replication, with data unavailability close to 1.

[Figure 2 plot. x-axis: peer availability (0 to 1). Left y-axis: estimated data unavailability (log scale, $10^{-8}$ to $10^{0}$). Right y-axis: number of peers in bucket (histogram, 0k to 50k). Series: peers (histogram), random, equitable, subgame perfect. Arrows mark the better and worse directions on the axes.]

Fig. 2. Peers' expected data unavailability as a function of their availability in random, equitable and subgame perfect assignment. Histogram shows the number of peers in each availability bucket.

Such diversification in the subgame perfect solution provides incentives for peers to be highly available. A highly available peer is able to replicate its data with other highly available peers, which exponentially increases the peer's data availability. Thus, the subgame perfect solution is fair to participants. However, the subgame perfect solution might be too “extreme” to the less-available peers. Peers with availabilities less than approximately 0.5 have their data available with probability less than 0.99 (approximately), which might not be sufficient for some applications. This, in turn, can discourage such peers from joining the system, and consequently, prevent the system from growing to a critical mass.
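The exponential effect mentioned above can be made concrete under a simple assumption that is not spelled out in this excerpt: peers in a clique go offline independently, and the data is unavailable only when every member of the clique is offline at once. The helper name is hypothetical.

```python
from math import prod

def data_unavailability(clique_availabilities) -> float:
    """Probability that every replica holder is offline at once (independence assumed)."""
    return prod(1.0 - a for a in clique_availabilities)
```

Assuming a clique of s + 1 = 6 peers, `data_unavailability([0.97] * 6)` equals 0.03 to the sixth power, which is below $10^{-9}$, while `data_unavailability([0.33] * 6)` stays above 0.09. This is consistent with the wide spread between highly and weakly available peers described in the text.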

On the other hand, an equitable solution does not reward highly available peers. In the absence of altruistic peers, the system would degenerate.

Consequently, a robust system might require a hybrid of the selfish and the equitable solution: guaranteeing some minimal level of service to less available peers (but also requiring minimal availability), while at the same time rewarding highly available peers with higher data availability.

Also note that the equitable solution clearly Pareto-dominates the random assignment, resulting in higher data availabilities for all classes of peers.

3) Decentralized Algorithms: Speed of Convergence: In the next series of experiments, we measure how fast the decentralized algorithms presented in Section VI-A converge to the subgame perfect cliques.

Initial experiments revealed that the Optimistic Queries version of the algorithm is inefficient. After the first few rounds, in which the underlying gossiping protocol efficiently fills the metric pools of all peers with the same set of the 50 most available peers, the whole population queries the best peer, then the second-best peer, and so on. Thus, replication agreements are formed extremely slowly. We observe that if peers’ availabilities are distinct, approximately $k/(s + 1)$ cliques are formed after approximately $k$ rounds.

Figure 3 compares the convergence speed of Pragmatic Queries to Explicit Cliques, measured as the median average

Page 57: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Representative result (simulations with artificial data)

covered time slots (possibly replacing one of the existing members of $G_k$), $\mathrm{score}_T(i, j) = |A_j \setminus A_{G_k}| / T$. If $G_k$ is complete, the score is inversely proportional to the difference in the number of members in cliques, $\mathrm{score}_T(i, j) = 1 - \bigl||G_k| - |G_l|\bigr| / \max(G_k, G_l)$.

When two cliques $G_k$ and $G_l$ are merged, and $|G_k| + |G_l| > s + 1$, we use a greedy algorithm to construct the “better” clique. The algorithm starts by choosing the peer who covers the maximum number of time slots. Then, from the remaining peers, the algorithm adds the peer that covers the maximum number of currently uncovered time slots. This step is repeated as long as there are peers able to cover uncovered time slots (with a limit of maximum clique size $s + 1$) or the number of remaining peers is greater than $s + 1$ (as the remaining peers will form one clique).
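The greedy construction described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the names `greedy_merge`, `peer_slots` and `max_size` are hypothetical, ties are broken by peer id, and the loop simply stops when the clique is full or no remaining peer covers a new time slot.

```python
def greedy_merge(peer_slots, max_size):
    """Split the peers of two merged cliques into a 'better' clique and the rest.

    peer_slots: dict mapping peer id -> set of time slots the peer covers.
    max_size:   maximum clique size (s + 1 in the text).
    """
    remaining = dict(peer_slots)
    chosen, covered = [], set()
    while remaining and len(chosen) < max_size:
        # Pick the peer adding the most currently uncovered slots;
        # sorting first makes max() return the smallest id among ties.
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        if chosen and not (remaining[best] - covered):
            break  # nobody left can cover a new slot
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, sorted(remaining), covered


peers = {"a": {0, 1, 2}, "b": {2, 3}, "c": {0, 1}, "d": {4}}
chosen, rest, covered = greedy_merge(peers, max_size=3)
print(chosen, rest, sorted(covered))
```

In this toy input, peer "a" is picked first because it covers three slots, then "b" and "d" each add one uncovered slot, and "c" is left over because it adds nothing new.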

Finally, as peers also minimize the clique size, each clique periodically removes redundant members. A peer $j$ is redundant for clique $G_k$ if and only if all the time slots covered by $j$ are covered by other members of the clique, thus $A_{G_k \setminus \{j\}} = A_{G_k}$.
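The redundancy condition translates directly into a subset test. The sketch below is illustrative only; `remove_redundant` and its argument are hypothetical names, and examining members one at a time in sorted order is a choice made here so that two peers with identical coverage are never both dropped.

```python
def remove_redundant(clique_slots):
    """Drop members whose time slots are all covered by the other members.

    clique_slots: dict mapping peer id -> set of covered time slots.
    """
    kept = dict(clique_slots)
    for peer in sorted(clique_slots):
        # Slots covered by the members still in the clique, excluding `peer`.
        others = set().union(*(s for p, s in kept.items() if p != peer))
        if kept[peer] <= others:
            del kept[peer]
    return kept


clique = {"a": {0, 1}, "b": {1, 2}, "c": {1}, "d": {0, 1}}
pruned = remove_redundant(clique)
print(sorted(pruned))
```

Here "a" and "c" are removed, while "b" and "d" remain and together still cover slots 0, 1 and 2.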

The algorithm is evaluated in Section VII-B.

VII. SIMULATION OF THE ALGORITHMS

A. Probabilistic Model

1) Simulation Settings: Peers’ availabilities were generated in three steps. First, according to [6], 10% of the peers have availability 0.95, 25% have 0.87, 30% have 0.75 and 30% have 0.33. Then, we added Gaussian noise with $\sigma = 0.1$ to each availability. Finally, we capped the resulting value so that $0.03 \le av(i) \le 0.97$. The histogram in Figure 2 shows the resulting distribution of peers. We repeated each experiment on 50 instances with peers’ availabilities generated as described above; error bars on plots denote standard deviations.
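The three generation steps can be sketched as below. The listed class shares add up to 95%; this sketch passes them to `random.choices` as relative weights, which implicitly renormalizes them, and that renormalization is an assumption made here rather than something the text states. All names are illustrative.

```python
import random

# (base availability, share of peers) as listed in the text; the shares sum
# to 95% and are treated as relative weights (an assumption of this sketch).
CLASSES = [(0.95, 0.10), (0.87, 0.25), (0.75, 0.30), (0.33, 0.30)]
SIGMA = 0.1
AV_MIN, AV_MAX = 0.03, 0.97


def generate_availabilities(n_peers, rng):
    # Step 1: draw a base availability class for every peer.
    bases = rng.choices([av for av, _ in CLASSES],
                        weights=[share for _, share in CLASSES],
                        k=n_peers)
    # Step 2: add Gaussian noise; step 3: cap to [AV_MIN, AV_MAX].
    noisy = (base + rng.gauss(0.0, SIGMA) for base in bases)
    return [min(AV_MAX, max(AV_MIN, av)) for av in noisy]


availabilities = generate_availabilities(10_000, random.Random(42))
print(min(availabilities), max(availabilities))
```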

We set the storage size $s = 5$ and the sizes of random and metric pools in T-Man gossiping to 50.

We implemented decentralized algorithms in a custom discrete event simulator. In each round of the simulated matching, all the peers are processed sequentially in random order. Each peer performs one iteration of T-Man gossiping, and then one iteration of the decentralized matching algorithm (in the first four rounds we perform only gossiping in order to “warm up” the metric pools).
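The round structure described above can be outlined as a simple loop. This is only a skeleton under stated assumptions: it is round-based rather than a full discrete event simulator, and `gossip_step` and `matching_step` are hypothetical callbacks standing in for T-Man gossiping and the matching algorithm.

```python
import random

WARMUP_ROUNDS = 4  # gossip-only rounds, as described in the text


def run_simulation(peers, n_rounds, gossip_step, matching_step, rng):
    """Round-based loop; gossip_step and matching_step are hypothetical callbacks."""
    for round_no in range(n_rounds):
        order = list(peers)
        rng.shuffle(order)  # peers are processed sequentially in random order
        for peer in order:
            gossip_step(peer)  # one iteration of T-Man gossiping
            if round_no >= WARMUP_ROUNDS:
                matching_step(peer)  # one iteration of the matching algorithm


calls = {"gossip": 0, "matching": 0}


def count_gossip(peer):
    calls["gossip"] += 1


def count_matching(peer):
    calls["matching"] += 1


run_simulation(range(10), 6, count_gossip, count_matching, random.Random(0))
print(calls)  # {'gossip': 60, 'matching': 20}
```

With 10 peers and 6 rounds, every peer gossips in each round, but matching only runs in the two rounds after the warm-up.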

Good or bad?

Page 58: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

How about F2F storage?

Friend-to-Friend instead of Peer-to-Peer

Translating “real life” trust into something useful for reliable “system” design

Maps naturally to the overlying social application

Anecdotal note: SafeBook used Friend-of-Friends for access control also

Page 63: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Place data at friends: That’s it?

Store at all friends (naïve/baseline)

Best one can do in terms of achieving highest possible availability

Very high overheads!

Storage

Maintenance

Find instead a “reasonable” subset of friends to store at!

Page 65: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

An empirical study of availability in friend-to-friend storage systems

Rajesh Sharma and Anwitaman Datta
Nanyang Technological University, Singapore

[email protected], [email protected]

Matteo Dell’Amico and Pietro Michiardi
Eurecom, Sophia-Antipolis, France

{matteo.dell-amico,pietro.michiardi}@eurecom.fr

Abstract: Friend-to-friend networks, i.e. peer-to-peer networks where data are exchanged and stored solely through nodes owned by trusted users, can guarantee dependability, privacy and uncensorability by exploiting social trust. However, the limitation of storing data only on friends can come to the detriment of data availability: if no friends are online, then data stored in the system will not be accessible. In this work, we explore the trade-offs between redundancy (i.e., how many copies of data are stored on friends), data placement (the choice of which friend nodes to store data on) and data availability (the probability of finding data online). We show that the problem of obtaining maximal availability while minimizing redundancy is NP-complete; in addition, we perform an exploratory study on data placement strategies, and we investigate their performance in terms of redundancy needed and availability obtained. By performing a trace-based evaluation, we show that nodes with as few as 10 friends can already obtain good availability levels.

Keywords: friend-to-friend (F2F), storage systems, data placement, NP-complete, heuristics

I. INTRODUCTION

Peer-to-peer (P2P) storage systems have been studied for over a decade, starting with the OceanStore [5] project. The premise of P2P storage is crowdsourcing the storage cloud [2] to other end-users. One of the many design issues in such systems is the choice of peers at which to store data. A specific subclass of P2P storage systems has emerged in which the placement choice is constrained to ‘friends’ of the data owner, for example, FriendStore [7]. The basic characteristics of such friend-to-friend (F2F) storage systems are: (i) real-life social trust is exploited to guarantee a dependable system (e.g., a friend of mine won’t erase my data); (ii) data access is predominantly confined within a small social neighborhood. These networks, also known as ‘darknets’ when the focus is on security, can also guarantee privacy and resistance to censorship [1]. F2F storage thus constitutes a good building block for diverse applications such as personal backup services and decentralized online social networking.

For personal backup, while data persistence is more critical, data availability is nevertheless desirable. For decentralized online social networking systems such as SuperNova [6], availability is of paramount importance. Thus, a fundamental problem that arises is determining what kind of availability one can achieve in a storage system where data placement for any specific data owner is constrained by the use of only peer nodes run by friends of the data owner.

This work was supported in part by A*Star SERC grant 072 134 0055 and NTU/MoE Tier-1 grant RG 29/09. The collaboration between NTU and Eurecom was supported by Merlion grant.

There are several variations of this basic question that would interest an F2F storage system designer. A baseline is determined when all friends of a node store its data. This is the best in terms of availability that one can achieve subject to the constraint of using friend nodes exclusively. However, there are some obvious variations worth studying. Can the same availability (or any other predetermined threshold of availability) be achieved using only a subset of the node’s friends? How does the law of diminishing returns work in terms of availability, as the number of used friends is increased? If a stipulated number of friends are to be used, what is the best availability that can be achieved? Furthermore, the way to measure availability itself may vary. For a personal backup application, the data owner may care for the data to be available only when the owner is itself online, for example, from other portable devices. For a decentralized online social networking application, the data owner can serve its own data when it is online, but would like its friends to make the data available when it is offline. More generally, availability may also be determined based on whether the data was available whenever there was an access request for it. These various interpretations of availability may depend on the access patterns and application-specific characteristics.
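The three interpretations of availability can be made concrete over discretized time slots. The sketch below is one possible reading, not the paper's formal definitions: the function and parameter names are hypothetical, online periods are modelled as sets of slot indices, and an empty reference set is treated as availability 1.0 by convention.

```python
def availability(owner_online, replica_online, horizon, requests=None):
    """Three illustrative availability measures over slots 0..horizon-1.

    owner_online:   slots in which the data owner is online.
    replica_online: slots in which at least one storing friend is online.
    requests:       optional set of slots in which the data is requested.
    """
    all_slots = set(range(horizon))

    def fraction(covered, relevant):
        return len(covered & relevant) / len(relevant) if relevant else 1.0

    # Backup reading: friends must be online while the owner is online.
    backup = fraction(replica_online, owner_online)
    # Social networking reading: friends must cover the owner's offline time.
    social = fraction(replica_online, all_slots - owner_online)
    # Request reading: owner or friends must be online when a request arrives.
    on_request = None
    if requests is not None:
        on_request = fraction(replica_online | owner_online, requests)
    return backup, social, on_request


result = availability(owner_online={0, 1, 2, 3},
                      replica_online={2, 3, 4, 5, 6},
                      horizon=10,
                      requests={1, 5, 9})
print(result)
```

In this toy trace the same placement scores 0.5 under the backup reading, 0.5 under the social networking reading, and 2/3 under the request-based reading, which illustrates why the chosen interpretation matters.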

The achievable and achieved performance would depend on the (temporal) characteristics of individual nodes’ egocentric networks (i.e., the social network consisting of those nodes and their respective immediate friends), the actual data-placement policies determining a subset of friends to store data at, as well as the interpretation of availability itself. This paper is a first attempt to formalize these quantitative aspects of F2F storage systems, exploring algorithmic aspects of data placement in a (sub-)optimal subset of friends, and exposing the efficacy of F2F storage systems through trace-driven simulations based on real egocentric social network traces that additionally capture node availability over time.

The important contributions of this paper include (i) defining some key characteristics of an ego-network which influence the achievable availability in an F2F storage system, (ii) observing that identification of a minimal set of friends to achieve the maximum achievable coverage is in fact analogous to the set cover problem, and hence NP-hard, (iii) proposing greedy heuristic data placement algorithms, and (iv) evaluation

Sharma et al, P2P 2011
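Since the placement problem is likened to set cover, the textbook greedy set-cover heuristic gives a feel for how a "reasonable" subset of friends might be chosen. The sketch below is that generic heuristic, not necessarily any of the heuristics evaluated by Sharma et al.; `greedy_placement`, `friend_online` and `budget` are hypothetical names, and ties are broken alphabetically.

```python
def greedy_placement(friend_online, budget=None):
    """Pick friends greedily until full achievable coverage or the budget is hit.

    friend_online: dict mapping friend id -> set of time slots the friend is online.
    budget:        optional cap on the number of friends used.
    Returns the chosen friends and the fraction of achievable coverage obtained.
    """
    achievable = set().union(*friend_online.values())
    chosen, covered = [], set()
    while covered != achievable and (budget is None or len(chosen) < budget):
        candidates = sorted(set(friend_online) - set(chosen))
        # The friend adding the most uncovered slots; smallest id wins ties.
        best = max(candidates, key=lambda f: len(friend_online[f] - covered))
        chosen.append(best)
        covered |= friend_online[best]
    ratio = len(covered) / len(achievable) if achievable else 1.0
    return chosen, ratio


friends = {"ann": {0, 1, 2, 3}, "bob": {3, 4, 5}, "cat": {0, 1}, "dan": {5, 6}}
print(greedy_placement(friends))
print(greedy_placement(friends, budget=2))
```

In this toy ego-network three of the four friends already reach the coverage obtained by storing at all friends, and limiting the budget to two friends still reaches six of the seven achievable slots.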

Page 66: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Look at the temporal online/offline behavior of friends

Achievable coverage

What best availability can be achieved?

Criticality of friends

Which friends are indispensable?
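The two questions on this slide can be illustrated over time-slot traces. The sketch below uses one plausible reading, which is an assumption here since the formal definitions are not reproduced in this excerpt: achievable coverage is the fraction of time slots in which at least one friend is online, and a friend is critical if removing that friend lowers the coverage. All names are hypothetical.

```python
def achievable_coverage(friend_online, horizon):
    """Fraction of slots 0..horizon-1 in which at least one friend is online."""
    covered = set().union(*friend_online.values())
    return len(covered & set(range(horizon))) / horizon


def critical_friends(friend_online):
    """Friends whose removal lowers the achievable coverage."""
    achievable = set().union(*friend_online.values())
    critical = []
    for friend in sorted(friend_online):
        # Coverage obtained from everyone except `friend`.
        without = set().union(
            *(slots for f, slots in friend_online.items() if f != friend))
        if without != achievable:
            critical.append(friend)
    return critical


friends = {"ann": {0, 1, 2, 3}, "bob": {3, 4, 5}, "cat": {0, 1}, "dan": {5, 6}}
print(achievable_coverage(friends, horizon=10))
print(critical_friends(friends))
```

In this toy trace the friends jointly cover 7 of 10 slots, and every friend except "cat" is the only one online in at least one slot, so only "cat" is dispensable.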

An empirical study of availability in friend-to-friendstorage systems

Rajesh Sharma and Anwitaman DattaNanyang Technological University, Singapore

[email protected], [email protected]

Matteo Dell’Amico and Pietro MichiardiEurecom, Sophia-Antipolis, France

{matteo.dell-amico,pietro.michiardi}@eurecom.fr

Abstract—Friend-to-friend networks, i.e. peer-to-peer networkswhere data are exchanged and stored solely through nodesowned by trusted users, can guarantee dependability, privacy anduncensorability by exploiting social trust. However, the limitationof storing data only on friends can come to the detriment ofdata availability: if no friends are online, then data stored in thesystem will not be accessible. In this work, we explore the trade-offs between redundancy (i.e., how many copies of data are storedon friends), data placement (the choice of which friend nodes tostore data on) and data availability (the probability of findingdata online). We show that the problem of obtaining maximalavailability while minimizing redundancy is NP-complete; inaddition, we perform an exploratory study on data placementstrategies, and we investigate their performance in terms ofredundancy needed and availability obtained. By performing atrace-based evaluation, we show that nodes with as few as 10friends can already obtain good availability levels.Keywords: friend-to-friend (F2F), storage systems, data place-ment, NP-complete, heuristics

I. INTRODUCTION

Peer-to-peer (P2P) storage systems have been studied forover a decade, starting with the OceanStore [5] project. Thepremise of P2P storage is crowdsourcing the storage cloud[2] to other end-users. One of the many design issues in suchsystems is the choice of peers at which to store data. A specificsubclass of P2P storage systems have emerged based on theplacement choice being constrained to ‘friends’ of the dataowner, for example, FriendStore [7]. The basic characteristicsof such friend-to-friend (F2F) storage systems are: (i) real-life social trust is exploited to guarantee a dependable system(e.g., a friend of mine won’t erase my data); (ii) data access ispredominantly confined within a small social neighborhood.These networks, also known as ‘darknets’ when the focusis on security, can also guarantee privacy and resistance tocensorship [1]. F2F storage thus constitutes a good buildingblock for diverse applications such as personal backup serviceand decentralized online social networking.

For personal backup, while data persistence is more critical,data availability is nevertheless desirable. For decentralizedonline social networking systems such as SuperNova [6],availability is of paramount importance. Thus, a fundamentalproblem that arises is determining what kind of availabilityone can achieve in a storage system where data placement for

This work was supported in part by A*Star SERC grant 072 134 0055and NTU/MoE Tier-1 grant RG 29/09. The collaboration between NTU andEurecom was supported by Merlion grant.

any specific data owner is constrained by the use of only peernodes run by friends of the data owner.

There are several variations of this basic question thatwould interest a F2F storage system designer. A baseline isdetermined when all friends of a node store its data. This isthe best in terms of availability that one can achieve subjectto the constraint of using friend nodes exclusively. However,there are some obvious variations worth studying. Can thesame availability (or any other predetermined threshold ofavailability) be achieved using only a subset of the node’sfriends? How does the law of diminishing returns work interms of availability, as the number of used friends is in-creased? If a stipulated number of friends are to be used, whatis the best availability that can be achieved? Furthermore, theway to measure availability itself may vary. For a personalbackup application, the data owner may care for the data to beavailable only when it itself is online - for example, with otherportable devices. For a decentralized online social networkingapplication, the data owner can serve its own data when itis itself online, but will like the friends to make the dataavailable when it is itself offline. More generally, availabilitymay also be determined based on whether it was availablewhen there was any access request for the data. These variousinterpretations of availability may depend on the access andapplication specific characteristics.

The achievable and achieved performance would depend on the (temporal) characteristics of individual nodes’ egocentric networks (i.e., the social network consisting of those nodes and their respective immediate friends), the actual data-placement policies determining a subset of friends to store data at, as well as the interpretation of availability itself. This paper is a first attempt to formalize these quantitative aspects of F2F storage systems, to explore algorithmic aspects of data placement in a (sub-)optimal subset of friends, and to expose the efficacy of F2F storage systems through trace-driven simulations on real egocentric social network traces that additionally capture node availability over time.

The important contributions of this paper include (i) defining some key characteristics of an ego-network which influence the achievable availability in an F2F storage system, (ii) observing that identification of a minimal set of friends to achieve the maximum achievable coverage is in fact analogous to the set cover problem, and hence NP-hard, (iii) proposing greedy heuristic data placement algorithms, and (iv) evaluation

Sharma et al, P2P 2011
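Since choosing a minimal set of friends is analogous to set cover, a natural heuristic is the textbook greedy rule: repeatedly pick the friend who covers the most still-uncovered time slots. The sketch below illustrates that generic rule only; it is not claimed to be the exact algorithm of the paper, and the function name, the slot-set representation and the `max_friends` parameter are assumptions.

```python
from typing import Dict, List, Optional, Set


def greedy_placement(friend_online: Dict[str, Set[int]],
                     target: Set[int],
                     max_friends: Optional[int] = None) -> List[str]:
    """Greedy set-cover heuristic for choosing replica-holding friends.

    friend_online maps each friend to the slots in which it is online,
    target is the set of slots that should be covered, and max_friends
    optionally caps the number of friends used.
    """
    chosen: List[str] = []
    uncovered = set(target)
    candidates = dict(friend_online)
    while uncovered and candidates:
        if max_friends is not None and len(chosen) >= max_friends:
            break
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda f: len(candidates[f] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # no remaining friend adds any coverage
        chosen.append(best)
        uncovered -= gain
        del candidates[best]
    return chosen
```

Calling it with `max_friends=None` approximates the smallest set reaching the maximum achievable coverage, while a fixed `max_friends` answers the "stipulated number of friends" variant of the question.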

Page 69: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Evaluation

Data set

Italian instant messenger service

Pros:

• Social+Temporal characteristics

• “May” reasonably reflect the online/offline behavior

Cons:

• Not a p2p storage system trace

• “small”, “incomplete” and “geographically localized”


• 3436 nodes

  o 848 nodes in the largest component

    - Note that many nodes had “neighbors” in other servers, for whom we did not have info.

    - Between 1 and 18 neighbors

• Use two weeks of data

  o One for “learning”, one for evaluation

    - Time of day, day of week effects

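The learn-then-evaluate protocol on the slide can be sketched as follows. The sketch assumes hourly slots numbered from 0 over the two weeks, so folding week two onto hour-of-week indices keeps time-of-day and day-of-week effects aligned; the helper names and the simple greedy picker are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List, Set, Tuple

SLOTS_PER_WEEK = 7 * 24  # assumed hourly granularity

Trace = Dict[str, Set[int]]


def split_weeks(trace: Trace) -> Tuple[Trace, Trace]:
    """Split a two-week trace into a learning week and an evaluation week.

    Week-two slots are shifted back by one week so that slot i always
    denotes the same hour of the week in both halves.
    """
    learn = {n: {s for s in slots if 0 <= s < SLOTS_PER_WEEK}
             for n, slots in trace.items()}
    evaluate = {n: {s - SLOTS_PER_WEEK for s in slots
                    if SLOTS_PER_WEEK <= s < 2 * SLOTS_PER_WEEK}
                for n, slots in trace.items()}
    return learn, evaluate


def coverage(trace: Trace, friends: List[str]) -> float:
    """Fraction of the week in which at least one chosen friend is online."""
    covered: Set[int] = set()
    for f in friends:
        covered |= trace.get(f, set())
    return len(covered) / SLOTS_PER_WEEK


def pick_friends(learn: Trace, candidates: List[str], k: int) -> List[str]:
    """Pick up to k friends greedily using only the learning week."""
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(k):
        remaining = [f for f in sorted(candidates) if f not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda f: len(learn.get(f, set()) - covered))
        if not learn.get(best, set()) - covered:
            break  # nobody adds new coverage
        chosen.append(best)
        covered |= learn[best]
    return chosen
```

Comparing `coverage(learn, chosen)` with `coverage(evaluate, chosen)` shows how well a placement learned from past behavior carries over to the following week.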
Page 71: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Representative results


• AC: achievable coverage

  o 50% of nodes can get more than 90% availability

• Crit: Time covered using critical nodes

  o Too much dependence on critical nodes


• <Achievable coverage, Degree of Criticality, # of Friends>


If there are “enough” friends (>10), ought to be okay! (assuming storage capacity is not an issue)
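The slides do not spell out how AC and Crit are computed, so the following is only one plausible reading: achievable coverage is the fraction of time during which at least one friend is online, and a slot counts as critical when exactly one friend covers it. Both definitions and all names below are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Set


def achievable_coverage(friend_online: Dict[str, Set[int]], horizon: int) -> float:
    """Fraction of slots in [0, horizon) covered when every friend stores the data."""
    covered = set().union(*friend_online.values())
    return len({s for s in covered if 0 <= s < horizon}) / horizon


def critical_slots(friend_online: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """For each friend, the slots that no other friend covers."""
    counts = Counter(s for slots in friend_online.values() for s in slots)
    return {f: {s for s in slots if counts[s] == 1}
            for f, slots in friend_online.items()}


def criticality(friend_online: Dict[str, Set[int]], horizon: int) -> float:
    """Fraction of the horizon that depends on a single (critical) friend.

    Assumes all slot indices lie within [0, horizon).
    """
    crit = critical_slots(friend_online)
    return sum(len(v) for v in crit.values()) / horizon
```

Under this reading, a node with high achievable coverage but also high criticality is fragile: losing one friend removes a large share of its availability.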

Page 75: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Bootstrapping pangs!

New peers with few friends in the system, or no reputation of being highly available, will find it difficult to get started!

Game-theoretic study on reciprocity based P2P cliques

Analysis of ego-centric networks for F2F storage

Page 76: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

SuperNova: Super-peers Based Architecture for Decentralized Online Social Networks

Rajesh Sharma and Anwitaman Datta

School of Computer Engineering, Nanyang Technological University, Singapore. {raje0014,Anwitaman}@ntu.edu.sg

Abstract. Recent years have seen several earnest initiatives from both academic researchers as well as open source communities to implement and deploy decentralized online social networks (DOSNs). The primary motivations for DOSNs are privacy and autonomy from big brotherly service providers. The promise of decentralization is complete freedom for end-users from any service providers both in terms of keeping privacy about content and communication, and also from any form of censorship. However, decentralization introduces many challenges. One of the principal problems is to guarantee availability of data even when the data owner is not online, so that others can access the said data even when a node is offline or down. Intuitively this can be solved by replicating the data on other users’ machines. Existing DOSN proposals try to solve this problem using heuristics which are agnostic to the various kinds of heterogeneity both in terms of end user resources as well as end user behaviors in such a system. For instance, some propose replication at friends, or at some other peers based on other heuristics such as reciprocal storage among nodes with similar availability, or storage in a global DHT realized using all peers’ resources. In this paper, we argue that a pragmatic design needs to explicitly allow for and leverage on system heterogeneity, and provide incentives for the resource rich participants in the system to contribute such resources. To that end we introduce SuperNova - a super-peer based DOSN architecture. Super-peers can help (i) bootstrap new peers who are yet to have/find any friends by providing them storage space, (ii) maintain a directory of users, so that users can find friends in the network by name or interests, and (iii) help peers find other peers to store their content in case they don’t have adequate friends to do so, or if their friends are already overloaded. We envision a self-organizing system, where nodes that provide substantial resources can gain reputation, and be elevated to the status of super-peers. Users may want to become super-peers out of altruism (they want DOSNs to succeed), for the sake of the reputation (e.g., being an influential member for an interest based community) as well as potentially to monetize their special roles (e.g., run advertisements). While proposing the SuperNova architecture, we envision a dynamic system driven by incentives and reputation; however, investigation of such incentives and reputation, and its effect on determining peer behaviors is a subject for our future study. In this paper we instead investigate the efficacy of a super-peer based system at any time point (a snap-shot of the envisioned dynamic system), that is to say, we try to quantify the performance of the SuperNova system given any (fixed) mix of peer population and strategies.

Keywords: System architecture, Super-peers, Storage, Self-organization

arXiv:1105.0074v2 [cs.SI] 25 May 2011

Sharma et al, Comsnets 2012

Page 77: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

The big picture/premise



Well-resourced nodes act as super-peers

incentives (could be): reputation within an interest community, ability to monetize (e.g., using ads), …


New nodes use super-peers for storage only until they get established in the system

so that the super-peers are not over-burdened and do not become a bottleneck for established peers, …


Super-peers help with coordination, finding storage partners, etc.
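The three super-peer roles on this slide (temporary storage for newcomers, a user directory, and matchmaking for storage partners) can be sketched as a toy data structure. Everything here is hypothetical: the class, its fields, the capacity limit and the "established after N friends" rule are illustrative assumptions, not the SuperNova protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class SuperPeer:
    capacity: int                      # assumed: max newcomers hosted at once
    established_after: int = 3         # assumed: friends needed to be "established"
    directory: Dict[str, Set[str]] = field(default_factory=dict)
    hosted: Set[str] = field(default_factory=set)
    volunteers: List[str] = field(default_factory=list)

    def register(self, user: str, interests: Iterable[str]) -> None:
        """Add a user to the directory so others can find them."""
        self.directory[user] = set(interests)

    def find_by_interest(self, interest: str) -> List[str]:
        """Directory lookup by interest, returned in sorted order."""
        return sorted(u for u, tags in self.directory.items() if interest in tags)

    def request_storage(self, user: str, num_friends: int) -> bool:
        """Host a newcomer's data; established peers are asked to move out."""
        if num_friends >= self.established_after:
            self.hosted.discard(user)
            return False
        if user in self.hosted:
            return True
        if len(self.hosted) >= self.capacity:
            return False
        self.hosted.add(user)
        return True

    def suggest_partners(self, user: str, k: int) -> List[str]:
        """Suggest up to k peers that volunteered to store others' data."""
        return [v for v in self.volunteers if v != user][:k]
```

Evicting established peers in `request_storage` mirrors the point that super-peer storage is a bootstrap aid rather than a permanent home for data.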


Page 81: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Representative result

[Figure panels: (a) Cumulative Availability, (b) Individual Availability, (c) System Performance]

Fig. 2. Comparison for Friend’s Time (FT) and Total Time (TT) for Deviation (D) and Non-Deviation (ND)

compared to a node which has scattered his data, for example by selecting some strangers as well. This observation is reinforced with the help of Figure 2(c), where the total percentage of nodes for “excellent” plus “very good” in case of friends after introducing deviation (FD) becomes less than the total excellent nodes without introducing the deviation (FND). If we consider the total time as a measure to find availability, we notice that there is not much difference in performance (excellent plus very good) between no deviation (NDTT) and when deviation is introduced (TTD). This can be attributed to the fact that if a node spreads his data to nodes other than friends, then he will be less affected by the deviation, and thus he can either choose strangers or, if strangers are not willing to do so, take the help of super-peers.

2. Comparison of Flat Vs Super-peer architectures: We also compare our approach to a flat scheme where there are no super-peers, in contrast to our architecture. In the absence of a trusting authority like super-peers, it is difficult to keep a strangers’ list or trust information regarding strangers for finding suitable store-keepers. However, to do a fair comparison with the flat scheme, we assume that nodes can convince strangers to store their data using a reciprocity scheme [16]. This assumption works well with

Take with a huge pinch of salt: artificial data to drive simulations, with too many parameters …

Page 82: Data Storage Solutions for Decentralized Online Social ...linc.ucy.ac.cy/isocial/images/stories/Events... · applications, such as backup (e.g., wuala.com), or social net- works [1]

Moving forward

Dynamic/social data store: high availability, high consistency, high rate of data updates, small volume of data

Security modules: encryption, access control, …

Social modules: analytics, search/navigation, recommendation, …

Bulk (static) data storage

Full-fledged (D)OSN

Light weight P2P OSN

P2P overlay with basic services: DHT lookup, peer-sampling, etc.

Could be even (multi-)cloud based.

Can be a small dynamic clique maintained aggressively
