Network Computing Laboratory

FeedEx: Collaborative Exchange of News Feeds

Seung Jun, Mustaque Ahamad
Georgia Institute of Technology

WWW 2006

Korea Advanced Institute of Science and Technology
Network Computing Laboratory

Outline

One-line comment

Motivation/Problem

Approach

Analysis of feed publishing

Challenges

Experiments

Critique

One-line comment

Disseminate web feeds in a distributed (P2P) manner to increase the scalability of web servers

RSS reveals visitors to content providers

RSS decouples the fetch operation from the read

[Figure: traditional method (RSS server with clients A and B) versus P2P method (clients A and B)]

Motivation & Problem

RSS/Atom feeds have become increasingly popular
  Published by most traditional media and blogs

Feeding mechanism
  Server (nyt.com, http://nyt.com/../feed.xml): updates the page as contents are added
  RSS reader: polls the server to check for updates (HTTP request, HTTP response)

Scalability

Approach

P2P overlay + gossip-based protocol
  P2P: resources grow scalably with service demand
  Gossip: scalable and robust to joins and leaves

Feature of this overlay
  Does not have to guarantee delivery or delay

Challenges
  Overlay construction
  Fetching interval determination
  Data dissemination
  Free riding prevention
  Content searching

Analysis of Feed Publishing

Methodology
  245 popular feeds monitored for 10 days
  Most popular feeds chosen using information from Gmail’s web clips and Bloglines
  Feeds fetched every 2 minutes

Measured
  Publishing rate
  Entry count in a feed
  Entry lifetime

Publishing Rate by Rank

Great difference between publishers

Partly Zipf distribution

[Figure: publishing rate by rank]

Entry Count

Does a high publishing rate mean a higher entry count? No.

Entry lifetimes are short, so entries can be lost with infrequent requests

Publishing Rate by Time

4 types of publishing patterns

[Figure: entries published per hour versus time (day, 0 to 7, with Sat and Sun marked) for Reuters, Yahoo(M), Motley Fool, and NPR]

Challenges: Overlay Construction (1/2)

Goal: minimize network management overhead

Join
  1. Contact a well-known host or previous neighbors
  2. Share subscription set info
  3. Update subscription set info to the network

Leave
  Soft-state: update subscription set periodically

[Figure: gateway, neighbor list, and per-node subscription sets as (dest, hop) tables: {CNN 0}; {YAHO 0, HANI 1}; {CNN 1}]
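
The join and soft-state leave steps can be sketched in Python, the language of the prototype. This is a minimal sketch, not the FeedEx implementation: the class name `SubscriptionTable`, the `ttl` and `max_hops` parameters, and the rule that a feed advertised by a neighbor is recorded at that neighbor's hop count plus one are all assumptions made for illustration.

```python
class SubscriptionTable:
    """Soft-state (dest, hop) table built from periodic neighbor advertisements."""

    def __init__(self, own_feeds, ttl=300.0, max_hops=2):
        self.own = set(own_feeds)   # feeds this node subscribes to (hop 0)
        self.ttl = ttl              # assumed lifetime of an advertisement, in seconds
        self.max_hops = max_hops    # assumed limit on how far a feed is tracked
        self.adverts = {}           # neighbor -> (timestamp, {feed: hop})

    def on_advert(self, neighbor, table, now):
        """Store or refresh the subscription set a neighbor just advertised."""
        self.adverts[neighbor] = (now, dict(table))

    def expire(self, now):
        """Forget neighbors that have not refreshed within ttl (implicit leave)."""
        self.adverts = {
            n: (t, tab) for n, (t, tab) in self.adverts.items() if now - t <= self.ttl
        }

    def snapshot(self, now):
        """Return the current {feed: hop} table, keeping the smallest hop per feed."""
        self.expire(now)
        result = {feed: 0 for feed in self.own}
        for _, table in self.adverts.values():
            for feed, hop in table.items():
                h = hop + 1
                if h <= self.max_hops and h < result.get(feed, h + 1):
                    result[feed] = h
        return result


table = SubscriptionTable({"YAHO"}, ttl=10.0)
table.on_advert("n1", {"HANI": 0}, now=0.0)
print(table.snapshot(now=5.0))   # HANI is known at hop 1 while n1 keeps refreshing
print(table.snapshot(now=20.0))  # n1 stopped refreshing, so HANI is dropped
```

No explicit leave message is needed in this sketch: a node that stops advertising simply ages out of its neighbors' tables.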

Challenges: Overlay Construction (2/2)

Neighbor selection
  Too many neighbors may incur overhead
  Need to adapt to my resource status: select neighbors that are “useful” to me
  Useful neighbors are those whose subscription sets are similar to mine

[Figure: node A holds {HANI 0, CNN 0, YAHOO 0, DAUM 0}; node B holds {NCLAB 0, CNN 0, HANI 1, DAUM 2}. Overlap: 1 direct, 1 one-hop, 1 two-hop]
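
The overlap count in the figure (1 direct, 1 one-hop, 1 two-hop) can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the scoring rule in `usefulness` (each overlapping feed is worth `weight ** -hop`) is an illustrative choice, not a formula taken from the slides.

```python
from collections import Counter


def overlap_by_hop(my_feeds, candidate_table):
    """Count my direct subscriptions that the candidate covers, grouped by its hop count."""
    return Counter(candidate_table[f] for f in my_feeds if f in candidate_table)


def usefulness(my_feeds, candidate_table, weight=2.0):
    """Assumed score: closer overlaps count more (1 for direct, 1/weight for one-hop, ...)."""
    overlap = overlap_by_hop(my_feeds, candidate_table)
    return sum(count * weight ** (-hop) for hop, count in overlap.items())


def select_neighbors(my_feeds, candidates, limit):
    """Keep at most `limit` candidates, most useful first."""
    ranked = sorted(
        candidates,
        key=lambda name: usefulness(my_feeds, candidates[name]),
        reverse=True,
    )
    return ranked[:limit]


a_feeds = {"HANI", "CNN", "YAHOO", "DAUM"}
b_table = {"NCLAB": 0, "CNN": 0, "HANI": 1, "DAUM": 2}
print(sorted(overlap_by_hop(a_feeds, b_table).items()))  # [(0, 1), (1, 1), (2, 1)]
print(usefulness(a_feeds, b_table))                      # 1.75
```

The `limit` argument stands in for the "adapt to my resource status" requirement: a node with less bandwidth would pass a smaller limit.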

Challenges: Fetching Interval Determination

Adaptive fetching
  Problem: few hints about the publishing rate or entry lifetime
  Frequent polling: overloads servers and consumes clients’ network bandwidth
  Lazy polling: increases delay or misses entries

Adaptive algorithm
  Intuition: frequent fetching yields few new entries
  Freshness rate: fraction of new entries in the fetched document
  If freshness rate < target freshness: halve the fetching rate
  If freshness rate > target freshness: double the fetching rate

[Figure: a fetch of the HANI feed returns the entries in the feed: 1. Report 1, 2. Report 2, 3. Report 3, 4. …]
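
The halve/double rule can be sketched as follows. Halving the fetching rate is expressed here as doubling the interval between fetches. The function names, the default target of 0.5, and the clamping bounds (2 minutes and 16 hours) are assumptions for illustration; the slides do not give these values.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in the fetched document that were not seen before."""
    if not fetched_ids:
        return 0.0
    new_count = sum(1 for entry_id in fetched_ids if entry_id not in seen_ids)
    return new_count / len(fetched_ids)


def next_interval(interval, freshness, target=0.5,
                  min_interval=120.0, max_interval=57600.0):
    """Return the next fetch interval in seconds.

    Too few new entries: halve the fetching rate (double the interval).
    Too many new entries: double the fetching rate (halve the interval).
    """
    if freshness < target:
        interval = interval * 2
    elif freshness > target:
        interval = interval / 2
    # Clamp to assumed bounds so the interval never collapses or grows unbounded.
    return min(max(interval, min_interval), max_interval)


seen = {"report-1", "report-2", "report-3"}
fetched = ["report-1", "report-2", "report-3", "report-4"]
rate = freshness_rate(fetched, seen)  # 0.25: one new entry out of four
print(next_interval(600.0, rate))     # 1200.0: fetched too often, so back off
```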

Challenges: Data Dissemination

Goal: minimize bandwidth consumption

1. Limit the boundary of delivery
  Forward only to matching neighbors (subscription set, hop_count)
  Reduces forwarding overhead
2. Reduce the unit of delivery
  Unit of delivery: entry bundle, a set of new entries (old entries are filtered out)
  Reduces redundant content delivery
3. Check before forwarding
  Exchange the ID of an entry bundle (ID: SHA-1 digest of the bundle)
  If the bundle is undelivered, deliver it

[Figure: a node fetches HANI; neighbors list HANI at hops 0, 0, 1, and 2. Max subset hops = 1]
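
Step 3, check before forwarding, can be sketched with Python's standard `hashlib`. The procedure names `check_did` and `put_entries` appear later in the slides, but their signatures here are assumptions, as are the `Neighbor` class and the way a bundle is serialized before hashing (sorted entries joined by a NUL byte).

```python
import hashlib


def bundle_id(entries):
    """SHA-1 digest of an entry bundle; the serialization is an assumed convention."""
    digest = hashlib.sha1()
    for entry in sorted(entries):
        digest.update(entry.encode("utf-8"))
        digest.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return digest.hexdigest()


class Neighbor:
    """In-memory stand-in for a remote peer reached over RPC."""

    def __init__(self):
        self.delivered = set()  # bundle IDs this peer already holds
        self.inbox = []         # bundles received so far

    def check_did(self, did):
        """Cheap query: has this bundle already been delivered here?"""
        return did in self.delivered

    def put_entries(self, did, entries):
        """Expensive call: transfer the actual entries."""
        self.delivered.add(did)
        self.inbox.append(list(entries))


def forward(neighbor, entries):
    """Send the bundle only if the neighbor does not have it yet."""
    did = bundle_id(entries)
    if neighbor.check_did(did):
        return False
    neighbor.put_entries(did, entries)
    return True


peer = Neighbor()
bundle = ["Report 1", "Report 2"]
print(forward(peer, bundle))  # True: first delivery
print(forward(peer, bundle))  # False: duplicate suppressed by the ID check
```

The point of the design is that the small ID exchange happens on every candidate delivery, while the larger entry transfer happens at most once per neighbor.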

Challenges: Free Riding Prevention

Nodes may manifest selfish behavior
  Only receive, without forwarding
  Lie about their subscription set to become a preferred neighbor

Solution: provide a neighbor evaluation method
  Contribution metric
    Credit nodes that forward feeds that I subscribe to and that my near neighbors subscribe to
    Level of contribution: direct subscription, 1-hop subscription, 2-hop subscription, …
    cm_{i,j} += w_f^{-h_f}
  Cut out unhelpful neighbors: I helped it, but it did not help me
    d_{i,j} = cm_{i,j} - cm_{j,i}

Features
  Uses local information only
  Easy to implement and enforce the mechanism
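
The contribution metric and the deficit can be sketched as a small local ledger. Several things here are assumptions rather than facts from the slides: the update is read as a weight raised to the negative hop count, a single weight `w` replaces the per-feed `w_f`, `cm_{i,j}` is taken to mean what neighbor j contributed to me (node i), and the class name and cut-off threshold are hypothetical.

```python
from collections import defaultdict


class ContributionLedger:
    """Per-node bookkeeping that uses local information only."""

    def __init__(self, weight=2.0):
        self.weight = weight                 # assumed single weight w > 1
        self.received = defaultdict(float)   # cm[i][j]: what neighbor j gave me
        self.given = defaultdict(float)      # cm[j][i]: what I gave neighbor j

    def _credit(self, hop):
        # Direct subscription (hop 0) is worth 1, one hop 1/w, two hops 1/w^2, ...
        return self.weight ** (-hop)

    def record_received(self, neighbor, hop):
        """Neighbor forwarded a feed that sits at `hop` in my subscription table."""
        self.received[neighbor] += self._credit(hop)

    def record_given(self, neighbor, hop):
        """I forwarded a feed that sits at `hop` in the neighbor's table."""
        self.given[neighbor] += self._credit(hop)

    def deficit(self, neighbor):
        """d = cm[i][j] - cm[j][i]; negative means I helped more than I was helped."""
        return self.received[neighbor] - self.given[neighbor]

    def unhelpful(self, threshold=-1.0):
        """Neighbors whose deficit fell below an assumed cut-off threshold."""
        names = set(self.received) | set(self.given)
        return sorted(n for n in names if self.deficit(n) < threshold)


ledger = ContributionLedger(weight=2.0)
ledger.record_received("B", hop=0)   # +1.0
ledger.record_received("B", hop=1)   # +0.5
ledger.record_given("B", hop=0)      # B's side: +1.0
for _ in range(3):
    ledger.record_given("C", hop=0)  # C only receives
print(ledger.deficit("B"))           # 0.5
print(ledger.unhelpful())            # ['C']
```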

Challenges: Entry Searching

Overlay as a distributed storage
  Iterative searching
    Strong points: searching latency, query traffic
  Recursive searching (flooding)
    Strong points: low overhead for the requester, caching of popular queries, can be reflected in neighbor evaluation

Benefits of FeedEx

1. Scalability
2. Archivability
  Storage of entries
3. Controllability
  Compared to web-based readers, e.g. the fetch interval
4. Filtering and recommendation
  Share opinions on entries (e.g. voting)
  Feed recommendation
5. Privacy
  Users can fetch documents for others, which anonymizes the actual users

Architecture of FeedEx

[Figure: architecture diagram with the labels Feed Fetch Scheduler, Connector, Neighbor, Server, and RPC, and links to news feed servers, to neighbors, from neighbors, and to the list server]

Prototype: Python
  Networking: Twisted
  Protocol: XML-RPC (interoperability, fast prototyping)
  Entry storage: SQLite (lightweight RDB)
  RSS parser: feedparser.org

Experimental Setup

Two modes
  Stand-alone mode (SLN)
  FeedEx mode (XCH)

Metrics
  Time lag
  Missing entries
  Communication cost

Experiments
  Use 189 PlanetLab nodes
  Run for 22 hours on a weekday
  Primary factor: 6 fetching intervals
  Each node subscribes to 20 out of 70 feeds

Results: Time Lag

Average time lag
  Average of node averages
  Without applying the adaptive fetching algorithm
  Regardless of the fetching interval, contents are delivered soon

[Figure: time lag (hours) versus fetching interval (hours); annotation: 15.8 times]

Results: Missing Entries

Rate of missing entries
  # entries in a node / # entries in a reference node
  Low missing rate despite problems in the network (DNS errors or routing errors)
  Sometimes better than the reference node

[Figure: missing entries (%) versus fetching interval (hours: 0.5, 1, 2, 4, 8, 16); series labeled XCH miss]

Results: Communication Cost

Two most frequently called procedures: check_did, put_entries
  check_did call: single IP packet
  put_entries: 2 calls / minute, delivering 2.67 entries / call

Low communication cost

[Figure: received calls per minute versus fetching interval (hours: 0.5, 1, 2, 4, 8, 16); series labeled check_did]
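
As a rough worked figure, the two put_entries numbers can be multiplied to estimate how many entries a node receives. This assumes both numbers are averages over the same nodes and time span, which the slides do not state explicitly.

```latex
\[
2\ \frac{\text{calls}}{\text{min}} \times 2.67\ \frac{\text{entries}}{\text{call}}
  \approx 5.34\ \frac{\text{entries}}{\text{min}}
  \approx 320\ \frac{\text{entries}}{\text{hour}}
\]
```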

Critique

Strong points
  Made a new problem from an old domain, “web caching”
  Free from delay / failure of nodes
  Draws out possible benefits and extensions
  Simple and practically deployable
  Tried to find a mechanism that is good for both servers and clients

Critique

Weak points
  Is there really overload due to RSS feed delivery?
    Only a small text file is delivered
    Should have considered podcasting (multimedia RSS)
  Will the clients donate their resources?
    Is “short delay” a strong incentive?
    Is “low bandwidth consumption” a strong incentive?
  Will people’s subscription sets really overlap a lot?
    Not effective for SPs providing diverse RSS feeds, e.g. Naver blog, egloos
  Is it really robust to frequent leave and join?
  Lack of server-side evaluation
    Server load and network resources
  Delivering critical data (e.g. timely news) using RSS?

Supplementary slides

Entry Lifetime

Generally CNN,

Publishers have policies (probably)

[Figure: cumulative probability versus lifetime (hours, 0 to 80) for CNN, FOX News, Techbargains.com, and Beta News]

New idea

Topic-based feed pub/sub system
  Why should we register the address of a feed?
    Need to find addresses providing the contents I want
    A feed may contain contents that I don’t want

[Figure: web content providers publish feeds into a topic-based feed pub/sub system (P2P based); a topic of interest (maybe tags?) goes in and contents related to the topic come out]

New idea

Topic-based feeding services have already been launched
  Baebo: creates new feeds by keyword from the Amazon, Yahoo, and eBay feeds
  Say4: extracts entries containing sentences in the Bible from the BBC feed

But a centralized server runs the service
  Limitation on the number of input feeds
  Hard to add input feeds dynamically compared to the P2P approach

