Distributed Monitoring and Management Presented by: Ahmed Khurshid Abdullah Al-Nayeem CS 525 Spring 2009 Advanced Distributed Systems
Transcript
Page 1: Distributed Monitoring and Management Presented by: Ahmed Khurshid Abdullah Al-Nayeem CS 525 Spring 2009 Advanced Distributed Systems.

Distributed Monitoring and Management

Presented by: Ahmed Khurshid

Abdullah Al-Nayeem

CS 525 Spring 2009 Advanced Distributed Systems

Page 2

Large Distributed Systems

Infrastructure: PlanetLab has 971 nodes in 485 sites.

Application: Hadoop at Yahoo! runs on 4,000 nodes.


Google probably has more than 450,000 servers worldwide (Wikipedia)

Not only the number of nodes: the amount of data processed in commercial systems is also enormous, e.g. Facebook has over 10 billion pictures uploaded.

3/12/2009 Department of Computer Science, UIUC

Page 3

Monitoring and Management

• Monitoring and management of both infrastructures and applications
– Corrective measures against failures, attacks, etc.
– Ensuring better performance, e.g. load balancing

• What resources are managed?
– Distributed application processes and objects (log files, routing table, etc.)
– System resources: CPU utilization, free disk space, bandwidth utilization

Page 4

Management and Monitoring Operations

• Query current system status
– CPU utilization, disk space, etc.
– Process progress rate

• Push software updates
– Install the query program

• Monitor dynamically changing state

[Figure: a querying node probing nodes n1–n6]

Page 5

Challenges

• Managing today’s large-scale systems is difficult.
– A centralized solution doesn’t scale (no in-network aggregation).
– Self-organization capability is becoming a necessity.
– Responses are expected in seconds, not minutes or hours.
– Node failures cause inconsistent results (network partitions).

• Brewer’s conjecture: it is impossible for a web service to provide all three of the following guarantees: Consistency, Availability, Partition tolerance (the CAP dilemma).

Page 6

Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining

Presented by: Abdullah Al-Nayeem

Robbert van Renesse, Kenneth P. Birman, and Werner Vogels

Page 7

Overview

• Astrolabe as an information management service:
– Locates and collects the status of a set of servers.
– Reports summaries of this information (aggregation mechanism using SQL).
– Automatically updates and reports any changed summaries.

• Design principles:
– Scalability through a hierarchy of resources
– Robustness through a gossip protocol (p2p)
– Flexibility through customization of queries (SQL)
– Security through certificates

Page 8

Astrolabe Zone Hierarchy

[Figure: nodes n1–n8 grouped into the zones /uiuc, /uiuc/cs, /cornell, and /berkeley]

Page 9

Astrolabe Zone Hierarchy (2)

[Figure: leaf zones /uiuc/ece/n1, /uiuc/cs/n4, /uiuc/cs/n6, /cornell/n2, /cornell/cs/n3, /berkeley/eecs/n5, /berkeley/eecs/n7, and /berkeley/eecs/n8 under internal zones /uiuc/ece, /uiuc/cs, /cornell/cs, and /berkeley/eecs, and the root /]

It is a virtual hierarchy: only the hosts in leaf zones run Astrolabe agents.

- The zone hierarchy is determined by the administrators (less flexibility).
- Assumption: zone names are consistent with the physical topology.

Page 10

Decentralized Hierarchical Database

• An attribute list is associated with each zone.
– This attribute list is called the Management Information Base (MIB).
– Attributes include information on load, total free disk space, process information, etc.

• Each internal zone has a relational table of the MIBs of its child zones.
– The leaf zone is an exception (see the next slide).

Page 11

Decentralized Hierarchical Database (2)

[Figure: the hierarchical database as seen from /uiuc/cs/n6. The root zone / has rows for uiuc, cornell, and berkeley; /uiuc has rows for ece (Load = 0.1) and cs (Load = 0.3); /uiuc/cs has rows for n4 and n6. The leaf MIB of n4 holds system attributes (Load = 0.1, Disk = 1.2 TB), process attributes (Service A(1.1), progress = 0.7), and files; the leaf MIB of n6 holds Load = 0.3, Disk = 0.6 TB, Service A(1.0), progress = 0.5, and files.]

The agent at /uiuc/cs/n6 keeps a local copy of these management tables of MIBs.

Page 12

State Aggregation

[Figure: aggregation up the hierarchy. The leaf MIBs of /uiuc/cs report Load = 0.3 (n4, Time = 121) and Load = 0.5 (n6, Time = 101); the zones /uiuc/cs, /uiuc, and / each aggregate their own children with the SQL query SELECT MIN(Load) AS Load, yielding Load = 0.3 at the top (Time = 130).]

Each zone aggregates the results of its child zones using an SQL query. Other aggregation functions include MAX(attribute), SUM(attribute), AVG(attribute), and FIRST(n, attribute).
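The bottom-up MIN(Load) aggregation on this slide can be mimicked in a few lines. This is an illustrative sketch, not Astrolabe's code; the zone dictionary layout and the load value for n1 are assumptions:

```python
def aggregate_load(zone):
    """Leaf zones report their own load; internal zones take the
    minimum over their child zones, like SELECT MIN(Load) AS Load."""
    if "load" in zone:                      # leaf zone with a measured load
        return zone["load"]
    return min(aggregate_load(c) for c in zone["children"])

# Hypothetical /uiuc subtree (n1's load is made up for illustration).
uiuc = {"name": "uiuc", "children": [
    {"name": "ece", "children": [{"name": "n1", "load": 0.4}]},
    {"name": "cs",  "children": [{"name": "n4", "load": 0.3},
                                 {"name": "n6", "load": 0.5}]},
]}
print(aggregate_load(uiuc))   # 0.3, the minimum load anywhere under /uiuc
```

The same recursion works for MAX, SUM, or AVG by swapping the combining function.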

Page 13

State Merge – Gossip Protocol

[Figure: agents /uiuc/cs/n4 and /uiuc/cs/n6 gossip. Before the exchange, n4's copy holds /uiuc/cs Load = 0.3 (Time = 121) and / at Time = 130, while n6's copy holds /uiuc/cs Load = 0.5 (Time = 101) and / at Time = 110; after the exchange, both copies hold the entries with the newer timestamps.]

Each agent periodically contacts some other agent and exchanges the state associated with each MIB based on its timestamp.
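The timestamp-based exchange can be sketched as a last-writer-wins merge of two agents' tables. The table representation (zone name mapping to a (timestamp, MIB) pair) is an assumption for illustration:

```python
def merge(table_a, table_b):
    """Combine two agents' copies of the management tables: for each
    zone, the entry with the newer timestamp wins."""
    merged = dict(table_a)
    for zone, (ts, mib) in table_b.items():
        if zone not in merged or ts > merged[zone][0]:
            merged[zone] = (ts, mib)
    return merged

n4_view = {"uiuc": (130, {"Load": 0.3}), "cornell": (101, {"Load": 0.5})}
n6_view = {"uiuc": (121, {"Load": 0.5}), "cornell": (110, {"Load": 0.4})}
print(merge(n4_view, n6_view))
# Both agents converge on uiuc at time 130 and cornell at time 110.
```

Because merge is commutative and idempotent, repeated pairwise gossip drives all copies toward the freshest state.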

Page 14

More about Astrolabe Gossip

[Figure: the zone hierarchy again, highlighting the MIBs gossiped within /uiuc/cs]

How does /uiuc/cs/n4 learn the MIB of /cornell? By gossiping with /cornell/n2 or /cornell/cs/n3.

Page 15

More about Astrolabe Gossip (2)

• Each zone dynamically elects a set of representative agents to gossip on its behalf.
– Election can be based on the load of the agents or their longevity.
– The MIB contains the list of representative agents.
– An agent can represent multiple zones.

• Each agent periodically gossips for each zone it represents.
– It randomly picks another sibling zone and one of that zone's representative agents.
– It gossips the MIBs of all the sibling zones.

• Gossip dissemination within a zone completes in O(log K) rounds, where K is the number of child zones.
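The O(log K) behavior comes from the informed set roughly doubling each round. A toy simulation (illustrative only; real Astrolabe gossips full MIBs, not a single bit) shows the effect:

```python
import random

def rounds_to_disseminate(k, seed=0):
    """Simulate push gossip among k sibling zones: each round, every
    informed zone tells one random sibling. Returns rounds until all
    k zones are informed."""
    rng = random.Random(seed)
    informed = {0}                           # zone 0 starts with the update
    rounds = 0
    while len(informed) < k:
        for z in list(informed):
            informed.add(rng.randrange(k))   # gossip to a random sibling
        rounds += 1
    return rounds

print(rounds_to_disseminate(64))  # typically close to log2(64) = 6, plus a few extra rounds
```

Since the informed set can at most double per round, at least log2(K) rounds are always needed; randomness adds only a small constant factor on top.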

Page 16

More about Astrolabe Gossip (3)

[Figure: the representative agents for /uiuc (/uiuc/cs/n4 and /uiuc/ece/n1) gossip with representatives of the other top-level zones (/cornell/cs/n3, /berkeley/eecs/n5, /berkeley/eecs/n8) about the MIBs of /berkeley, /cornell, and /uiuc]

Page 17

Example: P2P Caching of Large Objects

• Query to locate a copy of the file game.db:
– At each leaf zone: SELECT COUNT(*) AS file_count FROM files WHERE name = 'game.db'. Each host also installs an attribute 'result' with its host name in its leaf MIB.
– At each internal zone: SELECT FIRST(1, result) AS result, SUM(file_count) AS file_count WHERE file_count > 0. This aggregates the 'result' of each child zone and picks one host per zone.

• The SQL query code is installed in an Astrolabe agent using an aggregation function certificate (AFC).
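How the two queries compose can be sketched as follows. This is a hedged illustration of the slide's logic, not Astrolabe's SQL engine; the host names and file lists are made up:

```python
def leaf_mib(host, files):
    """Leaf-level query: count local copies of game.db and expose the
    host name as 'result' when a copy exists."""
    n = sum(1 for f in files if f == "game.db")
    return {"file_count": n, "result": host if n > 0 else None}

def aggregate(children):
    """Internal-zone query: SUM(file_count) plus FIRST(1, result)
    over the children that actually hold a copy."""
    total = sum(c["file_count"] for c in children)
    hits = [c["result"] for c in children if c["file_count"] > 0]
    return {"file_count": total, "result": hits[0] if hits else None}

cs = aggregate([leaf_mib("n4", ["a.txt"]),
                leaf_mib("n6", ["game.db", "b.txt"])])
print(cs)   # {'file_count': 1, 'result': 'n6'}
```

Applying `aggregate` again at each ancestor zone yields, at the root, one nearby host per zone that holds the file.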

Page 18

Example: P2P Caching of Large Objects (2)

• The querying application introduces this new AFC into the management table of some Astrolabe agent.

• Query propagation and aggregation of output:
1. An Astrolabe agent automatically evaluates the AFC for the leaf zone and recursively updates the tables of the ancestor zones.
• A copy of the AFC is included along with the result.
• The query is evaluated recursively up to the root zone, as long as policy permits.
2. AFCs are also gossiped to other agents (as part of the MIB).
• A receiving agent scans the gossiped message and installs the new AFCs in its leaf MIBs.

• Each AFC has an expiration time. Until it expires, its query is re-evaluated frequently.

Page 19

Membership

• Removing failed or disconnected nodes:
– Astrolabe also gossips about membership information.
– If a process (or host) fails, its MIB will eventually expire and be deleted.

• Integrating new members:
– Astrolabe relies on IP multicast to set up the initial contacts.
• The gossip message is also occasionally broadcast on the local LAN.
– Astrolabe agents also occasionally contact a set of their relatives.
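Failure detection by expiry can be sketched as a sweep that drops any MIB not refreshed within a timeout. The table shape and the 30-unit timeout are assumptions for illustration:

```python
TIMEOUT = 30   # hypothetical expiry window (time units)

def expire(mibs, now):
    """Keep only MIBs refreshed within TIMEOUT; stale entries
    (from failed or disconnected nodes) are dropped."""
    return {z: (ts, m) for z, (ts, m) in mibs.items() if now - ts <= TIMEOUT}

mibs = {"n4": (100, {"Load": 0.3}), "n6": (60, {"Load": 0.5})}
print(expire(mibs, 120))   # n6, last refreshed at 60, is dropped
```

Because gossip keeps refreshing live nodes' timestamps everywhere, only truly failed or partitioned nodes age out.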

Page 20

Simulation Results

[Figure: effect of the branching factor of the zone hierarchy on gossip dissemination time. Number of representative agents per zone = 1; no failures.]

The smaller the branching factor of the zone hierarchy, the slower the gossip dissemination.

3/12/2009 Department of Computer Science, UIUC

Page 21: Distributed Monitoring and Management Presented by: Ahmed Khurshid Abdullah Al-Nayeem CS 525 Spring 2009 Advanced Distributed Systems.

Simulation Results (2)

[Figure: effect of the number of representative agents on gossip dissemination time. Branching factor = 25; no failures.]

The more representative agents per zone, the faster the gossip dissemination in the presence of failures.

3/12/2009 Department of Computer Science, UIUC

Page 22: Distributed Monitoring and Management Presented by: Ahmed Khurshid Abdullah Al-Nayeem CS 525 Spring 2009 Advanced Distributed Systems.

Discussion

• Astrolabe is not meant to provide routing features similar to DHTs.
– How is Astrolabe different from DHTs?

• Astrolabe attributes are updated proactively and frequently.
– Do you think this proactive management is better than a reactive one?

Page 23

Moara: Flexible and Scalable Group-Based Querying System

Steven Y. Ko1, Praveen Yalagandula2, Indranil Gupta1,Vanish Talwar2, Dejan Milojicic2, Subu Iyer2

1University of Illinois at Urbana-Champaign 2HP Labs, Palo Alto

ACM/IFIP/USENIX Middleware, 2008

Presented by: Ahmed Khurshid

Page 24

Motivation

[Figure: a cluster of machines running Linux, Apache, and MySQL; the query "What is the average memory utilization of machines running MySQL?" is sent to every machine.]

Naïve approach: querying all machines consumes extra bandwidth and adds delay.

Page 25

Motivation (cont.)

[Figure: the same cluster, but the query is routed only to the machines running MySQL.]

Better approach: querying only the relevant machines avoids unnecessary traffic.

Page 26

Two Approaches

[Figure: a spectrum of designs. Single tree (no grouping): query cost is high. Group-based trees: group maintenance cost is high. Moara sits between the two extremes.]

Page 27

Moara Features

• Moara maintains aggregation trees for different groups.
– It uses the FreePastry DHT for this purpose.

• Supports a query language of the form (query-attribute, aggregation function, group-predicate).
– E.g. (Mem-Util, Average, MySQL = true)

• Supports composite queries that target multiple groups using unions and intersections.

• Reduces bandwidth usage and response time by:
– Adaptively pruning branches of the DHT tree
– Bypassing intermediate nodes that do not satisfy a given query
– Querying only those nodes that can answer quickly without affecting the result of the query

Page 28

Common Queries

Page 29

Group Size and Dynamism

[Figures: usage of PlanetLab nodes by different slices, and usage of HP's utility computing environment by different jobs. Most slices have fewer than 10 nodes; the number of machines used for a job varies.]

Page 30

Moara: Data and Query Model

• A query has three parts:
– Query attribute
– Aggregation function
– Group-predicate

• Aggregation functions are partially aggregatable, enabling in-network aggregation.

• Composite queries can be constructed using "and" and "or" operators.
– E.g. (Linux = true and Apache = true)

[Figure: a Moara agent storing a list of (attribute, value) pairs]
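Partial aggregatability is what lets intermediate nodes combine results in-network. A sketch for Average (a standard decomposition into (sum, count) pairs; the attribute values are made up):

```python
def partial(value):
    """Turn one node's raw value into a partial aggregate."""
    return (value, 1)                       # (sum, count)

def combine(a, b):
    """Merge two partial aggregates; usable at any internal tree node."""
    return (a[0] + b[0], a[1] + b[1])

def finalize(p):
    """Turn the root's partial aggregate into the final average."""
    return p[0] / p[1]

parts = [partial(v) for v in [0.2, 0.4, 0.6]]   # per-node Mem-Util samples
total = parts[0]
for p in parts[1:]:
    total = combine(total, p)
print(finalize(total))   # average memory utilization, approximately 0.4
```

MIN, MAX, SUM, and COUNT decompose the same way; MEDIAN does not, which is why partial aggregatability is stated as a requirement.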

Page 31

Scalable Aggregation

• Moara employs a peer-to-peer, in-network aggregation approach.

• It maintains a separate DHT tree for each group predicate.

• The hash of the group attribute designates a node as the root of that tree.

• Queries are first sent to the root node, which then propagates them down the tree.

• Data coming from child nodes is aggregated before results are sent to the parent node.

Page 32

DHT Tree

Steps:
• Take the hash of the group predicate, e.g. Hash(ServiceX) = 000.
• Select the root based on the hash.
• Use Pastry's routing mechanism to join the tree (similar to SCRIBE).

[Figure: nodes with IDs 000–111; node 000 is the root of the tree for ServiceX]

Page 33

Optimizations

• Prune branches of the tree that do not contain any node satisfying a given query.
– The cost of maintaining the group tree must be balanced against the query resolution cost.

• Bypass internal nodes that do not satisfy a given query.

• For composite queries, select a minimal set of groups by rewriting the query into a more manageable form.

Page 34

Dynamic Maintenance

[Figure: a tree with root A, internal nodes B and C, and leaves D and E (under B), F and G (under C), and H (under A). Every node holds a binary local state variable prune (p) per attribute; initially p = false everywhere, and NO-PRUNE messages flow up the tree.]

Page 35

Dynamic Maintenance

[Figure: leaf F leaves the group and sets p = true; it sends a PRUNE message to its parent C, which records p(F) = true.]

Page 36

Dynamic Maintenance

[Figure: G also sets p = true; with p(F) = p(G) = true, C sets its own p = true and sends a PRUNE message to A, which records p(C) = true.]

A high group churn rate causes more PRUNE/NO-PRUNE messages and may be more expensive than forwarding the query to all nodes.

Page 37

Adaptation Policy

• Maintain two additional state variables at every node:
– sat: tracks whether the subtree rooted at this node should continue receiving queries for a given predicate.
– update: denotes whether the node will update its prune variable or not.

• The following invariants are maintained:
– update = 1 AND sat = 1 => prune = 0
– update = 1 AND sat = 0 => prune = 1
– update = 0 => prune = 0
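The three invariants pin down prune completely, so it can be derived from (update, sat) rather than stored independently. A minimal sketch:

```python
def prune(update, sat):
    """Derive the prune bit from the adaptation-policy invariants:
    update=1, sat=1 -> 0; update=1, sat=0 -> 1; update=0 -> 0."""
    if update and not sat:
        return 1      # tracked subtree with no satisfying node: prune it
    return 0          # either satisfying, or not tracking (never prune)

# All three invariants hold by construction:
assert prune(1, 1) == 0
assert prune(1, 0) == 1
assert prune(0, 0) == 0 and prune(0, 1) == 0
```

Deriving prune this way makes it impossible for the pruning state to violate the invariants, whatever order sat and update are toggled in.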

Page 38

Adaptation Policy (cont.)

[Figure: state transitions of the adaptation policy. One transition fires when the overhead due to unrelated queries is higher than that of group maintenance messages; the reverse fires when group maintenance messages consume more bandwidth than queries.]

Page 39

Separate Query Plane

• Used to bypass intermediate nodes that do not satisfy a given query.

• Reduces message complexity from O(m log N) to O(m), where:
– N = total number of nodes in the system
– m = number of nodes satisfying the query

• Uses two locally maintained sets at each node:
– updateSet: the list of nodes forwarded to the parent
– qSet: the list of child nodes to which queries are forwarded

• A node's qSet is the union of all updateSets received from its children.

• Based on the size of its qSet and its sat value, a node decides whether to remain in the tree.
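The updateSet/qSet flow can be sketched as a bottom-up pass. This is an assumed rendering of the slide's rule (a node stays in the query plane if it satisfies the predicate or its qSet reaches the threshold), with made-up node names:

```python
def update_set(node, children_sets, sat, threshold=2):
    """Compute the updateSet a node forwards to its parent.
    A node that stays in the plane forwards just itself; a bypassed
    node exposes its satisfying descendants directly."""
    q_set = set().union(*children_sets) if children_sets else set()
    if sat or len(q_set) >= threshold:
        return {node}             # parent will forward queries to this node
    return q_set                  # bypassed: descendants are reached directly

# Leaves C and D satisfy the query, E does not; internal node B is NOSAT.
c = update_set("C", [], sat=True)
d = update_set("D", [], sat=True)
e = update_set("E", [], sat=False)
print(update_set("B", [c, d, e], sat=False))               # {'B'}: qSet size 2 meets threshold 2
print(update_set("B", [c, d, e], sat=False, threshold=3))  # {'C', 'D'}: B is bypassed
```

This reproduces the slide's two cases: with threshold 2, B stays in the tree; with threshold 3, queries skip B and go straight to C and D.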

Page 40

Separate Query Plane (cont.)

[Figure: a tree with root A, internal node B, and leaves C and D (SAT) and E (NOSAT). C and D report updateSets {C} and {D}; B's qSet is {C, D}. With threshold = 2, B stays in the tree and forwards its own name to A, so the query reaches C and D through B; with threshold = 3, B forwards {C, D} instead, so queries bypass B and go directly to C and D.]

A node remains in the tree if it satisfies the query (SAT) or |qSet| ≥ threshold.

Page 41

Composite Queries

• Moara does not maintain trees for composite queries.
• It answers composite queries by contacting one or more simple predicate trees.

• Example 1: (Free_Mem, Avg, ServiceX = true AND ServiceY = true)
– Two trees, one for each service.
– Can be answered using a single tree (whichever promises to respond earlier).

• Example 2: (Free_Mem, Avg, ServiceX = true OR ServiceY = true)
– Two trees, one for each service.
– Both trees need to be queried.

Page 42

Composite Queries (cont.)

• Moara selects a small cover: a set of trees sufficient to answer a query.

• For example:
– cover(Q = "A") = {A}, if A is a predefined group
– cover(Q = "A or B") = cover(A) ∪ cover(B)
– cover(Q = "A and B") = cover(A), cover(B), or cover(A) ∪ cover(B)

• Bandwidth is saved by:
– Rewriting a nested query to select a low-cost cover
– Estimating query costs for individual trees
– Using semantic information supplied by the user
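The cover rules above can be sketched over a tiny query AST. The tuple encoding and the tie-breaking choice for "and" (pick the smaller side as the cheaper cover, one of the legal options) are illustrative assumptions:

```python
def cover(q):
    """Compute a cover (set of predicate trees to query) for a query
    AST built from ("pred", name), ("or", l, r), ("and", l, r)."""
    op = q[0]
    if op == "pred":
        return {q[1]}                     # a predefined group's own tree
    left, right = cover(q[1]), cover(q[2])
    if op == "or":
        return left | right               # must query both sides
    return min(left, right, key=len)      # "and": either side suffices

print(cover(("or", ("pred", "ServiceX"), ("pred", "ServiceY"))))
# {'ServiceX', 'ServiceY'}: both trees must be queried
print(cover(("and", ("pred", "ServiceX"), ("pred", "ServiceY"))))
# a single tree suffices
```

A fuller implementation would weight `min` by estimated per-tree query cost rather than cover size, as the slide's cost-estimation bullet suggests.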

Page 43

Finding Low-Cost Covers

[Figure: the query-rewriting procedure. CNF = Conjunctive Normal Form; the procedure gives a minimal-cost cover.]

Page 44

Performance Evaluation - Dynamic Maintenance

• FreePastry simulator
• 10,000 nodes
• Churn events over a group of 2,000 nodes; 500 events
• Query for an attribute A with value ∈ {0, 1}

[Figure: bandwidth usage for various query-to-churn ratios]

Moara performs better than both extreme approaches.

Page 45

Performance Evaluation - Separate Query Plane

[Figure: bandwidth usage for different (group size, threshold) pairs]

For higher threshold values, the query cost does not depend on the total number of nodes.

Page 46

Emulab Experiments

• 50 machines, 10 instances of Moara per machine
• Fixed query rate

[Figures: latency and bandwidth usage with static groups; average latency of dynamically changing groups compared with a static group of the same size]

Latency and message cost increase with group size. Moara performs well even under frequent group churn.

Page 47

PlanetLab Experiments

• 200 PlanetLab nodes, one instance of Moara per node
• 500 queries injected 5 seconds apart

[Figure: Moara vs. a centralized aggregator]

Moara responds faster than a centralized aggregator.

Page 48

Discussion Points

• Using a DAG or Synopsis Diffusion instead of trees
• Handling group churn in the middle of a query
– Moara ensures eventual consistency
• Effect of nodes that are unreachable
• State maintenance overhead for different attributes
• Computation overhead of maintaining DHT trees for different attributes
• Using Moara in ad-hoc mobile wireless networks

Page 49

Network Imprecision: A New Consistency Metric for Scalable Monitoring

Navendu Jain†, Prince Mahajan⋆, Dmitry Kit⋆, Praveen Yalagandula‡, Mike Dahlin⋆, and Yin Zhang⋆

†Microsoft Research ⋆The University of Texas at Austin ‡HP Labs

Page 50

Motivation

• Providing a consistency metric suitable for large-scale monitoring systems
• Safeguarding accuracy despite node and network failures
• Providing a level of confidence in the reported information
• Efficiently tracking the number of nodes that fail to report status, or that report status multiple times

Page 51

Thanks

Questions and Comments?

