
Sistemas Distribuidos

Date post: 02-Jul-2015
Upload: locaweb
Description:
Diego Souza talks about distributed systems, giving an introduction to the basic concepts and some practical considerations that can affect our day-to-day work. Watch this talk at https://www.eventials.com/locaweb/sistemas-distribuidos/
Page 1: Sistemas Distribuidos

distributed systems
diego souza @ infra-dev

Page 2: Sistemas Distribuidos

agenda

● the basics
● models
● practical aspects

Page 3: Sistemas Distribuidos

the basics

Page 4: Sistemas Distribuidos

the basics

what is a distributed system? (cont.)
● a distributed system is a piece of software that ensures that a collection of independent computers appears to its users as a single coherent system;

Page 5: Sistemas Distribuidos

the basics

what is a distributed system? (cont.)
● a distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages;

Page 6: Sistemas Distribuidos

the basics

what is a distributed system?
● a distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable [Lamport];

Page 7: Sistemas Distribuidos

the basics

fallacies of a distributed system
1. the network is reliable;
2. latency is zero;
3. bandwidth is infinite;
4. the network is secure;
5. topology doesn't change;
6. there is one administrator;
7. transport cost is zero;
8. the network is homogeneous;

Page 8: Sistemas Distribuidos

the basics

examples:
● cassandra
● hadoop
● www
● internet
● etc.

Page 9: Sistemas Distribuidos

the basics

why?
● things no longer fit in a single machine;
● scalability [size, geographic, organizational];
● availability;
● fault tolerance;
● performance;

Page 10: Sistemas Distribuidos

the basics

scalability
● the ability of a system, network, or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth;

Page 11: Sistemas Distribuidos

the basics

performance
● depends on the context and what we want to achieve:
  ○ response time/low latency;
  ○ throughput;
  ○ utilization of computer resources;

Page 12: Sistemas Distribuidos

the basics

latency
● the state of being latent; delay, a period between the initiation of something and the occurrence;
● a wise man once said:
  ○ Bandwidth is easy. Engineers build bandwidth. But latency is hard. Only God gives us latency;

Page 13: Sistemas Distribuidos

the basics

availability
● the proportion of time a system is in a functioning condition. If a user cannot access the system, it is said to be unavailable;

Page 14: Sistemas Distribuidos

the basics

fault tolerance
● the ability of a system to behave in a well-defined manner once faults occur;

Page 15: Sistemas Distribuidos

models

Page 16: Sistemas Distribuidos

models

availability metrics

availability = uptime / (uptime + downtime)

availability = mtbf / (mtbf + mttr)

mtbf: mean time between failures

mttr: mean time to repair

● q: is every second the same?
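The two availability formulas above can be made concrete with a small calculation. This is a minimal sketch; the function names and the example figures (an mtbf of 999 hours, an mttr of 1 hour) are assumptions chosen only for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """availability = mtbf / (mtbf + mttr), as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float, hours_per_year: float = 365 * 24) -> float:
    """Expected downtime in a (non-leap) year for a given availability."""
    return (1.0 - avail) * hours_per_year


# Hypothetical system: fails on average every 999 hours, takes 1 hour to repair.
a = availability(mtbf_hours=999, mttr_hours=1)
print(f"availability: {a:.3%}")
print(f"downtime per year: {downtime_hours_per_year(a):.2f} hours")
```

Note that this metric weighs every hour of downtime equally, which is exactly what the question on the slide calls into doubt.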

Page 17: Sistemas Distribuidos

models

availability metrics

yield = successes / requests

● a: very unlikely!

Page 18: Sistemas Distribuidos

models

availability metrics

harvest = data_available / total_data

● how incomplete is this [think of websearch]?
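Yield and harvest can move independently, which a short calculation makes visible. The sketch below assumes a hypothetical search service whose index is split into 100 shards; the function names and numbers are illustrative, not from the talk.

```python
def yield_ratio(successes: int, requests: int) -> float:
    """yield = successes / requests: the fraction of requests that got an answer."""
    return successes / requests


def harvest_ratio(data_available: int, total_data: int) -> float:
    """harvest = data_available / total_data: how complete each answer is."""
    return data_available / total_data


# Hypothetical scenario: 2 of 100 index shards are down, but the service
# still answers every query from the remaining 98 shards.
print("yield:", yield_ratio(successes=1000, requests=1000))
print("harvest:", harvest_ratio(data_available=98, total_data=100))
```

In this scenario every request succeeds (full yield) while each answer is slightly incomplete (reduced harvest), which is often an acceptable trade for web search.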

Page 19: Sistemas Distribuidos

models

distributing the dataset
● partition
● replication

Page 20: Sistemas Distribuidos

models

partition
● improves performance [reduces dataset];
● improves availability [partial failures];
● usually application specific [random, time, user];
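As one possible illustration of partitioning by user, the sketch below maps a key to a partition with a stable hash. The function name `partition_for` and the choice of SHA-256 are assumptions made for the example; real systems pick a scheme that fits their access patterns.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key (e.g. a user id) to a partition index in [0, num_partitions)."""
    # A cryptographic hash is used only because it is stable across processes,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


for user in ["alice", "bob", "carol"]:
    print(user, "->", partition_for(user, num_partitions=4))
```

Plain modulo hashing moves most keys when `num_partitions` changes; consistent hashing is the usual way to limit that movement.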

Page 21: Sistemas Distribuidos

models

replication
● improves performance [full copy];
● improves availability [full copy, reed-solomon codes];
  ○ synchronous, asynchronous;
  ○ single copy, multi-master;
  ○ crdts
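To give a feel for crdts, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest examples. The class and method names are assumptions for illustration; the point is that merge is commutative, associative, and idempotent, so replicas converge regardless of the order in which they exchange state.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum: safe to apply repeatedly and in any order.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)      # update accepted at replica a
b.increment(3)      # concurrent update accepted at replica b
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```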

Page 22: Sistemas Distribuidos

models

replication [strong consistency]
● primary/copy [e.g. mysql master]
● 2pc [e.g. mysql cluster]
● paxos, zab, raft

Page 23: Sistemas Distribuidos

models

replication [weak consistency]
● amazon dynamo
  ○ consistent hashing [partitioning]
  ○ partial quorums
  ○ failure detection and read repair
  ○ gossip protocol
● note: r + w > n != strong consistency
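The consistent hashing idea listed for dynamo can be sketched as a hash ring with virtual nodes. This is an illustrative toy, not Dynamo's implementation: the `HashRing` class, the number of virtual nodes, and the use of SHA-256 are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: a key belongs to the next point clockwise."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        # Each physical node is placed on the ring at `vnodes` positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring position strictly after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


before = HashRing(["n1", "n2", "n3"])
after = HashRing(["n1", "n2", "n3", "n4"])
keys = [f"key-{i}" for i in range(1000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding n4")
```

Adding a node only takes keys away from existing nodes and hands them to the new one; keys never shuffle between the old nodes, which is the property that makes rebalancing cheap.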

Page 24: Sistemas Distribuidos

models

time
● global clock [ntp, total order]
● local clock [partial order]
● logical clock [partial order; lamport clock, vector clocks]
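A lamport clock is small enough to sketch in a few lines. The class below is a minimal illustration with assumed method names: every local event ticks the counter, and a receiver jumps past the timestamp carried by the message, so a send always gets a smaller timestamp than the matching receive.

```python
class LamportClock:
    """Logical clock: counts events instead of measuring wall-clock time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Move past both the local clock and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                   # local event on a
stamp = a.send()           # a sends a message to b
received = b.receive(stamp)
print(stamp, received)     # the receive is ordered after the send
```

The converse does not hold: a smaller timestamp does not prove that one event happened before another, which is the gap vector clocks close.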

Page 25: Sistemas Distribuidos

models

consensus & atomic broadcast
● consensus: vote & agreement;
● atomic broadcast: reliable message transmission and order guarantees;
● they are equivalent

Page 26: Sistemas Distribuidos

models

flp impossibility
● there is no deterministic algorithm that is guaranteed to solve the consensus problem in an asynchronous system subject to failures, even if messages can never be lost, at most one process may fail, and it can only fail by crashing
● note: it's not that bad! :)

Page 27: Sistemas Distribuidos

models

Page 28: Sistemas Distribuidos

models

cap: [note: "pick only two" is misleading]
● consistency: the same data at the same time;
● availability;
● partition tolerance: continues to operate despite message loss [network or node failure];

Page 29: Sistemas Distribuidos

practical aspects

Page 30: Sistemas Distribuidos

I find latency one of the most important aspects of performance

Page 31: Sistemas Distribuidos

hard to develop, even harder to operate: they are not unbreakable

Page 32: Sistemas Distribuidos

consensus is a hard problem

Page 33: Sistemas Distribuidos

failures are the norm

Page 34: Sistemas Distribuidos

metrics, metrics, metrics

Page 35: Sistemas Distribuidos

what to do in presence of failures

Page 36: Sistemas Distribuidos

think about backpressure mechanisms
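One simple backpressure mechanism is a bounded queue that refuses new work when it is full, so overload is pushed back to the producer instead of piling up in memory. The `BoundedInbox` class below is a hypothetical sketch built on the standard `queue` module.

```python
import queue


class BoundedInbox:
    """Accepts work only while there is room; callers must handle rejection."""

    def __init__(self, capacity: int) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)

    def offer(self, item) -> bool:
        """Return True if the item was accepted, False if the inbox is full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            return False  # signal the producer to slow down, retry later, or shed load

    def take(self):
        """Remove and return the oldest item; raises queue.Empty if there is none."""
        return self._queue.get_nowait()


inbox = BoundedInbox(capacity=2)
for job in ["job-1", "job-2", "job-3"]:
    print(job, "accepted" if inbox.offer(job) else "rejected")
```

Rejecting early is a design choice: the alternative, blocking the producer with `put()`, also applies backpressure but ties up the caller while it waits.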

Page 37: Sistemas Distribuidos

think about timeouts
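A call without a timeout can wait forever on a dependency that never answers. The sketch below bounds the wait with `concurrent.futures`; the helper `call_with_timeout` and the `slow_dependency` function are assumptions made for the example.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s: float, *args):
    """Run fn(*args) in a worker thread and wait at most timeout_s seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        # Raises concurrent.futures.TimeoutError if the result is not ready in time.
        return future.result(timeout=timeout_s)
    finally:
        # Do not wait for the worker here. Note that the timeout only bounds
        # how long the caller waits; the work itself is not cancelled.
        pool.shutdown(wait=False)


def slow_dependency() -> str:
    time.sleep(0.3)  # stands in for a slow remote call
    return "ok"


try:
    call_with_timeout(slow_dependency, 0.05)
except concurrent.futures.TimeoutError:
    print("dependency too slow, falling back")
```

For network calls, the timeout parameters of the client library in use are usually the better tool, since they also release the underlying connection.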

Page 38: Sistemas Distribuidos

feature flag as a deploy mechanism
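One way to read this slide: ship the code dark, then turn it on for a growing share of users. The sketch below shows a deterministic percentage rollout; the function name, the flag name, and the hashing scheme are assumptions for illustration rather than any particular flag library.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place each (flag, user) pair in one of 100 buckets."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    # A user's bucket never changes, so raising the percentage only adds users.
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(1000)]
enabled = sum(is_enabled("new-checkout", u, rollout_percent=10) for u in users)
print(f"{enabled} of {len(users)} users see the new code path")
```

Setting the percentage back to 0 switches the feature off without a new deploy, which is what makes the flag useful as a rollback mechanism.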

Page 39: Sistemas Distribuidos

think hard about scalability

Page 40: Sistemas Distribuidos

thanks :)
questions or comments?

Page 41: Sistemas Distribuidos

appendix

Page 42: Sistemas Distribuidos

appendix: what we have here

● cassandra
● zookeeper
● ceph
● etcd
● consul
● leela

Page 43: Sistemas Distribuidos

links
● http://book.mixu.net/distsys/

