Distributed Systems - Brown University cs.brown.edu/courses/cs138/s19/lectures/Day19_2019.pdf · Day...

Date post: 19-Aug-2020
Distributed Systems Day 19: Practical Consensus
Transcript
Page 1: Distributed Systems - Brown Universitycs.brown.edu/courses/cs138/s19/lectures/Day19_2019.pdf · Day 19: Practical Consensus. Azure Storage 1998 Paxos 2014 Raft 2008 ZAB. Consensus

Distributed Systems Day 19: Practical Consensus

Page 2:

Azure Storage

1998 Paxos

2014 Raft

2008 ZAB

Page 4:

Consensus in Practice

[Diagram: a Tapestry network whose nodes each hold a route table, backpointers, and a local <K,V> store, shown next to a three-server group: Server A (leader), Server B (follower), Server C (follower). Labels: Consistent Data, Consistent Applications, Tapestry Configuration.]

• Thus far, we've used consistency for the application, or for all of the application's data
• But metadata should be consistent
• The data and the storage don't need to be consistent

• Group membership: Who is in my Tapestry cluster?
• Configuration metadata: What is the IP of the LiteMiner master? How many nodes are in Raft?
• Distributed locks: Who has locks on a file?

Page 6:

How would you implement a lock with Raft?

• Primitives exposed to the FEs
  • Lock()
  • Unlock()
• Challenges:
  • Locks must be implemented as a state machine
  • Must understand log-replication semantics

[Diagram: two FEs and a three-server Raft group: Server A (leader), Server B (follower), Server C (follower)]
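To make "locks as a state machine" concrete, here is a minimal Python sketch. It assumes the consensus layer (not shown) has already agreed on an ordered log of Lock()/Unlock() requests; the entry format, class name, and client ids are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical log entry format: (operation, lock name, client id).
Entry = Tuple[str, str, str]


@dataclass
class LockStateMachine:
    """Deterministic lock table driven only by committed log entries."""
    holders: Dict[str, str] = field(default_factory=dict)

    def apply(self, entry: Entry) -> bool:
        op, name, client = entry
        if op == "lock":
            if name in self.holders:
                return False              # already held: request is rejected
            self.holders[name] = client
            return True
        if op == "unlock":
            if self.holders.get(name) != client:
                return False              # only the current holder may release
            del self.holders[name]
            return True
        raise ValueError(f"unknown operation: {op}")


def replay(log: List[Entry]) -> LockStateMachine:
    """Rebuild a replica's state by applying the committed log in order."""
    machine = LockStateMachine()
    for entry in log:
        machine.apply(entry)
    return machine


committed_log = [
    ("lock", "L", "fe1"),
    ("lock", "L", "fe2"),      # rejected, fe1 holds L
    ("unlock", "L", "fe1"),
    ("lock", "L", "fe2"),      # succeeds now
]
leader = replay(committed_log)
follower = replay(committed_log)
```

Because `apply` depends only on the entry and the current table, every replica that replays the same log ends up with the same lock holders, which is the property log replication relies on.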

Page 7:

Life Before Chubby was far worse…

• Distributed systems developers…
  • Implement Raft (well, actually Paxos)
  • Application must be written as a state machine
  • Potential performance problems
    • A quorum of 5 is easier than a quorum of 10K nodes
  • Shared critical regions (exclusive locks)
    • Hard to code/understand
    • People think they can… but they can't!
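The quorum comparison can be checked with a little arithmetic. The sketch below assumes a simple majority quorum; the function names are illustrative.

```python
def quorum_size(replicas: int) -> int:
    """Smallest group that forms a strict majority of `replicas`."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas may be down while a majority can still be reached."""
    return replicas - quorum_size(replicas)


small_cell = quorum_size(5)          # servers that must acknowledge in a 5-node group
whole_network = quorum_size(10_000)  # servers that must acknowledge with 10K nodes
```

With 5 servers, each decision waits on 3 acknowledgements; with 10,000 servers it would wait on 5,001.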

Page 8:

Before Chubby Came About…

• Lots of distributed systems with clients in the 10,000s
• How to do primary election?
  – Ad hoc (no harm from duplicated work)
  – Operator intervention (correctness essential)
• Unprincipled
• Disorganized
• Costly
• Low availability

Page 9:

Which would you program with? Locks? Or Raft?

Design requirements:

• Expose locks to developers
  • Locks are easier than redesigning as state machines
• Locks can't be permanent
  • If locks are permanent and servers fail, then locks are lost
• Not every server in the network should be part of the service

[Diagram: Tapestry = 10,000 nodes, each with a route table, backpointers, and a local <K,V> store; Chubby = 5 nodes holding the Tapestry configuration]
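One common way to make locks non-permanent is to attach a lease that the holder must keep renewing. The Python sketch below is an assumption-laden illustration: the class name, the 10-second lease, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all hypothetical.

```python
from typing import Dict, Optional, Tuple


class LeaseLockTable:
    """Locks that expire unless the holder renews them (illustrative only)."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self._locks: Dict[str, Tuple[str, float]] = {}   # name -> (holder, expiry)

    def holder(self, name: str, now: float) -> Optional[str]:
        entry = self._locks.get(name)
        if entry is None or entry[1] <= now:
            self._locks.pop(name, None)    # lease ran out: the lock is free again
            return None
        return entry[0]

    def acquire(self, name: str, client: str, now: float) -> bool:
        """Take a free lock, or renew the lease if `client` already holds it."""
        current = self.holder(name, now)
        if current is not None and current != client:
            return False
        self._locks[name] = (client, now + self.lease_seconds)
        return True


table = LeaseLockTable(lease_seconds=10.0)
first = table.acquire("config", "A", now=0.0)          # A takes the lock
blocked = table.acquire("config", "B", now=5.0)        # A's lease is still valid
after_crash = table.acquire("config", "B", now=12.0)   # A never renewed, so B gets it
```

If A crashes, nobody has to clean up after it: the lock becomes available once the lease runs out.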

Page 11:

What is the Chubby Paper about?

"Building Chubby was an engineering effort… it was not research. We claim no new algorithms or techniques. The purpose of this paper is to describe what we did and why, rather than to advocate it."

• Design of a consensus service based on well-known ideas
  • Distributed consensus, caching, notifications, file-system interface

Consensus protocols: Paxos (1998), Raft (2014), ZAB (2008)

Consensus services: Chubby (2001), ZooKeeper (2010), etcd (2013)

Page 12:

Chubby Design

Page 13:

Design Decisions: Motivating Locks?

• Lock service vs. consensus (Raft/Paxos) library
• Advantages:
  • No need to rewrite code
    • Maintain program structure, communication patterns
  • Can support notification mechanism
  • Smaller # of nodes (servers) needed to make progress
• Advisory instead of mandatory locks (why?):
  • Holding a lock called F neither is necessary to access the file F, nor prevents other clients from doing so
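The difference between advisory and mandatory locks is easiest to see in code. In this hypothetical Python sketch the store never consults the lock table when a file is written; only clients that choose to cooperate (here via `polite_write`) respect the lock. All names are illustrative.

```python
from typing import Dict


class AdvisoryStore:
    """Files plus advisory locks: the store never checks locks on access."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.lock_holder: Dict[str, str] = {}

    def try_lock(self, name: str, client: str) -> bool:
        if name in self.lock_holder:
            return False
        self.lock_holder[name] = client
        return True

    def write(self, name: str, data: str) -> None:
        # No lock check here: that is what makes the lock advisory.
        self.files[name] = data


def polite_write(store: AdvisoryStore, name: str, client: str, data: str) -> bool:
    """Cooperating clients check the lock themselves before writing."""
    if store.lock_holder.get(name) != client:
        return False
    store.write(name, data)
    return True


store = AdvisoryStore()
a_locked = store.try_lock("F", "A")
b_polite = polite_write(store, "F", "B", "from B")   # B respects A's lock
store.write("F", "from a careless client")           # nothing stops a direct write
```

A mandatory lock would instead reject the last call inside `write`.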

Page 14:

Design Decisions: Lock Types

• Coarse vs. fine-grained locks
  • Fine-grained: grab lock before every event
  • Coarse-grained: grab lock for a large group of events

Advantages of coarse-grained locks
• Less load on lock server
• Less delay when lock server fails
• Fewer lock servers and less availability required

Fine-grained locks
• More lock server load
• If needed, could be implemented on client side

Page 15:

System Structure

• Chubby cell: a small number of replicas (e.g., 5)
• Master is selected using a consensus protocol (e.g., Raft)

Page 16:

System Structure

• Clients
  • Send reads/writes only to the master
  • Communicate with the master via a Chubby library
• Every replica server
  • Is listed in DNS
  • Directs clients to the master
  • Maintains a copy of a simple database
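A rough Python sketch of how a client library might find the master, under stated assumptions: the replica names, the `handle_read` method, and the redirect tuple are all invented for illustration, and the election itself is omitted.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """One server in a cell; every replica knows who the current master is."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.master_name = ""              # filled in after the (omitted) election
        self.database: Dict[str, str] = {}

    def handle_read(self, key: str) -> Tuple[str, Optional[str]]:
        if self.name != self.master_name:
            return ("redirect", self.master_name)   # point the client at the master
        return ("ok", self.database.get(key))


def client_read(dns_entries: List[str], cell: Dict[str, Replica],
                key: str) -> Tuple[str, Optional[str]]:
    """Contact any DNS-listed replica and follow its redirect to the master."""
    target = dns_entries[0]
    for _ in range(2):                     # first try, then at most one redirect
        status, payload = cell[target].handle_read(key)
        if status == "ok":
            return target, payload
        target = payload
    raise RuntimeError("no master found")


names = ["r1", "r2", "r3", "r4", "r5"]
cell = {name: Replica(name) for name in names}
for replica in cell.values():
    replica.master_name = "r3"             # pretend the election picked r3
cell["r3"].database["/wombat/pouch"] = "contents"

served_by, value = client_read(names, cell, "/wombat/pouch")
```

The client starts at whichever replica DNS returns first (`r1` here) and ends up being served by `r3`.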

Page 17:

Reads and Writes

• Write
  • Master propagates the write to the replicas
  • Replies after the write reaches a majority (i.e., a quorum)
• Read
  • Master replies directly, as it has the most up-to-date state
  • Reads must still go to the master
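The write and read paths can be sketched in a few lines of Python. This is a toy model under several assumptions: replication is a synchronous method call, an unreachable replica is modelled with an `up` flag, and there is no log or rollback of partially replicated writes. Class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class ReplicaDB:
    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.data: Dict[str, str] = {}

    def replicate(self, key: str, value: str) -> bool:
        if not self.up:
            return False                   # unreachable replica: no acknowledgement
        self.data[key] = value
        return True


class Master:
    def __init__(self, followers: List[ReplicaDB]) -> None:
        self.followers = followers
        self.data: Dict[str, str] = {}

    def write(self, key: str, value: str) -> bool:
        acks = 1                           # the master counts itself
        acks += sum(f.replicate(key, value) for f in self.followers)
        cell_size = len(self.followers) + 1
        if acks < cell_size // 2 + 1:
            return False                   # no majority: do not acknowledge the client
        self.data[key] = value
        return True

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)          # answered from the master's own state


healthy = Master([ReplicaDB(), ReplicaDB(), ReplicaDB(up=False), ReplicaDB(up=False)])
ok = healthy.write("k", "v")               # 3 of 5 acknowledge

degraded = Master([ReplicaDB(), ReplicaDB(up=False), ReplicaDB(up=False), ReplicaDB(up=False)])
rejected = degraded.write("k", "v")        # only 2 of 5 acknowledge
```

Reads never contact the followers, which is why they are cheap but must go to the master.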

Page 18:

Chubby API and Locks

Page 19:

Simple UNIX-like File System Interface

• Bare-bones file & directory structure
  • /ls/foo/wombat/pouch
    • ls: lock service; common to all names
    • foo: cell name
    • /wombat/pouch: name within cell
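The three parts of a name such as /ls/foo/wombat/pouch can be pulled apart with a small helper. This Python sketch is illustrative only; `ChubbyName` and `parse_name` are hypothetical names, not part of any real client library.

```python
from typing import NamedTuple


class ChubbyName(NamedTuple):
    service: str   # the "ls" prefix, common to all names
    cell: str      # cell name
    path: str      # name within the cell


def parse_name(name: str) -> ChubbyName:
    parts = name.split("/")
    # "/ls/foo/wombat/pouch" -> ["", "ls", "foo", "wombat", "pouch"]
    if len(parts) < 4 or parts[0] != "" or parts[1] != "ls":
        raise ValueError(f"not a lock-service name: {name!r}")
    return ChubbyName(parts[1], parts[2], "/" + "/".join(parts[3:]))


parsed = parse_name("/ls/foo/wombat/pouch")
```

Here `parsed.cell` is `"foo"` and `parsed.path` is `"/wombat/pouch"`.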

Page 20:

Simple UNIX-like File System Interface

• Bare-bones file & directory structure
  • /ls/foo/wombat/pouch
• Does not support, maintain, or reveal
  • Moving files
  • Path-dependent permission semantics
  • Directory modified times, file last-access times

Page 21:

Nodes

• Node: a file or directory
  • Any node can act as an advisory reader/writer lock
• A node may be either permanent or ephemeral
  • Ephemeral nodes are used as temporary files, e.g., to indicate that a client is alive
• Metadata
  • Three names of ACLs (R/W/change ACL name)
    • Authentication built into RPC
  • 64-bit file content checksum
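A small Python sketch of the permanent vs. ephemeral distinction. It makes a simplifying assumption that is not from the slides: an ephemeral node is tied to a single session and disappears when that session ends. The class, method, and path names are hypothetical.

```python
from typing import Dict, Optional


class NodeTree:
    def __init__(self) -> None:
        # path -> session keeping the node alive (None for permanent nodes)
        self.nodes: Dict[str, Optional[str]] = {}

    def create(self, path: str, ephemeral_owner: Optional[str] = None) -> None:
        self.nodes[path] = ephemeral_owner

    def session_ended(self, session: str) -> None:
        """Drop every ephemeral node that the ended session was keeping alive."""
        self.nodes = {p: o for p, o in self.nodes.items() if o != session}

    def exists(self, path: str) -> bool:
        return path in self.nodes


tree = NodeTree()
tree.create("/workers")                                    # permanent directory
tree.create("/workers/w1", ephemeral_owner="session-1")    # "I am alive" marker
alive_before = tree.exists("/workers/w1")
tree.session_ended("session-1")                            # the client died or disconnected
alive_after = tree.exists("/workers/w1")
```

Other clients can treat the presence of `/workers/w1` as a liveness signal for that worker.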

Page 22:

Locks

• Any node can act as a lock (shared or exclusive)
• Advisory (vs. mandatory)
  • Protect resources at remote services
  • No value in extra guards by mandatory locks
• Write permission needed to acquire
  • Prevents an unprivileged reader from blocking progress
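The shared/exclusive modes and the write-permission rule can be sketched as follows. This is a hypothetical Python model: the ACL is a plain set of client names, and lock release and upgrades are left out.

```python
from typing import Optional, Set


class NodeLock:
    """Reader/writer lock on one node; acquiring in any mode needs write permission."""

    def __init__(self, write_acl: Set[str]) -> None:
        self.write_acl = write_acl
        self.exclusive_holder: Optional[str] = None
        self.shared_holders: Set[str] = set()

    def acquire(self, client: str, mode: str) -> bool:
        if client not in self.write_acl:
            return False                   # readers without write permission cannot lock
        if mode == "exclusive":
            if self.exclusive_holder is not None or self.shared_holders:
                return False
            self.exclusive_holder = client
            return True
        if mode == "shared":
            if self.exclusive_holder is not None:
                return False
            self.shared_holders.add(client)
            return True
        raise ValueError(f"unknown mode: {mode}")


lock = NodeLock(write_acl={"a", "b"})
reader_only = lock.acquire("c", "shared")      # c lacks write permission
a_shared = lock.acquire("a", "shared")
b_shared = lock.acquire("b", "shared")         # shared holders can coexist
b_exclusive = lock.acquire("b", "exclusive")   # blocked while shared holders exist
```

Without the permission check, client `c` could take a shared lock and keep writers from ever acquiring the exclusive one.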

Page 23:

Locks and Sequences

• Potential lock problems in distributed systems
  • A holds a lock L, issues request W, then fails
  • B acquires L (because A fails), performs actions
  • W arrives (out of order) after B's actions
• Solution 1: backward compatible
  • The lock server will prevent other clients from getting the lock if a lock becomes inaccessible or the holder has failed
  • Lock-delay period can be specified by clients
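A minimal Python sketch of the lock-delay idea, assuming the lock server learns of the holder's failure through some mechanism not shown here. The 30-second delay and the explicit `now` timestamps are arbitrary choices for illustration.

```python
from typing import Optional


class DelayedLock:
    """Lock that stays unavailable for `lock_delay` seconds after its holder fails."""

    def __init__(self, lock_delay: float) -> None:
        self.lock_delay = lock_delay
        self.holder: Optional[str] = None
        self.blocked_until = 0.0

    def acquire(self, client: str, now: float) -> bool:
        if self.holder is not None or now < self.blocked_until:
            return False
        self.holder = client
        return True

    def release(self, client: str) -> None:
        if self.holder == client:
            self.holder = None             # clean release: available immediately

    def holder_failed(self, now: float) -> None:
        self.holder = None
        self.blocked_until = now + self.lock_delay   # give in-flight requests time to drain


lock = DelayedLock(lock_delay=30.0)
a_got_it = lock.acquire("A", now=0.0)
lock.holder_failed(now=10.0)               # A fails while W may still be in flight
b_too_early = lock.acquire("B", now=20.0)  # still inside the delay window
b_later = lock.acquire("B", now=45.0)      # delay has passed
```

The delay does not make W safe by itself; it only makes it likely that W has arrived or been dropped before B starts acting.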

Page 24:

Locks and Sequences

• Potential lock problems in distributed systems
  • A holds a lock L, issues request W, then fails
  • B acquires L (because A fails), performs actions
  • W arrives (out of order) after B's actions
• Solution 2: sequencer
  • A lock holder can obtain a sequencer from Chubby
  • It attaches the sequencer to any requests that it sends to other servers
  • The other servers can verify the sequencer information
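The sequencer check can be sketched in Python as below. The field names, the generation counter, and the `is_current` call are assumptions made for the example; lock release and failure detection are omitted, so "A fails" is modelled simply by B acquiring the lock afterwards.

```python
from typing import Dict, List, NamedTuple


class Sequencer(NamedTuple):
    lock_name: str
    mode: str
    generation: int       # bumped every time the lock changes hands


class LockService:
    def __init__(self) -> None:
        self.generation: Dict[str, int] = {}

    def acquire(self, lock_name: str, mode: str = "exclusive") -> Sequencer:
        gen = self.generation.get(lock_name, 0) + 1
        self.generation[lock_name] = gen
        return Sequencer(lock_name, mode, gen)

    def is_current(self, seq: Sequencer) -> bool:
        return self.generation.get(seq.lock_name) == seq.generation


class FileServer:
    """A downstream server that refuses requests carrying a stale sequencer."""

    def __init__(self, locks: LockService) -> None:
        self.locks = locks
        self.applied: List[str] = []

    def handle_write(self, seq: Sequencer, data: str) -> bool:
        if not self.locks.is_current(seq):
            return False                   # the lock has changed hands since seq was issued
        self.applied.append(data)
        return True


locks = LockService()
server = FileServer(locks)
a_seq = locks.acquire("L")                 # A holds L and sends W, then fails
b_seq = locks.acquire("L")                 # B acquires L after A's failure
b_result = server.handle_write(b_seq, "B's update")
w_result = server.handle_write(a_seq, "W") # the delayed request from A arrives last
```

The late request W is rejected because its sequencer carries generation 1 while the lock is now at generation 2.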

