Versioning, Consistency, and Agreement · 2013. 10. 10.

Post on 03-Oct-2020


Versioning, Consistency, and Agreement

COS 461: Computer Networks, Spring 2010 (MW 3:00-4:20 in CS105)

Mike Freedman
http://www.cs.princeton.edu/courses/archive/spring10/cos461/

Jenkins, if I want another yes-man, I'll build one!

Lee Lorenz, Brent Sheppard

Time and distributed systems

•  With multiple events, what happens first?

[Figure, first view: "A shoots B", "B dies"]

[Figure, second view: "A dies", "B shoots A"]

[Figure, combined view: "A shoots B", "B dies", "A dies", "B shoots A"]

Just use timestamps?

•  Need synchronized clocks

•  Clock synch via a time server

[Figure: process p and time server S]

Cristian's Algorithm

•  Uses a time server to synchronize clocks

•  Time server keeps the reference time

•  Clients ask the server for the time and adjust their local clock based on the response

–  But different network latency → clock skew?

•  Correct for this? For links with symmetrical latency:

RTT = response-received-time – request-sent-time

adjusted-local-time = server-timestamp t + (RTT / 2)

local-clock-error = adjusted-local-time – local-time
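The three formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming symmetric link latency as the slide does; the function and parameter names are hypothetical, and all times are plain numbers in seconds supplied by the caller.

```python
def cristian_adjust(request_sent, response_received, server_timestamp):
    """Apply the slide's formulas, assuming symmetric link latency.

    All three arguments are times in seconds; the first two are read from
    the client's local clock, the third is the timestamp t from the server.
    Returns (rtt, adjusted_local_time, local_clock_error).
    """
    # RTT = response-received-time - request-sent-time
    rtt = response_received - request_sent
    # adjusted-local-time = server-timestamp t + (RTT / 2)
    adjusted_local_time = server_timestamp + rtt / 2
    # local-clock-error = adjusted-local-time - local-time,
    # where local-time is taken when the response arrives
    local_clock_error = adjusted_local_time - response_received
    return rtt, adjusted_local_time, local_clock_error


# Example with made-up numbers: the client's clock is far behind the server's.
rtt, adjusted, error = cristian_adjust(100.0, 100.5, 200.0)
```

With these inputs the round trip is 0.5 s, so the client assumes the reply spent 0.25 s in flight and sets its clock to 200.25.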

Is this sufficient?

•  Server latency due to load?

–  If we can measure it:

    •  adjusted-local-time = server-time t + (RTT + lag) / 2

•  But what about asymmetric latency?

–  RTT / 2 is not sufficient!

•  What do we need to measure RTT?

–  Requires no clock drift!

•  What about "almost" concurrent events?

–  Clocks have micro/milli-second precision

Events and Histories

•  Processes execute sequences of events

•  Events can be of 3 types:

–  local, send, and receive

•  The local history h_p of process p is the sequence of events executed by the process

Ordering events

•  Observation 1:

–  Events in a local history are totally ordered

•  Observation 2:

–  For every message m, send(m) precedes receive(m)

[Figure: timelines for processes p_i and p_j, with a message m between them]

Happens-Before (Lamport [1978])

•  Relative time? Define Happens-Before (→):

–  On the same process: a → b, if time(a) < time(b)

–  If p1 sends m to p2: send(m) → receive(m)

–  If a → b and b → c, then a → c

•  The Lamport algorithm uses this for partial ordering:

–  All processes use a counter (clock) with an initial value of 0

–  The counter is incremented by and assigned to each event, as its timestamp

–  A send(msg) event carries its timestamp

–  For a receive(msg) event, the counter is updated to Max(receiver-counter, message-timestamp) + 1
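The counter rules above fit in a small class. The following is a sketch only; the class and method names are hypothetical, and message delivery is simulated by passing the timestamp directly between objects.

```python
class LamportClock:
    """Logical clock following the counter rules listed above."""

    def __init__(self):
        self.counter = 0  # initial value of 0

    def local_event(self):
        # Counter is incremented and assigned to the event as its timestamp.
        self.counter += 1
        return self.counter

    def send(self):
        # A send is an event too; the returned timestamp travels with the message.
        self.counter += 1
        return self.counter

    def receive(self, message_timestamp):
        # counter = Max(receiver-counter, message-timestamp) + 1
        self.counter = max(self.counter, message_timestamp) + 1
        return self.counter


p1, p2 = LamportClock(), LamportClock()
a = p1.local_event()      # timestamp 1 on p1
ts = p1.send()            # timestamp 2, carried by the message
b = p2.receive(ts)        # p2 jumps from 0 to max(0, 2) + 1 = 3
```

Because the receiver always moves past the message's timestamp, send(m) → receive(m) is reflected as a strictly smaller timestamp on the send.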

Events Occurring at Three Processes

[Figure: events occurring at three processes]

Lamport Timestamps

[Figure: events labeled with Lamport timestamps 1, 1, 2, 3, 4, 5]

Lamport Logical Time

[Figure: events on Host 1 through Host 4 plotted against physical time, each labeled with its Lamport timestamp; every host starts at 0. Callout: "Logically concurrent events!"]

Vector Logical Clocks

•  With Lamport Logical Time

–  e precedes f ⇒ timestamp(e) < timestamp(f), but

–  timestamp(e) < timestamp(f) ⇏ e precedes f

•  Vector logical time guarantees the converse as well:

–  All hosts use a vector of counters (logical clocks); the ith element is the clock value for host i, initially 0

–  Each host i increments the ith element of its vector upon an event and assigns the vector to the event

–  A send(msg) event carries the vector timestamp

–  For a receive(msg) event:

Vreceiver[j] = Max(Vreceiver[j], Vmsg[j])   if j is not self
Vreceiver[j] = Vreceiver[j] + 1             otherwise
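As a rough sketch, the vector rules can be written as follows. The class and method names are hypothetical, and hosts are indexed from 0 (so Host 1 is index 0). The message pattern in the example is an assumption chosen so that the resulting vectors match labels that appear in the Vector Logical Time figure.

```python
class VectorClock:
    """Vector logical clock for one host, following the rules above."""

    def __init__(self, host_index, num_hosts):
        self.i = host_index
        self.v = [0] * num_hosts  # every element initially 0

    def _tick(self):
        # Host i increments the ith element and assigns the vector to the event.
        self.v[self.i] += 1
        return tuple(self.v)

    def local_event(self):
        return self._tick()

    def send(self):
        # The returned vector is the timestamp carried by the message.
        return self._tick()

    def receive(self, msg_vector):
        # For j not self: Vreceiver[j] = Max(Vreceiver[j], Vmsg[j])
        for j in range(len(self.v)):
            if j != self.i:
                self.v[j] = max(self.v[j], msg_vector[j])
        # For j == self: Vreceiver[j] = Vreceiver[j] + 1
        return self._tick()


h1, h2, h3, h4 = (VectorClock(i, 4) for i in range(4))
m1 = h1.send()            # (1, 0, 0, 0)
h2.receive(m1)            # (1, 1, 0, 0)
m2 = h2.send()            # (1, 2, 0, 0)
m3 = h1.send()            # (2, 0, 0, 0)
h3.receive(m3)            # (2, 0, 1, 0)
m4 = h3.send()            # (2, 0, 2, 0)
h4.receive(m4)            # (2, 0, 2, 1)
h3.receive(m2)            # (2, 2, 3, 0)
```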

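Vector Timestamps

[Figure: events labeled with vector timestamps]

Vector Logical Time

[Figure: events on Host 1 through Host 4 plotted against physical time, each labeled with its vector timestamp. Labels shown: (1,0,0,0), (2,0,0,0), (1,1,0,0), (1,2,0,0), (2,0,1,0), (2,0,2,0), (2,2,3,0), (2,0,2,1)]

Vreceiver[j] = Max(Vreceiver[j], Vmsg[j])   if j is not self
Vreceiver[j] = Vreceiver[j] + 1             otherwise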

Comparing Vector Timestamps

•  a = b  if they agree at every element

•  a < b  if a[i] <= b[i] for every i, but !(a = b)

•  a > b  if a[i] >= b[i] for every i, but !(a = b)

•  a || b  if a[i] < b[i] and a[j] > b[j] for some i, j (conflict!)

•  If one history is a prefix of the other, then one vector timestamp < the other

•  If one history is not a prefix of the other, then (at least by example) VTs will not be comparable.
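The four cases above translate directly into a comparison function. This is a small sketch; the function name and the string return values are illustrative choices, not part of the original slides.

```python
def compare_vt(a, b):
    """Compare two vector timestamps of equal length.

    Returns "=", "<", ">", or "||" (concurrent, i.e. a conflict).
    """
    if len(a) != len(b):
        raise ValueError("vector timestamps must have the same length")
    a_le_b = all(x <= y for x, y in zip(a, b))  # a[i] <= b[i] for every i
    a_ge_b = all(x >= y for x, y in zip(a, b))  # a[i] >= b[i] for every i
    if a_le_b and a_ge_b:
        return "="
    if a_le_b:
        return "<"
    if a_ge_b:
        return ">"
    # Some element is smaller and some other element is larger.
    return "||"


compare_vt((1, 0, 0, 0), (1, 1, 0, 0))   # "<"
compare_vt((1, 2, 0, 0), (2, 0, 2, 0))   # "||"
```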

Given a notion of time…

…What's a notion of consistency?

Strict Consistency

•  Strongest consistency model we'll consider

–  Any read on a data item X returns the value corresponding to the result of the most recent write on X

•  Need an absolute global time

–  "Most recent" needs to be unambiguous

[Figure: "Write x to a", then "Read x returns a"]

What else can we do?

•  Strict consistency is the ideal model

–  But impossible to implement!

•  Sequential consistency

–  Slightly weaker than strict consistency

–  Defined for shared memory for multi-processors

Sequential Consistency

•  Definition: The result of any execution is the same as if all (read and write) operations on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program

•  Definition: When processes are running concurrently:

–  Interleaving of read and write operations is acceptable, but all processes see the same interleaving of operations

•  Difference from strict consistency

–  No reference to the most recent time

–  Absolute global time does not play a role

Valid Sequential Consistency?

[Figure: example execution histories]
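For small histories, the definition of sequential consistency can be checked by brute force: search for one interleaving that keeps each process's program order and in which every read returns the latest write. The sketch below is illustrative only. The history encoding (tuples of kind, variable, value) and the function name are assumptions, and the search is exponential, so it suits toy examples only.

```python
def sequentially_consistent(histories):
    """histories: one list of operations per process.

    Each operation is (kind, variable, value) with kind "W" or "R".
    Returns True if some single interleaving explains every read.
    """
    n = len(histories)

    def search(positions, store):
        # positions[p] = index of the next operation of process p
        if all(positions[p] == len(histories[p]) for p in range(n)):
            return True
        for p in range(n):
            k = positions[p]
            if k == len(histories[p]):
                continue
            kind, var, value = histories[p][k]
            advanced = positions[:p] + (k + 1,) + positions[p + 1:]
            if kind == "W":
                if search(advanced, {**store, var: value}):
                    return True
            elif store.get(var) == value:
                # A read is only schedulable if it sees the latest write.
                if search(advanced, store):
                    return True
        return False

    return search((0,) * n, {})


# Both readers see b and then a: explained by the order W(x)b, W(x)a.
valid = [[("W", "x", "a")], [("W", "x", "b")],
         [("R", "x", "b"), ("R", "x", "a")],
         [("R", "x", "b"), ("R", "x", "a")]]
# The readers disagree on the order of the two writes.
invalid = [[("W", "x", "a")], [("W", "x", "b")],
           [("R", "x", "b"), ("R", "x", "a")],
           [("R", "x", "a"), ("R", "x", "b")]]
```

Here `sequentially_consistent(valid)` is True and `sequentially_consistent(invalid)` is False, since no single interleaving lets the two readers see the writes in opposite orders.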

Linearizability

•  Linearizability

–  Weaker than strict consistency

–  Stronger than sequential consistency

•  All operations (OP = read, write) receive a global timestamp using a synchronized clock

•  Linearizability:

–  Requirements for sequential consistency, plus

–  If ts_OP1(x) < ts_OP2(y), then OP1(x) should precede OP2(y) in the sequence

Causal Consistency

•  Necessary condition:

–  Writes that are potentially causally related must be seen by all processes in the same order.

–  Concurrent writes may be seen in a different order on different machines.

•  Weaker than sequential consistency

•  Concurrent: Ops that are not causally related

Causal Consistency

[Figure: example history]

•  Allowed with causal consistency, but not with sequential or strict consistency

•  W(x)b and W(x)c are concurrent

–  So processes need not all see them in the same order

•  P3 and P4 read the values 'a' and 'b' in order, as they are potentially causally related. No 'causality' for 'c'.

Causal Consistency

[Figure: example history]

Causal Consistency

•  Requires keeping track of which processes have seen which writes

–  Needs a dependency graph of which op is dependent on which other ops

–  …or use vector timestamps!

Eventual consistency

•  If no new updates are made to an object, after some inconsistency window closes, all accesses will return the last updated value

•  Prefix property:

–  If Pi has write w accepted from some client by Pj

–  Then Pi has all writes accepted by Pj prior to w

•  Useful where concurrency appears only in a restricted form

•  Assumption: write conflicts will be easy to resolve

–  Even easier if whole-"object" updates only

Systems using eventual consistency

•  DB: updated by a few proc's, read by many

–  How fast must updates be propagated?

•  Web pages: typically updated by single user

–  So, no write-write conflicts

–  However, caches can become inconsistent

Systems using eventual consistency

•  DNS: each domain assigned to a naming authority

–  Only master authority can update the name space

–  Other NS servers act as "slave" servers, downloading DNS zone file from master authority

–  So, write-write conflicts won't happen

–  Is this always true today?

$ORIGIN coralcdn.org.
@  IN  SOA  ns3.fs.net.  hostmaster.scs.cs.nyu.edu. (
        18      ; serial
        1200    ; refresh
        600     ; retry
        172800  ; expire
        21600 ) ; minimum

Typical implementation of eventual consistency

•  Distributed, inconsistent state

–  Writes only go to some subset of storage nodes

    •  By design (for higher throughput)
    •  Due to transmission failures

•  "Anti-entropy" (gossiping) fixes inconsistencies

–  Use vector clock to see which is older

–  Prefix property helps nodes know consistency status

–  If automatic, requires some way to handle write conflicts

    •  Application-specific merge() function
    •  Amazon's Dynamo: Users may see multiple concurrent "branches" before app-specific reconciliation kicks in
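One way to picture the anti-entropy step is as a reconciliation over versions tagged with vector timestamps: anything strictly older than another version is dropped, and concurrent versions survive as "branches" for the application to merge. The sketch below is an assumption-laden illustration, not Dynamo's actual implementation; the function names and the (vector timestamp, value) tuple layout are hypothetical.

```python
def dominates(a, b):
    """True if vector timestamp a is strictly newer than b (a > b)."""
    return all(x >= y for x, y in zip(a, b)) and a != b


def reconcile(versions):
    """versions: list of (vector_timestamp, value) pairs gathered from replicas.

    Drops exact duplicates and every version that some other version
    dominates. What remains are mutually concurrent branches.
    """
    unique = list(dict.fromkeys(versions))  # keeps first-seen order
    return [v for v in unique
            if not any(dominates(other[0], v[0]) for other in unique)]


gathered = [((1, 0), "cart: book"),
            ((2, 0), "cart: book, pen"),
            ((1, 1), "cart: book, mug"),
            ((2, 0), "cart: book, pen")]
branches = reconcile(gathered)
# Two concurrent branches remain: (2, 0) and (1, 1). An application-specific
# merge() would combine them, for example by taking the union of the carts.
```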

Examples…

•  Causal consistency. Non-causally related subject to normal eventual consistency rules

•  Read-your-writes consistency.

•  Session consistency. Read-your-writes holds iff client session exists. If session terminates, no guarantees between sessions.

•  Monotonic read consistency. Once a read returns a version, subsequent reads never return older versions.

•  Monotonic write consistency. Writes by the same process are properly serialized. Really hard to program systems without this property.

Even read-your-writes may be difficult to achieve

What about stronger agreement?

•  Two-phase commit protocol

•  Marriage ceremony: Do you? / I do. / I now pronounce you…

•  Theater: Ready on the set? / Ready! / Action!

•  Contract law: Offer / Signature / Deal/lawsuit

What about stronger agreement?

•  Two-phase commit protocol

[Figure: message flow between Client, Leader, and Acceptors. Labels: Client WRITE; PREPARE; READY; "All prepared?"; COMMIT; ACK; "All ack'd?"; ACK]
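The message flow in the figure (WRITE, PREPARE, READY, COMMIT, ACK) can be sketched as a leader function driving a list of acceptors. This is a simplified, in-process illustration: the class and function names are hypothetical, message passing is replaced by method calls, and the abort path is an assumption, since the figure only shows the success case.

```python
class Acceptor:
    """Holds a staged value after PREPARE and applies it on COMMIT."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy   # hypothetical switch to simulate a failed node
        self.staged = None
        self.committed = None

    def prepare(self, value):
        if not self.healthy:
            return False         # no READY reply
        self.staged = value
        return True              # READY

    def commit(self):
        self.committed = self.staged
        self.staged = None
        return True              # ACK

    def abort(self):
        self.staged = None       # assumption: discard the staged value


def two_phase_write(acceptors, value):
    """Leader logic for one client WRITE. Returns True if the write committed."""
    # Phase 1: PREPARE, then check "All prepared?"
    ready = [a.prepare(value) for a in acceptors]
    if not all(ready):
        for a in acceptors:
            a.abort()
        return False
    # Phase 2: COMMIT, then check "All ack'd?" before ACKing the client
    acks = [a.commit() for a in acceptors]
    return all(acks)


group = [Acceptor("N1"), Acceptor("N2"), Acceptor("N3")]
two_phase_write(group, "x=1")    # True: every acceptor replied READY, then ACK
```

A single acceptor that does not reply READY blocks the commit for everyone, which is the availability cost that the failure discussion turns to next.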

What about failures?

•  If an acceptor fails:

–  Can still ensure linearizability if |R| + |W| > N

–  "read" and "write" quorums overlap in at least 1 node

•  If the leader fails?

–  Lose availability: system no longer "live"

•  Pick a new leader?

–  Need to make sure everybody agrees on leader!

–  Need to make sure that "group" is known

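The overlap requirement can be checked exhaustively for a small group. For every read quorum and every write quorum to share at least one node, |R| + |W| must exceed N; when the sum is exactly N, two disjoint quorums exist. The sketch below enumerates all quorum pairs to confirm this; both function names are hypothetical.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Arithmetic condition: any R nodes and any W nodes out of N must intersect."""
    return r + w > n


def always_intersect(n, r, w):
    """Brute force: check every read quorum against every write quorum."""
    nodes = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(nodes, r)
               for write_set in combinations(nodes, w))


# For N = 5, the arithmetic condition and the brute-force check agree.
agree = all(always_intersect(5, r, w) == quorums_overlap(5, r, w)
            for r in range(1, 6) for w in range(1, 6))
```

For example, with N = 5, R = 2, and W = 3 the quorums {0, 1} and {2, 3, 4} are disjoint, so a read could miss the latest write.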

Consensus and Paxos Algorithm

•  "Consensus" problem

–  N processes want to agree on a value

–  If fewer than F faults in a window, consensus achieved

    •  "Crash" faults need 2F+1 processes
    •  "Malicious" faults (called Byzantine) need 3F+1 processes

•  Collection of processes proposing values

–  Only a proposed value may be chosen

–  Only a single value is chosen

•  Common usage:

–  View change: define leader and group via Paxos

–  Leader uses two-phase commit for writes

–  Acceptors monitor leader for liveness. If they detect failure, re-execute "view change"

Paxos: Algorithm

View change from current view

View i: V = {Leader: N2, Group: {N1, N2, N3}}

Phase 1 (Prepare)

•  Proposer: Send prepare with version # j to members of View i

•  Acceptor: if j > vers # k of any other prepare it has seen, respond with a promise not to accept lower-numbered proposals. Otherwise, respond with k and the value v' accepted.

Phase 2 (Accept)

•  If a majority promise, the proposer sends accept with (vers j, value v)

•  Acceptor accepts unless it has responded to a prepare with a higher vers # than j. Sends acknowledgement to all view members.
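The two phases can be sketched as a single-decree round run in one process. This is a simplified illustration under several assumptions: names are hypothetical, messages are method calls, there are no failures or concurrency, and, following the usual formulation of Paxos rather than the exact slide wording, a promise carries any previously accepted (version, value) so the proposer can adopt it, while a rejection returns only k.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1     # highest prepare version responded to (k)
        self.accepted = None   # (version, value) last accepted, if any

    def prepare(self, j):
        # Phase 1: promise only if j is higher than any prepare seen so far.
        if j > self.promised:
            self.promised = j
            return ("promise", self.accepted)
        return ("reject", self.promised)

    def accept(self, j, value):
        # Phase 2: accept unless a prepare with a higher version was answered.
        if j >= self.promised:
            self.promised = j
            self.accepted = (j, value)
            return True
        return False


def propose(acceptors, j, value):
    """Run one round with version j. Returns the chosen value, or None."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(j) for a in acceptors]
    promises = [prior for kind, prior in replies if kind == "promise"]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, keep the highest-versioned one.
    prior_accepts = [p for p in promises if p is not None]
    if prior_accepts:
        value = max(prior_accepts, key=lambda p: p[0])[1]
    acks = sum(a.accept(j, value) for a in acceptors)
    return value if acks >= majority else None


view = [Acceptor() for _ in range(3)]
first = propose(view, 1, "Leader: N2")    # chosen: "Leader: N2"
second = propose(view, 2, "Leader: N3")   # still "Leader: N2"
stale = propose(view, 1, "Leader: N1")    # None: version 1 is now too low
```

The second round shows why only a single value is chosen: a later proposer learns of the earlier accepted value during the prepare phase and re-proposes it instead of its own.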

Summary

•  Global time doesn't exist in a distributed system

•  Logical time can be established via version #'s

•  Logical time is useful in various consistency models

–  Strict > Linearizability > Sequential > Causal > Eventual

•  Agreement in distributed systems

–  Eventual consistency: Quorums + anti-entropy

–  Linearizability: Two-phase commit, Paxos