Verifying Distributed Systems - The Coq Workshop 2018 · 2018. 8. 15.

Date post: 25-Jan-2021
Verifying Distributed Systems Zachary Tatlock Coq Workshop 2018
Transcript
  • Verifying Distributed Systems

    Zachary Tatlock

    Coq Workshop 2018

  • The Team

    Steve Anton

    Mike Ernst

    Tom Anderson

    Ryan Doenges

    Keith Simmons

    Xi Wang

    Ilya Sergey

    Karl Palmskog

    Miranda Edwards

    Doug Woos

    Pavel Panchekha

    James Wilcox

    Justin Adsuara

  • The Team

    Amazing researchers on the job market next year!

  • Distributed Systems

  • Distributed Apps

  • Distributed Infrastructure

  • One summer day...

  • How distributed systems fail

    Challenges: concurrency, message drops, message dups, message reorder, machine crash, machine reboot

    Too many possible behaviors to effectively test!

    Edsger W. Dijkstra, Under the Spell of Leibniz's Dream:

    When exhaustive testing is impossible, our trust can only be based on proof.

  • Toward verified distributed systems

    Formalize network semantics: capture how faults can occur

    Separate app / fault reasoning: develop and prove in simple fault model, apply generic verified fault handling

  • Toward verified distributed systems

    Verified Raft Consensus

    The Verdi Framework

    TCB, Tools, Teaching

    Enriching Models & Modularity

  • Formalizing distributed systems

    timeouts, msg delivery, state change, node failure

    1. Defining distributed systems
    2. Giving systems semantics
    3. Proving system safety
    4. Reusable, verified fault-tolerance

  • 1. Distributed sys as event handlers

    Def mySys (P : params) : system :=

    // types for state and I/O
    Type msg := (* to/from internal nodes *)
    Type cmsg := (* to/from external world *)
    Type data := (* node-local state *)

    Type resp := data * list cmsg * list msg

    // event handlers
    Def onMsg : data * msg -> resp
    Def onTmOut : data * unit -> resp
    Def onClient : data * cmsg -> resp
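
The handler signatures above can be sketched in Python. This is only an illustration under stated assumptions: `System`, `counter_system`, and the "inc" command are hypothetical names, not part of Verdi, and the response tuple mirrors `resp := data * list cmsg * list msg` from the slide.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# resp = (new node-local data, messages to clients, messages to other nodes)
Resp = Tuple[Any, List[Any], List[Any]]


@dataclass(frozen=True)
class System:
    """A system is its initial node state plus three event handlers."""
    init: Any
    on_msg: Callable[[Any, Any], Resp]      # data * msg  -> resp
    on_tmout: Callable[[Any], Resp]         # data * unit -> resp
    on_client: Callable[[Any, Any], Resp]   # data * cmsg -> resp


def counter_system() -> System:
    """Hypothetical example: each node keeps one integer counter."""
    def on_client(data, cmsg):
        if cmsg == "inc":
            return data + 1, ["ok"], []      # reply to the client, no internal messages
        return data, [("value", data)], []   # any other command reads the counter

    def on_msg(data, msg):
        return data, [], []                  # internal messages are ignored here

    def on_tmout(data):
        return data, [], []                  # timeouts do nothing here

    return System(0, on_msg, on_tmout, on_client)


sys_ = counter_system()
state, to_client, to_nodes = sys_.on_client(sys_.init, "inc")
```

Handlers are pure functions from state and event to a response, which is what lets a semantics decide separately when and how often they run.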

  • 2. Network semantics

    state of the world: packets in flight, data @ nodes, history of client I/O

    Good old small step operational semantics.

  • Example rule: message delivery

    $$\frac{H_{net}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}{(\{(src, dst, m)\} \uplus P, \Sigma, T) \rightsquigarrow (P \uplus P', \Sigma', T \mathbin{+\!+} \langle o \rangle)}\ \text{Deliver}$$

    if this message is in the network

    run handler on message

    get response

    resulting new global state

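  • 2. Network semantics

    $$\frac{H_{net}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad \Sigma' = \Sigma[dst \mapsto \sigma']}{(\{(src, dst, m)\} \uplus P, \Sigma, T) \rightsquigarrow (P \uplus P', \Sigma', T \mathbin{+\!+} \langle o \rangle)}\ \text{Deliver}$$

    $$\frac{p \in P}{(P, \Sigma, T) \rightsquigarrow (P \uplus \{p\}, \Sigma, T)}\ \text{Duplicate}$$

    $$\frac{\ }{(\{p\} \uplus P, \Sigma, T) \rightsquigarrow (P, \Sigma, T)}\ \text{Drop}$$

    $$\frac{H_{tmt}(n, \Sigma[n]) = (\sigma', o, P') \qquad \Sigma' = \Sigma[n \mapsto \sigma']}{(P, \Sigma, T) \rightsquigarrow (P \uplus P', \Sigma', T \mathbin{+\!+} \langle tmt, o \rangle)}\ \text{Timeout}$$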
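
The four step rules can be read as functions on the world state (P, Σ, T). The sketch below is an assumption-laden illustration, not Verdi's definition: packets are kept in a Python list standing in for the multiset P, the index `i` picks which packet a rule acts on, and the handler `h_net` (add the payload to the node's integer state) is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Packet = Tuple[str, str, Any]                            # (src, dst, m)
World = Tuple[List[Packet], Dict[str, Any], List[Any]]   # (P, Sigma, T)


def deliver(h_net, world: World, i: int) -> World:
    """Deliver: take packet i out of P, run the handler at dst, record its output."""
    P, Sigma, T = world
    src, dst, m = P[i]
    rest = P[:i] + P[i + 1:]
    sigma2, o, P2 = h_net(dst, Sigma[dst], src, m)
    return rest + P2, {**Sigma, dst: sigma2}, T + [o]


def duplicate(world: World, i: int) -> World:
    """Duplicate: a packet already in P appears once more."""
    P, Sigma, T = world
    return P + [P[i]], Sigma, T


def drop(world: World, i: int) -> World:
    """Drop: a packet disappears; nothing else changes."""
    P, Sigma, T = world
    return P[:i] + P[i + 1:], Sigma, T


def timeout(h_tmt, world: World, n: str) -> World:
    """Timeout: run the timeout handler at node n."""
    P, Sigma, T = world
    sigma2, o, P2 = h_tmt(n, Sigma[n])
    return P + P2, {**Sigma, n: sigma2}, T + [("tmt", o)]


def h_net(dst, sigma, src, m):
    """Hypothetical handler: add the payload to the node state and output the sum."""
    new = sigma + m
    return new, new, []


w0: World = ([("a", "b", 5)], {"a": 0, "b": 0}, [])
w1 = duplicate(w0, 0)          # the network duplicates the packet
w2 = deliver(h_net, w1, 0)     # first copy delivered
w3 = deliver(h_net, w2, 0)     # second copy delivered: b counts the payload twice
```

The run shows why the choice of semantics matters: under a semantics with Duplicate, this handler double-counts, so any proof about it has to account for that behavior.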

  • Library of network semantics

    Type sem := state -> state -> Prop

    Def sync_sem := (* in-order delivery *)

    Def async_sem := (* + reordering *)

    Def flaky_sem := (* + drops, timeouts *)

    Def busy_sem := (* + duplicates *)

    Def crash_sem := (* + crash, reboot *)

    Precisely characterize fault model for sys.

    more behaviors --> harder proof

  • 3. Verifying system safety

    Def ok : state -> Prop

    init state

    need to show all reachable states ok

    As usual, problem is specs not inductive.

    Strengthen “ok” to inductive “ok_ind”.

    When verifying systems in a particular semantics, need to repeat similar fault tolerance reasoning for every system.
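
A toy example may make the "not inductive" problem concrete. Everything here is hypothetical and unrelated to Verdi's definitions: the system is a counter that starts at 0 and advances by 2 modulo 10, `ok` says the counter is never 5, and `ok_ind` is the strengthened invariant "the counter is even". The state space is small enough to check by brute force.

```python
INIT = 0
STATES = range(10)


def step(x: int) -> int:
    """Hypothetical system: advance the counter by 2, modulo 10."""
    return (x + 2) % 10


def ok(x: int) -> bool:
    """The safety property we care about."""
    return x != 5


def ok_ind(x: int) -> bool:
    """A stronger property that is preserved by every step."""
    return x % 2 == 0


def is_inductive(inv) -> bool:
    """Holds initially, and every step from a state satisfying inv lands in inv."""
    holds_initially = inv(INIT)
    preserved = all(inv(step(x)) for x in STATES if inv(x))
    return holds_initially and preserved


def implies(a, b) -> bool:
    return all(b(x) for x in STATES if a(x))


def reachable() -> set:
    """All states reachable from INIT."""
    seen, x = set(), INIT
    while x not in seen:
        seen.add(x)
        x = step(x)
    return seen
```

`ok` holds in every reachable state, yet it is not inductive: the unreachable state 3 satisfies `ok` and steps to 5. `ok_ind` is inductive and implies `ok`, which is the shape of the strengthening step on the slide.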

  • 4. Verifying system transformers

    Implement fault tolerance as wrapper
    Def tcp : system -> system

    Transfer proofs across semantics
    Theorem tcp_ok : forall s P, P s -> lift_tcp P (tcp s)

    Separate app proof / fault tolerance
    handles class of faults once and for all
    can compose transformers, proofs
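
As a rough illustration of a wrapper of type system -> system, the sketch below adds duplicate suppression around a message handler. It is a simplification under stated assumptions: the handler signature, `dedup`, and `add` are hypothetical, and only the sequence-number half is shown (no retransmission), so this is not the actual Verdi transformer.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical handler shape: (data, src, msg) -> (new data, outputs)
Handler = Callable[[Any, str, Any], Tuple[Any, List[Any]]]


def dedup(handler: Handler) -> Handler:
    """Wrap a handler so that each (src, seq) pair is processed at most once.

    Wrapped state is (seen, data); wrapped messages are (seq, msg).
    """
    def wrapped(state, src, tagged):
        seen, data = state
        seq, msg = tagged
        if (src, seq) in seen:
            return state, []                         # duplicate: ignore it
        data2, out = handler(data, src, msg)
        return (seen | {(src, seq)}, data2), out     # remember the sequence number
    return wrapped


def add(data, src, msg):
    """Hypothetical application handler, written as if duplicates never happen."""
    return data + msg, [data + msg]


safe_add = dedup(add)
s0 = (frozenset(), 0)
s1, out1 = safe_add(s0, "a", (1, 5))
s2, out2 = safe_add(s1, "a", (1, 5))   # duplicated packet: no effect
s3, out3 = safe_add(s2, "a", (2, 5))   # fresh sequence number: processed
```

The application handler never mentions sequence numbers; the wrapper handles that class of fault once, which is the separation the slide describes.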

  • 4. Verifying system transformers

    Diagram labels: App / Raft Consensus; App / Primary Backup; Seq # and Retrans; Ghost Variables

  • Toward verified distributed systems

    Verified Raft Consensus

    The Verdi Framework

    TCB, Tools, Teaching

    Enriching Models & Modularity

  • Example: verifying Raft in Verdi

    critical components must not fail

  • Replication for fault tolerance

    ⇒ available if a majority (more than n/2) of nodes are up
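
The availability condition can be written as a one-line check. This is a small sketch with hypothetical function names; it assumes the usual majority-quorum rule, where progress needs strictly more than half of the n nodes.

```python
def has_quorum(n: int, up: int) -> bool:
    """True when strictly more than half of the n nodes are up."""
    return 2 * up > n


def max_tolerated_failures(n: int) -> int:
    """Largest number of crashed nodes that still leaves a majority."""
    return (n - 1) // 2
```

For example, a 5-node cluster stays available with 3 nodes up and tolerates 2 failures, while a 4-node cluster with exactly 2 nodes up does not have a quorum.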

  • Replication correctness

    ≈ linearizability

    cluster looks like a single node (state machine) to clients

  • Defining Raft

    Def raft(sm: state machine, ...) :=

    // types for state and I/O
    cmsg := sm.cmsg
    msg := (* ??? *)
    data := (* ??? *)

    // event handlers
    Def onMsg := (* ??? *)
    Def onTmOut := (* ??? *)
    Def onClient := (* ??? *)

  • Raft: election and replication terms

    Term 1, Term 2, Term 3, ...

    each term: election, then replication

  • Raft: leader election

    Candidate sends ReqVote to Followers; Followers answer with Vote

  • Raft: log replication

    Leader sends Append to Followers; Followers answer with AppendAck

  • Defining Raft

    Def raft(sm: state machine, ...) :=

    // types for state and I/O
    cmsg := sm.cmsg
    msg := ReqVote | Vote | Append | ...
    data := { sm.data, list sm.op, ... }

    // event handlers
    Def onMsg :=
    Def onTmOut :=
    Def onClient :=

  • Verifying Raft

    ≈ linearizability

  • Raft internal correctness

    linearizability follows from internal correctness: state machine safety

  • Proving Raft in Verdi

    State machine safety: nodes’ logs match on committed entries

    proof by induction over executions

    since only committed entries executed

  • State Machine Safety: Proof

    not inductive!

    Inductive invariant I: I ⇒ state machine safety, I true initially, I preserved

    Lemma, Lemma, Lemma, …: 90 invariants in total

  • The burden of proof

    P ⇒ P

    P with ghost state

    P true initially, P preserved: Lemma, Lemma, …, Lemma

    Re-verification is the primary challenge:
    - invariants are not inductive
    - not-yet-verified code is wrong
    - need additional invariants

    Proof engineering techniques help:
    - affinity lemmas
    - intermediate reachability
    - structural tactics
    - information hiding

  • Ghost state: example

    Capture all entries received by a node

    Before Append:
    Leader: Log (real) A,B,C; allEntries (ghost) {A,B,C}
    Follower: Log (real) A,D; allEntries (ghost) {A,D}

    Leader sends Append [A],B,C to Follower

    After Append:
    Leader: Log (real) A,B,C; allEntries (ghost) {A,B,C}
    Follower: Log (real) A,B,C; allEntries (ghost) {A,B,C,D}
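
The follower's update in this example can be replayed in a few lines. This is a deliberately simplified sketch: `Node` and `on_append` are hypothetical names, entries are plain strings, and the append matches on a single previous entry instead of Raft's index and term check. The point is only that the real log may lose entries while the ghost set never shrinks.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class Node:
    log: Tuple[str, ...]           # real state: can be truncated and overwritten
    all_entries: FrozenSet[str]    # ghost state: every entry the node ever received


def on_append(node: Node, prev: str, entries: Sequence[str]) -> Node:
    """Keep the log up to and including `prev`, then append `entries`.

    The ghost set is extended with the new entries and nothing is removed.
    """
    keep = node.log[: node.log.index(prev) + 1]
    new_log = keep + tuple(entries)
    return Node(new_log, node.all_entries | frozenset(entries))


follower = Node(("A", "D"), frozenset({"A", "D"}))
after = on_append(follower, "A", ["B", "C"])   # the Append [A],B,C from the slide
```

After the append the follower's log is A,B,C, while `all_entries` still remembers D, matching the final state shown above.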

  • Affinity lemmas: example

    $e \in log \Rightarrow e.term > 0$

    Affinity Lemma: every invariant of entries in logs is invariant of entries in allEntries

    $e \in allEntries \Rightarrow e.term > 0$

    In general: $e \in log \Rightarrow P\ e$ gives $e \in allEntries \Rightarrow P\ e$

  • More affinity lemmas

    Relate ghost state to real state: transfer properties once and for all

    Relate current messages to past: response => past request

  • Structured handlers

    handler = update_state ; respond

    handler: net → net’

    update_state: net → net_i

    respond: net_i → net’

    Invariant I holds at net, net_i, and net’
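
The split of a handler into two phases can be sketched as function composition, with the invariant checked at the intermediate state as well. All names and the invariant below are hypothetical illustrations; the network is modeled as a plain dictionary with a counter and a list of sent packets.

```python
def update_state(net: dict, msg: int) -> dict:
    """First phase: change node-local state only."""
    return {**net, "count": net["count"] + msg}


def respond(net_i: dict) -> dict:
    """Second phase: emit packets only; node-local state is untouched."""
    return {**net_i, "packets": net_i["packets"] + [net_i["count"]]}


def handler(net: dict, msg: int) -> dict:
    """handler = update_state ; respond"""
    return respond(update_state(net, msg))


def invariant(net: dict) -> bool:
    """Hypothetical invariant I: no packet carries a value above the current count."""
    return all(p <= net["count"] for p in net["packets"])


net = {"count": 0, "packets": []}
net_i = update_state(net, 3)   # intermediate network state
net2 = respond(net_i)          # final network state, same as handler(net, 3)
```

Proving that each phase preserves I separately gives two smaller obligations instead of one about the whole handler.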

  • First formal verification of Raft

    50k lines of Coq

    18 person-months

    Considerable pizza and beer

  • Toward verified distributed systems

    Verified Raft Consensus

    The Verdi Framework

    TCB, Tools, Teaching

    Enriching Models & Modularity

  • Verified ≠ perfect

    Network semantics shim is delicate: atomicity, fairness, serialization, …

    Verdi users need Coq + distr sys skills: notorious learning curves hinder impact

    Regular development still tricky: maintenance, extension, management

  • Network semantics shim is delicate

    Note that all steps are atomic in semantics! Shim must carefully persist to ensure fidelity.

    User stumbled across liveness bug for single node cluster.

    “CSmith” paper for verified distr sys

    Pedro et al. found several bugs, BUT none in any verified components.

    Cheerios: New system transformer with correct serialization implemented and verified. (Justin Adsuara)

  • Training the next generation

    Doug Woos

    There will always be a TCB: we’ll always need informed judgement

    Engineers unlikely to pick this up at work, but courses great evangelism opportunity

    How to get this into ugrad canon? Need reusable labs and tools

  • Proof engineering

    Karl Palmskog

    Finally catching the interest of the SE community: ASE ’17, ICSE ’18, ISSTA ’18

  • Toward verified distributed systems

    Verified Raft Consensus

    The Verdi Framework

    TCB, Tools, Teaching

    Enriching Models & Modularity

  • Churn = nodes joining & leaving a system at run time

  • Punctuated safety properties

    Reachable under churn

    Safety after churn stops

    Ryan Doenges

  • Toward verifying churn tolerance

    Tree aggregation: aggregate data in sensor networks; designated root node eventually correct

    Chord: distributed hash table; protocol bugs found [Zave 2015]; ring should eventually stabilize

  • Composition: A way to make proofs harder

    “In 1997, the unfortunate reality is that engineers rarely specify and reason formally about the systems they build. It seems unlikely that reasoning about the composition of open-system specifications will be a practical concern within the next 15 years.”

  • “Horizontal composition”: eliminate closed world hypothesis

  • Compositional Verif of Distr Sys

    [POPL 18]

    Ilya Sergey

    James Wilcox

  • Toward verified distributed systems

    Verified Raft Consensus

    The Verdi Framework

    TCB, Tools, Teaching

    Enriching Models & Modularity

  • Reflections on Verdi experience

    Distributed sys good fit for verification: critical, expert-written, I/O bound cases

    Biggest challenge is proof engineering: reproving and managing scale daunting

    Lots of low-hanging fruit left: dynamic update, concurrency, optimization

  • The most important ingredients

    Steve Anton

    Mike Ernst

    Tom Anderson

    Ryan Doenges

    Keith Simmons

    Xi Wang

    Ilya Sergey

    Karl Palmskog

    Miranda Edwards

    Doug Woos

    Pavel Panchekha

    James Wilcox

    Justin Adsuara

  • Thank You!

    Verified Raft Consensus

    The Verdi Framework

    TCB, Tools, Teaching

    Enriching Models & Modularity

    http://distributedcomponents.net

