Page 1
Aysylu Greenberg
October 11, 2016
Distributed Systems in Practice, in Theory
Page 2
How I got into reading papers as a
practitioner in industry
Page 3
Computer Science Research In Distributed Systems Industry
Page 4
Operating systems research
Page 5
Operating systems research
Page 6
Operating systems research
Concurrency
Page 7
Operating systems research
Concurrency
Concurrency primitives: mutex & semaphore
Page 8
Operating systems research
Concurrency
Concurrency primitives: mutex & semaphore
Processes execute at different speeds
Page 9
Time in distributed systems
https://www.flickr.com/photos/national_archives_of_norway/6263353228
1970
Page 10
Time in distributed systems
1970
Page 11
Time in distributed systems
Pipelining
1970
Page 15
Internet
Distributed consensus
1980
Page 16
Internet
Distributed consensus
1980
Page 17
Internet
Distributed consensus
1980
Page 18
Paxos
Internet
Distributed consensus
1980
Page 19
Reconsider large systems
Page 20
Reconsider large systems
Shared infrastructure
...
Page 21
CS Research is Timeless
Inform decisions
Mitigate technical risk
Page 22
Aysylu Greenberg
@aysylu22
Page 23
Papers We Love NYC
Page 24
Papers We Love SF
Page 25
Aysylu Greenberg
@aysylu22
Page 26
Today
● Staged Event-Driven Architecture
Page 27
Today
● Staged Event-Driven Architecture
● Leases
Page 28
Today
● Staged Event-Driven Architecture
● Leases
● Inaccurate Computations
Page 29
Staged Event-Driven Architecture &
Deep Pipelines
2001
Page 30
Hardware to Data Pipelines
Page 31
Hardware to Data Pipelines
https://en.wikipedia.org/wiki/Graphics_pipeline
Page 33
Staged Event-Driven Architecture
Page 34
Staged Event-Driven Architecture
+ -
Page 35
Staged Event-Driven Architecture
Single-machine pipeline generalizes to distributed pipelines
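A minimal single-machine sketch of the staged idea: each stage runs in its own thread and hands events to the next stage through a bounded queue, so a slow stage pushes back on the stages before it. The names `run_stage`, `run_pipeline`, `STOP`, and `queue_size` are hypothetical and chosen for illustration; this is not code from the SEDA paper or the talk. In a distributed pipeline the in-process queues would be replaced by a networked queue or log.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def run_stage(handler, inbox, outbox):
    """Pull events from inbox, apply handler, push results to outbox."""
    while True:
        event = inbox.get()
        if event is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            return
        outbox.put(handler(event))


def run_pipeline(handlers, events, queue_size=4):
    """Wire one thread per stage, with a bounded queue in front of each stage."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(handlers))]
    queues.append(queue.Queue())  # unbounded sink so results can accumulate
    threads = [
        threading.Thread(target=run_stage, args=(h, queues[i], queues[i + 1]))
        for i, h in enumerate(handlers)
    ]
    for t in threads:
        t.start()
    for event in events:
        queues[0].put(event)  # blocks while the first stage is saturated
    queues[0].put(STOP)
    for t in threads:
        t.join()
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            return results
        results.append(item)


if __name__ == "__main__":
    # Three toy stages: trim, lowercase, tokenize.
    print(run_pipeline([str.strip, str.lower, str.split], ["  Hello World ", " SEDA  "]))
```

The bounded queues are the design choice worth noticing: they make overload visible as blocking at the front of the pipeline instead of unbounded memory growth in the middle.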
Page 36
Search Indexing Pipelines
Page 37
Search Indexing Pipelines
Page 38
Search Indexing Pipelines
Page 39
Search Indexing Pipelines
Page 40
Search Indexing Pipelines
Page 41
Search Indexing Pipelines
Page 42
Search Indexing Pipelines
Page 43
Search Indexing Pipelines
Page 44
Search Indexing Pipelines
+ -
Page 45
Leases as Heartbeat in Distributed Systems
1989
Page 47
Leases
● Distributed locking
Page 48
Leases
● Distributed locking
● Lease term tradeoffs
  ○ short
Page 49
Leases
● Distributed locking
● Lease term tradeoffs
  ○ short vs long
Page 50
Leases
● Distributed locking
● Lease term tradeoffs
  ○ short vs long
● Use of leases in modern applications
  ○ Leader election TTL (in etcd)
Page 51
Leases
● Distributed locking
● Lease term tradeoffs
  ○ short vs long
● Use of leases in modern applications
  ○ Leader election TTL (in etcd)
  ○ Liveness detection
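To make the bullets above concrete, here is a minimal in-process sketch of a lease: a lock that expires unless its holder keeps renewing it. The class name `Lease`, its methods, and the injectable `clock` parameter are hypothetical and exist only for illustration; a real distributed lease also has to deal with clock skew and message delay between holder and grantor, which this sketch ignores.

```python
import time


class Lease:
    """A time-bounded claim on a resource held by a single owner."""

    def __init__(self, term_seconds, clock=time.monotonic):
        self.term = term_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.owner = None
        self.expires_at = 0.0

    def is_live(self):
        """True while some owner holds an unexpired lease."""
        return self.owner is not None and self.clock() < self.expires_at

    def acquire(self, owner):
        """Grant the lease if it is free, expired, or already held by owner."""
        if self.is_live() and self.owner != owner:
            return False
        self.owner = owner
        self.expires_at = self.clock() + self.term
        return True

    def renew(self, owner):
        """Heartbeat: extend the term, but only for the live holder."""
        if not self.is_live() or self.owner != owner:
            return False
        self.expires_at = self.clock() + self.term
        return True
```

A holder that crashes simply stops calling `renew`, and the lease frees itself after at most one term. That is what lets the same mechanism serve as both a distributed lock and a liveness check.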
Page 53
Leases in Build System: Success Scenario
Page 54
Build my project
Build System
Page 55
Build my project
Build System
OK
Page 56
Build my project
Build System
OK
Waiting for the results
Page 57
Build my project
Build System
OK
Waiting for the results
Build is in progress
Page 58
Build my project
Build System
OK
Waiting for the results
Build is in progress
Waiting for the results
Page 59
Build my project
Build System
OK
Waiting for the results
Build is in progress
Waiting for the results
Build is finished
Page 60
Leases in Build System: Failure Scenario
Page 61
Leases in Build System
Page 62
Leases in Build System
Page 63
Leases in Build System
Page 64
Leases in Build System
Page 65
Leases in Build System
Page 66
Leases in Build System
Page 67
Using etcd leases for heartbeat
$ curl http://server.com/v2/keys/foo -XPUT \
    -d value=bar -d ttl=300
Page 68
{
  "action": "set",
  "node": {
    "createdIndex": 2,
    "expiration": "2016-10-11T16:50:00",
    "key": "/foo",
    "modifiedIndex": 2,
    "ttl": 300,
    "value": "bar"
  }
}
Page 69
Using etcd leases for heartbeat
$ curl http://server.com/v2/keys/foo -XPUT \
    -d value=bar -d ttl=300
... 3 minutes later ...
Page 70
Using etcd leases for heartbeat
$ curl http://server.com/v2/keys/foo -XPUT \
    -d value=bar -d ttl=300
$ curl 'http://server.com/v2/keys/foo?prevValue=bar' -XPUT \
    -d ttl=300 -d refresh=true -d prevExist=true
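A client that sends this heartbeat from code rather than from the shell could build the same request as follows. This is a sketch under assumptions: the helper name `build_refresh_request` is hypothetical, the host is the placeholder `server.com` from the slide, and the function only constructs the request, so nothing goes over the network until it is passed to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_refresh_request(base_url, key, ttl, prev_value):
    """Build (but do not send) the PUT that refreshes a key's TTL.

    Mirrors the curl command on the slide: prevValue goes in the query
    string; ttl, refresh and prevExist go in the form-encoded body.
    """
    url = f"{base_url}/v2/keys/{key}?{urlencode({'prevValue': prev_value})}"
    body = urlencode({"ttl": ttl, "refresh": "true", "prevExist": "true"})
    return Request(url, data=body.encode("ascii"), method="PUT")


if __name__ == "__main__":
    req = build_refresh_request("http://server.com", "foo", 300, "bar")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

The `prevValue` and `prevExist` conditions make the refresh a compare-and-swap: if the key has already expired, or another client has overwritten it, the refresh fails instead of silently re-creating the heartbeat.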
Page 71
{
  "action": "update",
  "node": {
    "createdIndex": 2,
    "expiration": "2016-10-11T16:53:00",
    "key": "/foo",
    "modifiedIndex": 3,
    "ttl": 300,
    "value": "bar"
  },
  "prevNode": {...}
}
Page 72
{
  "action": "update",
  "node": {
    "createdIndex": 2,
    "expiration": "2016-10-11T16:53:00",
    "key": "/foo",
    "modifiedIndex": 3,
    "ttl": 300,
    "value": "bar"
  },
  "prevNode": {
    "createdIndex": 2,
    "expiration": "2016-10-11T16:50:00",
    "key": "/foo",
    "modifiedIndex": 2,
    "ttl": 120,
    "value": "bar"
  }
}
Page 73
Leases for heartbeat: How long should the lease term be?
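One way to reason about the question is to put numbers on both sides of the tradeoff: a short term detects a dead holder quickly but costs more renewal traffic, while a long term does the opposite. The function below is a back-of-the-envelope sketch; `lease_term_tradeoff` and the `renew_fraction` parameter (renew after this fraction of the term has elapsed) are assumptions for illustration, not values from the talk.

```python
def lease_term_tradeoff(term_seconds, renew_fraction=0.5, clients=1):
    """Return (renewals per hour, worst-case seconds to detect a dead holder).

    Assumes every client renews after renew_fraction of the term has elapsed.
    """
    renew_interval = term_seconds * renew_fraction
    renewals_per_hour = clients * 3600 / renew_interval
    # Worst case: the holder dies immediately after a successful renewal,
    # so the failure goes unnoticed until the full term runs out.
    worst_case_detection = term_seconds
    return renewals_per_hour, worst_case_detection


if __name__ == "__main__":
    for term in (10, 60, 300):
        print(term, lease_term_tradeoff(term, clients=100))
```

With the 300-second TTL from the etcd example, a single client renewing at half-term sends 24 renewals per hour, and a crashed holder can go unnoticed for up to 5 minutes; a 10-second term cuts detection to 10 seconds at 720 renewals per hour.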
Page 74
Inaccurate Computations & Serving Search Results
Page 75
From Accurate to "Good Enough"
Page 76
[Trade off] Inaccuracy for Performance
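One common way to trade accuracy for speed is to answer from a random sample and report an error margin alongside the estimate, in the spirit of the BlinkDB work listed in the references. The sketch below is an assumption-laden illustration, not BlinkDB's algorithm: `approximate_mean` is a hypothetical helper, it uses simple uniform sampling, and the 1.96 factor assumes an approximately normal sampling distribution (a 95% interval).

```python
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a uniform random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95% confidence interval.
    """
    rng = random.Random(seed)  # seeded only to keep the example repeatable
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, 1.96 * stderr


if __name__ == "__main__":
    data = list(range(100_000))
    estimate, margin = approximate_mean(data, sample_size=1_000)
    print(f"approx mean = {estimate:.1f} +/- {margin:.1f} (exact: {statistics.fmean(data):.1f})")
```

The sample here touches 1% of the data; the margin shrinks with the square root of the sample size, so halving the error costs four times the work.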
Page 80
[Trade off] Inaccuracy for Resilience
Page 83
[Diagram: three Input → Map tasks feeding a single Reduce]
Page 84
Inaccuracy for Resilience
1. Task decomposition
Page 86
Inaccuracy for Resilience
1. Task decomposition
2. Baseline for correctness
Page 88
Inaccuracy for Resilience
1. Task decomposition
2. Baseline for correctness
3. Criticality Testing
Page 94
Inaccuracy for Resilience
1. Task decomposition
2. Baseline for correctness
3. Criticality Testing
4. Distortion and timing models
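The first two steps can be sketched on a toy map/reduce job: decompose the work into tasks, compute a baseline with every task succeeding, then discard tasks and compare. The function `resilient_mean` and the `failed_tasks` parameter are hypothetical names for illustration; failures are injected explicitly rather than detected, and this is a simplified reading of the task-discarding idea in the Rinard paper listed in the references, not its method.

```python
def resilient_mean(partitions, failed_tasks=frozenset()):
    """Mean over partitions, skipping failed map tasks.

    Returns (mean, fraction of tasks that contributed to it).
    """
    partials = [
        (sum(part), len(part))  # map: one (sum, count) pair per task
        for task_id, part in enumerate(partitions)
        if task_id not in failed_tasks  # a failed task is discarded, not retried
    ]
    if not partials:
        raise RuntimeError("every task failed; no result to report")
    total = sum(s for s, _ in partials)  # reduce over surviving tasks
    count = sum(n for _, n in partials)
    return total / count, len(partials) / len(partitions)


if __name__ == "__main__":
    parts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    baseline, _ = resilient_mean(parts)
    degraded, coverage = resilient_mean(parts, failed_tasks={2})
    print(f"baseline={baseline}, degraded={degraded}, coverage={coverage:.2f}")
```

Comparing the degraded result with the baseline for each discarded task is a crude form of criticality testing: tasks whose loss barely moves the answer are safe to drop under failure, and the ones that move it a lot are not.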
Page 97
[In production] Inaccuracy for Performance & Resilience
Page 98
Jeff Dean "Building Software Systems at Google and Lessons Learned", Stanford, 2010
Page 101
[Designing with] Inaccuracy for Performance & Resilience
Page 102
[Designing with] Inaccuracy for Performance & Resilience
simplified implementation
focus on observability
applicable to some problem domains
Page 103
[Designing with] Inaccuracy for Performance & Resilience
fuzz testing
generative testing
simplified implementation
fault injection testing
focus on observability
applicable to some problem domains
Page 104
References
● T. Wurthinger, C. Wimmer et al. "One VM to Rule Them All"
● M. Rinard "Probabilistic Accuracy Bounds for Fault-Tolerant Computations that Discard Tasks"
● F. Corbato, M. Daggett, R. Daley "An Experimental Time-Sharing System"
● E. Dijkstra "Cooperating Sequential Processes"
● L. Lamport "Time, Clocks, and the Ordering of Events in a Distributed System"
● http://blinkdb.org/
Page 105
References
● B. Oki, B. Liskov "Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems"
● L. Lamport "The Part-Time Parliament"
● M. Welsh, D. Culler, E. Brewer "SEDA: An Architecture for Well-Conditioned, Scalable Internet Services"
● C. Gray, D. Cheriton "Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency"
● S. Agarwal, B. Mozafari et al. "BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data"
Page 106
Gratitude
Ines Sombra
David Greenberg
Karan Parikh
Matt Welsh
Erran Berger
Page 107
Robust & scalable pipelines
Page 108
Robust & scalable pipelines
Leases for sharing & heartbeat
Page 109
Robust & scalable pipelines
Leases for sharing & heartbeat
Inaccuracy for resilience & performance
Page 110
Robust & scalable pipelines
Leases for sharing & heartbeat
Inaccuracy for resilience & performance
CS research is timeless: use it to mitigate risk
Page 111
Aysylu Greenberg
October 11, 2016
Distributed Systems in Practice, in Theory
@aysylu22