Adopting actors
An epic tale of loss and learning
Workday
Growth
[Chart: 2013 to 2016]
Cloud Master
Launch tasks • Assign to agents
Service growth, in millions of tasks per month
[Chart: 2015-09 to 2016-05, y-axis 0 to 20; series: Print, Large, Small, Batch]
Why Akka?
Initial Observations
Parent
Config Child Snapshots
Changes
Message flow: Ensure messages follow a consistent path
Creation: Assume actor is recovering from failure (state machine)
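One way to read the "assume recovery" rule is that an actor has no separate "fresh start" path: it always begins in a recovering state and only then moves to running. The sketch below is a minimal, library-free illustration in Python, not Akka code. `TaskActor`, `Phase`, the `"recovered"` message, and the dictionary standing in for durable storage are all hypothetical names chosen for the example.

```python
from enum import Enum


class Phase(Enum):
    RECOVERING = "recovering"
    RUNNING = "running"


class TaskActor:
    """Hypothetical actor that always starts in RECOVERING, never in RUNNING."""

    def __init__(self, store):
        self.store = store            # stand-in for durable state (assumption)
        self.phase = Phase.RECOVERING
        self.pending = []             # messages stashed until recovery completes
        self.done = []

    def receive(self, message):
        if self.phase is Phase.RECOVERING:
            if message == "recovered":
                # Reload whatever a previous incarnation left behind.
                self.done = list(self.store.get("done", []))
                self.phase = Phase.RUNNING
                stashed, self.pending = self.pending, []
                for m in stashed:
                    self.receive(m)   # replay stashed messages in arrival order
            else:
                self.pending.append(message)   # stash, do not process yet
        else:
            self.done.append(message)
            self.store["done"] = list(self.done)


store = {"done": ["task-1"]}          # left over from an earlier incarnation
actor = TaskActor(store)
actor.receive("task-2")               # arrives before recovery has finished
actor.receive("recovered")
print(actor.done)                     # ['task-1', 'task-2']
```

A first start is simply a recovery from an empty store, so both cases exercise the same code path.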
Anti-patterns
God Class
Movie Star
[Diagram: Pool, Agent, State, four Agents, Queue]
Movie Star
• Too much state
• Hard to reason about
• Too many messages in flight
• Hard to recover
• Bad concurrency
Break up actor
Split Brain
[Diagram: Pool, Agent, State, four Agents]
Duplicate state
• Single source of truth
• Synchronizing state is hard
• Failure causes
  – State out of sync
  – Causes more failure
Merge actors
Split Brain
[Diagram: Pool, Agent, State, four Agents, Task]
Passing responsibility
• Seems simple at first
• Do not always know who is in control
• Both actors updating the same row
• Creates race conditions
Split actors
Can you let it crash?
[Diagram: Pool, Agent, State, four Agents]
Can you let it crash?
Do not make it more robust!!!
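A rough illustration of "let it crash": the child contains no defensive error handling, and a supervisor replaces it with a fresh instance that reloads its state. This is a library-free Python sketch, not Akka's supervision API. `Agent`, `Supervisor`, the `"poison"` message, and the dictionary used as durable storage are hypothetical.

```python
class Agent:
    """Hypothetical child actor: no defensive try/except, it just fails."""

    def __init__(self, store):
        self.store = store                    # durable state, assumed to survive crashes
        self.count = store.get("count", 0)    # recover on every (re)start

    def receive(self, message):
        if message == "poison":
            raise RuntimeError("unexpected message")
        self.count += 1
        self.store["count"] = self.count


class Supervisor:
    """Replaces the child with a fresh instance whenever it crashes."""

    def __init__(self):
        self.store = {}
        self.restarts = 0
        self.child = Agent(self.store)

    def tell(self, message):
        try:
            self.child.receive(message)
        except Exception:
            self.restarts += 1
            self.child = Agent(self.store)    # discard the broken instance


sup = Supervisor()
for msg in ["a", "b", "poison", "c"]:
    sup.tell(msg)
print(sup.child.count, sup.restarts)          # 3 1
```

The failing message is dropped and the restarted child continues from the last persisted count, which is the behavior the slide's question is probing for.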
Lessons
Test for resilience
• Chaos Marmoset
• Unit test recovery
• Destructive system test
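A unit test for recovery can be as simple as killing the actor at every possible point in a workload and checking that the final state is unchanged. The sketch below assumes a hypothetical `Counter` actor that persists each update to a dictionary standing in for durable storage. It is not the Chaos Marmoset tool named on the slide, only an illustration of the idea.

```python
class Counter:
    """Hypothetical actor that persists each update as it is applied."""

    def __init__(self, store):
        self.store = store
        self.total = store.get("total", 0)    # recover from durable state

    def receive(self, amount):
        self.total += amount
        self.store["total"] = self.total


def run_with_crash(messages, crash_before):
    """Deliver messages, killing the actor just before index crash_before."""
    store = {}
    actor = Counter(store)
    for i, m in enumerate(messages):
        if i == crash_before:
            actor = Counter(store)            # simulate kill and restart
        actor.receive(m)
    return actor.total


messages = [1, 2, 3, 4]
results = [run_with_crash(messages, k) for k in range(len(messages))]
print(results)                                # [10, 10, 10, 10]
```

If any crash point produced a different total, recovery would be losing or duplicating work at that step.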
Stateless enterprise idioms do not apply
Sovereignty
One actor • One row • One shard • One table
Otherwise failure is hard to handle
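Sovereignty can be sketched as a registry that hands out exactly one owning actor per row key, so every writer has to go through that owner's mailbox. This is a library-free Python illustration. `RowActor`, `Registry`, and the `agent-7` style keys are hypothetical names, and the in-memory `value` stands in for the row.

```python
from collections import deque


class RowActor:
    """Hypothetical sole owner of one row; nothing else may write that row."""

    def __init__(self, key):
        self.key = key
        self.value = 0
        self.mailbox = deque()

    def tell(self, delta):
        self.mailbox.append(delta)            # senders only enqueue, never mutate

    def drain(self):
        while self.mailbox:
            self.value += self.mailbox.popleft()


class Registry:
    """Maps each row key to exactly one owning actor."""

    def __init__(self):
        self.owners = {}

    def owner_of(self, key):
        if key not in self.owners:
            self.owners[key] = RowActor(key)
        return self.owners[key]


registry = Registry()
registry.owner_of("agent-7").tell(5)          # sender A
registry.owner_of("agent-7").tell(-2)         # sender B reaches the same owner
registry.owner_of("agent-9").tell(1)
for actor in registry.owners.values():
    actor.drain()
print({k: a.value for k, a in registry.owners.items()})   # {'agent-7': 3, 'agent-9': 1}
```

Because only the owner applies changes, a failure involves one actor and one row, with no second writer to reconcile.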
Atomicity
Actors: Atomic receive method • State not shared • Comms via async messages • Not nestable
Mutex: Atomic scope • State is shared • Comms via mutable state • Nestable (ACID)
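The comparison above can be made concrete with a small side-by-side sketch using Python's standard `threading` and `queue` modules. The mutex version shares a dictionary and makes the lock scope the atomic unit. The actor version keeps its total private and makes one `receive` call the atomic unit. `CounterActor` and the `None` stop signal are hypothetical, and this is not how Akka schedules actors.

```python
import queue
import threading

# Mutex style: state is shared, and the lock scope is the atomic unit.
shared = {"total": 0}
lock = threading.Lock()


def add_with_mutex(n):
    for _ in range(n):
        with lock:
            shared["total"] += 1


# Actor style: state is private, and one receive call is the atomic unit.
class CounterActor(threading.Thread):
    def __init__(self):
        super().__init__()
        self.mailbox = queue.Queue()
        self.total = 0                  # touched only by this actor's thread

    def run(self):
        while True:
            message = self.mailbox.get()
            if message is None:         # hypothetical stop signal
                return
            self.receive(message)

    def receive(self, message):
        self.total += message


def add_with_messages(actor, n):
    for _ in range(n):
        actor.mailbox.put(1)            # communicate by message, not shared state


actor = CounterActor()
actor.start()
senders = [threading.Thread(target=add_with_mutex, args=(1000,)) for _ in range(4)]
senders += [threading.Thread(target=add_with_messages, args=(actor, 1000)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()
actor.mailbox.put(None)                 # all messages are queued before the stop signal
actor.join()
print(shared["total"], actor.total)     # 4000 4000
```

Both versions reach the same total, but only the mutex version could wrap several updates in one larger atomic scope. The actor version would need a single message that carries the whole change.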
Atomicity
Actors: Anything!!!
Mutex: Nothing
[Diagram: Pool, Agent, State, four Agents]
Atomicity
Eventual consistency
Lessons
- Atomicity and Consistency
- Actor modeling ≠ Object modeling
- Test for Resilience not robustness
- Refactor Early
?