Putting the “Micro” Back in Microservices
Sol Boucher, Carnegie Mellon University
Joint work with: Anuj Kalia; David G. Andersen; Michael Kaminsky, Intel Labs
Tech Target
Wall Street Journal
The Register
Hacker Noon
InfoWorld
The hope for serverless computing
Only have to manage code
Microservices invoked by triggers
Microservices are stateless
This makes the system scalable
Fine-grained billing that scales to zero
Median AWS Lambda warm-start latency 25 ms
Median cold-start latency >160 ms
Goal: Reduce microservice invocation latency
[Yesterday, ATC‘18]
Latency between Azure VMs ~10 μs [AccelNet, NSDI‘18]
Commit ACID transactions in ~20 μs [FaRM, SOSP‘15]
Speed begets generality
“Make it fast, rather than general or powerful.” (Butler Lampson)
Current request path
[Diagram: worker node with a dispatcher process and three μservices]
Proposal: Reduce overhead...
[Diagram: worker node with four CPU cores and twelve μservices]
Proposal: ...by running code in shared workers...
[Diagram: worker node with four CPU cores and three worker processes, each hosting two μservices]
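A minimal sketch of what hosting several microservices in one worker could look like. The `Microservice` signature, the `Worker` type, and the `echo`/`shout` services are hypothetical names chosen for illustration, and registering function pointers stands in for however the real system loads user code.

```rust
use std::collections::HashMap;

/// Hypothetical entry-point signature for a microservice: bytes in, bytes out.
type Microservice = fn(&[u8]) -> Vec<u8>;

/// A worker that hosts many microservices inside a single process.
struct Worker {
    services: HashMap<String, Microservice>,
}

impl Worker {
    fn new() -> Self {
        Worker { services: HashMap::new() }
    }

    /// Register a microservice under a name (stands in for loading user code).
    fn register(&mut self, name: &str, entry: Microservice) {
        self.services.insert(name.to_string(), entry);
    }

    /// Invoke a microservice with a plain function call inside this process.
    /// Returns None if no microservice is registered under `name`.
    fn invoke(&self, name: &str, request: &[u8]) -> Option<Vec<u8>> {
        self.services.get(name).map(|entry| entry(request))
    }
}

/// Example microservice: returns its input unchanged.
fn echo(request: &[u8]) -> Vec<u8> {
    request.to_vec()
}

/// Example microservice: upper-cases ASCII input.
fn shout(request: &[u8]) -> Vec<u8> {
    request.to_ascii_uppercase()
}
```

In this sketch an invocation is a hash lookup plus a function call, which is the kind of per-request cost the shared-worker proposal is aiming for.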
Proposal: ...and distributing work using polling
[Diagram: worker node with four CPU cores, a dispatcher process, and three worker processes, each hosting two μservices]
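A rough sketch of polling-based work distribution. It assumes threads and a mutex-protected queue as stand-ins for the dispatcher process, the worker processes, and whatever shared region connects them; `worker_loop`, `dispatch`, and the doubling "microservice" are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared queue standing in for the dispatcher-to-worker channel (assumption).
type Queue = Arc<Mutex<VecDeque<u64>>>;

/// Worker loop: poll the queue repeatedly instead of sleeping until woken.
fn worker_loop(queue: Queue, done: Arc<AtomicBool>) -> Vec<u64> {
    let mut handled = Vec::new();
    loop {
        // Read `done` before polling: if it was already set and the queue is
        // empty afterwards, no further requests can arrive.
        let finished = done.load(Ordering::Acquire);
        let next = queue.lock().unwrap().pop_front();
        match next {
            Some(request) => handled.push(request * 2), // "run" the microservice
            None if finished => return handled,
            None => std::hint::spin_loop(), // nothing yet; poll again
        }
    }
}

/// Dispatcher: enqueue every request, then signal that no more are coming.
/// Returns all results, sorted so the output does not depend on scheduling.
fn dispatch(requests: &[u64], workers: usize) -> Vec<u64> {
    let queue: Queue = Arc::new(Mutex::new(VecDeque::new()));
    let done = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (q, d) = (Arc::clone(&queue), Arc::clone(&done));
            thread::spawn(move || worker_loop(q, d))
        })
        .collect();
    for &r in requests {
        queue.lock().unwrap().push_back(r);
    }
    done.store(true, Ordering::Release);
    let mut results: Vec<u64> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    results.sort_unstable();
    results
}
```

Polling trades idle CPU cycles for lower wakeup latency, since a spinning worker notices a new request without waiting to be scheduled.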
But... how do we provide isolation?
How do we achieve isolation similar to processes?
Language-based isolation: compile-time safety guarantees
We use Rust for this, inspired by NetBricks [OSDI‘16] and [SOSP‘17]
User submits Rust code; we verify it
Fine-grained preemption: intra-process task interruption
Language-based isolation cuts invocation latency
Language-based isolation: Use Rust
Rust is…
● Strongly typed, compiled
● Specified safe subset
● No garbage collector
Memory safety guarantees:
● No dereferencing null/dangling pointers
● All variables initialized to valid values
● Enforced data immutability
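A small illustration of two of these guarantees in safe Rust. The `Request` type and `handle` function are hypothetical and not taken from the system described here.

```rust
/// Hypothetical request type handed to a microservice.
struct Request {
    user: Option<String>, // absence is explicit; safe Rust has no null pointer
    payload: Vec<u8>,
}

/// The shared borrow (`&Request`) means this function cannot modify the request.
fn handle(request: &Request) -> String {
    // The compiler requires both cases to be handled before `user` is used.
    let who = match &request.user {
        Some(name) => name.as_str(),
        None => "anonymous",
    };
    // Uncommenting the next line is rejected at compile time, because
    // `request` is borrowed immutably:
    // request.payload.clear();
    format!("{} sent {} bytes", who, request.payload.len())
}
```

Both checks happen at compile time, so they add no per-invocation cost.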
Language-based isolation: Defense in depth
[Diagram: worker node running a worker process that hosts two μservices]
● User: Blacklisted library functions
● Kernel: seccomp() to permit only whitelisted system calls
But... what if a microservice doesn’t yield?
CPU timesharing: Fine-grained preemption
Goal: Recover from microservice that doesn’t return quickly
1. Regain control of the CPU
2. Abort/clean up after microservice’s code
Implementation: POSIX timers, special cleanup logic
Fine-grained preemption
[Figure: workload throughput (M ops/s) vs. preemption interval (μs); series: Baseline, Preemption, 90% of Baseline]
3-μs period is possible!
20-μs period is practical!
Fine-grained preemption: Aborting and cleanup
SIGALRM handler: missed deadline?
● no: Handler returns, microservice continues
● yes: Handler throws exception, unwinding stack; worker’s main loop catches exception
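The abort path can be approximated in plain Rust as a sketch. One loud assumption: instead of a POSIX timer delivering SIGALRM, the microservice here calls a hypothetical `check_deadline` cooperatively, so this shows only the unwind-and-catch half of the mechanism. `DeadlineMissed`, `run_with_budget`, and `spin_forever` are likewise made-up names.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::time::{Duration, Instant};

/// Marker payload carried by the unwind when a deadline is missed.
struct DeadlineMissed;

/// Stand-in for the signal handler's decision: return, or start unwinding.
/// ASSUMPTION: called cooperatively here rather than from a SIGALRM handler.
fn check_deadline(deadline: Instant) {
    if Instant::now() >= deadline {
        // resume_unwind begins unwinding without invoking the panic hook.
        panic::resume_unwind(Box::new(DeadlineMissed));
    }
}

/// Body of the worker's main loop: run one microservice and catch the unwind
/// that a missed deadline triggers.
fn run_with_budget<F>(budget: Duration, microservice: F) -> Result<u64, &'static str>
where
    F: FnOnce(Instant) -> u64,
{
    let deadline = Instant::now() + budget;
    match panic::catch_unwind(AssertUnwindSafe(|| microservice(deadline))) {
        Ok(value) => Ok(value),
        Err(payload) if payload.is::<DeadlineMissed>() => Err("deadline missed"),
        Err(payload) => panic::resume_unwind(payload), // unrelated panic: propagate
    }
}

/// Example microservice that never returns on its own.
fn spin_forever(deadline: Instant) -> u64 {
    loop {
        check_deadline(deadline);
    }
}
```

Calling `run_with_budget(Duration::from_millis(1), spin_forever)` returns an error once the budget is exhausted, while a microservice that finishes in time returns its value. Unwinding runs destructors on the way out, which is what lets the worker clean up after the aborted code.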
Trust model
Trusted computing base:
● Rust compiler, standard library
● Any allowed unsafe or native dependencies
Successful compilation indicates microservice is memory safe
Successful linking indicates all dependencies are trusted
Recap
✓ Consolidate microservices into shared processes
✓ Improved local invocation latency by orders of magnitude
✓ (Hopefully) better resource utilization
→ Current limitations and future work
Future work: Aborting/cleanup limitations
[Diagram: worker process hosting four μservices; callout: Call to malloc()]
Upcoming: More general accounting/deallocation scheme
● Operates outside the Rust runtime
● Disables preemption during trusted library routines
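One way the "disable preemption during trusted library routines" idea might look, purely as a sketch: the slides do not describe the mechanism, so the thread-local counter, the `NoPreempt` guard, and the `on_timer_tick`/`take_pending` helpers are all assumptions made for illustration.

```rust
use std::cell::Cell;

thread_local! {
    /// Nesting depth of trusted library routines currently executing.
    static NO_PREEMPT_DEPTH: Cell<u32> = Cell::new(0);
    /// Set when a preemption request arrived while preemption was disabled.
    static PREEMPT_PENDING: Cell<bool> = Cell::new(false);
}

/// RAII guard: preemption stays disabled for as long as the guard is alive.
struct NoPreempt;

impl NoPreempt {
    fn enter() -> Self {
        NO_PREEMPT_DEPTH.with(|d| d.set(d.get() + 1));
        NoPreempt
    }
}

impl Drop for NoPreempt {
    fn drop(&mut self) {
        NO_PREEMPT_DEPTH.with(|d| d.set(d.get() - 1));
    }
}

/// What a timer handler could do: returns true if aborting is allowed right
/// now; otherwise records the request so it can be honored later.
fn on_timer_tick() -> bool {
    if NO_PREEMPT_DEPTH.with(|d| d.get()) == 0 {
        true
    } else {
        PREEMPT_PENDING.with(|p| p.set(true));
        false
    }
}

/// Called after leaving a trusted routine: consume any deferred preemption.
fn take_pending() -> bool {
    PREEMPT_PENDING.with(|p| p.replace(false))
}
```

A wrapper around an allocator call would hold a `NoPreempt` guard for its duration and check `take_pending()` on exit, so an abort never lands in the middle of the allocator's bookkeeping.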
Future work: Side-channel attacks
Heightened Spectre vulnerability requires hardware mitigation
Must consider microservices’ access to:
● Process’s proximity to resource limits
● Addresses and timings from the dynamic allocator
● File descriptor numbers
Shorter microservice durations make behavior less obscure
Conclusion
Improved performance by shifting isolation abstraction layer
Replaced traditional process-based isolation with:
● Language-based isolation
● Fine-grained preemption
Questions?
Thank you!