You build it, you run it: Operating SoundCloud's microservice architecture
Matthias Rampke, SoundCloud
GOTO Berlin 2016
Engineer in Production Engineering (platform, monitoring, availability)
previously in Systems Engineering (ops remnant catch-all)
Intro: me (who I am and where I work)
a cloud full of sounds
135M tracks, 12M artists, 175M listeners
300+ employees
no ops team
Intro: SoundCloud
Where we came from
Where we are today
Why we did it
How you can do it
How does this compare to…?
Intro: Agenda
Where we came from
One team
One table
One codebase
In the beginning: the early days
2009/2010
20-50 engineers
hired an ops team, 24/7 on-call
app team deploys the monolith
first separate "micro"services
growing pains
2011/2012
more microservices
deployment platform
SRE/platforms team
multiple on-call rotations
the fork in the road
2013-2015
Cambrian explosion of microservices
feature teams and collectives
client specific APIs
shared components & libraries
continuous delivery
maturing
Where we are today
Org chart (simplified)
every feature • service • codebase is owned by a team
Ownership: you own (not just build) it, you run it
owners are on call for what they own
groups of teams work together to reduce load
remove alerts • write documentation
On Call
avoid shared infrastructure
be flexible
don't duplicate work
Shared Components
run the systems that run systems
monitoring & availability
internal consulting
Production Engineering
Why we did it
autonomy
predictability
velocity
Delivery: get more done, consistently
learn something new every day
no pure specialists
internal mobility
Personal growth
simple
resilient
operable
Better systems
How you can do it
basic automation
openness
pride
trust
Prerequisites
testing & deployment
on-call
provisioning
dependencies
Expanding ownership
internal moves
escalation paths
documentation
tooling
Checks & Balances
learn
improve
commiserate
Postmortems
How does this compare to…?
no assignment to SWE teams
no on-call handoff
no deploy blocks
Site Reliability Engineering, as Google describes it
more shared code
more communication
infrastructure & core teams
Radical Agility, as Zalando describes it
no Ops team
less shared infrastructure
less standardization
deploys spread in a different dimension
DevOps, as described by Etsy
.soundcloud.com
Berlin • New York • San Francisco • London
Slides: https://bit.ly/gotober16-sc
Please rate!