Date post: | 14-May-2015 |
Category: |
Technology |
Upload: | ross-snyder |
Etsy is the world’s handmade marketplace.
(vintage and supplies, too)
2
Etsy was founded in mid-2005 and is constantly growing.
Gross Merchandise Sales ($MM)
3
June 2005: Four employees, one web*, one db, founder’s apartment
* until getting slashdotted by a link from Boing Boing in Aug. 2005
From humble beginnings...
4
Sept. 2011: 250+ employees, multiple offices, billions of pageviews
... to today’s handmade juggernaut.
(NYC Mayor Mike Bloomberg visited Etsy in June 2011)
5
How’d we get here?
6
Answer: with some difficulty.
“There is no education like adversity.” - Benjamin Disraeli
7
A few disclaimers
8
Hindsight is 20/20
9
“History is written by the victors”
10
Etsy thrives today because of what
its early employees accomplished
11
Your narrator wasn’t present for most of the events covered in this talk
12
Etsy Architecture: 2007
13
Etsy Architecture: 2007
Operating System:
Database:
Webserver:
Languages:
14
Etsy Architecture: 2007
Most business logic in Postgres stored procedures
15
Etsy Architecture: 2007
Front end / database interaction = stored procedure calls wrapped with PHP functions
16
Etsy Architecture: 2007
Some database partitioning by feature, but still with a large central DB
17
Etsy Architecture: 2007
Site uptime = not great
18
Etsy Architecture: 2007
“How do we scale?”
19
Etsy Architecture: 2007
“Let’s write some middleware!”
(runners up: “Let’s rewrite the site in Java!” and “Let’s rewrite the site in Python!”)
20
Conway’s Law:
“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.”
- Melvin Conway, 1968
21
Etsy Engineering: 2007
Dev DBA Ops
22
Etsy Engineering: 2007
Dev DBA Ops
Devs write code
23
Etsy Engineering: 2007
Dev DBA Ops
DBAs write SQL
24
Etsy Engineering: 2007
Dev DBA Ops
Ops deploys code & touches prod
25
SILOS
26
Etsy’s big bet: “Sprouter” (the Stored Procedure Router)
27
Web (PHP)
Sprouter (Python)
DB (Postgres)
Sprouter
Runs on each webserver, listens on port 8010
28
Sprouter
Maps name/arguments to a Postgres stored procedure, calls it, returns results
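
The routing step described on this slide can be sketched roughly as follows. This is only an illustration, not Sprouter's actual code: the registry contents, the procedure names, and the `build_call` / `route` helpers are all hypothetical, and the `%s` placeholder style assumes a psycopg2-like Postgres driver behind the executor.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical registry: logical call name -> (stored procedure name, argument count).
PROCEDURES: Dict[str, Tuple[str, int]] = {
    "get_listing": ("listing_get_by_id", 1),
    "find_shop_listings": ("shop_listings_find", 2),
}

# An executor takes SQL plus parameters and returns rows. In a real deployment
# it would wrap a Postgres connection; here it is injected so the sketch runs
# without a database.
Executor = Callable[[str, Sequence[Any]], List[tuple]]


def build_call(name: str, args: Sequence[Any]) -> Tuple[str, Tuple[Any, ...]]:
    """Map a logical name and its arguments to a parameterized procedure call."""
    if name not in PROCEDURES:
        raise KeyError(f"unknown procedure: {name}")
    proc, arity = PROCEDURES[name]
    if len(args) != arity:
        raise ValueError(f"{name} expects {arity} argument(s), got {len(args)}")
    placeholders = ", ".join(["%s"] * arity)
    return f"SELECT * FROM {proc}({placeholders})", tuple(args)


def route(name: str, args: Sequence[Any], execute: Executor) -> List[tuple]:
    """Resolve the call, run it through the executor, and return the rows."""
    sql, params = build_call(name, args)
    return execute(sql, params)
```

Note how every new query needs both a stored procedure and a registry entry before the front end can call it, which hints at the deploy coupling discussed later in the talk.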
29
Sprouter
Caches things
30
Sprouter
Supports sharding (in theory)
31
Sprouter
Devs write PHP, DBAs write SQL, meet somewhere in the middle
32
SILOS
33
Sprouter
The hope: easier to scale Sprouter than to scale the database itself
34
Sprouter
(scaling the db when everything’s in stored procedures = somewhere between hard and impossible)
35
Sprouter: Timeline
Fall ’07: Idea first discussed
Spring ’08: Alpha version debuts
Fall ’08: Released in production
36
Sprouter: Timeline
Fall ’07: Idea first discussed
Spring ’08: Alpha version debuts
Fall ’08: Released in production
Spring ’09: Sprouter deprecated
37
What happened?
38
Sprouter: “Good” Parts
Forcibly centralizes database access
39
Sprouter: “Good” Parts
Hides data store implementation from caller
40
Sprouter: “Good” Parts
Opens the door for “clever” automatic caching
41
Sprouter: “Good” Parts
Prevents developers from writing SQL (?)
42
43
Sprouter: Not-As-Good Parts
Creates substantial developer friction
44
Sprouter: Not-As-Good Parts
Homegrown daemon + dependencies for Ops to maintain
45
Sprouter: Not-As-Good Parts
Lack of community support / provability
46
Sprouter: Not-As-Good Parts
Complex synchronization required to deploy (due to tight coupling with Postgres)
47
Sprouter: Not-As-Good Parts
Database remains single point of failure (sharding features never fully formed)
48
Sprouter: Summary
Extra barriers to development
49
Sprouter: Summary
Extra barriers to development
+ Negligible (negative?) effect on site reliability
50
Sprouter: Summary
Extra barriers to development
+ Deploys even more painful
+ Negligible (negative?) effect on site reliability
51
Sprouter: Summary
Extra barriers to development
+ Deploys even more painful
+ Requires extra Ops/Dev resources
+ Negligible (negative?) effect on site reliability
52
Sprouter: Summary
Extra barriers to development
+ Deploys even more painful
+ Requires extra Ops/Dev resources
+ Negligible (negative?) effect on site reliability
=
53
How did attitudes change so quickly?
54
Sprouter: Timeline
Fall ’07: Idea first discussed
Spring ’08: Alpha version debuts
Fall ’08: Released in production
Spring ’09: Sprouter deprecated
55
The Great Etsy Culture Shift
56
The Great Etsy Culture Shift
Just as Sprouter went live, many of its strongest proponents departed Etsy
57
The Great Etsy Culture Shift
Taking with them...
58
The Great Etsy Culture Shift
Devotion to Postgres stored procedures / types
59
The Great Etsy Culture Shift
Fear of developers writing SQL
60
The Great Etsy Culture Shift
Fear of developers touching prod
61
The Great Etsy Culture Shift
Infrequent / large deploys to production
62
The Great Etsy Culture Shift
“Not developed here”
63
Fall
’08
Then Now
The Great Etsy Culture Shift
64
DevOps
65
DevOps
Silos = bad
66
DevOps
Trust, cooperation, transparency, shared responsibility = good
67
DevOps
“We’re all in this together”
68
The Way Forward: Part 1
Stabilize the site
69
The Way Forward: Part 1
Improve metrics & monitoring
Stabilize the site
70
The Way Forward: Part 1
StatsD
http://github.com/etsy/statsd
Stabilize the site
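
For a sense of how lightweight this kind of instrumentation is, here is a minimal sketch of emitting StatsD counters and timers over UDP from Python. The `name:value|c` and `name:value|ms` line formats and port 8125 are StatsD's documented defaults; the helper names and the metric names in the usage note are made up for illustration, and real code would more likely use an existing client library.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter line, e.g. b'web.logins:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def format_timer(name: str, millis: int) -> bytes:
    """Build a StatsD timer line, e.g. b'web.render:320|ms'."""
    return f"{name}:{millis}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget: UDP means the caller never waits on the metrics server."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

A call such as `send_metric(format_counter("web.logins"))` is cheap enough to sprinkle throughout application code, which is what makes it practical to measure nearly everything.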
71
The Way Forward: Part 1
Upgrade database hardware vertically as far as possible
Stabilize the site
72
The Way Forward: Part 1
Give developers production access to help troubleshoot problems
Stabilize the site
73
The Way Forward: Part 2
Continuous Deployment
74
The Way Forward: Part 2
Any engineer can deploy to prod (generally happens 25+ times per day)
Continuous Deployment
75
The Way Forward: Part 2
Deployinator
http://github.com/etsy/deployinator
Continuous Deployment
76
The Way Forward: Part 2
One button that deploys the site
Continuous Deployment
77
The Way Forward: Part 2
Small changesets, deployed frequently
Continuous Deployment
78
The Way Forward: Part 2
Requires solid tests, good communication
Continuous Deployment
79
The Way Forward: Part 2
Distributed developer-driven QA
Continuous Deployment
80
The Way Forward: Part 3
Circumvent Sprouter
81
The Way Forward: Part 3
Object-Relational Mapping (ORM)
Circumvent Sprouter
82
The Way Forward: Part 3
aka “The Vietnam of Computer Science” (Google it)
Circumvent Sprouter
83
The Way Forward: Part 3
Front-end PHP talks directly to database via ORM (also written in PHP)
Circumvent Sprouter
84
The Way Forward: Part 3
ORM can cache where appropriate (as can front end)
Circumvent Sprouter
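
One common way a data-access layer caches "where appropriate" is a read-through cache with invalidation on write. The sketch below is in Python rather than PHP and is not Etsy's ORM: the `CachedFinder` class, its method names, and the in-process dict standing in for a cache server are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional


class CachedFinder:
    """Read-through cache in front of a primary-key lookup (hypothetical)."""

    def __init__(self, load: Callable[[int], Optional[dict]]):
        self._load = load                   # function that reads a row from the database
        self._cache: Dict[int, dict] = {}   # stand-in for a shared cache server
        self.db_reads = 0                   # counts how often the database was hit

    def find(self, record_id: int) -> Optional[dict]:
        """Return the row from cache if present, otherwise load and remember it."""
        if record_id in self._cache:
            return self._cache[record_id]
        self.db_reads += 1
        row = self._load(record_id)
        if row is not None:
            self._cache[record_id] = row
        return row

    def save(self, record_id: int, row: dict,
             store: Callable[[int, dict], None]) -> None:
        """Write to the database, then drop the stale cache entry."""
        store(record_id, row)
        self._cache.pop(record_id, None)
```

Because the caching decision lives in ordinary application code, developers can see and change it per model, in contrast to the "clever" automatic caching listed among Sprouter's "good" parts.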
85
The Way Forward: Part 4
Database Sharding
86
The Way Forward: Part 4
Etsy has a lot of DNA from flickr - including their DB sharding scheme
Database Sharding
87
The Way Forward: Part 4
Based on MySQL
Database Sharding
88
The Way Forward: Part 4
Battle-tested, well-known
Database Sharding
89
The Way Forward: Part 4
Scales horizontally to infinity (or close enough)
Database Sharding
90
The Way Forward: Part 4
No single points of failure (master-master replication)
Database Sharding
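
The slides do not spell out the sharding scheme itself, so the following is only a generic sketch of one well-known approach: a lookup directory that records which shard owns each user's data. The `ShardDirectory` class, the shard names, and the modulo rule for placing new owners are all hypothetical and should not be read as Etsy's or flickr's actual design.

```python
from typing import Dict, List


class ShardDirectory:
    """Lookup-table sharding: remember which shard holds each owner's rows."""

    def __init__(self, shards: List[str]):
        if not shards:
            raise ValueError("at least one shard is required")
        self._shards = list(shards)
        self._owner_to_shard: Dict[int, int] = {}

    def shard_for(self, owner_id: int) -> str:
        """Return the owner's shard, assigning one on first sight."""
        if owner_id not in self._owner_to_shard:
            # Placement rule for new owners only (an arbitrary choice here).
            self._owner_to_shard[owner_id] = owner_id % len(self._shards)
        return self._shards[self._owner_to_shard[owner_id]]

    def add_shard(self, name: str) -> None:
        """New capacity affects future placements; existing owners stay put."""
        self._shards.append(name)

    def move(self, owner_id: int, shard_name: str) -> None:
        """Rebalance by repointing a single owner at a different shard."""
        self._owner_to_shard[owner_id] = self._shards.index(shard_name)
```

The appeal of a directory over pure hashing is that adding a shard does not force existing data to move, which is one way a scheme like this can keep growing horizontally.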
91
The Way Forward: Part 4
Gradually phase out Sprouter, phase in ORM / sharded data
Database Sharding
92
Sprouter: Timeline
Fall ’07: Idea first discussed
Spring ’08: Alpha version debuts
Fall ’08: Released in production
Spring ’09: Sprouter deprecated
93
Sprouter: Timeline
Fall ’07: Idea first discussed
Spring ’08: Alpha version debuts
Fall ’08: Released in production
Spring ’09: Sprouter deprecated
Spring ’11: Sprouter turned off
94
95
Lessons Learned
96
Etsy Architecture: 2007
Operating System:
Database:
Webserver:
Languages:
97
Etsy Architecture: 2011
Operating System:
Database:
Webserver:
Languages:
98
Open & trusting > closed & afraid (DevOps DevOps DevOps)
99
Front end/database interaction is too critical to take chances on novel/untested solutions
100
Side corollary: If you’re doing something “clever”, you’re probably doing it wrong
101
The architectural decisions you make today will have large impact long after you’re gone
102
No architectural hole is so deep that proven scaling strategies don’t exist for digging out
103
We are probably making decisions today that will be the subject of a similar talk in 2015
Acknowledgement
104
Learn More: http://codeascraft.etsy.com/ @codeascraft
105
Etsy is hiring! http://www.etsy.com/careers @etsy
106