Outgrowing an internet startup: database administration in a fast growing company.

Spil Games: outgrowing an internet startup
Art van Scheppingen, Head of Database Engineering

1.  Who is Spil Games? 2.  How to professionalize? 3.  Spil Storage Platform 4.  Questions?

Overview

Who are we? Who is Spil Games?

•  Company founded in 2001 •  350+ employees worldwide •  170M unique visitors per month •  100K unique visitors per month on spilgames.com

Geographic Reach: 170 Million Monthly Active Users (*)

Source: (*) Google Analytics, December 2011

•  Over 40 localized portals in 19 languages •  Focus on casual and social games •  170M MAU (30M YoY growth) •  Over 40M registered users

Girls, Teens and Family Brands

•  In-house game studios •  Partnerships with social gaming studios •  Over 1500 licensed games

[Chart: DB servers, MAU and employees per year, 2006 to 2011]

Spil Games is growing fast!

Database Engineering: How to professionalize your department

•  Databases maintained by Systems Engineering •  No focus on performance, structure or backups •  Looking only one or two weeks into the future

Startup

•  Multiple migrations to new hardware •  Ping-pong on Master-Master setups •  Lack of insight into performance issues

Lessons

•  Plan ahead up to three months •  Improve database platform •  Reduce number of repetitive tasks

•  Write them down step by step (wiki) •  Automate where possible

•  Improve monitoring •  A single monitoring system is not enough!

•  Forecast growth •  Week / Month / Year •  Look back and evaluate!

•  Extend department

Professionalize
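
The "Forecast growth" and "Look back and evaluate!" bullets can be illustrated with a minimal sketch. It assumes a simple least-squares trend over equally spaced samples (weekly, monthly or yearly); the function names are hypothetical and the slides do not say which forecasting method was actually used.

```python
from statistics import mean


def linear_forecast(history, periods_ahead):
    """Extrapolate a least-squares line fitted through equally spaced samples.

    `history` holds one measurement per period (for example disk usage per
    week); the result is the projected value `periods_ahead` periods after
    the last sample. Hypothetical helper, not taken from the slides.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_mean = mean(xs)
    y_mean = mean(history)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + periods_ahead)


def forecast_error(predicted, actual):
    """Relative error of an earlier forecast, for the look-back step."""
    return (predicted - actual) / actual
```

Running the same function over weekly, monthly and yearly series gives the three horizons; comparing older forecasts with what actually happened is the evaluation step.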

•  Scaling the LDAP platform •  LDAP replaced by MySQL based solution (with help from Percona)

LDAP isn’t suitable for the web

MMMDMM

•  LVM snapshot method •  took 4 hours on average with manual intervention

•  Innobackupex + netcat + tar + script = quick cloning •  Takes about 1 hour per 100GB •  Foolproof •  Can be run on active masters (if necessary)

Cloning
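
The "about 1 hour per 100GB" rule of thumb turns into a quick estimate for planning a clone. This is only a sketch: the function name is hypothetical, and real throughput depends on hardware and network.

```python
HOURS_PER_100_GB = 1.0  # rule of thumb quoted on the slide


def estimated_clone_hours(data_size_gb, hours_per_100_gb=HOURS_PER_100_GB):
    """Rough duration of a streamed clone, scaled linearly with data size."""
    if data_size_gb < 0:
        raise ValueError("data size cannot be negative")
    return data_size_gb / 100.0 * hours_per_100_gb
```

For example, `estimated_clone_hours(250)` evaluates to 2.5 hours under this linear assumption.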

•  Different monitoring systems give different insights •  Different angles/metrics/purposes •  Early problem detection •  Signal abnormal use which could cause outage

Improve monitoring

•  Uneven growth: •  Active master handling all write requests •  Vertical scaling

•  Write only •  More writes than reads

•  SOA problems •  Connection spawning •  Open file descriptors

Growing pains

•  Anticipate more than one year in advance •  Acknowledge shortcomings/problems, look for solutions or alternatives •  Don't commit to one single solution! •  Be flexible!

•  Plan for capacity per instance, not for growth alone! •  Start thinking globally!

Outgrowing our startup phase
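
"Plan for capacity per instance" can be read as: measure what one instance can sustain, then derive how many instances a given load needs. A minimal sketch under that reading; the function name, the load unit and the default headroom of 70% are assumptions, not figures from the talk.

```python
import math


def instances_needed(peak_load, capacity_per_instance, max_utilization=0.7):
    """Number of instances needed to serve `peak_load` with headroom.

    `capacity_per_instance` is the measured sustainable load of one database
    instance (same unit as `peak_load`, for example queries per second).
    `max_utilization` caps how full each instance may run; 0.7 is an
    illustrative default.
    """
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    usable = capacity_per_instance * max_utilization
    return math.ceil(peak_load / usable)
```

Combining this with a growth forecast answers "when do we need the next instance?" rather than only "how much will we grow?".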

Spil Storage Platform: Sharding is inevitable

What is this exciting project about?

•  Natural growth •  Grown out of necessity for more functionality •  Adding functionality means more interaction •  Separation of database function •  Profiles •  Highscores •  Comments •  User Generated Content •  etc

Functional sharding

•  KISS •  Problem isolation

Advantages

Disadvantages •  Uneven growth •  Difference in query patterns •  No data consistency •  No clear ownership of data •  Capacity planning on total number of reads/writes •  Horizontal scaling is difficult

Spil Storage Platform

•  What is the bucket model? •  It is an abstraction layer between the database and the datamodel

•  Each record has one unique owner attribute (GID) •  The GID (Global IDentifier) identifies different data types

•  Different buckets per function •  Attributes contain record data •  Attributes do not have to correspond to schema

Bucket model
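
A minimal in-memory sketch of the bucket model as described: records are owned by a GID, grouped in one bucket per function, and carry free-form attributes. The class names, the integer GID and the one-record-per-owner-per-bucket rule are assumptions made for illustration; the real SSP data layout is not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Record:
    gid: int     # owner of the record (Global IDentifier)
    bucket: str  # functional bucket, e.g. "profile" or "highscore"
    # Free-form attributes: they need not match any fixed schema.
    attributes: Dict[str, Any] = field(default_factory=dict)


class BucketStore:
    """Toy abstraction layer: callers see buckets and GIDs, not tables."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[int, Record]] = {}

    def put(self, record: Record) -> None:
        # Assumption: one record per owner GID in each bucket.
        self._buckets.setdefault(record.bucket, {})[record.gid] = record

    def get(self, bucket: str, gid: int) -> Optional[Record]:
        return self._buckets.get(bucket, {}).get(gid)
```

Because callers only deal with buckets, GIDs and attributes, the storage behind `BucketStore` could change without touching them, which is the point of the abstraction layer.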

•  Flexibility •  Database backend independent •  Seamless schema changes and upgrades •  Sharded on both functional and GID level

•  Even distribution of queries possible •  Capacity planning on number and type of entities

•  Asynchronous writes possible •  Transparent data migration

Advantages

Disadvantages •  Harder to find data •  At least two lookups needed! •  Datawarehousing needs a different approach

•  Globally sharded on GID •  (local) GID Lookup

How do GIDs work?
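
The "at least two lookups" cost can be sketched as follows: the first lookup resolves a GID to its shard, the second reads the record from that shard. Dictionaries stand in for the lookup service and the shards, and all names are hypothetical.

```python
class GidDirectory:
    """Maps each GID to the shard holding its data (lookup number one)."""

    def __init__(self):
        self._shard_of = {}

    def assign(self, gid, shard_name):
        self._shard_of[gid] = shard_name

    def shard_for(self, gid):
        try:
            return self._shard_of[gid]
        except KeyError:
            raise LookupError(f"GID {gid} is not assigned to a shard") from None


def fetch(directory, shards, bucket, gid):
    """Resolve the shard first, then read the record from it."""
    shard_name = directory.shard_for(gid)          # lookup 1: GID -> shard
    return shards[shard_name].get((bucket, gid))   # lookup 2: record on shard
```

Since only the directory knows where a GID lives, moving a GID to another shard means copying its data and updating one directory entry, which fits the "transparent data migration" advantage.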

[Diagram labels: GID lookup, Shard 1, Shard 2, Persistent storage]

Pipeline flow

[Diagram labels: Current functional shards, LEGACY adapter, New Application, SSP, Legacy API, New GID based shards, Read only, Read + write]

Bucket mapping and migration

•  Each cluster of two masters will contain two shards •  Data is written interleaved •  HA for both shards •  No warmup needed

•  Both masters active and “warmed up” •  Slave added for backups and Datawarehouse

Master-Master Sharding
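
One possible reading of this setup, sketched below: both masters hold both shards, each master is the preferred writer for one shard, and the surviving master takes over both shards when its peer fails. This interpretation, and every name in the code, is an assumption; the slides do not spell out the routing rules.

```python
class MasterPair:
    """Two masters, two shards; each master prefers one shard."""

    def __init__(self, master_a, master_b, shard_a, shard_b):
        self._masters = (master_a, master_b)
        self._preferred = {shard_a: master_a, shard_b: master_b}
        self._healthy = {master_a: True, master_b: True}

    def set_health(self, master, is_healthy):
        self._healthy[master] = is_healthy

    def writer_for(self, shard):
        """Return the master that should receive writes for `shard`."""
        preferred = self._preferred[shard]
        if self._healthy[preferred]:
            return preferred
        # Fail over to the peer, which already holds the data and is
        # already serving traffic, so no warmup is needed.
        peer = next(m for m in self._masters if m != preferred)
        if self._healthy[peer]:
            return peer
        raise RuntimeError("no healthy master available for " + str(shard))
```

In normal operation the write load is split over both machines, and a failover only moves one shard's writes.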

Shard 1

Shard 2

•  Erlang cluster with many workers •  Every GID has its own worker process •  (Inter)cluster communication •  (Near) linear scalability

How are we implementing this SSP?
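
The "one worker process per GID" idea can be illustrated outside Erlang with a single-threaded Python analogy: each GID gets its own worker with a mailbox, so all operations for one GID are applied in order and never interleave with another GID's state. This is not the SSP code; the classes are hypothetical and leave out the concurrency and distribution that Erlang provides.

```python
from collections import deque


class GidWorker:
    """Owns the state of exactly one GID and applies messages in order."""

    def __init__(self, gid):
        self.gid = gid
        self.state = {}
        self.mailbox = deque()

    def send(self, key, value):
        # Queue a write; nothing is applied until the worker runs.
        self.mailbox.append((key, value))

    def drain(self):
        """Apply all queued messages in arrival order; return the count."""
        processed = 0
        while self.mailbox:
            key, value = self.mailbox.popleft()
            self.state[key] = value
            processed += 1
        return processed


class WorkerRegistry:
    """Creates a worker on first use and reuses it afterwards."""

    def __init__(self):
        self._workers = {}

    def worker_for(self, gid):
        if gid not in self._workers:
            self._workers[gid] = GidWorker(gid)
        return self._workers[gid]
```

Because no state is shared between workers, adding nodes and spreading the workers over them is what makes near-linear scaling plausible.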

•  Erlang node caching •  Multiple backend connectors •  MySQL library •  Handlersockets •  Any other connectors if needed

•  Connection pooling

Advantages
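
Connection pooling addresses the connection spawning and file descriptor problems mentioned earlier: a fixed set of connections is opened once and reused. A minimal, backend-agnostic sketch; `ConnectionPool` and its `factory` argument are hypothetical, and any real connector (a MySQL client, for instance) would be passed in as the factory.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are created once and then reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller failed.
            self._idle.put(conn)
```

The pool size puts a hard upper bound on open connections per node, so a traffic spike queues callers instead of spawning new connections.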

Disadvantages •  NOT SEXY? ( http://spil.com/notsexy )

Do YOU want to be sexy?

Questions?

Thank you!
