Outgrowing an internet startup: database administration in a fast growing company.

Post on 15-Jan-2015

description

Database administration in a fast growing company: problems and solutions.

transcript

Spil Games: outgrowing an internet startup
Art van Scheppingen, Head of Database Engineering

1. Who is Spil Games?
2. How to professionalize?
3. Spil Storage Platform
4. Questions?

Overview

Who are we? Who is Spil Games?

• Company founded in 2001
• 350+ employees worldwide
• 170M unique visitors per month
• 100K unique visitors per month on spilgames.com

Facts

Geographic Reach: 170 million monthly active users(*)

Source: (*) Google Analytics, December 2011

• Over 40 localized portals in 19 languages
• Focus on casual and social games
• 170M MAU (30M YoY growth)
• Over 40M registered users

Girls, Teens and Family Brands

Games

• In-house game studios
• Partnerships with social gaming studios
• Over 1500 licensed games

[Chart: DB servers, MAU and employees per year, 2006 to 2011]

Spil Games is growing fast!

Database Engineering: how to professionalize your department

• Databases maintained by Systems Engineering
• No focus on performance, structure or backups
• Looking only one or two weeks into the future

Startup

• Multiple migrations to new hardware
• Ping-pong on Master-Master setups
• Lack of insight into performance issues

Lessons

• Plan ahead up to three months
• Improve database platform
• Reduce number of repetitive tasks
  • Write them down step by step (wiki)
  • Automate where possible
• Improve monitoring
  • A single monitoring system is not enough!
• Forecast growth
  • Week / Month / Year
  • Look back and evaluate!
• Extend department

Professionalize
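
The "Forecast growth" and "Look back and evaluate" bullets can be illustrated with a very small sketch. It assumes a plain linear extrapolation, which the slides do not prescribe; the function names are hypothetical, and only the 170M MAU and 30M year-over-year figures come from the deck.

```python
def forecast_linear(current: float, growth_per_period: float, periods: int) -> float:
    """Extrapolate a metric assuming constant absolute growth per period."""
    return current + growth_per_period * periods


def forecast_error(forecast: float, actual: float) -> float:
    """Relative error of an earlier forecast (the 'look back and evaluate' step)."""
    return (forecast - actual) / actual


# 170M MAU and 30M YoY growth are taken from the slides.
mau_next_year = forecast_linear(170e6, 30e6, periods=1)

# The 'actual' value is invented purely to show the evaluation step.
error = forecast_error(mau_next_year, actual=210e6)
print(f"forecast: {mau_next_year:.0f}, relative error: {error:.1%}")
```

A negative error means the forecast was too low, which is the signal to revisit the growth assumption for the next week, month or year.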

• Scaling the LDAP platform
• LDAP replaced by MySQL-based solution (with help from Percona)

LDAP isn’t suitable for the web

MMMDMM

• LVM snapshot method
  • Took 4 hours on average, with manual intervention
• Innobackupex + netcat + tar + script = quick cloning
  • Takes about 1 hour per 100GB
  • Foolproof
  • Can be run on active masters (if necessary)

Cloning
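
The actual cloning script is not shown in the slides. As a rough sketch of how the three tools named above are commonly chained, the helper below only builds the two shell pipelines as strings and estimates the duration from the "1 hour per 100GB" figure. Host name, port, paths and function names are hypothetical, and the exact netcat listen syntax differs between netcat variants.

```python
import shlex


def clone_commands(target_host: str, port: int, datadir: str) -> dict:
    """Build (not run) the sender and receiver pipelines for a streamed clone."""
    # Sender: stream a hot backup as tar over the network.
    sender = f"innobackupex --stream=tar /tmp | nc {shlex.quote(target_host)} {port}"
    # Receiver: start this first. tar needs -i to read innobackupex tar streams.
    # Some netcat variants want "nc -l -p PORT" instead of "nc -l PORT".
    receiver = f"nc -l {port} | tar -x -i -f - -C {shlex.quote(datadir)}"
    return {"receiver": receiver, "sender": sender}


def estimate_clone_hours(size_gb: float, hours_per_100gb: float = 1.0) -> float:
    """Estimate clone duration from the rule of thumb on the slide."""
    return size_gb / 100.0 * hours_per_100gb


commands = clone_commands("db-new.example", 9999, "/var/lib/mysql")
print(commands["receiver"])
print(commands["sender"])
print(estimate_clone_hours(250), "hours for 250GB")
```

After the copy, the backup still has to be prepared (innobackupex `--apply-log`) before MySQL is started on the new host.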

• Different monitoring systems give different insights
• Different angles/metrics/purposes
• Early problem detection
• Signal abnormal use which could cause outage

Improve monitoring
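
One simple way to "signal abnormal use" is to compare a new sample against recent history. The sketch below assumes a standard-deviation threshold; the slides do not say which monitoring systems or detection rules were used, so the function name, the threshold and the sample numbers are illustrative only.

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(history: Sequence[float], sample: float, threshold: float = 3.0) -> bool:
    """Flag a sample that deviates more than `threshold` standard deviations
    from the mean of the recent history."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return sample != mu
    return abs(sample - mu) / sigma > threshold


# Hypothetical queries-per-second samples from one database server.
recent_qps = [100, 102, 98, 101, 99]
print(is_abnormal(recent_qps, 101))  # within normal variation
print(is_abnormal(recent_qps, 150))  # far outside recent behaviour
```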

• Uneven growth:
  • Active master handling all write requests
  • Vertical scaling
• Write only
  • More writes than reads
• SOA problems
  • Connection spawning
  • Open file descriptors

Growing pains

• Anticipate more than one year in advance
• Acknowledge shortcomings/problems, look for solutions or alternatives
• Don't commit to one single solution!
• Be flexible!
• Plan for capacity per instance, not for growth alone!
• Start thinking globally!

Outgrowing our startup phase

Spil Storage Platform: sharding is inevitable

What is this exciting project about?

• Natural growth
• Grown out of necessity for more functionality
• Adding functionality means more interaction
• Separation of database function
  • Profiles
  • Highscores
  • Comments
  • User Generated Content
  • etc.

Functional sharding

Advantages
• KISS
• Problem isolation

Disadvantages
• Uneven growth
• Difference in query patterns
• No data consistency
• No clear ownership of data
• Capacity planning on total number of reads/writes
• Horizontal scaling is difficult
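
Functional sharding as described above amounts to a static map from function to database cluster. A minimal sketch, with hypothetical cluster names, makes the uneven-growth problem visible: every request for one function lands on the same cluster, no matter how many users there are.

```python
# Hypothetical cluster names; the functions are the ones listed on the slide.
FUNCTIONAL_SHARDS = {
    "profiles": "db-profiles",
    "highscores": "db-highscores",
    "comments": "db-comments",
    "user_generated_content": "db-ugc",
}


def cluster_for(function: str) -> str:
    """Return the cluster that holds all data for one function."""
    try:
        return FUNCTIONAL_SHARDS[function]
    except KeyError:
        raise ValueError(f"no functional shard for {function!r}") from None


# All highscore traffic goes to a single cluster, whoever the user is.
print(cluster_for("highscores"))
```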

Spil Storage Platform

• What is the bucket model?
  • It is an abstraction layer between the database and the data model
• Each record has one unique owner attribute (GID)
  • The GID (Global Identifier) identifies different data types
• Different buckets per function
  • Attributes contain record data
  • Attributes do not have to correspond to schema

Bucket model
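
The slides do not show the bucket model's real data structures. The sketch below is one possible reading: a record carries its owner GID, the bucket it belongs to, and free-form attributes that are not tied to a table schema. The class names and the choice to key records by (bucket, GID) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Record:
    gid: int                      # owner attribute (Global Identifier)
    bucket: str                   # one bucket per function, e.g. "profiles"
    attributes: Dict[str, Any] = field(default_factory=dict)  # schema-free data


class BucketStore:
    """In-memory stand-in for the abstraction layer in front of the database."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, int], Record] = {}

    def put(self, record: Record) -> None:
        # Assumption: one record per (bucket, GID) pair.
        self._records[(record.bucket, record.gid)] = record

    def get(self, bucket: str, gid: int) -> Optional[Record]:
        return self._records.get((bucket, gid))


store = BucketStore()
store.put(Record(gid=42, bucket="profiles", attributes={"nickname": "art"}))
# A new attribute needs no schema change in this model.
store.put(Record(gid=42, bucket="highscores", attributes={"game": "x", "score": 9000}))
print(store.get("profiles", 42))
```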

Advantages
• Flexibility
• Database backend independent
• Seamless schema changes and upgrades
• Sharded on both functional and GID level
• Even distribution of queries possible
• Capacity planning on number and type of entities
• Asynchronous writes possible
• Transparent data migration

Disadvantages
• Harder to find data
• At least two lookups needed!
• Data warehousing needs a different approach

• Globally sharded on GID
• (local) GID lookup

How do GIDs work?

[Diagram labels: GID lookup, Shard 1, Shard 2, Persistent storage]
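
The "at least two lookups" mentioned earlier can be sketched as follows: first a lookup table says which shard owns a GID, then the record is fetched from that shard. This is a simplified in-memory illustration with hypothetical names, not the SSP implementation.

```python
from typing import Any, Dict, Iterable, Tuple


class GidRouter:
    """Route reads and writes to the shard that owns a GID."""

    def __init__(self, shard_names: Iterable[str]) -> None:
        # Each shard is modelled as a plain dict standing in for a database.
        self._shards: Dict[str, Dict[Tuple[int, str], Any]] = {
            name: {} for name in shard_names
        }
        self._gid_to_shard: Dict[int, str] = {}  # the GID lookup table

    def assign(self, gid: int, shard: str) -> None:
        if shard not in self._shards:
            raise KeyError(f"unknown shard: {shard}")
        self._gid_to_shard[gid] = shard

    def write(self, gid: int, key: str, value: Any) -> None:
        shard = self._gid_to_shard[gid]          # lookup 1: which shard?
        self._shards[shard][(gid, key)] = value  # lookup 2: the shard itself

    def read(self, gid: int, key: str) -> Any:
        shard = self._gid_to_shard[gid]
        return self._shards[shard].get((gid, key))


router = GidRouter(["shard1", "shard2"])
router.assign(42, "shard2")
router.write(42, "nickname", "art")
print(router.read(42, "nickname"))
```

Because only the lookup table knows where a GID lives, moving a GID to another shard means copying its records and updating one entry.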

Pipeline flow

[Diagram labels: Current functional shards, SPAPI, LEGACY adapter, New Application SSP, Legacy API, New GID based shards, Read only, Read + write]
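
The diagram itself is lost, so the following is only one plausible reading of its labels: the legacy functional shards are read only, the new GID based shards take all writes, and data moves over when it is first read. That copy-on-read behaviour is an assumption, as are all names in the sketch.

```python
from typing import Any, Dict, Optional


class MigratingStore:
    """Read from the new store first, fall back to legacy, never write to legacy."""

    def __init__(self, legacy: Dict[str, Any], new: Dict[str, Any]) -> None:
        self._legacy = legacy  # treated as read only
        self._new = new        # read + write

    def read(self, key: str) -> Optional[Any]:
        if key in self._new:
            return self._new[key]
        value = self._legacy.get(key)
        if value is not None:
            self._new[key] = value  # assumed: migrate on first read
        return value

    def write(self, key: str, value: Any) -> None:
        self._new[key] = value      # all writes go to the new shards


legacy_shard = {"profile:42": {"nickname": "art"}}
new_shard: Dict[str, Any] = {}
store = MigratingStore(legacy_shard, new_shard)
store.read("profile:42")   # served from legacy, copied to the new shard
print("profile:42" in new_shard)
```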

Bucket mapping and migration

• Each cluster of two masters will contain two shards
• Data is written interleaved
• HA for both shards
• No warmup needed
• Both masters active and "warmed up"
• Slave added for backups and data warehouse

Master-Master Sharding

[Diagram labels: SSP, Shard 1, Shard 2]
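
A hedged sketch of how two masters could serve two shards with high availability for both: each master is assumed to be the preferred writer for one shard and the standby for the other, so both stay active and warm. The slides do not spell out this assignment or the failover logic; the class and host names are hypothetical.

```python
class MasterPair:
    """Two masters, two shards: each master prefers one shard, backs up the other."""

    def __init__(self, master_a: str, master_b: str) -> None:
        self._a, self._b = master_a, master_b
        self._preferred = {"shard1": master_a, "shard2": master_b}
        self._healthy = {master_a: True, master_b: True}

    def set_health(self, master: str, healthy: bool) -> None:
        self._healthy[master] = healthy

    def active_master(self, shard: str) -> str:
        preferred = self._preferred[shard]
        standby = self._b if preferred == self._a else self._a
        for candidate in (preferred, standby):
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy master in this cluster")


pair = MasterPair("master-a", "master-b")
print(pair.active_master("shard1"), pair.active_master("shard2"))
pair.set_health("master-a", False)
# shard1 fails over to master-b, which is already serving shard2 and is warm.
print(pair.active_master("shard1"))
```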

• Erlang cluster with many workers
• Every GID has its own worker process
• (Inter)cluster communication
• (Near) linear scalability

How are we implementing this SSP?
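
The SSP itself is written in Erlang and its code is not part of the slides. Purely as an analogy in Python, the idea of "every GID has its own worker process" can be pictured as a registry that creates one worker per GID on demand, each with its own mailbox and state, so operations for one GID are handled in order and never touch another GID's data. All names below are hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class GidWorker:
    """Stand-in for a lightweight per-GID process with a mailbox."""

    def __init__(self, gid: int) -> None:
        self.gid = gid
        self.mailbox: Deque[Tuple[str, Any]] = deque()
        self.state: Dict[str, Any] = {}

    def send(self, key: str, value: Any) -> None:
        self.mailbox.append((key, value))

    def drain(self) -> None:
        # Messages are applied strictly in arrival order.
        while self.mailbox:
            key, value = self.mailbox.popleft()
            self.state[key] = value


class WorkerRegistry:
    def __init__(self) -> None:
        self._workers: Dict[int, GidWorker] = {}

    def worker_for(self, gid: int) -> GidWorker:
        if gid not in self._workers:
            self._workers[gid] = GidWorker(gid)  # spawned on first use
        return self._workers[gid]


registry = WorkerRegistry()
registry.worker_for(42).send("score", 100)
registry.worker_for(42).send("score", 250)
registry.worker_for(42).drain()
print(registry.worker_for(42).state)
```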

Advantages
• Erlang node caching
• Multiple backend connectors
  • MySQL library
  • HandlerSocket
  • Any other connectors if needed
• Connection pooling

Disadvantages
• NOT SEXY? (http://spil.com/notsexy)
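
Connection pooling, listed under Advantages above, addresses the connection spawning problem named earlier in the talk: a fixed set of connections is opened once and reused. The sketch below is a generic Python illustration rather than the SSP's Erlang pool; the `factory` callable stands in for whatever opens a real backend connection.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class ConnectionPool:
    """Fixed-size pool: connections are created once and handed out on demand."""

    def __init__(self, factory: Callable[[], Any], size: int) -> None:
        self._idle: "queue.Queue[Any]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[Any]:
        # Blocks (up to `timeout` seconds) when all connections are in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection to the pool


opened = []


def fake_connect() -> dict:
    """Hypothetical factory; a real one would open a database connection."""
    conn = {"id": len(opened)}
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(10):
    with pool.connection() as conn:
        pass  # run a query here

print(len(opened), "connections opened for 10 requests")
```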

Do YOU want to be sexy?

Questions?

Thank you!