Failover and Global Server Load Balancing for Better Network Availability

Jeremy Hitchcock, CEO

Dynamic Network Services

Overview

• Problem space: keeping services up
• About failover and GSLB
• Case study: Roll your own CDN in...quick
• Case study: Speed and stability
• Case study: DR you can sleep on
• General lessons for network availability

You are probably…

• A software service provider
• Completely online
• Uptime and revenue directly related
• Audience is international (non-geographical)

So is everyone (a lot more of us)!

Mean Time Between Failures (MTBF) (Local)
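
MTBF only tells half the story. A common back-of-the-envelope model (a standard reliability formula, not one given on the slide) combines MTBF with mean time to repair (MTTR) into availability, and shows why a second, independent site helps so much. Independence is the catch: fiber cuts and other network-wide failures take out "redundant" components together. The function names and the 1,000-hour/10-hour figures below are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(single: float, copies: int) -> float:
    """Availability when any one of `copies` redundant components is enough.

    Assumes failures are independent; a shared fiber path, power feed, or
    DNS server breaks that assumption.
    """
    return 1 - (1 - single) ** copies


# Hypothetical figures: fails every 1,000 hours, takes 10 hours to fix.
one_site = availability(mtbf_hours=1000, mttr_hours=10)
print(f"one site:  {one_site:.4%}")
print(f"two sites: {parallel_availability(one_site, 2):.4%}")
```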

Fiber Cuts (Network/Global)

Failures Are a Way of Life

• Affects the bottom line
• Gets people paged
• Brands lose value

A Better Way?

• Current tools: in-house scripts, appliances, CDN networks
• Either high opex or high capex
• New options in infrastructure
• Example:
  – 5-10 person [bootstrapped] companies rolling their own self-healing, auto-provisioning networks

Optimizing the Wrong Part

• Hardware redundancy is expensive
• Single points of failure are bad
• Infrastructure is not a core function
• Things break, so automate everything
• Easier (cheaper) than you think

Realizations

• Things break; route around outages
• Infrastructure providers aplenty today
• Users more sensitive to outages
• Internet users are around the world
  – Speed of light is still c
  – An RTT of 100 ms with 50 objects adds up

Traffic management is critical
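
The "100 ms RTT with 50 objects" point is simple arithmetic. A rough sketch, assuming each object costs one full round trip and objects are fetched in waves of parallel connections (six is an assumed figure), while ignoring DNS lookups, TCP/TLS handshakes, and transfer time; the function name is hypothetical:

```python
import math


def page_load_rtt_cost(rtt_ms: float, objects: int, parallel_connections: int = 1) -> float:
    """Rough lower bound on time spent waiting on round trips for one page.

    Assumes one round trip per object, fetched in waves of `parallel_connections`.
    Ignores DNS, connection setup, and bandwidth.
    """
    waves = math.ceil(objects / parallel_connections)
    return waves * rtt_ms


print(page_load_rtt_cost(100, 50))     # 5000: one object at a time
print(page_load_rtt_cost(100, 50, 6))  # 900: six parallel connections
print(page_load_rtt_cost(20, 50, 6))   # 180: same page from a POP 20 ms away
```

Lowering RTT by answering users from a nearby POP shrinks every wave, which is why where traffic lands matters as much as whether the servers are up.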

Different Architectures, Different Results

Old → New
• Use hardware redundancy, local → Use software redundancy
• Super-site build-out → Regionalize, all over-provisioned
• Page on failure, fix based on page → Email report in morning
• Planned deployments → Automatic load handling
• Single master datacenter → Many POPs, all closer to users
• DR is a passive, manual failover → DR and failover blended together

New Tools (New to Some)

• Automatic failover
• Global server load balancing
• CDN balancing/managing
• Opex relative to actual usage
• Avoid capex step functions

Failover

• Two active components, traffic switch
• Implies external monitoring
• Hides outages from users

[Diagram: standard operation vs. on failover]

Failover Use Cases

• Two servers for www.domain.com
  – On failure, redirect from one to the other
  – Works via DNS
  – Alternatively, redirect to a static page
• Requirements
  – External monitoring point
  – External DNS
  – Low DNS caching (TTL) values
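
The failover use case reduces to a small decision the external DNS service makes each time it answers. A minimal sketch, assuming the external monitor has already marked each server healthy or not; `Target`, `failover_answer`, the 60-second TTL, and the addresses (RFC 5737 documentation ranges) are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Target:
    address: str   # IP address to hand out in the A record (hypothetical)
    healthy: bool  # latest verdict from the external monitoring point


def failover_answer(targets: list[Target], static_page_ip: str, ttl: int = 60) -> tuple[str, int]:
    """Return (address, ttl) for the A record of www.domain.com.

    Targets are tried in priority order; if every server is down, answer with
    the host serving a static "be right back" page instead of a dead address.
    A short TTL limits how long resolvers cache whichever answer they get.
    """
    for target in targets:
        if target.healthy:
            return target.address, ttl
    return static_page_ip, ttl


servers = [Target("192.0.2.10", healthy=False), Target("198.51.100.20", healthy=True)]
print(failover_answer(servers, static_page_ip="203.0.113.30"))  # ('198.51.100.20', 60)
```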

Global Server Load Balancing (GSLB)

• More than two active components
• Traffic management
  – Targeting (geo, network)
  – Weighting (percent)
• Failover plus optimized RTT
• Hostname to A record mapping

Global Server Load Balancing Use Cases

• Regionalize eyeballs/end users
• Avoid Internet outages and subpar speeds
• Weight based on load, percentages
• Requirements:
  – Same as failover
  – A bit of math/algorithms to balance traffic
  – Many-to-many mappings
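
The "bit of math" can be small. A minimal sketch that applies the three GSLB ideas in order: drop unhealthy POPs (failover), prefer the client's region (targeting), then split by weight (weighting). It assumes a hypothetical `Pop` record whose region label comes from a geo or network lookup of the resolver; the names, weights, and addresses (RFC 5737 documentation ranges) are illustrative:

```python
import random
from dataclasses import dataclass


@dataclass
class Pop:
    address: str
    region: str          # e.g. "us", "eu"; assumed to come from a geo/network lookup
    weight: int          # relative share of traffic within the candidate set
    healthy: bool = True


def gslb_answer(pops: list[Pop], client_region: str, rng: random.Random) -> str:
    """Pick one A record for a client: failover, then targeting, then weighting."""
    # Failover: drop POPs the monitor reports as down (or that are weighted out).
    live = [p for p in pops if p.healthy and p.weight > 0]
    if not live:
        raise RuntimeError("no healthy POPs to answer with")
    # Targeting: keep the client in its own region when possible.
    candidates = [p for p in live if p.region == client_region] or live
    # Weighting: split traffic in proportion to each POP's weight.
    return rng.choices(candidates, weights=[p.weight for p in candidates], k=1)[0].address


pops = [
    Pop("192.0.2.10", "us", weight=70),
    Pop("192.0.2.11", "us", weight=30),
    Pop("198.51.100.10", "eu", weight=100),
]
print(gslb_answer(pops, "eu", random.Random(7)))  # 198.51.100.10
```

Passing in a seeded `random.Random` keeps the sketch testable. Because resolvers cache answers for the TTL, the split users actually see follows the weights only approximately.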

CDN Management

• Two complete systems
• Balance between CDNs
  – Bandwidth commits
  – Regional advantages
• Works on CNAMEs

CDN Manager

• Try out a mix of networks
  – CDNs, infrastructure providers
• Better manage traffic
  – Cost/performance reasons
• Requirements
  – Same as GSLB, but with DNS alias records (CNAMEs)

Traffic Cop: DNS

• The Internet doesn't care about domain.com; it routes to IP addresses
• twitter.com → 128.121.146.228
• Lots of tricks you can do here

Lenses and Options

• Evaluation criteria
  – Soft/hard costs, capital/operating costs
• Outcome-based
  – Determine your metrics, test those
• Potential outcomes
  – Roll it in house
  – CDN network
  – Hardware appliances
  – SaaS-based

Which One Is Better?

• Roll it in house
  – Mid-high capex, higher-than-you-think opex
  – Lots of soft costs, though application-specific
• CDN network
  – Little capex, high opex
  – Some have more knobs than others
• Hardware appliances
  – High capex, low opex
  – Need to make a full investment in the architecture
• SaaS-based
  – Little capex, low-mid opex
  – Let others worry about this for you

Case Study 1: Roll Your Own CDN in...Quick

Wikia and regionalizing CDNs for better delivery

CDN Choice and Transparency

• Lots of CDNs
  – Two great public ones
  – 30 (more?) private providers
  – Telco/ISP options
• Currently give the customer a hostname
  – (customer.cdn.com)
• Only test with live traffic

CDN Manager: Enabling Testing

• Segment traffic and test
• Try 2 or 10 CDNs
• Low-risk method to collect data
• Data collection has to be from end points
  – Your office computer is not the Internet
• Can better rate cost/performance
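
Segmenting traffic only yields clean data if each client stays with the same CDN. A minimal sketch of one way to do that, assuming the DNS layer hashes a client key (for example the resolver's network) into 100 buckets and hands out CNAME targets by percentage; `segment` and the `*.example` hostnames are hypothetical, and real CDN managers may segment differently:

```python
import hashlib


def segment(client_key: str, splits: list[tuple[str, int]]) -> str:
    """Deterministically map a client to one CDN hostname for a test.

    `splits` holds (cname_target, percent) pairs that must sum to 100. Hashing
    the client key keeps each client in the same bucket on every query, so
    each CDN's measurements stay separate.
    """
    if sum(percent for _, percent in splits) != 100:
        raise ValueError("percentages must sum to 100")
    digest = hashlib.sha256(client_key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    upper = 0
    for target, percent in splits:
        upper += percent
        if bucket < upper:
            return target
    return splits[-1][0]  # not reached when percentages sum to 100


trial = [("customer.cdn-a.example", 90), ("customer.cdn-b.example", 10)]
print(segment("203.0.113.0/24", trial))
```

Sending a small slice (10% here) to the new network keeps the risk low while still collecting data from real end points.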

CDN Manager: Wikia

• Wikia runs several niche wikis (audiences)
• Optimize traffic delivery for those niches
• Wanted to determine the best CDN based on actual data

CDN Manager: Wikia

• In America, use a CDN
• In Europe, use their own
• Why? Who knows, but it's the best for their traffic
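
The Wikia split came from measured data rather than intuition. One hypothetical way to turn end-point measurements into a per-region choice is sketched below. The median is an assumed scoring rule (a high percentile or a cost-adjusted score may suit better), the function name is illustrative, and this is not a description of how Wikia actually made the call.

```python
from collections import defaultdict
from statistics import median


def best_cdn_per_region(samples):
    """Pick, per region, the delivery network with the lowest median latency.

    `samples` is an iterable of (region, network, latency_ms) tuples gathered
    from real end users, not from an office machine.
    """
    latencies = defaultdict(list)
    for region, network, latency_ms in samples:
        latencies[(region, network)].append(latency_ms)

    best = {}  # region -> (network, median latency)
    for (region, network), values in latencies.items():
        score = median(values)
        if region not in best or score < best[region][1]:
            best[region] = (network, score)
    return {region: network for region, (network, _) in best.items()}
```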

Discussion

• Not all CDNs are the same
• Multiple relationships to manage
• Cost control/performance of CDNs
• Audience and economics drive decisions

Case Study 2: Speed and Stability

Twitter and keeping up

Speed and Stability

• All Internet sites have DNS
  – Ranging from good to bad to ugly
• Online services must be fast and accurate
  – Latency and uptime are what matter
• Things fail all the time; send users to what works

Speed and Stability: Twitter

• Spiky and growing traffic (like, a lot)
• Things change too fast to keep up
• Load balance a lot
• Easier to scale core competencies
• One less thing to worry about

Speed and Stability: Twitter

• DNS is part of the system that makes the site work
• Desire not to be an expert in it
• Huge, widespread audience
• Online-only service

Discussion

• When infrastructure changes rapidly, external monitoring is good
• A failover message is better than timeouts
• Keep traffic regionalized through targeting
• Outsource non-core competencies
• Latency affects page views or ad revenue

Case Study 3: Disaster Recovery You Can Sleep With

37 Signals and doing what needs to get done

Disaster Recovery Implementation

Requirements
  – One good facility (A)
  – One backup facility (B)
  – Ability to recognize facility A is out
  – Ability to direct traffic from A to B
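
The last two requirements (recognize that A is out, direct traffic from A to B) can be sketched as a small controller fed by external health checks. This is a hypothetical illustration: the class name and the thresholds of 3 failures and 5 recoveries are assumptions, chosen so that a single dropped probe does not flap traffic between facilities.

```python
class DrSwitch:
    """Hypothetical controller deciding whether traffic goes to facility A or B.

    Fail over to B only after `fail_after` consecutive failed checks of A, and
    fail back only after `recover_after` consecutive successful checks.
    """

    def __init__(self, fail_after: int = 3, recover_after: int = 5):
        self.fail_after = fail_after
        self.recover_after = recover_after
        self.active = "A"
        self._failures = 0   # consecutive failed checks of A
        self._successes = 0  # consecutive successful checks of A

    def record_check(self, a_is_up: bool) -> str:
        """Feed one external health-check result for A; return the active facility."""
        if a_is_up:
            self._failures = 0
            self._successes += 1
            if self.active == "B" and self._successes >= self.recover_after:
                self.active = "A"
        else:
            self._successes = 0
            self._failures += 1
            if self.active == "A" and self._failures >= self.fail_after:
                self.active = "B"
        return self.active
```

The returned value would drive the DNS answer, so the checks and the DNS service both have to live outside facility A.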

Authorize.net Interlude

• DR implementation timeline
  – Late July: move to new DR facility and plan
  – July 2: fire at Fisher Plaza (unplanned)
  – July 3: …
• Only missing a traffic engineering switch
• TTLs (DNS record caching) make a big difference
  – Still a problem today:
  – secure.authorize.net.   86400   IN   A   64.94.118.32
• Full discussion: http://bit.ly/23mayf
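
To see why the 86400-second TTL on secure.authorize.net matters, a rough upper bound on how long users can keep reaching a dead facility is detection time plus TTL. The sketch assumes a monitor that trips after a hypothetical number of consecutive failed checks and resolvers that honor the published TTL (some clamp or ignore it); the function name and the 60-second figures are illustrative.

```python
def worst_case_redirect_seconds(check_interval_s: int, failures_to_trip: int, ttl_s: int) -> int:
    """Rough upper bound on how long users may still be sent to a dead address.

    Detection: up to `failures_to_trip` consecutive checks, `check_interval_s` apart.
    Caching: a resolver that cached the old answer just before the switch can keep
    serving it for up to `ttl_s` seconds. Hypothetical helper; ignores the time the
    DNS provider itself needs to publish the change.
    """
    return check_interval_s * failures_to_trip + ttl_s


# TTL as published for secure.authorize.net on the slide vs. a low failover TTL.
print(worst_case_redirect_seconds(60, 3, 86400) / 3600)  # 24.05 hours
print(worst_case_redirect_seconds(60, 3, 60))            # 240 seconds
```

With a one-day TTL, even a perfect traffic switch leaves some users stranded for about a day; with a one-minute TTL, the same switch finishes within a few minutes.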

DR: 37 Signals

• Cloud-based SaaS tools have to be up
• External DNS is important for controlling traffic
• What if facility A is down and DNS is only at A?
• External DNS makes failover/DR possible

Discussion

• Ensuring full replication is usually easy
• Traffic management is usually the problem
• People confuse cold assets, warm spares, and hot active systems
• People wait until they have an outage to implement DR

Overall Notes

• Networked services need to be rock solid
• Failover, GSLB, and CDN management (CDNM) are within reach
• Wikia, Twitter, and 37 Signals use external traffic management for their applications
• Audience matters, and so do testing and benchmarking

• DynTini: twitter.com/dyntini

Copy of the presentation?

Leave a business card in the back (or talk to me afterwards) and I'll send it to you.

Contact Us

Dynamic Network Services, Inc.
1230 Elm St., Fifth Floor
Manchester, NH 03101
+1 888.840.3258
dyn.com

Join us for drinks: dyntini.com
Follow us on Twitter: @DynInc

Uptime Is the Bottom Line.

