
Failover and Global Server Load Balancing for Better Network Availability

Date post: 18-Nov-2014
Upload: dyn
Description:
Speaker Jeremy Hitchcock of Dynamic Network Services presents how to obtain better uptime and availability through network techniques like failover, global server load balancing, and CDN balancing. Presented at Interop NYC 09.
Transcript
Page 1: Failover and Global Server Load Balancing for Better Network Availability

Failover and Global Server Load Balancing for Better Network Availability

Jeremy Hitchcock, CEO
Dynamic Network Services

Page 2: Failover and Global Server Load Balancing for Better Network Availability

Overview
• Problem space: Keeping services up
• About Failover and GSLB
• Case Study: Roll your own CDN in...quick
• Case Study: Speed and Stability
• Case Study: DR You Can Sleep On
• General lessons for network availability

Page 3: Failover and Global Server Load Balancing for Better Network Availability
Page 4: Failover and Global Server Load Balancing for Better Network Availability

You are probably…
• Software service provider
• Completely online
• Uptime and revenue directly related
• Audience is international (non-geographical)

So is everyone (lot more of us)!

Page 5: Failover and Global Server Load Balancing for Better Network Availability

Mean Time Between Failures (MTBF) (Local)

Page 6: Failover and Global Server Load Balancing for Better Network Availability

Fiber Cuts (Network/global)

Page 7: Failover and Global Server Load Balancing for Better Network Availability

Failures Are a Way of Life
• Affects bottom line
• Gets people paged
• Brands lose value

Page 8: Failover and Global Server Load Balancing for Better Network Availability

A Better Way?
• Current tools: in-house scripts, appliances, CDN networks
• Either high opex or capex
• New options in infrastructure
• Example:
  – 5-10 person [boot-strapped] companies rolling self-healing, auto-provisioning networks

Page 9: Failover and Global Server Load Balancing for Better Network Availability

Optimizing The Wrong Part
• Hardware redundancy is expensive
• Single points of failure are bad
• Infrastructure is not a core function
• Things break, so automate everything
• Easier (cheaper) than you think

Page 10: Failover and Global Server Load Balancing for Better Network Availability

Realizations
• Things break, route around outages
• Infrastructure providers aplenty today
• Users more sensitive to outages
• Internet users are around the world
  – Speed of light is still c
  – RTT of 100 ms with 50 objects adds up

Traffic management is critical
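
As a rough back-of-the-envelope illustration of how round trips add up, the sketch below reads the slide's figure as a 100 ms round-trip time and assumes each object costs one round trip. The function name and the parallel-connection count are hypothetical and not from the talk; real page loads also pay for DNS lookups, connection setup, and transfer time.

```python
import math

def fetch_time_ms(rtt_ms: float, objects: int, parallel_connections: int = 1) -> float:
    """Lower-bound estimate: each object costs one round trip, and
    objects are fetched in batches of `parallel_connections`."""
    batches = math.ceil(objects / parallel_connections)
    return batches * rtt_ms

# Figures from the slide: 100 ms RTT, 50 objects.
print(fetch_time_ms(100, 50))     # fully serial: 5000 ms
print(fetch_time_ms(100, 50, 6))  # 6 parallel connections (assumed): 900 ms
```

Cutting the RTT by serving users from a closer location shrinks every one of those round trips at once, which is the case for regional traffic management.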

Page 11: Failover and Global Server Load Balancing for Better Network Availability

Different Architectures, Different Results

Old                                | New
Use hardware redundancy, local     | Use software redundancy
Super-site build out               | Regionalize, all over-provisioned
Page on failure, fix based on page | Email report in morning
Planned deployments                | Automatic load handling
Single master datacenter           | Many POPs, all closer to users
DR is a passive, manual failover   | DR and failover blended together

Page 12: Failover and Global Server Load Balancing for Better Network Availability

New Tools (new to some)
• Automatic failover
• Global server load balancing
• CDN balancing/managing
• Opex relative to actual usage
• Avoid capex step functions

Page 13: Failover and Global Server Load Balancing for Better Network Availability

Failover
• Two active components, traffic switch
• Implies external monitoring
• Hide outages

[Diagrams: Standard operation; On failover]

Page 14: Failover and Global Server Load Balancing for Better Network Availability

Failover Use Cases
• Two servers for www.domain.com
  – On failure, redirect from one to the other
  – Works via DNS
  – Redirect to a static page
• Requirements
  – External monitoring point
  – External DNS
  – Low DNS caching TTL values
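
The decision an external monitor has to make can be sketched in a few lines. This is only an illustration under stated assumptions: the `Endpoint` type, the `pick_answer` function, and the addresses (taken from the RFC 5737 documentation range) are hypothetical, and the health check is simulated rather than a real probe.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class Endpoint:
    name: str
    address: str

def pick_answer(endpoints: Sequence[Endpoint],
                is_healthy: Callable[[Endpoint], bool],
                static_page: Endpoint) -> Endpoint:
    """Return the first healthy endpoint in priority order,
    falling back to a static page when every server is down."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return static_page

# Hypothetical addresses from the documentation range (RFC 5737).
primary = Endpoint("primary", "192.0.2.10")
secondary = Endpoint("secondary", "192.0.2.20")
sorry_page = Endpoint("static", "192.0.2.99")

# Simulated monitoring result: the primary is failing its checks.
down = {"primary"}
answer = pick_answer([primary, secondary], lambda e: e.name not in down, sorry_page)
print(answer.address)  # 192.0.2.20
```

The chosen address would then be published as the A record for the hostname. A low TTL matters because resolvers keep serving the old answer until their cached copy expires.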

Page 15: Failover and Global Server Load Balancing for Better Network Availability

Global Server Load Balancing (GSLB)
• More than two active components
• Traffic management
  – Targeting (geo, network)
  – Weighting (percent)
• Failover plus optimize RTT
• Hostname to A record mapping

Page 16: Failover and Global Server Load Balancing for Better Network Availability

Global Server Load Balancing Use Cases
• Regionalize eyeballs/end-users
• Internet outages/subpar speeds avoided
• Weight based on load, percentages
• Requirements:
  – Same as failover
  – Bit of math/algorithms to balance traffic
  – Many to many mappings
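
The "bit of math" can be quite small. The sketch below combines regional targeting, percentage weighting, and failover in one selection function. The pool layout, region names, weights, and addresses are all made up for illustration, and a production service would derive the client's region from the resolver's network rather than take it as an argument.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical pools: region -> list of (A record address, weight in percent).
POOLS: Dict[str, List[Tuple[str, int]]] = {
    "us": [("192.0.2.10", 70), ("192.0.2.11", 30)],
    "eu": [("198.51.100.10", 100)],
}
DEFAULT_REGION = "us"

def choose_address(region: str, healthy: set,
                   rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick an address for the client's region by weight, skipping
    unhealthy servers and falling back to other regions if needed."""
    rng = rng or random.Random()
    # Try the client's region first, then the default, then everything else.
    order = [region, DEFAULT_REGION] + sorted(POOLS)
    for name in order:
        candidates = [(addr, w) for addr, w in POOLS.get(name, []) if addr in healthy]
        if candidates:
            addresses, weights = zip(*candidates)
            return rng.choices(addresses, weights=weights, k=1)[0]
    return None  # nothing is healthy anywhere

all_up = {"192.0.2.10", "192.0.2.11", "198.51.100.10"}
print(choose_address("eu", all_up))                      # 198.51.100.10
print(choose_address("eu", all_up - {"198.51.100.10"}))  # a US address, since the EU pool is down
```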

Page 17: Failover and Global Server Load Balancing for Better Network Availability

CDN Management
• Two complete systems
• Balance between CDNs
  – Bandwidth commits
  – Regional advantages
• Works on CNAMEs

Page 18: Failover and Global Server Load Balancing for Better Network Availability

CDN Manager
• Try out a mix of networks
  – CDNs, infrastructure providers
• Better manage traffic
  – Cost/performance reasons
• Requirements
  – Same as GSLB but with DNS alias CNAMEs
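
One possible way to split traffic between CDNs at the DNS layer is to hash the querying resolver into a percentage bucket and answer with a different CNAME target per bucket. This is a sketch of that idea, not a description of any particular product: the CDN hostnames, the 80/20 split, and the `cname_for` function are hypothetical.

```python
import hashlib
from typing import List, Tuple

# Hypothetical CDN hostnames and traffic shares (percent, summing to 100).
SPLIT: List[Tuple[str, int]] = [
    ("customer.cdn-a.example.net.", 80),
    ("customer.cdn-b.example.net.", 20),
]

def cname_for(resolver_ip: str, split: List[Tuple[str, int]] = SPLIT) -> str:
    """Map a resolver to a CNAME target so that each CDN receives roughly
    its configured share and a given resolver gets a stable answer."""
    digest = hashlib.sha256(resolver_ip.encode("ascii")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    threshold = 0
    for target, percent in split:
        threshold += percent
        if bucket < threshold:
            return target
    return split[-1][0]  # only reached if the shares sum to less than 100

# Zone-file style answer for one (made-up) resolver address.
print(f"www.example.com. 60 IN CNAME {cname_for('203.0.113.7')}")
```

Hashing rather than picking at random keeps each resolver on one CDN, which makes per-CDN measurements easier to compare. Changing the percentages moves traffic without touching the origin servers.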

Page 19: Failover and Global Server Load Balancing for Better Network Availability

Traffic Cop: DNS
• Internet doesn't care about domain.com
• twitter.com 128.121.146.228
• Lot of tricks you can do here

Page 20: Failover and Global Server Load Balancing for Better Network Availability

Lenses and Options
• Evaluation Criteria
  – Soft/hard costs, capital/operating costs
• Outcome based
  – Determine your metrics, test those
• Potential Outcomes
  – Roll it in house
  – CDN Network
  – Hardware appliances
  – SaaS-based

Page 21: Failover and Global Server Load Balancing for Better Network Availability

Which one is better?
• Roll it in house
  – Mid-high capex, higher opex than you think
  – Lots of soft costs, application specific though
• CDN Network
  – Little capex, high opex
  – Some have more knobs than others
• Hardware appliances
  – High capex, low opex
  – Need to make full investment into architecture
• SaaS-based
  – Little capex, low-mid opex
  – Let others worry about this for you

Page 22: Failover and Global Server Load Balancing for Better Network Availability

Case Study 1: Roll your own CDN in...quick

Wikia and regionalizing CDNs for better delivery

Page 23: Failover and Global Server Load Balancing for Better Network Availability

CDN Choice and Transparency
• Lots of CDNs
  – Two great public ones
  – 30 (more?) private providers
  – Telco/ISP options
• Currently give customer hostname
  – (customer.cdn.com)
• Only test with live traffic

Page 24: Failover and Global Server Load Balancing for Better Network Availability

CDN Manager: Enabling Testing
• Segment traffic and test
• Try 2 or 10 CDNs
• Low risk method to collect data
• Data collection has to be from end points
  – Your office computer is not the Internet
• Can better rate cost/performance
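
Once measurements come back from real end points, rating the candidates can be as simple as comparing a robust latency statistic against price. The sketch below is one hypothetical way to do it; the CDN names, latency samples, and per-GB prices are invented for illustration.

```python
from statistics import median
from typing import Dict, List

def rate_cdns(latency_ms: Dict[str, List[float]],
              cost_per_gb: Dict[str, float]) -> List[dict]:
    """Summarize each CDN by median end-user latency and price,
    sorted fastest first (ties broken by lower cost)."""
    rows = [
        {"cdn": name, "median_ms": median(samples), "cost_per_gb": cost_per_gb[name]}
        for name, samples in latency_ms.items()
    ]
    return sorted(rows, key=lambda r: (r["median_ms"], r["cost_per_gb"]))

# Made-up measurements and prices, purely for illustration.
samples = {"cdn-a": [80, 95, 120, 90], "cdn-b": [60, 70, 300, 65]}
prices = {"cdn-a": 0.08, "cdn-b": 0.12}
for row in rate_cdns(samples, prices):
    print(row)
```

The median is used here so that a single slow sample (the 300 ms outlier) does not dominate the comparison. Whether the faster but pricier network wins is a business decision the numbers only inform.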

Page 25: Failover and Global Server Load Balancing for Better Network Availability

CDN Manager: Wikia
• Wikia runs several niche wikis (audience)
• Optimize traffic delivery for those niches
• Wanted to determine the best CDN based on actual data

Page 26: Failover and Global Server Load Balancing for Better Network Availability

CDN Manager: Wikia
• In America, use CDN
• In Europe, use their own
• Why? Who knows, but it's the best for their traffic

Page 27: Failover and Global Server Load Balancing for Better Network Availability

Discussion
• Not all CDNs are the same
• Multiple relationships to manage
• Cost control/performance of CDNs
• Audience and economies drive decisions

Page 28: Failover and Global Server Load Balancing for Better Network Availability

Case Study 2: Speed and Stability

Twitter and keeping up

Page 29: Failover and Global Server Load Balancing for Better Network Availability

Speed and Stability
• All Internet sites have DNS
  – Ranges from good to bad to ugly
• Online services must be fast and accurate
  – Latency and uptime are what matter
• Things fail all the time; send users to what works

Page 30: Failover and Global Server Load Balancing for Better Network Availability

Speed and Stability: Twitter
• Spiky and growing traffic (like a lot)
• Things change too fast to keep up
• Load balance a lot
• Easier to scale core competencies
• One less thing to worry about

Page 31: Failover and Global Server Load Balancing for Better Network Availability

Speed and Stability: Twitter
• DNS is part of the system that makes the site work
• Desire not to be an expert in it
• Huge, widespread audience
• Online-only service

Page 32: Failover and Global Server Load Balancing for Better Network Availability

Discussion
• When infrastructure changes rapidly, external monitoring is good
• Failover message is better than timeouts
• Keep traffic regionalized through targeting
• Outsource non-core competencies
• Latency affects page views or ad revenue

Page 33: Failover and Global Server Load Balancing for Better Network Availability

Case Study 3: Disaster Recovery You Can Sleep With

37 Signals and doing what needs to get done

Page 34: Failover and Global Server Load Balancing for Better Network Availability

Disaster Recovery Implementation

Requirements
  – One good facility (A)
  – One backup facility (B)
  – Ability to recognize facility A is out
  – Ability to direct traffic from A to B
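
Recognizing that facility A is out usually means more than reacting to a single failed check, or traffic will flap between sites. A minimal sketch, assuming a hypothetical `FacilityMonitor` class fed by an external check and a threshold of three consecutive results (both are illustrative choices, not from the talk):

```python
class FacilityMonitor:
    """Declare facility A down after `threshold` consecutive failed checks,
    and up again after the same number of consecutive successes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.a_is_up = True
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, check_passed: bool) -> str:
        """Feed in one external check result and return the facility to use."""
        if check_passed == self.a_is_up:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.threshold:
                self.a_is_up = check_passed
                self._streak = 0
        return "A" if self.a_is_up else "B"

monitor = FacilityMonitor(threshold=3)
results = [True, False, True, False, False, False, True]
print([monitor.record(r) for r in results])
# ['A', 'A', 'A', 'A', 'A', 'B', 'B']
```

A single blip leaves traffic on A. Three failures in a row move it to B, and it stays there until A has passed three checks in a row.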

Page 35: Failover and Global Server Load Balancing for Better Network Availability

Authorize.net Interlude
• DR implementation timeline
  – Late-July: move to new DR facility and plan
  – July 2: fire at Fisher Plaza (unplanned)
  – July 3: …
• Only missing a traffic engineering switch
• TTLs (DNS record caching) make a big difference
  – Still a problem today
  – secure.authorize.net.  86400  IN  A  64.94.118.32
• Full discussion: http://bit.ly/23mayf
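
The second field of that record is the TTL in seconds, so its effect on failover time can be worked out directly. The sketch below parses the record quoted on the slide; the `parse_a_record` helper and the 60-second comparison value are assumptions added for illustration.

```python
from datetime import timedelta

def parse_a_record(line: str) -> dict:
    """Parse a zone-file style line: name, TTL, class, type, address."""
    name, ttl, rr_class, rr_type, address = line.split()
    return {"name": name, "ttl": int(ttl), "class": rr_class,
            "type": rr_type, "address": address}

record = parse_a_record("secure.authorize.net.  86400  IN  A  64.94.118.32")
worst_case = timedelta(seconds=record["ttl"])
print(f"{record['name']} may be cached for up to {worst_case}")  # 1 day, 0:00:00
print(f"with a 60 s TTL (assumed): {timedelta(seconds=60)}")     # 0:01:00
```

With a one-day TTL, a resolver that fetched the record just before the outage can keep sending users to the dead address for up to 24 hours, no matter how quickly the record is changed at the source.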

Page 36: Failover and Global Server Load Balancing for Better Network Availability

DR: 37 Signals
• Cloud based SaaS tools, have to be up
• External DNS important for controlling traffic
• What if facility A is down and DNS is only at A?
• External DNS means failover/DR possible

Page 37: Failover and Global Server Load Balancing for Better Network Availability

Discussion
• Ensuring full replication is usually easy
• Traffic management is usually the problem
• Easy to confuse cold assets/warm spare/hot active
• People wait until they have an outage to implement DR

Page 38: Failover and Global Server Load Balancing for Better Network Availability

Overall Notes
• Networked services need to be rock solid
• Failover, GSLB, and CDNM are within reach
• Wikia, Twitter, and 37 Signals use external traffic management for their applications
• Audience matters, so does testing and benchmarking

Page 39: Failover and Global Server Load Balancing for Better Network Availability

• DynTini

twitter.com/dyntini

Page 40: Failover and Global Server Load Balancing for Better Network Availability

Copy of presentation?

Leave a business card in back (or talk to me afterwards) and I'll send it to you

Page 41: Failover and Global Server Load Balancing for Better Network Availability

Contact Us

Dynamic Network Services, Inc.
1230 Elm St., Fifth Floor
Manchester, NH 03101

+1 888.840.3258
[email protected]
dyn.com

Join us for drinks: dyntini.com
Follow us on Twitter: @DynInc

Uptime Is the Bottom Line.