Page 1

Spyre: A Resource Management Framework for Container-based Clouds

Karthick Rajamani, Alexandre Ferreira, Juan Rubio
Optimized Cloud Infrastructure, IBM Research

Wes Felter
IBM Cloud Innovation Lab

{karthick,apferrei,rubioj,wmf}@us.ibm.com

Page 2

Overview

• What is Spyre?
• Resource management with Spyre
• Performance evaluation
• Status and next steps
• Extending Tenant SLA models: discussion

Page 3

Containers offer better performance than VMs

[Figure: Sysbench with MySQL. Latency (ms, 0 to 55) versus throughput in transactions/s (2000 to 14000) for Native, Docker net=host volume, Docker NAT volume, Docker NAT AUFS, and KVM qcow.]

Source: An Updated Performance Comparison of Virtual Machines and Linux Containers, by Wes Felter, Alexandre Ferreira, Ram Rajamony, Juan Rubio

Page 4

What is Spyre?
Optimized foundation for the container-based cloud
- Containers are the fundamental unit of computation (not container in VM)
- Superior resource isolation and performance (tail latency) for tenant/performance-sensitive services: resource-isolated slices.
- Support resource-sharing among containers used as side-cars (running within same slice).
- Avoid multi-tenant dockerd issue: each client (slice) can have its own dockerd.
- Can be used with any container eco-system; we have experimented to date with Docker.


Page 5

Spyre Goals
• Predictable performance (including tail)
  - Strong isolation (e.g., dedicated physical cores) with slices
  - Allocate resources using real units (say GHz, not abstract compute units)
  - Unique use/configuration of cgroups
• Vertical scaling
  - Grow containers while running (e.g., add cores/RAM)
  - Subletting: spot market (like a CloudBnB)
• High performance
  - Base unit is containers
  - Optimize storage & network I/O, e.g., eliminate NAT and replace AUFS with block storage

Page 6

Resource management with Spyre

• Key concept: Slices
• Dedicated resources for predictable/guaranteed performance
  - Dedicated physical cores
  - Dedicated RAM

Page 7

Challenges to sharing core resources

[Figure: two bar charts, Normalized Throughput (8T/1T) and Normalized Latency (8T/1T), each on a 0 to 6 scale, for Integer Loop, Float. Loop, L1 random reads, L2 random reads, and L3 random reads.]

• Shared cores result in variable impact on performance
• Significant, difficult to predict impact for tenant workload
• Difficult to predict returns for provider
• Data taken on POWER8 processor, which has dedicated L1, L2, L3 cache per core

Page 8

Resource management with Spyre

• Key concept: Slices
• Dedicated resources for predictable performance
  - Dedicated physical cores
  - Dedicated RAM
  - Guaranteed minimum network bandwidth
  - Multiple vNICs, IP addresses, block storage (optional)

Page 9

IaaS Customer View - Slice

Name          Type            Cores  RAM (GB)  L3 (MB)  Net. BW  Price/hr
BDW-2GB-HT    Broadwell 2GHz  1/4    2         -        0.31     $0.04
BDW-4GB-1T    Broadwell 2GHz  1/2    4         -        0.63     $0.06
BDW-8GB-1C    Broadwell 2GHz  1      8         1.5      1.25     $0.10
BDW-16GB-2C   Broadwell 2GHz  2      16        3.0      2.50     $0.22
BDW-24GB-3C   Broadwell 2GHz  3      24        4.5      3.75     $0.33
BDW-32GB-4C   Broadwell 2GHz  4      32        6.0      5.00     $0.44
BDW-40GB-5C   Broadwell 2GHz  5      40        7.5      6.25     $0.55
BDW-48GB-6C   Broadwell 2GHz  6      48        9.0      7.50     $0.66
BDW-56GB-7C   Broadwell 2GHz  7      56        10.5     8.75     $0.77
BDW-64GB-8C   Broadwell 2GHz  8      60        12.0     10.00    $0.80
P8-4GB-2T     Power8 3.x GHz  1/4    4         -        0.25     $0.06
P8-8GB-4T     Power8 3.x GHz  1/2    8         -        0.5      $0.11
P8-16GB-1C    Power8 3.x GHz  1      16        8.0      1        $0.20
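As a rough illustration of how a tenant might work with the slice catalog above, the Python sketch below encodes a few of the Broadwell rows and picks the cheapest slice type that satisfies a core and RAM requirement. The `SliceType` record and the `cheapest_slice` helper are hypothetical names introduced here for illustration; they are not part of Spyre.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class SliceType:
    """One row of the customer-facing catalog (values copied from the table)."""
    name: str
    cores: Fraction       # fractional core counts are kept exactly as listed
    ram_gb: int
    price_per_hr: float   # US dollars per hour


# A subset of the Broadwell rows; the remaining rows have the same shape.
CATALOG = [
    SliceType("BDW-2GB-HT", Fraction(1, 4), 2, 0.04),
    SliceType("BDW-4GB-1T", Fraction(1, 2), 4, 0.06),
    SliceType("BDW-8GB-1C", Fraction(1), 8, 0.10),
    SliceType("BDW-16GB-2C", Fraction(2), 16, 0.22),
    SliceType("BDW-32GB-4C", Fraction(4), 32, 0.44),
]


def cheapest_slice(catalog: Sequence[SliceType],
                   min_cores: Union[int, Fraction],
                   min_ram_gb: int) -> Optional[SliceType]:
    """Return the lowest-priced slice type meeting both minimums, or None."""
    candidates = [s for s in catalog
                  if s.cores >= min_cores and s.ram_gb >= min_ram_gb]
    return min(candidates, key=lambda s: s.price_per_hr, default=None)


if __name__ == "__main__":
    choice = cheapest_slice(CATALOG, min_cores=1, min_ram_gb=8)
    print(choice.name if choice else "no slice type fits")
```

Because every resource is listed in real units, the selection reduces to a plain comparison against the tenant's requirements.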

Page 10

Observations from other work
• ISCA 2015: "Heracles: Improving Resource Efficiency at Scale", David Lo et al.
  - Latency-critical workloads need dedicated/isolated resources, distinct from those allowed to be assigned for batch workloads
• Microservices require stronger focus around component-service tail latencies
  - Increased probability of impact on composite service latency.
  - https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency - Richard Hsu and Cuong Tran

The Spyre slice framework is also of value to latency-sensitive cloud services.
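The composite-latency point can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that a request fans out to n component services with independent latencies and that each component exceeds its own 99th-percentile latency 1% of the time; the function name is hypothetical.

```python
def prob_any_component_slow(n_components: int,
                            per_component_tail: float = 0.01) -> float:
    """Probability that at least one of n independent components hits its tail."""
    if n_components < 0:
        raise ValueError("n_components must be non-negative")
    if not 0.0 <= per_component_tail <= 1.0:
        raise ValueError("per_component_tail must be a probability")
    # P(at least one slow) = 1 - P(all fast)
    return 1.0 - (1.0 - per_component_tail) ** n_components


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:3d} components: {prob_any_component_slow(n):.1%}")
```

Under these assumptions, a request touching 10 components sees at least one tail-latency response about 9.6% of the time, and one touching 100 components about 63% of the time, which is why per-component tails matter more as services are decomposed.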

Page 11

Resource management with Spyre

• Key concept: Slices
• Dedicated resources for predictable performance
  - Dedicated physical cores
  - Dedicated RAM
  - Guaranteed minimum network bandwidth
  - Multiple vNICs, IP addresses, block storage (optional)
• Implemented using cgroups & systemd units
  - Note: systemd does not yet support dedicated cores (cpusets); a custom script implements it.
• Multiple containers per slice (similar to Kubernetes pod/Carina segment)
  - Allows intra-customer sharing of resources
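To give a feel for what dedicating physical cores through cpusets involves, here is a minimal Python sketch that computes the values one would write into a cgroup-v1 cpuset directory for a slice. It is not Spyre's custom script. The function names are hypothetical, and the mapping from physical cores to logical CPU ids (consecutive ids per core) is an assumption that should be checked on the target machine. The file names `cpuset.cpus`, `cpuset.mems`, and `cpuset.cpu_exclusive` are the standard cgroup-v1 cpuset control files.

```python
import os
import tempfile
from typing import Dict, Iterable


def logical_cpus(cores: Iterable[int], threads_per_core: int) -> str:
    """Return a cpuset-style CPU list covering every hardware thread of the cores.

    Assumes logical CPU ids are numbered core * threads_per_core + thread.
    """
    ranges = []
    for core in sorted(set(cores)):
        first = core * threads_per_core
        last = first + threads_per_core - 1
        ranges.append(str(first) if first == last else f"{first}-{last}")
    return ",".join(ranges)


def cpuset_plan(cores: Iterable[int], threads_per_core: int,
                mem_nodes: Iterable[int]) -> Dict[str, str]:
    """Map cpuset control-file names to the values a slice would need."""
    return {
        "cpuset.cpus": logical_cpus(cores, threads_per_core),
        "cpuset.mems": ",".join(str(n) for n in sorted(set(mem_nodes))),
        # Asks the kernel to reject sibling cpusets that overlap these CPUs.
        "cpuset.cpu_exclusive": "1",
    }


def apply_plan(cgroup_dir: str, plan: Dict[str, str]) -> None:
    """Write each value into cgroup_dir (a real cpuset directory needs root)."""
    os.makedirs(cgroup_dir, exist_ok=True)
    for filename, value in plan.items():
        with open(os.path.join(cgroup_dir, filename), "w") as f:
            f.write(value)


if __name__ == "__main__":
    plan = cpuset_plan(cores=[0, 1], threads_per_core=8, mem_nodes=[0])
    # Dry run against a temporary directory instead of /sys/fs/cgroup.
    with tempfile.TemporaryDirectory() as root:
        apply_plan(os.path.join(root, "slice-a"), plan)
    print(plan)
```

Assigning every hardware thread of a core to the same slice is what keeps another tenant from sharing that core's execution resources and private caches.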

Page 12

Resource View

[Figure: resource view diagram. Labels: Slice; Cores; Memory; vNICs; iSCSI volumes; Host; Memory; NICs (time-shared); Possibly no local storage.]

Page 13

Slice: tenant view

[Figure: to the tenant the slice appears as Host A, running dockerd and sshd, with Docker containers X, Y, and Z and network interface eth0.]

Page 14

Host: software view

[Figure: software stack on one server.]
Server: sliced, systemd
Slice A: pflask; dockerd, sshd; Docker containers X, Y, Z; eth0
Slice B: pflask; dockerd, sshd; Docker containers X', Y'; eth0
Slice C: pflask; dockerd, sshd; Docker container; eth0, eth1

Page 15

Slice benefit analysis with an in-memory database workload

• DB2 BLU (in-memory database)
  - AGG_COL
    - includes up to 10 concurrent streams of SQL queries
    - used to emulate background, interfering job
  - REPORT_COL
    - includes up to 10 concurrent streams of SQL queries
    - used to emulate foreground job
• 4 instances of 100GB datasets with 3 REPORT_COL and 1 AGG_COL executed concurrently in
  - 4 Docker containers on Host
  - 4 Docker containers, each within own slice (6-core) on Host
• 24-core POWER8-S824 machine (two 6-core dies per socket, 2 sockets) with 512GB of memory spread evenly among the dies.
• All databases are resident on iSCSI volumes
• 2 runs done for both container-only and containers-within-slice scenarios
  - 6 data points for REPORT_COL, 2 data points for AGG_COL for each scenario
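The arithmetic behind this setup can be spelled out in a few lines of Python. The sketch assumes one 6-core slice per die and contiguous core numbering within a die; the slides state neither, so both are illustrative assumptions, and the names are hypothetical.

```python
from typing import Dict, List

CORES_PER_DIE = 6
DIES_PER_SOCKET = 2
SOCKETS = 2


def slice_layout() -> Dict[str, List[int]]:
    """Assign one 6-core slice per die (assumed contiguous core numbering)."""
    layout = {}
    for socket in range(SOCKETS):
        for die in range(DIES_PER_SOCKET):
            die_index = socket * DIES_PER_SOCKET + die
            first = die_index * CORES_PER_DIE
            layout[f"slice-{die_index}"] = list(range(first, first + CORES_PER_DIE))
    return layout


def data_points(instances: int, runs: int) -> int:
    """Each run contributes one measurement per workload instance."""
    return instances * runs


if __name__ == "__main__":
    layout = slice_layout()
    print(len(layout), "slices,", sum(len(c) for c in layout.values()), "cores")
    print("REPORT_COL data points:", data_points(instances=3, runs=2))
    print("AGG_COL data points:", data_points(instances=1, runs=2))
```

Four 6-core slices account for all 24 cores, and three REPORT_COL instances over two runs give the six data points quoted above, against two for the single AGG_COL instance.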

Page 16

REPORT_COL average

Average better or same with slices.

[Figure: Average Stream Execution Time (REPORT_COL). Time (seconds, 0 to 800) per Query Stream ID (1 to 10) for Containers and Slices. Lower is better.]

Page 17

REPORT_COL worst-case performance

[Figure: Ratio of max. exec. time across runs to avg. exec. time (REPORT_COL). Ratio of max execution time to average (0 to 1.8) per Query Stream ID (1 to 10) for Containers and Slices.]

Slices improve worst-case performance, i.e., lower tail latency (lower ratio of max to average).
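The worst-case metric plotted here, the maximum execution time across runs divided by the average, is simple to compute. The sketch below shows one way to do so in Python; the timing values are made up for illustration and are not measurements from the talk.

```python
from statistics import mean
from typing import Sequence


def max_to_avg_ratio(exec_times: Sequence[float]) -> float:
    """Ratio of the slowest observation to the mean; 1.0 means no variation."""
    if not exec_times:
        raise ValueError("need at least one execution time")
    return max(exec_times) / mean(exec_times)


if __name__ == "__main__":
    # Hypothetical per-stream timings in seconds (six data points each).
    containers = [400.0, 410.0, 395.0, 405.0, 640.0, 400.0]
    slices = [400.0, 405.0, 398.0, 402.0, 410.0, 401.0]
    print(f"containers: {max_to_avg_ratio(containers):.2f}")
    print(f"slices:     {max_to_avg_ratio(slices):.2f}")
```

A ratio close to 1 means the slowest run was close to the typical run, so a lower bar indicates more predictable performance.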

Page 18

AGG_COL average

[Figure: Average Stream Execution Time (AGG_COL). Time (seconds, 0 to 1200) per Query Stream ID (1 to 10) for Containers and Slices.]

AGG_COL benefits from stealing resources, i.e., sees lower performance when constrained within slice.

Page 19

AGG_COL worst-case performance

[Figure: Ratio of max. exec. time across runs to average exec. time (AGG_COL). Ratio of max execution time to average (0.9 to 1.2) per Query Stream ID (1 to 10) for Containers and Slices.]

Lower variation of runtimes with slices. Caveat: only two data points behind each bar.

Page 20

Spyre Status
sliced in Linux on x86 and POWER:
• Interface
  - Simple REST API supporting slice create, query, resize, and delete, plus a query for system resources available/free
  - Returns and accepts JSON
• Capability
  - Provides CPU (core, cache) isolation
  - Automatic memory affinity with CPU
  - Vertical scaling
• Implementation
  - Python
  - Systemd, cgroups, cpuset
  - Pflask for outer container
  - Slice has own IP, ssh access with public key

Opening project to community
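The slides do not show the REST API itself, so the following Python sketch only illustrates what a JSON-over-HTTP client for the five listed operations could look like. The base URL, the `/slices` and `/resources` paths, and the `name`, `cores`, and `ram_gb` fields are all hypothetical and not taken from Spyre. The requests are built but never sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"  # hypothetical address of the slice daemon


def build_request(method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that accepts and sends JSON."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# One helper per operation named on the slide; paths and fields are assumed.
def create_slice(name: str, cores: int, ram_gb: int) -> urllib.request.Request:
    return build_request("POST", "/slices",
                         {"name": name, "cores": cores, "ram_gb": ram_gb})


def query_slice(name: str) -> urllib.request.Request:
    return build_request("GET", f"/slices/{name}")


def resize_slice(name: str, cores: int, ram_gb: int) -> urllib.request.Request:
    return build_request("PUT", f"/slices/{name}",
                         {"cores": cores, "ram_gb": ram_gb})


def delete_slice(name: str) -> urllib.request.Request:
    return build_request("DELETE", f"/slices/{name}")


def free_resources() -> urllib.request.Request:
    return build_request("GET", "/resources")


if __name__ == "__main__":
    req = resize_slice("web", cores=4, ram_gb=32)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Against a real service, each request would be passed to `urllib.request.urlopen` and the JSON response decoded with `json.load`.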

Page 21

Next Steps

• Spyred implementation for stand-alone cluster.
• Memory bandwidth control (IBM POWER8) and shared-cache control (Intel Haswell+): hardware-specific.
• Networking design and network bandwidth control work.
• Storage design.
• Integration with broader eco-system: Machine+Swarm, Kubernetes, Mesos… (?)
• Extending tenant SLA models

Page 22

Extending Tenant-Slice service models

[Figure labels: Performance Isolation; Guaranteed Resources; Dedicated High-Priority (CPU sets).]

Page 23

Extending Tenant-Slice service models: Vertical Resizing

[Figure labels: Guaranteed Resources; Dedicated High-Priority (CPU sets); Dedicated Low-Priority (CPU sets); Max, Current; Current, Min.]

• Pays for Current and a premium to go up to Max.
• Pays for Current and gets a discount for allowing resources to be taken down to Min.

Page 24

Extending Tenant-Slice service models: Increasing density

[Figure labels: Performance Isolation; Guaranteed Resources; Dedicated High-Priority (CPU sets); Dedicated Low-Priority (CPU sets); Shared High-Priority (CPU shares).]

• Gets a discount for tolerating jitter, potentially higher tail latencies.

Page 25

Extending Tenant-Slice service models: Increasing density

[Figure labels: Performance Isolation; Guaranteed Resources; Dedicated High-Priority (CPU sets); Dedicated Low-Priority (CPU sets); Shared High-Priority (CPU shares); Shared Low-Priority (CPU shares).]

• Guaranteed = Requested resource averaged over some time interval
• Guaranteed = Requested resource when occasionally active (enables provider to overcommit); gets discount for not needing requested resources all the time.
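One way to read "requested resource averaged over some time interval" is as a sliding-window check on the resources a tenant actually received. The Python sketch below implements that reading. It is an interpretation of the slide rather than Spyre's SLA logic, and the function name, sample units, and window length are hypothetical.

```python
from typing import Sequence


def meets_average_guarantee(samples: Sequence[int], requested: int,
                            window: int) -> bool:
    """True if every run of `window` consecutive samples averages >= requested.

    `samples` holds the resource delivered per sampling period, in any
    integer unit (for example, millicores).
    """
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    target = requested * window            # compare sums to avoid division
    window_sum = sum(samples[:window])
    if window_sum < target:
        return False
    for i in range(window, len(samples)):
        # Slide the window forward by one sample.
        window_sum += samples[i] - samples[i - window]
        if window_sum < target:
            return False
    return True


if __name__ == "__main__":
    bursty = [2000, 0, 2000, 0, 2000, 0]   # hypothetical millicore samples
    print(meets_average_guarantee(bursty, requested=1000, window=2))
    print(meets_average_guarantee(bursty, requested=1000, window=1))
```

The same delivery pattern can satisfy an averaged guarantee while failing an instantaneous one, which is the slack that lets a provider pack shared slices more densely.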

Page 26

Discussion
• How important is dynamic resizing ability, both growing up and ability to pay for a lower minimum?
• Is dynamic resizing applicable to memory?
  - Can applications deal with some of their allocated memory being moved to swap?
  - Will high-speed swap (SSD/NVMe backed) help?
• If a system supports both dedicated and shared, is there need for high/low priority sub-classes?
• Any user classes not covered by these models?
• Any other comments, questions?

Page 27

Thank you

IBM Research is hiring in the Cloud Infrastructure and Data Centers area.

If interested, please contact me. Email: [email protected]

