Cluster management at Google with Borg - coping with scale, 2015-11. john wilkes / johnwilkes@google.com, Principal Software Engineer. Derived from EuroSys'15 paper (http://goo.gl/1C4nuo)
Transcript
Page 1: Cluster management at Google with Borg - coping with scale · Cluster management at Google with Borg - coping with scale 2015-11 john wilkes / johnwilkes@google.com ... Core: Abhishek

Cluster management at Google with Borg - coping with scale

2015-11

john wilkes / johnwilkes@google.com

Principal Software Engineer

Derived from EuroSys'15 paper (http://goo.gl/1C4nuo)

Page 2

Cluster management at Google with Borg - coping with scale

2015-11

john wilkes / johnwilkes@google.com

Principal Software Engineer

Derived from EuroSys'15 paper (http://goo.gl/1C4nuo)

the system we internally call

Page 3

Borg contributors

Core: Abhishek Rai, Abhishek Verma, Andy Zheng, Ashwin Kumar, Ben Smith, Beng-Hong Lim, Bin Zhang, Bolu Szewczyk, Brad Strand, Brian Budge, Brian Grant, Brian Wickman, Chengdu Huang, Chris Colohan, Cliff Stein, Cynthia Wong, Daniel Smith, Dave Bort, David Oppenheimer, David Wall, Divyesh Shah, Dawn Chen, Eric Haugen, Eric Tune, Eric Wilcox, Ethan Solomita, Gaurav Dhiman, Geeta Chaudhry, Greg Roelofs, Grzegorz Czajkowski, James Eady, Jarek Kusmierek, Jaroslaw Przybylowicz, Jason Hickey, Javier Kohen, Jeff Dean, Jeremy Dion, Jeremy Lau, Jerzy Szczepkowski, Joe Hellerstein, John Wilkes, Jonathan Wilson, Joso Eterovic, Jutta Degener, Kai Backman, Kamil Yurtsever, Ken Ashcraft, Kenji Kaneda, Kevan Miller, Kurt Steinkraus, Leo Landa, Liza Fireman, Madhukar Korupolu, Maricia Scott, Mark Logan, Mark Vandevoorde, Markus Gutschke, Matt Sparks, Maya Haridasan, Michael Abd-El-Malek, Michael Kenniston, Ming-Yee Iu, Monika Henzinger, Mukesh Kumar, Nate Calvin, Onufry Wojtaszczyk, Olcan Sercinoglu, Paul Menage, Patrick Johnson, Pavanish Nirula, Pedro Valenzuela, Percy Liang, Piotr Witusowski, Praveen Kallakuri, Rafal Sokolowski, Rajmohan Rajaraman, Richard Gooch, Rishi Gosalia, Rob Radez, Robert Hagmann, Robert Jardine, Robert Kennedy, Rohit Jnagal, Roy Bryant, Rune Dahl, Scott Garriss, Scott Johnson, Sean Howarth, Sheena Madan, Smeeta Jalan, Stan Chesnutt, Temo Arobelidze, Tim Hockin, Todd Wang, Tomasz Blaszczyk, Tomasz Wozniak, Tomek Zielonka, Victor Marmol, Vish Kannan, Vrigo Gokhale, Walfredo Cirne, Walt Drummond, Weiran Liu, Xiaopan Zhang, Xiao Zhang, Ye Zhao, and Zohaib Maya.

SRE: Adam Rogoyski, Alex Milivojevic, Anil Das, Cody Smith, Cooper Bethea, Folke Behrens, Matt Liggett, James Sanford, John Millikin, Matt Brown, Miki Habryn, Peter Dahl, Robert van Gent, Seppi Wilhelmi, Seth Hettich, Torsten Marek, and Viraj Alankar.

BCL and borgcfg: Marcel van Lohuizen and Robert Griesemer.

Reviewers: Christos Kozyrakis, Eric Brewer, Malte Schwarzkopf, and Tom Rodeheffer.

Page 6

Image by Connie Zhou

Page 7

job hello_world = {
  runtime = { cell = 'ic' }             // Cell (cluster) to run in
  binary = '.../hello_world_webserver'  // Program to run
  args = { port = '%port%' }            // Command line parameters
  requirements = {                      // Resource requirements (optional)
    ram = 100M
    disk = 100M
    cpu = 0.1
  }
  replicas = 5                          // Number of tasks
}

10000

User view
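One way to read the job description above is as a template that is expanded into `replicas` identical tasks, each with the same resource requirements. The following is a minimal Python sketch of that reading only. The class and method names (`JobSpec`, `total_requirements`) are hypothetical, the units (MB, cores) are an assumption about what `100M` and `0.1` mean, and none of this is a real Borg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical mirror of the fields in the hello_world job above."""
    name: str
    cell: str
    ram_mb: int     # per-task RAM, assumed to be MB (the slide writes 100M)
    disk_mb: int    # per-task disk, assumed to be MB
    cpu: float      # per-task CPU, assumed to be cores
    replicas: int   # number of tasks

    def total_requirements(self) -> dict:
        """Aggregate resources requested by all tasks of the job."""
        return {
            "ram_mb": self.ram_mb * self.replicas,
            "disk_mb": self.disk_mb * self.replicas,
            "cpu": self.cpu * self.replicas,
        }


hello_world = JobSpec("hello_world", "ic", ram_mb=100, disk_mb=100,
                      cpu=0.1, replicas=5)
totals = hello_world.total_requirements()
```

Changing only `replicas` scales the aggregate request linearly, while the per-task requirements stay untouched.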

Page 8

User view

Page 9

What just happened?

[Diagram labels: web browsers; borgcfg; config file; binary; Cell; BorgMaster (several replicas, each with a link shard and a read/UI shard); persistent store (Paxos); scheduler; Borglet (several)]

User view

Page 10

[Image by Connie Zhou, overlaid with many copies of "Hello world!"]

User view

Page 11

User view

Page 12

task-eviction rates and causes

Failures

Page 13

Images by Connie Zhou

A 2000-machine service will have >10 task exits per day.

This is not a problem: it's normal.

Failures
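As a rough reading of the figure quoted above, the arithmetic below turns "more than 10 exits per day on 2000 machines" into per-task numbers. It is a back-of-the-envelope sketch that assumes one task per machine and evenly spread exits, neither of which the slide states. Because the slide gives a lower bound on exits, the rate is a lower bound and both intervals are upper bounds.

```python
machines = 2000       # service size quoted on the slide
exits_per_day = 10    # lower bound quoted on the slide (">10")

# Assumption (not from the slide): one task per machine, exits spread evenly.
exit_rate_per_task_day = exits_per_day / machines      # at least this fraction per day
days_between_exits_per_task = machines / exits_per_day  # at most this many days
minutes_between_exits = 24 * 60 / exits_per_day         # at most, service-wide
```

Under these assumptions a single task may run for months between exits, yet the service as a whole sees an exit every couple of hours or less.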

Page 14

Advanced bin-packing algorithms

Experimental placement of production VM workload, July 2014

Efficiency

[Figure legend: stranded resources; available resources; one machine]
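The legend above separates "available" from "stranded" resources. A minimal sketch of one way to draw that line follows, assuming that "stranded" means free capacity on a machine that no task can use because another resource on the same machine is exhausted. The `Machine` type, the `stranded` helper and the minimum task size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    cpu_free: float   # free CPU, in cores
    ram_free: float   # free RAM, in GiB


def stranded(machine: Machine, min_cpu: float, min_ram: float) -> dict:
    """Free resources that no task of at least (min_cpu, min_ram) can use."""
    fits = machine.cpu_free >= min_cpu and machine.ram_free >= min_ram
    if fits:
        # At least the smallest task still fits, so nothing counts as stranded.
        return {"cpu": 0.0, "ram": 0.0}
    return {"cpu": machine.cpu_free, "ram": machine.ram_free}


# RAM is fully allocated, so the 6 free cores cannot be given to any task.
full_ram = stranded(Machine(cpu_free=6.0, ram_free=0.0), min_cpu=0.5, min_ram=0.5)
# Both resources have room, so the free capacity is still available.
roomy = stranded(Machine(cpu_free=6.0, ram_free=8.0), min_cpu=0.5, min_ram=0.5)
```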

Page 15

tasks per machine

Multiple applications per machine

CPI^2 paper, EuroSys 2013

Efficiency

Page 16

Sharing clusters between prod/batch helps

Segregating them would need more machines

Efficiency

[Chart labels: shared cell (original); shared cell (compacted); non-prod load (compacted); prod-only load (compacted); axis: # machines]

Page 17

Sharing clusters between prod/batch helps

Segregating them would need more machines

Efficiency

[Chart labels: shared cell (original); shared cell (compacted); non-prod load (compacted); prod-only load (compacted); overhead; axis: # machines]

Page 18

Waste

Sharing clusters between prod/batch helps

Segregating them would need more machines

15 production cells from a larger pool, omitting small ones (<5000 machines)

Efficiency
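The claim on the slides above, that segregating prod and batch work would need more machines, can be illustrated with a toy packing exercise. The sketch below uses first-fit decreasing on a single resource with made-up request sizes. It is not the compaction method behind the charts, and the numbers are chosen only to make the effect visible.

```python
def machines_needed(tasks, capacity=1000):
    """First-fit decreasing: machines needed for CPU requests (millicores).

    Assumes every request is no larger than one machine's capacity.
    """
    free = []  # remaining capacity of each machine opened so far
    for request in sorted(tasks, reverse=True):
        for i, room in enumerate(free):
            if request <= room:
                free[i] = room - request
                break
        else:
            # No open machine has room, so open a new one.
            free.append(capacity - request)
    return len(free)


prod = [600, 600, 600]    # made-up prod requests
batch = [400, 400, 400]   # made-up non-prod requests

segregated = machines_needed(prod) + machines_needed(batch)
shared = machines_needed(prod + batch)
```

In this example the batch tasks fit into the gaps left beside the prod tasks, so the shared pool needs fewer machines than the two segregated pools combined.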

Page 19

Efficiency

Smaller cells would need more machines

Page 20

Bucketing to next-largest power of 2 would need more machines

prod only, starting from 0.5 cores, 0.5GiB

Efficiency
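To make the bucketing idea above concrete, here is a small sketch that rounds each request up to the next size in the series 0.5, 1, 2, 4, ... and measures the extra capacity this costs. The helper name `bucket_up` and the sample requests are hypothetical; only the 0.5 starting point comes from the slide.

```python
def bucket_up(request: float, smallest: float = 0.5) -> float:
    """Round a request up to the next bucket in the series smallest * 2**k."""
    bucket = smallest
    while bucket < request:
        bucket *= 2
    return bucket


requests = [0.6, 1.1, 2.5, 3.0]   # made-up CPU requests, in cores
bucketed = [bucket_up(r) for r in requests]

# Fraction of extra capacity requested purely because of rounding up.
overhead = sum(bucketed) / sum(requests) - 1
```

A request that lands just above a bucket boundary nearly doubles, which is where the extra machines come from.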

Page 21

There are no obvious resource bucket sizes

cf. cloud VMs

nice round numbers

gaming the system

Efficiency

Page 22

potentially reusable resources

Resource reclamation

Efficiency

[Axis label: time]

limit: amount of resource requested

usage: actual resource consumption

reservation: estimate of future usage
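The three quantities defined above suggest a simple relationship: whatever lies between the limit and the reservation is potentially reusable. The sketch below assumes the reservation is estimated as current usage plus a fractional safety margin, capped at the limit. That estimator and the `safety_margin` parameter are illustrative assumptions, not something the slide specifies.

```python
def reclaimable(limit: float, usage: float, safety_margin: float = 0.25) -> float:
    """Resources that could be lent to other work under the stated assumptions.

    reservation = estimate of future usage; here, usage plus a fractional
    safety margin (an assumed estimator), never more than the limit.
    """
    reservation = min(limit, usage * (1 + safety_margin))
    return limit - reservation


# A task that requested 4 cores but uses 2 leaves 1.5 cores reclaimable.
idle_headroom = reclaimable(limit=4.0, usage=2.0)
# A task running at its limit leaves nothing to reclaim.
no_headroom = reclaimable(limit=4.0, usage=4.0)
```

A smaller margin reclaims more but raises the chance that the original task needs the resources back, which is the trade-off behind "more aggressive" reclamation.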

Page 23

Resource reclamation could be more aggressive

Nov/Dec 2013

Efficiency

Page 24

Resource reclamation could be more aggressive

Nov/Dec 2013

Efficiency

Page 25

[Diagram labels: web browsers; borgcfg; config file; Cell; BorgMaster (several replicas, each with a link shard and a read/UI shard); persistent store (Paxos); scheduler; Borglet (several)]

A few other moving parts

Page 26

app

agent

master

job config

A few other moving parts

Page 27

app

agent

master

system config

monitoring

security accounting/planning

binaries + data distribution

job config

storage

Diagram from an original by Cody Smith.

A few other moving parts

Page 28

app

agent

master

system config

monitoring

security accounting/billing

binaries + data distribution

job config

storage

A few other moving parts

Diagram from an original by Cody Smith.

Page 29

κυβερνήτης: pilot or helmsman of a ship

http://kubernetes.io

Kubernetes

Page 30

[Diagram: a row of machines, each labeled "Machine / Host" with a "Container Agent"; workloads labeled "Web server" and "Log roller"]

Kubernetes

Page 31

[Diagram: a row of machines, each labeled "Machine / Host" with a "Container Agent"; Kubernetes master/scheduler; workloads labeled "Web server" and "Log roller"]

Pods

Page 32

[Diagram: a row of machines, each labeled "Machine / Host" with a "Container Agent"; Kubernetes master/scheduler; boxes labeled "FE" and "BE"]

Labels

Page 33

[Diagram: a row of machines, each labeled "Machine / Host" with a "Container Agent"; Kubernetes master/scheduler; boxes labeled "FE" and "BE"]

Label selectors

labels:
  role: frontend

Page 34

[Diagram: a row of machines, each labeled "Machine / Host" with a "Container Agent"; Kubernetes master/scheduler; boxes labeled "FE" and "BE"]

Label selectors

labels:
  role: frontend
  stage: production
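The two selectors shown on the slides above can be read as "match every pod whose labels contain all of these key/value pairs". A minimal sketch of that matching rule follows. The pod names, the `stage: test` value and the helper functions are made up for illustration and are not taken from the Kubernetes API.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector: dict, pods: list) -> list:
    """Names of the pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches(selector, pod["labels"])]


pods = [
    {"name": "fe-1", "labels": {"role": "frontend", "stage": "production"}},
    {"name": "fe-2", "labels": {"role": "frontend", "stage": "test"}},
    {"name": "be-1", "labels": {"role": "backend", "stage": "production"}},
]

frontends = select({"role": "frontend"}, pods)
prod_frontends = select({"role": "frontend", "stage": "production"}, pods)
```

Adding a second key to the selector narrows the result, which is the step from the first selector slide to the second.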

Page 35

[Diagram: a row of machines, each labeled "Machine / Host" with a "Container Agent"; Kubernetes master/scheduler; three boxes labeled "FE"]

replicas: 3
template: ...
labels:
  role: frontend

Replica controller

Page 36

[Diagram: a row of machines, each labeled "Machine / Host" with a "Container Agent"; Kubernetes master/scheduler; four boxes labeled "FE"]

replicas: 4
template: ...
labels:
  role: frontend

Replica controller
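The change from `replicas: 3` to `replicas: 4` on the slides above is an example of declarative control: the spec states a desired count and a loop drives the running set toward it. The sketch below shows one pass of such a loop. The `reconcile` function and the pod naming scheme are hypothetical, and real pod creation is replaced by appending a name to a list.

```python
import itertools

_ids = itertools.count(1)  # source of fresh, made-up pod names


def reconcile(desired_replicas: int, running: list) -> list:
    """One pass of a reconciliation loop: make running match the desired count."""
    running = list(running)
    while len(running) < desired_replicas:
        running.append(f"fe-{next(_ids)}")  # stand-in for starting a pod
    while len(running) > desired_replicas:
        running.pop()                        # stand-in for stopping a pod
    return running


pods = reconcile(3, [])      # spec says replicas: 3
pods = reconcile(4, pods)    # spec edited to replicas: 4
after_failure = reconcile(4, pods[1:])  # one pod lost; the loop replaces it
```

The same pass handles scale-up, scale-down and failure recovery, because it only compares the desired state with what is currently running.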

Page 37

[Diagram: a row of machines, each labeled "Machine / Host" with a "Container Agent"; Kubernetes master/scheduler; "frontend-service" in front of four boxes labeled "FE"]

id: frontend-service
port: 9000
labels:
  role: frontend

Service
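The service definition above pairs a stable name and port with a label selector. A minimal sketch of how such a definition could resolve to a changing set of pods is given below. The `Service` class, the pod names and the round-robin choice of target are assumptions for illustration; the slide does not say how traffic is spread.

```python
import itertools


class Service:
    """Hypothetical stand-in: a stable name in front of pods chosen by labels."""

    def __init__(self, name: str, port: int, selector: dict):
        self.name = name
        self.port = port
        self.selector = selector

    def endpoints(self, pods: list) -> list:
        """Names of the pods whose labels contain every selector pair."""
        return [
            pod["name"]
            for pod in pods
            if all(pod["labels"].get(k) == v for k, v in self.selector.items())
        ]


pods = [
    {"name": "fe-1", "labels": {"role": "frontend"}},
    {"name": "fe-2", "labels": {"role": "frontend"}},
    {"name": "be-1", "labels": {"role": "backend"}},
]
service = Service("frontend-service", 9000, {"role": "frontend"})

targets = itertools.cycle(service.endpoints(pods))  # assumed round-robin
first_three = [next(targets) for _ in range(3)]
```

Clients address `frontend-service` on port 9000 and never need to know which pods currently carry the `role: frontend` label.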

Page 38

Kubernetes

Direct Borg analogues:

● Borg containers => Docker containers
● alloc (task group) => pod (container group)
● Borglet => Kubelet
● persistent, declarative specs
● reconciliation loops

Page 39

New / improved:

● labels + label queries
● service abstraction
● composable microservices
  ○ replication controller
  ○ horizontal autoscaler
● IP per pod

Kubernetes

Page 40

johnwilkes@google.com

http://kubernetes.io

http://goo.gl/1C4nuo (Borg paper)

Images by Connie Zhou

Observations:

1. Resiliency is achieved only by ruthless attention to detail
   a. ubiquitous software fault tolerance
   b. persistent, declarative specs

2. We get efficiency by:
   a. sharing resources
   b. reclaiming unused allocations

3. Containers make users more productive

