More Containers, More Problems
events.static.linuxfound.org/sites/events/files/slides/More Containers...

Post on 01-Aug-2020

Ed Rooth
@sym3tri | ed.rooth@coreos.com | coreos.com

More Containers, More Problems

Agenda

1. Define problems
2. Define vision of the solution
3. How CoreOS is building solutions
4. How you can get started

It all started with... a server

Then we got... many servers

Then we got... VMs on our servers

Then we got... APIs around hosted VMs (cloud)

Which led to... even more servers

The cloud made booting servers really easy.

Also… Moore’s law is still a thing.

Too Many Servers!

Patching ....................... is hard
Dependency management .......... is hard
Managing access ................ is hard
Managing workloads ............. is hard
App lifecycle management ....... is hard
Identifying security issues .... is hard

More Servers, More Problems

More Servers == More Sysadmins

[Chart: Servers and Sysadmins, each plotted on an axis from 0 to 1000]

Google needed more servers

… before the rest of us did.

They solved many of these problems internally, and published some great papers.

We started building it

CoreOS, Google, and the community...

are building the open-source version.

#GIFEE

Google’s Infrastructure For Everyone Else

What is #GIFEE?

"Fundamentally, it's what happens when you ask a software engineer to design an operations function."

-- Ben Treynor Sloss, Vice President, Google Engineering, founder of Google SRE

Google’s Infrastructure

Servers are not your pets

Servers are the new CPU Cores

Clusters are the new servers

What is #GIFEE?

Evolution of Servers

Clusters

Server → Cluster
Process → App

Layer                   Google          Open Source
Operating System        Custom Linux    CoreOS Linux
Distributed Consensus   Chubby          etcd
Cluster Manager         Borg            Kubernetes
Monitoring              BorgMon         Prometheus
RPC framework           Stubby          gRPC
Auth                    private         Dex

“cluster operating system”

Orchestration

State

Scheduler: Gets work to the servers

OS for Clusters
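A minimal sketch of what "gets work to the servers" can mean. This is not the Kubernetes scheduler's actual algorithm; the `Node` type, the `schedule` function, the node names, and the "most free CPU wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node with a single resource dimension."""
    name: str
    cpu_free: float
    pods: list = field(default_factory=list)


def schedule(pod_name, cpu_request, nodes):
    """Place a pod on the fitting node with the most free CPU.

    Returns the chosen node name, or None when no node has room
    (the pod would stay pending).
    """
    candidates = [n for n in nodes if n.cpu_free >= cpu_request]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: n.cpu_free)
    best.cpu_free -= cpu_request
    best.pods.append(pod_name)
    return best.name


# Illustrative cluster: two workers, four pods asking for 1.5 CPUs each.
nodes = [Node("worker-1", 2.0), Node("worker-2", 4.0)]
placements = [schedule(f"myapp-{i}", 1.5, nodes) for i in range(4)]
```

The caller never names a server; it only states what the workload needs, and the last pod stays unplaced once the cluster runs out of room.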

Software manages servers

Software manages workloads

Declare what you want, it will become so

What is #GIFEE?

[Diagram: one API + scheduler node managing many worker nodes, each running a kubelet]

[Diagram: API + scheduler + worker on a single machine]

works on 1 node too

Primary component of the Cluster OS

Fits our vision

Started by Google with over 10 yrs experience running Borg

Centralized administration & orchestration

No more SSH

Yes, that even means your favorite config mgmt tool

What is #GIFEE?

$ scp myapp host:/opt
$ ssh host systemd-run /opt/myapp

Don’t say HOW

What is #GIFEE?

$ kubectl run myapp --image=quay.io/sym3tri/hello --replicas=1

$ kubectl get pods
POD           IP
myapp-97wt8   10.2.29.3

say WHAT

What is #GIFEE?

$ kubectl scale rc myapp --replicas=4

$ kubectl get pods
POD           IP
myapp-97wt8   10.2.29.3
myapp-f839d   10.2.29.4
myapp-98b35   10.2.29.5
myapp-e40ee   10.2.29.8

say WHAT again

What is #GIFEE?

$ kubectl run myapp --image=quay.io/sym3tri/hello --replicas=1

$ kubectl get pods
POD           IP
myapp-97wt8   10.2.29.3

say WHAT one more time

RC web-prod: select(env=prod, app=web), count=1
  Pod: env=prod, app=web

RC web-prod: select(env=prod, app=web), count=4
  Pod: env=prod, app=web
  Pod: env=prod, app=web
  Pod: env=prod, app=web
  Pod: env=prod, app=web
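A rough sketch of how a replication controller can turn a label selector plus a count into running pods. It is an illustration under stated assumptions, not Kubernetes code: `matches`, `reconcile`, the in-memory `pods` dict, and the generated pod names are all hypothetical.

```python
import itertools


def matches(selector, labels):
    """True when every selector key/value is present in the pod labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(selector, desired, pods):
    """Create or delete pods until `desired` pods match `selector`.

    `pods` maps pod name -> labels and is modified in place.
    Returns the list of (action, pod_name) steps that were taken.
    """
    owned = sorted(name for name, labels in pods.items()
                   if matches(selector, labels))
    actions = []
    ids = itertools.count()
    while len(owned) < desired:
        name = f"web-prod-{next(ids)}"
        if name in pods:          # skip names that are already taken
            continue
        pods[name] = dict(selector)
        owned.append(name)
        actions.append(("create", name))
    while len(owned) > desired:
        name = owned.pop()
        del pods[name]
        actions.append(("delete", name))
    return actions


selector = {"env": "prod", "app": "web"}
pods = {
    "web-prod-0": {"env": "prod", "app": "web"},
    "staging-0": {"env": "staging", "app": "web"},  # not selected
}
scale_up = reconcile(selector, 4, pods)
scale_down = reconcile(selector, 1, pods)
```

Changing only the count is enough to scale in either direction; the staging pod is never touched because its labels do not match the selector.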

automated != automatic

Dependencies are isolated per app

Apps automatically migrate throughout the cluster

What is #GIFEE?

All apps are “12-factor”

Configuration/Secret management

What is #GIFEE?

prod config

staging config

Consistent Deployment API

Deploy canary builds and experiments

Rolling Updates

What is #GIFEE?

[Diagram: pods behind the Load Balanced Service during a rolling update, one frame per line]
4 x app v1
4 x app v1 + 1 x app v2
3 x app v1 + 2 x app v2
2 x app v1 + 3 x app v2
4 x app v2
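A small sketch of the rolling update pictured in these frames: one new pod starts before one old pod is retired, so the service never drops below its replica count. The `rolling_update` function and the surge of exactly one pod are assumptions for illustration; real rollout strategies are configurable.

```python
def rolling_update(replicas, old="v1", new="v2"):
    """Yield the pod versions behind the service after each step."""
    pods = [old] * replicas
    yield list(pods)
    while old in pods:
        pods.append(new)   # surge: bring up one new pod first
        yield list(pods)
        pods.remove(old)   # then retire one old pod
        yield list(pods)


steps = list(rolling_update(4))
for step in steps:
    print(" ".join(step))
```

Stopping after the first surge step leaves a single v2 pod taking a share of the traffic, which is one simple way to picture a canary build.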

A Team    B Team    C Team

What is #GIFEE?

Mixed workloads (staging + prod)

Logically partitioned resources

Trusted & Secure from the bottom up*

Only trusted code is executed

What is #GIFEE?

Cluster OS

Container Runtime

OS

Firmware & TPM

Every {human,machine,process} is… authenticated & authorized

All communication is encrypted

What is #GIFEE?

[Diagram: worker running a kubelet, connected to the API + scheduler]

Failure is expected and handled for…

- Services / Apps
- Machines
- Storage
- Clusters
- Regions

What is #GIFEE?

Logging

Monitoring / Alerting

What is #GIFEE?

Compatibility with existing tools

Work with other projects (Docker, Calico, Prometheus)

Incorporates lessons learned

#GIFEE vs Google Infra?

Build for scale

Manage your apps, not servers

High Availability

New paradigm of infra/development

Why?

We believe:

As #GIFEE becomes ubiquitous, the Internet becomes more secure overall

#GIFEE and Security

Secure the Internet

CoreOS Mission

Journey to #GIFEE

Leverage prior work + standards

- Raft
- Omaha Protocol
- OIDC

Getting Started

Start from the bottom

The Operating System

Securing The Internet

Minimal Server OS + Automatic Updates

Requires:
- Distributed consensus
- Containers
- Cluster computing
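A short worked example of why distributed consensus shapes cluster sizing. Raft, which etcd uses, commits a change only once a majority of members agree; the helper names below are hypothetical, and the arithmetic is the standard majority-quorum rule.

```python
def quorum(members):
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for size in (1, 3, 4, 5):
    print(size, quorum(size), tolerated_failures(size))
```

Going from 3 to 4 members raises the quorum without tolerating any more failures, which is why odd cluster sizes are the usual choice.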

Securing The Internet

In this new world we containerize all the things…

Containerize

but…

Containerize

“Every solution breeds new problems”

-Arthur Bloch

One problem solved → another problem arises

More Containers, More Problems

Problem #1
- Secure & controlled container distribution

Solution

More Containers, More Problems

Problem #2
- Docker security model
- Docker coupling of components

Solution

More Containers, More Problems

[Diagram: systemd and app process trees, alongside `docker run redis` and the docker engine daemon]

Implementation:

Side Note: Spec vs Implementation

Specification:

https://en.wikipedia.org/wiki/ISO_668

More Containers, More Problems

Problem #3
- User Authentication

Solution
- Dex

More Containers, More Problems

Problem #4
- Really big containers

Solution
- Go
- Buildroot
- acbuild for ACIs

github.com/brianredbeard/minimal_containers

NOOOOOOOOO!!!

Your container is 500MB !?

Problems #5-11
- Co-locating Containers
- Intelligent Scheduling
- Port Management
- Segmenting workloads
- Configuration Management
- Secrets Management
- Inconsistent Deployments

More Containers, More Problems

Solution

More Containers, More Problems

Problem #12 Networking
- Too many types of SDNs
- IP per POD

Solution
- CNI

More Containers, More Problems

Problem #13
- Metrics
- Monitoring
- Alerting

Solution
- Prometheus

More Containers, More Problems

Problem #14
- Vulnerabilities inside containers

Solution

More Containers, More Problems

Problem #15
- Visualize & configure clusters

Solution
- Tectonic Console

More Containers, More Problems

Problem #16
- Running on Bare Metal

Solution
- Ignition
- coreos-baremetal
- Tectonic baremetal installer

More Containers, More Problems

Problem #17
- Inability to verify node trust

Solution
- Distributed Trusted Computing (DTC)

More Containers, More Problems

Problem #18
- Persistent storage

Solution
- Torus

Kubernetes is the kernel, Tectonic is the distro.

tectonic.com @tectonic

off-the-shelf #GIFEE

Kubernetes Contributions

OIDC Authentication

RBAC Authorization

TLS Bootstrapping

rktnetes

2x Scheduler Performance

etcd 3 support

coreos-kubernetes

Bootstrap/Upgrade Simplification

Future

More Management Tools

Expand platform support

Prometheus Enhancements

Federated Clusters

Summary

Open-Source is key

Security is key

Updates are key

Containers

Orchestration

Automatic systems

We’re hiring in all departments!
Email: careers@coreos.com
Positions: coreos.com/careers

90+ Projects on GitHub, 1,000+ Contributors

OPEN SOURCE

CoreOS.com - @coreoslinux - github/coreos

Secure solutions, support plans, training + more

ENTERPRISE

sales@coreos.com - tectonic.com - quay.io

CoreOS is Running the World’s Containers