Kubernetes and lastminute.com: our course towards better scalability and processes (Codemotion Rome...

Kubernetes and lastminute.com group: our course towards better scalability and processes

@micheleorsi

Rome, 24-25 March 2017

http://twitter.com/micheleorsi

An inspiring travel company

A tech company to the core

Tech department: 300+ people

Applications: ~100

Database: 4 TB data

Servers: 1400 VMs, 300 physical machines

Locations: Chiasso, Milan, Madrid, London, Bengaluru

https://www.pexels.com/photo/turtle-walking-on-sand-132936/

Business: "technology is slow"

Technology: "the monolith is the problem"

https://www.flickr.com/photos/southtopia/5702790189

https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/

"... let’s break into microservices"

A lot of issues

● LONG provisioning time

● LACK OF alignment across environments

● LACK OF alignment across applications

● LACK OF awareness about ops (monitoring, alerting)

A year-long endeavour

● build a new, modern infrastructure

● migrate the search (flight/hotel) product there

... without:

● impacting the business

● throwing away our whole datacenter

Our plan

● same architecture across environments

● a common framework to align software

● centralized monitoring/logging, with alerts

● zero downtime deployment

● automation everywhere

How? Teams and people

New teams

https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/

Our infrastructure and technology

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Docker containers

registry.intra/application:v2-090025032017

BASE OS

JAVA JRE

START/STOP SCRIPTS

JAR APPLICATION

● build once, run everywhere

● externalised configuration
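
"Build once, run everywhere" only works if the image carries no environment-specific values. Below is a minimal sketch of that idea, assuming the application reads its settings from environment variables at startup. The variable names (APP_DB_URL, APP_CACHE_TTL_SECONDS), the AppConfig class and the example URLs are hypothetical and not taken from the slides.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    db_url: str
    cache_ttl_seconds: int


def load_config(env=os.environ) -> AppConfig:
    """Build the configuration from environment variables only."""
    return AppConfig(
        db_url=env["APP_DB_URL"],  # required: raises KeyError if missing
        cache_ttl_seconds=int(env.get("APP_CACHE_TTL_SECONDS", "60")),
    )


# The same image would receive different values in each environment.
qa = load_config({"APP_DB_URL": "jdbc:postgresql://qa-db/app"})
production = load_config({
    "APP_DB_URL": "jdbc:postgresql://prod-db/app",
    "APP_CACHE_TTL_SECONDS": "300",
})
print(qa)
print(production)
```

Failing fast on a missing required value keeps a misconfigured container from starting at all, which is usually easier to spot than a wrong default.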

Kubernetes

● independent from OS/hosts

● isolated env, managed at scale

● self-healing

● externalised configuration

Omega paper: http://research.google.com/pubs/pub41684.html

https://www.pexels.com/photo/red-toy-truck-24619/

"Your infrastructure on wheels"

Kubernetes: physical representation

cluster: NODE1, NODE2, ... NODE70

on every node: K8S, DOCKER, FLANNELD, ETCD, Ubuntu

Kubernetes: logical representation

cluster

● NAMESPACE1: CPU 10, MEM 40GB

● NAMESPACE2: CPU 20, MEM 80GB

● NAMESPACE3: CPU 80, MEM 90GB

● NAMESPACE4: CPU 100, MEM 10GB
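
Each namespace in the logical view carries its own CPU and memory budget. The sketch below illustrates the admission idea behind such a budget, using the NAMESPACE1 figures from the slide. In Kubernetes this kind of cap is typically expressed with a ResourceQuota object; the slides do not say how it was configured here, and the Namespace class and admit method are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Namespace:
    name: str
    cpu_limit: float      # cores
    mem_limit_gb: float
    cpu_used: float = 0.0
    mem_used_gb: float = 0.0

    def admit(self, cpu: float, mem_gb: float) -> bool:
        """Reserve resources if the request fits in the remaining quota."""
        if self.cpu_used + cpu > self.cpu_limit:
            return False
        if self.mem_used_gb + mem_gb > self.mem_limit_gb:
            return False
        self.cpu_used += cpu
        self.mem_used_gb += mem_gb
        return True


ns1 = Namespace("NAMESPACE1", cpu_limit=10, mem_limit_gb=40)
print(ns1.admit(cpu=4, mem_gb=16))  # fits
print(ns1.admit(cpu=4, mem_gb=16))  # fits
print(ns1.admit(cpu=4, mem_gb=16))  # rejected: 12 cores would exceed 10
```

A rejected request leaves the counters untouched, so one team cannot eat into the budget of another namespace.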

Kubernetes: our architecture

production: APP1-PRODUCTION, APP2-PRODUCTION, APP3-PRODUCTION

nonproduction: APP1-PREVIEW, APP1-DEVELOPMENT, APP1-QA

Kubernetes: our architecture and choices

production, APP1-PRODUCTION

POD: application, fluentd, collectd, carbon

Monitoring and alerting: grafana + graphite

cluster, APP1-PRODUCTION

POD: application, collectd, carbon

graphite

Grafana 4

icons from http://www.flaticon.com

Kubernetes: our architecture and choices

production, APP1-PRODUCTION

app1.lastminute.intra

deployment

replica-set

POD1, POD2, POD3

secret, configmap

Kubernetes: what’s left outside?

● datastores

○ DBs

○ logs

○ metrics

● distributed caches

● distributed locking

● pub-sub

1st try (with test app), it seemed to work

https://www.flickr.com/photos/26516072@N00/2194001232

Self-healing

ref: https://technologyconversations.com/2016/01/26/self-healing-systems

application

"Hey, how are you?" "I am fine, thanks"

"Hey, how are you?" "I have problems"

Kubernetes contract

"When a container is dead I will restart it"

"When a container is ready I will forward traffic to it"

Kubernetes probes: liveness & readiness

Two questions:

● when can I consider my container alive?

● when can I consider my container ready to receive traffic?

spec:
  containers:
  - livenessProbe:
      httpGet:
        path: /liveness
    readinessProbe:
      httpGet:
        path: /readiness

deployment.yaml
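
The two probes feed the two halves of the Kubernetes contract: restart what is dead, route traffic only to what is ready. The sketch below shows that decision from the orchestrator's side. It is an illustration of the contract, not Kubernetes code, and the Container class and reconcile function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    alive: bool   # result of the liveness check
    ready: bool   # result of the readiness check


def reconcile(containers):
    """Return (names to restart, names that may receive traffic)."""
    to_restart = [c.name for c in containers if not c.alive]
    # Only containers that are both alive and ready get traffic.
    routable = [c.name for c in containers if c.alive and c.ready]
    return to_restart, routable


pods = [
    Container("pod1", alive=True, ready=True),
    Container("pod2", alive=True, ready=False),   # still starting up
    Container("pod3", alive=False, ready=False),  # dead
]
print(reconcile(pods))
```

Note that pod2 is neither restarted nor served: a container that is alive but not ready is simply left alone until its readiness check passes.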

Our choices: framework - k8s

/liveness:

● when tomcat container is up

● when ratio active/max threads < threshold

/readiness:

● all the startup jobs have run

.. ongoing never-ending research ..
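
The rules behind /liveness and /readiness can be sketched as two small predicates. This is a minimal illustration under assumptions: the 0.9 threshold, the function names and the startup job names are hypothetical, since the slides give neither the actual threshold nor the framework code.

```python
def is_live(active_threads: int, max_threads: int, threshold: float = 0.9) -> bool:
    """Live while the share of busy worker threads stays below the threshold."""
    if max_threads <= 0:
        return False
    return active_threads / max_threads < threshold


def is_ready(startup_jobs: dict) -> bool:
    """Ready once every startup job has reported completion."""
    return all(startup_jobs.values())


def probe_status(ok: bool) -> int:
    # An HTTP probe treats any status >= 200 and < 400 as success.
    return 200 if ok else 503


print(probe_status(is_live(active_threads=50, max_threads=200)))
print(probe_status(is_live(active_threads=195, max_threads=200)))
print(probe_status(is_ready({"warm_cache": True, "load_routes": False})))
```

Tying liveness to thread saturation means a container that is up but stuck gets restarted, which is the trade-off the "ongoing never-ending research" remark hints at: too low a threshold restarts healthy containers under load.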

2nd try (with production traffic)

● zero downtime during rollout

● resilience improved

● legacy infrastructure to the rescue in case of problems

... failure ... the big one!

https://www.flickr.com/photos/ghost_of_kuji/2763674926

Problems

● configuration

● infrastructure

● tools

● manual mistakes

● (external) scalability

Another improvement step

● temporary team focus on objective

● automation

● Go deeper in docker/kubernetes

Pipeline: a huge step forward

microservice = factory.newDeployRequest().withArtifact("com.lastminute.application1", 2)

lmn_deployCanaryStrategy(microservice, "qa")
lmn_deployCanaryStrategy(microservice, "preview")
lmn_deployCanaryStrategy(microservice, "production")

pipeline

Pipeline: a huge step forward

● git push

○ continuous integration

○ continuous delivery

pull jar → build docker → (gate) → QA canary → (gate) → QA stable → (gate) → PREV canary → (gate) → PREV stable → (gate) → PROD canary → (gate) → PROD stable
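
The stage sequence above can be sketched as a loop in which every rollout must pass a gate first, canary before stable, one environment at a time. This is an illustration only: the internals of lmn_deployCanaryStrategy are not shown in the slides, and run_pipeline, the deploy and gate callbacks and the failing gate are hypothetical.

```python
ENVIRONMENTS = ["qa", "preview", "production"]


def run_pipeline(artifact, deploy, gate):
    """Promote an artifact environment by environment, canary first.

    gate(env, track) must return True before deploy(artifact, env, track)
    is allowed to run; the first closed gate stops the pipeline.
    """
    completed = []
    for env in ENVIRONMENTS:
        for track in ("canary", "stable"):
            if not gate(env, track):
                return completed, f"stopped before {env}/{track}"
            deploy(artifact, env, track)
            completed.append(f"{env}/{track}")
    return completed, "released"


def fake_deploy(artifact, env, track):
    print(f"deploying {artifact} to {env} ({track})")


# Hypothetical gate: everything is approved except the preview canary.
result = run_pipeline(
    "com.lastminute.application1",
    fake_deploy,
    gate=lambda env, track: (env, track) != ("preview", "canary"),
)
print(result)
```

Because a closed gate returns before any deploy call, a bad build never reaches an environment beyond the last one that was approved.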

"Go" deep .. whatever language it takes

https://www.pexels.com/photo/sea-man-person-ocean-2859/

nginx ingress controller problem

(diagram: LB, NGINX instances, addresses 10.0.0.1 to 10.0.0.6)

There’s a light .. at the end

https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/

... benefits

● lead and migration time

● resilience

● root cause analysis

● speed of deployment

● instant and easy scaling

Give me the numbers!

● 70 physical nodes, 1300 pods, 5200 containers

● 20k req/sec in the new cluster

● 35 micro-services migrated in 6 months

● 10 minutes to create a new environment

● whole pipeline runs in 16 minutes

○ 4 minutes to release 100 instances of a new version

● 2M metrics/minute flows
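
A few ratios follow directly from these figures. The short calculation below derives them; the inputs are the numbers from the slide, and the derived values are averages only, since the slides do not describe how pods are actually spread across nodes.

```python
nodes, pods, containers = 70, 1300, 5200
metrics_per_minute = 2_000_000
instances_released, release_minutes = 100, 4

containers_per_pod = containers / pods
pods_per_node = pods / nodes
metrics_per_second = metrics_per_minute / 60
instances_per_minute = instances_released / release_minutes

print(f"containers per pod: {containers_per_pod:.1f}")
print(f"pods per node: {pods_per_node:.1f}")
print(f"metrics per second: {metrics_per_second:,.0f}")
print(f"instances released per minute: {instances_per_minute:.0f}")
```

The average of four containers per pod matches the pod layout shown earlier, where the application runs alongside fluentd, collectd and carbon.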

Yes, we’re hiring!

THANKS

careers.lastminutegroup.com


Date post:	11-Apr-2017
Category:	Technology
Upload:	michele-orsi