Lyft's Envoy: From Monolith to Service Mesh - Matt Klein, Lyft

Date post: 22-Jan-2018
Upload: datawire
Lyft's Envoy: From monolith to service mesh Matt Klein, Software Engineer @Lyft
Agenda

● Historical Lyft SoA architecture
● State of SoA networking in industry
● What is Envoy
● Envoy deployment @Lyft
● Envoy future directions
● Q&A

Lyft ~3.5 years ago

[Diagram labels: Internet clients; AWS ELB; PHP / Apache monolith; MongoDB]

Simple! No SoA!

Lyft ~2 years ago

[Diagram labels: Internet clients; AWS external ELB; PHP / Apache monolith (+haproxy/nsq); AWS internal ELBs; Python services; MongoDB; DynamoDB]

Not simple! SoA! With monolith! (and some haproxy/nsq)

State of SoA networking in industry

● Languages and frameworks.
● Per language libraries for service calls.
● Protocols (HTTP/1, HTTP/2, gRPC, databases, caching, etc.).
● Infrastructures (IaaS, CaaS, on premise, etc.).
● Intermediate load balancers (AWS ELB, F5, etc.).
● Observability output (stats, tracing, and logging).
● Implementations (often partial) of retry, circuit breaking, rate limiting, timeouts, and other distributed systems best practices.
● Authentication and Authorization.

State of SoA networking in industry

A really big and confusing mess...

State of SoA networking in industry

● Likely already in a world of hurt or rapidly approaching that point.
● Debugging is difficult or impossible (each application exposes different stats and logs with no tracing).
● Limited visibility into infra components such as hosted load balancers, databases, caches, network topologies, etc.
● Multiple and partial implementations of circuit breaking, retry, and rate limiting (If I had a $ for every time someone told me that retries are “easy” …).
● Furthermore, if you do have a good solution, you are likely using a library and are locked into a particular technology stack essentially forever.
● Libraries are incredibly painful to upgrade. (Think CVEs).

State of SoA networking in industry

● Ultimately, robust observability and easy debugging are everything.
● As SoAs become more complicated, it is critical that we provide a common solution to all of these problems or developer productivity grinds to a halt (and the site goes down … often).

Can we do better?

What is Envoy

The network should be transparent to applications. When network and application problems do occur it should be easy to determine the source of the problem.

This sounds great! But it turns out it’s really, really hard.

What is Envoy

● Out of process architecture: Let’s do a lot of really hard stuff in one place and allow application developers to focus on business logic.
● Modern C++11 code base: Fast and productive.
● L3/L4 filter architecture: A TCP proxy at its core. Can be used for things other than HTTP (e.g., MongoDB, redis, stunnel replacement, TCP rate limiter, etc.).
● HTTP L7 filter architecture: Make it easy to plug in different functionality.
● HTTP/2 first! (Including gRPC and a nifty gRPC HTTP/1.1 bridge).
● Service discovery and active health checking.
● Advanced load balancing: Retry, timeouts, circuit breaking, rate limiting, shadowing, etc.
● Best in class observability: stats, logging, and tracing.
● Edge proxy: routing and TLS.
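
To make the "plug in different functionality" idea concrete, here is a minimal Python sketch of a filter chain. It is an illustration only and not Envoy's C++ filter API: the filter signature, the filter names, and the "x-route" header are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Headers = Dict[str, str]
Response = Tuple[int, str]
# Hypothetical filter signature: return None to continue down the chain,
# or a (status, body) pair to answer the request without going upstream.
HttpFilter = Callable[[Headers], Optional[Response]]


def require_auth(headers: Headers) -> Optional[Response]:
    """Stop the chain early when no credentials are present."""
    if "authorization" not in headers:
        return (401, "missing credentials")
    return None


def tag_route(headers: Headers) -> Optional[Response]:
    """Annotate the request with a made-up routing hint, then continue."""
    is_users = headers.get(":path", "").startswith("/users")
    headers["x-route"] = "users" if is_users else "default"
    return None


def run_chain(filters: List[HttpFilter], headers: Headers,
              upstream: Callable[[Headers], Response]) -> Response:
    """Run each filter in order; the first early response wins."""
    for http_filter in filters:
        early = http_filter(headers)
        if early is not None:
            return early
    return upstream(headers)
```

Adding a new behaviour in this sketch means appending one more function to the list passed to run_chain; neither the other filters nor the upstream call change.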

Envoy service to service topology

[Diagram labels: Service Cluster (Service + Envoy), shown twice; Discovery; External Services; HTTP/2, REST / gRPC]

Envoy edge proxy topology

[Diagram labels: Internet / External Clients; HTTP/1.1, HTTP/2, TLS; “Front” Envoy Edge Proxy Region #1; “Front” Envoy Edge Proxy Region #2; HTTP/2, TLS, Client Auth; Private Infra]

Lyft today

[Diagram labels: Internet clients; “Front” Envoy (via TCP ELB); Legacy monolith (+Envoy); Python services (+Envoy); Go services (+Envoy); MongoDB; DynamoDB; Discovery; Stats / tracing (direct from Envoy)]

Service mesh! Awesome!

Eventually consistent service discovery

● Fully consistent service discovery systems are very popular (ZK, etcd, consul, etc.).
● In practice they are hard to run at scale.
● Service discovery is actually an eventually consistent problem. Let’s recognize that and design for it.
● Envoy is designed from the get go to treat service discovery as lossy.
● Active health checking used in combination with service discovery to produce a routable overlay.

Discovery Status | HC OK | HC Failed
Discovered       | Route | Don’t Route
Absent           | Route | Don’t Route / Delete
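
The table above can be read as a small decision function. The Python sketch below encodes exactly those four cells; the names Action and host_action are made up for illustration and are not Envoy identifiers.

```python
from enum import Enum


class Action(Enum):
    ROUTE = "route"
    DONT_ROUTE = "don't route"
    DELETE = "don't route / delete"


def host_action(discovered: bool, health_check_ok: bool) -> Action:
    """Combine lossy discovery data with active health checking."""
    if health_check_ok:
        # A host that passes health checks stays routable even if
        # discovery has (perhaps temporarily) lost track of it.
        return Action.ROUTE
    if discovered:
        # Known to discovery but failing health checks: keep it, skip it.
        return Action.DONT_ROUTE
    # Absent from discovery and failing health checks: safe to drop.
    return Action.DELETE
```

Note that the health check result alone decides whether traffic is sent; discovery only decides whether a failing host is kept around or removed.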

Advanced load balancing

● Different service discovery types.
● Zone aware least request load balancing.
● Dynamic stats: Per zone, canary specific stats, etc.
● Circuit breaking: Max connections, requests, and retries.
● Rate limiting: Integration with global rate limit service.
● Shadowing: Fork traffic to a test cluster.
● Retries: HTTP router has built in retry capability with different policies.
● Timeouts: Both “outer” (including all retries) and “inner” (per try) timeouts.
● Outlier detection: Consecutive 5xx
● Deploy control: Blue/green, canary, etc.
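
As an illustration of how the “outer” and “inner” timeouts interact with retries, here is a minimal Python sketch. It is not Envoy's router code: the function and exception names are hypothetical, the attempt callable stands in for one upstream try, and the clock is injected so the behaviour is deterministic.

```python
from typing import Callable, Optional


class TryFailed(Exception):
    """Raised by the (hypothetical) attempt callable when one try fails."""


class RequestFailed(Exception):
    """Raised when retries or the outer timeout are exhausted."""


def call_with_retries(attempt: Callable[[int], str],
                      max_retries: int,
                      outer_timeout_ms: int,
                      inner_timeout_ms: int,
                      now_ms: Callable[[], int]) -> str:
    """Call attempt(timeout_ms) until it succeeds or the budget runs out.

    Each try gets at most inner_timeout_ms, and never more than what is
    left of the outer timeout, which covers all tries together.
    """
    deadline = now_ms() + outer_timeout_ms
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):  # the first try plus the retries
        remaining = deadline - now_ms()
        if remaining <= 0:
            break  # the outer timeout has been used up
        try:
            return attempt(min(inner_timeout_ms, remaining))
        except TryFailed as exc:
            last_error = exc
    raise RequestFailed("retries or outer timeout exhausted") from last_error
```

With an outer timeout of 1000 ms and an inner timeout of 400 ms, a request whose every try times out gets budgets of 400, 400, and 200 ms and then fails, no matter how many retries are still allowed.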

Observability

● Observability is by far the most important thing that Envoy provides.
● Having all SoA traffic transit through Envoy gives us a single place where we can:
  ○ Produce consistent statistics for every hop
  ○ Create and propagate a stable request ID
  ○ Consistent logging
  ○ Distributed tracing
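
A rough Python sketch of the "create and propagate a stable request ID" idea follows. The x-request-id header name is the one Envoy uses; everything else (function names, log format) is a made-up illustration rather than Envoy behaviour.

```python
import uuid
from typing import Dict

REQUEST_ID_HEADER = "x-request-id"


def ensure_request_id(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers that is guaranteed to carry a request ID.

    The first hop generates the ID; later hops see it and pass it along
    unchanged, so every hop can be correlated.
    """
    out = dict(headers)
    if not out.get(REQUEST_ID_HEADER):
        out[REQUEST_ID_HEADER] = str(uuid.uuid4())
    return out


def log_line(service: str, headers: Dict[str, str], message: str) -> str:
    """Hypothetical log format that leads with the shared request ID."""
    return f"request_id={headers[REQUEST_ID_HEADER]} service={service} {message}"
```

Because every hop logs the same ID, searching the logs for one request_id value returns the full path of a single request across services.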

Performance matters for a service proxy

● Throughput is important ultimately for cost and operational scaling, but for many companies developer time is worth more than infra costs.
● BUT: Latency and predictability are what matter. And in particular tail latency (P99+).
● We already deal with incredibly confusing deployments (virtual IaaS, multiple languages and runtimes, languages that use GC, etc.). All of these niceties improve productivity and reduce upfront dev costs, but they make debugging really difficult.
● What is leading to sporadic errors? The IaaS? The app? GC?
● Ability to reason about overall performance and reliability is critical.
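
To show why tail latency deserves separate attention, the Python sketch below computes a nearest-rank percentile over made-up latency samples. The sample values are invented for illustration and are not Lyft measurements.

```python
from typing import List


def percentile(samples: List[int], p: int) -> int:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(samples)
    # ceil(p / 100 * n) computed with integers to avoid float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Invented data: 98 fast requests and 2 slow ones, in milliseconds.
latencies_ms = [10] * 98 + [500] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)  # 19.8
p50_ms = percentile(latencies_ms, 50)            # 10
p99_ms = percentile(latencies_ms, 99)            # 500
```

The mean and the median both look healthy here, while P99 exposes the 500 ms requests that a small share of users actually experience.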

Performance matters for a service proxy

● A service proxy provides invaluable benefits, but if the proxy itself has tail latencies that are hard to reason about, most of the debugging benefits go out the window and you are back to square one.

● Anyone that is trying to sell you a service infra that does not consider the above points is selling you a dream...

Envoy deployment @Lyft

● > 100 services.
● > 10,000 hosts.
● > 2,000,000 RPS.
● All service to service traffic (REST and gRPC).
● Use gRPC bridge to unlock Python and PHP clients.
● MongoDB proxy.
● DynamoDB proxy.
● External service proxy (AWS and other partners).
● Kibana/Elasticsearch for logging.
● LightStep for tracing.
● Wavefront for stats.

Envoy future directions

● Redis.
● More outlier detection and ejection (SR and latency).
● LB subset support.
● More rate limiting options / open source rate limit service / IP tagging.
● Configuration schema and better error output.
● Work with Google/community to add k8s support.
● Authentication and authorization.
● Envoy ecosystem?

Q&A

● Thanks for coming!
● We are super excited about building a community around Envoy. Talk to us if you need help getting started.
● https://lyft.github.io/envoy/
● Lyft is hiring: Contact us if you want to work on hard scaling problems in a fast moving company: https://www.lyft.com/jobs
